A work colleague recently asked me about efficient inner join algorithms that could be applied to in-memory data collections. My mind immediately went back to the common algorithms that most modern databases usually employ:

• Nested Loop Join
• Sort-Merge Join
• Hash Join

Let's explain each one of these using some simple C# 4.0 code to illustrate. First, consider we have 2 lists of key-value pairs. The first is a list of customerID and name. The second list is a list of customerID and item purchased:

```            // a list of customers with corresponding customerID
var names = new List<KeyValuePair<int,string>>();

// a list of things the customers purchased. integer value is customerID
var items = new List<KeyValuePair<int, string>>();

Nested loop joins are the simplest of all joins. It is an extremely naive algorithm that loops over one data set in an outer loop and loops over the data set to join on in an inner loop (assuming the join involves only 2 items). For an inner (equi-) join, when it finds matches on the join key it outputs the resulting tuples. This approach has worst case run time complexity of O(A.B) where A and B are the sizes of the lists/tables being joined, hence it's only feasible when the lists are fairly small. Here's an example of it in C#:

```        static void NestedLoopJoin(List<KeyValuePair<int, string>> people, List<KeyValuePair<int, string>> purchases)
{
foreach (var n in people)
foreach (var i in purchases)
if (n.Key == i.Key)
Console.WriteLine("Name index {0}, {1}, bought some {2} ", n.Key, n.Value, i.Value);
}```

As the name implies, the sort-merge join starts by sorting both lists/tables by the join key and then doing an interleaved linear scan of both join key sets. Because the key sets of the join tables are both sorted the algorithm knows when no more inner joins are possible for a given join key and can advance the set pointers accordingly. Of course, the downside is that you have to pre-sort the data if it isn't already sorted. [On the flip side, it's a good candidate when the data is already sorted for you!]

```        static void SortMergeJoin(List<KeyValuePair<int, string>> people,
List<KeyValuePair<int, string>> purchases)
{
// first we sort both lists
people.Sort(Compare);
purchases.Sort(Compare);

// then we do an interleaved linear scan to merge the sorted lists
int i = 0, j = 0;

while(i<people.Count & j<purchases.Count)
{
if (people[i].Key < purchases[j].Key)
i++;
else if (people[i].Key > purchases[j].Key)
j++;
else   // must be a match
{
Console.WriteLine("Name index {0}, {1}, bought some {2} ",
people[i].Key, people[i].Value, purchases[j].Value);
j++;
}
}
}

static int Compare(KeyValuePair<int, string> a, KeyValuePair<int, string> b)
{
if (a.Key > b.Key)
return 1;

if (a.Key == b.Key)
return 0;

return -1;
}
```

Hash joins involve creating a hash table up front, in memory, for the smaller list/table in the join. You hash on the join keys found in the smaller table so that you can loop over the larger table in the join and for each encountered join key take the hash of it and use this to interrogate the pre-prepared hash table for joining rows. The nice thing about this approach is that hash-based lookup component is a constant time operation. Hash joins excel when at-least one of the join tables has no index or is not sorted.

```        static void HashJoin(List<KeyValuePair<int, string>> people,
List<KeyValuePair<int, string>> purchases)
{
var peopleHash = new Dictionary<int,string>();
foreach(var n in people)