Concurrent Programming: The Basics

For a number of years now, microprocessor designers have been focusing on increasing core counts rather than individual clock rates, even though transistor counts continue to follow Moore's Law. As multi-core processors become more prevalent, programs need to be multi-threaded to take advantage of the parallel processing capabilities on offer. But concurrent programming is hard. Damn hard. One solution is to use functional languages like Erlang, Haskell or F# to make concurrent programming easier for the developer, since they often employ immutable "variables", which makes reasoning about state management significantly easier (read: there are no side effects). However, old-school folks who have to use their current language (C++/Java/C#) can use threads to achieve concurrency, as long as they also employ various constructs/patterns to manage synchronisation between those threads.

It might seem obvious but you need to remember that concurrency is for performance, and there are generally 3 ways in which concurrent execution can improve performance:

  1. to reduce latency (making the operation execute faster);
  2. to hide latency (avoiding blocking other operations whilst slow operations execute); or
  3. to increase throughput (making the system perform more work per unit time).

Reducing Latency
Reducing latency requires that the unit of work under investigation be "parallelizable". That is, you need to find an algorithm that lets you divide the work into chunks that can execute in parallel, and also a way of putting the chunks back together to complete the work. When you divide the work up into smaller chunks, you also need to ensure that the cost of communication between the processing sub-components, and of re-assembling any interim results, doesn't outweigh the benefit of having parallel operations. The same problems are faced by proponents of grid/cloud computing.

An example of using concurrency to reduce latency is a database engine servicing a query. At a certain point it knows what data pages it needs, and it knows which of these are already in memory and which must be fetched from disk. The database engine can spawn multiple threads to read different data pages off disk into memory, and this parallel loading reduces the time taken by the query.
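
As a rough sketch of the idea (not how any particular database engine actually implements it), the Java snippet below uses a fixed thread pool to read a batch of pages concurrently. DataPage, loadFromDisk and the pool size are hypothetical, purely for illustration:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelPageLoader
    {
        // Hypothetical page abstraction - stands in for whatever the engine really uses.
        interface DataPage { }

        // Hypothetical slow, blocking read of a single page.
        static DataPage loadFromDisk(int pageId) { return null; }

        public static List<DataPage> loadPages(List<Integer> pageIds)
            throws InterruptedException, ExecutionException
        {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            try
            {
                // one task per page that isn't already cached in memory
                List<Callable<DataPage>> tasks = new ArrayList<Callable<DataPage>>();
                for (final Integer id : pageIds)
                {
                    tasks.add(new Callable<DataPage>() {
                        public DataPage call() { return loadFromDisk(id); }
                    });
                }

                // invokeAll blocks until every read has completed
                List<Future<DataPage>> results = pool.invokeAll(tasks);

                List<DataPage> pages = new ArrayList<DataPage>();
                for (Future<DataPage> f : results)
                    pages.add(f.get());
                return pages;
            }
            finally
            {
                pool.shutdown();
            }
        }
    }

The caller still sees a single "load" operation; internally the individual reads overlap, so the overall latency is closer to that of the slowest reads than to the sum of them all.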

Obviously, the degree to which one can use concurrency to reduce latency is dependent upon how "parallelizable" the problem is.

Hiding Latency
If the operation is long-running and not amenable to parallelization, then concurrent execution can instead be used to get on with other tasks while the long-running task executes on another thread. Here we aren't reducing the latency of the task; we are just hiding it.

This sort of concurrency is common in UI development, where we employ asynchronous background threads for slow operations in order to improve the perceived responsiveness of the UI.
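
As a minimal sketch (not tied to any particular UI framework), the slow work can be handed off to a background thread so that the calling thread returns immediately; fetchReportData and updateUi are hypothetical placeholders for the slow operation and the UI callback:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BackgroundWork
    {
        private static final ExecutorService background = Executors.newSingleThreadExecutor();

        // Hypothetical long-running operation we don't want blocking the UI thread.
        static String fetchReportData() { return "report"; }

        // Hypothetical callback that hands the result back to the UI
        // (a real UI toolkit would marshal this onto its event thread).
        static void updateUi(String result) { System.out.println("done: " + result); }

        public static void startReport()
        {
            background.submit(new Runnable() {
                public void run()
                {
                    String result = fetchReportData(); // the slow part runs off the UI thread
                    updateUi(result);                  // report back once it's finished
                }
            });
            // startReport() returns immediately, so the UI stays responsive
            // even though the report still takes just as long to produce.
        }
    }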

Increasing Throughput
Here, instead of using parallel operations to make a single operation faster (case 1 above), we employ multiple concurrent operations to get more operations processed per unit time. For example, say we have a service/daemon running on a server that watches the file system, and when it sees an mp3 file in a certain directory it fetches it and performs some transcoding operations. If the server is a multi-core machine, it's beneficial to have the daemon use a pool of X threads where each thread is given a single file to process. This increases the throughput of the daemon by virtue of concurrent operations.
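
A sketch of the daemon's dispatch logic might look something like the following; transcode and onNewFile are hypothetical names, and the pool is simply sized to the number of available cores:

    import java.io.File;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class TranscodingDaemon
    {
        // the pool of X worker threads - here sized to the core count
        private static final ExecutorService workers =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        // Hypothetical CPU-heavy transcoding routine.
        static void transcode(File mp3) { /* ... */ }

        // Called by the file-system watcher whenever a new mp3 appears.
        public static void onNewFile(final File mp3)
        {
            // each submitted task processes exactly one file, so multiple
            // files are transcoded concurrently on a multi-core box
            workers.submit(new Runnable() {
                public void run() { transcode(mp3); }
            });
        }
    }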

In a future post I'll dive into the details of some of the locking constructs, and also some of the patterns that make concurrent programming easier.


Sorting Fundamentals: Part 2

In Part 1 of this series I detailed the workings and complexity of the O(n²) sorting algorithms - Bubble, Selection and Insertion - giving implementations in both Java and C#. Let's now look at some of the O(n log₂ n) algorithms that perform better.

MergeSort

The mergesort is a divide-and-conquer algorithm. It splits the input in half, sorts each half separately, then merges the results back together. When it needs to sort each half, it employs the same technique of splitting and merging; hence it is a recursive algorithm. It is based on two important facts: (1) it's simpler to sort a smaller list than a larger list, and (2) merging two sorted lists is not computationally intensive. Here is the non-generic implementation in Java:

    public static void mergeSortInPlace(int[] input, int startIndex, int endIndex)
    {
        if (endIndex - startIndex <= 0)
            return;

        // split the input into 2 halves
        int half = (endIndex - startIndex + 1) / 2;

        mergeSortInPlace(input, startIndex, startIndex + half - 1);
        mergeSortInPlace(input, startIndex + half, endIndex);
        mergeInPlace(input, startIndex, startIndex + half, endIndex);
    }

    private static void mergeInPlace(int[] input, int startA, int startB, int endB)
    {
        int[] c = new int[endB - startA + 1];
        int ap = startA, bp = startB, cp = 0;
        while (cp < c.length)
        {
            // take from the first run when the second is exhausted,
            // or when the first run's head is the smaller of the two
            if (bp > endB || (ap <= (startB - 1) && input[ap] < input[bp]))
            {
                c[cp] = input[ap];
                ap++;
            }
            else
            {
                c[cp] = input[bp];
                bp++;
            }
            cp++;
        }

        // copy the merged result back over the original range
        for (int i = startA; i <= endB; i++)
            input[i] = c[i - startA];
    }

Here's the generic C# implementation:

    public static void MergeSortInPlace<T>(T[] input)
        where T : IComparable<T>
    {
        MergeSortInPlace(input, 0, input.Length - 1);
    }

    public static void MergeSortInPlace<T>(T[] input, int startIndex, int endIndex)
        where T : IComparable<T>
    {
        if (endIndex - startIndex <= 0)
            return;

        // split the input into 2 halves
        int half = (endIndex - startIndex + 1) / 2;

        MergeSortInPlace(input, startIndex, startIndex + half - 1);
        MergeSortInPlace(input, startIndex + half, endIndex);
        MergeInPlace(input, startIndex, startIndex + half, endIndex);
    }

    private static void MergeInPlace<T>(T[] input, int startA, int startB, int endB)
        where T : IComparable<T>
    {
        var c = new T[endB - startA + 1];
        int ap = startA, bp = startB, cp = 0;
        while (cp < c.Length)
        {
            if (bp > endB || (ap <= (startB - 1) && (input[ap].CompareTo(input[bp]) < 0)))
            {
                c[cp] = input[ap];
                ap++;
            }
            else
            {
                c[cp] = input[bp];
                bp++;
            }
            cp++;
        }
        for (int i = startA; i <= endB; i++)
            input[i] = c[i - startA];
    }

The merge routine is somewhat complicated by the array bounds checking, but in essence all it is doing is comparing the heads of the two sequences and taking the smaller of them until no items are left. When either input list is exhausted, the remainder of the other list is copied across.
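
For completeness, calling the Java version looks like this (unlike the C# code there is no single-argument overload, so the full index range is passed explicitly):

    int[] data = { 5, 1, 4, 2, 8, 3 };
    mergeSortInPlace(data, 0, data.length - 1);
    // data is now { 1, 2, 3, 4, 5, 8 }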

Run Time Complexity

The best way to understand the run time complexity of the mergesort is to see the problem as a binary tree of sub-problems of reducing size.

Since we are splitting in half at each branch, the depth of the tree is log₂(n). Because no items have been removed from or added to the original array - it's merely been partitioned - the total number of comparisons done at each level of the tree is ~O(n). [In fact it's interesting to note that our sort procedure doesn't do any comparisons - only the merge procedure does comparisons.] Thus it follows that the worst case time complexity is ~O(n log₂ n). This is also true for the average case.

For the mathematically minded, an alternate way to derive the run time complexity is to examine the recurrence relationship. This is possible because mergesort is a recursive algorithm. So if we let T(n) represent the time it takes to sort n items, the recipe of mergesort yields the following recurrence:

    T(n) = 2T(n/2) + n

Note that the final term is n because the complexity of merging two sorted lists is proportional to the total number of elements across both lists. Of course, to be a recurrence we must specify a base case. Because it's easy to sort/merge a list of size 1, we know T(1) = 0.

Now we can use the master theorem from Introduction to Algorithms [Cormen et al.] to derive the asymptotic complexity. Written in the generic form T(n) = aT(n/b) + f(n) given by Cormen et al., our recurrence has a = 2, b = 2 and f(n) = n. Since log_b(a) = log₂(2) = 1 and f(n) ~ O(n), this is a Case 2 recurrence, and thus T(n) is bounded by O(n log₂ n). Check out pages 73-75 of Introduction to Algorithms for the full details; it's a fairly handy rule for making sense of recurrence relationships.
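
For readers who prefer to see it without the master theorem, the same bound falls out of simply unrolling the recurrence, stopping after k = log₂(n) levels when the sub-problems reach size 1:

    T(n) = 2T(n/2) + n
         = 4T(n/4) + 2n
         = 2^k T(n/2^k) + kn
         = nT(1) + n log₂(n)
         = n log₂(n)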

Space Complexity

There are numerous implementations of mergesort, so one has to be careful when defining the space complexity since it is invariably implementation-specific. The implementations given above in both Java and C# write their output back into the input array; however, since mergesort is a recursive algorithm, we are going to be pushing activation records, or stack frames, onto the call stack repeatedly. The growth rate of call-stack memory is O(log n), since the number of nested recursive calls is proportional to the tree depth. Each activation record contains only a fixed amount of data - some local variables and a return address - but nonetheless the total size consumed has a defined growth rate. In addition, the merge procedure uses auxiliary memory to do the merge. [Doing an in-place merge on contiguous memory in an array is not easy, but it is with a linked list, which is why mergesort works well on lists.] The extra memory required for the merge is, at worst, n, so the total space needed for a merge sort is roughly 2n + log(n): the original array, the merge buffer, and the call stack. Since n dominates log(n), and we need not worry about coefficients when stating asymptotic upper bounds, we say the space complexity is ~O(n).

Uses and Optimisations

Because it employs a divide-and-conquer approach, mergesort is an excellent algorithm to use in situations where the input cannot fit into main memory and must instead be stored in blocks on disk. Knuth, in the new edition of The Art of Computer Programming, Vol. 3: Sorting and Searching, identified a technique - which he called "randomized striping" - as the method of choice for sorting with parallel disks/reads. This is a slightly modified version of mergesort optimised for the case where the data to be sorted is read from multiple disks.

If you have multiple cores to play with, the merge sort algorithm can be optimised further. The conventional algorithm parallelises poorly because the number of participating processors halves at each successive merging stage, dropping to a single processor for the final merge. An optimised parallel merge sort would utilise all processors throughout the computation, achieving a speedup of (P-1)/log P, where P is the number of processors.

QuickSort

Quicksort is another very popular divide-and-conquer sorting algorithm. It works by selecting a pivot item, shuffling "lesser" items to positions below the pivot and "greater" items to positions above it. Leaving the pivot in place, it then recursively applies the same procedure to the "lesser" and "greater" sub-lists created when the input was partitioned about the pivot.

Here is my version of quick sort in non-generic Java:

    public static void quickSort(int[] input, int start, int finish)
    {
        if (finish - start < 1)
            return;

        // pick a pivot and capture its value, so that later swaps
        // cannot change what we are comparing against
        int pivot = input[(finish + start) / 2];

        // partition about the pivot
        int i = start, j = finish;
        while (i <= j)
        {
            while (input[i] < pivot)
                i++;

            while (input[j] > pivot)
                j--;

            if (i <= j)
            {
                int temp = input[i];
                input[i] = input[j];
                input[j] = temp;
                i++;
                j--;
            }
        }

        if (start < i - 1)
            quickSort(input, start, i - 1);

        if (i < finish)
            quickSort(input, i, finish);
    }

And here it is in a generic C# flavour:

    public static void QuickSort<T>(T[] input)
        where T : IComparable<T>
    {
        QuickSort(input, 0, input.Length - 1);
    }

    public static void QuickSort<T>(T[] input, int start, int finish)
        where T : IComparable<T>
    {
        if (finish - start < 1)
            return;

        // pick a pivot and capture its value, so that later swaps
        // cannot change what we are comparing against
        T pivot = input[(finish + start) / 2];

        // partition about the pivot
        int i = start, j = finish;
        while (i <= j)
        {
            while (input[i].CompareTo(pivot) < 0)
                i++;

            while (input[j].CompareTo(pivot) > 0)
                j--;

            if (i <= j)
            {
                T temp = input[i];
                input[i] = input[j];
                input[j] = temp;
                i++;
                j--;
            }
        }

        if (start < i - 1)
            QuickSort(input, start, i - 1);

        if (i < finish)
            QuickSort(input, i, finish);
    }

Run Time Complexity

Like the mergesort, quicksort has 2 components: partitioning about a pivot, and recursion. The first of these, partitioning, is clearly an O(n) operation for a set of size n since we need to scan through all items and perform (n-1) comparisons. If this partition splits the set of items in half each and every time, then there will be log₂(n) levels of recursive calls, leading to a run time complexity of O(n log₂ n). This is the best case run time complexity for quicksort. However, the partitioning phase might not split the items exactly in half. In the worst case it might select a pivot that has all other (n-1) values above or below it. If such a 1:(n-1) split were to happen on every pass, the recursion would be n levels deep rather than log(n) levels deep, and the total number of comparisons performed would be (n-1) + (n-2) + (n-3) + ... + 1 which, by our knowledge of arithmetic progressions, results in a run time complexity of O(n²). That is the worst case run time complexity for quicksort. With the middle element chosen as the pivot, as above, the chances of this happening are pretty remote: the pivot would have to be the maximum or minimum of the remaining values on every single pass. Other implementations just pick the first item in the set as the pivot value, but in that case, if the data is already sorted, you will encounter this repeated 1:(n-1) partitioning and thus worst-case quadratic time complexity.

So we've defined the best case and the worst case; what is the average case, I hear you ask. It turns out that the average case run time complexity is also ~O(n log₂ n). To derive it you need to solve the recurrence relationship that considers all possible choices of the pivot, but rather than detail that here, I'll note that you can see this analysis on Wikipedia, or a more complete treatment in Section 7.4 of Introduction to Algorithms [Cormen et al.].

Space Complexity

As with the mergesort, the space complexity of quicksort is determined by the amount of auxiliary memory needed for interim workings, plus the memory required for the recursive call stack. Again it's implementation-specific, but in the implementation above the partitioning is done in-place, so apart from the memory holding the original input, the memory required is purely for the call stack. This implies the space complexity of quicksort is O(log₂ n) for the average and best cases. In the worst case, as stated earlier, quicksort makes n nested recursive calls, which indicates O(n) space complexity. However, the second recursive call is the last thing the method does, so it sits in a tail position: a compiler or runtime that performs tail-call elimination can replace it with a loop, and if you also always recurse into the smaller partition and loop over the larger one, the call stack can never grow deeper than log₂(n) frames. With that restructuring the worst case space complexity drops from O(n) to O(log₂ n).
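
For reference, here is a sketch (using the non-generic Java style from above, with the same partitioning step) of that "recurse into the smaller half, loop over the larger half" restructuring. It is illustrative rather than tuned, but it keeps the stack depth at or below log₂(n):

    public static void quickSortBounded(int[] input, int start, int finish)
    {
        while (finish - start >= 1)
        {
            // partition exactly as before
            int pivot = input[(finish + start) / 2];
            int i = start, j = finish;
            while (i <= j)
            {
                while (input[i] < pivot)
                    i++;
                while (input[j] > pivot)
                    j--;
                if (i <= j)
                {
                    int temp = input[i];
                    input[i] = input[j];
                    input[j] = temp;
                    i++;
                    j--;
                }
            }

            // recurse into the smaller partition, then loop around on the larger one,
            // so each stack frame handles a range at most half the size of its caller's
            if ((i - 1) - start < finish - i)
            {
                if (start < i - 1)
                    quickSortBounded(input, start, i - 1);
                start = i;
            }
            else
            {
                if (i < finish)
                    quickSortBounded(input, i, finish);
                finish = i - 1;
            }
        }
    }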

To be continued...

UPDATE (Oct 2009):
See these algorithms in F# here.


Sorting Fundamentals: Part 1

Ever since computers were invented, people have been interested in sorting collections of items. It's useful in itself, and it is also a prerequisite for efficiently implementing other functionality such as searching and merging.

There are a myriad of different ways to implement sorting, each with its pros and cons. Whilst many programmers consider sorting a solved problem, it's still useful to understand the fundamentals so that you are able to make the necessary trade-offs when faced with a decision about which sorting technique to use (or are curious about which techniques the major frameworks use in their sorting routines). Accordingly, the paragraphs that follow discuss the more well-known sorting techniques, giving implementation details in both Java and C#, and noting their efficiency using complexity theory ("Big O" notation).

Bubble Sort

Perhaps the most (in)famous sorting algorithm is the bubble sort. This algorithm uses pairwise comparisons to shuffle "bigger" items to the right (or "smaller" items to the left). Unless the data is already sorted, a bubble sort needs to make multiple passes through the data set to sort it.

In Java, this is how it can be implemented:

    public static void bubbleSort(int[] input)
    {
        for (int i = input.length - 1; i >= 0; i--)
        {
            int swaps = 0;
            for (int j = 0; j < i; j++)
            {
                if (input[j] > input[j + 1])
                {
                    int temp = input[j];
                    input[j] = input[j + 1];
                    input[j + 1] = temp;
                    swaps++;
                }
            }
            // a pass with no swaps means the array is already sorted
            if (swaps == 0)
                return;
        }
    }

Note that the input is an array of integers. We use the primitive type here because, unlike C# which has a unified type system, Java has different constructs for primitives and class types, and regrettably the performance characteristics of the non-primitive types are not as good as those of the primitives.

Generally we are interested in two performance characteristics: asymptotic running time, and space complexity. Other considerations such as stability are sometimes important, but here we'll focus on run time and space complexity. [Stability means that equivalent elements retain their relative order after sorting. In fact, any sorting algorithm can be made stable at the cost of O(n) extra space and a moderately increased run time.]
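
To make the bracketed claim concrete: the usual trick (a sketch only - none of the sorts below needs it) is to tag each element with its original index and use that index to break ties, which is where the O(n) extra space goes. Because no two tagged elements ever compare equal, even an unstable sort then reproduces the stable ordering:

    // Pair each value with the position it originally occupied.
    class Keyed implements Comparable<Keyed>
    {
        final int value;
        final int originalIndex;

        Keyed(int value, int originalIndex)
        {
            this.value = value;
            this.originalIndex = originalIndex;
        }

        public int compareTo(Keyed other)
        {
            if (value < other.value) return -1;
            if (value > other.value) return 1;
            // equal values: fall back to original position to preserve relative order
            return originalIndex - other.originalIndex;
        }
    }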

Run Time Complexity

For run time complexity we are concerned with the asymptotic upper bound on the number of comparisons needed. From the code above, it should be obvious that for an input array of length n, in the worst case a total of (n-1) passes are made and the total number of comparisons needed is: (n-1) + (n-2) + ... + 2 + 1. This is the sum of an arithmetic progression and reduces to n(n-1)/2, a polynomial whose highest-order term is n². Hence we say the running time is ~O(n²). We don't worry about constant factors, coefficients or other lower-order terms - it's the dominant term that concerns us. In this case, a quadratic running time is not particularly efficient when n is large. The best case occurs when the input is already sorted, in which case the swap-counter check at the end of each pass of the outer loop ensures that we only need to do n-1 comparisons, which leads to a running time of ~O(n).

Apart from the best and worst cases, it is also useful to understand the average case. The average case running time is the expected running time when the input is randomly drawn from a given distribution (hence you need to ask yourself "what input distribution?"). Thus knowledge of all possible input sequences, the probability of each sequence occurring, and the individual running time for each of those sequences is required to obtain the average case running time. In general we assume the uniform distribution, since in the absence of other information we assume all possible sequences are equally likely. This is why the uniform distribution is also referred to as the distribution of maximum ignorance.

To derive the average case for the bubble sort we need to work out the total number of comparisons required in each of the n-1 possible bail-out scenarios, sum them, and divide by n-1. This is because the algorithm bails out as soon as it has one pass of the inner loop with no swaps; that indicates the remainder of the list is sorted and no further comparisons are needed. We are, of course, assuming that we have an equal chance of bailing out after each of the n-1 passes of the outer loop. The maths of this calculation is easily done if we tabulate the comparison count for each scenario. Assuming n = 5, here is a table showing the comparison count for each possible number of passes.

Comparisons needed per pass (n = 5)

    Passes needed | 1st pass | 2nd pass | 3rd pass | 4th pass | Total
    --------------+----------+----------+----------+----------+------------------------------
          1       |   n-1    |          |          |          | (n-1)
          2       |   n-1    |   n-2    |          |          | (n-1) + (n-2)
          3       |   n-1    |   n-2    |   n-3    |          | (n-1) + (n-2) + (n-3)
          4       |   n-1    |   n-2    |   n-3    |   n-4    | (n-1) + (n-2) + (n-3) + (n-4)

To average these, sum the Total column and divide by the n-1 (here, 4) equally likely scenarios; the result is still a polynomial whose highest-order term in n is n². Thus the average run time complexity is also ~O(n²).

With an average case and worst case run time complexity of O(n²), clearly the bubble sort is not at all efficient. Really, you should never use this sort; it's included here merely for completeness. Even for small values of n, quadratic run time grows far faster than the other growth rates we will meet, such as n and n log₂ n.

Space Complexity

The bubble sort code does an in-place sort, meaning that only a small, fixed amount of additional memory is required and the original array is used to store the sorted result. Thus we say that the worst case space complexity is ~O(n), with only O(1) auxiliary memory. This is as efficient as it gets.

Other Languages

The multi-lingual reader will note that the above Java code works without change in C#. However, in C# we can make use of generics without taking much of a performance hit, unlike in Java where switching from value types (primitives) to reference types costs us. The code shown below places a generic constraint on the type parameter to ensure that it implements the generic IComparable<T> interface. This enables the same code to operate on both reference and value types, as long as they implement this interface - which all of the primitive types in C# do.

    public static void BubbleSort<T>(T[] input) where T : IComparable<T>
    {
        for (var i = input.Length - 1; i >= 0; i--)
        {
            var swaps = 0;
            for (var j = 0; j < i; j++)
            {
                if (input[j].CompareTo(input[j + 1]) > 0)
                {
                    T temp = input[j];
                    input[j] = input[j + 1];
                    input[j + 1] = temp;
                    swaps++;
                }
            }
            if (swaps == 0)
                return;
        }
    }

Of course, we can do a generic version in Java, but as mentioned earlier, the performance of reference types compared to primitives is not great:

    public static <T extends Comparable<T>> void bubbleSort(T[] input)
    {
        for (int i = input.length - 1; i >= 0; i--)
        {
            int swaps = 0;
            for (int j = 0; j < i; j++)
            {
                if (input[j].compareTo(input[j + 1]) > 0)
                {
                    T temp = input[j];
                    input[j] = input[j + 1];
                    input[j + 1] = temp;
                    swaps++;
                }
            }
            if (swaps == 0)
                return;
        }
    }

Selection Sort

The selection sort algorithm examines all of the unplaced items, picks out the minimum (or maximum) item, places it, then repeats this cycle until all items have been placed. To "place" the item we swap it with the (unplaced) item in the position that we are trying to fill. In Java, it looks like this:

    public static void selectionSort(int[] input)
    {
        for (int i = 0; i < input.length; i++)
        {
            // find the index of the smallest unplaced item
            int minIndex = i;
            for (int j = i + 1; j < input.length; j++)
            {
                if (input[j] < input[minIndex])
                    minIndex = j;
            }
            // place it by swapping it into position i
            if (minIndex != i)
            {
                int temp = input[i];
                input[i] = input[minIndex];
                input[minIndex] = temp;
            }
        }
    }

Or if you want the generic version in Java:

    public static <T extends Comparable<T>> void selectionSort(T[] input)
    {
        for (int i = 0; i < input.length; i++)
        {
            int minIndex = i;
            for (int j = i + 1; j < input.length; j++)
            {
                if (input[j].compareTo(input[minIndex]) < 0)
                    minIndex = j;
            }
            if (minIndex != i)
            {
                T temp = input[i];
                input[i] = input[minIndex];
                input[minIndex] = temp;
            }
        }
    }

And lastly in C#:

    public static void SelectionSort<T>(T[] input) where T : IComparable<T>
    {
        for (int i = 0; i < input.Length; i++)
        {
            int minIndex = i;
            for (int j = i + 1; j < input.Length; j++)
            {
                if (input[j].CompareTo(input[minIndex]) < 0)
                    minIndex = j;
            }
            if (minIndex != i)
            {
                T temp = input[i];
                input[i] = input[minIndex];
                input[minIndex] = temp;
            }
        }
    }


Run Time Complexity

As you can see, this uses nested loops, leading to a worst case run time complexity of ~O(n²). You cannot take any short cuts when you need to find the minimum (or maximum) item in a collection, hence the best and average running times are also ~O(n²), by our knowledge of arithmetic progressions: on the first pass you need to compare n-1 items to find the minimum/maximum, on the next pass n-2 items, then n-3, and so on. So this sort isn't any better, asymptotically, than the bubble sort, i.e. it's not particularly efficient either.

Space Complexity

Again the code does an in-place sort, implying that the worst case space complexity is ~O(n), with only O(1) auxiliary memory.

Insertion Sort

This approach works by inserting each item into an already-sorted sub-list. With care, it can be implemented in-place as follows (in Java):

    public static void insertionSort(int[] input)
    {
        for (int i = 1; i < input.length; i++)
        {
            int current = input[i];
            int j = i - 1;
            // shift larger items one place to the right...
            while (j >= 0 && current < input[j])
            {
                input[j + 1] = input[j];
                j--;
            }
            // ...then drop the current item into the gap
            input[j + 1] = current;
        }
    }

And if you prefer a generic Java variety:

    public static <T extends Comparable<T>> void insertionSort(T[] input)
    {
        for (int i = 1; i < input.length; i++)
        {
            T current = input[i];
            int j = i - 1;
            while (j >= 0 && (current.compareTo(input[j]) < 0))
            {
                input[j + 1] = input[j];
                j--;
            }
            input[j + 1] = current;
        }
    }

And for completeness, here it is in C#:

    public static void InsertionSort<T>(T[] input) where T : IComparable<T>
    {
        for (int i = 1; i < input.Length; i++)
        {
            T current = input[i];
            int j = i - 1;
            while (j >= 0 && (current.CompareTo(input[j]) < 0))
            {
                input[j + 1] = input[j];
                j--;
            }
            input[j + 1] = current;
        }
    }


Run Time and Space Complexity

This also has a quadratic worst case running time, O(n²). As is to be expected, the best case happens when the input is already sorted; in that case the inner while loop only ever does 1 comparison, which implies O(n) running time for the best case.

In this case, perhaps the hardest metric to identify is the average case. To derive the average case run time complexity, first consider that the outer loop must perform n-1 iterations. On the first iteration of the outer loop there is 1 item in the sorted sub-list (the index 0 item) that we compare the current item (the index 1 item) against. On the second iteration there are 2 items (the index 0 and 1 items) in the sorted sub-list that we compare to the current item (the index 2 item). Thus we can say that on pass j of the outer loop we will have j items in the sorted sub-list, and the number of comparisons required by the inner while loop could be anything in the set {1, 2, 3, ..., j}. The average number of comparisons for pass j is therefore:

    (1 + 2 + 3 + ... + j) / j = (j + 1) / 2

So the overall average number of comparisons, summed over the n-1 passes, is:

    (2/2) + (3/2) + ... + (n/2) = (n - 1)(n + 2) / 4  ≈  n²/4

Thus the average case run time is also quadratic, ~O(n²).

With respect to space complexity, again the analysis is pretty easy. Because it is an in-place sort, the worst case space complexity is ~O(n), with only O(1) auxiliary memory.

Although the insertion sort is efficient when the input is "almost sorted", for all practical purposes none of these sorting techniques is particularly good when the array/collection to sort is large. There is a family of algorithms that employ divide-and-conquer techniques and have significantly better run time complexity than those described above. More on those in Part 2...
