Sum of an Arithmetic Progression
An arithmetic progression is a sequence of numbers where the difference between 2 successive numbers is constant. An example is: 1,2,3,4,5,6,7,8... This sequence is increasing by one each time. So, the more general definition for the ith term in an arithmetic progression, where d is the constant difference, and a_{0} is the first term in the sequence, is given by the formula:
There is a formula to find the sum of the first n terms but you need not remember it since you can derive it on the spot with common sense. Assuming the simple case where d=1, if you pick the first and last element in the sequence and sum them you get n+1. Ignoring these 2 items from the list, if you do the same thing again you get 2+(n1)=n+1, the same result. And if you do it again, you get the same result again: 3 + (n2) = n+1. So a pattern emerges  selecting and removing the first and last element in the sequence and summing them gives (n+1) so the sum of all of these must be that number, n+1, multiplied by the number of pairs you can extract from the sequence, which is in fact n/2. Hence the sum of the first n terms in this particular sequence is (n+1)*n/2. A similar process can be applied regardless of the value of d.
Another way to visualise the result is to write down the sequence of numbers, then just below it write down that same sequence but in reverse order. Now sum each vertical pair and you'll see the same pattern: n/2 lots of (n+1).
It's amazing how many mathematical puzzles and technical interview questions reduce to this simple summation! Don't believe me  well here's a sample of puzzles/questions that are all easily solved if you know the sum of an arithmetic progression:
 In a sequential search, what is the average number of comparisons it takes to search through n elements?
 Given 100 boxes each of which, except one, contains 100 x 10g weights, whilst the remaining box has 100 x 9g weights in it. You need to identify the box that is lighter than the others but you can only use the scales once. [Thanks to Phil Lancaster, a work colleague of mine for throwing this devious puzzle at me one day.]
 Prove that the worst case run time for a bubble sort is O(n^{2})
 You have an array of size n1 which is suppose to contain all the integers between 1 and n, without duplicates, except there is a missing number. You need to find the missing number using a constantspace algorithm
 You have an array of size n2 which is suppose to contain all the integers between 1 and n, without duplicates, except two numbers are missing. You need to find the missing numbers using a constantspace algorithm
Answers

If we assume that the answer to our search could be in any position from 1 to n, all with equal likelihood (i.e. a Uniform distribution, aka the distribution of maximum ignorance), then the average number of comparisons is equal to the sum of all possibilities divided by n. That is, the sum of the arithmetic progression above divided by n, which is (n+1)/2

Enumerative combinatorics tells us that we'd need, at most, log_{2}(n) weightings to identify which box has the lighter blocks  divide all remaining boxes into 2 halves, weigh one half of the boxes as a total unit, pick the lighter half, and repeat the process  but we are allowed only one! So consider redistributing the weights as follows: take all the weights out marking them with the box number they came from. Then, using one empty box, put 1 block from box1, 2 blocks from box2, 3 blocks from box3, ..., n1 blocks from box(n1) and n blocks from box(n). From our knowledge of the sum of an arithmetic progression we know what the total weight of the box, if all the blocks were 10g, would be 100/2*(100+1)*10g = 5050g. Now, box(j) contributes j blocks each of which is 10 grams for a total contribution of 10.j grams. If box(j) is the lighter box then it will only contribute 9.j grams, a difference of exactly j grams. Blocks from all other box will not deviate from there expected weights since only 1 box originally had lighter blocks. Given this, it follows that the weight difference between actual weight of the specially prepared box and theoretical total weight using our formula always gives the precise box number of the original light box.

The bubble sort uses pairwise comparisons to sort lists. The worst case for the bubble sort is when the list is in strictly descending order when you are trying to put it into ascending order. In this case the algorithm compares the first element to (n1) other elements, and then compares the second element to (n2) elements to it's right, and then compares the third element to (n3) elements to it's right, etc, etc. So in total the number of comparisons is (n1) + (n2) + (n3) + (n4) + ...+ 2 + 1 = n/2(n1). Expanding this out to polynomial form it has it's highest power of n being 2, hence complexity theory classes this as O(n^{2}).

This seems simple at first  just iterate through the list updating an array to indicate which items you have, but this approach does not meet the constant space limitation imposed in the wording of the question. It would require O(n) space. i.e the space required grows linearly with the size of the list.
But if we allocate some space to store a running total which we update as we iterate through the list, once we reach the end of the list we can compare the running total with the theoretical total that we can calculate using, you guessed it, the sum of an arithmetic progression we can easily identify the missing integer. This approach has constant space complexity.

This problem is an extension of the previous one. In the previous case we had a single equation with a single unknown that was easily solved. Now we have a single equation with 2 unknown numbers. If you remember anything about solving simultaneous equations you'll know this is not a good situation. However, once again we can make use of some simple algebra to help out. Instead of summing all the terms we can calculate the product of them. The theoretical value for this, n!, is known assuming n isn't too large, so we can compare the calculated value to the theoretical value and now we have 2 equations which we can use to solve for the 2 unknowns, and we've used only constant space complexity to do it. Viola!
26 Mar 2009 Damien Wintour 2 comments