Archive for October, 2008

Uncategorized

Optimal File Arrangement Puzzle

Puzzle: Suppose you have n files that must be stored on a storage device that only supports sequential access, such as a tape drive. Because the device only supports sequential access, the time to read the i-th file on the device includes time needed to skip over all files stored before it. Assuming each file is equally likely to be needed how would you arrange the files on the device?

Solution: First let's define some notation. Let size[1.. n] be an array listing the file size of each file to be stored, so size[i] is the size of the i-th file. If the files are stored in order from 1 to n, then the cost of accessing the k-th file is the sum of accessing all previous files plus the k-th file:

Because each file is equally likely to be accessed, the expected cost for retrieving a file at random is:

But this is for a particular ordering (1..n). If we were to change the position of files around the cost of accessing some files would become more expensive and some others might become less expensive to access. To examine this let us denote a file storage permutation, f, whereby f(i) is the index of the file stored at position i in the ordering. Since the problem as stated asks us to identify the optimal f let's restate the expected cost of a permutation f.

We defined our problem formally, but by now it should be intuitive what the solution is: store the files in order from the shortest to longest. But how do we prove it? The answer, my friend, is proof by contradiction. Let's assume that our proposed solution is wrong. In that case we would have some file pair not in ascending order. That is, size[f(i)] > size[f(i+1)]. Now consider if we were to swap the position of these files. The cost of accessing f(i) would increase by size[f(i+1)], and the cost of accessing f(i+1) would decrease by size[f(i)]. Thus the net change to the expected cost for the permutation f would be (size[f(i+1)] - size[f(i)])/n. But since size[f(i)] > size[f(i+1)] the net change would be negative indicating an improvement. Hence, if any file pair is not in non-decreasing order we can improve the overall expected access time by swapping them. QED.

Uncategorized

Birthday Paradox # 1

Problem: There are 37 students in a classroom. If it was paying even money, would you bet that at least 4 of them would have their birthday in the same month?

Source: Adapted from a puzzle on Thoughts of a Rambler.

Solution:

Firstly, assume the worst case (from your betting perspective) where all the students have birthdays in different months. This is your worst case scenario because it reduces the changes of one month having a cluster of4 or more students. But there are only 12 months in the year so the most evenly distributed assignment of students to birthday months is 3 in each plus 4 in one month, given 3x11 + 4 = 37. In other words, there is no possible distribution of 37 students to 12 months which does not produce at least 1 month which has 4 or more students assigned to it. Given that you cannot lose on this bet - so bet the house!!! Remember - it's not gambling if you know the odds are in your favour!