## Optimal File Arrangement Puzzle

**Puzzle:** Suppose you have *n* files that must be stored on a storage device that only supports sequential access, such as a tape drive. Because the device only supports sequential access, the time to read the *i*-th file on the device includes time needed to skip over all files stored before it. Assuming each file is equally likely to be needed how would you arrange the files on the device?

**Solution:** First let's define some notation. Let size[1.. *n*] be an array listing the file size of each file to be stored, so size[*i*] is the size of the* i*-th file. If the files are stored in order from 1 to *n*, then the cost of accessing the *k*-th file is the sum of accessing all previous files plus the * k*-th file:

Because each file is equally likely to be accessed, the *expected cost* for retrieving a file at random is:

But this is for a particular ordering (1..*n*). If we were to change the position of files around the cost of accessing some files would become more expensive and some others might become less expensive to access. To examine this let us denote a file storage permutation, **f**, whereby f(*i*) is the index of the file stored at position *i* in the ordering. Since the problem as stated asks us to identify the optimal **f **let's restate the expected cost of a permutation f.

We defined our problem formally, but by now it should be intuitive what the solution is: *store the files in order from the shortest to longest*. But how do we prove it? The answer, my friend, is *proof by contradiction*. Let's assume that our proposed solution is wrong. In that case we would have some file pair not in ascending order. That is, size[f(*i*)] > size[f(*i*+1)]. Now consider if we were to swap the position of these files. The cost of accessing f(*i*) would increase by size[f(*i*+1)], and the cost of accessing f(*i*+1) would decrease by size[f(*i*)]. Thus the net change to the expected cost for the permutation **f** would be (size[f(*i*+1)] - size[f(*i*)])/n. But since size[f(*i*)] > size[f(*i*+1)] the net change would be negative indicating an improvement. Hence, if any file pair is not in non-decreasing order we can improve the overall expected access time by swapping them. QED.

*13 Oct 2008*
*Damien Wintour*
0 comments