Uncategorized

## Optimal File Arrangement Puzzle

Puzzle: Suppose you have n files that must be stored on a storage device that only supports sequential access, such as a tape drive. Because the device only supports sequential access, the time to read the i-th file on the device includes time needed to skip over all files stored before it. Assuming each file is equally likely to be needed how would you arrange the files on the device?

Solution: First let's define some notation. Let size[1.. n] be an array listing the file size of each file to be stored, so size[i] is the size of the i-th file. If the files are stored in order from 1 to n, then the cost of accessing the k-th file is the sum of accessing all previous files plus the k-th file:

Because each file is equally likely to be accessed, the expected cost for retrieving a file at random is:

But this is for a particular ordering (1..n). If we were to change the position of files around the cost of accessing some files would become more expensive and some others might become less expensive to access. To examine this let us denote a file storage permutation, f, whereby f(i) is the index of the file stored at position i in the ordering. Since the problem as stated asks us to identify the optimal f let's restate the expected cost of a permutation f.

We defined our problem formally, but by now it should be intuitive what the solution is: store the files in order from the shortest to longest. But how do we prove it? The answer, my friend, is proof by contradiction. Let's assume that our proposed solution is wrong. In that case we would have some file pair not in ascending order. That is, size[f(i)] > size[f(i+1)]. Now consider if we were to swap the position of these files. The cost of accessing f(i) would increase by size[f(i+1)], and the cost of accessing f(i+1) would decrease by size[f(i)]. Thus the net change to the expected cost for the permutation f would be (size[f(i+1)] - size[f(i)])/n. But since size[f(i)] > size[f(i+1)] the net change would be negative indicating an improvement. Hence, if any file pair is not in non-decreasing order we can improve the overall expected access time by swapping them. QED.

Following on from the Minimum Scalar Product problem in Round 1A, I next attempted the Milkshakes problem. This problem initially had me perplexed. You could solve the problem using brute force by generating a list of all possible permutations of the N flavours (malted or unmalted) and then assessing if any of these candidate solutions from the solution space was feasible, and from all the feasible solutions pick the one that had the least number of malted batches made. That would require evaluating 2N candidate solutions which is NOT computationally practical given the range for N (1 < N < 2000).

I must admit I did check through a few reference books to get ideas on this one. From my studies in Operations Research I had a suspicion that this was similar to a boolean satisfiability problem and found reference to it in  this book on page 38 and 48. That told me that Cooke's Theorem had proved this to be NP-Complete. I know Google gives out challenging problems but I very much doubt they'd be evil enough to give out an NP-Complete problem to solve! Hence I kept looking for a solution. The problem was subtlety different from that problem in that each customer could have at most 1 malted flavour preference. That's the opening we need...

Intuition should tell us that we can improve drastically on the brute-force approach by adopting a greedy-like algorithm whereby we start with the best case scenario - no malted flavour batches made - and assess if this candidate solution is feasible (in that it satisfies all of the customers flavour preferences). If it is feasible then we know it will be optimal. If it is not feasible we need to amend our candidate solution such that it is closer to being feasible, and then start the process over.

In summary, our approach to solving the milkshakes problem is as follows:

1. Set first candidate solution to be all flavours made as unmalted
2. Check if current candidate solution is feasible. If it is then exit. The current solution will be optimal.
3. If current candidate solution is not feasible, and the unsatisfied customer has a malted flavour preference then change the candidate solution to make that flavour as malted rather than unmalted, and return to Step 2. NB: it should be noted that the customer who caused us to amend our candidate solution is now satisfied and can be removed from the list of customers to check on subsequent passes.
4. If current candidate solution is not feasible, and the unsatisfied customer has no malted flavour preference then it means that they had unmalted preferences that we were forced to make as malted to satisfy other customers. This implies that we cannot change our current candidate solution in such a way so as to satisfy the customer and hence no feasible solution exists. In this case exit and mark the problem as "Impossible".

Here's my C# code to do that:

```using System;
using System.Collections.Generic;
using System.IO;

{
class Program
{
public class Customer
{
private List<int> _likesUnmalted;   // a list of the customers preferences
private int? _likesMalted = null;   // the index of the malted flavour they list

public Customer(List<int> likesIndices, int? maltedIndex)
{
_likesUnmalted = likesIndices;
_likesMalted = maltedIndex;
}

public int? LikesMaltedIndex
{
get {return _likesMalted;}

}

public bool IsSatisfied(bool[] batchMaltedFlags)
{
if (this.LikesMaltedIndex != null && (batchMaltedFlags[(this.LikesMaltedIndex ?? 0) - 1]))
{
return true;
}

foreach (int i in _likesUnmalted)
{
if (!batchMaltedFlags[i-1])
{
return true;
}
}
return false;
}
}

static void Main()
{
StreamWriter outfile = new System.IO.StreamWriter("output.txt");

for (int caseNo = 1; caseNo <= testCases; caseNo++) // note this is 1-based
{
bool IsImpossible = false;

bool[] isBatchMalted = new bool[flavours];

List<Customer> customers = new List<Customer>();
for (int i = 0; i < custCount; i++ )
{
string[] data=(infile.ReadLine()).Split(new char[] { ' ' }, StringSplitOptions.None);
int flavoursLiked = Int32.Parse(data[0]); // number of flavours liked

List<int> likes = new List<int>();
int? likesMaltedIndex = null;

for (int j = 0; j < flavoursLiked; j++)
{
if (Int32.Parse(data[j * 2 + 2]) == 1)
{
likesMaltedIndex = Int32.Parse(data[j * 2 + 1]);
}
else
{
}
}
}

// for initial solution assume we only make unmalted flavours
// and check for feasibility
for (int i = 0; i < isBatchMalted.Length; i++ )
{
isBatchMalted[i] = false;
}

do
{
foreach (Customer c in customers.ToArray())
{
if (!c.IsSatisfied(isBatchMalted))   // this customer isn't happy
{
if (c.LikesMaltedIndex == null)
{
// they must be unsatisfied because none of their prefered
// MALTED flavours have been made
IsImpossible = true;
break;
}
else
{
// they have a malted-preference so we MUST make that flavour
// as a malt rather than an unmalted

isBatchMalted[(c.LikesMaltedIndex ?? 0) - 1] = true;

// since this customer is now satisfied we can remove them
customers.Remove(c);
}
}
}

// write out the results
if (IsImpossible)
{
outfile.WriteLine(String.Format("Case #{0}: IMPOSSIBLE", caseNo));
//outfile.WriteLine();
}
else
{
outfile.Write(String.Format("Case #{0}:", caseNo));
foreach (bool b in isBatchMalted)
{
outfile.Write(" " + (b ? 1 : 0));
}
outfile.WriteLine();
}
}

infile.Close();
outfile.Flush();
outfile.Close();
}
}
}```