## Google Code Jam 2009: Collecting Cards

The last problem in Round 1A was the "Collecting Cards" problem which is a simply-stated problem but requires some understanding of combinatorics to solve. Basically, there are **C **distinct cards in a collection and you need to figure out how many packs of **N **distinct cards you need to buy in order to have the complete set assuming you started with none.

Firstly, some basic observations:

- Since order is irrelevant, we are dealing with
*combinations*not*permutations*. - Combinatorics tells us that there are
^{C}C_{N}distinct combinations, all of which are equally likely. - Since we start with nothing we must always buy at least 1 pack.
- The theoretical minimum number of packs that we would need would only occur when every pack we bought had completely new cards in it (rather unlikely). That is, absolute lower limit is:

- We need to account for the likelihood that in packs purchased we are going to see an percentage of cards we already have. Dealing with this is the crux of the problem.

**Background on Combinations**

Note that ^{x}C_{y} can also be written as:

and, choosing *y* items from a population of *x* items is equivalent to choosing *x-y* items. That is:

Clearly there is only 1 combination of *x*items in a population of *x*, and by the above rule there is only 1 combination of zero items (that confuses many people!). That is:

**Problem Solution**

To solve this problem let us **define f(k) be the expected number of packs we need to buy if we still need k cards** (i.e we already have C-k distinct cards). To solve the problem we just need to find f(C). It is also obvious that f(0)=0, and f(C) = 1 + f(C-N) since we are guaranteed to get N distinct

*new*cards when we purchase the first pack. What we now need is a generic formula for f(k) that we can use in a recursive fashion.

When we need *k* (distinct) cards, it follows that we must already have (C-k) distinct cards. Thus purchasing an additional pack of N cards can result in 0, 1, 2, 3, ..., min(N, k) new cards being collected. If we **let p _{j,m} be the probability of getting j new cards from an additional pack of N randomly-drawn cards, given that we already have m distinct cards**, then we can say:

To calculate p_{j,m} we simply need to figure out how many packs would have *j * cards that we don't already have and (N-j) cards that we do have, divided by the total number of possible pack combinations. In other words:

To efficiently compute the combinations we make use of the well-known property of Pascal's Triangle - the

[i,j] coefficient is the sum of the 2 co-efficients "above" it. This recursive definition can be stated formally as,

Since we know that N<=C<=40 we can go ahead and pre-compute these storing them in a 2-dimensional array.

Substituting for p_{j,C-k} in our formula for f(k) yields:

But there is a problem here. If you use the above formula as is you'll hit a stack overflow as I did (dohh!) because you're creating an infinite loop due to f(k) being on both sides of the equation. To remedy this we need to revise our equation as follows (we took out the j=0 case in the summation and ensured that f(k) only appears on the LHS):

Holly cat fish , Batman! Who would have though collecting cards was so complicated

Here's the C# code to implement the above ~~mess~~, errr equation:

using System; using System.IO; namespace CollectingCards { class Program { private static readonly ulong[,] choose = new ulong[41,41]; private static double?[] solved; private static int C,N; static void Main() { const string basePath = ""; var infile = new StreamReader(basePath + "small.txt"); var outfile = new StreamWriter(basePath + "output.txt"); for (int i=0; i<41; i++) { choose[i, i] = 1; choose[i, 0] = 1; for (int j=1; j<i; j++) choose[i,j] = choose[i-1,j] + choose[i-1,j-1]; } int testCases = Int32.Parse(infile.ReadLine()); for (int caseNo = 1; caseNo <= testCases; caseNo++) { string[] data = (infile.ReadLine()).Split(' '); C = Int32.Parse(data[0]); N = Int32.Parse(data[1]); solved = new double?[41]; outfile.WriteLine("Case #{0}: {1:#.00000}", caseNo, Solve(C)); } infile.Close(); outfile.Close(); } static double Solve(int k) { if (k == 0) return 0; if (solved[k].HasValue) return solved[k].Value; double factor = 1.0/(1.0-(double)choose[C-k,N]/choose[C,N]); double ans = 1; for (int j = 1; j <= Math.Min(N, k); j++) { if (k - j == 0) continue; double p = (double)(choose[k, j] * choose[C - k, N - j]) / choose[C, N]; if (p != 0) ans += p*Solve(k - j); } ans *= factor; solved[k] = ans; return ans; } } }

And here's some **C++** code to do the same:

#include <iostream> #include <string> using namespace std; long long choose[41][41]; int C; int N; double solved[41]; double Solve(int k) { if (solved[k] != -1) return solved[k]; double factor = 1.0/(1.0-(double)choose[C-k][N]/choose[C][N]); double ans = 1; int minVal = min(k,N); for (int j = 1; j <= minVal; j++) { if (k - j == 0) continue; double p = (double)(choose[k][j] * choose[C - k][N - j]) / choose[C][N]; if (p != 0) ans += p*Solve(k - j); } ans *= factor; solved[k] = ans; return ans; } void main() { string basePath = ""; string inFile = basePath + "large.txt"; string outFile = basePath + "output.txt"; freopen(inFile.c_str(),"r",stdin); freopen(outFile.c_str(),"w",stdout); for (int i=0; i<41; i++) { choose[i][i] = 1; choose[i][0] = 1; for (int j=1; j<i; j++) choose[i][j] = choose[i-1][j] + choose[i-1][j-1]; } int testCases; cin >> testCases; for (int caseNo = 1; caseNo <= testCases; caseNo++) { cin >> C >> N; for(int i=0; i<41;i++) solved[i]=-1; cout << "Case #" << caseNo << ": " << Solve(C) << endl; } }

*13 Sep 2009*
*Damien Wintour*
0 comments