Google Code Jam 2009: Welcome Substrings
During the qualification phase you only need to solve one of the three puzzles. The first of these puzzles, Alien Languages, was pretty easy making qualification more of a formality. Regardless, I couldn't resist tackling the other problems on offer including the Welcome to Code Jam puzzle which asks that you count the number of ways of finding the string literal "welcome to code jam" in a given text. There are a couple of catches that make it more interesting:
- you need to search for sub-sequences, meaning that the characters need not be consecutive - they can have other characters interleaved; and
- only the last 4 digits of the results need to be given. i.e. mod 10,000.
In solving this, firstly we note that there are 11 distinct characters in our target string,and there are up to 500 total characters in the search string. Along the same lines as that used in Alien Languages I first built an index of the characters of interest - the 11 distinct chars in the target string. In C# I did this as a list of lists using a fixed ordering because doing so enables quick retrieval of the indexes of the character of interest in the search string. Obviously, the cost of doing this pre-processing is O(n) if the search string is n-characters long which is feasible since n<500 in the worst case.
After pre-processing you can walk backwards through the target string keeping count of the possible ways to get the current letter. That is, find all occurrences of the character "m" (the pre-prepared indices make this easier), give these a count of 1 and keep the index of occurrence. Then pass the results of the "m" search to a similar search for all occurrences of "a" that appear to the left of the "m" in question. The count for the "a" search would be the sum of the counts for all the "m"s to the right of the current "a". This process continues all the way through the target string with the final result being given by the sum of all the counts that result from the "w" search. Effectively this is a dynamic programming approach using iteration rather than recursion. You could also use graph theory to solve it.
The C# code is shown below:
using System; using System.Collections.Generic; using System.IO; namespace GoogleCodeJam.WelcomeToCodeJam { class Program { static void Main() { const string basePath = @"D:\"; var infile = new StreamReader(basePath + "large.txt"); var outfile = new StreamWriter(basePath + "output.txt"); const string target = "welcome to code jam"; const string distinct = "welcom tdja"; int N = Int32.Parse(infile.ReadLine()); for (int caseNo = 1; caseNo <= N; caseNo++) { string input = infile.ReadLine().Trim().ToLower(); //index the characters of interest var indicies = new List<List<int>>(); for (int i = 0; i < distinct.Length; i++) indicies.Add(new List<int>()); for(int i=0; i<input.Length; i++) { int index = distinct.IndexOf(input[i]); if (index >=0) indicies[index].Add(i); } var possible = new List<Pair>(); foreach (var item in indicies[distinct.IndexOf(target[target.Length-1])]) possible.Add(new Pair(item, 1)); // move through the chars right to left for (int i = target.Length-2; i >=0; i--) { var newpossible = new List<Pair>(); foreach (var j in indicies[distinct.IndexOf(target[i])]) { ulong count = 0; foreach(var p in possible) if (j < p.charindex) count += p.count % 10000; newpossible.Add(new Pair(j, count)); } possible = newpossible; } ulong answer = 0; foreach (var p in possible) answer += p.count%10000; outfile.WriteLine(String.Format("Case #{0}: {1}", caseNo, (answer % 10000).ToString("0000"))); } infile.Close(); outfile.Close(); } } class Pair { public int charindex {get;set;} public ulong count { get; set;} public Pair(int c, ulong num) { charindex = c; count = num; } } }
13 Sep 2009 Damien Wintour







