Archive for September, 2009

Uncategorized

Google Code Jam 2009: All Your Bases

Google Code JamThe first puzzle in Round 1C was the All Your Bases puzzle. This puzzle requires you to examine a given string literal and return the lowest possible numeric value that can be made from it. The string is a character pattern that represents a number but we don't know what base it is in. The obvious way to solve this puzzle is to:

  1. Count the number of distinct characters in the string. This must be the base to use that produces the lowest possible number. To see this consider a 4 digit number "wxyz". If this is in base b, then the value of wxyz is: w.b3+x.b2+y.b1+z.b0. Since we have 4 distinct digits the base, b >= 4. If we go with b=4 then the value resulting from the above expression is clearly going to be less than the value resulting from any higher value of b. One gotcha: if the count=1 then you need to bump it up to 2 since we are explicitly told that unary (base-1) is not used.

  2. For the baseToUse found above, create a sequence of digits from that base in increasing order. Since all bases have a 0, but we are told no numbers in the test data include leading zeros, we must start this sequence with 1, then 0, then 2, 3, 4... up to (baseToUse-1).

  3. Starting at the left of the string (the problem doesn't specify if the aliens use left-right or right-left notation but the sample problem implies left-right is used. Given the nature of the problem - finding out the earliest possible start time for the war - this is a rather dangerous assumption and so I pointed it out to Google. See the comment at the bottom) we assign the first unused digit from the list constructed above for each character represented by the char in the position being examined.

  4. Since you know the baseToUse, and the digits in each position it is easy to calculate the base10 equivalent of the number created. One gotcha here: the numbers are going to get very large so don't get caught out using floating point functions that silently lose precision when they get too big (that issue has bitten me more often than I care to admit). Use an integer type all the way.

UPDATE: the official analysis from the first round of CodeJam is out and the problem creator, Bartholomew Furrow of Google, acknowledged my observation that perhaps this problem makes a somewhat dangerous assumption. That said, I am envious of his terse solution in Python. Make my C# code look really verbose. Note to self: go learn Python TOMORROW.

The C# code is shown below:

using System;
using System.Collections.Generic;
using System.IO;

namespace GoogleCodeJam.AllYourBases
{
    class Program
    {
        static void Main()
        {
            string basePath = @"D:\";
            var infile = new StreamReader(basePath + "large.txt");
            var outfile = new StreamWriter(basePath + "output.txt");

            int testCases = Int32.Parse(infile.ReadLine());
            for (int caseNo = 1; caseNo <= testCases; caseNo++)
            {
                string message = infile.ReadLine().Trim();
                ulong minSeconds = UInt64.MaxValue;
                int startBase = CountDistinct(message);
                if (startBase == 1)
                    startBase++;

                minSeconds = message.Length ==1 ? 1 : FindMinimum(startBase, message);

                outfile.WriteLine(String.Format("Case #{0}: {1}", caseNo, minSeconds));
            }
            infile.Close();
            outfile.Close();
        }

        //returns a base10 number
        static ulong FindMinimum(int baseUsed, string message)
        {
            message = message.Trim();
            var digits = new List<int> {1, 0};
            for (int d = 2; d < baseUsed; d++)
                digits.Add(d);

            ulong ans = 0;
            foreach (var c in GetDistinctInOrder(message))
            {
                int index = message.IndexOf(c);
                while (index >=0)
                {
                    ans += (ulong)digits[0] * Power( baseUsed, message.Length - (index + 1));
                    index = message.IndexOf(c,index+1);
                }
                digits.RemoveAt(0);
            }
            return ans;
        }

        static ulong Power(int baseUsed, int exponent)
        {
            ulong answer = 1;
            for(int i=0; i<exponent; i++)
                answer *= (ulong)baseUsed;

            return answer;
        }

        static List<char> GetDistinctInOrder(string input)
        {
            input = input.Trim();
            var seen = new List<char>();
            for (int i = 0; i < input.Length; i++)
            {
                if (!seen.Contains(input[i]))
                    seen.Add(input[i]);
            }
            return seen;
        }

        static int CountDistinct(string input)
        {
            return GetDistinctInOrder(input).Count;
        }
    }
}

And here's my unpolished C++ code to do the same:

#include <string>
#include <iostream>

using namespace std;

char seen[256];
char seenOrder[256];

long long Power(int baseUsed, int exponent)
{
    long long answer = 1;
    for(int i=0; i<exponent; i++)
        answer *= baseUsed;
    return answer;
}

//returns a base10 number
long long findMinimum(int baseUsed, string* msg)
{
    int digits[36];
    memset(digits,0,sizeof(digits));
    digits[0]=1;
    digits[1]=0;
    for (int d = 2; d < baseUsed; d++)
        digits[d]=d;

    long long ans = 0;
    for(int i=0; i < baseUsed; i++)
    {
        char c = seenOrder[i];
        int index = msg->find(c);
        while (index != string::npos)
        {
            ans += digits[i] * Power( baseUsed, msg->length() - (index + 1));
            index = msg->find(c,index+1);
        }
    }
    return ans;
}

void main()
{
    string basePath = "";
    string inFile = basePath + "large.txt";
    string outFile = basePath + "output.txt";

    freopen(inFile.c_str() , "r", stdin);
    freopen(outFile.c_str(), "w", stdout);

    int testCases;
    cin >> testCases;

    for (int caseNo = 1; caseNo <= testCases; caseNo++)
    {
        long long minSeconds = 0;

        string message;
        cin >> message;

        memset(seen,0,sizeof(seen));
        memset(seenOrder,0,sizeof(seenOrder));
        int startBase=0;
        for (int i = 0; i < message.length(); i++)
        {
            char c = message[i];
            if (seen[c] == 0) //not seen before
            {
                seen[c] = 1;
                seenOrder[startBase] = c;
                startBase++;
            }
        }
        if (startBase == 1) startBase++;

        minSeconds = (message.length() == 1) ? 1 : findMinimum(startBase, &message);

        printf("Case #%d: %lld\n", caseNo, minSeconds);
    }
}

Uncategorized

Google Code Jam 2009: Welcome Substrings

Google Code JamDuring the qualification phase you only need to solve one of the three puzzles. The first of these puzzles, Alien Languages, was pretty easy making qualification more of a formality. Regardless, I couldn't resist tackling the other problems on offer including the Welcome to Code Jam puzzle which asks that you count the number of ways of finding the string literal "welcome to code jam" in a given text. There are a couple of catches that make it more interesting:

  1. you need to search for sub-sequences, meaning that the characters need not be consecutive - they can have other characters interleaved; and
  2. only the last 4 digits of the results need to be given. i.e. mod 10,000.

In solving this, firstly we note that there are 11 distinct characters in our target string,and there are up to 500 total characters in the search string. Along the same lines as that used in Alien Languages I first built an index of the characters of interest - the 11 distinct chars in the target string. In C# I did this as a list of lists using a fixed ordering because doing so enables quick retrieval of the indexes of the character of interest in the search string. Obviously, the cost of doing this pre-processing is O(n) if the search string is n-characters long which is feasible since n<500 in the worst case.

After pre-processing you can walk backwards through the target string keeping count of the possible ways to get the current letter. That is, find all occurrences of the character "m" (the pre-prepared indices make this easier), give these a count of 1 and keep the index of occurrence. Then pass the results of the "m" search to a similar search for all occurrences of "a" that appear to the left of the "m" in question. The count for the "a" search would be the sum of the counts for all the "m"s to the right of the current "a". This process continues all the way through the target string with the final result being given by the sum of all the counts that result from the "w" search. Effectively this is a dynamic programming approach using iteration rather than recursion. You could also use graph theory to solve it.

The C# code is shown below:

using System;
using System.Collections.Generic;
using System.IO;

namespace GoogleCodeJam.WelcomeToCodeJam
{
    class Program
    {
        static void Main()
        {
            const string basePath = @"D:\";
            var infile = new StreamReader(basePath + "large.txt");
            var outfile = new StreamWriter(basePath + "output.txt");

            const string target = "welcome to code jam";
            const string distinct = "welcom tdja";

            int N = Int32.Parse(infile.ReadLine());
            for (int caseNo = 1; caseNo <= N; caseNo++)
            {
                string input = infile.ReadLine().Trim().ToLower();

                //index the characters of interest
                var indicies = new List<List<int>>();
                for (int i = 0; i < distinct.Length; i++)
                    indicies.Add(new List<int>());

                for(int i=0; i<input.Length; i++)
                {
                    int index = distinct.IndexOf(input[i]);
                    if (index >=0)
                        indicies[index].Add(i);
                }

                var possible = new List<Pair>();
                foreach (var item in indicies[distinct.IndexOf(target[target.Length-1])])
                    possible.Add(new Pair(item, 1));

                // move through the chars right to left
                for (int i = target.Length-2; i >=0; i--)
                {
                    var newpossible = new List<Pair>();
                    foreach (var j in indicies[distinct.IndexOf(target[i])])
                    {
                        ulong count = 0;
                        foreach(var p in possible)
                            if (j < p.charindex)
                                count += p.count % 10000;

                        newpossible.Add(new Pair(j, count));
                    }
                    possible = newpossible;
                }
                ulong answer = 0;
                foreach (var p in possible)
                    answer += p.count%10000;

                outfile.WriteLine(String.Format("Case #{0}: {1}", caseNo, (answer % 10000).ToString("0000")));
            }
            infile.Close();
            outfile.Close();
        }
   }
    class Pair
    {
        public int charindex {get;set;}
        public ulong count { get; set;}

        public Pair(int c, ulong num)
        {
            charindex = c;
            count = num;
        }
    }
}

Uncategorized

Google Code Jam 2009: Alien Language

Google Code JamThis years' CodeJam qualification round started with a problem called Alien Language which is essentially a pattern matching exercise. The alien language has D words, each of length L, with all of these known in advance. You are given a bunch of regex-like word patterns and you need to report on the number of matches that each word pattern can produce.

The approach I took was to build a list of indices of the possible words that match the given pattern and prune this list as we progress through the pattern from left to right. For each token in the pattern we only need to look for matches in the list of remaining possible word indexes, as this gets more selective as we go. Although unnecessary in this case, a further optimization would be to start with the tokens that consist of only a single letter as they are likely to be more restrictive. It's the same sort of approach a database engine would take when evaluating a multi-clause where predicate (do the one that is likely to be the most restrictive first). The worst case run time complexity of this approach is O(D.L) which is good enough when D=5000 and L=15.

Alternately, if you are handy with regex syntax and are confident your language has an efficient regex implementation you can resort to using that.

The C# code is shown below:

using System;
using System.Collections.Generic;
using System.IO;

namespace GoogleCodeJam.AlienLanguage
{
    class Program
    {
        static void Main()
        {
            const string basePath = @"D:\";
            var infile = new StreamReader(basePath + "large.txt");
            var outfile = new StreamWriter(basePath + "output.txt");

            string[] input = (infile.ReadLine()).Split(new[] { ' ' }, StringSplitOptions.None);
            int L = Int32.Parse(input[0]);  // length of each word
            int D = Int32.Parse(input[1]);  // number of words
            int N = Int32.Parse(input[2]);  // number of test cases

            var words = new List<string>();
            for (int i = 0; i < D; i++)
                words.Add(infile.ReadLine().Trim());

            for (int caseNo = 1; caseNo <= N; caseNo++)
            {
                var wordIndicies = new List<int>();
                var removalList = new List<int>();
                for (int i = 0; i < D; i++)
                    wordIndicies.Add(i);

                var tokens = GetTokens(infile.ReadLine());
                for (int i = 0; i < L; i++)
                {
                    foreach (var j in wordIndicies)
                        if (!tokens[i].Contains(words[j].Substring(i, 1)))
                            removalList.Add(j);
                    foreach (var k in removalList)
                        wordIndicies.Remove(k);
                }
                outfile.WriteLine(String.Format("Case #{0}: {1}", caseNo, wordIndicies.Count));
            }
            infile.Close();
            outfile.Close();
        }

        public static List<string> GetTokens(string input)
        {
            var results = new List<string>();
            bool inToken = false;
            string currentToken = "";
            foreach(char c in input)
            {
                if (c == '(')
                {
                    currentToken = "";
                    inToken = true;
                }
                else if (c == ')')
                {
                    results.Add(currentToken);
                    inToken = false;
                }
                else
                {
                    if (inToken)
                        currentToken = currentToken + c;
                    else
                        results.Add(c.ToString());
                }
            }
            return results;
        }
    }
}

For completeness I also hacked together a solution in C++. Working with lists and the STL is somewhat painful so I've resorted to inelegant pre-allocated arrays (yuck!) and a large lookup map for the token matching, but hey, it solves the problem.

#include <string>
#include <list>
#include <iostream>

#define maxL 15
#define maxD 5000

using namespace std;

int L,D,N;
char words[maxD][maxL];
char tokenMap[maxL][26];
char input[28*maxL];

/* Builds a boolean matrix of flags indicating if the i-th token in the 
   pattern matches the j-th lowercase alphabet character */
void parseTokens()
{
    memset(tokenMap, 0, sizeof(tokenMap));
    int i,j = 0;
    for (i = 0; i < L; i++) {
        if (islower(input[j])) {
            tokenMap[i][input[j] - 'a'] = 1;
            j++;
        }
        else {
            j++;
            while(input[j] != ')') {
                tokenMap[i][input[j] - 'a'] = 1;
                j++;
            }
            j++;
        }
    }
}

void main()
{
    string basePath = "";
    string inFile = basePath + "large.txt";
    string outFile = basePath + "output.txt";

    freopen(inFile.c_str(),"r",stdin);
    freopen(outFile.c_str(),"w",stdout);

    cin >> L >> D >> N;

    for (int i=0; i<D; i++)
        cin >> words[i];

    for (int caseNo = 1; caseNo <= N; caseNo++)
    {
        cin >> input;
        parseTokens();

        int j,ans=0;
        for(int i=0; i<D; i++)    //for each word in the dictionary
        {
            for (j=0; j<L; j++)    //foreach char of the current word
            {
                if (!tokenMap[j][words[i][j] - 'a'])
                    break;
            }
            if (j == L) ans++;
        }
         cout << "Case #" << caseNo << ": " << ans <<endl;
    }
}

« Prev