Uncategorized

Puzzle: The Reckless Skateboarder

Problem: An old lady was going to a street market when a reckless kid on a skateboard bumped into her and made her drop a basket full of painted porcelain figurines breaking all of them into many small pieces. The kid's father saw the whole thing and offered to pay for all the damaged items so he asked her how many figurines she had bought. The old lady couldn't remember the exact number, but she remembered that when she had taken them out two at a time, there was one figurine left. The same happened when she picked them out three, four, five, and six at a time, but when she took them seven at a time they came out even. What is the smallest number of figurines she could have had?

Solution: First, let's formulate this using mathematical notation. Let x = the number of porcelain figurines the old lady originally had. Clearly, x is a positive integer and we also know that:


x % 2 = 1
x % 3 = 1
x % 4 = 1
x % 5 = 1
x % 6 = 1
x % 7 = 0

where % is the modulus operator. These equations can also be written as congruence relationships:


x = 1 (mod 2)
x = 1 (mod 3)
x = 1 (mod 4)
x = 1 (mod 5)
x = 1 (mod 6)
x = 0 (mod 7)

If you haven't already figured it out, this is a problem that derives from the Chinese Remainder Theorem. This theorem deals with solving simultaneous modular arithmetic equations.

The least common multiple of 2,3,4,5,6 is 60, thus the first 5 equations above can be reduced to a single equation:

x = 1 (mod 60)

So clearly the solution to the simultaneous equations is a number that is a multiple of 7 and gives a remainder of 1 when divided by 60. We can therefore examine numbers of the form: 60m+1 where m=0,1,2,3,... Here is the first 10 numbers in this sequence:

1, 61, 121, 181, 241, 301, 361, 421, 481, 541, ...

Starting with the lowest number and dividing each by 7 we can quickly determine that 301 is the answer since it is the first number divisible by 7.

Uncategorized

Applied Number Theory: One Way Functions

Whilst many developers have to incorporate a degree of "security" or "crypto" elements into products they build, it recently came to my attention that many developers have a thin grasp of the mathematical concepts that underpin the foundations of cryptography. I guess it shouldn't have surprised me, but none-the-less it did.

For example, many developers will tell you that passwords should not be stored in clear-text in the database, but rather the hash of the password should be stored. That is excellent advice, but when probed on hash functions - what exactly are they, how do they work, what assumptions do they make, etc - you'll quickly find that it is an area that isn't well understood by many developers.

Under the covers hash functions are specialized one way functions. One way functions are a cryptographic primitive that, by definition, are "easy" to compute in one direction but "hard" to calculate in the other direction, that is, finding the inverse.  In this context, "easy" means solvable in polynomial time, and "hard" means, on average, the inverse for some randomly given value is not solvable in polynomial time. That is, easy means ∈ P, and hard means ∈ NP.

An Example

Let me give an example to demonstrate the easy/hard asymmetry that gives the one way function it's power. Consider the following equation:

c ≡ be mod m   (1)           [ here b is the "base", e is the "exponent", and m is the "modulus" ]

This is called modular or discrete exponentiation and it is a carefully constructed equation to exhibit qualities that make it a suitable one way function. Essentially, (1) consists of 2 components: an exponentiation operation, be, and a modulus operation. It's clear that evaluating be requires e-1 multiplication operations. This can also be seen by considering the recurrence relationship :

be = b. b(e-1) (2)

= b. b. b(e-2)

...

= b. b. b. b.  ... .b

The astute reader will note that we can perform this more efficiently by computing b2, and then multiplying that by itself to get b4 and so on to get b8,  b16,  b32 and so on. We can see that this approach requires approximately log2(e) operations. This is called exponentiation by squaring. It should be noted that there is considerable efficiency to be gained in using an Θ(log2e) algorithm over a Θ(e) algorithm when e is large.

The second component needing evaluation in (1) is the modulus operation. This involves dividing be by m and finding the remainder. So overall we can be certain that we have an expression that can be solved in polynomial time. Let's now consider the inverse function. That is, calculating e for given values of c, b and m. The equation is:

e = loge (b) -> Zn (3)

This is a discrete logarithm to base e in Zn and the tractability of such a problem is known to be hard and have no efficient solutions (unless of course, you have a quantum computer). Ergo, modular exponentiation is an elegant construction that can be used as a cryptographic one way function. Indeed, this particular function is the basis of the ElGamal encryption scheme.

Underlying Assumptions

The strength of one-way functions is in the difficulty in calculating the inverse. In the forward direction we have a problem ∈ P, and in the the reverse direction we have a problem ∈ NP. Clearly, we are assuming that:

PNP (4)

(4) is considered to be one of the most fundamental, yet difficult, open unresolved mathematical equations. Presently this assumption stated in (4) is a safe, albeit unproven, assumption as the majority of mathematicians believe (4) to be true as it has been scrutinized over many years without being disproved. The importance of this assumption is exemplified by the lucrative financial rewards (1 million USD) on offer from the Clay Institute for a formal proof or disproof. To date, however, no one has been able to offer such a proof. Until that happens cryptographers will continue to assume this is the case.

Uncategorized

Google Code Jam: Ugly Numbers

Google Code Jam Following on from the text message outrage, the second problem in Round 1C was the "Ugly Numbers" problem. This puzzle involves counting the number of "ugly" numbers that can be formed from a given input string using only addition and subtraction operations. An "ugly" number is one that is divisible by either 2, 3, 5, or 7 (the one-digit primes).

Since there are 3 possible actions: addition, subtraction, and append; and D digits, clearly there are 3D-1 possible numbers that can be made. Calculating each of these and testing for ugliness is not at all feasible since D can be anything up to 40.

So first let us put our number theory caps on and consider the algebraic properties of the ugly numbers. Listing a few of the ugly numbers is a good starting point:

2, 4, 6, 8, 10, 12, ...
3, 6, 9, 12, 15, ...
5, 10, 15, 20, 25, ...
7, 14, 21, 28, 35, ...

Note that 9 and 10 are ugly numbers because they are divisible by 3 and 5 respectively. However the addition of these two numbers is 9+10=19 which is not ugly.

Also note that 20 and 7 are ugly numbers because they are divisible by 2 and 7 respectively. However the difference of these two numbers is 20-7=13 which is not ugly.

What this tells us is that both addition and subtraction are not closed over the set of ugly numbers.

So how does that help? It doesn't really. It just tells us that a dynamic programming approach based on the number of ugly numbers found using a smaller subset of the digits is not going to work since we don't have the overlapping sub-problem property with this approach.

It is also interesting that the ugliness criteria is based around prime numbers, specifically the set {2,3,5,7}. Hmmm, that might be our hint. Since prime number are only divisible by themselves and 1, then the greatest common divisor (GCD) of any two numbers in this set must be 1. In other words the set is pairwise coprime, and therefore the lowest common multiple (LCM) of the set must be the product: 2*3*5*7=210.

To ascertain ugliness we could write a function like this in C#:

        static bool isUgly(int x)
        {
            return (x%2 == 0 || x%3 == 0 || x%5 == 0 || x%7 == 0);
        }

However, since 210 is the LCM of all the one-digit primes we can pass in x'= x % 210 to this function and still get accurate ugliness checking. That is, the ugliness of x is going to be equivalent to the ugliness of x % 210 because 210 is a multiple of each of the one-digit primes. This is definitely something we can exploit because modular arithmetic has nice algebraic properties. Specifically, if:


a = b mod (210)
c = d mod (210)

then it follows that:

a + c = (b + d) mod (210)
a - c = (b - d) mod (210)

In other words, both addition and subtraction are closed over the set of integers mod 210, which is usually written Z210.

So this means that we can take an iterative approach and consider the first i characters, determine all possible resulting numbers but instead of storing all of them we store only the result mod 210. This limits the possible output to 210 numbers (0-209) and also allows us to re-use the results of earlier calculations since addition and subtraction are closed over Z210. This now gives us the overlapping sub-problem property that we need for dynamic programming.

First, define cj,d to be the number of possible ways of producing a mod 210 residual of j when we consider only the characters in the given input string up to, and including, index d (assuming a zero-based index). Since j ranges from 0 to 209 and d ranges from 0 to D-1, we can represent this as a matrix as follows:


And the solution to the problem would be:

where isUgly(x) returns 1 if x is ugly, and 0 otherwise.

Now consider the j-th column vector in this matrix, cj, and how it can be calculated from the column vector immediately to its left, cj-1.

All we need to do is to go through all 210 elements of cj-1 and perform three operations on each non-zero count found: an addition (mod 210), a subtraction (mod 210) and the append operation putting the results of all three into their respective rows in cj. Addition and subtraction are fairly easy (see formula below) with the only trick being modular arithmetic on negative numbers (you need to add 210), however the append operation is a bit more complicated.


To do the append operation for a given element of the column vector consider the possible ways the residual (x) represented by the count could have been derived. It could be the sum of 2 numbers or the difference of 2 numbers:


a + b = x (mod 210)
a - b = x (mod 210)

Here we know the value of x but we don't know the operands (a and b) or the operator used. Thus we need to maintain more information than just the counts mod 210 in order to solve the problem. If we were to store the position of the last operator and what that operator was (addition or subtraction) for the count we would be able to derive both operands, and thus derive the value of the append operation. This can be done as follows (with all calculations done mod 210):

  1. Use d and the position of the last operator to derive b. b will be the number formed by all digits from the position of the last operator to d (mod 210)
  2. Use x and b and the sign of the last operator to derive a. That is: a = x - b (mod 210) if the last operator was addition; a= x + b (mod 210) if the last operator was subtraction.
  3. Calculate b', the new value of b by appending the new digit to the end of b
  4. Calculate the new residual value, x' = a + b' (mod 210) if the last operator was addition, otherwise x' = a - b' (mod 210).

I used a simple struct to hold the count and the position of the last operator along with the operator type. If the position is positive or zero, it implies addition was the operator. If the position is negative it implies that subtraction was the last operator. This enables us to store 2 bits of information in a single signed integer. Furthermore, because a given residual can be obtained by numerous combinations of operators and operands we need to store a list of these structs for each point in the matrix above.

Here's my C# code for this problem:

   1:  using System;
   2:  using System.Collections.Generic;
   3:  using System.IO;
   4:  using System.Linq;
   5:   
   6:  namespace GoogleCodeJam.UglyNumbers
   7:  {
   8:      class Program
   9:      {
  10:          const int lcm = 2 * 3 * 5 * 7;
  11:   
  12:          static void Main()
  13:          {
  14:              
  15:              const string basePath = "";
  16:              var infile = new StreamReader(basePath + "large.txt");
  17:              var outfile = new StreamWriter(basePath + "output.txt");
  18:   
  19:              int testCases = Int32.Parse(infile.ReadLine());
  20:              for (int caseNo = 1; caseNo <= testCases; caseNo++) 
  21:              {
  22:                  string input = infile.ReadLine();
  23:                  var c = new List<Pair>[lcm,input.Length];
  24:   
  25:                  for (int y = 0; y < input.Length; y++)
  26:                  {
  27:                      int digit = input[y]-'0';
  28:   
  29:                      if (y == 0)
  30:                      {
  31:                          c[digit, y] = new List<Pair> {new Pair(1, 0)};
  32:                          continue;
  33:                      }
  34:                          
  35:                      for (int x = 0; x < lcm; x++)
  36:                      {
  37:                          if (c[x, y - 1] != null && c[x, y - 1].Sum(p => p.Count) > 0)
  38:                          {
  39:                              if (c[(x+digit)%lcm,y] == null)
  40:                                  c[(x+digit)%lcm,y] = new List<Pair>();
  41:   
  42:                              c[(x+digit)%lcm,y].Add(new Pair(c[x,y-1].Sum(p=>p.Count),y));
  43:   
  44:                              if (c[(x-digit+lcm)%lcm,y] == null)
  45:                                  c[(x-digit+lcm)%lcm,y] = new List<Pair>();
  46:   
  47:                              c[(x-digit+lcm)%lcm,y].Add(new Pair(c[x,y-1].Sum(p=>p.Count),-y));
  48:   
  49:                              foreach (var pair in c[x,y-1])
  50:                              {
  51:                                  int ind = pair.LastSignPosition;
  52:                                  var b = Parse(input.Substring(Math.Abs(ind),y-Math.Abs(ind)),lcm);
  53:                                  var bDash = Parse(input.Substring(Math.Abs(ind),y-Math.Abs(ind)+1),lcm);
  54:   
  55:                                  if (ind >= 0)
  56:                                  {
  57:                                      if (c[(x-b+bDash+lcm)%lcm,y] == null)
  58:                                          c[(x-b+bDash+lcm)%lcm,y] = new List<Pair>();
  59:   
  60:                                      c[(x-b+bDash+lcm)%lcm,y].Add(new Pair(pair.Count,ind));
  61:                                  }
  62:                                  else if (ind < 0)
  63:                                  {
  64:                                      if (c[(x+b-bDash+lcm)%lcm,y] == null)
  65:                                          c[(x+b-bDash+lcm)%lcm,y] = new List<Pair>();
  66:   
  67:                                      c[(x+b-bDash+lcm)%lcm,y].Add(new Pair(pair.Count,ind));
  68:                                  }
  69:                              }
  70:                          }
  71:                      }
  72:                  
  73:                  }
  74:   
  75:                  long sum = 0;
  76:                  for (int x = 0; x < lcm; x++)
  77:                      if (x % 2 == 0 || x % 3 == 0 || x % 5 == 0 || x % 7 == 0)
  78:                      {
  79:                          if (c[x, input.Length - 1] != null)
  80:                              sum += c[x, input.Length - 1].Sum(p => p.Count);
  81:                      }
  82:   
  83:                  // write out the results
  84:                  outfile.WriteLine(String.Format("Case #{0}: {1}", caseNo, sum));
  85:              }
  86:              infile.Close();
  87:              outfile.Flush();
  88:              outfile.Close();
  89:          }
  90:   
  91:          static int Parse(string num, int mod)
  92:          {
  93:              //return UInt64.Parse(num) % mod;
  94:              int result = 0;
  95:              for (int i = 0; i < num.Length; i++ )
  96:                  result = (result*10 + (num[i] - '0')) % mod;    
  97:              return result;
  98:          }
  99:      }
 100:   
 101:      internal struct Pair
 102:      {
 103:          public long Count;
 104:          public int LastSignPosition;
 105:   
 106:          public Pair(long count, int lastsign)
 107:          {
 108:              Count = count;
 109:              LastSignPosition = lastsign;
 110:          }
 111:      }
 112:  }

Next »