Archive for March, 2009

Uncategorized

Meeting Etiquette

As you move up the ranks from technical positions to more managerial positions you invariably find yourself doing less technical work but attending more meetings where issues are discussed and decisions need to be made. Well-run, concise meetings are essential to information flow and collective decision making however I've attended far too many meetings that fail to observe what I consider basic etiquette for meetings. Accordingly, here's my short list of things to do to make meetings more productive:

  1. Before initiating a business discussion, know your goal!
  2. Non-verbal communication is a big part of your message!
  3. Respect and listen to (but don't always follow) other peoples' opinions
  4. Be conscious of how much time you are taking in the conversation.
  5. Time-box open forum discussions to force people to be concise.
  6. Don't be afraid to cut people short if it's heading in the wrong direction.
  7. Evaluate the personality of the other attendees, and adjust communication as necessary.i.e. Don't talk techno-babble to the CEO.
  8. If you invite subject matter experts to attend the meeting let them speak to those issues - they're the SMEs right!
  9. Before agreeing to attend, ask 3 questions about a meeting:
    • What is the purpose?
    • What is the agenda?
    • Am I really needed?
  10. Don't use meetings to grandstand - you'll lose respect fast
  11. If you are providing input to the meeting, come prepared so as not to waste other people's time.
  12. Take notes - reliance on human memory is a sure path to failure.
  13. Prefer quick hand-drawn diagrams on the whiteboard (which can later be photographed/scanned and circulated) over elaborate PowerPoint presentations. You'll get the job done just as effectively an order of magnitude faster.
  14. Prefer video- and tele-conferencing over long-distance travel but make sure the technology works beforehand.
  15. At the end, summarise action points and key decisions made so everyone is on the same page.
  16. And finally, be on-time. You are wasting other peoples' time by being tardy.

Uncategorized

Sum of an Arithmetic Progression

An arithmetic progression is a sequence of numbers where the difference between 2 successive numbers is constant. An example is: 1,2,3,4,5,6,7,8... This sequence is increasing by one each time. So, the more general definition for the i-th term in an arithmetic progression, where d is the constant difference, and a0 is the first term in the sequence, is given by the formula:

There is a formula to find the sum of the first n terms but you need not remember it since you can derive it on the spot with common sense. Assuming the simple case where d=1, if you pick the first and last element in the sequence and sum them you get n+1. Ignoring these 2 items from the list, if you do the same thing again you get 2+(n-1)=n+1, the same result. And if you do it again, you get the same result again: 3 + (n-2) = n+1. So a pattern emerges - selecting and removing the first and last element in the sequence and summing them gives (n+1) so the sum of all of these must be that number, n+1, multiplied by the number of pairs you can extract from the sequence, which is in fact n/2. Hence the sum of the first n terms in this particular sequence is (n+1)*n/2. A similar process can be applied regardless of the value of d.

Another way to visualise the result is to write down the sequence of numbers, then just below it write down that same sequence but in reverse order. Now sum each vertical pair and you'll see the same pattern: n/2 lots of (n+1).

It's amazing how many mathematical puzzles and technical interview questions reduce to this simple summation! Don't believe me - well here's a sample of puzzles/questions that are all easily solved if you know the sum of an arithmetic progression:

  1. In a sequential search, what is the average number of comparisons it takes to search through n elements?
  2. Given 100 boxes each of which, except one, contains 100 x 10g weights, whilst the remaining box has 100 x 9g weights in it. You need to identify the box that is lighter than the others but you can only use the scales once. [Thanks to Phil Lancaster, a work colleague of mine for throwing this devious puzzle at me one day.]
  3. Prove that the worst case run time for a bubble sort is O(n2)
  4. You have an array of size n-1 which is suppose to contain all the integers between 1 and n, without duplicates, except there is a missing number. You need to find the missing number using a constant-space algorithm
  5. You have an array of size n-2 which is suppose to contain all the integers between 1 and n, without duplicates, except two numbers are missing. You need to find the missing numbers using a constant-space algorithm

Answers

  1. If we assume that the answer to our search could be in any position from 1 to n, all with equal likelihood (i.e. a Uniform distribution, aka the distribution of maximum ignorance), then the average number of comparisons is equal to the sum of all possibilities divided by n. That is, the sum of the arithmetic progression above divided by n, which is (n+1)/2

  2. Enumerative combinatorics tells us that we'd need, at most, log2(n) weightings to identify which box has the lighter blocks - divide all remaining boxes into 2 halves, weigh one half of the boxes as a total unit, pick the lighter half, and repeat the process - but we are allowed only one! So consider redistributing the weights as follows: take all the weights out marking them with the box number they came from. Then, using one empty box, put 1 block from box1, 2 blocks from box2, 3 blocks from box3, ..., n-1 blocks from box(n-1) and n blocks from box(n). From our knowledge of the sum of an arithmetic progression we know what the total weight of the box, if all the blocks were 10g, would be 100/2*(100+1)*10g = 5050g. Now, box(j) contributes j blocks each of which is 10 grams for a total contribution of 10.j grams. If box(j) is the lighter box then it will only contribute 9.j grams, a difference of exactly j grams. Blocks from all other box will not deviate from there expected weights since only 1 box originally had lighter blocks. Given this, it follows that the weight difference between actual weight of the specially prepared box and theoretical total weight using our formula always gives the precise box number of the original light box.

  3. The bubble sort uses pairwise comparisons to sort lists. The worst case for the bubble sort is when the list is in strictly descending order when you are trying to put it into ascending order. In this case the algorithm compares the first element to (n-1) other elements, and then compares the second element to (n-2) elements to it's right, and then compares the third element to (n-3) elements to it's right, etc, etc. So in total the number of comparisons is (n-1) + (n-2) + (n-3) + (n-4) + ...+ 2 + 1 = n/2(n-1). Expanding this out to polynomial form it has it's highest power of n being 2, hence complexity theory classes this as O(n2).

  4. This seems simple at first - just iterate through the list updating an array to indicate which items you have, but this approach does not meet the constant space limitation imposed in the wording of the question. It would require O(n) space. i.e the space required grows linearly with the size of the list.

    But if we allocate some space to store a running total which we update as we iterate through the list, once we reach the end of the list we can compare the running total with the theoretical total that we can calculate using, you guessed it, the sum of an arithmetic progression we can easily identify the missing integer. This approach has constant space complexity.

  5. This problem is an extension of the previous one. In the previous case we had a single equation with a single unknown that was easily solved. Now we have a single equation with 2 unknown numbers. If you remember anything about solving simultaneous equations you'll know this is not a good situation. However, once again we can make use of some simple algebra to help out. Instead of summing all the terms we can calculate the product of them. The theoretical value for this, n!, is known assuming n isn't too large, so we can compare the calculated value to the theoretical value and now we have 2 equations which we can use to solve for the 2 unknowns, and we've used only constant space complexity to do it. Viola!

Uncategorized

A Review of String Fundamentals in C#

Strings are a fundamental data type in most languages and the .NET platform is no different. What follows is a Q&A on strings that (hopefully) serves as a refresher course on basic string knowledge and operations that programmers use on a regular basis.

What is a string and how are they stored in .NET ?
In .NET strings are just a series of Unicode characters used to represent text. They are serializable, comparable, equatable, enumerable and cloneable. They are not null-terminated like other languages. In fact, strings can have null characters embedded within them.

What is a Unicode character?
The Unicode Standard was designed to be able to codify all the characters of all the languages in the world. To do this it uses Unicode characters each of which is assigned a unique number called a code point. Various encodings are defined by the standard that specify how a code point is encoded into a sequence of one or more 16-bit values. Each 16-bit value ranges from hexadecimal 0x0000 through 0xFFFF (often written U+0000 - U+FFFF) and in .NET is stored in a Char structure.

What's the difference between "string" and "System.String" ?
None. The lowercase version seen in C# code is merely an alias to the framework-provided System.String class.

What's the difference between a char and a string data type?
A Char represents a single Unicode character, whereas a string represents any number of Unicode characters. The System.Char data type is a struct so it is a value-type and therefore not nullable. Char literals are defined with single quotes whereas string literals are defined with double quotes.

Are strings value- or reference-types?
Strings are objects in .NET which makes them reference types, however they are unusual in that they exhibit value-type semantics since they are immutable and operations on them create new strings.

Are strings immutable?
Yes. See answer above. This means that any modifying operation performed on a string actually creates a new modified string. In other words, once created strings cannot be modified.

How are "empty" strings defined?
Since strings are reference types they can be set to null. There is also a system provided String.Empty field that you can use. It is effectively a zero-length string, "".

How big can strings be?
Like other CLR objects, string reference types cannot breach the maximum object size allowed in the GC heap which is 2GB. If you are more interested in character count you can use the Length instance property to obtain the number of characters (Char objects) in the string. Note that embedded nulls are counted by the Length property.

What is a character encoding?
As mentioned earlier, strings in .NET are sequences of Unicode characters. Encoding is the process of transforming a set of Unicode characters into a byte sequence. Whenever you want to transform a .NET string into a byte array you'll need to use an encoding. The same encoding must be used to transform the byte array back to a string! Note that encoding classes replace characters with "?" or a "substitute character" if a problem occurs.

What encodings are supported by .NET ?
The framework provides the Encoding class with supports ASCII, UTF-7, UTF-8, UTF-32, and of course Unicode. For Unicode and UTF-32 encoding both little endian and big endian byte orders are supported. These refer to whether the most-significant or the least-significant byte come first in the byte stream. The .NET framework can detect the endian-ness of a byte array by checking the first couple of bytes (by convention FE FF is passed).

What is string normalization?
Some Unicode characters have multiple equivalent binary representations consisting of sets of combining and/or composite Unicode characters. The existence of multiple representations for a single character complicates searching, sorting, matching, and other operations thus the Unicode standard defines a process called normalization that returns one binary representation when given any of the equivalent binary representations of a character. In .NET, the String.Normalize() method returns a new string whose binary representation is in a particular Unicode normalization form. The .NET Framework currently supports normalization forms C, D, KC, and KD.

How do you compare 2 strings? What algorithm is used?
There is an instance method available for strings called CompareTo() that accepts a second string and returns a negative integer if the first string is "less than" the second, a positive integer if the first string is "greater than" the second, or zero if they are the same. This is a linguistic operation (explanation below) therefore the comparison is culture-sensitive. This, and most other string comparisons, compare the actual string values rather than the object references.

There are other ways to perform comparisons. The static == operator can be used for string equality checks. Under the covers this uses the Equals() non-static method.

There is also a String.Compare(s1,s2) static method that does culture-aware string comparisons. You should prefer to use this one if the strings could be null. Note that comparing a null to a null returns match (0).

What's the difference between Compare and CompareOrdinal?
Both are static methods used to perform string comparisons but CompareOrdinal doesn't consider the culture (but it is case-sensitive). In general, string operations in .NET can be considered ordinal or linguistic. An ordinal operation acts on the numeric value of each Char object. A linguistic operation acts on the value of the String taking into account culture-specific casing, sorting, formatting, and parsing rules. Linguistic operations execute in the context of an explicitly declared culture or the implicit current culture. An ordinal comparison is automatically case-sensitive because the lowercase and uppercase versions of a character have different code points. In general, a culture-sensitive comparison is typically appropriate for sorting, and ordinal comparison is typically appropriate for equality checking.

What's the difference between a string and a StringBuilder?
Whereas the String class represents an immutable sequence of Unicode characters, the StringBuilder class can be used to represent a mutable string of characters. Highly repetitive string manipulation operations (such as concatenation) are best done with a StringBuilder.

What is string interning?
As mentioned previously strings are objects and they are immutable. Because of this there is no real need to ever have multiple objects that represent the same string. To facilitate this the .NET framework employs string interning whereby an intern pool holds interned objects that represent strings. String literals are automatically interned, and you can also force interning of other strings via the Intern() method. There are also methods available to determine if a given string literal is interned or not.
Update (Dec-2009): Eric Lippert has a great article describing how interning and reference/value comparisons create interesting edge cases.

How can you replace a portion of a string?
Use the Replace method as follows: string output = input.Replace("searchTerm","replacementValue");

How would you reverse a string in-place?
See my earlier article on this.

What is a verbatim string literal?
Certain characters have special meaning and therefore need to be "escaped" when included in a string. This concept won't be new to most programmers, but the term verbatim string might be. A verbatim string is simply a string that is prefixed with @ before the double quotes, and this signifies that any embedded escape characters are not to be treated as such. Of course, it still respects the double quote to define the start and end of the string, and a pair of double quotes can be used to represent an embedded double quote character.

How would you create a string that consists of a single character repeated X times?
Use the constructor syntax (this creates a string of 80 consecutive hashes): string a = new string("#", 80);

How would you convert a string to a numeric?
Use the Convert class and code similar to this (for an integer): int i = Convert.ToInt32(someString);
Alternately you can call the Parse() static methods on the data types and pass it the string to convert.