## Perfect Bridge Hand

Question: Bridge is a game where you are dealt 13 cards. Assuming the deck is a standard, well-shuffled 52-card deck, and no one else is in the deal, what are the chances that you would be dealt 13 cards of the same suit?

Answer: We have 13 events of interest, namely the 13 individual card draws. Each draw depends on the cards drawn before it, so we multiply the conditional probabilities of each of the 13 events happening.

More concretely, the first card we draw from the 52-card deck has no real constraint, so its probability is 1; any card will do. The second event is a card drawn from the remaining 51-card deck, and it must be one of the 12 remaining cards of the same suit as the first card. Thus the probability of the second event is 12/51. Similarly, the probability that the 3rd card drawn from the remaining deck has the same suit as the first two is 11/50.

Continuing this for all 13 cards we get the answer:

1 × (12/51) × (11/50) × (10/49) × (9/48) × (8/47) × (7/46) × (6/45) × (5/44) × (4/43) × (3/42) × (2/41) × (1/40) = (12!)(39!) / 51!

Alternatively, we can consider the problem in terms of the total number of combinations. The number of different ways to draw 13 cards from a deck of 52 is 52C13 = 52! / ((39!)(13!)). Since there are 4 suits, there are only 4 ways to get 13 cards of the same suit, so the probability of being dealt one of these 4 perfect bridge hands is:

4 / (52! / ((39!)(13!))) = 4(39!)(13!) / 52! = (39!)(13!) / (13 · 51!) = (39!)(12!) / 51!

## C++11 on CentOS

The good thing about CentOS is that it is free and RHEL-like. The bad thing is that it always lags behind RHEL, so if you want to play with the latest and greatest packages you have to hunt them down, unless you want to compile from source, which gets tedious rather quickly.

Whilst still classified as "experimental", support for the latest ISO C++ standard, C++11, has been progressively added to GCC. This page details the GCC version that first supported each C++11 feature. From it we can tell that GCC 4.8 is what we're after. Regrettably, the most recent GCC package shipped with CentOS 6.5 is 4.4.7, which is very old!

[damien@web1 ~]$ cat /etc/centos-release
CentOS release 6.5 (Final)

[damien@web1 ~]$ uname -a
Linux web1 2.6.32-431.20.5.el6.x86_64 #1 SMP Fri Jul 25 08:34:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

[damien@web1 ~]$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So after some searching for relevant EL6 packages I found this repository (http://puias.princeton.edu/data/puias/DevToolset/6/x86_64/) of EL6 x86_64 packages that includes a good set of Developer Tools including GCC 4.8.1. A big thank you to the good folks at Princeton for providing this!

To use yum to easily install this package and any dependencies, we need to add a new repository. Repository definitions live under /etc/yum.repos.d/, so, as root, create a file called, say, /etc/yum.repos.d/DevToolset.repo and add the following to it:

[DevToolset-2]
name=RedHat DevToolset v2 $releasever - $basearch
baseurl=http://puias.princeton.edu/data/puias/DevToolset/$releasever/$basearch/
enabled=1
gpgcheck=0

Here "baseurl" is the URL of the directory containing the repository's repodata directory. Note that yum always expands the $releasever, $arch, and $basearch variables in URLs. I've disabled the GPG signature check simply for convenience, and have set enabled=1 to instruct yum to include this repository as a package source.

Now we can use yum to install the packages we want...

sudo yum install devtoolset-2-gcc-4.8.1 devtoolset-2-gcc-c++-4.8.1

This places the relevant binaries in /opt/rh/devtoolset-2/root/usr/bin...

[damien@web1 ~]$ ll /opt/rh/devtoolset-2/root/usr/bin
total 5544
-rwxr-xr-x 4 root root 753392 Sep 12 2013 c++
lrwxrwxrwx 1 root root 3 Jul 7 12:11 cc -> gcc
-rwxr-xr-x 1 root root 751872 Sep 12 2013 cpp
-rwxr-xr-x 4 root root 753392 Sep 12 2013 g++
-rwxr-xr-x 2 root root 749872 Sep 12 2013 gcc
-rwxr-xr-x 1 root root 24768 Sep 12 2013 gcc-ar
-rwxr-xr-x 1 root root 24736 Sep 12 2013 gcc-nm
-rwxr-xr-x 1 root root 24736 Sep 12 2013 gcc-ranlib
-rwxr-xr-x 1 root root 311088 Sep 12 2013 gcov
-r-xr-xr-x 1 root root 348 Mar 12 13:19 sudo
-rwxr-xr-x 4 root root 753392 Sep 12 2013 x86_64-redhat-linux-c++
-rwxr-xr-x 4 root root 753392 Sep 12 2013 x86_64-redhat-linux-g++
-rwxr-xr-x 2 root root 749872 Sep 12 2013 x86_64-redhat-linux-gcc
[damien@web1 ~]$

And we can confirm we have the desired version of GCC..

[damien@web1 ~]$ /opt/rh/devtoolset-2/root/usr/bin/gcc --version
gcc (GCC) 4.8.1 20130715 (Red Hat 4.8.1-4)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Now it is simply a matter of making this GCC version the one that is found by default. We can do that by creating symbolic links in /usr/local/bin to the relevant GCC binaries. Since /usr/local/bin appears very early in the $PATH variable, the new binaries will be found without giving the full path.

[damien@web1 ~]$ sudo ln -s /opt/rh/devtoolset-2/root/usr/bin/* /usr/local/bin/

[damien@web1 ~]$ hash -r

[damien@web1 ~]$ which gcc
/usr/local/bin/gcc

[damien@web1 ~]$ gcc --version
gcc (GCC) 4.8.1 20130715 (Red Hat 4.8.1-4)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Testing that all is good with a small program using some C++11 features...

#include <vector>
#include <iostream>

int main()
{
    std::cout << "Testing some c++11 things...\n" << std::endl;

    std::vector<int> myvector;
    for (int i = 1; i <= 5; i++) myvector.push_back(i);

    std::cout << "myvector contains:";

    // use auto instead of the long-winded std::vector<int>::iterator
    for (auto it = myvector.begin(); it != myvector.end(); ++it)
        std::cout << ' ' << *it;

    std::cout << '\n';

    return 0;
}

To tell the compiler we want to use C++11 we can use the -std=c++11 (or -std=gnu++11) compiler option...

[damien@web1 C11]$ g++ -Wall -std=c++11 ./test.cpp -o testc11
[damien@web1 C11]$ ./testc11
Testing some c++11 things...

myvector contains: 1 2 3 4 5
[damien@web1 C11]$

## Struct Padding

In general, tight data structures with compact representations are good for low latency, as they ensure that as much data as possible fits in the data caches closest to the CPU, that is, Level 1 (L1d) and Level 2 (L2) on modern CPUs. But sometimes the compiler interferes with your structs to preserve the data-alignment requirements of the hardware, resulting in struct sizes that are not what you'd expect. To illustrate this, let's examine a short program that we'll compile with g++.

#include <iostream>
using namespace std;

struct foo1 {
    char a;     /* 1 byte  */
    int b;      /* 4 bytes */
    char c;     /* 1 byte  */
};

struct foo2 {
    char a1;    /* 1 byte  */
    char a2;    /* 1 byte  */
    char a3;    /* 1 byte  */
    char a4;    /* 1 byte  */
    int b;      /* 4 bytes */
    char c1;    /* 1 byte  */
    char c2;    /* 1 byte  */
    char c3;    /* 1 byte  */
    char c4;    /* 1 byte  */
};

struct foo3 {
    char a;     /* 1 byte  */
    short b;    /* 2 bytes */
    int c;      /* 4 bytes */
};

// reorder the structure members by decreasing alignment
struct foo4 {
    int a;      /* 4 bytes */
    short b;    /* 2 bytes */
    char c;     /* 1 byte  */
};

// structure packing... tells g++ not to add padding and not
// to make assumptions about alignment of accesses to members
struct foo1p {
    char a;     /* 1 byte  */
    int b;      /* 4 bytes */
    char c;     /* 1 byte  */
} __attribute__((__packed__));

int main() {
    // calculate the sizes of the structs
    const int sizefoo1 = sizeof(foo1);
    const int sizefoo2 = sizeof(foo2);
    const int sizefoo3 = sizeof(foo3);
    const int sizefoo4 = sizeof(foo4);

    // try again on the packed variant
    const int sizefoo1p = sizeof(foo1p);

    // output the size results
    cout << "foo1 has size = " << sizefoo1 << endl;
    cout << "foo2 has size = " << sizefoo2 << endl;
    cout << "foo3 has size = " << sizefoo3 << endl;
    cout << "foo4 has size = " << sizefoo4 << endl;
    cout << "foo1p has size = " << sizefoo1p << endl;

    return 0;
}

Running this program, after compiling with g++ and the -O0 option, we get...

foo1 has size = 12
foo2 has size = 12
foo3 has size = 8
foo4 has size = 8
foo1p has size = 6

Note that the foo1 struct would at first seem to need only 6 bytes, but it actually occupies double that: 12 bytes. Why? The answer is struct padding. The compiler keeps the CPU happy by ensuring that each member of the struct is stored at an offset such that offset mod data_size = 0, i.e. each member is data-size aligned.

Since the second member of foo1 is an int, which has a size of 4 bytes, its address needs to be aligned to a 4-byte boundary. Since the first member of foo1 is a 1-byte char, the compiler has little choice but to insert 3 bytes of padding after the "char a" member. The last member of foo1 is also a char, which can live at any byte offset, so you might be wondering why there are 3 bytes of padding after it. The reason is that, on x86_64, g++ pads the structure so that its total size is a multiple of the alignment of its largest member, which guarantees that every element of an array of such structs is also correctly aligned. In this case the largest member is 4 bytes, so the total size is padded by 3 bytes to reach 12 bytes, which is 4-byte aligned.

In the second example, note that foo2 has no padding whatsoever (expected size = actual size = 12 bytes) because every member falls naturally at an offset that is a multiple of its size.

In the next example, foo3, the second field, b, needs 1 byte of padding before it so that its offset is a multiple of its 2-byte size; the int c then lands naturally at the 4-byte-aligned offset 4, giving a total size of 8 bytes.

A clever trick to minimise or avoid the padding is to order the fields by decreasing size, as shown in foo4.

Lastly we consider a case, foo1p, where we have explicitly disabled padding. Notice that the foo1p structure has __attribute__((__packed__)) specified. This is a g++ extension that tells the compiler NOT to pad the members of the structure. Such extensions are implementation-specific, so your mileage may vary depending on the compiler and platform.

Clearly compiler writers are not stupid, and they add padding for a reason! Whilst you can reduce the padding by re-ordering member fields, or suppress it with the above-mentioned g++ extension, the trade-off is better space compression versus possible performance problems from unaligned memory access, and less portable code, as some platforms will even throw exceptions on unaligned access. Also note that if a data field straddles a cache line, the processor has little choice but to perform at least 2 reads to fetch it rather than one, so that read of the member field costs many more CPU cycles, and potential data stalls, than the equivalent aligned read. Here's an article that tries to measure it in a semi-scientific fashion.

So if fields can be re-arranged to save space, without spanning cache lines too often, there can still be a benefit in doing so, especially when you have a large array or vector of the structs and memory efficiency is paramount.