C++11 on Centos

The good thing about Centos is it's free and RHEL-like. The bad thing is it is always behind RHEL so if you want to play with the latest greatest packages you have to hunt them down, unless you want to compile from source, which gets tedious rather quickly.

Whilst still classified as "experimental", support for the latest ISO standard for C++, C++11, has been progressively added to gcc. This page details the GCC versions that first supported each given C++11 feature. From this we can tell that GCC v4.8 is what we're after. Regrettably, the most recent package with Centos 6.5 is gcc 4.4.7, which is very old!

[damien@web1 ~]$ cat /etc/centos-release
CentOS release 6.5 (Final)

[damien@web1 ~]$ uname -a
Linux web1 2.6.32-431.20.5.el6.x86_64 #1 SMP Fri Jul 25 08:34:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

[damien@web1 ~]$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So after some searching for relevant EL6 packages I found this repository (http://puias.princeton.edu/data/puias/DevToolset/6/x86_64/) of EL6 x86_64 packages that includes a good set of Developer Tools including GCC 4.8.1. A big thank you to the good folks at Princeton for providing this!

To make use of yum to easily install this package and any dependencies we need to add a new repository. These live under /etc/yum.repos.d/ so, as root, we need to create a file called, say /etc/yum.repos.d/DevToolset.repo and add to it the following information:

[DevToolset-2]
name=RedHat DevToolset v2 $releasever - $basearch
baseurl=http://puias.princeton.edu/data/puias/DevToolset/$releasever/$basearch/
enabled=1
gpgcheck=0

Here "baseurl" is a URL to the directory where the repodata directory of a repository is located. Note that Yum always expands the $releasever, $arch, and $basearch variables in URLs. [Read more that that here]. I've disabled the GPG signature check simply for convenience, and have set enabled=1 to instruct Yum to include this repository as a package source.

Now we can use yum to install the packages we want...

sudo yum install devtoolset-2-gcc-4.8.1 devtoolset-2-gcc-c++-4.8.1

This places the relevant binaries in /opt/rh/devtoolset-2/root/usr/bin...

[damien@web1 ~]$ ll /opt/rh/devtoolset-2/root/usr/bin
total 5544
-rwxr-xr-x 4 root root 753392 Sep 12 2013 c++
lrwxrwxrwx 1 root root 3 Jul 7 12:11 cc -> gcc
-rwxr-xr-x 1 root root 751872 Sep 12 2013 cpp
-rwxr-xr-x 4 root root 753392 Sep 12 2013 g++
-rwxr-xr-x 2 root root 749872 Sep 12 2013 gcc
-rwxr-xr-x 1 root root 24768 Sep 12 2013 gcc-ar
-rwxr-xr-x 1 root root 24736 Sep 12 2013 gcc-nm
-rwxr-xr-x 1 root root 24736 Sep 12 2013 gcc-ranlib
-rwxr-xr-x 1 root root 311088 Sep 12 2013 gcov
-r-xr-xr-x 1 root root 348 Mar 12 13:19 sudo
-rwxr-xr-x 4 root root 753392 Sep 12 2013 x86_64-redhat-linux-c++
-rwxr-xr-x 4 root root 753392 Sep 12 2013 x86_64-redhat-linux-g++
-rwxr-xr-x 2 root root 749872 Sep 12 2013 x86_64-redhat-linux-gcc
[damien@web1 ~]$

And we can confirm we have the desired version of GCC..

[damien@web1 ~]$ /opt/rh/devtoolset-2/root/usr/bin/gcc --version
gcc (GCC) 4.8.1 20130715 (Red Hat 4.8.1-4)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Now it is simply a matter of making this GCC version the one that is found by default. We can do that by creating a symbolic link in the /usr/local/bin folder to the relevant GCC binary folder. Since the /usr/local/bin directory is including very early on in the $PATH variable the relevant binary should be found without giving the full path.

[damien@web1 ~]$ sudo ln -s /opt/rh/devtoolset-2/root/usr/bin/* /usr/local/bin/

[damien@web1 ~]$ hash -r

[damien@web1 ~]$ which gcc
/usr/local/bin/gcc

[damien@web1 ~]$ gcc --version
gcc (GCC) 4.8.1 20130715 (Red Hat 4.8.1-4)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Testing all is good with a small program with C++11 features...

#include
#include

int main( )
{
std::cout << "Testing some c++11 things...\n" << std::endl;

std::vector myvector;
for (int i=1; i<=5; i++) myvector.push_back(i);

std::cout << "myvector contains:";

//use auto instead of the long-winded std::vector::iterator
for (auto it = myvector.begin() ; it != myvector.end(); ++it)
std::cout << ' ' << *it;

std::cout << '\n';

return 0;
}

To tell the compiler we want to use C++11 we can use the std=c++11, or std=gnu++11 compiler option...

[damien@web1 C11]$ g++ -Wall -std=c++11 ./test.cpp -o testc11
[damien@web1 C11]$ ./testc11
Testing some c++11 things...

myvector contains: 1 2 3 4 5
[damien@web1 C11]$

Bookmark and Share

Struct Padding in GCC

In general, tight data structures with compact representations are good for low latency as they ensure that as much data as possible fits in the data caches closer to the CPU, that is, Level 1 (L1d) and Level 2 (L2d) on modern CPUs. But sometimes the compiler interferes with your structs to preserve data alignment requirements of the hardware resulting in struct sizes that are not what you'd expect. To illustrate that let's examine a short program that we'll compile with G++ .

   1:   
   2:  #include <iostream>
   3:  using namespace std;
   4:   
   5:  // structure padding...
   6:  struct foo1 {
   7:      char a;        /* 1 byte  */
   8:                  /* 3 bytes padding added here */
   9:      int b;        /* 4 bytes */
  10:      char c;     /* 1 byte  */
  11:                  /* 3 bytes padding added here */
  12:  };
  13:   
  14:  struct foo2 {
  15:      char a1;    /* 1 byte  */
  16:      char a2;    /* 1 byte  */
  17:      char a3;    /* 1 byte  */
  18:      char a4;    /* 1 byte  */
  19:      int b;        /* 4 bytes */
  20:      char c1;     /* 1 byte  */
  21:      char c2;     /* 1 byte  */
  22:      char c3;     /* 1 byte  */
  23:      char c4;     /* 1 byte  */
  24:  };
  25:   
  26:  struct foo3 {
  27:      char a;        /* 1 byte  */
  28:      short b;    /* 2 bytes */
  29:                  /* 1 byte padding added here */
  30:      int c;        /* 4 bytes */
  31:  };
  32:   
  33:  //  reorder the structure members by decreasing alignment
  34:  struct foo4 {
  35:      int a;        /* 4 bytes */
  36:      short b;    /* 2 bytes */
  37:      char c;        /* 1 byte  */
  38:                  /* 1 byte padding added here */
  39:  };
  40:   
  41:  // structure packing... tells g++ not to add padding and not
  42:  // to make assumptions about alignment of accesses to members
  43:  struct foo1p {
  44:      char a;        /* 1 byte  */
  45:      int b;        /* 4 bytes */
  46:      char c;     /* 1 byte  */
  47:  } __attribute__((__packed__));
  48:   
  49:  int main() {
  50:   
  51:      // calculate the sizes of the structs
  52:      const int sizefoo1 = sizeof(foo1);
  53:      const int sizefoo2 = sizeof(foo2);
  54:      const int sizefoo3 = sizeof(foo3);
  55:      const int sizefoo4 = sizeof(foo4);
  56:   
  57:      // try again on the package variant
  58:      const int sizefoo1p = sizeof(foo1p);
  59:   
  60:      // output the size results
  61:      cout << "foo1 has size = " << sizefoo1 << endl;
  62:      cout << "foo2 has size = " << sizefoo2 << endl;
  63:      cout << "foo3 has size = " << sizefoo3 << endl;
  64:      cout << "foo4 has size = " << sizefoo4 << endl;
  65:      cout << "foo1p has size = " << sizefoo1p << endl;
  66:   
  67:      return 0;
  68:  }

Running this program, after compiling with g++ and the -O0 option, we get...

foo1 has size = 12
foo2 has size = 12
foo3 has size = 8
foo4 has size = 8
foo1p has size = 6

Note that the foo1 struct would at first seem to have 6 bytes, but it actually has double that, with 12 bytes. Why? The answer is struct padding. The compiler is trying to keep the CPU happy by ensuring that each member field of the struct is stored at an offset such that offset mod data_size = 0. i.e data-size aligned.

Since the second member of foo1 is an int which has a size of 4 bytes it's address need to be aligned to a 4 byte boundary. Since the first member of foo1 was a 1 byte char the compiler has little choice but to insert 3 bytes of padding after the "char a" member. The last member field in foo1 is also a char which can be aligned to any byte so you might be wondering why there is 3 bytes of padding after it. The reason is because, on x64 architectures, G++ adds padding such that the structure is aligned according to the largest member data type within the structure. In this case the largest member is 4-bytes so the total size is padded by 3 bytes to make it a total of 12 bytes, which is 4-byte aligned.

In the second example note that foo2 has no padding whatsoever (expected size = actual size = 12 bytes) because all member fields are naturally aligned to offsets equal to their data size.

In the next example, Foo3, the first two fields are data-size aligned, but the last field, c, needs 1 byte of padding preceding it to ensure it's offset is a multiple of it's size of 4 bytes.

A clever trick to minimise or avoid the padding is to order the fields in decreasing size, as shown in foo4.

Lastly we consider a case, foo1p, where we have explicitly disabled padding. Notice that the foo1p structure has __attribute__((__packed__)) specified. This is a G++ extension that tells the compiler NOT to pad the members of the structure. Such extensions are implementation specific so your millage may vary depending on the compiler and platform.

Clearly compiler writers are not stupid and they add padding for a reason! Whilst you can fiddle with the padding by re-ordering member fields or by using the above mentioned G++ extension to suppress it, the trade-off you are making is improved space compression vs possible performance issues with unaligned memory access and lesser portability of the code, as some platforms will even throw exceptions for unaligned access. Also note that if your data fields fall across cache lines the processor has little choice but to do at least 2 reads to get the data, rather than one which means you need to careful as that read operation for the member field adds many more CPU cycles and potential data stalls than the equivalent aligned read. Here's an article that tries to measure it in a semi-scientific fashion.

So if fields can be re-arranging to save save, and not span cache lines too much there can still be benefit in doing this, especially when you have a large array/vector of the structs and memory efficiency is paramount.

Bookmark and Share

HFT in Java

OpenHFT is a series of components, written in Java by Peter Lawrey, that are purportedly useful for latency-sensitive systems, specifically financial systems. An introductory video presentation is here. Slides for the presentation can be found here. Source is on github.

The presentation moves along fairly slowly but none-the less it's worth a viewing. The major points raised by the speaker include:

i) the performance benefits of shared-memory (and off-heap data structures);
ii) promotion of his low-latency, high-throughput logging system, Chronical.

The metrics on Chronical do look decent but I have to wonder why you wouldn't use the free version of kdb for such a task.

The other thing to keep in mind is that low-latency Java requires programmers to think and act like good C++ programmers which means the argument for using Java because there are more programmers to choose from isn't strictly true.

and is the dependenc

Bookmark and Share

Next »