C++11 on Centos

The good thing about Centos is it's free and RHEL-like. The bad thing is it is always behind RHEL so if you want to play with the latest greatest packages you have to hunt them down, unless you want to compile from source, which gets tedious rather quickly.

Whilst still classified as "experimental", support for the latest ISO standard for C++, C++11, has been progressively added to gcc. This page details the GCC versions that first supported each given C++11 feature. From this we can tell that GCC v4.8 is what we're after. Regrettably, the most recent package with Centos 6.5 is gcc 4.4.7, which is very old!

[damien@web1 ~]$ cat /etc/centos-release
CentOS release 6.5 (Final)

[damien@web1 ~]$ uname -a
Linux web1 2.6.32-431.20.5.el6.x86_64 #1 SMP Fri Jul 25 08:34:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

[damien@web1 ~]$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO

So after some searching for relevant EL6 packages I found this repository (http://puias.princeton.edu/data/puias/DevToolset/6/x86_64/) of EL6 x86_64 packages that includes a good set of Developer Tools including GCC 4.8.1. A big thank you to the good folks at Princeton for providing this!

To make use of yum to easily install this package and any dependencies we need to add a new repository. These live under /etc/yum.repos.d/ so, as root, we need to create a file called, say /etc/yum.repos.d/DevToolset.repo and add to it the following information:

name=RedHat DevToolset v2 $releasever - $basearch

Here "baseurl" is a URL to the directory where the repodata directory of a repository is located. Note that Yum always expands the $releasever, $arch, and $basearch variables in URLs. [Read more that that here]. I've disabled the GPG signature check simply for convenience, and have set enabled=1 to instruct Yum to include this repository as a package source.

Now we can use yum to install the packages we want...

sudo yum install devtoolset-2-gcc-4.8.1 devtoolset-2-gcc-c++-4.8.1

This places the relevant binaries in /opt/rh/devtoolset-2/root/usr/bin...

[damien@web1 ~]$ ll /opt/rh/devtoolset-2/root/usr/bin
total 5544
-rwxr-xr-x 4 root root 753392 Sep 12 2013 c++
lrwxrwxrwx 1 root root 3 Jul 7 12:11 cc -> gcc
-rwxr-xr-x 1 root root 751872 Sep 12 2013 cpp
-rwxr-xr-x 4 root root 753392 Sep 12 2013 g++
-rwxr-xr-x 2 root root 749872 Sep 12 2013 gcc
-rwxr-xr-x 1 root root 24768 Sep 12 2013 gcc-ar
-rwxr-xr-x 1 root root 24736 Sep 12 2013 gcc-nm
-rwxr-xr-x 1 root root 24736 Sep 12 2013 gcc-ranlib
-rwxr-xr-x 1 root root 311088 Sep 12 2013 gcov
-r-xr-xr-x 1 root root 348 Mar 12 13:19 sudo
-rwxr-xr-x 4 root root 753392 Sep 12 2013 x86_64-redhat-linux-c++
-rwxr-xr-x 4 root root 753392 Sep 12 2013 x86_64-redhat-linux-g++
-rwxr-xr-x 2 root root 749872 Sep 12 2013 x86_64-redhat-linux-gcc
[damien@web1 ~]$

And we can confirm we have the desired version of GCC..

[damien@web1 ~]$ /opt/rh/devtoolset-2/root/usr/bin/gcc --version
gcc (GCC) 4.8.1 20130715 (Red Hat 4.8.1-4)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO

Now it is simply a matter of making this GCC version the one that is found by default. We can do that by creating a symbolic link in the /usr/local/bin folder to the relevant GCC binary folder. Since the /usr/local/bin directory is including very early on in the $PATH variable the relevant binary should be found without giving the full path.

[damien@web1 ~]$ sudo ln -s /opt/rh/devtoolset-2/root/usr/bin/* /usr/local/bin/

[damien@web1 ~]$ hash -r

[damien@web1 ~]$ which gcc

[damien@web1 ~]$ gcc --version
gcc (GCC) 4.8.1 20130715 (Red Hat 4.8.1-4)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO

Testing all is good with a small program with C++11 features...

#include <vector>
#include <iostream>

int main( )
std::cout << "Testing some c++11 things...\n" << std::endl;

std::vector myvector;
for (int i=1; i<=5; i++) myvector.push_back(i);

std::cout << "myvector contains:";

//use auto instead of the long-winded std::vector::iterator
for (auto it = myvector.begin() ; it != myvector.end(); ++it)
std::cout << ' ' << *it;

std::cout << '\n';

return 0;

To tell the compiler we want to use C++11 we can use the std=c++11, or std=gnu++11 compiler option...

[damien@web1 C11]$ g++ -Wall -std=c++11 ./test.cpp -o testc11
[damien@web1 C11]$ ./testc11
Testing some c++11 things...

myvector contains: 1 2 3 4 5
[damien@web1 C11]$


Struct Padding in GCC

In general, tight data structures with compact representations are good for low latency as they ensure that as much data as possible fits in the data caches closer to the CPU, that is, Level 1 (L1d) and Level 2 (L2d) on modern CPUs. But sometimes the compiler interferes with your structs to preserve data alignment requirements of the hardware resulting in struct sizes that are not what you'd expect. To illustrate that let's examine a short program that we'll compile with G++ .

   2:  #include <iostream>
   3:  using namespace std;
   5:  // structure padding...
   6:  struct foo1 {
   7:      char a;        /* 1 byte  */
   8:                  /* 3 bytes padding added here */
   9:      int b;        /* 4 bytes */
  10:      char c;     /* 1 byte  */
  11:                  /* 3 bytes padding added here */
  12:  };
  14:  struct foo2 {
  15:      char a1;    /* 1 byte  */
  16:      char a2;    /* 1 byte  */
  17:      char a3;    /* 1 byte  */
  18:      char a4;    /* 1 byte  */
  19:      int b;        /* 4 bytes */
  20:      char c1;     /* 1 byte  */
  21:      char c2;     /* 1 byte  */
  22:      char c3;     /* 1 byte  */
  23:      char c4;     /* 1 byte  */
  24:  };
  26:  struct foo3 {
  27:      char a;        /* 1 byte  */
  28:      short b;    /* 2 bytes */
  29:                  /* 1 byte padding added here */
  30:      int c;        /* 4 bytes */
  31:  };
  33:  //  reorder the structure members by decreasing alignment
  34:  struct foo4 {
  35:      int a;        /* 4 bytes */
  36:      short b;    /* 2 bytes */
  37:      char c;        /* 1 byte  */
  38:                  /* 1 byte padding added here */
  39:  };
  41:  // structure packing... tells g++ not to add padding and not
  42:  // to make assumptions about alignment of accesses to members
  43:  struct foo1p {
  44:      char a;        /* 1 byte  */
  45:      int b;        /* 4 bytes */
  46:      char c;     /* 1 byte  */
  47:  } __attribute__((__packed__));
  49:  int main() {
  51:      // calculate the sizes of the structs
  52:      const int sizefoo1 = sizeof(foo1);
  53:      const int sizefoo2 = sizeof(foo2);
  54:      const int sizefoo3 = sizeof(foo3);
  55:      const int sizefoo4 = sizeof(foo4);
  57:      // try again on the package variant
  58:      const int sizefoo1p = sizeof(foo1p);
  60:      // output the size results
  61:      cout << "foo1 has size = " << sizefoo1 << endl;
  62:      cout << "foo2 has size = " << sizefoo2 << endl;
  63:      cout << "foo3 has size = " << sizefoo3 << endl;
  64:      cout << "foo4 has size = " << sizefoo4 << endl;
  65:      cout << "foo1p has size = " << sizefoo1p << endl;
  67:      return 0;
  68:  }

Running this program, after compiling with g++ and the -O0 option, we get...

foo1 has size = 12
foo2 has size = 12
foo3 has size = 8
foo4 has size = 8
foo1p has size = 6

Note that the foo1 struct would at first seem to have 6 bytes, but it actually has double that, with 12 bytes. Why? The answer is struct padding. The compiler is trying to keep the CPU happy by ensuring that each member field of the struct is stored at an offset such that offset mod data_size = 0. i.e data-size aligned.

Since the second member of foo1 is an int which has a size of 4 bytes it's address need to be aligned to a 4 byte boundary. Since the first member of foo1 was a 1 byte char the compiler has little choice but to insert 3 bytes of padding after the "char a" member. The last member field in foo1 is also a char which can be aligned to any byte so you might be wondering why there is 3 bytes of padding after it. The reason is because, on x64 architectures, G++ adds padding such that the structure is aligned according to the largest member data type within the structure. In this case the largest member is 4-bytes so the total size is padded by 3 bytes to make it a total of 12 bytes, which is 4-byte aligned.

In the second example note that foo2 has no padding whatsoever (expected size = actual size = 12 bytes) because all member fields are naturally aligned to offsets equal to their data size.

In the next example, Foo3, the first two fields are data-size aligned, but the last field, c, needs 1 byte of padding preceding it to ensure it's offset is a multiple of it's size of 4 bytes.

A clever trick to minimise or avoid the padding is to order the fields in decreasing size, as shown in foo4.

Lastly we consider a case, foo1p, where we have explicitly disabled padding. Notice that the foo1p structure has __attribute__((__packed__)) specified. This is a G++ extension that tells the compiler NOT to pad the members of the structure. Such extensions are implementation specific so your millage may vary depending on the compiler and platform.

Clearly compiler writers are not stupid and they add padding for a reason! Whilst you can fiddle with the padding by re-ordering member fields or by using the above mentioned G++ extension to suppress it, the trade-off you are making is improved space compression vs possible performance issues with unaligned memory access and lesser portability of the code, as some platforms will even throw exceptions for unaligned access. Also note that if your data fields fall across cache lines the processor has little choice but to do at least 2 reads to get the data, rather than one which means you need to careful as that read operation for the member field adds many more CPU cycles and potential data stalls than the equivalent aligned read. Here's an article that tries to measure it in a semi-scientific fashion.

So if fields can be re-arranging to save save, and not span cache lines too much there can still be benefit in doing this, especially when you have a large array/vector of the structs and memory efficiency is paramount.


Hacked with the Webarh Redirect

Today I noticed that Google Chrome and Firefox were both telling me my website was a malware source, because it had been redirected to a URL at webarh.com (the nasty people) - obvious from the URL shown. This is a Red Hat Linux setup using php, mySQL and Apache, and here's what I did to diagnose and fix the problem, in case you get infected to...

First thing I did was figure out if it was a DNS hijack or a website attack. This can be done by resolving the domain name and ensuring it is the public IP of your web server.

[root@red3 /]# nslookup www.necessaryandsufficient.net

Non-authoritative answer:
www.necessaryandsufficient.net  canonical name = necessaryandsufficient.net.
Name:   necessaryandsufficient.net

[root@red3 /]# curl -s myip.dk | grep "IP Address" | egrep -o '[0-9.]+'
[root@red3 /]#

Since they both say, this indicates that it isn't a DNS attack. Something on the website is screwing things up. We can confirm this with another curl command...

[root@red3 /]# curl -i www.necessaryandsufficient.net | more

In the infected state you will see a 302 (redirect) to the nasty URL on webarh.com in the output to the command above. So something in the website config is redirecting. Given this is Apache with mod-rewrite perhaps someone has managed to setup new redirects. Let's do some simple string matching first up...

[root@red3 /]# grep -r "webarh" /var/www/

This threw up a bunch of .htaccess files that had redirect instructions to the nasty webarh.com URL. Killing those is the first step. Trying the site again, and the problem still wasn't fixed. There must be more nasty redirects lurking elsewhere, so I looked more closely to the output of the above command and saw a bunch of index.php files had some Javascript that was doing a redirect in code. Setting document.location.href = "whatever" tells the browser to go to the URL you specified (albeit with a depricated command), so you have to deftly trim out those nastry script tags with your favourite text editor.

Tried the website again and now the home page comes up but the links, Wordpress use seo-friendly SLUGs, don't work. Clearly this is because we've lost some of the URL rewrite rules from the base directory of the website. Luckily I have backups and that file hasn't changed in ages (until now) so a simple restore fixed that.

Things seem back up and running now, so let's figure out what caused all this in the first place and how to patch it. I did some Googling first then dived into the web server logs to confirm something I'd read about a phpMyAdmin security hole...

[root@red3 logs]# cd /usr/local/apache2/logs
[root@red3 logs]# grep "/scripts/setup.php" access_log

This threw up a massive scan (GET ...) of setup.php files for older versions of phpMyAdmin confirming what I'd seen posted about this attack. Most HTTP GETs had 500 "internal server" errors, but the file mentioned above returned a HTTP 200 code and the attacker posted to this file (see last 2 lines of output below). - - [24/Nov/2010:08:00:52 +1100] "GET /phpMyAdmin-2.8.0-beta1/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:00:53 +1100] "GET /phpMyAdmin-2.8.0-rc1/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:00:53 +1100] "GET /phpMyAdmin-2.8.0-rc2/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:00:53 +1100] "GET /phpMyAdmin- HTTP/1.1" 500 546 - - [24/Nov/2010:08:00:54 +1100] "GET /phpMyAdmin- HTTP/1.1" 500 546 - - [24/Nov/2010:08:00:56 +1100] "GET /phpMyAdmin- HTTP/1.1" 500 546 - - [24/Nov/2010:08:00:56 +1100] "GET /phpMyAdmin-2.8.0/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:00:58 +1100] "GET /phpMyAdmin-2.8.1/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:01:21 +1100] "GET /pma/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:01:23 +1100] "GET /roundcube/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:01:23 +1100] "GET /scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:01:24 +1100] "GET /sl2/data/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:01:27 +1100] "GET /sqlmanager/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:01:27 +1100] "GET /sqlweb/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:01:27 +1100] "GET /typo3/phpmyadmin/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:01:27 +1100] "GET /web/phpMyAdmin/scripts/setup.php HTTP/1.1" 500 546 - - [24/Nov/2010:08:01:28 +1100] "GET /web/scripts/setup.php HTTP/1.1" 500 546 - - [26/Nov/2010:01:37:02 +1100] "GET /phpMyAdmin/scripts/setup.php HTTP/1.1" 200 13723 - - [26/Nov/2010:01:37:03 +1100] "POST /phpMyAdmin/scripts/setup.php HTTP/1.1" 200 24975 - - [26/Nov/2010:03:22:27 +1100] "GET /phpMyAdmin/scripts/setup.php HTTP/1.1" 200 13744 - - [26/Nov/2010:03:22:28 +1100] "POST /phpMyAdmin/scripts/setup.php HTTP/1.1" 200 13744

Essentially, the setup file for phpMyAdmin allows arbitrary PHP code injection, and the attacker exploited this to inject all these nasty redirects. See this article for more information.

You could upgrade phpMyAdmin but I don't use it much so it's gone. What a PITA.