In general, tight data structures with compact representations are good for low latency because they ensure that as much data as possible fits in the data caches closest to the CPU, that is, Level 1 (L1d) and Level 2 (L2) on modern CPUs. But sometimes the compiler interferes with your structs to preserve the data alignment requirements of the hardware, resulting in struct sizes that are not what you'd expect. To illustrate this, let's examine a short program that we'll compile with g++.

    #include <iostream>
    using namespace std;

    // structure padding...
    struct foo1 {
        char a;     /* 1 byte  */
                    /* 3 bytes padding added here */
        int b;      /* 4 bytes */
        char c;     /* 1 byte  */
                    /* 3 bytes padding added here */
    };

    struct foo2 {
        char a1;    /* 1 byte  */
        char a2;    /* 1 byte  */
        char a3;    /* 1 byte  */
        char a4;    /* 1 byte  */
        int b;      /* 4 bytes */
        char c1;    /* 1 byte  */
        char c2;    /* 1 byte  */
        char c3;    /* 1 byte  */
        char c4;    /* 1 byte  */
    };

    struct foo3 {
        char a;     /* 1 byte  */
                    /* 1 byte padding added here */
        short b;    /* 2 bytes */
        int c;      /* 4 bytes */
    };

    // reorder the structure members by decreasing alignment
    struct foo4 {
        int a;      /* 4 bytes */
        short b;    /* 2 bytes */
        char c;     /* 1 byte  */
                    /* 1 byte padding added here */
    };

    // structure packing... tells g++ not to add padding and not
    // to make assumptions about the alignment of member accesses
    struct foo1p {
        char a;     /* 1 byte  */
        int b;      /* 4 bytes */
        char c;     /* 1 byte  */
    } __attribute__((__packed__));

    int main() {

        // calculate the sizes of the structs
        const int sizefoo1 = sizeof(foo1);
        const int sizefoo2 = sizeof(foo2);
        const int sizefoo3 = sizeof(foo3);
        const int sizefoo4 = sizeof(foo4);

        // try again on the packed variant
        const int sizefoo1p = sizeof(foo1p);

        // output the size results
        cout << "foo1 has size = " << sizefoo1 << endl;
        cout << "foo2 has size = " << sizefoo2 << endl;
        cout << "foo3 has size = " << sizefoo3 << endl;
        cout << "foo4 has size = " << sizefoo4 << endl;
        cout << "foo1p has size = " << sizefoo1p << endl;

        return 0;
    }

Compiling with g++ (using the -O0 option) and running the program, we get...

foo1 has size = 12
foo2 has size = 12
foo3 has size = 8
foo4 has size = 8
foo1p has size = 6

Note that the foo1 struct might at first seem to need only 6 bytes (1 + 4 + 1), but it actually occupies double that: 12 bytes. Why? The answer is struct padding. The compiler keeps the CPU happy by ensuring that each member field of the struct is stored at an offset such that offset mod data_size = 0, i.e. each member is data-size aligned.

Since the second member of foo1 is an int, which has a size of 4 bytes, its address needs to be aligned to a 4-byte boundary. Since the first member of foo1 is a 1-byte char, the compiler has little choice but to insert 3 bytes of padding after the "char a" member. The last member field in foo1 is also a char, which can sit at any byte, so you might be wondering why there are 3 bytes of padding after it. The reason is that, on x86-64, G++ pads the structure so that its total size is a multiple of the alignment of its largest member; that way every element stays correctly aligned when the struct is placed in an array. In this case the largest member is 4 bytes, so the total size is padded by 3 bytes to make it 12 bytes, which is a multiple of 4.
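We can double-check where the padding went with the standard offsetof macro from <cstddef>. Here's a minimal sketch; the printed offsets assume the same x86-64 / g++ layout as above:

    #include <cstddef>     // offsetof
    #include <iostream>

    struct foo1 {
        char a;     /* offset 0 */
        int b;      /* offset 4, after 3 bytes of padding */
        char c;     /* offset 8, followed by 3 bytes of tail padding */
    };

    int main() {
        std::cout << "offset of a  = " << offsetof(foo1, a) << std::endl;   // 0
        std::cout << "offset of b  = " << offsetof(foo1, b) << std::endl;   // 4
        std::cout << "offset of c  = " << offsetof(foo1, c) << std::endl;   // 8
        std::cout << "sizeof(foo1) = " << sizeof(foo1)      << std::endl;   // 12
        return 0;
    }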

In the second example, note that foo2 has no padding whatsoever (expected size = actual size = 12 bytes) because every member naturally falls at an offset that is a multiple of its data size.
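If your code relies on a struct being padding-free, a compile-time check is a cheap safety net. A small sketch using static_assert (available from C++11 onwards), assuming the foo2 layout above:

    struct foo2 {
        char a1, a2, a3, a4;    /* 4 bytes */
        int b;                  /* 4 bytes */
        char c1, c2, c3, c4;    /* 4 bytes */
    };

    // fails to compile if the compiler ever inserts padding into foo2
    static_assert(sizeof(foo2) == 12, "foo2 unexpectedly contains padding");

    int main() { return 0; }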

In the next example, foo3, the 1 byte of padding goes between the first two fields: "char a" occupies offset 0, so 1 byte of padding is inserted so that "short b" starts at offset 2, a multiple of its 2-byte size. The "int c" member then falls naturally at offset 4, a multiple of its 4-byte size, giving a total of 8 bytes.
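Again, offsetof confirms where the hole sits. This is just a sketch, and the offsets assume the usual x86-64 / g++ layout:

    #include <cstddef>     // offsetof
    #include <iostream>

    struct foo3 {
        char a;     /* offset 0 */
        short b;    /* offset 2, after 1 byte of padding */
        int c;      /* offset 4, no padding needed */
    };

    int main() {
        std::cout << "offset of b  = " << offsetof(foo3, b) << std::endl;   // 2
        std::cout << "offset of c  = " << offsetof(foo3, c) << std::endl;   // 4
        std::cout << "sizeof(foo3) = " << sizeof(foo3)      << std::endl;   // 8
        return 0;
    }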

A clever trick to minimise or avoid padding is to order the fields by decreasing alignment (largest first), as shown in foo4.
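Applying the same trick to foo1 itself, a hypothetical reordering (call it foo1_reordered) shrinks it from 12 bytes to 8 on the same platform:

    #include <iostream>

    // hypothetical reordering of foo1: the int first, the chars after
    struct foo1_reordered {
        int b;      /* 4 bytes, offset 0 */
        char a;     /* 1 byte,  offset 4 */
        char c;     /* 1 byte,  offset 5 */
                    /* 2 bytes of tail padding to reach a multiple of 4 */
    };

    int main() {
        std::cout << "foo1_reordered has size = " << sizeof(foo1_reordered) << std::endl;   // 8
        return 0;
    }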

Lastly we consider a case, foo1p, where we have explicitly disabled padding. Notice that the foo1p structure has __attribute__((__packed__)) specified. This is a G++ extension that tells the compiler NOT to pad the members of the structure. Such extensions are implementation specific, so your mileage may vary depending on the compiler and platform.
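If you need packing on more than one compiler, #pragma pack is another implementation-specific mechanism that GCC, Clang and MSVC all happen to honour. Here's a sketch of a hypothetical pragma-based equivalent of foo1p (still non-standard, so treat it as a hint rather than a guarantee):

    #include <iostream>

    #pragma pack(push, 1)   // no padding between members from here on
    struct foo1p_pragma {   // hypothetical pragma-based equivalent of foo1p
        char a;     /* 1 byte  */
        int b;      /* 4 bytes, now potentially misaligned */
        char c;     /* 1 byte  */
    };
    #pragma pack(pop)       // restore the previous packing

    int main() {
        std::cout << "foo1p_pragma has size = " << sizeof(foo1p_pragma) << std::endl;   // 6
        return 0;
    }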

Clearly compiler writers are not stupid, and they add padding for a reason! You can fiddle with the padding by re-ordering member fields or suppress it with the G++ extension mentioned above, but the trade-off is better space efficiency versus possible performance penalties from unaligned memory accesses and reduced portability, as some platforms will even throw exceptions on unaligned access. Also note that if a member field straddles a cache line, the processor has little choice but to issue at least two reads instead of one to fetch it, so that read costs many more CPU cycles, and potential data stalls, than the equivalent aligned read. Here's an article that tries to measure it in a semi-scientific fashion.
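As a rough, unscientific sketch of the trade-off, the toy benchmark below sums an int member across a large array of padded structs and again across packed ones. The struct names and record count are made up for the example, and the numbers will vary wildly with compiler, flags and hardware; on some machines the packed version can even win because twice as many records fit per cache line, on others the misaligned loads hurt:

    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct aligned_rec { char a; int b; char c; };                              // 12 bytes
    struct packed_rec  { char a; int b; char c; } __attribute__((__packed__));  //  6 bytes

    template <typename Rec>
    long long sum_b(const std::vector<Rec>& v) {
        long long total = 0;
        for (const Rec& r : v) total += r.b;    // the member read we want to time
        return total;
    }

    template <typename Rec>
    void time_it(const char* label) {
        std::vector<Rec> v(10000000);
        for (std::size_t i = 0; i < v.size(); ++i) v[i].b = static_cast<int>(i);

        auto start = std::chrono::steady_clock::now();
        long long total = sum_b(v);
        auto stop = std::chrono::steady_clock::now();

        auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
        std::cout << label << ": sum = " << total << ", time = " << us << " us" << std::endl;
    }

    int main() {
        time_it<aligned_rec>("aligned (12-byte) structs");
        time_it<packed_rec>("packed   (6-byte) structs");
        return 0;
    }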

So if fields can be re-arranged to save space without spanning cache lines too much, there can still be a benefit in doing this, especially when you have a large array/vector of the structs and memory efficiency is paramount.
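For example, for a million-element array, the reordered layout from the earlier sketch saves about a third of the memory (the exact figures assume the 12-byte and 8-byte sizes seen above):

    #include <cstddef>
    #include <iostream>

    struct foo1           { char a; int b; char c; };   // 12 bytes with padding
    struct foo1_reordered { int b; char a; char c; };   //  8 bytes with padding

    int main() {
        const std::size_t n = 1000000;
        std::cout << "1M foo1:           " << n * sizeof(foo1)           << " bytes" << std::endl;   // ~12 MB
        std::cout << "1M foo1_reordered: " << n * sizeof(foo1_reordered) << " bytes" << std::endl;   // ~8 MB
        return 0;
    }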
