Came across some good work from Daniel Lemire, CS Professor at University of Quebec, for highly-efficient, fast compression/encoding/decoding of integers. Full details here:
It's a very nice use of vectorisation and Streaming SIMD Extensions (SSE2/3/4) in modern chipsets. Their SIMD-BP128 vectorised scheme makes use of vectorisation for bit packing prior to compress/decompress operations, and also for the delta encoding phase where, instead of storing the native integers, the differences are stored, with these differences calculated in a vectorised manner. Their scheme is not patented, and they have posted their code on github. You can find C, C++ and Java libraries for usage.
For the specific case of sorted integers, they have published a separate paper, given below, that also applies algorithms based on available SIMD instructions to boost compression speed. They claim their S4-BP128-D4 scheme "uses as little as 0.7 CPU cycles per decoded integer while still providing state-of-the-art compression".
Conveniently they provide the code for it on github: https://github.com/lemire/SIMDCompressionAndIntersection.
In finance most market data is effectively an append-only, immutable time-series where every record has a new timestamp, but timestamps are just long offsets from a time epoch (such as midnight 1/1/1900). Thus, using columnar storage, we get a column of "time" values which is a vector of non-descending integers/longs. These are very amenable to delta-encoding or some more specialised form of sorted integer compression.