A fun benchmarking framework for testing memory allocators, numeric representations, and SIMD optimizations in the context of high-performance financial systems.
A limit order book implementation with swappable memory allocators:
- Standard: Uses `std::allocator` (malloc/free)
- Pool: Fixed-size block allocator with O(1) alloc/free
- Arena: Bump allocator with bulk deallocation
Features:
- Price-time priority matching
- Configurable numeric representations (int64, double, fixed-point)
- Overflow handling modes (undefined, checked, saturating, widening)
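
Price-time priority means the best-priced resting order trades first, and orders at the same price trade in arrival order. A minimal sketch of the ask side follows, assuming integer tick prices; `Order`, `Fill`, and `AskBook` are hypothetical names for illustration, not the framework's actual classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

// Hypothetical types for illustration only.
struct Order {
    std::uint64_t id;
    std::int64_t  price_ticks;  // integer tick price
    std::int64_t  quantity;
};

struct Fill {
    std::uint64_t resting_id;
    std::int64_t  price_ticks;
    std::int64_t  quantity;
};

class AskBook {
public:
    // A resting sell order joins the back of its price level (time priority).
    void add(const Order& o) {
        assert(o.quantity > 0);
        levels_[o.price_ticks].push_back(o);
    }

    // Match an incoming buy: lowest ask price first, oldest order first
    // within a price level. Stops when the buy is filled or no ask is
    // priced at or below the limit.
    std::vector<Fill> match_buy(std::int64_t limit_ticks, std::int64_t quantity) {
        std::vector<Fill> fills;
        while (quantity > 0 && !levels_.empty()) {
            auto best = levels_.begin();           // std::map iterates prices ascending
            if (best->first > limit_ticks) break;  // best ask is above the buy limit
            auto& queue = best->second;
            Order& resting = queue.front();
            const std::int64_t traded = std::min(quantity, resting.quantity);
            fills.push_back({resting.id, best->first, traded});
            quantity -= traded;
            resting.quantity -= traded;
            if (resting.quantity == 0) queue.pop_front();
            if (queue.empty()) levels_.erase(best);
        }
        return fills;
    }

private:
    std::map<std::int64_t, std::deque<Order>> levels_;
};
```

A node-based `std::map` allocates on every new price level, which is exactly the kind of allocation pattern the swappable allocators are meant to exercise.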
Portfolio Greeks computation with multiple implementation strategies:
- Scalar Naive: Simple loop
- Scalar Unrolled: 4x loop unrolling
- Kahan Summation: Compensated summation for accuracy
- Pairwise: Divide-and-conquer summation
- AVX2: 256-bit SIMD with FMA
- AVX-512: 512-bit SIMD
- AVX2 + Kahan: SIMD with compensation
Includes DAZ/FTZ (Denormals-Are-Zero/Flush-To-Zero) mode testing.
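
To make the scalar strategies concrete, here is a minimal sketch of naive, Kahan, and pairwise summation. The function names and the `make_accumulation_input` helper are assumptions for illustration; the framework's own implementations may differ (for example in block size or unrolling).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Baseline: left-to-right accumulation; rounding error can grow with n.
double sum_naive(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum;
}

// Kahan compensated summation: `c` carries the low-order bits lost by each add.
// Do not build this with -ffast-math, which allows the compiler to
// simplify the compensation term away.
double sum_kahan(const std::vector<double>& v) {
    double sum = 0.0;
    double c = 0.0;
    for (double x : v) {
        const double y = x - c;   // apply the correction from the previous step
        const double t = sum + y; // low-order bits of y may be lost here
        c = (t - sum) - y;        // recover what was lost (negated)
        sum = t;
    }
    return sum;
}

// Pairwise summation: split in half recursively, sum small blocks directly.
double sum_pairwise(const double* data, std::size_t n) {
    if (n <= 8) {
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) sum += data[i];
        return sum;
    }
    const std::size_t half = n / 2;
    return sum_pairwise(data, half) + sum_pairwise(data + half, n - half);
}

// Hypothetical stress input: 1.0 followed by many terms that are each
// too small to change a running sum of 1.0 on their own.
std::vector<double> make_accumulation_input(std::size_t small_terms) {
    std::vector<double> v(small_terms + 1, 1e-16);
    v[0] = 1.0;
    return v;
}
```

On this input the small terms add up to about `1e-10`, which a plain left-to-right loop in double precision tends to drop entirely, while the compensated and pairwise versions retain it.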
Binary protocol parser with fault injection for robustness testing:
- Simulated ITCH-style market data protocol
- Length validation and overflow protection
- Sequence gap detection
- Fault injection: bit flips, truncation, invalid lengths/types
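
The validation steps above can be sketched as a single-message parser. The frame layout, the `kMaxBody` cap, the accepted type bytes, and all names here are assumptions made for illustration; they are not the framework's wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class ParseStatus { Ok, Truncated, InvalidLength, InvalidType, SequenceGap };

struct ParseResult {
    ParseStatus   status;
    std::size_t   consumed;  // bytes to skip; 0 when the frame cannot be delimited
    std::uint64_t sequence;  // sequence number read from the message, if any
};

// Assumed frame layout (hypothetical, ITCH-like, big-endian):
//   [u16 body length][u8 type][u64 sequence][payload ...]
constexpr std::size_t kLengthField = 2;
constexpr std::size_t kMinBody = 1 + 8;  // type + sequence
constexpr std::size_t kMaxBody = 512;    // arbitrary cap chosen for this sketch

inline std::uint64_t read_be(const std::uint8_t* p, std::size_t n) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) v = (v << 8) | p[i];
    return v;
}

// Illustrative set of accepted message types.
inline bool known_type(std::uint8_t t) { return t == 'A' || t == 'D' || t == 'E'; }

ParseResult parse_one(const std::uint8_t* data, std::size_t size,
                      std::uint64_t expected_seq) {
    if (size < kLengthField) return {ParseStatus::Truncated, 0, 0};
    const std::size_t body = static_cast<std::size_t>(read_be(data, kLengthField));
    if (body < kMinBody || body > kMaxBody) return {ParseStatus::InvalidLength, 0, 0};
    // Compare against the remaining byte count instead of forming data + body,
    // so a corrupted length can never move an index past the buffer.
    if (size - kLengthField < body) return {ParseStatus::Truncated, 0, 0};
    const std::size_t frame = kLengthField + body;
    if (!known_type(data[kLengthField])) return {ParseStatus::InvalidType, frame, 0};
    const std::uint64_t seq = read_be(data + kLengthField + 1, 8);
    if (seq != expected_seq) return {ParseStatus::SequenceGap, frame, seq};
    return {ParseStatus::Ok, frame, seq};
}
```

Fault injection then amounts to mutating a valid buffer (flip a bit, shorten `size`, overwrite the length or type byte) and checking that the parser returns an error status instead of reading out of bounds.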
Pure allocator performance comparison:
- Pool Allocator: Fixed-size blocks, free list
- Arena Allocator: Bump pointer, bulk free
- Slab Allocator: Object-specific caching
- Size-Class Allocator: jemalloc-style size classes
- Thread-Safe Pool: Lock-free concurrent allocator
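
As an illustration of the fixed-size pool idea, here is a minimal single-threaded sketch that keeps the free list inside the unused blocks themselves. `FixedPool` is a hypothetical name, and the benchmarked allocator may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

class FixedPool {
public:
    FixedPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          storage_(block_size_ * block_count) {
        // Thread every block onto the free list, last block first,
        // so that block 0 ends up at the head.
        for (std::size_t i = block_count; i-- > 0;) {
            deallocate(storage_.data() + i * block_size_);
        }
    }

    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // O(1): pop the head of the free list; nullptr when the pool is exhausted.
    void* allocate() noexcept {
        if (head_ == nullptr) return nullptr;
        Node* n = head_;
        head_ = n->next;
        return n;
    }

    // O(1): push the block back. `p` must have come from this pool.
    void deallocate(void* p) noexcept {
        head_ = new (p) Node{head_};
    }

private:
    struct Node { Node* next; };

    // Blocks must be able to hold a Node and stay max-aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    std::size_t block_size_;
    std::vector<unsigned char> storage_;
    Node* head_ = nullptr;
};
```

Because a freed block is pushed to the head, the next allocation reuses the most recently freed memory, which tends to be cache-warm.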
Requirements:
- CMake 3.16+
- C++20 compiler (GCC 10+, Clang 12+)
- Optional: AVX2/AVX-512 capable CPU
```bash
# Standard build
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)

# With sanitizers (for debugging)
cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_SANITIZERS=ON ..
make -j$(nproc)

# With AVX-512
cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_AVX512=ON ..
make -j$(nproc)
```

```bash
# Order book benchmark
./order_book_bench

# Risk engine benchmark
./risk_engine_bench

# With DAZ/FTZ enabled
./risk_engine_bench --daz-ftz

# Subnormal stress test
./risk_engine_bench --subnormal

# Cancellation stress test
./risk_engine_bench --cancellation

# Feed parser benchmark
./feed_parser_bench

# Allocator benchmark
./allocator_bench
```

```bash
echo '{"config": {"allocator": "pool", "operation_count": 100000}}' | ./order_book_bench --json
echo '{"config": {"position_count": 100000, "daz_ftz": true}}' | ./risk_engine_bench --json
```

```bash
cd python
pip install -r requirements.txt

# Run all benchmarks
python benchmark_harness.py --build-dir ../build

# Run specific benchmark
python benchmark_harness.py --test allocator
python benchmark_harness.py --test orderbook --operations 500000
python benchmark_harness.py --test risk --positions 1000000

# Property-based tests
python property_tests.py
```

```cpp
enum class OverflowHandling {
    UNDEFINED,         // Let it wrap (UB in C++)
    CHECKED_ABORT,     // Abort on overflow
    CHECKED_SATURATE,  // Clamp to INT64_MAX/MIN
    WIDENING,          // Use __int128 intermediate
};
```

```cpp
enum class PriceRepresentation {
    INT64_TICKS,        // Integer tick counts
    DOUBLE_NATIVE,      // IEEE 754 double
    DOUBLE_DAZ_FTZ,     // Double with denormals flushed
    FIXED_POINT_32_32,  // 32.32 fixed point
};
```

Compare allocation latency across different strategies:
```
Small Objects (64 bytes):
  malloc/free:    X.XX ms    Y.YYe+07 ops/s
  Pool:           X.XX ms    Y.YYe+08 ops/s  (10x faster)
  Arena:          X.XX ms    Y.YYe+09 ops/s  (100x faster)
```
Compare throughput and accuracy:
```
Method          Time (μs)    Throughput     Rel. Error    ULP
scalar_naive    XXX          X.XXe+08 /s    X.XXe-16      0
avx2            XXX          X.XXe+09 /s    X.XXe-16      0
kahan           XXX          X.XXe+08 /s    0.00e+00      0
```
- Subnormal Performance: Compare with/without DAZ/FTZ (10x+ speedup typical)
- Cancellation: Alternating large positive/negative values reveal precision loss
- Accumulation: Long summations show drift in naive implementations
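
For the subnormal comparison, DAZ/FTZ is a per-thread CPU setting on x86 (two bits in the MXCSR register). A minimal sketch of enabling it for a scope is shown here, assuming an x86 target with SSE; `ScopedDazFtz` and `third_of_min_normal` are hypothetical helpers, not part of the framework, and on other architectures the guard compiles to a no-op.

```cpp
#include <cassert>
#include <cfloat>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE, _mm_getcsr, _mm_setcsr
#define SKETCH_HAVE_MXCSR 1
#else
#define SKETCH_HAVE_MXCSR 0
#endif

// RAII guard: turns on FTZ (subnormal results become zero) and DAZ
// (subnormal inputs are read as zero) and restores the previous
// MXCSR value on destruction.
class ScopedDazFtz {
public:
    ScopedDazFtz() {
#if SKETCH_HAVE_MXCSR
        saved_ = _mm_getcsr();
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
#endif
    }
    ~ScopedDazFtz() {
#if SKETCH_HAVE_MXCSR
        _mm_setcsr(saved_);
#endif
    }
    ScopedDazFtz(const ScopedDazFtz&) = delete;
    ScopedDazFtz& operator=(const ScopedDazFtz&) = delete;

    static constexpr bool supported() { return SKETCH_HAVE_MXCSR != 0; }

private:
    unsigned int saved_ = 0;
};

// Divides the smallest normal double by 3; the exact result is subnormal.
// `volatile` forces the division to run at run time under the current mode.
inline double third_of_min_normal() {
    volatile double x = DBL_MIN;
    volatile double r = x / 3.0;
    return r;
}
```

Under the default mode the helper returns a small positive subnormal; inside a `ScopedDazFtz` scope on supported hardware the same computation is expected to return zero, which is the accuracy cost paid for the speedup.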
Design sequences where:
- Total quantity overflows 32-bit
- Price × quantity overflows 64-bit
- Compare checked vs unchecked behavior
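
The checked, saturating, and widening behaviors can be sketched with the GCC/Clang overflow builtins. These helper functions are illustrative assumptions rather than the framework's API; `__builtin_mul_overflow`, `__builtin_add_overflow`, and `__int128` are compiler extensions, not standard C++.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Checked: returns false on overflow instead of invoking undefined behavior.
// A CHECKED_ABORT-style mode could call std::abort() when this returns false.
inline bool checked_mul(std::int64_t a, std::int64_t b, std::int64_t& out) {
    return !__builtin_mul_overflow(a, b, &out);
}

// Checked accumulation of a 32-bit running quantity.
inline bool checked_add_qty(std::int32_t total, std::int32_t qty, std::int32_t& out) {
    return !__builtin_add_overflow(total, qty, &out);
}

// Saturating: clamp to INT64_MAX or INT64_MIN depending on the sign
// of the mathematically exact product.
inline std::int64_t saturating_mul(std::int64_t a, std::int64_t b) {
    std::int64_t out = 0;
    if (!__builtin_mul_overflow(a, b, &out)) return out;
    const bool negative = (a < 0) != (b < 0);
    return negative ? std::numeric_limits<std::int64_t>::min()
                    : std::numeric_limits<std::int64_t>::max();
}

// Widening: the product of two 64-bit values always fits in 128 bits.
inline __int128 widening_mul(std::int64_t a, std::int64_t b) {
    return static_cast<__int128>(a) * b;
}
```

A test sequence can then feed the same price and quantity pairs through each helper and compare where the results diverge.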
Structures are cache-line aligned (64 bytes) where beneficial:
```cpp
struct alignas(64) Position {
    double quantity;
    double delta_per_unit;
    // ...
};
```

Pool allocator:

```
┌─────────────────────────────────────────┐
│  Free List Head → Block → Block → NULL  │
├─────────────────────────────────────────┤
│  [Block 0] [Block 1] [Block 2] ...      │
│  Fixed size, contiguous memory          │
└─────────────────────────────────────────┘
```

Arena allocator:

```
┌─────────────────────────────────────────┐
│  Memory Region                          │
│  [████████████████░░░░░░░░░░░░░░░░░░░]  │
│                   ↑                     │
│             Bump Pointer                │
│  No individual free, bulk reset only    │
└─────────────────────────────────────────┘
```
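
The bump-pointer scheme can be sketched in a few lines. This is a minimal single-threaded illustration under the assumption of one contiguous buffer; `Arena` is a hypothetical name and the benchmarked allocator may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class Arena {
public:
    explicit Arena(std::size_t capacity) : buffer_(capacity) {}

    // Bump allocation: align the current position, then advance it.
    // Returns nullptr when the region is full. `alignment` must be a power of two.
    void* allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) noexcept {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const auto base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t current = base + offset_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > buffer_.size() || size > buffer_.size() - start) return nullptr;
        offset_ = start + size;
        return buffer_.data() + start;
    }

    // Bulk deallocation: every earlier allocation becomes invalid at once.
    void reset() noexcept { offset_ = 0; }

    std::size_t used() const noexcept { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_ = 0;
};
```

There is no per-object free: callers allocate for the duration of a batch and call `reset()` when the batch ends, which is why the arena suits short-lived, same-lifetime objects.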
| Allocator | 10K ops | 100K ops | 1M ops |
|---|---|---|---|
| Standard | ~1.5 | ~1.2 | ~1.0 |
| Pool | ~3.0 | ~2.5 | ~2.0 |
| Arena | ~5.0 | ~4.0 | ~3.0 |

| Method | 10K pos | 100K pos | 1M pos |
|---|---|---|---|
| scalar | ~50M | ~50M | ~50M |
| avx2 | ~200M | ~200M | ~200M |
| avx512 | ~400M | ~400M | ~400M |
| kahan | ~25M | ~25M | ~25M |

MIT License - See LICENSE file