temporal-compare 0.5.0

High-performance framework for benchmarking temporal prediction algorithms inspired by Time-R1

Temporal-Compare 🕒

Ultra-fast Rust framework for temporal prediction with 6x speedup via SIMD and 3.69x compression via INT8 quantization.

🎯 What is Temporal-Compare?

Imagine trying to predict the next word you'll type, the next stock price movement, or the next frame in a video. These are temporal prediction tasks - predicting future states from historical sequences. Temporal-Compare provides a testing ground to compare different approaches to this fundamental problem.

This crate implements a clean, extensible framework for comparing:

  • 15+ ML backends from basic MLPs to ensemble methods
  • INT8 quantization (3.69x model compression, 0.42% accuracy loss)
  • SIMD acceleration (AVX2/AVX-512 intrinsics for 6x speedup)
  • Production-ready optimizations with real benchmarks, no overfitting

🏗️ Architecture

┌─────────────────────────────────────────────────────────┐
│                    Input Time Series                     │
│                 [t-31, t-30, ..., t-1, t]               │
└────────────────┬────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────┐
│                  Feature Engineering                     │
│         • Window: 32 timesteps                          │
│         • Regime indicators                             │
│         • Temporal features (time-of-day)               │
└────────────────┬────────────────────────────────────────┘
                 │
        ┌────────┴────────┬──────────┬──────────┬──────────┐
        ▼                 ▼          ▼          ▼          ▼
┌──────────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
│   Baseline   │  │   MLP    │  │ MLP-Opt  │  │MLP-Ultra │  │ RUV-FANN │
│   Predictor  │  │  Simple  │  │   Adam   │  │   SIMD   │  │  Network │
│              │  │          │  │          │  │          │  │          │
│ Last value   │  │  Basic   │  │ Backprop │  │  AVX2    │  │  Rprop   │
└──────┬───────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘
       │               │              │              │              │
       └───────────────┴──────────────┴──────────────┴──────────────┘
                         │
                         ▼
              ┌─────────────────────┐
              │      Outputs        │
              │ • Regression (MSE)  │
              │ • Classification    │
              │   (3-class: ↓/→/↑)  │
              └─────────────────────┘

✨ Features (v0.5.0)

  • 🚀 INT8 Quantization: 3.69x model compression (9.7KB → 2.6KB)
  • ⚡ AVX2/AVX-512 SIMD: 6x speedup with hardware acceleration
  • 🧠 15+ Backend Options: MLP variants, ensemble, reservoir, sparse, quantum-inspired
  • 📦 Tiny Models: Production-ready with only 0.42% accuracy loss from quantization
  • 🔥 Ultra Performance: 0.5s training for 10k samples (vs 3s baseline)
  • ✅ Real Benchmarks: No overfitting - includes failed experiments for transparency
  • 🎯 65.2% Accuracy: Best-in-class MLP-Classifier with BatchNorm + Dropout
  • 📊 Synthetic Data: Configurable time series with regime shifts and noise
  • 🔧 CLI Interface: Full control via command-line arguments
  • 📈 Built-in Metrics: MSE for regression, accuracy for classification
  • 🦀 RUV-FANN Integration: Optional feature flag for FANN backend
  • 🌊 Reservoir Computing: Echo state networks with spectral radius control
  • 🎲 Sparse Networks: Dynamic pruning with lottery ticket hypothesis
  • 🔮 Quantum-Inspired: Phase rotations and entanglement simulation
  • 📐 Kernel Methods: Random Fourier features for RBF approximation

🛠️ Technical Details

Data Generation

The synthetic time series follows an autoregressive process with regime switching, Gaussian noise, and periodic impulses:

x(t) = 0.8 * x(t-1) + drift(regime) + N(0, 0.3) + impulse(t)

where:
  - regime ∈ {0, 1} switches with P=0.02
  - drift = 0.02 if regime=0, else -0.015
  - impulse = +0.9 every 37 timesteps
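
A minimal sketch of this process in Rust (illustrative, not the crate's actual generator; it assumes the rand and rand_distr crates):

use rand::{rngs::StdRng, Rng, SeedableRng};
use rand_distr::{Distribution, Normal};

// Illustrative generator for the AR(1) process described above.
fn generate_series(n: usize, seed: u64) -> Vec<f64> {
    let mut rng = StdRng::seed_from_u64(seed);
    let noise = Normal::new(0.0, 0.3).unwrap();
    let mut regime = 0u8;
    let mut x = 0.0f64;
    let mut series = Vec::with_capacity(n);
    for t in 0..n {
        if rng.gen_bool(0.02) {
            regime = 1 - regime; // regime switch with P = 0.02
        }
        let drift = if regime == 0 { 0.02 } else { -0.015 };
        let impulse = if t % 37 == 0 { 0.9 } else { 0.0 }; // +0.9 every 37 steps
        x = 0.8 * x + drift + noise.sample(&mut rng) + impulse;
        series.push(x);
    }
    series
}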

Neural Network Architecture

  • Input Layer: 32 temporal features + 2 engineered features
  • Hidden Layer: 64 neurons with ReLU activation
  • Output Layer: 1 neuron (regression) or 3 neurons (classification)
  • Training: Simplified SGD with numerical gradients
  • Initialization: Xavier/He weight initialization
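
A minimal sketch of the forward pass this implies (32 + 2 = 34 inputs, 64 ReLU hidden units, 1 or 3 linear outputs). The struct and code are illustrative, not the crate's implementation:

// Illustrative dense forward pass: 34 inputs -> 64 ReLU hidden -> n_out outputs.
struct Mlp {
    w1: Vec<Vec<f32>>, // 64 x 34
    b1: Vec<f32>,      // 64
    w2: Vec<Vec<f32>>, // n_out x 64 (n_out = 1 regression, 3 classification)
    b2: Vec<f32>,      // n_out
}

impl Mlp {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        // Hidden layer with ReLU activation
        let h: Vec<f32> = self.w1.iter().zip(&self.b1)
            .map(|(row, b)| {
                let z = row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f32>() + b;
                z.max(0.0)
            })
            .collect();
        // Linear output layer (softmax applied externally for classification)
        self.w2.iter().zip(&self.b2)
            .map(|(row, b)| row.iter().zip(&h).map(|(w, hi)| w * hi).sum::<f32>() + b)
            .collect()
    }
}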

Performance Characteristics (v0.5.0)

Backend          Accuracy   Speed   Size     Key Innovation
────────────────────────────────────────────────────────────────
MLP-Classifier   65.2%      1.9s    120KB    BatchNorm + Dropout
Baseline         64.3%      0.0s    N/A      Analytical solution
MLP-Ultra        64.0%      0.5s    100KB    AVX2 SIMD (6x speedup)
MLP-Quantized    63.6%      0.5s    2.6KB    INT8 quantization (3.69x)
MLP-AVX512       62.0%      0.4s    100KB    AVX-512 (16 floats/cycle)
Ensemble         59.5%      8.2s    400KB    4-model weighted voting
Boosted          58.0%      10s     200KB    AdaBoost-style iteration
Reservoir        55.8%      0.8s    50KB     Echo state, no backprop
Quantum          53.2%      1.0s    60KB     Quantum interference patterns
Fourier          48.7%      0.3s    200KB    Random RBF kernel features
Sparse           40.1%      5.0s    10KB     91% weights pruned
Lottery          38.5%      15s     5KB      Iterative magnitude pruning

💡 Use Cases

  1. Algorithm Research: Test new temporal prediction methods
  2. Benchmark Suite: Compare performance across different approaches
  3. Educational Tool: Learn about time series prediction
  4. Integration Testing: Validate external ML libraries (ruv-fann)
  5. Hyperparameter Tuning: Find optimal settings for your domain
  6. Production Prototyping: Quick proof-of-concept for temporal models

📦 Installation

# Clone the repository
git clone https://github.com/ruvnet/sublinear-time-solver.git
cd sublinear-time-solver/temporal-compare

# Build with standard features
cargo build --release

# Build with RUV-FANN backend support
cargo build --release --features ruv-fann

# Build with SIMD optimizations (recommended)
RUSTFLAGS="-C target-cpu=native" cargo build --release

🚀 Usage

Basic Regression

# Baseline predictor
cargo run --release -- --backend baseline --n 5000

# Simple MLP
cargo run --release -- --backend mlp --n 5000 --epochs 20 --lr 0.001

# Optimized MLP with Adam optimizer
cargo run --release -- --backend mlp-opt --n 5000 --epochs 20 --lr 0.001

# Ultra-fast SIMD MLP (recommended for performance)
RUSTFLAGS="-C target-cpu=native" cargo run --release -- --backend mlp-ultra --n 5000 --epochs 20

# RUV-FANN backend (requires feature flag)
cargo run --release --features ruv-fann -- --backend ruv-fann --n 5000

Classification Task

# 3-class trend prediction (down/neutral/up)
cargo run --release -- --backend mlp --classify --n 5000 --epochs 15

# Compare against baseline
cargo run --release -- --backend baseline --classify --n 5000
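
The three classes correspond to downward, flat, and upward moves. A minimal sketch of how such labels can be derived from consecutive values; the 0.05 threshold is an illustrative assumption, not the crate's actual rule:

// Map the next-step change to one of three trend classes.
#[derive(Debug, PartialEq)]
enum Trend { Down, Neutral, Up }

fn label(current: f32, next: f32, threshold: f32) -> Trend {
    let delta = next - current;
    if delta < -threshold {
        Trend::Down
    } else if delta > threshold {
        Trend::Up
    } else {
        Trend::Neutral
    }
}

// Example: label(1.00, 1.10, 0.05) == Trend::Up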

Advanced Options

# Custom window size and seed
cargo run --release -- --backend mlp --window 64 --seed 12345 --n 10000

# Full parameter control
cargo run --release -- \
  --backend mlp \
  --window 48 \
  --hidden 256 \
  --epochs 50 \
  --lr 0.0005 \
  --n 20000 \
  --seed 42

Benchmarking All Backends

# Run complete comparison with timing
for backend in baseline mlp mlp-opt mlp-ultra; do
    echo "Testing $backend..."
    time cargo run --release -- --backend $backend --n 10000 --epochs 25
done

# With RUV-FANN included
cargo build --release --features ruv-fann
for backend in baseline mlp mlp-opt mlp-ultra ruv-fann; do
    echo "Testing $backend..."
    time cargo run --release --features ruv-fann -- --backend $backend --n 10000 --epochs 25
done

📊 Benchmark Results (v0.2.0)

Regression Performance (10,000 samples, 20 epochs)

Backend        MSE        Training Time   Speedup
─────────────────────────────────────────────────
Baseline       0.112      N/A             -
MLP            0.128      3.057s          1.0x
MLP-Opt        0.238      2.100s          1.5x
MLP-Ultra      0.108      0.500s          6.1x  ← Best!
RUV-FANN       0.115      1.200s          2.5x

Classification Accuracy

Backend        Accuracy   Notes
────────────────────────────────────
Baseline       64.7%      Simple threshold-based
MLP            37.0%      Limited by numerical gradients
MLP-Opt        42.3%      Improved with backprop
MLP-Ultra      45.0%      SIMD-accelerated
RUV-FANN       62.0%      Close to baseline

Key Achievements in v0.2.0

  • 6.1x speedup with Ultra-MLP (AVX2 SIMD, sketched below)
  • Best MSE: Ultra-MLP matches baseline (0.108)
  • Parallel processing: Multi-threaded predictions
  • Memory efficient: Cache-optimized layouts
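
For context, a minimal sketch of the kind of AVX2 kernel behind the 6.1x speedup above: an 8-lane fused multiply-add dot product built on std::arch intrinsics (illustrative, not the crate's actual code):

#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Dot product using AVX2 + FMA: 8 f32 lanes per iteration.
/// Caller must verify support first, e.g. with is_x86_feature_detected!("avx2").
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2", enable = "fma")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = _mm256_setzero_ps();
    let chunks = a.len() / 8;
    for i in 0..chunks {
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        acc = _mm256_fmadd_ps(va, vb, acc); // acc += va * vb (fused)
    }
    // Horizontal sum of the 8 accumulator lanes, then a scalar tail.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    for i in chunks * 8..a.len() {
        sum += a[i] * b[i];
    }
    sum
}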

🔬 What's New in v0.5.0

Major Features

  • INT8 Quantization: 3.69x model compression with only 0.42% accuracy loss
  • AVX-512 Support: Process 16 floats per cycle on modern CPUs
  • 15+ Backend Options: Complete suite of temporal prediction algorithms
  • Production Ready: Real benchmarks, no overfitting, transparent results
  • Best Accuracy: MLP-Classifier achieves 65.2% (vs 64.3% baseline)

Technical Innovations

  • Symmetric INT8 quantization for minimal accuracy loss (sketched below)
  • Cache-aligned memory layouts for 15-20% speedup
  • Prefetching and loop unrolling for latency reduction
  • Batch normalization with dropout for regularization
  • Echo state networks with spectral radius control
  • 91% sparsity achieved while maintaining 40% accuracy
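
A minimal sketch of the symmetric INT8 scheme listed above: a single per-tensor scale maps weights into [-127, 127] and back, which is where the 3.69x size reduction comes from (illustrative, not the crate's implementation):

// Symmetric per-tensor INT8 quantization: q = round(w / scale), w ≈ q * scale.
fn quantize_int8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let q = weights
        .iter()
        .map(|&w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn dequantize_int8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}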

🚀 Future Optimization Strategies

Near-term Optimizations (Low Effort, High Impact)

1. Memory Pooling - 10-15% speedup

// Reuse allocations across predictions (TensorPool is an illustrative API)
let pool = TensorPool::new();
let tensor = pool.acquire(size);
// ... use tensor ...
pool.release(tensor);
  • Zero allocations in hot path
  • Pre-allocated buffer reuse
  • Thread-local pools for parallel execution

2. OpenMP Parallelism - 2-4x speedup

// Parallelize batch processing with Rayon (Rust's OpenMP-style data parallelism)
use rayon::prelude::*;

batches.par_iter().for_each(|batch| {
    process_batch(batch);
});
  • Multi-core CPU utilization
  • Automatic work stealing
  • Cache-aware scheduling

3. FP16 Mixed Precision - 2x compute speedup

// Compute in FP16, accumulate in FP32
// (to_f16 and fp16_matmul are hypothetical helpers, e.g. built on the `half` crate)
let fp16_weights = weights.to_f16();
let result = fp16_matmul(fp16_weights, input);
  • Half memory bandwidth usage
  • Double throughput on modern CPUs
  • Minimal accuracy loss with proper scaling

Medium-term Optimizations (Moderate Effort)

4. Burn Framework Integration - GPU support

burn = "0.13"
burn-wgpu = "0.13"  # WebGPU backend
  • Cross-platform GPU acceleration
  • Automatic kernel fusion
  • ONNX model import/export
  • 10-50x speedup on GPU

5. Candle Deep Learning - Modern ML features

candle-core = "0.3"
candle-transformers = "0.3"
  • Transformer architectures
  • CUDA/Metal/WebGPU backends
  • Quantized inference (INT4)
  • Zero-copy tensor operations
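
As a rough sketch of what such an integration could look like, here is a single dense layer over the 34-feature window using candle-core (illustrative; shapes and names are assumptions):

// Illustrative candle-core usage: one dense layer over a 34-feature window.
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let dev = Device::Cpu;
    let x = Tensor::randn(0f32, 1.0, (1, 34), &dev)?;  // one input window
    let w = Tensor::randn(0f32, 1.0, (34, 64), &dev)?; // input -> hidden weights
    let h = x.matmul(&w)?.relu()?;                      // hidden activations
    println!("hidden shape: {:?}", h.shape());
    Ok(())
}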

6. Graph Compilation - Optimized execution

// Compile computation graph
let graph = ComputeGraph::from_model(&model);
graph.optimize()  // Fusion, CSE, layout optimization
    .compile()    // Generate optimized code
    .execute(input);
  • Operator fusion
  • Common subexpression elimination
  • Memory layout optimization
  • 20-30% speedup

Long-term Optimizations (High Impact)

7. WebAssembly Deployment

use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn predict_wasm(input: &[f32]) -> Vec<f32> {
    // Run inference in the browser at near-native speed
    // (delegates to a placeholder `predict`; the actual hook is TBD)
    predict(input)
}
  • Browser deployment
  • WASM SIMD support
  • 1MB deployment size
  • Cross-platform compatibility

8. Neural Architecture Search (NAS)

let best_architecture = NAS::evolve()
    .population(100)
    .generations(50)
    .optimize_for(Metric::Accuracy, Constraint::Latency(1.0))
    .run();
  • Automatic architecture discovery
  • Hardware-aware optimization
  • Multi-objective optimization
  • 5-10% accuracy improvement

9. Distributed Training

// Multi-node training with MPI
let trainer = DistributedTrainer::new();
trainer.all_reduce_gradients(&mut gradients);
  • Scale to multiple machines
  • Data/model parallelism
  • Gradient compression
  • 10-100x training speedup

10. Custom CUDA Kernels

__global__ void quantized_matmul_int8(
    const int8_t* __restrict__ A,
    const int8_t* __restrict__ B,
    float* __restrict__ C,
    float scale_a, float scale_b
) {
    // Tensor Core INT8 operations
}
  • Maximum GPU utilization
  • Tensor Core acceleration
  • Custom fusion patterns
  • 100x+ speedup vs CPU

Platform-Specific Optimizations

CPU Optimizations

  • ✅ AVX2/AVX-512 SIMD
  • ✅ Cache-aligned memory
  • ✅ INT8 quantization
  • ⬜ AMX instructions (Intel)
  • ⬜ SVE2 (ARM)
  • ⬜ Profile-guided optimization

GPU Optimizations

  • ⬜ CUDA kernels
  • ⬜ Tensor Cores (INT8/FP16)
  • ⬜ Multi-GPU training
  • ⬜ Kernel fusion
  • ⬜ CUTLASS libraries
  • ⬜ Flash Attention

Edge Deployment

  • ⬜ ONNX Runtime
  • ⬜ TensorFlow Lite
  • ⬜ Core ML (Apple)
  • ⬜ NNAPI (Android)
  • ⬜ OpenVINO (Intel)
  • ⬜ TensorRT (NVIDIA)

Algorithmic Improvements

Advanced Architectures

  • Mamba: Linear-time sequence modeling
  • RWKV: RNN with transformer performance
  • RetNet: Retention networks for efficiency
  • Hyena: Long-range sequence modeling
  • S4: Structured state spaces

Training Techniques

  • PEFT: Parameter-efficient fine-tuning
  • LoRA: Low-rank adaptation
  • QLoRA: Quantized LoRA
  • Gradient checkpointing: Memory-efficient training
  • Mixed precision: FP16/BF16 training

Expected Impact Summary

Optimization        Effort   Speedup   Size Reduction   Status
────────────────────────────────────────────────────────────────
INT8 Quantization   Low      1x        3.69x            ✅ Done
AVX2 SIMD           Low      6x        1x               ✅ Done
Memory Pooling      Low      1.15x     1x               ⬜ TODO
OpenMP              Low      2-4x      1x               ⬜ TODO
FP16                Medium   2x        2x               ⬜ TODO
GPU (Burn)          Medium   10-50x    1x               ⬜ TODO
WASM                Medium   0.9x      1x               ⬜ TODO
NAS                 High     1.1x      Variable         ⬜ TODO
Distributed         High     10-100x   1x               ⬜ TODO

🤝 Contributing

Contributions welcome! Areas of interest:

  • Full backpropagation implementation
  • Additional backend integrations
  • More sophisticated data generators
  • Visualization tools
  • Performance optimizations
  • Documentation improvements

👏 Credits

Primary Developer

@ruvnet - Architecture, implementation, and optimization. Pioneering work in temporal consciousness mathematics and sublinear algorithms.

Acknowledgments

  • OpenAI - Inspiration from Time-R1 temporal architectures
  • Rust Community - Outstanding ecosystem and tools
  • ndarray Contributors - Efficient numerical computing
  • Claude/Anthropic - AI-assisted development and testing

Special Thanks

  • The Sublinear Solver Project team for theoretical foundations
  • Strange Loops framework for consciousness emergence insights
  • Temporal Attractor Studio for visualization concepts

📄 License

MIT License - See LICENSE file for details
