# Temporal-Compare 🕒

Ultra-fast Rust framework for temporal prediction with 6x speedup via SIMD and 3.69x compression via INT8 quantization.
## 🎯 What is Temporal-Compare?
Imagine trying to predict the next word you'll type, the next stock price movement, or the next frame in a video. These are temporal prediction tasks - predicting future states from historical sequences. Temporal-Compare provides a testing ground to compare different approaches to this fundamental problem.
This crate implements a clean, extensible framework for comparing:
- 15+ ML backends from basic MLPs to ensemble methods
- INT8 quantization (3.69x model compression, 0.42% accuracy loss)
- SIMD acceleration (AVX2/AVX-512 intrinsics for 6x speedup)
- Production-ready optimizations with real benchmarks, no overfitting
## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    Input Time Series                    │
│                [t-31, t-30, ..., t-1, t]                │
└────────────────┬────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────┐
│                   Feature Engineering                   │
│  • Window: 32 timesteps                                 │
│  • Regime indicators                                    │
│  • Temporal features (time-of-day)                      │
└────────────────┬────────────────────────────────────────┘
                 │
       ┌─────────┴────┬────────────┬────────────┬────────────┐
       ▼              ▼            ▼            ▼            ▼
┌──────────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│   Baseline   │ │   MLP    │ │ MLP-Opt  │ │MLP-Ultra │ │ RUV-FANN │
│  Predictor   │ │  Simple  │ │   Adam   │ │   SIMD   │ │ Network  │
│              │ │          │ │          │ │          │ │          │
│  Last value  │ │  Basic   │ │ Backprop │ │   AVX2   │ │  Rprop   │
└──────┬───────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
       │              │            │            │            │
       └──────────────┴────────────┼────────────┴────────────┘
                                   │
                                   ▼
                        ┌─────────────────────┐
                        │       Outputs       │
                        │ • Regression (MSE)  │
                        │ • Classification    │
                        │   (3-class: ↓/→/↑)  │
                        └─────────────────────┘
```
## ✨ Features (v0.5.0)
- 🚀 INT8 Quantization: 3.69x model compression (9.7KB → 2.6KB)
- ⚡ AVX2/AVX-512 SIMD: 6x speedup with hardware acceleration
- 🧠 15+ Backend Options: MLP variants, ensemble, reservoir, sparse, quantum-inspired
- 📦 Tiny Models: Production-ready with only 0.42% accuracy loss from quantization
- 🔥 Ultra Performance: 0.5s training for 10k samples (vs 3s baseline)
- ✅ Real Benchmarks: No overfitting - includes failed experiments for transparency
- 🎯 65.2% Accuracy: Best-in-class MLP-Classifier with BatchNorm + Dropout
- 📊 Synthetic Data: Configurable time series with regime shifts and noise
- 🔧 CLI Interface: Full control via command-line arguments
- 📈 Built-in Metrics: MSE for regression, accuracy for classification
- 🦀 RUV-FANN Integration: Optional feature flag for FANN backend
- 🌊 Reservoir Computing: Echo state networks with spectral radius control
- 🎲 Sparse Networks: Dynamic pruning with lottery ticket hypothesis
- 🔮 Quantum-Inspired: Phase rotations and entanglement simulation
- 📐 Kernel Methods: Random Fourier features for RBF approximation
## 🛠️ Technical Details

### Data Generation

The synthetic time series follows an autoregressive process with regime-dependent drift, Gaussian noise, and periodic impulses:

```
x(t) = 0.8 * x(t-1) + drift(regime) + N(0, 0.3) + impulse(t)
```

where:

- regime ∈ {0, 1} switches with probability P = 0.02 per timestep
- drift = 0.02 if regime = 0, else -0.015
- impulse = +0.9 every 37 timesteps
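
A minimal generator matching the process above might look like the following sketch (using the `rand` and `rand_distr` crates; this is an illustration, not the crate's actual generator):

```rust
use rand::Rng;
use rand_distr::{Distribution, Normal};

/// Illustrative sketch of the synthetic series described above.
fn generate_series(n: usize) -> Vec<f64> {
    let mut rng = rand::thread_rng();
    let noise = Normal::new(0.0, 0.3).unwrap(); // N(0, 0.3)
    let mut xs = Vec::with_capacity(n);
    let mut x = 0.0_f64;
    let mut regime = 0u8;
    for t in 0..n {
        if rng.gen::<f64>() < 0.02 {
            regime ^= 1; // regime switch with probability 0.02
        }
        let drift = if regime == 0 { 0.02 } else { -0.015 };
        let impulse = if t > 0 && t % 37 == 0 { 0.9 } else { 0.0 };
        x = 0.8 * x + drift + noise.sample(&mut rng) + impulse;
        xs.push(x);
    }
    xs
}
```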
### Neural Network Architecture
- Input Layer: 32 temporal features + 2 engineered features
- Hidden Layer: 64 neurons with ReLU activation
- Output Layer: 1 neuron (regression) or 3 neurons (classification)
- Training: Simplified SGD with numerical gradients
- Initialization: Xavier/He weight initialization
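
Concretely, the forward pass for this layout (32 + 2 = 34 inputs → 64 hidden with ReLU → 1 or 3 outputs) reduces to two matrix-vector products. A minimal sketch, not the crate's actual code:

```rust
/// Forward pass for a 34 -> 64 -> k MLP with ReLU (illustrative).
/// Weights are row-major and flattened: w1 is 64x34, w2 is kx64.
fn forward(input: &[f32; 34], w1: &[f32], b1: &[f32; 64], w2: &[f32], b2: &[f32]) -> Vec<f32> {
    // Hidden layer: h = relu(w1 * input + b1)
    let mut h = [0.0f32; 64];
    for (j, hj) in h.iter_mut().enumerate() {
        let dot: f32 = w1[j * 34..(j + 1) * 34].iter().zip(input).map(|(w, x)| w * x).sum();
        *hj = (dot + b1[j]).max(0.0); // ReLU
    }
    // Output layer: y = w2 * h + b2 (1 neuron for regression, 3 for classification)
    (0..b2.len())
        .map(|o| b2[o] + w2[o * 64..(o + 1) * 64].iter().zip(&h).map(|(w, x)| w * x).sum::<f32>())
        .collect()
}
```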
### Performance Characteristics (v0.5.0)
| Backend | Accuracy | Speed | Size | Key Innovation |
|---|---|---|---|---|
| MLP-Classifier | 65.2% | 1.9s | 120KB | BatchNorm + Dropout |
| Baseline | 64.3% | 0.0s | N/A | Analytical solution |
| MLP-Ultra | 64.0% | 0.5s | 100KB | AVX2 SIMD (6x speedup) |
| MLP-Quantized | 63.6% | 0.5s | 2.6KB | INT8 quantization (3.69x) |
| MLP-AVX512 | 62.0% | 0.4s | 100KB | AVX-512 (16 floats/cycle) |
| Ensemble | 59.5% | 8.2s | 400KB | 4-model weighted voting |
| Boosted | 58.0% | 10s | 200KB | AdaBoost-style iteration |
| Reservoir | 55.8% | 0.8s | 50KB | Echo state, no backprop |
| Quantum | 53.2% | 1.0s | 60KB | Quantum interference patterns |
| Fourier | 48.7% | 0.3s | 200KB | Random RBF kernel features |
| Sparse | 40.1% | 5.0s | 10KB | 91% weights pruned |
| Lottery | 38.5% | 15s | 5KB | Iterative magnitude pruning |
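
The MLP-Ultra and MLP-AVX512 rows come from hand-vectorized inner loops. A minimal AVX2 dot product of the kind involved (an illustrative sketch, not the crate's exact kernel; requires a CPU with AVX2 and FMA):

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Dot product processing 8 f32 lanes per iteration (illustrative).
/// Safety: caller must ensure the CPU supports AVX2 and FMA.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2", enable = "fma")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    let chunks = a.len() / 8;
    let mut acc = _mm256_setzero_ps();
    for i in 0..chunks {
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        acc = _mm256_fmadd_ps(va, vb, acc); // acc += va * vb
    }
    // Horizontal sum of the 8 lanes, then a scalar tail for the remainder
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    for i in chunks * 8..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
```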
## 💡 Use Cases
- Algorithm Research: Test new temporal prediction methods
- Benchmark Suite: Compare performance across different approaches
- Educational Tool: Learn about time series prediction
- Integration Testing: Validate external ML libraries (ruv-fann)
- Hyperparameter Tuning: Find optimal settings for your domain
- Production Prototyping: Quick proof-of-concept for temporal models
## 📦 Installation

```bash
# Clone the repository
git clone https://github.com/ruvnet/sublinear-time-solver
cd sublinear-time-solver

# Build with standard features
cargo build --release

# Build with RUV-FANN backend support (feature name assumed from the docs above)
cargo build --release --features ruv-fann

# Build with SIMD optimizations (recommended)
RUSTFLAGS="-C target-cpu=native" cargo build --release
```
## 🚀 Usage

### Basic Regression

```bash
# Note: the exact flag and backend names below are illustrative; run with --help for the authoritative list.

# Baseline predictor
cargo run --release -- --backend baseline

# Simple MLP
cargo run --release -- --backend mlp

# Optimized MLP with Adam optimizer
cargo run --release -- --backend mlp-opt

# Ultra-fast SIMD MLP (recommended for performance)
RUSTFLAGS="-C target-cpu=native" cargo run --release -- --backend mlp-ultra

# RUV-FANN backend (requires feature flag)
cargo run --release --features ruv-fann -- --backend ruv-fann
```
### Classification Task

```bash
# 3-class trend prediction (down/neutral/up); flag names are illustrative
cargo run --release -- --task classification --backend mlp

# Compare against baseline
cargo run --release -- --task classification --backend baseline
```
### Advanced Options

```bash
# Custom window size and seed (flag names are illustrative)
cargo run --release -- --backend mlp-ultra --window 32 --seed 42

# Full parameter control
cargo run --release -- --backend mlp-ultra --window 32 --seed 42 --samples 10000 --epochs 20
```
### Benchmarking All Backends

```bash
# Run complete comparison with timing (backend names are illustrative)
for backend in baseline mlp mlp-opt mlp-ultra; do
    time cargo run --release -- --backend "$backend"
done

# With RUV-FANN included
for backend in baseline mlp mlp-opt mlp-ultra ruv-fann; do
    time cargo run --release --features ruv-fann -- --backend "$backend"
done
```
## 📊 Benchmark Results (v0.2.0)

### Regression Performance (10,000 samples, 20 epochs)

| Backend | MSE | Training Time | Speedup |
|---|---|---|---|
| Baseline | 0.112 | N/A | - |
| MLP | 0.128 | 3.057s | 1.0x |
| MLP-Opt | 0.238 | 2.100s | 1.5x |
| MLP-Ultra | 0.108 | 0.500s | 6.1x ← Best! |
| RUV-FANN | 0.115 | 1.200s | 2.5x |
### Classification Accuracy

| Backend | Accuracy | Notes |
|---|---|---|
| Baseline | 64.7% | Simple threshold-based |
| MLP | 37.0% | Limited by numerical gradients |
| MLP-Opt | 42.3% | Improved with backprop |
| MLP-Ultra | 45.0% | SIMD-accelerated |
| RUV-FANN | 62.0% | Close to baseline |
### Key Achievements in v0.2.0
- 6.1x speedup with Ultra-MLP (AVX2 SIMD)
- Best MSE: Ultra-MLP matches baseline (0.108)
- Parallel processing: Multi-threaded predictions
- Memory efficient: Cache-optimized layouts
## 🔬 What's New in v0.5.0

### Major Features
- INT8 Quantization: 3.69x model compression with only 0.42% accuracy loss
- AVX-512 Support: Process 16 floats per cycle on modern CPUs
- 15+ Backend Options: Complete suite of temporal prediction algorithms
- Production Ready: Real benchmarks, no overfitting, transparent results
- Best Accuracy: MLP-Classifier achieves 65.2% (vs 64.3% baseline)
### Technical Innovations

- Symmetric INT8 quantization for minimal accuracy loss (see the sketch after this list)
- Cache-aligned memory layouts for 15-20% speedup
- Prefetching and loop unrolling for latency reduction
- Batch normalization with dropout for regularization
- Echo state networks with spectral radius control
- 91% sparsity achieved while maintaining 40% accuracy
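
A minimal sketch of the symmetric per-tensor INT8 scheme: the scale is chosen so the largest-magnitude weight maps to ±127 (illustrative, not the crate's exact implementation):

```rust
/// Symmetric per-tensor INT8 quantization (illustrative).
fn quantize_i8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
    let scale = (max_abs / 127.0).max(f32::EPSILON); // guard against all-zero tensors
    let q = weights
        .iter()
        .map(|&w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Recover approximate f32 weights for inference (illustrative).
fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```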
## 🚀 Future Optimization Strategies

### Near-term Optimizations (Low Effort, High Impact)
1. Memory Pooling - 10-15% speedup

```rust
// Reuse allocations across predictions (sketch; `TensorPool` is an illustrative name)
let mut pool = TensorPool::new();
let tensor = pool.acquire(shape);
// ... use tensor ...
pool.release(tensor);
```

- Zero allocations in the hot path
- Pre-allocated buffer reuse
- Thread-local pools for parallel execution
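
A fuller sketch of such a pool, assuming plain `Vec<f32>` buffers (illustrative; reusing a buffer preserves its capacity, so the hot path stops allocating once warm):

```rust
/// Minimal buffer pool for f32 tensors (illustrative, not the crate's API).
struct TensorPool {
    free: Vec<Vec<f32>>,
}

impl TensorPool {
    fn new() -> Self {
        Self { free: Vec::new() }
    }

    /// Hand out a zeroed buffer, reusing a previously released allocation if possible.
    fn acquire(&mut self, len: usize) -> Vec<f32> {
        match self.free.pop() {
            Some(mut buf) => {
                buf.clear();
                buf.resize(len, 0.0); // keeps existing capacity; no realloc if large enough
                buf
            }
            None => vec![0.0; len],
        }
    }

    /// Return a buffer to the pool for later reuse.
    fn release(&mut self, buf: Vec<f32>) {
        self.free.push(buf);
    }
}
```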
2. Rayon Parallelism (OpenMP-style) - 2-4x speedup

```rust
// Parallelize batch processing with rayon's work-stealing iterator (`predict` is a placeholder)
use rayon::prelude::*;
batches.par_iter().for_each(|batch| predict(batch));
```

- Multi-core CPU utilization
- Automatic work stealing
- Cache-aware scheduling
3. FP16 Mixed Precision - 2x compute speedup

```rust
// Compute in FP16, accumulate in FP32 (`to_f16` and `fp16_matmul` are illustrative helpers)
let fp16_weights = weights.to_f16();
let result = fp16_matmul(&fp16_weights, &fp16_inputs); // accumulates in f32 internally
```

- Half the memory bandwidth usage
- Double throughput on modern CPUs
- Minimal accuracy loss with proper scaling
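
Using the `half` crate (an assumed dependency), the store-in-FP16 / accumulate-in-FP32 pattern looks like:

```rust
use half::f16; // `half` crate assumed as a dependency

/// Weights and inputs stored as FP16; products widened to f32 before summing (illustrative).
fn dot_mixed(a: &[f16], b: &[f16]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| x.to_f32() * y.to_f32()) // widen before multiply
        .sum() // f32 accumulator avoids FP16 rounding drift
}
```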
### Medium-term Optimizations (Moderate Effort)

4. Burn Framework Integration - GPU support

```toml
# Cargo.toml (crate names assumed; check crates.io for current versions)
burn = "0.13"
burn-wgpu = "0.13"  # WebGPU backend
```
- Cross-platform GPU acceleration
- Automatic kernel fusion
- ONNX model import/export
- 10-50x speedup on GPU
5. Candle Deep Learning - Modern ML features

```toml
# Cargo.toml (crate names assumed)
candle-core = "0.3"
candle-nn = "0.3"
```
- Transformer architectures
- CUDA/Metal/WebGPU backends
- Quantized inference (INT4)
- Zero-copy tensor operations
6. Graph Compilation - Optimized execution

```rust
// Compile the computation graph (hypothetical API sketch)
let graph = Graph::from_model(&model);
graph.optimize()  // fusion, CSE, layout optimization
    .compile()    // generate optimized code
    .execute(&input);
```
- Operator fusion
- Common subexpression elimination
- Memory layout optimization
- 20-30% speedup
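
As a concrete illustration of what operator fusion buys, the sketch below collapses matmul, bias-add, and ReLU into one pass over the output instead of three (hypothetical helper, not compiler-generated code):

```rust
/// Fused linear + bias + ReLU layer (illustrative).
/// `w` is row-major out_dim x in_dim; an unfused pipeline would write the
/// matmul result, re-read it to add the bias, and re-read it again for ReLU.
fn fused_linear_relu(x: &[f32], w: &[f32], b: &[f32], out_dim: usize, in_dim: usize) -> Vec<f32> {
    (0..out_dim)
        .map(|o| {
            let dot: f32 = w[o * in_dim..(o + 1) * in_dim]
                .iter()
                .zip(x)
                .map(|(wi, xi)| wi * xi)
                .sum();
            (dot + b[o]).max(0.0) // bias-add and activation fused into the same loop
        })
        .collect()
}
```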
### Long-term Optimizations (High Impact)
7. WebAssembly Deployment
- Browser deployment
- WASM SIMD support
- 1MB deployment size
- Cross-platform compatibility
8. Neural Architecture Search (NAS)

```rust
// Evolutionary search sketch (hypothetical builder API and parameter values)
let best_architecture = NasSearch::evolve()
    .population(50)
    .generations(100)
    .optimize_for(Objective::Accuracy)
    .run();
```
- Automatic architecture discovery
- Hardware-aware optimization
- Multi-objective optimization
- 5-10% accuracy improvement
9. Distributed Training

```rust
// Multi-node training with MPI-style collectives (`DistributedTrainer` is an illustrative name)
let trainer = DistributedTrainer::new(world);
trainer.all_reduce_gradients(&mut gradients);
```
- Scale to multiple machines
- Data/model parallelism
- Gradient compression
- 10-100x training speedup
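
What an all-reduce step has to compute is just an element-wise mean of gradients across workers; an in-process sketch of that reduction (illustrative):

```rust
/// Element-wise mean across workers, written back to every worker (illustrative all-reduce).
fn all_reduce_mean(worker_grads: &mut [Vec<f32>]) {
    let n = worker_grads.len() as f32;
    let len = worker_grads[0].len();
    // Reduce: sum every worker's gradient vector
    let mut sum = vec![0.0f32; len];
    for grads in worker_grads.iter() {
        for (s, g) in sum.iter_mut().zip(grads) {
            *s += *g;
        }
    }
    // Broadcast: every worker receives the averaged gradients
    for grads in worker_grads.iter_mut() {
        for (g, s) in grads.iter_mut().zip(&sum) {
            *g = *s / n;
        }
    }
}
```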
10. Custom CUDA Kernels

```cuda
// INT8 GEMM stub targeting Tensor Cores (dimensions and launch config omitted)
__global__ void quantized_matmul_int8(
    const int8_t* __restrict__ A,
    const int8_t* __restrict__ B,
    float* __restrict__ C,
    float scale_a, float scale_b
) {
    // Tensor Core INT8 operations
}
```
- Maximum GPU utilization
- Tensor Core acceleration
- Custom fusion patterns
- 100x+ speedup vs CPU
### Platform-Specific Optimizations

#### CPU Optimizations
- ✅ AVX2/AVX-512 SIMD
- ✅ Cache-aligned memory
- ✅ INT8 quantization
- ⬜ AMX instructions (Intel)
- ⬜ SVE2 (ARM)
- ⬜ Profile-guided optimization
#### GPU Optimizations
- ⬜ CUDA kernels
- ⬜ Tensor Cores (INT8/FP16)
- ⬜ Multi-GPU training
- ⬜ Kernel fusion
- ⬜ CUTLASS libraries
- ⬜ Flash Attention
#### Edge Deployment
- ⬜ ONNX Runtime
- ⬜ TensorFlow Lite
- ⬜ Core ML (Apple)
- ⬜ NNAPI (Android)
- ⬜ OpenVINO (Intel)
- ⬜ TensorRT (NVIDIA)
### Algorithmic Improvements

#### Advanced Architectures
- Mamba: Linear-time sequence modeling
- RWKV: RNN with transformer performance
- RetNet: Retention networks for efficiency
- Hyena: Long-range sequence modeling
- S4: Structured state spaces
#### Training Techniques
- PEFT: Parameter-efficient fine-tuning
- LoRA: Low-rank adaptation (see the sketch after this list)
- QLoRA: Quantized LoRA
- Gradient checkpointing: Memory-efficient training
- Mixed precision: FP16/BF16 training
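
For reference, the core of LoRA is a low-rank weight update W' = W + (alpha/r) · A·B, where A is d_out×r and B is r×d_in with r much smaller than either dimension. A direct sketch (hypothetical helper):

```rust
/// Apply a LoRA-style low-rank update in place: W += (alpha / r) * A * B (illustrative).
/// All matrices are row-major and flattened: w is d_out x d_in, a is d_out x r, b is r x d_in.
fn lora_apply(w: &mut [f32], a: &[f32], b: &[f32], d_out: usize, r: usize, d_in: usize, alpha: f32) {
    let scale = alpha / r as f32;
    for i in 0..d_out {
        for j in 0..d_in {
            let mut delta = 0.0f32;
            for k in 0..r {
                delta += a[i * r + k] * b[k * d_in + j];
            }
            w[i * d_in + j] += scale * delta;
        }
    }
}
```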
### Expected Impact Summary
| Optimization | Effort | Speedup | Size Reduction | Status |
|---|---|---|---|---|
| INT8 Quantization | Low | 1x | 3.69x | ✅ Done |
| AVX2 SIMD | Low | 6x | 1x | ✅ Done |
| Memory Pooling | Low | 1.15x | 1x | ⬜ TODO |
| Rayon (OpenMP-style) | Low | 2-4x | 1x | ⬜ TODO |
| FP16 | Medium | 2x | 2x | ⬜ TODO |
| GPU (Burn) | Medium | 10-50x | 1x | ⬜ TODO |
| WASM | Medium | 0.9x | 1x | ⬜ TODO |
| NAS | High | 1.1x | Variable | ⬜ TODO |
| Distributed | High | 10-100x | 1x | ⬜ TODO |
## 🤝 Contributing
Contributions welcome! Areas of interest:
- Full backpropagation implementation
- Additional backend integrations
- More sophisticated data generators
- Visualization tools
- Performance optimizations
- Documentation improvements
## 📚 References
- Time-R1 Architecture - Temporal reasoning systems
- ruv-fann - Rust FANN neural network library
- ndarray - N-dimensional arrays for Rust
## 👏 Credits

### Primary Developer

@ruvnet - Architecture, implementation, and optimization. Pioneering work in temporal consciousness mathematics and sublinear algorithms.
### Acknowledgments
- OpenAI - Inspiration from Time-R1 temporal architectures
- Rust Community - Outstanding ecosystem and tools
- ndarray Contributors - Efficient numerical computing
- Claude/Anthropic - AI-assisted development and testing
### Special Thanks
- The Sublinear Solver Project team for theoretical foundations
- Strange Loops framework for consciousness emergence insights
- Temporal Attractor Studio for visualization concepts
## 📄 License
MIT License - See LICENSE file for details
## 🔗 Links
- Repository: github.com/ruvnet/sublinear-time-solver
- Issues: GitHub Issues
- Documentation: docs.rs/temporal-compare
- Crates.io: crates.io/crates/temporal-compare