End-to-end big data system for financial markets: ingest, transform, and visualize market & macro data

Si944-byte/Finance-Data-OS

Finance Data OS

License: MIT
Python
Status
GitHub stars


🚀 Project Goal

To build a modular financial data platform that takes raw equities, options, and macroeconomic datasets and transforms them into analytics-ready features for trading and investment research.


🏗️ Architecture (Phase 1)

[Image: architecture diagram (Mermaid Chart, 2025-09-04)]

📂 Project Layout

docs/ # architecture diagrams, notes

artifacts/ # Power BI files, exported charts

Build Logs/ # Weekly build logs

notebooks/ # Jupyter notebooks (Week 1, Week 2, etc.)


🛣 Project Roadmap

This project is built and shipped as weekly artifacts. Each week delivers a small but meaningful piece of the pipeline.

Week 1 – Single-Ticker Prototype ✅

Week 2 – Multi-Ticker Ingest & Finance Chart ✅

Week 3 – Expanding History & Feature Store ✅

Week 4 – Signals, Backtest & 3-Page Power BI ✅

Week 5 – Costs, Controls & Tuning ✅

Week 6 – Rolling Metrics, Cost Modeling and Cleaner Pipeline ✅

Week 7 – Push-button backtest, parameter sweeps and one-command workflows ✅

Week 8 – Deterministic parameter tuning + clean analytics ✅

Week 9 – Performance Simulation & Validation ✅

Objective: Extend the Finance Data OS pipeline to simulate trade-level execution, validate equity reconciliation, and visualize performance metrics in Power BI.

Pipeline Summary:

Simulate: Load tuned parameters and signals; apply execution logic (slippage, commission, fees).

Validate: Check PK uniqueness, null policies, and reconciliation between trades and equity.

Visualize: Publish Power BI dashboards (Trade Blotter, Equity vs. Drawdown, KPI cards).
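To make the Simulate step concrete, here is a minimal sketch of how slippage, commission, and fees might be applied to a single long round trip. It is not the repo's `simulate()` code: `CostModel`, `fill_price`, `net_pnl`, and every default value below are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical cost parameters; all defaults are illustrative assumptions."""
    slippage_bps: float = 1.0            # adverse price move per fill, in basis points
    commission_per_share: float = 0.005  # broker commission per share per fill
    fee_per_trade: float = 0.10          # flat fee charged on each fill


def fill_price(price: float, side: str, slippage_bps: float) -> float:
    """Move the quoted price against the trader: buys fill higher, sells lower."""
    if side not in ("buy", "sell"):
        raise ValueError(f"unknown side: {side!r}")
    sign = 1 if side == "buy" else -1
    return price * (1 + sign * slippage_bps / 10_000)


def net_pnl(entry: float, exit_: float, qty: int, costs: CostModel) -> float:
    """Net PnL of a long round trip (buy at entry, sell at exit) after costs."""
    buy = fill_price(entry, "buy", costs.slippage_bps)
    sell = fill_price(exit_, "sell", costs.slippage_bps)
    gross = (sell - buy) * qty
    commission = 2 * qty * costs.commission_per_share  # charged on both fills
    fees = 2 * costs.fee_per_trade                     # charged on both fills
    return gross - commission - fees
```

With all costs set to zero, `net_pnl(100.0, 110.0, 10, CostModel(0.0, 0.0, 0.0))` reduces to the gross PnL of the trade; any positive cost parameter lowers it.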

Artifacts Created:

/lake/trade_mart_v3/trade-test_2025w09.parquet

/lake/equity_curve_daily_v3/eq-test_2025w09.parquet

/lake/signals_mart_v3/combined_week9.parquet

/lake/tuning_mart_v3/combined_week9.parquet

Power BI Deliverables:

Trade Blotter (PnL, Slippage, Fees, Entry/Exit Reason)

Equity (NAV) vs. Drawdown (%) Chart

KPIs: Sharpe (252d), CAGR, Win Rate

Slicers: Run ID, Symbol, Entry Reason
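The KPI cards can be sketched in plain Python as follows. The formulas are common textbook definitions and an assumption about what the dashboard computes; the repo's exact definitions (for example its risk-free rate or day-count convention) may differ. The function names are hypothetical.

```python
import math
import statistics


def sharpe_252(daily_returns, risk_free_daily=0.0):
    """Annualized Sharpe ratio from daily returns, assuming 252 trading days."""
    excess = [r - risk_free_daily for r in daily_returns]
    sd = statistics.stdev(excess)  # sample standard deviation
    if sd == 0:
        return float("nan")        # undefined when returns never vary
    return statistics.mean(excess) / sd * math.sqrt(252)


def cagr(nav, periods_per_year=252):
    """Compound annual growth rate from a daily NAV series (first to last point)."""
    years = (len(nav) - 1) / periods_per_year
    return (nav[-1] / nav[0]) ** (1 / years) - 1


def win_rate(trade_pnls):
    """Share of trades with strictly positive PnL."""
    wins = sum(1 for p in trade_pnls if p > 0)
    return wins / len(trade_pnls)
```

Multiply `cagr(...)` and `win_rate(...)` by 100 to express them as percentages, as the KPI cards do.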

Results:

| Metric | Value |
|---|---|
| Sharpe (252d) | 1.34 |
| CAGR (%) | 21.96 |
| Win Rate (%) | 59.43 |

Validation Summary: ✅ PK uniqueness ✅ Null policy ✅ Drawdown ≤ 0 ✅ Reconciliation (trades ↔ equity)
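The four checks in the validation summary could look roughly like this. This is a sketch under stated assumptions, not the repo's validation script: the column names (`run_id`, `symbol`, `entry_ts`, `pnl`), the function names, and the reconciliation rule (summed trade PnL equals the change in NAV over the run) are all hypothetical.

```python
def check_pk_unique(rows, key_cols):
    """True when no two rows share the same primary-key tuple."""
    keys = [tuple(r[c] for c in key_cols) for r in rows]
    return len(keys) == len(set(keys))


def check_no_nulls(rows, required_cols):
    """True when every required column is present and not None in every row."""
    return all(r.get(c) is not None for r in rows for c in required_cols)


def drawdown_series(nav):
    """Drawdown at each point: NAV relative to its running peak, minus 1."""
    peak = nav[0]
    out = []
    for value in nav:
        peak = max(peak, value)
        out.append(value / peak - 1.0)
    return out


def check_reconciliation(trade_pnls, nav, tol=1e-6):
    """Assumed rule: summed trade PnL matches the NAV change over the run."""
    return abs(sum(trade_pnls) - (nav[-1] - nav[0])) <= tol


def run_checks(trades, nav):
    """Run all four checks and return a name -> pass/fail mapping."""
    return {
        "pk_unique": check_pk_unique(trades, ("run_id", "symbol", "entry_ts")),
        "null_policy": check_no_nulls(trades, ("run_id", "symbol", "pnl")),
        "drawdown_le_0": all(d <= 0 for d in drawdown_series(nav)),
        "reconciliation": check_reconciliation([t["pnl"] for t in trades], nav),
    }
```

Returning a dictionary of named results keeps each check independently reportable, which matches the per-check pass marks in the summary.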


⚡ Quick Start (Follow along with me!)

  1. Clone the repo:

[Image: Quick start]

  2. Set up a virtual environment:

[Image: virtual environment]

  3. Install dependencies:

[Image: dependencies]

  4. Run the notebooks:

[Image: Jupyter notebook]


📝 Build Logs

Build Log – Week 1 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk1

Build Log – Week 2 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk2

Build Log – Week 3 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk3

Build Log – Week 4 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk4

Build Log – Week 5 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk5

Build Log – Week 6 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk6

Build Log – Week 7 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk7

Build Log – Week 8 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk8

Build Log – Week 9 https://github.com/Si944-byte/Finance-Data-OS/blob/main/Build%20Logs/Build%20Log%20wk9


🗺️ Week 9 – What's New

Pipeline Enhancements

  • Added full simulation-validation loop (simulate(), validate_run(), reconciliation parity).

  • Introduced trade_mart_v3 and equity_curve_daily_v3 with deterministic structure.

  • Implemented timestamp normalization (UTC-safe ordering).
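As an illustration of UTC-safe ordering, the sketch below converts every timestamp to UTC before sorting, so events recorded in different local zones are ordered by the instant they occurred rather than by wall-clock time. The helper names are hypothetical, and treating naive timestamps as UTC is an assumption; the repo may handle them differently.

```python
from datetime import datetime, timedelta, timezone


def to_utc(ts: datetime, assume_tz: timezone = timezone.utc) -> datetime:
    """Return ts as a timezone-aware UTC datetime.

    Naive timestamps are interpreted in `assume_tz` (an assumption for this
    sketch) so that naive and aware values can be compared safely.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


def sort_utc(stamps):
    """Normalize all timestamps to UTC, then sort chronologically."""
    return sorted(to_utc(t) for t in stamps)


# Fixed offsets keep the example free of any time zone database dependency.
new_york_edt = timezone(timedelta(hours=-4))
london_bst = timezone(timedelta(hours=1))

events = [
    datetime(2025, 6, 2, 9, 30, tzinfo=new_york_edt),  # 13:30 UTC
    datetime(2025, 6, 2, 14, 0, tzinfo=london_bst),    # 13:00 UTC
]
ordered = sort_utc(events)  # the London event comes first
```

Sorting by local wall-clock time would place the 09:30 New York event first even though it happened half an hour after the London one.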

Testing & Reliability

  • Expanded pytest coverage for validation, PK uniqueness, and reconciliation.

  • Enforced schema-on-append verification.

  • Validation summary now logs ✅ pass/fail states for all checks.
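One way to picture schema-on-append verification is a gate that rejects a whole batch before any row is written. The sketch below uses plain dictionaries in place of Parquet tables; `EXPECTED_SCHEMA`, its columns, and both function names are hypothetical and not taken from the repo.

```python
# Hypothetical schema for illustration: column name -> required Python type.
EXPECTED_SCHEMA = {"run_id": str, "symbol": str, "qty": int, "pnl": float}


def schema_errors(rows, schema):
    """Collect human-readable schema violations for a batch of rows."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        extra = row.keys() - schema.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            errors.append(f"row {i}: unexpected columns {sorted(extra)}")
        for col, expected in schema.items():
            if col in row and not isinstance(row[col], expected):
                actual = type(row[col]).__name__
                errors.append(f"row {i}: {col} expected {expected.__name__}, got {actual}")
    return errors


def append_checked(table, rows, schema):
    """Append rows only if the entire batch matches the schema."""
    errors = schema_errors(rows, schema)
    if errors:
        raise ValueError("; ".join(errors))
    table.extend(rows)  # reached only when the whole batch is valid
```

Validating the full batch before extending the table means a single bad row cannot leave a partially appended result behind.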

Visualization & Reporting

  • Built Trade Blotter table (PnL, Commission, Fees, Entry/Exit Reason).

  • Added Equity (NAV) vs Drawdown (%) chart with dual axis.

  • Created KPI cards: Sharpe (252d), CAGR %, Win Rate %.

  • Added slicers for Run ID, Symbol, Entry Reason for filtering and analysis.

Performance & Usability

  • Introduced --max-combos, --allow-large safety flags in simulation.

  • Batched Parquet writes for faster runs and cleaner logs.

  • Seed-controlled simulation for reproducibility.
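The safety flags and seed control might be wired together along these lines. Only the flag names `--max-combos` and `--allow-large` come from this README; their semantics here (refuse a parameter grid larger than the cap unless explicitly allowed), the default values, the `--seed` flag, and all function names are assumptions made for this sketch.

```python
import argparse
import itertools
import random


def build_parser():
    """CLI sketch; defaults and the --seed flag are illustrative assumptions."""
    parser = argparse.ArgumentParser(description="Parameter sweep sketch")
    parser.add_argument("--max-combos", type=int, default=500,
                        help="refuse sweeps larger than this many combinations")
    parser.add_argument("--allow-large", action="store_true",
                        help="override the --max-combos guard")
    parser.add_argument("--seed", type=int, default=42,
                        help="seed for reproducible simulation noise")
    return parser


def plan_combos(grid, max_combos, allow_large):
    """Expand a parameter grid, stopping early if it exceeds the cap."""
    combos = [dict(zip(grid, values))
              for values in itertools.product(*grid.values())]
    if len(combos) > max_combos and not allow_large:
        raise SystemExit(
            f"{len(combos)} combinations exceeds --max-combos={max_combos}; "
            "pass --allow-large to run anyway"
        )
    return combos


def seeded_noise(seed, n):
    """Draw n Gaussian samples from a private, seeded generator."""
    rng = random.Random(seed)  # local RNG avoids mutating global random state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```

Using a dedicated `random.Random(seed)` instance rather than the module-level generator means two runs with the same seed draw the same values regardless of what other code does with randomness.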


🧠 What I Learned

Week 9 – Performance Simulation & Validation

  1. End-to-End System Thinking: I learned how each mart (signals → tuning → trades → equity) connects as a complete system. Every step now produces validated outputs that feed the next stage, transforming raw data into a reliable simulation.

  2. Deterministic Design Matters: Reproducibility isn't optional. Setting seeds, controlling timezones, and enforcing schema validation ensured that identical inputs always yield identical outputs. It made debugging predictable and CI-safe.

  3. Validation is the Final Guardrail: Having validation scripts that check PK uniqueness, null policies, and equity-trade reconciliation gave confidence that the pipeline's math actually holds up. It shifted the mindset from "does it run" to "is it right."

  4. Power BI as an Analysis Surface: Building the Trade Blotter and Equity vs. Drawdown views clarified how to communicate system performance visually. Every KPI (Sharpe, CAGR, Win Rate) now ties directly to verified data, not estimates.

  5. Clean Models → Clear Insights: Simplifying relationships to a star schema (Date → Facts, Symbol → Facts) made the visuals snap into place. Data lineage now feels intuitive rather than tangled.


🗂️ Artifacts (Week 8: current week)

Dashboard Pages:

Page 1 - Signals [Image: Screenshot 2025-10-28 094748]

Page 2 - Back-test [Image: Screenshot 2025-10-28 094444]

Page 3 - Tuning Results [Image: Screenshot 2025-10-28 094019]

Page 4 - Performance Overview [Image: Screenshot 2025-10-28 093348]

Page 5 - About Page [Image: Screenshot 2025-10-27 191326]


🀝 Contributing

This is an open project for learning and sharing best practices in data engineering for financial markets. Suggestions, issues, and PRs are welcome.


📜 License

MIT License; see LICENSE for details.
