Tell me your trading strategy in your words, and I'll evaluate it for you

artvandelay/agentic-backtesting

NLBT - Natural Language Backtesting

Turn plain English into professional backtesting reports in minutes.
Describe your trading strategy in natural language. Get Python code, backtest results, and professional reports. No coding required.

🆕 v0.3.0: Now powered by 8 LLM-driven intelligence features with multilingual support and dramatically improved performance!


🚀 Quick Start

# 1. Install
git clone https://github.com/yourusername/nlbt && cd nlbt
pip install -e .

# 2. Configure LLM
llm keys set openrouter
llm models default openrouter/anthropic/claude-3.5-sonnet

# 3. Run
nlbt

Try it: Type "Buy and hold AAPL in 2024 with $10,000" and press enter.


✨ Key Benefits

| Feature | Benefit |
| --- | --- |
| 💬 Natural Language | Describe strategies in plain English - no coding needed |
| 🧠 LLM-Powered Intelligence | 8 AI features: smart extraction, validation, multilingual reports |
| 🌍 Multilingual Support | Generate reports in any language (Spanish, Hindi, etc.) |
| ⚡ High Performance | Dramatically improved strategy execution (up to 24x higher returns in the NVDA RSI example under Performance Impact) |
| 🔄 Self-Correcting | Auto-retries with intelligent error diagnosis |
| 📊 Professional Reports | Markdown + PDF with metrics, charts, and full code |
| 🔧 Clean Architecture | LLM-first design with 20% less code, more intelligence |

🚀 What's New in v0.3.0

Major Architecture Overhaul: Complete "extreme promptification" with 8 LLM-powered intelligence features:

🧠 LLM-Powered Features

  • Smart Title Generation: Dynamic, context-aware report titles
  • Intelligent Requirement Extraction: Structured parsing from natural language
  • Flexible User Intent Detection: Understands "yes", "go", "proceed" variations
  • Adaptive Result Validation: Evaluates backtest quality intelligently
  • Multilingual Section Naming: Localized headings for any language
  • Smart Column Detection: Automatically finds best DataFrame columns
  • Dynamic Clarification Limits: Stops asking when enough info gathered
  • Targeted Error Diagnosis: Analyzes errors and suggests specific fixes

📈 Performance Impact

Real-world example: Same NVDA RSI strategy

  • Before v0.3.0: 10% return (1 trade)
  • After v0.3.0: 240% return (multiple trades)
  • 24x improvement in strategy execution quality

🏗️ Architecture Improvements

  • 20% less code: Removed 311 lines of redundant logic
  • LLM-first design: Intelligent reasoning replaces hardcoded rules
  • Clean fallbacks: Simple backups instead of complex regex patterns
  • Zero breaking changes: Seamless upgrade path
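The "LLM-first design" with "clean fallbacks" described above could look roughly like the sketch below for confirmation detection. This is an illustration, not the project's actual code: `is_confirmation`, the `ask_llm` callable, and the `AFFIRMATIVE` word set are hypothetical names introduced here.

```python
from typing import Callable, Optional

# Hypothetical fallback vocabulary; the real project may accept other phrasings.
AFFIRMATIVE = {"yes", "y", "go", "proceed", "ok", "sure"}


def is_confirmation(message: str, ask_llm: Optional[Callable[[str], str]] = None) -> bool:
    """Return True when the user appears to approve the plan."""
    if ask_llm is not None:
        try:
            prompt = (
                "Does this message approve the plan? Answer YES or NO.\n"
                f"Message: {message}"
            )
            answer = ask_llm(prompt).strip().upper()
            if answer in ("YES", "NO"):
                return answer == "YES"
        except Exception:
            pass  # Any LLM failure falls through to the simple backup below.
    # Simple backup: compare the normalized message against a small word set.
    return message.strip().lower().rstrip(".!") in AFFIRMATIVE
```

The LLM gets the first say, and the word-set check only runs when no LLM is available, the call fails, or the answer is not a clean YES/NO.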

📥 What You Get

Input → Output

You type:

"NVDA RSI strategy: buy when RSI drops below 30 with larger positions when RSI is lower, sell when RSI goes above 70, use 2023 data with $50000 capital"

You get (in reports/NVDA_2023_<timestamp>/):

📁 See actual example: reports/EXAMPLE_NVDA_2023/
📄 View report: report.md | report.pdf
💻 View code: strategy.py

📊 Professional Report (report.md / report.pdf)

# NVDA 2023 Trading Strategy

Initial Capital: $50,000 → Final Equity: $55,039.29 → Gain: +$5,039.29 (+10.08%)

## Summary
- Test Period: 2023-01-03 to 2023-12-29 (360 days)
- Strategy: RSI Mean Reversion with Dynamic Position Sizing
- Total Return: 10.08% vs Buy & Hold 158.14%
- Risk Metrics: Sharpe 1.54, Max Drawdown -2.80%

## Strategy Implementation
- Entry: Buy when RSI < 30 with position scaling
- Position Size: Larger positions when RSI is lower (1x to 2x)
- Exit: Sell when RSI > 70
- Risk Management: 95% max equity exposure

## Performance Metrics
- Alpha: 7.80% (significant outperformance vs risk)
- Beta: 0.01439 (low market correlation)
- Calmar Ratio: 3.63 (excellent risk-adjusted returns)
- Win Rate: 100% (1 successful trade)

[Full analysis with code implementation]
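The headline figures in the sample report can be checked with a few lines of arithmetic. The Calmar line is only a rough check: it substitutes the total return for the annualized return, which is an assumption made here, so a small gap to the reported 3.63 is expected.

```python
initial_capital = 50_000.00
final_equity = 55_039.29
max_drawdown_pct = 2.80  # Absolute value of the reported -2.80% drawdown.

gain = final_equity - initial_capital
total_return_pct = gain / initial_capital * 100

# Rough Calmar-style ratio: return divided by absolute max drawdown.
# Uses total return as a stand-in for annualized return (an assumption).
rough_calmar = total_return_pct / max_drawdown_pct

print(f"Gain: ${gain:,.2f}")
print(f"Total return: {total_return_pct:.2f}%")
print(f"Rough Calmar: {rough_calmar:.2f}")
```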

💻 Executable Code (strategy.py)

# Generated by NLBT - NVDA RSI Strategy with Dynamic Position Sizing

from backtesting import Backtest, Strategy
import numpy as np
import pandas as pd

def RSI(array, n=14):
    """RSI helper using simple rolling means of gains and losses (not Wilder's smoothing)"""
    delta = pd.Series(array).diff()
    gain = (delta.where(delta > 0, 0)).rolling(n).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(n).mean()
    rs = gain / loss
    return (100 - (100 / (1 + rs))).to_numpy()

class MyStrategy(Strategy):
    def init(self):
        self.rsi = self.I(RSI, self.data.Close, 14)
    
    def next(self):
        if not self.position:
            if self.rsi[-1] < 30:
                # Dynamic position sizing based on RSI
                rsi_scale = (30 - self.rsi[-1]) / 30  # 0 to 1 scale
                position_size = 1 + rsi_scale  # 1x to 2x sizing
                
                units = int((self.equity * 0.95 * position_size) / self.data.Close[-1])
                if units > 0:
                    self.buy(size=units)
        
        elif self.position and self.rsi[-1] > 70:
            self.position.close()

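# Execute backtest
# Note: get_ohlcv_data is not defined in this file; it is assumed to be
# supplied by the NLBT sandbox at run time.
data = get_ohlcv_data('NVDA', '2023-01-01', '2023-12-31')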
bt = Backtest(data, MyStrategy, cash=50000)
stats = bt.run()
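To see what the sizing rule in `next()` asks for, the sketch below pulls the same arithmetic into standalone functions. `position_multiplier` and `target_units` are hypothetical helper names introduced here, not part of the generated strategy.

```python
def position_multiplier(rsi: float) -> float:
    """Mirror the sizing rule above: 1x at RSI 30, rising linearly to 2x at RSI 0."""
    rsi_scale = (30 - rsi) / 30
    return 1 + rsi_scale


def target_units(equity: float, price: float, rsi: float, exposure: float = 0.95) -> int:
    """Whole units requested for a given equity, price, and RSI reading."""
    return int((equity * exposure * position_multiplier(rsi)) / price)


for rsi in (29.0, 15.0, 0.0):
    multiplier = position_multiplier(rsi)
    # Fraction of equity the order requests; values above 1.0 exceed available cash.
    requested = 0.95 * multiplier
    print(f"RSI {rsi:>4}: multiplier {multiplier:.2f}, requested exposure {requested:.2f}")
```

Because the 0.95 exposure cap is applied before the multiplier, the requested exposure goes above 100% of equity once RSI falls below roughly 28.4. Without margin configured, a backtesting engine may reject such orders, so this is worth checking when a run produces fewer trades than expected.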

🔍 Debug & Agent Logs

  • debug.log - Execution trace for troubleshooting
  • agent.log - Full LLM context for iteration (~6-8K words)

⚠️ Important Notes

  • Safety: This tool runs AI-generated Python code locally. Use in trusted environments only.
  • Status: Functional for single-ticker strategies. APIs may change without notice.
  • Limitations: Multi-asset portfolios not yet supported. Works best with clear strategy descriptions.

Requirements

  • Python 3.8+
  • OpenRouter account (recommended) or OpenAI/Anthropic
  • 5 minutes for setup

Install & Setup

1. Clone and install everything

git clone https://github.com/yourusername/nlbt
cd nlbt
pip install -e .

This installs all dependencies, including the llm CLI, backtesting, ta, and more.

2. Set up OpenRouter (recommended)

Why OpenRouter? Cost control, multiple models, spending limits

  1. Create account: Go to https://openrouter.ai/
  2. Get API key: Click "Keys" → "Create Key"
  3. Add credits: Add $5-10 (you'll use <$1 for examples)
  4. Set spending limit: Optional but recommended
  5. Configure locally:
llm keys set openrouter
# Paste your API key when prompted

llm models default openrouter/anthropic/claude-3.5-sonnet

3. Quick test

nlbt

Try: "Buy and hold AAPL in 2024 with $1000"

What you should see:

  • Agent asks clarifying questions (if needed)
  • Shows "Phase 1 - Understanding" → "Phase 2 - Implementation" → "Phase 3 - Reporting"
  • Saves report to reports/<TICKER>_<PERIOD>_<TIMESTAMP>/report.md (+ PDF)
  • Takes 2-3 minutes total
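The report location follows the pattern `reports/<TICKER>_<PERIOD>_<TIMESTAMP>/`. A minimal sketch of building that path is shown below. The `report_dir` helper is hypothetical, and the `%Y%m%d_%H%M%S` timestamp format is inferred from the example paths in this README rather than confirmed from the source.

```python
from datetime import datetime
from pathlib import Path


def report_dir(ticker: str, period: str, now: datetime, root: str = "reports") -> Path:
    """Build the output directory for one backtest run."""
    stamp = now.strftime("%Y%m%d_%H%M%S")  # Assumed format, e.g. 20241002_123456.
    return Path(root) / f"{ticker.upper()}_{period}_{stamp}"


path = report_dir("AAPL", "2024", datetime(2024, 10, 2, 12, 34, 56))
print((path / "report.md").as_posix())  # reports/AAPL_2024_20241002_123456/report.md
```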

💬 Usage

nlbt                    # Start interactive session

In-chat commands:

  • info - Show current phase and requirements
  • debug - Show internal state
  • lucky - Quick demo with AAPL
  • exit - Quit

Language preference

  • Set report language: Include lang <language> or language: <language> anywhere in your message to generate the entire report (including the TL;DR) in that language. Defaults to English if omitted.

Example:

💭 You: Buy and hold AAPL in 2024 with $10,000; lang Spanish
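One way to separate the language hint from the strategy text is sketched below. NLBT may well delegate this to the LLM instead, so treat the regular expression and the `split_language` helper as hypothetical. The sketch only handles single-word language names.

```python
import re
from typing import Tuple

# Matches "lang Spanish" or "language: Spanish", with an optional leading ; or ,
_LANG_RE = re.compile(
    r"[;,]?\s*\b(?:language\s*:\s*|lang\s+)([A-Za-z]+)\b",
    re.IGNORECASE,
)


def split_language(message: str, default: str = "English") -> Tuple[str, str]:
    """Return (message without the language hint, report language)."""
    match = _LANG_RE.search(message)
    if not match:
        return message.strip(), default
    cleaned = (message[:match.start()] + message[match.end():]).strip()
    return cleaned, match.group(1).capitalize()


print(split_language("Buy and hold AAPL in 2024 with $10,000; lang Spanish"))
```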

🔄 How It Works

NLBT uses a 3-phase agentic workflow with automatic error recovery:

Simple Overview

  1. 🔍 Understanding - Chat with AI to gather requirements (ticker, period, capital, strategy)
  2. ⚙️ Implementation - AI generates Python code, tests it, and auto-retries if needed
  3. 📊 Reporting - AI creates professional analysis with metrics and insights
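The Understanding phase gathers four requirements before moving on. A minimal sketch of tracking them, assuming a hypothetical `Requirements` dataclass (the project's real data model may differ):

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Requirements:
    ticker: Optional[str] = None
    period: Optional[str] = None
    capital: Optional[float] = None
    strategy: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of the fields the agent still needs to ask about."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


reqs = Requirements(strategy="moving average crossover")
print(reqs.missing())  # ['ticker', 'period', 'capital']

reqs.ticker, reqs.period, reqs.capital = "SPY", "2024", 25_000.0
print(reqs.missing())  # []
```

An empty `missing()` list corresponds to the point where the agent presents the plan and asks for confirmation.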

Visual Workflow

Click to see detailed architecture diagram

Color Key:

  • Purple = User actions | Yellow = LLM actions | Green = System/sandbox
  • Orange = Decisions | Teal = Phase states | Gray = Outputs

graph TD
    Start([User describes strategy]) --> P1[Phase 1: Understanding]
    P1 --> Extract[Extract requirements from conversation]
    Extract --> Check{Complete &<br/>implementable?}
    
    Check -->|Missing/unclear| Ask[Ask clarifying questions]
    Ask --> P1
    
    Check -->|Complete & valid| Ready[Ready to Implement]
    Ready --> Present[Present plan to user]
    Present --> Response{User response}
    
    Response -->|Anything else| BackToP1[Return to understanding]
    BackToP1 --> P1
    Response -->|Yes/Go| P2[Phase 2: Implementation]
    
    P2 --> Plan[Plan: LLM creates implementation plan]
    Plan --> Code[Producer: Generate Python code]
    Code --> Test[Test: Validate syntax & imports]
    Test --> Execute[Execute: Run in sandbox]
    Execute --> Critic[Critic: Evaluate results]
    Critic --> Decision{Critic decision}
    
    Decision -->|PASS| P3[Phase 3: Reporting]
    Decision -->|RETRY| Count{Attempt < 3?}
    Count -->|Yes| Plan
    Count -->|No| FailBack[Show error & return to understanding]
    FailBack --> P1
    
    P3 --> ReportPlan[Plan: Structure report]
    ReportPlan --> Write[Write: Generate markdown]
    Write --> Refine[Refine: Polish & save]
    Refine --> Done([Report saved])

    %% Role-based styling
    classDef user fill:#d1c4e9,stroke:#7e57c2,color:#4a148c;
    classDef llm fill:#fff9c4,stroke:#fbc02d,color:#6d4c41;
    classDef system fill:#e8f5e9,stroke:#43a047,color:#1b5e20;
    classDef decision fill:#ffccbc,stroke:#e64a19,color:#bf360c;
    classDef userInput fill:#e1bee7,stroke:#8e24aa,color:#4a148c;
    classDef state fill:#b2dfdb,stroke:#00897b,color:#004d40;
    classDef output fill:#eceff1,stroke:#90a4ae,color:#37474f;

    %% Assign roles
    class Start user;
    class P1,P2,P3,Ready state;
    class Extract,Ask,Plan,Code,Critic,ReportPlan,Write,Refine llm;
    class Test,Execute,Present system;
    class Check,Decision,Count decision;
    class Response userInput;
    class Done output;
    class BackToP1,FailBack system;

Key Features

  • Smart Confirmation: Say "yes" (or a variation such as "go" or "proceed") to continue; any other reply returns to the conversation
  • Auto-Retry: Up to 3 attempts with error feedback
  • Error Recovery: After failures, returns to chat with error context
  • Producer-Critic Pattern: Separate AI for generation and evaluation (reduces bias)
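The auto-retry and producer-critic behavior listed above can be sketched as a small loop. The `produce`, `execute`, and `critique` callables are placeholders for the LLM and sandbox steps; their names and signatures are assumptions made for illustration, not the project's API.

```python
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 3  # Matches the "up to 3 attempts" limit described above.


def implement(
    produce: Callable[[Optional[str]], str],
    execute: Callable[[str], str],
    critique: Callable[[str], Tuple[bool, str]],
) -> Tuple[bool, str, int]:
    """Return (passed, last output or error, attempts used)."""
    feedback: Optional[str] = None
    last = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        code = produce(feedback)  # Producer sees feedback from the previous attempt.
        try:
            last = execute(code)
        except Exception as exc:
            feedback = last = f"execution error: {exc}"
            continue
        passed, notes = critique(last)  # Critic judges the results separately.
        if passed:
            return True, last, attempt
        feedback = notes
    # All attempts used; the caller returns to the conversation with the error.
    return False, last, MAX_ATTEMPTS
```

Keeping the critic as a separate callable means the component that judges the results is not the one that wrote the code.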

Examples (realistic conversations)

Simple Buy & Hold

💭 You: Buy and hold AAPL in 2024 with $10,000

🤖 Perfect! I have everything needed:
   • Ticker: AAPL
   • Period: 2024
   • Capital: $10,000
   • Strategy: Buy and hold
   
   Ready to implement? (yes/no/change/explain)

💭 You: yes

🤖 ⚙️ Phase 2: Implementation
   [Attempt 1/3] Planning implementation...
   [Attempt 1/3] Generating Python code...
   [Attempt 1/3] Testing code...
   [Attempt 1/3] Executing backtest...
   [Attempt 1/3] Results: Return 38.88%, Sharpe 1.25, Max DD -15.26%
   ✅ Critic: PASS - Proceeding to reporting

🤖 📊 Phase 3: Reporting
   Planning report structure...
   Writing comprehensive analysis...
   ✅ Report saved: reports/AAPL_2024_20241002_123456/report.md

RSI Strategy

💭 You: NVDA with RSI: buy when RSI < 30, sell when RSI > 70, 2023, $20K

🤖 STATUS: READY
   • Ticker: NVDA
   • Period: 2023
   • Capital: $20,000
   • Strategy: RSI mean reversion (buy <30, sell >70)
   
   Proceed? (yes/no/change/explain)

💭 You: yes

🤖 [Generates → Tests → ✓ PASS]
   ✅ Report: reports/NVDA_2023_20241002_143012/report.md

Progressive Disclosure (provide info gradually)

💭 You: Test a moving average crossover

🤖 I'll help you backtest a moving average strategy! I need:
   • Which ticker? (e.g., AAPL, SPY, TSLA)
   • Time period? (e.g., 2024, 2020-2023)
   • Initial capital? (e.g., $10,000)
   • Which MA periods? (e.g., 50/200 day)

💭 You: SPY, 2024, $25K

🤖 Great! Which moving average periods? (e.g., 20/50, 50/200)

💭 You: use 50/200 day crossover

🤖 Perfect! All set. Ready to proceed?

💭 You: yes

🔧 Troubleshooting

Common Issues & Solutions

"Unknown model" error

llm models list                    # See available models
llm models default [model-name]    # Set default

"LLM failed" or timeout

  • Check API key: llm keys list
  • Check OpenRouter credits/limits
  • Try simpler strategy description
  • Use debug command to see internal state

"No data found" error

  • Verify ticker symbol (use Yahoo Finance format)
  • Ensure date range is in the past
  • Try different dates or ticker
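The "date range is in the past" check above is easy to run before a backtest. The sketch below uses a hypothetical `check_period` helper. It does not verify the ticker, which would need a call to the data provider.

```python
from datetime import date
from typing import Optional


def check_period(start: date, end: date, today: Optional[date] = None) -> Optional[str]:
    """Return a problem description, or None when the range looks usable."""
    today = today or date.today()
    if start >= end:
        return "start date must be before end date"
    if end > today:
        return "end date is in the future; no market data exists yet"
    return None


print(check_period(date(2023, 1, 1), date(2023, 12, 31)))  # None
```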

Code execution fails

  • Agent will auto-retry up to 3 times
  • If still failing, simplify your strategy
  • Use info to see what requirements were gathered
  • Check for typos in ticker/dates

General debugging

  • Use info command to see current phase
  • Use debug command to see conversation history
  • Check reports/ folder for any partial outputs
  • Restart with exit and try again

Alternative LLM Providers

OpenAI:

llm keys set openai
llm models default gpt-4o-mini

Anthropic:

llm keys set anthropic  
llm models default claude-3-5-sonnet-20241022

🀝 Contributing

Contributions welcome! Areas of interest:

  • Multi-asset portfolio backtesting
  • Additional technical indicators
  • Parameter optimization
  • Risk management strategies
  • Interactive visualizations

See issues or open a PR!


📄 License

GPL-3.0 License. See LICENSE.

This is copyleft software - any derivative works must also be open source under GPL-3.0.


🏗️ Technical Details

Project Structure

src/nlbt/
├── cli.py              # Interactive CLI with rich formatting
├── reflection.py       # 3-phase reflection engine
├── llm.py              # LLM wrapper using `llm` CLI
└── sandbox.py          # Safe code execution

reports/                # Generated backtest reports
├── <TICKER>_<PERIOD>_<TIMESTAMP>/
│   ├── report.md       # User: Professional report
│   ├── report.pdf      # User: PDF version
│   ├── strategy.py     # Developer: Executable code
│   ├── debug.log       # Developer: Execution trace
│   └── agent.log       # Agent: Full LLM context
└── EXAMPLE_*/          # Sample outputs

tests/                  # Unit and integration tests

Architecture & Design Patterns

This project implements several Agentic Design Patterns:

  • Reflection Pattern: 3-phase autonomous workflow with LLM controlling transitions
  • Producer-Critic Pattern: Separate models for generation and evaluation (avoids confirmation bias)
  • Planning Pattern: Phase 2 plans before coding; Phase 3 plans before writing
  • Tool Use Pattern: Sandbox execution, data fetching, indicator calculations
  • Prompt Chaining: Phase transitions chain prompts with context
  • Error Recovery: Auto-retry loop (max 3 attempts) with error feedback
  • Checkpoint Pattern: Three-tier output (user/developer/agent) for reproducibility

See cursor_chats/Agentic_Design_Patterns_Complete.md for detailed documentation.
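The workflow's "Validate syntax & imports" step, which runs before sandbox execution, could be approximated with the standard `ast` module, as in the sketch below. `validate_code` and the `ALLOWED_IMPORTS` allowlist are hypothetical; the list is simply drawn from the libraries this README mentions. A static check like this is not a security boundary, so it complements isolated execution and does not replace it.

```python
import ast
from typing import List

# Hypothetical allowlist, chosen from the libraries this README mentions.
ALLOWED_IMPORTS = {"backtesting", "numpy", "pandas", "ta"}


def validate_code(source: str) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare only the top-level package, e.g. "pandas" for "pandas.io".
            if name.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {name}")
    return problems


print(validate_code("import numpy as np\nfrom backtesting import Strategy"))  # []
print(validate_code("import os"))  # ['import not allowed: os']
```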
