
Chase the Source

A source attribution system that traces factual claims back to their original sources.

Given a piece of text containing a factual claim, Chase the Source will:

  1. Extract the factual claim
  2. Search for original sources
  3. Tell you whether the source directly states, paraphrases, or contradicts the claim

Why This Exists

Articles and social posts often contain factual claims that are paraphrased, distorted, or fabricated. Readers can't easily trace claims back to primary sources to verify accuracy.

Chase the Source solves this by automatically finding and comparing original sources against claims.


How It Works

flowchart LR
    A[User pastes text<br>with a factual claim] --> B[System finds<br>original sources] --> C[System shows how<br>sources relate to claim]

Attribution Categories

| Result | Meaning |
| --- | --- |
| 🟢 Direct | Source states the claim verbatim or near-verbatim |
| 🟡 Paraphrase | Source conveys the same meaning in different words |
| 🔴 Contradiction | Source states the opposite or conflicts with the claim |
| Not Found | No sources found that address this specific claim |
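
In code, these outcomes can be carried through the pipeline as a small enum. The sketch below is illustrative; the actual definitions live in schemas/models.py and may differ:

from enum import Enum

class Attribution(str, Enum):
    """How a retrieved source relates to the extracted claim."""
    DIRECT = "direct"                # verbatim or near-verbatim statement
    PARAPHRASE = "paraphrase"        # same meaning, different wording
    CONTRADICTION = "contradiction"  # source conflicts with the claim
    NOT_FOUND = "not_found"          # no source addresses the claim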

Example

Input:

Elon's company absolutely crushed it last year, delivering a whopping 1.8 million vehicles. The stock is going to the moon!

Output:

EXTRACTED CLAIM: Tesla delivered 1.8 million vehicles in 2023.

ATTRIBUTION: 🟢 DIRECT

SUMMARY: Tesla's official Q4 2023 investor report directly states
delivery of approximately 1.81 million vehicles in 2023.

BEST SOURCE: Tesla Q4 2023 Update (primary)
URL: https://ir.tesla.com/press-release/tesla-q4-2023-update

QUOTE: "In 2023, we produced 1.85 million vehicles and delivered
over 1.8 million vehicles."
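
This output maps naturally onto a small Pydantic model. The field names below are assumptions for illustration; the real schema is defined in schemas/models.py:

from pydantic import BaseModel, HttpUrl

class AttributionResult(BaseModel):
    # Illustrative shape of the final result, not the repository's exact schema.
    extracted_claim: str       # "Tesla delivered 1.8 million vehicles in 2023."
    attribution: str           # "direct" | "paraphrase" | "contradiction" | "not_found"
    summary: str               # short explanation of the verdict
    best_source: str           # title of the highest-reliability source
    url: HttpUrl               # link to that source
    quote: str | None = None   # supporting quote, when one exists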

Quick Start

Prerequisites

  • Python 3.11 or newer
  • An OpenAI API key
  • A Tavily API key
  • Docker (optional, for containerized runs)

Installation

# Clone the repository
git clone <repository-url>
cd chase_source

# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate  # Linux/macOS
# OR: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env and add your API keys

Run

python app.py

Open http://localhost:7860 in your browser.
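
Under the hood, app.py exposes the workflow through a Gradio interface. A minimal sketch of that wiring, where the chase function is a stand-in for the real pipeline call:

import gradio as gr

def chase(text: str) -> str:
    # Stand-in: the real app invokes the LangGraph workflow here.
    return f"Tracing sources for: {text!r}"

demo = gr.Interface(
    fn=chase,
    inputs=gr.Textbox(label="Paste text containing a factual claim"),
    outputs=gr.Textbox(label="Attribution result"),
    title="Chase the Source",
)

if __name__ == "__main__":
    demo.launch(server_port=7860)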

Docker

# With docker-compose (recommended)
docker compose up

# Or build manually
docker build -t chase-the-source .
docker run -p 7860:7860 --env-file .env chase-the-source

Architecture

flowchart LR
    subgraph LangGraph Workflow
        direction LR
        CE[Claim Extractor] --> SR[(Source Retriever)]
        SR --> EF[Evidence Filter]
        EF --> SC[Source Comparer]
        SC --> AA[[Attribution Assembler]]
        CE -->|no claim found| AA
    end

Pipeline Stages

| Stage | Purpose | Technology |
| --- | --- | --- |
| Claim Extractor | Extract the factual sub-claim from opinionated text | GPT-5-mini |
| Source Retriever | Find relevant web sources | Tavily Search API |
| Evidence Filter | Filter for relevance, extract quotes | GPT-5-mini |
| Source Comparer | Classify: direct / paraphrase / contradiction | GPT-5-mini |
| Attribution Assembler | Produce the final result with the best source | GPT-5-mini |
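
The wiring in graph.py likely follows the standard LangGraph pattern: register each stage as a node, connect them with edges, and branch on whether a claim was found. The sketch below is an assumption about that structure, with stub node functions standing in for the real ones:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class PipelineState(TypedDict, total=False):
    text: str     # raw user input
    claim: str    # extracted factual claim, absent if none found
    result: dict  # final attribution output

# Stubs: each real node returns a partial state update.
def claim_extractor(state): return {}
def source_retriever(state): return {}
def evidence_filter(state): return {}
def source_comparer(state): return {}
def attribution_assembler(state): return {}

g = StateGraph(PipelineState)
for name, fn in [
    ("claim_extractor", claim_extractor),
    ("source_retriever", source_retriever),
    ("evidence_filter", evidence_filter),
    ("source_comparer", source_comparer),
    ("attribution_assembler", attribution_assembler),
]:
    g.add_node(name, fn)

g.set_entry_point("claim_extractor")
# Skip straight to assembly when no factual claim is found.
g.add_conditional_edges(
    "claim_extractor",
    lambda s: "source_retriever" if s.get("claim") else "attribution_assembler",
)
g.add_edge("source_retriever", "evidence_filter")
g.add_edge("evidence_filter", "source_comparer")
g.add_edge("source_comparer", "attribution_assembler")
g.add_edge("attribution_assembler", END)
workflow = g.compile()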

Source Types

The system classifies sources by reliability:

| Type | Description | Examples |
| --- | --- | --- |
| Primary | Original data or direct statements | SEC filings, press releases, official statistics, court documents |
| Original Reporting | First-party journalism | Interviews, investigations, on-scene reporting |
| Secondary | Aggregation or commentary | Wire rewrites, opinion pieces, blog posts |

Primary sources are prioritized when determining the "best source" for a claim.
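
One straightforward way to implement that priority is to rank the three types and take the minimum; a sketch, not the repository's actual code:

# Illustrative ranking: lower is more reliable; unknown types sort last.
RELIABILITY_RANK = {"primary": 0, "original_reporting": 1, "secondary": 2}

def best_source(sources: list[dict]) -> dict | None:
    return min(
        sources,
        key=lambda s: RELIABILITY_RANK.get(s.get("type"), 99),
        default=None,
    )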


Configuration

Environment variables (.env):

# Required
OPENAI_API_KEY=sk-your-key-here
TAVILY_API_KEY=tvly-your-key-here

# Optional
OPENAI_MODEL=gpt-5-mini          # LLM model to use
TAVILY_MAX_RESULTS=10            # Max search results
TAVILY_SEARCH_DEPTH=advanced     # Search depth
LOG_LEVEL=INFO                   # Logging verbosity
GRADIO_SERVER_PORT=7860          # UI port
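
config.py presumably loads these at startup, failing fast on the required keys. A minimal sketch with os.environ (the actual variable handling and defaults may differ):

import os

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # required; raises KeyError if unset
TAVILY_API_KEY = os.environ["TAVILY_API_KEY"]  # required; raises KeyError if unset
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-5-mini")
TAVILY_MAX_RESULTS = int(os.getenv("TAVILY_MAX_RESULTS", "10"))
TAVILY_SEARCH_DEPTH = os.getenv("TAVILY_SEARCH_DEPTH", "advanced")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
GRADIO_SERVER_PORT = int(os.getenv("GRADIO_SERVER_PORT", "7860"))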

Project Structure

chase_source/
├── app.py                      # Gradio UI
├── graph.py                    # LangGraph workflow
├── config.py                   # Settings
├── nodes/
│   ├── claim_extractor.py      # Extract claims from text
│   ├── source_retriever.py     # Search for sources
│   ├── evidence_filter.py      # Filter relevant evidence
│   ├── source_comparer.py      # Classify source-claim relationship
│   └── attribution_assembler.py # Produce final result
├── schemas/
│   └── models.py               # Pydantic data models
├── prompts/
│   └── templates.py            # LLM prompts
├── tests/                      # Test suite
├── docs/                       # Detailed documentation
│   ├── SETUP_GUIDE.md
│   ├── DATA_SCHEMAS.md
│   ├── PROMPTS.md
│   ├── TECHNICAL_SPEC.md
│   └── TESTING_STRATEGY.md
├── requirements.txt
├── Dockerfile
└── docker-compose.yml

Testing

# Run all unit tests
pytest -m unit

# Run with coverage
pytest --cov=nodes --cov=schemas --cov-report=html

# Run specific test
pytest tests/test_claim_extractor.py -v
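
Unit tests avoid live API calls by mocking the LLM and search clients; docs/TESTING_STRATEGY.md documents the real fixtures. A self-contained sketch of the mocking pattern:

import pytest
from unittest.mock import MagicMock

@pytest.mark.unit
def test_mocked_claim_extraction():
    # Stand-in for the LLM client; the real tests patch the OpenAI call instead.
    llm = MagicMock()
    llm.invoke.return_value = "Tesla delivered 1.8 million vehicles in 2023."
    claim = llm.invoke("Elon's company crushed it, delivering 1.8 million vehicles.")
    assert "1.8 million vehicles" in claim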

Limitations

  • Single claim: Extracts and traces one claim at a time
  • No persistence: Ephemeral, single-session (no saved history)
  • Search dependent: Quality depends on Tavily search results
  • Temporal gaps: May find outdated sources for recent claims
  • Language: English only

Documentation

| Document | Contents |
| --- | --- |
| SETUP_GUIDE.md | Full installation instructions, Docker setup |
| DATA_SCHEMAS.md | Pydantic models and type definitions |
| PROMPTS.md | LLM prompts with few-shot examples |
| TECHNICAL_SPEC.md | Implementation details for each node |
| TESTING_STRATEGY.md | Test fixtures and mocking patterns |

Tech Stack

Python 3.11 · LangGraph · OpenAI (GPT-5-mini) · Tavily Search API · Gradio · Pydantic · pytest · Docker

License

MIT
