Mechanistic interpretability on texts about human greatness, biographies, speeches, and philosophy, using a frozen Meta Llama 3 8B model, sparse autoencoders (SAEs) on residual-stream activations, and activation patching / circuit-style analysis in the spirit of frontier interpretability work (e.g. sparse features, causal interventions).
The scientific bet: train an SAE on internal activations elicited by “greatness-dense” public-domain text, then interpret which latent features fire and how patching changes behavior, turning “greatness” from a vague label into testable hypotheses about representations and circuits.
| Area | Status |
|---|---|
| Chunk 1 — Data ingestion | Implemented: src/data_pipeline.py |
| Chunk 2 — Activation harvester | Planned (src/activation_harvester.py) |
| Chunk 3 — SAE architecture | Planned (src/sae_model.py) |
| Chunk 4 — SAE training | Planned (src/sae_trainer.py) |
| Chunk 5 — Feature interpretation | Planned (src/interpreter.py) |
| Activation patching scaffold | Implemented (baseline sweep): src/patching_engine.py |
This week focuses on expanding the data corpus beyond the initial Carnegie autobiography. I'll curate a "greatness-dense" dataset from primary sources of 15 exceptional historical and contemporary figures I will call "characters." These characters represent diverse domains of human achievement, providing rich text for training sparse autoencoders on greatness representations.
Core Characters:
- Winston Churchill
- Thomas Jefferson
- George Washington
- Leo Tolstoy
- Voltaire
- Charles Darwin
- Mahatma Gandhi
- Ralph Waldo Emerson
- Samuel Pepys
- Eleanor Roosevelt
- Warren Buffett
- Ulysses S. Grant
- Benjamin Franklin
- Oprah Winfrey
- Naval Ravikant
Plan:
- Develop a scraper script (`src/data_scraper.py`) to download the top 3 primary sources per character (45 total) from public-domain archives such as Project Gutenberg and Archive.org.
- Handle manual collection for licensed or modern sources (e.g., Buffett letters, Oprah transcripts).
- Store raw text directly to my external hard drive with metadata tracking.
- Validate integrity and estimate ~50GB total corpus.
- Output: cleaned and tokenized chunks ready for activation harvesting.
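The scraper step above could start as a sketch like the following. The function names, the metadata fields, and the example book ID mapping are assumptions, not the actual `src/data_scraper.py` API; only the Gutenberg `cache/epub` URL pattern comes from the sources already used in this repo.

```python
"""Minimal sketch of the planned src/data_scraper.py (helper names hypothetical)."""
import json
import urllib.request
from pathlib import Path


def gutenberg_txt_url(book_id: int) -> str:
    # Standard Project Gutenberg plain-text location (the "cache/epub" URL).
    return f"https://www.gutenberg.org/cache/epub/{book_id}/pg{book_id}.txt"


def fetch_source(book_id: int, character: str, out_dir: Path) -> Path:
    """Download one public-domain source and record minimal metadata next to it."""
    out_dir.mkdir(parents=True, exist_ok=True)
    dest = out_dir / f"gutenberg_{book_id}.txt"
    with urllib.request.urlopen(gutenberg_txt_url(book_id)) as resp:
        dest.write_bytes(resp.read())
    meta = {
        "book_id": book_id,
        "character": character,
        "url": gutenberg_txt_url(book_id),
    }
    (out_dir / f"gutenberg_{book_id}.meta.json").write_text(json.dumps(meta, indent=2))
    return dest


# Example using the one source already in the pipeline (Carnegie, Gutenberg #17976):
# fetch_source(17976, "Andrew Carnegie", Path("data/raw"))
```

Per-character book IDs would still need to be curated by hand; licensed or modern material (Buffett letters, Oprah transcripts) stays a manual step.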
Work is intended to proceed chunk by chunk with validation before scaling on GPU.
- Raw data ingestion — Pull public-domain text (Project Gutenberg), clean boilerplate, tokenize with the same tokenizer as Llama 3 8B, and emit fixed-length windows (currently 256 tokens) for batching. Output: `data/raw/` (e.g. `chunked.pt` with `input_ids` + metadata).
- Activation harvester — Load Llama 3 8B via TransformerLens (`HookedTransformer`), run chunks through the frozen model, and record residual activations at a chosen layer (e.g. a mid-depth `hook_resid_post`). Output: `results/activations/` (large `.pt` shards; batched to avoid OOM on A100-class GPUs).
- SAE architecture — Encoder (linear → nonlinearity), wide latent (typically several × `d_model`), decoder (linear). Losses: MSE reconstruction + L1 sparsity on latents.
- SAE training — Train on harvested activations with Adam; checkpoint to `results/models/` (e.g. `sae_weights.pth`).
- Interpretation — For text snippets: forward through the LLM → layer activation → SAE encode → top-k latent features; correlate spikes with the text; optionally mine max-activating examples from stored activations.
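The SAE architecture, loss, and top-k interpretation steps above can be sketched in a few lines of PyTorch. This is a hedged sketch, not the planned `src/sae_model.py`: the class name, expansion factor, and L1 coefficient are assumptions; only the MSE + L1 recipe and the wide-latent shape come from the plan itself.

```python
# Hypothetical sketch of src/sae_model.py: a vanilla SAE with MSE + L1 loss.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, expansion: int = 8):
        super().__init__()
        d_latent = expansion * d_model          # wide latent: several x d_model
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))         # nonnegative, hopefully sparse codes
        x_hat = self.decoder(z)
        return x_hat, z


def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    recon = torch.mean((x - x_hat) ** 2)        # MSE reconstruction term
    sparsity = l1_coeff * z.abs().mean()        # L1 sparsity penalty on latents
    return recon + sparsity


def top_features(z: torch.Tensor, k: int = 5):
    """Interpretation step: indices of the k most active latents per example."""
    return torch.topk(z, k, dim=-1).indices
```

Training (Chunk 4) would loop Adam over harvested activation shards with `sae_loss`, then checkpoint the module's `state_dict` to `results/models/`.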
Patching (parallel track): src/patching_engine.py already runs a full-layer sweep of residual patching between a matched-length target vs baseline prompt pair (greatness / composure themed). Next step is to attach a metric (e.g. logit difference on contrast tokens) rather than only confirming hooks run.
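The metric mentioned above could start as a plain logit difference on a contrast token pair. The function below is a sketch; the token IDs and the TransformerLens wiring in the trailing comment are assumptions about how `src/patching_engine.py` might use it, not its current code.

```python
# Sketch of a logit-difference metric for scoring residual patches.
import torch


def logit_diff(logits: torch.Tensor, target_tok: int, baseline_tok: int) -> torch.Tensor:
    """Difference between two contrast tokens' logits at the final position.

    logits: [batch, seq, d_vocab] from a forward pass; a positive value means
    the patch pushed the model toward the target-themed continuation.
    """
    final = logits[:, -1, :]
    return (final[:, target_tok] - final[:, baseline_tok]).mean()


# With TransformerLens, a residual patch at layer L would be scored roughly as:
#   patched_logits = model.run_with_hooks(
#       baseline_tokens,
#       fwd_hooks=[(f"blocks.{L}.hook_resid_post", patch_hook)],
#   )
#   score = logit_diff(patched_logits, target_tok, baseline_tok)
```

Sweeping this score over layers turns the existing full-layer sweep from "hooks run" into a per-layer causal effect curve.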
- Source: Project Gutenberg #17976 — The Autobiography of Andrew Carnegie (plain text via the standard `cache/epub` URL).
- Tokenizer: `meta-llama/Meta-Llama-3-8B` (gated on Hugging Face; requires access + token).
- Output file: `data/raw/gutenberg_17976_chunks_256.pt` — tensor `[num_chunks, 256]`, dtype `long`, plus a `meta` dict.
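The fixed-length windowing that produces the `[num_chunks, 256]` tensor might look like this sketch (the function name is illustrative, not the real `src/data_pipeline.py` API):

```python
"""Illustrative sketch of Chunk 1's windowing step (name hypothetical)."""
from typing import List


def make_windows(input_ids: List[int], window: int = 256) -> List[List[int]]:
    """Split a token-ID stream into non-overlapping fixed-length windows.

    The trailing remainder (< window tokens) is dropped so every chunk has
    an identical shape for batching.
    """
    n_full = len(input_ids) // window
    return [input_ids[i * window:(i + 1) * window] for i in range(n_full)]


# In the real pipeline these windows are stacked into a [num_chunks, 256]
# long tensor and saved with torch.save alongside a metadata dict.
```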
Run:

```bash
export HF_TOKEN=...   # Hugging Face token with Llama 3 access
python -m src.data_pipeline
```

```text
├── LICENSE
├── README.md
├── requirements.txt       # Python deps (torch, transformers, transformer-lens, …)
├── .gitignore             # includes results/, typical Python ignores
├── src/
│   ├── data_pipeline.py       # Chunk 1
│   └── patching_engine.py     # Activation patching experiment scaffold
├── data/raw/              # Chunked datasets (created by pipeline; may be large)
└── results/               # Activations, checkpoints (gitignored; use for heavy artifacts)
```
- Python 3.10+ recommended.
- Use a virtual environment. Prefer creating the venv outside cloud-synced folders (e.g. Google Drive): sync clients often make many small file operations slow or flaky, which hurts `pip`, `git`, and even `source .venv/bin/activate`.
Install from requirements.txt when present, or equivalently:
- `torch` (≥ 2.4 recommended for recent `transformers` 5.x)
- `transformers`
- `transformer-lens`
- `datasets`, `accelerate` (for planned scaling / HF workflows)
NumPy: Avoid NumPy 2.x with older torch wheels that were built against NumPy 1.x (ABI warnings / `_ARRAY_API` errors). Prefer `numpy>=1.26,<2` with a pinned modern torch, or follow current PyTorch release notes for NumPy 2 compatibility.
- Request access to `meta-llama/Meta-Llama-3-8B` on Hugging Face and accept the license. Then `export HF_TOKEN=...` or `huggingface-cli login`.
- Without access, Chunk 1 fails at tokenizer download with a 403 / gated-repo error; that is expected until access is granted.
- Local: development and small tests (CPU).
- Planned: Google Colab Pro (e.g. A100) for model load, activation harvesting, and SAE training. Large tensors belong under `results/` (ignored by git); sync or copy those separately if you use cloud backup drives.
- Tokenizer alignment: Chunk 1 uses the Llama 3 8B tokenizer so chunks match `HookedTokenizer`/`HookedTransformer` later.
- Dependency stack: `transformers` 5.x expects torch ≥ 2.4; mixing torch 2.2 + NumPy 2.x produced ABI warnings and broken NumPy integration in some installs—address with coordinated upgrades or `numpy<2`.
- Git + cloud sync: Keeping a live `.git` directory inside a Google Drive–backed path led to `Operation timed out` on reads (e.g. `.git/HEAD`, `rsync`/`mmap` on source files). GitHub is the canonical history; a local clone on normal disk for `git commit`/`push`, with optional manual sync of working files to/from Drive, is more reliable than running Git fully inside the synced tree.
- Virtualenv on Drive: Same class of I/O issues; a venv on local disk avoids multi-second `activate` and fragile `pip`.
- Implement Chunks 2–5 in order; keep layer index and paths configurable.
- Expand the greatness corpus (more Gutenberg sources; screenplays only with clear rights).
- Extend patching with a quantitative logit- or probability-based metric on contrast pairs.
- Optional: JumpReLU / transcoder-style SAE variants, automated feature labeling, steering experiments—documented as stretch goals.
See LICENSE.
Greatness-Analyzed — interpretability-first study of how language models represent ambition, composure, and greatness in text, without claiming those labels are uniquely “one feature” in the model; features are hypotheses checked against reconstruction, sparsity, examples, and causal tests.