StafferFi consists of three logically separate layers:

- `./demo/` - (Py, Go)
- `./zOS/` - (C, Java)
- `./zenbase/` - (PG, JS)
This project uses pnpm exclusively. The specific version is pinned in `package.json`:

- Version: `pnpm@10.24.0`
- Never use `npm` or `yarn` for JavaScript dependencies
- The Docker builder stage enables corepack for pnpm
The zOS project is a separate framework for federal mainframe data modernization:
- Purpose: Extract data from z/OS mainframes (DB2, VSAM, IMS) via secure TN3270/JCL
- Architecture: TN3270 TLS connector → JCL submission → DB2 unloads → DuckDB → PostgreSQL → APIs
- Mission: Cross-agency data layer for federal government without changing legacy mainframes
- Deployment: Designed for LTOD (Limited Tour of Duty) engineering teams
- Scope: 20-30 agencies/year with 7-8 engineers via automation
This is conceptual, planning-stage work (an illustrative sketch follows). The demo/ project serves as a reference implementation of the data pipeline architecture.
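Since zOS is at the planning stage, the following Python sketch only mirrors the staged flow listed above; every name in it is hypothetical and none of it exists in the repository.

```python
# Illustrative only: models the zOS extraction flow as labeled stages.
# No class or function here exists in the actual (planning-stage) zOS code.
from dataclasses import dataclass


@dataclass
class ExtractJob:
    """One extraction request against a z/OS data source."""
    agency: str
    source: str       # "DB2", "VSAM", or "IMS"
    jcl_member: str   # JCL job that would produce the unload


def run_pipeline(job: ExtractJob) -> None:
    # Stages mirror the architecture arrow above, stubbed out as prints.
    stages = ("tn3270_tls_connect", "submit_jcl", "db2_unload",
              "load_duckdb", "publish_postgres")
    for stage in stages:
        print(f"[{job.agency}/{job.source}] {stage}")


run_pipeline(ExtractJob("example-agency", "DB2", "UNLDJOB1"))
```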
StafferFi is a dual-purpose repository containing:
- demo/ - Multi-tier polyglot ETL service for eCFR analytics (pnpm monorepo)
- zOS/ - Mainframe integration framework for federal data modernization
```bash
brew install --cask docker
```

```bash
./demo.sh
```

```bash
cd demo
# Install dependencies (uses pnpm workspaces)
pnpm install
# Build all apps
pnpm build
# Build specific apps
pnpm build:web # Next.js frontend
pnpm build:api   # Express API
```

```bash
# Run web app (Next.js) in dev mode
pnpm dev:web
# Run API in dev mode
pnpm dev:api
# Run individual apps from their directories
cd apps/web && pnpm dev
cd apps/api && pnpm dev
```

```bash
# Web app tests (Vitest)
cd apps/web
pnpm test
pnpm test:watch
# Cypress E2E tests
pnpm cypress:open
pnpm cypress
# Python lake tests
cd apps/lake
python test_pipeline.py
python test_postgres.py
```

```bash
# Web app
cd apps/web
pnpm lint
pnpm typecheck
pnpm format
# API (TypeScript)
cd apps/api
pnpm build   # Runs tsc, which checks types
```

```bash
cd demo
# Quick start all services (recommended)
./demo.sh
# Or use docker compose
sudo docker compose up --build
# Access services:
# - Web UI: http://localhost:3000
# - API: http://localhost:4000
# - Lake: http://localhost:8000
# - Postgres: localhost:5432
# Stop services
sudo docker compose down
# Full reset
sudo docker compose down -v
sudo docker compose up --build
```

```bash
cd demo/apps/lake
# Create virtualenv
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run ingestion and ETL
python ingestion.py
python etl_to_postgres.py
# Run lake service
gunicorn app:app --bind 0.0.0.0:8000
```

The zOS directory contains mainframe integration tools but has minimal executable code. Refer to `zOS/README.md` for architecture details.
This is a pnpm monorepo with three main applications:
- Framework: Next.js 15 with App Router
- Language: TypeScript
- Styling: Tailwind CSS
- Charts: amCharts4
- Testing: Vitest, Testing Library, Cypress
- Structure:
  - `app/` - Next.js app router pages (dashboard, agencies, corrections, trends, reports)
  - `components/` - Reusable React components (BarChart, LineChart, etc.)
- Framework: Express.js
- Language: TypeScript
- Database: PostgreSQL (via the `pg` client)
- Previously: Used DuckDB; migrated to PostgreSQL for production
- Endpoints: 11 REST endpoints for agencies, corrections, and trends data
- Port: 4000 (configurable via `API_PORT`)
- Framework: Flask + Gunicorn
- Language: Python
- Analytics Engine: DuckDB for in-memory analytics
- Data Pipeline:
  - `ingestion.py` - Downloads eCFR data, validates checksums (SHA-256), stores in DuckDB
  - `analytics.py` - Calculates RVI (Regulatory Volatility Index) and other metrics
  - `etl_to_postgres.py` - Migrates transformed data from DuckDB to PostgreSQL
- Schemas: `duckdb_schema.sql` and `postgres_schema.sql`
- Port: 8000
- `packages/tailwind-config` - Shared Tailwind configuration
eCFR API → ingestion.py → DuckDB (analytics) → etl_to_postgres.py → PostgreSQL → Express API → Next.js Web
- Ingestion: Python fetches eCFR corrections and agency data, validates checksums (see the sketch after this list)
- Analytics: DuckDB performs analytical transformations (RVI calculation, aggregations)
- ETL: Data loaded into PostgreSQL for API consumption
- API Layer: Express serves REST endpoints with PostgreSQL queries
- Frontend: Next.js fetches from API and renders charts/dashboards
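To make the ingestion step concrete, here is a minimal sketch of SHA-256 validation before a download lands in DuckDB. The helper names and the `raw_downloads` table are assumptions for illustration; the real logic and schema live in `ingestion.py` and `duckdb_schema.sql`.

```python
import hashlib

import duckdb


def sha256_of(payload: bytes) -> str:
    """Hex SHA-256 digest of a downloaded eCFR payload."""
    return hashlib.sha256(payload).hexdigest()


def store_if_valid(payload: bytes, expected_sha256: str,
                   db_path: str = "ecfr_analytics.duckdb") -> None:
    # Reject the download outright if the digest does not match.
    actual = sha256_of(payload)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch: {actual} != {expected_sha256}")
    con = duckdb.connect(db_path)
    # Hypothetical landing table; the real one is defined in duckdb_schema.sql.
    con.execute("CREATE TABLE IF NOT EXISTS raw_downloads (sha256 TEXT, body BLOB)")
    con.execute("INSERT INTO raw_downloads VALUES (?, ?)", [actual, payload])
    con.close()
```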
The Dockerfile uses a sophisticated multi-stage build:
- deps (node:20-alpine) - Installs pnpm dependencies
- builder - Builds Next.js (standalone output) and API (TypeScript), creates self-contained API bundle
- lake-deps (python:3.10-slim) - Creates Python virtualenv with lake dependencies
- runner (python:3.10-slim) - Final image with all services
supervisord manages all services in a single container:
- lake_pipeline (priority 5) - Runs ingestion + ETL once at startup
- web (priority 20) - Next.js standalone server on port 3000
- api (priority 20) - Express API on port 4000
- lake (priority 20) - Gunicorn Flask app on port 8000
The priority system ensures ETL completes before web services start.
Preferred for development:
- postgres service with health checks
- etl one-shot service (depends on postgres health)
- api service (depends on postgres + etl completion)
- web service (depends on api)
API:
- `DATABASE_URL` - PostgreSQL connection string (default: `postgresql://stafferfi:stafferfi_dev@localhost:5432/ecfr_analytics`)
- `API_PORT` - API listen port (default: 4000)
- `NODE_ENV` - Node environment (production/development)

Web:
- `PORT` - Web server port (default: 3000)
- `HOSTNAME` - Bind address (default: 0.0.0.0)
- `API_URL` - Internal API URL for SSR (default: http://api:4000)
- `NEXT_PUBLIC_API_URL` - Client-side API URL (default: http://localhost:4000)
- `NEXT_TELEMETRY_DISABLED` - Set to 1 in production

ETL:
- `DATABASE_URL` - PostgreSQL connection string for ETL (read as sketched below)
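A minimal sketch of how the Python ETL side could pick up its connection string, using the development default documented above. The `psycopg2` driver is an assumption here; the actual driver is whatever `apps/lake/requirements.txt` pins.

```python
import os

import psycopg2  # assumed driver; see apps/lake/requirements.txt for the real one

# Fall back to the documented development default when DATABASE_URL is unset.
DEFAULT_URL = "postgresql://stafferfi:stafferfi_dev@localhost:5432/ecfr_analytics"
DATABASE_URL = os.environ.get("DATABASE_URL", DEFAULT_URL)

conn = psycopg2.connect(DATABASE_URL)
print(conn.dsn)  # confirm which database the ETL will write to
conn.close()
```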
- Tool: pnpm with workspaces
- Rationale: Efficient disk usage, strict dependency resolution, speed
- Config: `pnpm-workspace.yaml` defines the workspace structure
- Web app built with `output: 'standalone'` in `next.config.ts`
- Produces a self-contained server bundle (no external dependencies)
- Reduces runtime image size significantly
The API build process creates an isolated bundle:
- TypeScript compiled to `dist/`
- Copied to `/tmp/api` with `package.json`
- Production dependencies installed via npm (not pnpm)
- Decouples runtime from monorepo structure
- DuckDB: Used for analytical transformations (columnar, fast aggregations)
- PostgreSQL: Used for API queries (ACID, connection pooling)
- Why Both: DuckDB excels at ETL analytics, PostgreSQL serves web requests (handoff sketched below)
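A minimal sketch of that handoff, assuming a hypothetical `corrections` table in DuckDB and a `yearly_trends` table in PostgreSQL; the real transformation lives in `etl_to_postgres.py`, and `psycopg2` is an assumed driver.

```python
import duckdb
import psycopg2  # assumed Postgres driver for the serving side


def copy_yearly_trends(duckdb_path: str, pg_url: str) -> None:
    # DuckDB side: fast columnar aggregation over the raw corrections.
    con = duckdb.connect(duckdb_path, read_only=True)
    rows = con.execute(
        "SELECT year, COUNT(*) FROM corrections GROUP BY year"  # hypothetical table
    ).fetchall()
    con.close()

    # PostgreSQL side: pre-aggregated rows for the Express API to serve.
    pg = psycopg2.connect(pg_url)
    with pg, pg.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS yearly_trends (year INT, corrections INT)")
        cur.executemany("INSERT INTO yearly_trends VALUES (%s, %s)", rows)
    pg.close()
```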
Custom metric calculated in `analytics.py` (see the sketch after this list):
- Measures frequency and impact of regulatory corrections
- Combines correction count, recency, and magnitude
- Core business logic for the eCFR analytics platform
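The actual formula is in `analytics.py` and is not reproduced here; the sketch below only illustrates the stated shape - correction count, recency, and magnitude folded into one score - with made-up weights and field names.

```python
from datetime import date


def rvi_sketch(corrections: list[dict], today: date,
               half_life_days: float = 365.0) -> float:
    """Illustrative only: the real RVI in analytics.py may differ entirely."""
    score = 0.0
    for c in corrections:
        age_days = (today - c["date"]).days
        recency = 0.5 ** (age_days / half_life_days)  # newer corrections weigh more
        score += recency * c["magnitude"]             # magnitude scales each hit
    return score  # summing implicitly rewards a higher correction count


print(rvi_sketch([{"date": date(2024, 1, 15), "magnitude": 2.0}], date(2024, 7, 1)))
```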
- Unit Tests: Vitest + Testing Library
- E2E Tests: Cypress
- Storybook: Component development and visual testing
- TypeScript compilation serves as type checking
- No explicit test suite (integration tests via E2E)
- `test_pipeline.py` - Data integrity, checksum verification, analytics validation
- `test_postgres.py` - PostgreSQL schema and ETL verification
- Run with: `python test_pipeline.py`
- Always run `pnpm install` from the repo root
- Use workspace filters: `pnpm --filter @stafferfi/web <command>`
- Workspace names: `@stafferfi/web`, `@stafferfi/api`
- Postgres runs in Docker (non-persistent tmpfs for MVP)
- Schema changes: Edit `apps/lake/postgres_schema.sql` and rebuild the ETL
- DuckDB file: `apps/lake/ecfr_analytics.duckdb` (gitignored; inspectable as sketched below)
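For ad-hoc inspection of that DuckDB file, a short read-only Python session is enough; the table names you will see are whatever `duckdb_schema.sql` defines:

```python
import duckdb

# Open read-only to avoid accidentally mutating pipeline data.
con = duckdb.connect("apps/lake/ecfr_analytics.duckdb", read_only=True)
print(con.execute("SHOW TABLES").fetchall())
con.close()
```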
- Use `docker compose` for development (it orchestrates dependencies)
- Use `./demo.sh` for quick demos
- The single-container `stafferfi-all` image requires an external Postgres
- Always check `docker compose logs -f` when debugging
Inside the container:

```bash
supervisorctl -c /etc/supervisord.conf status
supervisorctl -c /etc/supervisord.conf tail <service> stdout
supervisorctl -c /etc/supervisord.conf restart <service>
```

The `lake_pipeline` program runs ingestion AND ETL sequentially in one command to prevent multiple processes from accessing DuckDB simultaneously. Never run these as separate supervisor programs.
The API (`apps/api/src/index.ts`) exposes the following endpoints (a smoke-test snippet follows the list):
- `/` - API metadata and endpoint list
- `/health` - Health check
- `/api/stats` - Aggregate statistics
- `/api/agencies` - List agencies (supports pagination)
- `/api/agencies/:slug` - Agency details
- `/api/agencies/top/corrections` - Top agencies by correction count
- `/api/agencies/top/rvi` - Top agencies by RVI
- `/api/corrections` - List corrections (filterable by year, title)
- `/api/corrections/recent` - Recent corrections
- `/api/trends/yearly` - Yearly trend data
- `/api/trends/monthly` - Monthly trend data
- `/api/trends/titles` - Top CFR titles
- `/api/reports/word-count` - Word count report (if implemented)
- `/api/reports/scorecard` - Scorecard report (if implemented)
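A quick smoke test against a locally running stack (ports as in the docker compose section). This uses the third-party `requests` package and assumes the endpoints return JSON, which is not verified here against the actual handlers.

```python
import requests  # pip install requests

BASE = "http://localhost:4000"

# Health check: should respond once the api service is up.
print(requests.get(f"{BASE}/health", timeout=5).status_code)

# Aggregate statistics; the JSON shape is defined in apps/api/src/index.ts.
print(requests.get(f"{BASE}/api/stats", timeout=5).json())
```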
- `app/page.tsx` - Dashboard with stats cards and charts
- `app/agencies/page.tsx` - Sortable, searchable agency list
- `app/agencies/[slug]/page.tsx` - Agency detail page
- `app/corrections/page.tsx` - Corrections list
- `app/trends/page.tsx` - Trend visualizations
- `app/reports/*` - Report pages