CrushData AI

Data Analyst Intelligence for AI IDEs

An AI skill that provides structured, professional data analysis workflows with built-in validation, helping AI coding assistants perform data analysis like a careful human analyst.

[Screenshot: CrushData AI landing page]

🎯 What It Does

CrushData AI provides:

  • 10 Analysis Workflows - EDA, Dashboard, A/B Test, Cohort, Funnel, Time Series, Segmentation, Data Cleaning, Ad-hoc, KPI Reporting
  • 400+ Searchable Patterns - Metrics, SQL, Python, Charts, Database Tips, Common Mistakes
  • Context-Building Protocol - Forces AI to ask questions and validate before delivering results
  • 4 Industry Modules - Metrics specific to SaaS, E-commerce, Finance, and Marketing

πŸš€ Quick Start

Install via CLI

npm install -g crushdata

What npm install -g crushdata Does

The -g flag means Global Install:

            Local Install (npm install)         Global Install (npm install -g)
Location    ./node_modules/ in current folder   System-wide (e.g., %APPDATA%\npm\)
Scope       Only available in that project      Available everywhere on your computer
Use Case    Libraries for your project          CLI tools you want to run anywhere

Then in any project:

cd your-project
crushdata init --ai all    # All AI IDEs
crushdata init --ai claude # Claude Code only

What crushdata init Does

When you run crushdata init --ai all, the CLI:

  1. Creates .shared/data-analyst/ - Contains the BM25 search engine and 13 CSV knowledge databases (~400 rows of data analyst patterns)

  2. Creates AI IDE config files based on --ai flag:

    Flag              Creates
    --ai claude       .claude/skills/data-analyst/SKILL.md
    --ai cursor       .cursor/commands/data-analyst.md
    --ai windsurf     .windsurf/workflows/data-analyst.md
    --ai antigravity  .agent/workflows/data-analyst.md
    --ai copilot      .github/prompts/data-analyst.prompt.md
    --ai kiro         .kiro/steering/data-analyst.md
    --ai all          All of the above
  3. Your AI IDE automatically detects the config files and enables the /data-analyst command
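
Putting it together, a project initialized with --ai all ends up with a layout roughly like this (abridged and illustrative; only the search script path and the config file paths from the table above are confirmed):

your-project/
├── .shared/data-analyst/              # BM25 search engine + knowledge bases
│   ├── scripts/search.py
│   └── ... (13 CSV knowledge databases)
├── .claude/skills/data-analyst/SKILL.md
├── .cursor/commands/data-analyst.md
├── .windsurf/workflows/data-analyst.md
├── .agent/workflows/data-analyst.md
├── .github/prompts/data-analyst.prompt.md
└── .kiro/steering/data-analyst.md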

Updating

To update the CLI and refresh your project's AI skill files:

npm install -g crushdata@latest
# Update specific IDE (recommended):
crushdata init --ai cursor --force

# Or update everything:
crushdata init --force

πŸ”Œ Data Connections (New in v1.2)

CrushData AI now features a Connection Manager to securely handle your data credentials.

1. Add Data Sources

Run the connect command to open the management UI:

crushdata connect
  • Supported Types: CSV, MySQL, PostgreSQL, Shopify, BigQuery, Snowflake
  • Private & Secure: Credentials are stored locally on your machine (~/.crushdata/connections.json). They are never uploaded to any server or included in the npm package.

[Screenshot: CrushData AI Connection Manager]

Note

Persistence: Once you add a connection, you can close the UI (Ctrl+C). The AI IDE reads the saved connection details directly from your local config file, so the server does NOT need to keep running.
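
To illustrate what "reads the saved connection details" means in practice, here is a minimal Python sketch. The file path is the one documented above; the internal JSON structure shown is an assumption for illustration, not the documented format.

import json
from pathlib import Path

# Path documented above; this file stays on your machine.
config_path = Path.home() / ".crushdata" / "connections.json"
connections = json.loads(config_path.read_text())

# Hypothetical structure: a mapping of connection name -> settings.
for name, settings in connections.items():
    print(name, settings.get("type"))  # e.g. "my-shop-data postgresql"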

2. View Saved Connections

crushdata connections

πŸ“ˆ Data Visualization (New in v1.3)

CrushData AI generates interactive dashboards to visualize your analysis results.

1. View Dashboard

Run the dashboard command to open the local React-based viewer:

# Using installed package (generally faster)
crushdata dashboard

# OR using npx (if not in PATH)
npx crushdata dashboard

[Screenshot: Advanced dashboard with Tier 2 charts (Funnel, Gauge, Radar, etc.)]

[Screenshot: Simple dashboard with standard charts (Line, Bar, Pie, etc.)]

2. Features

  • Tier 1 Charts: Line, Bar, Pie, Area, Scatter, Radar (via Recharts)
  • Tier 2 Charts: Funnel, Gauge, Heatmap, Sankey, Treemap, Waterfall (via Plotly)
  • Auto-Refresh: The dashboard automatically updates when your AI agent writes new data to reports/dashboards/.
  • Data Refresh: Use the "Refresh" button πŸ”„ on any chart to re-run the saved SQL/Python query against your data source.

3. AI Workflow Example

When you ask an AI agent (like Claude or Cursor) to "create a dashboard", it follows this process:

  1. Analyzes Data: The AI runs SQL/Python to calculate metrics and aggregates.
  2. Generates JSON: It creates a file at reports/dashboards/your-topic.json using the CrushData schema.
  3. Visualizes: You run the dashboard command to see the rendered charts instantly.

The AI automatically selects the best chart type (e.g., Line for trends, Bar for comparisons) based on your data.
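
As a concrete illustration of step 2, the agent's output might resemble the Python sketch below. The chart fields used here are assumptions for illustration; consult the skill files for the actual CrushData schema.

import json
from pathlib import Path

# Metrics computed in step 1 (dummy values for the sketch).
monthly_revenue = {"Jan": 12000, "Feb": 13400, "Mar": 15100}

# Hypothetical dashboard spec; field names are illustrative only.
dashboard = {
    "title": "Sales Trends",
    "charts": [{
        "type": "line",  # a trend over time -> Line chart
        "title": "Monthly Revenue",
        "x": list(monthly_revenue.keys()),
        "y": list(monthly_revenue.values()),
    }],
}

# reports/dashboards/ is the folder the viewer watches (step 3).
out = Path("reports/dashboards/sales-trends.json")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps(dashboard, indent=2))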

πŸ’» Usage

Step 1: Initialize

crushdata init --ai all

Step 2: Use in AI IDE

The skill activates automatically (Claude) or via slash command (others).

Example Workflow:

  1. User Request: "Analyze the sales trends in my-shop-data"
  2. AI Action: The AI checks your saved connections.
  3. AI Action: The AI runs:
    npx crushdata snippet my-shop-data --lang python
  4. Result: The AI receives the secure code to connect to your data (read-only) and proceeds with analysis.
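
The exact snippet the CLI returns is not shown here, but for a PostgreSQL source it would plausibly follow a read-only pattern like this sketch. The connection URL fields and table name are placeholders, and the read-only option shown is one SQLAlchemy/psycopg2 mechanism, not necessarily the one crushdata emits.

import pandas as pd
from sqlalchemy import create_engine

# Credentials come from ~/.crushdata/connections.json; the URL here
# is a placeholder -- never hard-code real secrets.
engine = create_engine(
    "postgresql://USER:PASSWORD@HOST:5432/DBNAME",
    execution_options={"postgresql_readonly": True},  # read-only session
)

# "orders" is a hypothetical table used for illustration.
df = pd.read_sql("SELECT * FROM orders LIMIT 1000", engine)
print(df.describe())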

Claude Code

The skill activates automatically when you request data analysis work. Just chat naturally:

Analyze customer churn for my SaaS product

Cursor / Windsurf / Antigravity

Use the slash command to invoke the skill:

/data-analyst Analyze customer churn for my SaaS product

Kiro

Type / in chat to see available commands, then select data-analyst:

/data-analyst Analyze customer churn for my SaaS product

GitHub Copilot

In VS Code with Copilot, type / in chat to see available prompts, then select data-analyst:

/data-analyst Analyze customer churn for my SaaS product

Example Prompts

Analyze customer churn for my SaaS product
Create a dashboard for e-commerce analytics
Calculate MRR and ARR from subscription data
Build a cohort retention analysis
Perform A/B test analysis on conversion rates

Search Directly

# Search workflows
python3 .shared/data-analyst/scripts/search.py "EDA" --domain workflow

# Search metrics
python3 .shared/data-analyst/scripts/search.py "churn" --domain metric

# Search SQL patterns
python3 .shared/data-analyst/scripts/search.py "cohort" --domain sql

# Industry-specific
python3 .shared/data-analyst/scripts/search.py "MRR" --industry saas
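
The search script ranks rows from the 13 knowledge CSVs with BM25. As a rough sketch of the ranking idea (not the project's actual implementation), using the rank_bm25 package:

from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Toy corpus standing in for rows of a knowledge CSV.
rows = [
    "churn rate formula customers lost divided by customers at start",
    "cohort retention analysis with SQL window functions",
    "MRR monthly recurring revenue sum of active subscription amounts",
]
bm25 = BM25Okapi([row.lower().split() for row in rows])

# Score every row against the query and print the best match.
scores = bm25.get_scores("churn".split())
print(rows[scores.argmax()])  # -> the churn row scores highest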

πŸ“Š Search Domains

Domain      Content
workflow    Step-by-step analysis processes
metric      Metric definitions with formulas
chart       Visualization recommendations
cleaning    Data quality patterns
sql         SQL patterns (window functions, cohorts)
python      pandas/polars code snippets
database    PostgreSQL, BigQuery, Snowflake tips
report      Dashboard UX guidelines
validation  Common mistakes to avoid

🏭 Industry Modules

Industry    Key Metrics
saas        MRR, ARR, Churn, CAC, LTV, NRR
ecommerce   Conversion, AOV, Cart Abandonment
finance     Margins, ROI, Cash Flow, Ratios
marketing   CTR, CPA, ROAS, Lead Conversion
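
For a flavor of what the saas module covers, here is an illustrative pandas sketch of MRR and ARR (the table and column names are assumed, not taken from the module):

import pandas as pd

# Hypothetical subscriptions table.
subs = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "monthly_price": [50.0, 120.0, 80.0],
    "status": ["active", "active", "churned"],
})

# MRR = sum of monthly prices across active subscriptions; ARR = 12 x MRR.
mrr = subs.loc[subs["status"] == "active", "monthly_price"].sum()
print(f"MRR: ${mrr:,.2f}  ARR: ${mrr * 12:,.2f}")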

πŸ”’ How It Works

Context-Building Protocol

  1. Discovery - AI asks about business context before coding
  2. Data Profiling - Mandatory checks before analysis
  3. Data Cleaning (ETL) - Handle missing values/duplicates in etl/ folder
  4. Validation - Verify JOINs, aggregations, and totals
  5. Sanity Checks - Compare to benchmarks before delivery
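
In pandas terms, steps 2, 4, and 5 boil down to checks like the following sketch (file and column names are hypothetical; the skill's real checklists live in the workflow files):

import pandas as pd

orders = pd.read_csv("data/orders.csv")  # hypothetical input

# Step 2: profiling -- nulls, duplicates, basic shape.
print(orders.isna().sum())
print("duplicate rows:", orders.duplicated().sum())

# Step 4: validation -- a LEFT JOIN must not inflate the row count.
customers = pd.read_csv("data/customers.csv")
merged = orders.merge(customers, on="customer_id", how="left")
assert len(merged) == len(orders), "JOIN inflated rows; join key not unique"

# Step 5: sanity check -- totals must survive the merge unchanged.
assert abs(merged["amount"].sum() - orders["amount"].sum()) < 1e-6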

Python Environment

To avoid package conflicts in the global Python environment, the AI is instructed to:

  1. Check: Look for existing venv or .venv.
  2. Create: If missing, run python3 -m venv venv.
  3. Reports: Save all validation/profiling outputs to the reports/ folder, creating it if missing.

This prevents common AI mistakes:

  • ❌ Wrong metric definitions
  • ❌ Duplicate row inflation
  • ❌ Incorrect JOIN types
  • ❌ Unreasonable totals
  • ❌ Cluttered workspaces (scripts are organized in analysis/ and etl/)

πŸ“ License

Apache 2.0
