wjlgatech/data-architecture

🏗️ data-architecture

The open-source Claude plugin for data architects.

Community-built skills that turn Claude into a senior data architect for modeling, platforms, cloud, AI, and modernization.

🚀 Quick Start · 📚 Browse Skills · ✏️ Contribute a Skill · 🗺️ Roadmap


What is this?

data-architecture is a living, community-built Claude plugin that gives Claude the skills of a senior data architect.

Think of the skills as brain modules for your AI. Each skill you install teaches Claude how to:

  • Design data models using Data Vault 2.0, Star Schema, 3NF, and AUDM
  • Architect modern platforms: Data Mesh, Data Fabric, Lakehouse, Lambda
  • Evaluate and choose cloud technologies: Snowflake, Databricks, Azure Synapse
  • Build supply chain analytics from real KPI catalogs
  • Design AI/ML feature stores and RAG architectures
  • Execute data modernization and migration playbooks

Built at Data Architect School (Accenture, 2024) · 5-day curriculum · community-driven · MIT licensed


⚡ Quick Start

Install a skill into Claude (30 seconds)

# 1. Clone
git clone https://github.com/wjlgatech/data-architecture.git

# 2. Pick a skill and read its instructions
cat skills/day1-modeling/SKILL.md

# 3. Paste into Claude's system prompt or use as a Project instruction
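Step 3 can also be scripted. The following is a minimal sketch, assuming only the `skills/<name>/SKILL.md` layout shown in this README; the helper names `load_skill_prompt` and `combine_skills` are hypothetical and not part of the repository.

```python
from pathlib import Path


def load_skill_prompt(repo_root: str, skill: str) -> str:
    """Return the text of skills/<skill>/SKILL.md, ready to paste as a system prompt."""
    skill_file = Path(repo_root) / "skills" / skill / "SKILL.md"
    if not skill_file.is_file():
        raise FileNotFoundError(f"No SKILL.md found for skill '{skill}' at {skill_file}")
    return skill_file.read_text(encoding="utf-8").strip()


def combine_skills(repo_root: str, skills: list[str]) -> str:
    """Join several skills into one prompt, separated by a visible divider."""
    return "\n\n---\n\n".join(load_skill_prompt(repo_root, s) for s in skills)


# Example (assumes the repository was cloned into ./data-architecture):
# prompt = load_skill_prompt("data-architecture", "day1-modeling")
```

The returned string can then be pasted into a system prompt or a Project instruction, as described in step 3.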

Use a slash command

Once installed, Claude responds to built-in commands:

/design-model I need to model a pharmaceutical supply chain with 30 KPIs

/choose-architecture We have 5 source systems, daily batch + real-time events

/kpi-catalog Order Fulfillment domain, OTIF and Perfect Order needed

/audit-vault Review my Hub-Link-Satellite design for DV 2.0 compliance
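Each command above has the same shape: a slash, a command name, then a free-text brief. The sketch below only illustrates that shape. It is not how Claude or this plugin processes commands, and `parse_slash_command` is a hypothetical helper.

```python
def parse_slash_command(message: str) -> tuple[str, str]:
    """Split '/command free text' into (command, free_text)."""
    text = message.strip()
    if not text.startswith("/"):
        raise ValueError("not a slash command")
    name, _, rest = text[1:].partition(" ")
    if not name:
        raise ValueError("missing command name")
    return name, rest.strip()


command, brief = parse_slash_command(
    "/design-model I need to model a pharmaceutical supply chain with 30 KPIs"
)
print(command)  # design-model
print(brief)    # I need to model a pharmaceutical supply chain with 30 KPIs
```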

🗺️ Roadmap

| Day | Module | Commands | Status |
| --- | --- | --- | --- |
| 0️⃣ | Skill Orchestrator | discover-client, assess-maturity, orchestrate-engagement, translate-for-stakeholder, estimate-effort | ✅ Active |
| 1️⃣ | Intro to Data Architecture & Modeling | design-model, choose-architecture, kpi-catalog, audit-vault, dimension-map | ✅ Active |
| 2️⃣ | Data Management | design-mdm, check-data-quality, governance-check, lifecycle-plan, security-review | ✅ Active |
| 3️⃣ | Cloud Data & Technology | design-cloud-platform, design-data-platform, design-ingestion-pipeline, design-api-layer, multi-region-plan | ✅ Active |
| 4️⃣ | Data Intelligence, Analytics & AI | analyze-big-data, design-nlp-pipeline, build-mlops-pipeline, design-realtime-intelligence, responsible-ai-review | ✅ Active |
| 5️⃣ | Data Strategy & GenAI | design-genai-architecture, data-strategy-alignment, build-data-product, modernization-roadmap, operating-model-design | ✅ Active |

6 skills · 30 commands · full 5-day curriculum complete. PRs welcome to extend any module.
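The "6 skills · 30 commands" summary can be checked against the roadmap itself. This sketch copies the command names from the table above into a plain dictionary; the dictionary is an illustration only and is not the format of `skills/index.json`, which this README does not show.

```python
ROADMAP = {
    "Skill Orchestrator": [
        "discover-client", "assess-maturity", "orchestrate-engagement",
        "translate-for-stakeholder", "estimate-effort",
    ],
    "Intro to Data Architecture & Modeling": [
        "design-model", "choose-architecture", "kpi-catalog",
        "audit-vault", "dimension-map",
    ],
    "Data Management": [
        "design-mdm", "check-data-quality", "governance-check",
        "lifecycle-plan", "security-review",
    ],
    "Cloud Data & Technology": [
        "design-cloud-platform", "design-data-platform",
        "design-ingestion-pipeline", "design-api-layer", "multi-region-plan",
    ],
    "Data Intelligence, Analytics & AI": [
        "analyze-big-data", "design-nlp-pipeline", "build-mlops-pipeline",
        "design-realtime-intelligence", "responsible-ai-review",
    ],
    "Data Strategy & GenAI": [
        "design-genai-architecture", "data-strategy-alignment",
        "build-data-product", "modernization-roadmap", "operating-model-design",
    ],
}


def module_for(command: str) -> str:
    """Return the roadmap module that lists the given command."""
    for module, commands in ROADMAP.items():
        if command in commands:
            return module
    raise KeyError(command)


total_commands = sum(len(commands) for commands in ROADMAP.values())
print(len(ROADMAP), total_commands)  # 6 30
```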


📁 Repository Structure

data-architecture/
├── skills/                       # 🧠 Claude skills (one folder = one skill module)
│   ├── skill-orchestrator/       # Meta-skill: client intake, maturity, engagement orchestration
│   ├── day1-modeling/            # Data modeling: Vault, Star, 3NF, AUDM
│   │   ├── SKILL.md              # Main Claude instructions (paste into system prompt)
│   │   ├── metadata.json         # Skill metadata, version, tags
│   │   ├── commands/             # Slash command definitions
│   │   └── references/           # Deep reference material
│   ├── day2-data-management/     # MDM, Data Quality, Governance, Lifecycle, Security
│   ├── day3-cloud-data/          # Cloud platforms, Lakehouse, FHIR, multi-region
│   ├── day4-analytics/           # Big data, clinical NLP, MLOps, real-time, responsible AI
│   ├── day5-strategy/            # GenAI/RAG, data products, modernization, operating model
│   └── index.json                # Machine-readable skill registry
│
├── knowledge-base/               # 📚 Cross-skill shared domain knowledge
│   ├── healthcare-standards.md   # HL7 FHIR, ICD-10, LOINC, SNOMED
│   ├── cloud-platform-patterns.md
│   ├── analytics-patterns.md
│   └── genai-data-patterns.md
│
├── schemas/                      # 🔒 JSON schemas for CI validation
├── templates/                    # 🧩 Copy-paste starters for new skills
├── examples/                     # 📖 Real case studies (interactive HTML)
│   ├── newlife-pharmacy/         # Pharma supply chain, Day 1
│   └── newlife-hospital/         # Healthcare HIS, Days 2–5
├── docs/                         # 📄 Architecture decisions, specs
├── tests/                        # ✅ Validation scripts (run by CI)
├── scripts/                      # 🛠️ CLI tooling
└── .github/                      # ⚙️ Workflows, issue/PR templates
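The tree shows four entries inside `day1-modeling/`: `SKILL.md`, `metadata.json`, `commands/`, and `references/`. A quick layout check for a skill folder could look like the sketch below. Treating all four as required is an assumption, `missing_parts` is a hypothetical helper, and the checks that CI actually runs live in `tests/` and `schemas/`, which this README does not detail.

```python
from pathlib import Path

# Assumption: every skill folder mirrors the day1-modeling layout shown in the tree.
REQUIRED_FILES = ("SKILL.md", "metadata.json")
REQUIRED_DIRS = ("commands", "references")


def missing_parts(skill_dir: str) -> list[str]:
    """List the expected files and folders that are absent from a skill directory."""
    root = Path(skill_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


# Example (assumes a local clone):
# print(missing_parts("data-architecture/skills/day1-modeling"))
```

An empty list means the folder matches the layout; anything else names what to add before opening a PR.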

🤝 Contributing

We merge PRs every day. If your skill passes CI, it gets merged.

# 1. Fork + clone
git clone https://github.com/YOUR_USERNAME/data-architecture.git

# 2. Create a branch
git checkout -b skill/your-skill-name

# 3. Copy the template
cp -r templates/skill-template skills/your-skill-name

# 4. Fill in SKILL.md and metadata.json

# 5. Validate locally
npm run validate

# 6. Open a PR; we'll review and merge the same day

→ Full guide: CONTRIBUTING.md

→ Easy wins: good first issue


🏆 Contributors


Paul Wu

🏗️ Founder

🎓 Live Portfolio: Data Architect School Solutions

Proof of expertise, not slides. Each solution below is a fully-interactive artifact built end-to-end from a real case study. Click to explore.


Day 1 · Pharmaceutical Supply Chain Data Model

NewLife Pharmacy: D2P supply chain, 30 KPIs, Data Vault 2.0 vs. Star Schema decision

One-line verdict: Chose Data Vault 2.0 over Star Schema because multi-vendor invoice discrepancies require storing multiple source "truths" simultaneously, something a Star Schema can't do without picking a winner at ETL time.

| Dimension | Decision |
| --- | --- |
| Architecture | Data Vault 2.0: 9 Hubs, 6 Links, 6+ Satellites |
| Analytics Layer | Star Schema Information Marts on top of Business Vault |
| KPIs catalogued | 30 KPIs across Inventory, Order Fulfillment, Transportation, Returns, Warehousing |
| External enrichment | FDA Drug Shortages · Weather API · FreightWaves · IQVIA · EPA Emissions |
| Key insight | DV stores supplier A and supplier B versions in separate Satellites; the golden record is resolved in the Business Vault, never at load time |

▶ Open Interactive Solution →
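The key insight can be made concrete with a small sketch: two Satellites keep both suppliers' versions of the same invoice, and a separate Business Vault rule picks the golden value later. The invoice key, amounts, dates, and the source-priority rule are all made up for illustration and do not come from the case study.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InvoiceSatelliteRow:
    invoice_hub_key: str  # ties the row to its Hub record
    record_source: str    # which supplier system delivered this version
    load_date: date
    amount: float


# Raw Vault: one Satellite per source, loaded as-is with no reconciliation.
sat_supplier_a = [InvoiceSatelliteRow("INV-1001", "supplier_a", date(2024, 3, 1), 1200.00)]
sat_supplier_b = [InvoiceSatelliteRow("INV-1001", "supplier_b", date(2024, 3, 2), 1180.00)]


def golden_amount(key, satellites, source_priority):
    """Hypothetical Business Vault rule: take the highest-priority source that has the key."""
    candidates = [row for sat in satellites for row in sat if row.invoice_hub_key == key]
    if not candidates:
        return None
    best = min(candidates, key=lambda row: source_priority.index(row.record_source))
    return best.amount


satellites = [sat_supplier_a, sat_supplier_b]
print(golden_amount("INV-1001", satellites, ["supplier_b", "supplier_a"]))  # 1180.0
print(golden_amount("INV-1001", satellites, ["supplier_a", "supplier_b"]))  # 1200.0
```

Changing the priority list changes the golden value without reloading anything, because both source versions are still present in the Raw Vault.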


Day 2 · Unified Health Information System: MDM, Governance & Security

NewLife Hospital: 300 hospitals, 90+ countries, 200M+ patients, HIPAA + GDPR

One-line verdict: Federated Hub-and-Spoke MDM, the only pattern that gives a global patient identity and data residency compliance simultaneously. Pure centralised violates GDPR. Pure decentralised makes "unified" impossible.

| Dimension | Decision |
| --- | --- |
| MDM Architecture | Federated Hub-and-Spoke: Global Hub (de-ID MPI + reference) + Regional Nodes (full PHI per jurisdiction) |
| MDM Domains | Party (Patient MPI, Physician) · Places (Facilities) · Things (Drugs) · Reference (ICD-10, LOINC, SNOMED) |
| Regulatory | GDPR · HIPAA · PIPL · DPDP · PDPA: attribute-level consent, data residency routing, right-to-erasure workflow |
| Data Quality | Profile → Rules → Cleanse → Monitor; ≥99% patient completeness; 100% drug code accuracy |
| Governance | EGC → DGC → Domain Owners → Stewards · RBAC + ABAC · Break-glass emergency access with full audit |
| Lifecycle | Hot/Warm/Cold/Archive metadata-driven policy engine: 7 lifecycle stages automated |
| Security | Zero Trust + field-level AES-256 + DevSecOps (Build → Test → Deploy → Operate) |

▶ Open Interactive Solution →
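The hub-and-spoke split can be sketched as a routing function: the full record stays in its regional node, and only a de-identified token reaches the global hub. The country-to-region mapping, the field names, and the salted SHA-256 token are assumptions for illustration. A salted hash is a stand-in for pseudonymization here, not a sufficient de-identification method for real PHI.

```python
import hashlib

# Hypothetical mapping from country code to regional node.
REGION_BY_COUNTRY = {"DE": "eu", "FR": "eu", "US": "us", "CN": "cn", "IN": "in", "SG": "apac"}


def route_patient(record: dict, salt: str) -> tuple[str, dict, dict]:
    """Return (region, regional_record, hub_record) for one patient record."""
    country = record["country"]
    if country not in REGION_BY_COUNTRY:
        raise ValueError(f"no regional node configured for country '{country}'")
    region = REGION_BY_COUNTRY[country]
    token = hashlib.sha256((salt + record["patient_id"]).encode("utf-8")).hexdigest()
    regional_record = dict(record)                       # full PHI stays in-region
    hub_record = {"mpi_token": token, "region": region}  # no direct identifiers
    return region, regional_record, hub_record


region, regional, hub = route_patient(
    {"patient_id": "P-001", "name": "Jane Doe", "country": "DE"}, salt="demo-salt"
)
print(region)       # eu
print(sorted(hub))  # ['mpi_token', 'region']
```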



Day 3 · Cloud Data Platform: Azure Medallion Lakehouse

NewLife Hospital: Multi-region healthcare data platform, FHIR R4 API, Medallion Lakehouse, 90+ countries

One-line verdict: Azure Medallion Lakehouse (Bronze/Silver/Gold) on Delta Lake, the only pattern that handles FHIR R4 streaming ingestion, multi-jurisdictional data residency, and clinical AI feature serving from a single coherent architecture.

| Dimension | Decision |
| --- | --- |
| Platform | Azure: ADF, Event Hub, Databricks, Delta Lake, Synapse, ADLS Gen2 |
| Architecture | Medallion Lakehouse: Bronze (raw FHIR) → Silver (cleaned) → Gold (marts) |
| APIs | FHIR R4 with SMART on FHIR OAuth 2.0, geo-load balancing, 99.9% SLA |
| Multi-Region | Hub-and-spoke: 5 regional nodes, data residency enforcement per GDPR/PIPL/PDPA |
| Security | Zero Trust, Private Endpoints, Azure Purview RBAC, field-level encryption |
| Clinical AI | Predictive sepsis, NLP discharge summaries, imaging triage, all within Medallion Gold |

▶ Open Interactive Solution →
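The Bronze → Silver → Gold flow can be sketched in plain Python to show what each layer is responsible for. The case study runs on Delta Lake and Databricks; the sketch below uses neither, and the field names and the heart-rate mart are invented for illustration.

```python
def to_silver(bronze_rows):
    """Clean raw rows: drop records without a patient id, cast types, de-duplicate."""
    seen = set()
    silver = []
    for row in bronze_rows:
        patient_id = row.get("patient_id")
        if not patient_id:
            continue  # unusable without an identifier
        key = (patient_id, row["observed_at"])
        if key in seen:
            continue  # duplicate delivery of the same observation
        seen.add(key)
        silver.append({
            "patient_id": patient_id,
            "observed_at": row["observed_at"],
            "heart_rate": int(row["heart_rate"]),
        })
    return silver


def to_gold(silver_rows):
    """Aggregate into a mart: maximum heart rate per patient."""
    gold = {}
    for row in silver_rows:
        current = gold.get(row["patient_id"], 0)
        gold[row["patient_id"]] = max(current, row["heart_rate"])
    return gold


bronze = [
    {"patient_id": "p1", "observed_at": "2024-01-01T10:00", "heart_rate": "88"},
    {"patient_id": "p1", "observed_at": "2024-01-01T10:00", "heart_rate": "88"},
    {"patient_id": None, "observed_at": "2024-01-01T10:05", "heart_rate": "70"},
    {"patient_id": "p1", "observed_at": "2024-01-01T11:00", "heart_rate": "95"},
    {"patient_id": "p2", "observed_at": "2024-01-01T10:30", "heart_rate": "72"},
]
silver = to_silver(bronze)
gold = to_gold(silver)
print(len(silver), gold)  # 3 {'p1': 95, 'p2': 72}
```

Bronze is kept untouched, so Silver and Gold can be rebuilt whenever the cleaning or aggregation rules change.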


Day 4 · Data Intelligence, Analytics & AI

NewLife Hospital: Clinical NLP, Medical Imaging AI, Real-time Sepsis Alerting, MLOps, $2M→$6M Year-1 ROI

One-line verdict: Lambda architecture for batch + streaming analytics, with a unified MLOps platform (MLflow + Databricks) that governs clinical models from FDA SaMD Class II compliance to bedside alerting in under 60 seconds.

| Dimension | Decision |
| --- | --- |
| Big Data | Lambda architecture: Spark batch (Databricks) + Kafka/Event Hub streaming |
| Clinical NLP | spaCy + Med7 + BERT-clinical pipeline: 92%+ F1 on entity extraction |
| Imaging AI | CNN + ViT ensemble, 3-stage review workflow, FDA SaMD Class II governance |
| Real-time | NEWS2 sepsis score: Kafka → Feature Store → model inference → alert in <60s |
| MLOps | MLflow + AzureML: Experiment → Train → Validate → Deploy → Monitor → Retrain |
| Responsible AI | Bias audit, GDPR Art. 22 human-in-loop, FDA SaMD classification, explainability |
| ROI | Year 1: $2M invest → $6M return · Year 2: $4M → $16M · Year 3: $8M → $40M |

▶ Open Interactive Solution →
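The ROI row can be turned into simple arithmetic. The sketch below takes the three invest/return pairs from the table (in $M) and computes the return multiple and net ROI per year. Summing across years assumes the yearly figures are separate rather than cumulative, which the table does not state.

```python
# (investment, return) in $M, copied from the ROI row.
ROI_BY_YEAR = {1: (2.0, 6.0), 2: (4.0, 16.0), 3: (8.0, 40.0)}


def return_multiple(invest: float, ret: float) -> float:
    """Return per dollar invested."""
    return ret / invest


def net_roi_pct(invest: float, ret: float) -> float:
    """Net gain as a percentage of the investment."""
    return (ret - invest) / invest * 100.0


for year, (invest, ret) in ROI_BY_YEAR.items():
    print(year, return_multiple(invest, ret), net_roi_pct(invest, ret))
# 1 3.0 200.0
# 2 4.0 300.0
# 3 5.0 400.0

total_invest = sum(invest for invest, _ in ROI_BY_YEAR.values())  # 14.0
total_return = sum(ret for _, ret in ROI_BY_YEAR.values())        # 62.0
```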


Day 5 · Data Strategy, GenAI & Final Blueprint

NewLife Hospital: RAG pipeline, Data Products, $127M NPV business case, 5-year operating model

One-line verdict: A GenAI Clinical Intelligence Platform built on Retrieval-Augmented Generation, with a PHI de-identification gate, a vector store serving 200M+ patient records, and a federated data product marketplace, all governed by a CDO-led operating model with a measurable $127M NPV over 5 years.

| Dimension | Decision |
| --- | --- |
| GenAI Architecture | RAG pipeline: PHI De-ID → Chunking → Embedding → Vector Store → LLM → Audit |
| Vector Store | Azure AI Search (hybrid dense + sparse), HIPAA-compliant, 200M+ patient records |
| Data Products | Federated marketplace: 12 certified products across Clinical, Ops, Finance, Research |
| Modernization | Legacy EHR → Cloud: Assess (3I) → Lift-and-Shift → Re-platform → Re-architect |
| Operating Model | CDO → Data Domains → Product Owners → Engineers · Hub-and-Spoke federated |
| Business Case | $127M NPV, 287% ROI, 18-month payback; board-ready financial model |

▶ Open Interactive Solution →
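The stage order in the GenAI Architecture row (PHI De-ID → Chunking → Embedding → Vector Store → LLM → Audit) can be sketched end to end with toy stand-ins. Everything here is an assumption for illustration: the regex gate is far too weak for real PHI, the bag-of-words "embedding" replaces a learned model, the in-memory list replaces Azure AI Search, and the LLM call is left as a prompt string.

```python
import re
from collections import Counter
from math import sqrt


def deidentify(text: str) -> str:
    """Toy PHI gate: mask MRN-like ids and ISO dates. Not adequate for real PHI."""
    text = re.sub(r"\bMRN[- ]?\d+\b", "[MRN]", text)
    return re.sub(r"\b\d{4}-\d{2}-\d{2}\b", "[DATE]", text)


def chunk(text: str, size: int = 12) -> list[str]:
    """Split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    def __init__(self):
        self.items = []  # list of (vector, chunk_text)

    def add(self, chunk_text: str) -> None:
        self.items.append((embed(chunk_text), chunk_text))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(query: str, store: VectorStore, audit_log: list) -> str:
    """Retrieve context, record the retrieval, and return the prompt an LLM would receive."""
    context = store.search(query, k=1)
    audit_log.append({"query": query, "retrieved": context})
    return "Context:\n" + "\n".join(context) + "\nQuestion: " + query


note = ("Patient MRN-12345 admitted 2024-03-01 with fever and elevated lactate. "
        "Started broad spectrum antibiotics within one hour. "
        "Discharged after five days with oral antibiotics and follow-up in clinic.")
store = VectorStore()
for piece in chunk(deidentify(note)):
    store.add(piece)
audit_log = []
prompt = build_prompt("fever lactate", store, audit_log)
```

The de-identification step runs before chunking, so no raw identifier ever reaches the store, the prompt, or the audit log.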


📖 Case Studies

| Case Study | Domain | Days | Link |
| --- | --- | --- | --- |
| NewLife Pharmacy Supply Chain | Pharmaceutical D2P | Day 1 | View → |
| NewLife Hospital: Data Management | Healthcare MDM + Governance | Day 2 | View → |
| NewLife Hospital: Cloud Platform | Healthcare Lakehouse + FHIR | Day 3 | View → |
| NewLife Hospital: Analytics & AI | Clinical NLP, MLOps, Sepsis AI | Day 4 | View → |
| NewLife Hospital: Strategy & GenAI | RAG, Data Products, $127M NPV | Day 5 | View → |

📜 License

MIT © 2024 wjlgatech and contributors.


Built with ❀️ at Data Architect School · ⭐ Star if useful
