Reproducible, portfolio-oriented project that builds an end-to-end data flow in SQL Server:
raw ingestion, clean standardization, data-quality controls with quality-safe views, and a BI semantic layer ready for dashboards.
Olist e-commerce (public).
SQL is provided for demo/portfolio purposes; dataset ownership remains with the original authors.
- Executive Insights (SQL): Data quality, delivery & BI-ready views
- Layered architecture
- Conventions
- Quick start (runbook)
- Required privileges
- ERD (clean)
- Data-quality controls & published views
- BI layer (overview)
- Troubleshooting
- Credits & license
| Order | Folder | Purpose | README |
|---|---|---|---|
| 00 | sql_server/00_environment |
Initial setup (DB, schemas, types, utilities). | open |
| 01 | sql_server/01_raw |
Load raw tables (source-faithful mirror). | open |
| 02 | sql_server/02_clean |
Cleaning/standardization and trusted PK/FK. | open |
| 99 | sql_server/99_quality |
QA: sanity & deep checks + publish quality.* (invalid IDs, valid subsets, repaired timestamps, KPI snapshot). | open |
| 03 | sql_server/03_bi |
BI semantic layer bi.* and showcase queries (KPIs, lead time, payment mix, repeaters). | open |
- Naming:
lower_snake_case - Idempotent scripts:
CREATE OR ALTER - Percentiles (SQL Server 2019+):
PERCENTILE_CONT - All scripts assume SQLCMD Mode enabled in SSMS / Azure Data Studio.
Recommended order: 00 → 01 → 02 → 99 → 03.
Enable SQLCMD Mode and run from the repo root:
USE olist_sqlsrv;
-- 00) Environment
:r .\sql_server\00_environment\00_create_database_and_schemas.sql
-- 01) Raw (see 01_raw README for exact names)
:r .\sql_server\01_raw\01_create_raw_tables.sql
-- 02) Clean (PK/FK creation and domain normalization)
:r .\sql_server\02_clean\02a_build_clean_tables.sql
-- 99) Quality (QA + quality.* views)
:r .\sql_server\99_quality\99a_quality_checks.sql
:r .\sql_server\99_quality\99b_quality_deep_checks.sql
:r .\sql_server\99_quality\99c_quality_fixes.sql
:r .\sql_server\99_quality\99d_quality_sanity_checks.sql
-- 03) BI (bi.* views and README queries)
:r .\sql_server\03_bi\03a_publish_bi_views.sql
:r .\sql_server\03_bi\03b_bi_readme_queries.sql
:r .\sql_server\03_bi\03c_bi_extra_views.sql
:r .\sql_server\03_bi\03d_bi_extra_readme_queries.sql- CREATE SCHEMA, CREATE VIEW (schemas
quality,bi) - ALTER TABLE ... WITH CHECK CHECK CONSTRAINT (schema
clean)
- T1 —
approved_at < purchase_timestamp - T2 —
carrier_date < approved_at - T3 —
customer_date < carrier_date - T4 —
customer_date < purchase_timestamp
| View | Summary |
|---|---|
quality.invalid_orders_time_logic |
One row per time-logic violation (T1–T4). |
quality.invalid_orders_ids |
Distinct order_id with at least one violation. |
quality.invalid_orders_summary |
Count by violation type/description. |
quality.valid_orders |
clean.orders minus invalid orders. |
quality.order_items_valid |
clean.order_items scoped to valid orders. |
quality.payments_valid |
clean.payments scoped to valid orders. |
quality.orders_repaired |
Monotonic timestamp fixes (*_fixed columns) without mutating clean.*. |
quality.orders_quality_snapshot |
One-row KPIs: total_orders, invalid_orders, valid_orders, invalid_ratio. |
👉 For the full validation set (invalid orders, reconciliation, repaired checks, violation breakdown, etc.), see 99_quality/README.
Detailed screenshots for:
- Sanity counts & orphans
- Uniqueness dups
- Nulls in keys
- Domain/range violations
- Time logic violations
- Economic consistency diffs
- FK trusted status
- Published views list
👉 See the complete documentation in 99_quality/README.
bi.v_orders_corebi.v_order_items_enrichedbi.v_payments_per_orderbi.v_daily_salesbi.v_delivery_lead_timebi.v_late_ordersbi.v_payment_mixbi.v_kpi_summarybi.v_category_sales_monthlybi.v_state_lead_timebi.v_repeat_customers
- lead_time_days — days between
order_purchase_timestampandactual_delivery_date(after fixes) - late_flag —
1ifactual_delivery_date > estimated_delivery_date(inclusive end-of-day) - gross_sales —
price + freight_value(pre-discounts; aggregated per view) - payment mix — % by transactions and by amount (
bi.v_payment_mix) - unknown — normalized fallback when
product_categoryorpayment_typeis missing/blank - is_delivered —
1when the order is delivered (base filter for sales/lead-time views)
- BI layer consumes:
quality.*→ e.g.,quality.valid_orders,quality.order_items_valid,quality.payments_validclean.*→ standardized tables (customer, product, geo, etc.)
This ensures analytics are always based on validated and repaired data, not directly on raw ingestion.
- RAW vs CLEAN row-count mismatches in 99a → revisit
02_cleantransforms - Non-zero findings in 99b → are flags; fix upstream or rely on
quality.orders_repairedfor analytics - Any FK with
is_not_trusted = 1→ re-run withWITH CHECK CHECK CONSTRAINT still_bad_after_fix > 0(99d-03) → should be 0; otherwise inspect edge timestamps
- Dataset: Olist e-commerce (public)
- SQL: provided for portfolio/demo purposes
- Ownership: belongs to the original authors
- License: see MIT
