DEV Community

# dataengineering

Posts

Automating CCAR Data Integrity: A Python Firewall for Federal Regulatory Compliance

2 min read
Day 24: Spark Structured Streaming

1 min read
Modern Data Pipelines: Why Five Layers Changed Everything (Part 1 of 3)

6 min read
Day 23: Spark Shuffle Optimization

1 min read
Day 22: Spark Shuffle Deep Dive

1 min read
Day 20: Handling Bad Records & Data Quality in Spark

1 min read
Data-Architect-Master-Professional-Workbook

1 min read
Day 18: Spark Performance Tuning

1 min read
Day 19: Spark Broadcasting & Caching

1 min read
Designing a YouTube Digest for Signal Over Noise

4 min read
dbt & Airflow in 2025: Why These Data Powerhouses Are Redefining Engineering

11 min read
Day 21: Building a Production-Grade Data Quality Pipeline with Spark & Delta

1 min read
Why Most MIS Reporting Systems Break Before Data Processing Starts

1 min read
The Missing Step in RAG: Why Your Vector DB is Bloated (and how to fix it locally)

3 min read
Behind the Scenes of Data Ingestion: How Small Issues Cause Big Headaches

3 min read
Building a CDC Skyscraper: How SeaTunnel Leverages Debezium Under the Hood

3 min read
The Bear Awakens: From Pure Speed to Massive Endurance (640 Million Rows Tested)

16 min read
Bulletproof Power Query (Part 2): A Smart, Fuzzy-Match Rename Function

4 min read
System Architecture Analysis: The Data Pipeline Issues of TraderKnows

2 min read
Part 1: Database Concepts & Architecture

14 min read
Beyond Tagging: A Blueprint for Real-Time Cost Attribution in Data Platforms

9 min read
New release: LightningChart Python 2.1

1 min read
Why Your Model is Failing (Hint: It’s Not the Architecture)

4 min read
The Time Our Pipeline Processed the Same Day’s Data 47 Times

5 min read
Firehose and Iceberg Tables

4 min read