SQL Agro Pipeline 🚜 🌽

📋 Project Overview

This project demonstrates an end-to-end Data Engineering pipeline applied to the Agriculture sector.

The goal was to transform raw, unstructured data (CSV) into a high-performance Star Schema Data Warehouse using Python for ingestion and SQL for data modeling and analysis. The project focuses on analyzing global crop yield trends and Year-Over-Year (YoY) growth using Window Functions.

⚙️ Architecture & Technologies

Ingestion (ETL): Python (Pandas) to clean and load raw CSV data into a staging area.
Database: SQLite (embedded relational database).
Data Modeling: Transformation of flat tables into a Star Schema (Fact & Dimensions) using SQL DDL.
Analysis: Complex queries using Window Functions (LAG, OVER), CTEs, and Views for anomaly detection.

🗂️ Data Modeling (Star Schema)

To ensure data integrity and query performance, the database was normalized from a single flat table into a Relational Schema:

Fact Table: fact_crop_yields (Contains metrics: Yield, Rainfall, Temperature, Pesticides).
Dimension Tables: dim_crops, dim_countries.

🧠 Key SQL Skills Demonstrated

The core of this project is the notebooks/sql_data_modeling.ipynb file, which covers:

Normalization: Converting raw text columns into normalized Dimension tables.

Window Functions: Using LAG() to calculate Year-over-Year Growth %:

LAG(current_yield) OVER (PARTITION BY crop_name ORDER BY year)

Data Quality Views: Creating a persistent SQL View (view_data_anomalies) to automatically flag data outliers (e.g., Temperature > 40°C).

🚀 How to Run

Clone the repository:

git clone [https://github.com/PedruMota/Advanced-SQL-Agro-Pipeline.git](https://github.com/PedruMota/sql-crop-yield-analytics.git)
cd SQL-Agro-Pipeline

Set up the environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Run the ETL Pipeline:
```
python src/01_ingest_data.py
```
Explore the Analysis: Open notebooks/sql_data_modeling.ipynb in VS Code or Jupyter Lab to see the step-by-step SQL transformation.

📊 Sample Results

Year	Crop	Yield (hg/ha)	Growth %
1993	Maize	28,997	📈 +3.70%
1994	Maize	28,344	📉 -2.25%
1996	Maize	30,763	📈 +5.35%

Author: Pedro Mota

Dataset Source: Kaggle Crop Yield Prediction ...

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
data		data
notebooks		notebooks
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SQL Agro Pipeline 🚜 🌽

📋 Project Overview

⚙️ Architecture & Technologies

🗂️ Data Modeling (Star Schema)

🧠 Key SQL Skills Demonstrated

🚀 How to Run

📊 Sample Results

About

Uh oh!

Releases

Packages

Languages

PedruMota/sql-crop-yield-analytics

Folders and files

Latest commit

History

Repository files navigation

SQL Agro Pipeline 🚜 🌽

📋 Project Overview

⚙️ Architecture & Technologies

🗂️ Data Modeling (Star Schema)

🧠 Key SQL Skills Demonstrated

🚀 How to Run

📊 Sample Results

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages