Aspiring Data Engineer
I'm working toward becoming a Data Engineer, with hands-on experience building reliable data workflows and scalable pipelines.
- Big Data: PySpark, Delta Lake.
- Cloud Tools: Azure Data Factory (ADF), Azure Databricks, ADLS Gen2.
- Programming: Python, SQL.
UnifiedSalesReportingPipeline
Designed and implemented an end-to-end Azure lakehouse pipeline to consolidate daily sales data from multiple regional branches.
- Orchestration: Built modular, fault-tolerant workflows in ADF with centralized logging, retry logic, and restart-from-failure handling.
- Transformation: Developed parameterized notebooks in Databricks for cleaning, normalization, and metadata enrichment.
- Governance: Implemented data quality checks by quarantining invalid records and enforcing NOT NULL/CHECK constraints.
- Storage: Wrote optimized, ACID-compliant Delta Lake tables on ADLS Gen2 (see the sketch after this list).
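
Here is a minimal PySpark sketch of the quarantine-and-constraints pattern behind the Governance and Storage bullets. The paths, table names, and quality rules (`order_id`, `amount`) are illustrative assumptions, not the pipeline's actual code.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical bronze landing path; the real layout may differ.
raw = spark.read.format("parquet").load(
    "abfss://bronze@myaccount.dfs.core.windows.net/sales/daily/"
)

# Basic quality rules: quarantine failing rows instead of silently dropping them.
rules = F.col("order_id").isNotNull() & (F.col("amount") > 0)
raw.filter(~rules).write.format("delta").mode("append").saveAsTable("quarantine.sales_rejects")
raw.filter(rules).write.format("delta").mode("append").saveAsTable("silver.sales")

# Enforce the same rules at the table level so future bad writes fail fast.
spark.sql("ALTER TABLE silver.sales ALTER COLUMN order_id SET NOT NULL")
spark.sql("ALTER TABLE silver.sales ADD CONSTRAINT amount_positive CHECK (amount > 0)")
```

Quarantining keeps rejected records auditable and replayable, while the Delta constraints push validation down to the storage layer so no writer can bypass it.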
agri-price-arbitrage-adf-pipeline
Built an ADF-driven pipeline to ingest and process daily agricultural commodity price data from external APIs.
- Architecture: Utilized a Medallion (Bronze-Silver-Gold) architecture for structured data processing.
- Data Flow: Cleaned and standardized nested JSON data using ADF Data Flows to ensure consistent schema and quality.
- Analytics: Implemented business logic to identify in-state price arbitrage opportunities for actionable trading insights (sketched below).
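
Below is a rough PySpark illustration of what such gold-layer arbitrage logic could look like. The column names (`state`, `commodity`, `modal_price`), the 15% spread threshold, and the storage paths are assumptions made for the sketch.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical silver table: one row per (state, market, commodity, date).
prices = spark.read.format("delta").load(
    "abfss://silver@myaccount.dfs.core.windows.net/agri_prices/"
)

# For each commodity within a state, find the cheapest and dearest market per day.
spread = (
    prices.groupBy("state", "commodity", "price_date")
    .agg(
        F.min("modal_price").alias("min_price"),
        F.max("modal_price").alias("max_price"),
    )
    .withColumn(
        "spread_pct",
        (F.col("max_price") - F.col("min_price")) / F.col("min_price") * 100,
    )
)

# Flag in-state arbitrage candidates whose spread clears an assumed 15% threshold.
opportunities = spread.filter(F.col("spread_pct") >= 15)
opportunities.write.format("delta").mode("overwrite").save(
    "abfss://gold@myaccount.dfs.core.windows.net/arbitrage/"
)
```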
I enjoy documenting what I learn in Data Engineering as simple, digestible explanations to deepen my own understanding.
Posts Website: DEDigest
Thanks for visiting my profile! Feel free to connect or check back soon for more updates.
