Skip to content

marameref/SQL_for_Data_Engineers

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 

Repository files navigation

Perfect ✅ Here's GitHub-style SQL Exam Prep Cheatsheet that’s geared toward Data Engineering Certification.

It includes:

  • 📌 Core SQL Concepts (syntax + quick examples)
  • 📊 Data Engineering–Specific SQL Tasks
  • 💡 Real-world business case problems (with step-by-step solutions)
  • 🛠 Performance tips for big datasets
  • 📝 Practice challenge section

📚 SQL for Data Engineering – Certification Prep Cheatsheet

(Prepare for Datacamp's Data Engineer Certification Exam)


🛠 SQL for Data Engineering – Exam Prep Cheatsheet

🚀 A complete SQL quick-reference for Data Engineers preparing for DataCamp / Data Engineering certification. Includes syntax, examples, and real-world challenges.


1️⃣ Core SQL Commands

-- Select specific columns
SELECT column1, column2
FROM table;

-- Filter rows
SELECT *
FROM employees
WHERE department = 'Sales'
  AND salary > 50000;

-- Sort results
ORDER BY salary DESC, name ASC;

-- Limit output
LIMIT 10;

2️⃣ Joins (Multiple Tables)

-- INNER JOIN: Only matching rows
SELECT o.order_id, c.customer_name
FROM orders o
INNER JOIN customers c
  ON o.customer_id = c.customer_id;

-- LEFT JOIN: All from left, match from right
SELECT c.customer_name, o.order_id
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id;

-- Multi-table join
SELECT o.order_id, c.customer_name, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id;

💡 Tip: Use table aliases (o, c, p) to keep queries clean.


3️⃣ Aggregations & Grouping

SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000
ORDER BY avg_salary DESC;

4️⃣ Window Functions (Data Engineering Gold 🥇)

-- Rank top products by sales per category
SELECT product_id, category,
       RANK() OVER (PARTITION BY category ORDER BY SUM(sales) DESC) AS rank_in_cat
FROM sales_data
GROUP BY product_id, category;

5️⃣ CTEs & Subqueries

-- Common Table Expression
WITH high_value_orders AS (
    SELECT order_id, customer_id, total_amount
    FROM orders
    WHERE total_amount > 1000
)
SELECT customer_id, COUNT(*) AS big_orders
FROM high_value_orders
GROUP BY customer_id;

-- Subquery example
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

6️⃣ Data Cleaning in SQL

-- Handle NULLs
SELECT COALESCE(middle_name, 'N/A') AS middle_name_fixed
FROM employees;

-- Remove spaces
UPDATE employees
SET name = TRIM(name);

-- Change formats
SELECT TO_CHAR(order_date, 'YYYY-MM') AS order_month
FROM orders;

7️⃣ Date & Time Functions

SELECT order_id,
       EXTRACT(YEAR FROM order_date) AS year,
       EXTRACT(MONTH FROM order_date) AS month,
       CURRENT_DATE AS today
FROM orders;

8️⃣ Performance Tips

  • ✅ Use EXPLAIN to check query plans.
  • ✅ Select only needed columns (avoid SELECT * in production).
  • ✅ Index join keys & filter columns.
  • ✅ Use CTEs for step-by-step logic in complex pipelines.

💼 Real-World Challenges

Q1: Find the top 3 products by revenue in 2024. Q2: Identify customers who made no purchases in the last 6 months. Q3: Get the month-over-month growth rate of orders.


📝 Practice Task

Dataset: orders, order_details, products, customers Goal: Return:

  • Customer name
  • Total orders in 2024
  • Total spend in 2024 Sort by total spend DESC.

💡 Hint:

SELECT c.customer_name,
       COUNT(o.order_id) AS total_orders,
       SUM(od.quantity * p.price) AS total_spend
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_details od ON o.order_id = od.order_id
JOIN products p ON od.product_id = p.product_id
WHERE EXTRACT(YEAR FROM o.order_date) = 2024
GROUP BY c.customer_name
ORDER BY total_spend DESC;

Do you want me to add that section?

About

🎯 50+ SQL Interview Questions for Data Engineering

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors