Welcome to my analysis of the data job market, specifically focused on data analyst roles. This project stems from a desire to better understand and navigate the job market. It explores the highest-paying and most in-demand skills to identify optimal opportunities for data analysts.
The dataset, sourced from Luke Barousse's Python Course, provides a solid foundation for this analysis, with detailed information on job titles, salaries, locations, and key skills. Using Python, I address critical questions such as which skills are most in demand, how salaries vary across roles, and how demand relates to salary in the field of data analytics.
Here are the key questions I aim to answer in this project:
- Which skills are most in demand for the top three most popular data roles?
- What are the trends for in-demand skills among Data Analysts?
- How do job roles and skills translate to pay for Data Analysts?
- What are the best skills for Data Analysts to learn? (High Demand and High Paying)
For my in-depth exploration of the data analyst job market, I utilized several essential tools:
- Python: The core of my analysis, enabling me to extract valuable insights from the data. I also leveraged the following Python libraries:
  - Pandas: Used for comprehensive data analysis and manipulation.
  - Matplotlib: Employed for creating basic visualizations.
  - Seaborn: Used to produce more advanced and polished visualizations.
- Jupyter Notebooks: Provided a seamless environment for running Python scripts, integrating notes, and documenting the analysis.
- Visual Studio Code: My primary editor for executing Python scripts and managing code efficiently.
- Git & GitHub: Crucial for version control, project tracking, and sharing my Python scripts and findings for collaboration and transparency.
This section details the steps undertaken to prepare the data for analysis, focusing on ensuring its accuracy and usability.
I begin by importing the required libraries and loading the dataset, followed by performing initial data cleaning to ensure high data quality.
```python
# Importing Libraries
import ast
import pandas as pd
import seaborn as sns
from datasets import load_dataset
import matplotlib.pyplot as plt

# Loading Data
dataset = load_dataset('lukebarousse/data_jobs')
df = dataset['train'].to_pandas()

# Data Cleanup
# Convert posting dates from strings to datetime objects
df['job_posted_date'] = pd.to_datetime(df['job_posted_date'])
# job_skills is stored as a string-encoded list (e.g. "['sql', 'python']"); parse it, leaving missing values as-is
df['job_skills'] = df['job_skills'].apply(lambda x: ast.literal_eval(x) if pd.notna(x) else x)
```

To identify the most in-demand skills for the top 3 most popular data roles, I filtered the dataset to focus on the most frequently occurring positions and extracted the top 5 skills associated with each of these roles. This analysis highlights the most sought-after job titles and their corresponding key skills, providing insight into the skills to prioritize based on the role of interest.
View my notebook with detailed steps here: 2_Skill_Demand.ipynb
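A minimal pandas sketch of that filtering step, run on a handful of made-up postings rather than the real dataset. It assumes, as the cleanup code implies, that `job_skills` arrives as a string-encoded list. The helper `top_skills_per_role` is hypothetical, and its `skill_count` and `skill_percent` columns only mirror the names used in the plotting code; this is not necessarily how the notebook builds `df_skills_perc`.

```python
import ast
import pandas as pd


def top_skills_per_role(df: pd.DataFrame, n_roles: int = 3, n_skills: int = 5) -> pd.DataFrame:
    """Top n_skills skills for each of the n_roles most common job titles (hypothetical helper).

    skill_percent is the share (0-100) of that role's postings that list the skill.
    """
    df = df.copy()
    # Parse string-encoded lists such as "['python', 'sql']"; leave missing values as they are
    df['job_skills'] = df['job_skills'].apply(lambda x: ast.literal_eval(x) if pd.notna(x) else x)

    # Keep only the most frequently posted job titles
    job_titles = df['job_title_short'].value_counts().head(n_roles).index
    df = df[df['job_title_short'].isin(job_titles)]
    n_postings = df['job_title_short'].value_counts()  # denominator per role

    # One row per (posting, skill), then count how many postings of each role mention each skill
    exploded = df.explode('job_skills').dropna(subset=['job_skills'])
    counts = exploded.groupby(['job_title_short', 'job_skills']).size().reset_index(name='skill_count')
    counts['skill_percent'] = 100 * counts['skill_count'] / counts['job_title_short'].map(n_postings)

    # Highest counts first within each role, then keep the top n_skills rows per role
    counts = counts.sort_values(['job_title_short', 'skill_count'], ascending=[True, False])
    return counts.groupby('job_title_short').head(n_skills).reset_index(drop=True)


# Toy postings (made up for illustration only)
postings = pd.DataFrame({
    'job_title_short': ['Data Analyst', 'Data Analyst', 'Data Analyst', 'Data Scientist',
                        'Data Scientist', 'Data Engineer', 'Data Engineer', 'Software Engineer'],
    'job_skills': ["['sql', 'excel']", "['sql', 'python']", None, "['python', 'sql']",
                   "['python']", "['python', 'aws']", "['python', 'spark']", "['java']"],
})
result = top_skills_per_role(postings)
print(result)
```

Percentages are taken over all postings for a role, including those that list no skills, so each value reads as "share of postings that mention the skill".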
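```python
# job_titles (the top 3 roles) and df_skills_perc (skill counts and percentages per role) are built in the notebook
fig, ax = plt.subplots(len(job_titles), 1)

for i, job_title in enumerate(job_titles):
    # Keep this role's top 5 skills, in reversed row order
    df_plot = df_skills_perc[df_skills_perc['job_title_short'] == job_title].head(5)[::-1]
    sns.barplot(data=df_plot, x='skill_percent', y='job_skills', ax=ax[i], hue='skill_count', palette='dark:b_r')

plt.show()
```

A bar graph showing the percentage of job postings that list each of the top 5 skills for the top 3 data roles.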
- SQL is the most in-demand skill for both Data Analysts and Data Scientists, appearing in over half of the job postings for each role. For Data Engineers, Python is the top skill, listed in 68% of job postings.
- Data Engineers tend to require more specialized technical expertise, such as AWS, Azure, and Spark, whereas Data Analysts and Data Scientists are expected to be proficient in more general data tools like Excel and Tableau.
- Python is a versatile skill that is highly valued across all three roles, with the highest demand among Data Scientists (72%) and Data Engineers (65%).
To analyze skill trends for Data Analysts in 2023, I filtered job postings for Data Analyst roles in the US and grouped the skills by posting month. This allowed me to identify the top 5 skills for data analysts each month and track their popularity over the course of 2023.
You can view my notebook with detailed steps here: 3_Skills_Trend.
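The monthly grouping can be sketched as below. This is an illustration on toy data, not the notebook's code: the helper `monthly_skill_share` is hypothetical, and it assumes a `job_country` column for the US filter plus the parsed `job_posted_date` and `job_skills` columns from the cleanup step. It returns months as rows and skills as columns, ordered by total mentions, which is the layout the plotting code's `.iloc[:, :5]` slice relies on.

```python
import pandas as pd


def monthly_skill_share(df: pd.DataFrame, title: str = 'Data Analyst',
                        country: str = 'United States') -> pd.DataFrame:
    """Percent of one role's postings in each month that mention each skill (hypothetical helper).

    Rows are month numbers; columns are skills ordered by total mentions (most first).
    """
    df = df[(df['job_title_short'] == title) & (df['job_country'] == country)].copy()
    df['job_posted_month_no'] = df['job_posted_date'].dt.month
    monthly_totals = df.groupby('job_posted_month_no').size()  # postings per month

    # Count skill mentions per month: one row per (posting, skill), then a month x skill table
    exploded = df.explode('job_skills').dropna(subset=['job_skills'])
    counts = exploded.groupby(['job_posted_month_no', 'job_skills']).size().unstack(fill_value=0)
    counts = counts.reindex(monthly_totals.index, fill_value=0)

    # Most-mentioned skills first, so .iloc[:, :5] selects the top 5
    counts = counts[counts.sum().sort_values(ascending=False).index]

    # Divide each month's counts by that month's number of postings
    return counts.div(monthly_totals, axis=0) * 100


# Toy postings (made up for illustration only)
sample = pd.DataFrame({
    'job_title_short': ['Data Analyst'] * 4 + ['Data Scientist'],
    'job_country': ['United States'] * 5,
    'job_posted_date': pd.to_datetime(['2023-01-05', '2023-01-20', '2023-02-03',
                                       '2023-02-10', '2023-01-07']),
    'job_skills': [['sql', 'excel'], ['sql'], ['excel', 'python'], ['sql'], ['python']],
})
trend = monthly_skill_share(sample)
print(trend.round(1))
```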
```python
from matplotlib.ticker import PercentFormatter

# df_DA_US_percent is built in the notebook (assumed: months as rows, skills as columns
# ordered by demand, values on a 0-100 scale)
df_plot = df_DA_US_percent.iloc[:, :5]
sns.lineplot(data=df_plot, dashes=False, legend='full', palette='tab10')
plt.gca().yaxis.set_major_formatter(PercentFormatter(decimals=0))

plt.show()
```

Line chart showcasing the top trending skills for data analysts in the US during 2023.
- SQL consistently remained the most in-demand skill throughout the year, though its demand gradually declined over time.
- Excel saw a notable rise in demand starting in September, ultimately surpassing both Python and Tableau by year-end.
- Python and Tableau maintained relatively stable demand throughout the year, with minor fluctuations, continuing to be key skills for data analysts. Power BI, while less in demand compared to the others, displayed a slight upward trend toward the end of the year.
Check out my notebook with detailed steps here: 4_Salary_Analysis.
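The box plot needs a subset of postings (`df_US_top6`) and a category order (`job_order`). One way to derive both is sketched below under stated assumptions: a `job_country` column exists, and "top" means most frequently posted among postings that report a yearly salary. The helper `top_titles_by_salary` is hypothetical and the values are toy data.

```python
import pandas as pd


def top_titles_by_salary(df: pd.DataFrame, n_titles: int = 6, country: str = 'United States'):
    """Return (postings for the n most common titles, those titles sorted by median salary, highest first)."""
    # Keep one country's postings that report a yearly salary
    df_country = df[df['job_country'] == country].dropna(subset=['salary_year_avg'])

    # The n most frequently posted titles among salaried postings
    top_titles = df_country['job_title_short'].value_counts().head(n_titles).index
    df_top = df_country[df_country['job_title_short'].isin(top_titles)]

    # Category order for the box plot: highest median salary first
    job_order = (df_top.groupby('job_title_short')['salary_year_avg']
                 .median()
                 .sort_values(ascending=False)
                 .index.tolist())
    return df_top, job_order


# Toy postings (made-up values) to show the shape of the result
sample = pd.DataFrame({
    'job_country': ['United States'] * 6 + ['Canada'],
    'job_title_short': ['Data Analyst', 'Data Analyst', 'Data Engineer', 'Data Engineer',
                        'Data Scientist', 'Data Analyst', 'Data Scientist'],
    'salary_year_avg': [80000, 90000, 120000, 130000, 150000, None, 300000],
})
df_top_titles, job_order = top_titles_by_salary(sample, n_titles=2)
print(job_order)
```

Postings without a salary are dropped before counting, so the "most common" titles here are the most common among salaried postings, which may differ from the overall ranking.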
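```python
# df_US_top6 (postings for the six most common titles) and job_order come from the notebook
sns.boxplot(data=df_US_top6, x='salary_year_avg', y='job_title_short', order=job_order)

# Show x-axis salaries in thousands, e.g. 120000 -> $120K
ticks_x = plt.FuncFormatter(lambda x, pos: f'${int(x/1000)}K')
plt.gca().xaxis.set_major_formatter(ticks_x)

plt.show()
```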
A box plot illustrating the salary distributions for the top 6 data-related job titles.
- There is a notable variation in salary ranges across different job titles. Senior Data Scientist roles stand out with the highest earning potential, reaching up to $600K, highlighting the industry's high demand for advanced data expertise and experience.
- Both Senior Data Engineer and Senior Data Scientist positions exhibit a significant number of high-end outliers, indicating that exceptional skills or unique circumstances can result in substantial compensation. In comparison, Data Analyst roles show more consistent salaries with fewer outliers.
- Median salaries tend to rise with the level of seniority and specialization. Senior roles, such as Senior Data Scientist and Senior Data Engineer, not only command higher median salaries but also exhibit greater variability in earnings, reflecting the increased complexity and responsibility associated with these positions.
Next, I refined my analysis to focus specifically on Data Analyst roles. I examined the highest-paying skills and the most in-demand skills, presenting the findings using two bar charts.
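Both charts can come from a single per-skill summary of posting count and median salary. A hedged sketch follows, assuming `df` already holds only US Data Analyst postings with parsed `job_skills` lists; `skill_pay_tables` is a hypothetical helper and the rows are made up. The `count` and `median` column names match the `x='median'` used in the plotting code.

```python
import pandas as pd


def skill_pay_tables(df: pd.DataFrame, n: int = 10):
    """Return (top n skills by median salary, top n skills by posting count) (hypothetical helper).

    Both frames are indexed by skill and have 'count' and 'median' salary columns.
    """
    # One row per (posting, skill), using only postings that report a yearly salary
    exploded = (df.dropna(subset=['salary_year_avg'])
                  .explode('job_skills')
                  .dropna(subset=['job_skills']))
    stats = exploded.groupby('job_skills')['salary_year_avg'].agg(['count', 'median'])

    top_pay = stats.sort_values('median', ascending=False).head(n)
    # Most requested skills, then re-sorted by median salary for the bottom chart
    top_demand = stats.sort_values('count', ascending=False).head(n).sort_values('median', ascending=False)
    return top_pay, top_demand


# Toy postings (made-up values)
sample = pd.DataFrame({
    'job_skills': [['sql', 'excel'], ['sql', 'python'], ['excel'], ['excel'],
                   ['dplyr', 'sql', 'python'], ['excel', 'sql']],
    'salary_year_avg': [70000, 110000, 60000, 50000, 150000, None],
})
df_DA_top_pay, df_DA_skills = skill_pay_tables(sample, n=2)
print(df_DA_top_pay, df_DA_skills, sep='\n\n')
```

Because a skill's median comes only from the postings that mention it, a rarely listed skill (like `dplyr` in the toy data) can top the pay chart on very few postings.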
```python
# df_DA_top_pay and df_DA_skills come from the notebook: skills as the index, a 'median' salary column
fig, ax = plt.subplots(2, 1)

# Top 10 Highest Paid Skills for Data Analysts
sns.barplot(data=df_DA_top_pay, x='median', y=df_DA_top_pay.index, hue='median', ax=ax[0], palette='dark:b_r')

# Top 10 Most In-Demand Skills for Data Analysts (plotted by median salary)
sns.barplot(data=df_DA_skills, x='median', y=df_DA_skills.index, hue='median', ax=ax[1], palette='light:b')

plt.show()
```

Here’s an overview of the highest-paying and most in-demand skills for Data Analysts in the United States:
Two distinct bar charts illustrating the highest-paying skills and the most in-demand skills for Data Analysts in the United States.
- The top chart reveals that specialized technical skills such as dplyr, Bitbucket, and GitLab are linked to higher salaries, with some reaching up to $200K. This indicates that advanced technical expertise is associated with higher earning potential.
- The bottom chart shows that foundational skills like Excel, PowerPoint, and SQL are the most sought-after, even though they may not lead to the highest salaries. This underscores the essential role these core skills play in securing data analysis positions.
- There is a distinct difference between the skills associated with the highest salaries and those most in demand. To maximize career opportunities, data analysts should aim to build a well-rounded skill set that incorporates both high-paying specialized skills and widely demanded foundational skills.
To determine the optimal skills to learn (those that are both highly paid and in high demand), I calculated each skill's demand as the percentage of job postings that mention it, along with its median salary. This approach makes it easier to identify the top skills to focus on.
You can view my notebook with detailed steps here: 5_Optimal_Skills.
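The demand-versus-salary table behind the scatter plot might be computed as follows. Assumptions: demand is measured over postings that report a salary, "high demand" means a skill appears in at least `min_percent` percent of them (the cutoff is a guess, not taken from the notebook), and `high_demand_skills` is a hypothetical helper run on toy data.

```python
import pandas as pd


def high_demand_skills(df: pd.DataFrame, min_percent: float = 5.0) -> pd.DataFrame:
    """Skills mentioned in at least min_percent of salaried postings, with their median salary.

    Returns a frame indexed by skill with skill_count, median_salary and skill_percent.
    """
    salaried = df.dropna(subset=['salary_year_avg'])
    n_postings = len(salaried)

    # One row per (posting, skill); count mentions and take the median salary per skill
    exploded = salaried.explode('job_skills').dropna(subset=['job_skills'])
    skills = (exploded.groupby('job_skills')['salary_year_avg']
              .agg(skill_count='count', median_salary='median')
              .sort_values('skill_count', ascending=False))

    # Demand = share of salaried postings that list the skill (0-100)
    skills['skill_percent'] = 100 * skills['skill_count'] / n_postings
    return skills[skills['skill_percent'] >= min_percent]


# Toy postings (made-up values)
sample = pd.DataFrame({
    'job_skills': [['sql', 'excel'], ['sql', 'python'], ['sql', 'oracle'], ['excel'], ['tableau']],
    'salary_year_avg': [80000, 100000, 120000, 60000, None],
})
high_demand = high_demand_skills(sample, min_percent=50)
print(high_demand)
```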
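```python
from adjustText import adjust_text
import matplotlib.pyplot as plt

df_plot = df_DA_skills_high_demand  # from the notebook; skill names assumed to be the index
plt.scatter(df_plot['skill_percent'], df_plot['median_salary'])
# Label each point with its skill, then let adjustText nudge overlapping labels apart
texts = [plt.text(x, y, skill) for skill, x, y in zip(df_plot.index, df_plot['skill_percent'], df_plot['median_salary'])]
adjust_text(texts, arrowprops=dict(arrowstyle='->', color='gray'))
plt.show()
```

A scatter plot showcasing the optimal skills for Data Analysts in the US, highlighting those that are both high-paying and in high demand.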
- The skill Oracle stands out with the highest median salary of nearly $97K, despite being less frequently mentioned in job postings, indicating the high value placed on specialized database expertise in the data analyst field.
- Widely required skills like Excel and SQL are prominent in job postings but tend to have lower median salaries compared to specialized skills such as Python and Tableau, which offer higher salaries and are moderately common in job listings.
- Skills like Python, Tableau, and SQL Server are positioned near the top of the salary range while also being fairly prevalent in job postings, suggesting that proficiency in these tools can lead to strong career opportunities in data analytics.
Let’s enhance the visualization by color-coding each skill according to its technology category (e.g., {Programming: Python}).
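To color points by category, each skill needs a technology label. A sketch, assuming each posting carries a string-encoded dict that groups its skills by category, in the spirit of the {Programming: Python} example (for instance a column such as `df['job_type_skills']`; that name is an assumption). The helper `skill_technology_map` is hypothetical, and a skill that appears under more than one category keeps the first one encountered.

```python
import ast
import pandas as pd


def skill_technology_map(type_skills: pd.Series) -> pd.DataFrame:
    """Build a skill -> technology lookup from string-encoded dicts (hypothetical helper).

    Each entry looks like "{'programming': ['python', 'sql'], 'databases': ['oracle']}".
    A skill listed under several categories keeps the first one seen.
    """
    mapping = {}
    for raw in type_skills.dropna():
        for technology, skills in ast.literal_eval(raw).items():
            for skill in skills:
                mapping.setdefault(skill, technology)
    return pd.DataFrame({'technology': pd.Series(mapping)}).rename_axis('job_skills')


# Toy inputs (made-up values): category dicts and a per-skill summary table
type_skills = pd.Series([
    "{'programming': ['python', 'sql'], 'analyst_tools': ['excel']}",
    "{'databases': ['oracle'], 'analyst_tools': ['tableau', 'excel']}",
    None,
])
skills = pd.DataFrame(
    {'skill_percent': [60.0, 40.0, 15.0], 'median_salary': [90000, 95000, 97000]},
    index=pd.Index(['sql', 'python', 'oracle'], name='job_skills'),
)

tech = skill_technology_map(type_skills)
df_tech = skills.join(tech)  # adds a 'technology' column usable as hue='technology'
print(df_tech)
```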
```python
from matplotlib.ticker import PercentFormatter

# df_DA_skills_tech_high_demand (from the notebook): one row per high-demand skill with its technology category
scatter = sns.scatterplot(
    data=df_DA_skills_tech_high_demand,
    x='skill_percent',
    y='median_salary',
    hue='technology',  # Color by technology
    palette='bright',  # Use a bright palette for distinct colors
    legend='full'      # Ensure the legend is shown
)

# Show demand as whole percentages (assumes skill_percent is on a 0-100 scale)
plt.gca().xaxis.set_major_formatter(PercentFormatter(decimals=0))
plt.show()
```

A scatter plot showcasing the optimal skills (those that are high-paying and in high demand) for data analysts in the US, featuring color-coded labels to represent various technologies.
- The scatter plot reveals that programming skills (blue) tend to cluster at higher salary levels compared to other categories, highlighting the significant salary benefits associated with programming expertise in the data analytics field.
- Database skills (orange), such as Oracle and SQL Server, are linked to some of the highest salaries among data analyst tools, underscoring the high demand and value of data management and manipulation expertise in the industry.
- Analyst tools (green), like Tableau and Power BI, are widely mentioned in job postings and offer competitive salaries. These tools are essential for data roles, providing strong earning potential and versatility across various data tasks.
This project allowed me to deepen my understanding of the data analyst job market and improve my technical skills in Python. Here are a few key takeaways:
- Advanced Python Proficiency: Leveraging libraries such as Pandas for data manipulation, Seaborn and Matplotlib for data visualization, and others enabled me to perform complex analyses more efficiently.
- Importance of Data Cleaning: I learned that thorough data cleaning and preparation are vital to ensure accurate and reliable insights during analysis.
- Strategic Skill Assessment: This project underscored the importance of aligning skills with market demand. Understanding the connection between skill demand, salary, and job availability is critical for informed career planning in tech.
This project offered several important insights into the data analyst job market:
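- Relationship Between Skill Demand and Salary: Specialized skills such as Python and Oracle often command higher salaries, while the most frequently requested skills, such as Excel and SQL, tend to pay less. Demand and compensation are related, but the most in-demand skills are not necessarily the best paid.
- Evolving Market Trends: Skill demand in the data analytics field shifts over time (in 2023, for example, SQL demand gradually declined while Excel demand rose late in the year), emphasizing the need to stay updated with industry trends for sustained career growth.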
- Economic Value of Skill Development: Identifying skills that are both in-demand and well-compensated can help data analysts prioritize their learning efforts to maximize career and financial returns.
This project came with its share of challenges, offering valuable learning experiences:
- Data Inconsistencies: Managing missing or inconsistent data entries required meticulous data-cleaning techniques to maintain the integrity and reliability of the analysis.
- Complex Data Visualization: Creating clear and effective visualizations for complex datasets was challenging but essential for presenting insights in a compelling and understandable way.
- Balancing Breadth and Depth: Striking the right balance between diving deeply into specific analyses and maintaining a broad overview of the data landscape was a constant challenge to ensure thorough coverage without losing focus.
This analysis of the data analyst job market has been highly insightful, shedding light on the key skills and trends that define this dynamic field. The findings not only deepen my understanding but also offer actionable guidance for anyone aiming to advance in data analytics. As the industry continues to evolve, ongoing analysis will be crucial to staying competitive. This project serves as a strong foundation for future research and emphasizes the importance of continuous learning and adaptability in the data analytics profession. And last but not least, a massive thanks to the amazing Luke Barousse. Without him, this project would not exist.

