MYSQL Analysis of HR Employees’ Attrition
As part of my goals for the month of May. I desired to work on some SQL projects and case studies, Starting with The IBM HR Analytics Employee Attrition & Performance.
As much as most of the attention is on Power BI, Python, and the likes, I believe that mastering SQL for data analysis is a valuable skill that can benefit data analysts and data scientists in a variety of ways. By understanding how to use SQL to analyze data, will aid a deeper understanding of your data and uncover hidden patterns and relationships.
Introduction
Employees are a company’s greatest asset, which is why rising employee attrition rates are giving businesses cause for concern. Attrition is the departure of employees from the organization for any reason (voluntary or involuntary), including resignation, termination, death, or retirement.
It is commonly used to describe the downsizing of a firm’s employee pool by human resources (HR) professionals. In this case, downsizing is voluntary, where employees either resign or retire and aren’t replaced by the company. An organization may not have direct control over staff attrition (like in the case of retirement). In other cases, an organization has direct control over employee attrition in an effort to reduce costs.
The IBM HR Analytics Employee Attrition & Performance contains 1470 rows and 35 columns. The data source is located at kaggle.
Preamble Step
I used Power query to replace the numeric code used to represent some of the variables. For instance, in the Education column, I replaced 1 with ‘Below College’, 2 with ‘College’, 3 with ‘Bachelor’, 4 with ‘Master’, and 5 with ‘Doctor’.
Similarly, for EnvironmentSatisfaction column, JobInvolvement, JobSatisfaction, PerformanceRating, RelationshipSatisfaction, and WorkLifeBalance.
HR Attrition Analysis
After this, the data was imported into the MySQL server for analysis.
Check for the employee with a Doctorate degree with age less than 40
-- check for the employee with a Doctorate degree with age less than 40--
select *
from `hr_employee-attrition`
where Age < 40
having Education = "Doctor";
Employee count by Gender
Employee Education distribution by Department
-- count the Educational level for each department--
SELECT Department, Education, COUNT(*) AS COUNT
FROM `hr_employee-attrition`
Group by Department, Education
Order by Department;
Employee performance rating distribution by Department
Job Role and Average Monthly Income and Average Monthly Rate
-- calcualte the average monthly rate and income across the job role --
SELECT JobRole,
AVG(Monthlyrate) AS Avg_Monthlyrate,
AVG(MonthlyIncome) AS Avg_MonthlyIncome
FROM `hr_employee-attrition`
GROUP BY JobRole
ORDER BY Avg_Monthlyincome DESC;
-- calcualte the average monthly rate and income across the job role and department --
SELECT Department,JobRole,
AVG(Monthlyrate) AS Avg_Monthlyrate,
AVG(MonthlyIncome) AS Avg_MonthlyIcome
FROM`hr_employee-attrition`
GROUP BY Department, JobRole
ORDER BY Avg_Monthlyrate DESC;
Use CASE to check for Attrition across the Age range
-- using the age range and to know the attrition across the age strata--
SELECT
CASE
WHEN Age BETWEEN 18 AND 24 THEN '18-24'
WHEN Age BETWEEN 25 AND 34 THEN '25-34'
WHEN Age BETWEEN 35 AND 44 THEN '35-44'
WHEN Age BETWEEN 45 AND 54 THEN '45-54'
WHEN Age BETWEEN 55 AND 60 THEN '55-60'
else '60+'
END AS Agerange, Attrition, count(*) as count
from `hr_employee-attrition`
where Age <= 60
group by Agerange, Attrition
order by Agerange;
Attrition by Department
Attrition by Education
Attrition by Gender
Conclusion
You can as well as start working on SQL. It is a fundamental tool that is widely used in the field of data analysis.
You can connect with me via Linkedin. Let me know what you feel in the comment session.
Thanks for reading