Tools: SQL (PostgreSQL) · Data Analysis · KPI Reporting
Domain: HR Operations · Workforce Management · Data Quality
Author: Tai Nguyen | GitHub
This project simulates a real-world workforce operations dataset and answers 10 business questions a Data Analyst would face in an operations or HR analytics role. It covers employee attendance, shift scheduling, task productivity, and performance tracking — mirroring the KPI workflows I support in my current role.
The dataset is synthetic but modeled after realistic operational patterns: scheduling gaps, late arrivals, quality variance across departments, and productivity trends.
| # | Question | SQL Skills Used |
|---|---|---|
| 01 | How many active employees does each department have, and what is the average hourly rate? | JOIN, GROUP BY, aggregation |
| 02 | Which employees have the highest and lowest attendance rates? | LEFT JOIN, CASE, calculated fields |
| 03 | How often is each employee late, and by how many minutes on average? | TIMESTAMP math, filtering |
| 04 | Who are the top performers based on task completion and quality score? | Multi-aggregate, ORDER BY |
| 05 | How many hours did each employee work, and who is under target? | TIMESTAMP math, CASE |
| 06 | What is the cumulative task output day-over-day? | Window: SUM() OVER (ORDER BY) |
| 07 | Who is the top performer in each department? | Window: RANK() OVER (PARTITION BY) |
| 08 | Which employees score below the company-wide average quality score? | CTE, CROSS JOIN, variance |
| 09 | Which scheduled shifts had no attendance record (coverage gaps)? | Anti-join: LEFT JOIN + IS NULL |
| 10 | Full operational dashboard: attendance + quality + hours + performance flag | Multi-CTE chain, COALESCE, CASE |
departments employees shifts
----------- --------- ------
department_id --> department_id shift_id
department_name employee_id --> employee_id
location first_name shift_date
manager_id last_name scheduled_start
job_title scheduled_end
hire_date shift_type
hourly_rate
employment_type attendance tasks
is_active ---------- -----
attendance_id task_id
shift_id <-- employee_id
employee_id task_date
clock_in task_type
clock_out tasks_assigned
status tasks_completed
quality_score
- 1 coverage gap identified — Employee 109 (Taylor Johnson, Research) had a scheduled shift with no attendance record on Jan 11
- Top performer: Jordan Kim (Data Operations) — 99% quality score, 100% attendance rate
- Highest late-arrival rate: Multiple employees in QA averaged 15–32 minutes late across tracked shifts
- Below-average quality flag: Employees scoring below the company average (93.7%) identified and ranked by variance
- Department with highest avg hourly rate: Engineering ($46.67/hr avg)
- Clone this repo
- Open any PostgreSQL-compatible editor (pgAdmin, DBeaver, TablePlus, or DB Fiddle online)
- Run
data/seed_data.sqlfirst to create and populate all tables - Run any query from
queries/analysis_queries.sql
No setup required for quick testing: paste the seed data + any single query into db-fiddle.com (select PostgreSQL 15) and run instantly.
JOINtypes: INNER, LEFT, CROSS- Aggregation:
COUNT,SUM,AVG,MIN,MAX,ROUND - Window functions:
RANK(),SUM()withOVER,PARTITION BY,ORDER BY - CTEs: single and chained (
WITHstatements) CASE WHENlogic for business rules and flagsTIMESTAMParithmetic withEXTRACT(EPOCH FROM ...)- Anti-join pattern with
LEFT JOIN + IS NULL NULLIFandCOALESCEfor safe division and null handlingROUND,DECIMALprecision handling
workforce-ops-sql/
├── data/
│ └── seed_data.sql # Table definitions + sample records
├── queries/
│ └── analysis_queries.sql # All 10 business questions + solutions
└── README.md
This project was built to demonstrate SQL skills applicable to Data Analyst roles in operations, HR analytics, and business intelligence. The dataset design mirrors real patterns I've worked with in data operations — scheduling workflows, data quality tracking, and KPI reporting against acceptance rate thresholds.
Connect with me on LinkedIn or check out my other projects on GitHub.