Difference between Fact and Dimension tables in SQL/Power BI with example.
Quick Introduction
Fact and dimension tables are two fundamental concepts in data warehousing and database design. They serve different roles in organizing and storing data for efficient analysis. Here's the difference between them:
Key Differences between Fact and Dimension Tables
Fact Table:
A fact table contains the quantitative data or facts that you want to analyze.
It typically consists of numerical values, measures, or metrics.
Fact tables are often quite large, containing millions or even billions of records.
They are associated with business transactions or events.
Fact tables usually have foreign keys that link to dimension tables.
They store data about what happened, such as sales, orders, or customer interactions.
Dimension Table:
A dimension table contains descriptive attributes that provide context to the facts in the fact table.
It holds textual or categorical data that helps in filtering, grouping, and labeling facts.
Dimension tables are generally smaller compared to fact tables.
They provide additional information for slicing and dicing the data.
Dimension tables often have primary keys that are referenced as foreign keys in the fact table.
They store data about who, what, where, when, and why something happened.
Example - [HumanResource Schema]
Let's take an example of HumanResource Schema from the AdventureWorks2022 Database. Here are the tables and data from the HumanResources schema in the AdventureWorks2022 database:
- Department: This table contains information about the departments in the company, such as the department name, the department manager, and the department location.
Column Name | Data Type | Description |
DepartmentID | int | The unique identifier for the department. |
Name | nvarchar(50) | The name of the department. |
ManagerID | int | The employee ID of the department manager. |
LocationID | int | The location ID of the department. |
- Employee: This table contains information about the employees of the company, such as their name, address, and phone number.
Column Name | Data Type | Description |
EmployeeID | int | The unique identifier for the employee. |
Name | nvarchar(50) | The name of the employee. |
Title | nvarchar(50) | The title of the employee. |
BirthDate | datetime | The date of birth of the employee. |
HireDate | datetime | The date the employee was hired. |
DepartmentID | int | The department ID of the employee. |
ManagerID | int | The employee ID of the employee's manager. |
Phone | nvarchar(25) | The phone number of the employee. |
nvarchar(50) | The email address of the employee. | |
Address | nvarchar(60) | The address of the employee. |
City | nvarchar(30) | The city of the employee's address. |
StateProvince | nvarchar(2) | The state or province of the employee's address. |
CountryRegion | nvarchar(20) | The country of the employee's address. |
PostalCode | nvarchar(10) | The postal code of the employee's address. |
- EmployeeDepartmentHistory: This table contains information about the departments that employees have worked in, such as the department name, the department manager, and the department start and end dates.
Column Name | Data Type | Description |
EmployeeID | int | The unique identifier for the employee. |
DepartmentID | int | The department ID of the employee. |
DepartmentName | nvarchar(50) | The name of the department. |
ManagerID | int | The employee ID of the department manager. |
StartDate | datetime | The start date of the employee's employment in the department. |
EndDate | datetime | The end date of the employee's employment in the department. |
- EmployeePayHistory: This table contains information about the pay history of employees, such as their salary, bonus, and overtime pay.
Column Name | Data Type | Description |
EmployeeID | int | The unique identifier for the employee. |
PayDate | datetime | The date the pay was earned. |
Rate | money | The hourly rate of pay for the employee. |
HoursWorked | int | The number of hours worked by the employee. |
RegularPay | money | The regular pay for the employee. |
OvertimePay | money | The overtime pay for the employee. |
BonusPay | money | The bonus pay for the employee. |
TotalPay | money | The total pay for the employee. |
- JobCandidate: This table contains information about job candidates, such as their name, address, and resume.
Column Name | Data Type | Description |
JobCandidateID | int | The unique identifier for the job candidate. |
Name | nvarchar(50) | The name of the job candidate. |
Address | nvarchar(60) | The address of the job candidate. |
City | nvarchar(30) | The city of the job candidate's address. |
StateProvince | nvarchar(2) | The state or province of the job candidate's address. |
CountryRegion | nvarchar(20) | The country of the job candidate's address. |
PostalCode | nvarchar(10) | The postal code of the job candidate's address. |
Phone | nvarchar(25) | The phone number of the job candidate. |
nvarchar(50) | The email address of the job candidate. | |
Resume | varbinary(max) | The resume of the job candidate. |
So Let's Identify which are Fact and dimension Tables from the HumanResource Schema.
So we have added prefixes to whichever tables are Fact as Fact and Dimension ad Dim.
fig. Data Model of HumanResource Schema
The following tables in the HumanResources schema are fact tables:
EmployeePayHistory: This table contains information about the pay history of employees.
EmployeeDepartmentHistory: This table contains information about the departments that employees have worked in.
The following tables in the HumanResources schema are dimension tables:
Department: This table contains information about the departments in the company.
Employee: This table contains information about the employees of the company.
JobCandidate: This table contains information about job candidates.
Shift: This table contains information about the shifts that employees work.
How tables were categorized as Fact and Dimension tables?
The EmployeePayHistory and EmployeeDepartmentHistory tables are fact tables because they store measurements of business events. The EmployeePayHistory table stores information about the pay of employees, such as their salary, bonus, and overtime pay. The EmployeeDepartmentHistory table stores information about the departments that employees have worked in, such as the department name, the department manager, and the department start and end dates.
The Department, Employee, JobCandidate, and Shift tables are dimension tables because they provide context for the measurements in the fact tables. The Department table provides information about the departments in the company, such as the department name, the department manager, and the department location. The Employee table provides information about the employees of the company, such as their name, address, and phone number. The JobCandidate table provides information about job candidates, such as their name, address, and resume. The Shift table provides information about the shifts that employees work, such as the shift start and end times, and the number of employees on each shift.
Thank you for reading till the end!