Exploring the WITHIN GROUP Clause in PostgreSQL

Exploring the WITHIN GROUP Clause in PostgreSQL

What is WITHIN GROUP CLAUSE in PostgreSQL?

The WITHIN GROUP clause is a special part of SQL that helps you control how data is grouped and processed when you're using ordered-set aggregate functions. The WITHIN GROUP clause is particularly useful when performing aggregations on ordered subsets of data, making it possible to customize how you want to see the results.

What are ordered-set aggregate functions?

The WITHIN GROUP clause in PostgreSQL is primarily used with ordered-set aggregate functions, which are functions that return multiple values and allow you to specify a custom order for the input values within each group.

The ordered-set aggregate functions that can be used with the WITHIN GROUP clause include:

  1. PERCENTILE_CONT(): Calculates the continuous percentile value within each group.

  2. PERCENTILE_DISC(): Calculates the discrete percentile value within each group.

  3. MODE(): Calculates the mode (most frequent value) within each group.

These functions allow you to specify custom ordering within the group and retrieve aggregated results based on that ordering.

Other common aggregate functions like SUM, AVG, MIN, and MAX don't require the WITHIN GROUP clause because they perform calculations directly on the group without needing specific ordering.

Functions like rank(), dense_rank(), percent_rank(), cume_dist() are already window functions so they cannot be used with within group instead they are used with OVER (PARTITION BY/ORDER BY).

Syntax -

Here's the syntax of the WITHIN GROUP clause in SQL:

SELECT
  column1,
  aggregate_function(column2) WITHIN GROUP (ORDER BY column3)
FROM
  table_name
GROUP BY
  column1;

Explanation of the syntax:

  • column1: The column you want to group the data by.

  • aggregate_function(column2): The aggregate function you want to apply to column2 within each group.

  • ORDER BY column3: Specifies the column by which you want to order the data within each group before applying the aggregate function.

  • table_name: The name of the table you're querying.

  • GROUP BY column1: Groups the data based on the values in column1.

In simple terms, the WITHIN GROUP clause is used after an aggregate function to tell the database how to sort the data within each group before performing the aggregation. This allows you to get specific results for each group of data based on your sorting preference.

Let's dive deeper into the examples of Within Group clause. For that we will be using a sample database Sales.

Sample Database - Sales

We are using sales database for learning within group clause in postgresql. The below queries will create the necessary tables and insert sample data into them. You can execute these SQL statements in your PostgreSQL database to set up the Sales Database schema with sample data for practice.

Create Table Queries -

-- Customers table
CREATE TABLE Customers (
  customer_id SERIAL PRIMARY KEY,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  email VARCHAR(100),
  phone VARCHAR(20)
);

-- Products table
CREATE TABLE Products (
  product_id SERIAL PRIMARY KEY,
  product_name VARCHAR(100) NOT NULL,
  category VARCHAR(50),
  price DECIMAL(10, 2) NOT NULL
);

-- Orders table
CREATE TABLE Orders (
  order_id SERIAL PRIMARY KEY,
  order_date DATE NOT NULL,
  customer_id INTEGER REFERENCES Customers(customer_id)
);

-- Order_Items table
CREATE TABLE Order_Items (
  order_id INTEGER REFERENCES Orders(order_id),
  product_id INTEGER REFERENCES Products(product_id),
  quantity INTEGER NOT NULL,
  PRIMARY KEY (order_id, product_id)
);

Insert Queries -

-- Insert data into Customers table
INSERT INTO Customers (first_name, last_name, email, phone)
VALUES
  ('John', 'Doe', 'john.doe@example.com', '+123456789'),
  ('Jane', 'Smith', 'jane.smith@example.com', '+987654321'),
  ('Michael', 'Johnson', 'michael.johnson@example.com', '+444555666');

-- Insert data into Products table
INSERT INTO Products (product_name, category, price)
VALUES
  ('Laptop', 'Electronics', 1000.00),
  ('Smartphone', 'Electronics', 600.00),
  ('Desk Chair', 'Furniture', 150.00),
  ('Desk', 'Furniture', 250.00),
  ('Bookshelf', 'Furniture', 120.00);

-- Insert data into Orders table
INSERT INTO Orders (order_date, customer_id)
VALUES
  ('2023-07-01', 1),
  ('2023-07-02', 2),
  ('2023-07-03', 3);

-- Insert data into Order_Items table
INSERT INTO Order_Items (order_id, product_id, quantity)
VALUES
  (1, 1, 2),
  (1, 3, 1),
  (2, 2, 3),
  (3, 4, 1),
  (3, 5, 2);

Select Queries

SELECT * FROM Customers;
SELECT * FROM Products;
SELECT * FROM Orders;
SELECT * FROM Order_Items;

Customers Table

Products Table

Orders Table

Order_items Table

Examples

Problem 1: Find the Median Product Price for Each Category

Query -

SELECT
  p.category,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY p.price) AS median_price
FROM
  Products p
GROUP BY
  p.category;

Output -

Explanation -

The PERCENTILE_CONT function calculates the median price for each category by determining the price value that falls at the 50th percentile within each group. The ORDER BY p.price ensures that the prices are ordered in ascending order when calculating the median.

The result of the query will show each product category along with its calculated median price.

Problem 2: Determine the Most Common Order Quantity for Each Product Category

Query -

SELECT
  p.category,
  MODE() WITHIN GROUP (ORDER BY oi.quantity) AS most_common_order_quantity
FROM
  Products p
JOIN
  Order_Items oi ON p.product_id = oi.product_id
GROUP BY
  p.category;

Output -

Explanation The MODE function calculates the most common order quantity for each category by determining the quantity value that appears most frequently within each group. The ORDER BY oi.quantity ensures that the values are considered in ascending order when calculating the mode.

The result of the query will show each product category along with its most common order quantity.

Problem 3: Determine the price at the 25th percentile for products within each product category in the Sales Database.

Query -

SELECT
  p.category,
  PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY p.price) AS x25th_percentile_price
FROM
  Products p
GROUP BY
  p.category;

Output -

Explanation - The result will show the 25th percentile price for products within each category. This value indicates that 25% of the product prices in that category are less than or equal to this calculated value.

Conclusion

The WITHIN GROUP clause gives you the power to focus on specific parts of your data, making it easier to find important information in your database. By using it with functions that organize data, you can do smart calculations that help you understand your data better. This helps you dig deeper and uncover more insights from your data, making your analysis even more valuable.

Thank you for reading till the end!

Did you find this article valuable?

Support The Analyst Geek by becoming a sponsor. Any amount is appreciated!