Showing posts with label SUM(CASE) GROUP BY Method. Show all posts
Showing posts with label SUM(CASE) GROUP BY Method. Show all posts

Pivot Large Table With Dynamic Columns

In general, it is recommended to use SUM(CASE) GROUP BY method to pivot rows into columns. For details why recommended such, please refer to http://koncordpartners.blogspot.com/2010/03/pivot-turning-rows-into-columns.html. However, in two circumstances SUM(CASE) GROUP BY method does not work:

1. The column numbers are too many. By using SUM(CASE) GROUP BY method, one must know every column’s name.

2. The column names are unknown when generate queries. Usually this could be happened with column names are result from another SELECT statement.

In fact, these two circumstances do always emerge together. In this situation, built-in PIVOT function in Oracle and SQL Server is the only option. In Ms Access, it is TRANSFORM, which is even better in comparison to Oracle and SQL Server, because it accept the list of column names directly from the SELECT statement. Following example is using T-SQL, and the scenario is to draw a matrix table summarizing the total air ambulance flights between airports within a state for past ten year. The involved records are around 500,000.

The PIVOT function is easy:

SELECT *
FROM
(
SELECT sendport AS Sending_Airport
, receiveport AS Receiving_Airport
, 1 AS InterimCounter
FROM dw_Legs
) AS Inner_Q
PIVOT
(
SUM(InterimCounter)
FOR Receiving_Airport IN ([The list of receiving airport])
) AS Pivot_Q

The problem is “The list of receiving airport” must be selected from airport table, and very unfortunately, T-SQL does not accept it as an outcome of SELECT statement. So, dynamic query is the only option left. We human just make our life so miserable by disallow this and disallow that.

Another issue is not all airports receive planes from every rest airports, which means “The list of receiving airport” should not be directly from airport table, rather is should be from the intermediate queries. So, for presentation purpose, a temporary table is the best solution, which is also in accordance with the design of SQL Server.

Step 1

CREATE VIEW V_Airport_Matrix
AS
SELECT sendport AS Sending_Airport
, receiveport AS Receiving_Airport
, 1 AS InterimCounter
FROM dw_Legs
GO

Step 2

Prepare dynamic queries:

DECLARE @columns VARCHAR(MAX)
SELECT @columns = COALESCE(@columns + ',[' + CAST(LTRIM(RTRIM(Receiving_Airport)) AS VARCHAR) + ']', '[' + CAST(LTRIM(RTRIM(Receiving_Airport)) AS VARCHAR)+ ']')
FROM
(
SELECT DISTINCT Receiving_Airport
FROM V_Airport_Matrix
) AS base_Q

Why LTRIM(RTRIM()) is needed? Because the length of the VARCHAR string becomes critical since VARCHAR(MAX) does have limit. And very often, VARCHAR(MAX) does not meet our need here. And, the text, ntext, and image data types are invalid for local variables. Following is the send part of Step 2:

DECLARE @query VARCHAR(MAX)
SET @query =
'
SELECT *
FROM
(
SELECT *
FROM V_Airport_Matrix
) AS Inner_Q
PIVOT
(
SUM(InterimCounter)
FOR Receiving_Airport IN (' + @columns + ')
) AS Pivot_Q
'

What about your query does exceed the VARCHAR(MAX) limit? You will need to shorten the query by cutting them into pieces:

DECLARE @columns VARCHAR(MAX)
SELECT @columns = COALESCE(@columns + ',[' + CAST(LTRIM(RTRIM(Receiving_Airport)) AS VARCHAR) + ']', '[' + CAST(LTRIM(RTRIM(Receiving_Airport)) AS VARCHAR)+ ']')
FROM
(
SELECT DISTINCT Receiving_Airport
FROM V_Airport_Matrix
WHERE Receiving_Airport LIKE 'A%'
OR Receiving_Airport LIKE 'B%'
OR Receiving_Airport LIKE 'C%'
OR Receiving_Airport LIKE 'D%'
OR Receiving_Airport LIKE 'E%'
OR Receiving_Airport LIKE 'F%'
OR Receiving_Airport LIKE 'G%'
OR Receiving_Airport LIKE 'H%'
OR Receiving_Airport LIKE 'I%'
OR Receiving_Airport LIKE 'J%'
OR Receiving_Airport LIKE 'K%'
OR Receiving_Airport 'L%'
) AS base_Q

Then, execute one by one before union together.

Step 3

To execute the queries:

EXECUTE(@query)


http://blog-mstechnology.blogspot.com/2010/06/t-sql-pivot-operator-with-dynamic.html

PIVOT – Turning Rows into Columns

Here we exam various PIVOT functions/methods in MS Access, Oracle and SQL Server.

MS Access:

TRANSFORM does not work properly because it would generate many rows with a lot of zeros. Let us call it seggragation. However, covring it by a SUM() GROUP BY statement will do.

SELECT with IIF() does work perfect. Example: http://koncordpartners.blogspot.com/2009/08/practical-case-of-transpose-query.html

Oracle:

SELECT PIVOT does not work properly because of seggragation. However, covring it by a SUM() GROUP BY statement will do.

SELECT with CASE does work perfect. Example: http://koncordpartners.blogspot.com/2009/08/practical-case-of-transpose-query.html

OVER PARTITION BY does work fine, but not recommended because lack of flexibility. Example: http://baurdotnet.wordpress.com/2009/08/26/select/

SQL Server:

PIVOT does work fine, but strongly not recommended for two reasons: 1, Aggragate function whtin PIVOT can not be recalculated according to your will. 2, T-SQL does not allow ORDER BY working within the inner query, which does cause a lot of difficulties for layout arrangement.

A Practical Case of Transpose Query

Let us say a pension fund wants to generate a summary consists of total contributions had been made so far for each member, by their employers. The fund does not care about exact contribution made by each employers for each member. However, the fund wants to know the three employers paid the contributions most for each member. As the result, the summary table should have 5 columns: Member_Id (unique), Total_Contributions, Employer_1, Employer_2, Employer_3. If a member has less than three employers, leave it blank. Each member should have occupied one row only.

Simple SQL query of SELECT GROUP BY won't be able to accomplish this task, since it is a transpose requirement, namely, turning the row into column. However, CASE or DECODE in PL/SQL will do. The equivalent of CASE in Microsoft Office Access is IIF. Alternatively, TRANSFORM in Access and SELECT PIVOT in PL/SQL can be used. By use CASE or DECODE, it is possible to avoid the complex programming. There are five steps:

1. To get dataset first which should consist of all necessary information. Let us use Dataset_All to describe it:

SELECT Member_Id
, Employer_Id
, SUM(Contribution) Total_Contribution
FROM Tab
GROUP BY Member_Id
, Employer_Id
ORDER BY Member_Id

ORDER BY is important since it would make sure the ROWNUM is in align with the Member_Id.

2. Get a dataset which consists of the Member_Id and their last ROWNUM in Dataset_All. Let us name it Dataset_Uni:

SELECT Member_Id
, MAX(ROWNUM) Serial_No
FROM Dataset_All
GROUP BY Member_Id

Using GROUP BY to get largest ROWNUM if same Member_Id having multiple rows.

3. To flag each employer according their rank, start from 1. Let us name it Dataset_Rank:

SELECT Dataset_Uni.Member_Id
, Total_Contribution
, (Serial_No - Dataset_All.ROWNUM + 1) Employer_Serial
, Employer_Id
FROM Dataset_Uni
JOIN Dataset_All ON Dataset_Uni.Member_Id = Dataset_All.Member_Id

4. Transpose process. We also use the numerical feather of Employer_Id which was generated from sequence:

SELECT Member_Id
, Total_Contribution
, SUM(CASE Employer_Serial (WHEN 1 Employer_Id, ELSE 0)) Employer_1
, SUM(CASE Employer_Serial (WHEN 2 Employer_Id, ELSE 0)) Employer_2
, SUM(CASE Employer_Serial (WHEN 3 Employer_Id, ELSE 0)) Employer_3
FROM Dataset_Rank
GROUP BY Member_Id
, Total_Contribution
ORDER BY Member_Id

Put everything together:

WITH Dataset_All AS
( SELECT Member_Id
, Employer_Id
, SUM(Contribution) Total_Contributio
FROM Tab
GROUP BY Member_Id
, Employer_Id
ORDER BY Member_Id
)
SELECT Member_Id
, Total_Contribution
, SUM(CASE Employer_Serial (WHEN 1 Employer_Id, ELSE 0)) Employer_1
, SUM(CASE Employer_Serial (WHEN 2 Employer_Id, ELSE 0)) Employer_2
, SUM(CASE Employer_Serial (WHEN 3 Employer_Id, ELSE 0)) Employer_3
FROM
( SELECT Dataset_Uni.Member_Id
, Total_Contribution
, (Serial_No - Dataset_All.ROWNUM + 1) Employer_Serial
, Employer_Id
FROM
( SELECT Member_Id
, MAX(ROWNUM) Serial_No
FROM Dataset_All
GROUP BY Member_Id
) Dataset_Uni
JOIN Dataset_All ON Dataset_Uni.Member_Id = Dataset_All.Member_Id
)
GROUP BY Member_Id
, Total_Contribution
ORDER BY Member_Id

Done.

Labels