A Practical Case of Transpose Query

Let us say a pension fund wants to generate a summary consists of total contributions had been made so far for each member, by their employers. The fund does not care about exact contribution made by each employers for each member. However, the fund wants to know the three employers paid the contributions most for each member. As the result, the summary table should have 5 columns: Member_Id (unique), Total_Contributions, Employer_1, Employer_2, Employer_3. If a member has less than three employers, leave it blank. Each member should have occupied one row only.

Simple SQL query of SELECT GROUP BY won't be able to accomplish this task, since it is a transpose requirement, namely, turning the row into column. However, CASE or DECODE in PL/SQL will do. The equivalent of CASE in Microsoft Office Access is IIF. Alternatively, TRANSFORM in Access and SELECT PIVOT in PL/SQL can be used. By use CASE or DECODE, it is possible to avoid the complex programming. There are five steps:

1. To get dataset first which should consist of all necessary information. Let us use Dataset_All to describe it:

SELECT Member_Id
, Employer_Id
, SUM(Contribution) Total_Contribution
FROM Tab
GROUP BY Member_Id
, Employer_Id
ORDER BY Member_Id

ORDER BY is important since it would make sure the ROWNUM is in align with the Member_Id.

2. Get a dataset which consists of the Member_Id and their last ROWNUM in Dataset_All. Let us name it Dataset_Uni:

SELECT Member_Id
, MAX(ROWNUM) Serial_No
FROM Dataset_All
GROUP BY Member_Id

Using GROUP BY to get largest ROWNUM if same Member_Id having multiple rows.

3. To flag each employer according their rank, start from 1. Let us name it Dataset_Rank:

SELECT Dataset_Uni.Member_Id
, Total_Contribution
, (Serial_No - Dataset_All.ROWNUM + 1) Employer_Serial
, Employer_Id
FROM Dataset_Uni
JOIN Dataset_All ON Dataset_Uni.Member_Id = Dataset_All.Member_Id

4. Transpose process. We also use the numerical feather of Employer_Id which was generated from sequence:

SELECT Member_Id
, Total_Contribution
, SUM(CASE Employer_Serial (WHEN 1 Employer_Id, ELSE 0)) Employer_1
, SUM(CASE Employer_Serial (WHEN 2 Employer_Id, ELSE 0)) Employer_2
, SUM(CASE Employer_Serial (WHEN 3 Employer_Id, ELSE 0)) Employer_3
FROM Dataset_Rank
GROUP BY Member_Id
, Total_Contribution
ORDER BY Member_Id

Put everything together:

WITH Dataset_All AS
( SELECT Member_Id
, Employer_Id
, SUM(Contribution) Total_Contributio
FROM Tab
GROUP BY Member_Id
, Employer_Id
ORDER BY Member_Id
)
SELECT Member_Id
, Total_Contribution
, SUM(CASE Employer_Serial (WHEN 1 Employer_Id, ELSE 0)) Employer_1
, SUM(CASE Employer_Serial (WHEN 2 Employer_Id, ELSE 0)) Employer_2
, SUM(CASE Employer_Serial (WHEN 3 Employer_Id, ELSE 0)) Employer_3
FROM
( SELECT Dataset_Uni.Member_Id
, Total_Contribution
, (Serial_No - Dataset_All.ROWNUM + 1) Employer_Serial
, Employer_Id
FROM
( SELECT Member_Id
, MAX(ROWNUM) Serial_No
FROM Dataset_All
GROUP BY Member_Id
) Dataset_Uni
JOIN Dataset_All ON Dataset_Uni.Member_Id = Dataset_All.Member_Id
)
GROUP BY Member_Id
, Total_Contribution
ORDER BY Member_Id

Done.

No comments:

Post a Comment

Labels