Practical Mean

In statistics, mean has two related meanings:
- the arithmetic mean.
- the expected value of a random variable, which is also called the population mean.

To get the mean for a column in database, in general built-in aggregation function AVG() is good enough. However, one much understand this function is of arithmetic mean. If your database contains noise data, this is not good enough, simply because this mean does not have noise data prune mechanism. While different means would have their precise meaning, if you do not know which to use, use interquartile mean (IQM).

Following is the some notes related to T-SQL’s mean functions:

Arithmetic mean (AM)
There is built-in function – Never use DISTINCT, and no need to use ALL because it is default:

AVG ( [ ALL | DISTINCT ] expression )

Median
Median is not mean, but it is included here anyway. There is no such function in T-SQL. But you can get from here (credits to Itzik Ben-Gan):

SELECT(
(SELECT MAX(Value) FROM
(SELECT TOP 50 PERCENT Value FROM dbo.VOrders ORDER BY Value) AS H1)
+
(SELECT MIN(Value) FROM
(SELECT TOP 50 PERCENT Value FROM dbo.VOrders ORDER BY Value DESC) AS H2)
) / 2 AS Median;

Interquartile mean (IQM)
There is no such function in T-SQL. But you can get from here:

SELECT AVG(S1_2) AS P1_2
FROM Minut
WHERE S1_2 IS NOT NULL
AND Minut.Trip_Id NOT IN
(SELECT TOP 25 PERCENT Trip_Id FROM Minut WHERE S1_2 IS NOT NULL ORDER BY S1_2)
AND Minut.Trip_Id NOT IN
(SELECT TOP 25 PERCENT Trip_Id FROM Minut WHERE S1_2 IS NOT NULL ORDER BY S1_2 DESC)

If the dataset is so small, such as only a few records exist, SELECT TOP 25 PERCENT won't work. Instead, use following solution:

SELECT AVG(S1_2) AS P1_2
FROM Minut
WHERE S1_2 IS NOT NULL
AND Minut.Trip_Id NOT IN
(
SELECT TOP
(
SELECT CONVERT(INT, COUNT(*)/4)
FROM Minut
WHERE S1_2 IS NOT NULL
) Trip_Id
FROM Minut
WHERE S1_2 IS NOT NULL
ORDER BY S1_2
)
AND Minut.Trip_Id NOT IN
(
SELECT TOP
(
SELECT CONVERT(INT, COUNT(*)/4)
FROM Minut
WHERE S1_2 IS NOT NULL
) Trip_Id
FROM Minut
WHERE S1_2 IS NOT NULL
ORDER BY S1_2 DESC
)


http://en.wikipedia.org/wiki/Mean
http://blogical.se/blogs/mikael_sand/archive/2009/07/24/t-sql-is-mean-there-is-no-median.aspx
http://www.sqlmag.com/article/tsql3/calculating-the-median-gets-simpler-in-sql-server-2005.aspx
http://koncordpartners.blogspot.com/2010/12/rounding-issue-in-select-top-n-percent.html

No comments:

Post a Comment

Labels