Aggregations in Tableau

Choosing your aggregation is the first step in the creation of your visualization. To make this easier, Tableau offers several shortcut aggregations.

Let’s take a look at each of them.

When you drag a measure to the View, you don’t get to see all of the data points in your visualization. Instead, you see an aggregated view.

An aggregation combines data points together into a group. By doing so it reduces the granularity and shows less level of detail. In other words, instead of seeing each row of the table in the you get a summarized view.

To see all data points in your visualization, go to ‘Analysis’ in the top menu bar and uncheck ‘Aggregate Measures.’

Showing aggregated data has two main benefits:

  1. It makes data easier to comprehend and analyze. It helps you gain quick insight into the data and discover trends and patterns.
  2. It increases the performance of large datasets, amongst others by reducing the query time.

While aggregations are most commonly used for measures, Tableau also makes some aggregations available for dimensions.

There are many ways to aggregate data, for example by sum, average or median. Tableau offers several predefined aggregations. Below is a complete list of these aggregations. The examples are based on this simple table:

Example table.

Let’s dive in.

Sum

The sum is the default aggregation used in Tableau. It adds up the values of the measure and returns the sum.

Sum is shown as SUM([field name]) on the Shelves.

The sum for the example table is:

10 + 25 + 40 + 30 + 25 = 130

Average

The average adds up the values of the measure and divides it by the number of data points.

Average is shown as AVG([field name]) on the Shelves.

The average for the example table is:

(10 + 25 + 40 + 30 + 25) / 5 = 26

Median

The median pinpoints the value of the data point that is located in the middle of the dataset when the values of the data points are arranged on ascending or descending order. The median is the exact middle position in which half of the data values are above and half are below.

Median is shown as MEDIAN([field name]) on the Shelves.

The median is useful for datasets that contain outliers and anomalies. Different from the average, the median is less susceptible to extreme values. It makes it a more reliable metric for asymmetric data.

For example, the average and median income of a room full of data analysts may be $100,000. But what happens if Bill Gates walks in? The average is suddenly a lot higher. The median stays the same.

Find the median by sorting the data points according to value in descending or ascending order and find the value that divides the list into two equal portions. In case of the example table above, the median is 25:

Count

Count returns the number of values.

Count is shown as CNT([field name]) on the Shelves.

The count for the example table is 5.

Count (Distinct)

Count (Distinct) returns the number of unique values. Rows with the same value are counted as a single row.

Count (Distinct) is shown as CNTD([field name]) on the Shelves.

The example table contains two rows with the value of 25. These two rows are count as a single instance. Therefore, the count for the example table is 4.

Minimum

Minimum returns the smallest value.

Minimum is shown as MIN([field name]) on the Shelves.

The minimum for the example table is 10.

Maximum

Maximum returns the largest value.

Maximum is shown as MAX([field name]) on the Shelves.

The maximum for the example table is 40.

Percentile

A percentile is a value on a scale of 100 that indicates the percent of a distribution that is equal or below it.

Let’s say you have a list of house rental prices in a suburb of Mobile, Alabama. The 70th percentile is at $3200/month. This means that 70% of the house rental prices in that list is below $3200.

Percentile is shown as PCT[number]([field name]) on the Shelves.

Tableau lets you choose from a list of percentiles from the drop-down.

The 50th percentile — which is the median — for the example table is 25.

Variance (Population)

The variance measures the spread of the data points from their average value. In other words, it measures the average degree to which each point differs from the mean (average).

Variance (Population) is shown as VARP([field name]) on the Shelves.

The variance is calculated as the squared difference from the mean:

  1. Calculate the difference between each data point and the mean.
  2. Square each of the results.
  3. Calculate the average from the results.

Squaring ensures that differences above and below the mean do not cancel each other out.

To get the variance (population) for the example table, you first calculate the difference between each data point and the mean of 26:

10 - 26 = -16
25 - 26 = -1
40 - 26 = 14
30 - 26 = 4
25 - 26 = -1

Then, square each of the results.

-16^2 = 256
-1^2 = 1
14^2 = 196
4^2 = 16
-1^2 = 1

Finally, add the numbers up and divide by the number of data points.

(256 + 1 + 196 + 16 + 1) / 5 = 94

The variance (population) for the example table is 94.

Variance (population) in Tableau assumes that the dataset contains the entire population. In practice, this function is used for large sample sizes. For small samples, it is recommended to use Variance.

Variance

In many cases you’ll find that you are not able to get the data of the full population, but only of a sample. To reduce the bias of a finite sample size, Tableau adjusts the variance (population) according to Bessel’s correction.

Bessel’s correction entails reducing the number of observations. Whereas the formula for variance (population) is based on n observations, the formula for variance of a sample is based on n-1 observations. This correction usually increases the error, with it accounting for the lack of data.

Variance is shown as VAR([field name]) on the Shelves.

To get the variance for the table above, you follow the same steps as for variance (population), but instead of dividing the sum of the squared differences by n=5, you divide it by n-1=4.

(256 + 1 + 196 + 16 + 1) / 4 = 117,5

This makes the variance 117,5.

Use this type of variance if you have a finite sample that represents the population.

Standard Deviation (Population)

Standard deviation looks at how spread out a group of numbers is from the mean. It is a way to quantify the amount of variation or dispersion of a range of values.

The standard deviation helps determine if an individual data point lies at an abnormal distance from other data points and qualifies as an outlier or anomaly.

Standard Deviation (Population) is shown as STDEVP([field name]) on the Shelves.

It is measured by calculating the square root of variance. A low standard deviation indicates that most of the values are clustered closely together. A high standard deviation indicates that there is a higher level of spread and variation.

To get the standard deviation (population) for the example table, you calculate the square root of 94.

√94 ≈ 9.695

Which comes out to approximately 9.695.

Similar to variance (population), standard deviation (population) assumes that the dataset contains the entire population. For small samples, it is recommended to use Standard Deviation.

Standard Deviation

Like Variance, Standard Deviation is adjusted for a finite same size by using n-1.

Standard Deviation is shown as STDEV([field name]) on the Shelves.

To get the standard deviation (population) for the example table, you calculate the square root of 117.5.

√117.5 ≈ 10.84

Which comes out to approximately 10.84.

There you have it — all 12 predefined aggregations in Tableau. Which aggregation do you use the most?

Questions? I love questions! Please leave them in the comments below and I will get back to you as soon as I can.

Disclaimer: I am in no way affiliated with Tableau.

I write beginner-level Tableau tutorials.