Aggregate functions

Aggregate functions are calculations performed on a set of values to produce a single summary value. They help you gain insights from large datasets by condensing information into meaningful metrics. In effect, you are asking the Visualizer to take a column (usually a numerical one), perform a specific calculation (Sum, Max, Min, Average, Count, Unique Count, etc.) on its values, and then display the results grouped by the categories in another column.
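To make the grouping idea concrete, here is a minimal sketch in plain Python. It is not DataGOL code: the sample rows, the column names `category` and `sales`, and the helper `aggregate_by` are all hypothetical and only mirror, conceptually, what happens when you pick a function and a grouping column.

```python
from collections import defaultdict

# Hypothetical sample rows: one categorical column and one numerical column.
rows = [
    {"category": "Books", "sales": 120.0},
    {"category": "Books", "sales": 80.0},
    {"category": "Toys", "sales": 50.0},
    {"category": "Toys", "sales": 200.0},
    {"category": "Toys", "sales": 50.0},
]

def aggregate_by(rows, group_col, value_col, func):
    """Group rows by group_col, then reduce each group's values with func."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {key: func(values) for key, values in groups.items()}

def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

sum_by_category = aggregate_by(rows, "category", "sales", sum)
max_by_category = aggregate_by(rows, "category", "sales", max)
avg_by_category = aggregate_by(rows, "category", "sales", average)

print(sum_by_category)  # {'Books': 200.0, 'Toys': 300.0}
```

Each resulting dictionary corresponds to one chart series: the keys are the categories on the axis and the values are the aggregated numbers.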

By strategically applying these aggregate functions to your data columns and choosing appropriate chart types, you can uncover valuable patterns, trends, and summaries that help you understand your data more effectively. Go ahead and experiment with these functions in your workbook to see the different insights you can gain!

In DataGOL, you can apply any of the following functions to selected columns when you visualize data via chart widgets.

  • Sum:

    • What it does: Calculates the total of all numerical values in a selected column.

    • How it's applied: When you apply Sum to a numerical column (e.g., Sales, Revenue, Quantity), the visualization will display the total sum of all the values in that column. This is great for understanding overall totals.
      For example, a bar chart showing the Sum of Sales for different product categories would reveal which category contributes the most to the total sales.

  • Maximum (Max):

    • What it does: Identifies the highest numerical value within a selected column.

    • How it's applied: Applying Max to a numerical column will highlight the largest value. This can be useful for finding peak performance, highest prices, or maximum quantities.
      For example, a line chart showing the Max Temperature recorded each day would show each day's peak, and the highest point on the line would mark the hottest day in the period.

  • Minimum (Min):

    • What it does: Identifies the lowest numerical value within a selected column.

    • How it's applied: Applying Min to a numerical column will highlight the smallest value. This can help identify the lowest sales periods, cheapest products, or minimum stock levels.
      For example, a bar chart showing the Min Order Value for different customer segments could pinpoint the segments with the smallest individual orders.

  • Average:

    • What it does: Calculates the arithmetic mean of all numerical values in a selected column.

    • How it's applied: Applying Average to a numerical column provides the central tendency of the data. This is useful for understanding typical values like average customer spend or average product price.
      For example, a line chart showing the Average Monthly Website Visits can illustrate the typical traffic to your site each month.

  • Count:

    • What it does: Determines the total number of rows or non-null values in a selected column.

    • How it's applied: Applying Count to any column (numerical or categorical) will tell you how many records exist. This is useful for understanding the size of your dataset or the number of entries in a specific category.
      For example, a pie chart showing the Count of Customers in different regions would illustrate the distribution of customers across those regions.

  • Unique Count:

    • What it does: Determines the number of distinct or unique values within a selected column.

    • How it's applied: Applying Unique Count to a column (numerical or categorical) helps identify the number of different categories, unique customers, or distinct product IDs.
      For example, a bar chart showing the Unique Count of Products Sold in each month would reveal the variety of products purchased over time.
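The difference between Count and Unique Count is easiest to see on a small column. The sketch below is plain Python with made-up customer IDs, not DataGOL code. Because this page does not say whether Count includes null values, the sketch computes both the row count and the non-null count.

```python
# Hypothetical column of customer IDs; None stands for a missing (null) value.
customer_ids = ["c1", "c2", "c1", None, "c3", "c2"]

# Count, interpreted as "all rows", nulls included.
row_count = len(customer_ids)

# Count, interpreted as "rows that actually hold a value".
non_null_values = [value for value in customer_ids if value is not None]
non_null_count = len(non_null_values)

# Unique Count: distinct values only, duplicates collapsed.
unique_count = len(set(non_null_values))

print(row_count, non_null_count, unique_count)  # 6 5 3
```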

  • Percent:

    • What it does: Calculates the percentage of a part relative to the whole. This function often requires you to define what the whole represents (e.g., percentage of total sales, percentage of customers in a specific segment).

    • How it's applied: You will typically apply Percent to a numerical column after it has been aggregated (e.g., Count of items). The percentage is then calculated for each category that contributes to the total.
      For example, a pie chart showing the Percent of Total Sales for different product lines clearly illustrates the share of sales each line contributes. A stacked bar chart could show the Percent of Orders that fall into different priority levels within each month.
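As a rough illustration of the part-to-whole calculation, the following plain-Python sketch turns already aggregated counts into percentages. The priority levels and counts are hypothetical, and the "whole" is assumed to be the total across all categories.

```python
# Hypothetical aggregated input: Count of orders per priority level.
order_counts = {"High": 10, "Medium": 25, "Low": 15}

# The "whole" is assumed here to be the total across all categories.
total_orders = sum(order_counts.values())

# Each category's share of the whole, expressed as a percentage.
percent_by_priority = {
    priority: 100 * count / total_orders
    for priority, count in order_counts.items()
}

print(percent_by_priority)  # {'High': 20.0, 'Medium': 50.0, 'Low': 30.0}
```

The percentages always add up to 100 when the whole is the sum of the parts, which is why this function pairs naturally with pie charts and stacked bars.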

  • Positive Rate:

    • What it does: Calculates the proportion of positive values within a selected numerical column.

    • How it's applied: You apply Positive Rate to a numerical column. The tool will count the number of values that meet the positive criteria and divide it by the total number of values in that column.
      For example, a line chart showing the Positive Feedback Rate (percentage of positive customer reviews) over time can highlight trends in customer satisfaction. A bar chart comparing the Positive Conversion Rate across different marketing campaigns can identify the most effective ones.
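The count-and-divide step described above can be sketched in a few lines of plain Python. This page does not define the positive criterion precisely, so the sketch assumes "positive" means strictly greater than zero; the sample values are made up.

```python
# Hypothetical numerical column, e.g. review scores centered on zero.
values = [5, -2, 0, 3, 8, -1, 4, 0]

# Assumption: a value is "positive" when it is strictly greater than zero.
positive_count = sum(1 for value in values if value > 0)

# Positive rate = positive values divided by all values in the column.
positive_rate = positive_count / len(values)

print(positive_count, positive_rate)  # 4 0.5
```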

  • Standard Deviation:

    • What it does: Measures the dispersion or spread of a set of numerical values around their average (mean). A high standard deviation indicates that the values are spread out over a wider range, while a low standard deviation indicates that the values are clustered closely around the mean.

    • How it's applied: Apply Standard Deviation to a numerical column. The result will be a single value representing the standard deviation of that column.
      For example, although the standard deviation is rarely plotted as a primary chart element, it is useful for filtering and highlighting. You might flag data points that fall more than a chosen number of standard deviations from the mean as outliers.
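The outlier idea can be sketched with Python's standard `statistics` module. The order values and the threshold of two standard deviations are hypothetical choices. The sketch uses the population standard deviation (`pstdev`); this page does not say whether DataGOL uses the population or the sample formula.

```python
import statistics

# Hypothetical order values with one unusually large order.
order_values = [20, 22, 19, 21, 23, 20, 95]

mean_value = statistics.mean(order_values)
std_dev = statistics.pstdev(order_values)  # population standard deviation

# Assumption: anything more than 2 standard deviations from the mean is an outlier.
threshold = 2
outliers = [
    value for value in order_values
    if abs(value - mean_value) > threshold * std_dev
]

print(outliers)  # [95]
```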

  • Variance:

    • What it does: Similar to standard deviation, variance measures the spread of data points around the mean. Mathematically, it's the square of the standard deviation. While standard deviation is often preferred due to being in the same units as the original data, variance is important in some statistical calculations.

    • How it's applied: Apply Variance to a numerical column. The result will be a single value representing the variance of that column.
      Like standard deviation, variance is often used behind the scenes for more advanced analysis or in conjunction with other visual elements (e.g., as a parameter in a statistical model whose output is visualized). You might not directly see a Variance bar on a chart, but it informs the understanding of data spread.
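The relationship between the two measures can be checked directly. This short sketch uses Python's standard `statistics` module on a made-up column, again with the population formulas as an assumption.

```python
import math
import statistics

# Hypothetical numerical column.
values = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(values)  # in squared units of the data
std_dev = statistics.pstdev(values)      # in the same units as the data

# Squaring the standard deviation recovers the variance.
print(math.isclose(std_dev ** 2, variance))  # True
```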

  • Median:

    • What it does: Represents the middle value in a sorted set of numerical data. If there's an even number of data points, the median is the average of the two middle values. The median is less sensitive to extreme outliers than the average.

    • How it's applied: Apply Median to a numerical column. The result will be the median value of that column.

    • For example:

      • Comparison with Average: You can display both the Average Sales and the Median Sales on a chart (perhaps as lines on a bar chart of different product categories). If the median is significantly different from the average, it suggests the presence of outliers skewing the average.

      • Box Plots: Box plots inherently visualize the median as a central line within the box, providing a clear picture of the central tendency and the spread of the data.
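To see why the median resists outliers, consider this small sketch using Python's standard `statistics` module. The sales figures are made up, with one extreme value added on purpose.

```python
import statistics

# Hypothetical sales figures; the last value is an extreme outlier.
sales = [100, 110, 105, 95, 1000]

average_sales = statistics.mean(sales)   # pulled upward by the outlier
median_sales = statistics.median(sales)  # middle of the sorted values

# With an even number of values, the median averages the two middle ones.
even_median = statistics.median([10, 20, 30, 40])

print(average_sales, median_sales, even_median)  # 282 105 25.0
```

The large gap between the average (282) and the median (105) is the signal described above: a single extreme value is skewing the average.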