Aggregations & Logical Functions in Data Preparation & EDA
Aggregations and logical functions are essential tools in data preparation and analysis. They help us summarize data, find patterns, and make decisions based on logical conditions. These functions are widely used in Excel, SQL, Python (Pandas), Tableau, and other tools.
1. What are Aggregation Functions?
Aggregation functions perform summary operations on groups of data. These are used to analyze trends, understand distribution, and extract high-level insights.
Common Aggregation Functions:
- SUM() – Adds all numeric values.
- COUNT() – Counts the number of entries.
- AVERAGE() or MEAN() – Calculates the average value.
- MAX() / MIN() – Finds the highest or lowest value.
- MEDIAN() – Identifies the middle value in a data set.
- STDEV() / VAR() – Measures spread or variability.
Use Cases:
- Summing total sales for each region.
- Counting customers in different segments.
- Finding the average rating of a product.
These are often used with grouping tools like Pivot Tables or the GROUP BY
clause in SQL.
2. What are Logical Functions?
Logical functions help make data-driven decisions based on specific conditions. These are commonly used to filter, classify, or flag records.
Common Logical Functions:
- IF(condition, value_if_true, value_if_false)
Example:IF(Sales > 5000, "High", "Low")
- AND(), OR() – Combine multiple conditions.
-
AND(A > 10, B < 20)
returns TRUE only if both conditions are true. -
OR(A > 10, B < 20)
returns TRUE if any condition is true.
-
- IFERROR() / ISERROR() – Handle errors and replace them with custom messages.
Use Cases:
- Classifying data into categories (e.g., pass/fail, high/low income).
- Flagging data with quality issues.
- Creating conditional columns in Excel, Tableau, or Python.
3. How Aggregations & Logic Work Together in EDA
When performing Exploratory Data Analysis, both aggregations and logical operations are often combined to:
- Identify outliers (e.g., sales above a threshold).
- Segment customers based on spending habits.
- Compare performance across different time periods or groups.
Example:
“Calculate the average purchase value for customers who made more than 5 orders.”
This requires:
- A logical condition (
OrderCount > 5
) - An aggregation (
AVERAGE(PurchaseAmount)
)