1001Ferramentas

Outlier Detection (IQR Rule)

Detect outliers with the IQR rule: a value is flagged if it lies below Q1 − 1.5·IQR or above Q3 + 1.5·IQR.


  

IQR and outlier detection

The interquartile range measures the spread of the central half of the data: IQR = Q3 − Q1. Tukey's fences (1977) flag outliers without assuming normality. Mild outliers fall below Q1 − 1.5·IQR or above Q3 + 1.5·IQR; extreme outliers fall below Q1 − 3·IQR or above Q3 + 3·IQR. Example: 10, 12, 14, 15, 16, 18, 100 → Q1 = 12, Q3 = 18, IQR = 6, lower fence = 3, upper fence = 27 → 100 is flagged. Here the quartiles are the medians of the lower and upper halves with the overall median (15) excluded; other quartile conventions, such as linear interpolation, can give slightly different fences. The boxplot draws a box from Q1 to Q3 with the median inside, "whiskers" out to the last non-outlier value on each side, and isolated points beyond. The rule is more robust than the σ rule because Q1 and Q3 are not pulled by extreme values.
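The worked example above can be reproduced with a minimal Python sketch. It assumes the "exclusive" quartile method of the standard library's statistics.quantiles, which matches the Q1 = 12 and Q3 = 18 quoted in the text; the helper name tukey_fences is hypothetical and is not part of this calculator.

```python
from statistics import quantiles

def tukey_fences(data, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR).

    Hypothetical helper; quartiles use the 'exclusive' method,
    an assumption chosen to match the example in the text.
    """
    q1, _median, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [10, 12, 14, 15, 16, 18, 100]

lower, upper = tukey_fences(data)                 # mild fences, k = 1.5
ext_lower, ext_upper = tukey_fences(data, k=3.0)  # extreme fences, k = 3

# Values outside the 1.5*IQR fences (mild or extreme outliers).
flagged = [x for x in data if x < lower or x > upper]
# Values outside the 3*IQR fences (extreme outliers only).
extreme = [x for x in data if x < ext_lower or x > ext_upper]

print(lower, upper)      # 3.0 27.0
print(flagged, extreme)  # [100] [100]
```

With a different quartile method (for example method="inclusive", or the linear interpolation that many numeric libraries use by default) the fences shift, so results on small samples may not match this example exactly.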

Applications

Typical uses include data cleaning in ML pipelines (sklearn, pandas), salary analysis (isolating atypical cases), stock markets (rare events, fat tails), industrial quality control, banking fraud (atypical transactions), and epidemiology (extreme cases).

FAQ

IQR or standard deviation: which to choose? IQR is robust: it depends on neither the mean nor σ, so the outliers themselves do not inflate it. σ is preferable on clean, approximately normal data.
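A small comparison can illustrate the difference. The sketch below assumes the "σ rule" means flagging values more than 3 sample standard deviations from the mean (the threshold of 3 is an assumption, not stated on this page) and reuses the example data set 10, 12, 14, 15, 16, 18, 100.

```python
from statistics import mean, stdev, quantiles

data = [10, 12, 14, 15, 16, 18, 100]

# Sigma rule (assumed threshold: 3 sample standard deviations).
mu, sigma = mean(data), stdev(data)
sigma_flagged = [x for x in data if abs(x - mu) > 3 * sigma]

# IQR rule with 1.5*IQR fences; 'exclusive' quartiles are an assumption.
q1, _median, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
iqr_flagged = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(sigma_flagged)  # []
print(iqr_flagged)    # [100]
```

The value 100 raises both the mean and the standard deviation so much that it sits well within 3σ of the mean, so the σ rule misses it, while the IQR fences still flag it.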

Why 1.5·IQR? It is an empirical choice by Tukey that balances false positives against false negatives. Under a normal distribution, the fences sit at about ±2.7σ and cover about 99.3% of the data, so anything beyond them is suspicious.
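The 99.3% figure can be checked numerically. The sketch below uses statistics.NormalDist from the Python standard library and assumes a standard normal distribution (mean 0, σ = 1), so the fences come out directly in units of σ.

```python
from statistics import NormalDist

z = NormalDist()          # standard normal: mean 0, sigma 1

q3 = z.inv_cdf(0.75)      # third quartile, about 0.674
q1 = -q3                  # first quartile, by symmetry
iqr = q3 - q1

upper = q3 + 1.5 * iqr    # upper fence in units of sigma
lower = q1 - 1.5 * iqr    # lower fence, equal to -upper

# Probability mass lying between the two fences.
coverage = z.cdf(upper) - z.cdf(lower)

print(round(upper, 3), round(coverage, 3))  # 2.698 0.993
```

Roughly 0.7% of normally distributed values therefore fall outside the fences by chance alone, which is worth keeping in mind on large data sets.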

Should I always remove flagged outliers? No. Investigate first: the value may be a typo, a measurement error, or a real signal (fraud, rare event). Removing outliers without investigating them destroys information.
