In Matplotlib, a box plot visualizes the distribution of a data set by showing its median, quartiles, and outliers at a glance. It is widely used in statistical analysis to compare the distribution and variability of several data sets.
- Median: The 50th percentile of the data (Q2).
- Quartiles:
  - Q1 (1st quartile): The 25th percentile of the data.
  - Q3 (3rd quartile): The 75th percentile of the data.
  - IQR (Interquartile Range): Q3 – Q1.
- Whiskers:
  - By default, each whisker extends to the most extreme data point that still lies within the range from Q1 – 1.5×IQR to Q3 + 1.5×IQR.
- Outliers: Points outside the whisker range, drawn as individual markers.
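The definitions above can be checked by hand with NumPy. The following is a minimal sketch, not Matplotlib's own code: the helper name `box_stats` and the sample values are made up for illustration, and it assumes NumPy's default (linear) percentile interpolation.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Compute the quantities a box plot draws for one data set (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Fences: the furthest a whisker is allowed to reach.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme points inside the fences.
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": outliers,
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

For this sample, Q1 is 3.25 and Q3 is 7.75, so the IQR is 4.5 and the upper fence is 14.5. The value 30 lies beyond it and is reported as an outlier, while the upper whisker ends at 9.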
Basic Usage: plt.boxplot()
The basic function for drawing a box plot is plt.boxplot(). Its most commonly used parameters are:
plt.boxplot(
    x,                   # data: one array, or a list of arrays (one box per array)
    labels=None,         # labels for each box (renamed tick_labels in Matplotlib 3.9)
    notch=False,         # True: notch the box to show the median's confidence interval
    vert=True,           # True: vertical boxes; False: horizontal boxes
    patch_artist=False,  # True: draw boxes as filled patches so they can be colored
    showmeans=False,     # True: mark the mean of each data set
    meanline=False,      # True: draw the mean as a line (requires showmeans=True)
    showfliers=True,     # False: hide outliers
    whis=1.5             # whisker reach as a multiple of the IQR (default: 1.5)
)
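As a rough sketch of how several of these parameters combine, the example below draws notched boxes with mean lines and inspects the dictionary of artists that boxplot() returns. The data, the random seed, and the tick labels are invented for illustration. Labels are set through the axes rather than through labels so that the snippet does not depend on the Matplotlib version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Three illustrative samples with increasing spread.
groups = [rng.normal(loc=0.0, scale=s, size=200) for s in (1, 2, 3)]

fig, ax = plt.subplots()
result = ax.boxplot(
    groups,
    notch=True,      # notch around the median
    showmeans=True,  # compute and draw the mean ...
    meanline=True,   # ... as a line across the box
    whis=1.5,        # default whisker reach, written out for clarity
)

# Boxes are placed at x = 1, 2, 3 by default, so the ticks can be labeled directly.
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["std=1", "std=2", "std=3"])

# boxplot() returns a dict mapping each component name to a list of artists.
for name, artists in result.items():
    print(name, len(artists))
```

Each box contributes one entry to boxes, medians, and means, and two entries to whiskers and caps, which makes it possible to restyle individual components later.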
Box Plot Code
import matplotlib.pyplot as plt
import numpy as np
# Fix the seed so the figure is reproducible
np.random.seed(10)
# Four samples of 100 points each, with standard deviations 1 to 4
data = [np.random.normal(0, std, 100) for std in range(1, 5)]
# Optional arguments worth trying:
#   notch=True        : adds a notch showing the confidence interval of the median.
#   vert=False        : draws the boxes horizontally.
#   patch_artist=True : draws filled boxes that can be colored.
plt.boxplot(data, vert=True, patch_artist=True)
plt.xlabel('Group (standard deviation 1 to 4)')
plt.ylabel('Value')
plt.title('Box Plot Example')
plt.show()
Benefits
- Compare distributions: Compare distributions across multiple data sets at a glance.
- Identify outliers: Easily identify outliers.
- Simplicity: Summarize complex data in a concise way.
Cons
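- Lack of detail: A box plot hides the shape of the distribution (for example, multiple peaks), which histograms and violin plots show.
- Potential for misinterpretation: The line inside the box is the median, not the mean. In skewed data the two differ, so reading the line as the mean leads to wrong conclusions.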
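The gap between median and mean is easy to see with skewed data. The sketch below uses an exponential sample, chosen here only as an example of a right-skewed distribution, and turns on showmeans so both statistics appear on the same box.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Right-skewed illustrative sample: a long tail pulls the mean above the median.
skewed = rng.exponential(scale=2.0, size=500)

mean_value = skewed.mean()
median_value = np.median(skewed)
print(f"mean={mean_value:.2f}, median={median_value:.2f}")

fig, ax = plt.subplots()
# The line in the box is the median; showmeans=True adds a separate marker for the mean.
ax.boxplot(skewed, showmeans=True)
ax.set_title("Median (line) vs. mean (marker) on skewed data")
```

Displaying the mean explicitly is one way to keep readers from mistaking the median line for it.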
Box Plot vs Violin Plot
Characteristic | Box Plot | Violin Plot |
---|---|---|
Information shown | Quartiles, median, and outliers | Density and quantiles of the distribution |
Visual complexity | Simple | Complex |
Use cases | Quick distribution comparison | Detailed distribution analysis |
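To illustrate the trade-off in the table, the following sketch draws the same data as a box plot and as a violin plot side by side. The bimodal sample is an assumption made for the demonstration: its two peaks are visible in the violin but not in the box.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
# Illustrative bimodal sample: two well-separated clusters.
bimodal = np.concatenate([
    rng.normal(-2.0, 0.5, 300),
    rng.normal(2.0, 0.5, 300),
])

fig, (ax_box, ax_violin) = plt.subplots(1, 2, sharey=True, figsize=(8, 4))

# Box plot: summarizes quartiles and median, hides the two peaks.
ax_box.boxplot(bimodal)
ax_box.set_title("Box plot")

# Violin plot: the density outline reveals both peaks.
violin = ax_violin.violinplot(bimodal, showmedians=True)
ax_violin.set_title("Violin plot")

fig.tight_layout()
```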
Box Plot Summary
- Effective statistical visualization: Box plots present key statistical information in a clear, structured way.
- Intuitive visualization: Box plots clearly show the distribution and outliers in your data.
- Functionality: The plt.boxplot() function creates box plots and supports customization with parameters like notch, patch_artist, and whis.
- Customization: You can enhance readability by adjusting colors, line styles, and labels.
- Comparison & analysis: Box plots are useful for comparing data across different groups and displaying statistical summaries.
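As one possible way to apply the customization points above, the sketch below fills each box with its own color and emphasizes the median lines. The colors, labels, and data are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
groups = [rng.normal(0.0, s, 100) for s in (1, 2, 3)]  # illustrative data

fig, ax = plt.subplots()
# patch_artist=True makes each box a filled patch that accepts a face color.
box = ax.boxplot(groups, patch_artist=True)

colors = ["tab:blue", "tab:orange", "tab:green"]
for patch, color in zip(box["boxes"], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.6)

# Make the median lines stand out against the filled boxes.
for median in box["medians"]:
    median.set_color("black")
    median.set_linewidth(2)

ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["Group A", "Group B", "Group C"])
ax.set_ylabel("Value")
```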