In Matplotlib, a box plot visualizes the distribution of a dataset by showing its median, quartiles, and outliers at a glance. It is widely used in statistical analysis to compare the distribution and variability of several groups.

  • Median: The 50th percentile of the data (Q2).
  • Quartiles:
    • Q1 (1st quartile): The 25th percentile of the data.
    • Q3 (3rd quartile): The 75th percentile of the data.
  • IQR (Interquartile Range): Q3 – Q1.
  • Whiskers:
    • Each whisker extends from the box to the most extreme data point that still lies within Q1 – 1.5×IQR (lower) or Q3 + 1.5×IQR (upper).
    • Values beyond these limits are drawn individually as outliers.
  • Outliers: Points outside the whisker range (Matplotlib calls them fliers).
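The definitions above can be checked by hand with NumPy. This is a minimal sketch on a small made-up sample (the values and variable names are illustrative, not from any real dataset); it uses np.percentile with its default linear interpolation.

```python
import numpy as np

# Small hypothetical sample; 30 is deliberately far from the rest
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences: the furthest a whisker is allowed to reach
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything beyond the fences is an outlier
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print(f"whiskers: [{whisker_low}, {whisker_high}]")
print(f"outliers: {outliers}")
```

Here the upper whisker stops at 9, the largest value inside the fence, rather than at the fence itself (14.5), and 30 is reported as the only outlier.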

Basic Usage: plt.boxplot()

The basic function for drawing a box plot is plt.boxplot().

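plt.boxplot(
    x,                  # data (an array, or a list of arrays for several boxes)
    labels=None,        # labels for each box (renamed to tick_labels in Matplotlib 3.9)
    notch=False,        # True: draw a notch showing the confidence interval around the median
    vert=True,          # True: vertical boxes, False: horizontal boxes
    patch_artist=False, # True: draw boxes as patches so they can be color-filled
    showmeans=False,    # True: mark the mean of each box
    meanline=False,     # True: draw the mean as a line instead of a marker (requires showmeans=True)
    showfliers=True,    # True: show outliers, False: hide them
    whis=1.5            # whisker reach as a multiple of the IQR (default: 1.5)
)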
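As a rough illustration of how several of these parameters combine, the sketch below draws two boxes with notches and mean lines, then inspects the dictionary that plt.boxplot() returns. The sample data and the names samples and result are made up for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: two groups with different center and spread
rng = np.random.default_rng(0)
samples = [rng.normal(0, 1, 200), rng.normal(1, 2, 200)]

result = plt.boxplot(
    samples,
    notch=True,      # notch around each median
    showmeans=True,  # also draw the mean...
    meanline=True,   # ...as a line rather than a marker
    whis=1.5,        # whiskers reach at most 1.5 x IQR from the box
)

# The return value maps component names to lists of artists:
# one entry per box for boxes, medians, and means; two per box for whiskers and caps
for name, artists in result.items():
    print(f"{name}: {len(artists)}")

plt.title('Notched Box Plot with Mean Lines')
plt.show()
```

Keeping the returned dictionary around is what makes later styling possible, since each box, whisker, and median line can be modified individually.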

Box Plot Code

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)
data = [np.random.normal(0, std, 100) for std in range(1, 5)]

# Options worth trying:
#   notch=True        : adds a notch showing the confidence interval of the median.
#   vert=False        : draws the boxes horizontally.
#   patch_artist=True : draws the boxes as filled patches.
plt.boxplot(data, vert=True, patch_artist=True)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Box Plot Example')
plt.show()
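Building on the example above, the following sketch shows one possible way to give each box its own fill color and to label the groups. The color names and the std=... tick labels are arbitrary choices for illustration; the labels are set with plt.xticks(), relying on the boxes being placed at x = 1..n by default.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)
data = [np.random.normal(0, std, 100) for std in range(1, 5)]

# patch_artist=True is required for set_facecolor() to work on the boxes
bp = plt.boxplot(data, patch_artist=True)

colors = ['lightblue', 'lightgreen', 'lightsalmon', 'plum']  # arbitrary choices
for box, color in zip(bp['boxes'], colors):
    box.set_facecolor(color)

# Make the median lines stand out against the fill
for median_line in bp['medians']:
    median_line.set_color('black')
    median_line.set_linewidth(2)

# Boxes sit at x = 1..n by default, so label those positions
plt.xticks(range(1, len(data) + 1), [f'std={s}' for s in range(1, 5)])
plt.ylabel('Value')
plt.title('Box Plot with Custom Colors')
plt.show()
```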

Benefits

  • Compare distributions: Several datasets can be placed side by side and compared at a glance.
  • Identify outliers: Points beyond the whiskers are drawn individually, so outliers stand out.
  • Simplicity: A large dataset is summarized by a handful of statistics.

Cons

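  • Lack of detail: A box plot shows less of the distribution's shape than a histogram or violin plot; for example, it cannot reveal multiple peaks.
  • Potential for misinterpretation: The line inside the box is the median, not the mean. When the data is skewed the two differ, and reading the line as the mean leads to wrong conclusions.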
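The gap between median and mean is easy to see on skewed data. As a small sketch, the code below draws a right-skewed sample from a log-normal distribution (an assumption made purely for illustration) and plots it with showmeans=True, so the mean marker appears alongside the median line.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical right-skewed sample
rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# The long right tail pulls the mean above the median
print(f"median = {np.median(skewed):.2f}, mean = {np.mean(skewed):.2f}")

# showmeans=True adds a marker for the mean next to the median line
plt.boxplot(skewed, showmeans=True)
plt.ylabel('Value')
plt.title('Median (line) vs. Mean (marker) on Skewed Data')
plt.show()
```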

Box Plot vs Violin Plot

Characteristic       Box Plot                            Violin Plot
Information shown    Quantiles, medians, and outliers    Density and quantiles of distributions
Visual complexity    Simple                              Complex
Use cases            Quick distribution comparison       Detailed distribution analysis
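To make the comparison concrete, here is a minimal sketch that draws the same data both ways. The bimodal sample is made up for the example: two clusters around -2 and 2. The box plot summarizes it as one wide box, while the violin plot shows the two peaks.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical bimodal sample: two clusters centered at -2 and 2
rng = np.random.default_rng(1)
bimodal = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 300)])

fig, (ax_box, ax_violin) = plt.subplots(1, 2, sharey=True, figsize=(8, 4))

# Box plot: quartiles and whiskers only, the two peaks are hidden
ax_box.boxplot(bimodal)
ax_box.set_title('Box Plot')
ax_box.set_ylabel('Value')

# Violin plot: the density estimate reveals both peaks
parts = ax_violin.violinplot(bimodal, showmedians=True)
ax_violin.set_title('Violin Plot')

plt.show()
```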

Organizing Box Plot Information

  • Effective statistical visualization: Box plots present key summary statistics in a clear and structured way.
  • Intuitive visualization: Box plots clearly show the distribution and outliers in your data.
  • Functionality: The plt.boxplot() function creates box plots and supports customization with parameters like notch, patch_artist, and whis.
  • Customization: You can enhance readability by adjusting colors, line styles, and labels.
  • Comparison & analysis: Box plots are useful for comparing data across different groups and displaying statistical summaries.

By Mark
