A histogram (hist plot; see the matplotlib documentation) is used to visualize the distribution of data. It divides continuous data into bins (intervals) and draws a bar for each bin whose height is the number of data points (frequency) that fall into it. It is commonly used in statistical analysis to check the shape of a distribution, detect outliers, and more.
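The binning step can be illustrated without any plotting library. The following is a minimal pure-Python sketch, not how matplotlib is implemented internally. The function name `bin_counts` is hypothetical, and the sketch assumes equal-width bins spanning the data's minimum to maximum, with each bin half-open except the last one, which also includes the maximum (the convention NumPy's `histogram` uses).

```python
def bin_counts(values, num_bins):
    """Split [min, max] into equal-width bins and count the values in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    # num_bins + 1 edges delimit num_bins bins
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for v in values:
        # Index of the bin containing v; guard against all-equal data (width 0)
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum value would get index num_bins, so put it in the last bin
        counts[min(idx, num_bins - 1)] += 1
    return edges, counts


data = [0, 1, 1, 2, 3, 3, 3, 4, 8]
edges, counts = bin_counts(data, 4)
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0]
print(counts)  # [3, 4, 1, 1]
```

Plotting a histogram is then just drawing one bar per bin, from one edge to the next, with the count as its height.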

Basic usage

Use the matplotlib.pyplot.hist() function. A typical call with its main parameters looks like this:

plt.hist(data, bins=30, color="skyblue", edgecolor="black", alpha=0.6)
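Besides drawing the bars, plt.hist() also returns what it computed: the count in each bin, the bin edges, and the drawn bar objects. The sketch below reuses the call above on a small synthetic dataset (the 500 normal samples and the seed are arbitrary choices for illustration).

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # reproducible random numbers
data = rng.normal(size=500)      # 500 samples from a standard normal distribution

# plt.hist returns the bin counts, the bin edges and the container of bars
n, bins, patches = plt.hist(data, bins=30, color="skyblue",
                            edgecolor="black", alpha=0.6)

print(len(n), len(bins))   # 30 31  (one more edge than there are bins)
print(int(n.sum()))        # 500    (every sample lands in some bin)
plt.show()
```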

Histogram example code

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)             # fix the seed so the output is reproducible
data = np.random.randn(1000)   # 1,000 samples from the standard normal distribution

# data: the values to put into the histogram
# bins=30: number of bins (bars)
# alpha=0.75: transparency of the bars (0 = fully transparent, 1 = opaque)
# edgecolor='black': border color of the bars
plt.hist(data, bins=30, alpha=0.75, edgecolor='black')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram Example')
plt.grid(True)  # Add grid for better visualization
plt.show()

Notes

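  • Choice of bins: Too few bins hide the shape of the distribution, while too many make it noisy because each bin holds only a few points. Besides an integer, bins also accepts explicit bin edges or a strategy name such as "auto".
  • Normalization: Setting density=True turns the histogram into an estimate of the probability density. Each bar's height becomes count / (total count × bin width), so the total area of the bars equals 1.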
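Both notes can be checked numerically. This sketch assumes simulated, roughly normal "height" data (the mean of 170 and spread of 8 are made up), draws it once with bins="auto" and once with density=True, and verifies that the density bars have a total area of 1.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=170, scale=8, size=1000)   # simulated heights in cm

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: raw counts, letting NumPy's "auto" rule choose the number of bins
counts, edges, _ = ax1.hist(data, bins="auto", edgecolor="black")
ax1.set_title(f"Counts ({len(counts)} bins, auto)")

# Right: density=True rescales the bar heights so the total bar area is 1
dens, dedges, _ = ax2.hist(data, bins=30, density=True, edgecolor="black")
ax2.set_title("density=True")

area = np.sum(dens * np.diff(dedges))   # height x width, summed over all bins
print(round(area, 6))                   # 1.0
plt.show()
```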

Use cases

  • Analyze data distributions: check whether the data look normal or uniform, and whether they are skewed or heavy-tailed (skewness/kurtosis).
  • Outlier detection: look for isolated bars far out in the tails.
  • Multi-data comparison: overlay the histograms of several datasets, with some transparency, to compare their distributions.
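For the multi-data comparison case, one approach, sketched below with two synthetic groups (the group names and distribution parameters are arbitrary), is to compute a single set of bin edges from both datasets with np.histogram_bin_edges and pass it to each plt.hist call, using alpha so the overlapping bars stay visible.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
group_a = rng.normal(loc=0.0, scale=1.0, size=1000)   # hypothetical dataset A
group_b = rng.normal(loc=1.5, scale=0.7, size=1000)   # hypothetical dataset B

# One shared set of 40 bins covering both datasets, so the bars line up
edges = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=40)

counts_a, _, _ = plt.hist(group_a, bins=edges, alpha=0.5,
                          label="group A", edgecolor="black")
counts_b, _, _ = plt.hist(group_b, bins=edges, alpha=0.5,
                          label="group B", edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend()
plt.show()
```

Sharing the edges is the design choice that matters here: if each call picked its own bins, the bars would not line up and the counts would not be directly comparable.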

Histograms and bar charts may look similar, but they serve different purposes and work with different types of data. Their interpretation also differs.

To be honest, I used them interchangeably for a while without realizing the differences, so here’s a quick refresher.

Aspect | Histograms | Bar graphs
--- | --- | ---
Data type | Continuous data (e.g., age, temperature, time) | Categorical data (e.g., fruit type, region, gender)
Method | Divide the data into bins and calculate the frequency of each bin | Compare the values in each category directly
Purpose | Determine the distribution of the data (e.g., normal distribution, skewed, outliers) | Compare the difference in values between categories (e.g., sales volume, poll results)
Example | “Distribution of height data for 100 people”: x-axis = 150-160 cm, 160-170 cm, … (bins); y-axis = number of people in each bin | “Compare sales by fruit”: x-axis = apples, bananas, oranges (categories); y-axis = sales of each fruit
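The two examples from the comparison can be drawn side by side. This is a sketch with assumed data: the heights are simulated and the fruit sales figures are made up for illustration. Note that hist() receives raw values and does the counting itself, while bar() receives values that are already computed per category.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
heights = rng.normal(loc=165, scale=7, size=100)   # simulated heights of 100 people (cm)

fruits = ["Apples", "Bananas", "Oranges"]           # categories
sales = [120, 95, 70]                               # made-up sales figures

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: continuous values are grouped into 10 cm bins and counted
height_bins = np.arange(140, 200, 10)               # edges 140, 150, ..., 190 -> 5 bins
counts, _, _ = ax1.hist(heights, bins=height_bins, edgecolor="black")
ax1.set_xlabel("Height (cm)")
ax1.set_ylabel("Number of people")
ax1.set_title("Histogram")

# Bar chart: one bar per category, its height is the value itself
ax2.bar(fruits, sales, color="orange", edgecolor="black")
ax2.set_xlabel("Fruit")
ax2.set_ylabel("Sales")
ax2.set_title("Bar chart")

plt.tight_layout()
plt.show()
```

With explicit bin edges, values that fall outside the first and last edge are not counted, so pick edges that cover your data.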

It’s important not to confuse the two: use a histogram to show the distribution of continuous data, and a bar graph to compare values across categories.

By Mark

