A scatter plot is a type of graph that visually represents the relationship between two variables.
Each observation is drawn as a dot whose horizontal position is its value for one variable and whose vertical position is its value for the other.
Scatter plots are commonly used to:
- Explore correlations between variables.
- Identify patterns or trends within data.
They are useful for detecting relationships, clusters, or anomalies in datasets.
Key Features and Uses
- Identify Correlations
  - Scatter plots help reveal linear or non-linear relationships between two variables.
  - If the points cluster around a straight line that rises from left to right, the variables are positively correlated. If the line falls, they are negatively correlated. A curved band of points suggests a non-linear relationship.
- Detect Outliers
  - Outliers stand out easily in scatter plots.
  - Data points far from the main cloud may indicate unusual or exceptional values, such as measurement errors or rare cases worth investigating.
- Visualize Data Distribution
  - Scatter plots display the spread and density of data across two variables.
  - You can quickly identify clusters, gaps, or variations in the distribution.
- Analyze Grouped Data
  - Different colors or symbols can distinguish multiple groups within one graph.
  - This makes it easier to compare groups and observe trends across categories.
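The correlation and outlier checks above can also be done numerically. The sketch below assumes NumPy is available. `pearson_r` wraps `np.corrcoef` to compute Pearson's correlation coefficient, and `flag_outliers` is a hypothetical helper (not a library function) that fits a least-squares line with `np.polyfit` and flags points whose residual z-score exceeds an arbitrary threshold of 2. Pearson's r only measures linear association, so the last example shows a perfect but curved relationship scoring close to zero, which is exactly the kind of pattern a scatter plot reveals and a single number hides.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: +1 perfect rising line, -1 perfect falling line, ~0 no linear trend."""
    return float(np.corrcoef(x, y)[0, 1])

def flag_outliers(x, y, z_thresh=2.0):
    """Hypothetical helper: flag points far from a least-squares straight line.

    A point is flagged when its residual is more than `z_thresh` standard
    deviations from the mean residual. The threshold is an arbitrary choice.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)      # degree-1 least-squares fit
    residuals = y - (slope * x + intercept)
    spread = residuals.std()
    if np.isclose(spread, 0.0):                 # all points (almost) on the line
        return np.zeros_like(x, dtype=bool)
    z = (residuals - residuals.mean()) / spread
    return np.abs(z) > z_thresh

# Linear relationship: r is essentially 1
x = np.arange(1, 11, dtype=float)
print(pearson_r(x, 2 * x))

# Same data with one injected outlier at x = 7 (index 6)
y = 2 * x
y[6] = 30.0
print(np.flatnonzero(flag_outliers(x, y)))      # [6]

# Perfect but curved relationship: Pearson r is close to 0
xs = np.linspace(-5, 5, 11)
print(pearson_r(xs, xs ** 2))
```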
Scatter Plot Code
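```python
import matplotlib.pyplot as plt

# Data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
colors = [80, 40, 10, 50, 60]          # values mapped to colors via the colormap
sizes = [100, 900, 2000, 1000, 1600]   # marker areas in points^2

# c + cmap color each point by its value, s sets marker size, alpha adds transparency
plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='viridis')
plt.colorbar()  # shows how the color values map to colors
plt.title('Scatter Plot with Colors and Sizes')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()
```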
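The code above colors points by a continuous value. To distinguish discrete groups instead, one option is a separate `scatter` call per group with its own marker and `label`, so that `legend()` can list the groups. This is a minimal sketch; the group names and values are made up for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical example data: two groups of (x, y) points
groups = {
    "Group A": ([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]),
    "Group B": ([1, 2, 3, 4, 5], [6, 5, 3, 2, 1]),
}
markers = {"Group A": "o", "Group B": "^"}

fig, ax = plt.subplots()
for name, (gx, gy) in groups.items():
    # One scatter call per group gives each group its own marker and legend entry
    ax.scatter(gx, gy, marker=markers[name], label=name, alpha=0.7)

ax.set_title("Scatter Plot with Grouped Data")
ax.set_xlabel("X Axis")
ax.set_ylabel("Y Axis")
ax.legend()
plt.show()
```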
I haven’t used scatter plots much for analysis in my day-to-day work.
I’ve primarily used them in reports to explain concepts to others. Most of the time, I’ve used them to illustrate directional convergence toward a particular metric rather than for deep analysis.
Common Examples
- Scientific Research
  - Understanding the relationship between temperature and humidity in environmental data.
  - Examining how the dose of a drug affects its therapeutic effect in medical research.
- Economics
  - Analyzing relationships between economic indicators, such as GDP growth and the unemployment rate.
- Marketing
  - Studying how sales volume responds to product price to inform pricing strategy.
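As a sketch of the pricing example, the code below generates synthetic price and sales data (invented for illustration, not real sales figures), overlays a least-squares trend line from `np.polyfit`, and reports Pearson's r in the legend. Fitting a straight line is an assumption here. Real demand curves are often non-linear, so the scatter itself should be checked before trusting the line.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic, illustrative data only: sales fall as price rises, plus random noise
rng = np.random.default_rng(seed=0)
price = np.linspace(5, 50, 40)
sales = 1000 - 15 * price + rng.normal(0, 40, size=price.size)

# Least-squares straight line and Pearson correlation
slope, intercept = np.polyfit(price, sales, 1)
r = np.corrcoef(price, sales)[0, 1]

fig, ax = plt.subplots()
ax.scatter(price, sales, alpha=0.6, label="Observations")
ax.plot(price, slope * price + intercept, color="red",
        label=f"Trend: slope={slope:.1f}, r={r:.2f}")
ax.set_title("Price vs. Sales Volume (synthetic data)")
ax.set_xlabel("Price")
ax.set_ylabel("Sales volume")
ax.legend()
plt.show()
```

A negative slope and an r near -1 would suggest that, within this price range, higher prices go with lower sales; the plot alone does not show that price causes the change.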