Learn how to compute sums, means, and medians efficiently with NumPy's statistical functions.
1. Introduction to NumPy & Statistical Functions
NumPy is a Python library for array computation that provides fast statistical functions. It includes measures of central tendency such as the mean and median (NumPy itself has no dedicated mode function), and this guide focuses on the sum, mean, and median.
2. Sum (sum)
2.1 1D Array Sum
import numpy as np
arr = np.array([1,2,3,4,5])
total_sum = np.sum(arr)
print(total_sum) # Output: 15
np.sum() returns the total of all elements.
2.2 2D Array Sum by Axis
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(np.sum(arr2d, axis=0)) # Column sums: [12, 15, 18]
print(np.sum(arr2d, axis=1)) # Row sums: [6, 15, 24]
- axis=0 sums down each column
- axis=1 sums across each row
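One way to remember the axis rule is that the axis you pass is the one that gets collapsed. The following is a minimal sketch using the same 3x3 array; the variable names are illustrative, and `keepdims=True` is shown only as one convenient option for keeping the result two-dimensional.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 collapses the rows, leaving one total per column.
col_sums = np.sum(arr2d, axis=0)

# axis=1 collapses the columns, leaving one total per row.
row_sums = np.sum(arr2d, axis=1)

# keepdims=True keeps the collapsed axis with length 1,
# which makes it easy to broadcast the totals back against arr2d.
row_sums_2d = np.sum(arr2d, axis=1, keepdims=True)

print(col_sums.shape)     # (3,)
print(row_sums_2d.shape)  # (3, 1)

# Divide each row by its own total so every row sums to 1.
row_fractions = arr2d / row_sums_2d
print(row_fractions)
```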
3. Mean (mean)
3.1 1D Array Mean
mean_value = np.mean(arr)
print(mean_value) # Output: 3.0
np.mean() calculates the arithmetic mean.
3.2 2D Array Mean by Axis
print(np.mean(arr2d, axis=0)) # Column means: [4., 5., 6.]
print(np.mean(arr2d, axis=1)) # Row means: [2., 5., 8.]
3.3 Mean with Missing Values (nan)
arr_nan = np.array([1,2,np.nan,4])
print(np.mean(arr_nan)) # Output: nan
print(np.nanmean(arr_nan)) # Ignores nan, outputs 2.333...
- np.mean() returns nan if any nan is present
- Use np.nanmean() to ignore nan
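The same nan-ignoring pattern is available for other reductions. As a short sketch (variable names are illustrative), it can help to count the missing values first and then use the nan-aware counterparts np.nansum() and np.nanmedian() alongside np.nanmean():

```python
import numpy as np

arr_nan = np.array([1, 2, np.nan, 4])

# Count the missing values before deciding how to handle them.
n_missing = int(np.isnan(arr_nan).sum())

# nan-aware reductions skip nan entries instead of propagating them.
total = np.nansum(arr_nan)      # 1 + 2 + 4
average = np.nanmean(arr_nan)   # 7 / 3
middle = np.nanmedian(arr_nan)  # median of [1, 2, 4]

print(n_missing, total, average, middle)
```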
4. Median (median)
4.1 What Median Means
The median is the middle value of the sorted data. When the number of elements is even, it is the average of the two middle values.
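To make the odd and even cases concrete, here is a small sketch with made-up values. The manual calculation at the end is only there to show what np.median() does for an even-length array; it is not how NumPy implements it internally.

```python
import numpy as np

odd = np.array([7, 1, 5])       # sorted: [1, 5, 7]
even = np.array([7, 1, 5, 3])   # sorted: [1, 3, 5, 7]

print(np.median(odd))   # 5.0
print(np.median(even))  # 4.0, the average of 3 and 5

# The same result computed by hand for the even-length case.
s = np.sort(even)
mid = s.size // 2
manual = (s[mid - 1] + s[mid]) / 2
print(manual)  # 4.0
```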
4.2 Median in 1D and 2D Arrays
print(np.median(arr)) # 3.0
print(np.median(arr2d, axis=0)) # [4., 5., 6.]
print(np.median(arr2d, axis=1)) # [2., 5., 8.]
np.median() also accepts an axis parameter.
5. Extra Tips: Range, Variance, Standard Deviation
Useful for understanding data spread:
- Range: np.ptp() gives the max-minus-min value
- Variance: np.var() measures data dispersion
- Standard Deviation: np.std() (and np.nanstd() if nan is present)
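A brief sketch of these spread measures on the 1D array used earlier. Note that np.var() and np.std() divide by n by default (population statistics); passing ddof=1 is shown here as the usual way to get the sample version, in case that is what your analysis needs.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

value_range = np.ptp(arr)  # max - min = 5 - 1
variance = np.var(arr)     # population variance (ddof=0 by default)
std_dev = np.std(arr)      # square root of the variance

# ddof=1 divides by n - 1 instead of n, giving the sample variance.
sample_variance = np.var(arr, ddof=1)

print(value_range, variance, sample_variance)  # 4 2.0 2.5

# With missing values, the nan-aware variant skips the nan entries.
arr_nan = np.array([1, 2, np.nan, 4])
print(np.nanstd(arr_nan))
```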
6. Code Examples Summary
import numpy as np
arr = np.array([1,2,3,4,5])
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
# Sum
sum1 = np.sum(arr)
sum_col = np.sum(arr2d, axis=0)
sum_row = np.sum(arr2d, axis=1)
# Mean
mean1 = np.mean(arr)
mean_col = np.mean(arr2d, axis=0)
mean_row = np.mean(arr2d, axis=1)
# Median
median1 = np.median(arr)
median_col = np.median(arr2d, axis=0)
median_row = np.median(arr2d, axis=1)
print(sum1, sum_col, sum_row)
print(mean1, mean_col, mean_row)
print(median1, median_col, median_row)
Expected Output:
15 [12 15 18] [ 6 15 24]
3.0 [4. 5. 6.] [2. 5. 8.]
3.0 [4. 5. 6.] [2. 5. 8.]
7. Handling Errors in Statistical Functions
np.sum() Common Errors
- TypeError (for example, "cannot perform reduce with flexible type"; the exact message depends on the NumPy version)
Caused by mixing strings or None with numbers (e.g., [1, 2, 'a']).
Fix: Clean the data and make sure the array is numeric only.
- ValueError: operands could not be broadcast together
Raised when arrays with incompatible shapes are combined (for example, added together) before summing.
Fix: Check array shapes via .shape.
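The two fixes above can be sketched as follows. `numeric_only` is a hypothetical helper written for this example, not a NumPy function, and the strict shape-equality check is a deliberately conservative assumption: some unequal shapes are still broadcast-compatible.

```python
import numpy as np

def numeric_only(values):
    """Hypothetical helper: keep only int/float entries (bools excluded)."""
    kept = [v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return np.array(kept, dtype=float)

# Strings and None are dropped before the array is built.
mixed = [1, 2, "a", None, 3.5]
clean = numeric_only(mixed)
print(np.sum(clean))  # 6.5

# Compare shapes before combining arrays elementwise.
a = np.ones((3, 2))
b = np.ones((2, 3))
if a.shape == b.shape:
    print(np.sum(a + b))
else:
    print("shape mismatch:", a.shape, b.shape)
```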
np.mean() Common Issues
- RuntimeWarning: Mean of empty slice
Raised when calling np.mean([]); the result is nan.
Fix: Check arr.size > 0 before computing.
- nan result due to missing values
Fix: Use np.nanmean() to ignore nan.
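Both checks can be combined in one place. The function below is a hypothetical helper for illustration (the name `safe_mean` and the choice to return None when no data is available are assumptions, not NumPy behavior):

```python
import numpy as np

def safe_mean(values):
    """Hypothetical helper: nan-ignoring mean, or None if there is no data."""
    arr = np.asarray(values, dtype=float)
    valid = arr[~np.isnan(arr)]  # drop missing values
    if valid.size == 0:          # avoids "Mean of empty slice"
        return None
    return float(valid.mean())

print(safe_mean([]))                 # None
print(safe_mean([np.nan]))           # None
print(safe_mean([1, 2, np.nan, 4]))  # mean of 1, 2, 4
```

Returning None forces the caller to handle the empty case explicitly; returning nan instead would be an equally reasonable design if downstream code already expects it.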
np.median() Common Errors
- TypeError: '<' not supported between instances
Caused by mixing numbers and strings (e.g., [1, 2, '3']), typically when the array has object dtype.
Fix: Filter out non-numeric elements.
- ValueError: cannot convert float NaN to integer
Occurs when nan is assigned into an integer array, since integer dtypes cannot store nan.
Fix: Use dtype=float, and np.nanmedian() to skip the nan values.
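The integer/nan case is easy to reproduce. In this sketch (array contents are made up), assigning nan to an integer array fails, while converting to float first makes room for the missing value:

```python
import numpy as np

int_arr = np.array([1, 2, 3, 4])

# Integer dtypes cannot represent nan, so this assignment fails.
try:
    int_arr[0] = np.nan
except ValueError as exc:
    print("ValueError:", exc)

# Converting to float first makes room for nan.
float_arr = int_arr.astype(float)
float_arr[0] = np.nan

print(np.median(float_arr))     # nan
print(np.nanmedian(float_arr))  # 3.0, the median of [2, 3, 4]
```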
Summary Table
Function | Common Errors | Fix Summary |
---|---|---|
np.sum | Mixed types, broadcasting errors | Filter numeric, check shapes |
np.mean | Empty array, nan presence | Check size, use nanmean |
np.median | String mix, nan in integer arrays | Filter types, use nanmedian |
8. Conclusion
NumPy’s sum, mean, and median make it easy to compute statistics on 1D and multi-dimensional arrays. With additional tools like nanmean and nanstd, plus data cleaning, you’re set for real-world data analysis tasks.