Series and DataFrame are the two core data structures in Pandas. Before looking at them, let’s start with a quick overview of Pandas itself.
The Birth and Evolution of Pandas
Pandas was created in 2008 by Wes McKinney as a Python library for financial data analysis. At the time, Python lacked efficient tools for data manipulation and analysis, so he borrowed the data frame concept from R to design a structure optimized for tabular data.
Since then, Pandas has evolved into a widely used open-source project, growing with contributions from the community. Today, it is an essential library for data analytics.
Built on NumPy, Pandas supports high-performance computations and reads and writes a wide range of data formats, including Excel, CSV, JSON, and SQL databases. This breadth has made it the standard library for data analysis in Python.
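As a minimal sketch of the format handling described above, the snippet below reads a small CSV string and round-trips it through JSON. The column names and values are made up for illustration, and `io.StringIO` stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO lets read_csv treat a string like a file.
csv_text = "name,score\nAlice,90\nBob,85\n"
df = pd.read_csv(io.StringIO(csv_text))

# Serialize to JSON (one object per row), then read it back into a DataFrame.
json_text = df.to_json(orient="records")
round_trip = pd.read_json(io.StringIO(json_text))

print(df)
print(json_text)
print(round_trip)
```

The same pattern applies to files: pass a path instead of the `StringIO` object.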
Series
- A one-dimensional labeled array in which every value is paired with an index label.
- It is primarily used to represent data in a single column.
Series Code
import pandas as pd

# Series: a one-dimensional labeled array
s = pd.Series([1, 3, 5, 7, 9])
print(s)
Output
0    1
1    3
2    5
3    7
4    9
dtype: int64
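The example above relies on the default integer index. To make the index/value pairing more visible, here is a small sketch with explicit labels; the labels "a", "b", and "c" are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical labels; any hashable values can serve as index labels.
s = pd.Series([1, 3, 5], index=["a", "b", "c"])

print(s.index.tolist())  # the index labels
print(s.to_numpy())      # the underlying values as a NumPy array
print(s["b"])            # look up a value by its label
print(s.iloc[1])         # look up the same value by its position
```

Label-based and position-based access return the same element here because "b" is the second label.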
DataFrame
- A two-dimensional table structure that organizes multiple Series into columns.
- It has both row labels (Index) and column labels (Columns), which make the data easy to access and manipulate.
DataFrame Code 1
# DataFrame: a two-dimensional data structure
df = pd.DataFrame(data=[1, 3, 5, 7, 9], index=range(0,5), columns=['A'])
print(df)
Output 1
   A
0  1
1  3
2  5
3  7
4  9
DataFrame Code 2
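# Build a daily DatetimeIndex to use as row labels
dates = pd.date_range("20130101", periods=6)
# dates now holds:
# DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
#                '2013-01-05', '2013-01-06'],
#               dtype='datetime64[ns]', freq='D')

# Rows passed as empty lists show up as NaN in the output
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [], [], [], []], index=dates, columns=list("ABCD"))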
print(df)
Output 2
              A    B    C    D
2013-01-01  1.0  2.0  3.0  4.0
2013-01-02  5.0  6.0  7.0  8.0
2013-01-03  NaN  NaN  NaN  NaN
2013-01-04  NaN  NaN  NaN  NaN
2013-01-05  NaN  NaN  NaN  NaN
2013-01-06  NaN  NaN  NaN  NaN
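Since a DataFrame organizes multiple Series into columns, one way to see the relationship is to build a DataFrame from Series and then pull a column back out. The following is a sketch with made-up names and numbers, not data from the examples above.

```python
import pandas as pd

# Two hypothetical Series that share the same index labels.
height = pd.Series([170, 165, 180], index=["kim", "lee", "park"])
weight = pd.Series([65, 58, 77], index=["kim", "lee", "park"])

# Each dict key becomes a column label; the shared index becomes the row labels.
df = pd.DataFrame({"height": height, "weight": weight})

# Selecting one column gives back a Series.
col = df["height"]

print(df)
print(type(col).__name__)
print(df.index.tolist())
print(df.columns.tolist())
```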
Series and DataFrame
Feature | Series | DataFrame |
---|---|---|
Dimensions | One-dimensional | Two-dimensional |
Structure | Index + a single column of data | Index + multiple columns of data |
Usage examples | Temperatures recorded on a specific day | Student information, stock data, CSV files |
How to create | pd.Series([values], index=[labels]) | pd.DataFrame({"column name": [values]}) |
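The comparison in the table can be checked directly with the `ndim` and `shape` attributes. This sketch uses the two constructor forms from the last row; the temperature readings and student records are invented for illustration.

```python
import pandas as pd

# A Series: temperatures on one day, labeled by (hypothetical) time of reading.
temps = pd.Series([21.5, 23.0, 19.8], index=["09:00", "12:00", "15:00"])

# A DataFrame: student information built from a dict of column name -> values.
students = pd.DataFrame({"name": ["Ann", "Ben"], "grade": [3, 2]})

print(temps.ndim, temps.shape)        # one dimension: (rows,)
print(students.ndim, students.shape)  # two dimensions: (rows, columns)
```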
Pandas plays a crucial role in data science, machine learning, financial analytics, and more. Recently, its ecosystem has been expanding into big data analytics: Pandas itself integrates with PyArrow, while separate libraries such as Modin and Dask offer Pandas-style APIs aimed at larger workloads.
Key Developments
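- Parallel processing: Dask and Modin are separate libraries that provide Pandas-like DataFrame APIs and spread the work across multiple cores or machines.
- GPU acceleration: cuDF, part of the RAPIDS project, provides a Pandas-like DataFrame API that runs on GPUs.
- Optimized for large datasets: Pandas 2.0 and later can optionally store data in PyArrow-backed types, which can reduce memory use and speed up some operations.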