Series and DataFrame

Series and DataFrame are the two core data structures in Pandas. Before looking at them, let’s start with a quick overview of Pandas itself.

The Birth and Evolution of Pandas

Pandas was created in 2008 by Wes McKinney as a Python library for financial data analysis. At the time, Python lacked efficient tools for data manipulation and analysis, so he adopted R’s DataFrame concept to design a structure optimized for handling data.

Since then, Pandas has evolved into a widely used open-source project, growing with contributions from the community. Today, it is an essential library for data analytics.

Built on NumPy, Pandas supports high-performance computation and handles a wide range of data formats. Its ability to read and write Excel, CSV, and JSON files, and to run SQL-like operations such as joins and group-by aggregations, has made it the de facto standard library for data analysis in Python.

Series

A Series is a one-dimensional labeled array in which each value is paired with an index label.
It is primarily used to represent the data in a single column.

Series Code

import pandas as pd

# Series: a one-dimensional labeled array
s = pd.Series([1, 3, 5, 7, 9])
print(s)

Output

0    1
1    3
2    5
3    7
4    9
dtype: int64
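
The index does not have to be the default 0, 1, 2, ... sequence. As a minimal sketch, the example below builds a Series with string labels and reads the index and the values separately. The city names and temperatures are made-up illustration values, not part of the original example.

```python
import pandas as pd

# Hypothetical data: temperatures labeled by city name
temps = pd.Series([21.5, 18.0, 25.3], index=["Seoul", "Busan", "Jeju"])

print(temps.index.tolist())  # the index labels
print(temps.to_numpy())      # the values as a NumPy array
print(temps["Busan"])        # look up a single value by its label
```

Labeling values this way is what lets a Series stand in for a single named column of a table.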

DataFrame

A DataFrame is a two-dimensional table in which each column is a Series and all columns share the same index.
It has both row labels (the index) and column labels (the columns), which make it easy to select and manipulate data.

DataFrame Code 1

# DataFrame: a two-dimensional data structure
df = pd.DataFrame(data=[1, 3, 5, 7, 9], index=range(0,5), columns=['A'])
print(df)

Output 1

   A
0  1
1  3
2  5
3  7
4  9

DataFrame Code 2

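# Build a daily DatetimeIndex of six dates starting on 2013-01-01
dates = pd.date_range("20130101", periods=6)
# dates now holds:
# DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
#                '2013-01-05', '2013-01-06'],
#               dtype='datetime64[ns]', freq='D')

# Rows shorter than the column list (here, the four empty rows) are filled with NaN
df = pd.DataFrame([[1,2,3,4],[5,6,7,8],[],[],[],[]], index=dates, columns=list("ABCD"))
print(df)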

Output 2

              A    B    C    D
2013-01-01  1.0  2.0  3.0  4.0
2013-01-02  5.0  6.0  7.0  8.0
2013-01-03  NaN  NaN  NaN  NaN
2013-01-04  NaN  NaN  NaN  NaN
2013-01-05  NaN  NaN  NaN  NaN
2013-01-06  NaN  NaN  NaN  NaN
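
To make the link between the two structures concrete, here is a small sketch of selecting by column label and by row label. It uses a smaller date-indexed frame than the one above, with illustrative values, and the variable names `col_a` and `row_2` are chosen only for this example.

```python
import pandas as pd

# A small date-indexed frame (illustrative values)
dates = pd.date_range("20130101", periods=3)
df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]],
    index=dates,
    columns=["A", "B"],
)

col_a = df["A"]                            # one column, selected by column label
row_2 = df.loc[pd.Timestamp("2013-01-02")]  # one row, selected by index label

print(type(col_a).__name__)          # a single column comes back as a Series
print(col_a.index.equals(df.index))  # the column shares the frame's index
print(row_2["B"])                    # a row is also a Series, labeled by column names
```

Both selections return a Series: the column keeps the date index, while the row is indexed by the column labels.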

Feature	Series	DataFrame
Dimensions	One dimension	Two dimensions
Structure	Index + a single column of data	Index + multiple columns of data
Usage examples	Temperature readings for a specific day	Student records, stock data, contents of a CSV file
How to create	`pd.Series([values], index=[labels])`	`pd.DataFrame({"column name": [values]})`
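
The two creation patterns in the table can be sketched as follows. The names `scores` and `students` and their contents are hypothetical and serve only as illustration.

```python
import pandas as pd

# Series: a list of values plus an optional list of index labels
scores = pd.Series([90, 85, 77], index=["kim", "lee", "park"])

# DataFrame: a dict that maps each column name to its list of values
students = pd.DataFrame(
    {"name": ["kim", "lee", "park"], "score": [90, 85, 77]}
)

print(scores.ndim, scores.shape)      # one dimension
print(students.ndim, students.shape)  # two dimensions: (rows, columns)
```

Checking `ndim` and `shape` is a quick way to confirm which of the two structures you are holding.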

Pandas plays a crucial role in data science, machine learning, financial analytics, and more. More recently, its reach has extended toward larger datasets, both through PyArrow integration and through pandas-compatible libraries such as Modin and Dask.

Key Developments

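Parallel processing – Dask and Modin provide pandas-like DataFrame APIs that spread work across multiple cores or machines.
GPU acceleration – cuDF, part of NVIDIA's RAPIDS project, offers a pandas-like API that runs on GPUs.
Optimized for large datasets – Since version 2.0, Pandas can optionally use PyArrow-backed data types for more memory-efficient storage.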
