Series and DataFrame are the two core data structures in Pandas. Let’s start with a quick overview of the library.

The Birth and Evolution of Pandas

Pandas was created in 2008 by Wes McKinney as a Python library for financial data analysis. At the time, Python lacked efficient tools for data manipulation and analysis, so he adopted R’s DataFrame concept to design a structure optimized for handling data.

Since then, Pandas has evolved into a widely used open-source project, growing with contributions from the community. Today, it is an essential library for data analytics.

Built on NumPy, Pandas supports high-performance computation and handles a wide range of data formats. Its ability to read and write Excel, CSV, and JSON files, and to exchange data with SQL databases, has made it the standard library for data analysis in Python.
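
As a small illustration of the format support described above, the sketch below reads CSV text from an in-memory buffer and writes the same table back out as JSON. The column names and values are made up for the example, and `io.StringIO` stands in for a real file path.

```python
import io
import json

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path or URL.
csv_text = "name,score\nAna,90\nBen,85\n"

# read_csv accepts any file-like object, so StringIO stands in for a file.
df_csv = pd.read_csv(io.StringIO(csv_text))

# Serialize the same table as a list of JSON records.
json_text = df_csv.to_json(orient="records")
records = json.loads(json_text)
print(records)
```

Excel and SQL sources follow the same pattern through `pd.read_excel` and `pd.read_sql`, although those need an extra Excel engine or a database connection.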

Series

  • A one-dimensional labeled array in which every value is paired with an index label.
  • It is primarily used to represent a single column of data.

Series Code

import pandas as pd

# Series: a one-dimensional labeled array
s = pd.Series([1, 3, 5, 7, 9])
print(s)

Output

0    1
1    3
2    5
3    7
4    9
dtype: int64
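
The default index is just 0, 1, 2, and so on, but the labels can be anything. The sketch below uses weekday names as the index to show the difference between looking a value up by label and by position. The temperatures and the `temp_c` name are invented for illustration.

```python
import pandas as pd

# Hypothetical daily temperatures, labeled by weekday instead of 0..n-1.
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"], name="temp_c")

by_label = temps.loc["Tue"]     # look up by index label
by_position = temps.iloc[0]     # look up by integer position
labels = list(temps.index)      # the index half of the Series
values = temps.to_numpy()       # the value half, as a NumPy array

print(by_label, by_position, labels, values)
```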

DataFrame

  • A two-dimensional table structure that organizes multiple Series into columns.
  • It includes both row labels (Index) and column labels (Column) for easy data manipulation and access.

DataFrame Code 1

# DataFrame: a two-dimensional data structure
df = pd.DataFrame(data=[1, 3, 5, 7, 9], index=range(0,5), columns=['A'])
print(df)

Output 1

   A
0  1
1  3
2  5
3  7
4  9
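
To make the "multiple Series as columns" idea concrete, the following sketch pulls a column out of a DataFrame and adds a second one. The name `df_cols` and the derived column `B` are only there for the example.

```python
import pandas as pd

df_cols = pd.DataFrame({"A": [1, 3, 5, 7, 9]})

# Selecting one column returns a Series that shares the DataFrame's index.
col_a = df_cols["A"]

# Assigning a Series adds a new column, aligned on the index labels.
df_cols["B"] = col_a * 2

print(type(col_a).__name__)     # Series
print(list(df_cols.columns))    # ['A', 'B']
print(df_cols.loc[2, "B"])      # 10
```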

DataFrame Code 2

dates = pd.date_range("20130101", periods=6)
# dates holds six consecutive days:
# DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
#                '2013-01-05', '2013-01-06'],
#               dtype='datetime64[ns]', freq='D')

# Rows shorter than the column list are filled with NaN
df = pd.DataFrame([[1,2,3,4],[5,6,7,8],[],[],[],[]], index=dates, columns=list("ABCD"))
print(df)

Output 2

              A    B    C    D
2013-01-01  1.0  2.0  3.0  4.0
2013-01-02  5.0  6.0  7.0  8.0
2013-01-03  NaN  NaN  NaN  NaN
2013-01-04  NaN  NaN  NaN  NaN
2013-01-05  NaN  NaN  NaN  NaN
2013-01-06  NaN  NaN  NaN  NaN

Series and DataFrame

Feature         Series                               DataFrame
Dimensions      One-dimensional                      Two-dimensional
Structure       Index + a single column of data      Index + multiple columns of data
Usage examples  Temperature on a specific day        Student information, stock data, CSV file
How to create   pd.Series([values], index=[labels])  pd.DataFrame({"column name": [values]})

Pandas plays a crucial role in data science, machine learning, financial analytics, and more. More recently, its reach has extended toward large-scale analytics, both through PyArrow integration in Pandas itself and through companion libraries such as Modin and Dask that scale the Pandas API.

Key Developments

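  • Parallel processing – Dask and Modin offer Pandas-like DataFrame APIs that spread work across multiple cores or machines.
  • GPU acceleration – cuDF, part of NVIDIA RAPIDS, provides a Pandas-like DataFrame API that runs on NVIDIA GPUs.
  • Optimized for large datasets – Recent Pandas versions can store columns in PyArrow-backed data types, which can reduce memory use and speed up some operations.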
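
Of the three, the PyArrow route is the easiest to try because it lives inside Pandas itself. The sketch below is a minimal example, assuming the optional `pyarrow` package is installed and the Pandas version supports Arrow-backed dtypes. It falls back to `None` otherwise. Dask, Modin, and cuDF are separate installs and are not shown here.

```python
import pandas as pd

s_plain = pd.Series([1, 2, 3])

try:
    # Assumption: pyarrow is installed and this pandas has Arrow dtypes (1.5+).
    import pyarrow  # noqa: F401
    s_arrow = s_plain.astype("int64[pyarrow]")
except (ImportError, TypeError):
    # pyarrow is missing, or this pandas version does not know the dtype.
    s_arrow = None

if s_arrow is not None:
    print(s_arrow.dtype)   # int64[pyarrow]
    print(s_arrow.sum())   # 6
```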

By Mark

