Viewing data as a table (JSON to Data Table) has a number of advantages and disadvantages. Let's go over some of the pros and cons of analyzing data with a Pandas DataFrame.
Pros
- Highly readable: Data is clearly structured and easy to understand.
- Easy to compare: Quickly spot differences or similarities between items.
- Easy to sort and filter: Data can be sorted or filtered based on specific criteria (see the short sketch after this list).
- Highly compatible: Easily integrates with tools like Excel, making analysis more efficient.
Cons
- Cluttered with big data: Large datasets can become visually overwhelming and hard to interpret.
- Difficult to recognize patterns: Numerical tables don’t easily reveal trends or tendencies.
- Lacks intuitiveness: Tables alone can be hard to interpret, often requiring charts or graphs for better visualization.
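To make the "sort and filter" point concrete, here is a minimal sketch using a toy DataFrame. The tickers and values are made up purely for illustration:
import pandas as pd

# Toy price table (hypothetical values, for illustration only)
df = pd.DataFrame({
    'ticker': ['AAPL', 'MSFT', 'GOOG', 'AAPL'],
    'close':  [211.66, 447.67, 183.42, 192.90],
    'volume': [13341, 4606, 2269, 26584],
})

# Sort by closing price, highest first
print(df.sort_values('close', ascending=False))

# Filter rows that match specific criteria
print(df[(df['ticker'] == 'AAPL') & (df['volume'] > 20000)])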
JSON to Data Table Code 1
import json
import pandas as pd
import datetime

# Load the raw JSON file (5-minute intraday data for AAPL, June 2024)
with open("b0044_AAPL_2406.json", "r") as fp_json:
    aapl_2406_data = json.load(fp_json)

# Flatten each timestamped entry into a row: [date, open, high, low, close, volume]
pd_tmp_data = []
for _date in aapl_2406_data["Time Series (5min)"]:
    tmp_data = aapl_2406_data["Time Series (5min)"][_date]
    pd_tmp_data.append([datetime.datetime.strptime(_date, '%Y-%m-%d %H:%M:%S'),
                        float(tmp_data['open']), float(tmp_data['high']),
                        float(tmp_data['low']), float(tmp_data['close']),
                        float(tmp_data['volume'])])

df = pd.DataFrame(pd_tmp_data, columns=['date', 'open', 'high', 'low', 'close', 'volume'])
print(df)
Output 1
                     date    open    high     low    close   volume
0     2024-06-28 19:55:00  211.41  211.70  211.40  211.660  13341.0
1     2024-06-28 19:50:00  211.31  211.44  211.31  211.440   4606.0
2     2024-06-28 19:45:00  211.31  211.38  211.29  211.350   2269.0
3     2024-06-28 19:40:00  211.28  211.38  211.28  211.325   1970.0
4     2024-06-28 19:35:00  211.16  211.30  211.14  211.280   4367.0
...                   ...     ...     ...     ...      ...      ...
3643  2024-06-03 04:20:00  192.85  192.91  192.84  192.910   1499.0
3644  2024-06-03 04:15:00  192.91  192.96  192.84  192.900   2229.0
3645  2024-06-03 04:10:00  192.91  192.95  192.79  192.950  11909.0
3646  2024-06-03 04:05:00  193.10  193.31  192.90  192.900  14845.0
3647  2024-06-03 04:00:00  192.45  193.14  192.45  193.050  26584.0

[3648 rows x 6 columns]
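Note that the rows above come newest-first. If you want the table in chronological order, with the timestamp as the index (handy for later resampling or plotting), a short follow-up to Code 1 such as this works:
# Continue from Code 1: sort chronologically and index by timestamp
df = df.sort_values('date').set_index('date')
print(df.head())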
Code 2
print(df.median())
"""
pandas.DataFrame.median()
axis        : The axis along which to compute the median {0: index, 1: columns}.
skipna      : Whether to ignore missing values.
level       : For a MultiIndex, the level at which to perform the calculation.
numeric_only: Whether to include only float, int, and boolean columns.
**kwargs    : Additional keyword arguments to pass to the function.
"""
Output 2
date      2024-06-14 11:57:30
open                   209.78
high                   209.94
low                   209.545
close                  209.78
volume                26344.5
dtype: object
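One caveat: how DataFrame.median() treats the non-numeric date column depends on your pandas version, and recent releases are stricter about mixed dtypes. If you only care about the price and volume columns, you can say so explicitly:
# Restrict the median to numeric columns (int, float, bool), skipping 'date'
print(df.median(numeric_only=True))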
Personally, I use this tabular view most to check for any "jumping" data and to see whether deduplication is necessary.
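As a rough sketch of that check, continuing from the df built in Code 1 (the 2% threshold is an arbitrary value I chose for illustration):
# Sort chronologically so diffs compare consecutive 5-minute bars
df = df.sort_values('date')

# Flag "jumping" data: bars whose close moves more than 2% vs. the previous bar
jumps = df['close'].pct_change().abs() > 0.02
print(df[jumps])

# See whether deduplication is necessary
print(df['date'].duplicated().sum())  # duplicated timestamps
print(df.duplicated().sum())          # fully duplicated rows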
That said, I've developed a habit of first tabulating data before drawing graphs or refining it. I suspect many of you who work with data frequently do the same.
Things might be different now, but when I was learning, I relied heavily on my personal intuition to analyze data. I would visually inspect different aspects of the data first, make a judgment, and then apply logic. But nowadays, there seems to be a more structured approach based on the characteristics of the data.
Personally, I still believe an analyst’s intuition plays a crucial role in data analysis. Maybe that’s just because I’m from an older generation. I sometimes wonder if I’m falling behind the latest trends.