Viewing data as a table (JSON to Data Table) has both advantages and disadvantages. Let’s go over some of the pros and cons of analyzing data with a Pandas DataFrame.

Pros

  • Highly readable: Data is clearly structured and easy to understand.
  • Easy to compare: Quickly spot differences or similarities between items.
  • Easy to sort and filter: Data can be sorted or filtered based on specific criteria (see the short sketch after these lists).
  • Highly compatible: Easily integrates with tools like Excel, making analysis more efficient.

Cons

  • Cluttered with big data: Large datasets can become visually overwhelming and hard to interpret.
  • Difficult to recognize patterns: Numerical tables don’t easily reveal trends or tendencies.
  • Lacks intuitiveness: Tables alone can be hard to interpret, often requiring charts or graphs for better visualization.
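
For example, sorting and filtering become one-liners once the data sits in a DataFrame. The snippet below is a minimal sketch using a small made-up table; the column names and values are purely illustrative and are not taken from the AAPL data used later in this post.

import pandas as pd

# A small made-up table purely for illustration
toy = pd.DataFrame({
    'symbol': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'close':  [101.5, 102.0, 99.8, 103.2],
    'volume': [13000, 21000, 15000, 18000],
})

# Easy to sort: order rows by closing price, highest first
print(toy.sort_values('close', ascending=False))

# Easy to filter: keep only rows that meet a condition
print(toy[toy['volume'] > 15000])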

JSON to Data Table Code 1

import json
import pandas as pd
import datetime

# Load the JSON file of AAPL 5-minute price bars
with open("b0044_AAPL_2406.json", "r") as fp_json:
    aapl_2406_data = json.load(fp_json)

pd_tmp_data = []

# Flatten the nested "Time Series (5min)" dict into a list of rows
for _date, tmp_data in aapl_2406_data["Time Series (5min)"].items():
    pd_tmp_data.append([datetime.datetime.strptime(_date, '%Y-%m-%d %H:%M:%S'),
                        float(tmp_data['open']), float(tmp_data['high']),
                        float(tmp_data['low']), float(tmp_data['close']),
                        float(tmp_data['volume'])])

df = pd.DataFrame(pd_tmp_data, columns=['date', 'open', 'high', 'low', 'close', 'volume'])
print(df)

Output 1

                    date    open    high     low    close   volume
0    2024-06-28 19:55:00  211.41  211.70  211.40  211.660  13341.0
1    2024-06-28 19:50:00  211.31  211.44  211.31  211.440   4606.0
2    2024-06-28 19:45:00  211.31  211.38  211.29  211.350   2269.0
3    2024-06-28 19:40:00  211.28  211.38  211.28  211.325   1970.0
4    2024-06-28 19:35:00  211.16  211.30  211.14  211.280   4367.0
...                  ...     ...     ...     ...      ...      ...
3643 2024-06-03 04:20:00  192.85  192.91  192.84  192.910   1499.0
3644 2024-06-03 04:15:00  192.91  192.96  192.84  192.900   2229.0
3645 2024-06-03 04:10:00  192.91  192.95  192.79  192.950  11909.0
3646 2024-06-03 04:05:00  193.10  193.31  192.90  192.900  14845.0
3647 2024-06-03 04:00:00  192.45  193.14  192.45  193.050  26584.0

[3648 rows x 6 columns]

Code 2

print(df.median())

"""
pandas.DataFrame.median()
axis: The axis along which to compute the median {0: index, 1: columns}.
skipna: Whether to ignore missing values.
level: For a MultiIndex, the level at which to perform the calculation.
numeric_only: Whether to include only float, int, and boolean columns.
kwargs: Additional keyword arguments to pass to the function.
"""

Output 2

date      2024-06-14 11:57:30
open                   209.78
high                   209.94
low                   209.545
close                  209.78
volume                26344.5
dtype: object
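
One note on the parameters above: if you want the statistics for the numeric columns only, so the date column stays out of the result, numeric_only=True does exactly that. A minimal sketch, reusing the df built in Code 1:

# Median of the numeric columns only; the datetime 'date' column is excluded
print(df.median(numeric_only=True))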

Personally, I use this tabular view most to check for any “jumping” data (sudden outliers in price or volume) and to see whether deduplication is necessary; a sketch of that check follows below.
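
In case it helps, here is a minimal sketch of what that check looks like for me. It assumes the df built in Code 1, and the 0.5% jump threshold is an arbitrary value I picked for illustration.

# Sort chronologically first, since the feed lists the newest bars first
df_sorted = df.sort_values('date').reset_index(drop=True)

# Duplicate timestamps: candidates for deduplication
dup_mask = df_sorted['date'].duplicated(keep=False)
print(df_sorted[dup_mask])

# "Jumping" data: bar-to-bar close changes larger than 0.5% (arbitrary threshold)
pct_jump = df_sorted['close'].pct_change().abs()
print(df_sorted[pct_jump > 0.005])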

That said, I’ve developed a habit of first tabulating data before drawing graphs or refining it. I suspect many of you who work with data frequently do the same.

Things might be different now, but when I was learning, I relied heavily on my personal intuition to analyze data. I would visually inspect different aspects of the data first, make a judgment, and then apply logic. But nowadays, there seems to be a more structured approach based on the characteristics of the data.

Personally, I still believe an analyst’s intuition plays a crucial role in data analysis. Maybe that’s just because I’m from an older generation. I sometimes wonder if I’m falling behind the latest trends.

By Mark

-_-
