SemanticScuttle - klotz.me » klotz: pyarrow

klotz: pyarrow

Python Pandas Ditches NumPy for Speedier PyArrow

Pandas 3.0 will significantly boost performance by replacing NumPy with PyArrow as its default engine, enabling faster loading and reading of columnar data.

2025-05-27 Tags: python, pandas, numpy, pyarrow, data analysis, performance, machine learning by klotz

Anatomy of a Parquet File

A deep dive into the structure and performance benefits of Parquet files, including columnar storage, partitioning strategies, and row groups.

2025-03-14 Tags: parquet, data, storage, pyarrow, data engineering by klotz

PyStore - Fast data store for Pandas timeseries data

PyStore is a simple (yet powerful) datastore for Pandas dataframes, designed with storing timeseries data in mind. It leverages Pandas, NumPy, Dask, and Parquet (via pyarrow) for efficient data handling.

2024-12-21 Tags: pystore, pandas, timeseries, datastore, dask, parquet, pyarrow, shrunk by klotz