klotz: parquet*


  1. A deep dive into the structure and performance benefits of Parquet files, including columnar storage, partitioning strategies, and row groups.

    2025-03-14, by klotz
  2. PyStore is a simple (yet powerful) datastore for Pandas dataframes, designed with storing timeseries data in mind. It leverages Pandas, Numpy, Dask, and Parquet (via pyarrow) for efficient data handling.

  3. usersDF.write.format("orc")
       .option("orc.bloom.filter.columns", "favorite_color")
       .option("orc.dictionary.key.threshold", "1.0")
       .option("orc.column.encoding.direct", "name")
       .save("users_with_options.orc")

     Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala" in the Spark repo.

    2021-12-01, by klotz
  4. // set Parquet file block size and page size values
     int blockSize = 256 * 1024 * 1024;
     int pageSize = 64 * 1024;

    2017-02-15, by klotz
  5. 2017-02-13, by klotz
  6. 2014-02-19, by klotz
  7. 2014-02-19, by klotz
  8. 2013-09-26, by klotz
  9. 2013-08-27, by klotz
  10. 2013-07-01, by klotz
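
Items 1 and 4 above mention row groups, columnar storage, and the block and page size settings. The following is a rough, pure-Python sketch of the idea only. It does not use or reproduce the Parquet format or any Parquet library, and the helper names `write_row_groups` and `scan` are hypothetical. It assumes each row group keeps its values column by column together with per-column min/max statistics, so a reader can skip groups that cannot match a filter.

```python
from typing import Any


def write_row_groups(rows: list[dict[str, Any]], rows_per_group: int) -> list[dict[str, Any]]:
    """Split rows into row groups and store each group column by column.

    Hypothetical helper: an in-memory stand-in for a columnar layout.
    """
    groups = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        # Columnar layout: one list of values per column name.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        # Per-column min/max statistics for this row group.
        stats = {name: (min(values), max(values)) for name, values in columns.items()}
        groups.append({"columns": columns, "stats": stats})
    return groups


def scan(groups: list[dict[str, Any]], column: str, lower: Any) -> tuple[list[Any], int]:
    """Return values of `column` that are >= lower, plus the number of groups read.

    Row groups whose recorded maximum is below `lower` are skipped entirely.
    """
    hits: list[Any] = []
    groups_read = 0
    for group in groups:
        _, col_max = group["stats"][column]
        if col_max < lower:
            continue  # no value in this group can match
        groups_read += 1
        hits.extend(v for v in group["columns"][column] if v >= lower)
    return hits, groups_read


rows = [{"ts": i, "price": 100 + i} for i in range(10)]
groups = write_row_groups(rows, rows_per_group=4)
hits, groups_read = scan(groups, "ts", lower=8)

# Sizes from item 4: a 256 MiB block and a 64 KiB page.
block_size = 256 * 1024 * 1024
page_size = 64 * 1024
# Upper bound on full pages per block if the sizes were hit exactly;
# real files vary with encoding and compression.
pages_per_block = block_size // page_size
```

With ten rows and four rows per group, the sketch produces three groups, and a filter of `ts >= 8` reads only the last one.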
