SemanticScuttle - klotz.me » Tags: orc+hadoop

Tags: orc* + hadoop*

Generic Load/Save Functions - Spark 3.2.0 Documentation

usersDF.write.format("orc")
  .option("orc.bloom.filter.columns", "favorite_color")
  .option("orc.dictionary.key.threshold", "1.0")
  .option("orc.column.encoding.direct", "name")
  .save("users_with_options.orc")

Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala" in the Spark repo.

2021-12-01 Tags: spark, orc, bloom filter, parquet, hadoop by klotz

ORC Creation Best Practices - Hortonworks

set hive.exec.orc.split.strategy=ETL; -- this works only for scans of specific values; if a full table scan is required anyway, use the default (HYBRID) or BI.

2017-06-12 Tags: spark, hadoop, orc, hive, performance, tuning, etl, bi, hybrid by klotz

Hive Optimizations with Indexes, Bloom-Filters and Statistics – Technology Snippets by Jörn Franke

2017-03-08 Tags: orc, optimization, bloom filter, index, hadoop by klotz

Microsoft OLAP Blog by Hilmar Buchta: Hive file format comparison

2017-02-13 Tags: hadoop, avro, orc, parquet, comparison, performance by klotz
