klotz: hadoop*

Hadoop is an open source framework for distributed, parallel computation, developed by Doug Cutting and Mike Cafarella. It is based on the functional programming operations Map and Reduce, as described in the Google MapReduce paper.


  1. Apache Iceberg is emerging as a cornerstone for data lakes and lakehouses in the modern data stack, drawing parallels to the rise of Hadoop a decade ago. This article explores these similarities, highlighting both the opportunities and challenges that Iceberg presents for data engineering.
  2. 2022-05-16 by klotz
  3. The reload4j project offers a clear and easy migration path for the thousands of users who have an urgent need to fix vulnerabilities in log4j 1.2.17.
    2022-01-25 by klotz
  4. usersDF.write.format("orc")
    .option("orc.bloom.filter.columns", "favorite_color")
    .option("orc.dictionary.key.threshold", "1.0")
    .option("orc.column.encoding.direct", "name")
    .save("users_with_options.orc")
    Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala" in the Spark repo
    2021-12-01 by klotz
  5. 2021-01-28 by klotz
  6. sparkSession.conf
    .set("spark.sql.sources.partitionOverwriteMode", "dynamic")
  7. QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME
    2019-05-31 by klotz
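
The column list in item 7 appears to be the header reported by the HDFS quota count command (`hdfs dfs -count -q`); the bookmark itself does not name the command, so treat that as an assumption. As a minimal sketch of how such a row could be turned into a labeled record, the Python below splits one whitespace-separated line against those column names. The function name `parse_count_line`, the sample row, and the path in it are hypothetical, and the handling of non-numeric quota values is a guess at how unset quotas are displayed.

```python
from typing import Dict, Union

# Column names copied from the list item above, in the same order.
COLUMNS = [
    "QUOTA", "REMAINING_QUOTA", "SPACE_QUOTA", "REMAINING_SPACE_QUOTA",
    "DIR_COUNT", "FILE_COUNT", "CONTENT_SIZE", "FILE_NAME",
]


def parse_count_line(line: str) -> Dict[str, Union[int, str, None]]:
    """Split one whitespace-separated row into a dict keyed by COLUMNS."""
    # Split at most 7 times so a path containing spaces stays in one piece.
    fields = line.split(None, len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")

    record: Dict[str, Union[int, str, None]] = {}
    for name, value in zip(COLUMNS[:-1], fields[:-1]):
        # Assumption: an unset quota is shown as non-numeric text
        # (for example "none" or "inf"); such values are mapped to None.
        record[name] = int(value) if value.lstrip("-").isdigit() else None
    # The last column is the path; drop any trailing whitespace or newline.
    record["FILE_NAME"] = fields[-1].rstrip()
    return record


# Hypothetical sample row, not taken from a real cluster.
sample = "100 58 none inf 12 30 4096 /user/klotz/data"
row = parse_count_line(sample)
print(row["REMAINING_QUOTA"], row["SPACE_QUOTA"], row["FILE_NAME"])
# prints: 58 None /user/klotz/data
```

Limiting the split to seven separators keeps a path with embedded spaces intact in the final column, which a plain `split()` would break apart.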
