SemanticScuttle - klotz.me

klotz: hadoop*

The open-source, distributed, parallel computation framework developed by Doug Cutting and Mike Cafarella, based on the functional programming operations Map and Reduce as described in the Google MapReduce paper.
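
As a rough illustration of the Map and Reduce operations named in this description, here is a single-machine word count written with plain Scala collections. It does not use Hadoop's API; the object and method names (`WordCountSketch`, `mapPhase`, `reducePhase`) are made up for this sketch.

```scala
object WordCountSketch {
  // Map phase: emit a (word, 1) pair for every word in a line.
  def mapPhase(line: String): Seq[(String, Int)] =
    line.toLowerCase.split("\\W+").toSeq.filter(_.nonEmpty).map(word => (word, 1))

  // Shuffle + reduce phase: group the pairs by word, then reduce each
  // group's counts to a single sum. Groups are never empty, so reduce is safe.
  def reducePhase(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, group) =>
      word -> group.map(_._2).reduce(_ + _)
    }

  // Run both phases over a collection of input lines.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    reducePhase(lines.flatMap(mapPhase))

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("map and reduce", "reduce and combine"))
    // Print one "word<TAB>count" line per word, sorted by word.
    counts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

In a real cluster the map and reduce steps run on different machines and the grouping happens in a distributed shuffle; here everything runs in one process purely to show the shape of the computation.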


Apache Iceberg: The Hadoop of the Modern Data Stack?

Apache Iceberg is emerging as a cornerstone for data lakes and lakehouses in the modern data stack, drawing parallels to the rise of Hadoop a decade ago. This article explores these similarities, highlighting both the opportunities and challenges that Iceberg presents for data engineering.

2024-12-15 Tags: apache, iceberg, hadoop, data lake, lakehouse, data engineering, metadata by klotz
Spark on EMR — Cost Optimization. First-hand experience of cost-saving… | by Amit Singh Rathore | May, 2022 | Medium

2022-05-16 Tags: emr, eks, spark, aws, cost, optimization, data engineering by klotz
reload4j

The reload4j project offers a clear and easy migration path for the thousands of users who have an urgent need to fix vulnerabilities in log4j 1.2.17.

2022-01-25 Tags: log4j, hadoop, logging, security by klotz
Generic Load/Save Functions - Spark 3.2.0 Documentation

usersDF.write.format("orc")
  .option("orc.bloom.filter.columns", "favorite_color")
  .option("orc.dictionary.key.threshold", "1.0")
  .option("orc.column.encoding.direct", "name")
  .save("users_with_options.orc")
Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala" in the Spark repo

2021-12-01 Tags: spark, orc, bloom filter, parquet, hadoop by klotz
Oozie 3.2.0 - Workflow Expression Language

2021-02-13 Tags: oozie, expression language, hadoop, retrocomputing by klotz
Migration Guide: SQL, Datasets and DataFrame - Spark 3.0.1 Documentation

2021-01-28 Tags: spark, spark 3.0, scala, python, migration, hadoop by klotz
Delta Lake in Action: Upsert & Time Travel | by Jyoti Dhiman | Sep, 2020 | Towards Data Science

2020-09-11 Tags: apache, spark, delta lake, data lake, analytics, data engineering, hadoop by klotz
Spark Dynamic Partition Inserts and AWS S3 — Part 2

2020-03-18 Tags: s3, emr, kubernetes, spark, dynamic partition inserts, aws, partitioning, s3a, production engineering by klotz
Spark Dynamic Partition Inserts — Part 1 - Nielsen-Tel-Aviv-tech-blog - Medium

sparkSession.conf
  .set("spark.sql.sources.partitionOverwriteMode", "dynamic")

2020-03-18 Tags: dynamic partition inserts, spark, partitioning, hadoop by klotz
Understanding HDFS quotas and Hadoop fs and fsck tools

QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME
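
A minimal sketch of how one data row under this header could be parsed, assuming eight whitespace-separated columns in the order listed and assuming unset quotas show up as `none` or `inf`. The `QuotaRow` type and the sample row are hypothetical and do not come from the bookmarked article.

```scala
import scala.util.Try

// One parsed row; quota columns are None when no quota is set.
final case class QuotaRow(
    quota: Option[Long],
    remainingQuota: Option[Long],
    spaceQuota: Option[Long],
    remainingSpaceQuota: Option[Long],
    dirCount: Long,
    fileCount: Long,
    contentSize: Long,
    fileName: String
)

object QuotaRow {
  // Assumption: "none" and "inf" are the placeholders for an unset quota.
  private def quotaField(s: String): Option[Long] =
    if (s == "none" || s == "inf") None else Some(s.toLong)

  // Returns None for lines that do not match the assumed layout,
  // including the header line itself.
  def parse(line: String): Option[QuotaRow] =
    // Limit the split to 8 pieces so a path containing spaces stays whole.
    line.trim.split("\\s+", 8) match {
      case Array(q, rq, sq, rsq, dirs, files, size, name) =>
        Try(
          QuotaRow(quotaField(q), quotaField(rq), quotaField(sq), quotaField(rsq),
            dirs.toLong, files.toLong, size.toLong, name)
        ).toOption
      case _ => None
    }
}

object QuotaRowDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical sample row for illustration only.
    println(QuotaRow.parse("100 97 none inf 2 1 1024 /user/example"))
  }
}
```

Keeping the quota columns as `Option[Long]` makes "no quota set" distinct from a quota of zero, which matters when comparing a quota with its remaining value.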

2019-05-31 Tags: hadoop, fs by klotz
