klotz: spark*

Spark is an open-source, distributed computing framework for large-scale data processing, originally developed at UC Berkeley's AMPLab. It is designed to be fast and general enough to handle a wide variety of workloads, including ETL, machine learning, streaming, and graph processing. It can run on Hadoop YARN or other cluster managers, and provides a programming interface backed by an ecosystem of libraries for these workloads. Spark is used in cloud engineering and machine learning for its ability to process large amounts of data quickly and efficiently. It is written in Scala and can be used from Python, Java, and R in production-level applications. It integrates with Kubernetes and cloud providers for scalability and management.


  1. 2023-12-24, by klotz
  2. 2023-08-03, by klotz
  3. 2023-04-25, by klotz
  4. 2023-04-25, by klotz
  5. 2023-01-07, by klotz
  6. 2022-05-16, by klotz
  7. 2022-01-31, by klotz
  8. 2021-12-06, by klotz
  9. usersDF.write.format("orc")
       .option("orc.bloom.filter.columns", "favorite_color")
       .option("orc.dictionary.key.threshold", "1.0")
       .option("orc.column.encoding.direct", "name")
       .save("users_with_options.orc")
     Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala" in the Spark repo
     2021-12-01, by klotz
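
The ORC snippet in the last bookmark assumes a `usersDF` DataFrame that is defined elsewhere. A minimal self-contained sketch of how it might be run locally follows. The object name `OrcOptionsSketch`, the sample rows, the `local[*]` master, and the `overwrite` save mode are all assumptions added for illustration; only the three `orc.*` options and the output path come from the bookmarked snippet.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; not part of the Spark examples.
object OrcOptionsSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .appName("orc-options-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for usersDF, with the two columns
    // that the ORC options refer to.
    val usersDF = Seq(
      ("Alyssa", "red"),
      ("Ben", "blue")
    ).toDF("name", "favorite_color")

    usersDF.write.format("orc")
      .mode("overwrite") // assumption: lets the sketch be re-run safely
      // Build a bloom filter on favorite_color.
      .option("orc.bloom.filter.columns", "favorite_color")
      // Threshold for dictionary encoding of string columns.
      .option("orc.dictionary.key.threshold", "1.0")
      // Use direct (non-dictionary) encoding for the name column.
      .option("orc.column.encoding.direct", "name")
      .save("users_with_options.orc")

    // Read the data back as a basic sanity check.
    val readBack = spark.read.format("orc").load("users_with_options.orc")
    assert(readBack.count() == usersDF.count())

    spark.stop()
  }
}
```

The `orc.*` keys are passed through to the ORC writer, so their exact effect depends on the ORC version bundled with the Spark release in use.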
