SemanticScuttle - klotz.me » Tags: data engineering

Apache Iceberg Won the Future — What’s Next for 2025?

The article discusses the rise of Apache Iceberg as the dominant open table format, backed by major endorsements, and outlines key developments expected for 2025 such as Role-Based Access Control (RBAC) catalogs, Change Data Capture (CDC) capabilities, and materialized views.

2024-11-20 Tags: apache iceberg, data engineering, data lakehouse by klotz

Efficient Testing of ETL Pipelines with Python

This article explains how to quickly detect data quality issues and identify their causes using Python for ETL pipelines. It discusses strategies to minimize the time required to fix data quality problems.

2024-10-07 Tags: etl, pipelines, data quality, python, tableau, data engineering, business intelligence by klotz

Building a Robust Data Observability Framework

How to ensure data quality and integrity using open-source tools for observability in data pipelines.

2024-08-29 Tags: observability, data pipeline, data engineering, production engineering by klotz

The definitive guide to data pipelines

Data pipelines are essential for connecting data across systems and platforms. This article provides a deep dive into how data pipelines are implemented, their use cases, and how they're evolving with generative AI.

2024-08-27 Tags: data engineering, pipeline, observability, data governance, mlops, production engineering by klotz

Tracking in Practice: Code, Data and ML Model

A guide to tracking in MLOps, covering code, data, and machine learning model tracking.

2024-07-12 Tags: mlops, data engineering, production engineering by klotz

Airbyte Tutorials

Airbyte is an open-source data integration engine that helps you consolidate your data in your data warehouses, lakes and databases.

2024-07-12 Tags: etl, data engineering, pipeline, data warehousing, data bus, airbyte by klotz

Simplifying the Python Code for Data Engineering Projects

This article provides Python tricks and techniques for data ingestion, validation, processing, and testing in data engineering projects. It offers practical solutions for streamlining the code, including tips for data validation, handling errors, and testing.

2024-06-13 Tags: python, data engineering by klotz

How moving from Pandas to Polars made me write better code (without writing better code)

An exploration of the benefits of switching from the popular Python library Pandas to the newer Polars for data manipulation tasks, highlighting improvements in performance, concurrency, and ease of use.

2024-07-13 Tags: pandas, polars, data engineering, python, dataframe by klotz

DuckDB: In-Process Python Analytics for Not-Quite-Big Data

An in-process analytics database, DuckDB can work with surprisingly large data sets without having to maintain a distributed multiserver system. Best of all? You can analyze data directly from your Python app.

2024-06-02 Tags: duckdb, python, analytics, database, big data, sql, panda_s, data engineering by klotz

Automating Data Pipelines with Python & GitHub Actions

Shaw Talebi describes a simple, free way to automate data workflows using Python and GitHub Actions.

2024-06-01 Tags: pipeline, python, github actions, machine learning, screwdriver, data engineering by klotz
