An article on building an AI agent that interacts with Apache Airflow using PydanticAI and Gemini 2.0, offering a structured, reliable way to manage DAGs through natural-language queries.
- The agent interacts with Apache Airflow through the Airflow REST API.
- The agent understands natural-language queries about workflows, fetches real-time status updates, and returns structured data.
- Sample DAGs are included for demonstration.
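The REST side of such an agent can be sketched in plain Python. This is not the article's PydanticAI code. It assumes an Airflow 2.x deployment that exposes the stable REST API under `/api/v1` with basic authentication enabled. The names `DagRunStatus`, `parse_dag_runs`, and `fetch_dag_runs` are hypothetical. The sketch shows the kind of function an agent could call as a tool: it fetches recent runs for one DAG and returns typed records rather than raw JSON.

```python
import base64
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DagRunStatus:
    """Structured view of one DAG run (hypothetical shape for an agent tool)."""
    dag_id: str
    run_id: str
    state: str
    start_date: Optional[str]
    end_date: Optional[str]


def parse_dag_runs(dag_id: str, payload: dict) -> list[DagRunStatus]:
    """Convert the JSON body of GET /api/v1/dags/{dag_id}/dagRuns into dataclasses."""
    return [
        DagRunStatus(
            dag_id=dag_id,
            run_id=run["dag_run_id"],
            state=run["state"],
            start_date=run.get("start_date"),  # may be null for queued runs
            end_date=run.get("end_date"),      # null while a run is still active
        )
        for run in payload.get("dag_runs", [])
    ]


def fetch_dag_runs(
    base_url: str, dag_id: str, user: str, password: str, limit: int = 5
) -> list[DagRunStatus]:
    """Fetch the most recent runs of a DAG (assumes the basic-auth API backend is enabled)."""
    url = (
        f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
        f"?limit={limit}&order_by=-start_date"  # newest runs first
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_dag_runs(dag_id, json.load(response))
```

Keeping the parsing separate from the HTTP call lets the structured output be tested without a live Airflow instance. In an agent setup, a function like `fetch_dag_runs` would be registered as a tool, so the model summarizes typed records instead of guessing from free-form text.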
A structured, self-paced, seven-week study guide for learning Apache Iceberg and its ecosystem, which the author created after recognizing Iceberg's growing relevance in the data industry.
Apache Iceberg is emerging as a cornerstone of data lakes and lakehouses in the modern data stack, and its rise invites comparison with the rise of Hadoop a decade ago. This article explores these similarities and highlights both the opportunities and the challenges that Iceberg presents for data engineering.
The latest Apache Airflow release, version 2.10, introduces hybrid execution and enhanced data lineage, aiming to make data orchestration more efficient and trustworthy, especially for AI workloads.
- Standardization, governance, simplified troubleshooting, and reusability in ML application development.
- Integrations with vector databases and LLM providers to support new applications; provides tutorials on integrating