The article discusses how Visa leverages retrieval-augmented generation (RAG) and deep learning to enhance operations. It describes Visa's 'Secure ChatGPT,' which offers a multi-model interface for secure internal use, and how RAG improves policy-related data retrieval. The article also explores Visa's data infrastructure and AI's role in fraud prevention.
This article describes a workflow using Large Language Models (LLMs) to automate the process of normalising spreadsheet data, making it tidy and machine-readable for easier analysis and insights.
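To make the "tidy" target concrete, here is a minimal sketch of the kind of reshaping such a workflow aims for: one row per observation instead of one column per year. It does not involve an LLM and is not the article's method; the `melt` helper, the column names, and the sample data are all hypothetical.

```python
def melt(rows, id_columns, var_name="variable", value_name="value"):
    """Reshape wide rows (a list of dicts) into tidy long rows."""
    tidy = []
    for row in rows:
        # Columns that identify the observation are repeated on every output row.
        ids = {col: row[col] for col in id_columns}
        for col, val in row.items():
            if col in id_columns:
                continue
            # Every remaining column header becomes a value in `var_name`.
            tidy.append({**ids, var_name: col, value_name: val})
    return tidy


# Hypothetical spreadsheet with one column per year (wide layout).
wide = [
    {"region": "North", "2022": 10, "2023": 12},
    {"region": "South", "2022": 7, "2023": 9},
]

tidy = melt(wide, id_columns=["region"], var_name="year", value_name="sales")
for record in tidy:
    print(record)
```

Each output record holds exactly one region, one year, and one sales figure, which is the shape most analysis tools expect.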
A deep dive into the structure and performance benefits of Parquet files, including columnar storage, partitioning strategies, and row groups.
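The benefit of row groups and columnar layout can be illustrated with a toy in-memory model. This is only a sketch of the idea, not the Parquet format or any library's API: `write_row_groups`, `scan`, and the sample data are hypothetical. Each row group stores its columns separately along with min/max statistics, so a reader can skip whole groups that cannot match a filter.

```python
def write_row_groups(rows, columns, row_group_size):
    """Split rows into row groups, storing each column as its own list."""
    groups = []
    for start in range(0, len(rows), row_group_size):
        chunk = rows[start:start + row_group_size]
        # Columnar layout: one list per column instead of one dict per row.
        data = {c: [r[c] for r in chunk] for c in columns}
        # Per-column (min, max) statistics for this row group.
        stats = {c: (min(v), max(v)) for c, v in data.items()}
        groups.append({"data": data, "stats": stats})
    return groups


def scan(groups, select, filter_column, lower_bound):
    """Return `select` values where filter_column >= lower_bound,
    plus the number of row groups that actually had to be read."""
    out, groups_read = [], 0
    for group in groups:
        _, col_max = group["stats"][filter_column]
        if col_max < lower_bound:
            continue  # the statistics prove no row in this group can match
        groups_read += 1
        # Only the two columns involved are touched; others are never read.
        for key, value in zip(group["data"][filter_column], group["data"][select]):
            if key >= lower_bound:
                out.append(value)
    return out, groups_read


rows = [{"id": i, "amount": i * 10} for i in range(8)]
groups = write_row_groups(rows, ["id", "amount"], row_group_size=4)
values, groups_read = scan(groups, select="amount", filter_column="id", lower_bound=6)
print(values, groups_read)
```

With two row groups of four rows each, the filter `id >= 6` lets the scan skip the first group entirely based on its statistics alone.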
Google has enhanced Google Sheets with an AI-powered upgrade using its Gemini technology. This update allows users to automatically convert spreadsheets into charts, identify trends, and create advanced visualizations like heatmaps. Users can interact with the Gemini feature directly through a chat interface within Sheets.
Amazon S3 Batch Operations allows you to process hundreds, millions, or even billions of S3 objects efficiently. You can perform various actions such as copying objects, setting tags, restoring from Glacier, or invoking AWS Lambda functions on each object without writing custom code.
Spotify has been a data-driven company since day one, using data for purposes ranging from payments to experimentation. Managing that volume of data required a more streamlined approach, which led to the development of an internal data platform.
**Event Delivery System:**
- **On-Premises Setup:** Initially, Spotify used on-premises solutions like Kafka and HDFS. Event data from clients was captured, timestamped, and routed to a central Hadoop cluster.
- **Google Cloud Transition:** In 2015, Spotify moved to Google Cloud Platform (GCP) for better scalability and reliability. Key components include File Tailer, Event Delivery Service, Reliable Persistent Queue, and ETL jobs using Dataflow and BigQuery.
Arize Phoenix is an open-source observability library for AI experimentation, evaluation, and troubleshooting, built by Arize AI.
Learn how GPU acceleration can significantly speed up JSON processing in Apache Spark, reducing runtime and costs for enterprise data applications.
This article introduces Streamlit, a Python library for building data dashboards, which lets Python programmers create graphical front-ends without writing CSS, HTML, or JavaScript. The author, a seasoned data engineer, explains how Streamlit and similar tools make it possible to build attractive dashboards, marking a shift away from traditional tools like Tableau or QuickSight. It is the first in a series: future articles are planned on Gradio and Taipy, and the author aims to reproduce similar layouts and functionality in each dashboard using the same data.
This article explains the challenges of data integration in modern systems and how Apache Kafka addresses these issues by providing a decoupled, scalable, and maintainable architecture through its publish-subscribe model. The article covers Kafka’s architecture, core concepts, and benefits for real-time data streaming and event-driven systems.
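The decoupling that the publish-subscribe model provides can be sketched with a small in-memory broker. This is an illustration of the concept only, not Kafka's client API: the `Broker` class and its methods are hypothetical, and partitions, replication, and persistence are left out. Producers append to a topic log, and each consumer group tracks its own read offset, so neither side needs to know about the other.

```python
from collections import defaultdict


class Broker:
    """Toy publish-subscribe broker with per-group offsets (hypothetical)."""

    def __init__(self):
        self.logs = defaultdict(list)    # topic -> append-only list of messages
        self.offsets = defaultdict(int)  # (group, topic) -> next offset to read

    def publish(self, topic, message):
        """Append a message to the topic log and return its offset."""
        self.logs[topic].append(message)
        return len(self.logs[topic]) - 1

    def poll(self, group, topic, max_messages=10):
        """Return unread messages for this group and advance its offset."""
        start = self.offsets[(group, topic)]
        batch = self.logs[topic][start:start + max_messages]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


broker = Broker()
broker.publish("orders", {"id": 1})
broker.publish("orders", {"id": 2})

# Two independent consumer groups each receive the full stream.
billing = broker.poll("billing", "orders")
analytics = broker.poll("analytics", "orders")

# A second poll by the same group returns nothing new.
again = broker.poll("billing", "orders")
print(billing, analytics, again)
```

Because messages stay in the log after being read, adding a new consumer group requires no change to the producer, which is the property that keeps such architectures maintainable.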