This page details the command-line utility for Embedding Atlas, a tool for exploring large text datasets with metadata. It covers installation, loading data from local files or Hugging Face, visualizing embeddings computed with SentenceTransformers and projected with UMAP, and usage instructions with the available options.
   
   An article discussing the importance of time series databases and data visualization tools like Grafana for managing and interpreting streams of data in various applications.
The author surveys several time series databases (TSDs) and visualization tools, covering their features, advantages, and some limitations. The article also gives an example of a Building Management and Control (BMaC) project that uses InfluxDB and Grafana for data visualization.

| Database    | Description                                                                  | Notable Features                                                                |
|-------------|------------------------------------------------------------------------------|---------------------------------------------------------------------------------|
| InfluxDB    | Partially open source, with version 3 serving as an edge data collector.     | Shard-based storage, compaction levels, time series index, optional retention.  |
| Apache Kudu | Column-based database optimized for multidimensional OLAP workloads.         | Part of the Apache Hadoop ecosystem.                                            |
| Prometheus  | Developed at SoundCloud for metrics monitoring.                              | Written in Go, as are InfluxDB v1 and v2.                                       |
| RRDTool     | All-in-one package: a circular-buffer TSD that also does graphing.           | Language bindings for various programming languages.                            |
| Graphite    | Similar to RRDTool, but renders graphs with a Django-based web application.  | Web-based graphing.                                                             |
| TimescaleDB | Extends PostgreSQL with TSD functionality and optimizations.                 | Supports all typical SQL queries.                                               |

The article also discusses Grafana, a popular tool for building dashboards that visualize time series data, noting that it works with multiple TSDs and SQL databases. It concludes by stressing the importance of understanding your specific needs before choosing a TSD and a visualization solution.
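
The table notes that RRDTool stores data in a circular buffer. As a rough illustration of that idea only (not RRDTool's on-disk format or API), here is a minimal Python sketch of a fixed-capacity round-robin archive; the class `RoundRobinArchive` and its methods are hypothetical names chosen for this example.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (Unix timestamp in seconds, measured value)


class RoundRobinArchive:
    """Hypothetical fixed-capacity circular buffer of samples.

    Once every slot is filled, each new sample overwrites the oldest one,
    so the archive's size never grows no matter how long it runs.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[Optional[Sample]] = [None] * capacity
        self._next = 0   # index of the slot the next sample is written to
        self._count = 0  # number of filled slots (never exceeds capacity)

    def add(self, timestamp: int, value: float) -> None:
        """Store a sample, overwriting the oldest one if the buffer is full."""
        self._slots[self._next] = (timestamp, value)
        self._next = (self._next + 1) % len(self._slots)  # wrap around to slot 0
        self._count = min(self._count + 1, len(self._slots))

    def samples(self) -> List[Sample]:
        """Return the stored samples ordered from oldest to newest."""
        cap = len(self._slots)
        start = (self._next - self._count) % cap  # slot holding the oldest sample
        out: List[Sample] = []
        for i in range(self._count):
            slot = self._slots[(start + i) % cap]
            assert slot is not None  # slots within the filled range are never empty
            out.append(slot)
        return out


archive = RoundRobinArchive(capacity=3)
for ts, val in [(0, 1.0), (60, 2.0), (120, 3.0), (180, 4.0)]:
    archive.add(ts, val)
print(archive.samples())  # [(60, 2.0), (120, 3.0), (180, 4.0)]
```

The fourth sample overwrites the first, which is what keeps a round-robin store at a constant size. RRDTool additionally consolidates older data into coarser archives (for example, averages over longer intervals); this sketch leaves that out.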
   
   A guide to building a front-end data application using Taipy, comparing it to Streamlit and Gradio, and providing a step-by-step implementation of a sales performance dashboard.
   
This article introduces Streamlit, a Python library for building data dashboards, as a way for Python programmers to create graphical front-ends without delving into CSS, HTML, or JavaScript. The author, a seasoned data engineer, explains how Streamlit and similar tools make attractive dashboards easy to build, marking a shift away from traditional tools like Tableau or QuickSight. This piece is the first in a series and focuses on Streamlit; later articles are planned on Gradio and Taipy. The author aims to build similar layouts and functionality in each dashboard using the same data.
   
An exploration of AG Grid, a JavaScript library for building interactive, feature-rich data tables (grids) in web applications, covering its features, its performance, and how it compares with other solutions.
   
A visual representation of papers on arXiv using UMAP and nomic-embed.
   
A map of math articles from arXiv using t-SNE and nomic-embed.
   
   This article details a data-driven exploration of owl species, using Wikipedia data to create a network visualization of owl relationships.
   
This article introduces Path-Swarm and Super-Swarm, new techniques for building swarm charts by arranging circles. The author, Nick Gerend, discusses two primary swarm techniques and some extensions that support rapid visual exploration of data. Written for Towards Data Science.
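
For readers unfamiliar with swarm charts, the basic idea is to place one circle per data point along a value axis and push each circle off the axis just far enough to avoid overlapping the others. Below is a minimal Python sketch of that generic greedy layout. It is not Nick Gerend's Path-Swarm or Super-Swarm technique, and the `beeswarm` function and its parameters are hypothetical names chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def beeswarm(values: Sequence[float], radius: float) -> List[Tuple[float, float]]:
    """Hypothetical greedy swarm layout: one circle per value.

    Each circle's x is its data value; its y is the offset closest to the
    axis (y = 0) at which it does not overlap any circle placed before it.
    Returns (x, y) centres in sorted-value order.
    """
    diameter = 2 * radius
    placed: List[Tuple[float, float]] = []
    for x in sorted(values):
        # Only circles within one diameter horizontally can collide with this one.
        near = [(px, py) for px, py in placed if abs(px - x) < diameter]
        # Candidate heights: on the axis, or just touching a neighbour above/below it.
        candidates = [0.0]
        for px, py in near:
            dy = math.sqrt(diameter ** 2 - (px - x) ** 2)
            candidates += [py + dy, py - dy]
        # Try candidates nearest the axis first; the highest one always fits.
        for y in sorted(candidates, key=abs):
            if all(math.hypot(px - x, py - y) >= diameter - 1e-9 for px, py in near):
                placed.append((float(x), y))
                break
    return placed


print(beeswarm([1.0, 1.0, 1.0, 3.0], radius=0.5))
```

Processing values in sorted order and checking only horizontally nearby circles keeps the logic short; the worst-case cost is quadratic in the number of points, which is acceptable for small datasets.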
   
   eplot is an Emacs package for generating time series charts, plots and bar charts interactively.