Logward is an open-source log collector and viewer designed for small environments like home labs. It offers a modern interface and supports Sigma rules for log detection and alerting.
Lux is a Python library designed to automate data visualization within Pandas DataFrames, streamlining the exploratory data analysis (EDA) process. It automatically generates insightful charts like distributions, correlations, and temporal trends upon displaying a DataFrame, reducing the need for manual plotting code. Users can also save visualizations as interactive HTML reports or export individual charts for further customization using tools like Matplotlib, Seaborn, or Altair. While best suited for Jupyter Notebook environments and smaller datasets, Lux aims to accelerate data understanding and hypothesis building, particularly for learners and researchers.
The author describes a shift in their approach to clustering mixed (numeric and categorical) data, advocating for starting with the simpler Gower distance metric before resorting to more complex embedding techniques like UMAP. They introduce 'Gower Express', an optimized and accelerated implementation of the Gower distance.
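For readers unfamiliar with the metric, here is a minimal, dependency-free sketch of the standard Gower distance: numeric columns contribute a range-normalized absolute difference, categorical columns contribute 0 on a match and 1 otherwise, and the result is the average over columns. This is not the 'Gower Express' implementation; the function name `gower_distance_matrix`, the `numeric_columns` parameter, and the sample records are all hypothetical and chosen only for illustration.

```python
def gower_distance_matrix(rows, numeric_columns):
    """Return the pairwise Gower distance matrix for a list of dict records.

    rows: list of dicts that all share the same keys.
    numeric_columns: set of keys to treat as numeric; all others are categorical.
    """
    columns = list(rows[0].keys())

    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = {}
    for col in numeric_columns:
        vals = [row[col] for row in rows]
        ranges[col] = max(vals) - min(vals)

    def pair_distance(a, b):
        total = 0.0
        for col in columns:
            if col in numeric_columns:
                span = ranges[col]
                # A constant column carries no information, so it contributes 0.
                total += abs(a[col] - b[col]) / span if span else 0.0
            else:
                # Categorical columns: simple mismatch indicator.
                total += 0.0 if a[col] == b[col] else 1.0
        return total / len(columns)

    return [[pair_distance(a, b) for b in rows] for a in rows]


# Made-up records, for illustration only.
records = [
    {"age": 20, "income": 30000, "city": "Paris"},
    {"age": 40, "income": 30000, "city": "Paris"},
    {"age": 60, "income": 50000, "city": "Lyon"},
]
distances = gower_distance_matrix(records, numeric_columns={"age", "income"})
```

The resulting matrix is symmetric with zeros on the diagonal, so it can be passed to any clustering routine that accepts precomputed distances. The pairwise loop is quadratic in the number of rows, which is the cost that optimized implementations aim to reduce.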
Learn how to combine Streamlit, Plotly, DuckDB, and Pandas to build a simple yet intuitive dashboard that visualizes data from a JSON file.
A guide to building a front-end data application using Taipy, comparing it to Streamlit and Gradio, and providing a step-by-step implementation of a sales performance dashboard.
The article uses a WSJ measles heatmap to illustrate how effectively heatmaps can show the impact of vaccines on infectious diseases. It walks through building a custom colormap with Matplotlib’s LinearSegmentedColormap and drawing the heatmap with the pcolormesh function.
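As a rough illustration of the two Matplotlib pieces named above, the sketch below builds a colormap with `LinearSegmentedColormap.from_list` and draws a grid with `pcolormesh`. The color stops, the colormap name, and the data are assumptions: the values are synthetic placeholders, not the measles figures or the palette from the article.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

# Hypothetical color stops; the article's actual palette is not reproduced here.
cmap = LinearSegmentedColormap.from_list(
    "incidence_sketch", ["#e7f0fa", "#7fcdbb", "#fdd835", "#e53935"]
)

# Synthetic placeholder data: rows are regions, columns are years,
# with values that fade over time purely to make the plot readable.
rng = np.random.default_rng(0)
years = np.arange(1950, 1981)
regions = 10
values = rng.random((regions, years.size)) * np.linspace(1.0, 0.1, years.size)

fig, ax = plt.subplots(figsize=(8, 3))
# pcolormesh takes cell edges, so each axis has one more edge than cells.
x_edges = np.arange(years[0], years[-1] + 2)
y_edges = np.arange(regions + 1)
mesh = ax.pcolormesh(x_edges, y_edges, values, cmap=cmap)
fig.colorbar(mesh, ax=ax, label="synthetic incidence")
ax.set_xlabel("year")
ax.set_ylabel("region index")
```

Calling `fig.savefig("heatmap_sketch.png")` afterwards would write the figure to disk. Passing a list of colors to `from_list` spaces the stops evenly; uneven stops can be given as `(position, color)` pairs with positions between 0 and 1.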
TimesFM is a pretrained time-series foundation model developed by Google Research for forecasting. It produces point forecasts for univariate time series with context lengths of up to 512 time points, supports any horizon length, and accepts an optional frequency indicator.
PyStore is a simple (yet powerful) datastore for Pandas dataframes, designed with time-series data in mind. It leverages Pandas, NumPy, Dask, and Parquet (via pyarrow) for efficient data handling.