PhD student Sarah Alnegheimish is developing Orion, an open-source, user-friendly machine learning framework for detecting anomalies in large-scale industrial and operational settings. She focuses on making machine learning systems accessible, transparent, and trustworthy, and is exploring repurposing pre-trained models for anomaly detection.
A machine learning library for unsupervised time series anomaly detection. Orion provides verified ML pipelines to identify rare patterns in time series data.
SigLLM is an extension of the Orion library, built to detect anomalies in time series data using LLMs. It provides two types of pipelines for anomaly detection: Prompter (directly prompting LLMs) and Detector (using LLMs to forecast time series).
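As a toy illustration of the Detector-style idea (forecast the series, then flag points whose forecast error is unusually large), here is a minimal sketch that substitutes a moving-average forecaster for the LLM. The function name, window, and threshold are illustrative assumptions, not SigLLM's actual API:

```python
# Toy forecast-based anomaly detection: predict each point from a moving
# average of the previous `window` points, then flag points whose absolute
# forecast error exceeds `k` standard deviations above the mean error.
# Illustrative only -- SigLLM uses an LLM as the forecaster, not a moving average.

def detect_anomalies(series, window=3, k=2.0):
    errors = []
    for i in range(window, len(series)):
        forecast = sum(series[i - window:i]) / window
        errors.append(abs(series[i] - forecast))
    mean = sum(errors) / len(errors)
    std = (sum((e - mean) ** 2 for e in errors) / len(errors)) ** 0.5
    # Return indices (into the original series) with anomalously large error.
    return [i + window for i, e in enumerate(errors) if e > mean + k * std]

signal = [1.0, 1.1, 0.9, 1.0, 1.1, 9.0, 1.0, 0.9, 1.1, 1.0]
print(detect_anomalies(signal))  # flags the spike at index 5
```

The same two-stage structure (forecast, then threshold the residuals) underlies Orion's verified pipelines as well; SigLLM's contribution is swapping a pre-trained LLM in as the forecasting component.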
A podcast exploring the history of computing, from early machines like the LINC and LGP-30 to more modern devices like the Keypact Micro-VIP and Friden Flexowriter, and the people who restore and preserve them.
Abstract
Optimizing deep learning algorithms currently requires slow, manual derivation, potentially
leaving much performance untapped. Methods like FlashAttention have achieved a 6×
performance improvement over native PyTorch by avoiding unnecessary data transfers, but
its development required three iterations over three years. Automated compilation methods
have consistently lagged behind. This paper extends Neural Circuit Diagrams for deep
learning models to consider resource usage and the distribution of tasks across a GPU
hierarchy. We show how diagrams can use simple relabellings to derive high-level streaming
and tiling optimization strategies along with performance models. We show how this high-
level performance model allows the effects of quantization and multi-level GPU hierarchies
to be readily considered. We develop a methodology for representing intermediate-level
pseudocode with diagrams, allowing hardware-aware algorithms to be derived step-by-step.
Finally, we show how our methodology can be used to better understand existing techniques
like FlashAttention. This work uses a theoretical framework to link assumptions about
GPU behaviour to claims about performance. We aim to lay the groundwork for a scientific
approach to GPU optimization where experiments can address clear hypotheses rather than
post-hoc rationalizations.
>"TL;DR: We unify over 23 methods in contrastive learning, dimensionality reduction, spectral clustering, and supervised learning with a single equation."
>"As the field of representation learning grows, there has been a proliferation of different loss functions to solve different classes of problems. We introduce a single information-theoretic equation that generalizes a large collection of modern loss functions in machine learning. In particular, we introduce a framework that shows that several broad classes of machine learning methods are precisely minimizing an integrated KL divergence between two conditional distributions: the supervisory and learned representations. This viewpoint exposes a hidden information geometry underlying clustering, spectral methods, dimensionality reduction, contrastive learning, and supervised learning. This framework enables the development of new loss functions by combining successful techniques from across the literature. We not only present a wide array of proofs, connecting over 23 different approaches, but we also leverage these theoretical results to create state-of-the-art unsupervised image classifiers that achieve a +8% improvement over the prior state-of-the-art on unsupervised classification on ImageNet-1K. We also demonstrate that I-Con can be used to derive principled debiasing methods which improve contrastive representation learners."
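In the terms the abstract itself uses, the unifying objective can be sketched as an integrated KL divergence between a fixed supervisory conditional distribution $p$ and a learned one $q_\theta$ (the notation here is ours, not necessarily the paper's):

```latex
\mathcal{L}(\theta) \;=\; \int D_{\mathrm{KL}}\!\bigl(\, p(\cdot \mid i) \;\Vert\; q_{\theta}(\cdot \mid i) \,\bigr)\, di
```

Different choices of $p$ and $q_\theta$ are what recover the various clustering, spectral, contrastive, and supervised losses as special cases.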
This paper explores 'TeleAbsence,' a concept extending telepresence to address emotional distance from lost loved ones through poetic encounters with their digital and physical traces, inspired by the Portuguese concept of 'Saudade.' It outlines five design principles – presence of absence, illusory communication, materiality of memory, traces of reflection, and remote time – and explores applications using mediums like poetry, phone, piano, and pen.
A reference manual for the extensible, customizable, self-documenting real-time display editor. This manual corresponds to EMACS version 162.
New genetic research suggests that humans first developed language around 135,000 years ago, with its widespread social use around 100,000 years ago. This study, using data from 15 genetic studies, indicates that language likely began as a cognitive system before becoming crucial for social communication.