This paper explores how reinforcement learning agents can use environmental features, termed artifacts, as a form of external memory. By formalizing this intuition within a mathematical framework, the authors prove that certain observations can reduce the information required to represent an agent's history. Through experiments on spatial navigation tasks using both Linear Q-learning and Deep Q-Networks (DQN), the study demonstrates that observing paths or landmarks allows agents to achieve higher performance with lower internal computational capacity. Notably, this externalized memory emerges implicitly through the agent's sensory stream, without any mechanism explicitly designed for memory use.
- Formalization of artifacts as observations that encode information about the past.
- The Artifact Reduction Theorem, which proves that environmental artifacts can reduce an agent's history-representation requirements.
- Empirical evidence showing reduced internal capacity needs when spatial paths are visible.
- Observation that externalized memory can emerge implicitly in standard RL agents.
- Implications for agent design, suggesting performance gains may come from environment-agent coevolution rather than just scaling parameters.
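The intuition behind the theorem can be sketched with a deliberately tiny, hypothetical example (this is not the paper's code or environment): a one-step "T-maze" in which a random cue determines which arm is rewarded. When an artifact — say, a painted path — keeps the cue visible at the junction, a memoryless tabular learner solves the task; when the junction looks identical regardless of the cue, the same learner is stuck at chance, because solving the task would require internal memory of the cue.

```python
import random

random.seed(42)

def junction_obs(cue, artifact_visible):
    # With the artifact, the cue (e.g. a painted path) is still observable
    # at the junction; without it, both cue conditions look identical.
    return ("junction", cue) if artifact_visible else ("junction",)

def train(artifact_visible, episodes=2000, alpha=0.2, eps=0.1):
    # One-step tabular Q-learning (effectively a contextual bandit).
    q = {}
    for _ in range(episodes):
        cue = random.randrange(2)                    # which arm is rewarded
        obs = junction_obs(cue, artifact_visible)
        qv = q.setdefault(obs, [0.0, 0.0])
        if random.random() < eps:
            a = random.randrange(2)                  # epsilon-greedy exploration
        else:
            a = qv.index(max(qv))
        reward = 1.0 if a == cue else 0.0
        qv[a] += alpha * (reward - qv[a])
    return q

def evaluate(q, artifact_visible, trials=1000):
    wins = 0
    for _ in range(trials):
        cue = random.randrange(2)
        qv = q[junction_obs(cue, artifact_visible)]
        wins += qv.index(max(qv)) == cue
    return wins / trials

q_art = train(artifact_visible=True)
q_no = train(artifact_visible=False)
# With the artifact, success is near 1.0; without it, near chance (0.5).
print(evaluate(q_art, True), evaluate(q_no, False))
```

The artifact does exactly what the paper's summary describes: it moves the history-dependent information (the cue) into the observation, so a policy with no internal state suffices.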
This article details how to run Large Language Models (LLMs) on Intel GPUs using the llama.cpp framework and its new SYCL backend, offering performance improvements and broader hardware support.
A deep dive into the process of LLM inference, covering tokenization, transformer architecture, KV caching, and optimization techniques for efficient text generation.
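The KV-caching idea from that deep dive can be illustrated with a toy single-head attention sketch (random weights stand in for a trained model; no positional encoding or multi-head logic). Each new token's key and value are computed once and appended to a cache, and the incremental result matches recomputing attention over the full prefix at every step:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                      # head dimension (toy size)
Wq, Wk, Wv = rng.normal(size=(3, d, d))    # stand-ins for trained projections

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector.
    s = K @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

xs = rng.normal(size=(5, d))               # stand-in token embeddings

# Incremental decoding with a KV cache: compute each token's key/value once,
# append to the cache, and attend only over cached entries.
K_cache, V_cache, cached_out = [], [], []
for x in xs:
    K_cache.append(x @ Wk)
    V_cache.append(x @ Wv)
    cached_out.append(attend(x @ Wq, np.array(K_cache), np.array(V_cache)))

# Reference: naively recompute all keys/values for the prefix at every step.
full_out = [attend(xs[t] @ Wq, xs[:t + 1] @ Wk, xs[:t + 1] @ Wv)
            for t in range(len(xs))]

assert np.allclose(cached_out, full_out)   # identical outputs, far less compute
```

Real inference engines preallocate the cache as a contiguous tensor rather than rebuilding arrays per step, but the saving is the same: per-token cost becomes linear in the prefix length instead of quadratic.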
A unified memory stack that functions as a memristor as well as a ferroelectric capacitor is reported, enabling both energy-efficient inference and learning at the edge.
OpenAI releases gpt-oss-120b and gpt-oss-20b, two state-of-the-art open-weight language models that deliver strong real-world performance at low cost. They outperform similarly sized open models on reasoning tasks and are optimized for efficient deployment.
Running GenAI models is easy. Scaling them to thousands of users, not so much. This guide details avenues for scaling AI workloads from proofs of concept to production-ready deployments, covering API integration, on-prem deployment considerations, hardware requirements, and tools like vLLM and Nvidia NIMs.
This Space demonstrates a simple method for embedding text using an LLM (Large Language Model) via the Hugging Face Inference API. It showcases how to convert text into numerical vector representations, useful for semantic search and similarity comparisons.
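Once text has been converted to vectors (by the Inference API or any embedding model), similarity search reduces to cosine similarity and ranking. A minimal sketch, using tiny placeholder vectors in place of real embeddings (the vector values and `top_k` helper are illustrative, not from the Space):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity: dot product of the two vectors, normalized by length.
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec, corpus_vecs, k=2):
    # Rank corpus vectors by similarity to the query (semantic search core).
    sims = [cosine_sim(query_vec, v) for v in corpus_vecs]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]

# Toy 3-d vectors standing in for real, high-dimensional embeddings:
corpus = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [0.0, 1.0, 0.9]]
query = [1.0, 0.05, 0.05]
print(top_k(query, corpus))   # the two vectors pointing the same way rank first
```

In practice the embedding dimension is in the hundreds or thousands, and many models return vectors that are already L2-normalized, in which case cosine similarity is just a dot product.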
NVIDIA DGX Spark is a desktop-friendly AI supercomputer powered by the NVIDIA GB10 Grace Blackwell Superchip, delivering 1000 AI TOPS of performance with 128GB of memory. It is designed for prototyping, fine-tuning, and inference of large AI models.
The article discusses the validity of using Random Forest variable importance to identify causal links in data with a binary outcome. It contrasts this method with fitting a Logistic Regression model and examining its coefficients. The discussion highlights the difficulty of extracting causality from observational data without controlled experiments, emphasizing the importance of domain knowledge and the use of partial dependence plots for interpreting model results.
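The core pitfall can be shown with a small simulation (my illustration, not from the article): a hidden confounder drives both a feature and the binary outcome, so the feature is strongly predictive — and would score highly on both variable importance and a logistic coefficient — despite having no causal effect. Conditioning on the confounder, which requires knowing it exists (domain knowledge), makes the association largely vanish:

```python
import random

random.seed(1)

# Hidden confounder Z drives both feature X and binary outcome Y.
# X has NO causal effect on Y, yet it predicts Y strongly.
n = 20000
Z = [random.random() for _ in range(n)]
X = [z + 0.1 * random.gauss(0, 1) for z in Z]                      # X caused by Z
Y = [1 if z + 0.1 * random.gauss(0, 1) > 0.5 else 0 for z in Z]    # Y caused by Z only

def corr(a, b):
    # Pearson correlation, computed from scratch.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

print(round(corr(X, Y), 2))   # strong association despite no causal link

# Condition on the confounder: within a narrow slice of Z,
# the X-Y association largely disappears.
idx = [i for i in range(n) if 0.45 < Z[i] < 0.55]
print(round(corr([X[i] for i in idx], [Y[i] for i in idx]), 2))
```

No predictive-importance score, whether from a Random Forest or a logistic model, can distinguish these two situations from the observational data alone — which is exactly the article's point.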
In this article, we explore how to deploy and manage machine learning models using Google Kubernetes Engine (GKE), Google AI Platform, and TensorFlow Serving. We will cover the steps to create a machine learning model and deploy it on a Kubernetes cluster for inference.
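The serving step on GKE typically boils down to a Deployment running the stock `tensorflow/serving` image plus a Service exposing it. A minimal sketch, assuming the SavedModel is already exported — the names, replica count, and `emptyDir` volume are placeholders (a real setup would mount the model from a GCS bucket or persistent volume), while the `MODEL_NAME` variable, `/models/<name>` path, and REST port 8501 follow TensorFlow Serving's documented conventions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving            # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels: {app: tf-serving}
  template:
    metadata:
      labels: {app: tf-serving}
    spec:
      containers:
      - name: tf-serving
        image: tensorflow/serving
        env:
        - name: MODEL_NAME
          value: my_model     # placeholder; must match the mount path below
        ports:
        - containerPort: 8501 # TensorFlow Serving REST API port
        volumeMounts:
        - name: model-volume
          mountPath: /models/my_model
      volumes:
      - name: model-volume
        emptyDir: {}          # placeholder; use a GCS-backed or persistent volume
---
apiVersion: v1
kind: Service
metadata:
  name: tf-serving
spec:
  type: LoadBalancer
  selector: {app: tf-serving}
  ports:
  - port: 8501
    targetPort: 8501
```

With this in place, predictions are served over REST at `/v1/models/my_model:predict`, and scaling is a matter of adjusting `replicas` or attaching a HorizontalPodAutoscaler.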