SemanticScuttle - klotz.me » Tags: transformers+machine learning+deep learning

Tags: transformers* + machine learning* + deep learning*

0 bookmark(s) - Sort by: Date ↓ / Title /

Chess Llama - Training a tiny Llama model to play chess

This blog post details the training of 'Chess Llama', a small Llama model designed to play chess. It covers the inspiration behind the project (Chess GPT), the dataset used (Lichess Elite database), the training process using Huggingface Transformers, and the model's performance (Elo rating of 1350-1400). It also includes links to try the model and view the source code.

2025-07-21 Tags: chess, llama, llm, machine learning, artificial intelligence, deep learning, transformers, huggingface, chessgpt, uci, pgn by klotz

The Big LLM Architecture Comparison

A detailed comparison of the architectures of recent large language models (LLMs) including DeepSeek-V3, OLMo 2, Gemma 3, Mistral Small 3.1, Llama 4, Qwen3, SmolLM3, and Kimi 2, focusing on key design choices and their impact on performance and efficiency.

2025-07-19 Tags: llm, large language models, deep learning, ai, architecture, deepseek, olmo, gemma, mistral, llama, qwen, smollm, kimi, moe, attention, transformers by klotz

Hands-On Attention Mechanism for Time Series Classification, with Python

This article demonstrates how to use the attention mechanism in a time series classification framework, specifically for classifying normal sine waves versus 'modified' (flattened) sine waves. It details the data generation, model implementation (using a bidirectional LSTM with attention), and results, achieving high accuracy.

2025-06-01 Tags: deep learning, python, time series, transformers, attention, lstm, classification, production engineering, observability by klotz

Understanding Attention in LLMs

The attention mechanism in Large Language Models (LLMs) helps derive the meaning of a word from its context. This involves encoding words as multi-dimensional vectors, calculating query and key vectors, and using attention weights to adjust the embedding based on contextual relevance.

2025-03-07 Tags: attention, llm, machine-learning, neural networks, nlp, transformers by klotz

Deep Dive into Self-Attention by Hand✍︎

Explore the intricacies of the attention mechanism responsible for fueling the transformers.

2025-02-04 Tags: transformers, self-attention, neural networks, llm, machine learning by klotz

New Trends in LLM Architecture

Discusses the trends in Large Language Models (LLMs) architecture, including the rise of more GPU, more weights, more tokens, energy-efficient implementations, the role of LLM routers, and the need for better evaluation metrics, faster fine-tuning, and self-tuning.

2024-06-01 Tags: llm, machine learning, deep learning, transformers, self-tuning, evaluation by klotz

Mastering LLM Techniques: Training

Delving into transformer networks

2023-11-18 Tags: nvidia, llm, training, transformers, deep learning by klotz

Pretrained Transformers as Universal Computation Engines – The Berkeley Artificial Intelligence Research Blog

Pretrained Transformers as Universal Computation Engines

2022-04-27 Tags: transformers, universal computation, machine learning, uc berkeley, bair, deep learning, transfer learning, ontology by klotz

AI Weekly: Researchers attempt an open source alternative to GitHub's Copilot | VentureBeat

2021-09-25 Tags: transformers, fine-tune, deep learning, gpt-3, huggingface, codex by klotz

zero-shot-pipeline-sentiment.ipynb - Colaboratory