klotz: transformer models


  1. Learn how to label text without task-specific training data by using zero-shot text classification. This guide explains how pretrained transformer models, such as BART fine-tuned for natural language inference (NLI), reframe classification as an entailment task in which each candidate label is rewritten as a natural-language hypothesis.
    Key topics include:
    * The core concept of zero-shot classification and its advantages for rapid prototyping.
    * Using the Hugging Face transformers pipeline with the facebook/bart-large-mnli model.
    * Implementing multi-label classification for texts belonging to multiple categories.
    * Improving accuracy through custom hypothesis template tuning and clear label wording.
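The scoring step behind that pipeline can be sketched without downloading the model. The snippet below uses made-up stand-in entailment/contradiction logits (not real facebook/bart-large-mnli output) to show how per-label NLI logits become classification scores in both single-label and multi-label mode:

```python
import math

# Hypothetical NLI logits for hypotheses built from a template such as
# "This example is about {label}." -- stand-in values, not model output.
LABELS = ["sports", "politics", "technology"]
ENTAILMENT = [2.0, -1.0, 0.5]
CONTRADICTION = [-2.0, 1.0, 0.0]

def zero_shot_scores(entail, contra, multi_label=False):
    """Convert per-label NLI logits into classification scores."""
    if multi_label:
        # Each label is judged independently: softmax over
        # (contradiction, entailment) for that label alone.
        return [math.exp(e) / (math.exp(e) + math.exp(c))
                for e, c in zip(entail, contra)]
    # Single-label: softmax of entailment logits across all labels,
    # so the scores sum to 1 and compete with each other.
    z = [math.exp(e) for e in entail]
    return [v / sum(z) for v in z]

scores = zero_shot_scores(ENTAILMENT, CONTRADICTION)
multi = zero_shot_scores(ENTAILMENT, CONTRADICTION, multi_label=True)
```

In single-label mode "sports" wins because its entailment logit dominates the softmax; in multi-label mode each score is independent, so several labels can exceed 0.5 at once.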
  2. A comprehensive curated collection of Large Language Model (LLM) architecture figures and technical fact sheets. This gallery provides a visual and data-driven overview of modern model designs, ranging from classic dense architectures like GPT-2 to advanced sparse Mixture-of-Experts (MoE) systems and hybrid attention models. Users can explore detailed specifications including parameter scales, context windows, attention mechanisms, and intelligence indices for various prominent models.
    Key features include:
    * Detailed architecture fact sheets for a wide array of models such as Llama, DeepSeek, Qwen, Gemma, and Mistral.
    * An architecture diff tool to compare two different model designs side-by-side.
    * Comparative analysis across dense, MoE, MLA, and hybrid decoder families.
    * Links to original source articles and technical reports for deeper research.
  3. This paper demonstrates that the inference operations of several open-weight large language models (LLMs) can be mapped to an exactly equivalent linear system for an input sequence. It explores the use of the 'detached Jacobian' to interpret semantic concepts within LLMs and potentially steer next-token prediction.
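The exact-linearity claim can be illustrated at toy scale. For a bias-free ReLU network, "detaching" (freezing) the activation pattern at a given input collapses the network into a single linear map whose matrix reproduces the forward pass exactly; this is only a hypothetical miniature of the same idea, not the paper's full construction for LLM inference:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 4))  # toy weights, not from any real LLM
W2 = rng.standard_normal((3, 8))

def forward(x):
    # Bias-free two-layer ReLU network: piecewise linear in x.
    return W2 @ np.maximum(W1 @ x, 0.0)

x = rng.standard_normal(4)

# "Detach" the ReLU gating pattern at x: treat the 0/1 mask as a constant.
mask = (W1 @ x > 0).astype(float)

# With the mask frozen, the network is one exact linear system J for this x.
J = W2 @ np.diag(mask) @ W1
exact = np.allclose(forward(x), J @ x)
```

Because ReLU(W1 @ x) equals mask * (W1 @ x) elementwise at this input, J @ x matches the nonlinear forward pass exactly, which is the sense in which inference is "equivalent to a linear system" for a fixed input.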
  4. A beginner's guide to Hugging Face Transformers, a library that provides access to thousands of pre-trained transformer models for natural language processing, computer vision, and more.
    The guide covers the basics of Hugging Face Transformers, including what it is, how it works, and how to use it, with a simple example of running Microsoft's Phi-2 LLM in a notebook.
    It is designed for non-technical readers who want to understand open-source machine learning without prior knowledge of Python or machine learning.


SemanticScuttle - klotz.me: Tags: transformer models
