klotz: deep learning*


  1. OpenMythos is an open-source PyTorch project by Kye Gomez that proposes a theoretical reconstruction of Anthropic's Claude Mythos architecture. Instead of stacking standard transformer layers, it uses a Recurrent-Depth Transformer (RDT) design in which the same weights are applied over multiple iterations, increasing reasoning depth at inference time. By combining Mixture-of-Experts with Multi-Latent Attention and stability constraints, the 770M-parameter model reportedly matches the performance of a 1.3B-parameter standard transformer.

    * open-source PyTorch reconstruction of claude mythos
    * proposes recurrent-depth transformer architecture
    * reasoning depth scales via inference-time loops rather than parameter count
    * uses mixture-of-experts for domain breadth
    * implements multi-latent attention to reduce memory usage
    * employs lti injection and adaptive computation time for stability
    * achieves 1.3b parameter performance with only 770m parameters
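    The weight-tied loop described above can be sketched in a few lines. This is an illustrative toy, not OpenMythos's actual API: the function name, the residual update, and the use of plain NumPy are all assumptions.

```python
import numpy as np

def rdt_forward(x, W, n_loops):
    """Weight-tied recurrent depth: the SAME weight matrix W is applied
    n_loops times, so reasoning depth grows at inference time without
    adding parameters (toy sketch of the RDT idea, not real code)."""
    h = x
    for _ in range(n_loops):
        # one "layer" pass reusing the shared weights, with a residual
        # connection so repeated iteration stays numerically stable
        h = h + np.tanh(h @ W)
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))             # batch of 4 token states, dim 8
W = rng.normal(scale=0.1, size=(8, 8))  # one shared weight matrix

shallow = rdt_forward(x, W, n_loops=2)  # cheap inference
deep = rdt_forward(x, W, n_loops=16)    # deeper reasoning, same params
```

    The same parameter count serves both calls; only the inference-time loop count changes.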
  2. Personal website of Jamie Simon, a scientist specializing in fundamental theory for deep learning. He runs a research lab at the Redwood Center at UC Berkeley with funding from Imbue and recently completed his PhD under Mike DeWeese. The site serves as a hub for his scientific research, personal blog posts regarding science and life adventures, and custom-made puzzles.
    Main topics:
    * Deep learning fundamental theory
    * Research publications
    * Science and lifestyle blog
    * Puzzle creation
  3. A comprehensive curated collection of Large Language Model (LLM) architecture figures and technical fact sheets. This gallery provides a visual and data-driven overview of modern model designs, ranging from classic dense architectures like GPT-2 to advanced sparse Mixture-of-Experts (MoE) systems and hybrid attention models. Users can explore detailed specifications including parameter scales, context windows, attention mechanisms, and intelligence indices for various prominent models.
    Key features include:
    * Detailed architecture fact sheets for a wide array of models such as Llama, DeepSeek, Qwen, Gemma, and Mistral.
    * An architecture diff tool to compare two different model designs side-by-side.
    * Comparative analysis across dense, MoE, MLA, and hybrid decoder families.
    * Links to original source articles and technical reports for deeper research.
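    The side-by-side comparison behind the diff tool amounts to a field-by-field diff of two fact sheets. A minimal sketch (the spec values below are illustrative stand-ins, not the gallery's data):

```python
def spec_diff(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field where
    two model fact sheets disagree (toy sketch of an architecture diff)."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# hypothetical fact sheets for two well-known designs
gpt2 = {"type": "dense", "attention": "MHA", "params": "1.5B"}
deepseek = {"type": "MoE", "attention": "MLA", "params": "671B"}
diff = spec_diff(gpt2, deepseek)  # only the differing fields remain
```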
  4. This paper explores how reinforcement learning agents can use environmental features, termed artifacts, to function as external memory. By formalizing this intuition within a mathematical framework, the authors prove that certain observations can reduce the information required to represent an agent's history. Through experiments with spatial navigation tasks using both Linear Q-learning and Deep Q-Networks (DQN), the study demonstrates that observing paths or landmarks allows agents to achieve higher performance with lower internal computational capacity. Notably, this effect of externalized memory emerges unintentionally through the agent's sensory stream without explicit design for memory usage.

    - Formalization of artifacts as observations that encode information about the past.
    - The Artifact Reduction Theorem proving environmental artifacts reduce history representation requirements.
    - Empirical evidence showing reduced internal capacity needs when spatial paths are visible.
    - Observation that externalized memory can emerge implicitly in standard RL agents.
    - Implications for agent design, suggesting performance gains may come from environment-agent coevolution rather than just scaling parameters.
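    The core intuition, an agent leaving marks that re-enter its own observation, can be sketched as a toy gridworld. Everything here is hypothetical code illustrating the idea, not the paper's implementation:

```python
import numpy as np

class TrailGrid:
    """Toy environment where the agent leaves a visible trail. The trail
    is an "artifact": history gets encoded in the world, so even a
    memoryless policy can tell which cells it has already visited."""

    def __init__(self, size=5):
        self.size = size
        self.marks = np.zeros((size, size), dtype=bool)
        self.pos = (0, 0)

    def step(self, move):
        self.marks[self.pos] = True  # artifact: mark the departed cell
        r, c = self.pos
        dr, dc = move
        self.pos = (min(max(r + dr, 0), self.size - 1),
                    min(max(c + dc, 0), self.size - 1))
        return self.observe()

    def observe(self):
        # the observation includes the trail, so past positions need not
        # be stored in the agent's internal state
        return self.pos, self.marks.copy()

env = TrailGrid()
pos, marks = env.step((0, 1))  # move right; (0, 0) is now marked
```

    The agent never stores its history internally; the environment does it instead, which is the externalized-memory effect the paper formalizes.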
  5. This is an open, unconventional textbook covering mathematics, computing, and artificial intelligence from foundational principles. It's designed for practitioners seeking a deep understanding, moving beyond exam preparation and focusing on real-world application. The author, drawing from years of experience in AI/ML, has compiled notes that prioritize intuition, context, and clear explanations, avoiding dense notation and outdated material.
    The compendium covers a broad range of topics, from vectors and matrices to machine learning, computer vision, and multimodal learning, with future chapters planned for areas like data structures and AI inference.
  6. Qwen3-Coder-Next is an 80-billion-parameter language model that activates only 3 billion parameters during inference, achieving strong coding capabilities through agentic training with verifiable task synthesis and reinforcement learning. It is an open-weight model specialized for coding agents, and both base and instruction-tuned versions are released to support research and real-world coding agent development.
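    The total-versus-active parameter split comes from sparse Mixture-of-Experts routing: a router picks a few experts per token, so only a small fraction of the weights run on any step. A minimal sketch of top-k routing (illustrative only, not Qwen's code):

```python
import numpy as np

def moe_layer(x, experts, gate_W, top_k=2):
    """Sparse MoE: route input x to the top_k highest-scoring experts
    and mix their outputs; the other experts' weights stay untouched."""
    logits = x @ gate_W                   # router score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the top_k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the chosen few
    # only the selected experts compute anything this step
    return sum(w * np.tanh(x @ experts[i]) for i, w in zip(top, weights))

rng = np.random.default_rng(1)
d, n_experts = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate_W = rng.normal(size=(d, n_experts))
x = rng.normal(size=d)
y = moe_layer(x, experts, gate_W, top_k=2)  # 2 of 16 experts active
```

    Scaled up, the same scheme is how a model can hold 80B parameters in total while activating only a few billion per token.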
  7. NVIDIA GTC is NVIDIA's flagship AI conference and exhibition, covering the latest advancements in AI, deep learning, and accelerated computing, with keynote speakers, sessions, workshops, and an exhibit hall.
  8. This article explores how agentic AI can revolutionize deep learning experimentation by automating tasks like hyperparameter tuning, architecture search, and data augmentation. It delves into the core concepts, benefits, and practical considerations of using agentic systems to accelerate and improve the deep learning workflow.
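    An agentic tuning loop reduces to propose, evaluate, refine. A minimal sketch under a stand-in objective (the objective and all names are hypothetical; a real system would launch a training job where `evaluate` is called):

```python
import random

def evaluate(lr):
    # stand-in for a full training run: validation score peaks at lr=0.01
    return -abs(lr - 0.01)

def agent_tune(n_rounds=20, seed=0):
    """Toy agentic loop: propose a config near the current best, observe
    its score, and keep it only if it improves on the incumbent."""
    rng = random.Random(seed)
    best_lr, best_score = 0.1, evaluate(0.1)
    for _ in range(n_rounds):
        # propose a perturbation of the current best learning rate
        lr = max(1e-5, best_lr * rng.uniform(0.5, 1.5))
        score = evaluate(lr)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr

tuned = agent_tune()  # never worse than the starting configuration
```

    Real agentic systems replace the random proposal with an LLM or learned policy, but the observe-and-refine loop is the same.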
  9. SpiderPi Pro is an advanced hexapod robot with AI vision, powered by a Raspberry Pi. It features intelligent serial bus servos with a torque rating of 20 kg·cm, a 5-DOF robot arm, a glowing ultrasonic sensor, an IMU, and a dot-matrix module, and can be programmed in Python. SpiderPi Pro serves as an ideal platform for research in hexapod motion control, machine vision, OpenCV, deep learning, and related fields.
  10. A curated reading list for those starting to learn about Large Language Models (LLMs), covering foundational concepts, practical applications, and future trends, updated for 2026.


About - Propulsed by SemanticScuttle