Tags: nlp


  1. Andrej Karpathy's recommended paper reading list, covering key aspects of Large Language Models (LLMs): attention mechanisms, unsupervised multi-task learning (GPT-2), instruction-following models (InstructGPT), LLaMA, reinforcement learning from AI feedback (RLAIF), and early experiments with GPT-4. It offers insight into significant research developments in LLMs and their role in the AI landscape, benefiting both novice and experienced AI enthusiasts.
  2. - Challenges in measuring similarity between unstructured text data, like movie descriptions.
    - Simple NLP methods may not yield meaningful results; a controlled vocabulary is therefore proposed.
    - Using an LLM, a genre list is generated for each movie title, which improves the similarity model.
    - A function finds the most similar movies to a given title based on cosine similarity scores (a minimal sketch follows this entry).
    - Network visualization highlights clusters of genres linked via movies, showing potential improvements for recommender systems.
    2024-02-10 by klotz
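A minimal sketch of the similarity lookup described in that entry, assuming the LLM has already assigned each title a genre list from a controlled vocabulary; the `GENRES` vocabulary and the `movies` dictionary are hypothetical stand-ins for the article's data.

```python
import numpy as np

# Hypothetical controlled vocabulary and LLM-assigned genres per title.
GENRES = ["action", "comedy", "drama", "romance", "sci-fi"]
movies = {
    "Alien":        ["action", "sci-fi"],
    "The Matrix":   ["action", "sci-fi"],
    "Notting Hill": ["comedy", "romance"],
}

def to_vector(genres: list[str]) -> np.ndarray:
    """Encode a genre list as a binary vector over the controlled vocabulary."""
    return np.array([1.0 if g in genres else 0.0 for g in GENRES])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(title: str, k: int = 5) -> list[tuple[str, float]]:
    """Rank all other movies by cosine similarity to the given title."""
    query = to_vector(movies[title])
    scores = [
        (other, cosine_similarity(query, to_vector(genres)))
        for other, genres in movies.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

print(most_similar("Alien"))  # [('The Matrix', 1.0), ('Notting Hill', 0.0)]
```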
  3. - Embeddings transform words and sentences into sequences of numbers for computers to understand language.
    - This technology powers tools like Siri, Alexa, Google Translate, and generative AI systems like ChatGPT, Bard, and DALL-E.
    - In the early days, embeddings were crafted by hand, which was time-consuming and couldn't adapt to language nuances easily.
    - The 3D hand-crafted embedding app provides an interactive experience to understand this concept.
    - The star visualization method offers an intuitive way to understand word embeddings.
    - Machine learning models like Word2Vec and GloVe revolutionized the generation of word embeddings from large text datasets (a minimal sketch of exploring such embeddings follows this list).
    - Universal Sentence Encoder (USE) extends the concept of word embeddings to entire sentences.
    - TensorFlow Projector is an advanced tool to interactively explore high-dimensional data like word and sentence embeddings.
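A minimal sketch of exploring pre-trained word embeddings like those the entry describes, assuming gensim and its downloadable GloVe vectors are available; the model name is one of gensim's published datasets, and the query words are illustrative.

```python
import gensim.downloader as api

# Download a small set of pre-trained GloVe vectors (~66 MB).
model = api.load("glove-wiki-gigaword-50")

# Each word maps to a 50-dimensional vector.
print(model["king"].shape)  # (50,)

# Nearest neighbors by cosine similarity in embedding space.
for word, score in model.most_similar("king", topn=5):
    print(f"{word}: {score:.3f}")

# The classic analogy: king - man + woman ≈ queen.
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```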
  4. With deep learning, the ROI on clean, high-quality data is immense, and this is realized in every phase of training. For context, the era right before BERT in the text-classification world was one where you wanted an abundance of data, even at the expense of quality. It was more important to have representation via examples than for the examples to be perfect, because many AI systems did not use pre-trained embeddings (or the available ones weren't any good) that a model could leverage for practical generalizability. In 2018, BERT was a breakthrough for downstream text tasks (a minimal sketch of leveraging a pre-trained model follows this entry).
    2023-11-11 by klotz
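A minimal sketch of the pre-trained-model approach that entry describes, assuming the Hugging Face transformers library; the checkpoint name is an illustrative BERT-family model fine-tuned for sentiment, not one named in the bookmark.

```python
from transformers import pipeline

# Load a sentiment classifier fine-tuned from a pre-trained
# BERT-family model; the checkpoint name is an assumption here.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Because the pre-trained representations generalize, far fewer (but
# cleaner) labeled examples are needed downstream than in the pre-BERT era.
print(classifier("The data quality made all the difference."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```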
  5. LangChain has many advanced retrieval methods to help address these challenges.
    - Multi-representation indexing: create a document representation (like a summary) that is well suited for retrieval (covered using the Multi Vector Retriever in a blog post from last week).
    - Query transformation: this post reviews a few approaches to transforming human questions in order to improve retrieval (a minimal sketch follows this entry).
    - Query construction: convert a human question into a particular query syntax or language, to be covered in a future post.
    2023-10-26 by klotz
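A minimal sketch of the query-transformation idea, assuming LangChain's MultiQueryRetriever (a real LangChain class, though its import path has shifted across versions) together with illustrative OpenAI and Chroma components; the documents and the question are made up.

```python
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Build a small vector store over illustrative documents.
docs = [
    "LangChain supports multi-representation indexing.",
    "Query transformation rewrites a user question into several variants.",
]
vectorstore = Chroma.from_texts(docs, embedding=OpenAIEmbeddings())

# The retriever asks an LLM to generate multiple rephrasings of the
# question, runs each against the vector store, and merges the results.
retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=ChatOpenAI(temperature=0),
)

results = retriever.invoke("How can I improve retrieval for vague questions?")
for doc in results:
    print(doc.page_content)
```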


