klotz: text*


  1. - Measuring similarity between unstructured text data, such as movie descriptions, is difficult.
    - Simple NLP methods may not yield meaningful results, so a controlled vocabulary is proposed.
    - An LLM generates a genre list for each movie title, which improves the similarity model.
    - A function finds the movies most similar to a given title based on cosine similarity scores.
    - A network visualization highlights clusters of genres linked via movies, showing potential improvements for recommender systems.
    2024-02-10 by klotz
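
The similarity lookup described in this entry can be sketched roughly as follows. This is a minimal illustration, not the linked article's code: the movie titles, the genre lists (standing in for what an LLM might return), and the helper names `genre_vector`, `cosine_similarity`, and `most_similar` are all assumptions made for the example.

```python
import math

# Hypothetical genre lists, standing in for LLM-generated output per title.
MOVIE_GENRES = {
    "Alien": ["horror", "sci-fi", "thriller"],
    "Aliens": ["action", "horror", "sci-fi"],
    "Blade Runner": ["noir", "sci-fi"],
    "Notting Hill": ["comedy", "romance"],
}

# The controlled vocabulary: every genre that appears, in a fixed order.
VOCABULARY = sorted({g for genres in MOVIE_GENRES.values() for g in genres})


def genre_vector(genres):
    """Encode a genre list as a multi-hot vector over the vocabulary."""
    return [1.0 if g in genres else 0.0 for g in VOCABULARY]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(title, top_n=3):
    """Return up to top_n (title, score) pairs, highest score first."""
    query = genre_vector(MOVIE_GENRES[title])
    scores = [
        (other, cosine_similarity(query, genre_vector(genres)))
        for other, genres in MOVIE_GENRES.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for other, score in most_similar("Alien"):
        print(f"{other}: {score:.3f}")
```

Because every movie is described with the same small vocabulary, two titles score highly only when they share genres, which is the point of replacing free-text descriptions with a controlled list.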
  2. The TextWrapper class (from Python's textwrap module) wraps long pieces of text into multiple shorter lines, with a configurable indent for the first line and another for all subsequent lines.
    2024-02-07 by klotz
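
A short sketch of the indent behavior, assuming the entry refers to `textwrap.TextWrapper` in the Python standard library. The sample text, the width of 40, and the indent strings are arbitrary choices for illustration.

```python
import textwrap

text = (
    "The TextWrapper class wraps long pieces of text "
    "into multiple shorter lines."
)

# initial_indent prefixes the first line; subsequent_indent prefixes the rest.
# Both count toward the width limit.
wrapper = textwrap.TextWrapper(
    width=40,
    initial_indent="* ",
    subsequent_indent="  ",
)

lines = wrapper.wrap(text)  # list of wrapped lines, without trailing newlines

if __name__ == "__main__":
    print("\n".join(lines))
```

`wrapper.fill(text)` returns the same result joined into a single string, which is convenient when the output goes straight to `print`.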
  3. - Embeddings transform words and sentences into sequences of numbers for computers to understand language.
    - This technology powers tools like Siri, Alexa, Google Translate, and generative AI systems like ChatGPT, Bard, and DALL-E.
    - In the early days, embeddings were crafted by hand, which was time-consuming and couldn't adapt to language nuances easily.
    - The 3D hand-crafted embedding app provides an interactive experience to understand this concept.
    - The star visualization method offers an intuitive way to understand word embeddings.
    - Machine learning models like Word2Vec and GloVe revolutionized the generation of word embeddings from large text datasets.
    - Universal Sentence Encoder (USE) extends the concept of word embeddings to entire sentences.
    - TensorFlow Projector is an advanced tool to interactively explore high-dimensional data like word and sentence embeddings.
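
To make the idea of hand-crafted embeddings concrete, here is a small sketch. The three axes (royalty, masculinity, age), the word list, and every coordinate are invented for this example and are not taken from the linked article; `nearest` is likewise a hypothetical helper.

```python
import math

# Hypothetical hand-crafted 3D embeddings: (royalty, masculinity, age).
# Each value is picked by hand, which is exactly why this does not scale.
EMBEDDINGS = {
    "king": (1.0, 1.0, 0.8),
    "queen": (1.0, 0.0, 0.8),
    "prince": (1.0, 1.0, 0.2),
    "man": (0.0, 1.0, 0.7),
    "woman": (0.0, 0.0, 0.7),
    "boy": (0.0, 1.0, 0.1),
}


def nearest(word):
    """Return the other word whose vector is closest in Euclidean distance."""
    target = EMBEDDINGS[word]
    candidates = (w for w in EMBEDDINGS if w != word)
    return min(candidates, key=lambda w: math.dist(target, EMBEDDINGS[w]))


if __name__ == "__main__":
    for word in ("king", "queen"):
        print(word, "->", nearest(word))
```

Learned models such as Word2Vec and GloVe replace the hand-picked coordinates with vectors fitted to large text corpora, typically with hundreds of dimensions rather than three.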
  4. Browsh is a fully modern text-based browser. It renders anything that a modern browser can: HTML5, CSS3, JS, video, and even WebGL. Its main purpose is to run on a remote server and be accessed via SSH/Mosh or the in-browser HTML service, which significantly reduces bandwidth use and so both increases browsing speed and lowers bandwidth costs.
    2023-12-05 by klotz
  5. RETVec is a state-of-the-art text vectorizer that works directly on text inputs to create resilient classification models. Models trained with RETVec achieve better classification performance with fewer parameters and are more resilient to adversarial attacks and typos, as reported in the RETVec paper.
  6. Google is countering with RETVec (Resilient & Efficient Text Vectorizer). Open sourced by Google Research, this approach “helps models achieve state-of-the-art classification performance and drastically reduces computational cost,” while supporting “every language and all UTF-8 characters without the need for text preprocessing.” This makes it well suited to on-device, web, and other large-scale use cases.
    2023-12-04 by klotz
