Tags: data quality* + llm*


  1. Microsoft's Phi-4-Reasoning-Vision-15B model challenges the trend of ever-larger AI models by demonstrating strong reasoning capabilities with a comparatively compact size. Trained on curated reasoning data, it aims to achieve performance without the massive compute costs associated with frontier models. The model supports multimodal tasks, combining text and image understanding, and offers flexible reasoning modes for different workloads. This research highlights the importance of data quality and training strategy, suggesting that smarter training techniques can be as impactful as simply increasing model size, particularly for AI agents and practical deployments.
  2. As a quick refresher, the Data Dirtiness Score estimates the expected proportion of cells in a data set that contain errors. Here are the key hypotheses behind this metric:

    Data errors are related to violated constraints.
    If there are no expectations, there is no effect on the score.
    Data problems can be pinpointed to specific cells.
    Each data error is assigned a confidence score.
    Every cell has an equal impact on the overall score.
    2024-03-23 by klotz
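    The hypotheses above can be sketched as a small function. This is a minimal illustration, not the article's actual implementation: the cell-to-violation mapping, the `dirtiness_score` name, and the rule for combining several confidence scores on one cell (treating them as independent) are all assumptions made here for clarity.

    ```python
    from typing import Dict, List, Tuple

    def dirtiness_score(
        shape: Tuple[int, int],
        violations: Dict[Tuple[int, str], List[float]],
    ) -> float:
        """Expected proportion of erroneous cells in a data set.

        shape      -- (number of rows, number of columns)
        violations -- maps a cell (row index, column name) to the
                      confidence scores of constraint violations
                      pinpointed to that cell (hypothesis: data
                      problems can be localized to specific cells).
        """
        n_rows, n_cols = shape
        total_cells = n_rows * n_cols
        expected_errors = 0.0
        for confidences in violations.values():
            # Assumption: independent violations, so
            # P(cell erroneous) = 1 - product of (1 - confidence).
            p_clean = 1.0
            for c in confidences:
                p_clean *= 1.0 - c
            expected_errors += 1.0 - p_clean
        # Cells with no violated expectations contribute 0,
        # and every cell carries equal weight in the average.
        return expected_errors / total_cells
    ```

    For example, on a 2x2 data set where one cell is flagged with confidence 1.0, the score is 0.25; with no expectations at all it is 0.0, matching the hypotheses listed above.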


