klotz: evaluation metrics + adam tauman kalai


  1. This paper argues that hallucinations in large language models (LLMs) stem not from flawed training data but from how the models are trained and evaluated: standard benchmarks reward guessing over admitting uncertainty, so the resulting errors are statistically predictable. The authors reduce the problem to binary classification (is a given output valid or not?) and demonstrate a direct link between the misclassification rate of that classifier and the model's hallucination rate. Their proposed fix is a shift in evaluation metrics, away from rewarding overconfident guesses and toward crediting calibrated uncertainty, to build more trustworthy models; the sketch below illustrates the incentive gap.
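A minimal sketch (hypothetical scoring functions, not code from the paper) of the incentive the summary describes: under accuracy-only grading a wrong answer costs nothing relative to abstaining, so guessing always has non-negative expected value, whereas a metric that penalizes wrong answers makes abstention the better policy below a confidence threshold.

```python
# Compare the expected score of answering vs. abstaining (score 0) under
# two grading schemes. This is an illustrative sketch of the incentive
# argument, not an implementation from the paper.

def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score for answering: +1 with probability p_correct,
    -wrong_penalty otherwise. Abstaining always scores 0."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

for p in (0.1, 0.3, 0.5, 0.9):
    accuracy_only = expected_score(p, wrong_penalty=0.0)  # wrong answers cost nothing
    penalized = expected_score(p, wrong_penalty=3.0)      # wrong answers cost 3 points
    # Under accuracy-only grading, guessing beats abstaining even at p=0.1.
    # With a penalty c, answering only pays off when p > c / (1 + c),
    # here 3/4, so a model below that confidence should abstain.
    print(f"p={p:.1f}  accuracy-only={accuracy_only:+.2f}  penalized={penalized:+.2f}")
```

The threshold p > c / (1 + c) follows from setting the expected answering score p - (1 - p)·c above the abstention score of 0; raising the penalty c raises the confidence a model needs before guessing is worthwhile.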
