An Apple study shows that large language models (LLMs) can improve performance by using a checklist-based reinforcement learning scheme, similar to a simple productivity trick of checking one's work.
“we found no evidence of formal reasoning in language models …. Their behavior is better explained by sophisticated pattern matching—so fragile, in fact, that changing names can alter results by ~10%!”