machinelearning.apple.com →
The Illusion of Thinking in Large Reasoning Models
Researchers at Apple investigated whether Large Reasoning Models (LRMs) like o1, o3, DeepSeek-R1, and Claude 3.7 Sonnet truly reason or simply pattern-match with longer chains of thought. They found that thinking models hold an advantage over their standard LLM counterparts on medium-complexity tasks, but on high-complexity tasks both model types suffer a complete collapse in accuracy. This suggests the “thinking” capability may be more limited than it appears.
Of course, this doesn’t diminish the usefulness of these models, but a better understanding of how they work, and of which models suit which tasks, is essential for using them well.