Researchers at Arizona State University are waving a big neon sign that says "chains of thought" in AI aren't little windows into a model's brain; they're statistically driven tokens stitched together to look like reasoning. In their new position paper, they call anthropomorphizing these intermediate steps a "cargo cult" move that lulls researchers into false confidence about transparency and control. They point out that models trained on random or even nonsensical chains often outperform those trained on polished, human-readable ones, which suggests the final answer is what matters, not the fluff in between.
Instead of chasing the illusion of step-by-step human logic, the team suggests we focus on verifying outputs and understanding how feedback shapes model behavior. Those “aha” and “hmm” tokens? They’re just statistically likely continuations, not Eureka moments. Bottom line: stop reading tea leaves in the token soup and start measuring real performance and robustness.
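To make the "measure the answer, not the tokens" point concrete, here is a minimal sketch of what output-focused evaluation could look like. This is not from the paper; the function names, the "Answer:" convention, and the toy model are all hypothetical, chosen only to show scoring that ignores whatever intermediate "reasoning" text the model emits.

```python
# Hypothetical sketch: score a model by its extracted final answer only,
# ignoring all preceding "reasoning" tokens. Assumes each model output
# ends with a line like "Answer: 42".

import re
from typing import Callable, Iterable


def extract_final_answer(output: str) -> str | None:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = re.findall(r"Answer:\s*(.+)", output)
    return matches[-1].strip() if matches else None


def final_answer_accuracy(
    model: Callable[[str], str],          # prompt -> full generated text
    dataset: Iterable[tuple[str, str]],   # (prompt, gold answer) pairs
) -> float:
    """Fraction of examples where the extracted final answer matches the gold one."""
    total = correct = 0
    for prompt, gold in dataset:
        total += 1
        predicted = extract_final_answer(model(prompt))
        if predicted is not None and predicted == gold.strip():
            correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy stand-in for a model: the intermediate tokens could be anything;
    # only the final answer line affects the score.
    def toy_model(prompt: str) -> str:
        return "hmm... aha! some plausible-looking reasoning tokens\nAnswer: 4"

    data = [("What is 2 + 2?", "4")]
    print(f"final-answer accuracy: {final_answer_accuracy(toy_model, data):.2f}")
```

The design point is simply that correctness gets checked at the output boundary, so whether the chain of thought reads like careful deliberation or gibberish has no effect on the score.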