Research Reveals LLMs Default to Simple Reasoning When Complexity Increases

A team of researchers published a comprehensive study on November 20 analyzing over 192,000 reasoning traces from large language models (LLMs), revealing that AI systems rely on shallow, linear strategies rather than the hierarchical cognitive processes humans naturally employ.
The research team examined 18 different models across text, vision, and audio reasoning tasks, comparing their approaches against 54 human think-aloud traces collected specifically for the study. The analysis established a taxonomy of 28 cognitive elements that encompass computational constraints, meta-cognitive controls, knowledge representations, and transformation operations—providing a framework to evaluate not just whether models produce correct answers, but how they arrive at those conclusions.
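To make the idea of a cognitive-element taxonomy concrete, the Python sketch below shows one way a reasoning trace could be annotated and profiled. The step texts and category labels are invented for illustration and are not the study's actual 28 elements; only the four broad category names come from the paper's description.

```python
from collections import Counter

# Hypothetical step annotations, loosely modelled on the paper's four broad
# categories (computational constraints, meta-cognitive controls, knowledge
# representations, transformation operations). NOT the study's real taxonomy.
TRACE = [
    ("restate the problem in my own words", "knowledge_representation"),
    ("limit myself to three active sub-goals", "computational_constraint"),
    ("split the problem into sub-goals", "transformation_operation"),
    ("check progress against the overall goal", "metacognitive_control"),
    ("merge the partial results", "transformation_operation"),
]

def element_profile(trace):
    """Count how often each cognitive-element category appears in a trace."""
    return Counter(category for _, category in trace)

print(element_profile(TRACE))
# Counter({'transformation_operation': 2, 'knowledge_representation': 1,
#          'computational_constraint': 1, 'metacognitive_control': 1})
```

Profiles like this, rather than a single accuracy score, are what allow human and model traces to be compared element by element.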
Fundamental Differences in Cognitive Architecture
Human reasoning consistently demonstrates hierarchical nesting and meta-cognitive monitoring—the ability to reflect on and regulate one’s own thinking processes. Humans fluidly organize information into nested structures while actively tracking their progress through complex problems.
LLMs predominantly use shallow forward chaining, moving step by step through problems without the hierarchical organization or self-reflection that characterizes human cognition. The divergence is most pronounced on ill-structured or ambiguous tasks, where humans' adaptive strategies significantly outperform the models' approaches.
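As a loose analogy rather than the study's own formalism, the sketch below contrasts a flat forward-chaining pass with a recursive decomposition that notes each sub-goal before descending into it, the kind of nesting and self-monitoring the human traces exhibited. The toy problem and the logging are invented for illustration.

```python
# Illustrative contrast: shallow forward chaining vs. hierarchical,
# self-monitored decomposition. Not code from the study.

def forward_chain_sum(numbers):
    """Shallow forward chaining: one linear pass, each step only extends the last."""
    total = 0
    for n in numbers:
        total += n
    return total

def hierarchical_sum(item, depth=0):
    """Hierarchical nesting: recurse into sub-problems and note each sub-goal."""
    if isinstance(item, list):
        # Meta-cognitive-style checkpoint before descending into a sub-problem.
        print("  " * depth + f"decomposing a sub-problem with {len(item)} parts")
        return sum(hierarchical_sum(sub, depth + 1) for sub in item)
    return item

print(forward_chain_sum([1, 2, 3, 4]))          # 10
print(hierarchical_sum([1, [2, 3], [4, [5]]]))  # prints sub-goal notes, then 15
```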
The study found that language models possess the behavioral components associated with successful reasoning but often fail to deploy them spontaneously. Performance varies dramatically by problem type: dilemma reasoning exhibited the highest variance, with smaller models struggling significantly, while logical reasoning showed moderate performance with larger models generally outperforming smaller ones. Models demonstrate counter-intuitive weaknesses, succeeding on complex tasks while failing on simpler variants.
Performance Improvements Through Guided Reasoning
The research team developed test-time reasoning guidance that automatically scaffolds successful cognitive structures, demonstrating performance improvements of up to 66.7% on complex problems when models are prompted to adopt more human-like reasoning approaches. This finding suggests that LLMs possess latent capabilities for more sophisticated reasoning but need explicit guidance to employ them effectively.
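A minimal sketch of what such test-time guidance could look like as a prompt scaffold appears below. The scaffold wording and the call_model placeholder are assumptions made for illustration, not the authors' published method.

```python
# A minimal sketch of test-time reasoning guidance as a prompt scaffold.
# The scaffold wording and the call_model placeholder are assumptions;
# the study's actual guidance procedure may differ.

SCAFFOLD = """Solve the problem below using this structure:
1. Restate the problem in your own words.
2. Break it into nested sub-goals and solve each one before moving on.
3. After each sub-goal, check whether the partial result is still consistent
   with the overall goal; revise if it is not.
4. Combine the sub-goal results into a final answer.

Problem: {problem}
"""

def guided_prompt(problem: str) -> str:
    """Wrap a problem in a hierarchical, self-monitoring scaffold."""
    return SCAFFOLD.format(problem=problem)

def call_model(prompt: str) -> str:
    """Placeholder for whichever LLM API is under evaluation."""
    raise NotImplementedError

print(guided_prompt("Three switches control three lamps in another room..."))
```

The point of the scaffold is simply to make the hierarchical structure explicit in the prompt rather than hoping the model adopts it on its own.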
The gap between human and AI reasoning grows wider as task complexity increases. While models can handle straightforward problems through forward chaining alone, they struggle with the kind of recursive, self-monitoring strategies humans deploy naturally when facing ambiguous or multi-layered challenges.
The study’s publicly available dataset provides a baseline for future research comparing artificial and human intelligence. By mapping 28 distinct cognitive elements, the framework enables researchers to pinpoint exactly where AI reasoning breaks down rather than simply measuring accuracy scores.
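As an illustration of element-level analysis rather than accuracy-only scoring, the sketch below compares hypothetical element counts for a human trace and a model trace. The element names and counts are invented and do not reflect the released dataset's actual schema.

```python
# A sketch of element-level comparison instead of accuracy-only scoring.
# Element names and counts are invented, not taken from the released dataset.

human_profile = {"hierarchical_nesting": 14, "metacognitive_check": 9, "forward_step": 21}
model_profile = {"hierarchical_nesting": 2, "metacognitive_check": 1, "forward_step": 38}

def element_gaps(human, model):
    """Positive values: the model over-uses an element; negative: it under-uses it."""
    elements = set(human) | set(model)
    return {e: model.get(e, 0) - human.get(e, 0) for e in sorted(elements)}

print(element_gaps(human_profile, model_profile))
# {'forward_step': 17, 'hierarchical_nesting': -12, 'metacognitive_check': -8}
```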
Implications for AI Development
The findings highlight a fundamental limitation in current AI systems: the gap between computational capability and genuine cognitive sophistication. Models trained on massive datasets can pattern-match their way to correct answers on many tasks, but lack the reflective, hierarchical thinking that characterizes human problem-solving.
This research builds on growing concerns about AI reasoning limitations identified across multiple domains. The performance improvement from guided reasoning suggests that better prompting strategies and architectural modifications could help models access their latent reasoning capabilities more effectively.
The study’s most significant contribution may be its detailed taxonomy of cognitive elements, providing researchers and developers with specific targets for improvement. Rather than treating reasoning as a monolithic capability, the framework breaks it into measurable components that can be individually addressed through training modifications or prompt engineering techniques.