In cognitive science, human thought processes are commonly divided into two systems: the fast, intuitive System 1 and the slower, analytical System 2. Recent research has shown that incorporating System 2-style processing into Transformers, including large language models (LLMs), can significantly improve their reasoning abilities. However, models that fully emulate System 2 thinking tend to demand substantial computational resources and exhibit slower response times.
In a new paper Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces, a Meta research team presents Dualformer, a single Transformer model that merges both fast and slow reasoning modes within a unified framework.
Dualformer demonstrates improved performance and computational efficiency over baseline models. Surprisingly, this work reveals that a simple data preparation approach is sufficient to enable dynamic switching between System 1 and System 2-style thinking for various reasoning tasks. Dualformer can be configured to operate in either fast or slow mode during inference, or it can decide autonomously which mode to adopt.
To simulate System 2 reasoning, Dualformer is trained with data that includes both intermediate reasoning steps and final solutions. By leveraging this structured reasoning process, the researchers designed specific trace-dropping strategies to simplify the reasoning steps, creating a shortcut-like, System 1 approach. This training strategy, termed “randomized reasoning traces,” selectively omits elements of the reasoning sequence to mimic the rapid shortcuts associated with System 1 thinking.
The researchers developed four levels of trace-dropping strategies to structure this simplification process:
- Level 1 removes all “close” clauses from a reasoning trace.
- Level 2 builds on Level 1 by also dropping cost-related tokens.
- Level 3 further simplifies the trace by randomly removing 30% of the “create” clauses.
- Level 4 eliminates the entire reasoning trace, leaving only the solution.
Each level pushes Dualformer toward progressively more efficient reasoning shortcuts. The traces record the steps of an A* search, so Level 1 teaches the model to bypass close-set computation, while Level 2 omits both close-set and cost computation. Levels 3 and 4 go further, teaching Dualformer to skip part or all of the reasoning trace, bringing it closer to System 1 processing.
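The four levels can be sketched as a small filtering function. This is a minimal illustration, not the authors' implementation: the clause layout `(kind, x, y, g_cost, h_cost)`, the function name `drop_trace`, and the choice to remove exactly 30% of the "create" clauses (rather than dropping each one independently with probability 0.3) are all assumptions made for this example.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical clause layout: (kind, x, y, g_cost, h_cost).
# kind is "create" or "close"; costs become None once dropped.
Clause = Tuple[str, int, int, Optional[int], Optional[int]]


def drop_trace(trace: List[Clause], level: int, rng: random.Random,
               create_fraction: float = 0.3) -> List[Clause]:
    """Apply one of the dropping levels (0 = keep the full trace) to a trace."""
    if level not in (0, 1, 2, 3, 4):
        raise ValueError("level must be between 0 and 4")
    if level == 0:
        return list(trace)
    if level == 4:
        return []  # only the final solution remains; no trace at all

    # Level 1 and above: remove every "close" clause.
    kept = [c for c in trace if c[0] != "close"]

    # Level 2 and above: also drop the cost tokens.
    if level >= 2:
        kept = [(kind, x, y, None, None) for (kind, x, y, _g, _h) in kept]

    # Level 3: additionally remove a random 30% of the "create" clauses.
    if level >= 3:
        create_idx = [i for i, c in enumerate(kept) if c[0] == "create"]
        n_drop = round(create_fraction * len(create_idx))
        dropped = set(rng.sample(create_idx, n_drop))
        kept = [c for i, c in enumerate(kept) if i not in dropped]

    return kept
```

Because each level builds on the previous one, the checks are cumulative: a Level 3 trace has no "close" clauses, no cost tokens, and fewer "create" clauses than the original.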
In essence, Dualformer is trained on randomized traces where selected parts of the reasoning process are omitted. This tailored trace-dropping strategy is akin to examining and streamlining the thinking process through systematic shortcuts. During inference, Dualformer can produce only the final solutions (fast mode), output both the reasoning chain and final solution (slow mode), or determine the appropriate mode automatically (auto mode).
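One way to picture the randomization and the three inference modes is sketched below. Everything specific here is an assumption for illustration: the article does not give the sampling probabilities, so `LEVEL_PROBS` is a made-up mixture, and the control tokens `plan` and `create` appended after a `bos` token are a hypothetical way of steering the model into fast or slow mode.

```python
import random
from typing import List, Sequence

# Hypothetical mixture over dropping levels 0..4 (level 0 = full trace).
# These values are illustrative only and are not taken from the paper.
LEVEL_PROBS = (0.4, 0.15, 0.15, 0.15, 0.15)


def sample_level(rng: random.Random,
                 probs: Sequence[float] = LEVEL_PROBS) -> int:
    """Pick a dropping level for one training example."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def build_prompt(task_tokens: Sequence[str], mode: str) -> List[str]:
    """Append a (hypothetical) control token that selects the inference mode.

    fast: force the output to start with the solution.
    slow: force the output to start with a reasoning trace.
    auto: add nothing and let the model decide.
    """
    control = {"fast": ["plan"], "slow": ["create"], "auto": []}
    if mode not in control:
        raise ValueError(f"unknown mode: {mode}")
    return list(task_tokens) + ["bos"] + control[mode]
```

Sampling the level per example means the same task can appear with a full trace in one epoch and with no trace in another, which is what lets a single model serve all three modes.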
Dualformer consistently outperforms baseline models in both reasoning efficacy and computational efficiency:
- Slow Mode: Dualformer solves unseen 30 × 30 maze navigation tasks optimally 97.6% of the time, outperforming the Searchformer baseline (93.3%) while using 45.5% fewer reasoning steps.
- Fast Mode: Dualformer completes tasks with an 80% optimal rate, surpassing the Solution-Only model’s 30% optimal rate.
- Auto Mode: Dualformer reaches an optimal rate of 96.6%, using 59.9% fewer reasoning steps than Searchformer.
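For readers who want to see how figures like these are computed, the sketch below shows a simplified version of the two metrics. The function names and the simplified definitions are assumptions for this example, and the numbers in the usage note are invented to illustrate the arithmetic; they are not data from the paper.

```python
from typing import Optional, Sequence


def optimal_rate(found_costs: Sequence[Optional[int]],
                 optimal_costs: Sequence[int]) -> float:
    """Fraction of tasks whose returned plan is as cheap as the optimal plan.

    A found cost of None means the model produced no valid plan.
    """
    if len(found_costs) != len(optimal_costs) or not optimal_costs:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for f, o in zip(found_costs, optimal_costs) if f == o)
    return hits / len(optimal_costs)


def step_reduction(baseline_steps: float, model_steps: float) -> float:
    """Relative reduction in reasoning steps compared with a baseline."""
    if baseline_steps <= 0:
        raise ValueError("baseline_steps must be positive")
    return (baseline_steps - model_steps) / baseline_steps
```

As a made-up example, a model that averages 545 reasoning steps where the baseline averages 1,000 has a step reduction of (1000 - 545) / 1000 = 0.455, which would be reported as 45.5% fewer steps.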
Overall, Dualformer delivers superior performance in complex tasks, such as maze navigation and Sokoban puzzles, while reducing computational requirements by streamlining reasoning steps and shortening input sequences.
The paper Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces is on arXiv.
Author: Hecate He | Editor: Chain Zhang
The post Meta’s Dualformer: Bridging Fast and Slow Thinking in Transformers for Superior AI Reasoning first appeared on Synced.