How Do Large Language Models Reason?

Mr.R, 2025/10/27, less than 1 minute read

Chain-of-Thought (CoT)

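Short CoT:

Few-shot CoT: include worked question-answer examples in the prompt, each with its reasoning written out

Zero-shot CoT: give no examples and simply append "Let's think step by step." to the prompt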
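
The two prompting styles above can be sketched as plain string builders. This is a minimal illustration, not a fixed template: the `Q:`/`A:` layout, the function names, and the example problems are all assumptions made for the sketch.

```python
def few_shot_cot_prompt(examples, question):
    """Build a few-shot CoT prompt from (question, reasoning, answer) triples."""
    parts = []
    for q, reasoning, answer in examples:
        # Each demonstration shows the reasoning before the final answer.
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    # The new question is left open for the model to continue.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def zero_shot_cot_prompt(question):
    """Zero-shot CoT: no demonstrations, only the trigger phrase."""
    return f"Q: {question}\nA: Let's think step by step."


# Hypothetical demonstration and question, for illustration only.
examples = [
    ("Tom has 3 apples and buys 2 more. How many apples does he have?",
     "He starts with 3 and adds 2, so 3 + 2 = 5.",
     "5"),
]
new_question = "A pen costs 4 dollars. How much do 3 pens cost?"

few_shot = few_shot_cot_prompt(examples, new_question)
zero_shot = zero_shot_cot_prompt(new_question)
```

Either string would then be sent to the model as the prompt. The only difference is whether the reasoning pattern comes from demonstrations or from the trigger phrase.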

short CoT

Long CoT: https://arxiv.org/pdf/2503.09567

Supervised CoT: tell the model explicitly how to think step by step

Give the model a reasoning workflow

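Ask the same question several times so that the model explores and produces multiple outputs, then pick the final answer by Majority Vote (Self-consistency) or by Confidence (used in CoT decoding)
Use another language model as a Verifier to assign a score to each output (Best-of-N)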
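
A minimal sketch of the selection strategies above, assuming the outputs have already been sampled. The sample data, `toy_verifier`, and the function names are hypothetical stand-ins: a real setup would call an LLM to produce the outputs and a second model to score them. The confidence function follows the top-1 versus top-2 probability gap idea from CoT decoding in simplified form.

```python
from collections import Counter


def majority_vote(answers):
    """Self-consistency: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def answer_confidence(top2_probs):
    """Mean gap between top-1 and top-2 token probabilities over the answer tokens."""
    return sum(p1 - p2 for p1, p2 in top2_probs) / len(top2_probs)


def best_of_n(outputs, verifier):
    """Best-of-N: score every output with a verifier and keep the top one."""
    return max(outputs, key=verifier)


# Hypothetical sampled outputs: (reasoning text, final answer).
samples = [
    ("3 pens at 4 dollars is 3 * 4 = 12.", "12"),
    ("4 + 3 = 7.", "7"),
    ("4 * 3 = 12.", "12"),
]


def toy_verifier(output):
    """Stand-in for a verifier model: rewards reasoning that multiplies."""
    reasoning, _ = output
    return 1.0 if "*" in reasoning else 0.0


voted = majority_vote([answer for _, answer in samples])
best = best_of_n(samples, toy_verifier)
# Hypothetical (top-1, top-2) probabilities for two answer tokens.
confidence = answer_confidence([(0.9, 0.05), (0.8, 0.1)])
```

Majority vote needs no extra model but only works when final answers can be compared directly. Best-of-N moves the burden onto the quality of the verifier.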

The LLM produces multiple "results": Parallel or Sequential

Parallel vs. Sequential

Parallel & Sequential
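
One way to read the parallel/sequential distinction, sketched under the assumption that "parallel" means independent attempts at the same question and "sequential" means each attempt is conditioned on the previous one. `toy_generate` is a hypothetical stand-in for an LLM call. It only counts revisions so the control flow is visible.

```python
def parallel_search(generate, question, n):
    """Parallel: n independent attempts, none sees another."""
    return [generate(question, previous=None) for _ in range(n)]


def sequential_search(generate, question, n):
    """Sequential: each attempt receives the attempt before it."""
    attempts = []
    previous = None
    for _ in range(n):
        previous = generate(question, previous=previous)
        attempts.append(previous)
    return attempts


def combined_search(generate, question, chains, depth):
    """Parallel & Sequential: several independent chains, each refined in sequence."""
    return [sequential_search(generate, question, depth) for _ in range(chains)]


def toy_generate(question, previous=None):
    """Stand-in for an LLM call: returns how many revisions led to this attempt."""
    return 0 if previous is None else previous + 1


parallel = parallel_search(toy_generate, "question", 3)
sequential = sequential_search(toy_generate, "question", 3)
combined = combined_search(toy_generate, "question", chains=2, depth=3)
```

With the toy generator, the parallel run yields three first drafts, while the sequential run yields a draft followed by two revisions.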

Learn from a teacher model's reasoning process (Imitation Learning / Distillation)

Generate reasoning training data

Learn to reason from outcomes (Reinforcement Learning, RL)
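
A rough sketch of how reasoning training data might be collected from a teacher model. It assumes that each problem comes with a reference answer and that traces ending in a wrong answer are discarded. That filtering rule, the `prompt`/`target` record layout, and `toy_teacher` are illustrative assumptions, not a method stated in this note.

```python
def build_distillation_data(problems, teacher):
    """Collect (prompt, target) records from teacher traces whose answer is correct."""
    dataset = []
    for question, reference in problems:
        reasoning, answer = teacher(question)
        # Assumption: keep a trace only if its final answer matches the reference.
        if answer == reference:
            dataset.append({
                "prompt": question,
                "target": f"{reasoning}\nAnswer: {answer}",
            })
    return dataset


def toy_teacher(question):
    """Stand-in for a teacher LLM: returns canned (reasoning, answer) pairs."""
    canned = {
        "2 + 3 = ?": ("Add 2 and 3 to get 5.", "5"),
        "4 * 3 = ?": ("4 + 3 = 7.", "7"),  # deliberately wrong trace
    }
    return canned[question]


problems = [("2 + 3 = ?", "5"), ("4 * 3 = ?", "12")]
data = build_distillation_data(problems, toy_teacher)
```

The resulting records could then be used for ordinary supervised fine-tuning of the student model, with the teacher's reasoning included in the target text.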

DeepSeek-v3-base → DeepSeek-R1-Zero
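
Outcome-oriented learning can be illustrated with a reward that looks only at the final answer and ignores how the reasoning got there. The sketch below is an assumption-laden toy: the `Answer:` line format and the function name are invented for illustration and are not DeepSeek's actual reward implementation.

```python
import re


def outcome_reward(completion, reference):
    """Return 1.0 if the text after 'Answer:' matches the reference, else 0.0."""
    # Assumed output format: the final answer appears on a line starting with "Answer:".
    match = re.search(r"Answer:\s*(.+)", completion)
    if match is None:
        return 0.0  # no parsable answer, no reward
    return 1.0 if match.group(1).strip() == reference else 0.0


good = outcome_reward("4 * 3 = 12.\nAnswer: 12", "12")
lucky = outcome_reward("I will guess.\nAnswer: 12", "12")
wrong = outcome_reward("4 + 3 = 7.\nAnswer: 7", "12")
missing = outcome_reward("4 * 3 = 12.", "12")
```

Here `lucky` earns the same reward as `good`, because the reasoning text is never inspected. Only the outcome is rewarded, and the RL algorithm that consumes this signal is outside the scope of the sketch.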