Is This OpenAI's Mysterious Q*? - Huxiu
Stanford: The Language Model Is a Q-Function
By: Huxiu.com
- Apr 25 2024
Q token
DPO credit assignment
DPO
11 Q* OpenAI AI
Q* Q A* AI Q OpenAI Q* AGI
From r to Q*: Your Language Model is Secretly a Q-Function
https://arxiv.org/pdf/2404.12358.pdf
LLM RLHF RLHF DP… [+381 chars]