#Alignment

Sleeper Agents

Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training

Posted on Fri, Jan 10, 2025 📖 Note LLM Alignment