Sparse Feature Circuits
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models (ICLR ’25)
Stage-Wise Model Diffing