Sparse Feature Circuits
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models (ICLR ’25)
Interventions, Where and How? Active Causal Discovery for Large-Scale Nonlinear SCMs
Interventions, Where and How? Experimental Design for Causal Models at Scale