#Interpretability

Sparse Feature Circuits

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models (ICLR ’25)

Posted on Tue, Jan 14, 2025 📖 Note LLM Interpretability Causal

Stage-Wise Model Diffing

Posted on Thu, Jan 2, 2025 📖 Note LLM Interpretability