Week 1
- https://docs.google.com/presentation/d/1inmAImDbD2MJtZJDTZhrhnvwcvlaoI0PQy1v2nadO7U/edit?slide=id.g3b2d0c429d3_0_0#slide=id.g3b2d0c429d3_0_0
- Mechanistic Interpretability
- Grown Rather Than Crafted
- Chain-of-Thought (CoT)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Inceptionism
- Activation Atlas
- Refusal Vector
- Causal Intervention
- Circuit Tracing
- Multilayer Perceptron (MLP)
- Representation Space
- Logistic Regression Model
- Probe
- ResNet
Week 2
Week 3
Week 4
- Attention-specific behaviors