Explainers
Explaining AI Alignment research.
Contents
- GPT-2 Teaches GPT-4: Weak-to-Strong Generalization
- How to catch an AI Liar
- Anthropic Solved Interpretability?
- Paul Christiano’s Views on AI Doom (ft. Robert Miles)
- Clarifying and predicting AGI