Explainers

Explaining AI Alignment research.

Contents

GPT-2 Teaches GPT-4: Weak-to-Strong Generalization

How to catch an AI Liar

Anthropic Solved Interpretability?

Paul Christiano’s Views on AI Doom (ft. Robert Miles)

Clarifying and predicting AGI