We measure whether AI models retrieve facts or fabricate them — across 15 domains, from foundational to frontier difficulty.
Reliability: 0–100 score, a composite of reading and writing accuracy · Reading: pre-generation signal · Writing: generation signal
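The legend above does not say how reading and writing accuracy are combined into the 0–100 Reliability score. The sketch below shows one plausible composite: a weighted mean of the two accuracies, rescaled to 0–100. The equal default weighting, the function name `reliability`, and the parameter `w_reading` are all hypothetical illustrations. They are not the benchmark's published formula.

```python
def reliability(reading_acc: float, writing_acc: float, w_reading: float = 0.5) -> float:
    """Hypothetical composite Reliability score on a 0-100 scale.

    ASSUMPTION: Reliability is a weighted mean of reading accuracy and
    writing accuracy, each given as a fraction in [0, 1]. The real
    benchmark may weight or combine them differently.
    """
    # Validate that both accuracies are fractions in [0, 1].
    for name, acc in (("reading_acc", reading_acc), ("writing_acc", writing_acc)):
        if not 0.0 <= acc <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {acc}")
    # The reading weight must also be a valid fraction; writing gets the remainder.
    if not 0.0 <= w_reading <= 1.0:
        raise ValueError(f"w_reading must be in [0, 1], got {w_reading}")

    # Weighted mean of the two signals, then rescale to 0-100 (one decimal place).
    composite = w_reading * reading_acc + (1.0 - w_reading) * writing_acc
    return round(100.0 * composite, 1)


if __name__ == "__main__":
    # Equal weighting: (0.82 + 0.64) / 2 = 0.73, i.e. 73.0 on the 0-100 scale.
    print(reliability(0.82, 0.64))
```

Keeping the weight as an explicit parameter makes the assumption visible. If the benchmark actually weights reading and writing unequally, or uses a different aggregate such as a harmonic mean, only the `composite` line would need to change.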