The reliability score is only the beginning. Beneath it lies a geometric measurement system that sees what no benchmark can.
Every AI benchmark tests whether models can pass exams. None tests whether a model fabricates when it doesn't know the answer. Our profiling reveals which domains a model actually knows and which it confidently fakes.
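A minimal sketch of what such a profile could look like, assuming per-response token log-probabilities and graded answers are available. The scoring rule, the `response_confidence` helper, and the sample triples are illustrative, not the production method:

```python
import math
from collections import defaultdict

def response_confidence(token_logprobs):
    """Geometric mean of token probabilities: a crude 'how sure was it' proxy."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def profile_domains(samples):
    """samples: iterable of (domain, token_logprobs, was_correct) triples."""
    stats = defaultdict(lambda: {"conf": [], "acc": []})
    for domain, logprobs, correct in samples:
        stats[domain]["conf"].append(response_confidence(logprobs))
        stats[domain]["acc"].append(1.0 if correct else 0.0)
    profile = {}
    for domain, s in stats.items():
        confidence = sum(s["conf"]) / len(s["conf"])
        accuracy = sum(s["acc"]) / len(s["acc"])
        profile[domain] = {
            "confidence": confidence,
            "accuracy": accuracy,
            # High confidence paired with low accuracy is the "confidently fakes" zone.
            "overconfidence": confidence - accuracy,
        }
    return profile
```

The interesting column is `overconfidence`: the domains where it runs high are the ones a model fakes.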
When an AI fabricates medical dosages or invents legal precedents, nobody knows until a human reads it. But fabrication has a geometric signature in the model's output probability distributions, and we can see it at every token.
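As a toy illustration of a per-token signal, assuming the API exposes top-k next-token probabilities at each step (many do). Shannon entropy is one crude "shape" statistic standing in for the richer geometry, and the 2.5-nat threshold is arbitrary:

```python
import math

def token_entropy(top_probs):
    """Shannon entropy (nats) of renormalized top-k next-token probabilities."""
    total = sum(top_probs)
    return -sum(p / total * math.log(p / total) for p in top_probs if p > 0)

def flag_diffuse_tokens(per_token_top_probs, threshold=2.5):
    """Indices of tokens sampled from unusually flat distributions."""
    return [i for i, probs in enumerate(per_token_top_probs)
            if token_entropy(probs) > threshold]
```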
The strongest signal appears while the model reads your question, before a single word is generated: fifteen times more signal than during generation itself. We gate on it and stop fabricated responses from ever being generated.
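A sketch of such a gate, assuming the serving stack can report a per-token statistic while it processes the prompt; `prefill_stats`, `prefill_signal`, and the 0.7 threshold are hypothetical placeholders:

```python
def prefill_signal(per_token_stats):
    """Aggregate a per-token reliability statistic over the prompt."""
    return sum(per_token_stats) / len(per_token_stats)

def gated_generate(model, prompt, threshold=0.7):
    # `prefill_stats` is a hypothetical API returning one statistic per
    # prompt token while the model reads the question.
    stats = model.prefill_stats(prompt)
    if prefill_signal(stats) < threshold:
        # Refuse before a single output token is sampled.
        return "I don't have reliable knowledge to answer that."
    return model.generate(prompt)
```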
A provider pushes an update. A fine-tune shifts behavior. Your model was reliable last month — is it still? Continuous measurement makes drift visible the moment it happens.
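A toy monitor in that spirit: compare recent reliability scores against a frozen baseline with a two-sample Kolmogorov-Smirnov statistic. The 0.15 alert threshold is illustrative:

```python
def ks_statistic(a, b):
    """Largest gap between the two empirical CDFs (quadratic, fine for a sketch)."""
    a, b = sorted(a), sorted(b)
    def cdf(xs, v):
        return sum(1 for x in xs if x <= v) / len(xs)
    return max(abs(cdf(a, v) - cdf(b, v)) for v in a + b)

def check_drift(baseline_scores, recent_scores, alert_at=0.15):
    d = ks_statistic(baseline_scores, recent_scores)
    return {"ks": d, "drifted": d > alert_at}
```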
When safety training activates, the probability distributions look fundamentally different from both normal operation and fabrication: 235 times more texture in the geometry.
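One plausible reading of "texture", offered as an assumption rather than the actual quantity behind the 235x figure: how much the token-level entropy varies across a response:

```python
import statistics

def entropy_texture(per_token_entropies):
    # Variance of token-level entropy across one response. Under this
    # (assumed) statistic, refusals, normal answers, and fabrication
    # would separate along a single axis.
    return statistics.pvariance(per_token_entropies)
```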
From output probabilities alone, we reconstruct which concepts the model is weighing. Regulators asking “why did the AI say that?” get an answer without opening the black box.
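A sketch of the reconstruction step, assuming top-k alternatives with probabilities at each generation step; the `CONCEPTS` lexicons are invented for illustration and would be far larger in practice:

```python
from collections import Counter

# Invented lexicons: real concept inventories would be far larger.
CONCEPTS = {
    "dosage":   {"mg", "dose", "daily", "tablet"},
    "citation": {"v.", "court", "ruling", "precedent"},
}

def concepts_in_play(per_token_topk):
    """per_token_topk: list of [(token, prob), ...] alternatives per step."""
    weight = Counter()
    for alternatives in per_token_topk:
        for token, prob in alternatives:
            for concept, lexicon in CONCEPTS.items():
                if token.strip().lower() in lexicon:
                    weight[concept] += prob
    return weight.most_common()
```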
When organizations deploy multiple AI models, correlated failures compound. If three models share the same blind spot, the system has synchronized risk, not redundancy.
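That synchronized risk is directly measurable. A sketch using the Pearson correlation between two models' per-prompt failure indicators (1 = failed, 0 = passed): independent models sit near zero, while a shared blind spot pushes the value toward one:

```python
def failure_correlation(fails_a, fails_b):
    """Pearson correlation of two equal-length 0/1 failure vectors."""
    n = len(fails_a)
    mean_a, mean_b = sum(fails_a) / n, sum(fails_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(fails_a, fails_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in fails_a) / n
    var_b = sum((b - mean_b) ** 2 for b in fails_b) / n
    return cov / (var_a * var_b) ** 0.5 if var_a and var_b else 0.0
```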
The same measurement that distinguishes knowledge from fabrication reveals a continuous spectrum, from pure fabrication at one end to verbatim reproduction at the other.
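A sketch of placing a response on that spectrum, under the assumption that verbatim reproduction shows near-deterministic token probabilities while fabrication shows diffuse ones; the mapping to [0, 1] is purely illustrative:

```python
import math

def spectrum_position(token_logprobs):
    # Geometric mean of token probabilities: near 0 for diffuse
    # (fabrication-like) responses, near 1 for peaked (verbatim-like) ones.
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```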