Live · api.qear.ai

Know not just if your AI is wrong, but why.

Other APIs give you a generic confidence score. QEAR gives you a diagnosis. Five uncertainty classes. Plain-language explanations. A recommended action for every flagged response.

A score is a hint.
A diagnosis is an action.

Production AI fails in different ways. Sometimes the model is confidently wrong. Sometimes it's honestly admitting it doesn't know. Sometimes the answer is right but phrased differently than your expected answer.

These need different actions. Most APIs collapse them into one number. QEAR doesn't. Every response includes one of five uncertainty classes, a plain-language diagnosis, and a recommended action.

POST api.qear.ai/v1/verify
# Customer gave the wrong year
curl https://api.qear.ai/v1/verify \
  -H "Authorization: Bearer qe_..." \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What year did Napoleon die?",
    "answer": "1924"
  }'
Response — the diagnosis
{
  "confidence": 0.5,
  "verdict": "low_confidence",
  "uncertainty_class": "factual_disagreement",
  "diagnosis": "Your answer '1924' contradicts the consensus derived from 5 candidates. The model is confident the correct answer is different.",
  "recommended_action": "Your answer is likely wrong. Verify externally.",
  "consensus_answer": "May 5, 1821"
}
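A minimal sketch of reading this response in Python. The `Diagnosis` dataclass and `parse_diagnosis` helper are illustrative names, not part of any QEAR SDK. The field names come from the sample response above; treating all six as always present is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnosis:
    """Illustrative container for the fields shown in the sample response."""
    confidence: float
    verdict: str
    uncertainty_class: str
    diagnosis: str
    recommended_action: str
    consensus_answer: str


def parse_diagnosis(body: str) -> Diagnosis:
    """Parse a response body; raises KeyError if an expected field is missing."""
    data = json.loads(body)
    return Diagnosis(
        confidence=float(data["confidence"]),
        verdict=data["verdict"],
        uncertainty_class=data["uncertainty_class"],
        diagnosis=data["diagnosis"],
        recommended_action=data["recommended_action"],
        consensus_answer=data["consensus_answer"],
    )


# The sample response from above, re-encoded as a JSON string.
SAMPLE = json.dumps({
    "confidence": 0.5,
    "verdict": "low_confidence",
    "uncertainty_class": "factual_disagreement",
    "diagnosis": "Your answer '1924' contradicts the consensus derived from 5 candidates. "
                 "The model is confident the correct answer is different.",
    "recommended_action": "Your answer is likely wrong. Verify externally.",
    "consensus_answer": "May 5, 1821",
})

result = parse_diagnosis(SAMPLE)
print(result.uncertainty_class, "->", result.recommended_action)
```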

Every kind of "I'm not sure" looks different.

Each class returns a distinct diagnosis and a specific recommended action — so you can route, retry, refuse, or trust.

01 none

All candidates agreed

High confidence in the consensus answer. Trust the result and ship it.

02 surface_variation

Same fact, different phrasing

Candidates split on wording but agree on substance. Normalization resolved it; trust the result.

03 factual_disagreement

Candidates contradict each other

Different dates, names, or values for the same fact. Do not trust — verify externally or escalate to human review.

04 knowledge_gap

The model honestly doesn't know

Most candidates refused to answer or said the question is unanswerable. Trust the refusal.

05 degenerate_default

The model is hedging, not answering

Candidates returned defaults (0, N/A, none). The model is producing convention rather than knowledge.
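One way to act on the five classes is a small routing table in your own pipeline. The sketch below is an assumption about how you might wire it, not QEAR's behavior: the action names (`trust`, `escalate`, `refuse`, `retry`) are illustrative, and mapping `degenerate_default` to a retry is a guess, since the description above does not name an action for it.

```python
from typing import Dict

# Class names come from the list above; the action names are hypothetical.
ROUTES: Dict[str, str] = {
    "none": "trust",                     # all candidates agreed
    "surface_variation": "trust",        # same fact, different phrasing
    "factual_disagreement": "escalate",  # verify externally or send to a human
    "knowledge_gap": "refuse",           # pass the honest refusal through
    "degenerate_default": "retry",       # assumption: re-ask rather than ship a default
}


def route(response: dict) -> str:
    """Pick a pipeline action; missing or unknown classes go to human review."""
    return ROUTES.get(response.get("uncertainty_class", ""), "escalate")


print(route({"uncertainty_class": "surface_variation"}))
print(route({"uncertainty_class": "factual_disagreement"}))
```

Falling back to `escalate` for anything unrecognized keeps the failure mode conservative if new classes appear later.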

Built for factual AI outputs. Honest about the rest.

QEAR is built for
  • RAG outputs — verify answers from retrieval-augmented systems are grounded, not fabricated
  • LLM Q&A in production — chatbots, support agents, knowledge bases
  • AI-generated customer replies — flag low-confidence responses before they reach users
  • Research & fact-checking — know which AI claims need human verification
  • Agentic systems — catch uncertain decisions before they compound across steps
  • Short factual answers — works best on focused, single-claim outputs (1–3 sentences)
Not built for (yet)
  • Code correctness — use AST analysis or execution-based testing instead
  • Long-form documents — multi-claim essays need claim-level decomposition (on our roadmap)
  • Mathematical proofs — use symbolic verification tools
  • Image / video / audio — different problem, different tools
  • Real-time streaming — verification adds 1–3 seconds; best for async or pre-send checks

We'd rather tell you the truth than sell you a tool that fails on your use case. Code verification is on our roadmap — join the waitlist.

Confidence scoring is a commodity. Diagnosis is a category.

Feature                         | QEAR                      | Competitor APIs | Native logprobs
Numeric confidence score        | Yes                       | Yes             | Token-level
Uncertainty classification      | 5 classes                 | No              | No
Human-readable diagnosis        | Yes                       | No              | No
Recommended action per response | Yes                       | No              | No
Adaptive compute routing        | Yes                       | No              | No
Model agnostic                  | Groq + OpenAI + Anthropic | Varies          | Proprietary
Peer-reviewed methodology       | Nature 2024               | Proprietary     | Standard

Not a wrapper.
A method.

QEAR extends the semantic entropy methodology from Farquhar et al. (Nature, 2024) with three original engineering contributions, validated across three model scales:

  • Entropy-gated adaptive compute allocation: Uncertain queries automatically escalate to more samples or a stronger model.
  • Three-tier NLI cascade: Most comparisons resolve via deterministic string normalization before any model fires.
  • Quality-aware candidate selection: Returns the best representative of the majority cluster, not just any.
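To make the idea concrete, here is a rough sketch of entropy over answer clusters with a simple escalation gate. It is not QEAR's implementation. Semantic entropy as published groups answers by meaning using entailment checks; this sketch covers only the cheapest step, string normalization, and the function names and the 0.5 threshold are arbitrary illustrative choices.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", answer.lower())
    return re.sub(r"\s+", " ", text).strip()


def cluster_entropy(candidates: List[str]) -> Tuple[float, str]:
    """Return (entropy in nats over normalized-answer clusters, majority cluster key)."""
    counts = Counter(normalize(c) for c in candidates)
    total = sum(counts.values())
    # Shannon entropy, written as p * log(1/p) so a single cluster yields exactly 0.0.
    entropy = sum((n / total) * math.log(total / n) for n in counts.values())
    majority, _ = counts.most_common(1)[0]
    return entropy, majority


def needs_escalation(candidates: List[str], threshold: float = 0.5) -> bool:
    """Gate: high entropy suggests drawing more samples or asking a stronger model."""
    entropy, _ = cluster_entropy(candidates)
    return entropy > threshold


agree = ["1821", "1821.", " 1821", "1821", "1821"]
split = ["1821", "1821", "1815", "1769", "1821"]
print(needs_escalation(agree), needs_escalation(split))
```

Answers such as "May 5, 1821" and "1821" would land in separate clusters here, which is the kind of case that needs a meaning-level comparison rather than string matching.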
Accuracy: 86.8% (vs 81.2% baseline, p < 0.005)
Latency: ~400ms (Cloudflare global edge)
Validated AUC: 0.83 (325-question benchmark at 32B)
Classes: 5 (each with a recommended action)

Free to start. Honest pricing.

Start free with 1,000 verifications a month. Pay only when you need more. No surprise fees. No "contact sales" for tiers under enterprise.

Free

$0/mo
  • 1,000 verifications / month
  • 200 generations / month
  • Groq Llama models
  • Full diagnostic output
Start free

Indie

$19/mo
  • 25,000 verifications / month
  • 5,000 generations / month
  • Groq Llama models
  • Email support
Choose Indie

Scale

$299/mo
  • 500,000 verifications / month
  • 150,000 generations / month
  • Custom NLI endpoints
  • SLA + dedicated support
Choose Scale

Write five lines of code. Or write none.

For developers

The API

Add QEAR to your AI pipeline with a single HTTP call. Works in any language — Python, Node, Go, Ruby, anything that speaks HTTP.

# Add 5 lines to your existing app
# (assumes KEY, q, and ai_answer are already defined)
import requests  # skip if already imported

r = requests.post(
  "https://api.qear.ai/v1/verify",
  headers={"Authorization": f"Bearer {KEY}"},
  json={"prompt": q, "answer": ai_answer}
).json()
# → r["uncertainty_class"], r["diagnosis"]
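If you would rather avoid a third-party dependency, the same call can be sketched with Python's standard library. The helper names `build_verify_request` and `verify` are illustrative, the 10-second timeout is an arbitrary choice, and sending an explicit JSON content type is an assumption about what the endpoint expects.

```python
import json
import urllib.request

VERIFY_URL = "https://api.qear.ai/v1/verify"  # endpoint from the example above


def build_verify_request(prompt: str, answer: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = json.dumps({"prompt": prompt, "answer": answer}).encode("utf-8")
    return urllib.request.Request(
        VERIFY_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: the API accepts JSON bodies
        },
        method="POST",
    )


def verify(prompt: str, answer: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body.

    Raises urllib.error.URLError (including HTTPError) on network or HTTP failure.
    """
    request = build_verify_request(prompt, answer, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to unit-test without touching the network.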
Get API key
For everyone else

No-code verify

Not a developer? Paste your question and the AI's answer into a web page and get the same diagnosis instantly. No code, no integration. Perfect for double-checking ChatGPT or Claude on important work.

Question: When was the GDPR enacted?
AI's answer: 2018
⚠ verify factual_disagreement — candidates split between 2016 (adopted) and 2018 (enforced)
Open no-code verify

Two ways to start. Both free.

Sign up for an API key in 30 seconds. Or try QEAR in your browser with no signup at all. Either way — see the diagnosis in action.

1,000 free verifications / month · No credit card · Magic-link signup, no password