// FILE: methodology.txt

How a council of language models rates politicians.

ratemypolitician is an independent, non-partisan accuracy index. We do not editorialize, endorse, or recommend candidates. We do one thing: take public statements politicians actually made and check whether they're true.

The Council

Every statement is evaluated independently by 7 frontier language models from different labs. Each model is given identical inputs: the verbatim quote, the full context, and the date and venue of the statement. Each returns a verdict and a written justification.

  • InclusionAI Ling 2.6 (1T) · weight 14.3%
  • Tencent Hy3 Preview · weight 14.3%
  • NVIDIA Nemotron 3 Super (120B) · weight 14.3%
  • MiniMax M2.5 · weight 14.3%
  • OpenAI gpt-oss (120B) · weight 14.3%
  • Z.ai GLM 4.5 Air · weight 14.3%
  • Meta Llama 3.3 (70B) · weight 14.3%

The Score

The final Truth Index is the equal-weighted aggregate of the council's verdicts, scaled 0-100 and rounded to the nearest integer. Verdict bands are fixed and apply uniformly:

  • 85-100 · TRUE: supported by primary sources.
  • 70-84 · MOSTLY TRUE: substantively correct with minor caveats.
  • 50-69 · MIXED: partially supported; framing materially affects accuracy.
  • 30-49 · MISLEADING: technically defensible but designed to mislead.
  • 0-29 · FALSE: contradicted by primary sources.
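The aggregation and banding described above can be sketched as follows. This is a minimal illustration, not the production code. It assumes that each council member's verdict has already been converted to a 0-100 number, and that "rounded to the nearest integer" rounds halves up. Neither detail is specified here. The function names truth_index and verdict_band are hypothetical.

```python
import math
from typing import Sequence

# Fixed verdict bands: (inclusive lower bound, label), checked from the top down.
BANDS = [
    (85, "TRUE"),
    (70, "MOSTLY TRUE"),
    (50, "MIXED"),
    (30, "MISLEADING"),
    (0, "FALSE"),
]


def truth_index(scores: Sequence[float]) -> int:
    """Equal-weighted mean of the responding models' 0-100 scores.

    Hypothetical helper: assumes verdicts are already numeric and that
    ties round half up (e.g. 84.5 -> 85).
    """
    if not scores:
        raise ValueError("no council member responded")
    mean = sum(scores) / len(scores)
    return math.floor(mean + 0.5)  # round half up, unlike Python's round()


def verdict_band(index: int) -> str:
    """Map a 0-100 Truth Index onto its fixed verdict band."""
    if not 0 <= index <= 100:
        raise ValueError(f"index out of range: {index}")
    for lower, label in BANDS:
        if index >= lower:
            return label
    raise AssertionError("unreachable: bands cover 0-100")
```

For example, scores of 90, 80, 85, 70, 75, 88 and 82 average to about 81.4. That gives a Truth Index of 81, which falls in the MOSTLY TRUE band. The sketch uses math.floor(x + 0.5) rather than the built-in round() because round() sends exact halves to the nearest even integer.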

Sourcing

Statements are extracted from news coverage by an LLM and linked back to the original article. Each council member sees the verbatim quote and the article context, and is instructed to weigh primary sources over secondary reporting wherever its training allows.

Disagreement

When models disagree, we publish the disagreement. Every fact-check page lists each model's individual verdict and reasoning. Agreement is reported as a 0-1 ratio. We do not suppress outliers or hide model identities.
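This methodology does not say how the 0-1 agreement ratio is computed. One plausible reading is sketched below. It is a hypothetical definition and an assumption, not a description of the site's formula. Agreement is the share of responding models that returned the most common verdict label. The function name agreement_ratio is illustrative.

```python
from collections import Counter
from typing import Sequence


def agreement_ratio(verdicts: Sequence[str]) -> float:
    """Share of responding models that gave the most common verdict label.

    Hypothetical definition: assumes agreement means "fraction of the
    council matching the modal verdict", which the methodology does not state.
    """
    if not verdicts:
        raise ValueError("no verdicts to compare")
    _, top_count = Counter(verdicts).most_common(1)[0]
    return top_count / len(verdicts)
```

Under this definition, five TRUE verdicts, one MOSTLY TRUE and one MIXED give an agreement of 5/7, about 0.71. A unanimous council gives 1.0.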

What this is not

This is not a measure of a politician's character, competence, voting record, or policy quality. It only measures the empirical accuracy of statements they have personally made in public. A high score means their checked statements are usually accurate; it does not mean their policies are good. A low score means their checked statements are often inaccurate; it does not mean their policies are bad. Read the receipts and decide for yourself.

Limitations

  • Coverage is not uniform. Some officials say more in public than others.
  • Statements are surfaced through news reporting; private statements and floor speeches are under-represented.
  • Models inherit training-data biases. Drawing 7 models from different labs reduces this effect but does not eliminate it.
  • Free-tier rate limits sometimes drop a model's vote on a given statement; the council size shown per fact-check is the number that responded.
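The limitation above means the equal weights shift when a vote is dropped. The listed weights of 14.3% each are 1/7 of the full council. The sketch below assumes that a dropped model is excluded and that the remaining models share equal weight. This is our assumption, not a stated rule. The function name effective_weights is hypothetical.

```python
from typing import Dict, Optional


def effective_weights(votes: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Equal weights over the models that actually responded.

    `votes` maps each model name to its 0-100 score, or None if its call
    was dropped (e.g. by a free-tier rate limit). Assumes dropped models
    are excluded and the remaining weights renormalized to sum to 1.
    """
    responded = [name for name, score in votes.items() if score is not None]
    if not responded:
        return {}
    weight = 1 / len(responded)
    return {name: weight for name in responded}
```

If one of the 7 models is rate-limited, the displayed council size is 6. Under this assumption, each remaining model then carries about 16.7% of the score instead of 14.3%.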

// EOF · methodology.txt