Traditional RAG pipelines rely on similarity-based re-ranking with arbitrary top-k cutoffs: opaque, rigid, and vulnerable to adversarial content. METEORA replaces the re-ranking step entirely: a preference-tuned LLM generates query-conditioned rationales that guide evidence selection, explain every decision, and power a verifier that filters poisoned or misleading chunks before they reach the generator.
In sensitive domains like law, finance, and academic research, RAG errors don't just mislead. They invite lawsuits, undermine scholarly credibility, and breach compliance. Current pipelines have three fundamental gaps.
Re-rankers return opaque similarity scores with no justification for why particular evidence was chosen. Stakeholders cannot trace why a model selected a specific clause, figure, or finding.
The value of k is a heuristic that must be hand-tuned per dataset and query type. Too few chunks omit critical context; too many introduce noise that degrades generation quality.
Injecting a single semantically coherent but factually incorrect chunk is enough to corrupt generation. Similarity-based methods have no mechanism to detect or remove adversarial content.
Phase one preference-tunes an LLM to produce query-aligned rationales via DPO. No manual annotation required; preference pairs are built automatically from existing QA annotations. Phase two uses those rationales to select, adaptively threshold, expand, and verify evidence before generation.
A general-purpose LLM is fine-tuned using Direct Preference Optimization. Rationales that lead to correct evidence selection are positive samples; others are negative. Preference pairs are constructed automatically from existing QA annotations with no manual labeling needed.
Each generated rationale is encoded and paired with its most similar evidence chunk via cosine similarity. This local pairing ensures each rationale captures a distinct facet of what the answer requires, boosting precision.
A pooled rationale embedding scores all chunks globally. First-order similarity differences are z-score normalized; the first statistically significant drop defines cutoff k*. No fixed top-k is required as the threshold adapts to each query.
Each selected chunk is expanded by including adjacent chunks. This recovers evidence that spans document boundaries and would otherwise be lost to chunking artifacts. Final evidence: Es = Ev + Eg + Ew.
A Verifier LLM checks each chunk against the rationales, flagging factual violations, contradictions with other verified evidence, and instruction violations. Flagged chunks are discarded before generation.
Only verified evidence reaches the generator. Because rationales are used consistently across selection and verification, users can trace which rationale selected which chunk, and why it influenced the final answer.
Each dataset provides QA pairs, lengthy reference documents, and human-annotated evidence spans serving as ground truth for precision and recall evaluation.
| Dataset | Domain | Documents | Avg tokens/doc | QA pairs | Description |
|---|---|---|---|---|---|
| ContractNLI | Legal | 95 | 10,673 | 946 | NDA clause entailment, law experts mark exact clauses needed for reasoning |
| CUAD | Legal | 462 | 55,827 | 4,042 | Commercial contracts, 41 clause categories annotated by legal professionals |
| MAUD | Legal | 150 | 351,476 | 1,676 | M&A merger agreements, corporate attorneys identify sections addressing acquisition terms |
| PrivacyQA | Legal | 7 | 25,266 | 194 | Consumer app privacy policies, privacy specialists identify relevant disclosure sections |
| FinQA | Finance | 2,789 | ~700 | 8,281 | Financial reports requiring numerical reasoning, analysts mark exact tables and figures |
| QASPER | Academic | 1,585 | ~6,500 | 5,000+ | NLP research papers, domain scientists identify minimal sentences for accurate answers |
Evaluated against standard re-ranking baselines. For fair comparison, baselines receive the same number of evidence chunks that METEORA selects on average. Adversarial defense is compared against perplexity-based filtering from Zhou et al. (2024).
Higher avg recall across all six datasets
Higher precision without context expansion
Evidence needed to match baseline recall
Downstream generation accuracy improvement
F1 gain over perplexity defense under poisoning
Install from the GitHub repository, then swap MeteoraReranker in wherever you call your existing re-ranker. DPO fine-tuning adapts the rationale generator to any domain using existing QA annotations.
import os, sys, shutil, subprocess, importlib
os.chdir("/content")
shutil.rmtree("/content/METEORA", ignore_errors=True)
subprocess.run(["git", "clone", "https://github.com/YashSaxena21/METEORA.git", "/content/METEORA"], check=True)
os.chdir("/content/METEORA")
subprocess.run([sys.executable, "-m", "pip", "install", "-q", "/content/METEORA[hf]"], check=True)
from meteora import HFRationaleGenerator, MeteoraReranker
from meteora import HFRationaleGenerator, MeteoraReranker
rationale_generator = HFRationaleGenerator(
model_name,
sample_shots=sample_shots,
domain="commercial contracts",
num_rationales=4,
max_new_tokens=256,
torch_dtype="float16",
device_map="auto",
)
reranker = MeteoraReranker(encoder, rationale_generator=rationale_generator)
# Returns only evidence that survives rationale-guided selection + verification
selected_docs = reranker.filter(query, candidate_documents)
# Build preference pairs from existing QA annotations, no manual labeling
meteora dpo-prepare \
--input data/preference_examples.json \
--sample-shots sample_shots.json \
--output-dir data/dpo \
--domain "commercial contracts"
# Fine-tune the rationale generator for your domain
meteora dpo-train \
--train data/dpo/train.jsonl \
--validation data/dpo/validation.jsonl \
--model path-or-hf-id \
--output-dir models/meteora-rationale-dpo \
--torch-dtype float16
@misc{saxena2026rankingfreeragreplacing,
title={Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains},
author={Yash Saxena and Ankur Padia and Mandar S Chaudhary and Kalpa Gunaratna and Srinivasan Parthasarathy and Manas Gaur},
year={2026},
eprint={2505.16014},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.16014},
}