INF-X-Retriever

INF-X-Retriever

A Pragmatic, Production-Grade Framework for Reasoning-Intensive Retrieval

Rank Hugging Face Hugging Face GitHub Repo License

INF-X-Retriever is a high-performance dense retrieval framework developed by INF.
It provides a robust solution for complex reasoning-intensive tasks across a collection of tasks (X) with minimal supervision, prioritizing architectural simplicity, deployment reliability, and deep semantic alignment.

Introduction • Design Principles • Architecture • Performance • Models • Citation • Contact


đź“– Introduction

The evolution of Large Language Models (LLMs) has redefined information retrieval, shifting the paradigm from surface-level keyword matching to intent-aware reasoning. Modern queries in RAG (Retrieval-Augmented Generation) pipelines often contain intricate narrative contexts, logical constraints, and domain-specific directives—elements that act as “semantic noise” for conventional retrieval systems.

INF-X-Retriever addresses this challenge through Intent Distillation. By aligning complex queries into a unified semantic space and executing single-stage dense retrieval, it effectively penetrates surface-level complexity to reach core information. Our approach is validated by its 1st-place ranking on the BRIGHT Benchmark, a rigorous evaluation suite dedicated to reasoning-heavy retrieval scenarios.


đź’ˇ Design Principles

Our framework is guided by engineering practicality and first-principles reasoning. We intentionally avoid architectural bloat in favor of production readiness and computational efficiency.

Pipeline Comparison

🎯 Core Principle: “Less is More” — Maximal efficacy through deliberate minimalism.

▫️ Single-Stage Retrieval (No Rerankers)

While reranking stages can marginally improve accuracy, they introduce significant latency and operational overhead. In modern RAG pipelines, downstream LLMs already perform implicit context discrimination during synthesis. We demonstrate that a high-fidelity single-stage retriever provides the optimal balance of precision and throughput, significantly simplifying deployment and monitoring in production environments.

▫️ Direct Query Alignment (No HyDE)

Hypothetical Document Embeddings (HyDE) rely on LLMs to generate “pseudo-answers” to guide retrieval. This introduces two critical failure modes:

  1. Redundancy: If the LLM already knows the answer, retrieval is redundant.
  2. Hallucination Drift: When the LLM lacks domain knowledge (the primary use case for RAG), hypothetical answers can steer the retriever toward misleading content. INF-X-Retriever performs Direct Intent Extraction, ensuring retrieval remains grounded in the actual user requirements and source evidence.

▫️ Operational Simplicity

We eliminate components that introduce fragility or hyperparameter sensitivity:


🛠️ Architecture

The framework consists of two tightly integrated, purpose-built components:

1. Query Aligner

2. Retriever

INF-X-Retriever Architecture


📊 Performance

INF-X-Retriever achieves state-of-the-art results on the BRIGHT Benchmark (as of Dec 20, 2025).

The BRIGHT (Benchmark for Reasoning-Intensive Grounded HT) is a rigorous text retrieval benchmark designed to evaluate the capability of retrieval models in handling questions that require intensive reasoning and cross-document synthesis. Collected from real-world sources such as StackExchange, competitive programming platforms, and mathematical competitions, it comprises complex queries spanning diverse domains like mathematics, coding, biology, economics, and robotics.

Why BRIGHT Matters:

Short document

Overall & Category Performance

Model Avg ALL StackExchange Coding Theorem-based
INF-X-Retriever 63.4 68.3 55.3 57.7
DIVER (v3) 46.8 51.8 39.9 39.7
BGE-Reasoner-0928 46.4 52.0 35.3 40.7
LATTICE 42.1 51.6 26.9 30.0
ReasonRank 40.8 46.9 27.6 35.5
XDR2 40.3 47.1 28.5 32.1

Detailed Results Across 12 Datasets

Model Avg Bio. Earth. Econ. Psy. Rob. Stack. Sus. Leet. Pony AoPS TheoQ. TheoT.
INF-X-Retriever 63.4 79.8 70.9 69.9 73.3 57.7 64.3 61.9 56.1 54.5 51.9 53.1 67.9
DIVER (v3) 46.8 66.0 63.7 42.4 55.0 40.6 44.7 50.4 32.5 47.3 17.2 46.4 55.6
BGE-Reasoner-0928 46.4 68.5 66.4 40.6 53.1 43.2 44.1 47.8 29.0 41.6 17.2 46.5 58.4
LATTICE 42.1 64.4 62.4 45.4 57.4 47.6 37.6 46.4 19.9 34.0 12.0 30.1 47.8
ReasonRank 40.8 62.7 55.5 36.7 54.6 35.7 38.0 44.8 29.5 25.6 14.4 42.0 50.1
XDR2 40.3 63.1 55.4 38.5 52.9 37.1 38.2 44.6 21.9 35.0 15.7 34.4 46.2

Long document

Detailed Results Across 8 Datasets

Model Avg Bio. Earth. Econ. Pony Psy. Rob. Stack. Sus.
INF-X-Retriever 54.6 73.2 59.6 69.3 12.1 74.3 55.9 27.8 64.8
inf-retriever-v1-pro 30.5 44.1 42.2 31.4 0.4 43.1 20.8 21.4 41.0

Notes:


đź§Ş Models

Both models are released under the Apache-2.0 License for both research and commercial application.

Component Hugging Face Repository Description
Query Aligner inf-query-aligner LLM-based intent distiller.
Retriever inf-retriever-v1-pro Advanced dense embedding model.

📝 Citation

If you utilize INF-X-Retriever in your research or production systems, please cite our work:

@misc{inf-x-retriever-2025,
    title        = {INF-X-Retriever: A Pragmatic Framework for Reasoning-Intensive Dense Retrieval},
    author       = {Yichen Yao, Jiahe Wan, Yuxin Hong, Mengna Zhang, Junhan Yang, Zhouyu Jiang, Qing Xu, Kuan Lu, Yinghui Xu, Wei Chu, Emma Wang, Yuan Qi},
    year         = {2025},
    url          = {[https://github.com/yaoyichen/INF-X-Retriever](https://github.com/yaoyichen/INF-X-Retriever)},
    publisher    = {GitHub repository}
}

📬 Contact

We welcome inquiries regarding technical deep-dives, strategic collaborations, or large-scale deployment support. Our team is committed to advancing the boundaries of reasoning-intensive retrieval.


Built with precision by the INF Team.