A Pragmatic, Production-Grade Framework for Reasoning-Intensive Retrieval
INF-X-Retriever is a high-performance dense retrieval framework developed by INF.
It provides a robust solution for reasoning-intensive retrieval across a diverse collection of tasks (the "X") with minimal supervision, prioritizing architectural simplicity, deployment reliability, and deep semantic alignment.
Introduction • Design Principles • Architecture • Performance • Models • Citation • Contact
The evolution of Large Language Models (LLMs) has redefined information retrieval, shifting the paradigm from surface-level keyword matching to intent-aware reasoning. Modern queries in RAG (Retrieval-Augmented Generation) pipelines often contain intricate narrative contexts, logical constraints, and domain-specific directives—elements that act as “semantic noise” for conventional retrieval systems.
INF-X-Retriever addresses this challenge through Intent Distillation. By aligning complex queries into a unified semantic space and executing single-stage dense retrieval, it effectively penetrates surface-level complexity to reach core information. Our approach is validated by its 1st-place ranking on the BRIGHT Benchmark, a rigorous evaluation suite dedicated to reasoning-heavy retrieval scenarios.
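To make the idea of Intent Distillation concrete, the toy sketch below (not the released implementation — it uses bag-of-words vectors where the real system uses a dense transformer encoder) shows how stripping narrative noise from a query changes the nearest-neighbor ranking in a single-stage dense retriever. All strings and the `retrieve` helper are illustrative.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real system uses a dense
    # transformer encoder (inf-retriever-v1-pro).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str]) -> str:
    # Single-stage dense retrieval: rank the corpus by similarity
    # to the query and return the top hit -- no reranking stage.
    return max(corpus, key=lambda d: cosine(embed(query), embed(d)))

corpus = [
    "binary search finds an element in a sorted array",
    "my homework story about arrays and deadlines",
]

noisy = "my professor gave homework with a deadline story about finding an element"
aligned = "finding an element in a sorted array"  # what intent distillation produces

print(retrieve(noisy, corpus))    # narrative noise pulls in the wrong document
print(retrieve(aligned, corpus))  # the distilled query reaches the right one
```

The "semantic noise" in the raw query dominates its representation; once the query is aligned to its core intent, plain similarity search suffices.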
Our framework is guided by engineering practicality and first-principles reasoning. We intentionally avoid architectural bloat in favor of production readiness and computational efficiency.
🎯 Core Principle: “Less is More” — Maximal efficacy through deliberate minimalism.
While reranking stages can marginally improve accuracy, they introduce significant latency and operational overhead. In modern RAG pipelines, downstream LLMs already perform implicit context discrimination during synthesis. We demonstrate that a high-fidelity single-stage retriever provides the optimal balance of precision and throughput, significantly simplifying deployment and monitoring in production environments.
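Concretely, skipping the reranker means retrieved passages flow straight into the generation prompt, where the downstream LLM performs any remaining relevance filtering. A minimal sketch of that hand-off (the prompt template and function name are illustrative, not part of the released framework):

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    # Top-k passages from the single-stage retriever go directly into
    # the prompt; no reranking stage sits between retrieval and
    # generation, so there is one fewer model to serve and monitor.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_rag_prompt(
    "What is the time complexity of binary search?",
    ["Binary search runs in O(log n) time.", "Linear scan runs in O(n)."],
)
print(prompt)
```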
Hypothetical Document Embeddings (HyDE) rely on LLMs to generate “pseudo-answers” to guide retrieval. This introduces two critical failure modes: hallucinated pseudo-answers can steer retrieval toward plausible-looking but wrong documents, and the extra generation step adds latency and cost to every query.
We therefore eliminate components that introduce fragility or hyperparameter sensitivity.
The framework consists of two tightly integrated, purpose-built components:
- Query Aligner (inf-query-aligner): an LLM-based intent distiller that condenses complex queries to their core information need.
- Retriever (inf-retriever-v1-pro): built on inf-retriever-v1 with targeted long-query adaptation.
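The division of labor above can be sketched as a two-stage wiring. The class and the `toy_align`/`toy_encode` functions below are hypothetical stand-ins for the released inf-query-aligner and inf-retriever-v1-pro models, just to show how the pieces compose:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Stand-ins for the real models: the aligner distills intent from a
# noisy query; the encoder maps text to a dense vector.
Aligner = Callable[[str], str]
Encoder = Callable[[str], Sequence[float]]

@dataclass
class XRetriever:
    align: Aligner    # role of inf-query-aligner (LLM-based intent distiller)
    encode: Encoder   # role of inf-retriever-v1-pro (dense embedding model)

    def search(self, query: str, corpus: list[str], k: int = 2) -> list[str]:
        # 1) intent distillation, 2) single-stage dense retrieval.
        q = self.encode(self.align(query))
        def score(d: str) -> float:
            return sum(a * b for a, b in zip(q, self.encode(d)))
        return sorted(corpus, key=score, reverse=True)[:k]

# Toy stand-ins so the sketch runs end to end.
def toy_align(query: str) -> str:
    return query.replace("please help me understand ", "")

def toy_encode(text: str) -> list[float]:
    # Crude 3-dimensional feature vector: counts of three topic words.
    return [float(text.count(w)) for w in ("graph", "matrix", "prime")]

retriever = XRetriever(align=toy_align, encode=toy_encode)
docs = ["shortest paths in a graph", "matrix factorization basics", "prime sieves"]
print(retriever.search("please help me understand graph algorithms", docs, k=1))
```

In production the two callables would be the released models; the surrounding logic stays this simple because there is no reranking stage to orchestrate.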
INF-X-Retriever achieves state-of-the-art results on the BRIGHT Benchmark (as of Dec 20, 2025).
BRIGHT (a Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval) is a rigorous text retrieval benchmark designed to evaluate the capability of retrieval models in handling questions that require intensive reasoning and cross-document synthesis. Collected from real-world sources such as StackExchange, competitive programming platforms, and mathematical competitions, it comprises complex queries spanning diverse domains such as mathematics, coding, biology, economics, and robotics.
Why BRIGHT Matters: its queries cannot be solved by surface-level keyword matching alone; strong scores require genuine reasoning about the query's underlying intent.
| Model | Avg (All) | StackExchange | Coding | Theorem-based |
|---|---|---|---|---|
| INF-X-Retriever | 63.4 | 68.3 | 55.3 | 57.7 |
| DIVER (v3) | 46.8 | 51.8 | 39.9 | 39.7 |
| BGE-Reasoner-0928 | 46.4 | 52.0 | 35.3 | 40.7 |
| LATTICE | 42.1 | 51.6 | 26.9 | 30.0 |
| ReasonRank | 40.8 | 46.9 | 27.6 | 35.5 |
| XDR2 | 40.3 | 47.1 | 28.5 | 32.1 |
| Model | Avg | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| INF-X-Retriever | 63.4 | 79.8 | 70.9 | 69.9 | 73.3 | 57.7 | 64.3 | 61.9 | 56.1 | 54.5 | 51.9 | 53.1 | 67.9 |
| DIVER (v3) | 46.8 | 66.0 | 63.7 | 42.4 | 55.0 | 40.6 | 44.7 | 50.4 | 32.5 | 47.3 | 17.2 | 46.4 | 55.6 |
| BGE-Reasoner-0928 | 46.4 | 68.5 | 66.4 | 40.6 | 53.1 | 43.2 | 44.1 | 47.8 | 29.0 | 41.6 | 17.2 | 46.5 | 58.4 |
| LATTICE | 42.1 | 64.4 | 62.4 | 45.4 | 57.4 | 47.6 | 37.6 | 46.4 | 19.9 | 34.0 | 12.0 | 30.1 | 47.8 |
| ReasonRank | 40.8 | 62.7 | 55.5 | 36.7 | 54.6 | 35.7 | 38.0 | 44.8 | 29.5 | 25.6 | 14.4 | 42.0 | 50.1 |
| XDR2 | 40.3 | 63.1 | 55.4 | 38.5 | 52.9 | 37.1 | 38.2 | 44.6 | 21.9 | 35.0 | 15.7 | 34.4 | 46.2 |
| Model | Avg | Bio. | Earth. | Econ. | Pony | Psy. | Rob. | Stack. | Sus. |
|---|---|---|---|---|---|---|---|---|---|
| INF-X-Retriever | 54.6 | 73.2 | 59.6 | 69.3 | 12.1 | 74.3 | 55.9 | 27.8 | 64.8 |
| inf-retriever-v1-pro | 30.5 | 44.1 | 42.2 | 31.4 | 0.4 | 43.1 | 20.8 | 21.4 | 41.0 |
Notes:
Both models are released under the Apache-2.0 License for both research and commercial use.
| Component | Hugging Face Repository | Description |
|---|---|---|
| Query Aligner | inf-query-aligner | LLM-based intent distiller. |
| Retriever | inf-retriever-v1-pro | Advanced dense embedding model. |
If you utilize INF-X-Retriever in your research or production systems, please cite our work:
@misc{inf-x-retriever-2025,
  title = {INF-X-Retriever: A Pragmatic Framework for Reasoning-Intensive Dense Retrieval},
  author = {Yao, Yichen and Wan, Jiahe and Hong, Yuxin and Zhang, Mengna and Yang, Junhan and Jiang, Zhouyu and Xu, Qing and Lu, Kuan and Xu, Yinghui and Chu, Wei and Wang, Emma and Qi, Yuan},
  year = {2025},
  url = {https://github.com/yaoyichen/INF-X-Retriever},
  howpublished = {GitHub repository}
}
We welcome inquiries regarding technical deep-dives, strategic collaborations, or large-scale deployment support. Our team is committed to advancing the boundaries of reasoning-intensive retrieval.
Built with precision by the INF Team.