INF-X-Retriever

A pragmatic, general solution for reasoning-intensive retrieval

INF-X-Retriever is a production-grade dense reasoning retrieval framework developed by INF.
It delivers robust retrieval performance across arbitrary task collections (X) with minimal supervision, emphasizing deployability, reliability, and reasoning depth over architectural complexity.


📖 Introduction

Large Language Models (LLMs) have shifted information retrieval from keyword matching to intent-aware reasoning. Modern queries often include narrative context, constraints, and formatting directives—elements that are semantically noisy for conventional retrieval systems.

INF-X-Retriever addresses this shift by performing intent distillation on complex queries and executing single-stage dense retrieval. The approach is validated on the BRIGHT Benchmark, which reflects realistic, reasoning-heavy retrieval scenarios.

💡 Design Principles

Our design emphasizes engineering practicality and first-principles reasoning. We prioritize production readiness, architectural coherence, and computational efficiency.

[Figure: Pipeline Comparison]

🎯 Core Principle: “Less is More” — Maximal efficacy through deliberate minimalism.

▫️ No Rerankers

Reranking stages add latency and operational overhead, while downstream LLMs in RAG pipelines already perform implicit context discrimination during answer synthesis. In production environments, the marginal gains from explicit reranking often do not justify the additional complexity in deployment, monitoring, and maintenance.

Our solution achieves robust performance via a single-stage dense retrieval pipeline, favoring operational simplicity and efficiency.

▫️ No HyDE

Hypothetical Document Embeddings (HyDE) first generates a hypothetical answer with an LLM and then retrieves documents similar to that answer. This has two methodological weaknesses:

When the LLM already possesses the necessary knowledge, retrieval adds little value.
When the LLM lacks the domain knowledge (the common case for RAG), the generated “hypothetical answer” may be unreliable, steering retrieval toward misleading content.

We therefore perform direct query alignment—extracting core retrieval intent without generating hypothetical content—so that retrieval remains grounded in user requirements and source documents.
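The difference between the two approaches comes down to which text is handed to the embedding model. The sketch below is illustrative only: `hyde_retrieval_text`, `aligned_retrieval_text`, and both stub callables are hypothetical names introduced here, not part of the released models, and the stubs return fixed strings so the example runs without an LLM.

```python
from typing import Callable

# Hypothetical alias: any function that maps input text to generated text.
TextGenerator = Callable[[str], str]


def hyde_retrieval_text(query: str, generate: TextGenerator) -> str:
    """HyDE-style flow: the embedder sees an LLM-written answer."""
    return generate(f"Write a passage that answers: {query}")


def aligned_retrieval_text(query: str, align: TextGenerator) -> str:
    """Direct alignment: the embedder sees a distilled form of the query."""
    return align(query)


def stub_llm(prompt: str) -> str:
    # Stand-in for an LLM; the content may be wrong if the model lacks
    # domain knowledge, and retrieval would then follow that content.
    return "A confident but unverified passage about the topic."


def stub_aligner(query: str) -> str:
    # Stand-in for a query aligner; it adds no facts of its own.
    return "why do plant leaves turn yellow in winter"


raw_query = (
    "I run a small lab. Why do plant leaves turn yellow in winter? "
    "Please answer in bullet points."
)
print(hyde_retrieval_text(raw_query, stub_llm))
print(aligned_retrieval_text(raw_query, stub_aligner))
```

In the aligned flow, every term that reaches the embedder originates from the user's request, which is the grounding property described above.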

Operational Simplicity

We avoid techniques that introduce fragility or unnecessary complexity:

No sparse retrieval (e.g., BM25) — eliminates hybrid fusion complexity and hyperparameter sensitivity
No multi-query expansion — single-pass alignment minimizes latency
No ensemble methods — favors robustness and maintainability

Result: a system that is streamlined, latency-conscious, and transparent for diagnostics in production.

🛠️ Architecture

Our system comprises two tightly integrated components:

Query Aligner

Model: 🤗 inf-query-aligner
Method: Reinforcement Learning fine-tuning on Qwen2.5-7B-Instruct
Function: Semantic intent distillation from verbose, complex queries. Performs pure query alignment to extract the core retrieval intent without generating hypothetical content.

Retriever

Model: 🤗 inf-retriever-v1-pro
Method: Continual training on the general-purpose inf-retriever-v1 backbone with targeted long-query adaptation.
Function: Generalized dense retrieval architecture built for cross-task transfer and stability.
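As a rough illustration of how the two components fit together, the sketch below chains an alignment step and a single dense-retrieval pass. Everything in it is an assumption made for the example: `align_query` is a toy heuristic standing in for inf-query-aligner, `embed` is a bag-of-words counter standing in for inf-retriever-v1-pro, and the corpus is made up. It shows the control flow only, not the models' actual interfaces.

```python
import math
import re
from collections import Counter


def align_query(raw_query: str) -> str:
    """Toy stand-in for the Query Aligner: keep only question sentences."""
    sentences = re.split(r"(?<=[.?!])\s+", raw_query.strip())
    questions = [s for s in sentences if s.endswith("?")]
    return " ".join(questions) if questions else raw_query.strip()


def embed(text: str) -> Counter:
    """Toy stand-in for the Retriever encoder: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(raw_query, corpus, k=2, use_aligner=True):
    """Single-stage retrieval: align once, embed once, rank by cosine."""
    text = align_query(raw_query) if use_aligner else raw_query
    query_vec = embed(text)
    scored = [(i, cosine(query_vec, embed(doc))) for i, doc in enumerate(corpus)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


corpus = [
    "Chlorophyll breaks down in cold weather, so leaves turn yellow.",
    "Please format every answer in bullet points for the weekly report.",
    "Robots use lidar to map a room.",
]
raw_query = (
    "I am writing a weekly report. Please answer in bullet points. "
    "Why do leaves turn yellow in cold weather?"
)

print(retrieve(raw_query, corpus, k=1))                     # top hit is document 0
print(retrieve(raw_query, corpus, k=1, use_aligner=False))  # top hit is document 1
```

On this toy data the unaligned query ranks the formatting-related document first, because the narrative and formatting sentences dominate the term overlap. Alignment removes them before embedding, so no reranking or second retrieval pass is involved.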

[Figure: INF-X-Retriever Architecture]

📊 Performance

Short document

Overall & Category Performance

Model	Avg ALL	StackExchange	Coding	Theorem-based
INF-X-Retriever	63.4	68.3	55.3	57.7
DIVER (v3)	46.8	51.8	39.9	39.7
BGE-Reasoner-0928	46.4	52.0	35.3	40.7
LATTICE	42.1	51.6	26.9	30.0
ReasonRank	40.8	46.9	27.6	35.5
XDR2	40.3	47.1	28.5	32.1

Detailed Results Across 12 Datasets

Model	Avg	Bio.	Earth.	Econ.	Psy.	Rob.	Stack.	Sus.	Leet.	Pony	AoPS	TheoQ.	TheoT.
INF-X-Retriever	63.4	79.8	70.9	69.9	73.3	57.7	64.3	61.9	56.1	54.5	51.9	53.1	67.9
DIVER (v3)	46.8	66.0	63.7	42.4	55.0	40.6	44.7	50.4	32.5	47.3	17.2	46.4	55.6
BGE-Reasoner-0928	46.4	68.5	66.4	40.6	53.1	43.2	44.1	47.8	29.0	41.6	17.2	46.5	58.4
LATTICE	42.1	64.4	62.4	45.4	57.4	47.6	37.6	46.4	19.9	34.0	12.0	30.1	47.8
ReasonRank	40.8	62.7	55.5	36.7	54.6	35.7	38.0	44.8	29.5	25.6	14.4	42.0	50.1
XDR2	40.3	63.1	55.4	38.5	52.9	37.1	38.2	44.6	21.9	35.0	15.7	34.4	46.2
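The overall and category columns can be related to the per-dataset scores with a simple unweighted mean. The sketch below recomputes them for the INF-X-Retriever row, assuming BRIGHT's usual grouping of the 12 datasets (seven StackExchange, two Coding, three Theorem-based); the variable names are illustrative.

```python
from statistics import mean

# INF-X-Retriever per-dataset scores, copied from the table above.
scores = {
    "Bio.": 79.8, "Earth.": 70.9, "Econ.": 69.9, "Psy.": 73.3,
    "Rob.": 57.7, "Stack.": 64.3, "Sus.": 61.9,
    "Leet.": 56.1, "Pony": 54.5,
    "AoPS": 51.9, "TheoQ.": 53.1, "TheoT.": 67.9,
}

# Assumed grouping, following the BRIGHT benchmark's categories.
categories = {
    "StackExchange": ["Bio.", "Earth.", "Econ.", "Psy.", "Rob.", "Stack.", "Sus."],
    "Coding": ["Leet.", "Pony"],
    "Theorem-based": ["AoPS", "TheoQ.", "TheoT."],
}

overall = round(mean(scores.values()), 1)
by_category = {
    name: round(mean(scores[d] for d in datasets), 1)
    for name, datasets in categories.items()
}

print(overall)      # 63.4
print(by_category)
```

Recomputing from the rounded per-dataset values reproduces Avg ALL (63.4), StackExchange (68.3), and Coding (55.3). Theorem-based comes out as 57.6 rather than the reported 57.7, a last-digit difference that would be expected if the reported figure was averaged before rounding.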

Long document

Detailed Results Across 8 Datasets

Model	Avg	Bio.	Earth.	Econ.	Pony	Psy.	Rob.	Stack.	Sus.
INF-X-Retriever	54.6	73.2	59.6	69.3	12.1	74.3	55.9	27.8	64.8
inf-retriever-v1-pro	30.5	44.1	42.2	31.4	0.4	43.1	20.8	21.4	41.0

Notes:

Results reflect end-to-end retrieval accuracy on BRIGHT under the official evaluation protocol.
Performance may vary with hardware, index size, and dataset versions.
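The notes above do not name the metric. Assuming the short-document scores are nDCG@10, which is what BRIGHT reports for that setting, a per-query score could be computed roughly as follows. The function names and the toy relevance labels are illustrative, and this is not the official evaluation script.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids in the order returned by the retriever.
    relevance: mapping from document id to graded relevance (0 if absent).
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal if ideal else 0.0


# Toy example: one relevant document, retrieved at rank 2.
print(ndcg_at_k(["x", "a"], {"a": 1}))
```

A dataset score would then be the mean of this value over all queries, typically scaled to 0-100 to match the tables.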

🧪 Models

Query Aligner: inf-query-aligner
Retriever: inf-retriever-v1-pro

Both models are released under Apache-2.0 for research and production use.

📄 License

INF-X-Retriever is released under the Apache-2.0 License.

📝 Citation

If you use INF-X-Retriever in your research or products, please cite:

@misc{inf-x-retriever-2025,
    title        = {INF-X-Retriever},
    author       = {Yichen Yao and Jiahe Wan and Yuxin Hong and Mengna Zhang and Junhan Yang and Zhouyu Jiang and Qing Xu and Kuan Lu and Yinghui Xu and Wei Chu and Yuan Qi},
    year         = {2025},
    url          = {https://yaoyichen.github.io/INF-X-Retriever},
    publisher    = {GitHub repository}
}

📬 Contact

We welcome collaboration and inquiries from researchers and practitioners interested in reasoning-intensive retrieval.

Yichen Yao
Email: eason.yyc@inftech.ai

For technical discussions, collaborations, or deployment questions, please get in touch.