Introduction & Motivation
Traditional information retrieval operates on a simple paradigm: a user formulates a query, a system returns ranked results, and the user evaluates them. This paradigm, formalized by Robertson and Zaragoza (2009) and implemented through systems like BM25 (Robertson et al., 1995) and TF-IDF, has served as the backbone of web search for decades. But this static, one-shot interaction is fundamentally limited for complex information needs that require multi-step reasoning, synthesis across sources, or iterative refinement of understanding [@metzler2021rethinking, @nakano2021webgpt]. A researcher investigating a complex question -- "What are the most promising approaches to carbon capture, and what are their economic viability projections?" -- cannot answer it with a single search query. They must conduct dozens of searches, read and evaluate papers, follow citation chains, cross-reference data from different sources, and synthesize findings into a coherent picture.
Agentic search represents a paradigm shift: instead of returning documents, the system actively pursues answers through multi-step information-seeking strategies. An agentic search system can reformulate queries, follow citation chains, synthesize information from multiple sources, verify claims against evidence, and iteratively refine its understanding -- all autonomously [@singh2025agentic, @wang2024survey]. The distinction between traditional retrieval and agentic search is analogous to the distinction between looking up a word in a dictionary and conducting a research project: the former is a single lookup, the latter is a goal-directed process involving planning, execution, evaluation, and iteration.
The emergence of large language models (LLMs) as capable reasoning engines has made agentic search practically viable. LLMs provide the "brain" that can plan search strategies, interpret results, decide what to search next, and synthesize findings into coherent answers. When augmented with tools -- web search engines, databases, APIs, code executors -- LLMs become search agents that can tackle information needs far beyond the reach of traditional IR systems [@qin2023tool, @schick2023toolformer]. The convergence of four capabilities enables this transformation:
- Reasoning and planning. Modern LLMs can decompose complex questions into sub-questions, identify what information is needed, and plan multi-step search strategies. Chain-of-thought prompting (Wei et al., 2022), zero-shot reasoning (Kojima et al., 2022), least-to-most decomposition (Zhou et al., 2023), and self-consistency (Wang et al., 2023) have demonstrated that LLMs can perform the kind of structured reasoning needed to navigate complex information landscapes.
- Tool use. LLMs can learn to invoke external tools -- search engines, calculators, code interpreters, databases -- through in-context learning or fine-tuning [@schick2023toolformer, @qin2023tool]. This gives them the ability to interact with the information ecosystem rather than relying solely on their parametric knowledge.
- Self-evaluation. LLMs can assess the quality and relevance of retrieved information, identify gaps in their knowledge, and decide when more evidence is needed. This self-monitoring capability enables adaptive search strategies that respond to the quality of initial results.
- Synthesis. LLMs can combine information from multiple sources into coherent, well-structured answers with proper attribution. This synthesis capability distinguishes agentic search from traditional retrieval, which returns documents rather than answers.
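The interplay of these four capabilities can be made concrete with a minimal sketch of an agentic search loop. Everything here is hypothetical: `llm` and `search` are canned stubs standing in for an LLM API and a search-engine tool, and the control flow is illustrative rather than any particular system's design.

```python
def llm(prompt: str) -> str:
    """Stub LLM: returns canned decisions so the sketch is runnable."""
    if "decompose" in prompt:
        # Planning: split the question into sub-questions.
        return "approaches to carbon capture; economic viability of carbon capture"
    # Synthesis: in a real system, the LLM would write a grounded answer.
    return "SYNTHESIZED from: " + prompt

def search(query: str) -> list[str]:
    """Stub search tool: returns placeholder snippets."""
    return [f"snippet about {query}"]

def agentic_search(question: str, max_steps: int = 5) -> str:
    """Plan -> retrieve -> self-evaluate -> synthesize loop."""
    # Reasoning and planning: decompose into sub-questions.
    sub_questions = llm(f"decompose: {question}").split("; ")
    evidence: list[str] = []
    for step, sq in enumerate(sub_questions):
        if step >= max_steps:
            break
        # Tool use: retrieve evidence for the current sub-question.
        evidence.extend(search(sq))
        # Self-evaluation: stop once evidence covers every sub-question
        # (a real agent would judge relevance, not just count).
        if len(evidence) >= len(sub_questions):
            break
    # Synthesis: combine the gathered evidence into one answer.
    return llm(" | ".join(evidence))

answer = agentic_search("What are the most promising approaches to carbon capture?")
```

The stopping rule here is deliberately naive; real systems replace it with an LLM judgment of whether the evidence suffices, which is exactly the self-evaluation capability described above.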
This chapter surveys the landscape of agentic search, from foundational retrieval-augmented generation (RAG) to sophisticated multi-hop reasoning systems, from web browsing agents to search-guided mathematical proof and code generation. We organize the literature by the level of agency: from passive retrieval augmentation (the system retrieves once and generates), through active multi-step search (the system iteratively retrieves and reasons), to fully autonomous search agents (the system plans and executes complex research workflows). Throughout, we emphasize the underlying computational principles -- search as sequential decision-making, the explore-exploit tradeoff in information gathering, and the role of verification in guiding search -- that connect these diverse approaches.
The field is progressing along several axes simultaneously: from simple to complex retrieval strategies (single-shot RAG to multi-hop iterative search), from narrow to broad tool use (search-only to multi-tool agents), from supervised to self-improving systems (human-designed strategies to learned search policies), and from text-only to multi-modal search (document retrieval to web browsing with vision). Understanding the current state of each axis -- and the interactions between them -- is essential for navigating this rapidly evolving landscape.
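The first axis, from single-shot RAG to multi-hop iterative search, can be illustrated with a toy sketch. The `retrieve` function and its tiny corpus are invented for illustration; the point is that a compositional question fails under one retrieval but succeeds when an intermediate answer reformulates the next query.

```python
def retrieve(query: str) -> str:
    """Stub retriever over a toy corpus of canned passages."""
    corpus = {
        "author of Hamlet": "Hamlet was written by William Shakespeare.",
        "birthplace of William Shakespeare":
            "Shakespeare was born in Stratford-upon-Avon.",
    }
    return corpus.get(query, "no direct match")

# Single-shot RAG: one retrieval, then generation. The compositional
# question matches no single passage, so retrieval comes back empty.
single_shot = retrieve("birthplace of the author of Hamlet")

# Multi-hop search: the first hop surfaces "William Shakespeare",
# which is used to reformulate the second query.
hop1 = retrieve("author of Hamlet")
hop2 = retrieve("birthplace of William Shakespeare")
```

Bridging the two queries (extracting "William Shakespeare" from `hop1` and rewriting the follow-up) is precisely the reasoning step that moves a system from passive retrieval augmentation to active multi-step search.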
References
- Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa (2022). Large Language Models are Zero-Shot Reasoners. NeurIPS.
- Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, Mike Gatford (1995). Okapi at TREC-3. TREC.
- Stephen Robertson, Hugo Zaragoza (2009). The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval.
- Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou (2023). Self-Consistency Improves Chain of Thought Reasoning in Language Models. ICLR.
- Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS.
- Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc V. Le, Ed H. Chi (2023). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. ICLR.