Connections to Other Chapters
World Models (Chapter 2): World models and agentic search share the fundamental principle of model-based planning -- using an internal model to simulate outcomes before committing to actions.
- Planning as tree search: MuZero-style planning with learned world models is structurally identical to tree search in agentic reasoning: both explore a tree of possibilities, evaluate states, and select the most promising path. Tree-of-Thought (Yao et al., 2023), RAP (Hao et al., 2023), and LATS (Zhou et al., 2024) use the LLM itself as a "world model" that predicts the consequences of reasoning steps.
- Information world models: Agentic search can be viewed as building a world model of the information landscape -- understanding what information exists, where it is located, how it is structured, and how to access it efficiently. The dynamics model in world model RL (predicting the next state given an action) is analogous to the retrieval model in agentic search (predicting what information will be returned given a query).
- LLM-as-world-model: The LLM-as-world-model hypothesis (Hao et al., 2023) (Chapter 2) connects directly to agentic search: if LLMs have implicitly learned world dynamics through text prediction, their chain-of-thought reasoning can be viewed as step-by-step simulation using this implicit world model. The quality of the LLM's internal "world model" determines its effectiveness as a search planner.
- Simulation for planning: Video world models that predict visual futures are analogous to search agents that predict what evidence will be found -- both use simulation to reduce the cost of exploration.
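The structural parallel between MuZero-style planning and Tree-of-Thought search can be made concrete with a small sketch. Here the proposer and evaluator are toy stand-ins for LLM calls (the names `propose_steps`, `evaluate`, and `tree_of_thought_search` are illustrative, not from any cited implementation); in a real system both would query the LLM, with the evaluator playing the "world model" role of predicting step consequences:

```python
import heapq

# Hypothetical stand-ins for LLM calls: a proposer that expands a partial
# reasoning trace into candidate next steps, and an evaluator that scores
# a trace (the LLM acting as a "world model" of step consequences).
def propose_steps(trace):
    return [trace + (s,) for s in ("step_a", "step_b")]

def evaluate(trace):
    return -len(trace)  # toy heuristic; a real system would query the LLM

def tree_of_thought_search(root, depth=3, beam=2):
    """Beam-style tree search over reasoning traces (Tree-of-Thought sketch)."""
    frontier = [root]
    for _ in range(depth):
        # Expand every trace in the frontier, then keep the `beam` best.
        candidates = [c for t in frontier for c in propose_steps(t)]
        frontier = heapq.nlargest(beam, candidates, key=evaluate)
    return max(frontier, key=evaluate)

best = tree_of_thought_search(root=())
```

Swapping in different `evaluate` functions recovers the variants above: a value estimate gives MuZero-flavored planning, while self-critique prompts give ToT/RAP-style deliberation.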
Efficient Architectures (Chapter 3): The computational cost of agentic search is dominated by LLM inference calls, making efficiency a first-class concern:
- Inference cost amplification: An agent that issues 100 LLM calls during a single research task multiplies the per-task inference cost by that fan-out factor, so inference optimizations (speculative decoding, KV-cache management, quantization from Chapter 3) directly determine the economic viability of agentic search.
- Efficient embedding models: The retrieval component of RAG systems depends on efficient embedding models for dense retrieval. The bi-encoder vs. cross-encoder tradeoff (ColBERT's late interaction (Khattab & Zaharia, 2020) as a middle ground) directly affects search latency. Modern embedding models (E5, GTE, BGE) trained with contrastive learning provide better retrieval quality at manageable cost.
- KV-cache reuse: Multi-turn agentic conversations can reuse KV-cache across turns (RadixAttention in SGLang), connecting serving system design to agentic workflow efficiency. For agents that maintain long conversations with tool calls, KV-cache management determines whether the system can retain context cheaply or must re-process the entire conversation prefix on every turn.
- MoE for multi-capability agents: MoE models are particularly relevant for agentic systems that must switch between different modes (retrieval, reasoning, generation, tool use), as different experts can specialize in different modes without proportional compute cost.
- Batch inference for search: The tree search methods in agentic reasoning require efficient batch inference for parallel candidate evaluation. Continuous batching and speculative decoding (Chapter 3) directly impact the throughput of search-based reasoning.
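The bi-encoder side of the tradeoff above is cheap precisely because each document is embedded once, offline, and a query costs one dot product per document. A minimal sketch (random vectors stand in for a trained embedding model such as E5/GTE/BGE; `embed` and `retrieve` are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embedder: a real system uses a trained bi-encoder
# (e.g. E5, GTE, BGE) to embed documents offline and queries online.
def embed(texts, dim=64):
    return rng.standard_normal((len(texts), dim))

docs = ["doc about transformers", "doc about retrieval", "doc about caching"]
doc_vecs = embed(docs)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)  # unit-normalize once

def retrieve(query_vec, k=2):
    """Bi-encoder retrieval: one dot product per document, no joint encoding."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = doc_vecs @ q                 # cosine similarity against all docs
    top = np.argsort(-scores)[:k]         # indices of the k best matches
    return [(docs[i], float(scores[i])) for i in top]

results = retrieve(embed(["which passage covers caching?"])[0])
```

A cross-encoder would instead run the full model on every (query, document) pair, which is why late-interaction designs like ColBERT sit between the two in both cost and quality.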
Continual Learning (Chapter 1): Search agents that operate over extended periods face classic continual learning challenges:
- Continual retrieval: Retrieval indices must be updated as new documents are added or existing ones change, creating a continual learning challenge for the retrieval component -- the retriever must incorporate new documents without degrading retrieval quality for existing queries. This is essentially a class-incremental learning problem applied to retrieval (De Lange et al., 2021).
- Knowledge consistency: Knowledge editing in LLMs (ROME (Meng et al., 2022), MEMIT (Meng et al., 2023) from Chapter 1) connects to RAG systems where the model's parametric knowledge must be kept consistent with retrieved evidence. When parametric and retrieved knowledge conflict, the system must decide which to trust -- a problem that Self-RAG (Asai et al., 2024) addresses through self-reflection tokens.
- Strategy accumulation: Self-improving search agents that learn from their search experiences (Reflexion (Shinn et al., 2023), Voyager (Wang et al., 2023)) face the classic stability-plasticity tradeoff: how to accumulate effective search strategies without forgetting strategies that worked for previous types of questions. The episodic memory in Reflexion functions as a replay buffer that must be managed to avoid interference.
- Continual alignment: As search agents are continually updated with new capabilities, maintaining alignment properties (safety, truthfulness, helpfulness) across updates is a continual learning problem addressed by the techniques in Chapter 1.
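The replay-buffer view of Reflexion's episodic memory can be sketched directly: a bounded store of past reflections keyed by task type, where fixed capacity forces exactly the stability-plasticity tradeoff described above. This is an illustrative sketch, not the Reflexion implementation; the class and method names are assumptions:

```python
from collections import deque

class EpisodicMemory:
    """Bounded store of (task_type, reflection) pairs, Reflexion-style.

    Fixed capacity forces a stability-plasticity tradeoff: once full,
    each new reflection displaces the oldest one in the buffer.
    """
    def __init__(self, capacity=100):
        self.buffer = deque(maxlen=capacity)

    def add(self, task_type, reflection):
        self.buffer.append((task_type, reflection))

    def recall(self, task_type, k=3):
        # Return up to k reflections for this task type, newest first.
        matches = [r for t, r in reversed(self.buffer) if t == task_type]
        return matches[:k]

mem = EpisodicMemory(capacity=2)
mem.add("math", "decompose before computing")
mem.add("search", "verify sources against each other")
mem.add("math", "check units at the end")  # evicts the oldest entry
```

The FIFO eviction here is the naive policy; the interference problem in the text is precisely that smarter policies (e.g. keeping one exemplar per task type) are needed so useful old strategies are not silently displaced.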
Randomized Algorithms (Chapter 5): Approximate nearest neighbor search -- the computational backbone of dense retrieval in RAG systems -- relies fundamentally on randomized data structures:
- ANN search infrastructure: Locality-sensitive hashing enables sub-linear retrieval from billion-scale document collections. HNSW (Hierarchical Navigable Small World) graphs (Malkov & Yashunin, 2020) use randomized graph construction for efficient approximate nearest neighbor search, and FAISS (Johnson et al., 2021) provides GPU-accelerated implementations for billion-scale vector search. The retrieval quality-speed tradeoff is directly determined by the approximation parameters of these randomized data structures.
- Bloom filters for deduplication: Bloom filters enable efficient membership testing for deduplication of retrieved documents, preventing redundant processing in multi-step search. Agentic search systems that retrieve from large corpora use Bloom filters to track which documents have already been retrieved.
- Sketching for efficiency: Count-Min sketches can track query frequency distributions for cache optimization in retrieval systems. The streaming algorithms perspective from Chapter 5 provides the theoretical framework for processing retrieval results in a single pass with bounded memory, directly applicable to online index updates.
- Random projections for embedding: Random projections reduce the dimensionality of embedding spaces for efficient indexing, with the JL lemma guaranteeing that pairwise distances (and thus retrieval quality) are approximately preserved.
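The Bloom-filter deduplication pattern above fits in a few lines. This is a minimal stdlib sketch (sizes and hash scheme are illustrative, not tuned): k bit positions are derived per item, and membership holds only if all k bits are set, so false negatives are impossible while false positives are rare for a suitably sized filter:

```python
import hashlib

class BloomFilter:
    """Probabilistic set for deduplicating retrieved document IDs.

    False positives are possible (a new doc may occasionally be skipped);
    false negatives are not -- an acceptable tradeoff for deduplication.
    """
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes independent bit positions by salting one hash.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

seen = BloomFilter()
seen.add("doc-123")
```

In a multi-step search loop, the agent checks `doc_id in seen` before re-processing a retrieved document and calls `seen.add(doc_id)` after, at a memory cost of a few bits per document rather than storing the IDs themselves.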
References
- Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi (2024). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. ICLR.
- Shibo Hao, Yi Gu, Haodi Ma (2023). Reasoning with Language Model is Planning with World Model. EMNLP.
- Jeff Johnson, Matthijs Douze, Hervé Jégou (2021). Billion-scale Similarity Search with GPUs. IEEE Transactions on Big Data.
- Omar Khattab, Matei Zaharia (2020). ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. SIGIR.
- Matthias De Lange, Rahaf Aljundi, Marc Masana (2021). A Continual Learning Survey: Defying Forgetting in Classification Tasks. IEEE TPAMI.
- Yu A. Malkov, Dmitry A. Yashunin (2020). Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE TPAMI.
- Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov (2022). Locating and Editing Factual Associations in GPT. NeurIPS.
- Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, David Bau (2023). Mass-Editing Memory in a Transformer. ICLR.
- Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. NeurIPS.
- Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv.
- Shunyu Yao, Dian Yu, Jeffrey Zhao (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS.
- Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman (2024). Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models. ICML.