Taxonomy of Approaches
Continual learning methods can be organized into five major families [@parisi2019continual, @delange2021continual, @masana2023class, @wang2024comprehensive]:
- Regularization-based methods add penalty terms to the loss function that discourage changes to parameters important for previous tasks. This family includes weight regularization (EWC (Kirkpatrick et al., 2017), SI (Zenke et al., 2017), MAS (Aljundi et al., 2018)) and functional regularization (LwF (Li & Hoiem, 2017), PODNet (Douillard et al., 2020), knowledge distillation (Hinton et al., 2015)); a minimal sketch of a weight-regularization penalty appears after this list. Regularization methods are memory-efficient (no data storage needed) but suffer from capacity saturation on long task sequences -- as more tasks are learned, the feasible region of parameter space that satisfies all constraints shrinks, eventually leaving insufficient capacity for new learning (Hsu et al., 2018).
- Replay-based methods store or generate exemplars from previous tasks and interleave them with new task data during training. This family includes experience replay with stored exemplars (ER (Chaudhry et al., 2019), GEM (Lopez-Paz & Ranzato, 2017), DER++ (Buzzega et al., 2020)), generative replay using learned generative models (DGR (Shin et al., 2017), DDGR (Gao & Liu, 2023)), and compressed replay approaches (REMIND (Hayes et al., 2020)); a sketch of a simple replay buffer also appears after this list. Replay methods are currently the dominant paradigm, consistently achieving state-of-the-art results across settings [@buzzega2020dark, @boschini2022class], but require storing previous data, which may conflict with privacy or memory constraints.
- Architecture-based methods allocate dedicated parameters or subnetworks for each task, preventing interference by design. This family includes parameter isolation (PackNet (Mallya & Lazebnik, 2018), HAT (Serra et al., 2018), SupSup (Wortsman et al., 2020)), dynamic expansion (PNN (Rusu et al., 2016), DEN (Yoon et al., 2018), FOSTER (Wang et al., 2022)), and modular networks (Mendez & Eaton, 2022). Architecture methods can achieve zero forgetting by construction but face challenges in scaling to many tasks and in enabling backward transfer.
- Meta-learning-based methods learn to learn continually, optimizing for the ability to quickly adapt to new tasks while retaining old knowledge. This family includes optimization-based meta-learning (OML (Javed & White, 2019), La-MAML (Gupta et al., 2020), ANML (Beaulieu et al., 2020)) and metric-based approaches, building on the broader meta-learning framework [@hospedales2021metalearning, @finn2017model]. Meta-learning methods can achieve strong forward transfer but are computationally expensive during meta-training.
- Prompt-based methods represent a new paradigm enabled by pre-trained vision transformers and large language models. Instead of modifying model weights, these methods learn task-specific prompts (small learnable parameters prepended to the input or hidden layers) while keeping the pre-trained backbone frozen. This family includes L2P (Wang et al., 2022), DualPrompt (Wang et al., 2022), S-Prompts (Wang et al., 2022), and CODA-Prompt (Smith et al., 2023). Prompt-based methods achieve strong performance with minimal forgetting, as the backbone remains unchanged, but depend on the quality and generality of the pre-trained model.
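To make the regularization idea concrete, below is a minimal PyTorch-style sketch of an EWC-like quadratic penalty. The names fisher_diag, old_params, and ewc_lambda are illustrative assumptions, and the Fisher diagonal is assumed to have been estimated on the previous task's data; this is a sketch of the general technique, not a reproduction of any particular implementation.

```python
import torch

def ewc_penalty(model, old_params, fisher_diag, ewc_lambda=1.0):
    """Quadratic penalty discouraging changes to parameters that carried
    high Fisher information (i.e., were important) on previous tasks."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        if name in fisher_diag:
            penalty = penalty + (fisher_diag[name] * (p - old_params[name]) ** 2).sum()
    return ewc_lambda / 2.0 * penalty

# Training on a new task simply adds the penalty to the task loss:
#   loss = task_loss + ewc_penalty(model, old_params, fisher_diag)
```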
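For the replay family, the following is a minimal sketch of an experience-replay buffer with reservoir sampling, in the spirit of ER (Chaudhry et al., 2019). The buffer capacity and the way stored and current examples are mixed at each step are assumptions for illustration.

```python
import random
import torch

class ReservoirBuffer:
    """Fixed-capacity buffer; reservoir sampling keeps every example seen so far
    with equal probability, without knowing the stream length in advance."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []          # list of (x, y) tensor pairs
        self.num_seen = 0

    def add(self, x, y):
        self.num_seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            idx = random.randrange(self.num_seen)
            if idx < self.capacity:
                self.data[idx] = (x, y)

    def sample(self, batch_size):
        # Assumes the buffer is non-empty.
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

# One training step interleaves buffered and current examples:
#   x_buf, y_buf = buffer.sample(batch_size)
#   loss = criterion(model(torch.cat([x_new, x_buf])), torch.cat([y_new, y_buf]))
```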
These families are not mutually exclusive; many state-of-the-art methods combine elements from multiple families. For instance, DER++ combines replay with knowledge distillation (functional regularization) (Buzzega et al., 2020); FOSTER combines dynamic architecture expansion with knowledge distillation (Wang et al., 2022); Co2L combines contrastive replay with asymmetric knowledge distillation (Cha et al., 2021); and GPM (Saha et al., 2021) combines gradient-projection constraints with a compact memory of subspaces extracted from past-task data. The trend in recent work is toward hybrid methods that leverage complementary strengths -- regularization alone cannot prevent forgetting at scale, replay alone does not optimize the use of limited buffer capacity, and architecture methods alone do not enable backward transfer.
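As a concrete example of such a hybrid, the DER++ objective (Buzzega et al., 2020) adds two buffer-based terms to the usual cross-entropy: an MSE distillation term on logits stored alongside buffer examples, and a plain replay cross-entropy term. The sketch below is schematic; the two independently sampled buffer batches follow the paper's structure, while the tensor names and the weights alpha and beta are illustrative.

```python
import torch.nn.functional as F

def derpp_loss(model, x_new, y_new, x_buf1, logits_buf1, x_buf2, y_buf2,
               alpha=0.5, beta=0.5):
    """Schematic DER++-style objective combining replay and distillation."""
    loss = F.cross_entropy(model(x_new), y_new)                   # learn the current task
    loss = loss + alpha * F.mse_loss(model(x_buf1), logits_buf1)  # distill stored logits (functional regularization)
    loss = loss + beta * F.cross_entropy(model(x_buf2), y_buf2)   # plain replay on stored labels
    return loss
```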
Comparative Analysis Across Settings
The relative effectiveness of these families varies dramatically across settings [@vandeven2019three, @masana2023class]. In Task-IL (task identity provided at test time), regularization methods like EWC perform well because they can use task-specific output heads, reducing the problem to preserving shared representations. In Class-IL (no task identity), regularization methods fail dramatically because they cannot maintain a calibrated global decision boundary, and replay-based methods dominate. In Domain-IL, both regularization and replay approaches are effective. This setting-dependence means that method claims must always be qualified by the evaluation setting -- a method that "solves" continual learning in Task-IL may completely fail in Class-IL (van de Ven & Tolias, 2019).
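The operational difference between these settings shows up in how predictions are formed at test time. The sketch below assumes a single shared output layer over all classes seen so far and a hypothetical task_to_classes mapping; under Task-IL the given task identity restricts the decision to that task's classes, whereas under Class-IL the model must commit to one global decision boundary.

```python
import torch

def predict(model, x, setting, task_id=None, task_to_classes=None):
    """task_to_classes: hypothetical dict mapping a task id to the column
    indices of that task's classes in the shared output layer."""
    logits = model(x)                                  # (batch, num_classes_seen_so_far)
    if setting == "task_il":
        cols = task_to_classes[task_id]                # task identity is provided at test time
        masked = torch.full_like(logits, float("-inf"))
        masked[:, cols] = logits[:, cols]              # decision restricted to this task's classes
        return masked.argmax(dim=1)
    return logits.argmax(dim=1)                        # class_il: one global decision over all classes
```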
The Role of Pre-Training
A major shift in the continual learning landscape has been the move from learning representations from scratch to adapting pre-trained models. This shift is significant because pre-trained models already encode rich, general-purpose representations, fundamentally changing the nature of the continual learning problem. With a strong pre-trained backbone:
- Forgetting is reduced because the pre-trained features are already broadly useful, and fine-tuning from a good initialization tends to stay closer to it (Mehta et al., 2023).
- The capacity problem is alleviated because the model starts with representations that are useful across many tasks, rather than having to carve out capacity for each new task.
- New method families become possible (prompt-based, adapter-based) that were not feasible without pre-training; a minimal sketch of the prompt-based idea follows this list.
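A minimal sketch of the prompt-based idea these models enable, in the spirit of L2P (Wang et al., 2022): the pre-trained backbone stays frozen and only a small set of prompt tokens plus the classification head are trained. The backbone interface assumed here (a patch-embedding step followed by a transformer encoder over a token sequence) is a simplifying assumption rather than a specific library API, and the published methods additionally select prompts from a pool via a query mechanism.

```python
import torch
import torch.nn as nn

class PromptedClassifier(nn.Module):
    """Frozen pre-trained backbone; only prompt tokens and the head are trained."""

    def __init__(self, frozen_backbone, embed_dim, num_prompts, num_classes):
        super().__init__()
        self.backbone = frozen_backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                     # backbone weights are never updated
        self.prompts = nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)   # the only other trainable module

    def forward(self, x):
        tokens = self.backbone.embed(x)                 # assumed: (batch, seq_len, embed_dim) patch tokens
        prompts = self.prompts.unsqueeze(0).expand(tokens.size(0), -1, -1)
        tokens = torch.cat([prompts, tokens], dim=1)    # prepend learnable prompt tokens
        feats = self.backbone.encoder(tokens)           # assumed frozen transformer encoder
        return self.head(feats.mean(dim=1))             # pooled features -> classifier
```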
This has led some researchers to argue that continual learning with pre-trained models is a fundamentally different problem from continual learning from scratch (Kim et al., 2023), requiring different methods, benchmarks, and evaluation protocols.
References
- Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, Tinne Tuytelaars (2018). Memory Aware Synapses: Learning What (not) to Forget. ECCV.
- Shawn Beaulieu, Lapo Frati, Thomas Miconi, et al. (2020). Learning to Continually Learn. ECAI.
- Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, Simone Calderara (2020). Dark Experience for General Continual Learning: a Strong, Simple Baseline. NeurIPS.
- Hyuntak Cha, Jaeho Lee, Jinwoo Shin (2021). Co2L: Contrastive Continual Learning. ICCV.
- Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K. Dokania, Philip H. S. Torr, Marc'Aurelio Ranzato (2019). On Tiny Episodic Memories in Continual Learning. arXiv.
- Arthur Douillard, Matthieu Cord, Charles Ollion, Thomas Robert, Eduardo Valle (2020). PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning. ECCV.
- Rui Gao, Weiwei Liu (2023). DDGR: Continual Learning with Deep Diffusion-based Generative Replay. ICML.
- Gunshi Gupta, Karmesh Yadav, Liam Paull (2020). La-MAML: Look-ahead Meta Learning for Continual Learning. NeurIPS.
- Tyler L. Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya, Christopher Kanan (2020). REMIND Your Neural Network to Prevent Catastrophic Forgetting. ECCV.
- Geoffrey Hinton, Oriol Vinyals, Jeff Dean (2015). Distilling the Knowledge in a Neural Network. NeurIPS Workshop.
- Yen-Chang Hsu, Yen-Cheng Liu, Anita Ramasamy, Zsolt Kira (2018). Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines. NeurIPS CL Workshop.
- Khurram Javed, Martha White (2019). Meta-Learning Representations for Continual Learning. NeurIPS.
- Sanghwan Kim, Lorenzo Noci, Antonio Orvieto, Thomas Hofmann (2023). Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning. CVPR.
- James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, et al. (2017). Overcoming Catastrophic Forgetting in Neural Networks. PNAS.
- Zhizhong Li, Derek Hoiem (2017). Learning without Forgetting. IEEE TPAMI.
- David Lopez-Paz, Marc'Aurelio Ranzato (2017). Gradient Episodic Memory for Continual Learning. NeurIPS.
- Arun Mallya, Svetlana Lazebnik (2018). PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning. CVPR.
- Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell (2023). An Empirical Investigation of the Role of Pre-training in Lifelong Learning. JMLR.
- Jorge A. Mendez, Eric Eaton (2022). Lifelong Learning with Modular and Compositional Knowledge. ICML.
- Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, et al. (2016). Progressive Neural Networks. arXiv.
- Gobinda Saha, Isha Garg, Kaushik Roy (2021). Gradient Projection Memory for Continual Learning. ICLR.
- Joan Serra, Didac Suris, Marius Miron, Alexandros Karatzoglou (2018). Overcoming Catastrophic Forgetting with Hard Attention to the Task. ICML.
- Hanul Shin, Jung Kwon Lee, Jaehong Kim, Jiwon Kim (2017). Continual Learning with Deep Generative Replay. NeurIPS.
- James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, Zsolt Kira (2023). CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning. CVPR.
- Gido M. van de Ven, Andreas S. Tolias (2019). Three Scenarios for Continual Learning. NeurIPS Continual Learning Workshop.
- Fu-Yun Wang, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan (2022). FOSTER: Feature Boosting and Compression for Class-Incremental Learning. ECCV.
- Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister (2022). Learning to Prompt for Continual Learning. CVPR.
- Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister (2022). DualPrompt: Complementary Prompting for Rehearsal-Free Continual Learning. ECCV.
- Yabin Wang, Zhiwu Huang, Xiaopeng Hong (2022). S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning. NeurIPS.
- Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, et al. (2020). Supermasks in Superposition. NeurIPS.
- Jaehong Yoon, Eunho Yang, Jeongtae Lee, Sung Ju Hwang (2018). Lifelong Learning with Dynamically Expandable Networks. ICLR.
- Friedemann Zenke, Ben Poole, Surya Ganguli (2017). Continual Learning through Synaptic Intelligence. ICML.