Meta-Continual Learning
Meta-learning approaches aim to learn initializations, update rules, or representations that are inherently amenable to continual learning. The key idea is to "learn to learn continually" -- rather than designing anti-forgetting mechanisms by hand, meta-learning discovers them automatically by training on sequences of tasks and optimizing for the ability to learn new tasks without forgetting old ones. This family bridges continual learning with the broader meta-learning literature (Hospedales et al., 2021).
Online Meta-Learning (OML)
Javed and White (2019) proposed OML, which meta-learns a representation that is robust to catastrophic forgetting. The architecture separates the network into a representation learning network (RLN) and a prediction learning network (PLN). The RLN is meta-trained using a MAML-like outer loop (Finn et al., 2017) where the meta-objective explicitly penalizes forgetting: after the inner loop updates the PLN on a sequence of tasks, the outer loop adjusts the RLN so that the learned representation supports learning new tasks without degrading performance on earlier ones. OML demonstrated that representations can be meta-trained to be inherently continual-learning-friendly, generalizing well to long task sequences (up to 200 tasks) not seen during meta-training.
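The inner/outer loop structure can be illustrated with a toy first-order sketch. Everything below is an illustrative assumption, not the paper's architecture: the RLN and PLN are single linear maps, tasks are random linear regressions, and the meta-gradient treats the inner-loop PLN as fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: 8-d inputs, 6-d representation, scalar targets.
d_in, d_rep = 8, 6
W_rln = rng.normal(scale=0.3, size=(d_rep, d_in))  # representation network (meta-learned)

def make_task():
    """Each task is a random linear regression problem."""
    w_true = rng.normal(size=d_in)
    X = rng.normal(size=(20, d_in))
    return X, X @ w_true

def grads(W, w, X, y):
    """Analytic mean-squared-error gradients for the linear RLN (W) / PLN (w) pair."""
    H = X @ W.T                                 # representations
    err = H @ w - y                             # prediction errors
    g_w = 2 * H.T @ err / len(y)                # PLN gradient
    g_W = 2 * np.outer(w, X.T @ err / len(y))   # RLN gradient (PLN held fixed)
    return g_W, g_w

def meta_loss(W, tasks, w):
    """Loss over all tasks, old and new -- low only if little was forgotten."""
    return np.mean([np.mean((X @ W.T @ w - y) ** 2) for X, y in tasks])

def oml_episode(W, tasks, inner_lr=0.01, meta_lr=1e-4):
    """One OML-style meta-training episode (first-order sketch).

    Inner loop: the PLN learns the tasks sequentially with the RLN frozen.
    Outer loop: the RLN takes a gradient step on the loss over *all* tasks
    after the sequence, i.e. it is explicitly penalized for forgetting."""
    w_pln = np.zeros(d_rep)
    for X, y in tasks:                          # sequential inner loop
        for _ in range(50):
            _, g_w = grads(W, w_pln, X, y)
            w_pln -= inner_lr * g_w
    g_meta = np.zeros_like(W)
    for X, y in tasks:                          # meta-objective spans the whole sequence
        g_W, _ = grads(W, w_pln, X, y)
        g_meta += g_W
    return W - meta_lr * g_meta / len(tasks), w_pln

tasks = [make_task() for _ in range(3)]
W_new, w_pln = oml_episode(W_rln, tasks)
```

In the real method this episode is repeated over thousands of sampled task sequences, and the meta-gradient is taken through the inner loop rather than at its endpoint.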
A Neuromodulated Meta-Learning Algorithm (ANML)
Beaulieu et al. (2020) introduced ANML, which draws inspiration from neuromodulation in biological brains -- the process by which neuromodulatory signals (e.g., dopamine, acetylcholine) selectively gate plasticity in neural circuits. ANML adds a neuromodulatory network that produces activation gates for the prediction network during learning. These gates determine which neurons are allowed to update, selectively protecting important features while allowing plasticity for new learning.
The neuromodulatory network is meta-learned end-to-end, so it automatically discovers task-dependent gating patterns that minimize forgetting. ANML outperforms OML on several benchmarks and demonstrates better scalability to long task sequences. The biological plausibility of the neuromodulatory mechanism is an appealing feature, connecting algorithmic continual learning research to neuroscience.
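The gating mechanism can be sketched in a few lines. This is a deliberately simplified assumption (one gated linear layer with only the output weights plastic; ANML uses convolutional networks and meta-learns the gate end-to-end), but it shows the key property: a multiplicative gate scales both the forward activation and, therefore, the gradient reaching a neuron's weights.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy sizes; in ANML both networks are convolutional.
d_in, d_hid = 8, 16
W_nm = rng.normal(scale=0.5, size=(d_hid, d_in))  # neuromodulatory net (meta-learned, frozen here)
W_p = rng.normal(scale=0.5, size=(d_hid, d_in))   # prediction net hidden layer (frozen here)
w_out = np.zeros(d_hid)                           # prediction net output weights (plastic)

def forward(x):
    gate = sigmoid(W_nm @ x)       # per-neuron plasticity gate in (0, 1)
    h = gate * np.tanh(W_p @ x)    # gated activation
    return h @ w_out, gate, h

def inner_update(x, y, lr=0.05):
    """One SGD step on the plastic weights. Because h carries the gate,
    the gradient 2*err*h is scaled per neuron: a gate near 0 shields that
    neuron from both activation and weight change."""
    global w_out
    pred, gate, h = forward(x)
    err = pred - y
    w_out -= lr * 2 * err * h

# Stream a few examples through the gated learner.
for _ in range(20):
    x = rng.normal(size=d_in)
    inner_update(x, float(x[0]))
```

Because the gate is input-dependent, different inputs unlock different subsets of neurons, which is how meta-training can discover task-dependent protection patterns.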
La-MAML
Gupta et al. (2020) proposed La-MAML (Look-ahead Meta Learning for Continual Learning), which takes a different approach to meta-continual learning. Rather than meta-learning representations, La-MAML meta-learns per-parameter learning rates. The key idea is that parameters important for previous tasks should have low learning rates (to prevent forgetting), while parameters less important for previous tasks can have higher learning rates (to enable plasticity). La-MAML sets these learning rates by optimizing a "look-ahead" loss that evaluates the effect of a gradient update on both the current task and a buffer of samples from previous tasks.
La-MAML achieves strong performance on both task-incremental and class-incremental settings while being more computationally efficient than standard MAML-based approaches (it avoids the expensive second-order gradient computation of MAML through a first-order approximation). Empirically, La-MAML matches or exceeds the performance of OML and ANML while being applicable to more diverse settings.
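The look-ahead update can be sketched in first-order form. All specifics below (linear model, single trial step, the clipping range) are illustrative assumptions; the sketch only demonstrates the mechanism of moving each learning rate by the agreement between the task gradient and the combined current-plus-replay gradient.

```python
import numpy as np

rng = np.random.default_rng(2)

d = 10
theta = rng.normal(size=d)      # toy linear model parameters
alpha = np.full(d, 0.05)        # per-parameter learning rates (meta-learned)

def loss_grad(theta, X, y):
    err = X @ theta - y
    return np.mean(err ** 2), 2 * X.T @ err / len(y)

def la_maml_step(theta, alpha, X_new, y_new, X_buf, y_buf, meta_lr=0.01):
    """One La-MAML-style update (first-order sketch).

    Look-ahead: take a trial step on the new task, evaluate the loss on new
    data plus the replay buffer, then move each alpha_i so the trial step
    helps the combined objective. Where the task gradient conflicts with
    the buffer gradient, alpha_i shrinks, protecting old knowledge."""
    _, g_task = loss_grad(theta, X_new, y_new)
    theta_look = theta - alpha * g_task                   # look-ahead parameters
    X_all = np.vstack([X_new, X_buf])
    y_all = np.concatenate([y_new, y_buf])
    _, g_meta = loss_grad(theta_look, X_all, y_all)
    # First-order meta-gradient: dL_meta/dalpha_i = -g_meta_i * g_task_i,
    # so gradient descent on alpha is alpha += meta_lr * g_meta * g_task.
    alpha = np.clip(alpha + meta_lr * g_meta * g_task, 0.0, 1.0)
    return theta - alpha * g_task, alpha                  # update with the new rates

# New-task batch plus a replay buffer drawn from an older task.
X_new = rng.normal(size=(16, d)); y_new = X_new @ rng.normal(size=d)
X_buf = rng.normal(size=(16, d)); y_buf = X_buf @ rng.normal(size=d)
theta2, alpha2 = la_maml_step(theta, alpha, X_new, y_new, X_buf, y_buf)
```

The sign structure is the point: a learning rate grows only where a step on the new task also reduces the replay loss, which is a per-parameter version of gradient alignment.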
Meta-Continual Learning with Replay (MRCL)
MRCL combines meta-learning with replay, learning representations that are specifically optimized for replay-based continual learning (Arani et al., 2022). The meta-objective ensures that a small number of replay samples is maximally effective for retaining previous task knowledge. This is a pragmatic approach: rather than trying to eliminate the need for replay (which is arguably the most effective anti-forgetting mechanism), MRCL optimizes the representation to make replay as efficient as possible, requiring fewer stored exemplars for the same level of retention.
CLS-ER: Experience Replay with Complementary Learning Systems
Arani et al. (2022) proposed CLS-ER, which implements a dual-memory system inspired by the complementary learning systems (CLS) theory from neuroscience (Kumaran et al., 2016). CLS-ER maintains two model copies: a "working model" that learns quickly (high plasticity) and a "stable model" that consolidates knowledge slowly (high stability). The stable model is updated as an exponential moving average of the working model, and both models provide complementary supervision during replay. This architecture mirrors the hippocampal-cortical interaction in biological memory systems and achieves strong performance, particularly on challenging class-incremental benchmarks.
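The dual-memory loop reduces to a simple pattern: fast SGD on the working model, a consistency pull toward the stable model on replayed samples, and an EMA consolidation step. The sketch below is a toy linear version under stated assumptions (unbounded buffer, single consistency weight); CLS-ER itself uses deep networks, a reservoir buffer, and additionally a plastic EMA model.

```python
import numpy as np

rng = np.random.default_rng(3)

d = 12
working = rng.normal(scale=0.1, size=d)  # fast, plastic model
stable = working.copy()                  # slow model: EMA of the working model
w0 = working.copy()                      # initial weights, kept for comparison
buffer_X, buffer_y = [], []              # replay buffer (unbounded here for simplicity)

def cls_er_step(X, y, lr=0.05, ema_decay=0.999, consist_w=0.1):
    """One CLS-ER-style step on a toy linear model.

    The working model learns the batch plus replayed samples; a consistency
    term pulls its replay predictions toward the stable model's outputs, and
    the stable model tracks the working model by exponential moving average."""
    global working, stable
    g_replay = 0.0
    if buffer_X:
        Xr, yr = np.stack(buffer_X), np.array(buffer_y)
        err_r = Xr @ working - yr                  # replay error vs. stored labels
        consist = Xr @ (working - stable)          # disagreement with the stable model
        g_replay = 2 * Xr.T @ (err_r + consist_w * consist) / len(yr)
    err = X @ working - y
    working = working - lr * (2 * X.T @ err / len(y) + g_replay)
    stable = ema_decay * stable + (1 - ema_decay) * working  # slow consolidation
    buffer_X.extend(X); buffer_y.extend(y)

target = rng.normal(size=d)
for _ in range(5):
    X = rng.normal(size=(8, d))
    cls_er_step(X, X @ target)
```

With a decay near 1 the stable model changes orders of magnitude more slowly than the working model, which is exactly the plasticity/stability split the CLS theory prescribes.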
Limitations and Open Questions
Meta-continual learning methods face several challenges:
- Computational cost: The meta-training phase is computationally expensive, requiring training on many simulated task sequences. OML and ANML both require thousands of meta-training episodes, each involving an inner loop (sequential task learning) and outer loop (meta-gradient computation). For large models, this cost can exceed that of standard continual learning by an order of magnitude.
- Generalization gap: The generalization of meta-learned strategies to task distributions different from those seen during meta-training is not guaranteed. If meta-training uses 5-way classification tasks from Omniglot but deployment involves 100-way classification on ImageNet classes, the learned update rules may not transfer. Bridging this domain gap between meta-training and meta-testing distributions is an open challenge (Hospedales et al., 2021).
- Task boundary requirement: Most methods require explicit task boundaries during meta-training, even if the goal is task-free continual learning at deployment. Developing meta-learning approaches for task-free settings -- where the meta-learner must also discover when tasks change -- remains largely unexplored.
- Scaling: Extending these methods to large models and long task sequences remains challenging. The memory requirements of second-order meta-gradients (needed by MAML-based approaches) grow with model size, though first-order approximations like La-MAML (Gupta et al., 2020) and Reptile (Nichol et al., 2018) partially address this.
- Relationship to pre-training: With the success of large pre-trained models, the role of meta-continual learning is evolving. Pre-trained models already provide a representation that is broadly useful across tasks -- arguably achieving the same goal as OML's meta-learned representation but through a different mechanism (massive supervised pre-training rather than meta-learning). Whether meta-learning can add value on top of pre-training, for instance by learning task-adaptive prompting strategies, is an active area of investigation.
Nevertheless, the principle of learning the learning algorithm itself -- rather than designing it by hand -- is a promising direction for fundamentally solving the continual learning problem. The success of meta-learning in few-shot learning (Finn et al., 2017) suggests that analogous breakthroughs may be possible for continual learning when the right meta-objective and training procedure are identified.
References
- Elahe Arani, Fahad Sarfraz, Bahram Zonooz (2022). Learning Fast, Learning Slow: A General Continual Learning Method based on Complementary Learning System Theory. ICLR.
- Shawn Beaulieu, Lapo Frati, Thomas Miconi, et al. (2020). Learning to Continually Learn. ECAI.
- Chelsea Finn, Pieter Abbeel, Sergey Levine (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML.
- Gunshi Gupta, Karmesh Yadav, Liam Paull (2020). La-MAML: Look-ahead Meta Learning for Continual Learning. NeurIPS.
- Timothy Hospedales, Antreas Antoniou, Paul Micaelli, Amos Storkey (2021). Meta-Learning in Neural Networks: A Survey. IEEE TPAMI.
- Khurram Javed, Martha White (2019). Meta-Learning Representations for Continual Learning. NeurIPS.
- Dharshan Kumaran, Demis Hassabis, James L. McClelland (2016). What Learning Systems Do Intelligent Agents Need? Complementary Learning Systems Theory Updated. Trends in Cognitive Sciences.
- Alex Nichol, Joshua Achiam, John Schulman (2018). On First-Order Meta-Learning Algorithms. arXiv.