
Bibliography

All references cited across Mochi Lab. Use [@key] in any markdown file to cite a reference.
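For example, to cite the Mamba entry below (key `gu2024mamba`; shown here purely as an illustration of the syntax), write the key in brackets inline in your markdown:

```
Selective state-space models [@gu2024mamba] trade attention for linear-time
recurrence, and hybrid designs [@lieber2024jamba; @glorioso2024zamba] mix
the two. Multiple keys can share one bracket, separated by semicolons.
```

Keys are taken verbatim from the entries in this list; an unknown key is left unresolved rather than raising an error.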

597 references.

@singh2025agentic
Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
Aditi Singh (2025), arXiv
@deepmind2025genie3
Genie 3
Google DeepMind (2025), Google DeepMind
@yuan2025native_sparse_attention
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Jingyang Yuan, Huazuo Gao, Damai Dai, et al. (2025), arXiv
@liu2025wavelet
Wavelet-integrated Deep Neural Networks: A Systematic Review
Peng Liu (2025), Neurocomputing
@specache2025speculative
Speculative Key-Value Caching for Efficient LLM Inference
SpeCache (2025), ICML
@scialom2025continual
Continual Learning of Large Language Models
Thomas Scialom (2025), EMNLP Tutorials
@zhang2025dfloat11
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
Tianyi Zhang, Mohsen Hariri, Shaochen Zhong, et al. (2025), NeurIPS
@zhang2025storm_wm
STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning
Weipu Zhang, Gang Wang, Jian Sun, et al. (2025), NeurIPS
@ibrahim2024investigating
Investigating Continual Pretraining in Large Language Models
Adam Ibrahim (2024), OpenReview
@bardes2024vjepa
V-JEPA: Latent Video Prediction for Visual Representation Learning
Adrien Bardes, Quentin Garrido, Jean Ponce, et al. (2024), arXiv
@asai2024selfrag
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai, Zeqiu Wu, Yizhong Wang, et al. (2024), ICLR
@gu2024mamba
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao (2024), arXiv
@jiang2024mixtral
Mixtral of Experts
Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux (2024), arXiv
@drouin2024browsergym
The BrowserGym Ecosystem for Web Agent Research
Alexandre Drouin (2024), arXiv
@zhou2024lats
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman (2024), ICML
@anthropic2024claude_computer
Introducing Computer Use
Anthropic (2024), Anthropic Blog
@ahmadian2024rloo
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian, Chris Cremer, Matthias Galle, et al. (2024), ACL
@jimenez2024swebench
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez, John Yang, Alexander Wettig (2024), ICLR
@snell2024scaling_testtime
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell, Jaehoon Lee, Kelvin Xu, et al. (2024), arXiv
@hsieh2024ruler
RULER: What's the Real Context Size of Your Long-Context Language Models?
Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, et al. (2024), arXiv
@dai2024deepseekmoe
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Damai Dai, Chengqi Deng, Chenggang Zhao (2024), arXiv
@kondratyuk2024videopoet
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk, Lijun Yu, Xiuye Gu, et al. (2024), ICML
@valevski2024diffusion
Diffusion Models Are Real-Time Game Engines
Dani Valevski, Yaniv Leviathan, Moab Arar, et al. (2024), arXiv
@edge2024graphrag
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Darren Edge, Ha Trinh, Newman Cheng, et al. (2024), arXiv
@raposo2024mixture
Mixture-of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models
David Raposo, Sam Ritter, Blake Richards (2024), arXiv
@deepseek2024deepseekv2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-AI (2024), arXiv
@alonso2024diffusion
Diffusion for World Modeling: Visual Details Matter in Atari
Eloi Alonso, Adam Jelley, Anssi Kanervisto, et al. (2024), NeurIPS
@zelikman2024quietstar
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Eric Zelikman, Georges Harik, Yijia Shao, et al. (2024), arXiv
@deepmind2024alphaproof
AI Achieves Silver-Medal Standard Solving International Mathematical Olympiad Problems
Google DeepMind (2024), Google DeepMind Blog
@mialon2024gaia
GAIA: A Benchmark for General AI Assistants
Gregoire Mialon (2024), ICLR
@wang2024legoprover
LEGO-Prover: Neural Theorem Proving with Growing Libraries
Haiming Wang (2024), ICLR
@wu2024continual_llm_survey
Continual Learning of Large Language Models: A Comprehensive Survey
Haizhou Shi, Zihao Xu, Hengyi Wang, et al. (2024), arXiv
@liu2024ringattention
Ring Attention with Blockwise Transformers for Near-Infinite Context
Hao Liu, Matei Zaharia, Pieter Abbeel (2024), ICLR
@he2024webvoyager
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Hongliang He (2024), ACL
@xin2024deepseekprover
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Huajian Xin (2024), arXiv
@bruce2024genie
Genie: Generative Interactive Environments
Jake Bruce, Michael Dennis, Ashley Edwards (2024), ICML
@shah2024flashattention3
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah, Ganesh Bikshandi, Ying Zhang, et al. (2024), arXiv
@lin2024learning
Learning to Model the World with Language
Jessy Lin, Yilun Du, Olivia Watkins (2024), ICLR
@lin2024awq
AWQ: Activation-Aware Weight Quantization for On-Device LLM Compression and Acceleration
Ji Lin, Jiaming Tang, Haotian Tang (2024), MLSys
@wu2024ivideogpt
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Jialong Wu (2024), NeurIPS
@wu2024pre
Pre-training Contextualized World Models with In-the-Wild Videos for Reinforcement Learning
Jialong Wu, Haoyu Ma, Chaoyi Deng, et al. (2024), NeurIPS
@su2024roformer
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su, Yu Lu, Shengfeng Pan, et al. (2024), Neurocomputing
@su2024rope
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su, Murtadha Ahmed, Yu Lu, et al. (2024), Neurocomputing
@xiang2024language
Language Models Meet World Models: Embodied Experiences Enhance Language Models
Jiannan Xiang, Tianhua Tao, Yi Gu, et al. (2024), NeurIPS
@zhao2024galore
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao, Zhenyu Zhang, Beidi Chen, et al. (2024), ICML
@yu2024boosting_moe_cl
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, et al. (2024), CVPR
@koh2024visualwebarena
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh, Robert Lo, Lawrence Jang, et al. (2024), ACL
@lee2024gecko
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, et al. (2024), arXiv
@yang2024sweagent
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
John Yang, Carlos E. Jimenez, Alexander Wettig, et al. (2024), arXiv
@tirumala2024d4
D4: Improving LLM Pretraining via Document De-Duplication and Diversification
Kushal Tirumala, Daniel Simig, Armen Aghajanyan, et al. (2024), NeurIPS
@yu2024language_dare
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Le Yu, Bowen Yu, Haiyang Yu, et al. (2024), ICML
@wang2024survey
A Survey on Large Language Model Based Autonomous Agents
Lei Wang, Chen Ma, Xueyang Feng (2024), Frontiers of Computer Science
@wang2024e5
Improving Text Embeddings with Large Language Models
Liang Wang, Nan Yang, Xiaolong Huang, et al. (2024), ACL
@zheng2024sglang
SGLang: Efficient Execution of Structured Language Model Programs
Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, et al. (2024), arXiv
@zheng2024sglangradixtree
SGLang: Efficient Execution of Structured Language Model Programs
Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, et al. (2024), NeurIPS
@wang2024comprehensive
A Comprehensive Survey of Continual Learning: Theory, Method and Application
Liyuan Wang, Xingxing Zhang, Hang Su, et al. (2024), IEEE TPAMI
@wang2024hierarchical_hide
Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality
Liyuan Wang, Jingyi Xie, Xingxing Zhang, et al. (2024), NeurIPS
@okada2024dreamerv3
DreamerV3-XL: Scaling World Models with Transformers
Masashi Okada, Tadahiro Taniguchi (2024), arXiv
@beck2024xlstm
xLSTM: Extended Long Short-Term Memory
Maximilian Beck, Korbinian Poppel, Markus Spanring (2024), NeurIPS
@sun2024wanda
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun, Zhuang Liu, Anna Bair, et al. (2024), ICLR
@hansen2024tdmpc2
TD-MPC2: Scalable, Robust World Models for Continuous Control
Nicklas Hansen, Hao Su, Xiaolong Wang (2024), ICLR
@nvidia2024cosmos
Cosmos World Foundation Model Platform for Physical AI
NVIDIA (2024), NVIDIA Technical Report
@khattab2024dspy
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, et al. (2024), ICLR
@openai2024sora
Video Generation Models as World Simulators
OpenAI (2024), OpenAI Technical Report
@openai2024o1
Learning to Reason with LLMs
OpenAI (2024), OpenAI Blog
@lieber2024jamba
Jamba: A Hybrid Transformer-Mamba Language Model
Opher Lieber, Barak Lenz, Hofit Bata (2024), arXiv
@glorioso2024zamba
Zamba: A Compact 7B SSM Hybrid Model
Paolo Glorioso, Quentin Anthony, Yury Tokpanov, et al. (2024), arXiv
@yadav2024ties
TIES-Merging: Resolving Interference When Merging Models
Prateek Yadav, Derek Tam, Leshem Choshen, et al. (2024), NeurIPS
@patel2024splitwise
Splitwise: Efficient Generative LLM Inference Using Phase Splitting
Pratyush Patel, Esha Choukse, Chaojie Zhang, et al. (2024), ISCA
@es2024ragas
RAGAS: Automated Evaluation of Retrieval Augmented Generation
Shahul Es, Jithin James, Luis Espinosa-Anke, et al. (2024), EACL
@yan2024crag
Corrective Retrieval Augmented Generation
Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, et al. (2024), arXiv
@xu2024search
Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks
Shicheng Xu, Liang Pang, Huawei Shen, et al. (2024), WWW
@liu2024dora
DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, et al. (2024), ICML
@xiao2024bge
C-Pack: Packaged Resources To Advance General Chinese Embedding
Shitao Xiao, Zheng Liu, Peitian Zhang, et al. (2024), SIGIR
@ma2024bitnet158
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Shuming Ma, Hongyu Wang, Lingxiao Ma, et al. (2024), arXiv
@zhou2024webarena
WebArena: A Realistic Web Environment for Building Autonomous Agents
Shuyan Zhou, Frank F. Xu, Hao Zhu (2024), ICLR
@arora2024simple
Simple Linear Attention Language Models Balance the Recall-Throughput Tradeoff
Simran Arora, Sabri Eyuboglu, Michael Zhang (2024), ICML
@arora2024based
Simple Linear Attention Language Models Balance the Recall-Throughput Tradeoff
Simran Arora, Sabri Eyuboglu, Michael Zhang, et al. (2024), ICML
@de2024griffin
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De, Samuel L. Smith, Anushan Fernando (2024), arXiv
@jeong2024adaptive
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
Soyeong Jeong, Jinheon Baek, Sukmin Cho, et al. (2024), NAACL
@xie2024osworld
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie, Danyang Zhang, Jixuan Chen, et al. (2024), NeurIPS
@zhang2024raft
RAFT: Adapting Language Model to Domain Specific RAG
Tianjun Zhang, Shishir G. Patil, Naman Jain, et al. (2024), arXiv
@cai2024medusa
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng, et al. (2024), ICML
@gao2024alce
Enabling Large Language Models to Generate Text with Citations
Tianyu Gao, Howard Yen, Jiatong Yu, et al. (2024), EMNLP
@ye2024diff_transformer
Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia, et al. (2024), arXiv
@ye2024differential
Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia, et al. (2024), arXiv
@wang2024driving_world_survey
A Survey of World Models for Autonomous Driving
Tuo Wang, Guangming Wang, Yanfeng Wang, et al. (2024), arXiv
@shi2024trusting
Trusting Your Evidence: Hallucinate Less with Context-Aware Decoding
Weijia Shi, Xiaochuang Han, Mike Lewis (2024), NAACL Findings
@sun2024rankgpt
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents
Weiwei Sun, Lingyong Yan, Xinyu Ma, et al. (2024), EMNLP
@zheng2024occworld
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
Wenzhao Zheng, Weiliang Chen, Yuanhui Huang, et al. (2024), ECCV
@gurnee2024language
Language Models Represent Space and Time
Wes Gurnee, Max Tegmark (2024), ICLR
@deng2024mind2web
Mind2Web: Towards a Generalist Agent for the Web
Xiang Deng, Yu Gu, Boyuan Zheng (2024), NeurIPS
@liu2024agentbench
AgentBench: Evaluating LLMs as Agents
Xiao Liu, Hao Yu, Hanchen Zhang, et al. (2024), ICLR
@yang2024crag
CRAG -- Comprehensive RAG Benchmark
Xiao Yang, Kai Sun, Hao Xin, et al. (2024), arXiv
@wang2024openhands
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
Xingyao Wang, Boxuan Li, Yufan Song, et al. (2024), arXiv
@chen2024cot_decoding
Chain-of-Thought Reasoning Without Prompting
Xuezhi Wang, Denny Zhou (2024), arXiv
@shao2024storm
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Yijia Shao, Yucheng Jiang, Theodore A. Kanell (2024), NAACL
@du2024video
Video Language Planning
Yilun Du, Mengjiao Yang, Pete Florence, et al. (2024), ICLR
@sheng2024slora
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Ying Sheng, Shiyi Cao, Dacheng Li, et al. (2024), MLSys
@ge2024worldgpt
WorldGPT: Empowering LLM as Multimodal World Model
Yue Ge (2024), arXiv
@li2024snapkv
SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li, Yingbing Huang, Bowen Yang, et al. (2024), arXiv
@gao2024modular_rag
Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks
Yunfan Gao, Yun Xiong, Xinyu Gao, et al. (2024), arXiv
@feng2024retrieval
Retrieval-Generation Synergy Augmented Large Language Models
Zhangyin Feng, Xiaocheng Feng, Dongyan Zhao, et al. (2024), EMNLP Findings
@zhou2024robodreamer
RoboDreamer: Learning Compositional World Models for Robot Imagination
Zhenan Zhou (2024), arXiv
@zhang2024h2o
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang, Ying Sheng, Tianyi Zhou (2024), NeurIPS
@shao2024deepseekmath
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao, Peiyi Wang, Qihao Zhu, et al. (2024), arXiv
@zhu2024deepseekmathrl
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao, Peiyi Wang, Qihao Zhu, et al. (2024), arXiv
@liu2024scissorhands
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
Zichang Liu, Aditya Desai, Fangshuo Liao, et al. (2024), NeurIPS
@jiang2024longrag
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
Ziyan Jiang, Xueguang Ma, Wenhu Chen (2024), arXiv
@galashov2023continually
Continually Learning Representations at Scale
Alexandre Galashov (2023), CoLLAs
@madaan2023selfrefine
Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan, Niket Tandon, Prakhar Gupta, et al. (2023), NeurIPS
@abbas2023semdedup
SemDeDup: Data-efficient Learning at Web-scale through Semantic Deduplication
Amro Abbas, Kushal Tirumala, Daniel Simig, et al. (2023), arXiv
@blattmann2023stable
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Andreas Blattmann, Tim Dockhorn, Sumith Kulal, et al. (2023), arXiv
@brohan2023rt
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan, Noah Brown, Justice Carbajal, et al. (2023), arXiv
@hu2023gaia1
GAIA-1: A Generative World Model for Autonomous Driving
Anthony Hu, Lloyd Russell, Hudson Yeo, et al. (2023), arXiv
@jiang2023vad
VAD: Vectorized Scene Representation for Efficient Autonomous Driving
Bo Jiang, Shaoyu Chen, Qing Xu, et al. (2023), ICCV
@peng2023rwkv
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng, Eric Alcaide, Quentin Anthony (2023), EMNLP Findings
@chen2023accelerating
Accelerating Large Language Model Decoding with Speculative Sampling
Charlie Chen, Sebastian Borgeaud, Geoffrey Irving (2023), arXiv
@hafner2023mastering
Mastering Diverse Domains through World Models
Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, et al. (2023), arXiv
@hafner2023dreamerv3
Mastering Diverse Domains through World Models
Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, et al. (2023), arXiv
@zhou2023leasttomost
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Denny Zhou, Nathanael Scharli, Le Hou, et al. (2023), ICLR
@frantar2023gptq
GPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers
Elias Frantar, Saleh Ashkboos, Torsten Hoefler, et al. (2023), ICLR
@frantar2023sparsegpt
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Elias Frantar, Dan Alistarh (2023), ICML
@epoch2023trends
Trends in Machine Learning Hardware
Epoch AI (2023), Epoch AI Research
@keles2023computational
On the Computational Complexity of Self-Attention
Feyza Duman Keles, Pruthuvi Mahesakya Wijewardena, Cengiz Candan, et al. (2023), ALT
@ilharco2023editing
Editing Models with Task Arithmetic
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, et al. (2023), ICLR
@kim2023tree
Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models
Gangwoo Kim, Sungdong Kim, Byeongguk Jeon, et al. (2023), EMNLP
@izacard2023atlas
Atlas: Few-shot Learning with Retrieval Augmented Language Models
Gautier Izacard, Patrick Lewis, Maria Lomeli (2023), JMLR
@alphacode2_2023
AlphaCode 2 Technical Report
Google DeepMind (2023), Google DeepMind
@voyager2023
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, et al. (2023), arXiv
@rashkin2023measuring
Measuring Attribution in Natural Language Generation Models
Hannah Rashkin, Vitaly Nikolaev, Matthew Lamm (2023), CL
@trivedi2023ircot
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, et al. (2023), ACL
@su2023embedder
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Hongjin Su, Weijia Shi, Jungo Kasai, et al. (2023), ACL Findings
@wang2023bitnet
BitNet: Scaling 1-Bit Transformers for Large Language Models
Hongyu Wang, Shuming Ma, Li Dong, et al. (2023), arXiv
@touvron2023llama
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al. (2023), arXiv
@lightman2023prm
Let's Verify Step by Step
Hunter Lightman, Vineet Kosaraju, Yura Burda, et al. (2023), ICLR
@smith2023codaprompt
CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning
James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, et al. (2023), CVPR
@robine2023transformer
Transformer-based World Models Are Happy with 100k Interactions
Jan Robine, Marc Hoftmann, Tobias Uelwer, et al. (2023), ICLR
@hoelscher2023detecting
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
Jason Hoelscher-Obermaier, Julia Perber, Fazl Barez, et al. (2023), ACL Findings
@ainslie2023gqa
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Joshua Ainslie, James Lee-Thorp, Michiel de Jong (2023), EMNLP
@greshake2023more
Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, et al. (2023), AISec
@yang2023leandojo
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Kaiyu Yang (2023), NeurIPS
@li2023emergent_othello
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
Kenneth Li, Aspen K. Hopkins, David Bau, et al. (2023), ICLR
@meng2023mass
Mass-Editing Memory in a Transformer
Kevin Meng, Arnab Sen Sharma, Alex Andonian, et al. (2023), ICLR
@ahn2023sayplan
SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
Krishan Rana, Jesse Haviland, Sourav Garg, et al. (2023), CoRL
@gupta2023continual
Continual Pre-Training of Large Language Models: How to Re-warm Your Model?
Kshitij Gupta, Benjamin Therien, Adam Ibrahim, et al. (2023), ICML Workshop
@zheng2023judging
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng (2023), NeurIPS
@guan2023leveraging
Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning
Lin Guan, Karthik Valmeekam, Sarath Sreedharan, et al. (2023), NeurIPS
@wong2023word
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
Lionel Wong, Gabriel Grand, Alexander Lew (2023), arXiv
@gao2023hyde
Precise Zero-Shot Dense Retrieval without Relevance Labels
Luyu Gao, Xueguang Ma, Jimmy Lin, et al. (2023), ACL
@masana2023class
Class-Incremental Learning: Survey and Performance Evaluation on Image Classification
Marc Masana, Xialei Liu, Bartlomiej Twardowski (2023), IEEE TPAMI
@mitchell2023debate
The Debate Over Understanding in AI's Large Language Models
Melanie Mitchell, David C. Krakauer (2023), PNAS
@yang2023learning
Learning Interactive Real-World Simulators
Mengjiao Yang, Yilun Du, Kamyar Ghasemipour (2023), arXiv
@poli2023hyena
Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli, Stefano Massaroli, Eric Nguyen (2023), ICML
@nanda2023emergent
Othello-GPT: How Does an LLM Learn the Rules of a Board Game?
Neel Nanda, Andrew Lee, Martin Wattenberg (2023), NeurIPS ATTRIB Workshop
@liu2023evaluating
Evaluating Verifiability in Generative Search Engines
Nelson F. Liu, Tianyi Zhang, Percy Liang (2023), EMNLP Findings
@muennighoff2023mteb
MTEB: Massive Text Embedding Benchmark
Niklas Muennighoff, Nouamane Tazi, Loic Magne, et al. (2023), EACL
@shinn2023reflexion
Reflexion: Language Agents with Verbal Reinforcement Learning
Noah Shinn, Federico Cassano, Ashwin Gopinath, et al. (2023), NeurIPS
@press2023measuring
Measuring and Narrowing the Compositionality Gap in Language Models
Ofir Press, Muru Zhang, Sewon Min, et al. (2023), EMNLP Findings
@khattab2023dsp
Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive NLP
Omar Khattab, Keshav Santhanam, Xiang Lisa Li, et al. (2023), arXiv
@ram2023incontext
In-Context Retrieval-Augmented Language Models
Ori Ram, Yoav Levine, Itay Dalmedigos, et al. (2023), TACL
@liang2023holistic
Holistic Evaluation of Language Models
Percy Liang, Rishi Bommasani, Tony Lee (2023), TMLR
@wu2023daydreamer
DayDreamer: World Models for Physical Robot Learning
Philipp Wu, Alejandro Escontrela, Danijar Hafner, et al. (2023), CoRL
@wu2023autogen
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Qingyun Wu, Gagan Bansal, Jieyu Zhang, et al. (2023), arXiv
@rafailov2023dpo
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, Eric Mitchell, et al. (2023), NeurIPS
@rafailov2023direct
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, Eric Mitchell, et al. (2023), NeurIPS
@lam2023graphcast
Learning Skillful Medium-Range Global Weather Forecasting
Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, et al. (2023), Science
@kim2023achieving
Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
Sanghwan Kim, Lorenzo Noci, Antonio Orvieto, et al. (2023), CVPR
@mehta2023empirical
An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, et al. (2023), JMLR
@min2023factscore
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min, Kalpesh Krishna, Xinxi Lyu, et al. (2023), EMNLP
@lin2023train
How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval
Sheng-Chieh Lin, Akari Asai, Minghan Li, et al. (2023), EMNLP Findings
@hao2023reasoning
Reasoning with Language Model is Planning with World Model
Shibo Hao, Yi Gu, Haodi Ma (2023), EMNLP
@patil2023gorilla
Gorilla: Large Language Model Connected with Massive APIs
Shishir G. Patil, Tianjun Zhang, Xin Wang, et al. (2023), arXiv
@li2023language
Pre-Trained Language Models for Interactive Decision-Making
Shuang Li, Xavier Puig, Chris Paxton, et al. (2023), NeurIPS
@yao2023react
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu (2023), ICLR
@yao2023tree
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Shunyu Yao, Dian Yu, Jeffrey Zhao (2023), NeurIPS
@significant2023autogpt
Auto-GPT: An Autonomous GPT-4 Experiment
Significant Gravitas (2023), GitHub
@moerland2023model
Model-based Reinforcement Learning: A Survey
Thomas M. Moerland, Joost Broekens, Aske Plaat, et al. (2023), Foundations and Trends in Machine Learning
@gao2023enabling
Enabling Large Language Models to Generate Text with Citations
Tianyu Gao, Howard Yen, Jiatong Yu, et al. (2023), EMNLP
@dettmers2023qlora
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, et al. (2023), NeurIPS
@schick2023toolformer
Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessi (2023), NeurIPS
@micheli2023transformers
Transformers are Sample-Efficient World Models
Vincent Micheli, Eloi Alonso, Francois Fleuret (2023), ICLR
@shi2023replug
REPLUG: Retrieval-Augmented Black-Box Language Models
Weijia Shi, Sewon Min, Michihiro Yasunaga, et al. (2023), arXiv
@huang2023inner
Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang, Fei Xia, Ted Xiao, et al. (2023), CoRL
@kwon2023efficient
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang (2023), SOSP
@ma2023query
Query Rewriting for Retrieval-Augmented Large Language Models
Xinbei Ma (2023), EMNLP
@wang2023selfconsistency
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason Wei, Dale Schuurmans, et al. (2023), ICLR
@leviathan2023fast
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias (2023), ICML
@zhao2023pytorch
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
Yanli Zhao, Andrew Gu, Rohan Varma (2023), VLDB
@hu2023planning
Planning-oriented Autonomous Driving
Yihan Hu, Jiazhi Yang, Li Chen, et al. (2023), CVPR
@seo2023masked
Masked World Models for Visual Control
Younggyo Seo, Danijar Hafner, Hao Liu, et al. (2023), CoRL
@li2023teacherlm
Textbooks Are All You Need II: phi-1.5 Technical Report
Yuanzhi Li, Sebastien Bubeck, Ronen Eldan, et al. (2023), arXiv
@qin2023tool
Tool Learning with Foundation Models
Yujia Qin, Shengding Hu, Yankai Lin (2023), arXiv
@yao2023editing
Editing Large Language Models: Problems, Methods, and Opportunities
Yunzhi Yao, Peng Wang, Bozhong Tian, et al. (2023), EMNLP
@sun2023retentive
Retentive Network: A Successor to Transformer for Large Language Models
Yutao Sun, Li Dong, Shaohan Huang, et al. (2023), arXiv
@li2023gte
Towards General Text Embeddings with Multi-stage Contrastive Learning
Zehan Li, Xin Zhang, Yanzhao Zhang, et al. (2023), arXiv
@jiang2023flare
Active Retrieval Augmented Generation
Zhengbao Jiang, Frank F. Xu, Luyu Gao (2023), EMNLP
@shao2023enhancing
Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
Zhihong Shao, Yeyun Gong, Yelong Shen (2023), EMNLP Findings
@wang2023orthogonal
Orthogonal Subspace Learning for Language Model Continual Learning
Zhilin Wang (2023), EMNLP Findings
@wu2023slotformer
SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models
Ziyi Wu, Nikita Dvornik, Klaus Greff, et al. (2023), ICLR
@parisi2022talm
TALM: Tool Augmented Language Models
Aaron Parisi, Yao Zhao, Noah Fiedel (2022), arXiv
@gu2022efficientlyarticle
Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu, Karan Goel, Christopher Re (2022)ICLR
@hu2022milearticle
Model-Based Imitation Learning for Urban Driving
Anthony Hu, Gianluca Corrado, Nicolas Griffiths, et al. (2022)NeurIPS
@zoph2022stmoearticle
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph, Irwan Bello, Sameer Kumar (2022)arXiv
@chen2022gslidearticle
G-SLIDE: A GPU-Based Sub-Linear Deep Learning Engine via LSH Sparsification
Beidi Chen (2022)IEEE TPDS
@chen2022transdreamerarticle
TransDreamer: Reinforcement Learning with Transformer World Models
Chang Chen, Yi-Fu Wu, Jaesik Yoon, et al. (2022)arXiv
@li2022neuralarticle
Neural Architecture Search Survey: A Hardware Perspective
Chunyun Li (2022)ACM Computing Surveys
@zhou2022memoarticle
A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning
Da-Wei Zhou, Qi-Wei Wang, Han-Jia Ye, et al. (2022)ICLR
@hafner2022directorarticle
Deep Hierarchical Planning from Pixels
Danijar Hafner, Kuang-Huei Lee, Ian Fischer, et al. (2022)NeurIPS
@hu2022loraarticle
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, et al. (2022)ICLR
@arani2022learningarticle
Learning Fast, Learning Slow: A General Continual Learning Method based on Complementary Learning System Theory
Elahe Arani, Fahad Sarfraz, Bahram Zonooz (2022)ICLR
@mitchell2022fastarticle
Fast Model Editing at Scale
Eric Mitchell, Charles Lin, Antoine Bosselut, et al. (2022)ICLR
@zelikman2022stararticle
STaR: Bootstrapping Reasoning With Reasoning
Eric Zelikman, Yuhuai Wu, Jesse Mu, et al. (2022)NeurIPS
@normandin2022sequoiaarticle
Sequoia: A Software Framework to Unify Continual Learning Research
Fabrice Normandin, Florian Golemo, Oleksiy Ostapenko, et al. (2022)arXiv
@liu2022randomarticle
Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond
Fanghui Liu, Xiaolin Huang, Yudong Chen, et al. (2022)IEEE TPAMI
@deng2022dreamerproarticle
DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations
Fei Deng, Ingook Jang, Sungjin Ahn (2022)ICML
@wang2022fosterarticle
FOSTER: Feature Boosting and Compression for Class-Incremental Learning
Fu-Yun Wang, Da-Wei Zhou, Han-Jia Ye, et al. (2022)ECCV
@izacard2022contrieverarticle
Unsupervised Dense Information Retrieval with Contrastive Learning
Gautier Izacard, Mathilde Caron, Lucas Hosseini, et al. (2022)TMLR
@vandeven2022three_typesarticle
Three Types of Incremental Learning
Gido M. van de Ven, Hava T. Siegelmann, Andreas S. Tolias (2022)Nature Machine Intelligence
@yu2022orcaarticle
Orca: A Distributed Serving System for Transformer-Based Generative Models
Gyeong-In Yu, Joo Seong Jeong, Geon-Woo Kim, et al. (2022)OSDI
@trivedi2022musiquearticle
MuSiQue: Multihop Questions via Single Hop Question Composition
Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, et al. (2022)TACL
@evron2022catastrophicarticle
How Catastrophic Can Catastrophic Forgetting Be in Linear Regression?
Itay Evron, Edward Moroshko, Rachel Ward, et al. (2022)COLT
@pathak2022fourcastnetarticle
FourCastNet: A Global Data-driven High-resolution Weather Forecasting Model using Adaptive Fourier Neural Operators
Jaideep Pathak, Shashank Subramanian, Peter Harrington, et al. (2022)arXiv
@sevilla2022computearticle
Compute Trends Across Three Eras of Machine Learning
Jaime Sevilla, Lennart Heim, Anson Ho, et al. (2022)arXiv
@leethorp2022fnetarticle
FNet: Mixing Tokens with Fourier Transforms
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, et al. (2022)NAACL
@wei2022chainarticle
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei, Xuezhi Wang, Dale Schuurmans, et al. (2022)NeurIPS
@jang2022towardsarticle
Towards Continual Knowledge Learning of Language Models
Joel Jang, Seonghyeon Ye, Sohee Yang, et al. (2022)ICLR
@guibas2022adaptivearticle
Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers
John Guibas, Morteza Mardani, Zongyi Li (2022)ICLR
@guibas2022afnoarticle
Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers
John Guibas, Morteza Mardani, Zongyi Li, et al. (2022)ICLR
@ho2022videoarticle
Video Diffusion Models
Jonathan Ho, Tim Salimans, Alexey Gritsenko (2022)NeurIPS
@ho2022cfgarticle
Classifier-Free Diffusion Guidance
Jonathan Ho, Tim Salimans (2022)NeurIPS Workshop
@ho2022imagenarticle
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho, William Chan, Chitwan Saharia, et al. (2022)arXiv
@hoffmann2022trainingarticle
Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch (2022)NeurIPS
@mendez2022modular_clarticle
Lifelong Learning with Modular and Compositional Knowledge
Jorge A. Mendez, Eric Eaton (2022)ICML
@lee2022deduplicatingarticle
Deduplicating Training Data Makes Language Models Better
Katherine Lee (2022)ACL
@santhanam2022colbertv2article
ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, et al. (2022)NAACL
@meng2022locatingarticle
Locating and Editing Factual Associations in GPT
Kevin Meng, David Bau, Alex Andonian, et al. (2022)NeurIPS
@ouyang2022trainingarticle
Training Language Models to Follow Instructions with Human Feedback
Long Ouyang, Jeff Wu, Xu Jiang, et al. (2022)NeurIPS
@caccia2022newarticle
New Insights on Reducing Abrupt Representation Change in Online Continual Learning
Lucas Caccia, Rahaf Aljundi, Nader Asadi, et al. (2022)ICLR
@boschini2022classarticle
Class-Incremental Continual Learning into the eXtended DER-verse
Matteo Boschini, Lorenzo Bonicelli, Pietro Buzzega, et al. (2022)IEEE TPAMI
@wortsman2022model_soupsarticle
Model Soups: Averaging Weights of Multiple Fine-tuned Models Improves Accuracy without Increasing Inference Time
Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, et al. (2022)ICML
@hansen2022temporalarticle
Temporal Difference Learning for Model Predictive Control
Nicklas Hansen, Xiaolong Wang, Hao Su (2022)ICML
@micikevicius2022fp8article
FP8 Formats for Deep Learning
Paulius Micikevicius, Dusan Stosic, Neil Burgess, et al. (2022)arXiv
@jeevan2022wavemixarticle
WaveMix: A Resource-efficient Neural Network for Image Analysis
Pranav Jeevan, Amit Sethi (2022)arXiv
@rombach2022higharticle
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, et al. (2022)CVPR
@srivastava2022behaviorarticle
BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation
Sanjana Srivastava, Chengshu Li, Michael Lingelbach, et al. (2022)CoRL
@borgeaud2022retroarticle
Improving Language Models by Retrieving from Trillions of Tokens
Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann (2022)ICML
@yao2022webshoparticle
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Shunyu Yao, Howard Chen, John Yang, et al. (2022)NeurIPS
@kojima2022largearticle
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, et al. (2022)NeurIPS
@schuster2022confidentarticle
Confident Adaptive Language Modeling
Tal Schuster, Adam Fisch, Jai Gupta (2022)NeurIPS
@dettmers2022gpt3int8article
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers, Mike Lewis, Younes Belkada, et al. (2022)NeurIPS
@dao2022flashattentionarticle
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, et al. (2022)NeurIPS
@singer2022makearticle
Make-A-Video: Text-Conditioned Video Generation with Diffusion Models
Uriel Singer, Adam Polyak, Thomas Hayes, et al. (2022)arXiv
@voleti2022mcvdarticle
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
Vikram Voleti, Alexia Jolicoeur-Martineau, Christopher Pal (2022)NeurIPS
@fedus2022switcharticle
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
William Fedus, Barret Zoph, Noam Shazeer (2022)JMLR
@wang2022spromptsarticle
S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning
Yabin Wang, Zhiwu Huang, Xiaopeng Hong (2022)NeurIPS
@lecun2022patharticle
A Path Towards Autonomous Machine Intelligence
Yann LeCun (2022)OpenReview
@lecun2024sora_critiquearticle
A Path Towards Autonomous Machine Intelligence
Yann LeCun (2022)OpenReview
@zhou2022expert_choicearticle
Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou, Tao Lei, Hanxiao Liu, et al. (2022)NeurIPS
@li2022alphacodearticle
Competition-Level Code Generation with AlphaCode
Yujia Li, David Choi, Junyoung Chung (2022)Science
@li2022bevformerarticle
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Zhiqi Li, Wenhai Wang, Hongyang Li, et al. (2022)ECCV
@wang2022learning_l2particle
Learning to Prompt for Continual Learning
Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, et al. (2022)CVPR
@wang2022dualpromptarticle
DualPrompt: Complementary Prompting for Rehearsal-Free Continual Learning
Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, et al. (2022)ECCV
@dosovitskiy2021imagearticle
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, et al. (2021)ICLR
@madotto2021continualarticle
Continual Learning in Task-Oriented Dialogue Systems
Andrea Madotto, Zhaojiang Lin, Zhenpeng Zhou, et al. (2021)EMNLP
@lester2021prompt_tuningarticle
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant (2021)EMNLP
@hendrycks2021mmluarticle
Measuring Massive Multitask Language Understanding
Dan Hendrycks, Collin Burns, Steven Basart, et al. (2021)ICLR
@hendrycks2021appsarticle
Measuring Coding Challenge Competence With APPS
Dan Hendrycks, Steven Basart, Saurav Kadavath, et al. (2021)NeurIPS
@hafner2021masteringarticle
Mastering Atari with Discrete World Models
Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, et al. (2021)ICLR
@narayanan2021efficientarticle
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan, Mohammad Shoeybi, Jared Casper (2021)SC
@lepikhin2021gshardarticle
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu (2021)ICLR
@metzler2021rethinkingarticle
Rethinking Search: Making Domain Experts out of Dilettantes
Donald Metzler, Yi Tay, Dara Bahri, et al. (2021)SIGIR Forum
@petroni2021kiltarticle
KILT: A Benchmark for Knowledge Intensive Language Tasks
Fabio Petroni, Aleksandra Piktus, Angela Fan (2021)NAACL
@izacard2021fidarticle
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Gautier Izacard, Edouard Grave (2021)EACL
@saha2021gradientarticle
Gradient Projection Memory for Continual Learning
Gobinda Saha, Isha Garg, Kaushik Roy (2021)ICLR
@benmeziane2021comprehensivearticle
A Comprehensive Survey on Hardware-Aware Neural Architecture Search
Hadjer Benmeziane, Kaoutar El Maghraoui, Hamza Ouarnoughi (2021)arXiv
@liu2021payarticle
Pay Attention to MLPs
Hanxiao Liu, Zihang Dai, David So, et al. (2021)NeurIPS
@peng2021rfaarticle
Random Feature Attention
Hao Peng, Nikolaos Pappas, Dani Yogatama, et al. (2021)ICLR
@peng2021random_feature_attentionarticle
Random Feature Attention
Hao Peng, Nikolaos Pappas, Dani Yogatama, et al. (2021)ICLR
@ahn2021ssilarticle
SS-IL: Separated Softmax for Incremental Learning
Hongjoon Ahn, Jihwan Kwak, Subin Lim, et al. (2021)ICCV
@cha2021co2larticle
Co2L: Contrastive Continual Learning
Hyuntak Cha, Jaeho Lee, Jinwoo Shin (2021)ICCV
@yoon2021federatedarticle
Federated Continual Learning with Weighted Inter-client Transfer
Jaehong Yoon, Wonyong Jeong, Giwoong Lee, et al. (2021)ICML
@johnson2021faissarticle
Billion-scale Similarity Search with GPUs
Jeff Johnson, Matthijs Douze, Herve Jegou (2021)IEEE TBD
@choromanski2021rethinkingarticle
Rethinking Attention with Performers
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan (2021)ICLR
@chen2021humanevalarticle
Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, et al. (2021)arXiv
@delange2021continualarticle
A Continual Learning Survey: Defying Forgetting in Classification Tasks
Matthias De Lange, Rahaf Aljundi, Marc Masana (2021)IEEE TPAMI
@lewis2021basearticle
BASE Layers: Simplifying Training of Large, Sparse Models
Mike Lewis (2021)ICML
@chen2022autoformer_nasarticle
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen, Houwen Peng, Jianlong Fu, et al. (2021)ICCV
@babaeizadeh2021fitvidarticle
FitVid: Overfitting in Pixel-Level Video Prediction
Mohammad Babaeizadeh, Mohammad Taghi Saffar, Suraj Nair (2021)arXiv
@geva2023strategyqaarticle
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva, Daniel Khashabi, Elad Segal, et al. (2021)TACL
@thakur2021beirarticle
BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Nandan Thakur, Nils Reimers, Andreas Ruckle, et al. (2021)NeurIPS
@nakkiran2021deeparticle
Deep Double Descent: Where Bigger Models and More Data Hurt
Preetum Nakkiran, Gal Kaplun, Yamini Bansal, et al. (2021)JMLR
@nakano2021webgptarticle
WebGPT: Browser-assisted Question-Answering with Human Feedback
Reiichiro Nakano (2021)arXiv
@lee2021continualarticle
Continual Learning in the Teacher-Student Setup: Impact of Task Similarity
Sebastian Lee, Sebastian Goldt, Andrew Saxe (2021)ICML
@wang2021orthogonal_adamnarticle
Training Networks in Null Space of Feature Covariance for Continual Learning
Shipeng Wang, Xiaorong Li, Jian Sun, et al. (2021)CVPR
@yan2021der_clarticle
DER: Dynamically Expandable Representation for Class Incremental Learning
Shipeng Yan, Jiangwei Xie, Xuming He (2021)CVPR
@hospedales2021metalearningarticle
Meta-Learning in Neural Networks: A Survey
Timothy Hospedales, Antreas Antoniou, Paul Micaelli, et al. (2021)IEEE TPAMI
@veniat2021efficientarticle
Efficient Continual Learning with Modular Networks and Task-Driven Priors
Tom Veniat, Ludovic Denoyer, Marc'Aurelio Ranzato (2021)ICLR
@lomonaco2021avalanchearticle
Avalanche: An End-to-End Library for Continual Learning
Vincenzo Lomonaco, Lorenzo Pellegrini, Andrea Cossu (2021)CLVision Workshop at CVPR
@ye2021masteringarticle
Mastering Atari Games with Limited Data
Weirui Ye, Shaohuai Liu, Thanard Kurutach, et al. (2021)NeurIPS
@yan2021videogptarticle
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan, Yunzhi Zhang, Pieter Abbeel, et al. (2021)arXiv
@li2021prefixarticle
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li, Percy Liang (2021)ACL
@tay2021longarticle
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar (2021)ICLR
@dong2021attentionarticle
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas (2021)ICML
@xiong2021nystromformerarticle
Nystromformer: A Nystrom-Based Algorithm for Approximating Self-Attention
Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty (2021)AAAI
@liu2021swinarticle
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu, Yutong Lin, Yue Cao, et al. (2021)ICCV
@li2021fourierarticle
Fourier Neural Operator for Parametric Partial Differential Equations
Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, et al. (2021)ICLR
@gu2020hippoarticle
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu, Tri Dao, Stefano Ermon, et al. (2020)NeurIPS
@lee2020stochasticarticle
Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model
Alex X. Lee, Anusha Nagabandi, Pieter Abbeel, et al. (2020)NeurIPS
@sanchez2020learningarticle
Learning to Simulate Complex Physics with Graph Networks
Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, et al. (2020)ICML
@katharopoulos2020transformersarticle
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, et al. (2020)ICML
@zela2020understandingarticle
Understanding and Robustifying Differentiable Architecture Search
Arber Zela, Thomas Elsken, Tonmoy Saikia, et al. (2020)ICLR
@chrysakis2020onlinearticle
Online Continual Learning from Imbalanced Data
Aristotelis Chrysakis, Marie-Francine Moens (2020)ICML
@douillard2020podnetarticle
PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning
Arthur Douillard, Matthieu Cord, Charles Ollion, et al. (2020)ECCV
@chen2020slidearticle
SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems
Beidi Chen, Tharun Medini, James Farwell, et al. (2020)MLSys
@mildenhall2020nerfarticle
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, et al. (2020)ECCV
@zhao2020maintaining_waarticle
Maintaining Discrimination and Fairness in Class Incremental Learning
Bowen Zhao, Xi Xiao, Guojun Gan, et al. (2020)CVPR
@hafner2020dreamarticle
Dream to Control: Learning Behaviors by Latent Imagination
Danijar Hafner, Timothy Lillicrap, Jimmy Ba, et al. (2020)ICLR
@cubuk2020randaugmentarticle
Randaugment: Practical Automated Data Augmentation with a Reduced Search Space
Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, et al. (2020)CVPR Workshops
@locatello2020objectarticle
Object-Centric Learning with Slot Attention
Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, et al. (2020)NeurIPS
@gupta2020lamamlarticle
La-MAML: Look-ahead Meta Learning for Continual Learning
Gunshi Gupta, Karmesh Yadav, Liam Paull (2020)NeurIPS
@cai2020onceforallarticle
Once-for-All: Train One Network and Specialize it for Efficient Deployment
Han Cai, Chuang Gan, Tianzhe Wang, et al. (2020)ICLR
@cai2020oncearticle
Once-for-All: Train One Network and Specialize it for Efficient Deployment
Han Cai, Chuang Gan, Tianzhe Wang, et al. (2020)ICLR
@beltagy2020longformerarticle
Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan (2020)arXiv
@kaplan2020scalingarticle
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, Tom Henighan (2020)arXiv
@cordonnier2020relationshiparticle
On the Relationship between Self-Attention and Convolutional Layers
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi (2020)ICLR
@ho2020ddpmarticle
Denoising Diffusion Probabilistic Models
Jonathan Ho, Ajay Jain, Pieter Abbeel (2020)NeurIPS
@schrittwieser2020masteringarticle
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert (2020)Nature
@guu2020realmarticle
Retrieval Augmented Language Model Pre-Training
Kelvin Guu, Kenton Lee, Zora Tung, et al. (2020)ICML
@kaiser2020modelarticle
Model Based Reinforcement Learning for Atari
Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos (2020)ICLR
@zaheer2020bigbirdarticle
Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey (2020)NeurIPS
@zaheer2020bigarticle
Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, et al. (2020)NeurIPS
@chen2020generativearticle
Generative Pretraining from Pixels
Mark Chen, Alec Radford, Rewon Child, et al. (2020)ICML
@tancik2020fourierarticle
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall (2020)NeurIPS
@wortsman2020supermasksarticle
Supermasks in Superposition
Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu (2020)NeurIPS
@shoeybi2020megatronlmarticle
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, et al. (2020)arXiv
@lambert2020objectivearticle
Objective Mismatch in Model-based Reinforcement Learning
Nathan Lambert, Brandon Amos, Omry Yadan, et al. (2020)L4DC
@thompson2020computationalarticle
The Computational Limits of Deep Learning
Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, et al. (2020)arXiv
@kitaev2020reformerarticle
Reformer: The Efficient Transformer
Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya (2020)ICLR
@khattab2020colbertarticle
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Omar Khattab, Matei Zaharia (2020)SIGIR
@ahmed2020causalworldarticle
CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning
Ossama Ahmed, Frederik Träuble, Anirudh Goyal, et al. (2020)ICLR
@ferragina2020pgmarticle
The PGM-index: A Fully-dynamic Compressed Learned Index with Provable Worst-case Bounds
Paolo Ferragina, Giorgio Vinciguerra (2020)VLDB
@lewis2020ragarticle
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Patrick Lewis, Ethan Perez, Aleksandra Piktus (2020)NeurIPS
@martinsson2020randomizedarticle
Randomized Numerical Linear Algebra: Foundations and Algorithms
Per-Gunnar Martinsson, Joel A. Tropp (2020)Acta Numerica
@mattson2020mlperfarticle
MLPerf Training Benchmark
Peter Mattson (2020)MLSys
@buzzega2020darkarticle
Dark Experience for General Continual Learning: a Strong, Simple Baseline
Pietro Buzzega, Matteo Boschini, Angelo Porrello, et al. (2020)NeurIPS
@khosla2020supervisedarticle
Supervised Contrastive Learning
Prannay Khosla, Piotr Teterwak, Chen Wang, et al. (2020)NeurIPS
@sekar2020planningarticle
Planning to Explore via Self-Supervised World Models
Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, et al. (2020)ICML
@rajbhandari2020zeroarticle
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, et al. (2020)SC
@beaulieu2020learningarticle
Learning to Continually Learn
Shawn Beaulieu, Lapo Frati, Thomas Miconi (2020)ECAI
@wang2020linformerarticle
Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, et al. (2020)arXiv
@lee2020neural_clarticle
A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning
Soochan Lee, Junsoo Ha, Dongsu Zhang, et al. (2020)ICLR
@bhojanapalli2020lowarticle
Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, et al. (2020)ICML
@james2020rlbencharticle
RLBench: The Robot Learning Benchmark & Learning Environment
Stephen James, Zicong Ma, David Rovick Arrojo, et al. (2020)IEEE Robotics and Automation Letters
@gururangan2020dontarticle
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Suchin Gururangan, Ana Marasovic, Swabha Swayamdipta, et al. (2020)ACL
@kipf2020contrastivearticle
Contrastive Learning of Structured World Models
Thomas Kipf, Elise van der Pol, Max Welling (2020)ICLR
@yu2020metaworldarticle
Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning
Tianhe Yu, Deirdre Quillen, Zhanpeng He, et al. (2020)CoRL
@brown2020gpt3article
Language Models are Few-Shot Learners
Tom Brown, Benjamin Mann, Nick Ryder, et al. (2020)NeurIPS
@hayes2020remindarticle
REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes, Kushal Kafle, Robik Shrestha, et al. (2020)ECCV
@karpukhin2020dprarticle
Dense Passage Retrieval for Open-Domain Question Answering
Vladimir Karpukhin, Barlas Oguz, Sewon Min, et al. (2020)EMNLP
@zhao2020simarticle
Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey
Wenshuai Zhao, Jorge Pena Queralta, Tomi Westerlund (2020)IEEE Symposium Series on Computational Intelligence
@malkov2020hnswarticle
Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs
Yu A. Malkov, Dmitry A. Yashunin (2020)IEEE TPAMI
@radford2019languagearticle
Language Models are Unsupervised Multitask Learners
Alec Radford, Jeffrey Wu, Rewon Child, et al. (2019)OpenAI Blog
@razavi2019generatingarticle
Generating Diverse High-Fidelity Images with VQ-VAE-2
Ali Razavi, Aaron van den Oord, Oriol Vinyals (2019)NeurIPS
@howard2019searchingarticle
Searching for MobileNetV3
Andrew Howard, Mark Sandler, Grace Chu, et al. (2019)ICCV
@chaudhry2019tinyarticle
On Tiny Episodic Memories in Continual Learning
Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, et al. (2019)arXiv
@chaudhry2019efficientarticle
Efficient Lifelong Learning with A-GEM
Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, et al. (2019)ICLR
@burgess2019monetarticle
MONet: Unsupervised Scene Decomposition and Representation
Christopher P. Burgess, Loic Matthey, Nicholas Watters, et al. (2019)arXiv
@hafner2019learningarticle
Learning Latent Dynamics for Planning from Pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer (2019)ICML
@narayanan2019pipedreamarticle
PipeDream: Generalized Pipeline Parallelism for DNN Training
Deepak Narayanan, Aaron Harlap, Amar Phanishayee, et al. (2019)SOSP
@voita2019analyzingarticle
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita, David Talbot, Fedor Moiseev, et al. (2019)ACL
@strubell2019energyarticle
Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell, Ananya Ganesh, Andrew McCallum (2019)ACL
@parisi2019continualarticle
Continual Lifelong Learning with Neural Networks: A Review
German I. Parisi, Ronald Kemker, Jose L. Part, et al. (2019)Neural Networks
@vandeven2019threearticle
Three Scenarios for Continual Learning
Gido M. van de Ven, Andreas S. Tolias (2019)NeurIPS Continual Learning Workshop
@husain2019codesearchnetarticle
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, et al. (2019)arXiv
@liu2019dartsarticle
DARTS: Differentiable Architecture Search
Hanxiao Liu, Karen Simonyan, Yiming Yang (2019)ICLR
@loshchilov2019adamwarticle
Decoupled Weight Decay Regularization
Ilya Loshchilov, Frank Hutter (2019)ICLR
@devlin2019bertarticle
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, et al. (2019)NAACL
@frankle2019lotteryarticle
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle, Michael Carbin (2019)ICLR
@javed2019metaarticle
Meta-Learning Representations for Continual Learning
Khurram Javed, Martha White (2019)NeurIPS
@greff2019multiarticle
Multi-Object Representation Learning with Iterative Variational Inference
Klaus Greff, Raphael Lopez Kaufman, Rishabh Kabra, et al. (2019)ICML
@savva2019habitatarticle
Habitat: A Platform for Embodied AI Research
Manolis Savva, Abhishek Kadian, Oleksandr Maksymets (2019)ICCV
@riemer2019learningarticle
Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference
Matthew Riemer, Ignacio Cases, Robert Ajemian (2019)ICLR
@janner2019whenarticle
When to Trust Your Model: Model-Based Policy Optimization
Michael Janner, Justin Fu, Marvin Zhang, et al. (2019)NeurIPS
@janner2019trustarticle
When to Trust Your Model: Model-Based Policy Optimization
Michael Janner, Justin Fu, Marvin Zhang, et al. (2019)NeurIPS
@tan2019efficientnetarticle
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan, Quoc V. Le (2019)ICML
@tan2019mnasnetarticle
MnasNet: Platform-Aware Neural Architecture Search for Mobile
Mingxing Tan, Bo Chen, Ruoming Pang, et al. (2019)CVPR
@ke2019modelingarticle
Modeling the Long Term Future in Model-Based Reinforcement Learning
Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati (2019)ICLR
@rahaman2019spectralarticle
On the Spectral Bias of Neural Networks
Nasim Rahaman, Aristide Baratin, Devansh Arpit (2019)ICML
@houlsby2019parameterarticle
Parameter-Efficient Transfer Learning for NLP
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, et al. (2019)ICML
@ivkin2019communicationarticle
Communication-efficient Distributed SGD with Sketching
Nikita Ivkin, Daniel Rothchild, Enayat Ullah (2019)NeurIPS
@shazeer2019fastarticle
Fast Transformer Decoding: One Write-Head is All You Need
Noam Shazeer (2019)arXiv
@aljundi2019taskfreearticle
Task-Free Continual Learning
Rahaf Aljundi, Klaas Kelchtermans, Tinne Tuytelaars (2019)CVPR
@aljundi2019onlinearticle
Online Continual Learning with Maximal Interfered Retrieval
Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, et al. (2019)NeurIPS
@aljundi2019gradientarticle
Gradient Based Sample Selection for Online Continual Learning
Rahaf Aljundi, Min Lin, Baptiste Goujaud, et al. (2019)NeurIPS
@child2019generatingarticle
Generating Long Sequences with Sparse Transformers
Rewon Child, Scott Gray, Alec Radford, et al. (2019)arXiv
@spring2019compressingarticle
Compressing Gradient Optimizers via Count-Sketches
Ryan Spring, Anshumali Shrivastava (2019)ICML
@hou2019learningarticle
Learning a Unified Classifier Incrementally via Rebalancing
Saihui Hou, Xinyu Pan, Chen Change Loy, et al. (2019)CVPR
@yun2019cutmixarticle
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun, Dongyoon Han, Seong Joon Oh, et al. (2019)ICCV
@farquhar2019towardsarticle
Towards Robust Evaluations of Continual Learning
Sebastian Farquhar, Yarin Gal (2019)Privacy in Machine Learning Workshop at NeurIPS
@elsken2019neuralarticle
Neural Architecture Search: A Survey
Thomas Elsken, Jan Hendrik Metzen, Frank Hutter (2019)JMLR
@kwiatkowski2019naturalarticle
Natural Questions: A Benchmark for Question Answering Research
Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield (2019)TACL
@sanh2019distilbertarticle
DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, et al. (2019)EMC2 Workshop at NeurIPS
@chen2019progressive_dartsarticle
Progressive DARTS: Bridging the Optimization Gap for NAS in the Wild
Xin Chen, Lingxi Xie, Jun Wu, et al. (2019)IJCV
@song2019scorearticle
Generative Modeling by Estimating Gradients of the Data Distribution
Yang Song, Stefano Ermon (2019)NeurIPS
@huang2019gpipearticle
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Yanping Huang, Youlong Cheng, Ankur Bapna (2019)NeurIPS
@wu2019large_bicarticle
Large Scale Incremental Learning
Yue Wu, Yinpeng Chen, Lijuan Wang, et al. (2019)CVPR
@tsai2019transformerarticle
Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel
Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, et al. (2019)EMNLP
@allen2019convergencearticle
A Convergence Theory for Deep Learning via Over-Parameterization
Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song (2019)ICML
@oord2018infoncearticle
Representation Learning with Contrastive Predictive Coding
Aaron van den Oord, Yazhe Li, Oriol Vinyals (2018)arXiv
@nichol2018firstarticle
On First-Order Meta-Learning Algorithms
Alex Nichol, Joshua Achiam, John Schulman (2018)arXiv
@nagabandi2018neuralarticle
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Anusha Nagabandi, Gregory Kahn, Ronald S. Fearing, et al. (2018)ICRA
@chaudhry2018riemannianarticle
Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajanthan, et al. (2018)ECCV
@jacot2018ntkarticle
Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot, Franck Gabriel, Clement Hongler (2018)NeurIPS
@jacot2018neuralarticle
Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot, Franck Gabriel, Clément Hongler (2018)NeurIPS
@mallya2018packnetarticle
PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning
Arun Mallya, Svetlana Lazebnik (2018)CVPR
@louart2018randomarticle
A Random Matrix Approach to Neural Networks
Cosme Louart, Zhenyu Liao, Romain Couillet (2018)Annals of Applied Probability
@ha2018worldarticle
World Models
David Ha, Juergen Schmidhuber (2018)arXiv
@ha2018recurrentarticle
Recurrent World Models Facilitate Policy Evolution
David Ha, Jurgen Schmidhuber (2018)NeurIPS
@silver2018generalarticle
A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go Through Self-Play
David Silver, Thomas Hubert, Julian Schrittwieser, et al. (2018)Science
@denton2018stochasticarticle
Stochastic Video Generation with a Learned Prior
Emily Denton, Rob Fergus (2018)ICML
@ghiasi2018dropblockarticle
DropBlock: A Regularization Technique for Convolutional Networks
Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le (2018)NeurIPS
@zhang2018mixuparticle
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, et al. (2018)ICLR
@clavera2018modelarticle
Model-Based Reinforcement Learning via Meta-Policy Optimization
Ignasi Clavera, Jonas Rothfuss, John Schulman, et al. (2018)CoRL
@yoon2018lifelongarticle
Lifelong Learning with Dynamically Expandable Networks
Jaehong Yoon, Eunho Yang, Jeongtae Lee, et al. (2018)ICLR
@thorne2018feverarticle
FEVER: A Large-scale Dataset for Fact Extraction and VERification
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, et al. (2018)NAACL
@wang2018surveyarticle
A Survey on Learning to Hash
Jingdong Wang, Ting Zhang, Jingkuan Song, et al. (2018)IEEE TPAMI
@serra2018overcomingarticle
Overcoming Catastrophic Forgetting with Hard Attention to the Task
Joan Serra, Didac Suris, Marius Miron, et al. (2018)ICML
@schwarz2018progressarticle
Progress & Compress: A Scalable Framework for Continual Learning
Jonathan Schwarz, Wojciech Czarnecki, Jelena Luketina (2018)ICML
@chua2018deeparticle
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Kurtland Chua, Roberto Calandra, Rowan McAllister, et al. (2018)NeurIPS
@kriss2018adaptivearticle
Adaptive Cuckoo Filters
Michael Kriss, Michael Mitzenmacher, Sergei Vassilvitskii (2018)ALENEX
@diaz2018dontarticle
Don't Forget, There is More than Forgetting: New Metrics for Continual Learning
Natalia Diaz-Rodriguez, Vincenzo Lomonaco, David Filliat, et al. (2018)NeurIPS Workshop
@micikevicius2018mixedarticle
Mixed Precision Training
Paulius Micikevicius, Sharan Narang, Jonah Alben (2018)ICLR
@velickovic2018gatarticle
Graph Attention Networks
Petar Velickovic, Guillem Cucurull, Arantxa Casanova, et al. (2018)ICLR
@battaglia2018relationalarticle
Relational Inductive Biases, Deep Learning, and Graph Networks
Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, et al. (2018)arXiv
@aljundi2018memoryarticle
Memory Aware Synapses: Learning What (not) to Forget
Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, et al. (2018)ECCV
@sutton2018reinforcementbook
Reinforcement Learning: An Introduction
Richard S. Sutton, Andrew G. Barto (2018)MIT Press
@vershynin2018highbook
High-Dimensional Probability: An Introduction with Applications in Data Science
Roman Vershynin (2018)Cambridge University Press
@kraska2018casearticle
The Case for Learned Index Structures
Tim Kraska, Alex Beutel, Ed H. Chi, et al. (2018)SIGMOD
@hsu2018reevaluatingarticle
Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines
Yen-Chang Hsu, Yen-Cheng Liu, Anita Ramasamy, et al. (2018)NeurIPS CL Workshop
@tassa2018deepmindarticle
DeepMind Control Suite
Yuval Tassa, Yotam Doron, Alistair Muldal, et al. (2018)arXiv
@yang2018hotpotqaarticle
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang, Peng Qi, Saizheng Zhang, et al. (2018)EMNLP
@van2017neuralarticle
Neural Discrete Representation Learning
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu (2017)NeurIPS
@gomez2017reversiblearticle
The Reversible Residual Network: Backpropagation Without Storing Activations
Aidan N. Gomez, Mengye Ren, Raquel Urtasun, et al. (2017)NeurIPS
@dosovitskiy2017carlaarticle
CARLA: An Open Urban Driving Simulator
Alexey Dosovitskiy, German Ros, Felipe Codevilla, et al. (2017)CoRL
@vaswani2017attentionarticle
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, et al. (2017)NeurIPS
@zoph2017neuralarticle
Neural Architecture Search with Reinforcement Learning
Barret Zoph, Quoc V. Le (2017)ICLR
@finn2017mamlarticle
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn, Pieter Abbeel, Sergey Levine (2017)ICML
@finn2017modelarticle
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn, Pieter Abbeel, Sergey Levine (2017)ICML
@lopezpaz2017gradientarticle
Gradient Episodic Memory for Continual Learning
David Lopez-Paz, Marc'Aurelio Ranzato (2017)NeurIPS
@silver2017masteringarticle
Mastering the Game of Go without Human Knowledge
David Silver, Julian Schrittwieser, Karen Simonyan, et al. (2017)Nature
@pathak2017curiosityarticle
Curiosity-driven Exploration by Self-Supervised Prediction
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, et al. (2017)ICML
@oyallon2017scalingarticle
Scaling the Scattering Transform: Deep Hybrid Networks
Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko (2017)ICCV
@bach2017equivalencearticle
On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions
Francis Bach (2017)JMLR
@zenke2017continualarticle
Continual Learning through Synaptic Intelligence
Friedemann Zenke, Ben Poole, Surya Ganguli (2017)ICML
@shin2017continualarticle
Continual Learning with Deep Generative Replay
Hanul Shin, Jung Kwon Lee, Jaehong Kim, et al. (2017)NeurIPS
@kirkpatrick2017overcomingarticle
Overcoming Catastrophic Forgetting in Neural Networks
James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, et al. (2017)PNAS
@pennington2017nonlineararticle
Nonlinear Random Matrix Theory for Deep Learning
Jeffrey Pennington, Pratik Worah (2017)NeurIPS
@schulman2017ppoarticle
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, et al. (2017)arXiv
@larsen2017optimalityarticle
Optimality of the Johnson-Lindenstrauss Lemma
Kasper Green Larsen, Jelani Nelson (2017)FOCS
@clarkson2017lowarticle
Low-Rank Approximation and Regression in Input Sparsity Time
Kenneth L. Clarkson, David P. Woodruff (2017)JACM
@joshi2017triviaqaarticle
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi, Eunsol Choi, Daniel S. Weld, et al. (2017)ACL
@arjovsky2017wganarticle
Wasserstein Generative Adversarial Networks
Martin Arjovsky, Soumith Chintala, Leon Bottou (2017)ICML
@mitzenmacher2017probabilitybook
Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis
Michael Mitzenmacher, Eli Upfal (2017)Cambridge University Press
@watters2017visualarticle
Visual Interaction Networks: Learning a Physics Simulator from Video
Nicholas Watters, Daniel Zoran, Theophane Weber, et al. (2017)NeurIPS
@shazeer2017outrageouslyarticle
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, et al. (2017)ICLR
@aljundi2017expertarticle
Expert Gate: Lifelong Learning with a Network of Experts
Rahaf Aljundi, Punarjay Chakravarty, Tinne Tuytelaars (2017)CVPR
@chiappa2017recurrentarticle
Recurrent Environment Simulators
Silvia Chiappa, Sebastien Racaniere, Daan Wierstra, et al. (2017)ICLR
@rebuffi2017icarlarticle
iCaRL: Incremental Classifier and Representation Learning
Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, et al. (2017)CVPR
@kipf2017gcnarticle
Semi-Supervised Classification with Graph Convolutional Networks
Thomas N. Kipf, Max Welling (2017)ICLR
@lin2017fpnarticle
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin, Piotr Dollar, Ross Girshick, et al. (2017)CVPR
@lomonaco2017core50article
CORe50: a New Dataset and Benchmark for Continuous Object Recognition
Vincenzo Lomonaco, Davide Maltoni (2017)CoRL
@lotter2017deeparticle
Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning
William Lotter, Gabriel Kreiman, David Cox (2017)ICLR
@cao2017hashnetarticle
HashNet: Deep Learning to Hash by Continuation
Zhangjie Cao, Mingsheng Long, Jianmin Wang, et al. (2017)ICCV
@li2017learningarticle
Learning without Forgetting
Zhizhong Li, Derek Hoiem (2017)IEEE TPAMI
@gittens2016revisitingarticle
Revisiting the Nystrom Method for Improved Large-Scale Machine Learning
Alex Gittens, Michael W. Mahoney (2016)JMLR
@rusu2016progressivearticle
Progressive Neural Networks
Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, et al. (2016)arXiv
@poole2016exponentialarticle
Exponential Expressivity in Deep Neural Networks through Transient Chaos
Ben Poole, Subhaneil Lahiri, Maithra Raghu, et al. (2016)NeurIPS
@finn2016unsupervisedarticle
Unsupervised Learning for Physical Interaction through Video Prediction
Chelsea Finn, Ian Goodfellow, Sergey Levine (2016)NeurIPS
@silver2016masteringarticle
Mastering the Game of Go with Deep Neural Networks and Tree Search
David Silver, Aja Huang, Chris J. Maddison, et al. (2016)Nature
@kumaran2016whatarticle
What Learning Systems Do Intelligent Agents Need? Complementary Learning Systems Theory Updated
Dharshan Kumaran, Demis Hassabis, James L. McClelland (2016)Trends in Cognitive Sciences
@yu2016orthogonalarticle
Orthogonal Random Features
Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, et al. (2016)NeurIPS
@huang2016stochastic_deptharticle
Deep Networks with Stochastic Depth
Gao Huang, Yu Sun, Zhuang Liu, et al. (2016)ECCV
@huang2016deeparticle
Deep Networks with Stochastic Depth
Gao Huang, Yu Sun, Zhuang Liu, et al. (2016)ECCV
@liu2016deeparticle
Deep Supervised Hashing for Fast Image Retrieval
Haomiao Liu, Ruiping Wang, Shiguang Shan, et al. (2016)CVPR
@jung2016lessarticle
Less-Forgetting Learning in Deep Neural Networks
Heechul Jung, Jeongwoo Ju, Minju Jung, et al. (2016)arXiv
@he2016deepinproceedings
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al. (2016)CVPR
@he2016resnetarticle
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al. (2016)CVPR
@mathieu2016deeparticle
Deep Multi-Scale Video Prediction Beyond Mean Square Error
Michael Mathieu, Camille Couprie, Yann LeCun (2016)ICLR
@hardt2016trainarticle
Train faster, generalize better: Stability of stochastic gradient descent
Moritz Hardt, Ben Recht, Yoram Singer (2016)ICML
@alon2016probabilisticbook
The Probabilistic Method
Noga Alon, Joel H. Spencer (2016)Wiley
@battaglia2016interactionarticle
Interaction Networks for Learning about Objects, Relations and Physics
Peter W. Battaglia, Razvan Pascanu, Matthew Lai, et al. (2016)NeurIPS
@mallat2016understandingarticle
Understanding Deep Convolutional Networks
Stephane Mallat (2016)Phil. Trans. R. Soc. A
@chen2016trainingarticle
Training Deep Nets with Sublinear Memory Cost
Tianqi Chen, Bing Xu, Chiyuan Zhang, et al. (2016)arXiv
@gal2016dropoutarticle
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Yarin Gal, Zoubin Ghahramani (2016)ICML
@andoni2015practicalarticle
Practical and Optimal LSH for Angular Distance
Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, et al. (2015)NeurIPS
@andoni2015optimalarticle
Optimal Data-Dependent Hashing for Approximate Near Neighbors
Alexandr Andoni, Ilya Razenshteyn (2015)STOC
@kingma2015adamarticle
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba (2015)ICLR
@hinton2015distillingarticle
Distilling the Knowledge in a Neural Network
Geoffrey Hinton, Oriol Vinyals, Jeff Dean (2015)NeurIPS Workshop
@tropp2015introductionarticle
An Introduction to Matrix Concentration Inequalities
Joel A. Tropp (2015)Foundations and Trends in Machine Learning
@oh2015actionarticle
Action-Conditional Video Prediction using Deep Networks in Atari Games
Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, et al. (2015)NeurIPS
@chung2021rethinkingarticle
Rethinking Attention with Performers
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, et al. (2021)ICLR
@he2015initarticle
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al. (2015)ICCV
@he2015delvingarticle
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al. (2015)ICCV
@srivastava2015unsupervisedarticle
Unsupervised Learning of Video Representations using LSTMs
Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov (2015)ICML
@ronneberger2015unetarticle
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger, Philipp Fischer, Thomas Brox (2015)MICCAI
@ioffe2015batchnormarticle
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy (2015)ICML
@han2015learningarticle
Learning Both Weights and Connections for Efficient Neural Networks
Song Han, Jeff Pool, John Tran, et al. (2015)NeurIPS
@saxe2014exactarticle
Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks
Andrew M. Saxe, James L. McClelland, Surya Ganguli (2014)ICLR
@fan2014cuckooarticle
Cuckoo Filter: Practically Better Than Bloom
Bin Fan, Dave G. Andersen, Michael Kaminsky, et al. (2014)CoNEXT
@woodruff2014sketchingarticle
Sketching as a Tool for Numerical Linear Algebra
David P. Woodruff (2014)Foundations and Trends in Theoretical Computer Science
@kingma2014adamarticle
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba (2014)ICLR
@kingma2014vaearticle
Auto-Encoding Variational Bayes
Diederik P. Kingma, Max Welling (2014)ICLR
@kingma2014autoarticle
Auto-Encoding Variational Bayes
Diederik P. Kingma, Max Welling (2014)ICLR
@goodfellow2014generativearticle
Generative Adversarial Nets
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, et al. (2014)NeurIPS
@anden2014deeparticle
Deep Scattering Spectrum
Joakim Anden, Stephane Mallat (2014)IEEE Transactions on Signal Processing
@cho2014learningarticle
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, et al. (2014)EMNLP
@srivastava2014dropoutarticle
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, et al. (2014)JMLR
@liberty2013simplearticle
Simple and Deterministic Matrix Sketching
Edo Liberty (2013)KDD
@bruna2013invariantarticle
Invariant Scattering Convolution Networks
Joan Bruna, Stephane Mallat (2013)IEEE TPAMI
@wan2013dropconnectarticle
Regularization of Neural Networks using DropConnect
Li Wan, Matthew Zeiler, Sixin Zhang, et al. (2013)ICML
@mermillod2013stabilityarticle
The Stability-Plasticity Dilemma: Investigating the Continuum from Catastrophic Forgetting to Age-Limited Learning Effects
Martial Mermillod, Aurélia Bugaiska, Patrick Bonin (2013)Frontiers in Psychology
@le2013fastfoodarticle
Fastfood - Approximating Kernel Expansions in Loglinear Time
Quoc Le, Tamas Sarlos, Alexander Smola (2013)ICML
@wager2013dropoutarticle
Dropout Training as Adaptive Regularization
Stefan Wager, Sida Wang, Percy Liang (2013)NeurIPS
@todorov2012mujocoarticle
MuJoCo: A physics engine for model-based control
Emanuel Todorov, Tom Erez, Yuval Tassa (2012)IROS
@mallat2012grouparticle
Group Invariant Scattering
Stephane Mallat (2012)Communications on Pure and Applied Mathematics
@recht2011simplerarticle
A Simpler Approach to Matrix Completion
Benjamin Recht (2011)JMLR
@deisenroth2011pilcoarticle
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
Marc Peter Deisenroth, Carl Edward Rasmussen (2011)ICML
@halko2011findingarticle
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions
Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp (2011)SIAM Review
@glorot2010initarticle
Understanding the Difficulty of Training Deep Feedforward Neural Networks
Xavier Glorot, Yoshua Bengio (2010)AISTATS
@glorot2010understandingarticle
Understanding the Difficulty of Training Deep Feedforward Neural Networks
Xavier Glorot, Yoshua Bengio (2010)AISTATS
@rahimi2009weightedarticle
Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning
Ali Rahimi, Benjamin Recht (2009)NeurIPS
@candes2009exactarticle
Exact Matrix Completion via Convex Optimization
Emmanuel J. Candes, Benjamin Recht (2009)Foundations of Computational Mathematics
@pearl2009causalitybook
Causality: Models, Reasoning, and Inference
Judea Pearl (2009)Cambridge University Press
@weinberger2009featurearticle
Feature Hashing for Large Scale Multitask Learning
Kilian Weinberger, Anirban Dasgupta, John Langford, et al. (2009)ICML
@mahoney2009curarticle
CUR Matrix Decompositions for Improved Data Analysis
Michael W. Mahoney, Petros Drineas (2009)PNAS
@robertson2009probabilisticarticle
The Probabilistic Relevance Framework: BM25 and Beyond
Stephen Robertson, Hugo Zaragoza (2009)Foundations and Trends in Information Retrieval
@bengio2009curriculumarticle
Curriculum Learning
Yoshua Bengio, Jerome Louradour, Ronan Collobert, et al. (2009)ICML
@matousek2008variantsarticle
On Variants of the Johnson-Lindenstrauss Lemma
Jiří Matoušek (2008)Random Structures & Algorithms
@bottou2008tradeoffsarticle
The Tradeoffs of Large Scale Learning
Leon Bottou, Olivier Bousquet (2008)NeurIPS
@mallat2008waveletbook
A Wavelet Tour of Signal Processing
Stephane Mallat (2008)Academic Press
@abraham2008metaplasticityarticle
Metaplasticity: Tuning Synapses and Networks for Plasticity
Wickliffe C. Abraham (2008)Nature Reviews Neuroscience
@rahimi2007randomarticle
Random Features for Large-Scale Kernel Machines
Ali Rahimi, Benjamin Recht (2007)NeurIPS
@ji2007coordinatedarticle
Coordinated Memory Replay in the Visual Cortex and Hippocampus During Sleep
Daoyun Ji, Matthew A. Wilson (2007)Nature Neuroscience
@flajolet2007hyperloglogarticle
HyperLogLog: The Analysis of a Near-Optimal Cardinality Estimation Algorithm
Philippe Flajolet, Eric Fusy, Olivier Gandouet, et al. (2007)DMTCS Proceedings
@andoni2006neararticle
Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions
Alexandr Andoni, Piotr Indyk (2006)FOCS
@donoho2006compressedarticle
Compressed Sensing
David L. Donoho (2006)IEEE Transactions on Information Theory
@candes2006robustarticle
Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information
Emmanuel J. Candes, Justin Romberg, Terence Tao (2006)IEEE Transactions on Information Theory
@ailon2006fastarticle
The Fast Johnson-Lindenstrauss Transform and Approximate Nearest Neighbors
Nir Ailon, Bernard Chazelle (2006)STOC
@efraimidis2006weightedarticle
Weighted Random Sampling with a Reservoir
Pavlos S. Efraimidis, Paul G. Spirakis (2006)Information Processing Letters
@ferguson2006optimalbook
Optimal Stopping and Applications
Thomas S. Ferguson (2006)Mathematics Department, UCLA
@cormode2005improvedarticle
An Improved Data Stream Summary: The Count-Min Sketch and Its Applications
Graham Cormode, S. Muthukrishnan (2005)Journal of Algorithms
@charikar2004findingarticle
Finding Frequent Items in Data Streams
Moses Charikar, Kevin Chen, Martin Farach-Colton (2004)Theoretical Computer Science
@achlioptas2003databasearticle
Database-friendly Random Projections: Johnson-Lindenstrauss with Binary Coins
Dimitris Achlioptas (2003)Journal of Computer and System Sciences
@dasgupta2003elementaryarticle
An Elementary Proof of a Theorem of Johnson and Lindenstrauss
Sanjoy Dasgupta, Anupam Gupta (2003)Random Structures & Algorithms
@charikar2002similarityarticle
Similarity Estimation Techniques from Rounding Algorithms
Moses S. Charikar (2002)STOC
@haykin2002adaptivebook
Adaptive Filter Theory
Simon Haykin (2002)Prentice Hall
@williams2001nystromarticle
Using the Nystrom Method to Speed Up Kernel Machines
Christopher K.I. Williams, Matthias Seeger (2001)NeurIPS
@french1999catastrophicarticle
Catastrophic Forgetting in Connectionist Networks
Robert M. French (1999)Trends in Cognitive Sciences
@indyk1998approximatearticle
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
Piotr Indyk, Rajeev Motwani (1998)STOC
@broder1997resemblancearticle
On the Resemblance and Containment of Documents
Andrei Z. Broder (1997)SEQUENCES
@karger1997consistentarticle
Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
David Karger, Eric Lehman, Tom Leighton (1997)STOC
@mcclelland1995therearticle
Why There Are Complementary Learning Systems in the Hippocampus and Neocortex
James L. McClelland, Bruce L. McNaughton, Randall C. O'Reilly (1995)Psychological Review
@motwani1995randomizedbook
Randomized Algorithms
Rajeev Motwani, Prabhakar Raghavan (1995)Cambridge University Press
@robertson1995okapiarticle
Okapi at TREC-3
Stephen E. Robertson, Steve Walker, Susan Jones, et al. (1995)TREC
@jordan1994hierarchicalarticle
Hierarchical Mixtures of Experts and the EM Algorithm
Michael I. Jordan, Robert A. Jacobs (1994)Neural Computation
@daubechies1992tenbook
Ten Lectures on Wavelets
Ingrid Daubechies (1992)SIAM
@hornik1991approximationarticle
Approximation Capabilities of Multilayer Feedforward Networks
Kurt Hornik (1991)Neural Networks
@jacobs1991adaptivearticle
Adaptive Mixtures of Local Experts
Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, et al. (1991)Neural Computation
@ratcliff1990connectionistarticle
Connectionist Models of Recognition Memory: Constraints Imposed by Learning and Forgetting Functions
Roger Ratcliff (1990)Psychological Review
@cybenko1989approximationarticle
Approximation by Superpositions of a Sigmoidal Function
George Cybenko (1989)Mathematics of Control, Signals and Systems
@mccloskey1989catastrophicarticle
Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem
Michael McCloskey, Neal J. Cohen (1989)Psychology of Learning and Motivation
@mallat1989theoryarticle
A Theory for Multiresolution Signal Decomposition: The Wavelet Representation
Stephane Mallat (1989)IEEE Transactions on Pattern Analysis and Machine Intelligence
@vitter1985randomarticle
Random Sampling with a Reservoir
Jeffrey S. Vitter (1985)ACM Transactions on Mathematical Software
@valiant1984pacarticle
A Theory of the Learnable
Leslie G. Valiant (1984)Communications of the ACM
@johnson1984extensionsarticle
Extensions of Lipschitz Mappings into a Hilbert Space
William B. Johnson, Joram Lindenstrauss (1984)Contemporary Mathematics
@johnsonlaird1983mentalbook
Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness
Philip N. Johnson-Laird (1983)Harvard University Press
@oja1982simplifiedarticle
Simplified Neuron Model as a Principal Component Analyzer
Erkki Oja (1982)Journal of Mathematical Biology
@grossberg1980howarticle
How Does a Brain Build a Cognitive Code?
Stephen Grossberg (1980)Psychological Review
@vapnik1971uniformarticle
On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities
Vladimir N. Vapnik, Alexey Ya. Chervonenkis (1971)Theory of Probability and Its Applications
@bloom1970spacearticle
Space/Time Trade-offs in Hash Coding with Allowable Errors
Burton H. Bloom (1970)Communications of the ACM
@robbins1951stochasticarticle
A Stochastic Approximation Method
Herbert Robbins, Sutton Monro (1951)Annals of Mathematical Statistics
@kullback1951informationarticle
On Information and Sufficiency
Solomon Kullback, Richard A. Leibler (1951)Annals of Mathematical Statistics
@craik1943naturebook
The Nature of Explanation
Kenneth J. W. Craik (1943)Cambridge University Press
@turing1936computablearticle
On Computable Numbers, with an Application to the Entscheidungsproblem
Alan M. Turing (1936)Proceedings of the London Mathematical Society
@kolmogorov1933foundationsbook
Foundations of the Theory of Probability
Andrey N. Kolmogorov (1933)Julius Springer