Authors: Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao
Year: 2024
Venue: ICML
Type: article
URL: https://arxiv.org/abs/2401.10774
arXiv: 2401.10774
Cite as: [@cai2024medusa]
No raw files yet. Run node scripts/fetch-bibliography-raw.mjs --only cai2024medusa to populate, or drop files into raw/bibliography/cai2024medusa/.
@inproceedings{cai2024medusa,
title = {Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads},
author = {Tianle Cai and Yuhong Li and Zhengyang Geng and Hongwu Peng and Jason D. Lee and Deming Chen and Tri Dao},
year = {2024},
booktitle = {ICML},
url = {https://arxiv.org/abs/2401.10774}
}
No notes yet. Create notes/cai2024medusa.md to add notes.