Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Authors: Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao

Year: 2024

Venue: ICML

Type: article

URL: https://arxiv.org/abs/2401.10774

arXiv: 2401.10774

Cite as: [@cai2024medusa]

Raw Files

No raw files yet. Run node scripts/fetch-bibliography-raw.mjs --only cai2024medusa to populate, or drop files into raw/bibliography/cai2024medusa/.

BibTeX

@inproceedings{cai2024medusa,
  title = {Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads},
  author = {Tianle Cai and Yuhong Li and Zhengyang Geng and Hongwu Peng and Jason D. Lee and Deming Chen and Tri Dao},
  year = {2024},
  booktitle = {ICML},
  url = {https://arxiv.org/abs/2401.10774}
}

Notes

No notes yet. Create notes/cai2024medusa.md to add notes.