Efficient Memory Management for Large Language Model Serving with PagedAttention

Authors: Woosuk Kwon, Zhuohan Li, Siyuan Zhuang

Year: 2023

Venue: SOSP

Type: article

URL: https://arxiv.org/abs/2309.06180

arXiv: 2309.06180

Cite as: [@kwon2023efficient]

Raw Files

No raw files yet. Run node scripts/fetch-bibliography-raw.mjs --only kwon2023efficient to populate, or drop files into raw/bibliography/kwon2023efficient/.

BibTeX

@inproceedings{kwon2023efficient,
  title = {Efficient Memory Management for Large Language Model Serving with PagedAttention},
  author = {Woosuk Kwon and Zhuohan Li and Siyuan Zhuang},
  year = {2023},
  booktitle = {SOSP},
  url = {https://arxiv.org/abs/2309.06180}
}

Notes

No notes yet. Create notes/kwon2023efficient.md to add notes.