D4: Improving LLM Pretraining via Document De-Duplication and Diversification

Authors: Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari S. Morcos

Year: 2024

Venue: NeurIPS

Type: article

URL: https://arxiv.org/abs/2308.12284

arXiv: 2308.12284

Cite as: [@tirumala2024d4]

Raw Files

No raw files yet. Run node scripts/fetch-bibliography-raw.mjs --only tirumala2024d4 to populate, or drop files into raw/bibliography/tirumala2024d4/.

BibTeX

@inproceedings{tirumala2024d4,
  title = {{D4}: Improving {LLM} Pretraining via Document De-Duplication and Diversification},
  author = {Kushal Tirumala and Daniel Simig and Armen Aghajanyan and Ari S. Morcos},
  year = {2024},
  booktitle = {NeurIPS},
  url = {https://arxiv.org/abs/2308.12284}
}

Notes

No notes yet. Create notes/tirumala2024d4.md to add notes.