Zeyu Yang - 2 posts

Evolving Internet with Swarm Intelligence

May 30, 2026 · 9 min read

PhD student at Rice University

Today's web is flat text behind URLs. Search agents crawl it, extract what they need, and leave nothing behind. We propose a different model: every document carries structured metadata (entities, paraphrases, questions), and every agent visit enriches that metadata. The internet evolves through use.

This idea sits next to several existing lines of work, and it helps to say up front how it differs. The Semantic Web and RDF annotation programs also attach structured metadata to documents, but they depend on manual or schema-driven curation rather than on annotations produced as a byproduct of agent use. Knowledge-graph construction from text (entity and relation extraction) builds graphs once, offline, rather than treating the graph as a living artifact that every visit updates. Self-improving and self-play retrieval systems close a training loop on a fixed corpus; agent-memory and write-back systems persist what an agent learns, but usually in a private scratchpad rather than back into a shared, re-indexable web. The distinguishing claim here is the combination: a shared document graph that is enriched in place by the same agents that consume it, so the substrate and the training signal co-evolve.
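The "enriched in place" idea can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the system the post proposes: the `Metadata` and `Document` classes and the `record_visit` helper are hypothetical names, and the three metadata fields simply mirror the entities, paraphrases, and questions mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Structured annotations carried alongside a document's text (hypothetical)."""
    entities: set = field(default_factory=set)
    paraphrases: set = field(default_factory=set)
    questions: set = field(default_factory=set)

    def size(self) -> int:
        return len(self.entities) + len(self.paraphrases) + len(self.questions)


@dataclass
class Document:
    """A web document plus the shared metadata that visits accumulate."""
    url: str
    text: str
    metadata: Metadata = field(default_factory=Metadata)
    visits: int = 0


def record_visit(doc, entities=(), paraphrases=(), questions=()) -> int:
    """Merge one agent's annotations into the document in place.

    Returns how many annotations were new, so a repeat visit that only
    rediscovers known facts contributes nothing.
    """
    before = doc.metadata.size()
    doc.metadata.entities.update(entities)
    doc.metadata.paraphrases.update(paraphrases)
    doc.metadata.questions.update(questions)
    doc.visits += 1
    return doc.metadata.size() - before


doc = Document(url="https://example.org/doc", text="Rice University is in Houston.")
first = record_visit(
    doc,
    entities={"Rice University", "Houston"},
    questions={"Where is Rice University?"},
)
second = record_visit(
    doc,
    entities={"Houston"},  # already known, adds nothing
    paraphrases={"Houston is home to Rice University."},
)
print(first, second, doc.visits)
```

Using sets keeps the merge idempotent, which is one simple way to let many agents write to the same document without duplicating annotations. A real system would also need provenance and conflict handling, which this sketch leaves out.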

To Compress or Not? Pushing the Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration

October 3, 2025 · 6 min read

Zeyu Yang, Tianyi Zhang, Jianwen Xie, Chuan Li, Zhaozhuo Xu, Anshumali Shrivastava

Accepted at the International Conference on Learning Representations (ICLR), 2026

Disclosure: this post describes my own work, so treat the framing as that of an author rather than a neutral third party.

Modern GenAI models are enormous. DeepSeek-R1 alone has 671 billion parameters. Even after converting to FP8, serving these models eats up massive amounts of GPU memory and bandwidth. The standard response is lossy quantization: throw away some precision and hope the outputs don't degrade too much. But what if you didn't have to lose anything at all?

In this work, we show that you can losslessly compress FP8 model weights by exploiting a simple observation about how neural networks store information. The result is ECF8, a format that cuts memory use by up to 26.9% and raises inference throughput by up to 177.1%, while producing outputs that are bit-for-bit identical to the original model's.
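The phrase "exponent concentration" in the title suggests that the exponent bits of weights cluster on a few values. The sketch below is not the ECF8 algorithm. It only illustrates, on synthetic Gaussian values standing in for weights (an assumption, not real model data), how one might measure that concentration as Shannon entropy. The 4-bit exponent width assumes an E4M3-style FP8 layout, which the post does not specify, and `exponent_entropy_bits` is a hypothetical helper.

```python
import math
import random
from collections import Counter

EXPONENT_BITS = 4  # assumption: E4M3-style FP8 with a 4-bit exponent field


def exponent_entropy_bits(values):
    """Shannon entropy, in bits, of the binary exponents of nonzero values.

    math.frexp(v) returns (mantissa, exponent) with v == mantissa * 2**exponent,
    so the exponent plays the role of the float's exponent field here.
    """
    exps = [math.frexp(v)[1] for v in values if v != 0.0]
    counts = Counter(exps)
    n = len(exps)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


random.seed(0)
# Synthetic stand-in for a weight tensor: zero-mean Gaussian, small scale.
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

entropy = exponent_entropy_bits(weights)
print(f"exponent entropy: {entropy:.2f} bits vs {EXPONENT_BITS} bits stored")
```

If the measured entropy falls well below the stored exponent width, an entropy coder could in principle shrink the exponent field without losing information, which is the kind of headroom a lossless format needs. The exponents here are not clamped to the FP8 range, so the number is only indicative.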