ML Systems

Building and deploying ML models at scale means constantly trading off compute cost, latency, and memory. This chapter covers the systems techniques that manage those trade-offs: distributed training, mixed precision, inference optimization, and profiling.