Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism

(mlsys.wuklab.io)

2 points | by matt_d 10 hours ago

No comments yet.