SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference

(supercomputing-system-ai-lab.github.io)

3 points | by matt_d 11 hours ago

No comments yet.