MoE inference optimizations: 15% lower expert load by request reordering

(blog.doubleword.ai)

3 points | by mezark 11 hours ago

No comments yet.