Why LLM decode is memory-bound, not compute-bound

(github.com)

4 points | by harshuljain13 5 hours ago

1 comment