I don't like this article. In particular, this section is especially poor:
> Block 1: We couldn't calculate fast enough. Solution: The GPU.
> Block 2: We couldn't train deep enough. Solution: Transformer architecture.
> Block 3: We can't "think" fast enough. Solution: Groq’s LPU.
#2 is outright wrong. Deep networks were made viable by residual connections and their refinements, not by the Transformer architecture. #3 is also incorrect: "think" here just means compute, so it's a restatement of #1.
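For context, the residual trick is nothing more than adding a block's input back onto its output, which keeps an identity path for gradients and is what let networks go very deep. A minimal PyTorch sketch (the module and sizes are illustrative, not from the article):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): the identity shortcut lets gradients flow
    straight through, which is what made very deep nets trainable."""
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # skip connection around the transform

# Stacking 50 of these stays trainable because the identity path
# preserves the gradient signal; drop the `x +` and depth hurts.
net = nn.Sequential(*[ResidualBlock(64) for _ in range(50)])
x = torch.randn(8, 64)
print(net(x).shape)  # torch.Size([8, 64])
```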
Also, the "limestone" analogy is pretty weak.