The Pivot to "Inference Sovereignty"
NVIDIA is shifting focus from raw training power to deterministic inference to solve the "Stochastic Wall"—the unpredictable latency jitter in current GPUs that hampers real-time AI agents.
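The "Stochastic Wall" is easiest to see in percentile terms. The minimal sketch below uses synthetic per-token decode latencies (stand-ins, not NVIDIA measurements) to show how the problem is usually quantified: a modest-looking median can hide a large p95/p50 jitter ratio.

```python
import random
import statistics

# Synthetic per-token decode latencies (ms) from a heavy-tailed distribution,
# standing in for real GPU measurements under contended scheduling.
random.seed(0)
latencies_ms = [random.lognormvariate(1.0, 0.6) for _ in range(10_000)]

cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

print(f"p50 = {p50:.2f} ms   p95 = {p95:.2f} ms   p99 = {p99:.2f} ms")
print(f"jitter ratio p95/p50 = {p95 / p50:.2f}x")
# A mean-focused benchmark hides this tail, yet an interactive agent that
# chains many model calls experiences it on nearly every multi-step task.
```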
Feynman Architecture (1.6nm): Utilizing TSMC’s A16 node with Backside Power Delivery (Super Power Rail) to achieve a projected 100x efficiency gain over Blackwell.
LPX Cores: Integration of Groq-derived deterministic logic to provide guaranteed p95 latency for "Chain of Thought" reasoning (see the chained-call sketch after this list).
Storage Next: Collaboration on 100M IOPS SSDs that function as a peer to GPU memory, aimed at breaking the "Memory Wall" for million-token contexts (see the capacity arithmetic after this list).
Vertical Fusion: 3D logic-on-logic stacking that places SRAM-rich chiplets directly over compute dies to cut token-generation energy costs (see the energy back-of-envelope after this list).
Supply Chain: Rumors of a strategic shift to Intel Foundry (18A) for I/O sourcing, diversifying away from sole reliance on TSMC.
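Why a guaranteed tail matters for "Chain of Thought": an agent that chains K sequential model calls only meets its deadline if every call lands inside its budget, so without a per-call guarantee the confidence decays geometrically. The sketch below assumes independent calls, a simplification made purely for illustration.

```python
# Probability that an agent pipeline of K sequential model calls stays inside
# its deadline when each call independently meets its latency budget 95% of
# the time. A deterministic hard bound would keep this at 100% for any K.
for k in (1, 5, 10, 20):
    print(f"{k:2d} chained calls: P(all calls within budget) = {0.95 ** k:.1%}")
```

At 20 chained calls the pipeline finishes on budget only about 36% of the time, which is the practical argument for deterministic execution over statistical SLOs.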
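The capacity arithmetic below shows why million-token contexts outgrow HBM and why a 100M-IOPS device starts to look like a memory peer rather than a disk. The transformer dimensions are assumptions chosen to resemble a 70B-class model with grouped-query attention, and the 512-byte page size is likewise illustrative.

```python
# Assumed model shape (hypothetical, roughly 70B-class with GQA).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
BYTES_PER_VALUE = 2                      # fp16/bf16
CONTEXT_TOKENS = 1_000_000

# KV cache: one K vector and one V vector per layer per token.
kv_bytes_per_token = LAYERS * KV_HEADS * HEAD_DIM * 2 * BYTES_PER_VALUE
kv_total_gb = kv_bytes_per_token * CONTEXT_TOKENS / 1e9
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache at 1M tokens: {kv_total_gb:.0f} GB")

# Throughput implied by the rumored IOPS figure at a small page size.
IOPS, PAGE_BYTES = 100_000_000, 512
print(f"100M IOPS x 512 B pages = {IOPS * PAGE_BYTES / 1e9:.0f} GB/s")
```

Roughly 330 GB of KV cache for a single request dwarfs any one GPU's HBM, while 100M small-page IOPS supplies tens of GB/s of random-access reads, which is what paging a cold context back in would demand.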
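The energy logic behind Vertical Fusion is that at decode time data movement, not arithmetic, dominates the per-token bill. The back-of-envelope below uses rough public ballpark energy-per-bit figures (not NVIDIA numbers) and naively prices the entire weight stream at a single tier's cost, which overstates what stacked SRAM could actually hold; it is meant only to show the order-of-magnitude gap per bit moved.

```python
# Illustrative energy-per-token arithmetic for memory-bound, batch-1 decoding,
# where every weight is streamed once per generated token.
PARAMS = 70e9                  # assumed model size (hypothetical)
BYTES_PER_PARAM = 2            # fp16/bf16
HBM_PJ_PER_BIT = 4.0           # ballpark off-die DRAM access energy
STACKED_SRAM_PJ_PER_BIT = 0.5  # ballpark 3D-stacked SRAM access energy

bits_per_token = PARAMS * BYTES_PER_PARAM * 8
for tier, pj_per_bit in (("HBM", HBM_PJ_PER_BIT),
                         ("stacked SRAM", STACKED_SRAM_PJ_PER_BIT)):
    joules = bits_per_token * pj_per_bit * 1e-12
    print(f"{tier:12s}: ~{joules:.2f} J of data movement per token")
```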
The Pivot to "Inference Sovereignty" NVIDIA is shifting focus from raw training power to deterministic inference to solve the "Stochastic Wall"—the unpredictable latency jitter in current GPUs that hampers real-time AI agents.
Feynman Architecture (1.6nm): Utilizing TSMC’s A16 node with Backside Power Delivery (Super Power Rail) to achieve a projected 100x efficiency gain over Blackwell.
LPX Cores: Integration of Groq-derived deterministic logic to provide guaranteed p95 latency for "Chain of Thought" reasoning.
Storage Next: Collaboration on 100M IOPS SSDs that function as a peer to GPU memory, eliminating the "Memory Wall" for million-token contexts.
Vertical Fusion: 3D logic-on-logic stacking that places SRAM-rich chiplets directly over compute dies to minimize token-generation energy costs.
Supply Chain: Rumors of a strategic shift to Intel Foundry (18A) for I/O sourcing to diversify away from total TSMC reliance.