The End of GPU Scaling? Compute & The Agent Era — Tim Dettmers (Ai2) & Dan Fu (Together AI)
Podcast: The MAD Podcast with Matt Turck
Published On: Thu Jan 22 2026

Description:
Will AGI happen soon, or are we running into a wall? In this episode, I’m joined by Tim Dettmers (Assistant Professor at CMU; Research Scientist at the Allen Institute for AI) and Dan Fu (Assistant Professor at UC San Diego; VP of Kernels at Together AI) to unpack two opposing frameworks from their essays: “Why AGI Will Not Happen” versus “Yes, AGI Will Happen.” Tim argues progress is constrained by physical realities like memory movement and the von Neumann bottleneck; Dan argues we’re still leaving massive performance on the table through utilization, kernels, and systems, and that today’s models are lagging indicators of the newest hardware and clusters.

Then we get practical: agents and the “software singularity.” Dan says agents have already crossed a threshold even for “final boss” work like writing GPU kernels. Tim’s message is blunt: use agents or be left behind. Both emphasize that the leverage comes from how you use them; Dan compares it to managing interns: clear context, task decomposition, and domain judgment, not blind trust.

We close with what to watch in 2026: hardware diversification, the shift toward efficient, specialized small models, and architecture evolution beyond classic Transformers, including state-space approaches already showing up in real systems.

Sources:
Why AGI Will Not Happen - https://timdettmers.com/2025/12/10/why-agi-will-not-happen/
Use Agents or Be Left Behind? A Personal Guide to Automating Your Own Work - https://timdettmers.com/2026/01/13/use-agents-or-be-left-behind/
Yes, AGI Can Happen – A Computational Perspective - https://danfu.org/notes/agi/

The Allen Institute for Artificial Intelligence
Website - https://allenai.org
X/Twitter - https://x.com/allen_ai

Together AI
Website - https://www.together.ai
X/Twitter - https://x.com/togethercompute

Tim Dettmers
Blog - https://timdettmers.com
LinkedIn - https://www.linkedin.com/in/timdettmers/
X/Twitter - https://x.com/Tim_Dettmers

Dan Fu
Blog - https://danfu.org
LinkedIn - https://www.linkedin.com/in/danfu09/
X/Twitter - https://x.com/realDanFu

FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)
Blog - https://mattturck.com
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck

(00:00) – Intro
(01:06) – Two essays, two frameworks on AGI
(01:34) – Tim’s background: quantization, QLoRA, efficient deep learning
(02:25) – Dan’s background: FlashAttention, kernels, alternative architectures
(03:38) – Defining AGI: what does it mean in practice?
(08:20) – Tim’s case: computation is physical, diminishing returns, memory movement
(11:29) – “GPUs won’t improve meaningfully”: the core claim and why
(16:16) – Dan’s response: utilization headroom (MFU) + “models are lagging indicators”
(22:50) – Pre-training vs post-training (and why product feedback matters)
(25:30) – Convergence: usefulness + diffusion (where impact actually comes from)
(29:50) – Multi-hardware future: NVIDIA, AMD, TPUs, Cerebras, inference chips
(32:16) – Agents: did the “switch flip” yet?
(33:19) – Dan: agents crossed the threshold (kernels as the “final boss”)
(34:51) – Tim: “use agents or be left behind” + beyond coding
(36:58) – “90% of code and text should be written by agents” (how to do it responsibly)
(39:11) – Practical automation for non-coders: what to build and how to start
(43:52) – Dan: managing agents like junior teammates (tools, guardrails, leverage)
(48:14) – Education and training: learning in an agent world
(52:44) – What Tim is building next (open-source coding agent; private repo specialization)
(54:44) – What Dan is building next (inference efficiency, cost, performance)
(55:58) – Mega-kernels + Together Atlas (speculative decoding + adaptive speedups)
(58:19) – Predictions for 2026: small models, open-source, hardware, modalities
(1:02:02) – Beyond transformers: state-space and architecture diversity
(1:03:34) – Wrap