Austin and Vik discuss Credo's acquisition of Dust Photonics, XPO as the new standard for scale-out (maybe instead of CPO?), and some thoughts about Nuvacore entering the CPU scene for agentic AI.

Gavin Baker's tweet: https://x.com/GavinSBaker/status/2044410644301046031?s=20
Vik's Substack: https://www.viksnewsletter.com
Austin's Substack: https://www.chipstrat.com

Chapters
00:00 Introduction to the Semiconductor Landscape
02:49 The Rise of Nuvacore and CPU Innovations
05:27 The Demand for CPUs in the AI Era
07:59 Photonics: The Next Frontier in Semiconductors
10:26 Credo's Acquisition of Dust Photonics
13:12 Vertical Integration in Semiconductor Companies
15:15 The Future of Copper and Optical Technologies
20:28 The Evolution of AI Training Models
25:28 Innovations in Optical Interconnects
31:10 The Future of Data Center Connectivity
36:56 Strategic Implications in the Optical Ecosystem
In this episode, Austin and Vik discuss whether Intel is finally back, with CPU partnerships with Google and heterogeneous inference with SambaNova, while its market cap soars above $300B. Vik tries to get his OpenClaw instance to dream every night.

Chapters
00:00 Anthropic's New Direction: Chip Development
02:30 Navigating Subscription Changes and Token Costs
05:25 Exploring Alternative AI Models
08:10 The Economics of AI: Rent vs. Buy
10:56 Intel's Resurgence and Market Dynamics
15:23 Intel's Strategic Partnerships and Market Positioning
19:37 The Role of IPUs in Modern Computing
25:08 Coexistence of x86 and ARM Architectures
29:55 Innovations in Chip Architecture and Future Prospects
Reiner Pope is the co-founder and CEO of MatX, the startup building chips designed from first principles for LLMs. Before MatX, Reiner was on the Google Brain team training LLMs, and his co-founder Mike Gunter was on the TPU team. They left Google one week before ChatGPT was released.

A counterintuitive throughput insight from the conversation:

“Low latency means small batch sizes. That is just Little’s law. Memory occupancy in HBM is proportional to batch size. So you can actually fit longer contexts than you could if the latency were larger. Low latency is not just a usability win, it improves throughput.”

We get into:
• The hybrid SRAM + HBM bet, and why pipeline parallelism finally works
• Overcoming the CUDA moat
• Why frontier labs are willing to bet on an AI ASIC startup
• Memory-bandwidth-efficient attention, numerics, and what MatX publishes (and what it does not)
• Why 95% of model-side news is noise for chip design
• Why sparse MoE drives MatX to “the most interconnect of any announced product”
• How MatX uses AI for its own chip design
• The biggest challenges ahead

Chapters:
00:00 “We left Google one week before ChatGPT”
00:24 Intro: who is MatX
01:17 Origin story: leaving Google for LLM chips
02:21 GPT-3 and the “too expensive” problem
04:25 Why buy hardware that is not a GPU
05:52 Overcoming the CUDA moat
08:46 Early investors
09:35 The name MatX
09:59 The chip: matrix multiply + hybrid SRAM/HBM
12:11 Why pipeline parallelism finally works
14:22 Reading papers and Google going dark
15:20 Research agenda: attention and numerics
17:06 Five specs and meeting customers where they are
19:24 Why frontier labs are the natural first customer
20:32 Workloads: training, prefill, decode
22:18 Little’s law and the throughput case for low latency
24:29 Interconnect and MoE topology
26:35 Inside the team: 100 people, full stack
28:32 Agentic AI: 95% noise for hardware
30:35 KV cache sizing in an agentic world
32:11 How MatX uses AI for chip design (Verilog + BlueSpec)
34:23 Go to market: proving credibility under NDA
35:12 Porting effort for frontier labs
36:34 Biggest skepticism: manufacturing at gigawatt scale
37:32 Hiring plug

Austin Lyons @ Chipstrat: https://www.chipstrat.com
Vik Sekar @ Vik's Newsletter: https://www.viksnewsletter.com/
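The Little's law argument from the quote can be checked with back-of-envelope arithmetic. A minimal Python sketch, with every number (throughput target, latencies, per-token KV bytes, HBM budget) chosen purely for illustration rather than taken from MatX:

```python
# Little's law: in-flight requests (batch) = throughput x latency.
# KV-cache occupancy in HBM is proportional to batch size, so at a
# fixed request throughput, lower latency frees HBM for longer contexts.

def batch_size(throughput_rps: float, latency_s: float) -> float:
    """Average number of in-flight requests (Little's law: L = lambda * W)."""
    return throughput_rps * latency_s

def max_context_tokens(hbm_bytes: float, batch: float, kv_bytes_per_token: float) -> float:
    """Longest context that fits if KV occupancy = batch * context * bytes/token."""
    return hbm_bytes / (batch * kv_bytes_per_token)

HBM = 80e9            # 80 GB of HBM reserved for KV cache (illustrative)
KV_PER_TOKEN = 320e3  # ~320 KB of KV cache per token (illustrative)
THROUGHPUT = 10.0     # requests per second, held constant

slow = batch_size(THROUGHPUT, latency_s=2.0)   # 20 requests in flight
fast = batch_size(THROUGHPUT, latency_s=0.5)   # 5 requests in flight

print(max_context_tokens(HBM, slow, KV_PER_TOKEN))  # -> 12500.0 tokens
print(max_context_tokens(HBM, fast, KV_PER_TOKEN))  # -> 50000.0 tokens
```

Same throughput, 4x lower latency, 4x longer context in the same HBM footprint — that is the sense in which low latency is more than a usability win.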
Intel Foundry just partnered with Elon Musk’s Terafab. What is Terafab anyway, why vertically integrated fabs make sense but the economics don’t (yet!), and what Intel is doing here (hint: no idea).

Then: OpenAI acquires TBPN for an estimated $100-300M. Not sure why, but the more interesting thing is the value of niche audiences when five companies control a trillion dollars in AI capex.

And finally, Citrini Research sent an analyst to the Strait of Hormuz with a Pelican case full of spy gear, $15K cash, and Cuban cigars. The most unhinged research trip in Substack history.

Austin Lyons — Chipstrat (https://chipstrat.com)
Vik Sekar — Vik's Newsletter (https://www.viksnewsletter.com)

Subscribe for weekly episodes on semiconductors, AI, infrastructure, and the business of chips.
In this episode, Austin and Vik analyze NVIDIA's $2 billion investment in Marvell NVLink Fusion, exploring its implications for AI infrastructure, interconnect protocols, and the broader chip ecosystem. They also discuss the current memory market surge, DRAM pricing, and Intel's strategic fab buyback, providing deep insights into industry trends and future directions.

On Substack
Vik: https://www.viksnewsletter.com/
Austin: https://www.chipstrat.com/

Chapters
00:00 NVIDIA's $2 Billion Investment in Marvell
20:11 The Memory Market Crisis
20:16 The Future of Memory Pricing and Consumer Impact
22:55 The Cycle of Supply and Demand in Memory
27:23 AI's Impact on Memory Demand
31:46 Long-Term Agreements and Market Stability
35:07 Intel's Strategic Fab Buyback
40:44 Monopoly Analogy: Intel's Market Strategy
In this episode, Austin and Vik analyze recent developments in GloFo patent lawsuits, the impact of TurboQuant on AI inference, and ARM's strategic move into silicon for agentic AI workloads.

Read Vik's substack: https://www.viksnewsletter.com
Read Austin's substack: https://www.chipstrat.com

Chapters
00:00 Patent Wars in Semiconductor Industry
07:14 Understanding TurboQuant and Its Implications
24:42 Innovations in Memory Management
28:00 The Rise of ARM AGI CPUs
32:56 Agentic AI and CPU Compatibility
39:54 Performance Metrics in Agentic AI
44:52 ARM's Market Timing and Challenges
Austin and Vik break down a packed week in semiconductors, covering GTC, OFC, and Micron earnings. The conversation kicks off with Jensen Huang's bold claim that engineers should spend $250K/year on AI tokens, and whether companies will buy tokens or token generators (i.e., on-prem hardware like the Dell Pro Max with GB300). They dig into the CapEx vs OpEx tradeoffs, data security concerns, and how sharing GPU resources might end up looking a lot like the old EDA license model.

Next up: Micron crushed earnings and appears to be designed into Vera Rubin for HBM4 — despite months of rumors saying otherwise. Austin and Vik unpack the nuance around HBM pin speeds, memory node base dies, and what Micron's massive new fab investments in Taiwan, Singapore, Idaho, and New York mean for the memory cycle.

The back half of the episode dives into optical interconnects for AI scale-up. A new industry consortium (OCI-MSA) has formed with Meta, Broadcom, NVIDIA, and OpenAI to standardize optical components. Vik explains why traditional indium phosphide lasers might be overkill for short-reach scale-up, and makes the case for micro LEDs — a "slow but wide" approach that could fill the gap between copper and conventional optics. They also touch on Credo's expanding product portfolio (and the infamous purple-to-orange cable saga), plus Lumentum's new VCSEL work for scale-up.

Vik - https://www.viksnewsletter.com/
Austin - https://www.chipstrat.com/

CHAPTERS
0:00 Intro & GTC/OFC Conference Overload
2:09 Jensen's $250K Token Budget Per Engineer
5:08 On-Prem Inference vs. Cloud Token Spending (Dell Pro Max, CapEx vs OpEx)
6:44 Sharing GPU Resources Like EDA Licenses
8:16 Data Security & On-Prem Privacy Concerns
9:53 Matthew Berman's Fine-Tuned Open Claw Agent
10:35 Vik Sets Up Open Claw on a Home Server
11:53 Always Be Clauden (ABC) – Managing Agents from Your Phone
13:34 Micron Earnings & HBM4 in Vera Rubin
16:39 HBM Pin Speeds & the Micron Design-In Debate
20:17 Micron's New Fab Investments & Memory Cycle Fears
23:49 Why AI Drives a Step Change in Memory Demand
26:30 Optical Compute Interconnect MSA (OCI-MSA)
29:48 Scale-Up Optics: Do We Need New Technology?
30:58 Micro LEDs – The "Slow but Wide" Approach
35:45 Micro LEDs vs. Copper vs. Traditional Optics
36:55 Credo's Product Spectrum & the Purple Cable Story
39:31 VCSELs & Lumentum's 1060nm Scale-Up Play
Vik and Austin unpack the Nvidia GTC keynote with fresh, top-of-mind takes, breaking down the key announcements: what matters and what doesn't. They discuss Groq's LPX, optics+copper for scale up, new CPU requirements, CPO for networking, what agents mean for software, and much, much more.

Check out Austin's substack: https://www.chipstrat.com
Check out Vik's substack: https://www.viksnewsletter.com

Chapters
00:00 Introduction and Keynote Context
03:18 Keynote Highlights and Gaming Innovations
06:18 Generative AI: The Three Eras
09:28 Inference: The New Revenue Generator
12:21 NVIDIA's Tiered Approach to AI Models
15:30 The Grok Chip and Its Role
18:35 Vera Rubin System: A Full Data Center
21:18 CPU Demand and Performance
24:31 Networking Innovations and Future Directions
32:32 Innovations in PCB Technology
34:06 Scaling GPU Systems
36:57 Understanding the STX Rack and AI Storage
38:23 The Rosa CPU and Its Significance
40:07 Digital Twin Platforms and AI Factories
43:53 NVIDIA's New Software Innovations
47:09 The Future of Token Budgets in AI
54:15 Balancing CapEx and OpEx in AI Deployments
Austin recaps moderating an agentic AI panel at Synopsys Converge, then gives an in-depth technical breakdown of Meta's MTIA custom silicon: why they're building it, how chiplets let them ship a new chip every 6 months, and how the roadmap is shifting toward gen AI inference. Vik digs into Applied Optoelectronics (AAOI), the vertically integrated Texas laser shop whose stock went from $1.48 to $100+, and whether history is about to rhyme.

Austin Lyons: https://www.chipstrat.com
Vik Sekar: https://www.viksnewsletter.com/

Topics covered:
• Agentic AI in chip design — how it changes roles for junior and senior engineers
• Optical circuit switching and what it means for Arista's business model
• Meta's ad-serving pipeline: Andromeda, Lattice, and the GEM foundation model
• Why custom silicon (MTIA) makes sense at Meta's scale
• MTIA chiplet strategy — 4 generations in 2 years
• AAOI's vertical integration, Amazon's $4B warrant deal, and the 2017 parallel

Chapters:
0:00 Intro
1:26 Synopsys Converge — Agentic AI Panel
9:44 Vik's Article: Optical Circuit Switching & Arista
14:43 Meta MTIA — A New Chip Every 6 Months
21:32 Why Custom Silicon Makes Sense for Meta
27:22 MTIA Chiplet Strategy & Roadmap
33:56 Gen AI Fits Meta's Business Model
36:31 How Meta Ships Chips So Fast
40:30 Applied Optoelectronics (AAOI) Deep Dive
45:02 Amazon's $4B Warrant Deal
48:54 Can AAOI's Lasers Compete with Lumentum?
53:16 AAOI's Aggressive Capacity Buildout
55:35 History Rhymes: AAOI's 2017 Boom & Bust
1:00:55 Wrap-Up

#semiconductors #chips #tech #meta #MTIA #AAOI #optics #inference #AI
This week, Austin and Vik break down the optics vs. copper debate that rocked semis. Nvidia dropped $4 billion on Lumentum and Coherent, Credo posted a blowout quarter betting on copper, and then Hock Tan shocked everyone claiming 400G per lane works over copper in Broadcom’s labs — potentially pushing CPO out to 2030+. Plus, Vik’s 4D chess conspiracy theory on why Hock Tan is talking up copper when Broadcom is a CPO company.

Like, subscribe, and drop your thoughts on the copper vs. optics debate in the comments!

Subscribe to our newsletters:
* Chipstrat by Austin Lyons — chipstrat.com
* Vik’s Semiconductor Newsletter by Vik Sekar — viksnewsletter.com

Chapters
(00:00) - Newsletter Plugs: Groq LPUs & Broadcom’s Laser Business
(03:15) - Dynamo & the Rise of Workload-Specific Hardware
(08:04) - Austin’s Broadcom Laser Deep Dive
(09:53) - The Week’s Whiplash: Optics Monday, Copper Wednesday
(17:50) - Why Nvidia Invested $4B: Geopolitics, Supply & the HBM Playbook
(24:15) - CPO Lasers & Optical Circuit Switches
(26:16) - Credo Earnings: 200% YoY Growth & the Copper Bull Case
(31:09) - Reliability, AECs & Oracle’s GPU Cluster Problem
(35:48) - Credo’s Optics Play: Micro-LED Active Cables & the CPO Timing Risk
(38:45) - Broadcom Earnings: Hock Tan’s Copper Bombshell
(43:34) - Customer-Owned Tooling: Hock Tan Says “Good Luck”
(44:25) - Vik’s 4D Chess Theory: Why Hock Tan Talks Up Copper
(47:03) - Wrap-Up: It’s Both — The Real Question Is Timing
This week, we move from optics technology to optics companies. We walk the AI optical supply chain from bottom to top. Main debate: Who has a moat? Who is already priced for perfection?

*Not investment advice, do your own due diligence*

AXTI - Indium phosphide substrate supplier. Critical bottleneck in the laser stack. Major China export-control risk. Massive stock run vs thin earnings.
Tower Semiconductor - Leading silicon photonics foundry. 5x capacity expansion with customer prepayments. Strong process lock-in. Pure-play optics exposure.
GlobalFoundries - 300mm monolithic photonics platform + Chips Act support. Optics growing fast but still small piece of overall business.
Lumentum - Dominant EML laser supplier. Explosive AI demand. Strong technical moat. Valuation and capex sensitivity are key risks.
Coherent - Vertically integrated from substrate to module. 6-inch InP push could lower costs structurally. Execution and margin mix matter.
Fabrinet - Optics assembly partner. High NVIDIA exposure. Scales with industry, but dependent on upstream supply.
Corning - AI data centers require far more fiber than traditional cloud. $6B Meta deal adds visibility. Timing of scale-up optics is the swing factor.

Timestamps
00:01 Intro
06:59 AXT $AXTI
13:38 Tower Semiconductor $TSEM
23:58 GlobalFoundries $GFS
32:43 Lumentum $LITE
39:38 Coherent $COHR
47:09 Fabrinet $FN
54:07 Corning $GLW

Austin's Substack: https://www.chipstrat.com/
Vik's Substack: https://www.viksnewsletter.com/
Austin and Vik delve into the evolving landscape of optics and networking, particularly in relation to AI and data centers. The conversation covers various scales of networking, including scale across, scale out, and scale up, while also addressing the demand-supply dynamics in laser manufacturing and the future of optical circuit switches. The episode highlights the technological advancements and market opportunities in the optics sector, emphasizing the significance of these developments for the future of AI.

Takeaways
Silicon photonics is becoming crucial for data center connectivity.
Optics is essential for overcoming copper's limitations in speed and distance.
Scale across technology is vital for connecting data centers.
Scale out optics is the standard for connecting GPUs between racks.
Co-packaged optics can reduce energy consumption in data centers.
The scale up market for optics is emerging as a new opportunity.
Indium phosphide wafers are a critical bottleneck in laser manufacturing.
Optical circuit switches are gaining traction in data centers.
2026 is anticipated to be a pivotal year for optical networking.

Chapters
00:00 Introduction to AI and CPU Bottlenecks
03:00 The Rise of Silicon Photonics
06:01 Understanding Optical Networking and Data Centers
08:49 Scale Across: Connecting Data Centers
11:56 Scale Out: Optimizing Data Center Connectivity
14:53 Scale Up: The Future of GPU Connectivity
23:32 The Shift from Copper to Optical Connections
26:13 Challenges and Reliability of Lasers
30:47 Understanding Co-Packaged Optics
34:17 Market Dynamics: Demand and Supply of Lasers
40:46 Emerging Technologies: Optical Circuit Switches

Check out Austin's Substack: https://www.chipstrat.com
Check out Vik's Substack: https://www.viksnewsletter.com
In this episode of the Semi Doped podcast, Austin and Vik delve into the current state of the semiconductor industry, focusing on the memory crisis driven by increasing demand from AI applications. They discuss the implications of rising memory prices, the impact of hyperscaler spending on the market, and the strategic moves of major players like Google, Microsoft, Meta, and Amazon in the AI landscape.

Takeaways
Memory prices are skyrocketing, impacting consumer electronics.
The memory crisis is affecting the production of lower-end devices.
DRAM prices have doubled in a single quarter, creating challenges for manufacturers.
Nanya Tech's revenue growth indicates a booming memory market.
AI applications are driving unprecedented demand for memory.
Hyperscalers are significantly increasing their capital expenditures for AI infrastructure.
The integration of AI into advertising is reshaping business models for companies like Google and Meta.

Chapters
00:00 The State of Memory in Semiconductors
03:08 Nvidia's GPU Dilemma and Market Dynamics
06:13 The Impact of AI on Memory Demand
09:08 NAND Flash and Context Memory Trends
11:59 The Future of Memory Supply and Demand
15:12 AI Infrastructure and CapEx Spending
17:47 Google's Strategic Investments in AI
20:58 The Advertising Business Model and AI Integration
30:26 Revenue vs. Expenses: A Balancing Act
31:08 The Future of TPUs vs. GPUs in Cloud Computing
35:31 Microsoft vs. Google: AI Investments and Market Reactions
38:22 AI Integration in Enterprises: Microsoft’s Unique Position
39:57 The Power of Microsoft’s Reach in AI
40:30 GitHub: A Hidden Gem for Microsoft’s AI Strategy
43:52 Meta’s AI Strategy: Advertising and Revenue Growth
51:18 Amazon’s Massive CapEx: Implications for the Future
54:00 Looking Ahead: Predictions for 2027 and Beyond

Check out Austin's substack: https://www.chipstrat.com/
Check out Vik's substack: https://www.viksnewsletter.com/
In this episode, Vik and Wayne Nelms discuss the emerging financial exchange for GPU compute, exploring its implications for the AI infrastructure market. They cover the value of compute, pricing dynamics, hedging strategies, and the future of GPU and memory trading. Wayne shares insights on partnerships, the depreciation of GPUs, and how inference demand may reshape hardware utilization. The conversation highlights the importance of financial products in facilitating data center development and optimizing profitability in the evolving landscape of compute resources.

Takeaways
Wayne Nelms is the CTO of Ornn, focusing on GPU compute as a commodity.
The value of compute is still being defined in the market.
Hedging strategies are essential for managing compute costs.
The pricing of GPUs varies significantly across providers.
Memory trading is becoming a crucial aspect of the compute market.
Partnerships can enhance trading platforms and market efficiency.
Depreciation of GPUs is not linear and varies by use case.
Inference demand may change how GPUs are utilized in the future.
Transparency in pricing benefits smaller players in the market.
Financial products can facilitate data center development and profitability.

Chapters
00:00 Introduction to GPU Compute Futures
03:13 The Value of Compute in Today's Market
05:59 Understanding GPU Pricing Dynamics
08:46 Hedging and Futures in Compute
11:52 The Role of Memory in AI Infrastructure
15:14 Partnerships and Market Expansion
17:46 Depreciation and Residual Value of GPUs
20:57 Future of Data Centers and Compute Demand
24:01 The Impact of Financialization on AI Infrastructure
27:04 Looking Ahead: The Future of Compute Markets

Keywords
GPU compute, financial exchange, futures market, data centers, AI infrastructure, pricing strategies, hedging, memory trading, Ornn

Follow Wayne Nelms (@wayne_nelmz on X)
Check out Ornn's website: https://www.ornnai.com/
Check out Vik's Substack: https://www.viksnewsletter.com/
Check out Austin's Substack: https://www.chipstrat.com/
Vik and Val Bercovici discuss the evolution of storage solutions in the context of AI, focusing on Weka's innovative approaches to context memory, high bandwidth flash, and the importance of optimizing GPU usage. Val shares insights from his extensive experience in the storage industry, highlighting the challenges and advancements in memory requirements for AI models, the significance of latency, and the future of storage technologies.

Takeaways
Context memory is crucial for AI performance.
The demand for memory has drastically increased.
Latency issues can hinder AI efficiency.
High bandwidth flash offers new storage capabilities.
Weka's Axon software enhances GPU storage utilization.
Token warehouses can significantly reduce costs.
Augmented memory grids improve memory access speeds.
Networking innovations are essential for AI storage solutions.
Understanding memory hierarchies is vital for optimization.
The future of storage will involve more advanced technologies.

Chapters
00:00 Introduction to Weka and AI Storage Solutions
05:18 The Evolution of Context Memory in AI
09:30 Understanding Memory Hierarchies and Their Impact
16:24 Latency Challenges in Modern Storage Solutions
21:32 The Role of Networking in AI Storage Efficiency
29:42 Dynamic Resource Utilization in AI Networks
30:04 Introducing the Context Memory Network
31:13 High Bandwidth Flash: A Game Changer
32:54 Weka's Neural Mesh and Storage Solutions
35:01 Axon: Transforming GPU Storage into Memory
39:00 Augmented Memory Grid Explained
42:00 Pooling DRAM and CXL Innovations
46:02 Token Warehouses and Inference Economics
52:10 The Future of Storage Innovations

Resources
Manus AI $2B Blog: https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus

Also listen to this podcast on your favorite platform: https://www.semidoped.fm/
Check out Vik's Substack: https://www.viksnewsletter.com/
Check out Austin's Substack: https://www.chipstrat.com/
Austin and Vik discuss the emerging trend of AI agents, particularly focusing on Claude Code and OpenClaw, and the resulting hardware implications.

Key Takeaways:
2026 is expected to be a pivotal year for AI agents.
The rise of agentic AI is moving beyond marketing to practical applications.
Claude Code is being used for more than just coding; it aids in research and organization.
Integrating AI with tools like Google Drive enhances productivity.
Security concerns arise with giving AI agents access to personal data.
Local computing options for AI can reduce costs and increase control.
AI agents can automate repetitive tasks, freeing up human time for creative work.
The demand for CPUs is increasing due to the needs of AI agents.
AI can help summarize and organize information but may lack deep insights.
The future of AI will involve balancing automation with human oversight.

Chapters
(00:00) Introduction: Why 2026 may be the year of AI agents
(01:12) What people mean by agents and the OpenClaw naming chaos
(02:41) Agents behaving badly: crypto losses and social posting
(03:38) Claude Code as a research tool, not a coding tool
(05:54) Terminal-first workflows vs GUI-based agents
(07:44) Connecting Claude Code to Gmail, Drive, and Calendar via MCP
(09:12) Token waste, authentication friction, and workflow optimization
(10:54) Automating newsletter ingestion and research archives
(12:33) Giving agents login credentials and security tradeoffs
(13:50) Filtering signal from noise with topic constraints
(16:36) AI-driven idea generation and its limitations
(17:34) When automation effort is not worth it
(19:02) Are agents ready for non-technical users?
(20:55) Why OpenClaw should not run on your personal laptop
(21:33) Safe agent deployment: VPS vs local servers
(23:33) The true cost of agents: infrastructure plus inference
(24:18) What OpenClaw adds beyond Claude Code
(26:53) Agents require managerial thinking and self-awareness
(28:18) Local inference vs cloud APIs
(30:46) Cost control with OpenRouter and model hierarchies
(32:31) Scaling agents forces model and cost optimization
(33:00) AI aggregation vs creator analytics
(35:58) AI as discovery, not a replacement for reading
(38:17) When summaries are enough and when they are not
(39:47) Why AI cannot understand what is not said
(41:18) Agentic AI is driving unexpected CPU demand
(41:49) Intel caught off guard by CPU shortages
(44:53) Security, identity, and encryption shift work to CPUs
(46:10) Closing thoughts: agents are real, early, and uneven

Deploy your secure OpenClaw instance with DigitalOcean: https://www.digitalocean.com/blog/moltbot-on-digitalocean

Visit the podcast website: https://www.semidoped.fm
Austin's Substack: https://www.chipstrat.com/
Vik's Substack: https://www.viksnewsletter.com/
Maia 100 was a pre-GPT accelerator. Maia 200 is explicitly post-GPT for large multimodal inference.

Saurabh Dighe says if Microsoft were chasing peak performance or trying to span training and inference, Maia would look very different. Higher TDPs. Different tradeoffs. Those paths were pruned early to optimize for one thing: inference price-performance. That focus drives the claim of ~30% better performance per dollar versus the latest hardware in Microsoft’s fleet.

Interesting topics include:
• What “30% better price-performance” actually means
• Who Maia 200 is built for
• Why Microsoft bet on inference when designing Maia back in 2022/2023
• Large SRAM + high-capacity HBM
• Massive scale-up, no scale-out
• On-die NIC integration

Maia is a portfolio platform: many internal customers, varied inference profiles, one goal. Lower inference cost at planetary scale.

Chapters:
(00:00) Introduction
(01:00) What Maia 200 is and who it’s for
(02:45) Why custom silicon isn’t just a margin play
(04:45) Inference as an efficient frontier
(06:15) Portfolio thinking and heterogeneous infrastructure
(09:00) Designing for LLMs and reasoning models
(10:45) Why Maia avoids training workloads
(12:00) Betting on inference in 2022–2023, before reasoning models
(14:40) Hyperscaler advantage in custom silicon
(16:00) Capacity allocation and internal customers
(17:45) How third-party customers access Maia
(18:30) Software, compilers, and time-to-value
(22:30) Measuring success and the Maia 300 roadmap
(28:30) What “30% better price-performance” actually means
(32:00) Scale-up vs scale-out architecture
(35:00) Ethernet and custom transport choices
(37:30) On-die NIC integration
(40:30) Memory hierarchy: SRAM, HBM, and locality
(49:00) Long context and KV cache strategy
(51:30) Wrap-up
OpenAI's partnership with Cerebras and Nvidia's announcement of context memory storage raise a fundamental question: as agentic AI demands long sessions with massive context windows, can SRAM-based accelerators designed before the LLM era keep up—or will they converge with GPUs?

Key Takeaways
1. Context is the new bottleneck. As agentic workloads demand long sessions with massive codebases, storing and retrieving KV cache efficiently becomes critical.
2. There's no one-size-fits-all. Sachin Khatti (OpenAI, ex-Intel) signals a shift toward heterogeneous compute—matching specific accelerators to specific workloads.
3. Cerebras has 44GB of SRAM per wafer — orders of magnitude more than typical chips — but the question remains: where does the KV cache go for long context?
4. Pre-GPT accelerators may converge toward GPUs. If they need to add HBM or external memory for long context, some of their differentiation erodes.
5. Post-GPT accelerators (Etched, MatX) are the ones to watch. Designed specifically for transformer inference, they may solve the KV cache problem from first principles.

Chapters
00:00 — Intro
01:20 — What is context memory storage?
03:30 — When Claude runs out of context
06:00 — Tokens, attention, and the KV cache explained
09:07 — The AI memory hierarchy: HBM → DRAM → SSD → network storage
12:53 — Nvidia's G1/G2/G3 tiers and the missing G0 (SRAM)
14:35 — Bluefield DPUs and GPU Direct Storage
15:53 — Token economics: cache hits vs misses
20:03 — OpenAI + Cerebras: 750 megawatts for faster Codex
21:29 — Why Cerebras built a wafer-scale engine
25:07 — 44GB SRAM and running Llama 70B on four wafers
25:55 — Sachin Khatti on heterogeneous compute strategy
31:43 — The big question: where does Cerebras store KV cache?
34:11 — If SRAM offloads to HBM, does it lose its edge?
35:40 — Pre-GPT vs Post-GPT accelerators
36:51 — Etched raises $500M at $5B valuation
38:48 — Wrap up
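The "where does the KV cache go" question comes down to simple arithmetic. A rough Python sketch using the standard KV-cache size formula, with an illustrative GQA configuration in the spirit of a 70B-class model (80 layers, 8 KV heads, head dim 128, fp16) — not the actual config of any model or chip discussed here:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """KV cache for one sequence: 2 tensors (K and V) per layer,
    each kv_heads * head_dim values per token, dtype_bytes each."""
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# Per-token footprint: 2 * 80 * 8 * 128 * 2 = 327,680 bytes (~320 KB)
per_token = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=1)

# A single 128K-token agentic session:
session = kv_cache_bytes(80, 8, 128, seq_len=128 * 1024)
print(session / 2**30)  # -> 40.0 GiB
```

At this footprint, one long session alone approaches the quoted 44GB of on-wafer SRAM before weights are even counted, which is why long-context serving pushes SRAM-first designs toward HBM or external memory tiers.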
Innoviz CEO Omer Keilaf believes the LIDAR market is down to its final players—and that Innoviz has already won its seat.

In this conversation, we cover the Level 4 gold rush sparked by Waymo, why stalled Level 3 programs are suddenly accelerating, the technical moat that separates L4-grade LIDAR from everything else, how a one-year-old startup won BMW, and why Keilaf thinks his competitors are already out of the race.

Omer Keilaf founded Innoviz in 2016. Today it's a publicly traded Tier 1 supplier to BMW, Volkswagen, Daimler Truck, and other global OEMs.

Chapters
00:00 Introduction
00:17 Why Start a LIDAR Company in 2016?
01:32 The Personal Story Behind Innoviz
03:12 Transportation Is Still Our Biggest Daily Risk
04:28 The 2012 Spark: Xbox Kinect and 3D Sensing
06:32 From Mobile to Automotive: Finding the Right Platform
07:54 "I Didn't Know What LIDAR Was, But I'd Do It Better"
08:19 How a One-Year-Old Startup Won BMW
10:04 Surviving the First Product
11:23 From Tier 2 to Tier 1: The Volkswagen Win
13:47 Lessons Learned Scaling Through Partners
14:45 The SPAC Decision: A Wake-Up Call from a Competitor
16:42 From 200 LIDAR Companies to a Handful
17:27 NREs: How Tier 1 Status Funds R&D
18:44 Why Automotive-First Is the Right Strategy
19:45 Consolidation Patterns: Cameras, Radars, Airbags
20:31 "The Music Has Stopped"
21:07 Non-Automotive: Underserved Markets
23:51 Working with Secretive OEMs
25:27 The Press Release They Tried to Stop
26:42 CES 2025: 85% of Meetings Were Level 4
27:40 Why Level 3 Programs Are Suddenly Accelerating
28:33 The EV/ADAS Coupling Problem
29:49 Design Is Everything: The Holy Grail Is Behind the Windshield
31:13 The Three-Year RFQ: Grill → Roof → Windshield
32:32 Innoviz3: Small Enough for Behind-the-Windshield
34:40 Innoviz2 for L4, Innoviz3 for Consumer L3
36:38 What's the Real Difference Between L2, L3, and L4 LIDAR?
38:51 The Mud Test: Why L4 Demands 100% Availability
40:50 "We're the Only LIDAR Designed for Level 4"
42:52 Patents and the Maslow Pyramid of Autonomy
44:15 Non-Automotive Markets: Agriculture, Mining, Security
46:15 Closing
Austin and Vik discuss why LiDAR is important for autonomy, how modern systems work, and how the technology has evolved. They compare Time of Flight and FMCW architectures, explain why wavelength choice matters, and walk through the tradeoffs between 905 nm and 1550 nm across eye safety, cost, and performance. The discussion closes with a clear-eyed look at competition, Chinese suppliers, and supply chain risk.

Chapters
(00:00) Introduction to LiDAR and why it matters
(05:40) The case for LiDAR in autonomous vehicles
(12:41) Wavelengths, eye safety, and system tradeoffs
(15:38) How LiDAR works: Time of Flight vs. FMCW
(20:12) Mechanical vs. solid-state LiDAR designs
(27:31) Market dynamics, competition, and geopolitics
Episode Summary
Austin and Vik break down NVIDIA’s CES 2026 keynote, focusing on Vera Rubin, DGX Spark and DGX Station, uneducated investor panic, and physical AI.

Key Takeaways
DGX Spark brings server-class NVIDIA architecture to the desktop at low power, aimed at developers, enthusiasts, and enterprises experimenting locally.
DGX Station functions more like a mini-AI rack on-prem: Grace Blackwell for inference and development without full racks.
The historical parallel is mainframes to minicomputers, expanding compute TAM rather than displacing cloud usage.
On-prem AI converts some GPU rental OpEx into CapEx, appealing to CFOs.
NVIDIA positioned autonomy as physical AI with vision-language-action models and early Mercedes-Benz deployments in 2026.
Vera Rubin integrates CPU, GPU, DPU, networking, and photonics into a single platform, emphasizing Ethernet for scale-out. (Where was the InfiniBand switch?)
The new Vera CPU highlights rising CPU importance for agentic workloads through higher core counts, SMT, and large LPDDR capacity.
Rubin GPU’s move to HBM4 and adaptive precision targets inference efficiency gains and lower cost per token.
Context memory storage elevates SSDs and DPUs, enabling massive KV cache offload beyond HBM and DRAM.
Cable-less rack design and warm-water cooling show NVIDIA’s shift from raw performance toward manufacturability and enterprise polish.
Austin and Vik discuss key insights from the IEDM conference. They explore the significance of IEDM for engineers and investors, the networking opportunities it offers, and the latest innovations in silicon photonics, complementary FETs, NAND flash memory, and GaN-on-silicon chiplets.

Takeaways
Penta-level NAND flash memory could disrupt the SSD market
GaN-on-Silicon chiplets enhance power efficiency
Complementary FETs
Optical scale-up has a power problem
The future of transistors is still bright
Key Topics
What Nvidia actually bought from Groq and why it is not a traditional acquisition
Why the deal triggered claims that GPUs and HBM are obsolete
Architectural trade-offs between GPUs, TPUs, XPUs, and LPUs
SRAM vs HBM: speed, capacity, cost, and supply chain realities
Groq LPU fundamentals: VLIW, compiler-scheduled execution, determinism, ultra-low latency
Why LPUs struggle with large models and where they excel instead
Practical use cases for hyper-low-latency inference:
- Ad copy personalization at search latency budgets
- Model routing and agent orchestration
- Conversational interfaces and real-time translation
- Robotics and physical AI at the edge
Potential applications in AI-RAN and telecom infrastructure
Memory as a design spectrum: SRAM-only, SRAM plus DDR, SRAM plus HBM
Nvidia’s growing portfolio approach to inference hardware rather than one-size-fits-all

Core Takeaways
GPUs are not dead. HBM is not dead.
LPUs solve a different problem: deterministic, ultra-low-latency inference for small models.
Large frontier models still require HBM-based systems.
Nvidia’s move expands its inference portfolio surface area rather than replacing GPUs.
The future of AI infrastructure is workload-specific optimization and TCO-driven deployment.