Latent Space: The AI Engineer Podcast

The podcast by and for AI Engineers! In 2025, over 10 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. Striving to give you both the definitive take on the Current Thing down to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space

Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets!

On the product side, everyone is getting Computer - Perplexity, Manus, Cursor, and so on. Meanwhile on the research side, agentic evals like TerminalBench and GDPVal are also assuming computer (Harbor). On both ends, the consolidating LLM OS stack has become a standard toolkit, and Daytona is one of a small set of AI Infra companies that are booming because of it.

“The end of localhost” has been Ivan Burazin’s obsession for more than a decade.

Something that is all too familiar…

Long before agents became the default way people talked about software development, Ivan was already chasing the idea that development should not depend on a fragile local machine. CodeAnywhere, one of the first browser-based IDEs, was an early attempt at that future: move the development environment into the cloud, make setup reproducible, and free developers from the endless “works on my machine” tax.

The thesis was directionally right, but the market wasn’t ready yet.

However, agents changed that. They do not care about a laptop, desk setup, or favorite editor. They need a computer they can access through an API: something stateful enough to keep working, fast enough to spin up instantly, flexible enough to resize, isolated enough to be safe, and composable enough to run the messy real-world workflows that real software engineering actually requires.

Daytona isn’t just selling “sandboxes” in the narrow code-execution sense.
It is the latest version of Ivan’s original localhost thesis.

In this episode, Daytona’s CEO joins swyx to explain why AI agents need more than code execution boxes: they need composable computers, stateful sandboxes, instant startup, dynamic resources, and infrastructure that can survive workloads going from zero to 100,000 CPUs.

We go deep on the new agent compute market: Daytona’s hard pivot from human dev environments to AI sandboxes, the New Year’s Eve MVP that customers begged for, why Daytona runs on bare metal with its own scheduler, how one customer runs almost 850,000 sandboxes a day, and why RL/eval workloads went from 0% to roughly 50% of usage in just months. Ivan also explains why agents need Windows and macOS machines, why CLI may matter more than MCP, why Kubernetes is painful for this workload, and why the future AI cloud may look more like Stripe than AWS.

We discuss:

* How Daytona grew out of CodeAnywhere, Shift, and the “end of localhost” thesis
* Why Daytona pivoted from human dev environments to AI sandboxes
* Why agents need composable computers instead of disposable code execution boxes
* The New Year’s Eve MVP that customers chased API keys for
* Why Daytona chose bare metal, stateful snapshots, and its own scheduler
* How Daytona spins up one sandbox in ~60ms and 50,000 sandboxes in ~75 seconds
* Why Daytona’s biggest customer runs ~850,000 sandboxes a day
* How RL/eval workloads create zero-to-100,000 CPU spikes
* Why RL workloads went from 0% to roughly 50% of Daytona usage
* Why customers compare Daytona against EKS/GKE and say they’re “never going back”
* Why every AI agent may need a computer, including Windows and macOS environments
* The Apple licensing constraints that make macOS sandboxes hard
* Why CLI gives agents more power than MCP
* How open source helps agents integrate Daytona
* Why agent-generated PRs may break today’s CI/CD assumptions
* Why AI SaaS companies reselling tokens may face a cold shower
* Why the AI cloud may look more like Stripe than AWS

Ivan Burazin

* LinkedIn: https://www.linkedin.com/in/ivanburazin
* X: https://x.com/ivanburazin

Daytona

* Website: https://www.daytona.io
* X: https://x.com/daytonaio

Timestamps

* 00:00:00 Hook
* 00:01:12 Introduction
* 00:03:15 CodeAnywhere, Shift, and the end of localhost
* 00:05:58 What Daytona is: composable computers for AI agents
* 00:08:07 The pivot from dev environments to AI sandboxes
* 00:10:17 The New Year’s Eve MVP and customers begging for API keys
* 00:12:56 Bare metal, stateful sandboxes, and Daytona’s scheduler
* 00:17:28 60ms startup, 50,000 sandboxes, and 850K daily runs
* 00:21:53 Spiky RL/eval workloads and the new agent infra problem
* 00:28:12 RL workloads, Kubernetes pain, and dynamic resizing
* 00:33:31 Why every AI agent needs a computer
* 00:38:48 macOS sandboxes and Apple’s licensing problem
* 00:44:28 Why CLI may matter more than MCP
* 00:48:11 Open source, GitHub stars, and agent integration
* 00:53:11 Git, CI/CD, and agent collaboration bottlenecks
* 00:58:15 Founder life and building a 25-person infra company
* 01:02:44 AI SaaS, token resale, and API-first business models
* 01:06:10 GPU sandboxes, data centers, and compute growth
* 01:09:48 Why the AI cloud may look more like Stripe than AWS
* 01:11:26 Closing thoughts

Transcript

Introduction: Daytona, CodeAnywhere, and the End of Localhost

Swyx [00:00:02]: Okay, we’re in the studio with Ivan Burazin, CEO of Daytona. Welcome.

Ivan [00:00:07]: Thanks for having me, man.

Swyx [00:00:08]: Ivan, you and I go back.

Ivan [00:00:10]: Way back.

Swyx [00:00:11]: How - I don’t even know how you found - did you reach out, or, for Shift?

Ivan [00:00:17]: I reached out to you. The reason was - we were thinking about - I was one of the co-founders of CodeAnywhere, the first browser-based IDE, and so we had been thinking for a long time that localhost should die.
And you had this article.

Swyx [00:00:29]: End of localhost.

Ivan [00:00:30]: Then I reached out to you because of that, and then we talked. I was actually at a different job and learning about - I was the head of developer experience - and you were quite well-versed in that, so I reached out to you, among other people: how do we go about that? What are the key things and whatnot at this point in time? And you were nice enough to take the call, and I remember I was late to the call with you.

Swyx [00:00:51]: I don’t remember.

Ivan [00:00:52]: I remember because I was with my then - I’m thinking girlfriend or wife at that point in time, I’m not sure. It’s the same person, so that’s great - and I was late ‘cause we were in Italy on vacation, and then I was late for something. I felt so bad, and you were so nice to be good about it.

Swyx [00:01:10]: The reason I’m nice is because I’m also late to other people, so it’s like, who’s without sin here? Yeah, so I have to - for those who don’t know, Infobip Shift, there’s this whole thing that you did in the past, and that was basically one of the inspirations for me starting AI Engineer. I have to thank you for giving me that push to be like, “Oh, you can build and sell conferences?”

Ivan [00:01:34]: I remember you offered me advisory shares at the beginning, and I was so focused on what we were doing, I said no, and I should’ve taken the advisory shares. So I’m sorry, dude. But anyway.

Swyx [00:01:43]: We’re not, we’re not venture backed.

Ivan [00:01:44]: No, it doesn’t matter.

Swyx [00:01:45]: Yeah, anyway, so I think what’s impressive about you is that CodeAnywhere is the thing that you’ve been trying to build, and you kind of put it on hold and then came back after Infobip. Just give us the story, the origin story, going into Daytona.

From CodeAnywhere and Shift to Daytona

Ivan [00:02:05]: Sure.
Like, really way back, me and my co-founder have been together. I’ve said this multiple times: it’s like we were married and divorced and married. Some people actually ask me if my co-founder is my partner; they took it literally. It’s not literal, but we have done multiple companies together, and to your point, we had this shift where we went from CodeAnywhere to the conference called Shift, and then back to Daytona. We originally started stacking servers, doing virtualization in the early 2000s, routers, and basically all these things at a foundational level, and that was a services company, which we sold to focus on what my co-founder actually invented, which was the very first browser-based IDE, right? I say the first. Before us was actually Heroku. They did it for a very short time until they became Heroku. But outside of them, we were the only one, and it was called.

Swyx [00:02:55]: There was Cloud9.

Ivan [00:02:57]: Cloud9 came out slightly after us. There was Replit, which came out when we stopped doing it, and they have been successful since then, which is great. There was Nitrous.io. There were quite a few that existed at the time, but it was too early. But the interesting part is that at that point in time there was no VS Code, there was no Kubernetes, and Docker had just started - or I’m not sure if it was even public at that point in time. And so we had to build the whole stack ourselves, and that was the key learning that we brought into Daytona and have been using today. So it was super early. About 3 million people used CodeAnywhere. It was angel-backed more than venture-backed. We ended up paying everyone back because it didn’t have that sort of scale.
But three years ago, we started something similar with Daytona, which is not what we are today: it was automating dev environments for human engineers, basically the underlying stack of CodeAnywhere. And then we did a hard pivot last January to sandboxes. And so here we are.

Swyx [00:04:01]: Historic pivot, yeah, and it’s one of those things where I had independently invested in CodeAnywhere, but also in E2B, and then both of you pivoted into the same thing, and I’m like, “F**k.”

Ivan [00:04:12]: You invested, you invested in Daytona. You invested in Daytona. But you were the first - if we had not got your check, we wouldn’t have done it.

Swyx [00:04:18]: No way.

Ivan [00:04:19]: No, it was like, “We have to get him on board first,” and you were the kicker that got us off the ground.

Swyx [00:04:23]: No, because you were putting me on your pitch deck, man. I was like, “Man, this is like a good trip if I don’t invest.”

Ivan [00:04:29]: That’s because it was your quote. It’s like we.

Swyx [00:04:30]: Yeah. It’s the end of localhost.

Ivan [00:04:31]: Did a bunch of research about end of localhost and who was interested in that.

Swyx [00:04:34]: No, that’s like, I wrote that blog post, and every single company in that field reached out to me, and then every VC who was receiving those pitches also had to call me and talk it through with me.

Ivan [00:04:47]: It’s finally happening though.

Swyx [00:04:48]: It was really super interesting.

Ivan [00:04:48]: It’s finally happening.

Swyx [00:04:49]: It’s finally happening.

Ivan [00:04:49]: Yeah, it’s finally.

Swyx [00:04:49]: It’s finally happening, with maybe sort of non-human users. Yeah, so what is Daytona today? Let’s get a quick description. I’m wearing the shirt.

What Daytona Is Today: Composable Computers for AI Agents

Ivan [00:04:58]: You’re wearing the shirt. Yes.

Swyx [00:04:59]: It says, I think your branding is very good. Like, it’s very consistent. It runs AI code.
Like, it cannot be simpler.

Ivan [00:05:05]: Exactly, but we’re probably gonna have to change that.

Swyx [00:05:07]: Oh, s**t.

Ivan [00:05:07]: It’s also a subset of what we do. Unfortunately - we really love this - Run AI Code is super simple. People interpret it in different ways. I think we’ve given out 5,000 or 6,000 of these shirts. People wear them with pride because it doesn’t really market us.

Swyx [00:05:21]: Yeah, Daytona’s on the back.

Ivan [00:05:22]: It markets the back. It markets to the person itself, so I think we did a really good job on that one. But it is also a subset of what we do, because when people think about Run AI Code, they just think about these small, let’s call them isolates, code execution boxes where you send some code and you get an output. Whereas what Daytona is today is essentially composable computers for AI agents. The market calls them sandboxes, which can be misleading.

Swyx [00:05:44]: All these things. All these things on.

Ivan [00:05:45]: Yeah, exactly, ‘cause it can be misleading: people usually think about sandboxes as a demo or a test environment versus a production-grade environment. But think of the laptop that you have in front of you, or the computer that’s over there, or - my wife is an architect, so she has a Windows machine with a 3D graphics card inside to do 3D rendering. As humans, we have different computers, or different compositions of computers. And our strong belief is that agents, today and going forward, will need all these different compositions of computers to do different types of tasks. And so we offer that, basically through an API.

Swyx [00:06:19]: Yeah, to give people - I’m trying to sort of front-load all the aha moments or the wow moments so that people can stay engaged and click like and subscribe. The market is exploding, right? Like, you have been reporting 74% month-on-month growth, and it’s just been growing for a while.
Like, it’s been going like this. And every single - it’s not just you guys. It’s every single.

Ivan [00:06:41]: Everyone, yeah.

Swyx [00:06:42]: Sort of compute provider. I don’t know if you agree with me saying compute provider or not.

Ivan [00:06:48]: It’s fine.

Swyx [00:06:48]: Yeah. So organically, PLG-driven growth, but also enterprise is doing super well. I think I wanna rewind to January of last year when you did the pivot. You obviously called this market early, you were positioned for it, and you are now one of the market leaders. But what was the insight that made you do the pivot?

The Pivot: From Human Dev Environments to Agent Sandboxes

Ivan [00:07:06]: The insight that made us do this pivot came the quarter before that, so end of 2024, when we had - basically, we did a demo with - I think we discussed this as well - Devin was not public. You actually gave me access to Devin at that time. So Devin.

Swyx [00:07:25]: I did?

Ivan [00:07:26]: Yeah, you gave me access.

Swyx [00:07:26]: I don’t think I was supposed.

Ivan [00:07:27]: Yeah, exactly.

Swyx [00:07:28]: Yeah, I.

Ivan [00:07:28]: So it doesn’t matter. You.

Swyx [00:07:29]: Yeah. I gave like three friends access.

Ivan [00:07:31]: Yeah, or it was a call and you showed it to me. It doesn’t matter. But OpenDevin was available, which is now called OpenHands. And so we’re like, “Oh, this seems to be a thing. This is not public. Let’s take our automation of dev environments for humans, take OpenDevin, and launch that as a SaaS.” And we did that. Not very many people signed up and used it, but a lot of people who were building agents reached out, and they were like, “Hey, my agent needs a compute sandbox runtime,” whatever you wanna call it. I forget what it was called at that point. And then we were like, “Oh, amazing. This is a new market. Here is our infrastructure. Here’s our product, and go.” And what we found really fast was that people did not like what we had built.
It didn’t work. And I remember talking to people at the beginning, when we were building this sandbox for agents. People were like, “Oh, why is it different? It’s the same thing. We have EC2, we have VMs, we have all these things.” But everyone we gave it to, it was like 20 or 30 people, they all said, “No. This is not what we need. This sort of breaks.” And basically, me and my co-founder didn’t know a lot about - ‘cause we’re infra people. We’re not AI people. So I basically took it upon myself to watch every single podcast that exists, including all of these, get up to date, read all the blogs, and understand what’s going on.

Swyx [00:08:45]: Do you wanna shout out who else was useful, just in case people are also looking?

Ivan [00:08:49]: Generally, I looked at - there are a few podcasts, different segments and different types. So there’s you guys, No Priors, Bill Gurley’s was great while.

Swyx [00:09:04]: BG2, yeah.

Ivan [00:09:05]: Yeah, while it was around. So there’s a few. 20VC is interesting from a different dynamic, and some are a different dynamic. But there was also Redpoint.

Swyx [00:09:14]: We’re not really about the compute market.

Ivan [00:09:15]: It was also already - sorry?

Swyx [00:09:16]: You’re, you want - you’re looking at the agent infra market.

Ivan [00:09:19]: I was looking at the agent market and the AI market in general, and sort of understanding who the players are, what the perception is, and how that goes. And obviously you complement this with going to conferences, going to events, going to meetups, reading white papers, doing all the things that you have to do to understand what’s happening. And so when we sort of had an idea of what we had to build, literally on New Year’s Eve, I half vibe coded the first MVP, the first minimum viable product, of what Daytona is today.
And I went to sleep at like 3:00 AM or something like that. I had just put my baby daughter and wife to sleep, Happy New Year’s, and went back to just doing this. And I sent it to my co-founder, my CTO, and he saw it in the morning. He’s like, “This is absolute garbage. Do not show this to anybody at all, but the idea is good.” And so he took two weeks, and he rebuilt it.

Swyx [00:10:09]: Did it look like that? Listen, I - it was a rough idea.

Ivan [00:10:12]: Oh, not even, not even close. It was way worse. But it was a very - it was a simplistic view of what it should be. Like, it worked, but it was not ideal. And so he went, we went down the whole, which is his job as CTO, and he came back with this version. We then called all the people that had said, “This is garbage,” a quarter ago. We set up these calls, and we just demoed it to everyone. And all the calls went long, every single one. They were 15-minute calls, and they all went to like 25, 30 minutes or whatnot. And everyone said, “We need, we want access.” There was no login, just an API key, ‘cause it was just a beta or an alpha. And they said, “Oh, we want access.” And we’re like, “Sure, yeah. Okay, thank you very much.” But the next day, if we hadn’t sent it, every single one, like every call that we did, came back: “Where is my API key?” Everyone wanted it. We’re like, “S**t.” Like, this is it. Like, I’ve never felt - so one, the understanding, to your point, was that most people thought it was the same infrastructure for humans and agents. We understood a quarter ago it’s not. We just didn’t know what the right primitive was. And then when we came, and we can talk about what that is, and we gave it to these people, I’ve never seen, I’ve never experienced - I’ve done multiple companies in my life. I’ve never experienced this, that people literally call you if you do not give them access.
Like, they want access right now. And so it’s like, okay, they don’t want this. The thing that they want doesn’t seem to exist, or they have not found it, and they really want what we want. And then we understood that we’re onto something. And then when you think about the size of the market: the market for human engineers in the enterprise is a very large market, so think GitLab or whatnot. But the market for every single agent that will ever exist in the future is just like, what is that market? How big is that? And we’re like, “We are all in on this.” And so that is where we made the cut between the old product and the new one.

Bare Metal, Stateful Sandboxes, and the Lambda + EC2 Model

Swyx [00:12:02]: Yeah. But it wasn’t composable at the time?

Ivan [00:12:05]: It was very - it was basically just a Linux box where you could define the number of CPUs, disk, and RAM. That is what you could do, but you couldn’t have multiple operating systems, you couldn’t resize it on the fly, you couldn’t add a GPU, you couldn’t do all the things. It was just the first sort of variation of that, yeah.

Swyx [00:12:22]: Was it bare metal from the start?

Ivan [00:12:24]: It was bare metal from the start. And so the interesting thing that we thought about right away, so our.

Swyx [00:12:29]: Give people the background: what is the normal path?

Ivan [00:12:32]: Yeah, so basically most providers run this on top of VMs. And also.

Swyx [00:12:37]: Firecracker.

Ivan [00:12:38]: Yeah, they run on Firecracker and VMs. And so we also - we have multiple isolation layers, and we can do that. But the common way to do it is, one, the state of the machine, or the hard disk, is not part of the sandbox itself. And the other thing is they’re not meant to last forever. Most of them are preemptible: there’s a limit on how long they can live.
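Ivan’s description of the first version (one Linux box where you choose CPUs, disk, and RAM, but cannot switch operating systems, resize on the fly, or attach a GPU) maps naturally onto a small resource spec. The sketch below is purely illustrative: `SandboxSpec`, `resize`, and `SUPPORTED_OS` are hypothetical names, not Daytona’s SDK, and the rule that a disk may grow but never shrink is an assumption chosen to protect persisted state in a stateful sandbox.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical OS choices; the first version described here offered Linux only.
SUPPORTED_OS = {"linux", "windows", "macos"}


@dataclass(frozen=True)
class SandboxSpec:
    """Hypothetical resource spec for one agent computer (not Daytona's API)."""
    os: str = "linux"
    cpus: int = 1
    memory_gib: int = 1
    disk_gib: int = 10
    gpu: Optional[str] = None  # None means CPU-only

    def __post_init__(self) -> None:
        # Runs on construction and again on every dataclasses.replace() call.
        if self.os not in SUPPORTED_OS:
            raise ValueError(f"unsupported os: {self.os}")
        if min(self.cpus, self.memory_gib, self.disk_gib) <= 0:
            raise ValueError("cpus, memory_gib and disk_gib must be positive")


def resize(spec: SandboxSpec, *, cpus: Optional[int] = None,
           memory_gib: Optional[int] = None,
           disk_gib: Optional[int] = None) -> SandboxSpec:
    """Return a resized copy; assumed rule: disk may grow but never shrink."""
    new_disk = spec.disk_gib if disk_gib is None else disk_gib
    if new_disk < spec.disk_gib:
        raise ValueError("disk can only grow, to protect persisted state")
    return replace(
        spec,
        cpus=spec.cpus if cpus is None else cpus,
        memory_gib=spec.memory_gib if memory_gib is None else memory_gib,
        disk_gib=new_disk,
    )


if __name__ == "__main__":
    small = SandboxSpec(cpus=2, memory_gib=4, disk_gib=20)
    bigger = resize(small, cpus=8, memory_gib=16)
    print(small)
    print(bigger)
```

Making the spec immutable means a resize produces a new description that a scheduler could diff against the running machine, rather than mutating state in place.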
And so our thought going into this was that agents will be like humans, in the sense that you don’t want your laptop to be shut down until you’re done with work. You want to close the lid and open the lid and find the same state. Agents would want that too, the pause and come back. They want those two things. But agents also really want speed, right? Can they get it? So when we thought about it, we needed something insanely fast: how to make it fast, and how to make it long-running and stateful. Those two things together are like combining a Lambda and an EC2, right? And we didn’t have an idea how others did it, ‘cause we didn’t even know there was a market around this. It was more like, okay, this is what we need, what they need. And we looked at Kubernetes; it wasn’t good enough for that. We looked at Nomad; it didn’t enable that. And so, drawing on our history of writing our own scheduler at CodeAnywhere, my CTO came up with this. He’s like, “Oh, the learnings from there,” and he brought them. And the funny thing is, our third co-founder, when he saw it, he’s like, “Dude, what is this? This is like 2008.” Like, we went back in time, and he’s like, “Exactly.” And so the reason why Daytona is super fast, and you see this on benchmarks, is that we run on bare metal. We have our own scheduler, and we use the local disk, CPU, and RAM of the underlying machine, which means your IOPS are insanely fast because there’s no network hop to EBS or something like that. But also the snapshots, the point-in-time state, the templates, are preloaded on the bare metal machines. So when you fire off a sandbox from a template or a snapshot, you’re essentially directed to the bare metal machine where that snapshot sits on the NVMe drive, and then it literally just turns on that machine, and it’s local. There’s no network latency involved.
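The placement idea described here (snapshots and templates preloaded on the NVMe drives of bare-metal hosts, with new sandboxes routed to a host that already holds the snapshot) can be illustrated with a toy greedy placer. This is a sketch under stated assumptions, not Daytona’s scheduler: `Host`, `place`, the most-free-CPUs tie-break, and the fallback in which a cold host fetches and then caches the snapshot are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Host:
    """Toy model of a bare-metal machine with a local NVMe snapshot cache."""
    name: str
    free_cpus: int
    cached_snapshots: Set[str] = field(default_factory=set)


def place(hosts: List[Host], snapshot: str, cpus: int) -> Tuple[Host, bool]:
    """Choose a host for a new sandbox booted from `snapshot`.

    Prefers hosts that already have the snapshot on local disk, so boot
    needs no network copy; among candidates, picks the most free CPUs.
    Returns (host, was_local). Raises RuntimeError if nothing fits.
    """
    fits = [h for h in hosts if h.free_cpus >= cpus]
    if not fits:
        raise RuntimeError("no host has enough free CPUs")
    local = [h for h in fits if snapshot in h.cached_snapshots]
    host = max(local or fits, key=lambda h: h.free_cpus)
    host.free_cpus -= cpus
    if not local:
        # Assumed fallback: pull the snapshot once, then keep it cached locally.
        host.cached_snapshots.add(snapshot)
    return host, bool(local)
```

With two hosts where only one has `py311` cached, `place(hosts, "py311", 4)` picks that host and reports a local boot; a snapshot no host holds lands on the emptiest machine and is local there for the next request. A real scheduler would also weigh memory, disk, and isolation constraints.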
And so those are the specifics of what we came up with when we were thinking from first principles about what a computer would look like for an agent, and that’s what we created.

Benchmarks, 60ms Startup, and 50,000 Sandboxes

Swyx [00:15:02]: Yeah. I don’t know if you endorse this, but there’s someone that does ComputeSDK, and you guys do very well on there, with like the TTI, right? Is this a relevant benchmark for you guys? I don’t know.

Ivan [00:15:16]: I don’t know, and it changes every day. So today RKL is.

Swyx [00:15:18]: I don’t know what RKL is. Never heard of it.

Ivan [00:15:20]: Yeah. RK, yeah, so it is there.

Swyx [00:15:22]: You are at least a third of the next tier of performance, and then there’s a lot of other better-known names that are very slow to start.

Ivan [00:15:31]: Yeah. We’ve been number one by far for a long time, and now there are different definitions of sandboxes, different isolation patterns, different other things. So RKL literally runs it on S3, the data, so it’s very different, and they spin up a container for that, so it’s a different type of thing. So the definition of a sandbox is something that we all need to align on. But yeah, we’re insanely fast at getting these things up and running. And you can see even there that it’s 0.10 to 0.11, so.

Swyx [00:16:03]: Close enough. Yeah. What else do you need, right?

Ivan [00:16:05]: Yeah. So the benchmarks themselves - I don’t think the benchmarks equate to market ownership or revenue or anything like that. And I’ve seen this with multiple benchmarks, not just in sandboxes, but benchmarks in general.

Swyx [00:16:20]: It’s table stakes. It’s just like.

Ivan [00:16:21]: Exactly.
But it doesn’t hurt.

Swyx [00:16:22]: Just a rough check.

Ivan [00:16:22]: You definitely have to be up there and you have to be competing so that people know that, oh, this is definitely one of the top. Because this is only one dimension of what customers look for. There are other things, like how many can you spin up consecutively? There’s a feature set, there’s support, there are all different things that people look at, but you definitely have to be there on the benchmarks.

Swyx [00:16:40]: How many do people spin up consecutively?

Ivan [00:16:43]: So we have.

Swyx [00:16:43]: Or concurrently? It’s concurrency, right?

Ivan [00:16:45]: There are three metrics that we look at. One is the time to spin up one, and our time to spin up one is 60 milliseconds including network latency. So request, spin up, reply: the whole thing is 60 milliseconds. That is one. But if you wanna spin up 50,000 at once, we are now at about 75 seconds. So it takes about 75 seconds to spin up 50,000 concurrently. Some others, and there’s public data around this, take around 2,000 seconds, which is over 30 minutes. There are different variations of that. So it is speed of one, speed of many, and then how many you can consistently have up and running. And right now we basically have no limit on how much we can add, because we own our own metal. Our biggest customer runs about 850,000 every single day, so they’re just shy of a million a day, and we do have a request for half a million concurrent, which is literally half a million CPUs running somewhere. So that’s an interesting one.

Swyx [00:17:44]: They pay by like vCPU seconds.

Ivan [00:17:47]: By seconds, yeah.

Swyx [00:17:47]: Or whatever. Yeah.
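The startup figures quoted here are easier to compare as rates. The snippet below is simple arithmetic on the approximate numbers from the conversation; it assumes the 2,000-second figure for other providers refers to the same 50,000-sandbox burst, which the transcript implies but does not state outright.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(count: float, seconds: float) -> float:
    """Average launch throughput, in sandboxes per second."""
    return count / seconds


# Approximate figures quoted in the conversation.
burst_rate = per_second(50_000, 75)                 # about 667 sandboxes/s
slow_rate = per_second(50_000, 2_000)               # 25 sandboxes/s
daily_rate = per_second(850_000, SECONDS_PER_DAY)   # about 9.8 sandboxes/s

print(f"Burst: {burst_rate:.0f}/s vs {slow_rate:.0f}/s ({2_000 / 75:.1f}x faster)")
print(f"2,000 s is {2_000 / 60:.1f} minutes")
print(f"850,000/day averages {daily_rate:.1f}/s; a burst runs "
      f"{burst_rate / daily_rate:.0f}x that average")
```

The last ratio is the one that matters for capacity planning: a single burst launches sandboxes dozens of times faster than the biggest customer’s whole-day average.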
Okay, and the other thing is the sleeping and the resuming, ‘cause it’s all stateful resumption. What kind of workload are people putting through this, right? Do we measure by gigabytes in memory, gigabytes in storage, like network-attached storage? What are the costly ones out of all these features?

Workload Economics: CPU, RAM, Network, and Storage

Ivan [00:18:15]: The most expensive thing is CPU.

Swyx [00:18:18]: Okay. Yeah, of course.

Ivan [00:18:18]: The second one, yeah - then it’s RAM, then it’s disk. We actually don’t charge.

Swyx [00:18:22]: Which is snapshotting, right?

Ivan [00:18:23]: No, snapshotting’s part of it, but it’s basically the size of your hard disk, of your machine. So do you have 10 gigabytes, do you have 20, do you have 50, do you have whatever? And then the transfer of that. Right now, currently we don’t charge for network at all at Polychron.

Swyx [00:18:37]: Oh, you gotta, yeah, you gotta fix that.

Ivan [00:18:38]: Yeah. It’s a larger and larger part of our bill, so we’re working on that part there. Obviously, the hard disk is the least expensive, so for us it’s basically CPU, RAM, network, ‘cause we don’t charge the customer for it, and then hard disk; that is how it’s split up. But there are also different types of workloads, so we basically split them into two types in Daytona. One is what we call background agents, or long-running agents, and the other is basically RL and evals, which I sort of put together. They have very different patterns of usage, and if you look at the usage of a background - and I’ll just name names of companies, not specifically.

Background Agents vs. RL/Evals: Two Usage Shapes

Swyx [00:19:21]: Yeah, open, all hands.

Ivan [00:19:23]: Yeah. So a background agent is a Cognition, a Lovable, a Harvey, like all these things.
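Per-second billing across CPU, RAM, and disk, with network not charged, can be sketched as a small metering function. Everything concrete here is an assumption for illustration: `Interval`, `unit_seconds`, and `invoice` are hypothetical names, and the unit prices are placeholders picked only so the example’s per-resource costs follow the CPU, then RAM, then disk ordering Ivan describes; they are not Daytona’s prices. The second interval models a paused stateful sandbox that keeps its disk but releases CPU and RAM, which is also an assumption about how a pause might be metered.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Placeholder USD prices per unit-second (hypothetical, not real pricing).
RATES: Dict[str, float] = {"cpu": 0.000014, "ram": 0.0000045, "disk": 0.00000003}


@dataclass
class Interval:
    """A metered span during which a sandbox held fixed resources."""
    seconds: int
    vcpus: int
    ram_gib: int
    disk_gib: int


def unit_seconds(intervals: Iterable[Interval]) -> Dict[str, int]:
    """Aggregate usage into vCPU-seconds and GiB-seconds per resource."""
    totals = {"cpu": 0, "ram": 0, "disk": 0}
    for iv in intervals:
        totals["cpu"] += iv.vcpus * iv.seconds
        totals["ram"] += iv.ram_gib * iv.seconds
        totals["disk"] += iv.disk_gib * iv.seconds
    return totals


def invoice(intervals: Iterable[Interval],
            rates: Dict[str, float] = RATES) -> Dict[str, float]:
    """Price each resource; network is deliberately not metered here."""
    return {k: v * rates[k] for k, v in unit_seconds(intervals).items()}


usage = [
    Interval(seconds=600, vcpus=2, ram_gib=4, disk_gib=20),   # 10 min running
    Interval(seconds=3600, vcpus=0, ram_gib=0, disk_gib=20),  # 1 h paused
]
print(unit_seconds(usage))
print(invoice(usage))
```

Keeping usage aggregation separate from pricing means a network line item could later be added by extending the totals and the rate table, without touching how intervals are recorded.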
These are all long-running background agents. And if you look at their usage patterns, they are similar to humans, which is follow the sun. Basically, noon is probably the highest, midnight is the lowest, weekends are lower, and weekdays are higher.

Swyx [00:19:42]: Yeah, that’s a fun question. How global is it? Is it very US-centric, or?

Ivan [00:19:46]: The US is a large part, but currently we have Asia, Europe, and US regions.

Swyx [00:19:52]: So it’s quite global.

Ivan [00:19:53]: Yeah, it’s quite global. We have it all over. It’s interesting that our - I talked to you a bit about this - our number one city by users.

Swyx [00:20:01]: Hmm.

Ivan [00:20:02]: Is Singapore.

Swyx [00:20:04]: Oh, wow. Amazing.

Ivan [00:20:05]: Which is an interesting one, right? Not by revenue, just by individual head count.

Swyx [00:20:09]: Really?

Ivan [00:20:09]: Just an interesting thing.

Swyx [00:20:10]: Singapore is weirdly high in the AI adoption charts for its population. It’s like a seven, eight million population, and it keeps showing up.

Ivan [00:20:20]: No, it’s quite interesting. We were quite shocked, and I was like, “Oh, this is interesting.” And also one that’s up there.

Swyx [00:20:24]: There’s a reason I’m doing AIE in Singapore. It’s because I’m from there.

Ivan [00:20:27]: We’re there. We’re gonna be there as well. And it’s interesting that Japan, or like Tokyo, is in the top, which it has never been in all the previous tech cycles. It has never been, so it’s quite interesting that they’re.

Swyx [00:20:39]: I think the Japanese just love AI. Yeah. It’s that, and then it’s Brazil.
That’s it.

Ivan [00:20:44]: Brazil has always been in.

Swyx [00:20:45]: I think.

Ivan [00:20:46]: Even if you look at GitHub’s data, and historically with CodeAnywhere, it was always the US, Western Europe, and then you’d have India, Brazil, China; those would be there. But Singapore was not in, and specifically Japan was never in that top.

Swyx [00:21:01]: Yeah. Weird pockets.

Ivan [00:21:01]: Weird. Yeah, so it’s very global.

Swyx [00:21:02]: Okay, so actually that helps you distribute your load around the clock?

Ivan [00:21:08]: The interesting thing is we have those kinds of loads, but if you look at the researcher loads, they’re quite different. If you give them a concurrency of 10,000 or 50,000 or 100,000 CPUs at a time, when they fire off a run, it’s just 100%. And then it just runs, and then it stops. So the usage pattern is basically square waves, right? And it’s also not follow the sun, because people will fire it off at midnight before they go to sleep and then wake up, so it’s very unpredictable; you don’t know where that is. So the shapes of the usage are quite different from what we have had before. And what’s also interesting is that when it’s follow the sun, even if you have a high-growth company, you can sort of predict your usage patterns and have enough capacity for that, because it grows in a way you can project. When you have companies doing evals and RL, they’re super spiky. They’re gonna come in like, “We’re gonna use nothing, then can we have 100,000?” Right? And then go back down. And then 100,000, and back down. So it’s very different, right?
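The difference between follow-the-sun and square-wave load shows up clearly when you compute mean utilization under the constraint that capacity must be provisioned for the peak. The sketch below uses made-up hourly synthetic series (a smooth daily cycle for background agents, and a 4-hours-on, 44-hours-off burst of 100,000 CPUs for an RL run); the shapes and the helper names are assumptions for illustration, not Daytona data.

```python
import math
from typing import List

HOURS = 24 * 7  # one week of hourly samples


def mean_utilization(load: List[float], capacity: float) -> float:
    """Average fraction of provisioned capacity actually in use."""
    return sum(load) / (len(load) * capacity)


def diurnal(hours: int, base: float, swing: float) -> List[float]:
    """Follow-the-sun load: lowest around midnight, highest around noon."""
    return [base + swing * math.sin(2 * math.pi * (h % 24 - 6) / 24)
            for h in range(hours)]


def square_wave(hours: int, peak: float, on_hours: int, period: int) -> List[float]:
    """RL/eval-style load: full peak for `on_hours`, then idle, every `period` hours."""
    return [peak if h % period < on_hours else 0.0 for h in range(hours)]


agents = diurnal(HOURS, base=50.0, swing=40.0)
rl = square_wave(HOURS, peak=100_000.0, on_hours=4, period=48)

# Capacity has to cover the peak of each series.
print(f"follow-the-sun: {mean_utilization(agents, max(agents)):.0%}")
print(f"square wave:    {mean_utilization(rl, max(rl)):.0%}")
```

With these assumed shapes the smooth cycle keeps a bit over half of its peak capacity busy, while the square wave averages about a tenth, which is why spiky RL and eval customers push a provider toward capacity commitments.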
And…
Swyx [00:22:09]: Do you want to lock them into commits, so…
Ivan [00:22:11]: Yeah, we do.
Swyx [00:22:12]: Yeah, okay.
Ivan [00:22:12]: We have to lock them into some sort of commit to have that capacity, because basically we have to have capacity for peak. Right? And so right now, Daytona’s mean utilization is 15%, one-five.
Swyx [00:22:25]: Oh my God.
Ivan [00:22:26]: So it’s very low.
Swyx [00:22:27]: Because it’s very spiky.
Ivan [00:22:27]: It’s very spiky, but we get up to 90%. So we have these things. What we’re looking at right now as a company is something similar to Cloudflare, where you can move things around geographically, but that works really well for background agents, where it’s follow the sun. This is not; it’s a very different shape. Obviously with scale you figure these things out, but that’s an interesting new problem we have as a compute provider in the agent space. When we were doing the conference recently, we talked to Nikita from Neon and…
Swyx [00:22:57]: I should bring it up.
Ivan [00:22:58]: Parag from Parallel and whatnot, and everyone has the same problem: the usage is super spiky, and this is something that has not happened before. The amplitudes were never this high, right? So it’s quite an interesting use case and problem to solve.
Compute Conference and Spiky Agent Infrastructure
Swyx [00:23:12]: Yeah, I don’t know if we’re gonna bring this up again, but let’s just talk about the conference. You had like 1,000-something people at the Warriors… sorry, where is it? What’s…
Ivan [00:23:22]: Chase Center.
Swyx [00:23:23]: Chase Center.
Ivan [00:23:23]: Chase Center.
Swyx [00:23:24]: I went. It was very impressive. Obviously, you know how to throw a conference. What did you learn?
You pulled together all these impressive names.
Ivan [00:23:33]: What I…
Swyx [00:23:34]: What were you looking for?
Ivan [00:23:35]: My thesis behind the Compute Conference was: let’s bring together people who are building infrastructure for AI agents. Because when I think of what we’re building, the agent is the primary user: what are the ergonomics and usage patterns of agents, so we can build for that? And what I found, and this was a theory, it wasn’t proven, is that we all have these problems, as I touched on. As I was saying on stage, we all have the same underlying infra problems: these spiky, unpredictable workloads that we’ve never had before in human compute or human infrastructure. And again, it’s the same when I was talking to Parag or when I was talking to…
Swyx [00:24:20]: Lin. Nikita.
Ivan [00:24:21]: Lin, Nikita. Lin especially; I was talking to her the other day as well. It is a very interesting type of problem to solve. I can touch on Cloudflare, because there’s been a lot of talk recently about how they solve it: they have a bunch of geos, and as users work in different places, and depending on your tier, they can move you around the geos. That’s how they get higher utilization. But you can predict these. You’ll rarely get a spike that is 10 orders of magnitude. Let’s say one of your customers has an exponential curve; what is that to, I’m using Cloudflare as an example, 10%, 20%, whatever it is? I don’t have this data, I’m just guessing. It’s surely not 10x, right? So how do you go out and solve this problem? We’re all solving it in different ways. So we have…
Swyx [00:25:11]: She also has the same thing.
Ivan [00:25:12]: Yeah, I know specifically that Neon had that issue as well.
Like, how are we solving these spiky loads and things like that, ’cause we talked about it. And so the interesting thing for me to actually internalize was: yes, everyone building for agents first is going through this, and we’re all solving similar problems, which is quite…
Swyx [00:25:28]: Let me double-click on this. Okay. So for example, Neon: I happen to know that they’re very S3-oriented, right? They’re fully bet on S3, and they get to benefit from S3’s distribution and infrastructure. So I would imagine that Neon doesn’t have to care, whereas Lin maybe has to care a bit more, because obviously she’s doing GPU inference. And for listeners, we did an episode with her a year and a half ago. And you have to care. But like, right?
Ivan [00:25:54]: Parag cares for sure, and Nikita.
Swyx [00:25:58]: And Parag is CEO of Parallel.
Ivan [00:25:59]: Parallel, yeah.
Swyx [00:26:00]: Former CTO of Twitter.
Ivan [00:26:01]: Twitter, yeah.
Swyx [00:26:02]: They do search.
Ivan [00:26:03]: Yeah, they’re search, yeah.
Swyx [00:26:03]: You and I know, but the listeners don’t know.
Ivan [00:26:08]: Yeah, we can put it up on the screen, ’cause when we were talking…
Swyx [00:26:11]: I’ll put it up on the screen.
Ivan [00:26:12]: Yeah, right.
Swyx [00:26:12]: People can look it up if they need to.
Ivan [00:26:14]: Look it up. And yes, but they still have CPU and RAM allocation that has to be up and running. CPU and RAM, you have to allocate that and have it ready. And so there are basically two ways to do it.
One is you over-provision so you can handle the bursts. Or two, you have, I don’t know if this is a term, just-in-time compute: as your usage comes in, you fire off requests for VMs or bare metal at other cloud providers and get them up and running.
Swyx [00:26:43]: This is if you go above 100%, right?
Ivan [00:26:45]: Yeah, this is…
Swyx [00:26:46]: Like your overflow.
Ivan [00:26:46]: Your overflow, spillage, or whatever you call it.
Swyx [00:26:48]: You probably lose money on it, but it doesn’t matter, right?
Ivan [00:26:50]: Well, you might, you might not. It is a more cost-effective way to do it, but it’s a slower way. Because what you have to do is queue your requests, spin up this just-in-time compute, get it all ready, provision it, and then move your workload there. If time isn’t that important, that’s fine, and you can do that. But for your customers, especially, let’s say, for RL training runs, the reason a lot of people come to us is that GPUs are more expensive than CPUs, right? So you want your GPU running at 100% the entire time. When you’re running runs on CPUs, and one CPU cycle winds down and the next one spins up, you want that to be instantaneous so your GPU doesn’t go idle, right? And if you then have to go out and provision machines, you’re essentially telling the GPU that it has to wait, and that incurs a cost. So there are things you have to try to solve for there.
RL Workloads, Declarative Images, and Kubernetes Replacement
Swyx [00:27:43]: Yeah, let’s talk about the different workload, right? You said that, what was it, a few months ago you had zero RL workload and now it’s 50%.
Ivan [00:27:52]: It will be this one, 50%, yeah.
Swyx [00:27:54]: Let’s talk about how different it is, right?
Like, I imagine, for example, a lot less dynamic generation of arbitrary code. Here, it’s probably all the same code, and you’re just doing parallel runs or something, I don’t know.
Ivan [00:28:05]: Yeah. It depends, but for each run you’ll have a snapshot. And for the most part, they actually do use our declarative image builder, which is like, “Oh, the agent wants these dependencies, these env vars.”
Swyx [00:28:17]: These ones, yeah.
Ivan [00:28:18]: Yeah, the declarative image builder, it…
Swyx [00:28:20]: Which is a very Modal-like thing that they…
Ivan [00:28:22]: Yeah. So we build it on the fly, then we propagate that snapshot, and you can spin up as many sandboxes as you want against that snapshot. Then if you have to make changes, the model can do it, or it could also be automated: “Oh, for the next run we need to install these things or remove these things to get a task done,” and then it goes off and runs that. So yes, that is something they seem to prefer. The number one reason I’ve found… or, let’s take a step back. What we are competing against in that environment is essentially managed Kubernetes: EKS, GKE, whatever. That is what the vast majority run on. And anyone who has tried Daytona versus GKE or EKS is like, “I’m never going back.” There are a few reasons. One is the ergonomics. If you’re using Kubernetes to spin that up, you essentially have to manage those interface interactions yourself. Daytona, although it’s a compute provider, is more akin to Twilio or Stripe from a consumption perspective than to AWS: you have an API and an SDK, and it’s easy and seamless to get these things up and running. That’s one. The other is the speed at which we spin up, which we mentioned earlier and which is much faster, and the scale we can go to.
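Ivan describes the following snapshot flow:

- declare dependencies and env vars;
- build the image once;
- spin up as many sandboxes as you want from that snapshot;
- rebuild only when the next run changes the spec.

That flow maps naturally onto a content-addressed cache. The sketch below is a hypothetical Python illustration. `ImageSpec`, `SnapshotRegistry`, and the snapshot and sandbox ID formats are invented for this example and are not Daytona’s SDK. The “build” is just a counter standing in for real image construction.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageSpec:
    """Hypothetical declarative image spec: base image, packages, env vars."""
    base: str
    packages: tuple[str, ...] = ()
    env: tuple[tuple[str, str], ...] = ()

    def key(self) -> str:
        # Canonicalize (sort packages and env) so equivalent specs hash identically.
        canonical = json.dumps(
            {"base": self.base, "packages": sorted(self.packages), "env": sorted(self.env)},
            sort_keys=True,
        )
        return hashlib.sha256(canonical.encode()).hexdigest()[:12]


class SnapshotRegistry:
    """Builds each distinct spec once, then fans sandboxes out from that snapshot."""

    def __init__(self) -> None:
        self.builds = 0  # counts (simulated) image builds
        self._snapshots: dict[str, str] = {}

    def snapshot_for(self, spec: ImageSpec) -> str:
        k = spec.key()
        if k not in self._snapshots:
            self.builds += 1  # stand-in for the expensive build step
            self._snapshots[k] = f"snap-{k}"
        return self._snapshots[k]

    def spin_up(self, spec: ImageSpec, n: int) -> list[str]:
        snap = self.snapshot_for(spec)
        # Every sandbox ID references the same snapshot; no per-sandbox rebuild.
        return [f"{snap}/sandbox-{i}" for i in range(n)]


registry = SnapshotRegistry()
run1 = ImageSpec("python:3.12", ("numpy", "pytest"), (("SEED", "1"),))
boxes = registry.spin_up(run1, 1_000)
# Next run: the agent adds a dependency, so exactly one new snapshot is built.
run2 = ImageSpec("python:3.12", ("numpy", "pytest", "ruff"), (("SEED", "1"),))
registry.spin_up(run2, 1_000)
print(registry.builds)  # 2
```

Keying on a hash of the canonicalized spec is one simple way to get “build once, fan out many”. Specs that differ only in package order reuse the same snapshot, and any real change to dependencies or env vars produces a new one.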
We haven’t got into features, but an interesting one is that it’s very hard to OOM, or run out of memory in, our sandboxes, because we can dynamically, on the fly…
Swyx [00:29:48]: Resize.
Ivan [00:29:49]: Resize, which is impossible on almost anything else. There are some technologies that enable you to do that, but it’s a very hard thing. And we actually saw this when the Terminal-Bench team brought us in. So thank you, Alex and the team, who brought us into this whole space.
Swyx [00:30:05]: It’s just very rare that a framework would just say, “Guys, just use Daytona.”
Ivan [00:30:11]: Yeah, I think it says it somewhere. Yeah.
Swyx [00:30:13]: Yeah. I was like, “What is this?”
Ivan [00:30:15]: There are multiple options there, and they also mention a few other places. And so Daytona specifically… just jumping on themes here, I don’t know where it says Daytona.
Swyx [00:30:27]: There.
Ivan [00:30:27]: Doesn’t matter.
Swyx [00:30:28]: There’s a very strong recommendation, which is very unusual.
Ivan [00:30:33]: We do not pay them for this.
Swyx [00:30:34]: I know, yeah. They just like you.
Ivan [00:30:35]: Yeah, they like us. And also, Daytona has multiple isolation sets underneath. The customer doesn’t have to know what they are. But basically we have Docker, which is a container, hardened with Sysbox. So it’s Docker isolation that is security-equivalent to a VM, but it’s still a container. That is the default, and especially in these training workloads, they really like being able to use just a basic Docker container as the interface, and we enable Docker-in-Docker. For these RL runs, if you need to do a Docker Compose or Kubernetes, you can spin up k3s inside these things, which unlocks a huge number of workloads that you cannot do on other providers. So just that part is much more interesting.
And so we went through that. We showed them that we could do it, and they enjoyed that quite a bit. They being the Terminal-Bench people.
Swyx [00:31:28]: Those people, yeah.
Ivan [00:31:29]: And the Harbor people.
Swyx [00:31:29]: Harbor people: are they a company yet?
Ivan [00:31:33]: As far as… I do not know.
Customer Pull, Slack Connect, and the Computer Use Bet
Swyx [00:31:35]: Okay. All right. Yeah. It’s super obvious that there’s a lot of excitement and success around these things. Okay, so tell us more, right? This is an exploding workload, and Harbor adopted you, which helped speed things along. But what are you learning as this new workload comes online?
Ivan [00:31:53]: There are a couple of things we learned, which we chatted about in the beginning. This has led our story: as we mentioned, we talked to a lot of customers along the way, and we add more features and more tool sets as we talk to customers. And it’s interesting, and I think it’s that the ecosystem is so small and/or the models are getting smarter: when one user comes with a request, three to five more customers come with the same request that same week, and we know it goes on the roadmap. It’s very bizarre. It happens so many times.
Swyx [00:32:27]: Because they’re all friends.
Ivan [00:32:28]: Sorry?
Swyx [00:32:28]: They’re all friends. They’re all in the same group chat.
Ivan [00:32:30]: Yeah, probably, yeah. They’re like, “Oh, can you do this?” And I’m like, “Okay, this is interesting. We’ll put it in as a feature request.” And then the next one’s like, “Oh, can you do this?” “Okay.” It’s all the same, right? It’s always the same. So what we try to do, and what I personally try to do, is be on as many quote-unquote “sales calls” as I can. I’m in every Slack channel. We literally have about 1,000 Slack Connect channels, something like that.
There are so many interesting things you find out when you’re in all the Slack channels. You can also see when people move between companies: you see them leave one Slack channel and enter another. It’s an interesting thing. Also, I digress, but I feel that Slack Connect is literally what LinkedIn should be. You have a list…
Swyx [00:33:08]: LinkedIn charges you to use your own connections, but Slack doesn’t, right? Slack is like, do it for free. It’s more lock-in. It’s great.
Ivan [00:33:15]: Yeah. It’s amazing. Yeah. It’s one of the reasons…
Swyx [00:33:17]: You’re gonna pay Slack for life.
Ivan [00:33:18]: Exactly. You’re there for life. So that’s interesting. And one of the newer things we were talking about earlier is that we made a big bet and put a lot of investment into computer use, which hasn’t publicly seen the light of day. We haven’t GA’d it yet, but we have…
Swyx [00:33:32]: Is there a thing I can pull up?
Ivan [00:33:33]: There is computer use there. It’s up a bit.
Swyx [00:33:36]: Oh, yeah. Okay.
Ivan [00:33:38]: What we’ve talked about and what we’ve seen publicly is this theme now of the human emulator, and Elon from xAI has talked about this publicly. If you think about the models today, they’re actually quite sophisticated and can do a lot of work, but they still don’t have access to all the tools. I’m a strong believer that the most efficient way for an agent to work is essentially headless, through a terminal or whatnot. But if we look at knowledge work in general, there are about 100 million knowledge workers in the US and about a billion in the world, and their salaries aggregate to $10 trillion in the US and $50 trillion worldwide.
Swyx [00:34:24]: Wow.
Ivan [00:34:25]: Something like that. And if we look at the five most important sectors of that, like healthcare, government, financial services, and whatnot, that’s about 56% of it.
So let’s say it’s about half. So in the US it’s about $25 trillion, and most of that work is actually still locked into legacy apps inside Windows, which are not going anywhere for a very long time. People just won’t invest in replacing them. How much of it? Our assumption is the following: in the RPA market, which is a similar market, though not the same, 25% of these white-collar workers’ work is automated. If an agent is more sophisticated, can go through more runs, and can figure stuff out, let’s say it’s 40%, right? And if you take 40% of that, you get to essentially $10 trillion a year.
Swyx [00:35:17]: That’s a TAM.
Ivan [00:35:18]: That is a TAM. That’s the TAM of the models, right? That’s not essentially ours. But to get to that size, you essentially have to give agents these computers with the legacy apps. So computer use: Mac or Windows or Linux. Linux we obviously have, and others have it too. But Windows specifically is something very new, and the only options right now are an EC2 instance with Windows, or Azure. Both take anywhere from three to five minutes to spin up. We’ve created an actual sandbox, so it’s a second instead of milliseconds, but you have point-in-time snapshots, you have forking, you have all the things you have in a sandbox, and it essentially enables you to hopefully unlock all this value. So that’s been our big push and bet, but we’ve kept our ear to the ground: what are the next things in the market?
RPA Returns: Why Agents Still Need Computers
Swyx [00:36:06]: Yeah, knowledge work, and sort of the next wave of RPA. I got very excited about RPA during COVID times. UiPath was IPO-ing, and it was very hot. Isn’t it Eastern European?
Ivan [00:36:20]: It is, Romanian.
Swyx [00:36:21]: Romanian? Yeah, it might be the only big Romanian unicorn. Okay, yeah.
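Ivan’s back-of-the-envelope TAM can be written out as a few lines of arithmetic. The inputs are the rough, spoken estimates from the conversation, and the variable names are ours. The arithmetic makes one thing visible. The ~$25 trillion in-scope figure is half of the ~$50 trillion worldwide salary base. Half of the ~$10 trillion US figure would be ~$5 trillion. So this sketch uses the worldwide base, which is also the base that yields the ~$10 trillion result.

```python
# Rough spoken estimates from the conversation, in USD per year (illustrative only).
WORLD_KNOWLEDGE_SALARIES = 50e12  # ~1 billion knowledge workers worldwide
KEY_SECTOR_SHARE = 0.5            # "about 56%", rounded down to half
RPA_AUTOMATED_SHARE = 0.25        # assumed share automated by classic RPA
AGENT_AUTOMATED_SHARE = 0.40      # assumed share a more capable agent could reach


def automatable_spend(salaries: float, sector_share: float, automated_share: float) -> float:
    """Annual salary spend in scope, times the fraction of that work automated."""
    return salaries * sector_share * automated_share


in_scope = WORLD_KNOWLEDGE_SALARIES * KEY_SECTOR_SHARE
rpa = automatable_spend(WORLD_KNOWLEDGE_SALARIES, KEY_SECTOR_SHARE, RPA_AUTOMATED_SHARE)
agents = automatable_spend(WORLD_KNOWLEDGE_SALARIES, KEY_SECTOR_SHARE, AGENT_AUTOMATED_SHARE)
print(f"in scope ${in_scope / 1e12:.1f}T, RPA-level ${rpa / 1e12:.2f}T, agent-level ${agents / 1e12:.1f}T")
```

This works out to $25.0T in scope, $6.25T at the RPA-level assumption, and $10.0T at the agent-level assumption. The gap between the last two figures is the whole bet: it rests entirely on the assumed jump from 25% to 40% automation.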
I don’t, I don’t have a… I think there’s a stage being set for the resurgence of RPA, ’cause everyone understands that no one wants to deal with these shitty apps and no one’s gonna rewrite them. You just have to do remote, programmatic operation of them.
Ivan [00:36:45]: If you wanna unlock it… my own setup was basically the following. I was doing a board deck recently, last month, whatever, and I’m like, “Okay, let’s just automate it.” All our data’s in ClickHouse and PostHog and QuickBooks, where everyone else’s is, and I basically connected all of that to my Claude Code: “Go off, here are the integrations, go do that.” It pulled out the first report, which was great. It connected to Brex and all these things and pulled the data, which was great. Then I said, “Okay, now pull out this, and this,” and I kept getting really well-designed, McKinsey-style reports, but the data said “partial data.” All the missing data, partial data: it couldn’t access everything, and I got so frustrated. So I got my Mac Mini virtual sandbox with OpenClaw. I gave it its own account in our company, and then I went to all these services and created read-only accounts, so literally like an intern in your company. And I would say, “Now go and do this report,” and it would hit the same thing: “I can’t get all the information via the MCP or the API or whatever.” I’m like, “Go log in.” And it will log into the website, go in, export the data, and do the thing end to end. So even for services that have APIs today, not all of it is exposed. I get immense value right now, but it has to be through computer use, unfortunately, so I spend a bunch of tokens just on that, but I get the job done.
And so if even a startup like ours, using all the hottest tools, still needs a computer-use agent, what hope does Goldman have of going headless, right?
Swyx [00:38:22]: Yeah. Why isn’t Microsoft doing this?
Ivan [00:38:27]: I’m pretty sure Satya had a post yesterday…
Swyx [00:38:29]: Oh, okay. I see.
Ivan [00:38:29]: Which was like, “Every agent needs a computer.”
Swyx [00:38:31]: I see, I see.
Ivan [00:38:32]: So they have launched something recently.
Swyx [00:38:34]: Yeah, they have Microsoft Power Automate. I’m sure they’re gonna have their version.
macOS Sandboxes, Apple Constraints, and the Windows Opportunity
Ivan [00:38:39]: Version of that, yeah.
Swyx [00:38:39]: You’re gonna try to do yours. I know there’s always demand for Mac, but I know it’s tricky to host macOS sandboxes.
Ivan [00:38:49]: We will have macOS sandboxes fairly soon. The problem with macOS sandboxes is… I’m deep in this, I don’t know how interesting it is.
Swyx [00:38:55]: No, it’s…
Ivan [00:38:56]: macOS has this problem.
Swyx [00:38:57]: It’s a licensing thing, right?
Ivan [00:38:58]: A licensing thing. One, you’re allowed to run only two parallel VMs per machine. Two, you can only license to a different user every 24 hours. So theoretically, if I wanna charge you per second and you use it for one second, I have to leave it idle for the rest of the day; I can’t have anyone else using it. So the pricing will be different, in the sense that we would have to charge for 24 hours. And that’s not even the most difficult thing. The thing above that is that, from a security perspective, they enable you to do memory snapshots, pause, and resume, but only on the same physical drive, the same physical machine. In the Windows or Linux world, I can move your snapshot from one machine to another in the background and manage load, right?
Here, if you wanna do that, you essentially have to have your…
Swyx [00:39:49]: Yeah, snapshots. Yeah.
Ivan [00:39:50]: Your…
Swyx [00:39:51]: It’s like…
Ivan [00:39:51]: Physical machine.
Swyx [00:39:52]: You can’t break it up.
Ivan [00:39:53]: You can’t move things around, and all of that is, from a security standpoint, if it is written… Like, I understand the security aspect of that, but it prevents you from doing these really scalable agentic workloads.
Swyx [00:40:08]: You need to do a vibe-coded clean-room implementation of macOS that you can then… That’s like Clean OS or something. I don’t know.
Ivan [00:40:17]: So. We have…
Swyx [00:40:18]: ’Cause Linux was originally a clean-room rewrite of Unix.
Ivan [00:40:21]: Okay. Yeah.
Swyx [00:40:21]: Or something like that, right? Same thing for macOS. Someone needs to do it.
Ivan [00:40:25]: Someone will do that, and someone will have some long-running agents going for a few days to figure this stuff out. But yeah, we’re really close to offering something, ’cause people do want it, but the pricing will be different, and the feature set will be more limited.
Swyx [00:40:38]: Yeah, nobody’s gonna use this. Well, the labs will, because they want to automate macOS.
Ivan [00:40:42]: They have to do RL. They have to do RL again. But the point with the RL part is that if you do RL on macOS, then when the next iteration of the model comes out, it will be able to use these tools significantly better. Then you actually need to run that somewhere. So you’re gonna have to have that later on. And if anyone at Apple is listening, I very much feel that they are shooting themselves in the foot, given the scale of compute or licensing revenue they could get if they would just enable a concurrency model similar to what you can get on Windows and Linux.
Swyx [00:41:17]: Yeah. Yeah. And I’m sure they’ve heard this before.
They just don’t care. And maybe they will change their minds with the new CEO.
Ivan [00:41:24]: Yeah. We’ll see.
Swyx [00:41:25]: We’ll see.
Ivan [00:41:25]: High hopes.
Swyx [00:41:26]: High hopes.
Ivan [00:41:26]: High hopes.
Swyx [00:41:27]: Okay. But it’s very clear the market opportunity in Windows is huge, and you can go for a long time on just Windows, but your customers are gonna want both. And it is interesting to me that this is sort of the God application of agents, right? How big was OpenClaw for you guys? Was there a significant bump?
OpenClaw, Agent Labs, and the B2B2C Sandbox Market
Ivan [00:41:54]: Not for us, because we…
Swyx [00:41:54]: Because you already…
Ivan [00:41:55]: We’re kind of positioned differently. Although it’s completely PLG and we have individual developers who use it, most of Daytona’s usage is B2B or B2B2C. In the researcher world, it’s B2B: you’re selling to labs and neolabs and things like that. But for long-running agents, from a scale-revenue perspective, it’s mostly B2B2C, where you have an app-layer agent that uses you at big scale.
Swyx [00:42:26]: Like a Manus. Yeah.
Ivan [00:42:28]: Like a Manus or Lovable type of thing.
Swyx [00:42:31]: Yeah. I think that’s the question of, well, how… Yeah, B2B2C is basically what I’ve been calling an agent lab: you’re not a model lab, but you’re making a very good wrapper, a platform other people can sign up for so they don’t have to code those things. Yeah, it sounds like a much better market than the direct OpenClaw market.
Ivan [00:42:56]: I’ve done multiple things. The CodeAnywhere part of our career path, R in the calendar, was very much an end-user developer product. And so that is great.
You can get a lot of developer love, and I feel that we as a company do have a bunch of developer love. But it’s a different type, where it’s people building these things. Again, it’s more akin to Twilio, because as a person you wouldn’t really run Twilio. I don’t know how many people remember the “Ask Your Developer” billboards and whatnot. People really loved Twilio, but they only used it inside something like, “Oh, I’m building this app or service.” And we’re very much directly that. And you also know that I used to work for a competitor of Twilio, so it’s kind of ingrained in my DNA.
Swyx [00:43:35]: People don’t know Infobip is that big.
Ivan [00:43:38]: Yeah, it’s…
Swyx [00:43:39]: Because…
Ivan [00:43:40]: It’s a billion euro…
Swyx [00:43:40]: They’re all American. They’re like, “Whatever’s in Europe doesn’t matter to me.” But it’s the same size or bigger? Same size?
Ivan [00:43:46]: It’s about half the size.
Swyx [00:43:47]: Half the size?
Ivan [00:43:48]: Yeah, about half the size.
Swyx [00:43:48]: It’s like, yeah.
Ivan [00:43:48]: Still huge. Multiple billions a year. Yes.
Swyx [00:43:51]: That’s crazy.
Ivan [00:43:51]: Exactly. These are really interesting, large, revenue-generating, very sticky businesses. Whereas when your focus is the end developer, it’s a very hard sell, because they’re very price-sensitive, very price-conscious. And it’s very hard to scale. Your cap is the number of people who are willing to spin one up, first of all, and then spin up multiple of them. Whereas in enterprise, we know everyone’s talking about how many tokens they’re spending. A lot of companies today are like, “At our company, spend as much as you can.” Basically, that is where we’re going.
And so if you think about that paradigm, where you’re selling to companies that say, “Spend as much as you can to generate productivity,” versus, “Oh, I’m a single person, I have this much budget, and I’m doing this because it’s fun or it’s helping me out,” it’s a different go-to-market strategy, I think.
MCP, CLIs, and Sandboxes as the Agent Runtime
Swyx [00:44:50]: Yeah, there’s a lot of discussion. I’m just going through the mental list of things that are in your favor, for example, MCP versus CLI. Obviously you want CLI; it’s been very good for you. I feel like it’s maybe a drop in the bucket, or maybe it’s huge. I’m just checking whether these are big trends.
Ivan [00:45:10]: Those things work well in our favor, to your point, just because every…
Swyx [00:45:13]: They’re kind of a drop in the bucket, right?
Ivan [00:45:15]: I think it’s all the things coming together. There are so many things that impact it. To your point, OpenClaw wasn’t huge for us, but having the Agent SDK from Anthropic, or Claude Code, was very interesting. The reason it was interesting is that a lot of, let’s call them app-layer agent companies, I don’t know what to call them, are essentially like, “Oh, I can create this new app, this new agent. All I need is to use Claude Code, throw it into a sandbox, and then I have my interface to the human.” That enabled so many more companies to actually offer this, and then they would pull on sandboxes. So that was interesting. And to your point about MCP versus the CLI: MCP is an interface against an API, whereas with the CLI you can actually go do things. That’s the difference between integrations and actually running scripts or data analysis against a thing.
So being able to use a CLI very well enables the agent to do more things, because people will invoke a sandbox, run the CLI in it, and it’ll do analysis on that data and give you an actual result, versus just pulling data from an API source.
Swyx [00:46:29]: Yeah, it’s a layer of indirection, basically. It’s the same thing as agentic search versus RAG, where you’re…
Ivan [00:46:34]: Exactly, yeah.
Swyx [00:46:34]: You just win whenever people put more agents into their workflow. So it doesn’t really matter, but I’m just teasing out what else people have heard about, like, “Oh yeah, this is another sandbox use case. Oh yeah, that’s another one.” Am I missing any big ones?
Ivan [00:46:51]: The thing is the computer-use stuff, which I think is probably the most interesting one. To your point, we’ve talked to so many people over the last year: “Oh, why do you need a sandbox? Why do you need this? Why this?” And it’s like, “Oh, I need a sandbox for this. I need a sandbox for that.” It’s like, “Oh, I need it for every single thing.” And it sounds like a broken record, but you use a laptop every single day, right? And you are n of one. It’s just you. But now imagine… And by the way, the PC market is about equal to the cloud market in total: about 150 to 180 billion a year, something like that. Roughly, the three cloud hyperscalers are about equal to Apple, HP, Lenovo, whatever. It’s a little bit less, but it’s sort of like that. So how big is the addressable market? How many people are there in the world now? What’s the latest data?
Swyx [00:47:45]: Let’s call it eight billion.
Ivan [00:47:46]: Eight billion.
And so let’s say you can have two computers, one personal and one business, whatever. So it’s double that, right? That’s 16 billion. How many agents are gonna be running in two years, in 10 years, in 100 years? And for every single task, they will need one of these. So how big is that? That market is essentially, quote-unquote, “infinite.” You will get to the point… Dylan Patel from SemiAnalysis, who usually talks about GPUs, was at the conference talking about how CPUs will now be the bottleneck, because they will be the constraint. We won’t be able to grow, because there won’t be enough CPUs to basically do it.
Swyx [00:48:23]: Yeah. Well, I actually had a really good podcast with Doug O’Laughlin, who is president at SemiAnalysis, where they’ve basically been saying: yeah, it was a GPU shortage first, but then it cascaded down to memory and now to CPUs.
Ivan [00:48:35]: CPU, yeah.
Swyx [00:48:35]: What’s next? Networking. Networking has actually been in shortage for a while if you’re looking at GPU networking. But yeah, it’s really crazy the amount of computer use that’s going on. Cool. Other questions: one very big part is the open-sourceness, which you didn’t have to do and your competitors don’t do. A lot of people are worried about keeping their projects open source because some competitor can just fork it. I don’t know if you have any reflections on being an open-source company.
Open Source, Trust, and Enterprise Procurement
Ivan [00:49:15]: Yeah. There are a bunch. The original product we did was open source.
Swyx [00:49:19]: Yeah. CodeAnywhere.
Ivan [00:49:20]: Doing that was actually very good for us. There’s basically a saying… what’s the saying?
Companies that are doing really well measure themselves against free cash flow; companies that are kind of okay, EBITDA; and it goes all the way down from there.
Swyx [00:49:36]: The worst is GitHub stars.
Ivan [00:49:37]: GitHub stars. GitHub stars are the worst, yeah. So you go all the way down to GitHub stars, and our original metric was GitHub stars. That’s what we talked about. Now we’re at the point where we talk about revenue, so we’ve gone up the stack on that.
Swyx [00:49:47]: Not profit?
Ivan [00:49:48]: Yeah. We haven’t... we’ll get there. We’ll get there. But back then we tracked GitHub stars, and it was useful. In the original version, we split the core into its own repo under Apache 2.0, so very permissive, and then we bundled it on the enterprise side with a proprietary repo. So it was open core, and the repository was very clean. When we did the pivot, we didn’t have time to rethink this. We had this open source community, and it felt like a shame to give that up, but we still wanted to add some restrictions. So in the new sandbox product we used AGPL-3.0, which is kind of a shortcut: you are open source, and it is true open source in the sense that an enterprise can use it if it wants, but you essentially can’t build a competitor without open sourcing your own code.
Swyx [00:50:42]: It’s one of three approaches. There’s BSL and some others, like the Elastic License.
Ivan [00:50:47]: Yeah, there are some others. Pure open source believers say this is not full open source, and I totally respect that. That is absolutely true, but that’s where we landed. And in Daytona, in its essence, everything outside of what’s under a feature flag today, like the Windows stuff, the GPU stuff, and so on, is open source. It is there.
So everything is there, including our own scheduler. I’ve had some competitors say, “You guys are actually open source open source. You’re real. You can actually see it.” People do like that, and it has helped a bit, but it has helped more with consumption of our cloud product than with converting people over. The reason is that you send the repository to your agent when you’re integrating Daytona, and it just has more context. It’s like, “Oh, okay. This is why this is happening. This is why that is happening.”
Swyx [00:51:41]: You could equivalently just have docs that you can... yeah, okay.
Ivan [00:51:45]: I agree, but to be fair, it doesn’t really help growth significantly today. We’ve had this conversation with investors and other people: “How do you convert people...
Swyx [00:51:56]: Dude...
Ivan [00:51:56]: ...from open source?”
Swyx [00:51:57]: The open source business conversation is so all over the place, right? For listeners who maybe haven’t thought this through: a lot of people say, “It’s our free tier,” right? Like, “You can run it yourself, but when you get serious, call us.” And then for me personally, because of my Temporal experience, open source is actually the GTM into some of the largest companies, where we wouldn’t pass their review process, maybe because we’re too young a company, or because parts of the stack just don’t work for them. But because it’s open source, they adopt it, and later on we figure it out. That’s the low end and the high end. I don’t know if it...
Ivan [00:52:37]: No, absolutely, and that has historically been the case.
What we have found in this AI transition, and we haven’t talked about this yet, is that Daytona’s customers range from the single developer and the YC startup to the Fortune 500, I’d even say the Fortune 5, the biggest companies in the world.
Swyx [00:52:55]: Big neolabs. You told me about them; we’re going to keep them anonymous.
Ivan [00:52:59]: All the enormous companies, right? And because the market pull is so strong, we’re able to circumvent these processes. I’m not saying we skip anything; we pass security audits, we pass all these things. But as you mentioned with Temporal, way back in the day, in our old version of Daytona, it took us months, and usually at the end they would churn off because, “Oh, you’re too small a company. We don’t trust you enough.” Whereas today we’ve had these large companies push us through. Usually, going through procurement to become a vendor to a large company takes two or three months. We get it done in five days now. That’s not saying we’re great; I think it’s more a sign of where the market is today. So from a go-to-market perspective, we don’t think about open source that much, because everything we’ve done so far has been PLG through the cloud product: people signing up and pulling us inward.
GitHub, Agent-First Versioning, and CI Bottlenecks
Swyx [00:53:53]: Yeah, this is a personal interest, and I don’t know if you have an answer, but do you have problems with GitHub?
Ivan [00:54:02]: I do. A little bit. A little bit.
Swyx [00:54:04]: Yeah. Tell me, tell me. Because I’m thinking about what it would take to replace GitHub.
Ivan [00:54:09]: There are a lot of things. I’ve thought about this, I’ve tweeted about this, and I’ve looked at some companies.
I’ve actually invested personally in some.
Swyx [00:54:17]: Is it Entire?
Ivan [00:54:18]: No, I haven’t done that one.
Swyx [00:54:18]: No? Okay.
Ivan [00:54:19]: Yeah, but I’ve met Thomas, virtually, and we’ve talked. Here was my reasoning. We have a bunch of background, long-running agents, and for now most of them are coding agents. Everyone was building a competitor to Lovable or Devin or whatnot. What we saw from our customers was that they were all trying to figure out how to do versioning. Everyone was doing it in different ways, some of them really weird, and the reason was that GitHub as-is was overhead. It wasn’t fast enough for what they needed, and it didn’t solve their problem. And to be fair, GitHub is for after the inner loop, right? It’s post-laptop.
Swyx [00:55:07]: Yeah, GitHub is the point at which the outer loop starts.
Ivan [00:55:11]: So people started using it for sandboxes, which are the inner loop, the part that usually happens on your laptop. That’s not what GitHub is made for. And then we saw everything... actually, the most interesting one was a customer that would literally take the entire codebase inside the sandbox and, every so often (I forget the interval), dump it all into a JSON file and push that to S3. And that’s it.
Swyx [00:55:37]: Make your own Git.
Ivan [00:55:38]: But it’s not even that; there are no diffs, it’s the whole thing every single time. They did it because it was super fast. It didn’t matter. Then they would go back, search to find the file they wanted, write it back, and so on. Because it’s text files in JSON, the snapshots are very small, so the network cost is very low. They didn’t care, and they just did it that way.
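The snapshot-to-S3 approach Ivan describes (no diffs, the whole tree every time) is simple enough to sketch. This is a minimal illustration under stated assumptions, not that customer’s code: it serializes only text files into one JSON document, and a plain dict stands in for the S3 bucket. All function names are hypothetical, and no real S3 API is used.

```python
import json
import time
from pathlib import Path


def snapshot_tree(root: Path) -> str:
    """Serialize every text file under `root` into one JSON document.

    No diffs: each snapshot holds the full tree. Files that are not valid
    UTF-8 are skipped (an assumption of this sketch).
    """
    files = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            try:
                files[path.relative_to(root).as_posix()] = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue
    return json.dumps({"taken_at": time.time(), "files": files})


def push_snapshot(store: dict, key: str, blob: str) -> None:
    """Hypothetical stand-in for an object-store PUT; here it just fills a dict."""
    store[key] = blob


def find_file(store: dict, rel_path: str) -> list[tuple[str, str]]:
    """Search every stored snapshot for one file; return (snapshot key, content) pairs."""
    hits = []
    for key in sorted(store):
        files = json.loads(store[key])["files"]
        if rel_path in files:
            hits.append((key, files[rel_path]))
    return hits


def restore_file(store: dict, key: str, rel_path: str, root: Path) -> None:
    """Write one file from a chosen snapshot back into the working tree."""
    content = json.loads(store[key])["files"][rel_path]
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
```

Because every snapshot is complete, restoring a file never requires replaying history; the cost is that storage grows with each snapshot, which, as Ivan notes, barely matters when the payload is small text.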
And I’m like, if people are doing this, there needs to be a new solution to this problem, right? So for me it’s quite interesting to look at who is building these new, agent-first things. I think Git as-is will still exist in the future, maybe even GitHub will, but there will be a whole new sort of...
Swyx [00:56:15]: Yeah, exactly. Git is like the deploy artifact that kicks off CI/CD. But there’s a layer before that: the agent collaboration layer.
Ivan [00:56:23]: Yeah. So I think something needs to be built there. Another interesting thing is CI right now. The number of PRs being created is insane, in general.
Swyx [00:56:33]: Even for you guys, right?
Ivan [00:56:34]: Everyone is creating a bunch of PRs. Everyone. And then all of that has to go through CI, and that’s the bottleneck. Everyone’s bottleneck. Not just GitHub Actions; go to any CI provider, and if you have a high throughput of PRs, you won’t be able to keep up. There’s one company we’re talking to that does 1,000 PRs a day, and they’re just waiting. They just have a queue, right?
Swyx [00:56:55]: What do they use, Buildkite?
Ivan [00:56:58]: I don’t know what they...
Swyx [00:56:59]: Circle?
Ivan [00:57:00]: Whatever they use.
Swyx [00:57:00]: Technically, your tech can be used for CI.
Ivan [00:57:03]: That was the conversation. That was the conversation.
Swyx [00:57:06]: Is that a serious conversation?
Ivan [00:57:08]: We’ll see how that goes. We’ve had quite a few conversations around that.
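A rough way to see why 1,000 PRs a day turns CI into a queue is Little’s law: the average number of runs in flight equals the arrival rate times the time each run takes. The sketch below assumes every PR triggers exactly one CI run of a fixed, hypothetical duration, and that PRs arrive evenly across an "active window"; real traffic is burstier, so treat the results as a floor. The function names and the 20% headroom default are illustrative assumptions.

```python
import math

MINUTES_PER_HOUR = 60


def avg_busy_runners(prs_per_day: float, minutes_per_run: float,
                     active_hours: float = 24.0) -> float:
    """Little's law, L = lambda * W: mean CI runs in flight."""
    arrivals_per_minute = prs_per_day / (active_hours * MINUTES_PER_HOUR)
    return arrivals_per_minute * minutes_per_run


def min_runners(prs_per_day: float, minutes_per_run: float,
                active_hours: float = 24.0, headroom: float = 0.2) -> int:
    """Smallest runner pool covering the average load plus a safety margin."""
    return math.ceil(avg_busy_runners(prs_per_day, minutes_per_run, active_hours)
                     * (1 + headroom))


def daily_backlog(prs_per_day: float, minutes_per_run: float, runners: int,
                  active_hours: float = 24.0) -> float:
    """Runs left unprocessed per day when demand exceeds pool capacity (0 if it keeps up)."""
    capacity_per_day = runners * active_hours * MINUTES_PER_HOUR / minutes_per_run
    return max(0.0, prs_per_day - capacity_per_day)
```

At an assumed 20 minutes per run, 1,000 PRs spread over 24 hours keep about 14 runners busy on average; squeeze the same PRs into an 8-hour workday and that average triples. Any pool smaller than the average falls further behind every day, which is the queue the company in the conversation is stuck in.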
We are not a CI provider by any means, right?
Swyx [00:57:13]: But what’s missing?
Ivan [00:57:15]: No, so essentially...
Swyx [00:57:17]: Nothing.
Ivan [00:57:18]: ...you could use a Daytona sandbox instead of whatever you use for your GitHub runners.
Swyx [00:57:27]: The only thing I would say is that CI machines are supposed to be very cheap, maybe the low end, because CI is supposed to be non-blocking, something like a background job. The urgency is not that important for CI.
Ivan [00:57:45]: Performance is, though. Performance is, yeah.
What Sells Daytona: Responsiveness, Support, and Customer Trust
Swyx [00:57:48]: Yeah, okay, that is interesting. Before we leave Daytona and go into broader founder takes and what have you, are there any other interesting Daytona elements we haven’t touched on?
Ivan [00:58:04]: Interesting Daytona things. There’s...
Swyx [00:58:06]: I can give you more prompts if you want.
Ivan [00:58:07]: Yeah, I’d love more prompts, actually.
Swyx [00:58:09]: Okay. When startups evaluate you, they see all these customer names, plus more that you can’t even name, and they also see your wall of competitors. You have differentiation versus many of these, but what sells them?
Ivan [00:58:26]: What sells people the most, we’ve found, is maybe more a day-two thing than a day-one thing, and we’ve seen it again and again. We have a bunch of case studies, with more still coming out. They’re all done by a third party, not by us, and it’s actually interesting to watch them. I watch the recordings, and because it’s a third party, people are more open. They will tell you, “Oh, we used this competitor,” or, “We like this competitor more,” or whatever.
And the number one thing people come back to us for is that we have insane responsiveness.
Swyx [00:58:57]: In terms of your team?
Ivan [00:58:58]: In terms of the team, yeah. Insane responsiveness has been by far the... Now, we can talk about features, breadth of product, concurrency, CPUs, and all those things, but if all other things are equal, responsiveness is very much a differentiator, I’ve found. And I didn’t know that going in.
Swyx [00:59:15]: Is that entirely Slack, or Slack plus email?
Ivan [00:59:18]: There’s email as well, and calls, but the vast majority is on Slack. We’ve had customers say, “Hey, we have a problem. Can you get on a huddle?” And we will get on that huddle in five minutes, literally. I’ve done this multiple times.
Swyx [00:59:31]: Wait, okay, so how big are you?
Ivan [00:59:33]: 25 people today.
Swyx [00:59:34]: How do you do support like this?
Ivan [00:59:36]: We’re insane. We don’t sleep. 007, have you heard the new thing?
Swyx [00:59:40]: 007. I’ve met your team. They’re very impressive and very dedicated, but how do you get a team to do that?
Startup Culture, Family Tradeoffs, and Enjoying the Pain
Ivan [00:59:48]: So there’s...
Swyx [00:59:49]: I have Slack exhaustion.
Ivan [00:59:51]: Yeah, we all have Slack exhaustion. We’re very tired. What I’d say is unique, maybe not about us specifically, but about any successful serial founder, is that you’re able to pull in people you’ve worked with before. You can’t do that as a first-time founder; I couldn’t have. But of the 25 people at Daytona, I think about 13 we have worked with for seven years or more. So it’s high trust and high throughput, and we know what we’re signing up for. These people especially worked with us when we were starting out and were actually hustling.
Hungry-for-food-level hustling, and those are the people who work with us. The newer segment that has joined is almost all one degree of separation: someone that someone here already knew, and so they come into the org. We’ve had people who didn’t fit into the org as well. It’s a culture with a high expectation of being online and replying to these things, and I do that first. If you ask any engineer, they’ll say about me, “You never sleep.” I don’t do it to set an example; that’s just how I’m wired. My wife doesn’t appreciate that, I have to tell you. I told her about 996, and she said, “I wish.”
Swyx [01:01:09]: It’s like these Chinese people are slacking.
Ivan [01:01:13]: Yeah. So that is something there. Every company has its own culture, and this is something very deep in ours. It comes up again and again, and every single day we’re reminded of it. I didn’t set out thinking that’s how I was going to build it; it’s just how I’ve built these things.
Swyx [01:01:29]: Yeah. Okay, I’ll transition a little to the founder side. I’m very impressed in general by your balance. You have a young family.
Ivan [01:01:38]: Two kids, yeah.
Swyx [01:01:39]: Two kids now.
Ivan [01:01:40]: Yeah, two kids now.
Swyx [01:01:41]: A lot of people I meet say, “I’m starting a family. I can’t be a founder,” and all that. What’s your advice to those people?
Ivan [01:01:48]: Everyone has their own situation. It’s hard, every single day. My family is here right now, but usually I fly between Croatia and here. A lot of our team is in Croatia, and a part of our team, which is growing, is now here in San Francisco. So I spend a lot of time away from my family, and that is hard.
That is a sacrifice you have to make. People say that on your deathbed you’re going to miss some of those things, and that might be true. But going into this, I already told myself, “I know this is going to hurt.” And everything has to hurt. By the way, I very much feel that everything has to hurt. Going to the gym hurts. Losing weight hurts. Everything has to hurt, right? It does.
Swyx [01:02:32]: No pain, no gain.
Ivan [01:02:33]: Literally. But you actually have to enjoy the pain. If you don’t enjoy the pain, it’s not for you. You get accustomed to it. I love the kids; I have a daughter and a son. My daughter is the eldest. I love her and miss her when she’s not here, but that’s what I signed up for, and there is a plan and a target for what I’m trying to achieve. Hopefully my wife, who does support me, and I can get the family together more. But she takes on a large portion of it. If you have a partner on the other side who is okay with that, then you can do it. But even if they are, you have to be okay with not being there, right?
Swyx [01:03:11]: Yeah. This is my vision for you: this meme.
Ivan [01:03:15]: Yeah. I...
Swyx [01:03:15]: That’s your kids in the future.
Ivan [01:03:18]: Yeah, I think...
Swyx [01:03:18]: It’s like this...
Ivan [01:03:18]: We have to teach them that they’re not rich.
Swyx [01:03:19]: Because Dad built the compute sandboxes.
Ivan [01:03:21]: Yeah, you built compute sandboxes. Dad made sandboxes. Dad made sandboxes.
Swyx [01:03:25]: Built the spiritual successor to serverless and Kubernetes, for agents. Any other hot topics or trends? You have a lot of hot takes. Actually, you’re best known for being in hustle-culture mode, right?
And someone quoted you and said, “I haven’t even heard of you, bro. Just log off and take Christmas off.” And your response was?
Ivan [01:03:53]: Oh, my response was, “That’s why I can’t.”
Swyx [01:03:56]: I think that’s very typical of you. I don’t have it here; I can’t bring it up. But I think that’s very typical of the culture. You have a lot of interesting hot takes like that. Any other takes on the startup ecosystem?
SaaS Token Resellers, API Revenue, and Startup Hot Takes
Ivan [01:04:11]: Oh, yeah, the startup ecosystem. This was the recent one, and it applies to business in general. I don’t think it came off well on Twitter; at least some people misread it. The take is that the market is adding a premium to SaaS vendors that are reselling tokens, and I think that’s incorrect.
Swyx [01:04:34]: Why?
Ivan [01:04:35]: Here’s why I think it’s incorrect. One, your pricing depends on the market, whether it’s public or private or whatever. People reading the re-acceleration of revenue are treating it as equal to the old revenue, and it’s not even close. With SaaS, you had typical SaaS margins, whatever they were, plus stickiness and all these things. Now what you’re saying is, “Here is my agent,” with whatever its margin is, and it’s way worse, right? You’re using Anthropic or OpenAI or whatever through me, the SaaS product, and we as a community are calling that re-acceleration. So first, I think that’s wrong because the makeup of the revenue is not the same. The other thing goes back to what I mentioned earlier about the CUA and how I set up OpenClaw and so on.
I don’t want your agent, essentially. Right now we have a problem, and this has historically been the case: data is siloed, again, in ClickHouse, in QuickBooks; it’s all siloed. Now you’re giving me an agent that’ll give me the data, but it’s still siloed, right? So now I have to take that data and then go get another agent.
Swyx [01:05:52]: Just expose the data to my agent.
Ivan [01:05:53]: Just expose the data. Just expose it. So I’m like, “Just expose everything and charge me for that.” Charge me for API consumption. You’ll keep your old seat-based pricing for humans, and you charge me for this. The number of agents will skyrocket, so you’ll essentially have more usage, and you can charge more if your product has value. There are arguments that some of them do have value (is it a database or not a database; we can get into that), and some of them really do. I was actually shocked that the first person to do this was Benioff.
Swyx [01:06:24]: Salesforce, yeah.
Ivan [01:06:25]: Salesforce.
Swyx [01:06:25]: Agentforce?
Ivan [01:06:26]: There was a tweet, I think three days ago, where he said every product in Salesforce has been exposed via an API.
Swyx [01:06:33]: Wow.
Ivan [01:06:33]: Everything. And I’m like, now I understand why this person has built...
Swyx [01:06:38]: This guy’s king.
Ivan [01:06:38]: ...this insane company. Kudos to him. Amazing. It’s like, thank you. I don’t know if he listened to me or someone else, but thank you. This is the direction of the world. If you can get real acceleration from API consumption, that is actual revenue, that is actual re-acceleration, and that is where value comes from. I think there will be a cold shower when people understand that no one is actually going to use and pay for these agents and tokens, that it wasn’t really a solution, and it’ll drop back down.
Swyx [01:07:05]: Yeah.
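Ivan’s argument is about gross profit rather than top-line revenue: a dollar of "re-accelerated" revenue from reselling model tokens is worth less than a dollar of classic SaaS revenue if its margin is lower. The sketch below makes that concrete with hypothetical margins (80% for seat-based SaaS, 30% for an agent reselling tokens; neither figure comes from the conversation), and also shows the hybrid he proposes: seat pricing for humans plus metered API consumption for agents. All names and prices are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class RevenueLine:
    """One hypothetical revenue stream and its gross margin."""
    name: str
    monthly_revenue: float  # dollars per month
    gross_margin: float     # fraction kept after cost of goods, e.g. 0.80

    def gross_profit(self) -> float:
        return self.monthly_revenue * self.gross_margin


def hybrid_monthly_bill(seats: int, price_per_seat: float,
                        api_calls: int, price_per_1k_calls: float) -> float:
    """Seat pricing for human users plus metered pricing for agent API calls."""
    return seats * price_per_seat + (api_calls / 1000) * price_per_1k_calls


# Same top-line revenue, very different gross profit (margins are assumptions).
saas = RevenueLine("seat-based SaaS", 1_000_000, 0.80)
resale = RevenueLine("agent reselling tokens", 1_000_000, 0.30)
for line in (saas, resale):
    print(f"{line.name}: ${line.gross_profit():,.0f} gross profit")
```

Under these assumptions, two businesses reporting identical revenue differ by more than 2.5x in gross profit, which is the gap Ivan says the market is overlooking.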
Yeah, look, I think that’s generally correct, and I agree. But people are going to try to become AI companies.
Ivan [01:07:15]: No, absolutely, and nothing against that. To be very clear, this is not a knock on anyone building this. Everyone has to get to the revenues, get to the multiples, get the valuations, and do what they have to do to get to the next step. Absolutely agree. But we as a community are now saying, “Oh, this is the magical way out.” It’s not. That is not what is happening, right?
Swyx [01:07:35]: Yeah. There was this kitchen appliance company that put out some AI nonsense recently.
Ivan [01:07:42]: There was also the sneaker company. It was called Allbirds.
Swyx [01:07:44]: Allbirds. No, Allbirds is pivoting to GPUs. That’s fine. It’s like, “I have some money left, I’m just going to buy some lottery tickets.” Would you go into offering GPUs?
GPU Sandboxes, Data Centers, and Bare Metal Economics
Ivan [01:07:55]: Oh, yeah, we will. But not for inference. What we think about is the GPU sandbox. Just as you can have a GPU in your computer, you’d have a GPU in the sandbox. There are workloads that do need GPUs. I always go back to 3D rendering because it’s the easiest one to comprehend. But if you want to do any type of RL on CAD or something like that, you will need a GPU in the sandbox, and so that’s coming now as well.
Swyx [01:08:18]: How about your own data centers?
Ivan [01:08:20]: Own data centers. We run on bare metal machines from co-location providers. Technically we can run on those or in our own data center; that’s how we architected it. Today, from a gross profit margin perspective, it doesn’t make sense for us to get into that. You have to raise a large amount of capital and take on a large amount of risk for single-digit percentage points.
So today that doesn’t make sense, but we are fundamentally architected so that we can do it if we want to.
Swyx [01:08:47]: Yeah. You’re a large customer of these providers now. Do you see any opportunity?
Ivan [01:08:51]: We will see. We will see, yeah.
Swyx [01:08:54]: Yeah. I see a lot of people trying the bare metal thing. We talked to Railway the other day, and they’re pursuing a very similar strategy.
Ivan [01:09:04]: I think they’re building something out, or they have their own data centers now.
Swyx [01:09:07]: Yeah, the majority of their workloads are in their own data centers. But I think they still use Equinix and all those things. So it’s interesting that this model basically hasn’t changed. It’s essentially a real estate model: they manage the facilities and you do everything else. I wonder how that could change in the future, because the AI wave is the opportunity to reinvent everything. Anything else? Cool, I think that’s about it. I didn’t have any other topics. If you have any questions about the compute market, sandboxing, and Daytona, I think this is the best and most comprehensive place to start. Where does this go, man? We’re here in April. Things are growing 75% month over month. Where are we going to be by the end of the year?
The Agent Cloud: New AWS, New Stripe, or Something Else
Ivan [01:09:58]: It’s an insane number. I’m sort of scared to say it out loud. It’s very big, just the sandbox market. We talked about this in general: the entire infrastructure market is growing 40%, plus or minus, month over month. Everyone is growing 40% month over month. And that’s also a hot take: if you’re growing around 40%, that’s just the market. You don’t even have to come to work to grow that much, basically. I’m half kidding, but that’s where it’s going. So where does it end? We will see.
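To put the growth rates in this exchange in perspective, here is a small compounding calculation. It assumes the monthly rates stay constant, which nobody in the conversation claims, and treats April to year-end as roughly eight months; both are assumptions made for illustration.

```python
import math


def growth_multiple(monthly_rate: float, months: int) -> float:
    """How many times larger something gets after compounding monthly growth."""
    return (1 + monthly_rate) ** months


def doubling_time_months(monthly_rate: float) -> float:
    """Months to double at a constant monthly rate: ln 2 / ln(1 + r)."""
    return math.log(2) / math.log1p(monthly_rate)


for label, rate in (("infra market, ~40%/mo", 0.40), ("sandbox growth, ~75%/mo", 0.75)):
    print(f"{label}: {growth_multiple(rate, 8):.1f}x over 8 months, "
          f"doubles every {doubling_time_months(rate):.1f} months")
```

At 40% a month, usage roughly doubles every two months and grows about 14.8x over eight months; at 75% it doubles about every 1.2 months and grows roughly 88x, which is why Ivan hesitates to say the year-end number out loud.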
What I think about, at least from a CPU perspective (GPUs are even crazier), is that there’s a high probability that owning CPUs ahead of time will become a go-to-market tactic. As you probably know from talking to a lot of GPU providers, their growth is limited by the number of GPUs they have right now, right?
Swyx [01:10:47]: Yeah. It’s whatever NVIDIA decides to bless that day.
Ivan [01:10:51]: That’s how much they’re going to grow, right? And the CPU market in general, whether it’s something like Railway, or Vercel, or other deployment platforms, or the sandboxes, still runs on CPUs. Each is growing at the pace of its market, plus or minus. But it’s not yet constrained by supply. So my thought is that for all of us in this market (and databases fall into that as well, because databases also run on CPUs), we all have to grow as fast as we can so we can secure enough CPUs tomorrow, from Intel, or from NVIDIA, because they have CPUs now, and from everyone else later on. So it’ll be interesting when we hit that cap.
Swyx [01:11:30]: Okay. Maybe one way I’d phrase this: are you potentially the new Heroku, the new AWS, or the new... what is it, the new Stripe but for compute? What’s the most appropriate analogy?
Ivan [01:11:48]: That’s interesting. There are analogies like... there’s the new Cloudflare, but the new Cloudflare is Cloudflare.
Swyx [01:11:54]: New Cloudflare.
Ivan [01:11:54]: They’re actually doing a really good job...
Swyx [01:11:56]: Cloudflare owns networking. No one can fight them. It’s like, come on.
Ivan [01:11:59]: They’re doing really well. What I meant is that their whole agent portfolio is actually really good.
And I should say there are some technical limitations, I think, personally: everything is constrained to Workers. Workers is their thing. But from a go-to-market and vision perspective, I think they’re actually really good. I think they actually get it, unlike some other companies. To your question of what it’s going to be: everyone says there will be an AWS for AI agents, but to your point, it might look more like Stripe than AWS, in a sense. There will be a cloud built specifically for agents. That cloud will have sandboxes, web search, databases like SQLite or Neon or whatever built specifically for agents, and other things. We are not at the end of the new infrastructure primitives for agents. There are more coming. People think, “Oh, there’s nothing else. This is it.” There’s more. We have some ideas about the next ones; we don’t have time to build them, but there are definitely more primitives being built for agents, and I think there will be a cloud that runs all of it together.
Swyx [01:13:07]: Yeah. OpenAI has said AI cloud, Vercel has said AI cloud, and you are potentially one of the other prospective AI clouds. I think it’s a very big prize to win. Well, thanks for coming on.
Ivan [01:13:18]: Thank you for having me. It’s been amazing.
Swyx [01:13:19]: Yeah. Okay. That’s it.
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
This was recorded before Railway suffered a major outage on May 19 caused by GCP. Despite running a multi-AZ, multi-zone mesh ring with HA fiber interconnects between its own metal, GCP, and AWS, Railway still had workload discoverability unintentionally tied to GCP. The issue has since been resolved, with a post-mortem.
Railway did not start as an AI infrastructure company.
It was founded in 2020, years before agents became the default way people thought about deploying software. Jake Cooper, formerly of Bloomberg and Uber, started Railway with a simple obsession: the activation energy to ship something to production should be near zero. Push code, get a URL, iterate. No Dockerfiles, no Kubernetes manifests, no Ansible scripts stacked on Ansible scripts.
For years, this was a slow grind. Railway spent its first 18 months hand-acquiring its first 100 users, with Jake personally greeting every Discord signup on a second monitor.
Today, Railway has raised $124M and is growing very fast. A 35-person team supports 3 million users and adds roughly 100,000 signups a week. Its bare metal data centers have a 3-month payback period versus renting in the cloud, with 70% margins funding aggressive cloud bursting when needed. The servers it owns have actually appreciated in value as RAM prices have climbed, which means its hardware is now worth more than the capital it has raised.
From rebuilding Railway’s network overlay over a weekend to moving the vast majority of workloads onto its own bare metal data centers, Jake Cooper is trying to build a new cloud for an agent-native world.
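The payback figure above is a simple ratio: the upfront cost of owned hardware divided by the monthly amount saved versus renting equivalent cloud capacity. The sketch below uses illustrative numbers chosen to land on three months; they are assumptions, not Railway’s actual costs, and the model ignores financing, depreciation, and resale value (which, per the notes above, has actually risen for Railway).

```python
import math


def payback_months(upfront_cost: float, cloud_cost_per_month: float,
                   own_opex_per_month: float) -> float:
    """Months until owned hardware pays for itself versus renting equivalent cloud capacity."""
    monthly_savings = cloud_cost_per_month - own_opex_per_month
    if monthly_savings <= 0:
        return math.inf  # owning never pays back if running it costs as much as renting
    return upfront_cost / monthly_savings


def cumulative_savings(upfront_cost: float, cloud_cost_per_month: float,
                       own_opex_per_month: float, months: int) -> float:
    """Net dollars saved after `months` of ownership (negative means still underwater)."""
    return (cloud_cost_per_month - own_opex_per_month) * months - upfront_cost


# Hypothetical server: $30k upfront, replaces $12k/month of cloud, costs $2k/month in colo and power.
print(payback_months(30_000, 12_000, 2_000))
print(cumulative_savings(30_000, 12_000, 2_000, 12))
```

After the payback point every month is pure savings, which is how owned capacity can fund the more expensive cloud bursting described above.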
In this episode, Railway’s founder and “conductor” joins swyx and Alessio to unpack why the next era of software infrastructure is not just “Heroku but newer,” what agents need that humans did not, and why the old deployment loop of Git, PRs, CI/CD, and static cloud resources may be heading for a rewrite.
We go deep on Railway’s infrastructure stack: own-metal data centers, three-month cloud payback periods, cloud bursting, data center debt, Railpack, Nixpacks, Temporal, feature flags, Central Station, content-addressable filesystems, agent-safe production forks, and why the CLI may become more important than the canvas in an agent world. Jake also shares the founder journey behind Railway, how the company survived losing $500K/month, why it now serves millions of users with only 35 people, and why he believes the pull request is dying.
We discuss:
* How Railway went from a slow six-year grind to adding 100,000 users a week
* How Railway thinks about agents as the next dominant software species
* Why agents need version control, observability, compute, storage, and orchestration at 1000x scale
* The economics of Railway’s own-metal data centers and three-month payback
* How Railway uses cloud bursting while scaling its own infrastructure
* Why data center debt can be a better tool than venture debt for infra startups
* Central Station, Railway’s internal system for clustering customer feedback and incidents
* Why responsible disclosure and over-communication matter for platforms
* Why feature flags, progressive rollouts, and shadow traffic are essential for agents
* Temporal’s strengths, pain points, and why workflows matter for agents
* Railpack, Nixpacks, Nix, and lazy-loaded content-addressable filesystems
* Why “cattle, not pets” may change if you can clone the pets
* Why Railway is building a new cloud from scratch instead of copying hyperscalers
* The solo founder path, focus, writing, and how Jake thinks about company building
Railway:
* Website: https://railway.com/
* X: https://x.com/Railway
Jake Cooper:
* LinkedIn: https://www.linkedin.com/in/thejakecooper/
* X: https://x.com/JustJake
Timestamps
00:00:00 Introduction: What Is Railway?
00:02:07 Jake’s Path to Railway
00:06:13 Railway’s Six-Year Growth Story
00:08:52 Rebuilding the Business After the Free Tier
00:11:17 Agents as the Next Software Platform
00:13:29 Railway’s Infrastructure Philosophy
00:15:42 Bare Metal, Cloud Economics, and the Compute Crunch
00:17:22 Cloud Bursting and Five-Cloud Networking
00:20:20 Data Center Debt and Infra Financing
00:23:31 Data Centers in Space
00:25:24 What Agents Need From Infrastructure
00:28:24 CLIs, Canvas, and Agent-Native UX
00:35:15 Central Station, Incidents, and Responsible Disclosure
00:40:30 Safe Rollouts, SRE Agents, and Production Forks
00:45:00 AI SRE, Specs, Code, and Tests
00:48:24 Self-Replicating Infrastructure and the New Serverless
00:53:18 Heroku, Temporal, and Workflow Engines
01:04:07 Railpack, Nixpacks, and Lazy-Loaded Filesystems
01:06:01 Coding Agents, Token Spend, and Roadmap Acceleration
01:10:56 The Pull Request Is Dying
01:12:28 Feature Flags and the Agent-Era SDLC
01:16:15 Cattle, Pets, and Cloning Machines
01:19:29 Solo Founder Lessons
01:24:12 Focus, GPUs, and Building a New Cloud
01:28:20 Closing Thoughts
Transcript
Alessio [00:00:00]: Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs, and I’m joined by Swyx, editor of Latent Space.
Swyx [00:00:10]: Hey, hey, hey. Today we’re in the studio with Jake Cooper of Railway.
Alessio [00:00:14]: Conductor of Railway.
Swyx [00:00:15]: Conductor at Railway. Yeah.
Alessio [00:00:16]: Choo-choo.
Swyx [00:00:17]: Do you actually have that anywhere, like on your business card?
Jake [00:00:20]: We call some of our volunteer moderators conductors. I don’t have a business card. We’re not that big yet. At some point I will.
I got handed a nice business card from the Supermicro folks, and I was like, “Damn, this is pretty official.”

Swyx [00:00:30]: Business cards are coming back.

Jake [00:00:32]: They’re cool. They’re hip. The conductor thing is good. We’re trying to figure out what we want to call each other internally. Some people think it’s super cringe and say, “You don’t need a name for people internally.” Some people want to call each other something. We still don’t have a really good one.

Jake [00:00:55]: We’ve got New Railcrews, Trainiacs. Nothing has stuck yet.

Swyx [00:01:00]: I like Trainiac. Trainiac sounds good. Railwayians. For those who don’t know, what is Railway? Let’s give people a crisp definition up front.

Jake [00:01:09]: Railway is the easiest way to ship anything. You go to the canvas, or you talk with Claude, and you say, “Deploy a Postgres instance, deploy my GitHub repository, run this code,” and you’re off to the races.

Swyx [00:01:22]: You’ve got a nice animation on the landing page.

Jake [00:01:24]: Thank you. None of my work, by the way. They don’t let me touch the design stuff anymore.

Jake [00:01:25]: We want to make it trivially easy not just to deploy things, but to evolve applications over time. Most tooling right now stacks entropy on top of entropy: Docker, Kubernetes, Ansible scripts, and all these other things. If we can version all of your software and keep track of all the changes, then we can make it trivial to clone environments, fork into a parallel universe, get copies of production data, get copies of any services, make changes, validate them, and collapse them back in without reproducing everything across a staging environment.

The Railway Origin Story: From Uber Systems to a New Cloud

Swyx [00:02:07]: I was looking at your background: Bloomberg, Uber. Nothing immediately stands out as, “This guy is going to found the next great platform as a service.” What prepared you for Railway?

Jake [00:02:21]: It was curiosity to keep going deeper.
I started out on front-end stuff, working on Wolfram Mathematica and porting it over. Then I briefly moved to Bloomberg, then toward Uber and distributed systems, taking the Jump Bikes systems and moving them to a distributed system built on top of Cadence, the pre-Temporal Temporal.

Swyx [00:02:44]: Which, by the way, I’m happy to talk about, pros and cons.

Jake [00:02:48]: Totally.

Swyx [00:02:51]: But let’s do the Railway story.

Jake [00:02:52]: It has been a continual step of wanting an experience. Whether it’s walking up to a bike, unlocking it, and having it work frictionlessly, or something else, the depth required to make that happen follows from the experience. A lot of the work I do, and a lot of the team does, is in service of that experience. We fundamentally don’t care how deep we have to go. We will swim to the bottom of the swimming pool to get the experience.

Jake [00:03:17]: I don’t have a physics PhD. I did an EECS degree. It has always been about figuring out the next step: how do we get there? That’s what led to starting Railway for that experience and then moving all the way to bare metal data centers. I was adding patches to the kernel this week to get the experience there because I can see how much better it can be.

Swyx [00:03:49]: Other patches to the Linux kernel this week?

Jake [00:03:51]: Yeah. Not upstream. Our fork.

Swyx [00:03:52]: That’s a flex. Railpack? No, this is different. This is the OS on top of Railpack?

Jake [00:03:57]: No, this is an actual kernel patch. It’s always literally: what do we have to do to get that experience? Then figure it out. Anything is figureoutable.

Swyx [00:04:10]: Would you send the patch upstream, or does it not fit other use cases?

Jake [00:04:13]: Maybe. We have to work out the experience internally. It has to do with the storage layer we’re building for some of the agentic stuff.
Maybe it’ll be useful upstream, but it’s deeply useful for us internally.

Open Source, Forks, and Non-Deterministic Versioning

Swyx [00:04:29]: You mentioned open source before. How do you think about starting from open source, and then coding agents letting you do a lot more from forks of it?

Jake [00:04:38]: GitHub’s original sin is that it’s almost a series of broken pointers. You have this thing, then you clone it, and now you’ve lost the whole upstream. How do we make it trivial for people to modify really small pieces of it?

Jake [00:04:51]: We think of Git in a discrete sense: I’ve either made a change and merged upstream, or I haven’t. What would it look like if it were percentage-based and a little more non-deterministic, a stream of changes that users traverse as it rolls out to a percentage of them and then all the way up?

Jake [00:05:13]: We have the open-source kickback program and let you deploy templates because we want to make it trivial for people to version these shards over time. It solves a large problem around authentication, authorization, and security. NPM has a way to define, “Don’t take any new packages.” The ideal end state is that you roll out progressively, starting with the smallest impact zone, and continue rolling up. JPMorgan should probably be the last one on the patch line, for all our sakes, because our money and livelihoods are there.

Jake [00:05:53]: It’s okay if Johnny Vibe Coder gets a broken patch because there’s so much entropy in the system that the rubber has to meet the road at some point. You have to test at varying levels.

The Long Grind: First Users, Free Tier, and Making the Business Work

Swyx [00:06:13]: I wanted to pull up this glorious chart, which is your usage or number of daily signups?

Jake [00:06:22]: Daily signups, I think.

Swyx [00:06:24]: You started six years ago. It was a slow grind, and now you’re on a rocket ship.
You say, “Don’t doubt your fight and don’t quit.” Maybe pick out certain points that were key inflections for the company.

Jake [00:06:40]: At the start, it’s about getting your first 100 users, come hell or high water. We had a website and a support link. The support link was the Discord channel. I had notifications on with two monitors: the monitor I was working on and the other monitor with Discord. If anybody came in, I was immediately like, “Hey, how’s it going?” It was rare, so getting those first 100 users to come back was the start.

Jake [00:07:14]: Then you build a consultancy factory because users want all these things. You have to go back to the board and ask, “What is the actual product offering I want to build on top of this?”

Jake [00:07:28]: VCs want charts that always go up and to the right, but in reality you don’t necessarily want charts that look like that. For us, there have been periods of expansion where we add features to test use cases, and periods of compaction where we ask, “If the experience we have is good, how do we make it significantly better?” Maybe we strip out features that don’t fit our ICP anymore.

Jake [00:07:57]: The boom from 2022 to 2023 came from the free tier. Everybody under the sun was using it.

Swyx [00:08:09]: A lot of Reddit bots and Discord bots.

Jake [00:08:12]: And crypto miners. When you build an open product on the internet where anybody can sign up, you learn that the internet is a horrible place. You go through periods of asking, “How do I reach as many people as possible?” Then, “How do I fit the exact use case for the people who really matter and are really excited about this specific thing?”

Jake [00:08:39]: Then there was a two-year period of making the actual business work. During the free-tier era, we were losing about half a million dollars a month.

Swyx [00:08:59]: On a $20 million bank account.

Jake [00:09:02]: On a $20 million bank account with maybe $50,000 a month in revenue. That’s a horrible business.
I don’t know how anybody invested. But you have to go through it and say, “We have an experience people love, but the business has to work.”

Jake [00:09:17]: There are two schools of thought. You can run the horrible business all the way up with bad margins, or you can go back and make it work. We’ve always wanted a super lean team. We’re 35 people right now. It’s very small.

Swyx [00:09:36]: Supporting three million already?

Jake [00:09:38]: Yeah. We’re adding 100,000 users a week right now, so it’s growing fast. We don’t want to add headcount for the sake of headcount or throw bodies at problems. We want to build systems. It’s hard to build systems during expansion, when you’re adding things because people are asking for them or things are breaking.

Jake [00:10:00]: We had to cut off the free users for a little while, rebuild the business, and make sure it worked. We want to reach as many people as possible because software is important. It’s become difficult to create things in the physical world, so it’s important to make it easy for people to build in the virtual world and have access to creation. But there are legs to that journey.

Jake [00:10:30]: You can see divots in the charts. If you follow the chart between 2025 and 2026, each dip is either summer or winter. People go on holiday with family.

Swyx [00:10:50]: It affects that much?

Jake [00:10:51]: Yeah. It’s kind of B2C and kind of B2B. People are shipping constantly, then they stop. Our activation curve now shows more people activating on weekdays because we have more business users, so it smooths out over time.

Agents as the New Interface to Deployment

Swyx [00:11:17]: Was there a point where you started prioritizing AI development or agent development?

Jake [00:11:24]: We’ve prioritized agentic as a top-of-funnel thing.
Over the last six months, we’ve deeply prioritized agentic as a mechanism to build and deploy things because we believe the curve is so steep and that is how people will build and deploy software.

Jake [00:11:42]: It almost fundamentally doesn’t matter whether this is dot-com or not because we’re all on the internet anyway. If agents are going to deploy a bunch of things and we hit an inference wall at some point, we’ll fix those problems. Over the next 10 years, agents become the dominant species: we’ve moved from assembly to C to C++ to JavaScript to words. You’re going to need to close that loop.

Swyx [00:12:13]: When you say this is dot-com, did you mean buying the domain, or the general case?

Jake [00:12:17]: I mean the dot-com era, when companies had a huge run-up because people understood the internet was important. Then they hit bottlenecks and fundamental laws of physics, the math didn’t work, and everybody came back down to earth. But it didn’t matter because the internet became so impactful. If you operate on a long enough time horizon, you should build these things anyway because you can see where it’s going.

Jake [00:12:45]: That’s where I think a lot of agent stuff is. You get to a point where you’re running thousands of agents in parallel. What is the inference cost? What is the compute cost? How do you make that efficient? How do you coordinate all this? We have issues coordinating humans; we don’t even have good tooling for that. Now we have to figure out how to get agents to coordinate, safely version changes, and know when to raise their hand for someone to intervene. Otherwise it becomes an interrupt factory.

Railway’s Infrastructure Thesis: Network, Compute, Storage, and Metal

Swyx [00:13:19]: Let’s go right into the technical side. What are the core infrastructure or architectural beliefs of Railway that allow you to do what you do?

Jake [00:13:29]: The primitives matter a lot for us. We need network, compute, storage, and orchestration around it.
You need control over a lot of those things. We’ve talked a lot about how we don’t really use Kubernetes because we want higher-order control to place workloads in very specific places.

Jake [00:13:48]: The reason is that you have to be very efficient with agents: memory reuse and all these other things, or you’re going to massively blow up your cost structure. Being able to rack and stack your own servers and build your own metal unlocks performance and cost, so experiences where you’re running 1,000 agents in parallel are not massively cost prohibitive.

Jake [00:14:13]: Token use and compute use are blowing up. Over time, those things have to get a lot more efficient. You can get a lot of margin to make those experiences solid by building your own metal. That’s all in service of offering a differentiated experience to as many people as humanly possible.

Swyx [00:14:51]: You have a data center in Singapore.

Jake [00:14:53]: Yeah. We have two in every other region now. In Singapore, we’re adding a second one in Q3.

Swyx [00:14:58]: What’s it like? I’ve never built a data center. Do you go to Equinix and say, “I want some slots”?

Jake [00:15:05]: Yeah. Equinix. You basically go and say, “I want power and I want a cage.” They say, “Great, here’s what it’s going to be.” You rent the cage for a period of time, fill it with racks and servers, and hook up internet to it. That’s all the pieces.

Swyx [00:15:36]: Then you handle everything else.

Jake [00:15:37]: You handle everything else.

Swyx [00:15:39]: What’s the math versus clouds doing it for you?

Jake [00:15:43]: Compared with renting in the cloud, our payback period when we go to metal is about three months.

Swyx [00:15:50]: Which is crazy.

Jake [00:15:51]: It’s nuts. That’s four years of depreciated hardware. You’re going to see a lot of this compute crunch because hyperscalers are buying up a lot of stuff.
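Jake doesn’t share the inputs behind the three-month figure, so the following is only a back-of-the-envelope sketch of how such a payback period is usually computed: the up-front hardware cost divided by the monthly saving versus renting equivalent cloud capacity. Every dollar figure is hypothetical, and a real comparison would also account for networking, staff, and utilization.

```python
def payback_months(hardware_cost: float, monthly_cloud_cost: float,
                   monthly_colo_cost: float) -> float:
    """Months until buying hardware breaks even with renting the same capacity.

    monthly_colo_cost stands in for the ongoing costs of owning (cage rent,
    power, bandwidth); the cloud bill would otherwise have covered those.
    """
    monthly_saving = monthly_cloud_cost - monthly_colo_cost
    if monthly_saving <= 0:
        raise ValueError("owning never pays back if it costs as much per month as renting")
    return hardware_cost / monthly_saving


def total_cost(hardware_cost: float, monthly_cloud_cost: float,
               monthly_colo_cost: float, months: int) -> tuple[float, float]:
    """Return (owning, renting) cost over a horizon such as a 48-month depreciation window."""
    owning = hardware_cost + monthly_colo_cost * months
    renting = monthly_cloud_cost * months
    return owning, renting


# Hypothetical inputs: $300k of servers vs. $110k/month of equivalent cloud capacity,
# plus $10k/month of colocation costs for the owned hardware.
print(payback_months(300_000, 110_000, 10_000))   # 3.0
print(total_cost(300_000, 110_000, 10_000, 48))   # (780000, 5280000)
```

With these made-up inputs, the servers pay for themselves in three months and, over a four-year depreciation window, cost about 15% of what renting the same capacity would.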
We’re working directly with OEMs, resellers, and people building these machines: Supermicro, Dell, and others.

Jake [00:16:11]: Upstream, there’s a bunch of supply pressure. Since we raised our last round and deployed that capital into servers, the amount of money we’ve raised has become less than the amount of money we have in the bank plus the value of the servers, because the servers have appreciated as RAM prices have gone up. It’s nuts how valuable hardware has become.

Jake [00:16:50]: If you look at hyperscalers, they deployed around $80 billion of capital expenditures this year, and next year will be more. That’s a massive infrastructure build-out. You look at that and think it’s crazy that they’re spending way more than the Manhattan Project. But if every person is going to run dozens or hundreds of agents in parallel, you have no conceptual idea how much compute is required to make that experience happen, even if you’re deeply efficient and sharing resources. And that doesn’t even count inference.

Swyx [00:17:22]: How do you plan the build-out? The growth chart is so vertical. Are you usually at 100% utilization as soon as racks are live? How far ahead are you planning?

Jake [00:17:33]: We still maintain a cloud presence for bursting. We work with AWS, GCP, and a few other clouds. We can rent, and then the moment we get space or power, we compact those workloads off the cloud. We started on the clouds, then built a system to migrate to our own metal. There’s nothing that says you can’t continually do that again, and that’s exactly what we do. We never want to be compute constrained.

Jake [00:18:09]: At the start of the year, we actually became compute constrained because one upstream provider wasn’t able to give us quota at the rate we needed, and the hardware was slower. I spent a weekend rebuilding our entire network overlay so we could straddle five clouds: Oracle, AWS, ourselves, GCP, and one other one.
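The burst-then-compact pattern Jake describes (rent cloud capacity when owned metal is full, then move those workloads back once new racks come online) can be sketched as a toy scheduler. This is not Railway’s scheduler: the `Pool` and `Scheduler` names, the integer capacity units, and the first-fit policy are assumptions made for illustration, and a real system would also have to handle live migration, cross-cloud networking, and failures.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    name: str
    capacity: int   # hypothetical capacity units, e.g. vCPUs
    is_metal: bool  # owned hardware vs. rented cloud capacity
    placed: dict[str, int] = field(default_factory=dict)  # workload -> units used

    @property
    def free(self) -> int:
        return self.capacity - sum(self.placed.values())


class Scheduler:
    def __init__(self, pools: list[Pool]):
        # Prefer owned metal; cloud pools are only the overflow ("burst") tier.
        self.pools = sorted(pools, key=lambda p: not p.is_metal)

    def place(self, workload: str, size: int) -> str:
        """First-fit placement: metal first, then cloud."""
        for pool in self.pools:
            if pool.free >= size:
                pool.placed[workload] = size
                return pool.name
        raise RuntimeError(f"no capacity anywhere for {workload}")

    def compact(self) -> list[tuple[str, str, str]]:
        """Move burst workloads off the cloud once metal has room for them."""
        metal = [p for p in self.pools if p.is_metal]
        moves = []
        for cloud in (p for p in self.pools if not p.is_metal):
            for workload, size in list(cloud.placed.items()):
                target = next((m for m in metal if m.free >= size), None)
                if target is not None:
                    del cloud.placed[workload]
                    target.placed[workload] = size
                    moves.append((workload, cloud.name, target.name))
        return moves


metal = Pool("metal-sin-1", capacity=8, is_metal=True)
cloud = Pool("cloud-burst", capacity=100, is_metal=False)
scheduler = Scheduler([cloud, metal])

print(scheduler.place("api", 6))     # metal-sin-1
print(scheduler.place("worker", 4))  # cloud-burst (metal has only 2 units free)
metal.capacity = 16                  # new racks come online
print(scheduler.compact())           # [('worker', 'cloud-burst', 'metal-sin-1')]
```

Keeping the cloud as an overflow tier rather than a separate fleet is what makes compaction a simple loop here: nothing is pinned to a provider, so workloads drift back to metal whenever it has room.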
We can do more than that now.

Jake [00:18:38]: We got into a spot where we were trying to pack instances tight because we couldn’t get enough compute. That led to a few reliability issues, which are now past us. I made a tweet pointing out that it’s becoming harder and harder to acquire compute at the rate these models need it. We got bit by it.

Swyx [00:19:15]: How do you think about pricing, knowing you might not have your own metal available at all times? Are you pricing assuming you need extra margin if you end up going into the cloud?

Jake [00:19:26]: Because we’ve built out our metal data centers, our margins on metal are around 70%. We can deeply subsidize the cloud business if we want to scale at a reasonable rate. We have a few levers: metal, which makes the margins; cloud bursting; debt to buy servers; and venture capital. It’s an interesting operational problem: how much cash do we have, how much should we raise, how quickly can we deploy it, and can we scale revenue as quickly as we scale compute?

Jake [00:20:05]: If we continue making it trivially easy for people to build and deploy, then the faster we close that loop and the more operationally excellent we are with capital, the faster the business can scale. It’s almost a straight linear deployment rate.

Financing Infrastructure: Hardware Debt, VC, and Operational Leverage

Swyx [00:20:20]: I think debt is a tool infra startups don’t utilize enough or know enough about. What can you tell us about that? Is it secured against your CPUs?

Jake [00:20:32]: It’s secured against our hardware.

Swyx [00:20:37]: What rates do you get? Who are the lenders?

Jake [00:20:39]: We pay prime plus a spread, and we can refinance any of the debt as rates go down. The terms are pretty good. The unfortunate thing is that Twitter has no nuance, so people say, “Venture debt bad.” But as with all things, there are specific tools and areas where you can be deliberate instead of using one tool as a hammer.
Venture capital is not the hammer for everything. You have to explore and figure out what works.

Swyx [00:21:12]: VC is usually the most expensive financing you can get.

Jake [00:21:15]: Yeah. I also think people think about VC incorrectly from a capital-raising perspective. Most people think, “How do I raise as much money as possible from the best investor I can probably get at the time?” That’s close to right, but what we’ve tried to do is figure out what unfair advantage we can buy with that equity.

Jake [00:21:34]: It’s the most expensive equity you’re going to give away at that point in time, assuming the company keeps getting better. How do you use it to work with someone stellar who complements you? In the seed stage, I had never started a company. Ray Tonsing had good advice, and I could text him all the time. He was really fast. Awesome.

Jake [00:22:01]: Then with John and Erica at Unusual, they said, “You roughly know what you’re doing building a product. We’ll mostly leave you alone and be available for advice.” Amazing. Then we got to Series A and the business was an operational tire fire because we didn’t know how to scale a business. Worked with Erica, and Jordan is over at Redpoint, so bonus.

Jake [00:22:28]: Now we’ve raised from TQ and FPV as we’re moving into enterprises. Every step of the way, we’ve asked: who can we partner with at this specific time to unlock the next section of the journey? I don’t know enterprise sales. As an engineer, I can eyeball what features we might need, and we have wonderful people internally who can help. But you want boardroom dynamics where everyone is aligned and asking, “How do we win this?” instead of bickering about strategy.

Data Centers in Space and the Physics of Compute

Swyx [00:23:31]: You had a tweet about data centers in space. Why no data centers in space?

Jake [00:23:37]: It’s not “no data centers in space.” My hot take is that I think it is solvable.
I’ve just never seen anybody solve it.

Swyx [00:23:49]: You said, “How are you going to dissipate that much heat in a vacuum?” You’re making a physics claim.

Jake [00:23:55]: I haven’t seen anybody show how you’re going to dissipate that much heat in a vacuum. It doesn’t mean it’s not possible. It just means nobody has shown it yet.

Swyx [00:24:05]: Astrophage.

Jake [00:24:06]: I don’t know what that is.

Swyx [00:24:07]: The Martian thing. Okay, you’re very logical.

Jake [00:24:09]: It could work. A lot of people are putting the cart before the horse. They say, “We’re going to put data centers in space.” Okay, but how? “We have time to figure it out.” It’s like in The Martian where they ask how they’re going to intercept something and say, “We’ll figure it out.”

Swyx [00:24:36]: Making a bet on human invention is weird because you blindly trust that it can be solved. But with physics, there are first-principles bounds you can put on it. Maybe not. Maybe you’re asking to time travel or break a fundamental thermodynamic law.

Jake [00:24:57]: I don’t know how VCs do this either. How do you know what’s not possible and a grift versus what’s possible but sounds completely insane? “We’re going to put data centers in space.” Coin flip as to which it is, and I guess you’ll know in 10 years. That’s one cycle.

What Agents Need: Versioning, Observability, and 1,000x Scale

Swyx [00:25:23]: Moving back to agents. The branching, fast spin-up, and orchestration you do feels like pre-work that happened to be exactly what agents want. What do agents want that’s different from what humans want?

Jake [00:25:37]: They want the ability to version things. It’s not that different; it materializes slightly differently. Agents want a way to test changes incrementally. Engineers have feature flags. Is there a reason agents can’t use feature flags? I don’t think so.

Jake [00:25:54]: They want version control. Can we use Git or not Git? That one is up in the air.
I think something outside Git will emerge for how we version these things over time. They need observability. You need to query what happened, when it happened, which steps failed, traces, logs, metrics, and all the rest. They need network, compute, and storage. They need to write files, save files, iterate on files, and snapshot file systems.

Jake [00:26:25]: A lot of what humans needed is in line with what agents need. Branching and forking are not different; we’re just moving 1,000 times quicker. It can look like you need something massively different, but what you need is something massively better than what existed. You need orchestration massively better than Kubernetes. You need networking probably better than Envoy. It goes all the way down the stack.

Jake [00:26:55]: If the workload profile doesn’t change so much as get massively compressed because you need thousands of these things, what assumptions change? etcd is going to melt. You need to replace it with something. You can go all the way down the stack and say, “That part has to change, that part has to change, and that part has to change.”

Jake [00:27:19]: The interesting thing about the super-exponential curve is that you have to build systems where you can rip out those parts at any time because a new bottleneck might emerge. You get good at parallel agents, and a different part of the system breaks. So it’s similar to what humans needed, but at 1,000x scale.

Jake [00:27:55]: How do you do code review in the age of agents?

Swyx [00:28:00]: You throw more agents at it.

Jake [00:28:01]: You don’t. But then who reviews for CVEs and all these other things?

Swyx [00:28:07]: More agents.

Jake [00:28:08]: And that’s how we hit the inference wall. You can continually throw agents at the problem, but I think there’s a limit to the number of agents you can throw at a problem.

CLI, Agent Handles, and Closing the Loop

Swyx [00:28:24]: You already had a CLI before it was cool.
How is the shape of what you’re exposing changing, if at all?

Jake [00:28:28]: CLIs have always been cool. The CLI changes because we think about how to give Claude, Codex, ChatGPT, or any model a handhold.

Jake [00:28:50]: In a CLI, each action is a single command: deploy, get logs, and so on. Things that were prohibitively annoying to humans are not annoying to agents. They’re nice. If I handed you a CLI with 40 arguments and 600 flags, you’d think, “I’m never going to use all of this.” But if you hand it to an agent, it says, “This is excellent. I have so many handles to work with.”

Jake [00:29:24]: If you’re going to expose things to agents that way, you want as many handles as possible where they can get information, query dynamic information, and close the loop quickly. Most problems right now are about how to close the loop as quickly as possible. Where does the agent get stuck, and how can you remove that?

Jake [00:29:49]: Telemetry is important. If you can tell where the agent gets stuck from the CLI and say, “12% of people deviate from the happy path because of this, and now I add this argument and drive it down to 2%,” you massively increase the rate of loop closure.

Jake [00:30:03]: That’s how we think about not just the CLI, but every point in the dashboard. It’s a user journey: I hear about Railway. I get something deployed. I get my first green build or aha moment. I see an endpoint, logs, whatever. Then I iterate. The iteration loop is indefinite. The user wants to deploy a new thing, a Postgres instance, change code, and keep iterating.

Jake [00:30:36]: If you focus on the iteration loops and what’s blocking them from closing quickly, one thing we say internally is: you never want to be waiting on compute anymore. You always want to be waiting on intelligence.
If you’re waiting on compute, there’s a bottleneck that needs to be destroyed because eventually that bottleneck becomes so large that another workflow emerges to change it.

Jake [00:31:04]: We’ve built a product where you push code, build it, and so on. But I fundamentally believe the push-pull loop is going away. We’ll get to a point where you make a small change in production, that change is versioned across your infrastructure, you’re working alongside copy-on-write versions of your database and infrastructure, and then you merge it in and it’s instantaneously live. That’s the holy grail of loops. The push-pull-rebuild thing is a point of friction that we’re removing entirely.

Canvas as Output: Dashboards, Context Anchors, and Hyperstructures

Swyx [00:31:43]: It’s incredibly fast. If anyone hasn’t tried it, that fast feedback is great. My hot take is that Railway was famous for its canvas, which visualizes your infrastructure and lets you manipulate it visually. But that was for humans. For the next phase of growth, the Railway CLI is more important than the canvas.

Jake [00:32:05]: The canvas is funny because it’s a mechanism to show changes over time. You’re right that previously we used it a lot as an input. Moving forward, its goal is more like an output. You used to go to the canvas, make changes, see them, and watch your infrastructure evolve. Now agents have access to the CLI and can make those changes. So the canvas becomes an output: what information does the human need at this moment to make suitable decisions about control requests? Do I approve this or not?

Jake [00:32:57]: It also has to be an anchor for your context, a port in the storm. Think of it like layers in a file system. You start with a project, then drill down into services, then into a function or code, because you want to represent the entire thing not just in your head, but in the canvas.
Other people can share that representation, think on the same wavelength, and move quickly.

Jake [00:33:33]: A lot of organizations get in trouble as they scale because all the context lives in someone’s head. “How does this microservice work?” “I have no idea; go ask this person.” Then you have whole categories of products built around context discovery. A lot of that melts away if you have a solid hierarchy and can infinitely nest services, code, context, and everything else all the way down. That’s what lets you build these structures over time.

Jake [00:34:18]: It’s also what lets us build what I’ve called hyperstructures: things that are way bigger. You look at the Golden Gate Bridge and ask, “How did we build that?” There’s a meme that we lost the technology. To some extent, yes, because the coordination that built those things evolved and changed. We lost some of the art of building structure as we jammed everything into Slack.

Swyx [00:34:52]: But you jam everything into Discord.

Jake [00:34:53]: Same point. It doesn’t matter. It’s message passing and interrupts, message passing and interrupts.

Swyx [00:35:00]: So you’re arguing there should be something better and more structured than Slack?

Jake [00:35:04]: Yeah. For sure. I think Slack is awful, and Discord is awful too.

Central Station: Context Routing, Support, and Incident Clusters

Swyx [00:35:09]: This is the equivalent of my mom test. What have you built that embodies your solution to this?

Jake [00:35:15]: Internally, we’ve built a tool called Central Station that aggregates all the context from our users. Every piece of feedback, every customer support item, everything gets aggregated into clusters. If an incident is brewing, we can determine how many users are affected and break off a discussion based on that.

Jake [00:35:40]: That is more helpful than long-running channels where you’re trying to decide which channel to put something in.
If you can dynamically aggregate information and dynamically route it to the right person based on context, it works better. We know internally that these four people are close to networking. If we see a networking thing, we can route it to those four people. If it involves this part of the system, we can look at the commits. This is no longer a manual process internally.

Jake [00:36:13]: If you go to station or help.railway.com, that’s why we built it. We wanted to scale with a massive amount of leverage by aggregating feedback.

Swyx [00:36:27]: This is built in-house?

Jake [00:36:28]: Yep.

Swyx [00:36:29]: I remember helping out on this one with Angelo in 2023. You scale a lot with a very small team.

Jake [00:36:38]: Yeah. We’re about 10 times bigger now.

Swyx [00:36:40]: You have your full developer code here? Very cool.

Jake [00:36:44]: If you go to railway.com/stats, we expose this as a pub-sub-able thing. It’s all real-time metrics. There’s a way to get it as JSON somewhere if you care.

Jake [00:37:01]: We’re big on trying to build everything in public and talk about what we’re working on. We’ve had issues in the past, and we’ll say, “Here’s how we’re fixing these things.” We’ve gotten compliments and flak for incident reports. We’re always trying to make them better and talk with people.

Incidents, Disclosure, and Progressive Rollouts

Swyx [00:37:20]: You had a big one recently. I liked that it was scoped to 3,000 users. You presumably used Central Station. Talk through what happened and how you address it internally as a team.

Jake [00:37:38]: Internally, this one really sucked. It had to do with an upstream provider that didn’t behave the way its documentation said it would, which is unfortunate given they wrote the RFC for how the behavior should work. We rolled those things out, and Central Station caught it initially when a couple of users said caches weren’t invalidating.
We turned it off immediately.

Jake [00:38:03]: When you roll out to a large user base of three million people, you get a lot of disparate behaviors. We tested in staging and had tests, but we hit an edge case. We’ve hardened those systems, and now we can make that better. But it was a tough one.

Swyx [00:38:39]: I always wonder how private disclosure is supposed to work if people find an issue. Are they supposed to contact you first? When you run a platform, these things will happen. What channels should people pursue to quietly resolve it before it becomes a bigger incident?

Jake [00:38:59]: There’s responsible disclosure. We err on the side of over-disclosing and letting you know something is wrong versus having your provider gaslight you. We’ve erred on sharing those things more publicly, even if they impact a small subset of users. That’s a decision we’ve made internally. We have four values. One is honor. The honorable thing is to notify the widest set of people who may have been affected, and then confront it head-on: why did it happen, what can we do better?

Swyx [00:39:45]: Not the whole user base. That’s because of incremental rollouts and other things?

Jake [00:39:50]: Yeah. Progressive rollouts.

Swyx [00:39:54]: That should be the norm at all large platforms.

Jake [00:39:58]: It should. A variety of companies do this. There’s the quote that Meta runs 10,000 different versions of Meta. To our earlier point about agents, they need the same thing. They need shadow traffic and all these other things. We’ve built so much ceremony around production being sacred; instead, we need to make it trivially easy to test different behaviors in a safe environment. Then you can make mistakes in a safe environment.

Safe AI SRE: Customer Agents, Forked Environments, and Production Parity

Alessio [00:40:30]: Do you see a world where these things get automatically caught, not necessarily by your agent, but by your customer’s agent?
The cache invalidation issue seems easy to check if you know to look for it.
Jake [00:40:44]: It’s hard because to detect it, we’d almost need to hook into your observability infrastructure. That’s why we have the template loop on the platform: so you can roll things out progressively. You can roll out to Johnny Vibe Coder initially, or push a shard that someone consumes at their own leisure. Or you can roll it out over weeks: 0.1% of people, 1% of people, early adopters, then all the way up. That’s the non-deterministic version control we talked about earlier.
Jake [00:41:30]: I believe that’s where most things should go, because most companies end up building staged rollout systems in-house. It’s the same thing built again and again at every company. There’s a massive opportunity to consolidate developer debt.
Alessio [00:41:45]: You should have a free tier. Model providers give free tokens if you let them use the data. You could give free compute if someone is the number-one shard that goes out and lets you plug into their observability.
Jake [00:41:55]: We do that. That’s why we talked about the impact on 3,000 people. We start with lower-impact people. Larger companies on the platform are last to receive those rollouts so they have a version of the platform that’s deeply stable.
Alessio [00:42:16]: I have three services, so I’m sure I get the first rollout. You can nuke my thing at any time. There are all these SRE agent companies. Observability people also want agents that fix upstream problems. You have your own agent in the canvas now. How do you see that playing out?
Jake [00:42:39]: It’s the stacking entropy problem. If you don’t have primitives to make iteration in production safe, it becomes difficult. If you’re an observability provider saying, “Here’s the fix to this error,” assume 80% are good and make sense.
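A minimal sketch of the staged rollout described here (0.1% of users, then 1%, then early adopters, then everyone, with the largest customers receiving a change last), assuming deterministic hash bucketing, which is a common way to implement percentage rollouts. The stage table, the 10% early-adopter size, and the names `bucket` and `is_exposed` are hypothetical; this is not Railway's implementation.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical stage schedule modeled on the cohorts described above.
# Fractions are cumulative: each stage includes everyone from earlier stages.
STAGES = [
    ("canary", 0.001),         # 0.1% of users
    ("one_percent", 0.01),     # 1% of users
    ("early_adopters", 0.10),  # assumed size for the early-adopter cohort
    ("general", 1.0),          # everyone
]


@dataclass
class User:
    user_id: str
    is_large_customer: bool = False


def bucket(change_id: str, user_id: str) -> float:
    """Map a (change, user) pair to a stable number in [0, 1)."""
    digest = hashlib.sha256(f"{change_id}:{user_id}".encode()).digest()
    # Keep 53 bits so the value is exactly representable and stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def is_exposed(change_id: str, user: User, stage_index: int) -> bool:
    """Return True if the user receives the change at the given rollout stage."""
    stage_name, fraction = STAGES[stage_index]
    # Large customers are held back until the final stage, so they stay on
    # the most stable version of the platform for as long as possible.
    if user.is_large_customer and stage_name != "general":
        return False
    return bucket(change_id, user.user_id) < fraction


if __name__ == "__main__":
    users = [User(f"user-{i}") for i in range(100_000)]
    for i, (name, _) in enumerate(STAGES):
        count = sum(is_exposed("cache-invalidation-fix", u, i) for u in users)
        print(f"{name:>15}: {count} users")
```

Hashing the change ID together with the user ID keeps each user's bucket stable from stage to stage, so exposure only ever grows, while different changes land on different canary users.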
But in the last 20% long tail of complex issues, if you let somebody rubber-stamp it, you create an opportunity for an incident.
Jake [00:43:08]: That’s why forked environments are important. People have staging, but it always drifts from production. You need primitives, workflows, and experience built first-party on the platform so you can fork any service at any point in time.
Jake [00:43:33]: I think of the canvas as a sheet of transparency paper. The agent is a little guy you push up into the canvas. It should say, “I need to copy that service and that service so I can test these two things.” It gets a read-only copy of production. Anything that’s PII gets marked for a transform when we clone the database, create a copy-on-write version, or read from it. Then the agent makes changes and asks, “Does this actually work?” as close to production as possible.
Jake [00:44:22]: That’s how close you have to be, or you get massive drift. The system becomes unstable. You see this with massive systems built on Docker for local, Kubernetes for production, and a specific thing for something else. That complexity slows developers down and becomes unstable at scale, making it hard to iterate. We want to compress that way down and say, “As close to prod as possible is where we want to be.”
From AI SRE Skeptic to Agent Believer
Swyx [00:45:00]: I was texting Erica for questions, and she says you were originally not a believer in AI SRE. Have you come around on it?
Jake [00:45:10]: I flipped, but I’m still not a believer in AI SRE if you don’t have the primitives to make it safe. If you unleash AI SRE on production infrastructure without safe primitives for copying volumes and making sure things are fine, it’s going to nuke your production database. It’s not a matter of if, but when. I’m a big believer in making those loops safe.
Jake [00:45:33]: I was a deep AI skeptic until 2023.
In 2024, I thought, “Maybe I can roughly make this thing do it.” In 2025, I thought, “Now I can hold this.” Over winter break, everybody came back saying, “It’s almost impossible to hold this wrong.”
Swyx [00:46:01]: Did you see this on the Claude docs? Clawdbot? OpenClaw?
Jake [00:46:06]: It’s gotten to a point where it’s harder to hold it wrong than to hold it right. There’s a scene in Avengers where Vision picks up Thor’s hammer and says it’s terribly well-balanced. It self-balances and works well. I’m a deep believer at this point that this will be the dominant species: assembly, C, C++, JavaScript, words.
Swyx [00:46:35]: It feels like a big jump.
Jake [00:46:37]: It is. But it’s not like you abandon CPU-based discrete logic and move straight to fuzzy logic. You need both. Your skills should call code or applications or some static structure. You can use skills to distill what the procedure should be or how the code should act.
Jake [00:47:02]: I’m coming to a thesis: you need three points. You need a clear spec defining the system, the code, and the tests. When you say it out loud, if you’ve been in engineering long enough, you’re like, “Of course. That’s an RFC, tests, and code.” But they all matter. Having them together lets them reinforce each other: the spec and tests match, but the code doesn’t, so reconcile it. Or the tests and code match but the spec doesn’t, so reconcile that. That’s the iteration loop.
Jake [00:47:41]: That’s why you’re seeing people talk about software factories, docs, and reconciliation. Some of that is architecture astronautics if you don’t implement it, but that loop is where most things will end up.
Swyx [00:48:07]: For listeners, we’ve been talking about this on the pod for three years: the holy trinity of specs, tests, and code. Itamar Friedman from Qodo is the reference if people want to look it up.
Self-Modifying Infrastructure and the End of Push-Pull-Rebuild
Swyx [00:48:18]: One thing I want to mention on the OpenClaw idea is self-modification.
I don’t know how Railway would support it, but I have my OpenClaw, and I just tell it that it has the Railway CLI and can do whatever. In theory, whatever capabilities or new infra it needs, it can call the Railway CLI, provision it, and add it to itself. The agent can modify its own infra.
Jake [00:48:45]: It’s nuts. I have a loop set up where you put the Railway CLI on top of something that runs on Railway. You’re authenticated as whatever the current box is, and you can make any changes to it. Then you call Railway deploy, and it deploys itself.
Jake [00:49:04]: It’s like: “I need to spin up this instance of this environment. I already exist in this environment. Excellent, I have access to a Postgres instance now.” That’s where we want to go with agentic, self-replicating infrastructure. That’s your loop: iterate in production. You continue making changes. If it works, merge it upstream. If it doesn’t, throw it away.
Jake [00:49:37]: How do you make throwaway copies trivial to spin up and super cheap? The era of “I have an AWS instance with four vCPUs and 16 gigs of RAM” is going to get destroyed. If you do that for agents, you need a thousand of those machines. That’s prohibitively expensive compared with what we’ve spent a ton of time figuring out: the atomic unit of deploy, whether you call it isolates, sandboxes, or something else. Only pay for what you use, spin up instantaneously, and close the loop as quickly as possible.
Jake [00:50:15]: If the system can self-replicate safely and say, “This is my environment, I’m making these changes,” it can come back with, “Does this look good? This is a new state of infrastructure given this prompt. I think I’ve solved it.” Then you go back and say, “Actually, it looks different.” It does the loop again. Then you say, “Cool. Apply.”
Swyx [00:50:38]: That’s retroactively obvious, which is the most useful kind. Any other comments on agent deployment on Railway?
Jake [00:50:51]: It’s getting better every day. I’m on X or Twitter.
You can always yell at me about the parts not working as well as they should, because plenty of things should work way better.
The New Serverless: Stateful, Long-Running, Pay-for-What-You-Use Linux
Swyx [00:51:04]: At this stage, when people want massively or embarrassingly parallel compute, they usually talk serverless. I feel like there’s a new serverless compared to the previous five years of serverless. You’re in that new bucket. Do you have comparisons or philosophical differences you want to call out?
Jake [00:51:31]: It’s somewhere in between. It’s the ability to run stateful, long-running workflows or executions.
Swyx [00:51:42]: Vercel has Fluid Compute, Cloudflare has some container thing, Google has App Runner and others.
Jake [00:51:55]: That’s where everything is roughly going, and it’s why we’ve been working on this for six years. We believe users need access to a computer: a box that speaks Linux. They need to deploy what they want. Other systems change the surface area of what you can build. For us, users need a computer and need to deploy anything they truly want. That’s why we’ve focused on the primitives: network, compute, storage. If we give you those and expose them so you can run things indefinitely, that’s where we believe it’s going.
Jake [00:52:43]: Twitter has no nuance, so everyone says “servers” or “serverless.” It’s always somewhere in the middle: I want to run it for a long time, but I don’t want to provision the resource statically or pay for things I’m not using. That’s been our thesis from day one: pay only for what you use, run it indefinitely, and it is full Linux.
Swyx [00:53:12]: That’s why I like the naming of Fluid. It’s fluid. Flexible.
Heroku, Focus, and Carrying the Torch Without Becoming the Past
Swyx [00:53:18]: Another milestone is the Heroku official deprecation. You’re one of the presumptive new Herokus. “New Heroku” has been a category for as long as I’ve been in developer tooling. It’s finally happening. What was that like?
Any behind-the-scenes of, “This is the moment”?
Jake [00:53:42]: You have people where you’re like, “You were running stuff on here? You, as this company?” It’s crazy that names you would know were running on it and are now coming to us saying, “We want to move a lot of this off.”
Swyx [00:54:00]: Any behind-the-scenes on why Salesforce let Heroku stagnate?
Jake [00:54:05]: I can only guess. It’s hard when it’s not your business. Salesforce’s business is to build a great CRM. That’s their focus. Then you acquire a compute business as an offshoot. A lot of early Meta people talk about focus. Boz has a write-up about how in the early days of Meta they had no money, so they were forced to focus. Then they turned on the money tree and had no reason not to split their focus.
Jake [00:54:52]: But that dilutes your product. You get offshoots where you ask, “Is this the focus of the business?” If it’s not core, it languishes. A lot of companies get in trouble when they split focus because they’re fighting a multi-front war, not just externally but internally for alignment. Where are we going? What are we doing? What is our purpose?
Jake [00:55:24]: If you’re Salesforce-built and mission-driven, you want to work on Salesforce. Heroku is off to the side. It’s not core to the business. Getting resources, budget, focus, and alignment internally becomes hard. It was a matter of time.
Swyx [00:56:06]: Kudos to them for calling it out instead of leaving it unknown.
Jake [00:56:12]: Their release was a little odd. They called it out, but they didn’t say they were shutting it down. Behind the scenes, I think they issued messages to people saying they should close accounts and that they were going to deprecate and remove things over time.
Jake [00:56:30]: It’s crazy because some of my first deployment experiences were on Heroku. You start with dragging things into an FTP server, then you try to get a deploy working, and then it’s Heroku. It was the on-ramp for us. But the wheel turns.
New things emerge. We’re happy to carry the torch for a lot of that. But we don’t want to be the new Heroku. We want to be the way people build and deploy software, and ultimately the way people monetize software over time.
Swyx [00:57:19]: It’s still a big crown to be the new Heroku. There are 50 companies that fought for that.
Jake [00:57:23]: Everybody is holding some portion of it. We’re happy to support people and companies. The platform works differently. The game loop is similar, but we’ve been dogmatic about where these things are going: primitives, agents, fan-out. Some things fit; some workflows need to change. We have an approximation of Heroku pipelines with the environment system. It’s exciting. We’ve got a ton of people we can support, and it’s growing a lot.
Temporal, Workflow Engines, and State Machines
Swyx [00:58:12]: I have one more technical question about Temporal. I’ve sold my shares. You’re a power user and one of our earliest customers. I met you through Temporal. You built on Temporal. You have complaints. This may be the most neutral and informed conversation anyone will hear about Temporal without someone working at the company.
Jake [00:58:39]: That’s fair. I’ve used Temporal for almost 10 years because of Cadence at Uber.
Swyx [00:58:52]: Give people a sense of what Cadence was at Uber.
Jake [00:58:57]: Cadence was the precursor to Temporal. It powers trip actions, rides, when you rent a Jump bike or scooter or car. You’re running workflows for a period of time and saying, “This ride will run indefinitely until it finishes.” You attach information: you paused in this zone, so add this charge to the bill. When you end the trip, the workflow is done. That experience was powered by Cadence at the time.
Swyx [00:59:34]: I used to say it’s like programming the entire user journey top-down as one function.
Jake [00:59:39]: It’s a powerful idea and important. It’s also important for the next phase of the agentic journey.
You want an agent to do a specific task, be complete or incomplete on that task, and move on to the next thing. You need a way to manage workflows dynamically.
Jake [00:59:59]: Temporal was always great in theory, and great when you got it working the way you wanted in production. But it required you to model the entire journey in your head. If you didn’t, you could cause issues where replaying the state of the workflow causes non-determinism.
Swyx [01:00:25]: Because it works on deterministic workflow history.
Jake [01:00:28]: Exactly. I describe it as a jet engine. If you know how to operate it and run it, it’s great. But you can’t hand it to people trying to build complicated things if they don’t have the whole state in their head.
Jake [01:00:48]: We run our whole deployment pipeline on top of it. That’s a reasonably complicated workflow: pre-commit hooks, signaling, queuing, and all the rest. We ran into the same thing at Uber. As you express a large workflow, it gets more complicated, with more states in the state machine that you have to map back to the workflow.
Swyx [01:01:15]: It’s a lot of ifs.
Jake [01:01:16]: Exactly. At Uber, we built a system for doing the state machine and testing it. We’ve started to build some of those things here because it’s grown heavily. It’s not quite love-hate. When it works well, it works super well. But if someone who doesn’t have full context puts something into the system that invalidates state or causes non-determinism, or spins off a ton of activities, you have to keep track of underlying SRE knobs like activity slots. Those should scale with memory, vCPU, and so on. It becomes a bear to scale.
Swyx [01:02:10]: You need a capable sysadmin running things behind the scenes. If you moved off, what would you do?
Jake [01:02:19]: We’d build our own workflow engine.
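The non-determinism problem discussed here comes from replay: on recovery, the workflow code is re-executed and the commands it issues must match the recorded history step by step. The toy check below illustrates that idea only. It is not Temporal's SDK or API; `WorkflowContext`, `schedule`, `run`, and the `ride_v1`/`ride_v2` workflows are hypothetical, and real engines record much richer events (activity results, timers, signals).

```python
from typing import Callable, List, Optional


class NonDeterminismError(Exception):
    """Raised when re-executed workflow code diverges from recorded history."""


class WorkflowContext:
    """Records the commands a workflow issues, or checks them against history on replay."""

    def __init__(self, history: Optional[List[str]] = None) -> None:
        self.history = list(history or [])
        self.commands: List[str] = []

    def schedule(self, command: str) -> None:
        step = len(self.commands)
        # While replaying, each command must match what was recorded at that step.
        if step < len(self.history) and self.history[step] != command:
            raise NonDeterminismError(
                f"step {step}: history has {self.history[step]!r}, code issued {command!r}"
            )
        self.commands.append(command)


def run(workflow: Callable[[WorkflowContext], None],
        history: Optional[List[str]] = None) -> List[str]:
    """Execute (or replay) a workflow and return the commands it issued."""
    ctx = WorkflowContext(history)
    workflow(ctx)
    return ctx.commands


# Hypothetical ride workflow, loosely modeled on the Cadence example above.
def ride_v1(ctx: WorkflowContext) -> None:
    ctx.schedule("start_trip")
    ctx.schedule("charge_zone_fee")
    ctx.schedule("end_trip")


# A later code change inserts a new step in the middle of the workflow.
def ride_v2(ctx: WorkflowContext) -> None:
    ctx.schedule("start_trip")
    ctx.schedule("send_receipt_preview")
    ctx.schedule("charge_zone_fee")
    ctx.schedule("end_trip")
```

Replaying `ride_v1` against its own history succeeds, but replaying `ride_v2` against a history recorded by `ride_v1` raises `NonDeterminismError` at step 1. Temporal's SDKs offer versioning (patching) APIs for exactly this case, which is part of the modeling burden described in the conversation.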
We have a few internally that we’ve worked on.
Swyx [01:02:27]: This is one of those classes of things you typically wouldn’t vibe code, but I’m wondering if you can.
Jake [01:02:33]: I still don’t think you should vibe code it. You still want to run decent tests to make sure it works.
Swyx [01:02:39]: Temporal didn’t invent that from scratch either. There are libraries you can run. On top of that, it’s just a state machine that you have to map out. Ultimately, you define the instructions you want and run them through a state machine.
Jake [01:03:00]: It’s very doable. Workflow stuff is interesting. Restate is doing neat stuff here.
Swyx [01:03:10]: You’re tied into JavaScript. Are you a JavaScript maxi?
Jake [01:03:13]: Internally, we have TypeScript, Rust, and Go. We don’t add more languages. Actually, we have a little C because we write BPF code and hooks. But those are the languages.
Swyx [01:03:28]: Is this for sidecars?
Jake [01:03:32]: No. It’s for the networking stack, volumes, and things like that. We use TypeScript a lot because it powers the dashboard, but we’re moving a lot of workflow stuff off the dashboard stack and into the infrastructure stack.
Railpack, Nixpacks, and Content-Addressable Filesystems
Swyx [01:04:00]: Cool. Any other technical infrastructure stuff? Railpack?
Jake [01:04:07]: We built an engine for determining dependencies based on source code. It’s called Railpack. We built the first version, Nixpacks, on top of Nix, and then we moved.
Swyx [01:04:17]: People have been trying to get me to adopt Nix and NixOS for four years. Is it ever going to be a thing?
Jake [01:04:23]: I don’t know. We’re excited about it, but it has pain points. Think of it as a stack of versioned binaries at specific slices in time. If you want version X and version Y, you bloat the package space, which blows up image size and makes real-world workloads difficult.
Swyx [01:04:53]: But you content-address it and cache it.
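Content addressing, as raised here, can be illustrated with a toy store in which every blob is keyed by the hash of its bytes. This is a simplified sketch, not how Nix or Railpack actually work (Nix's default store paths, for example, are derived from build inputs rather than only from file contents); the `ContentStore` class and `add_image` helper are hypothetical.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy content-addressable store: each blob is keyed by the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content always hashes to the same key, so it is stored once.
        self.blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


def add_image(store: ContentStore, files: Dict[str, bytes]) -> Dict[str, str]:
    """Store every file of an image and return its manifest (path -> content hash)."""
    return {path: store.put(data) for path, data in files.items()}


if __name__ == "__main__":
    store = ContentStore()
    libc = b"L" * 1000
    add_image(store, {"/lib/libc.so": libc, "/bin/node": b"node-18" * 100})
    add_image(store, {"/lib/libc.so": libc, "/bin/node": b"node-20" * 100})
    # The shared library is stored once; the two node versions are stored separately.
    print(store.stored_bytes())  # 2400
```

Identical files across images deduplicate, but each distinct version of a binary is still stored in full, which is why pinning many versions can bloat the package space even with content addressing and caching.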
In theory, there are optimizations.
Jake [01:05:00]: In theory, yes. But with a large enough user base and disparate enough machines, you run into a problem Meta described in the XFaaS paper on their internal serverless system. It becomes difficult at scale unless you break out specific runtimes.
Jake [01:05:24]: We didn’t want to do that because we wanted to truly allow you to deploy anything. That was our initial thing with Nix. But we’ve moved toward interesting work around content-addressable file systems that can lazy-load anything from any point and page it into memory.
Swyx [01:05:48]: Amazing.
Jake [01:05:49]: The future is very bright. It’s crazy, and it’s going to be nuts.
Coding Agent Spend, Roadmaps, and Token ROI
Swyx [01:05:54]: Founder journey stuff?
Alessio [01:05:56]: Your cloud usage: you tweeted you’re going to spend $300K this month?
Jake [01:06:01]: I think we got to $200K.
Alessio [01:06:02]: Coding agents?
Jake [01:06:03]: Yeah.
Swyx [01:06:04]: Across the company?
Alessio [01:06:05]: You only have 35 people, so I’m sure they’re not all spending $10K a month. What’s the distribution?
Jake [01:06:10]: I think I’m at about $25K. We have power users all the way down. We came back from winter break, and I basically said, “If you’re writing code by hand, you’re doing this wrong.” The tools are good enough now that you can move extremely quickly. There are issues and pain points, but you should be reviewing the code instead of writing it by hand.
Jake [01:06:40]: Architectural patterns matter more now than ever, but you shouldn’t spend your time hand-writing code you already know how to write. If you know how to write it, ask the agent to write it and reconcile it until it looks like you would have written it yourself.
Jake [01:06:58]: People misconstrue my propensity to push people toward agents as connected to our growth and some reliability bumps. They’re not necessarily related.
The tools are good enough to move extremely quickly and build things way larger than you could before.
Jake [01:07:19]: To the earlier point about cooling data centers in space: I don’t know. But with software, you can ask, “How would I build block storage from scratch? How would I do these things?” I have ideas because I have history and have read papers. Let me work them out and build massive test benches with thousands of tests, because those are now free to author. If you’re not using AI systems to speed-run your roadmap and reconcile your existing system onto the future, you’re missing a large point of what’s happening.
Alessio [01:08:12]: What’s the path to spending $3 million a month? Is it bound by ideas and things customers can absorb?
Jake [01:08:19]: For most companies, it’s bound by deployment at this point. That’s why we’ve seen a massive boom in users and companies, from Fortune 50s down, asking how to get developers to move faster. You’ll probably hit your CFO before any technical limits because they’ll look at the eye-watering amount of money spent on tokens. Inference costs have to come down, but we’re inference constrained now. There will be price discovery around what makes sense for an org to adopt.
Jake [01:09:06]: I think you’ll end up with the F1 driver concept. If someone is really adept at these things, it makes sense to put them in a $3 million car. If they’re not, it probably doesn’t make sense. You’ll take a few people and say, “You can drive the F1 car. We need to go in this direction. Figure out if it works and prototype it.”
Jake [01:09:33]: We’ve done some of that and vastly accelerated our roadmap. We thought we’d ship something in a few years; now we can probably ship it in a few months because we validated it and don’t have to build it incrementally. We can skip steps and move toward our vision.
Alessio [01:09:58]: A lot of people are realizing the roadmap doesn’t always have a business impact, so they say tokens are too expensive.
But if your roadmap were built to make more money by the time you built it, you’d budget tokens for it, the same way you do with sales. You’d spend a billion dollars on sales if you knew you would get $2 billion of revenue.
Jake [01:10:19]: Exactly. A naive way to measure this is the percentage of tokens that end up in production. If you can measure impact because those tokens end up in production, that’s awesome. But the burden of proof will rise. Internally, we have a growing number of pull requests that haven’t merged. The question becomes: how do you get this into production? It’s about how quickly you can build and deploy software, which is exciting because that’s our whole thing.
The SDLC Shift: Prompt Requests, Feature Flags, and Safe Rollouts
Swyx [01:10:56]: The SDLC is changing. One thesis is that the pull request is dying. It’s going to be the prompt request. Beyond that, code review is also kind of dying if you have all the other systems in place. What else is changing about the SDLC?
Jake [01:11:19]: The AI SRE and the tools to make it happen. AI SRE is pie-in-the-sky aspirational. What does it take to get an AI SRE? What tools do you need to build?
Swyx [01:11:32]: You should expose your tooling to customers at some point. The Central Station command center.
Jake [01:11:39]: We have it for template maintainers. Template maintainers can deploy and maintain templates, and they get feedback. We’re going to expose those things incrementally.
Swyx [01:11:51]: Clustering around incidents. Everyone has a version of that, but I don’t think anyone has solved it.
Jake [01:11:56]: I won’t say we’ve solved it internally, but it’s gotten so good that we can see incidents forming pretty quickly. At some point, those will be things either someone else builds or we build. We’ve always built things for our own purposes first.
If it makes sense to make it useful for users, monetize it, or turn that loop into a profit center instead of a cost center, we want to do that.
Jake [01:12:28]: The pull request is definitely dying.
Swyx [01:12:29]: Do you do first-party feature flagging and incremental rollout stuff?
Jake [01:12:34]: We have a feature-flagging engine we built internally and will eventually roll out.
Swyx [01:12:38]: I don’t see it as a user. How come you didn’t give us what you have?
Jake [01:12:43]: We have to beta test it. We care a lot about the quality of what we ship. There’s plenty we’ve used internally that doesn’t make it all the way through the journey because it fails. It works for one service but not multiple services. We’d have to build it for multiple services and know that if we released it, we’d rebuild it again and again. Some things are worth that, but many inform the roadmap.
Jake [01:13:18]: We don’t want to dilute the experience by saying, “This works, but only for this service,” unless it’s a core initiative. Over the next few months, we’ll roll out things that work for a single service, then multiple services, then multiple services across the environment. You have to be deliberate. Otherwise you create broken, disparate experiences and support load because people ask how to use the feature.
Jake [01:13:52]: It’s the earlier expansion and compaction pattern. You expand the company to get features, then compact and smooth them out so the experience is stellar. You told me in the hallway, “It’s gotten so much better.” Internally we’re saying, “This part really sucks. We need to make it significantly better.”
Swyx [01:14:11]: I can attest to that over the last three years watching you build Railway. For listeners, feature flagging is a huge part of Uber culture. So much so that they have too many feature flags and built another tool just to remove stale ones. Facebook has Gatekeeper. Agents are going to need this. It’s fundamental to incremental rollouts. OpenAI acquired Statsig.
GPT-5 is routing and flagging through different models.
Jake [01:14:56]: It’s super important. If the software development lifecycle is going to change because we’re doing things 1,000 times faster and 1,000 times more concurrently, what becomes important at scale?
Jake [01:15:16]: Before I started Railway, I built a feature-flagging product and tried to sell it. It was an easier version of LaunchDarkly. I ran into a problem: anyone small enough to adopt your technology doesn’t care about feature flags, and anyone large enough to need feature flags needs so much scale that you have to build out all the infrastructure. I scrapped it.
Jake [01:15:42]: But what is old is new again. Companies are trying to move quickly, but you can’t YOLO a vibe-coded thing straight into production. You need to say, “Here’s my blast radius, my impact, and I want to shadow it for these users.” Feature flags. You’re going to need the tools larger companies built to maintain their structures. Everything gets compressed by 1,000x so everybody can build those structures quickly.
Jake [01:16:07]: That’s exactly where we are: compressing the software development lifecycle, then expanding it and adding more new things.
Cattle, Pets, and Clonable Infrastructure
Swyx [01:16:15]: Another term that comes to mind for newer developers is “cattle, not pets.” People treat production like a pet. It has a name. You baby it and keep it alive. With cattle, you can mass farm, roll out, portion parts out, and kill them.
Jake [01:16:37]: I think that might change. You can move toward having pets as long as you have a cloning machine for your pets.
Swyx [01:16:52]: Yeah.
Jake [01:16:52]: If you can snapshot every single thing at every frame, it doesn’t matter if something gets obliterated because you have a snapshot of it. The things we’ve built right now are designed to block changes from the hermetically sealed DevOps line.
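The “cloning machine for your pets” depends on snapshots that are cheap to take and cheap to fork. A minimal sketch, assuming a copy-on-write layering scheme similar in spirit to overlay file systems: a frozen layer acts as a snapshot, and a fork is a writable layer on top of it that shares every unchanged file. The `Layer` class and its methods are hypothetical, not Railway's implementation, and the sketch ignores lazy loading, block-level storage, and merging.

```python
from typing import Dict, Optional

_DELETED = object()  # tombstone: marks a path removed in a child layer


class Layer:
    """Copy-on-write file-system view: local changes overlay a frozen parent snapshot."""

    def __init__(self, parent: Optional["Layer"] = None) -> None:
        if parent is not None and not parent.frozen:
            raise ValueError("children can only be created from a frozen snapshot")
        self.parent = parent
        self.changes: Dict[str, object] = {}
        self.frozen = False

    def read(self, path: str) -> Optional[bytes]:
        # Walk from this layer down through its ancestors; the nearest change wins.
        layer: Optional[Layer] = self
        while layer is not None:
            if path in layer.changes:
                value = layer.changes[path]
                return None if value is _DELETED else value
            layer = layer.parent
        return None

    def _check_writable(self) -> None:
        if self.frozen:
            raise PermissionError("snapshots are read-only")

    def write(self, path: str, data: bytes) -> None:
        self._check_writable()
        self.changes[path] = data

    def delete(self, path: str) -> None:
        self._check_writable()
        self.changes[path] = _DELETED

    def freeze(self) -> "Layer":
        """Turn this layer into an immutable snapshot."""
        self.frozen = True
        return self

    def fork(self) -> "Layer":
        """Create a cheap writable copy that shares every unchanged file."""
        return Layer(parent=self)


if __name__ == "__main__":
    prod = Layer()
    prod.write("/etc/app.conf", b"v1")
    prod.write("/var/db/data", b"rows")
    snapshot = prod.freeze()

    agent = snapshot.fork()  # the agent's throwaway copy
    agent.write("/etc/app.conf", b"v2")
    agent.delete("/var/db/data")

    print(snapshot.read("/etc/app.conf"), agent.read("/etc/app.conf"))  # b'v1' b'v2'
```

Each fork stores only what it changes, so spinning up many throwaway copies for agents costs little, and throwing one away is just dropping the layer.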
You have to write a Dockerfile because you need a specific cut of the file system.
Jake [01:17:14]: What if you had the whole file system? What if you snapshot it and lazily load the entire file system? Then you get around this problem entirely. You don’t need the ceremony of Dockerfiles, Ansible scripts, or other things. You can iterate, snapshot, ask if it’s the right loop or state, and then merge it into production. Merge the file system.
Swyx [01:17:45]: Why not?
Jake [01:17:46]: It’s going to be fun.
Swyx [01:17:47]: This is a whole other can of worms, but if you cataloged the stateful things in a VM and developed dedicated solutions for each, you can cut the problem down a lot. It’s surprising people weren’t trying until now.
Jake [01:18:04]: It has always been surprising to me because these are the things we would work on. It’s obvious.
Swyx [01:18:11]: At first principles, you need them. Everyone needs them in theory. Then the big clouds don’t do them, so you assume it’s impossible.
Jake [01:18:18]: Exactly. You think, “Meta has all the people writing eBPF code, and they’re doing something with them.” But you need that kind of work to solve these problems. Whatever is required, however deep we have to go, we’ll go all the way down to the kernel’s TCP/IP stack if needed. If we need to modify something to make it work for the mental model of the universe moving forward, we’ll do it and keep going down.
Swyx [01:18:52]: That sounds fun.
Jake [01:18:53]: It’s so much fun. I have to peel myself away from fun, interesting problems to make sure we can scale the company in a way that works. There are so many fun problems: getting information from customers to support to the person who built the thing internally, safe iteration, context from the dashboard to users, drilling down to the infrastructure layer, and managing orchestration as a real-time operating system versus a feedback control system.
It’s just so fun.
Solo Founder Lessons: Obsession, Writing, and Focus
Swyx [01:19:29]: Speaking of the founder side, you’re famously outside the YC/SF consensus, where you go to YC, get a co-founder, and do all these things. You did none of that.
Jake [01:19:40]: None.
Swyx [01:19:45]: In the elevator you said a co-founder makes sense if one person is the tech person and the other is the biz dev person. But you have to contain those multitudes yourself. How do you do it?
Jake [01:19:58]: I try to get eight hours of sleep.
Swyx [01:20:11]: Is there a balance: 50/50, 30/30/30? What’s the mental model as a solo founder?
Jake [01:20:17]: There’s no balance. You have to think about all these things and be obsessed with them. Be obsessed with how people think about your product from a go-to-market perspective, and be obsessed with the kernel-level change that makes a user’s SSH connection never drop. I want a universe where you can snapshot everything and it feels like iterating on a VM.
Jake [01:20:47]: You have to be obsessed at every layer of the stack. That’s what makes it easier for me. Some people are obsessed with different portions of the company journey, and if you can segment those lines well and be clear about ownership, you’ll have a good time.
Jake [01:21:12]: I said two is the worst number of co-founders because you have no tiebreak. You disagree, and how do you resolve it?
Swyx [01:21:38]: Usually someone is CEO, so they have the tiebreaker.
Jake [01:21:43]: Totally. It’s hard every way you cut it. It’s hard if you get help, and it’s hard if you do it yourself. Running things is hard, but it’s so rewarding and fun.
Swyx [01:21:56]: What have you found useful? A coach? Any advice that has been helpful?
Jake [01:22:01]: I like to write a lot. I get in trouble a lot for my Twitter. I once said if you’re working weekends, you’re messing up your planning. I’ve gone back and forth on that because right now we’re in an exceptional period where it makes sense to work more.
The goals are clear in my mind. If you have the vision and know where you’re going, work harder to distill that vision and do those things.
Jake [01:22:33]: If you’re not certain and need clarity, disconnect and take your weekends seriously. Write about where you are, what you want to do, where you want to go, and what problems you’re solving.
Jake [01:22:56]: Writing is important. I don’t love the word meditation, but whatever gets you into mental clarity is important when you’re trying to say, “We’re here and need to be here,” or “We’re here and I think we need to be in this general space for this to work.”
Jake [01:23:22]: Disconnect, hang out with people you love, and work hard when you’re working. I try to work sunup to sundown, Monday to Friday, all out. I disconnect on Saturday and come back Sunday afternoon to write, plan the week, and do everything else. It works well for me.
Jake [01:23:43]: Another hot take: most advice should be digested and thrown out the window. If it’s helpful, it’ll come back. You’ll learn it through experience. We have made failure very expensive as a society, and it makes it difficult for people to walk off the beaten path.
GPUs, Focus, and the Dominant Role of Agents
Swyx [01:24:03]: Anything you haven’t tweeted and gotten in trouble with that you want to preview to the world?
Jake [01:24:12]: The agent stuff is crazy. It’s going to be the dominant way people do pretty much everything, provided we can get the inference required for that to happen. Over the next 10 years, you’ll see a fundamental shift in how people think about authoring the logic in their head.
Swyx [01:24:36]: One way of phrasing it is: if Allbirds can become a GPU provider, so can Railway.
Jake [01:24:44]: I think there’s a lot of “everyone becomes a GPU provider” that is actually not becoming a GPU provider.
You’re defined more by the things you don’t do than the things you do, because it’s easy to say yes to a lot of things.

Jake [01:24:56]: Anthropic is amazing and moving into different zones. They’re moving into Figma-like things.

Swyx [01:25:09]: As we’re recording, Mike Krieger was on Figma’s board, they removed him Monday, and then they launched this today.

Jake [01:25:18]: Things move fast right now. But agents are going to be the way people operate.

Swyx [01:25:25]: So your answer is focus: no GPUs for now, but never say never.

Jake [01:25:27]: Focus. We will not do GPUs now, but we will 100% do GPUs at some point in the future. That’s not me leaking our roadmap, because we don’t have plans to do GPUs. It’s just a function of needing FLOPS at some point. If you’re fully vertically integrated and want to make it trivial for people to iterate, build, and deploy, you need access to this core piece of fundamental logic.

A New Cloud From First Principles

Swyx [01:25:57]: Presumably your own data center traffic is a minority of your workload right now, but is there a point where it’s a majority or you turn off public clouds?

Jake [01:26:10]: At some point, we got to 100% on our own data centers. Right now, the vast majority of what exists on our platform is on our bare-metal data centers.

Swyx [01:26:21]: So you’re already there.

Jake [01:26:23]: Yeah. The transition was completed at some point, and then we grew so fast that we had to scale back on that. It got to 100% on the Datadog dashboard and then dipped back into the 90s because we were adding capacity.

Swyx [01:26:45]: You’re literally building a new independent cloud, and people assume that could never happen post-AWS.

Jake [01:26:53]: It’s hard. We’re going to figure out a bunch of things to make sure the platform is deeply reliable.
But you have to break ground on new things when you decide to build a cloud from scratch without copying the hyperscalers.

Jake [01:27:10]: We’ve been deliberate about inventing our own infrastructure from scratch based on reading a ton of papers, while promising ourselves we wouldn’t copy someone else’s homework. If we copy someone else, we lose. You become them over time. You need a core thesis for why this business needs to exist now.

Jake [01:27:33]: For us, the activation energy required to deploy something in production on hyperscalers is far too high. We believe it should be instantaneous. There should be no friction between your thought and the reality that comes out of it and that you can share with friends. That’s what we’re building toward at every layer of the stack. If we have to go down to energy, we’ll go down to energy.

Jake [01:27:58]: It matters for giving people access to this tooling. Today it’s gated, and not just for the citizen developers who are now vibe coding. You have multiple layers: citizen developer, front-end developer, back-end developer, DevOps person, and more. Those layers need to disappear so people can just ship.

Swyx [01:28:20]: Amazing. That’s the future of cloud.

Jake [01:28:22]: Awesome. Thanks for coming on. Thank you for having me. It’s been wonderful.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
The future of war has been evolving before our eyes in Ukraine, yet the West still plans to fight the last war. In this special episode, guest host Noah Smith (@noahpinion) and Brandon Anderson sit down with Yaroslav Azhnyuk (@YaroslavAzhnyuk), a serial tech founder who went from building Petcube to founding The Fourth Law, one of the world’s most advanced AI-guided drone companies. Over two hours we cover the technology, tactics, and geopolitics of drone warfare, and why the modern battlefield has already left the West behind:

* Yaroslav’s personal history and the Ukraine war [00:01:04 – 00:14:01]
* The modern drone tech stack: why FPV drones are the new god of war, the future of the rifleman, fiber optic vs. AI, five levels of autonomy, and the eight dimensions of the autonomous battlefield [00:14:01 – 01:05:13]
* The geopolitics and economics of drones: China’s manufacturing advantage, the drone race, Western defense readiness, countermeasures, and why the gap is widening [01:05:13 – 01:58:57]

For those looking for Noah Smith’s commentary, it really gets going around the 00:51:31 mark.

Yaroslav Azhnyuk / The Fourth Law:
* X: https://x.com/YaroslavAzhnyuk
* LinkedIn: https://www.linkedin.com/in/yaroslavazhnyuk/
* The Fourth Law: https://thefourthlaw.ai

Noah Smith:
* Substack: Noah Smith
* X: https://x.com/noahpinion

Timestamps
00:00:00 Cold Open: China’s 4 Billion Drones and the Cameras-to-Explosives Pipeline
00:01:04 Introduction: Brandon, Noah Smith, and Yaroslav Azhnyuk
00:05:41 From Tech Entrepreneur to Defense: Petcube, Brave One, and the D3 Fund
00:10:42 The Ethics of Building Weapons: Dual-Use Technology and the Wolf at the Door
00:14:01 The Tech Stack: Cameras, Autonomy Modules, Interceptors, and a Semiconductor Fab
00:18:47 Fiber Optic vs. AI: The Radio Horizon Problem and $32/km Cable
00:25:32 FPV Drones: The New God of War, 70–80% of Frontline Casualties
00:28:28 The Five Levels of Drone Autonomy: From Terminal Guidance to Full Autonomy
00:41:37 The Eight Dimensions of the Autonomous Battlefield
00:45:32 AI Safety and the Morality of Autonomous Weapons
00:51:31 The End of the Rifleman? Noah’s 2013 Prediction vs. Battlefield Reality
01:05:13 China’s Manufacturing Advantage and Western Vulnerabilities
01:24:21 Policy Advice for Western Defense: Defense Valley and the Widening Gap
01:32:54 The Drone Race: Who’s Ahead, Category by Category
01:41:57 Countermeasures: Shotguns, Jammers, Lasers, and Fishnets
01:58:19 The Wedding and Final Takeaway: Be Prepared for War

Transcript

Cold Open: China, FPV Drones, and the New Warning Sign

Yaroslav [00:00:00]: Think about this. Last year, Ukraine produced 4 million FPV drones. Ukraine is not the most industrious nation in the world. China can produce 4 billion of these FPV drones.

Noah [00:00:10]: Would you say that right now China is the supreme conventional military power on Earth, given its ability to manufacture and deploy drones in the quantity and quality that you just described?

Yaroslav [00:00:20]: I don’t think we have all the information to claim that, but we cannot count it out, and that alone should be a big warning sign. As I say, at some point in my life I went from making cameras that fling treats to pets to cameras that fling explosives at the occupiers. So that’s the short story. And when you think about what your nation, what your compatriots are going through, you realize the only morally right thing to do is to fight back, and it is immoral not to fight back, and then the choice becomes very clear.

Introduction: Yaroslav Azhnyuk, Petcube, and the Last Flight into Kyiv

Brandon [00:01:04]: Welcome to Latent Space. I’m Brandon. I normally do science podcasts, but today we’re going to do something a little bit different.
I’m joined by Noah Smith of Noahpinion on Substack and Twitter, who has lots of interesting things to say about drones. And as a guest, we have Yaroslav Azhnyuk, founder of The Fourth Law and several other drone-related startups. To get started: it is February 23rd, 2022. You are running a pet startup, connecting pets with their owners. Let’s get a little bit of background. How did you get started in tech, and what were you working on before the Ukrainian war started?

Yaroslav [00:01:50]: Good to be here. Thank you. On February 23rd, late in the evening, 11:00 PM Kyiv time, my wife and I landed in Kyiv. Actually, she was my fiancée then. We came from Lviv, where we were looking at the church where our wedding was supposed to take place. We got into a cab from the airport to our home, and the driver was like, “You’re crazy. Everyone’s leaving Kyiv. Why are you coming?” We’re like, “What? Nothing’s going to happen. Dude, chill.” And then, obviously, eight hours later, the bombs fell on the city. It was quite surreal. We probably landed on the last flight into Kyiv, or one of the last flights. My background: I’m a tech guy. I studied applied mathematics at Kyiv Polytechnic, and I was born and raised in Kyiv. My parents are PhDs from academia, and my grandparents too, in everything from linguistics to nuclear physics. And I’m an entrepreneur, so I’ve built a bunch of companies. Petcube is the one you were referencing. I lived in San Francisco from 2014 to 2020 building Petcube, which is one of the leading pet device companies in the world, selling lots of pet cameras. And then, as I say, at some point in my life I went from making cameras that fling treats to pets to cameras that fling explosives at the occupiers.
So that’s the short story.

February 24th: Leaving Kyiv as the Invasion Begins

Noah [00:03:28]: February 24th, I guess a few hours after you went to check out your wedding chapel. What do you do?

Yaroslav [00:03:37]: We had a plan for this situation. My parents and family live in Kyiv, and we’re like, “Okay, this has actually started. The worst has come true.” So we basically packed our belongings, got in the car, and spent 17 hours driving west. I’m pretty sure most people in our audience have watched at least one apocalyptic movie in their life, and it felt exactly like that. Missiles were falling. There was smoke in Kyiv. My dad and I went to the central part of the city, about

Yaroslav [00:04:20]: 800 meters from the presidential office, to pick some stuff up at his workplace, because he’s the head of an academic institution, so he had to take some things with him. Super surreal. The streets were empty. The gas stations were out of gas. We found one gas station, but we didn’t have spare canisters with us. The car was diesel, and we figured out that you can actually store diesel in plastic canisters, so we bought some windshield washer fluid, poured it out of its canisters, and poured the diesel into them. So it was like that. Then we were helping friends get out, like my friend and his dog. My brother was also driving a separate car. We found a place for my friend who didn’t have a car. It was totally surreal. And of course, you didn’t know this would last so long. You didn’t know whether Ukraine would be able to defend Kyiv.
There was very little information and very little insight into the future.

From Pet Cameras to Defense Tech: Building for Ukraine and the Free World

Noah [00:05:42]: What were your thoughts on how to defend Ukraine? You eventually start building drones. What was the process to get from building devices that connect owners with pets to building drones, and what other things did you do to help the war effort along the way?

Yaroslav [00:06:07]: It’s definitely non-trivial, right? I didn’t get any military education when I was a student. Normally in Ukraine, you would go through military training even if you’re getting higher education in another field. I decided to skip that, which is an unusual way to go. And I never thought I would somehow be engaged in a war effort. What is war? Of course, wars are over. It’s the end of history. One thing you have to understand about many Ukrainians, and I guess it’s also true of most of the people I’ve met here in the US, is that your nationality is a big part of your identity. So when that comes under attack, it’s something deeper than just the country you live in being attacked, right? On day one, I figured I was going to fight back with everything I could. But I didn’t think on day one that I was actually going to make weapons. We did a bunch of things. We reached out to a number of American congresspeople and senators, advocating for support of Ukraine and for voting for Lend-Lease, which happened in May 2022 but didn’t actually work as expected. We helped start Brave One, which is now a very important defense innovation cluster, sort of like the DIU here in the US. We helped start a fund called D3, which was started or co-started by Eric Schmidt, the former CEO of Google.
So a bunch of these odd things, but then eventually I was like, “Okay.” By 2023 it was obvious that, A, this thing was going to last a lot longer, and B, the whole world was shifting: there was going to be a new arms race, and warfare was being redefined by drones as platforms. For the first time in history, you have a platform that is software-defined, one that can increase your battlefield capabilities in a step change overnight. It’s as if you could push a software update and give all of your Roman legionnaires a new helmet. That has never been possible before. It’s the first time in the history of war that this is possible. So all of that, and many other things, like supply chain fragilization and the impact AI is going to have on all of this, became evident to me in 2023. I thought, “Okay, I should do what I know how to do best: start a tech company and leverage the global techno-capitalist machine to provide defensibility to Ukraine and the free world.” That’s literally the mission of the company: increase the defensibility of Ukraine and the free world. And then there was some soul-searching, asking yourself, “Okay, actually, I know nothing about weapons. Am I actually ready to make things that other people use to kill other, bad people?”

Yaroslav [00:09:36]: When you think about what your nation, what your compatriots are going through, and think about all the terror of places like Bucha, the occupied cities in the east and south, the abducted children, the raped women, all the economic damage being done, and the intention to destroy a whole nation, to commit genocide against the people of Ukraine, you realize the only morally right thing to do is to fight back, and it is immoral not to fight back. And then the choice becomes very clear. And look, we’re just passing the ammunition. We’re not doing the actual job.
The actual fighters, defenders, and heroes are the people in the armed forces. We’re just support.

The Moral Question: Weapons, Responsibility, and Fighting Back

Noah [00:10:33]: I have so many questions. Actually, I know you seem to have a question. Do you want to ask anything?

Yaroslav [00:10:38]: No, I’m just listening. Go ahead.

Noah [00:10:40]: I do want to talk about some of, let’s say, the moral issues, like you just said. You end

Yaroslav [00:10:50]: I think there are no issues there.

Yaroslav [00:10:52]: What would an example of a moral question be in this case?

Noah [00:10:55]: No, I mean, okay. As you just said, you are creating the tools, but others are using them.

Noah [00:11:05]: I was maybe thinking of having this conversation later, but one of the questions is this: you are building them for your homeland, which I think is a strong, morally defensible position, but this technology is not going to stay with you, right?

Noah [00:11:26]: You will probably be selling these to other people. Yeah. So the future is really where the moral issues may come into play.

Yaroslav [00:11:38]: This question becomes easier and more complete if we ask it not about a particular technology or a particular weapon, and recognize that it actually applies to any kind of technology, right? A knife, or fire. You can use a knife to do surgery and save people’s lives, or you can use it as a weapon to take people’s lives.

Noah [00:12:06]: Cut tomatoes, too.

Yaroslav [00:12:08]: Cut tomatoes too.

Noah [00:12:09]: Yes, knife.

Yaroslav [00:12:09]: That’s helpful.

Noah [00:12:10]: In Japan, they use the same word for sword and knife.

Yaroslav [00:12:14]: It’s like that with any technology. Large language models, right? Look at how powerful they are, and yet they’re available to anyone in North Korea or in Russia.

Yaroslav [00:12:29]: That’s one side of the argument.
The other side is: as a maker, what is your responsibility for how the tools you’re creating will be used? There’s definitely some responsibility, right? Then what should the decision process look like? Should you try to calculate all the possible scenarios before starting to work on something? Or do you create something that is needed now to save people’s lives, and then think about addressing the unwanted edge cases later? In an ideal world, or okay, not an ideal world, in a mythical world where there is one governing party that gets to decide everything, and no other country can decide on its own, you could say, “Well, we need to calculate all the consequences, and only then maybe build this building by replacing this park, because maybe we need this park in the city,” right? That kind of situation. But when you’re in a forest in front of a wolf, you’re first going to deal with the wolf that wants to eat you, and then you’re going to go consult Greenpeace. That’s the kind of situation Ukraine is in.

The Fourth Law, Odd Systems, and Ukraine’s Drone Stack

Noah [00:13:59]: Enough. Because this is a tech podcast, I did want to spend some time talking about the tech you’ve developed and what you’ve been working on. So can you explain, first of all, the problem you were trying to solve from a technical standpoint? And then maybe go into some of the solutions and the design process that led you from guiding little lasers with an iPhone to building drones.

Yaroslav [00:14:34]: It so happened that my partners and I... So I started one company called The Fourth Law, and its goal was, and is, to make massively scalable on-drone autonomy.
And then, in parallel with that, together with my Petcube co-founders, partners, and friends, we started another company called Odd Systems, which was focused on making thermal cameras. Thermal cameras see thermal radiation and are used to see at night. Those companies are getting closer and closer together, and we’re probably going to merge them. This group of companies is currently the leading team in on-drone AI and thermal imaging on the Ukrainian battlefield, and likely one of the leading, if not the leading, teams in the world. So we have three business units: cameras, drone autonomy, and drones. The cameras and drone autonomy units sell daytime and nighttime cameras and different types of drone autonomy modules to other drone manufacturers, over 200 of them in Ukraine. The UAV business unit sells the drones themselves to the Armed Forces of Ukraine and the Ukrainian government. There are different types of drones. There’s front strike, as we call it, meaning FPV strike drones and bombers, and then interceptors. There are different kinds of interceptors: we do Shahed interceptors and ISR interceptors. We don’t do deep strike.

FPV Drones, Interceptors, and Battery-Powered Warfare

Noah [00:16:32]: What’s an ISR interceptor?

Yaroslav [00:16:33]: ISR stands for intelligence, surveillance, and reconnaissance. Those are basically drones the Russians use to watch over positions and then communicate where the targets are.

Noah [00:16:48]: So, reconnaissance.

Yaroslav [00:16:48]: ISR is sort of the classical term for a reconnaissance drone.

Noah [00:16:53]: Are all of the drones you just described battery-powered? ’Cause I know the deep strike drones still have some sort of

Yaroslav [00:17:01]: Internal combustion engine?

Noah [00:17:02]: Internal combustion engine.
Are all the things you’re talking about battery-powered?

Yaroslav [00:17:06]: What we’re working on is all battery-powered, right? We don’t do deep strike, right? And then in terms of autonomy

Noah [00:17:12]: You can catch a Shahed with a battery-powered thing? It’s not too fast to catch?

Yaroslav [00:17:17]: No, absolutely. Look, our Shahed interceptor, which is called Zero, goes up to 326 kilometers per hour.

Noah [00:17:26]: For reference, how fast is a Shahed?

Yaroslav [00:17:28]: In the terminal phase it could be 280, but in the cruise phase it’s about 220.

Yaroslav [00:17:36]: Yeah. And sorry, you can convert that into miles if you’re interested.

Noah [00:17:41]: No, that’s fine.

Noah [00:17:41]: Multiply by two thirds or 0.6 or something.

Yaroslav [00:17:44]: That’s easy. Yeah, I was saying that for autonomy modules, we make autonomous systems for the front line, for interceptors, and some for deep strike as well, with different levels of autonomy: from terminal guidance, which covers the last 500 meters, give or take, to autonomous bombing, to autonomous target detection, to autonomous navigation. All of that across day and night, different terrains, different times of the year, and different platforms like quadcopters and fixed-wing, and maybe some other platforms. So it’s quite a wide variety of products. We also have our own simulation. We have our own training school for warfighters. And we’re about to start construction of two semiconductor plants to make sensors for thermal cameras. That’s super exciting for me as a computer science guy: doing semiconductors. Super cool.

Noah [00:18:49]: So in terms of core drone technologies, one is basically an FPV replacement without fiber optics, and the other is

Yaroslav [00:18:59]: You

Noah [00:18:59]: Signal tracking with interceptors

Yaroslav [00:19:00]: With or without fiber optics.
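For listeners who think in miles per hour: a mile is exactly 1.609344 km, so the exact factor is about 0.62, and the on-air rule of thumb of "two thirds or 0.6" brackets it. The minimal Python sketch below converts the speeds quoted in this exchange. The dictionary labels are illustrative only, and "top speed" is taken at face value from the conversation.

```python
# Convert the speeds quoted in the conversation from km/h to mph.
# 1 mile = 1.609344 km exactly, so 1 km/h is about 0.6214 mph.
KM_PER_MILE = 1.609344


def kmh_to_mph(kmh: float) -> float:
    """Convert a speed in kilometers per hour to miles per hour."""
    return kmh / KM_PER_MILE


# Figures as stated on the show (labels are illustrative).
speeds_kmh = {
    "Zero interceptor, top speed": 326,
    "Shahed, terminal phase": 280,
    "Shahed, cruise phase": 220,
}

for name, kmh in speeds_kmh.items():
    print(f"{name}: {kmh} km/h = {kmh_to_mph(kmh):.0f} mph")

# Speed margin of the interceptor over a cruising Shahed.
margin = speeds_kmh["Zero interceptor, top speed"] - speeds_kmh["Shahed, cruise phase"]
print(f"margin over cruise: {margin} km/h = {kmh_to_mph(margin):.0f} mph")
```

With these figures the interceptor's stated top speed is roughly 203 mph, against about 174 mph for a Shahed in its terminal phase and about 137 mph at cruise. That leaves a margin of about 106 km/h (66 mph) over a cruising Shahed.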
Fiber optics is just a communication module.

Yaroslav [00:19:05]: You can use a classical analog video link and radio link, which would be two separate radios. You can go digital, or you can go fiber optic. Fiber optic has its own advantages, but it also adds weight, decreases the range, and decreases how fast you can turn with the drone. Yeah.

Noah [00:19:33]: Do you need AI for fiber optic drones?

Yaroslav [00:19:36]: You can use AI for fiber optic drones. AI replaces a human, right? Fiber optic makes your communication link more resilient. So those are slightly different goals. If you want, you can have AI controlling hundreds of fiber optic drones instead of having a hundred operators, one for each.

Fiber Optics, Radio Horizons, and Terminal Guidance

Noah [00:20:03]: I guess I thought the key reason people moved to fiber optic drones was electronic countermeasures, or I guess to counter those.

Yaroslav [00:20:13]: I think that’s a correct assessment from a public awareness standpoint. In practice it’s somewhat more complicated, because besides electronic countermeasures, FPV drones have the problem of the radio horizon, which means that as

Yaroslav [00:20:36]: I believe the Earth is round; some people disagree. But basically, say you have a ground station over here and a drone flying over here.

Yaroslav [00:20:49]: If your drone is flying high, you have good direct radio visibility. But Russian infantry and vehicles are usually on the ground, and to hit them you need to go low. The lower you go, the more likely you are to get behind a hill or a forest, and if you’re far enough away, you’ll just get behind the curvature of the Earth. You get into what’s called a radio shadow. That is a real bummer, because for the last 60 or 20 meters you won’t be able to see anything, and it will be very difficult to hit the target.
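The radio shadow described here can be estimated with the textbook smooth-Earth horizon formula, d ≈ √(2·k·R·h). Here R is the Earth's radius, h the antenna height, and k = 4/3 the conventional allowance for atmospheric refraction. Line of sight between two antennas is roughly the sum of their two horizon distances. The minimal Python sketch below assumes a hypothetical 2 m ground antenna. It ignores hills and forests, so real links fail earlier than this.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
K_FACTOR = 4 / 3            # standard-atmosphere refraction factor (assumption)


def horizon_km(height_m: float, k: float = K_FACTOR) -> float:
    """Distance to the radio horizon for an antenna height_m above smooth ground."""
    return math.sqrt(2 * k * EARTH_RADIUS_M * height_m) / 1000


def line_of_sight_km(ground_antenna_m: float, drone_alt_m: float) -> float:
    """Max range at which a ground station and a drone can still 'see' each other
    over a smooth Earth: the sum of the two horizon distances."""
    return horizon_km(ground_antenna_m) + horizon_km(drone_alt_m)


# Hypothetical 2 m ground antenna; the drone descends toward its target.
for alt in (100, 60, 20, 5):
    print(f"drone at {alt:>3} m -> link range ~ {line_of_sight_km(2, alt):.1f} km")
```

Under these assumptions, a drone at 100 m keeps line of sight out to roughly 47 km and one at 60 m to roughly 38 km. At 20 m the range shrinks to roughly 24 km. A strike at 30 to 40 km can therefore lose its link during the final descent even over flat terrain.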
So to counter that... And the distances these FPV drones operate at can be quite large. For example, here in the US there was the Drone Dominance program competition, and in Drone Dominance the furthest distance was about 10 kilometers.

Noah [00:21:44]: What was Drone Dominance? What was that competition?

Yaroslav [00:21:47]: Drone Dominance is a program started by the US government to accelerate the development of drone technology here in the US.

Noah [00:21:57]: Got it. And the longest range they were using was 10 kilometers.

Yaroslav [00:22:00]: It was 10 kilometers, right. In Ukraine, if your drone doesn’t fly at least 20 or 25 kilometers, no one’s interested in it. Many hits happen between 30 and 40 kilometers, and that’s what’s expected from a regular 10-inch FPV drone. At that distance, even at altitudes of 60 to 100 meters, you might start losing the link. So some of the earliest AI technology fielded on FPV drones was terminal guidance. That was the first product we ever launched. Once you as an operator see the target from 200, 300, or 500 meters, you lock onto it, and the drone just drives toward the target no matter what, even after you’ve lost the visual connection. Optic fiber solves that too. However, if you want to go 20 kilometers with optic fiber, that adds an extra three kilos to your drone. So

Noah [00:23:12]: ’Cause the cable you have to unspool as you go has weight.

Noah [00:23:15]: It is heavy.

Yaroslav [00:23:15]: First, the spool itself is about 800 grams, so a bit less than a kilo, and then 10 kilometers of optic fiber is another kilo, something like that.
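Terminal guidance as described here works in two steps. The operator locks a target once, and from then on an onboard loop keeps steering the drone toward it without needing the radio link. The toy Python sketch below illustrates that idea with a simple proportional controller. It is not The Fourth Law's system. The class, gains, pixel-based interface, and sign conventions are all hypothetical, and a real tracker and autopilot are far more involved.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SteeringCommand:
    yaw_rate: float    # >0 turns right (normalized units; hypothetical convention)
    pitch_rate: float  # >0 pitches down toward the target (hypothetical convention)


class TerminalGuidance:
    """Toy lock-and-pursue loop that keeps a locked target centered in the camera image.

    Names, gains, and sign conventions are illustrative assumptions, not a real autopilot API.
    """

    def __init__(self, frame_w: int, frame_h: int, gain: float = 0.005, max_rate: float = 1.0):
        self.cx, self.cy = frame_w / 2, frame_h / 2  # image center in pixels
        self.gain = gain                              # proportional gain per pixel of error
        self.max_rate = max_rate                      # command saturation limit
        self.target: Optional[Tuple[float, float]] = None

    def lock(self, x: float, y: float) -> None:
        """Operator selects the target at pixel (x, y); after this the loop runs onboard."""
        self.target = (x, y)

    def _clamp(self, v: float) -> float:
        return max(-self.max_rate, min(self.max_rate, v))

    def step(self, tracked: Optional[Tuple[float, float]]) -> SteeringCommand:
        """One control tick using the onboard tracker's estimate (None if it missed this frame)."""
        if self.target is None:
            raise RuntimeError("no target locked")
        if tracked is not None:
            self.target = tracked  # refresh with the latest onboard estimate
        x, y = self.target         # a missed frame reuses the previous estimate
        return SteeringCommand(
            yaw_rate=self._clamp(self.gain * (x - self.cx)),
            pitch_rate=self._clamp(self.gain * (y - self.cy)),
        )


if __name__ == "__main__":
    guide = TerminalGuidance(frame_w=640, frame_h=480)
    guide.lock(400, 300)           # target right of and below image center
    print(guide.step((400, 300)))  # yaw about 0.4, pitch about 0.3
    print(guide.step(None))        # tracker missed a frame: same command as before
```

The loop needs only the onboard camera and tracker output. That is the property the episode points to: once the target is locked, losing the video link in the radio shadow no longer stops the strike.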
That takes away from your useful mass, so now you need a 15-inch drone, and it can only carry maybe one or two kilos of explosives if you want to go 20 kilometers. If you want to go 30 or 40... 30 is probably the max, and 40 is very problematic on optic fiber. The other problem with optic fiber is that it’s getting super expensive. Why? Because of all the data centers for AI. That’s literally the same optic fiber

Noah [00:24:01]: We’re running out of centers

Yaroslav [00:24:02]: that’s being used there.

Yaroslav [00:24:02]: When Ukrainians and Russians come to Chinese factories to buy optic fiber, they’re told, “We’re out. We sold it all to the Americans.” That’s the craziest thing. Optic fiber went up in price from about $4 per kilometer to about $32 per kilometer in a few months at the beginning of this year. And I’ve

Brandon [00:24:26]: Claude Code is stopping the Russian drone effort here.

Yaroslav [00:24:30]: The Ukrainian one as well. Yeah.

Brandon [00:24:31]: Ukrainian too. But I read somewhere that the Russians had grown more dependent on fiber optic drones than the Ukrainians, and that’s one reason the Ukrainians have sort of regained the initiative in drones recently.

Brandon [00:24:42]: How accurate is that?

Yaroslav [00:24:43]: The Russians were the first ones to scale that. I think Ukraine has caught up by now. As of maybe three months ago, Ukraine had mostly caught up on fiber optic. Yeah.

Brandon [00:24:57]: What percent of FPV drone damage would you say is now done by fiber optic drones versus autonomous ones?

FPVs as the New God of War: Tanks, Artillery, and Cost per Kill

Yaroslav [00:25:07]: I actually cannot answer that question. I know the answer, but I would not disclose it.
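The fiber trade-off quoted here can be turned into a quick calculator: about 0.8 kg for the spool, about 1 kg per 10 km of fiber, and a price that rose from about $4 to about $32 per kilometer. The Python sketch below treats those on-air approximations as exact. It also assumes the fiber length equals the flight range, with no slack, which is a simplifying assumption.

```python
SPOOL_KG = 0.8          # empty spool, "about 800 grams"
FIBER_KG_PER_KM = 0.1   # "10 kilometers of optic fiber is another kilo"


def fiber_payload_kg(range_km: float) -> float:
    """Spool plus fiber mass, assuming fiber length equals range (no slack)."""
    return SPOOL_KG + FIBER_KG_PER_KM * range_km


def fiber_cost_usd(range_km: float, usd_per_km: float) -> float:
    """Cost of the fiber alone at a given price per kilometer."""
    return range_km * usd_per_km


for km in (10, 20, 30):
    print(
        f"{km} km: {fiber_payload_kg(km):.1f} kg, "
        f"${fiber_cost_usd(km, 4):.0f} at $4/km, "
        f"${fiber_cost_usd(km, 32):.0f} at $32/km"
    )
```

With these figures, a 20 km run adds about 2.8 kg, which matches the "extra three kilos." At $32/km the fiber for that run alone costs about $640 (it was $80 at the old price). That is more than the $400 to $500 unit drone prices quoted later in the episode.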
But for our audience, another interesting fact is that between 70 and 80% of all casualties on the front line are caused by FPV drones.

Brandon [00:25:30]: So FPV drones are the new universal weapon of warfare.

Yaroslav [00:25:34]: It’s

Brandon [00:25:35]: Land warfare, anyway.

Yaroslav [00:25:35]: They used to say that artillery is the god of war, because artillery used to cause about 80% of casualties, and now, in that ranking

Brandon [00:25:46]: FPV

Yaroslav [00:25:47]: FPV drones rule.

Brandon [00:25:48]: FPV drones are the god of war.

Yaroslav [00:25:51]: Sort of. They dethroned artillery. But that’s not to say artillery isn’t useful or needed. All of these systems are needed. Maybe except cavalry, although the Russians still use it. Have you seen the videos of Russians using mules and horses?

Brandon [00:26:09]: What is the usefulness

Yaroslav [00:26:10]: It’s

Brandon [00:26:10]: of a tank in the modern

Yaroslav [00:26:11]: That’s where we need Greenpeace to say a word, but they’re silent. Yeah.

Brandon [00:26:15]: What’s the use of a tank on the modern battlefield?

Yaroslav [00:26:21]: It’s diminishing.

Brandon [00:26:22]: Diminishing.

Yaroslav [00:26:22]: However, I think there might be technologies that will revive the tank. Look, a tank still provides armor, and armor is important. You still need armor and firepower, right? An armored personnel carrier can also provide armor. The challenge right now is that armor is not very well protected against incoming drones. However, there are ways to protect it. We were talking about this before the podcast. The CEO of Rheinmetall recently ridiculed the Ukrainian drone industry, saying there is nothing interesting there, no real innovation, nothing that stands up to Rheinmetall or Boeing, and that it’s all made by housewives. Obviously there were a ton of memes about this, with people ridiculing the CEO of Rheinmetall.
One of the best quotes I heard on this topic is from my friend Alexey Babenko, the head and founder of VIARI Drone, which is one of the largest manufacturers of FPV drones. They’re our partner, and they’re using our autonomy. He said, “The drones we manufacture in one day will be more than enough to destroy all the tanks Rheinmetall manufactures in a year.”

Yaroslav [00:27:52]: And cost-wise, of course, a drone is about $500, and a Rheinmetall tank is probably $5 million-ish, or maybe more.

Brandon [00:28:00]: Don’t mess with those housewives.

Yaroslav [00:28:03]: Drone wives.

Brandon [00:28:04]: Drone wives.

Yaroslav [00:28:06]: That’s it.

Noah [00:28:06]: There’s a classic saying that everyone always fights the last war.

Noah [00:28:12]: So from your standpoint, how did we get to the point where tanks became irrelevant, at least for now, in a matter of just a few years?

Yaroslav [00:28:24]: Look, I think it’s the same way we got to the point where calculators became irrelevant.

Yaroslav [00:28:31]: Now we have iPhones. Why would you need a calculator? Technology progresses, and its influence grows nonlinearly. It’s all exponential. So I can tell you that full autonomy, when you put it on a drone... Look, a tank is not a direct comparison, but take a drone versus an artillery shell, or cost per kill. A 155 mm artillery shell, the standard NATO caliber, currently has a market price of about $4,000 apiece. Compare that to, say, $400 per drone: the shell is 10 times more expensive. Account for the amortization of the artillery gun, for how vulnerable it is, and for the tactical capabilities it gives you compared to a drone, and you’ll figure out that an FPV drone is maybe three orders of magnitude more versatile, more useful, and more capable than classic artillery.
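The unit-price comparisons in this exchange reduce to simple ratios. The Python sketch below checks them using only the approximate prices stated on the show. It compares sticker prices alone. The "three orders of magnitude" estimate also folds in gun amortization, vulnerability, and tactical flexibility, and this sketch does not attempt to model any of those.

```python
# Unit prices quoted in the conversation (USD, approximate).
SHELL_155MM_USD = 4_000       # market price of one 155 mm artillery shell
FPV_DRONE_USD = 400           # drone price used in the shell comparison
TANK_USD = 5_000_000          # "probably 5 million-ish or maybe more"
FPV_DRONE_TANK_CMP_USD = 500  # drone price used in the tank comparison


def unit_cost_ratio(expensive_usd: float, cheap_usd: float) -> float:
    """How many cheap items one expensive item costs (unit price only)."""
    return expensive_usd / cheap_usd


shell_vs_drone = unit_cost_ratio(SHELL_155MM_USD, FPV_DRONE_USD)
drones_per_tank = unit_cost_ratio(TANK_USD, FPV_DRONE_TANK_CMP_USD)
print(f"one 155 mm shell costs as much as {shell_vs_drone:.0f} FPV drones")
print(f"one tank costs as much as {drones_per_tank:,.0f} FPV drones")
```

This prints a ratio of 10 for shell versus drone and 10,000 drones per tank. The tank figure gives a sense of scale for the "one day of drone production versus a year of tanks" quote.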
There are different types of artillery, not just the 155; you have mortars, you have all that. But give or take, it’s roughly three orders of magnitude, maybe. Again, the drone doesn’t have that firepower. It’s still not a one-to-one comparison.

Yaroslav [00:29:53]: Now, take that FPV drone. When you put full autonomy on it, and that can be not very expensive, the systems we’re producing are in the hundreds of dollars, pure bomb

Full Autonomy: From Human Pilots to Smartphone-Directed Drone Missions

Noah [00:30:06]: Just to interrupt. You said full autonomy. Just a second ago you were saying that the autonomy here is guidance, right? It’s not decision-making.

Yaroslav [00:30:14]: No, I was saying that’s the first and easiest piece of autonomy we fielded. But if you add full autonomy to a drone

Brandon [00:30:24]: I think he’s asking: for the listeners, can you explain what the term full autonomy means?

Yaroslav [00:30:29]: Basically, a good way to think about an FPV drone is as the iPhone of warfare. It’s very inexpensive, very mass-producible, very versatile. You don’t need a bunch of other things when you have an iPhone in your pocket: you don’t need an MP3 player, you don’t need a calculator, you don’t need other things. Right? So an FPV drone is an iPhone, or, okay, Apple, please don’t sue me, a smartphone. When you add autonomy to it, it sort of becomes like Uber or ride-sharing. Instead of being a trained pilot with a complex remote controller that takes a couple of months of training to use, and then having to pilot the drone for 30 minutes flying toward the target, et cetera, now you basically have your smartphone and a drone. You pick up your smartphone and say, “We are here. The bad guys are here.
Go and get them.” And the drone goes up, flies in a given direction, localizes itself on the map, finds the designated area where the bad guys are supposed to be, sees the bad guys, bombs them, watches to do a damage assessment, returns, lands, and then you can pick it up and watch the video if you didn’t have the radio link, right?

Noah [00:31:59]: That’s a bomber drone.

Yaroslav [00:32:00]: That’s full autonomy for a bomber drone, right?

Noah [00:32:03]: You’re saying that no human decision is made in this entire process?

Brandon [00:32:06]: That’s not what he’s saying.

Yaroslav [00:32:07]: A human decision was made at the beginning of the process-

Noah [00:32:09]: I get it. I get it.

Yaroslav [00:32:09]: The same way as when you fire artillery.

Yaroslav [00:32:12]: When you fire artillery, you don’t stop the shell 500 meters away from the target and ask whether you want to strike or not. A human decision is always made at some point. So that’s full autonomy, and such full autonomy is happening as we speak. And full autonomy increases the capabilities of an FPV drone, which is already three orders of magnitude more powerful than an artillery shell, by another four orders of magnitude. Now you can have 100 times as many people who can use it, because you don’t need to train those people, and this is important. You can have 10 times the mission success rate. And you can have 10 times the utility per drone, because instead of being a one-way kamikaze, it can be a bomber.

Brandon [00:33:05]: Now wait, you said 10 times the mission success rate, which means that fully autonomous bomber drones succeed in their missions 10 times more often than human-piloted bomber drones do. That’s an important thing to know.

Noah [00:33:17]: Maybe, to push back on-

Brandon [00:33:19]: They’re super, they’re superhuman.
They’re 10X superhuman.

Yaroslav [00:33:22]: They’re not vulnerable to electronic warfare. They don’t care about the radio horizon. They don’t lose track during navigation. They’re not susceptible to human error, when an artillery shell or another drone blows up beside you and you’re like, “Hell no, I’m getting out of here.” Right? That doesn’t happen to an autonomous drone. All of those things. We have one brigade that’s using our drones with just first-level autonomy, and they literally said that their success rates-

Brandon [00:33:53]: What’s first-level autonomy?

Yaroslav [00:33:54]: First-level autonomy is just the terminal guidance.

Yaroslav [00:33:57]: By the way, we have video of that. We can watch it.

Brandon [00:33:59]: Terminal guidance means a human gets it nearby and then the AI takes over.

Yaroslav [00:34:03]: The human flies it all the way, like 30 kilometers towards the target, and the target was probably given to that human by someone flying an ISR drone, a reconnaissance drone, right? So all the way to the target, and once you see the target from a distance of 500 meters, you do target lock, and from there the drone flies autonomously. Just that feature alone, for one operator (his call sign is Grom), increased his mission success rate from 20% to 71%, and it also increased his kill zone from three kilometers to 10 kilometers. The kill zone is a certain area around the front line: whenever the enemy goes into that area, it’s almost guaranteed to be destroyed by a drone. And obviously the drones are not launched from the zero line.
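The “four orders of magnitude” figure in this exchange is simply the product of the three factors Yaroslav names (100x operators, 10x mission success, 10x utility per drone), and it is worth setting beside the one measured data point he gives for level one. A minimal sketch of that arithmetic, assuming the three factors are independent and multiply cleanly, an assumption the conversation does not examine; the function and variable names are illustrative:

```python
import math

# Full-autonomy multipliers as Yaroslav states them (rough estimates from the
# conversation, not measured data). Treated here as independent factors.
FULL_AUTONOMY_FACTORS = {
    "people_who_can_operate_it": 100,  # no months of pilot training needed
    "mission_success_rate": 10,
    "utility_per_drone": 10,           # reusable bomber vs. one-way kamikaze
}


def combined_gain(factors: dict) -> int:
    """Multiply independent improvement factors together."""
    return math.prod(factors.values())


def ratio(before: float, after: float) -> float:
    """How many times larger the new value is than the old one."""
    return after / before


total = combined_gain(FULL_AUTONOMY_FACTORS)
print(total, math.log10(total))     # 10000 4.0, i.e. "four orders of magnitude"

# Level-one (terminal guidance) figures quoted for one operator, call sign Grom.
print(round(ratio(0.20, 0.71), 2))  # 3.55x mission success rate
print(round(ratio(3, 10), 2))       # 3.33x kill zone (3 km to 10 km)
```

Note the contrast: the quoted level-one result is roughly a 3.5x gain in success rate for one operator, whereas the 10x success factor is Yaroslav's estimate for full autonomy.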
They’re usually launched from, like, minus 10 kilometers-

Mission Success, Failure Modes, and the Five Levels of Autonomy

Brandon [00:35:03]: What is a zero line?

Yaroslav [00:35:05]: The zero line is sort of an imaginary line of control between two conflicting forces.

Brandon [00:35:14]: It’s important to explain these things to a lot of the listeners who are-

Yaroslav [00:35:17]: Thank you for asking.

Brandon [00:35:18]: -not familiar with warfare.

Noah [00:35:20]: Myself.

Noah [00:35:20]: I’m one of those listeners.

Brandon [00:35:20]: You said that level one autonomy, in other words just terminal guidance, where a human gets it to the finish line and then it goes over the finish line, increases mission success from 20-something percent to 71%, or something like that.

Yaroslav [00:35:33]: And increases the kill zone-

Brandon [00:35:34]: Increases the kill zone-

Yaroslav [00:35:34]: From three kilometers to 10 kilometers.

Brandon [00:35:36]: Got it.

Yaroslav [00:35:36]: On both parameters-

Brandon [00:35:37]: What is full autonomy, dude? And-

Noah [00:35:38]: Actually, real quick, can we define mission success, and maybe, what are the failure modes of missions?

Brandon [00:35:44]: I have a guess what mission success is.

Noah [00:35:46]: But I could-

Brandon [00:35:47]: Get ’em.

Yaroslav [00:35:49]: No, but that’s a very good question, in fact, because even if you fly into the target, first, the target can be damaged or destroyed. Those are two different modes. Then there can be different targets. A lone infantryman is one kind of target. A dugout where there are supposedly some enemies is another kind of target, and some mechanical equipment is another type of target. Then there’s radio-emitting equipment: often the target the military wants to get more than anything else is some enemy radio tower or some small radio dish that really makes life difficult in that combat area. So those are different targets, right?
It can be destroyed, it can be damaged. Then sometimes the drone hits but doesn’t explode. That happens. And then there are other failure modes where you didn’t even reach the target, because A, you were jammed by electronic warfare; B, you lost control of the drone because of the radio horizon; C, you were jammed by a different type of electronic warfare, which happens well before you reach the target area and affects your video receiver. So jamming on video and jamming on control are two different types of jamming. Then something malfunctioned on the drone, just a mechanical malfunction, maybe a motor broke, or whatever. All of those are different failure modes. Or maybe you got lost navigating to your target. That happens, too.

Noah [00:37:41]: So with level one autonomy, basically you manage to point it in a direction.

Noah [00:37:49]: You go there, and then for the last mile the drone takes over.

Yaroslav [00:37:52]: We define this, well, I defined it, but it sort of got picked up by the industry. We define five levels of autonomy. Level one is terminal guidance, which is what we just discussed. Level two is bombing. Level three is autonomous target detection and engagement decision. Level four is autonomous navigation. And level five is autonomous takeoff and landing.

Noah [00:38:15]: Those are good things to know-

Yaroslav [00:38:16]: Those are the five levels of autonomy. Now, if you-

Noah [00:38:19]: I have a question for you.

Yaroslav [00:38:19]: Sorry, let me finish with-

Noah [00:38:21]: Sorry.

Yaroslav [00:38:21]: -the theoretical part.

Noah [00:38:23]: What is Tesla running at right now?

Yaroslav [00:38:25]: Tesla?

Noah [00:38:25]: No, sorry.

Yaroslav [00:38:26]: That’s a very good point.
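The five levels Yaroslav lists are numbered, so they map naturally onto an ordered enumeration. A small sketch, with member names paraphrased from his list; the names are illustrative, not an industry-standard identifier set, and the conversation doesn't say whether a higher level includes the capabilities of the lower ones, so the ordering here is only a numbering:

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Five levels of drone autonomy as listed in the conversation (names paraphrased)."""
    TERMINAL_GUIDANCE = 1
    BOMBING = 2
    TARGET_DETECTION_AND_ENGAGEMENT_DECISION = 3
    NAVIGATION = 4
    TAKEOFF_AND_LANDING = 5


def describe(level: AutonomyLevel) -> str:
    """Render a level as a human-readable label, e.g. 'Level 2: bombing'."""
    return f"Level {level.value}: {level.name.replace('_', ' ').lower()}"


for level in AutonomyLevel:
    print(describe(level))
```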
It was inspired exactly by the levels of self-driving autonomy.

Noah [00:38:32]: Waymo’s level five, right?

Noah [00:38:35]: You just tell it where you want to go, it picks you up, and then you go there.

Yaroslav [00:38:36]: I think if you look at the classic definitions for self-driving cars, Waymo is still level four, because it still requires human control, even if it’s remote. If a Waymo gets into trouble, there’s an operator who takes over and resolves it. So that would still be level four. It doesn’t map directly, but it’s also five levels.

Brandon [00:38:58]: Can I interject a question here? With an FPV drone that’s like a suicide drone, one that just blows itself up killing something, how do you know what it hit? Does it just transmit back, or do you sort of lose track of it and hope it hit? What happens there?

Yaroslav [00:39:16]: That’s a great question. So-

Brandon [00:39:18]: You need another drone-

Yaroslav [00:39:19]: The current battlefield in Ukraine is saturated with different types of drones. Obviously you have all the FPV drones: last year alone, Ukraine manufactured about 4 million of these, and Russia maybe 20% fewer. For this year, the publicly voiced target on the Ukrainian side was 7 million. So we’re getting into serious numbers here. Besides those, there are different reconnaissance drones, ISR as we call them. For tactical-level ISR, both Ukrainians and Russians usually use the Mavic, a drone by DJI. Then there are a bunch of locally produced drones, fixed-wing drones that can stay in the air much longer than a Mavic, which manages maybe half an hour. And then there are drones that can stay up for many hours, or even up to a day. Those drones are more expensive, have more expensive cameras, et cetera. We hunt those drones that the Russians launch.
The Russians hunt our drones, and so on. But ideally, when you’re a group of soldiers operating an FPV, you’ll have someone in your company or your platoon with an ISR asset who does target designation for you. They’ll say, “Oh, there’s a Russian vehicle over there. Go and get it.” And you go there, you get it, and they’re like, “Okay, confirmed.”

Battlefield Surveillance and the Eight Dimensions of Autonomy

Brandon [00:40:57]: Those guys are watching. They have their own drones in the sky.

Yaroslav [00:40:59]: Target destroyed. They have, like, a carousel of drones, because one Mavic cannot stay up more than 30 minutes. It-

Brandon [00:41:06]: They’re constantly surveilling the battlefield.

Yaroslav [00:41:07]: Almost every spot on the battlefield.

Yaroslav [00:41:11]: It’s not always the case. Sometimes you won’t have a surveillance asset, so you’d launch another FPV just to confirm that there was a hit. Then if you see there was a hit and you’re not sure the target was completely destroyed, maybe you hit it again for good measure.

Brandon [00:41:26]: You double tap.

Yaroslav [00:41:28]: That’s how it works. But I was about to give you another piece of taxonomy. So you have five levels of autonomy, right? Then you have eight dimensions of the autonomous battlefield. What are the eight dimensions? They’re crucial to understanding how autonomy evolves in a modern battlefield environment. Dimension number one is the level of autonomy: what capabilities does your asset have? Dimension number two is the platform you’re operating on. It can be a quadcopter, a fixed-wing drone, maybe a long-range or a short-range drone, but it can also be a missile. You can have autonomy even on an artillery shell, or on a ground vehicle or a sea vehicle. All of those are different platforms. Level three would be domain: ground to ground, or ground to air as an intersection, or ground to sea, or sea to air.
There are all the nuances of the different domains. Then level four would be higher levels of autonomy, such as swarming, drone carriers, drone nests, et cetera.

Brandon [00:42:39]: Now when you say level, you’re talking about dimensions, not about-

Yaroslav [00:42:42]: Sorry. Yeah.

Brandon [00:42:43]: -autonomy levels. So dimension four.

Yaroslav [00:42:43]: The dimension. Yeah, I was supposed to say dimension. I say dimension because each of them combines with the others, right? So you might have a level-three-autonomy fixed-wing drone operating land to air, and stuff like that, right? And then operating in a swarm or from a nest. Then dimension number five is environment. Is it day or night? Is it summer or winter? Is it humid, cold, dry? What kind of target is it? Is your target hiding in a forest, behind a hill, or within buildings? All of that is environment. Then dimension number six is command and control. How are you dealing with tens of thousands of those assets around the battlefield? How are you coordinating that at the higher levels of command? How are you collecting data? All that.

Yaroslav [00:43:44]: Dimension number seven would be infrastructure: things like simulation, data collection tools, security, deployment mechanisms, et cetera. All those systems have to be developed separately and integrated with all the others. And finally, dimension number eight is distribution. Have you deployed 100 of these systems or 100,000 of these systems? Because those are two very different ballgames.
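Because the eight dimensions combine (his example is a level-three fixed-wing drone operating land to air in a swarm), one way to picture the taxonomy is as a record with one field per dimension. A minimal sketch; the class name, field names, and every example value other than the ones Yaroslav gives are hypothetical placeholders chosen for illustration:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AutonomousSystemProfile:
    """Hypothetical record describing one system along the eight dimensions."""
    autonomy_level: int       # 1. level of autonomy (1-5)
    platform: str             # 2. quadcopter, fixed wing, missile, shell, ground/sea vehicle
    domain: str               # 3. e.g. ground-to-ground, ground-to-air, sea-to-air
    higher_order: str         # 4. swarm, drone carrier, drone nest, or "none"
    environment: str          # 5. time of day, season, weather, terrain, cover
    command_and_control: str  # 6. how assets are coordinated across command levels
    infrastructure: str       # 7. simulation, data collection, security, deployment
    units_deployed: int       # 8. distribution: 100 vs. 100,000 is a different ballgame


# Yaroslav's own example: a level-three fixed-wing drone, land-to-air, in a swarm.
# The remaining values are placeholders, not taken from the conversation.
example = AutonomousSystemProfile(
    autonomy_level=3,
    platform="fixed-wing drone",
    domain="land-to-air",
    higher_order="swarm",
    environment="night, winter, forest",
    command_and_control="unspecified",
    infrastructure="unspecified",
    units_deployed=100,
)

print([f.name for f in fields(example)])
```

A flat record makes the combinatorics visible: each field varies independently of the others, which is the point of calling them dimensions rather than levels.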
So that gives you a broader overview of how autonomy propagates across the battle space.

Targeting, Human Responsibility, and Rules of Engagement

Noah [00:44:23]: As someone who has done machine learning, gone out of distribution, and had things go horribly wrong: several of these axes you were describing for thinking about drone warfare seem like they could be very susceptible to some sort of distribution shift if you start making things autonomous.

Yaroslav [00:44:41]: Like what?

Noah [00:44:41]: I mean, well, first of-

Yaroslav [00:44:43]: I’m very interested in the kinds of scenarios you’re thinking about.

Noah [00:44:48]: The most obvious one: if I assume these are computer-vision-guided systems for at least the last mile, how do you ensure that when, say, some fog rolls in, the drones don’t just attack the wrong thing? It probably will not turn around and fly back and attack you, but you-

Yaroslav [00:45:10]: Same question: how do you ensure that your mortar fire hits the right thing? Mortar fire could be off by plus or minus half a kilometer, give or take. So maybe you fire one, and then you fire another. Drones are actually much more precise in those scenarios. And to your point, I think five to 10 years from now it will be immoral to use weapons without AI.

Yaroslav [00:45:44]: Because weapons without AI will be more likely to cause collateral or unwanted damage. In the same way, it will be immoral to drive your own car manually on a public road, because it’s more likely to cause unwanted damage.

Noah [00:46:02]: Wow, I never considered that might-

Brandon [00:46:04]: Really? That’s definitely coming.

Yaroslav [00:46:07]: Anyway.

Brandon [00:46:07]: No, but that’s, I don’t know, an obvious thought.
I agree with you.

Brandon [00:46:12]: Obviously they’re not going to let you drive once most of the cars on the road are autonomous.

Noah [00:46:17]: No, that one I don’t believe.

Yaroslav [00:46:19]: No, I think you were talking about drones, right?

Brandon [00:46:21]: The drones, right. Cool.

Yaroslav [00:46:22]: The weapons, right?

Brandon [00:46:23]: Friendly fire and collateral damage and stuff like that are all minimized with AI.

Brandon [00:46:27]: Here’s my question. Let’s go to level six autonomy. Take all of the target selection, take all the battlefield data, integrate it into one big AI, and have that big AI basically be in command of the battlefield and agentically do target selection.

Yaroslav [00:46:44]: Be the general, right?

Brandon [00:46:44]: It’s a general. You’ve cut humans out of the loop, except maybe as dexterous robots, repairing drones and fastening things to drones, because you don’t have those robots yet. How soon are we there? AI general.

Yaroslav [00:46:58]: The most important question to ask ourselves is who will get there faster, us or our adversaries?

Brandon [00:47:07]: I assume us, but how fast will we get there? I hope us.

Yaroslav [00:47:11]: I hope so too.

Brandon [00:47:12]: How fast can we... When are we looking at that, in terms of horizons, years?

Yaroslav [00:47:18]: Technically, it could be done now. Of course, there’s some engineering work to be done. The bigger challenge is deployment, right? Take the operation in Iran, for example. Publicly, it was claimed, I think, that a Palantir system was used for target designation, et cetera. So it’s not exactly as you say, where the AI makes all the decisions, but basically the AI goes through all the data you have, gives you these 1,027 different targets, and says, “To confirm, please press Okay.” And you look at the targets and you’re like, “Yeah, sounds right.
Press Okay.” So I think that’s where we are already, or where we were a couple of weeks ago, as we’re recording this on April 10th. Another question is how massively deployable it is. Is every decision being made like that, or just some of the decisions? And then there are different levels of command and control: the platoon, the company, the battalion, et cetera. But the tricky thing, when we get into that territory, is this: if your enemy is gaining the advantage of being a thousand times faster than you by deploying such systems, what do you do?

Yaroslav [00:49:10]: You’ve got to-

Brandon [00:49:12]: If the enemy is a thousand times faster than you at deploying those systems?

Yaroslav [00:49:16]: Like, if the enemy starts deploying level six autonomy, as you call it, and you haven’t started doing-

Brandon [00:49:22]: You’re in trouble.

Yaroslav [00:49:23]: Yes, exactly. So you have to catch up. My point is that it’s very important to think about the safety of these systems, but that thinking should not slow you down in developing them, because they’re critical for your existential survival, right? And the one person who doesn’t get to think about the ethics of war is a dead person. That person surely doesn’t get to think about that.

Brandon [00:49:52]: What would be the safety risk of such a system?

Yaroslav [00:49:55]: Of course-

Brandon [00:49:56]: Friendly fire?

Yaroslav [00:49:56]: Just wrong decisions, right?

Brandon [00:49:59]: I see.

Yaroslav [00:49:59]: Maybe these decisions-

AI Command Decisions, Dead Zones, and Complex Battlefields

Brandon [00:50:06]: Skynet AI decides it’s going to use-

Yaroslav [00:50:08]: No, these-

Brandon [00:50:08]: -the drone army to kill us.

Yaroslav [00:50:09]: -decisions will not only be made about drones. They’re likely to be made about what the humans on your side should do as well.
Then obviously some environments are more like the Ukrainian-Russian war, where you have-

Brandon [00:50:26]: It will have to choose to risk lives. It will have to choose to sacrifice human lives-

Yaroslav [00:50:28]: Of course.

Brandon [00:50:29]: -on your side.

Yaroslav [00:50:29]: Of course. And then some environments are just dead zones, with no civilians, or virtually no civilians, close to the front line, because it’s super dangerous. Everyone has evacuated. But other environments are more like, okay, there’s a counterterrorist operation: there’s a group of terrorists, or a group of civilians. Or, like the recent operations in Iran, I imagine that the US and Israeli forces did not want to harm civilians. They only targeted military targets there, right? So in those situations there’s a different level of responsibility for that decision-making as well. And there’s just such a big variety of military missions, and I’m not even well-informed or well-educated enough in military science to tell you about all those scenarios. We would need to put a general beside me, and maybe a Ukrainian general and an American general would tell you very different stories about these things.

Brandon [00:51:34]: Got it. Can I ask a few more questions? All right. So in 2013, one of my first paid articles ever was about how the era of drones would change human society. I was just sitting around, bored, thinking about things.

Yaroslav [00:51:54]: You were way ahead of your time.

Brandon [00:51:55]: I said, “The following will happen.”

Yaroslav [00:51:57]: This article is real. I’ve read it.

Yaroslav [00:51:58]: It’s actually-

Brandon [00:51:59]: I said small autonomous suicide drones will cleanse the battlefield of human infantry. Human infantry will not be able to stand against swarms of AI-powered suicide drones.
And I didn’t even know about AlexNet at the time, I think.

Yaroslav [00:52:19]: You’re just an avid sci-fi reader.

Brandon [00:52:23]: I’m an avid sci-fi reader, but also, there will be a way to do that. It’s a nonlinear, multidimensional search problem, and if you get enough compute, you’ll find some search algorithm that gets you there. And so-

Brandon [00:52:38]: Yeah, I think that one sentence describes the bitter lesson right there.

Brandon [00:52:41]: It’s a multidimensional search space. You search it somehow. I don’t know. Get a grad student-

Yaroslav [00:52:47]: Sooner or later-

Brandon [00:52:47]: -to make a search algorithm.

Brandon [00:52:48]: It’s not that hard. Anyway, the point is that human infantry will be gone from the battlefield in the end. I wrote that in 2013. Many people on social media laughed at me for it, called me hysterical, and said things like “Electronic warfare will knock all the drones out of the sky” and “You need humans to hold ground.” That’s something you still hear from a lot of people on social media today. I feel that the article I wrote has never been directionally wrong. It has gotten steadily more right over time, and now we’re reading battlefield reports from Ukraine where the human infantry are basically a few guys hiding in dugouts for months, and I’m not sure what they’re doing.

Yaroslav [00:53:35]: That’s on Ukraine’s side. On the Russian side, it’s just a zerg rush.

Brandon [00:53:38]: The zerg rush, and then they just die. But they have some guys in dugouts too, right? Hiding in dugouts for months.

Yaroslav [00:53:45]: They have. Yeah.

Brandon [00:53:45]: So what are those guys doing in the dugouts? Are they providing frontline reconnaissance?
Like, what are they doing?

Yaroslav [00:53:54]: If there’s a guy in a dugout with some bullets and an automatic weapon, the other guy cannot come and take that dugout. That’s-

Brandon [00:54:07]: I see.

Yaroslav [00:54:08]: They’re establishing control over territory.

Brandon [00:54:10]: I see. So there still is a use for human infantry on the battlefield as of today.

Yaroslav [00:54:15]: Like-

Brandon [00:54:15]: How long will that last?

Yaroslav [00:54:17]: I think it will last for a while. This is funny. There’s a whole layer of modern Ukrainian culture built around war-related stuff. There’s this punk rock band called SZC, I guess, in English, which is short for something like “deserter.” Anyhow, this band has a song titled “2030.” It’s about the year 2030, with the war still going on as, whatever, the third world war. They sing about AI and cyborgs and everything, but simple infantry is still needed, and we’re still getting cold in those dugouts, and we’re still doing our job. That’s the theme of the song. And it seems like that’s actually what’s going to happen. There are-

Ground Robots, Simulation, and the Limits of World Models

Brandon [00:55:30]: Ground robots will not replace humans in the dugouts soon.

Yaroslav [00:55:34]: I’m very much interested in following the whole humanoid robot theme, and-

Brandon [00:55:39]: What about a dog robot?

Noah [00:55:41]: Or just mobile controlled platforms or something.

Brandon [00:55:44]: Spider robot, yeah.

Brandon [00:55:45]: Everything evolves into a crab.

Brandon [00:55:46]: You build a crab robot.

Yaroslav [00:55:47]: A humanoid-

Noah [00:55:48]: The carcinization of warfare.

Yaroslav [00:55:51]: There’s a lot of utility in humanoid robots, because the world is designed around humans.
So I would not 100% rule out the possibility that sometime 10 years in the future, humanoid robots will actually be fighting. That’s an actual Terminator kind of scenario.

Brandon [00:56:14]: Yeah, in the first Terminator movie, look at what they’ve got on the battlefield: flying bomber drones and humanoid robots.

Yaroslav [00:56:20]: Look, the cost of running large language models is getting so low that you can have an inexpensive computer locally running an open-source model that would have been state-of-the-art a year and a half ago, which also means the Chinese can have it, the Russians can have it, the North Koreans can have it, et cetera. So that is already possible. If not for the acceleration of neural nets, of large language models, I would’ve said I don’t think humanoid robots will be useful on the battlefield within 10 years. But if you account for the exponential, it might be five years or so. The problem with all autonomous systems, starting with self-driving cars and including all the modern-day AI agents, is that to make them really useful, you have to solve such a long tail of edge cases that it’s really difficult. We were promised self-driving cars, what, around 2007, with Sebastian Thrun and Google, and even before that, with all the challenges, everything. And Elon, of course, told us back in 2014 that it was one year away, and we still don’t have self-driving Teslas everywhere. We have Waymos in SF and some other places, but they’re still not perfect.
So I expect something similar from self-flying and fully autonomous drones, and we’ve seen that firsthand: with each level of autonomy we add, there’s a very wide distance between a prototype, something that’s ready to be scaled to millions of units, and something that has actually been scaled to millions of units. But the race in AI coding tools is just insane. So things might accelerate very fast, faster than we can imagine.

Noah [00:58:46]: I think your point is that, due to this long-tail behavior, level one autonomy as you’ve defined it is actually very natural. You’re basically just solving an image recognition and tracking problem.

Yaroslav [00:59:02]: It’s interesting that you put it that way. I thought about it the very same way, and we have this joke that there are like 200 companies in Ukraine trying to solve last-mile targeting, or terminal guidance, and it seems like we’re the only company that has actually solved it, because even that problem-

Noah [00:59:22]: I’m not saying it’s trivial, but it’s at least something you can imagine given where we are now.

Yaroslav [00:59:26]: Us and Eric Schmidt; Eric Schmidt’s companies are pretty good.

Yaroslav [00:59:29]: I actually have a lot of respect for what they’re doing. They’ve been practically influential and helpful on the battlefield, and they have good engineering.

Noah [00:59:38]: I wasn’t saying it’s trivial. I’m just saying it’s a natural adaptation of things we know work well. But some of the other domains, where you do have to make decisions and you have a long tail, become much harder, and you worry about edge cases more.

Yaroslav [00:59:57]: The more complex the behavior you’re trying to simulate, the more edge cases there are, right? The more ways there are to do it wrong. And then there are different approaches.
If you read academic papers about robotics, the robot is represented as something with sensor input, then three levels of logic or decision-making, which are perception, planning, and control, and then actuators as output. Pre-neural nets, you would do perception, planning, and control all with classical logic, right? Then, with AlexNet and computer vision, you could do perception with neural nets and the rest with logic. Now you can do each of those separately with neural nets or separately with logic, or you can just have one huge neural net that takes lots of sensory data as input, not just pixels but sound, accelerometer readings, everything, and just outputs the controls. Some of the self-driving car companies are doing that, or experimenting with different ways of doing it. The way you implement those features also influences how many degrees of freedom the system has, right? For control, you can do classical algorithmic control with Kalman filters and PID controllers, et cetera, or you can use a neural net trained in a gym with reinforcement learning, et cetera. And those would give two different behaviors of the system.

Noah [01:01:53]: Maybe my point was just much more high-level.
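The classical side of the contrast Yaroslav draws, hand-designed controllers versus a neural net trained with reinforcement learning, can be illustrated with a textbook PID loop. A minimal sketch, assuming a toy first-order plant (dx/dt = -x + u) and arbitrary gains picked only so the loop settles; the plant, gains, and time step are illustrative assumptions and reflect no real drone's control stack:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete PID: u = kp*e + ki*integral(e) + kd*de/dt."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        self.integral += error * dt
        # Skip the derivative on the first call, when there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 1000, dt: float = 0.01, setpoint: float = 1.0) -> float:
    """Drive the toy plant dx/dt = -x + u toward the setpoint; return the final x."""
    pid = PID(kp=4.0, ki=4.0, kd=0.05)
    x = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, x, dt)
        x += (-x + u) * dt  # explicit Euler integration of the plant
    return x


print(f"{simulate():.3f}")  # settles close to the setpoint of 1.0
```

The appeal of the classical approach is that its behavior can be reasoned about from the gains and the plant model, while an RL-trained controller's behavior depends on how well its training environment covered the situations it later meets, which is where the long tail of edge cases discussed above bites.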
It’s-

Yaroslav [01:01:56]: Or, if you go even more high-level, you can train, like Fei-Fei Li and the folks who are doing physical-

Brandon [01:02:08]: World models-

Yaroslav [01:02:08]: World models, right, physical intelligence. They’re trying to make these big models that sort of understand the world, and then supposedly you have such a model and you can tell a drone, “Okay, go over that hill, find the bad guys, and get them,” or “Make me a video, make me a photo of the guy smiling, and get back to me.” Right? That’s one way. Another way is to have subsystems: one is navigation, another is finding the person, another is getting to them to take a photo. And again, those are very different behaviors. It’s not that one is necessarily better than the other, and we might have more technological ability to do one or the other. But all of those systems will exist. And you should always keep in mind that it’s not only the good guys who are developing these systems; the bad guys are developing them as well.

China’s Drone Supply Chain and the West’s Manufacturing Gap

Noah [01:03:00]: I guess where I’m going with this is back to Noah’s original thought about the end of the soldier. And so, in order to replace-

Brandon [01:03:10]: Or at least the end of the rifleman.

Noah [01:03:11]: Or the end of the rifleman, yeah.

Yaroslav [01:03:13]: I’m not seeing that very soon. As much as I’m a lover of sci-fi and all of that, and a technologist, the more I try to be-

Yaroslav [01:03:27]: I try to have a certain humility about these things. In the military domain, there’s just so much human history, blood, and tears dedicated to understanding this art of war and perfecting it. There’s so much knowledge there that I don’t feel like I’ve even started to comprehend a lot of it.
But one thing I’ve really understood is that even though drones now account for eighty percent of casualties, when you go to the actual officers, the brigade commanders, the corps commanders, they explain to you how it all fits together. When you’re thinking about an operation that involves a couple thousand people to take a piece of land out of the enemy’s hands, to deoccupy it, it’s so complex. It involves dozens of different types of drones, and then land operations, reconnaissance operations, psychological operations, and then aviation and tanks and logistics and all kinds of different assets. So modern warfare is really very complex, and the fact that drones are the latest, coolest thing, and AI is the latest, coolest thing, doesn’t mean that now it’s that and only that, right? So yeah. Whoever’s looking into this should realize that it’s not just what the press talks about; the reality is much more difficult, much more complex.

Brandon [01:05:17]: Let’s talk about China and China’s manufacturing capabilities. Suppose the United States went to war with China. And-

Yaroslav [01:05:26]: I hope not.

Brandon [01:05:27]: I hope not as well. But suppose that drones, all the types of drones we’re talking about here, were essential to that war, and suppose China said, “All right, you need X and Y and Z to make the drones to fight us, and we control the production of X and Y and Z, so we’re just going to cut you right off, and now you have no drones.”

Brandon [01:05:47]: I know that a number of countries, including Ukraine and Taiwan, have been making moves to China-proof their drone production so that China couldn’t do that.
Examples of things they might be able to cut off include rare earths, the fiber-optic cable you were talking about before, and various other things where, even if they don’t control one hundred percent of the production, they control enough of it that it would be extremely expensive to produce without relying on Chinese sources. Or the market is fragmented enough, et cetera. What do you see as China’s key bottlenecks, and how easy are those to overcome in terms of China-proofing drone production in case of a war against China?
Yaroslav [01:06:30]: Let me start by saying that, although China does not sell directly to Ukraine and it does sell directly to Russia, a lot of Ukrainian supply chains start in China, right?
Yaroslav [01:06:49]: We’re not in a conflict with China, and we would not want to be in a conflict with China. We hope that China stays a neutral power between Ukraine and Russia, and toward the US as well. That said, in the scenario you’re describing, everything is much worse.
Yaroslav [01:07:11]: Think about this. Last year, Ukraine produced four million FPV drones. Ukraine is not the most industrious nation in the world.
Yaroslav [01:07:19]: China can produce four billion of these FPV drones.
Yaroslav [01:07:23]: China can make not just quadcopters but fixed-wing drones, which go not forty kilometers but maybe two to three hundred kilometers inland. Slightly more expensive.
Brandon [01:07:34]: With internal combustion?
Yaroslav [01:07:36]: No. With...
Brandon [01:07:36]: Battery-powered fixed-wing drones.
Yaroslav [01:07:38]: Battery, yeah.
Brandon [01:07:39]: What’s the propulsion system on those? Propellers?
Brandon [01:07:43]: I just don’t know how that works.
Yaroslav [01:07:44]: You have that. They can also make them all fully autonomous. They have DJI, the world’s most advanced drone company. They can make them fully autonomous, without GPS, without anything.
Then they can put those drones on maybe tens of thousands of fully autonomous submarines, or maybe not even that, just on shipping containers, barges, and freight ships. And then they show up with millions of drones packed onto those vessels. They show up at any coastline in the world, be it Taiwan or be it California, and they have millions of long-range impactors targeted at a piece of land.
Yaroslav [01:08:38]: What do you do with that? There are not enough hunter submarines. There are not enough anti...
Brandon [01:08:46]: Ship missiles.
Yaroslav [01:08:47]: Anti-ship missiles, anti-ship planes. They can produce these assets in tens of thousands of factories, because they’re so simple to produce that even if the FBI director picks up the phone and calls the President of the United States and says, “Hey, the scenario Yaroslav was warning us about is beginning to unfold. We need to do a preemptive strike,” you wouldn’t have enough assets to do preemptive strikes, because there could be tens of thousands of places where these things are being manufactured. So to counteract a scenario like that, we would need to have a similar amount of mass.
Brandon [01:09:39]: You mean a similar number of drones.
Yaroslav [01:09:41]: Yes, to intercept that either at sea or in the air, et cetera, at a similar cost, right? So the economics should work out. I’ll tell you that currently, we in the West and in the United States don’t have the technology to do that. We don’t.
Four Layers Behind China: Technology, Manufacturing, Components, and Rare Earths
Brandon [01:10:01]: What key technologies do we lack?
Yaroslav [01:10:03]: Autonomy, mass drone manufacturing, stuff like that.
Brandon [01:10:06]: We lack autonomy technology?
Yaroslav [01:10:09]: I think so.
Brandon [01:10:10]: Because our computer vision algorithms are not as good?
Yaroslav [01:10:12]: It’s not only about the computer vision algorithms.
It’s that the group of companies Eric Schmidt founded two or three years ago, and my startup, which is maybe not so small but was also founded three years ago, are two of the leading companies in the world, and maybe a couple of others are capable of something like that, but not really on small drones. I do think we are behind China in technology. So we lack technology, we lack mass manufacturing capacity, we lack the components, and we lack the rare earth materials. There are four layers in which we’re behind in this challenge. And that’s why my point is that in the West, and especially in the United States, there should be far more smart people working in defense, and there should be more funding, if we want to keep the semblance of our good past life.
Brandon [01:11:14]: That’s really important. Would you say that right now, as things stand, in conventional terms, abstracting from strategic nuclear weapons, China is now the supreme conventional military power on Earth, given its ability to manufacture and deploy drones in the quantity and quality you just described?
Yaroslav [01:11:35]: Look, I don’t think we have all the information to claim that, but...
Yaroslav [01:11:41]: We cannot rule it out, and that alone should be a big warning sign. We have not seen Chinese drones in action. We’ve seen some Iranian drones in action and Russian drones in action, but not really Chinese ones. We have not seen Chinese forces in action. Hopefully this never happens, but in a conflict on the scale of the US versus China, there are many classical assets that we should not discount. As we just discussed, we should not discount artillery in the land war, and we should not discount carrier groups and the air force, long-range missiles, electronic warfare, satellites, et cetera.
But there are also things that we, at least we as the general public, don’t really know about China. I’m sure US intelligence has a lot of information about Chinese capabilities. I think if you go back to the scenario I just described and take it to the maximum, you basically see that whoever has the bigger manufacturing capacity wins.
Brandon [01:13:03]: That’s just the typical law of conventional warfare. It has been forever.
Yaroslav [01:13:07]: Sort of.
Noah [01:13:07]: Do you read Noah’s blog?
Yaroslav [01:13:09]: Not as often as I would like. But I read Noah’s X.
Brandon [01:13:15]: It’s not necessary.
Noah [01:13:15]: It’s a theme where...
Brandon [01:13:16]: Don’t read my X.
Brandon [01:13:19]: It’s just for...
Noah [01:13:19]: He has no opinion about certain things. Yeah.
Brandon [01:13:22]: It’s just jokes.
Yaroslav [01:13:22]: No opinion. Okay.
Brandon [01:13:22]: Okay, so I guess there are two questions here. First, could the United States and the countries allied with it even develop supply chains independent of China to make any of these drones? And second, could they do it in sufficient mass? I think the answer to whether they can do it in sufficient mass is, today, no. But in an extended, prolonged war, things change a lot. All the development restrictions we put on new factories go out the window, and there’s a sense of urgency. Ukraine obviously wasn’t making all these drones before the war.
Yaroslav [01:14:04]: Of course.
Brandon [01:14:04]: So if America had the same kind of urgency that Ukraine has now, things would happen. Things would move. And of course, America has allies too, or had allies until recently, and may have them again in the future. America has or had allies that could also scale up very quickly, like Japan and the European countries, if we ever ally with them again, et cetera.
And so a lot could change in terms of actual mass. You can look at China and say they have all these factories today, but look at the history of conventional warfare: America had very little defense production capability on the eve of World War II and ended up easily outproducing everyone else, even the Soviet Union.
Yaroslav [01:14:47]: Maybe not easily. Yeah.
Brandon [01:14:49]: Not easily, but by a long shot.
Yaroslav [01:14:51]: Also with the added benefit of not being attacked.
Brandon [01:14:54]: That’s right. That’s right.
Yaroslav [01:14:54]: That helps.
Brandon [01:14:55]: Who knows how secure they are now, or where cyber influence...
Yaroslav [01:15:03]: No, look, I totally agree with your sentiment. I’m even less doomerish than you are. Or as it seems to me, you’re a little bit doomerish, but in the long term you’re bullish.
Choke Points, Europe’s Wake-Up Call, and Defense Industrial Policy
Brandon [01:15:17]: I’m not doomerish. I’m thinking about what we need to do.
Brandon [01:15:21]: I’m not thinking, “Oh, we’re doomed.” That’s not my point. It’s never useful to say that. If you’re doomed, then just don’t go on podcasts.
Brandon [01:15:28]: Go pet a rabbit and play a video game or something. Anyway, we’re not doomed, but I’m asking: step one, what are the key choke points? Besides rare earths, which we already know about, what are the other key choke points where the West needs to free itself from Chinese supply chains in order to manufacture even one drone free of Chinese supply chains?
Yaroslav [01:15:54]: There are companies here who are doing that. We have good friends at a company called Neros.
I know they’re down in El Segundo or somewhere in Southern California.
Brandon [01:16:05]: What are the most pressing choke points besides the rare earths that everyone talks about?
Yaroslav [01:16:09]: One of the pieces that we make is thermal cameras. That’s actually a big one.
Brandon [01:16:16]: Thermal cameras.
Yaroslav [01:16:17]: Then the motors. You need the special...
Brandon [01:16:25]: Even after you have the magnets, you have to turn them into a really good motor.
Yaroslav [01:16:28]: You need these special magnets, and that’s sort of your rare earth component.
Brandon [01:16:34]: That’s...
Yaroslav [01:16:34]: Rare earths are not metals that, for some reason, God only put under Chinese territory and not under anyone else’s. No, they’re distributed; there are plenty of them around the Earth. It’s about refining capabilities and investing in that, and so on. And then, frankly, at some point we don’t have that many humans. That’s where humanoid robots help. China is a big, populous country. The population of a united West is comparable to that, but the population of the US is much lower. And I definitely think the whole West should get its act together, because ibi semper est victoria, ubi concordia est. There’s always victory where there is union.
Brandon [01:17:27]: Agreement.
Yaroslav [01:17:27]: Agreement, yes.
Yaroslav [01:17:31]: I think we, as the free nations of the world, should get our act together, because freedom is what unites us. I’m also pretty mad about what’s happening in the European Union. And I think the current US administration is the best thing that has happened to Europe since World War II, probably.
Or since post-World War II, because World War II wasn’t the best thing.
Brandon [01:17:59]: Trump withdrawing the image of omnipotent American support forced the Europeans to get their butts in gear, unite, and develop their defense industries.
Yaroslav [01:18:07]: Also, doing that not in a nice way, right? When JD Vance came to the Munich Security Conference a year ago, he wasn’t super nice, like, “Oh, please, our European friends, could you please increase your defense spending?” He was somewhat pushy, let’s put it that way. And I think that was a necessary measure. I’ve been thinking about it: could he have been nicer? And I concluded no, because the voters of the European countries would not have understood. They would not have gotten the message. Now I think the message has gotten across, but Europe is still sort of slow to wake up, I would put it that way. Things are getting better, but I’m not happy with the speed. When I would go to some of the European capitals, I would come back pretty depressed from talking to their military officials and entrepreneurs, et cetera. I’ve been in the US for the last month or so, and I’m not depressed. I’m actually excited. I still think you should 10X the effort to make sure you remain the strongest power in the world and can defend your values, et cetera. But I’m very optimistic, and once we are in danger, I think there are lots of very smart people in the West who can figure these things out. But people in China are also extremely smart. It’s very different from the Cold War situation. The Soviet Union was an economically declining power. China is not like that.
And if we look at the electric car race, I think they’re ahead of the US and ahead of the whole world, definitely ahead of Europe, which used to be a car superpower. When you look at AI, I think they’re almost where we are, maybe slightly behind. When you look at humanoid robotics, I would argue they’re ahead. And in many other areas, like medicine and the biosciences, there are lots of interesting things happening, and in the consumer space too. I don’t know if you’ve heard of the podcast called 996. I don’t know if it’s still airing. It used to be a fantastic podcast by some Chinese American businesspeople, maybe from a venture fund.
Humility About China, Taiwan, and Deterrence
Brandon [01:20:55]: About the Chinese economy?
Yaroslav [01:20:56]: About China from a tech venture point of view. I lived in China for maybe four months, and I’ve visited a couple of times. Even WeChat is a much more advanced app than anything we have in the West. So it’s very important not to be too arrogant, and I think we’re guilty of that, definitely in the US. Sometimes we tend to be too arrogant. Humility always helps, at least me personally. And I think we obviously don’t have to be enemies. With Ukraine and Russia, Russia came to kill all of these people and take all this territory. With China and the US, it’s not like that, and thank God it’s not like that, right?
Brandon [01:21:54]: It might be with China and Taiwan. Maybe.
Yaroslav [01:21:57]: Hopefully not. Yeah. It’s...
Brandon [01:21:59]: Hopefully not.
Yaroslav [01:22:00]: China probably has its own problems with human rights, et cetera. But hopefully it’s still not beyond the point of fixing.
Brandon [01:22:13]: Hopefully. Hopefully.
Yaroslav [01:22:14]: We should be armed, right?
We should be ready for whatever, and that alone decreases the probability of any conflict. If you’re weak, you’re basically provoking conflict. The problem with Europe these days is that last year, Ukraine and Russia went from the drone technology of 2025 to the drone technology of 2026. Europe went from winter 2022 to spring 2022. Europe didn’t even make one year of progress. And the US, I would argue, made less than a year of progress in the last year as well. So the technological gap is getting wider and wider. And at some point... I’m looking at the Poles, who are very close to us and close to Russia.
Brandon [01:23:06]: Polish people...
Yaroslav [01:23:07]: Polish people.
Brandon [01:23:08]: Not surveys.
Yaroslav [01:23:09]: Not, yeah. Oh, yeah, sorry. Yeah. That’s what I meant. Sorry, not my first language.
Brandon [01:23:12]: When I’m looking at the polls, what do they say?
Yaroslav [01:23:15]: Polish people. Poles.
Brandon [01:23:16]: No, it’s the right word.
Brandon [01:23:18]: You’re just thinking about...
Yaroslav [01:23:20]: No, we...
Yaroslav [01:23:20]: I’m looking at them, and they bought like 100 tanks and four submarines. It’s like, dudes, you don’t have 1,000 people who know how to operate an FPV. What the hell are you doing?
Brandon [01:23:30]: Poland is not preparing for war correctly.
Yaroslav [01:23:33]: From what I can...
Brandon [01:23:36]: They’re doing a very bad job.
Yaroslav [01:23:36]: They’re not doing it right. And the problem is they’ll end up in a situation where they’re so proud of their winged hussars and their cavalry, and the enemy is attacking with airplanes and tanks. The gap between Russia and Poland is literally getting wider.
Brandon [01:23:57]: That happened in 1939.
Yaroslav [01:24:01]: I don’t want that to happen again.
What America Should Learn from Ukraine’s Defense Valley
Brandon [01:24:03]: All right, so the Europeans need to wake up more.
If you were advising America’s defense establishment, which you might be doing in real life, or if you were saying things on a podcast that might be heard by people connected to that establishment, which you may or may not be: besides more funding, which will be necessary for literally anything, what are the top policy priorities for America to increase its readiness right now? Let’s say three to five priorities.
Yaroslav [01:24:38]: Look, I really like this quote, I think it’s by William Gibson: “The future is already here, it’s just not evenly distributed yet.” In the same way that Silicon Valley is the future location for all things tech, Kyiv and Ukraine are the defense valley. It’s the place where the future of defense has already arrived, and there is a ton to learn there: from hundreds of companies in very particular fields; from the battlefield experience of commanders at every level, from soldiers and sergeants to platoon commanders, brigade commanders, special forces, and intelligence; and from how the government organizes the infrastructure and the playing field for all these businesses to flourish, et cetera. So I would definitely look into much tighter integration and exchange of experience. That would be one thing.
Yaroslav [01:26:03]: I think procurement reform would be another thing, and I think that’s what is currently being done with drone dominance. I think Pete Hegseth is leading that, and maybe some other people in the administration. I think that’s an extremely powerful and right thing to do, and they should scale it big time.
Yaroslav [01:26:26]: Obviously, any military person would say, “Well, yes, okay, Yar, you’re fine, cool,” but Ukraine and its war theater are very different from the potential scenarios that the U.S.
might have to fight. And yes, I agree, but there is still so much to learn, even from the sea warfare Ukraine is doing, and from long-range drones like the Shaheds that unfortunately damaged some American equipment in the Middle East. They can fly up to two thousand kilometers. So if you think about the Pacific region, two thousand kilometers covers a lot, with all the islands and aircraft carriers, et cetera.
Brandon [01:27:16]: I think America is learning that lesson right now in Iran, in the Middle East.
Yaroslav [01:27:20]: You would think so, but I’m not sure. There were so many chances to learn that lesson from Ukraine before, and I don’t think it was fully learned, so I’m not sure how fully the Middle East lessons were learned.
Brandon [01:27:34]: Perhaps losing a war to a minor power will teach America.
Yaroslav [01:27:38]: You can, you...
Brandon [01:27:39]: Although their economic weapon will be the most important and decisive by far. But still, some of our bases were allegedly rendered unusable by their Shahed-type drones.
Yaroslav [01:27:51]: Look, I think there are so many lessons to take from this. Russia, a much bigger power, attacked Ukraine. Given the logic we discussed, whoever has more production capacity should win. But Russia didn’t achieve victory in Ukraine, and the US didn’t get full victory in Iran. It probably achieved some of its goals, but probably not all of them. So you can also flip that. When you say, “Okay, what if China has so much more capacity than the US? What if they attack us for whatever reason?
How can we hold them back if we don’t have the rare earths?” Well, as the Ukrainian and Iranian examples show, you actually can hold back something like that even if you’re the less capable party.
Brandon [01:28:42]: Well, those examples did rely on Chinese supply chains, though.
Yaroslav [01:28:47]: Partially, yes. But if you think about Ukraine from February 2022 through the first half year or year, there wasn’t much reliance on the Chinese supply chain. We were just relying on whatever we had. So that’s one side of things. The other side is how much suffering you can withstand along multiple axes. It’s not just the military axis; it’s also the economic axis and the political axis, I would argue. One of the reasons wars stop or start is that the internal political pressure on a country’s leadership gets so high that you just have to stop, right? And I think that differs a lot depending on whether the population sees you as the party that started the conflict or the one that was attacked. That’s one part. Another is the overall state of the society. One thing I’m worried about in Europe now is that people are not ready to fight even if they’re attacked. When people are asked about it, they say, “Oh, I’m just going to move somewhere where there’s no war.” So that’s a challenge, and that’s what makes Europe weaker right now. And the US, I think, has never really had to fight a foreign war on its own turf. I hope that never happens, but if it did, I don’t know how the rich cities of the East or West Coast, how people would behave. Would all the Wall Street bankers and Silicon Valley VCs mobilize and really start working on defense? I would love to think so.
I like... That’s the way I think about the American spirit.
The Nuclear Lesson: Budapest, Deterrence, and the World After 2022
Brandon [01:30:49]: The way we did in World War II.
Yaroslav [01:30:53]: In a way, but look, it wasn’t that clear in World War II. Churchill famously said, “America will always make the right decision after trying all the wrong ones,” right? One could argue that there is this USA that lives in popular culture, created by Hollywood, as the cool dudes who will always come and do the right thing, right? And then if you look at international politics...
Yaroslav [01:31:21]: It doesn’t necessarily always look like that. Take the Budapest Memorandum. Ukraine gave up all of its nuclear weapons, the third-largest nuclear arsenal in the world, because the US, Russia, and others were very persuasive: “Yeah, just give it away. We guarantee your security.” And then: “Oh, they’re not guarantees, they’re assurances. We used the word assurances, so we didn’t promise you much. You just gave it away for free.” And then Russia attacks, and there’s no reaction. So in 2022 the whole world looks at this and thinks, “Oh, okay, so maybe we should get nukes.” So my prediction is that over the next couple of decades, a lot more countries will be working on their own nukes.
Brandon [01:32:02]: They really should. I’ve consistently advocated for Japan, South Korea, and Poland specifically to get nukes. Obviously Ukraine should as well, but can’t.
Yaroslav [01:32:11]: Someone could argue that if a country currently isn’t working on its own nuclear program, its government is doing a disservice to the country and should be fired. Because it seems, from recent world history, that it’s the only way to actually provide credible deterrence, right?
So I think in Europe, people are not quite sure how America will behave. Will it behave as the Hollywood hero, or will it behave pragmatically, as it did at the beginning of World War II, or as it did when Ukraine was attacked by Russia and the US decided to push the Budapest Memorandum aside, because of course Russia is a nuclear power and we don’t want to mess with it?
The Drone Race: Where Ukraine, Russia, and the West Stand
Brandon [01:32:59]: Everyone says Russia is behind right now in the drone war.
Yaroslav [01:33:04]: True. Okay.
Brandon [01:33:04]: But that wasn’t true a year ago. A year ago, or maybe a year and a half ago, people were saying either Russia was ahead or they were at parity.
Brandon [01:33:12]: Russia has more people, about four times as many, or more.
Yaroslav [01:33:17]: I think give or take, yeah. 30 million versus like 120-ish. Yeah.
Brandon [01:33:21]: Four times as many people.
Brandon [01:33:27]: More help from China.
Yaroslav [01:33:28]: The economy is like 10, 20 times bigger, I don’t know. A lot bigger.
Brandon [01:33:33]: A lot of oil money that Ukraine just doesn’t have. More direct help from China than Ukraine is getting.
Brandon [01:33:41]: Russia just has this massive advantage in scaling against Ukraine itself. Ukraine has financial assistance from the EU, but right now Ukraine is ahead in the drone race.
Yaroslav [01:33:54]: I’m not sure about that, by the way.
Brandon [01:33:56]: Well, that was going to be my next question. Is that true? And if it is true, how long before Russia manages to pivot, course-correct, and regain the lead?
Noah [01:34:05]: Sorry, for my own curiosity, can we define “drone race”?
Yaroslav [01:34:09]: Look, I think it’s helpful for our listeners to understand that there are...
Yaroslav [01:34:17]: At least 30 different types, or categories, of drones, right? First, you have different domains.
You have flying drones, ground vehicles, sea vehicles, and undersea vehicles, right? Then for each of those domains you have multiple use cases. For ground vehicles, you have logistics, evacuation, mining, de-mining...
Yaroslav [01:34:48]: Maybe something else. For aerial, you have reconnaissance, front strike, mid strike, deep strike, mining, de-mining, radio repeating, kamikaze and bombing, ISR, and different types of surveillance: tactical surveillance, operational-level surveillance, maybe strategic-level surveillance at some point.
Yaroslav [01:35:17]: Logistics too, with aerial drones. For sea drones, same thing. So in each of those categories you have dozens, sometimes over 100, companies and products that compete. That’s the current Ukrainian battlefield. On the Russian side it’s less of a zoo, as we say. In each category they usually have one to maybe three products, and they scale them in a centralized fashion. So when you talk about who’s behind or ahead in drone warfare, you’ve got to analyze...
Brandon [01:36:04]: It’s asymmetric, so it’s hard to compare.
Yaroslav [01:36:05]: Area by area, right? If you’re talking about front strike, I would argue that Ukraine has gotten ahead recently, after scaling up fiber-optic drones. Before that, Russia was slightly ahead. So Ukraine got ahead. With mid strike, say something like 40 to 200 kilometers...
Yaroslav [01:36:35]: It’s hard for me to judge. At some point Russia was ahead. I think maybe we’re getting ahead there as well. In deep strike we recently got ahead, so we are doing more damage to Russia with deep-strike drones than they’re doing to us. In sea drones we’re consistently ahead; we always were. In ground drones, I think we’re ahead. Yeah, I think...
Brandon [01:37:00]: Where are they still ahead?
Yaroslav [01:37:01]: In general, I think we’re ahead.
Where are they still ahead? I think in certain components, like GPS-free navigation; their CRPA antennas are pretty good. And they have these winged bombs that they drop from their bomber planes.
Yaroslav [01:37:33]: I forgot the English name for it.
Brandon [01:37:34]: Glide bomb?
Yaroslav [01:37:35]: Sort of. Yeah. So they’re ahead on that side, and it’s difficult to protect against those.
Brandon [01:37:42]: What’s the range of that?
Yaroslav [01:37:45]: It can be pretty big. I think it can be up to 80 kilometers. Then obviously the range...
Brandon [01:37:52]: From a fighter plane, like a strike?
Yaroslav [01:37:54]: Range is a very iffy subject here, because the range is...
Yaroslav [01:38:01]: It’s basically the distance from where you drop the bomb to where it lands, but you also drop it from a fighter plane, and fighter planes are susceptible to aerial interceptor missiles. On our side, we have our own fighter planes and ground anti-air systems, and those two assets have their radars and radar fields. Depending on the enemy’s tactics, you can calculate how big an aerial area you cover with those assets. And look, I’m not a professional military guy, so I’m covering these topics in layman’s terms. Don’t quote me on this. I’m just trying to make this as understandable to an average listener as possible.
Brandon [01:38:50]: Helicopters. I’ve recently seen reports of drones taking out helicopters in the air, and that this is new.
Brandon [01:39:00]: Is that new? Is that going to be a big deal? Is that going to eventually get rid of helicopters the way drones are getting rid of tanks on the battlefield?
Helicopters, Drone Carriers, and Future Air Defense
Yaroslav [01:39:10]: Look, helicopters are also versatile assets. Front-strike helicopters, I think we’re going to see fewer and fewer of them.
The few Russian helicopters that Ukraine has intercepted with drones were more like edge cases than a systematic helicopter-hunting campaign. I think it is possible to turn it into a systematic countermeasure against helicopters.
Brandon [01:39:38]: What kind? Will those be battery-powered drones themselves, do you think?
Yaroslav [01:39:41]: Potentially. And there are so many different scenarios. You can have large aerial drone carriers carrying interceptor drones.
Brandon [01:39:54]: That then go hit the helicopters.
Yaroslav [01:39:56]: For example. Or you can have battery-powered interceptor drones, but not of the missile-with-a-propeller type, like many of the well-known drones such as Sting or P1-SUN. Those basically look like a missile with a quadcopter behind it. But you can also have plane-like, fixed-wing aerial interceptors.
Brandon [01:40:25]: Does anyone have a little drone that flies super low under the helicopter and shoots it from underneath?
Yaroslav [01:40:33]: In theory you can imagine that, but it’s just...
Brandon [01:40:37]: Or a drone that carries surface-to-air missiles somehow.
Yaroslav [01:40:40]: I don’t think that’s very practical, because whatever you have going on land will be super slow, not fast enough to hunt down a helicopter.
Brandon [01:40:50]: I mean in the air. Is there a drone capable of carrying a small surface-to-air missile that can skim low and then launch its little missile, like a flying missile platform or something?
Yaroslav [01:41:00]: In theory, but a big part of a mission like that is not just kinetically getting to the helicopter, but also identifying it, first by radar and then visually, and placing the interception asset you have in the right place at the right time.
So the combination of those things is much more complex than just how we can strike it from behind or from below. But that does not mean helicopters are becoming completely useless. For example, helicopters are used to intercept deep strike drones. Ukraine uses a lot of helicopters to shoot down Shaheds.
Yaroslav [01:41:44]: Russia uses helicopters to shoot down our deep strike drones.
Counter-Drone Systems: Shotguns, EW, and Surviving FPVs
Brandon [01:41:50]: A lot of people talk- Oh, so some ideas about drone countermeasures, things people do technologically to try to shoot down FPV drones or bomber drones or whatever.
Brandon [01:42:03]: Dumb question that I probably already know the answer to, but for the listeners: why can’t you use a shotgun to shoot down drones that are coming after you? Why can’t you just shoot the thing?
Yaroslav [01:42:11]: That’s the main weapon that people use against them.
Brandon [01:42:15]: Why aren’t they very good?
Yaroslav [01:42:17]: They’re pretty good. There are hundreds, maybe thousands, definitely thousands, of cases of drones being shot down with shotguns, by both Ukrainians and Russians. There are even statistics on-
Brandon [01:42:29]: Got it.
Yaroslav [01:42:29]: What percentage of Ukrainian FPV drones didn’t accomplish the mission because they were shot down by a shotgun.
Brandon [01:42:35]: Got it. So if I’m a guy with a shotgun, I’m walking around, and an FPV drone comes for me-
Yaroslav [01:42:40]: I don’t recommend that.
Brandon [01:42:42]: No. I don’t plan on it.
Brandon [01:42:44]: I’m saying suppose that were the case. Or suppose there is a guy, he’s not me.
Brandon [01:42:50]: He’s dumber than me, okay? He’s got a shotgun, he’s walking around. An FPV drone is sent. Someone says, “Okay, there’s a guy walking around. Kill him. FPV drone, go.”
Brandon [01:43:00]: The FPV drone goes after him.
And he has a shotgun.
Brandon [01:43:03]: What are his chances of using that shotgun to shoot down the drone before the drone gets him? Are you allowed to say that?
Yaroslav [01:43:08]: Depends how good you are with a shotgun. I’ll tell-
Brandon [01:43:11]: Random dude.
Yaroslav [01:43:11]: I was talking to some Ukrainian pilot group, and they told me there was this Russian guy. He was just like Rambo.
Yaroslav [01:43:20]: He shot down like seven FPV drones. They couldn’t get him. They finally got him, but it was like nothing they’d seen before, right?
Brandon [01:43:30]: Got it.
Brandon [01:43:30]: Your average non-Rambo.
Yaroslav [01:43:32]: Average non-Rambo will just die.
Brandon [01:43:34]: Will just die. So there’s a very low chance that they’ll be able to use a shotgun to shoot down the drones.
Yaroslav [01:43:38]: Rather low chance. Yeah.
Brandon [01:43:39]: Got it. Well, that was the kind of question I was getting at. And there’s no sort of portable electronic countermeasure that can get FPV drones very effectively if you’re just holding it?
Yaroslav [01:43:50]: There are plenty. It just depends. Electronic countermeasures are used all across the front line. The tricky thing is that electronic countermeasures cover certain radio-electronic frequency bands.
Brandon [01:44:06]: Let me simplify my question. Sorry.
Yaroslav [01:44:07]: Each side tries to find a frequency that will not be covered.
Brandon [01:44:10]: Let me simplify my question. Is there a man-portable system that will give me a greater than 50% chance of living if an FPV drone specifically targets me to come kill me right now?
Yaroslav [01:44:21]: Look, if your system jams the frequency the drone works on, and the drone doesn’t have fiber optics or last-mile autonomy, then you have a 100% chance that it will not fly towards you.
But then what is the chance of not facing a drone that can use a different frequency, autonomy, or fiber optics? Well, that depends on the area you’re in and who your adversary is in that area, in that zone.
Brandon [01:44:51]: I guess the question I was trying to ask was maybe too dumb.
Yaroslav [01:44:57]: No, it’s a great question. There are no dumb questions here. It’s just, if you feel the common theme in my answers, that things in practice, in war, are way more complex than they seem.
Brandon [01:45:11]: But what I want, I’ve read tons of things that say that basically if you’re walking around in the open and drones come for you, you’re not 100% dead, but you’re probably dead. I want listeners to understand why. People who are paying a tiny bit of attention to this debate, to this issue, from far away, intermittently, in America, I think don’t understand the weakness of our military against this kind of attack, against drone attack, and-
Yaroslav [01:45:48]: I think there was-
Brandon [01:45:49]: -have a lot of mechanisms, psychological mechanisms, by which they cope with the mental idea of drones. I would like to bust those mechanisms by explaining why drones defeat human infantry on the battlefield.
Yaroslav [01:46:01]: It’s just a guided bomb flying at you, and it knows exactly where you are, right?
It’s not that it’s the ultimate weapon, but one of the things that went viral in the Ukrainian defense tech bubble, even before the words of the CEO of Rheinmetall, was some American battle tank pilot who was interviewed and asked whether he’s afraid of FPV drones, and he’s like, “No, our tanks are strong.” And that went viral among Ukrainians because they’re like, “Dude, you have no idea what you’re talking about. Don’t mess with those drones.” Like, the Abrams tank, great tank, but against an FPV drone, sorry, dude, but it’s-
Brandon [01:46:54]: Not just deadly-
Yaroslav [01:46:54]: Not going to work.
Brandon [01:46:55]: Deadly.
Yaroslav [01:46:55]: No, I mean, maybe not from one drone, but a dozen drones will take it out. So yeah. But there is hope. You just have to have kinetic countermeasures. Interesting thing-
Brandon [01:47:10]: Kinetic countermeasure means a thing that shoots down the drone.
Yaroslav [01:47:13]: It can mean many things. If you go to eastern Ukraine, to the territories close to the front lines, I think about 50 kilometers in from the front line, all the roads are covered by fish nets.
Yaroslav [01:47:31]: You literally ride in a corridor of fish nets, and that’s the mechanical countermeasure against the drone.
Brandon [01:47:39]: You count that as a kinetic countermeasure?
Yaroslav [01:47:41]: Mechanical. I said mechanical. Yeah.
Brandon [01:47:42]: Got it. Got it.
Brandon [01:47:43]: I don’t know all the jargon, so I’m, I-
Yaroslav [01:47:45]: Whatever.
Brandon [01:47:45]: -what I’m talking about.
Yaroslav [01:47:46]: Whatever. Then the tanks: if you look at Russian tanks, and sometimes Ukrainian tanks or equipment, they all look like porcupines. They have these long sticking-out, I don’t know, poles?
We talked about poles already on this podcast.
Brandon [01:48:05]: Different kind of poles.
Yaroslav [01:48:05]: Different kind of poles.
Brandon [01:48:06]: A third kind of poles.
Yaroslav [01:48:06]: That’s the way to protect from drones. That’s the way to make the drone detonate maybe half a meter or a meter away from the actual shell of the tank. Or sometimes there are nets on top of these tanks, just some extra equipment welded on. Then of course, there are guns. That-
Yaroslav [01:48:35]: What both Russians and Ukrainians are beginning to experiment with is a kind of interceptor drone, an anti-FPV interceptor drone, which you put on top of something like a gun, like a harpoon sort of thing. When you see a drone coming at you, maybe you can notice or hear it from 200 meters or 100 meters. So you have a couple of seconds, and you grab that thing, you point it, and you fire it, and then onboard it has certain AI that helps guide the small drone towards the attacking drone and intercept it that way. So those are the things that are being developed, and we’re working on some of these things as well. And then you can imagine armor with hundreds of drones on top of it, which are protector drones. They’re sort of like active armor. Whenever they see a drone-
Brandon [01:49:27]: Huh.
Yaroslav [01:49:27]: -coming at you, they take off.
Lasers, Skynex, and the Cost-to-Effect Problem
Brandon [01:49:29]: That’s cool. What about the kind of things that the Germans are building, which is basically a big truck with some sort of automated shotgun on it?
Yaroslav [01:49:40]: They have Skynex. It’s by Rheinmetall, by the guy whom we mentioned today. Skynex is considered to be an okay weapon. Their shots are quite expensive, though.
So I’ll tell you a different story about-
Brandon [01:50:00]: It’s about the cost to fire each shot, really, and stuff.
Yaroslav [01:50:03]: Cost to effect, in a more abstract way. Last year I was speaking at the LANDEURO conference. It’s the biggest AUSA, Association of the United States Army, conference in Europe. There was an expo there, and there was a Raytheon, an RTX, booth. And Raytheon is an amazing company. Gosh, we love Raytheon. They make Patriots. Patriots are the best. And they make a bunch of other things. And they basically had this laser gun project there.
Brandon [01:50:44]: That’s what I was going to ask about next, the laser.
Yaroslav [01:50:46]: The laser thing, they have it in two variations, a two kilowatt, sorry, a 10-kilowatt laser and a 20-kilowatt laser. I’m like, “Okay, 10-kilowatt laser, tell me about it. Can it take down an FPV drone?” He’s like, “Yes, of course it can.” I’m like, “Okay, cool. How much time does it take to take down an FPV drone?” And they’re like, “Well, maybe three seconds.” I’m like, “Three seconds. That’s a lot of time. But okay, maybe fine. And what if the FPV drone tries to evade, right?” And he’s like, “Well, we will retarget it.” And I’m like, “And then the three seconds start again?” “Yeah.” “Okay. Well, can it take down a dozen FPV drones?” They’re like, “Yeah, for sure.” I’m like, “Okay, a dozen FPV drones, 30 seconds? Maybe, yes. Two kilometers? Maybe yes, maybe no.” And I’m like, “Okay, how much does it cost?” And he said something like $3 million.
Yaroslav [01:51:44]: I’m like, “Okay, $3 million. So that is 6,000 FPV drones.
Yaroslav [01:51:51]: I doubt this thing will be able to handle 6,000 FPV drones, or even 600 FPV drones, coming at it at the same time.” So you have this kind of economics. And this product is not necessarily a product against an FPV drone, or against an FPV drone in an active battlefield environment.
It might be guarding a stadium in a peaceful country. Some random dudes launch a couple of drones above a stadium, you shoot them down, and okay, everyone’s happy, although the drone will fall down, maybe on someone’s head. That wouldn’t be cool. So you would want something like catching bad drones with a net above a stadium or something like that. But whatever.
Yaroslav [01:52:33]: My point is the economics matter.
Brandon [01:52:35]: You’re talking about the 6,000 drones. If you sent them one by one, it would just be pew.
Yaroslav [01:52:40]: But who would send them one by one?
Brandon [01:52:40]: If you sent a mass of 6,000, it wouldn’t-
Yaroslav [01:52:42]: Of course, yeah.
Brandon [01:52:46]: What about just a more powerful laser, like a 100-kilowatt laser or something, that wouldn’t need to spend, that would-
Yaroslav [01:52:51]: No, that’s worse. You need a less powerful laser that achieves the same effect.
Brandon [01:52:56]: For the cost of the system.
Yaroslav [01:52:56]: Yeah, a more powerful laser would be more expensive, heavier, and more difficult to transport. It would be more difficult to make many of them, and therefore you wouldn’t be able to cover a long front line, and it would be super expensive to replace if it gets damaged, all of those issues. The reason why FPV drones, or iPhones, became so popular is that they’re small and everyone can have one. And so it is with the countermeasures. You were asking me about policy advice, so that’s another mental shift that you’ve got to go through. It’s no longer about an aircraft carrier that costs, whatever, $14 billion and takes forever to build. It’s about mass that you can iterate on very quickly. You can upgrade it. Everyone can operate it. And then when that mass is combined, or when the technologies are extrapolated from one domain to another domain, they add up, right, as happens with software.
So I think that’s important.
Noah [01:54:14]: Can I ask a follow-up question? Russia is not necessarily the smartest army you could be fighting. What would happen if your adversary was smarter? Do you think things would change meaningfully?
Yaroslav [01:54:31]: Look, I don’t know if I fully agree with “not the smartest army.” Who is the smartest army?
Brandon [01:54:37]: Ukraine?
Noah [01:54:38]: That’s a great question.
Yaroslav [01:54:40]: I don’t know. I don’t know.
Yaroslav [01:54:43]: I think those are very dangerous assumptions to make.
Brandon [01:54:48]: Who was the smartest army in World War I?
Yaroslav [01:54:51]: Well, define smart.
Russia’s Strategy, Western Assumptions, and Preparing for War
Brandon [01:54:53]: The United States. Yeah.
Yaroslav [01:54:53]: Why do you think so?
Yaroslav [01:54:55]: Why do you think Russia is not the smartest army?
Noah [01:54:56]: Maybe this is just my own information bubble.
Yaroslav [01:55:00]: Maybe I agree with you. But I’m naturally wired to challenge those assumptions.
Noah [01:55:06]: No, that’s a really good point. From my information bubble, it seems like Russia’s strategy has largely been to just throw resources, people-
Yaroslav [01:55:17]: You are living in a Western propaganda information bubble, of course.
Yaroslav [01:55:21]: As am I.
Yaroslav [01:55:22]: Because we’re all rooting for Ukraine to win, right? Sorry, go on.
Noah [01:55:26]: But going back to this, granted, there’s a history of large powers failing to take over smaller- Strategically, you-
Yaroslav [01:55:38]: David and Goliath.
Noah [01:55:40]: They, this-
Brandon [01:55:40]: They fail a lot more now than they used to. The success rate of taking-
Noah [01:55:44]: That’s true.
Brandon [01:55:44]: -places over has gone way down.
Noah [01:55:46]: Certainly, yeah.
But regardless, I do wonder: if Russia had not essentially assumed victory early, it may have been different, yeah.
Yaroslav [01:55:56]: I, like, they’re super stupid, of course.
Yaroslav [01:55:58]: They were marching in with their parade costumes, thinking they were going to have a parade in Kyiv in a few days. That was super stupid. And there were lots of stupid things; they have no regard, no care for human life. They’re sending those Russian folks without armor, without anything, folks on crutches, to storm Ukrainian positions. And it’s-
Brandon [01:56:23]: They’re the Zerg.
Noah [01:56:23]: You think at this point there’s-
Yaroslav [01:56:24]: I actually have a good friend. He’s American. He’s from Seattle. He served in the Special Forces here in the US, did maybe three deployments, and then went to Ukraine and volunteered.
Yaroslav [01:56:39]: He’s been fighting since 2022. He’s a very good friend of mine. At some point he was texting me, and he’s like, “Okay, I’m near Pokrovsk,” and sorry, not Pokrovsk. It was, gosh, the other city, Chasiv Yar.
Yaroslav [01:56:55]: And he’s like, “Okay, so what the Russians are doing, they’re just creating so much work for all the psychologists who are going to heal those Ukrainian, whatever, riflemen or machine gunners, who are just shooting at the Russians who keep going nonstop,” right? So the Russians are causing psychological trauma to Ukrainians because they’re dying in such a stupid way.
Noah [01:57:26]: Jeez.
Yaroslav [01:57:26]: That is indeed stupid of the Russian higher command, et cetera, et cetera, et cetera. But that’s the resource they have. And-
Brandon [01:57:38]: If you’ve got Zerglings, you use your Zerglings.
Yaroslav [01:57:40]: That’s the way. That’s their strategy.
That’s their way of strategy, right?
Brandon [01:57:43]: If you’re going to play- back in the- that’s what you do.
Yaroslav [01:57:46]: If you play StarCraft, that’s how the Zerg win.
Brandon [01:57:48]: Are Ukrainians the Terrans?
Yaroslav [01:57:52]: I don’t know. I hope we will become Protoss soon.
Yaroslav [01:57:57]: I’m working on that. I’m working on that.
Brandon [01:58:02]: Protoss had fairly bad political management at the top.
Yaroslav [01:58:04]: I wish for Protoss with a speed closer to, like, humans or Terrans, whatever it is. Hopefully we can do Protoss technology at Zerg speed. That would be the best. I think that’s what the housewives are working on, in fact.
Brandon [01:58:20]: You cannot beat those housewives. Do not oppose Ukrainian housewives.
Yaroslav [01:58:23]: Do not mess with Ukrainian housewives, for sure. Yeah.
Noah [01:58:26]: Two final questions. First, you started out by telling us a story about going to a chapel on February 23rd.
Noah [01:58:34]: Were you able to get married there? Can you finish that story?
Yaroslav [01:58:40]: We did actually get married, but we postponed the wedding as a social event until the war is over.
Noah [01:58:49]: Then the last question: what do you want our audience to take away? If you have one point you want them to walk away with, what would it be?
Yaroslav [01:58:58]: If you want peace, be prepared for war. You’ve got to invest in defense and security.
Noah [01:59:04]: All right. Thanks. Thank you for talking with us.
Yaroslav [01:59:06]: Thank you.
Noah [01:59:07]: Thank you, Noah, for all the great questions.
Yaroslav [01:59:11]: No, it was fantastic.
Yaroslav [01:59:12]: Thanks so much.
Brandon [01:59:13]: Really fun.
Noah [01:59:13]: Awesome. Thanks.
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Special discounts are up for AIE Melbourne (LS discount) and AIE World’s Fair (group discounts up to 25%; CFPs still open for Autoresearch and Vertical AI). Cya there!
Abridge did not start as a “GPT wrapper”. It was founded in 2018, years before the Cambrian explosion of AI application-layer companies. OpenAI launched ChatGPT publicly on November 30, 2022, and by then Abridge had already spent years doing the unglamorous work of building trust for one of the highest-context, most important workflows in healthcare: the conversation between a patient and a clinician.
Abridge’s original wedge was clinical documentation. Listen to the visit, generate the note, reduce the clerical burden, and let clinicians spend more time with patients instead of the EHR. Because Abridge had focused on how doctors actually document, how health systems actually buy, how EHR integration actually works, how clinicians verify outputs, and how missing context during a visit turns into downstream friction across billing, prior authorization, quality, and follow-up, LLMs became a force multiplier on a workflow already optimized for sensitive context gathering.
The company has scaled fast: Abridge says it is projected to support 80M+ patient-clinician conversations this year across 250 large and complex U.S. health systems, with support for 28+ languages and 50+ specialties. It raised $300M at a $5.3B valuation in June 2025, after a $250M round earlier that year.
Today, Janie Lee and Chaitanya “Chai” Asawa of Abridge join us for another crossover pod with Redpoint’s Jacob Effron (who is on the board of Abridge) to dive into how Abridge is building the clinical intelligence layer for healthcare, starting with ambient documentation, then expanding into clinical decision support, prior authorization, payer/provider/pharma workflows, and eventually real-time agents that act before, during, and after the patient conversation.
We go inside the product, data, infra, evals, workflow, privacy, and org design choices behind bringing AI into one of the highest-stakes enterprise environments, from 100M+ medical conversations and specialty-specific evals to real-time alerts, EHR integration, de-identification, clinician-scientist teams, and why healthcare may solve some of the hardest AI problems first.
We discuss:
* Why Abridge started with clinical documentation, “pajama time,” and saving clinicians 10–20 hours a week
* The transition from ambient scribe to clinical intelligence layer: save time, save money, and save lives
* Why conversations between patients and clinicians may be the most important workflow in healthcare (patient visit summary feature)
* Chai’s “healthcare-coded Glean” framing: context is king, but healthcare raises the stakes on safety, evals, and rollout
* Why Abridge wants AI to feel like “air conditioning”: always in the background, but only interrupting when it truly matters
* The prior authorization example: turning a denied MRI weeks later into real-time guidance while the patient is still in the room
* Why payer policies, EHR data, medical literature, and hospital-specific guidelines make the problem hard, and also create the moat
* How Abridge thinks about ambient form factors: mobile, desktop, in-room devices, nursing workflows, multimodality, and future AR
* The multi-sided healthcare customer: CMIOs, CFOs, CIOs, clinicians, patients, payers, and pharma
* The hardest AI problem at Abridge: high-quality, low-latency, low-cost real-time support in a high-stakes clinical setting
* When Abridge uses frontier models vs proprietary models, and why its unique data from medical conversations matters
* Why “every agent is a coding agent underneath,” and how the EHR can be thought of as a filesystem for healthcare agents
* How Abridge approaches personalization across individual doctors, specialties, and health systems
* Why “AI slop” is AI without context, and how edits, memories, and clinician preferences create a data flywheel
* Abridge’s eval stack: LFDs, LLM judges, in-house clinicians, third-party evaluators, specialty-specific evals, and progressive rollout
* HIPAA, PHI, de-identification, one-way anonymization, customer contracts, and learning from healthcare data safely
* What changes when you operate at 100M+ conversations: reliability, cost, post-training, model routing, and infrastructure optimization
* Why the same clinical conversation can serve doctors, patients, payers, pharma, and future clinical-trial workflows
* How Abridge works with EHRs, and why deep interoperability is table stakes for clinician adoption
* Why healthcare AI has regulatory tailwinds, why 80/20 does not work here, and why high-stakes domains may drive AI forward
* Why Abridge embeds “clinician scientists” into product and eval teams
* What Chai learned from Glean about search, quality, and durable AI infrastructure
* Why the future of AI infra may look like context layers, event-driven systems, Kafka, Temporal, sockets, CRDTs, and tools built for humans
* Why Janie changed her mind on “PRDs are dead,” and why crisp written clarity matters more in complex AI products
* How Abridge uses Claude Code, Cursor, and coding agents internally
Abridge:
* Website: https://www.abridge.com/
* X: https://x.com/AbridgeHQ
Janie Lee:
* LinkedIn: https://www.linkedin.com/in/janiejlee
Chaitanya “Chai” Asawa:
* LinkedIn: https://www.linkedin.com/in/casawa
Timestamps
00:00:00 Introduction and what Abridge does
00:02:05 From ambient documentation to clinical intelligence
00:04:04 Clinical decision support and context as king
00:06:57 Alert fatigue, proactive intelligence, and prior authorization
00:12:36 Ambient AI form factors and healthcare customers
00:16:59 The hardest AI problems in healthcare
00:18:26 Frontier models, proprietary data, and model strategy
00:21:07 The EHR as a filesystem for agents
00:24:03 Personalization, memory, and clinician preferences
00:30:40 Evals, LLM judges, and progressive rollout
00:36:47 HIPAA, de-identification, and privacy
00:39:21 100M conversations and operating at scale
00:44:10 EHR integration and the clinical intelligence layer
00:46:39 Healthcare regulation, latency, and high-stakes AI
00:50:11 Clinician scientists and long-tail quality
00:53:04 Lessons from Glean and durable AI infrastructure
00:57:03 The future of agentic healthcare workflows
00:57:34 PRDs, product clarity, and building serious AI products
01:03:11 AI coding tools at Abridge
01:04:06 Outro
Transcript
Introduction: Abridge, Clinical Intelligence, and the Latent Space x Unsupervised Learning Crossover
Swyx [00:00:00]: Okay. This is a special crossover Latent Space Unsupervised Learning pod.
Jacob [00:00:07]: Very excited to do this.
Jacob [00:00:08]: At this point, we get together once a year.
Swyx [00:00:10]: Once a year.
Jacob [00:00:11]: And this is a fun occasion to get to do it on.
Swyx [00:00:13]: I really wanted to talk to Abridge, but I felt very underqualified because healthcare is not something we cover very intensely. It just so happens that Redpoint are big investors in and supporters of Abridge.
Jacob [00:00:27]: Anytime you want to have a portfolio company on your podcast-
Jacob [00:00:29]: Please, by all means.
Swyx [00:00:31]: So we’ll introduce our guests. Chai and Janie, welcome to the pod.
Janie [00:00:34]: Thanks for having us.
Chai [00:00:35]: Thank you.
Janie [00:00:35]: We’re excited to be here.
Chai [00:00:36]: Thank you.
Swyx [00:00:36]: So for listeners, what do you guys do, just to situate you in the company?
Janie [00:00:42]: Abridge is a clinical intelligence layer for health systems. We really started with documentation and building for clinicians, as we think about reducing the burden that clinicians have: they’re spending 10 to 20 hours a week on documentation. There’s a massive doctor shortage in the country. We also think that conversations between patients and clinicians are probably the most important workflow in healthcare.
It’s where care is given and received, but if you think about the 20% of our GDP that goes towards healthcare, almost everything is a derivative of that conversation, whether it’s the claim, the payment, the actual diagnosis given, or the treatment. We started with the conversation to reduce the documentation burden on doctors, but we’re really excited about the path ahead as we become this broader clinical intelligence layer.
Chai [00:01:34]: I’m Chai. I work on clinical decision support at Abridge.
Swyx [00:01:37]: Yes.
Chai [00:01:37]: And so as Janie said, we’re uniquely situated in that we started off with the clinical note. What I’m really excited about, and where we’re expanding towards, is all the things you can do before, during, and after the conversation if you have access to all the context about patients, payer guidelines, and medical literature and put that together, and how healthcare could look fundamentally different as a result.
Swyx [00:02:01]: And that’s the context engine that you guys have?
Chai [00:02:04]: Yes.
Swyx [00:02:04]: Is that what it’s called? Okay.
Swyx [00:02:05]: So historically, as I understand it, the company started in 2018. A lot of people would be familiar with the AI voice notes form factor, where doctors would ask, “Well, do you consent to being recorded?” It replaces handwriting and what have you. But it sounds like more recently there’s been a big transition in the company. Tell me about the broader transition.
From Documentation to Clinical Intelligence: Save Time, Save Money, Save Lives
Janie [00:02:26]: So from a transition perspective, we really think about our journey in acts. The first act was: how do we help save time?
And that’s where a lot of that original product was.
Swyx [00:02:37]: By the way, one of those interesting stats-
Swyx [00:02:39]: -on your landing page was that doctors spend time after hours.
Janie [00:02:43]: They call it pajama time.
Swyx [00:02:44]: Why is that pajama time?
Janie [00:02:46]: Doctors after work, in their pajamas-
Swyx [00:02:48]: In their pajamas. Oh.
Janie [00:02:49]: -at home are just writing and catching up on their notes every day.
Janie [00:02:53]: Some of our favorite customer love stories, we have a Slack channel called Love Stories. We have clinicians telling us, “Abridge has kept us from retiring early,” or “We’re now finally able to-
Janie [00:03:06]: -go home and eat dinner with our kids for the first time.”
Chai [00:03:08]: Save the marriage in some cases.
Swyx [00:03:10]: One of the quotes was “We’re not divorcing anymore.”
Swyx [00:03:12]: I’m asking, “Why?”
Swyx [00:03:14]: Because they’re working too much.
Janie [00:03:16]: But in terms of where we’re going and where we’re expanding, we really think about our second and third acts as how we help health systems save and make more money. Health systems are operating with record-low operating margins. It’s getting harder and harder to serve patients, and they have some regulatory tailwinds but also a lot of headwinds coming their way, and AI is ripe for helping on the save-and-make-more-money piece. And then ultimately, how do we help save lives?
The fact that our software and our product are open millions of times a week, before, during, and after a patient walks in the room, gives us a massive opportunity, with products like clinical decision support, which Chai is building, and so many others, to improve patient outcomes. It’s probably one of the most important workflows and problems to be going after right now.
From Glean to Healthcare: Context Is King
Jacob [00:04:04]: One thing that’s interesting, Chai, is you came over to Abridge from Glean, and clinical decision support, which for our listeners is helping a doctor figure out the right type of care in the context of a visit, is really a search problem in many ways, going through lots of different data sources. It’s very analogous to your previous role as one of the earliest engineers at Glean. I’m sure a lot of our listeners are curious what’s similar about the problems you’re going after now and what feels different now that you’re in healthcare.
Chai [00:04:33]: Very similar. Taking a step back, with every wave, there are a lot of very similar patterns that happen across different products. A lot of social networking products look the same. A lot of credit-based products look the same. And we’re seeing something very similar in the agent era with many companies, of course, in Redpoint’s portfolio and so forth. The key insight common to both companies is that you have amazing models, but context is king. Context is what puts them to work. So in a lot of ways, I see this as a healthcare-coded version of Glean, but the differences are really interesting. A couple of things come to mind. First and foremost, the rigor of the setting we’re in. The downside risk is extremely high here in healthcare. It can be fatal in some cases: you prescribe something that the patient is allergic to, for example. Whereas at Glean, it’s “Oh, you got the question wrong.” It wasn’t the end of the world in most cases. And so what does that mean?
That shapes our evaluation strategy, both offline evaluation and progressive rollout, and there’s a lot more we could go into there. The second thing that comes to mind is vertical versus horizontal. In both cases, there’s a large variance, but Glean is a much more horizontal company, so there’s a variance of personas and companies that you’re working with. We also have a variance of personas, different types of specialties, different hospital systems, but the variance is a little narrower. So from a product perspective, you’re able to focus far more, especially when you have a maturing technology and you’re building new products that never existed before. It lets you go after them much more easily, especially in healthcare, where so many problems were solved with labor and process that it’s extremely ripe for AI to keep helping augment and enable. And the final thing that’s really interesting about Abridge specifically, compared to many other companies in the AI area, is the modality we started with: we’re ambient, and we’re always listening in the background. Many more AI products will go that way, but it’s how we started. And that’s the greatest form of AI we can create, AI that’s seamless. You’re not looking at your screen. It’s always there. It’s always helping you out and being proactive. The Jarvis vision: every hackathon I went to over the past decade, there was always a Jarvis competitor. But Abridge very much started from that opportunity and continues to go that way.
Ambient AI and Alert Fatigue: When Should the Product Interrupt?
Jacob [00:06:57]: One thing that is super interesting from a product perspective, then, is that you have this always-on, seamless thing in the background, and then you have to decide when you almost break the wall and say, “Hey, clinician, you might not have thought about X,” or whatever it is that you want to do.
In healthcare, traditionally, there’s been this problem of alert fatigue: a million pop-ups, and then a doctor just ignores all of them. It’s probably a pattern a lot of builders are thinking through now. How do you think about the right way to intervene or pop up during a doctor visit?
Janie [00:07:26]: It’s such a good question. Alerts are notorious in healthcare specifically; over 90% of alerts are ignored. The first and most important thing is that context is everything, as Chai alluded to. I also think about how we go from reactive alerting to truly proactive intelligence at the point where it matters most. One thing we like to say is that we want our product to feel like air conditioning. It should be in the background, just making things better, and if there is something with great clinical risk, where we’re acutely aware that intervening now rather than later is incredibly important, we should decide to act. But if you think about proactive versus reactive, instead of alerting a clinician during a visit, when they’re with their patient having a pretty serious and sensitive conversation, how do we prep the clinician before they walk into the room? Historically, clinicians might have to manually go through the charts of a patient they’ve seen over the course of months or years and try to suss out what they should be doing. You can imagine a world with Abridge where we summarize all of the most recent context for you and, based on the reason for the visit, tell you the types of things you should be discussing. So you go into that conversation prepped, rather than walking in cold and then having the product interrupt you five or 10 times throughout the visit. And there might be times when it’s really important to interrupt. We have a product called Prior Authorization. Say you go into a doctor’s office with knee pain.
They’ll prescribe you an MRI, and so many of us have had this experience: four weeks later you get a call saying, “Hey, Sean, the MRI you were prescribed wasn’t approved. Why don’t you come back in and we’ll figure it out?” In a world with Abridge, we might choose to quietly alert the doctor during that visit, and “alert” probably isn’t even the word we’d want to use. Before the patient leaves, we’d want to tell the doctor, “Hey, Doctor, before Sean leaves, you should ask him whether he has had physical therapy and whether his pain has lasted more than six weeks. The Aetna plan he’s on in California requires six things. We’ve already confirmed that four of them have been met, because we have all the context. If you can address these last two criteria with Sean before he leaves the room, we can guarantee the MRI is approved before he leaves.” So when you think about clinical usefulness and impact on the patient, there are instances where, if we can catch a doctor while the patient is still in the room, we get to check all of the boxes: save time, save money, save lives. But when doctors have 15 minutes between visits, we have to be really thoughtful about when it matters.
Prior Authorization: Reducing Latency in Care
Chai [00:10:23]: One interesting product opportunity AI has is reducing latency in the world. Prior authorization is an example of where care gets delayed, and great AI can reduce that. And the problem with alerts before was partially a technical problem: the quality of your alerts really matters. They’re going to get ignored if they’re low quality, just like in engineering, where people tune out noisy alerts they can’t act on.
But if you can make really high-quality alerts, with both the context, as Janie said, and really high-quality models, then it’s a whole other game.
Janie [00:10:53]: And I really like that example, because it starts to tease apart what makes this so hard and unique. One, to make that prior authorization example possible, think about all the data you need. You need to integrate with the electronic health record to know all of the patient context: do we have access to your previous labs, your previous imaging? Then, to match you and know that you’re on Aetna, we have to collect all of the different payer policies, and they vary by state. Some of these payer policies live on websites; some live in unstructured 50-page PDF files.
Jacob [00:11:31]: I thought this episode was supposed to make sure we didn’t scare people away from healthcare.
Janie [00:11:34]: But the things that make it hard are also what give you the moat. The second is the AI, and the model quality we need to be able to hang our hat on. When I worked at Opendoor, I worked on pricing models, and every outlier wiped out the margins of 30 others. Similarly, here in healthcare, the bar for accuracy is very high. And the last is that workflow is everything. When insurance companies deploy AI, it typically happens too late, and that’s when you get the notorious, comical examples of AIs just fighting each other. But if we can pull forward both the use of AI and the ability to solve problems while the patient is in the room, you can start to collapse what typically takes weeks or months after your visit, ideally down to minutes or real time.
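To make the prior authorization example concrete, here is a minimal Python sketch of the matching step Janie describes: a payer policy lists several criteria, some are already satisfied by chart data, and only the unmet ones become questions for the clinician. Everything in it is hypothetical: the `Criterion` type, the record fields and the six criteria are invented for illustration, and this is not Abridge’s implementation or any real Aetna policy. It also skips the hard parts mentioned above, such as extracting policies from websites and 50-page PDFs.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical shape: a patient record is a flat dict of facts pulled from the EHR.
PatientRecord = dict[str, object]


@dataclass(frozen=True)
class Criterion:
    """One payer requirement, expressed as a named predicate over the record."""

    name: str
    is_met: Callable[[PatientRecord], bool]


def unmet_criteria(policy: list[Criterion], record: PatientRecord) -> list[str]:
    """Return the names of criteria the record does not yet satisfy, in policy order."""
    return [c.name for c in policy if not c.is_met(record)]


# Invented six-criterion MRI policy, loosely mirroring the example in the conversation.
mri_policy = [
    Criterion("knee pain documented", lambda r: bool(r.get("knee_pain"))),
    Criterion("x-ray on file", lambda r: bool(r.get("xray_on_file"))),
    Criterion("exam findings recorded", lambda r: bool(r.get("exam_findings"))),
    Criterion("trial of NSAIDs documented", lambda r: bool(r.get("nsaid_trial"))),
    Criterion("physical therapy completed", lambda r: bool(r.get("physical_therapy"))),
    Criterion("pain lasted more than 6 weeks", lambda r: int(r.get("pain_weeks", 0)) > 6),
]

# Facts already confirmed from the chart; the last two criteria are still open.
record: PatientRecord = {
    "knee_pain": True,
    "xray_on_file": True,
    "exam_findings": True,
    "nsaid_trial": True,
}

for question in unmet_criteria(mri_policy, record):
    print(f"Ask before the patient leaves: {question}?")
```

Surfacing only the specific open criteria, rather than a generic warning, is one way to keep an in-visit prompt actionable instead of adding to alert fatigue.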
It’s where healthcare is both very difficult and extremely rewarding, if you can crack it.
Product Form Factors: Mobile, Desktop, In-Room Devices, and AR
Swyx [00:12:36]: Just to get a baseline on the form factors, because I’ve seen some videos on your website. You talk a lot about ambient AI. Is it primarily on the phone? Are there other form factors people get Abridge in? Is there an Abridge room setup where it’s always on? I don’t know.
Jacob [00:12:55]: An Abridge podcast studio.
Janie [00:12:58]: The primary form factors are mobile and desktop. Usually clinicians are walking in and out of rooms with mobile, but at the end of the day, when they’re closing out their notes or prepping for the day ahead, they might use desktop. We’ve been having a lot of really interesting partnership conversations with in-room device companies, given the power of multimodality and of capturing more data, including everything that isn’t captured today. It’s fascinating to think about, especially as we build and scale our nursing product. Nurses are constantly walking in to check on a patient for two minutes, or maybe even 30 seconds, so starting an Abridge session would probably take longer than the visit. What we can do with in-room devices that are always on raises really interesting and fun product questions.
Swyx [00:13:54]: I was thinking, the way tech companies have all these Google Meet rooms and other setups, we might as well set up entire rooms with just Abridge tech.
Chai [00:14:02]: Very much. AR glasses and related form factors are also relevant: how do we bring information to the clinician in real time, without a screen, while still letting them focus on the patient?
Swyx [00:14:18]: Do you think they want that?
I’m skeptical of AR, but I’m curious what you’ve tried.
Chai [00:14:26]: Admittedly, it’s not on the near-term product roadmap by any means. I’m being far-fetched.
Jacob [00:14:31]: There’s some sick AR stuff for surgeries.
Swyx [00:14:33]: Really?
Jacob [00:14:33]: When you’re about to make an incision, you want to visualize what the cut might look like, or what the body looks like inside, and they can layer in imaging.
Swyx [00:14:43]: That’s cool.
Chai [00:14:45]: At some point in the future.
Janie [00:14:46]: But a lot of our largest customers, at the largest health systems, are already integrating these devices, so building into them unlocks a lot of product capabilities.
Health Systems, Buyers, Clinicians, Patients, and Payers
Swyx [00:14:57]: And just to establish the terminology. Sorry, I know I’m asking basic questions, somewhat for myself but also for the audience, who might be less initiated. When you say health systems, you mean the Johns Hopkinses, the Kaiser Permanentes.
Janie [00:15:09]: The Mayos, the Kaisers of the world.
Swyx [00:15:10]: These are your customers, right? And the outcome you deliver for them is happier doctors, reduced processing costs, fewer mistakes. It’s interesting, because I feel like there’s also a secondary customer, the customer of the customer. I don’t know if you think about it that way.
Janie [00:15:28]: The other interesting and complex part of building product here is that our buyers are the chief medical information officers, the chief financial officers and the CIOs of these large health systems. Our users today are clinicians, but if you think about who is impacted downstream, it’s patients. So with every product we build, we think about who we’re building for, who the secondary user is, and what that means in terms of experience, security, compliance and the ROI we have to make tangible.
So, like you said, time savings is one of them. But CFOs care about a lot more than time savings. We have to show that for every dollar you put into Abridge, because you have more compliant documentation or fewer queries coming from your billing team, we save or add real dollars to your bottom line or top line. Those are things we’re constantly thinking about because of the dynamic across all three sets of users.
Chai [00:16:32]: There’s a whole other axis, too, with the payers and pharma. Connecting all three of these big stakeholders in healthcare is...
Swyx [00:16:39]: Do the payers ever see your data? Sorry, the payers meaning the insurers, right?
Chai [00:16:44]: Yes.
Swyx [00:16:44]: They also see Abridge data?
Chai [00:16:47]: No.
Swyx [00:16:47]: Like, a direct integration with you guys?
Chai [00:16:48]: They wouldn’t see the raw Abridge data, but when we’re working together on something like prior authorization, we’d communicate whatever information they need.
Jacob [00:16:59]: That’s cool. I’d love to dig into the AI side, where you still have a lot of problems to solve. Maybe to start at the highest level: what’s one of the hardest AI problems you have to solve at Abridge today?
The Hardest AI Problems: Quality, Latency, and Cost
Chai [00:17:11]: To keep things simple, let’s build off the prior auth example. One thing Janie talked about is that this data is all over the place, and there’s a combinatorial explosion of procedures, payer policies and sometimes even different health systems. You can end up with a cross-product of all these considerations that you have to take into account. But what’s really hard about this problem is doing it in real time, in the conversation. In any AI product, the three KPIs you usually care about are quality, latency and cost. Now we’re saying we want to do this in real time, in the conversation, guiding the clinician.
How do we do it in a way that doesn’t break the bank? But we also need very intelligent models, because you’re working with this cross-product of data and the whole context layer. So you need high intelligence and high quality, because you don’t want alert fatigue, but you also need to be fast and cost-effective. That’s where a lot of clever engineering goes. Without getting into all the details: can you model these policies in some intermediate representation, or do other things that make the problem tractable? Of course the Pareto frontier is always changing, but we’re trying to do this now.
Model Strategy: Third-Party Models, Proprietary Data, and Medical Conversations
Jacob [00:18:26]: What implications has that had for what you take off the shelf, where you say, “You know what? We don’t need to be world-class at X. We’ll just take this from the model providers or from some infrastructure player,” versus where you say, “No, this is where we spend most of our time”?
Chai [00:18:38]: That’s the fun challenge in AI, right?
Jacob [00:18:42]: It changes every three months, so...
Chai [00:18:42]: Of course. With the shifting landscape, we try to be extremely thoughtful about predicting where third-party models are going and where we can uniquely go. Sometimes when people talk about AI models, they say the models are just going to get infinitely better. Maybe on a long enough time horizon you could say that, but within every month, every quarter, there are specific ways they’re getting better. They’re training on a lot more coding data to be better coding agents, for example.
So we have to think about where a proprietary model gives us an advantage. To step back a little: a proprietary model brings an advantage if it gives higher quality, or lower cost and latency at similar quality, much like at many other companies. And we can do that when we have proprietary data. For example, we have on the order of 80 million medical conversations, getting close to 100 million now.
Jacob [00:19:44]: It’s insane.
Chai [00:19:45]: This is a unique data set. It’s very interesting because it’s effectively a large part of the trace between the patient and the provider. That’s where the quote-unquote debugging happens in healthcare. We have these traces at scale; as our CEO has even called it, it’s an exhaust that comes out of our product. When you have these traces, you can train better agents and models for certain use cases, whether that’s transcription and diarization or note generation, and do it much cheaper and faster. But we also work with third-party model providers. We collaborate closely with them, and that’s how we predict where the trends are going. The thing I think about a lot is that I know the model providers are going to train much more on agentic workflows and so forth, which is great, because you get a better agentic harness. The other interesting thing is that, because a large share of consumer queries to the model providers are healthcare queries, they might optimize by training on a lot of healthcare data to encode that knowledge in the weights.
And that’s great for us as well, because off-the-shelf models keep getting better at general healthcare information. So our strategy is to have a constellation of models, using one for this and another for that, because at the end of the day we only care about the best product experience.
EHR as File System: Agentic Workflows and Real-Time Interfaces
Jacob [00:21:07]: And you have overall capabilities improving. I’m curious, as these models get better, is there something you look at and think, “Oh, three months ago we really couldn’t do that, but God, the latest models really allow us to do it”?
Chai [00:21:19]: Here’s something interesting I’ve been toying with. This wasn’t super obvious a year ago, but it has become clearer and clearer that almost every agent is a coding agent under the hood. You give it a file system, and it can write its own code and so forth. So for the healthcare use cases we have, you can think of the EHR effectively as a file system. It’s a store of all this information, and there’s a lot of information there that can’t fit into the context window, at least of today’s models, and you want to use that context effectively for all the product use cases we’re talking about. So if you have better agents that can manipulate and read that data, treating it as a file system, which is where we see them going and where we know model companies are investing, then that benefits us very directly.
Swyx [00:22:09]: Okay, cool. Again, just establishing basic things. But going back to the model side, I’m really interested in double-clicking on the real-time element, which is pretty important for both of you. Is real time just batches every one minute, every five minutes? Is that how you do it?
Or is there something more native, genuinely real time, in the sense that OpenAI has a real-time API or Gemini has a real-time API?
Chai [00:22:35]: Yeah. Today it’s more on a batch basis, but we have some interesting prototypes. We’re still not doing full-time voice in, text out in that sense. But can you trigger your models, your agents or agentic workflows at the right times in the conversation? You can imagine different techniques to bring that latency down; you want to shorten the feedback loop as much as you can. So there’s a lot of clever engineering there without going fully... Maybe one day we’ll do full voice in, text out, and train a model to do something like that.
Swyx [00:23:15]: People don’t want voice in, voice out?
Chai [00:23:18]: Right now we aren’t creating experiences that interrupt during the conversation. It’s almost like...
Swyx [00:23:25]: It might be too disruptive.
Chai [00:23:26]: Too disruptive. Who knows, maybe eventually you could have full voice agents once the quality improves and people grow more comfortable with the technology. But right now that change is much more gradual, and it’s more text-focused: text out.
Janie [00:23:42]: So much of what our product is trying to do right now is let a clinician focus on their patient. Maybe at some point, but right now patients and clinicians don’t want a third voice in the room, at least not a literal voice. So how do we be there, with all the context and information ready at hand, at the right moment?
Personalization: Individual Doctors, Specialties, and Health Systems
Jacob [00:24:03]: Janie, one thing I’m curious about is how you think about personalization in the product. I imagine every doctor is a special snowflake in their own way and has their own way they like to do things.
There are probably a bunch of different approaches you could take, both within the model layer itself and with clever prompting or engineering. How do you deliver on that?
Janie [00:24:21]: It’s such a good question. Personalization is massive for us. We think about personalization at three levels: the first is the individual, the second is the specialty and the third is the health system or organization. To your point, there are a lot of individual preferences. When a note is produced, it’s a deeply personal reflection of a doctor’s work and how they give care. So do they have preferences on things like style? They might want bullets versus paragraphs, really concise versus comprehensive. They might also have phrases they really like to use, or templates they want every note structured around. We see it in our feedback all the time: “I want two spaces between sentences, or I refuse to use this tool.” So that’s something we’ve had to build in. The tricky part is making sure stylistic preferences don’t compromise accuracy and quality, and that’s something we’ve had to refine and hone over time. The second level is specialty. A cardiologist’s note or workflow is going to look very different from a dermatologist’s.
Jacob [00:25:32]: I assume cardiology notes are the highest stakes for you guys, given your CEO is a cardiologist. It’s “Oh my God, make sure we get this one.”
Janie [00:25:37]: Shiv, our CEO, is still a practicing cardiologist. He rounds once a month, so he’s our first call when we want quick and easy user feedback, too. But specialties require a lot of personalization, both in terms of what the product looks like, so we make sure that as new users onboard we capture that and the product appropriately reflects it.
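The three levels Janie names (individual, specialty, health system) can be pictured as layered settings. The Python sketch below assumes a simple precedence in which more specific layers win, plus a set of locked keys that style preferences cannot override, echoing her point that stylistic choices must not compromise accuracy. The function, the setting names and the precedence order are hypothetical illustrations, not a description of how Abridge actually stores or applies preferences.

```python
from typing import Any

Settings = dict[str, Any]


def resolve_preferences(
    health_system: Settings,
    specialty: Settings,
    individual: Settings,
    locked: frozenset[str] = frozenset(),
) -> Settings:
    """Merge note settings so that more specific layers override broader ones.

    Precedence, lowest to highest: health system, specialty, individual.
    Keys listed in `locked` can only be set by the health-system layer,
    standing in for requirements (e.g. compliance) that style must not override.
    """
    merged: Settings = dict(health_system)  # copy so the input is not mutated
    for layer in (specialty, individual):
        for key, value in layer.items():
            if key in locked:
                continue  # style-level layers may not change locked requirements
            merged[key] = value
    return merged


# Hypothetical settings for one health system, one specialty and one clinician.
health_system = {
    "required_sections": ("HPI", "Assessment", "Plan"),
    "format": "paragraphs",
}
cardiology = {"format": "bullets", "include_ekg_summary": True}
clinician = {"format": "paragraphs", "sentence_spacing": 2, "required_sections": ("Plan",)}

prefs = resolve_preferences(
    health_system, cardiology, clinician, locked=frozenset({"required_sections"})
)
print(prefs)
```

Here the clinician’s paragraph style beats the specialty default, but their attempt to drop required sections is ignored. A real system would likely need more than a flat dictionary merge (templates, phrase libraries), though the precedence question stays the same.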
But also on the back end: evals at the specialty level are hard-earned to calibrate. What does a really great dermatology note look like? What makes it complete? What makes it compliant and billable? The answers are very different for a primary care doctor. So it’s not just about what the product experience looks like, but also about back-end tuning and really deepening our understanding of each specialty: what does great output look like? That’s a problem we need to calibrate internally and externally, online and offline. It takes lots of cycles, but it’s necessary in a high-stakes environment. Then, at the health system level, for products like clinical decision support, you have health systems that have spent years or decades refining their best practices, and they want to know, “Hey, we love your clinical decision support product, but how do we embed our own hospital guidelines into it, to show clinicians before, during or after a visit what best practices should look like?” And in terms of deepening moats, when health systems trust us with that data and allow us to productize it directly in the clinical workflow, it makes us a really great partner for health systems that want something that truly meets their needs and their practice guidelines.
AI Slop, Memory, and Product Data Flywheels
Chai [00:27:23]: I want to add on to that. The clinical documentation problem is very similar to AI writing that doesn’t feel like your own, which we call slop. One framing of slop I use is AI without context. But we have all that context, and the clinicians have it too and can guide it.
So another interesting exhaust for us is memory. Memory is one of these new systems of record, almost.
Janie [00:27:50]: We also have all the edits people make in our product, and when you think about a data flywheel and how we get better over time, that becomes a really powerful mechanism for going deeper on personalization.
Jacob [00:28:04]: It’s interesting. I love this idea of working with systems on the guidelines they’ve built up over a long time. For so many of the best AI app companies today, the question is: how do you take the expertise a law firm or a bank has built up over many years and add it as context, and as special sauce, on top of an AI tool? It seems like y’all are doing that very effectively.
Janie [00:28:24]: We’re now starting to have customers ask, “What are other customers doing, and how are they doing it?” Given our visibility across such a large volume of the care being delivered right now, that’s a really interesting place where we could also partner.
Swyx [00:28:40]: I’m just curious, and this may be a nothing question, but how different are health system guidelines from each other? Don’t they all converge to the same thing? And if not, where do they differ?
Chai [00:28:52]: At a really high level, they talk about very similar things, but the differences are in the details: “You should refer to specialists only when X, Y and Z conditions are met,” and so forth, and different organizations have different practices and guidelines around that.
At a high level they talk about similar things, but the details, of course, are what shape the context and the decisions you make.
Swyx [00:29:15]: And this all goes into the context engine, and it might affect the notes, but maybe not.
Chai [00:29:21]: For these local pathways, we’re definitely thinking about it more for our clinical decision support product. So, yeah.
Swyx [00:29:27]: Which is your stuff, yeah. And then the memory you raised: tell us more about that. What have you tried with memory? What’s the structure of the memory? What works? What doesn’t?
Chai [00:29:38]: There are, of course, many different ways you could do memory: can you bake it into the model weights, or can you keep it in some external store? What’s interesting for us is that the models are changing rapidly, whether in-house or third-party, so if you bake memory into the model weights, you sometimes worry it could be throwaway work. You need to find a way to decompose the problem, separating the preferences from the underlying models. What’s easiest to start with, and what we’re most excited about right now, is a separate store for memory, where, for example, a memory sub-agent works in the background, figuring out which parts of the clinician’s actions we want to remember long term. You can also imagine background jobs that collate these memories, similar to sleep, and similar to patterns other products use as well, learning over all the action data we have: note edits, the conversations clinicians had and the actual transcripts.
Evals: LFD, LLM Judges, and Clinical Safety
Jacob [00:30:40]: What about evals? How in the world do you... It is such a complex product surface area.
We’d love to hear you riff on that, and also on how it has evolved. I’m sure you’ve gotten better at it, so any learnings along the way.
Janie [00:30:50]: From an evals perspective, from day one, whenever we build a new product or feature, we think about what good looks like. There are table-stakes things like clinical safety, but then you get deeper into what good quality looks like. In our core product, there’s style and completeness, and there are questions like whether the note can be billed, which is very high stakes for a health system. We have a number of ways to get confidence here. We have in-house clinicians who do what we call an LFD process, our very first pass at whether an output is good enough: look at the effing data.
Jacob [00:31:41]: LFD?
Chai [00:31:42]: That’s why I was smiling. I was thinking, “Is Janie going to mention what it stands for?”
Jacob [00:31:46]: I was not... There are like a million acronyms. How am I supposed to know which ones I don’t know? So it’s, “Oh yeah, of course, an LFD.”
Swyx [00:31:51]: I’ve never heard of LFDs.
Chai [00:31:53]: It’s an Abridge thing, for sure.
Janie [00:31:55]: I got through three days and then had to ask someone. I thought it was just me who didn’t know. It’s our internal process.
Swyx [00:32:02]: But “look at the data” is a meme in ML, because people tend not to look at it. They just want to watch number go up.
Chai [00:32:06]: Exactly.
Swyx [00:32:07]: But yes.
Janie [00:32:08]: So we make sure we look at the data. Then, for all the components of good output, we create LLM judges, and we use annotated data and internal or external evaluators to make sure those judges are calibrated.
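Janie describes checking that LLM judges are calibrated against annotated data before trusting them. One common way to quantify that (an assumption here; the speakers don’t name a metric) is to have the judge and an expert annotator label the same outputs and compute Cohen’s kappa, which discounts the agreement you would expect by chance. The labels below are invented.

```python
from collections import Counter


def cohens_kappa(judge: list[str], expert: list[str]) -> float:
    """Chance-corrected agreement between two raters who labeled the same items."""
    if len(judge) != len(expert) or not judge:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(judge)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(j == e for j, e in zip(judge, expert)) / n
    # Expected agreement if each rater labeled at random with their own label rates.
    judge_counts, expert_counts = Counter(judge), Counter(expert)
    labels = set(judge_counts) | set(expert_counts)
    expected = sum(judge_counts[label] * expert_counts[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Made-up pass/fail labels on the same eight notes.
judge = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
expert = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "pass"]

print(f"kappa = {cohens_kappa(judge, expert):.3f}")  # prints "kappa = 0.467"
```

Raw agreement in this toy example is 75%, but kappa is lower because both raters say “pass” most of the time. Tracking a score like this per specialty would be one way to decide where a judge can be trusted and where more human review is needed.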
Then, depending on the stakes, we also work with in-house and third-party evaluators across all of these before we ship any big change. In terms of evolution, the goal is to go from this process taking months, down to weeks, down to days. Some of that is a true science and ML problem. A lot of it is also just hard operational work. Have you planned ahead for what you need? Have you optimized the capacity you need across all the different specialties? Do you have a really good sense of which third parties are great to work with for which use cases? Figuring that out takes a lot of domain expertise and lots of mistakes. So as much as it’s an ML problem, so much of it has also been hugely important operational gains, where domain-specific expertise is everything.
Specialty-Level Evaluation and Progressive Rollouts
Jacob [00:33:23]: It’s funny, because I feel like people talk about healthcare as if it’s one giant market, and the reality is that it’s dozens and dozens of sub-markets. So it feels like in your evals you probably have to build that up across the board.
Swyx [00:33:34]: And is specialization the primary cardinality? That’s the word that comes to mind.
Janie [00:33:40]: Sometimes, depending on the product or use case. If we’re making a note improvement or a feature for a particular specialty, definitely. But we have products for nurses. We have products that are really aimed at making the document or output a lot more billable, so we’ll want to work with coding teams, not necessarily clinicians. So...
Jacob [00:34:05]: Coding meaning healthcare coding.
Janie [00:34:06]: Yes. Yes.
Jacob [00:34:07]: Not...
Chai [00:34:07]: Yes. I see you.
Swyx [00:34:07]: Other kinds.
Janie [00:34:09]: But is this output proportional to the work that was delivered?
Is there sufficient documentation to justify the amount a health system may end up charging? So it’s sometimes specialty, but also domain, which varies a lot across the different products we’re working on. Building out that network isn’t easy, and it’s where a lot of our operational investment has gone.
Chai [00:34:35]: I see a lot of analogies to self-driving cars here. Part of it is that we really want progressive rollout of features, to test in the real world: is this useful? Is this going to work? One big difference from my past lives is that before, I’d build a product, maybe alpha it, and then GA it the next week, because I was in “Go, move fast, ship” mode. Now the mentality is: I want to make contact with reality as quickly as possible, but through a progressive rollout. However large my offline eval set gets, I want its distribution to match the real-life distribution. And over time, by rolling out early, similar to Waymo’s tagline, “The world’s most experienced driver,” the size of our evaluation sets, offline and online, keeps growing, at least linearly, and it all feeds back.
Janie [00:35:25]: Speaking of evolution, something that’s been earned over time is the trust we have with customers. Historically, when a lot of these health systems bring on new vendors, their release cycles are quarterly, sometimes twice a year. We’ve gotten our customers onto monthly release cycles, which is pretty fast for health systems. But what’s more exciting over the last few quarters is that a subset of our customers have said, “We want to innovate with you. We trust you.” A pretty decent chunk of our customers say, “We’ll develop with you outside of these monthly release cycles. We have a higher tolerance.
We know the stakes are very high, but we want to be the first ones using these products and giving you feedback.” So for a pretty substantial set of our customers, we’ve been able to convince them to let us ship gradually, before GA. Something we talk about a lot internally is that trust is earned in drops and lost in buckets, so we still can’t do what I used to do when I worked at Loom. We had 30 million users, and I’d just be rolling out experiments left and right. The bar is still quite high for iterative rollout here, but because of the trust we’ve earned, we’re able to learn at pretty high volume very quickly.
Privacy, HIPAA, and De-Identification
Swyx [00:36:45]: Your scale is still pretty huge. We’re going to go into scale in a sec. One thing I wanted to follow up on with evals, again coming from a generalist engineer’s point of view and thinking through what people would be scared of in doing this, is the privacy and HIPAA elements. I have zero experience with that. What do you have to do? What is surprisingly not that bad?
Chai [00:37:06]: One thing that’s really important from a compliance perspective is that any data we use needs to be de-identified, including any real-world data we use as the basis of the online eval sets we learn from. There are very clear government guidelines on what counts as PHI. We’ve even built models that can take, for example, a clinical transcript and remove all the key PHI indicators, so you have a scrubbed, de-identified version. One important thing is that you first have to get confidence in that model and prove it out.
Because now you have multiple probabilistic systems stacked on top of each other.

Chai [00:37:46]: But once you have that, you can train on it, use it for evaluation, and so forth, provided you also have the right data contracts with your partners, which is one of the cool things you can do on the business side.

Jacob [00:37:57]: Is the anonymization one-way? Once it’s done, you can’t undo it? Or is there someone

Chai [00:38:01]: Yes.

Jacob [00:38:02]: who holds the master key that can... Yeah, okay. So it’s one-way.

Chai [00:38:05]: It’s one-way. Yeah.

Jacob [00:38:06]: That’s how it works. I just wanted to check, because there’s a lot of learning from feedback and so on that you’d want to debug more, but you can’t, because you physically don’t allow yourself to.

Janie [00:38:17]: Some of it’s also written into our customer contracts: who can or can’t access PHI data, and how long we retain it

Jacob [00:38:27]: Very good.

Janie [00:38:27]: before it gets de-identified. We have a pretty high bar for who can access PHI data, to make sure we always respect our customers’ data and privacy. But that’s also something we partner with our customers on, so that when we want to get as close to full precision as possible on quality,

Janie [00:38:48]: we can still use it.

Jacob [00:38:50]: It’ll be fascinating to see how that space evolves. I used to work at a company that did a lot of healthcare data work in the cancer space, and if you asked the average cancer patient, “Hey, do you want other patients to be able to learn-”

Chai [00:39:03]: Take it.

Jacob [00:39:03]: “...learn from your experience?”

Chai [00:39:04]: Take it all.

Jacob [00:39:05]: They’re like, “Please.”

Jacob [00:39:06]: “I’d love nothing more than for other people to be able to learn from

Jacob [00:39:10]: the experience that I had.” In the past, it was a lot harder to do that learning.
But with this technology, it might really be practical, so it’ll be fascinating to see how that continues to evolve.

Chai [00:39:21]: There’s so much in our dataset of 100 million conversations.

Chai [00:39:26]: You can imagine insights you could give to the clinician: How could you have reacted to this? Coaching, or insights around which treatments are effective. Because, again, you have this data source that was never captured before, but that’s where intuition and experience are created, going back to the idea that the conversation is the source of truth.

Operating at Scale: Reliability, Cost, and Token Efficiency

Jacob [00:39:46]: Back to the 100 million conversations: I feel like you have this insane scale that maybe only a few other AI app companies have and everyone else dreams of. Not everyone has had to confront this yet, so maybe talk about some of the challenges of operating at that scale, and what our listeners have to look forward to if they ever get there.

Chai [00:40:05]: At larger and larger scale, of course there’s general infrastructure reliability. In any startup, you’re building the plane while it’s flying, so there’s some of that. But what gets interesting on the AI and ML side is that as you reach more and more scale, first and foremost you have the data to do this. And you start thinking about cost and infrastructure in a whole different way than you would for a prototype.

Chai [00:40:34]: With a prototype, you can use the most expensive model and burn as many tokens as you want, but when you’re doing 100 million conversations

Jacob [00:40:41]: Token max on leaderboards are less upsetting than that context.

Chai [00:40:45]:
When you’re doing that, it helps that we have the data and also a team that’s able to post-train on it, so you can optimize for efficiency. That matters especially in areas where you believe there’s less quality headroom left and you don’t expect off-the-shelf models to move in that direction, so you want to maximize efficiency in terms of compute and tokens.

Jacob [00:41:08]: I feel like you guys live in the future in some ways. Most use cases today are really just in discovery mode, where it’s “God, I really hope I can find something that can get to scale,” so you’re always going to use the most powerful model. Then, for the few things that do get to this level of scale, you start doing those optimizations.

Chai [00:41:22]: It’s a natural trajectory. In zero-to-one, we’re not talking about any of these optimizations.

Chai [00:41:26]: But when we’re in one-to-100 or so forth, we’re in optimization mode, and what works out really well is that you’ve got all this data from zero-to-one that lets you do it.

What Comes Next: The Conversation as the Shared Healthcare Platform

Jacob [00:41:36]: That’s fascinating. One thing that’s so interesting about the Abridge footprint is that you’re in the doctor-patient visit in real time. I always like to say there’s probably 50 years’ worth of product you could build on top of that. What are each of you most excited about building, in the short term, the medium term, or even long down the line?

Janie [00:41:53]: Something I get really excited about is that the same conversation can serve so many stakeholders. A doctor needs to know: What is the documentation? How do I make sure it fully represents the care I gave? A patient needs to know, “What the heck just happened? This was really overwhelming.
What are my next steps?” A payer needs to know: Was this proper and appropriate care? A pharma company might want to know why a drug isn’t being used properly, or whether there’s a good candidate for a clinical trial it’s about to run. Where I get excited is that our product, our platform, and our infrastructure can be the same across all of those. Today, separate, very expensive, complex systems each serve one of these stakeholders in a very different way. We can start to collapse all of that into a single platform that enables not just more efficiency across the board but also better outcomes for everyone. All of us experience healthcare in probably very painful ways, and knowing there’s a world in which we can simplify a lot of it is really exciting to me. It all starts with the conversation.

Chai [00:43:15]: It’s interesting. I think of it as very similar to the KPIs that any AI product cares about. How do you increase quality of care? How do you reduce latency to care? And how do you reduce costs? Which is huge in healthcare

Jacob [00:43:28]: They call it the Triple Aim in healthcare.

Chai [00:43:30]: It’s very similar to building AI products. The thing that really excites me is the latency piece. We talked about one example earlier, prior authorization: can you reduce the latency to care? But you can imagine so much more. As soon as a lab value gets updated, do you have a background agent that kicks off and uses all the context to say, “Hey, the patient should do this next”? It would flag that to the clinician, who’s always in the loop, and reduce that latency to care. And, much further down the road, you can imagine connecting that directly to the patient and the consumer.
How can you build a bridge to all of these things?

EHR Partnerships and the Clinical Intelligence Layer

Jacob [00:44:10]: Very cool. The connections piece is an ever-growing thing, and one of the key partners is the EHR. I wonder what that relationship is like. Will they look at this as something valuable enough that they want to own it someday?

Janie [00:44:29]: With the EHRs, we know we have to be extremely close partners with every EHR we work with. Being able to pull and push all of the data into the right places is table stakes; if we can’t do that, health systems don’t want to use us. Second, the reality today is that clinicians spend a lot of their day in the EHR. So much of what allowed us to win at the largest health systems was very direct, very close partnerships with some of the largest electronic health record vendors. Those partnerships let us pull and push data through APIs that weren’t available out of the box. And clinicians want to save clicks. Any time we introduce a new product that adds two clicks to their day, they say, “We’re not going to use it.”

Janie [00:45:21]: They have back-to-back 15-minute appointments with their patients. They’re spending hours during “pajama time” doing documentation. Every second and every minute counts, so we also see deep integration into the EHR as table stakes for real usage and adoption. For anything we build or introduce, we talk a lot internally about “earning the right”: we have to provide so much value or save so much time that people will use us. Those are the two things closest to us: we know the product won’t be used unless it is deeply interoperable.

Chai [00:46:01]: And strategically, to your point: what does the EHR want to own versus us?
EHRs are really focused on clinical workflows and so forth, but some of the things we’re talking about here are traditionally outside their domain. Examples are connecting payers and providers around payer policies, or clinical trial matching, as Janie brought up. We position ourselves as building an entirely new clinical intelligence layer across providers, pharma, and payers.

Chai [00:46:33]: So it’s a whole different ballgame that we try to play

Chai [00:46:36]: in combination with them.

Jacob [00:46:37]: It’s a different layer of scope.

Healthcare AI Regulation, Technical Depth, and What Changed Their Minds

Jacob [00:46:39]: I’m curious: you’re both relatively new to healthcare. There are lots of futuristic healthcare AI takes, like “Oh, everything will look different.” Now that you’ve been in healthcare for a bit, and you live at the edge of AI, what have you changed your mind on as you think about what healthcare looks like in ten or 20 years? Any updates to your mental model from being close to the problems?

Chai [00:47:02]: One thing that I

Chai [00:47:04]: was hesitant about before is that healthcare is a heavily regulated space, and engineers ask me about it a lot when I’m trying to recruit them. And it is, rightfully so; at the end of the day, you want to keep patients safe. But one thing that surprised me coming to the company is how many really favorable regulatory tailwinds there are. The government really wants interoperability between all these systems we talked about, so that agents can access this information.
Just in January, the FDA released updated guidance on clinical decision support, which is what I work on. The previous guidance, from around 2022, required you to mention all these options and do all these other things, but the new guidance is very forward-looking. So for me, what’s been really cool is that this is a very special moment in AI in general, as we all know, but also a special regulatory moment in healthcare.

Janie [00:48:05]: One thing I’d call out is that healthcare is higher-stakes and considered more difficult, and for those very reasons, it’s where some of the hardest AI problems will get solved first, just because the bar is so high. When I first joined, I thought, “Oh, we’ll be on the tail end of where all the AI innovation gets applied.” But when you think about zero-error evals, or multi-step workflows with really low tolerance, a lot of the innovation will happen here simply because it has to, or else we can’t ship.

Jacob [00:48:42]: Because in other domains, you’d much rather solve the 80%-is-good-enough problems first.

Janie [00:48:46]: 80/20 doesn’t work here.

Chai [00:48:48]: And building off that, traditionally there was a bit of a stigma that healthcare companies aren’t that interesting from a technical perspective; I’ve seen that and faced it myself. But these are really hard and fun problems from a purely technical perspective, beyond just the impact. How do you bring the latency of this thing down and make it really high-quality?

Reducing Latency: Clinical Workflows, Agents, and Implementation Reality

Jacob [00:49:07]: How do you bring the latency of things down?

Chai [00:49:10]: Yeah. So okay, let’s answer the latency question.
Hopefully this isn’t too redundant with some of the things I said earlier, but with any latency problem, you have to ask what your real bottleneck is. In a lot of workflows, sometimes it’s the model itself. That’s where our data flywheel, our post-training team, and so forth come in: can you make the models far more efficient? That’s one aspect of latency. But there are whole other aspects. On top of that, if you use a constellation of different models, it’s like thinking fast and slow: can you first use a cheap, fast model that triages and hands off to a larger model where you get more intelligence, and so forth? All these

Chai [00:49:56]: clever tricks to make it work.

Chai [00:49:58]: And by the way, we also realize that the Pareto frontier is changing, so these tricks may not be what gets us where we want to be in five years, but we need them if we want to build a useful product right now.

Jacob [00:50:11]: Should we go to the quick-fire, or do you want to ask more about Abridge? We can stuff everything that’s not Abridge into the quick-fire.

Swyx [00:50:16]: I don’t mind. I feel like Janie was on the topic of the long tail, which is

Swyx [00:50:21]: not the 80/20 thing, and that really matters. If you have any tips, cool stories, or general approaches that have worked for you, that’s interesting to dig into.

Janie [00:50:32]: One of them is that even how we staff our teams looks different from a traditional software engineering team, I’d say.

Swyx [00:50:40]: Let’s go.

Clinician Scientists, Edge Cases, and Evals at Scale

Janie [00:50:41]: We have a bunch of folks in different roles who are clinicians. We have a role called the clinician scientist, and I recently heard one of our leaders refer to them as mutants.
They are people with clinical backgrounds, typically MDs, who are also deeply technical. They fall somewhere on the spectrum from full-stack engineer to extremely scrappy prompter. Having these people embedded within our teams instantly raises the bar for everything we build. Not only do they determine whether a product is clinically useful, they’re also deeply embedded in our whole evals process. When we talk about LFDs, when we talk about what our actual evaluation criteria are, you don’t want Chai or me defining those, because we don’t have a clinical background. That’s probably unique to Abridge, but it has been game-changing. And when you think about where the puck is going, and where AI tools are going, technical people with clinical backgrounds just become

Janie [00:51:53]: more and more critical, like the killers of the team. So that’s one. The second is the scale at which we do evals to catch that long tail up front, before anything ever gets into production. That’s something we’ve really started to fine-tune, both in scale and in knowing when we need several hundred versus several thousand offline responses. The question is what helps us make that decision quickly, and make it less of an art and as much of a science as possible. That’s also something we’ve had to tune over time.

Swyx [00:52:27]: And you have partners who opted in to give you those evals?

Janie [00:52:31]: For offline evals, we work either internally or with third parties. Then we have customers who also agree to give us a lot of data, whether it’s thumbs up or thumbs down or choosing this or that, to get us as close to fully confident as possible.

Swyx [00:52:51]: The term that comes to mind is

Swyx [00:52:53]: active learning on the things where you’re weak.
I feel like it’s a lost art

Swyx [00:52:58]: and it’s a lot of the polish that goes into doing something like this.

Janie [00:53:02]: Really.

Chai [00:53:03]: A hundred percent.

Lessons from Glean: Technical Foundations and AI App Infrastructure

Jacob [00:53:04]: On a totally unrelated note, Chai, you had a very storied run at Glean before heading over to Abridge, and it was one of the early AI app success stories. Reflecting back on that experience, what do you think Glean got most right, and maybe most wrong? Curious for your reflections.

Chai [00:53:24]: I really attribute Glean’s success to very strong technical foundations that have stood the test of time. It started with a known problem: finding information at work is hard. The best technology at the time was to build really high-quality search. A lot of enterprise search startups had failed because the quality wasn’t good enough, but the lesson people took away from that was, “Oh, enterprise search isn’t good enough.” Quality really changes the game in terms of whether something can be useful or not. Similarly, people may have concluded, “Oh, Alexa voice assistants are not that useful.” But when you have quality, things can change the game. Glean brought in people who had built search at Google, the best place ever to have built search. It was really creative, and it had a very concrete problem to solve and people with the right technical backgrounds. Those early foundations laid the groundwork for its success for many years to come. What’s interesting is always figuring out how a company adapts in this changing landscape, as we all know and have talked about many times. For Glean, how do you put this context layer to use? That’s been the fun challenge for us over the last few years.
You could say that’s been the opportunity for the company as well as the challenge.

Jacob [00:54:46]: It’s definitely a competitive market. It feels like one at the epicenter of the foundation models and the hyperscalers, so it’ll be interesting to see how it all plays out.

Chai [00:54:55]: When you think about it, building something that helps everyone in knowledge work is a massive opportunity.

Jacob [00:55:02]: My mental model has always been that there are a few markets the foundation model companies have to win, or that are big enough for them to go after, and it’s probably consumer, code, and that.

Jacob [00:55:11]: So it’ll definitely be interesting to see how it plays out. One thing we often think about on the investing side is that models progress so fast, and so building patterns adjust so fast. It’s always hard to figure out which pieces of the way people are building today, the infrastructure tools they use, are going to prove persistent, versus, okay, six months later we’re doing something completely different because

Jacob [00:55:31]: models have improved. Of the stuff you use today, which pieces of AI infrastructure software feel a little more persistent?

Chai [00:55:40]: Generally, take the thesis that models are going to be more and more agentic. Before, we had to build a lot of scaffolding around them. At previous gigs, we effectively made our own DSL, because the models weren’t capable enough and you needed to simplify things; you can view it as similar to other agent frameworks. But models are becoming more and more agentic and can use the tools we already have, like computer use or writing code in a sandbox. Over time, it becomes far more about what the right context layers and tools are to give agents.
The other thing I think about is how you build truly event-driven, real-time systems, especially at Abridge, where, again, you’re doing something in real time during the conversation. So there’s a lot of event-driven technology. And by the way, the stuff we’ve always used in the past, whether it’s Kafka, Temporal, sockets, and so forth, is also durable, and so is the question of how you bring it together. Or think about the patterns humans used to collaborate with each other on Google Docs: how do you think about CRDTs and so forth when you have conflicts in multi-agent systems? All the things we’ve built for humans are the things that will continue to be durable.

Jacob [00:56:55]: Just with something like 1,000 times the scale, with agents running at them instead.

Jacob [00:56:58]: They’re going to really work.

Chai [00:56:58]: So you make sure they scale, of course, and are fast and whatnot. Without a doubt, yes.

How Agentic Does Abridge Become?

Swyx [00:57:03]: Does Abridge become more agentic over time? What does the next, more agentic version look like?

Swyx [00:57:10]: Because you’re already pretty proactive, with the notifications.

Chai [00:57:15]: I view that as one piece of being agentic, but I also view it as some of the things we mentioned before: reacting to labs, doing work in the background, or providing

Chai [00:57:25]: even more capabilities on behalf of the clinician, who we believe has a super important role to play in terms of patient connection and so forth.

What They Changed Their Minds On: PRDs, Prototypes, and Judgment

Jacob [00:57:34]: I’m curious for both of you: what’s one thing you’ve changed your mind on in AI in the past year?

Janie [00:57:39]: The one I’ve flipped on, and this is much more product-specific, is probably the hot take that prototypes are the end-all, be-all and that PRDs are dead.

Janie [00:57:51]: We’ve tried switching, and...
We continue to evolve the way product is developed, and the products we’re building are extremely complicated and nuanced. It’s very difficult for a prototype to capture the full complexity of what we can or can’t do with this data. What and who... Is this actually the right problem to solve in a world where software has become so cheap? Yes, this is a cool-looking prototype, but should we be spending any of our precious hours here? If so, why? How does this deepen our moat in a world of shrinking moats? Does this require custom implementation from our customers to use? None of that gets captured in a prototype. So we’re continuously evolving the way we develop product here. Even if it’s not written in the same traditional ways as two years ago, as a team we’ve reached pretty high conviction that in a world of so much noise, crisp written clarity is more important than ever. It might now live in a markdown file that more teams and systems can use as context, but that’s probably one that is much more

Swyx [00:59:06]: So you’re

Janie [00:59:06]: function-specific to me.

Jacob [00:59:08]: I love that.

Swyx [00:59:09]: You’re disagreeing with the consensus

Janie [00:59:10]: that PRDs are dead.

Swyx [00:59:11]: That’s great, yeah.

Swyx [00:59:12]: So you are like

Janie [00:59:14]: that prototypes are the thing.

Janie [00:59:14]: We should partner with AI to create great documentation, but first, and probably most important, is strategically answering: Why is this problem the one our company and our product should solve? What happens if the next 20 competitors build this? What is our right to win, and does this help us differentiate in any way, or are we just adding noise? It’s important.

Swyx [00:59:39]: That’s a high bar.
I don’t know if I could answer that,

Swyx [00:59:41]: because a lot of the time the answer is “Let’s do it first.”

Janie [00:59:44]: And doing it first is so expensive, as we just talked through with the process of getting something out to customers. As a business, you need a higher bar for whether to invest here. As all of our roles evolve, a big part of product, and of all our jobs, becomes: should we do this thing? That’s worth spending time on up front. Then, as you think about prototypes, it’s still really valuable to quickly show, “Here are 20 ways we could do it. Clinician, I’d love your feedback: which one resonates more?” As you go deeper, you can also make the prototypes higher-fidelity and get them as close to production-ready as possible. But beyond that, getting it out to customers involves a lot of implementation details, security and compliance requirements, and edge cases. None of that gets caught in a prototype, and it needs to be written out somewhere. So they look different, but they’re still more important than ever.

Jacob [01:00:52]: It’s interesting. I imagine a lot of that is also given the context of the stage Abridge is at.

Jacob [01:00:58]: For so many early-stage companies, it’s just a desperate race. You throw 30 things at the wall, going, “Please, something just resonate with my end buyer,” and you find something, and that’s why the prototype-first approach is so powerful. But for you all, anything you do is across 200 systems. There’s a whole implementation and change management side of things, and you get a few big bullets to fire at what you want those systems to do.
So you have to be really thoughtful about that.

Chai [01:01:25]: It makes a ton of sense, and maybe the prototype-first takes will all grow into your view of the world once they’re a bit more scaled.

Janie [01:01:32]: The weekend demo versus “it works at the largest health systems” is a massive gap. I don’t think that means we can’t go fast. This is the fastest I’ve built in my career, right now, and the

Chai [01:01:47]: Compared to Loom?

Janie [01:01:48]: In terms of the complexity and scale of the products we’re trying to build and the problems we’re trying to solve, I’d say yes. Maybe at Loom I updated a flow or shipped a new feature pretty quickly. Here, some of the products we’re building collapse prior authorization into one step, when it used to take 45 days across maybe 20 different touch points. I’m building faster than I ever have, and the thoughtfulness lets us go fast at the right things. It sounds contradictory, but that

Chai [01:02:28]: No.

Janie [01:02:28]: thought up front...

Chai [01:02:28]: Go slow to go fast.

Janie [01:02:29]: Exactly.

Chai [01:02:30]: It’s interesting. When a lot of things are changing, in the AI discourse we sometimes lose sight of the things that have always stood the test of time. Judgment and clarity always matter. As an engineer, sometimes I don’t want a prototype. I want the clarity that comes from writing, and then we build that. And again, for some things, of course, where it’s a small thing, yeah, just ship the prototype and don’t sweat the details.
The interesting thing, the nuance that sometimes gets lost in the discussion, is that we do need to recalibrate our judgment because the costs and gains have changed. But that doesn’t mean we go all the way to one end of the spectrum or the other.

AI Tools, Claude Code, and Closing Notes

Jacob [01:03:11]: Outside of your specific tool, I always like to ask this question: any other AI tools you’re enjoying?

Chai [01:03:16]: Claude Code. But that feels like too basic an answer.

Chai [01:03:20]: Is all of Abridge engineering built on Claude Code?

Chai [01:03:23]: Yes.

Chai [01:03:23]: Wow.

Chai [01:03:23]: Very much so. I won’t

Chai [01:03:26]: We also have Cursor as well.

Chai [01:03:28]: Many of the

Chai [01:03:29]: I’m just checking the boxes here.

Chai [01:03:30]: Many of the tools are available, but look at an engineer’s screen, even just earlier today, and you see six different Claudes running at it. Sometimes it’s the same person; I’ve now seen them on the sofa, using remote control from their phone. But yes, very much so. One of the interesting things for me, as someone relatively new to the company, is that Claude Code, or any of these AI coding tools, helps me onboard much faster, and I feel like I learn so much. I do love the memes of “Claude’s going to do this.” So, “I’d like to see Claude...”

Jacob [01:04:00]: The venture equivalent is “I’d like to see Claude go do a company at a billion dollars pre-revenue.” Like

Where to Learn More: Whitepapers, Research, and AbridgeHQ

Jacob [01:04:06]: We always like to leave the last word in these conversations to you both. Anywhere you want to point folks to learn more about Abridge, the work you’re doing, or any of the research you’ve done, the floor is yours.

Chai [01:04:18]: A couple of places. If you...
On our Abridge website, we have a lot of our whitepapers, where we’ve done a lot of interesting work, such as on hallucination detection.

Chai [01:04:27]: Very well presented, by the way. I liked it. Yeah.

Chai [01:04:29]: Thank you. Our science team rigorously defined what the problem is. One of the interesting things at Abridge, by the way, is that we have multiple statistics professors on staff. In that specific whitepaper, that includes Michael Oberst, who’s a professor at JHU. That brings very high rigor, and our taste for design shows in really good presentation. Setting that aside, we’re going to have many more technical topics there, so please follow our Twitter account as well, AbridgeHQ. The other thing I’ll plug a little is that we have an open house diving deep into AI and healthcare coming up with Andreessen Horowitz.

Chai [01:05:07]: Amazing. Well, thanks so much.

Janie [01:05:09]: Thanks.

Chai [01:05:09]: This was super fun.

Chai [01:05:10]: Thanks so much.

Chai [01:05:10]: Thank you.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Some people are going crazy over GPT-5.5. Some people. This is the story of the Jagged Frontier. People who use AI to write emails, or even for code implementation work, find the lift moderate. People pushing the limits of the model, by contrast, are finding that the limits just moved outward.

Alex Lupsasca has been tracking this limit for a year and a half now. “When GPT5 came out, it was able to reproduce one of my best papers (that took a very long time to come up with) in 30 minutes.”

But Alex also notes that this shift was mostly invisible:

I remember when GPT-5 came out… on Twitter, the reception was lukewarm. A lot of people were like, well, we expected a lot more, and it’s not better at writing email. And I remember thinking, well, okay, GPT-3 could write email. How much better can it get at writing email? That’s not the point. But at the science frontier, the capabilities were really taking off.

We walk through his paper and more with him in today’s Science pod! Watch here.

The “Oscar for physics”

Alex made an early splash in his career with breakthroughs in our understanding of black holes. He’s also known for Black Hole Explorer and an iPhone app that makes visualizing black holes fun and interactive for general audiences. Alex won the 2024 New Horizons in Fundamental Physics Breakthrough Prize. Known as the “Oscar for physics,” it is arguably the most prestigious prize an early-career theoretical physicist can win.

Alex first saw promise for AI in theoretical physics after he asked o3 for help with his research. In the podcast, Alex recalls asking GPT for help with a calculation that would have taken days, and getting a result in eleven minutes. He immediately recognized how impactful AI would be for his work, even though his physicist colleagues and the larger community gave it a lukewarm or skeptical reception.

The Move 37 Moment for AI x Physics

GPT-5 had just been released, and Alex tried asking it to solve a problem from a just-published paper. GPT-5 came back with no answer.
But Mark Chen, CRO of OpenAI, pushed a bit harder and had Alex prime the model with a textbook warmup problem, which it easily solved. After this “priming” trick, GPT-5 was able to reproduce his full result in eleven minutes (yes, the paper was released after the model’s training cutoff).
“This changes everything.” Alex notes that we seem to be on the edge of a massive change in theoretical physics reasoning. A year prior, LLMs were just starting to do correct math. Now ChatGPT could reproduce his hardest paper in the time it takes to get a coffee.
Alex was on sabbatical from Vanderbilt, and he joined OpenAI to start pushing the boundary of AI’s ability to accelerate physics.
“AI solved the problem before the plane landed”
Alex began to put GPT through its paces, reaching out to colleagues for problems they were stuck on. His old PhD advisor (Prof. Andrew Strominger at Harvard) had an insight about certain physical quantities known as “single-minus gluon tree amplitudes”: in certain cases, these amplitudes may be nonzero, despite having previously been shown to always vanish. The team pushed this intuition forward and came up with a formula for these quantities that appeared nonzero but was otherwise completely intractable. Despite more than a year spent on the problem, they made no real progress.
Prof. Strominger planned to visit OpenAI to work on the problem the week after the initial conversation started. In that one week, ChatGPT fully solved the problem; as Alex recalled, it did so before Prof. Strominger’s plane even landed.
What was interesting is not only that ChatGPT solved this problem, but how it solved it. The model quickly found a limiting case (known as the “half-collinear regime”) that in hindsight has a nice intuitive explanation. Taking this limit, the gnarly results collapsed down to a simple and intuitive formula!
The last step was to prove this intuitive formula.
The team started a fresh session, gave a prompt with the context of what they had previously learned, and let the model loose. Not only was ChatGPT able to reproduce the previous result, it was able to prove it using a technique unknown to the authors!
The Vibe Physics moment
With a concrete success in the bag, the team asked whether they could generate new physics from scratch using ChatGPT. They took on what they felt to be a harder problem, looking at the graviton, a proposed particle that should appear when one combines gravity and quantum mechanics. They wrote up a simple prompt asking ChatGPT to perform the same research as the gluon paper, but for gravitons instead. And then hit go!
What came next was truly “vibe physics”, with ChatGPT pushing out 110 pages of novel physics, new calculations, and novel techniques. This was over the course of a day, with most interactions following the now-familiar pattern for anyone who uses a coding agent:
GPT: Here's your . Would you like me to do ?
Alex: Yes, please do!
GPT: 
And for those who look deeply, this really was not just a direct 1-1 mapping between gluons and gravitons. ChatGPT imported new techniques that were necessary due to the nature of gravitons, and used them flawlessly.
They spent the next three weeks verifying all the results. And voila! A new paper featuring novel results in quantum gravity, generated in less than three days total. Truly a “Feel the AGI” moment.
For those interested, there’s a blog post with the full transcript from initial prompt to final paper. Even if you know no physics, it’s crazy seeing pages of correct calculations fall out of simple prompts such as “Yes calculate outside of SD first. This is the first step.”
Out-of-domain = new knowledge
The thing that is qualitatively different between Vibe Physics and Vibe Coding is that Vibe Physics means actually extending the frontier of human knowledge.
In retrospect, the gluon and graviton results, like many results in physics and math, look like natural extensions of what we already know. This is in fact part of what makes them beautiful. But this was a problem that stumped experts in the domain for a year. Although it still has a bit of a recombinant flavor, this had never been done before.
It may be that there are still large classes of problems that AI won’t do well on, and approaches that an AI might not think to take. This is the “taste” that everyone has been talking about. Alex told us, however, that these capabilities allow him to explore many possible avenues and map out much more ambitious problems to tackle. With AI able to output results basically as fast as we can conceive and validate them, the scope of what one theorist can hope to achieve has just gotten a lot, lot bigger.
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Having built Applied Intuition from YC-era autonomy tooling into a $15B physical AI company, Qasar Younis and Peter Ludwig have spent the last decade living through the full arc of autonomy: from simulation and data infrastructure for robotaxi companies, to operating systems for safety-critical machines, to deploying AI onto cars, trucks, mining equipment, construction vehicles, agriculture, defense systems, and driverless L4 trucks running in Japan today. They join us to explain why “physical AI” is not just LLMs on wheels, why the real bottleneck is no longer model intelligence but deployment onto constrained hardware, and why the future of autonomy may look less like one-off demos and more like Android for every moving machine.
We discuss:
* Applied Intuition’s mission: building physical AI for a safer, more prosperous world, powering cars, trucks, construction and mining equipment, agriculture, defense, and other moving machines
* Why physical AI is different from screen-based AI: learned systems can make mistakes in chat or coding, but safety-critical machines like driverless trucks, autonomous vehicles, and robots need much higher reliability
* The evolution from autonomy tooling to a broad physical AI platform: starting with simulation and data infrastructure for robotaxi companies, then expanding into 30+ products across simulation, operating systems, autonomy, and AI models
* Why tooling companies came back into fashion: Qasar on why developer tooling looked unfashionable in 2016, why Applied Intuition still bet on it, and how the AI boom made workflows and tools central again
* The three core buckets of Applied Intuition’s technology: simulation and RL infrastructure, true operating systems for vehicles and machines, and fundamental AI models for autonomy and world understanding
* Why vehicles need a real AI operating system: real-time control, sensor streaming, latency, memory management, fail-safes, reliable updates, and why “bricking a car” is much worse than bricking an iPad
* Physical machines as “phones before Android and iOS”: Peter explains why today’s vehicle and machine software stack is fragmented across many operating systems, and why Applied Intuition wants to consolidate the platform layer
* Coding agents inside Applied Intuition: Cursor, Claude Code, internal adoption leaderboards, and how AI tools are changing engineering workflows even in embedded systems and safety-critical software
* Verification and validation for physical AI: why evals get harder as models improve, how end-to-end autonomy changes simulation requirements, and why neural simulation has to be fast and cheap enough to make RL practical
* From deterministic tests to statistical safety: why autonomy validation is shifting from binary pass/fail requirements toward “how many nines” of reliability and mean time between failures
* Cruise, Waymo, and public trust: Qasar and Peter discuss why autonomy failures are not just technical issues, how companies interact with regulators, and why Waymo is setting a high bar for the industry
* Simulation vs. reality: why no simulator perfectly represents the real world, how sim-to-real validation works, and why real-world testing will never disappear
* World models for physical AI: hydroplaning, construction equipment, visual cues, cause-and-effect learning, and where world models help versus where they are not enough
* Onboard vs. offboard AI: why data-center models can be huge and slow, but onboard vehicle models need millisecond-level latency, low power, small size, and distillation-like efficiency
* Why physical AI is not constrained by model intelligence alone: the hard part is deploying models onto real hardware, under safety, latency, power, cost, and reliability constraints
* Legacy autonomy vs. intelligent autonomy: RTK GPS in mining and agriculture, why hand-coded path-following worked for decades, and why modern systems need perception and dynamic intelligence
* Planning for physical systems: how “plan mode” applies to robotaxis, mining, defense, and multi-step physical tasks where actions change the state of the world
* Why robotics demos are not production: the brittle last 1%, humanoid reliability, DARPA Grand Challenge-style prize policy, and the advanced engineering gap between research and deployment
* Applied Intuition’s hard-earned lessons: after nearly a decade, Peter says they can look at a robotics demo and predict the next 20 problems the company will hit
* Qasar’s advice to founders: constrain the commercial problem, avoid copying mature-company strategies too early, and remember that compounding technology only matters if you survive long enough to see it compound
* Why 2014 YC advice may not apply in 2026: capital markets, AI company dynamics, and the difference between building in stealth with a deep network versus building as a new founder today
* What Applied is hiring for: operating systems, autonomy, dev tooling, model performance, evals, safety-critical systems, hardware/software boundaries, and engineers with deep curiosity about how things work
Applied Intuition:
* YouTube: https://www.youtube.com/@AppliedIntuitionInc
* X: https://x.com/AppliedInt
* LinkedIn: https://www.linkedin.com/company/applied-intuition-inc
Qasar Younis:
* X: https://x.com/qasar
* LinkedIn: https://www.linkedin.com/in/qasar/
Peter Ludwig:
* LinkedIn: https://www.linkedin.com/in/peterwludwig/
Timestamps
00:00:00 Introduction: Applied Intuition, Physical AI, and 10 Years of Building
00:01:37 Physical AI vs. Screen AI: Why Safety-Critical Changes Everything
00:02:51 The Origin Story: Tooling, YC, and the Scale AI Comparison
00:05:41 The Three Buckets: Simulation, Operating Systems, and Autonomy Models
00:11:10 Hardware, Sensors, and the LiDAR Question
00:14:26 The Operating System Layer: Why Vehicles Are Like Pre-Android Phones
00:19:13 Customers, Licensing, and the Better-Together Stack
00:21:19 AI Coding Adoption: Cursor, Claude Code, and the Bimodal Engineer
00:26:41 Verifiable Rewards, Evals, and Neural Simulation
00:31:04 Statistical Validation, Regulators, and the Cruise Lesson
00:40:25 World Models, Hydroplaning, and Cause-Effect Learning
00:43:34 Onboard vs. Offboard: Latency, Embedded ML, and Distillation
00:50:57 Plan Mode for Physical Systems and Next-Token Prediction Universally
00:53:04 Productionization: The 20 Problems Every Robotics Demo Will Hit
00:58:00 Founder Advice: Constraints, Compounding Tech, and Mature-Company Mimicry
01:05:41 Hiring Philosophy: Hardware/Software Boundary and Engineering Mindset
01:08:50 General Motors Institute, Education, and the Curiosity Mindset
Transcript
Introduction: Applied Intuition, Physical AI, and 10 Years of Building
Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs, and I’m joined by Swyx, editor of Latent Space.
Swyx [00:00:10]: And today we’re very honored to have the founders of Applied Intuition, Qasar and Peter. Welcome.
Qasar [00:00:17]: You guys really know how to turn it on to podcast mode. That was, you guys are real pros at this.
Qasar [00:00:23]: They were just joking around right before this, and then they flipped it pretty quick.
Alessio [00:00:29]: Oh, yeah, it’s good to have you guys. Maybe you just wanna introduce yourselves so people know the voice on the mic and they’ll know who they’re hearing.
Peter [00:00:33]: Oh, sure. Yeah, I’m Peter Ludwig. I’m the co-founder and CTO of Applied Intuition.
Qasar [00:00:38]: And my name is Qasar Younis.
I am the CEO and co-founder with Peter.
Alessio [00:00:42]: Nice. Can you guys give the high-level overview of what Applied Intuition is? And I was reading through some of the Congress files, when you went out there, Peter, and eighteen of the top twenty global non-Chinese automakers use you guys, you have customers in agriculture, defense, construction. I think most people have heard of Applied Intuition tied to YC when it was first started, and then you were kinda in stealth for a long time, so maybe just give people the high-level overview of what it is today, and then we’ll dive into the different pieces.
Peter [00:01:10]: Yeah. So at Applied Intuition, our mission is to build physical AI for a safer, more prosperous world. And so we work on physical AI for all different types of moving systems, everything from cars to trucks to construction and mining equipment, to defense technologies. And we’re a true technology company, so we build and sell the technology, and we sell it to the companies that make the machines. We sell it to the government, really anyone that wants to buy a technology to make machines smart.
Physical AI vs. Screen AI: Why Safety-Critical Changes Everything
Qasar [00:01:38]: Yeah. And I think in the broader AI landscape, a lot of the focus, rightfully so, in the last three years has been on large language models, and so everything fits in a screen. Like, whether it’s code complete products or things like that. And what’s different about us is we’re deploying intelligence onto a lot of things that don’t have screens. They’re physical machines. There are sometimes screens within the cabin, for example, of a car or a truck or something like that, but most of the value we provide is putting intelligence into safety critical environments.
So those two words are really important, because learned systems can make mistakes if you’re asking for, like, some, so something like, “Tell me about these podcast hosts
Qasar [00:02:28]: that I’m about to go meet.” But you can’t do that obviously when you run, like, as an example, we run driverless trucks in Japan right now, as we speak. We can’t have errors. Those are L4 trucks. Yeah.
Alessio [00:02:40]: Yeah. Was that always the mission? I remember initially, I think people put you and Scale AI very similarly for some things about being kinda like on the data infrastructure side of things. What was the evolution of the company?
The Origin Story: Tooling, YC, and the Scale AI Comparison
Peter [00:02:51]: Well, from the very beginning, we always wanted to really be a technology company that helped generally push forward the industrial sector. And so we started off working in autonomy. Our very first customers were robotaxi companies. And we started off doing a lot of work in simulation and data infrastructure. And then over the years, we’ve expanded our portfolio. Now we have over thirty products, and it’s a pretty broad technology play within the landscape of physical AI.
Qasar [00:03:19]: Yeah, I think the Scale reason is because we’re all YC Universe companies. But it was a very different company. Scale was, is more of a services company, data labeling company fundamentally. We started and still are, do a lot of tooling. So like, you think developer tooling is now in vogue again, thanks to the AI boom. But honestly, ten years ago, it was out of vogue. It w Like, doing a tooling company in 2016, 2017 was not, like, the thing to do because, I don’t know if you remember, the VCs generally, their view was that tooling is, they’re just workflows, and workflows ultimately are not really interesting. And we’ve come full circle with that. But when we started the company, our kind of, it’s kinda like in the periphery of what the company wants to be.
It was like, from our earliest days, like, we wanna deploy software on physical machines, like on cars and on trucks and things like that. And obviously, we didn’t know that the transformer boom was gonna happen. We didn’t know that autonomy systems would become end-to-end. Those things we didn’t know. And why that’s important: when autonomy systems become end-to-end, those models can now be generalized to multiple form factors. And so back nine, ten years ago, tooling was a great way, and still is a great way, to build the technology and sell technology to our end customers, a lot of whom wanna build this stuff themselves. And so we just offer a spectrum of solutions, from you can just use one part of a development suite of tools all the way to buying the full thing. The way to think about the company, or at least the way we think about the company, is, as Peter said, a technology provider. It’s kinda like what NVIDIA does or what AMD does, but we just don’t do chips.
Qasar [00:05:06]: We don’t do silicon. But we’re a technology provider fundamentally. And I think even, we used to joke when we started the company, like, we’re not the guys to build, like, Instagram. Like that was just towards, that’s not our, that’s just not us in a most fundamental way. I
Alessio [00:05:20]: You have thoughts.
Qasar [00:05:21]: Yes.
Qasar [00:05:22]: Well, it’s, it’s, I mean, I think it’s just like what. And I mean, we worked on Maps and stuff, Google Maps. Consumer products are extremely difficult for a lot of different reasons. It just, I think, doesn’t scratch the itch. I think we’re like Michigan guys who are kind of more of that traditional engineering kind of a realm, or lineage.
The Three Buckets: Simulation, Operating Systems, and Autonomy Models
Peter [00:05:41]: I gotta say, though, what was clear ten years ago was that there was so much more that was possible with software and AI in vehicles
Peter [00:05:47]: and that was generally the space that we started in ten years ago.
Peter [00:05:51]: And the precise path that we’ve taken over the years, I think we’ve been strategic, and we’ve adjusted to make sure that we’re actually building stuff that’s valuable to the market. And like, the technology has changed so much. Like, our own technology stack has completely changed, I would say, roughly every two years. And so now we’ve probably done, let’s say, four complete evolutions of our own technology stack. And I sort of see that cadence roughly keeping up.
Peter [00:06:13]: And so the way even we think about engineering is almost on this two-year horizon. We’re preparing ourselves that, hey, like, we wanna invest the appropriate amount, but then also be very dynamic as the research gets published and as our research team figures out new advancements, and adapt to that.
Qasar [00:06:27]: Yeah. One thing that has been consistent is the type of people we’ve, we’ve recruited. It’s engineers who fall into the sometimes very traditional, like, Google
Qasar [00:06:38]: -gen suite, but way different from other companies. We are hiring folks who really know the intersection of hardware and software, who know really low-level systems. Obviously, traditional ML researchers and folks who’ve actually put ML systems into production. That’s been pretty consistent. I think that, like, you look at the mix of our engineering, eighty-three percent of the company is engineering, so it’s, like, a giant list.
Qasar [00:07:05]: A lot of engineers.
Alessio [00:07:06]: Which, by the way, a thousand engineers
Qasar [00:07:07]: Yeah.
A thousand engineers.
Alessio [00:07:08]: that’s on your website, so I imagine it’s up to date.
Qasar [00:07:11]: It is, it is up to date, yes. Yes.
Alessio [00:07:12]: Okay. And then forty-plus founders.
Qasar [00:07:15]: Yeah. We would tend to also, this was more luck than strategy. But we’ve recruited a lot of ex-founders. It’s been a great place for founders, YC and non, ‘cause obviously I know a lot of the YC folks. It’s kind of like we recruit a lot of Google people.
Qasar [00:07:33]: For them to exercise both their technical and non-technical skills because we’re, we’re, we’re on the applied side. We have a research team that does fundamental research, we publish, and we’ve had great traction there. But fundamentally, the business wants to take this intelligence and deploy it into production, and there’s, like, a certain type of person that’s more interested in that.
Alessio [00:07:54]: Yeah. You mentioned the tech stack, Peter, so I just wanted to give you some rein to just go into it. I’m interested in where Applied Intuition starts and ends, in some sense. What won’t you do? What do you do that’s common among all the verticals that you cover?
Peter [00:08:10]: There’s a few buckets of work that we do, and we’ve been at this for almost ten years now, so the technology’s pretty broad. But we got started
Qasar [00:08:17]: Yeah, with a thousand engineers, like, you could work on lots of things.
Peter [00:08:19]: There’s lots of stuff, yeah, especially with AI tools to help.
Peter [00:08:22]: So we got our start in simulation and simulation tooling and infrastructure.
And so generally, if you’re trying to build a very complex software system that involves moving machines, you need to test that, and the best way to test it is a combination of virtual development, a simulation, and then also obviously real world testing.
Peter [00:08:39]: And then there’s a very careful process of correlating the simulation results with the real world results and ensuring that the simulator is in fact accurate. Simulation’s a very deep topic.
Peter [00:08:49]: We have a whole suite of products in that, and we could talk for many hours about that specifically. But that is one part of what we do as a company. Reinforcement learning as a subpart of that is also super critical. I think a lot of the best advancements happening in a lot of these AI systems right now in some way relate to reinforcement learning, and now we have lots of compute, and you can do tons of interesting things with reinforcement learning. The second bucket of work that we do is on operating systems technology. True operating systems. Like, think about schedulers and memory management and middleware and message passing and highly reliable networking and data links. Like, the reality is, if you want to deploy AI onto vehicles, you need a really good operating system. And when we were getting deeper into that space, there wasn’t really anything that we were happy with.
Peter [00:09:39]: Like, things existed, absolutely, and we were using what was available in the market, and as an engineering organization, we roughly realized these things aren’t great. We think we can do this better, and so let’s, let’s build something. And that was the moment of inspiration that started our operating systems business, which is now a very real business for us. And in order to write and run great AI, you need a great operating system, and so that’s what got us into that.
And then the third bucket that we work on, it’s, it’s true fundamental AI technology. Models: we do a lot of work in, as mentioned, the foundational research, but then also the world models and the actual autonomy models that are running on these physical machines, and that’s across cars, trucks, mining, construction, agriculture, and defense, and so that’s land, air, and sea.
Qasar [00:10:31]: And also, a smaller subsector of that third bucket is the interaction of humans with those machines.
Qasar [00:10:38]: So that’s a multimodal experience. Historically, if you’re moving a dirt mover or any of these machines, there are, like, buttons you press, whether they’re actual physical tactile buttons or something like a touch screen. That fundamentally is changing to where you’re just talking to the machine and you’re teaming with the machine.
Alessio [00:10:58]: Voice?
Qasar [00:10:59]: Yeah, voice, absolutely, yeah.
Alessio [00:11:00]: Oh.
Qasar [00:11:00]: And also the machine just being aware of who is in the cabin, what their state is. You can think from a safety systems perspective, the most simple version of this is, like, the driver is tired, right? They’re, they’re, if you get those alerts when you’re driving your car and it says
Hardware, Sensors, and the LiDAR Question
Qasar [00:11:15]: -maybe take a coffee break, take that up a couple of orders of magnitude. But this concept of teaming man and machine is important. When you think about running agents or just running different instances of Claude doing work for you in the background, you can take that analogy out, almost copy and paste it, and put it into, like, a farm, where you have a farmer who’s running a number of machines.
So where they interact with the machine is where there’s maybe a critical decision or a disengagement or something like that, but generally speaking, the agent on the physical machine is running and making decisions on behalf of the farmer until there’s something maybe critical. And that’s also what we work on. So that’s not pure autonomy. It’s a little bit of a mix, but it falls under autonomy. In the automotive sense, that’s typically defined in SAE levels as an L2++ system
Qasar [00:12:05]: -with a human in the loop. But just take that idea to other verticals.
Alessio [00:12:09]: Yeah. You’ve not mentioned hardware at all, like sensors, or obviously you mentioned you don’t do chips. I think even in AV there’s, like, a big cameras versus lidars. Like, what are, in your space, maybe some of those design decisions that you made, and are they driven by the OEM’s ability to put things on the machinery? And like, how much influence do you guys have on co-designing those?
Peter [00:12:32]: Yeah. So we don’t make sensors. Like, we’re, we’re not a manufacturer. Obviously, we use a lot of sensors in our autonomy products. In terms of what actually goes on the vehicles, we have a preferred set of sensors that we, let’s say, fully support, and then our customers can sort of choose from those. And obviously if there’s a very strong opinion on supporting something else, we’ll add that to the platform as well. And the lidar question is at this point sort of the age-old
Peter [00:12:59]: topic in autonomy, and the state of the industry right now is lidar is hands down a useful sensor, specifically for data collection and the R&D phase of autonomy development. If you see, for example, a Tesla R&D vehicle, it actually has lidar on it
Peter [00:13:17]: to this day, right? In the Bay Area we see these. You’ll see, like, Model Ys or Cybercabs that have lidars on them just driving around. So it’s, it’s useful because it gives you per pixel depth information.
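To make “per pixel depth information” concrete: the usual way to get it is to project each lidar return into the camera image and record its distance at the pixel it lands on. The following is a minimal, hypothetical sketch, not Applied Intuition’s pipeline. It assumes an already-calibrated pinhole camera with intrinsics `K` (zero skew) and a known 4x4 lidar-to-camera transform `T_cam_from_lidar`; the function name `lidar_to_sparse_depth` and the toy numbers are made up for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def lidar_to_sparse_depth(
    points_lidar: Sequence[Sequence[float]],
    T_cam_from_lidar: Matrix,  # hypothetical 4x4 extrinsic: lidar frame -> camera frame
    K: Matrix,                 # hypothetical 3x3 pinhole intrinsics (zero skew assumed)
    height: int,
    width: int,
) -> List[List[float]]:
    """Project lidar points into a camera image to get a sparse per-pixel depth map.

    A value of 0.0 means no lidar return landed on that pixel (no supervision there).
    """
    T = T_cam_from_lidar
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    depth = [[0.0] * width for _ in range(height)]

    for x_l, y_l, z_l in points_lidar:
        # Rigid transform into the camera frame using the top 3 rows of T: p_cam = R p + t.
        x, y, z = (
            T[r][0] * x_l + T[r][1] * y_l + T[r][2] * z_l + T[r][3] for r in range(3)
        )
        if z <= 0:
            continue  # point is behind the camera, so it cannot appear in the image

        # Pinhole projection onto the image plane, rounded to the nearest pixel.
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if not (0 <= u < width and 0 <= v < height):
            continue  # projects outside the camera's field of view

        # Several returns can hit one pixel; keep the nearest surface.
        if depth[v][u] == 0.0 or z < depth[v][u]:
            depth[v][u] = z

    return depth


# Toy usage: identity extrinsics, a 5x5 image, principal point at pixel (2, 2).
K = [[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
pts = [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0), (0.1, 0.0, 10.0), (0.0, 0.0, -3.0)]
depth = lidar_to_sparse_depth(pts, identity, K, height=5, width=5)
# depth[2][2] == 5.0 (nearest of two returns), depth[2][3] == 10.0, the point behind the camera is dropped
```

Nonzero pixels in such a map could serve as depth targets when training a camera-only model, which is the idea Peter describes of learning depth from lidar on R&D vehicles and then dropping the lidar in production. Real pipelines also handle lens distortion, time synchronization between sensors, and occlusion, all of which this sketch ignores.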
So if you can pair a lidar with a camera and you can say that, well, this camera’s looking this direction, this lidar’s looking this direction, and now for each pixel of the camera I can see how far away that pixel is, you can actually then use that as a part of your model training, and then that depth information becomes a learned state of the camera data. And then when you’re doing the production system, you can now remove the lidar
Peter [00:13:52]: and now you can actually get depth with just the camera. And so that difference between, like, a highly sensored R&D vehicle and then the down-costed production vehicle, we use that across our whole portfolio of products. And of course the end goal is you want super low cost and super reliable.
Peter [00:14:08]: And then in certain use cases you have some more bespoke things. Like in defense, as an example, you do things at night oftentimes, and so you care about sensors like infrared, more so than. And you don’t, you don’t wanna be putting energy out, so you don’t wanna use lidar or radar,
Peter [00:14:23]: but you still need to be able to see at nighttime. So yeah, we work the whole gamut.
The Operating System Layer: Why Vehicles Are Like Pre-Android Phones
Alessio [00:14:27]: Cool. So that’s kinda like on the hardware level. Then on the OS level, what does that look like? What is, like, unique? My drive, I drive a Tesla. Whenever I drive some other car that has a screen, it always sucks.
Alessio [00:14:38]: It’s on, like, a cheap Android tablet. It’s like, it’s laggy and all of that. What does the OS of, like, the autonomy future look like?
Peter [00:14:46]: When most people, it’s really what you just described. When you think about an operating system in a vehicle, you’re thinking about the HMI, right? The human machine interface, and absolutely that’s an important part of it, but that’s actually only one thin layer on top.
So when we talk about operating systems for, like, AI in vehicles, there’s many layers that go deep into the CPU critical realm and embedded systems, and you’re talking about the real time control of
Peter [00:15:13]: let’s say the electric motors or the engine and the actuators, and you have different redundancies for different, let’s say, the steering actuation in the vehicle. And all of these things need very core support in the operating system. And then of course for autonomy you have real time sensor data that’s streaming in, and the latencies there are really important, right? If you try to, imagine you try to run Microsoft Windows
Peter [00:15:35]: like streaming your sensor data in or controlling the vehicle. Like, the latencies are gonna be absurd. Like, you can never do that. And so what’s special about what we do is we really have this system level thinking, right? So we’re looking at, we care about every performance characteristic of the entire system, and then also, because we’re doing a lot of the software or all of that software, we can fine-tune and control all of those things. So we can very carefully tune in the latencies for every aspect of the system. We can carefully tune in the memory management. We can have the right fail-safes and fallbacks for different things. ‘Cause you have to account for what if, what if there is a critical failure? What if there’s a cosmic ray that flips
Peter [00:16:14]: a bit in the middle of the processor that causes some malfunction? And you have to have a fail-safe for all of that, and so the core operating system is a part of that. And then the one last thing, which is a lot less exciting but is actually a very big topic, is reliability of updates.
Peter [00:16:30]: So I have a Tesla, and you get updates fairly frequently, right?
Peter [00:16:36]: Once a month.
Most companies that are making vehicles
Peter [00:16:40]: are basically never doing updates, and they’re, and even if they are doing updates, they’re usually only updating maybe one module. Maybe they’re updating the HMI module. But they’re not able to update, let’s say, the CPU critical parts of the system.
Peter [00:16:51]: You have to go into the dealer for that. And so with our operating system now we can actually enable highly reliable updates of any system in the vehicle, and that’s way easier said than done. Like, there’s lots of technically deep stuff in the tech stack to do that in a way that you’re not going to accidentally brick a vehicle.
Peter [00:17:08]: And right? If, imagine your
Alessio [00:17:10]: That would be bad.
Alessio [00:17:11]: Bad.
Peter [00:17:11]: Bricking a car is a very expensive
Peter [00:17:13]: and honestly, like, across the industry maybe one of the most just pure impactful things that we’ve done is we’ve just, we’re, we’re now enabling the industry to actually do software updates.
Alessio [00:17:22]: Just to clarify as well, who is the customer for this? Like, I assume a lot of hardware manufacturers have their own firmware, and I’m sure some of them would just have you write it for them because you’re experts. And others would have their own. Like, who pays for this? Who invites you into the house? Is it, is it the end user, or is it, is it the manufacturer?
Peter [00:17:41]: Yeah. So let me make an analogy first on the fragmentation of software. So physical machines today are more akin to the state of the phone market before Android and iOS existed, right? So I worked on Android at Google, by the way, many years ago, and part of the reason that Larry at Google decided to get into Android was they wanted to run Google products on a bunch of phones, and they bought all of these phones from the industry, and it turned out they had like 50 different operating systems on these phones.
And it was virtually impossible
Peter [00:18:17]: for Google to make their app run on all 50 devices equally well. And so the solution was, well, actually, what if they created a really great operating system and made it attractive to all of these phone makers? That was sort of the genesis for what Android was and why Android existed. It was a way for Google to get their products onto a really wide diversity of devices. The state of the physical industry right now, it’s a little bit like that. Like, yes, these companies have firmware, but they have so many different operating systems, it’s so fragmented, and to actually get a modern AI application to run on these vehicles, you first have to consolidate the operating system, and so that’s why we’ve done that. And then, your specific question was who are our customers? Generally it’s the companies that are making these machines.
Peter [00:19:06]: And we’re selling our technology to them to really simplify the architecture and then enable these AI applications to run on them.
Customers, Licensing, and the Better-Together Stack
Swyx [00:19:13]: How much is reusable across? Like, do you have, like, one OS that is just configured for everything, or is there some more customization that is needed?
Peter [00:19:22]: Yeah, highly reusable. So the fundamental technology is quite universal, right? Things that we do have to think about, though, are, like, chipset support. And so if you’re coding, let’s say, an LLM and you start with an assumption that, “Hey, I’m gonna use CUDA, and I’m gonna run this on an NVIDIA chip,” then you don’t really have to think about the hardware in that sense. Like, you’re just, “Okay, I’m in the CUDA/NVIDIA ecosystem, and I’m going to use that.” But the hardware, especially in safety critical systems, is a lot more diverse. There’s not one or two players.
There’s a bunch of different chipsets that we have to support. And so our operating system doesn’t just run on, like, the equivalent of x86. It has to run on a number of different architectures, on chips from a bunch of different companies. But again, we’ve been working on this for a long time now, so we have support for all of those chipsets. And then when you want to run the AI applications, we can do that reliably across a variety of providers.
Qasar [00:20:19]: And I think that is, like, heavily inspired by Android, right? Android has a huge suite of testing and it’s a reliable operating system that runs on thousands of devices. And we think we can do the same in all these physical moving machines, with the difference that we’re really in a safety critical realm. Android isn’t.
Alessio [00:20:40]: So on Android, I don’t need to use Gmail, I can use Superhuman. Like, what about your machinery? Like, can people bring somebody else’s automation to it, or is it kinda like all-in-one?
Qasar [00:20:50]: You have to use us. No. Yeah. Yeah, it’s totally open. Yeah.
Peter [00:20:56]: Yeah. Our philosophy is that we are a technology company, and so we license our technology to customers to use how they want. So if they wanna license our autonomy tech and our operating system, then great, we’ll license those. If they just wanna license the operating system and then use different autonomy tech, that’s fine also, and we have great documentation and
Swyx [00:21:17]: Or if they wanna use developer tooling.
Peter [00:21:18]: Yeah, exactly.
AI Coding Adoption: Cursor, Claude Code, and the Bimodal Engineer
Swyx [00:21:19]: It’s, like, better together, obviously, if they work together.
Is it all C++, I assume, with different compile targets?
Peter [00:21:27]: We use a lot of C++.
Peter [00:21:28]: Rust is sort of the new hot kid on the block
Peter [00:21:32]: for a bunch of things as well. But yeah, the lower level you get, especially when you get to real-time constraints, you hit C++ at some point, and at some point maybe you work your way into assembly when needed.
Swyx [00:21:44]: Oh, damn.
Alessio [00:21:46]: I’m curious about the coding agent adoption, just, like, since you’re mentioning more esoteric languages. Like, what’s the adoption internally? What have you learned?
Peter [00:21:55]: Yeah. We use everything. So Cursor was, I think, the hottest tool in the company for a good while. Now Claude Code, I think, has taken the reins on that. We have an internal leaderboard that we use just to sort of encourage adoption
Peter [00:22:09]: within the company. And yeah, they’re phenomenally useful. Honestly, we take inspiration from some of those tools also in how we’re adapting some of that mindset to the physical realm. Like, if it’s so easy to build an app for this or that thing that lives just on a screen, we’re now taking a lot of the same ideas and applying them to, “Okay, well, if you wanted a physical machine to do something, how easy can we make that, using our own tooling and platform as well?”
Alessio [00:22:40]: Are you changing any of, like, the OS architecture, kinda like the way you expose services, to, like, be more AI friendly or?
Peter [00:22:48]: Yeah, absolutely. In the early days of our tools infrastructure work, it was a lot about, you had engineers that were experts in certain topics, but the things that you’re dealing with are oftentimes more mathematical or more abstract, where actually GUI tools are very useful for certain things.
Like, as an example, we have a product we call Sensor Studio, which helps you design the sensor suite for your autonomous vehicle, whether, again, it could be a car, it could be a drone, could be mining equipment, could be a robot. And you place sensors in different places. There’s a library. You can understand what trade-offs you’re making in the design of that system, and that was, like, a very GUI intensive thing ‘cause it’s a little more like a CAD tool in that sense
Swyx [00:23:37]: Yep
Peter [00:23:37]: if you’ve seen CAD tools. Nowadays, though, right, we expose all of the underlying APIs for that, and now, using AI agents, you can actually configure a sensor suite with just text and likely reach a better result than you could’ve through the GUI in the past, and we’re taking that thinking now through the whole product portfolio.
Swyx [00:23:57]: Another thing I was thinking about is just in terms of, like, AI adoption, does it change your hiring at least a little bit, or how do you sort of manage engineers differently?
Peter [00:24:08]: Yeah, absolutely, it does. I think, like every company in the Valley right now, we are evolving our hiring practices
Peter [00:24:16]: because the skills required to be effective are changing so fast, right? You used to really select for just rote implementation ability, and now it is more the AI engineer skill set, right? Where it’s like, yeah, how to implement, but actually, just banging out code is no longer the core job, right? It’s actually knowing what questions to ask, knowing how to tie together these different AI tools. And so the interviews that we give now, I think, are way harder than they’ve ever been.
Peter [00:24:46]: But we also allow, right, selective use of AI tools to solve the problems. And I think in that you start to see more of a bimodal distribution of engineers, right?
You start to see, like, wow, there’s this subset of people that really get it. Like, they’re all in and they’ve clearly invested the hours needed to learn these tools and how to be effective.
Peter [00:25:09]: And then there’s sort of the group of people that haven’t done that, and the productivity gap is just enormous. And so we’re trying to obviously select for the people that are really into this.
Swyx [00:25:20]: I first wrote my AI engineer piece three years ago, and when I first wrote about it, I was like, “Actually, not everyone should be an AI engineer,” ‘cause I think there’s an extremist stance where, well, every software engineer is an AI engineer. And my actual example of people who should not be adopting AI was embedded systems and operating systems and database people. Are they adopting AI?
Peter [00:25:41]: I think it’s the classic bitter lesson topic, which is: six months ago I would’ve said the same thing, but it’s becoming super useful for every domain.
Qasar [00:25:53]: I’m sure.
Peter [00:25:54]: Right? Like,
Peter [00:25:56]: there was, I think six months ago, or maybe a year ago, if you tried to use, let’s say, the latest Claude model for writing shaders, GPU shaders, the results were probably underwhelming. And if you use the latest model now to do that kind of task, you’re a little bit blown away, like, “Wow, that actually worked. That’s amazing.” And we see the same thing in the embedded realm. No question, though, especially when you get into safety critical systems, the human validation
Peter [00:26:25]: is 100% key. Like, you’re not gonna trust your life to AI-written software that’s not been very carefully checked by humans.
And so I think now the real challenge is about the appropriate level of human validation for these safety critical systems.
Verifiable Rewards, Evals, and Neural Simulation
Alessio [00:26:41]: How do you think about, yeah, touching on the simulation side, I think verifiable rewards and reinforcement learning are, like, the hottest thing. What have you done internally to build around that? And, like, what makes you sleep at night? Like, if somebody’s just vibe coding something or
Alessio [00:26:57]: wants to try something new, do you have, like, a good enough system? Because I think the opposite is also true: if it’s super easy to write anything,
Alessio [00:27:04]: then it puts a lot of work on, like, the verifiable
Alessio [00:27:07]: side of it. Like, what does that look like for people?
Peter [00:27:10]: Yeah. So verifiability falls into a broader bucket of, like, evaluations, right? Like, how do you evaluate the results that you’re getting? I think this is probably the hardest problem right now, because as the models get better, it can be harder and harder to find the faults in the system.
Peter [00:27:29]: And so the problem of doing proper evals to find those faults also keeps getting harder as the models get better. But it’s no less important than it’s ever been, right? There are still going to be edge cases that are not met and whatnot. And so it’s a big area of investment for us. On the reinforcement learning topic, the key thing is there are all these new requirements that come with the latest generation of these technologies. So for example, end-to-end is the big thing right now in autonomy and physical AI, which is that you can now train these models that effectively take sensor data in and put control signals out, and get really good results out of that. But the way that you train and improve those models is really different from the previous generations.
And so to do reinforcement learning on an end-to-end model, you now need to actually simulate all the sensor data, right? So then this becomes a, we call our work in this neural simulation, but it’s
Peter [00:28:26]: think of it like a hybrid of Gaussian splatting and diffusion methods, where you really care about performance. Like, performance is everything. If you can’t do enough simulation fast enough and cheap enough, you actually can’t get results that are worthwhile in the end. It also gets to a lot of our work in embedded systems, which is performance critical work, and that performance optimization, that performance criticality, carries over to a lot of the model training work, because the only way to make it affordable is that it has to be really fast.
Qasar [00:28:58]: I think it’s worth a few minutes talking about our own evolving thoughts on verification and validation within
Qasar [00:29:05]: kind of traditional simulators, which, you can think of like vehicle dynamics or something like that, where you’re just taking textbooks and taking those formulas
Qasar [00:29:13]: and putting them into software, to, like, now this neural sim/world model universe. I think that’s an interesting topic.
Peter [00:29:20]: Yeah. So in more traditional development, right, you oftentimes would have more black-and-white answers to questions.
Peter [00:29:28]: And so in Europe, as an example, there’s a regulatory system called Euro NCAP. It’s the European New Car Assessment Program, and as part of that, the vehicles have to pass a bunch of tests, and those tests actually include safety systems. So automatic emergency braking for a child that runs in front of a car
Peter [00:29:51]: or, let’s say, an occluded child that runs out and you hit it. And so you end up with sort of these binary answers of, like, well, did the car under test pass this specific test?
And there’s a very well-known set of test cases
Peter [00:30:05]: that the vehicle has to pass. And that was how the industry worked, let’s say, until 10-ish years ago. But what’s changed now is that with these models, everything is statistics, right? Like, you no longer have a black-and-white answer, but it’s like, well, how many orders of magnitude or how many nines of reliability can I get in the system, and how can I prove that to be true? And the big unlock, honestly, for physical AI as an industry is that these models are just becoming much more reliable. Right? Things actually work a lot better. The number of nines you can get out of these systems is now good enough that it actually becomes cost effective to really deploy these things. And so the big shift in verification and validation has been from a little bit more of a... Again, in the past it was strictly requirements, and are you meeting them or not? And now it’s more of a statistical verification and validation case, where it’s all about how many nines of reliability and mean time between failures, that sort of thing.
Statistical Validation, Regulators, and the Cruise Lesson
Swyx [00:31:04]: And is the target audience regulators, or even the customers? I imagine the customers are bought in, and it’s mostly regulators that need to be satisfied.
Peter [00:31:15]: We do work with the US government, we do work of course with the European governments and the government of Japan, and the government is not like an AI lab by any means.
Peter [00:31:25]: So
Swyx [00:31:26]: They just care about the outcome.
Peter [00:31:27]: They care about the outcome.
Peter [00:31:28]: And so we do education in that regard, sort of teaching, “Hey, this is how we think validation should be done, and this is an approach that we think is reasonable,” and how to think about when a driverless system is actually safe enough to go on the roads, and that sort of thing.
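The question of how you prove a given number of nines can be made concrete with a standard back-of-the-envelope calculation; this is not one the speakers give, and it rests on a simplifying assumption: failures arrive as a Poisson process with a constant rate. Under that assumption, if a system runs T hours with zero failures, the one-sided upper confidence bound on its failure rate at confidence C is -ln(1 - C) / T. At 95% confidence, -ln(0.05) is about 3 (the so-called rule of three), so demonstrating a mean time between failures of a million hours takes roughly three million failure-free hours. The function names and the example target are illustrative only.

```python
import math


def failure_free_hours_needed(target_mtbf_hours: float, confidence: float = 0.95) -> float:
    """Hours of operation with zero observed failures needed to claim, at the given
    one-sided confidence, that the true MTBF is at least target_mtbf_hours.

    Assumes a homogeneous Poisson failure process (constant failure rate). With zero
    failures in T hours, the upper bound on the rate is -ln(1 - confidence) / T, so
    T must be at least -ln(1 - confidence) * target MTBF.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if target_mtbf_hours <= 0:
        raise ValueError("target_mtbf_hours must be positive")
    return -math.log(1.0 - confidence) * target_mtbf_hours


def nines(failure_probability: float) -> float:
    """Number of 'nines' of reliability, e.g. a failure probability of 1e-6 is six nines."""
    return -math.log10(failure_probability)


# Hypothetical target: at most one failure per million operating hours, at 95% confidence.
hours = failure_free_hours_needed(1_000_000, confidence=0.95)
print(f"{hours:,.0f} failure-free hours")  # roughly 3 million: the "rule of three"
print(f"{nines(1e-6):.1f} nines")
```

Because the required exposure grows linearly with the target MTBF, each additional nine multiplies the failure-free operation you must demonstrate by ten, which is why statistical validation is so much more demanding than a fixed list of pass/fail test cases.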
But I wouldn’t say that the government is asking for it. It’s like we’re more teaching the government in that sense. Honestly, it’s more so for our own comfort, right? Like, we want to build very safe systems, and then of course our customers care deeply about that as well. But in that context we’re also typically educating our customers.
Qasar [00:32:01]: Yeah. Our first core value is around safety. So I think we can’t underline enough that us verifying and validating that the systems we’re deploying are safe is probably as important as, like, some regulator or a customer saying so.
Swyx [00:32:19]: Of course. Okay. Yeah.
Swyx [00:32:20]: You have to satisfy yourselves.
Peter [00:32:22]: Across the world as a whole, regulation is oftentimes almost a lowest common denominator. But, like, you really have to substantially exceed what the regulators are expecting to make good products.
Swyx [00:32:33]: Yeah. One thing I often talk about, and I try to make this relatable to the audience also, is Cruise, where they had an accident that basically ended the company. I wonder if people overreact to single incidents, because incidents are going to happen regardless, right? ‘Cause it’s a statistical thing. I don’t know if regulators understand that you cannot extrapolate from a single incident, but we do because that’s all we have to go on. And your sample sizes are necessarily gonna be lower than, I don’t know,
Swyx [00:33:00]: consumer driving.
Qasar [00:33:01]: Yeah. I think the Cruise example wasn’t a technology failure. The real compounding issue there was just how the company talked to the regulators and what their kind of behavior was, and I think that became more of the issue. If you look,
Peter [00:33:19]: It definitely was a technology failure, but it was made much worse by the
Swyx [00:33:23]: Put the car back on the woman.
Qasar [00:33:25]: Yeah.
And let me put it another way. There is a version where Cruise still exists.
Swyx [00:33:29]: Right. Right.
Qasar [00:33:30]: Right. It’s
Swyx [00:33:30]: It was like the last straw
Qasar [00:33:31]: It
Swyx [00:33:31]: in, like, a long chain of
Swyx [00:33:33]: like issues.
Qasar [00:33:33]: So do you feel like ATG had that horrific accident, someone actually dying, because that was a homeless person crossing the street? So yeah, I think we can’t understate enough that, ultimately, statistical validation of something is one part of it, but it’s not the only part of it. Like, consumer and, let’s say, mainstream adoption of these technologies is also gonna be part of that conversation. I think companies like Waymo are doing a lot of service to the industry in the sense that they’re setting a high benchmark and they’re showing, in a very responsible way, how to deal with these. There have been Waymo incidents as well. They’ve just not been as significant as the Cruise one that you mentioned. But yeah, I think you’ll just continue to see that. I think probably the long-term question is really gonna be, again, around... Like, it is very clear humans are way worse drivers statistically.
Qasar [00:34:29]: Like, there’s no debate. And so at what point... But we’re emotional animals.
Swyx [00:34:34]: Yeah. So my thing is, like, we have to get to a point as a society where we accept horrific accidents that would never happen with a human, because statistically we understand that it is safer overall. In the same way that planes are safer; I think they’re the safest mode of transport that we have.
Qasar [00:34:50]: Yeah.
It’s more dangerous to drive to the airport than it is to get on a flight.
Qasar [00:34:53]: So if you’re ever
Qasar [00:34:54]: if you’re ever getting nervous about getting on a plane, just think, “I just gotta get to the airport.”
Swyx [00:34:58]: Yes, we’re flying.
Qasar [00:34:59]: If I get to the airport
Qasar [00:35:00]: I’ll be good.
Swyx [00:35:00]: But then planes also concentrate the tail risk, if planes
Qasar [00:35:03]: Yeah. And
Peter [00:35:04]: And I don’t think we honestly have to worry about there ever being accidents from these systems that are, like, much worse than what humans would cause, ‘cause humans do terrible things.
Peter [00:35:14]: Like, people fall asleep at the wheel all the time.
Swyx [00:35:16]: I have.
Swyx [00:35:17]: Like, I’ve been a drowsy driver.
Peter [00:35:19]: Kinda drunk drivers, and that’s
Peter [00:35:20]: that’s the extreme end of the example. But with these AI systems, you have redundancies, you have fallbacks. Like, many things have to go wrong for there to actually be something catastrophic, because there are so many fallbacks that these systems have.
Alessio [00:35:36]: Your simulation is, like, so vast because there’s so many use cases. What are maybe things that worked in a simulation, and then you put it out and it’s like, “F**k, this
Alessio [00:35:45]: just did not work at all”?
Peter [00:35:47]: Yes.
Alessio [00:35:47]: Is
Peter [00:35:47]: That’s maybe a bit of a misconception about simulation there. So let me go a little bit more technical on this. So at first go, no simulation is going to represent the real world.
There’s always a process of sim-to-real matching
Peter [00:36:02]: where you actually need the real world feedback to basically feed into the parameters that are being used in the simulator, and you have to do that, it’s like this validation flow, a number of times until you can get some confidence that, like, I think the simulator is now accurately representing
Peter [00:36:19]: what’s gonna happen in the real world. Now, if you have a situation where you’ve done that full validation and you thought that it was accurate and then there’s something different, those are much trickier cases, and that absolutely can happen, but really I think the validation process is a really important part. You can never skip the simulation validation process, where you’re actually ensuring that, hey, my sim-to-real gap here is small enough that I can trust these simulation results. And there are so many fun things that you can do when you get into it. Like, I’ll give one fun example that came up recently in these humanoid robotics systems. Overheating actuators is a real problem, right? So, obviously, phenomenal demos. I
Peter [00:37:01]: The most amazing
Alessio [00:37:02]: For 10 minutes.
Peter [00:37:03]: The most amazing I can get. I love watching robots do acrobatics like everybody, but these systems actually overheat, right? And one of the ways you can use simulation, though, is that you can actually have the temperature of those actuators be one of the parameters that’s represented
Peter [00:37:18]: in the simulation. And if you’re doing reinforcement learning over a certain task, then the robot can actually adjust its motions in the simulation to account for the fact that, oh, it knows that as it’s moving, it’s actually beginning to overheat this motor.
But if you didn’t have that parameter of, let’s say, the heat of that motor represented in the simulation initially, then your RL policy will disregard it. And now you run that on the robot, and the robot will overheat and fail.
Alessio [00:37:43]: I guess the question is, like, how do you have all of these parameters taken care of while also understanding the deployment environment? Like, temperature is a great example, right? Well,
Alessio [00:37:53]: why did you make my robot worse when it runs in, like, a freezer?
Alessio [00:37:57]: So it actually shouldn’t worry about that. It’s like, yeah, how do you design these simulations?
Peter [00:38:02]: This is honestly what makes simulation so hard, right? Simulation is fundamentally about trying to optimize the development of a system, right? Like, how can I build this system faster and better and cheaper, and what are all the levers that I have to actually accomplish that? And because simulation’s just a software program, you can change it a lot more easily than you can hardware systems. And then what’s particularly awesome about, let’s say, world models and using them as a part of simulation is that now the simulation doesn’t just scale with, let’s say, adding new math equations in,
Peter [00:38:36]: but we can actually scale the simulation environment now with additional real world data, and that also unlocks a whole new field of robotics.
Qasar [00:38:46]: There is a meniscus line you cross where still doing real world testing is better. In this sim-to-real gap, you can reproduce reality at exceedingly expensive costs, so nothing is free. So really you’re finding that line where you’re getting great performance, you’re getting great feedback, whether it’s on the training side or on the eval side, but it’s way cheaper than doing it in the real world. At some point, that doesn’t make sense.
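Peter’s overheating-actuator example and the freezer question can be illustrated with a toy model. This is a hypothetical sketch, not the guests’ simulator: it assumes a simple first-order thermal model of actuator temperature, fits one model parameter to a temperature log by grid search (a bare-bones form of sim-to-real matching, run here on synthetic data rather than real measurements), treats ambient temperature as an explicit deployment parameter, and adds the kind of reward term an RL setup might use to penalize running hot. All names, units, and numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ThermalParams:
    heat_gain: float     # degrees C per second per unit of squared effort (hypothetical units)
    cooling_tau: float   # seconds; time constant of passive cooling toward ambient
    ambient_c: float     # ambient temperature in degrees C (a freezer vs. a hot mine site)


def simulate_temperature(p: ThermalParams, efforts: Sequence[float], dt: float = 0.1) -> List[float]:
    """First-order actuator thermal model, integrated with forward Euler.

    Heating grows with effort**2 (a stand-in for resistive I^2 R losses); cooling
    pulls the temperature back toward ambient with time constant cooling_tau.
    """
    temp = p.ambient_c
    trace = []
    for u in efforts:
        temp += dt * (p.heat_gain * u * u - (temp - p.ambient_c) / p.cooling_tau)
        trace.append(temp)
    return trace


def calibrate_heat_gain(measured: Sequence[float], efforts: Sequence[float],
                        cooling_tau: float, ambient_c: float,
                        candidates: Sequence[float], dt: float = 0.1) -> float:
    """Grid-search calibration: pick the heat_gain whose simulated trace has the
    smallest squared error against the measured temperatures."""
    def squared_error(gain: float) -> float:
        sim = simulate_temperature(ThermalParams(gain, cooling_tau, ambient_c), efforts, dt)
        return sum((s - m) ** 2 for s, m in zip(sim, measured))
    return min(candidates, key=squared_error)


def thermal_penalty(trace: Sequence[float], limit_c: float, weight: float = 1.0) -> float:
    """Reward-shaping term an RL setup might subtract: penalize degrees above the limit."""
    return weight * sum(max(0.0, t - limit_c) for t in trace)


# Synthetic stand-in for a real log (hypothetical): 30 s of hard effort on a 25 C day.
efforts = [1.0] * 300
real_log = simulate_temperature(ThermalParams(0.8, 20.0, 25.0), efforts)

gain = calibrate_heat_gain(real_log, efforts, cooling_tau=20.0, ambient_c=25.0,
                           candidates=[i / 10 for i in range(1, 21)])
print(gain)  # recovers 0.8 on this noise-free data

# The same calibrated actuator in a freezer runs cooler, so it earns a smaller penalty.
hot = simulate_temperature(ThermalParams(gain, 20.0, 25.0), efforts)
cold = simulate_temperature(ThermalParams(gain, 20.0, -18.0), efforts)
print(thermal_penalty(hot, limit_c=30.0) > thermal_penalty(cold, limit_c=30.0))  # True
```

Keeping ambient temperature as an explicit parameter means the same calibrated model can be run once per deployment environment, so a policy is only penalized for heat it would actually encounter there. Real measurements would be noisy, which is one reason Peter stresses repeating the validation loop rather than fitting once.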
And so even from our earliest days in autonomy, our view was you’re still gonna do real world testing. There’s not this magical land where you’re not gonna do that. And maybe a more nuanced version of this, in, like, traditional software development, is that of your testing for software in a vehicle, 95% can be traditional CI/CD kinds of flows like you would have in traditional web development. But now, let’s say you have a truck. Well, you can do like 4% of those in a rig, which has all the components, the electrical and electronics of a truck, but doesn’t have the tires. And then you have the 1%, which is actually the vehicle. There’s a similar analogy in terms of using simulation for intelligent systems. You can do a lot in a simulator, using world models, but ultimately it’s physical AI. So you’re gonna deploy it on physical machines and
Qasar [00:40:17]: the freezer example comes to light.
Alessio [00:40:20]: The world model thing has been, to me, the hardest thing to
Alessio [00:40:22]: wrap my head around. Like, we had Fei-Fei Li on the podcast.
World Models, Hydroplaning, and Cause-Effect Learning
Qasar [00:40:25]: We’ve been doing a small series with, like, another Intuition company, General Intuition, as well.
Qasar [00:40:31]: Yeah, and I mean, lots of coverage on NeRFs and yes.
Alessio [00:40:34]: Yeah. It feels like we talk about the heliocentric system, right? It’s like, in a world model, if you just feed visual data, the model might learn that the sun spins around the Earth. It makes sense, right? And it’s like, well, not really. And what are some of these other things? Like, hydroplaning is one thing I think about: can a world model understand hydroplaning and, like, what amount of water causes it to happen?
And it’s like, yeah, to me, I don’t understand how you guys do it. I guess the real thing is, like, when you’re doing both cars on the highway in Japan versus the excavator in a mine in
Qasar [00:41:13]: Arizona
Alessio [00:41:13]: Arizona, wherever you’re deploying them.
Alessio [00:41:15]: How much of it are you relying on the world models to, like, generate the simulations for you and then try and close the gap after, versus, like, giving the world models as a tool to your engineers to, like, curate the simulations, if that makes sense?
Peter [00:41:28]: Yeah, totally. So I can say, at a pure engineering level, I think if you’re hoping to do real world deploys and you’re purely relying on a world model approach, you probably won’t get to something that works before you go bankrupt. So there is just a very practical mindset of, like, world models are amazing and they’re extremely useful for a lot of use cases, but there are a lot of other things that you need to do to actually get something started and something deployed and working. Most fundamentally, world models are about understanding the world, but also understanding what’s going to happen. It’s like the cause-effect relationship.
Peter [00:42:01]: Right? And so, if you take some sort of construction tool, and that construction tool is gonna be doing some work on the Earth in some way, it’s gonna be moving earth, the world model needs to understand that cause-effect relationship. Like, okay, when I take this material from here and put it over there, now I have things that are over here and not over there anymore, and that cause-effect relationship. Data obviously is a big problem. The hydroplaning
Peter [00:42:26]: one is actually a really great example because it’s actually quite non-obvious sometimes. Right?
It’s like, well, it’s raining, and, well, this road has, let’s say, the appropriate curvature to it, so the water is running off the road and cars are driving faster here, and then you approach a road that’s very flat and water is now puddling on that road, and all of a sudden cars are driving slower, because when they were driving faster they were starting to lose control. And there are a lot of very nuanced visual cues in the scene, and so I do think in the world model concept there’s a good chance that the model actually would learn that you should just drive slower when these visual cues exist, and that’s obviously the beauty of these kinds of models: they just learn these non-obvious things.
Swyx [00:43:14]: It doesn’t need to know about hydroplaning to know that it needs to drive slower.
Peter [00:43:17]: Yes.
Swyx [00:43:17]: I guess it’s... Yeah. I wanna ask questions about also deploying models. I presume, like, you use a lot of these world models for training data and simulation, but what about deploying them onto the systems in production? Presumably you have, like, GPUs on device
Onboard vs. Offboard: Latency, Embedded ML, and Distillation
Swyx [00:43:36]: but they’re... I keep saying on device. What’s the right term for that?
Peter [00:43:40]: On machine.
Swyx [00:43:41]: On machine.
Peter [00:43:41]: Or embedded, yeah.
Swyx [00:43:42]: Yeah. What is the embedded world like? Because for people who are not used to that world, this is very alien.
Peter [00:43:49]: Yeah. So we actually call it onboard and offboard.
Peter [00:43:52]: So, like, onboard software and offboard software.
Peter [00:43:54]: And the great thing about offboard software is you don’t have to care about time, and you can run really large models, right?
So you can say, “Well, with this model, I don’t care if it takes one second to give me a result or 10 seconds to give me a result, because we have time.” And the models can be really big, and they can run in a data center or on a huge GPU, and you can obviously have distributed compute, et cetera. But onboard you don’t have any of those benefits. You’re like, “Well, I have this many milliseconds where I need an answer from this model.” And so a lot more of the energy then goes into, think of it more like distillation, and it’s truly about efficiency, and, like, literally every fraction of a millisecond counts. And you can’t have a situation where the model takes too long, because then the vehicle can’t actually function.
Peter [00:44:42]: And so you can still use a lot of the same techniques, and the models themselves you can think of as, like, derivatives of larger models that you can run offline, and then you’re just trying to get a model that still performs really well but is a smaller version, small enough that you can run it on this embedded system where you care about latency and power.
Qasar [00:45:03]: Yeah. And I think the broader point, which maybe is not obvious but is worth saying, is that in the physical AI world, we’re not really constrained right now by, like, the intelligence of the models. It’s actually what Peter’s talking about: it’s actually deploying them on
Swyx [00:45:19]: The hardware they give you.
Qasar [00:45:21]: Yeah. On the hardware they give you.
Qasar [00:45:22]: And there’s just the reality of safety critical systems.
So those end up being your limiting factors
Qasar [00:45:29]: rather than, let’s say, the limiting factor for a foundation model company,
Qasar [00:45:34]: which is gonna be just capital, maybe, or researchers.
Qasar [00:45:38]: So for us, as people who kind of come into that realm, it’s very interesting. Those constraints force creativity.
Swyx [00:45:47]: And I imagine nobody was deploying or giving you the hardware for transformers back in 2018, whatever, but now they are. What’s the evolution like? Just peel back the curtains a little bit.
Peter [00:45:59]: Yeah. Transformers, first off, I think the paper was originally published in 2017.
Swyx [00:46:02]: 2017.
Swyx [00:46:02]: So there’s no time.
Peter [00:46:04]: And I
Swyx [00:46:05]: But I’m just saying, I guess, like, embedded ML systems usually had a lot fewer parameters, a lot less compute, and now, like, orders of magnitude more.
Peter [00:46:14]: Yeah, absolutely. What I was gonna say, though, was that I think in the original paper in 2017, maybe in the last paragraph, somewhere in the paper they talk about, like, “Oh, by the way, this technique might be useful for, like, images and videos as well.”
Peter [00:46:30]: These last subjects.
Peter [00:46:31]: And it took a few years for that impact to really hit. But, like, now we’re seeing transformers everywhere.
Swyx [00:46:39]: Yeah. Vision transformers.
Peter [00:46:40]: And then the compute just keeps getting better and better. But you do have this fundamental trade-off, right? It’s like you have power, you have cost, and performance, and, like, getting the right mix of those things in an embedded package that can also be, like, shaken and baked in all the
Peter [00:47:00]: conditions that these things have to operate in.
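Peter describes onboard models as smaller derivatives of large offboard models and likens the work to distillation. The speakers don’t say which technique they use, so as a generic illustration only, here is the classic knowledge-distillation objective from Hinton et al. (2015): a small “student” is trained to match a large “teacher’s” temperature-softened output distribution, blended with ordinary cross-entropy on the true label. The temperature, alpha, and the three-class logits are arbitrary assumptions; a real pipeline would compute this over batches inside a training framework.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Knowledge-distillation loss for a single example, as a weighted sum of:
      * KL(teacher || student) on temperature-softened outputs, scaled by T^2 so its
        gradient magnitude stays comparable as the temperature changes, and
      * ordinary cross-entropy of the student against the hard label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard_ce = -math.log(softmax(student_logits)[label])
    return alpha * (temperature ** 2) * kl + (1 - alpha) * hard_ce


# Hypothetical 3-class example: a large offboard "teacher" and two small onboard "students".
teacher = [4.0, 1.0, -2.0]
close_student = [3.5, 1.2, -1.8]
far_student = [-1.0, 3.0, 0.5]
print(distillation_loss(close_student, teacher, label=0))
print(distillation_loss(far_student, teacher, label=0))  # larger: this student disagrees
```

The softened teacher distribution carries information about which wrong answers are “almost right,” which is part of why a much smaller student can stay close to the teacher’s accuracy within a tight latency and power budget.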
But yeah, I think they’re only going to keep getting better, and so we also try to plan our strategy understanding that we know the rate of improvement of these systems.
Swyx [00:47:11]: Yeah. So Google just released the Gemma 2B model, that effective 2B model. Is that useful to you guys, or is that too big?
Peter [00:47:18]: You can definitely run that model on an embedded system. So yes, it’s useful in that regard. The bigger question is what you use it for in an embedded system. You actually need to customize it quite a bit to make it useful for something. But yeah, you could definitely run a two-billion-parameter model.
Swyx [00:47:35]: It’s also interesting, like, what percent is a custom ML model that only does that thing versus a generalist LLM, which probably is not that useful for your context.
Peter [00:47:46]: Like, you can imagine different use cases, right? So the
Swyx [00:47:49]: The voice stuff, yes.
Peter [00:47:49]: Yeah, the voice stuff. Totally, yes. So for the actual autonomy elements, that’s 100% in-house. We do every bit of that: the data, the simulation, the model, everything. But when you get into the more generic use cases, like a voice assistant kind of thing, that’s where these more generalist models like Gemma can actually be quite useful.
Swyx [00:48:09]: Yeah. And then there’s also obviously a trade-off between what percent you must do on the machine versus just calling home.
Peter [00:48:16]: Yeah. It’s all about latency.
Swyx [00:48:17]: Latency.
Peter [00:48:17]: It’s all about latency. Yeah.
Swyx [00:48:18]: Yeah. Well, I think in a lot of contexts, especially in the US, you can just have a connection to the web.
Qasar [00:48:26]: Yeah.
I think, though, in most of our universe everything has to be fairly embedded and local, just because of the nature of it. Even in the US, there’s a lot of, like
Swyx [00:48:39]: Patchiness.
Qasar [00:48:40]: We don’t have coverage, right? And if you look at the old world of autonomy within mining, which is long before transformers, and kind of neural networks in the CNN kind of universe, they were really just hand-coded systems. They were just like, this machine is gonna run to that place with this
Peter [00:49:03]: That was RTK GPS, like very accurate GPS.
Qasar [00:49:05]: Yeah. And so that worked, and that worked for 20 years, so why would we actually need to use transformers or more modern end-to-end systems? Mainly because you can only really run a path and run it backwards. That provided a lot of value, but not as much as you get when the machine is actually intelligent: it’s seeing, it’s perceiving, it’s acting in a dynamic world.
Alessio [00:49:28]: I looked up RTK: real-time kinematic, one- to two-centimeter accuracy.
Qasar [00:49:32]: Yeah. Fantastic. And fantastic in faraway lands where there’s not gonna be cell phone coverage.
Peter [00:49:39]: Yeah, so it’s widely used in the legacy mining and agricultural autonomy systems today. For example, a combine that can be precise within one or two centimeters as it’s driving down the field uses RTK.
Qasar [00:49:53]: Yes.
Peter [00:49:53]: But it’s expensive.
Qasar [00:49:54]: Yeah. And it’s autonomy, but it’s not intelligent in the way that I think all of us in ’26 would be talking about intelligence.
Alessio [00:50:00]: In one of your blog posts, you mentioned research on large-scale transformers that are similar to those doing modern generative AI. What are the big differences, other than “You’re absolutely right. I should steer the car, so you probably wanna remove that”?
Peter [00:50:14]: We have a diversified bet strategy internally, and the reason we’ve done that is because we now operate in a bunch of industries and a bunch of geographies, and each of the approaches obviously has a different risk to it. So we’re not going to put all of our eggs in a single basket for a single approach, because that approach may not work out. That’s one of the bets that we have. It has certain advantages in certain scenarios, but the way these things play out in practice, it also has certain drawbacks. And then the research team works on the situations where that approach is actually worse than the other approaches, to ultimately arrive at a really great solution for all of these things.
Plan Mode for Physical Systems and Next-Token Prediction Universally
Alessio [00:50:57]: Is there a plan mode for physical autonomy, like a planning step and then an action step?
Peter [00:51:03]: So the short answer is yes, right? Just like you can use Claude Code to plan out some complex coding task and get an almost-specification written out, similar approaches absolutely can be applied to physical systems. Imagine you’re trying to accomplish some task. The easiest to think about is robotaxi, but I think things get more interesting in, let’s say, the defense context or the mining context. You actually do have to think many steps in advance. It’s not just this one thing: to accomplish the goal there are a hundred steps, and so this concept of plan mode is very applicable there.
Alessio [00:51:40]: Yeah. I was gonna say, to me, driving feels like a great next-token prediction thing, because you’re kinda on a path and it doesn’t really matter what you’ve done before.
You can always turn around.
Qasar [00:51:49]: It’s all planning. Yeah.
Alessio [00:51:50]: Yeah. Versus mining, it’s like, “Oh, man, I took a scoop out of this thing,” and now I can’t really go there anymore. Is there a huge difference? I guess, do you have a taxonomy of these different types? There’s driving, excavating, flying. How do you
Peter [00:52:11]: So the interesting thing is, yeah, I think probably everything in the world can actually be boiled down to a next-token prediction problem. Any workflow, anything, can be thought of almost as a sequence of steps, or a sequence of trajectories, or whatever you wanna call it, and it can actually be boiled down to that sort of thing. In the mining case, you can imagine taking that scoop: okay, that was that set of tokens, and now the model understands that the state space is different, and the next time it does token prediction, it’s going to be modified by that. But yeah, the remarkable thing about these techniques is just how universally applicable they are, right? It truly is incredible.
Alessio [00:52:53]: What else is underrated about what you guys are building on the physical side? We were talking about it before the episode: there are a lot of humanoid companies that do these great demos, and then I can’t buy it, so obviously it can’t all be there. In your case, you’re in production on real streets with a lot of customers. What are the things people are underestimating? The same way the Waymo demos seven years ago were great, and then it took seven years to actually get them on the street.
Can you share maybe the last one percent that was really hard to get done technically?
Productionization: The 20 Problems Every Robotics Demo Will Hit
Peter [00:53:27]: Yeah. So certainly, productionizing stuff is really challenging no matter what. I would split the answer into research and production. First, on the production side, there are just so many problems that you find when you actually get this stuff into the real world. The classic problem in humanoids right now is that these systems are actually pretty brittle. I’m not talking about any one company, but as an industry, these systems are pretty brittle. Interestingly, I saw the other day that, I think, China is doing a marathon with humanoids.
Qasar [00:54:00]: What?
Peter [00:54:00]: Yeah. So in government, and not China specifically but any government, there is a concept called prize policy. There are different ways of influencing an industry to go in a certain direction: you can regulate it, right, you can do mandates, or you can just do these competitions. The US version of this was the DARPA Grand Challenge.
Alessio [00:54:20]: That worked.
Peter [00:54:21]: It really worked. It
Alessio [00:54:22]: That really worked.
Peter [00:54:22]: took the whole industry. But I think China is literally doing this marathon because they know that the reliability of these humanoids is a problem. And what cooler way to solve that than to have a competition where humanoids need to run twenty-six miles, right?
Alessio [00:54:37]: Are we there?
Can robots run a marathon?
Peter [00:54:40]: I think it’s happening any day now. So it’s
Alessio [00:54:43]: So we’re there.
Qasar [00:54:43]: By the way, automotive also has a version of this, which is, like, the 24 Hours of Le Mans, right? It’s like, Porsche wins the 24 Hours of Le Mans
Alessio [00:54:51]: New product.
Qasar [00:54:51]: and then literally puts those products into production. I would actually break it down. You talk about research and you talk about production. There’s actually a step in the middle, which is advanced engineering, and I think a lot of the industry is moving into advanced engineering, where it’s not fundamental research. Like, we’re coming in with novel techniques. It really is advanced engineering for production. So what are the subcomponents that are gonna limit getting into production? Once you’re in production, you’re dealing with another set of problems, which is the deployment and maintenance of those machines. So I’d say, at least in our field, we’re mostly in advanced engineering, in the automotive parlance.
Peter [00:55:29]: Honestly, every step is hard, though.
Alessio [00:55:33]: Paul, this way you’re worth 15 billion dollars, so don’t answer.
Qasar [00:55:36]: You bleed every step. Yeah. And I think
Peter [00:55:39]: It’s fun. I don’t know, I find it really enjoyable. Yeah, but what’s also fun is that we’ve been doing this now for almost ten years, and we’ve seen so many bad times. So right now we can look at any company in this space, get a demo, and I can write down a list of exactly the next 20 problems they’re gonna hit. And I can also guess how they’re going to try to solve each of those, and which one’s gonna actually work.
Qasar [00:56:04]: Yeah.
It’s not because we’re particularly geniuses.
Peter [00:56:07]: We’ve just seen this stuff now.
Qasar [00:56:07]: Yeah. We’ve seen enough of this stuff. We’ve lived enough of this stuff. In our own mental models of the world as leads in the company, we’ve tried so many things, and we’re talking about the wins here. Like, there
Peter [00:56:21]: Plenty of losses there.
Qasar [00:56:21]: There are plenty of losses among that many people doing that many different things, and that kinda gets baked into your mental model of the world.
Peter [00:56:30]: Yeah. But I would say, in general, we’re excited about robotics for sure, and the
Qasar [00:56:36]: Massive opportunity.
Peter [00:56:37]: Massive opportunity. And what’s happening now in the industry is that none of these concepts are new, right? What’s new is that this stuff is actually working now. People have wanted to use neural nets in robotics for a long time, but now we have the data sets and the simulation technologies, and stuff is actually starting to really work. And yeah, we’re gonna be part of that for sure.
Alessio [00:57:00]: Do you have requests for startups, or advice against starting certain startups? There are a lot of, like, scale-up robotics companies. What do you think are things
Qasar [00:57:10]: A lot of Applied Intuitions for other things. I think you hit a certain, what is it, badge when YC
Peter [00:57:21]: X for Y.
Qasar [00:57:21]: Right, you become, like, “X for Y,” or literally the same or similar names.
I think my biggest advice, in this, like, commercialization of technology, is about that constraint. We talked about hardware constraints, but there are also constraints on the commercial side, which is: we’re only gonna do things that fit in this box. That, I think, is very good for founders. The reason I think it’s not often focused on is that you have plenty of access to capital, and the technical problems are so hard you’re like, “I already have a constraint,” which is just getting this technical problem solved. And I think the venture community, generally speaking, tends not to be very technical. For them, if you just say, “If we solve this thing, it’s gonna be a lot of money,” that’s kind of enough. But I’m not giving you advice on how to pitch VCs; that’ll work for VCs. As a founder, you still gotta run a sustainable business. And on that question you asked earlier about what’s maybe not obvious about our company: this is truly compounding technology. A lot of the work that we do just compounds. We don’t throw it away. It gets better. The operating system work gets better. The dev tooling gets better. The models get better. And so we’re really gonna get a hu- I think you see it in Waymo as an example. Waymo is a company that was, I would say, very interesting for a long time, but not worth one hundred and twenty-six billion dollars, right? What happens is that the human brain just doesn’t emotionally understand compounding effects, and that’s gonna happen in our universe. So if you’re a founder, you’re at the beginning of that long walk. If you can put a little constraint on the commercial side, you’re a bit more likely to see the other end of that walk, ’cause if you can get to the other end, you will get the big return from compounding technology. A lot of people just don’t make it. So yeah.
To summarize: think a little bit about the equation of how you use money and where you use the limited resources and limited engineers that you have. I think founders sometimes falsely take very mature companies’ strategies and apply them to their nascent companies. They’re like, “Oh, well, Steve Jobs says be completely vertical.” Well, yeah, Apple in 2007 is very different from Apple in 1978 or 1982. Those companies were different. They were literally just taking electronics from other manufacturers and putting them in an enclosure. So just be a bit more nuanced in your commercial approach as it informs your technical approach.
Founder Advice: Constraints, Compounding Tech, and Mature-Company Mimicry
Alessio [01:00:03]: Do you feel differently today? Like, you just joined X, right? You’ve been building this company in stealth, and now you’re like, “Well, I should probably be talking about what I’m doing.” I think a lot of founders are in a similar place, where they wanna raise a lot of money to signal they’re strong, and you raise a lot of money without spending it.
Qasar [01:00:20]: And to hire. And to hire, yeah.
Alessio [01:00:21]: You obviously like that. Do you think it’s still possible to have a very narrow approach of, like, “Hey, we’re building a compounding thing without a grand vision right away,” versus
Qasar [01:00:32]: It’s very difficult to answer very general questions
Alessio [01:00:35]: Well
Qasar [01:00:35]: but maybe I’d reframe it as: is it possible to build a product that has a small problem space and hope that the problem space will grow? Maybe that’s a different way of asking the same question, but more answerable. I think always yes.
That’s the old YC thing: go really deep rather than very broad and shallow.
Qasar [01:01:00]: With very broad and shallow, unfortunately, especially in hard tech companies, there are just too many problems, and you’re gonna do all of them in a very mediocre way, so the full product is actually fairly mediocre. So yeah, I’m still in the camp of finding a small problem space. The other, tangential question you’re asking is whether you should build in stealth and anonymity. Well, yeah, if you’re a YC COO, you can be
Swyx [01:01:29]: Oh, Travis Kalanick.
Qasar [01:01:29]: And yeah, we worked together at Google. We have a long history, which is another way of saying we have big networks. Of our first 400 people, the majority were Googlers. A majority of the company came from this giant company we worked at, and that’s just very different. If you’re a founder who doesn’t have that experience, you have to do these things. So just don’t take my version of the world, or whatever other founder’s, like Jensen’s version of the world. They are in a different time and space. And most importantly, their companies are in a different phase. And if you wanna take inspiration from other really young companies, that’s also bad, because most of them are gonna fail. So the only solution you really have is to use first-principles thinking and say, “Based on my skills, my co-founder’s skills, the skills of my early team members, and what I’m hearing from customers, what’s a product space that I should build?” Does that make sense?
Swyx [01:02:27]: Yeah, it does.
Alessio [01:02:27]: Yeah.
Sam Altman said he regrets a lot of the advice that he gave at YC. So I’m always curious to ask founders like you who’ve now been
Qasar [01:02:36]: So I
Alessio [01:02:36]: Just a long time ago
Qasar [01:02:37]: Everyone who leaves YC, like, does the opposite. Well, Sam was president, I was COO, right? So, and we’d have a CEO, so we worked together; “extremely closely” would be an understatement, ’cause the firm was also small.
Alessio [01:02:50]: Yep.
Qasar [01:02:50]: YC wasn’t as big as, like, an OpenAI is. I directionally agree with that, but I would say that’s less a YC function and more that the market has changed. It is a different world. The AI companies, I should say more specifically, and how they relate to the other YC companies and the market, are just so fundamentally different. The amount of money raised is different, the number of investors, the sheer number of seed funds. One of our early investors is Floodgate, and they did some analysis in the late 2000s, like the double O’s, where they were like, “There’s a single-digit number of funds like Floodgate,” which were writing sub-$1 million first checks and were not accelerators or incubators. And Ann, who’s one of the co-founders there with Mike, said that when they tried to do this analysis more recently, like three or four years ago, they lost count at, like, 350 funds or something like that. So we’re just in a different environment, and the YC advice from 2014 just would not apply in 2026. But Sam is way better at saying these things than me. He says it in a shorter, more interesting way than me.
I can just give you, like, if you ask me, “What is the purpose of a car?”, I’d open the owner’s manual and say, “Number one, look, there’s a steering wheel,” instead of, like, “It can change your life.”
Alessio [01:04:21]: Yeah, it gives you autonomy and freedom.
Qasar [01:04:22]: Yeah, exactly.
Swyx [01:04:24]: And then for Peter, I was just curious if there’s any particular tech or research problem, unsolved today, that you would call out as very meaningful for you guys if it were solved, so that anyone working on it should get in touch with you.
Peter [01:04:40]: Yeah, I think, generally, making models very efficient, right? Because we have to run on actual vehicles, physical AI is literally taking very large AI and making it very small and very efficient. So we’re constantly at the boundary of these limitations: well, you have a great model, but now we need to make it faster and smaller. So that, in general, as a field. And then I would also say folks who are really passionate about evaluating this technology, as in model evals. It’s a hugely difficult topic, especially in safety-critical systems. We have, I think, a really great engineering team and researchers working on this now, but it’s a big area of investment. So yeah: folks who are passionate about model performance, both in terms of capability and literally latency, and then evaluation of models.
Hiring Philosophy: Hardware/Software Boundary and Engineering Mindset
Alessio [01:05:41]: Awesome. Are there any specific engineering roles you’re hiring for? And especially, who are the people that succeed at your company as engineers? I think that’s always the most important thing.
Qasar [01:05:50]: Yeah. applied.co/careers. I think there are literally hundreds of roles.
We’re looking at all the topics we talked about, from dev tooling and physical AI to operating systems, to autonomy and AI within physical machines. The types of engineers: that’s a great question. That’s actually more interesting than the roles, ’cause we’re a large enough company, we’re roughly
Alessio [01:06:11]: Hiring everything.
Qasar [01:06:12]: Everything, yeah. We hire everything. I think we’re a Sunnyvale company, and just from this conversation and our backgrounds, you can predict a little bit of what that means. We tend to hire fairly serious people who understand low-level systems, not just a superficial understanding of technology; engineers’ engineers, almost. We definitely hire folks who have diverse skill sets. We hire tons of specialists as well, to be very clear, but they’ve seen production, and I think that really informs how you build technology.
Peter [01:06:53]: Yeah. I would say people who really appreciate the hardware-software boundary.
Qasar [01:06:56]: Yeah, exactly.
Peter [01:06:56]: Definitely in the vibe coding era, there’s a crop of engineers who don’t think about hardware at all. And we don’t have that luxury, so: people who are a little more passionate about going a little bit deeper.
Qasar [01:07:09]: Yeah, if you were to contrast us with, like, an AI lab or something, that’s where you’re gonna get the biggest contrast: we’re just dealing with reality. What other things? All of the classic stuff. You want folks who work hard and who love the technology, and who like a podcast like this. Or rather, if you made it to this part of the podcast, you’re probably qualified for this, or you’re interested in it.
Swyx [01:07:37]: Yeah.
And Peter said that he likes the podcast as well, which is really cool.
Qasar [01:07:43]: I’m a fan. Yeah.
Swyx [01:07:44]: Yeah. Specifically on the hardware-software boundary part, it’s something I think about with our education system, in the States, but maybe also just in general. I feel like there is a retreat away from classical computer science or EE education
Qasar [01:07:59]: Computer engineering, yeah.
Swyx [01:08:01]: And is there a point where you just do it yourself? ’Cause at this point, you guys are the world experts on this, and you shouldn’t wait for some college system to spit them out for you.
Peter [01:08:11]: You mean in terms of education and upskilling, that kind of thing?
Swyx [01:08:14]: Yeah. Yeah, just grab, like, young
Qasar [01:08:16]: General Motors already did it.
Swyx [01:08:17]: smart kids.
Peter [01:08:19]: GMI.
Qasar [01:08:19]: Literally.
Swyx [01:08:19]: Is there a Harvard University?
Qasar [01:08:21]: Yeah, that’s where I went for undergrad. I went to the General Motors Institute.
Swyx [01:08:25]: That did not come up. I saw HBS. I didn’t
Qasar [01:08:27]: Everyone sees HBS. The Harvard brand value is high.
Swyx [01:08:34]: What’s General Motors Institute like? What
Qasar [01:08:36]: It started 100 years ago to answer this exact question, literally the question you just asked, which is that there were not enough engineers in Michigan. You’re talking about the early days of the modern corporation, General Motors being... There’s a great book, Alfred P. Sloan’s My Years with General Motors, highly recommended, which basically talks about what becomes a modern corporation. Part of that is they’re like, “We’re basically buffering on engineers.” So they started a school. And actually even Google, as recently as probably 10 years ago, was thinking of starting a university.
Internally, there were discussions about it. So yeah, we definitely upskill folks as well. The amount of training we do internally is actually surprising. But it’s a luxury you have when you’re at our size.
General Motors Institute, Education, and the Curiosity Mindset
Qasar [01:09:20]: When you’re, like, 25 engineers
Swyx [01:09:22]: No.
Qasar [01:09:22]: you just gotta survive. So again, take advice that’s relevant for your company, rather than immediately trying to take high schoolers and make them engineers.
Swyx [01:09:30]: But I’d, like, go to a class that you taught, ’cause it sounds like you can teach a lot.
Peter [01:09:36]: Yeah. Well, I think, honestly, one of the most amazing use cases of these large models now is education, right? I’ve taken an engineer, a very good engineer with an aerospace engineering background, and in a relatively short time span he’s doing very confident front-end work and very confident back-end work with the help of these models. And not only can you do the implementation with them, but you can also just learn, right? You ask questions and you don’t feel embarrassed, ’cause the model’s not gonna call you out on anything.
Qasar [01:10:07]: Yeah. I think the thing you probably need more than an engineering degree, though engineering degrees are very important, and I don’t know if there’s a way to shortcut fluid dynamics or heat transfer
Peter [01:10:17]: The fundamental stuff.
Qasar [01:10:17]: the fundamental stuff, at least on the mechanical side, is an engineering mindset, and not everybody actually has that. Some people are emotionally drawn towards the arts or something else, and that’s completely fine. There’s no judgment there.
But I think the engineering mindset, put in a more usable way, is wanting to understand the lower level, and the lower level, and the lower level. Like, how do photons move?
Peter [01:10:42]: And extreme curiosity.
Qasar [01:10:44]: Extreme curiosity. Like, what is light? What is a radio wave? These really fundamental questions.
Peter [01:10:49]: Right. And if you get curious enough about software, you ultimately end up in hardware. And so
Swyx [01:10:56]: That’s the Alan Kay quote. Yeah.
Qasar [01:10:57]: Yeah, exactly.
Swyx [01:10:58]: So I’m trying to make analogies and all these things. You’re kind of a blend between a new General Motors and a Tesla autonomy division for everyone else.
Qasar [01:11:07]: We do work in all these other fields. I think if you talk to our trucking customers, they wouldn’t even perceive it that way. They’d have some sense of, like, “Oh, you guys did some automotive stuff, but you’re really helping us.” So
Swyx [01:11:18]: Automotive is not trucking?
Qasar [01:11:19]: No, no. That’s
Swyx [01:11:20]: It’s, like, a whole
Qasar [01:11:21]: It’s separate. There are different problems. And you have the general categories of on-road and off-road. I think that’s what you’re thinking of. So there’s on-road and off-road, but within on-road there are all these subclasses
Swyx [01:11:33]: Oh, okay.
Qasar [01:11:33]: of machines. Especially when you look at a delivery robot that doesn’t have a human in it. That’s actually very different, because now you’re not concerned with the actual feeling that you have when you’re in a self-driving system. You don’t have to account for that.
You can
Swyx [01:11:48]: Just brake.
Qasar [01:11:48]: You can brake hard. And you don’t care about jerk, and all of these metrics don’t, or become in
Peter [01:11:53]: The way to think about it, honestly, is that any system that you as a human would need special training to operate, you can think of a little bit differently. So the license to operate a truck is different from the license to operate a car, which is different from the license to fly a plane. It’s different from... You get it, right?
Swyx [01:12:08]: Awesome, guys. Thank you for taking the time.
Qasar [01:12:10]: Yeah, thanks for having us.
Peter [01:12:11]: Thanks for having us. Thank you.
[outro music]
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Today, we check in a year after the first Unsupervised Learning x Latent Space Crossover special to discuss everything that has changed (there is a lot) in the world of AI. This episode was recorded just after AIE Europe, but before the Cursor-xAI deal.
Unsupervised Learning is a podcast that interviews the sharpest minds in AI about what’s real today, what will be real in the future and what it means for businesses and the world - helping builders, researchers and founders deconstruct and understand the biggest breakthroughs.
Thanks to Jacob and the UL production team for hosting and editing this!
Jacob Effron
* LinkedIn: https://www.linkedin.com/in/jacobeffron/
* X: https://x.com/jacobeffron
Full Episode on Their YouTube
We discuss:
* swyx’s view from the center of the AI engineering zeitgeist: OpenClaw, harness engineering, context engineering, evals, observability, GPUs, multimodality, and why conference tracks now reveal what matters most in AI
* Whether AI infrastructure has finally stabilized: why “skills” may be the minimal viable packaging format for agents, why infra companies have had to reinvent themselves every year, and why application companies have had an easier time surviving model volatility
* The vertical vs. horizontal AI startup debate: why application companies can act as the outsourced AI team for enterprises, why some horizontal companies still matter, and why sandboxes may be the clearest reinvention of classic cloud infrastructure for the AI era
* The “agent lab” playbook: starting with frontier models, specializing for your domain, then training your own models once you have enough data, workload, and user behavior to justify the cost and latency savings
* Why domain-specific model training is real, not just marketing: how companies like Cursor and Cognition can get users to choose their in-house models, and why search, domain specialization, and distillation are becoming more important
* Open models, custom chips, and alternative inference infrastructure: why swyx has turned more bullish on open source, why non-NVIDIA hardware is suddenly getting real attention, and why every 10x speedup can unlock new product experiences
* What it means to sell to agents instead of humans: why agent experience may mostly just be good developer experience by another name, why APIs and docs matter more than ever, and how pretraining-data incumbents are compounding advantages in an agent-first world
* Why memory and personalization may become the next big wedge: today’s models mostly reward frequency of mentions, but in the future, swyx expects product choice to be shaped much more by personalized memory systems
* The state of the AI coding wars: why coding has become one of the largest and fastest-growing categories in AI, how Anthropic, OpenAI, Cursor, and Cognition have all ridden the wave, and why the category may still have more room to run
* Capability exploration vs. efficiency: why the industry is still in a token-maxing, experiment-heavy phase where people are rewarded for spending more rather than less
* Claude Code vs. Codex and the strange stickiness of coding products: why first magical product experiences may matter more than expected, and why the bigger mystery may be why only a few names have emerged as real winners so far
* What the end state of the coding market might look like: two major players, a longer tail of niche products, and possible disruption if Microsoft, Mistral, xAI, or the Chinese labs push harder into coding
* Where application companies still have room against the labs: why frontier labs are trying to expand into verticals like finance and healthcare, but still leave space for focused companies that own the workflow and the last mile
* Why coding may be a preview of every other AI market: the first category to truly go parabolic, the clearest example of foundation model companies colliding with application companies, and a template for how future vertical AI markets may develop
* Why AI valuations now feel unbounded: from billion-dollar ARR products built in a year to trillion-dollar market caps, swyx and Jacob unpack how the AI market has broken traditional startup intuitions about scale and durability
* Consumer AI vs. coding AI: why ChatGPT’s consumer category may have plateaued on frequency and product design, while coding continues to feel like a daily-use category with real momentum
* The next product frontier beyond coding: consumer agents, computer use, and “coding agents breaking containment,” with swyx’s thesis that 2025 was the year of coding agents and 2026 may be the year they begin to do everything else
* Whether foundation models are really killing startup categories: why swyx is less worried for early founders, more worried for mid-size startups and traditional SaaS, and why building something ambitious may now be the best job interview for a frontier lab
* AI vs.
SaaS and the internal culture war around adoption: the tension between AI-native employees who want to rip out expensive software and skeptics who think quick AI-built replacements create fragile systems* Why traditional SaaS may be under real pressure: swyx’s own experience spending six figures on event and sponsor management software, the temptation to rebuild it cheaply with AI, and the broader question of whether teams will trust custom AI-native replacements* Biosafety, security, and frontier model access: why swyx raised biosafety at a dinner with Anthropic’s Mike Krieger, why Krieger argued security is the bigger issue, and what restricted model releases reveal about Anthropic vs. OpenAI* The era of giant models: why 10T+ parameter systems may only be a temporary rationing phase before bigger clusters arrive, why labs may increasingly keep their most powerful models private for distillation, and why scale alone no longer feels like a complete answer* Memory as the slowest scaling factor in AI: why context windows have improved far more slowly than people hoped, why million-token context still has not changed most real workflows, and why memory may be the key bottleneck for the next generation of systems* What swyx changed his mind on in the past year: becoming more bullish on open models, more convinced that the top tier of agent startups behaves very differently from the median AI company, and more optimistic about fine-tuning and specialized model adaptation* “Dark factories” and zero-human-review coding: the next frontier after zero human-written code, where models not only write the code but ship it without human review, forcing companies to rethink testing and verification from first principles* Why RL and post-training may matter more than people assumed: even if the resulting models get thrown out every few months, the data, workflows, and domain-specific improvements persist* Synthetic rubrics, Doctor GRPO, and multi-turn RL: why reinforcement 
learning is becoming much more domain-specific and multi-step than many people realize, opening the door to much deeper customization* The next frontier after coding: memory, personalization, and world models, including why swyx thinks world models matter not just for robotics or gaming, but for giving AI something closer to lived understanding* Fei-Fei Li, spatial intelligence, and the Good Will Hunting analogy: the idea that today’s LLMs may know everything by reading it all, but still lack the lived experience that turns knowledge into a deeper kind of intelligence
Timestamps
* 00:00:00 Intro preview: AI coding wars, startup pressure, and market structure* 00:00:28 Welcome to the Latent Space × Unsupervised Learning crossover* 00:01:17 What AI builders are focused on now: OpenClaw, harnesses, and infra* 00:04:33 Why AI infra is harder than apps, and where startups can still win* 00:06:39 Should companies train their own models?* 00:09:28 Open models, custom chips, and the new inference race* 00:11:25 Designing products for agents, not just humans* 00:16:49 The state of the AI coding wars in 2026* 00:19:27 Capability exploration, token-maxing, and why coding is going parabolic* 00:21:41 What the end state of the coding market could look like* 00:23:50 Where app companies still have room against the labs* 00:27:02 Why AI valuations and market swings feel unprecedented* 00:28:56 Consumer AI vs. coding AI, and why sticky products still matter* 00:32:28 What the next breakthrough product experience might be* 00:32:53 2026 thesis: coding agents break containment and eat the world* 00:35:27 Are foundation models wiping out startup categories?* 00:37:33 AI vs.
SaaS, vibe coding, and internal team tensions* 00:40:01 Biosafety, security, and the politics of restricted model releases* 00:42:19 Giant models, compute constraints, and the limits of scale* 00:44:30 Memory as the real bottleneck in AI* 00:44:57 Why swyx changed his mind on open models* 00:47:44 Dark factories and the future of zero-human-review coding* 00:49:36 Why post-training and RL may matter more than people think* 00:51:50 Memory, world models, and the next frontier of intelligence* 00:53:54 The Good Will Hunting analogy for LLMs* 00:54:21 Outro
Transcript
[00:00:00] swyx: Isn’t that crazy? That number is just mind-boggling.
[00:00:03] Jacob Effron: What is the state of the AI coding wars today?
[00:00:05] swyx: We’re in a phase of, sort of, capability exploration. The general thesis that I have been pursuing now is that the same way that 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else.
[00:00:16] Jacob Effron: Do you worry about the foundation models just getting into a bunch of these startup categories?
[00:00:21] swyx: Mid-size startups, yes.
[00:00:23] Jacob Effron: What do you think the end state of this market is?
[00:00:25] swyx: For the market structure to significantly change, there would be...
[00:00:28] Jacob Effron: Today on Unsupervised Learning, we had a fun episode in what’s really become an annual tradition: a crossover episode with our friends at Latent Space. swyx and I sat down and talked about everything happening in the AI ecosystem today: what we thought of the various changes at the model layer, what’s happening in the infra world, the coding wars, and a bunch of other things. It’s a ton of fun to do this with someone I really respect and another great podcaster in the game. Without further ado, here’s our episode. Well, swyx.
This is super fun, to be back for another Unsupervised Learning × Latent Space crossover episode.
[00:01:02] swyx: Yeah.
[00:01:02] Jacob Effron: I feel like there are a lot of places we could start, but one thing I always find fascinating about the way you spend your time is that you’re obviously at the epicenter of this engineering movement and community. You run these events and conferences and put on these awesome talks, and I think you just have a great pulse on the zeitgeist of what’s going on.
[00:01:16] swyx: Yeah.
[00:01:17] Jacob Effron: Maybe to start: what are the biggest topics people are thinking about right now?
[00:01:21] swyx: Yeah, so I just came back from London, where we did AIE Europe, and we’re doing roughly one per quarter now, which, yeah...
[00:01:27] Jacob Effron: You’ve really upped...
[00:01:27] swyx: Hopefully...
[00:01:28] Jacob Effron: ...upped the pace.
[00:01:29] swyx: We’re trying. We’re trying to match AI speed, you know?
[00:01:30] Jacob Effron: Yeah, exactly. The topics would be completely different, I imagine.
[00:01:33] swyx: Yeah. I definitely curate the tracks, so you can see what I think when you see the track list and the speakers that I invite. Obviously OpenClaw is the story of the last four or five months, and just below that, I would put harness engineering and context engineering, two related topics in agents and RAG. Then there’s a long tail of evergreen stuff like evals, observability, GPUs, and LLM infra in general. We also have other updates on multimodality and, let’s call it, generative media. But the first three that I mentioned are top of mind for people.
[00:02:13] Jacob Effron: I think harness in particular is so interesting.
There was this tweet from Harrison Chase, the LangChain CEO, that caught my eye recently, where he said it finally feels like we have stability around the infrastructure for AI. And I think what he was basically implying is: look, over the past two or three years, as a company at the epicenter of AI infrastructure, it was a bit like playing whack-a-mole, right? You were constantly moving around with however the building patterns were evolving.
[00:02:36] swyx: For Harrison, for sure. He’s basically had to reinvent the company every year since he started LangChain. It was LangChain, LangGraph, and Deep Agents, and I think he’s one of the most nimble, adept, sharp people about this.
[00:02:49] Jacob Effron: And he’s saying now is finally the time for stability.
[00:02:51] swyx: Yeah.
[00:02:52] Jacob Effron: Do you buy that? What do you make of that take?
[00:02:56] swyx: I think it’s very expensive to say “this time is different” sometimes, but when you’re just writing code, it’s actually okay to try to make a call, and I think it may not even matter whether the call is right. I just don’t care that much, because you can be right on a thesis, but if you don’t figure out how to monetize it, who cares if you said it first? That said, it does feel like, for example, we went through a lot of different ways of packaging integrations for agents, and it feels like we’ve landed on skills, which is the minimal viable format: just a markdown file with some scripts attached to it. I don’t see how it can get simpler than that. And so there is some justification for the stability around harnesses.
I feel like there may be more adaptation with regard to the real-time elements, or subagents, or memory, or any of those agent disciplines, let’s call them, in agent engineering. But if the thesis is that agents are LLMs with tools in a loop, with a file system, doing retrieval, with skills and all this standard tooling that now seems to be relatively consensus, then that probably makes sense. I just think there’s no point staking your reputation on the thesis that we’re there, because if it changes again, you just change with it. It’s fine.
[00:04:33] Jacob Effron: Yeah. I’ve always been struck by how much more challenging that is for infrastructure companies than for application companies. On the application side, you’ve seen Bret Taylor from Sierra and Max from Legora say, look, we build what’s ahead of the models, and we’re willing to throw everything out every three months as the models get better and better. But the thing you at least have there is an end customer, right, and that customer is decently sticky. They will mostly stick around; they’ll at least give you a shot at building these things. What I’ve always found more challenging is having to reinvent yourself every three months at the infrastructure layer. Developers are definitely a pickier audience than, say, an accounting firm or a bank, so it’s a more challenging position to be in, to have to constantly reinvent yourself.
[00:05:17] swyx: Yeah. And when they churn, it’s very complete. They’ll leave for the hot new thing, because there’s no defensibility, I guess.
Even if you are a database, people can migrate workloads off databases. It’s a known thing. So basically what we’re talking about is the vertical versus horizontal debate in AI startups. The way I think about it is that when you are Legora, when you are Abridge, you are the outsourced AI team, right? Your job is to apply whatever state-of-the-art AI methods exist.
[00:05:55] Jacob Effron: Yeah, this translation layer between model capabilities and your...
[00:05:57] swyx: ...own customers. Yeah, to the end customers. If they didn’t have you, they would have to hire in-house, and they’re not going to hire in-house, so they have you. I think that’s very robust to whatever trends and discoveries people make in the engineering layer. I do think there are useful horizontal companies being built, but they’re all very much reinventions of classic cloud in the AI era, the primary one being sandboxes. Which, guys, it’s another form of compute; let’s not get too excited about it. But I mean, the workloads are enormous.
[00:06:38] Jacob Effron: Right.
[00:06:38] swyx: Yeah.
[00:06:39] Jacob Effron: It’s interesting. Among the questions folks are asking around infrastructure, a lot concern the extent to which companies should have their own AI teams and what they should be doing in-house. Should people be training their own models? Should they be doing RL in-house based on the data they have? I feel like one has to evolve one’s takes on this every three months, given the pace. But where are you at on this today?
[00:07:00] swyx: Well, I mean, actually, own models have gone up.
And obviously I’m involved in Cognition, and Cursor is also doing a lot of its own model training. I think that is part of what I’ve been calling the agent lab playbook, where you start off with the state-of-the-art models from the big labs and specialize for your domain. But once you have enough workload and enough high-quality data from your users, you can obviously train your own models and save a lot on cost and latency and all that good stuff. You also get a marketing bonus of calling it some fancy name and putting out some research.
[00:07:38] Jacob Effron: From my seat, I can’t tell how much of it is actual value that’s provided to the end user and how much of it is that marketing bonus, right? It seems some combination of the...
[00:07:45] swyx: I think it’s both.
[00:07:46] Jacob Effron: Yeah.
[00:07:46] swyx: No, there actually is real value, and you know that for a number of reasons. One, even when it’s not subsidized, people do choose it as one of the top four or five. This is true of both Composer 2 and SWE-1.6: each is one of the top five models in a fair market, a free market, in a model switcher. People do choose it, and it’s not subsidized, so that’s as good as it gets. But beyond that, domain-specific models, for example for search, which both companies have, absolutely make a ton of sense. Everyone says, yeah, we should always do this. And honestly, I think the infrastructure for that is becoming easier with Thinking Machines’ Tinker, as well as Prime Intellect’s Lab stuff.
Yeah, I mean, this is one of those reversals of the bitter lesson, where you first bootstrap on the large, general-purpose models to get big. And as you get very well-defined workloads that are high in quantity but not in variance, you just distill down to a smaller model and run that on your own, which totally makes sense.
[00:08:50] Jacob Effron: What I’m less clear on is the DIY RL use case, which I think is mostly around improved quality for different things. Obviously there are probably more efficient ways to get a smaller model that’s faster and cheaper. And it’ll be interesting to see: two or three years ago you had this whole class of companies that were pre-training, claiming better outcomes in their domains, and then getting kind of cooked as each model iteration improved. I wonder whether a similar story plays out in the RL space when the focus is on pure outcomes and quality, not the cost side, where clearly your own models make a ton of sense for cost at scale.
[00:09:28] swyx: I think they are two sides of the same coin. You basically always want to hold quality constant, or trade off a little bit of quality for a drastic decrease in cost, and that’s true for everyone. One element I wanted to bring out, which is very much in favor of open models, is custom chips. So this would be Cerebras, but also Taalas, and then there’s a huge range of stuff in between. This has been a huge story this past year: everything non-NVIDIA is getting bid up, including, like, freaking MatX, which is very rewarding for me. But I think it’s one of those things where, suddenly, because the number of alternative
hardware options is increasing, the inference speed you can get is insanely high. We’re talking thousands of tokens per second instead of fewer than a hundred. So the trade-off against quality doesn’t hold as much anymore, because the speed is so high.
[00:10:24] Jacob Effron: Have you seen a lot of companies go all in on the alternative chips?
[00:10:26] swyx: So Cognition has, on Cerebras, and so has OpenAI. Beyond that, no, I don’t think so. Is that foreshadowing? Yeah. I used to be kind of a skeptic: okay, so what if my inference gets sped up from a hundred tokens per second to 200 tokens per second? It’s only 2x faster; it’s not that big a deal. But I think every 10x does unlock a different usage pattern. And we have proof in Taalas and some of the others that you can drastically improve inference speed, and what happens from there? I don’t even really know. It’s so hard to predict when entire applications just appear at once. And it also isn’t that expensive, right? So this is one of those things where I think the investment cycle is going to be multi-year, and I would caution people not to dismiss it too quickly.
[00:11:25] Jacob Effron: Yeah. One other infra question I was curious to get your thoughts on: it seems that increasingly, a lot of the cutting-edge infra companies are building for agents as the buyers or users of their product, right?
[00:11:35] swyx: Ooh...
[00:11:36] Jacob Effron: And...
[00:11:37] swyx: Another huge theme. Yeah.
[00:11:38] Jacob Effron: And I’m trying to figure out what you have to do differently when selling into agents. Are they just the ultimate rational developers, or is there...
[00:11:46] swyx: No, absolutely not.
I think they are easily prompt-injected and very tuned toward basically compounding existing winners.
[00:11:57] Jacob Effron: Yeah.
[00:11:57] swyx: So congrats if you won the lottery of getting into the training data right before 2023, because now you’re installed in there for the foreseeable future. One stat that Vercel CTO Malte dropped at my conference was that 60% of traffic to the admin app for configuring Vercel applications is now bots. It’s not human. So your primary customer is agents now, and it’s mostly coding agents, mostly people using the CLI or MCP or whatever. I think step one is: if it doesn’t exist as an API that agents can use, it doesn’t exist. That’s good hygiene anyway, to make everything available via API, but it’s an extra push for product people not to work only on the UI; you should probably work on the CLI stuff too. Beyond that, honestly, I come from the sensibility that everything you are trying to do for agent experience now, which is the term that Matt Biilmann at Netlify is trying to coin, is the same thing you should have been doing for developer experience. You should have had good docs; you should have had a consistent API that is mostly stateless; you should have had discoverability, or progressive disclosure, or search, or whatever. So now that people have the energy to do that for these customers, that’s great. Do I believe in extending beyond that into something like AEO for gaming the chatbots? Not necessarily, but obviously there are going to be huge advantages for people who figure out the short-term wins.
And short-term wins can compound.
[00:13:43] Jacob Effron: Do you think these compounding advantages for the companies that made the pre-training data cutoff will last? Over some period of time, I imagine they don’t persist. So as you think about what the selection criteria end up being three or four years from now, do you think it still mirrors exactly what you were saying before, that it’s exactly what you should have been doing all along to sell a good product to developers?
[00:14:01] swyx: It could be, except that I think in three or four years we’ll probably have much better memory and personalization, so general AEO or GEO won’t matter as much. I think whatever memory or personalization system we end up with will probably determine what you end up choosing much more than is currently the case, where it’s just frequency of mentions, let’s call it.
[00:14:26] Jacob Effron: Yeah, yeah.
[00:14:26] swyx: So you just spam quantity. And I think that’s, I mean, that’s something I’m looking forward to. I do think the fundamental exercise to work through for yourself is this: if you start a new disruptor company now against a big incumbent that everyone knows, like Supabase (Supabase is kind of the Postgres database incumbent), if you want to start a new Supabase, how would you compete with them? I don’t necessarily have the answer, but take Resend, which is relatively new; I think they started in 2023. There was a recent survey where people checked what Claude recommends by default. If you don’t prompt it with anything and just say “give me an email provider,” it says Resend in something like 70% of cases.
The fact that you can get in there with such a relatively short existence is, I think, encouraging.
[00:15:14] Jacob Effron: Yeah.
[00:15:14] swyx: I do think you want to do whatever it takes to get onto that very short list of mentions, because it’s not going to be 20 of them; it’s going to be, like, three.
[00:15:26] Jacob Effron: No, definitely. It feels like probably more consolidation than ever, or more of a winner-take-most market than the physics of go-to-market might have enabled in the past.
[00:15:38] swyx: The other thing is that semantic association is going to be very important, in the sense that you want to do the combo articles, like “use my thing with Vercel, with blah, blah,” and that all gets picked up in a corpus. So that’s probably one thing you want to do. Beyond that, I don’t know what else. It’s one of those things where I feel I’m behind. I don’t know how you feel about this, but...
[00:16:04] Jacob Effron: I think AI is just everyone constantly feeling like they’re behind...
[00:16:08] swyx: Yeah. With...
[00:16:09] Jacob Effron: I want to meet the person who doesn’t feel behind.
[00:16:11] swyx: But with AX, right? My stance is exactly what I said before: everything you should do for agents is something you should have done for humans anyway. So to the extent that you’re just getting more energy to do things for agents, great. But it’s hard to articulate what new thing you should be doing, apart from just more spam. Anyway, that would be my take right now. I do think there will be more turns at this. I think the personalization turn that is coming will be big.
And I don’t know what that looks like, because basically we feel kind of tapped out on the memory side of things.
[00:16:49] Jacob Effron: Yeah. Since we last chatted, you took this role over at Cognition, and you obviously have a front-row seat to the AI coding space today. Besides being the mother of all markets and this massive opportunity, I think people view coding as kind of a preview of what’s to come for many other spaces. I feel like agents are most advanced in coding, and I also feel like the competition between foundation models and application companies mirrors what we may see in other spaces. So maybe for our listeners, can you just lay out the state of the AI coding wars today?
[00:17:25] swyx: It is massive, right? And I don’t think that, last time we talked about this, we appreciated the size of it.
[00:17:32] Jacob Effron: No, I wish we did.
[00:17:33] swyx: The state of the AI coding wars today: both OpenAI and Anthropic have made it a priority to compete in coding. Anthropic is at something like $2.5 billion in ARR just from Claude Code (the way they recognize ARR is up for debate). For OpenAI, I don’t think a public number is known, but let’s call it $2 billion as well. And then Cursor is rumored to be at $2 billion. Those are the public numbers that are known. So these are huge markets that have been created in just the past year. Claude Code recently celebrated its one-year anniversary, which is pretty nice.
And then the other thing I see is that some people say, oh, here’s the relative penetration of Claude use cases, right? It’s coding at 50%, and then legal, health, whatever, for the remaining ones. There was a very popular tweet that said, okay, look at the empty space in all these other use cases: if you are a new founder today, you should be betting on the other stuff, on a sort of catch-up theory. My pushback is the same pushback I had on app over Google, which is: well, why is this time different? If it went from, let’s say, 10% to 50% in the past year, why can’t it keep going? And getting that wrong is very painful, because you could have just made the momentum bet instead of the mean-reversion bet. So I think that is the state of things now: people are very much into psychosis. They are getting rewarded for spending more rather than spending less. We’re not in the efficiency phase; we’re in a phase of capability exploration. So I think people who are crazier, who are more creative, get rewarded comparatively.
[00:19:27] Jacob Effron: Well, it’s interesting. It feels like behind these token-maxing leaderboards and whatnot is the first phase of this transition from a workforce perspective: you just have to show your employer, hey, I use these tools.
[00:19:37] swyx: Here’s the number of tokens I cost, and that’s it. They don’t care about the quality, right? It is maybe distasteful to someone who cares about the craft and all that, but directionally everyone just wants you to go up regardless. So it is not very discerning.
And it’s probably very sloppy, but I think it’s net fine, because we’re probably still underusing AI in general. So I think that’s very interesting. We had Ryan Lopopolo from OpenAI on the podcast, who spends a billion tokens a day. For those counting at home, that’s something like $10,000 worth of API tokens a day if he paid market rates, and most of us can’t afford that. And probably a lot of what he does is slop.
[00:20:25] Jacob Effron: Right.
[00:20:25] swyx: But if there were a new capability, he would discover it before you, because he was trying and you were not. And if you only do things that work, well, good for you. But the people who are going to discover the next hot thing are living at the edge.
[00:20:42] Jacob Effron: Right, and increasingly, living at the edge means just having the compute budget to run these experiments. That’s kind of similar to what living at the edge on the research side has always been: it was constrained in many ways by the amount of compute you had to run experiments. It feels similar now on the builder side, for actually putting these tools to use.
[00:20:56] swyx: Yeah. The other thing that’s very obvious is that Anthropic is kind of the high-price, premium player, where restricting limits, or even restricting model releases, is the name of the game. Whereas Codex is like, come on in, guys: use our SDK, use our login, we don’t care, we’re going to reset limits, whatever. You do want to try to exploit the subsidies where you can get them, and Codex is definitely super subsidized right now. Gemini is also very subsidized.
comparatively, while that’s going on, it’s not that bad to be a capabilities explorer on just the $200-a-month plan from Claude Code or from OpenAI. And my sense is that people aren’t even there yet.
[00:21:41] Jacob Effron: How do you think this market ultimately plays out? It’s obviously such a big market that any slice of it is interesting for anyone going after it. But I think what makes the coding market particularly interesting is that it feels like a foreshadowing of what will happen in any other application market that the foundation models eventually turn to, aim their models at, and gather data around. So does there end up being room for lots of different kinds of players? What do you think the end state of this market is, and do you think that’s applicable to other markets?
[00:22:10] swyx: The status quo is probably the most likely outcome, which is that there are two big players and a small range of longer-tail players that fit other use cases the two big players don’t. That feels right to me. For the market structure to change significantly, there would need to be significant change in the economics, the brand building, or the value propositions of the companies involved, and I haven’t seen any in the last six months that have really changed the story materially. So I feel like they’ll just keep going until something else happens. Something else happening meaning, say, Microsoft wakes up and goes, guys, we have GitHub, we’ll do something much bigger here than just Copilot. That would be a big change.
MSL has put out a model now, and I was at a breakfast with Alexandr Wang where they were like, yeah, we really, really want to go after the coding use case; we haven’t done anything yet. But don’t underestimate them, right? And similarly for the Chinese labs, I think they’re trying to go after it. Z.ai is doing stuff with GLM (Z.ai and GLM are the same thing). So everyone’s trying to get a piece of that pie. But the status quo has been pretty stable for almost a year now, I’d say.
[00:23:39] Jacob Effron: Yeah. And is the room for the application companies more on the enterprise side? What surface area do the model companies leave for application companies?
[00:23:50] swyx: Yeah, that’s a good one. It’s very much evolving. I will say, because OpenAI did not have this level of attention on coding a year ago, we just don’t have that much history, right? For example, the big push at OpenAI now is the super app. Is that a consumer thing? Is that a product portfolio rationalization thing? How much is that going to take attention away from coding at a time when they actually do want to put more into coding? I think it’s very unclear. So I do think both big labs, OpenAI and Anthropic (DeepMind and xAI are separate cases), are trying to find other TAM expansion areas: Claude Code for finance, Claude Cowork, all those things. Whereas I think Cursor and Cognition are comparatively just focused on coding. So I do think they leave space, and I think for the other verticals that means the same thing, right? They’re not going to be that intensely focused on that domain.
Except I would mark out finance and healthcare as the next ones they’re clearly going after. Comparatively, healthcare seems more thorny. There have been some announcements about it, but I would respect the finance work a lot more, just because the path to money is a lot clearer.
[00:25:12] Jacob Effron: Yeah. Maybe similar to the space that’s being left in these other domains, there’s obviously a lot that’s required to actually implement these tools in enterprises, versus just giving folks model access out of the box.
[00:25:27] swyx: Yeah. So the agent lab pitch is: we’ll do the last mile for you. Whereas I think the model labs tend to just trust the model and be minimalist about it. Both of them work.
[00:25:38] Jacob Effron: Yeah.
[00:25:38] swyx: I don’t necessarily think one beats the other for every use case. All I do know is that it does seem like the large enterprises want a dedicated partner that isn’t just the model labs, which is kind of interesting.
[00:25:55] Jacob Effron: We’ve been in this phase of pure capability exploration, and nothing has been better for the large labs, right? They’re always going to be at the frontier of capability exploration, so I think they have a very good relationship with a lot of these enterprises. But ultimately, over time, the incentive structure of these labs is always going to be maximal token consumption by the end customers they work with. And there are just so few companies that have actually gotten to massive scale. Maybe coding again is the most interesting, since it’s the first space that has just completely taken off, you know? You must love it every day.
It’s absolutely insane. And I think it...
[00:26:32] swyx: Gets even... Okay. I think we say good things about Cursor and Cognition, but look at the sheer liftoff of both Anthropic and OpenAI, because they have independent valuations. And let’s throw xAI in there, because it’s now IPO-ing at 1.2 trillion. That number is just mind-boggling. In normal investing or normal startups, there’s kind of a ceiling on market cap or valuation that you reach, and you go, all right, it’s going to be chiller from now on. And these guys are not slowing down. No.
[00:27:02] Jacob Effron: Well, I also think the dynamic around some of these later-stage companies is fascinating. In the past, in the venture world, if you got to a certain level of scale, the question around you was really more a valuation question. That’s why there were different types of venture investing, and the late-stage growth people were just incredible at judging a bit of what the ultimate market opportunity of a company is, but also the right way to value it. You knew it was within some band of outcomes. Sure, there was some variance, but it was relatively understood what that band was, and then maybe over time you got surprised to the upside. Whereas for any later-stage company now, even the labs themselves, the band of what that company might be worth, even in a year or two, is so massive because of how fast the ecosystem changes that every three months could be an existential-level event, to the upside or to the downside. And you are obviously seeing it on the positive side with code. If you think about a company like Anthropic,
for a while it was unclear if they were going to have access to enough capital to really stay in the race, right? And then coding hit at exactly the right time. They had the perfect model for it. They executed brilliantly. And now they are one of the most valuable companies in the world.
[00:28:13] swyx: At the same time, I have zero sympathy for OpenAI, because they’re crushing it and they’re all rich. Being number two at coding or whatever is a high-class champagne problem to have. Who cares? You’re doing great.
[00:28:27] Jacob Effron: Yeah. It’s funny, though. You would be closer to this, given that you’re in the AI coding space, but a lot of people I talk to think Codex is just as good as Claude Code, if not better, right? One thing I’ve been really surprised by, and maybe Claude Code is a better product in some ways, I’m curious about your thoughts: in consumer AI with ChatGPT, you saw this big first-mover advantage, right? Admittedly, today Claude and Gemini are great products. It’s not abundantly clear ChatGPT is any better, but people stick with ChatGPT. It’s the first thing that introduced them.
[00:28:56] swyx: They stay, but they’re not growing anymore. I don’t know if you’ve seen...
[00:28:59] Jacob Effron: Right. But that, to me, is more of a product problem. It’s not like they’ve lost share to someone else. My understanding is the overall problem with consumer AI today is much more: how do you take this tool, which for knowledge workers like us is an incredible magic tool, and make it a daily active use tool for a lot of people around the world? What are the products? It’s kind of a category-wide problem.
In coding, for example, the entire space has gone parabolic. There may be some relative growth in other consumer AI players, but it’s not like consumer AI as a category is going parabolic and ChatGPT is failing to capture most of it. I think the larger problem is much more that the category has hit a bit of a plateau: people haven’t figured out how to bring tons more users on board, or how to increase the frequency of use among existing users. So it seems more of a category-wide problem than a massive market share change. I was going to draw the comparison to the coding space, where Claude Code was obviously the first product to introduce people to this magical experience. By all accounts, Codex is pretty damn close to as good, if not better. You would have thought that would not be a super sticky product surface area, and yet it turns out the first lab to introduce you to an experience really does keep a lot of the focus.
[00:30:12] swyx: I think maybe it’s still early days. ChatGPT is three-plus years old, and Claude Code just turned a year. So give it time, you know? Definitely a lot of people have switched to Codex. Maybe that will keep going; it’s really hard to tell. I do think that because we are in this high-volatility, high-temperature phase, the loyalty and stickiness to first movers and category creators is not as high as it might be in some other areas we’ve looked at in our careers.
[00:30:47] Jacob Effron: Yeah. Though I’ve been surprised by the Claude Code thing. I would’ve thought that, in many ways, I always worried about the...
[00:30:52] swyx: Enterprise.
You thought it would’ve been gone by now?
[00:30:53] Jacob Effron: Not gone. But I always assumed the consumer business of these companies would be quite sticky, while the enterprise API business was actually, in some ways, your least loyal buyers. They would move to...
[00:31:05] swyx: Right, right. But they worked out that it wasn’t the enterprise API; it was the enterprise product.
[00:31:09] Jacob Effron: Totally. And maybe that was the secret. But the amount of lock-in, or just default behavior, that has happened in that space is more than I might’ve imagined, with two products that by all accounts are pretty damn similar.
[00:31:22] swyx: No fight there. I will say I do think Codex is still in a bit of catch-up in terms of personal experience. The only things I like about Codex are Spark, and I feel like the skills integration is a little bit better, and the speed is a bit better, maybe because it’s written in Rust or whatever. Very minor things that you’re almost telling yourself rather than objectively assessing between the two of them. Vibes-wise, I do think that’s what’s going on. I feel like the missing question in this whole debate is: why is this so concentrated in only two names, right? Where is the Gemini presence? Where’s the xAI presence?
They are trying; it’s just that they haven’t made that much progress yet.
[00:32:12] Jacob Effron: But what the Claude Code moment does show, and in some ways it actually makes you a little more bullish on the potential for someone else to catch up, is that if you’re the first to introduce some magical net-new product experience, that might actually be stickier than one might have imagined.
[00:32:27] swyx: Right, right. Okay. Yeah.
[00:32:28] Jacob Effron: And so everyone can believe they have a shot at that.
[00:32:29] swyx: What do you think that new product experience might be? This is a failure of imagination on my part, but people always say the thing that will save us is being first to the next new thing, and I always wonder: what is it?
[00:32:41] Jacob Effron: Yeah.
[00:32:42] swyx: It’s like...
[00:32:45] Jacob Effron: I don’t know, something around a consumer agent and computer-use hybrid. I think we’re obviously just scratching the surface on the consumer side.
[00:32:53] swyx: So my current theory is that OpenClaw is a vision of things to come.
[00:32:58] Jacob Effron: Totally.
[00:32:58] swyx: And it’s good that OpenAI has the association with OpenClaw, but by no means do they have the right to win it. The general thesis I have been pursuing is that, the same way 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else. Coding agents continue to win, because they generate software and software eats the world. So it’s kind of the transitive property: software eats the world, coding agents eat software, therefore coding agents eat the world.
Which is an interesting...
[00:33:30] Jacob Effron: Yeah, and breaking containment is always an easier phrase in the consumer context than in the enterprise one. You’ve seen people run these really cool experiments in their own personal lives.
[00:33:37] swyx: Yes.
[00:33:38] Jacob Effron: And obviously everyone’s focused on the enterprise side now, on how you create these experiences. I feel like, with the vibes, people love to have these narratives that everything has completely shifted. But OpenAI organizationally, volatility aside, has great products, a great team, and great models, and everyone else in the world is incentivized for there to be two or three more. Everyone would love more great model companies. So I feel like the natural forces of the world revolt when any one company is too much the star of the show, right? There are so many people in the ecosystem who are incentivized for that not to happen. So I’d be shocked if we don’t have a reversion of vibes, maybe not completely the other way, but at least toward something more equal at some point over the next six to 12 months.
[00:34:24] swyx: I think there are just different stages. When you talk about the world wanting more model companies, I think about the neolabs.
[00:34:30] Jacob Effron: Yeah.
[00:34:31] swyx: And is it fair to say none of them have really broken through in the past year?
[00:34:35] Jacob Effron: I think that’s totally fair.
[00:34:37] swyx: Which is rough. So how are we going to grow that diversity in choice? That’s it.
[00:34:46] Jacob Effron: Yeah.
It’ll be really interesting to see what ends up happening with that. And you’ve seen folks like Nvidia be very incentivized to make sure there’s a broader platform of other model providers.
[00:34:57] swyx: I know people say this, but I don’t think they try that hard. Nvidia tries harder to build neoclouds...
[00:35:05] Jacob Effron: Yeah.
[00:35:06] swyx: ...than neolabs.
[00:35:07] Jacob Effron: Well, they try pretty damn hard to build neoclouds, so...
[00:35:09] swyx: That’s...
[00:35:09] Jacob Effron: Yeah.
[00:35:10] swyx: But the CoreWeaves of the world are in a much happier place than any neolab built on top of them.
[00:35:18] Jacob Effron: Yeah. Though one might argue it’s easier to enable a neocloud to be successful. You can’t will a neolab into existence the same way, so...
[00:35:25] swyx: Nvidia has more direct control over it, for sure.
[00:35:27] Jacob Effron: What else is catching your eye today on the startup side? There’s obviously this whole narrative that the foundation models announce a product and every stock goes down 15%.
[00:35:36] swyx: Yeah.
[00:35:37] Jacob Effron: Do you worry about the foundation models just eating into a bunch of these startup categories?
[00:35:43] swyx: Not really. Okay, there’s the point of view of being an investor in startups, and there’s the point of view of whether you want to start something. And honestly, the downside for all of these is so
minimal, in the sense that the worst case is you just get hired into one of these labs anyway. So there’s a market for people who just do things, try things, and try to execute competently, even if it doesn’t work out commercially, even if it just wasn’t that great. That’s your job interview to go into one of these places anyway. So I don’t feel that worry from a very, very small startup perspective. Mid-size startups, yes. I will say there’s been a lot of dead LLM infra and a lot of LLM infra consolidation, like the Langfuses of the world getting absorbed into ClickHouse. And I think people have maybe worked out the domain-specific playbook, and I think that’s okay. So I’m not that worried about it. I’d be more worried about traditional SaaS with low NPS. This is the whole AI-versus-SaaS debate that’s been going on, and I’m literally going through that exact thing in my company, so I’m thinking through this on a very visceral level, right? On one hand, you have the people who say: you vibe coders don’t appreciate the amount of work that goes into a CRM. You think you can rip out Salesforce? So did the 30 entrepreneurs before you. You classically underestimate the things you don’t deeply know, and the target audience is not you. At the same time, we have never been able to build and customize software so easily, and you’re not going to use 90% of the things in Salesforce anyway.
[00:37:33] Jacob Effron: So what have you done internally?
[00:37:34] swyx: So we have the main SaaS that we use for event management and sponsor management, and we paid $200K a year for that.
Not huge, but chunky for my scale. And I could probably spend $2,000 and build a custom version of that. The trick has been dealing with the rest of my team and getting them on board, because I’m the most ethical person on my team, but I can’t make that decision by myself. And I’ve been telling other CEOs and team leaders the same thing: you can be super Claude-pilled, you can have super LLM psychosis and think that’s okay, but you have to bring your team with you. I think the widening disparity in LLM psychosis within companies is causing real rifts. On one hand, the people who are less AI-native are not getting with the picture. They’re actually behind; they’re not waking up to the fact that everything they think is necessary is not actually that necessary. In fact, it would be better for them if they just held their nose, went in, and came out the other side only talking to agents in natural language. Their lives would actually be better, and they’re just close-minded. There’s that perspective. The other perspective is: oh, you vibe coder, you did this in a weekend and got the 80% solution, and now the rest of your employees have to pick up the rest of your s**t, which you thought you were so hot and amazing at, but actually you didn’t figure it out, and actually LLMs are still useless at this, and blah, blah, blah. So I think there’s this huge debate going on in every company right now. I have a small microcosm of it, and it’s making me hesitate to pull the trigger. But I will at some point. Maybe I’ve put it off for one year, but not for five.
But SaaS is definitely getting squeezed. It does make me think there’s an opportunity for a more AI-native system of record that is not just Postgres or MongoDB, although both are very good. Maybe it’s Convex; people bring up Convex a lot. I don’t know. I just feel like the quote-unquote Firebase of AI apps isn’t really a thing yet, beyond what we have, which is fine. It’s just that we could probably start in a more rapid iteration cycle before scaling up to something like Postgres or MongoDB, which are older tech. I was at a dinner with Mike Krieger, the CPO of Anthropic, and we were just going around the room asking what people were most worried about. And for me, instead of security, I brought up biosafety.
[00:40:21] Jacob Effron: Classic.
[00:40:22] swyx: Actually, like I said, it was cliché and classic, and the rest of the table were like, what do you mean? Someone sitting at home can manufacture a virus that wipes out half of humanity.
[00:40:32] Jacob Effron: Almost like the OG Geoffrey Hinton: this is why you should be scared.
[00:40:35] swyx: I’m like, yeah, read the risk reports; this is the thing. And Mike was just sitting there, knowing he was sitting on Mythos, going, actually, it’s security. I think part of it is very good marketing. Too good. I would actually advise Anthropic to tone down the marketing, because it is just a very good model and you don’t have to make so many marketing claims around it. At the same time, it is not really a private model if you give it to 40 companies, each of which has 10,000 employees or whatever, right?
It’s not private; there are bad actors in there.
[00:41:18] Jacob Effron: Yeah. Hopefully not as bad as releasing it widely. But it’s an interesting case study for how many model releases might go. This might be the first model release that looks like all the rest of them from now on, right?
[00:41:31] swyx: There’s an overall product strategy at Anthropic of restricting access and maybe bundling product with model. Whereas OpenAI has definitely been a lot more philosophically aligned on: we will just enable access everywhere, and we don’t know what will come out of it, right?
[00:41:51] Jacob Effron: Right. Though, in this current moment, the cynical take is that it also just ties to the amount of compute both companies have.
[00:41:56] swyx: Yeah, right. I think that’s true. I do think the dawn of models larger than 10 trillion parameters is very interesting. I think it’s a temporary phenomenon, because we have much larger compute clusters coming online for everyone over the next three to five years. That’s already written in the cards.
[00:42:18] Jacob Effron: Yeah.
[00:42:19] swyx: So will we have rationing of models above 10 trillion parameters in two years? I don’t think so. I think everyone will have...
[00:42:29] Jacob Effron: No, we’ll just have rationing of the next phase.
[00:42:30] swyx: Right. But that’s almost as it should be. My classic example, and this is just me theorizing, not anything confirmed by Google: when Google announced Gemini, they actually announced three sizes, Flash, Pro, and Ultra. They never released Ultra; they only have Pro and Flash.
So my theory is that they have Ultra sitting in a basement and they’re just distilling from it for Flash and Pro. Which I actually think is as it should be for any lab.
[00:43:02] Jacob Effron: Yeah. Just because those are the models people actually want to end up using, and it’s just cost-prohibitive...
[00:43:06] swyx: Yeah, it’s cost. It’s not the want; it’s just the cost. It is interesting that for a while I was considering the theory that models capped out at 2 trillion parameters, and I think that’s proving to be wrong. Well, then, if I’m wrong, how wrong am I? Do we do 200 trillion? Do we do two quadrillion, whatever? I don’t think we have a straight answer to that. But it’s interesting that we are continuing to scale the number of parameters when everyone can kind of see that we’re not going to get the next thousand or million x from this paradigm. So the Ilyas of the world are working on other model architecture improvements. We need a different scaling law, I guess, because I feel like people already feel we’re tapped out on this one. The end state of this is we turn most of the world into data centers, and I don’t know if we want that.
[00:44:08] Jacob Effron: Yeah. If the returns to intelligence are there, maybe that’s not so bad.
[00:44:13] swyx: I think there’s just a sheer amount of unscalability that is rankling people’s sensibilities right now, especially in terms of context length. My classic quote is that context length is the slowest-scaling factor in LLMs.
[00:44:30] Jacob Effron: Yeah.
[00:44:30] swyx: We took maybe
three years to go from a 4,000-token context length to a million, and that’s about it. Gemini has had a million-token context length for two years now, and no one’s using it. So memory is probably going to be the biggest limiting constraint on all these things.
[00:44:50] Jacob Effron: Yeah, it certainly seems that way. I’m curious: over the last year since you last recorded, what’s one thing you’ve changed your mind on?
[00:44:57] swyx: I feel like I was kind of bearish on open models last year, in the sense that I had just done the podcast with Ankur...
[00:45:07] Jacob Effron: Yeah.
[00:45:08] swyx: ...of Braintrust, and he has a good cross-section of all the top AI companies, and he said the market share of open source was 5% and going down. I think that’s changed. I think it’s going up. And even if...
[00:45:22] Jacob Effron: Even though the capability gap does seem to be increasing, depending on the...
[00:45:26] swyx: It’s hard to tell. It’s really hard to tell. For listeners, the capability gap increasing is on public benchmarks. Let’s say you’re comparing Mythos versus, I don’t know, GPT-OSS or GLM 5.1. It is really hard to tell, because even if they were closing the gap, you wouldn’t believe they were closing it that much, since it’s very easy to game the benchmarks. So you just don’t really know. All you know is that there are somewhat objective OpenRouter stats on what people choose in a free market, and people do choose some of these open models in significant volume, except that a lot of them are heavily discounted, so you need to price-adjust these things. So even if that were true, which I’m not sure of, I feel like the numbers are up now instead of down. I think the
separation between what the top-tier agent labs are doing and what the average AI startup or the average GPT wrapper is doing is significant enough that you should not worry about the mean industry number. You should cohort things: here’s the median, here’s the bottom 80%, and here’s the top 20%. The top 20% acts very differently from the bottom 80%. And the top 20%, which is all I care about, is definitely moving towards more open models. The Fireworks and the Togethers are crushing it, and so will all the fine-tuners, right? I think maybe last time we even said things like fine-tuning as a service doesn’t work. Well, now it’s going to work. It’s a derivative of the open models market.
[00:47:01] Jacob Effron: Well, and also the workloads are scaling to the point where people care about cost and speed more and more.
[00:47:06] swyx: Yeah.
[00:47:06] Jacob Effron: Moving from pure use-case discovery of what these models can do to: okay, we know what they’re going to do at scale, now let’s do it cheaper and faster.
[00:47:14] swyx: Yeah. So that change is probably the most significant in my mind. I always like to do the mental math of, and this is how I think about scheduling a learning rate: when you’ve been wrong once, what else were you wrong on? I’m kind of working through it. To me, the other thing was the coding one, which obviously I have now come full 360 on. But I think people are not appreciating dark factories enough, which I don’t know if you’ve discussed on the pod yet.
[00:47:44] Jacob Effron: No.
[00:47:45] swyx: So this is kind of a StrongDM / Simon Willison term. The general idea is that there are different levels of AI coding psychosis.
You can have, um, the very first level, which, by the way, I first encountered at Cognition five months ago: zero human-written code. Yeah. Right. Which seems like a reasonable thing now, but was less reasonable five months ago. The next frontier, which sounds as crazy today as zero coding did in the past, is zero human review.

[00:48:17] Jacob Effron: Yeah.

[00:48:18] swyx: Like, just check it in without even reviewing it. Very few people are doing that, but OpenAI is exploring this, and I feel like it’s definitely the only scalable way to do this. Uh, which just means you have to kind of flip the SDLC, or change large amounts of what you normally do. Um, which is probably things you should have done anyway: more testing, more automated verification, or whatever. But that is a frontier at which, when you have unlocked it in your company, um, you are just gonna produce much more software than you’ve ever had. Uh, and it’s gonna be so disposable, so cheap, that you can probably innovate a lot on quality as well. That quantity helps you get to quality.

[00:49:00] Jacob Effron: Yeah.

[00:49:01] swyx: Which I think people are very uncomfortable with, ‘cause people associate more quantity with slop.

[00:49:07] Jacob Effron: Right. No, it’s back to exactly the discussion we were having on the reaction to these token-maxing scoreboards, and the idea that today maybe that’s not the best sign of productivity and efficiency, but going forward

[00:49:18] swyx: Yeah, but you still get rewarded for it. So they’re like, f**k it, whatever. But I think the people who do best in 2026 are not the cynics who go, “Oh, that’s just slop. I’m not gonna participate in that.”
They’re like, “Okay, this is happening with or without me. Bend this the right way.”

[00:49:36] Jacob Effron: Yeah, no, I love that. Um, I mean, for me, a related thing on the open-source model side is that for so long I really didn’t think it made any sense to do any sort of RL, post-training, pre-training, anything you could do to improve overall quality. Certainly for latency and cost, it always made sense to me. But for overall quality, like, God, you just get that for free in the models three to six months later. What I’m starting to change my tune on a little bit is, you know, hearing all these app companies talk about how “we build stuff and then we throw it out three months later as the models improve.” You’re like, okay, well then what you’re doing for capability improvement is just another version of that, right? I still don’t think your RL or post-training is gonna give you a better model for years and years to come. But I think you still have to be pretty rigorous about whether that is the single best thing you can do to solve a customer problem. And oftentimes it’s literally just adding more data, feeding more data to these models even via connectors, or, I don’t know, doing some clever engineering on the back end or whatever it is. But if the single best thing you can do in that three-month period to improve your customers’ outcomes is post-training in some way that really improves the model’s output, even if you throw it out three months later because the general models catch up, it still might have been worth doing. And so I think I’m more open to

[00:50:45] swyx: You throw out the results, but you don’t throw out the raw data.

[00:50:47] Jacob Effron: Totally.

[00:50:48] swyx: And, like, so like

[00:50:48] Jacob Effron: Right. Then you just run it again.
And so basically, obviously at a cost of like $10 million maybe that’s too much, but there’s some level of cost where

[00:50:55] swyx: No,

[00:50:55] Jacob Effron: it’s the, it’s

[00:50:56] swyx: not even 10 million,

[00:50:56] Jacob Effron: right? No, of course it’s not. Uh, you know,

[00:50:58] swyx: Yeah.

[00:50:58] Jacob Effron: There’s obviously some level of investment at which it’s the equivalent of just staffing four engineers to go build something for three months.

[00:51:04] swyx: Yeah. Uh, so the other thing, for listeners, I’m just gonna leave some droplets of info. Uh, look into the long-trajectory and synthetic-rubrics work that people are doing; it’s very important, uh, including something that’s called Dr. GRPO. I’ll just leave those key search terms in there. Um, what it means is that RL is going much more multi-turn than people think, and that means you can customize the models along way more specific dimensions than traditional, let’s call it, SFT, or the sort of shallow RL that was done a year ago. Um, so, like, hundreds of turns.

[00:51:44] Jacob Effron: Yeah.

[00:51:45] swyx: Uh, and I think that leads you down a path of complete domain specificity.

[00:51:50] Jacob Effron: What else? Of the unanswered questions in AI today, what are you looking for in the next year? What are you paying close attention to?

[00:51:58] swyx: I have a few theses for what is the next frontier. Uh, one is memory, which, memory and personalization, we talked about. The other is really, uh, world models, which we’ve done a small series on, from Fei-Fei Li. Yeah, of course. To, uh, even Moonlake. Um, and, uh, General Intuition, and there’s a lot of debate as to, like...
The relative importance of this. I think a lot of it manifests as, like, 3D static worlds that you kind of inhabit for a little bit and walk around in, and they’re like, cool, but how does this help me with my B2B SaaS? Right. And

[00:52:29] Jacob Effron: It’s like all the hype now is robotics, right?

[00:52:31] swyx: Yeah. Um, and there’s obviously a correlation between world models and embodied vision and experience, which leads to robotics. Uh, but I think world models are very interesting just for improving intelligence itself, beyond the next-token prediction paradigm. Um, and so I think people are kind of testing the edges around that. One of our top articles this year so far has been on adversarial world models. Um, I do think, if you don’t do anything else, just read Fei-Fei’s essay on spatial intelligence and why LLMs don’t have it. She may not have the solution yet, but she has the right problem statement. Yeah. And so everyone else is trying to solve that problem statement in their own way. Um, and let’s see who wins. But I don’t think it does you any favors to equate world models to robotics, or world models to gaming, or to the current manifestations, because what is at stake is a much more important conception of intelligence than just answering questions. It is: does the AI understand what a table is? What matter is, what physics is? For those who are movie fans, it’s almost like Good Will Hunting, where, um, Matt Damon knows everything because he read it in a book, but he’s never lived.

[00:53:54] Jacob Effron: Great, great scene with

[00:53:55] swyx: Robin Williams. With Robin Williams. And I look at that scene and I go, that’s exactly the difference between a very intelligent LLM that knows everything but hasn’t experienced anything.

[00:54:04] Jacob Effron: Wow.
That’s an awesome note to end on. Uh, have you used that before? That’s great.

[00:54:08] swyx: Yeah. So one thing I’ve done with Latent Space is I moved to, uh, adding daily writeups. Yeah. And one of the times I was doing a daily writeup, I wrote that.

[00:54:16] Jacob Effron: That’s a great one. I love that. Um, well, it’s been a ton of fun. Thanks so much for coming, man.

[00:54:21] Jacob Effron: I’m Jacob Effron, and this has been Unsupervised Learning, a podcast where I get to talk to the smartest people in AI and ask them tons of questions about what’s happening with models and what it means for businesses and the world. As I hope is clear, I have a ton of fun doing this. It’s a nights-and-weekends project in addition to my day job as an investor at Redpoint, but our ability to get these incredible guests on really comes from folks like you subscribing to the podcast and sharing it with friends. It’s really what ultimately makes this whole thing work. So please consider doing that. And thank you so much for your support and for listening. We’ll see you next episode.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Early bird discounts for the San Francisco World’s Fair, the biggest AIE gathering of the year, end today - prices will go up by ~$500 tonight so do please lock in ASAP!

From near-universal AI tool adoption inside Shopify to internal systems for ML experimentation, auto-research, customer simulation, and ultra-low-latency search, Mikhail Parakhin joins us for a deep dive into what it actually looks like when a 20-year-old, $200B software company goes all-in on AI. We cover why Shopify has become much more vocal about its internal stack, what changed after the December model-quality inflection, and why the real bottleneck in AI coding is no longer generation, but review, CI/CD, and deployment stability.

We also go inside Tangle, Tangent, and SimGym, three major AI initiatives that Shopify is pursuing to make experimentation reproducible, optimization automatic, customer behavior simulatable, and search and catalog intelligence faster and cheaper at scale. Along the way, Mikhail explains UCP, Liquid AI, why token budgets are directionally right but often measured badly, why AI-written code can still increase bugs in production, what makes Shopify’s customer simulation defensible, and what he learned from the Sydney era at Bing.

We discuss:

* Mikhail’s path from running a major Microsoft business unit spanning Windows, Edge, Bing, and ads to becoming CTO of Shopify
* Why Shopify is talking more publicly about AI now, and why staying at the frontier has become necessary for the company
* Shopify’s internal AI adoption curve, the December inflection, and why CLI-style tools are rising faster than traditional IDE-based tools
* Why Jensen Huang is directionally right on token budgets, but raw token count is still the wrong way to evaluate engineering output
* Why the real unlock is not more agents in parallel, but better critique loops, stronger models, and spending more on review than generation
* Why AI coding can still lead to more bugs in production even if models write cleaner code on average than humans
* Why Shopify built its own PR review flow, and why Mikhail thinks most off-the-shelf review tools miss the point
* How PR volume, test failures, and deployment rollback are becoming the real bottlenecks in the agent era
* Why Git, pull requests, and CI/CD may need a new metaphor once code is written at machine speed
* What Tangle is, and how Shopify uses it to make ML and data workflows reproducible, collaborative, and production-ready from the start
* Why Tangle is different from Airflow, and why content-addressed caching creates network effects across teams
* What Tangent is, and how Shopify is using auto-research loops to optimize search, themes, prompt compression, storage, and more
* Why Tangent is becoming a democratizing tool for PMs and domain experts, not just ML engineers
* Why AutoML finally feels real in the LLM era, and where auto-research still falls short today
* Why Tangle, Tangent, and SimGym become much more powerful when combined into one system
* What SimGym is, why simulated customers only work if you have real historical behavior, and why Shopify’s data gives it a moat
* How SimGym evolved from comparing A/B variants to telling merchants what to change on a single live storefront to raise conversions
* Why customer simulation is so expensive, from multimodal models to browser farms to serving and distillation costs
* How Shopify models merchant and buyer trajectories, runs counterfactuals, and thinks about interventions like discounts, campaigns, and notifications
* Why category-level behavior is so different across commerce, and why ideas like Chinese Restaurant Processes are showing up again in practice
* Shopify’s new UCP and catalog work, including runtime product search, bulk lookups, and identity linking
* Why Shopify is using Liquid AI, and why Mikhail sees it as the first genuinely competitive non-transformer architecture he has used in practice
* Where Liquid already works inside Shopify today, from low-latency query understanding to large-scale catalog and Sidekick Pulse workloads
* Whether Liquid could become frontier-scale with enough compute, and why Shopify remains pragmatic and merit-based about model choice
* Who Shopify is hiring right now across ML, data science, and distributed databases
* The Sydney story at Bing, why its personality was not an accident, and what Mikhail learned from deliberately shaping AI character early on

Mikhail Parakhin

* LinkedIn: https://www.linkedin.com/in/mikhail-parakhin/
* X: https://x.com/MParakhin

Timestamps

00:00:00 Introduction: Mikhail Parakhin, Microsoft, and Shopify
00:01:16 Why Shopify Is Talking More About AI
00:02:29 Internal AI Adoption at Shopify and the December Inflection
00:06:54 Token Budgets, Jensen Huang, and Why Usage Metrics Can Mislead
00:10:55 Why Shopify Built Its Own AI PR Review System
00:12:38 AI Coding, More Bugs, and the Real Deployment Bottleneck
00:14:11 Why Git, PRs, and CI/CD May Need to Change for Agents
00:18:24 Tangle: Shopify’s Reproducible ML and Data Workflow Engine
00:21:19 Why Tangle Is Different from Airflow
00:26:14 Tangent: Auto Research for Optimization and Experimentation
00:30:07 How Tangent Democratizes Experimentation Beyond ML Engineers
00:33:06 The Limits of Auto Research
00:36:36 Why Tangle, Tangent, and SimGym Compound Together
00:37:20 SimGym: Simulating Customers with Shopify’s Historical Data
00:42:47 The Infra Behind SimGym
00:46:00 Why SimGym Gets Better with Real Customer History
00:47:30 Counterfactuals, HSTU, and Modeling Merchant Trajectories
00:51:55 CRPs, Clustering, and Category-Level Customer Behavior
00:53:30 UCP, Shopify Catalog, and Identity Linking
00:55:07 Liquid AI: Why Shopify Uses Non-Transformer Models
00:59:13 Real Shopify Use Cases for Liquid
01:03:00 Can Liquid Scale into a Frontier Model?
01:09:49 Hiring at Shopify: ML, Data Science, and Databases
01:10:43 Sydney at Bing: Personality Shaping and AI Character
01:13:32 Closing Thoughts

Transcript

[00:00:00] swyx: Okay.
We’re here in the studio, a remote studio, with Mikhail Parakhin, CTO of Shopify. Welcome.

[00:00:08] Mikhail Parakhin: Thank you. Welcome.

[00:00:10] swyx: I don’t even know if I should introduce you as CTO of Shopify. I feel like you have many identities. Uh, you led the Bing ML team, I guess, or the ads team. I don’t know, uh, people variously refer to you as CEO or, uh, I don’t know what that previous role at Microsoft was.

[00:00:29] Mikhail Parakhin: Uh, yeah, my previous role at Microsoft was, I actually was the CEO of one of Microsoft’s business units, which included, as we discussed, all the things that people like to laugh about, uh, including Windows and Edge and Bing and ads and everything.

[00:00:47] swyx: Yeah, yeah. What a wild time. You’ve obviously done a lot since you landed at Shopify. Uh, one of the reasons I reached out was because you started promoting more internal tooling, uh, primarily Tangle, but also a lot of people have seen and adopted Tobi’s QMD, uh, and obviously Shopify has always been leading in terms of engineering. I think it’s just more recent that you guys have been more vocal about your AI adoption. Is that true?

[00:01:16] Mikhail Parakhin: Well, I think AI tools in general are a fairly recent development, uh, and at this stage of its development Shopify is developing AI in-house, building tools that use AI, and interfacing with the wider AI community, and all of that is on a sort of runaway trajectory. So, just as a natural byproduct, we talk about it more also.
Just yesterday, Andrej Karpathy was famously tweeting about whether there are ways to organize your agents so they store data and then look up that data, so that you don’t have to re-research or lose context every time. And a little bit tongue in cheek, I tweeted, “Hey, we’ve done it much earlier, and we even have different approaches, Tobi and I.” Tobi, of course, is a big fan of QMD, and I’m more of a SQLite fan. But, uh, yeah, very similar things that we’ve already done here. The point is, we’re a very dynamic, explosively growing company, and we have to be at the forefront of AI adoption, obviously.

[00:02:29] swyx: Yeah. Yeah. Um, your team kindly prepared some slides that we were gonna bring up on the screen. I think I can screen share, and then we can kind of go through some of the shocking stats that maybe put some numbers to what exactly is going on. So here we have, uh, an internal AI tool adoption chart. What are we looking at here?

[00:02:54] Mikhail Parakhin: Yeah, these are very interesting statistics. Uh, this is the number of daily active workers, you know, think of DAU, basically the active users of

[00:03:05] swyx: Yeah...

[00:03:05] Mikhail Parakhin: an AI tool as a percentage of all the people in the company, right? And then- Yeah ...the different AI tools. And, uh, you could see two things here. One is that green is the total. So you could see that it really approaches near-universal usage by now. It’s hard to do your job now without interacting deeply with at least one tool.
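Mikhail mentions preferring SQLite as the place where agents store what they have learned and look it up later, instead of re-researching or losing context. The sketch below is a minimal illustration of that idea under stated assumptions, not Shopify’s implementation: the `notes` table layout and the helper names `open_store`, `remember`, and `recall` are hypothetical, and plain case-insensitive `LIKE` matching stands in for whatever retrieval a real agent setup would use.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical agent notes database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        " id INTEGER PRIMARY KEY,"
        " topic TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " created TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def remember(conn, topic, body):
    """Persist one finding; `with conn` commits the transaction."""
    with conn:
        conn.execute("INSERT INTO notes (topic, body) VALUES (?, ?)", (topic, body))


def recall(conn, query, limit=5):
    """Return the newest notes whose topic or body contains `query`.

    SQLite's LIKE is case-insensitive for ASCII text by default.
    """
    pattern = f"%{query}%"
    return conn.execute(
        "SELECT topic, body FROM notes"
        " WHERE topic LIKE ? OR body LIKE ?"
        " ORDER BY id DESC LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()


conn = open_store()
remember(conn, "ci", "Flaky test in checkout suite")
remember(conn, "search", "Prefer short synonyms")
print(recall(conn, "flaky"))  # [('ci', 'Flaky test in checkout suite')]
```

Passing a file path instead of `":memory:"` would let notes survive across agent sessions, which is the point of the pattern; parameter substitution (`?`) keeps agent-generated text from being interpreted as SQL.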
Another interesting thing you could see, as many people commented, is that December was the phase transition, when suddenly models got good enough that everything took off and started growing. Many people noticed that small improvements accumulated into this big change in roughly the December timeframe.

[00:03:52] swyx: Yeah.

[00:03:52] Mikhail Parakhin: The other thing I would claim you could see is that CLI-based tools, and tools that don’t require you to look at the code, are becoming more popular: you could see various versions of Claude Code and Codex and Pi and internal development tools taking off. Uh, exactly, yeah, and blue is our River, our internal agent for coding. Whereas tools that require IDEs, such as GitHub Copilot or Cursor, are not exactly shrinking, but they’re not growing as fast. The red line is the IDE kind of tools, so you could see that they’re not experiencing as fast a growth.

[00:04:37] swyx: As I understand it, basically every employee has their choice, right, of whatever tool they use, and then you’re just doing a daily survey or something.

[00:04:47] Mikhail Parakhin: Exactly. And, uh, the push is to get your job done; you can use any tool, and we effectively fund unlimited tokens for everybody. Uh, we do try to control the models that people use, but from the bottom, not from the top. We basically say, “Hey, please don’t use anything less than Opus 4.6.”

[00:05:09] swyx: Oh.

[00:05:10] Mikhail Parakhin: Some people end up using GPT-5.4 extra high. Some people use Opus 4.6. Um, there are pluses and minuses in going for the full one-million-token context window versus not. But we try to discourage people from using anything less than that.

[00:05:28] swyx: Yeah, yeah. Got it, got it.
Uh, I mean, the next chart here really kind of shows the expansion and the December 2025 inflection, right? That people are using a lot of tokens. I think it’s also really interesting that no one was kind of abusing it in 2025. Comparatively, uh, to this year, there was almost no growth. I mean, it still probably grew fifty percent.

[00:05:56] Mikhail Parakhin: Yeah. This is just a different scale. It’s still exponential- Yeah, yeah ...growth, just at a different- ...rate of expansion. Uh, there was an inflection point, and, Shawn, I would claim the super interesting part here is that you could see the distribution becoming more and more skewed. Yes. The top percentiles grow faster. So that means- Yeah ...the people in the top ten percentile, their consumption grows faster than the seventy-fifth and so forth. So the distribution skews more and more towards the highest users, which is... I don’t know what it tells me. It feels not ideal, to be honest. Or maybe it’s okay. We’ll see.

[00:06:36] swyx: Why does it feel not ideal? Is it because of quantity over quality, or what’s the concern?

[00:06:42] Mikhail Parakhin: Because take it to the limit. That means, you know, if this rate of separation continued- Ah, yes ...for a year, there would be one person consuming all the tokens. So it’s just kind of strange.

[00:06:54] swyx: Yeah, I mean, I think internal teaching and all that will help distribute things more widely. But in the early days, of course, the people who are more AI-pilled will obviously find more ways to use it than the people who are less AI-pilled. Maybe let’s call it that. I’ll just kinda quickly pause from the...
You know, we will go back to the rest of the slides, but I just wanna review: there are a lot of CTOs of large companies like yourself who are all considering some kind of token budget, right? It’s something that Jensen Huang has been talking about, where if your $200K engineer is not using $100K of tokens every year, they’re underutilizing coding agents. Of course, Jensen Huang would say that, but it seems a very quantity-over-quality approach, and some people are basically saying, well, is this comparable to judging engineer quality by lines of code, right? Which we also know is kind of flawed, but better than nothing. So I don’t know if you have a management take here on how to view these kinds of metrics.

[00:08:02] Mikhail Parakhin: Well, I mean, you’re baiting me. This is my favorite topic. Uh, if you let me, I’ll probably talk for two hours on just this. I have a lot of things to say. I do think Jensen got a lot of bad press, people saying, “Oh, of course, you know, the- ...the cake seller says you need more cakes.” You know? Like, of course. Uh, but I actually think that’s undeserved. I think he’s actually right. Uh, I do think-

[00:08:33] swyx: He’s directionally correct.

[00:08:35] Mikhail Parakhin: Yeah. Yeah. He’s directionally correct for sure. Uh-

[00:08:37] swyx: Who knows what the right number is? Yeah.

[00:08:39] Mikhail Parakhin: The thing that I do want to say, and this is something that we learned through trial and error and is very important, is two things. One is that it’s not just about consuming tokens. You can consume tokens, and in fact the anti-pattern is running multiple agents, too many agents in parallel, that don’t communicate with each other. That’s almost useless compared to just fewer agents, and it burns through tokens very quickly.
What works is setting up the right critique loop, especially with high-quality models: one agent does something, the other one, ideally with a different model, critiques it and suggests ways to improve it, the agent redoes it with this critique, and so on. It takes much longer, so people don’t like it because latency goes up. You know, they have to wait while this debate is happening. But the quality of the code is much higher. And another thing, since you mentioned that the overall token budget is just like lines of code: lines of code are exploding for everybody right now, partially because AI is really more verbose, but partially just because AI can write a lot more code; you know, it doesn’t get tired. And so you have to have a very strong narrow waist during PR review. Otherwise, the number of bugs will go through the roof. It’s this unexpected consequence of sheer volume trumping everything. I would claim that by now a good model writes code with fewer bugs on average than the average human. But since they write so much more of it, more bugs will make it into production. So you have to-

[00:10:26] swyx: You still have

[00:10:26] Mikhail Parakhin: more bugs. Yeah. You have to have very rigorous PR reviews, also automated, of course. And you have to spend a lot of budget there. For me, actually, the important metric is the ratio of budget spent during code generation versus budget spent on expensive tokens, like GPT-5.4 Pro or Deep Think from Gemini, checking PRs during review.

[00:10:55] swyx: Yeah, totally. Uh, I noticed in your chart you didn’t have any review tools. Do you just use, let’s say, Claude Code to review? Or do you have another set of review tools, like Greptile or CodeRabbit? Uh, Devin has a review tool.
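The critique loop Mikhail describes (one agent drafts, a second agent, ideally on a different model, critiques, and the first agent redoes the work with that critique) can be sketched as a small sequential control loop. This is a hedged illustration: `generate` and `critique` are hypothetical callables standing in for calls to two different models, and the toy stubs exist only to make the control flow runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Review:
    approved: bool  # the critic's verdict on the current draft
    feedback: str   # suggestions handed back to the generator


@dataclass
class LoopResult:
    draft: str
    rounds: int
    approved: bool
    history: List[str] = field(default_factory=list)


def critique_loop(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], Review],
    max_rounds: int = 3,
) -> LoopResult:
    """Alternate generator and critic until the critic approves or rounds run out."""
    feedback: Optional[str] = None
    history: List[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        # The generator sees the previous critique (None on the first round).
        draft = generate(task, feedback)
        history.append(draft)
        review = critique(task, draft)
        if review.approved:
            return LoopResult(draft, round_no, True, history)
        feedback = review.feedback
    return LoopResult(draft, max_rounds, False, history)


# Toy stand-ins for two different models (hypothetical, illustration only).
def toy_generate(task: str, feedback: Optional[str]) -> str:
    return task if feedback is None else f"{task} + {feedback}"


def toy_critique(task: str, draft: str) -> Review:
    if "add tests" in draft:
        return Review(approved=True, feedback="")
    return Review(approved=False, feedback="add tests")


result = critique_loop("implement parser", toy_generate, toy_critique)
print(result.rounds, result.approved, result.draft)  # 2 True implement parser + add tests
```

The rounds run strictly one after another, so wall-clock latency grows with every critique round; that is the trade-off Mikhail accepts in exchange for higher-quality code, as opposed to fanning out many uncoordinated agents in parallel.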
I don’t know if you’ve tried those specialist review tools.

[00:11:13] Mikhail Parakhin: You’re stepping on a sore toe a little bit right now, because in the graphs I was only showing public tools. Uh, I haven’t found a good PR review tool that does what I think should be done. And partially my thinking is that’s because it goes against both what people emotionally prefer and, frankly, even some of the business models that the companies run. At PR review time, you want to run the largest models. That means, I don’t know, Codex or Claude Code is not gonna cut it. You need pro-level models if you really want to stem the tide of bugs going into production. And you need to spend a lot of time with the models taking turns, but you don’t want a big swarm of agents. So in fact, you end up in a dualistic world where you don’t generate that many tokens. You in fact generate few tokens, but it takes a long time, because these are expensive models taking turns rather than many, many agents trying to do many things in parallel. So that’s why I feel like I haven’t found good tools, and we are using our own for PR review for now.

[00:12:33] swyx: Yeah. Yeah. I mean, I think a lot of companies are building their own, especially for their needs, right?

[00:12:38] Mikhail Parakhin: Mm-hmm.

[00:12:38] swyx: Um, going back to the slides, you also have a chart here on PR merge growth, where we’re now at thirty percent month on month rather than ten percent. Uh, and the estimated complexity is also going up. You know, this is productivity, right? ‘Cause presumably there’s more stuff going into the code base and more features getting worked on. I’m curious about the backlog, right?
Like, I actually don’t mind a pro-level model taking an hour or two hours to review my PR, because I’ve dealt with humans who take a week to review my PR, right? And I keep pinging them on Slack, “Hey, hey, review my PR.” So, you know, I think there’s some trade-off here where it still makes sense.

[00:13:18] Mikhail Parakhin: Exactly. That’s exactly my point. On one hand, you can tolerate longer latencies at PR time. On the other hand, right now the real problem is not spending time waiting for PR review. The real problem is that since there’s so much more code- Yeah ...the probability of at least some tests failing goes up, and then you keep failing, and you have to find the offending PR, evict it, retest without that PR, and so the deployment cycle becomes much longer. So, in terms of the overall time to deploy, it’s actually a net time saving if you spend more time on a bigger model, like thinking for an hour, because then you don’t have to spend all that time during testing and rolling back the deployment.

[00:14:03] swyx: Yeah, totally. It’s still worth it. You know, you don’t look at the individual; you look at the aggregate, and at the change in the aggregate system.

[00:14:11] Mikhail Parakhin: Exactly.

[00:14:11] swyx: I’m kind of curious if this PR mentality and the CI/CD paradigm will be changed eventually. Obviously a lot of people want a new GitHub, but I even wonder if Git is the problem, right? Is that the bottleneck? Is the concept of a PR a bottleneck? Do you guys use stacked diffs? I don’t know if that’s, like, a merge queue, stacked diff type of thing.

[00:14:34] Mikhail Parakhin: We use stacks; we use Graphite. We’ve worked with Graphite a lot. Uh, so we use stacked PRs.
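The failure mode Mikhail describes, where a batch of merged PRs fails tests and the offending PR has to be found, evicted, and the rest retested, is commonly handled by bisecting the batch. The sketch below is a generic illustration, not a description of Shopify’s pipeline or Graphite’s merge queue. It assumes every failure is caused by an individual PR on its own (not by an interaction between two PRs), and `passes` is a hypothetical callback that runs the test suite on the base branch plus a given subset of PRs.

```python
from typing import Callable, List, Optional, Sequence, Tuple

PassesFn = Callable[[Sequence[str]], bool]


def find_culprit(prs: Sequence[str], passes: PassesFn) -> Optional[str]:
    """Binary-search for one PR that breaks the build; None if the batch is green."""
    candidates = list(prs)
    if passes(candidates):
        return None
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left = candidates[:mid]
        # If the left half fails on its own, a culprit is in it; otherwise,
        # under the single-PR-failure assumption, one is in the right half.
        candidates = left if not passes(left) else candidates[mid:]
    return candidates[0]


def land_batch(prs: Sequence[str], passes: PassesFn) -> Tuple[List[str], List[str]]:
    """Evict culprits one at a time and retest until the remaining batch is green."""
    remaining = list(prs)
    evicted: List[str] = []
    while remaining and not passes(remaining):
        culprit = find_culprit(remaining, passes)
        evicted.append(culprit)
        remaining.remove(culprit)
    return remaining, evicted


# Hypothetical batch in which pr3 and pr6 each break the tests on their own.
BROKEN = {"pr3", "pr6"}


def passes(batch: Sequence[str]) -> bool:
    return not (set(batch) & BROKEN)


landed, evicted = land_batch([f"pr{i}" for i in range(1, 9)], passes)
print(landed, evicted)  # ['pr1', 'pr2', 'pr4', 'pr5', 'pr7', 'pr8'] ['pr3', 'pr6']
```

Each culprit costs roughly log2(n) extra test runs, and every run may take a long time on a large codebase, which is why letting fewer bad PRs into the batch, through slower but stronger review, can shorten the overall deploy cycle.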
I think the overall CI/CD in general, and the interaction with the code repository, is clearly the main issue and the bottleneck for us right now, uh, and top of mind. I would say we probably need a different metaphor, or a whole different design for how to process it in the new agentic world. I haven’t seen anything dramatically better yet. I think everybody right now is just trying to keep their head above water, ‘cause there are so many PRs, and everybody’s CI/CD pipelines start creaking, the times are increasing, the number of bugs slipping by is increasing, and you have to clamp down. And so we are a little bit in this situation where we need to first stabilize that story and then start thinking, hey, what could a completely different, new world look like? I know some people working on it. I haven’t seen anything super compelling yet, but clearly the old things that were designed for humans will need to be morphed into something new.

[00:15:53] swyx: One of the things that I think about is that the merge conflict is basically a global mutex on the whole system, right? And in human organizations, we do have something like that: it’s the company standup. But other than that, it’s actually fitting for us to be somewhat decentralized, somewhat plugged into one stream of information, but somewhat lossy. Like, it’s okay that not every delivery has atomic consistency. Sometimes we’re not dealing with a database.

[00:16:27] Mikhail Parakhin: This is a very good point, uh, because since humans don’t write code too fast, that global mutex is not too bad. Once you-

[00:16:36] swyx: Yes...

[00:16:37] Mikhail Parakhin: start writing code at the speed of a machine, it becomes the bottleneck. Then what do you do?
Maybe (and I can’t believe I’m saying this, because I’m a lifelong opponent of microservices and always thought they were a really bad idea) now that you say it, in the new world microservices will make a comeback, because then you can ship tiny things independently, and managing all that complexity automatically will be much easier. I don’t know. We’ll have to see.
[00:17:10] swyx: Yeah. I don’t know what the Microsoft or Shopify setup is, but I read a paper from Google where they have a monorepo that deploys into microservices, right? The other concept I think about a lot is Chaos Monkey from Netflix: being able to create a robust system where you have service discovery, independent microservices, and probably a fair amount of duplication. That’s how an organic system scales, that you have that... I don’t know what you call it. Slack? Robustness? Duplication? These aren’t exactly the terms I’m looking for, but I can’t think of the words. Okay. I was going to go into Tangent and Tangle. We’ve discussed the overall stats that Shopify has, but I think some pretty cool stuff you guys are working on is your ML experimentation and your auto research training pipeline. Presumably you’re much closer to this one, because it’s a personal hobby of yours. How would you explain them together? I think we have a slide with the system diagram.
[00:18:24] Mikhail Parakhin: Yeah. Tangle first, and then Tangent as a...
[00:18:27] swyx: Yeah.
[00:18:28] Mikhail Parakhin: ...as a thing on top of Tangle.
Tangle is, I claim, the third generation of systems for running any kind of data processing, with a bit of a skew toward ML experiments but not exclusively: any data-processing task where you need to iterate and share, and where you have enough scale that you want maximum efficiency. You know how you’d normally work? Imagine you’re a data scientist or an ML practitioner. You’d get Jupyter notebooks, or maybe your Python scripts, and you’d manage the data, produce TSV files, and put them in some JFS or something. Then you’d notice, oh, it has these weird missing values, so you go and write another script that replaces them with...
[00:19:20] swyx: Ah.
[00:19:21] Mikhail Parakhin: ...dashes. Then you run something else: “Oh, I need to filter bots,” so you run some LightGBM model that removes the bots. Then you get everything into shape, start experimenting, run multiple experiments, and then you’re like, “Oh my God, this experiment is worse.” You undo, and you can’t get back to the previous result: “What did I do?” Then you finally get everything working and start throwing it over the fence to production. You replicate it, those things don’t work, and sometimes you don’t notice that you forgot some feature naming and the features don’t match. But imagine you did everything, and six months later you have to repeat it because now there’s more data, or you want to do another pass, and you’re like, “What did I do?” Or the script crashes now, or the path has changed. Then you spend another month doing digital archaeology on your own history, right? Now multiply that by many, many teams.
Now imagine you have an intern you want to ramp up. You have to show that intern: “Look, here’s the folder, here are the scripts, ask your Claude agent to figure it out.” Then the Claude agent does something, and you’re like, “Ah, right, that was the wrong folder. I forgot to tell you, I actually have this other thing that I’d forgotten about myself.” That’s the daily life we all know if you’re a data scientist, a machine learning practitioner, or really anyone who manages data.
[00:21:00] swyx: Yeah. I used to do this on the quant finance side, at my hedge fund. We did this before Airflow, and then obviously Airflow came along, and more recently Dagster, which in my mind is what I would use for that shape of problem, where you have to materialize assets and create a pipeline.
[00:21:19] Mikhail Parakhin: That’s a very good segue. Airflow is great, but Airflow is more about having something you want to run repeatedly in production on a schedule. It’s less about you as a team developing things and being able to share: grabbing the standard pipeline and saying, “Hey, I want to change this tiny little component in the huge sea of data processing, I want to run ten experiments on it, and I want to do hyperparameter optimization.” All of that is very hard to do with Airflow and very easy to do with Tangle. Tangle is all about a group of people (it might be agents too, nowadays) running experiments cheaply, collaborating, and sharing results. You don’t need to understand everything fully. You clone somebody else’s experiment or pipeline, change a small piece, run it, get it to a production state, and then ship in one click. So then...
You don’t have to port it into any other system to run it in production. You can just run the same experiment; it’s fully production-ready. Again, as I said, it’s a third-generation system. At least in my career, I would claim Aether was the first to pioneer this type of approach. Then there was Nirvana at Yandex, which was kind of the second take on it. Now this one aggregates the learnings from all of those, and from Airflow as well, to get to a state where, when you try it, it feels kind of magical. Everything is now based on content hashes, so even if the version changed, as long as the output didn’t change, nothing is rerun. It’s very efficient. If multiple people start experiments that need the same data preprocessing, it isn’t repeated multiple times; it’s automatically done only once. If you start ten experiments that all require some data preparation as the first step, you don’t have to coordinate, and you don’t have to know that other people are starting it. Composability is very easy, you can use any language you want, and it’s very visual: you can see everything immediately, edit it easily, assemble small things with just mouse clicks if you want to, and share and clone. It’s also fully deterministic, in the sense that if you rerun it a second time, you get exactly the same results. You’ll never have to do digital archaeology. So full versioning and everything is there too.
[00:24:06] swyx: So people can... It’s open source. Go to the GitHub repo and check it out. There’s also a really good blog post about it. I think all of this is really appealing.
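Mikhail’s description of content hashing (“even if the version changed, as long as the output didn’t change, nothing is rerun”) matches what build systems call content-addressed caching with early cutoff. Below is a minimal Python sketch, assuming every step is deterministic and its inputs and outputs are JSON-serializable. `ContentCache` and the `clean`/`total` steps are hypothetical illustrations, not Tangle’s API. A step’s cache key is derived from its name, its code version, and the content of its inputs, so when an upstream step is re-versioned but produces identical output, everything downstream is a cache hit.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


def digest(value: Any) -> str:
    """Stable SHA-256 of a JSON-serializable value (sorted keys, so dict order doesn't matter)."""
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode("utf-8")).hexdigest()


class ContentCache:
    """Toy content-addressed step cache; assumes every step is deterministic."""

    def __init__(self) -> None:
        self.store: Dict[str, Any] = {}  # cache key -> step output
        self.runs: List[str] = []  # names of steps that actually executed

    def run(self, name: str, version: str, fn: Callable[..., Any], *inputs: Any) -> Any:
        # The key covers the step's name, code version, and the content of its inputs,
        # not who launched it or when, so identical work is shared across users.
        key = digest([name, version, [digest(x) for x in inputs]])
        if key not in self.store:
            self.runs.append(name)
            self.store[key] = fn(*inputs)
        return self.store[key]


cache = ContentCache()
raw = [3, None, 5, None]


def clean_v1(rows):
    return [0 if r is None else r for r in rows]


def clean_v2(rows):  # a refactor that changes the code version but not the output
    return [r if r is not None else 0 for r in rows]


def total(rows):
    return sum(rows)


cleaned = cache.run("clean", "v1", clean_v1, raw)
cache.run("total", "v1", total, cleaned)
cleaned = cache.run("clean", "v2", clean_v2, raw)  # new version, so this step reruns
cache.run("total", "v1", total, cleaned)  # same input content: early cutoff, cache hit
cache.run("clean", "v1", clean_v1, raw)  # a colleague repeats the original run: cache hit
print(cache.runs)  # ['clean', 'total', 'clean']
```

Keying on input *content* rather than upstream *versions* is the design choice that produces the cutoff: `total` never learns that `clean` was re-versioned, because the bytes it receives are identical.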
The thing that sells me the most is the development-to-production transition, right? I think a lot of people haven’t really solved that rigorously. We develop really well in Python notebooks, but that’s obviously not a production-ready process, so any way of solving that is very appealing. The other thing you mentioned that raised my eyebrows was content-based caching, which is very much an efficiency measure: recalculating only based on content addressing, which makes sense. It surprised me that the savings could be this large, but maybe I just haven’t worked at your scale, where there’s so much duplication that people rerun things because they changed a single ID upstream.
[00:25:10] Mikhail Parakhin: It does, yeah. But it’s not only that you rerun. The main savings come from the fact that you ran it, got your job done, and moved on. Then somebody else, in some department you didn’t know existed, runs the same task, but on a newer version.
[00:25:27] swyx: Yeah.
[00:25:27] Mikhail Parakhin: Right now, in most organizations, you can’t even find out about that, so you can’t even measure that you’re spending that time twice, right? Here, if everybody’s on Tangle, that’s detected automatically, along with the fact that the output is the same. For that person, it just looks like their experiment suddenly jumped forward, right? That’s because there’s a network effect of multiple people helping each other.
[00:25:51] swyx: Yeah.
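Content hashing covers reuse across time. The other case Mikhail mentions, ten experiments starting the same preprocessing at once, also needs in-flight deduplication, so the step is not launched ten times before the first run finishes. Below is a minimal thread-based sketch of that “single-flight” pattern, assuming a single process. A real orchestrator would presumably coordinate through a shared job store instead, and the class and key names here are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional


class _Call:
    """Shared slot for one in-flight (or finished) computation."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Run each key at most once; concurrent and later callers reuse the same result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[str, _Call] = {}
        self.executions = 0

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._calls.get(key)
            owner = call is None
            if owner:
                call = _Call()
                self._calls[key] = call
                self.executions += 1
        if owner:
            try:
                call.value = fn()
            except BaseException as exc:  # propagate the failure to every waiter
                call.error = exc
            finally:
                call.done.set()
        call.done.wait()
        if call.error is not None:
            raise call.error
        return call.value


def prepare_features() -> int:
    time.sleep(0.05)  # stand-in for an expensive preprocessing step
    return sum(range(1_000))


flights = SingleFlight()
with ThreadPoolExecutor(max_workers=10) as pool:
    # Ten "experiments" ask for the same preprocessing output at once; the key would
    # normally be the step's content hash (this one is made up).
    results = list(pool.map(lambda _: flights.do("prep-features:v1", prepare_features), range(10)))
print(flights.executions, set(results))  # 1 {499500}
```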
This is one of those things that was designed as a platform from the beginning rather than as an individual developer’s tool, right? And everything flows downstream from there. That’s the Tangle orchestrator, which manages jobs. We’ve seen a few versions of this, and these are obviously unique approaches you guys have figured out. And then there’s Tangent.
[00:26:14] Mikhail Parakhin: Yeah. Tangent is basically an automatic auto research loop that can help, and kind of do your work for you. Andrej Karpathy recently popularized this with auto research. Remember how he said he was speedrunning this... Yeah, you know the story. Here we’re basically bringing the same capability into Tangle. Tangent is just an agent that can run multiple experiments, figure out what can be changed, and keep rerunning and modifying until it optimizes some goal, some loss function, whatever you need to achieve. In general, I would say that if you’re not using an auto-research-like approach in whatever you do, literally whatever you do, then you’re missing out. At Shopify we saw it spreading like wildfire: anything where you can put measurements can be done dramatically better. Our...
[00:27:19] swyx: Mm-hmm.
[00:27:20] Mikhail Parakhin: ...speed of HTML templatization, the completely new UX templatization, reducing latency for Liquid themes. In our search, we recently moved from (it’s hard even to quote) eight hundred QPS to forty-two hundred QPS with the same quality, purely through optimizations by an auto research loop that kept running and changing code in our index serving, on the same number of machines, just increasing the throughput. We managed to improve the quality of gisting in our machine learning process.
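At its core, the loop Mikhail describes is greedy hill climbing: propose a change, run the experiment, keep it only if the metric improves. Below is a minimal Python sketch under loud assumptions: `propose` is a hypothetical stand-in for the agent (an LLM reading past results and editing code or config), here replaced by random tweaks to a toy quadratic “loss”. It is not how Tangent or Karpathy’s auto research is implemented, only the shape of the loop.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
History = List[Tuple[Config, float]]


def auto_research(
    baseline: Config,
    evaluate: Callable[[Config], float],  # lower is better, e.g. validation loss or latency
    propose: Callable[[Config, History], Config],  # e.g. an LLM agent; here a random stand-in
    budget: int,
) -> Tuple[Config, float, History]:
    """Greedy keep-if-better experiment loop: propose, run, keep strict improvements."""
    best, best_score = baseline, evaluate(baseline)
    history: History = [(best, best_score)]
    for _ in range(budget):
        candidate = propose(best, history)  # the proposer can look at every past result
        score = evaluate(candidate)
        history.append((candidate, score))
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score, history


# Toy stand-ins: a quadratic "loss" and a seeded random proposer instead of an agent.
rng = random.Random(0)


def toy_loss(cfg: Config) -> float:
    return (cfg["lr"] - 0.3) ** 2 + (cfg["weight_decay"] - 0.01) ** 2


def random_tweak(cfg: Config, history: History) -> Config:
    key = rng.choice(sorted(cfg))
    return {**cfg, key: cfg[key] * rng.choice([0.5, 0.8, 1.25, 2.0])}


baseline = {"lr": 1.0, "weight_decay": 0.1}
best, best_score, history = auto_research(baseline, toy_loss, random_tweak, budget=50)
print(best, round(best_score, 6))
```

Swapping `random_tweak` for something that actually reads `history` is where LLMs change the picture: the proposer can reason about why past experiments failed instead of sampling blindly.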
Gisting is the prompt compression technique that allows for lower latency and, actually, slightly higher quality. So literally across all different walks of life, and it doesn’t have to be AI-related. We had a reduction in storage because the agents would go and find datasets that were clearly derivative, so you don’t need to store things twice. We found, somewhat embarrassingly, that one of the largest tables was hashing random IDs into other random IDs, and we literally kept only one. It was translating between two random IDs hashed into each other.
[00:28:37] swyx: So it has access to the code as well, so it can check what the hell it’s doing?
[00:28:42] Mikhail Parakhin: It can run at two levels. At the superficial level, it can just use existing components and reshuffle them: you can grab XGBoost, grab some PyTorch module, grab other tools, and combine them. At a deeper level, since Tangle is all CLI-based underneath, and every component is really a wrapped CLI call plus a YAML file, it can analyze code, create new components, and keep iterating. So you can make quick modifications to existing pipelines with components that are already pre-baked, or you can create new components and...
[00:29:29] swyx: Yeah.
[00:29:29] Mikhail Parakhin: ...keep iterating on those. So auto research is probably the thing I’ve been most excited about in the last two months, and we see it spreading like wildfire. Everybody, every day, every...
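Mikhail says every Tangle component is really a wrapped CLI call plus a YAML file, which is what lets an agent author new components rather than only reshuffling existing ones. Below is a hypothetical Python sketch of that idea. The field names (`command`, `inputs`, `outputs`), the `{placeholder}` syntax, and the `train.py` script are illustrative assumptions, not Tangle’s actual schema, and it uses a plain dataclass instead of YAML so it runs with the standard library alone.

```python
import shlex
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    """Hypothetical pipeline component: a CLI argv template plus declared input/output ports."""

    name: str
    command: List[str]  # argv template; "{port}" placeholders are filled at run time
    inputs: List[str]
    outputs: List[str]

    def render(self, bindings: Dict[str, str]) -> List[str]:
        """Substitute concrete paths for every declared port and return the argv list."""
        missing = [port for port in self.inputs + self.outputs if port not in bindings]
        if missing:
            raise KeyError(f"{self.name}: unbound ports {missing}")
        return [part.format(**bindings) for part in self.command]


train = Component(
    name="train_xgboost",
    command=["python", "train.py", "--data", "{train_data}", "--max-depth", "6", "--out", "{model}"],
    inputs=["train_data"],
    outputs=["model"],
)
argv = train.render({"train_data": "data/train.tsv", "model": "artifacts/model.bin"})
print(shlex.join(argv))
# python train.py --data data/train.tsv --max-depth 6 --out artifacts/model.bin
```

Because a component is just declared ports plus a command line, an agent only has to write a script and a small spec to extend the pipeline; the orchestrator can hash, cache, and wire it like any other step.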
well, every day, every minute, I get somebody’s Slack message saying, “Oh, look how much better I made it.” And it’s all through auto research.
[00:29:53] swyx: Is this democratized in some way? Is it your ML engineers and researchers doing this, or do your regular PMs and software engineers also have the ability to use Tangent?
[00:30:07] Mikhail Parakhin: This is an awesome question. Tangle in general, and Tangent in particular, are extremely democratizing. They are the main tools for...
[00:30:15] swyx: ’Cause I don’t need the details.
[00:30:16] Mikhail Parakhin: Yeah, exactly. They were initially used by ML and AI engineers, but then, literally as you said, PMs... The highest user right now is one of the PMs in our org, Sartak. He was number one by usage, because PMs are just energetic and knowledgeable, and now it unlocks a lot of capability where you don’t have to change code manually.
[00:30:39] swyx: I mean, it kind of cuts the ML engineer out of the process, because the PMs have the domain knowledge and the ability to think from first principles about what results they want, and they even have access to the data that needs to go in. So in some ways, this is the magic black box we’ve always wanted for training and, I guess, hill climbing, whatever.
[00:31:04] Mikhail Parakhin: It’s basically Claude Code for your AI development situation, right? Now you don’t have to know exactly how the algorithms work. You can just bring your domain knowledge, expertise, and product knowledge, and iterate within Tangent until you’ve gotten the results you need.
[00:31:21] swyx: In my previous roles, every time someone pitched AutoML, I was always like, “This is not going to work.
It’s always going to be a flop.” Somehow it’s working now. Presumably the answer is that now we have LLMs and they’re good enough, right? It’s an emergent property that we can do auto research. But it doesn’t feel that satisfying: how come we didn’t do this before? We just did parameter search and, I don’t know, maybe that’s it.
[00:31:48] Mikhail Parakhin: Yeah. Bayesian optimization and hyperparameter optimization were the facet of AutoML that was used very actively, and incidentally that’s also built into Tangle. But I know Patrice Simard very well, and he was such a proponent of AutoML; he literally spent his career trying to democratize it. Without LLMs, it just turned out to be very hard. You would have flexibility within a certain narrow domain, but it was hard to scale wider. Now, with LLMs, it’s suddenly like a magic wand, and suddenly everybody is an AutoML expert.
[00:32:28] swyx: Yeah, I think it’s multiple things, right? I’m just going to bring up the chart again. LLMs can do the monitoring very well, which is potentially unbounded and super unstructured. They can do the analysis very well... Basically, much more intelligence is poured into every single step. Maybe nothing has structurally changed about AutoML, but it’s just more intelligent and can handle more unstructured work.
[00:32:53] Mikhail Parakhin: Exactly.
[00:32:54] swyx: Any flaws you’ve run into? Everyone is drinking the Kool-Aid: oh my God, time savings, performance improvements. What issues have come up?
[00:33:06] Mikhail Parakhin: This is really cool, but it’s not a solution to all the world’s problems, for sure.
The limitations are usually... and this is where we get into somewhat subjective territory. I can only share what I’ve seen so far, and I’m sure the situation is changing. Maybe after I say it, many people will reach out and say, “Hey, what about this? You don’t know about that,” and they’ll probably be right. But what I’ve seen is that auto research is very good at doing fairly obvious things that you don’t have the bandwidth to do, that you didn’t notice, or that you’re maybe not aware of, like some standard practices. It is not good at doing something completely out of distribution, something you’d have to think about for multiple days, something nobody has done before. I once set up an experiment on my hobby project and let it run; it ended up running for several weeks. It was full production scale, so the runs were slow, and in the end it performed over four hundred experiments, and only one was successful. I’m like, “Okay, that’s good.” But...
[00:34:18] swyx: But it saved time.
[00:34:19] Mikhail Parakhin: Yeah, it saved me time. If I were doing four hundred experiments myself, my batting average, as I said, would have been much higher, I’m sure. But first of all, it would take me about three years to do four hundred experiments. And I didn’t have to do them; the machines did, at the price of electricity. And I got one improvement. Honestly, when I started that experiment, my thinking was to go and show, “Hey, Andrej, maybe you just don’t know how to optimize.” I felt super smart, because my problem had been optimized for many years and was fully tuned. I didn’t expect auto research to find anything at all. Yet it did.
So instead of making fun of Andrej, I ended up a big, big supporter. Yeah, that’s exactly the tweet. Yes.
[00:35:10] swyx: You and Tobi really go back and forth online a lot, which is really funny. Think of it as an eval for the optimality of the code it’s running on. It almost reminds me of Kolmogorov complexity: there’s some optimal thing you’re trying to reduce down to, I guess. So you should congratulate yourself that you had ninety-nine percent optimality.
[00:35:36] Mikhail Parakhin: Exactly, yeah. I think Andrej really deserves a lot of credit for popularizing this approach. It’s incredibly powerful and cool, and even him just mentioning it led to a lot of gains in a lot of places in the industry, so we should be thankful.
[00:35:56] swyx: Yeah. I think he also just has... I don’t know what it is. It’s a simple, self-contained project that people can take and apply to other things, which is one thing, but there’s also the name. Somehow no one managed to call their thing auto research. Naming things is very important. I think that’s mostly our coverage of Tangle and Tangent. Obviously there’s a lot of ML infra at Shopify that people can dive into. We’re about to go into SimGym, but before we do that, any other broader comments around this whole effort? Where is it leading?
[00:36:36] Mikhail Parakhin: As a segue to SimGym, all these things start composing strongly. You can see a huge unlock when you look at each one of the tools and see that they’re extremely useful. Tangle is useful by itself. Auto research is useful by itself. SimGym is useful by itself.
If you combine all three, you create a synergistic effect. I think that’s why we even wanted to cover them today: this is something that, if you go back even five years, would have been unthinkable. Replicating it would have been either incredibly costly or impossible, probably requiring thousands of people.
[00:37:20] swyx: Well, we have serverless intelligence, right? So yes, you do have thousands of intelligences, just not humans, and that’s close enough, right? Even if they’re not AGI, they’re close enough to do the task you need them to do, and that’s plenty for a lot of routine knowledge work. Okay, let’s get into SimGym. I was surprised to see that it’s apparently one of your most popular launches. I think Sim AI, and Joon Sung Park, who did the Smallville thing... there’s a very small cottage industry of people trying to do the simulated-customer thing. I think a lot of people maybe don’t super trust this yet, because they’re like, well, obviously the agents would just do what you prompt them to do, right? But tell us about the inspiration or origin story.
[00:38:10] Mikhail Parakhin: That’s actually exactly the thing I wanted to cover, because if you don’t have the historical data, all you can do is prompt agents in a vacuum, and they will do exactly what you prompt them to do. In fact, when I first proposed it (and this was initially my brainchild, if I can boast a little), even Tobi said, “But wouldn’t they just repeat what you tell them?” And I said, “Yes, except Shopify has decades of history of how people made changes and what those changes resulted in, in terms of sales.” So now what we can do is, we have this...
It’s noisy data. Usually websites... things are never in isolation. It’s almost never an A/B experiment; it’s always an “AA” experiment, which has two meanings, but basically, at different times you run two different things. But if you aggregate everything together and apply denoising and a collaborative-filtering-like approach, you can extract a very clear signal. Then you can optimize your agents. That’s why it took so long: almost a year of that optimization, of us just sitting and fiddling. We had internal correlation goals; the internal goal was to hit zero point seven correlation with add-to-cart events, for example. That is, if we ran a real A/B test, the simulation should replicate the same success that humans had, or lack thereof. It took forever, and I don’t think it’s easily replicable, because who else would have that data? You have to have decades’ worth of historical data. The other thing you need is the infrastructure and the scale, right? Because, again, what we found is that to get stat-sig results, you need to run a lot of simulations with a lot of agents, and those are expensive. You’re taking actions in the browser because you want real friction. You want to get the image of what humans will see, because you want to detect effects like, “If I make my images larger, will I have more sales or fewer sales?” Usually people’s intuition here, by the way, is that if I increase my images, I’ll have more sales because they look nicer. Designers all love sparse, big images. Usually your sales tank, right? But in the HTML, all the characters look the same; only the size tag differs. So it’s very hard.
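The calibration target Mikhail mentions (zero point seven correlation with add-to-cart events) can be checked with a plain Pearson correlation between the lift the simulator predicted and the lift a real A/B test observed, one pair per historical experiment. Below is a minimal Python sketch. The function names and the one-pair-per-experiment scheme are assumptions, and it leaves out the denoising and collaborative-filtering step he describes.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def simulator_meets_target(
    simulated_lift: Sequence[float], observed_lift: Sequence[float], target: float = 0.7
) -> bool:
    """True if the simulated and real experiment outcomes correlate at least `target`."""
    return pearson(simulated_lift, observed_lift) >= target
```

Correlation rewards getting the *direction and ranking* of effects right across many experiments, which is what a merchant needs to choose between variants, even if the simulator’s absolute conversion numbers are off.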
So you have to take visual information, run this in a simulated browser environment on a big farm, and of course you need a very expensive, good multimodal model. All of that is what took so long. To share a bit of my personal failure there, Shawn: we always had a large-company bias. Whenever we do anything, we’re like, “Hey, we’ll run an experiment,” right? We make a change, run an experiment, and see which one’s better, or, “No, this is worse.” Most of them are worse, so you discard it and keep iterating, hill climbing. And we thought, “Smaller merchants can’t get stat-sig results. They can’t really run experiments, simply because in a week there wouldn’t be enough data for them.” So we thought about it from that perspective. What we didn’t realize is that most people don’t have an A and a B; they just have one thing, and they need suggestions for what A and B should be. So we first built this: “Hey, we run the simulation on two separate themes and tell you which one is better.” We then morphed it (and very recently released it) into: you have just your site, your theme, and we run over it and say, “Here are the predicted conversion values, and here’s how we think you should modify it to increase your conversions.” And circling back to what you started with, the proof is in the pudding. If we weren’t correlating with reality, people wouldn’t be using it. Thankfully, we literally see more users every day than the day before. So right now... it’s working. Yeah.
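Mikhail’s point that small merchants “cannot get stat sig results” within a week follows from the textbook sample-size formula for comparing two conversion rates with a two-sided z-test: `n = (z_{1-α/2} + z_{1-β})² · (p₁(1−p₁) + p₂(1−p₂)) / (p₂ − p₁)²` visitors per variant. Below is a Python sketch using the normal approximation. This is the standard formula, not SimGym’s method, and the 2% conversion rate and 10% relative lift are hypothetical.

```python
import math
from statistics import NormalDist


def visitors_per_arm(p_control: float, relative_lift: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed per variant for a two-sided two-proportion z-test."""
    p_treat = p_control * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)  # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_treat - p_control) ** 2)


# Hypothetical store: 2% conversion, hoping to detect a 10% relative lift (2.0% -> 2.2%).
n = visitors_per_arm(0.02, 0.10)
print(n)  # about 80,700 visitors per arm
```

Roughly 160,000 visitors across both arms is far more weekly traffic than a small store gets, which is the gap that simulated shoppers are meant to fill.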
Right now my problem is how to pay for it all. Our major focus is how to optimize the LLMs, do distillation, and run the headless and headful browsers more cheaply, so that we can accommodate the increase in traffic.
[00:42:47] swyx: Yeah. I understand you published a lot of technical detail at GTC, so I was going to bring it up a little. Was this in conjunction with some kind of GTC presentation, or something like that?
[00:42:59] Mikhail Parakhin: Well, we did it in several places, but yes, we had the engineering blog as well.
[00:43:05] swyx: Yeah. So you’re running gpt-oss.
[00:43:08] Mikhail Parakhin: This is an older version. Now we run a multimodal model, but yes, we still run gpt-oss as well for...
[00:43:15] swyx: And then you have the VMs, and you also have Browserbase. I really like this line where you said, “It violates almost every assumption that standard LLM serving is designed for.” And then you had basically orders-of-magnitude differences across everything.
[00:43:29] Mikhail Parakhin: Exactly. Which was a bit of a challenge to implement, even simple things. Since it violates all the assumptions, multi-instance GPUs (MIGs), for example, don’t work as well. But we needed to get MIG to work, because otherwise it’s way too expensive. So we had to deal with lots of infrastructure and work with Fireworks and CentML to help with optimizations, and with Browserbase, as you mentioned. Yeah, it takes a village.
[00:44:04] swyx: Okay. So there’s been a lot of experimentation in the infrastructure so far, and you’ve published more or less what you have here. I’m less familiar with CentML.
I don’t do that much work in this part of the stack. But why was it the preferred inference platform?
[00:44:22] Mikhail Parakhin: There were probably three top companies, at least that I was aware of, that did LLM optimization: Together, Fireworks, and CentML, not necessarily in that order. CentML recently got acquired by NVIDIA. What they did is, if you have a model and you want to optimize it for a specific usage profile, they would go and do it. We worked with those companies; this work in particular was with CentML and NVIDIA, to get the best possible results out of it. Sometimes you have to retune depending on the goal: sometimes you want maximum throughput, sometimes minimal latency, sometimes the cheapest option, right? Or some combination. These are the people who would come and help you.
[00:45:14] swyx: I see. I’m familiar with these people for the LLM, autoregressive stack. But the other interesting category of optimizers is the diffusion people, like fal, and Pruna has come up a lot recently as well. I think that’s really underappreciated, at least by me, because I thought all the workload would be LLMs, but actually there’s a lot of diffusion as well.
[00:45:38] Mikhail Parakhin: Exactly.
[00:45:38] swyx: There’s a lot here, so it’s hard to cover. But I do think people underappreciate the importance of customer simulation, basically. This is something I’m candidly still coming to terms with.
Your team also prepared this really nice diagram. I assume it’s AI-generated.
[00:46:00] Mikhail Parakhin: Yeah, it looks...
[00:46:01] swyx: Maybe it’s not.
[00:46:01] Mikhail Parakhin: Yeah, it looks Gemini-ish. Honestly, I don’t know where the hell they generated it, but it looks like Google. But the interesting part, Shawn, that we haven’t covered and that I wanted to mention: if your store had previous customers, rather than being a new store where you’re a new merchant just launching, it helps tremendously with correlation and forecasting. We take your previous customers’ behavior and create agents that replicate the specific distribution of customers you get, then we apply those agents to your changes, and that raises the correlation with add-to-cart events, or with conversion, or whatever it may be, quite dramatically. So replicating humans in general seems like an interesting, cool challenge.
[00:46:58] swyx: If people are Shopify shareholders, they should really deeply understand this, because this is basically the moat. The more you use Shopify, the more it will automatically improve, right? You’re doing the job for them.
[00:47:13] Mikhail Parakhin: Yeah, that’s what we started with. Otherwise, if it were my startup, I wouldn’t do it, because without the data, as you said, it’s exactly the case that whatever you say in the prompt is what the agents will do.
[00:47:30] swyx: The statistician in me wants to satisfy the statistical intuition, I guess. The word that comes to mind is ergodicity.
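Mikhail says SimGym builds agents that replicate the distribution of a store’s previous customers, and that a brand-new store lacks this signal. Below is a deliberately simplified Python sketch of that conditioning step, assuming customers can be bucketed into discrete segments. The segment names, the `DEFAULT_MIX` fallback prior, and both functions are hypothetical; the real system presumably conditions on far richer behavior than a segment label.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence

# Hypothetical fallback prior for a brand-new store with no customer history.
DEFAULT_MIX: Dict[str, float] = {"browser": 0.6, "deal_seeker": 0.25, "returning_buyer": 0.15}


def persona_mix(past_segments: Optional[Sequence[str]]) -> Dict[str, float]:
    """Empirical share of each customer segment, or the generic prior if there is no history."""
    if not past_segments:
        return dict(DEFAULT_MIX)
    counts = Counter(past_segments)
    total = sum(counts.values())
    return {segment: count / total for segment, count in counts.items()}


def sample_personas(mix: Dict[str, float], n_agents: int, seed: int = 0) -> List[str]:
    """Draw simulated shoppers so the agent population matches the store's customer mix."""
    rng = random.Random(seed)
    segments = sorted(mix)
    return rng.choices(segments, weights=[mix[s] for s in segments], k=n_agents)


history = ["deal_seeker", "browser", "deal_seeker", "returning_buyer"]
agents = sample_personas(persona_mix(history), n_agents=1_000)
print(Counter(agents).most_common())
```

With no history, every store gets the same generic prior, which is the “whatever you say in the prompt” situation Mikhail warns about; with history, the agent population is pulled toward the store’s real customers.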
So let’s say a customer takes this path, another customer takes this path, another takes this path, right? In my mind, the way I explain it is: okay, here’s the ninety-fifth percentile, here’s the fifth percentile, and here’s the median, right? But to me, what SimGym is potentially doing is that it can model the in-between journeys as well, which may depend on the previous states. This may be a very RL-type conclusion: if you only did naive A/B testing, you only have the statistics at a certain point, and you only judge based on the overall summary statistics. But here you can actually model trajectories. Does that make sense? Or-

[00:48:31] Mikhail Parakhin: That makes total sense. It makes even more sense than maybe you realize, because-

[00:48:38] swyx: Okay. Please, please.

[00:48:38] Mikhail Parakhin: Yes... So internally we have this system; we talked about it briefly once at NeurIPS. We have a huge HSTU-based system that models whole companies and their possible paths. And- Yeah ... in what you are showing, at any point in time you can either model the user’s behavior, or you can think about the whole merchant as a company, as an entity that acts in the world. You can model that as well. And then you can do counterfactuals. In your blue graph, imagine that somewhere in the middle you have an intervention: I give that person a coupon, or I send a personal thank-you card, or give a discount somewhere. And then you can do forward rollouts from that counterfactual: what would have happened with that intervention, or without it?
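The idea of forward rollouts with and without an intervention can be shown with a deliberately tiny sketch. It is not Shopify’s HSTU system. It assumes a hand-written Markov chain of shopper states (`browse`, `cart`, `purchase`, `leave`) with made-up probabilities, and it models a coupon as a hypothetical change to the cart-state transitions that applies from a chosen step onward. Both counterfactual branches reuse the same random draws (common random numbers), so any difference between them comes from the intervention alone. Moving `intervention_step` shows the effect of intervening at different points in the journey.

```python
import random

# Hypothetical shopper journey as a Markov chain. "purchase" and "leave" are
# absorbing states; all probabilities are made up for illustration.
BASE = {
    "browse": [("browse", 0.5), ("cart", 0.3), ("leave", 0.2)],
    "cart": [("browse", 0.2), ("purchase", 0.3), ("leave", 0.5)],
}
# Hypothetical intervention: a coupon moves probability mass in the cart state
# from "leave" to "purchase"; everything else is unchanged.
COUPON = {**BASE, "cart": [("browse", 0.2), ("purchase", 0.5), ("leave", 0.3)]}
ABSORBING = {"purchase", "leave"}


def step(state, transitions, u):
    """Pick the next state by inverse-CDF sampling with a uniform draw u in [0, 1)."""
    cumulative = 0.0
    for nxt, p in transitions[state]:
        cumulative += p
        if u < cumulative:
            return nxt
    return transitions[state][-1][0]  # guard against floating-point rounding


def rollout(start, draws, intervention_step=None):
    """Roll one journey forward; the coupon applies from intervention_step onward."""
    state = start
    for t, u in enumerate(draws):
        if state in ABSORBING:
            break
        active = intervention_step is not None and t >= intervention_step
        state = step(state, COUPON if active else BASE, u)
    return state == "purchase"


def counterfactual_conversion(intervention_step, n=10_000, horizon=50, seed=0):
    """Estimate conversion without and with the intervention on paired journeys.

    Both branches consume the same random draws, so a journey only diverges
    where the intervention changes its outcome.
    """
    rng = random.Random(seed)
    base_hits = treated_hits = 0
    for _ in range(n):
        draws = [rng.random() for _ in range(horizon)]
        base_hits += rollout("browse", draws)
        treated_hits += rollout("browse", draws, intervention_step)
    return base_hits / n, treated_hits / n


base, early = counterfactual_conversion(intervention_step=0)
_, late = counterfactual_conversion(intervention_step=3)
print(f"no coupon {base:.3f} | coupon from step 0 {early:.3f} | coupon from step 3 {late:.3f}")
```

Pairing the branches on shared noise is a standard variance-reduction choice. With these particular made-up probabilities, every journey that converts without the coupon also converts with it, so the per-journey comparison is easy to read. A learned sequence model would replace the fixed transition table with predictions conditioned on the full history.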
And you can even change where in time that intervention happens, right? Somewhere in this journey. So we do this at Shopify scale for our merchants, and if we notice something that they can be fixing, like there’s a strong counterfactual, like we have Shopify policy, they basically get a notification like, “Hey, we think something is wrong with your...” I don’t know, Canadian sales: “It looks like it’s misconfigured. Here’s what you need to do.” Or, “We think you have to set up this campaign with these parameters.” And we do that at the buyer level, to literally offer discounts or cashback or other things to buyers. So this is... I’m getting very excited. This is my area of interest, I guess, and hobby. Being able to model something as complex as human beings or companies, and model counterfactuals on it, where you can have interventions in the future and optimize when to make an intervention and what kind of intervention to make: it’s such an unlock, and previously it was completely impossible. It was always dreamed of, but never... How would you even simulate it without LLMs or HSTUs? I think these are very, very exciting times.

[00:50:59] swyx: I just wanted to maybe illustrate this. I’m not the best illustrator, but I am a conceptual statistics guy. And you cannot just do this with A/B tests. This is a dimension A/B testing doesn’t have, right? It doesn’t have the change over time, the stochastic nature, and it doesn’t have the context: here’s all the context up to this point. Okay, cool. That’s SimGym. You’re going to burn a lot of tokens on this thing. But you’re one of the only platforms in the world at this scale that can do this across a huge variety of workloads, right?
I’m even curious on a human research level: does retail behave differently from clothing sales? Does that behave differently from electronics sales? I don’t know what else you guys... Do the Kardashian shoppers differ from people who buy, I don’t know, cars or whatever?

[00:51:55] Mikhail Parakhin: Well, very different: different sensitivities, different modes of shopping, and different levels of what’s important. Totally, you can do aggregations at a store level. You can do aggregations at a category level. For the statisticians among us, I couldn’t believe it, but recently we were looking at it, and we had to bring back CRPs, you know, the Chinese restaurant process. It’s a way of aggregating that naturally grows a clustering. Specifically, to answer questions like the ones you were just posing, about whether buyers behave differently across categories. And I’m like, “I haven’t seen a CRP since 2001.” It’s so-

[00:52:37] swyx: What? What is... No, I haven’t seen this. No, this is not in my training.

[00:52:44] Mikhail Parakhin: But yeah, it was a very popular kind of theory in NeurIPS and ICML circles in the early 2000s, kind of nice. And now it has practical applications- Yeah ... that we were resurrecting.

[00:53:03] swyx: Yeah, amazing. I can see how this is a fun job for you, where you get to apply all these things. Super cool. So anyone who knows what CRPs are and has always wanted to use them at work should definitely join Shopify. Okay, so we have a lot more, but I’m being mindful of the time.
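The Chinese restaurant process Mikhail mentions is a standard way to get a clustering whose number of clusters is not fixed in advance. The sketch below implements the textbook seating rule only. It is not Shopify’s model, and the concentration parameter `alpha` and the customer count are illustrative assumptions. In practice the CRP usually serves as the prior in a Dirichlet-process mixture, where each seating weight is also multiplied by how well a buyer fits that cluster. That likelihood term is omitted here.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample a random partition of n_customers via the Chinese restaurant process.

    Customer i (0-based) joins existing table k with probability size_k / (i + alpha)
    and opens a new table with probability alpha / (i + alpha).
    Returns (assignments, table_sizes).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for i in range(n_customers):
        # i customers are already seated, so the total weight is i + alpha.
        r = rng.random() * (i + alpha)
        choice = len(table_sizes)  # default: open a new table (weight alpha)
        cumulative = 0.0
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                choice = k
                break
        if choice == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
        assignments.append(choice)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(1_000, alpha=2.0)
print(len(table_sizes), sorted(table_sizes, reverse=True)[:5])
```

The rule is “rich get richer”: large tables attract more customers, yet a new table always remains possible. The expected number of tables after n customers is the sum of alpha / (alpha + i) for i from 0 to n − 1, which grows roughly like alpha · ln(n). The data therefore decide how many clusters appear, instead of the analyst fixing that number up front.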
I do want to cover some other things. I’ll give you a choice: UCP or Liquid?

[00:53:30] Mikhail Parakhin: Liquid. I think on UCP... UCP is very important for us, and we have structured discussions on it that you can read about, we have blog posts, and we have a big release this week, in fact, with our catalog.

[00:53:46] swyx: Oh, okay.

[00:53:46] Mikhail Parakhin: Yeah, but-

[00:53:46] swyx: I mean, we can discuss the release briefly, because we’ll release this episode after it’s already announced, so whatever. There’s a catalog that you guys are doing?

[00:53:55] Mikhail Parakhin: Yeah. So we are- Okay ... we are bringing in the capabilities of the whole Shopify catalog. Basically, you can now search for products, you can do lookups by specific ID, and you can do bulk lookups when you need to bring in multiple products. You don’t need to know in advance what you’re trying to show, sell, or check out; you can now have this decided at runtime. This is a big area of investment for us, for both non-personalized and personalized search, trying to provide basically a window into the whole universe of products being sold everywhere in the world. And Shopify is, not exactly, but almost a superset of anything being sold. Now we are bringing it into UCP. Identity linking is another big thing for us, so that you can use Google or whatever identity you have, minimizing friction.

[00:54:56] swyx: Yeah.

[00:54:57] Mikhail Parakhin: So yeah, big release for us. But Liquid AI, of course, we never talk about, and that topic might be more aligned with what we discussed previously in this chat.

[00:55:07] swyx: Sure. The main thing that everyone understands about Liquid is that it is inspired by a worm, and I still don’t know why.
I’m curious about your explanation. I think you can make things very approachable. And also, what is the potential level of efficiency that you get out of Liquid?

[00:55:23] Mikhail Parakhin: We’re all familiar with transformer architectures. For the longest time there was a competing architecture called state space models, SSMs. Chris Ré was one of the pioneers, and lots of startups tried to make them a reality. They have significant benefits. The main ones are that they’re much faster, have a lower footprint, and are not quadratic in length: they’re linear in your context length. But state space models never quite made it. They have certain niches where they thrive, and hybrid architectures are useful, but they never quite made it. You can think of liquid neural networks as the next step, sort of state space models squared. It’s a non-transformer architecture that’s more complicated than state space models, and really difficult to code, if I’m being honest. But it’s very efficient: it’s subquadratic in the length of your context, and it’s a very compact way to represent things. Liquid AI is the company whose goal is to productize it. Very often you have this need: you need long context and a small model, and you want low latency. In general it’s basically on par with transformers, and if you do hybrids with transformers, it’s even better. That’s why at Shopify, where we constantly try multiple models from multiple companies, we found that Liquid was the best for small models, particularly in low-latency applications, and/or when you need longer context lengths.
So we still use the whole zoo, and obviously we always test and use everything: every open-source model and, it sometimes feels like, even every private model. But Liquid has been taking quite a bit of internal Shopify share, at least. The reason I’m excited is that it’s the only non-transformer architecture that I’ve found to be genuinely competitive. We use it for search, for long context, for Pulse, for distilling, and for other things. That’s the overview. I don’t know how approachable that was, Shawn, sorry. Maybe it’s still too obtuse.

[00:57:51] swyx: I mean, I think they haven’t been that open about their implementation details. If there’s a lot of technical detail published, I haven’t read a formal paper on the implementation details. But I did get the relationship between the SSMs and the others. This is one of the charts showing the relationship between full attention and something more like an RNN type, in terms of their efficiency. And then the other chart was this old one, where it compares against some of the other models. It doesn’t exactly have the correct Y-axis, but it’s close enough that you can see it’s basically a step-change difference in efficiency. I think the surprise to me was that you guys are already actively using it internally at Shopify. And I’m curious: what are the constraints that you’re optimizing for, right? When you say smaller, is it like the 1B size? What kind of latency constraint are you optimizing for? What kind of context-length considerations, right?
For example, in audio use cases, SSMs effectively have unbounded context length, because they just have to operate on a sliding window of the most recent stuff. I’m just curious: what do you see as the potential here?

[00:59:13] Mikhail Parakhin: Yeah. Because the state embeds all the previous information needed, or that’s the assumption, SSMs effectively have infinite context length. The problem with them is that the expressiveness is not there. Liquids are effectively souped-up SSMs. They are much more expressive and, again, more complicated to code. There is a paper on it; you can see it. A differential equation is rolled out and then computed, really, as a convolution. It’s a bit involved. We use it specifically where we need super low latency, and there was a lot of very fun project work with CentML and Liquid AI themselves. We run a tiny model, around three hundred million parameters, in thirty milliseconds end to end for search. When you type a query, we produce all the possible things you could mean by that query: not only synonyms but a kind of full query understanding, the whole tree of what you might need. That includes your personalization, because you might have done previous queries, and we lower it all down into the search server. So the latency requirements are obviously very, very strict. We are able to run it under thirty milliseconds because it’s Liquid; Qwen, you know, doesn’t run at this. And even with Liquid, we had to work a lot with NVIDIA, because almost nothing in CUDA, or in the current stack, is designed for low latency.
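Mikhail makes two claims about SSMs. First, the state carries all needed history, so context is effectively unbounded and cost is linear in length. Second, the unrolled recurrence can be computed as a convolution. Both can be checked on a toy discrete linear time-invariant SSM. This is the textbook S4-style picture, not Liquid AI’s architecture, which the speakers describe as more expressive and whose details are not given here. The matrices `A`, `B`, `C` are random illustrative values, and the continuous-time discretization step is skipped by treating them as already discrete.

```python
import random


def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ssm_scan(A, B, C, u):
    """Recurrent mode: x_k = A x_{k-1} + B u_k, y_k = C x_k.

    One pass over the sequence with a fixed-size state x: linear in length,
    constant memory regardless of how much context has been consumed.
    """
    x = [0.0] * len(B)
    ys = []
    for u_k in u:
        x = [ax + b * u_k for ax, b in zip(matvec(A, x), B)]
        ys.append(dot(C, x))
    return ys


def ssm_conv(A, B, C, u):
    """Convolutional mode: unroll the recurrence into a kernel K_j = C A^j B,
    then y_k = sum_{j=0..k} K_j u_{k-j}. Naive O(L^2) here; FFTs make it O(L log L)."""
    kernel, v = [], list(B)  # v holds A^j B
    for _ in range(len(u)):
        kernel.append(dot(C, v))
        v = matvec(A, v)
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]


rng = random.Random(0)
n, length = 4, 16
A = [[rng.uniform(-0.3, 0.3) for _ in range(n)] for _ in range(n)]
B = [rng.uniform(-1, 1) for _ in range(n)]
C = [rng.uniform(-1, 1) for _ in range(n)]
u = [rng.uniform(-1, 1) for _ in range(length)]

y_scan, y_conv = ssm_scan(A, B, C, u), ssm_conv(A, B, C, u)
print(max(abs(a - b) for a, b in zip(y_scan, y_conv)))  # agrees up to float rounding
```

Both modes produce the same outputs. A model in this family can therefore be trained in convolution mode over whole sequences and served in recurrent mode, keeping only the fixed-size state per stream. That property is what makes low-latency, long-context serving plausible for this class of models.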
Small things that don’t matter with large models start mattering a lot, and we had to optimize them. At the other end of the spectrum is maximum throughput, for things like offline categorization. When a new product appears, we need to analyze it. We need to assign where it sits in the taxonomy. We need to extract and normalize attributes. We need to do clustering, like, oh, it’s the same thing that other merchant is selling, right? That takes an almost unbounded amount of energy, because it’s a quadratic kind of problem and we have billions and billions of products. So you don’t care about latency as much; it’s kind of an overnight batch job, but you want maximum throughput. And sometimes in those cases, like for Sidekick Pulse, you also need long context. We are talking about models in maybe the seven-to-eight-billion-parameter range. We would take a large model, something huge, the largest we can find, and distill it into Liquid for a specific task, such as our catalog formulation or Pulse. Then we run it at very large scale in batch jobs. In that situation it very often beats Qwen. Kimi is more on the reasoning side, so Qwen, I would say, is probably the major alternative. That’s when we use it. It’s not a panacea. I wouldn’t say it’s a frontier model, in the sense that it’s not going to suddenly compete with GPT 5.4. But it is a phenomenal target for distillation, which is becoming more and more important right now with the explosion in token usage.

[01:03:00] swyx: Is that a now-only thing, or do you think if you give Liquid a hundred billion dollars they will do...
Is it just more scale, or what is limiting it? What prevents it from running into the same issues that SSMs had?

[01:03:14] Mikhail Parakhin: Their scale is already much larger than the largest SSM I’m aware of- Wow, okay ... So yeah, SSMs were just not expressive enough, in my opinion. Again, I’m sure I’ll get a lot of pushback, and probably accurately so. But in my opinion, SSMs are not expressive enough, and Liquid models are. Especially in their hybrid form, combined with transformers like in Mamba fashion, they are probably the best architecture I’m aware of, period. But of course, Liquid AI is not at the scale of Anthropic or Google or OpenAI in terms of compute. I think if they had a similar level of compute, they would be very competitive and maybe even beat the largest models, at least from what I’ve seen. They don’t have that level of investment, but they still have decent investment. For this scenario of smaller models and distilling into them, they’re very often second to none. We are very omnivorous, and we’re purely merit-based. So the moment they stop being competitive, we will switch to something else, and we constantly test. But so far, if I draw a graph of our workloads on Liquid versus our workloads on, I would say, Qwen, which is another awesome model and probably another kind of standard within Shopify, Liquid has definitely been taking share.

[01:04:48] swyx: I think that’s very promising, and probably the best explanation I’ve heard directly from someone involved in Liquid. I do have Maxime Labonne coming to my conference in London this week, so we’ll- Oh, that’s great ... hear more from him.
’Cause there was this Liquid investor day or something, a year or a year and a half ago, and I think there just wasn’t that much technical detail speaking to my crowd of potential customers and users, right? Which is fine. Maybe we still need to wait for more results to come out. But I think it would be news to a lot of people that you guys are already actively using it for high-frequency use cases. I also wanted to highlight Sidekick Pulse, which we didn’t cover and probably don’t have time to cover, but it’s something that you also launched recently. It’s basically RecSys. The other RecSys trend I’ve been covering a lot, from the YouTube side and even xAI’s RecSys, has been LLM-based RecSys, right? Which I think you are also effectively using Liquid models for, but they are just throwing transformers at the problem. And maybe this is the kind of hybrid architecture shift that will happen in order to accommodate the long context and high efficiency that you need. I don’t really have a strong opinion there, apart from highlighting to anyone that the work the LLM-based RecSys community is doing is also very interesting.

[01:06:22] Mikhail Parakhin: Yeah. Again, the thing to get excited about is that it’s not just LLMs looking at things; it’s also the HSTU model doing that counterfactual analysis- Yeah ... where we model the whole enterprise as an entity, with its actions, and then see what will happen.

[01:06:39] swyx: Overall, I think this all presents an enormous... You know, there was not that deep of an AI story to Shopify when it started. It was just a WordPress plugin, right?
But now you are the storefronts, the e-commerce guardians, for so many people, and you’re really applying all the AI methods and state-of-the-art stuff. I think our conversation today has really opened my eyes to a lot. So thank you for doing this; this is a really amazing overview of what you’re doing.

[01:07:15] Mikhail Parakhin: Thank you for saying that, Shawn, and thank you for having me. Of course, it’s always a pleasure to talk to people who are deeply technical and know what they’re talking about.

[01:07:25] swyx: Yeah. I mean, very few people are as technical as you, but at least I can vaguely follow along. So, okay, there’s a hiring call. Are there any particular roles you’re looking for, where you’re like, “Okay, if you know how to solve this problem, reach out”?

[01:07:45] Mikhail Parakhin: Yeah. The things I would definitely call out: if you’re an ML person or a data science person, we have a huge need for more people munching data, so to speak. Or, surprisingly, if you’re a distributed database person. We think there is a way to use LLMs to reimagine how we do distributed databases, and we’re working a lot with Yugabyte there. So if you have interest in those areas, Shopify might be the best place in the world for you. It’s a pretty good place for other disciplines as well.

[01:08:24] swyx: Cool. I think that was all the questions I had. I said I have one bonus thing, if you want to indulge in some Bing history. What are your takeaways, or any fun anecdotes about Sydney?

[01:08:38] Mikhail Parakhin: Any fun anecdotes about Sydney?
Well-

[01:08:41] swyx: Yeah, it was very interesting. I think it woke people up to this personality that emerged.

[01:08:48] Mikhail Parakhin: The most interesting anecdote is that Sydney was first shipped in India, and it was not noticed for a long time. And the first implementation of Sydney didn’t even have an OpenAI model under it. It was Megatron-Turing, the Microsoft and NVIDIA collaboration model. Yeah, exactly, that’s the one. People thought it was a prank, because not many people were familiar with LLMs at that point yet, and they thought, “That cannot be automatic. You must have people thinking.” And then they were even complaining, “Oh, this chatbot is gaslighting me.” What almost everybody doesn’t fully realize is that it wasn’t by accident that Sydney was Sydney. We spent a lot, a lot of effort on personality shaping. It was a bit of my Yandex legacy, where previously we did the Alice digital assistant- Chatbot, yeah ... yeah. We learned the importance of personality shaping there, and so here we did a lot of personality shaping. So it was not fully an emergent scenario. It was also a little bit edgy. What we learned in those experiments is that you want to be polite, but you want to be a little bit on the edge, and that draws people in. Ever since those days, I haven’t seen anybody try exactly that mode. I think we will see more of this at some point. Lots of good memories, you know.
And by the way, the very first Sydney dev lead, Andrew McNamara, is working at Shopify as the head of Sidekick and Pulse- Oh ... and lots of these are actually in his purview.

[01:10:53] swyx: Oh, okay. That’s another fun fact. You’re- Yeah ... assembling the team again. Yeah, it’s cool. I think a lot of people woke up to the idea of AI personality for the first time there. And now, with maybe OpenClaw explicitly prompting a fun personality, I think that is a real selling point for people, right? And I guess maybe the only other time it has really emerged into public consciousness is Golden Gate Claude. But yeah, hopefully someday we’ll get a Shopify Sydney.

[01:11:23] Mikhail Parakhin: Well, we have Sidekick. It’s a- Yeah ... it’s a bit of a different thing.

[01:11:28] swyx: Yeah. Sidekick was your original big launch for AI stuff. Cool, amazing. Thank you so much. You guys do amazing work. Honestly, if I were a Shopify customer or a Shopify investor, hearing about all the work that you’re doing on the technical side makes me feel more confident: okay, just choose Shopify, right? You’re never going to do this in-house, which is obviously what you want. That’s what an ideal platform is: you’re doing all the things that no individual could do at their scale, but you can at yours. Very exciting problems.

[01:12:01] Mikhail Parakhin: Exactly. And creating network effects. Hard to disagree. If you’re not using Shopify, you should.

[01:12:09] swyx: Yeah, amazing. Okay, well, that’s it. Thank you so much.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Today, we explain this piece of “clickbait” from our guest!

TL;DR: 95% of cancer treatments fail to pass clinical trials, but it may be a matching problem. Suppose we better understood which patients have which tumors, and which tumors will respond to which treatments. Success rates could then improve dramatically, and millions of lives could be saved with the treatments we ALREADY have.

See our full episode dropping today:

Why Big Pharma is licensing AI Models

Tolstoy famously wrote, ‘All healthy cells are alike; each cancer cell is unhappy in its own way.’ Or something like that. Cancer might be the most misunderstood disease out there. It’s not one disease; it’s a family of diseases. Hundreds, maybe thousands, of unique diseases, each with its own underlying biology. Through this lens, saying you’ll “cure cancer” is like saying you’ll solve Legos.

We keep hearing AI will cure cancer, but sadly it may not be so easy. Today’s guests, Ron Alfa and Daniel Bear from Noetik, think they can use AI to break through a core bottleneck in the treatment development process.

GSK recently signed a $50M deal for their technology. It also includes an (undisclosed) long-term licensing deal for Noetik’s models, like the recently announced TARIO-2, an autoregressive transformer trained on one of the largest tumor spatial transcriptomics datasets in the world. Whole-plex spatial transcriptomics is the richest way to read a tumor, yet approximately 0% of cancer patients going through standard care ever get this assay. TARIO-2 can now predict a ~19,000-gene spatial map from the H&E-stained slide every patient already has. Most big AI plays in biotech have focused on discovery, and usually result in an in-house development effort (meaning tools companies usually become drug companies). This deal stands out in that it is a software licensing deal, and represents a commitment to a platform rather than a drug.
With attention on other software tools for drug development (see the Boltz episode and Isomorphic, for example), it is starting to look like Pharma’s appetite for biotech tools has finally started to grow. Why the sudden interest?

Cancer is hard

Biology is hard; cancer is harder. But despite this, we’ve made incredible progress. So many cancers that would have been death sentences twenty years ago are now routinely survivable. Our main strategy used to be just chemotherapy: poison you and hope the tumor dies before you do. Now there are many treatments that actually kill a tumor and leave the rest of you intact! Immune checkpoint inhibitors like Keytruda and Opdivo disable a defense that dozens of tumor types use to evade the immune system. CAR-T therapy adds modified T-cells to your blood that can target B-cell malignancies very accurately. Antibody-drug conjugates such as trastuzumab deruxtecan (Enhertu) attach a drug to an antibody, allowing it to target very specific (cancer) cells. We truly live in marvelous times.

With that said, we still have a long way to go. For every type of cancer with a miracle treatment, we have many more that are still death sentences. The world spends $20-30 billion a year trying to cure cancers, with hundreds of clinical trials yearly. Yet progress is slow, with a 95% failure rate in clinical trials.

The lab doesn’t translate to the clinic

Are we leaving something on the table? Enter Noetik and Ron Alfa. Ron’s core thesis is that many of these “failed” treatments actually work! We’re just not looking at the right patients with the right tumors. If only we could really understand the unique types of cancer biology, and which patients will respond to which treatments, we might be able to show a much higher success rate. Millions of lives (and billions of dollars) may ride on this.

The Hard part: Blind Faith in Data Collection

Ron and Noetik had the conviction to spend almost two years just collecting data. Lots and lots and lots of data.
Noetik has acquired thousands of actual human tumors and built a large multimodal dataset of hundreds of millions of images. This lets them create a detailed map of each tumor’s cellular makeup and local microenvironment. These are real human tumors, not Frankenstein mouse models or immortalized cell lines.

This data is then fed into a massive self-supervised model, creating a “virtual cell”. This model has a deep understanding of cancer biology: Noetik has worked carefully to show it can distinguish different types of tumors, maybe even tumors we didn’t previously identify as distinct! More recently they figured out how to scale up their model and data, and they have not yet seen their scaling laws level off!

Noetik’s models can simulate how a patient will respond to experimental treatments. They are working with partners to test promising drugs that were shown to be safe, but not effective. If these models work as hoped, Noetik will bring new cancer treatments to patients without developing a new drug! Their models will also guide the discovery process toward drugs that are more likely to make it through clinical trials. You can imagine why this is so attractive to GSK.

We’ll see…

Ron and Dan make pretty persuasive arguments that their models will truly assist in cohort selection in useful ways, and this seems valuable. And we think it’s pretty clear that:

* Translation from lab to clinic is the biggest bottleneck for drug development.
* Better cohort selection using biomarkers is likely to improve translation from lab to clinic.

Noetik has already had some success here. We’ll see if they’re able to translate that into a reliable advantage.

Stepping back a bit from the technology, curing cancer is a pretty unambiguously positive application of AI. It is also a very hard problem to solve. Our guess is that most people have been impacted by cancer or will be at some point soon.
And we hope that learning about the amazing work that companies like Noetik are doing will inspire a generation of AI engineers to work on the hardest and most exciting problems that society faces.
For all those who missed out on London, see you in Miami next week!Notion, the knowledge work decacorn, has been building AI tooling since before ChatGPT, with many hits from Q&A in 2023 and unified AI in 2024 and Meeting Notes in 2025. At the end of their last Make user conference, Ryan Nystrom teased Notion 3.0’s Custom Agents - and they are finally embracing the Agent Lab playbook!Sarah Sachs and Simon Last of Notion join us for a deep dive into how Notion built Custom Agents, why it took years and multiple rebuilds to get right, and what it means to turn a productivity tool into an agent-native system of record for enterprise work.We go inside the product, engineering, evals, pricing, and org design decisions behind one of the most ambitious AI product efforts in software today — from early failed tool-calling experiments in 2022 to agent harnesses, progressive tool disclosure, meeting notes as data capture, and the long-term vision for software factories and agentic work.We discuss:* Sarah and Simon’s path to launching Notion Custom Agents, and why the feature was rebuilt four or five times before it was ready for production* Why early agent attempts failed: no tool-calling standard, short context windows, unreliable models, and too much complexity exposed to the model* The “Agent Lab” thesis: not just wrapping a model, but understanding how people collaborate and building the right product system around frontier capabilities* How Notion thinks about roadmap timing: not swimming upstream against model limitations, but also building early enough that the product is ready when the models are* Why coding agents feel like the kernel of AGI, and how Notion is thinking about “software factories” made up of agents that spec, code, test, debug, review, and maintain codebases together* How Sarah runs AI engineering at Notion (“notes from Token Town”): objective-setting over idea ownership, low-ego teams comfortable deleting their own work, and a culture designed to 
swarm around fast-changing opportunities
* The “Simon Vortex,” company hackathons, and why security gets pulled in early rather than late
* How Notion organizes AI: core AI capabilities and infrastructure, product packaging teams, and a broader company mandate that every product surface must increasingly work for both humans and agents
* Why prototypes have become much easier to build internally, and how “demos over memos” changes product development inside a tool the whole company already uses every day
* Notion’s eval philosophy: regression tests, launch-quality evals, and “frontier/headroom” evals that intentionally only pass ~30% of the time so the company can see where model capabilities are going
* What a “Model Behavior Engineer” is, and why Notion treats eval writing, failure analysis, and model understanding as a distinct function rather than just software engineering
* The changing role of software engineers in the age of coding agents, and why the new job looks less like typing code and more like supervising a rigorous outer system of agents, PRs, and verification loops
* How the “software factory” should work: specs, self-verification, bug flows, subagents, and minimizing human intervention while preserving the invariants that matter
* A live walkthrough of a Notion Custom Agent handling coworking space tenant applications by triaging email, enriching applicants with web search, and writing structured data into a Notion database
* How agents compose inside Notion: shared databases as primitives, agents invoking other agents, “manager agents” supervising dozens of specialized agents, and memory implemented simply as pages and databases
* Notion’s take on MCP vs CLI: why Simon is bullish on CLI’s self-debugging nature, where MCP still makes sense, and how Sarah thinks about capability, determinism, permissioning, and pricing alignment
* The evolution of Notion’s internal agent harness: from early JavaScript coding agents, to custom XML, to Markdown and SQL-like
abstractions, to tool definitions, progressive disclosure, and a much shorter system prompt
* Why Notion cares about teaching “the top of the class,” building for sophisticated operators rather than abstracting away too much capability for everyone
* How agent setup works today: agents that can configure themselves, inspect their own failures, and edit their own instructions — with guardrails around permissions
* How Notion prices Custom Agents: credits as an abstraction over tokens, model type, serving tier, web search, and future sandbox costs; why usage-based pricing was necessary; and how “auto” tries to match the right model to the right task
* Why Notion is not eager to train a foundation model, where they do fine-tune and optimize today, and why retrieval/ranking is one of the most important investment areas as more searches come from agents rather than humans
* Why Meeting Notes became one of Notion’s strongest growth loops: not just as transcription, but as high-signal data capture that powers search, custom agents, follow-up workflows, and the broader system of record for company collaboration
* Why Notion is more interested in being the place where collaboration data lives than in building hardware themselves — and how wearables or other capture devices may eventually feed into that system

Sarah Sachs
LinkedIn: https://www.linkedin.com/in/sarahmsachs
X: https://x.com/sarahmsachs

Simon Last
LinkedIn: https://www.linkedin.com/in/simon-last-41404140
X: https://x.com/simonlast

Full Video Episode

Timestamps

* 00:00:00 Introduction and launching Notion Custom Agents
* 00:01:17 Why Notion rebuilt agents four or five times
* 00:03:35 Building for where models are going, not just where they are
* 00:05:32 The Agent Lab thesis, wrappers, and product intuition
* 00:08:07 User journeys, leadership, and low-ego AI teams
* 00:13:16 The Simon Vortex, hackathons, and bringing security in early
* 00:16:39 Team structure, demos over memos, and building for agents
* 00:20:25 Evals, Notion’s
Last Exam, and the Model Behavior Engineer role
* 00:27:37 Evals as an agent harness and the changing role of software engineers
* 00:30:42 The software factory: specs, verification, and agent workflows
* 00:32:18 Live demo: a custom agent for coworking space applications
* 00:35:08 Composing agents, manager agents, and memory as pages
* 00:38:15 Notion Mail, Gmail, native integrations, and tools
* 00:39:43 MCP vs CLI and the cost of capability
* 00:44:13 When Notion uses MCP vs building its own integrations
* 00:47:43 The history of Notion’s agent harness rebuilds
* 00:55:35 Power users, public tools, and the setup agent
* 00:58:01 Self-fixing agents, permissions, and “flippy”
* 01:01:13 Pricing, credits, and choosing the right model automatically
* 01:09:01 Why Notion isn’t training its own frontier model
* 01:14:07 Retrieval, ranking, and search built for agents
* 01:17:27 Meeting Notes as data capture and workflow automation
* 01:21:18 Wearables, hardware, and Notion as the system of record
* 01:23:45 Outro

Transcript

[00:00:00] Alessio: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I’m joined by swyx, editor of Latent Space.

[00:00:11] swyx: Hello. Hello. We’re back in the beautiful studio that, uh, Alessio has set up for us, with Simon and Sarah from Notion. Welcome.

[00:00:18] Sarah Sachs: Thanks for having us.

[00:00:19] Alessio: Thanks for having us. Yeah.

[00:00:20] swyx: Congrats on the recent launch of Custom Agents. Finally, it’s here. How’s it feel?

[00:00:26] Sarah Sachs: We ship things slowly. So it had been in alpha for a little bit, and at the point at which it’s in alpha, um, there’s a group of people that are making sure it’s ready for prod, and then there’s a group of people working on the next thing.

So sometimes some of these launches are a bit of delayed gratification, so it’s quite nice to remind yourself of all the work you did, because we do have a habit of, like, being two or three milestones ahead.
Uh, just ’cause you have to be, you know, you can’t get complacent. Um, but it’s been great that people understood how this is helpful.

And I think that’s just easier in general, building AI tools today, than it was two, three years ago. People kind of get it, and so that user education, um, there’s just... It was our most successful launch in terms of free trials and converting people and things like that. It was really successful, so yeah.

But there’s a lot to build.

[00:01:12] swyx: Making it free for three months helps.

[00:01:16] Sarah Sachs: Yep.

[00:01:17] Simon Last: It was definitely super exciting for me, because it’s probably the fourth or fifth time that we rebuilt that.

[00:01:22] swyx: Yes.

[00:01:23] Simon Last: And I mean,

[00:01:24] swyx: You’ve been building this since, like, 2022.

[00:01:26] Simon Last: Yeah, I mean, like, it was even right when we got access to, like, GPT-4 in late 2022. One of the first ideas we had was, like, oh, okay, let’s make an agent. We used the word assistant at the time; there wasn’t really the word agent yet. But, oh, we’ll give it access to all the tools that Notion can do, and then it’ll run in the background and, like, do work for us.

And then we just tried that many times, and it just was too early. Um,

[00:01:48] swyx: I need to force you to, like, double-click on that. What is too early? What didn’t work?

[00:01:52] Sarah Sachs: We were fine-tuning, like, before function calling came out. We were trying to fine-tune, with the frontier labs and with Fireworks, like, a function-calling model on Notion functions.

This is right when I joined. I joined because, um, we needed a manager so that Simon would be able to go on vacation.
So, uh, that’s around when I joined, so you can speak much more to it.

[00:02:11] Simon Last: Yeah, we did partnerships with both Anthropic and OpenAI at different times, uh, to try to... At the time, I mean, when we first tried, there wasn’t even a concept of, like, tools yet.

We sort of designed our own, like, tool-calling framework, and then we tried to fine-tune the models to, uh, use it over multiple turns. Um, because it didn’t work well out of the box, I think. Yeah. The models were just too dumb, and the context window was also way too short.

[00:02:37] Alessio: Yeah.

[00:02:37] Simon Last: Um, and yeah, we just kind of banged our heads against it for a long time.

Uh, unfortunately, there were always, like, sort of glimmers that it was working, but, um, it never felt quite robust enough to be, like, a useful, delightful thing. Um, until, I would say, uh, the big unlock was probably, like, Sonnet 3.6 or 3.7, uh, early last year. And that’s when we started working on our agent, which we shipped last year.

Um, and then, uh, Custom Agents is kind of a similar capability, and that one just took longer because we wanted to get the reliability up a lot higher, ’cause it’s actually running in the background.

[00:03:14] Sarah Sachs: And the product interface of, like, permissions and understanding, you know, this custom agent is shared in a Slack channel with X group of people and has access to documents that are surfaced to Y group of people.

And the intersection of X and Y might not be whole. And so how do you build the product around making sure administrators understand that? Permissioning took multiple swings.

[00:03:35] Alessio: Everything is hard at the end of the day. Yeah.
I’m curious, like, when the models are not working, how do you inform the product roadmap? Like, okay, we should probably build expecting the models to get better at some reasonable pace, but at the same time we need to... You know, you had a lot of customers in 2022.

It’s not like you were a new company with, like, no user base.

[00:03:54] Simon Last: Yeah, I mean, I think there’s always the balance of, you know, like, you want to be AGI-pilled and thinking ahead and building for where things are going. Uh, but also you wanna be, like, shipping useful things. And so we always try to, like, keep a balance there.

You know, we try to take, like, a portfolio approach. You know, we’re always working on multiple projects, and we’re always trying to work on, you know, maintaining things that have already shipped, like shipping new things that are, like, eminently working well and making them really good.

And then we always wanna have a few projects that are a little bit crazy. Um,

[00:04:23] Alessio: And what are the AGI-pilled projects that you have today? You don’t have to share exactly what you’re working on, but I’m curious, what are things today that maybe in 18 months people will be like, oh, obviously this was gonna work?

[00:04:35] Sarah Sachs: 18 months.

[00:04:37] Alessio: Yeah, 18 months is, you know,

[00:04:37] Sarah Sachs: It’s a long time. Yeah. Yeah.

[00:04:39] Simon Last: I mean, there’s a number of things happening. I think one thing that’s becoming more clear is, I think, like, uh, coding agents are the kernel of AGI. Sort of, everything is a coding agent. Mm-hmm. I think that’s sort of one direction.

Um, and then, yeah, the exciting thing about that is your agent can sort of bootstrap its own software and capabilities and actually debug and maintain them. And so, yeah, we’re thinking a lot about that.
And then, yeah, like, another category of things that I’m really excited about is, like, uh, what we call the software factory. Other people are also using this, uh, sort of word. Um, basically it just means: can you create, sort of, as automated as possible, a workflow for developing, debugging, mm-hmm, merging, reviewing, and maintaining a codebase and a service, where there’s a bunch of agents working together inside? And, like, how does that work?

[00:05:28] Sarah Sachs: If you think back to your initial question, like, why did this take so long? I think something,

[00:05:32] swyx: I didn’t say that, but yes. Okay. Go ahead.

[00:05:34] Sarah Sachs: Why, what, what changed over the three and a half years of trying

[00:05:37] swyx: it? Exactly. Right. Because most people always say, like, it didn’t work yet, then reasoning models came, then it worked.

I was like, okay, let’s go a little

[00:05:43] Sarah Sachs: bit. That’s, I mean, that’s part of it, but I think the other part of it, that I actually think is really what will set Notion apart for every new capability, is we have, like, two skills that are crucial when it comes to frontier capabilities. One is not letting yourself swim upstream.

So, like, quickly realizing whether you’re just pressing against model capabilities, versus not exposing the model to the right information, or not having the right infrastructure set up. That in and of itself is the skill of intuition. And the second is to see, okay, you’re not swimming upstream: which direction is the river flowing, and how do we think ahead about the product and start building it even if it’s not great yet, so that when it is there, we’re ready for it?

Right? And, like, those can sometimes feel like counterintuitive things. Like, we can be trying to fine-tune a tool-calling model when they don’t exist yet. And the trick is to not do that for too long, but realize that there was something there.
And we’ve had a lot of things where, like, um, we were just not swimming in the right direction with the stream.

I think we had multiple versions of transcription before we got meeting notes, right? Oh, I gotta talk

[00:06:39] swyx: about that. Yeah.

[00:06:40] Sarah Sachs: Yeah. Um, and so I think that, like, we really closely partner with the frontier labs on capabilities, and we also have to have strong conviction as those capabilities move.

Notion is about being the best place for you to collaborate and do your work. And how does that narrative change if the way that we work changes?

Yeah.

[00:06:58] swyx: Yeah. You told me you were a fan of the Agent Lab thesis, and this is kind of it, right?

[00:07:02] Sarah Sachs: Right. I show that thesis to so many candidates. Like, I have it as, like, my Chrome autofill.

Um, at this point, like, it’s one of my most visited pages.

[00:07:10] swyx: Because, like, is this the "here’s why you should work at Notion and not OpenAI"? It’s like,

[00:07:14] Sarah Sachs: Here’s what’s different about it.

[00:07:16] swyx: Yeah.

[00:07:16] Sarah Sachs: And here’s why it’s not just a wrapper. I actually think more and more people understand it’s not just a wrapper.

[00:07:21] swyx: Yeah.

[00:07:22] Sarah Sachs: Um, and by the way, like, in the beginning, parts of what we build are wrappers on functionality that works well, of course, but that’s not really the most, um... I would say that’s not the product that drives revenue. And that’s not necessarily always what users need.

[00:07:35] swyx: I mean, you know, Notion is the AWS wrapper, but, like, the wrapper is very beautiful and, like, very, very well polished. So

[00:07:40] Sarah Sachs: Like, the analogy,

[00:07:41] swyx: like

[00:07:42] Sarah Sachs: The analogy that I’ve been coming back to is Datadog and AWS.

[00:07:45] swyx: Yeah.

[00:07:46] Sarah Sachs: So, uh, Datadog could not exist without cloud storage. Right.
That it’s kind of fundamental that that works. Um, and AWS has, like, a CloudWatch product, but Datadog is an expert on understanding how people want observability on the products they launch.

And we’re experts in understanding how people wanna collaborate, and that’s really where our expertise lies.

[00:08:04] swyx: Totally.

[00:08:04] Sarah Sachs: Um, regardless of the tools that we use.

[00:08:07] Alessio: I’m kind of curious how you think about implicit versus explicit expertise. I feel like Datadog is half and half implicit and explicit. It’s like they understand, across markets and industries, what engineering teams usually look for.

With Notion, it’s almost like more of the expertise is at the edge, because you as a platform are so horizontal that the end user is not really the same. Mm-hmm. Like, with Datadog, the end user is always, like, yeah, an engineering lead, kind of, like, an SRE-related person. With Notion, it can be anything.

So I’m curious how you put that expertise into a product versus, you know... Obviously AWS cannot build Notion. That doesn’t quite work in this case, but

[00:08:44] Simon Last: It’s a little bit differently shaped. I think, you know, a classic vertical SaaS, like Datadog, is kind of like that. They understand their individual customer very deeply.

It’s kind of a narrow slice. Um, Notion has always been super horizontal. And our task has always been to sort of balance these two somewhat opposing forces: like, we’re listening to our customers and what they want us to build, and it’s a broad slice. And then also we’re thinking about, like, okay, how do we decompose what they want into, uh, nice primitives that are really nice to use and will get us, like, as much bang for the buck as possible?

And then, you know, maintain the whole system, make it all, like, super clean and nice to use.

[00:09:22] Sarah Sachs: We still have user journeys. I mean, we still focus on, like, core...
I actually think the failure of our team is when we focus too much on what are, what are tools that are

[00:09:31] Simon Last: Mm-hmm.

[00:09:31] Sarah Sachs: cool tools. I actually think that’s when we have the least velocity, because you still need some sort of focus on a user journey.

So, like, for instance, we’ll all sit down every Friday and look at the P99 of, like, the most token-exhaustive custom agent transcripts, and just look at why it didn’t do well, and cut a bunch of tasks. Like, we still focus on, like, this should work. Email triaging should work. Mm-hmm. Right? And similarly, like when we were chatting before we started filming about, okay, how can I do PDF export?

Well, that’s functionality that then merits, maybe we should build a tool that has access to a computer sandbox and a file system and the ability to write code. Right? Right. Um, but it’s because we’re thinking about the fact that our users, to do their daily work, need to export PDFs, not because we’re like, hmm, I think a computer tool could be cool.

Like, let’s just see what happens. Mm-hmm. Like, we have to focus on some user journeys, otherwise we just don’t have, like, enough strategy to prioritize.

[00:10:29] swyx: I think there’s a lot of, like, really strong opinions that you’ve had. Do you have, like, sort of a tao of Sarah Sachs? Like, you know, how do you run your team?

Like, I feel like you’ve just accumulated all these strong opinions. Obviously part of this is your Token Town thing.

[00:10:43] Sarah Sachs: I think the tao of working with Sarah Sachs is, um, you’d have to... It depends who you ask. Um, I think it depends if you’re on my team, or a partner, right? Or a vendor.

[00:10:54] swyx: Yeah. There are other people who want to run their teams the way that you’re... Yeah.

You’re, like, bringing these things.
And then also, similarly, uh, Simon, when you did the Custom Agents demo, you had, like, well, we’ve been using custom agents, and here’s the super long list of everything that we do. No humans ever read it. Right? That’s what you said. I was like,

[00:11:07] Sarah Sachs: Yeah. So I think for me, um, something that I learned very quickly and became very comfortable with was that my job was not to be the ideas person or the technical expert.

My job was to make it so that everybody understood the objective, had a resource to help prioritize what they should work on, and had an avenue to prioritize what they thought was important. And I think that’s true with all leadership, but I think especially on the AI team. Almost all of our best ideas come from prototypes, from people that have a cool idea because they saw a user problem, and it’s a huge disservice if all of those ideas have to pass, like, the sniff test of what me and a product partner, or Simon and Ivan, decided was the direction, right?

Because a lot of what we’re doing is leaning into capabilities. So I think that’s the first thing: I don’t really view the role of engineering leadership as, uh, hierarchical, nor has it ever been, but especially now, like, being very willing to change direction based on, um, like, the proof is in the pudding.

Yeah. And, like, I think we have rebuilt our harness three or four times. And when you do that, then the second rule of engineering leadership is, like, you need to build a team that’s comfortable deleting their own code, and is very low ego, and is driven by what’s best for the company, and, um, doesn’t write design docs because they think it’s their promotion packet.

Right. And that’s a culture that Notion had long before I joined, but, like, our willingness to just swarm on different problems and, um, redo things that we’ve built before because something has changed... Like, there’s a lot of friction that can happen at companies when you do that. And it doesn’t happen at Notion.
And it doesn’t happen at Notion.And because it doesn’t happen when new people join. Like they don’t wanna be the ones that are saying, we shouldn’t do this. I wrote that code. So then it’s, you know, you, you create a culture that everyone thoughts and that culture comes directly, I think from Simon and Ivan though, um, because they’re very open-minded.[00:12:50] swyx: Anything that you,[00:12:50] Simon Last: you’d add? I’m not a manager, like, like, like Sarah is. Um, a lot of my role is really to try to think a little bit ahead, make sure that we’re, we’re building on the right capabilities and then like the prototyping stuff. And yeah, it’s really, really critical to always just be starting again.It’s like, okay, this is new thing. What does this mean? What if we just rethought everything or wrote everything? And so I, I’m, I’m basically just doing that in a loop every six months.[00:13:16] swyx: Yeah. Do you believe in internal hackathons for this stuff?[00:13:19] Sarah Sachs: I think there’s like two different versions. So one is like, we just have a, a, a solid bench of senior engineers that come and go on what we call the Simon Vortex and Productionizing what we built, right?Because when you’re in the Simon Vortex, the velocity is super high. The direction changes daily, and it’s meant to be like the equivalent of a SC Works lab. We don’t need to do hackathons for that. We need to have senior engineers that we trust to come in and out of those projects. For instance, like management boundaries are really loose.Like you report to him, but you work for her right now. Yeah. That’s something that when we hire managers, it’s important they don’t care about because we tend to form more structures. Yeah. Don’t be too[00:13:54] swyx: territorial.[00:13:55] Sarah Sachs: We form more. It’s after we ship things, not not before, just historically. 
Um, the second thing is we do have companywide hackathons.

Actually, this morning we just had our demo day for the hackathon we had last week. That’s more for people that aren’t directly working on the project, feeling like they have the time to pause and learn how to make themselves more productive, or how they would use Notion Custom Agents to build something.

Or, part of the hackathon was actually encouraging everyone across the company to build their own agentic tool-calling loop from scratch, following, like, an Every blog post on how to do it, I think, because we want

[00:14:26] swyx: Just with the compound engineering one. Yeah.

[00:14:28] Sarah Sachs: We want everyone in the company to use Claude Code, or whatever coding agent they please, and understand that fundamental.

So we set aside a day and a half, where all leadership encouraged everyone on their teams across the company to do it. So we have hackathons like that. I would say, kind of facetiously, like, everything we build is a little bit like a hackathon until it graduates and puts on big boy pants and has a product ops rollout leader and has assigned data scientists and stuff like that.

[00:14:54] swyx: Security review, enterprise stuff.

[00:14:56] Sarah Sachs: Actually, security review is one of the things that we bring in first, because it just slows us down way more and, um, causes a lot of tension, and they build better product if they’re involved early.

So, um, that is probably the first person to get involved in something. That’s the

[00:15:09] swyx: right PR-approved answer.

[00:15:10] Sarah Sachs: No, but it’s not just PR-approved. It, like, um, it’s

[00:15:13] swyx: actually real. It’s actually real. It’s like, um, I’m just saying, scar

[00:15:15] Sarah Sachs: tissue.

[00:15:15] swyx: Yeah,

[00:15:16] Sarah Sachs: because, like, you know, my background’s also... I worked at Robinhood for a number of years.

Yes.
So, like, uh, compliance and things like that, um, are a little bit more... You learn the hard way when it doesn’t come naturally.

[00:15:26] Simon Last: Yeah. I think the hackathon is really important for uplifting the general population, but, like, if that’s the only way you can build new things, you’re kind of toast. I mean, it has to be, like, the daily process, like, you know, building these new things.

Um, and it has to be about... I think, like, in the AI era, a lot more leverage accumulates to the most curious and excited people. And so it’s like we’re all about just, like, activating that energy. You know, like, if someone’s prototyping something on the weekend that they’re excited about and it’s important, that should be the main thing that we’re doing.

Yeah. Um, it’s not a hackathon that we schedule once a quarter; it’s just, like, yeah, daily process. Part of the culture.

[00:16:02] Sarah Sachs: I mean, that’s how we shipped image generation in Notion. It was always this thing that would be kind of nice to have, but it wasn’t really clear where it was necessarily aligned in product priorities.

It’d be a lot of work. And we had someone on the database collections team, Jimmy, who was like, I really wanna do image generation for cover photos and inside Notion. And we’re like, if you wanna build it, do it, please. Like, we encourage you. We gave them all the resources of working directly with Gemini and being able to, like, track the token usage and have it work through endpoints. We gave them eval support, everything, and then it became a full project.

[00:16:34] Alessio: Yeah.

[00:16:35] Sarah Sachs: That’s why you can’t have, like, ego as a leader. Like, that’s how we work.

[00:16:39] Alessio: What’s the size of the team today, both engineering and overall?

[00:16:43] Sarah Sachs: I manage, uh, the team that we’ll call core AI capabilities and infrastructure.

That’s about 50 people.
But then we have partner teams that do packaging, so how it shows up in the corner chat versus Custom Agents versus Meeting Notes. That’s another 30, 40 people. And then every team that has a product surface at Notion that a user can interface with owns the tool that the agent interfaces with. The editor team, the team that did CRDT for offline mode, is the same team that handles how two agents, um, edit competing blocks. Mm-hmm. Right? It’s the same problem. The team that built the underlying SQL engine is the same team that owns how the agent asks it to run a SQL query, and makes sure it does it performantly. And so from that regard, anyone working on product engineering is tasked with making their surfaces work for customers that are humans and agents, because over time the majority of our traffic will be coming from agents using our interface, not humans.

And so our objective is to make it so that the whole product org is building for agents.

[00:17:40] Alessio: Yeah. How has it changed internally? The activation bar has kind of lowered a lot. Like, anybody can create a prototype somewhat easily, especially in an existing codebase. Have you raised the bar on what type of prototype people need to bring forward to be taken, not, like, seriously, but, like, you know what I

[00:17:58] Simon Last: mean? Yeah. I think the bar is lowered in many ways. Like, one thing our team built that is really cool: our design team made a whole separate GitHub repo, uh, called the Design Playground. And it’s basically just to create a bunch of, like, helper components for quickly throwing together UIs.

And it’s become, like, actually quite sophisticated. Like, it has, like, an agent in there, and, like, uh, that’s pretty fun. So, like, they don’t do mocks; they just make, like, full prototypes.

[00:18:27] swyx: Here it is. It works.

[00:18:28] Simon Last: They give you, like, a URL.
They’re like, okay, all right. So we have to make, like, the real production version of that.

Um, and then for engineers, a prototype looks like just making it a feature flag that actually works. Like, that’s sort of the bar.

[00:18:39] Sarah Sachs: Something to understand that’s really unique about Notion, and one of the reasons I joined: we’re super lucky that no one uses Notion in their job as much as people that work at Notion.

[00:18:46] Simon Last: Of course.

[00:18:47] Sarah Sachs: So I think there’s very few companies, maybe if you worked on Chrome, I guess, but, like, everything that we ship, we ship internally first and get a lot of really quick feedback. And also, sometimes our dev instance is totally borked and you have to change a bunch of flags to get things done. And that’s kind of like... But everyone, so people that do IT ticketing, people that do supply chain procurement, recruiting, everyone is using the same instance of Notion, with, like, a lot of flags on for these prototypes people build.

Um, and so Brian Levin, one of the designers on our team, I think, evangelized this concept of demos over memos.

[00:19:18] swyx: Ooh, too

[00:19:20] Sarah Sachs: good. Um, which has been, uh, very good for building demos, and I think it’s put a big pressure point on us to have really strong product conviction, because if anything can be demoed, you really need a strong filter, making sure that if you’re doing X amount of work, you’re focusing on one tower, you’re not just building a really flat hill.

Right. That’s actually where I think there has to be more conviction from our PMs, um, and our designers, and, well, the company really, to have conviction about what journey we’re going on.

[00:19:52] Simon Last: But overall, I feel like it works pretty well.
Like, almost all the engineers have good enough taste to realize that, like, this prototype doesn’t actually make sense in the product, or it does.

So it’s not that common that I would see a prototype and it’s like, oh, this makes no sense. Mm-hmm. It’s like, you know, people are doing reasonable things, and then it’s just a matter of which things we build first, and then often just figuring out how to turn it on and off. In our, like, experimental chat UI, there’s probably, like, a hundred checkboxes in there.

[00:20:22] Sarah Sachs: Kills me.

[00:20:23] Simon Last: The things you could turn on and off.

[00:20:25] Sarah Sachs: Uh, but I think that... Okay, so that is kind of true, Simon, but, like, being the person that manages the evals team, there is a level of intensity that it adds to the platform team. So, you know, if we’re gonna do image generation in Notion, all of a sudden the way that we do attachments, and the way that our LLM completion, like, cortex, talks and expects tokens back, and now it’s getting images back... Like, there’s a lot of platform work that we do need to, like, solidify a little bit. So sometimes it’ll be in dev for a couple weeks before it makes it to prod, just because we still have to, like, make it robust, make it HIPAA compliant, ZDR compliant, figure out the right contracting with the vendor, whatever it is.

And we need to eval it, because we want the team to still maintain what they build. That’s the one thing: if we have a bunch of prototypes, it can’t just be, like, a small group of people that then maintain whatever N prototypes. So we have invested a lot of people in eval and model behavior understanding teams, in what we call agent dev velocity.

So your dev velocity building agents can be faster if we invest in that platform.
And so we have a whole org dedicated to agent platform velocity, so that you can build your own eval and then maintain it once you ship it. So if a new model release comes out and we, every...
[00:21:38] swyx: Every team maintains their own eval?
[00:21:40] Sarah Sachs: We maintain the eval framework. Every team owns their own evals, and a lot of them we’ve integrated, opt-in, into CI, or we run them nightly, and we have a custom agent that triggers a team to look at the major failures. That’s really critical because we have all these different surfaces now. A lot of it’s on the same agent harness, so it’s easier to maintain; it’s just packaging of different agent harnesses, but new functionality of the agent. Let’s say that we wanna update. Uh, you know, they deprecated Sonnet, um, 4 or whatever it is, and we need to auto-update.
[00:22:11] swyx: Are they already? That’s so... okay. Yeah. Actually, it wasn’t that long ago.
[00:22:14] Alessio: They were just 3.5.
[00:22:15] Sarah Sachs: 3.5, 3.7. Just got deprecated.
[00:22:18] swyx: 3.7, 5.2, or... yeah. No,
[00:22:20] Sarah Sachs: it’s not. 5.2 is... no. Yeah, 5.4 is 40% more expensive than 5.2. So if they deprecated 5.2, you would hear from me about that one. Um, but, uh, that’s another conversation to have.
[00:22:35] swyx: I have a cheeky evals question for you. Have you noticed any secret degradation from any of the major model providers?
[00:22:40] Sarah Sachs: Secret degradation?
[00:22:42] swyx: Like, during the workday, when it’s high traffic, it suddenly gets dumber.
[00:22:47] Sarah Sachs: Yeah. I mean, we definitely notice flakiness. We’ve definitely noticed, particularly for some providers, that things are slower during working hours, and...
[00:22:57] swyx: That’s a latency argument. Yes. Not a quality argument.
[00:22:59] Sarah Sachs: No.
I think the quality difference that’s interesting, and it’s not really about quantization, is that companies say they’re selling the same model through different vendors, whether it be first party or Bedrock, Azure, et cetera, and we do see different quality sometimes, and that’s not necessarily what’s advertised.
[00:23:21] swyx: Yeah. Kimi went to the point of shipping an eval across all the providers, and it was very obvious who was secretly quantizing, and it was very...
[00:23:28] Sarah Sachs: Yeah. But...
[00:23:29] swyx: That’s very embarrassing.
[00:23:30] Sarah Sachs: You know, um, we hire Subprocess to figure that out for us. So we just wanna understand where it’s regressing or where it’s optimized. And sometimes we’re okay with regressions that optimize latency if they’re the appropriate regressions. Our job is to make sure we have the evals to understand the changes that are important to us. And even when we’re partnering with labs on pre-releases of models, they’ll send us multiple snapshots. And this is less about quantization and more just regressions. They have shipped models that were not the snapshots that we wanted, and they have changed the snapshots that they shipped based on the feedback that we give, because our feedback tends to be more enterprise-work focused and not coding-agent focused. And definitely those can be bummers, like, you know, uh, we know that this wasn’t the version you wanted, but we’ll help you make it work. I mean, we always make it work, but that definitely happens.
[00:24:16] Alessio: Yeah. Do you have, um, failing evals that you’re just hoping will pass eventually, when a good model comes out?
[00:24:23] Sarah Sachs: Uh, I mean, yeah. So I think... I mean, I could talk about this for 60 minutes, so I will limit myself.
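Checking whether "the same model" really behaves the same through two vendors comes down to comparing two pass rates on one eval set. Below is a minimal sketch using a standard two-proportion z-test. The counts are hypothetical, and the 1.96 cutoff is just the usual two-sided 95% threshold. It is not a description of how Notion or its vendors actually compare providers.

```python
import math


def two_proportion_z(pass_a: int, n_a: int, pass_b: int, n_b: int) -> float:
    """z statistic for the gap between two pass rates, using the pooled standard error."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (p_a - p_b) / se


# Hypothetical counts: one eval set, one nominal model, served by two different vendors.
z = two_proportion_z(pass_a=870, n_a=1000, pass_b=830, n_b=1000)
verdict = "likely a real quality gap" if abs(z) > 1.96 else "within sampling noise"
print(f"z = {z:.2f}: {verdict}")  # z = 2.50: likely a real quality gap
```

The test only says the gap is unlikely to be noise. It cannot tell quantization apart from other serving differences.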
I think it’s a real issue when people say evals and it’s just like, that’s quality. I mean, it’s like saying testing. It’s not just unit tests, right? So we have the equivalent of unit tests: regression tests. Those live in CI, and those have to pass a certain percent, you know, within some stochastic error rate. Then we have, as you’re building a product, evals where these aren’t passing right now, and this is launch quality. So we have a report card, and on these categories we need to hit, you know, 80 or 90% of all of these user journeys to launch. And then we have what we call frontier or headroom evals, where we actively wanna be at a 30% pass rate. And that’s actually been an effort that we took on in partnership with Anthropic and OpenAI in the past maybe two or three months, because we actually hit a point where our evals were saturated, and we weren’t able to really give insightful feedback other than that it wasn’t worse. And not only is that not helpful for our partners, it’s not helpful for us to understand where the stream is going, you know, going back to that analogy. And so we spent a lot of time thinking about what Notion’s Last Exam looks like, right? Mm-hmm. Not just Humanity’s Last Exam. Ooh, Notion’s Last Exam. Mm-hmm. And, um, there’s a lot of, you know, dreams about what that would look like. I know we’ve talked a lot about benchmarking, um, swyx, but, uh, yeah. Notion’s Last Exam is a big thing inside the company, and we have full-time staff dedicated to it exclusively. Mm. We have a data scientist, a model behavior engineer, and a full-time, um, evals engineer just dedicated to the evals that we pass 30% of the time.
[00:25:56] swyx: What are you hiring for?
[00:25:57] Sarah Sachs: MBEs. I am hiring.
[00:25:58] swyx: What is an MBE?
[00:25:59] Sarah Sachs: Model Behavior Engineer.
Model behavior engineers started with the title "data specialist" before I joined, when they were working with Simon on, uh, Google Sheets, and Simon just needed someone to look through Google Sheets and say, yes, no, this looks bad, this looks good, right? And so we hired people with kind of diverse linguistics backgrounds. We had a linguistics PhD dropout, mm-hmm, and a Stanford new grad. And they’re amazing. And they basically formed a new function. And over time we’ve built a whole team, um, with a manager who’s now kind of reinventing what that role is with coding agents. So they used to be kind of manually inspecting code. Now they’re primarily building agents that can write evals for themselves, or LLM judges. There’s a really funny day, I can send you the picture, where Simon, about a year and a half ago, was teaching them how to use GitHub. Um, and they’re at the whiteboard, and it was like, okay, I think it would be so much faster if our data specialists learned how to use GitHub and learned how to commit these things to the code. And that was then, and now, I think, you know, coding has become a lot more accessible. Um, but moving forward it’s this mix of data scientist, PM, and prompt engineer, because there’s craft in understanding even what models can and can’t do. How do we define that headroom? How do we define what a good journey is? Um, is this model better or not? Why is this failing? There’s some qualitative work, but then there’s also a lot of instinct and taste to it, and that’s not necessarily software engineering. And so we have very firm conviction, and we have had for a number of years now, that that is its own career path, and we have always welcomed the misfits, so to speak. So we really firmly believe that you don’t need an engineering background to be the best at this job.
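The three tiers Sarah describes above (regression evals gated in CI within a noise margin, launch-quality report cards at roughly 80 to 90%, and headroom evals meant to sit near a 30% pass rate) can be sketched as one gating policy. Every threshold below is an illustrative assumption, and `SuiteResult` and `check` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SuiteResult:
    tier: str   # "regression", "launch", or "headroom"
    passed: int
    total: int

    @property
    def rate(self) -> float:
        return self.passed / self.total


def check(result: SuiteResult, baseline: float = 0.95, tolerance: float = 0.02,
          launch_bar: float = 0.8, headroom_ceiling: float = 0.5) -> str:
    """Hypothetical gating policy for three eval tiers; all thresholds are illustrative."""
    if result.tier == "regression":
        # CI gate: allow a small margin below baseline for run-to-run stochastic noise.
        return "pass" if result.rate >= baseline - tolerance else "fail"
    if result.tier == "launch":
        # Report-card gate: a user-journey category must clear the launch bar.
        return "ready" if result.rate >= launch_bar else "not ready"
    if result.tier == "headroom":
        # Frontier evals are supposed to be hard; a high pass rate means the suite is saturating.
        return "saturated" if result.rate >= headroom_ceiling else "useful"
    raise ValueError(f"unknown tier: {result.tier}")


print(check(SuiteResult("headroom", 30, 100)))  # useful
```

The headroom branch turns the usual logic around. There, a rising pass rate is the signal to write harder cases, not a reason to ship.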
And that’s what’s quite unique about this particular role.
[00:27:37] Simon Last: Yeah, this is something that I’ve been pretty excited about recently: we made an effort basically to treat the eval system as an agent harness. So if you think about it, you should be able to have an agent, end to end, download a dataset, run an eval, iterate on a failure, debug, and then implement a fix. And ultimately you should be able to drive the full process with a human sort of observing the outer system. So yeah, we went pretty hard on that, and that’s worked extremely well so far. It’s basically just turning it into a coding agent problem.
[00:28:11] swyx: Your coding agent, or just whatever harness?
[00:28:13] Simon Last: Any coding agent. Yeah, Codex, Claude Code. It should be totally general. I think it would be a mistake to fix it on any particular coding agent. At the end of the day, it’s just CLI tools.
[00:28:21] Sarah Sachs: It’s the same way that you would have a coding agent write the unit test. You should have a coding agent write the eval.
[00:28:26] swyx: Yeah.
[00:28:26] Sarah Sachs: But there’s a lot of supervision in that still. We just don’t believe that supervision has to come from software engineers, because a lot of it is, um, kind of UXR and whatever, and these are the people that also triage failures and tell us where we should be investing next.
[00:28:40] swyx: Yeah. I’m gonna go ahead and ask a spicy question. Is there a day when there are no software engineers at Notion?
[00:28:46] Simon Last: Um,
[00:28:46] Sarah Sachs: What does it mean to be a software engineer?
[00:28:47] swyx: Exactly.
[00:28:48] Simon Last: I mean, I think the way things are going is like we’re on some continuum.
If you look back three years ago, humans were typing all the code, and then we had autocomplete, so you’re typing less of the code. Then we had sort of agents filling in lines, and now we’re getting into agents doing longer-range tasks, where they can debug and implement a fix and then verify it works and, you know, get your PR even merged and deployed. I think we’re sort of just moving up the abstraction ladder, and then the human role becomes more about observing and maintaining the outer system. There’s a stream of agents flowing through, making PRs: what’s going off the rails? What do I need to approve? Is there a learning or memory mechanism that works? So it’s kind of a hard engineering problem. There’s a lot to do there. I think we’re just sort of moving up the stack.
[00:29:34] Sarah Sachs: The same transition machine learning engineers have made, right? Like, I haven’t looked at a PR curve in a while.
[00:29:39] swyx: Yeah. You used to do this stuff, and now, um, auto research can do it.
[00:29:42] Sarah Sachs: Right? Like, I think it depends on what you define as a software engineer.
[00:29:46] swyx: Yes. That’s changing for sure.
[00:29:49] Sarah Sachs: I think every software engineer at Notion went through this this summer. Um, one of our engineering leads at the company described it as every software engineer going through the identity crisis that every manager goes through, where all of a sudden they realize their ability to write code is less important than their ability to delegate and context switch. And I think that is a transition out of being a software engineer. But...
[00:30:12] Simon Last: Yeah. Yeah, there’s a critical difference to being a manager, which is that it is actually very deeply technical.
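Simon's point a moment earlier, that the eval system should be "just CLI tools" any coding agent can drive, can be sketched with `argparse`: one subcommand that runs a dataset and prints machine-readable failures. The `evals` program name, the JSONL field names, and `fake_model` are hypothetical stand-ins, not Notion's tooling.

```python
import argparse
import json
from typing import List, Optional


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return prompt.upper()


def run_eval(cases: List[dict]) -> List[dict]:
    """Return every case whose output differs from its expected answer."""
    return [c for c in cases if fake_model(c["input"]) != c["expected"]]


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="evals", description="Hypothetical eval CLI an agent can drive")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="run a JSONL dataset and print failures as JSON")
    run.add_argument("dataset", help="JSONL file with 'input' and 'expected' fields")
    args = parser.parse_args(argv)

    with open(args.dataset) as f:
        cases = [json.loads(line) for line in f if line.strip()]
    failures = run_eval(cases)
    # Machine-readable output: an agent can read the failures, patch code, and re-run.
    print(json.dumps({"total": len(cases), "failed": failures}))
    # A non-zero exit code lets CI or the agent loop treat failures as a signal.
    return 1 if failures else 0
```

To make this an actual command, you would call `raise SystemExit(main())` under a `__main__` guard or from a console entry point. Any harness that can run shell commands and read JSON could then run the eval, inspect failures, and retry.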
The problem, you know, is that humans are very fuzzy, and you can’t treat a team of humans like a rigorous system where PRs flow through and can be in a blocked status, with a defined answer for what happens when they’re blocked, right? With a set of agents, you actually can do that. And I think there’s a lot of interesting technical rigor that goes into that. It’s a technical design problem, ultimately.
[00:30:42] Alessio: What is the design of the software factory that you’re building?
[00:30:46] Simon Last: Yeah, I mean, I think we’re trying a lot of different things. Ultimately you want to design a system that requires as little human intervention as possible, while still maintaining the invariants that you care about. So yeah, we’re exploring a lot of different ideas there. I could talk about a few things I think are important. One thing I think is really important is having some kind of specification layer. You can just commit markdown files. Mm-hmm. That works pretty well, but...
[00:31:15] swyx: It’s nice to be Notion, man. I’m just saying, like, the spec... yeah. The natural home for specs is Notion.
[00:31:21] Simon Last: Yeah. Right. It can be a database of pages. Yeah. I mean, it needs to be something that is, you know, human readable and AI viewable, and I think that’s pretty key. Another really key component is the self-verification loop. Yes. You need really, really good testing layers, basically. And that’s a really deep problem. But by getting that right... and then it’s kind of the workflow of, like, what happens when there’s a bug? How does it flow into the system? Is it a subagent working on it? How does it make a PR, and how does that get reviewed and merged? So there’s the flow or process.
[00:31:56] swyx: Yeah. Cool.
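The "rigorous system" Simon contrasts with a human team, where PRs flow through defined states and are blocked, reviewed, or merged under fixed rules, can be sketched as a small state machine. The states, the transition table, and the tests-before-review gate below are hypothetical choices that illustrate invariants plus a self-verification gate. They are not Notion's actual software factory.

```python
from enum import Enum
from typing import List


class State(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    BLOCKED = "blocked"
    IN_REVIEW = "in_review"
    MERGED = "merged"


# Allowed transitions; any other move violates an invariant of this hypothetical factory.
TRANSITIONS = {
    State.TODO: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.BLOCKED, State.IN_REVIEW},
    State.BLOCKED: {State.IN_PROGRESS},
    State.IN_REVIEW: {State.IN_PROGRESS, State.MERGED},  # review can bounce work back
    State.MERGED: set(),
}


class WorkItem:
    def __init__(self, title: str):
        self.title = title
        self.state = State.TODO
        self.tests_passed = False

    def move(self, new: State) -> None:
        if new not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new.value} not allowed")
        # Self-verification gate: nothing reaches review without passing tests.
        if new is State.IN_REVIEW and not self.tests_passed:
            raise ValueError("tests must pass before review")
        self.state = new


def needs_human(items: List[WorkItem]) -> List[str]:
    """Only blocked items and items awaiting review surface to the human operator."""
    return [i.title for i in items if i.state in (State.BLOCKED, State.IN_REVIEW)]
```

Because every legal move is listed, the human's view can shrink to the output of `needs_human` instead of watching each agent.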
Uh, you know, one thing we did work on before you guys came in was this demo, or this...
[00:32:01] Simon Last: Agents.
[00:32:02] swyx: ...agent demo. Uh,
[00:32:03] Simon Last: So every...
[00:32:04] Alessio: Every time we do an episode, we try the product, right? I don’t think there’s ever been an episode where I haven’t tried it. Yeah. Um,
[00:32:11] swyx: And we... "try" is a big word. Since day one, Latent Space has been on Notion, but this is the net new thing. Yes.
[00:32:18] Alessio: So this is for Nel Labs, which is the space we’re in. So next week we’re opening applications for tenants. So there’s a web form, let me... we got this form done here. Uh, so, uh, before, the workflow would be: I get an email, then I look at the person, and it’s like, should I spend time talking to this person? Then I respond, they respond back. So I built this. And the name, it came up with on its own. How does it come up with its own name?
[00:32:43] Simon Last: Yeah, that’s a pretty apt name. It’s just a random name generator.
[00:32:47] Alessio: Oh, that’s funny. It just came...
[00:32:49] Simon Last: The fact that it picked that is kind of hilarious. I’m pretty sure it’s just deterministic.
[00:32:54] Sarah Sachs: Resilient Collector. I think I’ve never looked at the code for that. I’ve never second-guessed it. I think it’s kind of like a Mad Libs situation.
[00:33:00] Simon Last: Yeah, I think you’re right. It’s totally deterministic. Oh, I thought it was great. Yes. Although, if you use the AI to set itself up, it can update its own name, so. Okay. Um,
[00:33:11] Sarah Sachs: How did you create it? Did you just do...
[00:33:12] Alessio: Classroom? I...
[00:33:13] Sarah Sachs: Okay.
[00:33:13] Alessio: I did, yeah. I just said, check my inbox for applications for a coworking space, keep a people database. So it created the database for me, which I have here.
And I guess the database is like a Notion table, because everything is Notion. Um, and then whenever, um, an email comes in, like here, it just creates a new row for the person. Mm-hmm. And then it uses web search to enrich, mm-hmm, the profile. So it kind of searches the web, and it’s like, this is who this person is, this is when they say they wanna move in, and it kind of updates everything else. I mean, it’s not AGI, but to me, I don’t wanna do this work. I mean, it took me maybe 15 minutes to set up the whole thing. Um, and I really like that most of the information lives here, you know. It’s not like some other tool asking me...
[00:34:01] Sarah Sachs: Yeah.
[00:34:01] Alessio: ...to bring my stuff there. It’s like, I would’ve probably already created a Notion thing.
[00:34:06] Sarah Sachs: Mm-hmm.
[00:34:06] Alessio: So...
[00:34:07] Sarah Sachs: Most of our biggest use cases and gains come from removing that extra layer of human involvement in the process, right? And so one of our biggest use cases is bug triaging. So if someone posts something in Slack, you can just have a custom agent that lives there, that has its own routing constitution of what team this belongs to, creates a task in your task database, and then posts in that Slack channel, right? That’s one of the first things that we built internally, I think. And it’s completely changed the way that Notion functions as a company. Nothing falls through... well, most things don’t fall through the cracks. We don’t know what we don’t know. But it’s not replacing people, it’s replacing processes.
[00:34:44] Alessio: Yeah.
[00:34:44] Sarah Sachs: Right.
[00:34:45] Alessio: And I’m curious how you think about composability of these things. So the other one I was working on is like a lease filler. So whenever somebody signs up as a tenant, it’ll fill the lease for them. There should probably be some agent, like an office manager agent. Mm-hmm.
That can handle the request, make the lease, and then, uh, give them access to the office and all of that. How do you think about that feature?
[00:35:08] Simon Last: Yeah, so I mean, there are two ways you can compose. One way is by using the data primitives. So you could have one agent writing to the database, and another agent that’s watching the database. So that’s one way that they can coordinate that’s a little bit more decoupled, and, mm-hmm, it works really well. Or you can couple them. I think it’s actually not released yet, we’re releasing it like next week: in the settings for an agent, you can give it access to invoke any other agent.
[00:35:34] swyx: Hmm.
[00:35:34] Simon Last: So you can have them just talk directly.
[00:35:37] swyx: Is there a limit on, like, number of recursions, or just...
[00:35:40] Simon Last: Um, probably.
[00:35:42] swyx: You know what I mean? Like, you can just get an infinite loop that way.
[00:35:45] Simon Last: There’s some kind of... yeah.
[00:35:46] Sarah Sachs: I think there is actually a number somewhere.
[00:35:49] swyx: I believe... I’m just, you know, someone’s gonna screw up.
[00:35:51] Simon Last: You should try to see.
[00:35:53] swyx: Yeah. I mean, everything’s gonna be paperclips.
[00:35:55] Simon Last: Oh, yeah. Yeah. But, uh, that’s really useful. So, you know, I helped someone internally the other day. They had built over 30 custom agents for our go-to-market team, doing all kinds of different things, for example researching, filling in information about a customer, or triaging customer feedback, or something like that. Literally over 30 of them.
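The direct-invocation style of composition, plus the recursion cap swyx asks about, can be sketched with a depth counter carried through each call. The episode only says there is "a number somewhere". `MAX_DEPTH = 3`, `AgentRuntime`, and the three agents below are hypothetical.

```python
from typing import Callable, Dict

MAX_DEPTH = 3  # hypothetical cap on nested agent-to-agent calls

AgentFn = Callable[..., str]  # called as fn(runtime, task, depth)


class AgentRuntime:
    def __init__(self) -> None:
        self.agents: Dict[str, AgentFn] = {}

    def register(self, name: str, fn: AgentFn) -> None:
        self.agents[name] = fn

    def invoke(self, name: str, task: str, depth: int = 0) -> str:
        # Guard against agents calling each other in an unbounded loop.
        if depth >= MAX_DEPTH:
            return f"{name}: refused, depth limit {MAX_DEPTH} reached"
        return self.agents[name](self, task, depth + 1)


def intake(rt: AgentRuntime, task: str, depth: int) -> str:
    # Hands a newly signed-up tenant off to the office-manager agent.
    return rt.invoke("office_manager", f"lease for {task}", depth)


def office_manager(rt: AgentRuntime, task: str, depth: int) -> str:
    return f"office_manager: drafted {task}"


def ping(rt: AgentRuntime, task: str, depth: int) -> str:
    # A misconfigured agent that keeps invoking itself.
    return rt.invoke("ping", task, depth)


rt = AgentRuntime()
for name, fn in [("intake", intake), ("office_manager", office_manager), ("ping", ping)]:
    rt.register(name, fn)
print(rt.invoke("intake", "Ada"))  # office_manager: drafted lease for Ada
print(rt.invoke("ping", "x"))      # ping: refused, depth limit 3 reached
```

The decoupled alternative Simon mentions, where one agent writes to a database another agent watches, avoids nested calls entirely. That is one reason it composes more safely.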
And then he even made a database of all the agents, and then he’s like, okay, now I’m getting over 70 notifications per day, just the agents being blocked on various things. Uh, and then I was like, oh, okay, cool. You know, the obvious thing to do there is to make a manager agent.
[00:36:32] Sarah Sachs: Right?
[00:36:33] Simon Last: That’s gonna sort of be another abstraction layer between you and your, uh, 30 agents. Uh, so yeah, we set him up with a manager agent, and it has access to invoke all the other agents, and it’s sort of watching and observing them, and it just creates a layer of abstraction. So instead of 70 notifications per day, it’s like five. And then the manager agent can help debug and fix any problems with them.
[00:36:54] swyx: Is there a concept of, like, an inbox or something like that? You’re basically saying that they can message each other?
[00:37:00] Simon Last: Yeah.
[00:37:01] Sarah Sachs: Well...
[00:37:01] swyx: They use the system of record, which is...
[00:37:02] Sarah Sachs: Notion, so we...
[00:37:03] Simon Last: Actually, yeah, we didn’t make any special concepts at all.
[00:37:06] swyx: They’re hooked into the Notion notifications that I would’ve gotten?
[00:37:09] Sarah Sachs: They can just write a task to a database that the other agent is listening to, or they can actually call a webhook to the agent; they can just @-mention the agent. Okay.
[00:37:17] Simon Last: Yeah, I mean, this is something that we’re still working on. I think, generally, the way we do these things is you first make it possible, maybe in a sort of janky way. So I think the way I set them up is, you know, we created a new database that was sort of like issues. Mm-hmm.
That the custom agents were experiencing, and then we gave them all access to file an issue, and then the manager has access to read the issues. Um, and that works pretty well: essentially, give it its own internal issue tracker just for the agents. And then, you know, if that becomes a concept that seems generally useful, maybe we will think of how to package it in. But I mean, generally we try to just keep it to composing the primitives if we can. You know, another example of this is we have no built-in memory concept. Memory is just pages and databases. And so if you wanna give it memory, just give it a page and give it edit access to that page, and the...
[00:38:03] swyx: Human can edit it. Agent can edit it.
[00:38:04] Simon Last: Yeah. And so that pattern works extremely well. And, you know, depending on the use case, you can have it be just a page, or it could be an entire database, or it can have subpages. It’s pretty open what you can do with that.
[00:38:15] Alessio: So when I was setting this up, uh, I connected my inbox, and it was like, do you wanna use Gmail or Notion Mail? And I’m like, I don’t wanna use either, I just want you to do it. I’m curious how you think about, you know, Notion Mail, Notion Calendar, all of these kinds of UI/UX interfaces, full-stack...
[00:38:29] Simon Last: Notion.
[00:38:30] Alessio: Yeah. When at the same time you have the agents abstracting them away from you in a way, you know, how do you spend the product calories, so to speak?
[00:38:37] Simon Last: Yeah, I mean, I think it’s pretty important that you don’t have to use Notion Mail to connect to the mail capability. So you can just connect to Gmail or whatever you want to use. And we’re thinking of the mail service as being really great to the extent that it’s really agent-built, right?
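The manager-agent setup described above (custom agents file blockers into a shared issues database, and a manager reads it so the human sees a few items instead of 70 notifications) can be sketched with a list standing in for the database and a `Counter` for de-duplication. `Issue`, `file_issue`, `manager_digest`, and the agent names are hypothetical. A real manager agent would also triage and fix issues, not just count them.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Issue:
    agent: str   # which custom agent filed it
    reason: str  # why that agent is blocked


issues: List[Issue] = []  # stands in for the shared "issues" database


def file_issue(agent: str, reason: str) -> None:
    """What each custom agent does instead of notifying a human directly."""
    issues.append(Issue(agent, reason))


def manager_digest(items: List[Issue]) -> List[str]:
    """Hypothetical manager-agent step: collapse repeated blockers into one line each."""
    counts = Counter(items)
    return [f"{i.agent}: {i.reason} (x{n})" for i, n in counts.most_common()]


for _ in range(40):
    file_issue("crm-enricher", "missing CRM access")
for _ in range(30):
    file_issue("feedback-triage", "unknown product area")
file_issue("renewal-notes", "page not shared")

for line in manager_digest(issues):
    print(line)
```

Because `Issue` is frozen (and therefore hashable), identical blockers from the same agent collapse into one counted line.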
So maybe the mail app is just sort of a prepackaged agent that helps you automate your inbox.
[00:39:00] Alessio: Yeah, the auto-labeling is great.
[00:39:03] Sarah Sachs: I think when we, um, integrate with Gmail, for instance, we have a series of tools available via MCP or API to Gmail. When we integrate with Notion Mail, we have the Notion Mail engineering team build us the, um, exact right tools that optimize latency, performance, and quality. They own that quality. Um, there are product leads there. They’re directly thinking about the user problems that happen in mail. So when we build integrations and connections, we tend to build natively first, um, and then think about extending them generally, just because it’s also easier, mm-hmm, to build natively first. Um, so that tends to be how we phase things out.
[00:39:43] swyx: Talking about integrations, you prompted me, so I gotta ask: MCP versus CLI. What’s going on? What’s the opinion?
[00:39:48] Simon Last: Yeah. I think, I mean, I’m definitely bullish and excited about CLI. I think there are a few really cool things about CLI. So one really cool thing is that it’s in the terminal environment, so it gets a bunch of extra power. For example, it can paginate and cursor through long outputs. Um, and it has progressive disclosure inherently. Uh, so, you know, you don’t see all the tools at once. You just see the CLI wrapper, and you can use the help commands and read files. And then I think the most important thing, the thing that’s super cool, is that it’s also inherently bootstrapped. So if there’s an issue, uh, the agent can debug and fix itself within the same environment that it uses the tool in.
[00:40:30] swyx: Mm.
[00:40:30] Simon Last: Right. Like, you know, I think I saw a tweet this morning.
Someone said, you know, my agent didn’t have a browser, so I asked it to make a browser tool, and within a hundred lines of code it gave itself a little browser, wrapping the Chromium API, um. That’s pretty incredible. And then if there was a bug, it would just immediately try to fix it. Mm-hmm. Right. On the other hand, if you use, you know, the Chrome DevTools MCP, I’ve had this issue where sometimes the transport gets messed up. If it gets messed up, the agent has no way to fix itself. It no longer has a browser; it’s now broken. Right. I think that’s pretty fundamental, but I would say a lot of the bad things about MCP can be fixed. Uh, so I think progressive disclosure, for example, can be fixed with the right harness. It obviously doesn’t make sense to show the model all the tools all the time. That’s not really inherent to the MCP protocol. It’s just how you wrap it and use it.
[00:41:16] swyx: There are many poorly built MCPs because we didn’t know.
[00:41:19] Simon Last: Yeah, yeah. I mean, it was just early. The obvious thing to start with is to just show it all the tools, and it’s like, okay, now we have a hundred tools. Yeah. And the tool calling actually works. So let’s...
[00:41:28] swyx: Victims of your success.
[00:41:29] Simon Last: ...give it a way to filter or search the tools. So yeah, I would say, broadly speaking, I’m really bullish on CLI. I’m still bullish on MCPs in a certain environment. In particular, MCP is really great when you want a narrow, lightweight agent. I think there are definitely a lot of use cases where you don’t want a full coding agent with a compute runtime, and also you want it to be more tightly permissioned. MCP inherently has a really strong permission model: all you can do is call the tools.
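The harness-level fix Simon describes for MCP's "show it all the tools" problem is progressive disclosure. The model first sees only matching names and one-line descriptions, and it loads a tool's full schema only once it picks that tool. The sketch below assumes a hypothetical in-memory catalog. `search_tools` and `describe_tool` are invented names, not part of the MCP specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tool:
    name: str
    description: str
    schema: Dict[str, str]  # full parameter schema, expensive to keep in context


# Hypothetical catalog; imagine a hundred of these aggregated from connected servers.
CATALOG = {
    t.name: t
    for t in [
        Tool("calendar_create_event", "Create a calendar event", {"title": "string", "start": "datetime"}),
        Tool("mail_search", "Search the mailbox", {"query": "string"}),
        Tool("mail_send", "Send an email", {"to": "string", "body": "string"}),
    ]
}


def search_tools(query: str) -> List[str]:
    """Step 1: the model sees only matching tool names and one-line descriptions."""
    q = query.lower()
    return [f"{t.name}: {t.description}" for t in CATALOG.values()
            if q in t.name.lower() or q in t.description.lower()]


def describe_tool(name: str) -> Dict[str, str]:
    """Step 2: the full schema enters the context only after a tool is chosen."""
    return CATALOG[name].schema


print(search_tools("mail"))  # ['mail_search: Search the mailbox', 'mail_send: Send an email']
```

Only these two meta-tools need to be in the prompt up front, no matter how large the catalog grows.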
A CLI is a little bit murkier. It’s like, can it access the API token? Are you properly re-encrypting the token so it can’t exfiltrate it? It introduces a lot of new issues, which are real and hard to solve. And MCP is just the dumb, simple thing that works, and it’s pretty good.
[00:42:12] Sarah Sachs: I’ll add two more perspectives, not about what works well for Notion, but about how Notion commits to both platforms. Notion is dedicated to being the best system of record for where people do their enterprise work. So we will always support our MCP insofar as other people are using MCPs, right? So regardless of our perspective, we’ve put a lot of effort into our MCP, and we have a fantastic team that we’re building, um, to do more there. And the second thing I’ll say: I think we all think about this a lot, but lately I’ve been thinking a lot about making sure there’s value alignment between pricing, um, and capability.
[00:42:43] swyx: Literally our next question.
[00:42:44] Sarah Sachs: And needing a language model to execute deterministic tasks feels wasteful, and relying on a language model to interface with third-party providers seems wasteful for tasks that don’t require it. That matters particularly because our custom agents use usage-based pricing. We think of pricing as the barrier of entry for use of our product, and we’re quite committed to making sure that it’s not wasteful. Um, not just because it’s a bad deal for our customers, but because it’s also bad business.
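The cost argument here (and continued in the next lines) is a break-even calculation. An agent writing a deterministic script pays its token cost once. A model driving the same tool calls pays on every run. The prices and token counts below are hypothetical and ignore prompt-cache discounts and the compute cost of running the script.

```python
def llm_path_cost(runs: int, tokens_per_run: int, usd_per_mtok: float) -> float:
    # Every run pays again for the model to read context and drive the tool calls.
    return runs * tokens_per_run * usd_per_mtok / 1_000_000


def script_path_cost(gen_tokens: int, usd_per_mtok: float) -> float:
    # Pay once for an agent to write the deterministic script; later runs use no tokens.
    return gen_tokens * usd_per_mtok / 1_000_000


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
PRICE = 3.0        # USD per million tokens
PER_RUN = 20_000   # tokens the LLM-driven path spends on each run
GEN = 200_000      # tokens spent once to write and debug the script

break_even_runs = GEN / PER_RUN  # after this many runs, the script is cheaper
print(break_even_runs)                    # 10.0
print(llm_path_cost(30, PER_RUN, PRICE))  # 1.8
print(script_path_cost(GEN, PRICE))       # 0.6
```

In general the break-even point is the one-time generation cost divided by the per-run cost. Recurring, well-defined tasks pass it quickly, while one-off open-ended tasks may never reach it.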
We wanna have as many buyers as possible; there’s an elasticity of demand. And so if we can have our agents properly execute code that calls a CLI deterministically, it’s a one-time cost, right? Versus constantly having a language model integrate with an MCP over and over and over, paying those repeated token fees, and if it’s happening outside the cache window, then you’re paying for it over and over, and it’s just kind of unnecessary and less deterministic when it doesn’t have to be.
[00:43:36] Alessio: Yeah, the open-endedness, I think, is the main thing. Like, well, if I go write code to just call an API, I would never use an MCP. But then you need an MCP sometimes, when you know what to call but you don’t want it to restart. Versus the browser-from-scratch thing, which is great when you’re doing it on your own, but if your customers were having your AI write a browser from scratch every time and you had to pay the token cost of that, yeah, you’d be like, no, no, the Chrome DevTools MCP is actually pretty great, just use that. I’m curious, how do you make that decision? Should it be just a straight API call, very narrow? Should it be an MCP? Should it be super open-ended?
[00:44:10] Sarah Sachs: Do you mean when we ship Notion capabilities, or when we add capabilities to...
[00:44:13] Alessio: Notion...
[00:44:14] Sarah Sachs: ...AI, or...
[00:44:14] Alessio: I mean, you might have a capability where the only way to do it is an open-ended agent, like an agent with a coding sandbox.
[00:44:21] Sarah Sachs: Yeah. In Notion AI they’re not explicit, but we also ship an MCP.
[00:44:24] Alessio: Yeah. Yeah. In B...
[00:44:25] Sarah Sachs: Yeah.
[00:44:26] Alessio: Internally, okay. Is there ever a discussion of, we’re not gonna ship it because we’re not able to tie it down? Or are you happy to just...
[00:44:33] Sarah Sachs: Um, no.
I mean, there are a lot of things where we choose not to use MCP because we wanna add more high-touch quality. I think search and agentic find is the largest instance of that, where we have, um, Slack and Linear and Jira search in Notion that is not necessarily using the search MCP functionality provided by those companies. And that’s because we think it’s quite critical, for how our agent trajectories work, to have a little bit more control over the functionality of the search journey. And so it usually comes from quality, and there’s a long tail of things, and that’s why we built an MCP client, or an MCP server, excuse me, so that people can connect whatever they want. There’s that long tail, right. For search particularly, I would say that’s the primary entry point, but there are other connections as well, and it’s a little bit of secret sauce when we are okay with MCP functionality and user-driven auth, and when we actually wanna carry a lot more ourselves.
[00:45:31] Simon Last: I think that there’s not really a conflict here. There are just different layers of the stack and different abstractions. I mean, if we were to map it out, you’ve got MCP, which is a protocol for gaining access to tools. It’s an open protocol, so you can easily get a long tail of many things. So if you open up, like, the tool settings... oh, you saw the trigger. Actually, that’s something that MCP can’t do. So if you scroll down to the tools and access, you’re gonna add a connection. Yeah. MCP is a really great way to gain access to tools, and it works really well, but you just looked at the trigger: for example, there’s no trigger protocol. And so those are things we had to build ourselves. And then there are some integrations where we use MCP.
Like, so for example, I think the, you know, the Linear and the GitHub... Mm-hmm.
[00:46:20] Simon Last: ...use MCP, but the Slack, Mail, er, those are actually ones we built in house. And we spent a lot of time really fine-tuning all the tools to make them really good, and also building out the triggers. So it’s just different layers of the stack. Some things make sense sometimes. And then, you know, we just have to harness the right tool at the right time. I don’t think there’s an inherent strong conflict between these things.
[00:46:40] Alessio: Do you have a canonical representation of these tools internally, where you wrap these things together, the MCP plus the custom built?
[00:46:46] Simon Last: Yeah. Yeah. We have internal abstractions for what is a tool, what is an agent, what is a completion call. Yeah.
[00:46:55] Sarah Sachs: We even have internal abstractions for, like, what is a chat archetype, whether it be from Teams or Slack.
[00:47:02] swyx: Yeah.
[00:47:02] Sarah Sachs: Right.
[00:47:02] swyx: It’s like the only...
[00:47:03] Sarah Sachs: Way to...
[00:47:03] swyx: ...build with AI, ’cause everything’s moving so quickly. You would have to abstract it so that you can swap things out.
[00:47:09] Simon Last: Yeah. I mean, there’s always a dance. We’ve probably rebuilt our framework, like I said, five different times. Um, it’s always a dance of, okay, how does this new thing work? Right? What should the abstraction be? What is OpenAI giving us? What is Anthropic giving us? Um, you know, we’re trying to wrap over it. I think we’ve been pretty successful with that. It’s just a matter of staying nimble, yeah, and making sure that you always have the simplest, dumbest abstraction you can that maps to the different things. Yeah.
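The layering Simon describes, with internal abstractions for "what is a tool" and MCP as just one type of integration alongside in-house tools, can be sketched with a shared interface. Everything here is hypothetical. In particular, `FakeMCPClient` and its `list_tool_names` / `call_tool` methods are stand-ins invented for the sketch and do not reproduce the real MCP SDK's API.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class Integration(ABC):
    """Hypothetical common interface the agent loop sees, whatever backs the tools."""
    @abstractmethod
    def list_tools(self) -> List[str]: ...

    @abstractmethod
    def call(self, tool: str, args: Dict[str, Any]) -> Any: ...


class NativeIntegration(Integration):
    """Hand-built tools, tuned in house, registered as plain callables."""
    def __init__(self, tools: Dict[str, Callable[..., Any]]):
        self.tools = tools

    def list_tools(self) -> List[str]:
        return list(self.tools)

    def call(self, tool: str, args: Dict[str, Any]) -> Any:
        return self.tools[tool](**args)


class FakeMCPClient:
    """Stand-in for an MCP client session; these method names are invented for the sketch."""
    def list_tool_names(self) -> List[str]:
        return ["create_issue"]

    def call_tool(self, name: str, args: Dict[str, Any]) -> Any:
        return {"tool": name, **args}


class MCPIntegration(Integration):
    """Adapts an MCP-style client to the same interface: MCP is one kind of integration."""
    def __init__(self, client: FakeMCPClient):
        self.client = client

    def list_tools(self) -> List[str]:
        return self.client.list_tool_names()

    def call(self, tool: str, args: Dict[str, Any]) -> Any:
        return self.client.call_tool(tool, args)


class Toolbox:
    """Flat namespace of 'integration.tool' names exposed to the agent harness."""
    def __init__(self, integrations: Dict[str, Integration]):
        self.integrations = integrations

    def names(self) -> List[str]:
        return [f"{p}.{t}" for p, i in self.integrations.items() for t in i.list_tools()]

    def call(self, qualified: str, args: Dict[str, Any]) -> Any:
        prefix, tool = qualified.split(".", 1)
        return self.integrations[prefix].call(tool, args)


box = Toolbox({
    "slack": NativeIntegration({"post": lambda channel, text: f"#{channel}: {text}"}),
    "linear": MCPIntegration(FakeMCPClient()),
})
print(box.names())  # ['slack.post', 'linear.create_issue']
```

Replacing an MCP-backed integration with a native one, or the reverse, only changes one entry in the `Toolbox` mapping. The agent loop never sees the difference.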
So we have a tool integration abstraction, for example, and then MCP is a type of integration. [00:47:41] swyx: Yeah. [00:47:42] Simon Last: That’s one of the... [00:47:43] swyx: This might be a big ask, but I’m gonna try: you’ve said multiple times that you rebuilt a few times, like five times; I don’t know what the right number is. Is there a brief history of what each rebuild was doing? [00:47:56] Simon Last: I can try to do that. I [00:47:57] swyx: mean, [00:47:58] Simon Last: yeah, there’s [00:47:58] swyx: ...interesting, you need to RAG over [00:48:00] Sarah Sachs: archaeology. [00:48:00] Simon Last: I mean, the first version, the first version we started building in late 2022. Oh my gosh. Well, there have been many versions, actually. Okay. Well, the... [00:48:08] swyx: I like the highlights. The... [00:48:09] Simon Last: The... [00:48:10] swyx: like, [00:48:10] Simon Last: Oh, [00:48:10] swyx: wow. [00:48:10] Simon Last: The first version we built was actually a coding agent. Yeah. We were like, oh, instead of building tools, let’s make everything be JavaScript, give it JavaScript APIs, and it’ll just write code; that’s how it speaks to the tools. But at the time, it just sucked at writing code. It wasn’t that good. So then we moved to more of a tool-calling abstraction. Tool calling didn’t exist yet, so we created this whole XML representation. And a big learning in that version was that we were catering way too much to what made sense for Notion and Notion’s data model versus what the model wants. As an example, we created this whole XML format that could losslessly map to Notion blocks, and the transformation between them was super easy to do. And then we created these mutation operations to add to pages.
But it sucked, because the model didn’t know the XML format, and also you have to prompt it [00:49:04] swyx: in, and [00:49:04] Simon Last: Yeah, to prompt it in, and the team just... more convenient. And so we were like, okay, well, it has to be Markdown. The models know Markdown. So we did a whole project around basically creating a Notion-flavored Markdown, where the whole goal was: it has to be just simple Markdown at the core, and then we can add some enhancements. And it doesn’t have to be a fully lossless conversion. That was a big one we did. And then we had a similar learning with the database layer. To query a database in the Notion API, there’s a crazy JSON format, and it’s kind of limiting, but it maps nicely to how we represent things internally. We scrapped all that and we were like, okay, let’s just make it SQLite. Everything is a SQLite database. You can query it just like a SQLite query, and the models are super good at that. So [00:49:51] swyx: Give the models what they want. [00:49:52] Simon Last: That was another one. Yeah. Give them what they want. I would say that was a big learning: really be savvy and careful thinking about what the model wants in terms of its environment, and cater around that. And try really hard not to expose it to any complexity about your system that’s unnecessary. [00:50:12] swyx: Notion’s underlying database is Postgres, right? Not SQLite, right? Yeah.
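The "give the models what they want" point about databases can be illustrated with Python's built-in `sqlite3`: load rows into an in-memory table and let the model answer questions with plain SQL instead of a bespoke JSON filter format. This is a sketch under assumptions: the table and column names are hypothetical, and it says nothing about how Notion actually materializes its databases.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


def load_as_sqlite(table: str, rows: List[Dict[str, Any]]) -> sqlite3.Connection:
    """Copy a (hypothetical) database's rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    cols = list(rows[0].keys())
    col_sql = ", ".join(f'"{c}"' for c in cols)   # quote column names as identifiers
    placeholders = ", ".join("?" for _ in cols)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        [tuple(r.get(c) for c in cols) for r in rows],
    )
    conn.commit()
    # After loading, model-issued SQL should only read, never modify, the table.
    conn.execute("PRAGMA query_only = ON")
    return conn


def run_model_sql(conn: sqlite3.Connection, sql: str) -> List[Tuple[Any, ...]]:
    """Execute SQL written by the model and return the raw result rows."""
    return conn.execute(sql).fetchall()


tasks = [
    {"Task": "Write docs", "Status": "Done", "Points": 3},
    {"Task": "Fix bug", "Status": "Open", "Points": 5},
    {"Task": "Ship agent", "Status": "Open", "Points": 8},
]
conn = load_as_sqlite("tasks", tasks)
print(run_model_sql(conn, "SELECT Task FROM tasks WHERE Status = 'Open' ORDER BY Points DESC"))
# [('Ship agent',), ('Fix bug',)]
```

The model only needs SQL it already knows well; the harness hides how the rows were stored.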
So I don’t know if there’s any mismatch there. [00:50:18] Simon Last: That one was kind of fortuitous, because we actually already had a big project going where, so, when you query a Notion database, it’s actually querying this cluster of SQL databases. [00:50:34] swyx: Mm-hmm. [00:50:35] Simon Last: That’s something we’d already been working on even before the agents came around. [00:50:38] swyx: Yeah. You guys had a fantastic blog post about it, and it’s actually really good database engineering knowledge to get from you guys, because where else would we get it? [00:50:47] Simon Last: Yeah. It’s a crazy engineering problem when you want to have millions and billions of databases, where some of them are tiny but some of them are very large, and you want everything to be very fast. [00:50:57] swyx: Yeah. And also not that hierarchical sometimes, you know, so somewhat of a graph. [00:51:02] Simon Last: Mm-hmm. [00:51:03] swyx: I do like that history, because I think it shows the evolution you guys went through and the work that went into it. [00:51:09] Sarah Sachs: That history just ended a year and a half ago. [00:51:11] swyx: Oh, okay. Okay. Oh, [00:51:13] Simon Last: I need to, I need [00:51:13] swyx: to hit continue. [00:51:14] Sarah Sachs: If you’re curious. I mean, we can keep going. Just saying, that’s really... [00:51:18] Simon Last: That’s another one. Yeah. [00:51:19] Sarah Sachs: Let me think. Well, no, ’cause there was tool calling, and then there was research mode, which wasn’t fully agentic tool calling. Then we moved away from few-shot prompting entirely to tool definitions. And now we’re thinking about Agent 2.0. [00:51:34] swyx: So no few-shot prompts ever, right? [00:51:35] Sarah Sachs: Uh... [00:51:36] swyx: Okay. No, maybe not. [00:51:37] Sarah Sachs: I won’t say never, but [00:51:38] Simon Last: yeah, that kind of went away.
It’s an interesting thing, [00:51:40] swyx: right? [00:51:41] Simon Last: Yeah. I mean, so [00:51:41] swyx: these just follow instructions really well. [00:51:44] Simon Last: I would say there’s been a general arc where you gradually strip away everything, and it looks more AGI-like. It started out as one shot, one prompt, with a few few-shot examples. Then it became: okay, let’s give it tools, but still with few-shot examples. And then it became: no, no, no, let’s just give it a whole bunch of tools. One big shift we’ve been working on recently that’s about to ship is: what happens when you have a lot of tools? [00:52:13] swyx: Yeah. [00:52:13] Simon Last: So then, tool search. Yeah. Progressive disclosure becomes really important. We sort of hit a bottleneck where our agent worked really well, but it became pretty hard to add new tools. Mm-hmm. And we became worried about breaking the model. It’s like, okay, someone... [00:52:32] Sarah Sachs: No, I just heard that saying hello was like thousands and thousands and thousands [00:52:35] Simon Last: Yeah. [00:52:35] Sarah Sachs: of tokens. It was really slow. [00:52:37] Simon Last: I can see you’re the efficiency person here. Yeah, it was too many tokens. But it’s also a quality issue, because it meant any engineer could introduce a new tool for some niche feature, and it would kind of nerf the overall model by causing it to call that tool too much, and stuff like that. So we had an effort, basically, to make our harness implement progressive disclosure in a nice way.
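Progressive disclosure, as Simon describes it, means the model does not see every tool definition up front. Here is a rough Python sketch under assumptions: the agent starts with a small core set plus a search step, and matched tool definitions join the active set only when a search surfaces them. The keyword-overlap scoring and the tool names are illustrative placeholders, not Notion's tool search.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str


class ProgressiveToolbox:
    """Keeps most tool definitions out of the prompt until a search surfaces them."""

    def __init__(self, core: Iterable[ToolSpec], catalog: Iterable[ToolSpec]) -> None:
        self.catalog: Dict[str, ToolSpec] = {t.name: t for t in catalog}
        self.active: Dict[str, ToolSpec] = {t.name: t for t in core}

    def prompt_tools(self) -> List[str]:
        """Names of tools whose full definitions would be sent to the model this turn."""
        return sorted(self.active)

    def search_tools(self, query: str, limit: int = 3) -> List[str]:
        """Crude keyword match; hits are 'disclosed' by moving them into the active set."""
        terms = set(query.lower().split())
        scored = []
        for tool in self.catalog.values():
            words = set(f"{tool.name.replace('_', ' ')} {tool.description}".lower().split())
            overlap = len(terms & words)
            if overlap:
                scored.append((overlap, tool.name))
        scored.sort(key=lambda s: (-s[0], s[1]))   # best match first, ties by name
        hits = [name for _, name in scored[:limit]]
        for name in hits:
            self.active[name] = self.catalog[name]
        return hits


core = [ToolSpec("search_tools", "find more tools by keyword")]
catalog = [
    ToolSpec("send_email", "send an email to a recipient"),
    ToolSpec("create_calendar_event", "create an event on a calendar"),
    ToolSpec("query_database", "run a sql query against a database"),
]
box = ProgressiveToolbox(core, catalog)
print(box.prompt_tools())                 # ['search_tools']
print(box.search_tools("email someone"))  # ['send_email']
print(box.prompt_tools())                 # ['search_tools', 'send_email']
```

A production version would more likely rank with embeddings or BM25, but the shape is the same: a niche tool only costs prompt tokens, and only competes for the model's attention, on turns where it is relevant.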
That’s a big shift. [00:53:00] Sarah Sachs: You said earlier that everyone says reasoning models were the big shift. What’s more there? When we went away from few-shot examples to describing the goal of the tool, goal-driven, basically moving from a DAG to a true system with feedback, that’s when we could distribute tool ownership to the teams much better. Because when it was all few-shot, everyone was truly editing one string, and things would compete. And the order: there were all these papers about how not all context is created equal; the higher up something is in your examples, the more the model listens. We were trying really hard to fight against the order and the selection of the few-shot examples. That really had to be a center of excellence, and it didn’t scale with the number of people the company needed. It was really just five or six people who were allowed to even touch that, or rather had to approve it, in our codebase. And now, with the right eval setup, we can actually distribute it, so that everyone owns their tool and their tool definition. And sometimes we have crazy things, like we write two tools that have the same title and the agent crashes, stuff like that. So there are issues, actually. Believe it or not, Anthropic couldn’t take it: Sonnet couldn’t handle two tools with the same name, and OpenAI’s GPT-5.2 was like, I can figure this out. So that was an interesting one that we learned by accident through a SEV. [00:54:17] swyx: But I mean, the underlying representation there is a dict, right? Clearly. Like, that’s a safety... [00:54:23] Sarah Sachs: Exactly. Exactly.
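swyx's point is that tool definitions usually end up keyed by name, so two teams registering the same name collide: a plain dict silently keeps only the last one, and a provider may reject the request outright. Below is a minimal sketch of a registry that fails fast at registration time instead, assuming each tool has a hypothetical owning team. It is an illustration, not the fix Notion shipped.

```python
from typing import Any, Dict, List


class DuplicateToolError(ValueError):
    """Raised when a second team tries to register an existing tool name."""


class ToolRegistry:
    """Name-keyed registry that refuses collisions instead of overwriting."""

    def __init__(self) -> None:
        self._defs: Dict[str, Dict[str, Any]] = {}
        self._owners: Dict[str, str] = {}

    def register(self, name: str, definition: Dict[str, Any], owner: str) -> None:
        if name in self._defs:
            raise DuplicateToolError(
                f"tool {name!r} is already owned by {self._owners[name]!r}; "
                f"{owner!r} must choose a different name"
            )
        self._defs[name] = definition
        self._owners[name] = owner

    def definitions(self) -> List[Dict[str, Any]]:
        """Tool list for the model request, one entry per unique name."""
        return [{"name": n, **d} for n, d in self._defs.items()]


# A plain dict hides the collision: the first definition simply disappears.
naive: Dict[str, str] = {}
naive["search"] = "workspace search (team A)"
naive["search"] = "web search (team B)"
print(len(naive))  # 1

registry = ToolRegistry()
registry.register("search_workspace", {"description": "Search pages."}, owner="search-team")
registry.register("search_web", {"description": "Search the web."}, owner="web-team")
try:
    registry.register("search_web", {"description": "Another web search."}, owner="mail-team")
except DuplicateToolError as err:
    print(err)
```

Catching the collision when a team registers its tool, for example in CI, turns what was a production SEV into a failed check for the team that introduced it.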
But that was a big shift for the company, and the velocity wasn’t immediate, because the AI team, the center-of-excellence team that owned that one file of few-shot prompts, had to become a platform team overnight, and that wasn’t natural. Yeah. But I would say that in terms of the velocity of how we contribute to the agent, beyond coding tools obviously being a big velocity lever, being able to distribute tools and not have everyone collaborate on one very select string of system prompt is truly, I would say, the biggest lever in how we’ve scaled. [00:54:57] Simon Last: We’re fighting to keep the prompt as short as possible now. In the latest version of the agent, which is not in custom agents yet but will be next week or the week after, there are now over a hundred tools, just for all the crazy Notion stuff. So we’re able to really go deep and... [00:55:11] swyx: Would you list those tools publicly? Is this IP, or...? [00:55:15] Simon Last: No, no, no. It’s totally public. You can ask. [00:55:17] Sarah Sachs: We can, fine. [00:55:19] Simon Last: Just ask. You can just ask the agent, and we’ll tell you. [00:55:21] swyx: I find... [00:55:21] Sarah Sachs: And we’re gonna post a bench. I mean, like, you’re... [00:55:23] swyx: Post a bench. [00:55:24] Sarah Sachs: We don’t think our system prompt is our secret sauce. [00:55:26] swyx: Yeah. Mm-hmm. [00:55:27] Simon Last: Great. We don’t try to hide the tools at all. I think it’s kind of important, actually, as an operator, you know? [00:55:32] swyx: Yeah. As a power user, I wanna be like, oh, I can do this, this, and this. Great. [00:55:35] Simon Last: Yeah. One phrase we say internally a lot is to teach to the top of the class.
You know, we want to build the customization kind of like a power tool. We try to make it as easy as possible to set up, but we want it to be pretty deep and sophisticated. And I think a huge part of that is that the operator needs to be able to interrogate the way the system works. A big part of that is: what are the tools? How do they work? How should I prompt it to use the tools in the right way? [00:56:00] Sarah Sachs: I’d actually say we don’t try to make it as easy as possible to use, ’cause the more we do that, the more we abstract away the interpretability Simon’s talking about, and that basically nerfs the model, or nerfs the agent, from being super capable. So a huge turning point, I would say, and I can think of the week and a half when we all came together on this as we were building custom agents, was the alignment that we’re not trying to build for everyone here. We’re not trying to build the user experience that anyone can figure out how to use, ’cause the more we do that, the more we diminish its capabilities. Everyone aligned on that in a couple of Slack messages, and that actually made us all work faster again, right? ’Cause we were all more centralized on who we were building for. [00:56:40] Alessio: What does the meta-prompt generator look like? I looked at the system prompt it generates, and, for example, it uses emojis. That’s not an obvious thing to be doing. [00:56:50] swyx: Wait, did you just [00:56:51] Alessio: ask it? What’s your system prompt? Oh no, this is how it generates prompts. [00:56:54] swyx: The [00:56:54] Alessio: prompts that generate prompts. [00:56:55] Sarah Sachs: We call it set. Then it’s [00:56:56] Alessio: a set. [00:56:56] Simon Last: Well, this is actually just the agent. One thing we did with custom agents that I really like is that it can set itself up.
So we not only give it access to the tools it uses, to send your emails or whatever, but it has more tools to set itself up and to debug itself. So when you ask it to write a system prompt, it’s just your agent itself doing that. [00:57:16] Alessio: So this is just the model’s preference. You’re not really injecting that into the model too much. [00:57:21] Sarah Sachs: No, no. We have a guide on what makes a good custom agent. [00:57:23] Alessio: Yeah. [00:57:24] Sarah Sachs: And things like that. And it’s really nice, too, because if it fails, you can ask it why it failed, and then say, okay, update your instructions so it doesn’t fail again. Obviously we should build a self-healing product; that’s next on our roadmap. But it actually creates a nice system. [00:57:40] Simon Last: Yeah. We do essentially give it a development guide: here’s how to make a custom agent, here’s how to help the user test it end to end, to help them gain confidence that it works, stuff like that. [00:57:49] Alessio: Mm-hmm. Yeah. The fixing thing worked. I mean, it wasn’t automatic, but I mis-set something up, and then there was a fix button, and then it just, yeah... [00:57:58] Simon Last: Yeah. One thing where [00:57:59] Alessio: ...fix agent makes more... [00:58:01] Simon Last: It’s actually an interesting permission problem. The thing about custom agents is that by default they have no permission to do anything; you have to explicitly grant all their permissions, and that’s what lets you trust that one can work in the background, right? You can know: oh, it can read my email but not send email. Okay, I can trust that. If you let it fix itself, you’re breaking that. It is not allowed to edit its own permissions.
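The permission model Simon describes (deny by default, explicit grants, and no self-editing of grants) can be sketched in a few lines of Python. Everything below is hypothetical: the action strings, the `grant` helper, and the confirmation flag stand in for whatever Notion's admin-mode flow actually does.

```python
from dataclasses import dataclass, field
from typing import Set


class PermissionDenied(PermissionError):
    """Raised when an agent attempts an action it was never granted."""


@dataclass
class CustomAgent:
    name: str
    grants: Set[str] = field(default_factory=set)  # starts empty: deny by default

    def act(self, action: str) -> str:
        if action not in self.grants:
            raise PermissionDenied(f"{self.name} lacks permission {action!r}")
        return f"{self.name} did {action}"


def grant(agent: CustomAgent, action: str, *, requested_by: str, user_confirmed: bool) -> None:
    """Add a permission; an agent may never widen its own grants."""
    if requested_by == agent.name:
        raise PermissionDenied("an agent may not edit its own permissions")
    if not user_confirmed:
        raise PermissionDenied("permission changes need explicit user confirmation")
    agent.grants.add(action)


triage = CustomAgent("email-triage")
grant(triage, "email:read", requested_by="alice", user_confirmed=True)
print(triage.act("email:read"))   # email-triage did email:read
try:
    triage.act("email:send")       # never granted, so this is refused
except PermissionDenied as err:
    print(err)
```

The check lives outside the agent, so a "fix yourself" request can rewrite instructions but cannot quietly add `email:send`.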
So in the current product you can click a button to fix, but now you’re entering a sort of admin mode, where you’re in a synchronous chat and you can see what it’s doing. [00:58:35] Sarah Sachs: Yeah. And it confirms before it [00:58:37] Alessio: changes. [00:58:37] Sarah Sachs: Yeah. [00:58:37] Alessio: The thing I really like, that most people don’t do, is that the editing chat is the same thing as the using chat. You can message the agent both to edit it and to use it, versus a lot of other products that are like... [00:58:49] Simon Last: I think that’s really key. I think... [00:58:50] Sarah Sachs: A lot of designers will feel so happy you said that. Yeah, ’cause we spent... we call this Flippy. [00:58:55] Simon Last: Yeah. What is [00:58:56] Alessio: this? [00:58:56] Sarah Sachs: What do you mean, this? [00:58:57] Simon Last: This view. Well, yeah: if you close that and open settings, you can see. We call it Flippy because we started with the settings as the main page, and then you could test the agent. The AGI-pilled way to think about it is: oh, it is just the agent. Everything’s the agent, right? It can set itself up, it can test itself, and it can run the workflow you want to run. So we flipped it. The main view I was looking at is the chat, and the settings are more just a side panel previewing the changes it’s making, so you can introspect on them, or make changes manually if you’d like. But we wanted to design the experience from the get-go so you don’t ever have to touch any of the settings manually; you can just talk to it. [00:59:39] Sarah Sachs: And the inside baseball is that how this works was probably the launch-blocking part of this build. Right.
Especially ’cause we had a lot of early adopters who were used to the old way, and that’s like the benefit of adopting in public. But changing how people think about setting up custom agents, when they already had this flow, was difficult in and of itself. [00:59:57] Simon Last: I mean, that was really fun, ’cause we ended up painfully delaying the launch. Mm-hmm. By... [01:00:04] Sarah Sachs: A month? [01:00:04] Simon Last: A few weeks. Yeah, definitely, like a month or so. But the whole team was super enthusiastic about it, ’cause it was just so much better. It was like, oh yeah, obviously you have to chat with it, right? Yeah, yeah, yeah. To set itself up. And everyone was super bullish on that, so it was painful for a second, but then everyone was like... [01:00:19] Sarah Sachs: Right. And back to organization design, which I probably care about more than Simon: the people who built this are three engineers from three different teams, because we were like, we need to launch this and we need to fix this. And we’ve built a company where we just put people on it and no one complains; the manager doesn’t complain. And we were able to unblock and just ship it. [01:00:37] Alessio: Yeah. But being in a failure chat and asking it to just fix itself is amazing, versus having to copy this and put it in the settings chat. Mm-hmm. To do [01:00:49] Simon Last: it. So yeah, there’s an interesting trade-off in there that we’re trying to explore, which is: we want to be a business-enterprise-safe agent where you can delegate something and trust that it’s gonna work. But we also want some of that bootstrapping power you feel when you’re coding and it’s making a browser for itself, right? There’s something there. I think that’s really important. So we’re trying to sort of
navigate that trade-off and try to get you both. [01:01:12] Alessio: Now it’s free; it’s amazing. I’m worried about when I have to start paying. You have Notion credits as the payment unit for this, which is separate from the usual tokens the model generates. How do you design pricing? Value-based pricing based on the task, things like that? [01:01:30] Sarah Sachs: So the credits and payment structures are associated with token usage. The reason we had to make it not just throughput of tokens is that not everything is priced that way. Our fine-tuned and open-source models are served on GPUs, right? Web search is priced differently. If we were to host sandboxes, those are priced differently. So we had to think of an abstraction above tokens. And it’s also not just tokens; it’s the token, the model, and the serving-tier trade-off, right? Mm-hmm. Because we can have priority-tier processing, we can have asynchronous processing, and the cache rate could be different depending on who uses it when, right? And so from the get-go we wanted to commit to making sure customers were getting a fair deal. Not necessarily that we were making a ton of money off of it, but that customers were paying what was reasonable. That’s the fundamental place where we started. And also, we’re selling enterprise SaaS, so we sell credit packs, and you get discounts if you’re an enterprise and buy a certain amount of credit packs, and things like that. So it also just made the sales motion work a little more easily. So that’s the answer on the abstraction of credits to dollars. Now, was the question how we decide how to price it, or...? [01:02:34] Alessio: Yeah. I mean, I think all tokens are not made equal, but we obviously get charged mostly equally.
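Sarah's description of credits as an abstraction above tokens (varying by model, serving tier, and cache hits, plus non-token meters like web search) amounts to a weighted sum. Here is a minimal Python sketch with made-up rates and model names; none of these numbers are Notion's or any provider's actual prices, and treating cached and uncached input tokens as separate counts is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal

# All rates are hypothetical placeholders, in USD.
PER_MILLION = Decimal(1_000_000)
TOKEN_RATES = {  # model -> (uncached input, cached input, output) per 1M tokens
    "large-model": (Decimal("5.00"), Decimal("0.50"), Decimal("25.00")),
    "small-model": (Decimal("1.00"), Decimal("0.10"), Decimal("5.00")),
}
TIER_MULTIPLIER = {"priority": Decimal("2"), "standard": Decimal("1"), "batch": Decimal("0.5")}
WEB_SEARCH_USD = Decimal("0.01")   # per call, billed outside token pricing
CREDITS_PER_USD = Decimal(100)


@dataclass
class Usage:
    model: str
    tier: str
    uncached_input_tokens: int
    cached_input_tokens: int
    output_tokens: int
    web_searches: int = 0


def credits_for(u: Usage) -> Decimal:
    """Convert heterogeneous usage into a single credit number."""
    inp, cached, out = TOKEN_RATES[u.model]
    token_usd = (u.uncached_input_tokens * inp
                 + u.cached_input_tokens * cached
                 + u.output_tokens * out) / PER_MILLION
    token_usd *= TIER_MULTIPLIER[u.tier]        # serving tier scales token cost only
    total_usd = token_usd + u.web_searches * WEB_SEARCH_USD
    return total_usd * CREDITS_PER_USD


run = Usage("large-model", "standard", 200_000, 800_000, 10_000, web_searches=2)
print(credits_for(run))  # 167.00
```

`Decimal` avoids floating-point drift in billing math. The same run on the hypothetical `batch` tier halves only the token portion, which is the kind of serving-tier trade-off a flat per-token price cannot express.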
You can ask Codex to create a dumb tool for you; I created one for our StarCraft II LAN, for people to find a game. But then people use it to build features at billion-dollar companies, and the token price is the same. [01:02:53] Sarah Sachs: Yeah. [01:02:54] Alessio: For you, I can ask this to update my favorite-recipes doc and it’ll do it, but I could also ask it to respond to an email from an investor, and the value is very different, you know? You could charge more, but you’re not necessarily doing it. So I’m curious if there was any discussion. [01:03:11] Sarah Sachs: I think, number one, that’s not where the market is right now. The second reason we’re not doing that is that it ended up being kind of complicated to figure out what was complicated or not. At first we were like, let’s just charge on agent runs. And you go through all the different versions, and ultimately it just brings you back to a lot of complexity that maps directly to token throughput. So it’s also just simpler. It’s quite difficult to build those pricing systems. And I actually think one of the biggest reasons we wanted usage-based pricing for this capability is this: we’ve had our core agent for a while with a model picker, and there were certain models, or certain functionality, where we had margins to maintain. If we wanted to ship this functionality, we couldn’t afford it; it would bankrupt the company. For instance, the database autofill feature will soon be agentic, and that will be associated with usage-based pricing, because if every single autofill action were an agent running on Opus on every single database cell, it would be billions of dollars, right?
And so we had to find a way for the customers who wanted to do more, and wanted to give us their money and pay more, to have an outlet to do it, one that we didn’t have to apply to the lower end of the curve. And also, not all knowledge work is equal. There are different points. A lot of the agent workflows here really saturate model capabilities; you don’t need a complicated model for them. And so we charge based on token usage, because we couldn’t just decide for you whether you wanted your email client to be dumb or not, right? We want you to decide if you want to have Opus auto-triage all of your emails, and we will actually give you nudges in the product to rethink whether that’s the right choice, right? Because also not every user... [01:04:52] Alessio: ...understands. [01:04:53] Sarah Sachs: You’d be surprised in user interviews. People would be like, oh, I didn’t know that. So now we actually have a little hover that tells you whether it’s expensive or not. Yeah. I mean, it’s also slower. The thing that’s interesting is that people don’t care about speed in custom agents, so the incentive of Haiku being faster... people don’t care when it’s asynchronous. And so we want to provide only the extra benefit that people want, and the best way to do that is to incentivize them, because it’s their own money. [01:05:21] Alessio: It must be confusing for people who are not familiar. It’s like, why is there no 5.3? You open this thing and it’s like, is there something missing? It’s not their fault. Not their fault. [01:05:30] Simon Last: Yeah. That’s just the world we live in now. [01:05:32] Alessio: Yeah. It’s just a radical jump, too; it’s like Claude had that. [01:05:35] Sarah Sachs: I mean, auto is heavily... I think what’s actually been hard for us is to convince people that auto is not just our cheapest, dumbest model, but actually the model that’s best for the task you want to do. Alright.
Steve. [01:05:46] swyx: I mean, [01:05:48] Sarah Sachs: Exactly. Nice. And a lot of our job is actually figuring out auto, because it’s like... [01:05:54] swyx: This is the agent lab. Every agent lab has an auto. Mm-hmm. [01:05:57] Sarah Sachs: Yeah. And [01:05:58] swyx: that’s the job. [01:05:58] Sarah Sachs: Exactly. Because if you think about it, like I said, I come from Robinhood: you could spend a lot of time keeping up with the markets, or you could have auto-investing, right? You can have an index fund, or you can have [01:06:12] swyx: robo-advisors. [01:06:12] Sarah Sachs: ...a robo-advisor. At a certain point we can also be the robo-advisor, and we have a lot of people figuring out which model is best for the right task. And we’re not using auto as a margin-maker; we’re just using it to kind of reduce stress. It’s not Opus, that’s for sure, because the majority of the tasks people are doing don’t need Opus-level intelligence. [01:06:34] Simon Last: The other thing I would say is that, unlike a lab, we aren’t incentivized just for you to use as many tokens as possible. We’re actually really interested in giving you the right tool for the job. A lot of the time, the right tool for the job is actually just writing code, not even using an agent at all. That’s something we’re investing in a lot: imagine your agent can actually automate itself out of a job, right? We would love it if that were true. [01:06:58] Sarah Sachs: I feel very strongly about this, because I don’t necessarily feel like those are the SKUs frontier labs give you. I feel like they’re just getting more and more capable and more and more expensive, which is fantastic for the use cases where people want to do really complicated things in Notion.
What’s difficult is the market that I think is a no man’s land right now: where reasoning models were six months ago, which the Nanos and Haikus and so on haven’t caught up to. Because now we’re just paying more for extra capability we didn’t necessarily need, and so are our customers. Mm-hmm. And labs aren’t necessarily incentivized right now, with how few players there are, to meet the market everywhere. They just need to be the cheapest. They don’t need to be at the value the customer wants. [01:07:41] swyx: Hmm. [01:07:42] Sarah Sachs: If no one’s cheaper than them, then they’re the cheapest, and that’s good enough, right? And so we’re doing a lot to make sure we have the right optionality to switch between models, and also investing in open source, because the open-source models are getting to where reasoning models were three or four months ago, and that’s what’s filling that gap right now. So you’ll see we offer MiniMax, and we’re collaborating a lot with different open-source labs on Notion’s Last Exam and how they can do better on these types of tasks. Mm-hmm. So that we can offer them for that intelligence-to-price-to-latency trade-off. Because in that triangle of intelligence, price, and latency, users get to choose where they are, but right now the whole triangle isn’t filled with models, right? Yeah. Everyone’s clustered in capability. I mean, Haiku’s not that much cheaper. No one’s really in the middle. People really tend to cluster around two points: this is really capable and it’s really fast, or it’s really expensive, or whatever. Right.
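The "auto" setting and the intelligence/price/latency triangle can be read as a constrained selection problem: among models that clear a task's capability bar (and, for synchronous work, a latency budget), pick the cheapest. A minimal Python sketch follows; the model entries, capability scores, and prices are invented for illustration and are not how Notion's auto router actually decides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    capability: int        # hypothetical score from an internal eval; higher is better
    usd_per_task: float    # hypothetical average cost per task
    p50_latency_s: float   # hypothetical median latency


def auto_pick(options: List[ModelOption], min_capability: int,
              max_latency_s: Optional[float] = None) -> ModelOption:
    """Cheapest model meeting the capability bar; latency only matters if a budget is set.

    Background (asynchronous) agents can pass max_latency_s=None, since nobody is waiting.
    """
    eligible = [
        m for m in options
        if m.capability >= min_capability
        and (max_latency_s is None or m.p50_latency_s <= max_latency_s)
    ]
    if not eligible:
        raise LookupError("no model satisfies the capability and latency constraints")
    return min(eligible, key=lambda m: (m.usd_per_task, m.p50_latency_s))


MODELS = [
    ModelOption("frontier-large", capability=90, usd_per_task=0.50, p50_latency_s=20.0),
    ModelOption("open-weights-mid", capability=70, usd_per_task=0.05, p50_latency_s=8.0),
    ModelOption("small-fast", capability=50, usd_per_task=0.01, p50_latency_s=1.0),
]

print(auto_pick(MODELS, min_capability=60).name)   # open-weights-mid
print(auto_pick(MODELS, min_capability=85).name)   # frontier-large
```

If the middle entry is removed, a task needing capability 60 falls through to `frontier-large` at ten times the cost: the "unfilled triangle" problem in miniature, and the gap Sarah says open-weight models are starting to fill.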
And so we just want to make sure that triangle’s filled, we want to offer the models that fill it, and we want to guide users to understand when they need them. Yeah. Which one... [01:08:54] swyx: I mean, all I’m hearing is that someday you’re gonna train your own model. You have lots of tokens. [01:09:01] Sarah Sachs: I don’t know. What do you mean by train your own model? You train [01:09:03] swyx: your [01:09:03] Sarah Sachs: own... train your own model? I don’t know that we have the money to train a foundation model. I mean, [01:09:06] Alessio: You go raise [01:09:07] swyx: it. Yeah, you can raise it. [01:09:09] Sarah Sachs: That’s your job, Simon. No, I don’t think that needs to be our core competency. [01:09:14] swyx: This is usually the thought process that leads to, well, no one else is doing it, so we will take a crack, you know? [01:09:19] Simon Last: Yeah. I feel like, to the extent that we do anything like training, the area I’m actually most excited about is less one big model for all the users, and more, as it becomes more possible, a specific fine-tune that really knows your context: your company, the people who work at your company, what’s going on. I think that’s pretty interesting, because if you had a model that really knows your company, I think that would be a huge quality uplift. [01:09:47] Sarah Sachs: We actually have some enterprise vendors that ask about this, along with bring-your-own-key. Like, if I have a model that really understands my enterprise, that we’re training for all these reasons; these tend to be quite large institutions, thinking about how to let people bring their own models. But those models have to function with [01:10:04] swyx: Right. [01:10:04] Sarah Sachs: an understanding of how to call our tools. And that’s where, again, having a more
public system prompt is beneficial to Notion, right? We want all models to plug into Notion as well as they can. That being said, of course there are certain aspects of Notion where we do fine-tune and do reinforcement fine-tuning on our own capabilities. But that’s not necessarily trained on user data. You don’t need that much data in the first place. And when we have a data scientist and a model behavior engineer really understand where the capability gap is, that’s when we invest there. [01:10:38] Simon Last: I personally burned a lot of time trying to train models. And it’s tempting, right? It’s so tempting. [01:10:46] Sarah Sachs: He was training every day. [01:10:47] Simon Last: I was doing a crazy amount. Yeah, I was doing a lot of different things. And it... [01:10:50] Sarah Sachs: I was the budget person who came and found out; I showed up and heard that was happening. Time out. [01:10:55] Simon Last: You know, a funny thing, sort of an arc that looped on itself: back when I was doing tons of training stuff, any kind of training run takes a long time. So you end up operating 24/7, around the clock. It becomes very important that before you go to sleep, everything is being watched in TensorBoard and all the experiments are started. And then, as I stopped training, that kind of went away. But now the coding agents have totally brought this back. Mm-hmm. So now every night before I go to bed, I’m like, okay, did I start enough agents to get everything done? So it’s an interesting arc. [01:11:26] swyx: This balance of... you have to try polyphasic sleep so you can wake up every two hours. [01:11:29] Simon Last: Absolutely. Yeah.
I have not gone there yet, but my goal these days is that, before I go to bed, the agents are running, and I’m confident that they won’t be done by the time I wake up. Really.
[01:11:41] swyx: Eight hours?
[01:11:42] Sarah Sachs: I won’t say which frontier coding lab, but there was a point where he had outlived the thread length and context length that that coding agent provided. And you DMed them, being like, “Hey, I need more.” And our account rep DMed me directly, like, “Is Simon trying to prove string theory? What is he doing?”
[01:12:00] Simon Last: Yeah. I had a single coding agent thread going for, I think, 17 days, pretty much continuously.
[01:12:06] swyx: Don’t they just compress? I mean, yeah.
[01:12:08] Simon Last: Yeah. It was actually just a bug, a harness bug. It had done compaction probably a hundred times.
[01:12:13] swyx: Yeah.
[01:12:14] Sarah Sachs: The other thing that reminded me about fine-tuning, which I think you and I have aligned on, is that our tools change really frequently. Right now we spend a lot of time rethinking and building tools for capability, and fine-tuning a model to understand your tools... We don’t have legal expertise or coding expertise. So if we were to fine-tune a model, it would either be expertise about the enterprise (and we have ZDR, zero data retention, offerings for those enterprises, so we’d have to really rethink how we structure it if an enterprise wanted to opt into that), or it would be fine-tuning for better capability at navigating our tools, which doesn’t match the velocity with which we create new tools. It would actually really slow us down to have a model that was fine-tuned on our tools, because we’d have to retrain it and cut a new model every time we changed them. And that’s not how we’re set up right now.
Particularly with the way that we’re changing our... I guess we could fine-tune a model to search for tools. It’s just that, given the amount of time it takes to do that, ship it, and have the right system, you’re basically betting that frontier models won’t cover that capability within the time it takes you to build it. Mm-hmm. And that time lag hasn’t happened for us yet. It hasn’t been.
[01:13:17] Simon Last: Yeah. It’s just the wrong trade-off, I think. We literally change our tools every single day, and if we notice an issue, we’ll fix the problem. A good way to think about it, which I think is pretty fruitful, is: don’t focus too much on training. I would think of that as an implementation detail. What’s the outer loop, right? The outer loop is you have a model and then some harness or system where it’s interacting with the system that needs to work. And if there’s a problem, the way to solve it isn’t necessarily to train a model. It’s, oh, maybe there’s just a bug in one of the tools. And actually 99% of the time it’s a bug in one of the tools, so just fix the bug. And then the outer-loop thing that’s really fruitful to think about is how you can improve your velocity and robustness: making really good tools, making a good harness, verifying it works. Hmm.
[01:14:07] Sarah Sachs: The one place where we do invest more in model tuning now, though, is actually retrieval, because we’re at a point right now in our business and enterprise AI-enabled plans where the majority of the search load and search traffic is coming from agents, not humans. So most of the queries hitting our Elasticsearch or our vector indices are not coming from humans. The queries are structured differently, and what’s returned has different requirements.
Positional ranking matters less, but top-K retrieval matters more. Right.
[01:14:34] swyx: Isn’t top-K a form of position?
[01:14:36] Sarah Sachs: Of course it is. But when you’re training on click-through rate, it’s really, you know...
[01:14:41] swyx: Yeah.
[01:14:41] Sarah Sachs: It matters much less. Number one through number six is very different
[01:14:44] swyx: Yeah.
[01:14:44] Sarah Sachs: from needing to be in the top 100.
[01:14:45] swyx: Like, the slope is just
[01:14:46] Sarah Sachs: Yeah.
[01:14:46] swyx: higher.
[01:14:47] Sarah Sachs: It’s a different optimization function for the retrieval model. Similarly, what snippet you include matters more or less. So we are rethinking a lot of that functionality to work with how the agents like to write queries and how they want to receive information. We’re doing another kind of reinvestment into rethinking not only search, meaning how agents do searches versus how humans do searches, but we’re also investing in indexing different things now. How do you index the setup generator for Notion Agent? It kind of breaks our block model entirely, where all blocks are nested in each other. Same with meeting notes. And so we’re hiring ranking engineers and model training engineers, but it’s primarily on ranking.
[01:15:32] swyx: Yeah. Does ranking map to RecSys for you? It does, right? Recommendation systems.
[01:15:36] Sarah Sachs: Yeah, yes.
[01:15:38] swyx: Right, okay. I say this because I’m trying to promote RecSys more in general, ’cause it’s weirdly unpopular.
[01:15:45] Sarah Sachs: I don’t know why. But the other thing is, I was just talking about this with a peer: how important is ranking versus being able to do parallel exhaustive queries? So we’re also... they’re both important.
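Sarah’s distinction between positional ranking and top-K retrieval can be made concrete with two standard metrics. The sketch below is an illustration, not Notion’s evaluation code: it assumes binary relevance and compares recall@k (did the relevant documents land anywhere in the top k?) with NDCG@k (were they near the top?). The document IDs are made up. A ranking that buries every relevant document around position 50 scores poorly on NDCG but perfectly on recall@100, which is roughly the trade-off that matters when an agent reads the whole top 100 instead of clicking result number one.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear anywhere in the top k."""
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: rewards relevant documents placed near the top."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets weight 1.0
        for rank, doc in enumerate(ranked_ids[:k])
        if doc in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg


relevant = {"a", "b", "c"}
# Relevant documents at the very top: what a human clicking result #1 wants.
human_style = ["a", "b", "c"] + [f"x{i}" for i in range(97)]
# The same relevant documents, buried around position 50 of 100.
buried = [f"x{i}" for i in range(50)] + ["a", "b", "c"] + [f"x{i}" for i in range(50, 97)]

for name, ranking in [("human_style", human_style), ("buried", buried)]:
    print(
        name,
        "recall@100 =", recall_at_k(ranking, relevant, 100),
        "NDCG@100 =", round(ndcg_at_k(ranking, relevant, 100), 3),
    )
```

Both rankings have recall@100 of 1.0; only NDCG penalizes the buried one, so which metric you train against decides which ranking looks “good.”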
They’re both important, but they’re two tools toward the same user outcome, or the same agent outcome. Uh-huh. Right. And so that’s something we’re also rethinking a lot. We just did an experiment on Notion ranking, and at this point, for Notion retrieval, vector embeddings matter less and less.
[01:16:15] swyx: Did you see that? Yeah. Notion just, uh...
[01:16:19] Alessio: So long it became dark mode.
[01:16:21] Sarah Sachs: We’re working the night shift for you. Right?
[01:16:23] Simon Last: Looks pretty good. I’m not seeing any bugs.
[01:16:24] swyx: You know, I worked on this parallel search thing where you fan out to eight different queries, right? Yes. And so you actually need to use the model to work on query diversity, so that you get the right... embedding space.
[01:16:35] Sarah Sachs: And so the people working on ranking and retrieval are the same people working on query generation. It’s all one journey; we call it agentic find. And we’re actually realizing, for instance, that it’s less about embedding selection. We don’t spend a lot of time trying to optimize which vector embedding we use anymore. There was a period of time for that, but it’s just not the right lever of optimization.
[01:16:55] swyx: Yeah. Right. Okay, we’ve gone long. I have to talk about Notion meeting notes, and then we can call it there. You just have a lot of comments. I don’t know where you want to start. Is it the audio side? Is it the meeting notes summarization?
[01:17:12] Simon Last: Sort of like what makes it work, or
[01:17:13] swyx: No, just anything interesting technically, right? I think you had some bookmarks. I always call these check marks along the way: when a guest says something they want to return to later, I just check-mark it. I’m like, okay, we’ll come back to it.
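swyx’s parallel search, fanning one question out to several queries and merging the results, can be sketched as follows. This is a minimal illustration under stated assumptions: the corpus and `search` function are hypothetical stand-ins for a real index, the query variants are hand-written (in practice a model would generate them, with diversity as the goal), and the merge step uses reciprocal rank fusion (RRF) with its commonly used constant k=60, a standard fusion technique that nobody in the conversation names as Notion’s method.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory corpus standing in for a real search index (hypothetical data).
CORPUS = {
    "doc-roadmap": "q3 roadmap for search and retrieval",
    "doc-oncall": "search oncall runbook and paging policy",
    "doc-ranking": "ranking model training notes for retrieval",
    "doc-offsite": "team offsite agenda",
}


def search(query, limit=10):
    """Hypothetical backend: rank documents by how many query terms they contain."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in CORPUS.items():
        hits = sum(term in text.split() for term in terms)
        if hits:
            scored.append((hits, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most hits first, ties by id
    return [doc_id for _, doc_id in scored[:limit]]


def fan_out(queries, max_workers=8):
    """Run every query variant concurrently; returns one ranked list per query, in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search, queries))


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists: each appearance adds 1 / (k + rank), with rank starting at 1."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


queries = ["search roadmap", "retrieval ranking", "search oncall"]
merged = reciprocal_rank_fusion(fan_out(queries))
print(merged)
```

Documents that show up across several diverse queries float to the top, which is one reason query diversity and ranking end up being the same team’s problem.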
Um,
[01:17:26] Sarah Sachs: Meeting notes was one of those things where at first we were nervous that we’d have to teach people a different way to work, and that that was a lot of user friction. I mean, they’re one of our biggest growth levers. In terms of virality of adoption and retention, they’re quite strong, and so we’ve invested more and more. What’s really powerful about it is, again, Notion is the system of record for where and how you work. The way I use meeting notes is that every one-on-one and meeting I have is a meeting note. When I do my self-review for my performance review, I say: primarily look at all my conversations with my manager and write up what I did this year. Because if I didn’t talk about it in my one-on-one with my manager, it probably wasn’t relevant for my performance review. So it also adds a ton of signal on prioritization, which is really helpful for a good system of record and really helpful for our agent. It’s also driven a lot of scaling work for search and for the agent. It’s just an explosion of content when you have transcripts like that. A lot of how we do compaction was triggered by meeting notes being passed into context, things like that. So it’s been a good impetus for us to think about longer-form content as a priority, a primitive, and it’s been one of the most powerful signals for our agent, because it’s
[01:18:44] swyx: Unsurprising, right? And you’re capturing a whole new thing.
[01:18:46] Sarah Sachs: So it’s like our own data. Users are creating their own data flywheel, right?
[01:18:51] swyx: It serves me to prefer Notion to put all my stuff in, because it has my other stuff.
[01:18:57] Sarah Sachs: Totally.
I mean, the way our teams run right now is: there’s a custom agent that does a pre-read before standup. It looks through all of Slack and GitHub, creates a summary, creates a meeting note, and says, “Everyone do this pre-read.” Then we just press play. We have the meeting, we talk through the pre-read, we talk about what needs to happen next, and then a custom agent integrated with our calendar and triggers files tasks for today or tomorrow based on what we spoke about, and sends off the Slack messages that we decided in the meeting needed to be follow-ups. Our meetings are hands-off-keyboard, and we’re focused on the root of the problem, not the bookkeeping around the problem.
[01:19:32] Simon Last: One thing the meeting notes team did recently that has been blowing my mind: when it makes the summary, it will actually @-mention the people that were referenced in it. Oof. So I now get notifications whenever someone talks about me in a meeting.
[01:19:46] Sarah Sachs: Yeah, I feel like that one
[01:19:47] Simon Last: was... It’s like, “Oh, you know, Simon is working on this.” It’s actually amazing, because then I’m like, oh, okay, cool, I’m gonna go talk to them about that.
[01:19:55] swyx: Right. What if there are two Simons?
[01:19:56] Simon Last: Um,
[01:19:57] Sarah Sachs: No, wait. It’s powered by the agent, so it’s doing it agentically. If you look at its thinking (I don’t know if this is shipped yet; it will be), when it’s doing the summarization, it’s figuring out who Simon is.
[01:20:07] swyx: The most probable Simon.
[01:20:08] Sarah Sachs: Yeah. And we also have a people-to-people similarity cache and stuff like that. Yeah, yeah.
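One way to picture the “which Simon?” step is to score each candidate against the meeting context and keep the best match. Notion’s actual approach is agentic and isn’t described in detail here; this toy sketch assumes each person has a short text profile and uses plain word-overlap (Jaccard) similarity as a stand-in. The profile IDs and their contents are hypothetical.

```python
import re

# Hypothetical per-person profiles (illustrative only, not real Notion data).
PROFILES = {
    "simon-last": "cofounder agents harness coding notion ai",
    "simon-sales": "sales enterprise contracts renewals emea",
}


def tokens(text):
    """Lowercase word set used for a simple bag-of-words comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def resolve_mention(context, candidates):
    """Return the candidate whose profile overlaps most with the meeting context."""
    context_words = tokens(context)
    return max(candidates, key=lambda person: jaccard(context_words, tokens(PROFILES[person])))


print(resolve_mention("Simon is fixing the agent harness bug", ["simon-last", "simon-sales"]))
```

A real system would likely combine richer signals (who attended, who the speakers usually work with), which is presumably what a people-to-people similarity cache would feed.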
On the, uh, we sort of
[01:20:15] Simon Last: We also generate a profile for each person and use that. Of course it can get it wrong, but the goal is for it not to get it wrong.
[01:20:22] Sarah Sachs: Meeting notes is just the agent primitive packaged on top of a transcription primitive. Yeah. And then a vertical team. It’s probably one of the only teams at Notion that’s a completely vertical team around quality, product, and UX design, ’cause it’s still a tiger team. With a fantastic manager, Zach, who joined recently from Embr, but,
[01:20:40] swyx: Zachar.
[01:20:41] Sarah Sachs: Yeah.
[01:20:42] swyx: Yeah. I chatted with him when he was talking about his working number.
[01:20:45] Sarah Sachs: Yeah. So he’s managing that team now and thinking about it as data capture. That’s what meeting notes is: data capture.
[01:20:50] swyx: Get all
[01:20:51] Sarah Sachs: the kinds of... It’s kind of a reframing, where meeting notes are valuable as a data capture problem. And then, working inside it, the summarization used to not be agentic. Now it is, because it does all the things like figuring out who the right Simon is. And one day you could have a custom agent directly integrated in it that knows which task database the meeting is referring to and, as you’re having the meeting, perhaps updates the tasks and things like that. There’s a lot of that experience, of where we do our work in meetings, that we want to invest in making more seamless.
[01:21:18] swyx: Yeah. OpenAI is doing hardware. Would you ever ship one of these?
[01:21:22] Simon Last: Yeah, probably not,
[01:21:23] Sarah Sachs: but one of those.
[01:21:23] swyx: But you know, this is meeting notes in person.
[01:21:25] Simon Last: Yeah. I’d be excited about... I mean, I’m excited about that product category in general, for sure.
Yeah.
[01:21:31] Sarah Sachs: I think it’s a mechanism, and one of those needs to work really well with Notion. We would partner with whoever’s building one of those, I think. Yeah.
[01:21:40] swyx: This is Bee. They were bought by Amazon. I don’t know, I can refer you.
[01:21:43] Sarah Sachs: And there are some wild companies doing really cool things with wearables that come to our partnerships team, and I like to sit in on the demos, ’cause I think they’re pretty cool. Oh, okay. And all of them want to make sure (not just with Notion, but you can imagine the ones that talk to you) that they can do search and build context. So if you’re entering a conference, being able to look at your CRM and do things like that. And you can use the Notion agent to do that. So we are at the very beginning of those partnerships. I think what’s unique about that particular technology is that it goes against what I talked about with custom agents, which is that the simpler it is, the harder it is to have advanced controls over its capabilities. So that would be a great investment for data capture, but not necessarily for our agent workflows.
[01:22:26] Simon Last: It’s a different slice of the problem, I would say. That’s gonna be deeply personal. Your company’s not gonna force you to wear a wristband, right?
[01:22:35] Sarah Sachs: I think it’s good to hear that from you.
[01:22:38] Simon Last: Yeah. “The CEO’s gonna force everyone to wear a wristband.” Look, the slice of the problem that we care about is: can the company have all the context of what everyone said at every single meeting, and then use that to, yeah,
to derive value for themselves.
[01:22:52] Sarah Sachs: It kind of reminds me: I remember once you very strongly reminded me that our job is not to make the best harness for agentic work. Our job is to be the best place where people collaborate. Likewise, our job isn’t to build the best wearable to capture meeting notes; our job is to build the best place where meeting notes live. Right?
[01:23:11] swyx: Yeah. So basically you’re saying everyone else can just pipe to you and it’s fine, right? Yeah, that’s a reasonable thing. All I’ll say is that there are people walking around with Notion tattoos on them. They’ll wear Notion anything. So, I don’t know, do a limited run.
[01:23:24] Simon Last: Yeah, yeah. No, I mean,
[01:23:27] Sarah Sachs: We have such understated swag; our swag has so few Notion logos on it. The idea that people have Notion tattoos is pretty antithetical to our design principles, so that’s pretty funny.
[01:23:38] Simon Last: Yeah.
[01:23:39] Sarah Sachs: Do you have one?
[01:23:40] Simon Last: No, I do not have a Notion tattoo. I’ve seen them.
[01:23:44] swyx: Cool. Well, thank you so much. This was such a great deep dive. Actually, the chemistry between you two is amazing. I can’t believe
[01:23:51] Sarah Sachs: We work together a lot. Different jobs, but we work closely.
[01:23:55] swyx: Yeah.
[01:23:55] Alessio: That’s it. Yeah. Thank you. Thank you.
[01:23:57] Sarah Sachs: Thanks. Thank you.
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
We’re proud to release this ahead of Ryan’s keynote at AIE Europe. Hit the bell to get notified when it is live! Attendees: come prepped for Ryan’s AMA with Vibhu afterward.

Move over, context engineering. Now it’s time for harness engineering and the age of the token billionaires.

Ryan Lopopolo of OpenAI is leading that charge, having recently published a lengthy essay on harness engineering that has become the talk of the town.

In it, Ryan pulled back the curtain on how the recently announced OpenAI Frontier team has become OpenAI’s top Codex users, running a >1M LOC codebase with 0 lines of human-written code and, crucially for the Dark Factory fans, no human-REVIEWED code before merge. Ryan is admirably evangelical about this, calling it borderline “negligent” if you aren’t using >1B tokens a day (roughly $2-3k/day in token spend based on market rates and caching assumptions).

Over the past five months, they ran an extreme experiment: building and shipping an internal beta product with zero manually written code.
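The dollar figure implies a blended token price. Working backwards from the two numbers quoted above (a derivation from those figures, not a published rate):

$$
\frac{\$2{,}000 \text{ to } \$3{,}000 \text{ per day}}{10^{9}\ \text{tokens per day}}
= \frac{\$2 \text{ to } \$3}{10^{6}\ \text{tokens}}
$$

That is, roughly $2 to $3 per million tokens, blended across whatever mix of input, cached, and output tokens the estimate assumes.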
Through the experiment, they adopted a different model of engineering work: when the agent failed, instead of prompting it better or telling it to “try harder,” the team would ask “what capability, context, or structure is missing?”

The result was Symphony, “a ghost library” and reference Elixir implementation (by Alex Kotliarskyi) that sets up a massive system of Codex agents, all extensively prompted with the specificity of a proper PRD spec but without a full implementation.

The future is taking shape as one where coding agents stop being copilots and start becoming real teammates anyone can use, and Codex is doubling down on that mission with its Super Bowl messaging of “you can just build things.”

Across Codex, internal observability stacks, and the multi-agent orchestration system his team calls Symphony, Ryan has been pushing on what happens when you optimize an entire codebase, workflow, and organization around agent legibility instead of human habit.

We sat down with Ryan to dig into how OpenAI’s internal teams actually use Codex, why the real bottleneck in AI-native software development is now human attention rather than tokens, how fast build loops, observability, specs, and skills let agents operate autonomously, why software increasingly needs to be written for the model as much as for the engineer, and how Frontier points toward a future where agents can safely do economically valuable work across the enterprise.

We discuss:
* Ryan’s background from Snowflake, Brex, Stripe, and Citadel to OpenAI Frontier Product Exploration, where he works on new product development for deploying agents safely at enterprise scale
* The origin of “harness engineering” and the constraint that kicked off the whole experiment: Ryan deliberately refused to write code himself so the agent had to do the job end to end
* Building an internal product over five months with zero lines of human-written code, more than a million lines in the repo, and thousands of PRs across multiple Codex model generations
* Why early Codex was painfully slow at first, and how the team learned to decompose tasks, build better primitives, and gradually turn the agent into a much faster engineer than any individual human
* The obsession with fast build times: why one minute became the upper bound for the inner loop, and how the team repeatedly retooled the build system to keep agents productive
* Why humans became the bottleneck, and how Ryan’s team shifted from reviewing code directly to building systems, observability, and context that let agents review, fix, and merge work autonomously
* Skills, docs, tests, markdown trackers, and quality scores as ways of encoding engineering taste and non-functional requirements directly into context the agent can use
* The shift from predefined scaffolds to reasoning-model-led workflows, where the harness becomes the box and the model chooses how to proceed
* Symphony, OpenAI’s internal Elixir-based orchestration layer for spinning up, supervising, reworking, and coordinating large numbers of coding agents across tickets and repos
* Why code is increasingly disposable, why worktrees and merge conflicts matter less when agents can resolve them, and what it really means to fully delegate the PR lifecycle
* “Ghost libraries,” spec-driven software, and the idea that a coding agent can reproduce complex systems from a high-fidelity specification rather than shared source code
* The broader future of Frontier: safely deploying observable, governable agents into enterprises, and building the collaboration, security, and control layers needed for real-world agentic work

Ryan Lopopolo
* X: https://x.com/_lopopolo
* LinkedIn: https://www.linkedin.com/in/ryanlopopolo/
* Website: https://hyperbo.la/contact/

Timestamps
00:00:00 Introduction: Harness Engineering and OpenAI Frontier
00:02:20 Ryan’s background and the “no human-written code” experiment
00:08:48 Humans as the bottleneck: systems thinking, observability, and agent workflows
00:12:24 Skills, scaffolds, and encoding engineering taste into context
00:17:17 What humans still do, what agents already own, and why software must be agent-legible
00:24:27 Delegating the PR lifecycle: worktrees, merge conflicts, and non-functional requirements
00:31:57 Spec-driven software, “ghost libraries,” and the path to Symphony
00:35:20 Symphony: orchestrating large numbers of coding agents
00:43:42 Skill distillation, self-improving workflows, and team-wide learning
00:50:04 CLI design, policy layers, and building token-efficient tools for agents
00:59:43 What current models still struggle with: zero-to-one products and gnarly refactors
01:02:05 Frontier’s vision for enterprise AI deployment
01:08:15 Culture, humor, and teaching agents how the company works
01:12:29 Harness vs. training, Codex model progress, and “you can just do things”
01:15:09 Bellevue, hiring, and OpenAI’s expansion beyond San Francisco

Transcript
Ryan Lopopolo: I do think that there is an interesting space to explore here with Codex, the harness, as part of building AI products, right? There’s a ton of momentum around getting the models to be good at coding. We’ve seen big leaps in task complexity with each incremental model release, where if you can figure out how to collapse a product that you’re trying to build, a user journey that you’re trying to solve, into code, it’s pretty natural to use the Codex harness to solve that problem for you. It’s done all the wiring and lets you just communicate in prompts. To let the model cook, you have to step back, right? You need to take a systems-thinking mindset and constantly be asking: where is the agent making mistakes? Where am I spending my time? How can I not spend that time going forward? And then build confidence in the automation that I’m putting in place, so I have solved this part of the SDLC.
swyx: [00:01:00] All right.

[00:01:03] Meet Ryan
swyx: We’re in the studio with Ryan from OpenAI.
Welcome.
Ryan Lopopolo: Hi.
swyx: Thanks for visiting San Francisco and thanks for spending some time with us.
Ryan Lopopolo: Yeah, thank you. I’m super excited to be here.
swyx: You wrote a blockbuster article on harness engineering. It’s probably going to be the defining piece of this emerging discipline, huh?
Ryan Lopopolo: Thank you. It’s been fun to feel like we’ve defined the discourse in some sense.
swyx: Let’s contextualize a little bit. This is the first podcast you’ve ever done?
Ryan Lopopolo: Yes.
swyx: And thank you for spending it with us. Where is this coming from? What team are you in, all that jazz?
Ryan Lopopolo: Sure, sure. I work on Frontier Product Exploration, new product development in the space of OpenAI Frontier, which is our enterprise platform for deploying agents safely at scale, with good governance, in any business. And the role of my team has been to figure out novel ways to deploy our models into packages and products that we can sell as solutions to enterprises.
swyx: And you have a background, I’ll just squeeze it in there: Snowflake, Brex, [00:02:00] Stripe, Citadel.
Ryan Lopopolo: Yes. Yes. Same kind of customer
swyx: your entire life. Yes. The exact kind of customer that you want to...
Vibhu: So I’ll say, I actually didn’t expect the background. When I looked at your Twitter, I’m seeing the opposite, stuff like this. You’ve got the mindset of full-send AI coding, stuff about slop, like buckling your laptop into your Waymo.
Ryan Lopopolo: Yes.
Vibhu: And then I look at your profile and I’m like, oh, you’re on the other end too. Perfect. Makes perfect sense.
Ryan Lopopolo: It’s quite fun to be an AI maximalist. If you’re gonna live that persona, OpenAI is the place to do it.
swyx: Token is what you say.
Ryan Lopopolo: Yeah. It certainly helps that we have no rate limits internally, and I can go, like you said, full send at this stuff.
swyx: Yeah. Yeah.
So the Frontier team, and you’re a special team within Frontier.
Ryan Lopopolo: We had been given some space to cook, which has been super, super exciting.

[00:02:47] Zero Code Experiment
Ryan Lopopolo: And this is why I started with kind of an out-there constraint: not to write any of the code myself. I was figuring, if we’re trying to make agents that can be deployed into enterprises, they should be [00:03:00] able to do all the things that I do. And having worked with these coding models and coding harnesses over six, seven, eight months, I do feel like the models are there enough, and the harnesses are there enough, that they’re isomorphic to me in capability and in the ability to do the job. So starting with the constraint that I can’t write the code meant that the only way I could do my job was to get the agent to do my job.
Vibhu: And just a bit of background before that, this is basically the article. What you guys did is five months of working on an internal tool, with zero lines of human-written code and over a million lines of code in the total code base. You say it was 10x faster than if you had done it by hand.
Ryan Lopopolo: Yeah.
Vibhu: That was the mindset going into this, right?
Ryan Lopopolo: That’s right.

[00:03:46] Model Upgrades Lessons
Ryan Lopopolo: We started with some of the very first versions of Codex CLI, with the Codex Mini model, which was obviously much less capable than the ones we have today. That was also a very good constraint, right? It’s quite a visceral feeling to ask the [00:04:00] model to build you a product feature and have it just not be able to assemble the pieces. That defined one of the mindsets we had going into this: whenever the model just cannot, you always pop open the task, double-click into it, and build smaller building blocks that you can then reassemble into the broader objective. And it was quite painful to do this. Honestly, the first month and a half was
10 times slower than I would be. But because we paid that cost, we ended up getting to something much more productive than any one engineer could be, because we built the tools, the assembly station, for the agent to do the whole thing.

[00:04:43] Model Generations, Build Systems & Background Shells
Ryan Lopopolo: But yeah, onward to GPT-5, 5.1, 5.2, 5.3, 5.4. Going through all these model generations and seeing their quirks and different working styles also meant we had to adapt the code base and change things up whenever the model was revved. [00:05:00] One interesting thing here: with 5.2, the Codex harness at the time did not have background shells, which meant we were able to rely on blocking scripts to perform long-horizon work. But with 5.3 and background shells, it became less patient, less willing to block. So we had to retool the entire build system to complete in under a minute. This is not a thing I would expect to be able to do in a code base where people have opinions. But because the only goal was to make the agent productive, over the course of a week we went from a bespoke Makefile build to Bazel, to Turborepo, to Nx, and just left it there because builds were fast at that point.
swyx: Interesting. Talk more about Turbo to Nx. That’s interesting ’cause that’s the opposite direction from what other people have been doing.
Ryan Lopopolo: Ultimately, I don’t have a lot of experience with actual frontend repo architecture.
swyx: You’re talking that Jessica built the sky. So I’m like, I know the Nx team. I know Turbo from Jared [00:06:00] Palmer. And I’m like, yeah, that’s an interesting comparison.

[00:06:02] One Minute Build Loop
Ryan Lopopolo: The hill we were climbing, right, was: make it fast.
swyx: Is there a micro frontend involved? How complex is it?
Ryan Lopopolo: A React, Electron-based, single-app sort of thing.
swyx: And it must be under a minute. That’s an interesting limitation.
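The one-minute budget and background shells fit together: the build runs without blocking, other work continues, and an over-budget build is treated as a signal rather than killed (as Ryan explains next). This is a minimal sketch, not Codex’s mechanism: it assumes a plain `subprocess`-launched build, the command is a hypothetical stand-in that sleeps briefly, and `BUILD_BUDGET_SECONDS` is an illustrative name.

```python
import subprocess
import sys
import time

BUILD_BUDGET_SECONDS = 60  # the one-minute inner-loop ceiling described in the conversation


def start_build(cmd):
    """Launch the build without blocking, like a background shell; return the process and its start time."""
    return subprocess.Popen(cmd), time.monotonic()


def finish_build(proc, started, budget=BUILD_BUDGET_SECONDS):
    """Wait for the build and report exit code, elapsed seconds, and whether it blew the budget."""
    returncode = proc.wait()  # never killed: an overrun is a signal, not a failure
    elapsed = time.monotonic() - started
    return {"returncode": returncode, "seconds": elapsed, "over_budget": elapsed > budget}


# Hypothetical stand-in for a real build command: a Python process that sleeps briefly.
proc, started = start_build([sys.executable, "-c", "import time; time.sleep(0.2)"])
# ... the agent keeps reviewing code here while the build runs ...
report = finish_build(proc, started)
if report["over_budget"]:
    print("Build exceeded the budget: decompose the build graph before continuing.")
else:
    print(f"Build finished in {report['seconds']:.1f}s")
```

Checking the budget on every build is what turns it into the “ratchet” swyx describes: any regression is noticed immediately instead of accumulating.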
I’m actually not super familiar with the background shell stuff. It was probably talked about in the 5.3 release.
Ryan Lopopolo: Basically, it means that Codex is able to spawn commands in the background and then go continue to work while it waits for them to finish. So it can spawn an expensive build and then continue reviewing the code, for example.
swyx: Yeah.
Ryan Lopopolo: And this helps it be more time-efficient for the user invoking the harness.
swyx: And just to really nail this: why does one minute matter? Why not five? Okay, good.
Ryan Lopopolo: No. We want the inner loop to be as fast as possible. One minute was just a nice round number, and we were able to hit it.
swyx: And if it doesn’t complete, it kills it or something?
Ryan Lopopolo: No. We just take that as a signal that we need to stop what we’re doing, double-click, and decompose the build graph a bit to get us back under, so that we [00:07:00] can enable the agent to continue to operate.
swyx: It’s almost like a ratchet. You’re forcing build-time discipline, because if you don’t, it’ll just grow and grow.
Ryan Lopopolo: That’s right.
swyx: And you mentioned... the software I work on currently is at 12 minutes. It sucks.
Ryan Lopopolo: This has been my experience with platform teams in the past, where you have an envelope of acceptable build times, you let it go up to the breach, and then you spend two, three weeks bringing it back down to the lower end. But because tokens are so cheap,
swyx: Yeah.
Ryan Lopopolo: and we’re so insanely parallel with the model, we can just constantly be gardening this thing to make sure that we maintain these invariants, which means
there’s way less dispersion in the code and the SDLC, which means we can simplify, in a way, and rely on a lot more invariants as we write the software.

[00:07:45] Observability, Traces & Local Dev Stack
Vibhu: Lovely.

[00:07:46] Humans Are Bottleneck
Vibhu: You mentioned in your article that humans became the bottleneck, right? You kicked off as a team of three people. You’re putting out a million lines of code, something like 1,500 PRs. What’s the mindset there? As much as code is disposable, you’re doing a lot of review. A lot [00:08:00] of the article talks about how you want to reframe everything as prompting: whatever the agent can’t see is kind of garbage, right? You shouldn’t have it in there. So what’s the high level of how you went about building it, and how do you address the fact that humans are just PR review? How is the human in the loop for this?
Ryan Lopopolo: We’ve moved beyond even humans reviewing the code.

[00:08:19] Human Review, PR Automation & Agent Code Review
Ryan Lopopolo: Most of the human review is post-merge at this point.
swyx: Post-merge, that’s not even review. That’s just... oh, let’s just make ourselves happy.
Ryan Lopopolo: Fundamentally, the model is trivially parallelizable, right? For as many GPUs and tokens as I am willing to spend, I can have capacity to work on my code base. The only fundamentally scarce thing is the synchronous human attention of my team. There are only so many hours in the day. We have to eat lunch. I would like to sleep, although it’s quite difficult to stop poking the machine, because it makes me want to feed it. You have to step back, right? You need to take a systems-thinking mindset and [00:09:00] constantly be asking: where is the agent making mistakes? Where am I spending my time? How can I not spend that time going forward? And then build confidence in the automation that I’m putting in place.
So I can say I have solved this part of the SDLC. Usually what that has looked like is that we started out needing to pay very close attention to the code, because the agent did not have the right building blocks to produce modular software that decomposed appropriately, that was reliable and observable, and that actually produced a working front end, right?
[00:09:35] Observability First Setup
Ryan Lopopolo: So in order not to spend all of our time sitting in front of a terminal doing one or two things at a time at most, we invested in giving the model that observability, which is the graph in the post here.
swyx: Yeah. Let’s walk through these traces. Which existed first?
Ryan Lopopolo: We started with just the app, and the whole rest of it, from Vector through to all these logs, metrics, and APIs, was, I dunno, half an [00:10:00] afternoon of my time. We have intentionally chosen very high-level, fast developer tools. There’s a ton of great stuff out there now. We use mise a bunch, which makes it trivial to pull down all these Go-written Victoria stack binaries in our local development. A tiny little bit of Python glue spins all these up, and off you go. One neat thing here is that we have tried to invert things as much as possible. Instead of setting up an environment to spawn the coding agent into, we spawn the coding agent, and that’s the entry point. It’s just Codex. Then we give Codex, via skills and scripts, the ability to boot the stack if it chooses to, and tell it how to set some env variables so the app in local dev points at the stack it has chosen to spin up. And I think this is the fundamental difference between reasoning models and the 4.1s and 4os of the past. Those models could not think, so you had to put them in [00:11:00] boxes with a predefined set of state transitions. Here, we let the model, the harness, be the whole box.
And we give it a bunch of options for how to proceed, with enough context for it to make intelligent choices.
Vibhu: So a lot of that is around scaffolding, right?
Ryan Lopopolo: Yes.
Vibhu: With previous agents, you would define a scaffold, and the agent would operate inside that loop and try again. That’s changed since we’ve had reasoning models. They seem to perform better when you don’t have a scaffold, right?
Ryan Lopopolo: That’s right.
[00:11:28] Docs Skills Guardrails
Vibhu: And you go into niches here too, like your SPEC.md and having a very short AGENTS.md.
swyx: Yes. Yes.
Vibhu: Yeah. So you even lay out what it is here, but I like...
swyx: The table of contents.
Vibhu: Yeah.
swyx: Stuff like this really helps guide people, because everyone’s trying to do this.
Ryan Lopopolo: This structure also makes it super cheap to put new content into the repository to steer both the humans and the agents.
swyx: You reinvented skills, right?
Vibhu: One big AGENTS.md and...
swyx: Skills from first principles.
Ryan Lopopolo: Skills did not exist when we started doing this.
Vibhu: You have a short [00:12:00] 100-line overall table of contents, and then you have little skills, right? A core beliefs .md, a tech debt tracker.
Ryan Lopopolo: The tech debt tracker and the quality score are pretty interesting, because this is basically a tiny little scaffold, like a markdown table. It’s a hook for Codex to review all the business logic that we have defined in the app, assess how well it matches all these documented guardrails, and propose follow-up work for itself. Before Beads and all these ticketing systems, we were just tracking follow-up work as notes in a markdown file, and we could spawn an agent on a cron to burn it down. There’s this really neat thing where the models fundamentally crave text.
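The tech debt tracker is described as a markdown table that gives Codex a hook for proposing follow-up work. Below is a minimal Python sketch of the mechanical half. It assumes a hypothetical three-column table (`Item`, `Area`, `Status`), not the team’s actual file format. It parses the table and returns the rows not yet marked done, which a scheduled agent run could then be asked to burn down.

```python
def parse_markdown_table(text):
    """Parse a simple pipe-delimited markdown table into a list of row dicts."""
    rows = [line.strip() for line in text.splitlines() if line.strip().startswith("|")]
    if len(rows) < 2:
        return []

    def cells(line):
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = cells(rows[0])
    # rows[1] is the |---|---| separator line, so data starts at rows[2].
    return [dict(zip(header, cells(line))) for line in rows[2:]]


def open_followups(text):
    """Rows not yet marked done: candidates for a scheduled agent run to burn down."""
    return [row for row in parse_markdown_table(text) if row.get("Status", "").lower() != "done"]


# Hypothetical tracker contents.
TRACKER = """
| Item | Area | Status |
|------|------|--------|
| Add timeouts to outbound HTTP calls | network | open |
| Split settings store into modules | renderer | done |
| Remove duplicated retry helper | main | open |
"""

if __name__ == "__main__":
    for row in open_followups(TRACKER):
        print(f"Follow-up for the agent: {row['Item']} ({row['Area']})")
```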
So a lot of what we have done here is figure out ways to inject text
swyx: into...
Ryan Lopopolo: ...the system. For example, when we get a page because we’re missing a timeout, I can just @-mention Codex in Slack on that page and say, “I’m gonna fix this by adding a timeout. Please update our reliability documentation to require that all network calls have [00:13:00] timeouts.” So I have not only made a point-in-time fix, but I have also durably encoded this process knowledge around what good looks like.
swyx: Yeah.
Ryan Lopopolo: And we give that to the root coding agent as it goes and does the thing. But you can also distill tests out of it, or point a code review agent at the same guidance, to narrow the acceptable universe of the code that’s produced.
swyx: I think one of the concerns I have with that kind of stuff is that you think you’re making the right call, and then it’s persisted for all time across everything.
Ryan Lopopolo: Yes.
swyx: But then you didn’t think about the exceptions that you need to make, right? And then you have to roll it back.
Vibhu: Part of it is...
swyx: Also, sometimes it can follow your instructions too strictly.
Vibhu: It’s somewhat a skill, right? So it determines when it uses the tools. It’s not like it’ll run on every call. It’ll determine when it wants to check the quality score, right?
Ryan Lopopolo: Yeah. And in the prompts we give these agents, we do allow them to push back.
[00:13:51] Agent Code Review Rules
Ryan Lopopolo: When we first started adding code review agents to the PR, the flow went like this. Codex CLI locally writes the change and pushes up a PR. On [00:14:00] those PR synchronizations, a review agent fires and posts a comment. We instruct Codex that it has to at least acknowledge and respond to that feedback. Initially, the Codex acting as the code author was willing to be bullied by the PR reviewer, which meant you could end up in a situation where things were not converging.
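The non-converging author/reviewer loop can be modeled in Python, along with the remedy Ryan goes on to describe: reviewers bias toward merging and score findings on a P0-to-P2 scale, and authors may defer or push back. This is an illustrative model only. The P0-is-worst ordering and the fix/defer/push-back options come from the conversation. The exact decision rules, the interpretation of the cutoff, and every name here are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    FIX = "fix"
    DEFER = "defer"          # file it to the backlog for a later fix-it week
    PUSH_BACK = "push_back"  # explain the disagreement instead of complying


@dataclass
class Finding:
    priority: int            # 0 = P0 (most severe); larger numbers are milder
    message: str
    in_scope: bool = True    # would addressing it stay within this PR's scope?
    author_agrees: bool = True


def reviewer_surface(findings, cutoff=2):
    """Reviewer biases toward merging: only P0..P{cutoff} findings are posted."""
    return [f for f in findings if f.priority <= cutoff]


def author_respond(finding):
    """Author must answer every surfaced finding, but need not fix all of them."""
    if finding.priority == 0:
        return Response.FIX        # P0: merging this would break things
    if not finding.in_scope:
        return Response.DEFER
    if not finding.author_agrees:
        return Response.PUSH_BACK
    return Response.FIX


def review_round(findings, cutoff=2):
    """Group surfaced findings by the author's response; every one is acknowledged."""
    plan = {response: [] for response in Response}
    for finding in reviewer_surface(findings, cutoff):
        plan[author_respond(finding)].append(finding.message)
    return plan
```

Only the `FIX` bucket creates more work inside the PR. That is what keeps the loop converging instead of thrashing.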
So yeah, we had to...
swyx: It just thrashes.
Ryan Lopopolo: We had to add more optionality to the prompts on both sides, right? The reviewer agents were instructed to bias toward merging the thing and not to surface anything greater than a P2 in priority. We didn’t really define P2, but we gave it...
swyx: You did define P2.
Ryan Lopopolo: We gave it a framework within which to score its output.
swyx: And then going toward P0 is worse, right?
Ryan Lopopolo: Yes.
swyx: P2 is very good.
Ryan Lopopolo: P0 is, you will nuke the codebase if
swyx: you merge this
Ryan Lopopolo: thing, right?
swyx: Yeah.
Ryan Lopopolo: But on the code-authoring agent side, we also gave it the flexibility to either defer or push back against review feedback, right? This happens all the time. I happen to notice something and leave a code review comment, [00:15:00] which could blow up the scope by a factor of two. I usually don’t mean for that to be addressed in the moment.
swyx: Exactly.
Ryan Lopopolo: It’s more of an FYI: file it to the backlog, pick it up in the next fix-it week, that sort of thing. And without the context that this is permissible, the coding agents are gonna bias toward what they do, which is following instructions.
swyx: Yeah.
[00:15:19] Autonomous Merging Flow
swyx: I did want to check in on a couple of things, right?
Ryan Lopopolo: Sure.
swyx: The code review agent can merge autonomously. I think that’s something a lot of people aren’t comfortable with. And you have a list here of what the agents do: product code and tests, CI configuration and release tooling, internal dev tools, documentation, eval harness, review comments, scripts that manage the repository itself, production dashboard definition files, like everything.
Ryan Lopopolo: Yes.
swyx: And so they’re all churning at the same time. Is there a cord that any human on the team can pull to stop everything?
Ryan Lopopolo: Because we are building a native application here, we’re not doing continuous deploy.
So there’s still a human in the loop for cutting the release branch.
swyx: I see.
Ryan Lopopolo: We require a blessed, [00:16:00] human-approved smoke test of the app before we promote it to distribution, these sorts of things.
swyx: So you’re working on the app. You’re not building infrastructure where you need nines of reliability, that kind of stuff?
Ryan Lopopolo: That’s correct. That’s correct.
swyx: Okay.
Ryan Lopopolo: And also, full recognition here that all of this activity took place in a completely greenfield repository. There should be no assumption that this applies generally to...
swyx: But this is a production thing you’re gonna ship to customers. Of course. Yeah, of course. So this is real.
Vibhu: And one of the things there: you mentioned you started this as a repo from scratch. The onboarding, the first month or so, was like working backwards, right?
Ryan Lopopolo: Yeah.
Vibhu: And then you had to work with the system, and now you’re at the point where you’re very autonomous. I’m curious: how human-in-the-loop is it now? What are the bottlenecks that you wish you could still automate? And part of that is also: where do you see the model trajectory improving and offloading more of the human in the loop? We just got 5.4. It’s a really good...
Ryan Lopopolo: Fantastic model, by the way.
Vibhu: Yeah. It’s the first one that merges top-tier coding, so Codex-level coding, with general reasoning, both in one model. So...
Ryan Lopopolo: And...
Vibhu: ...computer [00:17:00] use, vision.
Ryan Lopopolo: Now, with 5.4, I can just have Codex write the blog post, whereas for this one I had to bounce back and forth with ChatGPT.
swyx: Oh, I might be out of a job. Oh my God.
Ryan Lopopolo: Oh...
swyx: I know. You just gave me an idea for a completely AI-written newsletter that 5.4 could do. Yeah, I get it now.
Ryan Lopopolo: This sort of thing is just one example of closing the loop, right? Like the dashboard thing you mentioned.
We have Codex authoring the JSON for the Grafana dashboards and publishing them, and also responding to the pages. So when it gets a page, it knows exactly which dashboards are defined and which alert was triggered by which exact log line in the codebase, because all of this stuff is colocated.
swyx: It has to own everything.
Ryan Lopopolo: Yes. Yeah. And it means that if we have an outage that did not result in a page, it has the existing set of dashboards available to it. It has the existing set of metrics and logs, and it can figure out where the gaps are in the dashboards or [00:18:00] in the underlying metrics, and fix them in one go. It’s the same way a full-stack engineer would drive a feature from the backend all the way to the front end.
Vibhu: So it seems like a lot of the work your small team had to do was reshaping everything around the way the model wants the software to be written. You trade human legibility for agent legibility. How do you think that affects broader teams? First, at OpenAI, do you act as a liaison, saying “this is how software should be written”? I can imagine joining a new team with this methodology, this mindset. Teams have their own ways of doing code review, writing code, and structuring themselves, and a lot of that exists for human legibility. So should we all switch? How does this play back into OpenAI, and then more broadly into software engineering? It’s pretty drastic, right? You have to make a pretty big switch. Should they just full send?
Ryan Lopopolo: Yeah. The mindset is very much that I’m removed from the process, right? I can’t really have deep code-level opinions about [00:19:00] things. It’s as if I’m the group tech lead of a 500-person organization.
Vibhu: Yeah.
Ryan Lopopolo: It’s not appropriate for me to be in the weeds on every PR.
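Ryan goes on to mention a command base class: repeatable chunks of business logic that get tracing, metrics, and observability for free. The app itself is Electron, and the class isn’t shown, so this Python sketch is only a guess at the general shape of such a primitive. Logging stands in for tracing, and the in-memory `METRICS` dict is a hypothetical substitute for a real metrics backend.

```python
import logging
import time
from abc import ABC, abstractmethod
from collections import defaultdict

METRICS = defaultdict(list)  # hypothetical stand-in for a real metrics backend


class Command(ABC):
    """Subclasses write only business logic; run() adds tracing and metrics."""

    @abstractmethod
    def execute(self):
        """The business logic itself."""

    def run(self):
        name = type(self).__name__
        log = logging.getLogger(f"command.{name}")
        outcome = "error"
        started = time.perf_counter()
        log.info("start")
        try:
            result = self.execute()
            outcome = "ok"
            return result
        except Exception:
            log.exception("failed")
            raise
        finally:
            elapsed_ms = (time.perf_counter() - started) * 1000
            METRICS[name].append((outcome, elapsed_ms))
            log.info("end outcome=%s elapsed_ms=%.1f", outcome, elapsed_ms)


class RenameWorkspace(Command):
    """Hypothetical example command."""

    def __init__(self, workspace, new_name):
        self.workspace = workspace
        self.new_name = new_name

    def execute(self):
        self.workspace["name"] = self.new_name
        return self.workspace


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(RenameWorkspace({"name": "draft"}, "final").run())
```

What matters in a review under this design is only whether new logic subclasses `Command`, not how `execute` is written.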
This is why that post-merge code review thing is a good analog here, right? I have some representative sample of the code as it is written, and I have to use that to infer what the teams are struggling with, where they could use help, and where they’re already moving quickly so I can pivot my focus elsewhere.
Vibhu: Yeah.
Ryan Lopopolo: So I don’t really have too many opinions about the code as it is written. I do, however, have a command base class. We use it for repeatable chunks of business logic, and it comes with tracing, metrics, and observability for free. The thing to focus on is not how that business logic is structured, but that it uses this primitive, ’cause I know that’s gonna give leverage by default.
Vibhu: Yeah.
Ryan Lopopolo: Yeah, back to that sort of systems thinking.
Vibhu: And you have part of that in your blog post, enforcing architecture and taste: how you set boundaries for what’s used. There’s also a section on redefining [00:20:00] engineering and stuff, but yeah, it’s interesting to hear.
Ryan Lopopolo: And as the models have gotten better, they have gotten better at proposing these abstractions to unblock themselves. Again, that lets me move higher and higher up the stack and look further ahead at what would ultimately block the team from shipping.
swyx: Yeah. So this is primarily a 1-million-line-of-code Electron app, but it manages its own services as well, so it’s a backend-for-frontend type thing.
Ryan Lopopolo: We do have a backend in there, but that’s hosted in the cloud.
swyx: Yeah.
Ryan Lopopolo: This sort of structure is actually within the separate main and renderer processes, within the...
swyx: Electron app. That’s just how Electron works.
Ryan Lopopolo: Yeah, of course. So we have also treated MVC-style decomposition with the same level of rigor, which has been very fun.
swyx: I have a fun pun. This is a tangent: MVC is model-view-controller.
Any full-stack web dev knows that. But my AI-native version of this is Model View Claw, where the Claw is the harness.
Ryan Lopopolo: That’s right. That’s right. I do think that there is an interesting space to [00:21:00] explore here with Codex, the harness, as part of building AI products, right? There’s a ton of momentum around getting the models to be good at coding. We’ve seen big leaps in task complexity with each incremental model release. So if you can figure out how to collapse a product you’re trying to build, or a user journey you’re trying to solve, into code, it’s pretty natural to use the Codex harness to solve that problem for you. The harness has done all the wiring and lets you just communicate in prompts and let the model cook. Yeah, it’s been very fun. And there’s also a very engineering-legible way of increasing capability. It’s fantastic, right? Just give the model scripts, the same scripts you would already build for yourself.
swyx: Yeah. Yeah. So for listeners, this is Ryan saying that software engineering, or coding agents, will eat knowledge work, including the non-coding parts where you would normally think you have to build a separate agent. No: start with a coding agent and go out from there. That’s what OpenClaw does; it’s Pi under the hood.
Ryan Lopopolo: [00:22:00] Yes.
Vibhu: Basically, define your task in code. Everything is a coding agent.
swyx: By the way, since I brought it up, and this is probably the only place we bring it up: any OpenClaw usage from you? Any?
Ryan Lopopolo: No. No. Not for me. I don’t have any spare Mac Minis rattling around my house.
swyx: You can afford it? No, I’m just curious if it’s changed anything at OpenAI yet, but it’s probably early days.
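In the next exchange, Ryan describes a `$land` skill. It coaches Codex to push the PR, wait for human and agent reviewers, get CI green, merge upstream on conflict, and handle the merge queue until the change is in main. The skill itself is a prompt, not code. This Python sketch only models the control flow it describes. The `pr` object’s methods are hypothetical hooks, and `FakePR` is a test double, not a real Git or CI client.

```python
def land(pr, max_attempts=10):
    """Drive one PR into main, following the $land steps described in the conversation.

    `pr` is any object with the hypothetical hooks used below.
    """
    pr.push()
    pr.wait_for_reviews()            # human and agent reviewers
    for _ in range(max_attempts):
        if pr.has_conflict():
            pr.merge_upstream()      # resolve conflicts, then re-push
            pr.push()
            continue
        if not pr.wait_for_ci():     # CI red: fix the flakes and retry
            pr.fix_flakes()
            pr.push()
            continue
        pr.enqueue()                 # hand off to the merge queue
        if pr.merged_to_main():
            return True
        pr.fix_flakes()              # merge-queue run failed; go around again
    return False


class FakePR:
    """Test double: each counter says how many times a step fails before succeeding."""

    def __init__(self, conflicts=1, flaky_ci_runs=1, flaky_queue_runs=1):
        self.conflicts = conflicts
        self.flaky_ci_runs = flaky_ci_runs
        self.flaky_queue_runs = flaky_queue_runs
        self.steps = []

    def push(self):
        self.steps.append("push")

    def wait_for_reviews(self):
        self.steps.append("reviews")

    def has_conflict(self):
        return self.conflicts > 0

    def merge_upstream(self):
        self.conflicts -= 1
        self.steps.append("merge_upstream")

    def wait_for_ci(self):
        if self.flaky_ci_runs > 0:
            self.flaky_ci_runs -= 1
            return False
        return True

    def fix_flakes(self):
        self.steps.append("fix_flakes")

    def enqueue(self):
        self.steps.append("enqueue")

    def merged_to_main(self):
        if self.flaky_queue_runs > 0:
            self.flaky_queue_runs -= 1
            return False
        return True
```

The bounded `max_attempts` is a design assumption: it gives a stuck PR a way to escalate instead of looping forever.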
And then the other thing I want to pull on here: you mentioned ticketing systems and you mentioned PRs, and I’m wondering if both of those have to go away or be reinvented for this kind of coding. Git itself is very hostile to multi-agent work.
Ryan Lopopolo: Yeah. We make very heavy use of worktrees.
swyx: But even then, I just dropped a podcast yesterday with Cursor, and they said they’re getting rid of worktrees because there are still too many merge conflicts. It’s still too unintuitive. But go ahead.
Ryan Lopopolo: The models are really great at resolving merge conflicts.
swyx: Yeah.
Ryan Lopopolo: And once I get to a state where I’m not synchronously in the loop in my terminal, I almost don’t care that there are merge conflicts.
swyx: It’s disposable. [00:23:00]
Ryan Lopopolo: Yeah. We invoke a $land skill, and that coaches Codex to push the PR, wait for human and agent reviewers, and wait for CI to be green. It fixes the flakes if there are any. It merges upstream if the PR comes into conflict and waits for everything to pass. Then it puts the PR in the merge queue and deals with flakes until it’s in main. This is what it means to delegate fully, right? In a very large monorepo, getting PRs merged is probably a significant tax on humans, but the agent is more than capable of doing this, and I really don’t have to think about it other than keeping my laptop open.
swyx: Yeah. I used to be much more of a control freak, but now I’m like, actually, you could do a better job of this than me, with the right context.
Ryan Lopopolo: Yes.
[00:23:47] Encoding Requirements
swyx: Anything else on harnesses in general?
Just this piece, I just want to make sure we...
Ryan Lopopolo: I think there’s one thing that I maybe didn’t make super clear in the article, which I heard on Twitter as an interesting response...
swyx: [00:24:00] Let’s respond to them. What’s the chatter, and then what’s your response?
Ryan Lopopolo: Ultimately, everything we have encoded in docs, tests, review agents, and so on is a way to put all the non-functional requirements of building high-scale, high-quality, reliable software into a space that prompt-injects the agent. We either write it down as docs, or we add lints whose error messages tell the agent how to do the right thing. So the whole meta of the thing is basically to tease out of the heads of all the engineers on my team what they think good looks like: what they would do by default, or what they would coach a new hire on the team to do to get things to merge. And that’s why we pay attention to all the mistakes that the agent makes, right? Each one is code being written that is misaligned with some non-functional requirement that hasn’t been written down yet.
swyx: Sorry, what? Did the online people misunderstand, or...
Ryan Lopopolo: No.
swyx: ...what did you respond to?
Ryan Lopopolo: Somebody just literally said that, and I was like, oh yeah...
swyx: Okay.
Ryan Lopopolo: ...this is the [00:25:00] thing. This is what I’ve been doing.
swyx: Oh, you agree. Yeah, I see. Interesting.
Ryan Lopopolo: One other neat thing, which I totally did not expect: folks were just taking the link to the article, giving it to Pi or Codex, and saying, make my repo like this.
Vibhu: You achieved a whole recursion.
Ryan Lopopolo: And it was wildly effective.
swyx: Really?
Ryan Lopopolo: It was wildly effective.
Vibhu: No way. This is actually something I tried with 5.4 yesterday. I didn’t have time last time because I was out speaking at something. And this is one of my things. I was like, okay, I have this article.
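Ryan describes encoding requirements as lints whose error messages tell the agent how to do the right thing. Take the earlier example rule that all network calls must have timeouts. A minimal Python sketch using the standard `ast` module might look like the block below. It recognizes only direct `requests.<verb>(...)` and `httpx.<verb>(...)` calls, and the documentation path in the message is hypothetical. The point is that the failure text itself carries the remediation.

```python
import ast

HTTP_MODULES = {"requests", "httpx"}
HTTP_VERBS = {"get", "post", "put", "patch", "delete", "head", "request"}
# The message doubles as a prompt: it tells the agent exactly how to fix the problem.
REMEDIATION = (
    "network call without a timeout. Pass timeout=<seconds> explicitly; "
    "see docs/RELIABILITY.md (hypothetical path) for the rule and the defaults to use."
)


def find_missing_timeouts(source, filename="<string>"):
    """Return one lint message per requests/httpx call that omits `timeout=`."""
    problems = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_http_call = (
            isinstance(func, ast.Attribute)
            and func.attr in HTTP_VERBS
            and isinstance(func.value, ast.Name)
            and func.value.id in HTTP_MODULES
        )
        if is_http_call and not any(kw.arg == "timeout" for kw in node.keywords):
            problems.append(f"{filename}:{node.lineno}: {REMEDIATION}")
    return problems


def lint_files(paths):
    """Print findings for the given files; return a CI-style exit code."""
    problems = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            problems.extend(find_missing_timeouts(fh.read(), path))
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```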
Can we just scaffold out what it would be like to run this? I did that first. Then I was like, okay, let me take another little side repo and ask what it would look like to fully automate it like this, because I haven’t written a line of code in it.
Ryan Lopopolo: Full send it.
Vibhu: Right. For the side thing I’m doing with voice, TTS, I’m just slopping it out, whatever. It’s nothing production. I’m like, how would I make this like this? And it’s actually a really good way to learn what could be changed. It’s good for analysis, right? You give it all the code, you give it all the context, you give it the article, and it walks you through it very well.
Ryan Lopopolo: That’s right. That’s right.
[00:25:57] Inlining Dependencies
[00:25:57] Dependencies Going Away & Bret Taylor’s Response
swyx: I guess one more thing before we go to Symphony: I wanted to cover [00:26:00] Bret Taylor’s response. We had him on the show. He is your chairman, and it’s wild that he’s reading your articles and engaging with them. He says software dependencies are going away; basically, they can just be vendored.
Ryan Lopopolo: Yes.
swyx: Response?
Ryan Lopopolo: A...
swyx: ...hundred percent?
Ryan Lopopolo: A hundred percent agree.
swyx: You still pay... you still pay Datadog. You still pay Temporal.
Ryan Lopopolo: Yep. I would say the complexity of the dependencies we can internalize is low-to-medium right now, just based on model capability.
swyx: What is medium?
Ryan Lopopolo: I would say a dependency of a couple thousand lines is something we could in-house in an afternoon, no problem. One neat thing about it is that you probably don’t even need most of that code. By in-housing an abstraction, you can strip away all the generic parts of it and only focus on what you need to enable the specific thing
swyx: Yes.
Ryan Lopopolo: ...you’re building.
swyx: I’ve been calling this the end of b******t plugins.
Ryan Lopopolo: Yeah.
swyx: Because when I publish an open source thing, I want to accept everything, be liberal in what I accept. This is Postel’s law. But that means there’s so much bloat.
Ryan Lopopolo: Yes.
swyx: There’s so much overhead.
Ryan Lopopolo: One other neat thing about [00:27:00] this, too: when we deploy Codex Security on the repo, it is able to deeply review and change the internalized dependencies with much lower friction. The alternative is to push patches upstream, wait for them to be released, pull them down, and make sure they’re compatible with all the transitive dependencies in my repo, and things like that. So it’s also much lower friction to internalize some of these things if code is free, ’cause the tokens are cheap, sort of thing.
swyx: Yeah. Yeah. I think the only argument I have against this is basically scale testing, which the larger pieces of software obviously have: Linux, MySQL, even the Datadogs and Temporals he called out. And then maybe security testing, where...
Ryan Lopopolo: Yes.
swyx: Classically, I think, was it Linus Torvalds who said that in security, open source is the best disinfectant?
Ryan Lopopolo: Many eyes.
swyx: Many eyes. And if you inline your dependencies and code them up yourself, you’re gonna have to relearn mistakes that other people already made.
Ryan Lopopolo: Yep. Yep. And when you internalize that dependency, you’re back to zero, and you have to start reassembling all those bits and pieces to
swyx: Yeah.
Ryan Lopopolo: have [00:28:00] high confidence in the code as it is written.
swyx: Yeah.
Vibhu: Even in the intro of this, you basically mentioned that everything was written by Codex, including internal tooling, right? So even the internal tooling you use to visualize what’s going on, it’s writing for itself.
swyx: Yeah. I build way more internal tools now, and when I show them off, people ask how long I spent. And I didn’t spend any time.
I just prompted it.
Ryan Lopopolo: Very funny story here.
swyx: Yeah, go ahead.
Ryan Lopopolo: We had deployed our app to the first dozen users internally and had some performance issues, so we asked them to export a trace for us. We got a tarball and gave it to our on-call engineer. He did a fantastic job of working with Codex to build this beautiful local dev tool, a Next.js app: you drag and drop the tarball in, and it visualizes the entire trace. It’s fantastic. It took an afternoon, but none of this was necessary, because you could just spin up Codex, give it the tarball, ask the same question, and get the answer immediately. So in a way, optimizing for human [00:29:00] legibility of that debugging process was wrong. It kept him in the loop unnecessarily, when he could have just let Codex cook for five minutes and gotten the same result.
swyx: Yeah, you have to check your instincts here: this is how we used to do it, or this is how I would have solved it.
Ryan Lopopolo: Yeah. Same with this local observability stack. Sure, you can deploy Jaeger to visualize the traces, but I wouldn’t expect to be looking at the traces in the first place, because I’m not gonna write the code to fix them.
swyx: Yeah. So basically there needs to be this kind of in-house stack, owning the whole loop. I think that is very well established. And it sounds like you might be sharing more about that in the future, right?
Ryan Lopopolo: Yeah. I think we’re excited to do that.
[00:29:36] Ghost Libraries Specs
[00:29:36] Ghost Libraries & Distributing Software as Specs
Ryan Lopopolo: We’re gonna talk about Symphony in a little bit, but we distributed it as a spec, which I think folks on Twitter are calling ghost libraries. Such a cool name. It does mean it becomes much cheaper to share software with the world, right? You define a spec for how someone could build their own, specifying as much as a coding agent needs to reassemble it [00:30:00] locally.
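The ghost-library idea depends on a refinement loop, which Ryan walks through next. One agent implements the spec in isolation, another compares the result with upstream and revises the spec, and the loop repeats Ralph-style until the divergence is small. The Python sketch below replaces the Codex and tmux orchestration with two hypothetical callables. It also assumes a hypothetical divergence score between 0 and 1. The toy stand-ins exist only to make the loop runnable.

```python
from typing import Callable, Tuple


def refine_spec(
    spec: str,
    implement: Callable[[str], str],
    review: Callable[[str, str], Tuple[float, str]],
    tolerance: float = 0.05,
    max_rounds: int = 10,
) -> Tuple[str, int]:
    """Loop until a fresh, spec-only implementation matches upstream closely enough.

    implement(spec) -> implementation, built without access to the original repo.
    review(spec, impl) -> (divergence in [0, 1] versus upstream, revised spec).
    Returns the final spec and the number of rounds used.
    """
    for round_no in range(1, max_rounds + 1):
        impl = implement(spec)
        divergence, revised = review(spec, impl)
        if divergence <= tolerance:
            return spec, round_no
        spec = revised
    return spec, max_rounds


# Toy stand-ins: "features" are comma-separated names; upstream has three of them.
UPSTREAM = ["auth", "sync", "retry"]


def toy_implement(spec):
    return spec  # the implementation contains exactly what the spec describes


def toy_review(spec, impl):
    have = [f for f in impl.split(",") if f]
    missing = [f for f in UPSTREAM if f not in have]
    divergence = len(missing) / len(UPSTREAM)
    return divergence, ",".join(have + missing[:1])  # add one missing feature per round


if __name__ == "__main__":
    print(refine_spec("auth", toy_implement, toy_review))
```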
The flow here is very cool. We took all the scaffolding that existed in our proprietary repo and spun up a new repo. We ask Codex, with our repo as a reference, to write the spec. We tell it to spin up a tmux session, spawn a disconnected Codex to implement the spec, and wait for it to be done. Then it spawns another Codex in another tmux session to review the implementation against upstream and update the spec so the two diverge less. And then you just loop over and over, Ralph style, until you get a spec that can reproduce the system as it is with high fidelity. It’s fantastic.
Vibhu: And you’re basically not adding any of your human bias in there, right?
Ryan Lopopolo: That’s correct.
Vibhu: A lot of times people write a spec and say, okay, I think it should be done this way, and riff on something. But the agent could have just handled it. Writing the spec yourself is still scaffolding in a sense: I want it done this way. It can determine its spec better.
Ryan Lopopolo: That’s right. That’s right.
swyx: I’ve been working a lot on evals recently, and part of me is wondering if [00:31:00] an agent can produce a spec that it cannot solve. Is it always capable of the things it can imagine, or can it imagine things that are impossible for it to do?
Ryan Lopopolo: I think with Symphony, there’s this axis where things are easy or hard, and established or new, right? And I think things that are hard and new are still where the models need humans
swyx: Yeah.
Ryan Lopopolo: to drive. But I think those other quadrants are largely solved, given the right scaffold and the right thing to drive the agent to completion.
swyx: It’s crazy that it’s solved.
Ryan Lopopolo: But it means that the humans, the ones with limited time and attention, get to work on the hardest stuff, like the problems where it’s pure white space out in front.
Or the deepest refactorings, where you don’t know what the proper shape of the interfaces is. This is where I want to spend my time, ’cause it lets me set up for the next level of scale.
swyx: Yeah. Yeah. Amazing. Let’s introduce Symphony. I think we’ve been mentioning it every now and then. Elixir: interesting choice.
Ryan Lopopolo: Yeah.
swyx: Yeah. I’m not...
Ryan Lopopolo: Again, the [00:32:00] Elixir manifestation here is just a derivative.
swyx: Was it model-chosen?
Ryan Lopopolo: Yeah. Yeah. And it chose Elixir because the process supervision and the GenServers are super amenable to the type of process orchestration we’re doing here. You are essentially spinning up a little daemon for every task in execution and driving it to completion, which means the model gets a ton of stuff for free by using Elixir and the BEAM.
swyx: I had to go do a crash course in the BEAM and Elixir, and I think most people are not operating at the scale of concurrency where you need that. But it is a good mental model for resumability and all those things, and these are things I care about. But tell me the origin story of Symphony. What do you use it for? How did it form? Were there any abandoned paths that you didn’t take?
[00:32:46] Terminal Free Orchestration
[00:32:46] Symphony: Removing Humans from the Loop
Ryan Lopopolo: At the end of December, we were at about three and a half PRs per engineer per day. This was before 5.2 came out at the beginning of January. Everyone got back from the holidays, and with 5.2 and no other changes [00:33:00] to the repository, we were up to five to 10 PRs per day per engineer. And I don’t know about y’all, but it’s very taxing to be constantly switching like that. I was pretty tapped out at the end of the day. So again: where are the humans spending their time? They’re spending it context-switching between all these active tmux panes to drive the agent forward.
swyx: Yeah. No way.
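The appeal of the BEAM here is supervision. Each in-flight task gets its own small process, and a crashed process is restarted without affecting its siblings. Below is a rough Python analogue using `asyncio` that restarts each task’s worker up to a fixed number of times. It is not Symphony’s Elixir code, and it is much weaker than OTP supervisors. The restart limit and the worker behavior are assumptions.

```python
import asyncio


async def supervise(name, worker, max_restarts=3):
    """Run worker(name), restarting it after a crash, up to max_restarts times.

    Loosely like an OTP one_for_one supervisor: a crash only affects this task.
    """
    restarts = 0
    while True:
        try:
            return await worker(name)
        except Exception as exc:  # CancelledError is a BaseException, so it propagates
            if restarts >= max_restarts:
                return f"{name}: gave up after {restarts} restarts ({exc})"
            restarts += 1


async def run_all(task_names, worker, max_restarts=3):
    """One supervised worker per in-flight task, all running concurrently."""
    return await asyncio.gather(*(supervise(n, worker, max_restarts) for n in task_names))


if __name__ == "__main__":
    attempts = {}

    async def flaky_agent(name):
        # Hypothetical worker: ticket A crashes twice, ticket C always crashes.
        attempts[name] = attempts.get(name, 0) + 1
        await asyncio.sleep(0)
        if name == "C" or (name == "A" and attempts[name] < 3):
            raise RuntimeError("agent crashed")
        return f"{name}: done after {attempts[name]} attempt(s)"

    for line in asyncio.run(run_all(["A", "B", "C"], flaky_agent)):
        print(line)
```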
Yeah.
Ryan Lopopolo: So, again, let’s build something to remove ourselves from the loop. This is what we frantically sprinted at: finding a way to remove the need for a human to sit in front of their terminal. So, a lot of experimentation with dev boxes and automatically spinning up agents. The end state looks fantastic: my life is the beach, and I open Linear twice a day and say yes or no to these things.
swyx: Yeah.
Ryan Lopopolo: And this is, again, a super, super interesting framing for how the work is done, because I become more latency-insensitive. I have [00:34:00] way less attachment to the code as it is written. I’ve had close to zero investment in the actual authorship experience, so if it’s garbage, I can just throw it away and not care too much about it. In Symphony, there’s a rework state. Once the PR is proposed and escalated to a human for review, the review should be cheap: the PR is either mergeable or it is not. If it’s not, you move it to rework. The Elixir service then completely trashes the entire worktree and PR and starts again from scratch.
swyx: Okay.
Ryan Lopopolo: And this is, again, the opportunity to ask: why was it trash, right? What did the agent do that was
swyx: bad. Yeah.
Ryan Lopopolo: Fix that before moving the ticket back to In Progress.
swyx: Yeah. Why is this not in the Codex app? I guess you guys are ahead of the Codex app.
Ryan Lopopolo: Yeah, so the way the team has been working is basically to be as AI-pilled as possible and sprint ahead. A lot of the things we have worked on have fallen out [00:35:00] into the products we ship. We were in deep consultation with the Codex team to make the Codex app a thing that exists, right? And to make skills a thing Codex is able to use, so we didn’t have to roll our own. Same with putting automations into the product.
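The rework flow in Symphony can be modeled as a small state machine. A proposed PR is either mergeable or sent to rework, and rework discards the worktree and PR before the ticket goes back to In Progress. This Python sketch only illustrates that described behavior. The state names, the transition table, and the worktree naming scheme are hypothetical, and the real service is written in Elixir.

```python
from enum import Enum, auto


class State(Enum):
    TODO = auto()
    IN_PROGRESS = auto()
    HUMAN_REVIEW = auto()
    REWORK = auto()
    MERGED = auto()


# Hypothetical transition table for the flow described in the conversation.
ALLOWED = {
    State.TODO: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.HUMAN_REVIEW},
    State.HUMAN_REVIEW: {State.MERGED, State.REWORK},  # cheap review: mergeable or not
    State.REWORK: {State.IN_PROGRESS},  # after fixing *why* the attempt was bad
    State.MERGED: set(),
}


class Ticket:
    def __init__(self, title):
        self.title = title
        self.state = State.TODO
        self.attempt = 0
        self.worktree = None

    def move(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.name} -> {new_state.name} is not allowed")
        if new_state is State.REWORK:
            self.worktree = None  # trash the whole worktree and PR; keep nothing
        elif new_state is State.IN_PROGRESS:
            self.attempt += 1
            # Every attempt starts from scratch in a fresh (hypothetically named) worktree.
            self.worktree = f"worktrees/{self.title}-attempt-{self.attempt}"
        self.state = new_state
```

Rejecting illegal transitions, such as going straight from review back to work without passing through rework, is what forces the "why was it trash?" step to happen.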
So none of our automatic refactoring agents had to be hand-rolled control loops. It has been really fantastic to be, in a way, unanchored from the product development of Frontier and Codex. We can very quickly figure out what works, and then later find the scalable thing that can be deployed widely. It’s been a very fun way to operate. It’s certainly chaotic. I have very often lost track of what the actual state of the code looks like, ’cause I’m not in the loop. At one point we had wired Playwright directly up to the Electron app with MCP. I’m pretty bearish on MCPs, because the harness forcibly injects all those tokens into the [00:36:00] context, and I don’t really get a say over it. They mess with auto-compaction, and the agent can forget how to use the tool. There are probably only, what, three calls in Playwright that I actually ever want to use, yet I pay the cost for a ton of things. Somebody vibed a local daemon that boots Playwright and exposes a tiny little shim CLI to drive it. And I had zero idea that this had happened. From my side, I run Codex and it’s just, oh, it’s better.
swyx: Yeah.
Ryan Lopopolo: No knowledge of this at all.
swyx: Uh-huh.
[00:36:30] Multi Human Chaos
Ryan Lopopolo: So in human space, we have had to spend a lot of time doing synchronous knowledge sharing. We have a daily standup that’s 45 minutes long, because we almost have to fan out the understanding of the current state.
swyx: Yeah, I was gonna say, this is good for single-human, multi-agent, but multi-human, multi-agent is a whole explosion of stuff.
Ryan Lopopolo: Yeah. And this is fundamentally why we have such a rigid, 10,000 [00:37:00] engineer-level architecture in the app: we have to find ways to carve up the space so people are not trampling on each other.
swyx: Sorry, I don’t get the 10,000 thing.
Did I miss that?
Ryan Lopopolo: The structure of the repository is like 500 npm packages. It’s architecture to excess for what you would consider normal for a seven-person team, I think. But if every person is actually like 10 to 50, then the numbers on being super deep into decomposition and sharding and proper interface boundaries make a lot more sense.
swyx: Yeah. To me, that’s why I talked about micro-frontends, and Nx is from that world, but cool. It’s just coming back to this. I don’t know if you have other thoughts on orchestrating so much work going through this. Is this enough? Any aha moments?
Vibhu: It’ll be interesting to see, okay, so right now you picked Linear as your issue tracker, right?
swyx: Or is it actually Linear? This is actually Linear.
[00:37:55] Linear vs Slack Workflow
Vibhu: Oh, that’s Linear. It’s Linear.
swyx: Oh, I never looked at...
Vibhu: The video. The demo video I had to download to [00:38:00] run.
swyx: So I... because I’m a Slack maxi. But yeah, Linear. Linear is also really good.
Ryan Lopopolo: Yes, we do make good use of Slack. We fire off Codex to do all these fix-ups, the things that sync that knowledge into the repository. It’s super cheap.
swyx: Yeah.
Ryan Lopopolo: Just do it in Codex.
swyx: My biggest plug is OpenAI needs to build Slack. You need to own Slack. Build yours. Turn this into Slack.
Ryan Lopopolo: I did read about it.
swyx: You did?
Ryan Lopopolo: Yeah.
[00:38:25] Collaboration Tools for Agents
Ryan Lopopolo: I would say that if we think that we want these agents to do economically valuable work, which is the mission, right? We want AI to be deployed widely to do economically valuable work. Then we need to find ways for them to naturally collaborate with humans, which means collaboration tooling, I think, is an interesting space to explore.
swyx: Yeah, totally. Yeah.
GitHub, Slack, Linear.
Vibhu: Yeah, that was my thing. Okay, where do we see it going? Right now, Codex started with the Codex model, then the CLI, now there’s an app, and the app lets me shoot off multiple Codexes in parallel, but there’s no great team collaboration for Codex. And it [00:39:00] seems like your team had some say in what comes out, right? You talked to them, and the Codex app kind of became a thing. From there, if you guys are on the bleeding edge, what stuff might you not focus on, but expect other people to be building? For people that are five-x-ing, 50-x-ing: should you build stuff that’s very niche for your workflow, for your team? Should it be more general so other people can adopt it? Is there a niche there? ’Cause part of it is just, okay, is everything just internal tooling? Do we do everything our own way, since our team has its own ways that we like to communicate, or is there a broader way to do it? Is it something like an issue tracker? Just thoughts, if you wanna riff on that.
[00:39:35] Standardizing Skills and Code
Ryan Lopopolo: I think it’s TBD; we have not figured this out in a general way. I do think that there is leverage to be had in making the code and the processes as much the same as possible. If you think that code is context, code is prompts, it’s better from the agent-behavior perspective to be able to look in a package in directory X, Y, Z and not have to page so [00:40:00] deeply into directory A, B, C, because they have the same structure, use the same language, and have the same patterns internally. And that same leverage comes from aligning on a single set of skills that you’re pouring every engineer’s taste into, to make sure that the agent is effective. In our code base, we have, I think, six skills. That’s it. And if some part of the software development loop is not being covered, our first attempt is to encode it in one of the existing skills, which means that we can change the agent behavior.
More cheaply than changing the human driver behavior.
swyx: Yeah.
[00:40:39] Self Improvement via Logs
swyx: Have you ever... have you experimented with agents changing their own behavior?
Ryan Lopopolo: We do.
swyx: Yeah. Or a parent agent changing a subagent’s behavior, or something like that.
Ryan Lopopolo: We have some bits for skill distillation. For example, there’s one neat thing you can do with Codex, which is just point it at its own session logs and ask it to tell you how you can use [00:41:00] the tool better.
swyx: It’s like introspection.
Ryan Lopopolo: Or ask it to do things.
Vibhu: I use this: how could I have used this session better? What skills should I have?
swyx: I like the modification of “you can just do things” to “you can just ask agents to do things.”
Ryan Lopopolo: Yeah. You can just Codex things. This is like a silly emoji that we have, right? You can just Codex things, you can just prompt things. It’s a really glorious future we live in. But okay, you can do that one-on-one. We’re actually slurping these up for the entire team into blob storage and running agent loops over them every day to figure out where, as a team, we can do better, and how we reflect that back into the repositories. So everybody benefits from everybody else’s behavior for free. Same for PR comments, right? These are all feedback: they mean the code as written deviated from what was good. A PR comment, a failed build: these are all signals that at some point the agent was missing context. We gotta figure out how to...
swyx: Yeah.
Ryan Lopopolo: ...slurp it up and put it back in the repo.
swyx: By the way, I do exactly this, right? When I use Claude Code for [00:42:00] knowledge work... Claude Cowork is a nice product, right? I think you would agree. I always have it tell me what I should do better next time.
And that’s the metaprogramming reflection thing. So I almost think, you have six abstraction levels in Symphony, and there’s almost a zeroth layer. The six levels are policy, configuration, coordination, execution, integration, observability. We’ve talked about a couple of these, but the zeroth layer is, okay, are we working well? Can we improve how we work? Yes. Can I modify my own WORKFLOW.md or something? I don’t know.
Ryan Lopopolo: Yeah, of course. Yeah, of course you can. This thing is also able to cut its own tickets, ’cause we give it full access. If you make a ticket to have it cut tickets, you can. Put in the ticket that you expect it to file follow-up work.
swyx: Like, yeah, self-modifying.
Ryan Lopopolo: Yeah.
[00:42:44] Tool Access and CLI First
Ryan Lopopolo: Don’t put the agent in a box. Give the agent full accessibility over its domain.
swyx: I had a mental reaction when you said don’t put the agent in a box. I think you should put it in a box; it’s just that you’re giving the box everything it needs.
Ryan Lopopolo: Yeah. Context and tools.
swyx: But as developers, we’re used to calling [00:43:00] out to different systems, whereas here you use the open-source things like Prometheus, whatever, and you run them locally so that you can have the full loop, I assume.
Ryan Lopopolo: Yep.
Vibhu: I think, like...
Ryan Lopopolo: Another one: you wanna minimize cloud dependencies.
Vibhu: You also want to make sure that you think about what the agent has access to. What does it see? Does it go back into the loop? In the most basic sense, if you let it see its own calls and traces, it can determine where it went wrong. But are you feeding that back in? At the most basic level, you wanna see exactly what’s input and output. Does the agent have access to what is being output, right? It can self-improve on a lot of these things.
Ryan Lopopolo: It’s all text, right?
My job is to figure out ways to funnel text from one agent to the other.
swyx: It’s so strange. Way back at the start of this whole AI wave, Andrej was like, English is the hottest new programming language. It’s here. It’s just... yeah, the future as well.
Vibhu: A lot of software, a lot of stuff, has a GUI; it’s made for the human. We’re seeing the evolution of CLIs for everything, right? All tools have CLIs. Your agents can use [00:44:00] them well. Do we get good vision? Do we get good little sandboxes? Right now, it’s a really effective way, right? Models love to use tools. They love to read through text. So slap a CLI on it and let it loose. That works for everything.
Ryan Lopopolo: It does. Yeah.
[00:44:14] UI Perception and Rasterizing
Ryan Lopopolo: We’ve also been adapting non-textual things to that shape in order to improve model behavior in some ways, right? We want the agent to be able to see the UI. Agents do not perceive visually in the same way that we do. They don’t see a red box; they see “red box button,” right? They see these things in latent space. So if we want...
swyx: Hey, yeah. We have a ding that goes off every time. Latent space.
Ryan Lopopolo: Ding. Anyway, if we wanna actually make it see the layout, it’s almost easier to rasterize that image to ASCII and feed it in to the agent. And there’s no reason you can’t do both, right? To further refine how the model perceives the object it’s [00:45:00] manipulating.
swyx: Cool. Do you wanna talk about a couple more of these layers that might bear more introspection, or that you have personal passion for?
[00:45:07] Coordination Layer with Elixir
Ryan Lopopolo: I will say that the coordination layer here was a really tricky piece to get right.
swyx: Let’s do it. Yep. I’m all about that. And this is core Temporal.
Ryan Lopopolo: This is where, when we turned the spec into Elixir, the model took a shortcut, right?
It’s like, oh, I have all these primitives that I can make use of in this lovely runtime that has native process supervision. Which is, I think, a neat way to have taken the spec and made it more achievable, by making choices that naturally map...
swyx: Yeah.
Ryan Lopopolo: ...to the domain, right? In the same way that you would prefer to have a TypeScript monorepo if you are doing full-stack web development, right? Because the ability to share types across the frontend and backend reduces a lot of complexity.
swyx: And that’s what GraphQL used to be.
Ryan Lopopolo: That’s right.
swyx: I don’t know if it’s still alive, but...
Ryan Lopopolo: [00:46:00] There are no humans in the loop here. So my own personal ability to write or not write Elixir doesn’t really have to bias us away from using the right tool for the job. It is just wild.
swyx: Love it. I love it. Yeah. I wonder if any languages struggle more than others because of this? I feel like everyone has their own abstractions that would make sense. But maybe it might be slower, it might be more faulty, where you’d have to just kick the server every now and then. I don’t know. I think the observability layer is really well understood. Integration layer: MCP is dead. I think all of this is just a really interesting hierarchy to travel up and down. It’s a common language for people working on the system to understand.
Ryan Lopopolo: The policy stuff is really cool, right? You don’t really have to build a bunch of code to make sure the system waits for CI to pass.
swyx: It’s institutional knowledge.
Ryan Lopopolo: Yeah. You just give it the GH CLI with some text that says CI has to pass. It makes the maintenance of these systems a lot easier.
[00:46:57] Agent Friendly CLI Output
swyx: Do you think that CLI maintainers need to [00:47:00] do anything special for agents, or is it fine as is?
It’s good, because I don’t think when people made the GitHub CLI, they anticipated this happening.
Ryan Lopopolo: That’s correct. The GH CLI is fantastic. It’s great.
swyx: Everyone go try gh repo create, gh pr and then the pull request number, right? gh pr, like, 153, whatever. And then it pulls...
Ryan Lopopolo: Basically my only interaction with the GitHub web UI at this point is gh pr view --web. Exactly. Glance...
swyx: ...at the diff.
Ryan Lopopolo: ...and be like, sure thing. Send it. Yeah. But the CLIs are nice ’cause they’re super token-efficient, and they can be made more token-efficient really easily. I’m sure you all have seen it: I go to Buildkite or Jenkins and I just get this massive wall of build output. In order to unblock the humans, your developer productivity team is almost certainly gonna write some code that parses the actual exception out of the build logs and sticks it in a sticky note at the top of the page. And you basically [00:48:00] want CLIs to be structured in a similar way, right? You’re gonna want to pass --silent to Prettier, because the agent doesn’t care that every file was already formatted. It just wants to know whether it’s formatted or not, so it can then go run a write command. Similarly, in our pnpm distributed script runner, when we had one, when you do --recursive, it produces an absolute mountain of text, but all of that is for passing test suites. So we ended up wrapping all of this in another script...
swyx: To suppress the...
Ryan Lopopolo: ...which you can vibe, to only output the failing parts of the tests.
swyx: You’d pipe errors to standard error versus standard out. I don’t know. Okay, whatever. Too much thinking to have to do that. I used to maintain the CLI for my company, and yeah, this is very core to my heart. But you’re vibing my job.
Ryan Lopopolo: That’s right.
swyx: Cool. Any other things? This is a long spec. [00:49:00] I appreciate that.
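The wrapper Ryan describes, which surfaces only the failing parts of a noisy recursive test run, might look roughly like the following minimal Python sketch. The failure regex, the two lines of context, the 20-line fallback, and the script name `quiet_tests.py` are our own assumptions, not the team’s actual script; tune them to whatever your test runner really prints.

```python
import re
import subprocess
import sys

# Assumed failure markers; adjust to your test runner's actual output format.
FAILURE_PATTERN = re.compile(r"\b(FAIL|FAILED|AssertionError|Error)\b")


def summarize(output: str, returncode: int, context: int = 2) -> str:
    """Collapse test-runner output into a short, agent-friendly summary."""
    if returncode == 0:
        # Passing runs: the agent only needs to know that everything passed.
        return "ok: all tests passed"
    lines = output.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if FAILURE_PATTERN.search(line):
            # Keep the matching line plus a few following lines (assertion diff, stack frame).
            keep.update(range(i, min(i + context + 1, len(lines))))
    if not keep:
        # The run failed but nothing matched: fall back to the tail instead of hiding everything.
        return "\n".join(lines[-20:])
    return "\n".join(lines[i] for i in sorted(keep))


def main(argv: list[str]) -> int:
    """Run the given command, print only the summary, and propagate its exit code."""
    if not argv:
        print("usage: quiet_tests.py <command> [args...]", file=sys.stderr)
        return 2
    proc = subprocess.run(argv, capture_output=True, text=True)
    # stdout and stderr are concatenated, so their relative ordering is not preserved.
    print(summarize(proc.stdout + proc.stderr, proc.returncode))
    return proc.returncode


if __name__ == "__main__":
    # e.g. python quiet_tests.py pnpm --recursive run test
    sys.exit(main(sys.argv[1:]))
```

The design choice mirrors the “sticky note” idea above: a passing run collapses to one line, and a failing run keeps only the lines an agent (or a human) would act on, while the exit code is passed through so CI and policy checks still see the real result.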
It’s got a lot of strong opinions in here. Any other things that we should highlight? Obviously you could spend the whole day going through these, but I do think some of these have a lot of care in them, or for some of this you might wanna tell people, hey, take this, but make it your own.
[00:49:15] Blueprint Spec and Guardrails
Ryan Lopopolo: Fundamentally, software is made more flexible when it’s able to adapt to the environment in which it is deployed, which means that things like Linear or even GitHub are specified within the spec, but are not required pieces of it. There’s a more platonic ideal of the thing, where you could swap in Jira or Bitbucket, for example. But being able to tightly specify things like the ID formats, or how the Ralph loop works for the individual agents, basically means you can get up and running with a fully specified system quickly that you then evolve later on. We never intended for this to be a static spec that you can [00:50:00] never change. It’s more like a blueprint to get a starting point up and running.
swyx: Yeah.
Ryan Lopopolo: For you to then vibe later to your heart’s content.
swyx: You have code and scripts in here, where it’s like, oh, I think this is a really good prompt. It’s just a very long prompt.
Ryan Lopopolo: Fundamentally, the agents are good at following instructions, so give them instructions, and it will improve the reliability of the result. Much like the way we use Symphony, we don’t want folks to have to monitor the agent as it is vibing the system into existence. So being very opinionated and very strict about what the success criteria are means that our deployment success rate goes up. Yeah. It means we don’t have to get tickets on this thing.
Vibhu: I think it all goes back to that “code is disposable,” right? Early on, when you had the CLI, you’d kick off a Codex run and it would take two hours.
You would wanna monitor it: okay, I’m in the workflow of just using one, and I don’t want it to go down the wrong path. Instead, I’ll cut it off and just shoot off four. That was my favorite thing about the Codex app, right? Just 4x it. [00:51:00] It’s okay. One of them will probably be right; one of them might be better. Stop overthinking it. My first example was probably Deep Research. When you put out Deep Research, I asked it something about LLMs, it thought it was a legal thing, spent an hour, and came back with a report completely off the rails. And I was like, okay, I gotta monitor this thing a bit. No, don’t monitor it. You want to build it so that it goes the right way. You don’t wanna sit there and babysit, right? You don’t want to babysit your agents.
Ryan Lopopolo: With that Deep Research query that you made, looking at the bad result, you probably figured out you needed to tweak your prompt a bit, right? That’s the guardrail that you fed back into the code base for the task, your prompt, to further align the agent’s execution. The same sort of concept applies there too.
swyx: How are the customers feeling?
Ryan Lopopolo: For Symphony? I think we have none, right? This is a thing we have put out into the...
swyx: ...world. Symphony’s internal, right? As long as you are happy, you are the customer.
Ryan Lopopolo: That’s right.
swyx: Just, what’s the external view?
[00:51:53] Trust Building with PR Videos
Ryan Lopopolo: I’d say folks are very excited about this way of distributing software and ideas in [00:52:00] cheap ways.
For us as users, it has again pushed productivity 5x, which means, I think, there’s something here that’s a durable pattern around removing the human from the loop and figuring out ways to trust the output. The video that is shared here...
swyx: Yeah.
Ryan Lopopolo: ...is the same sort of video we would expect the coding agent to attach to the PR...
swyx: Yeah.
Ryan Lopopolo: ...that it creates. That’s part of building trust in this system, and that’s, to me, fundamentally what has been cool about building this: it pushes the persona of the agent working with you closer to being a teammate. I don’t shoulder-surf you on the tickets that you work on during the week. I would never think that I would want to do that.
swyx: Yeah.
Ryan Lopopolo: I wouldn’t want a screen recording of your entire session in Cursor or Claude Code. I would expect you to do what you think you need to do to convince me that the code is good and [00:53:00] mergeable...
swyx: Yeah.
Ryan Lopopolo: ...and compress that full trajectory in a way that is legible to me, the reviewer.
swyx: Yeah.
Ryan Lopopolo: And you can just do that, because Codex will absolutely sling some ffmpeg around. It’s great.
swyx: Oh, FFmpeg is the OG god CLI.
Ryan Lopopolo: Yeah.
swyx: Swiss Army chainsaw, I used to say. There’s a micro-SaaS in every flag in FFmpeg.
Ryan Lopopolo: Oh, for sure.
swyx: You know what I mean? Just host it as a service, put a UI on it. People who don’t know FFmpeg will pay for it.
Ryan Lopopolo: When we were first experimenting with this, it was a wild feeling to be at the computer with windows just popping up all over the place and getting captured and files appearing on my desktop. It very much felt like the future, to have a thing controlling my computer for actual productive use. I’m just there...
swyx: ...keeping it awake, jiggling the mouse every once in a while. That’s what some office workers do.
So they buy a mouse jiggler.
Ryan Lopopolo: That’s right.
[00:53:59] Spark vs Reasoning Models
Vibhu: One thing I [00:54:00] wanted to ask: okay, code is disposable, and you’re saying shoot off a bunch of agents. One question is, are you always an extra-high-thinking guy? And where do you see Spark? So, 5.3 Spark: a lot of the time I want to make quick changes. I’m not gonna open up an IDE, I’m not gonna do anything. But I will say, okay, fix this little thing, change a line, change a color. Spark is great for that, but am I still a bottleneck? Why don’t I just let that go? Just riff on that.
Ryan Lopopolo: Spark is such a different model compared to the extra-high reasoning that you get in these 5.x models.
swyx: Yeah. To be clear for people, it is a different model, a different architecture, different... like, it doesn’t support...
Ryan Lopopolo: It’s just an incredibly fast, smaller model. I have not quite figured out how to use it yet, to be honest. At first, I was adapting it to the same sorts of tasks I would use xhigh reasoning for, and it would blow through three compactions before writing a line of code.
Vibhu: And that’s another big thing with 5.4, right? Million-token context.
Ryan Lopopolo: Yes, it’s fantastic.
Vibhu: Which is huge [00:55:00] in agentic work, right? You can just run for longer before you have to compact. The more tokens you can spend on a task before compacting, the better you’ll do.
Ryan Lopopolo: That’s right. That’s right. I’m not sure how to deploy Spark. I think your intuition is right that it’s great for spiking out prototypes, exploring ideas quickly, doing those documentation updates. It is fantastic for us in taking that feedback and transforming it into a lint.
Where we already have good infrastructure for ESLint rules in the code base, these sorts of things it’s great at, and it allows us to quickly unblock those anti-fragile healing tasks in the code base.
swyx: Yeah, that makes sense.
[00:55:38] What Models Can’t Do Yet
swyx: So you guys are pushing models to the freaking limit.
[00:55:41] Current Model Limitations
swyx: What can current models not do well yet?
Ryan Lopopolo: They’re definitely not there on being able to go from a new product idea to a prototype in a single...
swyx: ...one shot.
Ryan Lopopolo: This is where I find I spend a lot of time steering: translating the end state of a mock for a net-new [00:56:00] thing, think no existing screens, into a product that is playable. Similarly, while this has gotten better with each model release, the gnarliest refactorings are the ones that I spend the most time with, right? The ones where I’m interrupting the most, the ones where I am now double-clicking to build tooling to help decompose monoliths and things like that. This is a thing I only expect to get better, right? Over the course of a month, we went from low-complexity tasks to low-complexity and big tasks, in both these directions. So this is what it means to not bet against the model, right? You should expect that it is going to push itself out into these higher and higher complexity spaces, so the things we do are robust to that. It basically just means I’ll be able to spend my time elsewhere and figure out what the next bottleneck is.
Vibhu: I do think it’s also a bit of a different type of task, right? Codex is really good at codebase understanding, working with code bases. But companies like Lovable, Bolt, and Replit solve a very different [00:57:00] problem: scaffolding zero to one, right? The idea of a product. There are people working on that, and models are also making step-function changes there.
It’s just different from the software engineering agents of today, right?
Ryan Lopopolo: Like I said, the model is isomorphic to myself. The only thing that’s different is figuring out how to get what’s in here into context for the model, and for these whitespace sorts of projects, I myself am just not good at it. Which means that often, over the agent trajectory, I realize the bits that were missing, which is why I find I need to have this synchronous interaction. And I expect that with the right harness, with the right scaffold, it will be able to tease that out of me or refine the possible space, right? To be super opinionated about the frameworks that are deployed, or to put a template in place, right? These are ways to give the model all those non-functional requirements, that extra context to anchor on, and avoid that wide dispersion of possible outcomes.
swyx: Thank [00:58:00] you for that.
[00:58:00] Frontier Enterprise Platform
swyx: I wanted to talk a little bit about Frontier.
Ryan Lopopolo: Yeah, sure.
swyx: Overall, you guys announced it maybe a month ago. There are a few charts in here, and it’s basically your enterprise offering, is how I view it. Is it one product, or is it many?
Ryan Lopopolo: I can’t speak to the full product roadmap here, but what I can say is that Frontier is the platform by which we want to do AI transformation of every enterprise, from big to small. And the way we want to do that is by making it easy to deploy highly observable, safe, controlled, identifiable agents into the workplace. We want it to work with your company’s native IAM stack. We want it to plug into the security tooling that you have. We want it to be able to plug into the workspace tools that you use.
swyx: So you’re just gonna be shipping specs, right?
Ryan Lopopolo: We expect that there will be some harness things there.
The Agents SDK is a core [00:59:00] part of this, to enable both startup builders and enterprise builders to have a works-by-default harness that is able to use all the best features of our models, from the shell tool down to the Codex harness with file attachments and containers and all these other things that we know go into building highly reliable, complex agents. We wanna make that great, and we wanna make it easy to compose these things together in ways that are safe. For example, the gpt-oss-safeguard model: one thing that’s really cool about it is that it ships with the ability to interface with a safety spec. Safety specs are things that are bespoke to enterprises. We owe it to these folks to figure out ways for them to instrument the agents in their enterprise to avoid exfiltration in the ways they specifically care about, to know about their internal company code names, these sorts of things. So providing the right hooks to make the [01:00:00] platform customizable, while also having it mostly work by default for folks, is the space we are trying to explore here.
swyx: Yeah. And the Snowflakes of the world just need this, right?
Ryan Lopopolo: Yes.
swyx: The Brexes of the world, the Stripes. Yeah, it makes sense.
[01:00:11] Dashboards and Data Agents
swyx: I was gonna go back to, I think the demo videos that you guys had were pretty illustrative. It’s also, to me, an example of very large-scale agent management.
Like, you give people a control dashboard where, if you play any one of these multi-agent things, you can dig down to the individual instance and see what’s going on.
Ryan Lopopolo: Yes, of course.
swyx: But who’s the user? Is it, like, the CEO, the CTO, the CIO, something like that?
Ryan Lopopolo: At least in my personal opinion, the buyers we’re trying to build product for here are, one, end employees who are making productive use of these agents, right? That’s gonna be whatever surfaces the agents appear in, the connectors they have access to, things like that. Something like this dashboard is for IT, your GRC and governance folks, your AI innovation office, your security [01:01:00] team, right? The stakeholders in your company that are responsible for successfully deploying into the spaces where your employees work, as well as doing so in a safe way that is consistent with all the regulatory requirements that you have, customer attestations, and things like that. So it is an iceberg beneath the actual end user.
swyx: Yeah. I guess each layer in the UI is like going down a layer of abstraction in terms of the agent, right?
Ryan Lopopolo: Yep.
swyx: Yeah. Yeah. I think it’s good.
Ryan Lopopolo: Yeah. The ability to dive deep into the individual agent trajectory level is gonna be super powerful, not only from a security perspective, but also for someone who is accountable for developing skills. One interesting thing that we also blogged about shipping was an internal data agent, which uses a lot of the Frontier technology in order to make our data ontology accessible to the agent, so it can understand what’s actually in the data [01:02:00] warehouse.
swyx: Yeah. Semantic-layer-type things.
Ryan Lopopolo: Yes.
swyx: I was briefly part of that world. Is it solved? I don’t know. It’s actually really hard for humans to agree on what revenue is.
Yes.
Ryan Lopopolo: Yes.
swyx: What is an active user?
Ryan Lopopolo: There are, what, five data scientists in the company that have defined this golden...
swyx: They... yeah. And no. And there’s also internal politics, as to attribution: I’m marketing, I’m responsible for this much, and sales is responsible for this much, and they all add up to more than a hundred. And I’m like, you guys have different definitions.
Vibhu: Yeah. And if you’re a startup, everything is ARR.
swyx: So I think that’s cool. Oh, you guys blogged about this. Okay, I didn’t see this. Is this the same thing? This is what you’re referring to?
Ryan Lopopolo: Yes.
swyx: Okay. We’ll send people to read this. This is our data...
Vibhu: This one.
swyx: Yeah. I don’t know if you have any highlights?
Vibhu: No, in general, from the playlist. A lot of good things to read.
swyx: Yeah. Lots of homework for people. No, but data is the feedback layer: you need to solve this first in order to close the product feedback loop.
Ryan Lopopolo: That’s right.
swyx: So that the agents understand. And this is something that humans have not solved.
Ryan Lopopolo: And this is [01:03:00] how you build agents that do more than coding, right?
swyx: Yeah.
Ryan Lopopolo: To actually understand how you operate the business...
swyx: Yeah.
Ryan Lopopolo: ...you have to understand what revenue is, what your customer segments are, what your product lines are.
[01:03:13] Company Context and Memes
Ryan Lopopolo: Looping back to the code base that we described here for harness engineering, one thing that’s in core-beliefs.md is who’s on the team, what product we’re building, who our end customers are, who our pilot customers are, and what the full vision of what we want to achieve over the next 12 months is. These are all bits of context that inform how we would go about building the software.
So we have to give it to the agent too.
Vibhu: I’m guessing that stuff is pretty dynamic and changes over time too, right? It’s not just a big spec; you have it as one of the things, and it will iterate.
Ryan Lopopolo: One thing that I think is gonna break your mind even more is that we have skills for how to properly generate deep-fried memes and have meme culture [01:04:00] in Slack. Because with the Slack ChatGPT app that you’re able to use in Codex, I can get the agent to s**tpost on my behalf. It’s part of the humor.
swyx: Meme humor. Humor is part of AGI. Is it funny?
Ryan Lopopolo: It is pretty good, yeah.
swyx: Okay. Yeah.
Ryan Lopopolo: It’s pretty good at making...
swyx: ...deep-fried memes. I think humor is a really hard intelligence test, right? You have to get a lot of context into very few words. This is why making references...
Ryan Lopopolo: This is why 5.4 is such a big uplift for us. It’s the memes. Yeah, for sure.
swyx: It’s very cool.
Vibhu: So 5.4 can s**tpost. That’s the takeaway here.
Ryan Lopopolo: Yeah. Maybe when y’all are done here today, ask Codex to go over your coding agent sessions and roast you.
swyx: Love it. I’ll give it a shot. Coming back, the final point I wanted to make is, yeah, I think there are multiple others... like, you guys are working on this, but this is a pattern that every other company out there should adopt.
Ryan Lopopolo: Yes.
swyx: Regardless of whether or not they work with you. When I saw this, I was like, f**k, [01:05:00] every company needs this. This is multiple billions.
Ryan Lopopolo: This is what it takes to get...
swyx: Yeah.
Ryan Lopopolo: ...people to actually realize the benefits.
And distribute.
swyx: And I think it sounds boring to people, like, oh, it’s for safeguards and whatever, but to handle agents at scale like you are envisioning here (I don’t know if it’s a real screenshot or a demo), this is what you need. This is, or was, my original sort of view of what Temporal was supposed to be: you build this dashboard, and you basically have every long-running process in the company in one dashboard, and that’s it.
Ryan Lopopolo: That’s right.
Vibhu: Yeah. I think it’s pretty customized towards every enterprise, right? You care about different things.
swyx: There’s a lot of customization, but there’ll be multiple unicorns just doing this as a service. I don’t know. I’m very Frontier-pilled, if you can tell. Amazing. But it only clicked ’cause, obviously, this came out first, then harness engineering, then Symphony, and it only clicked for me that this is actually the thing you shipped to do that.
Ryan Lopopolo: Yeah. There’s a set of building blocks here that we assembled into these agents, [01:06:00] and the building blocks themselves are part of the product, right? The ability to steer, to revoke authorization if a model becomes misaligned: all of this is accessible through Frontier. And there are gonna be a bunch of stakeholders in the company that have the things they need to see in the platform...
swyx: Yeah.
Ryan Lopopolo: ...to get to yes. So we’ll build all of those into Frontier so that we can actually do widespread deployment across the planet. That’s the fun part.
swyx: Yeah. I’m also calling back to the levels of AGI. I don’t know if OpenAI is still talking about this, but they used to talk about five levels of AGI, and one of them was like, oh, it’s like an intern coding software agent. At some point it was AI organizations, and this is it.
Ryan Lopopolo: That’s right.
swyx: This is level four or five. I can’t remember which level, but it’s somewhere along that path.
It was this.
Ryan Lopopolo: You know how I mentioned that my team is having fun sprinting ahead here? We do this thing where we’re collecting all the agent trajectories from Codex to slurp them up and distill them. This is what it means to build our team-[01:07:00]level knowledge base, and we happen to reflect it back into the code base. But it doesn’t have to be that way, and it doesn’t have to be bound to just Codex. I want ChatGPT to also learn our meme culture, and also the product we are building and how, so that when I go ask it, it also has the full context of the way I do my work. I’m super excited for Frontier to enable this.
swyx: Yeah. Amazing.
[01:07:21] Harness vs Training Tension
swyx: What do the model people say when they see you do this? You have a lot of feedback, obviously, you have a lot of usage, you have a lot of trajectories. I don’t imagine a lot of it is useful to them, but some of it is.
Vibhu: You have this too: you deploy a billion tokens of intelligence a day, and this was at the beginning of 2026. You’re cooking.
Ryan Lopopolo: Yeah, there’s this fundamental tension, which I think you have talked about, between whether we invest deeper into the harness or deeper into the training process to get the model to do more of this by default. And I think success for the way we are [01:08:00] operating here means the model gets better taste because we can point the way there, and none of the things we have built actively degrade agent performance, because really all they’re doing is running tests, and running tests is a good part of what it means to write reliable software. If we were building an entire separate Rust scaffold around Codex to restrict its output, that, I think, would be additional harness that would be prone to being scrapped. If instead we can build all the guardrails in a way that’s native to the output Codex is already producing, which is code, I think...
No friction with how the model continues to advance, but it’s also just good engineering, and that’s the whole point.
swyx: Yeah. I’ve had similar discussions with research scientists, where the RL equivalent is on-policy versus off-policy. You’re basically saying that you should build an on-policy harness, which is already within distribution, and you [01:09:00] modify from there. But if you build it off-policy, it’s not that useful.
Ryan Lopopolo: That’s right.
swyx: Super cool. Any other thoughts, anything we haven’t covered that we should get out there?
[01:09:08] Closing Thoughts & OpenAI Hiring
Ryan Lopopolo: Just that I’ve been super excited to benefit from all the cooking the Codex team has been doing. They absolutely ship relentlessly. This is one of our core engineering values, ship relentlessly, and the team there embodies it to an extreme degree. To have 5.3 and then Spark and 5.4 come out within what feels like a month is just phenomenally fast.
swyx: It was exactly a month ago that 5.3 came out, and yesterday was 5.4. Is it every month now? Is 5.5 next?
Ryan Lopopolo: I can’t say. The prediction markets would be very upset.
swyx: I think it’s interesting that it’s also correlated with the growth. They announced 2 million users, but they almost don’t care about Codex anymore. This is it, this is the game now: coding is cool, but it’s knowledge work.
Ryan Lopopolo: That’s right. This is the thing to chase after, and this is one of the things my team is excited to support.
swyx: Get the whole [01:10:00] self-hosted harness thing working, which you have done and the rest of us are trying to figure out how to catch up on, but then do things.
Vibhu: Do things.
swyx: That’s right. You can just do things. That’s the line for the episode.
Vibhu: That’s it. Any other calls to action?
You’re based in Seattle, your team, I’m guessing? The new Bellevue office?
Ryan Lopopolo: The new Bellevue office. We just had the grand opening yesterday, as of the recording date, which was fantastic. Beautiful buildings. I’m super excited to be part of the Bellevue community, building the future in Washington. And I would say that there is lots of work to be done in order to successfully serve enterprise customers here in Frontier. We are certainly hiring, and if you haven’t tried the Codex app yet, please give it a download. We just passed 2 million weekly active users, growing at a phenomenally fast rate, 25% week over week. Come join us.
swyx: Yes. My final observation: OpenAI is a very San Francisco-centric company. I know people [01:11:00] who turned down the job or didn’t get the job because they didn’t want to move to SF, and now they just don’t have a choice. You have to open London, you have to open Seattle. And I wonder if that’s gonna be a shift in the culture. Obviously you can’t say, but...
Ryan Lopopolo: I was one of the first engineering hires out of our Seattle office, so yeah.
swyx: See, it was very natural.
Ryan Lopopolo: Its success has been part of what I have been building toward, and it has grown quite well, right? We have durable products in the lines of business that are built out of there, and a ton of zero-to-one work happening as well, which is the core essence of the way we do applied AI work at the company: sprinting after new things to figure out where we can actually successfully deploy the model. A hundred percent. We also have a New York office that has a ton of engineering presence.
swyx: Yeah, exactly. These are my roadmaps for AIE: wherever people are hiring engineers, I will go. That’s right.
Vibhu: It’s a cool office too. The New York one is an old REI building, I believe.
swyx: It’s just... no, you’ll never be as big.
New York is... you can’t get [01:12:00] the size of office that they need.
Ryan Lopopolo: The New York office has a very Mad Men vibe. It’s beautiful. The Bellevue one is very green, gold fixtures, very Pacific Northwest. It’s a very cool place to vibe and be local.
Vibhu: Yeah. A lot of people are like that: people like New York, they wanna be in New York, right?
Ryan Lopopolo: Yeah. We have a fantastic workplace team that has been building out these offices. It really is a privilege to work here.
swyx: Excellent. Okay. Thank you for your time. You’ve been very generous, and you’ve been cooking, so I’m gonna let you get back to cooking. It’s been amazing to be with you folks. Happy Friday.
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Fresh off raising a monster $15B, Marc Andreessen has lived through multiple computing platform shifts firsthand, from Mosaic and Netscape to cofounding a16z. In this episode, Marc joins swyx and Alessio in a16z’s legendary Sand Hill Road office to argue that AI is not just another hype cycle, but the payoff of an “80-year overnight success”: from neural nets and expert systems to transformers, reasoning models, coding, agents, and recursive self-improvement. He lays out why he thinks this moment is different, why AI is finally escaping the old boom-bust pattern, and why the real bottleneck may be less about models than about the messy institutions, incentives, and social systems that struggle to absorb technological change.
This episode was a dream come true for us, and many thanks to Erik Torenberg for the assist in setting this up. Full episode on YouTube!
We discuss:
* Marc’s long view on AI: from the 1980s AI boom and expert systems to AlexNet, transformers, and why he sees today’s moment as the culmination of decades of compounding technical progress
* Why “this time is different”: the jump from LLMs to reasoning, coding, agents, and recursive self-improvement, and why Marc thinks these breakthroughs make AI real in a way prior cycles were not
* AI winters vs. “80-year overnight success”: why the field repeatedly swings between utopianism and doom, and why Marc thinks the underlying researchers were mostly right even when the timelines were wrong
* Scaling laws, Moore’s Law, and what to build: why he believes AI scaling laws will continue, why the outside world is messier than lab purists assume, and how startups can still create durable value on top of rapidly improving models
* The dot-com crash and AI infrastructure risk: Marc’s comparison between today’s AI capex boom and the fiber/data-center overbuild of 2000, plus why he thinks this cycle is different because the buyers are huge cash-rich incumbents and demand is already here
* Why old NVIDIA chips may be getting more valuable: the pace of software progress, chronic capacity shortages, and the idea that even current models are “sandbagged” by supply constraints
* Open source, edge inference, and the chip bottleneck: why Marc thinks local models, Apple Silicon, privacy, trust, and economics all point toward a major role for edge AI
* American vs. Chinese open source AI: DeepSeek as a “gift to the world,” why open models matter not just because they’re free but because they teach the world how things work, and how open source strategies may shift as the market consolidates
* Why Pi and OpenClaw matter so much: Marc’s claim that the combination of LLM + shell + filesystem + markdown + cron loop is one of the biggest software architecture breakthroughs in decades
* Agents as the new “Unix”: how agent state living in files allows portability across models and runtimes, and why self-modifying agents that can extend themselves may redefine what software even is
* The future of coding and programming languages: why Marc thinks software becomes abundant, why bots may translate freely across languages, and why “programming language” itself may stop being a salient concept
* Browsers, protocols, and human readability: lessons from Mosaic and the web, why text protocols and “view source” mattered, and how similar principles may shape AI-native systems
* Real-world OpenClaw use: health dashboards, sleep monitoring, smart homes, rewriting firmware on robot dogs, and why the most aggressive users are discovering both the power and danger of agents first
* Proof of human vs. proof of bot: why Marc thinks the internet’s bot problem is now unsolvable via detection alone, and why biometric + cryptographic proof of human becomes necessary
Timestamps
* 00:00 Marc on AI’s “80-Year Overnight Success”
* 00:01 A Quick Message From swyx
* 01:44 Inside a16z With Marc Andreessen
* 02:13 The Truth About a16z’s AI Pivot
* 03:29 Why This AI Boom Is Not Like 2016
* 06:33 Marc on AI Winters, Hype Cycles, and What’s Different Now
* 10:09 Reasoning, Coding, Agents, and the New AI Breakthroughs
* 12:13 What Founders Should Build as Models Keep Improving
* 16:33 AI Capex, GPU Shortages, and the Dot-Com Crash Analogy
* 24:54 Open Source AI, Edge Inference, and Why It Matters
* 33:03 Why OpenClaw and Pi Could Change Software Forever
* 41:37 Agents, the End of Interfaces, and Software for Bots
* 46:47 Do Programming Languages Even Have a Future?
* 54:19 AI Agents Need Money: Payments, Crypto, and Stablecoins
* 56:59 Proof of Human, Internet Bots, and the Drone Problem
* 01:06:12 AI, Management, and the Return of Founder-Led Companies
* 01:12:23 Why the Real Economy May Resist AI Longer Than Expected
* 01:15:53 Closing Thoughts
Transcript
Marc: There’s something about AI that causes the people in the field, I would say, to become both excessively utopian and excessively apocalyptic. Having said that, I think what’s actually happened is an enormous amount of technical progress that built up over time. For example, we now know that neural networks are the correct architecture. And I will tell you, there was a 60-year run, or even 70 years, where that was controversial.
And so the way I think about the period we’re in right now is, I call it, an 80-year overnight success, right? It’s an overnight success because it’s like, bam, ChatGPT hits, and then o1 hits, and then OpenClaw hits, and these are radical, overnight, transformative successes, but they’re drawing on an 80-year wellspring, a backlog of ideas and thinking. It’s not just that it’s all brand new; it’s an unlock of all of these decades of very serious, hardcore research. If I were 18, this is what I would be spending all of my time on. This is such an incredible conceptual breakthrough.
swyx: Before we get into today’s episode, I just have a small message for listeners. Thank you. We would not be able to bring you the AI engineering, science, and entertainment content that you so clearly want if you didn’t choose to also click in and tune into our content. We’ve been approached by sponsors on an almost daily basis, but fortunately enough of you actually subscribe to keep all this sustainable without ads, and we wanna keep it that way. But I just have one favor to ask all of you. The single most powerful, completely free thing you can do is to click that subscribe button. It’s the only thing I’ll ever ask of you, and it means absolutely everything to me and my team that works so hard to bring Latent Space to you each and every week. If you do it, I promise we will never stop working to make the show even better. Now, let’s get into it.
Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I’m joined by swyx, editor of Latent Space.
swyx: Hello. And we’re at a16z with Marc Andreessen. Welcome.
Marc: Yes, yes. A and, what, half of 16? Something like that. A1.
Exactly.
swyx: Exactly. Apparently these are the final few days in your current office. You’re moving across the road.
Marc: We have some projects underway, but yeah, this is actually the original office. We’re in the whole thing.
swyx: It’s beautiful. Yeah. Great.
Marc: Thank you.
swyx: So I wanted to pick a spicy start. In October 2022, I had just made friends with roon, and I wanted to give him something to be spicy about. And I said, it’ll never not be funny that a16z was constantly going, “The future is where the smart people choose to spend their time,” and then going deep into crypto and not AI. That was October 22nd, 2022. And roon says there was an internal meeting at a16z to reorient around gen AI. Obviously you have, but was there a meeting? What was that?
Marc: I mean, look, I’ve been doing AI since the late eighties.
swyx: Yeah.
Marc: So, as far as I’m concerned, this stuff is all Johnny-come-lately. Look, we’ve been doing AI our entire existence. We’ve been doing AI and machine learning deeply, from the very beginning. Obviously AI is just core to computer science; I actually view them as quite continuous. Ben and I both have computer science degrees, and Ben and I are both old enough to remember the actual AI boom in the 1980s. There was a big AI boom at the time, with names like expert systems, and there was Lisp and Lisp machines. I coded in Lisp. I was coding in Lisp in 1989, when that was the language of the AI future. So this is something that we’re completely comfortable with.
We’ve been doing it the whole time and are very enthusiastic about it.
swyx: Is there a strong “this time is different”? Because my closest analog was 2016, 2017. It was an AI boom, and it petered out very, very quickly, just in terms of investing.
Marc: Sort of, sort of.
swyx: Yeah. Investment excitement.
Marc: Although that’s really when the Nvidia phenomenon took off. At the time the vocabulary was more machine learning, but I would say it was very clear in that period that machine learning was hitting some sort of takeoff point.
Alessio: Yeah.
Marc: And you guys have talked about this at length on your show, but if you really track what happened, I think the real story is that it was the AlexNet breakthrough in like 2013. That was the real knee in the curve. And then it was obviously the transformer breakthrough in ’17.
Alessio: Yeah.
Marc: And then everything that followed. But look, machine learning... one of my projects has been working with Facebook since 2004, and I’ve been on the board since 2007, and of course they started using machine learning very early and have used it for basically 20 years for content feed optimization and advertising optimization. And obviously financial services, and many, many companies in many different sectors, have been doing this. So it’s not a single thing. It’s like layers, right? And the layers arrive at different paces, but they build up.
swyx: Yeah.
Marc: They build up over time, and then, yeah.
And then, look, in retrospect, 2017 was the key point with the transformer. And then, as you guys know, there was this really weird four-year period where the transformer existed and then it was just like...
swyx: Let’s go. Yeah.
Marc: Well, but between 2017 and 2021, that was the era in which companies like Google had internal chatbots, but they weren’t letting anybody use them.
swyx: Yeah.
Marc: Right. And then OpenAI developed GPT-2, and then they told everybody this is way too dangerous to deploy, right? We can’t possibly let normal people use this thing. And then you guys, I’m sure, remember AI Dungeon. There was a year where the only way for a normal person to use GPT-3 was in AI Dungeon.
Alessio: Yeah.
Marc: So you’d go in there and pretend to play Dungeons and Dragons. In reality, you’re just trying to talk to GPT. So there was this long period where, you know, big companies are cautious, and the big companies were cautious. By the way, it took OpenAI time, and they talk about this, to actually adjust and redirect their research path.
swyx: I think, uh, Rosewood, right? The dinner that founded OpenAI was right there.
Marc: Right, right. But that dinner would’ve taken place in 20...
swyx: 18?
Marc: 19? The formation of OpenAI, as late as 2018.
swyx: Sorry, no, I’m wrong. They just celebrated a 10-year anniversary, and it is 2025. So, 2015?
Marc: Yeah, 2015. But then Alec Radford did GPT-1 in, what, probably...
swyx: Mm-hmm.
’17, ’18?
Marc: Yeah, ’17, ’18. And then GPT-3 was what, 2020?
swyx: 2020.
Marc: Because that became Copilot immediately. Even OpenAI, which has been the leader of this thing in the last decade, even they had to adapt and lean into the new thing. So I think it’s just this process of wave after wave, layer after layer, building on itself. And then you get these catalytic moments where the whole thing pops, and obviously that’s what’s happening now.
swyx: Is it useful to think about whether there will be any AI winter? Because there are always these patterns. Is this the summer? It’s something I constantly think about, because do I just get endlessly hyped and trust that I will only be early and never wrong? Will there be a winter?
Marc: So I’d say the following. There’s something about AI that has led to this repeated pattern, and you guys know this.
swyx: It’s summer, winter, summer...
Marc: Winter, summer, winter, summer, winter. And it goes back 80 years. The original neural network paper was 1943, which is amazing, that it goes back that far. And then, I don’t know if you guys have ever talked about this on your show, but there was a big AGI conference at Dartmouth in 1955. They got an NSF grant for all the AI experts at the time to spend the summer together, and they figured if they had 10 weeks together, they could get AGI at the other end. And by the way, they got the grant, they got the 10 weeks, and then, 1955: no AGI. And like I said, I lived through the eighties version of this, where there was a big boom and a crash.
And so there is this thing: there is something about AI that causes the people in the field, I would say, to become both excessively utopian and excessively apocalyptic. And it’s probably on both sides of the boom-bust cycle; you kind of see that play out. Having said that, I think what’s actually happened, and we now know this in retrospect, is an enormous amount of technical progress that built up over time. For example, we now know that neural networks are the correct architecture. And I will tell you, there was a 60-year run, or even 70 years, where that was controversial. And we now know that that’s the case, so everything we’re building on today derives from the original idea in 1943. So in retrospect, we now know that these guys were right. They would get the timing wrong: they thought capabilities would arrive faster, or that it could be turned into businesses sooner, or whatever. But the scientists who worked on this over the course of decades were fundamentally correct about what they were doing, and the payoff from all their work is happening now. So the way I think about the period we’re in right now is, I call it, an 80-year overnight success, right?
It’s an overnight success because it’s like, bam, ChatGPT hits, and then o1 hits, and then OpenClaw hits, and these are radical, overnight, transformative successes, but they’re drawing on an 80-year wellspring, a backlog of ideas and thinking. It’s not just that it’s all brand new; it’s an unlock of all of these decades of very serious, hardcore research and thinking. And look, there were AI researchers who spent their entire lives on this. They got their PhDs, they researched for 40 years, they retired, in a lot of cases they passed away, and they never actually saw it work.
swyx: Yeah. It’s so sad.
Marc: It is. It is sad.
swyx: Geoff Hinton was like the last guy.
Marc: Yeah. Well, there was Allen Newell. I mean, there are tons. John McCarthy was one of the inventors of the field. He was one of the guys who organized the Dartmouth Conference, and he taught at Stanford for 40 years. And he passed away, I don’t know, 10 years ago or something, and never actually got to see it happen. But it is amazing in retrospect: these guys were incredibly smart, they worked really hard, and they were correct. So anyway, they say history doesn’t repeat, but it rhymes. So does that mean there’s gonna be another boom-bust cycle? And I will tell you, in a sense, yes: everything goes through cycles, people get overly enthusiastic and overly depressed, and there’s a timelessness to that. Having said that, there’s just no question.
So, the four most dangerous words in investing are “this time is different.” Do you know the 12 most dangerous words in investing? No? “The four most dangerous words in investing are this time is different.” Those are the 12 most dangerous words. And so I’ll tell you what’s different: now it’s working. Look, there’s just no question. And by the way, I’ll just give you guys my take. With LLMs, from basically the ChatGPT moment through to spring of ’25, I think well-intentioned, well-informed skeptics could still say, oh, this is just pattern completion, these things don’t really understand what they’re doing, and the hallucination rates are way too high. This is gonna be great for creative writing, for Shakespearean sonnets or rap lyrics or whatever, but we’re not gonna be able to harness this to make it relevant in coding or in medicine or in law, the fields that really, really matter. And I think it was basically the reasoning breakthrough, o1 and then R1, that answered that question and said, oh no, we’re gonna be able to actually turn this into something that works in the real world. And then obviously the coding breakthrough that catalyzed over the holiday break was the third step in that.
Mm-hmm.
Marc: Where you’re just like, alright, if Linus Torvalds is saying that AI coding is now better than he is, that’s never happened before.
swyx: That’s the benchmark.
Marc: Yeah. That’s never happened before.
And so now we know that it’s gonna sweep through coding, and we know that if it’s gonna work in coding, it’s gonna work in everything else, right? Because in many ways that’s the hardest example, and everything else is gonna be a derivative of that. And then on top of that, we just got the agent breakthrough with OpenClaw, which is fantastic, amazing, and incredibly powerful. And then we just got auto research, the self-improvement: we’re now into the self-improvement breakthrough. So the way I think about it is we’ve had four fundamental breakthroughs in functionality: LLMs, reasoning, agents, and now RSI, and they’re all actually working. So, as you can tell, I’m jumping out of my shoes. This is it. This is the culmination of 80 years’ worth of work, and this is the time it’s becoming real.
Alessio: Yeah.
Marc: I’m completely convinced.
Alessio: I think the anxiety that people feel is that during the transistor era, you had Moore’s law, and it’s like, all right, we understand why these things are getting better; we understand the physics of it. With AI, it’s so jagged in the jumps. Like you said, in three months you have this huge jump, and people are like, well, can this keep happening? But then it keeps happening.
Marc: It’ll keep happening.
Alessio: And so how do you think about timelines for what we’re building? We always ask guests on Latent Space whether you should spend time building a harness for a model versus letting the next model just do it one-shot, right?
And how does that inform how you think about the shape of the technology? You talk about how it’s a new computing platform. If a computing platform drastically changes in what it looks like every six months, it’s hard to build companies on top of it.
Marc: Yeah. So, a couple things. One is, look, Moore’s law was what we now call a scaling law. For your younger viewers, Moore’s law was that chips get either twice as powerful or twice as cheap every 18 months. It’s gotten more complicated in the last few years, but that was the 50-year trajectory of the computer industry. And that’s what took the mainframe computer from a $25 million (in current dollars) machine to the phone in your pocket, which is a million times more powerful than that, for 500 bucks. So that was a scaling law. And key to any scaling law, including Moore’s law and the AI scaling laws, is that they’re not really laws, right? They’re predictions, but when they work, they become self-fulfilling predictions, because they set a benchmark, and then the entire industry, all the smart people in the industry, work to make sure that it actually happens. So they motivate the breakthroughs that are required to keep it going. In chips, that was a 50-year run, right? It was amazing, and it’s still happening in some areas of chips. I think the same thing is happening with the core scaling laws in AI. They’re not really laws, but they are basically...
They’re predictions, and they’re motivating catalysts for the research work that is required, and by the way also for the investment dollars required to keep the curves going. And look, it’s gonna be complicated and variable, and there are gonna be walls that look like they’re fast approaching, and then engineers are gonna get to work and figure out a way to punch through the walls. Obviously that’s been happening a lot. And there are gonna be times when it looks like the laws have petered out, and then they’re gonna pick up again and surge. And it appears what’s happening with AI is that there are now multiple scaling laws, multiple areas of improvement. I don’t know how many more are yet to be discovered, but there are probably some more that we don’t know about yet. For example, there’s probably some scaling law around world models and robotics, the acquisition of data at scale in the real world, that we don’t fully understand yet. That one will probably kick in at some point; there are a bunch of really smart people working on that. So I think the expectation is that the scaling laws generally are gonna continue, and the pace of improvement will continue to be really fast. To your question on what to build: I’m a complete believer the scaling laws are gonna continue. I’m a complete believer the capabilities are gonna keep getting amazing, by leaps and bounds.
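Marc’s Moore’s law figures can be sanity-checked with simple compounding. This is an editorial sketch, not anything from the episode: it assumes price-performance doubles cleanly every 18 months, and it treats his round numbers ($25 million mainframe, $500 phone, a million times more powerful) as exact, which they are not. The function names are hypothetical.

```python
import math

def improvement_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Compound gain if price-performance doubles once per period (18 months by default)."""
    return 2 ** (years / doubling_period_years)

def years_for_factor(factor: float, doubling_period_years: float = 1.5) -> float:
    """Invert the rule: years of steady doubling needed to reach `factor`."""
    return math.log2(factor) * doubling_period_years

if __name__ == "__main__":
    # Fifty years of 18-month doublings.
    print(f"{improvement_factor(50):.3g}")  # 1.08e+10

    # Round figures from the conversation, treated as exact for illustration:
    # a $25M mainframe vs. a $500 phone that is ~1,000,000x more powerful.
    price_ratio = 25_000_000 / 500          # 50,000x cheaper
    performance_ratio = 1_000_000           # 1,000,000x more powerful
    gain = price_ratio * performance_ratio  # ~5e10 gain in price-performance
    print(round(years_for_factor(gain), 1))  # 53.3
```

Fifty years of 18-month doublings gives roughly a 10^10 gain, and the mainframe-to-phone numbers imply about 53 years of doubling, so the anecdote is broadly consistent with the “50-year trajectory” he describes.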
Uh, the part where I part ways a little bit is with what I would describe as the AI purists, which I would characterize as the people who are, in many ways, the smartest people in the field, but also the people who spend their entire life at a lab and have, I would say, very little experience in the outside world. The nuance I would offer is that the outside world of 8 billion people and institutions and governments and companies and economic systems and social systems is really complicated. Eight billion people making collective decisions on planet Earth is not a simple process. Like, you see this happening now. A bunch of AI CEOs all have this kind of thing when they talk in public where they're like, well, there's this obvious set of things that society needs to do.

Alessio: Mm-hmm.

Marc: And then they're like, society's not doing any of those things. Right? And it's like, how can society not see X, Y, Z, whatever their theory is? And the answer is, well, number one, there's no single society. It's 8 billion people, and they all have a voice, and they all have a vote, at the end of the day, in how they react to change. And, you know, human reality is just really complicated and messy. And so the specific answer to your question is, as usual, it depends. Look, there's no question, and it's already happening: there are companies that think they're building value on top of the models, and then they're just gonna get blitzed by the next model. There's no question that's happening.
But I think there's also no question that the process of adoption of any technology into the real, messy world of humanity is just going to be messy and complicated. It's not going to be simple and straightforward. And there are gonna be a lot of companies and a lot of products, and, in fact, entire industries, that are gonna get built to basically help all of this technology actually reach real people.

Alessio: The amount of capital going into these companies... I mean, Dario talked about it on the Dwarkesh podcast, and Dwarkesh was like, why don't you just buy 10x more GPUs? And he's like, because I'm gonna go bankrupt if the model doesn't exactly hit the performance level. How do you think about that, also as a risk? You guys are investors in OpenAI and Thinking Machines and World Labs. It seems like we're leveraging the scaling laws at a pretty high rate, right? How comfortable do you feel with the downside scenario? Say things peter out: do you think you can kind of restructure these build-outs and capital investments?

Marc: Yeah. So I should start by saying I lived through the dot-com crash, and I can tell you stories for hours about the dot-com crash, and it was horrible. No, it was awful. It was apocalyptic. By the way, a lot of the dot-com crash was actually, at the time, a telecom crash. It was a bandwidth crash. The thing that actually crashed, that wiped out all the money, was the telecom companies.

swyx: Global Crossing.

Marc: Global Crossing, yeah.

swyx: I'm from Singapore, and they laid so much cable over our oceans.

Marc: Actually, there was a scaling law in the dot-com era.
It was literally the US Commerce Department: it put out a report in 1996 saying internet traffic was doubling every quarter. And actually, in 1995 and 1996, internet traffic really did double every quarter. And so that became the scaling law. And so what all these telecom entrepreneurs did was go out and raise money to build fiber, anticipating that demand for bandwidth was gonna keep doubling every quarter. Doubling every quarter, though, is like the grains of rice on the chessboard: at some point the numbers become extremely large. Right? And what really happened was that the internet, by the way, kept growing continuously, basically since inception. It's never shrunk. And it's grown really fast compared to anything else in human history. But it wasn't doubling every quarter as of 1998, 1999. And so there was a gap between what they thought was a scaling law and reality. And that's actually what caused the dot-com crash: companies like Global Crossing way overbuilt fiber, and telecom equipment, so all the networking gear, and then, by the way, the actual physical data centers. That was the beginning of the data center build, and then of the data center overbuild. And so you had that, and I think something like $2 trillion got wiped out, right? It was a big deal. And the other subtlety in it was that the internet companies themselves never really had any debt, 'cause tech companies generally don't run on debt, but telecom companies run on debt. Physical infrastructure companies run on debt.
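The gap Marc describes is compounding arithmetic. A minimal Python sketch, assuming a perfectly clean exponential (real traffic and chip curves are much noisier), compares doubling every quarter with the 18-month doubling he quoted for Moore's Law; `growth_factor` is a hypothetical helper, not anything from the conversation.

```python
def growth_factor(years: float, doubling_period_months: float) -> float:
    """Total multiplier after `years` if a quantity doubles every `doubling_period_months`.

    Assumes a perfectly clean exponential, which real-world curves never are.
    """
    doublings = years * 12 / doubling_period_months
    return 2 ** doublings


# Internet traffic "doubling every quarter" (a 3-month doubling period)
for years in (1, 2, 4):
    print(f"{years} yr at quarterly doubling: {growth_factor(years, 3):,.0f}x")

# The same four years at the 18-month doubling quoted for Moore's Law
print(f"4 yr at 18-month doubling: {growth_factor(4, 18):.1f}x")
```

Four years of quarterly doubling implies 65,536 times the starting traffic, against roughly 6.3x on the 18-month curve, which is why capacity sized to the first curve could take the 15 years Marc mentions to fill once growth slowed.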
And so companies like Global Crossing didn't just raise a lot of equity, they also raised a lot of debt. So they were highly levered. And so then it's simple: you have a highly levered company that's overbuilding capacity, demand is growing but not as fast as you hoped, and then boom, bankrupt. Right? And then it's like what they say about the hotel industry: it's always the third owner of a hotel that makes money. It has to go bankrupt twice, right? You have to wash out all of the over-optimistic exuberance before it gets to a stable state, and then it makes money. So by the way, all of those data centers and all of that fiber, it's all in use today. Yeah. But 25 years later. And actually the elapsed time was 15 years: it took from 2000 to 2015 to fill up all that capacity. The cautionary warning is that the overbuild can happen. You get into this thing where everybody who has any sort of institutional capital is like, wow, I don't know how to invest in these crazy software things, but for sure I can build data centers, and for sure I can buy GPUs that I can deploy in compute grids, and all these things. And so if you're a pessimist, you could look at this and say, wow, this is really set up to replicate what we went through in 2000. Obviously that would be bad. The counterargument, which is the one I agree with, is a couple things. One is that the companies that are investing the money are the bluest of blue-chip companies. Back in the dot-com era, Global Crossing was an entrepreneur's company.
It was a new venture. But the money that's being deployed now at scale is Microsoft and Amazon and Google and Facebook and Nvidia, and now, by the way, OpenAI and Anthropic, which are at really serious size as companies, with very serious revenue. These are very large-scale companies with lots of cash and lots of debt capacity that they've never used. And so this is institutional in a way that it really wasn't at the time. And then the other is that, at least for now, every dollar that's being put into anything that results in a running GPU is being turned into revenue right away. And you guys know this: everybody's starved for compute capacity, and for all the associated things, memory and interconnect and everything else, data center space. And so every dollar that's being put into the ground right now is turning into revenue. And in fact, I think there's an interesting thing happening, which is that because everybody's starved for capacity, the models that we can actually use today are inferior versions of what we would have if not for the supply constraints. That's true. Posit a hypothetical universe in which GPUs were 10 times cheaper and 10 times more plentiful: the models would be much better, 'cause you would just allocate a lot more money to training and you'd build better models. And so we're actually getting the sandbagged version of the technology.

swyx: Yeah. No. Everything we use is quantized, because the labs have to keep the full versions...

Marc: Right? We're not even getting the good stuff.

swyx: Yeah.

Marc: But think about getting the good stuff. Even if technical progress stops,
once there's a much bigger build-out of GPU manufacturing capacity and memory, all the things that have to happen in the course of the next five or 10 years, even the current technology is gonna get much better. And then, as you know, there are just a million ways to use this stuff. There are a million use cases for this. This isn't just sending packets across a wire and hoping that people find something to do with them. This is, oh, we apply intelligence to every domain of human activity, and it works incredibly well. Yeah. Here's what I know. Somewhere between three and four years out, basically everything is selling out. The entire supply chain is sold out or selling out. And so we're gonna have chronic supply shortages for years to come. There's going to be a response from the market (it's happening now): an enormous flood of investment in new fab capacity and everything else needed to do that. At some point the supply chain constraints will unlock, at least to some degree, and that will be another accelerant to industry growth when it happens, 'cause the products will get better and everything will get cheaper. And so I know that's gonna happen. I know that the actual use cases are really compelling. And then, like I said, with reasoning and agents and so forth, I know they're just gonna get much, much better from here. And so I know the capabilities are real and serious. I also know that the technical progress is not going to stop. It is accelerating.
The breakthroughs are tremendous. I mean, even just month over month, the breakthroughs are really dramatic. And so, you know, if you were a cynic, and there are cynics, you can look at 2000 and find echoes. But I can't even imagine betting that this is gonna somehow disappoint, at least for years to come. I think it would be essentially suicidal to make that bet. Yeah. Take Michael Burry. That's an interesting guy, huh? We'll pick on one guy. Let's pick on one guy, 'cause he came out with...

swyx: He doesn't mind.

Marc: It was the Nvidia short, right? He came out with the Nvidia short. And then, you guys have probably talked about this, there's the analysis now that the models are getting better at such a rate that if you're running a three-year-old Nvidia inference chip today, you're making more money on it today than you did three years ago, because the pace of improvement of the software is faster than the depreciation cycle of the chip. And then my understanding is that Google is running... I don't know exactly; these are rumors that I've heard, or maybe it's public, but I think Google's running very old TPUs for inference, very profitably. And so it actually turns out, as far as I can tell, that it's the opposite of the Burry thesis. He was 180 degrees wrong. The old Nvidia chips are getting more valuable, which is something that has literally never happened before. It's never been the case that an older model chip becomes more valuable, not less valuable. And again, that's an expression of the ferocious pace of software progress, the ferocious pace of capability payoff.
That's the payoff you're getting on the other side of this. And so the idea of betting against that...

swyx: Yeah. Yeah.

Marc: It seems like an invitation to get your face ripped off.

swyx: One of my early hits was modeling the lifespan of the H100s and H200s. Usually they advise four to seven years, and, you know, maybe realistically you haircut it down to two to three. But actually it's going up, not down. And I think that's the dream. We are finding utilization, and I think utilization solves all problems. You can find use cases for even the poorer hardware. Even memory: we're having a shortage, right? And even for the shittier versions of memory that we do have, we're finding use cases. So that's great.

Marc: Yeah.

Alessio: How important are open source AI and edge inference in a world in which you have three years of supply crunch? If you fast-forward five years, how do you think about inference in the data center versus at the edge?

Marc: Well, just to start: I think open source is very important for a bunch of reasons. I think edge inference is very important for a bunch of reasons. Just practically speaking, we're gonna have fundamental supply crunches for the next... I mean, you guys know: if you just project forward demand over the next three years relative to supply, one of the main predictions you can make is about what's gonna happen to the cost of inference in the core over the next three years. And it may rise dramatically, right? And the big model companies are subsidizing heavily right now. Right?
And so what will the average person's per-day, per-month token cost be, three years from now, to do all the things that they want to do? I don't know. I mean, you guys probably have friends; I have friends today who are paying a thousand dollars a day for Claude tokens to run OpenClaw. Right? And so, okay, that's $30,000 a month. Right? And by the way, those friends have a thousand more ideas of things that they want their claw to do, right? Yeah. And so you could imagine there's latent demand of up to, I don't know, five or $10,000 a day of tokens for a fully deployed personal agent. And obviously consumers can't pay that, right? But it gives you a sense of the future scope of demand. And so even if there's a 10x improvement in price performance, that still only gets it down to a hundred dollars a day, which is still way beyond what people can pay. So there's just gonna be ferocious demand. By the way, the other interesting thing is the agent thing. Up until now, a lot of the constraints have been GPU constraints. I think the agent thing now also translates into CPU constraints. Right?

swyx: CPU, memory.

Marc: Yes. CPU, memory, right? And so the entire chip ecosystem is just gonna get...

swyx: Wait for network constraints; that will be the killer.

Marc: It's all bottlenecked, potentially for years. And so I think, generally, inference costs are gonna keep coming down, but let's put it this way: the rate of decline may level out here for a bit because of these supply constraints. And then at some point maybe the labs stop subsidizing so much, and that, again, will be an issue.
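The token-cost figures are easy to check. A back-of-the-envelope Python sketch, assuming a 30-day month and that a price-performance gain cuts spend linearly for the same token volume; the dollar inputs are the ones quoted above, while the function names are illustrative only.

```python
DAYS_PER_MONTH = 30  # assumption: a month rounded to 30 days


def monthly_cost(daily_usd: float) -> float:
    """Monthly spend for a constant daily token bill."""
    return daily_usd * DAYS_PER_MONTH


def after_price_improvement(daily_usd: float, factor: float) -> float:
    """Daily spend for the same token volume after a `factor`x price-performance gain."""
    return daily_usd / factor


current = 1_000            # $/day quoted for a heavy OpenClaw user today
latent = (5_000, 10_000)   # $/day latent-demand range quoted for a fully deployed agent

print(monthly_cost(current))                              # 30000
print(after_price_improvement(current, 10))               # 100.0
print([after_price_improvement(d, 10) for d in latent])   # [500.0, 1000.0]
```

Applying the same 10x gain to the latent-demand range still leaves $500 to $1,000 a day, consistent with the point that demand will outrun what consumers can pay.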
And so there's just gonna be so much more demand for inference than can be satisfied with the centralized model. And then, you guys know this, there are the dramatic innovations that have happened in Apple silicon to be able to do inference. The level of effort being put in is quite amazing. The open source guys are putting incredible effort into this. There's this recurring pattern where the big model will never run on a PC, and then six months later, oh, it runs on a PC, right? It's amazing. And there are very smart people working on that. So there's all that. And then, look, there are also other motivators. One is: how much trust are the big centralized model providers building in the market, versus, at least in certain cases, for some people and for certain use cases, people being like, well, I'm not willing to just turn everything over? So there are all the trust issues. By the way, there's also just straight-up price optimization. There are many uses of AI where you don't need Einstein in the cloud. You just need a smart local model. There are also performance issues. You're gonna want your doorknob to have an AI model in it, right, to be able to do access control. Obviously everything with a chip is gonna have an AI model in it, and a lot of those are gonna be local. And then, by the way, there are also wearable devices. You don't wanna do a complete round trip. Whatever your smart devices are, you want them to be super low latency.
Yeah.

swyx: The question is, do we care who makes it? One of the biggest news items this week was the collapse of Ai2, the Allen Institute, one of the actual American open source model labs. And I'm not that optimistic on American open source. You guys invested in Mistral, and Mistral's doing extremely well. Outside of China, that's about it.

Marc: Yeah. We'll see. We'll see. Look, number one, I do think we care. I do think we care who makes it. I would say this: the previous presidential administration wanted to kill it in the US.

swyx: Oh yeah.

Marc: They wanted to drown it in the bathtub. So at least we have a government now that actually wants it to happen.

swyx: And you're on the council.

Marc: Yeah. And the new PCAST. Yeah. So this administration, for whatever other political issues people have with it, which are many, has, I think, a very enlightened view on AI, and in particular on open source AI. And so they're very supportive. My read is that the various Chinese companies have a very specific reason to do open source, which is that they don't think they can fundamentally sell commercial AI outside of China right now, or at least specifically not in the US, for a combination of reasons. And so I think they view open source AI as a bit of a loss leader for domestic paid services and ancillary products. They're very excited about it, by the way. I think it's great that they're doing it. I think DeepSeek was a gift to the world. The great thing about open source is that its impact is felt two ways.
One is you get the software for free, but the other is you get to learn how it works, right? The paper and the code, right? And so, for example, I thought this was amazing. OpenAI comes out with o1, and it's an amazing technical breakthrough, absolutely fantastic. But of course they don't explain how it works in detail, and of course they hide the reasoning traces, right? And then everybody's like, okay, this is great, but who's gonna be able to replicate this? Are other people gonna be able to do this? Is there secret sauce in there? And then R1 comes out, and it's just like, there's the code and there's the paper, and now the whole world knows how to do it. And then, three months later, every other AI model is adding reasoning. And so you get this double effect: even if the Chinese models themselves are not the models that get used, the education that's taken place for the rest of the world, the information diffusion, is incredibly powerful. So that happens, and then, I don't know, we'll see. There are a bunch of American open source AI model companies. I mean, look, there already is tremendous competition among the primary model companies. Depending on how you count, there are like four or five big model companies now that are kind of neck and neck in different ways.
And then obviously both xAI and Meta have huge attempts underway to kind of leapfrog. And then you've got a whole fleet of startups, new companies, including a whole bunch that we're backing, that are trying to come out with different approaches. And then you've got, whatever it is, I don't know how many mainline foundation model companies there are in China at this point. It's probably six.

swyx: It's the "five tigers," is what they call it. Yeah. Qwen is questionable because there's a change in leadership.

Marc: Right?

swyx: Yeah.

Marc: But does that include Moonshot?

swyx: Yes. Kimi, DeepSeek, Z.ai, um, Qwen; 01 is in there.

Marc: Right. And then ByteDance, and then you see...

swyx: ByteDance would be like the next tier. They weren't as prominent; they didn't have a leading...

Marc: Yeah. But, you know, ByteDance is very impressive, and presumably they have more stuff coming, and Tencent probably has more stuff coming, and so forth. And so, look, here's a thing you can anticipate: between the US and China right now, there are like a dozen primary foundation model companies that are at scale, at some level of critical mass. It's not gonna be a dozen in three years, right? These industries don't bear a dozen. There are gonna be three or four big winners, or maybe one or two big winners. And so there's gonna be a whole bunch of those guys that are gonna have to figure out alternate strategies. And I think open source is one of those strategies. And so I think the question of who's gonna do open source...
I think that could change really fast. I think that's a very dynamic thing. I think it's very hard to predict what happens, and I think it's very important.

swyx: Nvidia's doing a lot.

Marc: Well, I was gonna say. Exactly. And then you've got Nvidia. And there's an old idea in business strategy called "commoditize the complement." That's right. And so if you're Jensen, it's just kind of obvious: of course you wanna commoditize the software. And to his enormous credit, he's putting enormous resources behind that. And so maybe it's literally Nvidia, and I think that would be great.

Alessio: Yeah. Uh, narrative violation: two European projects, uh, in the... damn.

swyx: I'm hosting my Europe conference soon, and I got both of them.

Alessio: They got us. They got us, Marc.

Marc: They got us. Well, wait a minute. Where was Peter Steinberger when he did it? In Austria?

Alessio: Yeah, yeah, yeah.

Marc: He was in what? He was in Vienna. Oh, he was in Vienna. And then where is he now?

swyx: He's moving to SF.

Marc: Okay. Alright, there we go. And then, yeah, the Pi guy, right? The Pi guys are European.

swyx: Yeah, they're buddies, in...

Alessio: Austria. Mario's also there. Yeah.

Marc: Right. And they haven't announced any sort of change yet, or have they?

Alessio: No, they have a company there.

Marc: Okay. Good.

Alessio: Good, good, good.

Marc: Yeah, good.

swyx: Anyways, I think Pi and OpenClaw are very important software things, and I just wanted you to go off on what you think.

Marc: Yeah. So I think the combination of the two of them is one of the 10 most important pieces of software...

swyx: OpenClaw got all the attention, but, right, talk about Pi.

Marc: Pi's kind of the... Yeah.
Pi's kind of the architectural breakthrough. For those of us who are older: there was this whole thing that was very important in the world of software from like 1973 to basically the creation of Linux (and I don't know, it still is very important), which is this thing we used to call the Unix mindset. 'Cause there were all these different theories, all these different operating systems and mainframes, and then Windows and Mac and all these things. But kind of behind it all was this idea of the Unix mindset. In the old days, the operating system that really made the computer industry work in the 1960s was this thing called OS/360, a big operating system that IBM developed that was supposed to basically run everything. And it was this giant monolithic architecture, like a giant castle of software in the sky. And by the way, it worked really well, and they were very successful with it. But it was almost unapproachable: you had to be kind of inside IBM or very close to IBM, and you had to really understand every aspect of how the system worked. And then the Unix guys, originally out of AT&T and then out of Berkeley, came out and said, no, let's have a completely different architecture. And the way the architecture's gonna work is we're gonna have a prompt and a shell, and all the functionality is gonna be in the form of these discrete modules, and then you're gonna be able to chain the modules together. Yeah.
And so it's almost like the operating system itself is gonna be a programming language. And then that led to the centrality of the shell, and then that led to chaining together Unix tools, and then that led to the emergence of scripting languages like Perl, where you could very easily do this, and then the shells got more sophisticated. And look, number one, that worked, and that was the world I grew up in. I was a Unix guy from, call it, 1988 all the way through my work, and it worked really well. It's in the background; normal people didn't necessarily need to know about it, but if you were doing system architecture or application development, you knew all about it. And it's been in the background ever since. Look, your Mac still has a Unix shell in there, and your iPhone still has a Unix shell buried in there somewhere. And the Windows shell is sort of a weird derivative of that. But look, the internet runs on Unix, and smartphones, both iOS and Android, are actually Unix derivatives. And so Unix did end up winning. But anyway, then we just started taking it for granted. And so the way I think about what happened with Pi and then with OpenClaw is what those guys figured out... I always say the great breakthroughs are obvious in retrospect, right? Those are the best kind. They weren't obvious at the time, or somebody else would've done them already.
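The "discrete modules chained together" idea can be mimicked in a few lines of Python. This is an analogy only, not how Unix implements pipes (real pipes connect separate processes through file descriptors): each function is a single-purpose filter over a stream of lines, and a hypothetical `pipe` helper feeds one stage's output into the next, roughly like `grep GET access.log | sort | uniq -c`. The stage names mirror Unix commands but are simplified stand-ins.

```python
from functools import partial
from itertools import groupby
from typing import Callable, Iterable

Stage = Callable[[Iterable[str]], Iterable]


def grep(pattern: str, lines: Iterable[str]) -> Iterable[str]:
    """Keep lines containing `pattern` (a fixed substring, unlike grep's regexes)."""
    return (line for line in lines if pattern in line)


def sort(lines: Iterable[str]) -> list[str]:
    """Sort lines, like `sort`."""
    return sorted(lines)


def uniq_c(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Count runs of adjacent identical lines, like `uniq -c`."""
    return [(len(list(group)), line) for line, group in groupby(lines)]


def pipe(data: Iterable[str], *stages: Stage):
    """Feed each stage's output into the next, like `a | b | c`."""
    for stage in stages:
        data = stage(data)
    return data


log = ["GET /a", "POST /b", "GET /a", "GET /c"]
# Roughly: grep GET access.log | sort | uniq -c
print(pipe(log, partial(grep, "GET"), sort, uniq_c))  # [(2, 'GET /a'), (1, 'GET /c')]
```

No stage knows about the others; new behavior comes from recombining small tools rather than extending one monolith, which is the contrast Marc draws with OS/360.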
And so there is a real conceptual leap, but then you look at it backwards and you're just like, oh, of course. To me those are always the best breakthroughs. Well, actually, language models themselves are like that. It's just like, oh, next-token completion. Oh, of course.

swyx: Yeah. What other objective mattered?

Marc: Yeah, exactly. But even so, it wasn't obvious until somebody actually did it, right? And so the conceptual breakthrough is real and deep and powerful and very important. And so the way I think about Pi and OpenClaw is that they're basically marrying the language model mindset to the Unix shell-prompt mindset. So what is an agent, right? As you know, many smart people have been trying to figure out what an agent is for decades, and they've had many architectures for building agents and the whole thing. And it turns out what we now know is that an agent is the following. It's a language model, and above that it's a bash shell, a Unix shell, and the agent has access to the shell, hopefully in a sandbox, maybe in a sandbox. So it's the model, it's the shell, and then it's a file system, and the state is stored in files. And then there's the markdown format for the files themselves. And then there's basically what in Unix is called a cron job: there's a loop, and there's a heartbeat, and the thing basically wakes up. So it's basically LLM plus shell, plus file system, plus markdown, plus cron. And it turns out that's an agent.
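That recipe (model, shell, file system, markdown, cron-style heartbeat) maps onto a very small loop. The sketch below is a toy under stated assumptions, not Pi's or OpenClaw's actual code: `fake_model` is a hypothetical stand-in for an LLM call, state lives in a `MEMORY.md` file (the file name is an assumption), commands run through `subprocess` with `shell=True` inside the agent's directory, which is not a sandbox, and `run_agent` stands in for cron by calling `heartbeat` a fixed number of times.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical model interface: the agent's memory (markdown) in, next shell command out.
Model = Callable[[str], str]


def fake_model(memory: str) -> str:
    """Stand-in for an LLM: counts completed steps in memory and echoes the next one."""
    step = memory.count("- ran:") + 1
    return f"echo step {step}"


def heartbeat(workdir: Path, model: Model) -> str:
    """One wake-up: read state from files, ask the model, run its command, write state back."""
    memory_file = workdir / "MEMORY.md"
    memory = memory_file.read_text() if memory_file.exists() else "# Memory\n"
    command = model(memory)
    # Runs in a real shell, confined only to the working directory (NOT a sandbox).
    result = subprocess.run(command, shell=True, cwd=workdir,
                            capture_output=True, text=True, timeout=30)
    memory += f"- ran: `{command}` -> {result.stdout.strip()}\n"
    memory_file.write_text(memory)
    return result.stdout.strip()


def run_agent(workdir: Path, model: Model, beats: int) -> None:
    """The loop cron would drive; here just `beats` wake-ups back to back."""
    for _ in range(beats):
        heartbeat(workdir, model)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        run_agent(Path(d), fake_model, beats=3)
        print((Path(d) / "MEMORY.md").read_text())
```

Everything the agent "knows" between wake-ups is in the markdown file; in a real deployment a scheduler such as cron would trigger one `heartbeat` per wake-up, and the command would run inside a proper sandbox.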
And every part of that other than the model is something that we already completely know and understand. In fact, it turns out that the latent power of the Unix shell is extraordinary. There are enormous numbers of Unix commands, and there are enormous numbers of command-line interfaces into all kinds of things already. Just to start with, your computer runs on a shell. If you're running a Mac or a phone, your computer is already running on a shell.
And so the full power of your computer is available at the command-line level. And then it turns out it's really easy to expose other functions as a command-line interface. So this whole idea that we need MCP and these fancy protocols, whatever, it's like, no, we don't. We just need a command-line thing.
So that's the architecture. And then what is your agent? Your agent is a bunch of files stored in a file system. And then there's the thing that just completely blew my mind when I wrapped my head around it, which is: this means your agent is now actually independent of the model that it's running on. You can swap out a different LLM underneath your agent, and your agent will change personality somewhat, because the model is different, but all of the state stored in the files will be retained.
swyx: Yeah. Different instruction set, but you just recompile it.
Marc: Right, exactly. It's like swapping out a chip and recompiling, but it's still your agent, with all of its memories and all of its capabilities.
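To make the "just expose it as a command-line thing" point concrete, here is a minimal sketch using Python's standard `argparse`: a hypothetical `sleep-score` capability wrapped as a CLI that prints one line of JSON, which any shell-driven agent could call and pipe onward. The command name, flags, and the 7-hour threshold are invented for illustration; this is not a claim about how MCP or any particular tool works.

```python
import argparse
import json
import sys
from typing import List, Optional


def sleep_score(hours: float, rem_hours: float) -> dict:
    """Hypothetical capability an agent might want: summarize one night of sleep."""
    return {"hours": hours, "rem_hours": rem_hours, "enough": hours >= 7.0}


def run(argv: Optional[List[str]] = None) -> str:
    """Parse command-line flags and return the result as a single line of JSON."""
    parser = argparse.ArgumentParser(prog="sleep-score", description="Score a night of sleep.")
    parser.add_argument("--hours", type=float, required=True, help="total hours slept")
    parser.add_argument("--rem-hours", type=float, default=0.0, help="hours of REM sleep")
    args = parser.parse_args(argv)
    # Plain JSON on stdout means other Unix tools (or an agent) can consume it directly.
    return json.dumps(sleep_score(args.hours, args.rem_hours))


if __name__ == "__main__":
    # With no flags, run an example invocation instead of exiting with a usage error.
    print(run(sys.argv[1:] or ["--hours", "6.5", "--rem-hours", "1.2"]))
```

Run as `python sleep_score.py --hours 6.5 --rem-hours 1.2`, it prints `{"hours": 6.5, "rem_hours": 1.2, "enough": false}`. An agent with shell access can discover the flags through the `--help` text that `argparse` generates automatically.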
And then, by the way, you can also swap out the shell, so you can move it to a different execution environment that is also a bash shell. You can also switch out the file system, and you can swap out the heartbeat, the cron framework, the loop, the agent framework itself. And so at the end of the day, your agent is just its files. And then there's, of course, a...
swyx: ...an API call.
Marc: Yeah, but it's basically just the files.
And as a consequence of that, a couple of important things turn out to be true about the agent itself. One is that it can migrate itself. You can instruct your agent: migrate yourself to a different runtime environment, migrate yourself to a different file system, swap out the language model. Your agent will do all that stuff for you. And then there's the final thing, which is just amazing: the agent actually has full introspection. It knows about its own files, and it can rewrite its own files. There's basically no widely deployed software system in history where the thing that you're using has full introspective knowledge of how it itself works and is able to modify itself.
There have been toy systems that had that, but there's never been a widely deployed system with that capability. And that leads you to the capability that just completely blew my mind when I wrapped my head around it, which is you can tell the agent to add new functions and features to itself, and it can do that.
Extend yourself. Right? Extend yourself. Give yourself a new capability.
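The "it's just the files" argument (migration, introspection, self-extension) can be illustrated with ordinary file operations. In this hypothetical layout, each skill is a markdown file under a `skills/` directory: listing that directory stands in for the agent inspecting itself, writing a new file stands in for the agent extending itself, and copying the whole directory stands in for migration. The directory layout and every function name are assumptions for illustration, not OpenClaw's actual structure.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def list_skills(agent_home: Path) -> List[str]:
    """Introspection: the agent reads its own skill files."""
    skills = agent_home / "skills"
    if not skills.is_dir():
        return []
    return sorted(p.stem for p in skills.glob("*.md"))


def add_skill(agent_home: Path, name: str, instructions: str) -> Path:
    """Self-extension: the agent writes a new markdown skill file for itself."""
    skills = agent_home / "skills"
    skills.mkdir(parents=True, exist_ok=True)
    path = skills / f"{name}.md"
    path.write_text(f"# {name}\n\n{instructions}\n")
    return path


def migrate(agent_home: Path, new_home: Path) -> Path:
    """Migration: because the agent is just its files, moving it means copying a directory."""
    shutil.copytree(agent_home, new_home)
    return new_home


def demo(tmp: Path) -> List[str]:
    """Add a skill, migrate to a new 'runtime', and confirm the skill came along."""
    home = tmp / "agent"
    add_skill(home, "sleep_advice", "Read last night's sleep data and suggest a bedtime.")
    moved = migrate(home, tmp / "new-runtime" / "agent")
    return list_skills(moved)


if __name__ == "__main__":
    print(demo(Path(tempfile.mkdtemp())))  # ['sleep_advice']
```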
And so literally, you run into somebody at a party and they're like, oh, I have my OpenClaw connected to my Eight Sleep bed, and it gives me better sleep advice. And you go home at night and you tell your claw, or while you're still at the party, by the way, you tell your claw: add this capability to yourself. And your claw will say, okay, no problem. It'll go out on the internet and figure out whatever it needs, and then it'll go out to Claude Code or whatever and write whatever it needs. And the next thing you know, it has this new capability. So you can have it upgrade itself without having to do anything other than tell it that you want it to. And so the combination of all this is just massive. It's just incredible.
If I were 18, this is what I would be spending all of my time on. This is such an incredible conceptual breakthrough. And again, people are gonna look at it, and they already get this response, and they're gonna say, oh, well, where's the breakthrough? All of these components were already known before. But this is the key to the breakthrough: by using all these components that were known before, you get all of the underlying capability that's buried in there. So, for example, computer use all of a sudden just falls out trivially.
Of course it's gonna be able to use your computer. It has full access to the shell. And then you give it access to a browser, and you've got the computer and the browser, and off it goes, and you've got all the abilities of the browser too.
And so the capability unlock here is profound.
My friends who are deepest into this are having their claw do literally a thousand things in their lives. They have new ideas every day. They're just constantly throwing new challenges at the thing. And by the way, it's early, these are prototypes, and as you guys know, there are security issues. So there's a bunch of stuff to be ironed out, but the unlock of capability is just incredible.
swyx: Yeah.
Marc: And I have absolutely no doubt that everybody in the world is gonna have at least one agent like this, if not an entire family of agents. I think it's almost inevitable now that this is the way people are gonna use computers.
swyx: I was gonna say, for someone who is deeply familiar with social networks, the next step is your claw talking to my claw.
Marc: Posting...
swyx: ...on Claw Facebook, posting their jobs on Claw LinkedIn, and claws posting their tweets on Claw X, or whatever. I do think that's how we get into some danger, in terms of alignment and whether or not we want these things to run.
Marc: Do you guys know rentahuman.com?
swyx: Yeah.
Marc: Yeah.
swyx: I mean, it's Fiverr, it's TaskRabbit.
Marc: Sure, of course.
swyx: Mechanical...
Alessio: ...Turk.
Marc: Yeah. But flipped, right? The agent hiring the people.
Alessio: Yeah.
Marc: Which of course is gonna happen, right? It's obviously gonna happen.
Alessio: I'm curious if you have any thoughts on the engineering side. When you built the browser, the internet was mostly plain text files plus some images, and today every website and app is so complex. Somehow the browser kept evolving to fit that in.
Are there any design choices that were made early in the browser, the internet, and the protocols that you're seeing agents run into in a similar way? Like, hey, this thing is just not gonna work for this new type of compute, and we should just rip it out right now.
Marc: There were a whole bunch, but I'll give you a couple. To be clear, this was totally different: we didn't have the capabilities we have today, because we didn't have language models underneath. But we did have this idea that human readability actually mattered a great deal.
Specifically, in those days it was not so much English language; there was a design decision to be made between binary protocols and text protocols. And basically every old-school systems architect who had grown up between the 1960s and the 1990s said, you know, what do you know about the internet? It's starved for bandwidth. You have these very narrow straws. When we did the work on Mosaic, people who had the internet at home had a 14 kilobit modem, right? So you're trying to hyper-optimize every bit of data that travels over the network.
And so obviously, if you're gonna design a protocol like HTTP, you're gonna want it to be a highly compressed binary protocol for maximum efficiency. And you're gonna want it to be a single connection that persists. The last thing you're gonna wanna do is bring up and tear down new connections, and you definitely are not gonna want a text protocol. And so of course we said no. We actually want to go completely the other direction. We only want text protocols. By the way, same thing with HTML itself.
We want HTML to be relatively verbose. We want the tags to actually be human readable. We wanna use...
swyx: ...the most inefficient things possible.
Marc: Yeah, we wanna do the inefficient things.
swyx: You're the original token maxxer.
Marc: Yeah, exactly. Basically it's just the Bitter Lesson...
Alessio: ...pilled.
Marc: Well, actually, this was the conscious thing, which basically says: assume a future of infinite bandwidth and build for that, right? It was a bet that if the latent capabilities of the system were powerful enough, and that was obvious enough to people, that would create the demand for bandwidth that would cause the supply of bandwidth to get built that would actually make the whole thing work.
And specifically, we wanted everything to be human readable because, at the engineering level, we wanted people to be able to read the protocol coming over the wire and understand it with their bare eyes, without having to disassemble it or have it converted out of binary. And so HTTP and everything else were always text protocols, and the same with HTML. In many ways, some people say the key breakthrough in the browser was the View Source option: every webpage you went to, you could view source, which means you could see how it worked, which means you could teach yourself how to build new webpages.
So that was human readability. And again, human readability in those days still meant technical specs.
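As a small illustration of what "readable with bare eyes" means for a text protocol: an HTTP/1.1 request is just lines of ASCII text (request line, headers, blank line), and a response head can be parsed with plain string splitting. (HTTP/2 later moved to binary framing, so this applies to the HTTP/1.x text format.) This sketch builds a request and parses a made-up sample response; it deliberately skips real-world details such as chunked bodies, repeated header names, and malformed input.

```python
from typing import Dict, Tuple

CRLF = "\r\n"


def build_request(host: str, path: str = "/") -> str:
    """An HTTP/1.1 request is plain text: request line, headers, then an empty line."""
    return f"GET {path} HTTP/1.1{CRLF}Host: {host}{CRLF}Connection: close{CRLF}{CRLF}"


def parse_response_head(raw: str) -> Tuple[int, str, Dict[str, str]]:
    """Split a raw response into status code, reason phrase, and lower-cased headers."""
    head, _, _body = raw.partition(CRLF + CRLF)
    status_line, *header_lines = head.split(CRLF)
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


# A made-up response, readable exactly as it would travel over the wire.
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK" + CRLF
    + "Content-Type: text/html" + CRLF
    + CRLF
    + "<html><body>Hello</body></html>"
)

if __name__ == "__main__":
    print(build_request("example.com"))
    print(parse_response_head(SAMPLE_RESPONSE))
```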
Now it means English language, but there's an incredible latent power in giving everybody who uses the system the option to drop down and actually understand and see how it's working. That worked really well for the web, and I think it's working really well for AI. That was one.
What was the other? A big part of the idea of web servers was to surface the underlying latent capability of the operating system, and also the underlying latent capability of the database. Because what is a web server, fundamentally, architecturally? It's running on top of an OS, so it's the OS's ability to manage the file system, process everything, and do everything else you wanna do. And of course a lot of early websites were front ends to databases.
So you wanted to unleash the underlying latent power of the database, whether it was Oracle or Postgres or whatever it was. A lot of the function of the web server was to bridge from that incoming internet connection to the underlying power of the OS and the database.
And again, people looked at it at the time and said, well, does this really matter? Is this important? We've had databases forever, and we've always had user interfaces for databases; this is just another user interface for a database. And it's like, okay, yeah, fair enough. But the other side of that is: this is now a much better interface to databases, one that 8 billion people are going to use, that's going to be far easier to use and far more flexible. And you're not just gonna have the old databases.
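The "web server as a bridge to the database" idea can be sketched with Python's standard library: `sqlite3` stands in for the Oracle or Postgres database mentioned above, a hypothetical `handle_path` function maps URL paths to parameterized SQL queries, and `http.server` exposes it over HTTP. The table, its rows, the routes, and the port are all invented for illustration, and the server only starts when the script is run with `--serve`.

```python
import json
import sqlite3
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def make_db() -> sqlite3.Connection:
    """In-memory SQLite database standing in for a production database."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    conn.executemany("INSERT INTO books (title) VALUES (?)", [("Neuromancer",), ("Snow Crash",)])
    conn.commit()
    return conn


DB = make_db()


def handle_path(conn: sqlite3.Connection, path: str) -> Tuple[int, str]:
    """Translate a URL path into a parameterized SQL query and a JSON body."""
    if path == "/books":
        rows = conn.execute("SELECT id, title FROM books ORDER BY id").fetchall()
        return 200, json.dumps([{"id": i, "title": t} for i, t in rows])
    if path.startswith("/books/"):
        ident = path[len("/books/"):]
        if ident.isdigit():
            row = conn.execute("SELECT id, title FROM books WHERE id = ?", (int(ident),)).fetchone()
            if row is not None:
                return 200, json.dumps({"id": row[0], "title": row[1]})
    return 404, json.dumps({"error": "not found"})


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = handle_path(DB, self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__" and "--serve" in sys.argv:
    # Try: curl http://127.0.0.1:8000/books
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

Keeping the path-to-query mapping in a plain function separates the "bridge" logic from the HTTP plumbing, so it can be exercised without opening a socket.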
Now you have a system where people can actually understand why they'd want to build a million times more database apps than they had in the past. And then the number of databases in the world exploded. So again, this goes to building in layers. Some of the smartest people in the industry look at any new challenge and they're like, okay, I need to build a new kind of application, so the first thing I need to do is build a new programming language, right? And the next thing I need to do is build a new operating system. And the next thing I need to do is build a new chip. They kind of wanna reinvent everything. And I've always had, maybe it's just a pragmatic mentality, or maybe an engineering-over-science mentality, the view that you have all of this latent power in the existing systems. You don't want to be held back by their constraints, but what you wanna do is liberate that power and open it up.
And I think the web did that for those reasons, and I think the same thing is happening now.
swyx: It's a great perspective on the web.
Alessio: Programming languages are a good example. We had Bret Taylor on the podcast and we were talking about Rust. Rust is memory safe by default, so why are we teaching the model not to write memory-unsafe code? Just use Rust, and you get it for free. How much time do you think should be spent recreating some of these things instead of taking them for granted? Like, oh, okay, Python is kind of slow...
swyx: Python, TypeScript...
Alessio: You know?
swyx: As imperfect as they are, they are the lingua franca.
Marc: I mean, I think this is gonna change a lot, because I don't think the models care what language they program in.
And I think they're gonna be good at programming in every language, and good at translating from any language to any other language.
So this gets into the coding side of things. I think we're going through a really fundamental change. Look, I grew up hand coding. Everything I did was written in C. I wasn't even...
Alessio: Back in the day.
Marc: I wasn't even using C++, or Java, or any of this stuff. Everything I ever did, I was managing my own memory at the level of C. And I'm still from the generation that knew assembly language, so I could drop down and do things right on the chip. And so all of us have always lived in a world in which software is this precious thing that you have to think about very carefully.
It's really hard to generate good software, and there's only a small number of people who can do it. You have to be very judicious about allocation: what are your engineers working on, how many good engineers do you actually have, how much software can they write, and how much software can human beings maintain? And I think all those assumptions are being shot right out the window. I think those days are just over. And I think the new world is that high-quality software is just infinitely available.
And if you need new software to do X, Y, Z, you're just gonna wave your hand and you're gonna get it. And if you don't like the language it's written in, you just tell the thing, all right, now I want the Rust version.
Or, you know, the secure version. By the way, computer security is about to go through the most dramatic change ever, which is, number one, every single latent security bug is about to be exposed.
swyx: Right.
Marc: So we're set up for a computer security apocalypse for a while. But on the other side of it, we now have coding agents that can go in and actually fix all the security bugs. So how are you gonna secure software in the future? You're gonna tell the bot to secure it, and it's gonna go through and fix it all. So this incredibly scarce resource of high-quality software is just going to become a completely fungible thing that you can have as much of as you want. And that has tons and tons of consequences.
In some sense, I think the answer to the question you posed is just straightforward: if you want all your software in Rust, you just tell the bot you want all your software in Rust. Things that used to be hard, or even seemed like an insurmountable mountain to get through, all of a sudden become very easy.
swyx: I think Bret had a theory that there would be a more optimal language for LLMs. And so the contention is, there isn't; just don't bother. Whatever humans already use, LLMs are perfectly capable of porting.
Marc: I don't know if this would work today, but I think we're pretty close to being able to ask the AI what its optimal language would be and let it design it. Okay, here's a question. Are we even gonna have programming languages in the future? Or are the AIs just gonna be emitting binaries?
Let's assume for a moment that humans aren't coding anymore. Let's assume it's all bots. What levels of intermediate abstraction do the bots even need?
swyx: Yeah.
Marc: Or are they just coding binary directly? Did you see, somebody just did this thing where they have a language model that actually emits model weights for a new language model. And so will the bots just be...
Alessio: Predicting the weights.
Marc: Yeah. Will the bots literally be emitting not just binaries, but actually emitting weights for new models, directly? Conceptually, there's no reason why they can't do both of those things. Architecturally, both of those things seem completely possible.
swyx: It's very inefficient. You're basically...
Marc: Very inefficient.
swyx: ...running a simulation of a simulation in a simulation inside of the weights, correct?
Marc: Yeah, very inefficient. But look, LLMs are already incredibly inefficient. Ask Claude to add two plus two. It's billions and billions of times less efficient than using your pocket calculator.
swyx: Yeah.
Marc: But the payoff of the general capability is so great. And so I kind of think in 10 years, I'm not sure there will even be a salient concept of a programming language in the way that we understand it today. In fact, what we may be doing more and more is a form of interpretability, where we're trying to understand why the bots decided to structure code the way they have.
swyx: I mean, if you play it through, you don't need browsers. That's the death of the browser.
Marc: Well, I would take it a step further, which is you may not need user interfaces.
So who is gonna use software in the future?
swyx: Other bots.
Marc: Other bots. Yeah.
swyx: And so you still need to, I don't know, pipe information in...
Marc: Do we?
swyx: ...and out.
Marc: Really?
swyx: Well, what are you gonna do then?
Marc: Are you sure?
swyx: You're just gonna log off and touch grass?
Marc: Whatever you want. Exactly. Isn't that better?
swyx: I want software to do stuff for me.
Marc: But isn't that better? I mean, look, I don't know. You know all the arguments here. It was not that long ago that 99% of humanity was behind a plow.
swyx: Right.
Marc: And what are people gonna do if they're not plowing fields all day to grow food? It just turns out there are much better ways for people to spend time than plowing fields.
swyx: Doomscrolling.
Marc: Yeah, exactly. Or talking to their friends. And look, I'm not an absolutist and I'm not a utopian. To be clear, I have an 11-year-old, and he's learning how to code, and I think it's still a really good idea to learn how to code and so forth. But if you project forward, you have to think about a world in which it's just like, okay, I'm gonna tell the thing what I need, and it's gonna do it in whatever way is most optimal for it.
Unless I tell it to do it non-optimally. If I tell it to do it in Java or in Rust or whatever, it'll do it, I'm sure. But otherwise it's gonna do it in whatever way is optimal. And then if I need to understand how it works, I'm gonna ask it to explain to me how it works. So it's gonna be its own engine of interpretability, explaining itself.
And I'm just not convinced that in that world you keep these historical abstractions. The abstractions will be whatever the bots need to work with the human, right?
Alessio: Yeah. Well, I'm curious. If that's true, then shouldn't the model providers be building some internal language representation that they can do extreme RL and reward modeling around? Today they're kind of tied to TypeScript and Python because the users need to write in those languages, whereas they could have their own thing internally that they don't need to teach to anybody. They just need to teach their model. And I think that's how you maybe get divergence between the models. Going back to the Pi and OpenClaw thing: oh, I built all the software using the OpenAI model, and now I switch to another model, but the other model doesn't understand the thing. So it feels like there still needs to be some abstraction. But maybe not. Maybe that's the lock-in that the model providers want to have. I don't know.
Marc: I'm not even sure that's lock-in, though, because why can't the second model just learn what the first model has done?
swyx: Exactly.
Marc: Okay, I'll give you an example. As you know, models can now reverse engineer software binaries, right? There's this whole thing now where people are reverse engineering Nintendo Game Boy binaries. I've seen a bunch of reports like this, where somebody has a favorite game from the 1980s, the source code is long dead, but they have a binary burned onto a chip or something, and they reverse engineer it to get a version that runs on their Mac. So this is why I say, if you can reverse engineer x86 binaries, then why can't you reverse engineer...
Alessio: ...whatever it is. Yeah.
And because we're all on Unix-based systems, it has to be reversible, because it needs to run on the target.
Marc: Yeah, basically. And by the way, everything we're describing is something that human beings could in theory have done before; it was just always cost- and labor-prohibitive. Human beings can reverse engineer binaries; I learned how to reverse engineer. It's just that for any complex binary, you need like a thousand years to do it. But now with a model, you don't. So all of a sudden you get these things. Or another way to think about it is that so much of what humans have built exists to compensate for human limitations.
swyx: Mm-hmm.
Marc: Right? And if you don't have the human limitations anymore, then it's not that you won't have abstractions, but you'll have a different kind of abstraction.
swyx: I have two topics to bring us to a close, and you can pick whichever one. Just talking about protocols, was it you or someone else? I forget my internet history. Who said that the biggest mistake we didn't figure out in the early days was payments? Was that you?
Marc: Yes.
swyx: It was 402.
Marc: 402 Payment Required.
swyx: We have a chance now. Nope, I don't think we're gonna figure it out. I don't know. What's your take?
Marc: Oh, no, now I think it's gonna happen for sure, and there are two reasons I'm sure. One is that we actually have internet-native money now, in the form of crypto: stablecoins and crypto. I think this is basically the grand unification of AI and crypto, and it's about to happen now.
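HTTP does define a `402 Payment Required` status code, which the current HTTP semantics specification (RFC 9110) still describes as reserved for future use. Here is a minimal, hypothetical sketch of how an agent-facing endpoint might use it: the server answers 402 with a quoted price until the request carries a payment proof, and the agent pays and retries once. The `X-Payment-Proof` and `X-Price-Cents` header names, the token check, and the price are invented placeholders; actually verifying a stablecoin or card payment is out of scope.

```python
from http import HTTPStatus
from typing import Callable, Dict, Optional, Tuple

PRICE_CENTS = 5                     # hypothetical per-request price
VALID_TOKEN = "paid-demo-token"     # placeholder for a real, verifiable payment receipt

Response = Tuple[int, Dict[str, str], str]


def verify_payment(proof: Optional[str]) -> bool:
    """Stub: a real system would verify a signed receipt or an on-chain transfer."""
    return proof == VALID_TOKEN


def handle(headers: Dict[str, str]) -> Response:
    """Server side: demand payment with 402 until a valid proof is attached."""
    if not verify_payment(headers.get("X-Payment-Proof")):
        status = HTTPStatus.PAYMENT_REQUIRED
        return status.value, {"X-Price-Cents": str(PRICE_CENTS)}, status.phrase
    return HTTPStatus.OK.value, {}, "here is the resource"


def agent_fetch(pay: Callable[[int], str]) -> str:
    """Agent side: on 402, pay the quoted price with its own wallet and retry once."""
    status, extra, body = handle({})
    if status == HTTPStatus.PAYMENT_REQUIRED:
        token = pay(int(extra["X-Price-Cents"]))
        status, extra, body = handle({"X-Payment-Proof": token})
    return body


if __name__ == "__main__":
    print(handle({}))                               # (402, {'X-Price-Cents': '5'}, 'Payment Required')
    print(agent_fetch(lambda cents: VALID_TOKEN))   # here is the resource
```

The `pay` callback is where an agent's bank account or wallet would plug in; keeping it as a parameter makes the spending decision explicit rather than buried in the fetch logic.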
I think AI is the crypto killer app. I think that's where this is really gonna come out.
And then the other reason is that it's just obvious now. AI agents are obviously gonna need money, and it's already happening, right? If you've got a claw and you want it to buy things for you, you have to give it money in some form.
swyx: I would say the adoption's probably like 0.1%, if that.
Marc: Today? Yeah. But think forward. Where is it going?
swyx: Forward thinking.
Marc: The ultimate principle of everything that we do is the William Gibson quote: the future is already here, it's just not evenly distributed. My friends who are the most aggressive users of OpenClaw have just given their claws bank accounts and credit cards. And not only have they done it, it was obvious that they needed to, because it's obvious that the claws need to be able to spend money on their behalf.
swyx: Yeah.
Marc: It's just completely obvious. And again, the number of people who have done that today, to your point, is probably, I don't know, 5,000 or something.
swyx: But it'll grow.
Marc: That's how these things start.
swyx: Actually, I mean, since you keep mentioning...
Marc: And by the way, with OpenClaw, if you don't give it a bank account, it's high agency: it's just gonna break into your bank account anyway and take your money. So you might as well do it.
By the way, I gotta tell you, I really love the phenomenon. I love the YOLO. I'm not doing it myself, to be clear, but I love the people that are just like, yeah, what is it?
Skip permissions?
swyx: Dangerously skip permissions.
Marc: Dangerously.
swyx: Which, by the way, is a Facebook thing.
Marc: Okay?
swyx: Right. Because at Facebook they have this culture of naming the thing "dangerous," so that when you enable the flag, you know you're opting into a dangerous thing.
Marc: Okay, good.
swyx: And they brought it into OpenAI, and of course...
Marc: That makes it enticing.
swyx: Sam runs Codex with skip permissions on his laptop.
Marc: Yes, a hundred percent. And so I think the way to actually see the future is to find the people who are doing that.
swyx: Log everything, you know, just watch the logs.
Marc: But let's actually find out what the thing can do. And the way to find out what the thing can do is to try everything. Let it try everything. Let it unlock everything. By the way, that's how you're gonna find all the good stuff it can do, and that's also how you're gonna find all the flaws. I think the people who turn that on for bots are martyrs to the progress of human civilization. I feel very bad for their descendants, since their bank accounts are gonna get looted by their bots in the first 20 minutes. But I think the contribution they're making to the future of our species is amazing.
swyx: It's like gentleman science, you know?
Marc: Yes. Experimenting on yourself. It's Ben Franklin out there trying to get lightning to strike his kite, and seeing if he gets electrocuted.
swyx: Yeah.
Marc: It's Jonas Salk with the polio vaccine, right? Injecting himself. So, yes.
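The naming convention swyx describes (React's `dangerouslySetInnerHTML` is a well-known example from Facebook) can be sketched as a command-line option whose name spells out the risk. The `--dangerously-skip-permissions` flag and the `should_run` approval gate below are a hypothetical illustration built on `argparse`; real agent CLIs name and scope their approval-bypass options in their own ways.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agent")
    # The alarming name is deliberate: nobody opts into this by accident.
    parser.add_argument(
        "--dangerously-skip-permissions",
        action="store_true",
        help="run every tool call without asking for approval",
    )
    return parser


def should_run(args: argparse.Namespace, command: str, ask: Callable[[str], bool]) -> bool:
    """Approval gate: ask the human unless the dangerous flag was explicitly set."""
    if args.dangerously_skip_permissions:
        return True          # YOLO mode: no prompt at all
    return ask(command)      # default: a human approves each command


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Deny-by-default stand-in for a human reviewer; prints True only with the flag set.
    print(should_run(args, "ls -la", ask=lambda cmd: False))
```

Note that `argparse` turns the dashes in the flag into underscores, so the option is read back as `args.dangerously_skip_permissions`.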
I think we should have flags, and we should have monuments to the people who just let OpenClaw run their lives.
swyx: More anecdotes: what are the craziest or most interesting things that people listening to this should go home and do?
Marc: I mean, the extreme thing is just straight YOLO. Just, yeah, turn your life...
swyx: ...on. I mean, that's a general capability. Is there a specific story that was like, wow, and everyone in a group chat just lit up?
Marc: I mean, there's already tons of health stuff. The personal health dashboard stuff is absolutely amazing. There are a number of stories; I just don't wanna violate people's privacy, obviously, it's personal. Anonymized, yeah. But one of the things OpenClaws are really good at is hacking into all the stuff on your LAN. It's really good. So, internet of things, AKA internet of s**t.
swyx: Yeah.
Marc: Like...
swyx: Super insecure, but great. It's discoverable.
Marc: Yeah, it's discoverable. OpenClaw is happy to scan your network and identify all the things. And my friends who are most aggressive at this are having OpenClaw take over everything in their house.
swyx: Yeah.
Marc: It takes over their security cameras. It takes over their access control systems. It takes over their webcams. I have a friend whose claw watches him sleep. Put a webcam in your bedroom, put the claw on a loop, have it wake up frequently, and just tell it: watch me sleep. And I've seen the transcripts, and it's literally like: Joe's asleep. This is good. This is good that Joe's asleep.
’Cause, you know, I have his health data, and I know that he hasn’t been getting enough sleep, and so it’s really good that he’s getting sleep. I really hope he gets his full, whatever, you know, five hours of REM sleep. Uh, Joe’s moving. Joe’s moving. Um, Joe might be waking up. This is a real problem. If Joe wakes up now, he is gonna ruin his sleep cycle. Oh, okay. It’s okay. Joe just rolled over. Okay. He’s gone back to bed. Okay, good. Alright. Okay. I can relax. This is fine. He’s...
swyx: ...monitoring the situation.
Marc: Monitoring, monitoring the situation. And being a bot, you know, it’s just very focused, right? Its reason for existence is to watch Joe sleep. And then I was talking to my friend who did this, and it’s like, you know, on the one hand, all right, this is weird and creepy, and maybe this has taken over my life. And then the other thing is, like, you know what, if I had a heart attack in the middle of the night, this thing literally would freak out and call 911. Like, there’s no question. This thing would figure out how to alert medical authorities and, like, probably summon SWAT teams, and do whatever would be required to save my life. Right? And so, yeah, that’s happening. What else? Um, there’s a company, Unitree...
swyx: Mm-hmm.
Marc: ...that makes the robot dogs. Um, and I actually have one at home, which is actually really fun. The Chinese companies are so aggressive at adopting new technology, but they don’t always, like, take the time to really...
swyx: Package it.
Marc: Package it, and maybe think it all the way through. And so, at least the Unitree dog I have, it has an old non-LLM control system, which, by the way, is not very good in... well, in practice, it’s not that good. It has trouble with stairs and so forth.
And so it’s not quite what it should be. But then the language model thing comes out, and the voice. So they add LLM capability, and then they add a voice mode to it. Um, but that LLM capability is not at all connected to the control system. So you’ve got this schizophrenic dog that is a complete idiot when it comes to climbing the stairs, but it will happily teach you quantum mechanics, right? In, like, a lum English accent. Right. It is just absolutely amazing. Jagged intelligence. Yeah. Yeah. Talk about jagged. Now, obviously, what’s gonna happen in the future is they’re gonna connect them together. But right now it’s not that useful. And so I have a friend who has one of these who had his claw basically hack in and rewrite the code. Write new firmware. Yeah. Write new firmware for the Unitree robot.
swyx: Ooh.
Marc: And now it’s an actual pet dog for his kids.
swyx: You could do that before or after, like, the motion?
Marc: Yeah. He said it’s completely different. He said it’s a complete transformation. Yeah. And whenever there’s an issue in the thing, now the claw just reiterates the code. You know, it goes in, it does the code. And so it kind of goes to your thing here. So, like, all of a sudden, this is why the way we wanna think about AI coding is not just writing new apps. It’s also going in and rewriting all the old stuff that should have worked that never worked. And so I think the Internet of s**t is basically over.
Like, I think there’s a potential here where all these devices in your house that have been basically marginal or, you know, basically dumb, all of a sudden they might all get really smart. Now you have smart...
swyx: ...home.
Marc: You have to decide if... yes, there are horror movies of which this is the premise. And so you have to decide if you want this. Yeah. But this is the first time I can say with confidence, I now know how you could actually have a smart home. Yeah. Yeah. With 30 different kinds of things with chips and internet access, where it actually all makes sense and all works together and it’s all coherent. And to have that unlock without a human being having to go do any of that work, like, you know.
swyx: You know, I’m waiting for a “Sorry, Marc, I can’t let you open that fridge door,” you know, like...
Marc: Exactly, exactly. Yes, yes.
swyx: Because, oh yeah, you’re not supposed to eat right...
Marc: ...now. Yes, I have every shred of health information, you know, and I know you think you’re doing, you know, da da da. I didn’t think you’d do this, but, you know, are you really sure? And, you know, you told me last night you really don’t want me to let you do this, so I’m sorry, but the fridge door is locked. Um, yes.
swyx: Open the fridge doors.
Marc: Exactly. And by the way, I know you’re supposed to be studying for a test, so why don’t you go... when you can pass the test, I will open the fridge door for you. Yeah.
swyx: Final protocol, and then we can wrap up: proof of human.
Marc: Yes.
swyx: Uh, right.
Marc: Yeah.
swyx: That’s the last piece that we gotta figure out.
Marc: Yeah.
So I would say there are two massive asymmetries in the world right now, where we’ve known these asymmetries exist and we, societally, have been unwilling to grapple with them. And I think they’re both tipping right now. And they’re the same thing: there’s a virtual world version and a physical world version. So the virtual world version is the bot problem. The internet is just awash in bots; the internet’s awash in fake people. It has been forever. Um, by the way, a lot of that has to do with lack of money, you know? And so this, you know, this is the... yeah, this is this.
swyx: My spicy take was these two are the same thing. And corporations are people too, you know? So interesting.
Marc: Yeah, yeah, yeah.
swyx: Okay. So a bank account is proof of human.
Marc: Yeah. Okay. Yeah. Until you give the bots bank accounts. Yeah, exactly. So, okay. So there’s that. But yeah, look, every social media user knows this. The bot problem is a big problem. The bot problem has been a big problem forever. It’s a huge problem. And it’s never really been confronted directly, at any point. By the way, the physical world version of this is the drone problem. Right. And so we’ve known for 20 years now that the asymmetric threat, both in actual military conflict but also just in security, like security on the home front, the big threat is the cheap attack drone. Right? The cheap suicide drone with the bomb. And we’ve known that forever. And by the way, it’s very disconcerting how every office complex in the world is unprotected from drone attacks. Every stadium, every school, every prison.
Like, sure, okay, we’ve known that, we’ve never done anything about... what are you gonna do...
swyx: ...about it? Yeah.
Marc: One possibility is just leave them unprotected forever and live in a world of asymmetric terrorism forever. Or the other is take the problem seriously and figure out the set of techniques and technologies required to be able to deal with that. Whether those are lasers or jammers or early warning systems, or, you know, all...
swyx: Personal force fields.
Marc: Kinetic, personal... from Dune, personal force fields. Exactly. And in both cases, these are economic asymmetries, right? ’Cause it’s really cheap to field a bot, but it’s very hard to tell that something is a bot. It’s very cheap to field a drone, and it’s very expensive to defend against a drone. But you see what I’m saying: it’s the virtual version of the problem and the physical version of the problem. For the virtual version of the problem, what we need, quite literally, is proof of human. The reason is because you’re not gonna have proof of bot. Especially now, the bots are too good. The bots can pass the Turing test. And if the bots can pass the Turing test, then you can’t screen for bot. You can’t have proof of not-a-bot. But what you can have is proof of human. You can have, you know, cryptographically validated: this is definitely a person. And then you can have cryptographically validated: this is definitely something that a person said; yeah, this video is real. Right. Um...
swyx: Just to double-click on that, uh, do you think Alex Blania with World? Yeah.
Do you think he’s got it, or is there an alternative?
Marc: Oh, so I think many people will try. We’re one of the key participants in the World project. So we’re partisans, but yeah, we think World is exactly correct. Okay. And the reason is it has to be proof of human, because you can’t do proof of not-bot. To do proof of human, you need biological validation. You need to start with: this was actually a person, right? Because otherwise you get bots signing up as fake people. Right? So you have to have a biometric. And then you have to have cryptographic validation, and then the ability to do the lookup. And then, by the way, the other thing you also need is selective disclosure. So you need to be able to do proof of human without revealing all the underlying information.
swyx: Privacy.
Marc: Yeah. By the way, another thing you’re gonna need is proof of age, right? ’Cause there are all these laws in all these different countries now where you need to be 13 or 16 or 18 or whatever to do different things. And so you’re gonna need a validated proof of age to be able to legally operate, right? And so that’s coming. And then you’re gonna want proof of credit score and, you know, proof of a hundred other things.
swyx: That’s a tricky one.
Marc: It is a tricky one, but there’s no reason... like, if somebody’s checking on your credit... I’ll give you an example. Somebody shouldn’t need to know your name in order to be able to find out whether you’re creditworthy.
swyx: Right? I see.
Independently verifiable pieces of information.
Marc: Pieces of information, yeah. It’s, like, selectively disclosed. And this is the answer to the privacy problem writ large, which is: I only need to prove what I need to prove at that moment. So you’re gonna need that. And I think their architecture makes sense. So that needs to get solved. I think language models have tipped: the bots are now too good, and so they’re undetectable. And so, as a consequence, we now need to go confront that problem directly. And then, like I said, the other problem is we need to go actually confront the drone problem. The Ukraine conflict has really unlocked a lot of thinking on that, and now the Iran situation is also unlocking that. And so I think there’s gonna be this incredible explosion of both drones and counter-drones.
swyx: Our drones are better than their drones. Keep it that way.
Marc: Yeah. Yeah. And counter-drones.
Alessio: I think we can sneak in one more question.
Marc: Go for it.
Alessio: Um, I’m trying to tie together a lot of things that you’ve said over the years. So at the Milken Institute debate with Thiel, which is amazing, you talked about the lag between a new technology and, kind of, the GDP impact of it.
Marc: Yep.
Alessio: The other idea you talked about is bourgeois capitalism and how, you know, this managerial class was needed because of this complexity. And I think if you bring AI into the fold, you have much higher leverage of people. So if you have, you know, the Musk industries, and you give Elon AGI, you can run a lot more things at once.
Marc: That’s right.
Alessio: And then you have the social contract. And I know you reviewed a clip of Sam saying, um, “we’re rethinking the whole thing,” and you were like, absolutely not.
Yes.
Marc: Under...
Alessio: And I was at an event with Sam last night, and he actually said that in the last couple weeks it felt like people are now taking that seriously. Yeah. So I’m just curious how you’re seeing the structure of organizations changing, especially when you invest in early-stage companies, and how the impact on work structure and all of that is playing out. Yeah.
Marc: So there’s a whole bunch of topics. I know, yeah. We could spend... and by the way, we’d be happy to spend more time on all that. So, just for people who haven’t followed this, this term “managerial” comes from a 20th-century thinker, James Burnham, who was one of the great 20th-century political and societal thinkers. He was writing in, like, the 1940s and 1950s, and he said the whole history of capitalism until that point had been in two phases. Number one had been what he called bourgeois capitalism, which you can think about as, like, name on the door: Ford Motor Company, ’cause Henry Ford runs the company. It’s a dictatorial model, and Henry Ford just tells everybody what to do. And he said the problem with bourgeois capitalism is it doesn’t scale, ’cause Henry Ford can only tell so many people to do so many things, and then he runs out of time in the day. And so he said the second phase of capitalism was what he called managerial capitalism, which was the creation of a professional class of managers that are trained not to be car experts, or experts in any particular field, but are trained to be experts in management. And then that led to, you know, the importance of business schools like Harvard, and management consulting firms, and all these things.
And then you look at every big company today, and most of the executives at most of the Fortune 500 companies are not domain experts in whatever the company does. And they’re certainly not the founders; they’re professional managers. And in fact, in the course of their careers, they’ll probably manage many different kinds of businesses. They’ll rotate around: they might work in healthcare for a while, then work in financial services, then go work in something else, you know, come work in tech. And what Burnham said is that transition is absolutely required, because the problem with bourgeois capitalism is it doesn’t scale. Henry Ford doesn’t scale. And so if you’re gonna run capitalist enterprises that are gonna have millions to billions of customers, they’re gonna be operating at a level of scale and complexity that’s gonna require this professional management class. And he said, look, the professional management class has its downsides. They’re not necessarily experts at doing the thing. They’re not as inventive; they’re not gonna create the next breakthrough thing. But whether you think that’s good or bad, it’s what’s gonna be required. And basically that’s what happened. Right. And so he wrote that book originally in, like, 1940, and over the course of the next 50 years, and up till today, managerialism basically took over everything. Mm-hmm.
And, you know, what I’m describing is basically how all big companies run, how all governments run, how all large-scale nonprofits run, and kind of everything. What venture capital does is we are basically a rump, sort of, protest movement against that. To try to find the next Henry Ford, or, just to say, the next Elon Musk, or the next Steve Jobs, or the next Bill Gates, the next Mark Zuckerberg. And so we start these companies in the old model, right? We start them out in the Henry Ford model. We start them out with a founder, or a founder with colleagues, but, you know, there’s a founder-CEO. And then we basically bet that the startup is going to be able to do things, specifically innovate in ways, that the big incumbents in that industry are not gonna be able to do. And so it’s a bet that, basically, by relighting this sort of name-on-the-door kind of thing, Mm-hmm. this new innovative thing with, like, a king, a monarchical political structure, they’re gonna be able to innovate in a way that the incumbent is not, because the incumbent is being run by managers. Right. And of course, venture being what it is, sometimes that works, sometimes it doesn’t. But we’re constantly doing that. And I’ve always viewed it, my entire life, as, like, we’re raging against the dying of the light. Mm-hmm. We’re sort of constantly trying to fight off managerialism just basically swamping everything, and everything getting basically boring and gray and dumb and old. Right. And we’re trying to keep some level of energy and vitality in the system. AI is the thing that would lead you to think, wow, maybe there’s a third model.
Alessio: Mm-hmm.
Marc: Right?
And maybe a way to think about it would be, maybe it’s a combination of the two: maybe the new Henry Ford, or the new Elon, or the new Steve Jobs, plus AI, right, is the best of both. Right. Because it’s the spark of genius of the name-on-the-door model, the Henry Ford model, but then you give that person AI superpowers to do all the managerial stuff and let the bots do the managerial stuff. That may be the actual secret formula. And we’ve never even known that we wanted this, because we never even thought it was a possibility. But I mean, you know this: what are these bots really good at? They’re really good at doing paperwork. They’re really good at filling out forms, right? They’re really good at writing reports, they’re really good at reading, they’re really good at doing all the managerial work. They’re amazing at it. And so, yeah, a hundred percent, I think the answer very well might be to get the best of both worlds by doing this. And then the challenge is gonna be twofold. The challenge is gonna be for the innovators to really figure out how to leverage AI to actually do this. Right? And then the other challenge is gonna be for the incumbents that are managerial to figure out, okay, what does that mean? ’Cause now they’re gonna be facing a different kind of insurgent competitor that has a different set of capabilities than they’re used to. And so this, I think, is really gonna force a lot of big companies to figure out innovation. Either figure out innovation or die trying.
Alessio: Do you feel like that structure accelerates the impact on the actual GDP and economy? If you look at SpaceX...
Marc: Yes.
Alessio: ...the growth is so fast.
Yeah. And instead of having these companies kind of peter out in growth and impact, they can keep going, if not accelerate.
Marc: Yeah, that’s for sure the hope. The challenge... and look, the AI utopian view is, of course, that’s gonna be the future of the economy. And it’s gonna grow 10x and 100x and 1,000x, and we’re entering this regime of much higher economic growth forever and a consumer cornucopia of everything. And it’s gonna be great. And I hope that’s true. That’s the current kind of utopian vision. I hope that’s true. The problem is, it goes back again: the real world is really messy. Um, and I’ll give you an example of how the real world is really messy. It requires 900 hours of professional certification training to become a hairdresser in the state of California. So it’s, like, 35% of the economy, something like that, where you have to get some sort of professional certification to do the job, which is to say that the professions are all cartels, right? Yeah. And so you have to get licensed as a doctor. You have to get licensed as a lawyer. You have to get licensed as a... You have to get into a union. Mm-hmm. By the way, to work for the government, you have both civil service protections and public sector unions. You have two layers of insulation against ever getting fired for anything, or anything ever changing. I’ll give you another example. The dock workers went on strike a couple years ago. Mm-hmm. ’Cause, you know, robotics: if you go look at a modern dock, like in Asia, it’s all robots. If you go to an American dock, it’s still all guys dragging stuff by hand. The dock workers go on strike. It turns out there are 25,000 dock workers working on docks in America.
It turns out they have incredible political power. Mm-hmm. Because it’s one of these unified blocs. They won their strike, and so they got commitments from the dock owners to not implement more automation. We learned a couple things from that. Number one, we learned that even a union as small as 25,000 people still has tremendous political stroke. We also learned that it actually turns out the dock workers’ union has 50,000 people in it. ’Cause they have 25,000 people working in the docks, and they have 25,000 people drawing a full paycheck sitting at home, from prior union agreements.
swyx: Oh my God.
Marc: From prior union agreements. I’ll give you another great example. There are federal government agencies where the employees have civil service protections and are in public sector unions. There are entire federal government agencies that struck new collective bargaining agreements during COVID, where not only are their jobs guaranteed in perpetuity, but they only have to report to work in an office one day per month. And so there are entire office buildings in Washington, DC that are empty 29 out of 30 days of the year, that are still operating, and we’re all still paying for them. And then, it turns out, the employees are very smart in this way. They figure out that they can come in on the last day of one month and the first day of the next month. And so they’re in the office two days out of every 60, which means these buildings are empty for 58 days at a time. And you see where I’m heading with this? This is locked in, right? This is locked in in a way that has nothing to do with... and people say capitalist; it’s, like, anticapitalistic.
It’s basically restrictions on trade; it’s restrictions on the ability to change the workforce. And so much of our economy is, you know... I’m describing the entire healthcare system. I’m describing the entire legal profession. I’m describing the entire housing industry. I’m describing the entire education system, right? K through 12 schools in the United States are a literal government monopoly. How are we gonna apply AI in education? The answer is we’re not, because it’s a literal government monopoly. It is never going to change, the end. And there is nothing to do. By the way, you can create an entirely new school system. That’s the one thing you can do: you can do what Alpha School’s doing. You can create an entirely new school system. Other than that, you’re not gonna go in and change what’s happening in the American classroom, K through 12. There’s no chance. The teachers are 100% opposed to it. It’s a hundred percent not gonna happen. So you see what I’m saying: there’s this massive slippage that’s gonna take place. Both the AI utopians and the AI doomers are far too optimistic.
swyx: Right.
Marc: You see what I’m saying? Because they believe that because the technology makes something possible, 8 billion people are all of a sudden gonna change how they behave. And it’s just, nope. So much of how the existing economy works, Mm-hmm. it’s just wired in. And so we’re gonna be lucky as a society if AI adoption happens quickly. Right. Because if it doesn’t, what we’re just gonna have is stagnation.
Alessio: Awesome. Marc, I know you gotta run.
swyx: Yeah. We all know, or still welcome. But it was such a pleasure talking to you. We’re truly living in the age of science fiction coming to real life.
Marc: Yes. Yes. Could not be more exciting. Yeah. Really. Thank you, Marc. You guys are awesome.
swyx: Thank you. That’s it.
Marc: Good. Thank you.
That’s it. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
We’ve been on a bit of a mini World Models series over the last quarter: from introducing the topic with Yi Tay, to exploring Marble with World Labs’ Fei-Fei Li and Justin Johnson, to previewing World Models learned from massive gaming datasets with General Intuition’s Pim de Witte (who has now written down their approach to World Models with Not Boring), to discussing the Cosmos World Model with Andrew White of Edison Scientific on our new Science pod, to writing up our own theses on Adversarial World Models. Meanwhile Nvidia, Waymo and Tesla have published their own approaches, Google has released Genie 3, and Yann LeCun has raised $1B for AMI and published LeWorldModel.

Today’s guests take a radically different approach to World Modeling from every player we just mentioned. While Genie 3 is impressive, its many flaws demonstrate the issues with its approach: terrain clipping, noninteractivity (single player, no physics, no moving objects other than the player), and a maximum of 60 seconds of immersion. Moonlake AI (inspired by the Dreamworks logo) is the diametric opposite: immediately multiplayer, incredibly interactive, with indefinite lifetime, and capable of MANY different kinds of world models by simulating environments, predicting outcomes, and planning over long horizons. This is enabled by bootstrapping from game engines and training custom agents. In Towards Efficient World Models, Chris Manning and Ian Goodfellow join Fan-Yun in explaining why their approach to efficiency, using structure and causality instead of just blind scaling, is sorely needed:

SOTA models still show physical or spatial understanding glitches, such as solid objects floating in mid-air or moving “inside” other solid objects.

If the goal is to plan for the next action, how often is a high-resolution pixel view necessary for modeling the world? Our bet is that there is a disproportionately large share of economically valuable tasks where such detail is not required.
After all, humans with a wide variety of sensory limitations have little difficulty doing almost everything in the world. Furthermore, for a large number of purposes, describing a scene or a situation in a few words of language (“the car’s tires squealed as it cornered sharply”) is sufficient for understanding and planning.

Experiments also show that humans only partially process visual input in a top-down, task-directed way, often making use of abstracted object-level modeling. In almost all cases, partial representations combined with semantic understanding are sufficient.

…If the goal is to facilitate the understanding of causality in multimodal environments, then the world model, whether it is used in the virtual world or the physical world, must prioritize properties such as spatial and physical state consistency maintained over long time periods, and an ability to evolve the world that accurately reflects the consequences of actions.

That’s what Moonlake is building. Game engines are the right starting abstraction for efficiently extracting causal relationships, and Moonlake is building the interfaces and community (including its new $30,000 Creator Cup) to kickstart the flywheel of actions-to-observations.

We were fortunate enough to attend their sessions at GDC 2026 (the Mecca of Game Devs), and were impressed by the huge variety and flexibility of the worlds people were already building with Moonlake’s tools!
Live videos on the pod.

Full Video Pod on YouTube!

Timestamps
00:00 Benchmarking Gets Hard
00:47 Meet Moonlake Founders
01:26 Why Build World Models
03:12 Structure Not Just Scale
05:37 Defining Action Conditioned Worlds
07:32 Abstraction Versus Bitter Lesson
14:39 Language Versus JEPA Debate
20:27 Reasoning Traces And Rendering Layer
37:00 Gameplay Over Graphics
38:02 Fiction Rules And World Tweaks
39:15 Code Engines Beat Learned Priors
41:10 Diffusion Scaling Limits
43:23 Symbolic Versus Diffusion Boundary
46:14 Platform Vision Beyond Games
50:24 Spatial Audio And Multimodal Latents
54:23 NLP Roots, Hiring, And The Moonlake Name

Transcript

[00:00:00] Cold Open

[00:00:00] Chris Manning: I think this whole space is extremely difficult as things are emerging now. And I mean, it’s not only for world models; I think it’s for everything, including text-based models, right? ’Cause in the early days it seemed very easy to have good benchmarks, ’cause we could do things like question answering benchmarks.

[00:00:20] But these days so much of what people are wanting to do is nothing like that, right? You’re wanting to get some recommendations about which backpack would be best for your trip to Europe next month. It’s not so easy to come up with a benchmark, and it’s the same problem with these world models.

[00:00:41] Meet the Founders

[00:00:41] swyx: Okay. We’re back in the studio with Moonlake’s two leads. I guess there are other founders as well, but Fan-yun Sun and Chris Manning. Welcome to the studio.

[00:00:54] Fan-yun Sun: Thanks. Thanks, Chris. Thanks for having us.

[00:00:56] swyx: You guys have burst onto the scene with a really refreshing [00:01:00] new take on world models.

[00:01:01] I just want to, I guess, ask how the two of you came together. Chris, you’re a legend in NLP and in AI in general. You’re his grad student, I guess?
[00:01:10] Fan-yun Sun: Actually, my co-founder.

[00:01:11] swyx: Oh, yeah.

[00:01:12] Fan-yun Sun: I should give a lot of credit to my co-founder, Sharon. Yeah. She was actually working with Professor Fe Androgyn, and then she ended up working with Ron and Chris Manning here.

[00:01:22] And so I got connected to Chris initially through my co-founder.

[00:01:26] What is Moonlake?

[00:01:26] swyx: What is Moonlake? Actually, I’m also very curious about the name, but why go into world models?

[00:01:33] Fan-yun Sun: So I was working a lot with Nvidia Research during my PhD years on essentially generating interactive worlds to train reinforcement learning agents or embodied AI agents.

[00:01:44] And then there were two observations, one in academia and one in industry. In industry, folks at Nvidia are actually paying a lot of dollars to purchase these types of interactive worlds, whether it’s for the sake of evaluation or for training robots, policies, or models. And [00:02:00] then in academia, the same thing is happening.

[00:02:02] And more specifically, when I was working with Nvidia on the synthetic data foundation model training project, we were generating a lot of this synthetic data and showing that, hey, this synthetic data is actually as useful as real-world data when it comes to multimodal pre-training.

[00:02:16] But then, like I said, a lot of dollars are being paid out to external vendors or other folks to manually curate these types of data.
It was very clear to us that, on our way to, let’s call it, embodied general intelligence, models need to learn the consequences of their actions, which means they need interactive data, and the demand for that type of data is growing exponentially.[00:02:38] But everybody is sort of thinking about it from a pure video generation perspective or something else. We feel like the real opportunity is building reasoning models that can do these things the way humans do them today. So that’s a little bit on the genesis of Moon Lake. And I think the reason I got into world models was partly[00:02:59] a philosophical [00:03:00] take on the world, where I believe in the simulation theory and stuff like that. But on the other hand, it’s really just that there’s an opportunity there, and I feel like nobody’s doing it the way I think it should be done.
[00:03:10] Structure, Not Scale: The Vision
[00:03:10] Chris Manning: I can say a little bit about that.[00:03:12] So the overall goal is the pursuit of artificial intelligence, and most of my career has been spent doing that in the language space, which has been extremely productive. As we all know from the story of the last few years, I don’t have to tell you how much we’ve achieved with large language models.[00:03:31] But although they have been extremely effective for language and general intelligence, language is clearly not the whole world. There’s this multimodal world of vision, sound, and taste that you’d like to be dealing with, more than just language. And then the question is how to do it. Despite a huge investment in the computer vision space (as a research field, computer [00:04:00] vision has been far, far larger than the language space for decades, actually),[00:04:05] I think it’s fair to say that visual understanding sort of stalled out, right?
You got to object recognition, and then progress just wasn’t being made, right? If you look at any of these vision-language models, it’s the language that’s doing 90% of the work and the vision barely works. So there’s a really interesting research question as to why that is, and at heart, the ideas behind Moon Lake are an attempt to answer it. We believe there can be a really rich connection with a more symbolic layer of abstracted understanding of visual domains, which isn’t present in the mainstream vision models; those are still trying to operate at the surface level of pixels.
[00:04:50] swyx: I think in one of your blog posts, you put it as “structure, not scale.” Is that a general thesis?
[00:04:57] Chris Manning: Yeah. Well, scale is good too.
[00:04:58] swyx: Yeah. Scale is good too.
[00:04:59] Chris Manning: [00:05:00] Lots of data is good as well, and scale, but nevertheless, you want the structure to be able to learn much more efficiently.
[00:05:07] swyx: Yeah. The other thing I really liked is that you put out an example of what your reasoning traces look like.[00:05:12] Right. “Distill” is the word that comes to mind, though I don’t even think that’s a good description. It involves, for example, geometry, physics, affordances, symbolic logic, perceptual mappings, and what have you. That is the kind of example that involves, let’s call it, spatial reasoning, world model reasoning, as compared to normal LLM reasoning.[00:05:35] Yeah.
[00:05:36] Defining World Models vs Video Generation
[00:05:36] Vibhu: But also, taking a step back: how do you guys define world models? A lot of people see, okay, you can do diffusion, you can do video generation. But you guys have put out quite a few blog posts. You put out an essay recently, we can even pull it up, about efficient world models.
You have a pretty structural definition here, but for the general audience that doesn’t closely follow the space, right,[00:05:55] what’s the difference between what we see from a video generation model and [00:06:00] a world generator, a simulator? How would you paint that?
[00:06:02] Chris Manning: Yeah, so I think this is actually a little bit subtle, because people look at these amazing generative AI video models, Sora, Veo 3, Genie, one of these things, and they think, oh, this is amazing.[00:06:17] We’ve solved understanding the world, because you can produce these generative AI videos. But the reality is that although the visuals do look fantastic, those visuals actually aren’t accompanied by an understanding of the 3D world: understanding how objects can move and what the consequences of different actions are. And that’s what’s really needed for spatial intelligence.[00:06:49] So a term we sometimes use is that you need action-conditioned world models. You only actually have a world model if you can predict, [00:07:00] given that some action is taken, what is going to change in the world because of it. And in particular, that becomes hard over longer time scales. If you’re simply trying to[00:07:12] predict the next video frame, that’s not so difficult. But what you actually want to do is understand the consequences, the likely consequences, of actions minutes into the future. And to do that, you actually need much more of an abstracted semantic model of the world.
[00:07:32] The Bitter Lesson & Data Abstraction
[00:07:32] swyx: Yeah, the question comes where you want to have more structure than is available in just predicting the next token.[00:07:41] And typically, well, let’s call it the experience of the last five years, that has just been washed away by scale, right? So what is the right middle ground here, where you don’t ignore the bitter lesson, but you also
can be more efficient than what we’re doing today?
[00:07:57] Chris Manning: One possibility [00:08:00] is: look, if we just collect masses and masses of video data, this problem will be solved.[00:08:11] Under certain assumptions that could be true, but there are multiple ways in which it might not be true. The first is that what’s really essential is understanding the consequences of actions, producing an action-conditioned world model. If you are simply collecting observational video data, which is the easy stuff to collect when you’re mining online videos, you don’t actually[00:08:41] know the actions that are being taken that make the video change. So if you are never directly collecting actions and you have to try to infer them from what happened in the observed video, that’s not impossible, but it’s very [00:09:00] hard, and it’s not really established that you can get that to work at any scale yet.[00:09:05] So there’s a big premium on collecting action-conditioned video data, which is part of why there’s been a lot of interest in using simulation, so that you can collect data where you do know the actions, which is in quite limited supply. But there’s also this: in the limit of as much data as you could possibly have,[00:09:28] maybe the problem is eventually solvable. But even though we collect huge amounts of text data, text is always at a great level of abstraction, right? Language is a human-designed, abstracted representation where there’s meaning in each token, and it represents an abstraction of the world, right?[00:09:51] As soon as you are describing someone as a professor, and as soon as you are saying that they’re condescending, right? These are very [00:10:00] abstracted descriptions of the world.
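The distinction Chris draws between observational video and action-conditioned data can be illustrated with a toy example. This is a hypothetical sketch, not Moonlake's method: it assumes an invented five-cell world whose simulator (`step`) logs (state, action, next_state) triples, fits a counting-based estimate of the next state given (state, action), and chains one-step predictions to look several steps ahead.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy world: an agent on a line of N_CELLS cells.
ACTIONS = ("left", "right", "stay")
N_CELLS = 5


def step(state, action):
    """Ground-truth dynamics; stands in for a simulator or game engine."""
    if action == "left":
        return max(0, state - 1)
    if action == "right":
        return min(N_CELLS - 1, state + 1)
    return state


def collect(n, seed=0):
    """Simulation lets us log the action taken alongside every transition."""
    rng = random.Random(seed)
    transitions, state = [], 0
    for _ in range(n):
        action = rng.choice(ACTIONS)
        next_state = step(state, action)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions


class TabularWorldModel:
    """Action-conditioned model: counts next states for each (state, action) pair."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, transitions):
        for state, action, next_state in transitions:
            self.counts[(state, action)][next_state] += 1
        return self

    def predict(self, state, action):
        """Most frequently observed next state, or None if the pair was never seen."""
        seen = self.counts.get((state, action))
        if not seen:
            return None
        return seen.most_common(1)[0][0]

    def rollout(self, state, actions):
        """Chain one-step predictions to estimate consequences further ahead."""
        trajectory = [state]
        for action in actions:
            state = self.predict(state, action)
            if state is None:
                break
            trajectory.append(state)
        return trajectory


model = TabularWorldModel().fit(collect(2000))
print(model.predict(2, "right"))  # 3
print(model.rollout(0, ["right", "right", "left", "stay"]))  # [0, 1, 2, 1, 1]
```

If `collect` logged only (state, next_state) pairs, as with scraped video, the same table could describe where the agent tends to go but could not answer `predict(state, action)` for a chosen action; that missing action label is the gap Chris describes.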
It’s not what you’re observing at the pixel level, and getting to that degree of abstraction starting from pixels takes orders of magnitude of extra data and processing.[00:10:14] So although we absolutely want to get as much data as possible and use the bitter lesson, nevertheless, if there are ways in which you can work with five orders of magnitude less data than people working purely from pixels, you’re going to be able to make a lot more progress a lot more quickly.[00:10:34] And that’s the bet here. You could say that’s only about being able to do it more efficiently, more quickly, more cheaply. But I think it’s actually more than that. I think one should be making the analogy to how human beings work. At one level, yes, we have these high [00:11:00] resolution eyes, and we can look and see a scene like a video, but all of the evidence from neuroscience and psychology is that most of what comes into people’s eyes is never processed.[00:11:13] You are doing fairly fine-grained processing of exactly what you’re focusing on. But as soon as it’s away from that (yeah, there’s another guy over there), you’re only processing, top-down, a very abstracted semantic description of the world around you. So that’s what human beings are doing.[00:11:33] They’re working with semantic abstractions, and so I think it is just the right representation. ‘Cause we also have other goals: we want to be able to do real-time worlds, which means there’s a limit to how much processing you can do, and we want to do long-term planning and consistency. And again, that favors abstraction.[00:11:55] I guess there was actually a recent blog post that [00:12:00] came out from our friends at Physical Intelligence, and they were heading in the same direction. They were saying, oh, the pi...
[00:12:06] swyx: The pi model.
[00:12:07] Chris Manning: Yeah. Yeah.
To maintain a long-term memory of what’s happening in the world, so they can operate over longer time horizons, they’re actually storing text of what has been happening in the world.[00:12:19] Trying to keep it all at the pixel level is not such a successful strategy.
[00:12:24] Vibhu: And yeah, you can see it in video models, that temporal consistency. At the scale of training on all the video data we have, we get it for maybe 30 seconds, a few minutes. That’s not the same as a game state played for half an hour.[00:12:37] Right? I thought you guys broke it down pretty well. You have a blog post about building multimodal worlds with an agent. I don’t know if you want to talk about this. This is one of the things I read.
[00:12:48] swyx: Yeah, it’s the thing I talked about with the reasoning chain.
[00:12:51] Vibhu: So there are different phases to this.[00:12:53] It seems like it’s more of an agent, a scaffold, a very different approach from just typing in a prompt, where you don’t get the same consistency. [00:13:00] For people who are listening, I would highly recommend reading it. It breaks down the problem in a different light, right?[00:13:06] What do you need to consider when you’re talking about video, or world and game models? What are the factors? What are the elements? What’s the state? So I don’t know if you guys have stuff to talk about for this one.
[00:13:19] Fan-yun Sun: Yeah. Actually, I wanted to add on a little bit[00:13:22] to our previous point, since we changed topics so quickly. I do feel like sometimes people conclude that because we’re taking an approach with abstraction, we don’t believe in the bitter lesson. That’s just false, right? We believe in the bitter lesson.
But the question that we always discuss is: what is the right abstraction level today?[00:13:42] The analogy I like to make is this: let’s just say we can encode, decode, and represent all images, videos, and audio as bytes. Then the most bitter-lesson approach is to train a next-byte prediction model, as opposed to a next-token prediction model, so that it’s natively multimodal. But, [00:14:00] to Chris’s point, it’s the scale and compute you need to achieve that.[00:14:03] So that’s why we always come back to: okay, what is the most efficient way to do it? And reasoning models, to the point of this blog post, are a showcase of that: hey, we’re actually reasoning about the world, and reasoning about the aspects of the world that matter for me to learn what I want to learn from this world model.
[00:14:21] swyx: Yeah, it’s like you’re improving the encoder of whatever you’re trying to model. A better representation would represent the important things in less space, which would just be more efficient.
[00:14:33] Fan-yun Sun: Yeah.
[00:14:34] swyx: So yeah, I fully agree that it is not antagonistic to the bitter lesson.[00:14:38] I do want to mention one more thing. Are there any philosophical differences with the JEPA stuff that Yann is working on? I’ve got to go there. You’re imagining some latent abstraction. I’m like, okay, fine, let’s talk about it, right? It’s the elephant in the room.
[00:14:52] Chris Manning: Yeah.
[00:14:53] JEPA & Philosophical Differences with LeCun
[00:14:53] Chris Manning: There are philosophical differences. Yann LeCun is a dear friend of mine, but [00:15:00] he has never appreciated the power of language in particular, or of symbolic representations in general. Yann is a very visual thinker.
He always wants to claim that he thinks visually and there are no words, symbols, or math in his head.[00:15:21] Maybe that’s true of Yann. It’s certainly not the way I think. But at any rate, the world according to Yann is that the basic stuff of the world and of intelligence is visual, and language is just a low-bit-rate communication mechanism between humans that doesn’t have much other utility and is far inferior to the high-bit-rate video that comes into your eyes.[00:15:53] And I think he’s fundamentally missing a number of important things [00:16:00] there. Think of the evolutionary argument, looking at animals, right? The closest analogy is with chimps, right? Chimpanzees have fairly similar brains to human beings. They have great vision systems, they have great memory systems.[00:16:18] They’ve got better short-term memory than we do. They can plan, they can build primitive tools. But humans are massively ahead in what we understand about the world, what we can plan, and what we can build. And essentially what made us take off was that humans managed to develop language, and that gave a symbolic level of knowledge representation and reasoning, which led to this sort of vaulting of what could be done with the intelligence in brains.[00:16:59] So the [00:17:00] philosopher Dan Dennett refers to language as a cognitive tool and argues that humans, unique among the creatures in the world, have managed to build their own cognitive tools, and language is the famous first example. But other things, like mathematics and programming languages, are also cognitive tools.[00:17:21] They give you an ability to think in abstractions, in extended causal reasoning chains, and that allows you to do much more. And we use that for spatial representation and intelligence and planning and gameplay as well.
So we believe, and this underlies the specific technologies that Moon Lake is building, that symbolic representations are powerful.[00:17:50] And you want to use them in your understanding of the visual world when you want a causal understanding, when you want to maintain long-term [00:18:00] consistency and prediction. As I understand it, that’s just not in Yann LeCun’s worldview. So I think that’s the fundamental philosophical difference. Then there’s the specific model.[00:18:11] He’s been advancing JEPA. That’s a reasonable research bet, a reasonable direction to head in for building out a model of the visual world. To my mind, it’s one reasonable research bet. It’s not really established that it’s the best one that everyone should be following,
[00:18:32] swyx: At least developed at scale, at Meta.[00:18:34] But it’s not just vision, right? JEPA, the joint-embedding predictive architecture, can be applied to anything really, and people have done that. The argument is that if there is a latent representation that is probably more suited to the task, then why not let machines find it for us instead of predefining it at all?[00:18:50] And isn’t something JEPA-shaped the right answer? And if not, why not?
[00:18:55] Chris Manning: So I think there’s a part of JEPA that’s right, which is that [00:19:00] you do want to have a joint embedding that gives you a consistent model of the world. And Yann’s argument is that you can never get that from autoregressive language models, ‘cause they’re sort of churning out one token at a time, left to right.[00:19:22] I guess this is where we get into the research arguments of the field. I’m not actually convinced that’s right, ‘cause although the token production is this autoregressive process that’s heading left to right. Well, I guess it doesn’t have to be left to right.
But anyway, it’s a sequence of tokens; we could have right-to-left Arabic.[00:19:40] But although that’s true, all of the weights of the model that are internal to the transformer are a joint model of the model’s understanding of the world. So I think you can think of the weights of the model as a form of joint representation, [00:20:00] and therefore it is plausible to think they could be the basis of a world model, which avoids Yann’s objections.
[00:20:10] swyx: I think I follow, and obviously that would touch on what Moon Lake eventually ends up doing as well, right? It’s hard to tell, because you put out the end results, but we don’t know the inputs that go into them. So that’s something we’ll have to figure out over time.
[00:20:25] Vibhu: Yeah. I guess this kind of breaks down some of the outputs. Do you want to walk us through it?
[00:20:31] Reasoning Traces & Interactive Worlds
[00:20:31] Fan-yun Sun: Yeah. So this really just walks through the reasoning traces. Let’s say we want to build a world; in this context, it’s really just a game demo that shows the variety of interactions this world model can build.[00:20:45] It’s the reasoning trace for: okay, it was prompted to create a bowling game, so how did it achieve what you saw, that level of causality, interaction, and consistency, right? So this is just an example of [00:21:00] a reasoning trace.
[00:21:01] swyx: Very detailed.
[00:21:01] Fan-yun Sun: Yeah.
[00:21:01] Vibhu: Very, very detailed.[00:21:02] You don’t even realize it, right? When a video is generated, what happens when a ball strikes a pin? First, there’s audio in that: audio triggers happen, the score increments, the world changes. Pins have to start dropping. There’s a timer that goes on.
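The consequences Vibhu lists for a ball reaching the pins (audio triggers, the score incrementing, pins falling, a running timer), together with the rule Fan-yun states in this segment that the score corresponds to the number of pins that fall and that a reset starts a new game, can be written as an explicit state update. The sketch below is a hypothetical and deliberately simplified illustration of the kind of symbolic state a reasoning model might track; it is not Moonlake's representation, and it ignores real bowling scoring (frames, strikes, spares).

```python
from dataclasses import dataclass, field


@dataclass
class BowlingState:
    """Hypothetical symbolic state for the bowling example (not Moonlake's format)."""

    standing: set = field(default_factory=lambda: set(range(1, 11)))  # pins 1-10
    score: int = 0
    elapsed: float = 0.0  # seconds on the game timer
    events: list = field(default_factory=list)  # side effects for audio/rendering

    def tick(self, dt):
        """Advance the game timer."""
        self.elapsed += dt

    def ball_strikes(self, hit_pins):
        """Apply the consequences of the ball reaching the pins; return pins knocked down."""
        fallen = self.standing & set(hit_pins)  # only standing pins can fall
        self.standing -= fallen
        self.score += len(fallen)  # simplified: score tracks pins knocked down
        if fallen:
            self.events.append(("play_sound", "pin_crash"))
            self.events.append(("animate_fall", sorted(fallen)))
        return len(fallen)

    def reset(self):
        """A reset starts a new game: all pins stand again and the score clears."""
        self.standing = set(range(1, 11))
        self.score = 0
        self.elapsed = 0.0
        self.events.clear()


game = BowlingState()
game.tick(2.5)
print(game.ball_strikes([1, 2, 3, 5]))  # 4
print(game.ball_strikes([2, 6]))  # 1 (pin 2 is already down)
print(game.score, len(game.standing))  # 5 5
game.reset()
print(game.score, len(game.standing))  # 0 10
```

Keeping causality in explicit state like this is what lets a player practice repeatedly and see the score respond; a purely generated video of bowling carries no such state.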
It’s very similar to how we’re now used to reasoning in language models.[00:21:20] There’s a whole state of what happens: geometry, physics, all this stuff. And then there’s that single prompt, so asset generation, all this stuff. It’s a nice view of what’s going on.
[00:21:32] swyx: I think Sun is also too polite to point out that both Google’s Genie demos and World Labs’ Marble do not have interactive worlds.
[00:21:41] Fan-yun Sun: That’s the benefit of having a reasoning model, right? Because you can say, oh, in this particular context I want to learn how to bowl. And then you can ask: what is important when it comes to learning how to bowl? Okay, maybe I need to understand the basics of physics, and I want to throw it over [00:22:00] them.[00:22:00] I want to know that when it resets, it’s a new game. So basically, you know to pick up the ball, you know that the ball is going to cause the pins to fall down, you know that what’s important in this particular bowling game is to score, and you know that the score corresponds to the number of pins that fell down.[00:22:19] So if it’s a model that sort of knows what a bowling game looks like, but doesn’t actually allow you to practice over and over again and understand what it takes to get a high score, then it doesn’t actually allow you to learn what you set out to learn within the world model.[00:22:38] And I think this is really just one example showing the advantages of our approach over what, let’s call it, the zeitgeist is today when people talk about world models.
[00:22:51] Chris Manning: Right.
So it seems like the question to ask when there’s a world model is:[00:22:58] can I not [00:23:00] only wander around the world and look at the beautiful graphics, but also interact with the objects in the world and see the right consequences of actions?
[00:23:11] Vibhu: And you also understand what the consequences would be if you do something, right? So it’s not just, okay, there’s one thing, and if I pick it up, something will happen.[00:23:19] There are 50 options, and I can infer what would happen if I do any of them, right? So it’s very different when you can actually play around with it.
[00:23:28] Beyond Unity: Cognitive Tools for World Building
[00:23:31] swyx: There are two cheeky elements to that. The less ambitious one: let’s really establish for listeners, why is this fundamentally different from writing Unity code,[00:23:40] that is, just creating a model to translate a prompt into Unity code?
[00:23:44] Fan-yun Sun: So there is an underlying physics engine. In that sense, there is some overlap with Unity, but the way we think about it is that physics engines, tools, or code are cognitive tools, borrowing Chris’s term, right? Tools [00:24:00] that the model can employ as means to an end.[00:24:04] So today maybe we say, okay, in this particular context we care about physics, we care about long-term causal consequences. Then yes, we employ a physics engine. And then maybe tomorrow we say, okay, we’re training, let’s say, drones, where we really only care about fluid dynamics and the visual aspect of the world.[00:24:25] Then maybe the model actually doesn’t have to use a physics engine. Or maybe it employs other types of representation or physics engine to achieve the task.
So yes, writing code for Unity is sort of like a tool that our model can employ, but our goal is for the model to take a representation-conditioned reasoning[00:24:46] approach or process,
[00:24:47] swyx: Yeah,
[00:24:47] Fan-yun Sun: internally.
[00:24:48] swyx: Yeah. Using these things as just general tool calls, right? Which I think is very interesting. The other, more ambitious one is some kind of recursive element where it becomes multiplayer, right? Here there’s a single-player element; you’re not [00:25:00] modeling any other people involved.[00:25:01] And that is a whole other thing.
[00:25:04] Fan-yun Sun: But in fact, we can really do multiplayer. Oh yeah, okay, I haven’t seen any double situations. You just prompt our model to say, hey, configure this for multiplayer, and then it’ll do it. You’ll be able to configure multiplayer,
[00:25:16] swyx: Great.
[00:25:17] Fan-yun Sun: a persistence database for you.[00:25:18] Easy.
[00:25:19] Vibhu: So what are some of the current limitations, and where are we at? There’s one approach: okay, scale up video predictors, and obviously there are data issues. With approaches like this, is it data constraints? What are the next steps? Is it real time? There’s one side of writing an agent to write Unity code, but okay, I want to be streaming a game in real time.[00:25:38] I want to have characters that are also agents. Where do we see this scaling up, right?
[00:25:44] Fan-yun Sun: Yeah, there’s definitely a data constraint. The more data, the better. This reasoning model can basically act like a human to operate a variety of tools and software to build whatever’s necessary.[00:25:57] And then there’s a sort [00:26:00] of fidelity constraint, which we’re actually solving with another model, which we can talk about later. It’s not as easy to get to photorealism with the approach that we’re taking.
But we think there are better solutions to that, which we can dive into later.
[00:26:15] Vibhu: One thing you note here is that it’s a diffusion model, right? There are a few approaches: diffusion, Gaussian splatting. So, the Rie diffusion model, do you guys want to
[00:26:25] Fan-yun Sun: Yeah.
[00:26:25] Vibhu: introduce it?
[00:26:26] Fan-yun Sun: Yeah, totally.
[00:26:26] Rie: Neural Rendering & Skins for Worlds
[00:26:26] Fan-yun Sun: So within our world modeling framework, there are two models that we train, right?[00:26:31] There’s the multimodal reasoning model that we just talked about, which mainly handles the causality, the persistence, and the logical determinism of the world. And then Rie is our bet. While that model can take care of all the things we just talked about, its limitation compared to existing, say, video models is that it doesn’t have as high a pixel [00:27:00] quality right out of the gate, right?[00:27:02] And Rie is there to say: hey, we can take whatever persistent representation we generate with our multimodal reasoning model and learn to restyle it into photorealistic styles or any arbitrary style you want. So this model is almost saying: I’m going to respect the persistence and interactivity of the world that you created, but my only job is to make sure that its pixel distribution is close to what we want.
[00:27:29] Vibhu: Yeah.
[00:27:30] swyx: Great example right there. You kept the KL divergence.
[00:27:33] Fan-yun Sun: Oh, where?
[00:27:34] swyx: No, no. I mean, this is a classic: how you don’t stray too far from the source material. You kept the KL, which is kind of cool.
Yeah.
[00:27:43] Fan-yun Sun: Yeah.
[00:27:44] swyx: I mean...
[00:27:44] Chris Manning: And the difference is, and Sun was pointing at this, that it’s in one way a more difficult path, but a better path. Typically, the diffusion models produce the whole scene and it looks lovely, [00:28:00] but there isn’t spatial understanding behind it, the kind that allows for real-time graphics, gameplay, spatial intelligence, and understanding the consequences of actions. This, by contrast, takes a path where it assumes an abstracted semantic model of the world’s state.[00:28:20] And then the diffusion model is used on top of that to produce the high-quality graphics.
[00:28:27] swyx: Is there an intended practical or business use for this, or is it a demonstration of capabilities?
[00:28:34] Fan-yun Sun: We actually believe that this is going to be the next paradigm of rendering. It’s going to replace the rasterizer, it’s going to replace DLSS today, because it has these pixel priors learned from the world, such that you can literally play any game in photorealistic styles, which is what a lot of people want when they play GTA, right?
[00:28:51] Vibhu: All the mods, all the people adding perfect lighting and all this.
[00:28:54] swyx: So
[00:28:54] Fan-yun Sun: Skins
[00:28:55] swyx: for worlds, let’s call it.
[00:28:56] Fan-yun Sun: Skins, let’s call it skins for worlds.
[00:28:58] Vibhu: You can call it skins, you can call it [00:29:00] customization. You can play it how you want, right?
[00:29:01] Fan-yun Sun: Yeah, exactly. And I think another thing we specifically pointed out in this blog is its programmability, right?[00:29:09] Historically, rendering is always a derivative of the game state, right? You’re saying, oh, here’s the game state, I’m rendering out a frame.
But here I’m saying that this renderer can actually be part of the gameplay loop. I can say something along the lines of: upon getting 10[00:29:26] apples, my weapon of choice, my bullets, are going to turn into apples. And that’s possible because we can dynamically have certain game states trigger the preconditions to the renderer, such that rendering is now part of the game loop too. One thing is to say, okay, it’s the appearance.[00:29:47] But the second thing is to say there are novel interactions that become possible because this renderer actually has priors of the world.
[00:29:57] swyx: It is up to the artist to figure out what to do with it.
[00:29:59] Fan-yun Sun: It [00:30:00] is up to the creators, yes.
[00:30:01] swyx: Yeah.
[00:30:01] Fan-yun Sun: And I also think that’s another big argument we’re making, and the reason we’re taking the bet we’re making: a lot of the time, whether it’s for embodied AI or gaming, you want a layer where humans can inject their intentions.[00:30:15] For example, in the context of gaming, it’s obviously my creative intent, but in the context of embodied AI, it’s like: I take this foundational policy and I want to fine-tune it to deploy in my house. So you want a layer where humans can say: here’s the distribution of things I want to create to achieve my goal.[00:30:35] And I think 3D graphics, as it is today, is basically the layer for people to say: hey, what do I care about in this world? It allows human intent to be expressed in these worlds much more explicitly and distributionally, as opposed to just saying, hey, I’m going to generate something arbitrary[00:30:54] from just prompts.
[00:30:55] swyx: It’s one of those things where, I think, you’re going to build up a series of models, right?
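Fan-yun's apples example describes a renderer whose conditioning depends on game state, instead of one that only reads the state out frame by frame. A minimal sketch of that control flow, with loudly hypothetical pieces: the `RenderRule` format, the second (health) rule, and `restyle`, which is a stand-in that just reports its conditioning rather than calling any real model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RenderRule:
    """Hypothetical rule: when `condition` holds, add `style_hint` to the render request."""

    condition: Callable[[dict], bool]
    style_hint: str


# Illustrative rules only; the health rule is invented for this sketch.
RULES = [
    RenderRule(lambda s: s.get("apples", 0) >= 10, "bullets look like apples"),
    RenderRule(lambda s: s.get("health", 100) < 20, "desaturated, red vignette"),
]


def build_render_request(game_state, base_style="photorealistic"):
    """Collect this frame's style conditioning from the current game state."""
    hints = [rule.style_hint for rule in RULES if rule.condition(game_state)]
    return {"frame_state": game_state, "style": [base_style, *hints]}


def restyle(request):
    """Stand-in for a learned restyling model; here it only reports its conditioning."""
    return " + ".join(request["style"])


print(restyle(build_render_request({"apples": 3, "health": 80})))
# photorealistic
print(restyle(build_render_request({"apples": 10, "health": 80})))
# photorealistic + bullets look like apples
```

The only design point being illustrated is the direction of dependency: game state decides what the renderer is asked to produce each frame, so a rule firing mid-game changes the look of the world.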
[00:31:00] This is just one of them, probably the highest-utility or highest-frequency one, I don’t know what to call it, where you can immediately drop this into any game and you don’t need anything else[00:31:10] that you guys do. But I could see... I think human intent is something people are not even used to, because we’re so used to static worlds, or worlds that just don’t react. It’s kind of blowing my mind right now. I wonder if you’ve talked to people at GDC,[00:31:27] and what they’re going to do with it.
[00:31:30] Fan-yun Sun: Yeah. The stance we take on this front is that we’re not going to be more creative than our users.
[00:31:35] swyx: Ship it out.
[00:31:35] Fan-yun Sun: Yeah. But we want to make sure we’re building things in a way that really allows them to express their intent.
[00:31:41] swyx: The thing you said about “here’s the distribution that I want”:[00:31:45] I think text may be too low-bandwidth to really convey that. I’m probably just going to want to drop in a bunch of reference assets, and then you can figure it out from there.
[00:31:58] Vibhu: But you probably want to do a mixture of [00:32:00] both, right? You throw in a few images: I want this style.[00:32:02] Yeah, I want it to look like this. So it’s a mixture, right?
[00:32:05] Chris Manning: I think it’s a mixture. I mean, there’s clearly a visual component to this, and it’s not that everything can be text, ‘cause of course you want to give a visual look. But there’s also a massive amount about the overall look of the world and the behavior of things that you can express in a few words of text[00:32:32] and that would be very time-consuming and difficult to do via visual means.
So I think, yeah, you want a combination of both.[00:32:40] Evaluating World Models[00:32:40] Vibhu: So one question I have is, how do we go about evaluating world models? There are many axes, right? One is, okay, I have preferences: how well do we adhere to prompts? One is the simulation.[00:32:50] One is, is there core logic that’s broken? Coming from diffusion, we know how to evaluate it: there’s fidelity, there’s [00:33:00] stuff like that. But what are some of the challenges that most people probably aren’t thinking about?[00:33:04] Fan-yun Sun: Yeah, I think this is a great question and probably one of the hardest questions in world models, because it always comes back to: what are you building this world model for?[00:33:13] Depending on your end goal and purpose, the evaluation should differ. In the context of games, the most direct measure is how much time people actually spend in the world that you create. And if your goal is, for example, in the context we just talked about, deploying an embodied AI agent, then your end[00:33:33] metric is: after training in these worlds that you generate, how robust is the agent when you actually deploy it to the target environment? But these end metrics are hard to measure. So today people have what I call proxy metrics, which basically try to measure what we really care about, the end metrics, but frankly they’re different for every use case.[00:33:57] Yeah.[00:33:57] Vibhu: Which seems like quite a challenge, right? Like, in [00:34:00] language models, video models, image models, your benchmarks are proxies, right? People aren’t actually asking instruction-following or tool-use benchmark questions; the benchmarks are proxies for how well the model will do downstream.
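One common way to sanity-check a proxy metric of the kind Fan-yun describes is to see whether it ranks candidate models in the same order as the end metric, on the few cases where both can be measured. The sketch below computes Spearman rank correlation in plain Python; it is a generic technique, not something the guests say they use, and the checkpoint scores in the usage block are made-up placeholders rather than real results.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either input is constant.
    return cov / math.sqrt(var_x * var_y)


def spearman(proxy: Sequence[float], end_metric: Sequence[float]) -> float:
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(proxy), ranks(end_metric))


if __name__ == "__main__":
    # Placeholder numbers for four hypothetical checkpoints.
    proxy_scores = [0.61, 0.72, 0.70, 0.85]        # e.g. an automated consistency score
    deployment_success = [0.40, 0.55, 0.58, 0.71]  # e.g. success rate in the target environment
    print(round(spearman(proxy_scores, deployment_success), 3))
```

A correlation near 1 suggests the proxy orders models the way the end metric would; a low or unstable value is a hint that the proxy is measuring something else.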
But for this, should teams, should companies have their own individual benchmarks outside of games?[00:34:16] If you think of stuff like video production, movies, stuff like that, that also want to use world models, should they sort of internalize their own proxy? Is this something you guys do? Where does that go?[00:34:28] Chris Manning: Yeah, I think this whole space is extremely difficult as things are emerging now.[00:34:35] And it’s not only for world models; I think it’s for everything, including text-based models, right? ’Cause in the early days it seemed very easy to have good benchmarks, ’cause we could do things like question-answering benchmarks: could you answer the question based on these documents, and various other kinds of pieces of logical reasoning or math.[00:34:58] But again, these are sort of [00:35:00] small component tasks, and there were visual equivalents, like object recognition, right? These days, so much of what people want to do with language models is nothing like that, right? You want to have an interaction with the language model and get some recommendations about which backpack would be best for your trip to Europe next month.[00:35:25] And it’s not the same kind of thing, right? It’s not so easy to come up with a benchmark for whether a large language model gives you an effective interaction that guides you well in shopping, right? And it’s the same problem with these world models. If we take the game design case, well, success is that a game designer can[00:35:57] produce what they are [00:36:00] imagining in a reasonable amount of time. That’s really the macro task. That’s a very hard thing to turn into a benchmark, and I think a lot of this is actually going to come down to people voting with their feet, right?
I mean, I guess that’s what’s happening at the large language model level, right?[00:36:23] When people are choosing to use GPT-5 or Gemini or Claude, individuals are trying out these different models and deciding, oh, I like the kind of answers that GPT-5 gives me, or no, I feel like I get more accurate detail from Claude, right?[00:36:43] Vibhu: It’s a lot of vibe checks.[00:36:43] Chris Manning: A lot of people just using it.[00:36:45] It’s vibe checking, I realize that, but it’s actually whether people feel it’s giving them utility for what they want. Right.[00:36:52] Vibhu: And the interesting thing there is that a lot of people prefer the visual, right? This looks pretty, which is not the objective of what this is [00:37:00] for, right? If a game designer is working on something, they care about the game engine, right?[00:37:04] The state. It can look however; you can fix that up later. Or you can have a really good game state and quickly edit it into 20 different versions, like, keep the state,[00:37:14] Chris Manning: right?[00:37:14] Vibhu: So.[00:37:14] Chris Manning: That’s a really important distinction, and speaking to Moon Lake’s strength, right? Great visuals are lovely to look at for a few seconds, but games are really all about the concept, the gameplay.[00:37:33] And a lot of the time that doesn’t even require great visuals. There are lots of very successful games which have relatively primitive visuals, and there are other games where people have spent millions producing photorealistic visuals, and the game sucks, right? So keeping those two axes apart is really important in thinking about what matters in a [00:38:00] world model for different uses.[00:38:02] swyx: This conversation is reminding me of some game-review and fiction discussions I’ve had in my non-AI-related life.
Some people might know Brandon Sanderson, a very famous fiction author, who is also a big game reviewer. He’s a big fan of video games where you change one thing about what you might normally assume about the world.[00:38:22] For example, Baba Is You, I don’t know if you’ve come across that, where the rules change as you play the game. And also games where you can do things like reverse time selectively or change gravity selectively. And this also reminds me of other kinds of world models that are created by authors,[00:38:38] where Ted Chiang is my typical example: he’ll take the world that you know today, change one thing about it, and then create a consistent world based on that. Which is a long-winded way for me to ask: is it easy to create alternative worlds that don’t exist, where you change one thing, and then run a whole bunch of people through it to see if it works?[00:38:58] Chris Manning: My first answer would [00:39:00] be that it seems a lot easier and more conceivable to do using technology like Moon Lake’s than with some of the other world models out there, and Sun can actually make it happen. I’ll let him give a second answer.[00:39:15] swyx: I guess for you, you’re constrained by the game engine tool, right?[00:39:18] At the end of the day, that’s the thought partner that you have. If the engine never allows reversing time, or gravity only ever works one way, then, well, that’s it.
But sometimes gravity might change,[00:39:33] Fan-yun Sun: but it’s a lot easier to change with code, as opposed to a model that is learned primarily on data of[00:39:42] real and virtual worlds. For example, Genie is actually trained on a lot of real-world data and a lot of virtual gaming data, and with it maybe it’s easier to say, okay, I wanna change the visuals or the time period of the world, but you can’t change gravity, for [00:40:00] example.[00:40:00] Vibhu: I feel like you can, within limits, right? Everything comes down to the fact that code is a better way to execute it, but the models aren’t that diverse and creative, right? You can say, okay, make gravity slower. It can do that, but it’s limited to how you describe it in text, right? They’re only gonna do a few iterations, whereas programmatically, if there’s a game engine under the hood, you can kind of go wild, right?[00:40:22] So one of the limitations of most models is that they’re very overtrained to one style, right? And extracting diversity is pretty difficult. At least that’s something we’ve seen.[00:40:35] Fan-yun Sun: I mean, are there examples you have in mind where it would be easier to do with existing models, not using code?[00:40:43] Certain types of creative intent, or state transitions?[00:40:47] swyx: Clipping. Other world models are very good at clipping through things: my legs clipping through a rock, because it’s just bad. [00:41:00] You would have to struggle very hard with your stuff to actually make that happen.[00:41:04] Which I think is maybe a topic that you actually prepared on: Gaussian splatting versus the other stuff.[00:41:09] Vibhu: Yeah. Just for those not super familiar, right? There’s Gaussian splatting, there’s diffusion. What works, what scales up?
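To make Fan-yun’s point concrete (a rule is easier to change in code than in a model trained on real-world footage), here is a toy sketch where gravity is just an argument to a hypothetical physics step. It uses semi-implicit Euler integration and is not meant to describe how any particular game engine is implemented.

```python
from dataclasses import dataclass


@dataclass
class Body:
    height_m: float
    velocity_mps: float  # positive means moving upward


def step(body: Body, gravity_mps2: float, dt_s: float) -> Body:
    """One semi-implicit Euler step: update velocity first, then position."""
    velocity = body.velocity_mps - gravity_mps2 * dt_s
    return Body(body.height_m + velocity * dt_s, velocity)


def time_to_ground(height_m: float, gravity_mps2: float, dt_s: float = 0.001) -> float:
    """Drop a body from rest and return the simulated time until it reaches the ground."""
    if gravity_mps2 <= 0:
        raise ValueError("with zero or upward gravity the body never lands")
    body, t = Body(height_m, 0.0), 0.0
    while body.height_m > 0:
        body = step(body, gravity_mps2, dt_s)
        t += dt_s
    return t


if __name__ == "__main__":
    # Changing a rule of the world is a one-argument change.
    print(time_to_ground(10.0, gravity_mps2=9.81))  # Earth-like
    print(time_to_ground(10.0, gravity_mps2=1.62))  # Moon-like: "make gravity slower"
```

A learned video model has to infer what "slower gravity" looks like from its training data, whereas here the rule is explicit and can be set to any value, including ones no footage covers.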
I feel like in February, when Sora 1 came out, the blog post was literally titled,[00:41:21] swyx: You bring it up.[00:41:22] You never know.[00:41:23] Vibhu: "Video generation models as world simulators." It’s super bitter-lesson-pilled. Yeah, a lot of it is emergence, right? Not to go through their blog post, but basically their whole thing was that as you scale up, all this consistency, all this stuff, just kind of gets solved. It’s a very simple premise, right?[00:41:41] They just scaled up diffusion, and this was February 2024. It’s already been two years, which is basically five years in AI time. How much more do we need to just scale up, or do we hit a data cap? But I think we already talked about this a lot, right? This is back to the discussion at the beginning about what’s [00:42:00] appropriate for the time.[00:42:01] And that seems like your approach, right?[00:42:03] Fan-yun Sun: Yeah. The point I’m trying to make is that there are many, many different types of world simulators, and having a world simulator that can produce pixel coherency is very, very useful for games and marketing and all these things, but it’s not as useful as people think when it comes to causal reasoning,[00:42:25] when it comes to embodied AI. This title is true; we’re not saying it’s not a great world simulator. But in the blog that we wrote, the bet is more that there’s going to be a disproportionately large share of value in real-world tasks and virtual tasks where high-resolution pixel fidelity is not needed.[00:42:47] Yes, video models have their value.[00:42:50] swyx: Yeah. This is at the absolute limit of my physics understanding, but one example that comes to mind is basically having to solve the equivalent of a three-body [00:43:00] problem in a deterministic world, where the video models just approximate it well enough. Yeah.[00:43:08] Right.
Like, there’s some point at which your approach kind of runs into: you now have to simulate the world, please, thank you very much. And you’re trying to do that, but only to the extent that the game engine lets you, and game engines cannot do some things.[00:43:23] Fan-yun Sun: Yeah, no, I think the more interesting, or more technical, question here is where you draw the boundary between[00:43:32] what’s handled with, let’s say, diffusion priors and what’s handled with symbolic priors.[00:43:38] swyx: Yes.[00:43:38] Fan-yun Sun: Okay.[00:43:38] swyx: Okay.[00:43:39] Fan-yun Sun: Right. Let’s go there, because this boundary can actually be fluid. I think maybe what you’re trying to get at is, okay, people are saying pixel priors for everything. But what we’re saying is that there’s a boundary we draw where we think it provides the most economic value for the domains and things that we care about today.[00:43:59] [00:44:00] And it’s actually something that we do internally all the time: okay, given new equations that we learn, or new elements of the world that we learn, or some other knowledge that we acquire in the process of developing the models, should we still be maintaining this line exactly as it is today?[00:44:22] Or should we move it a little bit left or a little bit right? Sometimes we realize that, oh, maybe customers or folks want certain things that are better handled with pixel priors as opposed to symbolic priors.[00:44:34] swyx: Yeah. Your skin thing is an example of moving it right.[00:44:37] Yeah.[00:44:37] Or left. Yeah.[00:44:37] Fan-yun Sun: Exactly.[00:44:38] swyx: I don’t know what the left and right are.[00:44:39] Fan-yun Sun: Yeah, yeah, yeah. No, the model.[00:44:42] swyx: Yes.[00:44:42] Fan-yun Sun: Actually, we have a few iterations of them.
They’re actually at slightly different boundaries.[00:44:45] swyx: I know. You should do that. That’s a cool dimension to show.[00:44:49] Fan-yun Sun: Yeah.[00:44:50] swyx: Is quantum mechanics the diffusion prior of our world?[00:44:55] Right. That’s the boundary of classical mechanics versus quantum, right? At one [00:45:00] point God plays dice, and at the other point he doesn’t.[00:45:02] Fan-yun Sun: I don’t know if Chris wants to say it, but I think generally physics is better with symbolic priors.[00:45:08] Chris Manning: Even quantum physics.[00:45:09] Fan-yun Sun: Even quantum physics.[00:45:11] swyx: Yeah. This is starting to get into MLST territory, as I call it, where he likes to get philosophical. We’re quite friendly.[00:45:18] Vibhu: I mean, we need to get to the singularity. I heard some of that.[00:45:23] swyx: No, I think that is actually really helpful, and man, I just want you to productize this. As a product guy, I’m just like, oh,[00:45:32] Vibhu: Also a gamer.[00:45:33] swyx: I wanna... As a researcher, it’s cool.[00:45:35] You have a very good way of thinking about these things, but I just wanna see you express it. I do think that fundamentally, when you open up new tools, like, okay, use human intent and incorporate it into how you render,[00:45:52] artists are gonna take two to three years to figure out what to do with this. And you just don’t know.[00:45:57] Chris Manning: Right. But I think this [00:46:00] gives a much more approachable and controllable world for society, which is the beauty of NLP, and that will enable it to be adopted and used.[00:46:10] And we are very hopeful about that.[00:46:13] Fan-yun Sun: Yeah, yeah. Yeah.
I mean, we are very focused on commercialization, in the sense that we really believe in the data flywheel approach, where we put this in the hands of the creators and the users, and then they teach us which capabilities our model should improve.[00:46:27] And that’s why we actually have products in beta.[00:46:31] swyx: Yeah. Focusing on gaming. What’s the adjacent thing to gaming?[00:46:34] Fan-yun Sun: Embodied, basically. Maybe I’ll start with where we see the platform in three years. Yeah. The users would tell us what they want to achieve.[00:46:45] The end goal could be, hey, I wanna make something to teach my kids the value of humility. Or it could be, hey, I wanna fine-tune my drones to be really good at rescue situations. It could be vacuum robots: I want to train [00:47:00] my manipulation robot or vacuum robot to be very robust to my office, right?[00:47:04] Whatever the scenario is.[00:47:06] swyx: Robust to my office?[00:47:07] Fan-yun Sun: Or, like, navigate very robustly in my office. But whatever end goal you want, our world model will say, okay, given what you want to achieve, let me generate a distribution of environments such that I can train and evaluate whatever it is you want.[00:47:24] Yeah. Right. Maybe for the purpose of games, the simulation is the end product. For certain policies, it’s: I can train within these environments and then help you see where your policy is failing or not. Yeah. And so, I think,[00:47:37] swyx: So in that case, it’s much more of a training tool[00:47:40] than anything else.[00:47:41] Vibhu: Training, evaluation? Both, right?[00:47:43] swyx: Sure. Same thing.[00:47:43] Fan-yun Sun: Yeah, same thing.
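The three-year picture Fan-yun sketches, where a world model turns a stated goal into a distribution of environments for training and evaluation, resembles domain randomization. Below is a minimal sketch of the evaluation half, assuming each environment can be summarized by a few parameters; `EnvConfig`, `sample_env`, `find_failures`, and `toy_policy` are hypothetical stand-ins, and in the system described the world model would generate full environments rather than three numbers.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EnvConfig:
    # Hypothetical knobs a generator might vary when producing office layouts.
    room_width_m: float
    clutter_objects: int
    floor_friction: float


def sample_env(rng: random.Random) -> EnvConfig:
    """Draw one environment from an assumed distribution over office conditions."""
    return EnvConfig(
        room_width_m=rng.uniform(3.0, 12.0),
        clutter_objects=rng.randint(0, 20),
        floor_friction=rng.uniform(0.2, 1.0),
    )


def find_failures(policy: Callable[[EnvConfig], bool], n_envs: int, seed: int = 0) -> List[EnvConfig]:
    """Run policy on n_envs sampled environments and return the ones where it failed."""
    rng = random.Random(seed)  # seeded so the evaluation set is reproducible
    failures = []
    for _ in range(n_envs):
        env = sample_env(rng)
        if not policy(env):
            failures.append(env)
    return failures


def toy_policy(env: EnvConfig) -> bool:
    # Stand-in for a real robot policy: it struggles in cluttered, slippery rooms.
    return not (env.clutter_objects > 15 and env.floor_friction < 0.4)


if __name__ == "__main__":
    failures = find_failures(toy_policy, n_envs=1000)
    print(f"{len(failures)} / 1000 sampled environments failed")
```

The returned failure cases are what "help you see where your policy is failing" would look like at its simplest; the same sampled environments could also be used for training.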
I think it’s just this world model that allows people to train any policy that can act in any multimodal environment.[00:47:51] swyx: Would it be harder to reward-hack? Is there an angle here where it is harder to reward-hack? I’ll just put it generally, because I think that’s obviously a key [00:48:00] problem that a lot of people face when training agents in these environments, and I don’t know, can you solve it?[00:48:07] Chris Manning: I think not necessarily. To the extent that there’s a misspecified reward, it seems like it could be hacked in a more symbolic world or in a more pixel-based world. I don’t know if Sun’s got any thoughts, but I don’t think that’s really being solved.[00:48:26] swyx: The other thing that comes to mind is that you could just build a better Sora, as a video generator model, right?[00:48:31] Because then you would move the diffusion side a bit further to the right, if I got the directionality correct. And that’s it.[00:48:40] Vibhu: It’s better in some domains, right? Like consistency over time, or whether something exists or doesn’t, right?[00:48:46] Chris Manning: So[00:48:46] swyx: Yeah. Yeah.[00:48:49] Vibhu: Is the question more like...[00:48:51] swyx: I’m just riffing on what you can build, you know,[00:48:54] with the stuff that you have. I do think that the academic mind does go immediately to training [00:49:00] and evaluation, but art tends to take unusual directions. Like, you might end up...[00:49:06] Chris Manning: Okay, yeah. But the question is, can you use this piece of software to develop compelling gameplay? And I don’t think you can take Sora and produce compelling gameplay, right?[00:49:19] If you want to have a world that you can wander around in a bit, you are good.
But what are your abilities to have gameplay mechanics implemented the way you’d like them to be, and to have things stay consistent with the long-term history of your gameplay so that it influences future actions? I think there’s just nothing there for that.[00:49:39] swyx: Yeah, I do tend to agree. I’m just trying to test the boundaries. I would also make the observation that as the AAA games industry has developed, the line between what is a movie and what is a game has blurred. You do end up basically producing a two-hour movie as part of your game.[00:49:57] Fan-yun Sun: No, honestly, there are so many [00:50:00] applications in adjacent markets that our world model can go into. Yeah. It’s fun to riff on. Although on the execution side, we need to stay focused: okay, what are the capabilities we want to unlock over time?[00:50:11] And there’s a roadmap for that. But if we’re just riffing on the possibilities, I feel like it’s endless. Yeah, it’s like classic[00:50:18] swyx: And in the embedding space, "possibility" and "endless" are very close, in my mind. Yeah. I do wanna focus on one weird choice, I don’t know if it’s weird.[00:50:28] Maybe I’m onto something here: audio, right? You could have just said no audio. And audio in my mind has a lot of recursion, whereas in video you can just do ray casting, and that’s computationally much simpler. Audio just seems way harder. I don’t know if you wanna comment on the spatial 3D audio[00:50:46] problem. Did you really have to do it? I guess you do, to be immersive, but a lot of people treat it as, well, you just stick a TTS model on top of it.[00:50:57] Vibhu: Well, there’s a lot more to game audio than [00:51:00] just speech, right? It’s not just TTS.[00:51:01] swyx: Yeah. TTS,
SFX, BGM, spatial. In my mind, echoes[00:51:06] Chris Manning: Yeah.[00:51:06] swyx: and reflections.[00:51:07] And I don’t even know what else, what other problems there are in this space.[00:51:13] Fan-yun Sun: Yeah, I think this point is pointing more to the benefits of using a game engine as a tool that’s available to the model, right? Because part of the spatial audio comes from the code that underlies the simulation.[00:51:32] And while we do give our model access to other types of audio models as tools,[00:51:39] swyx: None of them would be spatial, I think.[00:51:41] Fan-yun Sun: But that’s exactly my point. We’re giving our model an abstraction, a suite of tools, such that it’s able to achieve that. And you can argue that spatial audio is something that emerges out of the tools and abstractions that we provide to the agents.[00:51:59] And I think that’s the beauty of [00:52:00] this approach: a lot of things are kind of like how humans built technology, like Lego blocks that build on top of each other. It’s the same thing here. There are gonna be things that just emerge from being able to put these things together in combinatorially interesting ways.[00:52:14] Chris Manning: Right.[00:52:15] So this integrated audio model exploits the understanding and semantics of the Moon Lake world, right? Whereas in general, for the gen AI video models, there’s no actual integration with audio at all, right? Someone might stick some music or a soundscape or whatever else on top of their video,[00:52:44] so it’s not a silent video, but the two are in no way connected into a consistent world model. There’s nothing that says, okay, an action is happening in the video.
Therefore there should be a sound that’s [00:53:00] coming from this part of the visual field.[00:53:03] swyx: Yeah.[00:53:03] Vibhu: Is that different from Sora 2? Does it not have audio?[00:53:06] Not to say it’s not, like,[00:53:08] swyx: amazing.[00:53:08] Vibhu: Isn’t it spatial[00:53:09] swyx: audio?[00:53:09] Vibhu: It doesn’t?[00:53:10] swyx: No. I’ve played around with it enough. It just sounds like someone put an ElevenLabs voice on top of it and tried to do the lip sync.[00:53:18] Vibhu: Oh, yeah. I’ve seen it: okay, generate a dog at the beach, and it reacts to a big wave and moves[00:53:23] swyx: around.[00:53:23] It’s definitely... So have the dog move away from the camera and see if the sound goes down. It doesn’t, ’cause they don’t have spatial audio.[00:53:32] Fan-yun Sun: Basically, the world model we’re training is working toward having a combined latent representation across all these different modalities,[00:53:42] right? Such that it can reason across these different modalities. For example, if I close my eyes and you play a sound of a car skidding away from me, I can almost visually extrapolate that trajectory in my mind. That type of capability is what we want our model to be able to reason about, right?[00:53:59] And that’s the reason [00:54:00] we’re taking this multimodal reasoning approach. We want this combined latent space that can[00:54:05] swyx: Yeah. Oh, you said latent space. We like that. We have to play the bell every time someone says latent space. No, you gotta train a Daredevil one, where it’s only audio, but you have to work out[00:54:15] where everything is.[00:54:19] Cool. I think that was about it for our Moon Lake coverage.
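swyx’s test (the dog walks away and the volume should drop) is straightforward once the engine knows where the emitter and the listener are. A minimal sketch, assuming a 2D scene and a simple stereo model: inverse-distance attenuation plus an equal-power pan law. Real engines layer on much more (HRTFs, occlusion, reverb for the echoes and reflections swyx mentions), and `spatialize` is a hypothetical helper, not Moonlake’s implementation.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def spatialize(source: Vec2, listener: Vec2, facing_rad: float,
               ref_distance: float = 1.0) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for a sound source heard by a listener.

    facing_rad is the direction the listener faces, measured counterclockwise
    from the +x axis (so pi/2 means facing +y).
    """
    dx, dy = source[0] - listener[0], source[1] - listener[1]
    distance = math.hypot(dx, dy)
    # Inverse-distance attenuation, clamped so very close sources are not boosted above 1.
    gain = ref_distance / max(distance, ref_distance)
    # Angle of the source relative to where the listener is facing.
    azimuth = math.atan2(dy, dx) - facing_rad
    # pan = +1 for a source directly to the right, -1 directly to the left.
    pan = -math.sin(azimuth)
    # Equal-power panning keeps perceived loudness steady as a source moves sideways.
    angle = (pan + 1.0) * math.pi / 4.0
    return gain * math.cos(angle), gain * math.sin(angle)


if __name__ == "__main__":
    # The dog walks straight away from a camera at the origin that faces +y.
    for y in (1.0, 2.0, 4.0, 8.0):
        left, right = spatialize((0.0, y), (0.0, 0.0), facing_rad=math.pi / 2)
        print(f"distance {y:>4}: left={left:.3f} right={right:.3f}")
```

Because positions come from the simulation state, the sound follows the dog automatically; a model that overlays audio on finished pixels has no such state to read from.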
I do think that we have a couple of Chris Manning questions on IR, and any other attention topics or NLP topics.[00:54:31] Vibhu: Okay.[00:54:31] swyx: Go ahead.[00:54:32] Chris Manning’s Journey: From NLP to World Models[00:54:32] Vibhu: Well, no, I mean, yeah, it’s just fun. We talked a bit about how you guys met, but you were basically the godfather of NLP, right?[00:54:39] You spent your whole career on it, from early embeddings to early attention. You did 2015 attention for machine translation, everything. You had information retrieval, so RAG before RAG; we just wanna shout that out, and we admire a lot of that. So what prompted the switch over to world models?[00:54:56] How did all that come about?[00:54:58] Chris Manning: To some extent, the answer [00:55:00] is the enthusiasm and creativity of students, but there’s a bit of a history there, right? Clearly most of my career has been doing stuff with language, and how I got into research was thinking, ah, it’s just so amazing how humans can produce speech and understand each other in real time.[00:55:21] And somehow they manage to learn languages as kids. How could this possibly happen? So, starting off, I was very focused on language, but as we got into the 2010s, I’d been working on question answering, and then I started to get interested in visual question answering.[00:55:42] And that was an area where it was very noticeable that the visual understanding was bad, right? These were the days when it seemed like there was almost no visual [00:56:00] understanding. You were just getting answers that came from priors. So if you asked how many people are sitting at the table, it’d always answer two, regardless of how many people you could see in the picture.[00:56:11] And so it seemed like, oh, these models actually aren’t able to get semantic information out of images.
And so I was interested in that problem and tried to work more on it. And that required knowing more about what’s happening in vision and how you can represent visual information.[00:56:34] And then there started to be this revolution of generative AI images, and I had students that started looking at that before the era of Moon Lake. I was also working with Demi Guo, who founded Pika. And so, and[00:56:50] swyx: Ian, obviously,[00:56:52] Chris Manning: with GANs. Yeah. Though Ian was never my student, I was very aware, for that whole decade, of Ian with GANs.[00:56:59] [00:57:00] Yeah. And I mean, Ian was a Stanford undergrad, but yeah.[00:57:03] Vibhu: Richard Socher of You.com, I believe he was your student.[00:57:06] Chris Manning: Yeah. And there were links across at that stage as well. There were several papers in that era; Andrej Karpathy was a PhD student at the same time as Richard.[00:57:20] And so there was some joint language-vision work in that era as well. It seems kind of ancient by modern standards, but we were trying to go from textual dependency graphs to visual scenes[00:57:32] Vibhu: at the time. GloVe embeddings really took over from a lot of TF-IDF, one-hot encoding, all that.[00:57:38] The early vision-language models we saw were LLaVA-style adapters, right? It’s technically still just embedding into a latent space: let’s add images, let’s do mixed modality. And that’s one of the things you put out there too, right?[00:57:51] swyx: Yeah.[00:57:51] Vibhu: Yeah.[00:57:52] swyx: Yeah.[00:57:52] Hiring, Closing & The Name “Moon Lake”[00:57:55] swyx: Well, thank you for all of that, and thank you for advancing the world on world modeling.[00:57:56] I honestly do think that if people deeply understand everything we just [00:58:00] covered, they will see what’s coming.
I think you guys have made a really significant contribution here. What are you hiring for? Who are you looking for? We agreed that the CTA was a hiring call.[00:58:10] Yeah. Don’t we have AGI? You don’t need engineers anymore, right?[00:58:14] Fan-yun Sun: Yeah. On the model side, we are actually striving towards a self-improving system. But what that means is that we need people to set up the self-improving system. More specifically, people who have knowledge at the intersection of code generation, computer vision, and graphics, right?[00:58:30] Yeah. That’s the core research background that we look for on our team, and the majority of the team today does have both backgrounds.[00:58:38] swyx: When you say computer vision and graphics, are they the same thing, or is computer vision one thing and graphics another? And how intertwined are they?[00:58:46] Chris Manning: They’re intertwined but different.[00:58:49] swyx: Yeah.[00:58:49] Chris Manning: And I think this relates to some of the themes we’ve been talking about: the more explicit underlying [00:59:00] world models that are being constructed inside Moon Lake really draw on the computer graphics tradition. And then it’s combining that with the visual understanding from vision.[00:59:16] swyx: Got it. Yeah. All right. So if you’ve written a game engine, come talk to us, right?[00:59:21] Fan-yun Sun: Oh yeah, definitely. But I do think the line is increasingly blurred these days, if you have a general understanding of computer vision and graphics.[00:59:31] swyx: I think by your standards it is. For me, it feels like vision,[00:59:35] I’ll leave that to the big labs. Graphics,
I can get that you would want to do that from more first principles, but for vision there are so many off-the-shelf models that I can take. They’re probably not good enough for you, though.[00:59:45] Fan-yun Sun: I see. If you’re making that distinction, then maybe we care a little bit more about having graphics[00:59:51] swyx: knowledge.[00:59:51] Yeah, exactly.[00:59:52] Sometimes a hiring call can be as simple as: if you know the answer to blah, you should talk to me. Like the core known hard [01:00:00] problem in your world.[01:00:01] Fan-yun Sun: Ah, I see. Yeah. In that case, definitely: if you’ve written a game engine before, if you’ve RL’d a variety of coding models on different objectives, like[01:00:13] swyx: Easy.[01:00:13] Many of those, yeah.[01:00:14] Fan-yun Sun: If you’ve done multimodal latent space alignment, I intentionally include[01:00:20] swyx: latent space[01:00:20] Fan-yun Sun: again.[01:00:21] swyx: Our poor editor has a thing every time. Yeah. Latent space alignment. Honestly, is it that hard?[01:00:26] There are some scripts out there that I’ve saved for the day I have to do it, but I don’t have to do it yet.[01:00:31] But it’s[01:00:32] Fan-yun Sun: been done, I think. Yeah, there are versions of that that have been done. But we are aligning audio, text, language, and video, right? And basically we have these world models that are able to act as agents in these worlds, extract long-horizon videos, and encode that back into the model to self-improve.[01:00:52] So it’s an insanely exciting, but also technically challenging, problem. Yeah. So for people who wanna do their life’s best work, this is the [01:01:00] place.[01:01:01] Vibhu: How big are you guys? Where are you based?[01:01:02] Fan-yun Sun: We’re currently based in San Mateo, although we’re moving up to SF.
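Fan-yun lists multimodal latent-space alignment across audio, text, and video as a core skill for the team. One widely used recipe for aligning two modalities is a CLIP-style symmetric contrastive (InfoNCE) loss: paired items are pulled together in the shared space and unpaired items pushed apart. The plain-Python sketch below illustrates that general technique for two modalities; it is not a description of how Moonlake aligns its modalities, and real training would use a tensor library and learned encoders rather than hand-written vectors.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits: Matrix) -> float:
    """Mean cross-entropy where the correct 'class' for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(emb_a: Sequence[Sequence[float]], emb_b: Sequence[Sequence[float]],
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: emb_a[i] (e.g. an audio clip) is paired with emb_b[i] (e.g. its video frame)."""
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    # Cosine similarities scaled by temperature; rows index modality A, columns modality B.
    logits = [[sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b] for ai in a]
    transposed = [list(col) for col in zip(*logits)]
    # Average the A-to-B and B-to-A directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    video_aligned = [[0.9, 0.1], [0.1, 0.9]]
    video_swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(info_nce(audio, video_aligned))  # pairs already agree, so the loss is small
    print(info_nce(audio, video_swapped))  # pairs disagree, so the loss is large
```

Minimizing this loss over many pairs is what pushes, say, the sound of a skidding car and the video of that car toward nearby points in one shared latent space.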
We're about 18 folks right now.[01:01:08] swyx: My ending question was gonna be: what is the name,[01:01:10] what's behind the name?[01:01:11] Vibhu: Yeah.[01:01:12] Fan-yun Sun: Oh,[01:01:14] Vibhu: Very cool. Graphics and design, by the way.[01:01:16] Fan-yun Sun: Actually, at the time when we started the company, we were thinking a lot about how to make a company name that gives people the vibe of OpenAI, but almost with Industrial Light & Magic vibes.[01:01:28] Wow. Because we care about creativity and using that as a funnel to solve AGI. So we brainstormed a lot around DreamWorks, right? Industrial Light & Magic. And so there's basically a space of things that we feel are very, very semantically close to the company's identity.[01:01:47] swyx: Yeah.[01:01:48] Fan-yun Sun: And then it ended up being Moon Lake, partly because of the DreamWorks vibe, the DreamWorks moon[01:01:54] swyx: Lake.[01:01:55] Fan-yun Sun: Exactly. Yep. So that was a little bit of the inspiration. And then the moon was sort of [01:02:00] about the reflection. The reflection part also implies the self-improvement loop.[01:02:07] Wow. That's what we really believe is the path towards multimodal general intelligence. So that's that. I'll leave it at that.[01:02:15] swyx: I love a good name. I love a good name. This is great.[01:02:16] Vibhu: It's a very[01:02:17] swyx: good name. It's very good. I'm glad I asked the question. I will also say, one of my favorite books or biographies ever is Creativity, Inc.[01:02:24] Ed Catmull's story about Pixar and how he was rejected as a Disney animation artist. So then he went into computing and brute-forced his way back in. No, I love that story. Yeah. Disney.[01:02:37] Fan-yun Sun: Yeah. And Walt Disney is also one of my favorite founders. His story:
At the time he was like, okay, I'm gonna create this[01:02:44] immersive park. People don't even have the technology to create it virtually, but he's like, you know what, let's just build it physically so that people can...[01:02:50] swyx: So he is the first world modeler.[01:02:52] Fan-yun Sun: No, I tell people that theme parks are world models too.[01:02:56] swyx: Mm. Yeah. I mean, It's a Small World, or [01:03:00] the Epcot Center with all the little replicas of the countries.[01:03:03] Yeah. Those are very interesting. Okay. Well, we've covered a huge amount. Thank you for your time and thank you for inspiring us.[01:03:10] Fan-yun Sun: Thank you for having us.[01:03:10] swyx: Thank you.[01:03:11] Fan-yun Sun: It's fun chatting. Yeah. It's been a good time. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe