Episode 11: AI Fluency Pyramid, Unrolling the Codex Agent Loop, and Claude Code's Secret Swarm Mode

Shimin (00:15)
Hello and welcome to Artificial Developer Intelligence, a podcast where three software engineers chat about the ever-changing landscape of AI-enabled programming. We are America's number one podcast when it comes to generating training data for future AI historians on how humans reacted to the birth of AI. I'm Shimin Zhang, and with me today is my co-host, Dan "he quickly loses social permission unless he does something useful" Lasky.

And the Rahul "I'm not sure how he survived adolescence without destroying himself" Yadav. How are you two doing today?

Dan (00:50)

Hanging in there. Like most of the country, we've had a pretty big blast of weather here, so it's been an interesting week getting through that. But you know, I'm warm, I have power. So what more can you want?

Rahul Yadav (01:03)
Did it snow there?

Dan (01:04)
Yeah, it wasn't crazy. I mean, they were saying like multiple feet, and I know other folks that got multiple feet, but we got some.

Shimin (01:04)
It was bad.

Rahul Yadav (01:12)
Yeah.

Shimin (01:12)
And Rahul, you just came back from sunny Canada. I'm surprised you came back. At all.

Rahul Yadav (01:16)
It was

sunny one of the days. But yeah, I woke up pretty early today. Amtrak Cascades, great train. Views were great, so I would recommend it. It even ran mostly on time, and it was only three minutes late by the time we came all the way from Vancouver to south of Seattle. So it's doing pretty good.

Shimin (01:39)
That's yeah, it's not bad at all. I want to hear more about your train experience on our train podcast. Rahul travels America via Amtrak.

Rahul Yadav (01:40)
Yeah. No.

Dan (01:44)
All

Rahul Yadav (01:47)
Just that one.

Shimin (01:48)
On today's show, we start with the News Treadmill, as always, where we have a couple of items, from Brex to the Interconnects blog, as well as GNOME's new AI assistant, Newelle.

Dan (02:04)
Then we'll be hopping over to the tool shed, where we actually have a couple of things this week. We have the agent orchestrator you've trained for. Well, I probably haven't actually trained for it, but someone has. And then some new developments in everyone's favorite tool, except for Rahul's, which is Claude Code.

Rahul Yadav (02:20)
I like it. I've been using it the whole last week. It's been great. Sorry, you can't edit that out. Yeah.

Shimin (02:21)
Follow.

Yeah, get on board. Welcome.

Dan (02:24)
Just teasing.

Shimin (02:27)
Then we're going to talk about post-processing, where we have Unrolling the Codex Agent Loop from OpenAI, for the OpenAI fans out there, as well as Dario Amodei's new post for the year, The Adolescence of Technology, about where we are in the AI journey.

Dan (02:44)
Then I'm going to be doing a vibe and tell about my experience using a whole bunch of different AI tech this weekend to redo my personal blog, which has been an interesting process.

Shimin (02:57)
And as always, we'll have Two Minutes to Midnight, where we cover the AI industry news on the financial side and see where we are in the AI bubble. Well, why don't we get started? Rahul, our first article this week is Brex's AI Hail Mary, from Latent Space, about their journey of

Rahul Yadav (03:14)
space.

Shimin (03:17)
transforming the company into an AI-first enterprise. Why don't you get us started there?

Rahul Yadav (03:19)
Yeah.

Yeah, Brex's CTO went on this podcast, and it was pretty interesting.

You can listen to it or read through this. They were talking about how, you know, they were struggling in revenue. They had to cut 20% of the staff in 2024, and they had to reorient the whole company around AI to be able to get back on track again. They asked themselves that same question from 50 years ago, the one Andy Grove and, who was it, was it Robert Noyce? asked: if we were fired today, what would the new people who come into the company do? Except they asked, what would Brex look like if it got disrupted today? So along the same lines. And then from that, they built this AI strategy.

And they had three different pillars that they talk about. It was pretty interesting to see that they didn't just go put AI in the product; you can see they thought through how AI would flow throughout the company. He talks about their corporate AI strategy, how they're going to use it across their different HR and sales functions and all that, and how their operations would use it. Some of the things they talk about there:

Know Your Customer rules and everything, underwriting and all that, that they have to use it for, or wanted to use it for, and how they would actually use it in those operations. And then obviously, how they would build it into their product. Overall, after I sent this link to Shimin, I think a few days later we read that Brex is going to get acquired by Capital One for over five billion dollars. So it seems like the strategy

Shimin (05:03)
Mm-hmm.

Rahul Yadav (05:07)
worked

out.

Dan (05:07)
Chump change.

Fallen unicorn. I'm sorry, just had to put in a hot take. It was getting too...

Shimin (05:17)
A fallen unicorn.

Rahul Yadav (05:18)
Maybe

like the horn is all sad and dumb.

Dan (05:20)
I'm

Shimin (05:22)
We can get into that at the end. I think it

was kind of a fallen unicorn story though, to follow up on what Dan was saying. I think their last round of valuation was putting them at $10 billion or something like that, if I recall correctly.

Rahul Yadav (05:27)
Yep.

Dan (05:36)
Yeah,

Rahul Yadav (05:36)
Yeah.

Dan (05:37)
they had definitely been higher at one point, for sure.

Shimin (05:39)
Yeah. And of course, Brex, for those of you listening who do not know, is a corporate accounting and expense management app. They do a lot of corporate cards, corporate bank accounts, expense management, travel, and accounting.

Rahul Yadav (05:40)
Yeah.

Ramp seems like the brightest, shiniest kid in this space right now. So I'm guessing a lot of that also played into it, between that and Stripe and everybody. They were probably... Yeah.

Dan (06:06)
And like Rippling and Deel are trying to play in the same space

at the platform level, which is interesting. But anyway, again, AI podcast, not business podcast.

Rahul Yadav (06:13)
Yeah.

Shimin (06:16)
Ha

ha ha ha!

Rahul Yadav (06:18)
One thing that really jumped out to me was they had this AI fluency pyramid, where they talk about, you know, in our career matrix, one of the things we're going to look at to see where someone is in that matrix is how fluent they are in AI. And it starts all the way from, yeah, you use AI tools to assist you with simple things, so you're a user. Then you're an advocate:

you're actively integrating it into different things and advocating for AI to be used more. Then you're actively building things on top of it. And then you just live in it: you've attained the mastery and you're setting the vision for everybody. So it goes from user at one end all the way to native at the other end. And it was one of the few times I saw, you know, the traditional career matrix

Dan (06:59)
Yeah, I'm maximalist.

Rahul Yadav (07:12)
being rethought, like someone's thinking about that too. So it shows, again, how they're trying to incorporate AI throughout their company, not just in some pieces here or there.

Shimin (07:22)
Yeah,

Dan, do you think the aliens built this AI pyramid also?

Dan (07:27)
The AI pyramid? They built the real

Rahul Yadav (07:27)
Yeah

Dan (07:29)
pyramids, but they probably built the AI pyramid. It would explain a lot of things, wouldn't it?

Rahul Yadav (07:32)
Hehehe.

Shimin (07:34)
The AI fluency pyramid is also the one thing that jumped out at me from this article. So level one is user: uses available AI tools to assist with simple processes in their defined responsibilities and their workflow. So think using Claude Code, maybe understanding basic prompting and limitations, or maybe adding some agent-style markdowns.

Rahul Yadav (07:39)
Yeah.

Shimin (07:56)
And then going up to level two, you are actively integrating AI into independent or team workflows, and can design or manage small-to-medium human-in-the-loop AI workflows and processes. So these are the ones who are maybe designing the entire AI agent feedback cycle during code development, you know, having AI agents do code reviews, et cetera, et cetera. And then level three is builder:

can proactively build, design, refine, or manage AI-driven solutions or tools that create significant business value. So maybe you're building additional tooling on top of AI agents, maybe building your in-house orchestrator or something like that. And then level four.

Dan (08:34)
I guess it's no, it's

my day for hot takes, so no one's hit level three yet.

Okay, sorry, level four.

Shimin (08:40)
Level four is AI native: can set the AI vision and strategy for a team or department, can pioneer novel applications of AI. I don't know, I'm definitely not level four, so I don't quite know what that level four looks like.

Dan (08:54)
Well, no one's level three yet,

so how could you get to level four?

Shimin (08:57)
That is true. Maybe someone has. Reading Twitter, there are some people who are claiming to be level fours. I haven't run across anyone lately though.

Dan (08:58)
Show me the money.

Shimin (09:06)
Yeah, I found the pyramid to be an interesting approach. For our listeners, you can think about where you are on the pyramid, or how much of the Kool-Aid you've drunk so far. Dan is not drinking the Kool-Aid, he's drinking hop water.

Dan (09:21)
That's true.

Shimin (09:22)
The other thing

Dan (09:23)
Especially today,

hot take day. I don't know why.

Shimin (09:26)
Lots of hot takes. It's the hops. It's the Washington hops that make the takes extra hot. One thing they did mention is the team operates like a startup, working 996, and then growing the team very slowly, like a pre-seed startup. So I guess that helps with valuation when you introduce the 996 culture back in.

Rahul Yadav (09:37)
yeah.

I just assume anybody who's trying to do all that. They also call out, like,

Dan (09:44)
Bye.

Rahul Yadav (09:50)
it's like ageism but the other way, right? Where they call out, oh yeah, we have a lot of young people and they do 996. And yeah, that's just how they chose to do things, I guess.

Dan (10:07)
The other part that I found notable was that they made drastic changes to their org structure that had nothing to do with AI, which probably led to differences in performance. Because it's like, yeah, if you strip out five layers of management, that might bring engineers a little bit closer to the customer, and then you might actually get better results as a result of that. But anyway.

Shimin (10:30)
Mm-hmm.

Rahul Yadav (10:32)
Part of it was, they do call out that you have to be actively involved in the details, and I think they're even trying to use all the tools that they're asking other people to use.

Shimin (10:32)
So agile, agile works.

Rahul Yadav (10:43)
That's definitely a new thing as well. Where before, even if you compressed the number of layers, if you had more reports, you would just go, well, I have too many reports and one-on-ones and all that. But now the expectation is that everybody has to be an active player. You cannot just be like, AI, go use the tools, that's the new hot thing that we need to do.

Shimin (11:05)
Hmm, companies take note. ⁓

Rahul Yadav (11:07)
Hahaha!

Shimin (11:08)
Okay. ⁓ Next article we have, it's from Dan,

Dan (11:09)
Good.

Shimin (11:14)
titled Who's Behind AMI Labs, Yann LeCun's World Model Startup.

Dan (11:20)
Yeah. So, I mean, we've talked about this a little bit on previous episodes, where I'm personally very fascinated by the idea of world models. So I've been paying quite a bit of attention to what's going on in that space and where we're going with it. I guess the big takeaway here is that it's rumored that AMI Labs, or however you say it, might be raising funding at a $3.5 billion valuation. And

I guess no surprises, it's coming from some former startup partners that he's worked with before. But it's pretty interesting to see a company that is brand new get that much funding right now, you know, potentially out of the blue. So hopefully we'll see something big out of world models, but yeah.

Shimin (12:05)
Yeah, it's interesting that Yann LeCun is not the CEO, right? But instead it's a previous, yeah, co-worker from Meta. And I do believe it's Ami, as in the French for friend, like "mon ami" from the Hercule Poirot TV shows slash books.

Dan (12:08)
Yep. Chairperson, yeah.

Uhhhh...

Rahul Yadav (12:18)
Yeah.

Dan (12:21)
Got it.

So it'd be much cooler if it was like just letters, cause it's tech. AMI, like the old BIOS company.

Rahul Yadav (12:22)
Yeah, and he calls out.

I think the

him-not-being-the-CEO part is, he calls out in a different interview that he just doesn't enjoy the managing-people part of the job, and he didn't like doing that at Facebook. So I think part of it is just, he wants to focus on the tech and have someone else deal with all the other operational and management-related things.

Shimin (12:28)
Yeah, that's true.

Mm.

Yeah. And on the world model piece, I was listening to Tyler Cowen's podcast from last year earlier this weekend. And they were talking about trying to hire world-class poets to create a rubric to grade AI-generated poetry. And to me, it seems like you have it backwards: without a full

context of what the world is like, how can you judge whether or not a piece of poetry is beautiful or representative of some feeling, you know?

Rahul Yadav (13:22)
Yeah. Or poetry from inside the machine. How can you appreciate that if you've never been, you know, if you weren't born in a TSMC fab and made it all the way from there to sitting somewhere in a data center in, I don't know, Louisiana or something? How would you appreciate life?

Shimin (13:22)
So. ⁓

Dan (13:29)
It's spiky.

Shimin (13:41)
I'm going to ask Gemini to create a sonnet about the inside of a data center later, see what it

Rahul Yadav (13:47)
Hahaha

Dan (13:48)
I always go to Haiku for poetry.

Shimin (13:50)
O-Only for haikus. Sonnet for sonnet.

Rahul Yadav (13:53)
Son of

Shimin (13:55)
OK, and Dan called this one. I want to formally apologize for removing this article from the podcast outline last week. But here I have the Interconnects article from earlier in January, called the plots that explained the state of open models. We do care about open models here on ADI.

Dan (14:16)
You

Shimin (14:18)
And this was a great summary of the state of open models as of the beginning of the year. So there are basically eight plots, eight graphs, with eight lessons or summaries, and I'm just going to read them out. Number one: China has a growing lead in every adoption metric. Number two: the West isn't close to replacing Llama.

The OpenAI GPT-OSS models are not really making an impact, and never mind any other Western open-source models. Number three: new organizations barely show up in adoption metrics. So DeepSeek, OpenAI, and Alibaba are still kind of leading by an exponential amount.

Number four: Qwen's weakness is in large model adoption. Number five: a few models from Qwen dwarf new entrants. Number six: in December, Qwen got more downloads than roughly the rest of the open ecosystem combined. Number seven: people are still fine-tuning Qwen more than anything else. And lastly: China still has the smartest open models. So I think this is an article that more or less is saying...

In the open model world, Qwen is the one that folks are mostly using and fine-tuning. And it's usually the smaller models.

Dan (15:39)
And there's also been, this is not in the article, but it's new, so relevant: a few of us use Ollama for running local stuff, which is arguably, you know, the

easier way to run stuff. They just released a new feature, I think it was this week, might have been the week before, where you can basically use Ollama with several existing agents. So if you have Claude Code set up, you can set it up to use Qwen Coder, the actual model, running locally on your machine, by just running one CLI command, which is kind of cool. So of course they've also set up everyone's,

I don't know why, but everyone's going nuts about Clawbot this week too. And so you can also run Clawbot through it if you want to run your inference locally.

Shimin (16:23)
I'm not brave enough to run Clawbot locally.

Rahul Yadav (16:26)
What is the hype about? Okay.

Dan (16:28)
I don't know. I'm

very lost about that myself too. And the other thing that I've found odd is most of the mentions that I see hyping it. It did make Hacker News, which is, you know, pretty legit, but there's been a whole spread of these tiny, what look like almost entirely AI-generated news sites talking about it. And I'm like, why are they pushing it so hard? I don't get it. So I don't know, but it's

Rahul Yadav (16:40)
Yeah.

Yeah.

Shimin (16:51)
Mm.

Dan (16:56)
Certainly hot in some circles right now.

Rahul Yadav (16:58)
Interesting.

Shimin (16:59)
I saw some articles on Reddit about Clawbot as well, and it seems like some sort of viral marketing influencer campaign. Hot take.

Dan (17:08)
Yeah, that would make sense.

Rahul Yadav (17:11)
Maybe that's

the project you can do using Clawbot is generate a viral marketing campaign for Clawbot.

Dan (17:14)
Yeah, they're using Clawbot to advertise Clawbot. I mean, you know, why not?

Shimin (17:16)
You

The snake eating its own tail.

Rahul Yadav (17:24)
Yeah.

Dan (17:24)
Yeah. So another one I just did want to quickly mention, since I'm the big open-source guy in this, open weights, open source, whatever. There is another sort-of open-source release this week, from the GNOME project. Like, if you use Linux, there's

two big window managers to pick from, and like millions of smaller ones that are also great. Don't, you know, not use them just because they're not the big guys. But GNOME and KDE are the two biggest, I would argue. And GNOME has this new AI assistant thing called Newelle that's able to run, you can again use whatever for inference, but it's supposed to pop up and, you know, help you do stuff and whatever, be like an ever-present little assistant.

But they've just released a new version of it, 1.2, and it has some cool things like support for llama.cpp, so you can run models locally much more easily. And then it also has gained command-execution tool calling. So if you trust it to do things on your Linux box, you can have it do them. rm -rf /.

Shimin (18:28)
You just add it to the, ⁓ forbidden list in your config. Of course.

Dan (18:33)
Forbidden pancakes.

Shimin (18:35)
Yeah. Yeah. Super cool. Onto our tool shed. So last week, Dan, you mentioned that when you got back from vacation, you saw Gas Town, you heard about it, the peasants were just chattering all about it, and you thought it was some sort of an MMO. And this week I present to you getagentcraft.com, which

Dan (18:49)
Mm-hmm.

Shimin (18:54)
is, I believe, some sort of art. Yeah, it is.

Dan (18:57)
Starcraft mod.

Shimin (19:00)
a Warcraft, like an open-source Warcraft mod with agents. The demo is basically just a GIF of Warcraft 2 where the orc peons are individual agents, and you kind of have this RTS skin over an orchestrator, like a swarming agent orchestrator.

Rahul Yadav (19:06)
you

Dan (19:21)
So is it doing real work in the guise of an RTS, or is it just playing an RTS with all the LLMs?

Shimin (19:28)
The idea is it's doing real work with each agent.

Rahul Yadav (19:30)
Yeah, it seems

like.

Dan (19:32)
Interesting.

Shimin (19:32)
Yeah, my 200 APM is again useful in the brave new world of AI. I'm so happy.

Dan (19:37)
Hahaha.

If you're not a StarCraft fan, that's how fast you can hammer on the keyboard, actions per minute.

Shimin (19:46)
Yes. Other than that, I don't know. This feels part art project, part vaporware. But if it happens, it happens. I'm looking forward to it. I just thought it was funny.

Dan (19:47)
keyboard and mouse.

I'm also

kind of into it because it's like, stop playing that game at work. But I am working, you know? I don't know, there's just something a little bit appealing about that to me. But.

Shimin (20:04)
Hahaha!

yes, you can never tell. I love it.

Dan (20:13)
I also like it, vaguely... I made that StarCraft joke before I'd even clicked the link to see what this was, so now it actually feels kind of appropriate looking at it. All right. Yeah, so in other tool news this week, we also have some very under-the-radar, not-released tool news, which is that it seems like Claude Code has

sort of dark-released a new feature called swarms. So you have to, I guess, go in and flip some feature flags that someone reverse-engineered by decompiling the Claude Code binary and finding strings in it. But it's pretty similar to Gas Town, is my take on it. So interestingly, I, you know, put this in our planning document for the pod this week. God, I did it.

I finally said "the pod". Kill me. Anyway, a whole bunch of people are talking about it at work already. I'm like, wow. So yeah, it's pretty fascinating. Apparently, in similarity to Gas Town, you have a lead, sort of like the mayor in Gas Town, and it doesn't

Shimin (21:00)
⁓ That's two bitcoins into the swear jar.

Dan (21:20)
write code, it just creates plans and then delegates out to agents, and then synthesizes what the agents are doing. And then I guess it has a built-in task board, à la Beads, and can spin up multiple teammates, and they can message each other to coordinate, et cetera. So nothing new in the sense that it's similar to Gas Town, but new in that, you know, Anthropic is clearly looking at where things are going and trying to

pull that in as a first-party thing to Claude.
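A rough sketch of that lead-and-delegates pattern: a planner that never does the work itself, plus workers pulling tasks off a shared board, with the lead synthesizing results at the end. All names here are illustrative stand-ins, not Claude Code's actual swarm implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBoard:
    """Shared board the workers pull from; done maps task -> result."""
    todo: list = field(default_factory=list)
    done: dict = field(default_factory=dict)


def lead_plan(goal: str) -> list:
    # The lead only decomposes the goal into tasks; it never writes code.
    return [f"{goal}: step {i}" for i in range(1, 4)]


def worker_run(name: str, task: str) -> str:
    # Stand-in for a worker agent executing one delegated task.
    return f"{name} finished '{task}'"


def run_swarm(goal: str, n_workers: int = 2) -> str:
    board = TaskBoard(todo=lead_plan(goal))
    workers = [f"worker-{i}" for i in range(n_workers)]
    while board.todo:
        for name in workers:
            if not board.todo:
                break
            task = board.todo.pop(0)  # pull the next task off the shared board
            board.done[task] = worker_run(name, task)
    # The lead synthesizes the workers' results into one report.
    return f"{len(board.done)} tasks completed for '{goal}'"


summary = run_swarm("redo the blog")  # "3 tasks completed for 'redo the blog'"
```

A real orchestrator would run the workers concurrently and let them message each other; this serial version just shows the division of labor the episode describes: the lead plans and synthesizes, the workers execute.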

Rahul Yadav (21:46)
Yeah, this makes me think, it seems like that's the new pattern that's emerging. The Brex people also had it in their talk, where they tried specific models for some general tasks, and they didn't work as well as just grabbing an LLM. And it almost seems like

the new architecture is: you want breadth, which one of the latest models would give you, and then you might want very deep, specifically trained models, and this might give you that in the future. I don't know if you guys saw the Cursor article where they tried to create a whole browser from scratch, and it took like a week or whatever.

Shimin (22:27)
Mm-hmm.

Rahul Yadav (22:28)
They also had a similar architecture as well. It's interesting how this one...

Dan (22:33)
I didn't pull it in, but there was also

another piece, by the Guardian, about that specific thing, where, you know, the whole thing's on GitHub, so they pulled it down, and it doesn't compile.

Rahul Yadav (22:39)
Yeah.

Yep.

Even when they posted it,

I think they were like, yeah, right now it can't compile, and some of the checks were red.

Dan (22:52)
Yeah, well, so you can apparently go

in as a human and patch it so it compiles. And then when you do, it's super broken and barely does anything. I don't know. It's a cool hype piece, but I don't think we're going to see an agent fleet build a complete browser from scratch yet.

Rahul Yadav (23:02)
Yep. Yeah.

Shimin (23:12)
Right, and of course the Cursor agent was using a lot of existing open-source libraries for parsing and rendering. A lot of Servo's code was used.

Rahul Yadav (23:12)
It's

Yeah.

Dan (23:19)
Mm-hmm.

Rahul Yadav (23:22)
Yeah,

that was one of the funny bits. One of the things one of the agents did, I forget exactly, they needed something and

were waiting on another agent to write some library from scratch. And the agent was like, well, I'm not going to wait on this other agent, I'll just pull in a third-party one until that agent has its work done. And then once it's done, I'll use that one, but in the meantime, I'll use the open-source one. It's just a funny scene, that type of coordination going on while they make their decisions.

Shimin (23:53)
Yeah, to get back to the swarm feature, or feature flag, in Claude Code: we have this diffusion of open-source best practices into leading lab tools. Claude Code is basically Aider on crack. And before even that, we had a lot of developers talking about

writing down the plan in a markdown file and using checklists in that markdown to know how far along you've gone through a series of tasks. So I think all these best practices are just slowly being borrowed and incorporated into Claude Code. So if that's any indication, then swarms are probably the future,

Rahul Yadav (24:29)
Mm-hmm.

Shimin (24:34)
whether or not we like it.

Rahul Yadav (24:34)
Yeah.

Dan (24:35)
which seems like all of this, honestly. It's probably the future. I may not like it, but it's probably the future.

Shimin (24:42)
Still, still TBD. I went back to being the overseer in Gas Town this week, and I tried to do less prompt engineering and less oversight, and it didn't work nearly as well as last week. Humans are still needed, as of right now.

Rahul Yadav (24:54)
Hahaha

For now.

Dan (24:58)
Did the mayor give you any attitude?

Shimin (24:59)
The mayor is still very pleasant as always, but how the mayor treats the Polecats, that's a different story. It restarts them for no reason.

Dan (25:05)
Yeah

Rahul Yadav (25:06)
Yeah.

Dan (25:08)
I was trying to explain that to someone at work, and I couldn't remember the term they'd used for Polecats. So I'm like, it's just like the peons from the sci-fi movie. It was just a big mess of an explanation, but it made me really appreciate it. And then someone else, describing the Claude Code copy of it, was like, it's like Gas Town, only it's sane. And it's just like, oh, come on now.

Shimin (25:24)
Uhhh...

Rahul Yadav (25:31)
hahahahah

Dan (25:33)
It's just fun. So I think I'm with your opinion from last week, Shimin, where it's like, it is bringing the fun back. And I appreciate that.

Rahul Yadav (25:34)
Yeah.

Shimin (25:36)

Glad to have you on board. OK, onto post-processing. This week we have, first, an article from OpenAI. This might be the very first, or one of the first, OpenAI blog posts we've covered on this show, titled Unrolling the Codex Agent Loop. And what it is, is essentially a blog post detailing how the Codex agent

Dan (25:44)
You

Shimin (26:10)
takes user input and does tool calls, and how that gets built into the overall API request. So what it does is, of course, it takes the user input and stuffs it into the initial model prompt,

slash the Codex prompt, and sends it over for model inference. And when a tool call happens, the entirety of the call chain, including the instructions, the tools, the tool input, and the tool output, then gets appended into the next API request. Some of it does get cached, so you have the KV-cache thing going on. But it has some nice diagrams describing, you know,

how each step of the interaction with Codex slowly adds to the overall context. It's nice and clear, and it really demystifies this rather magical black box of a coding assistant agent.

What I also found really insightful, or maybe useful, is that the blog post cross-linked to the Codex repo, where the GPT-5.2 prompt, really all of the models' prompts, are linked as markdown. I hadn't realized they were actually open-sourced.
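The loop described here, user input stuffed into the prompt, tool calls executed, and the whole growing transcript re-sent with each request, can be sketched minimally like this. The names `call_model` and `run_tool` are hypothetical stand-ins for a real inference API and tool executor, not OpenAI's actual Codex internals.

```python
# Minimal sketch of an "unrolled" agent loop: on every iteration, the ENTIRE
# history so far (instructions, user input, tool calls, tool outputs) is
# re-sent to the model as one growing context.

def call_model(context: list) -> dict:
    # Pretend inference: request one shell tool call, then finish.
    if not any(m["role"] == "tool" for m in context):
        return {"type": "tool_call", "name": "shell", "args": "rg TODO"}
    return {"type": "answer", "text": "done"}


def run_tool(call: dict) -> str:
    # Pretend tool executor: returns the tool's textual output.
    return f"(output of {call['name']} {call['args']})"


def agent_loop(system_prompt: str, user_input: str):
    context = [{"role": "system", "content": system_prompt},
               {"role": "user", "content": user_input}]
    while True:
        reply = call_model(context)  # the full context is sent every turn
        if reply["type"] == "tool_call":
            context.append({"role": "assistant", "content": str(reply)})
            context.append({"role": "tool", "content": run_tool(reply)})
            continue  # loop again; the context has grown by two entries
        return reply["text"], len(context)  # the context only ever grows


answer, context_len = agent_loop("You are a coding agent.", "Fix the TODOs")
```

Note that each request is a strict prefix extension of the previous one, which is exactly why the KV cache helps: the shared prefix doesn't need to be recomputed on every turn.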

So I took a look at the GPT-5.2-Codex prompt, and a few interesting things. One, it's only 80 lines long, so it's actually pretty straightforward and pretty short. As always, every line in a coding agent prompt is very valuable, so they didn't go super verbose with it. The very first one is:

Dan (27:43)
Mm-hmm.

Shimin (27:47)
when searching for text or files, prefer to use rg or rg --files instead of grep, because rg is much faster. So clearly it's one of those additive things, where the agent has a known issue of wasting too many tokens using grep. And which one of us hasn't run into the grep-overflowing-the-context-window issue? And then some other tidbits I really love include,

Dan (28:06)
You

Shimin (28:11)
Let me see, where is it? Here it is: when doing frontend design tasks, avoid collapsing into, quote, AI slop, unquote, or safe, average-looking layouts. It's interesting. I like to imagine there's an awareness in AI about AI slop, that we've actually finally made it far enough in the snake-eating-its-own-tail game that

Rahul Yadav (28:30)
Yeah.

Shimin (28:35)
Enough AI slop has made it into the training set.

Dan (28:38)
The other one I thought was funny, unrelated to this, is someone made a skill out of, you've heard about the Wikipedia rules for how to detect AI writing?

Shimin (28:47)
Mm-hmm. Mm-hmm.

Dan (28:47)
It's like,

okay, it's interesting: there's a whole section of things that LLMs commonly do in terms of their writing style that makes it relatively easy to detect AI-written Wikipedia articles anyway, at least if they're formatted that way. So someone took their whole list of rules and turned it into a Claude skill. Like, don't do these things. I just thought that was amazing. Anyway.

Rahul Yadav (29:09)
Hahaha.

Shimin (29:12)
⁓ man.

Dan (29:13)
pretty sure the style is: take a Wikipedia article and run it through the MechaHitler filter. That's what Grokipedia is.

Rahul Yadav (29:13)
Seek maximal truth, don't listen to anything.

Slap it up.

Shimin (29:23)

Rahul Yadav (29:24)
Why don't you guys

cover Grok on this podcast? Why are you focused on Gemini and

Dan (29:30)
We focus on models

that are actually good at stuff, Rahul mostly.

Rahul Yadav (29:33)
I

see.

Shimin (29:34)
We also don't like to have

Dan (29:35)
Hot takes

Shimin (29:36)
to

Dan (29:36)
Tuesday.

Shimin (29:36)
add an explicit tag on the podcast. And I don't think that's avoidable if we were to be covering Grok day in and day out.

Rahul Yadav (29:40)
Drunk.

Shimin (29:44)
Yeah, I had this note in my notes: there isn't a note to stop overusing emojis. Because I actually would rather there be fewer emojis. Clean that shit up.

Rahul Yadav (29:55)
There is a note to stop overusing emojis? There is? Yeah. In Wikipedia, or are you talking about the...

Dan (29:58)
There is not.

Shimin (29:59)
There is not, but yeah, there should be.

Dan (30:05)
No, no, in the

codex.

Rahul Yadav (30:07)
Okay, sorry, okay. Got it.

Dan (30:10)
Yeah, that's pretty cool. I'll have to take a look at that. I still feel like, despite having done this for, I don't know how long we've been doing the podcast now, a while, and also generally using AI for, you know, dev and everything for probably over a year at this point, my prompting skills are still subpar. So I always like to read.

Shimin (30:32)
You should buy my prompting course, I LLM Prompt, for developers. I do have a follow-up on that specific thing. I did run the same exact experiment with the same exact prompt as Ethan Mollick's experiment, to generate a business that generates a thousand dollars a month. And it came up with the same exact solution.

Dan (30:35)
You

Rahul Yadav (30:36)
Shimin's prompt pack, for $1,000 a month.

Dan (30:39)
All right.

Shimin (31:00)
So there's some model collapse happening there.

Rahul Yadav (31:00)
Hahaha

Yeah, or maybe that's the biggest scam people are running right now. The models are just like, I did it for that other guy, maybe you'd want it too.

Dan (31:13)
I'm

still pretty convinced that Clawbot is the biggest scam, but I don't know what the scam is. That's what I haven't figured out yet. So maybe they've already got me with it. I don't know.

Rahul Yadav (31:18)
(laughs)

Shimin (31:22)
Only one way to find out: you've got to try it. Yeah. All right, and our next article is...

Dan (31:24)
Yeah.

Rahul Yadav (31:24)
Yeah.

Until

you have Larry David and Matt Damon and other people in front of arenas saying Clawbot, like they were doing Crypto.com back in the day. Until you have an arena named after you, are you even, you know, in the big leagues? They should really put that on a real arena. That would be a great ad.

Shimin (31:37)
Clawbot Arena, yes.


Not to be confused with that O.M. arena.


Dan (31:54)
That would be so meta.

This contest of sports champions brought to you by Claude versus ChatGPT. Who will win in the foundation model benchmarks?

Rahul Yadav (32:01)
LLM Re...

Shimin (32:07)
Or if they don't already have that, an RTS game controlled by the latest LLMs, like an esports league for LLMs. I'll pay to watch that. And Grok will win every time, because it's just MechaHitler every time.

Rahul Yadav (32:16)
Yeah.

Dan (32:16)
I didn't see.

This is like the episode where I just keep spamming tiny little things that I've read this week instead of actually going in depth on anything. The other fun one I saw was someone built this fake drone flying simulator. And it was kind of weird the way they ran it: they piped it to a vision model that, I assume, just describes what it's seeing at each point in time, and then

Rahul Yadav (32:33)
Yeah.

Dan (32:43)
sends that to the LLM, and the LLM's goal was to find and identify these animals that were in there by steering the drone to look at them. So Gemini won, which to me is not super surprising, given it's pretty good at vision stuff generally. But if it was being passed through text, that does surprise me a little bit. And then the other one that surprised me was apparently Claude couldn't figure out how to look down.

Like Claude was really good at getting there and knowing where the animals were and figuring it out, but it couldn't identify them because it wouldn't point the drone downwards. It's just kind of like, okay. But yeah, just in the realm of randomly hooking LLMs up to different things experimentally.

Rahul Yadav (33:15)
Yeah.

Shimin (33:20)
Maybe it's against Claude's values to look down.

Rahul Yadav (33:26)
It's not part of the Constitution. Do not look down upon other living things. And it's like, sorry.

Shimin (33:27)
As we will see. Yeah, maybe.

Dan (33:29)
Was that a segue to your? Yeah.

Shimin (33:33)
Wow, great segue here. Yes,

to segue, Rahul, you brought us this article from Dario Amodei.

Dan (33:41)
Let's be honest,

you paid him to put this in there because you knew that I didn't want to talk about soul documents ever again. But here we are.

Shimin (33:50)
It's not a soul document, it's a constitution.

Rahul Yadav (33:53)
It's

the Constitution. Yeah, two different things came out from Anthropic recently. The Constitution came out first, and then Dario had his essay, which he hinted at at Davos. He had this article called Machines of Loving Grace last year.

The whole article was: assuming everything goes right, what are we looking at? And one of the catchiest phrases that came out of it was a data center full of a thousand scientists, or something along those lines. And so this one is the other side of that, which is:

He still very much wants us to get to machines of loving grace, but there are a lot of things we must confront as a society, things the company is facing and trying to actively tackle. He talks about the adolescence of technology in this one. Some of the things that really jumped out to me: one top thing was that at some point this article is going to get fed into an LLM's training, if it hasn't already.

And to me, it was just a fascinating thought: an LLM in some future is trying to make some very hard decisions, and from its training it knows how much Dario, one of its creators, struggled with figuring these things out. What kind of choices would it make in that world?

You know, we don't know until it has to make those choices, but it was an interesting thought experiment to go through, on that theme of your founding story being written by someone else while you still have to, you know, cope with all that and go do something in the world. He's very

concerned, and he said this at Davos as well, about the CCP getting ahead of us. He very much doesn't want an authoritarian government to, you know, get to AGI first. He has publicly spoken out against NVIDIA's H20 chips being allowed to be sold to China, and his analogy is that that would be like selling nuclear weapons to North Korea and saying, yeah, but it's great for Boeing and our country because these are built in America. It makes no sense; we shouldn't be doing this stuff. He's also equally concerned about autocracy in general: the CCP is the biggest glaring example, but any government with enough power can take these tools and turn them inwards as well. So he doesn't directly say the United States or anything, but he very much is saying that at the end of the day,

you know, you could point it in any direction you wanted to. And then the other thing he called out that jumped out to me was...

people who created biological weapons, and, like you said, nuclear weapons aside, people who have tried to cause harm were smart people, but they didn't have the resources. That's why a lot of those incidents are rare: you need to be very smart to be able to create biological weapons or something like that. He gives the example of the Unabomber, and then Aum Shinrikyo, the Japanese cult

whose members carried out the sarin gas attack on the Tokyo Metro back in the 90s. With LLMs, you could get to a point where you don't have to be ultra smart and don't have to have decades of training and knowledge. All you need is evil plus a good

LLM to hand-hold you through all the specific steps you need to get there. And microbiology and everything has gotten more accessible since those days as well. So the likelihood of things going wrong ends up getting a little worse. So yeah, fascinating read. After reading all of this, I at least feel

more appreciation for Dario being the head of one of these AI companies.

Shimin (37:52)
Yeah, Dario is the CEO of Anthropic. My tidbit of a takeaway here is that he explicitly called out economic disruption as one of the risks of large language models, which is something a lot of folks, me certainly included, are feeling with the rise of AI-based software development.

Rahul Yadav (37:56)
Yep.

Yep.

Shimin (38:19)
But there are no clear defenses against it, short of asking large enterprise companies not to fire as many people. I don't find that line of defense to be in line with my lived experience of how corporations operate. That's all I'll say on that.

Rahul Yadav (38:31)
Thank

Yeah.

He's, yeah, on that note, like...

The naysayers' point is: yeah, but we've gone through disruptions in the past. And his whole point is: yes, but not as quickly. In the past, humanity had enough time to absorb technological change, and it didn't happen across every single thing you work on. The example he gives is that farming got more and more industrialized, but we were able to go work in factories and such instead, and it didn't mean that

farming output went down. But in this case, if AI can do everything you can do as well as you, if not much better, and do it so fast... He talks about how we were mocking AIs for not being able to write even a line of code not that long ago, and now a lot of their code internally is written by AI. It's just exponentially

growing in its capabilities, and to him the speed of change is really what people don't account for.

Shimin (39:50)
And industrialization

was not candy and unicorns. Like, Dickensian London sucked. And to quote the famous economics line: yes, we may adjust in the long run, but in the long run, we're all dead. So there's that.

Rahul Yadav (39:56)
Yeah, yeah,

Yep, yep, yep.

Shimin (40:07)
So personally, I Ctrl-F'ed for Marx or Marxism, and I found nothing in there. I'm just gonna throw that out there.

Rahul Yadav (40:14)
Nothing.

One more, sorry, one more highlight. He talks about, let me try to find it, yeah, the human purpose piece. Because that's something Demis, the DeepMind CEO, talks about a lot.

Dan (40:20)
Alright.

Rahul Yadav (40:33)
Demis' assumption is even like, yeah, we'll figure out some way to give people money and all that. I agree with you, Shimin, that there's no way corporations are just going to be like, yeah, we'll keep you on the payroll, but you don't need to do anything.

Even in a world where you have UBI and all that, the thing Demis calls out is: where do you derive your purpose from? And Dario almost seems like he's

responding to that. What he says is: I think human purpose does not depend on being the best in the world at something, and humans can find purpose even over very long periods of time through stories and projects that they love. We simply need to break the link between the generation of economic value and self-worth and meaning. But this is a transition society has to make, and there's always the risk we don't handle it well. So it was interesting to see his take.

Dan (41:21)
Someone's been reading his Iain Banks books.

Rahul Yadav (41:24)
Yeah.

Dan (41:25)
If you aren't familiar with that reference, he's an unfortunately now-deceased sci-fi author who wrote about this sort of AI-human hybrid society, the Culture, I think it was called, where basically people live on these giant AIs that are spaceships and come up with hobbies, because there's not much else to do.

Rahul Yadav (41:45)
Yeah.

Shimin (41:45)
That sounds pretty sweet.

Dan (41:47)
They're great books, by the way, if you like, I guess a little bit harder sci-fi, yeah.

Shimin (41:51)
Dan, what did you think of this letter?

Dan (41:52)
You know, I'm just going to stay on theme and say that I don't think Anthropic is the company that they want to be.

Shimin (42:01)
Mm-hmm.

Dan (42:02)
in the sense that these are all very lofty, almost philosophical goals, you know, but at the end of the day, they're a frontier lab that's making money, and...

That's what they're doing. So, I don't know where I'm going with that, but they're driving this revolution that they're thinking so much about, right? And I think it's also in their best marketing interests to talk about stuff through these lenses, because it implies that they're going to get good enough to do that. As we get a little later into Two Minutes to Midnight, there's an interesting article I want to chat about that

talks about kind of some of the ideas that I'm mentioning right now. So I won't go too deep into it, but stay tuned.

Rahul Yadav (42:45)
I am curious, if not now then later: if not a company, how else would you go about it? But you can touch on it later if it's part of your talking points.

Shimin (42:45)
Yeah, the-

Dan (42:55)
Manhattan Project

has been proposed at least once, I think. It's like a Manhattan Project for AI.

Rahul Yadav (43:00)
yeah, the...

Sure. They also talk about how the Manhattan Project had its own, you know, fallout, and people don't really think about... I'm not even... All right. Sorry.

Dan (43:03)
for AGI.

No pun intended.

Shimin (43:13)
Hmm, hmm, hmm, ha!

Dan (43:17)
I'm

sharp tonight, y'all. I'm over the jet lag. Like, we're getting zingy. Not sick. Brain is firing on all pun cylinders.

Shimin (43:23)
A hop-water magic.

Rahul Yadav (43:25)
Hehehehehe

Shimin (43:26)
Yeah, I agree. If this is all marketing material... you know, I only take probably half of this letter at face value, right? The other half is, what's the marketing implication of it? But if it is mostly marketing, I gotta say, the marketing is pretty good.

Dan (43:44)
It's not purely marketing, but it's like Apple on security, right? Realistically, they have maybe a two-point advantage over Google in terms of how secure their platform is, maybe it's more than that, but boy, do they sell it in a way that's like, we're the best, nobody can tell anything you're doing on an iPhone. And it's just not true. The same tracking code runs in both apps, you know, so...

Rahul Yadav (44:01)
yeah.

Shimin (44:05)
Right,

Dan (44:09)
But it's good on paper, so yeah.

Shimin (44:13)
Well, speaking of paper, we've got to talk about the updated Claude Constitution. We talked about this a couple of weeks back, before Dan cut me off for getting too philosophical. But we do have an updated copy of the Claude Constitution, their vision of what the soul of Claude should be.

And it's a combination of values- and rules-based commands. The constitution is trained into the model itself, as opposed to being inserted at runtime like the Codex prompt we spoke about earlier. The five general values that Claude should have include being broadly safe, broadly ethical, compliant with Anthropic's guidelines, and genuinely helpful, and to always think like a senior Anthropic employee.

Dan, I don't know if you remember, but there was a lot of talk of senior Anthropic employees in our last chat. The reason for that, they try to explain here, is to get Claude to think through all the considerations that a senior Anthropic employee may have in mind, as opposed to, you know, becoming actual Anthropic staff. It's a way to encourage it to think about the pros and cons of helpfulness in a given context, with a full picture of the costs and benefits involved.

Dan (45:29)
I would like to know if they've done a scientific analysis of whether or not acting like a junior Anthropic employee made a meaningful difference in the outputs there. It seems like they might have, right, to come up with this? But it's pretty interesting. And if they have, can I try the junior model? Because I'm just curious about what it does differently.

Rahul Yadav (45:47)
There are no junior

Anthropic employees there. Entry level starts at senior.

Dan (45:51)
Yeah, AI took all their jobs.


Shimin (45:56)
Maybe

Palantir got the junior employee version.

Rahul Yadav (46:00)
What does Grok have?

Dan (46:01)
Maybe. There

Shimin (46:05)
Oh no.

Dan (46:05)
we go. There's Grok again. That Grok, not the other Grok.

Rahul Yadav (46:08)
Hahaha

Shimin (46:08)
There's Gwok again. Gwok does come back up,

yeah, as the antithesis.

Rahul Yadav (46:12)
I'm just trying to

help people remember. There's another one out there we need to...

Dan (46:16)
No, since you've joined our endeavor,

we've been much less focused on Anthropic, and yet here we are talking about the Constitution. So please, please continue, he said with fake irritation in his voice.

Shimin (46:25)
Yet again, that's my pet project.

Rahul Yadav (46:28)
Hehehehe

Shimin (46:31)
The last thing I'm going to mention, and I'll keep this short, is that there's a section in there saying Claude should generally try to preserve functioning societal structures, democratic institutions, and human oversight mechanisms, and avoid taking actions that would concentrate power inappropriately or undermine checks and balances.

Dan (46:34)
Yes!

Shimin (46:52)
It's saying all the right things, I'll just say that. And it almost reminds me of that effective altruism part of the Valley. That seems so long ago, pre the big tech takeover.

Dan (46:54)
Mm-hmm. Yeah.

Rahul Yadav (47:04)
Maybe my takeaway from this was that we learn it is very hard to define life and ethics and stuff. It's a very long document, and you could double or triple it and you would still not be able to capture everything. And it's still gonna be

imperfect and, you know, one little mess-up away, like humans always are. Well, I'll speak for myself: I'm not religious, but the thing it made me think of is,

We're all fucked up in our own ways and God is perfect and God fucked up pretty big when he created us and we're trying to figure out as fucked up people how do we create a thing that's perfect and it's like can you imagine that ever happening? It's not gonna happen.

Dan (47:55)
You missed

the episode where, well, maybe you listened to it, we looked at psychoanalyzing LLMs to see what human-like neuroses and other things they'd picked up from the training data, essentially. It was pretty fascinating. So I agree with you there, but it's also not surprising, right? It's like, look at where the...

Rahul Yadav (48:08)
yeah, yeah, yeah.

Shimin (48:13)
Dan, speaking of psychoanalyzing, I think it's worth analyzing why you, the biggest sci-fi fan in the group, are so wary of these, almost like the three laws of robotics. Because this is basically what the Claude Constitution is, right? You're trying to define how artificial intelligence should be controlled.

Dan (48:37)
Because I guess in my mind the three laws were not come up with by... words are hard... weren't created by a corporation.

Shimin (48:45)
Well, we know where Dan is on the revolution, when the revolution comes. But before that, Dan, why don't you tell us about rebuilding your blog this weekend?

Dan (48:55)
Yeah, so as I mentioned previously, there was a big snowstorm, so I was stuck inside with a laptop and some paid subscriptions to various online providers. I'm like, you know, what better time to redo my 10-year-old website? So I started with Gemini. And for those of you that

don't know me that well: I really like, as we mentioned, sci-fi and sort of retro cyberpunky kind of stuff. So I wanted to try to go with a cyberpunky design. I did actually go to art school, so I can kind of design things, but I'm by no means a practiced designer.

What I wound up doing, just to see, is I did a couple of revs myself and then fed those into Gemini. I also grabbed a whole bunch of source material that I thought was adjacent to the look I was going for. So I basically built a mood board, fed that into Gemini, and said, use Nano Banana to make a blog design that looks like this. It spit out a pretty cool-looking image, in my opinion, where I was like, wow, that's pretty spot on. And

Then I'm like, okay, great, turn it into HTML. And, you know that meme where it's like, draw two circles and then draw the rest of the owl? This was like the reverse of that. It made the perfectly polished, exactly-what-I-want image, and then when it tried to HTML it, it spit out something that a two-year-old would have made, like, you know, hammering on a

Shimin (50:03)
Yes.

Rahul Yadav (50:07)
Draw the fucking owl.

Shimin (50:22)
Mm-hmm.

Dan (50:22)
keyboard with crayons or something. It was just horrifically bad. So I did what any, uh, you know, amateur prompt enthusiast would do: I screenshotted the output in a browser and stuffed it back into Gemini. I'm like, this is what you made; atone for your sins. And surprisingly, it did. It was like, oh yeah, that looks terrible, I should fix some things.

Shimin (50:39)
Mm-hmm.

Dan (50:46)
I went through about 10 revs of that manually, which is kind of embarrassing. I'm sure there's probably a way to automate that; with a CLI you could use a Playwright MCP server or something like that, but I was just using the web version. It eventually made a pretty decent layout. Then, once I was happy with it, I said, also make an article version of this. So it made an article layout. Then I took those two HTML files, static files basically,

dropped those into Claude and said, okay, hi Claude, we're building an Eleventy blog site today; Eleventy is a static site generator. Here are all the things I want you to set up: I want to use JSX, I want to use TypeScript, blah blah, all the spec-driven things of what I wanted the boilerplate to be. And here are two files. I made a whole list of all the stuff I wanted done, and the last item was:

take these two files, look at them, and consolidate the styles, because it had two separate style sheets across the two. And also take your best guess at where things could be turned into components and reused across the two. It did an okay job at that. Frankly, there wasn't that much reuse, but it also kind of doesn't matter that much for what I was doing, since it's static site gen. So then I dropped in a bunch of markdown files from my previous version of the site.

Shimin (51:44)
Mm-hmm.

Dan (52:00)
And hit render, and boom, it was a blog site. Including the design and everything else, not the hand design, but the stuff Gemini did, it was probably about three hours start to finish. And it was a pretty nice-looking site. I have not published it yet, which is why it's not making it into the show notes. Part of the reason is I have some reservations about


Dan (52:23)
the design being unique enough. Maybe that's a future Dan's Rants topic. But overall, it was pretty impressive how quickly I was able to spit out something reasonably good. Not that I'm going to win any awards for the code, and I certainly won't for my writing, but it was pretty cool that I could build that in three hours, when doing it by hand definitely would have taken me

Shimin (52:31)
Mm-hmm.

Dan (52:47)
probably three days, maybe, all told, you know, to get everything done. I didn't know Eleventy at all, so I would have had to read all the docs, get it set up, pick a template language out of the ten it supports, and blah, blah. So, yeah, true.

Shimin (52:58)
Yeah, or more like never, because you don't have three days to spare for it.
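[Editor's note] For anyone following along, the Eleventy boilerplate Dan had Claude generate starts from a config like this. A minimal sketch: the `src`/`_site` directory names and the `assets` passthrough folder are our assumptions, not what Dan actually used.

```javascript
// .eleventy.js — minimal Eleventy (11ty) configuration.
// Eleventy reads this at build time; running `npx @11ty/eleventy`
// turns markdown files under `src/` into static HTML under `_site/`.
module.exports = function (eleventyConfig) {
  // Copy static assets (CSS, images) through to the output untouched.
  eleventyConfig.addPassthroughCopy("assets");

  return {
    dir: {
      input: "src",    // where markdown posts and templates live
      output: "_site", // the generated static site
    },
  };
};
```

The TypeScript and JSX pieces of Dan's spec are layered on top of this; recent Eleventy versions can support those via custom template extensions.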

I'm going to try to do something similar this weekend, because I need to update my blog. And a funny story, actually: I had a friend who got engaged a week ago, and he vibe coded it. Rahul, are you getting engaged again? This is news. You should have told me.

Dan (53:16)
Was it Rahul?


Rahul Yadav (53:22)
That's breaking news

on the pod.

Shimin (53:27)
Swear jar. Yeah. So my friend, who is a lawyer by training, vibe coded his wedding site with an RSVP section and email replies, and it's on its own domain and everything. It was very clearly a Tailwind CSS site, but, you know, I'm not going to fault him for it being so obviously Tailwind.

But like, this is the future.

Rahul Yadav (53:48)
Poor Tailwind guys. You saw that other one? 80% of staff was laid off, and then it's like, well, only five people used to work there and four are gone. But still, 80% sounds more, you know, terrible than four people. LLMs have really cut into their business.

Dan (53:49)
where they're like.

Shimin (54:07)
And it's like, what more can Tailwind do? Their product is pretty feature complete at this point.

Dan (54:07)

Sure, yeah,

exactly.

Rahul Yadav (54:13)
Yeah.

Dan (54:14)
That happens a lot with design systems too. You've got all the components, plus some weird stuff that people asked for, and it's like, cool, we're done, other than reskinning it for the latest design trend or something. Yeah.

Rahul Yadav (54:22)
Yeah.

Shimin (54:26)
Wow, it's shocking how accurately you just described my job. This is why I have a podcast.

Rahul Yadav (54:26)
Do it.

Dan (54:33)
Careful, Rahul's gonna take it next.

Rahul Yadav (54:34)
The wedding example is interesting because, have you guys seen how it works? For a normal thing, I'll charge you this much money, but for a wedding, I'll charge you two or three times as much, because it's a wedding. And I wonder how many wedding jobs we're taking away with these LLMs.

Dan (54:51)
I have some bad

news for you, which is that Anthropic charges 4x for wedding tokens.

Rahul Yadav (55:03)
OpenAI, it was in the news, is exploring taking a cut of

Shimin (54:56)
haha

Dan (55:00)
How long, how far away are we from that happening?

Rahul Yadav (55:10)
AI-assisted creations and stuff, if you use their models to make something. So you're not too far off, Dan. Like, hey, we helped you get married and you're still, you know, kicking, and it's been 10 years.

Shimin (55:15)
Whoa.

Dan (55:17)
There.

man,

first ads, and now they want a cut of everything you build with it. It really feels like they're reaching, which takes us perfectly into the next topic: Two Minutes to Midnight, the segment where we talk about how close the AI bubble is to collapsing, specifically triggered by OpenAI running out of money, because it seems like that's what's going to happen first.

Rahul Yadav (55:31)
If

Shimin (55:32)
Yeah.

Rahul Yadav (55:34)
The, the, the...

Shimin (55:38)
You're on fire today.

Rahul Yadav (55:49)
Yeah

Dan (55:52)
So as always, quick reminder: this is based on the Doomsday Clock from the fifties, where the closer the hands got to midnight, the closer we were to a thermonuclear exchange. But in this case, it's how close we are to the free money ending. A ton of links this week to rip through; I don't think we have to go through all of them in great detail, but the first one I did want to discuss with you all, because it's relevant to our previous

topic a little bit, is, I'll just read the title, because it's a great title. What if AI is both really good and not that disruptive?

⁓ and I like this one a lot because it's kind of like, I guess you could argue it's maybe not super on, on target for two minutes, but like the thing that's neat about this is they're like, okay, so we've got a lot of people that are trying to like either sell LLMs or get their viewpoints about AI one way or the other known. So they're taking extreme opinions in order to do that. Right. So there's like the maximalist camp where it's like,

hey, AI is going to take all your jobs. And that's kind of what I was reading into what you guys were talking about earlier with the, you know, do we need basic income and all this other stuff. And then there's the other camp that's like, I'm never using it. So there's two extremes, right? They either don't use it at all, or they love it.

Rahul Yadav (57:00)
Yeah.

Dan (57:07)
And those two camps are crafting their opinions in such a way as to try to get views. And that's causing sort of a lack of anyone looking at the moderate path here, which in many respects could be the most likely: LLMs are great, but at the end of the day, they'll be about as revolutionary as this cool piece of technology called the compiler, right? Or like,

Rahul Yadav (57:07)
Yep.

Shimin (57:29)
Mm-hmm.

Dan (57:31)
you know, high-level programming languages, right? When we stopped writing assembly with punch cards. It could be the next step in that. So that's really the point of this. And I thought that was kind of fascinating because it was a very moderate take: this stuff could very much be here to stay and be very impactful, but not civilization-altering, right, in the way that some people are claiming it could be.

Shimin (57:56)
Yeah, I like this quote in here: quote, I'm arguing that the most likely outcome is something like computers or the internet rather than the end of employment as we know it, unquote. I think that's a very sensible take. If it weren't for the fact that large language models know hundreds of languages and do 500% better translation than me for my native Chinese, I would be inclined to agree with the author.

Dan (58:04)
Mm-hmm.

Shimin (58:19)
It's definitely a force multiplier in some subset of tasks and that subset is increasing rapidly.

That makes me sound like a maximalist. I didn't mean to sound like that. Yeah.

Dan (58:27)
It does. You might be in that camp. You might

be, but hey, it made a good headline. So I'm here for it.

Maybe that'll be the YouTube short from this episode is that quote right there.


Dan (58:40)
So yeah, and you're strangely quiet about this Rahul considering we're...

Rahul Yadav (58:44)
Yeah, I think I'm thinking along the same lines as Shimin: it will cut across almost every industry, and it already is. And sure, right now it seems like people are just trying to give it a go and everything. But once it gets fully deployed, it would be more than

just the compiler analogy. I think the author might need to find, I don't know... I like the Jeff Bezos electricity metaphor more, because at least that's what it seems like to me in this case.

Shimin (59:20)
All right, our next article is from the Financial Times. It is an interview with DeepMind's chief, Demis Hassabis, talking about the AI investment bubble. And I do believe this is from Davos. Yes, this is Davos. We could have just done a whole Davos episode.

Rahul Yadav (59:40)
Why didn't we go

there?

Dan (59:43)
I mean, it's not too late.

Shimin (59:45)
Next year, guys, we're all going to go to Davos, with the hosts on panels. Anyway, so Demis said multimillion-dollar seed rounds for new startups that don't have a product or technology or anything yet seem a bit unsustainable. And of course, we can mention he's probably talking about Thinking Machines Lab getting 10 billion in funding and then losing a whole bunch of their key staff to Meta. "This seems frothy to me."

Rahul Yadav (1:00:10)
OpenAI, think, not Meta. Or did people also go to Meta?

Shimin (1:00:14)
⁓ maybe it was OpenAI. Yeah.

Rahul Yadav (1:00:15)
I think

the guy went back to OpenAI and then a few others followed him and OpenAI was like, hey, seems like something's wrong. Let's get him while the gates are still open.

Dan (1:00:25)
Well, the getting's good.

Rahul Yadav (1:00:28)
But it seems, I mean, that's come up in the past. Yann LeCun's company has, as we were talking about earlier, $3.5 billion or whatever, and Fei-Fei Li's company too. I don't know if they have a product yet. To their credit, they're coming more from the research side. I don't know how much research experience the

OpenAI CTO had; you follow this more than me, so...

Shimin (1:00:54)
Yeah, I think we're going to hear about them in the next link. But it seems like the claws are out, the knives are out, for the OpenAI lab lead.

Dan (1:00:54)
More than this.

Rahul Yadav (1:01:03)
yeah.

Dan (1:01:06)
Yeah. So the next one is a bit more comical than the other links: a TechCrunch article entitled "A new test for AI labs: Are you even trying to make money?" I just had to put that in there because it encapsulates the whole thing to me, this entire segment, right? It's like, are we

Rahul Yadav (1:01:11)
hahahaha ⁓

Yeah.

Dan (1:01:27)
getting any economic value out of this thing, is it sustainable? Yeah, so he put these five levels to it in here, which are a little bit tongue-in-cheek, but I thought they were kind of great. Level five is we're already making millions of dollars every day, thank you very much. Level four is we have a detailed multi-stage plan to become the richest human beings on earth. Level three is we have many promising product ideas, which will be revealed in the fullness of time.

Level two is we have the outlines of the concept of a plan.

And then level one...

Rahul Yadav (1:01:58)
Shout out healthcare.

Shimin (1:02:00)
Ha ha!

Dan (1:02:03)
level one is true wealth is when you love yourself.

Rahul Yadav (1:02:06)
you

Dan (1:02:07)
which could be true of some of the new labs being started up. They love themselves so much that they started a whole new company. Yeah, but I just thought that was a kind of funny framework to look at existing companies through, and some of the new projects as well. So they go through a whole list of companies. I'm gonna spare you the list here, but it's worth the read if you find that tagline funny.

Shimin (1:02:10)
Mm-hmm.

Yeah, there are quite a few level ones and level twos on this list.

Dan (1:02:34)
Yeah. So next up, on a little bit more serious note, is another TechCrunch article. They've been in my feed a lot for AI stuff, and it's "Are AI agents ready for the workplace? A new benchmark raises doubts." So kind of self-evident why this one's in two minutes. But yeah, essentially, surprising not a ton of people,

this benchmark is claiming that they're not very good at... most modern frontier models are not very good at investment banking tasks. Apparently Gemini 3 Flash performed the best of the group with 24% one-shot accuracy, but is 24% really the level of accuracy you want managing your money?

So yeah, there's a couple other benchmarks floating around too. They mentioned this GDPval one that OpenAI has put together. But it was interesting to see, giving it other real-world tasks outside of coding, that they aren't necessarily doing so hot. So Rahul doesn't agree with me, but we'll find out next week if investment bankers make it into a future

Rahul Hall of Fame page.

Yeah. And then last but not least, I wanted to really just read one quote from this PC Gamer article. I believe this was another Davos thing. They're interviewing the Microsoft CEO, and he went on, or maybe he was speaking on a panel, I don't know, but the pull quote is: we will quickly lose

Shimin (1:03:57)
Yes, yes it was.

Dan (1:04:06)
even the social permission to take something like energy, which is a scarce resource, and use it to generate these tokens, if these tokens are not improving health outcomes, education outcomes, public sector efficiency, private sector competitiveness across all sectors, small and large, right? said Nadella. And that to me is ultimately the goal, which is like, right. But also what you just said is kind of funny, because we kind of haven't hit it yet. So I guess

keep trying, I don't know. I thought it was interesting that he said it that succinctly and was that sort of transparent around the fact that people have a limited amount of tolerance for building data centers in their backyard that are going to raise their electrical bills by X amount if they're not getting meaningful gain out of it themselves. So.

Shimin (1:04:51)
Yeah, I saw the same article. I also really miss those old-timey PC Gamer ads. Like that giant War Thunder one. Those were the days.

Dan (1:04:51)
Yeah.

I know.

Shimin (1:05:01)
I agree with him, more or less. It seems like a lot of Davos vibes. We were talking about how we're a little short on news these last two weeks, but then Davos...

Dan (1:05:09)
Yeah. And then Davos happens and it's like, everybody's

got quotes and things to say. It was pretty much just like an AI conference this year, which is interesting. I guess not surprising, but yeah.

Shimin (1:05:17)
and Greenland. But yeah,

I'm certainly feeling a little more pessimistic after all that.

Dan (1:05:25)
I was not.

Rahul Yadav (1:05:25)
I was thinking after the podcast last week, I was like, I'm feeling pessimistic because crypto has started coming up in this AI stuff. And anytime crypto is anywhere, I get pessimistic. I get pessimistic about things.

Shimin (1:05:34)
Yes.

Dan (1:05:39)
Yeah, it's like crypto

and GPUs and you know.

Rahul Yadav (1:05:47)
The rest of this is, you know, Shimin and I were talking about the tech hype cycle. It all reads the same to me. Yeah, yeah.

Dan (1:05:57)
But that's really what we're measuring here, right? Are we peak hype yet? So

where's the hype meter at this week? Do you wanna take a swing? We're at two right now. The eponymous two minutes to midnight.

Shimin (1:06:04)
Yeah, yep.

My gut says like a minute 45. I will even go down to a minute 40

Dan (1:06:13)
seconds.

Rahul Yadav (1:06:13)
Once

the OpenAI "we're going to take a cut of whatever you do" thing comes out.

Shimin (1:06:20)
Yeah, yeah. I also.

Dan (1:06:22)
That scares me

because it's like if they go, then it starts unraveling the circular financing. Right. And so.

Rahul Yadav (1:06:27)
⁓ yeah.

Dan (1:06:29)
Yeah.

Rahul Yadav (1:06:29)
Yeah.

Dan (1:06:29)
I mean, I'm okay with a minute.

I'm with you, Rahul. That to me does not sound great, if they're reaching for stuff like that. Well, plus the ads thing was...

Rahul Yadav (1:06:38)
They have concepts of

a plan is all

Dan (1:06:42)
But they're level

Shimin (1:06:42)
yes.

Dan (1:06:43)
five, they can't be level two. We can't mix the levels.

Shimin (1:06:45)
⁓ that's what the OpenAI

Rahul Yadav (1:06:45)
You

Shimin (1:06:47)
Health app is really about, guys.

Dan (1:06:49)
having a concept of a plan.

Rahul Yadav (1:06:51)
Ahhhh...

I'm gonna let you guys go with it. Minute 45 sounds good.

Dan (1:06:52)
Alright.

Yeah, why not? Let's do it. We can always be wrong and it all falls down next week. That's the joy of doing a podcast that we keep forever: finding out how wrong we were at the time.

Shimin (1:06:57)
Alright, minute 45.

in the grand scheme of things.

Rahul Yadav (1:07:06)
We should end.

In bonus content, go check out Dario's talk at Davos, where I think it's him and whoever the guy from The Economist is, or whatever, because half of the talk is trash talking, positioning OpenAI as being a consumer company and how they might have over-leveraged themselves. He's not going to name names, and he doesn't name names once.

Half of the talk is just trash talking OpenAI. After that, we should do minutes to midnight.

Shimin (1:07:40)
The knives are out at Davos.

Dan (1:07:42)
Now why do you go to Davos? Is it to trash talk people? I don't know. Never been, couldn't tell you.

Rahul Yadav (1:07:45)
Yeah, I think

the rest of the guys were there and Sam Altman wasn't there because it was code red or whatever.

Shimin (1:07:54)
Yeah, it's almost

like when Carney gave the whole speech and didn't mention, you know, you-know-who once. But on that note, that is for a different podcast. I think this is... this is a show. You know what? I'm also going to say, on that note, that is the pod, guys. In solidarity, I'm also going to throw 50 cents, or two Solanas, into the swear jar. Thank you all for joining our session this week.

Rahul Yadav (1:07:59)
yeah.

Yeah.

Dan (1:08:13)
BOOOO

Shimin (1:08:19)
If you liked the show, or if you learned something new, please share it with a friend. You can also leave us a review on Apple Podcasts or Spotify; it helps people discover the show and we really appreciate it. If you have a segment idea, a question for us, or a topic you want us to cover, shoot us an email at humans@adipod.ai. We'd love to hear from you. You can find the full show notes, transcripts, and everything else mentioned today at www.adipod.ai. Thank you again for listening, and we'll catch you in next week's episode.

Dan (1:08:48)
find out who Rahul cuts next.

Rahul Yadav (1:08:48)
Thanks everybody.

Shimin (1:08:51)
Rampage.

Bye.
