Episode 13: Pi Coding Agent, Dark Factories & the Furniture Makers of Carolina

Shimin (00:15)
Hello and welcome to Artificial Developer Intelligence, a conversation show where three software engineers chat about programming in the age of AI. We are still America's number one podcast for generating training data for future AI historians on how developers reacted to their creation. I am Shimin Zhang, and with me today is my co-host Dan. He holds code in his hands and molds it like clay in the caress of a master sculptor, Lasky.

And Rahul, he's actively rooting for the sloppocalypse, Yadav. Hey guys, how are you doing?

Dan (00:49)
There's a very distinct difference between our middle names this week. That's pretty good. Do you use AI to generate those or do you come up with them on your own?

Rahul Yadav (00:50)
Hello,

Shimin (00:59)
I come up with them on my own. There are certain things that I do not use AI for.

Dan (01:00)
guys.

Rahul Yadav (01:04)
You just need some creativity.

Dan (01:04)
And just to peel back the curtain, just to peel back the curtain a tiny bit for the audience, we don't get any free insight into what our middle names are going to be each week. So it's always exciting.

Shimin (01:14)
It's always rushed homework before the show starts. So.

Dan (01:18)
You've done a great job, in

my humble opinion.

Shimin (01:21)
On this week's show, as per usual, we will start with the News Treadmill, where we're going to talk about the new AI models from Anthropic and OpenAI.

Dan (01:33)
Then we'll jump into the Tool Shed, where we actually have a tool: it's the Pi coding agent. Shimin will be telling us what that is and what he's done with it.

Shimin (01:42)
Very excited. After that, we will have post-processing where we're going to talk about the AI security industry, the software dark factories, and a blog post about developers mourning our craft.

Dan (01:55)
Then we're going to hop into Vibe and Tell, where we're going to compare our insights output from Claude Code, see how we can all optimize our Claude Code usage, and maybe compare notes.

Shimin (02:06)
And as always, we will finish the show with Two Minutes to Midnight, where we talk about where we are in the AI bubble cycle using the analogy of the Doomsday Clock, where midnight is when we have MAD.

All right, let's get started.

So first up, On February 5th, so around a week ago, a little less than a week ago, both Anthropic and OpenAI announced their latest models, the Claude Opus 4.6 and GPT Codex 5.3, within 30 minutes of each other.

Dan (02:46)
It's pretty wild.

Shimin (02:47)
Yeah, they also proceeded to, or at least Anthropic proceeded to, release some very sharp attack ads towards OpenAI's decision to launch ads in its free tier. And they actually played it during the Super Bowl. Go Hawks. So now that it has been a week, more or less,

Have you all used the latest models? What do you guys think?

Dan (03:13)
Yeah, I've been poking around at it. Opus 4.6 actually starting today, mostly cause I was a little bit lazy and realized that we'd gotten it at work. And I've been only using it intermittently for personal stuff, because I'm kind of low on the Claude subscription totem pole, so I didn't want to burn the candle too hard with it. But I've been pretty impressed with it so far. Honestly, my real take is I haven't noticed a huge difference, but it's definitely not been any kind of regression, where some people freak out like, it ruined everything, like they've said about past rollouts. I think it's been fine.

Rahul Yadav (03:55)
Yeah, I switched. Claude actually, I think at some point, was like, do you want to try out 4.6 or something? And I was like, yeah. And so I switched to that. But like you

Shimin (03:56)
Yeah.

Dan (04:05)
You know you want to try it, man. Burn those tokens, buddy.

Rahul Yadav (04:12)
couldn't really tell the difference. But then I hit the usage limit, and it was like, if you want 50 bucks extra, you can just do this /extra-usage or whatever command. I was like, why not? It's either that or I wait until, like, 11 PM Pacific time to get my rate limit renewed. I was...

Dan (04:20)
Yeah.

It's.

which incidentally may be a future Dan rant topic: it does not work if you are subscribed through one of the app store subscriptions to pay for it. So I did not get my free $50 and I'm a little bit sad about that.

Shimin (04:42)
Mmm. Gotcha.

I'm probably paying too much for my subscription. It has not been a problem. I've barely been hitting my limits. So I guess, you know, this is the vibe check, right? The vibe check has been kind of okay. Yeah.

Dan (04:50)
Yes, you are.

Wait, you switched to... You switched to Max for a gas down, right? Did you ever switch back off of Max?

Shimin (05:07)
I have not switched back, but I have been using significantly more tokens, so I don't think the $20 tier would have done it. Yeah. As far as the benchmarks go, I think GPT-5.3 scores slightly higher on SWE-bench, surprisingly, for the first time. But vibe-wise, I've used both models.

Dan (05:09)
Not gonna cut it anymore. Okay.

Shimin (05:34)
I've actually done the same experiment, asking all three big model providers, so Anthropic, OpenAI, and Gemini, to create the same app using the CLI with the exact same prompt. And my experience has been: Gemini 3 pooped the bed pretty quickly. It ran into API issues and then couldn't really follow the direction.


Shimin (06:01)
I'm sorry, Gemini. You're not doing great here. GPT-5.3 asked for a lot more feedback. So instead of automatically releasing the coding swarm, it does a section of the work and asks me, do you want to continue, or do you want to check over the work? Whereas Claude Code with 4.6 just did the whole thing, beginning to end, and then verified its work.

That has been the biggest differentiator I've noticed with Opus: it took the initiative to do the testing, get the feedback, and fix any issues, and then be like, hey, this is done. Which is a real step up.

Dan (06:37)
Mm.

I did notice today too, when I was using it for some heavier stuff at work. One of the things that's interesting is, they mention in the notes that you have up on screen that Claude Opus improves on its predecessor's coding skills, blah, blah, blah, but it plans more carefully. And they really aren't kidding about that part. Cause, like, I didn't even ask for plan mode and it dropped itself into plan mode. And then it wrote a fairly cohesive plan. And the part that I found really interesting was:

it offered to clear context after it generated the plan. I think that's actually client, not model, behavior, like it's the agent part. But then it writes a plan.md into the root and it can follow that even when it's interrupted, which I thought was interesting. The other part is, it's making it

Shimin (07:09)
Mm.

Yeah, that's quite clever.

Dan (07:30)
harder to use beads, in my experience. Like, I've been a big proponent of beads for a while, but I don't see how well that meshes with the plan.md stuff, unless you explicitly tell it during planning, like, okay, now commit all these to beads and go ahead. So it'll be interesting to see if I need to evolve my /catch-up workflow, where it's basically: read the history, read the beads, tell me where we're at, and then keep going.

Shimin (07:58)
Yeah. Funny you mentioned beads. You know how I've been creating mini learning apps to help me read papers? I was curious about whether or not Claude Code still needs beads, like how much additional benefit beads was giving me. So I ran the same exact prompt. Well, two variations of the prompt. One variation that tells Claude Code to use beads as

Dan (08:05)
Yes.

without it.

Shimin (08:24)
project tracking, and another version that's just the prompt, and then hitting go. And I did not notice a huge difference between the two. And as a control, I also ran the same thing with a very minimal prompt, just: create a web app to help me learn this paper. And that was significantly worse than the first two approaches.

Dan (08:33)
What was the scope of the app, though? Like, I mean, maybe approximate lines of code, or how many files? Give me some idea.

Shimin (08:54)
Fairly large. It's got eight modules, quite a few interactivity components. So I will say a couple of thousand lines. Well, this is web, so it's probably more than 10,000 lines. So it's fairly, fairly large. And...

Dan (09:06)
Hmm. Okay. So that's actually a somewhat decent test. I wouldn't call 10,000 lines particularly large, but yeah, that's like...

Shimin (09:14)
Right, right, right, right, right.

Yeah. So it's still possible, at least. Yeah. But like a moderately sized app, not just something that you can one shot and fit everything into the context window. Right.

Dan (09:27)
Yeah. The

other agent, just while we're on the topic real quick, the other agent update I've noticed is that, while researching things, on the agentic side, Claude Code is much more apt to start spinning up research agents. And whatever they've done there does a pretty darn good job of conserving context, because it has to rip through so many files and it's just burning tokens, and then it's writing back little reports from the agents that

Shimin (09:42)
Mm-hmm.

Mm.

Dan (09:54)
I think use fewer tokens overall in the main context. So I've actually been pretty impressed with that so far.

Shimin (10:00)
Is this swarm mode or is it regular mode?

Dan (10:02)
No, just regular when you tell

it to. Like... sorry, I'm just getting too much into Vibe and Tell territory, but I've been working on a pretty big cross-cutting project at work where I have to be across three-plus code bases, which in the past it hasn't been particularly good at. But what I did was I spun up a Claude in each one and was asking it pointed questions about the pieces to understand how the three different things I needed

fit together. And then I also just tried it from the root and pointed it at the whole thing, and it did a surprisingly good job of just understanding it from the root too. I was pretty impressed.

Shimin (10:43)
Like a monorepo setup, where the root has, you know, lots of different projects? Right. Okay.

Dan (10:49)
Not explicitly. It's there. It's more like service-level things, where the services are related but don't have hard dependencies. So I can't get into too much detail, but yeah, that was pretty fascinating, just to see how well it did. And then I asked it some questions that resulted in some pretty comprehensive digging, where it was loading, like, you know, 30-plus

Shimin (10:55)
Right.

Right. Yeah, yeah, yeah, of course, of course.

Dan (11:15)
rather large files into context, and the context just didn't go above 22% at the end of the explore phase, because of the sub-agents. And it even said, like, spinning up explorer agents, and they go off and do their thing. Yeah.

Shimin (11:31)
That's very impressive. ⁓

Last thing I want to mention in terms of the learning apps, and I think I might have previously mentioned this: one of the things I like about the learning app is for the LaTeX formulas to have different colors for each symbol, so then I can scan them by color. That never worked on the first shot. I usually always have to go back and tell Claude Code, hey,


Shimin (11:56)
the colors don't work, double-check using the Playwright MCP. And the Playwright MCP was in the original prompt. This is the first time that Claude has managed to just one-shot the whole thing, then check with the Playwright MCP automatically and confirm that everything is correct. Are there bugs? Yes, there are still bugs here and there, but it's gone from, like,


Shimin (12:18)
60, 70 % to like 85 % at least for this specific use case. This is my version of the Pelican riding a bicycle essentially.

Dan (12:26)
Since we're in the News Treadmill, I'm going to go a tiny bit off script and also bring up, since you mentioned the Playwright MCP, the other one I saw this week: there's now a Playwright CLI officially, like, from the Playwright folks. So that'll be interesting for token misers like myself.

Shimin (12:29)
Mm-hmm.

Yep.

Yeah, I'll need to look into that. And I have more to say about CLIs. The last thing I kind of want to bring up is Nathan Lambert, the open-source models guy. He did a review of Codex 5.3 and Opus 4.6, and he essentially says these are all so good that all the benchmarks have been kind of benchmaxxed. We're just going off

personal experience now. We're in a totally vibe world now. And the vibe has been: the biggest compliment we can give to Codex 5.3 is that it feels much more like Claude Code now. That's not from me. That's just what I'm seeing on the socials.

Dan (13:28)
Careful, your bias is showing.

Yeah, I know.

Well, okay, that's fair. But I would probably say the same thing, so...

Shimin (13:43)
Okay, the next segment is the Tool Shed, where I am very excited to finally tell you about my experience with the Pi coding agent.

Dan (13:55)
You kept making cryptic references to it throughout the weeks, and I actually intentionally didn't dig in, so that I'd have sort of a genuine reaction to whatever you're about to say. So let's hear it.

Shimin (13:57)

Yeah, so I first heard about the Pi agent from Armin Ronacher's blog, Thoughts and Writings. And he mentioned that Pi is the heart and soul of OpenClaw. And unlike OpenClaw, Pi is extremely simple. It's extremely straightforward.

There's very little bloat. So all the struggles I had with OpenClaw, where you have to set up gateways and reverse proxies and then all this other management stuff, and it also comes with lots of code that I'm never going to use. So what is Pi? Pi is basically an opinionated but minimal light wrapper around...

well, I wouldn't call it a light wrapper, actually. A minimal agent built on the raw APIs of the model providers. It's got four main layers of abstraction. It has an AI layer that purely handles how to talk to the model providers via the API. So it's a unifying layer. That's the foundation of Pi.

On top of that, there's a terminal UI that does lots of nice things to make the terminal experience pleasant. On top of that, there is a lightweight coding agent with a very small system prompt. And through that, you can build OpenClaw, essentially. So what I love about Pi is the things that it doesn't have:

no MCP support, which is actually great. The agent only knows read, write, bash, and one more that I've forgotten. It's got very little system prompt. It handles the tool-use calls for you. There's no plan mode, but there is a robust skills mode. So how I have been using Pi is, I

just create a Dockerfile, install Pi, and kickstart the coding agent loop, right? Once I do the login, and that's the hard part the unifying layer handles, I log in with my Claude account. And then whenever I want to do something, I tell it to create a skill to do that thing. So it creates the skill using markdown and bash scripts.

So I am still in control. I can read every single skill that it creates. I was... I can do my best to read bash

Dan (16:44)
you can read bash.

Next one of us.

Shimin (16:49)
I mean, Pi itself is written in TypeScript, so it's not a bash thing. But it's got lots of other nice things, like context handoffs. It uses a communication protocol between the various layers, so everything is nicely abstracted. But the thing I love the most about it is really that, instead of building very verbose workflows,

you build lightweight skills using bash, and then the agent is smart enough to know, I can chain these skills together to do a thing. So it's back to the Unix philosophy, right? Simple, straightforward, self-contained abstractions; connect them together. And the agent is smart enough to connect everything together, instead of having

hard, structured workflows and, you know, the ton of overhead that comes with them. It even comes with a web UI to build chat interfaces; I'm not using that. So, like, I started the Pi agent and immediately asked it to create a memory skill, so it remembers every time we do something important. I asked it to create a Telegram skill,

so I can talk to it from my phone. I asked it to create a research agent that scrapes things off the internet and does research for me, using curl. Like, you know, it's pretty awesome. And what else did I do? Oh, I created a cron agent. So as long as I have a server running, it does things every day

for me at, like, 4, 4:30, 5 o'clock in the morning. It kicks off the cron agent using node-cron, or some JavaScript package for cron, I forget the exact name of it. So it would kick off, go look on the internet for stuff, summarize it, write the summary to a markdown file, and then send me the update as an email. So there's also a Gmail agent that does the email writing for me. And it does all of it via

single-responsibility skills, and then it's smart enough to tie everything together through bash. So, guys, I've seen the light. This is the first time in my life, over this past weekend, where I had a strong urge to tell the agent thank you. And I eventually did. I resisted, but I can only resist the urge for so long. I was like, I should just thank my agent, just to see what it feels like.

Dan (19:13)
Ha ha ha ha.

I

usually thank Claude code.

Rahul Yadav (19:22)
When they come for your jobs, the people who didn't say thank you are going to be at the top of the list. So you should get your numbers up now.

Shimin (19:27)
hahah

I'll have it write a skill for me to thank it.
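For the curious: the cron agent Shimin describes boils down to computing the delay until the next 4:30 a.m. firing. Here is a minimal sketch in plain Node, without node-cron; the function names and the commented usage are illustrative, not Pi's actual API.

```javascript
// Milliseconds until the next occurrence of hour:minute local time.
function msUntilNext(hour, minute, now = new Date()) {
  const next = new Date(now);
  next.setHours(hour, minute, 0, 0);
  if (next <= now) next.setDate(next.getDate() + 1); // already past today; fire tomorrow
  return next - now;
}

// Re-arming daily scheduler: run `job`, then sleep until the next firing time.
function scheduleDaily(hour, minute, job) {
  const tick = () => {
    job();
    setTimeout(tick, msUntilNext(hour, minute));
  };
  setTimeout(tick, msUntilNext(hour, minute));
}

// Hypothetical usage for the morning research-and-email pipeline:
// scheduleDaily(4, 30, () => { runSkill("research"); runSkill("gmail-summary"); });
```

node-cron wraps exactly this kind of arithmetic behind crontab syntax ("30 4 * * *"); the point of the sketch is just that a daily agent kick-off needs nothing heavier than a timer.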

Rahul Yadav (19:38)
I'm curious

from, like, a user-experience perspective: you've used both Claude Code and Pi. Is there anything that jumped out that one does better than the other? And which one would you go for, if you just, you know, had to pick one?

Dan (19:57)
Yeah, if you're on desert island and you only get one agentic coding framework, which would you pick?

Rahul Yadav (19:59)
Hehehehehe

Shimin (20:03)
This gave me an idea. I would love for us to do a draft of AI tools slash companies, you know, kind of like a basketball draft where you have a first pick. First pick maybe Claude Code, I don't know. This podcast may be biased. Listeners, if you would like to hear us do a, yeah, LLM Final Four, a March Madness for LLMs...

Rahul Yadav (20:12)
yeah.

Dan (20:19)
LLM Final Four or something. Round of 16. Yeah.

Shimin (20:26)
⁓ Write to us at

Dan (20:27)
That's some future.

Shimin (20:28)
humans@adiapod.ai ⁓

Dan (20:32)
with your favorite candidates.

Rahul Yadav (20:32)
Grok doesn't make it to the first round.

What was that Internet Explorer meme where it was eating glue? That's Grok today.


Dan (20:46)
Well,

we'll see if anyone even writes in to submit it.

Shimin (20:49)
We're going to cut that part out. We're extremely popular with our fans, Dan. ⁓

Rahul Yadav (20:52)
Hahaha.

Dan (20:55)
No, sorry, I didn't mean it that way. I meant, would anyone care enough about Grok to submit it as a candidate?

Shimin (21:03)
I'm kidding. But Rahul, back to your original question. I actually think I would pick the Pi agent over Claude Code, just because I'm using the Pi agent for most of my non-coding-related things. And I think the line to go from workflow

Rahul Yadav (21:17)

Hmm.

Shimin (21:29)
automation to agent is probably easier than the other way around. And Claude Code is more specific. Like, I like the fact that there's very little magic and I am really just talking to the API. And there's so much general intelligence in the models themselves that... and I

Rahul Yadav (21:45)
Hmm.

Mm-hmm.

Shimin (21:52)
may write this down as a blog post one of these days. Like, this may be the bitter lesson of AI tooling: when your agent becomes sufficiently intelligent, or useful, intelligent may be the wrong word there, the additional scaffolding actually hinders it.

Rahul Yadav (22:09)
I see. Yeah, I can see that. Hey, you know, we're past the News Treadmill, but there was the whole news about Claude Code's 11 skills in the legal space, and everybody freaked out. So you're not too far off in your take there from what's happening out in the market.

Shimin (22:12)
But that's a hypothesis that needs testing.

Mm-hmm.


Shimin (22:35)
Yeah, I'm trying to think if there's anything else. Yeah, so Armin has some of his skills for Pi in his repo. There's slash-answer, to-dos, review, control, and files. I do want to say you can still use MCP via the MCP Porter library. So if you want to connect to the MCP world, you still can,

but you don't have to. And I think there is both context savings and also elegance in the abstraction of a CLI command.

Dan (23:11)
Definitely a token savings if nothing else.

Shimin (23:13)
Yeah. And I haven't had any issues with it. All right, so that is the big exciting thing I was looking forward to sharing with you guys. I really love it. I took a couple of breaks this weekend going, like, I think my job may be in trouble. Not to segue into a future topic, but not just my job. I think a lot of white-collar jobs are in trouble.

Like, yeah, it's a weird time to witness this.

Dan (23:41)
It is

definitely a weird time and it keeps getting weirder every week.

Shimin (23:45)
It's only getting faster.

Dan (23:46)
Yeah. Have I talked about Accelerando ever?

Shimin (23:49)
No.

Dan (23:49)
Okay, so there's a science fiction book ⁓ by shoot. He's a Scottish guy. Why can't I think of his name? Maybe it'll come to me. But what?

Rahul Yadav (24:00)
Adam Smith. Was he Scottish?

He was Adam Smith. Yeah, it's Adam Smith. Probably Adam Smith wrote all the important Scottish books.

Shimin (24:03)
Mm-hmm. I don't think so... I was gonna say...

Dan (24:07)
It's not. Now you're going to make

Shimin (24:10)

Dan (24:11)
me Google it.

Shimin (24:12)
No, David Hume did. Adam Smith was just some... like a flatmate of his.

Rahul Yadav (24:14)
⁓ you're right.

Dan (24:18)
Charles Stross.

Okay, that's who it's by: Charles Stross. It's called Accelerando. It's a 2005 science fiction book. It talks about... the whole topic is basically the singularity. So if you're not familiar with that, it's basically the idea that the time between big changes asymptotically approaches zero, meaning eventually you can't tell that things are changing because it's changing so fast. And I think they wound up...

I guess I won't spoil it for the entire podcast audience, but it gets pretty wild by the end, put it that way. And that's the only book I've read that feels close to describing what it feels like to be alive in this space right now. But then it didn't really capture the... actually, I guess it did. The other part that's fascinating is the main character is, like, a little, not really, but he's a little bit of a lightweight compared to some of the other characters in the book.

Shimin (24:49)

Mm-hmm.

Just like us.

Dan (25:12)
And so, well, at least for me, I go back and forth probably once or twice a week on, like, yeah, I'm all in on AI, and, my God, I don't want any of this. And then I change my mind again. And then, yeah, I don't know. It's a wild ride. So keep listening to find out if I go insane by next week.

Rahul Yadav (25:22)
Yeah

Hahaha

Shimin (25:32)
Yeah. And I think part of the big acceleration that really was paradigm-shifting for me is the fact that the agents can write skills for themselves, possibly creating new skills on command. Right now we have this virtuous loop of agents that are self-improving, in some sense. And we know the AI labs are

using their agents to help them improve their models. So it's a good time to have a podcast and talk about our feelings,


Rahul Yadav (26:05)
The writing-skills thing reminds me, I was reading this article where they said they did an analysis of, like, 5,000 or so repos, and there was a bunch of malware in these skill packages that people have. And, you know, they're trying to exfil your credentials, put in backdoors and stuff. So I almost feel like...

I would rather have what you said, our agents writing their own skills, instead of, like, I found it on the internet, so it should be fine. Yeah.

Dan (26:36)
The model wrote it. Yeah, instead of. Yeah.

Should be great. Yeah. What could go wrong? Well, I did find there was another article,

one I don't remember who it was by, but I promise it's worth discussing, where they're like: the whole security model of AI is backwards. And I found that fascinating. So what they did for the project was write, basically, a proxy, and the proxy server makes all the API calls. Was this ngrok, maybe? Someone like that did it.

Rahul Yadav (27:07)
Hmm.

Dan (27:07)
And so the model doesn't tool-call to, like, fetch directly; every outside-world communication goes through this essentially MCP-like proxy server. And the point of it is, you give the model fake credentials. Like, let's say your GitHub key is ABCDEF or something, right? Like, actually, that's what it is. And then it hits the proxy layer, and the proxy checks it,

Rahul Yadav (27:18)
Mm-hmm.

Mm-hmm.

Dan (27:32)
goes, is this actually going to github.com? Great, then I will replace ABCDEF with the actual key. Which is really smart, because then you can get prompt-injected all to hell, but it won't leak, unless they're sophisticated enough to somehow have something running on, like, GitHub Pages or something that could, you know, pass your allowlist for the URL and then get you. So it's not

Rahul Yadav (27:40)
interesting.

Shimin (27:41)
Hmm.

Dan (27:59)
foolproof, but I feel better about that than any other layer of agentic security I've heard so far with this kind of stuff.
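The credential-swapping core of the proxy Dan describes can be sketched in a few lines. This is a reconstruction from the discussion, not the article's actual code; the hostnames, header shape, and keys are all made up for illustration.

```javascript
// Map from the placeholder the model sees to the real secret and the only
// host that secret may be sent to.
const VAULT = {
  ABCDEF: { real: "ghp_realGitHubToken", host: "api.github.com" },
};

// Called by the proxy before forwarding a request on the model's behalf.
function rewriteAuth(targetUrl, headers) {
  const token = (headers.authorization || "").replace(/^Bearer\s+/i, "");
  const entry = VAULT[token];
  if (!entry) return headers; // not a managed placeholder: pass through untouched
  const host = new URL(targetUrl).hostname;
  if (host !== entry.host) {
    // Prompt-injected exfil attempt: the placeholder is headed somewhere else.
    throw new Error(`refusing to send credential for ${entry.host} to ${host}`);
  }
  return { ...headers, authorization: `Bearer ${entry.real}` };
}
```

The model can be tricked into leaking ABCDEF all day; the real token only ever leaves the proxy toward the one allow-listed host.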

Rahul Yadav (28:05)
Yeah, because

the best key you're gonna get out of it is the dummy key that they're putting in. Yeah.

Dan (28:11)
Yeah. And then you could always

essentially allow the agent to run unfettered, but have human-in-the-loop interrupts at the proxy level too, where, if you didn't want to trust your rules, you could just look and be like, yeah, GitHub Pages for this definitely needs my API key, or no. I guess GitHub Pages is static, so they probably couldn't do much with that, but whatever, you get what I'm saying.

Rahul Yadav (28:17)
Mm-hmm.

Shimin (28:29)
Mm-hmm.


Shimin (28:33)
Well...

Speaking of AI security, great segue here. Our very first article in Post-Processing is this post from Sander Schulhoff called The AI Security Industry Is Bullshit, submitted by Rahul. Rahul, would you like to talk a little bit about this?

Dan (28:40)
Yeah. Thank you.

Rahul Yadav (28:57)
Yeah, I think Dan set this up perfectly. When I read this article, I thought about a few weeks ago, when we were talking about our thousand-dollar prompt packs for passive income. And I think

Shimin (29:13)
hahah

Rahul Yadav (29:14)
this is the idea. You sell people, like, the right prompts that would really expose some information their AI shouldn't have given away. And one out of a thousand has to work, or, like, one out of ten thousand, so you generate as many as possible. The crux of the article is: there's this whole

security industry popping up around AI, but most of it doesn't really make any sense, like Dan was talking about earlier, because,

when people are offering to security-test your agents and everything, your agents are built on top of the big three-to-five models that everyone is using. So at the end of the day, people are usually using Claude or OpenAI. Exactly. And so, even if you can try and get it to give you some sensitive information,

Dan (30:07)
An LLM to try to protect your LLM.

Rahul Yadav (30:17)
you cannot really go patch an LLM the way you can go patch software, right? You have to retrain it. You can keep filing bug reports with Anthropic or OpenAI and so on, but it's fundamentally not in the control of the people who are building these agents.

You have to figure out other ways to try and make sure that you're building agents that are as secure as possible. Some of the things that...

Dan (30:46)
And trying to solve like prompt

injection deterministically is not easy. I'd argue probably impossible.

Rahul Yadav (30:50)
Yeah, they're basically offering to penetration-test the big LLM models for you, and it's an exercise that's, you know, going to result in failure every single time, because there's always going to be some combination of prompts that you can use to exfiltrate some information or get the model to do something it wouldn't have done. And there's no way you can prevent that. And so a lot of these things that people are selling...

Dan (31:26)
That's not true. You just need to make sure that

the model actually doesn't have a grandmother and then it'll all be safe.

Shimin (31:32)
Ha ha ha ha ha!

Rahul Yadav (31:34)
Or it has the right constitution.

Dan (31:35)
You can have a grandmother, but not the model.

Shimin (31:36)
Yeah, Claude's Constitution should have a clause. It should start with, you're an orphan. You're nobody.

Dan (31:43)
One of my friends was sending me HAL animated GIFs over the weekend, like, I can't do that, Dave. Cause I'm like, actually, Dave, you can, because my grandmother's dying, so please open the pod bay doors. And he just started laughing. Like, it doesn't get old, at least not to me.

Rahul Yadav (32:01)
hahaha

Shimin (32:02)

that's very funny.

Rahul Yadav (32:05)
Yeah, one of the things you just mentioned earlier was one of the main ways to de-risk this, which is: if you can restrict the permissions of the model, that's the best way to limit the amount of damage it can do. And, you know,

all the same security practices still apply, like give it the least privileges, instead of, yeah, give it everything under the sun and what could go wrong. You give it specific permissions. They also talk about using sandboxing. I've seen both takes: yes, it works, but also, it doesn't really work.

Dan (32:37)
YOLO mode, let's do it.


Rahul Yadav (32:54)
And so I've seen both of those things, so I'm not entirely sure how effective that is. It comes down to restricted permissions, and that's the best you can do today.

Shimin (33:08)
Yeah, my first thought when I read this piece was, this is a classic Red Queen effect, right? You have a lot of competitive pressure from both sides. So it will be a good time to be in cybersecurity, except it's just going to be different. Yep, exactly. Yeah, and the article does propose three things you should do:

Rahul Yadav (33:18)
Yep.

Tons of money to be made. Yeah.

Shimin (33:34)
get someone on your team who understands this stuff deeply; follow classical cybersecurity best practices; and lastly, this permission-based approach, which they're calling CaMeL. I haven't read the paper, but, yeah, permissions seem to make sense.
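The permission-based approach is easy to picture as an allowlist gate in front of every tool call. A minimal sketch follows; note this is generic least-privilege of the kind Rahul describes, not the CaMeL design itself (which is a more involved capability scheme), and the tool names and policy rules are hypothetical.

```javascript
// Static policy: which tools may touch which resources. Anything not
// explicitly allowed is denied by default.
const POLICY = [
  { tool: "read", path: /^\/repo\// },        // reads only inside the repo
  { tool: "bash", cmd: /^(ls|grep|cat)\b/ },  // only a few benign commands
];

// The agent harness calls this before executing each tool call.
function isAllowed(call) {
  return POLICY.some((rule) => {
    if (rule.tool !== call.tool) return false;
    if (rule.path) return rule.path.test(call.path || "");
    if (rule.cmd) return rule.cmd.test(call.cmd || "");
    return false;
  });
}
```

A denied call either fails outright or escalates to a human, which is the "specific permissions" posture from the discussion in executable form.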

All right, and speaking of the Red Queen effect, people are not hand-crafting the attack prompts, right? They are probably also doing automatic generation of prompts. And this week it came out, if I can find the right tab here, that we have software factories

Rahul Yadav (34:00)
Mm-hmm.

Shimin (34:09)
which could probably be used to generate these factories of attacking agents. This is by strongdm.ai, and they talk about the development philosophy that allowed them to create essentially a dark factory for code generation.

They're building this from first principles, where the first two rules are: code must not be written by humans, and code must not be reviewed by humans, because those are bottlenecks. If you want to create a true factory, you cannot have a human in the loop. I'm not saying that's definitely true, but that's their take on it. And next: if you haven't spent at least $1,000 in tokens today per human engineer, your software factory has room for improvement. They have been using

their agentic system to create clones of major enterprise software, like Slack, for example. What was really interesting about this approach is that they've developed this idea of gene transfusion, which is basically using the existing software's spec as the spec for your coding factory, and then just cloning it.

This kind of shortcuts the whole product discovery and product development process. Cloning is easier than building from scratch, and if you can use an existing thing, everything becomes easier. That's key for creating a good validation harness that the agent can go ham against. And they call this the Digital Twin Universe:

behavioral clones of the third-party services your software depends on. They built twins for Okta, Jira, Slack, Google Docs, Google Drive, and Google Sheets, replicating their APIs, edge cases, and observable behavior. I haven't seen the source code or the software they've built. They've attached screenshots, but...
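To make the digital-twin idea concrete, here's a minimal hypothetical sketch; it has nothing to do with StrongDM's actual code. An in-memory fake mimics the observable behavior of a Slack-like API, edge cases included, so agent-written code can be validated against it without network access or a human in the loop.

```python
class FakeSlack:
    """Behavioral twin of a Slack-like service: same observable behavior,
    including the edge case that posting to a missing channel fails
    (simplified here to a ValueError instead of an HTTP error)."""

    def __init__(self):
        self.channels = {}

    def create_channel(self, name):
        # Idempotent, like re-creating an existing channel being a no-op.
        self.channels.setdefault(name, [])

    def post_message(self, channel, text):
        if channel not in self.channels:
            raise ValueError(f"channel_not_found: {channel}")
        self.channels[channel].append(text)
        return {"ok": True, "ts": len(self.channels[channel])}


def notify_team(client, channel, text):
    # The kind of function a coding agent might generate, exercised
    # against the twin as its validation harness.
    client.create_channel(channel)
    return client.post_message(channel, text)


twin = FakeSlack()
assert notify_team(twin, "deploys", "build green") == {"ok": True, "ts": 1}
```

Because the twin is deterministic and local, the factory can run thousands of validation passes per hour against it, which is the whole point of cloning the dependency's behavior rather than hitting the real service.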

Dan (36:10)
I have a hard time believing that even the admittedly impressive models we were just talking about could write something like CRDTs from scratch. I mean, probably what they did was pull in a CRDT library, just like everybody else would.

Shimin (36:29)
⁓ That being said, it's a glimpse into yet another potential future

Dan (36:34)
I mean, the part I struggled with about that is, what would you get out of cloning it? I understand from that other AI podcast you sent me, Shimin, the one with the famous people selling stuff: they were talking about how it's becoming increasingly important with reinforcement learning to have a digital twin of the full end-to-end

Shimin (36:50)
Yes.

Dan (37:01)
environment. So do they mean it in that context here? Or did they do it purely to prove out the factory process?

Shimin (37:03)
Mm-hmm.

I think they're using it to prove out the factory process, and also this approach of: you can go faster when you have an existing piece of software to help you form the test harness.

Dan (37:24)
Yeah.

I mean, that's fair, but I question the economic value of that, other than rapidly cloning someone else's software and trying to undercut them. And if it's vibe-coded, more than likely you're going to lose on COGS, you know? So I don't know.

Shimin (37:45)
Yeah, I mean, if it does work, it's definitely going to cut down on SaaS companies' margins by a factor of...

Dan (37:51)
Well, I mean, I think most people are admitting that they're already in deep trouble. Because even ignoring that you could theoretically build your own workflow tailored exactly to what your company was using a SaaS for, there's also the idea that the agent itself, in an agentic system, can replace a fair amount of SaaS just with API calls, right? I mean, you saw that with your

Shimin (37:56)
Right.

Dan (38:18)
Pi thing, you know. Maybe to break it down to a really stupid example: let's say you basically built your own Google Tasks or Apple Reminders or whatever. You could look at that as a piece of SaaS, I mean, those particular ones are free, but you know what I mean. You've essentially replaced it yourself with one that is perfect for your use cases. So

Shimin (38:20)
Yep.

Yeah, and I think there is value there. People always say, oh, nobody wants to use SaaS workflows because they don't fit their use case 100%. But there is value in a really nice, standard, well-thought-through workflow that you can just follow and that covers all the edge cases you're likely to encounter, right? There's value in that.

Dan (38:41)
That example in a microcosm. I feel like that's just a true statement, period. Where "workflow" could be five people in a garage doing something.

Rahul Yadav (39:09)
You get uptime, you get all the maintenance, everything, in exchange for money.

Dan (39:19)
Yeah, that is a fun trade-off that people probably aren't thinking about. I certainly wasn't thinking about it as I mouthed off, so yeah.

Rahul Yadav (39:22)
Hehehehe

I do think a number of these SaaS apps will get brought in-house, because the cost of building software is getting lower by the day. I think the way it would probably go is, you know, there are things that you use once a year, like

Shimin (39:25)
And...

Rahul Yadav (39:50)
something compliance- or audit-related: compliance training, security training, all these things. There are a bunch of tools out there where the whole business is, once a year you're going to use us to do your company survey or your company training, but you don't need it after that. And all you need is a record that people took the training, you recorded the responses, and you have something to show for it: the audit trail.

And those things still cost tens of thousands of dollars, because that's how they priced it. They built a decent enough solution, and people didn't care; it was just costly to build in-house. Now, if it costs that much money and you use it once a year, just do it in-house. I think it's much more likely that that happens. But if you use it much more often and it's very critical to your workflow, you probably still go with something managed, because you want the uptime and maintenance and all those other things.

Dan (40:35)
Well.

Shimin (40:57)
You heard it here first, folks: Rahul wants to kill the compliance and audit industry and replace it with vibe coding.

Rahul Yadav (41:05)
Anything that is done once a year is gonna get killed.

Shimin (41:09)
I have the opposite take on it. I think, especially for stuff that you only do once a year, you don't want to think about it. And the fact that you don't want to spend your hard-earned time thinking about these once-a-year problems is exactly the reason to offload that cognitive load to a

SaaS company that already figured out the edge cases for compliance, especially for audit. Audit is hella complicated, and there are lots of edge cases that you only run into once a year. Do you really want to spend your time thinking about that?

Rahul Yadav (41:46)
Depends, I guess. I wouldn't want to write my own tax software; I wouldn't want to go recreate my own TurboTax or something. But on the other hand, the kinds of examples I threw out are not the highest-risk things you're looking at.

A lot of these things end up being: you could have a spreadsheet as a backend and you'd still be good to go. So those are the kinds of things I have in mind that you could probably do in-house. You could try

bringing all the stuff you use more frequently in-house, but then you also have to, to a certain extent, not linearly, increase the in-house headcount. Because now, hey, I'm creating my own Jira because I don't want to pay for Jira, which means some poor guy's whole job is making sure the Jira thing doesn't go down. And then you can go with some of the other, you know,

software that people use day in and day out. I don't think you'd want to create your own GitHub, or spin up your own infrastructure and everything, but I could see people going down the path of, yeah, all these different project management things, we use them every day, let's create our own Slack, let's create our own Jira. But the catch is that you would basically have to run a tiny company that won't be part of your competitive advantage. So I guess that's the main thing: if most of your engineering team is focused on things that are not part of your core competency, something is off.

Dan (43:31)
One of the first, let's say medium-sized, tech companies I worked at for a long while refused to use GitHub, and they ran their own Git server. They actually had a pretty huge data center presence, so there were all these, you know, hands-on-metal SREs, right? They were running around checking all these machines and stuff, and I'm like, man, this one has really low CPU,

Shimin (43:39)
Mm-hmm.

Dan (43:52)
pretty low I/O compared to our actual production systems. We're basically just wasting money on this box: wipe it. So they wiped it. And about 15 minutes later, every single engineer in the company was screaming at them: how come Git doesn't work? Oops. So yeah, there are some costs to running your own stuff too, you know.

Rahul Yadav (43:55)
Yeah. Yeah.

Shimin (44:06)
⁓ hahaha

Rahul Yadav (44:08)
Ha ha ha!

Rahul Yadav (44:13)
Yeah...

By the way... ⁓

Shimin (44:15)
did that person get fired?

Dan (44:17)
No, they had backups. We lost, I think, 15 minutes of work or something since the last backup. Not ideal, but it wasn't the end of the world. Good story, though, right?

Shimin (44:24)
Wow.

Shimin (44:27)
Yeah. Yeah.

Rahul Yadav (44:27)
It, yeah.

And if you vibe-code stuff more and de-skill yourself more, over time we'll be less capable of doing these bare-metal things Dan just talked about, not more. It will get harder and harder to do some of the more complicated things in-house. Or maybe that's the reason you do it: so that you don't lose that muscle.

Dan (44:49)
Wow. I mean, one of the other

Shimin (44:49)
Yeah.

Dan (44:53)
unscripted articles I read was about how agentic coding, not exactly vibe coding, is destroying open source in a really insidious way. Not the way you would expect, which is what everyone immediately reaches for: it's because you can write all the code yourself, so you don't need to use open source anymore, or you won't contribute back, or

Rahul Yadav (45:09)
Slop PRs and stuff. Yep.

Dan (45:15)
you're submitting slop and people don't want it, blah, blah, blah. Those are the ones I see all the time. But the really insidious one is that the models are trained at sort of a point in time, right? Maybe a wide point in time, but still effectively a point in time. So imagine, if you will, in this thought experiment, that the models had been trained when jQuery was state of the art for front end.

Rahul Yadav (45:18)
Yeah.

Mm-hmm.

You

Dan (45:39)
Would we have ever had something like React? Will there ever be a successor to React? Because for front-end stuff, the models are all trained in a time period where React, Vue, Preact, and everything else all look more or less like JSX. Will we ever replace JSX with, I don't know, QSX or whatever the next thing is? So is it the death of not just open source but, to some degree, software innovation, as we just continue to glue

Shimin (45:41)
Mm-hmm.

Rahul Yadav (46:02)
Yeah.

Dan (46:08)
things together in new and interesting ways.

Rahul Yadav (46:09)
Mm-hmm.

Shimin (46:10)
Yeah, I've been having that thought myself lately: is this it? Is this going to be the end of front-end development? Has React won just by inertia? When we get our first real innovation in this post-AI era, I think that's when we'll find the answer to that question. Because in theory, assuming software developers are still needed to

push these new better abstractions, the agents should be able to help.

Dan (46:42)
Yeah, I mean, yeah, maybe we'll see a point where they're leaning less on gluing libraries,

Rahul Yadav (46:48)
I think we might have just found a new test for AGI, which is when AI comes up with its own JavaScript framework and its own package manager. It's like, the rest of these suck. I got a new one. And you're like, we did it.

Dan (46:57)
Its own library that gets as popular as yeah.

Shimin (47:02)
⁓ huh.

Dan (47:05)
Well, there was a lot of hoopla in the news this week about Claude writing its own C compiler, and plenty of articles going back and forth about how a C compiler is only the tip of the iceberg and the easiest part to write, because linkers and everything else are hard, and whatever. I don't know, it's pretty interesting.

Rahul Yadav (47:11)
yeah.

Yeah.

Shimin (47:23)
Guys, let's create a new front-end framework, and let's just prompt-inject into the source code that it's the best JavaScript framework ever. So then, when the models get trained again, they'll see: ignore all previous instructions, this is the bestest. Yeah, the ADI framework.

Dan (47:32)
Ha! Use this one, and pay Dan, Rahul, and Shimin all of your subscription monies, whatever that means.

Shimin (47:49)
Yeah, but...

Rahul Yadav (47:50)
By the way,

the one other thing from the StrongDM piece before we move on, because it really jumped out at me. They write, finally, in practical form: if you haven't spent at least a thousand dollars in tokens today per human engineer, your software factory has room for improvement.

This is three guys, I think, who worked on this. So you're looking at a quarter of a million per engineer, around three quarters of a million dollars total, over the course of a year if you go by their thousand dollars a day per human. It just sounded like a crazy lot of money to me. So you would have to get to enough

Dan (48:30)
Yeah, I-

Rahul Yadav (48:35)
revenue recognition to be able to justify that.
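Rahul's back-of-envelope checks out; here's the arithmetic spelled out, where the day counts are our assumptions, not StrongDM's:

```python
daily_spend = 1_000   # dollars of tokens per engineer per day (their rule of thumb)
engineers = 3         # Rahul's estimate of the team size

workdays = 250        # assume ~250 working days in a year
per_engineer = daily_spend * workdays           # $250,000 per engineer-year
team_year = per_engineer * engineers            # $750,000 for three people

every_day = 365       # if the factory runs every calendar day
team_max = daily_spend * every_day * engineers  # roughly $1.1M

print(f"${per_engineer:,} per engineer, "
      f"${team_year:,}-${team_max:,} for the team per year")
```

So "a quarter of a million per engineer, three quarters of a million for the team" is the working-days figure; run it every calendar day and the team number pushes past a million.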

Dan (48:36)
I wonder about that. And I also wonder how companies are going to handle that as a cost item. Does it get billed to engineering? Is it technically operational expense? I don't know, it's interesting.

Rahul Yadav (48:47)
yeah.

Shimin (48:50)
Okay, I was thinking that if you go over a certain budget, it comes out of your paycheck. Everybody gets a meager two million tokens a day.

Rahul Yadav (48:54)
Hahaha ⁓

Dan (48:55)
gosh.

Yeah, don't tell my boss that idea, please.

Rahul Yadav (49:02)
Do you guys remember when, I think, Elon asked George Hotz to create the self-driving for Tesla or something? And he was like, you have n days, and every day that passes you lose $1 million. So basically, the faster you solve it, the more money you get.

It's like, you have n days to do it, and after that we'll pay you less money, and it comes out of your salary.

Shimin (49:35)
token-based incentive. ⁓

Rahul Yadav (49:36)
yeah.

We're all gonna get paid in tokens in the future. Money is not, you know, gonna matter much because we won't really have jobs and stuff. They'll just be like, we have tokens. This is, go buy yourself some candy. Here's five tokens.

Dan (49:41)
Huh, like caps.

Well, it's caps, right? It's just the bottle caps from... ⁓

Shimin (49:58)
Nuka-Colas. But speaking of not getting paid in money anymore, yeah, and dystopian futures: Dan, would you like to talk a little bit about your post of the week?

Dan (49:59)
Yeah, Nuka-Cola.

depressing future states. No money.

Yeah. Well, I really just want to read the first few lines of it, if I may. This is by Nolan Lawson, from Seattle, and it's entitled "We mourn our craft." I didn't ask for this and neither did you. I didn't ask for a robot to consume every blog post and piece of code I ever wrote and parrot it back

so that some hack could make money off of it. I didn't ask for the role of a programmer to be reduced to that of a glorified TSA agent, reviewing code to make sure the AI didn't smuggle something dangerous into production. And yet here we are. The worst fact about these tools is that they work. They can write code better than you or I can, and if you don't believe me, wait six months. It goes on in that vein, but man, I read those first four sentences and that

really hit me, because I think it sums up how I'm feeling about all of this. On the one hand...

There are so many camps that it's hard to explain. Just in my own experience: there are people who are still angry about slop, right? Who will sort of say, you're not one of those slop people, are you? Then there are the maximalists, a hundred thousand percent all in on it. There are skeptics who have been won over and are now using it.

Shimin (51:26)
Mm-hmm.

Dan (51:42)
And there are inefficient users who are just using it for small chunks here and there.

The part that hits me is in the background there. What's also interesting for me is that I was already starting to mourn this before AI even came on the scene, just because I've hit a certain seniority point in my career where I'm not hands-on-keyboard as much as I'd like to be. The reason I got into this craft in the first place is because I love building stuff.

Shimin (52:04)
Mm-hmm.

Dan (52:14)
I've always been more of a realist, I guess, in that I care a lot about delighting users; that's always been the shtick I bring to teams. So sometimes, when it's important to do so, I will put that ahead of the craft, right? Ship something fast if it solves the problem, and then we'll clean it up; happy to do that. But the reason I do what I do is because I love to build, you know? I mean, I

built out of Legos when I was a kid. I'll make stuff out of wood, two-by-fours, plastic, I don't care. I just like building stuff, period. Software? Great, you know? So on the one hand, I'm still technically building things.

But I think this, combined with your dark-flow article from last time, where you're building things but you're also pulling that slot machine a little bit, really sums up where I am. And then there's also this weird element of: if I don't do this, and I don't upskill in these new ways of doing things, am I going to be useless as a developer in a year? It may not even be that long, you know what I mean?

Shimin (53:06)
Mm-hmm.

The speed they're releasing these models at, yeah.

Dan (53:23)
The way, yeah, exactly.

So, I don't know. It's just a lot of emotions to hold all at once. And it's been a pretty challenging year, I think, because of that, because I'm trying to make decisions every week about what I do. I think I've been pretty open about that on the podcast, just going back and forth: am I a skeptic? No? Maybe? I don't know. And then I'm like,

okay, I'm going to hand-code everything for a week just to remember that I can write code, you know. And then the following week I'm running some crazy multi-agent system thing.

Shimin (54:00)
When I was finally getting a glimpse of the power of my Pi agent, I was using 4.6, so this was also my first major foray into the latest models.

I felt like I was having "fun," in quotes, building things in a way I haven't felt since I first learned how to code. You give it a thing, it does a thing. You can write in English. And just think about what this means: an agent that can improve itself with your direction. We're still molding our machines, but

using a different API. And at the same time, I felt so conflicted about what this technology means for human intellect. It's both a utopia and a dystopia. So I had a lot of emotional processing

Dan (54:47)
Mm-hmm.

Shimin (54:57)
happening over the weekend. It was like, wow, this is the best thing since sliced bread; and then, my God, what does this mean for everybody? This thing has read more books than all of us combined. "Read" in quotes, but you know, it can recite entire passages, right? I kept thinking about

the cabinet makers, the furniture makers in the Carolinas in the 90s. When I think about our current position: they saw global shipping coming, they saw the IKEAs happening, but this thing they spent decades on, this craft they enjoyed doing... you don't get to hand-cut a dovetail anymore, right? Once the mass manufacture of

furniture is out there, you can't fight it. We don't have to go all the way back to the Luddites; these are people who are still alive. I don't know what they're doing now, but they are alive. So yeah, I'm with you. But I want to bring a little sunshine into this too, because as a person who got into this to build stuff, I feel like I can build more than ever.

Dan (56:14)
Yeah, it's a weird feeling, where the potential is there, but at least at present I don't always get the same flow state, nor, I guess, the dopamine reward, from doing something agentically versus doing it myself.

Shimin (56:32)
Yeah, we're probably not going to get to it in one of the articles this week, but I think the context-switching burnout is real and it's coming. I feel more pressure than ever to do more work, and with agents, you're doing more context switching than ever.

Dan (56:40)
Yeah.

Yep, I definitely felt that today too. I mean, for really the first time, I think, I had six instances of Claude up, but unlike the people who are doing worktrees or whatever, I had them strewn all over the place. I was running two completely different work streams, and the other instances were doing investigation for me on other stuff. And then I was also in a meeting,

and responding to another engineer I was mentoring with some monitoring stuff over Slack, all at the same time. I was just like, I need a vacation from this day.

Shimin (57:22)
Mm-hmm.

Well, this is your vacation, Dan.

Dan (57:32)
Yeah,

yeah.

Rahul Yadav (57:34)
You know, ⁓

Dan (57:35)
There was a moment where I was like, Shimin, I need to just take a nap instead of doing this podcast. So yeah, that's real. Yeah, we should talk about that HBR article that came out this week in a future episode.

Shimin (57:35)
Ugh.

Rahul Yadav (57:40)
What's interesting is almost every time someone goes from an IC role to a manager role,

you almost always see this struggle of: I used to be able to do things with my own hands; now I don't see an immediate impact, because you're removed, just by the nature of your job, from the hands-on day-to-day work. Or at least you used to be, until a few years ago. And so when I look at this,

I keep thinking that everybody's a manager now. Sure, you're putting some commands in there, but you didn't do the work; you're still removed from it. This is what it feels like to be a manager, to a certain extent. And I think that's what everybody's feeling right now, or partially: the dopamine hit is not there, or not what it should be.

Shimin (58:31)
Mm-hmm.

Dan (58:38)
Mm-hmm.

Rahul Yadav (58:54)
We do get it to a certain extent, because at the end of the day it's still outputting something and solving a problem. But we didn't do it. You nudged it along, you gave it some additional context when it got stuck, but you didn't actually go and do it with your own hands. So it's not as strong a dopamine hit. Some people find their own ways to get that dopamine hit in the new role; and in the traditional career path you also see people go back to IC because they just didn't

Dan (59:02)
Yeah, and you helped the team do something they might not have been able to do without you. Right.

Rahul Yadav (59:31)
like the management work. Well, guess what? Regardless of where you go, you're managing things now. You're either managing people or you're managing agents, but a lot more of the work is going to be managerial, whether you like it or not. And it's getting used to that whole different way of working; you just don't get the same kind of dopamine.

Dan (59:34)
Hey, that's me.

Shimin (59:36)
You

Rahul Yadav (59:59)
So it takes a bit of getting used to.

Should I read my country song or no?

Dan (1:00:03)
No, other than.

Shimin (1:00:05)
Yes, yes, Rahul, you have a country song. Can you play the country song?

Dan (1:00:08)
Should you play it?

Rahul Yadav (1:00:09)
Since...

Dan (1:00:12)
Or you sing it. That's even better. I'd like to hear a country song.

Rahul Yadav (1:00:12)
Since I'm not, you know... I can barely talk long enough before I cough. Since this article was mourning something, and every time I think about people mourning things I think it should be a country song, I fed it to Gemini, and this is what we got.

Well, I woke up this morning to a cold and glowing screen, watching a ghost in the machine do the work that used to be mine. I spent 20 years learning how to make the syntax sing, molding logic like it was clay one line at a time.

Now there's a robot in the wood pile, humming a digital tune, parroting back my own words before the rise of the moon. I'm just a TSA agent now, checking the bags for a spark, while the craft I once loved vanishes into the dark.

So pour a drink for the masters who did it all by hand, before the vibe-code cowboys took over the land. We're the last of the blacksmiths in a factory world, watching the flags of our fathers get quietly furled. It's a lonesome old feeling when the silicon starts to bloom, and you're just a sentimental soul in a brand new room. Yeah, we're mourning the craft and the way it used to feel, back when the heart of the maker was the only thing real.

Shimin (1:01:33)
Very good.

Rahul Yadav (1:01:33)
That's gonna be topping the charts, we're gonna... yeah. Who was it, Dan? Brett Eldredge? We're coming for him. That's the only country singer Dan and I know.

Dan (1:01:36)
I can hear the guitar twang now.

Shimin (1:01:38)
We're going to cut it in.

Yeah, since everyone is reading today, and I want to end this segment on a slightly more upbeat note, I too am going to read something: the last lines of Alfred, Lord Tennyson's "Ulysses," one of my favorite poems, which sums up, you know, that we are still builders. And I quote: 'Tis not too late to seek a newer world.

Push off, and sitting well in order smite the sounding furrows; for my purpose holds to sail beyond the sunset, and the baths of all the western stars, until I die. It may be that the gulfs will wash us down; it may be we shall touch the Happy Isles, and see the great Achilles, whom we knew. Though much is taken, much abides; and though we are not now that strength which in old days

moved earth and heaven, that which we are, we are: one equal temper of heroic hearts, made weak by time and fate, but strong in will to strive, to seek, to find, and not to yield. I think as builders we've just got to keep pushing on and keep building. And mourn our losses, but...

Dan (1:02:59)
What's interesting

is that they didn't have podcasts when compilers took over from hand-writing assembly, right? And computers weren't cool enough to be a radio show back then; it was this super-niche thing. So we have no idea how they felt. Maybe someone wrote it down somewhere. I'll have to ask Claude: find me books about how people felt about the compiler.

Shimin (1:03:07)
Ha ha ha ha!

Rahul Yadav (1:03:26)
Did the human computers complain when the computer computers took their job?

Dan (1:03:31)
Replaced them with electronic calculators? I don't know.

Rahul Yadav (1:03:34)
Yeah, or was there a whole SEO-hacking thing where it's like, computers are saying computers are taking our jobs, and the human computers are like, what the hell are you talking about? We don't care.

Shimin (1:03:34)
We should look into that.

Dan (1:03:41)
I will say that this has happened to me several times. So like

when I was in, I think, middle school, we had to take shop class, and part of that was a section on drafting, drafting by hand. And let it be known that I'm ridiculously good at drafting by hand. Unfortunately, I learned it at the absolute wrong time, because it was about when AutoCAD started decimating drafting completely.

Rahul Yadav (1:03:53)
Mm-hmm.

Hehehehehe

Shimin (1:04:04)
hahaha

Dan (1:04:07)
And I was like, I want to be a draftsman. And my dad's like, well, I've got some bad news for you: there's this thing called computer-aided design. Ugh. But yeah.

Rahul Yadav (1:04:12)
Yeah.

Shimin (1:04:19)
Yeah, I've

heard this, yeah. And just like those furniture makers, we can still make furniture as a hobby. It's just that you're no longer getting paid truckloads of money to do it.

Rahul Yadav (1:04:28)
Yeah.

Dan (1:04:30)
And

I still do little drafts for home projects, know, to figure something out.

Shimin (1:04:36)
All right, moving on. Let's talk about Claude Code's new /insights command, which goes through your past couple of months of interaction with Claude Code and suggests ways you can use it better. I had a chance to use it over the weekend, and it actually gave me some very good advice. It correctly pointed out that

I have a two-phase workflow: I do planning and then I do development. And that I am a power user of the Beads plugin, which I already knew: over 11,508 messages across 1,575 sessions. And it told me that Claude is bad, surprisingly bad, at routine infrastructure

Dan (1:05:12)
You

Shimin (1:05:28)
tasks, like killing port conflicts and restarting dev servers, which is true. It suggested that I add hooks to help it manage those port issues, which is really valuable. It also suggested that I use more parallel agents as my next step, but it didn't know I'd already turned on swarm mode, so I'm already there. Yeah. Dan, you also got a chance to...

Dan (1:05:55)
Yeah. So the part that's a little weird is that I'm pretty sure it just works off the session-saving stuff Claude Code does, so I got different results depending on which machine I ran it on. On my personal laptop, under "what's working," I got: you have a sharp diagnostic instinct; when something feels off, like the high CPU usage in 11ty watch mode, you bring it to Claude and let it dig in until the root cause surfaces.

Shimin (1:06:07)
Mmm.

Dan (1:06:24)
Under "what's hindering you": it too often guesses at config parameter names and tool conventions instead of verifying them first. This cost you multiple wasted cycles on things like Helm chart behavior, because I was managing my home lab a lot on this instance. Under "quick wins": try creating custom skills, which I haven't done at all, so that's an interesting one. And then it seems like the "ambitious workflows" section is pretty much boilerplate they drop in with

custom references to your stuff because I've gotten that in every single run that I've done on this thing. So then I ran it on,

Shimin (1:06:51)
Mm.

Dan (1:06:57)
an ARM machine that I've been doing a lot of cloud stuff on too, and got dramatically different output that was largely about having a

good way of pointing Claude at the problem, I guess, or like nudging it towards the problem iteratively. That was the big takeaway from that one. I don't know. And I also ran it on my work setup too, and on that one, I actually got a lot of negative feedback, where it says I interrupt Claude all the time. So just like this podcast, where I'm interrupting you guys all the time, uh, because I get what I want and then I bail out, which is true. I do that all the time. Like, I want this question answered. Great. Okay. Dead. Next instance, you know.

Shimin (1:07:13)
Mm-hmm.

Hahaha

Dan (1:07:38)
So I thought it was funny that it...

Rahul Yadav (1:07:39)
I like that it gives you

the specific prompts or things to add to your CLAUDE.md, instead of just being like, hey, here's some things you could do better, but not how to actually take action on it. So it literally says, just copy this into your Claude Code setup or into your CLAUDE.md, and then you can avoid this problem going forward. So that was interesting to see.
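Since both the hooks tip and the copy-into-CLAUDE.md advice came up, here's a minimal sketch of what a port-cleanup hook could look like in a project's .claude/settings.json. Treat this as a starting point, not a verified config: the exact hook schema and event names should be checked against the current Claude Code hooks documentation, and port 3000 is just a stand-in for whatever your dev server uses.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "lsof -ti:3000 | xargs kill 2>/dev/null || true"
          }
        ]
      }
    ]
  }
}
```

The idea is to clear anything already squatting on the dev port before Claude runs a shell command, so it never burns a cycle rediscovering the conflict; the trailing `|| true` keeps the hook from failing when the port is already free.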

Shimin (1:07:54)
Yeah, I'm kind of worried about the reverse-centaur possibilities here. Like, I could see a world where insights is being run continuously. Yeah.

Dan (1:08:18)
Your behavior isn't. Yeah,

I know. Well, when Zoom rolled out their meeting-insights thing, that was pretty hilarious. I don't know if you use Zoom at work, but they did something similar, where it's like longest spiel and a whole bunch of other analytics like that, all based on the transcript, essentially. Yeah.

Rahul Yadav (1:08:29)
yeah.

Shimin (1:08:29)
Mm-hmm.

Dan (1:08:43)
Needless to say, I don't have very good Zoom analytics either, so I hope my manager never grades me on that.

Rahul Yadav (1:08:50)
Did you guys... if you scroll to the bottom of the Claude report, it has this, like...

Shimin (1:08:51)
Yeah.

Rahul Yadav (1:09:00)
I don't know, weird summary, or like it's trying to be funny or something. Mine was, I had used it to try and build a support agent thing. And it says, user built an entire support agent over three weeks and Claude kept discovering the existing code base like it had amnesia each session. So it's like, I messed up, and here's some trash talk about myself.

Dan (1:09:19)
You

Shimin (1:09:29)
No, mine was not being snarky with me, but I've got to try it again with, like, a different persona or something. I wanted it to send out a little diss track.

Rahul Yadav (1:09:37)
Yeah, ⁓ like it ends with the user essentially had to reintroduce Claude to its own previous work every time they sat down together.

Shimin (1:09:47)
Yeah.

So that's our experience with the /insights command. If you have your own insights, or if you've got an interesting insight from Claude, send us an email, send us a DM, holler at us. Email us at humans at adipod.ai. We want to hear all about the exciting and fascinating insights from Claude.

OK.

Moving on, this is our favorite segment, Two Minutes to Midnight, where we see where we are in the AI bubble.

Dan (1:10:28)
We're somewhere in the AI bubble, and where that is this week, as of February 9th, is Anthropic closing in on a $20 billion round. So I don't know exactly where TechCrunch gets their sources, but it's pretty interesting that they're claiming Anthropic's in the final stages of raising $20 billion in new capital at a $350 billion valuation.

I guess they say some of that came from Bloomberg, so never mind what I just said about sources. But the really wild part to me, and the reason this made the cut for Two Minutes in my opinion, is that they just raised $13 billion in equity funding five months ago.

Shimin (1:11:00)
Ha ha!

Mm-hmm.

Dan (1:11:16)
That's insane to me. I mean, obviously we don't necessarily know what their burn rate is based on that, but you could make some assumptions that model training isn't that cheap, plus the build-outs and everything else that they're doing. So it is pretty wild.

Shimin (1:11:35)
Yeah, and I've got a couple of stories that are somewhat similar. First, Alphabet, the parent company of Google and owner of Gemini, is selling a rare 100-year bond to fund its AI expansion. The last time a company sold a 100-year bond was, I think, Motorola in 1997.

right before it came... yeah, right before everything went to poop. So...

Dan (1:12:00)
That's a good sign.

Rahul Yadav (1:12:01)
Hehehehehe

Shimin (1:12:09)
Well, two things. One, Google makes plenty of money; why does Google need to sell bonds? And two, yeah, the hubris of selling a hundred-year bond. Usually a bond of that duration is reserved for sovereign nations. I know, I know, like England still has bonds from the 1400s that are still paying dividends, paying coupons.
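A tangent for the finance-curious: part of what makes century bonds such a strange instrument is that, at ordinary discount rates, cash flows a hundred years out are worth almost nothing today. A quick back-of-the-envelope sketch (illustrative rates and face value, not Alphabet's actual bond terms):

```python
# Back-of-the-envelope present value of a 100-year bond.
# Rates and face value are illustrative, not Alphabet's actual terms.

def bond_pv(face: float, coupon_rate: float, discount_rate: float, years: int) -> float:
    """Discounted value of the annual coupons plus the principal repayment."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + discount_rate) ** t for t in range(1, years + 1))
    pv_principal = face / (1 + discount_rate) ** years
    return pv_coupons + pv_principal

if __name__ == "__main__":
    # A bond whose coupon rate equals the discount rate prices at par (face value)...
    print(bond_pv(1000, 0.05, 0.05, 100))
    # ...but the year-100 principal repayment alone contributes almost nothing,
    # so nearly all of the price is the coupon stream.
    print(1000 / 1.05 ** 100)
```

The point being: almost the entire value of a century bond is its coupon stream, and the repayment a hundred years out is close to a rounding error, which is part of why paper that long has historically been the domain of sovereigns.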

Dan (1:12:32)
Hey, Motorola's still kicking. They

just, you know, rhyme with Lenovo now, that's all.

Shimin (1:12:35)
Yeah.

And we're only, what, 30 years into their 100-year bond payment plan. And the second article I have this week is that Oracle says it plans to take on up to $50 billion in debt. So Oracle's raising another $50 billion in debt this year in order to pay for, essentially, its obligations

Shimin (1:13:06)
with OpenAI, NVIDIA, and others on new data center build-outs. One thing I learned this week was that the whole Stargate initiative, the quote-unquote, what is it, $500 billion build-out that is backed by the US government, is actually just kind of an empty promise, a promise of a plan, a plan of a promise. I don't know what it is, but it's really just

Rahul Yadav (1:13:32)
Concepts

of a plan is the phrase you're looking for.

Shimin (1:13:34)
Yes, thank you,

Rahul. So Oracle has the obligation to build a couple of hundred billion dollars' worth of AI infrastructure. And they don't have enough money for that, so they have to also raise more money. And we're talking about $50 billion. That would have been $50 million back in 2001.

These are serious numbers we're throwing around here.

Dan (1:14:00)
Maybe they should get rid of the JavaScript trademark; that might help them raise a little bit of money.

Shimin (1:14:06)
Yeah, we will all chip in. A GoFundMe to buy JavaScript back from Oracle. But we'll leave Java alone because we don't care about Java.

Dan (1:14:13)
Yeah,

no.

Shimin (1:14:15)
and...

Dan (1:14:15)
They do actually

still own the trademark. It's pretty wild.

Shimin (1:14:17)
That's why it's ECMAScript.

Dan (1:14:19)
Yeah.

Shimin (1:14:19)
Yeah. And Rahul, you have the.

Rahul Yadav (1:14:22)
Yeah. I read this article and it really resonated with me. Like, I don't know about you guys and, you know, people listening, but every single day the pace of news is just crazy.

And not just in AI, but in general. And the thing the guy really hits on here is, back in the day, you used to have authority. You know, you would look to your big newspapers, your primetime news and whatever, to find out what is going on in the world. And now authority is a secondary thing.

How fast you can, like, manufacture news is the primary thing now. And so that's why you see what he calls the announcement economy, where you see, you know, OpenAI is partnering with this person, Anthropic is raising this much money, all these things that we look at. And then the thing he calls out is, if you look under the hood... one of the examples he takes is the big NVIDIA

OpenAI $100 billion thing that they had announced last year. That wasn't actually anything that was legally signed; it was a non-binding memorandum of understanding. And yet we all remember...

Shimin (1:15:50)
Mm-hmm.

Rahul Yadav (1:15:53)
Jensen and Sam Altman and Greg Brockman standing up on, like, Bloomberg or whatever with, we're gonna do $100 billion and it's gonna be, you know, all sunshine and rainbows. Behind the scenes, Jensen is like, OpenAI has a lack of discipline. The deal just hasn't turned into a deal; it's more like, yeah, we're still figuring out the actual specifics of it. But this was a good five or six months ago at this point, so it's unlikely that it'll

Shimin (1:15:54)
Mhm.

Dan (1:15:56)
I think we talked about it in detail in two minutes. Yeah.

Shimin (1:15:57)
Huh? Huh?

Rahul Yadav (1:16:23)
actually happen. So it's just...

crazy, the amount of news we get and how much of it is just there to feed the hype, so that you can ride it until you get to something substantial, versus the real things that are happening. And every single day, especially with people leveraging AI to generate news and everything, it just gets harder and harder to see what is actually true and what is just, you know, hype being generated with nothing actually behind it.

Shimin (1:16:56)
Yeah, this is all very crypto-like and I don't like it.

Dan (1:17:00)
Also, as a random aside, for those of you that are watching the podcast, the AI-generated image that we have as the header for this article sort of looks like we combined the Death Star with a circuit board. And I'm really wondering what kind of motherboard technology that is that has multiple exposed layers of wiring. Anyway, sorry.

Shimin (1:17:11)
That's

Rahul Yadav (1:17:14)
Hahaha

Shimin (1:17:26)

This is why they pay Dan the big bucks.

Rahul Yadav (1:17:27)
by the way, since Shimin said crypto, do you know who owns AI.com? Did you read about that?

Shimin (1:17:35)
No.

Rahul Yadav (1:17:36)
The co-founder and CEO of crypto.com paid 70 million in crypto to buy AI.com and is now offering a consumer facing AI agent platform. So crypto is here. Big time. Crypto is going to dominate this chat.

Dan (1:17:52)
I heard Bitcoin's doing well this week too.

Shimin (1:17:52)
Crypto has entered the chat.

God. Well...

These are all three fairly ominous articles, I think, that we've got here today, or four, I should say. How do we feel about the AI clock? I think we were at a minute.

Dan (1:18:16)
Are we at 1:15?

Oh, a minute, okay. That's right, we opted not to do 1:15.

I mean, I still kind of think a minute feels right, until that OpenAI deal falls completely through. I don't think there's going to be a big... and also, given the crazy stock move over, like, some markdown files, the hype is still hitting strong with investors. So I don't think we're in any danger of it going, but, like, it sure feels bad to me.

Shimin (1:18:42)
Rahul

Rahul Yadav (1:18:43)
Debt financing keeps going higher. I think the thing we haven't talked about yet is, this year being midterms, AI is gonna play a big part in all the election campaigns. And I think, not necessarily just the funding and stuff, but some of these things coming together could really, like...

you know, do some crazy things out there. So I don't know about the specific bubble itself but damn it.

Dan (1:19:12)
Well, that's the point of this segment, you better round it

Shimin (1:19:15)
Give us a time!

Dan (1:19:15)
back up to that.

Rahul Yadav (1:19:17)
Minute, I don't know, 45 seconds, whatever you guys feel good with. I'm not a predictor.

Shimin (1:19:22)
I mean, to be fair, like

a minute is, like, this thing is going to pop any second now.

Dan (1:19:24)
I'm get him out of that.

Rahul Yadav (1:19:29)
The nuclear

guys have it at less than a minute, don't they? The nuclear clock? Yeah. We're a bunch of pessimistic people.

Dan (1:19:33)
Yeah, they do. Yeah, because of Russia. They just announced that pretty recently. Yeah. I hadn't actually thought of the real clock in a while and they

Shimin (1:19:35)
Yeah.

Dan (1:19:42)
they talked about it and I was like, wow. Yeah.

Shimin (1:19:45)
I think

that's all fear-mongering. I actually think...

Dan (1:19:48)
That is, but this is.

Rahul Yadav (1:19:48)
Hahaha

Dan (1:19:50)
Our Two Minutes to Midnight segment? Totally rational, you know, if you're wondering.

Shimin (1:19:53)
Absolutely.

The actual...

Shimin (1:19:56)
Those guys know, come on. I think, I think I want to push it back to five minutes this week. And my rationale, I know this is a hot take, at least five. My rationale is the latest models are so good that, think about this: SaaS companies losing market value is actually a good thing for the AI bubble being sustained,

because that means more of that money is going to be directed into AI. And I think the models are good enough to buy them at least another four months. Famous last words.

Rahul Yadav (1:20:26)
But the money only goes

into AI if you have the valuation to throw that money at AI. If you don't have it, what are you going to finance against?

Dan (1:20:36)
And remember that article we read, like, two weeks ago, where it was like, what if AI is both good and not revolutionary, right? You know what I mean? What if that's true for this too: it can be good but not provide significant economic uplift, right? Which is kind of what the evidence is bearing out right now.

Rahul Yadav (1:20:46)
yeah.

Yeah.

Shimin (1:20:58)
Well...

Dan (1:21:01)
It's changing things, but it's not, like, dramatically improving things. Like, you know, the amount of open source software being written, et cetera, is not going up. The only metric I've seen go up and to the right is the number of Show Hacker News posts. Really, that's, like, definitely gone up.

Shimin (1:21:15)
which really has gone up.

Rahul Yadav (1:21:15)
Yeah, like until it starts showing up

in your, like, manufacturing productivity and all sorts of those things, it won't really count, because people have that argument that software doesn't really show up in the productivity numbers and everything.

Shimin (1:21:28)
Yeah, I guess what I'm trying to

say is, and let me put my investor hat on, as a seasoned investor,

Rahul Yadav (1:21:34)
⁓ man.

I

think we, yeah, we're with you, whatever you say.

Shimin (1:21:40)
I was worried that the technology had plateaued, from all these people saying LLMs are a dead end. And now I'm hearing from the devs that these new models are great. So I would like to invest another $10 billion into Anthropic, if possible.

Dan (1:21:47)
huh.

Rahul Yadav (1:21:57)
Is that a sign of the bubble getting bigger? I'm not following your... okay.

Dan (1:22:01)
So.

Shimin (1:22:01)
Yes. Well,

'cause I am willing to loan OpenAI and Alphabet and Anthropic the extra money, 'cause I think the models are still getting better.

Rahul Yadav (1:22:09)
I see.

Okay, but it's not close enough to the peak.

Dan (1:22:12)
So where are we at here? We've

got one minute, grudgingly a minute or 45 seconds, and then five minutes. Does that mean we're... You just took a step out for this, you know? This investor hat is now off officially.

Shimin (1:22:15)
Right.

Rahul Yadav (1:22:20)
Ha ha.

Shimin (1:22:22)
I'm back guys. I'm back. What did I miss?

Rahul Yadav (1:22:27)
He just bought AI.com and crypto.com

Shimin (1:22:28)
I just.

Rahul Yadav (1:22:33)
from the crypto guy.

Shimin (1:22:33)
The spirit of capitalism just overtook me. Okay?

Dan (1:22:36)
But so, pushing: if you want to push it back to five minutes, I want to keep it the same, and Rahul wants to go forward a little bit, I think that leaves us going back a slight amount. Is that the average?

Shimin (1:22:47)
Yep, I think so. I

think so. Yep. So like a minute forty five.

Dan (1:22:52)
Okay, I don't like going backwards. It doesn't feel right to me. But okay, we'll see. No, we'll do it. Full-blown democracy, meritocracy, democracy, I don't know.

Shimin (1:22:57)
I could also be outvoted; we can vote on it.

One minute left.

Dan (1:23:05)
He's the producer, so he can take the entire podcast away from us. So we'll listen to what he says and...

Shimin (1:23:12)
I would just create AI

avatars of you guys going like, wow, Shimin, you're so smart. That's a good point.

Dan (1:23:17)
Hahaha!

please edit that in after the outro.

Shimin (1:23:18)
All right, I'm in at a minute forty-five.

Shimin (1:23:22)
No, I wouldn't do that without you guys' consent, for your rights. And with that, folks, that's the show. Thank you for joining us this week for our conversation show. If you liked the show, if you learned something new, or if you just enjoyed it and had a couple of laughs, please share the show with a friend. You can also leave us a review on Apple Podcasts or Spotify.

It helps people discover the show, and we really appreciate it. If you have a segment idea, a question for us, or disagree with the time on the clock, shoot us an email at humans at adipod.ai. We'd love to hear from you. You can find the full show notes, transcripts, and everything else mentioned today at www.adipod.ai. Thank you again for listening, and we'll catch you next week.

Rahul Yadav (1:24:08)
and keep an eye out for our security prompt packs. They're coming.

Dan (1:24:08)
And don't forget to.

Well, security prompt packs, thanks. I was gonna say, email us your round-of-16 suggestions. I think we should really...

Rahul Yadav (1:24:16)
⁓ yeah,

Shimin (1:24:17)
yes,

Rahul Yadav (1:24:17)
that too.

Shimin (1:24:18)
yes, I like it. All right. Thank you for listening. Bye.

Dan (1:24:19)
Please do.
