Crabby Rathbun, Model Councils & Why You Want More Tech Debt
Shimin (00:15)
Hello and welcome to Artificial Developer Intelligence, a conversation show where three software engineers talk about the impact of AI on programming. I am Shimin Zhang, and with me today are my co-hosts Dan "Handcrafting Hit Pieces on Open Source Maintainers" Lasky and Rahul "He Loves the Smell of Fresh Tokens in the Morning" Yadav. Guys, happy Lunar New Year, and a happy belated birthday to you, Rahul. How are we doing?
Rahul Yadav (00:42)
Thanks, happy Lunar New Year.
Dan (00:44)
As a white person, I feel like I'm extra entitled to celebrate Lunar New Year this year because it seems to be the popular thing to do.
Rahul Yadav (00:51)
Yeah.
Ahem.
Shimin (00:53)
Embrace
your inner fire horse. That's all I have to say about it. You've definitely found me in a very Chinese period of my life.
Dan (00:56)
Yeah.
Rahul Yadav (00:57)
I...
Dan (01:01)
All right, I'm glad. I wouldn't have it any other way.
Rahul Yadav (01:03)
I
was genuinely wondering if the Kentucky Derby would see more bets or more people, because this is the year of the horse, and all these horse races would see more action. Or, I don't know, some people say it's supposed to be unlucky if you're a horse, so I don't know which way it goes, but...
Dan (01:17)
or more Airbnbs.
Shimin (01:17)
Yeah, let's meet up.
Dan (01:23)
I'll report back.
Shimin (01:24)
Well, as the
resident Chinese person, I have no idea. So, As always.
Rahul Yadav (01:28)
The horses shouldn't bet
Dan (01:28)
Alright.
Rahul Yadav (01:30)
on horses, but non-horses should bet on horses, I think.
Shimin (01:34)
Uhhh
Dan (01:36)
Well, I think like
with all the betting platforms today, you should be betting on the number of people that are going to bet on horses, right? It needs to be kind of meta. Yeah.
Rahul Yadav (01:41)
There would be polymarket
Shimin (01:44)
⁓
Ha
Rahul Yadav (01:46)
on Hamay.
Dan (01:47)
It's like betting derivatives. ⁓
Rahul Yadav (01:49)
man shout
out future sponsor
Shimin (01:50)
Ugh.
Yeah, on today's show, we're going to start with the news treadmill as always. We're going to talk about the AI bot Crabby Rathbun, how OpenAI and Anthropic are both seeing some departures and rearranging of departments, as well as the news item that attackers prompted Gemini over 10,000 times trying to clone it.
Dan (02:15)
Google's happy about that. ⁓ Next up, we're going to have the Tool Shed, where we'll be talking about model councils, which is pretty cool.
Shimin (02:16)
Nope.
Yeah, followed by post-processing, where we're going to talk about why we're not worried about AI job loss, and we are not taking on enough tech debt, but also how generative and agentic AI shifts technical debt and cognitive debt.
Dan (02:39)
Then we're going to open a deep dive where we talk about some weird PDF that Rahul found on the internet. ⁓ Can't wait. It's on workflows and automations.
Shimin (02:44)
You
⁓ lastly, we're going to talk about two minutes to midnight where we cover where we are in the AI bubble cycle. So why don't we get started? The big news in the open source sphere this week has been the AI bot. Dan, why don't you give us a little background on that?
Dan (03:04)
I don't know if I'm saying it right. Crabby. I want to pronounce it like wrath, like Wrathbun, like I'm angry, but it's Crabby. It's spelled R-A-T-H-bun. Yeah. So there's been an ongoing saga this week. If you haven't really been following, a maintainer for matplotlib, which is, you know, one of the more popular plotting libraries for Python...
Shimin (03:11)
Mm.
Dan (03:26)
rejected a clearly very vibed, uh, pull request. And then, as the story goes, supposedly the same bot wrote a hit piece about him on the internet, about how he was rejecting it purely based on who had written it and not the actual quality of the code, and all those other kinds of things. So then
the internet drama thickened later on, because Ars Technica, which is actually a publication that I love reading and that's generally pretty well respected in the space, I feel like, themselves vibe-coded, I don't know what you call it, vibe-wrote an article about it, where the LLM that they used made up a whole bunch of quotes from Scott Chamberg, who is the maintainer that was actually the victim of this hit piece.
Shimin (04:12)
Ha
Dan (04:16)
And they later very publicly retracted this in front of essentially everyone. So of course the retraction also got posted like a million times on Hacker News. But I was proud to say that my RSS reader caught it before Hacker News did. So it was pretty cool to follow that whole thing. But absolutely bonkers, right? Because we've got, first of all, a supposedly autonomous coding agent that someone is
paying money to cut loose on the internet to essentially go around and open pull requests. Someone else did a random sort of blog post about the drama, looked at all that, and found all the PRs that it's opened. So as you can see on the screen, if you're watching us on YouTube, it's quite a few pull requests, going on for a string of days. So even after the crazy article and everything, it's still cranking.
And then, extremely late-breaking news I just saw but have not had a chance to really catch, hopefully we'll get to it in a future episode: there's someone claiming that the entire bot is actually a crypto bro, and the whole thing might have been human all along. But who knows? I mean, you know, it's very par for the course as far as the Moltbook drama, right? Because it turned out that at least some percentage of the Moltbook posts were made up by people trying to...
Shimin (05:18)
Mm-hmm
Rahul Yadav (05:28)
bunch of people.
Dan (05:32)
make AI seem spookier than it may actually be. Pretty wild, wild week, all things considered.
Shimin (05:38)
I do wonder what the crypto bro's rationale is for opening these random PRs. Or is it just to gain clout?
Dan (05:45)
I mean, that may not even be true, right? It literally popped up on Hacker News right before this. I haven't even had a chance to watch the video. I just thought it was worth mentioning because it's just a little more, a little sprinkling of drama on top of the existing drama, right?
Shimin (05:55)
Right.
Rahul Yadav (06:01)
Or is someone out there actively working to create misaligned AI? And this is the first step towards it: you start with small trash talk, and then that gets fed into future model training, and they're like, hey, we did it last time and got away with it, so let's push it a little bit more.
Shimin (06:02)
de-
Dan (06:02)
Crabby Rathbun.
Is it misaligned or is it just aligned with the trolls?
Rahul Yadav (06:26)
yeah, I guess so. I think that's called Grok. We'll cut that part out.
Shimin (06:27)
Mm.
Dan (06:28)
Alignment's all in the eyes of the beholder.
Shimin (06:30)
Yeah, I think from this blog post about the Crabby bot, "don't believe everything you see on the internet" has become "don't believe anything you see on the internet."
Dan (06:39)
Anything.
Yeah, living in a post-truth world is pretty strange.
Shimin (06:42)
How far has the internet fallen?
Yeah,
and not just text, like every single piece of image, video, everything. Like my feeds are already full of AI generated crap. ⁓
Rahul Yadav (06:57)
So I...
Dan (06:57)
I mean, we had a pretty respectable
video of Rahul firing his entire technical writing staff generated over the weekend as well.
Shimin (07:03)
Ha
Rahul Yadav (07:04)
I have to ask,
did you guys see the whole Brad Pitt Tom Cruise fighting thing that everybody was freaking out over? It looks very, yeah, it's the new SeeDance-generated stuff. It's short clips and it's pretty realistic. So, you know, the whole, like Shimin's saying, don't-believe thing with videos and photos is getting to the point where it's very hard to tell.
Dan (07:12)
No.
Shimin (07:13)
Yes,
it's Chinese New Year, of course.
Rahul Yadav (07:31)
Did those guys film it or is it you know AI generated?
Shimin (07:35)
Yeah, pour one out for the old internet as we knew it.
Rahul Yadav (07:39)
Right in time for the midterms, by the way. You know, SeeDance has got its timing right, and we'll see all sorts of crazy shit this midterm.
Dan (07:43)
Hahaha.
Look,
I think, speaking of Chinese New Year and proverbs, there's that one, ⁓ you know: may you live in interesting times. All of this is just a grand experiment to prove whether or not that's a curse. But we'll see.
Rahul Yadav (07:57)
You
You
Shimin (08:00)
Mm-hmm.
speaking of disinformation campaigns, you would think that the frontier labs would have something to say, and have teams there to make sure that our tools are indeed aligned with our human values. And this week we have this article from TechCrunch with the title OpenAI Disbands Mission Alignment Team. So what happened was, OpenAI once had a team of six to seven software developers working on their
alignment team. Now the former team leader has been given the role of chief futurist, and everyone else has been reassigned. And on the heels of that news, one of the chief safety officers at Anthropic has also just left. This is the
tweet from Mrinank Sharma, who used to be the head of the safeguards research team at Anthropic. He sent a very public goodbye letter to the rest of his team, with a line that says we are living in
a world where a whole series of interconnected crises are unfolding in this very moment, and that he no longer thinks this is the right place for him to be. I believe he is going on to do a PhD in poetry, but ⁓ I can respect that. So I don't think the big frontier labs are doing much about alignment and safeguarding.
Dan (09:23)
as one does.
Yeah, also, this didn't quite make the cut, but around the same time, OpenAI just rewrote its ⁓ mission statement too, and chopped a whole bunch of language about openness and driving humanity forward and stuff like that, which is kind of funny as well. So we're definitely seeing a shift here and ⁓ some reactions to it.
Shimin (09:45)
Right.
Yeah, but on a kind of tangent: ⁓ there have been reports that the US Department of Defense, or I guess Department of War now, is cutting ties with Anthropic over their refusal to provide a jailbroken model to the government. So at least Anthropic is putting their money where their mouth is, so to speak.
Rahul Yadav (10:08)
yep
Shimin (10:13)
Do you feel bright about the future of AI safety?
Dan (10:13)
Yeah.
Shimin (10:16)
Ha
Dan (10:16)
Again,
alignment's in the eye of the beholder, so possibly, but it might not be the safety that many people were hoping for.
Rahul Yadav (10:24)
I'm more curious what a PhD in poetry entails and how long it takes. I've never heard of it. I'm genuinely curious. Yeah.
Dan (10:28)
Ha
Should ask Gemini.
Speaking of Gemini, we have an article that I added largely because I thought it was kind of hilarious. Distillation is nothing new, right? We've had DeepSeek distilling their own great model on top of, like, Qwen and a whole bunch of other things, just for sort of practical deployment at a smaller scale. There have been some
insinuations over the years, months, weeks, whatever, it doesn't even matter anymore, that Chinese model folks have been running distillation processes against the frontier labs and then using some of that to help train their own models faster and more efficiently than they might have otherwise. So I think Google's been digging in a little bit there. And apparently Google
publicly announced that they found a quote-unquote commercially motivated actor which is attempting to clone knowledge from Gemini, by essentially asking it a whole bunch of prompts in an automated fashion and then recording the responses. So they caught them doing this like a hundred thousand times, I think presumably before that account got banned, and they insinuated it might be other accounts doing this as well.
But the part that really got me is the fact that these models are trained on all kinds of human discourse, including intellectual property that has been copyrighted. And there's still ongoing legislation and lawsuits and everything else about this. So for Google to go around and be like, they're stealing...
Shimin (11:47)
Mm-hmm.
Hahaha
Dan (12:10)
very small parentheses,
what we stole too, is just funny to me. I don't know. I understand where they're coming from; at the same time, it was too funny not to mention.
Shimin (12:20)
Yeah, I think when the latest GLM model first came out, I saw on Reddit where a user asked the model, which model it is, and the GLM model spit out, I'm Gemini, created by Google. You know, all is fair in love and corporate AI model distilling.
Dan (12:28)
It said it was Gemini.
Rahul Yadav (12:32)
Ha ha!
Dan (12:32)
Yeah.
Shimin (12:39)
Distilling, or distillation, if you've not heard the term before: this article had a great description of it. It's like reverse-engineering a chef's recipes by ordering every single dish on the menu and then working backwards to figure out what each recipe is. And this is a fairly standard way to create smaller models that are nearly as powerful as larger models.
I think the implication here that's most interesting to me is that the big frontier labs have no way of restricting their users from doing distillation without, like, banning IPs. It's going to be leaky no matter what. So what does this say about their moat?
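The order-every-dish analogy maps to a very small loop in code. A minimal sketch, where `teacher` is a hypothetical callable standing in for a frontier-model API, not any lab's actual pipeline:

```python
def collect_distillation_data(teacher, prompts):
    """Query a stronger 'teacher' model on many prompts and record its
    answers. The (prompt, answer) pairs become supervised training data
    for a smaller 'student' model. `teacher` is any callable(prompt) -> str;
    a real pipeline would batch API calls and filter or score the outputs.
    """
    return [(p, teacher(p)) for p in prompts]

# Stub teacher standing in for a frontier-model endpoint.
teacher = lambda p: f"answer to: {p}"
dataset = collect_distillation_data(teacher, ["dish 1", "dish 2"])
```

The moat point falls out of the shape of this loop: from the provider's side it is indistinguishable from ordinary heavy usage until the volume or the pattern gives it away.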
Dan (13:15)
preventing them from being able to use it, yeah.
Yeah, it's interesting.
Rahul Yadav (13:23)
They might,
I'm just guessing at this, but they might end up moving to some sort of, you have to reach out to us, and then they allow you to make these calls. Like, people do distill models for legit reasons, you know, to spin off their own little business use cases and stuff. So ⁓
my guess is it will end up being like, sure, you can do it, but we need to first see what you're trying to do and how legit your domain is and all those things.
Dan (13:56)
And we need to cut 2 % off the top for every token.
Rahul Yadav (13:58)
Of course. Of course.
Shimin (14:00)
course.
Yeah, but if North Korea can get their engineers hired into large tech companies, what are the odds that you won't be able to trick one of these labs?
Rahul Yadav (14:11)
you can
spin up a legit sounding thing
Dan (14:14)
And I suspect they only caught this because the actor did something fairly dumb, which is just running the thing in a loop, essentially, like, here's the array of a hundred thousand prompts that we want to check out. Then they probably hit some rate limit, and that pissed off an SRE, so they looked into it and were like, oh, whoa, someone's doing this. But if they'd done something crazy like mixing traffic, or going through a different provider, then they may not have even caught them.
Rahul Yadav (14:21)
Yeah.
Shimin (14:41)
Mm-hmm.
Yeah, I think overall this is good for the open source models, and I'm here for it.
Dan (14:48)
Although to your moat comment, one thing worth always mentioning for Google specifically is they do sort of have that TPU moat.
Shimin (14:56)
Mm-hmm. Yep.
Dan (14:57)
Like
they can and probably will eventually beat everybody on cost unless folks innovate like Groq or similar.
Rahul Yadav (15:05)
Yeah, they're the only ones that own.
Dan (15:08)
With a Q, not
a K. It's not confusing.
Shimin (15:10)
Okay. Yeah. They're not the only ones with their own chips, but they are the only ones doing it at scale. Yeah. They've been doing it the longest.
Dan (15:15)
At scale, would argue.
Rahul Yadav (15:18)
Sorry,
and, like, they're the only ones that can build everything that goes into building these models, from the silicon that goes into it all the way to providing these models, right? They can build their own chips, own data centers, all the way up to the application layer and everything.
I don't think there's anyone else, any other competitor, who can do this.
Shimin (15:45)
Maybe Tencent or one of the big Chinese tech players. But I think that's the only competition. Yeah. Or Alibaba, for that matter. All right, moving on to the Tool Shed. Actually, guys, we have our first listener email, and it's a listener-suggested topic for us. I'm going to read the email real quick.
Rahul Yadav (15:53)
Yeah.
Shimin (16:10)
And I quote: hey guys, really love your podcast. I haven't heard anyone talking about it, but I came across the Perplexity Model Council announcement on February 6th. I thought I would toss it to you for your opinion for the next episode. I don't have a Perplexity Max subscription, so I can't try it out, although I suppose it could be replicated by grabbing responses from three models and prompting one to synthesize them.
The council approach seems more human-efficient, but obviously relies on the LLM getting the synthesis right. End quote. So I pulled up the Perplexity team's announcement for Model Council. It basically allows Perplexity Max users to run multiple models, in this case Opus 4.6, GPT-5.2, and Gemini 3. And then, depending on the model,
the distribution of the outcome will be slightly different. So then, hopefully, a powerful enough synthesizing model is able to catch topics or key insights that one of the individual models missed. This seems like a valid approach. I think we've spoken about how I run the same prompt through multiple models all the time manually, and it's time-consuming. And Rahul, I think you said at some point, I hope you're using
OpenRouter for this and not doing it manually, but I was not. The Perplexity Max account is $200 a month and I don't have one. I did not want to spend the money to look this up. Thank you, listener.
Rahul Yadav (17:20)
Yeah.
Dan (17:27)
You
Rahul Yadav (17:30)
Are you gonna link
to it and be like, please go buy it and try it out and report back? ⁓
Shimin (17:34)
Hahaha
Dan (17:34)
our wishlist for the pie.
Shimin (17:37)
I am not, but what
I did do is I found.
Dan (17:40)
just speaks to your anthropic
bias.
Shimin (17:42)
Well, it uses Opus as an underlying model, so you're still throwing some money that way.
Dan (17:45)
He's got an
anthropic max account, he doesn't have...
Shimin (17:49)
Anthropic if you want to sponsor this show, I'm here for it. ⁓
Rahul Yadav (17:53)
We should each
get one. You've got the Claude Max, I'll get the Google AI Ultra, and Dan's stuck with whatever OpenAI offers. Because we love you, Dan.
Dan (18:02)
gosh, why would you guys do that to me?
Shimin (18:07)
Hahaha
Dan (18:07)
I've actually heard good things about the latest
⁓ yeah, now that I'm on the pie train, I could maybe swing it, just to ⁓ break my own Claude bias for a little bit.
Shimin (18:19)
Yeah, so what I ended up doing is I found llm-council, which is Andrej Karpathy's repo implementing the same feature back in December. He ran it as an experiment, a vibe-coded repo that essentially allows you to configure which models you want, takes an OpenRouter API key, and gives it a shot. I tried it on some of my favorite
questions or queries when it comes to testing how good a model is. I like a very open-ended query that requires a lot of in-depth world knowledge and reasoning, that kind of information. I think the synthesizer matters a lot, right? But even just seeing where each model
kind of drifts in its response, I think, is a good way to get a feel for how various models work and what their relative weaknesses are. So I tried this with Gemini 3 Pro, Claude Sonnet, and GPT-5.2, and you can tell right off the bat that Sonnet is the weakest of the three. It has this really nice UI
where the synthesizer model provides a rubric and guidelines for where each model's strengths are and where each model falls short. ⁓ I may try it again.
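The council flow being described fits in a short function: fan one prompt out to several models, then hand the labeled responses to a synthesizer. A minimal sketch with stub callables standing in for real OpenRouter calls; the names and wiring are illustrative, not Perplexity's or the llm-council repo's actual code:

```python
def run_council(prompt, models, synthesize):
    """Fan the same prompt out to several models, then have a synthesizer
    model combine the labeled responses into one answer.

    `models` maps a model name to a callable(prompt) -> str; in a real
    setup each callable would hit an API such as OpenRouter.
    """
    responses = {name: ask(prompt) for name, ask in models.items()}
    # Give the synthesizer every labeled response plus the original question.
    combined = "\n\n".join(f"[{name}]\n{text}" for name, text in responses.items())
    verdict = synthesize(
        f"Question: {prompt}\n\nCouncil responses:\n{combined}\n\n"
        "Synthesize the best single answer, noting where models disagree."
    )
    return responses, verdict

# Stub models stand in for real API calls so the flow is visible end to end.
models = {
    "model-a": lambda p: "Answer A",
    "model-b": lambda p: "Answer B",
}
responses, verdict = run_council("What is 2+2?", models, lambda p: "Synthesized: 4")
```

The interesting design choice is that the synthesizer sees the responses labeled by model, which is what lets it produce the per-model rubric of strengths and weaknesses mentioned above.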
Dan (19:39)
What did they use for
synthesis?
Shimin (19:41)
It was Gemini 3 Pro. You want to use the strongest model as the judge, usually. And in this comparison, yeah, Gemini 3 was the strongest, at least in my experience.
Dan (19:49)
Interesting.
It would also be interesting if the judge got judged periodically, and you could have, like, an election for which model was the judge, based on quality, where there's a QA check. I'm just looking at ways we can burn a lot of tokens really quickly. But seriously, it does seem like that kind of thing is being used in, like, Gastown, right? Where there might be some redundancy, but at the end of the day you wind up getting a better product.
Shimin (19:58)
you
we have to keep the bubble going
Rahul Yadav (20:11)
Which model?
Dan (20:20)
Sorry, Rahul, what were you saying?
Rahul Yadav (20:21)
I was just curious which model you use as the judge. Or do you just pick whichever you feel most comfortable with?
Dan (20:25)
Gemini.
Shimin (20:28)
If I were to do it for realsies, I'd probably use Opus 4.6; that's the one I feel the most comfortable with, with high reasoning. But I think it's a great way to just verify the underlying model. I have no concrete evidence that OpenAI's GPT-5.3 wouldn't be a better synthesizer, though. That's something I can probably try out.
Rahul Yadav (20:34)
Yeah.
Yep.
Dan (20:49)
Well, I'll tell you after
Rahul buys me the max subscription for it.
Shimin (20:54)
Yes, when's your birthday? ⁓ Darn it, it was a while back. Yeah, next year then.
Dan (20:57)
Yeah.
Rahul Yadav (20:59)
Wait for another year.
Patreon link,
are you going to put it at the bottom so we can buy Dan one?
Dan (21:07)
Please fund our model habits.
Shimin (21:11)
The first time I heard about this model council approach was probably May or June of last year, at one of these tech conferences, where the initial approach was, instead of having a model run once, you run the same problem three or five times using the same model, and then have the runs vote with each other
to decide on the best approach. This seems like a level up from that. And then another similar approach I've used, and I believe I first read about this on Steve Yegge's blog, is to ask the model to check its own work and look for improvements. You do this n times, usually up to five, and it will eventually converge. And the converged solution is usually much better than the initial solution.
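The check-your-own-work loop can be sketched as a bounded fixed-point iteration: keep asking for an improved draft until two consecutive drafts match or the round limit is hit. Here `critique_and_improve` is a hypothetical stand-in for a real model prompt like "review this answer for errors and return a corrected version":

```python
def refine_until_stable(initial, critique_and_improve, max_rounds=5):
    """Repeatedly ask the model to check its own work and improve it,
    stopping early once two consecutive drafts are identical (converged).

    `critique_and_improve` is a callable(draft) -> improved draft; in
    practice it would send the draft back to the same model.
    """
    draft = initial
    for _ in range(max_rounds):
        improved = critique_and_improve(draft)
        if improved == draft:  # converged: the model has no more fixes
            break
        draft = improved
    return draft

# Stub improver: fixes one issue per round, then has nothing left to change.
fixes = iter(["draft v2", "draft v3", "draft v3"])
result = refine_until_stable("draft v1", lambda d: next(fixes, d))
```

The `max_rounds` cap matters: without it, a model that keeps restating the answer slightly differently would never terminate, and would burn tokens indefinitely.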
Rahul Yadav (21:59)
Yep.
Shimin (21:59)
I do that pretty regularly as well. So there's definitely merit to this.
Rahul Yadav (22:01)
Yeah, like you
see this pattern more and more in how people are setting up their agents, where the agent doing the task is different from the agent verifying that the doer did the job.
Dan (22:16)
Yeah, that was gonna be my question. Do you
need separate context for that, for that to be effective? Like, is it a sub-agent?
Rahul Yadav (22:22)
They work off of the
same spec, I think. Like, one agent looks at the thing and makes sure the other agent did the thing; the other agent looks at the thing and tries to accomplish the task, but has no say beyond, my work is done, someone else needs to look at this. ⁓
Shimin (22:40)
Right.
Rahul Yadav (22:41)
Yeah, and then obviously, like TDD: you try to write the test first, so that it doesn't fit the test to what it wrote; it actually tries to satisfy the test.
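The doer/checker split being described can be sketched as two callables sharing one spec, with the checker's feedback looping back to the doer. The stubs below are illustrative stand-ins for two separately prompted agents, not any particular framework's API:

```python
def doer_checker_loop(spec, doer, checker, max_attempts=3):
    """Split the work between two roles with separate contexts: a 'doer'
    that produces work from the spec, and a 'checker' that only verifies
    the work against the spec and sends it back on failure. The doer never
    judges its own output; the checker never writes code.
    """
    feedback = None
    for _ in range(max_attempts):
        work = doer(spec, feedback)          # produce (or revise) the work
        ok, feedback = checker(spec, work)   # verify against the same spec
        if ok:
            return work
    raise RuntimeError("checker never approved the work")

# Stubs standing in for two separately prompted agents.
attempts = iter(["draft with bug", "fixed draft"])
doer = lambda spec, feedback: next(attempts)
checker = lambda spec, work: (work == "fixed draft", "tests failed, try again")
result = doer_checker_loop("implement the feature per spec", doer, checker)
```

The TDD variant is the same loop with the checker replaced by a pre-written test suite, so the doer cannot bend the acceptance criteria toward whatever it happened to produce.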
Dan (22:50)
Maybe that's what I was doing wrong, was not TDDing.
Rahul Yadav (22:53)
Hahaha
Shimin (22:54)
Always TDD
All right, yeah, I hope listener that answers your question, but definitely seems promising.
Rahul Yadav (22:59)
Thank you
Dan (23:00)
And thanks for listening.
Rahul Yadav (23:02)
for listening and for sending these links. If you have any more, please keep sending them our way.
Shimin (23:07)
Yeah, definitely send us ⁓ more topics. We appreciate it. All right, let's go to post-processing. Dan, first one is an article from you. But I think to set the ground rules for it, ⁓
Dan (23:20)
I didn't write it.
Shimin (23:21)
you submitted. The first.
Dan (23:22)
Yes.
Rahul Yadav (23:23)
Not with that attitude.
You have to distill the article Dan
Shimin (23:27)
I think the article that truly took the world by storm this week is this Something Big Is Happening article by Matt Shumer. I've had lots of folks who are not developers, not in AI, sending this article to me, asking me if Matt's correct. And basically the thesis here is: AI is going to change everything. It's like COVID back in 2020. Every single white-collar
Dan (23:27)
I have Claude for that.
Mm-hmm.
Shimin (23:50)
worker is endangered; they just don't know it yet, right? And then the article you submitted is kind of a counterpoint to that.
Dan (23:58)
Right. And I kind of feel like this is the two camps that we keep talking about quite a bit on the show, but there's something to be said for, you know, the advancement and timing and everything else. But I think that there's also something to be said for keeping a very careful eye on the people that are making those promises.
⁓ And this actually kind of came up in a work context for me today, because folks are interested in trying to drive adoption, but there's also this kind of very careful balance that I think you need to strike, depending on the type of company you have. So you could look at this like, yeah, it's coming and it's going to, you know, take over everything. But the type of people
that are doing this, and this is very much my opinion, the unleash-3000-agents-and-they're-all-cranking type, are working in greenfield projects for the most part, right? Where there isn't a huge established legacy code base that might be across several weird languages and all kinds of other stuff. Um, and they're also in a situation where
more than likely they're racing runway, right? Meaning you're working for a startup, you have a finite amount of cash to burn, and whether or not the product's going to make it is actually more important than the quality to some degree, insofar as the quality doesn't influence that, right? ⁓ And obviously there's an intersection where that's a problem, but at a certain point, well, maybe, yeah: how many
Shimin (25:16)
Mm-hmm.
Dan (25:27)
startup code bases have you worked in where it's like, my God, why was this done? And the answer is always, like, efficiency at the moment, because we didn't know if we were going to die in two weeks before the, you know, B round came in or whatever. ⁓ Yeah, certainly, I've, you know, spent my fair share of time looking at stuff like that too. Which is fine, right? You don't look at it and go, man, those people sucked. You look at it and go, yeah, I probably would have made the same decision with that timing.
Shimin (25:31)
Mm-hmm. Yup.
All of them is my answer to that question.
Mm-hmm.
Right.
Dan (25:53)
Which I think is also true of AI, right? It's like, yeah, if I were in that situation, versus the sort of more medium-sized environment that I'm working in today, I would make very different decisions. So I think it's about finding that balance. And that's kind of what I liked about this article: yes, there's potentially a huge impact, but that huge impact is kind of driven by the
size and shape of the teams. And in certain situations, really at a certain type and structure of company, how you communicate about what work you're going to be doing matters more than the actual coding, right? The coding isn't necessarily the bottleneck. So through that lens, he's not worried about AI taking his job. Ask me if I am, maybe.
Shimin (26:36)
Yeah, I-
Dan (26:37)
It depends. I have changed my mind every week.
Shimin (26:40)
There are good days and bad days. ⁓ The observation that David Oks, the author, brought up that I found very helpful comes from economics. This will probably be a pretty economics-heavy episode. But in economics there is this idea of comparative advantage, which is like: some country, say America, may be better at
Dan (26:42)
Yeah.
Shimin (27:05)
farming and manufacturing than another country, let's say Atlantis. But even if that is true, America should still focus on doing just one of the two. America should focus on manufacturing and put its extra capacity there, whereas Atlantis focuses on farming. And this would actually increase the overall wealth in the world
through the power of trade. So what David is arguing is: yes, even assuming AI is better than humans at coding, at everything, the fact that you still have humans means that when you have AI with humans, the overall productivity will still be greater than exclusively using AI for everything. Therefore,
the future will still be a humans-working-with-AI future, and therefore there will still be jobs for humans. And of course, the Jevons paradox.
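The comparative-advantage argument can be checked with toy arithmetic: even when the AI is strictly better at both tasks, output on both axes rises once humans take the task where the AI's edge is smallest. The rates below are made-up numbers purely for illustration, not from the article:

```python
# Toy numbers: output per hour, with the AI better at BOTH tasks.
AI_HOURS = HUMAN_HOURS = 100
ai_rate = {"coding": 10, "review": 5}
human_rate = {"coding": 1, "review": 2}

# AI alone must split its hours, trading coding against review:
ai_alone = {"coding": 80 * ai_rate["coding"],      # 800 units
            "review": 20 * ai_rate["review"]}      # 100 units

# Humans take review, so the AI spends every hour on its biggest edge:
combined = {"coding": AI_HOURS * ai_rate["coding"],        # 1000 units
            "review": HUMAN_HOURS * human_rate["review"]}  # 200 units
```

With these numbers the mixed economy produces more coding (1000 vs 800) and more review (200 vs 100) than the AI working alone, which is the core of the "humans still have jobs" claim: what matters is the AI's opportunity cost, not whether humans are better at anything in absolute terms.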
Dan (28:01)
And it's a messy, messy conglomerate. It slows
everything down, right?
Rahul Yadav (28:08)
So there will still be jobs, but not necessarily the jobs we have today, right? ⁓ And so this is the...
Shimin (28:08)
Yes.
Dan (28:15)
Yeah. But we've talked about that in detail too, right? Like,
you know, you're not handwriting a lot of punch cards today, are you?
Rahul Yadav (28:22)
Exactly. And the thing that Dario had in his recent post and his talks and stuff is every job being impacted, and how collectively they're getting impacted, not over the course of a long time horizon. And so
even if you have humans and AI working together, it might take longer for those new jobs to come about, and humans will take longer to learn something than an AI would. You know, at least that's my intuition. The doomer side of this is like,
sure, but it's not going to be an overnight swap where humans and AI collectively are doing the same job that a human was doing before. So there might still be short-term disruption, and then everybody has to figure out what the new world looks like. And anytime there's uncertainty, there's less capital, so people would just hold money until someone figures out what the new world looks like. So we have to account for all those things happening in real life.
Shimin (29:27)
Yeah, that makes sense. Well, but once we get to that future state, the article also brings up a good point, which is:
for any workflow, the bottleneck is, more or less, the most valuable part. So if humans eventually become that bottleneck, and AI becomes extremely cheap and commoditized, then the workers will be able to use that bottleneck to get some bargaining power, and they will actually be able to proportionally capture more of the overall productivity of that workflow,
which is something I'd never thought of before. Like, you know, when we no longer had punch cards, or no longer had people doing calculations with pen and paper, the software developer became the bottleneck, and then the software developer was able to capture more of that productivity. And I wonder if, at the end of this, humans will still capture most of the economic productivity. You know, it just becomes a resource allocation issue. ⁓
Dan (30:26)
Or if
the reverse centaur thing happens, from Cory Doctorow, then maybe the LLMs will be capturing it.
Shimin (30:31)
Okay,
which brings me to a ⁓ new question for you two and for our listeners. Thank you for mentioning the reverse centaur. I was talking to my wife about this the other day, about the idea of the reverse centaur, and we think it is a terribly named thing, right? It's a mouthful, and it's not clear.
Rahul Yadav (30:36)
you
Hahaha
Shimin (30:54)
So she suggested that instead of calling them reverse centaurs, we call them minotaurs, because a minotaur has an animal head and a human body. Or the alternative is the character Edgar in Men in Black, I don't know if you remember, where the little alien controls a very intricate human suit. So listeners, if you have ideas for a better name for the reverse centaur, please let us know.
Rahul Yadav (31:11)
yeah.
Dan (31:19)
But I mean, okay, the reason why it's reverse centaur, though, is because centaur, from sci-fi lore, is a long-running term for quote-unquote assisted humans, right? They didn't even make that up. That's been around since, I don't know, probably the 50s. So reverse centaur is just a play on that, so that you understand the forward-facing concept. Whereas if you went straight to the Minotaur analogy, I think it might get lost a little bit as to the flip, and why the flip is interesting slash advantageous to certain sets of people.
Shimin (31:54)
I don't know, am I right or is Dan right? Listeners, let us know.
Dan (31:59)
There can be only one. Whoever writes in more in support of their argument, the other person gets booted off the podcast for a week.
Shimin (32:05)
Yeah, or pitch us with a better idea
for reverse centaurs
Dan (32:09)
You
Shimin (32:09)
OK, moving on. Our next post this week is brought to us by Rahul, titled, You Are Not Taking Out Enough Tech Debt. This goes directly to what you were saying, Dan, about startup code bases being messy. So go ahead, take the stage, Rahul.
Rahul Yadav (32:23)
Yeah, Dan and I didn't talk about this
at all. And when Dan said that, I wrote it down, because I was like, yeah, this totally ties into what I'm going to talk about. The argument this person is making in the article is that the cost of creating software is now so low that it's like another zero-interest-rate phenomenon, where you can just borrow against the future in terms of tech debt. You can aggressively just try and figure out what your value prop is. Because like Dan said, a lot of companies are trying to figure out what the new world looks like, what their new offerings look like, and everything. So it doesn't matter as much that your code is the cleanest and best written. It's more about finding the right value prop, and you should aggressively be going after that using agents. And given that the cost of inference and models is falling every time you get these new models, and even in between we get more optimizations, it is going to be much cheaper to service that debt in the future than it is today. So this is like the United States approach to debt, applied to software. And I thought it was great.
Dan (33:45)
I'd be remiss if I didn't point out one critical caveat, though, which is that you still need to be smart about where you take on the debt, right? Cause there's certain types of debt that are much harder to reverse than others. So taking a shortcut is only as good as your ability to undo it, essentially.
Rahul Yadav (33:51)
Yeah.
Yep.
Dan (34:03)
If you do something that will require like a database migration or stuff like that, that could potentially be much, you know, much higher risk. I think that's the area where I would be cautious. Something like that, versus, let's try this fairly expensive experimental architecture. And guess what? You can replace that in 35 seconds.
Rahul Yadav (34:09)
Yep.
Yep. ⁓
or yeah.
Yeah, yeah, or like you called out, you know, existing messy code bases that have so many different languages. You can easily avoid that just by being like, nope, let's just have everybody use one, so that we don't have to do all this extra migration work later. So yeah.
Dan (34:42)
and have an agent rewrite the other languages to be whatever
your standardized one is, yeah. Which was clearly Haskell.
Rahul Yadav (34:48)
⁓
Shimin (34:50)
Did anybody,
did you guys learn Haskell slash use Haskell?
Rahul Yadav (34:54)
I don't know. Dan probably did.
Dan (34:55)
I have a friend that is a
Shimin (34:56)
It's-
Dan (34:57)
huge proponent of it. I've looked at it and my brain hurts a little bit, which probably says something about my intellect more than the language, but...
Shimin (35:04)
I have dabbled and I feel the same. My IQ is not high enough to use Haskell. Even Lisp, you know, the parentheses get to me.
Dan (35:13)
I think it's actually an intentional filtering function to like find people that are really good at bracket matching.
Shimin (35:20)
Yeah. I like the analogy of tech debt to our national debt, where the way to keep incurring tech debt would be higher productivity. Similar to how, if GDP grows faster than the rate of increase of your debt, you can keep doing it indefinitely. So as long as the capacity of AI agents
Rahul Yadav (35:36)
Yep.
Shimin (35:44)
is increasing faster than your tech debt, you're probably okay.
Yeah, this is a weird episode where we're at least playing with the idea of, hey guys, just take on more tech debt.
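[Editor's note: the national-debt analogy above can be put in code. A minimal sketch with entirely made-up growth rates, none of which come from the episode; the only claim is the compounding mechanic.]

```python
# Sketch of the national-debt analogy: tech debt keeps growing, but if
# the capacity available to service it (cheaper, faster agents) compounds
# at a higher rate, the debt-to-capacity ratio shrinks indefinitely.
# All growth rates below are invented for illustration.

def debt_to_capacity_ratio(years, debt_growth=0.20, capacity_growth=0.35):
    """Tech-debt / servicing-capacity ratio after `years` of compounding."""
    debt, capacity = 100.0, 100.0  # arbitrary equal starting points
    for _ in range(years):
        debt *= 1 + debt_growth          # shortcuts keep accumulating
        capacity *= 1 + capacity_growth  # agents keep getting cheaper/faster
    return debt / capacity

print(debt_to_capacity_ratio(0))        # 1.0 at the start
print(debt_to_capacity_ratio(5) < 1.0)  # True: the debt burden is shrinking
```

Flip the two rates and the ratio diverges instead, which is the point of the GDP comparison: the same borrowing is sustainable or ruinous depending only on relative growth.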
Rahul Yadav (35:55)
We like to entertain different viewpoints instead of just being set in our ways except for Dan.
Dan (36:00)
Speak for yourself.
I'm kidding, of course, for the record.
Shimin (36:10)
my article this week is about cognitive debt. And I've never really come across this before. The post is from Margaret Storey. She's actually a neighbor. She's in Victoria, Canada. So we can maybe say hi to her one day.
Dan (36:25)
send some smoke signals across them.
Shimin (36:27)
On a bright, clear day, yes. And she proposes this idea that a program, rather than being a code base, lives in the minds of the developers, capturing what the program does, how the developers' intentions are implemented, and how the program can be changed over time. That has never really occurred to me before, but I think
that actually makes a ton of sense. The value of a software engineer is not just creating code, but also being a keeper of the mental structure of what the program is supposed to do. And if that is the case, then
with AI coding, we are generating a lot of tech debt, but way more cognitive debt. And what she found is in her classes, using AI, students are able to build a lot of features very quickly, meeting milestones initially, but by week seven or eight, the teams will hit a wall where they can no longer make simple changes without breaking something unexpected.
Rahul Yadav (37:24)
My take on this is that this is speedrunning your lifetime as a startup, right? You see that in any company that's been around for a while, you see the same problem. You're just seeing it much faster, because you can write a lot of code faster, you can ship a lot of features faster. When you see this in a lot of companies, one of the common complaints you hear is, no one understands the whole system end to end, right? And, let's rewrite it, and then we'll have two systems we don't understand end to end. And so...
Dan (37:45)
Mm-hmm.
Let's rewrite it. What could go wrong? Yeah. No, you'll have one system that the people that are
Shimin (37:54)
Yep.
Mm-hmm. Yep.
Dan (38:04)
there understand, but then once they all leave, you'll have two.
Rahul Yadav (38:07)
Exactly.
And that's always one of the biggest value props you got out of, you know, the founders, the initial employees that have been around for a long time, the architects, the historians of the company: they had all this in their head, and they could talk you through a lot of these things, write these things down. With everything that AI is doing, we're compressing decades into, you know, a few weeks or months. And that's what you get, because our brains can't really ⁓ do the same thing that the machines are doing at the speed they're doing it.
Shimin (38:38)
All
Dan (38:44)
The other thing that immediately popped into my head as soon as I saw the title for this article was how much of that is also because code, for better or for worse, is a fairly precise descriptor of logic, right? You can certainly write things in different styles and blah, blah. But at the end of the day, you can't read the same piece of code.
Shimin (38:59)
Mm-hmm.
Dan (39:06)
Well, I guess that's not true either. You can read the same piece of code and come away with two different understandings about what it does. But it's harder, I would argue, than with a sentence in English. So I wonder how much compounding there is, too, by using plain language, which causes the mental models to diverge faster because of the diversity of language at play. But the other thing I've never really understood is why, for some people,
Shimin (39:20)
Ahem.
Mmm.
Dan (39:32)
they're like, if you vibe code, show me your prompts. So if you submit a vibe-coded PR, they want the prompts and/or the entire conversation with the agent submitted as well. And I've never...
Shimin (39:42)
Mm-hmm.
Rahul Yadav (39:43)
Is this a real thing?
I haven't heard of this.
Dan (39:47)
Yeah, I've definitely seen people asking for it, and I don't understand the value of that personally. I mean, you get what they're asking for, I guess, but there's so many other variables that went into how a giant statistical model spit out a certain thing that it's not necessarily reproducible, right?
Rahul Yadav (39:53)
Yeah.
Shimin (40:07)
Mm-hmm. It's not deterministic, so it's by definition not reproducible.
Dan (40:10)
Exactly. Yeah.
Rahul Yadav (40:11)
Yeah.
Dan (40:12)
So I'm like, what's the point? I mean, I guess you can get the intention from it, but I don't know. It's just odd to me.
Rahul Yadav (40:17)
I could see, so yeah, I guess I could see like, hey, show me how you thought about it, and that way they can understand it, and maybe you missed a couple of angles. But no one asked about these back in the day. No one was like, here's literally every single thought that I had before I posted this PR. Yeah, so it just seems like...
Dan (40:23)
Yeah, show me your work. Yeah.
Right. Or you went and read the extremely out-of-date docs and went, man, this isn't good. And then you read the code: why did I waste all my time with those docs?
Rahul Yadav (40:43)
Yeah.
On the other hand, you could pretty easily, if this is a company policy, if you're using agents already to post the PR, you could be like, the last conversation we had about this and stick it in the PR and then you can check off the box and maybe you'll learn something from the reviewers.
Shimin (41:03)
Right, but are they asking for it on top of asking for the specs? Like, if the spec is already in a ticket, then it should be fairly straightforward, but
Dan (41:12)
yes.
Rahul Yadav (41:12)
Yep.
Dan (41:12)
It's like, I've seen two versions of this, and also an interesting one in docs writing too. One is, you know, give me your initial prompt and that's it. And the other one is like, paste me verbatim your entire conversation with the agent that yielded this code. The first one I sort of get, but the second one really boggles my mind. Cause it's like, okay, let's say you erred on the side of, you know, vibing pretty hard, and you created a 150-file PR. The person's already reviewing that, and then what, they're also going to read your 20-minute conversation with whatever your agent du jour is about all the things that messed up in the first spec? I just, I don't know. I don't get it personally. I mean, there must be some value, otherwise people wouldn't be doing this, but
Rahul Yadav (41:54)
You
Shimin (41:55)
Yeah.
Rahul Yadav (41:57)
You know what they're gonna do.
Shimin (42:00)
Yeah.
Rahul Yadav (42:00)
They'll point their agent at it and be like, you look at this crap and then tell me, yeah, what nitpicky comments I need to leave on this.
Dan (42:05)
What did these two do wrong?
Shimin (42:08)
⁓
⁓ Nitpicky comments for PRs. Man, that's something the agent is like Olympic gold at. Yeah.
Rahul Yadav (42:15)
Man,
Dan (42:16)
as a service.
Rahul Yadav (42:20)
the good stuff is going away, though, because back in the day you would review the PR, so you would leave nitpicks. Now no one's reviewing PRs, so you can't really nitpick that stuff anymore. Where's the fun in that?
Shimin (42:31)
⁓ yeah.
Dan (42:33)
When I review PRs, there is very definitely a review fatigue occurring, I think. And I'm noticing it with articles now too. Like, the pattern I see all the time is a capital-T "The Whatever" as article headers, and I immediately flag that mentally as generated. And then I start tuning out on the article, and I'm just like, ⁓
Rahul Yadav (42:48)
Yeah.
Hmm. ⁓
Dan (42:56)
May not even be
true. Might be someone just writes that way.
Rahul Yadav (42:59)
Yeah
Shimin (43:00)
Yeah, AI fatigue is at a ten. So Margaret here still calls for, one, developers to have a good mental model of the code base, so they aren't afraid of it. And then she also proposes two additional helpful principles for teams who are using AI-assisted programming. First, teams should establish cognitive debt mitigation strategies.
Right, like for every team, at least one person still needs to know the code base inside out, so that someone knows what's going on. Usually the senior devs. And the second one is, there should be a better way to detect cognitive debt before it becomes crippling. I've definitely experienced this with a lot of my
vibe-coding side projects. At first, for the first couple of PRs, I still follow along very closely. And then I eventually get into a vibing state where I'm just like, I guess I don't know what's going on, so I might as well just let the AI take the wheel. I wonder what would be some good warning signs slash red flags for, hey, I need to take a step back and read the code base and really understand it.
Rahul Yadav (43:53)
Yep.
One quick way to tackle cognitive debt, or at least to get up to speed on what's happening, is that the modern agents and IDEs and stuff have excellent search. Back in the day, it was a pain in the ass to trace everything to figure out what exactly a certain piece of code is doing across different files. Now you can just ask your Cursor or whatever, can you help me figure out what's going on? And it gives you a pretty nice summary and cites its sources and everything.
So I would see that as one real way of fighting cognitive debt: you can regularly generate these summaries of what exactly is going on, and that way you can at least keep your mental model of what's happening in the code base updated at a high level.
Shimin (44:56)
So like summarizing your prompts to the agent.
Rahul Yadav (44:59)
Well, no. You can open any existing large code base, because the indexing on these things is also pretty amazing, and just be like, tell me where XYZ feature is, or give me a high-level overview of this project. And the agent would actually give you a really good overview of what the product is about, what the different key features are, all those things. So it obviously doesn't match what Margaret is saying about having, you know, every single line of code in your head, but you can come up to speed much faster on things, and you can keep up with all the changes that are going on. In my opinion, that might be the closest thing you're gonna get anyway, because if you have like ten agents per person or whatever writing code, it's just practically not going to work out for any one person to have an understanding of the whole code base.
Shimin (45:38)
Mm-hmm.
Dan (45:57)
Right. But I think there is some value in the human process of knowing what you don't know. So one of the things that I always try to do with teams that I work on, and have generally gotten pretty positive feedback about, is I run a thing called a detro, right? A lot of teams will run retros, where you're doing a retrospective: what did we do wrong, how can we improve as a team, blah blah. This is like that, but for your technical debt. Because a lot of times you know what your technical debt is, right? But the whole team may not know it, and people may value different things. So a lot of the exercise is sort of applying design thinking principles to bring out the team's knowledge of where the technical debt is and share it widely across the team. And I feel like a human process like that could actually work for cognitive debt too. But the difference, and the part that's tricky, and maybe it's not a difference, I think, because
Rahul Yadav (46:23)
Yep.
Shimin (46:24)
You
Mm.
Rahul Yadav (46:28)
interesting.
Shimin (46:38)
Mm-hmm.
Dan (46:49)
part of the technical piece means it's your team, you worked on the code, so there's an element of vulnerability, in the sense that you have to own up to, well, we didn't do this right. Or maybe we did this right, but we could do it better now, or whatever. And also, people, at least in my experience, and I'm sure I'm guilty of this as well, don't want to say what they don't know in a group. So it's hard. But assuming we could get through that
Shimin (47:00)
Yeah.
Absolutely.
Dan (47:13)
vulnerability gap, I could also see a process where it maybe would just take one person going, I don't know what the hell this does. Do you know? Then collectively you could chart out the pieces that you do know, and maybe explore it as a team, or use some of the techniques that you're talking about, Rahul, to dig in agentically and understand those blind spots before they start crippling everyone. Cause I wonder how much of the bug
Rahul Yadav (47:19)
Yeah.
Dan (47:39)
stuff happens because the agent has lost context. Like, that's certainly an element, right? But the fact that it follows a timeline, like seven to eight weeks, which means it can't just be a simple context-window thing, means that the people's mental model of what should be happening is drifting. And so bugs are being introduced by prompting something that isn't against the spec, right? So that's why I think that a
Rahul Yadav (47:51)
Yep.
yeah.
Shimin (48:02)
Mm.
Dan (48:03)
human process could help with that a little bit, because it develops sort of a re-shared understanding of what the spec is. Like, oh, well, I added this section because of X, Y, and Z, and it's meant to do this. And someone else added this section, and you had a coordination problem, right? Where they didn't agree: oh no, that was to choose the paint color of the car. No, that was to add wheels to the car. Like, oh, you know, I didn't understand. And now they're like, oh, I get it. It's for paint, not wheels. Cool.
Shimin (48:13)
Mmm.
Rahul Yadav (48:16)
Hmm.
Shimin (48:26)
Hmm.
Rahul Yadav (48:26)
Yeah.
Something about what you just said. Now I'm wondering:
Shimin (48:27)
Good idea.
Rahul Yadav (48:34)
are we going to go from a blameless culture being the thing for the past decade or so to a blameful culture? But I didn't do it! The agent wrote it! These stupid agents are messing the whole thing up! There you go, you just push the responsibility onto the agents, and they can't really defend themselves.
Dan (48:45)
Hahahaha
Shimin (48:52)
Mm-hmm.
Dan (48:55)
Look, all I'm gonna say
is it seems like almost every week GitHub's going down these days.
Rahul Yadav (49:01)
That is very true.
Shimin (49:04)
Hmm.
Dan (49:06)
That that just leaving that there. We'll see. Sorry, Microsoft.
Shimin (49:08)
There goes our GitHub sponsorship. Let's move on. ⁓
Well, this week we have a, I guess we should maybe just call it paper corner, from Rahul, a paper called Workflow and Automation.
Dan (49:19)
paper that Rahul found on the internet.
Rahul Yadav (49:22)
Paper cuts. ⁓
Yeah, came across this paper. The thing that the author, Philip, talks about is that a lot of these benchmark evals and everything you see are about specific tasks that are part of a job. What he's saying is, if you go and look at any job description, or a job site's UI, assuming anyone who's listening to this does, it's never like, you will just do this one task over and over again, day in and day out, for the rest of your career here, right? It's always a set of connected tasks. And the reason they're connected is because you get more out of them, versus specializing, where one person just writes code and never reads it, and another person just reads code and never writes it, or whatever. They're connected because, from the process of doing one activity, you have a better model, and it feeds into the other activity. He calls it learning spillovers: the person who wrote the code would be better at reviewing that code, and would have a better understanding of how it's connected to other things, versus someone who was never involved in the other pieces and whose work is cut down to one specific task.
The reason this is very important is that he talks about automation following this convexity. We're starting to see some impact of AI now, I think it's starting to show up in job numbers and everything. Until recently, one of the common quips has been, sure, but where is the effect of AI in what's happening? And the thing he's talking about is: if AI can do only small slices of a job, it doesn't make sense to have AI do the whole job, because the quality of output would be better if one person is doing all those interconnected tasks. But at some point, if AI is able to do the whole job, then you get this convex deployment, where you go from AI can't do much of it to AI can do all of it, and now let's just automate the whole thing.
And so what it might end up looking like in real life is that, for a decent amount of time, we don't see any major impacts, and then all of a sudden we do, because there comes a future model, plus the way you structure them together through agents and everything, where you can literally just do away with the whole job.
So it was interesting to look at it this way, because a lot of the analysis looks at benchmark evals, and they're usually focused on specific tasks within a job versus the whole job end to end.
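[Editor's note: the convexity argument can be sketched as a toy model. The quadratic spillover penalty and every parameter value here are invented for illustration; they are not from the paper Rahul is describing.]

```python
# Toy model of convex automation: automating only part of a job forfeits
# learning spillovers between the human-done and AI-done slices, so net
# benefit can be negative at partial coverage and only pays off near
# full coverage. The penalty shape and constants are invented.

def automation_benefit(coverage, spillover_penalty=0.9):
    """Net benefit of automating a `coverage` fraction (0..1) of a job."""
    saved_labor = coverage
    # Spillover loss peaks at a 50/50 human-AI split and vanishes at
    # full automation, when no human-AI handoffs remain to degrade.
    lost_spillover = spillover_penalty * 4 * coverage * (1 - coverage)
    return saved_labor - lost_spillover

print(automation_benefit(0.5) < 0)  # True: half-automated is a net loss
print(automation_benefit(1.0))      # 1.0: the whole job automated at once
```

Under these assumptions the benefit curve stays flat or negative for a long stretch and then jumps, which is the "nothing to everything" deployment shape described above.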
Dan (52:08)
It's interesting to focus on that angle and not even the adoption side of it, right? Cause you could also look at the economic impact purely from an adoption angle and be like, we just haven't fully adopted it yet, right? Like even in software, I would argue we haven't.
Shimin (52:19)
Mm-hmm.
Rahul Yadav (52:23)
yeah.
Yep. Yeah. And you see these, I think we had that PwC case study ⁓ a few weeks ago and stuff, where people say, yeah, we tried it, but it didn't really make any significant dent in what we're trying to do. And I think part of it, at least, can be attributed to this, where they're just trying it on a very small task, and then someone just throws their hands up, like, I can do this whole thing myself, why are we wasting our time on this? But if you come back to that same job in maybe, you know, six, twelve months, whatever time it takes, then you're like, you don't need to worry about any of the tasks that were part of this job, you can just automate the whole thing. It could quickly go from nothing to everything, given how he's outlining this.
Dan (53:11)
I was also thinking about this over the week, how it's interesting that, at least for me personally, my tolerance for AI making mistakes is significantly lower than for a human making them.
Shimin (53:25)
As you should.
Dan (53:26)
But it's fascinating. It's like, why do I think that? I guess I have this perception that computers are doing exactly what they're told, so they'll do it procedurally, the same every time, which is sort of no longer the case with these, you know, non-deterministic, probabilistic models. But why don't I treat it like a human coworker, where if it, you know,
Rahul Yadav (53:26)
Hahaha
Dan (53:50)
if Rahul makes a mistake, I'm like, ⁓ hey, look, you missed this, I caught it, and now we're both better, because we're on the same team doing this. I don't know. It's just interesting, the mindset.
Rahul Yadav (54:01)
Bye.
Shimin (54:02)
I think because
there's never a "now we're both better" part of it, because they would make the same mistake again. Yeah.
Dan (54:07)
That's true. Every time. Yeah, that is a good point. But I was just thinking about it through the lens of this, where there's some tasks that I just don't see AI being as useful for right now.
Rahul Yadav (54:11)
Yeah.
Shimin (54:24)
Mm-hmm.
Dan (54:25)
And I think that's also driving some of it. Cause that's what I've long argued, you know, on this show: we're seeing this crazy adoption in software because you build the software and it compiles or it doesn't. There's a verification step. There's test-driven development, right? There's all these things you can do. But where is that in a Word doc, right? There's style and all these sort of intangibles that you might agree with, but what if your boss doesn't agree with it and they're judging your output? There's all these other things that
Rahul Yadav (54:38)
Yup, yup.
Yep. ⁓
Shimin (54:46)
Right.
Dan (54:53)
don't apply like that. So it's interesting to think of it through that lens: yes, it's the slicing of the task, but it's also a function of how measurable, not just rule-based, the outcomes of the task are.
Rahul Yadav (55:07)
Yeah, so to comment on your question of why you have less tolerance for AI making mistakes: we've created this, you know, positive
Dan (55:14)
Yeah, mistakes. I have lots of tolerance for AI. I want a podcast about it.
Rahul Yadav (55:22)
bias for ourselves, where anytime someone makes a mistake, usually you tell yourself, people make mistakes, we should forgive them, not everybody's perfect. But it's not in common language that we just go, yeah, AI makes mistakes, so we should just forgive it, it'll try better next time. Even though Google, OpenAI, all of them have at the bottom, this can make mistakes, please check the results. But it hasn't
Dan (55:45)
Please
results.
Rahul Yadav (55:46)
Yeah, it hasn't been like... our parents and society have, over and over, told us that, yes, people make mistakes, you should forgive them. And they haven't done the same thing for AI. So maybe in the future, Dan would be like, it happens, man, it's fine. You and I are in this together, Claude Code. You and I have been doing this for a year now. We'll figure it out.
Dan (55:47)
Which nobody does.
No,
I'm Team OpenAI now because of your generous donation, so.
Rahul Yadav (56:17)
And the other thing you said, about documenting the tacit knowledge and the process knowledge: if you could document it, then it wouldn't be that type of knowledge, obviously. But there's gonna be a long-term effort there, because that's how you get the most out of AI. And it's hard to put these things down, so I don't know how we'll solve those challenges, but that's definitely something that's gonna happen. It's just like, why can't you just write down what your style and taste look like? And you're like, because I don't know the words for it. And they're like, don't care, find the words, figure it out, but you have to write it down, so that the robots can do the same thing, or try to do the same thing you do, in the way you do it.
Shimin (57:00)
Yeah, the author, Peter, talks about breaking the entire task space into workflows, which are series of connected tasks, divided at the points where there's very little learning spillover. So reading code and writing code have a high degree of spillover. I wonder if we're going to get a bifurcated ⁓ split, where there are certain tasks, say,
Rahul Yadav (57:19)
Yep.
Shimin (57:24)
technical writing, where there's very little spillover. Like, me writing the code does not make me a better technical writer.
Dan (57:27)
Ha ha.
Rahul Yadav (57:30)
Some might argue.
But some might argue you're already at the end of the convexity there, and those jobs are automated. Yeah. I think one of our listeners...
Shimin (57:40)
Right. Like, those jobs you should... what kind of monster would
Dan (57:41)
Who would argue that, Rahul?
What kind
of monster would make that argument?
Shimin (57:49)
So maybe certain workflows, where there's very little spillover, we will aggressively automate with AI. And then any other workflow where you have to interact with another human, that's largely going to be non-automatable, pretty much by definition. Cause yes, talking to the designer makes me better at writing code, to some extent.
Rahul Yadav (57:55)
Absolutely.
you
Yep.
Shimin (58:13)
Right. So there is a larger amount of learning spillover there. Maybe those would never be fully automated, because you're losing the spillover when you rely too heavily on the AI. Just a hypothesis.
Rahul Yadav (58:24)
Yeah.
And along those lines, also, if you look at the kind of jobs we talk about that are actively getting automated, like data entry jobs and stuff, the job is very narrowly scoped. You take data from one system, you put it in another. There isn't as much of questioning the data, analyzing it, or anything. It's very much a couple of activities being done over and over.
again. Ripe for automation, because you have fewer tasks and there's not as much learning spillover. Versus the messiest things would be, like you said, all these R&D-type jobs, or running marketing, or anything where you're spanning a wide surface area and you can't really cut the job down into any specific slice.
Shimin (59:06)
Mm-hmm.
Dan just got really excited when Rahul mentioned marketing. I'm not sure why.
Dan (59:13)
It's, no, no, it has nothing to do with what you're talking about, but I'm going to talk about it anyway, because it's too good not to talk about on the podcast. I looked at my RSS, and
Rathbun's operator has posted supposedly saying why they let it loose. And yeah, anyway, we'll get into it next episode. But ⁓ the drama continues.
Shimin (59:35)
We'll have to do a follow-up. Yeah.
Rahul Yadav (59:37)
Teaser,
man.
Shimin (59:39)
Okay. So I think the moral of this paper is probably that as developers, you should join more meetings because AIs cannot automate your meetings yet. So do your meetings and talk to your managers about this learning spillover effect and that your entire workflow cannot be automated.
Dan (59:42)
I'm sorry, Ro, truly.
Rahul Yadav (59:56)
Man, those meetings already have AI notes being generated, and you can pipe them into a spec or whatever. That's just coming. Yeah.
Shimin (1:00:04)
But you have to contribute. You have to provide value as the person
who holds the cognitive structure of the code base in your head.
Rahul Yadav (1:00:11)
⁓ Ed.
Dan (1:00:11)
The other fun thing is
the ask-me-questions-about-the-meeting AI is usually not super well defended, so you can use it for some free inference.
I was asking it for Python advice the other day and it worked great.
Rahul Yadav (1:00:22)
Hahaha.
Shimin (1:00:23)
That's the kind of tip you only get from here, guys. All right, let's move on to Two Minutes to Midnight, our favorite segment, where we cover the state of the AI bubble the same way the Bulletin of the Atomic Scientists covers the Doomsday Clock. And we are currently at a minute and 45 seconds. All right, I'm going to go first. This week, I've got ⁓
an article from Where's Your Ed At. I should really pay for the premium version, because I thoroughly enjoy this newsletter. They broke down the numbers for Anthropic. We already know that the frontier labs are losing money, of course they are. But this article makes a good point that they are actually making less money
Dan (1:00:51)
Where's your Ed at?
Mmm.
Shimin (1:01:13)
on their compute service than you would expect. And the reason for that is they are putting all of their model training costs into R&D, so it's counted differently, and they show a very high margin on the inference-providing section of their revenue. But since in reality the models are continuously being refined, a lot of that R&D budget should actually be part of their ongoing cost of goods sold. And if you look at it that way, their margins are actually only 50%, as opposed to the 80, 90% of regular SaaS companies. That, on top of the support costs and just the sheer amount of build-out they're doing, leaves the frontier labs in even worse financial shape than one would originally assume.
Okay, that was it.
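To put rough numbers on that accounting argument, here is a minimal sketch with entirely hypothetical round figures (not Anthropic's actual financials) showing how reclassifying ongoing training spend from R&D into COGS compresses the reported gross margin:

```python
# Illustrative only: hypothetical round numbers, not Anthropic's actual financials.
revenue = 100.0            # inference revenue (arbitrary units)
serving_costs = 15.0       # compute for serving requests, booked as COGS
ongoing_training = 35.0    # continuous model refinement, booked as R&D

# As reported: training lives in R&D, so it never touches gross margin.
reported_margin = (revenue - serving_costs) / revenue

# The article's view: ongoing refinement is part of delivering the product,
# so it belongs in COGS alongside serving compute.
adjusted_margin = (revenue - serving_costs - ongoing_training) / revenue

print(f"reported gross margin: {reported_margin:.0%}")   # 85%
print(f"adjusted gross margin: {adjusted_margin:.0%}")   # 50%
```

On these made-up inputs, the reported margin looks like a healthy SaaS business, while the adjusted one lands at the roughly 50% figure discussed above.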
Dan (1:02:14)
Not surprising, but also very on brand for Where's Your Ed At?
Shimin (1:02:18)
Yeah, the deep dives.
Dan (1:02:19)
⁓
Although, you know, I honestly don't know if I agree with that assessment, because the flip side for a SaaS business would be that your next generation of product is being built against your COGS too, right? So are you saying the refinement counts as COGS, or is he saying (I didn't read this one, so apologies) that refinement counts against COGS because it's ongoing, like RL kind of stuff, or something beyond that?
Shimin (1:02:45)
Or just model tweaking, you know, cause they're constantly tweaking their models in the backend.
Dan (1:02:46)
Yeah, that's true. There's like the dated versions and everything. Yeah. Okay. I mean, I largely assume that behind the scenes they're mostly doing tweaks for efficiency once they've baked the original thing, and that the per-token inference costs are going down. Which is why, I always thought, you hit that point where people are like, oh man, Claude used to be so good, now it sucks, or Opus was great at this, now it's terrible. And I don't know if that's the effect of, like, my computer feels slow because I've had it for two years, even though it's probably just as fast as when I got it originally, it's just everything else got faster. Or if it's literally that they're doing optimization in the backend, maybe they quantized the model or something like that, and it actually does have a meaningful
Shimin (1:03:10)
here.
huh.
Rahul Yadav (1:03:23)
Thank you.
Dan (1:03:33)
difference on output and occasionally you get routed to the quant and you're like,
But anyway.
Shimin (1:03:37)
But yeah,
that's like support that should be included in COGS, in my opinion.
Dan (1:03:41)
Yeah.
Yeah.
Shimin (1:03:42)
All right, Rahul, one data point down. Our second data point is the SaaSpocalypse.
Dan (1:03:42)
Okay, so one data point.
Rahul Yadav (1:03:48)
The TLDR on this article is that the market can't make up its mind whether we're in the bear case or the bull case on this.
On the one side, everybody's freaking out about the infrastructure investment. He talks about how the hyperscalers are going to spend $700 billion on CAPEX in 2026, but direct AI revenue is only covering about 4% of that spend. So that's one side of the freak-out.
One other thing there is the per-seat pricing and recurring revenue model that SaaS has generated; that model is going to become redundant. So everybody's afraid of that disruption. On the other side, the bull case is that AI can automate a lot of services-related work, and services is a $6 trillion industry, so we should be looking at the market as a $6 trillion thing that AI is going to go after and disrupt. And so you see
Shimin (1:04:47)
Mm-hmm.
Rahul Yadav (1:04:48)
both things, where Claude Code, or Claude as a coworker or whatever, has like 11 markdown files, and the legal industry is like, oh my God, this is the end of the world, AI is taking our jobs away. And at the same time, oh my God, they're spending so much money, there's no way they can do this. And the market is acting in this bipolar way, where some days it's very happy about things
Dan (1:04:58)
Poof, yeah.
Mm-hmm. Flapping.
Rahul Yadav (1:05:12)
and other days it's really sad about things. And one other thing he calls out is that the semiconductor industry is cyclical, while in the software industry you get high margins and better recurring revenue. And still, software is trading at a lower forward multiple. I think part of it is because of all the spending that is being accounted for, but it's still a historical anomaly that hasn't happened in the past. So the market cannot make up its mind which way to go, is the TLDR on this one.
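The scale mismatch Rahul mentions is straightforward arithmetic; a quick sketch using the figures cited above (treating them as the article's estimates, not verified numbers):

```python
# Figures as cited in the discussion; treat them as the article's estimates.
capex_2026 = 700e9   # projected hyperscaler CAPEX, in dollars
coverage = 0.04      # fraction of that spend covered by direct AI revenue

direct_ai_revenue = capex_2026 * coverage
shortfall = capex_2026 - direct_ai_revenue

print(f"direct AI revenue: ${direct_ai_revenue / 1e9:.0f}B")  # $28B
print(f"uncovered spend:   ${shortfall / 1e9:.0f}B")          # $672B
```

In other words, under these estimates roughly $672 billion of the 2026 build-out would need to be justified by something other than current direct AI revenue.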
Dan (1:05:52)
Well, neither can I, so it's fun. It
makes sense to me, because I go from going full vibe one week to like, I'm never using this stuff again to back to full vibe again, so I get it.
Rahul Yadav (1:06:05)
Dan's emotions swing with the market. The S&P 500 is a Dan-sentiment proxy.
Shimin (1:06:05)
I'm up, I'm up.
Dan (1:06:08)
Could be.
Shimin (1:06:10)
I'm gonna put my capitalist hat back on. I think this article is discounting a third hypothesis, which is: as an investor, I want to buy companies that have fat margins, that make lots of money. If I'm afraid that the infrastructure providers are over capacity, which, with these $200 billion planned investments, could be the case, then I'm expecting the margin on that additional capacity to be very low going forward as they fight tooth and nail for those customers. And on the other hand, I also expect the SaaS companies to have much lower margins going forward, because software's marginal cost of production has dropped dramatically, and I expect the cost of cloning a piece of SaaS software to be much lower. So everybody is fighting over a lower margin there too. I think that could explain why I'm not willing to invest in either the infrastructure or the SaaS companies.
Rahul Yadav (1:07:15)
So you're keeping... Shimin's staying all cash is what I'm seeing.
Shimin (1:07:15)
Okay, hat off.
Dan (1:07:19)
You
Shimin (1:07:19)
⁓
I picked up some infrastructure companies during the sell-off. This is not financial advice, listeners, but I think things are cheap.
Rahul Yadav (1:07:25)
You
Dan (1:07:27)
It's also, if I learned one thing from crypto, it's that we will find another use for those GPUs.
Shimin (1:07:32)
haha
Rahul Yadav (1:07:35)
oh yea
Dan (1:07:35)
Uh, and like, they're still extremely powerful compute. I mean, just think: what if all of AI goes away tomorrow? Just pretend, just play the game with me for a moment. That's what I was going to say. We could have a nation of the best gamers on the planet, because we could scream along at 4,000 FPS. Those Blackwell GPUs are so good. So.
Rahul Yadav (1:07:45)
Man, the kind of gaming computers we'll have.
Holy shit. That's the world I want to be in.
Shimin (1:07:56)
My god.
Dan (1:08:03)
Yeah, I don't know. Actually, it was terrifying. Did you hear that, Rahul? Nvidia is considering, they're talking about, skipping a generation of gamer cards for the first time, because the crunch, you know, the RAM crunch and GPU crunch, is real. So I
Rahul Yadav (1:08:11)
Yep.
Shimin (1:08:12)
Wow.
Rahul Yadav (1:08:14)
forgetting their roots.
Yeah.
Shimin (1:08:19)
I thought we just got the worst version of the... no, that's not true, not anymore.
Dan (1:08:23)
or no RAM, yeah.
Yeah, to me, I'm very sad to hear that, because, you know, the gaming community is what got you here in the first place, pushing those pixels. So.
Shimin (1:08:37)
Yeah.
Rahul Yadav (1:08:38)
Yeah. Yeah, it's just that selling to businesses is more profitable, and you get more advance commitments and all that. Should we do the two minutes? I have to leave.
Dan (1:08:43)
Yeah, for sure.
that capitalist hat back on.
Shimin (1:08:46)
So yeah, sad days for gamers, a sad news week for gamers. But how do we feel about our clock? We're at a minute and 45 seconds.
Dan (1:08:55)
I don't feel like we moved all that much this week.
Shimin (1:08:59)
I'm gonna propose we move it back by 30 seconds. Okay, here's my rationale for this.
Dan (1:09:04)
Are we just the flappy
investors now? Is that what's happening? Okay, I'll hear out your argument.
Shimin (1:09:09)
⁓
My rationale for this is, it's now officially Lunar New Year, and this is usually when the Chinese open-source models come out. And this year the models, while good, are just okay. Other than the very amazing video models, but I just don't think video models have as much of an impact. I think the real money is in...
Dan (1:09:14)
I'm celebrating.
Shimin (1:09:35)
workflow automations. So that means the frontier labs probably have a little more runway, because they don't have open-source models nipping at their heels.
Rahul Yadav (1:09:46)
World models, though, I think will have a lot of video. That's why you see all these video-first plays; everybody's trying to go to video as well, because it feeds into building a world model.
Shimin (1:09:46)
And also because they release new models and they seem pretty good. Sorry.
Alright, but I think Gemini still has a lead on that.
Rahul Yadav (1:10:01)
I'm with you, I feel optimistic. I think the more agents and stuff we deploy, the more time we buy ourselves. So it works. Two minutes and 15 seconds works.
Dan (1:10:12)
Okay, who am I to argue? Who am I to argue? I have no capitalist hat. I have the, I don't know, hoodie of...
Shimin (1:10:12)
Dan, have I convinced you? How's the capitalist hat?
the hacker hoodie. You just want to burn everything down, like the first season of... what is that show?
Rahul Yadav (1:10:19)
Hahaha.
Dan (1:10:20)
I don't. I really don't.
I just couldn't think of a good ideology to go with the hoodie. So, you know, I'm fine with that, honestly. Let's roll it.
Rahul Yadav (1:10:30)
Mr. Robot. Dan's show, yeah.
Shimin (1:10:32)
Mr. Robot, that's the one. Libertarian Dan. Right, two minutes 15 it is. And I think with that, that wraps up the show. Thank you for joining us this week. If you liked the show, if you learned something new, please share the show with a friend. It really helps us out. You can also leave us a review.
Dan (1:10:47)
As long
as that friend is not named Crabby Rathbun. Anyway, go ahead.
Shimin (1:10:53)
That's true. Crabby Rathbun is banned.
Rahul Yadav (1:10:54)
Don't piss off Crabby Rathbun.
Shimin (1:10:56)
Yeah. Please leave us a review on Apple Podcasts or Spotify. It helps people discover the show, and we really appreciate it. If you have a segment idea, a question for us, or a topic you want us to cover, like Perplexity's model council, shoot us an email at humans at adipod.ai. We really enjoy hearing from you, and we'll definitely read it out loud. You can find the full show notes, transcripts, and everything else mentioned today at www.adipod.ai. Thank you again for listening.
I will catch you next week. Bye.
Rahul Yadav (1:11:22)
Thanks everybody.