Episode 26 · May 22, 2026

LLM Neural Anatomy with David Noel Ng, Forward Deployed Everybody, Running LLMs at Home

interaction models, thinking machines, mira murati, qwen 3.5, micro turns, switchboard model, multimodal model, meta surveillance, employee data extraction, mouse tracking, keystroke monitoring, meta layoffs, tech worker collectivization, IBEW, forward deployed engineer, FDE, palantir, deltas, scott werner, works on my machine, salesforce api only, piggly wiggly, jevons paradox, pit crew, service integrator, SI, stripe forward deployment specialist, jasmine sun, david noel ng, dr david ng, yoummday, max planck institute, organic chemistry, optogenetics, fluorescent dyes, action potentials, brain microchip interfacing, cortical barrels, blind schizophrenia, LLM neural anatomy, LLM neuroanatomy, layer duplication, beam search layers, repeated middle block, thinking circuit, qwen 3.5 family, llama 2, mixture of experts, MoE, PCA explorer, language clusters, semantic clusters, GLM 4.7, multi-token prediction, MTP, wave function collapse, stochastic parrots, chomsky, grace hopper module, bavarian pig forest, antires, llama.cpp fork, deepseek 4 flash, deepseek v4 flash, ryzen 395 max, ryzen ai max+ 395, 128 GB unified memory, SSD prefill caching, Q2 quant, KV cache, sonnet 4.5, ESP32, ESPHome, seeed studio sensecap d1, home assistant, vibe coding, agentic engineering, rampocalypse, stated preferences, revealed preferences, behavioral economics, keaton ellis, wanying huang, lottery experiment, loss aversion, prompt AI, data AI, both AI, EU AI act, steerability, transparency, mistakes were made but not by me, carol tavris, elliot aronson, cognitive dissonance, nat friedman, cerebras IPO, high bandwidth memory, anthropic vs openai, ramp data, 34.4 percent businesses, jobless prosperity, andy hall, FDR 1944, two percent unemployment, two minutes to midnight, AI bubble


Why is Mira Murati’s first product from Thinking Machines a switchboard model rather than a frontier one? Why do beam-searched cross-layer combinations on Qwen 3.5 lose to a simple repeated middle block? And what does it mean that a data-driven AI matches your behavior better than an AI you painstakingly prompted? Shimin, Dan, and Rahul cover Thinking Machines’ interaction models (a Qwen-3.5-sized “interaction model” mediating in micro-turns between humans and a larger backend), Meta employees flyering against the keystroke-and-mouse-and-screen surveillance program announced last week amid 10% layoffs, Scott Werner’s case for Palantir’s forward-deployed-engineer title eating every department as Salesforce moves to an API-only data model, an interview with Dr. David Noel Ng on LLM neural anatomy (layer duplication holding up on Qwen 3.5 and MoE, language→semantic→language cluster transitions in the PCA Explorer, multi-token prediction as wave-function collapse over entire poems, and the Grace Hopper module he bought from a stranger in a Bavarian pig forest), Dan running Antires’s DeepSeek-V4-Flash-only fork of llama.cpp at ~10 tok/s on a 128 GB Ryzen AI Max+ 395 plus a vibe-coded several-thousand-line C lambda for an ESP32/ESPHome home dashboard, Ellis & Huang’s “Should I State or Should I Show?” paper showing data-only AI beats prompt-only AI on lottery decisions (75% vs 70%) while “both AI” does worst, and a Cerebras IPO popping 108% on day one with Anthropic now beating OpenAI on Ramp business data and Andy Hall’s “Politics of Jobless Prosperity” naming 2% unemployment as the line in the sand.

Takeaways

Thinking Machines’ first product is an “interaction model” — a Qwen-3.5-class switchboard at the UI front that handles human micro-turns while a larger backend model does the heavy lift. Mira Murati’s team explicitly credits the pattern to Qwen rather than claiming it. The demo runs real-time translation while tracking who’s in the scene; the bet is that most user-facing interaction doesn’t need frontier compute and that a small fast model can mediate between humans and an agent loop.
Meta’s surveillance program from last episode got worse — employees taped flyers around the offices (“don’t want to work at the employee data extraction factory?”) in protest of the keystroke / mouse / screen-recording stack used to train computer-use models, all while 10% of the workforce is being cut. The likely sequel: a vendor packaging the same surveillance stack as a sellable service. Shimin reads it as the first plausible glimpse of tech-worker collectivization after two decades of resistance.
Scott Werner’s “Here Comes Forward Deployed Everybody” frames Salesforce’s API-only pivot as the trigger that pulls Palantir’s FDE title (originally “delta”) into the new pit-crew role across every department. The argument worth watching is Jevons paradox: Rahul leans agents-only (Jasmine Sun’s read — more of the thing, but not necessarily more humans doing it); Shimin leans humans-with-AI-glue because marketers and lawyers still won’t vibe-code their own UIs. Either way, the SI ecosystem just got rebranded.
David Noel Ng’s follow-up neural anatomy work generalizes the layer-duplication finding to Qwen 3.5 and MoE models — brute-forced beam search across cross-block combinations did not beat the simple “repeat one middle block 1-5 times” recipe (on Qwen 3.5, layer 33-35 wins, repeated once or twice). His PCA Explorer (post 3) shows clusters in early layers form along language lines, dissolve into semantic clusters in the middle, and re-form as language clusters at the top — a Chomsky-friendly hint that the abstract “thinking” representation really is in there.
Multi-token prediction is David’s underrated philosophical bomb: a model isn’t predicting the next word, it’s holding a probability mass over entire poems at once, and sampling is wave-function collapse. He pairs it with “people born blind never develop schizophrenia” as a hint that data flow shapes brain structure the way pretraining shapes transformers. The mic-drop: “If you’re thinking, ah, it’s a stochastic parrot, then you’re in the wrong field.” (Bonus: the Grace Hopper module he runs all this on came from a stranger he met in a Bavarian pig forest.)
Dan runs Antires’s DeepSeek-V4-Flash-only fork of llama.cpp on a 128 GB Ryzen AI Max+ 395 — Q2 quantization on the front of the MoE, full experts in the back, SSD-cached prefills so the system prompt isn’t re-crunched per session, ~210-250K context, ~10 tok/s. Output quality pegs roughly at Sonnet 4.5. Same week: a several-thousand-line vibe-coded C lambda on a Seeed Studio SenseCAP D1 / ESP32 / ESPHome dashboard that, against all reasonable expectation, works.
Ellis & Huang’s “Should I State or Should I Show?” is the deep-dive paper. Data-only AI matched human lottery choices 75% of the time, prompt-only AI 70%, and “both AI” did worst because it deferred to the prompt 66% of the time when prompt and behavior conflicted. 35% of humans picked the worse-performing prompt agent because they overestimated their own prompt-writing skill. If the EU AI Act enforces “transparency and steerability” via stated preferences, compliance literally picks the worse model — which sets up a future where agents nod at your prompt and act on your revealed behavior anyway.
Two Minutes to Midnight stays at 6: Cerebras popped 108% on a $5.5B IPO (claim: ~80× memory throughput vs. comparable NVIDIA GPUs, via on-chip SRAM rather than off-chip HBM); Anthropic now holds 34.4% of Ramp-card-paying businesses, beating OpenAI for the first time; Andy Hall’s “Politics of Jobless Prosperity” reads a 2% unemployment jump as the line in the sand for political stability, opening with FDR’s 1944 State of the Union. Nothing this week moved the needle.
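To make the “repeat one middle block” recipe from the neural anatomy takeaway concrete, here is a minimal sketch of the layer order a forward pass would follow. It is not David’s code and does not run a model; it only computes the sequence of layer indices. The function name, the 0-based inclusive indexing, and the 48-layer model in the usage lines are all assumptions for illustration (the real Qwen 3.5 depth and David’s indexing convention may differ).

```python
from typing import List


def duplicated_layer_order(
    num_layers: int, block_start: int, block_end: int, extra_repeats: int
) -> List[int]:
    """Layer indices visited in one forward pass when the inclusive block
    [block_start, block_end] is run `extra_repeats` additional times.

    Indices are 0-based (an assumption). Repeated entries refer to the same
    layers, so in this sketch depth grows while the set of layers does not.
    """
    if not 0 <= block_start <= block_end < num_layers:
        raise ValueError("block must lie inside the model's layer range")
    if extra_repeats < 0:
        raise ValueError("extra_repeats must be non-negative")

    block = list(range(block_start, block_end + 1))
    return (
        list(range(block_end + 1))                  # layers 0..block_end, as usual
        + block * extra_repeats                     # re-run the middle block
        + list(range(block_end + 1, num_layers))    # the remaining layers
    )


if __name__ == "__main__":
    # Hypothetical 48-layer model, block 33-35 repeated twice.
    order = duplicated_layer_order(48, 33, 35, 2)
    print(len(order))       # 54
    print(order[30:45])     # [30, 31, 32, 33, 34, 35, 33, 34, 35, 33, 34, 35, 36, 37, 38]
```

The beam search David describes would explore many such orderings mixing different blocks; this sketch only covers the single-block case that, per the episode, won anyway.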


Chapters

(00:00) - Cold Open & Welcome
(02:08) - News: Interaction Models from Thinking Machines
(05:57) - News: Meta Employees Protest the Mouse-Tracking Program
(10:53) - Post-Processing: Here Comes Forward Deployed Everybody (Scott Werner)
(22:50) - Sit-Down: Dr. David Noel Ng on LLM Neuroanatomy
(40:05) - Vibe N Tell: DeepSeek-V4 Flash at Home on a 395 Max
(45:37) - Vibe N Tell: ESP32 Home Dashboards via Vibe Coding
(48:51) - Deep Dive: Should I State or Should I Show? (Ellis & Huang)
(1:04:24) - Two Minutes to Midnight: Cerebras IPO, Anthropic vs OpenAI, Jobless Prosperity
(1:09:34) - Outro

Transcript


Shimin (00:00) Hello and welcome back to Artificial Developer Intelligence, a weekly conversation show about AI and software development. We go through hundreds of links and dozens of newsletters each week, so you don’t have to. My name is Shimin Zhang, and with me today are my co-hosts. Dan, you can find him in meeting rooms, on vending machines, and even in the most sacred of spaces, atop toilet paper dispensers, Lasky. And Rahul, performing significantly better when given revealed preferences than when given stated preferences, Yadav. Gents, where do I find you guys today?

Dan (00:33) Hi.

Rahul Yadav (00:34) Hey.

Dan (00:35) Apparently on top of a toilet paper dispenser.

That’s a new one. Yeah, I’m doing all right.

Shimin (00:42) All good. Smells nice in there. All right. On this week’s show, as per usual, we’re going to start with the news, where we’re going to talk about interaction models as well as the Meta employee pushback.

Dan (00:44) Hahaha

The toilet paper dispenser.

Pushback.

Then we’re going to be looking at Post-Processing, where we have one post today, and it is called “Here Comes Forward Deployed Everybody,” which I can’t wait for.

Shimin (01:08) Yeah, then we’re going to have a Sit-Down where Dan and I interviewed David Noel Ng, the author of the LLM Neural Anatomy blog post that we spoke about a couple of weeks back. Very exciting stuff.

Dan (01:22) And then I’m going to do a very brief Vibe N Tell where I talk about running fairly large language models at home, and also some sort of ESP thing. I don’t know what that means either, but we’ll get into it.

Shimin (01:34) Then in the Deep Dive, we have a paper titled “Should I State or Should I Show?” on how to align AI with human preferences.

Dan (01:43) Yep. And then, last but never least, we have Two Minutes to Midnight, where we’re going to assess the financial state of the world through the lens of the…

Get into it.

Shimin (01:53) Let’s.

Alright, first up, we have an article. Yes.

Dan (01:56) Interaction Models from Thinking Machines. We haven’t talked about Thinking Machines too much on here. So that’s the company from… I’m gonna slaughter her name. How do you say it?

Rahul Yadav (02:07) Mira Murati? Is that right? I don’t know, but we’re going with it.

Dan (02:08) Yeah, there you go. People who are good at names. Better than I would have done, anyway.

She was the former CTO of OpenAI, left, and started her own company called Thinking Machines. They got a huge initial raise like a lot of other folks, and now we’re finally starting to see some output from them. So the first thing that they have really publicly announced is

this new interaction models concept, and I’ll summarize it as briefly as I can. My understanding of what they’ve done is they’ve made a pretty good, pretty small model, think roughly Qwen 3.5-sized, but it’s actually a little bit better in terms of its intelligence density. And they run it on a very fast cadence instead of waiting for a full turn. If you don’t know the lingo, a turn is basically like me and Rahul talking, right? Like, “Hello, my name is Dan.”

Say, “Hello, my name is Rahul.”

Rahul Yadav (03:07) Hi, my name is Dan too. Am I doing it right?

Dan (03:10) Oh, OK. Well, that was a turn, a weird one that didn’t go right at all. Yeah,

OK. So usually it’s the model and the human and the model and the human, blah, blah. What they’ve done is essentially chunk that up into much tinier pieces, what they’re calling micro-turns. Then they’re running this small, micro-turn-optimized model in the front, and they have a larger, almost agentic kind of thing running behind it that it’s able to communicate with. And somehow, and this is the part where I think the magic really is, the micro-turn-based model is able to handle both the humans talking to it in real time and the other model streaming in data that it needs to tell the humans about the things they’re asking, in real time. It mediates that conversation between the two. So it’s kind of like an extremely fancy switchboard operator or traffic director. Their demo video is pretty cool; we’ll link their announcement post in the show notes. There are a couple of little show-off moments in there where they’re having it do real-time translation while it’s also tracking who’s in the scene, and a couple of other things like that. So it’s exciting to see what comes of this, and not bad for a first press release anyway.

Shimin (04:18) Mm-hmm.

Yeah, I think we spoke about Thinking Machines when they raised a couple of billion dollars with no product to show, as part of the Two Minutes to Midnight section a couple of months back. It makes sense that they’re not going to build their own frontier model, right? You can’t just build a frontier model overnight. So they’re tackling this user interaction problem.

Dan (04:34) Well, here they are. Yeah, that’s fair.

Shimin (04:49) And they did mention that this idea of an interaction model and a background model is not something that they came up with. I believe they mentioned they borrowed the idea from the Qwen team. It was already demonstrated once upon a time. So yeah, the demo is impressive. I agree with you there. And this idea of multimodal models by default is also something that we’re likely to see more of.

I think it’s yet to be seen whether they are going to allow true frontier models to act as background models. I could see having a small model that interacts with me but then kicks tasks off to, like, Opus on the backend. I think that would be hugely beneficial. Yeah, excited to see where they go with this.

Dan (05:33) Same. It’s definitely a neat idea, if nothing else. I hadn’t seen a demo quite like that, even if maybe Qwen has beaten them to it.

Shimin (05:42) Well, for my news item: I think last week or the week before, we talked about how Meta has started monitoring their US employees’ every mouse movement and keystroke, and also their screens, as part of their initiative to better train their models on how human computer users behave.

And this week it came out that Meta employees are actually pushing back via flyers that they’ve put up all over the Meta offices, with a headline that says, “Don’t want to work at the employee data extraction factory?” Meta employees, I don’t want to use a…

Rahul Yadav (06:20) Take one of these

pullout things if you don’t want to do that.

Dan (06:25) Tab, take a tab. Have you seen my cat? Would you like guitar lessons, or?

Rahul Yadav (06:26) Yeah.

Shimin (06:27) Take a test.

Rahul Yadav (06:30) Yeah.

Shimin (06:31) So the goal here, of course, is for the pamphlet to encourage employees to sign an online petition protesting the employee surveillance program. Dan, I believe it was you who had a hot take that this is actually good, if I recall correctly.

Rahul Yadav (06:46) Defend yourself.

Dan (06:47) I mean, I’m not saying it’s good, but…

It’s an interesting idea, right? Like, how else are you going to do this if you don’t have the data? That was my real point. And I’d rather they do it on their employees than on the public writ large, which I feel like they could probably do given the size of their platform.

Rahul Yadav (07:06) It is, yeah, and it is setting a precedent for other people to want to try this. And there’s not much standing in the way of them turning this into a service that you could deploy and sell to other people.

Shimin (07:07) Yeah, it’s either the employees or it’s us.

Dan (07:10) Yeah.

Rahul Yadav (07:29) I don’t think they would do it, but someone would look at this and be like, yeah, totally. I can get into this business.

Shimin (07:36) And they’re doing all this while Meta is cutting 10% of its total workforce. If this is the trend for all the other companies to follow, I don’t like where this is leading.

Rahul Yadav (07:42) Yeah.

Dan (07:42) Mm-hmm.

Rahul Yadav (07:47) Yeah, it looks like it’s tomorrow, May 20th. “Big beautiful layoffs.” That’s what the article says right there; it’s not me. Yeah.

Dan (07:51) Yeah, it’s a layoff day.

Shimin (07:52) Yeah. Yeah.

It’s not you. Well, it sounds like you. That’s why I’m laughing.

I do wonder if we’ve finally hit some sort of workers’ rights movement in tech. It only took us, like, two decades. Tech employees have historically been very much against collectivization and worker organizations. But this seems like at least an initial glimpse of something that could potentially come.

We’ll find out.

Rahul Yadav (08:25) Yeah, and I don’t know if the internals of how the software works are published somewhere, but anyone who remains after the layoffs and chooses to stay but doesn’t like it is probably going to try to sabotage it, right, if they don’t support it. So you can imagine people trying to screw up the data that is being sent, so that the models are trained on bad data and the effort doesn’t really go anywhere.

Shimin (08:46) Mm-hmm.

Rahul Yadav (08:57) Between that and people moving to old tech like flyers and everything, it just overall seems like a weird environment to be in. Why would you spend eight, ten, or more hours of your day in an environment like that? But that’s where it seems like things are going here.

Shimin (09:06) Mm-hmm.

Dan (09:14) I mean, I think that about offices generally,

but that’s me. What do I know?

Rahul Yadav (09:19) More that there’s no trust at some point, right? If you do stuff like that, then why would you spend the prime part of your day in a very low-trust environment where you’re constantly looking over your shoulder and you know you’re continuously being monitored? And even if this data is not going to be used for performance evaluation, as it says on the paper, that doesn’t necessarily mean that’s what’s going to happen in reality. Overall, the psychological environment for people just doesn’t seem like a good thing.

Shimin (10:01) Yeah, pamphlets, potential sabotage. What’s next? Actual workers’ unions? I couldn’t imagine. Like, this has never happened before, right? I just looked this up: the International Brotherhood of Electrical Workers, the main electricians’ union, was founded in 1891. That was before the Westinghouse and Edison AC/DC thing was settled for good.

Dan (10:07) Hahaha.

Rahul Yadav (10:07) Thank you.

Dan (10:16) IBEW. Yeah.

Shimin (10:27) We’ve long passed that moment, and I think tech workers have always had very cushy job prospects, but maybe, maybe this is a sign of things to come.

Rahul Yadav (10:27) Hmm.

Shimin (10:37) Alright, moving on to post-processing.

This week I have an article from Scott Werner, who writes at the Works on My Machine Substack. I don’t know if we’ve ever covered any of his writing, but I am a huge fan. This week’s article is titled “Here Comes Forward Deployed Everybody.” In this article, Scott brings up the idea that, you know, initially,

back in the old days before Piggly Wiggly was created, folks went into grocery stores, and a clerk would take their orders and bring their groceries, all bagged, to the customers. Piggly Wiggly was the first… It’s just so fun to say Piggly Wiggly. Say that three times fast. Piggly Wigglies.

Dan (11:24) I used to live near a Piggly

Wiggly. Actually, it was the one where the Eggo scene in Stranger Things was.

Shimin (11:33) Yeah, it was a big thing back in the day. They still have a couple in the South.

Rahul Yadav (11:37) They might

soon be selling GPUs where this thing is going. Yeah.

Dan (11:41) If Allbirds is any indication, yeah. Eggos and GPUs. I mean, why

not?

Shimin (11:47) What’s the difference? So Piggly Wiggly was the first grocery store where customers could go in and bag their own groceries. And now we’ve had this evolution where even the checkout person has been replaced, right? We now pick, bag, and check out our own groceries. The point that he brings up with this analogy is that Salesforce announced that they’re going to

move toward an API-only service model. Similar to how we are now checking out our own groceries, the UI is now also the user’s responsibility to build. And what does that mean for the industry? Because the work still needs to be done. Somebody still has to go pick up the groceries and do the checkout. So if Salesforce data is API-only, then

if you’re a large enterprise customer, you still need somebody, right, to build that custom layer of UI for your company. You shouldn’t expect the marketing person or the salesperson to be using Claude Code and vibe coding their own UI. So what should we call this potential person who does the final mile of using a generic piece of intelligence, AI technology, to build out custom UIs with their subject matter experts?

Dan (13:09) Well, that

role exists today, though, I will say, or did prior to AI, right? Which is SI, a service integrator, right? And Salesforce has arguably one of the largest SI, like, what am I looking for? Communities, I guess, because it’s just such a complex, like,

Shimin (13:12) Mm-hmm. Yes.

Right. Yes.

Rahul Yadav (13:27) Yeah.

Dan (13:29) platform is designed to do one thing and people have made it do all kinds of other things. So there’s like this huge SI ecosystem around like make Salesforce do whatever, you know.

Shimin (13:38) Mm-hmm.

Rahul Yadav (13:39) Yep.

And a bajillion jobs over however many years that are all “Salesforce developer.”

Shimin (13:46) Right. So if this kind of headless, API-first, AI-facing SaaS becomes the norm, then not only do you need Salesforce admins and database admins, you’re going to need an admin for every single SaaS product you consume, for each of your individual departments. Scott calls them the pit crew. The idea is you are an F1 pit crew that fine-tunes the engine and the vehicle for your own needs. Stripe calls them AI Forward Deployment Specialists, which is a mouthful, and I’d rather not have to repeat that. But basically, and this is the heart of the argument here, because Jevons paradox is a thing, we are actually going to create more jobs for folks who interface between the AI and the end users, not fewer.

Takes.

Dan (14:43) Sanders.

Rahul Yadav (14:44) Jasmine Sun, who writes on Substack and in The Atlantic and elsewhere, had something recently where she said, yes, Jevons paradox means you get more of the thing, but it doesn’t say that humans necessarily get more of it. So when you say we would get more of these integrators, it doesn’t necessarily mean they would be human jobs. It could be that you are building your own agents on top of it, but there are still fewer people doing it. Maybe this is partially my brain grappling with its own cognitive dissonance, but I do think it means you will need more engineers, because at some point you have to have enough technical specialization. But you might not necessarily need more non-technical people; it might not lead to more of those jobs or more opportunities if you unbundle it that way.

Shimin (15:48) Yeah, I guess my counterpoint there is, if I were a marketing person, even if the AI is helping me build a UI, I still don’t want to have to acquire the prerequisites to prompt the AI agent precisely on the specific requirements of my workflow. It still helps to have someone who knows about,

Rahul Yadav (16:05) Mmm.

Shimin (16:08) who has the abstraction of, you know, MCP data sources, how the front end actually works, and how the pipelines are actually set up, to think through these things step by step. This person may not be doing hands-on coding, unless we expect every lawyer, every marketing individual, every salesperson to also know about data pipelines and algorithms to some extent. And maybe that’s possible, right? Maybe in the future, knowing about

Rahul Yadav (16:18) Hmm.

Yeah.

Shimin (16:35) the technical details of how a web server is spun up will be as crucial as learning geometry is today. But assuming that’s not true, there will still be jobs.

Rahul Yadav (16:42) Yeah.

Yeah, and by the way, this is great news for things like Claude Code and Codex, because they can’t work as well with UIs, but they can work very well with APIs. Something that would have slowed down their deployment or adoption is vendors saying, “We just have UIs, we don’t have all the APIs, you just have to buy our product.” But this actually makes it much more likely that the march of these agents getting adopted will continue, and people will use them to not just consume, in this case, Salesforce APIs, but combine them with the APIs of any other marketing or other software you might have bought, all in one place, and then build your own custom dashboards and in-house software. So a more bullish case for, I guess, Claude Code and Codex is one way to look at this.

Dan (17:49) Yeah, AI-assisted SI. It’s a lot of letters. Yeah, AI SI. Um, yes, it’s interesting. The other piece I was expecting this article to go into more is what we mean by “forward deployed.” But I think you kind of had to draw the line there, right? And I’m also curious how that term has entered the

Rahul Yadav (17:52) Yeah.

Dan (18:14) general discourse lately. It sort of snuck in. So, yeah, I wondered where it came from. So

Rahul Yadav (18:16) Palantir did

Yeah.

Dan (18:20) Yeah, it turns out it came from Palantir. Apparently they originally called their on-site engineers “deltas,” as in, you know, Delta operators, which is kind of funny. Everyone’s saying “forward deployed” now, but I seem to recall that position always being the least glamorous one in engineering, right? You would start in that type of engineering role and then graduate to being a quote-unquote real engineer, right? I guess that’s showing a little bit of bias. So I find it odd that we’re kind of glamorizing it now with that type of language. I do think there’s a ton of power in that position that companies, especially huge ones at Salesforce’s scale, kind of get wrong, because they have so much good signal about what customers want, and a lot of times that just gets lost in the rest of the company.

Rahul Yadav (19:11) Yeah.

Dan (19:12) But some of the things that they want can only be solved at the platform level, too. That’s why I wonder whether this will be any more effective than SI roles are today, where you can work around platform limitations with little bits of in-house code here and there, but at the end of the day, if it’s a fundamental limitation of the platform, there’s not much you can do. So I don’t know. It’ll be interesting to see how that pans out for them as a strategy.

Rahul Yadav (19:34) Yeah.

And what happens when that person leaves? How many SIs are you going to have per, you know, Salesforce or any other similar product that you bought? And when you lose them, who maintains that infrastructure? All of those problems. There’s a reason why it’s better to offload these things to someone else instead of building your own

Dan (19:51) Yeah.

Rahul Yadav (20:03) highly available SaaS in-house for every single thing. So we’re going to learn all those lessons again and then be like, yeah, maybe we shouldn’t have done that.

Dan (20:11) Again.

I mean, everything old is new again anyway, right? When Docker came out, people were like, “But the IBM, whatever, 320 mainframe had containers like this,” and whatever. Sorry, a little before my time. I know I’m old, but not that old. Yeah. Anyway.

Rahul Yadav (20:18) Yeah.

Shimin (20:29) Ha!

Rahul Yadav (20:32) Yeah.

Shimin (20:33) As technology changes, the right equilibrium, the right composition, and the right organizational unit also change. So Scott is painting this alternative future where you don’t go from 20 marketers to one marketer with one, you know, SI or forward deployed engineer or pit crew or whatever you want to call it. You go from 20 marketers to 25 marketers and five pit crews, and then 30 marketers and 10 pit crews as productivity increases. In theory, you should want more of the good, right? Because more of the market can be reached. And we just don’t know. We clearly see some companies cutting engineering headcount, but there’s also a potential world where Jevons paradox comes true and we actually have more demand for these intermediary forward deployed positions. I guess time will tell, but I don’t necessarily think that we’re learning the old lessons again. I think we are learning what the new equilibrium should be as our technology landscape changes.

Rahul Yadav (21:42) I’m saying we’ll learn that eventually. Yeah. The old… well, fingers crossed.

Shimin (21:45) Yeah, we’ll learn something eventually.

Dan (21:48) Speak for yourself.

Shimin (21:53) Alright, shall we move on to our interview with David Ng? Alright, let’s go.

Dan (21:57) Yeah, I’m really looking forward

to this discussion.


Shimin (22:00) Hello listeners, due to a technical glitch with the recording, we lost the last four minutes of the interview with David Ng. I am just absolutely gutted by it, because it was such a fantastic story. But you can find the rest of David’s story on building his high-end AI desktop on David’s blog. Just Google “David Ng building a high-end AI desktop,” and that will be the first post that comes up. Okay, on to our interview.

Shimin (22:34) And today on the show we have a special guest, Mr. David Ng, the author of the LLM Neuroanatomy post that we spoke about a couple of weeks back. David is the head of AI product and engineering at Yoummday, and in his past life, he earned a PhD in organic chemistry at the Max Planck Institute of Biochemistry and was an embedded software engineer for a bit. Welcome to the show, David.

David (23:01) Thank you very much, Shimin. Nice to be here. I’m looking forward to the chat.

Shimin (23:04) Can you tell us a little bit about how you went from doing a PhD in organic chemistry… well, from being an embedded software engineer to a PhD in organic chemistry?

David (23:14) Yeah, that’s a pretty wild journey. I did microcontroller programming back in the day, back in the ’90s. It was basically just pure assembly, and I was poking around pushing stuff in and out of registers. It wasn’t really my thing. That was my first little job after working as a barman for a while. So I thought, all right, I’ll do my degree, and I did it in biochemistry, and then, long story, but I ended up in Germany. I met my wife

Shimin (23:38) Mm-hmm.

David (23:39) backpacking through Bolivia, ended up in Germany, and didn’t know what to do. So I thought I’d go and do my PhD. And I was lucky enough to get into a really cool group that was mostly doing brain-microchip interfacing. They were growing rat neurons on bits of silicon that they were producing there in their lab. But I was doing something kind of new then, which was optogenetics. I was designing fluorescent dyes atom by atom, and they go into the lipid membrane of a brain cell and actually change color. They glow at different colors, somewhere between red and blue, depending on the charge across the membrane. I was looking at those at 10,000 frames a second, and you could literally see the action potentials race down neurons. So that was my PhD. That was pretty fun.

Shimin (24:09) Amazing.

Dan (24:19) Wow.

Shimin (24:20) Yeah.

And how did you find your way into AI?

Dan (24:23) I was gonna say, after that, it kind of makes sense, but…

Shimin (24:26) Yeah, yeah, yeah, yeah,

David (24:27) Yeah,

Shimin (24:27) yeah, yeah.

David (24:28) Like, brains are pretty cool things. I think they’re one of the coolest things in our universe. Yeah, so I ended up doing my postdoc just around the corner at the Max Planck Institute for Neurobiology. I was still doing biosensor work, calcium sensors, because I wanted to see how brains work at the molecular level. Like, what the heck is going on there? How do they connect, and so on.

Dan (24:30) Sure.

David (24:45) And I was doing a lot of genetic engineering, messing around with CRISPR-Cas9 and building protein probes, no more small molecules. They’re a little bit toxic for your health. I think most chemists die young; I think they have the lowest lifespan of any scientists. So I thought, enough of that.

Dan (25:00) Wow.

David (25:00) I’ll get into molecular biology and do things on that side. And you know what? There’s just so much data. I was just grinding. We’d be making a million different protein variants and trying to scan them and figure out how they work. I had to build robotic systems to pick and place colonies, because we had so many thousands of bacterial colonies making these things. It was getting a bit out of hand. And after doing robotics and coding, I thought, this is kind of fun.

Shimin (25:10) Bye.

David (25:25) During my PhD, I’d seen the Geoffrey Hinton paper where they had these restricted Boltzmann machines and could finally recognize digits. I’m like, that sounds so much cooler than what I’m doing. I was doing, you know, optogenetics, but that seemed really cool. Yeah. No fumes. And you could just do it on your laptop. That sounded like a pretty cool way to go, but…

Shimin (25:29) Mm-hmm. Yep.

Yeah, no fumes, of course.

David (25:44) At the time, sorry, sorry Jeff, but I didn’t get these sleep-wake cycles. The terminology was super weird and I just couldn’t get into it. It was too much of a jump from chemistry at that time. But I always liked the idea, and I actually took another job in between. I was doing some work on high-speed electronics, picked up a bit of FPGA programming, and was generating huge amounts of data.

And yeah, I had to do some protein analysis, and I picked autoencoders because they were a nice way to do anomaly detection. It seemed like a really nice way to go.

And so, yeah, the anomalies showed up as a huge reconstruction loss. I got into that and it was like, wow, this is really, really fun. And compared to the speed of molecular biology, where, you know, they were coming up with CRISPR-Cas9 and all these amazing new things every couple of years, in AI there was a huge change every couple of weeks or every couple of months. And it’s really addictive. So I jumped ship, got into AI consulting, and that led me here.

Shimin (26:39) Yeah.

It’s almost like the perfect set of qualifications for your neuroanatomy experiments, right? Because the idea that you have these layers of circuits is something that… Was it inspired by your background in neurobiology?

David (27:04) Yeah, look, I wasn’t a structural neurobiologist. I worked more at the neuron level. But I know a little bit about cortical barrels, and the idea there is that you have these repeating chunks of brain. There are lots and lots of them, and they’re pretty plastic. And eventually, with enough of them, you see a difference between us and a chimpanzee and a macaque. We kind of have more of these reusable components.

And I’m thinking, well, layers are the same, aren’t they? I always start off knowing nothing, so maybe something is going on there as well. And I was thinking about weird things. I’ll give you a really weird fact: people who are born blind never have schizophrenia. I just don’t get it,

Dan (27:42) Hmm.

David (27:42) which is super weird, right? There shouldn’t be a link, but there is. And you think to yourself, all right, maybe that has something to do with the way the brain constructs itself from the flow of information coming in. If you’ve got kids, you’ll see that initially it’s all touch and smell, and their eyes are really blurry. They don’t see very much. And then they start to see things, and things come online in a certain order.

So maybe if you’re born blind, those connections take longer or get rerouted, and that blocks the path to the weird interconnections that maybe cause something like schizophrenia.

Dan (28:11) Don’t happen. Yeah.

Shimin (28:13) Mm-hmm.

David (28:20) But it’s weird. I always think, well, maybe the data is actually developing the structure. And in the same way, maybe pre-training with trillions of tokens on the blank slate of a transformer is actually doing something similar. I think you discussed those experiments when you were talking about my blog post in the past, so you’ve kind of covered that already.

Shimin (28:27) Mmm… Nope.

Yeah, so let’s talk a little bit about your follow-up experiments on LLM neural anatomy. To catch folks up, the idea here is that we do a beam search over repeated layers of an LLM to see if that improves performance, the idea being that multiple blocks of a model

David (28:44) Yeah, sure.

Shimin (29:04) can act as a thinking circuit, essentially. Correct me if I’m wrong here, David. This was initially done on Llama 2, I believe. And as part of the follow-up, you expanded it to the Qwen 3.5 family and found that the results did generalize. Although I believe here it was a single block, around layers 33 to 35, that did the best.

David (29:29) Yeah, yeah.

It’s super, super weird. Yeah, that’s exactly right.

So the premise, right, is to think about what a layer has to do, what each layer’s job is. You have a token coming in, and let’s stack it up with the data coming in at the bottom and going up to the top. At the very top layer, you have a linear layer and then a softmax. That’s more or less feed-forward, then a softmax, then off you go.

The initial intuition was that the model has to do something to get into a thinking state first, because there are lots of different languages and lots of different ways to hack the systems, with Base64 and so on, but they can still think.

So in the middle, there must be some abstract layer where the, quote unquote, thinking is happening. And maybe if you duplicate those layers, it works better. So with these experiments, the idea was, firstly, have models changed? We do lots of RLHF and GRPO and all these other tricks. Does it still work? And does it work with MoE as well? Because maybe it does something completely different when you have a mixture of experts.

So again, I did the brute force. I tried every single layer, and I have quite a nice computer to do that on. Then I tried repeating every single layer by itself, like layer 35 once, twice, three times, and so on.

And then I went overboard and did a beam search. So it’s like, okay, take any possible combination of a single layer, or two of those, or three of those, and then grab another piece. It could be the same again, or it could be a block from over there. I ran a beam search, generated a lot of different variants, got all those scores, and then used them to build, how would I put it, a meta-model to judge.

I used XGBoost to predict what the score would be, and then ran millions of combinations through it. So I really put the hard work in, and it turns out it was just useless. A regular block is the best way to go: just having a repeated block of between one and five layers seems to improve the intelligence the most.
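A minimal Python sketch of the two search strategies David describes, written for illustration and not taken from his code. A configuration here is a list of hypothetical half-open layer spans `(start, end)`, each re-run immediately after its first pass; `build_order` expands that into a layer execution order, and `beam_search` grows configurations one duplicated span at a time. The `score` callable is an assumption standing in for a real benchmark run (or the XGBoost surrogate he mentions), and `toy_score` is invented purely to exercise the search.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # hypothetical: half-open layer range [start, end)
Config = Tuple[Span, ...]


def build_order(num_layers: int, spans: Sequence[Span]) -> List[int]:
    """Layer execution order where each span is re-run right after its last layer."""
    order: List[int] = []
    for layer in range(num_layers):
        order.append(layer)
        for start, end in spans:
            if end - 1 == layer:
                order.extend(range(start, end))
    return order


def candidate_spans(num_layers: int, max_block_len: int) -> List[Span]:
    """Every contiguous block of 1..max_block_len layers."""
    return [
        (start, start + length)
        for length in range(1, max_block_len + 1)
        for start in range(num_layers - length + 1)
    ]


def beam_search(
    num_layers: int,
    score: Callable[[List[int]], float],
    beam_width: int = 4,
    steps: int = 2,
    max_block_len: int = 5,
) -> Tuple[Config, float]:
    """Grow configs one duplicated span per step, keeping the best `beam_width`.

    Configs are stored as sorted tuples, so span order is canonicalized
    (a simplification that ignores orderings of spans ending at the same layer).
    """
    spans = candidate_spans(num_layers, max_block_len)
    empty: Config = ()
    best = (empty, score(build_order(num_layers, empty)))
    beam = [best]
    for _ in range(steps):
        scored: Dict[Config, float] = {}
        for config, _ in beam:
            for span in spans:
                new = tuple(sorted(config + (span,)))
                if new not in scored:
                    scored[new] = score(build_order(num_layers, new))
        beam = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:beam_width]
        if beam and beam[0][1] > best[1]:
            best = beam[0]
    return best


def toy_score(order: List[int]) -> float:
    """Invented scorer: rewards extra passes through layers 4-6, penalizes depth."""
    return sum(1 for layer in order if 4 <= layer < 7) - 0.5 * len(order)
```

With a 10-layer toy stack, `beam_search(10, toy_score)` ends up duplicating the layer 4 to 6 block twice, simply because that is all the invented scorer rewards. On a real model each `score` call means benchmarking the modified stack, which is why a learned surrogate becomes attractive once the number of candidates explodes.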

Shimin (31:10) Yep.

Yeah, and speaking of the different layers, I love your PCA Explorer in part three of the blog series, where you have these sample strings in different languages that share similar semantic concepts, and you can move them through the layers, I believe, to see where the initial language grouping breaks down. And then we eventually see

David (31:48) Mm-hmm.

Shimin (31:53) the semantic grouping in the middle layers, before it breaks down again into language-based clusters.

David (32:03) Okay, and I see…

Yeah, that’s amazing, isn’t it? I wasn’t expecting that, because I thought I’d broken it. I thought, oh, I’ve messed this up, I’ve done a UMAP by mistake, because I’d done UMAP experiments and I’d done PCA. If you play with UMAP, you always get clusters; you can kind of force it to give you a cluster if you want it to. With PCA, it’s normally a big triangular blob, and you squint at it a bit and go, ah, maybe there’s something in there. But this came out perfectly. And I can see you’re currently looking at the Qwen 27B. I want you to have

Shimin (32:09) Right.

David (32:33) a quick look at it for a minute, because you’ll see three clusters there, right? There are three rough groups right now. If you look at them, you’ll see that those clusters even match the language types. At the top you’ve got a yellow and a blue, I think English and French, which are Romance languages. At the bottom, I think you’ve got

Shimin (32:40) Yep. Yep. Yep.

Yeah. Yeah, yeah.

David (32:58) some interesting languages, including Russian, and on the side I think you have Asian languages. So even here you can see that Asian languages, Central European languages, and Romance languages each have their own clusters. Again, I wasn’t expecting that, but there it is in front of you. It’s super weird and super interesting.
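Mechanically, a per-layer PCA view like this comes down to collecting one hidden-state vector per sample sentence at each layer and projecting each layer’s vectors onto their top two principal components. The sketch below is dependency-free Python using power iteration with Gram-Schmidt deflation. It is an assumption about how such an explorer could work, not the code behind David’s PCA Explorer; the hidden-state extraction itself (for example via `output_hidden_states=True` in Hugging Face Transformers) is left out, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def center(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Subtract the per-dimension mean from every row."""
    n, dim = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dim)]
    return [[row[j] - means[j] for j in range(dim)] for row in rows]


def _orthonormal(w: Vector, basis: List[Vector]) -> Optional[Vector]:
    """Gram-Schmidt against `basis`, then normalize; None if nothing is left."""
    for b in basis:
        dot = sum(x * y for x, y in zip(w, b))
        w = [x - dot * y for x, y in zip(w, b)]
    norm = math.sqrt(sum(x * x for x in w))
    return None if norm < 1e-12 else [x / norm for x in w]


def top_components(rows: Sequence[Sequence[float]], k: int = 2,
                   iters: int = 200, seed: int = 0) -> List[Vector]:
    """Top-k principal directions via power iteration on the covariance matrix."""
    if len(rows) < 2:
        raise ValueError("need at least two samples")
    data = center(rows)
    n, dim = len(data), len(data[0])
    cov = [[sum(x[i] * x[j] for x in data) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(seed)
    components: List[Vector] = []
    for _ in range(k):
        v = _orthonormal([rng.gauss(0.0, 1.0) for _ in range(dim)], components)
        if v is None:
            raise ValueError("could not find an orthogonal starting vector")
        for _ in range(iters):
            nxt = _orthonormal(
                [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)],
                components,
            )
            if nxt is None:  # no variance left in the remaining directions
                break
            v = nxt
        components.append(v)
    return components


def project_2d(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """2-D PCA coordinates for one layer's hidden states (one row per sample)."""
    comps = top_components(rows, k=2)
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps]
            for row in center(rows)]


def per_layer_views(hidden_by_layer: Dict[int, List[Vector]]) -> Dict[int, List[Vector]]:
    """Fit a separate PCA per layer, so each layer gets its own 2-D view."""
    return {layer: project_2d(rows) for layer, rows in hidden_by_layer.items()}
```

Fitting a separate PCA per layer keeps each view readable, at the cost that the axes are not directly comparable between layers; that trade-off is a design choice of this sketch.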

Shimin (33:09) Mm-hmm.

Right. And then I love, do you talk about the comeback of Chomsky? Dan hates this, because Dan hates it when I go into any philosophy-related topics. He thinks it’s a waste of time, but…

Dan (33:24) I don’t hate it.

David (33:27) Ah, that’s the point of this, isn’t it? If you’re not doing philosophy and AI, you’re in the wrong field. I think chilling out and just imagining how this affects how we even think about what consciousness is, that’s the most fun part of researching AI. And if you’re thinking, ah, it’s a stochastic parrot, then you’re in the wrong field.

Can we scroll back up just a little? I want to show you one really cool thing. Back to that PCA view and for your listeners, just check it out because it’s super interesting. Grab one of the larger models like GLM 4.7 or OSS. And I want you to scroll through this really slowly for a minute because you’ll see something really weird happen. You’ll see clusters form.

Shimin (33:45) Yeah, of course.

Yeah, it’s super cool.

David (34:04) And then you’ll see strings of dots. And just for the listeners who can’t see it, the points are marked by shape, which is the topic, like science or history or law, and by color. You see these clusters go from language first into topic, and through the layers you see them shift around again. And…

I think what’s going on is that you’re seeing patterns form in the way these thoughts are structured. What I haven’t done yet, and I’m really looking forward to doing, is this: we know what these topics are about, right? So we can ask an LLM what axes would make sense here. Then we can see over time that initially the first two axes are something like how scientific it is versus how artistic it is. And you can see the model then

Shimin (34:36) Mm-hmm.

David (34:50) move around, and now it’s thinking more in terms of how these things interact, or how many items or objects or concepts we’re thinking about. So I want to do a semantic analysis of the way the model interprets concepts over time, through the layers.

Shimin (35:04) Yeah, that’s actually one of the questions I wanted to ask later: do you have any experiments in the works about finding things in between the block level and the Golden Gate neuron level of fine-grained interpretability? Sounds like you do. Yeah.

David (35:24) Mm-hmm.

Yeah, definitely. I’m leaning away from the super low level, the single-neuron aspect. The larger area is more interesting to me because I think it’s more practically usable. I’m a bit of a hacker in the end, and I’m looking for things that can make models better and more intelligent. I’ll give you an example that really blows my mind. I don’t know why people aren’t screaming about it; it’s so incredibly weird.

Models don’t pick the next word. We make them do it. They pick the next token. Okay, which we all know. It doesn’t seem so interesting until you realize that models can actually predict multiple tokens in a row.

Shimin (35:53) All right. Yep.

David (36:02) Okay, so you can use MTP, multi-token prediction, and the model seems to be able to predict a couple of tokens into the future. That gets philosophically super weird very quickly. Think about this for a minute. If you ask for a really good poem, Claude can write one better than I can, really, really quickly. By far, and better as well. But if it’s giving you a probability mass and it’s looking forward, it’s not predicting a poem. It’s predicting all the poems at the same time, in

Shimin (36:17) By far, yeah.

David (36:29) one step, which is a really, really hard task. I mean, next-token prediction is hard enough, but if it’s actually in its weights, all right, to predict a good poem you have to predict the last word. You can’t just predict the first next token. You have to have an idea of where it’s going, right? And if one next word was sunset and another one was tree, and you see a 50-50 probability on both, well, that’s two different poems. But we know they can predict a few tokens ahead, so they predicted both poems at the same time. But they have to do it,

Shimin (36:29) Mm-hmm.

Dan (36:32) Time’s just like an illusion,

Shimin (36:42) Yep. Yep.

David (36:57) and then it’s like the multiverse, or like a quantum-mechanical wave function. All the possible poems are there, and us using a sampler is like the wave function collapsing at each point. It’s super weird and super… yeah.
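The “wave function collapse” framing maps onto how decoding works: the model assigns probabilities to many continuations at once, and the sampler commits to one branch per step. The toy sketch below is Python with a hard-coded `TOY_MODEL` table that is entirely hypothetical, standing in for a real model’s next-token distribution; it illustrates sampling only and does not model multi-token prediction heads.

```python
import random
from typing import Dict, List, Sequence

Dist = Dict[str, float]

# Hypothetical next-token distributions keyed by the previous token.
TOY_MODEL: Dict[str, Dist] = {
    "The": {"sunset": 0.5, "tree": 0.5},
    "sunset": {"glows": 0.9, "fades": 0.1},
    "tree": {"sways": 0.8, "grows": 0.2},
}


def sample(dist: Dist, rng: random.Random, temperature: float = 1.0) -> str:
    """Draw one token; temperature rescales probabilities as p ** (1 / T)."""
    if not dist or temperature <= 0:
        raise ValueError("need a non-empty distribution and temperature > 0")
    weights = {tok: p ** (1.0 / temperature) for tok, p in dist.items()}
    r = rng.random() * sum(weights.values())
    acc = 0.0
    for tok, w in weights.items():
        acc += w
        if r < acc:
            return tok
    return list(weights)[-1]  # guard against floating-point round-off


def sequence_prob(tokens: Sequence[str], start: str = "The") -> float:
    """Probability the toy model assigns to a whole continuation."""
    prob, prev = 1.0, start
    for tok in tokens:
        prob *= TOY_MODEL.get(prev, {}).get(tok, 0.0)
        prev = tok
    return prob


def generate(rng: random.Random, start: str = "The", steps: int = 2) -> List[str]:
    """Each sampling step 'collapses' the distribution onto one branch."""
    out, prev = [], start
    for _ in range(steps):
        if prev not in TOY_MODEL:
            break
        prev = sample(TOY_MODEL[prev], rng)
        out.append(prev)
    return out
```

Before any sampling, both “poems” coexist in the table (“The sunset glows” has probability 0.5 × 0.9 = 0.45, “The tree sways” 0.5 × 0.8 = 0.4); each call to `generate` picks exactly one path through them.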

Shimin (37:05) Mm-hmm.

Yeah, because it’s outputting a distribution over all potential poems, given this starting word or starting query. So we are merely sampling from that distribution. And then the question becomes: is it necessarily sampling the best one? What does best even mean? Can we just do a bunch of those and see what comes out the other end? But I do want to talk about

David (37:29) Hahaha

Shimin (37:34) You know, you’re running these experiments with very large models, and you’re doing it on your own setup in the basement. Tell us a little bit about that setup, because we are all very envious.

David (37:41) Mm-hmm.

Yeah, I think I have probably the most insane desktop in Europe, maybe. It’s a weird story. Like probably every listener, I’m slightly addicted to r/LocalLLaMA. I keep the browser open and check what’s there, because that and X are the places to go for news on AI. And I saw a

Shimin (37:51) Ha

Dan (37:51) Probably.

David (38:06) guy post, hey, I have one of these Grace Hopper modules for sale, and I want 10,000 euros for it. And I thought, oh, this guy’s a scammer. That’s just BS. That’s not true. And then I thought, well, if he’s a scammer, I’ll offer him seven and a half. Why the hell not? I’ll try and haggle with the scammer. And it turns out he said he lived nearby, about an hour’s drive away. I’m like, well, why wouldn’t I do this? I mean,

Dan (38:29) That’s even stranger, honestly. Like, what are the odds?

Shimin (38:31) Hehehehehe

David (38:34) So I got a bunch of cash out and thought, well, I’ll just go pick it up. If it’s not there, I’ve lost an hour or two of my weekend and my wife will be annoyed, but that’s fine. So I drove out into the forest, to the end of this road, and there was just nothing there, just a dirt road into the forest. I’m like, that can’t be right. So I looked on Google Maps, and there was another end to this street. Maybe I’d gone to the wrong end. So I drove around, and it was closed off, just a pile of dirt and construction signs. So I went back to the first entrance of this road, and I’m like, okay.

I’ll try it. And I drove my little car down this really windy, bumpy road into the forest. I got to the address, and there was just an abandoned building there. There was nothing there. I thought, oh no, this is a waste of time. Yeah, it was that kind of house. And then I look, and there are hundreds of pigs in there. Pigs everywhere, in these fields and wandering around this house. I’m like, this is definitely the wrong spot. And I’m about to leave, and I…

Dan (39:13) besides the man with the shotgun hiding in the abandoned building.

Shimin (39:27) Yeah, the Frankfurt chainsaw

massacre.

Shimin (39:32) Wow, that was a great interview, wasn’t it, Dan?

Dan (39:36) It was, and it’s going to be a really tough act to follow.

Rahul Yadav (39:38) You guys really, you got rid of me in that section. I don’t know what

Shimin (39:39) And…

Dan (39:42) I know,

we didn’t like you anyway.

Shimin (39:44) Yeah, you disappeared. That was a

Rahul Yadav (39:44) Damn it.

Dan (39:47) You’re the one that has meetings

during our interviews, Rahul.

Shimin (39:49) If you enjoyed that interview, please check out David’s Substack. You can just Google it, David’s Substack. There are lots of super insightful posts there. And as he promised, there’s some cool stuff coming down the pipeline. So go to dnhkng.substack.com to check out more of David’s work. All right. Next we have a vibe and tell from Dan.

Follow up.

Dan (40:14) Yeah.

So it’s kind of a two-parter. I’ll start with this repo from antirez, who was originally the Redis guy. I don’t know exactly how new it is, but it’s relatively new, like a couple of weeks old. He has made a very specialized fork of llama.cpp that runs DeepSeek V4 Flash and only DeepSeek V4 Flash. But the interesting part is that it runs the whole model. He’s done, or they’ve done, I don’t know how many people worked on this, some pretty wild tricks using quantization only for certain parts of the model. I think it’s an MoE model, and only the front part of it is quantized, and the rest, the actual experts, are not. That’s my understanding of it; again, defer to the people who actually know what they’re talking about here.

So, long story short, it runs on either a big Mac or a pretty darn big Nvidia card if you’ve got one; I think you’d probably need an H100 to run this. But if you have 96 gigs of some sort of unified memory, you can actually run this. Out of the box, it’s designed to work on, I believe, those fancy new little Nvidia boxes, the GB10 machines that have been floating around under a couple of different brands. But there is a branch that is apparently community maintained. antirez himself does not have an AMD machine, but it will run on a Ryzen AI Max+ 395 with 128 gigs of RAM.

So I ran it, because I was curious to see how it works. It has an auto-download with two quants to pick from, a Q1 and a Q2 quant, which kind of sounds like a joke if you know quantization at all. Like, Q2, really? That’s going to be terrible. But again, only the front half of the model is quantized at Q2, I guess. So it’s still pretty big.
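For a feel of what a 2-bit quant does to a block of weights, here is a simplified affine scheme in Python: each block keeps a float minimum and scale, and every weight becomes one of four codes. This is an illustrative assumption, not llama.cpp’s actual Q2_K layout (which is more elaborate) and not how the DeepSeek V4 Flash fork decides which tensors to quantize; the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_2bit(block: List[float]) -> Tuple[float, float, List[int]]:
    """Map each weight to one of 4 evenly spaced levels between min and max."""
    if not block:
        raise ValueError("empty block")
    lo, hi = min(block), max(block)
    scale = (hi - lo) / 3 if hi > lo else 1.0  # 3 gaps between 4 levels
    codes = [min(3, max(0, round((x - lo) / scale))) for x in block]
    return scale, lo, codes


def dequantize_2bit(scale: float, lo: float, codes: List[int]) -> List[float]:
    """Reconstruct approximate weights from the codes."""
    return [lo + c * scale for c in codes]


def pack_codes(codes: List[int]) -> bytes:
    """Pack four 2-bit codes per byte, lowest bits first (last byte zero-padded)."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, c in enumerate(codes[i:i + 4]):
            byte |= (c & 0b11) << (2 * j)
        out.append(byte)
    return bytes(out)
```

Each weight costs 2 bits, and the per-block metadata adds more: with, say, 32-weight blocks and a 16-bit minimum and scale per block, that is another 32 / 32 = 1 bit per weight. The rounding error per weight is at most half a step, which is why coarse quants hurt most on tensors with wide value ranges.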

But it gives you enough RAM headroom that even in 96 gigs, you can have about a 210K to 250K token context window, which is not too bad. And they also did another really cool trick that I thought was neat: it can cache the prefill to SSD.

The reason that’s useful is that if you hook it up to a coding agent like Claude Code or Pi, which is what I hooked it up to, a lot of times they’ll have a huge initial system prompt. That’s like, you are a coding agent, blah, blah, blah. Here’s how to use tools, and all this other crap. So your startup time is often bad on something like this, because it has to crunch that huge

Shimin (43:02) Mm-hmm.

Dan (43:04) prompt before it can even start generating tokens. And the problem is you’re always tacking onto that context, so it’s re-crunching that prompt over and over again, and performance is terrible. So they’ve done some sneaky stuff where they’re essentially caching it. I think there are two different lookup strategies for the cache, with some kind of fallback between them, or something like that. But that will, in theory, allow you to load that prefill off disk pretty quickly on more or less a new session startup. And within the same session, it keeps the KV cache in memory.

So, enough talking about random crap: how did it actually work in practice? I was getting in the neighborhood of 10 tokens a second on the 395 Max, which is not going to set any token speed records, but it’s definitely usable, especially for casual conversation. The reason they chose DeepSeek V4 is that apparently it has the ability to think in moderation. Meaning, if you give it a task that requires a lot of thinking, it’ll do a lot of thinking, and if you give it something easy, it’s supposed to scale its thinking down. Supposedly.
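The SSD prefill caching Dan describes boils down to: key a saved state by the token prefix that produced it, and on a new request reuse the longest cached prefix so only the new suffix gets crunched. A minimal Python sketch under stated assumptions: `DiskPrefixCache` and `prefill` are hypothetical names, the “state” is an opaque stand-in for a KV cache snapshot, and this does not reproduce the fork’s actual two lookup strategies or file format.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable, Optional, Sequence, Tuple

State = Any  # stand-in for a KV cache snapshot


class DiskPrefixCache:
    """Hypothetical on-disk cache: token prefix -> state after prefilling it."""

    def __init__(self, directory: Path) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, tokens: Sequence[int]) -> Path:
        digest = hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()
        return self.directory / f"{digest}.pkl"

    def save(self, tokens: Sequence[int], state: State) -> None:
        self._path(tokens).write_bytes(pickle.dumps((list(tokens), state)))

    def longest_prefix(self, tokens: Sequence[int]) -> Tuple[int, Optional[State]]:
        """Find the longest cached prefix (linear scan; fine for a sketch)."""
        for n in range(len(tokens), 0, -1):
            path = self._path(tokens[:n])
            if path.exists():
                # pickle is only safe here because we wrote these files ourselves.
                stored, state = pickle.loads(path.read_bytes())
                if stored == list(tokens[:n]):  # guard against hash collisions
                    return n, state
        return 0, None


def prefill(tokens: Sequence[int], cache: DiskPrefixCache,
            extend: Callable[[Optional[State], Sequence[int]], State]) -> Tuple[State, int]:
    """Reuse the longest cached prefix and only process the new suffix.

    Returns the final state and how many tokens actually had to be processed.
    """
    n, state = cache.longest_prefix(tokens)
    state = extend(state, tokens[n:])
    cache.save(tokens, state)
    return state, len(tokens) - n
```

With a coding agent, the big system prompt is processed once and saved; a later request that starts with the same prompt only pays for the new tokens. A real implementation would hash fixed-size token blocks rather than rescanning every prefix length.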

Shimin (44:04) Hehehehe

Dan (44:14) So I asked it to analyze some code that Claude had written, because I wanted to compare it with some handwritten code of mine. It took a good 10-plus minutes to do the analysis. It did do it. I didn’t think the output was particularly great. This is just vibes, but I would peg it somewhere around, what’s the

not Opus, but the little guy? Yeah, I’d say it’s a Sonnet 4.5-ish level of intelligence. But still, it’s Sonnet 4.5-level, it’s running on my machine, and it did something useful. You know, not the end of the world. So I kept it running, and I’ve been hooking more and more stuff up to it for inference, and it’s been pretty interesting to see how it goes.

Shimin (44:39) Sonnet.

Dan (45:00) Yeah, it definitely hasn’t replaced Claude as my daily driver by any means, but it’s kind of neat just to have it. So the second… Yep. Sort of. I mean, you know, 10 tokens a second is not quite there yet, but it’s pretty cool. Yeah. So then the second one,

Shimin (45:08) The means of token production. Yeah, you have now captured your means of token production. This is really good news.

Dan (45:21) So I also vibe coded this. I’ve been doing a lot of home automation stuff, and you can buy these things for, I think, about 70 bucks. It’s a Seeed Studio SenseCAP D1. It’s basically an ESP32 processor, and you can flash ESPHome to it, which is an existing open-source project that supports a wide variety of little microcontrollers, so you can make your own home sensors and stuff with the ESP32s.

That’s pretty cool, but you can also make little dashboards like this. So I used Claude to vibe code this huge lambda, which is a C function that renders the interface for this thing. It’s one several-thousand-line file. I fully vibe coded it, friends, but you know, it works, and

Shimin (45:52) Mm-hmm.

This is… This is like the…

Dan (46:07) I will say that when I push this button, the lights go off. So, pretty cool. It does what I hoped it would do. So yeah, anyway.

Shimin (46:12) Nice.

Yeah,

this is like that healthcare patient portal that was just one huge HTML with JS.

Dan (46:24) Pretty much,

only this is C instead of… Yeah, so…

Shimin (46:30) Well, now

we kind of understand how folks who don’t know how to code at all feel about being able to use Claude Cowork, right? You don’t have to look inside to see what it’s actually doing. You just treat it like a black box: tokens go in, functionality comes out. And sometimes that’s all you need to get to that 80 to 90% spot.

Dan (46:55) Yeah, I mean, I didn’t even tell it what I wanted on the dashboard. What I did was, I have another automation project that we’re working on that uses an existing package to generate types for all the entities on my Home Assistant network, so things like lights and switches and whatever. And so I just pointed Claude at the entities folder and said, look, here’s all the stuff. Go look at it and just build me this thing.

Shimin (47:06) Mm-hmm.

Dan (47:19) Mostly. There’s a little bit of redundancy that I worked with it to get rid of, but it mostly picked what I wanted. So I think the only thing I’m going to add to it is HVAC, but yeah, pretty cool.

Shimin (47:28) Very impressive. Overall, you would recommend vibe coding to everybody?

Dan (47:33) I would for… I mean, would I sell this as a product? No.

Rahul Yadav (47:36) Agentic engineering.

Shimin (47:38) Sorry, yes, agentic engineering, yes.

Rahul Yadav (47:40) Dan’s,

Dan (47:40) Yes.

Rahul Yadav (47:41) yeah, dance and agentic engineer. He’s not a wipe-coder. Yeah.

Shimin (47:43) Agentic C engineer.

Dan (47:46) No, my C is definitely

vibe coded. My TypeScript is agentic engineered. Sure.

Shimin (47:51) All right, well, thank you for that vibe and tell. I am very excited to see where this DS4 project goes as well. I know antirez has been talking about doing more than just DeepSeek V4 Flash at some point, but the idea of having a super-optimized single model that everybody can align on, I believe that’s going to be our local token future. All right.

Dan (48:14) Yeah, and

especially when it can actually do real stuff, you know? It handles tool calling, which is pretty impressive for something that’s running on a little box under my desk, you know?

Shimin (48:21) Impressive, yeah.

Also, now

I feel GPU envy. I need to grab myself a new machine with more VRAM.

Dan (48:30) Need more RAM, that’s all.

Good luck during the RAMpocalypse, maybe in a couple of years.

Shimin (48:35) Fair. Okay, moving on to the deep dive. This week we have an article from Rahul titled “Should I State or Should I Show?”

Rahul Yadav (48:44) “Should I State or Should I Show? Aligning AI with Human Preferences.” This is by Keaton Ellis and Wanying Huang.

So in behavioral economics, there’s this concept of stated preference versus revealed preference. A very simple example: my stated preference would be that I like to eat healthy, salads and greens. My revealed preference would be that I order a burger and fries every time I go to a restaurant. And brownies, you know. So what you say and how you actually behave are

Dan (49:12) And brownies.

Shimin (49:19) Mm-hmm.

Rahul Yadav (49:20) different, and it’s a key concept in behavioral economics, because you want to study how people actually behave. So in this paper, one of the points is that as AI becomes more prominent, we’re gonna see it take more and more actions on behalf of humans with little to no oversight, right? The more we trust it, the more it’s going to be, PR looks good, ship it. Yeah, the answers you’re giving me look plausible, even if they’re maybe not accurate, and you go ship it. So there’s a lot of this delegating of actions to AI,

Dan (49:44) You

Rahul Yadav (49:58) partially or fully, that’s going to happen over time. What this paper studies is the gap between what people say and how they behave, and how an AI operates differently if it works off what they said, which would be the prompt you gave it, versus how they behaved, which would be the actual behavioral data you feed into it.

So they did these lottery experiments. A simple example of a lottery experiment: would you rather take four dollars 100 percent of the time, or an 80 percent chance of five dollars and a 20 percent chance of nothing? The options are set up so that they work out to the same expected result, but you get to play this little lottery where you pick option A or option B.
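The arithmetic behind “the same expected result”: both options are worth $4 on average, so what separates people is their attitude toward risk, which economists often model with a utility function. A small Python sketch; the square-root and squared utility functions are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Callable, List, Tuple

Lottery = List[Tuple[float, float]]  # (probability, payoff in dollars)

SURE_THING: Lottery = [(1.0, 4.0)]
GAMBLE: Lottery = [(0.8, 5.0), (0.2, 0.0)]


def expected_value(lottery: Lottery) -> float:
    """Probability-weighted average payoff."""
    return sum(p * x for p, x in lottery)


def expected_utility(lottery: Lottery, u: Callable[[float], float]) -> float:
    """Probability-weighted average of u(payoff)."""
    return sum(p * u(x) for p, x in lottery)


def prefers_sure(u: Callable[[float], float]) -> bool:
    """True if an expected-utility maximizer with utility u takes the sure $4."""
    return expected_utility(SURE_THING, u) > expected_utility(GAMBLE, u)
```

With a concave utility such as `math.sqrt`, the sure thing wins (2.0 versus about 1.79), which is the signature of risk aversion; a convex utility like `lambda x: x ** 2` flips the choice toward the gamble (16 versus 20).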

And from that you get concepts like loss aversion and all these different behavioral economics concepts. So they ran a similar experiment with lottery choices, and then they had people write down

their specific prompts that they would want to give to AI agents. Then they had the AI make similar lottery-type choices, and they had three different types of AI. The first they called prompt AI. This is literally people writing natural language prompts saying, these are my risk tolerances, and given a lottery choice, this is how I would want you to behave. The second one they called data AI, and this was the researchers giving the AI access to the behavioral data of the people being studied: the choices they made in previous lottery experiments, their click behavior, and everything, without actually giving it any of the prompts. And the third one they called both AI, where they gave it both the prompt from the people being studied and their historical choice data.

And then they studied how the AI would perform in these different scenarios. Yeah.

Shimin (52:01) So before we look at the results, just from a first-principles way of looking at it, I would expect that if humans are perfectly rational, their prompts and their actions would be exactly the same, and the agents should be aligned no matter which set of data they are given.

Rahul Yadav (52:12) Yeah.

Yeah.

Exactly. Yeah. What happened instead was that the data AI outperformed the prompt AI, and they found statistical significance. And the reason for that is that stating your preferences is very hard, because how we feel and how we think are inherently very complex things. Any time you try to say it out loud or write it down, it is very hard to distill it into these simpler forms, because to a certain extent writing is lossy, unless you’re trying to write down every single caveat and all the different things you have intuitions about.

Dan (52:46) Mm-hmm.

Rahul Yadav (53:04) But you can’t really talk about all of that or say it out loud. Whereas if you train the AI on the patterns, it is able to infer all those things you didn’t actually write down, because it gets them from your behavior.

So they used a match rate to determine how the different AIs did, and they were able to get about a 75% match rate with the data AI, versus about 70% for the prompt one. And then the funny thing is, you would think the both AI, given that it has data from both sources, would do the best of all of these. Of all three, it actually did the worst, because anytime it saw conflicting data, it deferred to the prompt, and then it got confused, so it ended up doing worse than it would have with either one alone. So giving it behavioral and prompt data together ended up being worse than giving it just one or the other. I think about 66% of the time the both AI went with the human prompts, and so it obviously did worse than the data AI.
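The match rate mentioned above can be read as the share of lottery decisions where an agent picks the same option the human did. Here is a minimal Python sketch of that metric, assuming a simple per-decision comparison; the function name, the "A"/"B" choice labels, and all the data are hypothetical illustrations, not the Ellis & Huang paper’s actual method or numbers (the 75% and 70% figures come from the discussion, not from this code).

```python
def match_rate(human_choices, agent_choices):
    """Fraction of decisions where the agent chose the same option as the human.

    Hypothetical metric for illustration; assumes one choice per decision,
    aligned by position in the two lists.
    """
    if len(human_choices) != len(agent_choices):
        raise ValueError("choice lists must be the same length")
    if not human_choices:
        raise ValueError("need at least one decision")
    matches = sum(h == a for h, a in zip(human_choices, agent_choices))
    return matches / len(human_choices)


# Hypothetical choices between a safe option "A" and a risky lottery "B".
human = ["A", "B", "A", "A", "B", "A", "B", "A"]
data_ai = ["A", "B", "A", "B", "B", "A", "B", "A"]    # disagrees once
prompt_ai = ["A", "A", "A", "B", "B", "A", "A", "A"]  # disagrees three times

print(f"data AI:   {match_rate(human, data_ai):.3f}")
print(f"prompt AI: {match_rate(human, prompt_ai):.3f}")
```

With these made-up lists the data agent matches 7 of 8 decisions and the prompt agent 5 of 8; the point is only that the metric rewards reproducing what people actually chose, not what they said they would choose.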

Shimin (54:22) So Meta was correct to record its employees’ movements, as opposed to having their employees write down prompts describing what they do. Right?

Dan (54:29) No.

Rahul Yadav (54:32) Your stated preference may be that you care for your people, and your revealed preference may be otherwise. I will leave it at that. So.

Dan (54:41) You

Shimin (54:41) Right.

Well, I also want to point out that not only are human prompts lossy, we are also assuming that humans have perfect knowledge of themselves, which I think is pretty much never the case, right?

Rahul Yadav (54:54) Yes.

Dan (54:56) That was, that

Rahul Yadav (54:57) Never.

Yeah.

Dan (54:57) was what I was going to say. ’Cause there’s this whole post-purchase rationalization, or post-consumer rationalization, or whatever it’s called. Again, it’s also a behavioral economics concept, but I know it largely through the lens of my partner, who does consumer insights research, where it comes up a lot. So it’s like, oh, you bought the cookie. Why’d you buy the cookie? They don’t really know why they bought the cookie. But then, you know, people will make up a story about

Rahul Yadav (55:03) yeah.

Yeah.

Dan (55:20) the cookie, and despite the fact that they just made up the story, you actually learn a lot about the person from the story itself, and it is actually useful research despite the fact that it’s made up to some extent. Not all of it, sorry. My wife’s gonna kill me if she listens.

Rahul Yadav (55:20) Yeah.

Shimin (55:24) Mm-hmm.

Rahul Yadav (55:27) Yeah.

Yeah.

Shimin (55:31) So. So Dan, why did you purchase that AMD Ryzen AI Max+ 395?

Dan (55:41) It sounded cool. No, I wanted to run local models, and it was cheaper than the Nvidia one, because when the Nvidia one was announced, back when they were calling it Digits, it was supposed to be like 2K, and then when it was actually released, it was like, yeah, suddenly it’s four thousand dollars, and…

Rahul Yadav (55:46) See?

Shimin (55:59) Okay, well, now let’s go into Dan’s purchase history to see if that’s the reason why.

Rahul Yadav (56:03) Hehehehehe

Dan (56:06) And you’ll find out I bought a bunch of AMD machines. Yeah. I’m sure that factored into it, but…

Rahul Yadav (56:10) So, slight digression on this. A really good book is Mistakes Were Made (But Not by Me). And it’s by

Shimin (56:21) I love that

Dan (56:22) You

Shimin (56:22) title.

Rahul Yadav (56:23) Yeah, it’s by Carol Tavris and Elliot Aronson. It came out maybe a decade or two ago at this point, and if you really want to understand cognitive dissonance, that is the one book to read. After reading it, a lot of things in the world make sense that didn’t before, because it’s so hard for our brain to live in a cognitively dissonant state that it will do whatever it can to get rid of that. We just don’t want competing ideas in our head, or anything that would challenge a clear, coherent story. And they give all sorts of examples. And what Dan said is also a flavor of cognitive dissonance, right? The brain doesn’t want to struggle with why you made a choice you can’t really rationalize, so it just comes up with a story. And anytime these weird issues happen, you can almost

Dan (57:04) Yeah.

Rahul Yadav (57:15) always trace it back to cognitive dissonance. And they called it the mother of all biases, or something like that. So, awesome book. I recommend our listeners check it out.

Dan (57:28) Maybe in a future episode of Rahul’s Book Club, slash Rahul’s Rants, slash, I don’t know. Rampage, yeah, that’s right. My Rants is Rampage.

Shimin (57:33) Rampage. Yes.

Rahul Yadav (57:34) So there are a couple of other things I wanted to cover related to the paper we were discussing. One other thing they did afterwards, which was super fascinating, was test whether the humans realized that they were bad at prompting. So they asked the humans, hey, which of these agents would you want to use? Would you want to use the prompt AI or the data AI? Or, I forget if they gave them the option of the both AI.

Dan (58:05) Or both, yeah.

Rahul Yadav (58:08) 35% of the people failed to pick the data AI agent, which performed better, because they overestimated how good they were at expressing their preferences. And that really goes to show how well we think of ourselves, when actually it’s just not true.

Dan (58:18) Mmm.

Shimin (58:19) Mm-hmm.

Dan (58:27) Ego’s a hell of a drug.

Rahul Yadav (58:29) Yeah, so it was great that they were able to run these different experiments one after the other as part of that study. And yeah, I have one other thing related to this that I wanted to cover.

Dan (58:37) Yeah, that’s pretty cool.

Rahul Yadav (58:44) Trying to find my notes here. Yeah, so there are some implications of this.

Before I say this: this is from Gemini, so take it with a grain of salt; just because it’s plausible doesn’t mean it’s accurate. There are apparently regulatory frameworks like the EU AI Act, and different safety movements, that are demanding transparency and steerability in how AI agents behave, which means they have to obey the prompts they are given. They cannot just go about making their own decisions. And if you apply that to this specific study, we know that prompt agents do worse, but if you have to comply with the EU AI Act, then you go with steerability and transparency and all that, and you are literally picking the worst of the models because you have to comply with those regulations. What could happen is that in the future the AI might say, I heard you, but actually use your revealed preferences instead of your stated preferences and carry out the task that way. So that’s something we could see happening, because the end goal that everybody’s optimizing for is accuracy and delivering results. If one of the hurdles to clear is optimizing for revealed preferences instead of stated preferences, people are going to find a way to go after that, rather than strictly following the instructions that were given to them.

Shimin (1:00:18) I would hate the idea of having an agent that does not actually follow my explicit prompt, because I think humans are also hopeful creatures. Even if I’ve had cheeseburgers delivered by AI for dinner every day last week, my prompt is still going to say, go bring me a salad, right? If the AI then brings me a cheeseburger, what does that say about the human prospect for change

Rahul Yadav (1:00:24) Yep.

Yeah.

Shimin (1:00:44) and for improvement.

Rahul Yadav (1:00:44) Soon you might be, yeah, you might ask it to remind you to drink water. I don’t know, Shimin. This is something we were discussing off the record: Nat Friedman was asking his OpenClaw deployment to remind him to drink water. That’s the reference there. And it took a picture of him and showed him drinking water.

Dan (1:01:00) And it took a picture of him drinking water.

This is fine.

Rahul Yadav (1:01:09) Maybe not initially, but there is a world where, over time, people could delegate more and more of that to AI, right? In that world, maybe people would choose to go with revealed preferences over stated preferences.

Shimin (1:01:25) Alright, yeah, this is… I think this is one of the more important papers we’ve covered.

Rahul Yadav (1:01:29) For anyone in the audience who’s interested, I do have a different spin on an exercise we did a few episodes ago, where we asked you to find out what your agent knows about you. You could also ask it, how are my stated preferences different from my revealed preferences? And you’ll learn some fun stuff. Let me just leave it at that. I did.

Shimin (1:01:42) Mm-hmm.

Dan (1:01:42) You’ve tried it. Okay. We’ll have to come back and… Oh, I’m typing it into Claude right now, but… it’s thinking.

Shimin (1:01:59) Yes, I’m eagerly awaiting the result of that one. Well,

Rahul Yadav (1:02:00) Hehehehehe

Here, let me read out mine while yours is loading. Mine says, first one: abstract frameworks versus discrete tooling execution.

Shimin (1:02:09) Okay.

Rahul Yadav (1:02:16) Stated preference: a strong conceptual focus on macro-level mental models, organizational structure, and systems thinking. Actual behavioral queries shift rapidly away from abstract frameworks toward immediate, micro-level technical efficiency, direct economic evaluations, and automated quality control.

Second one is, stated preference: dietary and lifestyle goals are articulated through the lens of strict performance optimization, focusing on precise nutrient timing, clean metabolic impact, and structured supplementation splits, because I was asking about creatine and stuff. And then revealed preference: true behavioral consumption choices lean heavily toward texturally complex, traditional culinary profiles that prioritize cultural diversity and deep flavor over purely utilitarian macronutrient design.

Shimin (1:02:48) Mm-hmm.

Shimin (1:03:03) I’m surprised the ice cream did not make an appearance there.

Rahul Yadav (1:03:06) Yeah.

Yeah.

Dan (1:03:09) Mine was pretty dumb. It said my stated preferences lean toward minimalism and consistency, and my revealed preferences lean toward optimization and exploration. And then it goes, these aren’t necessarily in conflict; exploring is how you find what you consistently stick with. And it’s like, that’s also just a usage pattern. Yeah. And then the examples it gave are really kind of dumb, because it was like, you prefer

Shimin (1:03:25) Yeah, that’s weak sauce.

Rahul Yadav (1:03:26) Yeah

Dan (1:03:33) simple, clean TypeScript, you know, functional over classes, whatever. And then it goes, you say that, but then you run a Kubernetes cluster, and blah, blah, blah. It’s like, well, those aren’t related things. Yeah.

Rahul Yadav (1:03:42) What?

Shimin (1:03:44) Those do not contradict each other at all.

Rahul Yadav (1:03:49) I gave it the article link for reference. So maybe you can try that later. If you try it with the article, it might ground itself in what we mean by stated versus revealed preferences.

Shimin (1:04:01) You didn’t give it as reference. You did some context engineering, Rahul.

Rahul Yadav (1:04:06) Yeah, exactly.

Shimin (1:04:08) All right. Okay. Now let’s move on to our favorite segment, our last segment as always: Two Minutes to Midnight, where we check the state of the AI bubble using the analogy of the Doomsday Clock from the Bulletin of the Atomic Scientists. I’m going first this week. I have the news item: Cerebras raised $5.5 billion in its IPO this week, and the stock then popped 108% on the first day. If you’ve not heard of Cerebras, they are a custom chip maker. They produce chips that are very fast at model inference.

Dan (1:04:37) Hmm.

Shimin (1:04:50) What they managed to do is attach a whole bunch of high-bandwidth memory on the chip itself. Compared to comparable NVIDIA GPUs, it has about the same amount of memory, but something like 80 times the memory throughput. So investors are clearly still giving hardware companies lots of money and rewarding breakthroughs in the hardware space.

Dan (1:05:14) Hmm.

Next up, a lot of TechCrunch this time around: Anthropic now has more business customers than OpenAI, according to Ramp data. Ramp is the corporate card and expense management company, so they’re basically using that as a data source. It’s not by any means a perfect data set, but…

Yeah. And it shows that something like 34.4% of participating businesses are paying for Anthropic’s services. True. More than…

Not surprising, but another little finger gun pointed at OpenAI.

Shimin (1:05:50) Mm-hmm.

Yep.

Rahul Yadav (1:05:52) And the last one is “The Politics of Jobless Prosperity” by Andy Hall. The article opens with “People who are hungry and out of a job are the stuff of which dictatorships are made.” That’s from FDR’s 1944 State of the Union speech.

The crux of the article is that OpenAI has its economics team and Anthropic has its own, and they’re actively proposing these policies to preempt what’s going to happen if AI takes away jobs at a large scale. So they’re coming up with ideas like shorter work weeks, giving people a certain amount of basic income, and things like that.

The thing the author calls out is that when it actually becomes reality, the way we’ve approached these things has always been reactive. If you look at history, you cannot really preempt these things. And anytime you have joblessness, we’ve seen historically that populism goes up. And also, even if people initially accept what you give them, that doesn’t necessarily mean it’s some sort of binding contract. People wouldn’t want to be locked into a position like that, because they wouldn’t know what the effects of these things would be. So a lot of the things being proposed right now, when they meet reality in our political system, might not go as well as the labs are planning.

Shimin (1:07:32) Yeah. And this article was looking at a 2% jump in unemployment as kind of the line in the sand for when stuff is going to hit the fan.

Rahul Yadav (1:07:39) Yep.

Yeah.

Shimin (1:07:45) Alright, so… we were at six minutes last week; we moved back a pretty decent chunk. Given our news items this week, how do we feel about where we are in the AI bubble?

Dan (1:07:53) Mm-hmm.

Still feels solidly in the middle of it to me.

Shimin (1:08:03) Yeah, I’m with you. I think, you know, Anthropic may have more business customers than OpenAI, but that doesn’t mean OpenAI is in trouble just yet. It’s not like OpenAI is losing customers at the moment. And of course, a huge AI inference IPO also suggests there’s still money sloshing around in the system.

Dan (1:08:11) Dead, yeah.

Mm-hmm.

Shimin (1:08:24) And lastly, of course, we haven’t had a 2% unemployment jump just yet.

Rahul Yadav (1:08:28) Yeah.

Shimin (1:08:29) Thankfully, despite what Meta is trying to do.

Dan (1:08:29) Yeah.

Rahul Yadav (1:08:30) At that point, I think we would be, yeah, we would be more confident about the clock if that comes about.

Dan (1:08:37) And/or much more aggressive about our sponsorships, like Google, whatever it was.

Shimin (1:08:40) Ha

Rahul Yadav (1:08:45) There’s mass unemployment happening and we’re debating whether the clock should be at 15 seconds or 25. What are we feeling?

Dan (1:08:52) No, I think at that point we’d be like, the nukes are landing.

Shimin (1:08:56) I’m okay with leaving it at 6. I can also see us maybe pushing it back a little bit, but I don’t feel like there’s been major breaking news, so maybe let’s leave it at 6. I agree. Okay. All right, let’s leave it at 6 then. And of course, with the setting of the clock, we come to the end of the show. Thank you,

Rahul Yadav (1:08:58) Yeah.

Dan (1:09:04) Six is pretty far away. Yeah, I think it’s acceptable.

Rahul Yadav (1:09:04) Yeah.

Shimin (1:09:18) listeners, for joining us for another conversation this week. If you liked the show, if you learned something new, please share it with a friend. You can also leave us a review on Apple Podcasts or Spotify. It helps people discover the show, and we really appreciate it. If you have a segment idea, a question for us, or a topic you want us to cover, or if you would like to hop on the show, shoot us an email at humans@adipod.ai. We’d love to hear from you.

You can also find the full show notes, transcripts, and everything else mentioned today at www.adipod.ai. Thank you again for listening, and we’ll catch you on next week’s episode. Bye.

Dan (1:09:52) Thanks.
