Two weeks ago I started talking about our potential paths forward for how we create an “inhuman” testing and verification system for inhuman code. We are not going to put "the massive amounts of AI-generated code genie back in the bottle. There is no “reverting” this structural change of working as an industry. We need a fix-forward solution.

I have also been suffering from the “AI Vampire” problem. I've been feeling bit burnt out from not managing my own workload and rest appropriately from using vibe coding tools. The solution isn’t “don’t use the tools at all”. We need a fix-forward solution for this as well and it will look a lot like what an athlete does. We'll get to this in the second half of the newsletter.

This past week I caught the Anthropic “Code with Claude” conference livestream on Youtube, and I saw evidence of potential forward fixes for testing and verification in multiple presentations. The Datadog talk was the best one.

During a session with Boris Cherny and Jarred Sumner, they brought up that the Bun package manager was doing a Rust port (from Zig). Since I first wrote this, they have merged the PR! I had Claude take a look at the PR to glean any lessons and while they don't use formal verification methods, they do have a fairly extensive multi-agent-driven specification and verification porting process.

I also heard evidence of improving verification on a recent episode of Joe Rogan with Chamath Palihapitiya (of the All-in Podcast). Chamath actually goes into depth of how his company, 8090, is part of a government contract program that is turning legacy code into English specifications and ensuring that the behavior they encode is, in fact, what they want the code to do. Chamath claims this will help eliminate waste and more accurately automate more of what the government does.

An Open Secret

It appears to me that it's almost an open secret that everyone is working on advancing verification as an industry, but nobody has an answer that they are particularly proud of or are ready to talk about yet. There aren’t people shouting from the rooftops about their great method, or at least making it on the front page of Hacker News.

I've only seen hints of people using specification and verification language (this will become important in the second half of the article). Things like “invariants”, and “property tests”. The only actual mention I’ve seen of “Formal Verification” on Hacker News is Martin Kleppman's post in December 2025 predicting formal methods becoming mainstream and the discussion around it. I haven't seen a declarative, "hey, this is definitely what we're doing, and how you can do it too.”

I think it's very interesting that this potentially may have some alpha (or “advantage”) still. People are potentially gatekeeping how they are making coding with AI agents work better for their company than their competition. Or maybe the process is so customized to each company that publishing your methods may not even be applicable to others. Or maybe nobody really knows what they're doing yet. I lean towards the first and the third explanations.

If you haven't already, you can join our Discord to connect with other like-minded AI-enabled software engineers and discuss the latest trends, tips, tricks, and techniques.

It's a great place to further explore topics that we discuss on the live stream and where I put all of the links that we pull up every stream.

Premature or Overkill?

Traditionally, only highly mission-critical systems dealing with health, safety, or cryptographic fundamentals use formal verification to prove that the specification was implemented correctly. But just like how we now live in a forest of low-hanging fruit with AI-generated code, the effort level to formally verify software has never been lower as well.

I would assume if people are paying a lot of money for your software, they expect it to work correctly. You can accomplish this with lots of various levels of test coverage, but formally proving certain features operate the way you specify them adds yet another layer of confidence. We may see more companies and teams move towards this, especially if you are going to de-emphasize or remove human code review altogether because the amount of code generated per PR is now inhuman. We as an industry need to develop an inhuman process to handle inhuman amounts of code.

An inhuman, “inhumane” process

Formal verification certainly is not standard practice across the industry, yet. For those who missed it in my previous newsletter, formal verification is, after consulting with Claude, a set of methods for proving that your code matches what you wrote in your spec.

Think of math proofs in your high school geometry class. The step up version is discrete math proofs in freshman year college computer science class. Basically, we are not just testing, but mathematically proving that the code works exactly as we specified, which is a much stronger assertion than any individual test. The inputs, the outputs, and the transformation between them are exactly what we defined in the spec; no more, no less.

What used to be an inhumane pain in the ass as a college freshman 15 years ago is now potentially a basic necessity in AI-generated code pipelines.

I have only been thinking about this for two weeks, but it's this problem that's starting to plague the industry for the past six months or so since agents have become a staple at every company. How much should humans review, and can humans even do a good job of reviewing if no human actually wrote the code?

If I were starting a new project and had customers that relied on my app to do exactly what I said it would do, I would definitely have some amount of formal verification methods. It just adds an additional stage at the beginning and end of your development process, which, if you're using agents with any amount of orchestration, doesn't matter to you.

Layers of Swiss Cheese

Sesh Nalla from Datadog in his Code with Claude talk described their code release process as having “layers of Swiss cheese”. You think Swiss cheese has a bunch of holes in it, but if you have slices with different holes in different areas, then it's an effective filtering mechanism to hopefully catch many, if not all different types of problems before they go out to production.

This is more or less what I proposed in my previous newsletter: we need an inhuman pipeline for inhuman code. I've been saying for a while that we need to 10x our testing, verification, and validation, but that was just in a vague sense. This is thankfully much easier and more concrete by having our AI Agents write code to fortify our defenses against bugs. It almost doesn’t matter how much testing we add since we can have agents manage it for us.

Of course, we still want to be able to go from code change to production deployment within 1 hour (or do we? does that matter? that was a pre-AI DORA metric standard, but speed to deploy still sounds pretty good to be able to fix changes?). It’s just a matter of how many dollars in compute and tokens do we want to spend on each code change? This is a question of economics and is different for every project.

The key question for businesses moving forward is: how many dollars of business value do you get out every one dollar you put into tokens?

SaaS businesses previously flipped dollars in cloud spend into dollars in business value. Businesses moving forward are now flipping dollars spent in tokens into business value. This is the key trick moving forward.

What can you add to tokens that someone else can't? Your perspective.

Don’t Prematurely Optimize

I find when developers hear “optimize spend”, they by default believe it to mean “minimize spend”. I try to remind developers that our time and sanity continue to be the highest ongoing cost to any business. We need to continue to advocate for tools that save us time and toil so we can focus on delivering the highest ROI business value.

We don't want to think about conserving token spend when we haven't even figured out how much we need to spend yet. I don't advocate for spending tokens just to spend them, but we should be free of a penny-pinching mindset when we haven't even realized the value of the models from six months ago. You need to spend money to make money.

I have used a bunch of words here so far that are specific to the software development industry. How differently would this read if you did not know these words at all? l barely remembered the words "formal verification" from my freshman year college discrete math course. But I still vaguely knew what it was, and that I could plug that term in to the LLM and start to make progress.

A Note on Magic

Knowledge is still power, even if Sam Altman wants to meter your access to an artificial intelligence. The following is my bull case for LLMs ultimately motivating the increase of human intelligence in the world. The more you know, or at least are aware of, the more effective you can be using these tools.

Just the mere knowledge that there is a thing that you can do called “formal verification” enables you to cast a new magic spell with AI Agents. You don't even need to know that it “mathematically proves that specs can accurately be turned into code”. The LLM can give you that definition if you just know the magic words or get in the neighborhood of "formal verification". Otherwise, this is a completely inaccessible part of the model without even the notion of those words.

I noticed this when talking with my non-technical vibe coding brother: he doesn't know what he doesn't know. I can give him hints on where to go next with his project or what things he can do better, but he's gotten very far just by asking Claude to make his project more of what he intends it to be.

I think this self-bootstrapping, hyper-personalized tactical method of learning in the weeds is insanely useful and earth-shattering, but we need to also make time for strategic, big-picture, higher-level learning that happens outside of the LLM where a human can guide us through their perspective on a subject. There’s that word again.

He complained recently about having to "wire everything together" with his various vibe coding cloud hosting services. I asked what he meant by “wiring”, and it sounds like he had to manually deploy different parts of his app to different services and then update a bunch of environment variables on each service to point them at each other, which is insane that he knows what an “environment variable” is and probably still an appropriate solution for where he's at in his project.

I told him, roughly: "What if I told you all this is hosted on AWS in some way or another, and that AWS was the real infrastructure underneath it all? If you just wanted everything to wire together automatically, you would write some Terraform and deploy to AWS yourself. The hosted vibe coding services are opinionated wrappers that prevent you from footgunning yourself.”

My brother: 🤯

He just didn't know what he didn't know. [1]

Unknown Unknowns

I feel like this is the same thing with formal verification. I noticed that certain people in the industry were using different terms like “invariants” when describing specifications. My primary understanding for “invariants” is around the constraints-led approach (CLA) to coaching sports. It was jarring to hear them being used in this computer science context. Little did I know the people using these words were all pointing towards the same thing: better specifications that lead to better, more formal verification.

Nobody in the industry is explicit and loud about, "Hey, this is what we're doing to verify our AI-generated code!". It's just a gray area for now because it's still very new. Nobody really knows what they're doing yet to be confident enough in saying that their method is definitively a best practice, especially when it's a deep field with many PhDs who have spent lifetimes in the space.

I believe a large portion of our job moving forward will be reading technical and non-technical material in order to gain new ideas and perspective for what's possible with LLMs. AKA, learning for the sake of learning. If LLMs have read every book in existence and every corner of the public internet, how do we invoke the highest alpha, or the least known bits of it?

There are unknown unknowns similar to what my brother experienced, and it may be our job moving forward to shrink that set of knowledge to be as small as possible. We can only do that now through command of the written language and all of the concepts described therein. We need to broaden our horizons of what's possible to invoke different parts of the LLM to get it “thinking” in different ways.

wtf… so I have to keep learning?!? For life?!?

We’re now on a learning curve of all human knowledge. We always have been, but now that it is more apparent and accessible to everyone with AI Agents. It’s more important and rewarding to climb up this learning curve than ever.

I believe all learning curves are “exponential” [2], or at the minimum for sake of argument, quadratic (source: trust me bro). The exponent can be any number above 1, like 1.1 or 1.00001, but 2 in the following section illustrates the example better. The important idea to keep in mind is that learning compounds in humans and machines.

The more you learn, the faster your rate of learning. The more mental model scaffolding you have, the more you can attach new ideas and have them stick. The more ideas stick, the more they can cross-pollinate and make it easier to learn new ones in the future, etc. I believe that the "second derivative" or “acceleration” of learning is positive.

Quick calculus lesson here for those who forget. Don't worry, it's super simple and mind-expanding. You've read this far because you like learning already. Math is just a structured way of thinking.

A simple “quadratic” (a 2 in the exponent) equation is y = x2

It looks like this:

It means that the value of y is equal to x multiplied by x. For example, y = 1*1, y = 2*2, y = 3*3, etc

You can see that for each value of x, it goes higher and higher up at an increasing rate. Notice the scale at x = 12, y = 12*12 = 144. A linear relationship would look like y = x, where x = 12, y would also equal 12.

In physics, we call this first graph the “position” of an object. You can think of it like an odometer tracking the distance traveled on a car trip. At x = 10 hours, you’re 100 miles into the trip. At x = 20 hours, you’re 400 miles in. What would it be at 40 hours?

That’s a constantly accelerating car that isn’t possible to engineer! Spoiler alert, it’s your brain.

The calculus part of this is that the derivative, or the rate of change, of this graph can be represented by moving the 2 down next to the x. y = 2x. Don't ask me why, I honestly forget. This is just a math fact.

This graph of y = 2x represents the concept of “the more you know, the rate at which you can make new connections increases every time you add something new to your existing knowledge base”. For the sake of argument, you can increase the rate of making connections by 2 for every 1 thing you learn, or for every unit of time, etc.

In physics, the second derivative is called the “velocity”. In the car analogy, it's what the speedometer shows at any given moment. At the various points in time we examined before, x = 10 so y = 2 × 10 = 20 mph. At x = 20, y = 40 mph, etc.

What’s cool about calculus is that the area under the line (down to y = 0) of this will give you the current position of the car. This concept is called an “integral”. Literally the sum of all of your experiences if we’re talking about learning.

One more derivative. The second derivative is y = 2. The x goes away. Them’s the rules.

This shows a constant, positive acceleration for the entire duration of our “car trip”. This would be the force your phone’s accelerometer measures to help determine its position in space (it can detect the -9.8m/s2 acceleration due to gravity downward at the same time). You would also feel getting pushed back more and more in your seat over time like a race car driver or fighter pilot feeling G-forces.

Acceleration shows the rate of change of the rate of change. Each time we learn something, the rate of the increase of our rate of learning, goes up by 2. Realistically, this acceleration of learning probably peaks in childhood and then varies over the course of our life with our cognitive health.

This value > 1 just means that learning compounds. If this “acceleration” value were 0, then we’d learn in a linear fashion and knowing more things would NOT make learning new things easier. It would be a constant amount of effort to learn new things. Since I think we can all agree that the more you know, the easier it is add on knowledge, we can agree that this value is above 1.

Again, the above numbers are all made up, but this is an empowering, directionally correct model of how we learn.

The Math of Learning of Jiu-Jitsu

I feel this learning curve most viscerally in Jiu-Jitsu. There’s your free space on your Mike newsletter bingo card. If you don't want to read about Jiu-Jitsu for the next few minutes, skip ahead to the TL;DR.

There's canonically five adult belts in Brazilian Jiu-Jitsu: White, Blue, Purple, Brown, Black.

You start your journey as a White Belt with zero grappling experience. You have exactly 0% chance of submitting (also known as “tapping”) a Black Belt your own age and weight class. There is just no way. It is simply not possible. if you feel like this is a challenge, please try at your local gym and let me know how it goes!

The above is a fact, but the rest of the following percentages are, “just like my opinion, man”.

When you get your Blue Belt, you're 20% of the way time-wise through the belts, but you still don't feel threatening at all to a Black Belt. You certainly don't have a 20% or a 1 in 5 rolls (sparring rounds) chance of tapping or even getting to a dominant position against a black belt. It is much closer to 1%, and that’s being generous. It is possible that you may never submit or “tap” a Black Belt while you’re a Blue Belt. Only exceptional competitive Blue Belts have the potential ability to do that.

Once you get to Purple Belt, it's closer to 20-30% chance of getting to a dominant position on a Black Belt, still a much smaller chance to tap, closer to like 10%. It’s a well known phenomenon that at purple belt, you are now actually dangerous to all the belts, not just the ones lower than you. You know enough and can potentially specialize in a very specific thing that can trip a Black Belt up and you can catch them. Brown Belt success rate against Black Belts probably triples from Purple Belt from 10% to 30%, and then a Black Belt is very even with other Black Belts, with 50/50 chances if all attributes are equal.

If a White Belt, Purple Belt, and a Black Belt take the same class from the same instructor, the Black Belt gets the most absolute value out of the class just by virtue of being able to connect to the lesson to their broader mental-model-mind-map-web-mesh of techniques, but the White and Purple Belts may make greater relative gains because they may be experiencing something for the first time which was previously a hole in their “mind map”.

If we think of a mind map, a White Belt starts trying to figure out different “islands” of techniques. They get some conception of what a “guard” is and somehow that connects to what a pin is. They get some flashes of insight here and there that connect these different things together, but they get very lost if there’s something they haven't seen before.

As we go up the belt rank hierarchy, all the little transitions between positions become more and more defined. You get a higher level of resolution of grappling. You've seen many, many more of the positions before than someone who is still early in their journey, and you're able to get more out of each repeated experience than the person who is experiencing it for the first time.

That's why if you specialize in a very particular position as a purple belt, you can potentially have experienced that position possibly more than many black belts. If you can funnel people to that particular position, you can be very dangerous to everyone. However, if the black belt has, and probably will have, a broader array of techniques, they can take you out of your comfort zone very easily and into a comfort zone of their own.

TL;DR of the Jiu-Jitsu Learning Analogy

Learning progress appears linear in time (x-axis), but the y-axis value has an exponent greater than 1 which means that learning compounds. The second derivative, or acceleration of learning is positive, which means that learning more improves your future rate of learning.

Now, given all of this explanation about learning, how tf do we apply this to working with AI agents?

To get better, we must disconnect

I think it's great that everyone is spending a lot of time with AI Agents, trying to figure out how to most effectively use them. However, that is now just table stakes for being an effective engineer today.

The language that LLMs speak is the written word. If you are not proficient in expressing yourself in writing, how are you supposed to access the deepest parts of the LLM and the knowledge it contains?

You can try to trudge your way there through self-exploration of just recursively asking it questions. You get to continually ask better and better questions, which is the incredible self-bootstrapping power of this technology.

I also believe there's a complementary way: to read a lot of the encoded knowledge that the LLM has in human-written book form. That gives us new ideas and better questions to ask from a particular human's perspective.

“Tech Bro Discovers Old-Fashioned Reading”

Shocking, I know, but I think intentionally taking time weekly to disconnect from using agents is going to be important for our long-term sustainability of this pursuit moving forward.

Up until two weeks ago, I had not even thought of formal verification or interrogated an LLM about it. It was until I got the idea to do so, did it reveal its secrets to me. This is sort of an extension of a thought I had in November-December 2025: if we're going to be managers of these LLMs, we have to know what to ask of it to get the most out of it. I believe we can do that by reading books: technical, business, and otherwise.

If you can ask the agent, "Hey, optimize this AWS network configuration for load balancing," you have to know the existence of at least some of these words to ask it. Once you know the words, it will probably get most of the way to solving the problem if you can just nudge it in that direction.

This means we need a large vocabulary of technical and domain-specific jargon to have maximal optionality and mastery over the agent. This means not just mastery of technical concepts, but of domain-specific ones that we’re trying to write code around. These domain-specific ideas could be orthogonal or complementary to a given problem you’re trying to solve. Anything that can help access the deepest recesses of the model.

Because everyone is using these tools and is asking them similar things, if you can ask an agent different things, you’ll get different results. That's a literal business differentiator. By taking in new information and entertaining new ideas, you can potentially be inspired to create things that are not like what everyone else is doing.

We can get pretty far by just self-bootstrapping and self-learning with the LLM and just asking it, "Hey, what should we do?" If we become a better student, we'll have better questions to ask of it, and that requires prior knowledge.

Asking better questions is the key skill to develop.

I think disconnecting more and doing more analog activities will allow us to maintain our mental and physical health, and ultimately make us better at this vibe coding thing. It's a marathon, not a sprint.

Treat Vibe Coding Like an Athlete

When I train Jiu-Jitsu for too many days in a row, it always crushes my central nervous system (CNS) and I get sick. Especially when I would sign up to compete. Just knowing that there's a tournament on the horizon is an additional large stressor placed on my system. I would often get sick prior to or after a tournament.

With no peaking point in any given month or year, with no competition to take a break after, Vibe Coding is just going to be this ongoing, ever-increasing source of stress that will definitely lead to burnout and health implications for those who do not manage this well. High performance for a prolonged period of time with no regulation is not healthy. Doing anything demanding with no rest for seven days a week is not healthy.

We need rest days. We need to unplug. We're all still trying to figure our way around this. I think this is the biggest thing for long-term usage of these tools: we have to have a healthy distance from them and treat it as a stimulus that we can control.

An adage in weight lifting is that we want to stimulate, not annihilate. We have to apply periodized training principles to Vibe Coding, because like Stevie Yegge says in “The AI Vampire” article, coding with Agents intensifies work. We cannot pretend like it's the same thing as manual coding work anymore.

We need to build in disconnection days or even daily disconnection times in order to maintain our mental and physical health. I'm trying to throw us a bone here by saying that by reading, we're still advancing our agentic software development skills even though we are not directly doing the thing itself.

Just like how an athlete can't maintain their body with just their sport alone. Athletes need to do complementary strength and conditioning work in order to stay healthy and have a successful long-term career.

I know it's scary right now, with a bunch of talk of a K-shaped economy, looming threats of layoffs and massive job loss. Just remember, if you don't have your mental and physical health, nothing else matters. Your health is the foundation for all of this fun, exciting, demanding, challenging, and rewarding stuff.

I ignored mine for a bit because I wanted to get on top of this AI coding trend and stay ahead of it. Now I am refocusing on moving my body daily and eating good food that will ultimately support my long-term vibe coding performance. I think my initial sprint with AI coding is done, and the real long tail of gains will be made in the coming years in a sustainable fashion, day in and day out.

[1] He doesn’t have to use bare AWS for his project, and probably benefits from someone else taking on the risk of hosting his app, but at least he now knows that’s a potential alternative option. He's currently going with Claude’s suggestion of a secrets manager service that sits in between all of his services. When I asked Claude, it suggested the same and voted against the AWS option for him for now.

[2] yes, I know an exponential function means something very specific in math, like y = 2^x. People like to exaggerate by saying something is “exponentially” bigger or better, so I’m trying to connect it to a word they may already use.

Reply

Avatar

or to participate

Keep Reading