When AI Slaps You in the Face
So, what happens when you increase the complexity? Well, you get interesting results. In this post, I'll document three experiences that required deep debugging.
First: WebSocket Activation
I'm taking it slow on how I approach using Claude with Adama. My strategy is to teach the machine to produce Adama code, review all of it, and have it help me build test harnesses. Beyond producing test data, I'm also having it produce documentation that's up to snuff. It did a great job, but subtle things were wrong. That's normal in the editing process; I've worked with documentation writers at scale. As such, I have a lot of reviewing and editing ahead of me. My feedback will ultimately improve the docs and provide more context and data for the AI to learn from. At the end of the day, I'm 100% responsible for the contents.
Amazingly, the AI was able to vibe-code me a reasonable and super nice first pass of a "book generator" such that I can maintain my vertical integration and super monolith. So, Adama now has a special-purpose documentation generator, which is cool. It has a lot of modern features that I learned about from Claude. This leaves a good taste in my mouth, right? You may wonder, "Why do you need yet another documentation generator?" Well, I don't—except I want to validate that every snippet produced by the AI can be parsed. So, my special book generator will ensure every Adama snippet compiles.
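The core of that validation step is simple to sketch: walk the Markdown, pull out every fenced Adama snippet, and hand each one to the compiler. The sketch below shows only the extraction half; `SnippetChecker` and its method names are illustrative stand-ins, not Adama's real book-generator API, and the compile step is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extract every ```adama fenced block from a
// Markdown document so each snippet can be fed to the compiler.
public class SnippetChecker {
  static final Pattern FENCE =
      Pattern.compile("```adama\\n(.*?)```", Pattern.DOTALL);

  static List<String> extractAdamaSnippets(String markdown) {
    List<String> snippets = new ArrayList<>();
    Matcher m = FENCE.matcher(markdown);
    while (m.find()) {
      snippets.add(m.group(1));
    }
    return snippets;
  }

  public static void main(String[] args) {
    String doc = "Intro text\n```adama\npublic int x = 1;\n```\n"
        + "more prose\n```adama\nmessage M { int y; }\n```\n";
    List<String> snippets = extractAdamaSnippets(doc);
    // In the real generator, each snippet would be handed to the Adama
    // compiler and any failure would fail the book build.
    System.out.println(snippets.size() + " snippets found");
  }
}
```

The payoff is that documentation rot becomes a build failure instead of a silent lie.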
Then, it occurred to me that this would be great for building a simple MCP server that allows the AI to better understand the language. Beyond invoking tools, this is a great way to encode the Adama documentation into the language so that the AI can use MCP to figure things out. I'm not 100% sure about that path yet, but it's an experiment. The immediate value-add is the ability to compile Adama (and future RxHTML code).
It got a shocking amount of the "abstract" MCP server right, but it broke unit tests. I tried to vibe my way out of it, but it kept getting worse until I had to simply revert everything. This was good for breaking my habit of letting the AI run wild. I went into human mode and introduced a layer of abstraction around the parts the AI was struggling with, along with the tricky parts of Netty. Then, I was able to vibe-code a bunch of working stuff, except the new tests were failing. I tried to vibe them into passing, but it never worked, and then other unit tests started failing. So, I reverted again and got back to the point where only the new tests were failing.
It took a hot second to debug, but the key issue was that the test code was faulty at a low level: it was writing to the channel before the handshake completed. Now, this could be a "my problem" in how the test leveraged an existing harness, but it came down to which signal should unblock the test. It's an isolated test case, so I forked the state of the code and let the AI hack at it. It didn't work. I burned 25% of my Claude Max window budget for it to fail. I tried a lot of prompts. This is a subtle problem. Let's blame my code (fine), but why didn't this magic box figure it out?
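The shape of the fix is a classic ordering problem: block the test's write until the handshake-complete signal fires. This is a minimal sketch of that pattern using a plain `CountDownLatch`; the class and method names are invented for illustration and are not the actual test harness (in Netty, the signal would come from the handshake-complete user event in the pipeline).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of the ordering fix: the test must wait for the handshake
// signal before writing to the channel. Names are illustrative.
public class HandshakeGate {
  private final CountDownLatch handshakeComplete = new CountDownLatch(1);
  private final StringBuilder log = new StringBuilder();

  // Called when the WebSocket handshake finishes (in Netty, the moment
  // the handshake-complete event reaches userEventTriggered).
  public void onHandshakeComplete() {
    log.append("handshake;");
    handshakeComplete.countDown();
  }

  // The test-side write: block on the signal, then write.
  public boolean writeAfterHandshake(String frame) throws InterruptedException {
    if (!handshakeComplete.await(5, TimeUnit.SECONDS)) {
      return false; // timed out; the channel was never ready
    }
    log.append("write:").append(frame).append(";");
    return true;
  }

  public String log() { return log.toString(); }

  public static void main(String[] args) throws Exception {
    HandshakeGate gate = new HandshakeGate();
    // Simulate the server finishing the handshake on another thread.
    Thread server = new Thread(gate::onHandshakeComplete);
    server.start();
    boolean ok = gate.writeAfterHandshake("hello");
    server.join();
    System.out.println(ok + " " + gate.log());
  }
}
```

The bug was exactly the inverse: the write raced ahead of the handshake, so the frame landed on a channel that wasn't ready yet.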
Perhaps it's my patience; I need to update my build system and run on a faster machine. Maybe it could have solved the problem if it could build, compile, and poke the test. For me to figure it out, I had to do some logging. The first (and easy) problem is that Netty's pipeline can't leverage multiple WebSocketServerProtocolHandlers; that's a knowledge problem it could have figured out. The second problem with the broken tests was annoying.
Now, these may seem picky, but the meme of debugging is very true. On the bright side, I got a lot done very quickly. I would have hit the multiple WebSocketServerProtocolHandler trap as well. I would have fixed the test issue after writing the first test. I got a lot of coverage without writing tests. All in all, it was a huge win.
Once I got past that, setting up the final MCP keystone was easy, and it did a great job of allowing Claude to compile Adama documents.
Second: Teaching the Adama Language and Defects
I'm exploring different paths toward teaching the LLM how to leverage Adama. On the positive side, I had it rewrite the existing documentation and generate code samples from a variety of existing code examples. The newly vibe-coded documentation engine for turning my Markdown into a website can tell me what percent of code samples even compile.
Impressively, it got 80% of the samples compiling, and it was able to refine the rest automatically when fed a bunch of real source code. I still have a huge task ahead of me to make the documentation correct, and I will do this in a variety of ways.
I'm trying to take an AI-first approach to these problems, as it raises so many questions about how to do hard things faster, but it keeps my brain active.
Doing large tasks with an agent requires work, since it can go off in bad directions and spiral out of control. A big challenge right now is that I use a monorepo for the entire project and have a big one-shot build system, which takes 10 minutes to build on my machine. This works for me as a human since I'm an IDE-bro that leverages unit tests like a bad MF to poke and probe my system. However, I have to bring that IDE capability to the AI, so I reverted a big batch of questionable work and am letting the AI try again.
Since AI is so fast, the tooling it uses should also be responsive so you can iterate faster. My build system is holding me back at this time, so I'm going to optimize it dramatically. From there, a strategy is to enrich the error reports from the compiler dramatically such that the documentation is ultimately encoded in the errors. That way, the AI doesn't have to guess at the documentation second-hand. Instead, the AI will have sufficient prompts from the compiler.
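Here is a minimal sketch of what "documentation encoded in the errors" could look like: every error code carries a teachable hint, so the agent gets the relevant guidance at the exact moment it fails. The error codes, hint text, and `RichErrors` class are all invented for illustration; they are not Adama's actual diagnostics.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compiler errors that carry their own
// documentation, so an agent never has to guess from second-hand docs.
public class RichErrors {
  static final Map<String, String> HINTS = new HashMap<>();
  static {
    HINTS.put("TYPE_MISMATCH",
        "Declare the variable with the type of the right-hand side.");
    HINTS.put("UNKNOWN_SYMBOL",
        "Symbols must be declared before use; check spelling and scope.");
  }

  // Render an error with its embedded documentation hint appended.
  static String format(String code, int line, String message) {
    String hint = HINTS.getOrDefault(code, "No extra documentation available.");
    return "[" + code + "] line " + line + ": " + message + "\n  hint: " + hint;
  }

  public static void main(String[] args) {
    System.out.println(format("TYPE_MISMATCH", 12, "cannot assign string to int"));
  }
}
```

The design bet is that a compile-fix loop converges faster when the error message itself closes the knowledge gap, rather than hoping the model recalls the docs.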
A compounding factor is that the AI can't hold the context of partial rewrites. For example, I have the start of a new typing system in place, but the AI doesn't realize that it's nowhere near done, and it builds on it as if it were load-bearing. This is a solo-man pathology: I find it hard to maintain a mainline path alongside experimental work, and the AI is even less capable of keeping a bunch of context unrelated to the current task. Now, maybe I upgrade my prompts and do a large context dump of everything in flight, but that's awkward. I think I have to adjust how I work and treat this like a team environment again, avoiding solo-man pathologies.
The meta observation is that if you don't do things well, then expect the AI to take things off the rails. Shipping software is still your #1 priority.
Third: Avoiding Large Jobs
Adama is a complex idea in itself, so the sheer context required to capture every single feature won't fit in a single session. Even within just the language, I can spend a lot of tokens and go nowhere.
For example, I'm unable to get it to correctly transcribe the old type system to a new one without it going off the rails. At least, I can't get it done without using $10K-plus of tokens. I can bust my entire subscription usage window very quickly—too quickly.
At this time, large projects require embracing a new management pattern and shaping context, which requires organizational thinking. You have no choice but to flex and grow the skill of managing agents the way large organizations do: by creating specialists.
However, even complex yet compact things can easily outpace the AI's ability to do the work well. This is a new skill, for sure. Critics are missing out on understanding how this will affect them while first movers work to capture the next wave of value.
Future of Leadership
I can't help but think how this is going to affect the future. The one thing I do know is that the way organizations train junior talent has been forever changed. And the capacity for the next generation to learn to lead has also changed.
For example, it used to be that a future leader could learn some leadership by being given an intern to build something during a summer session. Sometimes, it's just a small thing or a new component that is easy to isolate and neither high-risk nor high-priority. Sometimes, the entire thing is thrown away, as there is value in developing talent for the long term and giving interns a great experience.
All of that is going to change as the queue of ideas can be drained very quickly now. Those one-off projects can be vibed in a few hours. It's shocking, but now the question is how to identify and grow talent in a meaningful way. It's hard in this new world.
Worse yet, I'm retired, and I'm working super hard. I'm going fast and building value for myself, but my mind is feeling the sheer strain of this productivity. I'm making too much, and the technology that should have liberated me is pushing me even harder in even more directions than I can manage.
It's exciting, but I fear what will happen in new ways. It's going to be a wild ride.