Coding with AI

Yeah, it's just LLMs doing the boring part.

There have been a lot of big shifts over my career, and this one feels very much like the older ones.

Going from local hardware to cloud hardware. Going from single apps to distributed systems. Going from waterfall to agile.

AI will only be more disruptive than those shifts if people can actually take advantage of it.

Starting off skeptical

When OpenAI launched, it sounded like a scam.

When LLMs started getting shared around tech circles, along with claims about everything they could do, I wasn’t excited.

I gave TabNine a shot, and it was like trying to help someone who couldn’t code write code. Someone with no perception of their own limitations.

Use the ambiguity of language to bring it to life!

Code has hard-and-fast rules, kinda like chess. If it’s wrong, it doesn’t work, and people get mad.

Chat has sloppy nonsense rules where any answer can have meaning layered onto it by the asker. I’ve been in many conversations where, after zoning out while someone else is talking, I’m snapped back to reality when someone prompts, “Mike, what do you think?” and instead of admitting my brain was betraying me, I’ll just say something trite and see how it goes over. Was it sarcasm? We’ll find out when everyone’s faces start contorting.

When you put a question into a chatbot and it only understands a bit of it, it can still respond with something plausible. If it’s a bit wrong, that can feel like the other participant steering the conversation. If it’s a perfect answer, that’s the luck of the draw.

GPT 3.5-ish is where I noticed a change

Earlier uses of GPT were shoddy enough that I wasn’t alarmed. I wasn’t interested. It would have been a net negative to wrangle the robot instead of just reading the docs and writing the code.

After 3.5, the React code it produced felt plausible. It was better than what I could do off the top of my head.

It was during this phase that a code assistant first seemed able to do anything beyond the trivial, but it was still very frustrating.

YEARS PASS, and I find Cursor

Cursor was the first UX that was really good. The early approach was a nicely controlled workflow.

  1. add context to a tab
  2. give it a tiny bit of work to perform
  3. watch it give it a go and see what happens

Pretty quickly it would be clear whether it had understood the request. My hit rate early on was worse than 50/50, but because each win saved 20+ minutes of work, it was still speeding things up.

My first real application of this was when I built the Libhoney receiver for the OpenTelemetry Collector. It was open source, so none of the privacy or secrecy concerns applied. I was free to go wild with untested AI stuff and see what would happen.

Naturally, I wrote the entire thing to the point it worked perfectly without using any LLMs at all.

Refactoring exercises

When you submit code to an open source repo, the people who work on it mostly have ways they want things to look. A lot of PR review is about the names of things and the breakdown of classes and modules.

Is it a good enough idea? Maybe.

Is that variable name unclear to someone who has spent a billion hours staring at the code? Fix that.

So when a bunch of “move the initialization logic over here, move the logic into internal modules, move the…” came back, I was like: “okay.”

Then I remembered that this Cursor thing was pretty good at tedious stuff like moving a bunch of functions and types from one place to another, so I just fired it up, said “move these functions into a module,” and it did.

Because I had a working copy in my git repo, it was perfectly safe to let it go wild with taking big swings, rewriting a bunch of stuff, and then comparing the output with my known-good code.
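
The collector work itself was in Go, but the shape of the ask is language-agnostic. Here’s a hypothetical TypeScript rendition of “move these functions into a module” (all names invented for illustration):

```typescript
// internal/events.ts -- hypothetical destination module.
// Before the refactor, parseEvent and validateEvent lived inline in the
// receiver file; after, only parseEvent is exported and the receiver imports it.
export interface Event {
  samplerate: number;
  data: Record<string, unknown>;
}

export function parseEvent(body: string): Event {
  const ev = JSON.parse(body) as Event;
  validateEvent(ev);
  return ev;
}

function validateEvent(ev: Event): void {
  if (!Number.isInteger(ev.samplerate) || ev.samplerate < 1) {
    throw new Error(`invalid samplerate: ${ev.samplerate}`);
  }
}

// receiver.ts then shrinks to a one-line import:
// import { parseEvent } from "./internal/events";
```

The behavior doesn’t change; the public surface just gets smaller, which is exactly the kind of PR feedback that’s tedious to act on by hand.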

Vibe coding a replacement app

A long time ago, I wrote an app in .NET WebForms, and it has endured. A few times over the years, I’ve attempted to rewrite it with whatever languages and frameworks were trendy at the time. It always became clear that the nuances coded into that app were what made it valuable and terrible.

With the ability to test things very rapidly in terms of visuals and workflow, I decided to give it another go. This time, I was just going to have Cursor write everything and intervene in code only when it was necessary.

This experience was fascinating and enjoyable.

The ability to have it build a half-dozen different user experiences and just throw away the ones that don’t look good or don’t scale is amazing. If I had to spend 20 minutes on spacing and shapes, I’d be far less likely to delete the whole thing when it doesn’t turn out ergonomic and intuitive. The app itself is a pretty simple workflow, but it has quite a bit of administrative tooling, audit logs, and things like that. Having something else build out the tedious audit mechanisms and financial validations was wonderful.

Don’t get lazy though, especially with audits and validations. The .cursorrules file tells Cursor to always audit anything that changes anything, and yet, sometimes, that rule falls out of the context window and I have to remind it.

Honestly, it’s the same as when I’m writing the code myself. Sometimes I forget to audit an endpoint, and someone has to tell me. Now I’m the one at the higher level of abstraction, keeping an eye on that stuff.
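
Paraphrased, the rule looks something like this (not the actual file, just the shape of it):

```
# .cursorrules (paraphrased)
- Every endpoint or action that mutates data MUST write an audit entry:
  who did it, what changed (before and after values), and when.
- Financial amounts MUST be validated server-side before being persisted.
```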

Integrating with a new-ish and quirky library

Before my vibe coding adventure, I tried Copilot and Zed and a few others, but nothing else built the same rapport that Cursor did. While I was doing open source work and testing out these other LLM-assisted IDEs, Cursor also got way better.

The panel in the sidebar went from doing 5% of my coding to 50% pretty quickly. I was still giving it small tasks around libraries I wasn’t familiar with. But any time I asked Cursor to instrument something with OpenTelemetry, it would just fail. There’d be some nuanced dependency issue, or it would pull an older version of the library, or it would reference long-since-changed examples. OpenTelemetry is hard because it has changed a lot during and after the last few model training rounds.
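
To give a sense of why the models trip up: even a minimal Node setup touches several packages whose names and APIs have churned across releases. A sketch that is roughly current as of this writing, and may well have drifted again by the time you read it:

```typescript
// tracing.ts -- minimal OpenTelemetry Node SDK setup. Treat this as a
// sketch: the package names and APIs below have changed repeatedly.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";

const sdk = new NodeSDK({
  serviceName: "my-app", // hypothetical service name
  traceExporter: new OTLPTraceExporter({
    // default local OTLP/HTTP endpoint, e.g. a collector sidecar
    url: "http://localhost:4318/v1/traces",
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```

An older model happily emits the same thing against package versions from two renames ago, and nothing compiles.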

Giving Claude Code a whirl

There isn’t much difference between running commands in a terminal and Cursor’s vertical stream of code change -> description -> response -> code change -> description -> response…

I decided to give Claude Code a try while instrumenting a Next.js app that I had spun up from scratch a few days prior. Claude Code’s API access is unrelated to a Claude.ai subscription, so I had to give them another $5 for some reason. Anyway, after doing that, I spent a fifth of it just getting the tool oriented to the repo. Claude Code uses a lot of tokens, and each token is, like, noticeably costly.

Once it knew what it was looking at, I asked it to instrument the codebase. Since there’s no way for me to feed it specific pieces of text, files, console output, etc., I was at its mercy as it ran lots of find and grep. It got everything 90% perfect. The last 10% was just copying and pasting an example file so Next’s client-side instrumentation would work. That example is only 2 months old, and there are dozens of older ones that look slightly different, so I’m not surprised it needed a bit of a nudge.
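
For reference, the server half of Next’s convention is an instrumentation.ts at the project root whose register() hook runs on boot; the client-side file I had to paste in follows a similar file-based convention. A sketch of the server half (the imported setup file is hypothetical):

```typescript
// instrumentation.ts -- Next.js calls register() once when the server starts.
export async function register() {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    // Only load the Node OTel setup in the Node runtime, not Edge.
    await import("./tracing"); // hypothetical file, like the sketch above
  }
}
```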

Once we were past that hurdle, I saw lots of recovered exceptions firing all the time. Yay JavaScript.

The next thing I asked of Claude Code (with $2.50 left) was to add Jest tests to a bunch of pages and APIs. It did the part of identifying the kinds of things to test really well. It seemed to be running the tests, but it couldn’t have been, because it never installed the dependencies.

When I ran the tests myself, 3/4 of them failed because they needed some kind of test data or subtly different input.
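
Most of the failures had the same shape as this hypothetical one: a perfectly reasonable assertion against data that nothing had ever created.

```typescript
// __tests__/users.test.ts -- hypothetical sketch of a generated test.
import { getUser } from "../lib/users"; // hypothetical module

test("returns a user's display name", async () => {
  // Fails: nothing ever seeded user 1, so there is no user to assert on.
  const user = await getUser(1);
  expect(user.name).toBe("Ada");
});
```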

Back to Cursor for clean-up

With the test framework in place and a reasonable set of tests established, I gave Cursor the command to run them, and it just looped a dozen times and fixed stuff. Several fixes were just selector or test-data changes. A couple were about how a test suite was initialized in a way that left it with no data.
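
The initialization fixes were the classic kind: seed the data the suite assumes instead of hoping it already exists. Something like this hypothetical version:

```typescript
// Hypothetical shape of the fix: an in-memory stand-in seeded per test.
type User = { id: number; name: string };
const testDb: User[] = [];

beforeEach(() => {
  testDb.length = 0; // reset between tests
  testDb.push({ id: 1, name: "Ada" }); // seed the data the assertions expect
});

test("finds the seeded user", () => {
  expect(testDb.find((u) => u.id === 1)?.name).toBe("Ada");
});
```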

Cursor is still really good at being told “lookie here! FIX PLZ” and just getting it done.

Cleaning up the blog

I hadn’t updated things in here in a while, so when I fired it up, Hugo threw a bunch of errors.

[Screenshot: Asking Cursor to fix my blog.]

It was able to point at all the places in the config and templates that needed adjustment. Saved me 5 to 10 minutes.

It saves a lot of 5 to 10 minuteses.

Also, VS Code forks… yeah

Having worked with the folks at Coder for a while before I went to Honeycomb, I know how tough it can be to live and die by the output of Microsoft. The way Cursor integrated its UX into VS Code is outstanding. Watching Zed struggle with some of this because they’re building the entire experience themselves is quite interesting. Zed has no constraints, so they can do things “perfectly”: the code is elegant, and the UX isn’t great. VS Code has had a decade of UX refinement to a good-enough point, so all the Cursor folks have to do is work as hard as anyone has ever worked to stay ahead of Copilot, which keeps trying to add all of Cursor’s features.

Best of luck out there, Cursor folks!
