AI and More AI

Guess what? I'm still doing AI! Shocking, I know.

I've had significantly more interaction with various AI tools (though mostly Claude, with a little Gemini) since my last post, and I have a bunch more thoughts. I also have some updates on topics I previously discussed.

Meeting Summaries

Just a few quick notes here:
- I still like letting AI take meeting notes. It's not perfect, but it's generally good enough.
- I have found a few major factual errors in summaries, most notably when Gemini claimed I agreed to something in a meeting I was not even present at. Please do not consider AI-generated meeting notes to be authoritative.

Claude Code

My company has adopted Claude Code as the primary AI tool for developers, so I've spent a decent amount of time working with it and exploring its limits. Here are a few areas I've used (or attempted to use) it for productive work.

Smart Copy-Paste

We have one poorly designed API with a lot of repetition. That means when we develop a new feature we tend to do it in one place and then copy it to the others once we've nailed down the design. I recently worked on such a feature and Claude was quite helpful for that. I largely wrote the initial design by hand (because most of it was discussion with other teams, not technical changes to the code), but when the time came to apply my changes to the other locations, Claude made it very easy. I told it to apply the changes from the files I modified to the other affected files, and it just did it. Tests passed on the first try. There were small changes required in each of the other locations as well, which Claude handled admirably.

Was it faster than just making the changes by hand? Unsure. I don't think it was any slower than making the changes by hand, but at the same time what I was trying to do was pretty simple and would not have taken me long to complete myself. It did require less cognitive load on my part, so that was a definite win.

Python

We had some changes to our build process that invalidated a script I use semi-regularly. Essentially, this script scraped some internal sites and drilled down to some logs to determine the exact contents of our container images, without having to actually pull and run those images. I decided to turn Claude loose on the problem, and made some interesting discoveries.

1) It found a source for the version information that I was not previously aware of. It turns out our internal site had started publishing the information I was looking for at a higher level, so there was less need to follow deeper links. Claude was able to handle this simple case pretty easily, and I briefly wondered if AI would actually replace me. However...

2) After a bit more testing, I discovered that older releases did not publish the necessary information at a higher level, and did require digging through links. I tried to prompt Claude to follow those links, but it was unsuccessful. I think there were a couple of reasons for this: First, Claude didn't want to go deeper than one level of links (the data was multiple links deep). Maybe I could have directed it to do that, but at some point, if I have to tell it exactly what to do, there's no benefit. Second, I later discovered that Claude can't handle dynamically generated pages, which is where the content I needed to scrape was found. I was able to modify the original Python script to handle that, but Claude was never going to manage it, because it couldn't see the content that needed to be processed.
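To make the depth problem concrete, here's a minimal sketch of the kind of one-level link-following the script does. Everything here is a hypothetical stand-in (the URLs, the "build-log" naming, the page structure); a real run would fetch the index page with urllib.request, and the dynamically generated pages would need a headless browser rather than plain HTTP.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets from anchor tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_log_links(index_html, base_url):
    # One level of link-following, which is roughly as deep as Claude was
    # willing to go; the data I actually needed sat several links deeper.
    parser = LinkCollector()
    parser.feed(index_html)
    return [urljoin(base_url, href) for href in parser.links
            if "build-log" in href]


# Static stand-in for the index page; a real script would download it.
sample = ('<a href="/builds/123/build-log.txt">log</a>'
          '<a href="/about">about</a>')
print(find_log_links(sample, "https://ci.internal.example/"))
# prints ['https://ci.internal.example/builds/123/build-log.txt']
```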

In general, Claude was excellent for scaffolding the new script and even implementing the simple case. However, it fell down a bit when things got more complex and required quite a bit of human intervention.

Complex Tasks

Another piece of work I tried to implement with Claude was a change to our CI system. Some of our test infrastructure is being retired and we needed to make sure all of our test jobs were moved off it. I knew some of the jobs had already moved, but I thought some had not (more on that later). Because of that, I turned Claude loose on the git history of our CI config repo and asked it to find a commit where we moved away from the old profile to a new one. It turns out that was a trick question - there was no such commit. The migration had been done at a different level and the profile name had remained the same (which is confusing, but that's a whole other discussion). Interestingly, Claude did come up with the correct answer...eventually.

So I asked a stupid question and got an answer anyway. Why am I even bringing this up? Because it took Claude two hours to conclude that what I was looking for did not exist. In the meantime, I had figured that out for myself and moved on to other work. I left Claude running just out of curiosity, and I'm glad I did, because it was an interesting result. As I noted in a previous post, AI doesn't like to say "I don't know" or "Your question doesn't make sense", so it is genuinely impressive that Claude eventually concluded that the thing I asked for did not exist. It's just not great that it took so long to get there.
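For reference, the by-hand check that settled this in minutes boils down to git's "pickaxe" search: `git log -S <string>` lists the commits in which the number of occurrences of a string changed, so a profile that was genuinely swapped out shows up immediately, and one that never changed turns up nothing. Here's a sketch against a throwaway repo with made-up profile names (the real repo and profile names are specific to our CI):

```python
import os
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command in the given repo and return its stdout."""
    cmd = ["git", "-c", "user.email=me@example.com",
           "-c", "user.name=me", *args]
    return subprocess.run(cmd, cwd=repo, check=True,
                          capture_output=True, text=True).stdout


def commit_profile(repo, profile, message):
    """Write a one-line CI config using the given profile and commit it."""
    with open(os.path.join(repo, "ci.yaml"), "w") as f:
        f.write(f"profile: {profile}\n")
    git(repo, "add", "ci.yaml")
    git(repo, "commit", "-q", "-m", message)


repo = tempfile.mkdtemp()
git(repo, "init", "-q")
commit_profile(repo, "old-infra", "add job on old infrastructure")
commit_profile(repo, "new-infra", "migrate job to new infrastructure")

# -S finds commits where the occurrence count of the string changed:
# the commit that introduced "old-infra" and the one that removed it.
hits = git(repo, "log", "-S", "old-infra", "--oneline").splitlines()
print(len(hits))  # prints 2
```

In my actual case the pickaxe would have come back empty, which is the quick "no such commit exists" answer that took Claude two hours to reach.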

API Hallucination

This is a longstanding, known issue with AI coding agents, but I had an experience that was a microcosm of the two problems people tend to run into: AI using non-existent APIs, and AI writing new code that duplicates existing APIs. In one attempt to have Claude assist me, I ran into both in close succession.

I was attempting to modify a field on a system object using a library (I'm going to keep this generic because the specifics don't matter). I asked Claude to give me code to do that. It obliged. I added the code, compiled, and...the function name it provided did not exist, and never had. Pure hallucination. I told Claude this, and it spat back dozens of lines of code to write the function myself. I did not particularly want to do that, so I looked a little deeper into the library I was using. It turns out there is a function to do what I wanted, it just isn't named what Claude told me it was. Further, the contents of that function in the library were essentially (maybe exactly) what Claude had given me to implement the function myself. One could argue this is not strictly a hallucination, but it is a bad response to the original hallucination.

Both of those are terrible outcomes. In a way, just getting the function name wrong was the lesser of two evils, because it was easily verified as incorrect. Duplicating code from the library is a more insidious failure because it leaves you with a bunch of extraneous code to maintain for absolutely no benefit, and Claude would have gotten away with it too, if it hadn't been for those meddling humans. What it gave me probably would have compiled and solved the problem, but it would have been the wrong solution.

Conclusions?

AI remains a very mixed bag. When it works, it's pretty slick. When it doesn't, good luck. While it seems to do well with simpler tasks, complex ones are still elusive, sometimes even when the user provides hints.

I continue to have similar concerns to my previous post. It's not just a question of AI getting things right or wrong, it's also the way it can be both right and wrong in the same answer. I had that problem with the chat bot I trained, and the second hallucination case is another good example. What AI gave me might have seemed correct at a shallow glance, but to someone who knows about the topic it was clearly not. What's the point of using AI if you have to already know the things you are asking it about?