At work we recently started experimenting with generative AI for assistance with programming. We have a new Visual Studio Code plugin: we ask it questions in English, and it spits back code. It was a really interesting piece of, well, mandatory training. I've formed some opinions.
The main thing I dislike about AI coding assistance is that I have to very carefully review the AI's code to make sure that it does the right thing and none of the wrong things. And I, personally, find that reviewing code is more difficult than writing equivalent code myself.
I don't know if anybody else has this experience, but I understand code better when I interact with it directly by hand. Either by bringing code into existence, or by refactoring code which was already there to do something new, or to do the same thing more effectively. I need to take things apart and put them back together. I need to play with the code.
Passively reading someone else's code in the dead form of a git diff just doesn't engage my brain in the same way. It takes more focus, it's much easier to overlook vital details, there's no flow state, and it's much less interesting. Of course, I do it, all the time, and I have no problem with it. Half of the job is code review, and unblocking my fellow developers is always a high priority. But I know which half of the job I enjoy more. (There are several other "halves" of any development job, but let's not confuse things.)
It also takes more time. I can write a simple switch/case statement or an explanatory docstring for a middle-sized method more quickly than I can get an AI to do this for me and then manually verify its correctness to an equivalent level of confidence. This applies even if the AI writing time is zero.
For example: I have no idea how to make an HTTP request from C++. The assistant I tried out was able to provide apparently working code for this, shortcutting my need to spend hours familiarising myself with new syntax and APIs. It compiled and ran and did the thing.
However! Without a priori knowledge of C++ or the HTTP library in question, how am I supposed to know that the AI hasn't made some tragic blunder with its seemingly working implementation? Or misused the APIs in question? I don't even know what classes of problems to look for. So I'm back to checking the AI's work against the documentation. And the value the AI has provided to me — or whatever experienced C++ developer I inflict this PR on — is not zero, but it's not the giant leap head start it looks like at face value.
A similar example would be if I said, "Hey computer, this project uses webpack as its bundler. I want you to convert it to use esbuild instead." Even if the AI appears to do the thing, I'm still going to have to wade through some esbuild documentation to make sure it did it right, right?
Additionally, I find that describing my requirements to the AI in technical detail is often (not always) basically equivalent in complexity to just writing the code, and takes a similar amount of time.
Code is actually really efficient, relative to human language. It's extremely dense, expressive and specific. Compare the amount of time taken to write out:
For each element in the field array, we need to emit a debug log entry explaining what element we're working on. Then, examine the type property on the element. If it's "bool", then we want to add the element to the array of Boolean fields; otherwise, add it to the array of non-Boolean fields. In either case, emit a debug log entry saying what we did.
versus:
for (const field of fieldArray) {
  logger.debug('processing field', field)
  if (field.type === 'bool') {
    booleanFields.push(field)
    logger.debug('field is Boolean')
  } else {
    nonBooleanFields.push(field)
    logger.debug('field is not Boolean')
  }
}
The amount of typing required for each of these is comparable. The amount of time taken to properly phrase the English prompt is comparable with the amount of time taken to just write the code directly. The phrasing will need altering a few times to get workable output. And the amount of refactoring of the generated code to create something equivalent to the handwritten code is always non-zero. So, have we gained anything?
Or imagine we have an existing piece of code generated, but the AI has missed something. Compare the effort of messaging it to say:
If the first argument isn't a number, the function should throw an exception saying "not a number".
and having it regenerate its work, versus just plumbing in:
if (typeof a !== 'number') {
  throw Error('not a number')
}
The English is fifty percent longer. It's also far more ambiguous. What about NaN, BigInts, numeric strings, boxed Number objects? The code is shorter, and abundantly clear on what should happen in all of those cases.
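To make that concrete, here's a quick sketch of how JavaScript's typeof operator actually classifies each of those edge cases, which the English sentence leaves wide open. (The assertNumber wrapper is a hypothetical name introduced here for illustration, not something from the original snippet.)

```javascript
// How typeof classifies each of the ambiguous cases:
console.log(typeof NaN)           // 'number': NaN passes the check
console.log(typeof 10n)           // 'bigint': BigInts are rejected
console.log(typeof '5')           // 'string': numeric strings are rejected
console.log(typeof new Number(5)) // 'object': boxed Numbers are rejected

// The three-line guard therefore pins down exactly one behaviour
// for every one of those inputs.
function assertNumber(a) {
  if (typeof a !== 'number') {
    throw Error('not a number')
  }
}

assertNumber(NaN) // accepted, since typeof NaN is 'number'
try {
  assertNumber('5')
} catch (e) {
  console.log(e.message) // 'not a number'
}
```

Whether NaN *should* be accepted is a separate design question, but the point stands: the code states an unambiguous answer, where the English prompt doesn't.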
On top of all of that, I think I would have less of a problem with manually reviewing AI-generated code if I felt that this was actually helping someone get better at coding. I know that feeding positive and negative votes back to the AI will influence its internal logic and ultimately improve the quality of its output. Good for it. But, to borrow a phrase and twist it a bit, I value individuals and interactions over processes and tools. That is, I'm more personally invested in my fellow developers' professional development than I am in training any machine. As much of a chore as it can be, truthfully, I do enjoy reviewing my teammates' code because it's an opportunity to share good practices, and make good decisions, and, as months pass, watch them develop.
To back up a little: Humans are extremely bad at software development and we need all the help we can get.
I'm hugely in favour of human processes which make this better. Peer review, best practices, rules of thumb, comments and documentation, forms and checklists. I'm also hugely in favour of automated tools which augment the human process. Linters, strong type systems, fuzzers, mutation testing. But this shouldn't be an either/or thing. I want all the human checks and all the technological checks.
And as it's currently positioned, AI coding assistance seems like an addition, but it actually removes the human from a part of the process where the human was actually extremely valuable. When a human writes code, we have the original programmer to vouch for their code, plus another person to review and double-check their work. With AI assistance, we don't get that first "guarantee". Nobody with the vital domain knowledge which comes from inhabiting physical reality, or who was present at the meetings where we discussed these new features, participated in the creation of the code. We basically just have two peer reviews.
Essentially, AI assistance replaces coding with code review.
Which is bad, for all the reasons I mentioned, but also, and mainly, because I love coding!
(In the same way, self-driving cars replace the experience of driving with the experience of being a driving instructor. A lot of the same objections apply here. Most of us who have some experience with driving are capable of acting as a driving instructor, but it is more difficult and stressful than driving, requiring greater vigilance. It's a different mental process from driving and it's much less enjoyable than driving.)
So I feel that the better place for the AI assistant is in a supporting role like: "Hey, AI. I have written this code. Do you see anything wrong with it? Could it be improved? Have I missed an obvious simplification? Do my unit tests seem to be exhaustive?"
This also makes it much simpler for us to evaluate the AI's contributions, in a relatively objective fashion. Does it have anything useful to say? Is it generally worth listening to or is it wasting our time?
I think this approach generalises pretty well to other fields where AI is being applied. Not to do work and then have a human review it, but to review human work. "Do you agree? What have I missed?"
And if all the AI produces is false positives, well... I will disable a tool which doesn't add value. I've done it before.