Tools and Agents

I had an interesting conversation with a colleague after my last post, about whether it was possible to get the best of both worlds, or if any new tool would necessarily make you dumber. And I had to stop and think – what was it about an IDE with autocomplete that seemed unobjectionable, while LLM-based AI was a path toward damaged skills? I found myself thinking about the distinction between tools and agents drawn by Ben Thompson in his article Tech’s Two Philosophies.

In Google’s view, computers help you get things done — and save you time — by doing things for you.

…technology’s second philosophy [is] orthogonal to the other: the expectation is not that the computer does your work for you, but rather that the computer enables you to do your work better and more efficiently.

In this framing, a “tool” is a technology that amplifies our abilities, helping us apply our skills more effectively. “Agents”, on the other hand, are technologies that perform tasks for us, obviating the need for us to do them at all. When we use tools, we continue to reinforce core skills; when we use agents, we stop needing – and stop exercising – those skills.

This makes sense to me, but feels incomplete. Because even tools replace skills. I used to be able to memorize phone numbers, but now I have a phone that does that for me. If I didn’t use an IDE with autocomplete, I’d have to remember function names. I use a high-level programming language instead of assembly, which prevents me from reinforcing my understanding of low-level memory constructs.

And, in fact, these are usually the examples people trot out when talking about AI. It’s just the same as using an IDE, they say. Or using a higher-level language. Just one more step up the ladder of abstractions. To make sense of this, I think we need another distinction: between deep and shallow skills. Being able to play a musical instrument, speak a foreign language, weave a tapestry, paint a beautiful oil painting, or architect and implement a complex piece of code – these are all deep skills that take years to learn, thousands of hours of intentional practice, and a lifetime to master.

On the other hand, learning the command line arguments to a bash command, memorizing phone numbers, or learning how to navigate a particular website – these are all shallow skills. We pick up shallow skills all the time, and we forget them all the time, and this isn’t a big deal because the cost to learn them again is so low.

As such, we can define tools as technologies that specifically replace shallow skills. When my IDE autocompletes a function name, I no longer have to remember the name exactly, or the function signature – the IDE provides this information, and I can put it into my short-term memory instead of really learning it. But the cost of not knowing the exact function name is low, and easily remedied if necessary. I haven’t memorized phone numbers in decades, but I could start doing it again if I had to. Even becoming effective in a new programming language (or going back to assembly, heaven help you) is fairly straightforward. The specific language is shallow; the core software engineering and problem-solving skills are deep.

But when I tell an agent to write code for me, I’m replacing a deep skill. I’m no longer exercising the skill I’ve spent decades learning, no longer learning new techniques, reinforcing existing knowledge, thinking critically about how the code works, evaluating whether it needs refactoring, etc. Over time, this is corrosive to the deep skills I’ve developed over a lifetime, and which have genuine value.

Using an agent is a lot like management. As a manager, I define goals, create tasks with sufficient detail (prompts) for someone else to do them, then assign them to engineers on my team (coding agents). I expect other engineers (code review agents) to look over the PRs. This isn’t software engineering – this is engineering management. I have a hypothesis about why AI tools are so attractive to even well-meaning senior leadership: although they were once strong technologists, they eventually moved into management, found success, and let their own deep technical skills wither away. Prompt engineering makes sense to them, because it’s what they do. And when they play with the tools themselves, it’s easy to build toy apps or throwaway prototypes. Surely production code requires just a little more effort!

It’s easy to make fun of management, and it’s common for individual contributors to deride the value of managers, executives, or that most hated class, middle managers. But what the pointy-haired boss jokes miss is that management and leadership are themselves deep skills, and critical ones, and that effective managers haven’t thrown away their technical skills for nothing – they’ve traded them for a different set of deep skills.

BUT. When you let your deep skills gather dust and replace them with shallow skills (the whole sales pitch of AI is that anyone can pick it up easily), you’re reducing your value in one area with no corresponding gain in another. It’s like giving up chess to focus on tic-tac-toe. Part of the value of working at a company with interesting technical challenges is gaining experience and increasing your personal long-term value, independent of the company. But the promise of AI is an increase in productivity paired with a loss in personal value, not a gain. Whether or not it delivers on the productivity is a question we can debate (I think you know my opinion); whether or not it’s damaging your skills has already been demonstrated in multiple studies.

To be clear – there’s nothing wrong with using an agent. We all do it all the time. I rely on other people to grow the food I eat, build the home I live in, clean the water I drink, develop the medicine I consume, and a thousand other things that I don’t even know about. None of these are skills I’ve developed, and consciously or unconsciously, I’ve decided to delegate these responsibilities to other people, who in turn delegate some tiny set of their responsibilities to me.

The problem comes when we replace ourselves with agents. Even in the optimistic scenario in which you’re able to generate a quality product, you’ll be replacing your deep skill with a shallow one, and all the skills you could be building, maintaining, and expanding will wither away over time.

This, then, is the answer to our question. Tools make us smarter, because they clear away low-value tasks and allow us to focus on high-value activities that exercise our deep skills. Agents make us dumber, because they perform the tasks that require deep skills, and replace them with tasks that only need shallow skills.

Building a Dumber Team

Having kids is weird. Here are these human beings, with their own interests, likes and dislikes, and absolutely no knowledge. I mean yes, over time they build up experiences, but they start out with literally nothing. You have to teach them how to go to the bathroom, how to use a knife and fork, how to sit in a seat, etc., etc. All the things that you’ve forgotten there was any need to learn in the first place.

And they learn! It’s so much fun to watch, and continuously surprising, especially when they pick up skills you never learned. My kids are into crafts, and are constantly creating friendship bracelets, constructing incredible origami sculptures, and crocheting cute animals. As an adult with a lifetime of experiences, I try to think about how I can help them learn faster, or learn to enjoy the things I enjoy. This is mostly a fool’s errand – generally speaking, they like what they like, and are completely uninterested in my perspective. #parenting

A couple of days ago, I had an interesting conversation with one of my kids. They had been creating two different friendship bracelets, and had run out of beads for both. So they asked AI to create a pattern that combined the two, and voilà! They had a working design.

And it was interesting – I could see the logic of what they’d done. They didn’t want to have to do something difficult and confusing, where they had no confidence. So they asked the magic box to do it for them. All parents should recognize this situation – enough kids are using the magic box to write their essays that schools are now requiring students to write all essays in class. A Spanish teacher gave F’s to a bunch of kids who used Google Translate to do their homework (pro-tip: if you haven’t learned the preterite yet, don’t use the past tense in your answers). Graduate students – who have famously bad writing skills – are suddenly turning in grammatically correct papers. (note: the only one of these I’m making up is none of them)

But in the alternate universe without AI, my kid would have had to figure out for themselves how to merge the two bracelets. It might not have been perfect, but the next time it would have been a little better, and the next time after that it would have been a little better, and so on. That’s how skill acquisition works. But that didn’t happen.

So what I told them was: “You can use AI to do something for you, and it may do it faster. It will also make you dumber.”

There’s a well-known model of skill acquisition in which a learner progresses through multiple stages – novice, advanced beginner, competent, and so on – until they achieve mastery. The problem with this model, and with skill acquisition in general (again – ask any parent), is that learning deep skills is hard, and takes thousands of hours of directed practice. Much of that practice, especially in the beginning, is tedious, difficult, and emotionally punishing. It challenges our idea of ourselves as capable, intelligent, and special. But if you want to master a skill, there’s no other way.

Having a bodyguard doesn’t make you a martial arts master. Using Google Translate doesn’t make you fluent in another language. Pushing a button and generating an image doesn’t make you an artist. Knowing how to use your music app doesn’t make you a rock star.

“But Dan, in the old days people had to know how to take care of horses, and now we can all drive our cars to the mechanic when there’s a problem.” That’s true. But driving to the mechanic isn’t a skill – you’ve delegated that skill to an external agent, whom you expect to have actual skills, not just a fast search engine.

“But Dan, I’m just using AI to do things that aren’t core to my job, like unit tests.” Unit tests aren’t core to your job? And even if that were the case, there’s going to come a time when you need to understand why those unit tests are failing, but you won’t know how they work or what they’re supposed to do – any more than you know what the mechanic is doing in the garage.

“But Dan, I spent years learning the skill. Now I can use higher-level tools to generate business value, and use the skills to make sure the AI gets things right.” Except it doesn’t work that way, does it? Just like lifting weights at the gym, when you complete a task, you aren’t just completing a task, you’re building and reinforcing a skill, and building and reinforcing knowledge about the code base. And just like sitting on the couch eating potato chips, when you ask someone else (a person, a technology) to do something instead of doing it yourself, your mental muscles atrophy. Just ask any manager.

So. You can use AI, and get to an answer faster, but it will make you dumber. That’s the choice. You can’t both use AI to complete a task, and get better at the core skill behind the task. Using AI is an active choice to let the deep skills you’ve developed for years atrophy, in favor of a hoped-for productivity gain.


As I was finishing up this blog post, I ran across a study of AI productivity gains in which experienced open-source developers were randomly assigned to complete tasks with or without AI. It’s definitely worth a read.

Methodology

To directly measure the real-world impact of AI tools on software development, we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years. Developers provide lists of real issues (246 total) that would be valuable to the repository—bug fixes, features, and refactors that would normally be part of their regular work. Then, we randomly assign each issue to either allow or disallow use of AI while working on the issue. When AI is allowed, developers can use any tools they choose (primarily Cursor Pro with Claude 3.5/3.7 Sonnet—frontier models at the time of the study); when disallowed, they work without generative AI assistance. Developers complete these tasks (which average two hours each) while recording their screens, then self-report the total implementation time they needed. We pay developers $150/hr as compensation for their participation in the study.

Core Result

When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

The whole premise of AI tools is that they provide a productivity improvement – that is, that engineers who use AI will handily outperform engineers who don’t. While this is just one study, and it’s hard to know how generalizable its results are, it’s worth recognizing that your own perception of your productivity gains might be completely off base.

Ever Again

Buckle up.

I grew up in the 1970s. I remember going to the Simon Wiesenthal Center as a child, meeting Holocaust survivors, hearing the stories, seeing the tattoos, watching videos of camps being liberated, looking at pictures of dead, emaciated human beings stacked like cordwood. Adults in those days didn’t shrink from talking about the Holocaust – it was just part of a normal 1970s Jewish education, whether at home or at shul, just part of the air we breathed.

And back in those days, we learned a simple rule. It was – and is – an absolute. There are no caveats, curlicues, or asterisks. There are no exceptions, no escape clauses. Here it is:

Genocide is wrong.

That’s it. This isn’t a political statement, and this isn’t controversial. It’s not up for debate. There are no special cases. No one gets special dispensation.

To be extra clear, because I know that people will always look for loopholes, I’d like to answer the most common replies up front:

But—

No buts – genocide is always wrong.

You don’t understand the special situation—

It doesn’t matter what the situation is – genocide is always wrong.

Akshually, this doesn’t really count as genocide—

Seriously? That’s your argument? Let me be extremely clear – genocide is wrong.

You should be using your platform to speak out against a different issue.

You’re making a bad faith argument. Even if you think that I don’t have the right to tell you this, or that I’m a hypocrite, or that I should be talking about some other issue, or that I’m a bad person for any reason at all, guess what! No matter what you think about me, genocide is still wrong.

There are many, many ways in which people can hurt other people, and genocide is at the absolute top of that list. It is the worst thing that humans can do to other humans. Genocide is always wrong. There is no situation in which it is not wrong.

If the above extraordinarily generic statements make you angry because you think I’m singling out a specific group, ask yourself why you pattern-matched on that specific group. Do you think what I’ve written above is wrong? Which of the above statements do you disagree with? Are there any you don’t passionately agree with?

This is a blog site about software engineering. And you might think that this is a weird departure, or perhaps that I should stay in my lane. My response is that there’s no lane that doesn’t include the fact that genocide is wrong.

Now’s the “so what?” part of the blog. The part where I tie it up for you in a bow and tell you what it all means. The part where I give you food for thought, or advice, and maybe you walk away and think some thoughts that you wouldn’t otherwise have thought.

I’m not going to do that here. Everything I have to say is in the above lines. It’s important for us to say it out loud. And maybe if enough of us say it, the world will start to believe it again.

Working Code

There’s a joke I like to tell my teams. Senior engineers nod in gruff acknowledgement, and junior engineers tend to laugh nervously and nod with (I hope) enlightenment. It goes like this:

If you’ve been working on a piece of code for a while, and it works the first time you run it, junior engineers will celebrate. Senior engineers will get scared, because they don’t know where it’s broken.

Can we all agree that engineers are machines that turn coffee into bugs? The first version of any code is going to be buggy, and senior engineers know that you need to work through the code repeatedly in order to debug – and even to understand – what you’ve written.

There are so many different kinds of bugs. Off-by-ones, logical mistakes, inefficient data structures or algorithms, mistaken requirements, misunderstood parameters, non-idempotent / non-reentrant code, memory leaks, race conditions, deadlock, livelock, and on and on and on.

Modern software engineering has developed a variety of techniques to try to find or prevent bugs, including linting, static analysis, automated tests, and code reviews – but no matter how awesome your unit tests, integration tests, and end-to-end test coverage, your code has bugs. And while your code reviewers might get lucky, it’s hard to find subtle problems in code that you haven’t been thinking about deeply, digging through, and living with for a while.
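
To make that concrete, here’s a minimal sketch – my own illustration, not code from any real project – of how a perfectly plausible unit test can pass while an off-by-one sails straight through:

```python
def moving_average(values, window):
    """Return the mean of each sliding window over `values`.

    Bug: the range stops one window early, so the final window
    (the one ending at the last element) is silently dropped --
    a classic off-by-one.
    """
    averages = []
    for start in range(len(values) - window):  # should be len(values) - window + 1
        chunk = values[start:start + window]
        averages.append(sum(chunk) / window)
    return averages


def test_moving_average():
    # Checks that the windows we got back are correct (they are),
    # but never asserts how many windows came back, so the missing
    # final window goes undetected.
    result = moving_average([1, 2, 3, 4], window=2)
    assert result[0] == 1.5
    assert result[1] == 2.5


test_moving_average()
print("All tests pass.")  # ...and the code is still wrong: [3, 4] was never averaged.
```

The test exercises every line, the assertions all pass, and the bug survives – exactly the kind of subtle problem a reviewer who hasn’t been living with the code will skim right past.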

None of the above should be controversial, and it should all sound pretty familiar to anyone who’s worked in the industry for even a little while, but somehow it all seems to fly out the window when people start talking about AI. You give the magic box a prompt, it generates code that works the first time out, and voilà – you’re done!

Of course you know that you have to go through and validate that it’s doing what you think it should be doing. You probably went through many iterations of your prompt to get it to build what you wanted, and you probably wrote prompts to generate a bunch of unit tests, but does it really do what you want? Did you define your prompts with absolute fidelity? How can you know? Because you’re a code reviewer now. And a code reviewer never understands the code as well as the person who wrote it.

The Restaurant Reviewer

When a new person joins my team, I like to have everyone go around and share some unusual fact about themselves. It’s a good icebreaker – everyone usually tries to think of something interesting, and I’ve had people who were bus drivers, skydivers, and in one memorable case, someone who’d eaten something he absolutely shouldn’t have eaten.

But we all have secrets, and one thing I’ve never shared with my coworkers is that for the past twenty years I’ve been a restaurant reviewer under a pseudonym for a local newspaper. It’s a bit of a dream job – I get paid to go out to great restaurants, eat the most exciting things on the menu, then write it up for my weekly readers.

My editor and I have been working together for years, and we have a pretty solid relationship. So I wasn’t particularly concerned when he set up a meeting to go over next month’s plans. It started uneventfully enough – we talked over different themes (like doing an exhaustive survey of all ramen restaurants in Boston), budgets, etc. The usual. Then, as we were wrapping up, he hit me with it.

“Hey Dan, one last thing. I have some exciting news.”

“That’s great! What is it?”

He took a metal box out from behind his desk. “Check this out.” He opened the top of the box, flipped the remains of his lunch in, closed it up, and pressed a button. A couple of lights flashed across the front of the box, then there was a dinging noise and a burning smell. He opened the box and pulled out a piece of paper. On it was a review of his sandwich.

“Pretty neat, huh?” He put the box on his desk and handed me the paper. “This little baby can review a meal in seconds. All you have to do is put the meal inside, press the button, and voilà! Out comes the review.”

I frowned and pursed my lips. “Okaaaay. What exactly do you want me to do with this?”

“Why, use it of course!” His eyes were alight. “In the old days you could only review one restaurant at a time, and you couldn’t try all the entrées you wanted. But now you can review a dozen different restaurants in one night, and go through as many menu items as you want. And the best part is, you don’t have to eat the food, or drink the wine! No more cocktails or desserts – all you have to do is make the reservations and order the food! The Review-o-matic® (patent pending) does all the hard work for you!”

I looked down at the sheet in my hands. Where even to begin? “Have you read this? ‘Hamburgers have been eaten by lords and ladies, kings and queens, and simple peasants ever since the age of Charlemagne.'” I scanned down through the rest of it. “None of this makes sense.”

He waved this away. “I know, I know, sometimes it comes up with some really wild stuff. You’ll need to edit it down, add your own style, you know.”

“And where does it come up with this stuff, anyway? I spent four years in culinary school, then another ten working my way up in the LA restaurant scene. I spent years getting my sommelier certification. When I talk about the interplay of flavors, or mouth feel, or service, I’m talking from experience. How do you expect this thing to be able to replace my decades of experience?”

He sat back down, and gave me a hard look. “Dan. We’ve been working together for years, and I have to tell you, restaurant review bots are the way the industry is going. It’s cheaper to have one of these things than a half dozen reviewers. Sure, you won’t actually be able to taste the food anymore, and maybe you’ll have to rewrite a lot of what the bot pumps out, but think about how much more productive you’ll be!”

It was hard to find the words. “But I like tasting the food. I like drinking the drinks. I do this job because I enjoy it. I don’t want to be a nanny for a machine that does the best parts of the job. You’re taking away the things that make this job fun, and replacing them with tedious chores. You’re turning me into a copyeditor, and not even for a talented writer, but for something that literally has no idea what it’s saying. It’s just stringing words together in a plausible way.”

He shook his head in hurt confusion. “I can’t understand why you’re so scared of this thing. Don’t you want to be more productive?”

“I’m not scared of this thing. I’m scared that you think that this thing can do what I do, and that you’re going to use it to put me into a job that I hate. The increased productivity will be a mirage. It’ll create initial write-ups like this one in seconds,” I waved the page in my hand, “and then you’ll have me, or someone like me, spend as much time as it would have taken anyway to rewrite it into something that actually makes sense. Your end-product will be worse, and your employees will be less happy. Eventually the people who knew what they were doing will leave, and you’ll hire people who like to copyedit, but don’t know much about food, or restaurants, or fine dining. Your articles will get worse and worse, and you won’t understand why people stop buying what you’re selling. After all, everyone’s so productive!”

He frowned, his forehead bunching up in a way that looked genuinely uncomfortable. “Dan, I talk to other editors all the time. Everyone’s doing this, and they’re telling me it’s amazing. They’re saving money, and the quality is just as high.”

What to say to this? I liked my editor. I knew there was no animus, no cynicism in him. He was trying to do the right thing for his business, and when everyone around him was telling him that this was the way, what was he to do?

“I don’t think you know this,” I said, “but in my day job I’m a software engineering manager.”

He laughed and shook his head. “You? An engineer? A manager?”

“I know, I know, it’s hard to believe. The funny thing is that I love to write code, but I decided at some point that I could have more impact by being a manager than by continuing to write code, even though that’s what I enjoyed doing. So now I sit in meetings, review other people’s design docs and code, assign tasks, and write reports.”

He looked at me curiously.

I sighed. “You’re turning everyone into managers. Everyone’s just going to go to meetings, define tasks, review output generated by stochastic algorithms, and generally stop doing things that they enjoy. They’ll stop doing things that will build their skills. Quality will go down, and things will start breaking more often, because it wasn’t built right in the first place. And when something breaks, no one will know how to fix it, and the machine will come up with solutions that don’t make any sense, and no one will be able to tell why not.”

He shook his head. “I don’t know what to tell you. I have to worry about the big picture, and right now, the industry trend is automated reviews. No one promised you that things would stay the same, and I’m paying you for your output, not to do a job you love.” I could see the disappointment in his eyes. “And if this machine can drive greater productivity, then I need for you to use it.”

I shrugged. I knew I’d lost the battle. I’d been alive during the golden age of restaurant reviews, and I supposed that that was something. But it felt like something beautiful was being destroyed on a false premise, by people who knew how to count beans, but knew nothing about building a great product. And one day, all the people who could have told them what was happening would be gone.