
Adam Świderski

Posted on • Originally published at swiderski.tech

The Genie's Curse: My Month with AI-Augmented Coding

You know that meme where a guy asks a genie to grant his wish, and the genie's sole purpose is to twist that wish to make him miserable? Like he wishes for $1 million and gets it, but inflation is 300,000,000% so he can't even afford bubble gum with it.

After a month of experience with vibe/AI-augmented coding, I have a similar feeling.

The Project

I used Cursor to build a native Android Todo/Pomodoro app. The most basic stuff ever, but I got carried away a bit. What started simply ended up with task statistics, progress notifications, haptic feedback on setting changes, background noises, focus mode control, and even a watch companion app. Not so basic anymore - about 30k lines of code, some of it dead, because LLMs excel at adding code but struggle to clean up after it's no longer used.

I have pretty strict and comprehensive rule files that explain exactly, with code examples, how I want Views to be structured, what ViewModels should and should not be injected with, and what return types from UseCases should look like—all based on my preferred approach to building apps.
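
To make that concrete, here's roughly the shape those rule files push for - a minimal sketch with hypothetical names (StartPomodoroUseCase, PomodoroViewModel and friends are stand-ins, not code from the actual app): the View only renders state and forwards events, the ViewModel is injected with UseCases rather than Services, and UseCases return explicit result types.

```kotlin
import androidx.compose.foundation.layout.Column
import androidx.compose.material3.Button
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.collectAsState
import androidx.compose.runtime.getValue
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

// UseCase returns a sealed result type instead of throwing.
sealed interface StartPomodoroResult {
    data class Started(val durationMinutes: Int) : StartPomodoroResult
    data class Failure(val reason: String) : StartPomodoroResult
}

class StartPomodoroUseCase {
    suspend operator fun invoke(taskId: String): StartPomodoroResult =
        StartPomodoroResult.Started(durationMinutes = 25)
}

// The ViewModel is injected with UseCases, never with Services or Repositories directly.
class PomodoroViewModel(
    private val startPomodoro: StartPomodoroUseCase,
) : ViewModel() {
    private val _status = MutableStateFlow("Idle")
    val status: StateFlow<String> = _status

    fun onStartClicked(taskId: String) {
        viewModelScope.launch {
            _status.value = when (val result = startPomodoro(taskId)) {
                is StartPomodoroResult.Started -> "Focus for ${result.durationMinutes} min"
                is StartPomodoroResult.Failure -> "Could not start: ${result.reason}"
            }
        }
    }
}

// The View only renders state and forwards events to its ViewModel.
@Composable
fun PomodoroScreen(viewModel: PomodoroViewModel, taskId: String) {
    val status by viewModel.status.collectAsState()
    Column {
        Text(text = status)
        Button(onClick = { viewModel.onStartClicked(taskId) }) {
            Text("Start Pomodoro")
        }
    }
}
```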

Rules are for people

It worked well most of the time, especially for the phone app. However, for the watch, there are significantly fewer sources, blog posts, and code examples available on GitHub. Especially when trying to use the latest Jetpack Compose with Material 3 features.

But Cursor kept doing annoying things. Instead of using built-in components like HorizontalPageIndicator, it built its own from primitive UI elements and stuffed all the logic into it - not as a reusable UI element, but hammered into that concrete View until it worked.

When told that Views should not use Services directly, it simply removed the calls instead of routing them through a ViewModel. I didn't notice at first, and later asked it to add the Services to DI, create interfaces for them, and depend on those interfaces rather than the concrete implementations - at which point it removed the Services entirely. Technically, no View was using a Service directly anymore, because the Services no longer existed.

Cursor rules are like speed limits in Poland - a mere suggestion. Even though I had rules about keeping Views small, extracting reusable parts from them, and giving each part its own ViewModel to hold state, Cursor had no issue generating massive Views of 1000+ lines, where a single ViewModel did way too much. A lot of logic ended up embedded in the View instead of being moved out to UseCases.

After implementing some features, I wanted to split them into packages, so I wouldn't have 20 unrelated files in one generic directory. Oh, it failed miserably. It moved some files, deleted others, and didn't update imports... It made such a mess that I just reverted to the last commit and did the restructuring by hand - which was way faster.

I want to extract as much shared code as possible and try using it on iOS with Kotlin Multiplatform, but I'm reasonably sure there is no point in asking Cursor to do that for me.

I ruled out generating tests entirely. The LLM checks what the code appears to do and then generates tests to verify exactly that, rather than checking whether the code does something that actually makes sense. Generated alongside the code, the tests added zero value and caused compilation errors, because Cursor was very stubborn about using Mockito instead of MockK, even though the rules spelled out the entire tech stack.
I think it was on the Modern Software Engineering YouTube channel where I heard the idea of writing tests manually (do we really have to add an adjective now to state precisely that code was NOT AI-generated?), and then using the LLM to generate code that passes those tests. Remember the promise that POs would write Cucumber BDD tests? You are the PO now. Is this the way?
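
If I ever try that workflow, the hand-written part might look something like this - a minimal MockK sketch with hypothetical names (TaskRepository and CompleteTaskUseCase are stand-ins, not the app's real code); the use case body is stubbed only so the snippet compiles, and it's exactly the piece I'd ask the LLM to write.

```kotlin
import io.mockk.coEvery
import io.mockk.coVerify
import io.mockk.mockk
import kotlinx.coroutines.test.runTest
import org.junit.Assert.assertTrue
import org.junit.Test

// Hypothetical production types, sketched here only so the test compiles.
interface TaskRepository {
    suspend fun markDone(taskId: String): Boolean
}

class CompleteTaskUseCase(private val repository: TaskRepository) {
    // The part I'd hand to the LLM: "make this pass the test below".
    suspend operator fun invoke(taskId: String): Boolean = repository.markDone(taskId)
}

class CompleteTaskUseCaseTest {

    private val repository = mockk<TaskRepository>()
    private val completeTask = CompleteTaskUseCase(repository)

    @Test
    fun `completing a task marks it done in the repository`() = runTest {
        // Given: the repository accepts the update
        coEvery { repository.markDone("task-1") } returns true

        // When: the use case runs
        val result = completeTask("task-1")

        // Then: it reports success and actually hit the repository
        assertTrue(result)
        coVerify(exactly = 1) { repository.markDone("task-1") }
    }
}
```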

The Annoying Dance

It also has an annoying tendency to add imports for nonexistent dependencies, but with names that make sense. For example, when I asked it to add a button, the LLM did so, and also added an import for a Material Design dependency that wasn't in the project. The build fails, so then I have to politely ask it to fix that. A lot of ping-pong.

Usually, it works well when I point to other places in code that already have the desired structure or elements. But this is not quite vibe coding, right?

I noticed after a while that I had a lot of dead code. Cursor tends to add a new function and replace the call, but leave the old one hanging around. Removing it by hand is faster and safer than asking the LLM to do it. In my experience, it has trouble identifying unused code and can either leave parts of it behind or remove code that is still used. Little prankster.

I was using two IDEs at all times: Cursor (a glorified version of VSCode) and Android Studio. I'd much rather have a Cursor plugin in my primary IDE, which has all the tools I need for Android development. I started using my iPad as a third screen, just to have a big prompt window there, since my 34-inch ultrawide was occupied by Android Studio running two emulators, highlighting code issues that VSCode missed, and managing files. Oh, and my laptop screen was busy searching online for the Material Design UI elements that Cursor was intentionally not using.

There have been studies and opinionated videos about how AI is making programmers slower (and dumber). There is something to it. I don't know this project's codebase by heart, unlike the ones I wrote myself. Whatever I would have learned by grinding through compilation errors and UI issues, the LLM just handed to me for $20 a month.
LLMs have limited memory (context), so it's not like a junior developer who learns along the way. You will need to repeat the same steps multiple times, pointing to the same code examples and occasionally tweaking the rule files. And as the codebase grows, the results get worse and slower.

If the tasks were small and precise enough, I'd probably be faster implementing them myself, learning the project as I go and keeping the full context in my head. That wasn't the case for this project. And it's rarely the case in my working experience either, where you juggle multiple tasks at once, without precise requirements or UI designs, with looming deadlines, and nobody knows what they are doing or whom to ask for help.

I would probably still be drawing diagrams, designing UI, and writing tasks and assigning them to epics and phases - but instead, I have a working app.
It wouldn't work if I were part of a team and everyone just vibe-coded whatever they wanted whenever they pleased. But for solo devs...

The Sweet Spot

It is an excellent tool for prototyping, though. I could test and try a lot of ideas and scrap them without feeling like I wasted 3 days building a screen that doesn't make sense. It took me five minutes of prompting to verify that it wasn't what I actually wanted, and then I moved to another idea.

In this project, I took a different approach than I usually do, or than professional developers typically take at work. I didn't plan or design anything - I just started building whatever I thought was the next most important thing. It's refreshing and liberating. I could do this because I have no budget, no deadline, and nothing but the urge to build something I'll actually use. And AI is great for exactly that kind of exploration - testing ideas and discarding them without significant investment, because what I like may not be what I actually need.

The nice thing was that after some update, or using a new model (I dunno, I keep it on auto, I'm vibing), it started to compile the code and react to build errors, fixing hallucinated imports and imaginary method calls. On its own, without me changing the rules or asking it explicitly.

Around the same time, I noticed that Cursor got better at searching the codebase and finding connections. Rather than manually testing changes, I could ask: what will happen when {{insert edge case}}? It could generally answer, and suggest solutions when it found issues. This seems like a nice way to learn a new codebase: just point to a file and ask, 'What's going on here? Who hurt you?'

UI Generation

It's useful for generating UI, especially since I don't want a PhD in Jetpack Compose, which STILL DOESN'T HAVE NATIVE SCROLL INDICATORS and keeps changing or forcing me onto experimental APIs - but the LLM doesn't see or understand what it's generating.

Each time I ask it to add a button somewhere that performs an action via ViewModel, UseCase, Service, and Repository, it does just that. Then I have to run the code or check the Preview (God, I love those), capture part of the screen, and add it to the conversation with a note: "This button is in the wrong spot, I want it elsewhere and smaller, and use IconButton as I asked you 15 minutes ago."

At some point, I learned it's actually better to paste an image of a wireframed design and ask the LLM to do "this" without providing too much information, rather than trying to specify exactly what I need the AI to do. Then do manual fixes. A single image is worth more than 2^10 words. Typically, there are no concerns about font sizes or paddings, as the LLM will use standard, statistically correct options.

The LLM tends to reach for the simplest UI components at all times, rather than the more sophisticated yet standard ones. If I hadn't told it to use CardView, it would simply draw a box container and place text and buttons inside it. The same applies to dialogs - it will add one, but it won't follow Material Design guidelines unless I explicitly describe them, even though that's the standard on Android.
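
For contrast, this is the kind of "standard component" usage I end up asking for explicitly - a minimal sketch assuming Compose Material 3 (where Card plays the CardView role), with hypothetical names and copy: Card instead of a hand-rolled box, and AlertDialog instead of a home-grown dialog.

```kotlin
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.padding
import androidx.compose.material3.AlertDialog
import androidx.compose.material3.Card
import androidx.compose.material3.Text
import androidx.compose.material3.TextButton
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp

@Composable
fun TaskCard(title: String, subtitle: String) {
    // Card brings Material shape, elevation, and colors for free.
    Card {
        Column(modifier = Modifier.padding(16.dp)) {
            Text(text = title)
            Text(text = subtitle)
        }
    }
}

@Composable
fun DeleteTaskDialog(onConfirm: () -> Unit, onDismiss: () -> Unit) {
    // AlertDialog already follows Material dialog guidelines; no need to
    // spell out paddings or button placement in the prompt.
    AlertDialog(
        onDismissRequest = onDismiss,
        title = { Text("Delete task?") },
        text = { Text("This cannot be undone.") },
        confirmButton = { TextButton(onClick = onConfirm) { Text("Delete") } },
        dismissButton = { TextButton(onClick = onDismiss) { Text("Cancel") } },
    )
}
```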

Taking Back Control

As I reach the stage of taking back control, I feel a sense of relief. There is a lot, and I mean A LOT, of refactoring to do before I can release the app. It works, but it has to be human-rewritten. The happy paths are covered, but LLMs often fail to anticipate and handle error cases, and they can be less creative at breaking things than a typical user.

But I have something to work with. It's easier to fix a broken car than build a new one from scratch.

I will still use Cursor or other AI tools to perform tedious tasks, such as adding new languages, extracting logic to separate files, verifying that I'm using correct logging everywhere, or simply questioning features and architecture in the context of real code.

DHH, in a podcast with Lex Fridman, at some point in a 6-hour conversation (worth it), compared writing code to playing an instrument. You can listen to a recorded song and it will be perfect every time, yet a lot of folks still spend hours learning to play an instrument, just to play a poor version of that song - because playing is a joy. For DHH, coding is that joy, and he doesn't want it taken away by LLMs, even if they can do it well.
I enjoy coding, but I prefer building things more; coding is just a means to achieve that. If I can delegate tedious tasks to an LLM and focus on the bigger picture or actually challenging problems, while overseeing the code generation, I'm OK with that.

The Unexpected Gem

What positively surprised me was using Cursor as a discussion partner for the project. What features are missing? Can it generate a README based on the libraries and UseCases in the code? What else should I add to make it MVP-worthy? It's like that one colleague whose opinion you want but whom you don't want to bother with too many questions - except the LLM doesn't care; it has all the time in the world. You bump up global warming a bit with each question, that's all.

The LLM asks pretty decent clarifying questions. When I requested a new feature with a detailed description of how it should work and look, it still came back with five clarifying questions, each with possible answers. It made me think about how I actually wanted the feature to behave, and sometimes even change my mind about my original, protein-based idea.

Here, LLMs shine as more than just code-generating tools.

My Rules of Thumb for LLM Coding

After 30k lines of AI-generated code and countless hours of ping-pong conversations, here are the rules I've learned the hard way:

Point to existing code instead of describing - LLMs work best when you can reference other parts of your codebase that already have the desired structure. "Make it like the UserService but for Tasks" works better than explaining dependency injection from scratch. "Do as I say, not as I do" will have exactly the same poor result as when raising a toddler.

Use wireframes over verbal descriptions - Paste an image of what you want rather than trying to describe UI in words. I wasted hours saying "move the button to the right and make it smaller" when a simple mockup would have done it instantly.

Start small, then expand - LLMs work better on smaller codebases where they can understand the full context. Once you reach a certain complexity threshold, their reasoning begins to break down. Having a well-organized structure in a project leads to a smaller context, even when working within a large codebase. This works well for protein-based developers too.

Write comprehensive rules files - Treat the LLM like a new team member who needs a detailed project handbook. My rules files became as important as the documentation I'd write for human developers. I used to write docs for "my future self" and my goldfish memory; I never thought I'd need that skill for AI.

Use AI for prototyping, not production - Great for testing ideas quickly and seeing if they make sense, but be prepared for major refactoring. Scrapping AI-generated code leaves no emotional damage, since you didn't spend days writing it.

Leverage AI as a discussion partner - Ask it about missing features, architecture decisions, and MVP requirements. You may already have a solid idea, but you never know what else you don't know. Some ideas may be off the mark, but it doesn't hurt to ask. It's like having that colleague who's always available for brainstorming and never gets annoyed by your questions.

Avoid AI-generated tests - They verify what the code appears to do, not what it should do. The tests brought zero value and caused more compilation errors than the actual code. The LLM can help you write tests, but you should stay in control rather than rely on automagically created unit tests just to hit code coverage.

Use Git - Cursor has some notion of reverting changes, but I learned that committing as soon as something is remotely close to what I want can save me a lot of time. With successive prompts aimed at ironing out one minor issue, the LLM can completely destroy something that was already OK.

It seems I wouldn't have gotten more than 20% of the way through this project without completely understanding everything the LLM generates. Experience not only with coding in general, but with the specific technology being used, is essential. AI is only as good as the programmer who uses it.

The Verdict

Maybe for web apps it works better? For mobile, it's so-so; for the watch, it's even worse. But it allowed me to move fast and change my mind often - I don't need much more at this stage of the project.

So, if you're building another CRUD-like SaaS using Next.js and Tailwind, hosting it on Vercel, and utilizing Supabase, an LLM can replace a team of developers.

It won't, for now, replace seasoned, grumpy CS grads who have seen and made all the possible mistakes a software engineer can commit. But remember coding bootcamps? The place where they explain some frontend JS framework for 6 weeks to career-switching enthusiasts? Yeah, they are fucked.

For anything more sophisticated, niche, or actually interesting, you still have to understand what the LLM is doing and be able to write everything yourself. However, it makes my work faster, at least when used in a narrow context with clear directions.

Writing rules for Cursor, complete with code examples, reminded me of documenting our mobile project a few years ago - to remove any ambiguity about how it worked, how it was built, what the structure was, and how things were implemented. I did it to end useless discussions during code reviews, where each developer had slightly different opinions about the project architecture. Documentation is also useful when new people join the team, since it serves as a project handbook. The LLM is that new guy, except he never actually learns.

But he is excellent for generating AWK commands that I have to use once a decade.
