My experience building an app entirely via an LLM
The good, the bad, and the "wow, this app is ugly"
In this blog post, I write about how I created a web app 100% via an LLM. My major takeaways along the way:
Iterative development continues to be the game
However, Anthropic-imposed message limits make this a bit of a hassle
AI helps a ton with those necessary tiny tasks that no one wants to do
Claude was not able to take over the styling of my app
If you're hoping to be able to build apps without being a coder, I'm sorry, it's not there
"It's about the journey, not the destination." Lately, I've realized that most people turn to LLMs when they care more about getting to a destination than about the journey.
Think about it: authors aren't using LLMs to write their blog posts, because they enjoy the act of writing. People aren't using LLMs to summarize YouTube videos that they actually enjoy watching.
There are also problems with trustworthiness and quality, of course, which is why I say "a destination" rather than "the destination." In some cases, that's not an issue. While an LLM isn't going to be a more evocative author than Salman Rushdie, it doesn't need to be in order to write that report that your boss's boss insisted he needed but will never read.
Recently, I found myself in that sweet spot: I cared little about the journey, and the project was one where quality was less important.
I've set a goal for myself this year of memorizing every station of the Paris Metro on every line, in order. Anki's been useful for this purpose, but I wanted a small app that lets me cram instead of relying on the spaced repetition algorithm. I started to code it myself, and I soon painted myself into a corner where I needed to backtrack and make some significant changes. The process was going to be boring.
So, I turned to AI and decided to see it through to the end. Could I offload the entire work to an LLM?
Turns out, the answer is yes, but I was giving something up in doing so. (Don't worry, no talk about any "ineffable" or "ethereal" quality missing in the LLM output here!)
Iterative is nice until...
I gave the initial requirements to both Claude and Jetbrains AI.
Create an app using Typescript and React components where a user first selects a line from the Paris Metro, and then the app displays all of the stations (with their transfers) of that line in order except for one of the stations is hidden and the user must type in the name of that station. If guessed successfully, that station is revealed and another one is hidden at random. The lines and stations are contained in resources/lines.json (attached), with interfaces of
interface Line {
  name: string;
  terminus_left: string;
  terminus_right: string;
  stations: {
    [stationName: string]: Station;
  };
}
interface Station {
  prev?: string | string[];
  next?: string | string[];
  line: string;
  travel_time_min?: number;
  transfers?: string[] | null;
}
Claude gave me code that better matched the requirements, and so I went with it over Jetbrains. I was specifically using Claude 3 Opus as Anthropic says it performs better at complex tasks, but many people online are high on Claude 3.5 Sonnet for coding, too.
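To make the data shape in that prompt concrete, here is a minimal sketch of the core loop it asks for: put the stations in order, then hide one at random. It is not the code Claude produced. The helper names stationsInOrder and pickHiddenIndex are hypothetical, the sample data is hand-written rather than copied from lines.json, and the sketch assumes that terminus_left is a key of stations.

```typescript
// Interfaces as given in the prompt above.
interface Station {
  prev?: string | string[];
  next?: string | string[];
  line: string;
  travel_time_min?: number;
  transfers?: string[] | null;
}

interface Line {
  name: string;
  terminus_left: string;
  terminus_right: string;
  stations: { [stationName: string]: Station };
}

// Hypothetical helper: walk a line from its left terminus by following `next`.
// Assumes `terminus_left` is a key of `stations`.
function stationsInOrder(line: Line): string[] {
  const ordered: string[] = [];
  const seen = new Set<string>();
  let current: string | undefined = line.terminus_left;
  while (current !== undefined && !seen.has(current)) {
    seen.add(current);
    ordered.push(current);
    const next: string | string[] | undefined = line.stations[current]?.next;
    // At a fork, this sketch just follows the first branch.
    current = Array.isArray(next) ? next[0] : next;
  }
  return ordered;
}

// Hypothetical helper: choose which station to hide.
// `random` is injectable so the choice can be pinned down in a test.
function pickHiddenIndex(count: number, random: () => number = Math.random): number {
  return Math.floor(random() * count);
}

// Hand-written sample (line 3bis), not copied from the real lines.json.
const sampleLine: Line = {
  name: "3bis",
  terminus_left: "Gambetta",
  terminus_right: "Porte des Lilas",
  stations: {
    "Gambetta": { next: "Pelleport", line: "3bis" },
    "Pelleport": { prev: "Gambetta", next: "Saint-Fargeau", line: "3bis" },
    "Saint-Fargeau": { prev: "Pelleport", next: "Porte des Lilas", line: "3bis" },
    "Porte des Lilas": { prev: "Saint-Fargeau", line: "3bis" },
  },
};

const orderedSample = stationsInOrder(sampleLine);
const hiddenSample = pickHiddenIndex(orderedSample.length);
// Print the line with the hidden station masked.
console.log(orderedSample.map((name, i) => (i === hiddenSample ? "?????" : name)).join(" > "));
```

Following only the first branch at a fork is a deliberate simplification here; it is exactly the assumption that the unusual lines described next end up breaking.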
The nice thing about using Claude for this is that I could go along iteratively. Building this game is challenging because most Metro lines are like this:
But then you have just one that's like this:
And one that's like this:
And one that's like this:
I could give Claude one of these new lines and have it handle it:
I have a line that's a little bit different and needs to be accounted for with this code. How can we change the code to work with this line and the other ones?
Here's the JSON representation:
It didn't do it perfectly. For example, on line 10 above, it only altered the code to handle the eastbound journey. That's okay, because I'm still not sure how I would have done it any better. Plus, I was able to send a follow-up message to get a fix in. However, if I had a clearer idea of how to implement this, it would probably have been faster to do it myself. In those early messages, there were a handful of, "no, that still doesn't work" or "that seems to work now, but you removed this other key component."
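For what it's worth, one way to think about these odd lines is to treat prev and next as lists everywhere and enumerate every route in a given direction. The sketch below shows that idea under stated assumptions; it is not what Claude wrote, and asArray, routesFrom, and the toy station names (T1, A1, and so on) are all made up for illustration.

```typescript
// Station interface as given in the original prompt.
interface Station {
  prev?: string | string[];
  next?: string | string[];
  line: string;
  travel_time_min?: number;
  transfers?: string[] | null;
}

type StationMap = { [stationName: string]: Station };
type Direction = "next" | "prev";

// Hypothetical helper: treat "one neighbor", "several neighbors" and "none" uniformly.
function asArray(value?: string | string[]): string[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Hypothetical helper: list every route reachable from `start` in one direction.
function routesFrom(
  start: string,
  stations: StationMap,
  direction: Direction = "next",
  visited: string[] = [],
): string[][] {
  const path: string[] = [...visited, start];
  const routes: string[][] = [];
  for (const name of asArray(stations[start]?.[direction])) {
    // Skip stations already on this path so a loop cannot recurse forever.
    if (path.indexOf(name) === -1) {
      routes.push(...routesFrom(name, stations, direction, path));
    }
  }
  // No onward stations: the path itself is a complete route.
  return routes.length > 0 ? routes : [path];
}

// Toy line with a fork after T2; the names are placeholders, not real stations.
const toyStations: StationMap = {
  T1: { next: "T2", line: "toy" },
  T2: { prev: "T1", next: ["A1", "B1"], line: "toy" },
  A1: { prev: "T2", line: "toy" },
  B1: { prev: "T2", next: "B2", line: "toy" },
  B2: { prev: "B1", line: "toy" },
};

console.log(routesFrom("T1", toyStations));         // both branches, outbound
console.log(routesFrom("B2", toyStations, "prev")); // one branch, walked backwards
```

Making the direction a parameter means both directions go through the same code path, so a fix can't cover one and miss the other the way the eastbound-only fix did.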
Oddly enough, Claude did a really good job when I asked it for something more complex, which was adding new play modes to the game. I asked it first for a "hard mode" where users had to guess a station given only the names of the stations before and after it. It did this well. Then I asked it for an "extra hard mode" where a user had to guess the stations in order. It added this well, too.
There were diminishing returns on how much effort I was saving with Claude. While spinning up the app, a few messages back and forth were fine, because Claude was writing a lot of code that I didn't want to write. But near the end, the app mainly just needed some refinements, and my pledge not to touch the code directly started to be a burden.
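The two modes boil down to very little logic once the stations are in order. Here is a rough sketch of what each one needs; the names HardModeClue, hardModeClue, and isNextInOrder are hypothetical, and the exact string comparison is an assumption rather than how the app actually checks guesses.

```typescript
// Hypothetical shape for what hard mode shows the player.
interface HardModeClue {
  before: string | undefined; // undefined when the hidden station is the first stop
  after: string | undefined; // undefined when the hidden station is the last stop
}

// Hard mode: reveal only the neighbors of the hidden station.
function hardModeClue(orderedStations: string[], hiddenIndex: number): HardModeClue {
  return {
    before: orderedStations[hiddenIndex - 1],
    after: orderedStations[hiddenIndex + 1],
  };
}

// Extra hard mode: the guess must be the next station not yet named.
// Assumes guesses are compared exactly; a real app would likely be more forgiving.
function isNextInOrder(orderedStations: string[], guessedSoFar: number, guess: string): boolean {
  return orderedStations[guessedSoFar] === guess;
}

// Hand-written sample (line 3bis).
const line3bis = ["Gambetta", "Pelleport", "Saint-Fargeau", "Porte des Lilas"];

console.log(hardModeClue(line3bis, 1));
console.log(isNextInOrder(line3bis, 0, "Gambetta"));
```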
At one point, I even set aside a feature because getting Claude to fix its mistake was too much of a hassle:
Let's go ahead and remove the line information on the top incorrect stations. It's not working properly and I don't feel like troubleshooting it.
This was further compounded by the limits that Anthropic puts on long-running chats.
These limits are crimping my style
Anthropic sets limits on how many messages you can send, and these limits are significantly reduced when a conversation gets long or when you include files. The limits reset every five hours, but I found myself hitting the limits again within fifteen minutes of picking the conversation back up.
The limits are more frustrating when Claude comes back with something that doesn't work. I found myself questioning whether it was worth asking a question, knowing that if I didn't get the right answer back, I was going to burn through some of my quota.
To maximize how much I could get out, I tried sending messages during periods when I wouldn't otherwise be working on the app. I'm at home watching my son this week, so when he was playing on the playground, I sent Claude a few messages about turning my app into a PWA, thinking that I would use the output when I got home. Except Claude gave me code that introduced new errors, and backing out of the new state was painful.
I'm glad I didn't have to do that
AI helped a ton when it came to things that I needed to do but I just didn't feel like doing.
For example, while Claude was writing code for me, I still continued to use Jetbrains AI to write my commit messages. I found them to be spot on in most situations, and they saved me from writing messages that would be impenetrable when I come back to them in the future.
Claude has no design taste at all
The thing I least like doing, though, is exactly where Claude offered no help at all: styling the app.
I was really excited here, and maybe it's just Claude, but the results were terrible.
First, I tried giving Claude a screenshot and some instructions.
This is going great.
Now let's get to styling it.
I want it to look like this, with the input continuing to be there in place of the hidden station name. I want it to be horizontal in cases where the screen is wide (like a laptop) and vertical in other cases (like a tablet or phone).
The color of the line will vary depending on each line. Don't worry about that for now, though.
Let's see what Claude gave me.
Ahhh!!!
I tried going back and forth with Claude until finally giving up. The design never got closer to my desired output than the initial effort. Instead, I decided to focus on adding new functionality before coming back to the design at the end, reasoning that new functionality would need to be styled anyway. When I got to that point, I instructed Claude to start over:
Okay, this works well. But the page is ugly. Ignore what I asked you before in regards to styling or what you gave me (where applicable), and just make this page look presentable.
I had to prompt it another time to forget the previous styling, after which I got a plain but passable set of styles.
This was the biggest disappointment by far. Others have had better luck with this, so maybe I just got unlucky, or maybe I need to prod Claude a bit more. Either way, this is the area where I most want AI to take work away from me.
Claude's not writing the most maintainable code
If Claude isn't giving you great design, it's not giving you the most elegant code, either. The primary function is 417 lines, because everything is stuffed in there except for imports and one function that I had Jetbrains write for me. Inside that function, there's a 35-line function to handle the guess on extra hard mode and a 36-line function to hide a station.
And, of course, there are no tests.
Now, I probably could have guided Claude to make smaller functions, create multiple React components, and write tests. But see the section on the message limit to see why I didn't.
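To give a sense of what "smaller functions and tests" could look like, here is a minimal sketch of one piece pulled out of the component as a pure function, with a tiny check next to it. Everything in it is an assumption for illustration: the names normalizeName and isCorrectGuess are hypothetical, and the rule of ignoring case, accents, and extra spaces is my guess at a reasonable comparison, not what the app does.

```typescript
// Hypothetical normalization rule: ignore case, accents and extra spaces.
function normalizeName(name: string): string {
  return name
    .normalize("NFD") // split letters from their accents
    .replace(/[\u0300-\u036f]/g, "") // drop the combining accent marks
    .toLowerCase()
    .trim()
    .replace(/\s+/g, " "); // collapse runs of whitespace
}

// Pure function: no React state involved, so it can be tested on its own.
function isCorrectGuess(guess: string, answer: string): boolean {
  return normalizeName(guess) === normalizeName(answer);
}

// A tiny table-driven check that runs without React or a test framework.
const guessCases: Array<[string, string, boolean]> = [
  ["Père Lachaise", "Père Lachaise", true],
  ["  pere   lachaise ", "Père Lachaise", true],
  ["Gambetta", "Pelleport", false],
];

for (const [guess, answer, expected] of guessCases) {
  if (isCorrectGuess(guess, answer) !== expected) {
    throw new Error(`isCorrectGuess("${guess}", "${answer}") should be ${expected}`);
  }
}
```

Because the function takes plain strings and returns a boolean, it can live outside the 417-line component and be checked without rendering anything.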
You still need some background knowledge
Thinking about it now, there's actually one place where AI didn't write code for me. Claude didn't import useEffect when it was necessary, and so instead of prompting it again, I went ahead and added the import. Sorry!
And that shows that you still need to have some level of coding knowledge to do this. Even when Claude doesn't mess up, it also doesn't give you the complete code, instead showing you just what has changed, and leaving the rest as "the rest stays unchanged." You are expected to know what to change and what to leave alone. If you think that's a simple proposition, you've never taught code to non-coders.
In closing
Overall, I was pretty happy with the outcome, despite the challenges. I went from a project that I was procrastinating on and was likely to keep putting off, to a "finished" project in a few days. (Again, message limits.) This is entirely for myself, so I'm happy with the state it's in now, but if I wanted to share this more broadly, there's still some work left to do.
Would I use this if coding were still my job? Yes, but definitely not to this extent. Jetbrains AI has been truly useful for getting around opaque error messages and doing the necessary work I don't really love doing (like JSDoc comments or commit messages). I might even use Claude or another service to spin up a proof of concept, but it's unlikely that I'll be getting to 90% complete for an app that I'm getting paid to make.
If you'd like to check out the app (no promises it isn't broken, again, no tests from Claude!), give it a look here: https://sparkly-gnome-8f2efd.netlify.app