Well, I've offered what I can to help. If your experience is mostly free chatbots, I would definitely suggest trying Opus 4.5 or 4.6 in Claude Code. The agentic harness of the software around the model (ie Claude Code) is important. Consider also that some of us have been doing this for a year and have already built our own MCP server tooling to go faster. Giving your AI the same kind of deterministic software tools that you use is important (eg make sure your AI has access to a diff tool, don't make it try and do that "in its head", you wouldn't ask that of a human).
As for listening to Hacker News... yeah, this is one of the worst places (well, Mastodon is worse) and HN is surprisingly AI-doomerish. I don't check in here very often anymore, and as of this week I just get Claude to summarize HN headlines as a morning podcast for me instead.
My own experience: my first few uses of Claude in Dec 2024 seemed rubbish, I didn't get it. Then one day I asked it to make me a search engine. The one shot from that wasn't perfect, but it worked, and I saw it build it in front of my eyes. That was the moment & I kept iterating on it. I haven't used Google or Kagi in almost a year now.
Anyway, hope it helps, but if not using AI makes you feel more comfortable, go with what fills your life with more value & meaning & enjoyment.
No, but I index parts of the web that are important to myself, sites I frequently reference. (I have all of Simon Willison's site indexed, for example.) It turns out that a simple SQLite database is a lot more capable and faster than I thought. I index from my laptop, using another tool I built with Claude. I don't crawl or spider, I focus on indexing from sitemap.xml files and RSS feeds. I have about 1.5 Million pages in my local index, and I get search results in 40ms - 70ms, thereabouts.
For every search that doesn't find results - and of course that's still the majority - it falls back to a meta-search combining results from Brave, Mojeek & Marginalia. The core of that metasearch is what Claude 3.5v2 generated for me in a one-shot back in Dec 2024. Kagi is just a metasearch with a very small local index as well, and my main purpose in building this was replacing Kagi for my needs.
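Not my schema, but for anyone curious about the general shape of an SQLite full-text index like this, a minimal sketch (assuming FTS5 is available; table and column names are made up) looks roughly like:

```
# Minimal sketch of an FTS5-backed local index (not the commenter's actual schema).
import sqlite3

con = sqlite3.connect("index.db")
con.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(url, title, body)")

def add_page(url, title, body):
    con.execute("INSERT INTO pages(url, title, body) VALUES (?, ?, ?)", (url, title, body))
    con.commit()

def search(query, limit=10):
    # bm25() is built into FTS5; lower (more negative) scores are better matches.
    cur = con.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ? ORDER BY bm25(pages) LIMIT ?",
        (query, limit),
    )
    return cur.fetchall()
```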
The last 10% of my queries were widget queries: currency conversion, distance & temperature conversion, and the like.
Not so experienced with Claude Code, but the web version does the job for me. I mean, I don't create very complex stuff for others (mostly websites and simple apps), but it did some things that I am proud of. I'm not a dev; I have some experience with Python, but even that is really basic stuff. So here's what I do (it makes things really easy, at least for the simple stuff I do):
- Start a new project and discuss every single detail about it with Claude.
- Tell it to write a txt or pdf summary of everything and place that file in project knowledge.
- After that tell it to give you a complete project structure and create it
- Then simply start populating the files.
- Place each file in your project knowledge so it can see it
After this it's just debugging, which mostly goes smoothly as well
I've been playing with it for almost two years now, and this is what gets me there. ChatGPT never got even close to it.
You aren't telling us anything about how you're using it. So how can we tell you what you're doing wrong? You're just reporting what happened.
You haven't even said what programming language you're trying to use, or even what platform.
It sounds to me like you didn't do much planning, you just gave it a prompt to build away.
My preferred method of building things, and I've built a lot of things using Claude, is to have a discussion with it in the chatbot. The back and forth of exploring the idea gives you a more solid idea of what you're looking for. Once we've established the idea I get it to write a spec and a plan.
I have this as an instruction in my profile.
> When we're discussing a coding project, don't produce code unless asked to. We discuss projects here, Claude Code does the actual coding. When we're ready, put all the documents in a zip file for easy transfer (downloading files one at a time and uploading them is not fun on a phone). Include a CONTENTS.md describing the contents and where to start.
So I'll give you this one as an example: it's a Qwen-driven system monitor.
What you are trying to do is quite easy to do with Claude. I have done way more complex things than that in hours. But having programming, managing(with humans) and engineering experience is extremely useful.
It seems you try to tell the tool to do everything in one shot. That is a very wrong approach, not just with Claude but with everything (you ask a woman for a date and if you do not get laid in five minutes, you failed?). When I program something manually and it compiles, I expect it to be wrong. You have to iron it out and debug it.
Instead of that:
1. Divide the work into independent units. I call these "steps".
2. Subdivide steps into "subsets".
You work in an isolated manner on those subsets.
3. Use an immediate-mode GUI library like Dear ImGui to prototype your tool. Translating it into something else once it works is quite easy for LLMs.
4. Visualize everything. You do not need to see the code, but you need to visualize every single thing you ask it to do.
5. Tell Claude what you want and why you want it, and update the documentation constantly.
6. Use git in order to make rock-solid steps that Claude will not touch once they work, so you can revert changes or ask the AI to explore a branch, explaining how you did something and that you want something similar.
7. Do not modify code that already works rock solid. Copy it into another step, leaving the old step as a reference, and modify it there.
8. Use logs. Lots of logs. For every step you create text logs and you debug problems by giving Claude the logs to read (see the sketch below).
9. Use screenshots. Claude can read screenshots. If you visualize everything, Claude can see the errors too.
10. Use asserts, lots of asserts, just like with manual programming.
It is not that different from managing a real team of people...
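To make points 8 and 10 concrete, here is a hypothetical per-step sketch (the file name and data format are made up); the point is that Claude can open the log file itself instead of you pasting terminal output:

```
# Hypothetical example of per-step logging plus asserts, so Claude can read
# the log file instead of you copy-pasting terminal output.
import logging

logging.basicConfig(
    filename="step03_load_points.log",  # one log file per step (made-up name)
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(message)s",
)

def load_points(path):
    logging.info("loading points from %s", path)
    points = []
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            x, y = line.split(",")
            points.append((float(x), float(y)))
            logging.debug("line %d -> %s", lineno, points[-1])
    assert points, "no points loaded"
    assert all(len(p) == 2 for p in points), "malformed point"
    logging.info("loaded %d points", len(points))
    return points
```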
Stuff like "divide the work up" is something you do when doing it yourself. Making a GUI prototype isn't really much work at all in the age of LLMs, akin to drawing up a few ideas on a notepad. Using git for small steps is something lots of people do for their own work and rebase later. Using extensive logging is mostly just something you have in your AGENTS.md for all your projects and forget about, similarly getting it setup to make and look at screenshots.
What part of this is more work than doing it yourself?
It’s more work in the same sense that trying to delegate a task to someone who doesn’t understand what needs to be done, and needs their hand held, is more work than doing it yourself.
This is especially true when the vision is a little hazy and the path isn’t clear. When doing it yourself, you can make decisions in the moment, try things, pivot… when trying to delegate these things, it becomes a chore to try to clarify things that are inherently unclear, and pivot an idea when the person (or AI) being delegated to doesn’t fully grasp the pivot and keeps bringing in old ideas.
I think most people have had an experience trying to delegate a task, where it becomes so much work to wrangle the person, that they just do it themselves. I’ve run into this countless times. That’s how it feels to use AI.
It’s really not. For anything substantial, the things that you do to manage an LLM are the same things that you should be doing to manage a team of human devs, even if the team is just yourself.
Documentation. Comments. Writing a plan and/or a spec before you begin coding. Being smart with git commits and branches.
Not even close. A friend and I are working on an iOS game (a tower defense style game). We are writing 0 code ourselves. We both have a history of iOS development; he is still actively involved and I've moved away from it in recent years.
In about 2 weeks we have a functional game: 60 levels, 28 different types of enemies, a procedurally generated daily challenge mode, an infinity mode, tower crafting and upgrades, and an economy system in the game to pay for the upgrades.
This likely would have taken us months to get to the point that we are at, it was playable on Day 2.
I start with what I want to build. In the initial prompt I provide an overview of what I want, and then some specifics. Last night I added an archive to the Daily Challenge mode, so if you missed a day's challenge you could go back and play it. This is what my initial prompt looked like:
---
I'd like to add an archives mode to the daily challenge. This will allow players to complete any daily challenges they didn't attempt on the actual day.
It will look like a calendar, with the dates in Green if it was played, and in white if not.
The archive should only go back to January 30, 2026, the day the project started. Include a to do to change this date prior to release.
Rewards for completing daily challenges via the archive should be 25% of the normal value.
---
Claude Code then asked me a couple of clarifying questions before it harnessed the superpowers:writing-plans skill and generated a document to plan the work. The document it put together is viewable at https://gist.github.com/Jeremy1026/cee66bf6d4b67d9a527f6e30f...
There were a couple of edits that I made to the document before I told it to implement. It then fired off a couple of agents to perform the tasks in parallel where possible.
Once it finished I tested and it worked as I had hoped. But there were a couple of follow-up things that would make it more intertwined with everything else going on around daily challenges. So I followed up with:
---
lets give 1 cell for compelting an archived daily challenge
---
And finally:
---
Now that we are tracking completions, can we update the notification to complete daily mission to include "Keep your X day streak"
---
Sounds like I should give Claude Code another try. The last time I worked with it, it was quite eager to code without a good plan, and would overcomplicate things all the time.
Not entirely relevant, but the example I remember is I asked for help with SQL to concatenate multiple rows into a single column with SQL Server and instead of reminding me to use STRING_AGG, it started coding various complicated joins and loops.
So my experience is/was a little different. Regardless, I think I should take one of my old programs and try implementing it from ground up by explaining the issue I'm trying to solve to see how things progress, and where things fail.
Another example is the tower stat caps. When Claude Code generated the first pass, it made it so that the tower level would control each individual stat's cap, which was way too high. I didn't know exactly what the limits were, but knew they needed to be pulled back some. So I asked it:
-Start Prompt-
Currently, a towers level sets the maximum a single stat can be. Can you tell me what those stat caps are?
-End Prompt-
This primed the context to have information about the stat caps and how they are tied to levels. After it gave me back a chart with Tower Level and Max Stat Rank, I followed up with some real stats from play:
-Start Prompt-
Lets change the stat cap, the caps are currently far too high. All towers start at 1 for each IMPACT stat, my oldest tower is Level 5, and its stats are I-3, M-4, P-6, A-3, C-1, T-1. How do you think I could go about reducing the cap in a meaningful way.
-End Prompt-
It came back with a solution to reduce the individual stat cap for individual stats to be tower level + 1. But I felt that was too limiting. I want players to be able to specialize a tower, so I told it to have the stat cap be total, not per stat.
-Start Prompt-
I'm thinking about having a total stat cap, so in this towers case, the total stats are 18.
-End Prompt-
It generated a couple of structures of how the cap could increase and presented them to me.
-Start Prompt-
Yes, it would replace the per-stat cap entirely. If a player wants to specialize a tower in one stat using the entire cap that is fine.
Lets do 10 + (rank * 3), that will give the user a little bit of room to train a new tower.
Since it's a total stat cap, if a user is training and the tower earns enough stat xp to level beyond the cap, lock the tower at max XP for that stat, and autoamtically level the stat when the user levels up the tower.
-End Prompt-
It added the cap, but introduced a couple of build errors, so I sent it just the build errors.
-Start Prompt-
/Users/myuser/Development/Shelter Defense/Shelter Defense/Views/DebugTowerDetailView.swift:231:39 Left side of mutating operator isn't mutable: 'tower' is a 'let' constant
/Users/myuser/Development/Shelter Defense/Shelter Defense/Views/DebugTowerEditorView.swift:181:47 Left side of mutating operator isn't mutable: 'towerInstance' is a 'let' constant
-End Prompt-
What you’re doing is the so called “slot machine AI”, where you put some tokens in, pray, and hope to get what you want out. It doesn’t work that way (not well, at least)
The LLM under the hood is essentially a very fancy autocomplete. This always needs to be kept in mind when working with these tools. So you have to focus a lot on what the source text is that’s going to be used to produce the completion. The better the source text, the better the completion. In other words, you need to make sure you progressively fill the context window with stuff that matters for the task that you’re doing.
In particular, first explore the problem space with the tool (iterate), then use the exploration results to plan what needs doing (iterate), when the plan looks good and makes sense, only then you ask to actually implement.
Claude’s built in planning mode kind of does this, but in my opinion it sucks. It doesn’t make iterating on the exploration and the plan easy or natural. So I suggest just setting up some custom prompts (skills) for this with instructions that make sense for the particular domain/use case, and use those in the normal mode.
I mainly use it in a work context where it’s not my money I burn. I do have a private subscription that I’m going to use for a project. Do you have any tips how to try and do kind of what I describe, but in a more cost sensitive way?
Just burn the tokens. It’s an upfront cost that you pay once at the beginning of a project, or on a smaller scale at the beginning of a major feature.
For context, I’ve built about 15k loc since Christmas on the $20 plan, plus $18 of extra usage. Since this is a side project, I only hit the limits once or twice per week.
1. How are you using Claude? Are you using https://claude.ai and copying and pasting things back and forth, or are you running one of the variants of Claude Code? If so, which one?
2. If you're running Claude Code have you put anything in place to ensure it can test the code it's writing, including accessing screenshots of what's going on?
I added a comment about GSD here and it was nice to see yours too... I'm just a user of GSD, but boy has it changed the context rot I used to experience. The system is just one-shotting everything I ask, from basic to complex, and it finally handles stuff without making things up or messing stuff up (or going in circles)...
I (and my colleagues) get consistently good results, and I am starting to believe it is because our experience is in running large projects for decades with outsourcing companies. We would get assigned a project and a company, usually on the other side of the world, and we would need to make it work: it seems LLMs are pretty much the same type of work. And we get consistent gains (better than with outsourcing on average, also because our tasks run 24/7, which was never financially possible even with the large clients we worked with) from this. Reading that so many people have issues with even trivial stuff makes me think my team has at least some kind of skill others do not have; we kind of assumed everyone was getting the same benefits, really.
1. Make sure you are using the Opus model. Type /model and make sure Opus is selected. While many say Sonnet is good too, I’m not too convinced. Opus is the first model that actually convinced me to use AI as my daily driver - and I’ve been a developer for about 20 years.
2. Make the tasks as small and specific as possible. Don’t prompt „create a todo app with user login“ but „create a vue app where users can register, don’t do more than that“, then „build a user login“ then, „create a page to create todo items“, then „create a page to list todo items“, then „on list page, add delete functionality“ - and so on, you get the idea.
3. Beware the context size. Claude Code will warn you if you exceed it, but even before that: the fuller the context window, the more likely the AI is to miss things. If you start a new prompt that doesn’t require the whole context of the previous one, type /clear.
4. Build an agents.md or Claude.md. /init will do that for you, but it will just create a Claude.md with information that it thinks is important - and it easily misses things. You know best. It often also includes the file and directory structure, while it could easily find that out again (tree command) without that info in the agents/claude file. Still, I recommend: let Claude create that file, then adjust it to your needs. Only add important stuff here. The more you add, the more you spam the context. Again, try to keep context small.
5. If Claude needs a long time to finish a task or got it wrong on the first attempt, tell it to update the Claude.md with information so it does not make the same mistakes the next time.
6. Make sure you understand the code it created. Add conventions to agents.md that will make the code more readable (use early returns, don't exceed a nesting level of 3, create new methods with meaningful names instead of inline comments, etc.); there's a quick early-return example below.
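As a quick illustration of the early-return convention (my own toy example, the order/dispatch names are made up):

```
def dispatch(order):  # stand-in for the real shipping call
    print("shipping", order)

# Nested version: three levels deep before any work happens.
def ship_order_nested(order):
    if order is not None:
        if order["paid"]:
            if not order["shipped"]:
                dispatch(order)

# Early-return version: same behavior, flat and easier to review.
def ship_order_flat(order):
    if order is None:
        return
    if not order["paid"]:
        return
    if order["shipped"]:
        return
    dispatch(order)
```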
While I have had some good experiences with CC, I do use at least double the tokens and probably more like 5x going through fixes / debugging from its initial efforts. I don't think this is always bad, because it helps me to understand some of the more complicated interactions of existing and new code and improves documentation, but it's irritating when it runs out of usage allotments when it has broken something. There are some small things it never has managed to fix that I have to figure out myself, but again, I learn from that. Mapping out a data structure in advance and creating a plan before immediately coding can also help, but at least in our project, sometimes it just takes an incorrect approach and so I don't just let it go off and do things willy-nilly. I can't at all imagine having an agent free to maintain the code at this point, despite the past 2 weeks' hype cycles.
If you’re using claude code/cursor, you should be using plan mode.
There are 3 major steps:
(Plan mode)
1. Assuming this is an existing codebase, load the relevant docs/existing code into context (usually by typing @<PATH>).
2. Ask it to make a plan for the feature you want to implement. Assuming you’ve already put some thought into this, be as specific and detailed as you can. Ask it to build a plan that’s divided into individually verifiable steps. Read the plan file that it spits out, correct any bad assumptions it made, ask it questions if you’re unclear on what it’s saying, refine, etc.
(agent mode)
Ask it to build the plan, one step at a time. After it builds each step verify that it’s correct, or have it help you verify it’s correct in a way you can observe.
I have been following this basic process mostly with Opus 4.5 in a mixture of claude code and cursor working on a pretty niche image processing pipeline (also some advanced networking stuff on the side) and have hand-written basically zero code.
People say - “your method sounds like a lot of work too” and that’s true, it is still work, but designing at a high level how I want some CUDA kernel to work and how it fits into the wider codebase and then describing it in a few sentences is still much faster than doing all of the above anyway and then hand writing 100 lines of CUDA (which I don’t know that well).
I’d conservatively estimate that i’ve made 2x the progress in the same amount of time as if I had been doing this without LLM tools.
It takes many months to figure this out, much longer than learning a new programming language.
Read through Anthropic's knowledge share, check out their system prompts extracted on GitHub, and write more words in AGENTS/CLAUDE.md; you need to give them some warmup to do better at tasks.
What model are you using? Size matters and Gemini is far better at UI design work. At the same time, pairing gemini-3-flash with claude-code derived prompts makes it nearly as good as Pro
Words matter, the way you phrase something can have disproportionate effect. They are fragile at times, yet surprisingly resilient at others. They will deeply frustrate you and amaze you on a daily basis. The key is to get better at recognizing this earlier and adjusting
You can find many more anecdotes and recommendations by looking through HN stories and social media (Bluesky has a growing AI crowd coming over from X, with a good community bump recently; there are anti-AI labeler/block lists to keep the flak down)
First of all, congratulations on asking this question. It seems that everyone is an AI expert these days and it takes courage to admit you're not one of them (neither am I, nor most everyone else).
In my little experience, what I've seen work is that you need to provide a lot of constraints in the form of:
- Scope: Don't build a website, but build a feature (either user-facing or infra, it doesn't matter). I've found that chunking my prompts into human-manageable tasks that would take 0.5-1 day is enough of a scale-down.
- Docs .md files that describe how the main parts of the application work, what a component/module/unit of code looks like, what tools&technologies to use (and links to the latest documentation and quickstart pages). You should commit these to code and update them with every code change (which with Claude is just a reminder in each prompt).
- Existing code, if it's not a greenfield project.
It really moves away from the advertised paradigm of one-shot vibe-coding but since the quality of the output is really good these days, this long preparation will give you a production ready output much sooner than with traditional methods.
This reminds me of someone who dropped into #java on undernet once upon a time in the 90s. "I can't get it to work" , and we kept trying to debug, and for some reason we kept hitting random new walls. It just never would work! Turns out that they were deleting their .java file and starting over each time. Don't do that.
---
Take it as a sequence of exercises.
Maybe start like this:
Don't use claude code at all to begin with. It's a pair programming exercise, and you start at the keyboard, where you're confident and in control. Have claude open in the web interface alongside, talk through the design with it while working; ask it to google stuff for you, look up the API, maybe ask if it remembers the best way(s) to approach the problem. Once you trust it a bit, maybe ask for code snippets or even entire functions. They can't be 100% correct because it doesn't have context... you might need to paste in some code to begin with. When there are errors, paste them in, maybe you'll get advice.
If you're comfy? Switch seats, start using claude code. Now you're telling claude what to do. And you can still ask the same questions you were asking before. But now you don't need to paste into the web interface anymore, and the AI sure as heck can type faster than you can.
Aren't you getting tired of every iteration where you're telling the AI "this went wrong", " that went wrong"? Maybe make sure there's a way for the AI to test stuff itself, so it can iterate a few cycles automatically. Your LLM can iterate through troubleshooting steps faster than you can type the first one. Still... keep an eye on it.
Similarly underwhelming experience. I have been using it for about a week. I like how unlike Gemini or ChatGPT I don't have to copy+paste code. I have to keep going back to it to tell it I don't want it to access unrelated project folders beyond the scope of the problem, then it runs out of tokens for the next few hours, or gives me a response of varying quality.
My pattern-matching brain says this is normal for hype. It's a good product, but nowhere near the level you read about in some places (like HN in this case)
You need to give it the tools to check its own work, and remove yourself from that inner low-level error resolution loop.
If you're building a web app, give it a script that (re)starts the full stack, along with Playwright MCP or Chrome DevTools MCP or agent-browser CLI or something similar. Then add instructions to CLAUDE.md on how and when to use these tools. As in: "IMPORTANT: You must always validate your change end-to-end using Playwright MCP, with screenshot evidence, before reporting back to me that you are finished.".
You can take this further with hooks to more forcefully enforce this behavior, but it's usually not necessary ime.
There are skills available that might help you out. The “superpowers” set from Anthropic is really impressive.
The idea is, you want to build up the right context before starting development. I will either describe exactly what I want to build, or I ask the agent for guidance on different approaches. Sometimes I’ll even do this in a separate Claude (not Claude Code) conversation, which I feel works a bit faster. Once we have an approach, I will ask it to create an implementation plan in a markdown file, I clear context and then tell it to implement the plan.
Check out the “brainstorming” skill and the “git worktrees” skill. They will usually trigger the planning -> implementation workflow when the work is complex enough.
The problem I run into is the propensity for it to cheat so you can't trust the code it produces.
For example, I have this project where the idea is to use code verification to ensure the code is correct, the stated goal of the project is to produce verified software and the daffy robot still can't seem to understand that the verification part is the critical piece so... it cheats on them so they pass. I had the newest Claude Code (4.6?) look over the tests on the day it was released and the issues it found were really, really bad.
Now, the newest plan is to produce a tool which generates the tests from a DSL so they can't be made to pass and/or match buggy code instead of the clearly defined specification. Oh, I guess I didn't mention there's an actual spec for what we're trying to do which is very clear, in fact it should be relatively trivial to ensure the tests match for some super-human coding machine.
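For the curious, the rough shape of that idea (not the actual tool; the spec format and the mycode module are made up) is something like:

```
# Toy sketch: each spec line is "name | input | expected", and the test cases
# are generated from it, so the agent can't edit expectations to match buggy code.
SPEC = """
add_small    | 2,3  | 5
add_negative | -4,9 | 5
add_zero     | 0,0  | 0
"""

def parse_spec(text):
    cases = []
    for line in text.strip().splitlines():
        name, args, expected = (part.strip() for part in line.split("|"))
        a, b = (int(x) for x in args.split(","))
        cases.append((name, a, b, int(expected)))
    return cases

def test_generated_cases():
    from mycode import add  # the implementation under test (assumed module)
    for name, a, b, expected in parse_spec(SPEC):
        assert add(a, b) == expected, f"{name}: add({a},{b}) != {expected}"
```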
Not nearly enough context to be a front page post but here we are. Does everyone just give up on all fundamental expectations including determinism when they see AI in the title? Is the AI final boss to take over the front page of hacker news? /rant
To answer the question I would highlight the wrong regions in neon green manually via code. Now feed the code (zipped if necessary) to the AI along with a screenshot. Now give it relatable references for the code and say "xxxx css class/gtk code/whatever is highlighted in the screenshot in neon. I expect it to be larger but it's not, why?"
I’m probably going to be downvoted for this, but this thread doesn’t really reflect well on the promises of Generative AI, and particularly the constantly reiterated assurance that we’re on the verge of a new Industrial Revolution.
I'm feeling the same way. It's quite the contrast from all the hype posts that make it sound like you give the AI a rough idea of what you're looking for and then it will build it from start to finish on its own.
Yes. I’m trying it, it’s too early for me to state a conclusion, but it’s not clear what the point is of an interface that requires magic touch best described as je ne sais quoi.
The alternative to this isn’t even necessarily no AI, just not using it this way.
Agreed. Many of the suggestions are pretty much code it yourself, but without actually tapping the individual keys.
Furthermore, and more generally, one of the great things about (traditional) coding is that it allows 'thinking through making' - by building something you learn more about the problem and thus how best to solve it. Code generation just leaves you with reviewing, which is less powerful in this way I believe. See also 'thinking through writing [prose]'.
I said it since the very first video of somebody who built a login page with it. They kept adding more and more constraints and at some point it's just coding but with extra steps.
It doesn't mean those tools do not have value, though, but they're not capable of "coding", in the sense we mean in the industry, and generating code isn't coding.
I don't think the OP gave enough information for us to really have any honest conversation about this one way or the other.
That said: I suspect that OP is providing low-detail prompts.
These tools cannot read your mind. If you provide an under-specified prompt, they will fill in all the details for things that are necessary to complete the task, but that you didn't provide. This is how you end up with slop.
Get the superpowers plugin and then ask Claude to design and document the system. It will go into brainstorming mode and ask you a lot of questions. The end result will be a markdown file. Then get another agent (maybe ChatGPT) to critique and improve the design (upload the markdown file in the web version). Then give it back to Claude and have it critique and improve. Last step, make Claude analyze the design and then document a step by step implementation guide. After that turn Claude code loose on implementation. Those techniques have been working for me when doing a project from scratch.
I listened to a conversation between two superstar developers in their 50's, who have been coding for more than most readers here have been alive, about their experience with Claude Code.
I wanted to tear my ears out.
What is crystal clear to me now is using LLMs to develop is a learned and practiced skill. If you expect to just drop in and be productive on day one, forget it. The smartest guy I know _who has a PhD in AI_, is hopeless at using it.
Practice practice practice. It's a tool, it takes practice. Learn on hobby projects before using it at work.
The problem is that it’s being marketed like it’s magic and will make people obsolete… not as a tool with a high learning curve.
I don’t blame people for being upset when it can’t do what all the hype says it will do.
The way people talk about the latest Claude Code is the same way people were talking 2-3 years ago about whatever the latest model was then. Every release gets marketed as if it’s a new level of magic, yet we’re still here having the same debates about merit, because reality doesn’t match the marketing and hype.
It has gotten better, I tried something with early ChatGPT that failed horribly (a basic snake game written in C), and just tried the exact same thing again last week and it worked—it wasn’t good, but it technically worked. But if it took 3 years to get good enough to pass my basic test, why was I being fed those lies 3 years ago? The AI companies are like the boy who cried wolf. At this point, it’s on them to prove they can do what they say, not up to me to put in extraordinary efforts to try and get value out of their product.
Last week I sat through a talk from one of our SVPs who said development is cheap and easy now, then he went on about the buy vs build debate for 20 minutes. It’s like he read a couple articles and drank the kool-aid. I also saw someone talking about ephemeral programs… seeing a future where if you want to listen to some MP3s, you’ll just type in a prompt to generate a bespoke music player. This would require AI to reliably one-shot apps like Winamp or iTunes in a few words from a layperson with no programming background. These are the ideas the hype machine is putting in people’s minds that seem detached from reality.
I don’t think the, “you’re holding it wrong”, type responses are a good defense. It’s more that it’s being marketed wrong, because all these companies need to maintain the hype to keep raising money. When people use the AI the way the hype tells them it should work… it doesn’t work.
I gave it a basic one or two sentences prompt for the snake game, the code it generated wouldn’t compile 3 years ago. It was also unable to fix the errors. A similar prompt last week worked. It wasn’t a “good” version of the game, but it compiled and functioned.
The process being described by many in the comments removes all the magic. It sounds laborious and process heavy. It removes the part of the job I like, while loading the job with more work I don’t enjoy. This feels like the opposite of what we should want AI to do.
This is going to sound crazy but I felt it was super degraded this morning.
CC was slow and the results I was getting were subpar having it debug some easy systems tasks. Later in the afternoon it recovered and was able to complete all my tasks. There’s another aspect to these coding agents: the providers can randomly quantize (lobotomize) models based on their capacity, so the model you’re getting may not be the one someone else is getting, or the same model you used yesterday.
It was a bit of a learning curve for me as well. I've found having programming experience comes in handy. I know a lot of non-technical folks (people who don't know how to program) who keep bumping their heads on these tools; crunching through credits when a simple update to the code/push to repo is all that's needed.
When claude or codex does something other than what you want, instead of getting mad at it, ask it what it saw in your prompt that led it to do what it did, and how should you have prompted it to achieve what you wanted. This process tends to work very well and gives you the tools you need to learn how to prompt it to achieve the results you want.
I don't wish to be harsh but why, upon encountering a syntax error, would you have the next step be "redo everything from scratch?" This seems odd to me.
You need to be very specific about what to build and how to build it, what tools to use, what architecture it should use, what libraries and frameworks it should include. You need to be a programmer to be able to do this properly and it still takes a lot of practice to get it right.
1. Good for proof of concepts, prototypes but nothing that really goes to heavy production usage
2. Can make some debugging and fixing that usually requires looking the stack, look the docs and check the tree
3. Code is spaghetti all the way down. One might say it is OK because it is fast to generate, but the bigger the application, the more expensive every change gets, and it always forgets to do something.
4. The tests it generates are mostly useless. 9/10 times everything passes on the tests it creates for itself, but the code does not even start. No matter what type of test.
5. It frequently lied about the current state of the code and only when pushed would it admit it was wrong.
As others said, it is a mix of the (misnomer) Dunning-Kruger effect and some hype.
I tried possibly every single trick to get it working better but I feel most are just tricks. They are not necessarily making it work better.
It is not completely useless, my work involves doing prototypes now and then and usually they need to be quite extensive. For that it has been a help. But I don't feel it is close to what they sell
I think at the current state of the art, LLM tools can help you build things very quickly, but they can't help you build something you yourself are incapable of building, at least not in a sustained way. They need hand holding and correction constantly.
I don't think that's true; I've seen examples to the contrary. Here for example is a recent article [1] from a non-programmer building a tool. The article is long, so I pasted the relevant part below. My thoughts go more in the direction that the article author built something that is complicated for non-technical people but in essence simple -- he says so himself: "copy and pasting". What if what the OP here is building is something novel and Claude doesn't know how to build it?
Relevant excerpt:
I spent a bit of time last month building a site to solve a problem I’ve always found super-annoying about the legislative process. It’s hard to read Federal bills because they don’t map to the underlying code. Anyone who has worked in Congress knows what I mean, you get a bill that says “change this word from ‘may’ to ‘shall’ in section XYZ of Federal law.” To understand what it does, and find possible loopholes they are trying to sneak in, you have to go to that underlying Federal law and look at where it says “may” and then put “shall” in there and read it. It’s basically like a manual version of copy and pasting, except much more complicated and with lawyers trying to trick you.
So I wrote an app that lets you upload legislation, and it automatically shows you how it changes Federal law. There are commercial versions of this software, and some states do it for their proposed legislation. But I haven’t seen anything on the Federal level that is free, so I built it. (The code is here.) It’s not very good. It’ll probably break a lot. There’s no “throat to choke” if you use it and it’s wrong. And my guess is that Anthropic or Gemini ultimately will be able to do this function itself eventually. But the point is that if I can build something like this in my spare time and deploy it without any training at all, then it’s just not that hard for an organization with some capital to get rid of some of its business software tools.
You should try out the GSD system, check "GSD github repo"... it will launch everything as a phase and steps and clear context for you so you never compact, and tons of other features...
The whole thing is worth reading, multiple times over, if you want to find success in using the tool. But I'll call attention to this passage especially:
> Include tests, screenshots, or expected outputs so Claude can check itself. This is the single highest-leverage thing you can do.
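As a toy example of what "expected outputs so Claude can check itself" can look like (the function and golden-file path are made up):

```
# Toy "expected output" check: the agent can run this after every edit and
# compare against a golden file instead of asking you whether it looks right.
from pathlib import Path

def render_report(rows):                      # stand-in for the code under test
    return "\n".join(f"{name}: {count}" for name, count in rows)

def test_report_matches_golden():
    got = render_report([("apples", 3), ("pears", 5)])
    expected = Path("tests/golden/report.txt").read_text().rstrip("\n")
    assert got == expected
```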
I've found it really valuable to pair with people, sit at a computer together while they're driving and using AI. It's really interesting to see how other people prompt & use AI to explore the problem.
There's a lot of ads/propaganda/influencer BS/copywriting about how incredible AI is, but the reality is that it's not "that" good. VCs want a return on their investment.
Also, I suggest giving it low-level instructions. It's half-decent for low-level stuff, especially if it has access to preexisting code. Also note that it does exactly what you tell it to do, like a genie. I've asked it to write a func that already exists in the codebase and it wrote a massive chunk of code. It wasn't until after it was done that I remembered we already had the solution to the problem. Anyhow, the hype is unreal, so tailor expectations accordingly.
Could you share an .md of your prompts? I find with those tools I still have to break the problem down into verifiable pieces, and only move on to the next step once the previous steps are as expected.
A syntax error is nothing; I just paste the error into the TUI and it usually fixes it.
claude-code added the /insights command which might tell you what you are doing wrong, using your history.
From the basics: did you actually tell it that you want those things? It's not a mind reader. Did you use plan mode? Did you ask it to describe what it's going to make?
Stop wasting your time and money. There's a lot of money being spent to get very known individuals to praise this vibe-coding.
Think about AI the same way you'd think about trading courses: would you buy a course that promises 10,000% returns? If such returns were possible, the course seller would just trade instead of selling courses.
Same logic here - if "vibe-coding" really worked at scale, Claude would be selling software, not tokens.
If you expect it to _do_ things for you - you're setting yourself up for failure.
If you treat it as an astonishingly sophisticated and extremely powerful autocomplete (which it is) - you have plenty of opportunities to make your life better.
To be fair to OP, the hype about what the tool is "supposed" to be doing ("your accountants will rebuild the ERP over the weekend, you don't need programmers, etc...") is setting a dev up for frustration.
Personally, I'm trying to learn the "make it write the plan, fix the plan, break it down even more, etc..." loops that are necessary; but I haven't had a use case (yet?) where the total time spent developing the thing was radically shorter.
LLMs make wonders on bootstrapping a greenfield project. Unfortunately, you tend to only do this only once ;)
I think this reply might have been downvoted for being a bit glib, but the superpowers plugin took my Claude Code experience from mostly frustrating to nearly-magical
I’m not a software engineer by training nor trade, so caveats apply, but I found that the brainstorming -> plan writing -> plan execution flow provided by the skills in this plugin helps immensely with extracting assumptions and unsaid preferences into a comprehensive plan, very similar to the guidance elsewhere in this thread, except automated/guided along by the plugin skills
First prompt, ask it to come with a plan, break it down to steps and save it to a file.
Edit file as needed.
Launch CC again, use the plan file to implement stage by stage, verify and correct. No technical debugging needed. Just saying "X is supposed to be like this, but it's actually like that" goes a long way.
The fact that you got a syntax error at all is pretty telling. Are you not using agent mode? Or maybe that's just the experience with inferior non-statically typed languages where such errors only appear when the application is run. In any case, the key is to have a feedback mechanism. Claude should read the syntax errors, adjust and iterate until the error is fixed. Similarly, you should ask Claude to write a test for your landscape/portrait mode bug and have it make changes until the test passes.
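For example, a rough sketch of that kind of landscape/portrait regression test with Playwright for Python (the URL and selector are placeholders, since we don't know exactly what OP is building):

```
# Rough sketch of a portrait vs. landscape layout check with Playwright
# (URL and "#video-grid" selector are placeholders for OP's actual app).
from playwright.sync_api import sync_playwright

def grid_fits_viewport(width, height):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": width, "height": height})
        page.goto("http://localhost:3000")
        box = page.locator("#video-grid").bounding_box()
        browser.close()
        return box["width"] <= width and box["height"] <= height

def test_grid_landscape():
    assert grid_fits_viewport(1280, 720)

def test_grid_portrait():
    assert grid_fits_viewport(720, 1280)
```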
I’ve found a problem with LLMs in general is that it is trying to mirror the user. If the user is a world class software dev you will get some good stuff out of it. If the user is not experienced at programming you will get something that resembles that out of it.
There used to be more or less one answer to the question of "how do I implement this UI feature in this language"
Now there are countless. Welcome to the brave new world of non-deterministic programming where the inputs can produce anything and nothing is for certain.
Everyone promises it can do something different if you "just use it this way".
From what I've seen, giving Claude a single detailed prompt with exact specs upfront works way better than iterating fix-by-fix — each "fix this" request tends to regress something else because it loses context on what was working. For visual stuff like grid layouts I usually describe the final state precisely (viewport dimensions, aspect ratios, z-index for controls layer, etc.) in one shot rather than letting it guess. Still early days but the prompting style matters way more than people expect.
The truth is that there is a lot of hype.
You need to be reasonably experienced and guide it.
First, you need to know that Claude will create nonsensical code. On a macro level it's not exactly smart; it just has a lot of contextual static knowledge.
Debugging is not its strongest skill. Most models don't do well at it at all. Opus is able to one-shot "troubleshooting" prompts occasionally, but there's a high probability that it veers off on a tangent if you just tell it to "fix things" based on errors or descriptions. You need to have an idea what you want fixed.
Another problem is that it can create very convincing looking - but stupid - code. If you can't guide it, that's almost guaranteed. It can create code that's totally backwards and overly complicated.
If it IS going on a wrong tangent, it's often hopeless to get it back on track. The conversation and context might be polluted. Restart and reframe the prompt and the problems at hand and try again.
I'm not totally sure about the language you are using, but syntax errors typically happen when it "forgets" to update some of the code, and very seldom in just a single file or edit.
I like to create a design.md and think a bit on my own, or maybe prompt it to create one from a high-level problem statement to get going, and make sure it's in the context (and mentioned in the prompts).
Am I crazy thinking that interacting with such a system is a nightmarishly frustrating way to write code?
Like trying to write with a wet noodle - always off in some way.
Writing the code feels way more precise and not less efficient.
Sometimes people forget that you don't have to use AI to actually write the code. You can stick to "Ask" mode and it will give you useful suggestions and generate code but won't actually modify your files.
It seems to me you expect Claude to be able to one-shot your tool based on a single prompt. Potentially "vibe-coding" in the sense that you don't know how to develop this yourself (perhaps you are not a software developer?)
While this may be possible, it likely requires a very detailed prompt and/or spec document.
---
Here is an example of something I successfully built with Claude: https://rift-transcription.vercel.app
Apparently I have had over 150 chat sessions related to the research and development of this tool.
- First, we wrote a spec together: https://github.com/Leftium/rift-transcription/blob/main/spec...
- The spec broke down development into major phases. I reviewed detailed plans for each phase before Claude started. I often asked Claude to update these detailed plans before starting. And after implementation, I often had to have Claude fix bugs in the implementation.
- I tried to share the chat session where Claude got the first functional MVP working: https://opncd.ai/share/fXsPn1t1 (unfortunately the shared session is truncated)
---
"AI mistakes you're probably making": https://youtu.be/Jcuig8vhmx4
I think the most relevant point is: AI is best for accelerating development tasks you could do on your own; not new tasks you don't know how to do.
---
Finally: Cloudflare builds OAuth with Claude and publishes all the prompts: https://hw.leftium.com/#/item/44159166
I got Claude to make a tool to record all of the prompts in and all of the responses out, but not the actual file changes.
https://github.com/lawless-m/Devolver
it uses hooks to export the session
https://github.com/lawless-m/Devolver/blob/master/HOOKS.md
and then parses the session logs and dumps them out
https://github.com/lawless-m/Devolver/blob/master/JSONL_FORM...
I then have it go to a central location because I use multiple machines and it creates a website so I can see what I've been working on.
Project | Prompts | Tools | Files | Words In | Words Out | Tokens In | Tokens Out | Cache R/W | Last Activity
Nawin | 10688 | 74568 | 7724 | 1201.3k | 1379.5k | 592.0k | 83.3k | 3221.4M/199.5M | 2026-01-30 20:31
Crabbit | 3232 | 14252 | 1348 | 310.4k | 259.1k | 82.7k | 17.6k | 755.0M/51.2M | 2026-01-30 08:22
Reading these figures now, I think it counts its own prompts, you know it talks to itself. There's no way I've typed ten thousand prompts on that project.
> Finally: Cloudflare builds OAuth with Claude and publishes all the prompts: https://hw.leftium.com/#/item/44159166
Lord help us
Thanks for sharing!
Like OP, I've been similarly struggling to get as much value from CC (grok et c) as "everyone" else seems to be.
I'm quite curious about the workflow around the spec you link. To me, it looks like quite an extensive amount of work/writing. Comparable or greater than the coding work, by amount, even. Basically trading writing code files for writing .md files. 150 chat sessions is also nothing to sneeze at.
Would you say that the spec work was significantly faster (pure time) than coding up the project would have been? Or perhaps a less taxing cognitive input?
Yes, it's a lot of spec work. But a lot of it is research and exploring alternatives. Sometimes Claude suggests ideas I would have never thought of or attempted on my own. Like a custom python websocket server: https://github.com/Leftium/rift-local
(I also implemented the previous version with only a high-level, basic understanding of websockets: https://rift-transcription.vercel.app/sherpa)
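(For a sense of scale, and not the rift-local code itself: a bare-bones echo server with the Python websockets package looks roughly like this.)

```
# Minimal websocket echo server with the `websockets` package
# (just a sketch of the general shape, not the rift-local implementation).
import asyncio
import websockets

async def handler(ws):
    async for message in ws:          # receive text/binary frames
        await ws.send(message)        # echo them straight back

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()        # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```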
I think of Claude like a "force-multiplier."
I have been able to implement ideas I previously gave up on. I can test out new ideas much faster.
For example, https://github.com/Leftium/gg started out as 100% hand-crafted code. I wanted gg to be able to print out the expression in the code in addition to the value, like python icecream [1]. (It's more useful to get both the value and the variable name/expression.) I previously tried, and gave up. Claude helped me add this feature within a few hours.
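For anyone unfamiliar with icecream, its output looks roughly like this (format from memory; it may differ slightly between versions):

```
from icecream import ic

radius = 7
area = 3.14159 * radius ** 2
ic(radius, area)
# Prints something like:
#   ic| radius: 7, area: 153.93791
```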
And now gg has its own virtual dev console optimized for interacting with coding agents. A very useful feature that I would probably not have attempted without Claude. It's taken the "open in editor" feature to a completely new level.
I have implemented other features that I would have never attempted or even thought about. For example, without Claude's assistance https://ws.leftium.com would not have many features like the animated background that resembles the actual sky color.
The 60-minute forecast was on my TODO list for a long time. Claude helped me add it within an afternoon or so.
Note: depending on complexity of the feature I want to add the spec varies in the level of detail. Sometimes there is no spec outside Claude's plans in the chat session.
[1]: https://github.com/gruns/icecream
Wow thanks for sharing! Could you explain how you made the specs? Did you already know pretty much everything you wanted to cover before hand? Was one CC session enough to go through it?
In my experience, trying to make a plan/specs that really match what I want often ends in a struggle with Claude trying to regress to the mean.
Also it’s so easy to write code that I always have tons of ideas I end up implementing that diverge from the original plan…
- No, I did not know everything I wanted to cover beforehand. Claude helps me brainstorm, research, and elaborate on my ideas. The spec is a living document that I occasionally check in: https://github.com/Leftium/rift-transcription/commits/main/s...
- It was definitely not one CC session. In fact, this spec is a spin-off of several other specs on several other branches/projects.
- I've actually experienced quite the opposite: I suggest an idea for the spec and Claude says "great idea!" Then I change my mind and go in the opposite direction: "great idea!" again. Once in a while, I have to argue with Claude to get my idea implemented (like adding dependencies to parse into a proper AST instead of regex.)
- One tip: it's very useful to explain the "why" to Claude vs the "what." In fact, if you just explain the why/problem without a specific solution, Claude's suggestions may surprise you!
The what-why switch is quite useful, because it also helps you avoid Claude's "great idea!" responses as well.
> It seems to me you expect Claude to be able to one-shot your tool based on a single prompt.
Yes, this is what the hype says, doesn't it?
Or... are they all lying?
I guess the AI Koans are still relevant:
A novice was trying to fix a broken Lisp machine by turning the power off and on.
Knight, seeing what the student was doing, spoke sternly: “You cannot fix a machine by just power-cycling it with no understanding of what is going wrong.”
Knight turned the machine off and on.
The machine worked.
http://www.catb.org/jargon/html/koans.html
That isn't what the hype is. If that's the kind of stuff you're reading about or watching, you should find better sources. You can one-shot some things, and it makes for an impressive demo (oh yay, yet another video game made instantly) but anything larger and more useful will probably be a conversation. (Though not necessarily with a human, AIs can discuss it among themselves too.)
Your first one-shot might be a good rough prototype. From there, you continue the conversation with your refinements. While Claude goes and works on that for 15 minutes - you can go and do other work. Or talk with another Claude in another window to make progress on another project.
A good mental model is to imagine you're talking to a remote developer. You need to give them an extremely detailed spec on the first go if you expect them to get it right the first time. Sometimes it's better to explain "this is my grand vision, but how about we first mockup a prototype to see if that's actually how I want it to work". Sometimes Claude will suggest you talk about your plan together first to remove the ambiguities from the plan, or you can encourage Claude to do that with you.
(Also, the remote developer mindset is useful - treat the remote developer with respect, with humanity, and they're more likely to be helpful towards you and motivated to align with your goals.)
Consider that in an hour or two of conversation, you now have your app, completed, fully debugged... and not once did you look at the code, and you spent half of that time catching up on your other tasks. That's vibe coding.
> If that's the kind of stuff you're reading about or watching
HN - posts and comments - is full of it.
And my personal experiments with the free chatbots contradict it ofc.
Well, I've offered what I can to help. If your experience is mostly free chatbots, I would definitely suggest trying Opus 4.5 or 4.6 in Claude Code. The agentic harness of the software around the model (ie Claude Code) is important. Consider also that some of us have been doing this for a year and have already built our own MCP server tooling to go faster. Giving your AI the same kind of deterministic software tools that you use is important (eg make sure your AI has access to a diff tool, don't make it try and do that "in its head", you wouldn't ask that of a human).
As for listening to Hacker News... yeah, this is one of the worst places (well, Mastodon is worse) and HN is surprisingly AI-doomerish. I don't check in here very often anymore, and as of this week I just get Claude to summarize HN headlines as a morning podcast for me instead.
My own experience: my first few uses of Claude in Dec 2024 seemed rubbish, I didn't get it. Then one day I asked it to make me a search engine. The one shot from that wasn't perfect, but it worked, and I saw it build it in front of my eyes. That was the moment & I kept iterating on it. I haven't used Google or Kagi in almost a year now.
Anyway, hope it helps, but if not using AI makes you feel more comfortable, go with what fills your life with more value & meaning & enjoyment.
> I haven't used Google or Kagi in almost a year now.
So you have the resources to index the whole www on your own?
No, but I index parts of the web that are important to myself, sites I frequently reference. (I have all of Simon Willison's site indexed, for example.) It turns out that a simple SQLite database is a lot more capable and faster than I thought. I index from my laptop, using another tool I built with Claude. I don't crawl or spider, I focus on indexing from sitemap.xml files and RSS feeds. I have about 1.5 Million pages in my local index, and I get search results in 40ms - 70ms, thereabouts.
For every search that doesn't find results - and of course that's still the majority - it falls back to a meta-search combining results from Brave, Mojeek & Marginalia. The core of that metasearch is what Claude 3.5v2 generated for me in a one-shot back in Dec 2024. Kagi is just a metasearch with a very small local index as well, and my main purpose in building this was replacing Kagi for my needs.
The last 10% of my queries were widget queries: currency conversion, distance and temperature conversion, and so on.
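A minimal sketch of that kind of setup (a local SQLite FTS5 index fed from a sitemap, with a metasearch fallback stub). This is illustrative only, with hypothetical table and function names, not the actual code:

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

db = sqlite3.connect("local_index.db")
# FTS5 gives tokenized full-text search with a built-in rank column.
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(url, title, body)")

def index_sitemap(sitemap_url: str) -> None:
    """Pull page URLs from a sitemap.xml and index their text."""
    xml = urllib.request.urlopen(sitemap_url).read()
    for loc in ET.fromstring(xml).iter(f"{SITEMAP_NS}loc"):
        url = loc.text
        # Raw HTML is indexed here for brevity; a real indexer would strip markup
        # and extract a proper title instead of reusing the URL.
        body = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        db.execute("INSERT INTO pages VALUES (?, ?, ?)", (url, url, body))
    db.commit()

def metasearch_fallback(query: str):
    # Placeholder for combining results from external engines (Brave, Mojeek, etc.).
    return []

def search(query: str, limit: int = 10):
    rows = db.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return rows or metasearch_fallback(query)  # fall back when the local index misses
```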
Not so experienced with Claude Code, but the web version does the job for me. I mean, I don't create very complex stuff for others (mostly websites and simple apps), but it did some things that I am proud of. I'm not a dev; I have some experience with Python, but even that is really basic stuff. So, here's what I do (it makes things really easy, at least for the simple stuff I build):
- Start a new project and discuss every single detail about it with Claude.
- Tell it to write a txt or pdf summary of everything and place that file in project knowledge.
- After that, tell it to give you a complete project structure and create it.
- Then simply start populating the files.
- Place each file in your project knowledge so it can see it.
After this it's just debugging, which mostly goes smoothly as well.
I've been playing with it for almost two years now, and this is what gets me there. ChatGPT never got even close to it.
You aren't telling us anything about how you're using it. So how can we tell you what you're doing wrong? You're just reporting what happened.
You haven't even said what programming language you're trying to use, or even what platform.
It sounds to me like you didn't do much planning, you just gave it a prompt to build away.
My preferred method of building things, and I've built a lot of things using Claude, is to have a discussion with it in the chatbot. The back and forth of exploring the idea gives you a more solid idea of what you're looking for. Once we've established the idea I get it to write a spec and a plan.
I have this as an instruction in my profile.
> When we're discussing a coding project, don't produce code unless asked to. We discuss projects here, Claude Code does the actual coding. When we're ready, put all the documents in a zip file for easy transfer (downloading files one at a time and uploading them is not fun on a phone). Include a CONTENTS.md describing the contents and where to start.
So I'll give you this one as an example. It's a Qwen-driven system monitor.
https://github.com/lawless-m/Marvinous
here are the documents generated in chat before trying to build anything
https://github.com/lawless-m/Marvinous/tree/master/ai-monito...
At this point I can usually say "The instructions are in the zip, read the contents and make a start." and the first pass mostly works.
Yeah if the prompt is as specific as this post, then that's probably the issue...
What you are trying to do is quite easy to do with Claude. I have done way more complex things than that in hours. But having programming, managing (with humans), and engineering experience is extremely useful.
It seems you try to tell the tool to do everything in one shot. That is a very wrong approach, not just with Claude but with everything (you ask a woman for a date, and if you do not get laid in five minutes you failed?). When I program something manually and it compiles, I expect it to be wrong. You have to iron it out and debug it.
Instead of that:
1. Divide the work into independent units. I call these "steps".
2. Subdivide steps into "subsets". You work in an isolated manner on those subsets.
3. Use an immediate-mode GUI library like Dear ImGui to prototype your tool. Translating it into something else once it works is quite easy for LLMs.
4. Visualize everything. You do not need to see the code, but you need to visualize every single thing you ask it to do.
5. Tell Claude what you want and why you want it, and update the documentation constantly.
6. Use git in order to make rock-solid steps that Claude will not touch once they work, so you can revert changes or ask the AI to explore a branch, explaining how you did something and that you want something similar.
7. Do not modify code that already works rock solid. Copy it into another step, leaving the original step as reference, and modify it there.
8. Use logs. Lots of logs. For every step you create text logs, and you debug problems by giving Claude the logs to read (see the sketch below).
9. Use screenshots. Claude can read screenshots. If you visualize everything, Claude can see the errors too.
10. Use asserts, lots of asserts, just like with manual programming.
It is not that different from managing a real team of people...
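A minimal sketch of points 8 and 10 (a log file per step plus asserts); the function and file names are hypothetical, just to show the shape:

```python
import logging

# One log file per "step", so Claude can be handed exactly the log it needs.
logging.basicConfig(
    filename="step_03_grid_layout.log",
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(message)s",
)

def place_thumbnails(count: int, columns: int) -> list[tuple[int, int]]:
    """Hypothetical step: lay out `count` items on a grid with `columns` columns."""
    assert columns > 0, f"columns must be positive, got {columns}"
    rows = -(-count // columns)  # ceiling division
    logging.debug("placing %d items in %d columns -> %d rows", count, columns, rows)
    cells = [(i % columns, i // columns) for i in range(count)]
    assert len(cells) == count and rows * columns >= count
    return cells
```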
> you ask a woman for a date and if you do not get laid in five minutes you failed?
If successfully using Claude Code is as difficult as successful dating, I'm not sure this tech will prevail. ;)
That's significantly more work (and effort) than just doing it yourself, though? Even for larger, complicated projects.
No it's not?
Stuff like "divide the work up" is something you do when doing it yourself. Making a GUI prototype isn't really much work at all in the age of LLMs, akin to drawing up a few ideas on a notepad. Using git for small steps is something lots of people do for their own work and rebase later. Using extensive logging is mostly just something you have in your AGENTS.md for all your projects and forget about, similarly getting it setup to make and look at screenshots.
What part of this is more work than doing it yourself?
It’s more work in the same sense that trying to delegate a task to someone who doesn’t understand what needs to be done, and needs their hand held, is more work than doing it yourself.
This is especially true when the vision is a little hazy and the path isn’t clear. When doing it yourself, you can make decisions in the moment, try things, pivot… when trying to delegate these things, it becomes a chore to try to clarify things that are inherently unclear, and pivot an idea when the person (or AI) being delegated to doesn’t fully grasp the pivot and keeps bringing in old ideas.
I think most people have had an experience trying to delegate a task, where it becomes so much work to wrangle the person, that they just do it themselves. I’ve run into this countless times. That’s how it feels to use AI.
You can probably assume the person who suggested it isn't having the experience of it being more work to do that way.
It’s really not. For anything substantial, the things that you do to manage an LLM are the same things that you should be doing to manage a team of human devs, even if the team is just yourself.
Documentation. Comments. Writing a plan and/or a spec before you begin coding. Being smart with git commits and branches.
Not even close. A friend and I are working on an iOS game (a tower defense style game). We are writing zero code ourselves. We both have a history of iOS development; he is still actively involved, and I've moved away from it in recent years.
In about two weeks we have a functional game: 60 levels, 28 different types of enemies, a procedurally generated daily challenge mode, an infinity mode, tower crafting and upgrades, and an in-game economy system to pay for the upgrades.
This likely would have taken us months to get to the point that we are at, it was playable on Day 2.
Could you explain how a chat session progresses, with an example if possible?
I start with what I want to build. In the initial prompt I provide an overview of what I want, and then some specifics. Last night I added an archive to the Daily Challenge mode, so if you missed a day's challenge you could go back and play it. This is what my initial prompt looked like:
---
I'd like to add an archives mode to the daily challenge. This will allow players to complete any daily challenges they didn't attempt on the actual day.
It will look like a calendar, with the dates in Green if it was played, and in white if not.
The archive should only go back to January 30, 2026, the day the project started. Include a to do to change this date prior to release.
Rewards for completing daily challenges via the archive should be 25% of the normal value.
---
Claude Code then asked me a couple of clarifying questions before it harnessed the superpowers:writing-plans skill and generated a document to plan the work. The document it put together is viewable at https://gist.github.com/Jeremy1026/cee66bf6d4b67d9a527f6e30f...
There were a couple of edits that I made to the document before I told it to implement. It then fired off a couple of agents to perform the tasks in parallel where possible.
Once it finished, I tested it and it worked as I had hoped. But there were a couple of follow-up things that would make it more intertwined with everything else going on around daily challenges. So I followed up with:
---
lets give 1 cell for compelting an archived daily challenge
---
And finally:
---
Now that we are tracking completions, can we update the notification to complete daily mission to include "Keep your X day streak"
---
Sounds like I should give Claude Code another try. The last time I worked with it, it was quite eager to code without a good plan, and would overcomplicate things all the time.
Not entirely relevant, but the example I remember is I asked for help with SQL to concatenate multiple rows into a single column with SQL Server and instead of reminding me to use STRING_AGG, it started coding various complicated joins and loops.
So my experience is/was a little different. Regardless, I think I should take one of my old programs and try implementing it from ground up by explaining the issue I'm trying to solve to see how things progress, and where things fail.
Another example is the tower stat caps. When Claude Code generated the first pass, it made it so that the tower level would control each individual stat's cap, which was way too high. I didn't know exactly what the limits were, but knew they needed to be pulled back some. So I asked it:
-Start Prompt-
Currently, a towers level sets the maximum a single stat can be. Can you tell me what those stat caps are?
-End Prompt-
This primed the context with information about the stat caps and how they are tied to levels. After it gave me back a chart of Tower Level and Max Stat Rank, I followed up with some real stats from play:
-Start Prompt-
Lets change the stat cap, the caps are currently far too high. All towers start at 1 for each IMPACT stat, my oldest tower is Level 5, and its stats are I-3, M-4, P-6, A-3, C-1, T-1. How do you think I could go about reducing the cap in a meaningful way.
-End Prompt-
It came back with a solution to reduce the cap for individual stats to tower level + 1. But I felt that was too limiting. I want players to be able to specialize a tower, so I told it to make the stat cap total, not per stat.
-Start Prompt-
I'm thinking about having a total stat cap, so in this towers case, the total stats are 18.
-End Prompt-
It generated a couple of structures of how the cap could increase and presented them to me.
-Start Prompt-
Yes, it would replace the per-stat cap entirely. If a player wants to specialize a tower in one stat using the entire cap that is fine.
Lets do 10 + (rank * 3), that will give the user a little bit of room to train a new tower.
Since it's a total stat cap, if a user is training and the tower earns enough stat xp to level beyond the cap, lock the tower at max XP for that stat, and autoamtically level the stat when the user levels up the tower.
-End Prompt-
It added the cap, but introduced a couple of build errors, so I sent it just the build errors.
-Start Prompt-
/Users/myuser/Development/Shelter Defense/Shelter Defense/Views/DebugTowerDetailView.swift:231:39 Left side of mutating operator isn't mutable: 'tower' is a 'let' constant
/Users/myuser/Development/Shelter Defense/Shelter Defense/Views/DebugTowerEditorView.swift:181:47 Left side of mutating operator isn't mutable: 'towerInstance' is a 'let' constant
-End Prompt-
And thus, a new stat cap system was implemented.
It's like managing a team of 6-8 year olds.
Put that down! What are you doing? Don't put that in your mouth. Where are you going? Stop that! Why are you sitting there alone, Johnny?
But that is ... a lot of work.
... which is why it is usually faster for me to just write the code myself :-)
However ChatGPT is really helpful doing sysadmin style tasks on Linux.
What you’re doing is the so-called “slot machine AI”, where you put some tokens in, pray, and hope to get what you want out. It doesn’t work that way (not well, at least).
The LLM under the hood is essentially a very fancy autocomplete. This always needs to be kept in mind when working with these tools. So you have to focus a lot on what the source text is that’s going to be used to produce the completion. The better the source text, the better the completion. In other words, you need to make sure you progressively fill the context window with stuff that matters for the task that you’re doing.
In particular, first explore the problem space with the tool (iterate), then use the exploration results to plan what needs doing (iterate), when the plan looks good and makes sense, only then you ask to actually implement.
Claude’s built in planning mode kind of does this, but in my opinion it sucks. It doesn’t make iterating on the exploration and the plan easy or natural. So I suggest just setting up some custom prompts (skills) for this with instructions that make sense for the particular domain/use case, and use those in the normal mode.
With this kind of workflow you run out of tokens quickly, in my experience.
I mainly use it in a work context where it’s not my money I burn. I do have a private subscription that I’m going to use for a project. Do you have any tips how to try and do kind of what I describe, but in a more cost sensitive way?
Just burn the tokens. It’s an upfront cost that you pay once at the beginning of a project, or on a smaller scale at the beginning of a major feature.
For context, I’ve built about 15k loc since Christmas on the $20 plan, plus $18 of extra usage. Since this is a side project, I only hit the limits once or twice per week.
I wrote a post here that describes how I was able to rein in the AI to build something useful; maybe my experience/tips will help.
https://medium.com/@josh.beck2006/i-vibe-coded-a-cryptocurre...
Show us your prompts.
Two questions:
1. How are you using Claude? Are you using https://claude.ai and copying and pasting things back and forth, or are you running one of the variants of Claude Code? If so, which one?
2. If you're running Claude Code have you put anything in place to ensure it can test the code it's writing, including accessing screenshots of what's going on?
I’ve had good luck using https://github.com/gsd-build/get-shit-done
It will ask you questions, break down the project into smaller tasks, work through them one by one with UAT check points along the way.
It also handles managing your context
I added a comment about GSD here, and it was nice to see yours too... I'm just a user of GSD, but boy has it changed the context rot I used to experience. The system is one-shotting everything I ask, from basic to complex, and it finally handles things without making stuff up, messing things up, or going in circles...
I (and my colleagues) get consistently good results, and I am starting to believe it is because our experience is in running large projects for decades with outsourcing companies. We would get assigned a project and a company, usually on the other side of the world, and we would need to make it work: working with LLMs seems to be pretty much the same type of work. And we get consistent gains from this (better than with outsourcing on average, also because our tasks run 24/7, which was never financially possible even with the large clients we worked with). Reading that so many people have issues with even trivial stuff makes me think my team has at least some kind of skill others do not have; we kind of assumed everyone was getting the same benefits.
1. Make sure you are using the Opus model. Type /model and make sure Opus is selected. While many say Sonnet is good too, I'm not too convinced. Opus is the first model that actually convinced me to use AI as my daily driver, and I've been a developer for about 20 years.
2. Make the tasks as small and specific as possible. Don't prompt "create a todo app with user login" but "create a Vue app where users can register, don't do more than that", then "build a user login", then "create a page to create todo items", then "create a page to list todo items", then "on the list page, add delete functionality", and so on; you get the idea.
3. Beware the context size. Claude Code will warn you if you exceed it, but even before that: the fuller the context window, the more the AI will miss things. If you start a new prompt that doesn't require the whole context of the previous one, type /clear.
4. Build an AGENTS.md or CLAUDE.md. /init will do that for you, but it will just create a CLAUDE.md with information it thinks is important and can easily miss things. You know best. It often also includes the file and directory structure, even though it could easily find that out again (tree command) without that info in the agents/claude file. Still, I recommend: let Claude create that file, then adjust it to your needs. Only add important stuff here. The more you add, the more you spam the context. Again, try to keep the context small.
5. If Claude takes a long time to finish a task or gets it wrong on the first attempt, tell it to update the CLAUDE.md with information so it doesn't make the same mistakes next time.
6. Make sure you understand the code it created. Add conventions to AGENTS.md that will make the code more readable (use early returns, don't exceed a nesting level of 3, create new methods with meaningful names instead of inline comments, etc.); a small example of what such conventions produce is sketched below.
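A tiny illustration of the conventions in point 6 (a hypothetical validation function, not from any real project); early returns keep the nesting shallow, which is exactly what that kind of AGENTS.md rule asks the agent to produce:

```python
def register_user(email: str, password: str) -> str:
    """Early-return style: each guard exits immediately instead of nesting."""
    if "@" not in email:
        return "invalid email"
    if len(password) < 8:
        return "password too short"
    if password.lower() == password:
        return "password needs an uppercase letter"
    # The happy path is reached without spending any indentation on error handling.
    return "ok"
```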
Good luck!
Can you give some history of what you did? We can't answer "what am I doing wrong?" if you don't tell us… what you did.
While I have had some good experiences with CC, I do use at least double the tokens, and probably more like 5x, going through fixes and debugging from its initial efforts. I don't think this is always bad, because it helps me understand some of the more complicated interactions of existing and new code and improves documentation, but it's irritating when it runs out of usage allotments after it has broken something. There are some small things it never has managed to fix that I have to figure out myself, but again, I learn from that. Mapping out a data structure in advance and creating a plan before immediately coding can also help, but at least in our project, sometimes it just takes an incorrect approach, so I don't let it go off and do things willy-nilly. I can't at all imagine having an agent free to maintain the code at this point, despite the past two weeks' hype cycles.
If you’re using claude code/cursor, you should be using plan mode.
There are 3 major steps:
(Plan mode)
1. Assuming this is an existing codebase, load the relevant docs/existing code into context (usually by typing @<PATH>).
2. Ask it to make a plan for the feature you want to implement. Assuming you've already put some thought into this, be as specific and detailed as you can. Ask it to build a plan that's divided into individually verifiable steps. Read the plan file that it spits out, correct any bad assumptions it made, ask it questions if you're unclear on what it's saying, refine, etc.
(Agent mode)
3. Ask it to build the plan, one step at a time. After it builds each step, verify that it's correct, or have it help you verify it's correct in a way you can observe.
I have been following this basic process mostly with Opus 4.5 in a mixture of claude code and cursor working on a pretty niche image processing pipeline (also some advanced networking stuff on the side) and have hand-written basically zero code.
People say - “your method sounds like a lot of work too” and that’s true, it is still work, but designing at a high level how I want some CUDA kernel to work and how it fits into the wider codebase and then describing it in a few sentences is still much faster than doing all of the above anyway and then hand writing 100 lines of CUDA (which I don’t know that well).
I’d conservatively estimate that I’ve made 2x the progress in the same amount of time as if I had been doing this without LLM tools.
It takes many months to figure this out, much longer than learning a new programming language.
Read through Anthropic's knowledge share, check out their system prompts extracted on GitHub, and write more words in AGENTS/CLAUDE.md; you need to give the models some warmup to do better at tasks.
What model are you using? Size matters and Gemini is far better at UI design work. At the same time, pairing gemini-3-flash with claude-code derived prompts makes it nearly as good as Pro
Words matter, the way you phrase something can have disproportionate effect. They are fragile at times, yet surprisingly resilient at others. They will deeply frustrate you and amaze you on a daily basis. The key is to get better at recognizing this earlier and adjusting
You can find many more anecdotes and recommendations by looking through HN stories and social media (Bluesky has a growing AI crowd coming over from X, with a good community bump recently; there are anti-AI labeler/block lists to keep the flak down).
Try this:
* have Claude produce wireframes of the screens you want. Iterate on those and save them as images.
* then develop. Make sure Claude has the ability to run the app, interact with controls, and take screenshots.
* loop autonomously until the app looks like the wireframes.
Feedback loops are required. Only very simple problems get one-shot.
What tools do you use for wireframes / how are you generating them?
hmm but wouldn't that rapidly spend my tokens?
Effective use of LLMs in this way is not cheap.
First of all, congratulations on asking this question. It seems that everyone is an AI expert these days, and it takes courage to admit you're not one of them (neither am I, nor are most people).
In my little experience, what I've seen work is that you need to provide a lot of constraints in the form of:
- Scope: Don't build a website, but build a feature (either user facing or infra, it doesn't matter). I've found that chunking my prompts in human-manageable tasks that would take 0.5-1 day, is enough of a scale down.
- Docs .md files that describe how the main parts of the application work, what a component/module/unit of code looks like, what tools&technologies to use (and links to the latest documentation and quickstart pages). You should commit these to code and update them with every code change (which with Claude is just a reminder in each prompt).
- Existing code, if it's not a greenfield project.
It really moves away from the advertised paradigm of one-shot vibe-coding, but since the quality of the output is really good these days, this long preparation will give you production-ready output much sooner than traditional methods would.
> I ask it to redo everything from scratch.
This reminds me of someone who dropped into #java on undernet once upon a time in the 90s. "I can't get it to work" , and we kept trying to debug, and for some reason we kept hitting random new walls. It just never would work! Turns out that they were deleting their .java file and starting over each time. Don't do that.
---
Take it as a sequence of exercises.
Maybe start like this:
Don't use Claude Code at all to begin with. It's a pair programming exercise, and you start at the keyboard, where you're confident and in control. Have Claude open in the web interface alongside, talk through the design with it while working, and ask it to google stuff for you, look up the API, maybe ask if it remembers the best way(s) to approach the problem. Once you trust it a bit, maybe ask for code snippets or even entire functions. They can't be 100% correct because it doesn't have context... you might need to paste in some code to begin with. When there are errors, paste them in; maybe you'll get advice.
If you're comfy? Switch seats, start using claude code. Now you're telling claude what to do. And you can still ask the same questions you were asking before. But now you don't need to paste into the web interface anymore, and the AI sure as heck can type faster than you can.
Aren't you getting tired of every iteration where you're telling the AI "this went wrong", "that went wrong"? Maybe make sure there's a way for the AI to test stuff itself, so it can iterate a few cycles automatically. Your LLM can iterate through troubleshooting steps faster than you can type the first one. Still... keep an eye on it.
And, really that's about where I am now.
Similarly underwhelming experience. I have been using it for about a week. I like how unlike Gemini or ChatGPT I don't have to copy+paste code. I have to keep going back to it to tell it I don't want it to access unrelated project folders beyond the scope of the problem, then it runs out of tokens for the next few hours, or gives me a response of varying quality.
My pattern matching brain says this is normal for hype. It's a good product, but nowhere near the level you read about in some places (like HN in this case).
You need to give it the tools to check its own work, and remove yourself from that inner low-level error resolution loop.
If you're building a web app, give it a script that (re)starts the full stack, along with Playwright MCP or Chrome DevTools MCP or agent-browser CLI or something similar. Then add instructions to CLAUDE.md on how and when to use these tools. As in: "IMPORTANT: You must always validate your change end-to-end using Playwright MCP, with screenshot evidence, before reporting back to me that you are finished.".
You can take this further with hooks to more forcefully enforce this behavior, but it's usually not necessary ime.
There are skills available that might help you out. The “superpowers” set from Anthropic is really impressive.
The idea is, you want to build up the right context before starting development. I will either describe exactly what I want to build, or I ask the agent for guidance on different approaches. Sometimes I’ll even do this in a separate Claude (not Claude Code) conversation, which I feel works a bit faster. Once we have an approach, I will ask it to create an implementation plan in a markdown file, I clear context and then tell it to implement the plan.
Check out the “brainstorming” skill and the “git worktrees” skill. They will usually trigger the planning -> implementation workflow when the work is complex enough.
Superpowers is from Obra (Jesse Vincent), quintessential hacker and was a leader in the Perl community back in the day (still?).
https://github.com/obra/superpowers
The problem I run into is the propensity for it to cheat so you can't trust the code it produces.
For example, I have this project where the idea is to use code verification to ensure the code is correct. The stated goal of the project is to produce verified software, and the daffy robot still can't seem to understand that the verification part is the critical piece, so... it cheats on the checks so they pass. I had the newest Claude Code (4.6?) look over the tests on the day it was released, and the issues it found were really, really bad.
Now, the newest plan is to produce a tool which generates the tests from a DSL so they can't be made to pass and/or match buggy code instead of the clearly defined specification. Oh, I guess I didn't mention there's an actual spec for what we're trying to do which is very clear, in fact it should be relatively trivial to ensure the tests match for some super-human coding machine.
Not nearly enough context to be a front page post but here we are. Does everyone just give up on all fundamental expectations including determinism when they see AI in the title? Is the AI final boss to take over the front page of hacker news? /rant
To answer the question I would highlight the wrong regions in neon green manually via code. Now feed the code (zipped if necessary) to the AI along with a screenshot. Now give it relatable references for the code and say "xxxx css class/gtk code/whatever is highlighted in the screenshot in neon. I expect it to be larger but it's not, why?"
I’m probably going to be downvoted for this but this thread doesn’t really reflect well on the promises of Generative AI and particularly the constantly reiterated assurance that we’re on the verge of a new industrial Revolution.
I'm feeling the same way. It's quite the contrast from all the hype posts that make it sound like you give the AI a rough idea of what you're looking for and then it will build it from start to finish on its own.
Yes. I’m trying it, it’s too early for me to state a conclusion, but it’s not clear what the point is of an interface that requires magic touch best described as je ne sais quoi.
The alternative to this isn’t even necessarily no AI, just not using it this way.
Agreed. Many of the suggestions are pretty much code it yourself, but without actually tapping the individual keys.
Furthermore, and more generally, one of the great things about (traditional) coding is that it allows 'thinking through making' - by building something you learn more about the problem and thus how best to solve it. Code generation just leaves you with reviewing, which is less powerful in this way I believe. See also 'thinking through writing [prose]'.
I said it since the very first video of somebody who built a login page with it. They kept adding more and more constraints and at some point it's just coding but with extra steps.
It doesn't mean those tools have no value, but they're not capable of "coding", in the sense we mean in the industry, and generating code isn't coding.
I don't think the OP gave enough information for us to really have any honest conversation about this one way or the other.
That said: I suspect that OP is providing low-detail prompts.
These tools cannot read your mind. If you provide an under-specified prompt, they will fill in all the details for things that are necessary to complete the task, but that you didn't provide. This is how you end up with slop.
Get the superpowers plugin and then ask Claude to design and document the system. It will go into brainstorming mode and ask you a lot of questions. The end result will be a markdown file. Then get another agent (maybe ChatGPT) to critique and improve the design (upload the markdown file in the web version). Then give it back to Claude and have it critique and improve. Last step, make Claude analyze the design and then document a step by step implementation guide. After that turn Claude code loose on implementation. Those techniques have been working for me when doing a project from scratch.
Claude is a programming assistant not a programmer.
You still need knowledge of what you are building so you can drive it, guide it, fix things.
This is the core of the question about LLM assisted programming - what happens when non programmers use it?
> what happens when non programmers use it?
We have the answer already, which product was fully built by a non-programmer with those tools? I can't find an example.
They just trip over their own code at some point, and if there's nobody watching, they end up with something they can't recover from.
It's especially devastating when they don't know enough git to get back on track.
I listened to a conversation between two superstar developers in their 50's, who have been coding for more than most readers here have been alive, about their experience with Claude Code.
I wanted to tear my ears out.
What is crystal clear to me now is using LLMs to develop is a learned and practiced skill. If you expect to just drop in and be productive on day one, forget it. The smartest guy I know _who has a PhD in AI_, is hopeless at using it.
Practice practice practice. It's a tool, it takes practice. Learn on hobby projects before using it at work.
The problem is that it’s being marketed like it’s magic and will make people obsolete… not as a tool with a high learning curve.
I don’t blame people for being upset when it can’t do what all the hype says it will do.
The way people talk about the latest Claude Code is the same way people were talking 2-3 years ago about whatever the latest model was then. Every release gets marketed as if it’s a new level of magic, yet we’re still here having the same debates about merit, because reality doesn’t match the marketing and hype.
It has gotten better, I tried something with early ChatGPT that failed horribly (a basic snake game written in C), and just tried the exact same thing again last week and it worked—it wasn’t good, but it technically worked. But if it took 3 years to get good enough to pass my basic test, why was I being fed those lies 3 years ago? The AI companies are like the boy who cried wolf. At this point, it’s on them to prove they can do what they say, not up to me to put in extraordinary efforts to try and get value out of their product.
Last week I sat through a talk from one of our SVPs who said development is cheap and easy now, then he went on about the buy vs build debate for 20 minutes. It’s like he read a couple articles and drank the kool-aid. I also saw someone talking about ephemeral programs… seeing a future where if you want to listen to some MP3s, you’ll just type in a prompt to generate a bespoke music player. This would require AI to reliably one-shot apps like Winamp or iTunes in a few words from a layperson with no programming background. These are the ideas the hype machine is putting in people’s minds that seem detached from reality.
I don’t think the, “you’re holding it wrong”, type responses are a good defense. It’s more that it’s being marketed wrong, because all these companies need to maintain the hype to keep raising money. When people use the AI the way the hype tells them it should work… it doesn’t work.
Are you saying that with the models from 3 years ago, with the right context management and prompting, your C snake game would fail?
I agree with you, expectations are not being set correctly.
That's my point. Learn to use the tools, including as they were three years ago, and magic does happen
I gave it a basic one or two sentences prompt for the snake game, the code it generated wouldn’t compile 3 years ago. It was also unable to fix the errors. A similar prompt last week worked. It wasn’t a “good” version of the game, but it compiled and functioned.
The process being described by many in the comments removes all the magic. It sounds laborious and process heavy. It removes the part of the job I like, while loading the job with more work I don’t enjoy. This feels like the opposite of what we should want AI to do.
This is going to sound crazy but I felt it was super degraded this morning.
CC was slow and the results I was getting were subpar having it debug some easy systems tasks. Later in the afternoon it recovered and was able to complete all my tasks. There’s another aspect to these coding agents: the providers can randomly quantize (lobotomize) models based on their capacity, so the model you’re getting may not be the one someone else is getting, or the same model you used yesterday.
Put it in plan mode.
Then, repeatedly ask Claude to criticize the plan and use the "AskUserQuestion" tool to ask for your input.
Keep criticizing and updating the plan until your gut says Claude is just trying to come up with things that aren't actually issues anymore.
Then unleash it (allow edits) and see where you get. From there you may ask for one off small edits. Or go back into plan mode again
It was a bit of a learning curve for me as well. I've found having programming experience comes in handy. I know a lot of non-technical folks (people who don't know how to program) who keep bumping their heads on these tools; crunching through credits when a simple update to the code/push to repo is all that's needed.
When claude or codex does something other than what you want, instead of getting mad at it, ask it what it saw in your prompt that led it to do what it did, and how should you have prompted it to achieve what you wanted. This process tends to work very well and gives you the tools you need to learn how to prompt it to achieve the results you want.
I don't wish to be harsh but why, upon encountering a syntax error, would you have the next step be "redo everything from scratch?" This seems odd to me.
You need to be very specific about what to build and how to build it: what tools to use, what architecture it should follow, what libraries and frameworks it should include. You need to be a programmer to do this properly, and it still takes a lot of practice to get it right.
My experience so far:
1. Good for proofs of concept and prototypes, but nothing that really goes to heavy production usage.
2. Can do some debugging and fixing that usually requires looking at the stack, checking the docs, and checking the tree.
3. Code is spaghetti all the way down. One might say that's ok because it's fast to generate, but the bigger the application gets, the more expensive every change becomes, and it always forgets to do something.
4. The tests it generates are mostly useless. 9/10 times it passes all the tests it creates for itself, but the code does not even start. No matter what type of test.
5. It frequently lied about the current state of the code, and only when pushed would it admit it was wrong.
As others said, it is a mix of the (misnamed) Dunning-Kruger effect and some hype.
I tried possibly every single trick to get it working better but I feel most are just tricks. They are not necessarily making it work better.
It is not completely useless; my work involves doing prototypes now and then, and usually they need to be quite extensive. For that it has been a help. But I don't feel it is close to what they sell.
I think at the current state of the art, LLM tools can help you build things very quickly, but they can't help you build something you yourself are incapable of building, at least not in a sustained way. They need hand holding and correction constantly.
I don't think that's true; I've seen examples to the contrary. Here, for example, is a recent article [1] from a non-programmer building a tool. The article is long, so I pasted the relevant part below. My thoughts go more in the direction that the article author built something that is complicated for non-technical people but in essence simple; he says so himself: "copy paste". What if what the OP here is building is something novel that Claude doesn't know how to build?
Relevant excerpt:
I spent a bit of time last month building a site to solve a problem I’ve always found super-annoying about the legislative process. It’s hard to read Federal bills because they don’t map to the underlying code. Anyone who has worked in Congress knows what I mean, you get a bill that says “change this word from ‘may’ to ‘shall’ in section XYZ of Federal law.” To understand what it does, and find possible loopholes they are trying to sneak in, you have to go to that underlying Federal law and look at where it says “may” and then put “shall” in there and read it. It’s basically like a manual version of copy and pasting, except much more complicated and with lawyers trying to trick you.
So I wrote an app that lets you upload legislation, and it automatically shows you how it changes Federal law. There are commercial versions of this software, and some states do it for their proposed legislation. But I haven’t seen anything on the Federal level that is free, so I built it. (The code is here.) It’s not very good. It’ll probably break a lot. There’s no “throat to choke” if you use it and it’s wrong. And my guess is that Anthropic or Gemini ultimately will be able to do this function itself eventually. But the point is that if I can build something like this in my spare time and deploy it without any training at all, then it’s just not that hard for an organization with some capital to get rid of some of its business software tools.
[1] https://www.thebignewsletter.com/p/monopoly-round-up-the-2-t...
Well congratulations on having an opinion!
You should try out the GSD system (search for the "GSD GitHub repo")... it will launch everything as phases and steps and clear context for you so you never compact, plus tons of other features...
Try a prompt that helps claude iterate until it can verify the result.
For example, if you tell it to compile and run tests, you should never be in a situation with syntax errors.
But if you don’t give it a prompt that allows it to validate the result, then it’s going to give you whatever.
Have you read the best practices? https://code.claude.com/docs/en/best-practices Are you using plan mode?
The whole thing is worth reading, multiple times over, if you want to find success in using the tool. But I'll call attention to this passage especially:
> Include tests, screenshots, or expected outputs so Claude can check itself. This is the single highest-leverage thing you can do.
I've found it really valuable to pair with people, sit at a computer together while they're driving and using AI. It's really interesting to see how other people prompt & use AI to explore the problem.
What do your prompts look like? Garbage in-Garbage out applies to using AI, probably at a much larger scale than other applications of the phrase.
There's a lot of ads/propaganda/influencer BS/copywriting about how incredible AI is, but the reality is that it's not that good. VCs want a return on their investment.
Also, I suggest giving it low-level instructions. It's half-decent for low-level stuff, especially if it has access to preexisting code. Also note that it does exactly what you tell it to do, like a genie. I've asked it to write a function that already exists in the codebase, and it wrote a massive chunk of code. It wasn't until after it was done that I remembered we already had a solution to the problem. Anyhow, the hype is unreal, so tailor expectations accordingly.
Could you share an md of your prompts? I find with those tools I still have to break the problem down into verifiable pieces, and only move on to the next step once the previous steps are as expected.
A syntax error is nothing; I just paste the error into the TUI and it usually fixes it.
claude-code added the /insights command which might tell you what you are doing wrong, using your history.
Starting from the basics: did you actually tell it that you want those things? It's not a mind reader. Did you use plan mode? Did you ask it to describe what it's going to make?
Stop wasting your time and money. There's a lot of money being spent to get very known individuals to praise this vibe-coding.
Think about AI the same way you'd think about trading courses: would you buy a course that promises 10,000% returns? If such returns were possible, the course seller would just trade instead of selling courses.
Same logic here - if "vibe-coding" really worked at scale, Claude would be selling software, not tokens.
If you expect it to _do_ things for you - you're setting yourself up for failure.
If you treat it as an astonishingly sophisticated and extremely powerful autocomplete (which it is) - you have plenty of opportunities to make your life better.
In other words, if we believe what the CEOs of the AI companies claim, we are setting ourselves up for disappointment.
To be fair to OP, the hype about what the tool is "supposed" to be doing ("your accountants will rebuild the ERP over the weekend, you don't need programmers, etc...") is setting a dev up for frustration.
Personally, I'm trying to learn the "make it write the plan, fix the plan, break it down even more, etc..." loops that are necessary, but I haven't had a use case (yet?) where the total time spent developing the thing was radically shorter.
LLMs make wonders on bootstrapping a greenfield project. Unfortunately, you tend to only do this only once ;)
> LLMs make wonders on bootstrapping a greenfield project. Unfortunately, you tend to only do this only once ;)
This is why LLMs look so impressive in demos. Demos are nearly always greenfield, small in scale, and as long as it launches, it looks successful.
Sounds right. Any one shot anything is cap
Add https://github.com/obra/superpowers
and then try again.
I think this reply might have been downvoted for being a bit glib, but the superpowers plugin took my Claude Code experience from mostly frustrating to nearly-magical
I’m not a software engineer by training or trade, so caveats apply, but I found that the brainstorming -> plan writing -> plan execution flow provided by the skills in this plugin helps immensely with extracting assumptions and unsaid preferences into a comprehensive plan; very similar to the guidance elsewhere in this thread, except automated/guided along by the plugin skills.
That matches my Claude experience.
it's just programming with extra steps (english)
Are you using plan mode?
Typical flow for a greenfield project for me is:
First prompt, ask it to come with a plan, break it down to steps and save it to a file.
Edit file as needed.
Launch CC again and use the plan file to implement stage by stage, verify, and correct. No technical debugging needed. Just saying "X is supposed to be like this, but it's actually like that" goes a long way.
One thing I should note is that I find Claude to be amazing for helping me write or brainstorm non-coding stuff.
It is much better than other models I have tried. Didn't think the post would blow up so much tbh..
it's a tool, not an oracle. you build with it, you aren't its customer, you're its wielder.
for now, anyway.
The fact that you got a syntax error at all is pretty telling. Are you not using agent mode? Or maybe that's just the experience with inferior non-statically typed languages where such errors only appear when the application is run. In any case, the key is to have a feedback mechanism. Claude should read the syntax errors, adjust and iterate until the error is fixed. Similarly, you should ask Claude to write a test for your landscape/portrait mode bug and have it make changes until the test passes.
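For example, such a regression test could look something like this minimal sketch, assuming a hypothetical pure layout helper (the real test would target the OP's actual code and framework):

```python
def grid_dimensions(item_count: int, width: int, height: int) -> tuple[int, int]:
    """Hypothetical function under test: pick a (columns, rows) grid for the viewport."""
    columns = 4 if width >= height else 2   # fewer columns in portrait
    rows = -(-item_count // columns)        # ceiling division
    return columns, rows

def test_portrait_uses_fewer_columns():
    cols_landscape, _ = grid_dimensions(8, 1920, 1080)
    cols_portrait, _ = grid_dimensions(8, 1080, 1920)
    assert cols_portrait < cols_landscape

def test_every_item_gets_a_cell():
    for n in range(1, 20):
        cols, rows = grid_dimensions(n, 1080, 1920)
        assert cols * rows >= n

# Run with: pytest test_grid_layout.py
```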
I’ve found a problem with LLMs in general is that it is trying to mirror the user. If the user is a world class software dev you will get some good stuff out of it. If the user is not experienced at programming you will get something that resembles that out of it.
what language/framework are you asking it to work with?
Ah.
There used to be more or less one answer to the question of "how do I implement this UI feature in this language"
Now there are countless. Welcome to the brave new world of non-deterministic programming where the inputs can produce anything and nothing is for certain.
Everyone promises it can do something different if you "just use it this way".
From what I've seen, giving Claude a single detailed prompt with exact specs upfront works way better than iterating fix-by-fix — each "fix this" request tends to regress something else because it loses context on what was working. For visual stuff like grid layouts I usually describe the final state precisely (viewport dimensions, aspect ratios, z-index for controls layer, etc.) in one shot rather than letting it guess. Still early days but the prompting style matters way more than people expect.
AI seems to work a lot better once you acquire some AI equity, you go from not working at all to AI writing all the code. /s
skill issue