I'm going back to writing code by hand

(blog.k10s.dev)

790 points | by dropbox_miner 16 hours ago

115 comments

  • pron 7 hours ago
    Yep. The only people I've heard saying that generated code is fine are those who don't read it.

    The problem is that the mitigations offered in the article also don't work for long. When designing a system or a component we have ideas that form invariants. Sometimes the invariant is big, like a certain grand architecture, and sometimes it’s small, like the selection of a data structure. You can tell the agent what the constraints are with something like "Views do NOT access other views' state" as the post does.

    Except, eventually, you'll want to add a feature that clashes with that invariant. At that point there are usually three choices:

    - Don’t add the feature. The invariant is a useful simplifying principle and it’s more important than the feature; it will pay dividends in other ways.

    - Add the feature inelegantly or inefficiently on top of the invariant. Hey, not every feature has to be elegant or efficient.

    - Go back and change the invariant. You’ve just learnt something new that you hadn’t considered and puts things in a new light, and it turns out there’s a better approach.

    Often, only one of these is right. Often, at least one of these is very, very wrong, and with bad consequences.

    Picking among them isn’t a matter of context. It’s a matter of judgment, and the models - not the harnesses - get this judgment wrong far too often. I would say no better than random chance.

    Even if you have an architecture in mind, and even if the agent follows it, sooner or later it will need to be reconsidered. What I've seen is that if you define the architectural constraints, the agent writes complex, unmaintainable code that contorts itself to fit them when it needs to change. If you don't read what the agent does very carefully - more carefully than human-written code, because the agent doesn't complain about contorted code - you will end up with the same "code that devours itself", only you won't know it until it's too late.

    • perarneng 5 hours ago
      If you know how to write good code, you can force AI to write good code with various techniques. It's 100% doable. You just need to figure out the problems AI has and find solutions that make things easier for it. For example: keep contexts extremely small; modularize into modules with clear boundaries and only allow the AI to work within those boundaries; make modules free of IO so they are easily testable; hide modules behind interfaces; etc. You can write 100 tests that execute within a second. You can write benchmarks too. AI needs boundaries and small contexts to work well. If you fail to give it that, it will perform poorly. You are in charge.
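
      A minimal sketch of what one such boundary can look like, in Python with hypothetical names: a pure, IO-free module hidden behind an interface, so its tests run in microseconds and the rest of the code (and the AI) only ever sees the interface:

```python
from typing import Protocol

class PriceRule(Protocol):
    """The interface the rest of the codebase (and the AI) is allowed to see."""
    def apply(self, subtotal_cents: int) -> int: ...

class BulkDiscount:
    """Pure module: no IO, no shared state, so it is trivially testable."""
    def __init__(self, threshold_cents: int, percent_off: int) -> None:
        self.threshold_cents = threshold_cents
        self.percent_off = percent_off

    def apply(self, subtotal_cents: int) -> int:
        # Apply the discount only at or above the threshold.
        if subtotal_cents >= self.threshold_cents:
            return subtotal_cents * (100 - self.percent_off) // 100
        return subtotal_cents

rule: PriceRule = BulkDiscount(threshold_cents=10_000, percent_off=10)
assert rule.apply(5_000) == 5_000    # below threshold: unchanged
assert rule.apply(10_000) == 9_000   # 10% off at the threshold
```

      Because the module touches no IO, hundreds of cases like these run in well under a second, which is what lets the AI iterate against them safely.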
      • pron 5 hours ago
        That doesn't quite work, and precisely for the reason I mentioned: You can definitely tell the AI to follow some strategy, but at some point the strategy will need to change, and the AI won't tell you that (even if you tell it to). Unless you read the code every time you won't know if the AI is following the strategy and producing good results or following it and producing bad results because the strategy has to change. This can happen even in small changes: the AI will follow the strategy even if the change proves it's wrong, and if you don't pay close attention, these mistakes pile up.

        So yes, you might get good results in one round, but not over time. What does work is to carefully review the AI's output, although the review needs to be more careful than review of human-written code because the agents are very good at hiding the time bombs they leave behind.

        • lukan 4 hours ago
          How do you define "bad code"?

          If I instruct the AI to make small modules where I can verify they work, with tests and no side effects, then it is good enough code for me. It works, is readable, and can be extended - and it will only turn into bad code if this is not maintained with care.

          • WalterBright 36 minutes ago
            > How do you define "bad code"?

            The harder the code is to understand, the badder it is (and the more likely it is infested with bugs).

          • pron 4 hours ago
            Sure, if you carefully review the agent's output, including tests, you can get good results. If you don't carefully review the output, you obviously have no idea if it's good enough for you. The only way to find out is that 30 changes down the line the agent won't be able to change one thing without breaking another, but by then the codebase will be too far gone to fix.
            • readitalready 1 hour ago
              I let agents break things 30 changes down the line. If something breaks, I add a check to my project validator and start over, with the validator providing instructions on what was wrong and how to fix it. It's all automatic, and now I have a guard against the exact same error in the future.

              Some of these checks have caught thousands of the same error, even with the latest Opus 4.7 writing the original code.

              • hunterpayne 54 minutes ago
                You proved that testing is a good idea, not that vibe coding is a good idea.
                • lukan 31 minutes ago
                  To be honest, I am past the point of wanting to convince people that AI is useful. If you want to refuse new tools other people find helpful, that's your loss.

                  (Also I stick to the original definition of "vibe coding = not looking at generated code", "LLM assisted coding = verify generated code", I do both, depending on the task)

          • throwaway173738 4 hours ago
            The concept of a small module is an architecture invariant. You’re making that decision, not the LLM. And you’ve made that decision because the machine is not good at certain things. You’re doing that because you can’t trust the LLM to make that decision on its own.
            • ChicagoDave 3 hours ago
              I’m doing it because as a DDD adherent, I’ve been building software that way for 15 years without GenAI and now with GenAI I can do it faster.

              You can’t play whack-a-mole with GenAI. You have to start from well-known principles and watch everything it produces. Every module or bounded context has to have its own invariants.

              You can’t fully automate software engineering with GenAI. It seems the vast majority of GenAI users think they can and end up in the same place as the OP.

              Maybe learn Domain-Driven Design, Event Sourcing, and then try again. The results will be dramatically improved.

              https://devarch.ai/

              • dominotw 1 hour ago
                oh yea right you discovered a secret technique to making llm outputs predictable and not probabilistic.
                • gabrielhidasy 40 minutes ago
                  That's not the point. The point is that they can generate pretty good code, and do so most of the time. So ask them to generate the code, review it as you would review a more junior teammate or an open-source contribution from an unknown source, and take advantage of their speed to test everything.

                  You can't make a great vibe-coded thing that you couldn't make yourself, but you can get pretty much the same code you would have made in a fraction of the time.

      • IdiotSavage 5 hours ago
        So, basically you need to micro-manage it. Where are your 10x gains now? And is it fun to work like that?
        • sirwhinesalot 4 hours ago
          This is actually what I do. I'm extremely picky about the code and force the LLM to rewrite it 1000x times until it is basically exactly what I want. You might be wondering what is the point when it would be faster for me to just write the code myself?

          I have ADHD and for whatever reason telling the LLM what to do instead of doing it myself bypasses the task avoidance patterns and/or focus problems I tend to suffer from. I do not find it fun, but I am thankful for it.

          • stepbeek 1 hour ago
            This framing of it being a tool that you find indispensable as an individual is important. I’m not interested in debating static vs dynamic types, or vim vs emacs, etc. If it works for you, then that’s great!

            But the difference with LLMs currently - I guess? - is that non-engineers are pushing the idea that it’s universally indispensable at scale. I think it leads to a lot of emotion bleeding into the debate.

          • black_knight 4 hours ago
            I have used LLMs a couple of times to get started on something. I don’t have ADHD, so this is not a regular occurrence for me. But when I have tried this, I have always found the LLM solution so horrible that it instantly inspired me to do it myself. So, in that sense it worked, I got unstuck, but no LLM garbage makes it into the project.
            • abalashov 2 hours ago
              It's nice to know I'm not alone in this. I have definitely used slop as inspiration by negative example.
            • zephen 2 hours ago
              Even pre-AI, when working with slop generated by other humans, starting with something was often better than staring at a blank screen.
          • yakattak 3 hours ago
            This is kind of how I feel I think. Putting pen to paper for me is hard.
        • andriy_koval 42 minutes ago
          > So, basically you need to micro-manage it. Where are your 10x gains now? And is it fun to work like that?

          It depends on the language and infra, but some/many require lots of boilerplate and memorizing thousands of APIs; automating that is an easy 10x LLM gain.

          I for example write SQL myself, because boilerplate is super-minimal, and core SQL is very minimal itself, there are like 20 constructs to memorize.

        • readitalready 1 hour ago
          I don't micromanage it. I let my project's custom linter micromanage it.

          Every project should have a custom linter for its tech stack. It should check not just for syntax errors, but for architectural choices as well as taste guidelines.

          Whenever the LLM writes bad code, I add it to my linter to check against in the future.
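
          A hedged sketch of what one such check might look like (the rule and module names are hypothetical), using Python's ast module so the linter enforces architecture, not just syntax:

```python
import ast

def check_view_isolation(source: str, module_name: str) -> list[str]:
    """Custom lint rule: a view module must not import another view's internals."""
    errors = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            if (module_name.startswith("views.")
                    and node.module.startswith("views.")
                    and node.module != module_name):
                errors.append(
                    f"{module_name}: imports {node.module}; views must not "
                    "reach into other views (route it through shared state instead)"
                )
    return errors

bad = "from views.cart import cart_state\n"
assert check_view_isolation(bad, "views.checkout") != []   # violation flagged
good = "from models.cart import Cart\n"
assert check_view_isolation(good, "views.checkout") == []  # allowed import
```

          The point is that the error message itself carries the fix instructions, so the next agent run is told what was wrong and how to correct it.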

        • hansmayer 5 hours ago
          Amen. Instead of freeing you up, AI enslaves you. And if only it were at least servitude to a superior being!
        • nijave 1 hour ago
          Honestly, I think so. I do a mix of infrastructure and programming so don't tend to have any frameworks memorized. Using AI is much quicker than constantly referencing the docs.

          I can also switch between codebases with different frameworks and languages and make changes without spending all day reading docs.

          It's also pretty good at tracing code, and it's fairly straightforward to verify those results manually. It can build a flow diagram in 10-30 minutes (depending on what tool calls need to be allowed and how many prompts it needs) versus me taking a couple of hours to do the same.

        • forgotaccount3 5 hours ago
          > you need to micro-manage it.

          It is significantly easier to micro-manage an AI than a suite of junior developers. The AI doesn't replace a principal engineer; it replaces junior and weaker senior developers who need stories broken down extremely precisely to be able to get anything done. In the time it takes to break down a story such that a junior-to-weak-senior developer can pick it up and execute it well, the AI would already be done, with testing built around it.

          • gylterud 4 hours ago
            Juniors learn. Some juniors are potential good seniors. Over time they will internalise good architecture and be able to make good judgments on their own.

            Micromanaging LLMs is like having Dory from Finding Nemo as your colleague. You find ways to communicate, but there is no learning going on.

            • abustamam 1 hour ago
              LLMs can learn, just not the same way that juniors do. When an LLM does something wrong you can always update its rules or skills so it doesn't make that mistake again. Or you can use a subagent whose sole purpose is to review code to prevent that mistake. There are lots of ways you can improve LLMs over time.

              Of course if you don't provide that feedback loop, no learning happens. I guess the same could be said of a junior, though.

              • ModernMech 1 hour ago
                Building larger systems of accountability isn't usually what people mean by learning. And besides, if telling an LLM not to do something were actually reliable, then LLMs would be a lot more useful than they are. And even if that were reliable, then you're just reinventing expert systems, which didn't work.
            • esafak 2 hours ago
              Juniors don't always learn.
          • hansmayer 3 hours ago
            I think if you tried working with some junior folks, you'd be quite surprised. You know, with at least some of them choosing to use their brains and all.
      • wombat-man 3 hours ago
        Yeah I agree. It's improved quite a bit just in the past few months. The code should always be reviewed, and you need to spend some time tuning your skills and agent configs. If you're still getting bad code out of your LLM tooling, you might not be using or configuring it correctly.
      • hansmayer 4 hours ago
        > You are in charge.

        No, if you have to do all of the stuff you have listed to kind-of-make-it-work...You are not in charge.

      • insane_dreamer 1 hour ago
        > You are in charge.

        Sure. That's how I work with AI, and the way I believe AI is meant to be used -- as a companion tool.

        But it's a lot of work. It saves me time for certain tasks, but not others. I haven't measured my productivity gains, but they're at most 2x.

        But that's not "vibe coding" (which was the point of the article) or the (false) promise of "10x productivity" and "code that writes itself" that companies are being told is going to reduce their engineering headcount tenfold.

    • Zach_the_Lizard 6 hours ago
      I agree with this. I've been writing a new internal framework at work and migrating consumers of the old framework to the new one.

      I had strong principles at the outset of the project and migrated a few consumers by hand, which gave me confidence that it would work. The overall migration is large and expensive enough that it has been deferred for nearly a decade. Bringing down the cost of that migration made me turn to AI to accelerate it.

      I found that it was OK at the more mechanical and straightforward cases, which are 80% of the use cases, to be fair. The remaining 20% need changes to the framework. Most of them need very small changes, such as an extra field in an API, but one or two require a partial conceptual redesign.

      To oversimplify the problem: the backend for one system can generate certain data in 99% of cases. In a few critical cases, it logically cannot, and that data must be reported to it. Some important optimizations were made on the assumption that this would be impossible.

      The AI tooling didn't (yet) detect this scenario and happily added migration logic assuming it would work properly.

      Now, because of how this is being rolled out, this wasn't a production bug or anything (yet). However, asking the right questions to partner teams revealed it and unearthed that some others were going to need it as well.

      Ultimately, it isn't a big problem to solve in a way that will mostly satisfy everyone, but it would have been a big problem without a human deeper in the weeds.

      Over time, this may change. Validation tooling I built may make a future migration of this kind easier to vibe code even if AI functionality doesn't continue to improve. Smarter models with more context will eventually learn these problems in more and more cases.

      The code it generates still oscillates between beautiful and broken (or both!), so for now my artistic sensibilities make me keep a close eye on it. I think of the depressed robot from The Hitchhiker's Guide to the Galaxy as the intelligence behind it. Maybe one day it'll be trustworthy.

    • agentultra 4 hours ago
      The invariant, stated informally, would be hard for a human reviewer in the loop to prove broken. Spoken language isn’t precise enough for the task.

      Even if you could state it in a precise formal language the LLM under the agent doesn’t have the capability to understand what the invariant is for and why it’s important. You’ll still get oddly generated code. You might get an LLM that can associate certain tokens with those in the formal language specification which can hold invariants and perhaps even write the proofs… but you’ll still get a whole bunch of other code generated from the informal parts of the prompt.

      I agree that simply adding constraints and prompts to your skills and specs isn’t going to prevent these things. Worse, even if you could invent a better mousetrap, the creature will still escape.

      The problem is… “elongation”: the addition of code for the sake of the prompt/task/etc. Often less is better. This takes a human with the ability to anticipate what other humans would want/expect. When you need a generator, they’re great, but it’s a firehose whose use should be restrained a little more.

      • pron 4 hours ago
        > The invariant, stated informally, would be hard for a human reviewer in the loop to prove broken. Spoken language isn’t precise enough for the task.

        That depends on the invariant. Some are behavioural, like "variable x must be even if y is positive", but some are architectural, such as "a new view requires a new class".

        But that's only one side of the problem because maintaining the invariant can be just as bad as breaking it. You ask the agent to add a feature and it may well maintain the invariant - only it shouldn't have, because the feature uncovers the fact that the invariant is architecturally wrong.

        The problem is that evolving software requires exercising judgment about when you need to follow the existing strategy and when you need to rethink it. If there is any mechanical rule that could state what the right judgment is, I don't know what it is.
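
        To be fair, the behavioural kind can at least be checked mechanically (here is a toy sketch of the invariant above as a runtime assertion); it's the judgment about when the invariant itself should change that no such check captures:

```python
def check_invariant(x: int, y: int) -> None:
    # Behavioural invariant from above: x must be even whenever y is positive.
    if y > 0 and x % 2 != 0:
        raise AssertionError(f"invariant violated: x={x} is odd while y={y} > 0")

check_invariant(4, 1)    # holds: y positive, x even
check_invariant(3, -2)   # y not positive, so x is unconstrained
try:
    check_invariant(3, 1)
except AssertionError as e:
    print("caught:", e)
```

        A check like this can tell you the invariant was broken; it cannot tell you the invariant itself has become the wrong one.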

        • agentultra 4 hours ago
          Yes! I was trying to make this part of my point but you definitely made it much more clear and concise.

          With a skilled operator, it could be possible to drive an agent to handle these kinds of changes. I would be concerned that spoken language wouldn't be precise enough to handle the refactoring and changes necessary to make to a code base when an invariant changes... regardless of whether it was a property, architectural, or procedural change. It already can take several prompts and burn quite a few tokens doing large-scale rewrites and code changes. Maybe the parameters and weights can be tuned for this kind of work but I remain skeptical that what we have at present is "efficient" at this kind of work.

    • 21asdffdsa12 7 hours ago
      And the solution is the same as when development was outsourced, where the "patch" was to fix it by writing a spec. Thus I conclude my TED talk with the statement: LLMs are the new outsourcing and run into the same problems.
      • pron 6 hours ago
        Not quite, because the architecture often needs to evolve as you learn more while the project evolves. People will complain when they feel the constraints drive them to unnatural workarounds; the agents don't.

        You can try telling the agent to stop and ask when a constraint proves problematic, except it doesn't have as good a judgment as humans to know when that's the case. I often find myself saying, "why did you write that insane code instead of raising the alarm about a problem?" and the answer is always, "you're absolutely right; I continued when I should have stopped." Of course, you can only tell when that happens if you carefully review the code.

        • multjoy 30 minutes ago
          It has no judgement at all.
        • senordevnyc 2 hours ago
          So I run a solo saas that supports my family, and so the stakes feel very high for me. I use AI heavily, and I’ve seen the exact problem you’re describing. I feel like I’m often really riding the edge in terms of trying to use AI to accelerate product development while not letting tech debt accumulate too fast, or let my mental model of the codebase slip too much.

          Here’s what’s working for me right now:

          1. The basics: use best model available, have skills and rules that specify project guidelines, etc.

          2. Always use plan mode. It works much better to iterate on the concept of what we’re going to do, then do the implementation. The models will adhere to the plan at very high rates in my experience.

          3. Don’t give chunks of work that are too large in scope. This is just art, and I’m constantly experimenting with how ambitious I can be.

          4. I review all code to some extent, but I have a strong mental model of what areas of the app are more critical, where hidden bugs might accumulate, etc, and I review both tests and impl more strenuously in those areas. Whereas like a widget for my admin panel probably gets a 2 second glance.

          5. Have the discipline to go through periodically and clean up tech debt, refactor things that you’d do differently now, etc. I find the AI a huge help here, because I can clean up cruft in an hour that would have once taken me days, and thus probably wouldn’t have gotten done.

          6. I’m experimenting with shifting my architecture to make it easier to review AI code, make it less likely it’ll make mistakes, etc. Honestly mostly things I should have always been doing, but the level of formalism and abstraction on my solo projects is usually different than on a bigger team.

          To each their own, but I’ve grown this from nothing to about $350k in ARR over the last ten months, and I’m very confident I never could have built this product without AI help in triple that time.

      • marcosdumay 2 hours ago
        It's approximately the same problems, but stretched to an insane extent that you can never expect before it arrives.
      • i_love_retros 7 hours ago
        Don't outsource either then
        • 21asdffdsa12 6 hours ago
          How about we outsource it to Pakistan and they use LLMs? That way, we do what the LLM people do - many agents stacked on top.
    • benguild 7 hours ago
      “The only people I've heard saying that generated code is fine are those who don't read it.” Are you sure these people aren’t busy working rather than chatting? (haha)

      But in all seriousness, it depends on what you're doing with it. Writing a quick tool using an LLM is much easier than context-switching to write it yourself. If you need the tool, that's very valuable.

      • sevenzero 7 hours ago
        Also, as a webdev: it writes basic CRUD pretty well. I am tired of having to build forms myself, and the LLMs are usually really good at that.

        Been building a new app with lots of policies and whatnot, and instructing an LLM is just much faster than doing the same repetitive shit over and over myself.

        • spockz 6 hours ago
          If you were tired of writing forms yourself, had you looked at https://jsonforms.io/? Just specify the data you need, or extract it from the API spec, and go. Display the form uniformly every time across your site. No need to burn AI time.
          • sevenzero 5 hours ago
            I typically avoid most abstractions and third-party dependencies. Yeah, it could be neat, but I still need a lot of custom logic here and there. Same reason I avoid stuff like GraphQL.

            A little update: upon viewing the page on my phone, the "comitter" field in the demo goes out of bounds for me... Really not speaking well for their product.

            • topaz0 5 hours ago
              Sounds like you're just fine depending on an extremely imprecise abstraction (natural language) and an extremely opaque third party (anthropic).
              • abustamam 1 hour ago
                I think you're missing the commenter's point. A third-party library is a new dependency, and since there are new vulnerabilities almost every week in the npm ecosystem, if you can do something without a third party, it's probably better.

                With LLM driven code you can generate code once, and then if anything is shitty about it you can always manually update it yourself without the need of an LLM. It's a dependency of convenience, not an app-dependency.

                • topaz0 1 hour ago
                  From the description of the recommended tool it sounded to me like something that you use to deterministically generate code from a spec, which you could then modify if you like. That would be the same kind of dependency as the LLM workflow you describe, except that the abstraction is well-defined in a way that the LLM is not. Whether it's good or not is a different question.
                  • abustamam 39 minutes ago
                    That would be nice if it were the case but from what I can gather from this interesting dependency graph, there's a hard dependency on its renderer and schema.

                    https://jsonforms.io/img/architecture.svg

                    You can add your custom renderer but you still need their library for bindings and such.

              • sevenzero 4 hours ago
                I can also just do it myself though lmao, it's not like I don't look at what it's producing.

                The recommended tool can't even produce mobile-friendly output, so why would I ever use it?

                • topaz0 3 hours ago
                  I don't know or care about that specific tool, or really what you do at all, I was just reacting to how the principle you stated conflicts with the practice you described. How you reconcile those is up to you.
            • hansmayer 4 hours ago
              > I typically avoid any most abstractions or third party dependencies

              Right, so depending on an LLM makes perfect sense in that case, thanks for clarifying :)

              • sevenzero 4 hours ago
                Yea, because I totally depend on the LLM doing it because I can't do it myself /s
                • hansmayer 3 hours ago
                  Mate, that's literally what you implied, innit? You probably "can" do it yourself, but you choose not to - I wonder why? Also, the point of sarcasm is to communicate it in such a way that it is obvious, without the "/s" signifier. You know, like telling a joke at a party that you don't have to explain.
                  • sevenzero 3 hours ago
                    > I wonder why?

                    Because I like to save time?

                    • hansmayer 2 hours ago
                      ...which means you depend on the LLMs? Of course, strictly "to save time". It's not like you are slowly forgetting how to start a project in the first place or implement that db integration, right?
                      • sevenzero 2 hours ago
                        LOL, why would I ever forget how to start a project or how to connect to a DB or make migrations and whatnot? Brother, generating a web form for creating and updating models is not that big of a deal. An LLM can do this while providing a11y attributes and proper styling in like 10 minutes. That includes creating a migration (which I look at and correct if needed), creating the model, creating the required policies, creating the controller endpoints (which I correct if needed), and creating a template file for the CRUD operations with search and pagination and whatnot, while making it all look somewhat good.

                        I can do all of this myself, but why would I waste 1-2 hours (per model) on doing all that myself if I can just instruct some stupid LLM to do it for me? It's repetitive boilerplate.

                    • bluefirebrand 2 hours ago
                      Avoiding abstractions "because I like to save time" doesn't sound like something a professional software engineer should ever say
                      • abustamam 31 minutes ago
                        Isn't that the whole concept of "technical debt" though? This has been how software has been developed for quite a while, even pre-LLM. Sometimes your boss puts a thousand things on your plate and you take shortcuts on less important things to save time, and sometimes it works out well and sometimes it doesn't.
                      • sevenzero 2 hours ago
                        Yea, because having 200 different abstractions and DSLs makes stuff easier for sure! Why not use all the stuff that was popular 6 years ago, like Prisma, GraphQL and Redux - whoops, suddenly you need a whole team of devs who know all kinds of unnecessary abstractions.
                        • hansmayer 10 minutes ago
                          > Prisma, GraphQL and Redux, whoops suddenly you need a whole team of devs knowing all kinds of unnecessary abstractions.

                          Ah, let me guess - you're one of those non-technical PMs who can finally shove it to the devs, by spitting out unreadable HTML storing all its data in a flat file? Oh boy, do I have news for you...

                        • bluefirebrand 14 minutes ago
                          Based on the examples you provided, I think the term you're looking for is "external dependencies" not "abstractions"

                          Edit: Incidentally, I tend to treat "code made by an LLM" and "external dependencies" pretty much the same. Pretty low trust, with a strong interface between it and any code that matters

          • drbojingle 5 hours ago
            This might pair well with something like https://data-atlas.net.
      • pron 6 hours ago
        Sure. I'm talking about production software that needs to survive and evolve for a long while.
        • pydry 5 hours ago
          This is the core unspoken bone of contention in most AI arguments, I think: most people either aren't writing code with strict quality requirements or don't realize where their use of AI is violating them.

          That said, most of the world's most useful code has strict quality requirements. Even before AI, 90% of SLOC would be tossed away without much if any use, 9% was used infrequently, and 1% runs half the world's software.

        • mountainriver 5 hours ago
          Can you not review it?
          • RugnirViking 3 hours ago
            I think this misses the scale of the problem. Review never fixed tech debt, nor did it fix irrelevant/bloated test suites. It didn't solve complexity, or eliminate footguns. Very few people (I would argue almost no one) had developed theories for what all of these even were, or how to spot them in code.

            Reviewers aren't perfect, far from it. And we just gave them ~20x more code to review. Incentives mean that taking 20x longer to review is unacceptable. So where do we go from here?

          • throwaway173738 4 hours ago
            Review always misses something.
    • daishi55 4 hours ago
      The generated code is more than fine, it’s good in many cases. And I read it :)

      Indeed for the task of “jump into an unfamiliar codebase and make a requested change that aligns with existing styles and patterns, and uses existing functionality” I would say something like opus 4.7 exceeds the capabilities of most developers.

      • pron 4 hours ago
        I agree with both statements, but that doesn't change the problem I stated. If an agent produces reasonable code 80-90% of the time, and 10-20% of the time it makes mistakes that could render the codebase irretrievably unevolvable once they accumulate, the only thing you can do is to carefully review the agent's output 100% of the time. That it gets things right 80% of the time as opposed to 40% of the time doesn't change this calculus one iota.

        But agents generate code much faster, and to not slow them down, some people want to skip the only thing that can currently ensure you get good results, which is to carefully review the output. Once that happens, there is simply no way for them to know how good or bad what they're getting is.

        • noelsusman 2 hours ago
          I guess I don't understand how this logic doesn't apply to human developers.
          • pron 2 hours ago
            Human developers don't produce code at such a rate, and their judgment is, on average, better. So one, the review doesn't make you feel like you're slowing things down much, and two, the problems are less hidden.
            • Kiro 52 minutes ago
              > their judgment is, on average, better

              I can only presume you work with talented people somewhere that is not representative of most companies. You're definitely overestimating the average programmer's abilities.

            • hunterpayne 35 minutes ago
              More code != better. Don't believe me? Which is more valuable, MS Word or all the code written as part of class projects?
        • Kiro 3 hours ago
          And humans produce 100% reasonable code or what? The kind of mess me and everyone I've worked with produces by hand is the inverse of that. Constant shortcuts and lazy slop through and through. Never worked anywhere where the code wasn't an entangled disarray.

          As soon as requirements change the abstractions fall apart and everything gets shoehorned.

          • bluefirebrand 2 hours ago
            > And humans produce 100% reasonable code or what?

            Humans can be held accountable for their own slop

            > The kind of mess me and everyone I've worked with produces by hand is the inverse of that

            Yes, it's frustrating to work with isn't it? So why are you so excited to make higher volumes of this low quality slop using AI?

    • eatsyourtacos 6 minutes ago
      >Yep. The only people I've heard saying that generated code is fine are those who don't read it.

      If you already have a mature code base, then it's very easy to get AI to write excellent code. It has a ton of documentation on what you already do, how you do things, functions to use etc.

      I read all the changes AI does. I work in small chunks.

      >Even if you have an architecture in mind, and even if the agent follows it, sooner or later it will need to be reconsidered

      The agent can modify the structure you want to change 100x faster than you can. That's the beauty of it. We all know how hard it is to make architectural changes manually once you've started to lock into something.

      These comments just show me you must not be using AI in the right way, or haven't used it enough to learn "how" to use it. I've been using claude code months now at full speed. You are simply wrong that it doesn't generate good code.

    • WalterBright 37 minutes ago
      My own code is contortious. I refactor it regularly to reduce that, but it still can be better.
    • __alexs 4 hours ago
      I read all the code I generate with Cursor and some of it smells a bit weird but is easily fixable and most of it is as good as what I would write or better.
    • bicepjai 2 hours ago
      This is the rule I have settled on, and I can feel why. Writing the first buggy working version with agents is always fun. Then making the software reliable with the agents, the way you want it, is very painful.
    • stingraycharles 6 hours ago
      > Picking among them isn’t a matter of context. It’s a matter of judgment, and the models - not the harnesses - get this judgment wrong far too often. I would say no better than random chance.

      Yeah I’m currently working for several months already on a harness that wraps Claude Code and Codex etc to ensure that these types of invariants are captured and enforced (after the first few harness attempts failed), and - while it’s possible - slows down the workflow significantly and burns a lot more tokens. In addition to requiring more human involvement, of course.

      I suspect this is the right direction, though, as the alternatives inevitably lead any software project to devolve into a spaghetti-mess maintenance nightmare.

      • pron 6 hours ago
        It's not enough to enforce the invariants because they may need to change. You need to follow the invariants when they're right, and go back and reconsider them when they prove unhelpful. Knowing which is the case requires judgment that today's models are simply incapable of (not consistently, at least).
    • leonaves 3 hours ago
      What's the difference between asking an AI to write you a module you never read and installing a 3rd-party module without auditing all its source code?
      • Xirdus 3 hours ago
        If the 3rd party module is popular, its badness will affect other people too and either the module will get improved or well known workarounds/"best practices" will develop. With AI-generated code, more often than not you're the sole user.
      • skydhash 3 hours ago
        Trust and reputation.

        I would use Stripe, curl, and ffmpeg without audits, because I trust them to provide good code and to respect their API. I wouldn’t trust AI to write a Fibonacci series implementation.

        The AI has no reputation to wager for my trust.

      • frikk 3 hours ago
        stars on github? I've wondered the same thing.
    • indoordin0saur 2 hours ago
      Write your code by hand, but AI still serves as something of a stack overflow and code completion tool. Also good for writing tedious things like regex or little one-off utility scripts as well as a first crack at unit tests. Using it to actually write big blocks of important code is a no-no in my opinion as it produces what I would characterize as slop, even if it technically works.
    • zephen 2 hours ago
      > What I've seen is that if you define the architectural constraints, the agent writes complex, unmaintainable code...

      To be fair, there are many people like this as well. One of my personal favorite examples was way back in the 80s when I inherited the code for a protocol converter that let ASCII terminals communicate with IBM mainframes via the 3270 protocol.

      One of the pieces of code in there, for managing indicator lights, was simply wrong. It was ca. 150 lines of Z80 assembly language that was trying to faithfully follow the copious IBM documentation of how things worked, but it had subtle issues and didn't always work.

      My approach was to accept the documentation as accurate (the IBM documentation was always verbose and almost never wrong), but to reason that the original 3270 had these functions implemented in TTL logic gates, and there was no way in heck that they were wasting enough gates on indicator lights to require the logical equivalent of 150 instructions.

      So in my mind, it had to be a really simple circuit that had emergent properties that required the reams of documentation. With that mindset, I was able to craft correct code for this in 12 instructions.

      Many systems are likewise fractal in nature. You want to figure out the generating equations, rather than all the rules that derive from those. And, in many cases, writing down the generating equations is at least as easy to do in code as it would be to do in English for someone or something else to implement.

    • linuxftw 3 hours ago
      Try plan mode. The problems you're speaking about are already solved.
      • pron 2 hours ago
        They are nowhere near solved. Agents make serious mistakes in judgment and do it frequently enough to threaten the viability of the codebase unless you slow down and monitor them very, very closely. If you do that, it's all good. If you're not, your codebase is rotting at a superhuman speed underneath you and you have no idea until it collapses.
        • linuxftw 1 hour ago
          I agree they make mistakes in judgement, that's the whole point of plan mode. That judgement comes to the surface before lots of tokens are wasted without sight of the overall solution.

          It's all very simple. "Use x library, data model should be xyz, do m, not n."

          They're obviously not at the point of replacing an experienced programmer as far as knowing the start-to-finish way of accomplishing every detail, that's what the human is for.

      • hatefulmoron 2 hours ago
        Plan mode improves results, but it doesn't solve the underlying problems. Pretty often Claude Opus 4.7 on xhigh will formulate a reasonable enough plan, churn for a while, then come back with a summary that it didn't stick to the plan because it wasn't accurate.

        Worse, the disclaimer is buried under a bunch of "did X, did Y on line Z of file a/b/c", as if it's just a minor inconvenience. To the extent the plan was inaccurate, you're left in an undefined state where you might as well undo what it just did.

        • linuxftw 2 hours ago
          You have to review the plan and fill in any missing gaps or correct anything that's wrong. Plan mode often isn't one shot, it might take a few iterations, but once the plan is nailed down, the results are usually very good.
          • hatefulmoron 2 hours ago
            You're right. I think having it spawn lots of subagents, read everything, formulate a big and detailed plan, only for it to be subtly wrong while requiring me to carefully review the result and the intermediate plans that produced it is quite tiring. I suppose things slip through.
            • linuxftw 1 hour ago
              If you understand these subtle pieces you perceive the AI to get wrong, you should include that in your prompt. Also, unit test and functional test coverage go a long way to ensure correct behavior.
              • hatefulmoron 55 minutes ago
                I could also include the correct implementation for it to copy in the prompt, if you get what I'm trying to say. Some amount of laziness or vagueness in the prompt is an intended use case, it's surely the point of having the subagents do so much churning of tokens to research before writing the plan that I'm about to disregard. But sure, those are helpful tips.
    • jstummbillig 6 hours ago
      > The only people I've heard saying that generated code is fine are those who don't read it.

      Well, that is problematic. I have to either assume you are disinterested or lying and neither is great for any discourse.

      • nathanielks 5 hours ago
        Yeah, their statement just isn't true. With enough instruction, I've been able to get great output from models. I think that's the key: with detailed, pointed instructions, the output will match.
        • rimliu 4 hours ago
          how do you know it matches? You did read it then?
          • nathanielks 2 hours ago
            Indeed, I'm not using LLM output without thorough review.

            After reading a bunch of other comments, it sounds like people are referring to letting agents go wild and code whatever off a limited prompt. I'm not using LLMs like that; I'm generally interacting only via conversations with pretty detailed initial prompts. My interactions with the chat after that are corrections/guiding prompts to keep it on point and edit the prompt output from time to time.

            • hunterpayne 29 minutes ago
              Which defeats the purpose of using a LLM in the first place. Same as writing by hand but with a bill for tokens.
    • tcgv 46 minutes ago
      > "Yep. The only people I've heard saying that generated code is fine are those who don't read it."

      I review every line of code I generate with AI. I mainly use an MR-based approach:

      1) Provide a tightly scoped technical spec to Codex as a task, and ask for 3x solutions. Usually at least one of them is on the right track, and it is better to ditch a solution that went in the wrong direction than to try to fix it.

      2) Review the explanation and diff of the proposed changes line by line, file by file. If I find minor deviations from what I asked, or violations of the codebase architecture/conventions, I write comments in the diff and/or global comments, and ask again for 3x adjusted solutions.

      3) Usually, by this point, the solution is ready for me to merge locally and either run local tests or do some manual fine-tuning.

      4) Finally, I generate unit tests. I leave them to this stage because I can repeat the same process with the sole intent of generating case-specific unit tests. This way, I can generate/review tests against the final version of the implementation.

      This has been working very well for me since our repos are reasonably organized and have a well-defined architecture. In the technical spec, I include the major architectural requirements and code conventions, and I also add a catch-all like "follow the codebase's existing conventions and style", which works reasonably well.

      This simple process has enabled me to deliver most minor/medium tasks and bug fixes really quickly while maintaining control over the changes and without lowering the quality bar. For larger and more challenging tasks, I find myself "driving the wheel" (i.e. coding by hand) more often, and using AI code generation in a much more scoped and specific way. So that becomes a different process altogether.

  • djeastm 4 hours ago
    When it was Copilot tab-completing lines, people would say, "yea, but you still have to make sure you're the one writing the whole functions".

    Then when it was completing functions, people would say, "yeah, but you still have to make sure you're the one writing the logic around the functions"

    Then when it was completing the logic around the functions, people would say, "yeah, but you still have to make sure you're the one writing the features"

    Now it's completing features and people say, "yeah, but you still have to make sure you're the one writing the architecture"

    I don't know if architecture is a solvable problem for these models, but it is interesting watching the expectations moving over time.

    • raincole 4 hours ago
      The "people" in your hypothetical story have been wrong the whole time. The correct attitude is:

      When AI can complete lines, you still have to read and understand the code.

      When AI can complete whole functions, you still have to read and understand the code.

      When AI can complete features and tickets, you still have to read and understand the code.

      • globnomulous 14 minutes ago
        Precisely. And this is why all the MCP servers that people at my company are writing aren't worth using: their apparent goal is to automate as much as possible, which encourages people not to pay attention -- which results in bad code, bad tests, and bugs.
      • brightball 3 hours ago
        I heard a talk from a VP at NVIDIA a couple of months ago and he echoed this. Essentially their policy is "you are still fully responsible for the code you ship, whether AI helps with it or not"
        • rwmj 2 hours ago
          While also no doubt telling C-level that the AI can write code completely automatically.

          The developers in this scenario are there to absorb the blame when things go wrong. "Human crumple-zones" to protect the company.

        • 9dev 1 hour ago
          Does anyone out there working with AI coding agents not have this policy?
        • cynicalpeace 1 hour ago
          Culpability is and always will be the limiting factor in AI adoption.

          As humans, we need another human to blame when things go wrong.

          Especially in situations that are catastrophic when things go wrong.

        • senordevnyc 2 hours ago
          I think being responsible for the code is a better framing. I run a saas and I don’t always review all the code, but this thing supports my family, so I am acutely aware that I’m responsible for what it does. My customers aren’t going to let me blame the agent for fucking up their workflows.

          But that still doesn’t mean I review all the code. I tend to review defensively, based on the potential for harm if this piece of code is broken. And I rely a lot on tests, static analysis, canaries, analytics, health checks, etc. to reduce risk for when I’m wrong. So far it’s working.

      • jstummbillig 4 hours ago
        Not at all. Code is not important, intent is. The leader of a product/company does not have to read code. It doesn't matter if it is generated by humans or non-humans. It simply needs to be correct enough to be usable and then steerable towards better outcomes. Understanding of code never existed from the business perspective.
        • throwaway173738 4 hours ago
          It does in safety critical industries. You can get grilled by regulators about your source code. And lawyers will use it as evidence in court.
        • raincole 3 hours ago
          > The leader of a product/company does not have to read code.

          Yeah, because they believe (sometimes wrongly) their subordinates read it.

          > Understanding of code never existed from the business perspective.

          It does, it's called organizational wisdom and domain knowledge, because you need those witty names to sell books to aspiring managers.

        • contagiousflow 2 hours ago
          Can you think of a good way to encode intent into a system?
      • herdcall 4 hours ago
        I'm no longer sure you have to, actually. I mean, we do trust the assembly that compilers produce without having to read it, don't we? We're rapidly getting to that stage with LLMs, IMO.
        • bigfishrunning 4 hours ago
          The assembly is a deterministic transform of the input logic, and if it doesn't match then it's a bug in the compiler. If an LLM-based code generator doesn't match what you asked for, that's OK, just pull the slot-machine handle again. that's the difference.
          • james_marks 3 hours ago
            The "pull the slot-machine handle again" is the dangerous thing here.

            I can feel it sometimes, as my brain shuts down and I gamble instead of thinking. It's a reversion to what I call "monkey mind" where you just keep pressing buttons to "make it work". I took a decade training my mind away from this, and too much AI is bringing it back.

          • Xunjin 4 hours ago
            Thank you for this, people often miss the point, comparing apples with oranges.
            • TremendousJudge 3 hours ago
              If they were equivalent, people would be committing the prompts and not the code
              • AnimalMuppet 3 hours ago
                And then getting bugs when they use a new version of the AI, just like people occasionally got bugs when they upgraded to new versions of the compiler...
                • snowe2010 15 minutes ago
                  they would get bugs on every invocation of the software, not just on a new version of the AI. It's equivalent to your compiler having a RAND function in it where it chooses between a billion different options every time it compiles; it's absolutely not equivalent to a compiler having a bug.
                • throwaway613746 2 hours ago
                  [dead]
          • n48dotdev 3 hours ago
            [dead]
          • estearum 4 hours ago
            For now, but obviously they're becoming (effectively) more and more similar to the former every day.
            • tuetuopay 16 minutes ago
              They’re not, and will never be in their current form and architecture.

              Compilers are mechanical and engineered to produce a correct output. A compiler emitting incorrect machine code is exceedingly rare, and considered a bug. They have heuristics and probabilities in them, but those are to pick between a set of known-good outputs.

              An AI is a bag of weights outputting a probability of the most plausible token that follows [1]. It is inherently probabilistic in nature and its output is organic (by design, they’re designed to mimic human speech), as opposed to mechanical like a compiler.

              A compiler follows hard rules. An AI does its best.

              And to be fair, AIs are no better than human in this regard: humans are pretty bad at generating correct code without mechanical tools to keep them in line (compilers, linters, formatters). It’s not a wonder we use the same tools to keep LLM output in line as we do humans. (And, to be fair, LLMs are better than humans at oneshotting valid code).

              [1]: to those that tell me this vision of an LLM is outdated: nope. The heavy lifting is done in the probability generation. Debates about understanding are not relevant here, and the net output of an LLM is a probability vector over raw tokens. This basic description can be contrasted to a compiler whose output is a glorified Jinja template.

        • gregsadetsky 4 hours ago
          I know it’s tiring to talk about “hallucination”, but truly, models still do hallucinate

          They constantly say they did a thing they didn’t, say they know how to solve something when they don’t, etc. Regardless of guard rails or tests - AI forces a constant vigilance of a new kind.

          Not just “what might have gone wrong” but also “what do I think is working but isn’t actually”.

          And we’re not even talking about how it chooses substandard solutions, is happy to muddy code/architectures, add spaghetti on top of spaghetti etc.

          Agentic coding often feels like an army of inexperienced developers who are also incredibly eager to please.

          • Zardoz84 3 hours ago
            "still" isn ghecorrect word. They always be having hallucinations
            • AnimalMuppet 3 hours ago
              "Still" means "it always had hallucinations, and it still does, despite people thinking that it doesn't anymore". People think we've moved past that. We haven't.
        • amw-zero 3 hours ago
          This is a really, really, really bad comparison. I used to say the same thing. But the semantic distance between a for loop and its equivalent assembly instructions is much smaller than the distance between "I'd like a web application that can store and retrieve todo items" and a working implementation. The space of the latter is practically infinite in what can be "compiled."
        • eska 26 minutes ago
          We don’t. That’s why tools like godbolt are popular, debuggers can jump into assembly, and compilers can output assembly files.
        • bayindirh 3 hours ago
          > we do trust the assembly that compilers produce without having to read it

          Yes, because wrong assembly blows up really loudly. From wrong behavior to invalid instruction errors and everything in between. Moreover, compilers are battle tested over the years, with extremely detailed test suites, and extreme testing (every day, hundreds of thousands of users test and verify them).

          Also, as people said, assembly generation is deterministic. For a given source file and set of flags, you get the same thing out. Byte by byte, bit by bit. This is what we call "reproducible builds".

          AI is not like that. It's randomized on purpose, it pulls from training set which contains imperfect, non-ideal code. "Yeah, it works whatever", doesn't cut it when you pull a whole function out of its connections, formed by the training data. It can and will make errors, because it's randomized from a non-ideal pool.

          Next, sometimes you need tight code. Fitting into caches, running at absolute performance limit of the processor or system you have. AI is not a good fit here. Sometimes you go so far that you optimize for the architecture at hand, and it works slower on newer systems, so you need to re-optimize that thing.

          For anyone who reads and murmurs "but AI can optimize", yes, by calling specific optimization routines written by real talented people for some cases; by removing their name, licenses, and context around them. This is called plagiarism in its mildest form and will get you in hot water in academia, for example. Writing closed source software doesn't make you immune from cheating and doing unethical things.

          Lastly, this still rings in my ears, and I understood it over and over as I worked with more high performance, correctness critical code:

          I was taking an exam, there's this tracing question. I raise my head and ask my professor: "Why do I need to trace this? Compiler is made to do this for me". The answer was simple yet deep: "If you can't trace that code, the compiler can't trace it either".

          As I said, I just said "huh" at the time, but the saying came back and when I understood it fully, it was like being shocked by a Tesla coil.

          Get your sleep, eat your veggies, and understand your code. Those are the three essential things you need to do.

        • SpaceNoodled 3 hours ago
          I've actually taken to double-checking the assembly in some instances. There are surprising times that the compiler won't make the shortcuts and optimizations you thought it should, and I also used this method to call out an unsuitable compiler since I caught it spitting out some ridiculous 10x-long set of instructions in certain critical instances.
        • yakattak 4 hours ago
          I want to preface this with that I am all for agentic engineering.

          I am so tired of hearing about this false equivalency. Compilers are deterministic, their outputs are well understood and they’re transparent.

          LLMs are not.

        • aprilthird2021 4 hours ago
          We are not rapidly getting to that stage with LLMs and frankly it's hilarious that you are claiming so.

        For anything other than greenfield, new-code projects without dependencies and conventions and connections to other proprietary code, it has to be reviewed. Even in that case, it's not a good idea to skip reviewing the code.

    • onlyrealcuzzo 31 minutes ago
      > I don't know if architecture is a solvable problem for these models, but it is interesting watching the expectations moving over time.

      At least with current languages, I think the primary problem is that they are globally complex, and it's not scalable for them (and certainly not for you, reviewing a codebase they've mainly or completely generated) to ensure that the invariants you want are being upheld.

      No matter how many times you tell them - there is ZERO blocking allowed on the critical path, they will add blocking on the critical path.

      No matter how many times you tell them any time they do X, they need Y type of test, they will do X without Y type of test.

      They cannot follow directions 100%. Neither can people.

      But they are more random. The mistakes people make are less likely to be the exact polar opposite of what you wanted.

      People are less likely to see a critical invariant in the code, build themselves a loophole to get through it, write a test that the code fails successfully, and then tell you they did exactly what you asked for, and bury it in a 5k-line commit, where 1000 lines are them changing comments that shouldn't be there in the first place.

      LLMs are great. I'm convinced they're the future. I'm building a language specifically for them: https://GitHub.com/Cuzzo/clear - and to make it easier for YOU to work with them.

      I think once we get around this language problem -- that they need global context for things where they shouldn't -- it will be much less of a challenge to work with them.

      I've had success with them, but it's been so frustrating, that I question how much it's been worth my sanity.
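      Invariants like "zero blocking on the critical path" can at least be guarded mechanically instead of restated in every prompt. A crude sketch (the banned-call list is illustrative, not exhaustive; real blocking detection needs more than pattern matching):

```python
# Crude CI guard: flag known blocking calls in critical-path source files.
# The banned-call list is an illustrative assumption, not a complete rule set.
import re

BANNED = {
    "time.sleep": r"\btime\.sleep\(",
    "sync HTTP call": r"\brequests\.(get|post)\(",
    "Future.result": r"\.result\(",
}


def blocking_violations(source: str) -> list[str]:
    """Return the names of banned blocking patterns found in `source`."""
    return [name for name, pattern in BANNED.items() if re.search(pattern, source)]
```

      Wired into CI over the (hypothetical) critical-path modules, this turns "I told the model zero blocking" into a failing check that neither the model nor the reviewer can silently ignore.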

    • jayd16 23 minutes ago
      Are any of these steps actually solved? AI tab completion still kinda sucks.

      They can keep internal consistency so the more you let it write the more it can write with internal consistency. It still fails at all of these levels as soon as you are looking at each level of detail.

    • bluGill 4 hours ago
      The models can do architecture. However, they typically (at least currently) do a really bad job until you force them. I use AI all the time, it is getting better, but I still review every single line. Individual lines today are no better than tab completion of last year - sometimes really good and saving my typing, sometimes really really bad.
      • embedding-shape 4 hours ago
        Anyone who understands the motivation, reasoning and goals can do the architecture. The crux is that hardly anyone actually understands those, and even fewer are aligned on them; that's when misalignment happens over time, LLMs or not.

        Considering how fast we can poop out code now, I think this issue is just more visible than before, but it's been an issue for as long as I've been a developer. Almost no one knows what they actually want, and half the job is trying to coax out what they want to be able to do, so you can properly architect it.

    • the__alchemist 3 hours ago
      > I don't know if architecture is a solvable problem for these models, but it is interesting watching the expectations moving over time.

      I think the solution is between the lines of this article. The author states the steps leading to this, but doesn't arrive at it explicitly. It has been obvious (With 50/50 hindsight) to me since LLMs started getting popular, and holds:

      LLMs are fantastic for software dev. If you don't let it write architecture. Create the modules, structs, and enums yourself. Add as many of the struct fields and enum variants as possible. Add doc comments to each struct, enum, field, and module. Point the LLM to the modules and data structures, and have it complete the function bodies etc as required.

    • snowe2010 19 minutes ago
      weirdly made up scenario. I'm the person in the very first sentence. Tab-completing lines is still dog-shit. The majority of the time it has no clue what I'm going to write. Just because it can now write a lot more stuff doesn't mean it isn't still just as incorrect.

      Also, you've set up a huge strawman here. Who are these people saying these things in this order and why is that the argument and not "You need to be reviewing every line of code that gets written and understand it."

      Your argument is nonsense.

    • wiseowise 2 hours ago
      > but it is interesting watching the expectations moving over time.

      While the salary stays stagnant or even reduced if you adjust for inflation.

    • wolttam 4 hours ago
      These models understand architecture perfectly well, but they're not trained to care about it when being asked to complete X or Y feature. They're trained to implement the feature in the shortest route possible.

      So it's not much of a surprise that this is the situation folks find themselves in with the current models.

    • vrganj 4 hours ago
      As somebody with a colleague that is using AI agents to "complete features", let me tell you, it is not. It is taking that dude so much longer to prompt and reprompt and then prompt again until it is anywhere close to something that passes review than it would take any competent mid-level engineer to just build the whole thing with some autocomplete help.

      Have people's standards for quality just completely vanished in the pursuit of the shiny new thing? Is that guy doing something wrong?

      That has also been my experience with this sort of thing fwiw, which is why I gave up and do more of a class-by-class pairing with an LLM as a workable middle ground.

    • koumou92 4 hours ago
      100% agree. Obviously AI is at a point where the developer has to do the architecture. Or at least be in control of what kind of architecture the AI is implementing. You can't one-shot huge features in huge codebases with AI. You are bound to get strange decisions. But that does not mean they are not worth using. That's a silly take.
    • callamdelaney 3 hours ago
      Architecture is one of the easiest things in programming frankly.
    • hansmayer 4 hours ago
      > Now it's completing features

      It's completing shit. Even if it does not implement some lazy stuff with empty catch blocks (i.e. the happy path from programming 101 tutorials), it will either expose your secrets somewhere visible or do some other stupidity.

    • user34283 4 hours ago
      I felt the same with:

      "it takes too much effort to get the output production ready"

      turning into

      "maybe long term the maintenance will be more expensive"

      I give it three months until people realize that you rarely need to review every single line and fully understand the code, like so many comments are claiming.

      • camdenreslink 4 hours ago
        If you work on a product with an existing user base that expects things to keep working, then you definitely still need to read the code. LLMs frequently break things or introduce subtle incompatibilities.

        Maybe on projects with no users you can yolo things.

        • user34283 15 minutes ago
          It's not about the number of users but the kind of software you develop.

          In a mobile app, do you think it's more important to test that your drag gesture works as expected on the phone, or to understand every line of the implementation?

    • dzonga 4 hours ago
      the autocomplete can be shit some times.
    • keybored 3 hours ago
      There are always people who will disagree, no matter how amazing something is, and they naturally respond with concerns close to the locus of the LLMification. It would be absurd to respond to “AI autocomplete is great now” with “but you still need to architect your code”. What’s people saving seconds on code minutiae got to do with architecting the code?

      This blob of people criticizing AI is just that, a blob: a gaggle of discrete people that your brain weaves into a narrative about some goalpost-shifting entity.

      Of course there could be individuals who have moved the goalposts. Which would need a pointed critique to address, not an offhand “people are saying” remark.

    • taytus 3 hours ago
      Nice fiction story.
  • peterbell_nyc 5 minutes ago
    I'm generally in agreement with everyone here.

    - Some code is ephemeral: it's generated to do the thing, thrown away at the end of the session, and the csv was imported successfully (or whatever). Make sure you have at least some testing of the output, or you may find the email is in the last-name field for some rows. If possible, have an API your agent uses with rich domain types and validations that force it to do things right or do them again (and that it can't rewrite to relax the constraints!).

    - You can one- or few-shot a real app, for a few users, for a small set of use cases. The scope of this will improve with models, but at least today it's "spelling bee app for my kids", not "salesforce replacement for millions of workers".

    - You can add rich validation steps for all the types of quality you care about, which (assuming they converge) can deliver performant, well-designed, and functionally correct code mostly autonomously.

    I'm building an orchestrator (who isn't). Haven't looked at the code yet, but it appears to work. But man have I spent hours in loops between Claude, Codex and myself all on the highest thinking levels to figure out what interface portability means for the employee, how best to handle "remote" sessions and the appropriate semantics for pipelines/recipes.

    I've also been very opinionated about who does what. I'll let the agent write a script to sync with github and reload workers, but I decided to "waste" the 5 minutes to manually do all of the config steps on render for my server when claude told me that I couldn't just give it read-only scope to pull the logs. Bad news: I'm cutting and pasting for my computer overlord. Good news? Claude can't blow away the prod db if it happens to get in the way of whatever interpretation it makes of the instructions I give it.

    A chainsaw requires very different skills than an axe. It has different failure modes. Some experience as a lumberjack probably helps using either/both.

    No difference (at least now) with agents.

  • baddash 12 hours ago
    I've set a few rules for working with coding agents:

    1. If I use a coding agent to generate code, it should be something I am absolutely confident I can code correctly myself given the time (gun to my head test).

    2. If it isn't, I can't move on until I completely understand what it is that has been generated, such that I would be able to recreate it myself.

    3. I can create debt (I believe this is being called Cognitive Debt) by breaking rule 2, but it must be paid in full for me to declare a project complete.

    Accumulating debt increases the chances that code I generate afterwards is of lower quality, and it also feels like the debt is compounding.

    I'm also not really sure how these rules scale to serious projects. So far I've only been applying these to my personal projects. It's been a real joy to use agents this way though. I've been learning a lot, and I end up with a codebase that I understand to a comfortable level.

    • jimsojim 11 hours ago
      While this is a legitimate set of rules to follow for maintaining code sanity and a solid mental model of how a codebase may grow, it’s always challenging to stick to them in a workplace where expectations around delivery speed have changed drastically with the onset of AI. The sweet spot lies in striking a balance between staying connected to the codebase and not becoming a limiting factor for the team at the same time.
      • baddash 11 hours ago
        That's kind of what I figured, sadly. I haven't experienced it personally yet since I got let go from my last job about 14 months ago, but it makes so much sense given how management is so willing to sacrifice quality for speed.
        • jimsojim 10 hours ago
          Another frustrating thing that has emerged from this is where managers “vibe code” half-baked ideas for a couple of hours and then hand it off as if they’ve meaningfully contributed to the implementation. Suddenly you’re expected to reverse engineer incoherent prompts, inconsistent code, and random abstractions that nobody fully understands.

          In their mind they’ve already done the “architectural heavy lifting” and accelerated the team. More often than not it just adds cognitive overhead where you spend more time deciphering and cleaning up garbage than actually building the thing properly from scratch.

          • KronisLV 8 hours ago
            Vouching for this comment because my friend confided in me a week ago that her manager also does this and is like “oh yeah, here’s 80% done, you just do the rest so we can ship it” when a large part of it is slop that needs to be rewritten, due to not enough guidance and pushback during generation.
            • eCa 8 hours ago
              That’s when you ask it to write tests to a good coverage, and then have it reimplement everything with the tests still passing…
              • camdenreslink 4 hours ago
                Writing tests against a bad implementation usually doesn't work well. In this scenario I would have an LLM look at the changes in the branch and try to create a markdown document of the changes, why it thinks they were made, etc. and then review that doc with the manager and do a new implementation from scratch after aligning.
              • throwaway173738 4 hours ago
                Be careful with this because the LLM will just change the tests on you to get them to pass.
              • KronisLV 7 hours ago
                Unless the tests are written against logic that is in and of itself subtly wrong, and even the structure of the code and what methods there are is wrong - so even your unit tests would have to be rewritten because the units are structured badly.

                It’s a valid direction to look in, it just doesn’t address the root issue of throwing slop across the wall and also having unrealistic expectations due to not knowing any better.

                • skydhash 7 hours ago
                  Yep. It’s very healthy to be suspicious of code. Any code. Whether generated or not. That’s where the bugs are.

                  If there’s one thing that’s disturbing about AI proponents, it’s how trusting they are of code. One change in the business domain and most of the code may turn from useful to actively harmful, which you then have to rewrite. Good luck doing that well if you’re not really familiar with the code.

          • 6LLvveMx2koXfwn 9 hours ago
            I am lucky to have never worked in a team where my manager wouldn't expect strong push back in this scenario. Many of the corporate environments described on here seem dystopian, this included.
          • baddash 1 hour ago
            lol I pray to the lord almighty my future managers aren't this fuckin dumb
        • 9dev 1 hour ago
          To make a bit of a counter argument here - it's really hard to stick to 100% quality at speed 1.0, when your opponent argues for 90% at 2.5. That's the story the AI fast-movers are telling, and from a business perspective, it's hard to counter it (regardless of whether that speed increase actually materialises).
    • brabel 11 hours ago
      I was trying to follow similar rules, until one day I had to solve a hard mathematical problem. Claude is a phd level mathematician, I am not. I, however, know exactly the properties of the desired solution and how to test that it’s correct. So I decided to keep Claude’s solution over my basic, naive one. I mentioned that in the pull request and everyone agreed it was the right call. Would you make exceptions like that in your rules? What if AI becomes that much better than you at coding, not just at doing advanced mathematics? Would you then stop writing code by hand completely, since that would be the less optimal option, despite losing your ability to judge the code directly at that point (though, as in my example, you can still judge the tests)? I think these are the more interesting questions right now.
      • Jweb_Guru 11 hours ago
        > Claude is a phd level mathematician

        Unfortunately, it is not, and many of its attempts at mathematical proofs have major flaws. You shouldn't trust its proofs unless you are already able to evaluate them--which I think is pretty much all the OP is saying.

        • adrianN 10 hours ago
          To be fair, many of the proof attempts that mathematicians do also have major flaws. Most get caught before getting published.
          • seba_dos1 5 hours ago
            But that's the actually important difference. Mathematicians have the toolset and processes to catch the flaws, random people using Claude don't.
        • IanCal 9 hours ago
          Trust isn’t a binary, and I can trust things I don’t understand enough that I can use them. OP was talking about needing to understand, which is quite a bit above the level of being able to validate enough to use for a task.
      • auggierose 10 hours ago
        I definitely wouldn't put math in my code I didn't understand just because Claude says so. I am not astonished that everyone agreed, that's why shit is going to hit the fan pretty badly pretty soon due to AI coding.

        There is one exception to this: If the AI also delivers the proof of why the math is correct, in a machine-checked format, and I understand the correctness theorem (not necessarily its proof). Then I would use it without hesitation.

        • hennell 9 hours ago
          I always found it weird when helping people with excel formulas how few people even try to check maths they don't understand, let alone try to understand it.

          I struggle to remember even relatively simple maths like working out "what percentage of X is Y" so if I write a formula like that I'll put in some simple values like 12 and 6 or 10,000 and 2,456 just to confirm I haven't got the values backwards or something. I've been shown sheets where someone put a formula in that they don't understand, checked it with numbers they can't easily eyeball and just assumed it was right as it's roughly in their ball park / they had no idea what the end result should be.

          Then again I've also seen sheets where a 10% discount column always had a larger number than the standard price so even obviously wrong things aren't always checked.
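          That spot-check habit translates directly to code. A toy sketch in Python rather than Excel (the `pct_of` helper is a made-up name for illustration, not anything from the thread):

```python
# Sanity-check a formula you don't fully trust by feeding it values
# whose answer you can eyeball. (Illustrative sketch only.)

def pct_of(x: float, y: float) -> float:
    """What percentage of x is y? Half of 12 is 6, so pct_of(12, 6) should be 50."""
    return y / x * 100

# Easy values first: the answer must obviously be 50%.
assert pct_of(12, 6) == 50.0

# A value you can roughly eyeball: 2,456 of 10,000 is about 24.6%.
assert abs(pct_of(10_000, 2_456) - 24.56) < 1e-9

# Getting the arguments backwards gives an obviously wrong answer (200%),
# which is exactly the mistake this kind of spot check catches.
assert pct_of(6, 12) == 200.0
```

          The same trick works in a spreadsheet: drop 12 and 6 into the formula's inputs before trusting it with numbers you can't eyeball.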

        • ncruces 9 hours ago
          I don't disagree, but whoever never put math they don't fully understand in their code gets to throw the first stone.

          I've reached solutions by trial and error too, and tried to rationalize them later, quite a few times. And it's easier to rationalize a working solution, however adversarial you claim to be in your rationalization.

          I don't see using gen AI for the (not so) “brute force” exploration of the solution space as that different from trial and error and post fact rationalization.

      • boron1006 10 hours ago
        > Claude is a phd level mathematician, I am not

        I’m going to guess that this is Gell-Mann amnesia more than anything, and it’s going to get a lot of organizations into a lot of weird places.

      • layer8 9 hours ago
        How did you test that the solution is correct? Is the set of possible inputs a low-ish finite number?

        Normally with mathematical problems you have to prove the solution correct. Testing is not sufficient, unless you can test all possible inputs exhaustively.
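        The exhaustive case can be made concrete with a toy example (the bit-trick claim below is my illustration, not something from the thread):

```python
# When the input space is small and finite, exhaustively checking a claimed
# solution really is as strong as a proof -- for that domain.
# Claim: for x > 0, x & (x - 1) clears the lowest set bit of x.

def clear_lowest_bit_claimed(x: int) -> int:
    return x & (x - 1)

def clear_lowest_bit_reference(x: int) -> int:
    # Slow but obviously correct: isolate the lowest set bit, then subtract it.
    return x - (x & -x)

# Every 16-bit input is covered, so nothing is left to prove for this domain.
for x in range(1, 1 << 16):
    assert clear_lowest_bit_claimed(x) == clear_lowest_bit_reference(x)
```

        For unbounded inputs, though, a loop like this only covers the range it ran over; anything beyond that needs an actual proof.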

      • topranks 9 hours ago
        How do you know what it spat out is correct though?

        If it’s beyond our ability to review and we blindly trust it’s correct based on a limited set of tests… we’re asking for trouble.

      • rubzah 6 hours ago
        > Claude is a phd level mathematician

        ... that can't even count.

      • imtringued 9 hours ago
        You do realize you can ask Claude about the things you don't understand?

        "PhD level" just means you finished a bachelor and masters degree and are now doing a bit of original research as an employed research assistant.

        Claude isn't "PhD level" anything. This shows a complete lack of understanding here. Claude has read every single text book in existence, so it can surface knowledge locked away in book chapters that people haven't read in years (nobody really reads those dense books on niche topics from start to finish).

        Since Claude has infinite patience, you can just keep asking until you get it.

        • famouswaffles 2 hours ago
          Man people really overestimate training. Claude did not 'read' any of that either. I wish frontier models behaved like people that had read and remembered everything they've trained on, but they're not.
    • dathanb82 12 hours ago
      I’ve also heard it being called “comprehension debt,” which I like a little more because I think it’s more precise: the specific debt being accrued is exactly a lack of comprehension of the code.
      • cassianoleal 9 hours ago
        I think it’s both in fact.

        Comprehension debt just sounds like there are things you don’t (yet) understand.

        Cognition debt means your lack of understanding compounds and the cognition “space” required to clear it increases accordingly.

        An increasing comprehension debt that can be paid off one bit at a time within reasonable cognition space takes linear time to clear.

        Cognition debt takes exponential time to clear the more of it you have. If it reaches a point where you simply don’t have the space for the cognition overhead required to understand the problem, you probably need to start over from your specifications.

        • baddash 1 hour ago
          Woah, didn't expect a cognition debt researcher to be in the comment section

          jk :D your points make a lot of sense though!

          • cassianoleal 42 minutes ago
            lol no. Just an opinionated grey bearded software engineer with sleep deprivation. :D
      • layer8 9 hours ago
        I like that too. However, “cognitive debt” points to the possibility of cognitive overload, that the code can become so complex and inscrutable that it may become impossible to comprehend. “Comprehension debt” sounds a bit weaker in that respect, that it’s just a matter of catching up with one’s comprehension.
      • baddash 11 hours ago
        Yeah I like that better too, gonna start using that
      • kortilla 9 hours ago
        “You can outsource your thinking, but not your understanding.”
    • IanCal 9 hours ago
      This is fine if it’s more enjoyable for you, that’s what’s important in personal projects most of the time.

      But we don’t follow the same things for dependencies, work of colleagues, external services, all the layers down to the silicon when trying to work.

      Why is AI suddenly different?

      We just have to do this by risk and reward. What’s the downside if it’s wrong, and how likely is an error to be found in testing and review? What is the benefit gained if it’s all fine? This is the same for libraries and external services.

      A complex financial set of rules in a non-updatable crypto contract with no testing?

      A viewer for your internal log data to visualise something?

      • marginalia_nu 8 hours ago
        It is and has always been immensely helpful to understand what you are doing in any context.

        There are some programmers who treat the job as just plumbing together what is to them completely incomprehensible black boxes, who treat the computer as a mystery machine that just does things "somehow", but these programmers will almost always be hacks that spend their entire career producing mediocre code.

        There are things such a programmer can build, but they are very limited by their lack of in depth understanding, and it is only a tiny fraction of what a more competent programmer can put together.

        To get beyond being a hack, you need to understand the entire stack, including the code that you didn't write, including both libraries, frameworks and the OS, and including the hardware, the networking layers, and so forth. You don't have to be an expert at these things by any means, but you do need to understand them and be comfortable treating them as transparent boxes that you may have to go in and fiddle with at some point to get where you need to go. Sometimes you need to vendor a dependency and change it. Sometimes you need to drop it entirely and replace it with something more fit for purpose you built yourself.

        • lo_zamoyski 3 hours ago
          That's a little simplistic and lacking in nuance.

          > To get beyond being a hack, you need to understand the entire stack, including the code that you didn't write, including both libraries, frameworks and the OS, and including the hardware, the networking layers, and so forth.

          I think maybe you overestimate your own knowledge here. It's one thing to understand general principles and design, or to understand a contextually-relevant vertical or whatever. It's another to demand comprehensive (even if not expert) familiarity in non-trivial projects, especially those created by many developers over long time spans. It's not just a question of intelligence or dedication, or even just time spent working on a project.

          The amount of software even your typical piece of code relies on is staggering and shifting, and it's only getting more complicated. A good chunk of software engineering and programming language research has been focused on making it practical to operate in such a complex environment - an environment that nobody fully understands - which is a major part of why modularity exists. Making software like "plumbing together [...] black boxes" is exactly what such research aspired to accomplish, because it allows different developers to focus on different scopes and on the domain they're working on. Software engineering is a practical field, and any system that requires full knowledge to operate, modify, and extend is either relatively small (maybe greenfield and written by a sole developer) or impractical to work with.

          So I would say there's a wide gap between "lazy guy who doesn't give a shit" and "guy who thinks he can understand everything". Both lack the humility and wisdom needed to know the limits of their knowledge, to circumscribe what he needs to understand, and to operate within the space these afford. (Both extremes remind me of cocky junior devs. On the one hand, you have the junior dev who carelessly churns out "hot shit" garbage code by plumbing things together with no grasp or appreciation of sound design; on the other you have the dev who makes a big show about "rigor" completely detached from the actual realities and needs of the project. In each case, the dev is failing to engage intelligently with the subject matter.)

          • marginalia_nu 1 hour ago
            Well I mean I've built an internet search engine from scratch[1] and I'm making a living off this successfully enough to have completely left the wagie existence for the foreseeable future, so I think I at least kinda walk the talk.

            I'm far from the best at anything and make no claims toward knowing everything, but I do think I have reasonable breadth in my experience and work, and I don't think I could have built something like this otherwise.

            [1] ... which is something that does not decompose neatly into black boxes and must to a large degree be built from first principles, as goddamn nothing off the shelf scales well enough to deal with multi-terabyte workloads at even a fraction of the speed a bespoke solution can.

            • baddash 1 hour ago
              dude... I need to break free from the wagie existence too. any advice? besides the points you've made already :P
              • marginalia_nu 1 hour ago
                Build something people want.

                Tired point, but it really is true. People will throw money at you if you do.

                • baddash 1 hour ago
                  ok, I kind of knew that already LOL, but I don't have any questions that are more specific so I can't really complain. just gotta get after it i guess.
      • wccrawford 7 hours ago
        AI is different because it's a tool, and the user of the tool is responsible for the work performed.

        An outsourced developer isn't a "tool". They're a human being, and responsible for their actions. They're being paid, and they either act responsibly or they get replaced.

        A vibe coder is a human using a tool. The human is responsible for code quality, and if it's not good enough, they need to keep using the tool to make it better. That means understanding the tool's output.

        If an artist used Photoshop to create a billboard ad that was ugly, they don't get to blame Photoshop. They have to keep using the tool until their output is good.

        • IanCal 2 hours ago
          I don’t find “but who is to blame, ultimately?” all that useful.

          So you figure out that someone you paid is at fault, instead of someone they hired. Your contract is with them, so what really changes? To you as a customer, what process or anything else is really different between a company with a manager who asks a team of devs and a company that asks an AI agent?

          Maybe it changes who gets fired or sued or whether one insurance or another pays out- but broadly I think none of what I said about project work really changes.

          Product owners, and hell, even customers, have been able to get software they don’t understand all the details of (or, in the customers' case, never even see the code for), purely driven by natural language.

        • CoastalCoder 6 hours ago
          > An outsourced developer isn't a "tool".

          I'd think that depends on the model of responsibility at play.

          For example, suppose I hire a building contractor to build a house, and the electrician he subcontracts makes mistake.

          From my perspective, the prime contractor is equally responsible for that mistake regardless of whether he used a subcontractor, or did the work himself but used a broken tool.

          This doesn't make the electrician any less of a "person" in the deeply important ways, but it's not a distinction that's relevant to my handling of the problem.

          • kelvinjps10 4 hours ago
            But internally it would work the same for this contractor: the subcontractor would either learn or get replaced.
    • seba_dos1 5 hours ago
      I had a similar approach, but in the end I don't think it's feasible to actually sufficiently follow rule 2. It sounds good in theory, but in practice you'll always take some mental shortcuts that you may not even be aware of.

      Try digging into an unknown codebase to fix some issue, and compare how much will stay in your head a week after if you do it yourself versus if you "completely understand" what an agent did for you. When I do it myself, it contributes to my general knowledge and I mostly retain the important parts in my head even if I lose the details over time; when I try to own what an agent did as if it were mine, it feels like I understand it well at the time, after putting some effort into it, but then I forget it all very fast.

      Ultimately I decided that having an LLM help me there is actually detrimental to my goals most of the time, and that's without even considering some other concerns raised by sibling comments here, such as time and business pressures.
    • TranquilMarmot 12 hours ago
      This is great until the "gun to your head" is your skip-level manager demanding that a feature be implemented by the end of the week, and they know you can just "generate it with AI" so that timeline is actually realistic now whereas two years ago it would have required careful planning, testing, and execution.
      • fransje26 8 hours ago
        Well, that's nice.

        Your manager is unknowingly helping you create a form of job security for yourself, with all the technical debt and bugs being accumulated.

        He might not understand it, and it might not be the type of work you want to do, but someone is going to have to fix those issues. And the longer they wait, the bigger the task gets.

        • TranquilMarmot 13 minutes ago
          The bet that management is making is that the AI will continue to improve and that it will be able to fix those issues on the cheap - so far this has proven to be true for us. We use AI to generate code at scale, that code has issues at scale, so we use AI to fix those issues.
        • anygivnthursday 7 hours ago
          That isn't new, though. Managers often pushed unrealistic timelines and showed little care for tech debt well before vibe coding; just the timelines were different, and the magnitude will be bigger this time. But we also have LLMs to help clean it up faster, I guess.
        • 2ndorderthought 7 hours ago
          The question is: is it a job you actually still want once the poo pile reaches critical mass, you're the only one with a shovel, and the deadline is "yesterday"?
          • fransje26 6 hours ago
            That is absolutely true. Unfortunately, this ship has sailed and we are not closing Pandora's box anymore. We'll have to adapt.

            But we still hold good cards in hand.

            Do they want their pile of steaming slop fixed, or not? Because no amount of complaining about the deadline being "yesterday" is going to change the fact that time will be needed to fix the accrued technical debt, whether they like it or not. And if AI dug you in that deep to start with, the solution is not to dig deeper.

            I suspect some companies are going to find that out the hard (costly) way.

      • wccrawford 7 hours ago
        If the manager is unreasonable, you were always going to have a problem with them, eventually. Nothing you can do will fix this.

        If the manager is reasonable, you can explain to them that there isn't time to check the work of the AI, and that it frequently makes obscure mistakes that need to be properly checked, and that takes time.

        At this point, if they still insist you just hand over the AI's work, they've made a decision that is their fault. You've done what you can.

        And when the shit hits the fan, we're back to whether they're reasonable or not. If they are, you explained what could happen and it did. If they force responsibility on you, they aren't reasonable and were never going to listen to you. That time bomb was always going to go off.

        • TranquilMarmot 10 minutes ago
          The problem is that this mode of operation for them works - they get the features made in a fraction of the time it used to take, the feature does what it says on the tin, they feel good about pushing the product in a specific direction. If something goes wrong, the AI can fix it, too.

          I'm not sure that there's really a "bomb" hiding in here anywhere. The issue is that it IS "reasonable" now to expect big features to be done within a week.

        • baddash 1 hour ago
          Great point! Looks like you got some strategies for dealing with managers lol
      • nertirs3 11 hours ago
        I hate this current trend of managers deciding, what tools developers have to use. Hopefully it ends soon.
        • nikau 10 hours ago
          Time will tell if outages and defect resolution sky rocket or if ai can deal with it
          • adrianN 10 hours ago
            Does that matter that much in practice? I bet lots of customers are okay with software that crashes 10x as much if it costs 10x less. There already is a ton of shitty software that still sells.
            • drchickensalad 4 hours ago
              > if it costs 10x less

              This will not happen. Nobody desires to give that up. Also AI does not deliver even remotely that much true value multiplier

    • whitefang 11 hours ago
      I agree to this though it also depends on the nature of project.

      Had a project idea which I coded with the help of AI, and it grew quite large, to the point that I was starting to have uncharted areas in the code, mostly because I reviewed it too shallowly or moved too fast.

      That was fine, as that project never floated, but if I did such a thing on my breadwinning project I would lose the joy.

    • gritzko 11 hours ago
      I just had a Claude episode. Instead of trying to fix the bug, it edited the data to hide the bug in the sample run. This kind of BS behavior is not rare. Absolutely, if you do not understand every bit of what's going on, you end up with a pile of BS.
      • dzhiurgis 10 hours ago
        That’s why I love gemini - none of this bullshit ever happens.
        • gritzko 8 hours ago
          I do not think Gemini can relieve a developer from knowing what he is doing.
          • dzhiurgis 4 hours ago
            No, but it can actually go and fix a bug instead of disabling a unit test..?
          • 2ndorderthought 7 hours ago
            There's some really weird and unusual posts glazing Google in here today. Bot accounts out in force!
            • dzhiurgis 4 hours ago
              Says 27 day old account
    • throwaway2027 8 hours ago
      I already followed those rules mostly with StackOverflow and before AI.
    • bmitc 11 hours ago
      This is about how I use it. I initially use it to carve out an architecture and iterate through various options. That saves a lot of time for me having to iterate through different language features and approaches. Once I get that, I have it scaffold out, and I go in and tidy things up to my personal liking and standards. From there, I start iterating through implementations. I generally have been implementing stuff myself, but I've gotten better at scaffolding out functions/methods through code instead of text. Then I ask it to finish things off. That falls into your first category of letting it implement stuff that I already know I could do. Not sure if it's faster. But it's lower cognitive load for me, since I can start thinking about the next steps without being concerned about straightforward code.

      This all works pretty great. Where it starts going off the rails is if I let it use a library I'm not >=90% comfortable with. That's a good use of these tools, but if I let it plow through feature requests, I end up accumulating debt, as you pointed out.

      For my uses, I'm still finding the right balance. I'm not terribly sure it makes me faster. What I do think it helps with is longer focused sections because my cognitive load is being reduced. So I can get more done but not necessarily faster in the traditional sense. It's more that I can keep up momentum easier, which does deliver more over time.

      I'm interested in multi agent systems, but I'm still not sure of the right orchestration pattern. These AI tools still can go off the rails real quick.

    • i_love_retros 7 hours ago
      It's not worth fighting it at work. If the idiots you work for want everything vibe coded and delivered at 5 * 2025 speed then just vibe code and try to leave the company ASAP. That's where I am right now. Of course I might end up somewhere just as ridiculous or maybe not be able to even find another job. Shitty times we live in right now.
  • jwpapi 8 hours ago
    That’s the same story I had.

    The swindle goes like this: AI on a good codebase can build a lot of features. You think it’s faster; it even seems safer and more accurate at times, especially in domains you don’t know everything about.

    This goes on for a while, whilst the codebase gets bigger, exploration takes longer, and the failure rate increases. You don’t want it to be true, so you try harder, and you only stop after it has become practically impossible to make any changes.

    You look at the code again, and there is so much of it that spaghetti is an understatement - it’s the Chinese wall.

    You start working… and you realize what was going on.

    I deleted 75,000 of 140,000 lines of code. I honestly feel like I wasted the 3 months I went hard into agentic coding, and that I failed my users: building useless features, increasing bugs, losing the mental model of my code, and not finding the problems I didn’t know about - the kind of hard decisions you only see when you’re in the code, the stuff that wanders in your mind for days.

    • dxdm 7 hours ago
      I find it interesting that this outcome is a surprise. I don't want this to sound smug, I'm genuinely curious what the initial expectations are and where they come from.

      They seem to be different for LLMs, because would anyone be surprised if they handed summary feature descriptions to some random "developer" you've ever only met online, and got back an absolute dung pile of half-broken implementation?

      For some reason, people seem to expect miracles from some machine that they would not expect of other humans, especially not ones with a proven penchant for rambling hallucinations every once in a while.

      I'd like to know, ideally from people who've been there, why they think that is. Where does the trust come from?

      • manicennui 13 minutes ago
        Receiving an absolute dung pile of half-broken implementation is honestly what I expect from most working software engineers. Now the step where they spend even a second thinking about what they are doing has been removed. My job as a principal engineer became doing most of the thinking for people and then providing the only worthwhile code reviews before LLMs became a thing. LLMs just made these people even less useful, and my job became even more about reviewing their low quality work that I could have done in less time manually.

        LLMs also don't solve the much bigger problem of most software engineers having no ability to work with others to clarify requests or offer alternatives. So now bad and/or misunderstood requests can be implemented faster.

      • throw101010 5 hours ago
        LLMs do deliver "miracles", in certain cases. If you've experienced it and have been blown away by their output (a one-shot functional app from a well manufactured prompt, a new feature added flawlessly on a complicated existing codebase, etc.), it can be tempting to readjust your expectations and think this will work consistently and at a much larger scale.

        They can assimilate hundreds of thousands of tokens of context in a few seconds/minutes and do exceptional pattern matching beyond what any human can do; that's a main factor in why it looks like "miracles" to us. When a model actually solves a long-standing issue that was never addressed due to a lack of funding/time/knowledge, it does feel miraculous, and when you are exposed to this a couple of times it's easy to give them more trust, just like you would trust someone who provided you a helping hand a couple of times more than a total stranger.

        • dxdm 5 hours ago
          Thanks, that makes sense.

          I suppose it's difficult to account for the inconsistency of something able to perform up to standard (and fast!) at one time, but then lose the plot in subtle or not-so-subtle ways the next.

          We're wired to see and treat this machine as a human and therefore are tempted to trust it as if it were a human who demonstrated proficiency. Then we're surprised when the machine fails to behave like one.

          I have to say, I'm still flabbergasted by the willingness to check out completely and not even keep on top of what gets produced, or keep a mental model of it. But the mind is easily tempted into laziness, I presume, especially when the fun part of thinking gets outsourced and only the less fun work of checking is left. At least that's what makes the difference for me between coding and reviewing: one is considerably more interesting than the other, and they are much less similar than they should be, given that both should require gaining a similar understanding of the code.

      • doginasuit 7 hours ago
        I've never relied on an LLM to build a large section of code but I can see why people might think it is worth a try. It is incredible for finding issues in the code that I write, arguably its best use-case. When I let it write a function on its own, it is often perfect and maybe even more concise and idiomatic than I would have been able to produce. It is natural to extrapolate and believe that whatever intelligence drives those results would also be able to handle much more.

        It is surprising how bad it is at taking the lead given how effective it is with a much more limited prompt, particularly if you buy in to all the hype that it can take the place of human intelligence. It is capable of applying an incredible amount of knowledge while having virtually no real understanding of the problem.

      • jeltz 6 hours ago
        Probably same reason people expected outsourcing to the cheapest firm in India would work: wishful thinking. People wanted it to work and therefore deluded themselves.

        Or really the same reason people fall for get rich quick schemes.

    • thunky 6 hours ago
      > You look at the code again and there is so much code spaghetti is an understatement it’s the Chinese wall.

      I don't understand this. A large codebase should be a collection of small codebases, just like a large city is a collection of small cities. There is a map and you zoom into your local area and work within that scope. You don't need to know every detail of NYC to get a cup of coffee.

      It's your responsibility to build a sane architecture that is maintainable. AI doesn't prevent you from doing that, and in fact it can help you do so if you hold the tool correctly.

      • wavemode 4 hours ago
        To use your analogy - there's a big difference between the streets of New York and the streets of Boston. In New York if you know you're on 96th and 3rd, then you automatically know how to get to 101st and 5th - it's just a grid. But not every town is like that - many require you to possess knowledge about specific streets and specific landmarks in order to navigate anywhere.

        To speak more directly - every codebase has local reasoning and global reasoning. When looking at a single piece of code that's well-isolated, you can fully understand its behavior "locally" without knowing anything about any other part of the code. But when a piece of code is tightly coupled to many other parts of the codebase, you have to reason globally - you have to understand the whole system to even understand what that one piece of code is doing, because it has tendrils touching the whole system. That's typically what we call spaghetti code.

        If you leave an AI to its own devices, it will happily "punch holes", and create shortcuts, through your architecture to implement a specific feature, not caring about what that does to the comprehensibility of the system.

      • gf000 5 hours ago
        I don't think that's a useful mental model for software in general.

        There is software that works like this (e.g. a website's unrelated pages and their logic), but in general composing simple functions can result in vastly disproportionate complexity. (The usual example is having a simple loop and a simple conditional, with which you can easily encode Goldbach or Collatz.)

        E.g. you write a runtime with a garbage collector and a JIT compiler. What is your map? You can't really zoom in on the district for the GC, because on every other street there you have a portal opening to another street on the JIT district, which have portals to the ISS where you don't even have gravity.

        And if you think this might be a contrived example and not everyone is writing JIT-ted runtimes, something like a banking app with special logging requirements (cross cutting concerns) sits somewhere between these two extremes.

        • bluGill 4 hours ago
          The GC shouldn't care about all the code it is collecting for. It collects garbage; it doesn't care if the garbage is an intermediate value from your tax calculations or the previous state image from your UI - either way it is garbage and it is gone. Now in a few cases the details of garbage collection matter by enough that it is worth something more invasive for some reason, but the vast majority of code shouldn't care about the other areas.

          On a tiny project it doesn't matter. However, when you have millions of lines of code, you have to trust that your code works in isolation without knowing the details.

          • gf000 4 hours ago
            I don't get your comment. The GC is part of the runtime, to it user code is data. But the JIT compiler and other internal details of the runtime are its code, and there are very real cross-cutting concerns, like the JIT compilers output should take into account what memory representation the GC expects, where are barriers, when one is run etc. So I'm talking about a project such as the JVM.

            > have millions of lines of code you have to trust that your code works in isolation without knowing the details.

            More like hope. This is where good design and architecture helps, as well as strong invariants held up by the language. But given that most applications can't really escape global state (not even internally, let alone external state like the file system), you can never really know that your code will work the way you expect it to - that is, it's not trivially composable to any depth.

      • marcosdumay 2 hours ago
        > A large codebase should be a collection of small codebases, just like a large city is a collection of small cities.

        Oh, great analogy there.

        Just like there's almost nothing in common between a large city and a collection of small cities, a large codebase is completely different from a collection of small codebases too.

        Mostly because of the same kinds of effects.

      • OptionOfT 5 hours ago
        No but the speed up of AI is giving up control, and then you notice these issues too late.
      • gonational 3 hours ago
        It's not that you can't "build a sane architecture" as much as it is difficult to justify the time spent to do this when you can "bang out features" in 10 minutes that would take days to do manually. It's about the economics of code generation. When inventing structure and typing it out as code takes time, thinking deeply about architecture first makes sense. There is another factor, as well: "thinking deeply about the architecture" involves experimentation. You might go down a particular path while coding, and then realize some limitations and/or new ideas, etc. You ultimately craft something that will work well and play well with future code, and which may be easily understood. If somebody stops by your desk and says, "you finish that <3 day feature you were just assigned 2 hours ago> yet?", that'll be the last time you think deeply about anything at work.

        Rather than arguing about the specifics, it's easier to point to numerous concrete examples, such as a fairly simple system - which should be easy to implement in 8-15k lines of code, depending on certain choices (I've been writing code long enough to estimate this relatively accurately) - being still-incomplete while approaching 150k lines. These kinds of atrocities are usually economically infeasible in hand-written code, for 2 reasons: 1) the cost to produce that much code is very high, and 2) the cost of maintaining that much code is insurmountable.

        I guess you could say that AI is great at generating code that only AI can understand and maintain.

    • mjburgess 8 hours ago
      I think this is true, but I imagine there's a workflow solution to this which isn't to drop AI.

      E.g., treating AI-generated code as immediately legacy, with tight encapsulation boundaries, well-defined interfaces, etc., and integrating it in a more manual workflow.
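      One hedged sketch of what "immediately legacy, behind a boundary" could look like in practice - the module and function names here are invented, not from any real project:

```python
from dataclasses import dataclass

# --- imagine this function is AI-generated and treated as sealed legacy ---
def _generated_parse(raw: str) -> dict:
    # (stand-in for a few hundred generated lines)
    key, _, value = raw.partition("=")
    return {"k": key.strip(), "v": value.strip()}

# --- hand-written boundary: the only thing the rest of the code imports ---
@dataclass(frozen=True)
class Setting:
    key: str
    value: str

def parse_setting(raw: str) -> Setting:
    """Narrow, typed interface over the generated code. If the generated
    module rots, only this adapter has to change."""
    d = _generated_parse(raw)
    return Setting(key=d["k"], value=d["v"])

print(parse_setting("timeout = 30"))  # Setting(key='timeout', value='30')
```

      The rest of the codebase only ever sees `Setting` and `parse_setting`, so the generated internals can be regenerated or rewritten without the blast radius spreading.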

      There's a range from single-shot prompts to inline code generation, that will make more sense depending on the problem and where in the code base it is.

      Single-shot stuff is going to make more sense for a prototyping phase with extensive spec iteration. Once that prototype is in place, you then probably want to drop down into per-module/per-file generation and be more systematic - always maintaining a reasonably good mental model at this layer.

      • herrherrmann 7 hours ago
        That workflow just sounds exhausting to me. Would I always need to consider how much of a blast radius my AI-generated code might have? Sounds like there’s so much extra management going into these micro decisions that it ultimately defeats the purpose of generating code altogether.

        I could see value in using it during the prototyping phase, but wouldn’t like to work like you described for a serious project for end users.

        • meetingthrower 7 hours ago
          And you have discovered the job of managers! There has always been a lot of hate for managers. Wonder if the robots hate us just as much? (I often feel a weird guilt when I tell an agent to do something I know I am going to throw away but will serve as an interesting exploration...I know if I did that to a human they would be pissed...)
          • nixon_why69 5 hours ago
            IMO the hate has always been for clueless managers, especially clueless yet demanding managers. Managing an LLM for coding is different, try being clueless and demanding and see how far you get.
            • meetingthrower 4 hours ago
              So you think "good" management translates? I actually think it very much does. Clear expectations, providing right context and "the why", quick and clear feedback loops, intervening early when they are going off track, not micromanaging too much so they can actually accomplish more. It's all very similar.
              • nixon_why69 4 hours ago
                Yes it very much does but managing humans is still very different.

                Understanding your domain, setting clear expectations and understanding limitations and how much ambiguity your people/robots can handle are all good management techniques, they translate.

                But the nature of working with an always-on flattery machine vs humans - who can exceed your expectations while also being sources of infinite drama and frustration - is still fundamentally different. The blind spot is being susceptible to the flattery machine and forgetting how much you relied on good people challenging you. The benefit is, of course, not having to deal with humans.

          • amenghra 5 hours ago
            > I know if I did that to a human they would be pissed

            You call it a hackathon. You tell the human to stay up the whole night. In exchange for the extra hours worked you provide some pizza.

        • embedding-shape 7 hours ago
          I just don't like to type code anymore. If I can accomplish the same by describing the code, and get the same results as if I typed it myself, I'll opt for not typing so damn much. I've done so much typing in my career, that typing ~80% less to get the same results, makes a pretty big difference in how likely I am to set out to accomplish something.

          I care more about code quality now, because typing no longer limits if I feel like it's worth to refactor something or not.

      • timacles 1 hour ago
        This is like getting a person addicted to drugs and then asking them to only use the drugs on Thursday and Friday.

        This seems to me like it requires an impossible level of discipline, judgement and foresight

      • anal_reactor 7 hours ago
        > treating AI code generated as immediately legacy, with tight encapsulation boundaries, well-defined interfaces etc.

        This is good advice regardless whether you're using AI or not, yet in real life "let's have well-defined boundaries and interfaces" always loses against "let's keep having meetings for years and then ducttape whatever works once the situation gets urgent".

    • chorsestudios 7 hours ago
      Were you auto-committing everything without reading the generated code? And if you read it but didn't understand it, why not just ask for detailed comments for each output? Knowing that a larger codebase causes it to struggle means the output needs to be increasingly scrutinized as it becomes more complex.
      • skydhash 7 hours ago
        I don’t think it’s about what the code does. I think it’s more about how the code fits in its whole context. How useful it is in solving the overarching problem (of the whole software). How well does it follow the paradigm of the platform and the codebase.

        You can have very good diffs and then find that the whole codebase is a collection of slightly disjointed parts.

    • gchamonlive 7 hours ago
      I haven't had the chance to work on large codebases, but isn't it possible to somehow adapt the workflow of Working Effectively with Legacy Code, building islands of higher quality code, using the AI to help reconstruct developer intention and business rules and building seams and unit tests for the target modules?
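      As a rough illustration of what a seam in the Working Effectively with Legacy Code sense can look like (the names here are invented for the example):

```python
import time

# Before: the clock is hard-wired, so this can't be unit-tested
# deterministically.
def is_expired_legacy(created_at: float, ttl: float) -> bool:
    return time.time() - created_at > ttl

# After: the clock becomes a seam. Production callers keep the default;
# a test substitutes a fake clock without monkeypatching.
def is_expired(created_at: float, ttl: float, now=time.time) -> bool:
    return now() - created_at > ttl

# The island of code is now under unit test:
assert is_expired(created_at=0.0, ttl=10.0, now=lambda: 20.0) is True
assert is_expired(created_at=0.0, ttl=10.0, now=lambda: 5.0) is False
```

      Whether the seam is cut by hand or with AI assistance, the point is the same: the tested islands become safe ground from which the rest can be reworked.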

      AI doesn’t necessarily have to increase your throughput; it can also serve as a flexible exploration and refactoring tool that supports either later hand-crafted code or an agentic implementation.

    • qudat 6 hours ago
      Yep. I’m approaching the same problem from a different angle: writing code fast means you aren’t being thoughtful about the features you’re building. I started realizing that after I had kids and spent more time thinking about code than writing it and it really improved the quality of my work: https://bower.sh/thinking-slow-writing-fast
    • jwpapi 4 hours ago
      Obviously this is all my fault, and others might have better judgement when using it. I’m just sharing my experience compared to the promises you might easily believe reading X/HN/Anthropic.

      I still have a lot of usage for AI: exploration, double-checking me, teaching me. But letting it write code became very tough for me to accept. Next-edit autocompletes, mainly.

      • scruple 4 hours ago
        > I still have a lot of usage for AI: Exploration, Double-checking me, teaching me.

        I'm ready to give up on having it even review my code at this point. It's been so frustrating. It hallucinates bugs, especially in places where "best practices" are at odds with reality.

        Recently it informed me of a bug where it suggested the line of code in question couldn't possibly do anything because on Linux the specific stdlib behaved in X ways, but it was obvious from the line of code that it was running on Windows which doesn't have this problem at all. Of course, it doesn't actually mention that this is an issue on Linux, just that there is a bug here. It vomits up a paragraph of $WORDS explaining why this was a high-priority bug that absolutely needed to be fixed because it was failing in subtle ways. Yet the line of code in question has been running in production, producing exactly the results it is expected to, for ~3 years.

        And this is just one simple example, of the many dozens+ of times it has failed this task this year. In that same review run, the agent suggested 3 additional "bugs" or other issues that should be addressed that were all flatly wrong or subjective. I'm at a point of absolute exhaustion with this sort of shit. It's worse than a junior half of the time because of how strongly opinionated it is. And the solution to this sort of problem is an endless amount of configuration and customization that will be forgotten about by all of us over time, leading to who knows what sort of knock-on effects (especially as we migrate from one model to the next). We have a guy on our team who has ~17,000 words in his agent and instructions files, yet he sees nothing wrong with this. I guess he just really loves YAML and Markdown.

    • deadbabe 7 hours ago
      I don’t think you truly captured the worst part:

      There comes a realization, to many engineers’ horror, that AI won’t be able to save them and they will have to manually comprehend, and possibly write, a ton of code by hand to fix major issues, all while upper management is breathing down their necks, furious as to why the product has become a piece of shit and customers are leaving for competitors.

      The engineers who sink further into denial thrash around with AI, hoping they are a few prompts or orchestrations away from everything being fixed again.

      But the solution doesn’t come. They realize there is nothing they can do. It’s over.

  • 20k 9 hours ago
    I always find these kinds of posts interesting, to compare the velocity that people seem to get with Ai, vs what I get by just coding by hand

    Coincidentally I've been working on a project for about 7 months now: it's a 3D MMO. Currently it's playable, and people are having fun with it - it has decent (but needs work) graphics, and you can easily cram a few hundred people into the server. The architecture is pretty nice, and it's easy to extend and add features onto. Overall, I'm very happy with the progress, and it's on track to launch after probably a year's worth of development

    In 7 months of vibe coding, OP failed to produce a basic TUI. Maybe the feature velocity feels high, but this seems unbelievably slow for building a basic piece of UI like this - this is the kind of thing you could knock out in a few weeks by hand. There are tonnes of TUI libraries that are high quality at this point, and all you need to do is populate some tables with whatever data you're looking for. It's surprising that it's taking so long

    There seems to be a strong bias where using AI feels like you're making a lot of progress very quickly, but compared to manual coding it often seems to be significantly slower in practice. This seems to be backed up by the available productivity data, where AI users feel faster but produce less

    • manicennui 9 minutes ago
      This is probably because the people who feel like they receive the most benefit from LLMs never actually knew much about writing good software, or were simply incapable of it, before they started using LLMs.
    • ZaoLahma 8 hours ago
      > There seems to be a strong bias where using AI feels like you're making a lot of progress very quickly, but compared to manual coding it often seems to be significantly slower in practice.

      This metric highly depends on who uses the AI to do what, where strong emphasis is on "who" and "what".

      In my line of work (software developer) the biggest time sinks are meetings where people need to align proposed solutions with the expectations of stakeholders. From that aspect AI won't help much, or at all, so measuring the difference of man hours spent from solution proposal to when it ends up in the test loops with and without AI would yield... very disappointing results.

      But for troubleshooting and fixing bugs, or actually implementing solutions once they have been approved? For me, I'm at least 10x'ing myself compared to before I was using AI. Not only in pure time, but also in my ability to reason around observed behaviors and investigating what those observations mean when troubleshooting.

      But I also work with people who simply cannot make the AI produce valuable (correct) results. I think if you know exactly what you want and how you want it, AI is a great help. You just tell it to do what you would have done anyway, and it does it quicker than you could. But if you don't know exactly what you want, AI will be outright harmful to your progress.

      • zzzeek 4 hours ago
        The Venn diagram of people who say "AI produces useless results I have to 100% throw away" and people who say "I've never successfully delegated parts of large software development to junior programmers" is a circle
    • fancythat 4 hours ago
      You nailed it; I came to the same conclusion recently. When people show me what they have done with an LLM, I am left unimpressed, as mostly they show things that can be done manually in a very short time. I have also failed to observe a rise in the availability of impressive software, which coincides with the fact that LLMs are currently being used to solve simple problems instead of important ones.
    • chromadon 8 hours ago
      This struck me as odd too. 7 months? It wouldn’t take that long to write it in a new language.

      Another thing I don’t see mentioned is code quality.

      Vibe-coded code bases are an excellent example of why LLMs aren’t very good at writing code. They will often correct their own mistakes only to make them again immediately after, along with inconsistent pattern use.

      Recently Claude has been making some “interesting” code style choices, not in line with the code base it’s currently supposed to be working on.

    • 21asdffdsa12 8 hours ago
      It seems to be baked into the GPT: producing text - i.e., producing language and code - is its life and purpose. So the whole system is inherently biased towards "roll your own everything" unless spoken to in a "senior-dev" language that prevents these repetitions.
    • echelon 6 hours ago
      This was made in two days of vibe coding. It has flaws, but it's impressive as hell:

      https://tinyskies.vercel.app/

      It's got a fun Zelda-inspired mechanic (I won't say which one), and you'll have to unlock abilities and parts of the world over several quests and modes to "win".

      It's also multiplayer.

    • thot_experiment 9 hours ago
      It's more complex than that. I think the reality is that there's a lot of code that's just not that deep, bro. I have some purely personal projects with components that I don't understand anymore; I wrote that shit by hand, they still work, but I haven't touched that shit in years. There's a lot of code like that which AI can write for me - the stuff I would forget about even if I wrote it by hand. I think you have to have discipline in its use; it's a tool like any other.

      AI, and especially agentic AI, can make you lose situational awareness over a codebase, and when you're doing deep work that SUUUUCKS. But it's not useless, you just have to play to its strengths. Though my favorite hill to die on is telling people not to underestimate its value as autocomplete. Turns out 40 gigabytes of autocomplete makes for a fucking amazing autocomplete. Try it with llama.vim + qwen coder 30b; it feels like the editor is reading your mind sometimes, and the latency is so low.

  • snowe2010 15 hours ago
    > The other change is simpler: I'm doing the design work myself, by hand, before any code gets written. Not a vague doc. Concrete interfaces, message types, ownership rules.

    That’s the hard part of coding. If you have an architecture, then writing the code is dead simple. But if you aren’t writing the code, you aren’t going to notice that you architected an API that allows nulls while your database doesn’t, or that it does allow them but there’s some other small issue you never accounted for.
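    As a hypothetical sketch of what "concrete interfaces, message types" rather than a vague doc can mean - the types and fields below are invented for illustration - even the null-vs-not-null question gets written down and resolved by hand:

```python
from dataclasses import dataclass
from typing import Optional

# Message type for the API: nullability is spelled out in the type.
@dataclass(frozen=True)
class CreateUserRequest:
    email: str                    # API contract: required
    display_name: Optional[str]   # API contract: may be omitted

# Row type for storage: the column is NOT NULL.
@dataclass(frozen=True)
class UserRow:
    email: str
    display_name: str

def to_row(req: CreateUserRequest) -> UserRow:
    # The null-vs-not-null mismatch has to be resolved here, at design
    # time, instead of being discovered in production.
    return UserRow(email=req.email,
                   display_name=req.display_name or req.email)

row = to_row(CreateUserRequest(email="a@b.c", display_name=None))
print(row.display_name)  # "a@b.c" - the fallback decision is explicit
```

    The point isn't this particular fallback; it's that writing the types by hand forces the mismatch to surface before any code is generated around it.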

    I do not know how you can write this article and not realize the problem is the AI. Not that you let it architect, but that you weren’t paying attention to every single thing it does. It’s a glorified code generator. You need to be checking every thing it does.

    The hard part of software engineering was never writing code. Junior devs know how to write code. The hard part is everything else.

    • Philip-J-Fry 10 hours ago
      Yes, I think there's 2 kinds of developer. Those who think the code is the hard part, and those that don't.

      The developers who think coding is hard are the ones that absolutely love AI coding. It's changed their world because things they used to find hard are now easy.

      Those that think coding is easy don't have such an easy time because coding to them is all about the abstractions, the maintainability and extensibility. They want to lay sensible foundations to allow the software to scale. This is the hard part. When you discover the right abstractions everything becomes relatively easy. But getting there is the hard part. These people find AI coding a useful tool but not the crazy amazing magical tool the people who struggle with coding do.

      The OP is definitely in the second camp since they could spot and realise the shortcomings of the AI. They spotted the problem, and that problem is that the AI can't do the hard bit.

      • jfim 9 hours ago
        I'd say there's another camp: the camp of people who know that code isn't the hard part, but that it's still time consuming to write code. AI coding is pretty useful for that, when you can nail the design but you just need a set of hands to implement it.
        • Philip-J-Fry 7 hours ago
          I'm classing that as the second camp, because you don't find it hard to do, it's just time consuming. It means you still know what you're doing and you're just using AI as a tool to accelerate your delivery. That's the optimal way to use it, in my experience, if you want to actually deliver well-architected software.
      • seer 9 hours ago
        But isn’t AI doing the same thing to project management as to coding?

        PMs can now cross reference and organize tickets with just a few keystrokes. Organisational knowledge, business knowledge, design systems and patterns, etc all of it is encoded in LLM consumable artefacts. For PMs it is the same switch - instead of having to do it by hand you direct lower level employees to handle the details and inconsistencies and you just do vibe and vision.

        When all of the pieces successfully connect and execute reliably, what is left for humans to do? Just direct and consume?

        And AI companies with their huge swaths of data are soon gonna be in the situation of being able to do the directing themselves

        • jcgrillo 7 hours ago
          Such a person is just pushing a giant pile of cleanup work onto their colleagues. Unless they actually checked, the "cross references" are probably wrong in places or just entirely made up. Lower level employees by definition don't have the experience to correct the more subtle inconsistencies, so you've basically just constructed a high pass filter that lets only the worst failures through. Moreover, you're absolutely guaranteed to lose the respect of those lower level employees--forcing someone else to clean up your sloppy work is just cruel, and people resent being treated cruelly.
      • byzantinegene 9 hours ago
        This pretty much sums up what I feel about AI currently. It made my life significantly easier for most tasks I already breeze through, yet the tasks I used to struggle with are still just as difficult.
      • mountainriver 5 hours ago
        There are also just problems where the code is the hard part…
        • Philip-J-Fry 4 hours ago
          What problems are they? I can't really think of any problems where writing the code was the hard part.

          There's plenty of times where I don't know what code to write because I've never used a library before. But it's just a page of documentation away. It's not hard, it's just slow and tedious.

    • mikepurvis 14 hours ago
      I agree with what you're saying, but I think we do have a problem right now with definitions where there's a lot of people basically getting supercharged tab completions or running a chatbot or two in a parallel pane, but still clearly reviewing everything; and on the other side of things is freaking Steve Yegge pitching a whole new editor that lets you orchestrate a dozen or more agents all vibing away on code you're apparently never going to read more than a line or two of: https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...

      The first group are still thinking fairly deeply about design and interfaces and data structures, and are doing fairly heavy review in those areas. The second group are not, and those are the ones that I find a bit more worrisome.

      • RossBencina 13 hours ago
        > The first group are still thinking fairly deeply about design and interfaces and data structures, and are doing fairly heavy review in those areas.

        I can't speak for others, but I'd go further and say that LLMs allow me to go deeper on the design side. I can survey alternative data structures, brainstorm conversationally, play design golf, work out a consistent domain taxonomy and from there function, data structure and field names, draft and redraft code, and then rewrite or edit the code myself when the AI cost/benefit trade off breaks down.

        • mikepurvis 41 minutes ago
          GP and this is where I stand with it too. Additionally, the cost of "exploring" down a riskier design path and discovering the unknown unknowns is substantially reduced too, which I think ultimately leads to better decision-making on the design side. It's less "let's just stick to the pattern/tools that we know for sure works because we've done it before" and more "here's a vibed up mockup of it working, we can all see how this actually works and the better pattern that it enables".

          Obviously technical and design choices have risks beyond just initial implementation, and those have to be considered too (do we trust the dependency, will it still be there in a year, can we get fixes merged upstream), but I think there's significant value in driving down the cost of code sketches involving unfamiliar libraries and tools.

      • barrell 11 hours ago
        That’s a little bit of a No True Scotsman. Yes there are people who do not review anything; but even people who are reviewing every line from an LLM do not have the same understanding as someone who wrote it themselves.

        I’m not making a judgement call about which is better, but it was widely accepted in tech before the advent of LLMs that you just fundamentally lack a sense of understanding as a reviewer vs an author. It was a meme that engineers would rather just rewrite a complicated feature than fix a bug, because understanding someone else’s code was too much effort.

      • imtringued 8 hours ago
        That blog post is surreal. It's like cryptocurrencies and the whole web3 nonsense. Cryptocurrencies basically don't work, so there have been a hundred aimless attempts at fixing self-inflicted problems caused by deficiencies of cryptocurrencies, with no actual goal that has any impact on the real world.

        It's the same thing here. AI has dropped the cost of software development, so developers are now fooling themselves into producing low or zero value software. Since the value of the software is zero or near zero, it doesn't really matter whether you get it right or not. This freedom from external constraints lets you crank up development velocity, which makes you feel super productive, while effectively accomplishing less than if you had to actually pay a meaningful cost to develop something.

        Like, what is the purpose of Gas Town? It looks to me like the purpose of Gas Town is to build Gas Town.

        • weakfish 2 hours ago
          I am not exaggerating when I say reading the post made me concerned for the author’s mental health.
      • bmitc 11 hours ago
        > and on the other side of things is freaking Steve Yegge pitching a whole new editor that lets you orchestrate a dozen or more agents all vibing away on code you're apparently never going to read more than a line or two of

        I find it useful to not listen to people who just talk.

      • skydhash 13 hours ago
        > The first group are still thinking fairly deeply about design and interfaces and data structures, and are doing fairly heavy review in those areas

        I worry about the first group too, because interfaces and data structures are the map, not the territory. When you create a glossary, it is to compose a message that transmits a specific idea. I invariably find that people who focus that much on the code forget the main purpose of the program in favor of small features (the ticket). And that has accelerated with LLM tooling.

        I believe most of us who are not so keen on AI tooling are always thinking about the program first, then the various parts, then the code. If you focus on a specific part, you make sure that you have well-defined contracts with the other parts that guarantee the correctness of the whole. If you need to change a contract, you change it with regard to the whole thing, not the specific part.

        The issue with most LLM tools is that they're linear. They can follow patterns well, and agents can have feedback loops that correct them. But contracts are multidimensional forces that shape a solution. That solution appears more like a collapsing wave function than a linear prediction.

    • seer 11 hours ago
      I’ve noticed that agents almost always fail at the planning vs execution stage.

      I follow the plan -> red/green/refactor approach and it is surprisingly good, and the plans it produces all look super well reasoned and grounded, because the agent will slurp all the docs and forums with discussions and the like.

      Trouble is once it starts working there would inevitably be a point where the docs and the implementation actually differ - either some combination of tools that have not been used in that way, some outdated docs, or just plain old bugs.

      But if the goals of the project/feature are stated clearly enough it is quite capable of iterating itself out of an architectural dead end, that is if it can run and test itself locally.

      It goes as deep as inspecting the code of dependencies and libraries and suggesting upstream fixes etc. all things that I would personally do in a deep debugging session.

      And I’m super happy with that approach, as I’m more directing and supervising rather than doing the drudgery of it.

      Trouble is, a lot of my teammates _don't_ actually go this deep when addressing architectural problems; their usual modus operandi is “escalate to the architect”.

      This will not end well for them in the long run, I feel, but I’m not sure what they can do about it themselves - the window of being able to run and understand everything seems to be rapidly closing.

      Maybe that’s not so bad - I don’t know exactly what the compiler is doing to translate things to machine code, and I definitely don’t get how the assembly itself is executed to produce the results I want at scale - that is a level of magic and wizardry I can only admire (lookahead branching strategies and caching on modern CPUs are super impressive - like, how is all of this even producing correct responses reliably at such a scale…)

      Anyway - maybe all of this is OK. We will build new tools and frameworks to deal with it; human ingenuity and the desire for improvement, measured in likes, references or money, will still be there.

    • staplers 14 hours ago

        You need to be checking every thing it does.
      
      This is what seems to be lost on so many. As someone with relatively little code experience, I find myself learning more than ever by checking the results and what went right/wrong.

      This is also why I don't see it getting better anytime soon. So many people ask me "how do you get your claude to have such good output?" and the answer is always "I paid attention and spotted problems and asked claude to fix them." And it's literally that simple but I can see their eyes already glazing over.

      Just as google made finding information easier, it didn't fix the human element of deciphering quality information from poor information.

      • krilcebre 10 hours ago
        How do you know what good output should look like with little code experience?
      • brabel 11 hours ago
        Reading code looking for errors is a hard thing to do well for a large amount of code. A better approach is to ensure tests cover all the important cases and many edge cases. Looking at the code may still be a good idea, but mostly to check the design. I think that once you get Claude to test the code it writes well, trying to find errors by reading the code is a waste of time. I’ve made the mistake of thinking Claude was wrong many times despite the tests passing, just to be humbled by breaking the tests with my “improvements”!
    • tripledry 10 hours ago
      This is the only way for me to use Agents without completely hating and failing at it. Think about the problem, design structures and APIs and only then let AI implement it.
    • skydhash 14 hours ago
      And when you got familiar with the other parts, you realize that writing code is the most enjoyable one. More often than not, you’re either balancing trade offs or researching what factors yoy have missed with the previous balancing. When you get to writing code, it’s with a sigh of relief, as that means you understand the problem enough to try a possible solution.

      You can skip that and go directly to writing code. But that meant you replaced a few hours of planning with a few weeks of coding.

      • jcgrillo 7 hours ago
        Very well said. When I'm working on a hard problem I'll often spend a few weeks sweating details like algorithms, API shapes, wire formats, database schemas, etc. These things are all really easy to change while they're just in a design document. Once you start implementing, big sweeping edits get a lot more difficult. So better to frontload as much of that as possible in the design phase. AI coding agents don't change this dynamic. However all that frontloaded work pays off big when it does come time to implement, because the search space has been narrowed considerably.
  • plastic041 15 hours ago
    Title says

    > back to writing code by hand

    But what they are doing is

    > doing the __design work__ myself, by hand, before any code gets written.

    So... Claude still is generating the code I guess?

    And seriously, I can't understand that they thought their vibe coded project works fine and even bought a domain for the project without ever looking at source code it generated, FOR 7 MONTHS??

    • 0xpgm 10 hours ago
      In short, it is simply a click-bait title.

      And the goal of the article is to draw attention to their project.

      • lelanthran 6 hours ago
        > And the goal of the article is to draw attention to their project.

        Additionally, they couldn't even be bothered to write their own blog post, so it's a little hard to take them seriously when they say they're going to write their own code...

      • kdheiwns 8 hours ago
        It's the same thing every time.

        > Claude (c) by Anthropic (R) is the best thing since sliced bread and I'm Lovin' It(tm)! Here's a breakdown of you too can live a code free life for 10 easy payments of $99.99 a month if you subscribe now!

        > Step one in your journey to code free life: code the whole damn project and put it together yourself

        It's so much fluff and baloney and every single article is identical. And every single one is just over the top praise of Claude that doesn't come off as remotely authentic. There's always mentions of Claude "one shotting"(tm) something.

    • dewey 13 hours ago
      I bought domains for projects minutes after the idea.

      I don’t think it’s that weird to not look at the code if it’s a side project and you follow along incrementally via diffs. It’s definitely a different way of working but it’s not that crazy.

      • bayarearefugee 12 hours ago
        > I don’t think it’s that weird to not look at the code if it’s a side project and you follow along incrementally via diffs.

        It's not weird to not look at the code, as long as you're looking at the code? (diffs?)

        Uh, ok

        • retsibsi 11 hours ago
          The article explicitly says that the author looked at the diffs; it distinguishes this from "sitting down and actually reading the code", which they didn't do. So when plastic041 says the author spent 7 months vibe coding "without ever looking at source code", it's not unreasonable for dewey to assume that "looking at source code", in this context, actually means something stronger and excludes just looking at the diffs.
  • IanCal 9 hours ago
    I feel like I’m watching developers speed run project and product management learnings.

    We’ve moved to seeing that specs are useful and that having someone write lots of wrong code doesn’t make the project move faster (lots of times devs get annoyed at meetings and discussions because it hinders the code writing, but often those are there to stop everyone writing more of the wrong thing)

    We’ve seen people find out that task management is useful.

    Now I’m seeing more talk of doing the design work fully upfront. And we head towards waterfall-style dev.

    Then we’ll see someone start naming the process of prototyping, then I’m sure something about incremental features where you have to manage old vs new requirements. Then talk of how the customer really needs to be involved more.

    Genuinely, look at what projects and product managers do. They have been guiding projects where the product is code yet they are not expected to read the code and are required to use only natural language to achieve this.

    • meetingthrower 9 hours ago
      So right. All these guys have never been managers. Do you think humans don't write things that break? Or that teams sometimes take a wrong path and burn a week of work? Or months? Well now you can experience all of that in 30 minutes of vibecoding. As a former tech product manager, it feels EXACTLY the same.
      • yakshaving_jgt 8 hours ago
        Except it isn't the same because the cost is different, which allows discovery that we couldn't afford previously.
        • IanCal 2 hours ago
          Yes, lots of the process and problems and solutions to them are the same but we’ve just massively cut the cost of a part of development. That has huge ramifications about when it makes sense to tackle different things and how tradeoffs work out.

          Was it strongdm talking about the dark factories? They were working on some integration software so needed to use google drive and slack and lots of other things. They fully reimplemented those to the level they needed for their tests - outside of the biggest firms this would probably have been an enormous time and money sink. Now it’s reasonable.

          On a personal project with my wife, we wanted a tracker for holiday planning. Five minutes given a barely thought-through request and we had a working prototype, fixed bugs in seconds, and then talked through with a model what we needed and how it did or didn’t fit (and we needed that first version to figure that out). It helped drive out actual requirements from us, prioritise them, choose a stack, add tickets, and then went ahead and implemented it pretty far. We have a mostly working v2 which has highlighted some details about what we really wanted. Total invested cost was one day of a $20 subscription and maybe half an hour of talking to a bot and checking results.

        • meetingthrower 7 hours ago
          Yes x 1000. I find it amazing.
  • fitsumbelay 59 minutes ago
    I wonder how viable this debate is outside of dev circles.

    For example: if I'm new to programming today, I'm not part of any community that approves of agentic coding or disapproves of vibe coding, I've heard that C programs run fast as heck and that I can automate jobs 1, 2 and 3 with such a program, and I generate said program and it works as expected per my limited experience - then what's the issue?

    Perhaps in a couple of weeks I notice I'm missing 1/4 of my HD space and I figure out probably via an agent that my cool C program is creating bloat through caching or creating hidden dot files, so I agentically/vibe-ally generate a patch. Maybe this encourages me to join a community of other amateurs or a pro-am community where I learn specifics - eg. the exact bug(s) in my code -- as well as metas -- eg. testing.

    There will probably be millions and millions of people generating code for their own purposes thanks to LLMs, and the number will grow as the technology develops and becomes more accessible. So I wonder how much value there is in the "how to think about this" discussion vs the "how to use this" discussion. It almost feels like religious encampments are forming over false -- possibly manufactured -- lines of division

  • xantronix 15 hours ago
    So you're not actually writing code by hand? I'm very confused by the difference between the title and the conclusion here.
    • rane 13 hours ago
      The point was to come up with a sensationalist headline that HN eats up so the post flies to the front page.
      • Towaway69 9 hours ago
        I wonder whether the title was generated/suggested by an AI?
    • dwedge 4 hours ago
      I don't think they even wrote the article by hand. It seems like the title got to the top of HN, not the article.
  • deferredgrant 16 minutes ago
    There is a craft argument here that is easy to dismiss until you lose fluency. Tools that save effort can also remove practice.
  • viceconsole 13 hours ago
    > Vibe-coding makes you feel like you have infinite implementation budget. You don't. You have infinite LINE budget (the AI will generate as much code as you want). But you have the same finite complexity budget as always.

    This is a special case of a general fundamental point I'm struggling with.

    Let's assume AI has reduced the marginal cost of code to zero. So our supply of code is now infinite.

    Meanwhile, other critical factors continue to be finite: time in a day, attention, interest, goodwill, paying customers, money, energy.

    So how do you choose what to build?

    Like a genie, the tools give us the power to ask for whatever we want. And like a genie, it turns out we often don't really know what we want.

    • TranquilMarmot 11 hours ago
      Right - knowing what to actually build always has been and always will be the limiting factor to actual success. I could spend months and hundreds of dollars generating the absolute BEST todo list that's out there but nobody wants that.
    • ozim 11 hours ago
      I have vibe coded 3 applications I never had time to code but always wanted.

      Now it is different in a way where now I don’t have time to use those apps.

      That’s a joke.

      But I do believe it answers the question of “what to build?”. If you didn’t have time for it before LLM-assisted coding, you still don’t have time for it. You most likely already know, by heart or from some measurements, what gets used and what doesn’t.

  • selfsimilar 23 minutes ago
    > For 7 months I'd been prompting and shipping without ever sitting down and actually reading the code Claude wrote. I'd look at the diff, verify it compiled, test the happy path, move on. But now something was fundamentally broken and I couldn't just prompt my way out of it.

    I stopped reading after this, because this is the dumbest way to vibe code anything larger than a single-use tool.

    Claude is a collaborator, and honestly a decent voice of dissent, but it will never offer that unprompted. "Make this thing" - "OK".

    You need to review the code. You need to say "I want this, AND HERE IS THE LONG-TERM VISION. Now offer critique and the trade-offs for various implementations."

    Or just realize that in every hand-written project you learn the contours of the problem space as you go along and if the tool is big enough you'll feel the urge to do a green-field rewrite of hand-rolled code after a few years. You get there quicker with the robot's help. This is not a new lesson.

    • gosukiwi 21 minutes ago
      bad devs are still bad, good devs are still good
  • simon84 9 hours ago
    Personally, i've taken a serious step back from 'unsupervised' vibe-coding. When the codebase is clean and you want some additional fix or small feature, Claude is quite good at mimicking your style and does a pretty good job.

    When asking for a new major feature, despite hard guidelines and context (that eat half your context window), it quickly ships bloat. The foundations are not very well organized, and this is where you're reminded that it is all about random prediction of the next word-thing.

    Overall, i've wasted more time reviewing the PR and trying to steer it properly than I expected. So multi-layer agent vibe coding is no longer the way to go *for me*. Maybe with unlimited tokens and a better prompt, to be investigated...

    • Rapzid 2 hours ago
      And it can quickly start spiraling out of control. The bloated implementations keep adding more and more context it needs for the next change. Discovery results start getting worse, implementations get worse, and bloat continues to increase.
      • simon84 1 hour ago
        Actually it was sort of fun to see that the AI started writing comments to itself by gradually explaining what it was trying to do and ways it failed to do it.

        Then it spent more time appending comments to its own comments rather than writing code ^^

  • shahbaby 15 hours ago
    This reads too much like it was LLM generated. I can't say for sure if it was but I have an allergic reaction to the short snappy know-it-all LLM writing style.
    • TranquilMarmot 11 hours ago
      AI;DR
    • baxtr 12 hours ago
      Writing code by hand but blog post are written by LLMs?
    • fromwilliam 13 hours ago
      yeah, it set off my llm radar too
  • web007 3 hours ago
    So much of the problem here is that the author blindly trusted the agent. Agents are enthusiastic juniors, not jaded seniors.

    Prompt for what you want. Get your feature working, then cut: reduce SLOC, refactor to remove duplication, update things to match existing patterns. You might do these instinctively, or maybe as-you-go, but that's just style. Having a dedicated pass works just as well.

    The same thing goes for my code now that did when I wrote every line by hand: make it work, then make it good, then make it manageable. Manually that meant breaking things down into small blocks of individual diffs inside a PR (or splitting PRs), checking for repetitive code and refactoring, or even stashing what I got to and doing it again with the knowledge of how things went wrong.

    Agents can do the same. It's WAY easier mentally and works out better if you treat them the same way and go working -> better -> done.

  • sim04ful 52 minutes ago
    My opinion is that we're using the wrong paradigms for LLMs. We should be leaning more on declaratively specifying behaviour.

    If there's any hope for reliability, auditability, and predictability to be had, it lies in constraining an LLM's grammar whilst delegating freeform behavior to a more passive substrate.

  • theunmanagedboy 1 hour ago
    The cognitive debt caused by AI autocompletion and agent stuff is real. I'm feeling it right now. I started a project on my own, writing every line of code, but then out of timeline pressure I started using Claude Code. The atrophy it has caused in my willingness to go and edit the code is real. I'd rather rely on the slot machine than my own experience. SAD!
  • ilaksh 25 minutes ago
    He says he went several months without having to do a code review and it worked the vast majority of the time. That's incredibly impressive work by the AI.

    AI may default to mediocre and often somewhat buggy code unless you iterate, because that is just what the vast majority of the human-written code it has seen looks like. But the fact that he got away with not reviewing the code for so long, to me, proves the opposite of his conclusion.

    1690 lines of code in one file is a walk in the park for SOTA models.

    He can just say something like:

    "Please review and create a refactoring plan and test suite. I found atrocious architectural decisions like numerous special cases and if statements rather than using abstractions properly. Make a few notes in comments and architecture.md to never do this again."

    One could also argue that it was the better decision each time for the AI to never refactor unless prompted, because refactoring increases the likelihood of something breaking, and you want to do it only after verifying that the minimum code change actually does what you want functionally.

    Also I bet you the headline is a lie. He basically admits it by saying he is writing the core structure of the next version by hand ahead of time, implying that he will generate the rest. So the title is a half-truth at best.

    • wolttam 20 minutes ago
      > Also I bet you the headline is a lie.

      He's already 5k+ LOC into the rust rewrite...

  • graphememes 17 minutes ago
    they are just doing design work now; they could have done design work with Go too, without even knowing Go

    clickbait title

  • khasan222 6 hours ago
    I’m not very familiar with Go; however, after looking at the repo I can’t help but notice there is no infra to ensure code quality. Do others see the same thing? Because if so, that is the real problem.

    Yes, I agree that LLMs for sure write terrible code when left to their own devices, but so do most engineers. Which is why we have so many tools to help keep a certain level of quality: duplication checks, tests, linters, other engineers.

    I find that whenever you make an LLM repo without these checks, and more, it will write like an enthusiastic junior engineer: wrong and strong. However, a junior engineer would be hard-pressed to get 95% coverage on a codebase; the AI is more than willing and does it in a few minutes. We can use things like this to our advantage - how many people have ever seen a repo with 100% test coverage? With AI this is very possible; with people, not so much.

    LLMs write terrible code, we know this, but when dealing with humans who write terrible code we have many techniques. We should be using those same techniques to keep the LLMs honest, but more importantly, verifiable.

    • shimman 19 minutes ago
      Go has built-in tools for formatting and linting. Also, LSP support is a first-class citizen in Go. I don't know what other "code quality" infra there is out there aside from formatting and linting.
  • spicyusername 6 hours ago
    It's really very easy to spend a few hours going through a vibe-coded project by hand and having an agent fix the weird parts. If you do this often enough, you can get the best of both worlds.

    Then you're right back on track.

    In a way it's not that different from a human-made project. Plenty of teams have to crunch, ignoring the architecture and incurring tech debt, and then come back and fix it later.

    • ex-aws-dude 2 hours ago
      That’s what I found too

      I have to periodically get it to do a bunch of refactoring

  • archleaf 16 hours ago
    So what you really mean is you are going to do better and more detailed skills files so you can get an architecture that you've thought through rather than something random?
    • dropbox_miner 16 hours ago
      Partly, but the order matters. The CLAUDE.md constraints only work if you designed the architecture first. They're just how you communicate it to the AI. The mistake I made wasn't writing bad skills files, it was not designing anything at all and expecting the AI to make coherent structural decisions across 30 sessions.

      The rewrite is me sitting down with a blank doc and drawing the boxes before any code exists. Then the CLAUDE.md enforces what I already decided. Whether that actually holds up as the project grows, I genuinely don't know yet.

      • cpncrunch 16 hours ago
        Are you really saving any time using AI at all, then? If you have to write the architecture for it, write all the rules you want it to follow, check everything it's written, and then reprompt it because it's not how you want it?
        • SpicyLemonZest 15 hours ago
          Yes. I do all of this and I'd estimate 50-100% coding time savings. A lot of that comes from better multitasking over single-workstream throughput, which I suppose might compromise the gains depending on what you're doing. For me it amplifies the speedup by allowing some of my "coding time" to be spent on non-coding tasks too.
          • cpncrunch 15 hours ago
            But even if coding time is reduced by half, is that worth the downsides? Coding has never really been a major percentage of my time.
            • SpicyLemonZest 13 hours ago
              I could be wrong in some subtle way I'm not seeing, but I believe the model we're working in avoids the downsides. I actually think my review bar is slightly higher now, because I don't feel as much pressure to compromise my standards when I know Claude is capable of writing the code I want.
  • erelong 16 hours ago
    Can't you just ask AI to break up large files into smaller ones and also explain how the code works so you can understand it, instead of start over from scratch?
    • dropbox_miner 15 hours ago
      That was actually the first thing I tried. It did a good job at explaining the codebase mess and the architecture. Then I ran 3-4 refactor attempts. Each one broke things in ways that were harder to debug than the original mess. The god object had so many implicit dependencies that pulling one thread unraveled something else. And each attempt burned through my daily Claude usage limit before the refactor was stable.

      And I'm sure the rewrite is going to teach me a whole different set of lessons...

      • tres 14 hours ago
        What's your test coverage like?

        Not sure why good coverage wouldn't mitigate risk in a refactor...

        My mantra whenever I'm working with AI is that I want it to know what "point b" looks like and be able to tell by itself whether it's gotten there...

        If you have a working implementation, it sounds like you have a basis for automated tests to be written... once you have that (assuming that the tests are written to test the interface rather than the implementation), then it should be fairly direct to have an agent extract and decompose...
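        As a sketch of what "testing the interface rather than the implementation" can look like - the Parse function and its contract here are hypothetical, not from the author's project - a Go table-driven test pins down observable behaviour so an agent can decompose the internals without breaking anything (written as a plain program rather than a _test.go file so it is self-contained):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Parse is a stand-in for the project's real entry point (hypothetical):
// it sums a comma-separated list of integers.
func Parse(input string) (int, error) {
	if strings.TrimSpace(input) == "" {
		return 0, fmt.Errorf("empty input")
	}
	total := 0
	for _, field := range strings.Split(input, ",") {
		n, err := strconv.Atoi(strings.TrimSpace(field))
		if err != nil {
			return 0, err
		}
		total += n
	}
	return total, nil
}

func main() {
	// Each case names a behaviour, not an internal detail, so the
	// table survives an extract-and-decompose refactor untouched.
	cases := []struct {
		name    string
		input   string
		want    int
		wantErr bool
	}{
		{"empty input is rejected", "", 0, true},
		{"single value", "42", 42, false},
		{"several values with spaces", "1, 2, 3", 6, false},
		{"garbage is rejected", "1,x", 0, true},
	}
	for _, tc := range cases {
		got, err := Parse(tc.input)
		if (err != nil) != tc.wantErr || got != tc.want {
			panic("case failed: " + tc.name)
		}
	}
	fmt.Println("all cases pass")
}
```

        With a table like this covering the interface, the agent's target ("point b") is simply: all cases still pass after the decomposition.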

    • striking 15 hours ago
      I'm currently working on the discovery phase of a larger refactor and have pretty quickly realized that AI can actually often be pretty useless even if you've encoded the rules in an unambiguous, programmatic way.

      For example, consider a lint rule that bans Kysely queries on certain tables from existing outside of a specific folder. You'd write a rule like this in an effort to pull reads and writes on a certain domain into one place, hoping you can just hand the lint violations to your AI agent and it would split your queries into service calls as needed.

      And at first, it will appear to have Just Worked™. You are feeling the AGI. Right up until you start to review the output carefully. Because there are now little discrepancies in the new queries written (like not distinguishing between calls to the primary vs. the replica, missing the point of a certain LIMIT or ORDER BY clause, failing to appropriately rewrite a condition or SELECT, etc.) You run a few more reviewer agent passes over it, but realize your efforts are entirely in vain... because even if the reviewer agent fixes 10 or 20 or 30 of the issues, you can still never fully trust the output.

      As someone with experience in doing this kind of thing before AI, I went back to doing it the old way: using a codemod to rewrite the code automatically using a series of rules. AI can write the codemod, AI can help me evaluate the results, but actually having it apply all of the few hundred changes automatically left me unable to trust the output. And I suspect that will continue to be true for some time.

      This industry needs a "verification layer" that, as far as I know, it does not have yet. Some part of me hopes that someone will reply to this comment with a counterexample, because I could sorely use one.
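
      The lint-rule half of the approach above (finding violations deterministically, so no agent can fudge the result) can be sketched mechanically. The real case was TypeScript/Kysely; purely as an illustration, here is the same idea in Python using only the stdlib `ast` module, with hypothetical names (`select_from`, a `services/orders` folder):

```python
import ast
from pathlib import Path

# Hypothetical policy: only code under services/orders may query the
# "orders" table. All names here are illustrative, not from the thread.
ALLOWED_DIR = Path("services/orders")
BANNED_TABLES = {"orders"}

def violations(root: Path):
    """Yield (file, line) for query-builder calls on banned tables
    made outside ALLOWED_DIR -- a deterministic, reviewable check."""
    for src in sorted(root.rglob("*.py")):
        rel = src.relative_to(root)
        if ALLOWED_DIR in rel.parents or rel.parent == ALLOWED_DIR:
            continue  # inside the allowed folder, queries are fine
        for node in ast.walk(ast.parse(src.read_text())):
            # Match calls shaped like db.select_from("orders")
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == "select_from"
                    and node.args
                    and isinstance(node.args[0], ast.Constant)
                    and node.args[0].value in BANNED_TABLES):
                yield src, node.lineno
```

      The point is the same as the codemod one: the rule's output is exhaustive and mechanical, so the AI can be used to write the rule and triage the findings rather than being trusted to apply hundreds of edits unreviewed.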

    • joshuanapoli 15 hours ago
      A rewrite following a new architecture plan could get finished pretty quickly, treating the original as a prototype.
    • SpicyLemonZest 15 hours ago
      When people talk about codebases being "incomprehensible", it's not always hyperbole. Sometimes the architecture literally cannot be broken up or understood.
      • whattheheckheck 15 hours ago
        I find that really hard to believe. It's not like curing cancer
        • pixl97 14 hours ago
          When you see some legacy C++ codebase with millions of lines of code, catching cancer and slowly dying from it is more human than trying to unscrew that mess.

          A really screwed code base blows out your context window and just starts burning tokens as the AI works out a way to kill -9 itself to escape the hell you're subjecting it to.

        • NichoPaolucci 14 hours ago
          While I mostly agree - science is built up on truths. Code has a large amount of creativity and freedom built into its decisions; some codebases are documented and follow rigorous conventions and design decisions. Others will just be an absolute legacy mess of 20 years of odd decisions made by people who may not have known what they were doing. Like an art piece that you don’t really “understand”.
        • chamomeal 15 hours ago
          No but it can be a rube goldberg machine of insanity
        • SpicyLemonZest 13 hours ago
          [flagged]
  • abalashov 2 hours ago
    I went back to writing code by hand quite some time ago and cannot say there has been any loss of velocity or productivity for it.

    I really do think this whole thing is a wash.

  • mtrovo 6 hours ago
    Most of the issues are around code hygiene rather than LLM code just being bad. You're creating code 10x faster, but you're also writing unit tests 10x faster, and not just that but integration tests, CI/CD workflows, prod monitoring, product and engineering documentation, etc. That was already the way to get good code quality before; nowadays I think it's just reckless to generate code that isn't backed by 100% test coverage and doesn't pass all the lints and static checks configured.
    • mountainriver 5 hours ago
      This is it, people are acting like bad code wasn’t written before. My wife and I were laughing in bed the other night about all the absolutely horrible code we’ve seen written, and how people actually think LLMs are worse than that.

      The quality gates are up to you, and if you are smart you will make a lot of them and review them closely

  • throwaway2027 7 hours ago
    I'm thoroughly enjoying using AI to write code, but it paid off by years of doing things the hard way before. I already was a so called "10x developer" if I speak for myself. I'm doing things even faster now with AI.
  • haolez 2 hours ago
    I've started using OpenSpec[0] recently to mitigate problems like that, but I'm still very early in this journey.

    Can someone with more experience with it (or similar tools) chime in and confirm that this isn't just more AI snake oil? :)

    [0] https://openspec.dev/

  • yason 8 hours ago
    We're still in the early days and must carefully discern what AI is good for, what it can maybe do, what it could potentially do and what it just can't do, and move those threshold marks very conservatively. AI is also cheap enough that it's worth taking experimental shots with it. As long as you don't really rely on AI it's easy to test the capabilities of this new conversational autocomplete, and the random gains it offers can be magnificent (except when they aren't, of course).

    What has generally worked for me is paraphrasing the old adage "Write the data structures and the code will follow" over to AI. Design your data, consider the design immutable and let the AI try to fill in the necessary code (well, with some guidance). If it finds the data structures aren't enough, have it prompt you instead of making changes on its own. AI can do a lot of the low-hanging fruit and often the harder ones as well, as long as it's bound to something.
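
    Made concrete, the data-first adage looks something like this (purely illustrative names, loosely echoing the article's Kubernetes TUI): freeze the data shapes up front, and the functions the AI fills in are then constrained to be small, total operations over those shapes.

```python
from dataclasses import dataclass
from enum import Enum

# The data design comes first and is treated as immutable;
# the AI only fills in functions over these shapes.

class PodPhase(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    FAILED = "Failed"

@dataclass(frozen=True)
class Pod:
    name: str
    namespace: str
    phase: PodPhase
    restarts: int

@dataclass(frozen=True)
class PodView:
    rows: tuple  # immutable snapshot of Pods, not shared mutable state

def failing(view: PodView) -> tuple:
    """The kind of small, total function an AI can safely generate."""
    return tuple(p for p in view.rows if p.phase is PodPhase.FAILED)
```

    If the model reports the frozen shapes aren't enough for a feature, that's the prompt back to the human, which is exactly the "have it prompt you" behaviour described above.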

    Yet, for now, AI at best has been something that relieves me from having to write a long string of boring code: it's not sustainable to keep developing stuff relying on AI alone. It's also great when quality is not an issue; for any serious work AI has not sped me up noticeably. I still need to think through the hard parts, and whatever I gain in generating code I lose in managing the agents. But I can parallelise code generation, trying new approaches, and exploring, because AI is cheap. AI is also pretty good for going through the codebase and reasoning about dependencies, whether in the context of adding a new feature or fixing a bug: I often let AI create a proof-of-concept change, then I extract the important bits out of it and usually trim the diffs down to 1/3 or less.

    AI further helps with non-work, i.e. tasks that you have to do in order to fulfill external demands and requirements, and not strictly create anything solid and new. I can imagine AI creating various reports and summaries and documentation, perhaps mostly to be consumed and condensed by another AI at the receiving end. Sadly, all of this is mostly things not worth doing anyway.

    Overall, I cringe under all the hype that's been laid on AI: it's a new tool that's still looking for its box or niche carveout, not a revolution.

    • ktzar 7 hours ago
      I don't think we're in the early ages... LLM technology has essentially stagnated since GPT-3.5; we just have bigger models that can handle more context. We're trying to compensate for the lack of progress in the underlying technology by coming up with contraptions of multiple models stuck together: Mixture-of-Experts, reviewer models, PM models...
  • larusso 7 hours ago
    I ran quite early into the same issues with my Rust pet projects: single structs with tons of Option<T> and validation methods, etc.; enums for type fields combined with optional fields in the same layer, so accessor methods all return Option<T>.

    I now add a long list of instructions on how to work with the type system and some do’s and don’ts. I don’t see myself as a vibe coder. I actually read the damn code and instruct the AI to get to my level of taste.

  • tvbusy 11 hours ago
    I don't think the prompts that the author has proposed will actually work. Including final scope and non-scope is good, but it's more of a reaction to what the AI already did. These prompts are suitable for a rewrite, basically, since it's unlikely anyone would have had them ready when starting out.

    I have found small iterations to have the best results. I'm not giving AI any chance to one shot it. For example, I won't tell it to "create a fleet view" but something more like "extract key binding to a service" so that I can reuse it in another view before adding another view. Basically, talk to the AI as an engineer talking to another engineer at the nitty gritty level that we need to deal with everyday, not a product person wishing for a business selling point to magically happen.

  • radicalbyte 10 hours ago
    I don't understand the people who "get the agent to do everything" for them. It just makes a mess if you do that. Yet if I spend a little bit of time setting a project up properly (including telling my minions exactly what to do) I can then get it to do the boring things for me.

    The very worst things you can do in a codebase are (a) not deeply understand how it works (have it be magic) and (b) be lazy and mess up the structure.

    How do you fix a problem which happens at 2:00am and takes your system down if you don't have an excellent understanding of how it works?

    We're already bad at (a) because most developers hate writing documentation, so that knowledge is invariably lost over time.

  • binyu 16 hours ago
    > I'm rewriting k10s in Rust. Not because Rust is better but, because it's the language I can steer. I've written enough of it to feel when something's wrong before I can articulate why. That instinct is the one thing vibe-coding can't replace. The AI hands you plausible-looking code. You need a nose for when it's garbage.

    Isn't Golang relatively easier to read than Rust? I was under the impression that Rust is a more complex language syntactically.

    > The other change is simpler: I'm doing the design work myself, by hand, before any code gets written. Not a vague doc. Concrete interfaces, message types, ownership rules. The architecture decisions that the AI kept making wrong are now made in writing before the first prompt.

    This post is good to grasp the difference between "vibe-coding" and using the AI to help with design and architectural choices done by a competent programmer (I am not saying you are not one). Lately I feel that Opus 4.7 involves the user a lot more, even when given a prompt to one-shot a particular piece of software.

    • dropbox_miner 16 hours ago
      Go reads fine whether the architecture is good or bad, and I couldn't tell the difference until I was in trouble. Rust is harder to read but harder to misuse. The borrow checker would have caught that data race at compile time. I've also just written more Rust. That familiarity matters separately.

      +1 on Opus 4.7 involving the user a lot more. Rn I'm trying to get to a state where I can codify my design + decision preferences as agent personas and push myself out of the dev loop.

      • ok_dad 11 hours ago
        Buddy, that k10s code was never good. Go vs Rust is not the issue here; it’s the fact the project was vibe coded without reading anything. It’s hilarious to even think that a god object was caused by anything other than someone who let the bot choose too much.

        Good architecture in any language is obvious to someone who is experienced and cares.

        Go is actually great for bots to write if you’re actually thinking.

      • binyu 15 hours ago
        Gotcha, that implies you are going to read the code that the AI produces anyways.

        > Go reads fine whether the architecture is good or bad

        Were you reading the Golang code all along and got fooled or did you review it after it failed? Sorry I admit I didn't read the whole article.

        • williamstein 15 hours ago
          He was NOT reading the code: "For 7 months I'd been prompting and shipping without ever sitting down and actually reading the code Claude wrote."
          • binyu 15 hours ago
            Right, thank you. Personally I think reading all the code that the AI produces is impossible and kind of defeats the purpose of using it. The key is to devise a structured way to interact with it (skills and similar) and use extensive testing along the way to verify the work at all steps.
    • cortesoft 14 hours ago
      > Isn't Golang relatively easier to read than Rust? I was under the impression that Rust is a more complex language syntactically

      It sounds like the author knows Rust, and might not be as familiar with Go.

      A language that you are proficient in is always going to be easier to read than one you aren’t, even if the latter is an objectively easier language to read in general.

      • travisgriggs 12 hours ago
        In a world where juniors (or seniors in new territories) are incentivized to publish or perish, how will any of us gain proficiency any more? I can see the agent assisted journey accelerating some familiarity, but not proficiency.

        I’ve used AI tools to do i18n translations to Spanish and Portuguese (somewhat ashamed to admit this). I’ve grown more familiar with the structure of these languages, and come to recognize some of the common vocabulary for our agtech domain. If anything, I feel more clueless about both languages now than I did before, when it comes to any sort of proficiency.

  • kccqzy 4 hours ago
    > AI builds features, not architecture.

    I see this in Claude too, but I also see this in junior engineers. In the case with Claude, I simply ask it to refactor immediately after each feature is done. The human is still responsible for the AI writes, so if the AI writes code that’s gross, I would never push that lest it sully my name and my reputation for my own code quality.

  • zem 9 hours ago
    I don't bother trying to give the LLM a set of dos and don'ts for how to write the code, that becomes a frustrating game of whack-a-mole. I find it a lot more efficient to have it write some code, look it over, and if I'm not happy with some of the decisions give it specific instructions for how to fix that one part. as a bonus I end up reinforcing my knowledge of the code base in the process.
  • RuoqiJin 13 hours ago
    This is Claude's problem. Compared to GPT-5.5, Claude Code prefers to take shortcuts. I've tested having codexapp GPT-5.5 and Claude Code opus4.7 do the same thing - if following GPT-5.5's requirements, Claude Code's execution time for a task would stretch from 5 minutes to 40 minutes. To solve macro architecture problems, I use Lisp to write the entire program's framework. Lisp replaces architecture documents, because I believe it has high semantic density, syntax restrictions, and checkers for assistance. This way, at least I no longer had to rework anything. I used this method to refactor 20+ of my projects.
  • AntiUSAbah 7 hours ago
    I'm currently exploring whether I should split up a project into a framework part and the game itself (a 2D idle game).

    The framework could be an isolation layer against vibe rot, but I'm not sure it's necessary for this small project I always wanted to do and never got anywhere with.

    For another tool, I will try another approach: start with a deep investigation and a spec written together with AI, then start with the core architecture layout and then add features.

    So instead of just prompting "write a golang project with a http server serving xy, and these top 3 features" I will prompt "create a basic golang scaffold for build and test" -> "create a basic http server with a basic library doing xy" -> "define api spec" -> "write feature x"

    There is a kind of skill and depth to vibe coding, though.

  • dwedge 4 hours ago
    Clickbait title about not writing code by hand anymore, both the article and future code generated by AI. This is meta.
  • eddy-sekorti 5 hours ago
    Yes, I also do this; the old feeling of writing something, deploying, testing and fixing the bugs is good. Vibecoding can never replace this feeling.
  • Myrmornis 14 hours ago
    > I typed :rs pods to switch back to the pods view. Nothing rendered. The table was empty... > now something was fundamentally broken and I couldn't just prompt my way out of it.

    Hey I don't want to over simplify, I'm sure it was complicated, but did the author have functional tests for these broken views? As long as there are functional tests passing on the previous commit I'd have thought that claude could look at the end situation and work out how to get the desired feature without breaking the other stuff.

    TUIs aren't an exception, it's still essential to have a way to end-to-end test each view.

    • jvuygbbkuurx 11 hours ago
      The problem wasn't the view didn't work. The problem was the view didn't work after something else had been done.

      You can't test every permutation of app usage. You actually need good architecture so you can trust your tests and changes to be local with minimal side-effects.

      • Myrmornis 6 hours ago
        > The problem was the view didn't work after something else had been done.

        In that situation you have two choices:

        1. Tell claude to iterate until the tests for the new view and the old views are all passing.

        2. git reset --hard back to the previous commit at which all tests are passing and tell claude to try again, making sure not to break any tests.

        It's essential to use tests when vibecoding anything non trivial. Almost certainly in a TDD style.

  • pjmlp 11 hours ago
    I am still mostly coding by hand, other than meeting the KPIs of AI use at the company, required trainings, use of agents and whatever.

    Eventually like every hype wave the dust will settle, and lets see where we stand.

    By now all the AI companies have consumed all human knowledge so they either learn to actually think for themselves, or that is it.

    Either way, that won't change the ongoing layoffs while trying to pursue the AI dream from management point of view.

    • 0xpgm 10 hours ago
      > Either way, that won't change the ongoing layoffs while trying to pursue the AI dream from management point of view.

      I think most companies doing layoffs are bloated to begin with, AI is just the scapegoat to do the layoffs.

      • pjmlp 9 hours ago
        I am aware of layoffs that are really caused by AI.

        Translation and asset generation teams for enterprise CMS, whose role has now been taken by AI.

        Likewise traditional backend development, that was already reduced via SaaS products, serverless, iPaaS low code/no code tooling, that now is further reduced via agents workflow tooling, doing orchestration via tools (serverless endpoints).

  • ninjahawk1 7 hours ago
    A problem often ignored is that while AI is trained on human written code, how it writes is different in practice.

    Will that improve or get worse? One would argue that LLMs in general are drastically more competent now than they were a couple years ago, they’re also much better at coding. We’re likely just now entering the era where they can code but are still not what you’d fully expect, or at least not what someone with absolutely no coding knowledge could use to code at the same level as someone who does know how to code.

    Maybe that changes as the models improve, maybe it doesn’t, only time will tell.

  • nopurpose 5 hours ago
    Feels like it can be solved with even more AI: adversarial models reviewing and testing work performed by the main model.

    Actually I am curious to try something like that myself. Is there an existing orchestrating engine (or single agent) which can spawn multiple subagents and keep passing their feedback/output between each other until all of them agree that the assignment overall is complete?

  • neals 7 hours ago
    I'm moving very slowly into AI coding. I'm not comfortable enough to let Claude do anything big. What I do is this: I set out general architecture, create function stubs and add comments on how to implement things. Then I let Claude do 10 minutes of work and I check everything and refactor some of it. It saves me on boring implementation stuff (like, is this an array, move an index here or there, check whether something exists or not, put it in the db)
  • desireco42 36 minutes ago
    I understand, and I saw this problem. It's actually quite hilarious that he got this far before noticing it.

    But again, if you just guide the AI on architecture and review the code, you should be fine. The code that you write and the code that an AI writes are two different things; they will never be the same.

    The AI is very helpful for generating code, and that is exactly how you should use it: as a code generator.

  • cultofmetatron 9 hours ago
    the ship has sailed on my handcoding at work. the AI is producing stuff that's more bulletproof than what I can do in the same timeframe and if my competitors are using it, the pressure to ship is that much higher.

    Personally, I've taken the time it's freed up to spend more time on mathacademy and reading more theory-oriented books on data structures and algorithms. AI coding systems are at their best when paired with someone with broad knowledge. knowing what to ask for and knowing the vocabulary to be specific about what you want to be built is going to be a much more valuable job skill going forward.

    One example is a small AI based learning system I have been developing in my free time to help me learn. the mvp stored an entire knowledge graph and progress in markdown files. being an engineer, I knew this wouldn't scale so once I proved the concept viable, I moved everything into sqlite with a graphdb. then I decided to wrap some parts of the functionality into rust and put everything behind a small rust layer with the progress tracking logic still being in python.

    someone with no knowledge of graph databases or dependency graphs or heuristics would not be able to build this even if they had AI. they simply don't know what they don't know and AI won't save you there.

    That said, I think it's important to also spend time in the dirt. I've recently started picking up zig as my NO AI language just to keep those skills sharp.

    • oblio 9 hours ago
      > the ship has sailed on my handcoding at work.

      I'm really curious if we'll seesaw once AI costs go up 10x.

      • cultofmetatron 6 hours ago
        I've been relying primarily on deepseek-v4-flash for 90% of my work. It sips tokens. that model will run on 128gb. not a cheap configuration for a consumer but within the budget of a developer relying on it for work.

        Ive only been using kimi 2.5 and deepseek pro for reviewing PRs for security issues. less than 10% of my workflow requires a full powered frontier model.

        I think the issue is overblown by people who think claude code is a good harness and use opus for everything. opencode is objectively better. it's much more verbose about what it's doing, you have more control when it comes to offloading to subagents with targeted context (crucial for running through larger jobs) and I can swap between codex and open weight models.

        • oblio 4 hours ago
          I'd want to agree with everything you say.

          However everything in this field is cargo culting. We have absolutely no way to quantify productivity in the real world.

          We've had advanced programming languages backed by advanced programming language theory for decades, and the most used/run programming languages in the world are C, PHP and JavaScript: languages held together by duct tape or, in the case of C, programming language theory from the 60s.

          We have a super minimal JavaScript runtime in the browser to avoid a bloated standard library, and then people invent things like leftpad. At the same time basically every major website in the world serves megabytes of tracking and ad-serving libraries.

          We all "know" AI makes coders more productive but nobody can do the equivalent of a clinical trial for a major new drug.

      • wartywhoa23 9 hours ago
        And they will.
  • Aeolun 12 hours ago
    I think the answer here is to not use Claude with bubble tea. I tried the same thing and got the same result. But it seems to be limited to that specific framework, because it's really good at not doing the same thing with SolidJS.
    • neomantra 9 hours ago
      While I felt this in 2025, I do not feel this in 2026. I use Claude and the rest with BubbleTea all the time.

      But I will say... you have to know Golang. You have to have at least tried to make a BubbleTea app yourself and try to understand ELM architecture. You have to look at the code and increment with it.

      It makes total sense for OP to switch to Rust and Ratatui if they don't know Golang well. But I don't think it's a better language for it. [Ratatui has brought me great inspiration though!]

      Independent of framework, the LLMs get the spatial relationships. I say things like "the upper right panel's content is not wrapping inside and the panel's right edge should extend to the terminal edge" and the LLM will fix it. They can see the resultant text; I'm copy-pasting all the time.

      TUI code is finicky; one mis-rendered component mucks everything up. The LLMs will decide on their own to make little, temporary BubbleTea fixtures to help themselves understand when things aren't right.

      The only real problem with LLMs and BubbleTea is that upon first prompt, they insist on using BubbleTea v1 versus BubbleTea v2, released in December 2025. But then you just point it to the V2_UPGRADE.md and it gets back on track. That will improve as training cutoffs expand.

      I vibe-coded this TUI for Mom's last night. I actually started with Grok (who started with v1) and then moved into Claude Code after some iteration:

      https://gist.github.com/neomantra/1008e7f2ad5119d3dd5716d52e...

  • snickerbockers 1 hour ago
    I like to explain my opposition to vibe coding by replacing the phrase "write code for you" with "fuck your wife for you". You could make all the same arguments: the AI could do a better job, it's never impotent, it frees you from being pressured to do it when you might be tired or not in the mood, etc. But that's not the point, and most people would still be opposed to, err, "vibe vibrating".

    I feel the same way about coding: it's a source of pride for me, and when I hear people say I should resign myself to being an "ideas guy" while chatgpt actually creates things, I find the very concept distasteful regardless of whether or not it can outperform me.

  • eranation 11 hours ago
    I used to write code by hand.

    I still do, but I used to, too.

  • rnxrx 13 hours ago
    I'm not sure we'll ever really be free of the GIGO (garbage in / garbage out) principle. Tools will get better and better, but can never be a substitute for a deep understanding of the thing we want to create.
  • keithnz 15 hours ago
    AI writes what you ask it to write, you need to talk to it about architecture. You should have an architecture doc so AI can shape the code based on that, you can get the AI to make the architecture doc also. If using claude you can use the software architecture mode for this.
  • youre-wrong3 9 hours ago
    This is the wrong take. If you keep “vibe” coding and end up with bad results you should probably question your ability.
  • hirako2000 13 hours ago
    Research also makes similar claims: https://arxiv.org/html/2603.24755v1
  • jasonvorhe 9 hours ago
    When the title stands in opposition to the actual post, I'm not gonna engage with that author again.
  • sakesun 14 hours ago
    A coder typing in code is not solely generating output; it's part of an ongoing thinking process. Without this ongoing process, we have no material to keep iterating forward.
  • Havoc 7 hours ago
    That's a strange definition of "code by hand"
  • deeviant 1 hour ago
    Have you people ever read human generated code? Good grief, you act the like human code is not a disaster 9 times out of 10.
  • ipaddr 15 hours ago
    When he mentions "I push commits at work for as long as my tokens last," I can understand that. Managing tokens has become an important skill.
  • moveax3 5 hours ago
    Code writers have changed, but the conceptual mistakes remain the same.
  • cortesoft 14 hours ago
    What has really made AI coding able to continue to work as the project got bigger was using spec-kit. It has been great at keeping the code consistent across features.

    https://github.com/github/spec-kit

    • nopurpose 5 hours ago
      Did you evaluate other projects, like openspec, before deciding on spec-kit?
  • d_silin 14 hours ago
    It absolutely looks like AI psychosis.
  • Laoujin 11 hours ago
    I'm just wondering: you know what architecture you want to go to now and you have the tests... can't you just let Claude refactor it to the better architecture?

    Also 1600 lines... didn't any agent reviewing the diffs point that out?

    You're also adding a lot to claude.md. I dunno how much that file has grown, but with a big claude.md file with many instructions, I don't think the AI will be able to remember all those rules.

    • my-next-account 10 hours ago
      > can't you just let Claude refactor it to the better architecture?

      In my experience, no. These tools suck at refactoring, mostly choosing to add more code instead.

  • mindaslab 6 hours ago
    I'm going back to writing algorithms on paper.
  • hsaliak 4 hours ago
    I wrote https://github.com/hsaliak/std_slop/blob/main/docs/mail_mode... to avoid the brain rot from just shooting slop. It has helped me stay sane, review code and make changes step by step.

    I don't go as fast as with other agents, but this works for me, and I enjoy the process.

  • amelius 14 hours ago
    So how are people writing the specifications for AI?

    Do they write empty functions and let AI fill them in?

    Or do they use some kind of specification language?

    Are people designing those languages?

  • floodfx 3 hours ago
    Genuinely curious if you've used "plan mode" (with perhaps a plan feedback tool) to get clarity from your coding agent before unleashing it on a feature like "add a pods view with live updates"?

    Getting a plan isn't a panacea but is a better way to limit downstream slop than just vibing without one.

  • z3t4 11 hours ago
    Vibe coding works great with test driven development. You can have AI write the tests as well, but you need to confirm yourself because it's lying all the time. AI coding is like when you first started out, it's copying random bits and pieces from the web into your code until it works... Good for one shots and proof of concept. But for any long living project I think you are better off rewriting it from scratch yourself. Abstractions let you work faster, especially when you have it all in your head.
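
    A sketch of what "confirm yourself" can look like: the human writes and reviews the tests first, so the model can't pass by quietly redefining what correct means, and then the agent iterates until the suite is green. Everything below is hypothetical (a Kubernetes-flavored quantity parser, to match the thread's subject), not from the post:

```python
import unittest

# The human-authored spec, written and reviewed before any
# implementation exists. This is the part you don't delegate.
class TestParseMemory(unittest.TestCase):
    def test_plain_bytes(self):
        self.assertEqual(parse_memory("512"), 512)

    def test_binary_suffixes(self):
        self.assertEqual(parse_memory("128Mi"), 128 * 1024**2)
        self.assertEqual(parse_memory("2Gi"), 2 * 1024**3)

# The part you can then let the agent iterate on until the suite passes.
def parse_memory(s: str) -> int:
    """Parse k8s-style quantities like '128Mi' into bytes (sketch)."""
    for suffix, mult in (("Ki", 1024), ("Mi", 1024**2), ("Gi", 1024**3)):
        if s.endswith(suffix):
            return int(s[: -len(suffix)]) * mult
    return int(s)
```

    The test suite acts as the spec the abstraction-minded rewrite can later be checked against, whoever (or whatever) writes the implementation.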
  • dr_girlfriend 11 hours ago
    I try to write one portable shell script per day; using AI would take all the fun out of it, so I never started using it. I honestly find it ridiculous that anyone uses it to write code, it just doesn't make sense to me.
  • jesse_dot_id 13 hours ago
    LLMs assist those of us who were apt to take blocks of code from StackOverflow, or wherever, to solve problems quickly and avoid as much of the aggravating and slow toil of trial and error as possible.

    That trial and error process is still happening with a LLM, but much faster, and with instantaneous cross-references to various forms of documentation that I would be looking up myself otherwise. It produces code of a quality that is dependent on the engineer knowing what they want in the first place and prompting for it and refining its output correctly.

    It's the exact same process of sculpting code that the majority of the industry was doing "by hand" prior to the release of LLMs, but faster, and the harnesses are only getting better. To "vibe code" is to prompt vaguely and ignore the quality of the output. You're coming to a forum full of professionals and essentially telling us that you're getting really frustrated with your Scratch project.

    I don't know if you're trying to lead a charge or whatever but good luck with that. As a senior SWE, it is clear to me that this is the new paradigm until something better than LLMs comes along. My workflows and efficiency have been vastly improved. I will admit that I have never really been a "I made a SMTP server in 3k of Rust" kind of guy, though.

  • EMM_386 15 hours ago
    You don't need to go back to coding by hand if you know how to do it already. There is a middle ground.

    If you understand good software architecture, architect it. Create a markdown document just as you would if you had a team of engineers working with you and would hand off to them. Be specific.

    Let the AI do the implementation of your architecture.

  • codingfisch 10 hours ago
    It's pretty simple to vibe code for months without producing slop. And it's the same recipe one used before AI:

    1. make it work
    2. make it pretty
    3. make it fast

    Omit 2 and 3 long enough -> slop beyond recovery
  • apt-apt-apt-apt 15 hours ago
    Outright lie clickbait. As he states himself, he's doing the design work by hand, and will likely still use AI to write code.
  • mpurbo 16 hours ago
    Strict SDD might help to constrain and harness the process.
  • m3kw9 4 hours ago
    Greed really comes into play when using LLMs to write code. It's so easy to say YES when a cool feature that two years ago would have taken a week now takes a day, or even one prompt. The "say no" skill that Steve Jobs said was important is going to be needed on a minute-by-minute basis.
  • ljoshua 14 hours ago
    > tl;dr: AI writes features, not architecture.

    This. I definitely agree with this statement at this point in AI-assisted development. This gets at the "taste" factor that is still intrinsically human, especially in software engineering. If you can construct and guide the overall architecture of an application or system, AI can conceivably fill in the smaller feature bits, and do so well. But it must have a strong architecture and opinionated field in which to play.

    • littlecranky67 7 hours ago
      My main takeaway, too. I've been using Claude on a side project that I have single-handedly been working on for three years. It works well initially: you catch all of the AI's mistakes and unfavorable approaches because you know the architecture in and out. But as you stop thinking about the new features and lose touch with all the stuff the AI throws at you, you fail to develop an intuitive feel for when and how to abstract and introduce architecture.

      Another note for me was e2e tests: while AI can write them, it never comes up with even the basic organization or abstraction required to manage a large e2e test suite with hundreds of tests. It immediately starts producing spaghetti code.

  • aryan_kalra12 6 hours ago
    I've been saying the same thing and I'll repeat it again: AI is still gonna take away your job even if you switch domains.
  • holografix 9 hours ago
    Good luck finding a job. All the decision making business people I know see only two types of “technical people”.

    The ones who are “AI pilled” and the contagious lepers.

  • IceDane 4 hours ago
    This doesn't make any sense to me.

    The problem with this dev's approach is not AI, it's their use of it. They didn't ensure that the architecture made sense. They didn't look at the code and get a "feel" for it. They didn't do the whole build stuff, step back, refactor, rinse and repeat dance. The need for that hasn't gone away; if anything, it's even more important now. Because you can spit out code 100x faster than you could before, your tech debt compounds 100x faster. The earlier you refactor, the less work it is.

    I usually give the agent a solid idea of what I want, often down to the API interfaces. Then every now and then, I'll go through the code and ensure that everything makes sense, and that I'm not just spitting out code that works, but building a codebase that scales.

  • magic_hamster 11 hours ago
    Let me preface my comment by saying I also still write a lot of code by hand - especially when it's something I know I need to understand in depth, and in some cases defend.

    With that said, this caught my eye:

    > AI gravitates toward single-struct-holds-everything because it satisfies the immediate prompt with minimal ceremony.

    This is too general. "AI" is used here as a catch-all, but in fact it was a specific model under the specific conditions in which you ran your prompt, including the harness, markdowns, PRDs, etc. So it's not fair to say "AI does X!" in this case.

    It's also very much up to you. It's very common to have a frontier model plan an architecture before you have another model implement code. If you're just one-shotting an LLM to do everything you get mediocre, more brittle code.

    This stuff is still being figured out by a lot of people. But I feel the core of the issue is not using AI well. Scoping, task alignment, validation, are crucial.
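
    For what it's worth, the "single-struct-holds-everything" shape the quoted line describes, and one possible way out of it, can be sketched like this (all names here are illustrative, not from the article):

```go
package main

import "fmt"

// The anti-pattern: one struct owns every view's state, so any view can
// quietly read or write another view's fields.
type GodModel struct {
	PodRows   []string
	PodCursor int
	LogLines  []string
	LogFollow bool
}

// One possible decomposition: each view owns its own state behind a
// small interface, and the app only routes input to the active view.
type View interface {
	Update(input string)
	Render() string
}

type PodView struct {
	rows []string
}

func (p *PodView) Update(input string) { p.rows = append(p.rows, input) }
func (p *PodView) Render() string      { return fmt.Sprintf("%d pods", len(p.rows)) }

func main() {
	var active View = &PodView{}
	active.Update("nginx-abc123")
	fmt.Println(active.Render()) // 1 pods
}
```

    The second shape needs a little more ceremony up front, which is exactly why a model satisfying the immediate prompt tends not to reach for it.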

  • royal__ 14 hours ago
    The title is just flat out wrong. The author isn't going back to writing code by hand, they're plopping some new stuff into their CLAUDE.md to "fix" the issues they see AI is having.
  • epec254 16 hours ago
    Not sure if just me, but this post feels AI written?
    • weregiraffe 12 hours ago
      You are absolutely right.
    • pipeline_peak 15 hours ago
      Feels a bit too long winded to be AI generated.
  • nothinkjustai 15 hours ago
    "Writing code by hand" is an oxymoron. You don't write code with AI; AI doesn't write, it generates.
  • devmor 5 hours ago
    I dismissed “AI Psychosis” as a silly term, even as a strong critic of LLMs for programming tools.

    > For 7 months I'd been prompting and shipping without ever sitting down and actually reading the code Claude wrote.

    But every time I read something like this, I seriously wonder about the mental state of the person that wrote it.

    How do you get to this point?

  • localhoster 12 hours ago
    Another behavior I noticed: even if you plan with an agent, a lot of business logic leaks into the code.

    Some states, for example, are meant to be derived from the data shape rather than stored in explicit state fields, but damn, they like adding a state field.
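
    The difference can be sketched in Go (Cart is a made-up example): the redundant flag has to be maintained on every mutation, while the derived version cannot drift.

```go
package main

import "fmt"

// What agents love to generate: a state field that mirrors the data and
// must be kept in sync by hand on every mutation, or it lies.
type CartWithFlag struct {
	Items   []string
	IsEmpty bool
}

// Deriving the state from the data shape instead: no field to forget.
type Cart struct {
	Items []string
}

func (c Cart) IsEmpty() bool { return len(c.Items) == 0 }

func main() {
	c := Cart{}
	fmt.Println(c.IsEmpty()) // true
	c.Items = append(c.Items, "apple")
	fmt.Println(c.IsEmpty()) // false
}
```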

  • blueTiger33 9 hours ago
    nuts
  • AIorNot 16 hours ago
    This doesn't make much sense; the article itself is AI-written.

    It would have been easy to run a few AI agents to review the code and find these issues as well, and to architect it cleanly.

  • nothinkjustai 12 hours ago
    I don’t really think OP is writing code themselves since they admit they still use agents for code gen. I’ve really scaled back the amount I use agents though because in the medium to long term I haven’t been getting good results with them. And it’s not enjoyable. That’s enough for me, I’ll do whatever for a job because who cares, if the company wants slop I will gladly give them that, but for my own shit Ive gone back to circa 2024 and am mostly just using them as a chatbot.

    Inb4 “you’re gonna be replaced” god damn it I hope so, I do not want to spend the rest of my life behind a computer screen…

  • imperio59 14 hours ago
    Alternate title: "I did not understand the current limitations of AI and assumed it could do large software design and it generated spaghetti slop"

    Yea, that's why engineers are still very important for now (until models can do this type of longer term designs and stick to them).

  • Fokamul 7 hours ago
    I also code by hand.

    But in my main work, reverse engineering, LLMs have been a godsend for years now.

    You can basically brute-force binary obfuscation thanks to them. And thanks to eager Chinese LLM providers, basically for free.

    But I always use LLMs only for the boring work; the rest I do manually, or with scripts of course, but scripts made by me. Because I want to learn.

    Yes, there are a lot of people using LLMs for full RE automation, since they're selling exploits for profit. No problem with me.

    I see a funny future for huge corporations like Adobe, etc.

    Imagine the prompt: "Hey Claude, re-implement Adobe Photoshop with a clean-room design." One agent opens a decompiler and outputs complete low-level technical details of how everything is implemented.

    A second agent implements the new Photoshop based on that.

    They will be mad and I like this.

    You will own nothing, and you will be happy, corpos.

  • bbbflgllglhlld 11 hours ago
    Luddite.
    • recursive 10 hours ago
      Seems to be an unstated assumption that the Ludds were wrong.
  • UrbanNorminal 10 hours ago
    Wow ok, I will too then. Fuck AI!
  • FpUser 12 hours ago
    >"I'm doing the design work myself, by hand, before any code gets written."

    This is what I was doing right from the beginning. AI just fills out methods and does other low-intelligence work. Both of us are happy. My architectures and code are really mine: easy to read and reason about. AI gets paid and doesn't get a chance to fuck me in the process. At no point have I felt any temptation to leave the "serious" work to AI.

  • scuff3d 14 hours ago
    I feel like this article was circling a point it never actually got to. All the advice in here (except controlling scope creep) is specific to a TUI with an elm like architecture.

    But here's the thing: you almost never know what the architecture is up front. If you do, you probably aren't the one writing the actual code anymore. Writing the code, with or without an AI, is part of the design process. For most people, it isn't until they've tried several times, fucked it up a bunch, and refactored or rewrote even more that they actually know what the architecture needs to be.

  • photochemsyn 15 hours ago
    Does ‘writing code by hand’ mean you’re not going to use compilers to generate assembly?

    Now I do feel lucky that I started learning coding about four years before the LLM revolution, but these things are really just natural language compilers, aren’t they? We’re just in that period - the 1980s, the greybeards tell me - where companies charged thousands of dollars per compiler instance, right? And now, I myself have never paid for a compiler.

    This whole investor bubble will blow up in the face of the rentier-finance capitalists and I’ll be laughing my head off while it happens.

    • green_wheel 14 hours ago
      Nondeterministic natural language compilers
      • photochemsyn 12 hours ago
        Just because the trajectory is chaotic doesn't mean it’s not deterministic.
        • zephen 1 hour ago
          A model, given exactly the same inputs, will return exactly the same outputs.

          But your prompts are not the only inputs. Among other things, there is a random seed injected by the vendor.

          That is a primary source of non-determinism.

          Then, of course, is the fact that you don't personally have an old copy of the model, and the vendor isn't going to keep the model forever, and there are no unit tests to make sure that, faced with prompts like you gave it before, the newer models won't suffer major regressions in the functionality you were using.

          And even if there were no non-determinism, the models suffer greatly (much more so than traditional compilers) from the butterfly effect.

          It is literally impossible to pin down part of your prompt in such a way that it always will contribute to good outcomes, and such that you can simply vary a tiny bit of the prompt to logically correlate with tiny variations in the output.
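
          A toy illustration of the seed point in Go (sample is a stand-in for a model's token sampler, not any real API): the output is fully determined by the inputs, but only if the seed counts as one of the inputs.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sample stands in for drawing n "token ids": deterministic given the
// seed, but with a hosted model the vendor picks the seed, not you.
func sample(seed int64, n int) []int {
	rng := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = rng.Intn(100)
	}
	return out
}

func main() {
	fmt.Println(sample(42, 5)) // same seed: identical output on every run
	fmt.Println(sample(43, 5)) // change only the seed: different output
}
```

          From the caller's side, the vendor-chosen seed is indistinguishable from non-determinism, even though the underlying computation is not.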

    • platevoltage 13 hours ago
      So C++ doesn't count as code now.
  • kypro 15 hours ago
    > I learned over these 7 months

    7 months ago was early November. Coding assistants were getting very good back then, but they were still significantly poorer at making good architectural decisions in my experience. They tended to just force features into the existing code base without much thought or care.

    Today I've noticed assistants tend to spot architectural smells while working and will ask you whether they should try to address it, but even then they're probably never going to suggest a full refactor of the codebase (which probably is generally the correct heuristic).

    My guess is that if you built this today with AI, you wouldn't run into so many of these problems. That's not to say you should build blind, but the first thing that stood out to me was that you started building 7 months ago, and coding assistants were only just becoming decent at that time; undirected, they would still generally generate total slop.

  • lacymorrow 4 hours ago
    [flagged]
  • luodaint 4 hours ago
    [flagged]
  • Jatin-Mali 13 hours ago
    [flagged]
  • icmpkitty 4 hours ago
    [dead]
  • Serhii-Set 7 hours ago
    [dead]
  • genghot 14 hours ago
    [flagged]
  • Kepler_X 3 hours ago
    [dead]
  • throwmo999 7 hours ago
    [dead]
  • andrew_kwak 13 hours ago
    [flagged]
  • devinabox 11 hours ago
    [dead]
  • Towaway69 9 hours ago
    If you're coding by hand, then you're that carpenter from before IKEA came along. Now the market wants bland, machine-built, functional furniture that gets replaced every five to ten years. If every tenth piece is broken or slightly off, it doesn't matter; mass production has lowered the price so much that a replacement is available for free and you're still making a profit.

    Time to become a "product engineer" and watch the hyper-agile agents putting up digital post-it notes on digital pin-boards, discussing how much each post-it is worth in digital scrum meetings. Meanwhile the agents keep wasting more and more time, so that their owners make less and less of a loss, until eventually a profit is made.

    Until the costs become prohibitive and humans become cheaper than the agents that replaced them. Once the agents are replaced by the humans, the next hype bubble awaits around the bend.

    /s

  • vladsiu 13 hours ago
    [dead]
  • wotsdat 4 hours ago
    [dead]
  • Decabytes 15 hours ago
    We should go back to designing UML diagrams for programs before we write them /s
    • khutorni 1 hour ago
      I think we should, to a reasonable degree.
  • eggplantemoji69 14 hours ago
    TL;DR: AI wrote tech-debt slop because I vibed for 7 months; now I am taking a hybrid approach of defining strict constraints before vibing…
  • gjvc 16 hours ago
    [flagged]