> I hated writing software this way. Forget the output for a moment; the process was excruciating. Most of my time was spent reading proposed code changes and pressing the 1 key to accept the changes, which I almost always did. [...]
That's why they hated it. Approving every change is the most frustrating way of using these tools.
I genuinely think that one of the biggest differences between people who enjoy coding agents and people who hate them is whether or not they run in YOLO mode (aka dangerously-skip-permissions). YOLO mode feels like a whole different product.
I get the desire not to do that because you want to verify everything they do, but you can still do that by reviewing the code later on without the pain of step-by-step approvals.
> reviewing the code later on without step-by-step approvals
I found that Claude likes to leave some real gems in there if you get lazy and don't check. Gently sprinkled in between 100 lines of otherwise fine looking code that sows doubt into all of the other lines it's written. Sometimes it makes a horrific architectural decision and if it doesn't get caught right there it's catastrophic for the rest of the session.
Or it casually forgets to implement some requirements, which you only find out about when the program runs, hits that pathway, and either crashes or does nothing.
Are you not giving it enough information to work with? All of these issues you and the parent comment mentioned can be worked around by telling it HOW to do things.
You can tell it how to do things, but sometimes it still goes off on its own. I have some variant of "do not deviate from the plan" in my prompts, and yet if you watch while it's coding it will sometimes say "ah, this is too hard as per the plan, let me take this shortcut" or "this previous test fails, but it's not an issue with the code I just wrote, so let's just 'fix' the test".
For simple scripts and simple self-contained problems, fully agentic YOLO mostly works. But as soon as it's an existing codebase or the plans get more complex, I find I have to handhold Claude a lot more, and if I leave it to its own devices I find things later. I have also found that if I have it update the plan with what it did and then review the plan afterwards, it will still find deviations in the codebase.
Like the other day I had in the plan to refactor something due to data model changes, specifying very clearly this was an intentional breaking change (greenfield project under development), and it left behind all the existing code to preserve backwards compatibility, and actually it had many code contortions to make that happen, so much so I had to redo the whole thing.
Sometimes it does feel like Anthropic turns the intelligence up or down (I always run Opus in high reasoning), but sometimes it seems to be just the nature of things: it is not deterministic, and sometimes it will go off and do what it thinks is best whether or not you prompt it not to (if you ask it later why it did that, it will apologize with some variation of "well, it made sense at the time").
The whole shtick of LLMs is that it can do stuff without telling it explicitly. Not sure why people are blamed because they are using it based on that expectation....
Yes, it can. So can I. But neither of us will write the code exactly the way nitpicky PR reviewer #2 demands it be written unless he makes his preferences clear somewhere. Even at a nitpick-hellhole like Google that's mostly codified into a massive number of readability rules, which can be found and followed in theory. Elsewhere, most reviewer preferences are just individual quirks that you have to pick up on over time, and that's the kind of stuff that neither new employees nor Claude will ever possibly be able to get right in a one-shot manner.
There is an unconstrained number of ways it can write code and still not be how I want it. Sometimes it's easier to write the correction against the code that is already generated since now you at least have a reference to something there than describing code that doesn't yet exist. I don't think it's solvable in general until they have the neuralink skill that senses my approval as it materializes each token and autocorrects to the golden path based on whether I'm making a happy or frowny face.
Stop thinking like a programmer and start thinking like a business person. Invest time and energy in thinking about WHAT you want; let the LLM worry about the HOW.
The thing is that the HOW of today becomes the context of someone else's tomorrow session, that person may not be as knowledgeable about that particular part of the codebase (and the domain), their LLM will base its own solution on today's unchecked output and will, inevitably, stray a little bit further from the optimum. So far I haven't seen any mechanism and workflow that would consistently push in the opposite direction.
Technically that's true, but unless you literally write every single line of code, the LLM will find a way to smuggle in some weirdness. Usually it isn't that bad, but it definitely requires quite a lot of attention.
> I get the desire not to do that because you want to verify everything they do, but you can still do that by reviewing the code later on without the pain of step-by-step approvals.
It's a well-known truth in software development that programmers hate having to maintain code written by someone else. We see all the ways in which they wrote terrible code, that we obviously would never write. (In turn, the programmers after us will do the same thing to our code.)
Having to get into the mindset of the person writing the code is difficult and tiring, but it's necessary in order to realise why they wrote things the way they did - which in turn helps you understand the problems they were solving, and why the code they wrote actually isn't as terrible in context as it looked at first glance.
I think it makes sense that this would also apply to the use of generative AI when programming - reviewing the entire codebase after it's already been written is probably more error-prone and difficult than following along with each individual step that went into it, especially when you consider that there's no singular "mindset" you can really identify from AI-generated output. That code could have come from anywhere...
I think that those permissions are largely security theater anyway.
It would be better if an LLM coding harness just helped you set up a proper sandbox for itself (containers, VMs etc.) and then run inside the isolated environment unconstrained.
In setup mode, the only tool accessible to the agent should be running shell scripts, and each script should be reviewed before running.
Inside an isolated environment, there should be no permission system at all.
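To make the sandbox idea above a bit more concrete, here is a minimal sketch that only builds a `docker run` command exposing a single project directory to the agent. The image name and agent command are placeholders I made up, not real products, and a real setup would also have to decide how much network access to allow.

```python
from pathlib import Path


def sandbox_command(project_dir: str, image: str, agent_cmd: list[str]) -> list[str]:
    """Build a `docker run` argv that exposes only the project directory."""
    project = Path(project_dir).resolve()
    return [
        "docker", "run", "--rm", "-it",
        # Mount only the project; the rest of the host filesystem stays invisible.
        "-v", f"{project}:/work",
        # Start the agent inside the mounted project.
        "-w", "/work",
        image,
        *agent_cmd,
    ]
```

Something like `sandbox_command(".", "agent-image", ["my-agent"])` returns a list you could hand to `subprocess.run`. Inside the container the agent can run unconstrained, and the worst it can do is trash the mounted project directory.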
I think it's too far to say you need YOLO mode — the author was correctly pointing to the "auto-accept all changes" setting. They should have just turned that on and then reviewed the changes in larger chunks. You don't have to let it go for half an hour and review the mess it cooked up — you can keep an eye on things and even manually make commits to break the work into logical pieces.
With auto-accept edits plus a decent allowlist for common commands you know are safe, the permission prompts you still get are much more tolerable. This does prevent you from using too many parallel agents at a time, since you do have to keep an eye on them, but I am skeptical of people using more than 3-5 anyway. Or at least, I'm sure there is work amenable to many agents but I don't think most software engineering is like that.
All that said, I am reaching the point where I'm ready to try running CC in a VM so I can go full YOLO.
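For what it's worth, the allowlist idea above can be sketched in a few lines. This is not how any particular agent implements it; the prefixes and the function name are hypothetical, and the check is deliberately conservative (anything with shell operators goes back to a manual prompt).

```python
import shlex

# Hypothetical allowlist: command prefixes considered safe to auto-approve.
SAFE_PREFIXES = [
    ["git", "status"],
    ["git", "diff"],
    ["ls"],
]

# Characters that let one command smuggle in another (pipes, chaining,
# redirection, command substitution). Any of them forces a manual prompt.
FORBIDDEN_CHARS = set(";|&<>`$()\n")


def is_auto_approved(command: str) -> bool:
    """Return True only for a plain command that starts with an allowlisted prefix."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return any(tokens[: len(prefix)] == prefix for prefix in SAFE_PREFIXES)
```

With this, `git diff --stat` passes, while `git reset --hard` and `ls; rm -rf /` still stop and ask. Matching on whole tokens rather than string prefixes means `lsblk` is not mistaken for `ls`.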
I'm legitimately curious - could you elaborate on the difference? Speaking as someone who has always preferred the commit-by-commit focus of a rebase instead of all-at-once merge conflict resolution, auditing all the changes together later doesn't sound more appealing than doing things incrementally.
It's far more sane to review a complete PR than to verify every small change. They are like dicey new interns - do you want to look over their shoulder all day, or review their code after they've had time to do some meaningful quantum of work?
> It's far more sane to review a complete PR than to verify every small change.
Especially when the harness loop works if you let it work. First pass might have syntax issues. The loop will catch it, edit the file, and the next thing pops up. Linter issues. Runtime issues. And so on. Approving every small edit and reading it might lead to frustrations that aren't there if you just look at the final product (that's what you care about, anyway).
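The harness loop described above is roughly this shape. It's a sketch under assumptions: `run_checks` and `request_fix` are hypothetical callbacks standing in for "run the compiler/linter/tests" and "hand the error back to the agent", not any real tool's API.

```python
from typing import Callable, Optional


def run_until_green(
    run_checks: Callable[[], Optional[str]],
    request_fix: Callable[[str], None],
    max_rounds: int = 5,
) -> bool:
    """Run checks, hand each failure to the agent, repeat until green or out of rounds."""
    for _ in range(max_rounds):
        failure = run_checks()  # None means everything passed
        if failure is None:
            return True
        request_fix(failure)  # the agent edits files in response
    # One last check after the final fix attempt.
    return run_checks() is None
```

The point is that syntax errors, lint errors and runtime errors each get their own pass through the loop, so a human reading the first draft of an edit is often reading something the loop was about to fix anyway.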
The main difference in the current (theatrical) permission model is that the agent is blocked on waiting for your approval. So you can't just launch it and go do something else, because when you return you will see that nothing is done and it has just been waiting for your input all this time. You have to stare at the screen and do nothing, which is a really boring and unproductive way to spend time.
If you launch it in YOLO mode in a separate branch in a separate worktree (or, preferably, in total isolation), you can instead spend time reviewing changes from previous tasks or refining requirements for new tasks.
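The separate-branch, separate-worktree setup mentioned above comes down to one `git worktree add` call per task. A small sketch, assuming a naming scheme I invented (`agent/<task>` branches, sibling directories named `<repo>-<task>`):

```python
import subprocess
from pathlib import Path


def worktree_command(repo: str, task: str) -> list[str]:
    """Build the `git worktree add` argv for a task-specific branch and directory."""
    repo_path = Path(repo).resolve()
    # Put the worktree next to the main checkout, e.g. /srv/proj-login-fix.
    target = repo_path.parent / f"{repo_path.name}-{task}"
    return [
        "git", "-C", str(repo_path),
        "worktree", "add",
        "-b", f"agent/{task}",  # new branch for this task
        str(target),
    ]


def create_worktree(repo: str, task: str) -> None:
    """Create the worktree; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_command(repo, task), check=True)
```

Each agent then gets its own directory and branch, so you can review the diff from one task while another is still running, and nothing touches your main checkout.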
The choice isn't really between all at once and line by line. I always use accept all changes, but I make commits that I can review and consider in bigger pieces, but usually smaller than the full PR.
Even if you don't want to do yolo mode, there are things like Copilot Autopilot or you can make the permissions for Claude so wide that they can work for an hour and let you come back to the artifact after lunch.
...and then you get "the agent just git resetted --hard 12 hours of my work!", because AI bros can't be bothered to make their tooling actually good and version the changes at filesystem level, because it needs more than putting another variation of "pretty please don't break things" in the prompt.
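Filesystem-level versioning as asked for above doesn't have to be fancy. Here is a deliberately crude sketch that copies the whole working directory (including `.git`) before an agent run and can put it back afterwards. The function names are mine, and real tooling would more likely use copy-on-write snapshots than a full copy.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(workdir: str) -> Path:
    """Copy the whole working directory (including .git) to a fresh temp location."""
    dest = Path(tempfile.mkdtemp(prefix="agent-snapshot-")) / "tree"
    shutil.copytree(workdir, dest, symlinks=True)
    return dest


def restore(snapshot_dir: Path, workdir: str) -> None:
    """Replace the working directory with the snapshot's contents."""
    shutil.rmtree(workdir)
    shutil.copytree(snapshot_dir, workdir, symlinks=True)
```

Because the snapshot lives outside the repository, a `git reset --hard` inside the working directory can't touch it, uncommitted work included.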
I have come to the conclusion that many people are going to live this AI period pretty much like the five stages of grief: denial that it can work, anger at the new robber barons, bargaining that yeah it kinda works but not really well enough, catastrophic world view and depression, and finally acceptance of the new normality.
What's the 'new normality' in the fifth stage? Do you think you'll start to believe it actually works 100%? Or that you won't change your assessment that it works only sometimes, but maybe pulling the lever on the slot machine repeatedly is better/more efficient than doing it yourself?
No, this is still "bargaining/negotiating" phase thinking. Depression hits after that, when you see that for your use cases the code quality and security audit are very good.
People will accept it as a way to build good software.
Many are still in denial that you can do work that is as good as before, quicker, using coding agents. A lot of people think there has to be some catch, but there really doesn’t have to be. If you continue to put effort in, reviewing results, caring about testing and architecture, working to understand your codebase, then you can do better work. You can think through more edge cases, run more experiments, and iterate faster to a better end result.
> it's looking like assessment and evaluation are massive bottlenecks.
So I think LLMs have moved the effort that used to be spent on the fun part (coding) into the boring part (assessment and evaluation), which is also now a lot bigger.
You could build (code, if you really want) tools to ease the review. Of course we already have many tools to do this, but with LLMs you can use their stochastic behavior to discover unexpected problems (something a deterministic solution never can). The author also talks about this when talking about the security review (something I rarely did in the past, but also do now and it has really improved the security posture of my systems).
You can also set up far more elaborate verification systems. Don't just do a static analysis of the code, but actually deploy it and let the LLM hammer at it with all kinds of creative paths. Then let it debug why it's broken. It's relentless at debugging - I've found issues in external tools I normally would've let go (maybe created an issue for), that I can now debug and even propose a fix for, without much effort from my side.
So yeah, I agree that the boring part has become the more important part right now (speccing well and letting it build what you want is pretty much solved), but let's then automate that. Because if anything, that's what I love about this job: I get to automate work, so that my users (often myself) can be lazy and focus on stuff that's more valuable/enjoyable/satisfying.
When writing banal code, you can just ask it to write unit tests for certain conditions and it'll do a pretty good job. The cutting-edge tools will automatically run the unit tests and iterate on them when they don't pass. You can even ask the agent to set up TDD.
I'm kind of excited about that though. What I've come to realize is that automated testing and linting and good review tools are more important than ever, so we'll probably see some good developments in these areas. This helps both humans and AIs so it's a win win. I hope.
It doesn't have to work 100% of the time to be ubiquitous! This is just the strangest point of view. People don't work 100% of the time either, and they wrote all the code we had until a couple of years ago. How did we deal with that? Many different kinds of checks and mitigations. And sometimes we get bugs in prod and we fix them.
The new normal will be: everything will get worse and far more unstable (both in terms of UI/UX and reliability), and many of us will lose our jobs. Also, the next generation of programmers will have a shallower understanding of the tools they use.
AI doesn't need to outrun the bear; it only needs to outrun you.
Once the tools outperform humans at the tasks to which they were applied (and they will), you don't need to be involved at all, except to give direction and final acceptance. The tools will write, and verify, the code at each step.
> Once the tools outperform humans at the tasks to which they were applied (and they will)
I don't get why some people are so convinced that this is inevitable. It's possible, yes, but it very well might be the case, that models cannot be stopped from randomly doing stupid things, cannot be made more trustworthy, cannot be made more verifiable, and will have to be relegated to the role of brainstorming aids.
I think they meant that people insisting total genAI takeover of coding is inevitable are likely people who stand to profit greatly by everyone giving up and using the unmind machines for everything.
The original post is an example of how. Every programmer is slowly discovering, for their own use cases, that the agent can actually do it. This happens to an individual when they give it a shot without reservation.
Large scale AI datacenters require a very expensive physical supply chain that includes cheap land, water, and electricity, political leverage, human architects and builders to build datacenters, and massive capital investments. Yes, AI will outperform humans, but at some point it may become cheaper to hire a human programmer.
My existence is defined not by what I adopted but by what I sabotaged or refused to deal with. 30 years in I haven't made a mistake and I don't think I am making one here. The positive bets I made have been spot on as well. I think I have a handle on what works for society and humanity at least.
When I say AI, I mean specifically LLMs. There isn't a single future position where all the risks are suitably managed, there is a return of investment and there is not a net loss to society. Faith, hope, lies, fraud and inflated expectations don't cut it and that is what the whole shebang is built on. On top of that, we are entering a time of serious geopolitical instability. Creating more dependencies on large amounts of capital and regional control is totally unacceptable and puts us all at risk.
My integrity is worth more than sucking this teat.
“The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.”
— George Bernard Shaw
The antidote to runaway hype is for someone to push back, not to just relent and accept your fate. Who cares about affording to. We need more people with ideals stronger than the desire to make a lot of money.
I remember that around 2023, when I first encountered colleagues trying to use ChatGPT for coding, I thought "by the time you are done with your back-and-forth to correct all the errors, I would have already written this code manually".
No, it's still very much true. Every now and then I use an LLM to write code and the vast majority of the time it turns out to take just as much time (if not more) than it would've taken to write the code myself.
I suspect I fall into the former camp, but I'm not sure where to start when it comes to learning how to use LLMs "the right way".
I'm not a proper software engineer, but I do a lot of scripting, and most of my attempts to let a model speed up a menial task (e.g. a small bash or Python script for some data parsing or chaining together other tools) end up with me doing extensive rewrites because the model is completely inconsistent in naming conventions, pattern reuse, etc.
This is true for things you already understand. It works for implementing yet another CRUD view because I've done it a million times before. I know exactly what the code should look like, but it takes a while to type it in. When my typing speed is the bottleneck then of course LLMs win (and I use them for that all the time).
But the interesting stuff where you don't understand the problem yet, it doesn't make it quicker. Because then the bottleneck is my understanding. Things take time. And sleep. They require hands-on experience. It doesn't matter how fast LLMs can churn out code. There's a limit to how fast I can understand things. Unless, of course, I'm happy shipping code I don't understand, which I'm not.
Less than 6 months ago I would say about 50% of HN was at the denial phase saying it's just a next token predictor and that it doesn't actually understand code.
To all of you I can only say: you were utterly wrong, and I hope you realize how unreliable your judgements are. Remember, I'm saying this to roughly 50% of HN, an internet community that's supposedly more rational and intelligent than other places on the internet. For this community to be so wrong about something so obvious... that's saying something.
If it doesn’t understand anything why the fuck are we letting it write all our code when it doesn’t understand code at all? Does that make any sense to you? Does that align with common sense? You’re still in denial.
You gonna give some predictable answer about next token prediction and probability or some useless exposition on transformers while completely avoiding the fact that we don’t understand the black box emergent properties that make a next token predicted have properties indistinguishable from intelligence?
I'm letting it write (type out) most (80-98%) of my code, but I see it as an idiot savant. If the idea is simple, I get 100 lines of solid Ruby. Good, saves me time. If the idea is complicated (e.g. a 400-LOC class that distills a certain functionality currently scattered across different methods and objects) and I ask 4 agents to come up with different solutions, I get 4 slightly flawed approaches that don't match how I'd personally architect the feature. And "how I'd personally architect the feature" is literally my expertise. My job isn't typing Ruby, it's making good decisions.
My conclusion is that at this point, LLMs are not capable of making good decisions supported by deep reasoning. They're capable of mimicking that, yes, and it takes some skill to see through them.
Yes, I do find it a little funny how the developer community got it all wrong and the non technical people who were thinking AI is going to change everything in 2023 were the right ones. Maybe they know more than developers think.
Not true. You’re a next token predictor, and clearly the tokens you predict indicate that the way you predict the next token is much, much more than simple probabilistic detection. You’re a black box and so is the LLM, and the evidence points to emergent properties we don’t completely understand but that are completely in line with what we understand as reasoning.
Don’t make me cite Geoffrey Hinton or other preeminent experts to show you how wrong you all are.
Use your brain. It is changing the industry from the ground up. It understands.
The author has arrived at resentful acceptance of the model's power (e.g. "negative externalities", "condemn those who choose").
But the next step for many is championing acceptance, e.g. "that the same kind of success is available outside the world of highly structured language". It actually is visible when you engage with people. I'm going through this transition myself.
They really shouldn't have read all the changes individually. What you gotta do is set up your version control properly so these changes are separated from good code, and then review the whole set of changes in an IDE that highlights them, like a proto-PR. That's far, far less taxing since you get the whole picture.
I recently spoke to a very junior developer (he's still in school) about his hobby projects.
He doesn't have our baggage. He doesn't feel the anxiety the purists feel.
He just pipes all errors right back into his task flow. He does periodic refactoring. He tests everything and also refactors the tests. He does automated penetration testing.
There are great tools for everything he does and they are improving at breakneck speeds.
He creates stuff that is levels above what I ever made and I spent years building it.
You can still survive without using generative tools. Just not writing CRUD apps.
There is plenty of code that requires proof of correctness and solid guarantees, like in aviation or space and so on. Torvalds mentioned in a recent interview how little of the code he gets is generated, despite kernel code being easily available to train on.
How is that measured? Is his stuff maintainable? Is it fast? Are good architectural decisions baked in that won't prevent him from adding a critical new feature?
I don't understand where this masochism comes from. I'm a software developer, I'm an intelligent and flexible person. The LLM jockey might be the same kind of person, but I have years of actual development experience and NOTHING preventing me from stepping down to that level and doing the same thing, starting tomorrow. I've built some nice and complicated stuff in my life, I'm perfectly capable of running a LLM in a loop. Most of the stuff that people like to call prompt/agentic/frontier or whatever engineering is ridiculously simple, and the only reason I'm not spending much time on it is that I don't think it leads to the kind of results my employer expects from me.
Your experience may be valuable, and in fact made me think, but I also think the brashness of framing everything in the "adapt or die" ultimatum is unnecessary and off-putting.
The way I see it, the kid has a dangerous dependency on at least one expensive service, cannot solve problems by himself and highly likely doesn't understand core concepts of programming and computers in general.
Yeah I dread the software landscape in 10 years, when people will have generated terabytes of unmaintainable slop code that I need to fix.
> I have no reason to expect this technology can succeed at the same level in law, medicine, or any other highly human, highly subjective occupation.
I mean, if anything, I would expect it to help bring structure to medicine, which is an often sloppy profession killing somewhere between tens of thousands and hundreds of thousands of people a year through mistakes and out of date practices.
As medicine is currently very subjective. As a scientific field in the realm of physical sciences, it shouldn't be.
I was just talking to some friends in medicine the other day. They are getting more and more AI stuff and they love it.
Just basic stuff like smart dictation that listens to the conversation the practitioner is having and auto creates the medical notes, letters, prescriptions etc saving them time and effort to type that all up themselves etc. They were saying that obviously they have to check everything but it was (and I quote) "scarily perfectly accurate". Freeing up a bunch of their time to actually be with the patient and not have to spend time typing etc.
It's way beyond dictation. Medics I know (fresh postgraduates who used LLMs to help write their R code for statistical analysis for their research) are starting to treat it as one of their peers for domain reasoning, e.g. for discussing whether the conditions for a heart transplant are met. They're indeed in the "wow, this thing is human-like" stage, just not in the "let's delegate to the super brain, and then rubber-stamp the result at the end if it looks good" one we seem to be in... perhaps yet.
This is the crazy part with LLMs. It knows much more than you as a single user will ever realize, as it only shows the part that matches with what you put in.
I was building a tool to do exploratory data analysis. The data is manufacturing stuff (data from tens of factories, with low-level sensor data, human enrichments, all the way up to pre-aggregated OEE performance KPIs). I didn't even have to give it any documentation on how the factories work - it just knew from the data what it was dealing with, and it is very accurate to the extent I can evaluate. People who actually know the domain are raving about it.
If programmers like being able to pay their rent/mortgage, they'll quickly learn not to feel sad about literally the best thing to happen to software development in decades. Because otherwise they'll be replaced by someone who's delighted with it (they're not hard to find).
A programmer who is not delighted by programming cannot be very good at it. So the people who are "delighted" by using an LLM are exactly the people who should not be using it.
It would be like putting a person who doesn't know how to drive in the driver's seat of a semi-autonomous vehicle.
These takes are growing increasingly tiresome, I have to admit. They are pretty much all just tacit admissions of some kind of skill issue with this new class of tool, but presented with a sheen of moral outrage. I don’t think anyone’s buying it anymore. Figure it out.
What kind of skill does it require to let LLMs write 100% of your code? I'm genuinely asking, what's the hard part that a pre-LLM developer is fundamentally incapable of doing? Is it running the agents in a loop? Or along a state machine? Running them in parallel? Because honestly none of that sounds like anything an experienced software dev shouldn't be able to pick up in two weekends.
You mean, let the LLM hallucinate about the HOW...
I'm still at the bargaining phase, personally.
Many are still in denial that you can do work that is as good as before, quicker, using coding agents. A lot of people think there has to be some catch, but there really doesn’t have to be. If you continue to put effort in, reviewing results, caring about testing and architecture, working to understand your codebase, then you can do better work. You can think through more edge cases, run more experiments, and iterate faster to a better end result.
So I think LLMs have moved the effort that used to be spent on the fun part (coding) into the boring part (assessment and evaluation), which is also now a lot bigger.
You can also set up way more elaborate verification systems. Don't just do a static analysis of the code, but actually deploy it and let the LLM hammer at it with all kinds of creative paths. Then let it debug why it's broken. It's relentless at debugging - I've found issues in external tools I normally would've let go (maybe created an issue for), that I can now debug and even propose a fix for, without much effort from my side.
So yeah, I agree that the boring part has become the more important part right now (speccing well and letting it build what you want is pretty much solved), but let's then automate that. Because if anything, that's what I love about this job: I get to automate work, so that my users (often myself) can be lazy and focus on stuff that's more valuable/enjoyable/satisfying.
Once the tools outperform humans at the tasks to which they were applied (and they will), you don't need to be involved at all, except to give direction and final acceptance. The tools will write, and verify, the code at each step.
I don't get why some people are so convinced that this is inevitable. It's possible, yes, but it very well might be the case that models cannot be stopped from randomly doing stupid things, cannot be made more trustworthy, cannot be made more verifiable, and will have to be relegated to the role of brainstorming aids.
Someone once said that it is hard to make a man understand something if his profit depends on him not understanding it...
We don’t have to accept things.
When I say AI, I mean specifically LLMs. There isn't a single future position where all the risks are suitably managed, there is a return of investment and there is not a net loss to society. Faith, hope, lies, fraud and inflated expectations don't cut it and that is what the whole shebang is built on. On top of that, we are entering a time of serious geopolitical instability. Creating more dependencies on large amounts of capital and regional control is totally unacceptable and puts us all at risk.
My integrity is worth more than sucking this teat.
— George Bernard Shaw
The antidote to runaway hype is for someone to push back, not to just relent and accept your fate. Who cares about affording to. We need more people with ideals stronger than the desire to make a lot of money.
I mean, at some point it was true.
I remember that around 2023, when I first encountered colleagues trying to use ChatGPT for coding, I thought "by the time you are done with your back-and-forth to correct all the errors, I would have already written this code manually".
That was true then, but not anymore.
I'm not a proper software engineer, but I do a lot of scripting, and most of my attempts to let a model speed up a menial task (e.g. a small bash or python script for some data parsing or chaining together other tools) end up with me doing extensive rewrites because the model is completely inconsistent in naming conventions, pattern reuse, etc.
But the interesting stuff where you don't understand the problem yet, it doesn't make it quicker. Because then the bottleneck is my understanding. Things take time. And sleep. They require hands-on experience. It doesn't matter how fast LLMs can churn out code. There's a limit to how fast I can understand things. Unless, of course, I'm happy shipping code I don't understand, which I'm not.
To all of you I can only say, you were utterly wrong and I hope you realize how unreliable your judgements all are. Remember I'm saying this to roughly 50% of HN, an internet community that's supposedly more rational and intelligent than other places on the internet. For this community to be so wrong about something so obvious... That's saying something.
You gonna give some predictable answer about next token prediction and probability or some useless exposition on transformers while completely avoiding the fact that we don’t understand the black box emergent properties that make a next-token predictor have properties indistinguishable from intelligence?
My conclusion is that at this point, LLMs are not capable of making good decisions supported by deep reasoning. They're capable of mimicking that, yes, and it takes some skill to see through them.
https://www.youtube.com/watch?v=qvNCVYkHKfg
I don’t have any questions about LLMs. At least not any more than say an LLM researcher at anthropic working on model interpretability.
They weren't wrong though. It objectively is just a next turn predictor and doesn't understand code. That is how the thing works.
Don’t make me cite Geoffrey Hinton or other preeminent experts to show you how wrong you all are.
Use your brain. It is changing the industry from the ground up. It understands.
https://www.youtube.com/watch?v=qvNCVYkHKfg
But the next step for many is championing acceptance. E.g. "that the same kind of success is available outside the world of highly structured language"... It actually is visible when you engage with people. I'm myself going through this transition.
He doesn't have our baggage. He doesn't feel the anxiety the purists feel.
He just pipes all errors right back into his task flow. He does periodic refactoring. He tests everything and also refactors the tests. He does automated penetration testing.
There are great tools for everything he does and they are improving at breakneck speeds.
He creates stuff that is levels above what I ever made and I spent years building it.
I accepted months ago: adapt or die.
There is plenty of code that requires proof of correctness and solid guarantees, like in aviation or space and so on. Torvalds in a recent interview mentioned how little of the code he gets is generated, despite kernel code being easily available to train on.
How is that measured? Is his stuff maintainable? Is it fast? Are good architectural decisions baked in that won't prevent him from adding a critical new feature?
I don't understand where this masochism comes from. I'm a software developer, I'm an intelligent and flexible person. The LLM jockey might be the same kind of person, but I have years of actual development experience and NOTHING preventing me from stepping down to that level and doing the same thing, starting tomorrow. I've built some nice and complicated stuff in my life, I'm perfectly capable of running an LLM in a loop. Most of the stuff that people like to call prompt/agentic/frontier or whatever engineering is ridiculously simple, and the only reason I'm not spending much time on it is that I don't think it leads to the kind of results my employer expects from me.
“He automated his job so well the company doesn’t need him anymore.”
Yeah I dread the software landscape in 10 years, when people will have generated terabytes of unmaintainable slop code that I need to fix.
I mean, if anything, I would expect it to help bring structure to medicine, which is an often sloppy profession killing somewhere between tens of thousands and hundreds of thousands of people a year through mistakes and out of date practices.
Medicine is currently very subjective. As a scientific field in the realm of the physical sciences, it shouldn't be.
Just basic stuff like smart dictation that listens to the conversation the practitioner is having and auto creates the medical notes, letters, prescriptions etc saving them time and effort to type that all up themselves etc. They were saying that obviously they have to check everything but it was (and I quote) "scarily perfectly accurate". Freeing up a bunch of their time to actually be with the patient and not have to spend time typing etc.
I was building a tool to do exploratory data analysis. The data is manufacturing stuff (data from tens of factories, having low level sensor data, human enrichments, all the way up to pre-aggregated OEE performance KPIs). I didn't even have to give it any documentation on how the factories work - it just knew from the data what it was dealing with and it is very accurate to the extent I can evaluate. People who actually know the domain are raving about it.
A programmer who is not delighted by programming cannot be very good at it. So the same people who are "delighted" by using an LLM are the exact same people who should not be using it.
It would be like putting a person who doesn't know how to drive in the driver's seat of a semi-autonomous vehicle.
I'm able to pay rent just fine without one...
If that's not delusional thinking I don't know what is.