The gains are a ~17% increase in individual effectiveness, at the cost of ~9% extra instability.
In my experience using AI-assisted coding for a bit longer than 2 years, the benefit is close to what DORA reported (maybe a bit higher, around 25%). Nothing close to an average of 2x, 5x, 10x. There's a 10x in some very specific tasks, but also a negative factor in others, as seemingly trivial but high-impact bugs get to production that would normally have been caught very early in development or in code reviews.
Obviously depends what one does. Using AI to build a UI to share cat pictures has a different risk appetite than building a payments backend.
That 17% increase is in self-reported effectiveness. The software delivery throughput only went up 3%, at a cost of that 9% extra instability. So you can build 3% faster with 9% more bugs, if I'm reading those numbers right.
Those aren't even percentage increases, but standardized effect sizes. So if you take an individual survey respondent and all you know is that they self-reported higher AI usage, you can guess their self-reported individual-effectiveness answers slightly more accurately, but most of the variation will be due to unrelated factors.
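To make the "standardized effect size" distinction concrete, here is a minimal sketch of how such a number is computed (a Cohen's-d-style standardized mean difference). The survey values below are made up for illustration and are not taken from the DORA report:

```python
import statistics

# Hypothetical self-reported effectiveness scores (1-10 scale) for
# respondents reporting low vs. high AI usage. Purely illustrative data.
low_usage = [5, 6, 5, 7, 6, 5, 6, 7, 5, 6]
high_usage = [6, 7, 6, 7, 7, 6, 8, 7, 6, 7]

def cohens_d(a, b):
    """Standardized mean difference: (mean_b - mean_a) / pooled stdev."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(b) - statistics.mean(a)) / pooled_sd

d = cohens_d(low_usage, high_usage)
print(f"standardized effect size d = {d:.2f}")
```

The point being made above is that such a d value is measured in pooled standard deviations, not in percent of output, so "17%" read as a raw productivity gain would be a misreading.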
The question that people are actually interested in, "After adopting this specific AI tool, will there be a noticeable impact on measures we care about?" is not addressed by this model at all, since they do not compare individual respondents' answers over time, nor is there any attempt to establish causality.
I think for myself, it's close to 25% if I only take my role as a dev. If I take my 'senior' role it's less, because I spend way more time in reviews or in prod incident meetings.
Three months ago, with Opus 4.5, I would have said that the productivity improvement was ~10% for my whole team.
I now have to contradict myself: juniors and even experienced new hires with little domain knowledge don't improve as fast as they used to. I still have to write new tasks/issues like I would for someone we just hired, even after 8 months. I still catch the same issues in reviews that we caught three months ago.
Basically, experience doesn't improve productivity as fast as it used to. On easy stuff it doesn't matter (like frontend changes, where the productivity gains are extremely high, probably 10x), and on specific subjects like red teaming, where a quantity of small tools beats an integrated solution, I think it can be better than that.
But I'm in a netsec tooling team, we do hard automation work to solve hard engineering issues, and that is starting to be a problem if juniors don't level up fast.
For me it is a 2x or 5x or something, but "high impact bugs get to production that would normally have been caught very early in development or in code reviews" is what takes it back down to a 1.5x.
There are genuinely weeks where I go 5x though, and others where I go 0.5x.
It's not so valuable to assess the current state, i.e. what the impact of using AI is today. From personal experience, it feels like the overall impact on productivity was not positive a couple of years ago, might be positive now, and will be positive in a couple of years. That means by assessing the current impact we're just finding where we are on that change curve. If we accept that the trend is happening, then we know at some point it will pass (or has passed) the threshold where our companies will fall behind if they're not using it. We also know it takes a while to get up to speed and make sure we're making the most of it, so the earlier we start the better. The counterargument is that we could wait for a later wave to jump on, but that's risky, and the only potential reward is a small percentage short-term productivity gain.
So you're saying instead of assessing the current capabilities of the technology, we should imagine its future capabilities, "accept" that they will surely be achieved and then assess those?
I would assess the directionality and rate of the trend. If it's getting better fast and we don't see a limit to that trend then it will eventually pass whatever threshold we set for adoption.
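The "rate of the trend" argument above can be sketched as a toy compounding model: if a measured effect improves at a steady rate per year, it crosses any fixed adoption threshold after a computable number of years. All numbers below are hypothetical, chosen only to illustrate the shape of the argument:

```python
import math

def years_to_threshold(x0, r, threshold):
    """Years for an effect at level x0, compounding at rate r per year,
    to reach `threshold`: solve x0 * (1 + r)**t = threshold for t."""
    return math.log(threshold / x0) / math.log(1 + r)

# e.g. a 3% throughput gain improving 40%/year, with a 20% bar for adoption:
t = years_to_threshold(x0=3.0, r=0.40, threshold=20.0)
print(f"crosses the bar in ~{t:.1f} years")
```

Of course, the whole disagreement in this thread is whether the rate r is actually steady and whether a limit exists, which this model simply assumes away.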
Self-reported productivity does not equate to actual productivity. People have all sorts of biases that make such assessments fairly pointless. They only gauge how you feel about your productivity, which is not necessarily a bad thing, but it doesn't mean you're actually more productive.
To build on this: the measures of productivity before LLMs were already poor for any kind of complex work, so there's no reason to think we would have better measures now.
You need broad economic measurements, not individual or company specific. And that takes a long time plus there's a lot of noise in the data right now (war, for example).
1. We're very bad at measuring developer productivity. We've been trying to do it for a long time, and from my POV we have very little to show for it.
2. That said, almost all the people who "want to see a study" don't make sense to me. I don't remember anyone insisting on seeing a study that shows that writing Python is more productive than C; people just used it and largely agreed that it was. How many studies show that git (or other DVCS) are better than the things that preceded it? I don't know if any exist. I do know that nobody was looking for studies before switching to git.
I don't ever remember seeing any new technology in software development for which people demanded studies before adopting it. They just assumed that if the professional developers they trusted to build their software said something was better, then it was — a correct assumption IMO.
Now, we're seeing a technology which most professional developers — that have used it seriously, at least — insist is orders of magnitude better than anything else that's come before it. And suddenly developers can't be trusted? Suddenly, when the claimed effect is orders of magnitude bigger than almost any other new technology, developers are biased and incapable of making this kind of determination?
I really don't think that's a serious position to hold.
>Now, we're seeing a technology which most professional developers — that have used it seriously, at least — insist is orders of magnitude better than anything else that's come before it.
You can't just assert this. I could equally-baselessly say most professional developers have used LLMs and find them, overall, more trouble than they're worth. Except it's not totally baseless because I think that was actually a result of a study, IIRC.
But we didn't have pressure to switch from C to Python shoved down our throats by management, or social media telling us that if you don't use Python you're getting left behind, did we?
In the C vs. Python case, we know the technical trade-offs and when to use what, but in AI productivity narratives, we keep pretending that the technical or cognitive debt created by AI doesn't exist.
Sure, person A can be 20% "faster" and suggest that this tool increases productivity by a magnitude, but if it costs person B 50% more time to review A's slop or clean up A's mess, the team's productivity doesn't really increase.
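The team-level arithmetic in that claim is easy to check back-of-envelope. The hours below are made-up illustrative values, not measurements:

```python
# Hypothetical per-feature costs: person A authors, person B reviews.
author_hours_before, review_hours_before = 10.0, 4.0

author_hours_after = author_hours_before / 1.20  # A authors 20% faster
review_hours_after = review_hours_before * 1.50  # B spends 50% more reviewing

total_before = author_hours_before + review_hours_before  # 14.00 h/feature
total_after = author_hours_after + review_hours_after     # ~14.33 h/feature

print(f"before: {total_before:.2f}h, after: {total_after:.2f}h per feature")
```

With these (assumed) ratios the team is net slower per feature, despite the individual speed-up — the whole effect hinges on how large the review/cleanup share of the work is.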
We're incapable of putting an accurate, standardized value on developer productivity, yet there often seems to be consensus between senior engineers on who are the high performers and the low performers. I certainly can tell this about the people I work with.
> Point at a problem, and measure the cost of solving it.
The problem with this is that AI will create worse code that is going to cause more problems in the future, but the measurements won’t take that into account.
Measure or estimate? What ways? Honest question, because virtually all AI discussions _conveniently_ become vague a few steps short of actually answering the question.
Unironically, AI evaluating the impact of those lines might be getting close to a metric that would measure output better than having everyone print out their last 6 months of work for the new boss to look at.
How do you know you're more productive? Humans are excellent at fooling themselves, and absent a metric (or multiple metrics) by which you can measure your productivity, you can't be sure you're actually being more productive.
I don't know if it's made me more productive, but I do know that for the past ten years I've been thinking about making an immediate-mode GUI toolkit for MPV user scripts, rendered with ASS subtitles and with a full suite of composable widgets. For ten years I kept putting it off because it seemed like it would be a big quagmire of difficult-to-diagnose rendering errors (based on far more modest forays into making one-off GUIs this way). And I know that yesterday I decided to explain my idea to Claude, and now it just fucking works after a few hours of easy, casual back and forth.
I don't know man, could just be in my head. I better defer judgement, put aside all my own opinions about what happened and let some researchers with god knows what axe to grind make that decision for me.
I'm very, very sure. Based on my last 15 years of coding experience I can estimate fairly accurately how long a task takes. With AI I can finish the task 2x-4x faster (this includes testing, edge case handling, etc.).
What's the best car? If you're trying to go fast it's one answer, if you're trying to carry as much load as possible it's another, if you're buying for your just-qualified teen it's another. But "best" is obviously subjective, so what about safest? I don't know specifics there, but if you're in the EU the "safest" car would be very different from the "safest" in the US, because their safety studies measure very different things.
Which is the issue with almost all studies and statistics, what it means depends entirely on what you're measuring.
I can program very, very fast if I only consider the happy path, hard-code everything, and don't bother with things like writing tests, defining types, or worrying about performance under expected scale. It's all much faster right up until the point it isn't — and then it's much slower. AI isn't quite so obviously bad, but it can still turn short-term gains into long-term problems, which is what studies tend to focus on, as the short term doesn't usually require a study to observe.
I think AI is similar to outsourcing staff to cheaper countries, replacing ingredients with cheaper alternatives, and other MBA-style ideas. It's almost always instantly beneficial, but the long-term issues are harder to predict, and can have far more varied outcomes depending on weird specifics of the business.
Most people seem to be expecting some kind of quantitative analysis: N developers undertook M tasks with and without access to a given AI tool, here is the statistical evidence that shows (or fails to show) the effect, and this result is valid across other projects and tools.
In practice, arriving at this ideal scenario can be very challenging. Actually feasible experiments will be necessarily narrow, with the expectation that their results can be (roughly) extrapolated outside of their specific experimental setup.
Another valid approach would be to carry out qualitative research, for example a case study. This typically requires the study of one (or a few) developers and their specific contexts in great detail. The idea is that a deep understanding of how one person navigates their work and their tools would provide us with insights that might be related to our specific situation.
Personally, in this particular area, I tend to prefer detailed qualitative accounts of how other developers are working on similar projects and with similar tools as me.
But in any case, both approaches are valid and complementary.
I think you are underestimating the number of low-priority issues that exist that don't need alignment around fixing. In the past these had little upside to actually fix, but as the cost of fixing them trends towards $0, you might as well fix them.
AI can take over testing and release planning / coordination. This is the allure of AI. Being able to fully close the loop of releasing software without needing a human.
I do remember some of them showed some productivity improvement, but it pretty much dove off a cliff with the complexity of the tasks involved, or the small improvement on medium-difficulty tasks was eaten by the time spent waiting for responses.
Note that most of them were focused on programming tasks aimed at shipping a product, not other use cases like "prototype a dozen ideas quickly before we pick a direction" or "write/update documentation about this feature", where AI might be significantly more productive than in plain programming.
I have a coworker who is obsessed by LLMs and keeps reiterating that he is super productive with them.
Yet I have yet to see a first delivery or codebase from that same person. (I am not his manager.)
I lean toward the LLM skeptic camp. I know they're great for some things (never for outsourcing your thinking, which unfortunately a lot of people do), but I'd like to see some studies. Because in the business press there are a lot of reports of net negatives, or at most up to 10% improvement.
Some people prefer evidence before investing large amounts of money and labor. That is not an indication of irrational behavior, even if it challenges your emotionally invested opinion or result.
It might also depend on how the tools are used. In practice a lot of value seems to come from reducing small bits of friction rather than dramatically increasing output.
AI can build systems based on static assumptions that the orchestrator (you) gives it. But proper engineering (which is what matters economically much more) is the process of the system's assumptions & requirements changing over time to ensure you have a reliable and consistent service - and that's not something that AI excels at (yet).
Because I am long past pretending to give a shit about intellectual property when the corps don't, or caring about the energy expenditure of my hobby when all the car guys don't. When it all comes down to brass tacks, I think the technology must be judged by what it can do for me, not according to some misguided principles that don't actually serve my interests in the grand scheme of things (IP), or quasi-ideological matters like how much energy I'm morally entitled to use. Screw all of that; frankly I file it all under cope used by people who want to go back to the old methods to justify their decision to ignore, and not learn, one of the most amazing technologies created during our lifetimes. I suggest you get real: for all its faults the tools work too well for us to turn back the clock on any of this. This stuff isn't going to blow over, so you should be learning to make the best of it. My two cents.
The issue with creative and novel output from people is neither about intellectual property nor energy, though. So even someone who has nothing (personal) to lose by adopting these techs should be able to reflect on how that will make things look 5, 10, 20 years from now.
And I'm not talking about climate or poor starving artists here. But of course, if everyone thinks like you seem to do we might just give up on having a livable planet in 50 years. Or any significant scientific or artistic progress.
Yeah that's nice, but I don't care and it's not going to stop this train. The future I envision coming is one where even local models are sufficiently capable to give common people the ability to control their own computers in a way that previously would have required them to hire a team of professionals, or to devote years of their life to study. Frontier models aren't quite good enough yet for normies to use in this way, let alone local models, but this stuff is all still very new and there's a lot of competition to improve it. I think we'll get there, and in any case the upsides are big enough already to squash all the whining objections. You can't stop this tech, all you can do is stop yourself from benefiting from it while others do.
But the drop in original human-created output will be worse for you. Even if you are fine with consuming AI slop, its quality will go down with worse inputs.
Slop custom made for me individually, on demand to fit my individual personal needs like a glove, is the dream.
Also, this slop is substantially slicker and more polished than the software I would have made myself, for myself. Judge away, but when I write something myself, for myself, I take shortcuts and find little excuses to give myself less work. XDG-compliant config? That can wait... Animations? Pfft, skip it. Tooltips on every interactive element? That'll never happen. But with a coding agent doing my bidding, these niceties become realities.
It’s not the company. It’s always the 10x developer who uses the tools to increase his output. My buddies report at least once a month on the new AI policy in the corporate world. All of them are bollocks, written by someone who never wrote any code.
> Why are the pro AI people so obsessed with proving the AI skeptics wrong.
It seems to me the pro-AI types just want to be free to enjoy a transformative tech and discuss the implications of its development and innovations - without being badgered and henpecked or told the results they see are some kind of mass delusion.
The "badgering and henpecking" "problem" was created entirely by AI bros hyping AI to everyone and forcing it in every possible channel and avenue.
You're literally trying to blame the victim. Put a "don't show AI content" option on every major platform and the henpecking will stop, but (aside from the technical annoyances of doing it) that won't happen, because companies want to force AI down our throats.
> Put "don't show AI content" on every major platform and the henpecking will stop
Your argument then is: "Ban the subject of AI from your platforms or we're coming at you with pitchforks. And don't say anything to us when we do, because we are the sad ones here." Correct?
There are a few studies that show perceived increases in productivity (all of them show negative or almost no real increase, but I don't think that is relevant to snake oil salesmen).
We've had the AI tools for maybe two years, and they have only gotten really good in the past half a year or so. For fuck's sake, adopting electricity took like 50 years, why would you expect to see any kind of effect from the AI so quickly? The tools are still developing - rapidly - and people are still figuring out the best usage patterns for it.
The electricity analogy is fair play, but ChatGPT had something like 110% global adoption 5 minutes after its release. With electricity, the infrastructure and the electrical appliances had to catch up, but the Internet is all built out already.
So I think it's fair to be looking at results a few years in.
Andrej Karpathy famously mentioned in an interview with Dwarkesh Patel [0] that the computer doesn't show up in GDP numbers: there's no noticeable jump or change in slope. Even if Excel is so damn fast, people are likely not drawing on its full potential, and institutions are likely actively resisting change anyway.
My take is that the general population hasn't found the productive levers yet. They're at the stage where they're happy to drag down and auto-generate the date list in Excel, but don't know how to adjust diagrams or read function docs, to say nothing of VBA scripting. And the enthusiast (dev) community, I'd say, is starting adoption with internal tools and shot-in-the-dark apps, but big successes need time to mature in all the other ways (design, reliability, user feedback, marketing...), which comes back to what you said: it needs time. Product-market fit isn't happening automatically by chance or good prompting, I would like to think.
I agree. I'd also argue that local effects of productivity were already visible since the start of ChatGPT. I was already using it a lot back then for writing tests and as a "smarter scaffolding", even before Copilot and such. Often cutting the time of doing something from half an hour to a few seconds.
IMO the bottleneck remains the same: doing proper engineering is more than writing code. Even 20 years ago a big corp would spend a few years writing something that a startup would do in weeks (and yes: even 20 years ago) just because of laser-focused requirements, better processes/less bureaucracy, using the right tools for the job and having less friction in tooling. That hasn't changed.
Productivity was never about the lines of code written. I thought the industry as a whole had collectively decided that metric was a joke before the age of LLMs. The bottlenecks are the same: office politics, coordinating teams, consulting subject matter experts and coherent system design. AI is not a swiss army knife that results in devs becoming their own island; LLMs cannot tell me if something would jive well with our customer base -- I need people in the company who actually interact with them, for example.
These sort of things are really hard to study. Combine that with the fact that the AI landscape is so varied and fast moving... It's easy to see why there aren't many studies on it.
There are a mountain of things that we reasonably know to be true but haven't done studies on. Is it beneficial for programming languages to support comments? Are regexes error-prone? Does static typing improve productivity on large projects? Is distributed version control better than centralised (lock based)? Etc.
Also you can't just say "AI improves productivity". What kind of AI? What are you using it for? If you're making static landing pages... yeah obviously it's going to help. Writing device drivers in Ada? Not so much.
I think these comparisons are unfairly picked. A good chunk of the world's economy is not currently jacked up on the promise that comments in code will lead to unimaginably high value (in pretty much every field from medicine to the media industry) in the span of a couple of years. Given the claims and market valuations around AI, wouldn't you agree a bit more hard evidence would be reassuring?
I can tell you that at Cisco they just released an internal AI study that measured just about everything related to AI at Cisco except tangible gain. No mention of productivity, but tons of other data about who uses it, how long, why or why not, what correlates to usage or non usage, etc. I can only assume what that means.
I believe that individual productivity in most areas peaked long ago.
Industrial production is still scaling up, and this is the model that applies to AI — or, as it really is, automation of "management". But as this is NOT a linear mechanical process (almost, oh! so almost mechanical), it is not quite working. For exactly the same reason that industry cannot make you one, let's say, car that is green on one side but orange on the other and has six headlights but only one seat, industry can't scale down: minimum order is 250,000 units, it will take 3 years, pay us now!
I deal with this every week: something small (smol) breaks in a large corporate environment. They work in millions, they have teams and departments, but the little handle thing on a set of automated front doors facing a main street in a significant asset has failed. And you watch the whole corporate apparatus convulse as they try to figure out how to pay an amount smaller than a rounding error to a company that barely exists, and the request has to be passed higher and higher for approval, because there is no button for it — just like a major corporate deal.
People can't figure this out; AI never will.
And I am exploring just how to exploit this scaling problem to my advantage.
Because the data is private and often such studies are not measuring solely the part that AI makes more productive. And measuring productivity in general is a very hard problem so the results of whatever study often are meaningless in practice. Pair this with studies today still being based off ancient models like GPT-4o and it's even more meaningless.
If you are familiar with AI it's obvious how it increases productivity. When bugs get fixed with 0 human time it's plain as day that it was productive compared to a human making the fix.
If AI makes people so much more productive, why aren't there many more apps on the Apple App Store? Mobile apps involve a lot of dirty, boring scaffolding work, which AI automated first thing, easily 2 years ago. It should have been the very first place where a productivity boost was evident, a year ago at least. But it's just not there. Why not?
App Store releases are increasing due to a new gold rush on subscription apps. Review times have gotten longer as the review team at Apple is being spammed.
Most of these apps are rudimentary habit trackers, time management apps etc. so not much creativity, much more recycled ideas. More code != better ideas though.
Also a lot more clone ideas these days. AI has definitely empowered people to write things from scratch, either as a product to sell or as internal projects inside companies.
+160k apps a year, and that's only 84% above the pre-AI era (it's safe to say apps were not routinely built with AI in 2023 yet). A noticeable increase, but it doesn't feel dramatic, especially since, yes, the majority of those new apps are low-effort trash like those described in this thread.
>I'm [...] at $x, a frontier AI Security company
I really should check these before I bother engaging with posts boosting AI
We only avoid doing it at scale because it's expensive. In particular if we want the measurement to generalise out of sample.
(In particular in this case, where once we're done, proponents will claim our data is too old to be a useful guide to tomorrow.)
If we could even measure teams, against themselves, others and some kind of baseline, but we don't AFAIK.
Those that can “see” the potential push through the adaptation period, even when longer than expected.
Depending on how forward looking a group is, the adaptation costs are a problem, a dilemma, or a completely obvious win.
Yet, external measurements don't distinguish between accumulating, accelerating, flat or fading intermediate value.
--
Avoidance of necessary adaptation, even with no immediate impact, becomes the dual: technical, strategic, or capability debt.
Does that hidden anti-productivity ever get accounted for? When maladaptive firms take their anti-productivity into a hole as they fade/demise?
A company can operate with high margins while its sales fall off a cliff. Is that just "decreasing quantities" of uniformly "high productivity"?
Learning to write code always was the easy part, learning to write good software is what takes the rest of our careers to get better at.
It’s not. In a proper org the cost is the testing, the release process, the coordination, the planning, etc.
Any scope creep, even if it fixes something, often gets shouted at.
Note that most of them were focused on programming tasks aimed at shipping a product, not other use cases like "prototype a dozen ideas quickly before we pick a direction" or "write/update documentation about this feature", where AI might be a significantly more productive fit than straight programming.
Yet I have still to see the first delivery or codebase from that same person. (I am not his manager.)
I lean toward the LLM-skeptic camp. I know they're great for some things (though never for outsourcing your thinking, which unfortunately a lot of people do), but I'd like to see some studies, because the business press reports a lot of net negatives, or at most around a 10% improvement.
- built AWS dashboard to identify and manage internal resources in a few hours
- solved several production problems connecting Claude to devops APIs in near real-time
- identified solutions for feature requests or bugs for existing internal applications including detailed source changes
- built Ledga.us
- built sharpee.net and its associated GitHub repo
- building mach9 poker ios and android apps
- working on undisclosed app that might disrupt a huge Internet sector
We’re still in the early stages of LLM-influenced development, and reporting on productivity will take time.
Things like generating boilerplate, quick test scaffolding or documentation lookups. Each one is small, but they compound during the day.
That’s probably why it’s hard to capture in traditional studies.
Curious: has anyone seen studies measuring task-level productivity instead of overall output?
If anything, there needs to be studies done on
- the drop in creative, novel output from actual people (due to theft and loss of jobs)
- the energy cost per pax in relevant industries, pre/post LLMs being adopted
And I'm not talking about climate or poor starving artists here. But of course, if everyone thinks like you seem to do we might just give up on having a livable planet in 50 years. Or any significant scientific or artistic progress.
Also, this slop is substantially slicker and more polished than the software I would have made myself, for myself. Judge away, but when I write something myself, for myself, I take shortcuts and find little excuses to give myself less work. XDG-compliant config? That can wait... Animations? Pfft, skip it. Tooltips on every interactive element? That'll never happen. But with a coding agent doing my bidding, these niceties become realities.
Beats me. With "AI" being so good at faking stuff, there should by now be ton of such studies :)
Why are the pro-AI people so obsessed with proving the AI skeptics wrong?
Is AI working for you? Great. Go make great things. Isn't that the point, after all? Who cares who believes you if the results speak for themselves?
Cognitive dissonance. "Why are people claiming they do not see any benefit and I do? That is unacceptable, they must be wrong."
I have to admit cognitive dissonance works both ways.
It seems to me the pro-AI types just want to be free to enjoy a transformative tech and discuss the implications of its development and innovations - without being badgered and henpecked or told the results they see are some kind of mass delusion.
You're literally trying to blame the victim. Put a "don't show AI content" option on every major platform and the henpecking will stop, but (aside from the technical annoyances of doing it) that won't happen, because companies want to force AI down our throats.
Your argument then is: "Ban the subject of AI from your platforms or we're coming at you with pitchforks. And don't say anything to us when we do, because we are the sad ones here." Correct?
It's all make believe
So I think it's fair to be looking at results a few years in.
Andrej Karpathy famously mentioned in an interview with Dwarkesh Patel [0] that the computer doesn't show up in GDP numbers; there's no noticeable jump or change in slope. Even though Excel is so damn fast, people are likely not drawing out its full potential, and institutions are likely actively resisting change anyway.
My take is that the general population hasn't found the productive levers yet; they're at the stage where they're happy to drag down and auto-generate a date list in Excel, but don't know how to adjust diagrams or read function docs, not to even mention VBA scripting. And the enthusiast (dev) community, I'd say, is starting adoption with internal tools and shot-in-the-dark apps, but big successes need time to mature in all the other ways (design, reliability, user feedback, marketing...), which comes back to what you said: that needs time. Product-market fit doesn't happen automatically by chance or by good prompting, I would like to think.
[0] https://youtu.be/lXUZvyajciY?is=CBJI4hIr6w_UHVs9
That's certainly an interesting take. Where do these people think the 1-2% annual growth came from — steam machine late adopters?
The conundrum in the 1980s and 1990s was: growth hasn't increased, despite all the computer adoption. Why not?
IMO the bottleneck remains the same: doing proper engineering is more than writing code. Even 20 years ago a big corp would spend a few years writing something that a startup would do in weeks (and yes: even 20 years ago) just because of laser-focused requirements, better processes/less bureaucracy, using the right tools for the job and having less friction in tooling. That hasn't changed.
There are a mountain of things that we reasonably know to be true but haven't done studies on. Is it beneficial for programming languages to support comments? Are regexes error-prone? Does static typing improve productivity on large projects? Is distributed version control better than centralised (lock based)? Etc.
Also you can't just say "AI improves productivity". What kind of AI? What are you using it for? If you're making static landing pages... yeah obviously it's going to help. Writing device drivers in Ada? Not so much.
And no, no-one is waiting for a “study” to believe in AI, they’re out doing it.
If you are familiar with AI it's obvious how it increases productivity. When bugs get fixed with 0 human time it's plain as day that it was productive compared to a human making the fix.
Most of these apps are rudimentary habit trackers, time-management apps, etc., so not much creativity, mostly recycled ideas. More code != better ideas, though.
https://www.a16z.news/i/185469925/app-store-engage
https://42matters.com/ios-apple-app-store-statistics-and-tre...
End of 2023: 1,870,119 apps
End of 2024: 1,961,596 apps
Now: 2,150,612 apps, after 1.18 years.
+160k apps a year; that's only about 75% above the pre-AI rate of ~91k/year (safe to say that apps were not routinely built with AI in 2023 yet). A noticeable increase, but it doesn't feel dramatic, especially since, yes, the majority of those new apps are low-effort trash like those described in this thread.
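A quick sanity check on that comparison (a minimal sketch; the app totals and the 1.18-year figure are taken straight from the numbers above, and the yearly rate is a simple linear extrapolation):

```python
# App Store totals quoted above (a16z / 42matters figures).
end_2023 = 1_870_119
end_2024 = 1_961_596
now = 2_150_612                         # ~1.18 years after end of 2024

pre_ai_rate = end_2024 - end_2023       # apps added in the last pre-AI year
post_ai_rate = (now - end_2024) / 1.18  # apps added per year since then

increase = post_ai_rate / pre_ai_rate - 1
print(f"pre-AI:  {pre_ai_rate:,} apps/year")
print(f"post-AI: {post_ai_rate:,.0f} apps/year ({increase:.0%} above)")
```

With these inputs the pre-AI rate comes out to ~91k apps/year and the post-AI rate to ~160k apps/year, roughly a 75% increase.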
There are more apps, and webpages, and software and whole lot of stuff.
It's just not good
I would have included the flatness of earth, but the flat earthers have some excellent studies (reviewed by their flat earth peers) on the subject.