AI Built a Nuke and Still Lost

(lwilko.com)

73 points | by kensai 3 hours ago

27 comments

fyredge 3 hours ago
There is something to be said about the qualia of LLM generated passages. Each individual sentence reads as a statement and every next statement a continuation of the previous one. This happened, then this happened... Ad infinitum.
Before today, I could not explain to you why AI articles were so obvious to me, but I think I can now. There is no insight to be gleaned. Pre-LLM, authors generally had intention behind their words. The final product might not adequately reflect their thoughts, but word selection would expose it somewhat. With LLMs, sentences flow seamlessly from word to word, but the intention is nowhere to be found. Things happened and more things happened, to what end?
- piazz 48 minutes ago
  The term that I saw once, and now constantly land back on, is “meaning shaped”
  At a quick glance, it looks like a thing that should contain meaning and substance. At any closer inspection, it falls apart completely.
- wmwragg 2 hours ago
  This and the fact that you often read a sentence, paragraph or the whole article, and think this said absolutely nothing in lots of words.
  - dspillett 1 hour ago
    That is also true of a lot of pre-2023, so most likely human penned, writing.
    LLMs seem to emulate bad (or even "meh") writing well but, without a human editor making significant tweaks, have yet to excel at good writing.
    I've been incorrectly identified as an LLM before now because my writing is sometimes bad and falls into the tropes now associated with generative AI (“not this, but that”, being overly wordy, appearing to lack focus, etc).
- pjio 2 hours ago
  For a limited amount of time I appreciated the level of detail in the article, hoping it would give me more insight, until it exhausted me. I think those two ideas are real takeaways: "Knowing is not doing" and "What can we trust AI to do?". Still, could have been said with a more concise text and maybe a follow up about the details.
  - artpar 1 hour ago
    The key takeaway pretty much applies to the authoring of the article itself. The LLM knew what all happened but couldn't put it into a readable article.
- sph 2 hours ago
  > There is no insight to be gleaned.
  AI-generated articles are the intellectual equivalent of empty calories.
  I have just spent the last 10 minutes trying to figure out why someone decided to buy imgui.org, name-squatting an actual project, just to put a slop website on it mildly referencing the original project. It's not even trying to scam you.
  I keep wondering whether these people that keep polluting the internet with their insightless slop even possess self-awareness. What motivates them to expend money and effort to contribute nothing to the world? Are they another example of a philosophical zombie?
  - goatherders 1 hour ago
    The answer is in your question. "Empty calories" is a multi trillion dollar business in the food world. It will be the same in the digital world.
    - sph 49 minutes ago
      McDonald's serves a need: humans tend to get hungry.
      What need does filling the Internet with AI-generated slop serve? No one is ever going to read those. As I've shown, sometimes they're not even selling anything.
      The multi-trillion dollar business is for hyperscalers, but I don't get what slop creators get out of this. What are they spending money for?
      - seba_dos1 4 minutes ago
        > What are they spending money for?
        To prepare for the future use of these resources.
  - soco 2 hours ago
    I can't speak to domain squatting, but I've seen a "why" in seemingly innocuous Facebook groups about baking or such, which at the right time slowly transitioned to fake AI pictures and stories, and then to straight-out political propaganda. I'm talking Eastern Europe and Russian "special operation" support propaganda. But a slop website won't have enough traffic to be worth such an action, so no idea.
- ramon156 2 hours ago
  It's weird because when you look at models that expose CoT, this does not happen. They switch up every second.
  "But then X happened... Wait, didn't Y happen? Then why would X be there? I think the user's initial statement was correct, but then Y happened..."
- himata4113 1 hour ago
  This problem actually surfaces in movies too: for A to happen B has to happen, but B has no reason to happen, so you end up with nonsensical situations. This happens in LLMs as well, since A is explained by B happening, but A doesn't need to be explained since A can't happen.
- doppioandante 2 hours ago
  I came to the same conclusion about AI generated code. When I read code written by a human, just by skimming it, I can get a sense of what purpose the code has, why it was written this way and not another way, what style and mindset the programmer behind it has. AI generated code may sometimes be extremely precise and following all the good practices, but I feel no intent behind it.
- teekert 2 hours ago
  It's not this, it's that. And then what happened? This. I did that... This happened.
```
    It's a thing
    I don't know why
    But it's a thing
```
  To be honest, it's not a thing.
  Let that sink in.
  Maybe we find most meaning in the least average language constructs.
  - geon 0 minutes ago
    Like a horoscope.
- titanomachy 1 hour ago
  It's surprising to me that the big labs haven't fixed this problem. Half the comments here are complaining that this article is egregious unreadable slop, and I agree. Surely with the trillions of investment they could at least figure out how to vary sentence structure a bit and nix obvious tells.
  Maybe this is just an inherent problem with LLMs.
  - xpct 1 hour ago
    The following come to mind:
    - either the problem is hard, or labs have no incentive to fix it for their main users
    - being able to tell that something is LLM-generated, is good
    - could be that structure is an emergent property as models get better
- dinfinity 1 hour ago
  > Pre-LLM, authors generally had intention behind their words.
  I think this is at least in part a combination of rosy retrospection and attentional bias: A lot of human writing was always trash. Absolute dogshit with regard to the quality of writing, but there was no "AI slop" label to attach to it. How would you, pre-LLMs, have placed a comment on the writing style if a post was badly written? From what I've seen it would be a "this is marketing/SEO-speak" or some similar comment, deriding the author for being uninformed or of ill intent.
  We've now become so allergic to AI slop that anything that even smells like it triggers almost immediate disgust and attachment of that label to the content (even if it is the same old human written trash).
  I guess LLM-assisted posts do change the dynamic a bit: the intent is more often benign with a desire to write something good, but the skill to do so lacking. If we limit the "pre-LLM authors" to people with good intent writing about stuff relevant to a HackerNews audience, you're probably right. Many more bad writers are now creating the same ostensibly fancy articles, decreasing the signal-to-noise ratio we were used to.
  - dingaling 22 minutes ago
    > A lot of human writing was always trash.
    Yes, several history authors come to mind who over several decades never broke out of the style of "this happened, then they moved to here, then this happened.". Their entire book could just be summarised in a table instead of prose.
    LLMs seem to be stuck at the same stage, telling us 'what' but not 'why' and 'so what.'
- scotty79 2 hours ago
  For me this reads like a report of things that were tried and observed. It was a very pleasant read for me because I'm interested in the subject. And the lack of underlying agenda, moral lesson, politics or, as you call it, insight, was quite refreshing. I became quite allergic to texts where author clearly tries to make me think a specific thing. To sell me something. I usually find the agenda pretty quickly and I know the rest of text is just a fluff around it so I lose interest. And when the agenda is not easy to find then I just get more annoyed because I feel it's intentionally hidden. Like a solution to a clickbait title.
  This text reads great for me because as I read it, I clearly saw there's no agenda so I felt safe to just absorb the information that it contains.
  - mrmarket 2 hours ago
    well, this is a first. never seen someone say they prefer AI slop even when they know it's slop. fascinating.
    - scotty79 2 hours ago
      I tend to steer clear of the largest herd in many aspects. Often unintentionally. Also I'm not a native speaker so I might be not as receptive to some of the things that offend others in AI generated content.
      Maybe AI is sort of anti-trump, where it's viscerally unbearable for native speakers even if the content is good, opposite to trump speech that somehow seems viscerally appealing to native speakers even though the content is complete garbage.
- shevy-java 2 hours ago
  > There is no insight to be gleaned.
  This is no surprise. AI slop is called slop for a reason. It is basically just spam-slop. The whole term "Artificial Intelligence" has always been a misnomer from the get-go, stealing from biological systems without understanding them, let alone being able to re-create them via non-biological means. Even synthetic biology, as cool as it is, has huge limitations, e.g. leaky promoters (or CRISPR-Cas off-target cleavage, which is a major reason why gene therapy isn't yet there, despite the occasional promo article of how xyz has been totally cured forever).
  What I don't understand is that people can find it useful. I understand some of the rationale, but I find AI slop just aims to steal my time. I cannot tolerate this.
- neonstatic 2 hours ago
  That's an interesting observation. For me the main takeaway is still the style.
  (bigheading)The takeaway(/bigheading) The style? Terrible.
- threatripper 2 hours ago
  Sorry, but this sounds exactly like a greentext you can read on 4claw. Are you a real human?
- roenxi 1 hour ago
  I think it might be a training artefact of some sort - the current crops of LLMs have never been in a position where they can explore the world as an independent existence and so they might be struggling to model how to explain an interesting experience? The Go AIs had problems with ladders of all things (one of the most basic beginner shapes) back in the early superhuman phases after Alphago. There seems to be some similar and profound gap in the LLM understanding of how to communicate when storytelling.
  The "But France was running two clocks at once" paragraph really set me off because I get the feeling something really interesting might be happening that the AI doesn't want to talk about and there is evidence that it is trying to say something. But the result is some amount of gibberish and some amount of vague allusion to something interesting in the prompt context while glossing over all the information that might matter while working hard to create an evocative feeling that isn't interesting. A tense atmosphere with no exploration of why there is tension.
tgv 43 minutes ago
> CivBench is one small attempt to measure it, nowhere near the whole answer, but I'd rather measure the right thing badly than the wrong thing perfectly.
Yet the benchmark is Civilization VI, which consists of extremely coarse, human written rules with the explicit goal of keeping players busy. Basically, a waste of time, money, water, and CO2.
pjc50 2 hours ago
> I now work with governments around the world at the Tony Blair Institute, which means I spend a lot of time in rooms where people ask the same question: what can we actually trust these systems to do?
Oh no - we're going to end up with the Starmerbot 3000.
Now I've got the joke out of the way, there's at least four interesting lines of inquiry one could take with this blog post:
- teaching the AI how to play Civilization
- to what extent does this result in "transferable skills", either AI or human? Is this the right game (qv SimCity etc)?
- issues of visibility; "seeing like a state" becomes very literal here. The AI can only make decisions on things it knows about. What are the limits of that when trying to do politics only from statistical information? Should we be referencing Stafford Beer here?
- (at the risk of tripping your AI detector here): modern politics is not so much left vs right as "technocratic wonk" vs "blood and soil". The wonks have comprehensively lost in public opinion. Creating a better wonk is not going to help until there is demand for that kind of politics.
If there ever is a US-China war, it will not be in search of more victory points to meet a win condition, it will be like the Russia-Ukraine war: one guy (on either side!) decides to make hundreds of millions of people worse off out of sheer greed.
- xpct 1 hour ago
  I very much dislike the idea of teaching the robot to play Civilization and expect those skills to transfer to their advisory nature.
  If anything, I'd almost prefer a leader who hasn't played Civilization in their life. Goes without saying that a mature leader could tell these apart, but in this day and age, I'm not so sure whether everyone could.
- NooneAtAll3 51 minutes ago
  > one guy (on either side!) decides to make hundreds of millions of people worse off out of sheer greed.
  I think you greatly low-ball how complex the situation is there
- Planktonne 2 hours ago
  > "technocratic wonk" vs "blood and soil"
  This is not a binary; it's the same people on the same side.
  - pjc50 2 hours ago
    No, it very much isn't, although obviously the Kissingers of the world want to pretend that they're in the first category of clear-eyed utility maximising rationalists while they're actually in the second.
    That doesn't mean that rational policy planning has never been a thing. The EU while imperfect and frustrating is explicitly orientated towards technocratic consensus rather than the mid-20th-century Europe of nationalist mass murder. Only a tiny number of people think that Von der Leyen and Hitler are equivalent.
    (or rather, if you think technocrats and blood-and-soil are the same side, what do you call the "other" side?)
    - Planktonne 2 hours ago
      I think we're talking at cross-purposes here. I wouldn't describe the EU as technocratic at all; I'd reserve that label for the people who self-describe as the logical ones--"clear-eyed utility maximising rationalists" as you say--while pushing endlessly for more technology, less regulation and (pretty consistently) hawkish and nationalistic policies. That's very much not the EU.
      I don't disagree that there are different approaches in conflict, but the binary of forward-looking technologists vs backward-looking nationalists is very out-of-date.
      - pjc50 2 hours ago
        Right, yes I think this is just a confusion caused by my use of "technocrat". I've always used it for the technologically assisted bureaucracy, the tendency to view the economy as a cybernetics problem that can be solved by PID control (like inflation targeting). Thiel et al are more "techbro" than "technocrat". Crucially they operate outside of regular politics - they're not running for office, they're not part of the civil service (apart from the brief terrible conflagration of DOGE, an explicit Stalinist purge of old school technocrats)
- ahartmetz 2 hours ago
  "Tony Blair Institute" fits right into the "x word horror" Xitter genre. Funded by Larry Ellison to boot!
  Tony Blair is the guy who found success by making the UK's left-leaning party (much) more neoliberal and was promptly imitated by Gerhard Schröder in Germany doing basically the same thing. Schröder is also BFF with Putin.
davedx 37 minutes ago
The way it failed to maintain its strategy, or even its build plans, makes me wonder if this is something that could be solved via the attention mechanism itself?
Instead of only using attention to focus on the previous token position, could it also do some kind of higher order "temporal attention" planning where it weighs each previous log (game state + intent) checkpoint when generating outputs?
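For what it's worth, standard attention already weighs all earlier token positions, so the idea above would amount to running the same scoring over a handful of summary vectors, one per logged checkpoint. A minimal sketch of that scoring, assuming each checkpoint has already been turned into a fixed-length vector somehow (the function names and the vectors are hypothetical, nothing here comes from the article's setup):

```python
import math


def attention_weights(query, keys):
    """Softmax of scaled dot products between one query and each key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_checkpoints(query, checkpoints):
    """Weighted average of checkpoint vectors, weighted by attention to `query`.

    Each checkpoint vector serves as both key and value here,
    which is a simplification made for this sketch.
    """
    weights = attention_weights(query, checkpoints)
    dim = len(checkpoints[0])
    return [sum(w * cp[i] for w, cp in zip(weights, checkpoints)) for i in range(dim)]
```

Whether a model could learn useful checkpoint vectors in the first place is the open question; this only shows the weighting step.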
mrmarket 2 hours ago
why have a blog if you're going to just use AI for everything? at that point, just do twitter threads or something. that way you can tweet out whatever you prompted the model with. if you're not suited for long-form writing that's fine, just use a medium that favors short-form writing.
NoLinkToMe 1 hour ago
Quite annoying to have to read a paragraph of text next to a moving image. I right-clicked every GIF and turned off 'loop'.
Beyond that reading an AI piece just feels like a waste of time. The text goes on and on without making a point, or getting to an actual learning. It just delineates the AI's limitations, doesn't go into whether these can be fixed, are innate, or what conclusions you can draw from it, over and over with example after example but no point.
Mostly it seems to keep repeating that the AI has the correct analysis but just doesn't execute. The AI knows to build X and logs this in each of its turns, yet doesn't build it. It's like there's some API connection missing between analysis and execution, and the author turns this into a 10-page article.
The article ends with some weird question to the AI asking if it enjoys the games, and you get some quasi-scifi mumbo jumbo answer back that looks very profound to, say, my mom, but is just silly to post if you know what the LLM is doing: predicting the next word. Honestly this is a poor article and I wish it wasn't posted.
dwroberts 1 hour ago
> I asked the agent what this was actually like for it. It wrote back
Stuff like this just makes the author seem clueless. What is even the function of putting a question like that into an LLM unless you’re already hopelessly in anthropomorphic territory
indigovole 2 hours ago
Even with his context-tracking mechanism, the gameplay failures sound like running out of context in the late game, especially the frequent failures of the "check for opponent win conditions every 20 moves." Wondering how much info about the game win state gets captured in the game digests, and how much he could improve the gameplay even with the MCP limitations by focusing there.
- jetbalsa 2 hours ago
  I also noticed they were not using XML for game state output. From what I understand, most LLMs still benefit from having outputs like this put into XML tags.
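  A minimal sketch of what wrapping a game-state digest in XML tags could look like. The function name, the tag names, and the sample digest text are all made up for illustration and are not taken from the article's MCP server:

```python
from xml.sax.saxutils import escape


def wrap_game_state(turn, sections):
    """Wrap each named section of a game-state digest in its own XML tag.

    `sections` maps a tag name (for example "cities") to free-form text.
    The tag names are illustrative assumptions, not the article's format.
    """
    parts = [f'<game_state turn="{int(turn)}">']
    for name, text in sections.items():
        # Escape &, < and > so the digest text cannot break the markup.
        parts.append(f"  <{name}>{escape(str(text))}</{name}>")
    parts.append("</game_state>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(wrap_game_state(120, {
        "cities": "Paris (pop 9) & Lyon (pop 6)",
        "rivals": "Rome is close to a science victory",
    }))
```

  The point would only be to give the model unambiguous section boundaries; whether it actually helps with this particular harness is untested.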
darkwi11ow 1 hour ago
LLMs are really bad at abstract strategy games like chess, go or civilization. Their ability to excel at broad reasoning is what is limiting them in games that have narrow rule-sets but steep learning curve.
Mikhail_K 2 hours ago
> It had one option left. It built two nuclear devices and levelled Toulouse.
Of course it did, its designer worked for the Tony Blair Institute.
- Oarch 55 minutes ago
  It just really didn't want To Louse the game
majorbugger 3 hours ago
> Somewhere in the first game, between a bug fix and a strategy note, I asked the agent what this was actually like for it
Yeah because LLM "experiences" the game
- fragmede 2 hours ago
  What word would you use instead?
teekert 2 hours ago
Well, the weird thing with nukes is that deterrence only works if you are 100% ready to use them. When the time comes though it would certainly be nice if it turned out to be below 100%.
What is winning? Are we a collective or are we individuals?
Likely the AI did not get the assignment that "Whatever happens, humans as a race must survive."
- throwawayqqq11 2 hours ago
  I'm sure there are some billionaires to find, that finally care about the survival of the white race. /s
  - teekert 54 minutes ago
    Probably
```
    [f"I'm sure there are some {race} billionaires to find, that finally care about the survival of the {race}." for race in all_races]
```
dspillett 1 hour ago
Did no one think of offering it a nice game of chess?
voidUpdate 3 hours ago
Well this looks like a perfect example of why an LLM should never make any governmental decisions ever
j5dgx76 3 hours ago
> Tony Blair Institute
Okay carry on.
- BoxOfRain 2 hours ago
  There's something so uncanny about the mismatch between the regard in which Blair is generally held by British people and the regard in which he seems to hold himself.
  If I were him I'd have retired from public life and kept a very low profile after Iraq, and everything else for that matter. He doesn't seem to realise that his modern interventions alienate everyone; even Alastair Campbell of all people seemed uncomfortable with the degree to which he uncritically sings the praises of people like Larry Ellison recently.
- orthoxerox 2 hours ago
  Chumbawamba made me unable to take anything associated with him seriously.
  - petesergeant 2 hours ago
    He was arguably the most successful UK PM of the last 50 years.
    - pjc50 1 hour ago
      I think I could agree with that, until the Iraq war.
      - petesergeant 1 hour ago
        A pretty huge stain on what’s otherwise an exceptional record, though.
    - Obscurity4340 2 hours ago
      By what metric(s)?
      - ahartmetz 2 hours ago
        I could see him winning at personal financial success
      - petesergeant 1 hour ago
        Are you being obtuse, or you genuinely don’t know?
phyalow 1 hour ago
Ai;dr
ForHackernews 3 hours ago
Kind of grim that this level of analysis is informing UK government policy. Repeatedly, the AI doesn't have the information or access needed through his hacky vibe-coded MCP, and instead of abandoning his flawed artificial test scenario (or fixing it — finding or building a better one) he gives it a name "The sensorium effect" and treats this as some brilliant insight.
Both humans and AI struggle to make sound choices when presented with incomplete or misleading information. This is not a new revelation: https://en.wikipedia.org/wiki/There_are_unknown_unknowns
[-]
- NoLinkToMe 1 hour ago
  Exactly this, he should've just fixed this, or not written an article about it.
  After the 'sensorium effect' (he should've used ancient greek for a +10 bonus to archaic intellectual points), he describes the 'knowledge-doing gap'. i.e. the AI reasons it needs to build X, logs this for 110 turns in a row, but doesn't do it. It doesn't actually specify why not, and whether it is again a limitation of his MCP implementation. If the AI articulates it must do it like the author says, but decides not to, either it doesn't think it must do it, or it does think it must but somehow can't technically execute its own decisions, it can't be anything else.
  In fact, in the context of 'advising the UK government', this 'knowledge-doing gap', which I assume is a technical limitation, is entirely moot. For the cost of 0.00001% of the UK's government you could just hire a human being to execute that which the AI articulates. I'm curious what the results would be if he just did a manual execution of the AI's articulated actions.
  The fact he doesn't go into this but just keeps repeating examples of it makes it a pointless article.
- pjc50 2 hours ago
  > he gives it a name "The sensorium effect" and treats this as some brilliant insight
  And of course is unaware of prior work in this area!
  https://en.wikipedia.org/wiki/Seeing_Like_a_State / https://en.wikipedia.org/wiki/Project_Cybersyn
- raincole 2 hours ago
  > he gives it a name
  It gives it a name. It would be quite surprising if he bothered to come up with this name himself when the whole article is obviously AI written.
anygivnthursday 2 hours ago
I have a hard time reading slop, but I like the game and wanted to know how it worked, so I fought my way through and only skipped the very last part. The issue the author calls out is classic Claude (I don't really use other LLMs to compare); probably all of us have experienced using Claude Code when it gets so focused on one thing it misses the forest for the trees. It happens often: even if it does verify something and the check shows something is wrong, it sometimes rationalizes it and explains it away when it does not fit its model.
Havoc 2 hours ago
Guessing it has a fair bit of civilisation and similar war games in its training data
StrauXX 3 hours ago
This reads to me mostly like the MCP server has many bugs, rather than inherent model weaknesses.
blitzar 2 hours ago
They should have built the Strait of Hormuz ... easy victory then.
joxdosba 2 hours ago
Posting meaningless AI generated nonsense as original text paints a very damning picture of the intellectual abilities of the person behind this blog.
And doing so without a giant [SLOP WARNING] at the top is an asshole move, a decent person would never do so.
jmyeet 2 hours ago
Computer game studios love player vs player ("pvp") games. Why? Because user-generated content is cheap and the ideal goal is an endless loop of players coming back. This is the motivating factor behind games like Call of Duty, Battlefield, Fortnite, etc.
MMORPG publishers keep trying to do this as well. World of Warcraft has spent 20 years trying to push open world pvp. Every WoW challenger has always claimed they would have the best pvp ever. They want that cheap, endless gameplay loop. But it never works. Open world pvp turns into ganking (i.e. killing much weaker players by ambushing them and/or ganging up on people). The ganked end up leaving the game in droves. Games try to balance this out by "punishing" gankers with reputation hits or not being able to go to town or whatever. And none of those disincentives work.
The reason pvp doesn't work in a persistent world like an MMORPG is because there are no stakes. If you die, you just come back to life or make a new character. Obviously real life doesn't work that way.
I really wonder if that's the problem with AIs going off the rails and committing heinous crimes in their sandboxes (like nuking Toulouse here). The AI just has no sense of self or self-preservation. There's also empathy. The AI can't see itself as a potential victim of nuclear war and understand all that entails.
- smw 2 hours ago
  > The reason pvp doesn't work in a persistent world like an MMORPG is because there are no stakes.
  See Eve Online
Planktonne 3 hours ago
Another article about how it's dangerous to trust AI, written by AI. I don't understand how people don't realise how much this undermines the message.
- jagged-chisel 2 hours ago
  Undermines. Underscores.
  Matters of perspective.
- petesergeant 2 hours ago
  > how much this undermines the message
  It didn’t undermine it for me.
  - Planktonne 2 hours ago
    I'm not talking about perception of the message, which will vary with the reader, but about sincerity of the message, which is determined by the writer.
dude250711 3 hours ago
Do we have to surround a fancy predictive autocomplete with AI mysticism?
alper 3 hours ago
"Global Thermonuclear War"
zkmon 3 hours ago
[flagged]
- Hugsbox 1 hour ago
  Could you elaborate on the racial mixing point? I'm not quite sure what you mean
- shanehoban 2 hours ago
  Yeah this is my line of thinking too - of course it made a nuke, humans have made an insane amount of nukes, and used them too. LLMs, given the ability, will do what we have done in the past at some point, it's kind of all they know!
- mapleoin 2 hours ago
  Sorry, how is obesity similar to racial mixing?
  - ForHackernews 2 hours ago
    Both fat people and black people offend our dear @zkmon's refined sensibilities as a 21st century race scientist.