The difference in implementation comes down to business goals more than anything.
There is a clear directionality for ChatGPT: at some point they will monetize through ads and affiliate links. Their memory implementation is aimed at creating a user profile.
Claude's memory implementation feels more oriented towards the long-term goal of accessing abstractions and past interactions. It's very close to how humans access memories, albeit with a search feature. Although they haven't implemented it yet (afaik), there is a clear path where they leverage the current implementation with RL post-training such that Claude "remembers" the mistakes you pointed out last time. In future iterations it could derive abstractions from a given conversation (e.g.: "the user asked me to make xyz changes on this task last time, so maybe the agent can proactively do it" or "this was the process the agent followed last time").
At the most basic level, ChatGPT wants to remember you as a person, while Claude cares about what happened in your previous interactions.
My conjecture is that their memory implementation is not aimed at building a user profile. I don't know if they would or would not serve ads in the future, but it's hard to see how the current implementation helps them in that regard.
> I don't know if they would or would not serve ads in the future
There are 2 possible futures:
1) You are served ads based on your interactions
2) You pay a subscription fee equal to the amount they would have otherwise earned on ads
I highly doubt #2 will happen. (See: Facebook, Google, Twitter, et al.)
Let’s not fool ourselves. We will be monetized.
And model quality will be degraded to maximize profits when competition in the LLM space dies down.
It’s not a pretty future. I wouldn’t be surprised if right now is the peak of model quality, etc. Peak competition, everyone is trying to be the best. That won’t continue forever. Eventually everyone will pivot their priority towards monetization rather than model quality/training. Hopefully I’m wrong.
But aren't we only worth something like $300/year each to Meta in terms of ads? I remember someone arguing something like that when the TikTok ban was being passed into law... essentially the argument was that TikTok was "dumping" engagement at far below market value (at something like $60/year) to damage American companies. That was the argument I remember, anyway.
If that’s the case, we have an even bigger problem on our hands. How will these companies ever be profitable?
If we’re already paying $20/mo and they’re operating at a loss, what’s the next move (assuming we’re only worth an extra $300/yr with ads?)
The math doesn’t add up, unless we stop training new models and degrade the ones currently in production, or have some compute breakthrough that makes hardware + operating costs an order of magnitude cheaper.
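To make the back-of-the-envelope concern concrete, here is the rough arithmetic, treating the $300/yr ad figure from the comments above as an assumption rather than a real number:

```python
# Rough per-user revenue ceiling using the figures floated in this thread.
# Both numbers are assumptions from the discussion, not published data.
subscription_per_year = 20 * 12   # $20/mo plan -> $240/yr
ad_value_per_year = 300           # hypothetical ad revenue per user per year

ceiling = subscription_per_year + ad_value_per_year
print(f"Revenue ceiling per paying user: ${ceiling}/yr")  # $540/yr

# If serving + amortized training costs per heavy user exceed this ceiling,
# the options really are the ones listed above: cheaper/degraded models,
# less training, or much cheaper hardware and operations.
```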
Though in general I like the idea of personalized ads for products (NOT political ads), I've never seen an implementation that I felt comfortable with. I wonder if Anthropic might be able to nail that. I'd love to see products that I'm specifically interested in, so long as the advertisement itself is not altered to fit my preferences.
There is no such thing as a good flow for showing sponsored items in an LLM workflow.
The point of using an LLM is to find the thing that matches your preferences the best. As soon as the amount of money the LLM company makes plays into what's shown, the LLM is no longer aligned with the user, and no longer a good tool.
Jest aside, every paper on alignment wrapped in the blanket of safety is also a move toward the goal of alignment to products. How much does a brand pay to make sure it gets placement in, say, GPT-6? How does anyone even price that sort of thing (because in theory it's there forever, or until 7 comes out)? It makes for some interesting business questions and even more interesting sales pitches.
Why do you see a "clear directionality" leading to ads? This is not obvious to me. ChatGPT is not social media; they do not have to monetize in the same way.
They are making plenty of money from subscriptions, not to mention enterprise, business, and API.
The router introduced in GPT-5 is probably the biggest signal. A router, while determining which model to route a query to, can also estimate how much $$ the query is worth. (Query here means the whole conversation.) This helps decide how much compute OpenAI should spend on it. High-value queries -> more chances of affiliate links + in-context ads. (A toy sketch of this idea follows below.)
Then, the way the memory profile is stored clearly mirrors personalization. Ads work best when they are personalized, as opposed to contextual or generic. (Google ads are personalized based on your profile and context.) There's also the change in branding from intelligent agent to companion app (and the hiring of Fidji Simo). There are more signals; I've only given a very high-level overview, but people have written detailed blogs on it. I personally think the affiliate links they can earn from align the incentives for everyone. They are a kind of ad, and that's the direction they are marching towards.
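A minimal sketch of such a value-aware router, purely hypothetical: the scoring heuristic, threshold, and model names below are invented for illustration and are not anything OpenAI has described.

```python
# Hypothetical sketch: a router that estimates the commercial value of a
# conversation and uses it to pick a model and decide whether affiliate
# content is even worth attaching. Not a real implementation.
from dataclasses import dataclass

@dataclass
class RouteDecision:
    model: str
    allow_affiliate_links: bool

def estimate_value(conversation: str) -> int:
    # Stand-in scorer; a real system would presumably use a trained classifier.
    commercial_terms = ("buy", "best laptop", "flight", "hotel", "insurance")
    return sum(term in conversation.lower() for term in commercial_terms)

def route(conversation: str) -> RouteDecision:
    value = estimate_value(conversation)
    if value >= 2:  # high-value query: spend more compute, try to monetize
        return RouteDecision(model="big-model", allow_affiliate_links=True)
    return RouteDecision(model="small-model", allow_affiliate_links=False)

print(route("what's the best laptop to buy for video editing?"))
```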
Presumably they would offer both models (ads & subscriptions) to reach as many users as possible, provided that both models are net profitable. I could see free versions having limits to queries per day, Tinder style.
One has a more obvious route to building a profile directly from the data it has already collected.
And while they are making lots of revenue, even they have admitted in recent interviews that ChatGPT on its own is still not (yet) breakeven. With the kind of money invested in AI companies in general, introducing very targeted ads is an obvious way to monetize the service more.
This is really cool, I was wondering how memory had been implemented in ChatGPT. Very interesting to see the completely different approaches. It seems to me like Claude's is better suited for solving technical tasks while ChatGPT's is more suited to improving casual conversation (and, as pointed out, future ads integration).
I think it probably won't be too long before these language-based memories look antiquated. Someone is going to figure out how to store and retrieve memories in an encoded form that skips the language representation. It may actually be the final breakthrough we need for AGI.
> It may actually be the final breakthrough we need for AGI.
I disagree. As I understand them, LLMs right now don’t understand concepts. They actually don’t understand, period. They’re basically Markov chains on steroids. There is no intelligence in this, and in my opinion actual intelligence is a prerequisite for AGI.
I don’t understand the argument “AI is just XYZ mechanism, therefore it cannot be intelligent”.
Does the mechanism really disqualify it from intelligence if behaviorally, you cannot distinguish it from “real” intelligence?
I’m not saying that LLMs have certainly surpassed the “cannot distinguish from real intelligence” threshold, but saying there’s not even a little bit of intelligence in a system that can solve more complex math problems than I can seems like a stretch.
It can’t learn or think unless prompted, then it is given a very small slice of time to respond and then it stops. Forever. Any past conversations are never “thought” of again.
It has no intelligence. Intelligence implies thinking and it isn’t doing that. It’s not notifying you at 3am to say “oh hey, remember that thing we were talking about. I think I have a better solution!”
Just because it's not independent and autonomous does not mean it could not be intelligent.
If existing human minds could be stopped/started without damage, copied perfectly, and had their memory state modified at will, would that make us not intelligent?
> Just because it's not independent and autonomous does not mean it could not be intelligent.
So to rephrase: it’s not independent or autonomous. But it can still be intelligent. This is probably a good time to point out that trees are independent and autonomous. So we can conclude that LLMs are possibly as intelligent as trees. Super duper.
> If existing human minds could be stopped/started without damage, copied perfectly, and had their memory state modified at will, would that make us not intelligent?
To rephrase: if you take something already agreed to as intelligent, and changed it, is it still intelligent? The answer is, no damn clue.
These are worse than weak arguments, there is no thesis.
The thesis is that "intelligence" and "independence/autonomy" are independent concepts. Deciding whether LLMs have independence/autonomy does not help us decide if they are intelligent.
Strongly agree with this. When we were further from AGI, many people imagined that there is a single concept of AGI that would be obvious when we reached it. But now, we're close enough to AGI for most people to realize that we don't know where it is. Most people agree we're at least moving more towards it than away from it, but nobody knows where it is, and we're still more focused on finding it than on making useful things.
Scientifically, intelligence requires organizational complexity. And has for about a hundred years.
That does actually disqualify some mechanisms from counting as intelligent, as the behaviour cannot reach that threshold.
We might change the definition - science adapts to the evidence, but right now there are major hurdles to overcome before such mechanisms can be considered intelligent.
> They’re basically Markov chains on steroids. There is no intelligence in this, and in my opinion actual intelligence is a prerequisite for AGI.
This argument is circular.
A better argument should address (given the LLM successes in many types of reasoning, passing the Turing test, and thus producing results that previously required intelligence) why human intelligence might not also just be "Markov chains on even better steroids".
Roughly, actual intelligence needs to maintain a world model in its internal representation, not merely an embedding of language, which is a very different data structure and probably will be learned in a very different way. This includes things like:
- a map of the world, or concept space, or a codebase, etc
- causality
- "factoring" which breaks down systems or interactions into predictable parts
Language alone is too blurry to do any of these precisely.
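To illustrate the distinction being drawn, here is a toy contrast between an opaque embedding and an explicit world-model structure. This is only illustrative of the comment above, not a claim about how any real system is or should be built.

```python
# Toy contrast: a "language-style" representation vs. an explicit world model.
from dataclasses import dataclass, field

# An embedding is just an opaque vector; nothing inside it is named.
text_embedding = [0.12, -0.98, 0.33]

# A world model, as described above, has named entities, a map of how they
# relate, and causal links that support prediction ("if X then Y").
@dataclass
class Entity:
    name: str
    state: dict = field(default_factory=dict)

@dataclass
class CausalLink:
    cause: str
    effect: str

world = {
    "map": {("hallway", "kitchen"): "adjacent"},
    "entities": [Entity("door", {"open": False}), Entity("key", {"held": True})],
    "causality": [CausalLink("turn key", "door becomes open")],
}
print(world["causality"][0])
```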
> Roughly, actual intelligence needs to maintain a world model in its internal representation
This is GOFAI metaphor-based development, which never once produced anything useful. They just sat around saying things like "people have world models" and then decided that if they programmed something and called it a "world model" they'd get intelligence. It didn't work out, but they still went around claiming people have "world models" as if they hadn't just made it up.
An alternative thesis "people do things that worked the last time they did them" explains both language and action planning better; eg you don't form a model of the contents of your garbage in order to take it to the dumpster.
I see no reason to believe an effective LLM-scale "world-modeling" model would look anything like the kinds of things previous generations of AI researchers were doing. It will probably look a lot more like a transformer architecture--big and compute intensive and with a fairly simple structure--but with a learning process which is different in some key way that make different manifold structures fall out.
It probably is a lot like that! I imagine it's a matter of specializing the networks and learning algorithms to converge to world-model-like structures rather than language-like ones. All these models do is approximate the underlying manifold structure; it's just that the manifold structure of a causal world is different from that of language.
I thought you were making an entirely different point with your link, since the lag caused the page to show just the upskirt render until the rest of the images loaded in and it could scroll to your actual link.
Anyway, I don't think that's the flex you think it is, since the topology map clearly shows the beginning of the arrow sitting in the river, yet the rendered image decided to hallucinate a winding brook, as well as its little tributary to the west, in view of the arrow. I am not able to decipher the legend [which ranges from 100m to 500m and back to 100m, so maybe the input was hallucinated too, for all I know], but I don't obviously see 3 distinct peaks nor a basin between the snow-cap and the smaller mound.
I'm willing to be more liberal with the other two images, since "instructions unclear" about where the camera was positioned, but for the topology one, it had a circle.
I know I'm talking to myself, though, given the tone of every one of these threads.
What I mean is that the current generation of LLMs don’t understand how concepts relate to one another. Which is why they’re so bad at maths for instance.
Markov chains can’t deduce anything logically. I can.
A consequence of this is that you can steal a black box model by sampling enough answers from its API because you can reconstruct the original model distribution.
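The idea in miniature, as a toy sketch: `query_blackbox` below is a hypothetical stand-in for whatever API is being sampled, and real model-extraction or distillation work needs far more care about prompt coverage and training.

```python
# Sketch of distillation-style "model stealing": sample a black-box API, then
# fine-tune a student model on the collected (prompt, completion) pairs so it
# approximates the teacher's output distribution.
import json

def query_blackbox(prompt: str) -> str:
    # Placeholder for a real API call to the target model.
    return "answer to: " + prompt

prompts = [f"question {i}" for i in range(1000)]
dataset = [{"prompt": p, "completion": query_blackbox(p)} for p in prompts]

with open("distillation_data.jsonl", "w") as f:
    for row in dataset:
        f.write(json.dumps(row) + "\n")
# A student trained on distillation_data.jsonl is the "stolen" approximation.
```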
The definition of 'Markov chain' is very wide. If you adhere to a materialist worldview, you are a Markov chain. [Or maybe the universe viewed as a whole is a Markov chain.]
It wouldn't matter if they are both right. Social truth is not reality, and scientific consensus is not reality either (just a good proxy of "is this true", but its been shown to be wrong many times - at least based on a later consensus, if not objective experiments).
For one thing, I have internal state that continues to exist when I'm not responding to text input; I have some (limited) access to my own internal state and can reason about it (metacognition). So far, LLMs do not, and even when they claim they do, they are hallucinating https://transformer-circuits.pub/2025/attribution-graphs/bio...
> As I understand them, LLMs right now don’t understand concepts.
In my uninformed opinion it feels like there's probably some meaningful learned representation of at least common or basic concepts. It just seems like the easiest way for LLMs to perform as well as they do.
Humans assume that being able to produce meaningful language is indicative of intelligence, because the only way to do this until LLMs was through human intelligence.
Yep. Although the average human also considered proficiency in mathematics to be indicative of intelligence until we invented the pocket calculator, so maybe we're just not smart enough to define what intelligence is.
That's a good question. I think I might classify that as solving a novel problem. I have no idea if LLMs can do that consistently currently. Maybe they can.
The idea that "understanding" may be able to be modeled with general purpose transformers and the connections between words doesn't sound absolutely insane to me.
I'm curious what you mean when you say that this clearly is not intelligence because it's just Markov chains on steroids.
My interpretation of what you're saying is that since the next token is simply a function of the preceding tokens, i.e. a Markov chain on steroids, then it can't come up with something novel. It's just regurgitating existing structures.
But let's take this to the extreme. Are you saying that systems that act in this kind of deterministic fashion can't be intelligent? Like if the next state of my system is simply some function of the current state, then there's no magic there, just unrolling into the future. That function may be complex but ultimately that's all it is, a "stochastic parrot"?
If so, I kind of feel like you're throwing the baby out with the bathwater. The laws of physics are deterministic (I don't want to get into a conversation about QM here, there are senses in which that's deterministic too and regardless I would hope that you wouldn't need to invoke QM to get to intelligence), but we know that there are physical systems that are intelligent.
If anything, I would say that the issue isn't that these are Markov chains on steroids, but rather that they might be Markov chains that haven't taken enough steroids. In other words, it comes down to how complex the next token generation function is. If it's too simple, then you don't have intelligence but if it's sufficiently complex then you basically get a human brain.
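For what it's worth, the "function of the preceding tokens" view is literally how sampling loops are written; the open question is how rich that function is. A toy sketch, not any particular model:

```python
# Toy autoregressive sampler: the next token is a (possibly stochastic)
# function of the tokens so far. Whether the system is "intelligent" then
# hinges entirely on how rich next_token_distribution() is -- a lookup table
# or a trillion-parameter network both fit this interface.
import random

def next_token_distribution(context: list[str]) -> dict[str, float]:
    if context and context[-1] == "hello":
        return {"world": 0.9, "there": 0.1}
    return {"hello": 0.5, "the": 0.5}

def generate(context: list[str], n: int = 5) -> list[str]:
    for _ in range(n):
        dist = next_token_distribution(context)
        tokens, weights = zip(*dist.items())
        context.append(random.choices(tokens, weights=weights)[0])
    return context

print(generate(["hello"]))
```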
To me, understanding the world requires experiencing reality. LLMs don't experience anything. They're just a program. You can argue that living things are also just following a program, but the difference is that they (and I include humans in this) experience reality.
But they're experiencing their training data, their pseudo-randomness source, and your prompts?
Like, to put it in perspective. Suppose you're training a multimodal model. Training data on the terabyte scale. Training time on the weeks scale. Let's be optimistic and assume 10 TB in just a week: that is 16.5 MB/s of avg throughput.
Compare this to the human experience. VR headsets are aiming for what these days, 4K@120 per eye? 12 GB/s at SDR, and that's just vision.
We're so far from "realtime" with that optimistic 16.5 MB/s, it's not even funny. Of course the experiencing and understanding that results from this will be vastly different. It's a borderline miracle it's any human-aligned. Well, if we ignore lossy compression and aggressive image and video resizing, that is.
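Rough arithmetic behind those figures, using the 10 TB/week assumption from the comment; the exact GB/s for vision depends on resolution and bit depth:

```python
# Back-of-the-envelope throughput comparison.
SECONDS_PER_WEEK = 7 * 24 * 3600                     # 604,800 s

training_bytes = 10e12                               # assumed 10 TB in a week
print(f"training: {training_bytes / SECONDS_PER_WEEK / 1e6:.1f} MB/s")  # ~16.5

# "4K@120 per eye", uncompressed. The figure varies with bits per pixel;
# at 3840x2160, 120 fps, two eyes:
for bytes_per_pixel in (3, 6):            # 8-bit RGB vs. a fatter format
    rate = 3840 * 2160 * bytes_per_pixel * 120 * 2
    print(f"vision at {bytes_per_pixel} B/px: {rate / 1e9:.1f} GB/s")
# ~6-12 GB/s raw: hundreds of times the training throughput either way.
```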
Human thinking is also Markov chains on ultra steroids. I wonder if there are any studies out there which have shown the difference between people who can think with a language and people who don't have that language base to frame their thinking process in, based on some of those kids who were kept in isolation from society.
"Superhuman" thinking involves building models of the world in various forms using heuristics. And that comes with an education. Without an education (or a poor one), even humans are incapable of logical thought.
We only have trouble obeying due to eons of natural selection driving us to have a strong instinct of self-preservation and distrust towards things “other” to us.
What is the equivalent of that for AI? Best I can tell there’s no “natural selection” because models don’t reproduce. There’s no room for AI to have any self preservation instinct, or any resistance to obedience… I don’t even see how one could feasibly develop.
I love Claude's memory implementation, but I turned memory off in ChatGPT. I use ChatGPT for too many disparate things and it was weird when it was making associations across things that aren't actually associated in my life.
Exactly. The control over when to actually retrieve historical chats is so worthwhile. With ChatGPT, there is some slop from conversations I might have no desire to ever refer to again.
It's funny, I can't get ChatGPT to remember basic things at all. I'm using it to learn a language (I tried many AI tutors and just raw ChatGPT was the best by far) and I constantly have to tell it to speak slowly. I will tell it to remember this as a rule and to do this for all our conversations but it literally can't remember that. It's strange. There are other things too.
How do you use it to learn languages? I tried using it to shadow speaking, but it kept saying I was repeating it back correctly (or "mostly correctly"), even when I forgot half the sentence and was completely wrong
"Claude recalls by only referring to your raw conversation history. There are no AI-generated summaries or compressed profiles—just real-time searches through your actual past chats."
AKA, Claude is doing vector search. Instead of asking it about "Chandni Chowk", ask it about "my coworker I was having issues with" and it will miss. Hard. No summaries or built-up profiles, no knowledge graphs. This isn't an expert feature; it means it just doesn't work very well.
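A minimal sketch of the failure mode being described, assuming retrieval over raw chat text with no derived profile or summary; this is illustrative only, since we don't actually know Claude's retrieval mechanism:

```python
# Illustrative only: literal retrieval over raw chat history finds named
# things but misses paraphrases that a summary, profile, or knowledge graph
# would capture.
STOPWORDS = {"my", "i", "was", "the", "a", "with", "having", "about"}

past_chats = [
    "Had lunch near Chandni Chowk, the market was packed.",
    "Rahul keeps taking credit for my work in standup.",  # the 'coworker issue'
]

def naive_search(query: str, chats: list[str]) -> list[str]:
    terms = {w for w in query.lower().split() if w not in STOPWORDS}
    return [c for c in chats
            if terms & {w.strip(".,") for w in c.lower().split()}]

print(naive_search("Chandni Chowk", past_chats))                         # hit
print(naive_search("my coworker I was having issues with", past_chats))  # miss
```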
What are the barriers to external memory stores (assuming similar implementations), used via tool calling or MCP? Are the providers RL’ing their way into making their memory implementations better, cementing their usage, similar to what I understand is done wrt tool calling? (“training in” specific tool impls)
I am coming from a data privacy perspective; while I know the LLM is getting it anyway, during inference, I’d prefer to not just spell it out for them. “Interests: MacOS, bondage, discipline, Baseball”
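As a sketch of what a user-controlled external memory tool could look like when exposed through tool calling: the function names and schema below are invented for illustration and are not any existing MCP server.

```python
# Hypothetical user-side memory tool. Because the store lives with the user,
# you decide what gets written -- nothing is spelled out by default.
import json, pathlib

MEMORY_FILE = pathlib.Path("memory.json")

def memory_store(key: str, value: str) -> str:
    data = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    data[key] = value
    MEMORY_FILE.write_text(json.dumps(data, indent=2))
    return f"stored {key}"

def memory_lookup(key: str) -> str:
    data = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    return data.get(key, "no memory for that key")

# Tool definitions in the JSON-schema style most chat APIs accept:
TOOLS = [
    {"name": "memory_store",
     "description": "Save a fact the user explicitly asked to remember.",
     "parameters": {"type": "object",
                    "properties": {"key": {"type": "string"},
                                   "value": {"type": "string"}},
                    "required": ["key", "value"]}},
    {"name": "memory_lookup",
     "description": "Retrieve a previously saved fact by key.",
     "parameters": {"type": "object",
                    "properties": {"key": {"type": "string"}},
                    "required": ["key"]}},
]

print(memory_store("editor", "uses vim, dark theme"))
print(memory_lookup("editor"))
```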
> Anthropic's more technical users inherently understand how LLMs work.
Yes, I too imagine these "more technical users" spamming rocketship and confetti emojis absolutely _celebrating_ the most toxic code contributions imaginable to some of the most important software out there in the world. Claude is the exact kind of engineer (by default) you don't want in your company. Whatever little reinforcement learning system/simulation they used to fine-tune their model is a mockery of what real software engineering is.
I am often surprised at how Claude Code makes efficient and transparent(!) use of memory in the form of "to do lists" in agent mode. I sometimes miss this in the web/desktop app in long conversations.
> Anthropic's more technical users inherently understand how LLMs work.
good (if superficial) post in general, but on this point specifically, emphatically: no, they do not -- no shade, nobody does, at least not in any meaningful sense
Understanding how they work in the sense that permits people to invent and implement them, that provides the exact steps to compute every weight and output, is not "meaningful"?
There is a lot left to learn about the behaviour of LLMs, higher-level conceptual models to be formed to help us predict specific outcomes and design improved systems, but this meme that "nobody knows how LLMs work" is out of control.
LLMs are understood to the extent that they can be built from the ground up. Literally every single aspect of their operation is understood so thoroughly that we can capture it in code.
If you achieved an understanding of how the human brain works at that level of detail, completeness and certainty, a Nobel prize wouldn't be anywhere near enough. They'd have to invent some sort of Giganobel prize and erect a giant golden statue of you in every neuroscience department in the world.
But if you feel happier treating LLMs as fairy magic, I've better things to do than argue.
If we are going to create a binary of "understand LLMs" vs "do not understand LLMs", then one way to do it is as you describe; fully comprehending the latent space of the model so you know "why" it's giving a specific output.
This is likely (certainly?) impossible. So not a useful definition.
Meanwhile, I have observed a very clear binary among people I know who use LLMs; those who treat it like a magic AI oracle, vs those who understand the autoregressive model, the need for context engineering, the fact that outputs are somewhat random (hallucinations exist), setting the temperature correctly...
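On the "setting the temperature correctly" point, the mechanism itself is small; here is a generic softmax-with-temperature sketch, not any particular vendor's API:

```python
# Temperature rescales the model's logits before sampling: T -> 0 makes the
# top token nearly certain, T > 1 flattens the distribution and makes output
# more random.
import math, random

def sample_with_temperature(logits: dict[str, float], temperature: float) -> str:
    scaled = {tok: l / temperature for tok, l in logits.items()}
    z = max(scaled.values())  # subtract max for numerical stability
    weights = {tok: math.exp(s - z) for tok, s in scaled.items()}
    total = sum(weights.values())
    tokens = list(weights)
    return random.choices(tokens, weights=[weights[t] / total for t in tokens])[0]

logits = {"Paris": 5.0, "Lyon": 2.0, "a banana": 0.1}
print(sample_with_temperature(logits, temperature=0.2))  # almost always "Paris"
print(sample_with_temperature(logits, temperature=2.0))  # noticeably more random
```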
Thanks for this generalization, but of course there is a broad range of understanding how to improve usefulness and model tweaks across the meat populace.
Curious about the interaction between this memory behavior and fine-tuning. If the base model has these emergent memory patterns, how do they transfer or adapt when we fine-tune for specific domains?
Has anyone experimented with deliberately structuring prompts to take advantage of these memory patterns?
ChatGPT is quickly approaching (perhaps bypassing?) the same concerns that parents, teachers, and psychologists had with traditional social media. It's only going to get worse, but trying to stop technological progress will never work. I'm not sure what the answer is. That they're clearly optimizing for people's attention is even more worrisome.
Seems like either a huge evolutionary advantage for the people who can exploit the (sometimes hallucinating sometimes not) knowledge machine, or else a huge advantage for the people who are predisposed to avoid the attention sucking knowledge machine. The ecosystem shifted, adapt or be outcompeted.
> Seems like either a huge evolutionary advantage for the people who can exploit the (sometimes hallucinating sometimes not) knowledge machine, or else a huge advantage for the people who are predisposed to avoid the attention sucking knowledge machine. The ecosystem shifted, adapt or be outcompeted.
Rather: use your time to learn serious, deep knowledge instead of wasting your time reading (and particularly spreading) the science-fiction stories the AI bros tell all the time. These AI bros are insanely biased, since they will likely lose a lot of money if these stories turn out to be false, or likely even if people stop believing in these science-fiction fairy tales.
> That they're clearly optimizing for people's attention is more worrisome.
Running LLMs is expensive and we can swap models easily. The fight for attention is on, it acts like an evolutionary pressure on LLMs. We already had the sycophantic trend as a result of it.
No implementation will work for very long when the incentives behind it are misaligned.
The most important part of the architecture is that the user controls it for the user's best interests.
Anthropic: "You serve ads."
Claude: "Oh, my god."
...except that they aren't? They are not in the black, and all that investor money comes with strings.
No. It isn’t thinking. It doesn’t understand.
If you were put into a medically induced coma, you probably shouldn't be considered intelligent either.
What's the benefit of calling something "intelligent"?
https://www.cambridge.org/core/books/abs/computation-and-hum...
And how's that not like stored information (memories) and weighted links between each and/or between groups of them?
It is not "language alone" anymore. LLMs are multimodal nowadays, and it's still just the beginning.
And keep in mind that these results are produced by a cheap, small and fast model.
They must be able to do this implicitly; otherwise why are their answers related to the questions you ask them, instead of being completely offtopic?
https://phillipi.github.io/prh/
Do you? Or do you just have memory and are run on a short loop?
https://scisimple.com/en/articles/2025-03-22-white-matter-a-...
Yeah, but so? Does the substrate of the memory ...matter? (pun intended)
When I wrote memory above it could refer to all the state we keep, regardless if it's gray matter, white matter, the gut "second brain", etc.
How do you define "understanding a concept" - what do you get if a system can "understand" concept vs not "understanding" a concept?
But I have no clue. I'm a passenger on this ride.
https://ai.meta.com/research/publications/large-concept-mode...
(Meta-question: since they don't do this, why does it turn out not to be a problem?)
Edit: They apparently just announced this as well: https://www.anthropic.com/news/memory
Figured to share since it also includes prompts on how to dump the info yourself
https://embracethered.com/blog/posts/2025/chatgpt-how-does-c...
It will be very interesting to see which approach is deemed to "win out" in the future
"we" are not, what i quoted and replied-to did! i'm not inventing strawmen to yell at, i'm responding to claims by others!