The Permission Slip

(cringely.com)

14 points | by B1FF_PSUVM 2 days ago

6 comments

  • atleastoptimal 1 hour ago
    I think the use of the word "hallucination" with respect to AI confidently making errors has led a lot of people astray, including the author.

    He claims that his company has "solved" hallucination by creating a verifiable fact-finding system, which is like saying that a person has solved plane crashes by creating a plane that never leaves the ground.

    When an LLM says something incorrect, it is often because that LLM has reached the limits of its abilities, but it doesn't "know" (for lack of a better term) what being wrong feels like, so it will try its best to fit the information it has into a compelling story. The reason why scaling leads to fewer hallucinations is that the model can hold more abstractions and more facts about the world; it can work through the complex, vague machinery of reason with more scaffolding, and with more of a buffer (via its weights) to reason with nuance. This is why LLMs are useful: not because they can be fed into a fact-retrieval system, but because they can produce new information via the association of things they know.

    The point is, we want LLMs to actually produce new information and work things out via their thinking, not be limited to citing facts that already exist and avoid veering into the limits of their abilities. In that sense hallucination is really just exposing the limits of scale, which would necessitate scaling models further.

    Scaling is the only way we have gotten to this interesting, emergent property of LLMs. Further, the best way to make small models which don't hallucinate (that we've found so far) is to train a big model first, then distill it, or use it as a teacher to a smaller model. Either way, pursuing scale is the most defensible strategy, and a more robust solution to hallucination.

    • decimalenough 1 hour ago
      > it can work through the complex, vague machinery of reason with more scaffolding

      No, it can hold more floating point numbers.

      I'm not an expert in the field, but I've yet to see a solid rebuttal to this paper:

      https://arxiv.org/abs/2401.11817

      • atleastoptimal 48 minutes ago
        A claim that LLMs can in a theoretical sense be 100% accurate all the time is not the same as the claim that scaling models with more compute/params will reduce hallucination. The former is a far stronger claim and I agree with the paper in that it probably isn't the case, but we don't rely on general reasoners (a.k.a. humans) to be 100% accurate all the time either.

        > No, it can hold more floating point numbers.

        Fallacy of composition. Just because an LLM is made up of floating point numbers doesn't mean its capabilities are limited to that of bare floating point numbers, in the same way that the individual faculties of a neuron don't preclude the human brain from emergent properties born from the synthesis of its synapses.

      • evrydayhustling 1 hour ago
        That paper shows hallucinations can't be eliminated, due to approximation error. But it is completely compatible with hallucination becoming less probable as scale reduces that approximation error.
  • Animats 46 minutes ago
    "The hallucination problem is the difference between a clever toy and a system a hospital or a bank or a court can actually rely on. It is the whole ballgame for enterprise AI."

    This is the big problem with "agentic" AI. If you let the AI system do anything important, it's going to screw up reasonably often, and screw up in an expensive way occasionally. The usual solution to this is to make the errors an externality - dump them on the consumer-grade end user or an employee. As Google Search puts it, at the end of each result, "AI can make mistakes, so double-check responses".

    External checking, which Cringely is pushing, has potential for search-type systems. It's not likely to help when there's no one source text that can be used as an authority for checking. It's not likely to help with systems that actually do something.

    How's end to end neural net driving working out?

  • JSR_FDED 2 days ago
    If “scale will solve everything”, even (as the article contends) things that could be solved more cheaply in other ways, that’s of course wasteful and inefficient.

    But what about things that only scale can achieve? Like the superhuman security vulnerability assessment capabilities that Fable showed? That would be a reason to continue to spend, wouldn’t it?

    • thewebguyd 30 minutes ago
      Do we know that Fable/Mythos was the result of throwing more hardware & data at it? Anthropic is still pretty compute constrained. More does not always equal better; it could very well be that Fable/Mythos came as a result of better data curation or some other breakthrough, not necessarily more parameters & compute.

      I don't think "just throw more compute at it forever" is the only way to go, but if that turns out to be true, the labs aren't going to share that knowledge, because it would put at risk the dump trucks of cash getting dumped at their feet. If they came out and said "You know, we don't really need much more compute, we found a better way to make a smarter model", the cash would slow down.

    • B1FF_PSUVM 2 days ago
      > scale will solve

      I have a bad feeling about this, and it's about us, not AIs ...

      (I fear that we're #$&@%!## most of the time, and just oblivious about it)

  • ashley95 1 hour ago
    > The hallucination problem is the difference between a clever toy and a system a hospital or a bank or a court can actually rely on. It is the whole ballgame for enterprise AI.

    It... isn't? Hallucinations are surely a limitation of LLMs, but I haven't heard people worrying about it as some kind of existential question for a long time. You accept it's a non-deterministic system. You build appropriate safeguards or deterministic checks around it. And you accept it's not perfect, there will be occasional mistakes. No large enough organization can claim determinism for any sufficiently large system.

  • applfanboysbgon 43 minutes ago
    My favourite genre, incoherent Claude-generated slop about how Claude is wrong, with an appetizer of self-aggrandizement to boot.
  • vintagedave 1 hour ago
    [flagged]