> [...] he had intervened at forces that were deploying commercially available AI tools before they had been properly assessed [...] “All forces have got a good policy on the use of Copilot,” Murray said. “All forces will have a policy that says, ‘Check everything that it produces’.”
Not only are they using AI tools before they've properly assessed them, they also end up using Copilot which must be one of the worst AIs currently available, probably because of existing Microsoft relations. And on top of all that, they hope to be able to rely on "Please review the outputs" which obviously isn't an actual solution here, of course people will get complacent and throw stuff over the wall whenever they can.
> “All forces will have a policy that says, ‘Check everything that it produces’.”
Everyone I talk to (including outside of tech) is going through this phase at their companies. It’s not working.
Checking the output seems like a simple request, but the question becomes: Check against what? If the police are making a document that sources from another report that another officer used AI to produce from their notes which were also run through AI and on and on, an inconsistency that leaks in at a previous step will check out when someone reviews the output against the inputs.
We’re all also discovering that many people’s idea of reviewing the output is to skim it and verify that it looks convincing enough. Checking facts is hard and takes time. These people are using AI because they want to work less, not to give themselves extra work.
One can ask, what is a practical difference between “Check everything that it produces” and “Do all the work yourself”?
It’s not typing that’s the bottleneck, at least not often, so this is essentially assuming that you can do all the needed work without actually doing it, which is obviously wishful thinking.
This is definitely the most interesting question in a ton of AI applications. I think folks should really be spending a lot of time on figuring out how to deterministically check AI outputs in a reliable way, in order to reduce the amount of checking a human has to do, and on building tools that speed up the checking process.
Thinking about all of the fake citations in legal submissions that have come up of late, it seems pretty straightforward to set up a regex that captures all forms in which a cited case might be written (I could be wrong but I'd assume there's some standard variety of formats) and search those against a database (again assuming such a database exists) to ensure they all exist.
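A rough sketch of that existence check, to make the idea concrete. Everything here is an assumption for illustration: the pattern only covers a few UK-style neutral citations such as "[2020] UKSC 5", and `KNOWN_CASES` is a made-up in-memory stand-in for whatever case-law database would actually be queried.

```python
import re

# Assumption: only a handful of UK-style neutral citation formats are handled.
# A real checker would need many more formats (law reports, other courts).
NEUTRAL_CITATION = re.compile(
    r"\[(?P<year>\d{4})\]\s+"
    r"(?P<court>UKSC|UKHL|EWCA\s+(?:Civ|Crim)|EWHC)\s+"
    r"(?P<number>\d+)"
)


def extract_citations(text):
    """Return normalised citation strings found in text, in order of appearance."""
    found = []
    for match in NEUTRAL_CITATION.finditer(text):
        # Collapse any run of whitespace inside the court name to one space.
        court = " ".join(match.group("court").split())
        found.append(f"[{match.group('year')}] {court} {match.group('number')}")
    return found


def unknown_citations(text, known_cases):
    """Return the citations in text that are absent from known_cases."""
    return [c for c in extract_citations(text) if c not in known_cases]


# Hypothetical placeholder entries, not checked against any real database.
KNOWN_CASES = {"[2020] UKSC 5", "[2019] EWCA Civ 100"}

sample = "See [2020] UKSC 5 and [2023]  EWCA   Crim 9999 for the point."
print(unknown_citations(sample, KNOWN_CASES))
```

This only tells you whether a citation string resolves to a known case. It says nothing about whether the case supports the claim being made, which is the harder part.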
Then for the tougher problem of making sure that the cited cases say whatever the document citing them says they do, you could have an LLM run through the document, pull out the text with the case name and text about why it's being cited, then read the case and try to determine whether the reason for citing it is valid. Rather than just give a yes/no, you'd put the doc in front of the user and let them jump from citation to citation. On each citation, it'd pop up a card that shows the literal text of why it's being cited, a judgement from the LLM of whether it matches what the case says, and snippets of text from the case as evidence + deeplinks to that text within the case.
Or maybe you wouldn't even want to give the LLM's judgement since people might rely on that without reading, but there's definitely a way to speed up the review.
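For what it's worth, the card could be as simple as the structure below. This is a sketch under my own assumptions: the names `Evidence`, `CitationCard` and `visible_judgement` are invented here, and withholding the LLM's opinion until the reviewer has opened the evidence is just one possible way to handle the "people might rely on that without reading" worry.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    snippet: str   # literal text quoted from the cited case
    deeplink: str  # hypothetical link to that passage inside the case


@dataclass
class CitationCard:
    citation: str
    claim_text: str  # literal text of why the document cites the case
    evidence: List[Evidence] = field(default_factory=list)
    llm_judgement: Optional[str] = None  # advisory only, never a verdict
    evidence_opened: bool = False

    def open_evidence(self) -> List[Evidence]:
        """Record that the reviewer looked at the snippets, then return them."""
        self.evidence_opened = True
        return self.evidence

    def visible_judgement(self) -> Optional[str]:
        """Withhold the LLM's opinion until the evidence has been opened."""
        return self.llm_judgement if self.evidence_opened else None


card = CitationCard(
    citation="[2020] UKSC 5",  # placeholder citation
    claim_text="cited for the proposition that ...",
    evidence=[Evidence("quoted passage from the case", "case.html#para-12")],
    llm_judgement="claim appears to match the quoted passage",
)
print(card.visible_judgement())  # None until the evidence has been opened
card.open_evidence()
print(card.visible_judgement())
```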
I believe OpenEvidence does something like this with medical papers. If you ask it a medical question, it doesn't answer so much as link you directly to the relevant papers so you can read them and determine if they're useful. Avoids all of the potential risks of using an LLM but still hugely valuable and time-saving for docs.
excellent point. it is like saying computers in the 90s.
remember how the bank giving your money to the wrong person was a crime? and then when "the computer" did it, it was just business as usual and you paid more for banking because now they had "computer fraud" insurance?
same thing. cop delivers a false report, jail (hah! i know). now, it was "the Ai". so no jail, they will go back and put in rules for the cop to read or something.
and we are making everything worse by the minute. One gov pushes back on Ai nonsense, ibm/rh comes up with all sorts of lies that would make any engineer or researcher laugh in their faces (federated learning being for privacy, instead of cost cutting. or explainable Ai being real, and not something bolted on after the inference with extra unexplainable inference. etc.) but that are good enough to fool the regulator.
All this, so people like us can do a job that wasn't that hard in the first place (and in fact was quite comfortable all things considered), just a little bit easier, for companies that are promising to lay us off for productivity gains that aren't even measurable.
> Checking the output seems like a simple request, but the question becomes: Check against what?
A colleague of mine circulated "minutes" from a meeting last week, there were only three of us in the meeting (one external service provider, my colleague + me).
There were several items on the "minutes" which I didn't recall being discussed, so I asked him if he'd had AI help, he said AI was filling in the gaps based on its knowledge of other discussions he'd had with it.
This is a great way of capturing the core problem. Fact-checking a document is a difficult skill! Expecting people who've never had to do that before to just start doing it - when these AI tools are supposed to save them time and make life easier - is not reasonable.
> We’re all also discovering that many people’s idea of reviewing the output is to skim it and verify that it looks convincing enough.
I mean over time I've come to believe that most people are just _bad at reading_ - if you ask these people to compare two documents they'll say that they are the same if the wording or surface "feel" is at all similar, even if in the precision of the statement they say the opposite.
See also: People being generally bad at listening and hearing what they want to instead of anything quantitatively derivable from what the other person said.
Here's a massive document, without any real context as to what thinking went into the points it's making, tell me if it looks ok. Oh, and there's 10 more where that came from.
We're outsourcing the thinking to the recipient.
Yes, it's way easier to create the report now, but it's not being honed down to the crux of the points it needs to make. And the reviewers are expected to what? Up their ability to mentally consume and reason about reports.
I mean, barrier number 1: did you read it yourself before asking someone else seems too high for some..
The mindset must be that if you use AI (which I happen to advocate for) you are also responsible for the output, if you use the output publicly. AI is obviously very powerful if used responsibly - the human is responsible for it once it is used - however it’s used.
I appreciate the sentiment, but this is exactly like the expectation that people can be responsible for intervening when self-driving or driver-assistance goes wrong. Human brains are strongly driven to conserve energy. The less that seems to be happening, as errors become less and less frequent, the more difficult it becomes to guarantee intervention, and the less practiced the human will be at doing so.
I have written factory tests, in which I injected errors to make sure that the factory workers didn't develop "click next" syndrome and actually noticed errors. That's what you'd have to do. It's hard to get an organisation to stick to that, when they add up the time they're paying for in detecting fake errors.
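The bookkeeping for that kind of error seeding is small. A minimal sketch, assuming each item under review has an ID and the reviewer returns the IDs they flagged; the function names and the 10% seeding fraction are invented for illustration.

```python
import random


def choose_seeded(item_ids, fraction, seed=0):
    """Pick a reproducible sample of items that will get a deliberate error."""
    rng = random.Random(seed)
    ids = list(item_ids)
    k = max(1, round(len(ids) * fraction))
    return set(rng.sample(ids, k))


def detection_rate(seeded, flagged):
    """Share of deliberately planted errors that the reviewer actually flagged."""
    if not seeded:
        raise ValueError("no seeded errors to score against")
    return len(seeded & set(flagged)) / len(seeded)


# Example: plant errors in 10% of 100 reports, then score one reviewer.
seeded = choose_seeded(range(100), 0.10)
flagged_by_reviewer = list(seeded)[:5] + [-1]  # -1 is a flag on a non-seeded item
print(len(seeded), detection_rate(seeded, flagged_by_reviewer))
```

A falling detection rate is the "click next" signal. The cost is exactly the one mentioned above: reviewer time spent finding errors that were planted on purpose.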
I think the problem is that this is, practically speaking, next to impossible. Generally speaking, writing is way easier than editing, especially at scale. This isn’t binary or all or nothing, it’s not like “you can never use AI”. But I think we need to go back to augmentation over generation.
A person produces the content and AI removes barriers and contextually accelerates the process, keeping you in a flow state, rather than the AI generating and the human editing.
Something happening in the US right now is that the "presumption of regularity" is being openly challenged by judges. To the best of my understanding, it's the presumption that the testimony of the police is truthful until proven otherwise.
I think "check everything that it produces" will ultimately have to happen in cross examination on the witness stand. "Did you use AI" will be the first question.
> on top of all that, they hope to be able to rely on "Please review the outputs" which obviously isn't an actual solution here, of course people will get complacent and throw stuff over the wall whenever they can.
This is honestly the fundamental problem of AI as I see it
When we offload our work to a different person we can calibrate our expectations to our past experiences with that person. With AI the experience is not very consistent. To use AI effectively you basically should treat it as a low trust, brand new coworker every single time you use it
That doesn't really scale, so people have two choices: be constantly hyper vigilant for mistakes the AI makes, or become complacent and trust it more than they should
People rightly point out that humans make mistakes too, not just AI. But humans have a pretty manageable cap on the amount of output they can produce. One human can pretty thoroughly review the outputs of a small team of other humans
One human can't possibly thoroughly review the volume of output that an LLM they are prompting can produce
Yeah, it's like declaring self driving safe because people are told to remain alert with their hands on the wheel, ready to take over in an instant. It's a charade.
That’s why most people say LLMs don’t work. It’s not that they can’t produce a good output once in a while, it’s that you can’t guarantee it, or reduce the risk of a bad output. It’s a chaotic element, and the cost of being alert enough to ensure consistency (if it’s feasible at all) is higher than just doing without.
But AI proponents are happier to brandish carefully curated anecdotes than to do a systematic study of risks and impacts.
We can get ambitious and try to head toward a form of statement more probative than even an officer personally typing a report: Have them narrate the facts of the event and the reasons for their decisions as soon as possible after the incident, as a video. Additions and corrections made later would be separate annotations. Where text is needed, auto-transcribe.
Courts prefer to have live witness testimony for a good reason. Detectives prefer to have statements made with the events as fresh as possible for a good reason. At the same time an oral report can save time and labor. Where we can take police or witness testimony verbally, more promptly, with less work, and including body language, we should.
If you listen closely to UK cabinet ministers you can intuit that they are being horse whispered into handing over vast sums of taxpayer money to firms for AI who are promising solutions to the productivity gap (chasm?) that the UK is plagued by.
I can say with certainty lots of money will be spent, and the gap will not be filled. I would bet my life on it.
Speaking as someone who subscribed earlier this year, the Lex column does provide subtle stock tips, but my real interest in it is the fact that it’s aimed at people who have a financial interest in accurate news, so the reporting doesn’t veer off into pushing moralistic narratives like other UK news sources.
The FT is a paper read by people who don't work in finance assuming that people in finance read it. It hasn't been relevant in finance or even business for many years now. You can also tell this from the comments section which has, like the paper, turned into establishment/"centrist dad" central.
WSJ and Bloomberg took a lot of the top markets/companies people almost a decade ago now (Andrea Felstead was one, genuinely someone who knew UK retail very well). The majority of the remaining columnists either worked in politics or are politics-adjacent. There is almost no detailed finance market coverage. The UK companies stuff was spun out into the Shares magazine 20 years ago.
The FT reflects British society, nothing could be more grubby than becoming involved in commerce. Many of the people who moved up and out go into political journalism because that is high status (e.g. Peston). The FT also has a nasty habit of creating special jobs for people if they are high status enough (Kuper is one, Keynes is the new one, there are many more). The FT is a basically unreformed backwater that is a bit like it was in the 80s, nothing has really changed. I know a few people who work there in undemanding roles (every couple of weeks attending an expenses paid dinner with a celebrity) and got their job through nepotism. It isn't like anywhere else in UK business or even journalism because of the corporate subscription revenue, the editor is able to run it like a fief.
To give specific examples: Chris Giles is somewhat notorious for being a complete hack. If you are somewhat familiar with how news is made, you should be able to read his stories and work out exactly what conversations led to that story being written. In many cases it is Giles talking to someone adjacent to or in politics. Martin Wolf is a complete dinosaur, if he writes a column you can predict exactly what his take will be because he hasn't had a new idea since 1990. JBM is probably the only journalist who actually writes interesting things, these things however often seem to be conflicted with his personal interests/conversations with civil servants. Stuart Kirk...how does he have a column? Barely worked in markets, somehow the markets guy. Shrimsley, politics guy. Cavendish, worked for Cameron. Beattie, basically a Martin Wolf-lite. Pilita Clark, Lidl Kellaway. It goes on and on. Ineffectual posh people with the most anodyne, pro-establishment positions boring everyone to death with their thoughts.
Exceptions: Lucy Kellaway, long gone now but she was very good (nothing to do with business or finance though, more social commentary). Janan Ganesh, also good (again, social commentary). In actual business or finance...nothing interesting. John Lee was quite interesting. MSW was somewhat interesting but also said catastrophically incorrect things often...but she was good for marketing if you started a fund and actually was primarily a fund management journalist (although at the FT she often strayed into politics).
Also, a special mention for Lionel Barber...admitted to leaking stories to traders in a documentary, there are multiple laws against this in the UK, never charged, never investigated. A lot of what changed with the paper happened because of Barber and his opinions on things like Brexit where he interpreted the role of the FT as being a political activist first. Same thing has happened at the Economist when Micklethwait left, the lure of politics and being culturally relevant is too strong.
Lex is also useless. They had some decent people there writing on niche topics. But after the incident with Barber/Wirecard, there has been a big change in how that part of the world works. Many years ago, you would call up someone from Lex to leak a fake M&A rumour and (if you gave them something real later) they would "leak" it (this was confirmed publicly in the Operation Tabernula trial, this is why many papers have pulled back completely on any market-adjacent coverage...afaik, Mark Kleinman is basically the only person trying to do this stuff anymore and it is a million miles from what it once was).
You can usually find a way to get it for free or cheaper through a library, other institution or your employer if working in the financial sector or education.
It's worth it for me with the physical paper. I don't think you'll get any valuable tips but the reporting is moderately better than other papers. Gives me something to do instead of immediately picking up my phone.
It seems like there are two kinds of data that might go into this: boilerplate and subjective information. Subjective information should be input by the police, because I would assert the specific wording matters. It matters that the words used to describe what the policeman saw come out of the policeman’s brain. If it’s boilerplate, is AI really more reliable than copy-paste?
I never thought AI would be the fork in the road to Idiocracy. Can you believe that the people whose evidence and testimony in court means so much, value The Great Hallucinator over hand work? They give a few nice sounding options for using AI ("checking child porn"), but it of course won't end there. They already started. People are so fucking lazy.
It's funny to me how shocked people seem to be at the realization of just how fucking lazy people are. Sure, there is definitely a difference between actual improvement through technology and sheer laziness. Movie tropes like Idiocracy, or even animated ones like Wall-E, weren't far off either. It's just so much easier to be lazy. The number of people that do not go down these sci-fi trope timelines will be pretty small, to the point of just being the weird oddballs that everyone else wishes would just shut up already.
Glorious.
None of this is written as an excuse for AI.
*Yes, I know
Hardly ideal but this isn't a static problem. Either the work gets done or cases get lost on slop.
And video is more AI tamper evident than text.
Are we thinking about how we’re using it, or???
Couple it with a society in which nothing really matters and no one really cares? Well that’s what we have now.