> Percentage of HTTP requests classified as bot (automated) or human. Filtered to HTML responses, representing web page traffic.
(Emphasis mine)
I realize that this is likely an inherent limitation, but there is a difference between "bot vs human traffic" and "traffic that CF thinks is bot/human". Every time CF blocks me, I assume it claims I'm a bot in this chart.
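To make that distinction concrete, here is a minimal sketch of how a "percent bot" figure might be computed from request logs. The `Request` record, the 1 to 99 score scale and the default cutoff of 30 are assumptions for illustration, not Cloudflare's actual pipeline. The point is that the headline number depends on both the HTML filter and where the classifier's threshold sits, and a human who gets a low score simply lands in the bot column.

```python
from dataclasses import dataclass

@dataclass
class Request:
    content_type: str  # e.g. "text/html; charset=utf-8"
    bot_score: int     # hypothetical score: 1 = surely automated, 99 = surely human

def bot_share(requests, threshold=30, html_only=True):
    """Return the percentage of requests classified as bot.

    A request counts as 'bot' when its score is below `threshold`
    (an assumed cutoff). With html_only=True, only HTML responses are
    considered, mirroring the "web page traffic" filter in the chart.
    """
    pool = [
        r for r in requests
        if not html_only or r.content_type.split(";")[0].strip() == "text/html"
    ]
    if not pool:
        return 0.0
    bots = sum(1 for r in pool if r.bot_score < threshold)
    return 100.0 * bots / len(pool)

# Made-up sample: the same traffic, measured three ways.
sample = [
    Request("text/html", 5),
    Request("text/html", 25),
    Request("text/html", 45),
    Request("text/html", 90),
    Request("application/json", 2),
]
print(bot_share(sample))                  # 50.0
print(bot_share(sample, threshold=50))    # 75.0
print(bot_share(sample, html_only=False)) # 60.0
```

Nothing about the traffic changed between the three calls; only the definition of "bot" and the filter did.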
One funny thing I've discovered thanks to certificate transparency logs is that the moment your host is issued an SSL cert, you are immediately blasted with AI crawlers.
I put a project online - it was online for a month, and the second I added an SSL cert it went from 0 traffic to 1000 requests/min.
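If you want to check this on your own server, one rough approach is to bucket request timestamps by minute and compare the rate before and after the certificate was issued (the cert's validity start, or the timestamp of its CT log entry, gives a reasonable issuance time). The sketch below assumes you have already parsed your access log into `datetime` objects; the function names and the sample traffic are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def requests_per_minute(timestamps):
    """Count requests in each wall-clock minute."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)

def mean_rate(timestamps, start, end):
    """Average requests/min over [start, end), counting idle minutes too."""
    minutes = (end - start) / timedelta(minutes=1)  # timedelta / timedelta -> float
    hits = sum(1 for ts in timestamps if start <= ts < end)
    return hits / minutes if minutes else 0.0

# Made-up log: 3 stray hits in the hour before the cert, then 1000 hits in the minute after.
cert_time = datetime(2024, 1, 1, 12, 0)
log = [cert_time - timedelta(minutes=m) for m in (50, 30, 10)]
log += [cert_time + timedelta(milliseconds=60 * i) for i in range(1000)]

print(mean_rate(log, cert_time - timedelta(hours=1), cert_time))    # 0.05
print(mean_rate(log, cert_time, cert_time + timedelta(minutes=1)))  # 1000.0
print(requests_per_minute(log).most_common(1))
```

Averaging over the whole window (including empty minutes) keeps a quiet "before" period from looking busier than it was.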
It's a silly metric. There could be only one master bot that pings every known endpoint multiple times a second, and that would probably surpass all human activity, too. It doesn't really tell us much about intention or the ability to masquerade as humans.
Where I would start to worry is if there's evidence that bot access patterns are starting to become harder to distinguish from human access patterns, which would suggest that they are, in fact, mimicking or masquerading as humans. I don't care how many search bots are indexing web content, but I do worry about how many social bots are attempting to manipulate or mislead people.
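A back-of-the-envelope version of the "one master bot" point, with entirely made-up numbers: a single crawler polling 500 URLs once per second can outweigh thousands of human visitors in a raw request count, which is why a bot-vs-human ratio by requests says little about intent.

```python
def bot_request_share(bot_rps, humans, seconds_per_human_request):
    """Percentage of all requests that come from the bot.

    bot_rps: requests per second from one tireless crawler.
    humans: number of concurrent human visitors.
    seconds_per_human_request: average gap between page loads per human.
    """
    human_rps = humans / seconds_per_human_request
    return 100.0 * bot_rps / (bot_rps + human_rps)

# One crawler hitting 500 URLs every second vs. 10,000 people
# each loading a page every 30 seconds (about 333 req/s combined).
print(round(bot_request_share(500, 10_000, 30), 1))  # 60.0
```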
The Thales Bad Bot Report categorizes bot traffic into "good" and "bad" bots.
I would add that AI dramatically blurs the line between legitimate and malicious bots, and muddies the question of intent generally.
Regarding social bots, there's a 2024 study of over 1 million accounts on X in which over 60% were found likely to be bots. Curiously, after Musk took over Twitter, the "Blue Checkmark" became something anyone can buy for a few bucks a month (with crypto, even), without any sort of verification.
>but I do worry about how many social bots are attempting to manipulate or mislead people.
You should browse reddit sometime. The easy ones to spot just autocreate accounts using the autoname suggested at signup, which is of the form [word1][word2]\d{4}
Regex nazis please spare me, I am doing my bestest
Your bestest is just fine, as your point is clear. I'd actually be just fine with pseudocode. Maybe it'll poison the LLM training data if we all did it more.
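For the curious, here is one way to write the two-words-plus-four-digits pattern as a real regex. It assumes each "word" is a capitalized run of letters and allows an optional `-` or `_` after each word; Reddit's actual name generator may differ, and plenty of humans keep the suggested name, so a match is a weak signal at best.

```python
import re

# Two capitalized words, each optionally followed by '-' or '_', then exactly four digits.
# This is an assumed shape for auto-suggested usernames, not a documented Reddit format.
AUTONAME = re.compile(r"[A-Z][a-z]+[-_]?[A-Z][a-z]+[-_]?\d{4}")

def looks_autogenerated(username: str) -> bool:
    """True if the whole username matches the assumed autoname shape."""
    return AUTONAME.fullmatch(username) is not None

for name in ["Sure_Hawk_8732", "QuietOtter1234", "throwaway_1234", "SureHawk87321"]:
    print(name, looks_autogenerated(name))
```

Using `fullmatch` rather than `search` means a name like `SureHawk87321` (five digits) is rejected instead of matching on a substring.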
Saw this play out firsthand this week. Launched a small developer tool and within 48 hours had traffic from 38 countries, with the Netherlands and Singapore near the top, which matches the bot-heavy regions in this data.
The SSL cert observation in another comment here is accurate too. The second a domain goes live, it gets discovered.
This feels like a vibe-coded dashboard that someone made just because they could, since with AI it is much cheaper and quicker to create. But they didn't put much thought into how it would or could actually be used. It doesn't provide much value beyond "well, that's kind of interesting to know". There aren't really any actionable points one can take from looking at these charts.
Some of my opinion above is formed from my own experience making similar charts just because I wonder what something would look like graphed out :)
I was tracking this as part of an older job, and this has been the case for some years now. It started around the time of Covid, with all the scalping bots etc., and has just been building up since.
Automated systems that don’t sleep and are often programmed to aggressively scrape and are limited only by compute capacity outstripped humanity? I am not surprised by this at all.
Any thoughts on why ~30% of HTTP requests are in the US? I know we had first-mover advantage for a while, but I'd expect this to have been diluted by larger populations by now. It doesn't appear to be AI/bot driven either.
Funny how I get stuck in CAPTCHA loops with my ad blocker in Firefox, but you can get through easily with a few Puppeteer plugins controlling headless Chrome.
If they were truly this accurate at identifying sources of bot traffic, you'd think they'd be better at blocking them without inconveniencing the rest of us.
The graph seems like it only goes back to April 27 and on that day it was 57% bot…
AI-driven* bot activity has, however, increased more than tenfold in the past 12 months, so I'm confident this will grow to a very solid majority.
Do you mean 2013 or 2023?
This sorta mirrors the early-to-mid 2010s, when people[1] were worried about how much of the internet was streaming traffic.
[1] Mostly ISPs annoyed at not being able to monetize it, and folks trying to sell monetization solutions to them - https://www.sandvine.com/hubfs/Sandvine_Redesign_2019/Downlo...
but on the Bot page it's the opposite: 65.9% Human vs 34.1% Bot
https://radar.cloudflare.com/bots?dateRange=7d
?