Now AI agents need what RSS does

(julienreszka.com)

60 points | by julienreszka 5 hours ago

20 comments

  • analogpixel 42 minutes ago
    I have this idea: instead of browsing completely random things on the internet pushed by what other people are interested in (or want to promote), build an LLM that scans through your backlog of projects YOU want to do, searches the internet for projects/articles about those things, and then creates a feed from that.

    I'm not sure why I keep reading HN; 99% of the content is uninteresting, probably 99.9% now that every article is about AI. Maybe I just like clicking on things.

    • acgourley 14 minutes ago
      This is going to happen, but it's too expensive for your LLM to do the scanning, and instead someone needs to build and maintain the index while allowing other people to subscribe to concepts. The problem is no one has sorted out the embedding space this all lives in.
  • daxfohl 50 minutes ago
    Google should bring back Google Reader. But make it only for bots. And then drop it again once it gets popular.
  • sperandeo 45 minutes ago
    we spent a decade killing structured feeds in favor of algorithmic timelines and now we're rebuilding them because the algorithms need structured feeds. the circle of life, but for protocols.
  • phyzix5761 2 hours ago
    I have almost 40 feeds I subscribe to and they're my primary way of getting information I care about without being exposed to ads or other things I don't want to see.
  • dchuk 2 hours ago
    I built a site that's similar in concept to Hacker News, but is fed entirely by RSS feed content, which is then summarized in bullet points on the article page: https://engineered.at/

    I also extract topics automatically from the content with LLMs, which allows dynamic topic pages that users can subscribe to separately to tune their feeds.

    Haven't promoted it much, but it's pretty amazing what you can do for a couple bucks a month. And my main thesis with this site is that by locking the content to only rss feeds of known blogs, you dramatically reduce the spam submission risk (basically eliminate it). Doesn't handle the spam comment side of things, but that's a different problem.

    EDIT: I also open sourced a Rails engine I made to power this site if anyone is interested: https://github.com/dchuk/source_monitor

    • devinpower 46 minutes ago
      This looks great, I've wanted something like this for a while. Finding how to click through to the actual item in the feed was a high point of friction for me.

      I went to a topic and then clicked on the header of something I was interested in, expecting to be brought to the blog post directly. Needing to click on that same title again to be brought to the post was unintuitive to me; I searched around the page, went back and forth a few times, and eventually figured it out.

      As a user I would love to be able to click directly through to the article FROM the topic feed. I would expect the comments link to go to the page that the header currently brings me to. This would match my expectations from using sites like reddit/HN.

      A one or two liner summary directly on the topics feed would be really great I think.

      • dchuk 33 minutes ago
        Great feedback, should be straightforward to make happen. I’ll try to implement tonight.
    • solid_fuel 57 minutes ago
      As a sysadmin hosting a few blogs, do you mind sharing what IP ranges you crawl from? Or what agent your requests use? Thank you.
      • dchuk 28 minutes ago
        I presume you’re politely asking in order to block? Which is fine, I get it. On my phone right now but can update later.

        I do want to ask though (and I should make this clear in a FAQ or something): the way I check RSS feeds uses adaptive scheduling, so I intentionally don’t check feeds of sites too rapidly. Then the summarization is based on the full article content but I never render that full content on the site (to avoid traffic hijacking concerns). Given that: what’s the concern?
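
        For readers wondering what "adaptive scheduling" can look like, here is a minimal sketch, assuming a simple halve-on-activity, double-on-silence rule. The class name, bounds, and starting interval are all hypothetical; the thread does not say how the site actually schedules its polls.

```python
from dataclasses import dataclass

# Hypothetical bounds; the real site's values are not stated in the thread.
MIN_INTERVAL_S = 15 * 60        # never poll more often than every 15 minutes
MAX_INTERVAL_S = 24 * 60 * 60   # never wait longer than one day


@dataclass
class FeedSchedule:
    """Tracks how long to wait before polling one feed again."""

    interval_s: int = 60 * 60  # start at one hour

    def record_poll(self, new_items: int) -> int:
        """Adjust the interval after a poll and return the new value."""
        if new_items > 0:
            # The feed is active: check back sooner, down to the floor.
            self.interval_s = max(MIN_INTERVAL_S, self.interval_s // 2)
        else:
            # Nothing new: back off, up to the ceiling.
            self.interval_s = min(MAX_INTERVAL_S, self.interval_s * 2)
        return self.interval_s
```

        A quiet blog drifts toward one poll per day, while a busy one settles at the 15-minute floor, so the publisher's server sees load roughly in proportion to how often it actually posts.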

    • Joe_Cool 53 minutes ago
      Getting

          406 browser not supported
      
      for ESR Firefox 140.

      If I set my UA to "FUCKIT" I can use the site perfectly fine. Why is there a User Agent Filter that disables the whole website? This should be maybe a warning, not a complete block.

      • dchuk 47 minutes ago
        you know, I had set up some analytics filtering based on geoip because I was getting crazy spam traffic from China and Singapore, but that should only be affecting analytics, not the whole site. Mind if I ask where you're located? (you can email me privately if preferred: me@dchuk.com)
        • Joe_Cool 44 minutes ago
          Europe

          IP address has no effect on the User Agent block though...

          • dchuk 31 minutes ago
            Yeah I know and agree, just wondering if something is haywire in that logic somehow. Otherwise it’s a bizarre issue but I’ll get it fixed
            • Joe_Cool 20 minutes ago
              Glad to hear, and neat site. Cool to see new Ruby on Rails sites. Thought I was the only one still loving it. ;)
    • shaunpud 1 hour ago

        Your browser is not supported.
        Please upgrade your browser to continue.
      
      Can't even view your site with Firefox
      • dchuk 1 hour ago
        That’s…bizarre. Let me take a look

        EDIT: just checked in firefox, I don't see an issue. can you email me at me@dchuk.com and maybe I can debug with you?

        • Joe_Cool 52 minutes ago
          I just noticed the same thing.

          UA being blocked for example:

            Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:140.0) Gecko/20100101 Firefox/140.0
          
          Did mess with it some more:

          Allowed:

              Opera/9.80 (Windows NT 6.1; U; zh-tw) Presto/2.7.62 Version/11.01
              Opera/9.80 (Windows NT 5.1; U; cs) Presto/2.7.62 Version/11.01
          
          406:

              Mozilla/5.0 (Windows NT 5.1) Gecko/20100101 Firefox/14.0 Opera/12.0
              Mozilla/5.0 (Macintosh; Intel Mac OS X 14; rv:140.0) Gecko/20110101 Firefox/140.0
          
          Maybe just remove it?
          • dchuk 31 minutes ago
            Thanks for this info! Very helpful
  • alextillman 4 hours ago
    What's old is new again. The solution RSS offered was structure for an otherwise unstructured challenge (trying to figure out updates on a site). That value grew exponentially when connected to AI (providing the signal for when I need to look at this site/podcast again). Smart marketing.
  • PaulHoule 2 hours ago
    Re: Rate Limits, see

    https://rachelbythebay.com/w/2024/05/27/feed/

    but coming from an aggressively anticommercial world view. She collects evidence that real world feed readers don't implement RSS correctly

    https://rachelbythebay.com/w/2026/02/23/readers/

    Her problems are the problems of a polling-based protocol, and really, if she does not like the RSS protocol she should stop publishing it and stand up an ActivityPub or PubSubHubbub service instead.

    A big part of the value of Google Reader and the ecosystem around it was that Google could poll your RSS feed once and everyone could read it... A huge win for the Rachels!

    • solid_fuel 51 minutes ago
      > Her problems are the problems of a polling-based protocol and really if she does not like the RSS protocol she should stop publishing it and stand up an ActivityPub or PubSubHubBub service instead.

      Bit odd to take potshots at a third party blog on this discussion, why single out Rachel?

      And more to the point, the dynamics here might be due to RSS being polling-based, but if feed readers implemented the RSS logic correctly it wouldn't matter nearly as much, would it?

      • PaulHoule 9 minutes ago
        (1) Rachel complains more than most. Most people realize it is easier for you to speed up your server/lower your costs than to expect people to implement RSS "correctly".

        (2) You can use a cache or be correct, pick one! I think of all the lame cache-busting methods that are still in use because it took web browsers more than 15 years to get caching mostly right.

        (3) If you'd been reading Rachel as opposed to asking why I pointed Rachel out, your questions would be answered!

        (4) Polling-based systems come in two speeds: too fast and too slow, and it is possible to be both at the same time.
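
        For context on what "correctly" usually means for a polling feed reader: a minimal sketch of HTTP conditional requests, assuming the complaint is mainly about readers re-downloading unchanged feeds. The header names and the 304 status are standard HTTP; the `CachedFeed` class and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class CachedFeed:
    """Validators and body remembered from the last successful fetch."""

    etag: Optional[str] = None
    last_modified: Optional[str] = None
    body: bytes = b""

    def request_headers(self) -> dict:
        """Headers to send so the server can answer 304 if nothing changed."""
        headers = {}
        if self.etag:
            headers["If-None-Match"] = self.etag
        if self.last_modified:
            headers["If-Modified-Since"] = self.last_modified
        return headers

    def apply_response(self, status: int, headers: Mapping[str, str], body: bytes) -> bool:
        """Store a fresh copy on 200; keep the cached one on 304.

        Returns True if the feed changed. Assumes `headers` uses the
        canonical names shown here (or is a case-insensitive mapping).
        """
        if status == 304:
            return False
        if status == 200:
            self.etag = headers.get("ETag")
            self.last_modified = headers.get("Last-Modified")
            self.body = body
            return True
        raise ValueError(f"unexpected status {status}")
```

        The reader echoes back exactly the validators the server sent, unmodified, and only parses the body when `apply_response` returns True.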

  • grobibi 1 hour ago
    I need what rss does.

    Can someone recommend a way to create an RSS feed from a site that has none?

    • senectus1 41 minutes ago
      freshrss will do that for you. It has a built in web scraper.
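
      If you would rather roll your own, the feed half of the job is small. A minimal sketch, assuming you have already scraped titles, links, and dates by some other means; this is not how FreshRSS does it, and `build_rss` and the example data are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title: str, link: str, description: str, items: list) -> str:
    """Serialize already-scraped items as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        # Reuse the permalink as the stable identifier for each item.
        ET.SubElement(node, "guid").text = item["link"]
        # RSS 2.0 expects RFC 822 style dates.
        ET.SubElement(node, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="unicode")


# Hypothetical scraped data; a real scraper would fill this in.
example_items = [
    {
        "title": "First post",
        "link": "https://example.com/posts/1",
        "published": datetime(2024, 5, 27, tzinfo=timezone.utc),
    },
]
feed_xml = build_rss(
    "Example blog", "https://example.com/", "Posts scraped from example.com", example_items
)
```

      Serve `feed_xml` from any static host and point your reader at it.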
  • hparadiz 1 hour ago
    Never dropped it

    https://technex.us/.rss

    https://github.com/hparadiz/technexus/blob/release/src/Contr...

    I would enjoy a JSON based refresh of the format.
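
    For reference, a JSON-based format along these lines already exists as the JSON Feed spec. A minimal sketch of emitting one, assuming version 1.1 of that spec; the `build_json_feed` helper and the example values are hypothetical.

```python
import json


def build_json_feed(title: str, home_page_url: str, feed_url: str, items: list) -> str:
    """Serialize items as a minimal JSON Feed 1.1 document."""
    feed = {
        "version": "https://jsonfeed.org/version/1.1",
        "title": title,
        "home_page_url": home_page_url,
        "feed_url": feed_url,
        "items": [
            {
                # The spec requires a unique id per item; the permalink works.
                "id": item["url"],
                "url": item["url"],
                "title": item["title"],
                "content_text": item["text"],
            }
            for item in items
        ],
    }
    return json.dumps(feed, indent=2)


# Hypothetical example data.
feed_json = build_json_feed(
    "Example blog",
    "https://example.com/",
    "https://example.com/feed.json",
    [{"url": "https://example.com/posts/1", "title": "First post", "text": "Hello."}],
)
```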

    • ramgine 1 hour ago
      This is why I love HN
  • b3ing 2 hours ago
    I guess if you want your content all slurped up and served as coming from AI with no backlinks.
    • 8organicbits 2 hours ago
      Anyone know the best practices for keeping AI crawlers off your RSS feeds? I know robots.txt works for the well-behaved bots. Other tools like interstitial captchas don't, as feed readers break if you send them anything but XML.

      Putting just the post intro in the feed and linking to the website feels like a safer approach, assuming you have bot protections on the website, but that's a poor experience for people who want to read in their feed reader.

      • solid_fuel 50 minutes ago
        I have some aggressive filters in Caddy that block the worst offenders by CIDR range, and also filter by user agent to remove any honest facebook and amazon bots. Otherwise, maybe strong rate limits by IP?

        Edit:

        Longer term, the approach might be - provide a separate RSS feed with full content but gated by a query parameter, then only give that URL to known-good consumers via email verification or patreon subscription, etc.

        It would suck that people would have to pay more to consume content in their preferred way, but depending on your needs it might be a reasonable compromise.
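
        A minimal sketch of the gated-feed idea, assuming each subscriber gets a URL carrying an HMAC of their identifier. The secret, the `sub` and `token` parameter names, and the function names are all hypothetical; nothing here is specific to Caddy.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical secret; in practice load it from configuration, not source code.
SECRET_KEY = b"change-me"


def feed_token(subscriber_id: str) -> str:
    """Derive a per-subscriber token that cannot be forged without the secret."""
    return hmac.new(SECRET_KEY, subscriber_id.encode("utf-8"), hashlib.sha256).hexdigest()


def feed_url(base_url: str, subscriber_id: str) -> str:
    """Build the private full-content feed URL to send to a verified subscriber."""
    query = urlencode({"sub": subscriber_id, "token": feed_token(subscriber_id)})
    return f"{base_url}?{query}"


def is_authorized(subscriber_id: str, token: str) -> bool:
    """Check a token from an incoming request in constant time."""
    expected = feed_token(subscriber_id).encode("utf-8")
    return hmac.compare_digest(expected, token.encode("utf-8"))
```

        Because the token is tied to a subscriber, a URL that leaks to a scraper can be traced back and blocked by subscriber ID without affecting anyone else.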

  • erelong 3 hours ago
    I kinda don't like RSS because I often want a whole blog archive downloaded when I add a new feed, and it usually limits how far back it will download posts (randomly configured by each site).

    Unless someone has a fix for whatever settings I've been using.

    • happytoexplain 3 hours ago
      Sorry, I don't have a solution. But I use RSS for everything, and I can confidently say: RSS is not designed for your use case.
    • conesus 3 hours ago
      For the Premium Archive tier, NewsBlur attempts to download a blog's entire backlog to backfill stories, whether it's exposed through paging or RFC 5005. Here's more info about how NewsBlur does it: https://blog.newsblur.com/2022/07/01/premium-archive-subscri...
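
      For the curious, a minimal sketch of the RFC 5005 half of backfilling: find the feed-level link that leads to more entries. This is not NewsBlur's code; `next_backfill_url` is a hypothetical name, and treating `rel="next"` as "older entries" is a common convention rather than something the RFC guarantees.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"


def next_backfill_url(feed_xml: str) -> Optional[str]:
    """Return the URL of the next document to fetch when backfilling, if any."""
    root = ET.fromstring(feed_xml)
    # Feed-level links only: direct children of an Atom <feed>, or
    # atom:link elements inside an RSS <channel>. Entry links are ignored.
    links = root.findall(f"{{{ATOM_NS}}}link") + root.findall(f"channel/{{{ATOM_NS}}}link")
    by_rel = {link.get("rel"): link.get("href") for link in links}
    # Archived feeds use "prev-archive"; paged feeds use "next".
    return by_rel.get("prev-archive") or by_rel.get("next")
```

      A backfill loop would call this on each fetched document and stop when it returns `None` or repeats a URL already seen.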
    • phyzix5761 2 hours ago
      I use Elfeed for Emacs and it stores the history as you download updates so you can always go back and read an old post.
  • _pdp_ 2 hours ago
    AI agents don't need RSS. What they need is some representation in text. The XML/RSS markup is completely unnecessary.
  • notnullorvoid 21 minutes ago
    I think now more than ever humans need RSS, so we can curate what enters our information feed as the social media experiment continues to degenerate.
  • h4kunamata 2 hours ago
    >RSS was declared dead in 2013

    Where? Not within the homelab space.

    • 8organicbits 1 hour ago
      They probably meant it hyperbolically, but RSS was on a downward slope during that period. The recent uptick is fascinating.

      https://trends.google.com/explore?q=%2Fm%2F0n5tx&date=all&ge...

      • PunchyHamster 1 hour ago
        No, 2013 was the demise of Google Reader, which was at the time very good and very accessible, so it was how a lot of people used RSS.
    • h4kunamata 2 hours ago
      I must add that I self-host FreshRSS to fetch news and GitHub repo updates so I can update my stuff, everything in-house, controlled by me.

      RSS makes life so much easier. Some feeds only provide the bare minimum, while others provide the whole post so I can read everything right there without opening a website.

      Also, some podcasts support it, so I have a list of podcasts that I listen to and can go back through without having to go from website to website.

      One place to govern them all, RSS still king.

  • rvz 2 hours ago
    You mean scraping instead of reading it? Reddit does not like the sound of that at all and is considering removing RSS support due to scrapers [0]

    [0] https://www.reddit.com/r/modnews/comments/1tq9vxo/protecting...

    • eli 2 hours ago
      Well yeah, if they provide open access to content then AI labs wouldn’t have to pay them for bulk access.
  • 0gs 3 hours ago
    i mean, i still read hacker news primarily via RSS in feedly. i kind of never stopped using it, and everybody is much more generous with their feeds nowadays than back in google reader times. bearblog, etc. RSS rules
  • hendler 40 minutes ago
    ...And Semantic Web.
  • themafia 1 hour ago
    > The same logic will now extend to any written content that agents need to reliably consume.

    Get your rapacious hands away from my website please.

    > and actively degrades programmatic access.

    That's your problem. You choose these tools. If they can't function without ripping everyone else off then why do you persist in using them?
