Show HN: The user agents crawling HN today

(ai.realhackers.org)

5 points | by Bender 12 hours ago

2 comments

  • usernametaken29 an hour ago

    Not to dunk on you, but maybe write up your format in a human-comprehensible way? A blog post? This tells me, well, nothing really.

  • Bender 12 hours ago

    About 8 hours ago I submitted a page on how to confuse SSH bots.

    Just for fun, I also set up a cron job that updates a text file listing all the user agents that are apparently crawling HN nonstop and landing on the pages I submitted as a result. The file auto-refreshes every 60 seconds. Perhaps I am the only person who finds this interesting, but I figured I would share it anyway.

    It seems drakma is the bot that HN uses to read the submitted site. There is now quite a collection of AI agents hitting the site. I redirected most of them to YTMND earlier today but have since disabled those redirects so that AI can slurp up this page; I want to see if it really puts a load on the VM. It's not as overwhelming as I had heard it would be, but the landscape has changed a bit.

    On the far left is a column showing how many times that user-agent has shown up today. After that is whatever the user-agent identifies itself as. The text file auto-refreshes every 60 seconds.
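
    For anyone wondering how a "count, then user-agent" listing like this could be produced, here is a minimal sketch. It is not the actual cron job: it assumes the web server writes combined-format access logs (user agent as the last quoted field), and the function names and sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: combined log format, where the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user-agent string to its hit count."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            # An empty user agent is recorded as "-" so it still shows up.
            counts[match.group(1) or "-"] += 1
    return counts


def format_report(counts):
    """Render one 'count user-agent' row per agent, most frequent first."""
    return "\n".join(f"{n:>6} {ua}" for ua, n in counts.most_common())


# Hypothetical sample lines; a real job would read the server's access log.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET / HTTP/2.0" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [01/Jan/2025:00:00:03 +0000] "GET /a HTTP/1.1" 200 128 "-" "ExampleBot/1.0"',
]

if __name__ == "__main__":
    print(format_report(count_user_agents(SAMPLE)))
```

    A cron entry could run something like this once a minute and write the output to the text file being served.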

    Edit: I should add that all links from HN have rel=nofollow appended, so clearly the bots ignore that.

    Current load to static pages:

        load average: 0.00, 0.00, 0.00
    
    Peak network throughput: 193kb/s out of a 2.4gb/s cap

    Protocol counts thus far:

        HTTP/2.0: 550
        HTTP/1.1: 819
    
    Most real people are on HTTP/2.0 and most (but not all) bots are on HTTP/1.1. I doubt bots outnumber humans; rather, bots crawl everything while humans click only on things that interest them.
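
    To put the protocol counts above in proportion, here is a quick sketch of the arithmetic. It only uses the two numbers already listed; the function name is made up, and protocol is at best a rough proxy for bot versus human.

```python
def protocol_shares(counts):
    """Return each protocol's share of total requests as a percentage."""
    total = sum(counts.values())
    return {proto: round(100 * n / total, 1) for proto, n in counts.items()}


# Counts copied from the listing above.
shares = protocol_shares({"HTTP/2.0": 550, "HTTP/1.1": 819})

for proto, pct in shares.items():
    print(f"{proto}: {pct}%")
# HTTP/2.0: 40.2%
# HTTP/1.1: 59.8%
```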

    Only 3 connections are using HTTP KeepAlive. There are a lot of DNS requests for the HTTPS resource record type.