You are only as good as your tools.

Notes On Robots

MRU: 7 March 2023

Go to the #thebotlist bookmark to skip the editorializing.

This page lists software (code, tools, programs) that reads (scans) multiple Websites. (There ain't anything new or spectacular here, I'm just amusing myself...)

Included is ALL automated code, whether a "Search Engine" or "malicious code" – I do not distinguish between the two. And there are many "in between" those two ends. If a classification were done, it might look like this:

  1. Search Engines – companies that index website content for people to search.
  2. Website Services – companies that index website content to sell services (SEO).
  3. White Hats – coders/people that index website exploits to list on their own website.
  4. Crackers – coders/people that just like (apparently) to break websites.

It's not "Hackers/Hacked", okay? The correct terms are "Crackers/Cracked". OKAY? Hackers are your friends.

I block any robot that does not read robots.txt, all "Search Engine Optimization" services of any kind and now all "White Hat" actors (but they are in the first category, so...). I do this on principle. They, especially the latter two, just use Other People's Data to build their Websites.
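
For what it's worth, the blocking itself is mundane. A sketch for an Apache 2.4 setup (which the access-log excerpts elsewhere in these notes resemble – an assumption on my part); the bot names are examples from this list, and this is one way among many:

```apache
# Inside a <Directory> or virtual-host block.
# Tag requests whose User-Agent matches a known offender...
SetEnvIfNoCase User-Agent "zgrab"   bad_bot
SetEnvIfNoCase User-Agent "httpx"   bad_bot
SetEnvIfNoCase User-Agent "masscan" bad_bot

# ...then refuse them (mod_authz_core, Apache 2.4+).
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```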

Sometimes a Bot uses someone else's Bot code to do their Bot Shit; code like "zgrab" and "httpx" for example. And some do their Bot Shit in total isolation (nobody writes about them) like "ALittle Client", "Hello, [Ww]orld" and the new "0xAbyssalDoesntExist".

Not included are CLI/API tools like wget and curl, and Perl and Python libraries. They can do Bot shit, but can also be "Users". (As Bots they often go after .ENV and .GIT shit which yer hosting company blocks, so they are just being silly.)

I do not block any known exploits. Why? I ain't got any exploitable code on my website. (More on why I can make that claim elsewhere.)

I do block some file requests known to be exploitable, such as any request with "wp-" or "admin" in the URI, though just to reduce traffic.
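
A hedged sketch of that too, again assuming Apache (the patterns are illustrative, not exhaustive):

```apache
# Cut exploit-probe noise: refuse any URI containing "wp-" or "admin".
<LocationMatch "(wp-|admin)">
    Require all denied
</LocationMatch>
```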

The Bot names here are usually "BotName/VERS; URL" from the User-Agent string, but I might not always be exact. (Many do not adhere to the common format; sometimes I change one to that format.)
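
Since the "BotName/VERS; URL" habit is loose at best, normalizing names takes a bit of guesswork. A sketch of how one could pull the pieces apart (the function and patterns are my own, not any standard):

```python
import re

# "BotName/VERS" anywhere in the UA, plus an optional trailing "+URL".
NAME_RE = re.compile(r"(?P<name>[A-Za-z][\w .-]*?)/(?P<vers>\d[\w.~-]*)")
URL_RE = re.compile(r"\+?(https?://\S+?)[);\s]*$")

def parse_bot_ua(ua):
    """Return (name, version, url-or-None), or None if no bot token found."""
    m = NAME_RE.search(ua)
    if not m:
        return None
    u = URL_RE.search(ua)
    return m.group("name"), m.group("vers"), (u.group(1) if u else None)

print(parse_bot_ua("ExampleBot/1.0 (+https://example.com/bot)"))
# → ('ExampleBot', '1.0', 'https://example.com/bot')
```

Note it naively grabs the first Name/Version token, so a "Mozilla/5.0 (compatible; ...)" UA comes back as "Mozilla" – which is exactly the mis-identification problem described below.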

The order of the Robots listed here started (bottom up) randomly. Now the list grows top down with the latest discovered first.

Other Resources

  1. matomo-org/device-detector/master/Tests/fixtures/bots, a long list of Bot names and metadata.



Does not read robots.txt. No results found in a quickie web search.

2ip bot/1.1 (+

Does not read robots.txt. Says they provide "DDoS protection" or something. Whatever.

Gregarius/0.5.2 (

Gregarius does not have any sponsors for you.

IonCrawl (

Did not read robots.txt. (Just seen.) IONOS is just a web-hosting company.

It crawls the web "to allow us to improve and expand our world-class hosting services," whatever that means.

Neevabot/1.0 +

Adheres to robots.txt. Just another search engine. So far very light.


This is funny. A Bot that's not a Bot, which says it's:

The XMCO Cabinet. Trusted experts in your company's cyber security service!

Yet from my POV they are simple abusers, checking up on a few (lame) known exploits... over and over and over... They use l9tcpid, Go-http-client and their own (guessing) libraries to scour. sigh

(While some "threat actor listers" list XMCO as a "threat", looks like XMCO is just sloppy.)

BestProxies (+

Another not a bot but like a bot... Indexing and/or selling Proxies or something.

Using HTTP CONNECT method for various other websites. (That's yer hosting company's thing to block, which they probably do.)


Did not read robots.txt. First seen 10 Nov 2022. Read just root.

We Classify The Web Layer By Layer

Whatever that means. Though they seem to be selling services for "Protection" or some shit. Whatever.

What's extra funny about them? Their front page has Google reCAPTCHA to prove "I'm not a robot".

SurdotlyBot/1.0 +

Did not read robots.txt. First seen Oct 2020. It "alters the outbound links on your site so that visitors can get to external target pages without leaving your domain."

What? The? Fuck? But okay, say that's a cool thing... Why are you scraping people's shit? (Oh, and "domaining" external links within yer domain ain't cool. That's like IE Era shit.)

got (

First seen: one root request, October 2022. Another Node.js creeper...

Okay, firebounty (aka YesWeHack) is not a Bot in the typical sense, however, they index/catalog/link website's "/.well-known/security.txt" files, so you'll see them by referer.

So, they are using other people's content for their content. And then they say:

Users shall have no intellectual property rights to the Site at or its Services as well as their contents. Unless authorized by statutory provisions, users may not utilize content obtained through the Site or its services.

Fucking Wow! (Though they have their "out" with the phrase, "Unless authorized by statutory provisions", which probably to them justifies their use of yer shit.)

everyfeed-spider/2.0 (

First seen with a few root requests, October 2022.


Great... Someone created a Bot with Node. Just Great...


A documented phpMyAdmin scanner. People just should not install admin shit in a default admin location, use like "/notadmin/" or something...

SeekportBot +

Does read robots.txt.

Though a few write-ups complaining about SeekportBot were found, this is first contact for me.

Looks like just yer basic search engine.


Only reads robots.txt. (Which has a "User-agent: * Disallow: /" at the end, so maybe it's adherent.)


So far only Shellshock attempts.

Cloud mapping experiment. Contact

A well known exploiter. A more in-depth look: .

Does read robots.txt.

Panscient crawls the web and turns unstructured information on companies and their employees into structured databases. Our databases are used for sale lead generation, business intelligence, marketing and recruiting. We help organizations provide complete and comprehensive corporate information to their customers.

Seems like enabling people to SPAM...


Exploiter. Mostly trying "/admin/config.php".

I have been seeing this for a year, but forgot to log it, as it ain't prolific. Not very many search hits.


Does not read robots.txt.

First seen 31 Aug 2022. Just got root. No search hits.

InternetMeasurement/1.0 +

Does not read robots.txt.

They've been getting root and the site's associated icon.

This domain is used to discover and measure services that network owners and operators have publicly exposed.

Not another one! Sheesh!

If you find particular probes technically or operationally problematic, please let us know why:

Really? Are you saying you could cause problems?

To opt out, block these IP ranges: ... You may also opt out by sending your IP ranges to ...

Ah, please stop.


I have been ignoring that one for a long time, as it's, well, I dunno. Just a run-o-the-mill Wordpress exploiter (wp_is_mobile in the log). So it's doxd here for S&G.

Qwantify/1.0 +

Does read robots.txt.

Why haven't I doxd this before? (I'm slow...)

Wondering who uses Qwant? We do too.

X'lent! A search engine with a sense of humor!

"The search engine that doesn't know anything about you Zero tracking of your searches. Zero sale of your personal data."

Does read robots.txt but does not adhere to it...

Calling themselves "The Anti-Counterfeiting Network":

Supporting members in their anti-counterfeiting strategies by providing customs – online – and market enforcement services at non-commercial fees. To support activities to protect all rights holders, consumers and governments against the negative consequences of the trade in counterfeited goods.


Yet another HTTP library.

It's been trying for variations of "/adminer.php". WTF? and, lately, "/.git/config". frack

Probably, the most popular REST API client library for .NET.

Okay. Yeah. Sure.

Screaming Frog SEO Spider/8.1

Only a few requests for root so far, so, we'll see...

Kinda funky, they request with a typical UA first, then with their UA. (Interesting.)


Just seen. An exploiter attacking "/login.cgi":


Yes, a WTF?

serpstatbot/2.1 (advanced backlink tracking bot;;

Does read robots.txt. Seems to be a heavy reader.

...crawls the web to add new links and track changes in our link database. We provide our users with access to one of the largest backlink databases on the market for planning and monitoring marketing campaigns.

Frack! Not another one! Oh, and really?:

"Does the bot crawl links with the rel = nofollow attribute? Yes, it scans."

RepoLookoutBot/1.0.0 (abuse reports to

Does not read robots.txt. (And they identify as a Robot!)

Repo Lookout is a large-scale security scanner, with a single purpose: Finding source code repositories, that have been accidentally exposed to the public and reporting them to the domain’s technical contact.

Yet another useless "We are here to help" bandwidth waster.

This bot will try every directory for ".git" shit. Stop wasting my time.


Does not read robots.txt. Been doing these:

"\x16\x03\x01" 404
"GET / HTTP/1.1" 200 of students at Esslingen University scanning the internet to gain insights into network security. If u want us to stop scanning your IP range, get in touch with us [email]...

robots.txt! ROBOTS.TXT! ROBOTS DOT TEXT!!! sheesh

InfoTigerBot/1.9 +

Does read robots.txt.

The "Independent, privacy respecting search engine". Looks very... inviting.

More than 30 years after the World Wide Web first saw the light of day at CERN, only very few search giants determine the results of all of our web search.


... we neither collect user data nor do we track users. For us - a matter of course.

Cool. In fact, they look like a "Very Good Thing!" (I never make recommendations, but InfoTiger merits a very close looking into.)

Applebot/0.1 +

Does read robots.txt.

Why haven't I seen these guys before? Weird. They can't be new.

SeznamBot/3.2 +

Does read robots.txt. Their own blurb calls them "a Czech on-line company running, besides other services, the web portal, which is the first place of choice for millions of Internet users from the Czech Republic."



Does read robots.txt.

The BuiltWith system visits a website to determine the technology profile it is using by looking at the publicly visible code on a website. Millions of people benefit from understanding how websites are built using BuiltWith's free technology profile lookup tool.

While that seems like BS, I'll give them the benefit of the doubt as they play nice.


Does read robots.txt.

YaCy is free software for your own search engine.

Interesting. But they have requested only a single URI here, one that has been gone for years (but one that is still linked to on some websites...).


Only POSTs to "/editBlackAndWhiteList" which is maybe a hardware CVE or something.

That URL/URI can be seen in this code:, which is some kind of Exploit code...

And, of course, it can be found in many a Website's online Server Logs. (Why do people do that? It serves no purpose. sigh)

The website lists this several times – but looking at their meta data results in more confusion.

httpx Open-source project (

Does not read robots.txt.

What they say:

httpx is a fast and multi-purpose HTTP toolkit allows to run multiple probers using retryablehttp library, it is designed to maintain the result reliability with increased threads.

Okee fine. But why are you reading my stupid little website? Oh, and what they say? Smells like plain 'ole horse hockey pucks.

NihilScio Educational search engine - +

Seen just three times, three days in March... They are what they say they are.


Since seen only twice and for an outdated link, this might not be a robot. And nobody else has that string in any webpage... I'll keep it here for S&G.


Does read robots.txt.

Only 5 hits this year. Another "can't be found by Web Search" Bot. (Why, really, do people place web server logs in the public sphere? It makes zero sense.)


Does read robots.txt. Aka bytedance.

I can't read Mandarin...

CATExplorador/1.0beta (sistemes at domini dot cat;

Does not read robots.txt. But only seen a few times this month. Spain.

fluid/0.0 +

Does read robots.txt.

An "Internet Marketing Research" company. I (do) like how they say of their Web hosting company: "The people there are wise and nice."

serpstatbot/2.1 (advanced backlink tracking bot;;

Does read robots.txt.

We provide our users with access to one of the largest backlink databases on the market for planning and monitoring marketing campaigns.

Huh? What's a backlink? Wait. Don't tell me. I don't want to know.

HTTP Banner Detection (

Does not read robots.txt. Only reads "/".

For network security research, we need to obtain the IP location Banner and fingerprint information, we detecting the common port openly or not by ZMap, and collecting opened Banner data by our own code. Any questions please do not hesitate to contact with us:

Ok. But not! [A Chinese company that needs a better English translator.]

go-resty/2.6.0 (

Does not read robots.txt. Aka MAndroid.

Only reads "/". (Likes to give HEAD.)

Two in one! First seen March, 2022. HEAD with the "go-resty", followed immediately by a GET with "MAndroid". WTF? Just seen so I'll wait for more before wasting my time on them.

Fuzz Faster U Fool v1.3.1

Does not read robots.txt, but, since the code is a tool "to discover potential vulnerabilities", they will say, "We are not a Spider, Luser..."

On Github.

Nmap Scripting Engine

Great. Nmap has a Scripting Engine. (The NSE ain't new, it's just that someone using it has decided to tell us that he/she/they has/have automated it and it found this stupid little website. sigh)

Oh, see for how it started.

webprosbot/2.0 (

Does not read robots.txt.

What they say:

WebPros delivers the most innovative technologies to enable the digital world. We bring together products and solutions to enable businesses to build, operate, and grow online. Our products help manage servers, websites, billing, and online marketing.

Not another one!

They say their brands include cPanel and Plesk. But why are they automating reading other people's websites?!?!

Dalvik/2.1.0 (Linux; U; Android 9.0; ZTE BA520 Build/MRA58K)

Does not read robots.txt.

Dalvik is Android's Java Virtual Machine (as most search results indicate). That is as far as I went in research. (Slowly seeing more from them.) Why I initially placed it here was that the UA is formatted as if a Robot...

Mozilla/5.0 (compatible; Wappalyzer)

Does not read robots.txt.

What they say:

Find out the technology stack of any website. Create lists of websites that use certain technologies, with company and contact details. Use our tools for lead generation, market analysis and competitor research.

Way cool! (Not.) More bandwidth wasting. sigh


Does read robots.txt.

Some kind of search engine "helper" or something; but looks interesting.

Pandalytics/1.0 (

Does read robots.txt.

What they say:

The most ccTLD-friendly Name Suggestion on the market. DomainsBot’s name suggestion is optimized to help meet your customers’ demand for local domains. Get a full picture of the domain and hosting market, discover better business opportunities and generate higher revenue.

Such dredge! Yet another waste of bandwidth.

Project-Resonance (

Does not read robots.txt.

What they say:

Internet wide surveys to study and understand the security state of Internet as well as facilitate research into various components / topics which originate as a result of our surveys.

Aw fuck. Yet another "White Hat" trying to protect me from myself. sigh

Further self-justifuckencation shit from them:

You are visting this page most probably because you saw this url in your logs. Well, nothing to worry. So, what Happened? You recieved a [sic] innocent HTTP request from one of our distributed research engine as a part of Project Resonance. We perform internet-wide security research and send non-malicious and non-intrusive requests for the same. We take special care of making sure no systems are negatively affected because of our scans.

I like (not) how they use the word "innocent".

And then there is this:

And if you would not like any of our further probes, please drop us an email at [email protected]. Please make sure that you include the list of IP Addresses / IP Ranges which you would like to get excluded. Once we hear from you, we will simply put your IP Ranges on our exclusion list and you will never see any probe from us.

No, there is something called the "Robots Exclusion Standard". (And the "[email protected]" thing means THEY DO NOT WANT TO BE SCANNED NEEDLESSLY! Needs Javascript enabled to display the address. It's a Cloudflare /cdn-cgi/l/email-protection thing...)
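
(The /cdn-cgi/l/email-protection obfuscation, as far as I can tell, is just a byte-wise XOR: the first hex byte of the data-cfemail string is the key, the rest is the address. A sketch under that assumption; the helper names are mine:)

```python
def cf_decode(hexstr):
    """Decode a Cloudflare-style data-cfemail hex string (first byte = XOR key)."""
    key = int(hexstr[:2], 16)
    return "".join(chr(int(hexstr[i:i + 2], 16) ^ key)
                   for i in range(2, len(hexstr), 2))

def cf_encode(email, key=0x42):
    """Inverse of cf_decode, handy for testing the round trip."""
    return f"{key:02x}" + "".join(f"{ord(c) ^ key:02x}" for c in email)

print(cf_decode(cf_encode("scan@example.org")))  # → scan@example.org
```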

Not needed. Not wanted. Thank you very much. Go the fuck away!


Does read robots.txt.

This is the "Internet Archive" bot; aka "Wayback Machine".

ArchiveTeam ArchiveBot/20210517.c1020e5 (wpull 2.0.3)

Does not read robots.txt. They will claim that they are not a spider, but they are.

What they say:

HISTORY IS OUR FUTURE And we've been trashing our history.

Really?!?! (But actually, WTF does that even mean?!?!)

Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage.

They have much more to brag about... They also say, via their wiki:

ArchiveBot is an IRC bot designed to automate the archival of smaller websites (e.g. up to a few hundred thousand URLs). You give it a URL to start at, and it grabs all content under that URL, records it in a WARC file, and then uploads that WARC to ArchiveTeam servers for eventual injection into the Internet Archive's Wayback Machine (or other archive sites).

So, how did they get to my pathetic, always changing, stupid little website? My shit does not need these kinds of "services", thank you very much.


UGH! Favicon draggers... sigh

They should adhere to the robots exclusion standard, should they not? Of course they should.

DuckDuckGo's does not.
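
For the record, honoring the standard means fetching /robots.txt and obeying rules as simple as:

```
# robots.txt – send every well-behaved crawler away:
User-agent: *
Disallow: /
```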

masscan/1.3 (

Does not read robots.txt. But they will say (they did say), "We are not a spider, Luser."

They do say they are a "TCP port scanner" that:

... spews SYN packets asynchronously, scanning entire Internet in under 5 minutes.

Sounds cool. Not.

And why me? Oh...

"It can also complete the TCP connection and interaction with the application at that port in order to grab simple "banner" information."

Scanners! Analyzers! SEOs! Oh my! When are these fucking people going to stop! Not necessary here! Stop stealing my bandwidth! Stop slowing the Internets to a crawl!

Gather Analyze Provide.

Does not read robots.txt.

"Global Digital Network Plus scours the global public internet for data and insights. To accomplish this, GDNP sends packets to all IPv4 IP addresses. While far within legal boundaries, sometimes our benign research initiatives are mistaken for malicious network reconnaissance. If you are interesting in removing your organization’s IP space from within our scope, please send us an email at"

I have to ask you to remove my "organization’s IP space from" your scope? Oh, please.

ThinkChaos/0.3.0 In_the_test_phase,_if_the_ThinkChaos_brings_you_trouble,_please_add_disallow_to_the_robots.txt._Thank_you.)"

Does not read robots.txt (kind of ironic, donchta think?).


Does not adhere to robots.txt. (I am positive, that if you ask them, they will say, "We are not a spider, Luser"...)

From their gitshit, I mean github:

ZGrab is a fast, modular application-layer network scanner designed for completing large Internet-wide surveys. ZGrab is built to work with ZMap (ZMap identifies L4 responsive hosts, ZGrab performs in-depth, follow-up L7 handshakes). Unlike many other network scanners, ZGrab outputs detailed transcripts of network handshakes (e.g., all messages exchanged in a TLS handshake) for offline analysis.

That's a huge WTF as that all is technobabble.


ZGrab is commonly used for penetration testing, security assessment, or vulnerability scanning. Target users for this tool are pentesters.

Okay, but why my website? And here's the shit they requested this month (most root requests removed):

"GET / HTTP/1.1" 200 7666 "-" "Mozilla/5.0 zgrab/0.x"
"GET / HTTP/1.1" 200 7666 "-" "Mozilla/5.0 zgrab/0.x"
"GET /portal/redlion HTTP/1.1" 400 177 "-" "Mozilla/5.0 zgrab/0.x"
"GET /actuator/health HTTP/1.1" 400 177 "-" "Mozilla/5.0 zgrab/0.x"
"GET /hudson HTTP/1.1" 400 177 "-" "Mozilla/5.0 zgrab/0.x"
"GET /manager/text/list HTTP/1.1" 410 4 "-" "Mozilla/5.0 zgrab/0.x"
"GET /manager/html HTTP/1.1" 410 4 "-" "Mozilla/5.0 zgrab/0.x"
"GET /portal/redlion HTTP/1.1" 400 177 "-" "Mozilla/5.0 zgrab/0.x"
"GET /actuator/health HTTP/1.1" 400 177 "-" "Mozilla/5.0 zgrab/0.x"
"GET /hudson HTTP/1.1" 400 177 "-" "Mozilla/5.0 zgrab/0.x"

Not exactly looking like "good guys," eh?

CensysInspect/1.1 +

Does not adhere to robots.txt.

What they say:

Your cloud is bigger, wider, and more vast than you know; your internet assets innumerable. Censys is the proven leader in Attack Surface Management by relentlessly searching and proactively monitoring your digital footprint far more broadly and deeply than ever thought possible.

They go on with their bullshit:

Censys ASM provides a comprehensive profile of the IT assets on the internet, we empower defenders.....

Two things: Who are they fooling and why do they access my pathetic little website?

They also do a request without a user agent string, which is a 400. Then they immediately make a request with their UA; which gets 'em a 403...

Linux Gnu (cow);

Does not adhere to robots.txt, but probably not a spider...

Just gets root about 20 times per month from two IP addresses.

Funny, since first seen months ago, no one seems to have written about this... Whatever it is.

(But one will see many "hits" as oh so many people/sites make their server log files public. Why would anyone do that? Who/What does that help?)

Oh, here's one other hit: . While they have a Website that is really well designed, it simply displays a single request's data as Json. It's lamer than this...

Linespider/1.1 +

Does adhere to robots.txt. (Redirects to

Linespider is a Web crawler that provides a wide range of search results for LINE services...

WTF is/are "LINE services"? Then I realized that it is the Bot for LINE, a Japanese messaging App.

LINE has grown into a social platform with hundreds of millions users worldwide, having a particularly strong focus in the rapidly advancing continent of Asia.

Why they need a Bot, though, they do not say.

Baiduspider/2.0 +

Does not adhere to robots.txt.

While sometimes called the "Google of China," they are very annoying, as not only do they not read robots.txt, they also sometimes mis-identify themselves.


Does not adhere to robots.txt.

I almost missed this one. It was in the "ssl_log" log file for last month... A duckduckgo search resulted in:

Not many results contain flfbaldrbot
debilsoft IP-Logger PRO Web analytics
[Search domain]
[MAP] [Wiki] United States. FlfBaldrBot/1.0.
No more results found for FlfBaldrBot.

Funny thing about the one hit – "IP-Logger PRO; visitor data & web analystics" – it is dynamically generated so my visit did not see that bot in their logs. debilsoft's logger page is well formed and easy to read. Kinda nice.


Does not adhere to robots.txt.

Their UA is the actual string below.

NetSystemsResearch studies the availability of various services across the internet. Our website is

From their main page:

Net Systems Research is an independent research organization focusing on a range of topics in internet security including IoT Proliferation, Zero Trust Networking, Network-Level Security, Cyber Risk Modeling and External Network Security Measurement. We focus on surveying and analyzing real world network systems to better understand and study challenging internet security problems. Through our research, we hope to improve the current understanding of the global internet’s security and promote better network security practices.

Wow! That's bold! But is it just marketing bullshit? You betcha!

What really bugs me about these kinds of "We are here to help!" websites is:

  1. They do not adhere to the robots.txt standard – it's a "Standard."
  2. They say "If you would like your IP ranges or domains to be excluded from our studies, please contact us at with the IP ranges and/or domains and any associated ownership information that is relevant to processing your request."
  3. Number 2 is a big "Fuck You you arrogant jerks," from my view. I run static, non-services websites, and I do not need anyone's "help" to run them.
  4. Since they do not adhere to such a basic web standard as robots.txt, how can they be trusted for anything?

DataForSeoBot/1.0 +

Does adhere to robots.txt. A pay for SEO.

From their main page:

Powerful API Stack For Data-Driven Marketers.

"We provide comprehensive SEO and digital marketing data solutions via API. Everything your SEO software requires — in one place."

Ah, no.

InfoTigerBot/1.9 +

Does adhere to robots.txt.

Search engine.

Independent, privacy respecting search engine... A text only search engine, covering two languages (English+German).

ZoominfoBot (zoominfobot at zoominfo dot com)

Adheres to robots.txt.

From their main page:

Don’t just go to market, own your market. Accelerate your pipeline with ZoomInfo’s portfolio of solutions that combine B2B intelligence & company contact data with engagement software, and dynamic workflows. Pump the richest B2B data into your tech stack or take advantage of ZoomInfo’s fully-loaded suite of applications to reach your buyers faster.

From their FAQ:

ZoomInfo is used by salespeople, marketers, and recruiters to optimize their lead generation efforts by providing them access to a vast business contact database and numerous sales intelligence and prospecting tools.

Who buys this dredge...

Twingly Recon-Klondike/1.0 (+

Does not read robots.txt.

A Search API:

Twingly Blog Search API is a commercial XML over HTTP API that enables machine access to Twingly’s blog search index.

It is very interesting. From their Terms of Use:

Twingly is a Search Engine for Conversational Media such as Blogs. Our API and Widgets are free for personal use, and we offer paid licenses for commercial use. You can use Twingly Widgets without registering, but in doing so you accept these terms of use.

But they do not seem to have any – as my logs show – support for robots.txt. Their main page is full of dredge like:

We keep track of updates from millions of online sources like blogs, forums, news, etc. Our focus is a broad coverage that includes all significant sources in each country. Through our easily integrated APIs, you get access to all that social data at your fingertips!

My fingers just blocked you!

Mail.RU_Bot/2.0 +

Adheres to robots.txt.

They have been around for a long time. And I have no idea what they do.

DotBot/1.2 +;

Adheres to robots.txt.

Redirects to, which says:

Enter the URL of the website or page you want to get link data for. Create a Moz account to access Link Explorer and other free SEO tools. Get a comprehensive analysis for the URL you entered, plus much more!

I think not! Plus much more!

Editorial: Light, but they get robots.txt and one more link about 80 times a month. What's up with that? Oh, and they scarf/snarf all the available ZIP files. That's fucked up. But I am being nice and just putting them on the list...

Googlebot/2.1 +

UPDATE: I am now seeing Google results with ""...

I no longer use Google for searches. Hover over their generated links and Google LIES! They do not reflect the true URL, as all of them go directly to Google with a ton of META data they use to track you before redirecting to the actual result. That is DISHONEST.


Adheres to robots.txt.

We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone.

hrankbot/1.0 +

Does not read robots.txt.

Which web hosting is better? We rank 300 Shared Web Hosting Providers by Uptime, Response Time and other features. Now we know for sure!

Yer blocked fer sher!

Barkrowler/0.9 +

Does read robots.txt.

Using Babbar, SEO gets easier. Thanks to Babbar’s data and metrics, uncover the strengths and weaknesses of your site and its competitors. Babbar helps you set up truly effective link building strategies thanks to its understanding of link and page semantics.

sigh Where do these people come from? Marketing 101 I guess. (Which just means the BS is good looking.)


Adheres to robots.txt.

No one seems to know who/what they are. Verisign hosted. Many "IPS Insurance Agent" related pages. Could also mean Intrusion Prevention Systems. It's a WTF?

+

Does, and does not, adhere to robots.txt.

Web Search.

[18/Oct/2021:23:47:20] "GET / HTTP/1.1" 200 11585 "-" "Mozilla/5.0 (compatible;; +"
[18/Oct/2021:23:47:22] "GET /robots.txt HTTP/1.1" 200 26 "-" "Mozilla/5.0 (compatible;; +"

They get the root page and THEN get robots.txt. WTF? (But that could be an Apache thing.)

SEOkicks +

Does adhere to robots.txt.


Adsbot/3.1 +

Does adhere to robots.txt.

I have zero need, and less tolerance for, "SEO" Bots and their shady services.

Sogou web spider/4.0 (+

Does, and does not, adhere to robots.txt. Again, why get root and THEN get robots.txt?

[18/Oct/2021:09:48:07] "GET / HTTP/1.1" 200 11585 "-" "Mozilla/5.0 (compatible;"
[18/Oct/2021:09:48:10] "GET /robots.txt HTTP/1.1" 200 26 "-" "Mozilla/5.0 (compatible;"

(I did think Apache "log issues" but by three seconds? I don't know.)

More dredge: transforms the internet into a structured database of web data. Our technology produces rock-solid insights today to empower your decisions for tomorrow. Start your free trial

No thanks.

AhrefsBot/7.0 +

Does adhere to robots.txt.

An SEO for paying customers to keep tabs on their own website. Therefore, they have no reason to crawl my websites.

ahrefs is an All-in-one SEO toolset, with free Learning materials and a passionate Community & support

"AhrefsBot is a Web Crawler that powers the 12 trillion link database for Ahrefs online marketing toolset. It constantly crawls web to fill our database with new links and check the status of the previously found ones to provide the most comprehensive and up-to-the-minute data to our users.

Link data collected by Ahrefs Bot from the web is used by thousands of digital marketers around the world to plan, execute, and monitor their online marketing campaigns.

DomainStatsBot/1.0 (

Does adhere to robots.txt.

Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7143; Pro)

Does not read robots.txt.

Seen last week for the first time and just one GET / HTTP/1.1. Weird.


Does not read robots.txt.

This is their new UA:

Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to:

WOW and WTF!

Via "I'm not one of their customers, so why are they all over my websites like a rash?"

Well, they are checking to see if their customers are mentioned on your website...

Their website says Expanse "protects around 10% of the overall Internet."

Yeah, right...

ALittle Client

Does not adhere to robots.txt.

All requests are for Wordpress exploits.

ThinkChaos/0.3.0 +In_the_test_phase,_if_the_ThinkChaos_brings_you_trouble,_please_add_disallow_to_the_robots.txt._Thank_you.

Does not adhere to robots.txt despite what it says.

Gets just "/" so far. Has footprints on the web as a developer(s) on Github and Stack Overflow. Saw this: "I just noticed a new user-agent string called ThinkChaos out of Tencent IP..."

A WTF as far as I can see.

SemrushBot/7~bl; +

Does adhere to robots.txt.

This is a weird one. Just gets a few pages over and over all month long, using an ill-formed URL. Still trying them even after a few weeks of 404's.

PetalBot +

Does adhere to robots.txt.

A search engine owned by Chinese telecom Huawei.


Does adhere to robots.txt.

BLEXBot/1.0 +

Does (lately not) adhere to robots.txt. Pay for SEO...

The BLEXBot crawler is an automated robot that visits pages to examine and analyse the content, in this sense it is similar to the robots used by the major search engine companies.

Um, okay, but:

BLEXBot assists internet marketers to get information on the link structure of sites and their interlinking on the web, to avoid any technical and possible legal issues and improve overall online experience. To do this it is necessary to examine, or crawl, the page to collect and check all the links it has in its content.

Not mine.

MojeekBot/0.10 +

Does adhere to robots.txt.


Does adhere to robots.txt.

Since it is all in Mandarin, I can't tell what they do.

The community does not like this one.

SEOkicks +

Does adhere to robots.txt.

What they say:

SEOkicks continuously collects link data with its own crawlers and makes them available via website, CSV export and API. The current index comprises more than 200 billion link data records.


DotBot/1.2 +

Does adhere to robots.txt. Redirects to

"Your All-In-One Suite of SEO Tools The essential SEO toolset: keyword research, link building, site audits, page optimization, rank tracking, reporting, and more."

Yeah, whatever. But, ah... Why?

YandexBot/3.0 +

Does adhere to robots.txt.

Yandex is a technology company that builds intelligent products and services powered by machine learning. Our goal is to help consumers and businesses better navigate the online and offline world. Since 1997, we have delivered world-class, locally relevant search and information services. Additionally, we have developed market-leading on-demand transportation services, navigation products, and other mobile applications for millions of consumers across the globe.