Why does do_robots() Allow: /wp-admin/admin-ajax.php by default?

This is happening on a fresh WordPress install. All you need to do is look at the crazy do_robots() function, which outputs the following...

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://knowingart.com/wp-sitemap.xml

Sending robots to wp-admin (by default) makes no sense:

  1. Why would I send random robots into my admin directory?! The admin directory is for me, and me alone. Robots/crawlers should not be in my admin directory doing anything.
  2. Why would I DoS myself by sending robots to an admin script?
  3. If I were a bot developer, I might (wrongly) read this Allow as negating the Disallow, because the Allow comes after the Disallow and covers the same directory. Why does robots.txt contradict itself here?
  4. This weird default robots.txt seems to confuse DuckDuckGo. For example, the top search result for my site is my wp-admin directory?! It appears DDG read my robots.txt, went into wp-admin because the file mentions it, and indexed the wrong directory. Was DDG's crawler confused by the weird robots.txt file? Then again, DDG may simply have crawled my blog before any content was available and not updated yet; that seems the more likely explanation.

Why does WordPress send robot crawlers to an admin directory that has no content?! It makes no sense to me, which is why I am here trying to figure it out. I can only imagine the author of the do_robots() code doesn't understand the purpose of robots.txt.

Topic: robots.txt, hosting, wp-admin, SEO, WordPress

Category: Web


The truth is that probably nothing should be blocked in robots.txt by core (IIRC this was Joost's position on the matter), as WordPress is an open platform and front-end content and styling can be generated from all kinds of directories that might not make much sense to you and me. WordPress is not in the business of preventing site owners from installing badly written plugins.

Why do you have admin pages indexed by a search engine? WordPress sends a kind of "don't index" signal for all admin pages, so most likely you have some badly written code that prevents it from being sent. (This assumes there is no bug in Bing, which is the search engine that powers DDG.)

Maybe it's worth remembering that robots.txt is just an advisory file; it is up to each search engine to decide whether and how to respect it. IIRC Google will not fully respect it if there is a link to a page that is supposed to be excluded by robots.txt.
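That "don't index" signal is a robots meta tag (and, on some endpoints, an X-Robots-Tag HTTP header). A sketch of how a crawler-side check for it might look, in Python against hard-coded sample responses (the header and markup below are illustrative, not captured from a real site):

```python
def is_noindex(headers: dict, html: str) -> bool:
    """Return True if a response opts out of indexing, either via an
    X-Robots-Tag header or a robots meta tag (case-insensitive)."""
    tag = headers.get("x-robots-tag", "").lower()
    if "noindex" in tag:
        return True
    # Crude substring check, good enough for a sketch; a real crawler
    # would parse the HTML and inspect the meta tag's content attribute.
    low = html.lower()
    return 'name="robots"' in low and "noindex" in low

# Sample responses resembling what an admin page vs. a public post returns
print(is_noindex({"x-robots-tag": "noindex, noarchive"}, ""))          # True
print(is_noindex({}, '<meta name="robots" content="noindex, follow">'))  # True
print(is_noindex({}, "<p>public post</p>"))                             # False
```

If admin pages show up in a search index anyway, checking for this signal on the affected URLs is the first diagnostic step.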


First: read up on robots.txt, if you haven't. Yoast also publishes a good reference.

The sample you posted:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://knowingart.com/wp-sitemap.xml

...tells all user-agents (i.e., all crawlers) that they are not allowed to crawl anything in your wp-admin directory except /wp-admin/admin-ajax.php, which WordPress requires for properly functioning AJAX (front-end features post to admin-ajax.php even for logged-out visitors).

It's not telling robots to go to your /wp-admin/ directory; it's telling them they're unwelcome there (except for the AJAX requirement).
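As for the apparent contradiction in question #3: under the current robots.txt standard (RFC 9309), rule order doesn't matter; the longest matching rule wins, and Allow wins a length tie. So the longer `Allow: /wp-admin/admin-ajax.php` overrides the shorter `Disallow: /wp-admin/` for that one file only. A minimal sketch of that longest-match logic (my own illustration with plain prefix matching; real parsers also support `*` and `$` wildcards, and this is not WordPress code):

```python
def allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide if a path may be crawled, using RFC 9309 precedence:
    the longest matching pattern wins; Allow wins a length tie."""
    best_len, best_allow = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if path.startswith(pattern) and len(pattern) >= best_len:
            # On a length tie, prefer Allow over Disallow
            if len(pattern) > best_len or kind == "allow":
                best_len, best_allow = len(pattern), kind == "allow"
    return best_allow

# The rules from the robots.txt in the question
rules = [("disallow", "/wp-admin/"),
         ("allow", "/wp-admin/admin-ajax.php")]

print(allowed("/wp-admin/options.php", rules))     # False: blocked
print(allowed("/wp-admin/admin-ajax.php", rules))  # True: the Allow is longer
print(allowed("/blog/my-post", rules))             # True: no rule matches
```

A well-behaved crawler therefore stays out of everything under /wp-admin/ except that single endpoint; the Allow is a narrow exception, not an invitation.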

As for your question #4, I'm afraid I don't have an answer to that one.
