Disallow author with robots.txt

I would like to disallow author pages (of posts, for example) in the robots.txt file. I noticed that for some of my sites, Google has indexed the authors of articles, which I don't want. Does anyone know whether this is possible? (I tried something like Disallow: /author/ but I'm not sure it works...) Thank you in advance, ArbreMojo.
Category: Web
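
A rule such as Disallow: /author/ can be sanity-checked locally before relying on it. The following is a minimal sketch using Python's standard urllib.robotparser; the host example.com and the sample paths are hypothetical, and real crawlers may interpret rules differently from this parser. It only checks whether crawling is permitted, which is not the same thing as whether a URL ends up indexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content built around the rule from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /author/
"""


def is_allowed(path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # example.com is a placeholder host; only the path is matched against the rules.
    return parser.can_fetch(agent, f"https://example.com{path}")


if __name__ == "__main__":
    # Sample paths are made up for illustration.
    for path in ("/author/arbremojo/", "/2020/01/some-post/"):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

If the author archive on a given site uses a different URL base, the Disallow prefix would need to match that base instead.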

Virtual robots.txt missing

On every WordPress installation there is a default virtual robots.txt file with the following: User-agent: * Disallow: /wp-admin/ Allow: /wp-admin/admin-ajax.php but on my last 2 projects /robots.txt gives me a 404 instead. I tried disabling all plugins. I tried switching to a blank theme. I looked at another, older project, which I updated last week, and robots.txt is working there. This is not server-related, because it works in my local environment with the older project. Any idea why …
Category: Web

Use a filter for wp_robots to block CPT/feed/

We have a CPT (with base slug templates) that somehow added a link to /templates/post/feed/ to every single post's head. I've removed this feed link from the head, but I also want to block robots from the feed. From this answer, I can adapt code like: add_filter( 'wp_robots', function( $robots ) { if ( is_singular( 'templates' ) ) { $robots['noindex'] = true; $robots['nofollow'] = true; } return $robots; } ); but I don't know how to include is_feed() in this …
Category: Web

Why does do_robots() Allow: /wp-admin/admin-ajax.php by default?

This is happening with a fresh, new WordPress install. But all you need to do is look at the crazy do_robots() function, which outputs the following... User-agent: * Disallow: /wp-admin/ Allow: /wp-admin/admin-ajax.php Sitemap: https://knowingart.com/wp-sitemap.xml Sending robots to "wp-admin" (by default) makes no sense: Why would I send random robots into my admin directory?! The admin directory is for me, and me alone. Robots/crawlers should not be in my admin directory doing anything. Why would I DOS attack myself by sending …
Category: Web
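
One way to read the quoted default rules, assuming a crawler applies the longest-match precedence described in RFC 9309 (the most specific matching rule wins, and Allow wins a tie): everything under /wp-admin/ stays blocked, and the Allow line only carves out admin-ajax.php. The sketch below illustrates that rule. It is a simplified illustration, not WordPress's or any crawler's actual implementation; it handles plain prefixes only (no `*` or `$` wildcards), and the sample paths are hypothetical.

```python
from typing import List, Tuple

# Rules from the default output quoted above, as (directive, path prefix) pairs.
DEFAULT_RULES: List[Tuple[str, str]] = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(path: str, rules: List[Tuple[str, str]] = DEFAULT_RULES) -> bool:
    """Longest-match evaluation: the longest matching prefix wins; Allow wins ties.

    Simplification (assumption): plain prefix matching, no wildcard support.
    """
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


if __name__ == "__main__":
    for path in ("/wp-admin/options.php", "/wp-admin/admin-ajax.php", "/about/"):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Under these assumptions, the Allow line narrows the Disallow rather than inviting crawlers into the admin directory.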

WordPress site using Yoast SEO being blocked from indexing by robots.txt

Morning all! I've just built a website for a client, using WordPress and the Yoast SEO plugin (free version). https://www.fly2help.org/ Whilst they're really happy with the website, I've noticed an issue when trying to submit it to search engines, such as Google, for indexing. It is continually being blocked, with an error like this: Crawl allowed? error No: blocked by robots.txt I'm not 100% sure why this is the case. I've checked the robots.txt file and there's nothing in it …
Category: Web

How to stop Google from indexing spammed internal search results?

Google is indexing all kinds of spammed internal search results from my WordPress site, and I can't figure out how to put a stop to it. I added Disallow: /search/ to my robots.txt file, but that didn't stop Google. The results are still indexed, but now they are reported as "Indexed, though blocked by robots.txt". What can I do to stop these from being indexed? And how do they even show up for Google? Is there somewhere within WordPress that these are being …
Category: Web

"No sitemap linked in your robots.txt file"

Relatively new to anything that doesn't require common sense in WordPress. My previous site was built poorly, so, cutting a long story short, I installed a new theme and started building up pages again. I have run various SEO diagnostics while working and seem to have problems with indexability and sitemaps. I am getting this message on the onpage.org site check and had something similar on another platform, whereby it says there are resources that …
Category: Web

Set noindex on comment pages 2, 3 and more?

I use the All in One SEO Pack plugin. I see that Google has indexed comment pages (pages generated from paginated comments). Example URLs: Canonical URL - mydomain.com/url-canonical Comment page URL - mydomain.com/url-canonical/comment-page-3 The problem is that the comment page URL has itself as canonical, not the main page. I'd like to set noindex on all comment pages for all posts. Is it possible? I see that AIO SEO has aioseo_filter_robots_meta, but I do not understand …
Category: Web

How to prevent Google from indexing the /wp-content/ directory?

I received an error from Google saying that I have mobile usability errors on my site. The "page" that they are referring to is www.example.com/wp-content/plugins/ag-admin. I'm not asking for help with that plugin, but trying to figure out how to make Google not index that directory at all. Is disallowing this in my robots.txt enough, or should I even have to do that?
Category: Web

Why does Google Image Search still display my images?

I am having trouble with a few image files that I would like to block for Google Image search. I am using Yoast SEO Plugin, which allows me to directly edit the robots.txt file. A few people with portrait images on the site requested not to appear in image search, so we added entries like the following: User-Agent: Googlebot-Image Disallow: /wp-content/uploads/2019/03/domain.eu-adam.png Disallow: /wp-content/uploads/2020/02/domain.eu-team-eve.jpg Some time later, we noticed that this is not working anymore, as the images started to reappear …
Category: Web
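
Before looking for other causes, it can help to confirm that the quoted group parses the way it was intended. The sketch below is a local check with Python's standard urllib.robotparser, under assumptions: the host example.com and the path other.png are hypothetical, and Google's own matching may differ from this parser. It only shows which user agents the rules apply to, not why images would reappear in search.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in the question; the file names are the asker's examples.
ROBOTS_TXT = """\
User-Agent: Googlebot-Image
Disallow: /wp-content/uploads/2019/03/domain.eu-adam.png
Disallow: /wp-content/uploads/2020/02/domain.eu-team-eve.jpg
"""


def may_fetch(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # example.com is a placeholder host; only the path is matched.
    return parser.can_fetch(agent, f"https://example.com{path}")


if __name__ == "__main__":
    blocked_image = "/wp-content/uploads/2019/03/domain.eu-adam.png"
    print(may_fetch("Googlebot-Image", blocked_image))
    print(may_fetch("Googlebot-Image", "/wp-content/uploads/2019/03/other.png"))
    # No group in the file names this agent, so nothing restricts it here.
    print(may_fetch("Googlebot", blocked_image))
```

In this parser, the group restricts only the agent it names, so a different crawler fetching the same files is not covered by these rules.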

Robots.txt not updating

When I search Google for my web address, the following comes up as the meta description: "A description for this result is not available because of this site's robots.txt – learn more." I have tried to edit the website's robots.txt (I downloaded the WP Robot plugin), and although I have changed it to User-agent: * Disallow: /wp-admin/ Disallow: /wp-includes/ the robots.txt still appears as User-agent: * Disallow: / I have Yoast SEO, WP Robots, and Salient's Theme Nectar. If …
Category: Web

Bing/MSN bots are heavily requesting random pages of my website

I am facing a big problem with my server. I have a website that keeps getting massive numbers of page requests from the "Bing/Msn" bot every second or two, and the IP changes now and then. This is putting a heavy load on my server; my CPU is constantly over 90%. I tried to block the bot from htaccess and robots.txt, but they don't seem to have any effect. If anyone has an idea how to defeat this it would be much …
Category: Web

Robots.txt file not updating

The robots.txt file on one of our websites is not getting updated. I deleted the .htaccess and robots.txt files from the root folder, following one of the suggested answers to "Why does Google Webmaster Tools say my whole site is blocked by robots.txt when I have no robots.txt?" I thought it might be a plugin conflict, so I completely deactivated Yoast SEO and the security plugin that affects the file, but the robots.txt file still displays as User-agent: * Disallow: / User-agent: Googlebot-Image Disallow: / Any other suggestion …
Category: Web

Old robots.txt file not changing, can't update to the current robots.txt

While my WordPress website was in production, I created a robots.txt file to disallow everything. When the site was ready, I deleted the robots.txt file through cPanel and never thought much about it. Recently, I realized that the website was not showing up in Google search results. Upon further investigation, I realized that the old robots.txt file was still there (even though I can't locate the file in my root folder). http://youngauthors.my/robots.txt I inserted another robots.txt file in my root …
Category: Web

Why are some user agents still getting a meta robots tag with noindex from my site?

A friend pointed out to me that a website is not in Google because of the presence of a <meta name='robots' content='noindex,follow' /> tag in the home page. Ok, should be easy enough to turn that off, right? Wrong. Site uses Genesis framework, and in the SEO settings, none of the options are enabled to generate noindex. Go to the individual page (homepage is a page, not a post), look there, none of the noindex options are checked there either. …
Category: Web

Where to put robots.txt and sitemap.xml for WordPress installed in its own directory

My website is set up so that WordPress is installed in its own subdirectory, but the content is served as though it's at the domain root. (This WordPress Codex page details the configuration.) WordPress URL: https://www.example.com/wordpress Site URL: https://www.example.com So in this case, URLs get rewritten to NOT include /wordpress in the URL that pages are served from (although image URLs do include it: https://www.example.com/wordpress/wp-content/uploads/...). The URL for a page is: www.example.com/mypage/ NOT: www.example.com/wordpress/mypage/ but an image link is: www.example.com/wordpress/wp-content/uploads/2018/12/25/image1.jpg Question: Do I put robots.txt …
Category: Web

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.