TannerRitchie Web Applications

Advanced Web Application Development in the GTHA

Problem with default WordPress robots.txt

A WordPress site we administer had the following automatically generated robots.txt file (in a fairly vanilla installation).


User-agent: *
Disallow:
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-login.php
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /comments

Note the lines that block crawling of the themes and plugins folders. This used to be fine, but Google now analyses CSS and JS (especially when making important judgements about mobile rendering), so blocking these folders can have a drastic effect on your site's indexing: to Googlebot, the page will appear to be plain, unstyled HTML.

It was therefore important to add an old-style, hardcoded robots.txt file in the root directory (a physical file there stops WordPress from generating one automatically):


User-agent: *
Disallow: /wp-admin/*.php$
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /wp-comments
Disallow: /cgi-bin

Even this is probably overkill. Note that I am also no longer blocking /wp-includes (where shared libraries like jQuery are kept).
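To sanity-check the difference between the two files, here is a minimal Python sketch of the "longest matching rule wins" behaviour that Google documents for robots.txt, including the `*` and `$` wildcards used above. It is a simplified approximation and not Googlebot's actual parser: it only reads the `User-agent: *` group, it measures rule length in characters, and the theme name `mytheme` and the sample paths are made up for illustration.

```python
import re

DEFAULT_ROBOTS = """\
User-agent: *
Disallow:
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-login.php
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /comments
"""

REPLACEMENT_ROBOTS = """\
User-agent: *
Disallow: /wp-admin/*.php$
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /wp-comments
Disallow: /cgi-bin
"""


def parse_rules(robots_txt):
    """Collect (is_allow, pattern) pairs from the 'User-agent: *' group."""
    rules = []
    in_star_group = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_star_group = value == "*"
        elif in_star_group and field in ("allow", "disallow") and value:
            # An empty "Disallow:" matches nothing, so it is skipped.
            rules.append((field == "allow", value))
    return rules


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' and trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching pattern wins; on a tie Allow wins; no match = allowed."""
    best_len, allowed = -1, True
    for is_allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Hypothetical paths, chosen only to exercise the rules above.
SAMPLE_PATHS = [
    "/wp-content/themes/mytheme/style.css",
    "/wp-includes/js/jquery/jquery.js",
    "/wp-admin/options.php",
]

for name, text in (("default", DEFAULT_ROBOTS), ("replacement", REPLACEMENT_ROBOTS)):
    rules = parse_rules(text)
    for path in SAMPLE_PATHS:
        verdict = "allowed" if is_allowed(rules, path) else "blocked"
        print(f"{name:12} {path:40} {verdict}")
```

A hand-rolled matcher is used here rather than Python's `urllib.robotparser` because that module applies the first matching rule and does not interpret `*` or `$` inside paths, so it would not reflect how the two files above are treated by Google.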

Using robots.txt as some sort of security by obscurity is never going to be successful. However, it does prevent accidental indexing of some application files without blocking your theme files.