Can the files robots.txt and sitemap.xml be dynamic through an .htaccess redirect?

by Cesar, Last Updated August 30, 2019 16:04

I have a multilanguage, multidomain site. It runs from a single CMS installation (Drupal), so I have a single root directory. As far as I know, with a static robots.txt I can only publish the rules for a single domain, since every domain gets the same file.

Could I put a line in .htaccess

Redirect 301 /robots.txt /robots.php

(or an equivalent instruction; please indicate which one, if this is allowed)

so that it redirects to a dynamic PHP file, where I can serve different content according to $_SERVER['HTTP_HOST']?

And the same question for sitemap.xml: could I serve a dynamic sitemap.php that lists different links for each domain?

The problem with using plain static .txt and .xml files is, as mentioned, that all the domains share a single physical directory on the server.



4 Answers


Yes, the same way any request can be "dynamic".

However, you would not redirect (as in your example code); you should internally rewrite using mod_rewrite (the same as what Drupal is probably already doing).

For example, in your root .htaccess file:

RewriteEngine On
RewriteRule ^robots\.txt$ robots.php [L]

RewriteEngine should only occur once (although it doesn't really matter if it occurs multiple times).

You just have to make sure that it doesn't conflict with any other directives in your .htaccess file. So, this should probably be near the start of the file, certainly before your front controller.
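
For reference, a minimal sketch of what such a robots.php could look like, serving different rules per host. The host names and rules below are only placeholders for illustration, not something Drupal provides out of the box:

<?php
// robots.php - serve different robots.txt rules depending on the requested host.
// The hosts and rules below are placeholders; replace them with your own.
header('Content-Type: text/plain; charset=UTF-8');

$host = strtolower($_SERVER['HTTP_HOST']);

$rules = array(
    'www.example.com' => "User-agent: *\nDisallow: /admin/\nSitemap: http://www.example.com/site-a-sitemap.xml\n",
    'www.example.org' => "User-agent: *\nDisallow: /private/\nSitemap: http://www.example.org/site-b-sitemap.xml\n",
);

// Fall back to a restrictive default for unknown hosts.
echo isset($rules[$host]) ? $rules[$host] : "User-agent: *\nDisallow: /\n";

Because this is an internal rewrite, crawlers keep requesting /robots.txt and never see the .php extension.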

MrWhite
October 22, 2015 09:23 AM

You can make any file dynamic. The best way to do so is not through redirects, but through rewrite rules.

RewriteRule ^robots\.txt$  /robots.php [L]

That way, you power it with a dynamic script, but the URL doesn't change. Most crawlers (including Googlebot) will follow redirects for robots.txt, but some crawlers will get confused if you introduce redirects.

Note that even if you power it with PHP, your robots.txt should appear to be static to each crawler for each domain. It is fine to serve different content for different domains, or even for different user agents. However, serving different content randomly, or based on the time of day, can really confuse search engine crawlers and mess up your SEO.


Sitemaps can be named however you want. You could redirect those, or use a rewrite rule to power them dynamically at the same URL. You can also name them like

  • site-a-sitemap.xml
  • site-b-sitemap.xml
  • site-c-sitemap.xml

Then refer to them in robots.txt:

Sitemap: http://www.example.com/example-sitemap.xml

or submit them to the search engines manually through their webmaster tools or search console.
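
If you do prefer a single dynamic endpoint rather than separate static files, a rough sketch of a sitemap.php (internally rewritten from sitemap.xml in the same way as robots.txt above, e.g. RewriteRule ^sitemap\.xml$ sitemap.php [L]) might look like the following. The hosts and paths are invented placeholders; a real Drupal site would query its own content instead:

<?php
// sitemap.php - emit a per-domain XML sitemap based on the requested host.
// The hosts and paths below are placeholders; pull real URLs from the CMS.
header('Content-Type: application/xml; charset=UTF-8');

$host = strtolower($_SERVER['HTTP_HOST']);

$pathsPerHost = array(
    'www.example.com' => array('/', '/about', '/contact'),
    'www.example.org' => array('/', '/acerca-de', '/contacto'),
);
$paths = isset($pathsPerHost[$host]) ? $pathsPerHost[$host] : array('/');

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($paths as $path) {
    echo '  <url><loc>http://' . htmlspecialchars($host . $path) . "</loc></url>\n";
}
echo "</urlset>\n";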

Stephen Ostermiller
October 22, 2015 09:23 AM

Making the sitemap file dynamic is fine -- it's a good way to auto-update your sitemaps.

Making the robots.txt file dynamic (for the same host! Doing this for separate hosts is essentially just a normal robots.txt file for each of them.) would likely cause problems: it's not crawled every time a URL is crawled from the site, so it can happen that the "wrong" version is cached. For example, if you make your robots.txt file block crawling during business hours, it's possible that it's cached then, and followed for a day -- meaning nothing gets crawled (or alternately, cached when crawling is allowed). Google crawls the robots.txt file about once a day for most sites, for example.

John Mueller
October 28, 2015 07:00 AM

There is no need to create a sitemap.php, because:

1. For each language you can run a separate sitemap.xml file and specify each one in the search engine consoles.
2. Standard sitemap files can be rewritten regularly to include recent content, which already makes them dynamic in a sense; no .php is required for that. It is up to the internal update mechanism and a cron job to recreate the same file with the standard .xml extension (a sketch of such a script follows below).
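
As an illustration of that cron-driven approach only, here is a small CLI script that rewrites the static files; the file names, hosts, and URL lists are placeholders standing in for a real query against the CMS:

<?php
// regenerate-sitemaps.php - meant to be run from cron, e.g. once a day:
//   0 3 * * * /usr/bin/php /var/www/html/regenerate-sitemaps.php
// File names, hosts, and URLs are placeholders; a real version would pull
// the URL list from the CMS database.

$sites = array(
    'site-a-sitemap.xml' => array('http://www.example.com/', 'http://www.example.com/news'),
    'site-b-sitemap.xml' => array('http://www.example.org/', 'http://www.example.org/noticias'),
);

foreach ($sites as $filename => $urls) {
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($urls as $url) {
        $xml .= '  <url><loc>' . htmlspecialchars($url) . '</loc></url>' . "\n";
    }
    $xml .= '</urlset>' . "\n";
    file_put_contents(__DIR__ . '/' . $filename, $xml);
}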

Sitemap.xml files are static, and only updates make them dynamic; they do not update in real time. It is of course possible to rewrite them every minute, but there is no need to, because: 1. Google won't check a sitemap again less than an hour after the last submission, and 2. when sitemap files are big, rewriting them often will hurt server performance.

When there is a large volume of data and the sitemap file grows beyond 50 MB, a setup with multiple sitemaps is required. That means sitemap2.xml, sitemap3.xml, and so on are added to the list in the main file, but the content of those files also stays fixed until they are recreated (by cron, for example).

Also worth mentioning: once a search engine has accessed the file, it won't return to it again very quickly (unless it is resubmitted manually). That confirms there is no need to create a real-time sitemap.php in any case, because a normal sitemap.xml can itself be dynamic, updated with new content throughout the day or the week.

I can't think of any advantage to using a sitemap.php. It will do no good, as there are better and more standard ways to handle these files.

Inducto
August 30, 2019 15:26
