Simple question, I'm asking just to make sure.
A Google sitemap generator generated a sitemap.txt file with links written like this:
Is it correct to use &amp; in these links in place of &, or is it just an error made by the sitemap generator?
That is correct. &amp; is the XML/HTML entity for an ampersand (&), and it is the proper way to represent an ampersand when a URL is written inside an XML or HTML document. Ampersands (&), as well as < and >, are special characters in XML and HTML and must be written using their character entities (&amp;, &lt; and &gt;).
Your Sitemap file must be UTF-8 encoded (you can generally do this when you save the file). As with all XML files, any data values (including URLs) must use entity escape codes for the special characters: & (&amp;), ' (&apos;), " (&quot;), > (&gt;) and < (&lt;).
This may help: http://sitemaps.org/protocol.php
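As a rough illustration of that escaping rule, here is a minimal sketch using Python's standard xml.sax.saxutils.escape. The helper name escape_sitemap_url and the example.com URL are hypothetical; escape() handles &, < and >, and the extra mapping adds the two quote entities from the sitemaps.org table.

```python
from xml.sax.saxutils import escape

# Hypothetical helper: entities beyond &, < and > that the sitemaps.org
# escaping table also lists.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_sitemap_url(url: str) -> str:
    """Entity-escape a URL for use as the text of a <loc> element."""
    # escape() replaces & first, so the entities it inserts are not
    # escaped a second time; the quote entities are applied afterwards.
    return escape(url, EXTRA_ENTITIES)


# Hypothetical example URL with two query parameters.
url = "http://www.example.com/view.php?id=1&lang=en"
print(f"<loc>{escape_sitemap_url(url)}</loc>")
# prints <loc>http://www.example.com/view.php?id=1&amp;lang=en</loc>
```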
URL encoding and XML entity encoding are not the same thing. URL encoding (percent-encoding) replaces characters that have a special meaning in URLs: for example, & is reserved for separating query parameters, so a literal & inside a parameter value has to be written as %26. XML entity encoding is for special characters in XML (and therefore also XHTML). This means that if you put a URL into an XML (or XHTML) file and that URL contains & characters, you have to entity-encode each of them as &amp;. So in a sitemap.xml you will have URLs like the ones in Marco Demaio's question.
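To see the two layers side by side, here is a small sketch with Python's standard library: urllib.parse.urlencode does the URL (percent) encoding and xml.sax.saxutils.escape does the XML entity encoding. The search.php page and the parameter values are hypothetical, chosen so that one value contains a literal &.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Hypothetical page and query values; "Tom & Jerry" contains a literal &.
params = {"q": "Tom & Jerry", "page": "2"}

# Layer 1, URL encoding: & separates parameters, so the literal & inside
# the value becomes %26 (and urlencode turns spaces into +).
url = "http://www.example.com/search.php?" + urlencode(params)
print(url)
# http://www.example.com/search.php?q=Tom+%26+Jerry&page=2

# Layer 2, XML entity encoding: the separator & that is still in the URL
# must become &amp; when the URL is placed inside an XML element.
loc = escape(url)
print(f"<loc>{loc}</loc>")
# <loc>http://www.example.com/search.php?q=Tom+%26+Jerry&amp;page=2</loc>
```

Note that the %26 is untouched by the second step: it is already plain ASCII with no XML meaning, which is why the two encodings do not interfere with each other.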
You can also convince yourself by checking the official XML sitemaps protocol page. You can't really argue against it :)
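Another way to convince yourself locally is to run a sitemap fragment through an XML parser. The sketch below uses Python's standard xml.etree.ElementTree (the fragment and the example.com URL are hypothetical): the parser decodes &amp; in <loc> back into a plain &, and serializing an element escapes it again automatically.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap fragment with an escaped ampersand in <loc>.
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/view.php?id=1&amp;lang=en</loc>
  </url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Parsing: the text of each <loc> comes back with a plain &.
root = ET.fromstring(SITEMAP.encode("utf-8"))
locs = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]
print(locs)
# ['http://www.example.com/view.php?id=1&lang=en']

# Serializing: assigning a raw & to .text is escaped on output.
el = ET.Element("loc")
el.text = locs[0]
out = ET.tostring(el, encoding="unicode")
print(out)
# <loc>http://www.example.com/view.php?id=1&amp;lang=en</loc>
```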
Google rejects the sitemap as broken if it has a raw & character in a URL. It accepts it when you replace & with &amp;.
BUT: if you later check the list of crawl errors in Google Webmaster Tools, it reports that URL from the sitemap as broken, because it contains &amp; instead of &.
So the correct solution is to change the URL so that it does not contain & at all, or to report this as a bug to Google.