Sitemap generation strategy for massive dynamic website

by Jan Żankowski   Last Updated September 05, 2018 14:04

I need to put in place a system that will generate sitemap files (https://www.sitemaps.org) for a huge website whose content changes dynamically. These are example figures, but I'm thinking of something at or above these orders of magnitude (a quick calculation of what this implies follows the list):

  • 10,000,000 pages.
  • 1,000s of pages added daily.
  • 1,000s of pages modified daily.
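To make the scale concrete: the sitemaps.org protocol caps each sitemap file at 50,000 URLs (and 50 MB uncompressed), so a site this size cannot use a single file. A quick sanity check in Python, using the 10,000,000 figure above:

    URLS_PER_SITEMAP = 50_000   # sitemaps.org limit per sitemap file
    TOTAL_PAGES = 10_000_000    # example figure from the list above

    # Ceiling division: minimum number of child sitemap files needed,
    # every one of which must be listed in a sitemap index file.
    min_files = -(-TOTAL_PAGES // URLS_PER_SITEMAP)
    print(min_files)            # 200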

Once search engines have indexed everything initially, my ongoing sitemap goals are:

  • New pages to be discovered/indexed ASAP.
  • Modified pages to be discovered/indexed ASAP.
  • Non-modified pages to be re-crawled rarely.
  • Try to help search engines save bandwidth, e.g. by putting all the new/modified pages in one sitemap file (see the sketch after this list).
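To make that last goal concrete, here is a rough sketch of the layout I have in mind, assuming pages carry a stable integer ID and a last-modified date; the file names, the regenerate() helper, and the example.com domain are hypothetical, just to illustrate the idea, not an existing tool's API:

    from xml.sax.saxutils import escape

    URLS_PER_SITEMAP = 50_000  # sitemaps.org limit per sitemap file

    def write_sitemap(path, urls):
        # urls: list of (loc, lastmod) pairs, lastmod a datetime.date
        with open(path, "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for loc, lastmod in urls:
                f.write(f"  <url><loc>{escape(loc)}</loc>"
                        f"<lastmod>{lastmod:%Y-%m-%d}</lastmod></url>\n")
            f.write("</urlset>\n")

    def regenerate(all_pages, changed_since):
        # all_pages: iterable of (page_id, url, lastmod), sorted by page_id.
        # Stable pages go into per-ID-range "bucket" files whose content
        # rarely changes; everything new or modified since `changed_since`
        # additionally goes into one small "fresh" file that crawlers can
        # poll cheaply.
        fresh, bucket, files = [], [], []
        bucket_no = 0

        def flush_bucket():
            nonlocal bucket_no
            if bucket:
                name = f"sitemap-{bucket_no:05d}.xml"
                write_sitemap(name, bucket)
                files.append(name)
                bucket_no += 1
                bucket.clear()

        for page_id, url, lastmod in all_pages:
            bucket.append((url, lastmod))
            if lastmod >= changed_since:
                fresh.append((url, lastmod))  # new or recently modified
            if len(bucket) == URLS_PER_SITEMAP:
                flush_bucket()
        flush_bucket()

        # On a heavy day "fresh" could exceed 50,000 entries and would
        # need the same bucketing; omitted here for brevity.
        write_sitemap("sitemap-fresh.xml", fresh)
        files.append("sitemap-fresh.xml")

        # Sitemap index tying everything together.
        with open("sitemap-index.xml", "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for name in files:
                f.write(f"  <sitemap><loc>https://example.com/{name}</loc></sitemap>\n")
            f.write("</sitemapindex>\n")

The intent is that in a real system each bucket file would be rewritten only when one of its pages changed, so a crawler that respects <lastmod> in the index re-fetches only sitemap-fresh.xml plus the few changed buckets. I'm unsure whether this is the best scheme, hence the question.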

I'll add that I suspect sites such as Wikipedia or Stack Overflow are in a similar position.

Are there any good algorithms for such a use case?


