Storing 450,000,000,000 data points in a bitemporal datastore

by Tom | Last Updated October 18, 2019 22:06

I'm really looking for guidance on an appropriate way to tackle persisting and retrieving 450 billion data points. The details:

  • 15,000 equities
  • 15,000 days of history (approximately 40 years)
  • 2,000 columns/properties
  • 45 billion = 15,000*15,000*2,000
  • 30,000,000 inserts per day (15,000 equities * 1 day * 2,000 properties)
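
The figures above can be sanity-checked with a quick back-of-the-envelope calculation. The byte-per-value figure below is a hypothetical assumption (8 bytes per numeric value), not something stated in the question:

```python
# All scale figures taken directly from the question.
equities = 15_000
days = 15_000          # ~40 years of history
properties = 2_000

total_points = equities * days * properties
print(total_points)        # 450,000,000,000 data points

daily_inserts = equities * 1 * properties
print(daily_inserts)       # 30,000,000 inserts per day

# Rough raw-storage upper bound, assuming (hypothetically) 8 bytes per value,
# before keys, indexes, or bitemporal version history:
print(total_points * 8 / 1e12)   # ~3.6 TB
```

Even at the upper bound, the raw values are only a few terabytes, so the challenge is less about capacity and more about layout for the fast morning read window.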

Caveats: some properties have values for nearly every day (e.g. price), while others have very few values per year (e.g. earnings). The estimates above are therefore upper bounds.

Inserts happen throughout the day and do not need to be "fast". Reads happen during a short window in the morning and need to be "fast": retrieve all data and perform the business calculations within 1 hour.

Properties are bitemporal - they have a data date and an effective date. The latter is to support corrections to erroneous data and backtesting.
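
To make the bitemporal requirement concrete, here is a minimal sketch of an "as-of" lookup, using a hypothetical in-memory structure (not any particular product's API). Each (equity, property, data date) keeps every correction, ordered by the effective date on which that version became known:

```python
from bisect import bisect_right

# Hypothetical store: (equity, property, data_date) -> list of
# (effective_date, value) versions, sorted by effective_date.
store = {
    ("AAPL", "price", "2019-10-01"): [
        ("2019-10-01", 224.59),   # value as first recorded
        ("2019-10-03", 224.95),   # correction issued two days later
    ],
}

def as_of(equity, prop, data_date, effective_date):
    """Return the value for data_date as it was known on effective_date.

    This is the query backtesting needs: replaying history without
    'seeing' corrections that arrived later.
    """
    versions = store.get((equity, prop, data_date), [])
    dates = [eff for eff, _ in versions]
    i = bisect_right(dates, effective_date)   # latest version known by then
    return versions[i - 1][1] if i else None

print(as_of("AAPL", "price", "2019-10-01", "2019-10-02"))  # 224.59 (pre-correction)
print(as_of("AAPL", "price", "2019-10-01", "2019-10-04"))  # 224.95 (post-correction)
```

ISO-formatted date strings compare correctly lexicographically, which is what lets `bisect_right` find the latest version known at the query's effective date.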

Some properties are calculated from other properties, i.e. we read data, calculate a derived value, and then insert the result.
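
The read-derive-insert pattern might look like the following sketch, assuming (hypothetically) a simple date-keyed series per property; the moving-average property is an illustrative example, not one named in the question:

```python
# Hypothetical raw property read from the store: date -> closing price.
prices = {"2019-10-14": 235.87, "2019-10-15": 235.32, "2019-10-16": 234.37}

def derive_sma(series, window=3):
    """Compute a simple moving average as a new, insertable property."""
    dates = sorted(series)
    out = {}
    for i in range(window - 1, len(dates)):
        win = dates[i - window + 1 : i + 1]          # trailing window of dates
        out[dates[i]] = sum(series[d] for d in win) / window
    return out

sma = derive_sma(prices)
# The derived series is then written back as its own bitemporal property,
# with today's date as its effective date.
```

Because derived values are inserted like any other property, a correction to an input should trigger re-derivation with a new effective date rather than overwriting history.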

There is no official budget for this project, so assume high-spec hardware, cloud infrastructure, and custom software are all possibilities.

So, any suggestions on an appropriate persistence technology for this problem? Any design considerations I should be thinking about?
