Appendix C. Data sets

Table A.2: A list of indicative publicly available data sets.

Related section Description Link
§2.7.9. Deep Probabilistic Models Data for Wikipedia page views, Dominick's retail, electricity consumption, traffic lane occupation.
§2.9.3. Forecasting with text Information Movie reviews data provided by the Stanford NLP group.
§2.10.4. Ecological inference forecasting Party registration in South-East North Carolina (eiPack R package).
ei.Datasets: Real Datasets for Assessing Ecological Inference Algorithms.
§2.12.7. Forecasting competitions Data for the M, M2, M3 and M4 forecasting competitions.
Time Series Competition Data (R package)
Mcomp: Data for the M and M3 forecasting competitions (R package).
M4comp2018: Data for the M4 forecasting competition (R package).
Data for the M4 forecasting competition (csv files).
Tcomp: Data from the 2010 Tourism forecasting competition (R package)
Data for the M5 forecasting competition (csv files).
§3.2.3. Forecasting for inventories Grupo Bimbo Inventory Demand.
§3.2.4. Forecasting in retail Rossmann Store Sales.
Corporación Favorita Grocery Sales Forecasting.
Walmart Recruiting – Store Sales Forecasting.
Walmart Recruiting II: Sales in Stormy Weather.
Store Item Demand Forecasting Challenge.
Online Product Sales.
§3.2.8. Predictive maintenance Robot Execution Failures.
Gearbox Fault Detection.
Air Pressure System Failure at Scania Trucks.
Generic, Scalable and Decentralised Fault Detection for Robot Swarms
Wind turbine data (e.g., failures).
§3.3. Economics and finance Two Sigma Financial Modelling Challenge.
Financial, economic, and alternative data sets, serving investment professionals.
§3.3.2. Forecasting GDP and Inflation Repository website with Dynare codes and data sets to estimate different DSGE models and use them for forecasting.
Data set for Macroeconomic variables for US economy.
Data set for Macroeconomic variables for OECD economy.
Data set for Macroeconomic variables for EU economy.
§3.3.7. House price forecasting Zillow Prize: Zillow’s Home Value Prediction (Zestimate).
Sberbank Russian Housing Market.
Western Australia Rental Prices.
§3.3.12. Forecasting returns to investment style Algorithmic Trading Challenge.
§3.3.13. Forecasting stock returns The Winton Stock Market Challenge.
The Big Data Combine Engineered by BattleFin.
§3.4. Energy VSB Power Line Fault Detection.
ASHRAE – Great Energy Predictor III
§3.4.3. Hybrid machine learning system for short-term load forecasting Global Energy Forecasting Competition 2012 – Load Forecasting.
§3.4.6. Wind power forecasting Global Energy Forecasting Competition 2012 – Wind Forecasting.
§3.4.8. Solar power forecasting Power measurements from a PV power plant and grid of numerical weather predictions.
AMS 2013-2014 Solar Energy Prediction Contest.
SolarTechLab data set.
§3.4.9. Long-term simulation for large electrical power systems Brazilian National Electric Systems Operator (hydro, solar, wind, nuclear and thermal generation data).
§3.4.10. Collaborative forecasting in the energy sector Solar power time series from 44 small-scale PV in Évora, Portugal.
Australian Electricity Market Operator (AEMO) 5 Minute Wind Power Data.
Electrical energy consumption data from domestic consumers.
Electric vehicles charging data (arrivals, departures, current, voltage, etc.).
Wind power plant data and numerical weather predictions from CNR (France).
§3.5.2. Weather forecasting How Much Did It Rain?
§3.5.3. Air quality forecasting EMC Data Science Global Hackathon (Air Quality Prediction).
§3.6. Social good and demographic forecasting LANL Earthquake Prediction.
§3.6.1. Healthcare Flu Forecasting.
West Nile Virus Prediction.
§3.6.2. Epidemics and pandemics COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
§3.6.3. Forecasting mortality Human Mortality Database.
The Economist.
The New York Times.
The Financial Times.
ANACONDA – Quality assessment of mortality data.
Australian Human Mortality Database.
Canadian Human Mortality Database.
French Human Mortality Database.
§3.6.4. Forecasting fertility Human Fertility Database: fertility data for developed countries with complete birth registration based on official vital statistics.
World Fertility Data: UN’s collection of fertility data based on additional data sources such as surveys.
§3.6.5. Forecasting migration Integrated Modelling of European Migration (IMEM) Database, with estimates of migration flows between 31 European countries by origin, destination, age and sex, for 2002–2008.
QuantMig data inventory: meta-inventory on different sources of data on migration and its drivers, with European focus.
Bilateral international migration flow estimates for 200 countries (Abel & Cohen, 2019).
UN World Population Prospects: UN global population estimates and projections, including probabilistic
§3.8. Other applications Forecast Eurovision Voting.
Reducing Commercial Aviation Fatalities.
Porto Seguro’s Safe Driver Prediction.
Recruit Restaurant Visitor Forecasting.
Restaurant Revenue Prediction.
Coupon Purchase Prediction.
Bike Sharing Demand
Google Analytics Customer Revenue Prediction.
Santander Value Prediction Challenge.
Santander Customer Transaction Prediction.
Acquire Valued Shoppers Challenge.
Risky Business.
Web Traffic Time Series Forecasting.
A repository of data sets, including time series ones, that can be used for benchmarking forecasting methods in various applications of interest.
WSDM – KKBox’s Churn Prediction Challenge.
Homesite Quote Conversion
Liberty Mutual Group: Property Inspection Prediction.
Liberty Mutual Group – Fire Peril Loss Cost.
A set of more than 490,000 time series (micro, macro, demographic, finance, other) to download.
A repository of data sets, including time series ones, that can be used for benchmarking forecasting methods in various applications of interest.
§3.8.1. Tourism demand forecasting TourMIS: Database with annual and monthly tourism time series (e.g., arrivals and bednights) covering European countries, regions, and cities (free registration required).
Tourism Forecasting.
§3.8.2. Forecasting for aviation Airline and Airport performance data provided by the U.S. Department of Transportation
§3.8.3. Traffic flow forecasting New York City Taxi Fare Prediction.
RTA Freeway Travel Time Prediction
ECML/PKDD 15: Taxi Trip Time Prediction.
BigQuery-Geotab Intersection Congestion.
Traffic volume counts collected by DOT for New York Metropolitan Transportation Council.
§3.8.5. Elections forecasting New Zealand General Elections – Official results and statistics.
Spanish Elections – Official results and statistics.
§3.8.6. Sports forecasting NFL Big Data Bowl.
§3.8.9. Forecasting under data integrity attacks Microsoft Malware Prediction.


Abel, G. J., & Cohen, J. E. (2019). Bilateral international migration flow estimates for 200 countries. Scientific Data, 6(1), 82.