Appendix C. Data sets

Table A.2: A list of indicative publicly available data sets.

Related section Description Link
§2.7.9. Deep Probabilistic Models Data for Wikipedia page views, Dominick's retail, electricity consumption and traffic lane occupancy. https://gluon-ts.mxnet.io/api/gluonts/gluonts.dataset.repository.datasets.html
§2.9.3. Forecasting with text Information Movie reviews data provided by the Stanford NLP group. https://nlp.stanford.edu/sentiment/code.html
§2.10.4. Ecological inference forecasting Party registration in South-East North Carolina (eiPack R package). https://www2.ncleg.net/RnR/Redistricting/BaseData2001
ei.Datasets: Real Datasets for Assessing Ecological Inference Algorithms. https://cran.csiro.au/web/packages/ei.Datasets/index.html
§2.12.7. Forecasting competitions Data for the M, M2, M3 and M4 forecasting competitions. https://forecasters.org/resources/time-series-data/
Time Series Competition Data (R package). https://github.com/robjhyndman/tscompdata
Mcomp: Data for the M and M3 forecasting competitions (R package). https://cran.r-project.org/package=Mcomp
M4comp2018: Data for the M4 forecasting competition (R package). https://github.com/carlanetto/M4comp2018
Data for the M4 forecasting competition (csv files). https://github.com/Mcompetitions/M4-methods/tree/master/Dataset
Tcomp: Data from the 2010 Tourism forecasting competition (R package). https://cran.r-project.org/package=Tcomp
Data for the M5 forecasting competition (csv files). https://github.com/Mcompetitions/M5-methods/tree/master/Dataset
§3.2.3. Forecasting for inventories Grupo Bimbo Inventory Demand. https://www.kaggle.com/c/grupo-bimbo-inventory-demand
§3.2.4. Forecasting in retail Rossmann Store Sales. https://www.kaggle.com/c/rossmann-store-sales
Corporación Favorita Grocery Sales Forecasting. https://www.kaggle.com/c/favorita-grocery-sales-forecasting
Walmart Recruiting – Store Sales Forecasting. https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting
Walmart Recruiting II: Sales in Stormy Weather. https://www.kaggle.com/c/walmart-recruiting-sales-in-stormy-weather
Store Item Demand Forecasting Challenge. https://www.kaggle.com/c/demand-forecasting-kernels-only
Online Product Sales. https://www.kaggle.com/c/online-sales
§3.2.8. Predictive maintenance Robot Execution Failures. https://archive.ics.uci.edu/ml/datasets/Robot+Execution+Failures
Gearbox Fault Detection. https://c3.nasa.gov/dashlink/resources/997/
Air Pressure System Failure at Scania Trucks. https://archive.ics.uci.edu/ml/datasets/IDA2016Challenge
Generic, Scalable and Decentralised Fault Detection for Robot Swarms. https://zenodo.org/record/831471
Wind turbine data (e.g., failures). https://opendata.edp.com/pages/homepage/
§3.3. Economics and finance Two Sigma Financial Modelling Challenge. https://www.kaggle.com/c/two-sigma-financial-modeling/overview
Financial, economic, and alternative data sets, serving investment professionals. https://www.quandl.com/
§3.3.2. Forecasting GDP and Inflation Repository website with Dynare codes and data sets to estimate different DSGE models and use them for forecasting. https://github.com/johannespfeifer/dsge_mod
Data set of macroeconomic variables for the US economy. https://fred.stlouisfed.org/
Data set of macroeconomic variables for OECD economies. https://data.oecd.org/
Data set of macroeconomic variables for the EU economy. https://ec.europa.eu/eurostat/data/database
§3.3.7. House price forecasting Zillow Prize: Zillow’s Home Value Prediction (Zestimate). https://www.kaggle.com/c/zillow-prize-1
Sberbank Russian Housing Market. https://www.kaggle.com/c/sberbank-russian-housing-market
Western Australia Rental Prices. https://www.kaggle.com/c/deloitte-western-australia-rental-prices
§3.3.12. Forecasting returns to investment style Algorithmic Trading Challenge. https://www.kaggle.com/c/AlgorithmicTradingChallenge
§3.3.13. Forecasting stock returns The Winton Stock Market Challenge. https://www.kaggle.com/c/the-winton-stock-market-challenge
The Big Data Combine Engineered by BattleFin. https://www.kaggle.com/c/battlefin-s-big-data-combine-forecasting-challenge/data
§3.4. Energy VSB Power Line Fault Detection. https://www.kaggle.com/c/vsb-power-line-fault-detection
ASHRAE – Great Energy Predictor III. https://www.kaggle.com/c/ashrae-energy-prediction
§3.4.3. Hybrid machine learning system for short-term load forecasting Global Energy Forecasting Competition 2012 – Load Forecasting. https://www.kaggle.com/c/global-energy-forecasting-competition-2012-load-forecasting
§3.4.6. Wind power forecasting Global Energy Forecasting Competition 2012 – Wind Forecasting. https://www.kaggle.com/c/GEF2012-wind-forecasting
§3.4.8. Solar power forecasting Power measurements from a PV power plant and grid of numerical weather predictions. https://doi.org/10.25747/edf8-m258
AMS 2013-2014 Solar Energy Prediction Contest. https://www.kaggle.com/c/ams-2014-solar-energy-prediction-contest
SolarTechLab data set. https://ieee-dataport.org/open-access/photovoltaic-power-and-weather-parameters
§3.4.9. Long-term simulation for large electrical power systems Brazilian National Electric Systems Operator (hydro, solar, wind, nuclear and thermal generation data). http://www.ons.org.br/Paginas/resultados-da-operacao/historico-da-operacao/geracao_energia.aspx
§3.4.10. Collaborative forecasting in the energy sector Solar power time series from 44 small-scale PV systems in Évora, Portugal. https://doi.org/10.25747/gywm-9457
Australian Electricity Market Operator (AEMO) 5 Minute Wind Power Data. https://doi.org/10.15129/9e1d9b96-baa7-4f05-93bd-99c5ae50b141
Electrical energy consumption data from domestic consumers. https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households
Electric vehicles charging data (arrivals, departures, current, voltage, etc.). https://eatechnology.com/consultancy-insights/my-electric-avenue/
Wind power plant data and numerical weather predictions from CNR (France). https://challengedata.ens.fr/challenges/34
§3.5.2. Weather forecasting How Much Did It Rain? https://www.kaggle.com/c/how-much-did-it-rain-ii
§3.5.3. Air quality forecasting EMC Data Science Global Hackathon (Air Quality Prediction). https://www.kaggle.com/c/dsg-hackathon/overview
§3.6. Social good and demographic forecasting LANL Earthquake Prediction. https://www.kaggle.com/c/LANL-Earthquake-Prediction
§3.6.1. Healthcare Flu Forecasting. https://www.kaggle.com/c/genentech-flu-forecasting
West Nile Virus Prediction. https://www.kaggle.com/c/predict-west-nile-virus
§3.6.2. Epidemics and pandemics COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. https://github.com/CSSEGISandData/COVID-19
§3.6.3. Forecasting mortality Human Mortality Database. https://www.mortality.org
EuroMOMO. https://www.euromomo.eu/
The Economist. https://github.com/TheEconomist/covid-19-excess-deaths-tracker
The New York Times. https://github.com/nytimes/covid-19-data/tree/master/excess-deaths
The Financial Times. https://github.com/Financial-Times/coronavirus-excess-mortality-data
ANACONDA: Quality assessment of mortality data. https://crvsgateway.info/anaconda
Australian Human Mortality Database. https://demography.cass.anu.edu.au/research/australian-human-mortality-database
Canadian Human Mortality Database. http://www.bdlc.umontreal.ca/CHMD/
French Human Mortality Database. https://frdata.org/fr/french-human-mortality-database
§3.6.4. Forecasting fertility Human Fertility Database: fertility data for developed countries with complete birth registration based on official vital statistics. https://www.humanfertility.org/
World Fertility Data: UN’s collection of fertility data based on additional data sources such as surveys. https://www.un.org/development/desa/pd/data/world-fertility-data
§3.6.5. Forecasting migration Integrated Modelling of European Migration (IMEM) Database, with estimates of migration flows between 31 European countries by origin, destination, age and sex, for 2002–2008. https://www.imem.cpc.ac.uk/
QuantMig data inventory: meta-inventory on different sources of data on migration and its drivers, with European focus. https://quantmig.eu/data_inventory/
Bilateral international migration flow estimates for 200 countries (Abel & Cohen, 2019). https://doi.org/10.1038/s41597-019-0089-3
UN World Population Prospects: UN global population estimates and projections, including probabilistic projections. https://population.un.org/wpp/
§3.8. Other applications Forecast Eurovision Voting. https://www.kaggle.com/c/Eurovision2010
Reducing Commercial Aviation Fatalities. https://www.kaggle.com/c/reducing-commercial-aviation-fatalities
Porto Seguro’s Safe Driver Prediction. https://www.kaggle.com/c/porto-seguro-safe-driver-prediction
Recruit Restaurant Visitor Forecasting. https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting
Restaurant Revenue Prediction. https://www.kaggle.com/c/restaurant-revenue-prediction
Coupon Purchase Prediction. https://www.kaggle.com/c/coupon-purchase-prediction
Bike Sharing Demand. https://www.kaggle.com/c/bike-sharing-demand
Google Analytics Customer Revenue Prediction. https://www.kaggle.com/c/ga-customer-revenue-prediction
Santander Value Prediction Challenge. https://www.kaggle.com/c/santander-value-prediction-challenge
Santander Customer Transaction Prediction. https://www.kaggle.com/c/santander-customer-transaction-prediction
Acquire Valued Shoppers Challenge. https://www.kaggle.com/c/acquire-valued-shoppers-challenge
Risky Business. https://www.kaggle.com/c/risky-business
Web Traffic Time Series Forecasting. https://www.kaggle.com/c/web-traffic-time-series-forecasting
A repository of data sets, including time series ones, that can be used for benchmarking forecasting methods in various applications of interest. https://github.com/awesomedata/awesome-public-datasets
WSDM – KKBox’s Churn Prediction Challenge. https://www.kaggle.com/c/kkbox-churn-prediction-challenge/overview
Homesite Quote Conversion. https://www.kaggle.com/c/homesite-quote-conversion
Liberty Mutual Group: Property Inspection Prediction. https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction
Liberty Mutual Group – Fire Peril Loss Cost. https://www.kaggle.com/c/liberty-mutual-fire-peril
A set of more than 490,000 time series (micro, macro, demographic, finance, other) to download. http://fsudataset.com/
§3.8.1. Tourism demand forecasting TourMIS: Database with annual and monthly tourism time series (e.g., arrivals and bednights) covering European countries, regions, and cities (free registration required). https://www.tourmis.info/index_e.html
Tourism Forecasting. https://www.kaggle.com/c/tourism1; https://www.kaggle.com/c/tourism2
§3.8.2. Forecasting for aviation Airline and airport performance data provided by the U.S. Department of Transportation. https://www.transtats.bts.gov/
§3.8.3. Traffic flow forecasting New York City Taxi Fare Prediction. https://www.kaggle.com/c/new-york-city-taxi-fare-prediction
RTA Freeway Travel Time Prediction. https://www.kaggle.com/c/RTA
ECML/PKDD 15: Taxi Trip Time Prediction. https://www.kaggle.com/c/pkdd-15-taxi-trip-time-prediction-ii
BigQuery-Geotab Intersection Congestion. https://www.kaggle.com/c/bigquery-geotab-intersection-congestion/overview
Traffic volume counts collected by DOT for New York Metropolitan Transportation Council. https://data.cityofnewyork.us/Transportation/Traffic-Volume-Counts-2014-2019-/ertz-hr4r
§3.8.5. Elections forecasting New Zealand General Elections – Official results and statistics. https://www.electionresults.govt.nz
Spanish Elections – Official results and statistics. https://dataverse.harvard.edu/dataverse/SEA
§3.8.6. Sports forecasting NFL Big Data Bowl. https://www.kaggle.com/c/nfl-big-data-bowl-2020
§3.8.9. Forecasting under data integrity attacks Microsoft Malware Prediction. https://www.kaggle.com/c/microsoft-malware-prediction

Bibliography

Abel, G. J., & Cohen, J. E. (2019). Bilateral international migration flow estimates for 200 countries. Scientific Data, 6(1), 82. https://doi.org/10.1038/s41597-019-0089-3