Appendix C. Data sets
Table A.2: A list of indicative publicly available data sets.
Related section | Description | Link |
---|---|---|
§2.7.9. Deep Probabilistic Models | Data for Wikipedia page views, Dominick's retail, electricity consumption and traffic lane occupancy. | https://gluon-ts.mxnet.io/api/gluonts/gluonts.dataset.repository.datasets.html |
§2.9.3. Forecasting with text information | Movie review data provided by the Stanford NLP group. | https://nlp.stanford.edu/sentiment/code.html |
§2.10.4. Ecological inference forecasting | Party registration in South-East North Carolina (eiPack R package). | https://www2.ncleg.net/RnR/Redistricting/BaseData2001 |
| | ei.Datasets: Real Datasets for Assessing Ecological Inference Algorithms. | https://cran.csiro.au/web/packages/ei.Datasets/index.html |
§2.12.7. Forecasting competitions | Data for the M, M2, M3 and M4 forecasting competitions. | https://forecasters.org/resources/time-series-data/ |
| | Time Series Competition Data (R package). | https://github.com/robjhyndman/tscompdata |
| | Mcomp: Data for the M and M3 forecasting competitions (R package). | https://cran.r-project.org/package=Mcomp |
| | M4comp2018: Data for the M4 forecasting competition (R package). | https://github.com/carlanetto/M4comp2018 |
| | Data for the M4 forecasting competition (csv files). | https://github.com/Mcompetitions/M4-methods/tree/master/Dataset |
| | Tcomp: Data from the 2010 Tourism forecasting competition (R package). | https://cran.r-project.org/package=Tcomp |
| | Data for the M5 forecasting competition (csv files). | https://github.com/Mcompetitions/M5-methods/tree/master/Dataset |
§3.2.3. Forecasting for inventories | Grupo Bimbo Inventory Demand. | https://www.kaggle.com/c/grupo-bimbo-inventory-demand |
§3.2.4. Forecasting in retail | Rossmann Store Sales. | https://www.kaggle.com/c/rossmann-store-sales |
| | Corporación Favorita Grocery Sales Forecasting. | https://www.kaggle.com/c/favorita-grocery-sales-forecasting |
| | Walmart Recruiting – Store Sales Forecasting. | https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting |
| | Walmart Recruiting II: Sales in Stormy Weather. | https://www.kaggle.com/c/walmart-recruiting-sales-in-stormy-weather |
| | Store Item Demand Forecasting Challenge. | https://www.kaggle.com/c/demand-forecasting-kernels-only |
| | Online Product Sales. | https://www.kaggle.com/c/online-sales |
§3.2.8. Predictive maintenance | Robot Execution Failures. | https://archive.ics.uci.edu/ml/datasets/Robot+Execution+Failures |
| | Gearbox Fault Detection. | https://c3.nasa.gov/dashlink/resources/997/ |
| | Air Pressure System Failure at Scania Trucks. | https://archive.ics.uci.edu/ml/datasets/IDA2016Challenge |
| | Generic, Scalable and Decentralised Fault Detection for Robot Swarms. | https://zenodo.org/record/831471 |
| | Wind turbine data (e.g., failures). | https://opendata.edp.com/pages/homepage/ |
§3.3. Economics and finance | Two Sigma Financial Modelling Challenge. | https://www.kaggle.com/c/two-sigma-financial-modeling/overview |
| | Financial, economic, and alternative data sets, serving investment professionals. | https://www.quandl.com/ |
§3.3.2. Forecasting GDP and Inflation | Repository website with Dynare codes and data sets to estimate different DSGE models and use them for forecasting. | https://github.com/johannespfeifer/dsge_mod |
| | Data set of macroeconomic variables for the US economy. | https://fred.stlouisfed.org/ |
| | Data set of macroeconomic variables for OECD economies. | https://data.oecd.org/ |
| | Data set of macroeconomic variables for EU economies. | https://ec.europa.eu/eurostat/data/database |
§3.3.7. House price forecasting | Zillow Prize: Zillow’s Home Value Prediction (Zestimate). | https://www.kaggle.com/c/zillow-prize-1 |
| | Sberbank Russian Housing Market. | https://www.kaggle.com/c/sberbank-russian-housing-market |
| | Western Australia Rental Prices. | https://www.kaggle.com/c/deloitte-western-australia-rental-prices |
§3.3.12. Forecasting returns to investment style | Algorithmic Trading Challenge. | https://www.kaggle.com/c/AlgorithmicTradingChallenge |
§3.3.13. Forecasting stock returns | The Winton Stock Market Challenge. | https://www.kaggle.com/c/the-winton-stock-market-challenge |
| | The Big Data Combine Engineered by BattleFin. | https://www.kaggle.com/c/battlefin-s-big-data-combine-forecasting-challenge/data |
§3.4. Energy | VSB Power Line Fault Detection. | https://www.kaggle.com/c/vsb-power-line-fault-detection |
| | ASHRAE – Great Energy Predictor III. | https://www.kaggle.com/c/ashrae-energy-prediction |
§3.4.3. Hybrid machine learning system for short-term load forecasting | Global Energy Forecasting Competition 2012 – Load Forecasting. | https://www.kaggle.com/c/global-energy-forecasting-competition-2012-load-forecasting |
§3.4.6. Wind power forecasting | Global Energy Forecasting Competition 2012 – Wind Forecasting. | https://www.kaggle.com/c/GEF2012-wind-forecasting |
§3.4.8. Solar power forecasting | Power measurements from a PV power plant and grid of numerical weather predictions. | https://doi.org/10.25747/edf8-m258 |
| | AMS 2013-2014 Solar Energy Prediction Contest. | https://www.kaggle.com/c/ams-2014-solar-energy-prediction-contest |
| | SolarTechLab data set. | https://ieee-dataport.org/open-access/photovoltaic-power-and-weather-parameters |
§3.4.9. Long-term simulation for large electrical power systems | Brazilian National Electric Systems Operator (hydro, solar, wind, nuclear and thermal generation data). | http://www.ons.org.br/Paginas/resultados-da-operacao/historico-da-operacao/geracao_energia.aspx |
§3.4.10. Collaborative forecasting in the energy sector | Solar power time series from 44 small-scale PV systems in Évora, Portugal. | https://doi.org/10.25747/gywm-9457 |
| | Australian Electricity Market Operator (AEMO) 5 Minute Wind Power Data. | https://doi.org/10.15129/9e1d9b96-baa7-4f05-93bd-99c5ae50b141 |
| | Electrical energy consumption data from domestic consumers. | https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households |
| | Electric vehicle charging data (arrivals, departures, current, voltage, etc.). | https://eatechnology.com/consultancy-insights/my-electric-avenue/ |
| | Wind power plant data and numerical weather predictions from CNR (France). | https://challengedata.ens.fr/challenges/34 |
§3.5.2. Weather forecasting | How Much Did It Rain? II | https://www.kaggle.com/c/how-much-did-it-rain-ii |
§3.5.3. Air quality forecasting | EMC Data Science Global Hackathon (Air Quality Prediction). | https://www.kaggle.com/c/dsg-hackathon/overview |
§3.6. Social good and demographic forecasting | LANL Earthquake Prediction. | https://www.kaggle.com/c/LANL-Earthquake-Prediction |
§3.6.1. Healthcare | Flu Forecasting. | https://www.kaggle.com/c/genentech-flu-forecasting |
| | West Nile Virus Prediction. | https://www.kaggle.com/c/predict-west-nile-virus |
§3.6.2. Epidemics and pandemics | COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. | https://github.com/CSSEGISandData/COVID-19 |
§3.6.3. Forecasting mortality | Human Mortality Database. | https://www.mortality.org |
| | EuroMOMO. | https://www.euromomo.eu/ |
| | The Economist. | https://github.com/TheEconomist/covid-19-excess-deaths-tracker |
| | The New York Times. | https://github.com/nytimes/covid-19-data/tree/master/excess-deaths |
| | The Financial Times. | https://github.com/Financial-Times/coronavirus-excess-mortality-data |
| | ANACONDA: Quality assessment of mortality data. | https://crvsgateway.info/anaconda |
| | Australian Human Mortality Database. | https://demography.cass.anu.edu.au/research/australian-human-mortality-database |
| | Canadian Human Mortality Database. | http://www.bdlc.umontreal.ca/CHMD/ |
| | French Human Mortality Database. | https://frdata.org/fr/french-human-mortality-database |
§3.6.4. Forecasting fertility | Human Fertility Database: fertility data for developed countries with complete birth registration based on official vital statistics. | https://www.humanfertility.org/ |
| | World Fertility Data: UN’s collection of fertility data based on additional data sources such as surveys. | https://www.un.org/development/desa/pd/data/world-fertility-data |
§3.6.5. Forecasting migration | Integrated Modelling of European Migration (IMEM) Database, with estimates of migration flows between 31 European countries by origin, destination, age and sex, for 2002–2008. | https://www.imem.cpc.ac.uk/ |
| | QuantMig data inventory: meta-inventory of data sources on migration and its drivers, with a European focus. | https://quantmig.eu/data_inventory/ |
| | Bilateral international migration flow estimates for 200 countries (Abel & Cohen, 2019). | https://doi.org/10.1038/s41597-019-0089-3 |
| | UN World Population Prospects: UN global population estimates and projections, including probabilistic projections. | https://population.un.org/wpp/ |
§3.8. Other applications | Forecast Eurovision Voting. | https://www.kaggle.com/c/Eurovision2010 |
| | Reducing Commercial Aviation Fatalities. | https://www.kaggle.com/c/reducing-commercial-aviation-fatalities |
| | Porto Seguro’s Safe Driver Prediction. | https://www.kaggle.com/c/porto-seguro-safe-driver-prediction |
| | Recruit Restaurant Visitor Forecasting. | https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting |
| | Restaurant Revenue Prediction. | https://www.kaggle.com/c/restaurant-revenue-prediction |
| | Coupon Purchase Prediction. | https://www.kaggle.com/c/coupon-purchase-prediction |
| | Bike Sharing Demand. | https://www.kaggle.com/c/bike-sharing-demand |
| | Google Analytics Customer Revenue Prediction. | https://www.kaggle.com/c/ga-customer-revenue-prediction |
| | Santander Value Prediction Challenge. | https://www.kaggle.com/c/santander-value-prediction-challenge |
| | Santander Customer Transaction Prediction. | https://www.kaggle.com/c/santander-customer-transaction-prediction |
| | Acquire Valued Shoppers Challenge. | https://www.kaggle.com/c/acquire-valued-shoppers-challenge |
| | Risky Business. | https://www.kaggle.com/c/risky-business |
| | Web Traffic Time Series Forecasting. | https://www.kaggle.com/c/web-traffic-time-series-forecasting |
| | A repository of data sets, including time series ones, that can be used for benchmarking forecasting methods in various applications of interest. | https://github.com/awesomedata/awesome-public-datasets |
| | WSDM – KKBox’s Churn Prediction Challenge. | https://www.kaggle.com/c/kkbox-churn-prediction-challenge/overview |
| | Homesite Quote Conversion. | https://www.kaggle.com/c/homesite-quote-conversion |
| | Liberty Mutual Group: Property Inspection Prediction. | https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction |
| | Liberty Mutual Group – Fire Peril Loss Cost. | https://www.kaggle.com/c/liberty-mutual-fire-peril |
| | A set of more than 490,000 time series (micro, macro, demographic, finance, other) to download. | http://fsudataset.com/ |
§3.8.1. Tourism demand forecasting | TourMIS: Database with annual and monthly tourism time series (e.g., arrivals and bednights) covering European countries, regions, and cities (free registration required). | https://www.tourmis.info/index_e.html |
| | Tourism Forecasting. | https://www.kaggle.com/c/tourism1; https://www.kaggle.com/c/tourism2 |
§3.8.2. Forecasting for aviation | Airline and airport performance data provided by the U.S. Department of Transportation. | https://www.transtats.bts.gov/ |
§3.8.3. Traffic flow forecasting | New York City Taxi Fare Prediction. | https://www.kaggle.com/c/new-york-city-taxi-fare-prediction |
| | RTA Freeway Travel Time Prediction. | https://www.kaggle.com/c/RTA |
| | ECML/PKDD 15: Taxi Trip Time Prediction. | https://www.kaggle.com/c/pkdd-15-taxi-trip-time-prediction-ii |
| | BigQuery-Geotab Intersection Congestion. | https://www.kaggle.com/c/bigquery-geotab-intersection-congestion/overview |
| | Traffic volume counts collected by the DOT for the New York Metropolitan Transportation Council. | https://data.cityofnewyork.us/Transportation/Traffic-Volume-Counts-2014-2019-/ertz-hr4r |
§3.8.5. Elections forecasting | New Zealand General Elections – Official results and statistics. | https://www.electionresults.govt.nz |
| | Spanish Elections – Official results and statistics. | https://dataverse.harvard.edu/dataverse/SEA |
§3.8.6. Sports forecasting | NFL Big Data Bowl. | https://www.kaggle.com/c/nfl-big-data-bowl-2020 |
§3.8.9. Forecasting under data integrity attacks | Microsoft Malware Prediction. | https://www.kaggle.com/c/microsoft-malware-prediction |
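Several of the competition data sets listed in the table, such as the M4 csv files, store one time series per row. The following is a minimal Python sketch for reading such a "wide" file into a dictionary of series. It assumes a header row, a series identifier in the first column, and shorter series padded at the end of the row with empty cells or `NA`; these assumptions should be checked against the actual file you download. The function name `read_wide_series` and the sample data are hypothetical and are not taken from any competition.

```python
import csv
import io

# Cell values treated as padding after the last observation (an assumption).
MISSING = {"", "NA"}


def read_wide_series(text):
    """Parse a wide CSV (one series per row) into {series_id: [floats]}.

    Hypothetical helper. Assumes a header row, the series ID in the first
    column, and shorter series padded at the end of the row with empty
    cells or "NA".
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    series = {}
    for row in reader:
        if not row:  # ignore blank lines
            continue
        sid, values = row[0], row[1:]
        # Strip the trailing padding; interior gaps are not expected.
        while values and values[-1].strip() in MISSING:
            values.pop()
        series[sid] = [float(v) for v in values]
    return series


# Made-up sample in the assumed layout.
sample = "V1,V2,V3,V4\nD1,10.0,11.5,12.0\nD2,5.0,6.0,NA\n"
print(read_wide_series(sample))
# {'D1': [10.0, 11.5, 12.0], 'D2': [5.0, 6.0]}
```

Only the standard library is used here to keep the sketch dependency-free; for large files, `pandas.read_csv` would be the more usual choice.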
Bibliography
Abel, G. J., & Cohen, J. E. (2019). Bilateral international migration flow estimates for 200 countries. Scientific Data, 6(1), 82. https://doi.org/10.1038/s41597-019-0089-3