Adaptive Feature Selection Ensembles for Weather Forecasting

Supervisor: Professor Qiang Shen (qqs@aber.ac.uk)

Traditional weather forecasting is built on deterministic modelling: a set of initial conditions is supplied to a sophisticated computational model, which produces a single prediction of the forthcoming weather. This project follows a different approach, ensemble-based forecasting [2], first introduced in the early 1990s. Here, the results of (up to hundreds of) different model runs, each with slight variations in the initial conditions or model assumptions, are combined to derive the final forecast. Ensembles can provide more accurate statements about the uncertainty in daily and seasonal forecasting.

Existing techniques for constructing classifier ensembles typically develop a group of classifiers with diverse training backgrounds [3], [9] and then aggregate their decisions to produce the final classification outcome. Varying the feature subset used by each ensemble member helps to promote this necessary diversity [8], while also reducing the computational complexity that arises when classification algorithms are applied to high-dimensional data sets [5]. Such work is generally referred to as “feature selection for ensembles” [6], [7]. Several recently developed nature-inspired feature selection search techniques [1], [4] can form multiple compact, high-quality feature subsets, which may prove valuable for constructing such ensembles for weather forecasting.

Many application problems, including the prediction of weather conditions, involve data sources that change constantly: the data volume may grow in terms of both attributes and objects, and given information may become invalid or irrelevant over time. To maintain the precision and effectiveness of the extracted knowledge, it is necessary to develop suitable strategies for handling such dynamic data.
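To make the feature-subset idea concrete, the following is a minimal, purely illustrative sketch (not the project's method) in the spirit of the random subspace approach of [3]: each ensemble member is a simple nearest-centroid classifier trained on its own random subset of features, and the members' predictions are combined by majority vote. The toy data, labels, and all function names here are invented for demonstration.

```python
# Illustrative random-subspace-style ensemble: each member sees only a
# random subset of the features; majority vote combines their decisions.
import random
from collections import Counter, defaultdict

def train_member(data, labels, feature_subset):
    """Nearest-centroid model restricted to the given feature indices."""
    sums = defaultdict(lambda: [0.0] * len(feature_subset))
    counts = Counter(labels)
    for x, y in zip(data, labels):
        for i, f in enumerate(feature_subset):
            sums[y][i] += x[f]
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
    return feature_subset, centroids

def member_predict(member, x):
    """Classify x as the label of the nearest class centroid."""
    feature_subset, centroids = member
    proj = [x[f] for f in feature_subset]
    return min(centroids,
               key=lambda y: sum((a - b) ** 2
                                 for a, b in zip(proj, centroids[y])))

def ensemble_predict(members, x):
    """Majority vote over all ensemble members."""
    votes = Counter(member_predict(m, x) for m in members)
    return votes.most_common(1)[0][0]

# Toy data: two "weather regimes" separable on features 0 and 1;
# feature 2 is pure noise.
random.seed(0)
data = ([[0, 0, random.random()] for _ in range(20)] +
        [[5, 5, random.random()] for _ in range(20)])
labels = ["calm"] * 20 + ["storm"] * 20

# Five members, each on a random 2-of-3 feature subset.
members = [train_member(data, labels, random.sample(range(3), 2))
           for _ in range(5)]
print(ensemble_predict(members, [4.8, 5.2, 0.3]))  # prints: storm
```

The random subsets give each member a different view of the data, which is the source of the diversity discussed above; nature-inspired search techniques such as [1], [4] would replace the random sampling with an optimised search for compact, high-quality subsets.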
Adaptive ensemble-based techniques are of particular significance for predicting natural disasters and unusual, severe, or unseasonal weather (commonly referred to as extreme weather), which lies at the extremes of historical distributions.

The aim of this PhD project is to develop feature selection-based ensemble methods that actively form and refine ensembles in a dynamic environment, with a particular focus on weather forecasting scenarios. Although the project is largely application-oriented, significant underlying theoretical investigation will be necessary. In its initial phase, the project will review a number of existing approaches proposed to address the different aspects of a dynamic system, as well as the ensemble techniques established for conventional, static problems. A core part of the project will involve the design and implementation of a specific adaptive feature selection ensemble that works with carefully selected aspects of weather forecasting. The implemented system will be evaluated first on simulated benchmark data sets, followed by a close examination of how it performs when applied to a forecasting problem of realistic complexity.
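One simple way to picture ensemble refinement in a dynamic environment is a weighted-majority-style update: members that keep making mistakes on incoming data lose influence, so the ensemble tracks a drifting data source. The sketch below is a hypothetical illustration only; the rule-based members and the drift scenario are invented, and the project itself would refine feature subsets rather than fixed rules.

```python
# Hypothetical weighted-majority-style adaptation: each member's weight
# is multiplied by beta whenever it misclassifies a new observation.

def predict(members, weights, x):
    """Weighted vote: True = "rain", False = "dry"."""
    score = sum(w if rule(x) else -w for rule, w in zip(members, weights))
    return score >= 0

def update(members, weights, x, truth, beta=0.5):
    """Halve the weight of every member that misclassified x."""
    return [w * (1.0 if rule(x) == truth else beta)
            for rule, w in zip(members, weights)]

# Two toy members keyed on different features: humidity (index 0) and
# pressure drop (index 1).
members = [lambda x: x[0] > 0.7,   # high humidity      -> rain
           lambda x: x[1] > 0.5]   # big pressure drop  -> rain
weights = [1.0, 1.0]

# Simulated drift: pressure drop stops being informative, humidity
# remains so, and the second member is repeatedly wrong.
stream = [([0.9, 0.1], True), ([0.2, 0.9], False),
          ([0.8, 0.2], True), ([0.3, 0.8], False)]
for x, truth in stream:
    weights = update(members, weights, x, truth)

print(weights)                              # prints: [1.0, 0.0625]
print(predict(members, weights, [0.2, 0.9]))  # prints: False
```

After the stream, the degraded member's weight has decayed to the point where it can no longer outvote the still-informative one; a full adaptive system would additionally retire such members and train replacements on fresh feature subsets.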

REFERENCES

[1] R. Diao and Q. Shen, “Feature selection with harmony search,” IEEE Trans. Syst., Man, Cybern. B, vol. 42, no. 6, pp. 1509–1523, 2012.
[2] T. Gneiting and A. E. Raftery, “Weather forecasting with ensemble methods,” Science, vol. 310, no. 5746, pp. 248–249, 2005. [Online]. Available: http://www.sciencemag.org/content/310/5746/248.short
[3] T. Ho, “The random subspace method for constructing decision forests,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 8, pp. 832–844, Aug. 1998.
[4] M. Kabir, M. Shahjahan, and K. Murase, “A new hybrid ant colony optimization algorithm for feature selection,” Expert Syst. Appl., vol. 39, no. 3, pp. 3747–3763, 2012.
[5] J. Li, “A combination of DE and SVM with feature selection for road icing forecast,” in Proc. 2nd Int. Asia Conf. Informatics in Control, Automation and Robotics (CAR), vol. 2, Mar. 2010, pp. 509–512.
[6] J. S. Olsson, “Combining feature selectors for text classification,” in Proc. 15th ACM Int. Conf. Information and Knowledge Management, 2006, pp. 798–799.
[7] D. Opitz, “Feature selection for ensembles,” in Proc. 16th National Conference on Artificial Intelligence. AAAI Press, 1999, pp. 379–384.
[8] L. Saitta, “Hypothesis diversity in ensemble classification,” in Foundations of Intelligent Systems, ser. Lecture Notes in Computer Science, F. Esposito, Z. W. Raś, D. Malerba, and G. Semeraro, Eds. Springer Berlin Heidelberg, 2006, vol. 4203, pp. 662–670.
[9] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. McLachlan, A. Ng, B. Liu, P. Yu, Z.-H. Zhou, M. Steinbach, D. Hand, and D. Steinberg, “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, pp. 1–37, 2008.