Different curves
The first novelty of the model that Veeramachaneni developed with his colleagues - Una-May O'Reilly, a principal research scientist at CSAIL, and Alfredo Cuesta-Infante of the Universidad Rey Juan Carlos in Madrid - is that it can factor in data from more than one weather station. In some of their analyses, the researchers used data from 15 or more other sites.
But its main advantage is that it's not restricted to Gaussian probability distributions. Moreover, it can use different types of distributions to characterize data from different sites, and it can combine them in different ways.
It can even use so-called nonparametric distributions, in which the data are described not by a mathematical function, but by a collection of samples, much the way a digital music file consists of discrete samples of a continuous sound wave.
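To make the idea of a sample-based distribution concrete, here is a minimal sketch in Python. It is an illustration only, not the researchers' implementation: the helper `empirical_cdf` and the synthetic two-regime wind speeds are assumptions introduced for this example. The distribution is represented entirely by the stored samples, with no formula such as a Gaussian fitted to them.

```python
import bisect
import random


def empirical_cdf(samples):
    """Return F(x), the fraction of stored samples that are <= x.

    The sorted samples themselves are the distribution; no parametric
    form is assumed. (Hypothetical helper for illustration.)
    """
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(x):
        return bisect.bisect_right(ordered, x) / n

    return cdf


# Synthetic, made-up wind speeds in m/s with two regimes (calm and gusty).
# A single Gaussian would describe this two-humped data poorly.
rng = random.Random(0)
speeds = [rng.gauss(3.0, 0.5) for _ in range(500)]
speeds += [rng.gauss(9.0, 1.0) for _ in range(500)]

cdf = empirical_cdf(speeds)

# Probability that the wind speed is at most 6 m/s, read off the samples.
print(cdf(6.0))
```

Because half of the synthetic samples come from each regime, the value printed for 6 m/s should land close to one half, something a single bell curve centered between the two regimes would only match by coincidence.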
Another aspect of the model is that it can find nonlinear correlations between data sets. Standard regression analysis, of the type commonly used in the wind industry, identifies the straight line that best approximates a scattering of data points, according to some distance measure. But often, a curved line would offer a better approximation. The researchers' model allows for that possibility.
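The difference between a straight-line fit and a curved one can be sketched with ordinary least squares. This is a generic illustration under stated assumptions, not the method described in the paper: the function names and the paired readings are hypothetical, and squared error stands in for the unspecified distance measure.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ...
    (Hypothetical helper for illustration.)
    """
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(m)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def mean_squared_error(xs, ys, coeffs):
    """Average squared gap between the data and the fitted polynomial."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = sum(c * x ** k for k, c in enumerate(coeffs))
        total += (y - predicted) ** 2
    return total / len(xs)


# Made-up paired readings: x is the speed at a reference station, y the
# speed at the target site, related by a curve rather than a line.
xs = [float(i) for i in range(1, 11)]
ys = [0.5 + 0.2 * x + 0.05 * x * x for x in xs]

line = fit_polynomial(xs, ys, 1)   # straight-line regression
curve = fit_polynomial(xs, ys, 2)  # allows curvature

line_mse = mean_squared_error(xs, ys, line)
curve_mse = mean_squared_error(xs, ys, curve)
print(line_mse, curve_mse)
```

On data like this, the quadratic fit recovers the underlying relationship almost exactly, while the best straight line leaves a systematic error at the low and high ends of the range.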
Validation
The researchers first applied their technique to data collected from an anemometer on top of the Museum of Science, which was looking to install a wind turbine on its roof. Once they had evidence of their model's accuracy, they applied it to data provided to them by a major consultant in the wind industry.
With only three months of the company's historical data for a particular wind farm site, Veeramachaneni and his colleagues were able to predict wind speeds over the following two years three times as accurately as existing models could, even when those models were given eight months of data.
Since then, the researchers have improved their model by evaluating alternative ways of calculating joint distributions. According to additional analysis of the data from the Museum of Science, which is reported in the new paper, their revised approach could double the accuracy of their predictions.