Ensemble models have been around for a while. In essence, co-operative ensemble models take many independent predictive models and combine their results for greater accuracy and generalisation. Think "the wisdom of crowds" applied to predictive models.
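
To make that concrete, here is a minimal sketch of the idea (assuming scikit-learn and a toy dataset rather than any particular competition data): three independently trained models are combined by simple soft voting, i.e. averaging their predicted probabilities. The choice of models and parameters is purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Toy binary classification data standing in for a real problem.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Three independent predictive models...
models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("nb", GaussianNB()),
]

# ...combined by averaging their predicted probabilities ("soft" voting).
ensemble = VotingClassifier(estimators=models, voting="soft")

# Compare each single model against the combined ensemble.
for name, model in models + [("ensemble", ensemble)]:
    score = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC = {score:.4f}")
```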

For example, the well-known Netflix Prize, where competitors had to improve on the existing RMSE score for a movie recommendation challenge, was won by a team that employed a combination of 107 different models. Whilst they noted in the detailed write-up of their approach that it could have been cut back dramatically to far fewer models, even the reduced combination behind their reported 88% RMSE figure spanned numerous approaches: k-NN, factorisation, restricted Boltzmann machines (RBMs) and asymmetric factor models.

Whilst there are no rigorous guidelines on which models to use as constituents - that is, the "weak learners" - the ideal scenario is a set of models that each generalise well and, when they do make errors on new data, do not share those errors with the other models, i.e. the models' errors are not strongly correlated. In other words, diversity matters!
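
One rough way to check for that diversity is to look at how correlated the models' out-of-fold errors are. The sketch below (again assuming scikit-learn and toy data; the particular models are illustrative only) computes the pairwise correlation of each model's residuals - lower correlation suggests the models are making different mistakes and should combine well.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "lr": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=200, random_state=0),
    "gb": GradientBoostingClassifier(random_state=0),
}

# Out-of-fold predicted probabilities for the positive class.
preds = {
    name: cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for name, m in models.items()
}

# Residuals (prediction minus label) for each model, and their pairwise
# correlations: the lower the off-diagonal values, the more diverse the models.
names = list(preds)
errors = [preds[n] - y for n in names]
corr = np.corrcoef(errors)
print(names)
print(np.round(corr, 3))
```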

Enter the Kaggle Homesite Quote Conversion challenge. This open challenge hosted by Kaggle.com was a binary classification problem: predict which customers would purchase a given home insurance quote, thereby "converting" them in marketing terms. More interestingly, the top two entries have both provided detailed write-ups of their approaches, and both teams made use of ensemble learning.

Here is a detailed write-up of the winning submission. They started with 500 models before using 125 of them in a 3-layer meta-ensemble. Unsurprisingly, they put a lot of effort into identifying good algorithms and parameters for single-model performance, as well as investigating options to ensure model diversity. Their approach is depicted below:

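Their actual pipeline is far larger than anything worth reproducing here, but the core idea of a stacked meta-ensemble - base models whose out-of-fold predictions become the input features of a higher-level learner - can be sketched with scikit-learn's StackingClassifier. This is a simplified two-level version on toy data, not their pipeline; the base models are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Level-0 "base" models: their out-of-fold predictions become the
# features seen by the next layer.
level0 = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=15)),
    ("nb", GaussianNB()),
]

# Level-1 meta-learner trained on the base models' predicted probabilities.
stack = StackingClassifier(
    estimators=level0,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)

print(cross_val_score(stack, X, y, cv=5, scoring="roc_auc").mean())
```
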
Here is a detailed write-up of the submission from the 2nd-place team. They used 600 base models, of which 500 were created by a robot that automatically trained models using a variety of techniques, including gradient boosting (XGBoost), logistic regression and neural networks, all on randomly chosen features. They used three different types of blender, all of which outperformed the best single model. Their approach is depicted below:

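Again, the sketch below is not their pipeline, but it illustrates the general blending recipe their robot points at: train base models on random feature subsets, collect their predictions on a held-out blending set, and fit a simple blender on those predictions. Scikit-learn's gradient boosting and a logistic-regression blender stand in for the actual model zoo; everything here is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=30, random_state=0)

# Three-way split: base models train on X_train, the blender is fitted on
# their predictions for X_blend, and X_test is kept untouched for comparison.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_blend, X_test, y_blend, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

rng = np.random.default_rng(0)
base_models, feature_subsets = [], []

# "Robot"-style loop: each base model is trained on a random subset of features.
for i in range(5):
    cols = rng.choice(X.shape[1], size=20, replace=False)
    model = GradientBoostingClassifier(random_state=i)
    model.fit(X_train[:, cols], y_train)
    base_models.append(model)
    feature_subsets.append(cols)

# Predictions on the blending set become the blender's input features.
blend_features = np.column_stack(
    [m.predict_proba(X_blend[:, cols])[:, 1]
     for m, cols in zip(base_models, feature_subsets)]
)

# A simple logistic-regression blender (one of several blender types one could use).
blender = LogisticRegression(max_iter=1000)
blender.fit(blend_features, y_blend)

# Compare each base model with the blender on the untouched test set.
for i, (m, cols) in enumerate(zip(base_models, feature_subsets)):
    auc = roc_auc_score(y_test, m.predict_proba(X_test[:, cols])[:, 1])
    print(f"base model {i}: AUC = {auc:.4f}")

test_features = np.column_stack(
    [m.predict_proba(X_test[:, cols])[:, 1]
     for m, cols in zip(base_models, feature_subsets)]
)
blend_auc = roc_auc_score(y_test, blender.predict_proba(test_features)[:, 1])
print(f"blender:      AUC = {blend_auc:.4f}")
```
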
Ensemble techniques seem to be rather prevalent amongst winning Kaggle submissions. If you want to know more about employing this approach on Kaggle, check out the Kaggle ensembling guide.

Ensemble learning... some may call it data snooping, others call it progress. :-)
