The last ensemble method we will discuss in this series is called stacking (short for stacked generalization). It is based on a simple idea: instead of using trivial functions (such as hard voting) to aggregate the predictions of all the predictors in an ensemble, why not train a model to perform this aggregation? The figure below shows such an ensemble performing a regression task on a new instance. Each of the bottom three predictors predicts a different value (3.1, 2.7, and 2.9), and the final predictor (called a blender, or a meta learner) takes these predictions as inputs and makes the final prediction (3.0).

Aggregating predictions using a blending predictor
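In code, the blender is just another model: its input vector is made of the base predictors' outputs rather than the raw features. Here is a toy sketch with made-up training numbers, where the three values from the figure reappear as the new instance's "features":

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy illustration (made-up numbers): each row holds the three base
# predictors' outputs for one training instance
base_preds = np.array([[3.0, 2.8, 2.9],   # predictions for instance 1
                       [1.1, 0.9, 1.0],   # predictions for instance 2
                       [5.2, 4.8, 5.0]])  # predictions for instance 3
targets = np.array([2.9, 1.0, 5.0])

blender = LinearRegression().fit(base_preds, targets)

# The new instance's base predictions from the figure go in, the
# blender's final prediction comes out
print(blender.predict([[3.1, 2.7, 2.9]]))
```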

To train the blender, a common approach is to use a hold-out set. Let’s see how it works. First, the training set is split into two subsets. The first subset is used to train the predictors in the first layer.

Training the first layer
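A minimal sketch of this first step, using a synthetic dataset and three arbitrarily chosen first-layer predictors (the 50/50 split is a choice, not a requirement):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic data standing in for a real training set
X_train, y_train = make_regression(n_samples=1000, n_features=8, noise=10,
                                   random_state=42)

# Split the training set in two: one subset for the first layer,
# one held out for the blender
X_sub1, X_holdout, y_sub1, y_holdout = train_test_split(
    X_train, y_train, test_size=0.5, random_state=42)

# First layer: three different predictors, all trained on subset 1 only
first_layer = [RandomForestRegressor(random_state=42), SVR(), LinearRegression()]
for predictor in first_layer:
    predictor.fit(X_sub1, y_sub1)
```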

Next, the first-layer predictors are used to make predictions on the second (held-out) subset (see the figure below). This ensures that the predictions are “clean,” since the predictors never saw these instances during training. For each instance in the hold-out set, there are now three predicted values. We can create a new training set using these predicted values as input features (which makes this new training set three-dimensional) while keeping the target values. The blender is trained on this new training set, so it learns to predict the target value given the first layer’s predictions.

Training the blender

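Continuing the sketch above, the hold-out predictions become the blender’s training features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# "Clean" predictions: the first layer never saw the hold-out instances
holdout_preds = np.column_stack(
    [predictor.predict(X_holdout) for predictor in first_layer])

# New three-dimensional training set: one feature per first-layer
# predictor, same targets as before
blender = LinearRegression().fit(holdout_preds, y_holdout)

# To predict a new instance: run the first layer, then the blender
def stacked_predict(X_new):
    layer_1_preds = np.column_stack(
        [predictor.predict(X_new) for predictor in first_layer])
    return blender.predict(layer_1_preds)
```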
It is actually possible to train several different blenders this way (e.g., one using Linear Regression, another using Random Forest Regression, and so on), giving us a whole layer of blenders. The trick is to split the training set into three subsets: the first is used to train the first layer; the second is used to create the training set for the second layer (using predictions made by the predictors of the first layer); and the third is used to create the training set for the third layer (using predictions made by the predictors of the second layer). Once this is done, we can make a prediction for a new instance by going through each layer sequentially, as shown in the figure below.

Predictions in a multilayer stacking ensemble
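A rough sketch of this three-subset recipe, reusing the synthetic X_train and y_train from above (subset sizes and model choices are arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Subset 1 trains layer 1; subset 2 trains the layer-2 blenders;
# subset 3 trains the single layer-3 blender
X_1, X_rest, y_1, y_rest = train_test_split(X_train, y_train,
                                            test_size=2 / 3, random_state=42)
X_2, X_3, y_2, y_3 = train_test_split(X_rest, y_rest,
                                      test_size=0.5, random_state=42)

layer_1 = [RandomForestRegressor(random_state=42), SVR(), LinearRegression()]
for predictor in layer_1:
    predictor.fit(X_1, y_1)

# Layer 2: two different blenders trained on layer-1 predictions for subset 2
meta_2 = np.column_stack([p.predict(X_2) for p in layer_1])
layer_2 = [LinearRegression(), RandomForestRegressor(random_state=42)]
for blender in layer_2:
    blender.fit(meta_2, y_2)

# Layer 3: one final blender trained on layer-2 predictions for subset 3
meta_3 = np.column_stack([p.predict(X_3) for p in layer_1])
meta_3 = np.column_stack([b.predict(meta_3) for b in layer_2])
final_blender = Ridge().fit(meta_3, y_3)
```

To predict a new instance, run it through layer 1, feed those predictions to the layer-2 blenders, and feed their outputs to the final blender.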

Older versions of Scikit-Learn did not support stacking directly, but since version 0.22 it provides the StackingClassifier and StackingRegressor classes in sklearn.ensemble. If you are on an older version, it is not too hard to roll your own implementation (see the following exercises); alternatively, you can use an open source implementation such as brew (available at https://github.com/viisar/brew).
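With a recent Scikit-Learn, the hold-out bookkeeping disappears. Note that StackingRegressor trains the blender on out-of-fold cross-validation predictions rather than a single hold-out set, but the idea is the same. A minimal sketch:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

X, y = make_regression(n_samples=1000, n_features=8, noise=10, random_state=42)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=42)),
                ("svr", SVR()),
                ("lin", LinearRegression())],
    final_estimator=LinearRegression(),  # the blender
    cv=5,  # out-of-fold predictions feed the blender
)
stack.fit(X, y)
print(stack.predict(X[:1]))  # final blended prediction for one instance
```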
