AdaBoost

A Basic Example Using the Iris Dataset

Overview

In this example, we train an Adaptive Boosting (AdaBoost) model to classify plant species from characteristic petal and sepal measurements. We will not go into the mathematical details of the model; a few resources are listed below if you are interested in a deeper dive.

Briefly, AdaBoost is a meta-classifier, an example of an ensemble method. It fits a sequence of weak learners, each only a bit better than random guessing, repeatedly on reweighted samples of the training data. The predictions from these weak learners are then combined in a weighted vote, where the better guessers are weighted higher. The result is an iterative search through the solution space for local minima (in hopes of finding a global one!) that steadily improves on random guessing.
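To make "better guessers are weighted higher" concrete, the classic two-class formulation (a sketch of the standard algorithm, not notation from this post) combines the weak learners $h_1, \dots, h_T$ as

$$
H(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right),
\qquad
\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t},
$$

where $\epsilon_t$ is the weighted training error of $h_t$. A learner with error near 0.5 (barely better than guessing) gets a vote near zero, while more accurate learners get larger votes; between rounds, the samples that $h_t$ misclassified have their weights increased so the next learner focuses on them.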

Prerequisites

This script assumes that you have reviewed the following (or already have this know-how):

Data Exploration

First we import the iris dataset and print a description of it so we can examine what is in the data. Remember, to execute a 'cell' like the one below, you can 1) click on it and use the run button above, or 2) click in the cell and hit shift+enter.
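As a concrete version of this step, here is a minimal sketch assuming scikit-learn's built-in copy of the iris dataset (the reference to py.iris_adaboost suggests a Python/scikit-learn workflow):

```python
from sklearn.datasets import load_iris

# Load the bundled iris dataset and print its description so we can
# see what features (sepal/petal measurements) and classes it contains.
iris = load_iris()
print(iris.DESCR)

# Feature matrix and class labels, reused in the steps below.
X, y = iris.data, iris.target
print(X.shape, y.shape)  # (150, 4) (150,)
```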

We randomly select a quarter of our data to be the 'test' dataset. This way we can train our model on the remaining data and test it on data not used in training. Once we are confident that our model generalizes well (i.e., there is not a HUGE difference between training and testing performance; in other words, it is not obviously overfitting), we can use all of our data to train the model.
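A minimal sketch of that split, reusing the X and y arrays from the snippet above (the fixed seed is an arbitrary choice for reproducibility):

```python
from sklearn.model_selection import train_test_split

# Hold out a random 25% of the samples as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
```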

AdaBoost

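Having split the data, we fit the classifier to the training set and compare training vs. testing accuracy. This is a minimal sketch assuming scikit-learn's AdaBoostClassifier with default settings (50 boosting rounds of one-level decision trees, i.e. 'stumps'); the variable names come from the split above:

```python
from sklearn.ensemble import AdaBoostClassifier

# Fit an AdaBoost ensemble of decision stumps on the training data.
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)

# Similar train/test accuracy suggests the model is not obviously overfitting.
print("train accuracy:", ada.score(X_train, y_train))
print("test accuracy: ", ada.score(X_test, y_test))
```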

We note that the performance of the AdaBoost model is similar to that of a decision tree classifier, but inferior to random forests and gradient-boosted trees on this dataset.
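To reproduce that comparison yourself, a sketch like the one below fits each model on the same split and prints test accuracy. The model settings here are scikit-learn defaults (an assumption, since the post does not list its exact configuration), and the numbers will vary with the split and random seed:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)

# Fit each classifier on the same training data and score it on the
# held-out test set.
models = {
    "decision tree":     DecisionTreeClassifier(random_state=42),
    "adaboost":          AdaBoostClassifier(random_state=42),
    "random forest":     RandomForestClassifier(random_state=42),
    "gradient boosting": GradientBoostingClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```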

Feedback

If you have ideas on how to improve this post, please let me know: https://predictivemodeler.com/feedback/

Reference: py.iris_adaboost