Book


Welcome to my ebook, Predictive Modeling – Principles & Practice.

My vision for the book is simple. One does not need to go through years of culinary schooling in order to prepare a great meal. All you need is a great recipe. I have tried to pack a lot of practical usefulness into powerful little recipes that you can execute quickly and easily. Take a look through the menu below, and choose your adventure!

 


Table of Contents

Note: You will notice that most of the table of contents is not yet hyperlinked. This is because I am still working on those posts! I am adding new content every week. You can subscribe to get notified of new posts. Articles that include downloadable content are indicated with a little graphic showing the type of download, such as an Excel file, a SQL file, an R script, or a TensorFlow script.

1. Read.Me
2. Setting up your Predictive Modeling environment
3. Making a Numeric Prediction
> 3.1 Regression Analysis
>> 3.1.1 Ordinary Least Squares (OLS)
>>> 3.1.1.1 Python OLS: Basic Example 
>>> 3.1.1.2 R OLS: Basic Example 
>>> 3.1.1.3 Julia OLS: Basic Example 
>>> 3.1.1.4 Python OLS: Advanced Case Study: Industrial data
>>> 3.1.1.5 Python OLS: Advanced Case Study: Very Large Data
>>> 3.1.1.6 Python OLS: Boston Housing Prices Data 
>> 3.1.2 Non-Negative Least Squares 
>> 3.1.3 Logistic Regression
>> 3.1.4 Stochastic Gradient Descent (SGD) Regression 
>> 3.1.5 Stepwise Regression
>> 3.1.6 Generalized Linear Model (GLM)
>> 3.1.7 Generalized Additive Model (GAM)
>> 3.1.8 Ridge Regression
>> 3.1.9 Isotonic Regression
>> 3.1.10 Lasso Regression
>> 3.1.11 Support Vector Regression
>> 3.1.12 Robust Regression
>> 3.1.13 ElasticNet Regression
>> 3.1.14 Symbolic Regression
> 3.2 Neural Networks
>> 3.2.1 Inspired by Our Brain
>> 3.2.2 Types of Neural Networks
>> 3.2.3 The Multi-Layer Perceptron
>> 3.2.4 TensorFlow
>> 3.2.5 Vector Quantization
>> 3.2.6 Python: Neural Networks
>>> 3.2.6.1 A Basic Example
>>> 3.2.6.2 Advanced: Case Study #1: Industrial Data
>>> 3.2.6.3 Advanced: Case Study #2: Large Number of Variables
>> 3.2.7 R: Neural Networks
>> 3.2.8 Julia: Neural Networks
>> 3.2.9 AiXQL: Neural Networks 
> 3.3 Stochastic Machines
>> 3.3.1 Support Vector Machine
>> 3.3.2 Boltzmann Machine
>> 3.3.3 Simulated Annealing
>> 3.3.4 Genetic Algorithms
>> 3.3.5 Matrix Factorization Method
> 3.4 Time-Series Methods
>> 3.4.1 Autoregression (AR)
>> 3.4.2 Moving Average (MA)
>> 3.4.3 Autoregressive Moving Average (ARMA)
>> 3.4.4 Autoregressive Integrated Moving Average (ARIMA)
>> 3.4.5 Seasonal Autoregressive Integrated Moving Average (SARIMA)
4. Making a Prediction about Class or Category
> 4.1 SGD Classifier
> 4.2 Linear SVC Method
> 4.3 Lazy Classifiers
>> 4.3.1 Nearest Neighbor
>>> 4.3.1.1 k-Nearest Neighbor
>> 4.3.2 K* Algorithm
>> 4.3.3 Bayesian Rules Classifier
>> 4.3.4 Locally Weighted Learning
> 4.4 Kernel Methods
>> 4.4.1 Kernel Density Estimation
> 4.5 Classification Tree Algorithms
>> 4.5.1 Decision Tree
>> 4.5.2 Naive Bayes Tree
>> 4.5.3 CART
>> 4.5.4 CHAID
>> 4.5.5 Decision Stump
>> 4.5.6 Random Forest
>> 4.5.7 C4.5 or J4.8
>> 4.5.8 AdaBoost
>> 4.5.9 Gradient Boosted Tree
>> 4.5.10 Alternating Decision Tree
> 4.6 Bayesian Classifiers
>> 4.6.1 Averaged, One-Dependence Estimators
>> 4.6.2 BayesNet
>> 4.6.3 Complement Naive Bayes
>> 4.6.4 Naive Bayes
>> 4.6.5 Hidden Naive Bayes
>> 4.6.6 DBNBText
>> 4.6.7 AODEsr (Subsumption Resolution)
>> 4.6.8 WAODE
> 4.7 Rule Based Algorithms
>> 4.7.1 Decision Table
>> 4.7.2 OneR
>> 4.7.3 ZeroR
>> 4.7.4 Conjunctive Rule
>> 4.7.5 PART
>> 4.7.6 NNGE
>> 4.7.7 PRISM
>> 4.7.8 M5Rules
>> 4.7.9 RIDOR
>> 4.7.10 JRIP
>> 4.7.11 Ordinal Learning Method
>> 4.7.12 Fuzzy Unordered Rule Induction
5. Unsupervised Learning
> 5.1 Nearest Neighbor
>> 5.1.1 k-Nearest Neighbor
> 5.2 K* Algorithm
> 5.3 Bayesian Rules Classifier
> 5.4 Locally Weighted Learning
> 5.5 Self-Organizing Maps 
6. Exploring Complexity
> 6.1 Cellular Automata
> 6.2 Complex Adaptive Systems
> 6.3 Agent-Based Modeling
7. Measuring Performance
> 7.1 Error Types
> 7.2 Loss Functions
> 7.3 Performance Metrics
>> 7.3.1 Metric Selection
> 7.4 Validation
>> 7.4.1 Split Sampling
>> 7.4.2 Cross-Validation
>> 7.4.3 Bootstrapping
> 7.5 Estimation Error Measurement
>> 7.5.1 R-Square
>> 7.5.2 Weighted R-Square
>> 7.5.3 Adjusted R-Square
>> 7.5.4 Absolute Error
>> 7.5.5 Prediction Error
>> 7.5.6 RMSE
>> 7.5.7 Correlation Coefficient
> 7.6 Classification Error Measurement
>> 7.6.1 Confusion Matrix
>> 7.6.2 Sensitivity & Specificity
>> 7.6.3 Precision & Accuracy
>> 7.6.4 Entropy
>> 7.6.5 Kappa Statistic
> 7.7 Visualizing Performance
>> 7.7.1 Lift Charts
>> 7.7.2 ROC Curves
>> 7.7.3 Lorenz Curves & Gini Coefficient
8. Automated Predictive Modeling
9. Practical Matters
> 9.1 Blueprinting & Prototyping
> 9.2 Managing Expectations
> 9.3 Communication
> 9.4 Documentation
>> 9.4.1 Excel Documentation
>> 9.4.2 SQL Documentation
>> 9.4.3 Project Documentation
>> 9.4.4 Notes & Assumptions
> 9.5 Monitoring & Maintenance
> 9.6 Folder Organization
10. Responsibility & Ethics
> 10.1 With great power…
11. Interesting Applications

Recent Posts

Since most of the table of contents is not yet hyperlinked, you can see some of the more recent posts below for easier access.

05 Jun 2021, 1 Comment
In this post we see an example of a self-organizing map (SOM) and competitive learning in action: one neuron wins at each presentation of input data, and in this way a few neurons can be mapped to large and complex data. The[...]
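
The competitive-learning idea in this excerpt can be illustrated with a minimal sketch. This is not the code from the post itself: the function names (`nearest_unit`, `train_som`), the one-dimensional grid of units, and the linearly decaying learning rate and radius are all assumptions chosen for brevity.

```python
import random

def nearest_unit(units, x):
    """Return the index of the unit whose weights are closest to x (the winner)."""
    return min(range(len(units)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(units[i], x)))

def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a tiny 1-D self-organizing map (hypothetical helper, not from the post).

    Units sit on a line and are indexed 0..n_units-1; grid distance between
    two units is the absolute difference of their indices.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Start every unit at a random point in the unit hypercube.
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs       # decays linearly from 1 toward 0
        lr = lr0 * frac                   # assumed learning-rate schedule
        radius = radius0 * frac           # assumed neighbourhood schedule
        for x in rng.sample(data, len(data)):   # present inputs in random order
            win = nearest_unit(units, x)        # competition: one neuron wins
            for i, unit in enumerate(units):
                if abs(i - win) <= radius:      # winner and its grid neighbours
                    for d in range(dim):
                        unit[d] += lr * (x[d] - unit[d])  # move toward the input
    return units

# Two loose groups of 2-D points, mapped onto four units.
data = [[0.1, 0.1], [0.15, 0.05], [0.05, 0.2], [0.9, 0.9], [0.85, 0.95], [0.95, 0.8]]
units = train_som(data)
winners = [nearest_unit(units, x) for x in data]
```

Each input is assigned to exactly one winning unit, so `winners` gives a compact summary of the data in terms of a handful of neurons. A real SOM would usually use a 2-D grid and a smoother neighbourhood function.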
14 Nov 2020, No Comments
In the video below I provide a brief overview of Microsoft's Azure AutoML. For background on AutoML, read this.  
14 Nov 2020, No Comments
The mitigation of manual labor through automation has always been a goal, especially since the dawn of machines with the industrial revolution. While the term automation was coined in the 1940s as it related to motor vehicle assembly, today the term has another meaning. Automation of data science/predictive modeling/machine learning[...]