Book


Welcome to my ebook, Predictive Modeling – Principles & Practice.

My vision for the book is simple. One does not need years of culinary schooling to prepare a great meal; all you need is a great recipe. I have tried to pack a lot of practical usefulness into powerful little recipes that you can execute quickly and easily. Take a look through the menu below, and choose your adventure!

 


Table of Contents

Note: You will notice that most of the table of contents is not yet hyperlinked. This is because I am still working on those posts! I am adding new content every week, and you can subscribe to be notified of new posts. Articles that include downloadable content are marked with a small icon indicating the file type, for example an Excel file, a SQL file, an R script, or a TensorFlow script.

1. Read.Me
2. Setting up your Predictive Modeling environment
3. Making a Numeric Prediction
> 3.1 Regression Analysis
>> 3.1.1 Ordinary Least Squares (OLS)
>>> 3.1.1.1 Python OLS: Basic Example 
>>> 3.1.1.2 R OLS: Basic Example 
>>> 3.1.1.3 Julia OLS: Basic Example 
>>> 3.1.1.4 Python OLS: Advanced Case Study: Industrial data
>>> 3.1.1.5 Python OLS: Advanced Case Study: Very Large Data
>>> 3.1.1.6 Python OLS: Boston Housing Prices Data 
>> 3.1.2 Non-Negative Least Squares 
>> 3.1.3 Logistic Regression
>> 3.1.4 Stochastic Gradient Descent (SGD) Regression 
>> 3.1.5 Stepwise Regression
>> 3.1.6 Generalized Linear Model (GLM)
>> 3.1.7 Generalized Additive Model (GAM)
>> 3.1.8 Ridge Regression
>> 3.1.9 Isotonic Regression
>> 3.1.10 Lasso Regression
>> 3.1.11 Support Vector Regression
>> 3.1.12 Robust Regression
>> 3.1.13 ElasticNet Regression
>> 3.1.14 Symbolic Regression
> 3.2 Neural Networks
>> 3.2.1 Inspired by Our Brain
>> 3.2.2 Types of Neural Networks
>> 3.2.3 The Multi-Layer Perceptron
>> 3.2.4 TensorFlow
>> 3.2.5 Vector Quantization
>> 3.2.6 Python: Neural Networks
>>> 3.2.6.1 A Basic Example
>>> 3.2.6.2 Advanced: Case Study #1: Industrial Data
>>> 3.2.6.3 Advanced: Case Study #2: Large Number of Variables
>> 3.2.7 R: Neural Networks
>> 3.2.8 Julia: Neural Networks
>> 3.2.9 AiXQL: Neural Networks 
> 3.3 Stochastic Machines
>> 3.3.1 Support Vector Machine
>> 3.3.2 Boltzmann Machine
>> 3.3.3 Simulated Annealing
>> 3.3.4 Genetic Algorithms
>> 3.3.5 Matrix Factorization Method
> 3.4 Time-Series Methods
>> 3.4.1 Autoregression (AR)
>> 3.4.2 Moving Average (MA)
>> 3.4.3 Autoregressive Moving Average (ARMA)
>> 3.4.4 Autoregressive Integrated Moving Average (ARIMA)
>> 3.4.5 Seasonal Autoregressive Integrated Moving Average (SARIMA)
4. Making a Prediction About Class or Category
> 4.1 SGD Classifier
> 4.2 Linear SVC Method
> 4.3 Lazy Classifiers
>> 4.3.1 Nearest Neighbor
>>> 4.3.1.1 k-Nearest Neighbor
>> 4.3.2 K* Algorithm
>> 4.3.3 Bayesian Rules Classifier
>> 4.3.4 Locally Weighted Learning
> 4.4 Kernel Methods
>> 4.4.1 Kernel Density Estimation
> 4.5 Classification Tree Algorithms
>> 4.5.1 Decision Tree
>> 4.5.2 Naive Bayes Tree
>> 4.5.3 CART
>> 4.5.4 CHAID
>> 4.5.5 Decision Stump
>> 4.5.6 Random Forest
>> 4.5.7 C4.5 or J4.8
>> 4.5.8 AdaBoost
>> 4.5.9 Gradient Boosted Tree
>> 4.5.10 Alternating Decision Tree
> 4.6 Bayesian Classifiers
>> 4.6.1 Averaged One-Dependence Estimators
>> 4.6.2 BayesNet
>> 4.6.3 Complement Naive Bayes
>> 4.6.4 Naive Bayes
>> 4.6.5 Hidden Naive Bayes
>> 4.6.6 DBNBText
>> 4.6.7 AODEsr (Subsumption Resolution)
>> 4.6.8 WAODE
> 4.7 Rule Based Algorithms
>> 4.7.1 Decision Table
>> 4.7.2 OneR
>> 4.7.3 ZeroR
>> 4.7.4 Conjunctive Rule
>> 4.7.5 PART
>> 4.7.6 NNGE
>> 4.7.7 PRISM
>> 4.7.8 M5Rules
>> 4.7.9 RIDOR
>> 4.7.10 JRIP
>> 4.7.11 Ordinal Learning Method
>> 4.7.12 Fuzzy Unordered Rule Induction
5. Unsupervised Learning
> 5.1 Self-Organizing Maps
6. Exploring Complexity
> 6.1 Cellular Automata
> 6.2 Complex Adaptive Systems
> 6.3 Agent-based modeling
7. Measuring Performance
> 7.1 Error Types
> 7.2 Loss Functions
> 7.3 Performance Metrics
>> 7.3.1 Metric Selection
> 7.4 Validation
>> 7.4.1 Split Sampling
>> 7.4.2 Cross-Validation
>> 7.4.3 Bootstrapping
> 7.5 Estimation Error Measurement
>> 7.5.1 R-Square
>> 7.5.2 Weighted R-Square
>> 7.5.3 Adjusted R-Square
>> 7.5.4 Absolute Error
>> 7.5.5 Prediction Error
>> 7.5.6 RMSE
>> 7.5.7 Correlation Coefficient
> 7.6 Classification Error Measurement
>> 7.6.1 Confusion Matrix
>> 7.6.2 Sensitivity & Specificity
>> 7.6.3 Precision & Accuracy
>> 7.6.4 Entropy
>> 7.6.5 Kappa Statistic
> 7.7 Visualizing Performance
>> 7.7.1 Lift Charts
>> 7.7.2 ROC Curves
>> 7.7.3 Lorenz Curves & Gini Coefficient
8. Automated Predictive Modeling
9. Practical Matters
> 9.1 Blueprinting & Prototyping
> 9.2 Managing Expectation
> 9.3 Communication
> 9.4 Documentation
>> 9.4.1 Excel Documentation
>> 9.4.2 SQL Documentation
>> 9.4.3 Project Documentation
>> 9.4.4 Notes & Assumptions
> 9.5 Monitoring & Maintenance
> 9.6 Folder Organization
10. Responsibility & Ethics
> 10.1 With great power…

Recent Posts

Since most of the table of contents is not yet hyperlinked, you can see some of the more recent posts below for easier access.

Jun 05, 2021 (1 Comment)
In this post we see an example of a self-organizing map (SOM) and competitive learning in action. In competitive learning, one neuron wins at each presentation of the input data, and in this way a few neurons can be mapped to large and complex data. The [...]

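The winner-takes-all idea described above can be sketched in a few lines. This is a minimal illustration, not the SOM from the post: it assumes a handful of neurons with randomly initialized weight vectors, uses Euclidean distance to pick the winner, and moves only the winner toward each input (a full SOM would also update the winner's grid neighbors).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 2-D, and far fewer neurons than data points.
data = rng.normal(size=(200, 2))
weights = rng.normal(size=(5, 2))  # 5 neurons, each with a 2-D weight vector

learning_rate = 0.1
for epoch in range(20):
    for x in data:
        # Competition: the neuron whose weight vector is closest to x wins.
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Only the winner's weights move toward the current input.
        weights[winner] += learning_rate * (x - weights[winner])

# Each neuron ends up representing a region of the data it repeatedly "won".
print(weights.round(2))
```

After training, the five weight vectors act as prototypes: every input is summarized by its nearest neuron, which is the mapping of "a few neurons to large and complex data" described above.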
Nov 14, 2020 (No Comments)
In the video below I provide a brief overview of Microsoft's Azure AutoML. For background on AutoML, read this.

Nov 14, 2020 (No Comments)
The mitigation of manual labor through automation has always been a goal, especially since the dawn of machines in the industrial revolution. While the term "automation" was coined in the 1940s in the context of motor-vehicle assembly, today it carries another meaning: the automation of data science, predictive modeling, and machine learning [...]