Linear regression (i.e. ordinary least squares) is one of *the* most commonly used statistical modeling technique. It works wonderfully in many situations, but there are scenarios in which some other version of linear modeling *could* yield a better answer.

In this example, we turn to something called "ElasticNet" regression. We will not go into the mathematical details of this model. A few resources are listed below if you are interested in a deeper dive.

I will note that ElasticNet does not always outperform linear regression. However, there is a chance of it performing better (on test data) in cases where there is a significant amount of multicollinearity present, or when a sparser model (i.e. fewer predictors) would generalize better. ElasticNet sort of combines elements from both Lasso and Ridge regression.

First we import a the boston house prices dataset, and print a description of it so we can examine what is in the data. Remember in order to execute a 'cell' like the one below, you can 1) click on it and run it using the *run* button above or 2) click in the cell and hit shift+enter.

In [1]:

```
import pandas as pd
from sklearn.datasets import load_boston
boston = load_boston()
print(boston.data.shape) #get (numer of rows, number of columns or 'features')
print(boston.DESCR) #get a description of the dataset
```

In [2]:

```
data = pd.DataFrame(boston.data, columns=boston.feature_names)
data.head()
```

Out[2]:

In [3]:

```
#For some reason, the loaded data does not include the target variable (MEDV), we add it here
data['MEDV'] = pd.Series(data=boston.target, index=data.index)
data.describe() #get some basic stats on the dataset
```

Out[3]:

In [4]:

```
#Load the independent variables (the x1, x2, etc.) into a dataframe object called 'X'. Similarly for the dependent variable 'Y'
X = data.drop('MEDV', axis = 1) #define independent predictor set (excluding the dependent variable)
Y = data['MEDV'] #define the target values (i.e. the dependent variable)
```

We randomly select a third of our data to be the 'test' dataset. This way we can train our model on 2/3 of the data, and test it on the remainder. Once we are confident that our model is generalizing well (i.e. there is not a HUGE different in the training/testing performance, or in other words, not obviously overfitting), then we can use *all* of our data to train the model.

In [5]:

```
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.33, random_state = 5)
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)
```

We now import the 'ElasticNet' method from the *sklearn* library, and the fit our model as follows:

In [13]:

```
from sklearn.linear_model import ElasticNet
en = ElasticNet(alpha=0.01,l1_ratio=0.01) #a higher value of alpha restricts the coefficients further
en.fit(X_train,Y_train)
Y_pred_train = en.predict(X_train) #predictions on training data
Y_pred = en.predict(X_test) #predictions on testing data
# We plot predicted Y (y-axis) against actual Y (x-axis). Perfect predictions will lie on the diagonal. We see the diagonal trend, suggesting a 'good' fit
import matplotlib.pyplot as plt
plt.scatter(Y_test,Y_pred)
plt.xlabel("Actual Price: $Y_i$")
plt.ylabel("Predicted Price: $\hat{Y}_i$")
plt.title("Predicted Price vs Actual Price: $Y_i$ vs $\hat{Y}_i$")
plt.show()
```

In [14]:

```
#Get mean squared error and R-square for both training & testing data
from sklearn.metrics import mean_squared_error, r2_score
print('Mean Squared Error: on training data: ',mean_squared_error(Y_train,Y_pred_train), '| And on Testing data',mean_squared_error(Y_test,Y_pred))
print('R-Squared Error: on training data: ',r2_score(Y_train,Y_pred_train), '| And on Testing data',r2_score(Y_test,Y_pred))
```

In [15]:

```
#Let's get the coefficients
print('Intercept: ',en.intercept_) # This gives us the intercept term
print('Coefficients: \n',en.coef_) # This gives us the coefficients (in the case of this model, just one coefficient)
```

You can compare this example to the OLS method: https://predictivemodeler.com/2019/08/19/py-ols-boston-house-prices/

If you have ideas on how to improve this post, please let me know: https://predictivemodeler.com/feedback/