If you are wondering why this webpage looks the way it does, it may help to first review Anaconda, Jupyter notebooks, and a basic Python example. You can do so by reviewing the post(s) below.
We load a popular predictive modeling dataset called "Iris" using the sklearn library. Then, we use a histogram to visualize the distribution of one of its features, sepal length, after attaching the variable that we are trying to predict (the species) to the same dataframe.
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names) #loading data into a pandas dataframe (for easier manipulation)
data['y'] = iris.target #iris.data holds only the features; the target (species coded 0, 1, 2) is stored separately in iris.target, so we add it here
data.describe() #get some basic stats on the dataset
data.head() #we observe the first few lines of the dataset (always a good idea to get a sense of what is 'in there')
import matplotlib.pyplot as plt
x = data[["sepal length (cm)"]] #our 'x' axis variable (or whichever column we wish to see the distribution of)
#plt.hist expects array-like input, so we convert the single-column dataframe to a flat (1-D) NumPy array
x_array = x.to_numpy().ravel() #convert dataframe to a 1-D array
n, bins, patches = plt.hist(x_array, bins='auto', facecolor='blue', alpha=0.5)
plt.xlabel('sepal length (cm)')
plt.ylabel('Frequency')
plt.title('Histogram');
A histogram can be a great tool to quickly observe the distribution of data points.
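The histogram above shows one feature on its own. As a minimal sketch of how the same kind of plot could also hint at the relationship with the target, one option is to draw one histogram per species on shared bins. The choice of 10 bins, the variable names, and the use of `iris.target_names` for the legend are assumptions for illustration, not part of the original notebook.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
sepal_length = iris.data[:, 0]  # column 0 is 'sepal length (cm)'

# Shared bin edges so the three histograms are directly comparable
# (10 bins is an arbitrary choice for this sketch)
bin_edges = np.histogram_bin_edges(sepal_length, bins=10)

fig, ax = plt.subplots()
for class_id, class_name in enumerate(iris.target_names):
    # Select only the rows belonging to this species
    ax.hist(sepal_length[iris.target == class_id],
            bins=bin_edges, alpha=0.5, label=class_name)

ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel('Frequency')
ax.set_title('Sepal length by species')
ax.legend()
```

Where the three histograms barely overlap, the feature is likely to be useful for telling the species apart; where they overlap heavily, it is likely to help less.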
Helpful tip: you can use the Snipping Tool in Windows to copy any chart into your report or presentation, or to save it as a picture and use it wherever you like.
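If you would rather save the chart from code than take a screenshot, matplotlib's `savefig` is one option. This is a minimal sketch: the file name `iris_sepal_length_hist.png` and the `dpi` value are hypothetical choices, and the plot is rebuilt here only so that the example runs on its own.

```python
from pathlib import Path

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()

fig, ax = plt.subplots()
ax.hist(iris.data[:, 0], bins='auto', facecolor='blue', alpha=0.5)
ax.set_xlabel('sepal length (cm)')
ax.set_ylabel('Frequency')
ax.set_title('Histogram')

# Hypothetical output file; it is written to the current working directory
output_path = Path('iris_sepal_length_hist.png')
fig.savefig(output_path, dpi=150, bbox_inches='tight')
```

The saved PNG can then be inserted into a report or presentation like any other image.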
If you have ideas on how to improve this post, please let me know: https://predictivemodeler.com/feedback/