Pie Chart

Some preliminaries

If you are wondering why this webpage looks the way it does, it may help to first review the related posts on Anaconda, Jupyter notebooks, and a basic Python example.

Data Exploration with a Pie Chart

We load a popular predictive modeling dataset called "Iris" using the sklearn library. Then we use a pie chart to visualize the data and its relationship with the variable that we are trying to predict.

In [1]:
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names) #loading data into a pandas dataframe (for easier manipulation); pass the name list directly, since wrapping it in another list creates MultiIndex columns
In [2]:
data['y'] = pd.Series(data=iris.target, index=data.index) #iris.data holds only the four features; the target (class codes 0, 1, 2) is stored separately in iris.target, so we add it here
data.describe() #get some basic stats on the dataset
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)           y
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.057333           3.758000          1.199333    1.000000
std             0.828066          0.435866           1.765298          0.762238    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000
In [3]:
data.head() #we observe the first few lines of the dataset (always a good idea to get a sense of what is 'in there')
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  y
0                5.1               3.5                1.4               0.2  0
1                4.9               3.0                1.4               0.2  0
2                4.7               3.2                1.3               0.2  0
3                4.6               3.1                1.5               0.2  0
4                5.0               3.6                1.4               0.2  0
In [4]:
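#convert the dataframe columns to flat (1-D) NumPy arrays
values = data[["sepal length (cm)"]].values.ravel() #the quantity that sizes each slice (or whichever column we wish to see the distribution of)
labels = data[["y"]].values.ravel() #the class of each row; rows that share a label end up in the same slice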
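The labels above are the numeric class codes 0, 1 and 2, so that is what the chart legend will show. Here is a minimal sketch of mapping the codes to species names. It assumes the order used by sklearn's iris.target_names (setosa, versicolor, virginica); the names are hard-coded and `codes` is a hypothetical stand-in for iris.target so that the snippet runs on its own.

```python
# Species names in the order used by sklearn's iris.target_names
# (hard-coded here so the sketch runs without sklearn).
target_names = ["setosa", "versicolor", "virginica"]

# Hypothetical stand-in for a few entries of iris.target
codes = [0, 0, 1, 2, 2]

# Look up the species name for each numeric class code
named_labels = [target_names[code] for code in codes]
print(named_labels)
```

In the notebook itself, iris.target_names[iris.target] should give the same kind of result for all 150 rows, and that array could be passed as the chart labels instead of the numeric codes.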
In [5]:
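from plotly import graph_objs as go
#Note: 'import plotly.plotly' only works on plotly versions before 4 (it later moved to the separate chart_studio package); it is not needed to draw the chart locally
#go.Pie takes flat array-like inputs, so we pass 1-D arrays (ravel() does nothing if they are already flat)
piechart = go.Pie(labels=labels.ravel(), values=values.ravel(), hole=0.3) #you can remove the 'hole' attribute to get a solid pie chart
fig = go.Figure(data=[piechart])
fig.show() #display the chart in the notebook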

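A pie chart is a good way of visualizing the class composition of data. In the chart above, each slice is one class, sized by that class's share of the summed sepal length.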
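To see where the slice sizes come from, it can help to compute them by hand. The sketch below is not plotly's code: `slice_shares` is a hypothetical helper that mimics the documented behavior of summing the values of rows that share a label (or counting rows when no values are given), and the input is toy data rather than the Iris dataset.

```python
from collections import defaultdict

def slice_shares(labels, values=None):
    """Return {label: fraction of the whole pie} for a pie chart.

    Rows with the same label are added together. If no values are
    given, each row counts as 1, so the result is the class composition.
    """
    if values is None:
        values = [1] * len(labels)
    totals = defaultdict(float)
    for label, value in zip(labels, values):
        totals[label] += value
    grand_total = sum(totals.values())
    return {label: total / grand_total for label, total in totals.items()}

# Toy data (illustrative only): four rows, three classes
labels = [0, 0, 1, 2]
values = [1.0, 1.0, 1.0, 1.0]
shares = slice_shares(labels, values)
print(shares)  # {0: 0.5, 1: 0.25, 2: 0.25}
```

Calling slice_shares(labels) without values gives the share of rows in each class, which is the pure class composition; passing a column such as sepal length weights each row by that measurement instead.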

Helpful tip: you can use the 'Snipping Tool' in Windows to copy any chart you have developed into your report or presentation, or to save it as a picture and use it wherever you like.


If you have ideas on how to improve this post, please let me know: https://predictivemodeler.com/feedback/

Reference: py.piechart