In this article we are going to make similar plots using Python’s Seaborn library and R’s ggplot2. Seaborn is built on top of the Matplotlib library, but it has a much simpler syntax than Matplotlib.
Visualizing data in Python
Seaborn is one of the richest data science libraries, and it provides a high-level interface for drawing statistical graphics.
To start, let’s import our libraries.
import seaborn as sns
import matplotlib.pyplot as plt
Now that we have imported our libraries, let’s go through some functions that will help you give your graphs a personal touch. 🙂
Here is a description of the functions and arguments covered in this tutorial:
- sns.set_style() sets the background theme of the plot. "ticks" is the closest to the plot made in R.
- sns.set_context() applies predefined formatting to the plot to fit the context in which the visualization will be used.
- font_scale=1 sets the scale of the font size for all the text in the graph.
- plt.figure() is a command to control different aspects of the Matplotlib graph (as stated before, Seaborn graphs are just Matplotlib plots under the hood).
- sizes=(800,1000) controls the minimum and maximum size of the scatter points on the plot.
- plt.title() gives the plot its main title. If you are an experienced Matplotlib user or have used plt.suptitle() before, you know the confusion when using the two together. The arguments are self-explanatory.
- plt.xlabel() formats the x-axis label. I use set_.. to access the class and include aesthetic properties. This can get cluttered at times, but there are many ways to format a Seaborn/Matplotlib plot. It is useful after the plot has been created: the plot was already made with sns.scatterplot, so now we need to override the default formats in this manner.
- plt.ylabel() works in exactly the same way, just for the y-axis.
- sns.pairplot() plots pairwise relationships in a dataset. By default, this function creates a grid of Axes such that each variable in data is shared on the y-axis across a single row and on the x-axis across a single column. The diagonal Axes are treated differently, drawing a plot to show the univariate distribution of the data for the variable in that column.
  - data: DataFrame. Tidy (long-form) data frame where each column is a variable and each row is an observation.
  - hue: String (variable name), optional. Variable in data to map plot aspects to different colours.
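To see how these pieces fit together, here is a minimal sketch on a small made-up data frame. The toy data, its column names, the title and the axis labels are assumptions for illustration only and are not part of this tutorial's dataset. The Agg backend is selected just so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, made up for illustration only
toy = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0],
    "width": [2.0, 1.5, 3.5, 3.0],
    "weight": [10, 20, 30, 40],
})

sns.set_style("ticks")                 # background theme
sns.set_context("talk", font_scale=1)  # predefined formatting plus font scaling
plt.figure(figsize=(8, 6))             # create the Matplotlib figure to draw on

# size= picks the column that drives point size; sizes= sets the min and max size
ax = sns.scatterplot(data=toy, x="length", y="width",
                     size="weight", sizes=(800, 1000), legend=False)

# Override the default title and axis labels after the plot has been drawn
plt.title("Toy scatter plot")
plt.xlabel("Length")
plt.ylabel("Width")
```

In an interactive session you would drop the matplotlib.use("Agg") line and finish with plt.show().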
Now that we are done with the formalities, let’s jump to the coding part.
We will be using the Iris data set for this tutorial. You can download the Iris data set from here.
Importing required libraries and dataset
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('iris.data', header=None, names=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species'])
- I set header=None as the file contains no header row. Next, I set the names of the columns by passing names as the list of column names.
The data will be loaded as follows:
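As a rough sketch of what the loaded frame looks like, the snippet below reads three rows in the same comma-separated, header-less layout from an in-memory string instead of the iris.data file. The in-memory sample and the columns variable are assumptions added so that the example runs on its own.

```python
import io
import pandas as pd

# A few rows in the same layout as iris.data (no header row), held in memory
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

columns = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Species"]

# header=None stops pandas from treating the first data row as column names
data = pd.read_csv(sample, header=None, names=columns)

print(data.head())   # first rows of the frame
print(data.dtypes)   # four float columns and one object (string) column
```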
Plotting the Pairplot
Add the following lines of code to the previous code.
...
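sns.set_style("ticks")    # background theme
sns.set_context("talk")   # predefined formatting
# pairplot is a figure-level function and creates its own figure,
# so no separate plt.figure() call is needed here
p = sns.pairplot(data=data, hue="Species")
plt.show()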
Seaborn will output a pair plot of the features, coloured by species.
Plotting the Correlation matrix in Python
Next, we will draw a correlation matrix to identify the correlation between the various features of the dataset.
...
plt.figure()
sns.heatmap(data.iloc[:,:-1].corr())
plt.show()
Here data.iloc[:,:-1].corr() returns the correlation matrix of every column except the last one (Species), and sns.heatmap() plots it. Try it out for yourself.
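To make the two steps concrete, here is a small sketch on a made-up data frame where the correlations are known in advance. The toy columns and the extra heatmap arguments (annot, vmin, vmax, cmap) are assumptions for illustration and are not used in the tutorial code above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with known relationships between the columns
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # exactly 2 * a, so corr(a, b) is 1
    "c": [4.0, 3.0, 2.0, 1.0],   # exactly 5 - a, so corr(a, c) is -1
    "label": ["x", "y", "x", "y"],
})

# Drop the last (non-numeric) column, then compute pairwise Pearson correlations
corr = toy.iloc[:, :-1].corr()
print(corr)

plt.figure()
# annot=True writes each coefficient into its cell; vmin/vmax fix the colour scale
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```

Fixing the colour scale to the range from -1 to 1 keeps the colours comparable between different datasets.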
Visualizing data in R
We will be using ggplot2, along with the GGally package, which provides the ggpairs() function. The following R code will load ggplot2 and draw the pairplot for us.
Plotting Pairplot and Correlation Matrix
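library(ggplot2) # Data visualization
# Load the dataset; the file has no header row, so supply the column names
iris <- read.csv('iris.data', header = FALSE, stringsAsFactors = TRUE,
                 col.names = c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species'))
# First let's get a random sampling of the data
iris[sample(nrow(iris), 10), ]
# Plot the pairplot, coloured by species
library(GGally)
ggpairs(iris, aes(colour = Species))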
We got a highly detailed pairplot, and in a bare minimum of lines of code.
Such is the beauty of R that we got the pair plots and the correlation matrix both on the same plot.
Conclusion
One of the main differences, I believe, is that Seaborn builds on Matplotlib, whereas ggplot2 uses a layered approach wherein the user can add aesthetics and formats in any order to create the figure (which I believe can be simpler despite the amount of code required). Most people do not notice, and this may be more significant to some than to others, but Python plots made with Seaborn/Matplotlib do not look quite the same as those made with ggplot2.
Recreating the same plot, albeit with minor differences, is very possible with Seaborn and ggplot2. While the tools are different, they can still be used to create the same object.