In this article we are going to make similar plots using Python's Seaborn library and R's ggplot2. Seaborn is built on top of the Matplotlib library, but its syntax is much simpler than Matplotlib's.
Seaborn is one of the richest data science libraries; it provides a high-level interface for drawing informative and attractive statistical graphics.
To start, let's first import our libraries.
import seaborn as sns
import matplotlib.pyplot as plt
Now that we have imported our libraries, let's go through some functions that will help you give your graphs a personal touch. 🙂
sns.set_style() sets the background theme of the plot. "ticks" is the closest to the plot made in R.
sns.set_context() applies predefined formatting to the plot to suit the context in which the visualization will be used ("paper", "notebook", "talk" or "poster").
font_scale=1 sets the scale of the font size for all the text in the graph (it is a parameter of sns.set_context()).
plt.figure() controls different aspects of the Matplotlib figure, such as its size (as stated before, Seaborn graphs are just Matplotlib plots under the hood).
sizes=(800,1000) controls the minimum and maximum size of the scatter points on the plot; it takes effect when point size is mapped to a variable with size=.
plt.title() gives the plot its main title. If you are an experienced Matplotlib user or have used plt.suptitle() before, you know the confusion when using the two together: plt.title() titles the current Axes, while plt.suptitle() titles the whole figure. The arguments are self-explanatory.
plt.xlabel() formats the x-axis label. I use set_.. to access the class and include aesthetic properties. This can get cluttered at times, but there are many ways to format a Seaborn/Matplotlib plot, and this one is useful after the plot has been created. The plot was already made with sns.scatterplot, so we now need to override the default formats in this manner.
plt.ylabel() works in exactly the same way, just for the y-axis.
sns.pairplot() plots pairwise relationships in a dataset. By default, this function creates a grid of Axes such that each variable in data is shared on the y-axis across a single row and on the x-axis across a single column. The diagonal Axes are treated differently: they show the univariate distribution of the data for the variable in that column. Its main parameters are:
data: DataFrame. Tidy (long-form) data frame where each column is a variable and each row is an observation.
hue: string (variable name), optional. Variable in data to map plot aspects to different colours.
Now that the formalities are done, let's jump to the coding part.
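As a first taste of the coding part, here is a minimal sketch that puts the styling calls described above together on a single scatterplot. It assumes a small toy DataFrame of hand-typed rows in the Iris layout (not the full data set); the title text, label text, colours and figure size are arbitrary choices for illustration, and the Agg backend line only lets the sketch run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data for illustration only: a handful of hand-typed rows in the Iris layout
df = pd.DataFrame({
    "Sepal Length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "Sepal Width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "Petal Length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "Petal Width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

sns.set_style("ticks")                 # white background with tick marks
sns.set_context("talk", font_scale=1)  # presentation-sized fonts and lines
plt.figure(figsize=(8, 6))             # new Matplotlib figure, size in inches

ax = sns.scatterplot(
    data=df, x="Sepal Length", y="Sepal Width",
    hue="Species",
    size="Petal Length",   # map marker size to a variable...
    sizes=(800, 1000),     # ...scaled between this minimum and maximum
)

plt.title("Sepal dimensions", fontsize=20)  # title of the current Axes
plt.suptitle("Iris sample", fontsize=14)    # title of the whole figure

# Override the default axis labels after the plot has been created
ax.set_xlabel("Sepal length (cm)", fontsize=16, color="dimgray")
ax.set_ylabel("Sepal width (cm)", fontsize=16, color="dimgray")
# In an interactive session, plt.show() would display the figure here
```

Because sns.scatterplot() draws onto the current Axes, the plt.* calls and the ax.set_* calls both end up styling the same plot.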
We will be using the Iris data set for this tutorial. You can download the Iris data set from here.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('iris.data', header=None, names=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species'])
header=None is used because the file contains no header row, and names supplies the list of column names. The data will be loaded as follows:
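To see what these two arguments do, here is a minimal sketch that assumes a three-line excerpt in the same headerless, comma-separated layout as iris.data, inlined with io.StringIO so it runs without the download. Strictly speaking, pandas already behaves as if header=None whenever names is passed, so being explicit is harmless; leaving out both, however, makes pandas consume the first flower as the header row.

```python
import io
import pandas as pd

# A three-line excerpt in the same headerless layout as iris.data
raw = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
columns = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Species"]

# header=None: treat the first line as data; names: use our own column names
data = pd.read_csv(raw, header=None, names=columns)

# Without header=None or names, the first row is used as the header
raw.seek(0)
wrong = pd.read_csv(raw)

print(data.shape)   # (3, 5)
print(wrong.shape)  # (2, 5): one observation lost to the header
```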
Add the following lines to the previous code.
...
sns.set_style("ticks")
sns.set_context("talk")
# pairplot creates its own figure, so no plt.figure() call is needed here
p = sns.pairplot(data=data, hue="Species")
plt.show()
Seaborn will output a beautiful plot of the pairwise relationships between the features.
Next, we will draw a correlation matrix to identify the correlations between the features of the dataset.
...
plt.figure()
sns.heatmap(data.iloc[:,:-1].corr())
plt.show()
data.iloc[:,:-1].corr() selects every row and every column except the last one (Species, which holds text labels rather than numbers) and returns their correlation matrix; sns.heatmap() then plots it.
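Here is a small sketch of what data.iloc[:,:-1].corr() computes, on a hand-typed sample with the same columns. It also cross-checks one entry against the Pearson formula, which is the default method of DataFrame.corr(). In recent pandas versions, data.corr(numeric_only=True) is another way to leave out the Species column.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only: a handful of hand-typed rows in the Iris layout
df = pd.DataFrame({
    "Sepal Length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "Sepal Width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "Petal Length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "Petal Width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# iloc[:, :-1] keeps all rows and every column except the last (Species)
numeric = df.iloc[:, :-1]
corr = numeric.corr()  # Pearson correlation by default

# Cross-check one entry: r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
x = numeric["Petal Length"].to_numpy()
y = numeric["Petal Width"].to_numpy()
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(corr.shape)  # (4, 4)
print(np.isclose(corr.loc["Petal Length", "Petal Width"], r_manual))  # True
```

The matrix is symmetric with ones on the diagonal, which is why the heatmap mirrors itself across the diagonal.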
We will be using ggplot2 to plot the graphs in R. With ggplot2 we can create simple, customizable graphics by adding layers of aesthetics to the plot. A great feature for new users is that, apart from loading the data and choosing the geometric shape, the layers of aesthetics can (mostly) be added in any order. This is because ggplot2 is built on the principles of the grammar of graphics, which enable us to create stunning and informative visualizations.
The following R code will load the ggplot2 package (probably the most prominent visualization package in R) together with GGally, which provides ggpairs(), and will generate a pairplot for us.
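library(ggplot2) # Data visualization
library(GGally)  # ggplot2 extension that provides ggpairs()
# Load the dataset; iris.data has no header row, so supply the column names
iris <- read.csv('iris.data', header = FALSE,
                 col.names = c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species'),
                 stringsAsFactors = TRUE)
# First let's get a random sampling of the data
iris[sample(nrow(iris), 10), ]
# Plot the pairplot, coloured by species
ggpairs(iris, aes(colour = Species))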
We get a highly detailed pairplot with a bare minimum of code.
Such is the beauty of R that we get both the pair plots and the correlation coefficients on the same plot.
One of the main differences, I believe, is that Seaborn plots have a better default resolution than ggplot2 graphics, and the syntax required can be much shorter (though this depends on the circumstances). Seaborn uses a programmatic approach whereby the user accesses the classes in Seaborn and Matplotlib to manipulate the plots. ggplot2 uses a layered approach wherein the user can add aesthetics and formats in any order to create the figure (which I believe can be simpler, despite the amount of code required). One difference most people do not notice, and which may matter more to some than to others, is that Python plots saved as graphics take up significantly more disk space than R-generated graphics. Among the graphics in this article, the Seaborn/Matplotlib graphics take up approximately 6x more disk space than the ggplot2 graphics.
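File size depends heavily on the format and resolution chosen when saving, so if disk space matters it is worth measuring your own figures. The sketch below, using a toy line plot and a hypothetical helper png_size, compares the PNG size of the same Matplotlib figure at two dpi settings; it does not attempt to reproduce the 6x figure quoted above.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt


def png_size(fig, dpi):
    """Hypothetical helper: number of bytes the figure takes as a PNG at this dpi."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)
    return len(buf.getvalue())


# Toy figure for illustration only
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o")
ax.set_title("Toy line plot")

low = png_size(fig, dpi=72)
high = png_size(fig, dpi=300)
print(low, high)  # the 300 dpi file has far more pixels and is larger
```

Lowering dpi (or choosing a different format in savefig) is one lever for shrinking Matplotlib output, at the cost of resolution.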
Recreating the same plot, albeit with minor differences, is very possible with both Seaborn and ggplot2. While the tools are different, they can still be used to create the same object.