
Python vs R – Data Visualization

In this article, we are going to make similar plots using Python’s Seaborn library and R’s ggplot2. Seaborn is built on top of Matplotlib, but its syntax is much simpler.

Visualizing data in Python

Seaborn is one of the richest data science libraries. It provides a high-level interface for drawing informative and attractive statistical graphs.

To start, let’s first import our libraries.

import seaborn as sns
import matplotlib.pyplot as plt

Now that we have imported our libraries, let’s go through some functions that will help you give your graphs a personal touch. 🙂

Here is a description of the functions we will be using in this tutorial:

  • sns.set_style() sets the background theme of the plot. “ticks” is the closest to the plot made in R.
  • sns.set_context() applies predefined formatting to fit the context in which the visualization will be used (for example “talk” or “poster”). Its font_scale argument scales the font size of all the text in the graph; font_scale=1 keeps the default size.
  • plt.figure() creates a new Matplotlib figure and lets you control aspects of it, such as its size (as stated before, Seaborn graphs are just Matplotlib plots under the hood).
  • sizes=(800,1000), passed to sns.scatterplot(), controls the minimum and maximum size of the scatter points on the plot.
  • plt.title() gives the plot its main title. If you are an experienced Matplotlib user or have used plt.suptitle() before, you know the confusion of using the two together. The arguments are self-explanatory.
  • plt.xlabel() formats the x-axis label. I use the set_... methods of the label object to set its aesthetic properties. This can get cluttered at times, but there are many ways to format a Seaborn/Matplotlib plot. It is useful after the plot has been created: once sns.scatterplot() has made the plot, the default formats have to be overridden in this manner.
  • plt.ylabel() works in exactly the same way, just for the y-axis.
  • sns.pairplot() plots pairwise relationships in a dataset. By default, this function creates a grid of Axes such that each variable in data is shared on the y-axis across a single row and on the x-axis across a single column. The diagonal Axes are treated differently: they show the univariate distribution of the variable in that column.
    • data : DataFrame – Tidy (long-form) data frame where each column is a variable and each row is an observation.
    • hue : String (variable name), optional. Variable in data to map plot aspects to different colours.
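
To see how the styling functions listed above fit together, here is a minimal sketch. The toy DataFrame, its column names and the label text are all made up for illustration (they are not the Iris data), and the "Agg" backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to get a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up toy values, for illustration only
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 1.5, 3.5, 3.0],
    "weight": [10, 20, 30, 40],
})

sns.set_style("ticks")                 # background theme
sns.set_context("talk", font_scale=1)  # presentation-sized text, default font scale

fig = plt.figure(figsize=(8, 6))       # new Matplotlib figure with an explicit size
ax = sns.scatterplot(data=toy, x="x", y="y",
                     size="weight", sizes=(800, 1000))  # min/max marker size

plt.title("Toy scatter plot")
x_label = plt.xlabel("x value")        # returns the label's Text object
x_label.set_color("dimgray")           # one of the set_... methods on that object
plt.ylabel("y value")
```

Call plt.show() (with an interactive backend) or fig.savefig(...) afterwards to see the result.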

Now that we are done with the formalities, let’s jump to the coding part.

We will be using the Iris data set for this tutorial. You can download the Iris data set from here.

Importing required libraries and dataset

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('iris.data', header=None, names=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species'])
  • I set header=None as the file contains no header row.
  • Next, I set the column names by passing names as the list of column names.

The data will be loaded as follows:

I am using Spyder. Your visualization may differ according to the IDE you are using.
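
If you want to check what header=None and names do before downloading anything, the sketch below reads a few comma-separated rows from memory instead of from iris.data. It assumes the file uses the same five-field layout as the read_csv call above; the three sample rows are only there to make the example self-contained.

```python
import io

import pandas as pd

# Three rows in the assumed layout: four measurements followed by the species
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

columns = ['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species']

# header=None: the first line is data, not column names
# names=columns: supply the column names ourselves
data = pd.read_csv(sample, header=None, names=columns)

print(data.head())
print(data.dtypes)
```

Without header=None, pandas would treat the first data row as the header and silently drop one observation.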

Plotting the Pairplot

Add the following lines of code to the previous code.

...
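sns.set_style("ticks")    # background theme closest to the R plot
sns.set_context("talk")   # larger, presentation-friendly text
# pairplot() creates its own figure, so no plt.figure() call is needed here
p = sns.pairplot(data=data, hue="Species")
plt.show()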

Seaborn will output a beautiful plot of the various features.

Seaborn: iris_pair_plot

Plotting the Correlation matrix in Python

Next, we will draw a correlation matrix to identify the correlation between the various features of the dataset.

...
plt.figure()
sns.heatmap(data.iloc[:,:-1].corr())
plt.show()
  • Here data.iloc[:,:-1].corr() returns the correlation matrix of the four numeric columns (the slice drops the Species column) and sns.heatmap() plots it.
  • Try it out for yourself.
sns heatmap: iris_corr matrix
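
A heatmap is often easier to read with the coefficients printed in the cells and the colour scale fixed to the range -1 to 1. The following is a minimal sketch on synthetic data (the columns a, b, c and the label column are made up). It selects the numeric columns with select_dtypes() as an alternative to slicing off the last column.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data, for illustration only: b follows a closely, c is independent
rng = np.random.default_rng(1)
a = rng.normal(size=50)
toy = pd.DataFrame({
    "a": a,
    "b": 2 * a + 0.1 * rng.normal(size=50),
    "c": rng.normal(size=50),
    "label": ["x"] * 50,
})

# Keep only numeric columns before computing the correlation matrix
corr = toy.select_dtypes("number").corr()

plt.figure()
ax = sns.heatmap(corr, annot=True, fmt=".2f",   # print coefficients in the cells
                 vmin=-1, vmax=1, cmap="coolwarm")
```

Fixing vmin and vmax keeps the colours comparable between different datasets.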


Visualizing data in R

We will be using ggplot2 to plot the graphs in R. Using ggplot2, we can create easy and customizable graphics by adding layers of aesthetics to the plot. A great feature for new users is that, apart from supplying the data and choosing the geometric shape, the layers of aesthetics can (mostly) be added in any order. This is because ggplot2 was built on the principles of the grammar of graphics. These principles enable us to create stunning and informative visualizations.

The following R code loads the ggplot2 package (probably the most prominent visualization package in R) together with GGally, whose ggpairs() function generates a pairplot for us.

Plotting Pairplot and Correlation Matrix

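library(ggplot2) # Data visualization
library(GGally)  # Provides ggpairs()

# Load the dataset; the file has no header row, so name the columns explicitly
iris <- read.csv('iris.data', header = FALSE, stringsAsFactors = TRUE,
                 col.names = c('Sepal.Length', 'Sepal.Width', 'Petal.Length',
                               'Petal.Width', 'Species'))

# First let's get a random sampling of the data
iris[sample(nrow(iris), 10), ]

# plotting pairplot
ggpairs(iris, aes(colour = Species))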

We got a highly detailed pairplot in a bare minimum of code.

iris_r_pairplot

Such is the beauty of R: we got the pair plots and the correlation matrix in the same plot.

Conclusion

One of the main differences, I believe, is that Seaborn plots have a better default resolution than ggplot2 graphics, and the syntax required can be much shorter (though this depends on the circumstances). Seaborn uses a programmatic approach in which the user accesses the classes in Seaborn and Matplotlib to manipulate the plots. ggplot2 uses a layered approach in which the user adds aesthetics and formats in any order to create the figure (which I believe can be simpler, despite the amount of code required).

Something most people do not notice, and which may matter more to some than to others, is that Python plots saved as graphics take up significantly more disk space than R-generated graphics. Among the graphics in this article, the Seaborn/Matplotlib graphics take up approximately 6x more disk space than the ggplot2 graphics.
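
If file size matters to you, it is easy to measure. The sketch below renders the same figure to PNG in memory at two resolutions and compares the byte counts. It does not reproduce or explain the 6x figure above; it only shows one setting (dpi) that affects the size of a saved Matplotlib graphic. The helper png_size_bytes and the plotted points are made up for this illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to get a window
import matplotlib.pyplot as plt
import seaborn as sns

def png_size_bytes(fig, dpi):
    """Hypothetical helper: render fig as PNG into memory and return its size in bytes."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    return buffer.getbuffer().nbytes

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(x=[1, 2, 3, 4], y=[2, 1, 4, 3], ax=ax)  # made-up points

low = png_size_bytes(fig, dpi=72)
high = png_size_bytes(fig, dpi=300)
print(f"72 dpi: {low} bytes, 300 dpi: {high} bytes")
```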

Recreating the same plot, albeit with minor differences, is very possible with Seaborn and ggplot2. While the tools are different, they can still be used to create the same object.

