Python vs R – Data Visualization

In this article we will make similar plots using Python’s Seaborn library and R’s ggplot2. Seaborn is built on top of Matplotlib, but its syntax is much simpler than Matplotlib’s.

Visualizing data in Python

Seaborn is one of the richest data science libraries. It provides a high-level interface for drawing informative and attractive statistical graphs.

To start, let’s import our libraries.

import seaborn as sns
import matplotlib.pyplot as plt

Now that we have imported our libraries, let’s go through some functions that will help you give your graphs a personal touch. 🙂

Description of the various functions we will be using in this tutorial:

  • sns.set_style() sets the background theme of the plot. “ticks” is the closest to the plot made in R.
  • sns.set_context() applies predefined formatting to suit the context in which the visualization will be used. font_scale=1 sets the scale of the font size for all the text in the graph.
  • plt.figure() creates the Matplotlib figure and controls aspects of it such as its size (as stated before, Seaborn graphs are just Matplotlib plots under the hood).
  • sizes=(800,1000), passed to sns.scatterplot() together with its size parameter, controls the minimum and maximum size of the scatter points on the plot.
  • plt.title() gives the plot its main title. If you are an experienced Matplotlib user or have used plt.suptitle() before, you know the confusion that can arise when using the two together. The arguments are self-explanatory.
  • plt.xlabel() formats the x-axis label. I use set_.. to access the class and set aesthetic properties. This can get cluttered at times, but there are many ways to format a Seaborn/Matplotlib plot. It is useful after the plot has been created: the plot was already made with sns.scatterplot(), so we need to override the default formats in this manner.
  • plt.ylabel() works in exactly the same way, just for the y-axis.
  • sns.pairplot() plots pairwise relationships in a dataset. By default, this function creates a grid of Axes such that each variable in data is shared on the y-axis across a single row and on the x-axis across a single column. The diagonal Axes are treated differently: each one shows the univariate distribution of the variable in that column.
    • data : DataFrame. Tidy (long-form) data frame where each column is a variable and each row is an observation.
    • hue : String (variable name), optional. Variable in data to map plot aspects to different colours.
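To see how these functions fit together, here is a minimal sketch of a styled scatter plot. The DataFrame, its column names (length, width, group) and the label texts are made up for illustration, and the smaller sizes=(50, 300) range is my own choice, so treat it as one possible arrangement and not the only way to do it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a plot window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical measurements, only for illustration (not the Iris file)
df = pd.DataFrame({
    "length": [1.4, 4.5, 5.1, 1.3, 4.7, 5.9],
    "width": [0.2, 1.5, 1.8, 0.3, 1.4, 2.1],
    "group": ["a", "b", "c", "a", "b", "c"],
})

sns.set_style("ticks")                   # background theme
sns.set_context("talk", font_scale=1)    # preset sizing plus a font scale
plt.figure(figsize=(8, 6))               # create the Matplotlib figure first

ax = sns.scatterplot(
    data=df, x="length", y="width",
    hue="group",                         # colour the points by group
    size="width", sizes=(50, 300),       # smallest and largest marker size
)

# Override the default title and axis labels after the plot is drawn
plt.title("Width against length")
plt.xlabel("Length (cm)")
plt.ylabel("Width (cm)")
# Call plt.show() or plt.savefig("some_file.png") here to view or keep the plot
```

Because sns.scatterplot() draws on the current Matplotlib Axes, the plt.title(), plt.xlabel() and plt.ylabel() calls that follow it change that same plot.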

Now that we are done with the formalities, let’s jump to the coding part.

We will be using the Iris data set for this tutorial. You can download the Iris data set from here.

Importing required libraries and dataset

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('iris.data', header=None, names=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species'])
  • I set header=None because the file contains no header row.
  • Next, I set the column names by passing names as the list of column names.
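If you are wondering why header=None matters, the sketch below reads the same text twice, once with header=None and names, and once with the defaults. The three rows are sample rows in the same comma-separated layout as iris.data, held in an in-memory io.StringIO buffer so that no file is needed; that buffer is my own stand-in for the real file.

```python
import io
import pandas as pd

# Three sample rows in the header-less, comma-separated layout of iris.data
raw = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
columns = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Species"]

# header=None keeps every row as data; names supplies the column labels
with_names = pd.read_csv(raw, header=None, names=columns)

# With the defaults, pandas treats the first data row as the header
raw.seek(0)
default_read = pd.read_csv(raw)

print(with_names.shape)     # all three rows are kept
print(default_read.shape)   # one row was consumed as column labels
```

With the defaults, the first observation silently becomes the column labels, which is easy to miss on a file with 150 rows.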

The data will be loaded as follows:

I am using Spyder. Your visualization may differ according to the IDE you are using.

Plotting the Pairplot

Add the following lines of code to the previous code.

...
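sns.set_style("ticks")
sns.set_context("talk")
# pairplot is a figure-level function and creates its own figure,
# so a separate plt.figure() call would only open an extra empty one
p = sns.pairplot(data=data, hue="Species")
plt.show()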

Seaborn will output a beautiful plot of the various features.

Seaborn: iris_pair_plot

Plotting the Correlation matrix in Python

Next, we will draw a correlation matrix to identify the correlation between the various features of the dataset.

...
plt.figure()
sns.heatmap(data.iloc[:,:-1].corr())
plt.show()
  • Here data.iloc[:,:-1].corr() returns the correlation matrix of the four numeric columns (the last column, Species, is left out) and sns.heatmap() plots it.
  • Try it out for yourself.

sns heatmap: iris_corr matrix
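Each cell of that matrix is a Pearson correlation coefficient, which is what corr() computes by default. As a rough sketch of what a single cell means, here is the coefficient worked out by hand for two short lists. The pearson() helper and the sample values are mine, written for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative petal measurements, not the full data set
petal_length = [1.4, 1.3, 4.7, 4.5, 6.0, 5.1]
petal_width = [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]

r = pearson(petal_length, petal_width)
print(round(r, 3))
```

A value close to 1 means the two features rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.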

Visualizing data in R

We will be using ggplot2 to plot the graphs in R. With ggplot2 we can create simple, customizable graphics by adding layers of aesthetics to the plot. A great feature for new users is that, apart from loading the data into ggplot2 and choosing the geometric shape, the layers of aesthetics can (mostly) be added in any order. This is because ggplot2 was built on the principles of the grammar of graphics. These principles enable us to create stunning and informative visualizations.

The following R code loads the ggplot2 package (probably the most prominent visualization package in R) and the GGally extension, whose ggpairs() function generates a pairplot for us.

Plotting Pairplot and Correlation Matrix

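library(ggplot2) # Data visualization
library(GGally)  # Provides ggpairs()

# Load the dataset; iris.data has no header row, so name the columns explicitly
iris <- read.csv('iris.data', header = FALSE,
                 col.names = c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species'),
                 stringsAsFactors = TRUE)

# First let's get a random sampling of the data
iris[sample(nrow(iris), 10), ]

# Plot the pairplot, coloured by species
ggpairs(iris, aes(colour = Species))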

We got a highly detailed pairplot, and in a bare minimum of code.

iris_r_pairplot

Such is the beauty of R that we got the pair plots and the correlation values together in the same plot.

Conclusion

One of the main differences, I believe, is that Seaborn plots have a better default resolution than ggplot2 graphics, and the syntax required can be much shorter (though this depends on the circumstances). Seaborn uses a programmatic approach in which the user accesses classes in Seaborn and Matplotlib to manipulate the plots. ggplot2 uses a layered approach in which the user can add aesthetics and formats in any order to create the figure (which I believe can be simpler, despite the amount of code required). Something most people do not notice, and which may matter more to some than to others, is that Python plots saved as graphics take up significantly more disk space than R-generated graphics. Among the graphics in this article, the Seaborn/Matplotlib graphics take up approximately 6x more disk space than the ggplot2 graphics.

Recreating the same plot, albeit with minor differences, is very possible with both Seaborn and ggplot2. While the tools are different, they can still be used to create the same object.
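If you want to check file sizes on your own machine, a small sketch like the one below saves one Matplotlib figure at two resolutions and prints the size of each file. The sample points, the file names and the two dpi values are arbitrary choices of mine, and this does not reproduce the 6x comparison above; it only shows how much the chosen dpi affects the size of a saved PNG.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed for saving
import matplotlib.pyplot as plt

# A tiny throwaway plot with arbitrary points
fig, ax = plt.subplots()
ax.scatter([1, 2, 3, 4], [2, 1, 4, 3])

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for dpi in (72, 300):
        path = os.path.join(tmp, f"plot_{dpi}.png")
        fig.savefig(path, dpi=dpi)            # same figure, different resolution
        sizes[dpi] = os.path.getsize(path)    # file size in bytes
plt.close(fig)

print(sizes)
```

Comparing like with like (same dimensions, same dpi, same file format) is worth doing before drawing conclusions about one library against the other.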

