In this tutorial, we are going to build an Amazon price scraper and auto-mailer app in Python using the Requests, BeautifulSoup and smtplib libraries. The app checks a product's price and, if the price drops below a set value, automatically emails the supplied address.
Right, so what exactly is web scraping? As the name implies, it’s a method of ‘scraping’ or extracting data from webpages. Anything you can see on the internet with your browser, including this tutorial, can be scraped onto your local hard drive.
There are many uses for web scraping. For any data analysis, the first step is data acquisition. The internet is a vast repository of all of mankind’s history and knowledge, and you have the means of extracting anything you want and doing with that information what you will.
What we’ll be covering in the tutorial:
requests
BeautifulSoup
smtplib
This tutorial assumes you have a basic working knowledge of Python.
You can install the third-party packages with pip, of course; re and smtplib ship with Python's standard library, so they don't need to be installed:
pip install beautifulsoup4 requests
After you’re done downloading the packages, go ahead and import them into your code.
import requests
from bs4 import BeautifulSoup
import re
import smtplib
You may have noticed something quirky in the snippet above: we installed a package called beautifulsoup4, but we import from a module called bs4. This is perfectly legal in Python; the name a package is distributed under on PyPI doesn't have to match the name of the module you import from it.
So we have our environment set up and ready. Next, we need the URL for the webpage that we want to scrape. For this tutorial, we're using the Amazon India product page for an MI mobile phone. We will also configure a User-Agent header so that our request is not classified as coming from a bot.
url = "https://www.amazon.in/gp/product/B07DJCN7C4/ref=s9_acss_bw_cg_Top_4b1_w?pf_rd_m=A1K21FY43GMZF8&pf_rd_s=merchandised-search-5&pf_rd_r=X4D57A3PAJ9DGCGG07KS&pf_rd_t=101&pf_rd_p=50e8253f-cd32-4485-86db-b433363f7609&pf_rd_i=6294306031"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}
We know what we want on the page, and that’s well and all, but how do we use Python to read the contents of the page? Well, it works pretty much the same way a human would read the contents of the page off of a web browser.
First, we need to request the web page using the ‘requests’ library.
req = requests.Session()
res = req.get(url, headers=headers)
# print(res.text)
# Voila !! It's working
Now we have a Response object which contains the raw text of the webpage. As of yet, we can't do anything with this. All we have is a vast string that contains the entire source code of the HTML file. To make sense of this, we need to use BeautifulSoup4.
The headers will allow us to mimic a browser visit. Since a response to a bot is different from the response to a browser, and our point of reference is a browser, it’s better to get the browser’s response.
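Before handing the text to a parser, it's worth a quick sanity check: Amazon sometimes returns a non-200 status or a captcha page when it suspects automated traffic. A minimal sketch (not part of the original flow, just a precaution):
if res.status_code != 200:
    print(f"Request failed with status {res.status_code}")
elif "captcha" in res.text.lower():
    print("Amazon served a captcha page instead of the product page")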
BeautifulSoup4 will allow us to find specific tags, by searching for any combination of classes, ids, or tag names. This is done by creating a syntax tree, but the details of that are irrelevant to our goal (and out of the scope of this tutorial).
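To get a feel for what that searching looks like, here is a tiny self-contained example; the HTML snippet and class names below are made up purely for illustration, not taken from Amazon's page:
from bs4 import BeautifulSoup

demo_html = """
<div id="product">
  <span class="title">Some Phone</span>
  <span class="price">Rs. 13,999</span>
</div>
"""
demo = BeautifulSoup(demo_html, features='html.parser')
print(demo.find('span', class_='title').get_text())             # search by tag name and class
print(demo.find(id='product').find(class_='price').get_text())  # search by id, then by class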
So let’s go ahead and create that syntax tree.
The soup is just a BeautifulSoup object that is created by taking a string of raw source code. Keep in mind that we need to specify the HTML parser. This is because BeautifulSoup can also create soup out of XML.
soup = BeautifulSoup(res.text, features='html.parser')
We know what we want from the page (the price lives in the element with id 'priceblock_ourprice', and the product title in the element with id 'productTitle'), and we have the soup. What comes next is traversing the soup and finding those elements. You may laugh at how simple this is with BeautifulSoup.
# finding price
price = soup.find(id='priceblock_ourprice').get_text()
# removing the comma with regex and the rupee sign with a slice
price = float(re.sub(',', '', price[2:]))
title = soup.find(id="productTitle").get_text()
title = title.strip()
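A note of caution: the slice price[2:] assumes the rupee sign and the space after it always come first. If you want something a little more defensive, one option (my own variation, not from the original code) is to let the regex keep only digits and the decimal point:
price_text = soup.find(id='priceblock_ourprice').get_text()
# drop everything except digits and the decimal point, e.g. "₹ 13,999.00" -> 13999.0
price = float(re.sub(r'[^\d.]', '', price_text))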
We’ll convert this code into a function:
def get_price():
    soup = BeautifulSoup(res.text, features='html.parser')
    # finding the price
    price = soup.find(id='priceblock_ourprice').get_text()
    # removing the comma with regex and the rupee sign with a slice
    price = float(re.sub(',', '', price[2:]))
    title = soup.find(id="productTitle").get_text()
    title = title.strip()
    # print(price, title)
    # Awesome
    # Now, when the product price is less than the threshold, send a mail
    if price < 17000:
        send_mail(price, title)
    else:
        print(f"Price still high --> {price}")
We will use Python's built-in smtplib. First, we will connect to Gmail's SMTP server and establish a secure connection.
# We will be using gmail for sending the mail
server = smtplib.SMTP('smtp.gmail.com', 587)
server.ehlo()
server.starttls()
server.ehlo()
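As an aside, Gmail also accepts connections over implicit TLS on port 465; if you prefer that route, smtplib.SMTP_SSL skips the starttls() step (this is just an alternative, the rest of the post sticks with port 587):
# alternative to the four lines above: implicit TLS on port 465
server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
server.ehlo()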
Next, we will log in using a Google app password and send the mail.
server.login('aditya.s0110@gmail.com', password='uiyhbwkojpxbsmad')
subject = "Prices are falling Down!! :D"
body = f"The price for this amazon product has come down \n\n Title - {title} \n\n Price - {price} \n\n Link - {url}"
msg = f"subject: {subject} \n\n {body}"
# now sending the mail
server.sendmail(
'aditya.s0110@gmail.com',
'aditya.s0110@gmail.com',
msg
)
print("Email Sent!!")
Let's condense it into a function. Learn more about smtplib in the official Python documentation.
def send_mail(price, title):
    # We will be using Gmail for sending the mail
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
    # for login we will use a Google app password
    try:
        server.login('aditya.s0110@gmail.com', password='uiyhbwkojpxbsmad')
        subject = "Prices are falling Down!! :D"
        body = f"The price for this amazon product has come down \n\n Title - {title} \n\n Price - {price} \n\n Link - {url}"
        msg = f"subject: {subject} \n\n {body}"
        # now sending the mail
        server.sendmail(
            'aditya.s0110@gmail.com',
            'aditya.s0110@gmail.com',
            msg
        )
        print("Email Sent!!")
        # close the connection to the mail server
        server.quit()
    except Exception as e:
        print(f"Some error occurred: {e}")
get_price()
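To actually monitor the price over time, you would call get_price() on a schedule. A minimal sketch, assuming a one-hour interval (an arbitrary choice of mine); note that res is fetched only once above, so the page is re-requested inside the loop before each check:
import time

while True:
    res = req.get(url, headers=headers)   # refresh the page so each check sees the current price
    get_price()
    time.sleep(60 * 60)                   # wait an hour between checks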
I have also made a video for this tutorial; do watch it. 🙂 Here is the complete code:
# We will be making an Amazon price scraper and an auto mailer in Python
import requests
from bs4 import BeautifulSoup
import re
import smtplib

url = "https://www.amazon.in/gp/product/B07DJCN7C4/ref=s9_acss_bw_cg_Top_4b1_w?pf_rd_m=A1K21FY43GMZF8&pf_rd_s=merchandised-search-5&pf_rd_r=X4D57A3PAJ9DGCGG07KS&pf_rd_t=101&pf_rd_p=50e8253f-cd32-4485-86db-b433363f7609&pf_rd_i=6294306031"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}

req = requests.Session()
res = req.get(url, headers=headers)
# print(res.text)
# Voila !! It's working


# now let's make a function to fetch prices
def get_price():
    soup = BeautifulSoup(res.text, features='html.parser')
    # finding the price
    price = soup.find(id='priceblock_ourprice').get_text()
    # removing the comma with regex and the rupee sign with a slice
    price = float(re.sub(',', '', price[2:]))
    title = soup.find(id="productTitle").get_text()
    title = title.strip()
    # print(price, title)
    # Awesome
    # Now, when the product price is less than the threshold, send a mail
    if price < 17000:
        send_mail(price, title)
    else:
        print(f"Price still high --> {price}")


def send_mail(price, title):
    # We will be using Gmail for sending the mail
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
    # for login we will use a Google app password
    try:
        server.login('aditya.s0110@gmail.com', password='uiyhbwkojpxbsmad')
        subject = "Prices are falling Down!! :D"
        body = f"The price for this amazon product has come down \n\n Title - {title} \n\n Price - {price} \n\n Link - {url}"
        msg = f"subject: {subject} \n\n {body}"
        # now sending the mail
        server.sendmail(
            'aditya.s0110@gmail.com',
            'aditya.s0110@gmail.com',
            msg
        )
        print("Email Sent!!")
        # close the connection to the mail server
        server.quit()
    except Exception as e:
        print(f"Some error occurred: {e}")


get_price()
Note: Please change the URL, the Gmail password, and the email address to your own, or you will get errors.
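Rather than hard-coding the address and app password in the script, a common pattern is to read them from environment variables so they never end up in a shared file. A sketch (the variable names here are my own):
import os

sender = os.environ['ALERT_EMAIL']               # e.g. export ALERT_EMAIL=you@gmail.com
app_password = os.environ['ALERT_APP_PASSWORD']  # a Gmail app password, not your account password
server.login(sender, app_password)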
Comments
Just a beginner with Python here, and I've never done web scraping before. I just want to ask: isn't it risky to put your actual email address and password into the code like that? Will hackers be able to get hold of it somehow?
The password used in this post is an OTP and can only be used once. But do take caution while attempting these stunts. :)
Awesome tutorial! Thanks for sharing.