Amazon Price Scraper and Auto Mailer Python App

In this tutorial, we are going to build an Amazon price scraper and auto mailer in Python using the Requests, BeautifulSoup, and smtplib libraries. The app checks a product’s price and, if it drops below a set value, automatically emails the user at a supplied email address.

What’s Web Scraping?

Right, so what exactly is web scraping? As the name implies, it’s a method of ‘scraping’ or extracting data from webpages. Anything you can see on the internet with your browser, including this tutorial, can be scraped onto your local hard drive.

There are many uses for web scraping. For any data analysis, the first step is data acquisition. The internet is a vast repository of all of mankind’s history and knowledge, and you have the means of extracting anything you want and doing with that information what you will.

What we’ll be covering in the tutorial:

  • Getting web pages using requests
  • Analyzing web pages in the browser for information
  • Extracting information from raw HTML with BeautifulSoup

Pre-requisites

This tutorial assumes you know the following things:

  • Running Python scripts on your computer
  • Basic knowledge of HTML structure

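Apart from Python itself, we need two third-party packages, requests and beautifulsoup4, which you can install with pip. The other two modules we use, re and smtplib, ship with Python’s standard library and need no installation:

pip install beautifulsoup4 requests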

Writing Our Scraper

After you’re done downloading the packages, go ahead and import them into your code.

import requests
from bs4 import BeautifulSoup
import re
import smtplib

You may have noticed something quirky in the snippet above: the package is called beautifulsoup4, but we import from a module called bs4. This is perfectly legal in Python. The name a package is distributed under does not have to match the name of the module you import.

First Steps

So we have our environment set up and ready. Next, we need the URL of the webpage that we want to scrape. For our tutorial, we’re using the Amazon product page for an MI mobile phone. We will also set a User-Agent header so that our request is less likely to be classified as coming from a bot.

url = "https://www.amazon.in/gp/product/B07DJCN7C4/ref=s9_acss_bw_cg_Top_4b1_w?pf_rd_m=A1K21FY43GMZF8&pf_rd_s=merchandised-search-5&pf_rd_r=X4D57A3PAJ9DGCGG07KS&pf_rd_t=101&pf_rd_p=50e8253f-cd32-4485-86db-b433363f7609&pf_rd_i=6294306031"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}

Getting The Page

We know what we want on the page, and that’s all well and good, but how do we use Python to read the contents of the page? Well, it works pretty much the same way a human would read the contents of the page in a web browser.

First, we need to request the web page using the ‘requests’ library.

req = requests.Session()
res = req.get(url, headers=headers)
# print(res.text)
# Voila !! It's working 

Now we have a Response object which contains the raw text of the webpage. As of yet, we can’t do anything with this. All we have is a vast string that contains the entire source code of the HTML file. To make sense of this, we need to use BeautifulSoup4.

The headers will allow us to mimic a browser visit. Since a response to a bot is different from the response to a browser, and our point of reference is a browser, it’s better to get the browser’s response.
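Before parsing, it can help to confirm that the request actually succeeded, since a blocked or failed request still gives you a Response object. The sketch below is a hypothetical helper, `fetch_html`, that is not part of the original script. It assumes it is handed an object that behaves like `requests.Session`, and relies on the `timeout` argument and `raise_for_status()` from the requests library.

```python
def fetch_html(session, url, headers, timeout=10):
    """Return the page source, or raise if the server answered with an error.

    Hypothetical helper: `session` is expected to behave like requests.Session.
    """
    # A timeout keeps the script from hanging forever on a stalled connection.
    response = session.get(url, headers=headers, timeout=timeout)
    # With requests, raise_for_status() raises HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text
```

With the names used in this tutorial, the call would look like `html = fetch_html(requests.Session(), url, headers)`.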

BeautifulSoup4 will allow us to find specific tags by searching for any combination of classes, ids, or tag names. This is done by building a parse tree of the document, but the details of that are irrelevant to our goal (and out of the scope of this tutorial).

So let’s go ahead and create that parse tree.

Making Our Soup

The soup is just a BeautifulSoup object that is created by taking a string of raw source code. Keep in mind that we need to specify the HTML parser. This is because BeautifulSoup can also create soup out of XML.

soup = BeautifulSoup(res.text, features='html.parser')
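To see what the soup gives us before pointing it at a live page, here is a minimal sketch on a hand-written HTML string. The markup is made up for illustration; only the two ids match the ones used later in this tutorial. Note that `find` returns `None` when nothing matches, so the hypothetical helper `text_by_id` checks for that instead of calling `get_text()` blindly.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; a real product page is far larger.
sample_html = """
<html><body>
  <span id="productTitle">  Example Phone (Black, 64GB)  </span>
  <span id="priceblock_ourprice">₹ 16,999.00</span>
</body></html>
"""


def text_by_id(soup, element_id):
    """Return the stripped text of the element with this id, or None if absent."""
    tag = soup.find(id=element_id)
    return tag.get_text().strip() if tag is not None else None


soup = BeautifulSoup(sample_html, features="html.parser")
print(text_by_id(soup, "productTitle"))
print(text_by_id(soup, "priceblock_ourprice"))
print(text_by_id(soup, "no_such_id"))  # a missing element yields None, not a crash
```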

We know what tags we want (the element with the id ‘priceblock_ourprice’ for the price and the one with the id ‘productTitle’ for the title), and we have the soup. What comes next is searching the soup for these tags. You may laugh at how simple this is with BeautifulSoup.

# finding price
price = soup.find(id='priceblock_ourprice').get_text()
# drop the rupee sign and the space after it (first two characters),
# then remove the commas with a regex
price = float(re.sub(r',', '', price[2:]))
title = soup.find(id="productTitle").get_text()
title = title.strip()
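Slicing with `price[2:]` assumes the text always starts with exactly two characters before the digits. As an alternative sketch (a hypothetical helper, not the original code), a regular expression can drop everything except digits and the decimal point, which also tolerates surrounding whitespace.

```python
import re


def parse_price(text):
    """Convert a displayed price such as '₹ 16,999.00' to a float.

    Hypothetical helper: assumes '.' is the decimal separator and ',' only
    groups digits, as on the page used in this tutorial.
    """
    # Remove currency symbols, commas, and whitespace; keep digits and '.'.
    cleaned = re.sub(r"[^\d.]", "", text)
    if not cleaned:
        raise ValueError(f"no price found in {text!r}")
    return float(cleaned)


print(parse_price("₹ 16,999.00"))  # → 16999.0
```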

We’ll convert this code into a function:

def get_price():
    soup = BeautifulSoup(res.text, features='html.parser')
    # finding price
    price = soup.find(id='priceblock_ourprice').get_text()
    # drop the rupee sign and the space after it (first two characters),
    # then remove the commas with a regex
    price = float(re.sub(r',', '', price[2:]))
    title = soup.find(id="productTitle").get_text()
    title = title.strip()
    # print(price, title)
    # Awesome

    # When the product price drops below our target price, send a mail
    if price < 17000:
        send_mail(price,title)
    else:
        print(f"Price still high --> {price}")

Sending the mail

We will use Python’s built-in smtplib module. First, we create a server object and establish a connection.

# We will be using gmail for sending the mail
server = smtplib.SMTP('smtp.gmail.com', 587)
server.ehlo()
server.starttls()
server.ehlo()

Next, we will log in using a Google app password and send the mail.

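# Placeholders: replace with your own Gmail address and Google app password
server.login('your.address@gmail.com', password='your-app-password')
subject = "Prices are falling Down!! :D"
body = f"The price for this amazon product has come down \n\n Title - {title} \n\n Price - {price} \n\n Link - {url}"

# The blank line separates the Subject header from the message body
msg = f"Subject: {subject}\n\n{body}"

# now sending the mail (from address, to address, message)
server.sendmail(
   'your.address@gmail.com',
   'your.address@gmail.com',
   msg
)
# close the connection once the mail is sent
server.quit()
print("Email Sent!!")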
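Building the message as one f-string works, but the standard library’s `email.message.EmailMessage` sets the headers for you, and `smtplib.SMTP.send_message` accepts such an object directly. The sketch below is an alternative under assumptions: `build_alert` is a hypothetical helper, the addresses and product details are placeholders, and `GMAIL_APP_PASSWORD` is an environment variable name chosen for this example so that the password stays out of the source file.

```python
import os
from email.message import EmailMessage


def build_alert(sender, recipient, title, price, url):
    """Return an EmailMessage describing the price drop (hypothetical helper)."""
    msg = EmailMessage()
    msg["Subject"] = "Prices are falling Down!! :D"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        "The price for this amazon product has come down\n\n"
        f"Title - {title}\n\nPrice - {price}\n\nLink - {url}"
    )
    return msg


# Placeholder values for illustration only.
alert = build_alert("you@example.com", "you@example.com",
                    "Example Phone", 16999.0, "https://example.com/item")
# Assumed variable name; os.environ.get returns None if it is not set.
app_password = os.environ.get("GMAIL_APP_PASSWORD")
print(alert["Subject"])
```

With a connected and logged-in `server`, `server.send_message(alert)` would then take the place of the `server.sendmail(...)` call.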

Let’s condense it into a function. You can learn more about smtplib in the Python documentation.

def send_mail(price,title):
    # We will be using gmail for sending the mail
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
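    # for login we will use a Google app password
    # (placeholders: replace with your own address and app password)
    try:
        server.login('your.address@gmail.com', password='your-app-password')
        subject = "Prices are falling Down!! :D"
        body = f"The price for this amazon product has come down \n\n Title - {title} \n\n Price - {price} \n\n Link - {url}"

        # The blank line separates the Subject header from the message body
        msg = f"Subject: {subject}\n\n{body}"

        # now sending the mail (from address, to address, message)
        server.sendmail(
            'your.address@gmail.com',
            'your.address@gmail.com',
            msg
        )
        # close the connection once the mail is sent
        server.quit()
        print("Email Sent!!")
    except smtplib.SMTPException as exc:
        # catch only mail-related errors and report what went wrong
        print(f"Some error occurred!! {exc}")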

get_price()
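The script as written checks the price once and exits. To keep watching, one option is to call the check on a timer. The sketch below is an assumption about how you might do that: `run_periodically` is a hypothetical helper, and the one-hour default interval is arbitrary. Keep in mind that `get_price()` as defined above reuses the `res` fetched at startup, so the page would need to be requested again inside the check for the price to actually refresh.

```python
import time


def run_periodically(check, interval_seconds=3600, max_runs=None, sleep=time.sleep):
    """Call `check()` repeatedly, pausing between calls.

    Hypothetical helper: `max_runs=None` loops until interrupted; `sleep` is
    injectable so the loop can be exercised without real waiting.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        check()
        runs += 1
        # Skip the pause after the final run.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

For this tutorial, `run_periodically(get_price)` would run the check roughly once an hour until you stop the script.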

I have also made a video for this tutorial. Do watch it. 🙂

Full Code:

# An Amazon price scraper and auto mailer in Python

import requests
from bs4 import BeautifulSoup
import re
import smtplib

url = "https://www.amazon.in/gp/product/B07DJCN7C4/ref=s9_acss_bw_cg_Top_4b1_w?pf_rd_m=A1K21FY43GMZF8&pf_rd_s=merchandised-search-5&pf_rd_r=X4D57A3PAJ9DGCGG07KS&pf_rd_t=101&pf_rd_p=50e8253f-cd32-4485-86db-b433363f7609&pf_rd_i=6294306031"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}

req = requests.Session()
res = req.get(url, headers=headers)
# print(res.text)
# Voila !! It's working 

# now let's make a function to fetch prices
def get_price():
    soup = BeautifulSoup(res.text, features='html.parser')
    # finding price
    price = soup.find(id='priceblock_ourprice').get_text()
    # drop the rupee sign and the space after it (first two characters),
    # then remove the commas with a regex
    price = float(re.sub(r',', '', price[2:]))
    title = soup.find(id="productTitle").get_text()
    title = title.strip()
    # print(price, title)
    # Awesome

    # When the product price drops below our target price, send a mail
    if price < 17000:
        send_mail(price,title)
    else:
        print(f"Price still high --> {price}")

def send_mail(price,title):
    # We will be using gmail for sending the mail
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
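    # for login we will use a Google app password
    # (placeholders: replace with your own address and app password)
    try:
        server.login('your.address@gmail.com', password='your-app-password')
        subject = "Prices are falling Down!! :D"
        body = f"The price for this amazon product has come down \n\n Title - {title} \n\n Price - {price} \n\n Link - {url}"

        # The blank line separates the Subject header from the message body
        msg = f"Subject: {subject}\n\n{body}"

        # now sending the mail (from address, to address, message)
        server.sendmail(
            'your.address@gmail.com',
            'your.address@gmail.com',
            msg
        )
        # close the connection once the mail is sent
        server.quit()
        print("Email Sent!!")
    except smtplib.SMTPException as exc:
        # catch only mail-related errors and report what went wrong
        print(f"Some error occurred!! {exc}")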

get_price()

Note: Please change the URL, the Gmail address, and the app password to your own, or you will get errors.
