In this tutorial, we are going to build an Amazon price-scraping and auto-mailer Python app using the Requests, BeautifulSoup, and smtplib libraries. It checks the product's price and, if the price drops below a set value, automatically emails the user at a supplied address.
What’s Web Scraping?
Right, so what exactly is web scraping? As the name implies, it’s a method of ‘scraping’ or extracting data from webpages. Anything you can see on the internet with your browser, including this tutorial, can be scraped onto your local hard drive.
There are many uses for web scraping. For any data analysis, the first step is data acquisition. The internet is a vast repository of all of mankind’s history and knowledge, and you have the means of extracting anything you want and doing with that information what you will.
What we’ll be covering in the tutorial:
- Getting web pages using requests
- Analyzing web pages in the browser
- Extracting information from raw HTML with BeautifulSoup
Pre-requisites
This tutorial assumes you know the following things:
- Running Python scripts on your computer
- Basic knowledge of HTML structure
You can install the third-party packages with pip, of course, like so (re and smtplib ship with Python, so there is nothing to install for them):
pip install bs4 requests
Writing Our Scraper
After you’re done downloading the packages, go ahead and import them into your code.
import requests
from bs4 import BeautifulSoup
import re
import smtplib
You may have noticed something quirky in the snippet above: the package is published on PyPI as beautifulsoup4 (bs4 is just a thin wrapper around it), but in code we import it as bs4.
First Steps
So we have our environment set up and ready. Next, we need the URL for the webpage that we want to scrape. For this tutorial, we're using the URL of a Mi mobile phone listing on Amazon. We will also configure a User-Agent header so that our request will not be classified as coming from a bot.
url = "https://www.amazon.in/gp/product/B07DJCN7C4/ref=s9_acss_bw_cg_Top_4b1_w?pf_rd_m=A1K21FY43GMZF8&pf_rd_s=merchandised-search-5&pf_rd_r=X4D57A3PAJ9DGCGG07KS&pf_rd_t=101&pf_rd_p=50e8253f-cd32-4485-86db-b433363f7609&pf_rd_i=6294306031"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}
Getting The Page
We know what we want on the page, and that’s well and all, but how do we use Python to read the contents of the page? Well, it works pretty much the same way a human would read the contents of the page off of a web browser.
First, we need to request the web page using the ‘requests’ library.
req = requests.Session()
res = req.get(url, headers=headers)
# print(res.text)
# Voila !! It's working
Now we have a Response object which contains the raw text of the webpage. As yet, we can't do much with this: all we have is a vast string containing the entire source code of the HTML page. To make sense of it, we need BeautifulSoup4.
The headers will allow us to mimic a browser visit. Since a response to a bot is different from the response to a browser, and our point of reference is a browser, it’s better to get the browser’s response.
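Before handing the response to a parser, it's also worth confirming the request actually succeeded — Amazon can return a non-200 status (or a captcha page) when it suspects automated traffic. A small defensive sketch; the helper name is mine, not part of the original code:

```python
# Only treat the page as usable if the request came back with HTTP 200.
# (A captcha or error page would otherwise be parsed as if it were the
# product page, and the later .find() calls would return None.)
def page_ok(response):
    return response is not None and response.status_code == 200

# Usage sketch:
#   if page_ok(res):
#       ...parse the page...
#   else:
#       ...retry later or abort...
```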
BeautifulSoup4
will allow us to find specific tags, by searching for any combination of classes, ids, or tag names. This is done by creating a syntax tree, but the details of that are irrelevant to our goal (and out of the scope of this tutorial).
So let’s go ahead and create that syntax tree.
Making Our Soup
The soup is just a BeautifulSoup object created from a string of raw source code. Keep in mind that we need to specify the HTML parser; this is because BeautifulSoup can also parse other markup, such as XML.
soup = BeautifulSoup(res.text, features='html.parser')
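Before searching the real page, here's a toy soup that illustrates the ways of matching mentioned earlier (tag name, class, and id) — the HTML snippet is made up for illustration:

```python
from bs4 import BeautifulSoup

# A tiny, made-up document to show the three common search styles.
html = '<div id="box"><span class="price">16,999</span></div>'
soup = BeautifulSoup(html, 'html.parser')

by_id = soup.find(id='box')                    # match on id
by_class = soup.find('span', class_='price')   # match on tag name + class
print(by_class.get_text())                     # 16,999
```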
We know which elements we want (the ones with the priceblock_ourprice and productTitle ids), and we have the soup. What comes next is traversing the soup to pick out those elements. You may laugh at how simple BeautifulSoup makes this:
# finding the price
price = soup.find(id='priceblock_ourprice').get_text()
# removing the comma with a regex (and the rupee sign via slicing)
price = float(re.sub(',', '', price[2:]))
title = soup.find(id="productTitle").get_text()
title = title.strip()
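To see exactly what that cleanup does, here's the transformation on a sample price string (assuming the page renders the price as a rupee sign, a space, then the amount — e.g. "₹ 16,999.00"):

```python
import re

raw = '₹ 16,999.00'     # sample of what get_text() might return
digits = raw[2:]         # drop the rupee sign and the space -> '16,999.00'
cleaned = float(re.sub(',', '', digits))  # strip the comma, then parse
print(cleaned)           # 16999.0
```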
We'll convert this code into a function:
def get_price():
    soup = BeautifulSoup(res.text, features='html.parser')
    # finding the price
    price = soup.find(id='priceblock_ourprice').get_text()
    # removing the comma with a regex (and the rupee sign via slicing)
    price = float(re.sub(',', '', price[2:]))
    title = soup.find(id="productTitle").get_text()
    title = title.strip()
    # print(price, title)
    # when the product's price is less than the quoted price, send a mail
    if price < 17000:
        send_mail(price, title)
    else:
        print(f"Price still high --> {price}")
Sending the mail
We will use Python's built-in smtplib. First, we will create a server object and establish a connection.
# We will be using gmail for sending the mail
server = smtplib.SMTP('smtp.gmail.com', 587)
server.ehlo()
server.starttls()
server.ehlo()
Next, we will log in using a Google app password and send the mail.
server.login('aditya.s0110@gmail.com', password='your-app-password')  # use your own Google app password
subject = "Prices are falling Down!! :D"
body = f"The price for this Amazon product has come down\n\nTitle - {title}\n\nPrice - {price}\n\nLink - {url}"
msg = f"Subject: {subject}\n\n{body}"
# now sending the mail
server.sendmail(
'aditya.s0110@gmail.com',
'aditya.s0110@gmail.com',
msg
)
print("Email Sent!!")
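As an aside, building the message as a raw "Subject: ..." string is easy to get wrong — the blank line separating headers from body has to be exact. The standard library's email.message.EmailMessage writes the headers for you; a small sketch (the function name is mine, not part of the original code):

```python
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    # EmailMessage writes the headers and the blank separator line for
    # us, so we can't mangle the wire format by hand.
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = subject
    msg.set_content(body)
    return msg
```

Such a message would then be sent with server.send_message(build_message(...)) instead of server.sendmail(...).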
Let's condense it into a function. You can learn more about smtplib in the Python documentation.
def send_mail(price, title):
    # We will be using Gmail for sending the mail
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
    # for login we will use a Google app password
    try:
        server.login('aditya.s0110@gmail.com', password='your-app-password')  # use your own app password
        subject = "Prices are falling Down!! :D"
        body = f"The price for this Amazon product has come down\n\nTitle - {title}\n\nPrice - {price}\n\nLink - {url}"
        msg = f"Subject: {subject}\n\n{body}"
        # now sending the mail
        server.sendmail(
            'aditya.s0110@gmail.com',
            'aditya.s0110@gmail.com',
            msg
        )
        print("Email Sent!!")
    except Exception as e:
        print(f"Some error occurred: {e}")

get_price()
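As written, the script calls get_price() once and exits. To actually watch for a price drop, you would call it on a schedule — a minimal sketch (the interval is an arbitrary choice of mine; note that the check would also need to re-request the page inside the loop, since res is fetched only once above):

```python
import time

def watch(check, interval_seconds=3600, max_checks=None):
    # Call check() repeatedly, sleeping between runs. max_checks=None
    # means "run forever"; a number caps the loop (handy for testing).
    runs = 0
    while max_checks is None or runs < max_checks:
        check()
        runs += 1
        time.sleep(interval_seconds)

# Usage sketch: watch(get_price) to poll once an hour.
```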
I have also made a video for this tutorial. Do watch it. 🙂
Full Code:
# We will be making an Amazon price scraper and an auto mailer in Python
import requests
from bs4 import BeautifulSoup
import re
import smtplib

url = "https://www.amazon.in/gp/product/B07DJCN7C4/ref=s9_acss_bw_cg_Top_4b1_w?pf_rd_m=A1K21FY43GMZF8&pf_rd_s=merchandised-search-5&pf_rd_r=X4D57A3PAJ9DGCGG07KS&pf_rd_t=101&pf_rd_p=50e8253f-cd32-4485-86db-b433363f7609&pf_rd_i=6294306031"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}

req = requests.Session()
res = req.get(url, headers=headers)
# print(res.text)

# now let's make a function to fetch prices
def get_price():
    soup = BeautifulSoup(res.text, features='html.parser')
    # finding the price
    price = soup.find(id='priceblock_ourprice').get_text()
    # removing the comma with a regex (and the rupee sign via slicing)
    price = float(re.sub(',', '', price[2:]))
    title = soup.find(id="productTitle").get_text()
    title = title.strip()
    # print(price, title)
    # when the product's price is less than the quoted price, send a mail
    if price < 17000:
        send_mail(price, title)
    else:
        print(f"Price still high --> {price}")

def send_mail(price, title):
    # We will be using Gmail for sending the mail
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
    # for login we will use a Google app password
    try:
        server.login('aditya.s0110@gmail.com', password='your-app-password')  # use your own app password
        subject = "Prices are falling Down!! :D"
        body = f"The price for this Amazon product has come down\n\nTitle - {title}\n\nPrice - {price}\n\nLink - {url}"
        msg = f"Subject: {subject}\n\n{body}"
        # now sending the mail
        server.sendmail(
            'aditya.s0110@gmail.com',
            'aditya.s0110@gmail.com',
            msg
        )
        print("Email Sent!!")
    except Exception as e:
        print(f"Some error occurred: {e}")

get_price()
You can also check out our other awesome articles:
- Python vs R: Data Visualization
- Cython vs Python: Speed up your Python
- Python getting must for these banking jobs
Note: Please change the URL, the Gmail password, and the email address to your own, or you will get errors.