In this tutorial, we are going to build an Amazon price scraper and auto-mailer app in Python using the Requests, BeautifulSoup and smtplib libraries. The app checks a product's price and, if the price drops below a set value, automatically emails the supplied address.
Right, so what exactly is web scraping? As the name implies, it’s a method of ‘scraping’ or extracting data from webpages. Anything you can see on the internet with your browser, including this tutorial, can be scraped onto your local hard drive.
There are many uses for web scraping. For any data analysis, the first step is data acquisition. The internet is a vast repository of all of mankind’s history and knowledge, and you have the means of extracting anything you want and doing with that information what you will.
What we’ll be covering in the tutorial:
requests
BeautifulSoup
smtplib
This tutorial assumes you have a basic working knowledge of Python.
You can install the third-party packages with pip, of course; re and smtplib ship with Python's standard library, so they don't need to be installed:
pip install beautifulsoup4 requests
After you’re done downloading the packages, go ahead and import them into your code.
import requests
from bs4 import BeautifulSoup
import re
import smtplib
You may have noticed something quirky in the snippet above: we installed a package called beautifulsoup4, but we import from a module called bs4. This is perfectly legal in Python; the name a package is distributed under on PyPI doesn't have to match the name of the module you import from it.
So we have our environment set up and ready. Next, we need the URL for the webpage that we want to scrape. For this tutorial, we're using the Amazon India product page for an MI mobile phone. We will also configure a User-Agent header so that our request is not classified as coming from a bot.
url = "https://www.amazon.in/gp/product/B07DJCN7C4/ref=s9_acss_bw_cg_Top_4b1_w?pf_rd_m=A1K21FY43GMZF8&pf_rd_s=merchandised-search-5&pf_rd_r=X4D57A3PAJ9DGCGG07KS&pf_rd_t=101&pf_rd_p=50e8253f-cd32-4485-86db-b433363f7609&pf_rd_i=6294306031"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}
We know what we want on the page, and that’s well and all, but how do we use Python to read the contents of the page? Well, it works pretty much the same way a human would read the contents of the page off of a web browser.
First, we need to request the web page using the ‘requests’ library.
req = requests.Session()
res = req.get(url, headers=headers)
# print(res.text)
# Voila !! It's working
Now we have a Response object which contains the raw text of the webpage. As of yet, we can't do anything with this. All we have is a vast string that contains the entire source code of the HTML file. To make sense of this, we need to use BeautifulSoup4.
The headers will allow us to mimic a browser visit. Since a response to a bot is different from the response to a browser, and our point of reference is a browser, it’s better to get the browser’s response.
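Before handing the text to a parser, it's worth a quick sanity check: Amazon sometimes returns a non-200 status or a captcha page when it suspects automated traffic. A minimal sketch (not part of the original flow, just a precaution):
if res.status_code != 200:
    print(f"Request failed with status {res.status_code}")
elif "captcha" in res.text.lower():
    print("Amazon served a captcha page instead of the product page")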
BeautifulSoup4 will allow us to find specific tags, by searching for any combination of classes, ids, or tag names. This is done by creating a syntax tree, but the details of that are irrelevant to our goal (and out of the scope of this tutorial).
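To get a feel for what that searching looks like, here is a tiny self-contained example; the HTML snippet and class names below are made up purely for illustration, not taken from Amazon's page:
from bs4 import BeautifulSoup

demo_html = """
<div id="product">
  <span class="title">Some Phone</span>
  <span class="price">Rs. 13,999</span>
</div>
"""
demo = BeautifulSoup(demo_html, features='html.parser')
print(demo.find('span', class_='title').get_text())             # search by tag name and class
print(demo.find(id='product').find(class_='price').get_text())  # search by id, then by class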
So let’s go ahead and create that syntax tree.
The soup is just a BeautifulSoup object that is created by taking a string of raw source code. Keep in mind that we need to specify the HTML parser. This is because BeautifulSoup can also create soup out of XML.
soup = BeautifulSoup(res.text, features='html.parser')
We know what we want from the page (the price lives in the element with id 'priceblock_ourprice', and the product title in the element with id 'productTitle'), and we have the soup. What comes next is traversing the soup and finding those elements. You may laugh at how simple this is with BeautifulSoup.
# finding price
price = soup.find(id='priceblock_ourprice').get_text()
# removing the comma with regex and the rupee sign with a slice
price = float(re.sub(',', '', price[2:]))
title = soup.find(id="productTitle").get_text()
title = title.strip()
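A note of caution: the slice price[2:] assumes the rupee sign and the space after it always come first. If you want something a little more defensive, one option (my own variation, not from the original code) is to let the regex keep only digits and the decimal point:
price_text = soup.find(id='priceblock_ourprice').get_text()
# drop everything except digits and the decimal point, e.g. "₹ 13,999.00" -> 13999.0
price = float(re.sub(r'[^\d.]', '', price_text))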
We’ll convert this code into a function:
def get_price():
    soup = BeautifulSoup(res.text, features='html.parser')
    # finding the price
    price = soup.find(id='priceblock_ourprice').get_text()
    # removing the comma with regex and the rupee sign with a slice
    price = float(re.sub(',', '', price[2:]))
    title = soup.find(id="productTitle").get_text()
    title = title.strip()
    # print(price, title)
    # Awesome
    # Now, when the product price is less than the threshold, send a mail
    if price < 17000:
        send_mail(price, title)
    else:
        print(f"Price still high --> {price}")
We will use Python's built-in smtplib. First, we will connect to Gmail's SMTP server and establish a secure connection.
# We will be using gmail for sending the mail
server = smtplib.SMTP('smtp.gmail.com', 587)
server.ehlo()
server.starttls()
server.ehlo()
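As an aside, Gmail also accepts connections over implicit TLS on port 465; if you prefer that route, smtplib.SMTP_SSL skips the starttls() step (this is just an alternative, the rest of the post sticks with port 587):
# alternative to the four lines above: implicit TLS on port 465
server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
server.ehlo()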
Next, we will log in using a Google app password and send the mail.
server.login('aditya.s0110@gmail.com', password='uiyhbwkojpxbsmad')
subject = "Prices are falling Down!! :D"
body = f"The price for this amazon product has come down \n\n Title - {title} \n\n Price - {price} \n\n Link - {url}"
msg = f"subject: {subject} \n\n {body}"
# now sending the mail
server.sendmail(
'aditya.s0110@gmail.com',
'aditya.s0110@gmail.com',
msg
)
print("Email Sent!!")
Let's condense it into a function. Learn more about smtplib in the official Python documentation.
def send_mail(price, title):
    # We will be using Gmail for sending the mail
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
    # for login we will use a Google app password
    try:
        server.login('aditya.s0110@gmail.com', password='uiyhbwkojpxbsmad')
        subject = "Prices are falling Down!! :D"
        body = f"The price for this amazon product has come down \n\n Title - {title} \n\n Price - {price} \n\n Link - {url}"
        msg = f"subject: {subject} \n\n {body}"
        # now sending the mail
        server.sendmail(
            'aditya.s0110@gmail.com',
            'aditya.s0110@gmail.com',
            msg
        )
        print("Email Sent!!")
        # close the connection to the mail server
        server.quit()
    except Exception as e:
        print(f"Some error occurred: {e}")
get_price()
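To actually monitor the price over time, you would call get_price() on a schedule. A minimal sketch, assuming a one-hour interval (an arbitrary choice of mine); note that res is fetched only once above, so the page is re-requested inside the loop before each check:
import time

while True:
    res = req.get(url, headers=headers)   # refresh the page so each check sees the current price
    get_price()
    time.sleep(60 * 60)                   # wait an hour between checks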
I have also made a video for this tutorial; do watch it. 🙂 Here is the complete code:
# We will be making an Amazon price scraper and an auto mailer in Python
import requests
from bs4 import BeautifulSoup
import re
import smtplib

url = "https://www.amazon.in/gp/product/B07DJCN7C4/ref=s9_acss_bw_cg_Top_4b1_w?pf_rd_m=A1K21FY43GMZF8&pf_rd_s=merchandised-search-5&pf_rd_r=X4D57A3PAJ9DGCGG07KS&pf_rd_t=101&pf_rd_p=50e8253f-cd32-4485-86db-b433363f7609&pf_rd_i=6294306031"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}

req = requests.Session()
res = req.get(url, headers=headers)
# print(res.text)
# Voila !! It's working


# now let's make a function to fetch prices
def get_price():
    soup = BeautifulSoup(res.text, features='html.parser')
    # finding the price
    price = soup.find(id='priceblock_ourprice').get_text()
    # removing the comma with regex and the rupee sign with a slice
    price = float(re.sub(',', '', price[2:]))
    title = soup.find(id="productTitle").get_text()
    title = title.strip()
    # print(price, title)
    # Awesome
    # Now, when the product price is less than the threshold, send a mail
    if price < 17000:
        send_mail(price, title)
    else:
        print(f"Price still high --> {price}")


def send_mail(price, title):
    # We will be using Gmail for sending the mail
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
    # for login we will use a Google app password
    try:
        server.login('aditya.s0110@gmail.com', password='uiyhbwkojpxbsmad')
        subject = "Prices are falling Down!! :D"
        body = f"The price for this amazon product has come down \n\n Title - {title} \n\n Price - {price} \n\n Link - {url}"
        msg = f"subject: {subject} \n\n {body}"
        # now sending the mail
        server.sendmail(
            'aditya.s0110@gmail.com',
            'aditya.s0110@gmail.com',
            msg
        )
        print("Email Sent!!")
        # close the connection to the mail server
        server.quit()
    except Exception as e:
        print(f"Some error occurred: {e}")


get_price()
Note: Please change the URL, the Gmail password, and the email address to your own, or you will get errors.
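Rather than hard-coding the address and app password in the script, a common pattern is to read them from environment variables so they never end up in a shared file. A sketch (the variable names here are my own):
import os

sender = os.environ['ALERT_EMAIL']               # e.g. export ALERT_EMAIL=you@gmail.com
app_password = os.environ['ALERT_APP_PASSWORD']  # a Gmail app password, not your account password
server.login(sender, app_password)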
Comments
Just a beginner with Python here, and I've never done web scraping before. I just want to ask: isn't it risky to put your actual email address and password into the code like that? Will hackers be able to get hold of it somehow?
The password used in this post is an OTP and can only be used once. But do take caution while attempting these stunts. :)
Awesome tutorial! Thanks for sharing.