Categories: Programming Python

Web Scraping Using Python – Scraping Unsplash Photos

In this tutorial, we will learn how to scrape websites using Python and the Selenium module. The technique shown here can be adapted to many other websites, and the script itself works for the search pages on unsplash.com.

In the following sections we will write a Python script that scrapes the download links of the first 10 photos from a given category on Unsplash and stores them in a text file.

What is Web Scraping?

Web Scraping Procedure

Scraping a web page involves fetching it and then extracting data from it. Fetching is the downloading of a page (which a browser does when you view the page). Web crawling is therefore a main component of web scraping, since it fetches pages for later processing. Once a page is fetched, extraction can take place: the content may be parsed, searched, reformatted, copied into a spreadsheet, and so on. Web scrapers typically take something out of a page to make use of it for another purpose somewhere else. An example would be to find and copy names and phone numbers, or companies and their URLs, to a list (contact scraping).
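As a minimal sketch of the extraction step, the snippet below parses a small hard-coded HTML string with Python's built-in html.parser module and collects company names together with their URLs. The sample HTML, the class name LinkExtractor, and the example.com addresses are made up for illustration. In a real scraper the HTML would come from the fetching step.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would fetch this from a URL.
SAMPLE_HTML = """
<ul>
  <li><a href="https://example.com/acme">Acme Corp</a></li>
  <li><a href="https://example.com/globex">Globex</a></li>
</ul>
"""


class LinkExtractor(HTMLParser):
    """Collect (link text, href) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._current_href))
            self._current_href = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


contacts = extract_links(SAMPLE_HTML)
for name, url in contacts:
    print(name, url)
```

The resulting list of pairs could then be written to a spreadsheet or a text file, which is the "use it somewhere else" part of scraping.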


Prerequisites

  1. Make sure you have Selenium installed. If not, run “pip install selenium” to install the latest version.
  2. Firefox browser
  3. geckodriver.exe placed in the folder where you are writing the Python script. Selenium needs this driver to control Firefox.
  4. Python 3.x.x

If you don’t have geckodriver yet, download it from the official repository: https://github.com/mozilla/geckodriver/releases
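Before launching the browser, it can help to confirm that the driver is where the script expects it. The sketch below is one possible check, not part of the original setup: the helper name find_geckodriver is hypothetical, and it assumes the driver file is called geckodriver.exe on Windows or geckodriver elsewhere, placed either in the script folder or somewhere on PATH.

```python
import shutil
from pathlib import Path


def find_geckodriver(folder="."):
    """Return the path to geckodriver in `folder` or on PATH, or None if missing."""
    for name in ("geckodriver.exe", "geckodriver"):
        candidate = Path(folder) / name
        if candidate.is_file():
            return str(candidate)
    # Fall back to searching the directories listed in PATH.
    return shutil.which("geckodriver")


driver_path = find_geckodriver()
if driver_path is None:
    print("geckodriver not found; download it from the releases page")
else:
    print("Using geckodriver at", driver_path)
```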

The folder structure should look like this:


Checking the Configuration

Download the full configuration from my GitHub account.

Copy and run the following code:

from selenium import webdriver

browser = webdriver.Firefox()
url = "https://unsplash.com/search/photos/mountains/"
browser.get(url)

If you run into any errors, please comment below. I will be happy to help. 😁

If everything went well, a Firefox window will open and load the given URL.


Basic Web Scraping Script Output

Planning Our Script

Before we start, go to the website and inspect the page source. You will notice that every download link has the attribute title = “Download photo”. We will use this to separate the download links from the other links on the page. The flow of our script will be:

  1. Search for all ‘a’ tags.
  2. Keep only the tags that have title = “Download photo”.
  3. Save the links in a text file.
  4. Voila!! We are done.
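The filtering step in this plan can be tried without a browser. The sketch below is an illustration rather than the script itself: it uses Python's built-in html.parser instead of Selenium, and the sample HTML, the example.com URLs, and the names DownloadLinkParser and save_download_links are all made up for the demonstration.

```python
import io
from html.parser import HTMLParser

# Hypothetical markup imitating a page where download links are mixed with other links.
SAMPLE_PAGE = """
<a href="https://example.com/photos/1/download" title="Download photo">Download</a>
<a href="https://example.com/about" title="About">About</a>
<a href="https://example.com/photos/2/download" title="Download photo">Download</a>
"""


class DownloadLinkParser(HTMLParser):
    """Steps 1 and 2: visit every 'a' tag and keep those titled 'Download photo'."""

    def __init__(self):
        super().__init__()
        self.download_links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("title") == "Download photo":
            href = attributes.get("href")
            if href:
                self.download_links.append(href)


def save_download_links(html, out_file):
    """Step 3: write one download link per line to an open text file."""
    parser = DownloadLinkParser()
    parser.feed(html)
    for link in parser.download_links:
        print(link, file=out_file)
    return parser.download_links


buffer = io.StringIO()  # stands in for a real text file
found = save_download_links(SAMPLE_PAGE, buffer)
print(buffer.getvalue())
```

Only the two anchors whose title is exactly "Download photo" are kept, while the "About" link is skipped. The comparison is exact and case-sensitive, so a site that changes the title text would need the filter updated.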

Writing Our Script

Download the full configuration from my GitHub account.

Code

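from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By


def view_webpage(link_file):
    """Write the href of every 'Download photo' link on the current page to link_file."""
    try:
        # Step 1: collect every 'a' tag on the page.
        # find_elements_by_tag_name was removed in Selenium 4.3,
        # so we use find_elements together with By.TAG_NAME.
        anchors = browser.find_elements(By.TAG_NAME, 'a')
    except WebDriverException:
        print('Some error occurred while searching for links')
        return
    try:
        for anchor in anchors:
            # Step 2: keep only the anchors titled "Download photo"
            if anchor.get_attribute('title') == 'Download photo':
                # Step 3: write the link to the text file
                print(anchor.get_attribute('href'), file=link_file)
    except WebDriverException:
        print("No data in Element")


browser = webdriver.Firefox()
search_term = "mountains/"
url = "https://unsplash.com/search/photos/" + search_term
browser.get(url)
complete = False
# we will open the file in append mode
link_file = open("links.txt", mode="a+")

# The loop runs only once for now
while not complete:
    view_webpage(link_file)
    complete = True

# Closing the file to save the links to disk, then closing the browser
link_file.close()
browser.quit()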
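One possible refinement of the file handling, sketched under assumptions: a with block closes the file even if an error occurs midway, and because append mode keeps adding to links.txt on every run, skipping links that are already present avoids duplicates. The helper name append_new_links and the example.com URLs are hypothetical and not part of the script above.

```python
import os
import tempfile


def append_new_links(path, links):
    """Append links not already in the file at `path`; return the ones written."""
    existing = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            existing = {line.strip() for line in f if line.strip()}

    written = []
    # The with block closes the file automatically, even on errors.
    with open(path, mode="a", encoding="utf-8") as f:
        for link in links:
            if link and link not in existing:
                f.write(link + "\n")
                existing.add(link)
                written.append(link)
    return written


# Demonstration with made-up links in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "links.txt")
    first = append_new_links(demo_path, ["https://example.com/a", "https://example.com/b"])
    second = append_new_links(demo_path, ["https://example.com/b", "https://example.com/c"])
    print(first, second)
```

In the scraper, the hrefs gathered from the page could be collected into a list and passed to a helper like this instead of being printed to the file one at a time.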

Output

Voila!! It worked. Here are the links you will get in links.txt.


Stay tuned for my upcoming blog post at pyblog.in for an improved version of the script. The new script will let you download as many photos as you want and will support multi-threading.

If you get stuck anywhere, feel free to comment down below. I will be happy to help. 😁

This blog post is for educational purposes only.

Aditya Kumar
