Sagar Tamang — GeistHaus

How to web scrape IIM Jobs using Selenium Python in 3 easy steps.

Sagar Tamang Jan 22, 2024


Data is the new diamond, and web scraping is a powerful technique for gathering that valuable data from the internet. Today, we will learn how to build a web crawler that scrapes job listings from iimjobs.com.

Part 1: Before we write the code

Pre-requisites:
Web Driver — Version:120.0.6095.0 for my example
Chromium — Version:120.0.6095.0 for my example
Selenium Library
BeautifulSoup4 Library

Note: The versions of Web Driver and Chromium should be the same.
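
The note above can be checked with a few lines of Python. This is a minimal sketch, assuming that agreement on the major version (the first number) is the part that matters in practice; the helper names major_version and versions_match are made up for this example and are not part of Selenium.

```python
def major_version(version: str) -> int:
    """Return the leading (major) number of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_match(driver_version: str, browser_version: str) -> bool:
    """True when both version strings share the same major number."""
    return major_version(driver_version) == major_version(browser_version)


# Both values are the versions quoted in this post.
print(versions_match("120.0.6095.0", "120.0.6095.0"))
```

If the check fails, download a driver build that matches your browser before going further.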

You can download my version of the web driver and the chromium from here: https://commondatastorage.googleapis.com/chromium-browser-snapshots/index.html?prefix=Win/1216615/

If you want, you can choose your preferred versions from here:

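You can install the Selenium and BeautifulSoup4 libraries using the pip command in the terminal. The code in Part 2 also uses pandas, numpy, html5lib (the parser passed to BeautifulSoup) and openpyxl (which pandas needs to write .xlsx files), so install those as well.

pip install selenium

pip install beautifulsoup4

pip install pandas numpy html5lib openpyxl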

Part 2: Let’s write the code

Create one Python file and give it your desired name.

Now, import the necessary libraries at the top of the file, like this.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

import os #This will be used to get the present working directory
import pandas as pd # Pandas for using the DataFrame
import numpy as np # Numpy for making calculations
import time 
import datetime

After importing the necessary libraries, it’s time to define the chrome options for the chrome driver.

# Give your path of the Chrome Driver in the CHROMEDRIVER_PATH variable
CHROMEDRIVER_PATH = r'C:\Program Files\chromedriver_win32\chromedriver.exe'
service = Service(CHROMEDRIVER_PATH)
WINDOW_SIZE = "1920,1080"
chrome_options = Options()

# In options.binary_location, give your Chromium path.
chrome_options.binary_location = r"C:\Users\TAMANG\Downloads\Win_1216615_chrome-win\chrome-win\chrome.exe"
chrome_options.add_argument("--window-size=%s" % WINDOW_SIZE)
chrome_options.add_argument('--no-sandbox')

Now, let us start by writing the code for Chrome Driver inside the main function.

Before you start writing the code, you need to understand the structure of the page you are going to scrape. In our case, that is the IIMJobs website.

So, head over to the IIMJobs website and go to the page you want to scrape.

Let’s search for IT jobs.

Now, copy the URL of the web page that you are going to scrape and paste it inside driver.get(), as done in the excerpt below.

We create a DataFrame with the following columns: ‘Job Title’, ‘Experience Reqd’, ‘City’, ‘Date Posted’, ‘URL’, as we are going to extract those values from the page.

Now, let us understand how to extract the first value of ‘Job Title’

First, right-click on the Job Title, then click on the Inspect option.

This opens the browser's developer tools, where you need to study the HTML to pinpoint the tag and class of the element you want.

We also scroll the page several times just before scraping it, so that more of the data is loaded.
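
As an optional alternative to scrolling a fixed number of times (which is what main() further down does), the scrolling step can stop as soon as the page height stops growing. This is only a sketch: scroll_until_stable is a hypothetical helper, not a Selenium function, and it assumes the page loads more listings whenever you reach the bottom. The only Selenium call it relies on is driver.execute_script.

```python
import time


def scroll_until_stable(driver, pause=0.75, max_scrolls=20):
    """Scroll to the bottom until the page height stops growing.

    `driver` is expected to be a Selenium WebDriver (any object with an
    execute_script method works). Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        scrolls += 1
        time.sleep(pause)  # give the page time to load more listings
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop early
        last_height = new_height
    return scrolls
```

The max_scrolls limit keeps the loop from running forever on a page that never stops growing.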

def main():
  global dff1
  dff1 = pd.DataFrame(columns=['Job Title', 'Experience Reqd', 'City', 'Date Posted', 'URL'])

  driver = webdriver.Chrome(service = service, options = chrome_options)
  driver.get("https://www.iimjobs.com/search/IT-0-0-0-1.html")

  counter = 0

  # Scroll to the bottom 19 times so that more listings get loaded
  for _ in range(19):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(0.75)

  soup1 = BeautifulSoup(driver.page_source,'html5lib')
  results = soup1.find('div', id='mainContainer')
  job_elems1 = results.find_all('div', class_=['col-lg-9 col-md-9 col-sm-8 container pdmobr5', 'col-lg-3 col-md-3 col-sm-4 pdlr0 mtb2 hidden-xs'])
  # print(job_elems1)

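  for job_elem1 in job_elems1:
    # The matched divs alternate: even = title block, odd = date/city block
    finding = counter % 2
    if finding == 0:
      # Reset the fields so a failed parse never reuses the previous listing's values
      Title = Exp = URL = City = Date = None
      try:
        print(counter)
        counter = counter + 1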
        
        # Title
        T_n_E = job_elem1.find('a', class_='mrmob5 hidden-xs')
        before_the_parts = T_n_E.get_text()
        parts = before_the_parts.split("(", 1)
        T = parts[0].strip()
        E = parts[1].strip(")")
        Title = T
        # print(Title)
        
        # Experience
        Exp = E
        # print(Exp)

        # URL
        U = job_elem1.find('a',class_='mrmob5 hidden-xs').get('href')
        URL = U
      except Exception as e:
        print("EXCEPTION OCCURRED | COUNTER = " + str(counter))
        pass
    else:
      try:
        print(counter)
        counter = counter + 1
        # Date Posted
        D = job_elem1.find('span', class_='gry_txt txt12 original')
        Date=D.text 
        print(Date)

        # City
        try:
          C = job_elem1.find('span')
          City=C.text.strip()
          print(City)
        except Exception as e:
          City = None
      except Exception as e:
        print("EXCEPTION OCCURRED | COUNTER = " + str(counter))
        pass
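    if finding == 1:
      # Both halves of a listing have been read, so add one row to the table
      dff1 = pd.concat([dff1, pd.DataFrame([[Title, Exp, City, Date, URL]], columns = ['Job Title', 'Experience Reqd', 'City', 'Date Posted', 'URL'])], ignore_index=True)

  # Write the Excel file once, after all listings have been collected
  # (the search above is for IT jobs, so the file is named accordingly)
  dff1.to_excel("IIMJobsJobListing_IT_" + str(datetime.date.today()) + ".xlsx", index = False)
  print(dff1)
  # driver.find_element(By.XPATH, '/html/body/div[3]/div[3]/div[9]/div[5]/div/div/div[3]/div/a').click()
  time.sleep(0.5)
  driver.quit() # quit() closes the browser and also ends the chromedriver process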

main() # Calling the main function at the end
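
The title parsing in main() assumes that the link text looks like "Job title (experience)". As a hedged sketch, the same split can be pulled out into a small function that does not raise when the parenthesis is missing. The name split_title_and_experience and the sample string are made up for illustration; the real link text on the site may differ.

```python
def split_title_and_experience(link_text):
    """Split 'Job title (experience)' into (title, experience).

    Splits on the first '(' like the loop above does. The experience is
    None when the text contains no opening parenthesis.
    """
    title, sep, rest = link_text.partition("(")
    if not sep:
        return link_text.strip(), None
    return title.strip(), rest.strip().rstrip(")").strip()


# The sample string is invented for this example.
print(split_title_and_experience("Senior Data Engineer (5-8 yrs)"))
```

Returning None instead of raising lets the caller decide whether to skip the listing or keep it with an empty experience field.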

I have also written a more complex version that uses multi-threading to scrape several pages in separate browser windows at the same time. You can go through the code in my GitHub here: https://github.com/SAGAR-TAMANG/web-scraping-iimjobs
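
To give a rough idea of the multi-threaded approach, here is a minimal sketch using Python's standard concurrent.futures module. It is not the code from the GitHub repository: scrape_many and fake_scrape are hypothetical names, and fake_scrape only stands in for a real function that would open its own browser window for each URL.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_many(urls, scrape_one, max_workers=4):
    """Run scrape_one(url) for every URL in a thread pool.

    Returns a dict mapping each URL to whatever scrape_one returned.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(scrape_one, urls))  # map keeps input order
    return dict(zip(urls, results))


def fake_scrape(url):
    # Placeholder: a real version would create its own webdriver.Chrome,
    # load the URL, parse the listings and then quit the driver.
    return len(url)


pages = ["https://www.iimjobs.com/search/IT-0-0-0-1.html"]
print(scrape_many(pages, fake_scrape))
```

Giving every thread its own driver is the safer design, since a single browser session is generally not meant to be shared between threads.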

A tutorial video on YouTube will be released soon describing this.

Part 3: Keep tinkering, keep learning.

http://sagartamang0000.wordpress.com/?p=912


The Charge of the Light Brigade by Alfred, Lord Tennyson

Sagar Tamang Aug 20, 2023


I
Half a league, half a league,
Half a league onward,
All in the valley of Death
   Rode the six hundred.
“Forward, the Light Brigade!
Charge for the guns!” he said.
Into the valley of Death
   Rode the six hundred.

II
“Forward, the Light Brigade!”
Was there a man dismayed?
Not though the soldier knew
   Someone had blundered.
   Theirs not to make reply,
   Theirs not to reason why,
   Theirs but to do and die.
   Into the valley of Death
   Rode the six hundred.

III
Cannon to right of them,
Cannon to left of them,
Cannon in front of them
   Volleyed and thundered;
Stormed at with shot and shell,
Boldly they rode and well,
Into the jaws of Death,
Into the mouth of hell
   Rode the six hundred.

IV
Flashed all their sabres bare,
Flashed as they turned in air
Sabring the gunners there,
Charging an army, while
   All the world wondered.
Plunged in the battery-smoke
Right through the line they broke;
Cossack and Russian
Reeled from the sabre stroke
   Shattered and sundered.
Then they rode back, but not
   Not the six hundred.

V
Cannon to right of them,
Cannon to left of them,
Cannon behind them
   Volleyed and thundered;
Stormed at with shot and shell,
While horse and hero fell.
They that had fought so well
Came through the jaws of Death,
Back from the mouth of hell,
All that was left of them,
   Left of six hundred.

VI

When can their glory fade?
O the wild charge they made!
   All the world wondered.
Honour the charge they made!
Honour the Light Brigade,
   Noble six hundred!

My Commentary

While going through the history of Great Britain, I stumbled upon one spectacular moment: the Charge of the Light Brigade.

The Charge of the Light Brigade perfectly encapsulates a moment of heroic and noble action by brave men in the military.

It also highlights the military hierarchy, in which you have no authority to question the decisions of your superiors. As the second stanza says, “Theirs not to make reply, Theirs not to reason why, Theirs but to do and die.”

Sure, the captain made the mistake of misreading the order, or maybe the blame lies with the general who wrote the orders in a vastly ambiguous way. Whatever the reason, the lesson we can learn from this almost suicidal charge against the enemy's well-fortified positions is that these men were ready to die following a command.

I can only dream of reaching the level of professionalism and heroism these men showed. Truly, the noble six hundred!

I also wonder how many such noble actions in the history of mankind have gone unnoticed. The Charge of the Light Brigade has been immortalized by Tennyson’s poem, but I am certain that countless such noble deeds have passed into history completely unnoticed.

Sagar Tamang is a Computer Applications student at the Assam Kaziranga University. His interests span widely, from history to geopolitics, from computer science to entrepreneurship.

http://sagartamang0000.wordpress.com/?p=883


Government Senior Secondary School, Rumtek, Rawtey, East Sikkim | Rumtek Senior Secondary School

Sagar Tamang Jul 1, 2023


Manju Rai

1st Jul 2023

Introduction

My school, Government Senior Secondary School, Rumtek, Rawtey, is in a rural area of Sikkim, India. The school was established in 1881. It is a co-educational school affiliated to the Central Board of Secondary Education (CBSE).

Fig: Rumtek Senior Secondary School Logo

Motto

Towards Progress

This means to move forward: to develop to a higher, better, or more advanced stage. The school aims to provide intellectual enlightenment to socially committed and morally responsible students.

Rumtek Senior Secondary School

My school is a public institution located in Sikkim. It is well equipped: we have a library and a nice playground, our classrooms are modern, the buses are adequate, and the labs are functional.

I joined this school in 2020 and have learned a lot about it since. The school is affordable, and the education I have received is of good quality: I have developed in all aspects of life.

The school excels not only in education but also in sports. I have always loved playing different sports such as kho-kho, badminton, volleyball, football, and many more. I have participated in many inter-school competitions, and the school gives us equal opportunities to take part in extracurricular activities.

According to our interests, we are encouraged to participate in Arts & Crafts, NSS, etcetera, and to become members of various clubs and associations. I am a member of the NSS (National Service Scheme), which is a voluntary association. I have had a great experience because the NSS organizes camps and orients students in educational institutions towards community service. This has given me a great opportunity to volunteer for community service.

It has been a great experience, especially with the support of our teachers. This school has provided many facilities and an excellent education. From here I have started to shine brighter and aim higher. I am very grateful to have received my education at this school, and I proudly spread its name.

Manju Rai
Rai is a 12th-grade science student from Government Senior Secondary School Rumtek, Rawtey, East Sikkim.

http://sagartamang0000.wordpress.com/?p=832


Bidding Farewell To Our Teacher, Dr. Dibya Jyoti Bora | Kaziranga University

Sagar Tamang May 18, 2023


If you’re brave enough to say goodbye, life will award you with a new hello.
Paulo Coelho

All good things must come to an end. It’s a famous proverb that I would like to recall today, to acknowledge that all of us will, someday and somewhere, come to the end of something.

Similarly, one of our respected faculty members is bidding farewell to our school and the university.

We, the SCS students, organized a farewell event to commemorate his excellent days at this university and to show our deepest gratitude for all his hard work.

One of my classmates shot a vlog to make the special day permanent in our memories.

http://sagartamang0000.wordpress.com/?p=813


University ER Model (Data Base Management System – DBMS)

Sagar Tamang Apr 17, 2023

University ER Model (DBMS Subject). I created this Entity-Relationship Model (abbrev: ER Model) by taking inspiration from the management model of my university, Kaziranga University. Any criticism is highly welcome!


http://sagartamang0000.wordpress.com/?p=736


Report on All the Events that Took Place in National Service Scheme (NSS), Kaziranga University, Unit-2 | 7 Day National Special Camp |

Sagar Tamang Mar 29, 2023

22nd of March 2023 to 28th of March 2023 | Jorhat, Assam, India |


http://sagartamang0000.wordpress.com/?p=707


Consequences Of Acting Without Understanding The Full Context. | 10th of March 2023

Sagar Tamang Mar 10, 2023


Well, I’m about to go on a bit of a rant today. It’s about an event that took place today.

We had a letter-writing competition today, which was held for the women’s day celebration, and I had taken part in it along with one of my friends.

Okay, there’s no problem with that.

The rant is about the time we spent in the auditorium, patiently waiting for our names to be called among the winners.

I was so sure that either my name or my friend’s would be called out, because only 5 students had taken part in the letter-writing competition and there was a HUGE probability that at least one of us would make it into the top 3 positions.

It must be noted that only the top 3 performers would be selected and called out.

One by one, the anchor delivered the names, but in the end, my name was not called, and neither was my friend’s.

Well, we had to accept, in the end, that our letters were not up to the mark. And our handful of competitors managed to outwit us and our beliefs.

Well, that was what we believed, until evening, when I received a message pointing out that we had written letters with the wrong subject.

YES, the two of us, sitting together (although we didn’t cheat), both managed to write a letter on the wrong subject. And I know why it happened: I had not read the subject properly from the poster. I just made up the wrong subject in my mind, and hence the outcome.

Lesson to be learnt: never work on autopilot and make things up. Instead, read everything carefully before attempting something. I have made this mistake many times in my exams too, and it is something I need to take care of.

FIN.

You listening to my rants, much like these shepherds engaged in a discourse. Image courtesy: https://www.oldbookillustrations.com/illustrations/cuckoo-come-again/

http://sagartamang0000.wordpress.com/?p=674


Student’s Experience in Startup Pitch Competition at Kaziranga University | 6th of March 2023

Sagar Tamang Mar 7, 2023


So, we had a STARTUP PITCH DECK at our very own KU!

It was a thrilling experience, having to pitch your startup in front of judges. It was my first time, to be honest, pitching a startup.

This is what I expected life at a university to be like: taking part in various events, like debate competitions, startup competitions, dramas, etcetera.

Anyway, back to the topic. I was thrilled when I heard that a startup competition was taking place. This was something I had been waiting for for so long.

Having heard the news, I asked all my friends to join the competition. Heck, I even posted a message on the class’s official group telling everyone, “If someone lacks enough team members,” since a minimum of three members was required by the guidelines, “then they can include my name, I would gladly oversee your idea.”

I guess the previous para shows how excited I was for the event.

Anyway, over a couple of days, my team shrank one member at a time. Some had family functions, some had stage fright, and yada yada; in the end, only my friend Ashish Tamang and I were left in the team.

We made plans for the event together, and I went to work creating the presentation.

The event was a success in the end. Although I could only secure 2nd position in the competition, it was a great learning experience for me; pitching an idea that is a bit out of the ordinary was a first for me. Well, there’s a first time for everything, isn’t there?

Anyway, before I conclude, you can go and watch my whole pitch on my official YouTube channel:

My friend Ashish also made a vlog of that day, please have a look at it!

http://sagartamang0000.wordpress.com/?p=657


Presentation on Learning How to Learn

Sagar Tamang Feb 26, 2023

Learning How to Learn by Barbara Oakley and Dr. Terrence Sejnowski is a MOOC on Coursera that guides the learners on techniques used by experts to learn complex disciplines of knowledge.


http://sagartamang0000.wordpress.com/?p=600


Villain

Sagar Tamang Jan 29, 2023


I once worked for a villain. A savior to some and a biblical demon of old to others. A true product of his environment, he was the best and worst of us. ‘We are all potential villains in someone else’s story’ he would say to us as we would head out into the unknowns that the night had waiting for us. It was during one of these nights that I looked around me and saw horns and pitchforks among my people and realized what he meant. We were no knights of the round table. Whatever we were.. we were needed. In the end, I guess that justified most of what was about to happen.
El Diablito Chapter. ( Draft notes )

I was listening to Lex’s podcast conversation with El Diablito the other day. El Diablito is a former police officer from Mexico.

Mexico, along with other Latin American countries, battles the problem of gang violence. Gruesome terror is inflicted by gang members on a regular basis. Listening to his conversation, however, sheds light on a more humane view of these gangs.

In the discourse, he tells how his old friends opted for the cartel (gang) life, the complete opposite of what he chose, and how each one of them, the bad guys, has a life of their own too. It is just that their choices have led them to their present state, and each one of them is a hero to some, in their own way, and a villain to others.

http://sagartamang0000.wordpress.com/?p=463


https://sagartamang0000.wordpress.com/feed
