How to scrape an IMDb webpage?
I am teaching myself web scraping in Python as part of an effort to learn data analysis, and I am trying to scrape an IMDb webpage.
I am using the BeautifulSoup module. This is the code I have so far:
import requests
from bs4 import BeautifulSoup

r = requests.get(url)  # where url is the above url
bs = BeautifulSoup(r.text)
for movie in bs.findAll('td', 'title'):
    # the first link in the cell holds the movie title
    title = movie.find('a').contents[0]
    genres = movie.find('span', 'genre').findAll('a')
    genres = [g.contents[0] for g in genres]
    runtime = movie.find('span', 'runtime').contents[0]
    year = movie.find('span', 'year_type').contents[0]
    # rating is not scraped yet, so it is left out of the output
    print title, genres, runtime, year
I get the following output:
The Shawshank Redemption [u'Crime', u'Drama'] 142 mins. (1994)
With this code I can scrape the title, genres, runtime, and year, but not the IMDb movie id or the rating. After inspecting the elements in Chrome, I cannot find a pattern that would let me reuse the approach above.
Can anybody help me write the code to scrape the movie id and the rating?
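For the movie id, one possible starting point is the title link itself. This is only a sketch under an assumption: that the `href` of the link found by `movie.find('a')` has the usual IMDb form `/title/tt0111161/`, which I have not confirmed for this page. The helper name `extract_imdb_id` is made up for illustration.

```python
import re

# Assumption: IMDb title links look like "/title/tt0111161/",
# i.e. the id is "tt" followed by digits.
IMDB_ID_PATTERN = re.compile(r"/title/(tt\d+)")


def extract_imdb_id(href):
    """Return the IMDb id found in a title link, or None if there is none."""
    match = IMDB_ID_PATTERN.search(href)
    return match.group(1) if match else None


# Inside the loop above, this would be used roughly as:
#     imdb_id = extract_imdb_id(movie.find('a')['href'])
print(extract_imdb_id("/title/tt0111161/"))  # tt0111161
```

Returning None for a non-matching link keeps the loop from crashing if a cell happens to contain a different kind of link. The rating would need its own lookup once the element that holds it is identified.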
Topic scraping python data-mining
Category Data Science