Getting an error while scraping Amazon using Selenium and bs4
I'm working on a class project using BeautifulSoup and Selenium's webdriver to scrape Amazon's disposable-diaper listings for each item's name, price, number of reviews, and rating.
My goal is to capture text like the following, which I will then split into separate columns:
Diapers Size 4, 150 Count - Pampers Swaddlers Disposable Baby Diapers, One
Month Supply
4.0 out of 5 stars
1,982
$43.98
($0.29/Count)
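(To illustrate the splitting step: once the raw strings are collected, I expect to post-process them roughly like this. The regex and the new column names are just my own sketch, not part of the scraper yet.)
import pandas as pd

# Sketch of the intended split, using one sample row of scraped strings.
df = pd.DataFrame({"Rating": ["4.0 out of 5 stars"], "Price": ["$43.98"]})
df["Rating Value"] = df["Rating"].str.extract(r"([\d.]+)", expand=False).astype(float)
df["Price Value"] = df["Price"].str.replace("[$,]", "", regex=True).astype(float)
print(df)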
Unfortunately, after roughly 50 records have been printed, I get this error:
Message: no such element: Unable to locate element: {"method":"css selector","selector":".a-last"}
Here is my code:
URL = "https://www.amazon.com/s?
k=baby+disposablerh=n%3A166772011ref=nb_sb_noss"
driver = ('C:/Users/Desktop/chromedriver_win32/chromedriver.exe')
driver.get(URL) html = driver.page_source soup = BeautifulSoup(html, "html.parser")
df = pd.DataFrame(columns = ["Product Name","Rating","Number of
Reviews","Price","Price Count"])
while True:
for i in soup.find_all(class_= "sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36
s-result-item sg-col-
4-of-28 sg-col-4-of-16 sg-col sg-col-4-of-20 sg-col-4-of-32"):
ProductName = i.find(class_= "a-size-base-plus a-color-base a-text- normal").text#.span.get_text
print(ProductName)
try:
Rating = i.find(class_= "a-icon-alt").text#.span.get_text()
except:
Rating = "Null"
print(Rating)
try:
NumberOfReviews = i.find(class_= "a-size-base").text#.span.get_text()
except:
NumberOfReviews = "Null"
print(NumberOfReviews)
try:
Price = i.find(class_= "a-offscreen").text#.span.get_text()
except:
Price = "Null"
print(Price)
try:
PriceCount = i.find(class_= "a-size-base a-color-secondary").text#.span.get_text()
except:
PriceCount = "Null"
print(PriceCount)
df = df.append({"Product Name":ProductName, "Rating":Rating, "Number of
Reviews":NumberOfReviews,
"Price":Price, "Price Count":PriceCount}, ignore_index = True)
nextlink = soup.find(class_= "a-disabled a-last")
if nextlink:
print ("This is the last page. ")
break
else:
progress = driver.find_element_by_class_name('a-last').click()
subhtml = driver.page_source
soup = BeautifulSoup(subhtml, "html.parser")
Unfortunately, I've hit a roadblock trying to figure out why it can't find a-last.
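In case it helps, this is the kind of explicit wait I was thinking of trying in place of the direct find_element_by_class_name call. It is only a sketch: the "li.a-last a" selector and the 10-second timeout are my own guesses, not anything I've confirmed against Amazon's markup.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

try:
    # Wait up to 10 seconds for the "Next" link to become clickable,
    # instead of assuming it is already in the DOM.
    next_button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "li.a-last a"))
    )
    next_button.click()
except TimeoutException:
    # No clickable "Next" link appeared, so treat this as the last page.
    print("This is the last page.")
Would a wait like this address the error, or is something else going on?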
Topics: web-scraping, scraping, python
Category: Data Science