I’m trying to scrape a Google Scholar page, but I can only get the first twenty results that are shown. I’m trying to use Selenium to click ‘show more’ so that I can get the rest of the results. Here is what I have, but it isn’t working (the URL is stored in a variable):
driver = webdriver.Chrome(executable_path="/Applications/chromedriver84")
driver.get(url)
time.sleep(5)
element = driver.find_element_by_tag_name('button')
element.click()
Any suggestions? Thanks in advance.
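The ‘show more’ button on the page has the id ‘gsc_bpf_more’. Your code calls find_element_by_tag_name('button'), which returns the first button on the page, and that need not be the one you want. Since you know the id, you can use Selenium’s expected_conditions to wait until the button has loaded and then click it: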
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path="/Applications/chromedriver84")
driver.get('https://scholar.google.com/citations?user=VjJm3zYAAAAJ&hl=en')

# Wait up to 10 s until the button, located by id, is present and clickable.
# until() raises TimeoutException if that never happens, so click() is only
# reached when an element was actually found.
element = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, 'gsc_bpf_more'))
)
element.click()
For the other conditions that Selenium’s expected_conditions module provides, see https://selenium-python.readthedocs.io/waits.html
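A single click only loads the next batch, so to get every result the click has to be repeated. Here is a minimal sketch of one way to do that, not a definitive implementation. It assumes the button reports itself as disabled once there is nothing left to load, and that result rows carry the class ‘gsc_a_tr’; both are assumptions about the page markup. The names click_until_disabled and load_all_results are made up for illustration.

```python
import time


def click_until_disabled(button, max_clicks=50, pause=2.0):
    """Click `button` until it reports disabled or `max_clicks` is reached.

    `button` only needs is_enabled() and click(), which Selenium's
    WebElement provides. Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks and button.is_enabled():
        button.click()
        clicks += 1
        time.sleep(pause)  # crude pause so the next batch of rows can load
    return clicks


def load_all_results(url):
    """Open `url`, expand the result list, and return the number of rows found."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Assumes a chromedriver that Selenium can locate on its own.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.ID, 'gsc_bpf_more'))
        )
        click_until_disabled(button)
        # Assumption: each result is a table row with class 'gsc_a_tr'.
        rows = driver.find_elements(By.CSS_SELECTOR, 'tr.gsc_a_tr')
        return len(rows)
    finally:
        driver.quit()
```

Calling load_all_results(url) with the profile URL should return the total number of rows loaded. The max_clicks cap is there so the loop cannot run forever if the button never becomes disabled.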
After importing ActionChains, you can use its .click() action to click elements on the page:
from selenium import webdriver
from selenium.webdriver import ActionChains
from webdriver_manager.chrome import ChromeDriverManager
import time

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://scholar.google.com/citations?user=VjJm3zYAAAAJ&hl=en')

# Locate the 'show more' button by its class name.
more = driver.find_element_by_class_name('gs_btnPD')

# Click it five times, pausing 3 s after each click so the new rows can load.
for _ in range(5):
    ActionChains(driver).click(more).perform()
    time.sleep(3)