
Unable to scrape web data using Selenium


I am trying to get the data from the table on the front page of https://icostats.com/, but something just isn't clicking.

from selenium import webdriver

browser = webdriver.Chrome(executable_path=r'C:\Scrapers\chromedriver.exe')
browser.get("https://icostats.com")
browser.find_element_by_xpath("""//*[@id="app"]/div/div[2]/div[2]/div[2]/div[2]/div[8]/span/span""").s()
posts = browser.find_element_by_class_name("tdPrimary-0-75")
for post in posts:
    print(post.text)

The error I get:

C:\Python36\python.exe C:/.../PycharmProjects/PyQtPS/ICO_spyder.py
Traceback (most recent call last):
  File "C:/.../PycharmProjects/PyQtPS/ICO_spyder.py", line 5, in <module>
    browser.find_element_by_xpath("""//*[@id="app"]/div/div[2]/div[2]/div[2]/div[1]/div[2]""").click()
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 313, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 791, in find_element
    'value': value})['value']
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 256, in execute
    self.error_handler.check_response(response)
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="app"]/div/div[2]/div[2]/div[2]/div[1]/div[2]"}
  (Session info: chrome=59.0.3071.115)
  (Driver info: chromedriver=2.30.477700 (0057494ad8732195794a7b32078424f92a5fce41),platform=Windows NT 6.1.7600 x86_64)

EDIT

Finally got it working:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait

browser = webdriver.Chrome(executable_path=r'C:\Scrapers\chromedriver.exe')
browser.get("https://icostats.com")
wait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div:nth-child(2) > div:nth-child(8)")))

posts = browser.find_elements_by_class_name("thName-0-55")
for post in posts:
    print(post.text)

posts = browser.find_elements_by_class_name("tdName-0-73")
for post in posts:
    print(post.text)

Is there a way to iterate over every header/column and export it to a CSV file, without having to go through each class like this?

2 Answers

  • 1

    The required data is generated dynamically by JavaScript. You need to wait until it appears on the page:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait as wait
    
    browser = webdriver.Chrome(executable_path=r'C:\Scrapers\chromedriver.exe')
    browser.get("https://icostats.com")
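    # Wait up to 10 seconds for the JavaScript-rendered content to appear
    wait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div#app>div")))
    # find_elements (plural) returns a list, which is what the loop below needs
    posts = browser.find_elements_by_class_name("tdPrimary-0-75")
    for post in posts:
        print(post.text)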
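
To illustrate what the explicit wait in this answer does: `WebDriverWait(...).until(...)` keeps calling the given condition until it returns something truthy or the timeout runs out. The sketch below is a simplified, Selenium-free stand-in for that polling loop, not Selenium's actual implementation; `wait_until` and its parameters are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulate content that only "appears" on the third check, the way
    # JavaScript-rendered cells appear some time after the page loads.
    attempts = {"count": 0}

    def cells_present():
        attempts["count"] += 1
        return ["cell"] if attempts["count"] >= 3 else []

    print(wait_until(cells_present, timeout=2.0, poll_interval=0.01))
```

With Selenium itself, the condition would typically be something like `EC.presence_of_all_elements_located((By.CLASS_NAME, "tdPrimary-0-75"))`, which hands back the matching elements once at least one of them is present, so the wait targets the table cells rather than an outer container.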
    
  • 1
    • It looks like there is no s() method on the element returned by this line:

    browser.find_element_by_xpath("""//*[@id="app"]/div/div[2]/div[2]/div[2]/div[2]/div[8]/span/span""").s()

    So what you need is probably:

    browser.find_element_by_xpath("""//*[@id="app"]/div/div[2]/div[2]/div[2]/div[2]/div[8]/span/span""").text
    
    • Since you are iterating over the result, this line:

    posts = browser.find_element_by_class_name("tdPrimary-0-75")

    should be:

    posts = browser.find_elements_by_class_name("tdPrimary-0-75")
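
For the follow-up in the question's edit (exporting every column to CSV without naming each cell class), one possible approach is to walk the table row by row instead of class by class. The sketch below assumes the structure implied by the asker's working CSS selector: rows are direct children of `div.tbody-0-21` and cells are direct child `div`s of each row. Those class names look auto-generated and may change, so treat the selectors as assumptions. `extract_rows`, `write_csv`, `scrape_table`, and the `header_selector` parameter are hypothetical names introduced here, and the header selector is left for the caller to supply because the source only shows one header class.

```python
import csv


def extract_rows(row_elements, get_cells):
    """Return one list of stripped cell texts per row, skipping blank rows."""
    rows = []
    for row in row_elements:
        texts = [cell.text.strip() for cell in get_cells(row)]
        if any(texts):
            rows.append(texts)
    return rows


def write_csv(path, rows):
    """Write a list of rows (lists of strings) to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def scrape_table(browser, csv_path, row_selector="div.tbody-0-21 > div",
                 header_selector=None):
    """Export the table currently shown in `browser` to `csv_path`.

    Assumes the page is loaded and the rows have already been waited for.
    The default row_selector is an assumption taken from the asker's selector.
    """
    # Imported here so the helpers above work without Selenium installed.
    from selenium.webdriver.common.by import By

    rows = []
    if header_selector:
        # Optional: caller-supplied selector that matches the header cells.
        header_cells = browser.find_elements(By.CSS_SELECTOR, header_selector)
        rows.append([h.text.strip() for h in header_cells])

    row_elements = browser.find_elements(By.CSS_SELECTOR, row_selector)
    # "./div" selects the direct child divs of each row, i.e. its cells.
    rows.extend(extract_rows(
        row_elements, lambda row: row.find_elements(By.XPATH, "./div")))
    write_csv(csv_path, rows)
    return rows
```

After the `wait(...)` call in the working code from the question, this could be called as `scrape_table(browser, "icostats.csv")`. If the header cells can be matched by a single selector, passing it as `header_selector` puts them in the first CSV row.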
    
