BS4: AttributeError: 'NoneType' object has no attribute 'text'

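I ran into a problem while trying to scrape a job-posting website. My URLs are stored in a CSV file, "urls.csv".

The code usually runs fine, but from time to time I get this error: "AttributeError: 'NoneType' object has no attribute 'text'", sometimes after 1 iteration, sometimes after 30. If the problem occurs at, say, i = 230 and I run the script again, it parses that URL without trouble and then stops again some iterations later.

Can anyone offer a suggestion? Thanks!

Also, the error occurs on the line textoffer = ......

Edit: link to the CSV: https://github.com/DonCheiron/Scraping-Be.Indeed/blob/master/urls.csv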

import bs4 as bs
import urllib.request
import csv

with open('C:/Users/******/Desktop/urls.csv', 'r') as f:
    reader = csv.reader(f)
    pages = list(reader)
    for i in range (0,300):
        page = ''.join(map(str, pages[i]))
        print('Working on ' + str(i)+ "...")
        sauce = urllib.request.urlopen(page).read()
        soup =bs.BeautifulSoup(sauce,'lxml')
        textoffer = soup.body.div.find('div',class_='jobsearch-JobComponent-description icl-u-xs-mt--md').text
        file = open(str(i)+ '.txt','w')
        file.write(textoffer)
        file.close()
        print(str(i) + " Done!")

1 Answer

  • 1

    Using a few random URLs from the ones you provided, I tried:

    import csv
    import requests
    from bs4 import BeautifulSoup

    with open('urls.csv', 'r') as f:
        reader = csv.reader(f)
        pages = list(reader)
    for counter, url in enumerate(pages):
        print(counter, ''.join(url))
        page_response = requests.get(''.join(url))
        print(page_response)
        soup = BeautifulSoup(page_response.content, 'html.parser')
        # Prints the result of find() (None here), then fails on .text
        print(soup.body.div.find('div',class_='jobsearch-JobComponent-description icl-u-xs-mt--md')).text
    

    Output:

    0 https://be.indeed.com/rc/clk?jk=39582947a2d91970&fccid=adb55a49f6636f0e&vjs=3
    <Response [200]>
    
    None
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-511-2b829cd9fc45> in <module>()
          4     print(page_response)
          5     soup = BeautifulSoup(page_response.content, 'html.parser')
    ----> 6     print(soup.body.div.find('div',class_='jobsearch-JobComponent-description icl-u-xs-mt--md')).text
          7
          8
    
    AttributeError: 'NoneType' object has no attribute 'text'
    

    The traceback makes it pretty clear that trying to convert the result of find to text when nothing was found is the problem. As to why the same URL would only sometimes have this class: either it is not actually the same URL, or it is a dynamic page which doesn't always contain the same element.
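
One way to keep the loop from crashing is to check the result of find before reading its text. The following is only a minimal sketch, not a definitive fix: the helper name extract_description is hypothetical, the class string is copied from the question, and it assumes that skipping a page with no description is acceptable.

```python
from bs4 import BeautifulSoup

# Class string copied from the question; some responses may not contain it.
DESCRIPTION_CLASS = 'jobsearch-JobComponent-description icl-u-xs-mt--md'


def extract_description(html):
    """Return the job description text, or None if the element is missing.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # Search the whole document instead of chaining soup.body.div,
    # because any link in that chain can itself be None.
    node = soup.find('div', class_=DESCRIPTION_CLASS)
    if node is None:
        return None
    return node.get_text()
```

Inside the scraping loop, a None return could be logged together with the index and URL and the page skipped or retried later, instead of letting the AttributeError stop the whole run.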
