Python BeautifulSoup: extracting the contents of a font tag - Java 学习之路

Hey guys, I'm trying to use BeautifulSoup to get the contents of a font tag. In the HTML page I'm parsing, the tag whose text I want looks like this:

<font color="#000000">Text I want to extract</font>

Going off another Stack Overflow question (how to extract text within font tag using beautifulsoup), I am trying to use:

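from urllib.request import urlopen
from bs4 import BeautifulSoup

# BASE_URL holds the address of the page being parsed (defined elsewhere)
html = urlopen(str(BASE_URL)).read()
soup = BeautifulSoup(html, "lxml")
info = soup('font', color="#000000")  # calling soup(...) is shorthand for soup.find_all(...)

print(info)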

But print just outputs []. Any idea what I'm doing wrong?

1 Answer

Here you go:

from bs4 import BeautifulSoup

html = """<font color="#000000">Text I want to extract</font>"""

soup = BeautifulSoup(html, 'html.parser')

result1 = soup.find('font').text  # not specifying the color attribute
result2 = soup.find('font', {'color':'#000000'}).text  # specifying the color attribute

print(result1)  # prints 'Text I want to extract'
print(result2)  # prints 'Text I want to extract'
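
If the lookup still comes back empty on the real page, one possible cause is that the color value in the fetched HTML is not exactly "#000000" (different case, a different value, or no color attribute at all). Note also that find() returns None when nothing matches, so calling .text on the result would raise AttributeError. The following is only a sketch under those assumptions: the sample markup and the helper name font_text are made up for illustration and are not taken from the page in the question.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; the attributes on the real page may differ.
html = """
<font color="#000000">Text I want to extract</font>
<font color="#FF0000">Red text</font>
<font>No color attribute</font>
"""

soup = BeautifulSoup(html, "html.parser")

# Step 1: list the color value of every font tag to see what was actually parsed.
colors = [tag.get("color") for tag in soup.find_all("font")]
print(colors)


def font_text(soup, color):
    """Return the text of the first font tag with the given color, or None."""
    tag = soup.find("font", color=color)  # find() returns None when nothing matches
    return tag.get_text(strip=True) if tag is not None else None


# Step 2: exact string match, guarded against a missing tag.
black_text = font_text(soup, "#000000")
missing_text = font_text(soup, "#123456")
print(black_text, missing_text)

# Step 3: a compiled regular expression matches the value case-insensitively.
red_text = font_text(soup, re.compile(r"^#ff0000$", re.IGNORECASE))
print(red_text)
```

If the list printed in step 1 is empty, the font tags are probably not in the downloaded HTML at all (for example, when the page builds them with JavaScript), and no attribute filter will find them.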

Answered on 2024-04-25T21:27:43+08:00
