Problem removing HTML tags when using Beautiful Soup

I am using Beautiful Soup to scrape some data from a website, but I can't get rid of the HTML tags when I print the data. The code in question is:

import csv
import urllib2
import sys  
from bs4 import BeautifulSoup

page = urllib2.urlopen('http://www.att.com/shop/wireless/devices/smartphones.html').read()
soup = BeautifulSoup(page)
soup.prettify()
for anchor1 in soup.findAll('div', {"class": "listGrid-price"}):
    print anchor1
for anchor2 in soup.findAll('div', {"class": "gridPrice"}):
    print anchor2
for anchor3 in soup.findAll('div', {"class": "gridMultiDevicePrice"}):
    print anchor3

The output I get looks like this:

<div class="listGrid-price"> 
                                $99.99 
            </div>
<div class="listGrid-price"> 
                                $0.01 
            </div>
<div class="listGrid-price"> 
                                $0.01 
            </div>

I only want the prices in the output, without any HTML tags. Please excuse my ignorance, as I am new to programming.

1 Answer

You are printing the tags you found. To print only the text they contain, use the .string attribute:
```
print anchor1.string
```
The value of .string is a NavigableString instance; to use it like a regular unicode object, convert it first. You can then call strip() to remove the extra whitespace:
```
print unicode(anchor1.string).strip()
```
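As a rough illustration of those two steps, here is a small Python 3 sketch (the snippets in this answer are Python 2, where `unicode` plays the role that `str` plays here). The HTML fragment is made up for the example, and `html.parser` is simply the parser bundled with Python; neither comes from the original page.

```python
from bs4 import BeautifulSoup, NavigableString

# Made-up fragment imitating the whitespace-padded price markup in the question.
html = '<div class="listGrid-price">\n      $99.99\n   </div>'
soup = BeautifulSoup(html, "html.parser")

div = soup.find("div", {"class": "listGrid-price"})
raw = div.string            # NavigableString, still padded with whitespace
price = str(raw).strip()    # plain str with the padding removed

print(isinstance(raw, NavigableString))
print(repr(price))
```

Converting first means the result is an ordinary string that no longer keeps a reference to the parsed document.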
Adjusted slightly to allow for empty values (.string is None when a tag does not contain exactly one string):
```
for anchor1 in soup.findAll('div', {"class": "listGrid-price"}):
    if anchor1.string:
        print unicode(anchor1.string).strip()
```
This gives me:
```
$99.99
$0.99
$0.99
$299.99
$199.99
$49.99
$49.99
$99.99
$0.99
$99.99
$0.01
$0.01
$0.01
$0.01
$0.01
```
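For readers on Python 3, the same idea can be sketched as below. This is not the code from the answer: it swaps in `get_text(strip=True)` as an alternative to `.string`, uses `find_all` (the current name for `findAll`), and runs on a made-up `SAMPLE_HTML` string instead of downloading the page. The helper name `extract_prices` is hypothetical; only the three class names come from the question.

```python
from bs4 import BeautifulSoup

# Made-up sample standing in for the downloaded page; only the class names
# are taken from the question.
SAMPLE_HTML = """
<div class="listGrid-price">
    $99.99
</div>
<div class="gridPrice"><span>$0.01</span></div>
<div class="gridMultiDevicePrice"></div>
"""


def extract_prices(html, class_names):
    """Return the stripped text of every <div> carrying one of class_names."""
    soup = BeautifulSoup(html, "html.parser")
    prices = []
    for div in soup.find_all("div", class_=class_names):
        text = div.get_text(strip=True)  # text of the tag and its descendants
        if text:                         # skip tags with no text at all
            prices.append(text)
    return prices


prices = extract_prices(
    SAMPLE_HTML, ["listGrid-price", "gridPrice", "gridMultiDevicePrice"]
)
for price in prices:
    print(price)
```

Unlike `.string`, `get_text()` never returns None, so the emptiness check only has to deal with an empty string.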
Answered on 2024-05-05T05:02:56+08:00
