
How to remove a span tag nested inside another span tag

I am trying to remove a span tag nested inside another span tag, but I have not found a solution yet. The script I tried is as follows:

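from urllib.request import urlopen
from bs4 import BeautifulSoup

request = 'http://urltargethere/adeas/asd'
r = urlopen(request).read()
sew = BeautifulSoup(r, 'lxml')
# Select every <span class="titles"> element
results = sew.find_all("span", {"class": "titles"})
for x in results:
    print('text ==> ', x)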

The printed result is:

<span class="titles"><span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.</span>
<span class="titles"><span class="times">2 hour ago</span>Tara enim ad minim veniam.</span>
<span class="titles"><span class="times">3 hour ago</span>Morol eiusmodtempor incididunt.</span>

The result I am looking for is:

Lorem ipsum dolor sit amet.
Tara enim ad minim veniam.
Morol eiusmodtempor incididunt.

3 Answers

  • 1

    Try this to get rid of the part you do not want to keep:

    content="""
    <span class="title"><span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.</span>
    <span class="title"><span class="times">2 hour ago</span>Tara enim ad minim veniam.</span>
    <span class="title"><span class="times">3 hour ago</span>Morol eiusmodtempor incididunt.</span>
    """
    
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(content,"lxml")
    for item in soup.find_all(class_="title"):
        for tag in item.find_all(class_="times"):
            tag.extract()  # detach the nested span from the tree
        print(item.text)
    

    Output:

    Lorem ipsum dolor sit amet.
    Tara enim ad minim veniam.
    Morol eiusmodtempor incididunt.
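
The loop above removes the nested span from the parsed tree, so the timestamps are no longer available afterwards. If the tree needs to stay intact, one alternative is to read only the text nodes that are direct children of each title span. The following is a minimal sketch under that assumption; the helper name `direct_text` is made up for illustration, and `html.parser` is used here only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<span class="title"><span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.</span>
<span class="title"><span class="times">2 hour ago</span>Tara enim ad minim veniam.</span>
"""


def direct_text(tag):
    """Join the text nodes that are direct children of tag, skipping nested tags."""
    # direct_text is a hypothetical helper, not part of BeautifulSoup
    return "".join(
        child for child in tag.children if isinstance(child, NavigableString)
    ).strip()


soup = BeautifulSoup(html, "html.parser")
titles = [direct_text(item) for item in soup.find_all(class_="title")]
for title in titles:
    print(title)
```

Because nothing is extracted, the `times` spans can still be queried from `soup` after the titles have been collected.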
    
  • 1

    If you only want the trailing text of the title span, `.contents` returns a list of the span's children (the times span and the text), so you can index the element you need:

    from bs4 import BeautifulSoup
    
    soup = BeautifulSoup('''\
    <span class="title"><span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.</span>
    <span class="title"><span class="times">2 hour ago</span>Tara enim ad minim veniam.</span>
    <span class="title"><span class="times">3 hour ago</span>Morol eiusmodtempor incididunt.</span>''','html.parser')
    
    for s in soup.find_all('span', {'class': 'title'}):
        print(s.contents[1])
    

    Output:

    Lorem ipsum dolor sit amet.
    Tara enim ad minim veniam.
    Morol eiusmodtempor incididunt.
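
Indexing with `contents[1]` relies on the times span being the first child and the text being the second. If the markup contains whitespace before the nested span, the positions shift. A sketch of a less position-dependent variant, assuming the wanted text always directly follows the times span, is to take that span's `next_sibling`. The leading space in the sample markup below is added on purpose to illustrate that case.

```python
from bs4 import BeautifulSoup

# Sample markup with a stray space before the nested span (an assumption for illustration)
html = '<span class="title"> <span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.</span>'

soup = BeautifulSoup(html, "html.parser")
title = soup.find("span", class_="title")
times = title.find("span", class_="times")

# With the extra whitespace node, contents[1] is the times span, not the text,
# while the node right after the times span is still the text we want.
text_after = times.next_sibling
print(text_after)
```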
    
  • 1

    This might help:

    Demo:

    from bs4 import BeautifulSoup
    a = '<span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.'
    
    soup = BeautifulSoup(a, 'html.parser')
    for tag in soup.find_all("span", {'class':'times'}):
        tag.replace_with('')  # swap the nested span for an empty string
    
    print(soup.get_text())
    

    Result:

    Lorem ipsum dolor sit amet.
    
