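The Beautiful Soup documentation provides the attributes .contents and .children to access the children of a given tag (a list and an iterator, respectively); both include NavigableStrings as well as Tags. I want only the children of type Tag.
I'm currently using a list comprehension to do this:
rows = [x for x in table.tbody.children if type(x) == bs4.element.Tag]
But I wonder whether there is a better, more Pythonic, or built-in way to get just the Tag children.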
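For context, here is a minimal sketch of why the filter is needed at all. The two-row table is a made-up stand-in rather than the real input, and it assumes the built-in "html.parser" backend; `isinstance` is used in place of the `type(x) ==` comparison only because it is the more common idiom.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical two-row table, used only for illustration.
html = "<table><tbody> <tr><td>a</td></tr> <tr><td>b</td></tr> </tbody></table>"
soup = BeautifulSoup(html, "html.parser")

children = list(soup.table.tbody.children)

# Whitespace between the tags shows up as NavigableString children.
strings = [c for c in children if isinstance(c, NavigableString)]

# Keeping only Tag instances leaves the two <tr> elements.
tags = [c for c in children if isinstance(c, Tag)]
```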
Thanks to J.F. Sebastian, the following works:
rows = table.tbody.find_all(True, recursive=False)
Documentation here: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#true
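A small sketch of what that call does, on a made-up fragment (the `root` id and its contents are assumptions for illustration): passing `True` matches every tag but no strings, and `recursive=False` limits the search to direct children.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one text node, two direct child tags, one nested tag.
html = "<div id='root'>text<p>one <b>bold</b></p><span>two</span></div>"
soup = BeautifulSoup(html, "html.parser")
root = soup.find(id="root")

# Direct Tag children only; the leading text node and the nested <b> are skipped.
direct = root.find_all(True, recursive=False)

# Without recursive=False, every descendant tag is returned in document order.
everything = root.find_all(True)

direct_names = [t.name for t in direct]
all_names = [t.name for t in everything]
```

Here `direct_names` holds only `p` and `span`, while `all_names` also picks up the nested `b`.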
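In my case I needed the actual rows in the table, so I ended up using the following, which is more precise and, I think, more readable:
rows = table.tbody.find_all('tr')
Again, the documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigating-using-tag-names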
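One caveat worth keeping in mind: find_all searches all descendants by default, so if a cell ever contains a nested table, find_all('tr') also returns the inner rows. The sketch below uses a made-up nested table (the `outer` id is an assumption) to show the difference; adding recursive=False keeps only the direct rows.

```python
from bs4 import BeautifulSoup

# Hypothetical input: an outer table whose second row contains a nested table.
html = """
<table id="outer"><tbody>
  <tr><td>row 1</td></tr>
  <tr><td><table><tbody><tr><td>nested</td></tr></tbody></table></td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")
tbody = soup.find("table", id="outer").tbody

# Default search is recursive, so the nested row is included.
all_rows = tbody.find_all("tr")

# Restricting to direct children returns only the outer table's rows.
direct_rows = tbody.find_all("tr", recursive=False)
```

For a flat table like the one in this question, both calls return the same rows.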
I believe this is better than iterating over all of the tag's children.
This worked with the following input:
<table cellspacing="0" cellpadding="0">
  <thead>
    <tr class="title-row">
      <th class="title" colspan="100">
        <div style="position:relative;"> President <span class="pct-rpt"> 99% reporting </span> </div>
      </th>
    </tr>
    <tr class="header-row">
      <th class="photo first"> </th>
      <th class="candidate "> Candidate </th>
      <th class="party "> Party </th>
      <th class="votes "> Votes </th>
      <th class="pct "> Pct. </th>
      <th class="change "> Change from ‘08 </th>
      <th class="evotes last"> Electoral Votes </th>
    </tr>
  </thead>
  <tbody>
    <tr class="">
      <td class="photo first"> <div class="photo_wrap"><img alt="P-barack-obama" height="48" src="http://i1.nyt.com/projects/assets/election_2012/images/candidate_photos/election_night/p-barack-obama.jpg?1352320690" width="68" /></div> </td>
      <td class="candidate "> <div class="winner dem"><img alt="Hp-checkmark@2x" height="9" src="http://i1.nyt.com/projects/assets/election_2012/images/swatches/hp-checkmark@2x.png?1352320690" width="10" />Barack Obama</div> </td>
      <td class="party "> Dem. </td>
      <td class="votes "> 2,916,811 </td>
      <td class="pct "> 57.3% </td>
      <td class="change "> -4.6% </td>
      <td class="evotes last"> 20 </td>
    </tr>
    <tr class="">
      <td class="photo first"> </td>
      <td class="candidate "> <div class="not-winner">Mitt Romney</div> </td>
      <td class="party "> Rep. </td>
      <td class="votes "> 2,090,116 </td>
      <td class="pct "> 41.1% </td>
      <td class="change "> +4.3% </td>
      <td class="evotes last"> 0 </td>
    </tr>
    <tr class="">
      <td class="photo first"> </td>
      <td class="candidate "> <div class="not-winner">Gary Johnson</div> </td>
      <td class="party "> Lib. </td>
      <td class="votes "> 54,798 </td>
      <td class="pct "> 1.1% </td>
      <td class="change "> – </td>
      <td class="evotes last"> 0 </td>
    </tr>
    <tr class="last-row">
      <td class="photo first"> </td>
      <td class="candidate "> <div class="not-winner">Jill Stein</div> </td>
      <td class="party "> Green </td>
      <td class="votes "> 29,336 </td>
      <td class="pct "> 0.6% </td>
      <td class="change "> – </td>
      <td class="evotes last"> 0 </td>
    </tr>
    <tr>
      <td class="footer" colspan="100"> <a href="/2012/results/president">President Map</a> | <a href="/2012/results/president/big-board">President Big Board</a> | <a href="/2012/results/president/exit-polls?state=il">Exit Polls</a> </td>
    </tr>
  </tbody>
</table>