Beautiful Soup - unexpected output

-1

I'm learning Beautiful Soup and working through a short tutorial from Analytics Vidhya, which can be found here: https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python/

The tutorial uses Beautiful Soup to scrape a Wikipedia page: https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India

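I'm running the commands in my Jupyter notebook and get the same output as the tutorial, except when I try to extract information from the table tags.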

The tutorial uses the following command to extract the contents of a specific table:

soup.find_all("table" , class_ = 'wikitable sortable plainrowheaders' )

According to the tutorial, the result looks like this:

[image of the tutorial's output]

However, when I run the same command, I get a messy, much less readable output that looks like this:

[<table class="wikitable sortable plainrowheaders">\n<tr>\n<th 
   scope="col">No.</th>\n<th scope="col">State or
\nunion  
    territory</th>\n<th scope="col">Administrative capitals</th>\n<th   
    scope="col">Legislative capitals</th>\n<th scope="col">Judiciary

Can you explain the difference in the output?

Any advice would be greatly appreciated.

2 Answers

0
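It's probably the same output, just rendered differently by your particular development environment. You could always import pprint ("pretty print") and see whether that straightens it out: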
```
from pprint import pprint

pprint(my_output)
```
Answered on 2024-04-24T11:54:52+08:00
-1
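I finally found the answer.

The prettify() method makes the printed output more readable. However, find_all() returns a ResultSet, which contains all objects of type "table" that have the given attributes. prettify() cannot be applied to a ResultSet, but it can be applied to an individual element obtained by indexing into the ResultSet. So, to get a printable version of the table, we do the following: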
```
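# find_all() returns a ResultSet (a list of matching tags)
right_tables = soup.find_all("table", class_="wikitable sortable plainrowheaders")
# prettify() works on a single tag, so index into the ResultSet first
print(right_tables[0].prettify())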
```
This produces the following output, which is what I was looking for:
```
<table class="wikitable sortable plainrowheaders">
   <tr>
    <th scope="col">
      No.
    </th>
    <th scope="col">
      State or
      

         union territory
    </th>
```
and so on.
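
For anyone who wants to try this without fetching the Wikipedia page, here is a minimal self-contained sketch. The inline HTML snippet and the names `html`, `tables`, and `pretty` are made up for illustration, and the built-in `html.parser` is assumed as the parser. It shows that the ResultSet itself has no prettify(), while each element in it does:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia page: one small table on a single line.
html = (
    '<table class="wikitable sortable plainrowheaders">'
    '<tr><th scope="col">No.</th><th scope="col">State or union territory</th></tr>'
    '</table>'
)

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, which behaves like a list of tags.
tables = soup.find_all("table", class_="wikitable sortable plainrowheaders")

# prettify() is defined on individual tags, not on the ResultSet,
# so call it on each element in turn.
for table in tables:
    pretty = table.prettify()
    print(pretty)
```

Looping over the ResultSet instead of hard-coding index 0 also covers the case where more than one table matches.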
Answered on 2024-04-24T11:54:52+08:00
