
Retrieving multiple strings from an HTML TD: loop over the TD elements to split the strings and join them?


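I'm trying to learn how to use Beautiful Soup to get data from a website that has concatenated its key strings into a single block. I'm competent at Googling and have had some success, but I'm stuck at this point. It seems I'm missing some fundamentals, and after going around in circles I have to ask for help. I'm hoping someone can point me in the right direction or give me feedback on where I'm going wrong:

First: I'm giving a simplified version of the problem because I don't want to post a book. If anyone is willing to dig into the problem and the actual mistakes I've made, I can attach the script and the real code in separate files. I believe this is a small conceptual error in how I'm handling strings and lists. Without further delay: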


<html>
    <head>


        <center>
        <font face="arial" size="5">
        <table border="0" cellpadding="0" cellspacing="0" width="100%" bgcolor="#000066">

        <tr>
            <td align="left" valign="top" bgcolor="#000066">

          <a href="/"><img height="50" width="540" src="/leftbar-quote.gif" border="0" usemap="#leftbar10b39c7"></a>
              <map name="leftbar10b39c7"><area href="/outside/multi.htm" coords="328,5,390,36" shape="rect">
              <area href="/index.htm" coords="254,5,322,37" shape="rect">
              <area href="#" coords="185,5,251,35" shape="rect" onclick="history.back(); return false;">
              <area href="/cgi-bin/quoteForm.cgi?type=q&sEmail=&part=Engine&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&name=AutoPartex.net&int=-1&uIMS=&userSearch=exact&seqNum=600000000000000000456918622&ref=&userid=1000&email=&userClaim=&userLang=&userZip=&selleruserid=1000" coords="400,5,460,36" shape="rect">
              <area href="/buyerfaq.htm" coords="470,5,530,36" shape="rect">
              </map>


            </td>
            <td valign=top><div align="right"><img height="50" width="36" src="/result-rs.gif"></div></td>
        </tr>
<tr>
<td COLSPAN=2><table WIDTH="100%"><tr>
            <td width="10" valign="top"><img height="30" width="10" src="/trans4.gif"></td>
            <td width="90%">
            <b>
<div style='font-size:18pt; font-style: italic; color: white;'><b>Results sorted by <u>PRICE</u></b> <span class="small"><b>(Click on heading to re-sort)</b></span>
</div><font color='#FFFFFF' face='Arial,Helvetica,Geneva,Swiss,SunSans-Regular' size='2'>Click back to modify your previous choice.<br>Most prices do not include extended warranties or shipping.<br>Not all displayed parts are interchangeable. Please verify with the recycler that the part fits your auto.
</font></b></td><td valign=bottom align=center><table bgcolor="#e4e4e4"width=350 cellpadding=3 border=1 cellspacing=0><tr><td align=center><form method="post" action="/cgi-bin/search.cgi" style="display: inline"><input type= hidden name=userDate value="2005"><input type= hidden name=userModel value="Ford Focus"><input type= hidden name=userLocation value="USA"><input type= hidden name=userPreference value="price"><input type= hidden name=userZip value=""><input type="hidden" name="userPage" value="1"><input type="hidden" name="userInterchange" value="None"><input type="hidden" name="userDate2" value="Ending Year"><input type="hidden" name="userSearch" value="int"><input type="hidden" NAME="userClaim" VALUE=""> <input type="hidden" NAME="userClaimer" VALUE=""> <input type="hidden" NAME="userLang" VALUE=""> <input type="hidden" NAME="userLat" VALUE=""> <input type="hidden" NAME="userLong" VALUE=""> <input type="hidden" NAME="userCSA" VALUE=""> <input type="hidden" NAME="userMCO" VALUE=""> <input type="hidden" NAME="userAdjuster" VALUE=""> <input type="hidden" NAME="userItem" VALUE=""> <input type="hidden" NAME="hpsDate" VALUE=""> <input type="hidden" NAME="hpsGroup" VALUE=""> <input type="hidden" NAME="reqId" VALUE=""> <input type="hidden" NAME="thirdMapType" VALUE=""> <input type="hidden" NAME="vendUrl" VALUE=""> <input type="hidden" NAME="iCN" VALUE=""> <input type='hidden' name='limitYears' value=''> <input type='hidden' name='userIntSelect' value='711575'> <input type='hidden' name='userVIN' value=''> <input type='hidden' name='vinSearch' value='0'> <input type='hidden' name='userVINModelID' value=''> <input type="hidden" name="uID" value=""><input type="hidden" name="uPass" value=""><table bgcolor="#e4e4e4" width=350 cellpadding=3 border=1 cellspacing=0><tr><td colspan=2 align=center>2005&nbsp;Ford Focus<br>Engine<br></td></tr><tr> <td align=center> <font style="font-size: 10pt">Non-Interchange search for year:<br></font> <font style="font-size: 
10pt"><b>2005</b><br><br></font> <br> <br><font style="font-size: 8pt"><a style="color:blue" href="/cgi-bin/search.cgi?userDate=2005&userModel=Ford%20Focus&userPart=Engine&origPart=&userPreference=price&userZip=&userLat=&userLong=&userVIN=&dbPart=300.1&userIntSelect=711575&userClaimer=&userClaim=&uID=&uPass=&userLocation=USA&userSearch=int">Click Here</a> to see All Interchange Choices </font> </td> </table></table></form> </td></tr></table></td></tr></table><table width="100%" border="1" cellspacing="0" cellpadding="4"> <tr align=center> <td><a href='/cgi-bin/search.cgi?userSearch=exact&userPID=1000&userLocation=USA&userIMS=&userInterchange=%5B%7C%7Br&userSide=&userDate=2005&userDate2=2005&dbModel=27.20&userModel=Ford%20Focus&dbPart=300.1&userPart=Engine&sessionID=600000000000000000456918622&userPreference=year&userIntSelect=711575&userUID=0&userBroker=&userPage=1&iKey='>Year</a><br>Part<br>Model</td> <td>Description</td> <td><a href='/cgi-bin/search.cgi?userSearch=exact&userPID=1000&userLocation=USA&userIMS=&userInterchange=%5B%7C%7Br&userSide=&userDate=2005&userDate2=2005&dbModel=27.20&userModel=Ford%20Focus&dbPart=300.1&userPart=Engine&sessionID=600000000000000000456918622&userPreference=miles&userIntSelect=711575&userUID=0&userBroker=&userPage=1&iKey='>Miles</a></td> <td><a href='/cgi-bin/search.cgi?userSearch=exact&userPID=1000&userLocation=USA&userIMS=&userInterchange=%5B%7C%7Br&userSide=&userDate=2005&userDate2=2005&dbModel=27.20&userModel=Ford%20Focus&dbPart=300.1&userPart=Engine&sessionID=600000000000000000456918622&userPreference=grade&userIntSelect=711575&userUID=0&userBroker=&userPage=1&iKey='>Part <br> Grade</a></td> <td>Stock#</td> <td>US<br>Price</td> <td>Dealer Info</td></tr><tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td><a href=""><img width="100" hspace="3" align="middle" onclick="return popupImg('seller=2013&partGUID=2013-1-282435&vehicleGUID=2013-1-V18432&display=2005%20Ford%20Focus%20Engine%20Assembly-Stock%23%2010286')" 
src="http://wsimgoh.autopartex.net/2013/2015/10286/2013_18432_05_thumb.jpg"></img></a>ZX4,2.0,EFI,FATO,FWDRUNSGREAT</td><td align=right>&nbsp;</td><td align=center>&nbsp;</td><td>10286</td><td align=center>$350550</td><td><A HREF="http://www.LaPointAuto.com" target="_top">LaPoint Discount MIDW</A> USA-OH(Holland) <A HREF="/cgi-bin/quoteForm.cgi?type=g&sEmail=shawn@LaPointAuto.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=10286&price=350550&desc=ZX4%2C2.0%2CEFI%2CFATO%2CFWDRUNSGREAT&name=LaPoint%20Discount%20MIDW&url=http://www.LaPointAuto.com&int=-1&broker=0&recycler=0&selleruserid=2013&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Quote</A> 419-865-2329 / 800-845-0270 <A HREF="/cgi-bin/quoteForm.cgi?type=i&sEmail=shawn@LaPointAuto.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=10286&price=350550&desc=ZX4%2C2.0%2CEFI%2CFATO%2CFWDRUNSGREAT&name=LaPoint%20Discount%20MIDW&url=http://www.LaPointAuto.com&int=-1&broker=0&recycler=0&selleruserid=2013&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Insurance_Quote</A><br><a target=_blank href="http://appcgi.autopartex.net/cgi-bin/applet.cgi?sid=2013&brf=&bds=&bsr=price&pin=&pyr=2005&pmd=Ford%20Focus&ppt=Engine%20Assembly&ppr=350550&pst=10286&pgr=&bty=WEB&bem=&bzp=&ses=600000000000000000456918622" onclick='window.open(this.href,this.target,getPrm()); return false'><img src='/images/LiveChat_space.gif' border=0></a></b></td></tr><tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td>TESTED,2.3L,5MT,08/04,FWD,+CORE</td><td align=right>&nbsp;</td><td align=center>&nbsp;</td><td>E94764</td><td align=center>$1500</td><td><A HREF="http://www.ParadiseAutoParts.com" target="_top">Paradise Auto Parts-ELITE</A> USA-MD(Elkton) <A 
HREF="/cgi-bin/quoteForm.cgi?type=g&sEmail=mdriver@complete-recycle.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=E94764&price=1500&desc=TESTED%2C2.3L%2C5MT%2C08%2F04%2CFWD%2C%2BCORE&name=Paradise%20Auto%20Parts-ELITE&url=http://www.ParadiseAutoParts.com&int=-1&broker=0&recycler=0&selleruserid=2843&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Quote</A> 888-811-5051/410-620-5051 <A HREF="/cgi-bin/quoteForm.cgi?type=i&sEmail=mdriver@complete-recycle.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=E94764&price=1500&desc=TESTED%2C2.3L%2C5MT%2C08%2F04%2CFWD%2C%2BCORE&name=Paradise%20Auto%20Parts-ELITE&url=http://www.ParadiseAutoParts.com&int=-1&broker=0&recycler=0&selleruserid=2843&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Insurance_Quote</A><br><a target=_blank href="http://appcgi.autopartex.net/cgi-bin/applet.cgi?sid=2843&brf=&bds=&bsr=price&pin=&pyr=2005&pmd=Ford%20Focus&ppt=Engine%20Assembly&ppr=1500&pst=E94764&pgr=&bty=WEB&bem=&bzp=&ses=600000000000000000456918622" onclick='window.open(this.href,this.target,getPrm()); return false'><img src='/images/LiveChat_space.gif' border=0></a></b></td></tr><tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td>175-175</td><td align=right>38,916</td><td align=center>A</td><td>FC6555</td><td align=center>$1250</td><td><A HREF="http://www.DonsSportcar.com" target="_top">Don's Sportcar</A> USA-CO(Pueblo) <A 
HREF="/cgi-bin/quoteForm.cgi?type=g&sEmail=parts@DonsSportcar.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=FC6555&price=1250&desc=175-175&name=Don's%20Sportcar&url=http://www.DonsSportcar.com&int=-1&broker=0&recycler=0&selleruserid=3776&miles=38.916&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Quote</A> 800-332-3649 <A HREF="/cgi-bin/quoteForm.cgi?type=i&sEmail=parts@DonsSportcar.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=FC6555&price=1250&desc=175-175&name=Don's%20Sportcar&url=http://www.DonsSportcar.com&int=-1&broker=0&recycler=0&selleruserid=3776&miles=38.916&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Insurance_Quote</A><br><a target=_blank href="http://appcgi.autopartex.net/cgi-bin/applet.cgi?sid=3776&brf=&bds=&bsr=price&pin=&pyr=2005&pmd=Ford%20Focus&ppt=Engine%20Assembly&ppr=1250&pst=FC6555&pgr=A&bty=WEB&bem=&bzp=&ses=600000000000000000456918622" onclick='window.open(this.href,this.target,getPrm()); return false'><img src='/images/LiveChat_space.gif' border=0></a></b></td></tr> </table> </div> </body> </html>

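That is the HTML text and structure. Here is where I need help in practical terms:

  • Without CSS classes or ids on the cells, I couldn't find conventional examples that use XPath or something like Selenium. I may be wrong, though; I'm a noob.

  • I need to separate the text in a cell into individual strings.

  • Using BeautifulSoup, I've tried several methods to get the text.

  • After trying something like this, I received this error: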

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("./test.html"), "lxml")

trs = soup.find_all('tr')

for tr in trs:

    tds = tr.find_all("td")

    try:
        result = str(tds[0].get_text())

    except:
          adjust =  ' '
          continue

    result = result.split(" ")

    result = str.replace('2005Engine', "2005Engine", "2005 ")  + str.replace('AssemblyFord', "AssemblyFord", "Engine Assembly ") + str.repl$

    strresult = ''.join(result)


    trs = soup.find_all('tr')

    for tr in trs:

           tds = tr.find_all("td")

           tds[0] = strresult

           tds.get_text()

           print(tds)

Error message:

Traceback (most recent call last):
  File "carpartbs5.find.td.py", line 33, in <module>
    tds.get_text()
  File "/usr/local/lib/python2.7/dist-packages/bs4/element.py", line 1807, in __getattr__
    "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'get_text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

Here's the flip side:

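When I just print tds, it shows the first td replaced with my string. However, whenever I try to return the text with BeautifulSoup's get_text() method, it throws that error. The error seems to indicate that I'm calling the method on something that doesn't support it.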
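A minimal sketch of what the error message describes. It uses a tiny inline HTML string and the built-in html.parser rather than test.html and lxml (both substitutions are assumptions made to keep the example self-contained). find_all() returns a list-like ResultSet, so get_text() has to be called on each element, not on the list itself:

```python
from bs4 import BeautifulSoup

# Tiny stand-in for one result row; the real page has seven cells per row.
html = (
    "<table><tr>"
    "<td>2005<br>Engine Assembly<br>Ford Focus</td>"
    "<td>$1500</td>"
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, which behaves like a list of Tag objects.
tds = soup.find("tr").find_all("td")

# tds.get_text() would raise AttributeError: the list has no such method.
# Call get_text() on each Tag instead; the " " separator keeps the
# fragments around each <br> from running together.
texts = [td.get_text(" ", strip=True) for td in tds]
print(texts)
```

Passing a separator to get_text() is also what avoids the fused "2005Engine" and "AssemblyFord" strings that the str.replace calls were trying to undo.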

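So I'm not entirely clear on lists versus strings. I tried converting my list to an actual string, but it didn't work. I think the reason it can't get the text is that I'm working with a list. If so, is there a better way to use BeautifulSoup to accomplish the following:

  • Get the individual pieces of text from these positions in each element

  • Join them into a comma-separated string result?

Hope this helps; I don't have enough reputation points to post images or upload files. The last block of text is what my program outputs if I don't try to call a Beautiful Soup method on the tds variable. Thanks in advance!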

My Code

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("./test.html"), "lxml")

trs = soup.find_all('tr')

for tr in trs:

    tds = tr.find_all("td")

    try:
        result = str(tds[0].get_text())

    except:
          adjust =  ' '
          continue

    result = result.split(" ")

    result = str.replace('2005Engine', "2005Engine", "2005 ")  + str.replace('AssemblyFord', "AssemblyFord", "Engine Assembly ") + str.repl$

    strresult = ''.join(result)


    trs = soup.find_all('tr')

    for tr in trs:

           tds = tr.find_all("td")

           tds[0] = strresult

           print(tds)

What Was Returned - A Sample

['2005 Engine Assembly Ford Focus ', <td>139K</td>, <td align="right">\xa0</td>, <td align="center">\xa0</td>, <td>0232</td>, <td align="center">$800</td>, <td><a href="http://someurl.com" target="_top">Chads Part </a> USA-FL(Jacksonville)  <a href="/cgi-bin/quoteForm.cgi?type=g&amp;sEmail=chadsparts@someplace.com&amp;email=&amp;part=Engine%20Assembly&amp;dbPart=300.1&amp;dbSubPart=&amp;model=Ford%20Focus&amp;dbModel=27.20&amp;year=2005&amp;stockNum=0232&amp;price=800&amp;desc=139K&amp;name=Chads%20Parts&amp;url=http://someurl.com&amp;int=-1&amp;broker=0&amp;recycler=0&amp;selleruserid=3566&amp;miles=-1&amp;condition=-1&amp;userid=1000&amp;uIMS=&amp;seqNum=600000000000000000456918622&amp;userClaim=&amp;userLang=">Request_Quote</a> 1-510-569-4845 <a href="/cgi-bin/quoteForm.cgi?type=i&amp;sEmail=chadsparts@someplace.com&amp;email=&amp;part=Engine%20Assembly&amp;dbPart=300.1&amp;dbSubPart=&amp;model=Ford%20Focus&amp;dbModel=27.20&amp;year=2005&amp;stockNum=0232&amp;price=800&amp;desc=139K&amp;name=Chads%20Parts=rs&amp;url=http://someurl.com&amp;int=-1&amp;broker=0&amp;=0&amp;selleruserid=3566&amp;miles=-1&amp;condition=-1&amp;userid=1000&amp;uIMS=&amp;seqNum=600000000000000000456918622&amp;userClaim=&amp;userLang=">Request_Insurance_Quote</a>
<a href="http://someurl.com/cgi-bin/applet.cgi?sid=3566&amp;brf=&amp;bds=&amp;bsr=price&amp;pin=&amp;pyr=2005&amp;pmd=Ford%20Focus&amp;ppt=Engine%20Assembly&amp;ppr=800&amp;pst=0232&amp;pgr=&amp;bty=WEB&amp;bem=&amp;bzp=&amp;ses=600000000000000000456918622" onclick="window.open(this.href,this.target,getPrm()); return false" target="_blank"><img border="0" src="/images/LiveChat_space.gif"/></a></td>]

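Just to reinforce:

All I want is the text from these elements as one comma-separated string that I can reuse when I'm ready to write a CSV file.

Year, Part, Car Make, Car Model, Description, Miles, Part Grade, Stock#, Price, Dealer Name, Country, State, City, Phone

  • The first and last cells are the hardest: I can't figure out how to get the strings out into a list and back into a string in the same order as above. Thanks!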
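One possible way to pull apart the last ("Dealer Info") cell, as a rough sketch. It assumes every cell follows the pattern seen in the sample rows: the first link is the dealer name, followed by a loose text node shaped like COUNTRY-STATE(City), then a loose text node with the phone number(s). The HTML below is a shortened copy of one sample cell, and the regular expression is an illustration, not something taken from the site:

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of one "Dealer Info" cell (query strings trimmed).
html = (
    '<td><a href="http://www.DonsSportcar.com">Don\'s Sportcar</a> USA-CO(Pueblo) '
    '<a href="/cgi-bin/quoteForm.cgi?type=g">Request_Quote</a> 800-332-3649 '
    '<a href="/cgi-bin/quoteForm.cgi?type=i">Request_Insurance_Quote</a></td>'
)
td = BeautifulSoup(html, "html.parser").td

# The first link in the cell holds the dealer name.
dealer = td.a.get_text(strip=True)

# Text nodes sitting directly inside the <td>, i.e. not inside any <a>.
loose = [s.strip() for s in td.find_all(string=True, recursive=False) if s.strip()]

# Assumed layout: loose[0] is "USA-CO(Pueblo)", loose[1] is the phone text.
country, state, city = re.match(r"(\w+)-(\w+)\((.*?)\)", loose[0]).groups()
phone = loose[1]

print(",".join([dealer, country, state, city, phone]))
```

Cells that list two numbers, such as "419-865-2329 / 800-845-0270", would come through as a single phone field with this approach.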

1 Answer


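    If you want "2005 Engine Assembly Ford Focus" three times (as in your HTML example), you can do the following:

    • Because the HTML you included is poorly structured, you should take the last table from it.

    table = soup.findAll('table')[-1]

    • Next, take all rows except the first one (the header row).

    tr = table.findAll('tr')[1:]

    This gives you a list, which you can then loop over row by row.

    • Finally, take the first td tag from each row. I'll do it only for the first row.

    td = tr[0].td

    • Now you have the following:

    <td>2005
    Engine Assembly
    Ford Focus</td>

    Unfortunately, I don't know how you want to process this string. For example, you could use this approach:

    td = tr[0].td.children

    This gives you an iterator over all the text fragments and tags in the cell, which you can process as needed.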
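    The steps above can be put together roughly as follows. This is a sketch under stated assumptions: the inline HTML is a cut-down stand-in for the real results table, html.parser replaces lxml so nothing beyond bs4 is needed, and stripped_strings is used in place of .children to get only the text fragments between the <br> tags. The csv module writes the comma-separated line so that descriptions containing commas are quoted.

```python
import csv
import io
from bs4 import BeautifulSoup

# Cut-down stand-in for the results table: a header row plus two data rows.
html = """
<table>
<tr><td>Year<br>Part<br>Model</td><td>Description</td><td>Stock#</td><td>US<br>Price</td></tr>
<tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td>TESTED,2.3L,5MT</td><td>E94764</td><td>$1500</td></tr>
<tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td>175-175</td><td>FC6555</td><td>$1250</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

table = soup.find_all("table")[-1]   # last table on the page
rows = table.find_all("tr")[1:]      # skip the header row

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for tr in rows:
    tds = tr.find_all("td")
    # stripped_strings yields each text fragment of the first cell separately.
    year, part, model = list(tds[0].stripped_strings)
    # Remaining cells: one text value per cell.
    rest = [td.get_text(" ", strip=True) for td in tds[1:]]
    writer.writerow([year, part, model] + rest)

print(out.getvalue())
```

    To split "Ford Focus" into make and model, model.split(" ", 1) would work for this sample, though makes with more than one word would need different handling.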
