Extract all `href` values from a web page using HtmlAgilityPack / request anything

I have this web page source:

<a href="/StefaniStoikova"><img alt="" class="head" id="face_6306494" src="http://img0.ask.fm/assets/054/771/271/thumb_tiny/sam_7082.jpg" /></a>
<a href="/devos"><img alt="" class="head" id="face_18603180" src="http://img7.ask.fm/assets/043/424/871/thumb_tiny/devos.jpg" /></a>
<a href="/frenop"><img alt="" class="head" id="face_4953081" src="http://img1.ask.fm/assets/029/163/760/thumb_tiny/dsci0744.jpg" /></a>

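I want to extract the string that follows `<a href="`. My main problem is that these strings all differ, and I can't seem to find a way to do it, neither with HtmlAgilityPack nor with web requests.

Maybe someone knows a regular expression for this? Please share it.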

1 Answer

Getting what you need with HtmlAgilityPack should be very simple. Assuming you have loaded the document into an HtmlDocument object named doc:

// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection collection = doc.DocumentNode.SelectNodes("//a[@href]");

if (collection != null)
{
    foreach (HtmlNode node in collection)
    {
        // Do what you want with the href value in here. As an example, this
        // just prints the value to the console.
        Console.WriteLine(node.GetAttributeValue("href", "default"));
    }
}
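
For completeness, here is a minimal end-to-end sketch of the same approach, including the loading step that the snippet above takes for granted. It assumes the HtmlAgilityPack NuGet package is referenced. The class name `HrefExtractor` and the method `ExtractHrefs` are hypothetical names chosen for illustration, and the sample markup is a trimmed version of the markup in the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package is installed

public static class HrefExtractor
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns the href
    // value of every <a> element that has one in the given markup.
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            // SelectNodes returns null when nothing matches the XPath.
            return hrefs;
        }

        foreach (HtmlNode anchor in anchors)
        {
            hrefs.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }

    public static void Main()
    {
        // Trimmed sample in the shape of the markup from the question.
        string html =
            "<a href=\"/StefaniStoikova\"><img alt=\"\" class=\"head\" /></a>" +
            "<a href=\"/devos\"><img alt=\"\" class=\"head\" /></a>" +
            "<a href=\"/frenop\"><img alt=\"\" class=\"head\" /></a>";

        // Prints /StefaniStoikova, /devos and /frenop, one per line.
        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

To work against a live page instead of a string, `new HtmlWeb().Load(url)` should hand back an `HtmlDocument` that can be queried the same way. Because the values in the question are relative paths, they would still need to be combined with the site's base address before they can be requested.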

Answered on 2024-05-04T23:22:35+08:00
