Retrieving data from a table on an aspx page using Excel VBA

I am trying to retrieve table data from an aspx page using Excel VBA. I know how to get table data from a URL, but the main problem is described below.

Problem

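There is an aspx page (say www.abc.aspx). I am currently on this page. Call this page page1.

Now I click the page2 link on the current page. Notably, after clicking this link the old URL (www.abc.aspx) does not change, but the content does. (The content is now that of page2.)

If you look at the page1 source code: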

<form method="post" action="page1 url" id="Form1">

Whatever the action on page1 (here, clicking page2), it posts back to the same page1 URL.

So how can I get the page2 table data in Excel VBA when I don't know its URL?

Code

This is what I use to get the table data.

I use the Internet Explorer object, navigate to the link, and store the document in htmldoc.

Dim ie As Object, htmldoc As Object
Dim eleColth As Object, eleColtr As Object, eleColtd As Object
Dim eleRow As Object, eleCol As Object
Dim i As Long, j As Long

Set ie = CreateObject("InternetExplorer.Application")
ie.navigate "url"

'Wait for the page to load (4 = READYSTATE_COMPLETE; the literal also works with late binding)
Do While ie.Busy Or ie.readyState <> 4
    Application.StatusBar = "Fetching data..."
    DoEvents
Loop
Application.StatusBar = False 'hand the status bar back to Excel

Set htmldoc = ie.document

'Column headers
Set eleColth = htmldoc.getElementsByTagName("th")
j = 0 'start with the first value in the th collection
For Each eleCol In eleColth 'for each element in the th collection
    ThisWorkbook.Sheets(1).Range("A1").Offset(0, j).Value = eleCol.innerText 'write the header text, one column per th
    j = j + 1 'move to next element in th collection
Next eleCol

'Content
Set eleColtr = htmldoc.getElementsByTagName("tr")

'This section populates Excel
i = 0 'start with first value in tr collection
For Each eleRow In eleColtr 'for each element in the tr collection
    Set eleColtd = eleRow.getElementsByTagName("td") 'get all the td elements in that specific tr
    j = 0 'start with the first value in the td collection
    For Each eleCol In eleColtd 'for each element in the td collection
        ThisWorkbook.Sheets(1).Range("D3").Offset(i, j).Value = eleCol.innerText 'write the inner text of the td element at the current offset
        j = j + 1 'move to next element in td collection
    Next eleCol
    i = i + 1 'move to next element in tr collection
Next eleRow

ie.Quit
Set ie = Nothing

EDIT:

Example

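If we click Questions on Stack Overflow (https://stackoverflow.com/questions) and then click page 2 of the questions, the new link is https://stackoverflow.com/questions?page=2&sort=newest.

In my case, clicking page2 does not update the link. It stays the same old link.

EDIT: I found a similar question here: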

How do I get url that is hidden by javascript on external website?

Thanks.

2 Answers

0

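OK, I sympathize. There is a school of thought (including Tim Berners-Lee) which says that every separate page should have its own URI and that these don't change.

But webmasters can and do mess you around. They can redirect your HTTP requests and can obfuscate navigation, as in your case. They can rewrite HTTP requests.

You have two options.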

Option 1 - Let Internet Explorer resolve the new content for you

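If the content is visible on screen, then it must be in the Document Object Model (DOM). In IE, or indeed in Chrome, you can right-click to get the context menu and choose Inspect to see where in the DOM that element sits.

I think your code shows enough expertise to drill in. However, some websites like to disable the Inspect menu option to keep programmers from poking around. (Edit: as in your case, now that I have read the comments.)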
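As a rough sketch of Option 1, the macro below lets Internet Explorer perform the postback itself by clicking the pager link, then reads the refreshed DOM. The address "url" and the link text "2" are placeholders (assumptions, not values from the real site), and FetchPage2ViaIE and WaitForIE are hypothetical helper names. It also assumes the click triggers a full page reload; if the site updates the table through script alone, the wait loop would need a different completion check.

```vba
' Sketch only: "url" and the link text "2" are placeholders for the real site.
Sub FetchPage2ViaIE()
    Dim ie As Object, doc As Object, lnk As Object

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate "url"
    WaitForIE ie

    ' Find the pager link by its visible text and click it. The postback
    ' replaces the content while the address stays the same.
    For Each lnk In ie.document.getElementsByTagName("a")
        If Trim$(lnk.innerText) = "2" Then
            lnk.Click
            Exit For
        End If
    Next lnk

    ' Give the postback a moment to start before polling the ready state.
    Application.Wait Now + TimeValue("0:00:01")
    WaitForIE ie

    ' ie.document now reflects page2, so the th/tr/td loops from the
    ' question can be run against doc unchanged.
    Set doc = ie.document
    Debug.Print doc.getElementsByTagName("tr").Length & " rows found"

    ie.Quit
    Set ie = Nothing
End Sub

Private Sub WaitForIE(ByVal ie As Object)
    ' 4 = READYSTATE_COMPLETE
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop
End Sub
```

The point of this approach is that the page2 URL never has to be known: the browser submits the form, and the macro only reads the result.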

Option 2 - Use an HTTP sniffing Tool like Fiddler to detect the HTTP redirect/rewrite

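As mentioned above, HTTP requests can be rewritten and redirected by the web server, but the HTTP protocol does give notifications of redirects. There are tools that can detect this. A popular one is Fiddler, and today I found there is a specific IE Fiddler add-on.

To be honest, though, the developer tools that ship with browsers themselves, especially Chrome (Ctrl+Shift+I, then the Network tab), show network traffic at a level of detail increasingly on par with any sniffing tool.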
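Once the sniffer shows what the page2 click actually sends, the same POST can be replayed from VBA without a browser. The sketch below assumes the page is a standard ASP.NET Web Forms page that posts hidden fields such as __VIEWSTATE and __EVENTVALIDATION back to its own URL. "url" is a placeholder, "lnkPage2" is a hypothetical __EVENTTARGET value that should be replaced with the one seen in the captured request, and FetchPage2ViaPost and UrlEncode are illustrative names.

```vba
' Sketch only: PAGE_URL and EVENT_TARGET are placeholders/assumptions.
Sub FetchPage2ViaPost()
    Const PAGE_URL As String = "url"
    Const EVENT_TARGET As String = "lnkPage2" ' hypothetical; copy the real value from the captured POST

    Dim http As Object, doc As Object, fld As Object
    Dim body As String, n As Variant

    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    Set doc = CreateObject("htmlfile")

    ' 1) GET page1 to obtain the current hidden state fields.
    http.Open "GET", PAGE_URL, False
    http.send
    doc.body.innerHTML = http.responseText

    ' 2) Build the form body the way the browser would on a postback.
    body = "__EVENTTARGET=" & UrlEncode(EVENT_TARGET) & "&__EVENTARGUMENT="
    For Each n In Array("__VIEWSTATE", "__VIEWSTATEGENERATOR", "__EVENTVALIDATION")
        Set fld = doc.getElementById(n)
        If Not fld Is Nothing Then
            body = body & "&" & n & "=" & UrlEncode(fld.Value)
        End If
    Next n

    ' 3) POST back to the same URL; the response should be the page2 markup.
    http.Open "POST", PAGE_URL, False
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.send body
    doc.body.innerHTML = http.responseText

    Debug.Print doc.getElementsByTagName("tr").Length & " rows found"
End Sub

' Percent-encodes everything except unreserved characters.
' ASCII input only, which is enough for the Base64 state fields.
Private Function UrlEncode(ByVal s As String) As String
    Dim i As Long, ch As String, out As String
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If ch Like "[A-Za-z0-9._~-]" Then
            out = out & ch
        Else
            out = out & "%" & Right$("0" & Hex$(Asc(ch)), 2)
        End If
    Next i
    UrlEncode = out
End Function
```

If the captured request carries other form fields, they would need to be added to the body as well; the three hidden fields here are only the usual ones.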

Sorry you got downvoted; this seems like a perfectly reasonable question.

Answered 2024-05-13T05:25:04+08:00
2

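A bird's eye view on the problem:

There is one requirement you seem unable to let go of: using Excel VBA. I stress this because answers often offer solutions that cater to premises other than those posted in the OP.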

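A possible solution:

So you have to interface Excel VBA with another tool that is capable of revealing the contents of HTML redirects or obfuscated URLs.

Google Chrome Developer Tools reveals everything, and you can interface Chrome with Excel VBA very nicely using the Selenium VBA Wrapper. Download here.

It is very versatile; for example, you can see how to scrape web data.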
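A minimal sketch of what driving Chrome from VBA might look like, assuming the SeleniumBasic build of the wrapper is installed together with a matching chromedriver. "url" and the link text "2" are placeholders, and FetchPage2ViaSelenium is an illustrative name. The cell offsets mirror the ones in the question.

```vba
' Sketch only: assumes SeleniumBasic is installed; "url" and "2" are placeholders.
Sub FetchPage2ViaSelenium()
    Dim driver As Object, row As Object, cell As Object
    Dim i As Long, j As Long

    Set driver = CreateObject("Selenium.ChromeDriver")
    driver.Get "url"
    driver.FindElementByLinkText("2").Click ' trigger the postback to page2

    ' Copy every td of every tr into the sheet, starting at D3.
    i = 0
    For Each row In driver.FindElementsByTag("tr")
        j = 0
        For Each cell In row.FindElementsByTag("td")
            ThisWorkbook.Sheets(1).Range("D3").Offset(i, j).Value = cell.Text
            j = j + 1
        Next cell
        i = i + 1
    Next row

    driver.Quit
End Sub
```

Older releases of the wrapper may expose different class and method names, so check the names against the version you install.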

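As for getting at obfuscated content, there are a few items that may help:

how to get innerHTML of whole page in selenium driver? (not VBA, but useful)

Selenium + VBA to Control Chrome

(Note: the author of the wrapper is usually eager to answer on SO, and is precise in his answers.)

I guess YMMV; there are always people trying to obfuscate with all sorts of tricks, and often with good reason...

It would probably help if you had a real example for http://www.abc.aspx.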

Answered 2024-05-13T05:25:04+08:00
