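How do I read and locate where I need to enter login information with jsoup in order to access a website on a VPN? I am interested in an explanation of the steps/topics involved as well as the programming approach in Java (basically, how to write the code in Java using jsoup). Note: with all the redirects, I am having a hard time understanding what is happening during a jsoup login and how/when/where to code it.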

Here is my workflow so far:

I have a target page, as shown below:

[debug] status code 302 : https://centrale.landingnetwork.com/gp/stores/www.landingnetwork.com/gp/home/

Corresponding headers:

{Server=Server, Date=Fri, 02 Mar 2018 04:36:49 GMT, Content-Type=text/html; charset=UTF-8, Transfer-Encoding=chunked, Connection=keep-alive, x-REQUESTNAME-id-1=ZZZAAA3YYYBBB9CCC999, x-frame-options=SAMEORIGIN, x-REQUESTNAME-id-2=123aaaWww1111iiiiix7777yyyzzzqqqhhhiiiE/wPUx/IaHiw6hfs7Y7/Gwa1X0, Location=https://centrale.landingnetwork.com/gp/stores/www.landingnetwork.com/gp/signin/gi-signin.html/123-1234567-1234567?ie=UTF8&landat=%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fhome%2F123-1234567-1234567&ort=1122334455.98765&rrt=1112223334.12121, Vary=Accept-Encoding,User-Agent, Content-Encoding=gzip, Set-cookie=session-id-scsus=123-1234567-1234567; path=/; domain=.landingnetwork.com; expires=Tue, 01-Jan-2036 00:00:01 GMT}
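To check my own reading of the Set-cookie header above, here is a minimal sketch that uses only the JDK's `java.net.HttpCookie` (not jsoup) to pull the name, value, and path out of a header line. The class name `SetCookieDemo` and the helper `parseFirst` are just names I made up for illustration; as far as I understand, jsoup does the equivalent internally when it fills `response.cookies()`.

```java
import java.net.HttpCookie;
import java.util.List;

class SetCookieDemo {
    // Parses one Set-Cookie header line and returns the first cookie it contains.
    static HttpCookie parseFirst(String headerLine) {
        List<HttpCookie> parsed = HttpCookie.parse(headerLine);
        return parsed.get(0);
    }

    public static void main(String[] args) {
        // Header copied from the first 302 response above.
        String header = "Set-cookie: session-id-scsus=123-1234567-1234567; path=/; "
                + "domain=.landingnetwork.com; expires=Tue, 01-Jan-2036 00:00:01 GMT";
        HttpCookie cookie = parseFirst(header);
        System.out.println(cookie.getName() + " = " + cookie.getValue());
        System.out.println("path: " + cookie.getPath());
    }
}
```

Only the `name=value` part is sent back to the server on later requests; the path, domain, and expires attributes just tell the client when the cookie applies.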

When I navigate to this URL in Java/jsoup, I get a series of redirects. Here is the trail of my redirects, in the order they occur:

[debug] status code 302 : https://centrale.landingnetwork.com/gp/stores/www.landingnetwork.com/gp/signin/gi-signin.html/123-1234567-1234567?ie=UTF8&landat=%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fhome%2F123-1234567-1234567&ort=1122334455.98765&rrt=1112223334.12121

Corresponding headers:

{Server=Server, Date=Fri, 02 Mar 2018 04:36:49 GMT, Content-Type=text/html; charset=UTF-8, Transfer-Encoding=chunked, Connection=keep-alive, x-REQUESTNAME-id-1=ZZZAAA3YYYBBB9CCC999, x-frame-options=SAMEORIGIN, x-REQUESTNAME-id-2=123aaaWww1111iiiiix7777yyyzzzqqqhhhiiiEwwPwwwIaHiw6hfs7Y7vvva1X0, Location=https://wa.secureallnetwork.com/login?clienteId=Centrale-prod-wa&nonce=867:5309:867:5309:867:5309:867:5309:867:53099&redirect_uri=https%3A%2F%2Fcentrale.landingnetwork.com%3A443%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fsignin%2Fgi-landat.html%2F123-1234567-1234567%3Flandat%3D%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fhome%2F123-1234567-1234567&ort=1122334455.98765&rrt=1112223334.12121, Vary=Accept-Encoding,User-Agent, Content-Encoding=gzip, Set-cookie=session-id-scsus=123-1234567-1234567; path=/; domain=.landingnetwork.com; expires=Tue, 01-Jan-2036 00:00:01 GMT}

And the next link in the chain:

[debug] status code 200 : https://wa.secureallnetwork.com/login?clienteId=Centrale-prod-wa&nonce=867:5309:867:5309:867:5309:867:5309:867:53099&redirect_uri=https%3A%2F%2Fcentrale.landingnetwork.com%3A443%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fsignin%2Fgi-landat.html%2F123-1234567-1234567%3Flandat%3D%2Fgp%2Fstores%2Fwww.landingnetwork.com%2Fgp%2Fhome%2F123-1234567-1234567&ort=1122334455.98765&rrt=1112223334.12121

Corresponding headers:

{Server=Server, Date=Fri, 02 Mar 2018 04:36:50 GMT, Content-Type=text/html;charset=UTF-8, Transfer-Encoding=chunked, Connection=keep-alive, Strict-Transport-Security=max-age=31536000; includeSubdomains; preload, Content-Language=en-US, Content-Encoding=gzip, Vary=Accept-Encoding,User-Agent, Set-Cookie=session-id=123-1234567-1234567; Domain=.landingnetwork.com; Expires=Tue, 01-Jan-2036 08:00:01 GMT; Path=/}
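The login URL in the trail above carries its parameters percent-encoded, which makes it hard to see where the server intends to send me after login. The following is a rough sketch (plain JDK, Java 10 or later for the `URLDecoder.decode(String, Charset)` overload) that splits a URL's query string into decoded name/value pairs. `QueryParams` is a hypothetical helper name of mine, and the URL in `main` is a shortened copy of the one above.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParams {
    // Splits the query part of a URL into decoded name/value pairs, keeping their order.
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) {
            return params; // no query string at all
        }
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String url = "https://wa.secureallnetwork.com/login?clienteId=Centrale-prod-wa"
                + "&redirect_uri=https%3A%2F%2Fcentrale.landingnetwork.com%3A443%2Fgp%2Fstores"
                + "%2Fwww.landingnetwork.com%2Fgp%2Fsignin%2Fgi-landat.html";
        parse(url).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

Decoded this way, `redirect_uri` points back at the `gi-landat.html` page on centrale.landingnetwork.com, which I take to be where a successful login is supposed to land.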

And the next link in the chain:

[debug] status code 200 : https://wa.secureallnetwork.com/login?sif_profile=gi_profile_1&clienteId=Centrale-prod-wa&nonce=867:5309:867:5309:867:5309:867:5309:867:53099&redirect_uri=https://centrale.landingnetwork.com:443/gp/stores/www.landingnetwork.com/gp/signin/gi-landat.html/123-1234567-1234567?landat=/gp/stores/www.landingnetwork.com/gp/home/123-1234567-1234567

Corresponding headers:

{Server=Server, Date=Fri, 02 Mar 2018 04:36:50 GMT, Content-Type=text/html;charset=UTF-8, Transfer-Encoding=chunked, Connection=keep-alive, Strict-Transport-Security=max-age=31536000; includeSubdomains; preload, Content-Language=en-US, Content-Encoding=gzip, Vary=Accept-Encoding,User-Agent, Set-Cookie=ubid-main=123-1234567-1234567; Domain=.landingnetwork.com; Expires=Tue, 01-Jan-2036 08:00:01 GMT; Path=/}

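Edit: I am fairly sure the nonce value does not need to be the exact value shown, so I replaced it with a generic one. The same goes for the &rrt= and &ort= values. (If these matter for my task, please explain.)

Edit 2: added the corresponding headers for each link.

Edit 2: also, here is the value of the login form's action= attribute: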

/login?sif_profile=gi_profile_1&clientId=Centrale-prod-na&nonce=867:5309:867:5309:867:5309:867:5309:867:53099&redirect_uri=https://centrale.landingnetwork.com:443/gp/stores/www.landingnetwork.com/gp/signin/gi-landat.html/123-1234567-1234567?landat=/gp/stores/www.landingnetwork.com/gp/home/123-1234567-1234567
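Since the action value above is a path relative to the login host, it has to be resolved against the URL of the page that served the form before it can be posted to. A minimal sketch with the JDK's `java.net.URI` follows; `ActionResolver` is a made-up name, and the action string in `main` is shortened. I believe jsoup offers the same thing through `element.absUrl("action")`, but I have not relied on that here.

```java
import java.net.URI;

class ActionResolver {
    // Resolves a form's action attribute against the URL of the page that contained the form.
    static String resolve(String pageUrl, String action) {
        return URI.create(pageUrl).resolve(action).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "https://wa.secureallnetwork.com/login?clienteId=Centrale-prod-wa";
        String action = "/login?sif_profile=gi_profile_1&clientId=Centrale-prod-na";
        // The scheme and host come from the page URL; the path and query come from the action.
        System.out.println(resolve(pageUrl, action));
    }
}
```

This gives the same result as prepending the host by hand, but it also copes with actions that are already absolute or that are relative to the current directory.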

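I do not have a strong background in networking, but if things are explained well and thoroughly, I can certainly follow along.

My problem: as I step through the redirects, I cannot tell why my form-post code for the username/password is not working.

Edit 3: here is the request header information during navigation, as shown in Chrome's Network tab: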

POST /gp/stores/www.landingnetwork.com/gp/handlers/remote-view.html HTTP/1.1
Content-Length: 6013
Accept: */*
Cookie: loc-main=en_US; x-xUid-uid=along141charactersofnonsesnsegoeshere; session-id-time=2082787201l; session-id=123-1234567-1234567; ubid-main=987-6543210-1234567; csrf=-2233445566; x-main="rr@DDhhzzqqeerrttyyuuiiooppkk@77"; at-main=alonglineof417charactersgoeshere; sess-at-main="mm2m/PPPPPPPPPPPPPPPPPPPooooooooooooooooooo="; sst-main=alonglineof200somecharactersgoeshere; session-id-scsus=123-1234567-1234567; session-id-time-scsus=2082758401l; ubid-scsus=987-6543210-1234567; session-token-scsus=hereisacoollineof256characters; skin=noskin; session-token="sessiontokenof268charactersgoeshere"; sidna-p=43charactershere; gidna-p=amonster1607charactershere; cscscs-p=::777::D11vvvvZZZZZZZZZzzzMMMMwwwwwjjjjjeeeeeeevvvvvvbbbbbbbbbbmmmmmm77777777aaaaaa+w/wwwwwww==; csm-hit=222.33|1234567891011

POST /gp/stores/www.landingnetwork.com/gp/telephony/handlers/get-due-followup HTTP/1.1
Content-Length: 373
Accept: */*
Cookie: loc-main=en_US; x-xUid-uid=along141charactersofnonsesnsegoeshere; session-id-time=2082787201l; session-id=123-1234567-1234567; ubid-main=987-6543210-1234567; csrf=-2233445566; x-main="rr@DDhhzzqqeerrttyyuuiiooppkk@77"; at-main=alonglineof417charactersgoeshere; sess-at-main="mm2m/PPPPPPPPPPPPPPPPPPPooooooooooooooooooo="; sst-main=alonglineof200somecharactersgoeshere; session-id-scsus=123-1234567-1234567; session-id-time-scsus=2082758401l; ubid-scsus=987-6543210-1234567; session-token-scsus=hereisacoollineof256characters; skin=noskin; session-token="sessiontokenof268charactersgoeshere"; sidna-p=43charactershere; gidna-p=amonster1607charactershere; cscscs-p=::777::D11vvvvZZZZZZZZZzzzMMMMwwwwwjjjjjeeeeeeevvvvvvbbbbbbbbbbmmmmmm77777777aaaaaa+w/wwwwwww==; csm-hit=222.33|1234567891011

GET /taw/static/connect-csm.js?_=hereis13numbers HTTP/1.1
Accept: text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01
Cookie: loc-main=en_US; x-xUid-uid=along141charactersofnonsesnsegoeshere; session-id-time=2082787201l; session-id=123-1234567-1234567; ubid-main=987-6543210-1234567; csrf=-2233445566; x-main="rr@DDhhzzqqeerrttyyuuiiooppkk@77"; at-main=alonglineof417charactersgoeshere; sess-at-main="mm2m/PPPPPPPPPPPPPPPPPPPooooooooooooooooooo="; sst-main=alonglineof200somecharactersgoeshere; session-id-scsus=123-1234567-1234567; session-id-time-scsus=2082758401l; ubid-scsus=987-6543210-1234567; session-token-scsus=hereisacoollineof256characters; skin=noskin; session-token="sessiontokenof268charactersgoeshere"; sidna-p=43charactershere; gidna-p=amonster1607charactershere; cscscs-p=::777::D11vvvvZZZZZZZZZzzzMMMMwwwwwjjjjjeeeeeeevvvvvvbbbbbbbbbbmmmmmm77777777aaaaaa+w/wwwwwww==; csm-hit=222.33|1234567891011

GET /taw/static/secureall-conduit.js?_=hereis13numbers HTTP/1.1
Accept: text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01
Cookie: loc-main=en_US; x-xUid-uid=along141charactersofnonsesnsegoeshere; session-id-time=2082787201l; session-id=123-1234567-1234567; ubid-main=987-6543210-1234567; csrf=-2233445566; x-main="rr@DDhhzzqqeerrttyyuuiiooppkk@77"; at-main=alonglineof417charactersgoeshere; sess-at-main="mm2m/PPPPPPPPPPPPPPPPPPPooooooooooooooooooo="; sst-main=alonglineof200somecharactersgoeshere; session-id-scsus=123-1234567-1234567; session-id-time-scsus=2082758401l; ubid-scsus=987-6543210-1234567; session-token-scsus=hereisacoollineof256characters; skin=noskin; session-token="sessiontokenof268charactersgoeshere"; sidna-p=43charactershere; gidna-p=amonster1607charactershere; cscscs-p=::777::D11vvvvZZZZZZZZZzzzMMMMwwwwwjjjjjeeeeeeevvvvvvbbbbbbbbbbmmmmmm77777777aaaaaa+w/wwwwwww==; csm-hit=222.33|1234567891011

GET /taw/get-csm-parameters HTTP/1.1
Accept: application/json, text/javascript, */*; q=0.01
Cookie: loc-main=en_US; x-xUid-uid=along141charactersofnonsesnsegoeshere; session-id-time=2082787201l; session-id=123-1234567-1234567; ubid-main=987-6543210-1234567; csrf=-2233445566; x-main="rr@DDhhzzqqeerrttyyuuiiooppkk@77"; at-main=alonglineof417charactersgoeshere; sess-at-main="mm2m/PPPPPPPPPPPPPPPPPPPooooooooooooooooooo="; sst-main=alonglineof200somecharactersgoeshere; session-id-scsus=123-1234567-1234567; session-id-time-scsus=2082758401l; ubid-scsus=987-6543210-1234567; session-token-scsus=hereisacoollineof256characters; skin=noskin; session-token="sessiontokenof268charactersgoeshere"; sidna-p=43charactershere; gidna-p=amonster1607charactershere; cscscs-p=::777::D11vvvvZZZZZZZZZzzzMMMMwwwwwjjjjjeeeeeeevvvvvvbbbbbbbbbbmmmmmm77777777aaaaaa+w/wwwwwww==; csm-hit=222.33|1234567891011
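For comparison with what my code sends, the Cookie request header shown in Chrome can be turned into the kind of name/value map that jsoup's `cookies(Map)` accepts. This is only a sketch under the assumption that no cookie value contains a semicolon; `CookieHeader` is a hypothetical helper name, and the sample string in `main` is a shortened version of the header above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CookieHeader {
    // Splits a Cookie request header value ("a=1; b=2") into a name -> value map.
    // Only the first '=' separates name from value, because values may contain '=' themselves.
    static Map<String, String> toMap(String cookieHeader) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : cookieHeader.split(";")) {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or nameless fragments
            }
            cookies.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
        }
        return cookies;
    }

    public static void main(String[] args) {
        String header = "loc-main=en_US; session-id=123-1234567-1234567; "
                + "sess-at-main=\"mm2m/PPP=\"; skin=noskin";
        toMap(header).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

Quoted values keep their surrounding quotes here, which matches how they appear in the browser's header.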

Here is my code so far (two classes):

import java.util.HashMap;

import org.jsoup.Connection;
import org.jsoup.nodes.Document;

public class App {
    public static final String LOGIN_FORM_URL = "https://centrale.landingnetwork.com/gp/stores/www.landingnetwork.com/gp/home/";
    public static final String USERNAME = "myusername";
    public static final String PASSWORD = "mypassword";

    public static void main(String[] args) throws Exception {
        WebCrawler wc = new WebCrawler();

        // Go to the login page (following the redirects) and grab the cookies sent by the server
        Connection.Response loginForm = wc.crawl(LOGIN_FORM_URL);

        // This is the document containing the response HTML
        Document loginDoc = loginForm.parse();

        // Save the cookies to be passed on to the next request
        HashMap<String, String> cookies = new HashMap<>(loginForm.cookies());

        // Select the login form by its CSS class and read its action attribute
        // (a path relative to the login host)
        String formAction = loginDoc.select("form.a-spacing-micro").first().attr("action");

        // Prepare the login credentials
        HashMap<String, String> formData = new HashMap<>();
        formData.put("usernameInputField", USERNAME);
        formData.put("passwordInputField", PASSWORD);

        // Post the credentials to the form's action URL
        Connection.Response homePage = wc.crawl("https://wa.secureallnetwork.com" + formAction, cookies, formData, Connection.Method.POST, true);
        System.out.println("[debug] status code " + homePage.statusCode() + " : " + homePage.url());
    }
}

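import java.io.IOException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

public class WebCrawler {
    public static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0";

    // Cookies collected from every response so far; they are sent back with each request.
    // Note: this simple map ignores cookie domains and paths.
    private final Map<String, String> cookies = new HashMap<>();

    // GET the URL and follow redirects by hand, keeping the cookies from every hop
    public Connection.Response crawl(String url) throws IOException {
        Response response = Jsoup.connect(url).userAgent(USER_AGENT).cookies(cookies).followRedirects(false).execute();
        return followLocation(response);
    }

    // Send a request with the given cookies, form data, and HTTP method
    public Connection.Response crawl(String url, HashMap<String, String> cooks, HashMap<String, String> dat, Connection.Method m, boolean follow) throws IOException {
        cookies.putAll(cooks);
        Response response = Jsoup.connect(url).userAgent(USER_AGENT).cookies(cookies).data(dat).followRedirects(follow).method(m).execute();
        return followLocation(response);
    }

    // Remember the response's cookies; if it redirects, GET the (possibly relative) Location
    private Connection.Response followLocation(Response response) throws IOException {
        cookies.putAll(response.cookies());
        if (response.hasHeader("location")) {
            String redirectUrl = new URL(response.url(), response.header("location")).toString();
            return crawl(redirectUrl);
        }
        return response;
    }
}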
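As far as I understand it, the POST I am attempting boils down to a form-encoded request body built from the field map. Here is a plain-JDK sketch (Java 10 or later for the `URLEncoder.encode(String, Charset)` overload) of what such a body looks like. `FormBody` is a name I invented, the password in `main` is a dummy value, and the field names are the ones from my code, which may or may not match what the real form expects.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FormBody {
    // Builds an application/x-www-form-urlencoded body from field names and values.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("usernameInputField", "myusername");
        fields.put("passwordInputField", "p@ss word&1"); // dummy value with characters that need escaping
        System.out.println(encode(fields));
    }
}
```

If the real login form contains additional inputs (hidden ones, for example), they would presumably need to be part of this body too, which is one thing I have not verified.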

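When I print out the headers, they look relatively straightforward; the only things that might surprise me are the 'X-REQUEST-ID1' / 'X-REQUEST-ID2' headers, the Set-Cookie session ID, and the Location. However, I have never tried to use jsoup to interact with data across multiple web pages.

To reiterate my question: how do I log in to my website in a practical way using Java/jsoup? If anyone is willing to take the time, a thorough explanation with details, examples, and final code would be a glorious lesson!

Cheers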