Regex for HTTP requests doesn't work in some cases

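I'm writing a Java server that handles HTTP requests using only the Socket class, because my professor said we can't use HTTP libraries (since the goal is to learn HTTP...). So I decided to process the requests with regular expressions. The first thing the code does is take every line of the request and join them into a single string, which I then process with a pattern. I only need to implement the following methods: GET, POST, PUT, HEAD and DELETE. I'm using the Google Chrome extension Postman to test my program. Here are some example requests from Postman after joining each message into a single string: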

GET:

GET / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Cache-Control: no-cache User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Postman-Token: dd87e652-2b21-3632-30ad-ace26581d369 Accept: */* Accept-Encoding: gzip, deflate, sdch Accept-Language: en-US,en;q=0.8

POST without a body:

POST / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Content-Length: 0 Cache-Control: no-cache Origin: chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Postman-Token: 8094b5ce-4b3d-cee7-2d10-f5dd2bc6b7b2 Accept: */* Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.8

POST with a body:

POST / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Content-Length: 9 Postman-Token: 3fb2f5e0-2df1-5af4-7853-e9de84648dd5 Cache-Control: no-cache Origin: chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Content-Type: text/plain;charset=UTF-8 Accept: */* Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.8

And so on...

The patterns I wrote are:

String somethingPattern = "(.*)?";

    String ipPattern = "(((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))|"+somethingPattern+")((:)\\d{3,})?"; // regex for an IP from 0.0.0.0 to 255.255.255.255 or any string, optionally followed by : and a port number of at least 3 digits
    String objetoPattern = "([/?a-zA-Z0-9\\.\\-_]+)"; // regex for a linux path to a file, including only letters, numbers and -_.

    String connectionPattern = "(connection:\\s*"+somethingPattern+")?";
    String contentLenPattern = "(content-length:\\s*([0-9]+))?";
    String postmanTokenPattern = "(postman-token:\\s*"+somethingPattern+")?";
    String cacheControlPattern = "(cache-control:\\s*"+somethingPattern+")?";
    String originPattern = "(origin:\\s*"+somethingPattern+")?";
    String userAgentPattern = "(user-agent:\\s*"+somethingPattern+")?";
    String charsetPattern = "(charset="+somethingPattern+")?";
    String contentTypePattern = "(content-type:\\s*"+somethingPattern+";"+charsetPattern+")?";
    String acceptPattern = "(accept:\\s*"+somethingPattern+")?";
    String acceptEncodingPattern = "(accept-encoding:\\s*"+somethingPattern+")?";
    String acceptLanguagePattern = "(accept-language:\\s*"+somethingPattern+")?";


    // (?i) makes the match case-insensitive, so get, Get, GET, etc. all match
    String pattern = "^(?i)(get|put|head|post|delete)\\s+?" + objetoPattern + "\\s+?HTTP/1.1\\s+?host:\\s+?" + ipPattern + "\\s+?" + connectionPattern + "\\s+?" + contentLenPattern + "\\s+?" + postmanTokenPattern + "\\s+?" + cacheControlPattern + "\\s+?" + originPattern + "\\s+?" + userAgentPattern + "\\s+?" + contentTypePattern + "\\s+?" + acceptPattern + "\\s+?" + acceptEncodingPattern + "\\s+?" + acceptLanguagePattern + "\\s+?$";

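The regex matches and groups most requests, except GET, HEAD and POST without a body. I don't know why this happens. I put a ? at the end of each pattern precisely for the cases where, for example, origin or content-length is not present in the request. But even so, it doesn't match these cases. The matching part of the code is: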

Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(in); // in is the whole request joined into a single-line string

if(m.find()){
// ......
} else {
  System.out.println("Input didn't match");
}
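A likely contributing factor, shown with a reduced pattern: the ? makes each header group optional, but the \s+? on each side of the group is mandatory. When a header is missing, both neighbouring \s+? still need at least one whitespace character each, so the joined string would need two consecutive spaces at that spot, while the join puts only one space between lines. The header groups are also matched in a fixed order, and the sample requests list their headers in different orders. The class name and the reduced pattern below are illustrative only:

```java
import java.util.regex.Pattern;

public class OptionalGroupDemo {

    // Reduced version of the question's structure: an optional header group
    // with a mandatory lazy \s+? on each side of it.
    private static final Pattern PATTERN =
            Pattern.compile("^(?i)a\\s+?(content-length:\\s*[0-9]+)?\\s+?b$");

    static boolean matches(String input) {
        return PATTERN.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("a Content-Length: 0 b")); // true: header present
        System.out.println(matches("a b"));                   // false: header absent, only one space
        System.out.println(matches("a  b"));                  // true: two spaces, one per \s+?
    }
}
```

Making the surrounding whitespace part of each optional group (or, better, not joining the lines at all) avoids this, but the fixed header order remains a problem for a single monolithic regex.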

EDIT: the part of the code that handles input from the Socket:

bufferedReader = new BufferedReader(new InputStreamReader(socket.getInputStream()));

        String in = "";
        while((msgDoSocket = bufferedReader.readLine()) != null){
            try {
                in += msgDoSocket + " ";
                if(msgDoSocket.isEmpty()){
                    processaInput(in); // this calls the part that process regex
                }
            } catch (Exception ex) {
                Logger.getLogger(ServerThread.class.getName()).log(Level.SEVERE, null, ex);
            }
        }

1 Answer


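    Header lines are separated by line breaks (CRLF), and the headers are separated from the body, if there is one, by an empty line, i.e. two consecutive line breaks. Instead of a Matcher, read the request line by line, for example with a Scanner and its nextLine() method; that is much easier. You can simply iterate over the lines. For each header line, split it at the first ':' to build a Map, instead of a million variables covering every possible header key. Then you only need to check the map's keys and values against what you expect.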
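    A minimal sketch of this approach, assuming the request head is available to a Scanner (a String here, for testing). Note that Scanner's default delimiter is whitespace, not newlines, so the line-by-line reading is done with nextLine(). The class and field names are hypothetical. Because Scanner reads ahead into an internal buffer, it is less convenient when you then need to read a body from the same socket stream.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class RequestHeadParser {

    // Hypothetical holder for the parsed request line and headers.
    public static final class RequestHead {
        public final String method;
        public final String path;
        public final String version;
        public final Map<String, String> headers;

        RequestHead(String method, String path, String version, Map<String, String> headers) {
            this.method = method;
            this.path = path;
            this.version = version;
            this.headers = headers;
        }
    }

    static RequestHead parse(Scanner scanner) {
        // Request line, e.g. "GET / HTTP/1.1"
        String[] parts = scanner.nextLine().trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line");
        }
        // Header names are case-insensitive, so look them up case-insensitively.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            if (line.isEmpty()) {
                break; // empty line: end of headers; a body, if any, follows
            }
            int colon = line.indexOf(':'); // first ':' only, so "Host: 127.0.0.1:15000" stays intact
            if (colon <= 0) {
                throw new IllegalArgumentException("Malformed header: " + line);
            }
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return new RequestHead(parts[0].toUpperCase(Locale.ROOT), parts[1], parts[2], headers);
    }

    public static void main(String[] args) {
        String raw = "GET / HTTP/1.1\r\n"
                + "Host: 127.0.0.1:15000\r\n"
                + "Connection: keep-alive\r\n"
                + "\r\n";
        RequestHead head = parse(new Scanner(raw));
        System.out.println(head.method + " " + head.path);              // GET /
        System.out.println(head.headers.get("host"));                   // 127.0.0.1:15000
        System.out.println(head.headers.containsKey("Content-Length")); // false
    }
}
```

    Missing headers such as Origin or Content-Length simply do not appear in the map, and header order no longer matters; dispatching on GET, POST, PUT, HEAD or DELETE becomes a check on head.method.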

    You can also use Fiddler or Wireshark to look at Postman's raw request.

    This answer does the same thing using a Reader.
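    A sketch of a Reader-based variant, assuming the connection is read through a BufferedReader over the socket's InputStream (a ByteArrayInputStream stands in for it here). Unlike the loop in the question, it reads one request at a time, so nothing accumulates across keep-alive requests, and it uses Content-Length to read the body, which is not terminated by a line break. Decoding with ISO-8859-1 maps each byte to exactly one char, so reading Content-Length chars reads Content-Length bytes. Class and field names are hypothetical, and chunked transfer encoding is not handled.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class ReaderRequestParser {

    // Hypothetical holder for one parsed request.
    public static final class Request {
        public final String method;
        public final String path;
        public final Map<String, String> headers;
        public final byte[] body;

        Request(String method, String path, Map<String, String> headers, byte[] body) {
            this.method = method;
            this.path = path;
            this.headers = headers;
            this.body = body;
        }
    }

    // Reads ONE request; returns null if the client closed the connection first.
    static Request readRequest(BufferedReader reader) throws IOException {
        String requestLine = reader.readLine();
        if (requestLine == null) {
            return null;
        }
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IOException("Malformed request line: " + requestLine);
        }
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String line;
        // Headers end at the first empty line.
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':'); // first ':' only, so "Host: ip:port" stays intact
            if (colon <= 0) {
                throw new IOException("Malformed header: " + line);
            }
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        // Content-Length counts bytes; with ISO-8859-1 each byte decodes to one char.
        int length = Integer.parseInt(headers.getOrDefault("Content-Length", "0").trim());
        char[] buf = new char[length];
        int read = 0;
        while (read < length) {
            int n = reader.read(buf, read, length - read);
            if (n == -1) {
                throw new IOException("Connection closed before the body was complete");
            }
            read += n;
        }
        byte[] body = new String(buf).getBytes(StandardCharsets.ISO_8859_1); // original bytes
        return new Request(parts[0].toUpperCase(Locale.ROOT), parts[1], headers, body);
    }

    // Reads every request on one connection, e.g. socket.getInputStream().
    static List<Request> readAll(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        List<Request> requests = new ArrayList<>();
        Request r;
        while ((r = readRequest(reader)) != null) {
            requests.add(r); // in a server, dispatch on r.method here instead of collecting
        }
        return requests;
    }

    public static void main(String[] args) throws IOException {
        String raw = "POST / HTTP/1.1\r\nHost: 127.0.0.1:15000\r\nContent-Length: 9\r\n\r\nsome text"
                + "GET / HTTP/1.1\r\nHost: 127.0.0.1:15000\r\n\r\n";
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1));
        for (Request r : readAll(in)) {
            System.out.println(r.method + " " + r.path + " body=" + new String(r.body, StandardCharsets.UTF_8));
        }
    }
}
```

    Here the two requests arrive back to back on one stream, as they can on a keep-alive connection, and each is parsed separately.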
