
Grouping sentences separated by a specific word


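I am trying to group two sub-sentences of any reasonable length that are separated by a specific word ("AND" in the example), where the second sub-sentence is optional. Some examples:

Case 1:

foo sentence A AND foo sentence B

should give

"foo sentence A" --> matching group 1

"AND" --> matching group 2 (optionally)

"foo sentence B" --> matching group 3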

Case 2:

foo sentence A

should give

"foo sentence A" --> matching group 1
"" --> matching group 2 (optionally)
"" --> matching group 3

I tried the following regex

(.*) (AND (.*))?$

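and it works, but in case 2 only if I put a space at the very end of the string; otherwise the pattern does not match. If I move the space before "AND" inside the parenthesized group, then in case 1 the matcher puts the whole string into the first group. I wondered about lookahead and lookbehind assertions, but I am not sure they can help me. Any suggestions? Thanks.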

5 Answers

  • 2

    I would use this regex:

    ^(.*?)(?: (AND) (.*))?$
    

    explanation:

    The regular expression:
    
    (?-imsx:^(.*?)(?: (AND) (.*))?$)
    
    matches as follows:
    
    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        .*?                      any character except \n (0 or more times
                                 (matching the least amount possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
                                 ' '
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          AND                      'AND'
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
                                 ' '
    ----------------------------------------------------------------------
        (                        group and capture to \3:
    ----------------------------------------------------------------------
          .*                       any character except \n (0 or more
                                   times (matching the most amount
                                   possible))
    ----------------------------------------------------------------------
        )                        end of \3
    ----------------------------------------------------------------------
      )?                       end of grouping
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
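
A minimal sketch of how the pattern above could be used from Java, assuming the separator is always the literal word AND with exactly one space on each side. The class name `AndSplitter` and the helper `groups` are made up for illustration; groups that do not take part in the match come back as `null` from `Matcher.group`, so the sketch maps them to empty strings to get the result asked for in case 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AndSplitter {
    // Lazy first group, followed by an optional " AND <rest>" tail.
    private static final Pattern PATTERN =
            Pattern.compile("^(.*?)(?: (AND) (.*))?$");

    // Returns {group 1, group 2, group 3}; groups that did not
    // participate in the match are reported as "" instead of null.
    // Returns an empty array if the input does not match at all
    // (for example when it contains a line break).
    static String[] groups(String sentence) {
        Matcher m = PATTERN.matcher(sentence);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[3];
        for (int i = 0; i < 3; i++) {
            String g = m.group(i + 1);
            out[i] = (g == null) ? "" : g;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: foo sentence A|AND|foo sentence B
        System.out.println(String.join("|", groups("foo sentence A AND foo sentence B")));
        // prints: foo sentence A||
        System.out.println(String.join("|", groups("foo sentence A")));
    }
}
```

Because the first group is lazy, it stops at the first " AND " it can use, so a sentence with several ANDs ends up with everything after the first one in group 3.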
    
  • 0

    How about using

    String split[] = sentence.split("AND");
    

    This splits the sentence on your word and gives you an array of the sub-parts.
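
To make the behaviour of a plain split concrete, here is a small sketch (the class name `PlainSplitDemo` and the helper `rawParts` are hypothetical). It shows two things worth knowing before relying on this approach: the separator itself is dropped, and the spaces around it stay attached to the parts.

```java
import java.util.Arrays;

class PlainSplitDemo {
    // Plain split on the separator; "AND" is discarded and the spaces
    // around it remain on the neighbouring parts.
    static String[] rawParts(String sentence) {
        return sentence.split("AND");
    }

    public static void main(String[] args) {
        // Two parts, each still carrying the space that surrounded "AND".
        System.out.println(Arrays.toString(rawParts("foo sentence A AND foo sentence B")));
        // No separator present: a single-element array with the whole sentence.
        System.out.println(Arrays.toString(rawParts("foo sentence A")));
    }
}
```

Note that the argument of `split` is a regex that matches anywhere, so an "AND" inside a longer word such as "BRAND" would also act as a separator.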

  • 2

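    Description

    This regex returns the requested parts of the string in the requested groups. The and is optional; if it is not found in the string, the whole string is placed in group 1. The \s*? parts are meant to trim surrounding whitespace from the captured groups, although the sample output for case 1 still shows a leading space in group 3.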

    ^\s*?\b(.*?)\b\s*?(?:\b(and)\b\s*?\b(.*?)\b\s*?)?$


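    Groups

    Group 0 gets the entire matched string

    1. gets the string before the separator word and; if there is no and, the whole string appears here

    2. gets the separator word, in this case and

    3. gets the second part of the string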

    Java code example:

    Case 1

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    class Module1{
      public static void main(String[] asd){
        String sourcestring = "foo sentence A AND foo sentence B";
        Pattern re = Pattern.compile("^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$",Pattern.CASE_INSENSITIVE);
        Matcher m = re.matcher(sourcestring);
        if(m.find()){
          for( int groupIdx = 0; groupIdx < m.groupCount()+1; groupIdx++ ){
            System.out.println( "[" + groupIdx + "] = " + m.group(groupIdx));
          }
        }
      }
    }
    
    $matches Array:
    (
        [0] => foo sentence A AND foo sentence B
        [1] => foo sentence A
        [2] => AND
        [3] =>  foo sentence B
    )
    

    Case 2, using the same regex

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    class Module1{
      public static void main(String[] asd){
        String sourcestring = "foo sentence A";
        Pattern re = Pattern.compile("^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$",Pattern.CASE_INSENSITIVE);
        Matcher m = re.matcher(sourcestring);
        if(m.find()){
          for( int groupIdx = 0; groupIdx < m.groupCount()+1; groupIdx++ ){
            System.out.println( "[" + groupIdx + "] = " + m.group(groupIdx));
          }
        }
      }
    }
    
    $matches Array:
    (
        [0] => foo sentence A
        [1] => foo sentence A
    )
    
  • 0

    Your case 2 is a bit odd...

    but I would do it like this:

    String[] parts = sentence.split("(?<=AND)|(?=AND)");
    

    Then check parts.length. If length == 1, it is case 2: the array holds only the sentence, and you can add empty strings as "group 2/3".

    In case 1 you can use parts directly:

    [foo sentence A , AND,  foo sentence B]
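
The length check described above could look roughly like the following sketch. The class name `LookaroundSplitDemo` and the helper `threeParts` are invented for illustration, and the trimming step is an addition that the answer itself does not perform.

```java
import java.util.Arrays;

class LookaroundSplitDemo {
    // Splits at the zero-width positions just before and just after "AND",
    // so the separator survives as its own element. The result is trimmed
    // and padded with "" to exactly three entries; anything beyond the
    // third part (for example after a second "AND") is ignored.
    static String[] threeParts(String sentence) {
        String[] parts = sentence.split("(?<=AND)|(?=AND)");
        String[] out = new String[] {"", "", ""};
        for (int i = 0; i < parts.length && i < 3; i++) {
            out[i] = parts[i].trim();
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: [foo sentence A, AND, foo sentence B]
        System.out.println(Arrays.toString(threeParts("foo sentence A AND foo sentence B")));
        // prints: [foo sentence A, , ]
        System.out.println(Arrays.toString(threeParts("foo sentence A")));
    }
}
```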
    
  • 2

    Change your regex so that the space after the first sentence is optional:

    ^(.*?\\S) ?(AND (.*))?$

    (The first group is lazy here; with a greedy .* it would swallow the whole string in case 1.)
    

    Or you could use split() so that it consumes the AND together with any surrounding whitespace:

    String[] sentences = sentence.split("\\s*AND\\s*");
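
Since the question asks for at most two sub-sentences, one possible refinement is the two-argument form of `String.split`, which caps the number of parts. The sketch below is only an illustration (the class name `LimitedSplitDemo` and the helper `firstAndRest` are hypothetical); with a limit of 2, any later AND stays inside the second part.

```java
import java.util.Arrays;

class LimitedSplitDemo {
    // Splits on the first "AND" only, swallowing the whitespace around it.
    // The limit of 2 means the result never has more than two elements.
    static String[] firstAndRest(String sentence) {
        return sentence.trim().split("\\s*AND\\s*", 2);
    }

    public static void main(String[] args) {
        // prints: [foo sentence A, foo sentence B]
        System.out.println(Arrays.toString(firstAndRest("foo sentence A AND foo sentence B")));
        // prints: [a, b AND c]
        System.out.println(Arrays.toString(firstAndRest("a AND b AND c")));
        // prints: [foo sentence A]
        System.out.println(Arrays.toString(firstAndRest("foo sentence A")));
    }
}
```

As with any split on a bare AND, the pattern also matches inside longer words; wrapping it in `\\b` word boundaries would be one way to guard against that.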
    
