Difference in regex between Python and Google Apps Script (backend engine related?)

I tried the same regex in Python (3.6, Jupyter notebook) and in Google Apps Script, but the non-capturing group does not seem to work in the Apps Script case.

```
# python script:
import re
text='<a class=""email"" href=""mailto:SOisAwesome@hello.edu"">'
regex='(?:<a class=""email"" href=""mailto:)(.+?@hello\.edu)(?:"">)'
match=re.search(regex,text)
print(match.group(1))
# result is 'SOisAwesome@hello.edu'
```

```
// Google app script
function myFunction() {
  string='<a class=""email"" href=""mailto:SOisAwesome@hello.edu"">'
  regex=new RegExp('(?:<a class=""email"" href=""mailto:)(.+?@hello\.edu)(?:"">)')
  Match=regex.exec(string)
  Logger.log(Match[1])
  // result is 'a class=""email"" href=""mailto:SOisAwesome@hello.edu'
}
```

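If I'm not mistaken, the regex engine in Google Apps Script should support non-capturing groups (see https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines; I believe the relevant entries are "JavaScript (ECMAScript)" and "Shy groups"). Can anyone explain what I'm missing here?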

Thanks in advance!

1 Answer

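First, you need `\\` before the `.` in the GAS regex declaration. The pattern is passed to `RegExp` as a string literal, so a single `\.` collapses to a plain `.` before the regex engine sees it; only a literal backslash in the resulting string forms the regex escape sequence.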
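The escaping point can be checked with a minimal sketch in plain JavaScript. It assumes a standard ECMAScript engine (for example Node.js) and uses `console.log` rather than `Logger.log`; the variable names and the shortened `hello.edu` pattern are illustrative only.

```javascript
// In a string literal, '\.' is just '.', so the regex engine never sees a backslash.
const singleEscaped = new RegExp('hello\.edu');   // pattern source: hello.edu
const doubleEscaped = new RegExp('hello\\.edu');  // pattern source: hello\.edu
const literalForm = /hello\.edu/;                 // regex literal: no string layer involved

console.log(singleEscaped.source); // hello.edu
console.log(doubleEscaped.source); // hello\.edu

// With an unescaped dot, any character is accepted in place of the '.'.
console.log(singleEscaped.test('helloXedu')); // true
console.log(doubleEscaped.test('helloXedu')); // false
console.log(literalForm.test('helloXedu'));   // false
```

With the single backslash the pattern still matches the intended address, which is why the mistake is easy to miss; it only shows up as false positives on other input.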

Now, it seems that the GAS non-capturing group implementation is buggy.

If you run the regex in GAS and print the Match object, you will see
```
[18-01-26 08:49:07:198 CET] [<a class=""email"" href=""mailto:SOisAwesome@hello.edu"">, 
a class=""email"" href=""mailto:SOisAwesome@hello.edu, "">]
```
This means the non-capturing group got "merged" with the first capturing group, skipping the first character.

Here are some experiments:
```
Logger.log(new RegExp("(?:;\\w+):(\\d+)").exec(";er:34")); // = null, expected [;er:34, 34]
Logger.log(new RegExp("(?:e\\w+):(\\d+)").exec(";er:34")); // = null, expected [er:34, 34]
Logger.log(new RegExp("(?:\\w+):(\\d+)").exec(";er:34"));  // =  [er:34, 34], as expected
```
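For comparison, the same three patterns can be run on a standards-compliant ECMAScript engine to confirm what the "expected" values in the comments should be. This is only a sketch: it assumes an engine such as Node.js, uses `console.log` instead of `Logger.log`, and the `cases` and `results` names are illustrative.

```javascript
// Each entry: [pattern as a string, result expected from a compliant engine].
const cases = [
  ['(?:;\\w+):(\\d+)', [';er:34', '34']],
  ['(?:e\\w+):(\\d+)', ['er:34', '34']],
  ['(?:\\w+):(\\d+)', ['er:34', '34']],
];

// Run every pattern against the same input and compare with the expectation.
const results = cases.map(([pattern, expected]) => {
  const m = new RegExp(pattern).exec(';er:34');
  const actual = m ? Array.from(m) : null; // plain array: [full match, group 1]
  const ok = JSON.stringify(actual) === JSON.stringify(expected);
  return { pattern, actual, ok };
});

results.forEach(r => console.log(r.pattern, r.ok ? 'ok' : 'MISMATCH', r.actual));
```

If a runtime reports a mismatch for the first two patterns but not the third, that would be consistent with the behavior described in the experiments above.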
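To work around the issue, you can remove the non-capturing parentheses, since `\d` = `(?:\d)`: a non-capturing group that only wraps a fixed sequence matches exactly what the bare sequence matches.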
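Applied to the pattern from the question, the workaround might look like the sketch below. It drops both `(?:...)` wrappers and uses `\\.` for the dot. It was reasoned through against standard ECMAScript semantics (for example Node.js), not verified on the GAS runtime described above, and it uses `console.log` in place of `Logger.log`.

```javascript
const text = '<a class=""email"" href=""mailto:SOisAwesome@hello.edu"">';

// No (?:...) wrappers: the surrounding literal text is still matched,
// but only the address sits inside a capturing group.
const regex = new RegExp('<a class=""email"" href=""mailto:(.+?@hello\\.edu)"">');

const match = regex.exec(text);
const email = match ? match[1] : null; // guard against a failed match

console.log(email); // SOisAwesome@hello.edu
```

Because the literal prefix and suffix are outside any group, `match[1]` is the only capture, the same index the original code reads.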
Answered on 2024-04-18T14:48:00+08:00
