Regular expression for parsing SQL statements

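I have an IronPython script that executes a bunch of SQL statements against a SQL Server database. The statements are large strings that actually contain multiple statements, separated by the "GO" keyword. That works when they are run from SQL Management Studio and some other tools, but not in ADO. So I split the strings using the 2.5 "re" module like so: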

splitter = re.compile(r'\bGO\b', re.IGNORECASE)
for script in splitter.split(scriptBlob):
    if(script):
        [... execute the query ...]

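In rare cases this breaks on the word "go" inside a comment or a string. How can I work around that? That is, how do I correctly parse this string into two scripts: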

-- this is a great database script!  go team go!
INSERT INTO myTable(stringColumn) VALUES ('go away!')
/*
  here are some comments that go with this script.
*/
GO
INSERT INTO myTable(stringColumn) VALUES ('this is the next script')

EDIT:

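I searched some more and found this SQL documentation: http://msdn.microsoft.com/en-us/library/ms188037(SQL.90).aspx

It turns out GO must be on its own line, as some answers suggested. However, it can be followed by a "count" integer, which actually executes the statement batch that many times (has anybody actually used that before?), and it can be followed by a single-line comment on the same line (but not a multi-line one, I tested this). So the magic regex would look something like: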

"(?m)^\s*GO\s*\d*\s*$"

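Except that this doesn't account for:

- a possible single-line comment ("--" followed by any character except a line break) at the end of the line;
- the whole line being inside a larger multi-line comment.

I'm not concerned about capturing the "count" argument and using it. Now that I have some technical documentation, I'm very close to writing this "to spec" and never having to worry about it again.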
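To make the two gaps listed above concrete, here is a small sketch (Python 3) that runs the pattern from this EDIT over sample input. The sample scripts and the name `GO_LINE` are made up for illustration, and `re.IGNORECASE` is an assumption carried over from the original splitter.

```python
import re

# The pattern from the EDIT above; IGNORECASE is carried over from the original splitter.
GO_LINE = re.compile(r"(?m)^\s*GO\s*\d*\s*$", re.IGNORECASE)

sample = """INSERT INTO myTable(stringColumn) VALUES ('go away!')
GO
INSERT INTO myTable(stringColumn) VALUES ('second')
GO 2
INSERT INTO myTable(stringColumn) VALUES ('third')
GO -- trailing comment
INSERT INTO myTable(stringColumn) VALUES ('fourth')
"""

batches = [b.strip() for b in GO_LINE.split(sample) if b.strip()]
# Plain "GO" and "GO 2" act as separators; the commented GO line does not,
# so it stays glued inside the last batch.
print(len(batches))

commented = """SELECT 1
/*
GO
*/
SELECT 2
"""
# The GO line inside the block comment is wrongly treated as a separator.
pieces = GO_LINE.split(commented)
print(len(pieces))
```

The first input yields three batches where four were intended, and the second yields two pieces where there should be one.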

4 Answers

5

Is "GO" always on a line by itself? If so, you could just split on "^GO$".

Answered on 2024-05-13T00:04:50+08:00
2
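Since you can have comments inside comments, nested comments, comments inside queries and so on, there is no sane way to do this with regular expressions.

Just imagine the following script: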
```
INSERT INTO table (name) VALUES (
-- GO NOW GO
'GO to GO /* GO */ GO' +
/* some comment 'go go go'
-- */ 'GO GO' /*
GO */
)
```
Not to mention:
```
INSERT INTO table (go) values ('xxx') GO
```
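The only way is to build a stateful parser: one that reads a character at a time and keeps a flag that is set when it enters a comment, a quote-delimited string and so on, and is reset when that construct ends, so the code can ignore any "GO" that appears inside one.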
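A minimal sketch of such a stateful splitter follows (Python 3). It is one possible reading of the idea, not a complete T-SQL lexer. The names `GO_LINE` and `split_batches` are hypothetical, and it assumes that strings use single quotes with `''` as the escape, that block comments may nest, and that a GO line may carry a count and a trailing `--` comment. Double-quoted and bracketed identifiers are not tracked.

```python
import re

# A GO line: optional count, optional trailing "--" comment.
GO_LINE = re.compile(r"^[ \t]*GO(?:[ \t]+\d+)?[ \t]*(?:--.*)?$", re.IGNORECASE)


def split_batches(script):
    """Yield batches separated by GO lines that sit outside comments and strings."""
    state = "code"      # one of "code", "string", "block"
    depth = 0           # nesting depth of /* ... */ comments
    batch = []          # lines collected for the current batch
    for line in script.splitlines():
        # GO only counts when the line starts outside any comment or string.
        if state == "code" and GO_LINE.match(line):
            yield "\n".join(batch)
            batch = []
            continue
        batch.append(line)
        i = 0
        while i < len(line):
            two = line[i:i + 2]
            if state == "code":
                if two == "--":
                    break                   # rest of the line is a comment
                if two == "/*":
                    state, depth, i = "block", 1, i + 1
                elif line[i] == "'":
                    state = "string"
            elif state == "block":
                if two == "/*":
                    depth, i = depth + 1, i + 1
                elif two == "*/":
                    depth, i = depth - 1, i + 1
                    if depth == 0:
                        state = "code"
            elif line[i] == "'":            # state == "string"
                # A doubled '' leaves the string and re-enters it on the next
                # character, so no special case is needed for the escape.
                state = "code"
            i += 1
    yield "\n".join(batch)


tricky = """INSERT INTO t (name) VALUES (
-- GO NOW GO
'GO to GO /* GO */ GO' +
/* some comment 'go go go'
-- */ 'GO GO' /*
GO */
)
GO
SELECT 1"""

for number, text in enumerate(split_batches(tricky), 1):
    print("--- batch %d ---" % number)
    print(text)
```

Checking the GO pattern only at the start of a line keeps the state machine small: the character loop exists purely to know whether the next line begins in code, in a string, or in a comment.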
Answered on 2024-05-13T00:04:50+08:00
8
If GO is always on a line by itself, you can use split like this:
```
#!/usr/bin/python

import re

sql = """-- this is a great database script!  go team go!
INSERT INTO myTable(stringColumn) VALUES ('go away!')
/*
  here are some comments that go with this script.
*/
GO 5 --this is a test
INSERT INTO myTable(stringColumn) VALUES ('this is the next script')"""

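# Raw string so the backslashes reach the regex engine untouched.
statements = re.split(r"(?m)^\s*GO\s*(?:[0-9]+)?\s*(?:--.*)?$", sql)

for statement in statements:
    # Parenthesized print works on both Python 2 and Python 3.
    print("the statement is\n%s\n" % (statement))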
```
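- (?m) turns on multiline matching, so ^ and $ match at the start and end of each line (rather than only at the start and end of the string).
- ^ matches at the start of a line.
- \s* matches zero or more whitespace characters (space, tab, etc.).
- GO matches a literal GO.
- \s* as before.
- (?:[0-9]+)? matches an optional integer (leading zeros allowed).
- \s* as before.
- (?:--.*)? matches an optional end-of-line comment.
- $ matches at the end of a line.

The split consumes the GO line, so you don't have to worry about it. This leaves you with a list of statements.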

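This modified split has one problem: it does not give you back the number after the GO. If that is important, I would say it is time to move to a parser of some form.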
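If the count does matter and a full parser feels like too much, one possible middle ground is to iterate over the matches and read the number from a named group. This is only a sketch (Python 3): `GO_LINE`, `split_with_counts` and the `count` group are names invented here, and the pattern is a variant of the one above that allows only spaces and tabs around GO, so the count cannot be picked up from a following line.

```python
import re

# Variant of the pattern above with a named group for the optional count.
GO_LINE = re.compile(
    r"^[ \t]*GO(?:[ \t]+(?P<count>\d+))?[ \t]*(?:--.*)?$",
    re.MULTILINE,
)


def split_with_counts(sql):
    """Yield (batch_text, repeat_count) pairs; the count defaults to 1."""
    start = 0
    for match in GO_LINE.finditer(sql):
        yield sql[start:match.start()], int(match.group("count") or 1)
        start = match.end()
    tail = sql[start:]
    if tail.strip():
        # Text after the last GO line forms a final batch that runs once.
        yield tail, 1


sql = """INSERT INTO myTable(stringColumn) VALUES ('go away!')
GO 5 --this is a test
INSERT INTO myTable(stringColumn) VALUES ('this is the next script')"""

for batch, count in split_with_counts(sql):
    print("run %d time(s):\n%s\n" % (count, batch.strip()))
```

Like the split above, this still treats a GO line inside a multi-line comment or string as a separator.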
Answered on 2024-05-13T00:04:50+08:00

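This won't detect whether GO is ever used as a variable name inside some statement, but it should take care of occurrences inside comments or strings.

EDIT: This now works even if GO is part of a statement, as long as it is not on its own line.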

import re

# Comments and strings are matched first so that any GO inside them is swallowed.
line_comment = r'(?:--|#).*$'
block_comment = r'/\*[\S\s]*?\*/'
single_quote_string = r"'(?:\\.|[^'\\])*'"
double_quote_string = r'"(?:\\.|[^"\\])*"'
# GO alone on its line, with an optional count and an optional trailing comment.
go_word = r'^[^\S\n]*(?P<GO>GO)[^\S\n]*\d*[^\S\n]*(?:(?:--|#).*)?$'

full_pattern = re.compile(r'|'.join((
    line_comment,
    block_comment,
    single_quote_string,
    double_quote_string,
    go_word,
)), re.IGNORECASE | re.MULTILINE)

def split_sql_statements(statement_string):
    last_end = 0
    for match in full_pattern.finditer(statement_string):
        if match.group('GO'):
            yield statement_string[last_end:match.start()]
            last_end = match.end()
    yield statement_string[last_end:]

Usage example:

statement_string = r"""
-- this is a great database script!  go team go!
INSERT INTO go(go) VALUES ('go away!')
go 7 -- foo
INSERT INTO go(go) VALUES (
    'I have to GO " with a /* comment to GO inside a /* GO string /*'
)
/*
  here are some comments that go with this script.
  */
  GO
  INSERT INTO go(go) VALUES ('this is the next script')
"""

for statement in split_sql_statements(statement_string):
    print('=======')
    print(statement)

Output:

=======

-- this is a great database script!  go team go!
INSERT INTO go(go) VALUES ('go away!')

=======

INSERT INTO go(go) VALUES (
    'I have to GO " with a /* comment to GO inside a /* GO string /*'
)
/*
  here are some comments that go with this script.
  */

=======

  INSERT INTO go(go) VALUES ('this is the next script')

Answered on 2024-05-13T00:04:50+08:00
