Removing a large list of strings from a text

Suppose

txt='Daniel Johnson and Ana Hickman are friends. They know each other for a long time. Daniel Johnson is a professor and Ana Hickman is writer.'

is a large text, and I want to remove a large list of strings from it, such as

removalLists=['Daniel Johnson','Ana Hickman']

In other words, I want to replace every occurrence of each element of the list with

' '

I know I can easily do this with a loop:

import re

for string in removalLists:
    txt = re.sub(string, ' ', txt)

Is there a faster way to do this?

1 Answer

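One approach is to build a single regex pattern that is an alternation of all the terms to remove, so the text is scanned once instead of once per term. For the example above, the pattern would be: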

\bDaniel Johnson\b|\bAna Hickman\b

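To generate it, first wrap each term in word boundaries ( \b ). Then join the list into a single string using | as the separator. Finally, call re.sub to replace every occurrence of any term with a single space.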

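import re

txt = 'Daniel Johnson and Ana Hickman are friends. They know each other for a long time. Daniel Johnson is a professor and Ana Hickman is writer.'
removalLists = ['Daniel Johnson', 'Ana Hickman']

# Escape each term so it is matched literally, wrap it in word boundaries,
# then join everything with | into one alternation pattern
regex = '|'.join([r'\b' + re.escape(s) + r'\b' for s in removalLists])
output = re.sub(regex, ' ', txt)

print(output)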

  and   are friends. They know each other for a long time.   is a professor and   is writer.
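
For a long removal list, the same idea can be wrapped in a small helper. The following is only a sketch under a few assumptions: the names build_removal_pattern and remove_terms are made up for illustration, the terms are assumed to start and end with word characters (otherwise \b will not behave as intended), and sorting the terms longest first is an extra precaution not in the answer above, so that a term such as 'Ana Hickman' is tried before a shorter term such as 'Ana'.

```python
import re


def build_removal_pattern(terms):
    """Compile one alternation pattern matching any term as a whole word."""
    # Longest terms first, so 'Ana Hickman' is tried before a shorter 'Ana'.
    ordered = sorted(set(terms), key=len, reverse=True)
    # re.escape keeps characters such as '.' or '+' from acting as regex syntax.
    alternatives = "|".join(re.escape(term) for term in ordered)
    # One shared pair of word boundaries around the whole alternation.
    return re.compile(r"\b(?:" + alternatives + r")\b")


def remove_terms(text, terms, replacement=" "):
    """Replace every whole-word occurrence of any term with `replacement`."""
    if not terms:
        # An empty alternation would match everywhere, so return early.
        return text
    return build_removal_pattern(terms).sub(replacement, text)


if __name__ == "__main__":
    sample = "Daniel Johnson and Ana Hickman are friends."
    print(remove_terms(sample, ["Daniel Johnson", "Ana Hickman"]))
```

If the same list is applied to many texts, it is probably worth calling build_removal_pattern once and reusing the compiled pattern rather than rebuilding it for every call.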

Answered on 2024-04-29T10:44:03+08:00
