Regex split on a newline followed by a capital letter

I have been trying to split a string with a regular expression in Python.

I have a text file that loads in the following format:

"Peter went to the gym; \nhe worked out for two hours \nKyle ate lunch 
 at Kate's house. Kyle went home at 9. \nSome other sentence 
 here\n\u2022Here's a bulleted line"

I would like to get the following output:

['Peter went to the gym; he worked out for two hours',
 "Kyle ate lunch at Kate's house. Kyle went home at 9.",
 'Some other sentence here',
 "\u2022Here's a bulleted line"]

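I want to split the string in Python wherever a newline is followed by a capital letter or a bullet point.

So far I have only tried the first half of the problem: splitting on a newline followed by a capital letter.

Here is what I have so far: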

print re.findall(r'\n[A-Z][a-z]+',str,re.M)

This only gives me:

[u'\nKyle', u'\nSome']

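That is just the first word of each line. I have tried variations of this regex, but I can't work out how to get the rest of the line.

I assume that to split on bullet points as well, I only need to add an OR alternative in the same form as the pattern that splits on capital letters. Is that the best approach?

I hope this makes sense, and I apologize if my question is unclear in any way. :)

2 Answers

You can use re.split with a lookahead: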

>>> str = u"Peter went to the gym; \nhe worked out for two hours \nKyle ate lunch at Kate's house. Kyle went home at 9. \nSome other sentence here\n\u2022Here's a bulleted line"
>>> print re.split(u'\n(?=\u2022|[A-Z])', str)

[u'Peter went to the gym; \nhe worked out for two hours ',
 u"Kyle ate lunch at Kate's house. Kyle went home at 9. ",
 u'Some other sentence here',
 u"\u2022Here's a bulleted line"]

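Note that the pieces returned above still contain the inner newline and the trailing spaces, while the output requested in the question has them collapsed. A minimal Python 3 sketch of one way to finish the job, assuming it is acceptable to collapse every run of whitespace into a single space (the helper name `split_sentences` is made up for illustration and is not part of the original answer):

```python
import re

def split_sentences(text):
    """Split on a newline that is followed by a capital letter or a bullet,
    then collapse the remaining whitespace inside each piece."""
    # The lookahead (?=...) checks the next character without consuming it,
    # so the capital letter or bullet stays at the start of the next piece.
    parts = re.split(r'\n(?=[A-Z\u2022])', text)
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading/trailing whitespace; joining with ' ' normalizes it.
    cleaned = [' '.join(part.split()) for part in parts]
    # Drop pieces that were empty or whitespace-only.
    return [piece for piece in cleaned if piece]

text = ("Peter went to the gym; \nhe worked out for two hours \n"
        "Kyle ate lunch at Kate's house. Kyle went home at 9. \n"
        "Some other sentence here\n\u2022Here's a bulleted line")
print(split_sentences(text))
```

With the sample string from the question this should yield four items, with "he worked out for two hours" joined onto the first one.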

Answered 2024-04-29T23:09:44+08:00

You can split at each \n that is followed by a capital letter or by the bullet character:

import re
s = """
Peter went to the gym; \nhe worked out for two hours \nKyle ate lunch 
at Kate's house. Kyle went home at 9. \nSome other sentence 
here\n\u2022Here's a bulleted line
"""
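# filter(None, ...) drops empty strings; list() is needed in Python 3,
# where filter returns a lazy iterator rather than a list
new_list = list(filter(None, re.split('\n(?=•)|\n(?=[A-Z])', s)))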

Output:

['Peter went to the gym; \nhe worked out for two hours ', "Kyle ate lunch \nat Kate's house. Kyle went home at 9. ", 'Some other sentence \nhere', "•Here's a bulleted line\n"]

Or, without typing the bullet symbol literally:

new_list = list(filter(None, re.split('\n(?=\u2022)|\n(?=[A-Z])', s)))
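
Both patterns rely on a lookahead, which is also what the attempt in the question was missing. A small Python 3 sketch of the difference, using a made-up `sample` string rather than the data from the question:

```python
import re

sample = "first line\nSecond line\n\u2022bullet line"

# Without a lookahead the character after the newline is part of the match,
# so re.split consumes it and each later piece loses its first character.
consumed = re.split('\n[A-Z\u2022]', sample)

# With a lookahead only the newline is matched; the next character is kept.
kept = re.split('\n(?=[A-Z\u2022])', sample)

print(consumed)  # ['first line', 'econd line', 'bullet line']
print(kept)      # ['first line', 'Second line', '•bullet line']
```

Putting the capital letters and the bullet into one character class is equivalent to the two alternatives joined with `|` above; it is only a slightly shorter way to write the same condition.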

Answered 2024-04-29T23:09:44+08:00
