亲爱的社区成员!
我正在努力弄清楚正则表达式的问题 . 目标是 split the text in separate parts with keyword (consectetur|tempor incididunt|proident|consequat) as a separator and add keyword to the text part captured. Start capturing new group again until another keyword is spotted. 不幸的是,我很难找到一个解决方案,用于捕获最后一段文本,该文本没有关键字,必须添加到最后一个捕获组或新组的末尾 . 使用正则表达式甚至可以解决这个问题吗?
Please, pay attention to links at the bottom
Regular expression: ( . *?)(consectetur | tempor incididunt | proident | consequat)
Text: Lorem ipsum dolor sit amet,consectetur adipiscing elit,sed do eiusmod tempor incididunt ut labore et dolore magna aliqua . Ut enim ad minim veniam,quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat . Duis aute irure dolor in repreptderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur . Excepteur sint occaecat cupidatat non proident,sunt in culpa qui officia deserunt mollit anim id est laborum .
Desired outcome:
比赛1
-
Lorem ipsum dolor sit amet,
-
consectetur
比赛2
-
adipiscing elit,sed do eiusmod
-
tempor incididunt
比赛3
-
ut labore et dolore magna aliqua . Ut enim ad minim veniam,quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
-
consequat
比赛4
-
. Duis aute irure dolor in repreptderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur . Excepteur sint occaecat cupidatat non
-
普通
-
sunt in culpa qui officia deserunt mollit anim id est laborum .
Below, I have attached supporting information with what I have so far:
感谢您的帮助!
1 回答
你可以用
\Z
匹配Python中文本的末尾 . 见regex demo .所以,模式现在匹配:
(.*?)
- 第1组:任何0个字符尽可能少,直到第一次出现(consectetur|tempor incididunt|proident|consequat|\Z)
- 任何替代方案:consectetur
,tempor
,incididunt
,proident
,consequat
或字符串结尾 .如果字符串可以有换行符,则在编译正则表达式时使用
re.DOTALL
标志: