Regex - I need to find something and then copy what comes after it - Java 学习之路

I have something like this:

<A NAME=speech26><b>SIR HUGH EVANS</b></a>
<blockquote>
<A NAME=1.1.58>Shall I tell you a lie? I do despise a liar as I do</A><br>
<A NAME=1.1.59>despise one that is false, or as I despise one that</A><br>
<A NAME=1.1.60>is not true. The knight, Sir John, is there; and, I</A><br>
<A NAME=1.1.61>beseech you, be ruled by your well-willers. I will</A><br>
<A NAME=1.1.62>peat the door for Master Page.</A><br>
<p><i>Knocks</i></p>
<A NAME=1.1.63>What, hoa! Got pless your house here!</A><br>
</blockquote>

I want to extract all of the text so that it looks like this:

Shall I tell you a lie? I do despise a liar as I do
despise one that is false, or as I despise one that
is not true. The knight, Sir John, is there; and, I
beseech you, be ruled by your well-willers. I will
peat the door for Master Page.
What, hoa! Got pless your house here!

I tried <A NAME=[0-9]+\\.[0-9]+\\.[0-9]+> but it doesn't do what I want. Can anyone help?

4 Answers

-1
You can try this:
```
<A NAME=\d+\.\d+\.\d+>(.*)(?=</A>)
```
Explanation
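- <A NAME=\d+\.\d+\.\d+> - matches something like <A NAME=1.1.112>
- (.*) - matches any character except a newline, zero or more times (greedy).
- (?=</A>) - positive lookahead. Asserts that </A> follows, without including it in the match.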
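
As a rough check of the pattern above, here is a minimal Python sketch (the question does not say which language is in use, so Python is an assumption, and the variable names are only illustrative). It also shows one caveat: because `(.*)` is greedy, two anchors on the same line are swallowed as a single match, while the lazy form `(.*?)` keeps them apart.

```python
import re

# Pattern from the explanation above: greedy capture up to the last "</A>" on a line.
GREEDY = re.compile(r"<A NAME=\d+\.\d+\.\d+>(.*)(?=</A>)")
# Same pattern with a lazy quantifier: stops at the first "</A>".
LAZY = re.compile(r"<A NAME=\d+\.\d+\.\d+>(.*?)(?=</A>)")

sample = (
    "<A NAME=speech26><b>SIR HUGH EVANS</b></a>\n"
    "<A NAME=1.1.62>peat the door for Master Page.</A><br>\n"
    "<p><i>Knocks</i></p>\n"
    "<A NAME=1.1.63>What, hoa! Got pless your house here!</A><br>\n"
)

# One anchor per line: the greedy pattern is enough.
lines = GREEDY.findall(sample)
print(lines)

# Two anchors on one line: greedy and lazy behave differently.
same_line = "<A NAME=1.1.1>a</A><A NAME=1.1.2>b</A>"
greedy = GREEDY.findall(same_line)
lazy = LAZY.findall(same_line)
print(greedy)  # ['a</A><A NAME=1.1.2>b']
print(lazy)    # ['a', 'b']
```

The speaker anchor (`NAME=speech26`) is skipped because its name does not match the digits-and-dots form.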
Answered 2024-04-20T06:12:03+08:00

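Parsing HTML / XML / JSON with regular expressions is on a par with writing poor-quality code. HTML can contain repeated and nested structures, which can produce unexpected results when it is parsed with a regex.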
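
To make the nesting problem concrete, here is a small sketch using only Python's standard library (`html.parser`, chosen here so the example has no third-party dependency). The class name `AnchorText` and the sample string are made up for illustration. When an anchor contains inline markup, the regex returns the raw tags along with the text, while a parser yields the text alone.

```python
import re
from html.parser import HTMLParser


class AnchorText(HTMLParser):
    """Collects the text inside <a name="N.N.N"> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        name = dict(attrs).get("name") or ""
        if tag == "a" and re.fullmatch(r"\d+\.\d+\.\d+", name):
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.inside = False

    def handle_data(self, data):
        # Text of nested tags such as <i> arrives here too, without the markup.
        if self.inside:
            self.texts[-1] += data


html = "<A NAME=1.1.1>Hello <i>dear</i> world</A>"

regex_result = re.findall(r"<A NAME=\d+\.\d+\.\d+>(.*?)(?=</A>)", html)
parser = AnchorText()
parser.feed(html)
parser.close()

print(regex_result)   # ['Hello <i>dear</i> world']
print(parser.texts)   # ['Hello dear world']
```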

You can use the Beautiful Soup library in Python to parse the given HTML and extract the desired output.

Here is sample Python code using Beautiful Soup:

import re
from bs4 import BeautifulSoup

data = """<A NAME=speech26><b>SIR HUGH EVANS</b>
</a><blockquote>
<A NAME=1.1.58>Shall I tell you a lie? I do despise a liar as I do</A><br>
<A NAME=1.1.59>despise one that is false, or as I despise one that</A><br>
<A NAME=1.1.60>is not true. The knight, Sir John, is there; and, I</A><br>
<A NAME=1.1.61>beseech you, be ruled by your well-willers. I will</A><br>
<A NAME=1.1.62>peat the door for Master Page.</A><br>
<p><i>Knocks</i></p>
<A NAME=1.1.63>What, hoa! Got pless your house here!</A><br>
</blockquote>"""

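# Name the parser explicitly; html.parser ships with Python
soup = BeautifulSoup(data, "html.parser")

# Keep only <a> tags whose name attribute looks like 1.1.58
for a_tag in soup.find_all('a', {'name': re.compile(r'\d+\.\d+\.\d+')}):
    print(a_tag.get_text())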

This produces the desired output:

Shall I tell you a lie? I do despise a liar as I do
despise one that is false, or as I despise one that
is not true. The knight, Sir John, is there; and, I
beseech you, be ruled by your well-willers. I will
peat the door for Master Page.
What, hoa! Got pless your house here!

Note that I've used a regex here as well, but only in a limited way: to say that I am interested in all 'a' tags whose name attribute value matches the pattern \d+\.\d+\.\d+.

Answered 2024-04-20T06:12:03+08:00

-1

You can try the code below.

import re

text = """<A NAME=speech26><b>SIR HUGH EVANS</b>
</a><blockquote>
<A NAME=1.1.58>Shall I tell you a lie? I do despise a liar as I do</A><br>
<A NAME=1.1.59>despise one that is false, or as I despise one that</A><br>
<A NAME=1.1.60>is not true. The knight, Sir John, is there; and, I</A><br>
<A NAME=1.1.61>beseech you, be ruled by your well-willers. I will</A><br>
<A NAME=1.1.62>peat the door for Master Page.</A><br>
<p><i>Knocks</i></p>
<A NAME=1.1.63>What, hoa! Got pless your house here!</A><br>
</blockquote>"""

# \d+ in every position so that multi-digit numbers (e.g. 2.10.105) also match
output = re.findall(r'<A NAME=\d+\.\d+\.\d+>(.*?)(?=</A>)', text, re.MULTILINE|re.DOTALL)
print(output)

Output

['Shall I tell you a lie? I do despise a liar as I do', 'despise one that is false, or as I despise one that', 'is not true. The knight, Sir John, is there; and, I', 'beseech you, be ruled by your well-willers. I will', 'peat the door for Master Page.', 'What, hoa! Got pless your house here!']

Answered 2024-04-20T06:12:03+08:00

-1

Here is one option, using re.findall:

import re

text = "<A NAME=1.1.58>Shall I tell you a lie? " # ... your input from above
output = re.findall(r'<A NAME=\d+\.\d+\.\d+>(.*?)(?=</A>)', text, re.MULTILINE|re.DOTALL)
print(output)

['Shall I tell you a lie? I do despise a liar as I do',
 'despise one that is false, or as I despise one that',
 'is not true. The knight, Sir John, is there; and, I',
 'beseech you, be ruled by your well-willers. I will',
 'peat the door for Master Page.',
 'What, hoa! Got pless your house here!']

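Note, however, that using regular expressions to parse HTML / XML content is generally not a good idea. If you are sure the target content only ever appears inside <A> tags of the kind shown above, then a regex may be acceptable.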
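
If the regex route is acceptable for your input, a slightly more forgiving sketch might look like the following. The names `ANCHOR_LINE` and `extract_lines` are hypothetical, and ignoring tag case is an assumption based on the sample mixing `</a>` and `</A>`. It joins the matches with newlines, which is the plain-text layout the question asks for.

```python
import re

# Case-insensitive so that <a ...>...</a> and <A ...>...</A> are both accepted.
ANCHOR_LINE = re.compile(r"<a name=\d+\.\d+\.\d+>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def extract_lines(html: str) -> str:
    """Return the text of every numbered anchor, one per output line."""
    return "\n".join(match.strip() for match in ANCHOR_LINE.findall(html))


sample = (
    "<A NAME=speech26><b>SIR HUGH EVANS</b></a>\n"
    "<blockquote>\n"
    "<A NAME=1.1.62>peat the door for Master Page.</A><br>\n"
    "<p><i>Knocks</i></p>\n"
    "<A NAME=1.1.63>What, hoa! Got pless your house here!</A><br>\n"
    "</blockquote>"
)

result = extract_lines(sample)
print(result)
# peat the door for Master Page.
# What, hoa! Got pless your house here!
```

This still assumes the anchors contain no nested markup; if they might, a parser is the safer choice.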

Answered 2024-04-20T06:12:03+08:00
