RegEx strip HTML tags problem - Java 学习之路

I'm trying to use the regex replace pattern "<[^>]*>" to strip the HTML tags from Word-generated HTML like the following:

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xmlns="http://www.w3.org/TR/REC-html40">

v\:* {behavior:url(#default#VML);}

o\:* {behavior:url(#default#VML);}

w\:* {behavior:url(#default#VML);}

.shape {behavior:url(#default#VML);}

</style> <![endif]--> <o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="place" downloadurl="http://www.5iantlavalamp.com/"/> <!--[if !mso]> <style>

st1\:*{behavior:url(#default#ieooui) }

</style> <![endif]--> <style> <!-- /* Font Definitions */ @font-face {font-family:Tahoma; panose-1:2 11 6 4 3 5 4 4 2 4;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; margin-bottom:.0001pt; font-size:12.0pt; font-family:"Times New Roman";} a:link, span.MsoHyperlink {color:blue; text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed {color:purple; text-decoration:underline;} span.EmailStyle17 {mso-style-type:personal; FONT-FAMILY:宋体; color:windowtext;} span.EmailStyle18 {mso-style-type:personal-reply; FONT-FAMILY:宋体; color:navy;} @page Section1 {size:8.5in 11.0in; margin:1.0in 1.25in 1.0in 1.25in;} div.Section1 {page:Section1;} --> </style>

</HEAD>

Everything works fine except for the bold lines above. Does anyone have an idea how to match them?

Thanks,

Alexander

3 Answers

-1
Your regex doesn't take into account that comments can contain > characters that don't terminate the comment. Try this regex:
```
<!--.*?-->|<[^>]*>
```
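You'll have to turn on the option that makes the dot match line breaks. How to do this depends on the application or programming language you're using: in many regex flavors it is the /s flag, and in .NET you set RegexOptions.Singleline.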
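As a rough illustration of the idea, here is a minimal Python sketch. Python is an assumption (the question doesn't name a language), `re.DOTALL` plays the role of the /s flag there, and the pattern `<!--.*?-->|<[^>]*>` and the helper name `strip_tags` are illustrative rather than anything taken from the asker's code.

```python
import re

# Comments are tried first, then ordinary tags. re.DOTALL lets "." cross
# line breaks, so a comment spanning several lines is removed as one match.
TAG_OR_COMMENT = re.compile(r"<!--.*?-->|<[^>]*>", re.DOTALL)

def strip_tags(html: str) -> str:
    """Remove HTML comments and tags, leaving the text in between."""
    return TAG_OR_COMMENT.sub("", html)

# Hypothetical sample modelled on the Word markup in the question.
sample = (
    "<p>Hello</p>\n"
    "<!--[if !mso]>\n"
    "<style>\n"
    "v\\:* {behavior:url(#default#VML);}\n"
    "</style>\n"
    "<![endif]-->\n"
    "<p>World</p>"
)
print(strip_tags(sample))
```

Note that this only removes CSS that sits inside a comment. A `<style>` block that is not wrapped in a comment would still leave its rules behind as text.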
Answered 2024-05-16T07:13:17+08:00
0
People usually recommend using a parser rather than regular expressions when dealing with HTML.
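For what it's worth, a parser-based version might look roughly like the sketch below. It assumes Python and its standard-library `html.parser` module, since the question doesn't say which language is in use. The names `TextExtractor` and `html_to_text` are made up for the example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <style> and <script>."""

    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__()
        self._parts = []
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from the parser.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method; they go to handle_comment,
        # which does nothing by default.
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        return "".join(self._parts)

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `html_to_text` on a page returns only the text outside of tags, comments, and style or script blocks, so the CSS rules from the sample never show up in the output.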

If you must use a regular expression :) you can use:
```
<style>.*?</style>
```
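One way to apply this, sketched in Python (an assumed language), is to remove the style blocks first and then strip the remaining tags. The pattern here is slightly widened compared with the one above: it also accepts attributes on the opening tag and ignores case. That widening, along with the helper name `strip_styles_then_tags`, is my own addition.

```python
import re

# Pass 1: whole <style>...</style> blocks, including their CSS content.
# re.DOTALL lets "." match line breaks; re.IGNORECASE covers <STYLE>.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.DOTALL | re.IGNORECASE)
# Pass 2: any remaining tag.
ANY_TAG = re.compile(r"<[^>]*>")

def strip_styles_then_tags(html: str) -> str:
    without_styles = STYLE_BLOCK.sub("", html)
    return ANY_TAG.sub("", without_styles)
```

The order matters: if the tags were stripped first, the `<style>` markers would be gone and the CSS rules would be left behind as plain text.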
Answered 2024-05-16T07:13:17+08:00
3

You can't use regular expressions to parse HTML (or XML, for that matter).

Answered 2024-05-16T07:13:17+08:00
