I am trying to remove HTML tags from Word-generated HTML using the regular expression replace pattern "<[^>]*>". The HTML looks like this:
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-2">
<meta name=Generator content="Microsoft Word 11 (filtered medium)">
<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style>
<![endif]-->
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="place" downloadurl="http://www.5iantlavalamp.com/"/>
<!--[if !mso]>
<style>
st1\:*{behavior:url(#default#ieooui) }
</style>
<![endif]-->
<style>
<!--
 /* Font Definitions */
 @font-face {font-family:Tahoma; panose-1:2 11 6 4 3 5 4 4 2 4;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; margin-bottom:.0001pt; font-size:12.0pt; font-family:"Times New Roman";}
 a:link, span.MsoHyperlink {color:blue; text-decoration:underline;}
 a:visited, span.MsoHyperlinkFollowed {color:purple; text-decoration:underline;}
 span.EmailStyle17 {mso-style-type:personal; FONT-FAMILY:宋体; color:windowtext;}
 span.EmailStyle18 {mso-style-type:personal-reply; FONT-FAMILY:宋体; color:navy;}
 @page Section1 {size:8.5in 11.0in; margin:1.0in 1.25in 1.0in 1.25in;}
 div.Section1 {page:Section1;}
-->
</style>
</head>
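Everything works fine except for the bold lines above (the three behavior:url(#default#VML) rules for v, o, and w). Does anyone have an idea how to match them?
Thanks,
Alexander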
3 Answers
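Your regex does not account for the fact that comments can contain > characters that do not terminate the comment. Try this regex:
You will have to turn on the option that makes . match newlines. How to do that depends on the application or programming language you are using; in some it is the /s flag. In .NET, you set RegexOptions.Singleline.
People usually recommend using a parser rather than regular expressions when dealing with HTML.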
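To illustrate the first answer's two points (comments can contain > characters, and . must be allowed to match newlines), here is a minimal Python sketch. The pattern, the names TAG_OR_COMMENT and strip_tags, and the sample input are assumptions made for illustration; they are not the regex from the answer. Python's re.DOTALL plays the role that the /s flag or the .NET single-line option plays elsewhere.

```python
import re

# Assumed pattern (not taken from the answer): try to match a whole
# comment first, and only then fall back to an ordinary tag.
# re.DOTALL lets "." match newlines, so a comment may span several lines.
TAG_OR_COMMENT = re.compile(r"<!--.*?-->|<[^>]*>", re.DOTALL)


def strip_tags(html: str) -> str:
    """Remove comments (with their contents) and tags from html."""
    return TAG_OR_COMMENT.sub("", html)


# Hypothetical input modelled on the conditional comment in the question.
sample = r"""<head><!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
</style>
<![endif]--><p>Hello</p></head>"""

# The question's pattern stops at the first ">" inside the comment,
# so the CSS rule is left behind as if it were ordinary text.
naive = re.sub(r"<[^>]*>", "", sample)

print(repr(naive))
print(repr(strip_tags(sample)))
```

The order of the alternatives matters: if the tag branch came first, it would consume `<!--[if !mso]>` as an ordinary tag and the comment branch would never get a chance to match.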
If you must use a regex :) you can use -
You can't use regular expressions to parse HTML (or XML, for that matter).
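For comparison, the parser-based route that the answers recommend might look like the following in Python, using the standard library's html.parser module. This is only a sketch under assumptions: the class name TextExtractor, the helper extract_text, and the sample input are hypothetical, and the choice to skip the bodies of style and script elements is a design decision made here, not something stated in the thread.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring comments and <style>/<script> bodies."""

    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method; the base class routes them
        # to handle_comment, which does nothing unless overridden.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


# Hypothetical input modelled on the Word-generated markup in the question.
sample = (
    r"<head><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}</style>"
    r"<![endif]--><style>p {margin:0in;}</style></head>"
    r"<body><p>Hello, <b>world</b></p></body>"
)

print(extract_text(sample))
```

Because the parser tracks which element it is inside, the CSS in the plain style block is dropped along with the conditional comment, which a tag-stripping regex alone would not do.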