How to parse XHTML with a DOM parser while ignoring the DOCTYPE declaration - Java 学习之路

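I am having trouble using a DOM parser to parse XHTML that contains a DOCTYPE declaration.

Error: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd%20

Declaration: DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

Is there a way to parse the XHTML into a Document object while ignoring the DOCTYPE declaration?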

4 Answers

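The solution that worked for me was to give the DocumentBuilder a fake entity resolver that returns an empty stream. There is a good explanation here (see the last message from kdgregory):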

http://forums.sun.com/thread.jspa?threadID=5362097

Here is kdgregory's solution:

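documentBuilder.setEntityResolver(new EntityResolver()
        {
            // Answer every external entity request (including the DTD)
            // with an empty source, so nothing is downloaded.
            @Override
            public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException
            {
                return new InputSource(new StringReader(""));
            }
        });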
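
For context, the resolver above can be dropped into a complete program along the following lines. This is only a minimal sketch: the class name `EmptyResolverDemo` and the sample `XHTML` string are made up for illustration, and the resolver is written as a lambda, which assumes Java 8 or later.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EmptyResolverDemo {
    // Hypothetical sample input whose DOCTYPE points at the W3C DTD.
    static final String XHTML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n"
        + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>t</title></head>"
        + "<body><p>hello</p></body></html>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Hand back an empty stream for every external entity, so the
        // parser reads an empty DTD instead of contacting www.w3.org.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XHTML);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

One trade-off of this approach: because the parser sees an empty DTD, named entities that the real XHTML DTD declares (such as `&nbsp;`) will not be defined.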

Answered 2024-04-24T11:13:42+08:00

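A parser is required to download the DTD, but you may be able to get around that by setting the standalone attribute on the <?xml ... ?> line.

Note, however, that this particular error is most likely caused by confusing an XML Schema definition URL with a DTD URL. See http://www.w3schools.com/xhtml/xhtml_dtd.asp for details. The correct declaration is: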
```
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
```
Answered 2024-04-24T11:13:42+08:00

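The simplest thing to do is to set validating = false on your DocumentBuilderFactory. If you do want validation, download the DTD and use a local copy. As Rachel commented above, this is discussed at The WWW Consortium.

In short, because the default DocumentBuilderFactory downloads the DTD every time it validates, the W3C servers get hit every time a typical programmer tries to parse an XHTML file in Java. They cannot afford that much traffic, so they respond with an error.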
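
The "use a local copy" suggestion can be sketched with an `EntityResolver` that recognizes the XHTML public ID and serves the DTD locally. Everything named here is an assumption for illustration: the class `LocalDtdDemo`, the sample document, and above all `LOCAL_DTD`, which is a one-line stand-in rather than the real DTD. In practice the resolver would open the downloaded `xhtml1-transitional.dtd` file instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LocalDtdDemo {
    static final String XHTML_PUBLIC_ID =
        "-//W3C//DTD XHTML 1.0 Transitional//EN";

    // Stand-in for a locally stored DTD (assumption): it declares only
    // the nbsp entity so that the example stays self-contained.
    static final String LOCAL_DTD = "<!ENTITY nbsp \"&#160;\">";

    // Hypothetical sample input that relies on an entity from the DTD.
    static final String XHTML =
        "<!DOCTYPE html PUBLIC \"" + XHTML_PUBLIC_ID + "\"\n"
        + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
        + "<html><body><p>a&nbsp;b</p></body></html>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (XHTML_PUBLIC_ID.equals(publicId)) {
                // Serve the local copy instead of fetching from www.w3.org.
                return new InputSource(new StringReader(LOCAL_DTD));
            }
            // null tells the parser to fall back to its default lookup.
            return null;
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XHTML);
        System.out.println(doc.getElementsByTagName("p").item(0).getTextContent());
    }
}
```

Compared with returning an empty stream, a local copy keeps the entities declared by the DTD available, at the cost of having to ship the DTD files with the application.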

Answered 2024-04-24T11:13:42+08:00

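The following code snippet tells the parser to genuinely ignore the external DTD named in the DOCTYPE declaration, instead of relying on a fake resolver: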

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

(...)

DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
// Turn off DTD validation.
f.setValidating(false);
// Xerces-specific setting: do not load the external DTD at all.
f.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder builder = f.newDocumentBuilder();
Document document = builder.parse( ... );
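
A self-contained variant of the same idea might look like the sketch below. It sets the flag through `DocumentBuilderFactory.setFeature` rather than `setAttribute`; that substitution, the class name `NoExternalDtdDemo`, and the sample `XHTML` string are assumptions made for illustration. The feature URI is specific to Xerces-based parsers (including the one bundled with the JDK), so other implementations may reject it with a `ParserConfigurationException`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NoExternalDtdDemo {
    // Xerces feature that controls loading of the external DTD
    // when the parser is not validating.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical sample input whose DOCTYPE points at the W3C DTD.
    static final String XHTML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n"
        + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
        + "<html><head><title>t</title></head>"
        + "<body><p>hello</p></body></html>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(false);
        // Skip the external DTD entirely; no request is made for it.
        f.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = f.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XHTML);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```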

Answered 2024-04-24T11:13:42+08:00
