Extracting text from a PDF with PDFBox 2.0.2: class PDFTextStripper() is missing

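I implemented a simple text-extraction method in Java with PDFBox 1.8.10. For various reasons I have to upgrade the library to PDFBox 2.0.2, and now PDFTextStripper() can no longer be found. It may have been removed, or moved to another package in the new version. Is there a way to fix this? Or can you suggest another way to get the text out of a PDF?

Here is my code: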

public String extractTextFromPdf() {
     File jInputFile = new File("c:/lorem/ipsum.pdf");
     PDDocument PDDoc = PDDocument.load(jInputFile ); 
     String strContent = new PDFTextStripper().getText(PDDoc);
     PDDoc.close();
     return strContent;
}

Thanks in advance.

1 Answer

Try this:

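// In PDFBox 2.0.x, PDFTextStripper lives in org.apache.pdfbox.text
// (in 1.8.x it was in org.apache.pdfbox.util), so only the import changes.
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// try-with-resources closes the document even if extraction fails
try (PDDocument document = PDDocument.load(new File("test.pdf"))) {
    if (!document.isEncrypted()) {
        PDFTextStripper stripper = new PDFTextStripper();
        // Optional: emit text in reading order rather than content-stream order
        stripper.setSortByPosition(true);
        String text = stripper.getText(document);
        System.out.println("Text:" + text);
    }
} catch (Exception e) {
    e.printStackTrace();
}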
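
If the class still cannot be resolved after the upgrade, it may help to check which PDFBox package is actually on the classpath. The following is only a rough diagnostic sketch, not part of PDFBox: the class name StripperLocator and the helper firstAvailable are made up for illustration. It uses plain reflection to probe the 2.0.x location (org.apache.pdfbox.text) first and then the 1.8.x location (org.apache.pdfbox.util).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of PDFBox.
class StripperLocator {
    // Known locations of PDFTextStripper: 2.0.x first, then 1.8.x.
    static final List<String> CANDIDATES = Arrays.asList(
            "org.apache.pdfbox.text.PDFTextStripper",
            "org.apache.pdfbox.util.PDFTextStripper");

    /** Returns the first class name that can be loaded, or null if none can. */
    static String firstAvailable(String... classNames) {
        for (String name : classNames) {
            try {
                // 'false' means: locate the class without running its static initializers.
                Class.forName(name, false, StripperLocator.class.getClassLoader());
                return name;
            } catch (ClassNotFoundException | LinkageError e) {
                // Not loadable under this name; try the next candidate.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String found = firstAvailable(CANDIDATES.toArray(new String[0]));
        System.out.println(found != null
                ? "PDFTextStripper found at: " + found
                : "PDFTextStripper is not on the classpath");
    }
}
```

If the util package is reported, an old 1.8.x jar is still being picked up; if nothing is found, the PDFBox dependency is probably missing from the build altogether.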

Answered 2024-04-27T13:00:51+08:00
