Insert PieceInfo in a merged document using iTextSharp

I have a process that merges multiple PDFs into a single PDF. This works well.

While merging, I want to add a PieceInfo entry at the page level to keep track of which documents are contained in the merged file.

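Suppose I have 3 documents in this order: Fester.pdf (2 pages), Gomez.pdf (2 pages), and Lurch.pdf (1 page). After merging I will have 5 pages, and each page should carry a PieceInfo entry with the name of the file it came from. That way, if I go to page 4, I will know that the page came from Gomez.pdf.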
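To make the expected outcome concrete, here is a minimal sketch in plain Java (no iText involved) of the page-to-source mapping described above. The class `MergedPageIndex` and its method `sourcePerPage` are hypothetical names made up for illustration; they only compute which file name each merged page should be tagged with.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of iText or iTextSharp.
class MergedPageIndex {

    // Returns a list in which element i is the source file name of merged page i + 1.
    // The map must preserve insertion order (e.g. LinkedHashMap) so that it
    // reflects the order in which the documents are merged.
    static List<String> sourcePerPage(Map<String, Integer> pageCounts) {
        List<String> sources = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : pageCounts.entrySet()) {
            for (int p = 0; p < entry.getValue(); p++) {
                sources.add(entry.getKey());
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        Map<String, Integer> pageCounts = new LinkedHashMap<>();
        pageCounts.put("Fester.pdf", 2);
        pageCounts.put("Gomez.pdf", 2);
        pageCounts.put("Lurch.pdf", 1);

        List<String> sources = sourcePerPage(pageCounts);
        for (int page = 1; page <= sources.size(); page++) {
            // Page numbers are 1-based, list indices are 0-based.
            System.out.println("Page " + page + " <- " + sources.get(page - 1));
        }
    }
}
```

With the three files from the example, merged page 4 maps to Gomez.pdf and page 5 to Lurch.pdf, which is the information each page is supposed to carry.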

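While searching, I found this post: "Insert a hidden digest in a PDF using the iText library", and I tried to implement the same approach in my process. The suggestion works, but I can't figure out how to store the information per page.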

Here is my code:

public static byte[] MergeDocuments(DocumentCollection myCollection)
{
    PdfImportedPage importedPage = null;

    // Merge the document streams
    using (MemoryStream stream = new MemoryStream())
    {
        // Create the iTextSharp document
        iTextSharp.text.Document pdfDoc = new iTextSharp.text.Document();

        // Create the PDF writer that listens to the document
        PdfCopy pdfCopy = new PdfCopy(pdfDoc, stream);
        if (pdfDoc != null && pdfCopy != null)
        {
            // Open the document and load content
            pdfDoc.Open();

            //Dictionary Entries
            PdfName appName = new PdfName("MyKey");
            PdfName dataName = new PdfName("Hash");

            //Class to add and retrieve the PieceInfo data
            DocumentPieceInfo dpi = new DocumentPieceInfo();

            //Loop through my collection. The document class has the BinaryFile and FileName
            foreach (Document doc in myCollection)
            {
                PdfReader reader = new PdfReader(doc.FileBinary);
                if (reader != null)
                {
                    int nPage = reader.NumberOfPages;
                    for (int n = 0; n < nPage; n++)
                    {
                        //Trying to add the PieceInfo
                        dpi.addPieceInfo(pdfCopy, appName, dataName, new PdfString(string.Format("Info Doc: {0}", doc.FileName)));
                        importedPage = pdfCopy.GetImportedPage(reader, n + 1);
                        pdfCopy.AddPage(importedPage);
                    }
                    // Close the reader
                    reader.Close();
                }
            }

            if (pdfCopy != null)
                pdfCopy.Close();

            if (pdfDoc != null)
                pdfDoc.Close();

            byte[] arrOutput = stream.ToArray();
            return arrOutput;

        }
    }
    return null;
}

I also made a small change to MKL's solution, changing the input to a PdfCopy:

public void addPieceInfo(PdfCopy reader, PdfName app, PdfName name, PdfObject value)
    {
        //PdfDictionary catalog = reader.getCatalog();
        PdfDictionary pieceInfo = reader.ExtraCatalog.GetAsDict(PIECE_INFO);
        if (pieceInfo == null)
        {
            pieceInfo = new PdfDictionary();
            reader.ExtraCatalog.Put(PIECE_INFO, pieceInfo);
        }

        PdfDictionary appData = pieceInfo.GetAsDict(app);
        if (appData == null)
        {
            appData = new PdfDictionary();
            pieceInfo.Put(app, appData);
        }

        PdfDictionary privateData = appData.GetAsDict(PRIVATE);
        if (privateData == null)
        {
            privateData = new PdfDictionary();
            appData.Put(PRIVATE, privateData);
        }

        appData.Put(LAST_MODIFIED, new PdfDate());
        privateData.Put(name, value);
    }

The code above only adds the PieceInfo to the last page :(

Is there a way to get the catalog from a page's PdfImportedPage object?

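How can I include this information at the page level during the merge? And afterwards, how do I get the PieceInfo back from a page? Just loop through the pages?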

1 Answer

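Please note that /PieceInfo is expected to be deprecated in ISO 32000-2 (also known as PDF 2.0). You can also create your own keys to add your own custom data. I explained this in my answer to the question "iText: how to check if a giant string is present on a PDF page".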

You asked: is there a way to get the catalog from a page's PdfImportedPage object?

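That isn't quite the right question. If you study my answer carefully, you'll see that what you need is access to the page dictionary. You can add a /PieceInfo entry (or a custom entry) to this page dictionary and retrieve it later.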

Take a look at CustomPageDictKeyMerge:

public void createPdf(String filename) throws IOException, DocumentException {
    PdfName marker = new PdfName("ITXT_PageMarker");
    List<PdfReader> readers = new ArrayList<PdfReader>();
    readers.add(new PdfReader(SRC1));
    readers.add(new PdfReader(SRC2));
    readers.add(new PdfReader(SRC3));
    Document document = new Document();
    PdfCopy copy = new PdfCopy(document, new FileOutputStream(filename));
    document.open();
    int counter = 0;
    int n;
    PdfImportedPage importedPage;
    PdfDictionary pageDict;
    for (PdfReader reader : readers) {
        counter++;
        n = reader.getNumberOfPages();
        for (int p = 1; p <= n; p++) {
            pageDict = reader.getPageN(p);
            pageDict.put(marker, new PdfString(String.format("Page %s of document %s", p, counter)));
            importedPage = copy.getImportedPage(reader, p);
            copy.addPage(importedPage);
        }
    }
    // close the document
    document.close();
    for (PdfReader reader : readers) {
        reader.close();
    }
}

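In this example, we add a special marker to the page dictionary before importing the page. As a result, the marker is carried over into the merged document.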

Take a look at the CustomPageDictKeyCreate example to see how to retrieve these custom markers:

public void check(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    PdfDictionary pagedict;
    for (int i = 1; i <= reader.getNumberOfPages(); i++) { // pages are 1-based, so include the last page
        pagedict = reader.getPageN(i);
        System.out.println(pagedict.get(new PdfName("ITXT_PageMarker")));
    }
    reader.close();
}

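Make sure you use a second-class name for your custom key. iText has registered the prefix ITXT with ISO for its custom second-class keys. This prefix ensures that different companies don't use the same key for different purposes. Any key that starts with ITXT is easily recognized as a key created by iText Group. ISO keeps track of all these prefixes to avoid duplicates. Registering a prefix with ISO is free.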
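As a small illustration of the naming convention, the sketch below builds and checks key names of the form prefix, underscore, name, as in ITXT_PageMarker from the example. It is plain Java; the class `SecondClassKey` and its methods are hypothetical and not part of iText, and the underscore separator is an assumption taken from the ITXT_PageMarker key shown earlier in this answer.

```java
// Hypothetical helper for illustration only; not part of iText.
class SecondClassKey {

    // Builds a key name such as "ITXT_PageMarker" from a prefix and a local name.
    // Assumption: prefix and name are joined by an underscore, as in the example key.
    static String build(String prefix, String name) {
        if (prefix == null || prefix.isEmpty() || name == null || name.isEmpty()) {
            throw new IllegalArgumentException("prefix and name must be non-empty");
        }
        return prefix + "_" + name;
    }

    // Returns true if the key starts with "<prefix>_" and has a non-empty name after it.
    static boolean hasPrefix(String key, String prefix) {
        return key != null
                && key.startsWith(prefix + "_")
                && key.length() > prefix.length() + 1;
    }

    public static void main(String[] args) {
        String key = build("ITXT", "PageMarker");
        System.out.println(key + " uses the ITXT prefix: " + hasPrefix(key, "ITXT"));
    }
}
```

The resulting string is what you would pass to new PdfName(...) when writing or reading the page dictionary entry. For your own data you would use a prefix you registered yourself rather than ITXT.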

Answered on 2024-04-20T00:00:30+08:00
