Extracting tables from a PDF [closed] - Java 学习之路

-2

I have a PDF file that contains text, images, and tables. I want to extract the tables from that PDF file using Python or R.

2 Answers

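If you are considering R, I recommend the tabulizer package.
It is available on GitHub (ropensci/tabulizer) and is very easy to use. To install it, run the following commands: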

install.packages("devtools")
devtools::install_github("ropensci/tabulizer")

Then, using one of their examples:

library("tabulizer")
f <- system.file("examples", "data.pdf", package = "tabulizer")
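# Here f points to the example PDF bundled with the package; use the path to your own PDF instead.
out1 <- extract_tables(f)
# Or, even better, say which page the tables are on.
# Note: newer tabulizer releases set the return format with output = "data.frame"; there, method selects the extraction algorithm.
out2 <- extract_tables(f, pages = 1, guess = FALSE, method = "data.frame")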

Answered 2024-05-04T09:38:24+08:00

1

You'll probably find PyPI useful: you can search there for specific things like 'PDF' and it will give you a list of modules relating to PDFs (here). The one to look at on PyPI is _325066. That should help you get started!
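To give an idea of what one of those modules looks like in use, here is a minimal sketch. It assumes pdfplumber, which is my own pick from PyPI and not something named in this answer. The helper names extract_tables_from_pdf and table_to_records, and the file name report.pdf, are hypothetical. Detection quality depends heavily on how the PDF was produced, so treat this as a starting point only.

```python
from typing import Dict, List, Optional

# One table row: a list of cell strings (None where a cell came back empty).
Row = List[Optional[str]]


def extract_tables_from_pdf(path: str) -> List[List[Row]]:
    """Return every table detected in the PDF as a list of rows."""
    # Imported here so the pure helper below works without pdfplumber installed.
    # Assumption: pdfplumber is available (pip install pdfplumber).
    import pdfplumber

    tables: List[List[Row]] = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # extract_tables() yields one list of rows per table found on the page.
            tables.extend(page.extract_tables())
    return tables


def table_to_records(table: List[Row]) -> List[Dict[str, str]]:
    """Treat the first row as the header and map each later row to a dict."""
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    return [
        {name: (cell or "").strip() for name, cell in zip(header, row)}
        for row in table[1:]
    ]


# Usage (hypothetical file name):
# for table in extract_tables_from_pdf("report.pdf"):
#     print(table_to_records(table))
```

Keeping the header handling in a separate function means you can check it on plain lists before pointing the extractor at a real file.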

Answered 2024-05-04T09:38:24+08:00
