Figure sizes with pandoc conversion from markdown to docx

I write a report with R Markdown in RStudio. When knitr converts it to html, it also generates a markdown file. I convert this file with pandoc as follows:

pandoc -f markdown -t docx input.md -o output.docx

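The output.docx file is fine except for one problem: the sizes of the figures are changed, and I have to resize the figures manually in Word. Is there anything I can do, perhaps a pandoc option, to get the right figure sizes?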

4 Answers

Here is a solution that resizes the figures with ImageMagick, called from an R script. A scale of 70% seems to be a good choice.

# the path containing the Rmd file:
wd <- "..."
setwd(wd)

# the folder containing the figures:
fig.path <- file.path(wd, "figure")
# all png figures (anchored regex, so only names ending in .png match):
figures <- list.files(fig.path, pattern = "\\.png$", all.files = TRUE)

# (safety) create copies of the original files
copy.path <- paste0(fig.path, "_copy")
dir.create(copy.path)
for (f in figures) {
  file.copy(file.path(fig.path, f), copy.path)
}

# resize all figures in place
# (with ImageMagick 7 the command is `magick` rather than `convert`)
for (f in figures) {
  fig <- file.path(fig.path, f)
  comm <- paste("convert -resize 70%", fig, fig)
  shell(comm)  # shell() is Windows-only; use system(comm) on Linux/macOS
}

# then run pandoc from a command line  
# or from the pandoc() function :
library(knitr)
pandoc("MyReport.md", "docx")

More information on ImageMagick's resize function: www.perturb.org

Answered 2024-05-04T11:37:39+08:00

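I also want to convert R Markdown to both html and .docx/.odt, with good figure size and resolution. So far, the best approach I have found is to explicitly define the resolution and size of the figures in the document (the dpi, fig.width and fig.height options). If you do this, you get figures that are good enough for publication, and the odt/docx is fine. The problem is that if you use a dpi much higher than the default 72 dpi, the figures look too big in the html file. Here are the 3 approaches I have used to handle this (NB I use R scripts with the spin() syntax):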

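1) Use out.extra ='WIDTH="75%"' in the knitr options. This forces all figures in the html to take up 75% of the window width. It is a quick solution, but it is not optimal if your plots have very different sizes. (NB I prefer to work in centimeters rather than inches, hence the /2.54 everywhere.)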
```
library(knitr)
opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), dpi = 400,
               fig.width = 8/2.54, fig.height = 8/2.54,
               out.extra ='WIDTH="75%"'
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])
```
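2) Use out.width and out.height to specify the size of the figures, in pixels, in the html file. I use a constant "sc" to scale the plots down in the html output. This is the more precise approach, but the problem is that for each figure you have to define both fig.width/height and out.width/height, which is tedious. Ideally you should be able to specify in the global options that, for example, out.width = 150 * fig.width (where fig.width changes from chunk to chunk). Maybe something like that is possible, but I don't know how.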
```
#+ echo = FALSE
library(knitr)
sc <- 150
opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), dpi = 400,
                fig.width = 8/2.54, fig.height = 8/2.54,
                out.width = sc*8/2.54, out.height = sc*8/2.54
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54, out.width= sc * 14/2.54, out.height= sc * 10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])
```
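The wish expressed in approach 2, deriving out.width from fig.width once for all chunks, might be met with knitr's option hooks (`opts_hooks`). The following is only a sketch under that assumption, not something the answer above tested: `scale_out_size` is a hypothetical helper name, and it reuses the constant `sc` from the script above. It fills in out.width and out.height only when a chunk has not set them itself.

```r
# Sketch only: derive out.width/out.height from fig.width/fig.height for every
# chunk, so that only fig.width and fig.height need to be set per chunk.
sc <- 150  # same scaling constant as in the script above

# Hypothetical helper: takes the list of chunk options and returns it modified.
# `[[` is used so that option names are matched exactly.
scale_out_size <- function(options) {
  if (is.null(options[["out.width"]])) {
    options[["out.width"]] <- sc * options[["fig.width"]]
  }
  if (is.null(options[["out.height"]])) {
    options[["out.height"]] <- sc * options[["fig.height"]]
  }
  options
}

# Register the helper as an option hook. knitr runs the hook for each chunk
# whose fig.width option is not NULL, which in practice is every chunk.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_hooks$set(fig.width = scale_out_size)
}
```

With the hook registered, a chunk header such as `#+ fig.width=14/2.54, fig.height=10/2.54` would no longer need the matching out.width and out.height entries.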
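Note that for these two solutions, I think you cannot convert the md file directly to odt with pandoc (the figures are not included). I convert the md to html, and then the html to odt (I have not tried docx). Something like this (if the previous R script is named "figsize1.R"):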
```
library(knitr)
setwd("/home/gilles/")
spin("figsize1.R")

system("pandoc figsize1.md -o figsize1.html")
system("pandoc figsize1.html -o figsize1.odt")
```
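3) Simply compile your document twice, once with a low dpi value (~96) for the html output and once with a high resolution (~300) for the odt/docx output. This is now my preferred way. The main drawback is that you have to compile twice, but that is not really a problem for me, because I usually need the odt file only at the very end of the job, to hand over to the end users. During the work I regularly compile the html with the html notebook button in RStudio.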
```
#+ echo = FALSE
library(knitr)

opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), 
               fig.width = 8/2.54, fig.height = 8/2.54
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])
```
Then compile the 2 outputs with the following script (NB here you can convert the md file to html directly):
```
library(knitr)
setwd("/home/gilles")

opts_chunk$set(dpi=96)
spin("figsize3.R", knit=FALSE)
knit2html("figsize3.Rmd")

opts_chunk$set(dpi=400)
spin("figsize3.R")
system("pandoc figsize3.md -o figsize3.odt")
```
Answered 2024-05-04T11:37:39+08:00

Here is my solution: hack the docx converted by Pandoc. Since a docx is just a zipped bundle of xml files, resizing the figures is quite simple.

Here is what a figure looks like in the word/document.xml extracted from a converted docx:

<w:p>
  <w:r>
    <w:drawing>
      <wp:inline>
        <wp:extent cx="1524000" cy="1524000" />
        ...
        <a:graphic>
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic>
              ...
              <pic:blipFill>
                <a:blip r:embed="rId23" />
                ...
              </pic:blipFill>
              <pic:spPr bwMode="auto">
                <a:xfrm>
                  <a:off x="0" y="0" />
                  <a:ext cx="1524000" cy="1524000" />
                </a:xfrm>
                ...
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</w:p>

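So replacing the cx and cy attributes of the wp:extent and a:ext nodes with the desired values does the resizing job. The following R code works for me. The widest figure takes up the whole line width specified by the variable out.width, and the rest are resized proportionally.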

require(XML)

## default linewidth (inch) for Word 2003
out.width <- 5.77
docx.file <- "report.docx"

## unzip the docx converted by Pandoc
system(paste("unzip", docx.file, "-d temp_dir"))
document.xml <- "temp_dir/word/document.xml"
doc <- xmlParse(document.xml)
wp.extent <- getNodeSet(xmlRoot(doc), "//wp:extent")
a.blip <- getNodeSet(xmlRoot(doc), "//a:blip")
a.ext <- getNodeSet(xmlRoot(doc), "//a:ext")

figid <- sapply(a.blip, xmlGetAttr, "r:embed")
figname <- dir("temp_dir/word/media/")
stopifnot(length(figid) == length(figname))
pdffig <- paste("temp_dir/word/media/",
                ## in case figure ids in docx are not in dir'ed order
                figname[match(figid, substr(figname, 1, nchar(figname) - 4))], sep="")

## get dimension info of included pdf figures
pdfsize <- do.call(rbind, lapply(pdffig, function (x) {
    fig.ext <- substr(x, nchar(x) - 2, nchar(x))
    pp <- pipe(paste(ifelse(fig.ext == 'pdf', "pdfinfo", "file"), x, sep=" "))
    pdfinfo <- readLines(pp); close(pp)
    sizestr <- unlist(regmatches(pdfinfo, gregexpr("[[:digit:].]+ X [[:digit:].]+", pdfinfo, ignore.case=TRUE)))
    as.numeric(strsplit(sizestr, split=" x ")[[1]])
}))

## resizing pdf figures in xml DOM, with the widest figure taking up a line's width
wp.cx <- round(out.width*914400*pdfsize[,1]/max(pdfsize[,1]))
wp.cy <- round(wp.cx*pdfsize[, 2]/pdfsize[, 1])
wp.cx <- as.character(wp.cx)
wp.cy <- as.character(wp.cy)
sapply(1:length(wp.extent), function (i)
       xmlAttrs(wp.extent[[i]]) <- c(cx = wp.cx[i], cy = wp.cy[i]));
sapply(1:length(a.ext), function (i)
       xmlAttrs(a.ext[[i]]) <- c(cx = wp.cx[i], cy = wp.cy[i]));

## save hacked xml back to docx
saveXML(doc, document.xml, indent = FALSE)
setwd("temp_dir")
system(paste("zip -r ../", docx.file, " *", sep=""))
setwd("..")
system("rm -fr temp_dir")
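
The factor 914400 in the code above is the number of English Metric Units (EMU) per inch, the unit in which the cx and cy attributes are stored. As a minimal sketch of just that arithmetic, separated from the XML handling: the helper names `inches_to_emu`, `emu_to_inches` and `scale_to_line` are made up for illustration, and the pixel sizes in the last call are invented.

```r
# Sketch only: the unit arithmetic behind the resizing above.
emu_per_inch <- 914400  # EMU per inch, the factor used in the code above

inches_to_emu <- function(inches) round(inches * emu_per_inch)
emu_to_inches <- function(emu) emu / emu_per_inch

# Hypothetical helper: scale figures so that the widest one spans `line_width`
# inches, keeping each figure's aspect ratio. `width` and `height` may be in
# any unit, as long as it is the same unit for all figures.
scale_to_line <- function(width, height, line_width = 5.77) {
  cx <- inches_to_emu(line_width * width / max(width))
  cy <- round(cx * height / width)
  data.frame(cx = cx, cy = cy)
}

# Size in inches of the figure in the XML excerpt above
emu_to_inches(1524000)

# Two figures with made-up pixel sizes (480 x 480 and 240 x 120)
scale_to_line(width = c(480, 240), height = c(480, 120))
```

The rows of the returned data frame correspond to the cx and cy values that the code above writes into the wp:extent and a:ext nodes.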

Answered 2024-05-04T11:37:39+08:00

A simple way is to include a scale factor k in the individual chunk options:
```
{r, fig.width=8*k, fig.height=6*k}
```
and a variable dpi in the global chunk options:
```
opts_chunk$set(dpi = dpi)
```
Then you can set the values of dpi and k in the global environment before knitting the Rmd file:
```
dpi <<- 96    
k <<- 1
```
Or you can set them in a chunk of the Rmd file (for example, set k in the first chunk).
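
To see how dpi and k interact, it may help to work out the pixel size of the resulting png, since knitr figure sizes are given in inches. This is only a sketch: `figure_pixels` is a hypothetical helper, and it ignores other settings that can affect the final size, such as fig.retina.

```r
# Sketch only: pixel dimensions of a png figure for given chunk options.
# Figure sizes are in inches, so pixels = inches * dpi.
figure_pixels <- function(fig.width, fig.height, dpi, k = 1) {
  c(width = fig.width * k * dpi, height = fig.height * k * dpi)
}

figure_pixels(8, 6, dpi = 96)    # e.g. settings for the html output
figure_pixels(8, 6, dpi = 300)   # e.g. settings for the docx/odt output
```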
Answered 2024-05-04T11:37:39+08:00
