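When I use the adist function in R to compute the Levenshtein alignment between pairs of strings, I get different results depending on whether I call the function once per pair or pass several pairs at once as vectors. Why is that?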
Example: the transformations for the string pairs 'knijpen' - 'kneifen', 'grijpen' - 'greifen' and 'lopen' - 'laufen':
attr(adist("knijpen", "kneifen", counts = TRUE), "trafos")
# [,1]
# [1,] "MMIMSDMM"
attr(adist("grijpen", "greifen", counts = TRUE), "trafos")
# [,1]
# [1,] "MMIMSDMM"
attr(adist("lopen", "laufen", counts = TRUE), "trafos")
# [,1]
# [1,] "MSSIMM"
These agree with the solutions I worked out by hand. However, when I pass the strings in as vectors, I get slightly different results:
dutch <- c("knijpen", "grijpen", "lopen")
german <- c("kneifen", "greifen", "laufen")
attr(adist(dutch, german, counts = TRUE), "trafos")
# [,1] [,2] [,3]
# [1,] "MMIMSDMM" "SSIMSDMM" "SSSSDMMM"
# [2,] "SSIMSDMM" "MMIMSDMM" "SSSSDMMM"
# [3,] "SSSIIMMM" "SSSIIMMM" "MSSIMMM"
The [3,3] element of this matrix should correspond to attr(adist("lopen", "laufen", counts = TRUE), "trafos") (i.e. "MSSIMM"), but it contains an additional M. Why?
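To show that the extra M is not simply another valid alignment, here is a small sanity check. It assumes the usual reading of the transformation codes: M, S and D each consume one character of the first string, while M, S and I each produce one character of the second string. The helper name trafo_consistent is my own and is not part of adist.

```r
# Hypothetical helper (not part of base R): checks whether a transformation
# string is consistent with the lengths of the two strings it aligns.
# Assumption: M = match, S = substitution, I = insertion, D = deletion.
trafo_consistent <- function(x, y, trafo) {
  ops <- strsplit(trafo, "")[[1]]
  # M, S and D each consume one character of x
  consumes_x <- sum(ops %in% c("M", "S", "D"))
  # M, S and I each produce one character of y
  produces_y <- sum(ops %in% c("M", "S", "I"))
  consumes_x == nchar(x) && produces_y == nchar(y)
}

trafo_consistent("lopen", "laufen", "MSSIMM")   # TRUE: result of the single-pair call
trafo_consistent("lopen", "laufen", "MSSIMMM")  # FALSE: [3,3] element of the matrix
```

Under that assumption, "MSSIMMM" would need a six-character first string and a seven-character second string, so it cannot describe 'lopen' - 'laufen' at all.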