Change and save only one row in a file

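I wonder whether there is something in R that lets me update a file instead of saving all of the data again.

Maybe something like sqldf::read.csv.sql, but for saving.

OK.

Suppose I have the iris data stored as a .csv: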

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa

But I have realized that the second flower is actually virginica, so I would like to change the second row:

2          4.9         3.0          1.4         0.2  virginica

I know that I can read the file, change Species, and then save it again, but the more rows the file has (e.g. > 10^6), the less efficient this approach becomes.
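For reference, here is a minimal sketch of that read-modify-write approach. The temporary file path and the variable names are illustrative assumptions, not part of the original post.

```r
# Hypothetical file location, used only for this illustration.
csv_path <- file.path(tempdir(), "iris.csv")
write.csv(iris, csv_path, row.names = FALSE)

# Read every row, change a single cell, then write every row back.
flowers <- read.csv(csv_path, stringsAsFactors = FALSE)
flowers$Species[2] <- "virginica"
write.csv(flowers, csv_path, row.names = FALSE)

# Only row 2 differs, yet the whole file was rewritten.
updated <- read.csv(csv_path, stringsAsFactors = FALSE)
updated[1:3, ]
```

The cost of both the read and the write grows with the total number of rows, regardless of how few rows actually change.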

1 Answer

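In general, R is really not suited to in-place file editing, and I know of no (currently available) tool that supports it in any context. Even unixy tools like sed edit quickly but are still not technically working "in-place" (even if they hide how they do it). (There may be some that do, but probably none as easily accessible as you would like.)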
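To illustrate what such a hidden rewrite looks like, here is a rough sketch in R: the file is streamed in chunks into a temporary copy, one line is swapped on the way, and the copy then replaces the original. The helper `replace_line()` and its arguments are hypothetical names made up for this example. It does no error handling, and it assumes `file.rename()` may overwrite the target, which can depend on the platform.

```r
# Hypothetical helper: replace one line of a text file by streaming it
# through a temporary copy, roughly what `sed -i` does behind the scenes.
replace_line <- function(path, line_no, new_text, chunk_size = 10000L) {
  tmp <- tempfile(tmpdir = dirname(path))  # same directory, so rename stays on one file system
  src <- file(path, open = "r")
  dst <- file(tmp, open = "w")
  seen <- 0L
  repeat {
    chunk <- readLines(src, n = chunk_size)
    if (length(chunk) == 0L) break
    idx <- line_no - seen                  # position of the target line inside this chunk
    if (idx >= 1L && idx <= length(chunk)) chunk[idx] <- new_text
    writeLines(chunk, dst)
    seen <- seen + length(chunk)
  }
  close(src)
  close(dst)
  # Swap the rewritten copy into place; the original bytes were never edited.
  file.rename(tmp, path)
}

csv_path <- file.path(tempdir(), "iris_stream.csv")
write.csv(iris, csv_path, row.names = FALSE, quote = FALSE)
# Line 1 is the header, so data row 2 is line 3 of the file.
replace_line(csv_path, 3L, "4.9,3,1.4,0.2,virginica")
readLines(csv_path, n = 3L)
```

This keeps memory use bounded by the chunk size, but every byte of the file is still read and written once.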

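There is one notable exception: a file format that is made for in-place editing (well, interaction). It includes full-fledged in-place insert, filter, replace, and delete operations, and in most cases it performs them without needing to grow the file. That format is SQLite.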

For example,
```
library(DBI)
# library(RSQLite) # don't need to load it, just need to have it available
fname <- "./iris.sqlite3"
con <- dbConnect(RSQLite::SQLite(), fname)
file.info(fname)$size
# [1] 0
dbWriteTable(con, "iris", iris)
# [1] TRUE
file.info(fname)$size
# [1] 16384
dbGetQuery(con, "select * from iris where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          4.7         3.2          1.6         0.2  setosa
file.info(fname)$size
# [1] 16384
dbExecute(con, "update iris set [Species]='virginica' where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
# [1] 1
dbGetQuery(con, "select * from iris where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
#   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
# 1          4.7         3.2          1.6         0.2 virginica
dbDisconnect(con)
file.info(fname)$size
# [1] 16384
```
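The question asks about changing the second row specifically. One way to sketch that is to address the row through SQLite's implicit `rowid` and to pass the values as bound parameters. This assumes the table was freshly written with `dbWriteTable()`, so that `rowid` follows the row order of the data frame. The scratch file name is made up for this example.

```r
library(DBI)

# Hypothetical scratch database, so the file from the example above is left alone.
fname <- tempfile(fileext = ".sqlite3")
con <- dbConnect(RSQLite::SQLite(), fname)
dbWriteTable(con, "iris", iris)

# rowid is SQLite's implicit row key; for a freshly written table it follows
# insertion order, so rowid 2 is the second row of the data frame.
changed <- dbExecute(
  con,
  "update iris set [Species] = ? where rowid = ?",
  params = list("virginica", 2L)
)
changed  # number of rows affected

second_row <- dbGetQuery(con, "select rowid, * from iris where rowid = 2")
second_row
dbDisconnect(con)
```

If rows are later deleted and re-inserted, `rowid` no longer matches the original row order, so an explicit key column is the safer choice for anything long-lived.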
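Advantages
- It is cross-platform. It is more widespread than most people realize, being a required internal component of the Firefox browser and the Android operating system. (And of many others.)
- Drivers exist for most programming languages, including R, python, ruby, and far too many others to list here.
- There is really no practical limit on the amount of data that can be stored in a single SQLite file. It theoretically supports up to 140TB (https://www.sqlite.org/whentouse.html; the limit was raised to about 281TB in SQLite 3.33.0), but if your data gets that large, there are many (justifiable) arguments for a different solution.
- Pulling data is built on the SQL standard, and although it is not 100% compliant, it is pretty darn close. Query time/performance depends on the size of your query, but it is generally very fast (ref: Will SQLite performance degrade if the database size is greater than 2 gigabytes?)
- In fact, it can be faster than individual file operations.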
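Disadvantages
- The file size carries some "overhead". Notably, iris takes less than 7K in memory (see object.size(iris)), but the file size starts at 16K. With larger data, the gap (file size versus actual data) shrinks in relative terms. (I did the same thing with ggplot2::diamonds; the object is 3456376 bytes and the file size is 3780608, less than 10% larger.)
- The file size grows whenever SQLite deems it necessary. This depends on many factors that are outside the scope of R and of this question/answer.
- If you delete a lot of data, the file does not shrink right away to match ... see change sqlite file size after "DELETE FROM table" (hint: vacuum).
- Many tools can import data from this file format easily/immediately, but Excel and Access are notably missing. It is doable with SQLite-ODBC, but it takes a bit of elbow grease. (I am fine with that, but not every user will do it, and some enterprise networks make this step difficult or specifically forbid it.)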
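As a small sketch of the vacuum hint above: the example below writes a throwaway table, deletes most of it, and compares the file size before and after `vacuum`. The table name, the row count, and the scratch file are assumptions made for the demonstration. The exact sizes depend on SQLite's page size and settings, so none are shown.

```r
library(DBI)

fname <- tempfile(fileext = ".sqlite3")   # hypothetical scratch file
con <- dbConnect(RSQLite::SQLite(), fname)

# A table large enough that deleting rows frees many pages.
big <- data.frame(id = seq_len(50000L), value = rnorm(50000L))
dbWriteTable(con, "big", big)
size_full <- file.info(fname)$size

dbExecute(con, "delete from big where id > 100")
size_after_delete <- file.info(fname)$size   # freed pages are typically kept for reuse

dbExecute(con, "vacuum")
size_after_vacuum <- file.info(fname)$size   # file rebuilt without the free pages

dbDisconnect(con)
c(full = size_full, deleted = size_after_delete, vacuumed = size_after_vacuum)
```

Note that `vacuum` rebuilds the database file, so on a large file it is itself a full rewrite and is best run occasionally rather than after every delete.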
SQLite-file-as-CSV

If you want to import everything, you can treat it just like a file when importing:
```
con <- dbConnect(RSQLite::SQLite(), fname)
iris2 <- dbGetQuery(con, "select * from iris")
dbDisconnect(con)
```
compared with
```
iris2 <- read.csv("iris.csv", stringsAsFactors = FALSE)
```
If you want to get fancy:
```
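# Proof-of-concept reader: return one table of an SQLite file as a data.frame.
import_sqlite <- function(fname, tablename = NA) {
  if (length(tablename) > 1L) {
    warning("'tablename' has length > 1 and only the first element will be used")
    tablename <- tablename[[1L]]
  }
  con <- DBI::dbConnect(RSQLite::SQLite(), fname)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  available_tables <- DBI::dbListTables(con)
  if (length(available_tables) == 0L) {
    stop("no tables found")
  } else if (is.na(tablename)) {
    # No table requested: fall back to the only table, if there is exactly one.
    if (length(available_tables) == 1L) {
      tablename <- available_tables
    } else {
      stop("several tables found; set 'tablename' to one of: ",
           paste(available_tables, collapse = ", "))
    }
  }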
  if (tablename %in% available_tables) {
    tablename <- DBI::dbQuoteIdentifier(con, tablename)
    qry <- sprintf("select * from %s", tablename)
    out <- tryCatch(list(data = DBI::dbGetQuery(con, DBI::SQL(qry)),
                         err = NULL),
                    error = function(e) list(data = NULL, err = e))
    if (! is.null(out$err)) {
      stop("[sqlite error] ", out$err$message)
    } else {
      return(out$data)
    }    
  } else {
    stop(sprintf("table %s not found", DBI::dbQuoteIdentifier(con, tablename)))
  }
}
head(import_sqlite("iris.sqlite3"))
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
```
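(I offer this function as nothing more than a proof of concept that you can interact with a single file as if it were a CSV. There are some safeguards in there, but it is really just a hack for this question.)
Answered on 2024-05-04T17:19:42+08:00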
