
read.csv in R reads dates differently


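I have two very similar CSV files: stock prices for two different stocks, downloaded from the same source in the same format. However, read.csv in R reads them differently.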

> tab1=read.csv(path1)
> tab2=read.csv(path2)

> head(tab1)
        Date   Open   High    Low  Close  Volume Adj.Close
1 2014-12-01 158.35 162.92 157.12 157.12 2719100  156.1488
2 2014-11-03 153.14 160.86 152.98 160.09 2243400  159.1004
3 2014-10-01 141.16 154.44 130.60 153.77 3825900  152.0036
4 2014-09-02 143.30 147.87 140.66 141.68 2592900  140.0525
5 2014-08-01 140.15 145.39 138.43 144.00 2027100  142.3459
6 2014-07-01 143.41 146.43 140.60 140.89 2131100  138.4461

> head(tab2)
       Date  Open  High   Low Close  Volume Adj.Close
1 12/1/2014 73.39 75.20 71.75 72.29 1561400  71.92211
2 11/3/2014 69.28 74.92 67.88 73.74 1421600  72.97650
3 10/1/2014 66.18 74.95 63.42 69.21 1775400  68.49341
4  9/2/2014 68.34 68.57 65.49 66.32 1249200  65.63333
5  8/1/2014 67.45 68.99 65.88 68.26 1655400  67.20743
6  7/1/2014 64.07 69.50 63.09 67.46 1733600  66.41976

If I try to use colClasses in read.csv, the dates in the second table are read incorrectly.

> tab1=read.csv(path1,colClasses=c("Date",rep("numeric",6)))
> tab2=read.csv(path2,colClasses=c("Date",rep("numeric",6)))

> head(tab1)
        Date   Open   High    Low  Close  Volume Adj.Close
1 2014-12-01 158.35 162.92 157.12 157.12 2719100  156.1488
2 2014-11-03 153.14 160.86 152.98 160.09 2243400  159.1004
3 2014-10-01 141.16 154.44 130.60 153.77 3825900  152.0036
4 2014-09-02 143.30 147.87 140.66 141.68 2592900  140.0525
5 2014-08-01 140.15 145.39 138.43 144.00 2027100  142.3459
6 2014-07-01 143.41 146.43 140.60 140.89 2131100  138.4461

> head(tab2)
        Date  Open  High   Low Close  Volume Adj.Close
1 0012-01-20 73.39 75.20 71.75 72.29 1561400  71.92211
2 0011-03-20 69.28 74.92 67.88 73.74 1421600  72.97650
3 0010-01-20 66.18 74.95 63.42 69.21 1775400  68.49341
4 0009-02-20 68.34 68.57 65.49 66.32 1249200  65.63333
5 0008-01-20 67.45 68.99 65.88 68.26 1655400  67.20743
6 0007-01-20 64.07 69.50 63.09 67.46 1733600  66.41976

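I don't know how to make this problem reproducible without attaching the .csv files, so I am attaching snapshots of the two files. Any help would be greatly appreciated.
[snapshot: tab1]

[snapshot: tab2]

Thanks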

1 Answer

  • 1

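    This can be solved by reading the dates in as a character vector and then calling strptime() (wrapped in as.Date()) inside transform(), with the month/day/year format given explicitly: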

    transform(read.csv(path2,colClasses=c('character',rep('numeric',6))),Date=as.Date(strptime(Date,'%m/%d/%Y')));
    ##         Date  Open  High   Low Close  Volume Adj.Close
    ## 1 2014-12-01 73.39 75.20 71.75 72.29 1561400  71.92211
    ## 2 2014-11-03 69.28 74.92 67.88 73.74 1421600  72.97650
    ## 3 2014-10-01 66.18 74.95 63.42 69.21 1775400  68.49341
    ## 4 2014-09-02 68.34 68.57 65.49 66.32 1249200  65.63333
    ## 5 2014-08-01 67.45 68.99 65.88 68.26 1655400  67.20743
    ## 6 2014-07-01 64.07 69.50 63.09 67.46 1733600  66.41976
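
    For reference, the wrong dates in the question come from how colClasses="Date" converts the column: read.csv calls as.Date() without a format, and as.Date() then tries "%Y-%m-%d" followed by "%Y/%m/%d". A value such as 12/1/2014 matches the second pattern as year 12, month 1, day 20 (the "20" being the first two digits of 2014), which is the 0012-01-20 shown in the question. A minimal sketch, using two values copied from the question (the variable names are illustrative only):

    ```r
    # Two date strings in the month/day/year layout used by the second file.
    x <- c("12/1/2014", "9/2/2014")

    # No format given: as.Date() falls back to "%Y/%m/%d" for these strings,
    # so "12" is taken as the year, "1" as the month and "20" as the day.
    wrong <- as.Date(x)

    # Explicit format: month/day/four-digit year.
    right <- as.Date(x, format = "%m/%d/%Y")

    # Inspect the parsed components rather than the printed form, because the
    # printing of years below 1000 can differ between platforms.
    wrong_parts <- as.POSIXlt(wrong)
    data.frame(input = x,
               wrong_year = wrong_parts$year + 1900,
               wrong_month = wrong_parts$mon + 1,
               wrong_day = wrong_parts$mday,
               right = format(right))
    ```

    So the fix is not specific to read.csv: any call that relies on the default formats will misread these strings, while supplying '%m/%d/%Y' parses them as intended.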
    

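    Edit: You can try to detect the date format dynamically based on your own assumptions, but the result will only be as reliable as those assumptions: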

    readStockData <- function(path) {
        tab <- read.csv(path,colClasses=c('character',rep('numeric',6)));
        tab$Date <- as.Date(tab$Date,if (grepl('^\\d+/\\d+/\\d+$',tab$Date[1])) '%m/%d/%Y' else '%Y-%m-%d');
        tab;
    };
    readStockData(path1);
    ##         Date   Open   High    Low  Close  Volume Adj.Close
    ## 1 2014-12-01 158.35 162.92 157.12 157.12 2719100  156.1488
    ## 2 2014-11-03 153.14 160.86 152.98 160.09 2243400  159.1004
    ## 3 2014-10-01 141.16 154.44 130.60 153.77 3825900  152.0036
    ## 4 2014-09-02 143.30 147.87 140.66 141.68 2592900  140.0525
    ## 5 2014-08-01 140.15 145.39 138.43 144.00 2027100  142.3459
    ## 6 2014-07-01 143.41 146.43 140.60 140.89 2131100  138.4461
    readStockData(path2);
    ##         Date  Open  High   Low Close  Volume Adj.Close
    ## 1 2014-12-01 73.39 75.20 71.75 72.29 1561400  71.92211
    ## 2 2014-11-03 69.28 74.92 67.88 73.74 1421600  72.97650
    ## 3 2014-10-01 66.18 74.95 63.42 69.21 1775400  68.49341
    ## 4 2014-09-02 68.34 68.57 65.49 66.32 1249200  65.63333
    ## 5 2014-08-01 67.45 68.99 65.88 68.26 1655400  67.20743
    ## 6 2014-07-01 64.07 69.50 63.09 67.46 1733600  66.41976
    

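    Above I assume that the file contains at least one record and that all records use the same Date format, so that the first Date value ( tab$Date[1] ) can be used for detection.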
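
    If you would rather check that assumption than rely on it, one possible variation is to try each candidate format on the whole column and accept a format only when every value parses. This is just a sketch: parseDatesStrict is a hypothetical helper name, the two candidate formats are the ones seen in the question, and it assumes the Date column has no missing values (any NA would make every format fail). The temporary file stands in for path2.

    ```r
    # Hypothetical helper: return the column parsed with the first candidate
    # format that converts every value, or stop if no single format fits.
    parseDatesStrict <- function(x, formats = c("%Y-%m-%d", "%m/%d/%Y")) {
        for (fmt in formats) {
            parsed <- as.Date(x, format = fmt)
            if (!anyNA(parsed)) return(parsed)
        }
        stop("no candidate format parses every date value")
    }

    # Small stand-in file with two rows in the month/day/year layout.
    tmp <- tempfile(fileext = ".csv")
    writeLines(c("Date,Open", "12/1/2014,73.39", "11/3/2014,69.28"), tmp)

    # Read Date as character first, then convert it with the strict parser.
    tab <- read.csv(tmp, colClasses = c("character", "numeric"))
    tab$Date <- parseDatesStrict(tab$Date)
    tab
    ```

    Unlike the first-value check, this stops with an error when a file mixes the two layouts, instead of silently turning some dates into NA.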
