Get the longest consecutive numbers per month in R, sum them and store them


My first question on Stack Overflow :) I hope you can help me.

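I am trying to find, for every month in a dataset, the longest run of consecutive rainy days, sum the rain over that run, and store both the number of days the rain lasted and the rain total in a matrix. I managed to get the number of rainy days and store it, but one scenario with 283 points takes 2.5 hours to run (and 12 models x 4 scenarios are still to follow :). I read somewhere that reading from and writing to matrices is inefficient, so my guess is that the operation could be made more efficient.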

This link on longest consecutive runs already helped, but my problem goes one step further.

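For this question it is best to use 2 years of daily rainfall values, which I link to dates in order to find the longest run of consecutive rainy days for each month and year. I write the output to a matrix. This is then done for 10 points in the basin.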
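For a single month, the core of this task is a run-length problem. Below is a minimal sketch using base R's `rle()`, assuming `precip` holds one month of daily values in date order with no `NA`s, and that a "rainy" day means a value above 0. `longest_wet_spell` is a hypothetical helper name, not something from the question; ties go to the first longest run.

```r
# Hypothetical helper: length and rain total of the longest run of rainy days.
# Assumes `precip` is one month of daily values, in date order, without NAs,
# and that a rainy day is one with precip > 0.
longest_wet_spell <- function(precip) {
  runs <- rle(precip > 0)                 # runs of TRUE (wet) and FALSE (dry) days
  if (!any(runs$values)) {
    return(c(days = 0, total = 0))        # no rainy day at all
  }
  ends   <- cumsum(runs$lengths)          # index of the last day of each run
  starts <- ends - runs$lengths + 1       # index of the first day of each run
  wet    <- which(runs$values)            # positions of the wet runs
  best   <- wet[which.max(runs$lengths[wet])]  # first of the longest wet runs
  c(days  = runs$lengths[best],
    total = sum(precip[starts[best]:ends[best]]))
}

longest_wet_spell(c(0, 1, 2, 0, 3, 4, 5, 0))
# returns c(days = 3, total = 12)
```

`rle()` replaces the `dayCounter`/`lastRowPrecipitation` bookkeeping: it reports the length of every wet and dry stretch in one call, so only the wet runs need to be compared.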

The function that gets the consecutive numbers and writes them to the matrix:

WriteRainyDaysCountToMatrix <- function(myDataFrame, myDates, mymatrix, i)
{
  monthsAmount <- 24
  for (monthNumber in 1:monthsAmount) {
    year <- toString(myDates[monthNumber, 2])
    # zero-pad the month so it matches substr(rowDate, 5, 6), e.g. 1 -> "01"
    month <- sprintf("%02d", as.integer(as.character(myDates[monthNumber, 1])))
    dayCounter <- 0               # length of the current run of rainy days
    precipitationMax <- 0         # longest run found so far in this month
    lastRowPrecipitation <- FALSE # was the previous day of this month rainy?

    # scan all rows and keep only those belonging to the current month
    for (rowNumber in 1:nrow(myDataFrame)) {
      rowDate <- myDataFrame[rowNumber, 1]
      rowYear <- substr(rowDate, 1, 4)
      rowMonth <- substr(rowDate, 5, 6)

      if (rowYear == year && rowMonth == month) {
        rowPrecipitation <- myDataFrame[rowNumber, 2]

        if (rowPrecipitation > 0) {
          # rainy day: extend the current run
          dayCounter <- dayCounter + 1
          lastRowPrecipitation <- TRUE
        } else {
          # dry day: the run has ended, keep it if it is the longest so far
          if (lastRowPrecipitation && dayCounter > precipitationMax) {
            precipitationMax <- dayCounter
          }
          dayCounter <- 0
          lastRowPrecipitation <- FALSE
        }
      }
    }
    # a run that lasts until the last day of the month is never closed
    # by a dry day, so it has to be compared here as well
    if (lastRowPrecipitation && dayCounter > precipitationMax) {
      precipitationMax <- dayCounter
    }
    mymatrix[monthNumber, i] <- precipitationMax
  }
  return(mymatrix)
}

An empty matrix to store the values is defined here (24 rows, one per month, and 10 columns, one per point):

pmatrix_hist <- matrix(data=NA, nrow=24, ncol=10, dimnames=list(c(1:24), c(1:10)))

The dates used as input (a text file), myDates:

01 1981,02 1981,03 1981,04 1981,05 1981,06 1981,07 1981,08 1981,09 1981,10 1981,11 1981,12 1981,01 1982,02 1982,03 1982,04 1982,05 1982,06 1982,07 1982,08 1982,09 1982,10 1982,11 1982,12 1982, etc.

Time_step_hist comes from a text file containing dates spanning more than 2 years, in this format:

19810101,19810102,19810103,19810104,19810105, etc.
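Because myDates only lists the months that occur in Time_step_hist, it could be derived from the daily codes instead of being maintained by hand, which also rules out typos in the month list. A sketch, assuming the codes are YYYYMMDD values (numeric or character); `time_step` is a hypothetical stand-in for the first column of Time_step_hist, filled with illustrative values:

```r
# Build the month/year table (the role of myDates) from the daily date codes.
# `time_step` is a hypothetical stand-in for the first column of Time_step_hist.
time_step <- c(19810101, 19810102, 19810131, 19810201, 19820315)

codes  <- as.character(time_step)      # substr() works on text
yyyymm <- unique(substr(codes, 1, 6))  # one entry per month, in order of appearance
myDates <- data.frame(
  month = substr(yyyymm, 5, 6),        # "01", "02", ... (zero-padded)
  year  = substr(yyyymm, 1, 4),        # "1981", "1982", ...
  stringsAsFactors = FALSE
)
myDates
```

Keeping the month as a zero-padded string means it can be compared directly with `substr(rowDate, 5, 6)`.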

Then a for loop does the work for me:

for (i in 1:10) {
   # loop over the 10 points: read the precipitation file of point i
   Prec_hist = read.table(paste(P_read_table_hist$V1[i]), header = FALSE)
   # then put date and rain together
   data_Prec_Hist <- data.frame(Time_step_hist[1:7305, ], Prec_hist[1:7305, ])
   # call the function to get the counts and write them to the matrix
   pmatrix_hist <- WriteRainyDaysCountToMatrix(data_Prec_Hist, Dates_hist, pmatrix_hist, i)
}

I don't know how to share the rain data I am using, but a simple list of 0s and 1s can be used instead.

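Right now only the number of days is stored in the matrix; a second matrix holding the rain sum over those consecutive days would be fine.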
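Since the question also asks for the rain total over the longest spell (the second matrix mentioned above), here is a sketch that computes both values for every month of one point in a single pass over the runs, then writes them into column `i` of two matrices. Assumptions: `dates` is a sorted `Date` vector, `precip` has no `NA`s, a wet day has `precip > 0`, and ties go to the earliest spell. `monthly_spells`, `days_mat` and `sum_mat` are hypothetical names. Rows follow the order in which months appear in the data, which must match the row order of the target matrices.

```r
# Hypothetical helper: longest wet spell (days) and its rain total for every month.
monthly_spells <- function(dates, precip) {
  month_key <- format(dates, "%Y-%m")
  # consecutive days sharing both the month and the wet/dry state form one run,
  # so a spell never crosses a month boundary
  runs   <- rle(paste(month_key, precip > 0))
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  csum   <- c(0, cumsum(precip))
  totals <- csum[ends + 1] - csum[starts]   # rain summed over each run
  run_month <- substr(runs$values, 1, 7)
  run_wet   <- endsWith(runs$values, "TRUE")

  months <- unique(month_key)
  out <- matrix(0, nrow = length(months), ncol = 2,
                dimnames = list(months, c("days", "total")))
  for (m in months) {                       # loops over months and runs, not days
    idx <- which(run_wet & run_month == m)
    if (length(idx) > 0) {                  # months without rain stay at 0
      best <- idx[which.max(runs$lengths[idx])]
      out[m, ] <- c(runs$lengths[best], totals[best])
    }
  }
  out
}

# Example: 5 days in January 1981 and 3 days in February 1981 for one point
dates  <- as.Date(c(paste0("1981-01-0", 1:5), paste0("1981-02-0", 1:3)))
precip <- c(0, 2, 1, 0, 4,   5, 0, 1)
res <- monthly_spells(dates, precip)

days_mat <- matrix(NA, nrow = 2, ncol = 10)   # rows = months, columns = points
sum_mat  <- matrix(NA, nrow = 2, ncol = 10)
i <- 1
days_mat[, i] <- res[, "days"]
sum_mat[, i]  <- res[, "total"]
```

The cumulative sum turns each run total into one subtraction, so the rain sums come almost for free once the run boundaries are known.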

Where are the biggest possible performance gains?

Thanks a lot in advance!

1 Answer

Welcome! Your intuition is right: this kind of operation should be measured in seconds, not hours.

Your biggest gains will come from minimizing loops and using vectorized functions.
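To illustrate what "vectorized" means for the question's function: the inner `for (rowNumber ...)` loop extracts year and month one row at a time, and does so 24 times over the whole data frame. `substr()` and `==` already work on entire vectors, so one month's values can be pulled out in a single indexing step. A sketch with made-up data; `codes` and `rain` are hypothetical stand-ins for the two columns of `data_Prec_Hist`:

```r
codes <- c("19810130", "19810131", "19810201", "19810202", "19810203")
rain  <- c(1, 0, 3, 2, 0)

# Row by row (the pattern in the question): one comparison per loop iteration
feb_loop <- c()
for (r in seq_along(codes)) {
  if (substr(codes[r], 1, 6) == "198102") feb_loop <- c(feb_loop, rain[r])
}

# Vectorized: substr() and == run over the whole column at once,
# and the logical vector selects the matching rows in one step
month_of_row <- substr(codes, 1, 6)   # computed once, reusable for all months
feb_vec <- rain[month_of_row == "198102"]

# split() goes one step further and groups every month in a single call
by_month <- split(rain, month_of_row)

feb_vec
# [1] 3 2 0
```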

A good start is to convert the strings to dates with as.Date and to use subsetting to simplify the loops.

Instead of all those nested loops going through the years and months, consider as.Date('19810101', "%Y%m%d") to convert your strings to dates.
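One caveat when applying this to the question's data: if Time_step_hist was read with read.table(), codes such as 19810101 are most likely numbers rather than strings, and as.Date() interprets a number as a count of days since an origin, not as text to be parsed with a format. Converting to character first avoids that. A sketch, assuming the codes sit in a numeric vector (`codes` is a hypothetical name):

```r
codes <- c(19810101, 19810102, 19820228)   # numeric, as read.table() would return them
dates <- as.Date(as.character(codes), format = "%Y%m%d")

dates
# [1] "1981-01-01" "1981-01-02" "1982-02-28"
format(dates, "%Y-%m")                     # a year-month key for grouping
# [1] "1981-01" "1981-01" "1982-02"
as.numeric(format(dates, "%m"))            # the month as a number
# [1] 1 1 2
```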

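Here is what the modified structure might look like. I had to guess that your "myDataFrame" has one column of dates and one column of precipitation values.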

set.seed(42) # for repeatability
# Dummy data
# some dates
dates <- seq(as.Date("19810101", "%Y%m%d"), as.Date("19821231", "%Y%m%d"), by = "day")
# random precipitation
precip <- sample(0:3, length(dates), replace = TRUE, prob = c(.7, .1, .1, .1))
myDataFrame <- data.frame(dates, precip)

#
# real code
#

# Create month and year columns
myDataFrame$month <- as.numeric(format(myDataFrame$dates, '%m'))
myDataFrame$year <- as.numeric(format(myDataFrame$dates, '%Y'))

# order your data frame by date (assign the result, otherwise it is only printed)
myDataFrame <- myDataFrame[order(myDataFrame$dates), ]

# create lists of the months and years to drive the loops later
yr_list <- levels(factor(myDataFrame$year))
mo_list <- levels(factor(myDataFrame$month))

# A smoother loop structure will look like this, relying on subsetting
for (y in yr_list) {
  suby <- subset(myDataFrame, myDataFrame[, "year"] == y)
  for (m in mo_list) {
    subm <- subset(suby, suby[, "month"] == m)
    for (d in seq_len(nrow(subm))) {  # one iteration per day (row) of the month
      # here you can run your count for each month, and then write the record
      # to a data structure of your choosing
    }
  }
}
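The count inside that loop does not need a day-by-day inner loop either. Below is a sketch of the same year/month structure with the count filled in. To run on its own it rebuilds the dummy data the same way (assuming columns `dates` and `precip`, sorted by date), and it stores the results in a hypothetical `longest_run` matrix with one row per month and one column per year:

```r
# Dummy data, built as above
set.seed(42)
dates  <- seq(as.Date("1981-01-01"), as.Date("1982-12-31"), by = "day")
precip <- sample(0:3, length(dates), replace = TRUE, prob = c(.7, .1, .1, .1))
myDataFrame <- data.frame(dates, precip)
myDataFrame$month <- as.numeric(format(myDataFrame$dates, "%m"))
myDataFrame$year  <- as.numeric(format(myDataFrame$dates, "%Y"))

yr_list <- levels(factor(myDataFrame$year))
mo_list <- levels(factor(myDataFrame$month))

# hypothetical result matrix: one row per month, one column per year
longest_run <- matrix(0, nrow = length(mo_list), ncol = length(yr_list),
                      dimnames = list(mo_list, yr_list))

for (y in yr_list) {
  suby <- subset(myDataFrame, year == as.numeric(y))
  for (m in mo_list) {
    subm <- subset(suby, month == as.numeric(m))
    runs <- rle(subm$precip > 0)                  # wet (TRUE) and dry (FALSE) runs
    wet_lengths <- runs$lengths[runs$values]      # lengths of the wet runs only
    longest_run[m, y] <- max(c(0, wet_lengths))   # 0 if the month had no rain
  }
}
longest_run
```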

Some things that will help in the future: 1) show what your data source looks like, 2) show a minimal example of the output you want.

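Some general tips: working with a subset of the dates you bring into the script can help you iterate faster, and look up system.time() and proc.time() to time your code and see the improvements you make.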
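A sketch of both timing functions, comparing a row-by-row loop with its vectorized equivalent on 7305 made-up daily values (the length used in the question, 1981 to 2000). The data and names are hypothetical; only the relative timings are of interest:

```r
codes <- format(seq(as.Date("1981-01-01"), by = "day", length.out = 7305), "%Y%m%d")
rain  <- rep(c(0, 1, 2), length.out = length(codes))

# system.time() wraps an expression and reports user, system and elapsed time
t_loop <- system.time({
  total_loop <- 0
  for (r in seq_along(codes)) {
    if (substr(codes[r], 1, 6) == "198102") total_loop <- total_loop + rain[r]
  }
})

t_vec <- system.time({
  total_vec <- sum(rain[substr(codes, 1, 6) == "198102"])
})

# proc.time() gives the same kind of measurement by taking a difference manually
start <- proc.time()
invisible(sum(rain[substr(codes, 1, 6) == "198102"]))
elapsed <- proc.time() - start

print(t_loop)
print(t_vec)
```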

Answered 2024-04-30T05:36:08+08:00
