How to use a loop to conditionally create values in a new variable

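Every five days I collect data on plant development, or phenology (encoded with the categorical variable 'Code'), along a transect divided into 78 contiguous segments. Every species is surveyed in every segment of the transect. This effort repeats a study carried out 100 years ago!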

I want to recode my dataset to overcome the shortcomings of the original study's coding system.

Original coding system (for the flowering stage of a plant):

K = flower bud
b1 = single flower
b2 = sparse flowers (two or three)
b3 = flowers common (more than three)
b4 = flowering ended

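The problem is that, when I want to analyze my data, these codes do not describe the context of an observation well enough. For example, the codes 'b1' and 'b2' can occur both early and late in the flowering period. This makes it hard to "line up" my observations in a standardized way.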

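A solution could be a loop, or some other efficient approach, that moves sequentially through the observations (by 'Segment', 'Species', 'Date') and recodes each one based on whether it occurred before or after a particular event (in this case, the first time 'Code' was recorded as "b3").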

For any given segment of the transect and species, the codes in the raw data might look like this:

Date    Segment Species Code
26/05/2017  1   A   K
01/06/2017  1   A   b1
06/06/2017  1   A   b1
10/06/2017  1   A   b2
14/06/2017  1   A   b2
19/06/2017  1   A   b2
23/06/2017  1   A   b3
28/06/2017  1   A   b3
03/07/2017  1   A   b2
08/07/2017  1   A   b2
14/07/2017  1   A   b1
19/07/2017  1   A   b4

Had I thought before the season about how I would use the data, I would have used a coding system like this:

K = flower bud
b1a = single flower
b2a = sparse flowers (two or three)
b3 = flowers common (more than three)
b2b = sparse flowers (two or three)
b1b = single flower
b4 = flowering ended

With these changes to the codes, the example data above would look like this:

Date    Segment Species Code
26/05/2017  1   A   K
01/06/2017  1   A   b1a
06/06/2017  1   A   b1a
10/06/2017  1   A   b2a
14/06/2017  1   A   b2a
19/06/2017  1   A   b2a
23/06/2017  1   A   b3
28/06/2017  1   A   b3
03/07/2017  1   A   b2b
08/07/2017  1   A   b2b
14/07/2017  1   A   b1b
19/07/2017  1   A   b4

In addition, I have to recode the historical dataset, so it is essential that any solution works for both.

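Note: it is very important that the switch from appending "a" to appending "b" on 'b1' or 'b2' happens after the first time 'b3' is encountered. This matters because flower abundance sometimes fluctuates during the growing season. For example: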

Date    Segment Species Code
01-Jun-17   1   A   b1
06-Jun-17   1   A   b1
10-Jun-17   1   A   b2
14-Jun-17   1   A   b2
19-Jun-17   1   A   b3
23-Jun-17   1   A   b3
28-Jun-17   1   A   b2 # appears out of the "ideal" sequence
02-Aug-17   1   A   b3
07-Aug-17   1   A   b2 # appears out of the "ideal" sequence
12-Aug-17   1   A   b3
17-Aug-17   1   A   b2
22-Aug-17   1   A   b1 # appears out of the "ideal" sequence
27-Aug-17   1   A   b2 
02-Sep-17   1   A   b1
07-Sep-17   1   A   b4

In this case, the data would look like:

Date    Segment Species Code
01-Jun-17   1   A   b1a
06-Jun-17   1   A   b1a
10-Jun-17   1   A   b2a
14-Jun-17   1   A   b2a
19-Jun-17   1   A   b3
23-Jun-17   1   A   b3
28-Jun-17   1   A   b2b
02-Aug-17   1   A   b3
07-Aug-17   1   A   b2b
12-Aug-17   1   A   b3
17-Aug-17   1   A   b2b
22-Aug-17   1   A   b1b
27-Aug-17   1   A   b2b 
02-Sep-17   1   A   b1b
07-Sep-17   1   A   b4

One final point. Because the growing season in the Arctic is short, not every flowering stage (= code) occurs for every species in every segment.

Example data:

DT <- structure(list(Date = structure(c(17312, 17318, 17323, 17327, 
17331, 17336, 17340, 17345, 17350, 17355, 17361, 17366, 17312, 
17318, 17323, 17327, 17331, 17336, 17340, 17345, 17350, 17355, 
17361, 17366, 17370, 17375, 17350, 17355, 17361, 17366, 17370, 
17312, 17318, 17323, 17327, 17331, 17336, 17340, 17345, 17350, 
17355, 17361, 17366, 17312, 17318, 17323, 17327, 17331, 17336, 
17340, 17345, 17350, 17355, 17361, 17366, 17355, 17361, 17366, 
17370, 17375, 17318, 17323, 17327, 17331, 17336, 17340, 17345, 
17380, 17385, 17390, 17395, 17400, 17405, 17411, 17416, 17318, 
17323, 17327, 17331, 17336, 17340, 17345, 17380, 17385, 17390, 
17395, 17400, 17405, 17411, 17416, 17318, 17323, 17327, 17331, 
17336, 17340, 17345, 17380, 17385, 17390, 17395, 17400, 17405, 
17411, 17416), class = "Date"), Segment = c(1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), Species = c("A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "C", 
"C", "C", "C", "C", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "C", "C", "C", "C", "C", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"
), Code = c("K", "b1", "b1", "b2", "b2", "b2", "b3", "b3", "b2", 
"b2", "b1", "b4", "b1", "b1", "b2", "b2", "b2", "b3", "b3", "b3", 
"b2", "b2", "b2", "b1", "b1", "b4", "b1", "b1", "b2", "b2", "b4", 
"b1", "b1", "b2", "b2", "b2", "b3", "b3", "b3", "b2", "b2", "b2", 
"b4", "K", "b1", "b1", "b2", "b2", "b2", "b3", "b3", "b2", "b2", 
"b2", "b4", "b3", "b3", "b2", "b1", "b4", "b1", "b1", "b2", "b2", 
"b3", "b3", "b2", "b3", "b2", "b3", "b2", "b1", "b2", "b1", "b4", 
"b1", "b1", "b2", "b2", "b3", "b3", "b2", "b3", "b2", "b3", "b2", 
"b1", "b2", "b1", "b4", "b1", "b1", "b2", "b2", "b3", "b3", "b2", 
"b3", "b2", "b3", "b2", "b1", "b2", "b1", "b4")), .Names = c("Date", 
"Segment", "Species", "Code"), row.names = c(NA, -105L), class = c("data.table", 
"data.frame"))

2 Answers

With dplyr, this can be done as follows:

library(dplyr)
DT %>% 
  group_by(Species, Segment) %>% 
  mutate(after_b3 = (cumsum(Code == "b3") > 0), 
         Code_new = case_when(Code %in% c("b1", "b2") & !after_b3 ~ paste0(Code, "a"), 
                              Code %in% c("b1", "b2") & after_b3 ~ paste0(Code, "b"), 
                              TRUE ~ Code)) 

# A tibble: 105 x 6
# Groups:   Segment, Species [9]
#          Date Segment Species  Code after_b3 Code_new
#        <date>   <dbl>   <chr> <chr>    <lgl>    <chr>
#  1 2017-05-26       1       A     K    FALSE        K
#  2 2017-06-01       1       A    b1    FALSE      b1a
#  3 2017-06-06       1       A    b1    FALSE      b1a
#  4 2017-06-10       1       A    b2    FALSE      b2a
#  5 2017-06-14       1       A    b2    FALSE      b2a
#  6 2017-06-19       1       A    b2    FALSE      b2a
#  7 2017-06-23       1       A    b3     TRUE       b3
#  8 2017-06-28       1       A    b3     TRUE       b3
#  9 2017-07-03       1       A    b2     TRUE      b2b
# 10 2017-07-08       1       A    b2     TRUE      b2b
# ... with 95 more rows

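With group_by, the code is applied to each Segment/Species combination. The after_b3 column records whether Code has already reached "b3" at that point in the group. Code_new is then determined by checking a few cases.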
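
To see why the cumsum flag copes with fluctuating abundance, here is a minimal base R sketch run on the out-of-sequence example from the question. The helper name recode_codes is made up for illustration and is not part of the answer above; it assumes the codes belong to a single Segment/Species combination and are already in date order.

```r
# Hypothetical helper (not from the original answer): recode one
# Segment/Species series whose codes are already sorted by date.
recode_codes <- function(code) {
  # TRUE from the first "b3" onward; FALSE before it (and everywhere if "b3" never occurs)
  after_b3 <- cumsum(code == "b3") > 0
  is_b12 <- code %in% c("b1", "b2")
  # append "b" after the first "b3" and "a" before it; all other codes are unchanged
  ifelse(is_b12, paste0(code, ifelse(after_b3, "b", "a")), code)
}

# Out-of-sequence example from the question
codes <- c("b1", "b1", "b2", "b2", "b3", "b3", "b2", "b3",
           "b2", "b3", "b2", "b1", "b2", "b1", "b4")
recode_codes(codes)

# A series that never reaches "b3" keeps the "a" suffix throughout
recode_codes(c("b1", "b1", "b2", "b2", "b4"))
```

Because the cumulative sum never decreases, a later drop back to "b2" or "b1" cannot flip the flag back to FALSE, which is the behaviour the question asks for.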

Answered on 2024-05-02T11:30:19+08:00

Maybe not the most efficient way, but it works (assuming I understood your question):

library(data.table)
DT <- as.data.table(DT)

# Assumes rows are already in date order within each Segment/Species combination
tmp_list <- list()
for (seg in unique(DT$Segment)) {        # e.g. seg <- 1
  for (spec in unique(DT$Species)) {     # e.g. spec <- "C"
    sub <- DT[Segment %in% seg & Species %in% spec]
    if (nrow(sub) == 0) next             # combination not present in the data
    index <- which(sub$Code == "b3")[1]  # row of the first "b3"; NA if there is none
    rows <- nrow(sub)
    if (!is.na(index)) {
      # from the first "b3" onward: b1 -> b1b, b2 -> b2b
      sub[index:rows, new_code := ifelse(Code %in% "b1", "b1b",
                                  ifelse(Code %in% "b2", "b2b", Code))]
      # up to the first "b3": b1 -> b1a, b2 -> b2a (the "b3" row itself stays "b3")
      sub[1:index, new_code := ifelse(Code %in% "b1", "b1a",
                               ifelse(Code %in% "b2", "b2a", Code))]
    } else {
      # no "b3" in this combination: every row counts as "before"
      sub[, new_code := ifelse(Code %in% "b1", "b1a",
                        ifelse(Code %in% "b2", "b2a", Code))]
    }
    tmp_list[[paste0(seg, "_", spec)]] <- sub
  }
}
final <- rbindlist(tmp_list)

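So, for each segment and species, I find the first b3; in the rows after it, I change b1 and b2 to b1b and b2b respectively. In the rows before the first b3, I change b1 and b2 to b1a and b2a respectively. The if statement handles the case where a particular Segment/Species combination has no b3.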
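
The same before/after logic can probably also be written without an explicit loop, using data.table's by-group evaluation. The following is only a sketch on a small made-up table (toy is not the question's data), with one group that reaches "b3" and one that never does; the column names after_b3 and new_code are chosen for illustration.

```r
library(data.table)

# Small made-up sample (not the question's full data): two groups,
# one that reaches "b3" and one that never does.
toy <- data.table(
  Date    = as.Date("2017-06-01") + c(0, 5, 10, 15, 20, 0, 5, 10),
  Segment = c(1, 1, 1, 1, 1, 2, 2, 2),
  Species = c("A", "A", "A", "A", "A", "C", "C", "C"),
  Code    = c("b1", "b2", "b3", "b2", "b4", "b1", "b2", "b4")
)

# Sort so that "first b3" means first by date within each group
setorder(toy, Segment, Species, Date)

# Flag rows at or after the first "b3" within each Segment/Species group
toy[, after_b3 := cumsum(Code == "b3") > 0, by = .(Segment, Species)]

# Start from the original code, then add a suffix only to the b1/b2 rows
toy[, new_code := Code]
toy[Code %in% c("b1", "b2"), new_code := paste0(Code, ifelse(after_b3, "b", "a"))]
toy[]
```

Grouping with by avoids building and re-binding a list of subsets, and combinations that do not occur in the data simply never appear as groups.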

Answered on 2024-05-02T11:30:19+08:00
