R - How do I write a double loop?

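Specifically, I want to compare two pieces of information, "Monthly_Category" and "Ref_Category" (see the simplified example below, since my actual database is quite long):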


Months  Element_Id  Monthly_Category  Ref_Category  Expected_output
1       1           3                 NA            0
2       1           2                 NA            0
3       1           2                 NA            1
4       1           1                 NA            1
5       1           3                 3             0
1       2           6                 2             0
2       2           6                 6             1
3       2           NA                1             0
4       2           NA                6             0
5       2           1                 1             0


 

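More precisely, I want to set 1 as soon as Monthly_Category has differed from the selected Ref_Category for 2 consecutive periods; otherwise, set 0.

One loop checks the monthly category on every row, and the other checks the reference category every 5 rows.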


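In any case, you don't need an explicit for loop or an explicit if statement to get the job done.

In R, you usually don't want to write for loops directly; you're better off using functions like lapply. In this case, the dplyr package takes care of the looping implicitly.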
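As a small illustration of that point, here is a sketch in base R that computes the same comparison three ways: with an explicit for loop, with vapply (a type-checked sibling of lapply), and with plain vectorised comparison. The vectors `monthly` and `ref` are made-up toy values, not the question's data.

```r
# Hypothetical toy vectors, not taken from the question's data
monthly <- c(3, 2, 2, 1, 3)
ref     <- c(3, 3, 3, 3, 3)

# Explicit loop: compare element by element
differs_loop <- logical(length(monthly))
for (i in seq_along(monthly)) {
  differs_loop[i] <- monthly[i] != ref[i]
}

# Implicit loop with vapply: same result, no manual result bookkeeping
differs_apply <- vapply(seq_along(monthly),
                        function(i) monthly[i] != ref[i],
                        logical(1))

# Fully vectorised: the comparison operator already loops internally
differs_vec <- monthly != ref

differs_vec
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```

All three give the same logical vector; the vectorised form is usually the shortest and the easiest to drop into mutate().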


df <- tibble::tribble(
  ~Months, ~Element_Id, ~Monthly_Category, ~Ref_Category, ~Expected_output,
  1, 1, 3, NA, 0,
  2, 1, 2, NA, 0,
  3, 1, 2, NA, 1,
  4, 1, 1, NA, 1,
  5, 1, 3, 3, 0,

  1, 2, 6, 2, 0,
  2, 2, 6, 6, 1,
  3, 2, 1, 1, 0,
  4, 2, 1, 6, 0,
  5, 2, 1, 1, 0)

library(dplyr)
library(purrr)

df %>%
  # check if elements are equal
  mutate(Real_Expected_output = !map2_lgl(Monthly_Category, Ref_Category, identical)) %>%
  # sort by Element_Id and Months just in case your data is messy
  arrange(Element_Id, Months) %>%
  # For each Element_Id ...
  group_by(Element_Id) %>%
  # ... define your Expected Output
  mutate(Real_Expected_output = as.integer(lag(Real_Expected_output, default = FALSE) &
                                             lag(Real_Expected_output, 2, default = FALSE))) %>%
  ungroup()

# Months Element_Id Monthly_Category Ref_Category Expected_output Real_Expected_output
#  <dbl>      <dbl>            <dbl>        <dbl>           <dbl>                <int>
#      1          1                3           NA               0                    0
#      2          1                2           NA               0                    0
#      3          1                2           NA               1                    1
#      4          1                1           NA               1                    1
#      5          1                3            3               0                    1
#      1          2                6            2               0                    0
#      2          2                6            6               1                    0
#      3          2                1            1               0                    0
#      4          2                1            6               0                    0
#      5          2                1            1               0                    0



Real_Expected_output differs from your Expected_output, but only because I believe your expected result is inconsistent with the rule you described.
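To make the lag() step easier to follow, here is a minimal sketch on a single made-up flag vector (the name `differs` is only for illustration). It assumes dplyr is loaded, so that lag() is dplyr's version rather than stats::lag.

```r
library(dplyr)

# Hypothetical flag vector: TRUE where the two categories differ
differs <- c(TRUE, TRUE, TRUE, TRUE, FALSE)

prev1 <- lag(differs, default = FALSE)     # value one row earlier
prev2 <- lag(differs, 2, default = FALSE)  # value two rows earlier

# 1 only when both of the two preceding rows differed
flag <- as.integer(prev1 & prev2)
flag
# [1] 0 0 1 1 1
```

The `default = FALSE` argument fills the first rows, which have no predecessor, so they can never produce a 1.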

Again: no loops. You just need to make clever use of the tools the dplyr package provides: last, group_by, and mutate.


df %>%
  # sort by Element_Id and Months just in case your data is messy
  arrange(Element_Id, Months) %>%
  # For each Element_Id ...
  group_by(Element_Id) %>%
  # ... check if Monthly Category is equal to the last Ref_Category
  mutate(Real_Expected_output = !map2_lgl(Monthly_Category, last(Ref_Category), identical)) %>%
  # ... and define your Expected Output
  mutate(Real_Expected_output = as.integer(Real_Expected_output &
                                             lag(Real_Expected_output, default = FALSE))) %>%
  ungroup()

# Months Element_Id Monthly_Category Ref_Category Expected_output Real_Expected_output
#  <dbl>      <dbl>            <dbl>        <dbl>           <dbl>                <int>
#      1          1                3           NA               0                    0
#      2          1                2           NA               0                    0
#      3          1                2           NA               1                    1
#      4          1                1           NA               1                    1
#      5          1                3            3               0                    0
#      1          2                6            2               0                    0
#      2          2                6            6               1                    1
#      3          2                1            1               0                    0
#      4          2                1            6               0                    0
#      5          2                1            1               0                    0
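The key step above is that last(Ref_Category) is evaluated once per group. A minimal sketch of just that behaviour, on made-up toy data (the tibble `toy` and its columns `id`, `ref`, and `final_ref` are hypothetical names for illustration):

```r
library(dplyr)

# Hypothetical toy data: two elements, three months each
toy <- tibble::tibble(
  id  = c(1, 1, 1, 2, 2, 2),
  ref = c(NA, NA, 3, 2, 6, 1)
)

res <- toy %>%
  group_by(id) %>%
  # last(ref) is a single value per group; mutate() recycles it to every row
  mutate(final_ref = last(ref)) %>%
  ungroup()

res$final_ref
# [1] 3 3 3 1 1 1
```

This is why the data is sorted with arrange() first: last() simply takes whatever row comes last within each group.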




At this point, I suggest you create a separate function to handle your problem; it makes the code clearer.





df <- tibble::tribble(
  ~Months, ~Element_Id, ~Monthly_Category, ~Ref_Category, ~Expected_output,
  1, 1, 3, NA, 0,
  2, 1, 2, NA, 0,
  3, 1, 2, NA, 1,
  4, 1, 1, NA, 1,
  5, 1, 3, 3, 0,

  1, 2, 6, 2, 0,
  2, 2, 6, 6, 1,
  3, 2, NA, 1, 0,
  4, 2, NA, 6, 0,
  5, 2, 1, 1, 0)

library(dplyr)
library(purrr)

get_output <- function(mon, ref){
  # set here your condition
  exp <- !is.na(mon) & !map2_lgl(mon, last(ref), identical)
  # check exp and lag(exp), then convert to integer
  as.integer(exp & lag(exp, default = FALSE))
}

df %>%
  # sort by Element_Id and Months just in case your data is messy
  arrange(Element_Id, Months) %>%
  # For each Element_Id ...
  group_by(Element_Id) %>%
  # ... launch your function
  mutate(Real_Expected_output = get_output(Monthly_Category, Ref_Category)) %>%
  ungroup()

# # A tibble: 10 x 6
#    Months Element_Id Monthly_Category Ref_Category Expected_output Real_Expected_output
#     <dbl>      <dbl>            <dbl>        <dbl>           <dbl>                <int>
#  1      1          1                3           NA               0                    0
#  2      2          1                2           NA               0                    0
#  3      3          1                2           NA               1                    1
#  4      4          1                1           NA               1                    1
#  5      5          1                3            3               0                    0
#  6      1          2                6            2               0                    0
#  7      2          2                6            6               1                    1
#  8      3          2               NA            1               0                    0
#  9      4          2               NA            6               0                    0
# 10      5          2                1            1               0                    0
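In case it is not obvious why the function uses identical() together with !is.na(mon), here is a rough base-R sketch of how missing values behave in the comparison. It uses vapply instead of map2_lgl so it runs without purrr; the vectors mirror Element_Id 2 from the data above, and the names `ref_last`, `differs`, and `guarded` are only illustrative.

```r
# Comparing with == propagates NA
3 == NA_real_                  # NA
NA_real_ == NA_real_           # NA

# identical() always returns a single TRUE or FALSE
identical(3, NA_real_)         # FALSE
identical(NA_real_, NA_real_)  # TRUE
identical(3, 3)                # TRUE

# Consequence for the condition inside get_output():
mon      <- c(6, 6, NA, NA, 1)  # Monthly_Category of Element_Id 2
ref_last <- 1                   # last Ref_Category of that element

# Without a guard, a missing month counts as "different"
differs <- !vapply(mon, identical, logical(1), ref_last)
differs
# [1]  TRUE  TRUE  TRUE  TRUE FALSE

# Masking NA months keeps them from triggering the flag
guarded <- !is.na(mon) & differs
guarded
# [1]  TRUE  TRUE FALSE FALSE FALSE
```

Whether a missing Monthly_Category should count as a difference is a judgment call about your data; the guard above treats it as "no difference".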




...