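- Is there an R function to find the value below a specific value?

Is there an R function to find the value in the row directly below a specific value?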


COl1    COl2
James   Age
James   23
Andrew  Age
Andrew  24



I need another column:


COl1    COl2  COl3
James   Age   23
James   23    23
Andrew  Age   24
Andrew  24    24




Using dplyr:

 library(dplyr)

 df %>%
   mutate_if(is.factor, as.character) %>%
   mutate(COL3 = ifelse(COl2 == "Age", lead(COl2), COl2))

     COl1 COl2 COL3
 1  James  Age   23
 2  James   23   23
 3 Andrew  Age   24
 4 Andrew   24   24
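
The same look-ahead idea can be sketched in base R, without dplyr. This is only a minimal sketch: it assumes character columns and that every "Age" row is immediately followed by its value row. The helper vector `next_val` is a name introduced here for illustration.

```r
# Sample data from the question, built with character columns (an assumption)
df <- data.frame(
  COl1 = c("James", "James", "Andrew", "Andrew"),
  COl2 = c("Age", "23", "Age", "24"),
  stringsAsFactors = FALSE
)

# Shift COl2 up by one row and pad the end with NA,
# which mirrors what dplyr::lead() does by default
next_val <- c(df$COl2[-1], NA)

# On "Age" rows take the following row's value; otherwise keep the row's own value
df$COL3 <- ifelse(df$COl2 == "Age", next_val, df$COl2)
df
```

If an "Age" row happens to be the last row, `COL3` is `NA` there, since there is no following row to read from.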



Using base R, we can do the following and then drop the columns we do not need:

 df$COL3 <- expand.grid(df[which(df$COl2 == "Age") + 1, ])

 df

     COl1 COl2 COL3.COl1 COL3.COl2
 1  James  Age     James        23
 2  James   23    Andrew        23
 3 Andrew  Age     James        24
 4 Andrew   24    Andrew        24



Subset the data frame on COl2 and join the result back to the original data frame.

Base

merge(df, subset(df, COl2 != "Age"), by = "COl1")

dplyr

library(dplyr)

df %>%
  left_join(df %>% filter(COl2 != "Age"), by = "COl1")

sqldf

library(sqldf)

sqldf("SELECT *
       FROM df
       LEFT JOIN (SELECT *
                  FROM df WHERE COl2 != 'Age') USING (COl1)")



Output

     COl1 COl2.x COl2.y
 1 Andrew    Age     24
 2 Andrew     24     24
 3  James    Age     23
 4  James     23     23
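
Note that `merge()` sorts the result by the key by default, which is why Andrew appears before James in the output above. If the original row order matters, a lookup with `match()` may be an alternative. The sketch below assumes character columns and exactly one non-"Age" row per name; `vals` and `COl3` are illustrative names, not part of the answer above.

```r
# Sample data from the question, built with character columns (an assumption)
df <- data.frame(
  COl1 = c("James", "James", "Andrew", "Andrew"),
  COl2 = c("Age", "23", "Age", "24"),
  stringsAsFactors = FALSE
)

# Keep only the rows that carry an actual value (one per name, by assumption)
vals <- subset(df, COl2 != "Age")

# For every row of df, look up the value row with the same name;
# match() returns positions in vals, so the original row order is kept
df$COl3 <- vals$COl2[match(df$COl1, vals$COl1)]
df
```

Because `match()` returns only the first hit, a name with several value rows would silently get the first of them.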



Data

df <- structure(list(
  COl1 = structure(c(2L, 2L, 1L, 1L), .Label = c("Andrew", "James"), class = "factor"),
  COl2 = structure(c(3L, 1L, 3L, 2L), .Label = c("23", "24", "Age"), class = "factor")
), class = "data.frame", row.names = c(NA, -4L))



One way to do this with dplyr is to create groups using cumsum and then, within each group, select the COl2 value that follows "Age".

library(dplyr)

df %>%
  group_by(group = cumsum(COl2 == "Age")) %>%
  mutate(Col3 = COl2[which.max(COl2 == "Age") + 1]) %>%
  ungroup() %>%
  select(-group)

 #  COl1   COl2  Col3
 #  <chr>  <chr> <chr>
 #1 James  Age   23
 #2 James  23    23
 #3 Andrew Age   24
 #4 Andrew 24    24
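
To see why the cumsum grouping works, it can help to look at the group ids on their own. The following base R sketch uses a plain character vector with the values from the question; `group` is simply the running count of "Age" rows seen so far, so each "Age" row starts a new group.

```r
# COl2 values from the question as a plain character vector
COl2 <- c("Age", "23", "Age", "24")

# TRUE counts as 1, so the running sum increases by one at every "Age" row
group <- cumsum(COl2 == "Age")
group
# [1] 1 1 2 2

# Each group holds an "Age" marker followed by its value
split(COl2, group)
```

Rows that come before the first "Age" would all fall into group 0, so this relies on the data starting with an "Age" row.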



Since each group starts at an "Age" row, we can simply select the second value in each group:

library(dplyr)

df %>%
  group_by(group = cumsum(COl2 == "Age")) %>%
  mutate(Col3 = COl2[2L])

Or using base R ave:

with(df, ave(COl2, cumsum(COl2 == "Age"), FUN = function(x) x[2L]))

#[1] "23" "23" "24" "24"



A solution using sqldf, joining the data frame df to itself under the specified constraint:

library(sqldf)

result <- sqldf("SELECT df_origin.*, df_age.Col2 as Col3 FROM
                 df df_origin join
                 (SELECT Col1, Col2, cast(Col2 as int) as Col2Int FROM df WHERE Col2Int > 0) df_age
                 on (df_origin.Col1 = df_age.Col1)")



Using dplyr/tidyr, once more [1]:

library(tidyverse)

dat %>%
  mutate(COl3 = na_if(COl2, "Age")) %>%
  fill(COl3, .direction = "up")



Data:

#dat <- read.table(
#  text = "COl1 COl2
#  James Age
#  James 23
#  Andrew Age
#  Andrew 24",
#  header = TRUE,
#  stringsAsFactors = FALSE
#)



Output:

#    COl1 COl2 COl3
#1  James  Age   23
#2  James   23   23
#3 Andrew  Age   24
#4 Andrew   24   24




[1] Only correct if !any(is.na(dat$COl2)), i.e. if COl2 contains no missing values to begin with.
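
The reason missing values matter for the fill approach can be sketched in base R. The helper `fill_up()` below is a hypothetical stand-in written for this illustration (it is not tidyr's implementation); it copies the next non-missing value upward, which is the behaviour the approach relies on.

```r
# Hypothetical helper: replace each NA with the next non-NA value below it
fill_up <- function(x) {
  # Walk from the second-to-last element up to the first
  for (i in rev(seq_along(x)[-length(x)])) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

# Clean input: "Age" markers become NA and are filled from the row below
clean <- c("Age", "23", "Age", "24")
clean[clean == "Age"] <- NA
fill_up(clean)
# [1] "23" "23" "24" "24"

# Input with a genuinely missing value in row 2: it is filled as well,
# so James wrongly ends up with Andrew's value
dirty <- c("Age", NA, "Age", "24")
dirty[!is.na(dirty) & dirty == "Age"] <- NA
fill_up(dirty)
# [1] "24" "24" "24" "24"
```

Once "Age" has been turned into NA, the fill step cannot tell a marker from a value that was missing all along, which is why the input should be checked first.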

In base R:

df <- read.table(text = "COl1 COl2
James Age
James 23
Andrew Age
Andrew 24", header = TRUE)



transform(df, COl3 = ave(COl2, COl1, FUN = function(x) tail(x, 1)))

#     COl1 COl2 COl3
# 1  James  Age   23
# 2  James   23   23
# 3 Andrew  Age   24
# 4 Andrew   24   24
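
The `tail(x, 1)` version depends on the value row being the last row for each name. If that ordering cannot be guaranteed, one possible variation is to pick the first entry that is not "Age" instead. The sketch below uses a reordered copy of the data (an assumption made to show the difference); `by_tail` and `by_value` are illustrative names.

```r
# Same data as above, but James's value row now comes before his "Age" row
df <- data.frame(
  COl1 = c("James", "James", "Andrew", "Andrew"),
  COl2 = c("23", "Age", "Age", "24"),
  stringsAsFactors = FALSE
)

# Taking the last row per name picks up "Age" for James
by_tail <- ave(df$COl2, df$COl1, FUN = function(x) tail(x, 1))
by_tail
# [1] "Age" "Age" "24"  "24"

# Taking the first non-"Age" entry per name does not depend on row order
by_value <- ave(df$COl2, df$COl1, FUN = function(x) x[x != "Age"][1])
by_value
# [1] "23" "23" "24" "24"
```

A name with no value row at all gets `NA` from the second version, which is easier to spot than a stray "Age".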



...