pandas - python - Efficiently combining multiple pandas Series

I know that combine_first can be used to merge two Series:


import pandas as pd

series1 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
series2 = pd.Series([1,2,3,4,5],index=['f','g','h','i','j'])
series3 = pd.Series([1,2,3,4,5],index=['k','l','m','n','o'])

Combine1 = series1.combine_first(series2)
print(Combine1)



Output:

a    1.0
b    2.0
c    3.0
d    4.0
e    5.0
f    1.0
g    2.0
h    3.0
i    4.0
j    5.0
dtype: float64



What if I need to merge three or more Series?

I know that print(series1 + series2 + series3) yields:


a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
f   NaN
...
dtype: float64



Is there a way to merge multiple Series efficiently without calling combine_first several times?

Thanks



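def combine_multi(ser_list):
    # Place the Series side by side as columns (aligned on the union of their
    # indexes), treat missing entries as 0, then sum across each row.
    return pd.concat(ser_list, axis=1).fillna(0).sum(axis=1)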



Example A (distinct indexes)

series1 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
series2 = pd.Series([1,2,3,4,5],index=['f','g','h','i','j'])
series3 = pd.Series([1,2,3,4,5],index=['k','l','m','n','o'])

out = combine_multi([series1, series2, series3])
out



a    1.0
b    2.0
c    3.0
d    4.0
e    5.0
f    1.0
g    2.0
h    3.0
i    4.0
j    5.0
k    1.0
l    2.0
m    3.0
n    4.0
o    5.0
dtype: float64



Example B (overlapping indexes)

series1 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
series2 = pd.Series([1,2,3,4,5],index=['a','b','c','i','j'])
series3 = pd.Series([1,2,3,4,5],index=['k','b','m','d','f'])

out = combine_multi([series1, series2, series3])
out



a    2.0
b    6.0
c    6.0
d    8.0
e    5.0
f    5.0
i    4.0
j    5.0
k    1.0
m    3.0
dtype: float64
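The NaN values in the question come from index alignment: the + operator returns NaN for any label that is missing from either operand. As a rough sketch of another way to get the same sums as combine_multi, Series.add accepts a fill_value that stands in for the missing side. The helper name sum_with_fill is made up for this illustration and is not part of pandas.

```python
from functools import reduce

import pandas as pd

series1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
series2 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "i", "j"])
series3 = pd.Series([1, 2, 3, 4, 5], index=["k", "b", "m", "d", "f"])


def sum_with_fill(ser_list):
    # Hypothetical helper: add the Series pairwise, treating a label that is
    # missing on one side as 0 instead of letting it become NaN.
    return reduce(lambda left, right: left.add(right, fill_value=0), ser_list)


# Plain addition: only labels present in all three Series survive as numbers.
plain_sum = series1 + series2 + series3

# Filled addition: every label keeps the sum of the values that exist for it.
filled_sum = sum_with_fill([series1, series2, series3]).sort_index()

print(plain_sum.dropna())
print(filled_sum)
```

Note that this sums overlapping labels, just like combine_multi. It does not pick one value per label the way combine_first does.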



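Assuming you rely on combine_first's behavior of prioritizing values from the Series in the order they are given, you can chain the calls with functools.reduce: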


from functools import reduce

l_series = [series1, series2, series3]
reduce(lambda s1, s2: s1.combine_first(s2), l_series)



Of course, if the index labels are unique across all the Series, as in your current example, you can simply use pd.concat.
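Since plain concatenation is only equivalent when no label repeats, one possible safeguard is to pass verify_integrity=True to pd.concat, which raises a ValueError when the resulting index contains duplicates. The sketch below uses the Series from the question. The extra Series named overlapping is invented here purely to trigger the error.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
series2 = pd.Series([1, 2, 3, 4, 5], index=["f", "g", "h", "i", "j"])
series3 = pd.Series([1, 2, 3, 4, 5], index=["k", "l", "m", "n", "o"])

# All labels are distinct, so this succeeds and simply stacks the Series.
stacked = pd.concat([series1, series2, series3], verify_integrity=True)

# Hypothetical Series sharing the label "a" with series1.
overlapping = pd.Series([10, 20], index=["a", "z"])

try:
    pd.concat([series1, overlapping], verify_integrity=True)
    overlap_detected = False
except ValueError:
    # Duplicate labels were found, so concat is not a safe substitute here.
    overlap_detected = True

print(len(stacked), overlap_detected)
```

When the check fails, falling back to the reduce with combine_first approach keeps the priority semantics.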

Demo


series1 = pd.Series(list(range(5)),index=['a','b','c','d','e'])
series2 = pd.Series(list(range(5, 10)),index=['a','g','h','i','j'])
series3 = pd.Series(list(range(10, 15)),index=['k','b','m','c','o'])

from functools import reduce

l_series = [series1, series2, series3]
print(reduce(lambda s1, s2: s1.combine_first(s2), l_series))



# a     0.0
# b     1.0
# c     2.0
# d     3.0
# e     4.0
# g     6.0
# h     7.0
# i     8.0
# j     9.0
# k    10.0
# m    12.0
# o    14.0
# dtype: float64
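If the list of Series is long, a single-pass variant may be worth trying. The sketch below is an alternative technique rather than the method shown above: it concatenates everything once and then takes the first non-null value per label with groupby(level=0).first(). It assumes the list is ordered from highest to lowest priority, and it returns the labels in sorted order.

```python
from functools import reduce

import pandas as pd

series1 = pd.Series(range(5), index=["a", "b", "c", "d", "e"])
series2 = pd.Series(range(5, 10), index=["a", "g", "h", "i", "j"])
series3 = pd.Series(range(10, 15), index=["k", "b", "m", "c", "o"])
l_series = [series1, series2, series3]

# Reference result: repeated combine_first, earlier Series take priority.
via_reduce = reduce(lambda s1, s2: s1.combine_first(s2), l_series)

# Single pass: stack all Series, then keep the first non-null value seen
# for each index label (earlier Series come first in the stacked order).
via_groupby = pd.concat(l_series).groupby(level=0).first()

print(via_groupby)
```

The values should match the reduce result, although the dtype can differ because this variant never introduces NaN for these inputs.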



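combine_first is the right tool when you want the values of one Series to take priority over those of another; it is typically used to fill missing values in the first Series. I'm not sure what output you expect for your example, but it looks like you could just use concat: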


pd.concat([series1, series2, series3])



which gives


a    1
b    2
c    3
d    4
e    5
f    1
g    2
h    3
i    4
j    5
k    1
l    2
m    3
n    4
o    5



...