python - Pandas DataFrames: creating new rows from calculations on existing rows

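How can I create new rows from an existing DataFrame by grouping on some fields (in the example, "Country" and "Industry") and applying a mathematical operation to other fields (in the example, "Field" and "Value")?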

Source DataFrame


import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})



  Country Industry   Field  Value
0     USA  Finance  Import    100
1     USA  Finance  Export     50
2     USA   Retail  Import     80
3     USA   Retail  Export     10
4     USA   Energy  Import     20
5     USA   Energy  Export      5
6  Canada   Retail  Import     30
7  Canada   Retail  Export     10



Target DataFrame

Net = Import - Export


  Country Industry Field  Value
0     USA  Finance   Net     50
1     USA   Retail   Net     70
2     USA   Energy   Net     15
3  Canada   Retail   Net     20




There are many possible approaches. Here is one using groupby and unstack:


(df.groupby(['Country', 'Industry', 'Field'], sort=False)['Value']
   .sum()
   .unstack('Field')
   .eval('Import - Export')
   .reset_index(name='Value'))



  Country Industry  Value
0     USA  Finance     50
1     USA   Retail     70
2     USA   Energy     15
3  Canada   Retail     20
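
The result above has no Field column, while the target DataFrame does. As a minimal sketch (assuming the sample data from the question, and using plain column subtraction instead of eval), the missing column can be added with assign and the columns reordered to match the target. The names wide, net, and cols are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# One row per (Country, Industry), one column per Field value.
wide = (df.groupby(['Country', 'Industry', 'Field'], sort=False)['Value']
          .sum()
          .unstack('Field'))

cols = ['Country', 'Industry', 'Field', 'Value']  # target column order
net = ((wide['Import'] - wide['Export'])
       .reset_index(name='Value')   # Series -> DataFrame with a 'Value' column
       .assign(Field='Net')[cols])  # add the constant Field column, then reorder
print(net)
```

The row order of the result depends on how unstack orders the index, so sort explicitly if a particular order matters.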



IIUC


df = df.set_index(['Country', 'Industry'])

# Import minus Export; the two Series align on the (Country, Industry) index
Newdf = (df.loc[df.Field == 'Import', 'Value'] - df.loc[df.Field == 'Export', 'Value']).reset_index().assign(Field='Net')
Newdf

  Country Industry  Value Field
0     USA  Finance     50   Net
1     USA   Retail     70   Net
2     USA   Energy     15   Net
3  Canada   Retail     20   Net



pivot_table


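(df.pivot_table(index=['Country', 'Industry'], columns='Field', values='Value', aggfunc='sum')
   .diff(axis=1)      # columns sort as Export, Import, so this yields Import - Export
   .dropna(axis=1)    # drop the all-NaN Export column left by diff
   .rename(columns={'Import': 'Value'})
   .reset_index())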


Out[112]: 
Field Country Industry  Value
0      Canada   Retail   20.0
1         USA   Energy   15.0
2         USA  Finance   50.0
3         USA   Retail   70.0



You can use GroupBy.diff(), then recreate the Field column, and finally use DataFrame.dropna:


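# Within each group the rows are ordered Import, Export, so diff() gives
# Export - Import on the second row and NaN on the first. abs() flips the
# sign, which matches Import - Export only when Import >= Export.
df['Value'] = df.groupby(['Country', 'Industry'])['Value'].diff().abs()
df['Field'] = 'Net'
df.dropna(inplace=True)                    # drop the NaN first row of each group
df.reset_index(drop=True, inplace=True)

print(df)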


  Country Industry Field  Value
0     USA  Finance   Net   50.0
1     USA   Retail   Net   70.0
2     USA   Energy   Net   15.0
3  Canada   Retail   Net   20.0
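
Because abs() discards the sign, a group whose exports exceed its imports would still show a positive net, and the result also depends on Import preceding Export in each group. A sign-safe variant, sketched here under the assumption that Field only ever contains 'Import' and 'Export', negates the export values and sums per group. The names signed, cols, and net are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# Keep Import values as they are and negate everything else (the Export rows).
signed = df['Value'].where(df['Field'] == 'Import', -df['Value'])

cols = ['Country', 'Industry', 'Field', 'Value']
net = (df.assign(Value=signed)
         .groupby(['Country', 'Industry'], sort=False, as_index=False)['Value']
         .sum()                       # Import + (-Export) per group
         .assign(Field='Net')[cols])
print(net)
```

This version does not depend on row order within a group, and it keeps Value as an integer column because no NaN is introduced.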



You can do this to add these rows to the original DataFrame:


(df.set_index(['Country', 'Industry', 'Field'])
   .unstack()['Value']
   .eval('Net = Import - Export')
   .stack().rename('Value').reset_index())



Output:


   Country Industry   Field  Value
0   Canada   Retail  Export     10
1   Canada   Retail  Import     30
2   Canada   Retail     Net     20
3      USA   Energy  Export      5
4      USA   Energy  Import     20
5      USA   Energy     Net     15
6      USA  Finance  Export     50
7      USA  Finance  Import    100
8      USA  Finance     Net     50
9      USA   Retail  Export     10
10     USA   Retail  Import     80
11     USA   Retail     Net     70
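
The unstack/stack round trip returns the rows sorted by index, so the original row order is lost. If the original rows should stay untouched with the Net rows appended at the end, one possible sketch (assuming the sample data from the question; wide, net, and combined are illustrative names) is to compute the Net rows separately and append them with pd.concat:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# Wide layout: one row per (Country, Industry), columns Export and Import.
wide = df.set_index(['Country', 'Industry', 'Field'])['Value'].unstack('Field')

net = ((wide['Import'] - wide['Export'])
       .rename('Value')
       .reset_index()
       .assign(Field='Net'))

# concat aligns on column names, so the differing column order of net is harmless.
combined = pd.concat([df, net], ignore_index=True)
print(combined)
```

The first eight rows of combined are the original data in its original order, followed by one Net row per (Country, Industry) pair.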



This answer relies on the fact that pandas puts the group keys into the MultiIndex of the grouped result. (If there is only one group key, you can use loc.)


>>> s = df.groupby(['Country', 'Industry', 'Field'])['Value'].sum()
>>> s.xs('Import', axis=0, level='Field') - s.xs('Export', axis=0, level='Field')
Country  Industry
Canada   Retail      20
USA      Energy      15
         Finance     50
         Retail      70
Name: Value, dtype: int64
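
The xs result is a Series indexed by Country and Industry. A rough sketch of turning it into the target layout, and of the single-key loc case mentioned above, assuming the sample data from the question (net, by_field, and total_net are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

s = df.groupby(['Country', 'Industry', 'Field'])['Value'].sum()

# xs drops the 'Field' level, so both operands share the (Country, Industry) index.
net = ((s.xs('Import', level='Field') - s.xs('Export', level='Field'))
       .reset_index()
       .assign(Field='Net')[['Country', 'Industry', 'Field', 'Value']])
print(net)

# Single group key: the grouped result has a flat index, so loc is enough.
by_field = df.groupby('Field')['Value'].sum()
total_net = by_field.loc['Import'] - by_field.loc['Export']
print(total_net)
```

Because groupby sorts its keys by default, the rows of net come out in sorted (Country, Industry) order rather than in order of first appearance.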



...