Python: flag Pandas groups whose dates are less than n months apart


import pandas as pd

d_ex = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                     'col2': ['2019-01-01', '2018-02-01',
                              '2015-01-01', '2019-01-01']})
d_ex['col2'] = pd.to_datetime(d_ex['col2'])
d_ex



I tried

d_ex.groupby(['col1'])['col2'].diff()

but it doesn't work.


Create the dataframe:


import pandas as pd

d_ex = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                     'col2': ['2019-01-01', '2018-02-01',
                              '2015-01-01', '2019-01-01']})
d_ex['col2'] = pd.to_datetime(d_ex['col2'])



col1  col2
A     2019-01-01
A     2018-02-01
B     2015-01-01
B     2019-01-01



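Step by step:

  • diff returns a timedelta for each row, relative to the previous row of the same group
    • the first row of every group is always NaT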

d_ex['diff'] = d_ex.groupby('col1')['col2'].diff()



col1  col2        diff
A     2019-01-01  NaT
A     2018-02-01  -334 days
B     2015-01-01  NaT
B     2019-01-01  1461 days
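A quick check of what the grouped diff produces, rebuilding the `d_ex` frame from above. Selecting `['col2']` after `groupby` yields a Series of timedelta64 values, each the difference from the previous row of the same group, and the first row of each group comes out as NaT because it has no predecessor. This is only a sanity check of standard pandas behavior, not part of the original answer.

```python
import pandas as pd

d_ex = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                     'col2': pd.to_datetime(['2019-01-01', '2018-02-01',
                                             '2015-01-01', '2019-01-01'])})

# Selecting 'col2' after groupby gives a Series of timedelta64 values,
# each one the difference from the previous row of the same group
diff = d_ex.groupby('col1')['col2'].diff()

# The first row of each group has no predecessor, so its diff is NaT
first_rows = d_ex.groupby('col1').head(1).index
print(diff.loc[first_rows].isna().all())  # True
print(diff)
```

Because the rows are compared in their existing order, an unsorted group (like group A here) produces a negative timedelta, which is why the next step takes the absolute value.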



Use a function to take the absolute value of diff:

  • abs doesn't work on NaT, so it can't simply be applied to every row

def my_abs(x):
    try:
        x = abs(x)
    except TypeError:
        # abs() raised on this value (e.g. NaT), so keep it unchanged
        pass
    return x

# Apply the function
d_ex['diff'] = d_ex['diff'].apply(my_abs)



col1  col2        diff
A     2019-01-01  NaT
A     2018-02-01  334 days
B     2015-01-01  NaT
B     2019-01-01  1461 days
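A vectorized alternative to the per-element helper, sketched under the assumption that pandas' built-in `Series.abs()` is acceptable: it operates on a whole timedelta column at once and leaves NaT entries as NaT, so no try/except is needed. The input values are typed in to mirror the signed diff table above rather than recomputed from `d_ex`.

```python
import pandas as pd

# Signed differences as produced by the grouped diff (NaT = first row of a group)
diff = pd.Series(pd.to_timedelta([pd.NaT, '-334 days', pd.NaT, '1461 days']))

# Series.abs() works element-wise on timedelta64 data; NaT stays NaT
abs_diff = diff.abs()
print(abs_diff)
```

On the real frame this would be `d_ex['diff'] = d_ex['diff'].abs()`.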



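Set the condition with a function for values that are not NaT:

  • because of groupby and diff, the first row of each group will always be NaT
  • NaT is set to None so that it can be backfilled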

def set_condition(x):
    # NaT marks the first row of a group: leave it empty for backfilling
    if pd.isna(x):
        x = None
    elif x <= pd.Timedelta('365 days'):
        x = True
    else:
        x = False
    return x

# Apply the function
d_ex['condition'] = d_ex['diff'].apply(set_condition)



col1  col2        diff       condition
A     2019-01-01  NaT        None
A     2018-02-01  334 days   True
B     2015-01-01  NaT        None
B     2019-01-01  1461 days  False
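The same True/False/None logic can also be written without `apply`. This sketch assumes a nullable `'boolean'` column (holding `pd.NA` instead of `None`) is acceptable. It compares the whole column against 365 days, then masks the NaT rows explicitly, because a comparison involving NaT evaluates to False rather than to a missing value.

```python
import pandas as pd

# Absolute differences as in the table above (NaT = first row of a group)
abs_diff = pd.Series(pd.to_timedelta([pd.NaT, '334 days', pd.NaT, '1461 days']))

# Vectorized comparison; NaT rows compare as False at this point
within_year = abs_diff.le(pd.Timedelta(days=365)).astype('boolean')
# Turn the NaT rows into missing values (pd.NA) so they can be backfilled
condition = within_year.mask(abs_diff.isna(), pd.NA)
print(condition)
```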



Backfill the None values:


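# Backfill within each group so a value never leaks into a neighbouring group
d_ex['condition'] = d_ex.groupby('col1')['condition'].bfill()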



col1  col2        diff       condition
A     2019-01-01  NaT        True
A     2018-02-01  334 days   True
B     2015-01-01  NaT        False
B     2019-01-01  1461 days  False
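The question asks about n months, while the answer approximates 12 months with 365 days. A calendar-aware variant is sketched below as a hypothetical helper `flag_close_dates` (not part of the answer). It sorts each group by date, adds `pd.DateOffset(months=n)` to the previous date, and backfills within each group. It assumes "less than n months" means strictly earlier than the previous date shifted by n calendar months, and it compares each date with its predecessor in sorted order rather than in the original row order.

```python
import pandas as pd


def flag_close_dates(df, group_col, date_col, months):
    """Hypothetical helper (sketch): True when consecutive dates in a group
    are less than `months` calendar months apart."""
    # Sort so each row is compared with the previous date in its group
    s = df.sort_values([group_col, date_col])
    prev = s.groupby(group_col)[date_col].shift()
    # Calendar-aware threshold: previous date plus n months (NaT stays NaT)
    close = (s[date_col] < prev + pd.DateOffset(months=months)).astype('boolean')
    # The first row of each group has no predecessor: mark it as missing
    close = close.mask(prev.isna(), pd.NA)
    # Fill the first row of each group from the next row of the same group
    close = close.groupby(s[group_col]).bfill()
    # Restore the caller's row order
    return close.reindex(df.index)


d_ex = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                     'col2': pd.to_datetime(['2019-01-01', '2018-02-01',
                                             '2015-01-01', '2019-01-01'])})
d_ex['condition'] = flag_close_dates(d_ex, 'col1', 'col2', months=12)
print(d_ex)
```

With `months=12` this reproduces the True, True, False, False flags above; because the rows are sorted first, no `abs` step is needed.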



...