pandas - How do I get the last rows of a pandas DataFrame?

I have two pandas DataFrames, df1 and df2 (df1 is a vanilla DataFrame; df2 is indexed by 'STK_ID' and 'RPT_Date'):


>>> df1
    STK_ID  RPT_Date  TClose   sales  discount
0   000568  20060331    3.69   5.975       NaN
1   000568  20060630    9.14  10.143       NaN
2   000568  20060930    9.49  13.854       NaN
3   000568  20061231   15.84  19.262       NaN
4   000568  20070331   17.00   6.803       NaN
5   000568  20070630   26.31  12.940       NaN
6   000568  20070930   39.12  19.977       NaN
7   000568  20071231   45.94  29.269       NaN
8   000568  20080331   38.75  12.668       NaN
9   000568  20080630   30.09  21.102       NaN
10  000568  20080930   26.00  30.769       NaN

>>> df2
                 TClose   sales  discount  net_sales    cogs
STK_ID RPT_Date
000568 20060331    3.69   5.975       NaN      5.975   2.591
       20060630    9.14  10.143       NaN     10.143   4.363
       20060930    9.49  13.854       NaN     13.854   5.901
       20061231   15.84  19.262       NaN     19.262   8.407
       20070331   17.00   6.803       NaN      6.803   2.815
       20070630   26.31  12.940       NaN     12.940   5.418
       20070930   39.12  19.977       NaN     19.977   8.452
       20071231   45.94  29.269       NaN     29.269  12.606
       20080331   38.75  12.668       NaN     12.668   3.958
       20080630   30.09  21.102       NaN     21.102   7.431

I can get the last 3 rows of df2 with:


>>> df2.ix[-3:]
                 TClose   sales  discount  net_sales    cogs
STK_ID RPT_Date
000568 20071231   45.94  29.269       NaN     29.269  12.606
       20080331   38.75  12.668       NaN     12.668   3.958
       20080630   30.09  21.102       NaN     21.102   7.431

whereas df1.ix[-3:] returns all the rows:


>>> df1.ix[-3:]
    STK_ID  RPT_Date  TClose   sales  discount
0   000568  20060331    3.69   5.975       NaN
1   000568  20060630    9.14  10.143       NaN
2   000568  20060930    9.49  13.854       NaN
3   000568  20061231   15.84  19.262       NaN
4   000568  20070331   17.00   6.803       NaN
5   000568  20070630   26.31  12.940       NaN
6   000568  20070930   39.12  19.977       NaN
7   000568  20071231   45.94  29.269       NaN
8   000568  20080331   38.75  12.668       NaN
9   000568  20080630   30.09  21.102       NaN
10  000568  20080930   26.00  30.769       NaN

Why is that, and how do I get the last 3 rows of df1 (the DataFrame without a custom index)? I am using pandas 0.10.1.


Don't forget DataFrame.tail! For example, df1.tail(10).
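As a minimal sketch of the tail approach applied to the question's case: the frame below is a hypothetical stand-in for df1 that reuses only the TClose values from the question and keeps the default integer index.

```python
import pandas as pd

# Hypothetical stand-in for df1: default integer index 0..10,
# TClose values copied from the question.
df1 = pd.DataFrame(
    {"TClose": [3.69, 9.14, 9.49, 15.84, 17.00, 26.31,
                39.12, 45.94, 38.75, 30.09, 26.00]}
)

# tail(n) always counts rows from the end, whatever the index labels are.
last3 = df1.tail(3)
print(last3)
```

Because tail works purely by position, it behaves the same on df1 and on a MultiIndexed frame such as df2.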

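This happens because df1 has an integer index: ix treats -3 as a label rather than a position, so the slice starts before the first label and returns everything. This is by design; see the integer indexing "gotchas" in the pandas documentation.

In newer versions of pandas, prefer loc or iloc to remove the ambiguity of ix between position and label: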

 
df.iloc[-3:]

 

See the docs.
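The label-versus-position difference can be sketched on a small frame with a default integer index. The frame and the variable names below are illustrative assumptions, not taken from the question; loc is used here to show the label-based reading that ix applied to df1.

```python
import pandas as pd

# Illustrative frame with the default integer index 0..4.
df = pd.DataFrame({"sales": [5.975, 10.143, 13.854, 19.262, 6.803]})

# Label-based slice: -3 is read as a label. It sorts before every
# existing label, so the slice covers the whole frame.
by_label = df.loc[-3:]

# Position-based slice: -3 counts from the end, giving the last three rows.
by_position = df.iloc[-3:]

print(len(by_label))
print(by_position)
```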

Note also that in pandas versions before 0.14, iloc raises an IndexError on out-of-bounds access, whereas .head() and .tail() do not:


>>> pd.__version__
'0.12.0'
>>> df = pd.DataFrame([{"a": 1}, {"a": 2}])
>>> df.iloc[-5:]
...
IndexError: out-of-bounds on slice (end)
>>> df.tail(5)
   a
0  1
1  2
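For comparison, here is a sketch of the same calls on pandas 0.14 or later, where out-of-bounds slices passed to iloc are truncated to the available rows. A single out-of-bounds integer still raises IndexError, so the two cases are shown separately.

```python
import pandas as pd

df = pd.DataFrame([{"a": 1}, {"a": 2}])

# An out-of-bounds slice is clipped to the rows that exist,
# so it matches tail(5) on this two-row frame.
sliced = df.iloc[-5:]
tailed = df.tail(5)

# A single out-of-bounds position is still an error.
try:
    df.iloc[5]
    scalar_raised = False
except IndexError:
    scalar_raised = True

print(sliced.equals(tailed), scalar_raised)
```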


Old answer (deprecated method):

You can use the irow DataFrame method to overcome this ambiguity:


In [11]: df1.irow(slice(-3, None))
Out[11]: 
    STK_ID  RPT_Date  TClose   sales  discount
8      568  20080331   38.75  12.668       NaN
9      568  20080630   30.09  21.102       NaN
10     568  20080930   26.00  30.769       NaN

Note: Series has a similar iget method.
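Since irow and iget are deprecated, the same positional selection on a Series can be sketched with iloc instead. The Series below is a hypothetical example built from a few of the TClose values in the question.

```python
import pandas as pd

# Hypothetical Series with the default integer index 0..4.
s = pd.Series([3.69, 9.14, 9.49, 15.84, 17.00])

# Positional slice: the last three values, regardless of index labels.
last3 = s.iloc[-3:]

# Positional scalar access: the final value.
last = s.iloc[-1]

print(last3)
print(last)
```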

...