python - How to compare specific values in a multidimensional array

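I am writing a Python script for data preprocessing. My data currently looks like this:

[['United', '-27.654379', '152.917741', 'e10', '1459', '2019-03-18'],
 ['United', '-27.654379', '152.917741', 'e10', '1449', '2019-03-19']]

I also need to remove entries in the array that share the same date. For example,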


[['Costco', '-27.213607', '152.996416', 'e10', '1237', '2019-03-16'],
 ['United', '-25.607894', '150.367213', 'e10', '1297', '2019-03-16']]

would become

[['Costco', '-27.213607', '152.996416', 'e10', '1237', '2019-03-16']]



My current approach (shown below) seems to detect and remove entries with duplicate dates, but some duplicate entries still show up in the output.


for line in Data_text:
    for row in Data_text:
        if line[5] == row[5]:
            Data_text.remove(row)




In pure Python, you can use a set to handle this case:


lst = [['Costco', '-27.213607', '152.996416', 'e10', '1237', '2019-03-16'],
       ['Costco', '-27.213607', '152.996416', 'e10', '1297', '2019-03-16']]

seen = set()
# seen.add() returns None, so a row is kept only the first time its date appears
print([x for x in lst if not (x[5] in seen or seen.add(x[5]))])

# [['Costco', '-27.213607', '152.996416', 'e10', '1237', '2019-03-16']]
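The one-liner above relies on a side effect inside a list comprehension, which can be hard to read. A minimal sketch of the same set-based idea as an explicit loop follows; the function name `dedupe_by_column` and the sample rows are illustrative assumptions, not part of the original answer.

```python
def dedupe_by_column(rows, col):
    """Return a new list keeping only the first row for each distinct rows[i][col]."""
    seen = set()      # values of the key column already encountered
    result = []
    for row in rows:
        key = row[col]
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Sample data for illustration only (assumed, in the same shape as the question's rows)
rows = [
    ['Costco', '-27.213607', '152.996416', 'e10', '1237', '2019-03-16'],
    ['United', '-25.607894', '150.367213', 'e10', '1297', '2019-03-16'],
    ['United', '-27.654379', '152.917741', 'e10', '1459', '2019-03-18'],
]

deduped = dedupe_by_column(rows, 5)
print(deduped)
```

Because a new list is built rather than modifying `rows` in place, the input is left untouched and no element is skipped during iteration.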



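With Python 3.7 or later, where dicts preserve insertion order, the following code works. Note that it keeps the last entry for each date rather than the first: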


data = [['Costco', '-27.213607', '152.996416', 'e10', '1237', '2019-03-16'],
        ['United', '-25.607894', '150.367213', 'e10', '1297', '2019-03-16']]

data = list({item[5]: item for item in data}.values())
# [['United', '-25.607894', '150.367213', 'e10', '1297', '2019-03-16']]
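The question asks to keep the first entry for each date, while the dict comprehension keeps the last one. A small variation, sketched here under the assumption that first-wins behaviour is what is wanted, uses `dict.setdefault`, which only stores a value when the key is not present yet. The variable names `first_by_date` and `first_wins`, and the third sample row, are illustrative.

```python
data = [
    ['Costco', '-27.213607', '152.996416', 'e10', '1237', '2019-03-16'],
    ['United', '-25.607894', '150.367213', 'e10', '1297', '2019-03-16'],
    ['United', '-27.654379', '152.917741', 'e10', '1459', '2019-03-18'],
]

first_by_date = {}
for item in data:
    # setdefault stores item only if this date has not been seen before
    first_by_date.setdefault(item[5], item)

# On Python 3.7+ the dict keeps insertion order, so the rows come out in input order
first_wins = list(first_by_date.values())
print(first_wins)
```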



You may want to consider pandas for this kind of data and operation:


import pandas as pd

a = [['Costco', '-27.213607', '152.996416', 'e10', '1237', '2019-03-16'],
     ['United', '-25.607894', '150.367213', 'e10', '1297', '2019-03-16']]

df = pd.DataFrame(a).drop_duplicates(5, keep='first')



Result:

df

        0           1           2    3     4           5
0  Costco  -27.213607  152.996416  e10  1237  2019-03-16



This option is especially useful if the dates come in different formats:


a2 = [['Costco', '-27.213607', '152.996416', 'e10', '1237', 'March 16, 2019'],
      ['United', '-25.607894', '150.367213', 'e10', '1297', '2019-03-16']]

df = pd.DataFrame(a2)
# Note: on pandas 2.0 or later, mixed formats in one column need
# pd.to_datetime(df[5], format='mixed')
df[5] = pd.to_datetime(df[5])
df.drop_duplicates(5, keep='first')



This still gives the correct result:

        0           1           2    3     4           5
0  Costco  -27.213607  152.996416  e10  1237  2019-03-16
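If pandas is not available, the same normalize-then-deduplicate idea can be approximated with the standard library. The sketch below assumes only the two date formats shown above occur; `KNOWN_FORMATS`, `parse_date`, and `unique_rows` are hypothetical names introduced for illustration, and `%B` matches English month names only under the default C locale.

```python
from datetime import datetime, date

# Assumed list of formats; extend it to cover whatever the real data contains
KNOWN_FORMATS = ("%Y-%m-%d", "%B %d, %Y")


def parse_date(text):
    """Try each known format in turn and return a datetime.date."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


a2 = [
    ['Costco', '-27.213607', '152.996416', 'e10', '1237', 'March 16, 2019'],
    ['United', '-25.607894', '150.367213', 'e10', '1297', '2019-03-16'],
]

seen = set()
unique_rows = []
for row in a2:
    day = parse_date(row[5])   # both spellings map to the same date object
    if day not in seen:
        seen.add(day)
        unique_rows.append(row)

print(unique_rows)
```

Unlike the pandas version, this keeps the original date strings in the rows and only uses the parsed date as the comparison key.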



Try this: create a new list, result_list = [], and put only the non-duplicate records into it:


result_list = []
length = len(Data_text)
for i in range(0, length):
    line = Data_text[i]
    is_exist = False
    # check whether a row with the same date has already been kept
    for row in result_list:
        if line[5] == row[5]:
            is_exist = True
            break

    if not is_exist:
        result_list.append(line)

print(result_list)
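A likely reason the nested loop in the question leaves duplicates behind is that it removes items from Data_text while iterating over it, which makes the iterator skip the element that slides into the freed slot. The sketch below reproduces that effect on a plain list of dates; the function names and sample values are illustrative assumptions.

```python
def remove_while_iterating(values, target):
    """Buggy on purpose: mutates the list it is iterating over."""
    values = list(values)          # copy, so the caller's list is untouched
    for v in values:
        if v == target:
            values.remove(v)       # shifts later items left; the next one is skipped
    return values


def filter_into_new_list(values, target):
    """Safe alternative: build a new list instead of mutating during iteration."""
    return [v for v in values if v != target]


dates = ['2019-03-16', '2019-03-16', '2019-03-17']

buggy = remove_while_iterating(dates, '2019-03-16')
safe = filter_into_new_list(dates, '2019-03-16')

print(buggy)  # ['2019-03-16', '2019-03-17'] : one matching item survived
print(safe)   # ['2019-03-17']
```

This is why the approaches above all collect the kept rows into a new container instead of calling remove() inside the loop.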



...