csv - Rows broken across lines when writing with csv.writer after reading with csv.reader in universal newline mode


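When I read the file with csv.reader in universal newline mode ('rU'), the rows written by csv.writer contain \r\n line breaks. Do you know how to make csv.writer ignore the newlines? I have to open the file with 'rU' for the reader because my file contains newline characters.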

Here is the code I use:


import csv

dict = {}
with open('training_data.csv', 'rU') as f:
    reader = csv.reader(f, skipinitialspace=True)
    # Group the values of column 3 under the key in column 2
    for line in reader:
        try:
            dict[line[2]].append(line[3])
        except KeyError:
            dict[line[2]] = [line[3]]

with open('training_result.csv', 'w') as f:
    writer = csv.writer(f, delimiter='|', dialect='excel-tab')
    for key in dict:
        writer.writerow([key, ','.join(dict[key])])
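For reference, csv.writer ends every row with its dialect's lineterminator, which is '\r\n' for the built-in 'excel' and 'excel-tab' dialects, and it keeps line breaks inside a field by quoting that field rather than removing them. A small Python 3 sketch on an in-memory buffer (the row values are made up for illustration) shows both behaviours, plus the lineterminator argument for ending rows with '\n' instead:

```python
import csv
import io

# Default 'excel' dialect: rows end with '\r\n'; a field containing '\n' is quoted, not altered.
buf = io.StringIO()
csv.writer(buf).writerow(["username", "tweet that\nspans lines", "label"])
print(repr(buf.getvalue()))  # 'username,"tweet that\nspans lines",label\r\n'

# lineterminator changes only the row ending, not the line breaks inside fields.
buf_lf = io.StringIO()
csv.writer(buf_lf, lineterminator="\n").writerow(["username", "tweet that\nspans lines", "label"])
print(repr(buf_lf.getvalue()))  # 'username,"tweet that\nspans lines",label\n'
```

No writer option strips line breaks out of field values, so any flattening has to happen on the values before writerow is called.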



The input looks like this:


username, some of tweet that
want to be processed
by machine, label



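Because the text contains line breaks and universal newline mode is active, the data I read still contains them, and when I write it out with csv.writer the output is broken across lines in the same way.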

What I want is output like this:


username, some of tweet that want to be processed by machine, label



Should I remove all the line breaks from the CSV file first? The file is quite large, though: about 150MB with 700 rows. Is there any way to do this?

I have already tried reader options such as skipinitialspace and dialect, but I still can't solve the problem.
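Pre-cleaning the whole file should not be needed. csv.reader combines a quoted field that spans several physical lines into one value, and it pulls records from the file one at a time, so the 150MB file never has to be loaded whole. A minimal Python 3 sketch, assuming the multi-line field is enclosed in double quotes (unquoted line breaks cannot be told apart from row breaks) and that the file is opened with newline='' as the csv documentation recommends; flatten_rows is a hypothetical helper, and an in-memory string stands in for the real file:

```python
import csv
import io

def flatten_rows(reader):
    """Hypothetical helper: yield each parsed row with in-field line breaks replaced by spaces."""
    for row in reader:
        # splitlines() splits on '\n', '\r\n', '\r' (and other Unicode line boundaries)
        yield [" ".join(field.splitlines()) for field in row]

# In-memory stand-in for open('training_data.csv', newline='', encoding='utf8');
# the multi-line field must be enclosed in double quotes.
sample = io.StringIO('username,"some of tweet that\nwant to be processed\nby machine",label\r\n', newline="")
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in flatten_rows(csv.reader(sample)):  # rows are read and written one at a time
    writer.writerow(row)

print(out.getvalue(), end="")  # username,some of tweet that want to be processed by machine,label
```

Because flatten_rows is a generator, swapping the StringIO objects for real file handles keeps the same streaming behaviour.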


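We can achieve this by replacing the newlines inside each field with "," and starting a new line for each additional value appended to the same key. If you don't want any newlines at all, drop the "\n" + prefix from the append and use:


dict[line[2]].append(line[3].replace("\n", ","))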
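To see what both variants produce, here is a small Python 3 sketch on made-up (username, tweet) pairs. The helpers group_tweets and to_csv are hypothetical, and the answer's try/except KeyError is replaced by an explicit membership test, which has the same effect:

```python
import csv
import io

# Hypothetical (username, tweet) pairs; alice's second tweet spans two lines.
rows = [("alice", "first tweet"), ("alice", "second\ntweet"), ("bob", "hello")]

def group_tweets(rows, newline_between_tweets):
    """Group tweets by user, turning line breaks inside each tweet into commas."""
    grouped = {}
    for user, tweet in rows:
        text = tweet.replace("\n", ",")
        if user in grouped:
            # Optionally start each additional tweet on a new line within the cell
            grouped[user].append("\n" + text if newline_between_tweets else text)
        else:
            grouped[user] = [text]
    return grouped

def to_csv(grouped):
    """Write one row per user, joining that user's tweets with commas."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for user, tweets in grouped.items():
        writer.writerow([user, ",".join(tweets)])
    return out.getvalue()

with_breaks = to_csv(group_tweets(rows, True))
without_breaks = to_csv(group_tweets(rows, False))
print(repr(with_breaks))     # 'alice,"first tweet,\nsecond,tweet"\nbob,hello\n'
print(repr(without_breaks))  # 'alice,"first tweet,second,tweet"\nbob,hello\n'
```

With the "\n" prefix the cell still contains a line break, but csv.writer quotes it, so a CSV-aware reader sees one field per user.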



Here is the code:


import csv

dict = {}
with open('training_data.csv', 'rU') as f:
    reader = csv.reader(f, skipinitialspace=True)
    for line in reader:
        try:
            # Later values for the same key start on a new line of the cell
            dict[line[2]].append("\n" + line[3].replace("\n", ","))
        except KeyError:
            # First value for this key; line breaks inside it become commas
            dict[line[2]] = [line[3].replace("\n", ",")]

with open('training_result.csv', 'w') as f:
    writer = csv.writer(f, delimiter=',', dialect='excel-tab')
    for key in dict:
        writer.writerow([key, ','.join(dict[key])])



I think this is the result you're looking for. You didn't mention your Python version; this is Python 3. I used the sample data you uploaded to Google Drive and parsed the file as UTF-8.

Key points to note:

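  • csv has a DictReader that makes it easy to select the columns to process.
  • Open CSV files so that the csv module controls line endings itself. In Python 2 that means binary mode, 'rb' or 'wb'; in Python 3 it means passing 'r' (or 'w'), newline='' and an encoding to the open call.
  • Each line yielded by the reader is a dictionary of {'header': 'value'} pairs.
  • extrasaction='ignore' tells DictWriter to skip keys in the dictionary that are not listed in fieldnames.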
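A compact Python 3 sketch of those points, using an in-memory two-row sample in place of the real files (the header names match the sample data below; the row values are made up):

```python
import csv
import io

# In-memory stand-ins for files opened with open(path, ..., newline='', encoding='utf8').
src = io.StringIO(
    "interaction.author.username,interaction.content,interaction.created_at\r\n"
    'alice,"line one\nline two",2014-01-01\r\n'
    "bob,hello,2014-01-02\r\n",
    newline="",
)
dst = io.StringIO(newline="")

reader = csv.DictReader(src)  # each row is a dict keyed by the header line
writer = csv.DictWriter(
    dst,
    fieldnames=["interaction.author.username", "interaction.content"],
    extrasaction="ignore",  # drop keys such as interaction.created_at that are not in fieldnames
)
writer.writeheader()
for line in reader:
    line["interaction.content"] = line["interaction.content"].replace("\n", " ")
    writer.writerow(line)

print(dst.getvalue())
```

Leaving out extrasaction='ignore' would make writerow raise ValueError here, because every row also carries the interaction.created_at key.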

Sample data:


twitter.place.full_name,twitter.user.location,interaction.author.username,interaction.content,interaction.created_at
"Gunungsari, Lombok Barat",Indonesia,__Thasya__,At Sheraton Senggigi Beach Resort äóî https://t.co/1FdTsMYWje,"Mon, 16 Jun 2014 15:32:54 +0000"
"Cakranegara, Kota Mataram",NULL,__Waone,Mataram,"Mon, 24 Mar 2014 13:13:46 +0000"
"Pemenang, Lombok Utara",Jakarta,_5at,"perdana, my first nephew from my lil sibling sister,,,
*moga gäó» ketularan songong kayak pamannya >_< http://t.co/UBEwcxWY5c","Sat, 04 Jan 2014 04:20:45 +0000"
"Pemenang, Lombok Utara",Jakarta,_5at,"@indiraputeri udah pinter bahasa sasak nih skrng,,, inaq rari","Sat, 04 Jan 2014 06:15:52 +0000"
"Pemenang, Lombok Utara",Jakarta,_5at,@indiraputeri dalemmm bgt nih ndoro..!!! mksd nya apaan?,"Sat, 04 Jan 2014 05:55:04 +0000"
"Keruak, Lombok Timur",Jakarta,_5at,"pagi2, hujan, holiday, nasi goreng hangat, kopi hangat, di rumah, + spesial: kumpul keluarga,,, ^_^ *kurang_apa_lagi","Thu, 02 Jan 2014 00:02:47 +0000"
"Pujut, Lombok Tengah",Jakarta,_5at,"Doäó»a bepergian keluar rumah:
""Bismillaahitawakkaltu äó»alallooh""
*pasrah-pasrah-pasrah;
*bandara_international_lombok","Sun, 05 Jan 2014 03:36:48 +0000"
"Sakra, Lombok Timur",Jakarta,_5at,"Time for riding with my lil bro:
Mataram - Senggigi - Gili Terawangan
*jenguk_ponakan_baru;
*very_early","Fri, 03 Jan 2014 22:09:26 +0000"
"Sukamulia, Lombok Timur",,1821922,Salam friend,"Sun, 20 Jul 2014 19:23:53 +0000"



Code:


import csv

# Python 2 version of the open() calls:
#with open('training_data.csv', 'rb') as inp:
#    with open('training_result.csv', 'wb') as outp:

with open('training_data.csv', 'r', newline='', encoding='utf8') as inp:
    with open('training_result.csv', 'w', newline='', encoding='utf8') as outp:
        reader = csv.DictReader(inp)
        writer = csv.DictWriter(outp,
                                fieldnames=['interaction.author.username', 'interaction.content'],
                                extrasaction='ignore')
        writer.writeheader()
        for line in reader:
            # Replace line breaks inside the tweet text with spaces
            line['interaction.content'] = line['interaction.content'].replace('\n', ' ')
            writer.writerow(line)



Result:


interaction.author.username,interaction.content
__Thasya__,At Sheraton Senggigi Beach Resort äóî https://t.co/1FdTsMYWje
__Waone,Mataram
_5at,"perdana, my first nephew from my lil sibling sister,,, *moga gäó» ketularan songong kayak pamannya >_< http://t.co/UBEwcxWY5c"
_5at,"@indiraputeri udah pinter bahasa sasak nih skrng,,, inaq rari"
_5at,@indiraputeri dalemmm bgt nih ndoro..!!! mksd nya apaan?
_5at,"pagi2, hujan, holiday, nasi goreng hangat, kopi hangat, di rumah, + spesial: kumpul keluarga,,, ^_^ *kurang_apa_lagi"
_5at,"Doäó»a bepergian keluar rumah:""Bismillaahitawakkaltu äó»alallooh"" *pasrah-pasrah-pasrah; *bandara_international_lombok"
_5at,Time for riding with my lil bro: Mataram - Senggigi - Gili Terawangan *jenguk_ponakan_baru; *very_early
1821922,Salam friend
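One caveat, which depends on how the file was saved and cannot be told from the sample: if it uses Windows line endings, a quoted field read with newline='' keeps its embedded '\r\n', so replace('\n', ' ') leaves a stray '\r' behind, and csv.writer will then quote the field because of it. A hedged alternative is to split on any line boundary with str.splitlines() and rejoin with a space; flatten is a hypothetical helper name:

```python
def flatten(value):
    """Hypothetical helper: join the lines of value with single spaces, whatever the line ending."""
    return " ".join(value.splitlines())

windows_field = "Time for riding with my lil bro:\r\nMataram - Senggigi"
print(repr(windows_field.replace("\n", " ")))  # 'Time for riding with my lil bro:\r Mataram - Senggigi'
print(repr(flatten(windows_field)))            # 'Time for riding with my lil bro: Mataram - Senggigi'
```

In the loop above this would read line['interaction.content'] = flatten(line['interaction.content']). Consecutive line breaks still become consecutive spaces, because splitlines() returns an empty string for each blank line.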



...