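This is a VB.NET project. I have an existing method that converts a comma-delimited file to a pipe-delimited file. It is a bit tricky because some of the fields contain commas, so those fields are wrapped in double quotes.
Here is the working code (thanks a million to The Blue Dog for working on this):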
Imports System.IO
Imports System.Text.RegularExpressions

Private Function ConvertCommaSepToPipeSep() As Boolean
    Dim line, result As String
    ' Matches a comma plus the field after it; the field may contain one quoted section.
    Dim pattern As String = ",([^,""]*(?:""[^""]*"")?[^,""]*)(?=,|$)"
    Dim replacement As String = "|$1"
    Dim rgx As New Regex(pattern)
    'Console.WriteLine("Conversion start time: " & DateTime.Now.ToLongTimeString())
    Try
        Using sw As New StreamWriter("output.csv")
            Using sr As New StreamReader("source.csv")
                While Not sr.EndOfStream
                    line = sr.ReadLine()
                    ' Swap field-separating commas for pipes, then strip all double quotes.
                    result = rgx.Replace(line, replacement)
                    sw.WriteLine(result.Replace(Chr(34), ""))
                End While
            End Using
        End Using
    Catch ex As Exception
        MessageBox.Show("There was a problem converting the file." & vbCrLf & ex.Message)
        Return False
    End Try
    'Console.WriteLine("Conversion end time: " & DateTime.Now.ToLongTimeString())
    Return True
End Function
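However, I have found that some of the fields also contain double quotes of their own.
Here are some sample lines from the source file I am trying to convert.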
122749,JOHN DOE,ACS155,7/5/2014,P,SCH/RC Activation Week 2,HRLY,1299577,Scheduler IT,2204,CVISA-Client Activation,1220000,Svcs Clin Implement,34
110310,JANE DOE,ACS150,2/8/2014,P,"Developed Employee Interface""",HRLY,1267305,Project Management - Client Implementation Services,2500,PJM -Project Management,1410000,Tech Services Development,8
110310,MARY DOE,ACS160,2/8/2014,P,EDManage+ CSV data extract,HRLY,1527401,Project Management - Client Implementation Services,2500,PJM -Project Management,1410000,Tech Services Development,8
129084,ROBERT SMITH,ACS80,9/27/2014,P,,PTO,0,Company General Services,1030,"Time Off - PTO, Holiday, Personal Holiday, FTO",1100000,Client Services Technical,40
117592,HARRY JOHNSON,ACS64,5/10/2014,P,"helped penny post AP ""E"" cks",HRLY,1554404,General Financials IT,2120,CCON-Client Conference Call,1100000,Client Services Technical,1.5
110310,MARK WILSON,ACS130,2/8/2014,P,"""Charge Vs Payment""",HRLY,1267305,Project Management - Clinical Implementation Services,2500,PJM -Project Management,1410000,Tech Services Development,8
These same lines need to be converted to look like this:
122749|JOHN DOE|ACS155|7/5/2014|P|SCH/RC Activation Week 2|HRLY|1299577|Scheduler IT|2204|CVISA-Client Activation|1220000|Svcs Clin Implement|34
110310|JANE DOE|ACS150|2/8/2014|P|Developed Employee Interface""|HRLY|1267305|Project Management - Client Implementation Services|2500|PJM -Project Management|1410000|Tech Services Development|8
110310|MARY DOE|ACS160|2/8/2014|P|EDManage+ CSV data extract|HRLY|1527401|Project Management - Client Implementation Services|2500|PJM -Project Management|1410000|Tech Services Development|8
129084|ROBERT SMITH|ACS80|9/27/2014|P||PTO|0|Company General Services|1030|Time Off - PTO, Holiday, Personal Holiday, FTO|1100000|Client Services Technical|40
117592|HARRY JOHNSON|ACS64|5/10/2014|P|helped penny post AP E cks|HRLY|1554404|General Financials IT|2120|CCON-Client Conference Call|1100000|Client Services Technical|1.5
110310|MARK WILSON|ACS130|2/8/2014|P|Charge Vs Payment|HRLY|1267305|Project Management - Clinical Implementation Services|2500|PJM -Project Management|1410000|Tech Services Development|8
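In this CSV, a column whose text contains a comma is wrapped in double quotes, and the regex above accounts for that. But I have found that some fields also contain double quotes in their content. Any double quote inside a field can be removed. In some cases, though, a field begins or ends with a double quote, which produces three double quotes in a row, and I can't simply strip every double quote up front, because they mark the start and end of the fields that contain commas.
What do I need to add to the regex to accomplish this?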
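One way to sidestep the regex entirely is to scan each line character by character and track whether the current position is inside a quoted field. The following is only a minimal sketch of that alternative, not the method from the question. The names `CsvToPipeSketch` and `ConvertLine` are hypothetical, and it assumes that a doubled quote (`""`) inside a quoted field is an escaped literal quote and that no field spans multiple lines.

```vbnet
Imports System
Imports System.Text

Module CsvToPipeSketch
    ' Hypothetical helper: converts one comma-separated line to a pipe-separated line.
    ' Assumption: "" inside a quoted field is an escaped quote. All quote characters,
    ' delimiting or escaped, are dropped from the output.
    Function ConvertLine(line As String) As String
        Dim sb As New StringBuilder()
        Dim inQuotes As Boolean = False
        Dim i As Integer = 0
        While i < line.Length
            Dim c As Char = line(i)
            If c = """"c Then
                If inQuotes AndAlso i + 1 < line.Length AndAlso line(i + 1) = """"c Then
                    i += 1                    ' escaped quote: skip both characters
                Else
                    inQuotes = Not inQuotes   ' opening or closing quote of a field
                End If
            ElseIf c = ","c AndAlso Not inQuotes Then
                sb.Append("|"c)               ' field separator outside quotes
            Else
                sb.Append(c)                  ' ordinary character, including quoted commas
            End If
            i += 1
        End While
        Return sb.ToString()
    End Function

    Sub Main()
        ' The CSV text passed in is: P,"helped penny post AP ""E"" cks",HRLY
        Console.WriteLine(ConvertLine("P,""helped penny post AP """"E"""" cks"",HRLY"))
        ' prints: P|helped penny post AP E cks|HRLY
    End Sub
End Module
```

Because this sketch drops every quote character, the JANE DOE sample line would come out as `Developed Employee Interface` with no trailing quotes. If some literal quotes should survive, the escaped-quote branch is the place to append one instead of skipping it.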
1 Answer
"" should be converted to a single ". Are you sure you want to remove them entirely? - nhahtdh
Couldn't you call csvString = csvString.Replace( ... ) before running the RE? - Alex K.
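Alex K.'s comment leaves the arguments to `Replace` elided, so the following is only one possible reading of it, sketched under my own assumption: remove every doubled quote (`""`) from the line first, so that only the quotes delimiting a field remain, and then run the regex from the question unchanged. The names `PreReplaceSketch` and `ConvertLine` are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PreReplaceSketch
    ' Same pattern as in the question.
    Private ReadOnly Rgx As New Regex(",([^,""]*(?:""[^""]*"")?[^,""]*)(?=,|$)")

    ' Hypothetical helper: converts one line using a pre-replace step plus the regex.
    Function ConvertLine(line As String) As String
        ' Assumption: every "" pair is an escaped quote that can be dropped.
        ' A run of three quotes collapses to the single delimiting quote.
        Dim cleaned As String = line.Replace("""""", "")
        ' Swap field-separating commas for pipes, as in the original method.
        Dim result As String = Rgx.Replace(cleaned, "|$1")
        ' Strip the remaining (delimiting) quotes.
        Return result.Replace(Chr(34), "")
    End Function

    Sub Main()
        ' The CSV text passed in is: 110310,P,"""Charge Vs Payment""",HRLY
        Console.WriteLine(ConvertLine("110310,P,""""""Charge Vs Payment"""""",HRLY"))
    End Sub
End Module
```

With this reading, a field such as `"""Charge Vs Payment"""` is reduced to `"Charge Vs Payment"` before the regex runs, so the pattern sees at most one quoted section per field. It also turns an empty quoted field (`""`) into an empty unquoted one, which gives the same output here. If nhahtdh's point applies and a doubled quote should become a single literal quote instead, this pre-replace approach no longer works as written, since the final step strips all quotes.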