Skip to main content

csv format - Import vs ReadList CSV file with Currency Values


A testing file with the specified format can be downloaded from here


Let's read a few lines from this file


(recLines = ReadList[csvFileName, Record, 3]) // TableForm

SalesOrderID,ProductID,ProductName,OrderDate,ShipDate,Revenue
51217,779,Soft Cushion,5/7/2014,11/7/2014,"$2,400"
51217,788,Soft Cushion,5/7/2014,11/7/2014,"$2,400"

ls[[2]] // FullForm

"51217,779,Soft Cushion,5/7/2014,11/7/2014,\"$2,400\""

Had I used the Import I would have got


recData = Import[csvFileName];
recDate[[1;;3]]//TableForm

SalesOrderID ProductID ProductName OrderDate ShipDate Revenue
51217 779 Soft Cushion 5/7/2014 11/7/2014 2400
51217 788 Soft Cushion 5/7/2014 11/7/2014 2400


Head /@ csvData[[3]]
{Integer, Integer, String, String, String, Integer}

This is exactly the output I would like to get, but instead of using Import I want to use ReadList.


I have read various posts related to ReadList and this is the closest I have found to an answer that will use ReadList


rl = ReadList[csvFileName, Word, WordSeparators->{","}, RecordSeparators->{"\n"}, RecordLists->True]

rl[[1 ;; 3]] // TableForm

SalesOrderID ProductID ProductName OrderDate ShipDate Revenue

51217 779 Soft Cushion 5/7/2014 11/7/2014 "$2 400"
51217 788 Soft Cushion 5/7/2014 11/7/2014 "$2 400"

rl[[3]] // FullForm

List["51217","788","Soft Cushion","5/7/2014","11/7/2014","\"$2","400\""]

Head /@ rl[[3]]
{String, String, String, String, String, String, String}


Could you please continue this to a full answer or perhaps share a better solution with ReadList ? For example, is it possible to read and transform at the same time with ReadList, by specifying the header, i.e. types of the record items, and get exactly the same result as that of Import ?


I also have this challenge question for the experts of the language:


ReadList is a memory efficient and fast method of reading and parsing CSV files. But it cannot detect automatically the data types of the record items, something that Import does AUTOMAGICALLY. Wouldn't be possible to implement somehow a ReadList that can also detect the header format?


If not, suppose we do know what is the format of the records, and suppose it does not change. There are all these cryptic types and options of ReadList to assist you in getting the parsing of data items right. Can somebody explain me in the file I specified how I can use these types and options to achieve reading lists same way like Import function ?



Answer



As long as you are using ReadList to read in the entire file, and then acting on each line, you may as well just feed the result of ReadLine into ImportString.


test = ImportString[
StringJoin[
Riffle[ReadList["shoes_revenue.csv", String], "\n"]
],

"CSV"]; // AbsoluteTiming
(*{2.7918, Null}*)

which is marginally faster than this answer


conv = {ToExpression@#[[1]], 
ToExpression@#[[2]], #[[3]], #[[4]], #[[5]],
ToExpression@StringTake[StringJoin@#[[6 ;;]], {3, -2}]} &;
(wrdlst =
conv /@ (ReadList["shoes_revenue.csv", Word,
WordSeparators -> {","}, RecordSeparators -> {"\n"},

RecordLists -> True])[[2 ;;]];
PrependTo[wrdlst,
Flatten@StringSplit[ReadList["shoes_revenue.csv", Record, 1], ","]];
) // AbsoluteTiming
(* {3.00567, Null} *)

Comments

Popular posts from this blog

front end - keyboard shortcut to invoke Insert new matrix

I frequently need to type in some matrices, and the menu command Insert > Table/Matrix > New... allows matrices with lines drawn between columns and rows, which is very helpful. I would like to make a keyboard shortcut for it, but cannot find the relevant frontend token command (4209405) for it. Since the FullForm[] and InputForm[] of matrices with lines drawn between rows and columns is the same as those without lines, it's hard to do this via 3rd party system-wide text expanders (e.g. autohotkey or atext on mac). How does one assign a keyboard shortcut for the menu item Insert > Table/Matrix > New... , preferably using only mathematica? Thanks! Answer In the MenuSetup.tr (for linux located in the $InstallationDirectory/SystemFiles/FrontEnd/TextResources/X/ directory), I changed the line MenuItem["&New...", "CreateGridBoxDialog"] to read MenuItem["&New...", "CreateGridBoxDialog", MenuKey["m", Modifiers-...

How to thread a list

I have data in format data = {{a1, a2}, {b1, b2}, {c1, c2}, {d1, d2}} Tableform: I want to thread it to : tdata = {{{a1, b1}, {a2, b2}}, {{a1, c1}, {a2, c2}}, {{a1, d1}, {a2, d2}}} Tableform: And I would like to do better then pseudofunction[n_] := Transpose[{data2[[1]], data2[[n]]}]; SetAttributes[pseudofunction, Listable]; Range[2, 4] // pseudofunction Here is my benchmark data, where data3 is normal sample of real data. data3 = Drop[ExcelWorkBook[[Column1 ;; Column4]], None, 1]; data2 = {a #, b #, c #, d #} & /@ Range[1, 10^5]; data = RandomReal[{0, 1}, {10^6, 4}]; Here is my benchmark code kptnw[list_] := Transpose[{Table[First@#, {Length@# - 1}], Rest@#}, {3, 1, 2}] &@list kptnw2[list_] := Transpose[{ConstantArray[First@#, Length@# - 1], Rest@#}, {3, 1, 2}] &@list OleksandrR[list_] := Flatten[Outer[List, List@First[list], Rest[list], 1], {{2}, {1, 4}}] paradox2[list_] := Partition[Riffle[list[[1]], #], 2] & /@ Drop[list, 1] RM[list_] := FoldList[Transpose[{First@li...

plotting - How to draw lines between specified dots on ListPlot?

I would like to create a plot where I have unconnected dots and some connected. So far, I have figured out how to draw the dots. My code is the following: ListPlot[{{1, 1}, {2, 2}, {3, 3}, {4, 4}, {1, 4}, {2, 5}, {3, 6}, {4, 7}, {1, 7}, {2, 8}, {3, 9}, {4, 10}, {1, 10}, {2, 11}, {3, 12}, {4,13}, {2.5, 7}}, Ticks -> {{1, 2, 3, 4}, None}, AxesStyle -> Thin, TicksStyle -> Directive[Black, Bold, 12], Mesh -> Full] I have thought using ListLinePlot command, but I don't know how to specify to the command to draw only selected lines between the dots. Do have any suggestions/hints on how to do that? Thank you. Answer One possibility would be to use Epilog with Line : ListPlot[ {{1, 1}, {2, 2}, {3, 3}, {4, 4}, {1, 4}, {2, 5}, {3, 6}, {4, 7}, {1, 7}, {2, 8}, {3, 9}, {4, 10}, {1, 10}, {2, 11}, {3, 12}, {4, 13}, {2.5, 7}}, Ticks -> {{1, 2, 3, 4}, None}, AxesStyle -> Thin, TicksStyle -> Directive[Black, Bold, 12], Mesh -> Full, Epilog -> { Line[ ...