RAM limitations make Import impractical for large CSV files. Is it possible to stream a CSV file correctly?
str = OpenRead["EU.csv"];
Read[str, "CSV"]
(* Read::readn: Invalid real number found when reading from EU.csv *)
Read@str
(* Read::readt: Invalid input found when reading "lTid, cDealable, CurrencyPair, RateDateTime, RateBid, RateAsk" from EU.csv *)
Close@str
Here's the CSV file, though I think it's properly formatted.
Answer
Here is a function which may help:
Clear[readRows];
(* Read the next n lines from the stream as strings and parse them as a
   table; the condition /; str =!= {} makes the definition fall through
   to the $Failed catch-all when the stream is exhausted *)
readRows[stream_, n_] :=
  With[{str = ReadList[stream, "String", n]},
    ImportString[StringJoin[Riffle[str, "\n"]], "Table"] /; str =!= {}];
readRows[__] := $Failed;
I tested it on your file and it works fine (reading rows in batches, as done here, is much faster than reading one row at a time):
n = 0;
str = OpenRead["C:\\Temp\\EUR_USD_Week1.csv"];
While[readRows[str, 1000] =!= $Failed, n++];
Close[str];
n
(* 82 *)
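For actual processing, rather than just counting batches, you can accumulate a result as each batch arrives, so that no more than one batch is ever in memory. The sketch below computes the average of column 5 (RateBid in this file); the file path and the column index are assumptions for illustration:

(* Sketch: stream the file in batches and accumulate a running
   sum of column 5 (assumed to be RateBid), skipping the header *)
str = OpenRead["C:\\Temp\\EUR_USD_Week1.csv"];
readRows[str, 1]; (* discard the header row *)
{total, count} = {0., 0};
While[(batch = readRows[str, 1000]) =!= $Failed,
  total += Total[batch[[All, 5]]];
  count += Length[batch]];
Close[str];
total/count (* average bid over the whole file *)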
By the way, speaking of the practicality of Import - I agree, but do read this answer: it is based on the same idea as the code above, and IMO it makes importing whole files quite practical. In particular, using the readTable function from there, I get your file read in its entirety in under 3 seconds on my machine:
In[64]:= readTable["C:\\Temp\\EUR_USD_Week1.csv",2000]//Short//AbsoluteTiming
Out[64]= {2.4775391, {{lTid, cDealable, CurrencyPair, RateDateTime, RateBid, RateAsk},
  <<81172>>, {1385715072, D\[Ellipsis] SD, 2011-01-07, \[Ellipsis]}}}
with a very decent memory usage:
MaxMemoryUsed[]
(* 71652808 *)
meaning about 50 MB of net usage (after subtracting the startup memory usage) for this particular example. You can tweak the second parameter to trade run time for memory efficiency. See the linked post for more details.
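The readTable function itself is defined in the linked answer; the essential idea can be re-sketched (this is a hypothetical reconstruction, not the exact code from that post) as reading the file in large chunks with readRows and joining the pieces, where the chunk size is the second parameter being tweaked above:

(* Hypothetical sketch of the batch-import idea: read the whole
   file in chunks of chunkSize rows, then join the pieces *)
readTableSketch[file_String, chunkSize_Integer] :=
  Module[{str = OpenRead[file], chunks = {}, batch},
    While[(batch = readRows[str, chunkSize]) =!= $Failed,
      AppendTo[chunks, batch]];
    Close[str];
    Join @@ chunks]

Larger chunk sizes mean fewer, faster calls to ImportString at the cost of more peak memory per chunk.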