
performance tuning - Speeding up Import and Export in CSV format


I am handling large numerical data sets in Mathematica. For smaller problems everything worked fine using plain Export and Import with the "CSV" format and no further options.


Now I am facing a much larger data volume, and plain Export and Import are far too slow for the CSV format.


What I want to do: first, export a numerical list of dimensions approximately $1400 \times 260$. Then I perform some calculations outside of Mathematica, and finally I import a CSV file back using Import.



In this question I read how to improve the speed of Export with CSV.


I tried


Export["data.csv", 
ExportString[Transpose[temp], "CSV", "FieldSeparators" -> ", "],
"Table"];

for a toy example of dimensions $9 \times 6$. This could be


temp = RandomReal[{0, 1000}, {9, 6}];

The problem is that I got additional blank lines in my CSV file. How can I avoid those blank lines? I did not have them using plain Export.



Second part of the question: can I use a similar approach to speed up Import for CSV files?


I work with an English locale on Windows 7, using Mathematica 8. My data should be comma-separated.


The result of the above looks like this:


155.9418457227914,427.72566448956945,462.4370455183139,434.230107096781,377.73423736605037,457.7044624877774,229.5721937681028,453.6973831247924,827.5146478718962

656.9573702857699,975.1048399942904,716.715190156526,67.07781324817643,168.78248854317894,863.1953962590844,997.7580107302701,427.94798294100747,565.2955778916687

192.3648037459477,435.0418975785194,126.17228368369842,772.0737083559297,453.73573640921836,957.9178360741387,920.4158275934401,234.75353158374764,162.82606110943834

841.7132070637356,799.1268178998612,931.2448706410551,950.7753472229233,114.01596316796622,145.0771999411104,287.47149951303663,786.9008107323455,99.09420650484662


116.9916885502289,715.7594598282562,970.6252946068753,654.1742185278038,262.3778046629968,200.13980161337577,347.24862854841354,314.5612015073982,241.11046402342163

203.65015448763597,952.1236458849723,578.2673369638862,527.2990305555661,655.1228742370724,318.81372163827496,311.2738362265584,315.97629887850667,514.7676854642548

EDIT: A small example of the data that I want to import (2nd step) can be found here. The true problem size is here.



Answer



Here's a purely Mathematica way to import your data that is much faster than Import:


UPDATE


As Leonid mentioned, the previous code doesn't exactly replicate Import; I was only trying to retrieve the numerical part. Here's an updated version that tries to replicate the output of Import.



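readYourCSV2[file_String?FileExistsQ, n_Integer] := Module[{str = OpenRead[file], data},
  (* read n comma-separated fields per row, each as a raw string *)
  data = ReadList[str, Table[Record, {n}], RecordSeparators -> {",", "\n"}];
  Close[str];
  (* parse the strings while held; a field such as 1.5e-5 parses as 1.5*e - 5, *)
  (* so rewrite that pattern to 1.5*10^-5 before releasing the hold *)
  ReleaseHold[ToExpression[data, InputForm, Hold] /. {Plus[Times[x_, E | e], y_] :> x * 10 ^ y}]
]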

Here, n is the number of columns.
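
As a usage sketch, the column count can be read off the first line of the file instead of being hard-coded. This assumes a purely numeric, comma-separated file without quoted fields or blank lines; the file name is hypothetical.


file = "data.csv"; (* hypothetical file name *)
(* count the fields on the first line to get the number of columns *)
n = Length[StringSplit[First[ReadList[file, String, 1]], ","]];
data = readYourCSV2[file, n];
Dimensions[data]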


UPDATE 2


Now for the Export, here's a fast, again, purely Mathematica way to export in CSV format.


writeYourCSV[file_String, list_List?MatrixQ] :=
 With[{str = OpenWrite[file, PageWidth -> Infinity], len = Length[list[[1]]]},
  (* write each row on one line: every entry in FortranForm, separated by bare commas *)
  Scan[
   Write[str, Sequence @@ (Flatten[Table[{FortranForm[#[[i]]], OutputForm[","]}, {i, len - 1}]]) ~ Join ~ {FortranForm[#[[len]]]}] &,
   list];
  Close[str];
 ]
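
The same row layout can probably be written more compactly with Riffle, which interleaves the separator between the entries of each row. This is only a sketch of an equivalent formulation and has not been timed; the name writeYourCSVRiffle is made up for illustration.


writeYourCSVRiffle[file_String, list_List?MatrixQ] :=
 With[{str = OpenWrite[file, PageWidth -> Infinity]},
  (* Riffle places a bare comma between the FortranForm entries of each row *)
  Scan[Write[str, Sequence @@ Riffle[FortranForm /@ #, OutputForm[","]]] &, list];
  Close[str];
 ]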

This takes less than 10 seconds to write your large data back to CSV format:


writeYourCSV["testcsv.csv", databig] // AbsoluteTiming


{9.921969, Null}
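
One way to check the writer against the reader is a round trip on a small random matrix. This sketch assumes writeYourCSV and readYourCSV2 are defined as above and uses a made-up file name. No particular output is claimed here: if the round trip preserves the values, the dimensions should agree and the maximum deviation should be at or near zero.


m = RandomReal[{0, 1000}, {9, 6}];
writeYourCSV["roundtrip.csv", m]; (* hypothetical file name *)
back = readYourCSV2["roundtrip.csv", Last[Dimensions[m]]];
Dimensions[back] == Dimensions[m]
Max[Abs[back - m]]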



