Skip to main content

Mathematica memory management for large arrays


I have come across a weird phenomenon in Mathematica when dealing with large arrays. When generating a list with all the possible subsets of three elements of another list (thus having elements which are lists of 3 elements), I have observed that if you extract these elements in three separate arrays, Mathematica uses much less memory to store the data, even if the total number of elements we have is exactly the same. My question is surely naive: why does this happen?


This is a minimal code that sets an example of what I'm saying, recording the memory used by Mathematica in every step:


n = 100;
memvec = {MemoryInUse[]};

timesubset = Subsets[Range[1, n], {3}];

AppendTo[memvec, MemoryInUse[]];

(*Extract all the first elements*)
t0 = timesubset[[All, 1]];
AppendTo[memvec, MemoryInUse[]];

(*Extract all the second elements*)
t1 = timesubset[[All, 2]];
AppendTo[memvec, MemoryInUse[]];


(*Extract all the third elements*)
t2 = timesubset[[All, 3]];
AppendTo[memvec, MemoryInUse[]];

Clear[timesubset];
AppendTo[memvec, MemoryInUse[]];

Clear[t1];
Clear[t2];
Clear[t0];

AppendTo[memvec, MemoryInUse[]];

ListLinePlot[memvec]

I feel there is an important concept on how is the memory managed that I am missing here.



Answer



You can use ByteCount to look at the memory usage. ByteCount is not precise (doesn't take into account sharing), but it will make it easier to understand what is going on.


Let's do this experiment:


In[1]:= list = Subsets[Range[100], {3}];


In[2]:= ByteCount[list]
Out[2]= 19404080

In[3]:= ByteCount[Transpose[list]]
Out[3]= 11642704

Note that Transpose[list] is the same as {list[[All,1]], list[[All,2]], list[[All,3]]}. We get the same result you got: Transpose[list] takes less memory. The reason for this is that every expression, including Lists has some storage overhead in addition to the elements it contains. When Mathematica stores {1,2,3}, i.e. List[1,2,3], it needs to store more than the elements 1, 2 and 3. It need to store the head of the expression, i.e. List, and the number of elements it contains.


Now Transpose[list] only contains four lists. The outer one, and the three inner ones it holds. This is much less overhead than what's necessary for list which has 161700 inner expressions.




A little practical experimentation shows that on my machine a compound expression with no elements, such as List[], takes 40 bytes. Every element takes an additional 8 bytes for the reference (I'm on a 64 it system where pointers are 8 byte long), plus the storage needed for an element itself. An integer takes 16 bytes. Thus {1} will take 64 bytes, out of which 40 are for the compound expression, 16 for the integer 1 and 8 for the pointer to the integer.



If you use this information to estimate how much space list and Transpose[list] should take up, you'll get values which are close to the actual ones, though not identical. I'm not sure what the difference is due to.




This is a good opportunity to talk about packed arrays a bit. While Mathematica's expression format is very general, it is not too efficient either for storage (memory wise) or for numerical computations. To solve this problem, Mathematica can automatically use an alternate storage format called packed arrays. This only works for proper arrays (i.e. lists for which ArrayQ would return True) that contain only numbers of the same kind (integers, reals, complexes). It stores the numbers as a flat array in memory with extra information about the dimensions of the array. Thus the memory needed to store a numerical array with n elements in packed format is just a few bytes more than n*8 bytes (note that a machine precision number takes up 8 bytes).


You can use some Developer` context functions to test if arrays are stored in a packed format (Developer`PackedArrayQ) and to convert between internal storage formats (ToPackedArray and FromPackedArray).


In[4]:= ByteCount[Developer`ToPackedArray[list]]
Out[4]= 3880952

In[5]:= ByteCount[Developer`ToPackedArray@Transpose[list]]
Out[5]= 3880952




A final word about ByteCount and subexpression sharing: ByteCount returns the number of bytes needed to store an expression without any sharing. However, when not using a packed format, Mathematica will usually only store s once in the expression {s, s, s}. Even though s is present three times, these are represented as three references to the same memory location. The function Share[] will try to discover repeated subexpressions and optimise their storage to avoid repetition.


Comments

Popular posts from this blog

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]

plotting - Mathematica: 3D plot based on combined 2D graphs

I have several sigmoidal fits to 3 different datasets, with mean fit predictions plus the 95% confidence limits (not symmetrical around the mean) and the actual data. I would now like to show these different 2D plots projected in 3D as in but then using proper perspective. In the link here they give some solutions to combine the plots using isometric perspective, but I would like to use proper 3 point perspective. Any thoughts? Also any way to show the mean points per time point for each series plus or minus the standard error on the mean would be cool too, either using points+vertical bars, or using spheres plus tubes. Below are some test data and the fit function I am using. Note that I am working on a logit(proportion) scale and that the final vertical scale is Log10(percentage). (* some test data *) data = Table[Null, {i, 4}]; data[[1]] = {{1, -5.8}, {2, -5.4}, {3, -0.8}, {4, -0.2}, {5, 4.6}, {1, -6.4}, {2, -5.6}, {3, -0.7}, {4, 0.04}, {5, 1.0}, {1, -6.8}, {2, -4.7}, {3, -1....

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...