I have come across a weird phenomenon in Mathematica when dealing with large arrays. When generating a list with all the possible subsets of three elements of another list (thus having elements which are lists of 3 elements), I have observed that if you extract these elements in three separate arrays, Mathematica uses much less memory to store the data, even if the total number of elements we have is exactly the same. My question is surely naive: why does this happen?
This is a minimal code that sets an example of what I'm saying, recording the memory used by Mathematica in every step:
n = 100;
memvec = {MemoryInUse[]};
timesubset = Subsets[Range[1, n], {3}];
AppendTo[memvec, MemoryInUse[]];
(*Extract all the first elements*)
t0 = timesubset[[All, 1]];
AppendTo[memvec, MemoryInUse[]];
(*Extract all the second elements*)
t1 = timesubset[[All, 2]];
AppendTo[memvec, MemoryInUse[]];
(*Extract all the third elements*)
t2 = timesubset[[All, 3]];
AppendTo[memvec, MemoryInUse[]];
Clear[timesubset];
AppendTo[memvec, MemoryInUse[]];
Clear[t1];
Clear[t2];
Clear[t0];
AppendTo[memvec, MemoryInUse[]];
ListLinePlot[memvec]
I feel there is an important concept on how is the memory managed that I am missing here.
Answer
You can use ByteCount
to look at the memory usage. ByteCount
is not precise (doesn't take into account sharing), but it will make it easier to understand what is going on.
Let's do this experiment:
In[1]:= list = Subsets[Range[100], {3}];
In[2]:= ByteCount[list]
Out[2]= 19404080
In[3]:= ByteCount[Transpose[list]]
Out[3]= 11642704
Note that Transpose[list]
is the same as {list[[All,1]], list[[All,2]], list[[All,3]]}
. We get the same result you got: Transpose[list]
takes less memory. The reason for this is that every expression, including List
s has some storage overhead in addition to the elements it contains. When Mathematica stores {1,2,3}
, i.e. List[1,2,3]
, it needs to store more than the elements 1
, 2
and 3
. It need to store the head of the expression, i.e. List
, and the number of elements it contains.
Now Transpose[list]
only contains four lists. The outer one, and the three inner ones it holds. This is much less overhead than what's necessary for list
which has 161700 inner expressions.
A little practical experimentation shows that on my machine a compound expression with no elements, such as List[]
, takes 40 bytes. Every element takes an additional 8 bytes for the reference (I'm on a 64 it system where pointers are 8 byte long), plus the storage needed for an element itself. An integer takes 16 bytes. Thus {1}
will take 64 bytes, out of which 40 are for the compound expression, 16 for the integer 1
and 8 for the pointer to the integer.
If you use this information to estimate how much space list
and Transpose[list]
should take up, you'll get values which are close to the actual ones, though not identical. I'm not sure what the difference is due to.
This is a good opportunity to talk about packed arrays a bit. While Mathematica's expression format is very general, it is not too efficient either for storage (memory wise) or for numerical computations. To solve this problem, Mathematica can automatically use an alternate storage format called packed arrays. This only works for proper arrays (i.e. lists for which ArrayQ
would return True
) that contain only numbers of the same kind (integers, reals, complexes). It stores the numbers as a flat array in memory with extra information about the dimensions of the array. Thus the memory needed to store a numerical array with n
elements in packed format is just a few bytes more than n*8
bytes (note that a machine precision number takes up 8 bytes).
You can use some Developer`
context functions to test if arrays are stored in a packed format (Developer`PackedArrayQ
) and to convert between internal storage formats (ToPackedArray
and FromPackedArray
).
In[4]:= ByteCount[Developer`ToPackedArray[list]]
Out[4]= 3880952
In[5]:= ByteCount[Developer`ToPackedArray@Transpose[list]]
Out[5]= 3880952
A final word about ByteCount
and subexpression sharing: ByteCount
returns the number of bytes needed to store an expression without any sharing. However, when not using a packed format, Mathematica will usually only store s
once in the expression {s, s, s}
. Even though s
is present three times, these are represented as three references to the same memory location. The function Share[]
will try to discover repeated subexpressions and optimise their storage to avoid repetition.
Comments
Post a Comment