
Mathematica memory management for large arrays

I have come across a weird phenomenon in Mathematica when dealing with large arrays. When I generate a list of all the three-element subsets of another list (so each element is itself a list of 3 elements), I observe that if I extract these elements into three separate arrays, Mathematica uses much less memory to store the data, even though the total number of elements is exactly the same. My question is surely naive: why does this happen?

Here is a minimal example of what I mean, recording the memory used by Mathematica at every step:

n = 100;
memvec = {MemoryInUse[]};

timesubset = Subsets[Range[1, n], {3}];

AppendTo[memvec, MemoryInUse[]];

(*Extract all the first elements*)
t0 = timesubset[[All, 1]];
AppendTo[memvec, MemoryInUse[]];

(*Extract all the second elements*)
t1 = timesubset[[All, 2]];
AppendTo[memvec, MemoryInUse[]];

(*Extract all the third elements*)
t2 = timesubset[[All, 3]];
AppendTo[memvec, MemoryInUse[]];

AppendTo[memvec, MemoryInUse[]];


AppendTo[memvec, MemoryInUse[]];


I feel there is an important concept about how memory is managed that I am missing here.


You can use ByteCount to look at memory usage. ByteCount is not precise (it doesn't take sharing into account), but it will make it easier to understand what is going on.

Let's do this experiment:

In[1]:= list = Subsets[Range[100], {3}];

In[2]:= ByteCount[list]
Out[2]= 19404080

In[3]:= ByteCount[Transpose[list]]
Out[3]= 11642704

Note that Transpose[list] is the same as {list[[All,1]], list[[All,2]], list[[All,3]]}. We get the same result you got: Transpose[list] takes less memory. The reason is that every expression, including every List, has some storage overhead in addition to the elements it contains. When Mathematica stores {1,2,3}, i.e. List[1,2,3], it needs to store more than the elements 1, 2 and 3: it also needs to store the head of the expression (List) and the number of elements it contains.

Now Transpose[list] contains only four lists: the outer one and the three inner ones it holds. That is much less overhead than list needs, since list holds 161700 inner expressions (one per subset, Binomial[100, 3] = 161700).
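A quick sketch to check the two claims above in a notebook: Transpose gives the same lists as extracting the columns with Part, and the two layouts differ enormously in how many inner List expressions they contain.

```mathematica
list = Subsets[Range[100], {3}];

(* list holds one inner List per subset: Binomial[100, 3] of them *)
Length[list]                                        (* 161700 *)

(* Transpose[list] holds just three inner Lists, one per column *)
Length[Transpose[list]]                             (* 3 *)

(* Transpose gives the same result as extracting each column with Part *)
Transpose[list] === Table[list[[All, k]], {k, 3}]   (* True *)
```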

A little practical experimentation shows that on my machine a compound expression with no elements, such as List[], takes 40 bytes. Every element takes an additional 8 bytes for the reference (I'm on a 64-bit system, where pointers are 8 bytes long), plus the storage needed for the element itself. An integer takes 16 bytes. Thus {1} takes 64 bytes: 40 for the compound expression, 16 for the integer 1 and 8 for the pointer to the integer.
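The per-item costs can be measured directly with ByteCount. This is a minimal sketch; the figures depend on platform and version, so no outputs are shown (under the model described above, on a 64-bit machine they would come out as 40, 16 and 8 bytes). The variable names are purely illustrative.

```mathematica
(* Fixed cost of an empty compound expression such as List[] *)
emptyCost = ByteCount[{}];

(* Cost of a single machine integer on its own *)
intCost = ByteCount[1];

(* Marginal cost of one more integer element: pointer plus the integer *)
perElement = ByteCount[{1, 2}] - ByteCount[{1}];

(* The last entry is the pointer size implied by the model *)
{emptyCost, intCost, perElement - intCost}
```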

If you use this information to estimate how much space list and Transpose[list] should take up, you'll get values which are close to the actual ones, though not identical. I'm not sure what the difference is due to.
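Plugging those per-item costs into both layouts makes the estimate concrete. Writing N for the number of subsets, and treating the 40/8/16-byte figures as given (they are measurements from one machine, not guarantees):

```latex
\begin{aligned}
N &= \binom{100}{3} = 161700 \\[4pt]
\texttt{list}:\quad
  &\underbrace{(40 + 8N)}_{\text{outer list}}
   + N\,\underbrace{\bigl(40 + 3\,(8 + 16)\bigr)}_{\text{each }\{a,b,c\}}
   = 40 + 120N = 19\,404\,040 \\[4pt]
\texttt{Transpose[list]}:\quad
  &\underbrace{(40 + 3\cdot 8)}_{\text{outer list}}
   + 3\,\underbrace{\bigl(40 + N\,(8 + 16)\bigr)}_{\text{each column}}
   = 184 + 72N = 11\,642\,584
\end{aligned}
```

These estimates are 40 and 120 bytes below the measured 19404080 and 11642704 bytes respectively, which is the small discrepancy mentioned above.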

This is a good opportunity to talk about packed arrays a bit. While Mathematica's expression format is very general, it is not very efficient either for storage (memory-wise) or for numerical computation. To address this, Mathematica can automatically use an alternate storage format called packed arrays. This only works for proper arrays (i.e. lists for which ArrayQ would return True) that contain only machine-size numbers of the same kind (integers, reals or complexes). A packed array stores the numbers as a flat block of memory plus some extra information about the dimensions of the array. Thus the memory needed to store a numerical array with n elements in packed format is just a few bytes more than n*8 bytes (a machine integer or machine-precision real takes up 8 bytes; a machine complex number takes 16).

You can use some Developer` context functions to test whether arrays are stored in packed format (Developer`PackedArrayQ) and to convert between internal storage formats (Developer`ToPackedArray and Developer`FromPackedArray).

In[4]:= ByteCount[Developer`ToPackedArray[list]]
Out[4]= 3880952

In[5]:= ByteCount[Developer`ToPackedArray@Transpose[list]]
Out[5]= 3880952
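A short sketch of the other Developer` functions mentioned above, assuming list is the Subsets result from In[1]. Packing only changes the internal representation, so the packed and unpacked forms still compare equal with SameQ.

```mathematica
list = Subsets[Range[100], {3}];

(* Is the list as generated already in packed form? *)
Developer`PackedArrayQ[list]

(* Convert to a packed array and confirm the conversion *)
packed = Developer`ToPackedArray[list];
Developer`PackedArrayQ[packed]                               (* True *)

(* Packing does not change the value, only the storage *)
packed === list                                              (* True *)

(* Bytes per element: roughly 8 per machine integer plus a small header *)
N[ByteCount[packed]/(Times @@ Dimensions[packed])]

(* FromPackedArray restores the ordinary expression form *)
Developer`PackedArrayQ[Developer`FromPackedArray[packed]]    (* False *)
```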

A final word about ByteCount and subexpression sharing: ByteCount returns the number of bytes needed to store an expression without any sharing. However, when not using a packed format, Mathematica will usually store s only once in the expression {s, s, s}: even though s appears three times, it is represented as three references to the same memory location. The function Share[] will try to discover repeated subexpressions and optimise their storage to avoid repetition.
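To see sharing and Share[] in action, here is a minimal sketch. It assumes that unpacking a ConstantArray yields separately stored copies of the row {1, 2, 3}; how much memory Share[] actually frees will vary, so no outputs are shown.

```mathematica
(* 10^5 copies of {1, 2, 3}; ConstantArray may return a packed array,
   so unpack it to get ordinary List expressions *)
rows = Developer`FromPackedArray[ConstantArray[{1, 2, 3}, 10^5]];

(* ByteCount ignores sharing, so it reports the unshared size *)
ByteCount[rows]

(* Share[] looks for repeated subexpressions among all stored
   expressions and returns the number of bytes it freed *)
before = MemoryInUse[];
saved = Share[];
{saved, before - MemoryInUse[]}

(* ByteCount is unchanged afterwards, because sharing is not counted *)
ByteCount[rows]
```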

