performance tuning - Efficient lazy weak compositions

In Mathematica all weak k-compositions of integer n can be generated using permutations of integer partitions:

ClearAll[weakCompositionsPermPart]
weakCompositionsPermPart[n_Integer, k_Integer?NonNegative] :=
    Join @@ Permutations /@ IntegerPartitions[n, {k}, Range[0, n]]

But since their number grow very rapidly as Binomial[n + k - 1, n] we can easily run out of memory when trying to use weak compositions of large numbers.

Problem of lack of memory could be solved if we could generate weak compositions in chunks. So here's my question.

How can we generate weak k-compositions in chunks of given length?

IntegerPartitions accepts fourth argument with standard sequence specification, using it we can get partitions in chunks. Combining this with Streaming` module we can easily create lazy integer partitions:

Needs["Streaming`"]

ClearAll[lazyIntegerPartitions]
lazyIntegerPartitions[
    n_Integer, kspec_:All, sspec_:All, chunkSize:_Integer?Positive:100000
] :=
    Module[{ctr = 0, active = False},
        Check[

            Quiet[
                IntegerPartitions[n, kspec, sspec, {1}],
                IntegerPartitions::take
            ],
            Return[$Failed, Module]
        ];
        LazyListCreate[
            IteratorCreate[
                ListIterator,
                (active = True) &,

                With[
                    {taken = 
                        Quiet[
                            IntegerPartitions[
                                n, kspec, sspec, {ctr + 1, ctr + chunkSize}
                            ],
                            IntegerPartitions::take
                        ]
                    }
                    ,

                    ctr += Length[taken];
                    taken
                ] &,
                TrueQ[active] &,
                Remove[active, ctr] &
            ],
            chunkSize
        ]
    ]

Basic usage. Lazy list of partitions of 10 to, up to 7, elements containing only 1, 3, 5 and 7, taken in chunks of length 3:

lazyIntegerPartitions[10, 7, {1, 3, 5, 7}, 3]
(* « LazyList[{7,3},{7,1,1,1},{5,5},...] » *)

Scan[Print, %]
(* {7,3}
   {7,1,1,1}
   {5,5}
   {5,3,1,1}
   {5,1,1,1,1,1}

   {3,3,3,1}
   {3,3,1,1,1,1} *)

%% // Normal
(* {{7, 3}, {7, 1, 1, 1}, {5, 5}, {5, 3, 1, 1}, {5, 1, 1, 1, 1, 1},
    {3, 3, 3, 1}, {3, 3, 1, 1, 1, 1}} *)

Some basic tests that whole lazy lists are the same as non-lazy lists of integer partitions:

And @@ Table[
    SameQ[

        lazyIntegerPartitions[n] // Normal, 
        IntegerPartitions[n]
    ],
    {n, 30}
]
(* True *)

And @@ Table[
    SameQ[
        lazyIntegerPartitions[n, {2, 3}] // Normal, 

        IntegerPartitions[n, {2, 3}]
    ],
    {n, 30}
]
(* True *)

And @@ Table[
    SameQ[
        lazyIntegerPartitions[n, All, {1, 4, 5, 6, 7, 12}, 5] // Normal,
        IntegerPartitions[n, All, {1, 4, 5, 6, 7, 12}]

    ],
    {n, 30}
]
(* True *)

Now we can map Permutations over lazy list.

lazyIntegerPartitions[5, {3}, Range[0, 5]]
(* « LazyList[{5,0,0},{4,1,0},{3,2,0},{3,1,1},{2,2,1},...] » *)

Permutations /@ %

(* « LazyList[{{5,0,0},{0,5,0},{0,0,5}},{{4,1,0},{4,0,1},{1,4,0},{1,0,4},{0,4,1},{0,1,4}},{{3,2,0},{3,0,2},{2,3,0},{2,0,3},{0,3,2},{0,2,3}},{{3,1,1},{1,3,1},{1,1,3}},{{2,2,1},{2,1,2},{1,2,2}},...] » *)

Scan[Print, %]
(* {{5,0,0},{0,5,0},{0,0,5}}
   {{4,1,0},{4,0,1},{1,4,0},{1,0,4},{0,4,1},{0,1,4}}
   {{3,2,0},{3,0,2},{2,3,0},{2,0,3},{0,3,2},{0,2,3}}
   {{3,1,1},{1,3,1},{1,1,3}}
   {{2,2,1},{2,1,2},{1,2,2}} *)

And it works, but since Permutations don't accept sequence specification as argument we must generate all permutations at once. So even if we generate integer partitions in chunks of length 1, we still need to store, at once, in memory all permutations of single partition, which quickly becomes to large to fit my memory.

For example in partitions of 12 to 17 elements:

lazyIntegerPartitions[12, {17}, Range[0, 12], 1]
(* « LazyList[{12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},...] » *)

Although there are only 77 partitions, which we take one by one:

% // Length
(* 77 *)

One of them has 4 084 080 permutations, that need to be stored in memory at once:

Length /@ Permutations /@ %% // Max

(* 4 084 080 *)

That's why for large numbers this solution is not sufficient.

Clean up cache after playing with Streaming`:

Scan[LazyListDestroy, LazyLists[]]

Answer

Here is slightly modified version of algorithm used in Combinatorica`NextComposition converted to a LibraryFunction.

Needs["CCompilerDriver`"]
"

#include \"WolframLibrary.h\"

DLLEXPORT mint WolframLibrary_getVersion() {
    return WolframLibraryVersion;
}
DLLEXPORT int WolframLibrary_initialize(WolframLibraryData libData) {
    return LIBRARY_NO_ERROR;
}
DLLEXPORT void WolframLibrary_uninitialize(WolframLibraryData libData) {}


DLLEXPORT int nextWeakCompositionsChunk(
        WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res
) {
    // number which compositions are calculated
    mint n = MArgument_getInteger(Args[0]);

    // length of composition
    mint k = MArgument_getInteger(Args[1]);

    // \" current \" composition tensor

    MTensor compositionT = MArgument_getMTensor(Args[2]);
    // actual data of \" current \" composition
    mint* composition = libData->MTensor_getIntegerData(compositionT);

    // one-dimentional, one-element tensor holding index of last non-zero 
    // element in \"current\" composition
    MTensor lastNonZeroIndexT = MArgument_getMTensor(Args[3]);
    // pointer to index of last non-zero element in \"current\" composition
    mint* lastNonZeroIndex = libData->MTensor_getIntegerData(lastNonZeroIndexT);


    // number of compositions in returned chunk
    mint chunkSize = MArgument_getInteger(Args[4]);

    // dimensions of returned chunk tensor
    mint chunkDims[2] = {chunkSize, k};
    // 2 dimentional tensor with chunk of compositions to be returned
    MTensor chunkT;
    libData->MTensor_new(MType_Integer, 2, chunkDims, &chunkT);
    // actual data of the chunk tensor
    mint* chunk = libData->MTensor_getIntegerData(chunkT);


    // index enumerating compositions in chunk
    mint i;
    // index enumerating elements of composition
    mint j;

    for (i = 0; i < chunkSize; i++) {
        // Moving to next composition does basically this:
        // {..., x, y} -> {..., x+1, y-1} if last element is non-zero,
        // {..., x, y, 0, ..., 0} -> {..., x+1, 0, 0, ..., y-1}

        // if last element is zero and y is last non-zero element.

        ++(composition[*lastNonZeroIndex - 1]);

        if (*lastNonZeroIndex == k -1) {
            --(composition[*lastNonZeroIndex]);
        } else {
            composition[k - 1] = composition[*lastNonZeroIndex] - 1;
            composition[*lastNonZeroIndex] = 0;
        }


        // If last element is zero it means we moved last non-zero element to
        // the left. If last element is non zero, then, obviously, it is the
        // last non-zero element.
        if (composition[k - 1] == 0) {
            --(*lastNonZeroIndex);
        } else {
            *lastNonZeroIndex = k - 1;
        }


        // Copy \"current\" composition to i-th row of chunk.
        for (j = 0; j < k; j++) {
            chunk[i*k + j] = composition[j];
        }
    }

    // Return control over composition and lastNonZeroIndex tensors back to
    // Wolfram Language.
    libData->MTensor_disown(lastNonZeroIndexT);
    libData->MTensor_disown(compositionT);


    // Set chunk tensor as returned value of LibraryFunction.
    MArgument_setMTensor(Res, chunkT);

    return LIBRARY_NO_ERROR;
}
";
CreateLibrary[%, "weakCompositions"]

ClearAll[nextWeakCompositionsChunk]

nextWeakCompositionsChunk = 
    LibraryFunctionLoad[
        %%, 
        "nextWeakCompositionsChunk",
        {Integer, Integer, {Integer, 1, "Shared"}, {Integer, 1, "Shared"}, Integer},
        {Integer, 2}
    ]

It returns chunk of weak k-compositions of n, in lexicographic order starting from composition after given one. It accepts 5 arguments: numbers n and k, list of integers representing "previous" composition, list containing one integer being index (in C sense i.e. starting from 0) of last non-zero element in "previous" composition, last argument is number of compositions in returned chunk.

Besides returning chunk of compositions nextWeakCompositionsChunk has a side effect. "Previous" composition and list containing index of last non-zero element are passed as "Shared" tensors. They are modified inside nextWeakCompositionsChunk function and they stay modified. For example get 3 weak 4-compositions of 5 starting from one after {1, 3, 0, 1}:

ClearAll[currentComposition, lastNonZeroIndex]
currentComposition = {1, 3, 0, 1} // Developer`ToPackedArray;
lastNonZeroIndex = {3} // Developer`ToPackedArray;
nextWeakCompositionsChunk[5, 4, currentComposition, lastNonZeroIndex, 3]
(* {{1, 3, 1, 0}, {1, 4, 0, 0}, {2, 0, 0, 3}} *)
currentComposition
(* {2, 0, 0, 3} *)
lastNonZeroIndex
(* {3} *)

To get first composition we need to pass a fake composition that implemented algorithm will change to the first one. This "fake one before first composition" is always of form {0, ..., 0, -1, n+1}. For example to get all 10 weak 3-compositions of 3 we use:

ClearAll[currentComposition, lastNonZeroIndex]
currentComposition = {0, -1, 4} // Developer`ToPackedArray;
lastNonZeroIndex = {2} // Developer`ToPackedArray;
nextWeakCompositionsChunk[3, 3, currentComposition, lastNonZeroIndex, 10]
(* {{0, 0, 3}, {0, 1, 2}, {0, 2, 1}, {0, 3, 0}, {1, 0, 2}, {1, 1, 1}, {1, 2, 0}, {2, 0, 1}, {2, 1, 0}, {3, 0, 0}} *)
currentComposition
(* {3, 0, 0} *)
lastNonZeroIndex
(* {0} *)

This is a "low level" function (in package it would be private) that does no tests of consistency of given arguments. It is intended to be used only through functions defined in next sections. Passing wrong arguments can result in kernel crash.

Using our low level function we can define a function that returns all weak compositions.

ClearAll[weakCompositions]
weakCompositions[n_Integer?NonNegative, 0] := {}
weakCompositions[n_Integer?NonNegative, 1] := {{n}}
weakCompositions[n_Integer?NonNegative, k_Integer /; k >= 2] :=
    nextWeakCompositionsChunk[
        n,

        k,
        Join[ConstantArray[0, k - 2], {-1, n + 1}] // Developer`ToPackedArray,
        {k - 1} // Developer`ToPackedArray,
        Binomial[n + k - 1, n]
    ]

For example get all weak 3-compositions of 2:

weakCompositions[2, 3]
(* {{0, 0, 2}, {0, 1, 1}, {0, 2, 0}, {1, 0, 1}, {1, 1, 0}, {2, 0, 0}} *)

Test that we get same output as sorted Permutations of IntegerPartitions:

And @@ Flatten @ Table[
    SameQ[
        weakCompositions[n, k],
        weakCompositionsPermPart[n, k] // Sort
    ],
    {n, 12}, {k, 12}
]
(* True *)

weakCompositions is faster and more memory efficient than combination of built-ins.

Below are some benchmarks to prove it.

Since Joining of permutations of partitions can double memory usage, and in some cases is not needed, here we use function that doesn't join them.

ClearAll[weakCompositionsPermPartNonJoined]
weakCompositionsPermPartNonJoined[n_, k_] :=
    Permutations /@ IntegerPartitions[n, {k}, Range[0, n]]

Some helper functions:

ClearAll[$functionsList, fixedNFunctionsList, fixedKFunctionsList]
$functionsList = {weakCompositionsPermPartNonJoined, weakCompositions};

fixedNFunctionsList[n_] := Function[func, func[n, #] &] /@ $functionsList
fixedKFunctionsList[k_] := Function[func, func[#, k] &] /@ $functionsList

We're benchmarking both time and memory usage using my timeAndMemoryBenchmarkPlots function.

Fixed n, variable k:

timeAndMemoryBenchmarkPlots[fixedNFunctionsList[2], Identity, {1, 2, 3, 4, 5, 8, 11, 15, 21, 29, 41, 58, 81, 114, 160, 225, 315, 442, 620, 870}, TimeConstraint -> Infinity]
timeAndMemoryBenchmarkPlots[fixedNFunctionsList[8], Identity, {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25}, TimeConstraint -> Infinity]
timeAndMemoryBenchmarkPlots[fixedNFunctionsList[32], Identity, Range[8], TimeConstraint -> Infinity]

Fixed k, variable n:

timeAndMemoryBenchmarkPlots[fixedKFunctionsList[2], Identity, {1, 2, 4, 7, 13, 25, 47, 90, 171, 324, 617, 1172, 2229, 4237, 8054, 15312, 29109, 55338, 105203, 200000}, TimeConstraint -> Infinity]
timeAndMemoryBenchmarkPlots[fixedKFunctionsList[8], Identity, {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 22, 25, 28, 32, 36}, TimeConstraint -> Infinity]
timeAndMemoryBenchmarkPlots[fixedKFunctionsList[32], Identity, Range[6], TimeConstraint -> Infinity]

And now the lazy approach:

Needs["Streaming`"]


ClearAll[lazyWeakCompositions]

lazyWeakCompositions[
    n_Integer?NonNegative, 0, chunkSize : _Integer?Positive : 10^5,
    opts : OptionsPattern[]
] :=
    LazyListCreate[{}, chunkSize, opts]

lazyWeakCompositions[
    n_Integer?NonNegative, 1, chunkSize : _Integer?Positive : 10^5,

    opts : OptionsPattern[]
] :=
    LazyListCreate[{{n}}, chunkSize, opts]

lazyWeakCompositions[
    n_Integer?NonNegative, k_Integer /; k >= 2,
    chunkSize : _Integer?Positive : 10^5, opts : OptionsPattern[]
] :=
    With[{length = Binomial[n + k - 1, n]},
        Module[

            {
                active = False,
                compositionsLeft = length,
                prevComposition = 
                    Join[ConstantArray[0, k - 2], {-1, n + 1}] // 
                        Developer`ToPackedArray,
                lastNonZeroIndex = {k - 1} // Developer`ToPackedArray
            }
            ,
            LazyListCreate[

                IteratorCreate[
                    ListIterator,
                    (active = True) &,
                    With[{realChunkSize = Min[chunkSize, compositionsLeft]},
                        If[realChunkSize === 0,
                            {}
                        ,(* else *)
                            compositionsLeft -= realChunkSize;
                            nextWeakCompositionsChunk[
                                n, k, prevComposition, lastNonZeroIndex,

                                realChunkSize
                            ]
                        ]
                    ] &,
                    TrueQ[active] &,
                    Remove[
                        active, compositionsLeft,
                        prevComposition, lastNonZeroIndex
                    ] &
                ],

                chunkSize,
                opts,
                "Length" -> length,
                "FiniteList" -> True
            ]
        ]
    ]

Again all weak 3-compositions of 2, but this time taken in chunks of 2:

lazyWeakCompositions[2, 3, 2]

(* « LazyList[{0,0,2},{0,1,1},...] » *)
Scan[Print, %]
(* {0,0,2}
   {0,1,1}
   {0,2,0}
   {1,0,1}
   {1,1,0}
   {2,0,0} *)

Test that Normalized lazy compositions are the same as sorted Permutations of IntegerPartitions:

And @@ Flatten @ Table[
    SameQ[
        lazyWeakCompositions[n, k] // Normal,
        weakCompositionsPermPart[n, k] // Sort
    ],
    {n, 12}, {k, 12}
]
(* True *)

We will benchmark scanning of all weak compositions in three ways:

All compositions generated at once:

ClearAll[scanWeakCompositionsPermPartNonJoined]
scanWeakCompositionsPermPartNonJoined[n_, k_] :=
    LazyListBlock@Scan[Identity, weakCompositionsPermPartNonJoined[n, k], {2}]

Compositions generated by permuting lazy partitions. Partitions are taken one by one, so in single chunk we have all permutations of given partition.

ClearAll[lazyWeakCompositionsPermPart, scanLazyWeakCompositionsPermPart]

lazyWeakCompositionsPermPart[n_, k_] := 
    Permutations /@ lazyIntegerPartitions[n, {k}, Range[0, n], 1]

scanLazyWeakCompositionsPermPart[n_, k_] :=
    LazyListBlock@Scan[Scan[Identity], lazyWeakCompositionsPermPart[n, k]]

Compositions generated using our lazyWeakCompositions function in chunks of default length: 10^5.

ClearAll[scanLazyWeakCompositions]
scanLazyWeakCompositions[n_, k_, chunkSize_: 10^5] :=

    LazyListBlock@Scan[Identity, lazyWeakCompositions[n, k, chunkSize]]

Since lazy lists cache their values and benchmark measures running time multiple times, we clear cache after each usage, using LazyListBlock (as per comment by Leonid).

$functionsList = {scanWeakCompositionsPermPartNonJoined, scanLazyWeakCompositionsPermPart, scanLazyWeakCompositions};
timeAndMemoryBenchmarkPlots[fixedKFunctionsList[8], Identity, {1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 14, 16, 18, 21, 24, 28, 32, 37, 42}, TimeConstraint -> Infinity, MemoryConstraint -> 2^31]
timeAndMemoryBenchmarkPlots[fixedKFunctionsList[16], Identity, Range[16], TimeConstraint -> Infinity, MemoryConstraint -> 2^31]
timeAndMemoryBenchmarkPlots[fixedKFunctionsList[32], Identity, Range[8], TimeConstraint -> Infinity, MemoryConstraint -> 2^31]

We see that speed for large values is comparable. Memory usage when generating all compositions at once grows fast. When using lazy integer partitions memory usage grows slower, but still grows with number of permutations. When using lazyWeakCompositions memory usage grows until it reaches size of one chunk, then it's (almost) constant.

For k=8 number of permutations per partition is relatively small, so scanLazyWeakCompositionsPermPart uses a lot of relatively small chunks, in tested range of n. That's why it's considerably slower and it's memory usage increases slowly with n.

For those who don't have C compiler here's version using CompiledFunction instead of LibraryFunction. It can be compiled to virtual machine. weakCompositions and lazyWeakCompositions functions work exactly as above, but are slower and less memory efficient.

ClearAll[nextWeakCompositionsChunk]
nextWeakCompositionsChunk = 
    Compile[{{n, _Integer}, {k, _Integer}, {prevComposition, _Integer, 1}, {chunkSize, _Integer}},
        Module[{lastNonZeroPosition = k, composition = prevComposition},
            While[composition[[lastNonZeroPosition]] == 0, --lastNonZeroPosition];
            Table[

                ++composition[[lastNonZeroPosition - 1]];
                If[lastNonZeroPosition == k,
                    --composition[[-1]]
                ,(* else *)
                    composition[[-1]] = composition[[lastNonZeroPosition]] - 1;
                    composition[[lastNonZeroPosition]] = 0
                ];
                If[composition[[-1]] == 0,
                    --lastNonZeroPosition
                ,(* else *)

                    lastNonZeroPosition = k
                ];
                composition
                ,
                chunkSize
            ]
        ],
        RuntimeOptions -> "Speed",
        CompilationTarget -> "WVM"
    ];


ClearAll[weakCompositions]
weakCompositions[n_Integer?NonNegative, 0] := {}
weakCompositions[n_Integer?NonNegative, 1] := {{n}}
weakCompositions[n_Integer?NonNegative, k_Integer /; k >= 2] :=
    nextWeakCompositionsChunk[n, k, Join[ConstantArray[0, k - 2], {-1, n + 1}], Binomial[n + k - 1, n]]


Needs["Streaming`"]


ClearAll[lazyWeakCompositions]
lazyWeakCompositions[n_Integer?NonNegative, 0, chunkSize : _Integer?Positive : 10^5, opts : OptionsPattern[]] := LazyListCreate[{}, chunkSize, opts]
lazyWeakCompositions[n_Integer?NonNegative, 1, chunkSize : _Integer?Positive : 10^5, opts : OptionsPattern[]] := LazyListCreate[{{n}}, chunkSize, opts]
lazyWeakCompositions[n_Integer?NonNegative, k_Integer /; k >= 2, chunkSize : _Integer?Positive : 10^5, opts : OptionsPattern[]] :=
    With[{length = Binomial[n + k - 1, n]},
        Module[{active = False, compositionsLeft = length, prevComposition = Join[ConstantArray[0, k - 2], {-1, n + 1}]},
            LazyListCreate[
                IteratorCreate[
                    ListIterator,
                    (active = True) &,

                    With[{realChunkSize = Min[chunkSize, compositionsLeft]},
                        If[realChunkSize === 0,
                            {}
                        ,(*else*)
                            compositionsLeft -= realChunkSize;
                            With[{taken = nextWeakCompositionsChunk[n, k, prevComposition, realChunkSize]},
                                prevComposition = Last[taken];
                                taken
                            ]
                        ]

                    ] &,
                    TrueQ[active] &,
                    Remove[active, compositionsLeft, prevComposition] &
                ],
                chunkSize,
                opts,
                "Length" -> length,
                "FiniteList" -> True
            ]
        ]

    ]

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...

Blog

Search This Blog

performance tuning - Efficient lazy weak compositions

Comments

Post a Comment

Popular posts from this blog

front end - keyboard shortcut to invoke Insert new matrix

How to thread a list

functions - Get leading series expansion term?