list manipulation - Implementing a function which generalizes the merging step in merge sort

One of the key steps in merge sort is the merging step. Given two sorted lists

sorted1={2,6,10,13,16,17,19};
sorted2={1,3,4,5,7,8,9,11,12,14,15,18,20};

of integers, we want to produce a new list as follows:

Start with an empty list acc.

Compare the first elements of sorted1 and sorted2. Append the smaller one to acc.

Remove the element used in step 2 from either sorted1 or sorted2.

If neither sorted1 nor sorted2 is empty, go to step 2. Otherwise append the remaining list to acc and output the value of acc.

Applying this process to sorted1 and sorted2, we get

acc={1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}

Added in response to Rojo's question: We can carry out this procedure even if the two lists are not pre-sorted. So list1 and list2 below are not assumed to be sorted.

If there were a built-in function MergeList which carries out this process, it would probably take three arguments list1, list2, and f. Here f is a Boolean function of two arguments used to decide which element to pick. In the case of merge sort, f = LessEqual. I feel that MergeList is a fundamental list operation, so

Question 1: Is there such a built-in function or one very close to that?

If I were to write such a function in Scheme, I would use a recursive definition equivalent to the following:

MergeList[list1_,{},f_,acc_:{}]:=Join[acc,list1];
MergeList[{},list2_,f_,acc_:{}]:=Join[acc,list2];
MergeList[list1_,list2_,f_,acc_:{}]:=
 If[
  f@@First/@{list1,list2},
  MergeList[Rest[list1],list2,f,Append[acc,First[list1]]],

  MergeList[list1,Rest[list2],f,Append[acc,First[list2]]]
 ]

Sample output with unsorted lists:

In[2]:= MergeList[{2,5,1},{3,6,4},LessEqual]
Out[2]= {2,3,5,1,6,4}

My impression is that recursive solutions tend to be inefficient in Mathematica, so

Question 2: What would be a better way to implement MergeList?

If you have tips about converting loops into their functional equivalents, feel free to mention them as well.

Answer

Preamble

Since I agree that it would be nice to have a generic function of this type, I will provide a general implementation. First, I will give a generic one based on linked lists, then I will add a JIT-compiled one for special numeric types, and lastly, I will bring it all together.

Top-level implementation based on linked lists

Here is a reasonably efficient implementation based on linked lists:

ClearAll[toLinkedList, ll];
SetAttributes[ll, HoldAllComplete];
toLinkedList[s_List] := Fold[ll[#2, #1] &, ll[], Reverse[s]];

and the main function:

ClearAll[merge];
merge[a_ll, ll[], s_, _] := List @@ Flatten[ll[s, a], Infinity, ll];
merge[ll[], b_ll, s_, _] := List @@ Flatten[ll[s, b], Infinity, ll];
merge[ll[a1_, atail_], b : ll[b1_, _], s_, f_: LessEqual] /;f[a1, b1] :=
    merge[atail, b, ll[s, a1], f];
merge[a : ll[a1_, _], ll[b1_, brest_], s_, f_: LessEqual] :=
    merge[a, brest, ll[s, b1], f];
merge[a_List, b_List, f_: LessEqual] :=
    merge[toLinkedList@a, toLinkedList@b, ll[], f];

For example:

merge[{2,5,1},{3,6,4},LessEqual]

 {2,3,5,1,6,4}

merge[{2,5,1},{3,6,4},Greater]

 {3,6,4,2,5,1}

And also for large lists:

large1 = RandomInteger[100, 10000];
large2 = RandomInteger[100, 10000];

Block[{$IterationLimit = Infinity},
   merge[large1,large2,LessEqual]]//Short//AbsoluteTiming

{0.0751953,{70,54,78,84,11,21,41,49,78,93,90,70,19,

      <<19975>>,42,2,10,40,53,12,2,47,89,40,2,80}}

For a complete implementation of merge sort algorithm based on linked lists, see this post (the difference there is that I used repeated rule application instead of recursion. Originally, the goal of that example was to show that ReplaceRepeated is not necessarily slow if the patterns are constructed efficiently).

Full implementation including JIT-compilation

I'd like to show here how one could implement a fairly complete function which would automatically dispatch to an efficient JIT-compiled code when the arguments are appropriate. Compilation will work not just for numeric lists, but for lists of tensors in general, as long as they are of the same shape.

JIT - compilation

First comes the JIT-compiled version, done along the lines discussed in this answer, section "Making JIT-compiled functions"

ClearAll[mergeJIT];
mergeJIT[pred_, listType_, target : ("MVM" | "C") : "MVM"] :=
  mergeJIT[pred, Verbatim[listType], target] =

    Block[{fst, sec},
      With[{decl = {Prepend[listType, fst], Prepend[listType, sec]}},
       Compile @@
         Hold[decl,
           Module[{result = Table[0, {Length[fst] + Length[sec]}], i = 0, 
                fctr = 1, sctr = 1},
             While[fctr <= Length[fst] && sctr <= Length[sec],
               If[pred[fst[[fctr]], sec[[sctr]]],
                   result[[++i]] = fst[[fctr++]],
                   (* else *)

                   result[[++i]] = sec[[sctr++]]
               ]
             ];
             If[fctr > Length[fst],
               result[[i + 1 ;; -1]] = sec[[sctr ;; -1]],
               (* else *)
               result[[i + 1 ;; -1]] = fst[[fctr ;; -1]]
             ];
             result
           ],

           CompilationTarget -> target
         ]]];

You can use this in isolation:

mergeJIT[LessEqual,{_Integer,1},"MVM"][{2,5,1},{3,6,4}]

 {2,3,5,1,6,4}

but it is much better to use as a part of the generic function, which would figure out the types for you automatically.

Generic function implementation

This is a function to find the type of our lists:

Clear[getType, $useCompile];
getType[arg_List] /; $useCompile && ArrayQ[arg, _, IntegerQ] := 
    {_Integer, Length@Dimensions@arg};
getType[arg_List] /; $useCompile && ArrayQ[arg, _, NumericQ] && 
     Re[arg] == arg := 
        {_Real, Length@Dimensions@arg};
getType[_] := General;

This is a function to dispatch to a right type:

Clear[mergeDispatch];
SetAttributes[mergeDispatch, Orderless];
mergeDispatch[{Verbatim[_Integer], n_}, {Verbatim[_Real], n_}, pred_] :=
    mergeDispatch[{Verbatim[_Real], n}, {Verbatim[_Real], n}, pred];

mergeDispatch[f : {Verbatim[_Real], n_}, {Verbatim[_Real], n_}, pred_] :=
    mergeJIT[pred, f, $target];

mergeDispatch[f : {Verbatim[_Integer], n_}, {Verbatim[_Integer], n_}, pred_] :=

    mergeJIT[pred, f, $target];

mergeDispatch[_, _, pred_] :=
    Function[{fst, sec},
      Block[{$IterationLimit = Infinity},
         merge[fst, sec, pred]]];

and this is a function to bring it all together:

ClearAll[mergeList];
Options[mergeList] = 

 {
    CompileToC -> False,
    Compiled -> True
 };
mergeList[f_, s_, pred_, opts : OptionsPattern[]] :=
  Block[{
       $target = If[TrueQ[OptionValue[CompileToC]], "C", "MVM"],
       $useCompile = TrueQ[OptionValue[Compiled]]
    },
    mergeDispatch[getType@f, getType@s, pred][f, s]

  ];

Finally, a helper function to clear the cache of mergeJIT, if that would be desired:

ClearAll[clearMergeJITCache];
clearMergeJITCache[] :=
   DownValues[mergeJIT] = {Last@DownValues[mergeJIT]};

Benchmarks and tests

First, create test data:

clearMergeJITCache[];

huge1 = RandomInteger[1000,1000000];
huge2 = RandomInteger[1000,1000000];

A first call to the function with C compilation target is expensive:

mergeList[huge1,huge2,Less,CompileToC -> True]//Short//AbsoluteTiming

 {3.8652344,{267,461,66,607,797,116,197,474,852,805,135,
     <<1999978>>,266,667,799,280,261,930,241,83,594,904,894}}

But then, for the same types of lists, it will pay off for huge lists:

mergeList[huge1,huge2,Less,CompileToC -> True]//Short//AbsoluteTiming

 {0.0468750,{267,461,66,607,797,116,197,474,852,805,135,
       <<1999978>>,266,667,799,280,261,930,241,83,594,904,894}}

On the other hand, the call with MVM target is fast out of the box, but not as fast as the one with the C target after the "warm-up":

mergeList[huge1,huge2,Less]//Short//AbsoluteTiming

 {0.2138672,{267,461,66,607,797,116,197,474,852,805,135,
       <<1999978>>,266,667,799,280,261,930,241,83,594,904,894}}

The call to generic one is general but comparatively very slow:

mergeList[huge1,huge2,Less,Compiled->False]//Short//AbsoluteTiming

 {5.015,{267,461,66,607,797,116,197,474,852,805,135,
       <<1999978>>,266,667,799,280,261,930,241,83,594,904,894}}

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...

Blog

Search This Blog