One of the key steps in merge sort is the merging step. Given two sorted lists
sorted1={2,6,10,13,16,17,19};
sorted2={1,3,4,5,7,8,9,11,12,14,15,18,20};
of integers, we want to produce a new list as follows:
- Start with an empty list
acc
. - Compare the first elements of
sorted1
andsorted2
. Append the smaller one toacc
. - Remove the element used in step 2 from either
sorted1
orsorted2
. - If neither
sorted1
norsorted2
is empty, go to step 2. Otherwise append the remaining list toacc
and output the value ofacc
.
Applying this process to sorted1
and sorted2
, we get
acc={1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
Added in response to Rojo's question: We can carry out this procedure even if the two lists are not pre-sorted. So list1
and list2
below are not assumed to be sorted.
If there were a built-in function MergeList
which carries out this process, it would probably take three arguments list1
, list2
, and f
. Here f
is a Boolean function of two arguments used to decide which element to pick. In the case of merge sort, f = LessEqual
. I feel that MergeList
is a fundamental list operation, so
Question 1: Is there such a built-in function or one very close to that?
If I were to write such a function in Scheme, I would use a recursive definition equivalent to the following:
MergeList[list1_,{},f_,acc_:{}]:=Join[acc,list1];
MergeList[{},list2_,f_,acc_:{}]:=Join[acc,list2];
MergeList[list1_,list2_,f_,acc_:{}]:=
If[
f@@First/@{list1,list2},
MergeList[Rest[list1],list2,f,Append[acc,First[list1]]],
MergeList[list1,Rest[list2],f,Append[acc,First[list2]]]
]
Sample output with unsorted lists:
In[2]:= MergeList[{2,5,1},{3,6,4},LessEqual]
Out[2]= {2,3,5,1,6,4}
My impression is that recursive solutions tend to be inefficient in Mathematica, so
Question 2: What would be a better way to implement MergeList
?
If you have tips about converting loops into their functional equivalents, feel free to mention them as well.
Answer
Preamble
Since I agree that it would be nice to have a generic function of this type, I will provide a general implementation. First, I will give a generic one based on linked lists, then I will add a JIT-compiled one for special numeric types, and lastly, I will bring it all together.
Top-level implementation based on linked lists
Here is a reasonably efficient implementation based on linked lists:
ClearAll[toLinkedList, ll];
SetAttributes[ll, HoldAllComplete];
toLinkedList[s_List] := Fold[ll[#2, #1] &, ll[], Reverse[s]];
and the main function:
ClearAll[merge];
merge[a_ll, ll[], s_, _] := List @@ Flatten[ll[s, a], Infinity, ll];
merge[ll[], b_ll, s_, _] := List @@ Flatten[ll[s, b], Infinity, ll];
merge[ll[a1_, atail_], b : ll[b1_, _], s_, f_: LessEqual] /;f[a1, b1] :=
merge[atail, b, ll[s, a1], f];
merge[a : ll[a1_, _], ll[b1_, brest_], s_, f_: LessEqual] :=
merge[a, brest, ll[s, b1], f];
merge[a_List, b_List, f_: LessEqual] :=
merge[toLinkedList@a, toLinkedList@b, ll[], f];
For example:
merge[{2,5,1},{3,6,4},LessEqual]
{2,3,5,1,6,4}
merge[{2,5,1},{3,6,4},Greater]
{3,6,4,2,5,1}
And also for large lists:
large1 = RandomInteger[100, 10000];
large2 = RandomInteger[100, 10000];
Block[{$IterationLimit = Infinity},
merge[large1,large2,LessEqual]]//Short//AbsoluteTiming
{0.0751953,{70,54,78,84,11,21,41,49,78,93,90,70,19,
<<19975>>,42,2,10,40,53,12,2,47,89,40,2,80}}
For a complete implementation of merge sort algorithm based on linked lists, see this post (the difference there is that I used repeated rule application instead of recursion. Originally, the goal of that example was to show that ReplaceRepeated
is not necessarily slow if the patterns are constructed efficiently).
Full implementation including JIT-compilation
I'd like to show here how one could implement a fairly complete function which would automatically dispatch to an efficient JIT-compiled code when the arguments are appropriate. Compilation will work not just for numeric lists, but for lists of tensors in general, as long as they are of the same shape.
JIT - compilation
First comes the JIT-compiled version, done along the lines discussed in this answer, section "Making JIT-compiled functions"
ClearAll[mergeJIT];
mergeJIT[pred_, listType_, target : ("MVM" | "C") : "MVM"] :=
mergeJIT[pred, Verbatim[listType], target] =
Block[{fst, sec},
With[{decl = {Prepend[listType, fst], Prepend[listType, sec]}},
Compile @@
Hold[decl,
Module[{result = Table[0, {Length[fst] + Length[sec]}], i = 0,
fctr = 1, sctr = 1},
While[fctr <= Length[fst] && sctr <= Length[sec],
If[pred[fst[[fctr]], sec[[sctr]]],
result[[++i]] = fst[[fctr++]],
(* else *)
result[[++i]] = sec[[sctr++]]
]
];
If[fctr > Length[fst],
result[[i + 1 ;; -1]] = sec[[sctr ;; -1]],
(* else *)
result[[i + 1 ;; -1]] = fst[[fctr ;; -1]]
];
result
],
CompilationTarget -> target
]]];
You can use this in isolation:
mergeJIT[LessEqual,{_Integer,1},"MVM"][{2,5,1},{3,6,4}]
{2,3,5,1,6,4}
but it is much better to use as a part of the generic function, which would figure out the types for you automatically.
Generic function implementation
This is a function to find the type of our lists:
Clear[getType, $useCompile];
getType[arg_List] /; $useCompile && ArrayQ[arg, _, IntegerQ] :=
{_Integer, Length@Dimensions@arg};
getType[arg_List] /; $useCompile && ArrayQ[arg, _, NumericQ] &&
Re[arg] == arg :=
{_Real, Length@Dimensions@arg};
getType[_] := General;
This is a function to dispatch to a right type:
Clear[mergeDispatch];
SetAttributes[mergeDispatch, Orderless];
mergeDispatch[{Verbatim[_Integer], n_}, {Verbatim[_Real], n_}, pred_] :=
mergeDispatch[{Verbatim[_Real], n}, {Verbatim[_Real], n}, pred];
mergeDispatch[f : {Verbatim[_Real], n_}, {Verbatim[_Real], n_}, pred_] :=
mergeJIT[pred, f, $target];
mergeDispatch[f : {Verbatim[_Integer], n_}, {Verbatim[_Integer], n_}, pred_] :=
mergeJIT[pred, f, $target];
mergeDispatch[_, _, pred_] :=
Function[{fst, sec},
Block[{$IterationLimit = Infinity},
merge[fst, sec, pred]]];
and this is a function to bring it all together:
ClearAll[mergeList];
Options[mergeList] =
{
CompileToC -> False,
Compiled -> True
};
mergeList[f_, s_, pred_, opts : OptionsPattern[]] :=
Block[{
$target = If[TrueQ[OptionValue[CompileToC]], "C", "MVM"],
$useCompile = TrueQ[OptionValue[Compiled]]
},
mergeDispatch[getType@f, getType@s, pred][f, s]
];
Finally, a helper function to clear the cache of mergeJIT
, if that would be desired:
ClearAll[clearMergeJITCache];
clearMergeJITCache[] :=
DownValues[mergeJIT] = {Last@DownValues[mergeJIT]};
Benchmarks and tests
First, create test data:
clearMergeJITCache[];
huge1 = RandomInteger[1000,1000000];
huge2 = RandomInteger[1000,1000000];
A first call to the function with C compilation target is expensive:
mergeList[huge1,huge2,Less,CompileToC -> True]//Short//AbsoluteTiming
{3.8652344,{267,461,66,607,797,116,197,474,852,805,135,
<<1999978>>,266,667,799,280,261,930,241,83,594,904,894}}
But then, for the same types of lists, it will pay off for huge lists:
mergeList[huge1,huge2,Less,CompileToC -> True]//Short//AbsoluteTiming
{0.0468750,{267,461,66,607,797,116,197,474,852,805,135,
<<1999978>>,266,667,799,280,261,930,241,83,594,904,894}}
On the other hand, the call with MVM target is fast out of the box, but not as fast as the one with the C target after the "warm-up":
mergeList[huge1,huge2,Less]//Short//AbsoluteTiming
{0.2138672,{267,461,66,607,797,116,197,474,852,805,135,
<<1999978>>,266,667,799,280,261,930,241,83,594,904,894}}
The call to generic one is general but comparatively very slow:
mergeList[huge1,huge2,Less,Compiled->False]//Short//AbsoluteTiming
{5.015,{267,461,66,607,797,116,197,474,852,805,135,
<<1999978>>,266,667,799,280,261,930,241,83,594,904,894}}
Comments
Post a Comment