For reasons that have never been entirely clear to me, Mathematica has had a built-in TakeWhile
function since version 6.0, but has no equivalent DropWhile
function. This means that I find myself periodically writing my own. Since this is a function that I use fairly frequently, I'd like to have a version that is both fast and robust. It's also the kind of function that you can write in a bunch of ways; I've tested variants that depend on While
loops, on using Scan
and Throw
, and on using Position
.
Of these, a version using Position
is the fastest I've found:
DropWhile[list_, test_] :=
With[{pos =
Position[list, elt_ /; ! TrueQ@test[elt], {1}, 1, Heads -> False]},
pos /. {
{} -> {},
{{fail_}} :> Drop[list, fail - 1]
}];
The TrueQ
slows things down a bit, but is there to match the observed behavior of TakeWhile
, which takes elements only as long as the test function returns True
. Are there good ways to make this function work faster?
Answer
Simple solution
Why not just
dropWhile[list_, test_] := Drop[list, LengthWhile[list, test]]
?
Fast JIT-based solution with automatic type identification / dispatch
Here I will show a solution that is potentially much faster on packed arrays. The code is directly modeled after this answer, so I refer to some additional details there.
JIT version with type memoization
Here is the first ingredient: the specialized JIT version
ClearAll[dropWhileJIT];
dropWhileJIT[pred_,listType_,target:"MVM"|"C":"MVM"]:=
dropWhileJIT[pred,Verbatim[listType],target]=
Block[{l},
With[{decl={Prepend[listType,l]}},
Compile@@
Hold[
decl,
Module[{pos=1},
While[pred[l[[pos]]],pos++];Drop[l,pos-1]
],
CompilationTarget->target
]
]
]
which can be tested as
dropWhileJIT[# < 99999 &, {_Integer, 1}, "C"][Range[100000]] // AbsoluteTiming
(* {5.481445, {99999, 100000}} *)
The second and subsequent times this will be blazingly fast:
dropWhileJIT[# < 99999 &, {_Integer, 1}, "C"][Range[100000]] // AbsoluteTiming
(* {0.000977, {99999, 100000}} *)
Here, we also should have a function to clear the cache:
ClearAll[clearDropWhileCache];
clearDropWhileCache[]:=
DownValues[dropWhileJIT]={Last[DownValues[dropWhileJIT]]};
which can be used to remove the memoized definitions.
Automatic type identification and dispatch
Here we will use the following functions:
Clear[getType,$useCompile];
getType[arg_List]/;$useCompile&&ArrayQ[arg,_,IntegerQ]:=
{_Integer,Length[Dimensions[arg]]};
getType[arg_List]/;$useCompile&&ArrayQ[arg,_,NumericQ]&&Re[arg]==arg:=
{_Real,Length[Dimensions[arg]]};
getType[_]:=
General;
and
Clear[dropWhileDispatch];
dropWhileDispatch[
t:{Verbatim[_Integer]|Verbatim[_Real]|Verbatim[_Complex],n_},pred_
]:=
dropWhileJIT[pred,t,$target];
dropWhileDispatch[_,pred_]:=
dropWhileGeneric[#1,pred]&;
Final functions
Here is our previous generic implementation (I changed the name):
ClearAll[dropWhileGeneric];
dropWhileGeneric[list_,test_]:=
Drop[list,LengthWhile[list,test]]
and here is a final function:
ClearAll[dropWhile];
Options[dropWhile]={CompileToC->False,Compiled->True};
dropWhile[lst_List,pred_,opts:OptionsPattern[]]:=
Block[
{
$target=If[TrueQ[OptionValue[CompileToC]],"C","MVM"],
$useCompile=TrueQ[OptionValue[Compiled]]
},
dropWhileDispatch[getType[lst],pred][lst]
];
Benchmarks
JIT-compilation to MVM is really fast:
clearDropWhileCache[];
dropWhile[Range[100000], # < 99999 &] // AbsoluteTiming
(* {0.007813, {99999, 100000}} *)
The second time is faster still, since we don't have to recompile:
dropWhile[Range[100000], # < 99999 &] // AbsoluteTiming
(* {0.006836, {99999, 100000}} *)
The compilation to C
is quite slow:
dropWhile[Range[100000], # < 99999 &,CompileToC -> True] // AbsoluteTiming
(* {4.640625, {99999, 100000}} *)
But gives a considerable further speedup:
dropWhile[Range[100000], # < 99999 &,CompileToC -> True] // AbsoluteTiming
(* {0.001953, {99999, 100000}} *)
Here is what we get from the generic implementation:
dropWhile[Range[100000], # < 99999 &, Compiled -> False] // AbsoluteTiming
(* {0.157226, {99999, 100000}} *)
It is not as bad as it could have been, since LengthWhile
by itself is optimized on packed arrays, but it does not compare with the JIT versions.
The complete code
ClearAll[dropWhileJIT];
dropWhileJIT[pred_,listType_,target:"MVM"|"C":"MVM"]:=
dropWhileJIT[pred,Verbatim[listType],target]=
Block[{l},
With[{decl={Prepend[listType,l]}},
Compile@@
Hold[
decl,
Module[{pos=1},
While[pred[l[[pos]]],pos++];Drop[l,pos-1]
],
CompilationTarget->target
]
]
]
Clear[getType,$useCompile];
getType[arg_List]/;$useCompile&&ArrayQ[arg,_,IntegerQ]:=
{_Integer,Length[Dimensions[arg]]};
getType[arg_List]/;$useCompile&&ArrayQ[arg,_,NumericQ]&&Re[arg]==arg:=
{_Real,Length[Dimensions[arg]]};
getType[_]:=
General;
Clear[dropWhileDispatch];
dropWhileDispatch[
t:{Verbatim[_Integer]|Verbatim[_Real]|Verbatim[_Complex],n_},pred_
]:=
dropWhileJIT[pred,t,$target];
dropWhileDispatch[_,pred_]:=
dropWhileGeneric[#1,pred]&;
ClearAll[dropWhileGeneric];
dropWhileGeneric[list_,test_]:=
Drop[list,LengthWhile[list,test]]
ClearAll[dropWhile];
Options[dropWhile]={CompileToC->False,Compiled->True};
dropWhile[lst_List,pred_,opts:OptionsPattern[]]:=
Block[
{
$target=If[TrueQ[OptionValue[CompileToC]],"C","MVM"],
$useCompile=TrueQ[OptionValue[Compiled]]
},
dropWhileDispatch[getType[lst],pred][lst]
];
ClearAll[clearDropWhileCache];
clearDropWhileCache[]:=
DownValues[dropWhileJIT]={Last[DownValues[dropWhileJIT]]};
Comments
Post a Comment