performance tuning - Speeding up the built-in Rudin-Shapiro and Thue-Morse sequence functions

Version 10.2 introduced two well-studied sequences as functions: the (Golay-)Rudin-Shapiro sequence (RudinShapiro[]) and the (Prouhet-)Thue-Morse sequence (ThueMorse[]). Since these functions are defined in terms of the bits of an integer, one would expect these functions to evaluate very fast through low-level bit operations.

Out of curiosity, I had a few volunteers do some tests on these two functions, and it would seem that they are not quite as fast as one would like, thus violating my assumption that these were implemented at bit level.

So: are there more efficient implementations of these two functions?

Answer

I can't take much credit for this answer--I hadn't even got version 10.2 installed until J. M. commented to me that these functions could be written efficiently in terms of the Hamming weight function. But, it is understandable that he doesn't want to write an answer using a smartphone.

The definition of the built-in ThueMorse is:

ThueMorse[n_Integer] := Mod[DigitCount[n, 2, 1], 2]

And, sure enough, the performance of DigitCount used in this way is exactly what Mr. Wizard complained about previously. Let's re-define it to use the hammingWeight LibraryLink function given in the linked answer:

hammingWeightC = LibraryFunctionLoad[
  "hammingWeight.dll",
  "hammingWeight_T_I", {{Integer, 1, "Constant"}}, {Integer, 0, Automatic}
  ];
hammingWeight[num_Integer] := hammingWeightC@IntegerDigits[num, 2^62];

thueMorse[n_Integer] := Mod[hammingWeight[n], 2]

The performance is considerably improved, at least for large numbers:

test = 10^(10^5);
ThueMorse[test] // RepeatedTiming (* -> 0.00184271 seconds *)
thueMorse[test] // RepeatedTiming (* -> 0.0000230804 seconds *)

That's 80 times faster.

The definition of the built-in RudinShapiro is:

RudinShapiro[n_Integer] := (-1)^StringCount[IntegerString[n, 2], "11", Overlaps -> True]

It is a little strange, in my opinion, to implement the function quite so literally. It can also be written in terms of ThueMorse as:

rudinShapiro[n_Integer] := 1 - 2 ThueMorse[BitAnd[n, Quotient[n, 2]]]

Where the ThueMorse used here is still the built-in version. Its performance is fairly improved as a result of this rewriting (which, again, I do not take any credit for):

test = 10^(10^5);
RudinShapiro[test] // RepeatedTiming (* -> 0.0131471 seconds *)
rudinShapiro[test] // RepeatedTiming (* -> 0.00195247 seconds *)

So, it's almost 7 times faster just by avoiding the use of string functions. What about the case if we also use the improved thueMorse?

rudinShapiro2[n_Integer] := 1 - 2 thueMorse[BitAnd[n, Quotient[n, 2]]]


test = 10^(10^5);
rudinShapiro2[test] // RepeatedTiming (* -> 0.0000490027 seconds *)

This revision is 270 times faster than the built-in version. If its timing only depended on thueMorse, it would be about $7 \times 80 = 560$ times faster. The fact that it comes within a factor of 2 of this limit suggests that BitAnd and Quotient are quite efficient.

But, Quotient still isn't quite as efficient as a bit shift (again, not my idea):

rudinShapiro3[n_Integer] := 1 - 2 thueMorse[BitAnd[n, BitShiftRight[n]]];

test = 10^(10^5);
rudinShapiro3[test] // RepeatedTiming (* -> 0.0000314508 seconds *)

Now it's 420 times faster than the built-in.

Here I will just take the opportunity to remind anyone from WRI who may read this that all of the code in my answers is offered under academic-style permissive licences (your choice of three). So, it should be possible to speed up these functions in the product with minimal effort and without need to write any new code.

Blog

Search This Blog

performance tuning - Speeding up the built-in Rudin-Shapiro and Thue-Morse sequence functions

Comments

Post a Comment

Popular posts from this blog

front end - keyboard shortcut to invoke Insert new matrix

How to thread a list

functions - Get leading series expansion term?