
built in symbols - What's the purpose of the new function BinarySerialize?


Version 11.1 introduced a new function, BinarySerialize, but I don't know what it can do better than the traditional methods. Its behavior is very similar to Compress, yet I cannot find any advantage to it. It even consumes more space than Compress, for example:


BinarySerialize[Range[100]] // Normal // Length



805



Compress[Range[100]] // ToCharacterCode // Length


290




ByteCount[BinarySerialize[Range[100]]] is also greater than ByteCount[Compress[Range[100]]]. So what is the purpose of this function? Can anyone provide a good example of its use?



Answer



Disclaimer: This answer is written from a user's point of view. For useful insider information on this topic see this discussion with Mathematica developers on Community Forums.


Introduction


Binary serialization rewrites an expression as an array of bytes (a list of integers in the range 0 to 255). A binary representation of an expression takes less space than a textual one and can also be exported and imported faster than text.
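As a minimal sketch of this idea (the sample expression expr is made up for illustration), one can check that the serialized form really is a sequence of byte values and that it round-trips:

```mathematica
(* Hypothetical sample expression, chosen only for illustration *)
expr = {1, "two", 3.5, x^2};

(* Serialize the expression to a ByteArray *)
bytes = BinarySerialize[expr];

(* Normal turns the ByteArray into a plain list of integers in 0..255 *)
Head[bytes]
AllTrue[Normal[bytes], IntegerQ[#] && 0 <= # <= 255 &]

(* BinaryDeserialize restores the original expression *)
BinaryDeserialize[bytes] === expr
```

The last two checks should both evaluate to True for any expression that BinarySerialize accepts.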


How do Compress and BinarySerialize functions work?


Compress (with default options) always does three steps:



  1. It performs binary serialization.


  2. It deflates result using zlib.

  3. It transforms deflated result to a Base64-encoded text string.
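These steps can be made partly visible from the outside. The sketch below assumes, from observed behavior rather than from documentation, that the output string starts with a two-character version prefix (such as "1:") followed by the Base64 text. If that assumption holds, decoding the remainder yields the deflated bytes.

```mathematica
s = Compress[Range[100]];

(* Compress returns an ordinary text string *)
StringQ[s]

(* Assumption: the first two characters are a version prefix, e.g. "1:" *)
StringTake[s, 2]

(* Assumption: everything after the prefix is Base64 text; the decoded
   bytes would then be the zlib-deflated serialization *)
deflated = Developer`DecodeBase64ToByteArray[StringDrop[s, 2]];
Length[deflated]

(* Uncompress reverses all three steps *)
Uncompress[s] === Range[100]
```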


BinarySerialize performs only binary serialization and sometimes deflates the result using zlib. With the default options it decides for itself whether to deflate. With the option PerformanceGoal -> "Speed" it avoids deflation; with PerformanceGoal -> "Size" it will likely deflate. BinarySerialize returns a ByteArray object, which is something like a packed array of 8-bit integers. However, the FullForm of a ByteArray is displayed as a Base64-encoded text string. This display can be somewhat misleading, because internally ByteArrays are stored and operated on in binary form, not as text strings.
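A quick way to see the effect of the option is to serialize the same data under each setting and compare byte counts. The test data below is hypothetical, picked only because it is highly repetitive and so should deflate well. The exact sizes depend on the version, so none are shown.

```mathematica
(* Hypothetical test data: a highly repetitive list of strings *)
data = ConstantArray["abc", 1000];

auto  = BinarySerialize[data];                              (* default: decides by itself *)
fast  = BinarySerialize[data, PerformanceGoal -> "Speed"];  (* avoids deflation *)
small = BinarySerialize[data, PerformanceGoal -> "Size"];   (* likely deflates *)

(* Number of bytes in each result *)
Length /@ {auto, fast, small}

(* All three deserialize to the same expression *)
BinaryDeserialize /@ {auto, fast, small} === {data, data, data}
```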


Binary serialization algorithms of Compress and BinarySerialize


The original serialization algorithm of Compress is described in this answer. That algorithm is not well optimized for size and produces larger-than-necessary output for many typical expressions. For example, it has no support for packed arrays of integers and rewrites such arrays as nested lists, which take a lot of bytes.


BinarySerialize uses a more size-optimized binary serialization algorithm than Compress (with default options) does. This algorithm supports packed arrays of integers, has optimizations for integers of different sizes (8, 16, and 32 bits), stores big integers in binary form (not as text strings), and has other optimizations.
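One way to probe the packed-array support is to serialize the same integers once as a packed array and once unpacked. This is only a sketch: the variable names are illustrative, and the resulting sizes depend on the version, so none are quoted here.

```mathematica
packed   = Range[1000];                        (* Range normally returns a packed array *)
unpacked = Developer`FromPackedArray[packed];  (* same elements, stored unpacked *)

(* Confirm the storage form of each list *)
Developer`PackedArrayQ /@ {packed, unpacked}

(* Serialized sizes without deflation, so only the serialization format matters *)
Length[BinarySerialize[#, PerformanceGoal -> "Speed"]] & /@ {packed, unpacked}

(* For comparison: length of the Compress output for the same data *)
StringLength[Compress[#]] & /@ {packed, unpacked}
```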


Applications of BinarySerialize


Using BinarySerialize, we can write our own Compress-like functions with better compression. For example, we can write a myCompress function that does the same three steps as the original Compress but uses BinarySerialize for the serialization step:


myCompress[expr_] := Module[
  {compressedBinaryData},
  compressedBinaryData = BinarySerialize[expr, PerformanceGoal -> "Size"];
  Developer`EncodeBase64[compressedBinaryData]
];

myUncompress[string_] := Module[
  {binaryData},
  binaryData = Developer`DecodeBase64ToByteArray[string];
  BinaryDeserialize[binaryData]
];


Even for a simple integer list we can see a size reduction:


Compress[Range[100]] // StringLength
(* 290 *)

myCompress[Range[100]] // StringLength
(* 244 *)

myUncompress[myCompress[Range[100]]] === Range[100]
(* True *)


If we take an expression with a large number of small integers, we get a much more noticeable improvement:


bitmap = Rasterize[Plot[x, {x, 0, 1}]];

StringLength[Compress[bitmap]]
(*31246*)

StringLength[myCompress[bitmap]]
(*17820*)


myUncompress[myCompress[bitmap]] === bitmap
(* True *)

Conclusion


The example above shows that the output of a simple user-defined function, myCompress, based on BinarySerialize can be almost half the size of the output of Compress (17820 versus 31246 characters for the bitmap, a factor of about 1.75).


Outlook


To decrease the output size even further one can use a compression algorithm with higher compression settings (in the second step) or use Ascii85-encoding instead of Base64 in the third step.
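The gain from switching encodings can be estimated with simple arithmetic: padded Base64 emits 4 characters per 3 bytes, while Ascii85 emits 5 characters per 4 bytes. The helper names and the payload size below are hypothetical, and the Ascii85 estimate ignores optional delimiters and the "z" shortcut for all-zero groups.

```mathematica
(* Characters needed to encode n bytes *)
base64Length[n_Integer]  := 4 Ceiling[n/3];   (* padded Base64: 4 chars per 3 bytes *)
ascii85Length[n_Integer] := Ceiling[5 n/4];   (* Ascii85: 5 chars per 4 bytes *)

n = 12000;  (* hypothetical payload size in bytes *)
{base64Length[n], ascii85Length[n]}
(* {16000, 15000} *)
```

So the encoding step alone would save roughly 6% of the output length, independent of any gain from stronger compression.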


Appendix 1: Undocumented options of Compress


I have noticed that in Version 11.1 Compress has more undocumented options than in previous versions. Those options allow one to:





  • Disable both compression and Base64 encoding and return a binary serialized result as a string with unprintable characters:


    Compress[Range[100], Method -> {"Version" -> 4}]




  • Change the binary serialization algorithm to a more efficient one, though not exactly the one used by BinarySerialize:


    Compress[Range[100], Method -> {"Version" -> 6}] // StringLength


    (* 254 *)





There is also a "ByteArray" option shown in the usage message (??Compress), but it does not work in Version 11.1.


Note that this behavior is undocumented and may change in future versions.


Appendix 2: Compression option of BinarySerialize


Just for fun, one can manually compress the result of BinarySerialize[..., PerformanceGoal -> "Speed"] to get the same output that BinarySerialize[..., PerformanceGoal -> "Size"] produces. This can be done with the following code:


myBinarySerializeSize[expr_] := Module[
  {binaryData, dataBytes, compressedBytes},
  binaryData = Normal[BinarySerialize[expr, PerformanceGoal -> "Speed"]];
  dataBytes = Drop[binaryData, 2]; (* remove magic "7:" *)
  compressedBytes = Developer`RawCompress[dataBytes];
  ByteArray[Join[ToCharacterCode["7C:"], compressedBytes]]
]

We can check that it gives the same result as the PerformanceGoal -> "Size" option:


data = Range[100];
myBinarySerializeSize[data] === BinarySerialize[data, PerformanceGoal -> "Size"]

Appendix 3: zlib compression functions


A description of the undocumented zlib compression/decompression functions Developer`RawCompress and Developer`RawUncompress can be found in this answer.


Appendix 4: Base64 encoding functions


The use of the Base64 encoding/decoding functions from the Developer` context can be illustrated with the following code:



binaryData = Range[0, 255];

Normal[
Developer`DecodeBase64ToByteArray[
Developer`EncodeBase64[binaryData]
]
] == binaryData

(* True *)
