Is it possible for a LibraryLink function to be Listable, as in Compile?

We know that Compile has an option RuntimeAttributes -> {Listable}, which lets the compiled function easily exploit the power of thread parallelization.

So is it possible for a LibraryLink function to be Listable like Compile?

Answer

There is an easier solution than the one I gave almost 2 years ago. In principle, you wrap your library function inside another CompiledFunction that is listable. Let the code speak:

fun = LibraryFunctionLoad["demo", "demo_I_I", {Integer}, Integer];
With[{fc = fun}, 
  funListable = Compile[{{i, _Integer, 0}}, fc[i],
    RuntimeAttributes -> {Listable},
    Parallelization -> True]
];

If you inspect the compiled function, you see that there is no external call. Instead, you find a special instruction that calls the library function code directly.
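One way to do this inspection is CompilePrint from the CompiledFunctionTools` package. The following is only a sketch: it assumes the "demo" library from the LibraryLink examples can be loaded, and because the exact instruction listing differs between versions, no output is shown. A callback into the main evaluator would normally show up as MainEvaluate in the listing, so the last line should give True if the library call was compiled in.

```mathematica
Needs["CompiledFunctionTools`"]

(* Same setup as above: load the demo function and wrap it. *)
fun = LibraryFunctionLoad["demo", "demo_I_I", {Integer}, Integer];
With[{fc = fun},
  funListable = Compile[{{i, _Integer, 0}}, fc[i],
    RuntimeAttributes -> {Listable},
    Parallelization -> True]
];

(* Print the virtual-machine instructions of the wrapper. *)
CompilePrint[funListable]

(* CompilePrint returns a string, so the absence of a MainEvaluate
   callback can be tested directly. *)
StringFreeQ[CompilePrint[funListable], "MainEvaluate"]
```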

The performance is exceptionally good even though we have this wrapping. It might not be the best idea to profile such a simple function, but let's do it anyway. For comparison, I'm creating the same function as directly compiled code:

inc = Compile[{{i, _Integer, 0}},
  i + 1,
  RuntimeAttributes -> {Listable},
  Parallelization -> True
  (* , CompilationTarget -> "C" *)
  ]

You should leave out the CompilationTarget, because in my tests the version compiled to C was slower than the one that runs on the virtual machine. Here is the comparison:

r = Range[10^7];
funListable[r]; // AbsoluteTiming
inc[r]; // AbsoluteTiming
funListable[r] === inc[r]

For the parallelized library function, I get a runtime of 0.42 seconds, while the compiled version needs 0.53 seconds (about 0.7 seconds with compilation target "C"). If you want to see this approach outperform other solutions, you should read this answer of mine, where I used it to parallelize highly complex, non-trivial C code that came from a library.

Original answer

It seems it is possible. At least I got a toy example that works. Without having any specific information about this topic from WRI, I always suspected that LibraryLink was not mainly created to give users a way to attach shared library functions to the kernel. I believe that the underlying technology was first used in Compile to make CompilationTarget -> "C" possible, and that afterwards a part of the framework was exposed to users as LibraryLink. If someone has more information about this, please feel free to add it here.

That being said, when you look at the InputForm of a simple compiled function, you will find that it contains a LibraryFunction in the exact same way you would get it when loading your own library functions with LibraryFunctionLoad:

fc = Compile[{{x, _Integer, 0}},
  x,
  Parallelization -> True,
  RuntimeAttributes -> {Listable},
  CompilationTarget -> "C"
  ];
InputForm[fc]


(*
CompiledFunction[{10, 10.3, 5852}, {_Integer}, 
  {{2, 0, 0}, {2, 0, 0}}, {}, {0, 1, 0, 0, 0}, {{1}}, 
  Function[{x}, x, Listable], Evaluate, 
  LibraryFunction["/home/some/path/compiledFunction0.so",     
    "compiledFunction0", {{Integer, 0, "Constant"}}, Integer
  ]
]
*)

Some tests with Compile seemed to indicate that the generated code is the same with or without the Listable attribute. This makes me believe that the distribution of arguments for a parallel evaluation of fc happens before the actual LibraryFunction is called. Therefore, when fc is called with a tensor, there might be some wrapper C function that calls the underlying LibraryFunction on all elements of the tensor in parallel.

My idea was that it might be possible to replace the LibraryFunction inside this CompiledFunction when the type of the function is correct. In the above example fc gets a single integer and returns a single integer. Let us use a LibraryLink example of the same type:

libFun = LibraryFunctionLoad["demo", "demo_I_I",
  {{Integer, 0, "Constant"}}, Integer]

Note that this function does something different than fc, because it increments its argument by one. Additionally, we can check that it is not Listable on its own.
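A quick way to check this is to call libFun on a list. This is a sketch and not necessarily the check that was used originally. With a rank-0 Integer argument specification, the call is expected to emit a message and return unevaluated rather than thread over the list; Check turns that into an explicit failure marker.

```mathematica
libFun = LibraryFunctionLoad["demo", "demo_I_I",
  {{Integer, 0, "Constant"}}, Integer];

(* A scalar call works as usual. *)
libFun[10]

(* A list does not match the rank-0 Integer argument, so a message is
   expected. Check then returns $Failed; Quiet only hides the printout. *)
Quiet[Check[libFun[{1, 2, 3}], $Failed]]
```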

Looking at InputForm[libFun] reveals that it has the exact same type as the LibraryFunction inside fc, except that it does something completely different and was created by us, not by Compile. Let us inject our libFun into the existing CompiledFunction:

fcLibFunc = fc /. _LibraryFunction -> libFun;

fcLibFunc[10]
(* 11 *)

Now the big question is: does fcLibFunc work on lists, and does it do the work in parallel?

fcLibFunc[{1, 2, 3, 4, 5, 6, 7, 8, 9}]
(* {2, 3, 4, 5, 6, 7, 8, 9, 10} *)

That seems to work. Creating a bigger example shows that the function runs in parallel. Let us time this simple toy function against a compiled function that does the same:

fc2 = Compile[{{x, _Integer, 0}},
   x + 1,
   Parallelization -> True,
   RuntimeAttributes -> {Listable},
   CompilationTarget -> "C"
   ];

r = Range[10^6];

Do[fc2[r], {100}] // AbsoluteTiming
(* {3.97018, Null} *)

Do[fcLibFunc[r], {100}] // AbsoluteTiming
(* {3.13338, Null} *)

I have measured it several times, and on my machine fcLibFunc needs only about 80% of the runtime of fc2. I do not know why this is or whether it generalizes, but we have shown that it is possible to make a library function parallel-Listable.

Let me end by making clear the steps to do this yourself:

1. Create a fake compiled function like the one above that has the exact same type as your library function. Please note that you cannot use library functions that change their input arguments. Therefore, you should always use "Constant" passing.

2. Load your library function and replace the last argument inside the CompiledFunction with it. This ensures that your library function is called instead of the code that was created by Compile.

3. Keep in mind that when you call this new function with the wrong arguments, the high-level fake code is used instead! To give an example, try to evaluate fcLibFunc[I].
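The fallback to the high-level code can be guarded against. The following is a minimal sketch, assuming a working C compiler and the "demo" library from the LibraryLink examples; safeCall is a hypothetical helper name and not part of any API. It relies on the type mismatch producing a message, which Check detects.

```mathematica
fc = Compile[{{x, _Integer, 0}}, x,
  Parallelization -> True,
  RuntimeAttributes -> {Listable},
  CompilationTarget -> "C"];
libFun = LibraryFunctionLoad["demo", "demo_I_I",
  {{Integer, 0, "Constant"}}, Integer];
fcLibFunc = fc /. _LibraryFunction -> libFun;

(* Hypothetical helper: return $Failed whenever the call emits a message,
   for example because the argument does not match the compiled signature
   and the high-level fake code would run instead of the library code. *)
safeCall[f_, arg_] := Quiet[Check[f[arg], $Failed]]

safeCall[fcLibFunc, Range[5]]  (* matches the signature: library code path *)
safeCall[fcLibFunc, I]         (* type mismatch: expected to give $Failed *)
```

Wrapping calls this way trades a little overhead for the certainty that a result never silently comes from the fake code.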
