We know that Compile has an option RuntimeAttributes -> {Listable}, which lets the compiled function easily exploit the power of thread parallelization. So is it possible for a LibraryLink function to be Listable, like a function created with Compile?
Answer
There is an easier solution than the one I gave almost 2 years ago. In principle, you wrap your library function inside another CompiledFunction that is listable. Let the code speak:
fun = LibraryFunctionLoad["demo", "demo_I_I", {Integer}, Integer];
With[{fc = fun},
funListable = Compile[{{i, _Integer, 0}}, fc[i],
RuntimeAttributes -> {Listable},
Parallelization -> True]
];
If you inspect the compiled function, you see that there is no external call. Instead, you find a special instruction that calls the library function code.
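One way to do this inspection, assuming the standard CompiledFunctionTools` package is available, is to print the instruction listing of funListable with CompilePrint. This is a sketch of how to look, not a claim about what exactly the listing contains on your version:

```wolfram
(* Load the package that provides CompilePrint *)
Needs["CompiledFunctionTools`"]

(* Print the register and instruction listing of the wrapper defined above. *)
(* A call back into the main evaluator would be a sign that the wrapping did not work. *)
CompilePrint[funListable]
```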
The performance is exceptionally good even though we have this wrapping. It might not be the best idea to profile such a simple function, but let's do it anyway. For comparison, I'm creating the same function as directly compiled code:
inc = Compile[{{i, _Integer, 0}},
i + 1,
RuntimeAttributes -> {Listable},
Parallelization -> True
(* , CompilationTarget -> "C" *)
]
You should leave out the CompilationTarget, because in my tests the version compiled to C was slower than the one that runs on the virtual machine:
r = Range[10^7];
funListable[r]; // AbsoluteTiming
inc[r]; // AbsoluteTiming
funListable[r] === inc[r]
For the parallelized library function, I get a runtime of 0.42 seconds, while the compiled version needs 0.53 seconds (about 0.7 seconds with compilation target "C"). If you want to see this paradigm outperform other solutions, you should read this answer of mine where I used it to parallelize highly complex, non-trivial C code that came from a library.
It seems it is possible. At least I got a toy example that works. Without having any specific information about this topic from WRI, I always suspected that LibraryLink was not mainly created to give users a way to attach shared library functions to the kernel. I believe that the underlying technology was first used in Compile to make CompilationTarget -> "C" possible, and that afterwards a part of the framework was exposed to the user as LibraryLink. If someone has more information about this, please feel free to add it here.
That being said, when you look at the InputForm of a simple compiled function, you will find that it contains a LibraryFunction in exactly the same way you would get it when loading your own library functions with LibraryFunctionLoad:
fc = Compile[{{x, _Integer, 0}},
x,
Parallelization -> True,
RuntimeAttributes -> {Listable},
CompilationTarget -> "C"
];
InputForm[fc]
(*
CompiledFunction[{10, 10.3, 5852}, {_Integer},
{{2, 0, 0}, {2, 0, 0}}, {}, {0, 1, 0, 0, 0}, {{1}},
Function[{x}, x, Listable], Evaluate,
LibraryFunction["/home/some/path/compiledFunction0.so",
"compiledFunction0", {{Integer, 0, "Constant"}}, Integer
]
]
*)
Some tests with Compile seemed to indicate that the generated code is the same with or without the Listable attribute. This makes me believe that the distribution of arguments for a parallel evaluation of fc happens before the actual LibraryFunction is called. Therefore, when fc is called with a tensor, there might be some wrapper C function that calls the underlying LibraryFunction on all elements of the tensor in parallel.
My idea was that it might be possible to replace the LibraryFunction inside this CompiledFunction when the type of the function is correct. In the above example, fc gets a single integer and returns a single integer. Let us use a LibraryLink example of the same type:
libFun = LibraryFunctionLoad["demo", "demo_I_I",
{{Integer, 0, "Constant"}}, Integer]
Note that this function does something different than fc because it increments its argument by one. Additionally, we can verify that it is not Listable.
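One way to check this, given here only as an assumption about what such a test could look like, is to call libFun directly with a list. Because the function was loaded with a scalar Integer argument, the call should not thread over the elements:

```wolfram
(* libFun was loaded as {{Integer, 0, "Constant"}} -> Integer, i.e. scalar in, scalar out. *)
(* A list does not match that signature, so this is expected to fail with an *)
(* argument-type message instead of returning the incremented list. *)
libFun[{1, 2, 3}]
```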
Looking at InputForm[libFun] reveals that it has exactly the same type as the LibraryFunction inside fc, except that it does something completely different and was created by us, not by Compile. Let us inject our libFun into the existing CompiledFunction:
fcLibFunc = fc /. _LibraryFunction -> libFun;
fcLibFunc[10]
(* 11 *)
Now the big question is: does fcLibFunc work on lists, and does it do the work in parallel?
fcLibFunc[{1, 2, 3, 4, 5, 6, 7, 8, 9}]
(* {2, 3, 4, 5, 6, 7, 8, 9, 10} *)
That seems to work. A bigger example shows that the function runs in parallel. Let us time this simple toy function against a compiled function that does the same:
fc2 = Compile[{{x, _Integer, 0}},
x + 1,
Parallelization -> True,
RuntimeAttributes -> {Listable},
CompilationTarget -> "C"
];
r = Range[10^6];
Do[fc2[r], {100}] // AbsoluteTiming
(* {3.97018, Null} *)
Do[fcLibFunc[r], {100}] // AbsoluteTiming
(* {3.13338, Null} *)
I have measured it several times, and on my machine fcLibFunc needs only about 80% of the runtime of fc2. I do not know why this is or whether it generalizes, but we have shown that it is possible to make a library function parallel and Listable.
Let me end by spelling out the steps to do this yourself:
1. Create a fake compiled function like the one above that has exactly the same type as your library function. Please note that you cannot use library functions that change their input arguments. Therefore, you should always use "Constant" passing.
2. Load your library function and replace the last argument inside the CompiledFunction with it. This ensures that your library function is called instead of the code that was created by Compile.
3. Keep in mind that when you call this new function with arguments of the wrong type, the high-level fake code is used instead of your library function! To give an example, try to evaluate fcLibFunc[I].
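The first two steps can be bundled into a small helper. This is only a sketch for the Integer to Integer case used above: makeListable is a hypothetical name that does not appear in the original code, and it relies on the same replacement trick, which is not documented behaviour and may change between versions.

```wolfram
(* Hypothetical helper: wraps an Integer -> Integer library function *)
(* in a parallel, Listable CompiledFunction. *)
makeListable[lib_LibraryFunction] :=
 Module[{fake},
  (* Step 1: fake compiled function with the same scalar signature *)
  fake = Compile[{{x, _Integer, 0}}, x,
    Parallelization -> True,
    RuntimeAttributes -> {Listable},
    CompilationTarget -> "C"];
  (* Step 2: swap the LibraryFunction generated by Compile for ours *)
  fake /. _LibraryFunction -> lib
  ]

(* Usage, assuming libFun was loaded with "Constant" passing as shown above *)
libFunListable = makeListable[libFun];
libFunListable[Range[5]]
```

For other signatures, the argument specification and the body of the fake function would have to be adapted so that its type matches the library function exactly.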