I naively thought that the second argument of BeginPackage can simply be used to ensure the loading and availability in the $ContextPath of additional packages.
Example:
(* Pack1.m *)
BeginPackage["Pack1`"]
Print["Pack1: ", $ContextPath]
EndPackage[]
and
(* Pack2.m *)
BeginPackage["Pack2`", {"Pack1`"}]
Print["Pack2: ", $ContextPath]
EndPackage[]
Now loading Pack2` gives me this:
Needs["Pack2`"]
During evaluation of Pack1: {Pack1`,System`}
During evaluation of Pack2: {Pack2`,Pack1`,System`}
$ContextPath
(* {"Pack2`", "Pack1`", ...} *)
Everything is as I expected. Pack1` is available for use within the implementation of Pack2 and it's also available to the user (i.e. included in the $ContextPath) after Pack2 has finished loading.
This second behaviour is what makes the second argument of BeginPackage so convenient. If we had done BeginPackage["Pack2`"]; Needs["Pack1`"]; ... instead, then Pack1` would have been available for use only within the implementation of Pack2, but not after Pack2 has finished loading.
Let us now add a third package that depends on Pack2`.
(* Pack3.m *)
BeginPackage["Pack3`", {"Pack2`"}]
Print["Pack3: ",$ContextPath]
EndPackage[]
In a fresh kernel, let's load Pack3.
Needs["Pack3`"]
During evaluation of Pack1: {Pack1`,System`}
During evaluation of Pack2: {Pack2`,Pack1`,System`}
During evaluation of Pack3: {Pack3`,Pack2`,System`}
$ContextPath
(* {"Pack3`", "Pack2`", "Pack1`", ...} *)
Just like before, loading only Pack3` makes all three packages available to the user (i.e. includes all of them in the $ContextPath).
But wait! Adding the Pack2` dependency to Pack3 did not make Pack1` available within the implementation of Pack3. I found this surprising and I got bitten by it (it caused a strange bug).
Do others find this surprising too? What is the reasoning for this behaviour? What are its advantages compared to simply ensuring that Pack2 makes Pack1 available as well regardless of whether we need it inside of a package (for its implementation) or outside of it (for interactive use)?
What implications does this behaviour have for proper package design, especially in the case when the Kernel/init.m file of a package Gets several sub-packages, each of which has its own context and BeginPackage? This complex package might then be the dependency of another.
Answer
I've certainly encountered this behavior before. While I can't speak authoritatively, I'd think this is as designed, although it does introduce a certain inconsistency. I also think that this issue results from a clash of cultures: the end-user-oriented one from the earlier days of Mathematica, and the one coming from standard software-engineering practices.
A bit of history, and the package extension model
I think it may help to go a little into the history of the package mechanism.
On one hand, private import via Needs inside a package wasn't always available: in early versions of Mathematica, such an import would still keep the context of the loaded package on the $ContextPath after the package had been loaded, even if called in the private section of the package.
On the other hand, to me it looks like early package development practices had, in particular, these features:
- (Deeply) nested package dependencies were not very common
- In many cases, one wanted to extend some of the built-in packages, giving the end user the functionality of one of the core packages extended with some additional functionality, rather than build a package based on other packages but expose only that package's interface.
- The end-user typically was not supposed to know anything about package development and such. In other words, the majority of end-users were considered non-programmers (which was IMO quite justified).
So, the main goals of the second argument of BeginPackage were, I think, these:
- The usual one: make the functionality from packages listed in the second argument of BeginPackage available for the implementation of a given package
- Provide a formal declarative syntax to specify package dependencies
- Provide a way to expose an extended package "MyPackage`", that would also make additional packages available to the end-users without a need to call Needs on them separately.
While the first two goals are really necessary and are present in one form or another in most programming languages, the last one is different. The inconsistency it causes is exactly what you noted: if we think of a package as an extension of one or more other packages, then it should always behave like that, no matter which way we load it.
Why I think the current behavior is better
From the developer's perspective, I'd argue that the current behavior is better than if BeginPackage put all those packages and their dependencies on the $ContextPath. Here are the reasons:
- Information-hiding. It is always better to make only as much information available to the client code as it needs, and no more.
- Level of control. By restricting the set of loaded packages to only those listed in the second argument of BeginPackage (but not their dependencies), the system allows the developer to second-guess developers of those packages s/he uses to build a given package at hand. The developer can then control exactly which packages are on the $ContextPath during the loading of their own code. If the packages were loaded automatically with all their dependencies, the developer of a given package would need to care about name clashes with all those dependent packages, and perhaps remove them from the $ContextPath. However, in general this is not even possible, since s/he is not supposed to know all those dependencies of dependencies.
- Nesting ambiguity. If the dependencies of dependencies were loaded too, then which packages do we keep on the $ContextPath after the package loads? Where do we stop in this nesting? We may end up adding a bunch of contexts to the $ContextPath, which would be really bad.
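The "level of control" point can be made concrete with the packages from the question. Since BeginPackage exposes only the contexts it is given, a package whose own code uses symbols from Pack1` can simply say so. What follows is a minimal sketch, assuming Pack1.m and Pack2.m from the question are on $Path:

```mathematica
(* Pack3.m: lists every context its own code uses directly *)
BeginPackage["Pack3`", {"Pack2`", "Pack1`"}]

(* Pack1` is on $ContextPath here because Pack3 asked for it explicitly,
   not because Pack2 happens to depend on it *)
Print["Pack3: ", $ContextPath]

EndPackage[]
```

With this declaration, Pack3 no longer relies on an implementation detail of Pack2. If Pack2 later drops its dependency on Pack1, Pack3 should keep working.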
Implications for package design
I think the guiding principles should be separation of responsibility and information-hiding.
The fact that your package A depends on packages B and C, is an internal business of your package, about which the users of A could not care less. So, there are two different cases here:
You do want to expose some of the functionality of B and/or C to the end user of your package A
As I said before, this really doesn't sound like the right approach to me, in particular because, preferably, for any functionality there should be a single source that provides it and is responsible for it. Of course, duplication of code is the worst, but duplication of the interface is almost as bad.
However, if you really need to, there is always delegation: create your own wrappers around the functions from B and C that you need, and expose these wrappers as a part of the interface of A. This means that you now take responsibility for those functions as a part of A's interface, which is IMO the right thing to do. If something later changes in the functionality of B or C, it will be your responsibility to maintain the consistency of the interface of A, so you don't put that burden on the users of A.
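The delegation idea could look roughly like this. It is only a sketch: A`, B`, fooA and B`foo are placeholder names, and a file B.m that defines a public function foo is assumed to exist on $Path.

```mathematica
(* A.m *)
BeginPackage["A`"]

fooA::usage = "fooA[x] gives the result of B`foo[x]; it is part of A's own interface.";

Begin["`Private`"]

(* private import: B` is loaded here, but it does not stay on
   $ContextPath after EndPackage[] *)
Needs["B`"]

(* thin wrapper; the fully qualified name makes the delegation explicit *)
fooA[x_] := B`foo[x]

End[]
EndPackage[]
```

A user who evaluates Needs["A`"] can then call fooA without B` being added to their $ContextPath, and A remains the only interface they depend on.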
You don't require the end user of A to have access to B and C. Then you don't need anything besides the standard single-package development machinery.
If your package contains several sub-packages, you simply load those in your Kernel/init.m. This does not change the fact that you expose a single interface. If you want to make those sub-packages relatively independent, and packages in their own right, this may require using some delegation / wrappers inside your main package, as a price for it, as I mentioned above. In most cases, however, I wouldn't do it, but would let those users who need that functionality load those packages (B and/or C in this example) separately, on their own.
Summary
Putting it more bluntly: the extension package idea (leaving contexts on the $ContextPath after the package has been loaded) may be OK from the end-user usability point of view, but is IMO a failed concept for software development.
The advantage of such an approach is that it is friendly to non-programmers. The disadvantage is that it does not nest, and therefore also does not scale. From the software engineering perspective, a package cannot and should not take any responsibility for other packages, and should only be responsible for its own interface. In addition, the "extension package" model makes it harder to enforce information hiding and encapsulation, two more of the fundamental software engineering principles.
So, here I agree with Kuba, in that I tend to use the second argument of BeginPackage rarely. The only scalable model of imports is private imports, which is also what all other programming languages I am familiar with use.