Let's say I have a file named file.m that contains:
test[] := (
Print["test"]
)
How would I go about extracting the function declarations in such a file consistently as a list?
I have figured out how to read the file's contents without executing them:
Import["file.m", "Text"]
This is my current code:
Cases[
FullForm[
MakeExpression[Import["file.m", "Text"], TraditionalForm]
],
SetDelayed[x__, y__] -> f[x, y]
]
Answer
Preamble
This is actually not an easy task if you want to do it fast and cleanly. I have been developing some functionality for one of my side projects, for which I needed to analyze symbols inside packages, so I will share some of the code I ended up with.
Speed
The following function is considerably faster than the approach based on Import[...,"HeldExpressions"] (roughly 30 times faster in the benchmark below):
ClearAll[loadFile];
loadFile[path_String?FileExistsQ]:=
DeleteCases[
ToExpression[
FromCharacterCode[BinaryReadList[path]],InputForm,HoldComplete
],
Null
]
Hopefully, this function is still robust enough.
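As a minimal sketch of how this could be applied to the original question, the held code can be searched for := definitions with Cases. The sample string src and the extra definition foo are assumptions made only for illustration; the parsing step is the same ToExpression call that loadFile uses, so for a real file one would write held = loadFile["file.m"] instead.

```mathematica
(* Sample source text standing in for the contents of file.m (foo is hypothetical) *)
src = "test[] := (\nPrint[\"test\"]\n)\nfoo[x_] := x^2";

(* Same parsing step that loadFile performs, applied to a string: nothing is evaluated *)
held = DeleteCases[ToExpression[src, InputForm, HoldComplete], Null];

(* Collect the left-hand side of every := definition, each kept unevaluated *)
defs = Cases[held, HoldPattern[lhs_ := _] :> HoldComplete[lhs], Infinity]
```

This only picks up SetDelayed definitions, at any depth. Definitions made with = or with ^:= would need additional patterns.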
Benchmarks
Benchmarks on a medium-size package:
file = FileNameJoin[{$InstallationDirectory,"SystemFiles","Links","JLink", "JLink.m"}];
Do[Import[file, "HeldExpressions"], {100}] // AbsoluteTiming
Do[loadFile[file], {100}] // AbsoluteTiming
(*
{4.351563, Null}
{0.153320, Null}
*)
The format of the result of loadFile is a little different - it just wraps all code in one HoldComplete. But it is easy to transform it into separate HoldComplete wrappers around the individual pieces.
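One possible way to do that transformation, as a sketch: map HoldComplete over the held expression and then replace the outer head with List. The expression held below is a made-up stand-in for the output of loadFile.

```mathematica
(* A stand-in for what loadFile returns: several expressions inside one HoldComplete *)
held = HoldComplete[a = 1 + 1, f[x_] := x^2, Print["not evaluated"]];

(* Wrap each top-level piece in its own HoldComplete while the outer wrapper still
   holds everything, then turn the outer HoldComplete into a List *)
pieces = List @@ (HoldComplete /@ held)
```

Nothing is evaluated along the way, because every piece is inside some HoldComplete at each step.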
The speed difference can be quite important. For my application it was: I would have been in trouble using Import.
Namespace pollution
The problem
Perhaps even worse than the speed issue is the one of namespace pollution, which is a flaw shared by both methods described so far. The new symbols are being created as a result of parsing the code, and they are created in whatever happens to be the current working context. For example, for the case at hand:
Names["Global`*`*"]//Short
(*
{Global`Private`arg,Global`Private`arg$,Global`Private`e,
<<21>>,Global`Private`val,Global`Private`val$,Global`Package`$jlinkDir}
*)
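The effect can be reproduced in isolation. The following sketch uses a made-up symbol name, pollutionDemoSymbol, which is assumed not to exist in the session beforehand, and shows that merely parsing a string into held form is enough to create the symbol in the current context.

```mathematica
(* Hypothetical symbol name, kept as a string so that it is not created prematurely *)
name = "pollutionDemoSymbol";

before = Names[$Context <> name];

(* Parsing alone creates the symbol, even though nothing is evaluated *)
ToExpression[name <> "[1]", InputForm, HoldComplete];

after = Names[$Context <> name];

(* Compare the lookups before and after parsing, then remove the demo symbol again *)
{before, after}
Remove[Evaluate[$Context <> name]]
```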
Suggested solution
With the implementation details described at the bottom of the post, here is the way I ended up doing this: I introduced the macro with the following signature:
withHeldCodeInTemporaryContext[{var_Symbol, code_}, fname_, opts : OptionsPattern[]]
which is supposed to read in the file fname, parse it into some temporary context, assign the result to a variable var, then execute the code code, and finally clean up that temporary context. Here is an example of use (you will need to run the code posted below for it to work):
Block[
{
heldCode,
file=
FileNameJoin[
{$InstallationDirectory,"SystemFiles","Links","JLink","JLink.m"}
]
},
withHeldCodeInTemporaryContext[
{
heldCode,
Union[
Cases[heldCode,s_Symbol:>ToString[Unevaluated[s]],\[Infinity],Heads->True]
]
},
file
]
]
This code collects all the symbols which build up the code being read. The result looks like:
(*
{And,AppendTo,BeginPackage,Blank,Check,Close,
<<78>>,$ContextPath,$Failed,$Input,$Off,$SystemID,$VersionNumber}
*)
but one can check that the working context (or any other context) was not polluted.
Implementation
Here is the code (formatting done using the code formatter palette):
SetAttributes[CleanUp,HoldAll];
CleanUp[expr_,cleanup_]:=
Module[{exprFn,result,abort=False,rethrow=True,seq},
exprFn[]:=
expr;
result=
CheckAbort[
Catch[Catch[result=exprFn[];rethrow=False;result],_,seq[##1]&],
abort=True
];
cleanup;
If[abort,Abort[]];
If[rethrow,Throw[result/.seq->Sequence]];
result
]
SetAttributes[parseInContext,HoldFirst];
Options[parseInContext]=
{
LocalizingContext->"MyLocalizingContext`",
DefaultImportedContexts:>{},
ExtraImportedContexts:>{}
};
parseInContext[code_,opts:OptionsPattern[]]:=
Module[
{
result,
context=OptionValue[LocalizingContext],
defcontexts=OptionValue[DefaultImportedContexts],
extraContexts=OptionValue[ExtraImportedContexts],
allContexts
},
allContexts={Sequence@@defcontexts,Sequence@@extraContexts};
Block[{$ContextPath},
CleanUp[
BeginPackage[context];Needs/@allContexts;result=code,
EndPackage[]
];
result
]
];
ClearAll[inPrivateContext];
SetAttributes[inPrivateContext,HoldAll];
inPrivateContext[code_]:=
CleanUp[Begin["`Private`"];code,End[]];
ClearAll[parseInPrivateSubcontext];
SetAttributes[parseInPrivateSubcontext,HoldFirst];
parseInPrivateSubcontext[code_,opts:OptionsPattern[]]:=
parseInContext[inPrivateContext[code],opts];
ClearAll[withTemporaryContext];
SetAttributes[withTemporaryContext,HoldRest];
withTemporaryContext[context_String,{contVar_Symbol,code_}]:=
Block[{contVar=context},
With[{names=context<>"Private`*",remove=If[Names[#1]=!={},Remove[#1]]&},
CleanUp[remove[names];code,remove[names]]
]
];
ClearAll[withHeldCodeInTemporaryContext];
Options[withHeldCodeInTemporaryContext]=
{
TemporaryContextName->"TemporaryContext`",
ExtraImportedContexts:>{"Global`"}
};
SetAttributes[withHeldCodeInTemporaryContext,HoldFirst];
withHeldCodeInTemporaryContext[
{var_Symbol,code_},fname_,opts:OptionsPattern[]
]:=
Module[{tempcont},
Block[{var},
withTemporaryContext[
OptionValue[TemporaryContextName],
{
tempcont,
parseInPrivateSubcontext[
var=loadFile[fname];code,
LocalizingContext->tempcont,
ExtraImportedContexts->OptionValue[ExtraImportedContexts]
]
}
]
]
];
The code contains a number of macros, some of which are general (like WReach's CleanUp and a few others), while others specialize the generic ones to the narrower goals we set here.