Let's say I have a file named file.m that contains:
test[] := (
Print["test"]
)
How would I go about extracting the function declarations in such a file consistently as a list?
I have figured out how to extract the data without having it execute.
Import["file.m", "Text"]
This is my current code.
Cases[
FullForm[
MakeExpression[Import["file.m", "Text"], TraditionalForm]
],
SetDelayed[x__, y__] -> f[x, y]
]
Answer
Preamble
This is actually not an easy task if you want to do it quickly and cleanly. For one of my side projects I needed to analyze the symbols inside packages, so I will share some of the code I ended up with.
Speed
The following function is more than an order of magnitude faster than the one based on Import[...,"HeldExpressions"] (about 28 times faster in the benchmark below):
ClearAll[loadFile];
loadFile[path_String?FileExistsQ]:=
  DeleteCases[
    ToExpression[
      FromCharacterCode[BinaryReadList[path]],InputForm,HoldComplete
    ],
    Null
  ]
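It reads the file as raw bytes, converts them to a string, and parses that string with ToExpression, wrapping the result in HoldComplete so that nothing is evaluated. Hopefully this function is still robust enough.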
Benchmarks
Benchmarks on a medium-size package:
file = FileNameJoin[{$InstallationDirectory,"SystemFiles","Links","JLink", "JLink.m"}];
Do[Import[file, "HeldExpressions"], {100}] // AbsoluteTiming
Do[loadFile[file], {100}] // AbsoluteTiming
(*
{4.351563, Null}
{0.153320, Null}
*)
The result of loadFile has a slightly different format: all of the code is wrapped in a single HoldComplete. It is easy, however, to transform it into separate pieces, each wrapped in its own HoldComplete.
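As a minimal sketch of that transformation (the held expression below is a made-up stand-in for the output of loadFile, and the names held and pieces are only illustrative), one can map HoldComplete over the parts and then replace the outer head with List:

```mathematica
(* Stand-in for the result of loadFile: several expressions in one HoldComplete *)
held = HoldComplete[f[x_] := x^2, g[] := Print["hi"], 1 + 1];

(* Wrap every top-level part in its own HoldComplete, then turn the outer
   HoldComplete into a List. Nothing is evaluated, because each part is
   already wrapped by the time the outer head is replaced. *)
pieces = List @@ (HoldComplete /@ held)

(* {HoldComplete[f[x_] := x^2], HoldComplete[g[] := Print["hi"]], HoldComplete[1 + 1]} *)
```

The order of the two steps matters: applying List first would release the hold and evaluate the definitions.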
The speed difference can be quite important. For my application it was: I would have been in trouble using Import.
Namespace pollution
The problem
Perhaps even worse than the speed issue is namespace pollution, a flaw shared by both methods described so far. New symbols are created as a side effect of parsing the code, and they end up in whatever happens to be the current working context. For example, for the case at hand:
Names["Global`*`*"]//Short
(*
{Global`Private`arg,Global`Private`arg$,Global`Private`e,
<<21>>,Global`Private`val,Global`Private`val$,Global`Package`$jlinkDir}
*)
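The effect can be reproduced on a small scale without any package file. The following is only a sketch: the context name demoCtx` and the symbol name demoNewSymbol are hypothetical, chosen so that they do not clash with anything. It parses a string while $Context is temporarily changed and then looks at what the parser left behind:

```mathematica
(* Parse, but do not evaluate, a definition while demoCtx` is the current context.
   Only System` is on the context path, so every non-System symbol is new. *)
Block[{$Context = "demoCtx`", $ContextPath = {"System`"}},
  ToExpression["demoNewSymbol[x_] := x", InputForm, HoldComplete]
];

(* The parser created the symbols demoNewSymbol and x in demoCtx`,
   even though the definition itself was never evaluated *)
created = Names["demoCtx`*"]

(* Remove the leftover symbols again *)
Remove["demoCtx`*"]
```

Parsing into a dedicated context and removing its symbols afterwards is the idea that the solution below builds on.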
Suggested solution
Using the implementation given at the bottom of this post, here is how I ended up doing this. I introduced a macro with the following signature:
withHeldCodeInTemporaryContext[{var_Symbol, code_}, fname_, opts : OptionsPattern[]]
It reads the file fname, parses it in a temporary context, assigns the result to the variable var, then evaluates the code code, and finally cleans up the temporary context. Here is an example of use (you need to run the code posted below first for it to work):
Block[
  {
    heldCode,
    file=
      FileNameJoin[
        {$InstallationDirectory,"SystemFiles","Links","JLink","JLink.m"}
      ]
  },
  withHeldCodeInTemporaryContext[
    {
      heldCode,
      Union[
        Cases[heldCode,s_Symbol:>ToString[Unevaluated[s]],\[Infinity],Heads->True]
      ]
    },
    file
  ]
]
This code collects the names of all symbols that appear in the code being read. The result looks like:
(*
{And,AppendTo,BeginPackage,Blank,Check,Close,
<<78>>,$ContextPath,$Failed,$Input,$Off,$SystemID,$VersionNumber}
*)
One can check that neither the working context nor any other context was polluted.
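The same kind of Cases call can pull out function declarations, which is what the question asked for. This is a sketch rather than part of the code above: the held expression is a made-up stand-in for the parsed file contents, and the names held and declared are illustrative. HoldPattern keeps the SetDelayed in the pattern from evaluating, and HoldForm keeps the extracted left-hand sides inert:

```mathematica
(* Stand-in for parsed file contents; the first definition is the one from the question *)
held = HoldComplete[
   test[] := (Print["test"]),
   other[x_] := x + 1,
   y = 3
  ];

(* Collect the left-hand side of every := definition, at any depth, without evaluating it *)
declared = Cases[held, HoldPattern[SetDelayed[lhs_, _]] :> HoldForm[lhs], Infinity]

(* {test[], other[x_]}, each wrapped in HoldForm; y = 3 is a Set and is skipped *)
```

Inside withHeldCodeInTemporaryContext the same Cases expression can be used as the code argument, with heldCode in place of held.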
Implementation
Here is the code (formatting done using the code formatter palette):
SetAttributes[CleanUp,HoldAll];
CleanUp[expr_,cleanup_]:=
  Module[{exprFn,result,abort=False,rethrow=True,seq},
    exprFn[]:=
      expr;
    result=
      CheckAbort[
        Catch[Catch[result=exprFn[];rethrow=False;result],_,seq[##1]&],
        abort=True
      ];
    cleanup;
    If[abort,Abort[]];
    If[rethrow,Throw[result/.seq->Sequence]];
    result
  ]
SetAttributes[parseInContext,HoldFirst];
Options[parseInContext]=
  {
    LocalizingContext->"MyLocalizingContext`",
    DefaultImportedContexts:>{},
    ExtraImportedContexts:>{}
  };
parseInContext[code_,opts:OptionsPattern[]]:=
  Module[
    {
      result,
      context=OptionValue[LocalizingContext],
      defcontexts=OptionValue[DefaultImportedContexts],
      extraContexts=OptionValue[ExtraImportedContexts],
      allContexts
    },
    allContexts={Sequence@@defcontexts,Sequence@@extraContexts};
    Block[{$ContextPath},
      CleanUp[
        BeginPackage[context];Needs/@allContexts;result=code,
        EndPackage[]
      ];
      result
    ]
  ];
ClearAll[inPrivateContext];
SetAttributes[inPrivateContext,HoldAll];
inPrivateContext[code_]:=
  CleanUp[Begin["`Private`"];code,End[]];
ClearAll[parseInPrivateSubcontext];
SetAttributes[parseInPrivateSubcontext,HoldFirst];
parseInPrivateSubcontext[code_,opts:OptionsPattern[]]:=
  parseInContext[inPrivateContext[code],opts];
ClearAll[withTemporaryContext];
SetAttributes[withTemporaryContext,HoldRest];
withTemporaryContext[context_String,{contVar_Symbol,code_}]:=
  Block[{contVar=context},
    With[{names=context<>"Private`*",remove=If[Names[#1]=!={},Remove[#1]]&},
      CleanUp[remove[names];code,remove[names]]
    ]
  ];
ClearAll[withHeldCodeInTemporaryContext];
Options[withHeldCodeInTemporaryContext]=
  {
    TemporaryContextName->"TemporaryContext`",
    ExtraImportedContexts:>{"Global`"}
  };
SetAttributes[withHeldCodeInTemporaryContext,HoldFirst];
withHeldCodeInTemporaryContext[
  {var_Symbol,code_},fname_,opts:OptionsPattern[]
]:=
  Module[{tempcont},
    Block[{var},
      withTemporaryContext[
        OptionValue[TemporaryContextName],
        {
          tempcont,
          parseInPrivateSubcontext[
            var=loadFile[fname];code,
            LocalizingContext->tempcont,
            ExtraImportedContexts->OptionValue[ExtraImportedContexts]
          ]
        }
      ]
    ]
  ];
The code contains a number of macros, some of which are general (like WReach's CleanUp and a few others), while others specialize the generic ones to the narrower goals we set here.