I am trying to list every file that sits directly inside any directory with a given name (a file inside a subdirectory of such a directory does not count). For example, suppose I want all files in each folder called "Preferences", restricting the search to the folder ~/.Mathematica. From the terminal, I could just run
find ~/.Mathematica -regex ~/.Mathematica.*Preferences/[^/]*
This works, and I see a single file matching my criterion: ~/.Mathematica/ApplicationData/Parallel/Preferences/Preferences.m
But I want to do it conveniently in Mathematica. I think the FileNames function should do it.
I will first run
SetDirectory["~/.Mathematica"]
Then I would run
fileAndDirectoryNames =
FileNames["*",RegularExpression[".*Preferences"], 1]
followed by
fileNames = Select[fileAndDirectoryNames, ! DirectoryQ[#] &]
However, this gives incorrect results for me: fileAndDirectoryNames is an empty list. If I instead run
fileAndDirectoryNames =
FileNames["*", RegularExpression[".*/.*/Preferences"], 1]
and recompute fileNames as before, then I get the correct output.
I am confused because it seems to me that the regular expression in my second attempt is stricter (allows fewer matches) than the one in my first attempt. The FileNames function should be monotonic in its second argument: if you weaken the pattern, the new output ought to be a superset of the original output. Yet this doesn't seem to happen. Why is this? I am not sure whether the problem lies with Mathematica or with my understanding of regular expressions.
Answer
All three parameters of FileNames can affect the depth at which Mathematica searches for results. It seems your confusion results from the interaction among these parameters. This is easily understandable, as the documentation for FileNames is not very illustrative. (Indeed, my first attempt at answering this question was faulty for the same reason.)
The first parameter -- the form -- should be thought of as a relative path. It has no intrinsic depth specification, but will be tested at depths specified by the next two parameters. However, it is possible to control the depth of the search with this parameter by specifying a folder hierarchy in the form you are searching for. (See below.) The form can be a literal string, a string with simple wildcards (*, etc.), a Mathematica-style string pattern, or a regular expression.
The second parameter -- the directories -- specifies the top-level locations in which Mathematica will conduct its search. The first parameter will be tested relative to what is specified here. This can also be a literal or a pattern, same as above.
The third parameter -- the depth -- tells Mathematica whether it should repeat the search for the first parameter in subdirectories of the paths specified in the second parameter. When its value is 1 (the default), Mathematica will only return matches that are immediately relative to a directory specified in the second argument.
Rather than writing a bunch of prose, I think it will be easier to just supply some examples to see how these things can interact.
First, here is the entire directory tree of the folder tmp:
FileNames["*", "tmp", Infinity]
{"tmp/1B.2010-2011.dataless", "tmp/Preferences", "tmp/Preferences/test6", "tmp/t1", "tmp/t1/Preferences", "tmp/t1/Preferences/dir1", "tmp/t1/Preferences/Preferences", "tmp/t1/Preferences/Preferences/test7", "tmp/t1/Preferences/test1", "tmp/t1/Preferences/test2", "tmp/t1/st1", "tmp/t1/st1/Preferences", "tmp/t1/st1/Preferences/test9", "tmp/t2", "tmp/t3", "tmp/t3/Preferences", "tmp/t3/Preferences/test3", "tmp/t3/Preferences/test4", "tmp/test5"}
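To follow along, a tree like this one can be rebuilt in a scratch location. This is only a sketch under some assumptions: paths use Unix-style separators, dir1 and t2 are taken to be empty directories, and every other leaf in the listing is taken to be an empty file. The names root, dirs and files are illustrative and not part of the original setup.

```mathematica
(* Assumption: work in a fresh, uniquely named scratch directory. *)
root = CreateDirectory[];
SetDirectory[root];

(* Directories from the listing; dir1 and t2 are assumed to be empty directories. *)
dirs = {"tmp/Preferences", "tmp/t1/Preferences/dir1",
   "tmp/t1/Preferences/Preferences", "tmp/t1/st1/Preferences",
   "tmp/t2", "tmp/t3/Preferences"};
Scan[CreateDirectory[#, CreateIntermediateDirectories -> True] &, dirs];

(* Remaining leaves are assumed to be plain (empty) files. *)
files = {"tmp/1B.2010-2011.dataless", "tmp/Preferences/test6",
   "tmp/t1/Preferences/Preferences/test7", "tmp/t1/Preferences/test1",
   "tmp/t1/Preferences/test2", "tmp/t1/st1/Preferences/test9",
   "tmp/t3/Preferences/test3", "tmp/t3/Preferences/test4", "tmp/test5"};
Scan[CreateFile, files];

(* Walk the whole tree to compare against the listing. *)
FileNames["*", "tmp", Infinity]
```

With the working directory set this way, the relative paths in the remaining examples should resolve against the rebuilt tree.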
So of course we see that Infinity directs Mathematica to walk the whole tree. By contrast, the default value (1) yields:
FileNames["*", "tmp"]
{"tmp/1B.2010-2011.dataless", "tmp/Preferences", "tmp/t1", "tmp/t2", "tmp/t3", "tmp/test5"}
Similarly,
FileNames["*", "tmp", 2]
{"tmp/1B.2010-2011.dataless", "tmp/Preferences", "tmp/Preferences/test6", "tmp/t1", "tmp/t1/Preferences", "tmp/t1/st1", "tmp/t2", "tmp/t3", "tmp/t3/Preferences", "tmp/test5"}
This is all straightforward. Now, consider these examples. Take note of how we are controlling the depth of the search in various ways.
FileNames["t1/*", "tmp"]
{"tmp/t1/Preferences", "tmp/t1/st1"}
FileNames["*", "tmp/t1"]
{"tmp/t1/Preferences", "tmp/t1/st1"}
FileNames["t1/*", "tmp", 2]
{"tmp/t1/Preferences", "tmp/t1/Preferences/dir1", "tmp/t1/Preferences/Preferences", "tmp/t1/Preferences/test1", "tmp/t1/Preferences/test2", "tmp/t1/st1", "tmp/t1/st1/Preferences"}
FileNames["t1/*", "tmp", Infinity]
{"tmp/t1/Preferences", "tmp/t1/Preferences/dir1", "tmp/t1/Preferences/Preferences", "tmp/t1/Preferences/Preferences/test7", "tmp/t1/Preferences/test1", "tmp/t1/Preferences/test2", "tmp/t1/st1", "tmp/t1/st1/Preferences", "tmp/t1/st1/Preferences/test9"}
FileNames["test*", "tmp/t1", Infinity]
{"tmp/t1/Preferences/Preferences/test7", "tmp/t1/Preferences/test1", "tmp/t1/Preferences/test2", "tmp/t1/st1/Preferences/test9"}
FileNames["*", "tmp/*/Preferences"]
{"tmp/t1/Preferences/dir1", "tmp/t1/Preferences/Preferences", "tmp/t1/Preferences/test1", "tmp/t1/Preferences/test2", "tmp/t3/Preferences/test3", "tmp/t3/Preferences/test4"}
Note that * in the second parameter is not matching nested directories. (E.g., we are not getting "tmp/t1/Preferences/Preferences/test7".) The same happens if we try RegularExpression["tmp/.*/Preferences"]. The reason is given in the documentation:
Mathematica syntax is sometimes inconsistent in unpredictable ways to remind users of the imperfection of the human condition.
FileNames["*", "tmp/*/*/Preferences", Infinity]
{"tmp/t1/Preferences/Preferences/test7", "tmp/t1/st1/Preferences/test9"}
The best way to conduct the search in question, then, is to describe the folder hierarchy in the first argument.
paths = FileNames[RegularExpression["Preferences/[^/]+"],"tmp",Infinity]
{"tmp/Preferences/test6", "tmp/t1/Preferences/dir1", "tmp/t1/Preferences/Preferences", "tmp/t1/Preferences/Preferences/test7", "tmp/t1/Preferences/test1", "tmp/t1/Preferences/test2", "tmp/t1/st1/Preferences/test9", "tmp/t3/Preferences/test3", "tmp/t3/Preferences/test4"}
Notice how RegularExpression is doing what we would expect when it is passed to the form parameter.
And then we can filter as needed.
Select[Not@*DirectoryQ]@paths
{"tmp/Preferences/test6", "tmp/t1/Preferences/Preferences/test7", "tmp/t1/Preferences/test1", "tmp/t1/Preferences/test2", "tmp/t1/st1/Preferences/test9", "tmp/t3/Preferences/test3", "tmp/t3/Preferences/test4"}
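As a cross-check that does not rely on how FileNames interprets patterns, one could instead walk the whole tree and filter on the parent directory's name. This is an alternative sketch, not the approach used above; filesDirectlyIn is a hypothetical helper name, not a built-in.

```mathematica
(* Hypothetical helper: files whose immediate parent directory is named dirName. *)
filesDirectlyIn[dirName_String, root_String] :=
 Select[
  FileNames["*", root, Infinity],   (* every entry below root *)
  Function[path,
   With[{parts = FileNameSplit[path]},
    (* keep non-directories whose second-to-last path element is dirName *)
    ! DirectoryQ[path] && Length[parts] >= 2 && parts[[-2]] === dirName]]]

(* Usage, relative to the current working directory *)
filesDirectlyIn["Preferences", "tmp"]
```

On the tmp tree listed earlier this should select the same files as the filtered result above. The trade-off is that it enumerates the entire tree before filtering, which may be slow for large directories.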