I have a very large network of atoms ($\approx 10^6$ atoms) with fixed positions that resemble a cellular structure:
I have two files including:
- fixed locations of each atom
- set of points which makes each cell.
The first one is like:
1 1.72907 3.50783
2 3.89771 506.561
3 514.767 4.35252
...
The second dataset has $\approx 5 \times 10^5$ rows (but size of each row is different). For example:
{1,3,485,969,970,971,1452}
{1,487,488,970,972}
{1, 485, 486, 487, 966, 968}
{2,99706,99707,99708,99709,100190,100191}
{2,99225,99226,99227,99708,99710,99711}
{2, 99222, 99223, 99224, 99225, 99706}
...
I need to find out all sets which have exactly two elements in common ($=$ two cells share an edge or in other words, are neighbors). I already have a code but it's inefficient because it compares the sets line by line. Here's my code ($n$ is number of rows):
ParallelEvaluate[file = OpenWrite["RN" <> ToString[$KernelID] <> ".dat"]]
ParallelDo[
WriteString[file, i, " ",
Flatten[Last[Reap[Do[If[Length[Intersection[ring[[i]], ring[[j]]]] == 2,
Sow[j]], {j, 1, n}]]]], "\n"]
, {i, 1, n}];
ParallelEvaluate[Close@file];
I find intersection length of a specific set $i$ (ring[[i]]
) with all other sets and if it is equal to two, I write the set number in a file. Is there anyway to improve efficiency of this code?
Update
I have an alternative solution without using Intersection
and with only one loop, as follows:
ring = ReadList["rings.dat", Number, RecordLists -> True];
ParallelEvaluate[file = OpenWrite["RN" <> ToString[$KernelID] <> ".dat"]]
ParallelDo[
RN = Complement[First/@Tally[Flatten[First /@ Position[ring, #] & /@ ring[[i]]]],
{i}];
WriteString[file, i, " ", RN ,"\n"]
, {i, 1, n}];
ParallelEvaluate[Close@file];
But it seems it is not that much better than previous one.
Answer
apologies for typos I had to retype this. (edit there was one now fixed)
amax = Max@Flatten@idx;
construct complementary connectivity list.
atomc = Flatten@# &/@ Last@Reap[Do[Sow[i,#]&/@idx[[i]],{i,Length[idx]}],Range[amax]];
extract neighbors (should be fast):
celln = Flatten@(First/@Select[Tally@Flatten[atomc[[#]]&/@#],#[[2]]==2 &]) &/@idx ;
timing results for the example set w/ 823 cells:
{.0251,.0238}
for the two steps.
example result: celln[[400]]
{397,399,402,744,745}
This takes a minute for ~10^6 cells.
Comments
Post a Comment