performance tuning - Finding all sets of elements with two common elements in a large dataset

I have a very large network of atoms ($\approx 10^6$ atoms) with fixed positions that resemble a cellular structure:

enter image description here

I have two files including:

fixed locations of each atom

set of points which makes each cell.

The first one is like:

 1 1.72907 3.50783
 2 3.89771 506.561
 3 514.767 4.35252
 ...

The second dataset has $\approx 5 \times 10^5$ rows (but size of each row is different). For example:

{1,3,485,969,970,971,1452}
{1,487,488,970,972}
{1, 485, 486, 487, 966, 968}
{2,99706,99707,99708,99709,100190,100191}
{2,99225,99226,99227,99708,99710,99711}
{2, 99222, 99223, 99224, 99225, 99706}
...

I need to find out all sets which have exactly two elements in common ($=$ two cells share an edge or in other words, are neighbors). I already have a code but it's inefficient because it compares the sets line by line. Here's my code ($n$ is number of rows):

 ParallelEvaluate[file = OpenWrite["RN" <> ToString[$KernelID] <> ".dat"]]    

 ParallelDo[
 WriteString[file, i, " ",
 Flatten[Last[Reap[Do[If[Length[Intersection[ring[[i]], ring[[j]]]] == 2, 
 Sow[j]], {j, 1, n}]]]], "\n"]
 , {i, 1, n}];

 ParallelEvaluate[Close@file];

I find intersection length of a specific set $i$ (ring[[i]]) with all other sets and if it is equal to two, I write the set number in a file. Is there anyway to improve efficiency of this code?

Update

I have an alternative solution without using Intersection and with only one loop, as follows:

 ring = ReadList["rings.dat", Number, RecordLists -> True];
 ParallelEvaluate[file = OpenWrite["RN" <> ToString[$KernelID] <> ".dat"]]    

 ParallelDo[
 RN = Complement[First/@Tally[Flatten[First /@ Position[ring, #] & /@ ring[[i]]]],   
 {i}];

 WriteString[file, i, " ", RN ,"\n"]
 , {i, 1, n}];

 ParallelEvaluate[Close@file];

But it seems it is not that much better than previous one.

Answer

apologies for typos I had to retype this. (edit there was one now fixed)

 amax = Max@Flatten@idx;

construct complementary connectivity list.

 atomc = Flatten@# &/@ Last@Reap[Do[Sow[i,#]&/@idx[[i]],{i,Length[idx]}],Range[amax]];

extract neighbors (should be fast):

 celln = Flatten@(First/@Select[Tally@Flatten[atomc[[#]]&/@#],#[[2]]==2 &]) &/@idx ;

timing results for the example set w/ 823 cells:

{.0251,.0238}

for the two steps.

example result: celln[[400]]

{397,399,402,744,745}

This takes a minute for ~10^6 cells.

Blog

Search This Blog

performance tuning - Finding all sets of elements with two common elements in a large dataset

Update

Comments

Post a Comment

Popular posts from this blog

plotting - How to draw lines between specified dots on ListPlot?

equation solving - Invert and fit implicitly defined curve

dynamic - How can I make a clickable ArrayPlot that returns input?