Skip to main content

performance tuning - Efficiently computing current flow betweenness centrality for graphs


Definitions:


Given a graph G=(V,E), the current flow betweenness is a node-wise measure that captures the fraction of current through a given node with a unit source (s) sink (t) supply bst (1 unit of current inserted at node s, bst(s)=1 and extracted at node t, bst(t)=−1, and bst(v)=0 for v∈V∖{s,t}).


For a fixed s-t pair, the throughput Ï„ of a node v is given by:


τst(v)=12(−|bst(v)|+∑e∋v|I(e)|)


where bst is the supply function defined above for the given s,t pair, I(e) is the current flowing through edge e, and e∋v means all edges incident on vertex v (i.e. v is part of, irrespective of it being at tail or head of edge).


Now the current flow betweenness centrality of a node v is simply a normalized sum over all its throughput for all possible supplied pairs s,t, i.e.:


c(v)=1(n−1)(n−2)∑s,t∈Vτs,t(v).





My implementation of the current-flow betweenness centrality goes as follows:



  • Given a graph G, I compute its incidence matrix b, corresponding Laplacian lap, and its inverse in S only once at the begining.

  • Then I have a module which takes n (n=|V|), b, S, conductances, supply nodes s,t and returns the list of currents through edges for the given s,t pair as supply.

  • Then I have module that computes Ï„st given in (1), in which I use a piecewise function for supply bst, and use Total[] to compute the sum in (1).

  • Then I have a module that computes c given in (2), where I use a Table to compute Ï„ of v for all possible s,t and then again use Total to sum them.

  • Finally, to compute c for all nodes I create a table that runs over all nodes and calls the module for c.


Actual implementation with a dummy random graph to showcase:


SeedRandom[123]

n = 15;
m = 20;
G = RandomGraph[{n, m}, VertexLabels -> "Name"]
edges = EdgeList[G];

GDirected =
Graph[Range[n], Map[#[[1]] -> #[[2]] &, edges],
VertexLabels -> "Name"]
conductances = ConstantArray[1., m];
b = -1.*Transpose[IncidenceMatrix[GDirected]];

lap = b\[Transpose].DiagonalMatrix[SparseArray[conductances]].b;
a = SparseArray[ConstantArray[1., {1, n}]];
A = ArrayFlatten[{{lap, a\[Transpose]}, {a, 0.}}];
S = LinearSolve[A];
\[Epsilon] = 1. 10^-8;
s = 1;
t = 2;

Edge current module:


edgecurrents[ncount_, invertedkirch_, incid_, conducarr_, nodei_, 

nodej_, threshold_] :=
Module[{n = ncount, solver = invertedkirch, incidmat = incid,
G = conducarr, source = nodei, sink = nodej, eps = threshold},
appliedcurr = 1.;
J = SparseArray[{{source}, {sink}} -> {appliedcurr, -appliedcurr}, \
{n}, 0.];
psi = solver[Join[J, {0.}]][[;; -2]];
edgecurr = G incidmat.psi;
(*define current threshold to take care of small values*)


foundcurrents = Threshold[edgecurr, eps];
Return[foundcurrents, Module];
];

Ï„ module:


tau[edgels_, currls_, source_, sink_, vertex_] := 
Module[{edges = edgels, iedges = currls, s = source, t = sink,
v = vertex},
bst[u_, so_, to_] := Piecewise[{{1., u == so}, {-1., u == to}}, 0.];
If[s == t,

res = 0.,
incidv =
Flatten[Position[
edges, (v \[UndirectedEdge] _ | _ \[UndirectedEdge] v)]];
If[incidv == {},
inoutcurrs = 0.;
,
inoutcurrs = Total[Abs[Part[iedges, incidv]]];
];
res = 0.5*(-Abs[bst[v, s, t]] + inoutcurrs);

];
Return[res, Module];
];

c module:


currinbet[vcount_, edgels_, conduc_, vertex_, threshold_] := 
Module[{n = vcount, edges = edgels, conducmat = conduc, v = vertex,
eps = threshold},
taust =
Table[tau[edges, edgecurrents[n, S, b, conducmat, s, t, eps], s,

t, v], {s, n}, {t, n}];
ccb = Total[taust, 2]/((n - 1)*(n - 2));
Return[ccb, Module];
];

Example of currents for s=1,t=2:


edgecurrents[n, S, b, conductances, s, t, \[Epsilon]]
{0.640145, 0.359855, -0.0198915, -0.200723, -0.039783, -0.640145, \
-0.0994575, -0.0144665, 0., 0.0144665, -0.0198915, -0.0433996, \
0.0578662, -0.0144665, 0.359855, -0.359855, 0.101266, -0.0596745, 0., \

0.}

and computing the current-flow betweenness for all nodes:


vccb = Threshold[
Table[currinbet[n, EdgeList[G], conductances, i, \[Epsilon]], {i, 1,
n}], \[Epsilon]]

{0.182869, 0.403493, 0.268327, 0.052163, 0.253522, 0.240516, \
0.524532, 0.135177, 0., 0.208672, 0.275441, 0., 0., 0.282883, \
0.246786}


The obtained results are cross-checked with the existing Python library Networkx for computing c and they are in perfect agreement. But sadly efficiency wise, I am doing terribly.




Improved notebook version after Henrik Schumacher's suggestions can be downloaded here, with a working example.




Questions:




  • I (think) have minimized the current through edge calculations since S is simply pre-computed, thanks to Henrik Schumacher's approach here. However, I have the feeling I might be doing some things terribly inefficiently from then onward, as my routine slows down drastically for larger graphs. Is there anywhere I could be doing things much more efficiently?





  • Is my module-based approach or use of tables also responsible for part of the slow-down?




  • Maybe one line of optimization would be to cast (1) and (2) into linear-algebraic computations to speed them up, but I currently do not see how to do so.




(Any general feedback for rendering the code more efficient is most welcome of course.)



Answer



One potential bottleneck is



incidv = Flatten[Position[edges, (v \[UndirectedEdge] _ | _ \[UndirectedEdge] v)]]

as it involves (i) a search in the rather long list of edges and (ii) pattern matching, which both tend to be rather slow.


A quicker way will be to compute all these lists at once via


vertexedgeincidences = IncidenceMatrix[G]["AdjacencyLists"];

and to access the v-th one like this:


incidv = vertexedgeincidences[[v]]

The numbers



inoutcurrs = Total[Abs[Part[iedges, incidv]]];

can also all be computed at once for all v. This can be done with the help if the incidence matrix


B = IncidenceMatrix[G];

via


B.Abs[iedges]

As a general suggestion: Whenever you find yourself evaluating a Sum or Total of something, try to reprase it into Dot-products of vectors, matrices, etc.


Comments

Popular posts from this blog

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...

mathematical optimization - Minimizing using indices, error: Part::pkspec1: The expression cannot be used as a part specification

I want to use Minimize where the variables to minimize are indices pointing into an array. Here a MWE that hopefully shows what my problem is. vars = u@# & /@ Range[3]; cons = Flatten@ { Table[(u[j] != #) & /@ vars[[j + 1 ;; -1]], {j, 1, 3 - 1}], 1 vec1 = {1, 2, 3}; vec2 = {1, 2, 3}; Minimize[{Total@((vec1[[#]] - vec2[[u[#]]])^2 & /@ Range[1, 3]), cons}, vars, Integers] The error I get: Part::pkspec1: The expression u[1] cannot be used as a part specification. >> Answer Ok, it seems that one can get around Mathematica trying to evaluate vec2[[u[1]]] too early by using the function Indexed[vec2,u[1]] . The working MWE would then look like the following: vars = u@# & /@ Range[3]; cons = Flatten@{ Table[(u[j] != #) & /@ vars[[j + 1 ;; -1]], {j, 1, 3 - 1}], 1 vec1 = {1, 2, 3}; vec2 = {1, 2, 3}; NMinimize[ {Total@((vec1[[#]] - Indexed[vec2, u[#]])^2 & /@ R...

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]