performance tuning - Finding column with maximum sum of by-element products of column subsets efficiently?
A couple of recent questions and associated answers reminded me of an old problem:
As a toy example, take:
SeedRandom[1211]
matsize = 5;
subsize = 3;
mat = RandomInteger[{-10, 10}, {matsize, matsize}];
mat // MatrixForm
$\left( \begin{array}{ccccc} -2 & 8 & 2 & 7 & -7 \\ 5 & -2 & -4 & 2 & -3 \\ 3 & -9 & -1 & -3 & 5 \\ 4 & -7 & 8 & 9 & -7 \\ 4 & 3 & -3 & -8 & 7 \\ \end{array} \right)$
where rows represent some subject/object under test, columns are different sets of conditions, and the elements are an integer score on [-10,10].
It is desired to find the column index whose elements, taken over all subsets of size subsize, have the maximum value when the elements of each subset are multiplied and those products are totaled.
For the toy example, this is column 2, with a result of 487.
A naive way might be
Tr /@ Apply[Times, Subsets[#, {subsize}] & /@ Transpose[mat], {2}] // Ordering // Last
but that will obviously blow RAM for cases with large subset cardinality.
Much more memory efficient, but with its own issues is:
SymmetricPolynomial[subsize, #] & /@ Transpose[mat] // Ordering // Last
How might you do this, with the proviso that I'd prefer pure Mathematica (compilation to WVM is fine, compilation to C means I'rather just write it directly in C or whatever), and the actual use-case is matsize=100 and subsize=30.
Edit: Since adding bounty, I came up with
(Coefficient[Times @@ (1 + #1 x) // Expand, x^#2] // Position[#, Max@#] &) &[mat, subsize]
Fastest way so far (on my box), reasonable RAM use.
Comments
Post a Comment