Skip to main content

How to split data into clusters based on fitting to function


I have data which looks somewhat like this:


data


On the picture, you can see that the data can be described by 2 linear functions - if you manage to split it into 2 groups, each can easily be fitted with a x + b


However, when I run FindClusters on that data, I get this:


clusters


It seems, I need to adjust it somehow


Can anyone show me how to properly use FindClusters to split this data?



UPDATE


The functions are all known and the same.


To play with, the data can be emulated with:


dd = Join[Table[ {x, .3 x + 2 + RandomReal[{-1, 1}]}, {x, 0, 100}], 
Table[ {x, .6 x - 20 + RandomReal[{-1, 1}]}, {x, 0,
100}]]; ListPlot[dd]; cc = FindClusters[dd]; ListPlot[cc]

Answer



If we can assume we know more or less the functions for the two groups, in this case two linear functions (two parameters each) we can just NMinimize for the Min distance of the two functions.


{w, sol} = NMinimize[
Total[

Map[
Min[Abs[#[[2]] - (n1 + m1 #[[1]])],
Abs[#[[2]] - (n2 + m2 #[[1]])]] &
, dd
]
], {n1, n2, m1, m2}]


{95.3241, {n1 -> 1.84575, n2 -> -19.8623, m1 -> 0.300732,  m2 -> 0.598053}}


Once we have extracted the parameters for the two functions, then we just GatherBy each point, based on which function is closer.


ListPlot@Module[{n1, n2, m1, m2},
{n1, n2, m1, m2} = Values[sol];
GatherBy[dd,
LessEqual[Abs[#[[2]] - (n1 + m1 #[[1]])],
Abs[#[[2]] - (n2 + m2 #[[1]])]] &]
]


Mathematica graphics




More general


Define your function with arbitrary number of parameters


func[parms_List][x_] := parms[[1]] + parms[[2]]  x

A distance function


dist[f_][{x_, y_}] := Abs[y - f[x]]

Or (after comment by @SampoSmolander)


dist[f_][{x_, y_}] := (y - f[x])^2


Fake data


dd = Table[
With[
{
a = RandomChoice[{1, 4}],
b = RandomChoice[{1/2, 2}]
}, {x, func[{a, b}][x] + RandomReal[{-1, 1}]}], {x, 0, 10,
0.01}];


Minimisation, The i index goes through the number of parameters of each function (here 2), the index j over the number of functions to fit (in this case 4).


sol2 = Last@With[{n = 2, m = 4},
NMinimize[
Total@Map[Function[{L}, Min @@ Table[
dist[func[Array[c[#, k] &, n]]][L]
, {k, m}]], dd]
, Flatten[Table[c[i, j], {i, n}, {j, m}]]]
]

Plot



Show[
ListPlot@GatherBy[
dd
, Position[#, Min[#]] &[
Table[dist[func[{c[1, k], c[2, k]} /. sol2]][#], {k, 4}]] &
],
Plot[
Evaluate[
Table[
func[{c[1, k], c[2, k]}][x]

, {k, 4}] /. sol2
]
, {x, 0, 10}
]]


Mathematica graphics



Obviously there is no way to be sure this will converge to reasonable clusters if the lines are too mixed or the functions too complicated.


Fitting different number of functions



Let's see how does this perform when there are four groups of points and we ask for m different number of clusters, from 1 to 6.


plots = Table[
sol2 = Last@With[{n = 2},
NMinimize[
Total@Map[Function[{L}, Min @@ Table[
dist[func[Array[c[#, k] &, n]]][L]
, {k, m}]], dd]
, Flatten[Table[c[i, j], {i, n}, {j, m}]]]
];
ListPlot[

GatherBy[
dd
, Position[#, Min[#]] &[
Table[dist[func[{c[1, k], c[2, k]} /. sol2]][#], {k, m}]] &
], PlotLabel -> m], {m, 6}];

Grid@Partition[plots, 2]

Mathematica graphics


Comments

Popular posts from this blog

front end - keyboard shortcut to invoke Insert new matrix

I frequently need to type in some matrices, and the menu command Insert > Table/Matrix > New... allows matrices with lines drawn between columns and rows, which is very helpful. I would like to make a keyboard shortcut for it, but cannot find the relevant frontend token command (4209405) for it. Since the FullForm[] and InputForm[] of matrices with lines drawn between rows and columns is the same as those without lines, it's hard to do this via 3rd party system-wide text expanders (e.g. autohotkey or atext on mac). How does one assign a keyboard shortcut for the menu item Insert > Table/Matrix > New... , preferably using only mathematica? Thanks! Answer In the MenuSetup.tr (for linux located in the $InstallationDirectory/SystemFiles/FrontEnd/TextResources/X/ directory), I changed the line MenuItem["&New...", "CreateGridBoxDialog"] to read MenuItem["&New...", "CreateGridBoxDialog", MenuKey["m", Modifiers-...

How to thread a list

I have data in format data = {{a1, a2}, {b1, b2}, {c1, c2}, {d1, d2}} Tableform: I want to thread it to : tdata = {{{a1, b1}, {a2, b2}}, {{a1, c1}, {a2, c2}}, {{a1, d1}, {a2, d2}}} Tableform: And I would like to do better then pseudofunction[n_] := Transpose[{data2[[1]], data2[[n]]}]; SetAttributes[pseudofunction, Listable]; Range[2, 4] // pseudofunction Here is my benchmark data, where data3 is normal sample of real data. data3 = Drop[ExcelWorkBook[[Column1 ;; Column4]], None, 1]; data2 = {a #, b #, c #, d #} & /@ Range[1, 10^5]; data = RandomReal[{0, 1}, {10^6, 4}]; Here is my benchmark code kptnw[list_] := Transpose[{Table[First@#, {Length@# - 1}], Rest@#}, {3, 1, 2}] &@list kptnw2[list_] := Transpose[{ConstantArray[First@#, Length@# - 1], Rest@#}, {3, 1, 2}] &@list OleksandrR[list_] := Flatten[Outer[List, List@First[list], Rest[list], 1], {{2}, {1, 4}}] paradox2[list_] := Partition[Riffle[list[[1]], #], 2] & /@ Drop[list, 1] RM[list_] := FoldList[Transpose[{First@li...

plotting - How to draw lines between specified dots on ListPlot?

I would like to create a plot where I have unconnected dots and some connected. So far, I have figured out how to draw the dots. My code is the following: ListPlot[{{1, 1}, {2, 2}, {3, 3}, {4, 4}, {1, 4}, {2, 5}, {3, 6}, {4, 7}, {1, 7}, {2, 8}, {3, 9}, {4, 10}, {1, 10}, {2, 11}, {3, 12}, {4,13}, {2.5, 7}}, Ticks -> {{1, 2, 3, 4}, None}, AxesStyle -> Thin, TicksStyle -> Directive[Black, Bold, 12], Mesh -> Full] I have thought using ListLinePlot command, but I don't know how to specify to the command to draw only selected lines between the dots. Do have any suggestions/hints on how to do that? Thank you. Answer One possibility would be to use Epilog with Line : ListPlot[ {{1, 1}, {2, 2}, {3, 3}, {4, 4}, {1, 4}, {2, 5}, {3, 6}, {4, 7}, {1, 7}, {2, 8}, {3, 9}, {4, 10}, {1, 10}, {2, 11}, {3, 12}, {4, 13}, {2.5, 7}}, Ticks -> {{1, 2, 3, 4}, None}, AxesStyle -> Thin, TicksStyle -> Directive[Black, Bold, 12], Mesh -> Full, Epilog -> { Line[ ...