I would like a discrete distance measure between two binary vectors (or strings), similar to HammingDistance,
but I want the vectors to be considered closer if they have more matches that are separated by zeros (or a default value).
For example, given the following four vectors and a distance measure thedistancemeasure
vec1={1,0,0,0,0,1,0,1};
vec2={1,0,1,0,0,1,0,0};
vec3={1,0,0,0,1,1,0,0};
vec4={0,1,0,0,1,1,0,0};
the following should hold:
thedistancemeasure[vec1,vec2]< thedistancemeasure[vec3,vec4]
True
The measure should favor small groups of matches that are well separated over a large group of matches that are "connected" or less separated.
The number of zeros shouldn't matter, but if it does, I prefer that more zeros give a smaller measure: the more separated, the better.
If possible, I also want the measure to give even smaller distances for a higher count of well-separated, correctly matched ones. For example,
vec5={1,0,0,1,0,1,0,1};
vec6={1,0,0,1,0,0,0,1};
should give:
thedistancemeasure[vec1,vec2]>thedistancemeasure[vec5,vec6]
True
The size of the vectors is always fixed.
It might be possible to use the output of ListCorrelate,
since it should give the position correlations between the lists.
Answer
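ClearAll[distF1, distF2]
(* Both functions take a pair {a, b} and first find the positions where a and b are both nonzero. *)
(* distF1: minus the number of matched positions, so more matches give a smaller distance. *)
distF1 = With[{p = Intersection @@ (Flatten@SparseArray[#]["NonzeroPositions"] & /@ #)},
   -Length@p] &;
(* distF2: minus the sum of the gaps between consecutive matched positions, so more widely spread matches give a smaller distance. *)
distF2 = With[{p = Intersection @@ (Flatten@SparseArray[#]["NonzeroPositions"] & /@ #)},
   -Total[Differences@p]] &;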
Example:
vec1 = {1, 0, 0, 0, 0, 1, 0, 1};
vec2 = {1, 0, 1, 0, 0, 1, 0, 0};
vec3 = {1, 0, 0, 0, 1, 1, 0, 0};
vec4 = {0, 1, 0, 0, 1, 1, 0, 0};
vec5 = {1, 0, 0, 1, 0, 1, 0, 1};
vec6 = {1, 0, 0, 1, 0, 0, 0, 1};
vecs = {vec1, vec2, vec3, vec4, vec5, vec6};
pairs = Partition[vecs, 2];
plabels = {"v1v2", "v3v4", "v5v6"};
Sort the pairs lexicographically in ascending order using the distance function distF1, breaking ties with the distance function distF2:
SortBy[pairs, {distF1, distF2}] /. Thread[pairs -> plabels]
{"v5v6", "v1v2", "v3v4"}
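If a single number is preferred over a two-key sort, the two keys can be folded into one value. The following is only a sketch under the assumption that both vectors are binary and have the same fixed length n; the names matchedPositions and combinedDistance are hypothetical and not part of the answer above. Because the sum of the gaps between matched positions is at most n - 1, weighting the match count by n preserves the lexicographic order of {distF1, distF2}.

```mathematica
(* Hypothetical helper: positions where both binary vectors have a 1 *)
matchedPositions[a_List, b_List] := Flatten@Position[a*b, 1]

(* Hypothetical single-valued measure: the match count is weighted by the
   vector length n, so it always dominates the spread term (at most n - 1) *)
combinedDistance[a_List, b_List] :=
 With[{p = matchedPositions[a, b], n = Length[a]},
  -(n*Length[p] + Total[Differences[p]])]

vec1 = {1, 0, 0, 0, 0, 1, 0, 1};
vec2 = {1, 0, 1, 0, 0, 1, 0, 0};
vec3 = {1, 0, 0, 0, 1, 1, 0, 0};
vec4 = {0, 1, 0, 0, 1, 1, 0, 0};
vec5 = {1, 0, 0, 1, 0, 1, 0, 1};
vec6 = {1, 0, 0, 1, 0, 0, 0, 1};

combinedDistance[vec1, vec2]  (* -21 *)
combinedDistance[vec3, vec4]  (* -17 *)
combinedDistance[vec5, vec6]  (* -31 *)
```

With these values, both inequalities from the question hold: -21 < -17 and -21 > -31. Note that the result is negative, so it orders pairs as requested but is not a metric in the strict sense.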