
filtering - How to get list of duplicates when using DeleteDuplicates?



This might be easy, but I can't find a way to use DeleteDuplicates to also get a list of the actual duplicates.


Example:


lstA = {1, 2, 4, 4, 6, 7, 8, 8};
r = DeleteDuplicates[lstA]
(* {1, 2, 4, 6, 7, 8} *)

I also want a list of the actual duplicates, which are {4, 8} in this example. It would have been nice if DeleteDuplicates also returned those, but there is no option for that. I do not know why many Mathematica functions do not return more useful information when called. Many seem to return only one piece of information, and one has to call another function to get another piece.


For example, here DeleteDuplicates could have had an API like this


{r,d}=DeleteDuplicates[lst]


and r would contain the list after duplicates are removed, while d would contain the actual list of duplicates. To make it even more useful, it could be


{r,d,p}=DeleteDuplicates[lst]

where p would be the positions of the duplicates in the original list. MATLAB seems to do it this way; many functions there can return more than one piece of information at a time. This might be due to WL being a functional programming language designed for cascading function calls, where each function does only one thing at a time. I am not sure.





Answer



Perhaps one of the simplest ways is to use Tally:


p = {1, 2, 4, 4, 6, 7, 8, 8};

Cases[Tally @ p, {x_, n_ /; n > 1} :> x]



{4, 8}
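
If the positions are wanted as well, along the lines of the three-part result asked about in the question, one possible sketch uses PositionIndex (available from version 10). The name duplicateInfo is made up here for illustration and is not a built-in function; the result variable is called pos rather than p only to avoid overwriting the list p used in this answer.

```mathematica
(* duplicateInfo is a hypothetical helper, not part of the Wolfram Language *)
duplicateInfo[lst_List] :=
 Module[{dups},
  (* keep only the elements that occur at more than one position *)
  dups = Select[PositionIndex[lst], Length[#] > 1 &];
  {
   DeleteDuplicates[lst],          (* list with duplicates removed *)
   Keys[dups],                     (* elements that occur more than once *)
   Flatten[Rest /@ Values[dups]]   (* positions of the repeated occurrences *)
  }
 ]

{r, d, pos} = duplicateInfo[{"a", "b", "a", "c", "b", "a"}]
(* {{"a", "b", "c"}, {"a", "b"}, {3, 6, 5}} *)
```

The positions come out grouped by element rather than in ascending order; wrapping them in Sort would give ascending order if that is preferred.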

A somewhat faster formulation:


Pick[#, Unitize[#2 - 1], 1] & @@ Transpose[Tally @ p]

Taking the optimization to a rather excessive degree:


#[[SparseArray[#2, Automatic, 1]["AdjacencyLists"]]] & @@ Transpose[Tally @ p]
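
To unpack the last form a little: Transpose[Tally @ p] separates the distinct values from their counts, and the SparseArray is built from the counts with 1 as the background value, so only counts other than 1 are stored. The "AdjacencyLists" property is, as far as I know, undocumented; for a vector it appears to return the positions of the stored (non-background) entries. A small sketch, with vals and counts as scratch names introduced here:

```mathematica
(* vals and counts are scratch names used only in this illustration *)
{vals, counts} = Transpose[Tally[{1, 2, 4, 4, 6, 7, 8, 8}]];

(* counts equal to the background value 1 are not stored in the sparse array *)
sa = SparseArray[counts, Automatic, 1];

(* positions of the stored entries, i.e. of the counts that differ from 1 *)
idx = sa["AdjacencyLists"];

(* the values at those positions are the duplicated elements *)
vals[[idx]]
```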




Though not as fast as the SparseArray optimized form of Tally, an alternative is to use Split after sorting. This is reasonably clean and fast:


Flatten[Split @ Sort @ p, {2}][[2]]


{4, 8}
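
To see why this works, the intermediate results for the example list can be inspected step by step. The names q, runs and cols are scratch variables introduced here for illustration.

```mathematica
q = {1, 2, 4, 4, 6, 7, 8, 8};

(* sorting puts equal elements next to each other; Split groups them into runs *)
runs = Split[Sort[q]]
(* {{1}, {2}, {4, 4}, {6}, {7}, {8, 8}} *)

(* Flatten with {2} transposes the ragged list: row k collects the k-th element of each run *)
cols = Flatten[runs, {2}]
(* {{1, 2, 4, 6, 7, 8}, {4, 8}} *)

(* a second element exists only for runs of length 2 or more *)
cols[[2]]
(* {4, 8} *)
```

Note that the final [[2]] assumes at least one duplicate exists; when every element is unique there is no second row and Part issues an error message.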

For Integer data this method is twice as fast as any other listed here:


With[{s = Sort @ p},
  DeleteDuplicates @
    s[[ SparseArray[Unitize @ Differences @ s, Automatic, 1]["AdjacencyLists"] ]]
]



Timings


p = RandomInteger[1*^8, 1*^6];

Cases[Tally @ p, {x_, n_ /; n > 1} :> x] // Timing // First

Pick[#, Unitize[#2 - 1], 1] & @@ Transpose[Tally @ p] // Timing // First


#[[SparseArray[#2, Automatic, 1]["AdjacencyLists"]]] & @@ Transpose[Tally @ p] //
Timing // First

Flatten[Split @ Sort @ p, {2}][[2]] // Timing // First

With[{s = Sort @ p},
DeleteDuplicates @
s[[ SparseArray[Unitize @ Differences @ s, Automatic, 1]["AdjacencyLists"] ]]
] // Timing // First



0.827

0.343

0.265

0.343


0.11
