This might be easy, but I can't find a way to use DeleteDuplicates to also get a list of the actual duplicates.
Example:
lstA = {1, 2, 4, 4, 6, 7, 8, 8};
r = DeleteDuplicates[lstA]
(* {1, 2, 4, 6, 7, 8} *)
I also want to get a list of the actual duplicates, which are {4, 8} in this example. It would have been nice if DeleteDuplicates also returned those, but it has no option for that. I do not know why many Mathematica functions do not return more useful information when called. Many seem to return only one piece of information, and one has to call another function to get another piece.
For example, DeleteDuplicates could have had an API like this
{r,d}=DeleteDuplicates[lst]
where r would contain the list after duplicates are removed, and d would contain the actual list of duplicates. To make it even more useful, it could be
{r,d,p}=DeleteDuplicates[lst]
where p would be the positions of the duplicates in the original list. Matlab seems to do it this way: many functions there can return more than one piece of information at a time. This might be due to WL being a functional programming language, designed for cascading function calls, where each function does only one thing at a time. I am not sure.
Answer
Perhaps one of the simplest ways is to use Tally:
p = {1, 2, 4, 4, 6, 7, 8, 8};
Cases[Tally @ p, {x_, n_ /; n > 1} :> x]
(* {4, 8} *)
A somewhat faster formulation:
Pick[#, Unitize[#2 - 1], 1] & @@ Transpose[Tally @ p]
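To make the Pick formulation easier to follow, here is a sketch that unrolls it into named steps for the small example list. The names vals, counts and mask are only illustrative and do not appear in the one-liner above.

```mathematica
p = {1, 2, 4, 4, 6, 7, 8, 8};

(* Tally gives {value, count} pairs; Transpose separates values from counts *)
{vals, counts} = Transpose[Tally[p]]
(* {{1, 2, 4, 6, 7, 8}, {1, 1, 2, 1, 1, 2}} *)

(* counts - 1 is 0 for unique values; Unitize maps every nonzero entry to 1 *)
mask = Unitize[counts - 1]
(* {0, 0, 1, 0, 0, 1} *)

(* keep the values whose mask entry is 1 *)
Pick[vals, mask, 1]
(* {4, 8} *)
```

The gain over the Cases version presumably comes from replacing per-element pattern matching with operations that work on whole lists at once.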
Taking the optimization to a rather excessive degree:
#[[SparseArray[#2, Automatic, 1]["AdjacencyLists"]]] & @@ Transpose[Tally @ p]
Though not as fast as the SparseArray optimized form of Tally, an alternative is to use Split after sorting. This is reasonably clean and fast:
Flatten[Split @ Sort @ p, {2}][[2]]
(* {4, 8} *)
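The intermediate results of the Split approach, shown step by step for the small example list. The name runs is only illustrative. The last line is a possible variant, not part of the original one-liner, that returns an empty list when there are no duplicates, whereas taking [[2]] would produce a Part error in that case.

```mathematica
p = {1, 2, 4, 4, 6, 7, 8, 8};

(* after sorting, equal elements are adjacent, so Split groups them into runs *)
runs = Split[Sort[p]]
(* {{1}, {2}, {4, 4}, {6}, {7}, {8, 8}} *)

(* Flatten with {2} acts as a transpose for ragged lists:
   the first row holds the first element of every run,
   the second row holds the second element of runs of length >= 2 *)
Flatten[runs, {2}]
(* {{1, 2, 4, 6, 7, 8}, {4, 8}} *)

(* variant that also works when nothing is duplicated *)
First /@ Select[runs, Length[#] > 1 &]
(* {4, 8} *)
```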
For Integer data this method is twice as fast as any other listed here:
With[{s = Sort @ p},
  DeleteDuplicates @
    s[[ SparseArray[Unitize @ Differences @ s, Automatic, 1]["AdjacencyLists"] ]]
]
Timings
p = RandomInteger[1*^8, 1*^6];
Cases[Tally @ p, {x_, n_ /; n > 1} :> x] // Timing // First
Pick[#, Unitize[#2 - 1], 1] & @@ Transpose[Tally @ p] // Timing // First
#[[SparseArray[#2, Automatic, 1]["AdjacencyLists"]]] & @@ Transpose[Tally @ p] //
Timing // First
Flatten[Split @ Sort @ p, {2}][[2]] // Timing // First
With[{s = Sort @ p},
  DeleteDuplicates @
    s[[ SparseArray[Unitize @ Differences @ s, Automatic, 1]["AdjacencyLists"] ]]
] // Timing // First
(* 0.827 *)
(* 0.343 *)
(* 0.265 *)
(* 0.343 *)
(* 0.11 *)
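None of the expressions above returns positions. If the {r, d, p} style of result from the question is wanted, one possible sketch uses PositionIndex (available since version 10). The function name duplicatesInfo is hypothetical, not a built-in, and this version has not been timed against the methods above.

```mathematica
(* hypothetical helper, not a built-in:
   returns {list without duplicates, duplicated values, positions of each duplicated value} *)
duplicatesInfo[lst_List] := Module[{idx, dupIdx},
  idx = PositionIndex[lst];                 (* association: value -> positions where it occurs *)
  dupIdx = Select[idx, Length[#] > 1 &];    (* keep values that occur more than once *)
  {Keys[idx], Keys[dupIdx], Values[dupIdx]}
]

{r, d, pos} = duplicatesInfo[{1, 2, 4, 4, 6, 7, 8, 8}]
(* {{1, 2, 4, 6, 7, 8}, {4, 8}, {{3, 4}, {7, 8}}} *)
```

The keys of PositionIndex appear in order of first occurrence, so r should match what DeleteDuplicates returns for the same list.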