Filtering of a Dataset based on a column name with underscore and a nested string pattern

I have the following dataset:

ds = Dataset@{
  <|"rid" -> "#30:0", "objID" -> 1, "out_LTSnamAbbr" -> {"TMAP-TIR"}, "out_LTSnamPref" -> {"Topic Map - Term Information Resource"}, "out_LTSnamAlt" -> {"Index_term"}|>,
  <|"rid" -> "#30:1", "objID" -> 2, "out_LTSnamAbbr" -> {"TMAP-BIR"}, "out_LTSnamPref" -> {"Topic Map - Binary Information Resource"}, "out_LTSnamAlt" -> Missing["KeyAbsent", "out_LTSnamAlt"]|>,
  <|"rid" -> "#30:2", "objID" -> 3, "out_LTSnamAbbr" -> {"NAM-prefer"}, "out_LTSnamPref" -> {"Topic Map - Base name"}, "out_LTSnamAlt" -> Missing["KeyAbsent", "out_LTSnamAlt"]|>,
  <|"rid" -> "#30:3", "objID" -> 4, "out_LTSnamAbbr" -> {"NAM-alt"}, "out_LTSnamPref" -> {"Topic Map - Variant Name"}, "out_LTSnamAlt" -> {"prettyString"}|>,
  <|"rid" -> "#30:4", "objID" -> 5, "out_LTSnamAbbr" -> {"TMAP-PROP"}, "out_LTSnamPref" -> {"Topic Map - Occurrence"}, "out_LTSnamAlt" -> Missing["KeyAbsent", "out_LTSnamAlt"]|>,
  <|"rid" -> "#30:5", "objID" -> 6, "out_LTSnamAbbr" -> {"TMAP-LOC"}, "out_LTSnamPref" -> {"Topic Map - Locator"}, "out_LTSnamAlt" -> Missing["KeyAbsent", "out_LTSnamAlt"]|>,
  <|"rid" -> "#30:6", "objID" -> 7, "out_LTSnamAbbr" -> {"VAL-null"}, "out_LTSnamPref" -> {"Is Null"}, "out_LTSnamAlt" -> {"ISNull"}|>,
  <|"rid" -> "#30:7", "objID" -> 9, "out_LTSnamAbbr" -> {"TMAP-ASSOC"}, "out_LTSnamPref" -> {"Topic Map - Association"}, "out_LTSnamAlt" -> Missing["KeyAbsent", "out_LTSnamAlt"]|>,
  <|"rid" -> "#30:8", "objID" -> 10, "out_LTSnamAbbr" -> {"TMAP-ROLE"}, "out_LTSnamPref" -> {"Topic Map - Association Role"}, "out_LTSnamAlt" -> Missing["KeyAbsent", "out_LTSnamAlt"]|>,
  <|"rid" -> "#30:9", "objID" -> 11, "out_LTSnamAbbr" -> {"NAM-abbr"}, "out_LTSnamPref" -> {"Abbreviation"}, "out_LTSnamAlt" -> {"Abbreviation"}|>,
  <|"rid" -> "#30:10", "objID" -> 12, "out_LTSnamAbbr" -> {"REF-NULON"}, "out_LTSnamPref" -> {"Neurorganon Upper Level Ontology (NULO)"}, "out_LTSnamAlt" -> Missing["KeyAbsent", "out_LTSnamAlt"]|>,
  <|"rid" -> "#30:11", "objID" -> 13, "out_LTSnamAbbr" -> {"NAM"}, "out_LTSnamPref" -> {"Proper Name String"}, "out_LTSnamAlt" -> {"Name"}|>}

I want to filter the rows of my dataset by applying a string pattern to the "out_LTSnamAbbr" column, e.g. return all rows whose value matches "TMAP-*".

If this were a plain list, for example

lis = Flatten@Normal@ds[All, "out_LTSnamAbbr"];

I would do

Select[lis, StringMatchQ[#, "TMAP" ~~ ___] &]

I expect the Dataset version to start something like

ds[Select[Slot["out_LTSnamAbbr"] .... &]]

Answer

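I couldn't get this to work until I realized that the values under the key "out_LTSnamAbbr" are one-element lists of strings, not strings. StringMatchQ threads over a list, so applied to {"TMAP-TIR"} it returns {True} rather than True, and Select only keeps rows for which the criterion returns True. Adding First extracts the string inside: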

ds[Select[StringMatchQ[First@#"out_LTSnamAbbr", "TMAP" ~~ ___] &]]
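If a row could ever hold more than one abbreviation, First silently ignores everything after the first one. Below is a minimal sketch of a variant, assuming you want a row whenever any string in its list matches: AnyTrue tests every element, and StringMatchQ accepts the question's "TMAP-*" directly, because * is a wildcard in string patterns given as plain strings. Note that "TMAP-*" requires the hyphen, whereas "TMAP" ~~ ___ does not. The names sample and tmapRows are hypothetical; sample holds a few rows copied from ds so the snippet runs on its own.

```mathematica
(* Hypothetical stand-in: three rows copied from the question's ds *)
sample = Dataset@{
   <|"objID" -> 1, "out_LTSnamAbbr" -> {"TMAP-TIR"}|>,
   <|"objID" -> 3, "out_LTSnamAbbr" -> {"NAM-prefer"}|>,
   <|"objID" -> 5, "out_LTSnamAbbr" -> {"TMAP-PROP"}|>};

(* Keep a row if ANY string in its "out_LTSnamAbbr" list matches "TMAP-*";
   StringMatchQ["TMAP-*"] is the operator form, and * is a wildcard here *)
tmapRows = sample[Select[AnyTrue[#"out_LTSnamAbbr", StringMatchQ["TMAP-*"]] &]];

(* Extract just the IDs of the matching rows *)
Normal@tmapRows[All, "objID"]   (* → {1, 5} *)
```

On the full ds, the same query is ds[Select[AnyTrue[#"out_LTSnamAbbr", StringMatchQ["TMAP-*"]] &]].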

