
What stopword list does the Wolfram Language use?


The documentation for DeleteStopwords says only that it "uses a standard, built-in list of stopwords".


So what exactly is it?




Update


Since the documentation calls it "standard", does that standard have a name?



Answer



A little spelunking in the code for DeleteStopwords[] turns up the stopword list it uses internally:


DeleteStopwords; (* force auto-load *)

AlphabeticSort[List @@ TextProcessing`TextModificationDump`$stopWords["English"]] // Short
{"a", "A", "about", "above", "across", "after", "again", "against", "all", "almost",
"alone", "along", "already", "also", "although", <<240>>,
"within", "without", "won't", "would", "wouldn't", "yet", "you", "you'd", "you'll",
"you're", "you've", "your", "yours", "yourself", "yourselves"}
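Because TextProcessing`TextModificationDump`$stopWords lives in an internal "Dump" context, it is undocumented and may change between versions. If you only need to know whether particular words count as stopwords, a minimal sketch using the documented DeleteStopwords[list] form is shown below; the candidate words are hypothetical examples, not taken from the list above.

```mathematica
(* Hypothetical candidate words to test against the built-in stopword list *)
candidates = {"the", "cat", "won't", "yourselves", "spelunking"};

(* DeleteStopwords[list] removes stopwords from a list of strings,
   so any candidate missing from its result was treated as a stopword *)
kept = DeleteStopwords[candidates];

(* Complement returns the removed words in sorted order *)
removed = Complement[candidates, kept]
```

This checks only the words you supply and does not enumerate the whole list, but it does not depend on the internal symbol.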
