Skip to main content

graphics - How to create word clouds?



Word clouds are rather useless fancy and visually appealing plots, where words are plotted with different sizes according to their frequency in a corpus. Many applications exist out there (Wordle, Tagxedo, etc.) that can give an example. I am interested in the algorithm that achieves the closest possible packing of words or other irregular shapes.


There is a method for defining the convex hull of an object (in the Computational geometry package), but I think one needs here the boundary that closes the least area. If this is calculated, perhaps the packing method of graph layout can be exploited by assuming that points on the hull of a word correspond to graph vertices... but this is just speculation. So far I could only list and style the words (that was the easy part):


tally = Tally@
Cases[StringSplit[ExampleData[{"Text", "AliceInWonderland"}],
Except@LetterCharacter], _?(StringLength@# > 4 &)];
tally = Cases[tally, _?(Last@# > 10 &)];
range = {Min@(Last /@ tally), Max@(Last /@ tally)};

words = Style[First@#, FontFamily -> "Cardinal", FontWeight -> Bold,
FontColor ->

Hue[RandomReal[], RandomReal[{.5, 1}], RandomReal[{.5, .8}]],
FontSize -> Last@Rescale[#, range, {12, 70}]] & /@ tally;

Framed[Grid@Partition[words, 10, 10, {1, 1}, {}],
FrameStyle -> {Gray, Thick}, RoundingRadius -> 10, ImageMargins -> 5]

Mathematica graphics


Some possible specifications of the algorithm:



  • According to this link (shared by cormullion) identifying the closest boundary of each word is not enough as words can appear inside other glyphs with holes, like P, A, etc. Thus indeed intersection of words must be tested.


  • According to Szabolcs, the code might be able to resize words to fit them better

  • Many applications are able to arrange the cloud to fill up a user-specified shape (e.g. ellipse, apple, Che Guevara, etc.) instead of being casually positioned along the ever-increasing spiral.

  • It would be nice to allow individual words to have different rotations.

  • As usually, a fully vectorized version is preferred over image-processing methods (if the former is faster).

  • Also it would be nice to have post-rendering effects, like clickable words, mouseover effects, etc.


One way to convert strings to vector graphics is:


First@ImportString[
ExportString[
Style["SomeText", Italic, FontFamily -> "Times", FontSize -> 36],

"PDF"], "PDF", "TextMode" -> "Outlines"]

Some related questions for those who want to do further research:




Answer



Here's what I came up with


Mathematica graphics


How I did it


First we need a list of words. Here, I've taken the original list ordered by size.


tally = Tally@

Cases[StringSplit[ExampleData[{"Text", "AliceInWonderland"}],
Except@LetterCharacter], _?(StringLength@# > 4 \[And] # =!=
"Alice" &)];
tally = Cases[tally, _?(Last@# > 10 &)];
tally = Reverse@SortBy[tally, Last];
range = {Min@(Last /@ tally), Max@(Last /@ tally)};

words = Style[First@#, FontFamily -> "Cracked", FontWeight -> Bold,
FontColor ->
Hue[RandomReal[], RandomReal[{.5, 1}], RandomReal[{.5, 1}]],

FontSize -> Last@Rescale[#, range, {12, 70}]] & /@ tally;

The words are rasterised and cropped to make sure the bounding box is as tight as possible.


wordsimg = ImageCrop[Image[Graphics[Text[#]]]] & /@ words;

To produce the image the words are added one by one using a Fold loop where the next word is placed as close to the centre of the existing image as possible. This is done by applying a max filter to the binarized version of the original image thus turning forbidden pixels white and looking for the black point that is closest to the centre of the image.


iteration[img1_, w_, fun_: (Norm[#1 - #2] &)] := 
Module[{imdil, centre, diff, dimw, padding, padded1, minpos},
dimw = ImageDimensions[w];
padded1 = ImagePad[img1, {dimw[[1]] {1, 1}, dimw[[2]] {1, 1}}, 1];


imdil = MaxFilter[Binarize[ColorNegate[padded1], 0.01],
Reverse@Floor[dimw/2 + 2]];
centre = ImageDimensions[padded1]/2;

minpos = Reverse@Nearest[Position[Reverse[ImageData[imdil]], 0],
Reverse[centre], DistanceFunction -> fun][[1]];
diff = ImageDimensions[imdil] - dimw;
padding[pos_] := Transpose[{#, diff - #} &@Round[pos - dimw/2]];


ImagePad[#, (-Min[#] {1, 1 }) & /@ BorderDimensions[#]] &@
ImageMultiply[padded1, ImagePad[w, padding[minpos], 1]]]

Fold[iteration, wordsimg[[1]], Rest[wordsimg]]

You can play around with the distance function. For example for a distance function


fun = Norm[{1, 1/2} (#2 - #1)] &

you get an ellipsoidal shape:


Fold[iteration[##, fun]&, wordsimg[[1]], Rest[wordsimg]]


Mathematica graphics




Updated version


The previous code places new words in the image by approximating them with rectangles. This works fine for horizontally or vertically oriented words, but not so well for rotated words or more general shapes. Luckily, the code can be easily modified to deal with this by replacing the MaxFilter with a ImageCorrelate:


iteration2[img1_, w_, fun_: ( Norm[#1 - #2] &)] := 
Module[{imdil, centre, diff, dimw, padding, padded1, minpos},
dimw = ImageDimensions[w];
padded1 = ImagePad[img1, {dimw[[1]] {1, 1}, dimw[[2]] {1, 1}}, 1];
imdil = Binarize[ImageCorrelate[Binarize[ColorNegate[padded1], 0.05],

Dilation[Binarize[ColorNegate[w], .05], 1]]];
centre = ImageDimensions[padded1]/2;
minpos =
Reverse@Nearest[Position[Reverse[ImageData[imdil]], 0],
Reverse[centre], DistanceFunction -> fun][[1]];
Sow[minpos - centre]; (* for creating vector plot *)
diff = ImageDimensions[imdil] - dimw;
padding[pos_] := Transpose[{#, diff - #} &@Round[pos - dimw/2]];
ImagePad[#, (-Min[#] {1, 1}) & /@ BorderDimensions[#]] &@
ImageMultiply[padded1, ImagePad[w, padding[minpos], 1]]]


To test this code we use a list of rotated words. Note that I'm using ImagePad instead of ImageCrop to crop the images. This is because ImageCrop seems to clip the words sometimes.


words = Style[First@#, FontFamily -> "Times", 
FontColor ->
Hue[RandomReal[], RandomReal[{.5, 1}], RandomReal[{.5, 1}]],
FontSize -> (Last@Rescale[#, range, {12, 150}])] & /@ tally;

wordsimg = ImagePad[#, -3 -
BorderDimensions[#]] & /@ (Image[
Graphics[Text[Framed[#, FrameMargins -> 2]]]] & /@ words);


wordsimgRot = ImageRotate[#, RandomReal[2 Pi],
Background -> White] & /@ wordsimg;

The iteration loop is as before:


Fold[iteration2, wordsimgRot[[1]], Rest[wordsimgRot]]

which produces


Mathematica graphics


Second update



To create a vector graphics of the previous result, we need to save the positions of the words in the image, for example by adding Sow[minpos - centre] to the definition of iteration2 somewhere towards the end of the code and using Reap to reap the results. We also need to keep the rotation angles of the words, so we'll replace wordsimgRot with


angles = RandomReal[2 Pi, Length[wordsimg]];

wordsimgRot = ImageRotate[##, Background -> White] & @@@
Transpose[{wordsimg, angles}];

As mentioned before, we use Reap to create the position list


poslist = Reap[img = Fold[iteration2, wordsimgRot[[1]], 
Rest[wordsimgRot]];][[2, 1]]


The vector graphics can then be created with


Graphics[MapThread[Text[#1, Offset[#2, {0, 0}], {0, 0}, {Cos[#3], Sin[#3]}] &,
{words, Prepend[poslist, {0, 0}], angles}]]

Comments

Popular posts from this blog

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...

How to thread a list

I have data in format data = {{a1, a2}, {b1, b2}, {c1, c2}, {d1, d2}} Tableform: I want to thread it to : tdata = {{{a1, b1}, {a2, b2}}, {{a1, c1}, {a2, c2}}, {{a1, d1}, {a2, d2}}} Tableform: And I would like to do better then pseudofunction[n_] := Transpose[{data2[[1]], data2[[n]]}]; SetAttributes[pseudofunction, Listable]; Range[2, 4] // pseudofunction Here is my benchmark data, where data3 is normal sample of real data. data3 = Drop[ExcelWorkBook[[Column1 ;; Column4]], None, 1]; data2 = {a #, b #, c #, d #} & /@ Range[1, 10^5]; data = RandomReal[{0, 1}, {10^6, 4}]; Here is my benchmark code kptnw[list_] := Transpose[{Table[First@#, {Length@# - 1}], Rest@#}, {3, 1, 2}] &@list kptnw2[list_] := Transpose[{ConstantArray[First@#, Length@# - 1], Rest@#}, {3, 1, 2}] &@list OleksandrR[list_] := Flatten[Outer[List, List@First[list], Rest[list], 1], {{2}, {1, 4}}] paradox2[list_] := Partition[Riffle[list[[1]], #], 2] & /@ Drop[list, 1] RM[list_] := FoldList[Transpose[{First@li...

front end - keyboard shortcut to invoke Insert new matrix

I frequently need to type in some matrices, and the menu command Insert > Table/Matrix > New... allows matrices with lines drawn between columns and rows, which is very helpful. I would like to make a keyboard shortcut for it, but cannot find the relevant frontend token command (4209405) for it. Since the FullForm[] and InputForm[] of matrices with lines drawn between rows and columns is the same as those without lines, it's hard to do this via 3rd party system-wide text expanders (e.g. autohotkey or atext on mac). How does one assign a keyboard shortcut for the menu item Insert > Table/Matrix > New... , preferably using only mathematica? Thanks! Answer In the MenuSetup.tr (for linux located in the $InstallationDirectory/SystemFiles/FrontEnd/TextResources/X/ directory), I changed the line MenuItem["&New...", "CreateGridBoxDialog"] to read MenuItem["&New...", "CreateGridBoxDialog", MenuKey["m", Modifiers-...