graphics - How to create word clouds?

Word clouds are ~~rather useless~~ fancy and visually appealing plots, where words are plotted with different sizes according to their frequency in a corpus. Many applications exist out there (Wordle, Tagxedo, etc.) that can give an example. I am interested in the algorithm that achieves the closest possible packing of words or other irregular shapes.

There is a method for defining the convex hull of an object (in the Computational geometry package), but I think one needs here the boundary that closes the least area. If this is calculated, perhaps the packing method of graph layout can be exploited by assuming that points on the hull of a word correspond to graph vertices... but this is just speculation. So far I could only list and style the words (that was the easy part):

tally = Tally@
   Cases[StringSplit[ExampleData[{"Text", "AliceInWonderland"}], 
     Except@LetterCharacter], _?(StringLength@# > 4 &)];
tally = Cases[tally, _?(Last@# > 10 &)];
range = {Min@(Last /@ tally), Max@(Last /@ tally)};

words = Style[First@#, FontFamily -> "Cardinal", FontWeight -> Bold, 
     FontColor -> 

      Hue[RandomReal[], RandomReal[{.5, 1}], RandomReal[{.5, .8}]], 
     FontSize -> Last@Rescale[#, range, {12, 70}]] & /@ tally;

Framed[Grid@Partition[words, 10, 10, {1, 1}, {}], 
 FrameStyle -> {Gray, Thick}, RoundingRadius -> 10, ImageMargins -> 5]

Mathematica graphics

Some possible specifications of the algorithm:

According to this link (shared by cormullion) identifying the closest boundary of each word is not enough as words can appear inside other glyphs with holes, like P, A, etc. Thus indeed intersection of words must be tested.

According to Szabolcs, the code might be able to resize words to fit them better

Many applications are able to arrange the cloud to fill up a user-specified shape (e.g. ellipse, apple, Che Guevara, etc.) instead of being casually positioned along the ever-increasing spiral.

It would be nice to allow individual words to have different rotations.

As usually, a fully vectorized version is preferred over image-processing methods (if the former is faster).

Also it would be nice to have post-rendering effects, like clickable words, mouseover effects, etc.

One way to convert strings to vector graphics is:

First@ImportString[
 ExportString[
   Style["SomeText", Italic, FontFamily -> "Times", FontSize -> 36], 

   "PDF"], "PDF", "TextMode" -> "Outlines"]

Some related questions for those who want to do further research:

Answer

Here's what I came up with

Mathematica graphics

How I did it

First we need a list of words. Here, I've taken the original list ordered by size.

tally = Tally@

   Cases[StringSplit[ExampleData[{"Text", "AliceInWonderland"}], 
     Except@LetterCharacter], _?(StringLength@# > 4 \[And] # =!= 
         "Alice" &)];
tally = Cases[tally, _?(Last@# > 10 &)];
tally = Reverse@SortBy[tally, Last];
range = {Min@(Last /@ tally), Max@(Last /@ tally)};

words = Style[First@#, FontFamily -> "Cracked", FontWeight -> Bold, 
     FontColor -> 
      Hue[RandomReal[], RandomReal[{.5, 1}], RandomReal[{.5, 1}]], 

     FontSize -> Last@Rescale[#, range, {12, 70}]] & /@ tally;

The words are rasterised and cropped to make sure the bounding box is as tight as possible.

wordsimg = ImageCrop[Image[Graphics[Text[#]]]] & /@ words;

To produce the image the words are added one by one using a Fold loop where the next word is placed as close to the centre of the existing image as possible. This is done by applying a max filter to the binarized version of the original image thus turning forbidden pixels white and looking for the black point that is closest to the centre of the image.

iteration[img1_, w_, fun_: (Norm[#1 - #2] &)] := 
 Module[{imdil, centre, diff, dimw, padding, padded1, minpos},
  dimw = ImageDimensions[w];
  padded1 = ImagePad[img1, {dimw[[1]] {1, 1}, dimw[[2]] {1, 1}}, 1];


  imdil = MaxFilter[Binarize[ColorNegate[padded1], 0.01], 
    Reverse@Floor[dimw/2 + 2]];
  centre = ImageDimensions[padded1]/2;

  minpos = Reverse@Nearest[Position[Reverse[ImageData[imdil]], 0], 
      Reverse[centre], DistanceFunction -> fun][[1]];
  diff = ImageDimensions[imdil] - dimw;
  padding[pos_] := Transpose[{#, diff - #} &@Round[pos - dimw/2]];


  ImagePad[#, (-Min[#] {1, 1 }) & /@ BorderDimensions[#]] &@
   ImageMultiply[padded1, ImagePad[w, padding[minpos], 1]]]

Fold[iteration, wordsimg[[1]], Rest[wordsimg]]

You can play around with the distance function. For example for a distance function

fun = Norm[{1, 1/2} (#2 - #1)] &

you get an ellipsoidal shape:

Fold[iteration[##, fun]&, wordsimg[[1]], Rest[wordsimg]]

Mathematica graphics

Updated version

The previous code places new words in the image by approximating them with rectangles. This works fine for horizontally or vertically oriented words, but not so well for rotated words or more general shapes. Luckily, the code can be easily modified to deal with this by replacing the MaxFilter with a ImageCorrelate:

iteration2[img1_, w_, fun_: ( Norm[#1 - #2] &)] := 
 Module[{imdil, centre, diff, dimw, padding, padded1, minpos}, 
  dimw = ImageDimensions[w];
  padded1 = ImagePad[img1, {dimw[[1]] {1, 1}, dimw[[2]] {1, 1}}, 1];
  imdil = Binarize[ImageCorrelate[Binarize[ColorNegate[padded1], 0.05], 

     Dilation[Binarize[ColorNegate[w], .05], 1]]];
  centre = ImageDimensions[padded1]/2;
  minpos = 
   Reverse@Nearest[Position[Reverse[ImageData[imdil]], 0], 
      Reverse[centre], DistanceFunction -> fun][[1]];
  Sow[minpos - centre]; (* for creating vector plot *)
  diff = ImageDimensions[imdil] - dimw;
  padding[pos_] := Transpose[{#, diff - #} &@Round[pos - dimw/2]];
  ImagePad[#, (-Min[#] {1, 1}) & /@ BorderDimensions[#]] &@
   ImageMultiply[padded1, ImagePad[w, padding[minpos], 1]]]

To test this code we use a list of rotated words. Note that I'm using ImagePad instead of ImageCrop to crop the images. This is because ImageCrop seems to clip the words sometimes.

words = Style[First@#, FontFamily -> "Times", 
     FontColor -> 
      Hue[RandomReal[], RandomReal[{.5, 1}], RandomReal[{.5, 1}]], 
     FontSize -> (Last@Rescale[#, range, {12, 150}])] & /@ tally;

wordsimg = ImagePad[#, -3 - 
  BorderDimensions[#]] & /@ (Image[
   Graphics[Text[Framed[#, FrameMargins -> 2]]]] & /@ words);


wordsimgRot = ImageRotate[#, RandomReal[2 Pi], 
  Background -> White] & /@ wordsimg;

The iteration loop is as before:

Fold[iteration2, wordsimgRot[[1]], Rest[wordsimgRot]]

which produces

Mathematica graphics

Second update

To create a vector graphics of the previous result, we need to save the positions of the words in the image, for example by adding Sow[minpos - centre] to the definition of iteration2 somewhere towards the end of the code and using Reap to reap the results. We also need to keep the rotation angles of the words, so we'll replace wordsimgRot with

angles = RandomReal[2 Pi, Length[wordsimg]];

wordsimgRot = ImageRotate[##, Background -> White] & @@@ 
   Transpose[{wordsimg, angles}];

As mentioned before, we use Reap to create the position list

poslist = Reap[img = Fold[iteration2, wordsimgRot[[1]], 
  Rest[wordsimgRot]];][[2, 1]]

The vector graphics can then be created with

Graphics[MapThread[Text[#1, Offset[#2, {0, 0}], {0, 0}, {Cos[#3], Sin[#3]}] &,
  {words, Prepend[poslist, {0, 0}], angles}]]

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...

Blog

Search This Blog