Skip to main content

graphics - How to create word clouds?



Word clouds are rather useless fancy and visually appealing plots, where words are plotted with different sizes according to their frequency in a corpus. Many applications exist out there (Wordle, Tagxedo, etc.) that can give an example. I am interested in the algorithm that achieves the closest possible packing of words or other irregular shapes.


There is a method for defining the convex hull of an object (in the Computational geometry package), but I think one needs here the boundary that closes the least area. If this is calculated, perhaps the packing method of graph layout can be exploited by assuming that points on the hull of a word correspond to graph vertices... but this is just speculation. So far I could only list and style the words (that was the easy part):


tally = Tally@
Cases[StringSplit[ExampleData[{"Text", "AliceInWonderland"}],
Except@LetterCharacter], _?(StringLength@# > 4 &)];
tally = Cases[tally, _?(Last@# > 10 &)];
range = {Min@(Last /@ tally), Max@(Last /@ tally)};

words = Style[First@#, FontFamily -> "Cardinal", FontWeight -> Bold,
FontColor ->

Hue[RandomReal[], RandomReal[{.5, 1}], RandomReal[{.5, .8}]],
FontSize -> Last@Rescale[#, range, {12, 70}]] & /@ tally;

Framed[Grid@Partition[words, 10, 10, {1, 1}, {}],
FrameStyle -> {Gray, Thick}, RoundingRadius -> 10, ImageMargins -> 5]

Mathematica graphics


Some possible specifications of the algorithm:



  • According to this link (shared by cormullion) identifying the closest boundary of each word is not enough as words can appear inside other glyphs with holes, like P, A, etc. Thus indeed intersection of words must be tested.


  • According to Szabolcs, the code might be able to resize words to fit them better

  • Many applications are able to arrange the cloud to fill up a user-specified shape (e.g. ellipse, apple, Che Guevara, etc.) instead of being casually positioned along the ever-increasing spiral.

  • It would be nice to allow individual words to have different rotations.

  • As usually, a fully vectorized version is preferred over image-processing methods (if the former is faster).

  • Also it would be nice to have post-rendering effects, like clickable words, mouseover effects, etc.


One way to convert strings to vector graphics is:


First@ImportString[
ExportString[
Style["SomeText", Italic, FontFamily -> "Times", FontSize -> 36],

"PDF"], "PDF", "TextMode" -> "Outlines"]

Some related questions for those who want to do further research:




Answer



Here's what I came up with


Mathematica graphics


How I did it


First we need a list of words. Here, I've taken the original list ordered by size.


tally = Tally@

Cases[StringSplit[ExampleData[{"Text", "AliceInWonderland"}],
Except@LetterCharacter], _?(StringLength@# > 4 \[And] # =!=
"Alice" &)];
tally = Cases[tally, _?(Last@# > 10 &)];
tally = Reverse@SortBy[tally, Last];
range = {Min@(Last /@ tally), Max@(Last /@ tally)};

words = Style[First@#, FontFamily -> "Cracked", FontWeight -> Bold,
FontColor ->
Hue[RandomReal[], RandomReal[{.5, 1}], RandomReal[{.5, 1}]],

FontSize -> Last@Rescale[#, range, {12, 70}]] & /@ tally;

The words are rasterised and cropped to make sure the bounding box is as tight as possible.


wordsimg = ImageCrop[Image[Graphics[Text[#]]]] & /@ words;

To produce the image the words are added one by one using a Fold loop where the next word is placed as close to the centre of the existing image as possible. This is done by applying a max filter to the binarized version of the original image thus turning forbidden pixels white and looking for the black point that is closest to the centre of the image.


iteration[img1_, w_, fun_: (Norm[#1 - #2] &)] := 
Module[{imdil, centre, diff, dimw, padding, padded1, minpos},
dimw = ImageDimensions[w];
padded1 = ImagePad[img1, {dimw[[1]] {1, 1}, dimw[[2]] {1, 1}}, 1];


imdil = MaxFilter[Binarize[ColorNegate[padded1], 0.01],
Reverse@Floor[dimw/2 + 2]];
centre = ImageDimensions[padded1]/2;

minpos = Reverse@Nearest[Position[Reverse[ImageData[imdil]], 0],
Reverse[centre], DistanceFunction -> fun][[1]];
diff = ImageDimensions[imdil] - dimw;
padding[pos_] := Transpose[{#, diff - #} &@Round[pos - dimw/2]];


ImagePad[#, (-Min[#] {1, 1 }) & /@ BorderDimensions[#]] &@
ImageMultiply[padded1, ImagePad[w, padding[minpos], 1]]]

Fold[iteration, wordsimg[[1]], Rest[wordsimg]]

You can play around with the distance function. For example for a distance function


fun = Norm[{1, 1/2} (#2 - #1)] &

you get an ellipsoidal shape:


Fold[iteration[##, fun]&, wordsimg[[1]], Rest[wordsimg]]


Mathematica graphics




Updated version


The previous code places new words in the image by approximating them with rectangles. This works fine for horizontally or vertically oriented words, but not so well for rotated words or more general shapes. Luckily, the code can be easily modified to deal with this by replacing the MaxFilter with a ImageCorrelate:


iteration2[img1_, w_, fun_: ( Norm[#1 - #2] &)] := 
Module[{imdil, centre, diff, dimw, padding, padded1, minpos},
dimw = ImageDimensions[w];
padded1 = ImagePad[img1, {dimw[[1]] {1, 1}, dimw[[2]] {1, 1}}, 1];
imdil = Binarize[ImageCorrelate[Binarize[ColorNegate[padded1], 0.05],

Dilation[Binarize[ColorNegate[w], .05], 1]]];
centre = ImageDimensions[padded1]/2;
minpos =
Reverse@Nearest[Position[Reverse[ImageData[imdil]], 0],
Reverse[centre], DistanceFunction -> fun][[1]];
Sow[minpos - centre]; (* for creating vector plot *)
diff = ImageDimensions[imdil] - dimw;
padding[pos_] := Transpose[{#, diff - #} &@Round[pos - dimw/2]];
ImagePad[#, (-Min[#] {1, 1}) & /@ BorderDimensions[#]] &@
ImageMultiply[padded1, ImagePad[w, padding[minpos], 1]]]


To test this code we use a list of rotated words. Note that I'm using ImagePad instead of ImageCrop to crop the images. This is because ImageCrop seems to clip the words sometimes.


words = Style[First@#, FontFamily -> "Times", 
FontColor ->
Hue[RandomReal[], RandomReal[{.5, 1}], RandomReal[{.5, 1}]],
FontSize -> (Last@Rescale[#, range, {12, 150}])] & /@ tally;

wordsimg = ImagePad[#, -3 -
BorderDimensions[#]] & /@ (Image[
Graphics[Text[Framed[#, FrameMargins -> 2]]]] & /@ words);


wordsimgRot = ImageRotate[#, RandomReal[2 Pi],
Background -> White] & /@ wordsimg;

The iteration loop is as before:


Fold[iteration2, wordsimgRot[[1]], Rest[wordsimgRot]]

which produces


Mathematica graphics


Second update



To create a vector graphics of the previous result, we need to save the positions of the words in the image, for example by adding Sow[minpos - centre] to the definition of iteration2 somewhere towards the end of the code and using Reap to reap the results. We also need to keep the rotation angles of the words, so we'll replace wordsimgRot with


angles = RandomReal[2 Pi, Length[wordsimg]];

wordsimgRot = ImageRotate[##, Background -> White] & @@@
Transpose[{wordsimg, angles}];

As mentioned before, we use Reap to create the position list


poslist = Reap[img = Fold[iteration2, wordsimgRot[[1]], 
Rest[wordsimgRot]];][[2, 1]]


The vector graphics can then be created with


Graphics[MapThread[Text[#1, Offset[#2, {0, 0}], {0, 0}, {Cos[#3], Sin[#3]}] &,
{words, Prepend[poslist, {0, 0}], angles}]]

Comments

Popular posts from this blog

plotting - Filling between two spheres in SphericalPlot3D

Manipulate[ SphericalPlot3D[{1, 2 - n}, {θ, 0, Pi}, {ϕ, 0, 1.5 Pi}, Mesh -> None, PlotPoints -> 15, PlotRange -> {-2.2, 2.2}], {n, 0, 1}] I cant' seem to be able to make a filling between two spheres. I've already tried the obvious Filling -> {1 -> {2}} but Mathematica doesn't seem to like that option. Is there any easy way around this or ... Answer There is no built-in filling in SphericalPlot3D . One option is to use ParametricPlot3D to draw the surfaces between the two shells: Manipulate[ Show[SphericalPlot3D[{1, 2 - n}, {θ, 0, Pi}, {ϕ, 0, 1.5 Pi}, PlotPoints -> 15, PlotRange -> {-2.2, 2.2}], ParametricPlot3D[{ r {Sin[t] Cos[1.5 Pi], Sin[t] Sin[1.5 Pi], Cos[t]}, r {Sin[t] Cos[0 Pi], Sin[t] Sin[0 Pi], Cos[t]}}, {r, 1, 2 - n}, {t, 0, Pi}, PlotStyle -> Yellow, Mesh -> {2, 15}]], {n, 0, 1}]

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]

plotting - Adding a thick curve to a regionplot

Suppose we have the following simple RegionPlot: f[x_] := 1 - x^2 g[x_] := 1 - 0.5 x^2 RegionPlot[{y < f[x], f[x] < y < g[x], y > g[x]}, {x, 0, 2}, {y, 0, 2}] Now I'm trying to change the curve defined by $y=g[x]$ into a thick black curve, while leaving all other boundaries in the plot unchanged. I've tried adding the region $y=g[x]$ and playing with the plotstyle, which didn't work, and I've tried BoundaryStyle, which changed all the boundaries in the plot. Now I'm kinda out of ideas... Any help would be appreciated! Answer With f[x_] := 1 - x^2 g[x_] := 1 - 0.5 x^2 You can use Epilog to add the thick line: RegionPlot[{y < f[x], f[x] < y < g[x], y > g[x]}, {x, 0, 2}, {y, 0, 2}, PlotPoints -> 50, Epilog -> (Plot[g[x], {x, 0, 2}, PlotStyle -> {Black, Thick}][[1]]), PlotStyle -> {Directive[Yellow, Opacity[0.4]], Directive[Pink, Opacity[0.4]],