bugs - How to improve the recognition quality when TextRecognize works on a single character

Bug introduced in 11.0 and persisting through 11.3

After reading this answer, I doubt that TextRecognize can work on a single character, so I ran some tests to check. You can get my test images with this code:

imgs = Binarize[
    Import[#]] & /@ {"https://i.stack.imgur.com/PvuFe.png", 
   "https://i.stack.imgur.com/bXHyv.png", 
   "https://i.stack.imgur.com/6Uxpo.png"};

Note that TextRecognize[#, "Character"] & /@ imgs returns nothing. An example in the documentation under Examples/Applications indicates that an appropriate mask may improve the recognition of single characters, but I don't much like this method, because it is hard to build a mask for characters such as "i" and "j", which consist of disconnected parts. For example:

TextRecognize[#, 
   Masking -> MorphologicalTransform[#, "BoundingBoxes", Infinity], 
   RecognitionPrior -> "Character"] & /@ imgs

{{H,1,O},{m,Y},{d,d}}

Is there any workaround that improves the recognition quality when TextRecognize works on a single character?

Or

If we want to improve the recognition quality with a mask, how do we build a correct mask?

I would like to use TextRecognize to improve this answer of mine.

Answer

I felt that I was missing some simple way to unite closely located components, and I finally found it: ImageForestingComponents (thanks to this answer)!

(It is unfortunate that a link to this function is not included in the "See Also" drop-down list on the documentation pages for ComponentMeasurements, MorphologicalComponents, or MorphologicalTransform. That is why I was not able to find it quickly.)

I'll show how it can be used on the most problematic case, the letter "i", which is formed by two disconnected clusters of points:

i = Import["https://i.stack.imgur.com/PvuFe.png"]

With horizontal radius 1 and vertical radius 6 we get a segmentation where our letter "i" is counted as a single component:

ImageForestingComponents[i, Automatic, {1, 6}] // Colorize
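The radii {1, 6} were chosen for this particular image. As a rough sketch, and not part of the original answer, one way to explore other values is to count how many components each radius pair produces. The list radii below is a hypothetical set of trial values, and i is the image imported above:

(* Hypothetical trial values: {horizontal radius, vertical radius} pairs *)
radii = {{1, 1}, {1, 3}, {1, 6}};

(* For each pair, count the components in the resulting segmentation *)
Table[
 r -> Length[
   ComponentMeasurements[
    ImageForestingComponents[i, Automatic, r], "Count"]],
 {r, radii}]

A smaller count should indicate that more nearby clusters, such as the dot and the stem of "i", have been merged. The result is still worth checking visually with Colorize.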

Using ComponentMeasurements, we can get the bounding boxes of our characters while dropping the background (the "ConvexCoverage" < .9 criterion filters it out):

c = ComponentMeasurements[ImageForestingComponents[i, Automatic, {1, 6}], 
  "BoundingBox", #"ConvexCoverage" < .9 &]

{2 -> {{66., 125.}, {79., 161.}}, 3 -> {{46., 61.}, {84., 98.}}}

HighlightImage[i, {Yellow, Rectangle @@@ c[[All, 2]]}]

TextRecognize accepts a list of Rectangle primitives as the value of the Masking option (this is documented under Examples ► Options ► Masking):

TextRecognize[i, Masking -> Rectangle @@@ c[[All, 2]], RecognitionPrior -> "Character"]

{"i", "O"}
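The steps above can be gathered into one helper, sketched here under some assumptions. The name recognizeCharacters and its optional arguments are hypothetical and not part of the original answer. The defaults {1, 6} and 0.9 are simply the values that worked for this image and may need tuning for others:

(* Hypothetical helper: segment, measure bounding boxes, then recognize *)
recognizeCharacters[img_Image, radius_ : {1, 6}, maxCoverage_ : 0.9] :=
 Module[{components, boxes},
  (* unite closely located clusters into single components *)
  components = ImageForestingComponents[img, Automatic, radius];
  (* keep only components below the coverage threshold, as above *)
  boxes = ComponentMeasurements[components, "BoundingBox",
    #"ConvexCoverage" < maxCoverage &];
  (* pass the bounding boxes to TextRecognize as a mask *)
  TextRecognize[img, Masking -> Rectangle @@@ boxes[[All, 2]],
   RecognitionPrior -> "Character"]]

recognizeCharacters[i]

With the default arguments this call runs the same three steps shown above on the image i.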

That's all. :^)
