Bug introduced in 11.0 and persisting through 11.3
Based on this answer, I doubt that TextRecognize can work on a single character, so I ran some tests to check. You can get my test images with this code:
imgs = Binarize[Import[#]] & /@ {
    "https://i.stack.imgur.com/PvuFe.png",
    "https://i.stack.imgur.com/bXHyv.png",
    "https://i.stack.imgur.com/6Uxpo.png"};
Note that TextRecognize[#, "Character"] & /@ imgs returns nothing. An example in the documentation (Examples ► Applications) indicates that an appropriate mask may improve the recognition of single characters, but I don't much like this method, because it is hard to build a correct mask for characters such as "i" and "j", which consist of two disconnected parts. For example:
TextRecognize[#,
Masking -> MorphologicalTransform[#, "BoundingBoxes", Infinity],
RecognitionPrior -> "Character"] & /@ imgs
{{H,1,O},{m,Y},{d,d}}
- Is there any workaround that can improve the recognition quality when TextRecognize works on a single character?
Or
- If we want to improve the recognition quality with a mask, how do we build a correct mask?
I'd like to use TextRecognize to make this answer of mine better.
Answer
I felt that I was missing some simple way to unite closely located components, and finally I found it: ImageForestingComponents (thanks to this answer)!
- It is unfortunate that this function isn't linked from the "See Also" drop-down list on the documentation pages for ComponentMeasurements, MorphologicalComponents, or MorphologicalTransform. That's why I wasn't able to find it quickly...
I'll show how it can be used on the most problematic case, the letter "i", which is formed by two disconnected clusters of points:
i = Import["https://i.stack.imgur.com/PvuFe.png"]
With a horizontal radius of 1 and a vertical radius of 6, we get a segmentation in which our letter "i" is counted as a single component:
ImageForestingComponents[i, Automatic, {1, 6}] // Colorize
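To see how the vertical radius affects the segmentation, one can count the segments for a range of radii. This is a minimal sketch, assuming the same image as above; since ImageForestingComponents returns a matrix of integer labels, CountDistinct of the flattened matrix gives the number of distinct labels. The range of vertical radii from 1 to 8 is an arbitrary choice for illustration, not a recommended setting.

```mathematica
i = Import["https://i.stack.imgur.com/PvuFe.png"];

(* For horizontal radius 1 and vertical radii 1..8 (an arbitrary range),
   count the distinct segment labels returned by ImageForestingComponents *)
counts = Table[
   v -> CountDistinct[Flatten[ImageForestingComponents[i, Automatic, {1, v}]]],
   {v, 1, 8}]
```

A smaller count at larger vertical radii would suggest that the dot and the stem of "i" have been merged into one segment.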
Using ComponentMeasurements, we can get the bounding boxes of our characters, dropping the background:
c = ComponentMeasurements[ImageForestingComponents[i, Automatic, {1, 6}],
"BoundingBox", #"ConvexCoverage" < .9 &]
{2 -> {{66., 125.}, {79., 161.}}, 3 -> {{46., 61.}, {84., 98.}}}
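When choosing the 0.9 cutoff for another image, it may help to list the convex coverage of every component (including the background) next to its bounding box. A sketch, assuming the same image and radii as above; the variable name measurements is my own.

```mathematica
i = Import["https://i.stack.imgur.com/PvuFe.png"];

(* Measure all components without a selection criterion, so the
   ConvexCoverage values behind the 0.9 cutoff can be inspected *)
measurements = ComponentMeasurements[
   ImageForestingComponents[i, Automatic, {1, 6}],
   {"BoundingBox", "ConvexCoverage"}]
```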
HighlightImage[i, {Yellow, Rectangle @@@ c[[All, 2]]}]
TextRecognize accepts a set of Rectangle primitives as a mask (this is documented in the Examples ► Options ► Masking sub-subsection):
TextRecognize[i, Masking -> Rectangle @@@ c[[All, 2]], RecognitionPrior -> "Character"]
{"i", "O"}
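The steps above can be wrapped into one function and mapped over all three test images from the question. This is a sketch: recognizeChars is a hypothetical helper name, not a built-in, and its defaults (radii {1, 6} and the 0.9 convex-coverage cutoff) are simply the values that worked for the "i" image, so they may need tuning for other fonts and sizes. The images are imported without Binarize, as in the answer.

```mathematica
(* Hypothetical helper: segment with ImageForestingComponents, keep the
   components whose convex coverage is below cov, and use their bounding
   boxes as a Rectangle mask for character-level recognition *)
recognizeChars[img_Image, r_ : {1, 6}, cov_ : 0.9] := Module[{boxes},
  boxes = ComponentMeasurements[
     ImageForestingComponents[img, Automatic, r],
     "BoundingBox", #ConvexCoverage < cov &][[All, 2]];
  TextRecognize[img, Masking -> Rectangle @@@ boxes,
   RecognitionPrior -> "Character"]]

(* The three test images from the question *)
results = recognizeChars /@ (Import /@ {
     "https://i.stack.imgur.com/PvuFe.png",
     "https://i.stack.imgur.com/bXHyv.png",
     "https://i.stack.imgur.com/6Uxpo.png"})
```

Keeping the radii and cutoff as optional arguments makes it easy to try other values per image without editing the function body.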
That's all. :^)