You will be completely boxed in if you use ready-made OCR APIs: you will have little or no freedom to place points around a specific word, and nothing along those lines will be possible. An OCR engine has only one job, which is to try to extract the letters from an image and return them as text.
As @LuizVieira commented, OpenCV will be your right hand for this type of project. You can train each letter and digit of the alphabet and make the comparisons in real time. This training must be scale invariant: no matter the font size or the scale of the image, it still has to recognize which letter it is.
I can give you the basic steps of how this can be done, using OpenCV to extract the pixels and without using OpenCV for the training:
- Create vectors with the patterns of all letters and digits. You will
need to crop each letter and digit and extract the pixels of each one;
OpenCV has ready-made functions for pixel extraction. Store the result in whatever way you see fit.
- Now you have the basis for comparison. You will want to compare each
letter captured in real time with the extracted patterns, so use
OpenCV to crop each letter of your text in real time. How do you
know where each letter begins and ends while you point your phone's
camera at the text? This can be done by scanning the pixels
horizontally until you find the beginning and end of each
letter. We are talking about something basic here: 99% of texts are
black on a white background. It is very important to determine the
background color of the text (you can do this by computing
an RGB histogram), or you can simply force everything into black and white, which is really a great idea. Taking a white background as the
example: walk until the run of white pixels ends and mark the position,
since this is where the new run (black in this case) starts. Then walk until the
run of black pixels ends and mark that position too. The two marks tell you where to cut
each letter or digit (start and end), and with that you have just
segmented (separated) the letters in real time.
- With the letter cropped out of the text, extract its pixels,
just as you did in the first step when building your database.
- Now compare what was extracted from the text with your database.
Linear algebra has a concept called a linear (vector) space: in this case
each letter becomes a vector of its pixels, and comparing two vectors
(for example by the cosine of the angle between them) is a simple way
to measure which stored pattern is the most
similar.
- Assemble each word based on this ranking (the higher the cosine
returned by the linear space comparison, the better). If that word is the
specific one you are looking for,
you will have its whole position (beginning and end), and you can use OpenCV
again to insert whatever art you want, since you now know its exact position within the text.
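
The segmentation and comparison steps above can be sketched in plain Python, without OpenCV. This is only a minimal illustration under some assumptions: the image is already a grayscale grid (a list of rows of 0-255 values), letters are separated by at least one fully white column, and the crop has exactly the same size as the stored pattern. All function names (`binarize`, `segment_columns`, `crop_to_vector`, `cosine`, `recognize`) and the tiny `templates` database are made up for the example.

```python
import math

def binarize(gray, threshold=128):
    """Force the image to black and white: 1 = ink (dark pixel), 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment_columns(binary):
    """Scan horizontally and return (start, end) column ranges, end exclusive."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                  # white run ended: a letter begins here
        elif not ink and start is not None:
            spans.append((start, x))   # black run ended: the letter ends here
            start = None
    if start is not None:              # letter touching the right border
        spans.append((start, width))
    return spans

def crop_to_vector(binary, start, end):
    """Flatten the pixels of columns [start, end) into a single vector."""
    return [row[x] for row in binary for x in range(start, end)]

def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if the sizes differ."""
    if len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize(binary, templates):
    """Return (best_label, start, end) for every segmented letter."""
    found = []
    for start, end in segment_columns(binary):
        vec = crop_to_vector(binary, start, end)
        best = max(templates, key=lambda label: cosine(vec, templates[label]))
        found.append((best, start, end))
    return found

# Hypothetical 3-row pattern database: each letter stored as a flat pixel vector.
templates = {
    "I": [1, 1, 1],
    "L": [1, 0, 1, 0, 1, 1],
    "T": [1, 1, 1, 0, 1, 0, 0, 1, 0],
}

# A tiny "photo" with an L and an I, black (0) on white (255).
gray = [
    [255, 0, 255, 255, 0, 255],
    [255, 0, 255, 255, 0, 255],
    [255, 0,   0, 255, 0, 255],
]
binary = binarize(gray)
print(recognize(binary, templates))
```

Because `recognize` returns the start and end column of every letter, the position of a whole word is simply the start of its first letter and the end of its last one. Note that `cosine` scores vectors of different lengths as 0, so this sketch only works when the captured letter has the same size as the stored pattern.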
I have just described a simple way to create an OCR without scale invariance. Instead of using a linear space,
you can train each letter and digit using OpenCV:
it has the SURF function, which is scale invariant and faster than its predecessor SIFT. That is the gist of how everything works.
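
To give an idea of what scale invariance buys you, here is a much simpler stand-in written in plain Python. It is not SURF: it only crops each letter to its ink and resamples it to a fixed grid, so that the same letter at different sizes produces vectors of the same length. The names `bounding_box` and `normalize` and the grid size are assumptions for the example, and the input is assumed to be a binary grid (1 = ink) containing at least one ink pixel.

```python
def bounding_box(binary):
    """Return (top, bottom, left, right) of the ink; bottom and right are exclusive."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    cols = [x for x in range(len(binary[0])) if any(row[x] for row in binary)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

def normalize(binary, size=4):
    """Crop to the ink and resample to a size x size grid (nearest neighbour)."""
    top, bottom, left, right = bounding_box(binary)
    h, w = bottom - top, right - left
    return [
        [binary[top + (y * h) // size][left + (x * w) // size] for x in range(size)]
        for y in range(size)
    ]

# The same "L" drawn small and large (with a white margin around the large one).
small = [
    [1, 0],
    [1, 1],
]
big = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(normalize(small) == normalize(big))
```

After this normalization both versions of the letter can be flattened and compared pixel by pixel. A feature detector such as SURF goes much further (it also tolerates rotation and noise), so treat this only as an illustration of the idea.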