What's needed for Android OCR

Question




I have read several articles and forum questions about this around the internet. I know what a basic OCR needs, and I have already built one, but what I want to ask here is more specific. An Android OCR requires:

- Camera
- OCR API (Tesseract, for example)

But I would like to know the following.

1 - How to grab the image as soon as the camera manages to focus on the text and analyze it with the OCR API right at that moment, without having to take a photo, save it, and analyze the JPEG.

2 - How to search the image for specific words.

3 - How to draw some art on the screen, such as dots around the letters, as I have seen in some other OCR apps.

I know it can be complicated (for me it certainly is), but if anyone can shed some light or point me in a direction, I would appreciate it. I obviously don't need a ready-made solution, just which Android classes I might use, so that I can study them.

android ocr

asked by anonymous 21.01.2016 / 00:39

1 answer


score 5 · Answer 1

You will be completely boxed in if you use ready-made OCR APIs, and you will have no way to put dots around a specific word; nothing along those lines will be possible. An OCR has only one job: to try to extract the letters from an image and return them as text.

As @LuizVieira commented, OpenCV will be your right hand for this type of project. You can actually train each letter and digit of the alphabet to make comparisons in real time. This training must be scale invariant: no matter the font size or the scale, it still has to know which letter it is.

I can give you the basic steps of how this can be done, using OpenCV to extract the pixels but without using OpenCV for the training:

1 - Create vectors with the patterns of all letters and digits. You will need to crop each letter and digit and extract its pixels (OpenCV has ready-made functions for pixel extraction), then store them however you see fit.

2 - Now you have the basis for comparison, and you will want to compare each letter captured in real time against the extracted patterns. Use OpenCV to crop each letter of your text in real time. How do you know where each letter begins and ends while you point your phone's camera? This can be done by scanning the pixels horizontally until you find the beginning and end of each letter. We are talking about something basic here: most texts are black on a white background. It is very important to determine which color is the text and which is the background; you can do this by computing an RGB histogram, or you can simply force everything into black and white, which is really a great idea. Taking a white background as the example: walk until the white pixels end and mark the position, because that is where the letter (black, in this case) begins; then walk until the black pixels end and mark that position too. This tells you where to cut each letter or digit (start and end). You have just segmented the letters in real time.

3 - Perfect, the letter has been cropped from the text. Now extract its pixels, just as you did in the first step when building your database.

4 - Now compare what was extracted from the text with your database. Linear algebra has a concept called linear space (vector space). Here, each letter is treated as a vector of pixels, and checking which pixels coincide most often is a simple way to measure which pattern is the most similar.

5 - Assemble each word based on this ranking (the higher the cosine returned by the linear space, the better). And, surprise: if that word is one of the specific words you are looking for, you will have its whole position (beginning and end), and you can use OpenCV again to draw whatever art you like, since you now know its exact position inside the text.
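To make steps 2 to 5 more concrete, here is a minimal sketch in plain Python (no OpenCV, so it runs anywhere). Everything in it is an assumption for illustration: the function names, the fixed threshold of 128, the dark-text-on-light-background convention, and the 8x8 nearest-neighbour resampling, which I added only so that letters of different widths produce vectors of the same length. On Android the same logic would run on the pixel data you get from OpenCV instead of on nested lists.

```python
import math

def binarize(gray, threshold=128):
    """Grayscale rows (0-255) -> 1 for ink, 0 for background.
    Assumes dark text on a light background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment_columns(binary):
    """Scan left to right and return (start, end) column spans
    (end exclusive) where at least one pixel in the column is ink."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # background ended: letter begins
        elif not ink and start is not None:
            spans.append((start, x))       # ink ended: letter ends
            start = None
    if start is not None:
        spans.append((start, width))
    return spans

def crop_glyph(binary, start, end):
    """Cut one letter out and trim empty rows above and below it."""
    rows = [row[start:end] for row in binary]
    inked = [i for i, row in enumerate(rows) if any(row)]
    return rows[inked[0]:inked[-1] + 1]

def to_vector(glyph, size=8):
    """Nearest-neighbour resample to size x size, then flatten."""
    h, w = len(glyph), len(glyph[0])
    return [glyph[y * h // size][x * w // size]
            for y in range(size) for x in range(size)]

def cosine(a, b):
    """Cosine of the angle between two pixel vectors (1.0 = same direction)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def glyph_vectors(gray):
    """Return [(vector, start, end)] for every letter found in the image."""
    binary = binarize(gray)
    return [(to_vector(crop_glyph(binary, s, e)), s, e)
            for s, e in segment_columns(binary)]

def recognize(gray, templates):
    """Match each segmented letter against the template with the highest cosine."""
    return [(max(templates, key=lambda ch: cosine(vec, templates[ch])), s, e)
            for vec, s, e in glyph_vectors(gray)]

def art_to_gray(lines):
    """Example helper only: '#' is a black pixel, anything else is white."""
    return [[0 if ch == "#" else 255 for ch in line] for line in lines]

# Tiny made-up "font" used as the pattern database (step 1).
FONT = {
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "I": ["#", "#", "#", "#", "#"],
}
templates = {ch: glyph_vectors(art_to_gray(art))[0][0] for ch, art in FONT.items()}

# A one-line "photo" containing L, T and I separated by white columns.
line = [l + "." + t + ".." + i for l, t, i in zip(FONT["L"], FONT["T"], FONT["I"])]
print(recognize(art_to_gray(line), templates))
# -> [('L', 0, 3), ('T', 4, 7), ('I', 9, 10)]
```

Each result carries the start and end column of the letter, which is the position information you would need later to draw dots around a matched word. Real camera frames are noisy and letters often touch each other, so treat this only as an outline of the idea.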

I have just described a simple way to create an OCR without scale invariance. Instead of using the linear space, you can train each letter and digit using OpenCV. There is the SURF function in OpenCV, which applies scale invariance and is faster than its predecessor SIFT. That is the gist of how everything works.