Open Source CAPTCHA decoder


I'm looking for a CAPTCHA-decoding API that is free to use and open source.

I understand that this is a complex process involving OCR and advanced digital image analysis and processing techniques, but I still think the mechanisms behind decoding are worth studying.

I looked for references on the web and found the following:

libautocaptcha

I found libautocaptcha, but I could not get it to work: after downloading the source code and the required libraries, it fails with errors about missing classes.

JDownloader

An application frequently cited on international forums is JDownloader, which includes an internal implementation (JAnticaptcha) that decodes the CAPTCHAs used by the major file-sharing sites.

Tesseract OCR

A powerful OCR engine that is also cited on forums as a good option for reading and decoding CAPTCHAs.

Given all this, does anyone have experience with another decoding API, or with one of those above? Could you share a working example from that experience?

    
asked by anonymous 05.02.2014 / 19:54

2 answers


CAPTCHA-breaking routines are usually written for one specific CAPTCHA. You should usually preprocess the image before trying to read it with OCR, aiming to make the letters black on a white background. For the OCR itself, I recommend Tesseract, which you mentioned yourself.

I do not think there is a generic algorithm for this.
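As a rough illustration of the preprocessing step (making the letters black on white), here is a minimal pure-Python sketch. It uses Otsu's thresholding method, which is my choice for the example and not something the answer prescribes. It assumes the image has already been loaded as a grayscale list of rows with integer values from 0 to 255 (loading the file is left out), and it relies on a heuristic assumption: the background covers most of the image, so if most pixels come out black, the text was light on dark and gets inverted. The cleaned image could then be saved and passed to an OCR engine such as Tesseract.

```python
from typing import List

# Assumption: grayscale image as rows of ints in 0..255 (0 = black, 255 = white).
Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method).

    Pixels <= t form one class (dark), pixels > t the other (light).
    """
    hist = [0] * 256
    for row in img:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_dark = 0.0
    weight_dark = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def to_black_on_white(img: Image) -> Image:
    """Binarize the image so that text is black (0) on a white (255) background."""
    t = otsu_threshold(img)
    binary = [[0 if px <= t else 255 for px in row] for row in img]
    black = sum(row.count(0) for row in binary)
    pixels = sum(len(row) for row in binary)
    # Heuristic (assumption): the background is the majority of the image,
    # so a mostly black result means light text on a dark background.
    if black * 2 > pixels:
        binary = [[255 - px for px in row] for row in binary]
    return binary


if __name__ == "__main__":
    dark_text = [[200, 200, 200, 200], [200, 30, 40, 200], [200, 200, 200, 200]]
    light_text = [[20, 20, 20], [20, 220, 20], [20, 20, 20]]
    print(to_black_on_white(dark_text))
    print(to_black_on_white(light_text))
```

Both sample inputs end up as black strokes on white; the second one is inverted by the heuristic. Real CAPTCHAs usually also need noise and line removal tuned to the specific CAPTCHA, which is the point the answer makes about there being no generic algorithm.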

    
05.02.2014 / 20:26

In theory, it is possible to build generic software that can be trained to solve any CAPTCHA, and I believe such software will be available within a few years. At my company, Infosimples, we have obtained excellent results on similar problems using deep learning, a technology we specialize in.

The paper Google published at ICLR 2014 on how they automated the recognition of house-number digits is available at this link:

link

The paper's presentation at ICLR 2014 can be seen in this video:

link

They applied the same approach to reCAPTCHA, training on a dataset of a few million examples, and solved about 100,000 new CAPTCHAs with a 99.8% accuracy rate (better than a human at the same task).


The essence of the solution is to train a very deep neural network (with many layers, convolutions, and billions of connections between neurons) on databases of millions of examples.
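To make "convolutions" a little more concrete, here is a minimal pure-Python sketch of the forward pass of a single convolutional layer: each kernel slides over the image (a "valid" cross-correlation, which is what CNN libraries call convolution), a bias is added, and a ReLU is applied. This is only the basic building block such networks stack many times; it is not the architecture Google used, the kernel values are hand-picked for illustration (a real network learns them from the training examples), and training itself is not shown.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation: the kernel never extends past the image edge."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(m: Matrix) -> Matrix:
    """Element-wise ReLU: negative responses become 0."""
    return [[max(0.0, v) for v in row] for row in m]


def conv_layer(image: Matrix, kernels: List[Matrix], biases: List[float]) -> List[Matrix]:
    """One convolutional layer: one feature map per (kernel, bias) pair, then ReLU."""
    return [
        relu([[v + b for v in row] for row in conv2d_valid(image, k)])
        for k, b in zip(kernels, biases)
    ]


if __name__ == "__main__":
    # Dark-to-light vertical edge in the last column.
    img = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
    rising_edge = [[-1, 1], [-1, 1]]   # responds to dark -> light
    falling_edge = [[1, -1], [1, -1]]  # responds to light -> dark
    for fmap in conv_layer(img, [rising_edge, falling_edge], [0.0, 0.0]):
        print(fmap)
```

The first feature map fires where the edge is, and the second is zeroed out by the ReLU. A deep network feeds such feature maps into further layers, and the millions of training examples are what let it learn useful kernels instead of hand-picked ones.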

Unfortunately, the technology needed to get results like these is still relatively inaccessible, difficult to use, and very expensive.

    
15.05.2015 / 15:04