How does a CAPTCHA work?


I understand that a CAPTCHA is a way for me to ensure that the user interacting with my system is a human and not a script.

But this is the simple explanation we give to lay people. How do CAPTCHAs really work, and what strategies do they use? Would it be possible to have a simple code sample to demonstrate the concept?

    
asked by anonymous 11.07.2017 / 14:29

1 answer


CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart: a fully automated public test that distinguishes computers from humans.

In general, CAPTCHAs are designed to be easy for a human to solve and hard for a computer. The program that displays the CAPTCHA usually already knows the correct answer and only checks whether the user's reply matches it. There are several types of CAPTCHA:

  • Text: some random letters are shown to the user as an image, with noise added (such as straight lines or dots).

  • Audio: generally used alongside text CAPTCHAs, mainly to make them accessible to visually impaired users. The user hears a recording of the answer with some noise mixed in.

  • Images: the most recent type. The user is shown several images and asked to select those that belong to some category.
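The point that the program already knows the answer and only compares the reply can be sketched in base R. This is a minimal illustration, not how any particular CAPTCHA service works: it assumes an in-memory store keyed by a random challenge id (a real application would keep the answer in the user's server-side session), and `store`, `new_challenge` and `check_answer` are hypothetical names.

```r
# Hypothetical in-memory store: challenge id -> expected answer.
# A real application would use the user's server-side session instead.
store <- new.env()

new_challenge <- function(n = 6) {
  answer <- paste0(sample(letters, n, replace = TRUE), collapse = "")
  id <- paste0(sample(c(letters, 0:9), 16, replace = TRUE), collapse = "")
  assign(id, answer, envir = store)
  # Only 'id' and a rendered image of 'answer' would be sent to the client,
  # never the answer as plain text.
  list(id = id, answer = answer)
}

check_answer <- function(id, reply) {
  if (!exists(id, envir = store, inherits = FALSE)) return(FALSE)
  expected <- get(id, envir = store, inherits = FALSE)
  # Single use: the challenge is deleted after one attempt, so a correct
  # answer cannot be replayed.
  rm(list = id, envir = store)
  # Case-insensitive comparison, ignoring surrounding whitespace
  identical(tolower(trimws(reply)), expected)
}
```

Deleting the challenge on every attempt, right or wrong, also stops a script from guessing repeatedly against the same image.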

Anyone can create a CAPTCHA, as long as it is a fully automated test.

Here's a generator made in R:

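```r
library(magick)
library(magrittr)

# Draws 6 random lowercase letters onto base_img with a random size,
# rotation, colour and position; returns both the answer and the image.
gerar_captcha <- function(base_img){

  # The answer: 6 letters sampled with replacement
  letras <- sample(letters, 6, replace = TRUE) %>%
    paste0(collapse = "")

  cap <- base_img %>%
    image_annotate(
      letras,
      size = sample(30:70, 1),                        # font size
      degrees = sample(1:60, 1),                      # rotation of the whole string
      color = sample(c("green", "blue", "red"), 1),
      location = paste0("+", sample(20:100, 1), "+", sample(20:100, 1))  # offset from top-left
    )

  list(
    letras = letras,  # keep on the server to check the user's reply
    cap = cap         # the image shown to the user
  )
}

# Example: a blank white 300x150 canvas as the base image
res <- gerar_captcha(image_blank(300, 150, color = "white"))
res$cap
```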

This code generates CAPTCHAs with random letters, position, tilt, size, and color.
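The generator above does not add the noise (straight lines, dots) mentioned for text CAPTCHAs. Below is a hypothetical variant using only base R graphics, so it runs without extra packages; it also tilts each letter independently instead of rotating the whole string. It writes a PDF for portability; `png()` can be swapped in where the R build supports it. The function name `text_captcha_noise` and its parameters are illustrative assumptions.

```r
# Hypothetical base-R text CAPTCHA: per-letter tilt, size and colour,
# plus random line segments and dots as noise. Returns the answer.
text_captcha_noise <- function(file, n = 6, n_lines = 8, n_dots = 300) {
  answer <- paste0(sample(letters, n, replace = TRUE), collapse = "")
  pdf(file, width = 4, height = 1.5)
  on.exit(dev.off())  # close the device even if drawing fails
  par(mar = c(0, 0, 0, 0))
  plot.new()
  plot.window(xlim = c(0, 1), ylim = c(0, 1))

  chars <- strsplit(answer, "")[[1]]
  x <- seq(0.1, 0.9, length.out = n)
  for (i in seq_len(n)) {
    text(x[i], runif(1, 0.35, 0.65), chars[i],
         srt = runif(1, -30, 30),  # tilt of this letter, in degrees
         cex = runif(1, 2, 3),     # size of this letter
         col = sample(c("darkgreen", "blue", "red"), 1))
  }

  # Noise: random straight lines crossing the text, then scattered dots
  segments(runif(n_lines), runif(n_lines), runif(n_lines), runif(n_lines),
           lwd = 2, col = "grey30")
  points(runif(n_dots), runif(n_dots), pch = 16, cex = 0.3)

  answer
}

# Usage: answer <- text_captcha_noise(tempfile(fileext = ".pdf"))
```

Tilting letters individually and drawing lines through them makes it harder to segment the image into single characters, although, as the next paragraph explains, this kind of noise does not stop modern classifiers.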

However, text CAPTCHAs are doomed to fail. It doesn't take much work to break them these days, especially with machine learning techniques. I have a project for breaking the CAPTCHAs of public services (which do not offer an API), and with convolutional neural networks we are reaching more than 99% accuracy on several types of CAPTCHA: link

For this reason, companies have recently been developing other ways to verify that the user is human. The most widely used solution today is Google's reCAPTCHA, which, surprisingly, just asks you to click a checkbox. It analyzes various information about your browsing and about how you click the checkbox to decide whether you are a human or a computer, and it is much harder to break than a text CAPTCHA.

An interesting part of CAPTCHA history is their use to build databases of correctly labeled images and to transcribe books. The first versions of reCAPTCHA worked as follows:

Two words were presented: a word scanned from a book (whose answer not even the CAPTCHA provider knew) and a generated word (whose answer the program knew). From the users' responses, the program could identify and transcribe words written in books that had been digitized. Some more modern versions also help Google identify house numbers in Street View images.

So when you're answering CAPTCHAs, you might be helping Google improve Google Maps.
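The two-word scheme of the early reCAPTCHA can be illustrated with a small base-R sketch. Only the control word (whose answer is known) decides whether the user passes; the reply for the unknown scanned word is recorded as a vote. How reCAPTCHA actually aggregated replies is not described here, so the voting rule below (accept a transcription once a hypothetical `min_votes` passing users agree) and the names `votes`, `submit_pair` and `transcription` are assumptions for illustration.

```r
# Hypothetical vote store: unknown word id -> named integer vector of
# counts per candidate transcription.
votes <- new.env()

submit_pair <- function(control_answer, control_reply, unknown_id, unknown_reply) {
  norm <- function(s) tolower(trimws(s))
  # Only the control word (answer known) decides whether the user passes.
  passed <- identical(norm(control_reply), norm(control_answer))
  guess <- norm(unknown_reply)
  # The unknown word is never graded; replies from users who passed the
  # control word are counted as votes for its transcription.
  if (passed && nzchar(guess)) {
    counts <- integer(0)
    if (exists(unknown_id, envir = votes, inherits = FALSE)) {
      counts <- get(unknown_id, envir = votes, inherits = FALSE)
    }
    counts[guess] <- if (is.na(counts[guess])) 1L else counts[guess] + 1L
    assign(unknown_id, counts, envir = votes)
  }
  passed
}

# Assumed rule: accept the most-voted reading once it has min_votes votes.
transcription <- function(unknown_id, min_votes = 3) {
  if (!exists(unknown_id, envir = votes, inherits = FALSE)) return(NA_character_)
  counts <- get(unknown_id, envir = votes, inherits = FALSE)
  best <- names(which.max(counts))
  if (counts[[best]] >= min_votes) best else NA_character_
}
```

Ignoring replies from users who failed the control word filters out most bots and careless answers before they can influence the transcription.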

    
11.07.2017 / 16:41