Jsoup returning no value

1

I'm having a problem with the Jsoup library: when I try to connect to a certain page, the connection simply returns nothing.

public static void main(String[] args) {

    try{

        Document doc = Jsoup.connect("http://pt.stackoverflow.com/").get();

        // Select the question link elements
        Elements elements = doc.select("a.question-hyperlink");

        System.out.println("The page title is: " + doc.title());

        // Print the title of each question
        for(int i = 0; i <elements.size(); i++){
            System.out.println(elements.get(i).text());
        }
    }catch(Exception e){
        System.out.println("Erro "+ e);
    }
}

As it happens, I tested it against Stack Overflow itself and got the same problem.

NetBeans IDE output:

Erro: org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://pt.stackoverflow.com/

------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 1.282s
Finished at: Sun May 15 13:26:41 BRT 2016
Final Memory: 5M/109M
------------------------------------------------------------------------

I don't know whether it makes any difference, but this is a Maven project.

@EDIT

I was able to resolve the problem by adding the following method call to the connection:

Document doc = Jsoup.connect("http://pt.stackoverflow.com/")
                .userAgent("Mozilla").get();
    
asked by anonymous 15.05.2016 / 18:28

1 answer

0

Important note: Before accessing any page or service with a web crawler, first check whether the provider allows its content to be read this way. In some cases where it is allowed, the provider itself offers an API so that this access can be done properly.

HTTP status code 403 means Forbidden. This can happen for several reasons, but the most common one is that the server noticed the request is not coming from a browser. In that case, just use the Connection#userAgent method to set a user agent that the server accepts as a browser. The user agent of Chrome 55 on Windows 8.1 (64-bit), for example:

Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36

Add it between the Jsoup#connect and Connection#get calls, as follows:

...
.userAgent("Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36")
...
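
For reference, here is a minimal sketch of how the whole program from the question might look with the user agent set. It assumes jsoup is on the classpath (for example as a Maven dependency). The class name `UserAgentExample` and the helper `browserLikeConnection` are made up for this illustration, and the `a.question-hyperlink` selector is simply the one from the question, which may not match the site's current markup. A server can still reject the request for other reasons, so the 403 case is handled explicitly.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class UserAgentExample {

    // User agent string quoted in the answer (Chrome 55 on 64-bit Windows).
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36";

    // Hypothetical helper: configures the connection but sends nothing yet.
    static Connection browserLikeConnection(String url) {
        return Jsoup.connect(url).userAgent(USER_AGENT);
    }

    public static void main(String[] args) {
        try {
            // get() performs the actual HTTP request and parses the response.
            Document doc = browserLikeConnection("http://pt.stackoverflow.com/").get();

            System.out.println("Page title: " + doc.title());

            // Selector taken from the question; adjust it if the markup differs.
            for (Element link : doc.select("a.question-hyperlink")) {
                System.out.println(link.text());
            }
        } catch (HttpStatusException e) {
            // Thrown by jsoup for non-2xx responses such as 403.
            System.out.println("HTTP " + e.getStatusCode() + " for " + e.getUrl());
        } catch (IOException e) {
            System.out.println("Request failed: " + e);
        }
    }
}
```

Catching `HttpStatusException` separately from the generic `IOException` makes it easy to see whether the server refused the request or the connection itself failed.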
    
02.02.2017 / 19:04