I'm trying to get the HTML of a page with Jsoup.
The page is protected by Cloudflare, and instead of the HTML of the site I'm interested in, I get back the HTML of the Cloudflare page (see image below) that appears before the redirect to the target site. I need the HTML of the site that Cloudflare redirects to after that page.
Cloudflare page example (not the site I'm looking for, but it serves as an example).
My code looks like this:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Main {
    public static void main(String... args) throws IOException {
        Document document = Jsoup.connect("http://site.com")
                .userAgent("Mozilla/5.0")
                .timeout(10000)
                .get();
        System.out.println(document.html());
    }
}
The output looks something like this:
<html>
<head>
<title>You are being redirected...</title>
<script> <!-- huge JS code --> </script>
</head>
<body></body>
</html>
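To make it easier to tell this interstitial apart from the real page in code, here is a minimal sketch of a check based on the title shown in the output above. It uses only the standard library. The class name InterstitialCheck and the methods titleOf and isInterstitial are hypothetical names made up for illustration, and matching on the title text is an assumption that only holds for pages that look like the one above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InterstitialCheck {
    // Matches the contents of the first <title> element, case-insensitively.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed page title, or an empty string if no title is found. */
    public static String titleOf(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    /** Hypothetical check: true if the title matches the interstitial page seen above. */
    public static boolean isInterstitial(String html) {
        // Assumption: the interstitial title starts with this text, as in the output above.
        return titleOf(html).toLowerCase(Locale.ROOT).startsWith("you are being redirected");
    }

    public static void main(String[] args) {
        String challenge = "<html><head><title>You are being redirected...</title></head><body></body></html>";
        String target = "<html><head><title>My site</title></head><body>Hello</body></html>";
        System.out.println(isInterstitial(challenge)); // true
        System.out.println(isInterstitial(target));    // false
    }
}
```

With something like this, the string returned by document.html() could be passed to isInterstitial to confirm whether a given attempt actually got past the Cloudflare page.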
I thought of setting followRedirects to true, but the documentation says that is already the default. I found this question with the same title on Stack Overflow, but the problem there is a different one.
I also tried making two requests, with the second one reusing the cookies from the first, but the result is the same and I still land on the same page:
import java.io.IOException;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Main {
    public static void main(String... args) throws IOException {
        final String URL = "http://site.com/";

        // Execute the first request.
        Connection.Response response =
                Jsoup.connect(URL)
                        .timeout(10000)
                        .method(Connection.Method.GET)
                        .execute();

        // Get the cookies from the response.
        Map<String, String> cookies = response.cookies();

        Document doc = Jsoup.connect(URL)
                .cookies(cookies) // Use the cookies in the second call.
                .get();

        System.out.println(doc.html()); // Fail! Cloudflare blocks me.
    }
}
I would also accept an answer that does not use Jsoup, as long as it solves this problem. I don't need anything complex, just the HTML returned as a String.