In short, HtmlUnit provides an API that lets Java applications perform the same actions a user would take in a browser: opening a web page, clicking buttons and links, filling out forms, and so on.
Roughly speaking, it is a browser without a graphical interface (the project describes itself as a "GUI-less browser"). More features and other information can be found on the project page.
Example
Consider a visit to http://meusiteficticio.com, a page that contains a form with this structure:
<form id='form-login' action='/login' method='post'>
<input name='user' type='text' placeholder='Nome de usuário'/>
<input name='pass' type='password' placeholder='Senha'/>
<input type='submit' value='entrar'/>
</form>
In a browser, the user would type a username and password into the appropriate fields and then click the button to submit the form. We will do the same, but from within the application.
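To make concrete what submitting this form amounts to, here is a minimal sketch, using only the Java standard library (Java 10 or later), of the request body a browser would send to /login. It assumes the form uses the default application/x-www-form-urlencoded encoding, since it declares no enctype. The names FormBodySketch and encode are hypothetical, and the credentials are sample values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch only: builds the URL-encoded body that a form like 'form-login'
// would send. Class and method names are hypothetical.
class FormBodySketch {

    // Joins the fields as name=value pairs separated by '&', escaping each part.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in document order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user", "joao");     // sample value
        fields.put("pass", "joao1234"); // sample value
        // The submit input has no 'name' attribute, so it contributes no field.
        System.out.println("POST /login");
        System.out.println(encode(fields)); // prints "user=joao&pass=joao1234"
    }
}
```

HtmlUnit builds and sends this request for you when the submit button is clicked, so the sketch is only meant to show what happens underneath.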
HtmlUnit implemented (v2.8) and later made public (v2.11) the querySelector and querySelectorAll methods, which work much like the JavaScript functions of the same name. Using these methods, the login looks like this:
// Fetch the login page.
HtmlPage paginaDeLogin = new WebClient(BrowserVersion.BEST_SUPPORTED)
.getPage("http://meusiteficticio.com");
// Get the form elements.
HtmlTextInput inputNomeDeUsuario = paginaDeLogin.querySelector("input[name='user']");
HtmlPasswordInput inputSenha = paginaDeLogin.querySelector("input[name='pass']");
HtmlSubmitInput botaoEnviar = paginaDeLogin.querySelector("#form-login > input[type='submit']");
// Set the 'value' attribute of the inputs.
inputNomeDeUsuario.setValueAttribute("joao");
inputSenha.setValueAttribute("joao1234");
// Simulate a click on the submit button and wait for the response.
HtmlPage paginaAposOLogin = botaoEnviar.click();
// Print the HTML of the resulting page.
System.out.println(paginaAposOLogin.asXml());
If you are using an older version that does not support querySelector, you first have to get the form and then get the inputs with the getInputByName method:
// Simulate a Chrome browser.
WebClient client = new WebClient(BrowserVersion.CHROME);
// Fetch the page.
HtmlPage paginaDeLogin = client.getPage("http://meusiteficticio.com");
// Get the login form by its "id" attribute in the HTML.
// The second parameter sets whether the lookup is case-sensitive:
// with false, "FoRm-LoGiN" would also find the form.
HtmlForm formularioDeLogin = paginaDeLogin.getElementById("form-login", true);
// Get the form's inputs by their "name" attribute:
HtmlTextInput inputNomeDeUsuario = formularioDeLogin.getInputByName("user");
HtmlPasswordInput inputSenha = formularioDeLogin.getInputByName("pass");
// The submit "button" has no name, id, class, etc.,
// so one way to get it is by its value='entrar'.
HtmlSubmitInput botaoEnviar = formularioDeLogin.getInputByValue("entrar");
// Fill in the username and password fields
// (as if typing them in the browser).
inputNomeDeUsuario.setValueAttribute("joao");
inputSenha.setValueAttribute("joao1234");
// Simulate a click on the submit button and wait for the response.
HtmlPage paginaAposOLogin = botaoEnviar.click();
// Print the HTML of the resulting page.
System.out.println(paginaAposOLogin.getWebResponse().getContentAsString());
Remember to handle exceptions. Attempting to set (or otherwise manipulate) the value of an input that does not exist will throw a NullPointerException.
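One way to make such failures easier to diagnose is to check a lookup result before using it. The sketch below is standard-library Java only. ElementGuard and require are hypothetical names, not part of HtmlUnit, and the sketch assumes the lookup signals "not found" by returning null, as querySelector does.

```java
import java.util.NoSuchElementException;

// Hypothetical helper (not part of HtmlUnit): fails fast with a readable
// message when a lookup returned null instead of an element.
class ElementGuard {

    // Returns the element unchanged, or throws if the lookup found nothing.
    static <T> T require(T element, String selector) {
        if (element == null) {
            throw new NoSuchElementException("No element matches: " + selector);
        }
        return element;
    }

    public static void main(String[] args) {
        // With HtmlUnit this would wrap a lookup, for example:
        // HtmlTextInput input = require(
        //         paginaDeLogin.querySelector("input[name='user']"), "input[name='user']");
        Object missing = null; // stands in for a lookup that found nothing
        try {
            require(missing, "input[name='email']");
        } catch (NoSuchElementException e) {
            System.out.println("Handled: " + e.getMessage());
        }
    }
}
```

The exception message then names the selector that failed, which a bare NullPointerException on a later line would not.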
Keeping cookies
If you need to keep cookies for use in future requests, configure the CookieManager of your "browser" (that is, the WebClient).
WebClient client = new WebClient(BrowserVersion.FIREFOX_24);
CookieManager cookieManager = client.getCookieManager();
cookieManager.setCookiesEnabled(true);
client.setCookieManager(cookieManager);
HtmlPage fb = client.getPage("https://facebook.com");
Disabling Warnings and Alerts
HtmlUnit logs a warning for everything that makes the HTML document invalid, e.g. obsolete attributes and JavaScript and CSS errors.
You can turn off these alerts by setting the HtmlUnit logger level to OFF:
Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
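The line above relies on the java.util.logging hierarchy: setting a level on a parent logger also affects child loggers that have no level of their own. The following minimal sketch shows that effect with the standard library alone. QuietHtmlUnitLogs and silence are hypothetical names, and the sketch assumes HtmlUnit's log output ends up in java.util.logging; if another logging backend (for example Log4j) is on the classpath, the level has to be set in that backend instead.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch only: class and method names are hypothetical.
class QuietHtmlUnitLogs {

    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // logger configured on the fly may be garbage collected with its level.
    static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    // Turns off every message logged under the HtmlUnit root package.
    static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        // A child logger without its own level inherits OFF from the parent.
        Logger child = Logger.getLogger("com.gargoylesoftware.htmlunit.javascript");
        System.out.println(child.isLoggable(Level.SEVERE)); // prints "false"
    }
}
```

Holding the logger in a static field is a small design choice that keeps the setting from being lost if the logger would otherwise be unreferenced.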