I'm having a little difficulty consuming HTML generated by a third-party page; the HTML is missing some closing tags.
For example:
<div>
<li>
<div>
<div>test
test
</div>
<li>
<div>test
<div>test2</div>
</div>
Parsing it with Nokogiri:
html = Nokogiri::HTML(open('origem.html'))
The result is:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div>
<li>
<div>
<div>test
test
</div>
<li>
<div>test
<div>test2</div>
</div>
</li>
</div>
</li>
</div></body></html>
The correct result would look something like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<div>
<li>
<div>
<div>test
test
</div>
</div>
</li>
<li>
<div>test
<div>test2</div>
</div>
</li>
</div>
</body></html>