In my project I need to read an HTML file whose source contains an XML structure. I need to read this HTML file, get the values of the XML tags inside it, and then run a process that saves this data in my database.
My system already reads plain XML files fine, but I need it to be able to read an HTML file as well.
How can I do this? I have no idea where to start.
Structure of my HTML file:
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head><body><certidao>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
</certidao>
</body></html>
I need to read everything inside the root tag certidao and disregard the HTML tags.
The HTML page is saved on the computer, so it should be read from a file path rather than fetched from a link.
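To make the goal concrete, here is a minimal sketch in Python of what I am trying to achieve. It is only an illustration and rests on assumptions: that the text between <certidao> and </certidao> is itself well-formed XML, that the file is UTF-8 as the meta tag says, and that only one certidao element exists per file. The function names extract_certidao and read_certidao_file are made up for this example. The whole file cannot simply be handed to an XML parser because the meta tag is never closed, so the sketch cuts out the certidao block first and parses only that part.

```python
import re
import xml.etree.ElementTree as ET


def extract_certidao(html_text):
    """Cut the <certidao>...</certidao> block out of the HTML and parse it as XML.

    Assumes the block is well-formed XML and appears once in the text.
    """
    # Non-greedy match from the opening tag to the first closing tag;
    # DOTALL lets "." span the line breaks between the subtags.
    match = re.search(r"<certidao\b.*?</certidao>", html_text, re.DOTALL)
    if match is None:
        raise ValueError("no <certidao> element found")
    return ET.fromstring(match.group(0))


def read_certidao_file(path):
    """Read a locally saved HTML file (assumed UTF-8) and return the certidao element."""
    with open(path, encoding="utf-8") as handle:
        return extract_certidao(handle.read())


# Hypothetical usage; the path is a placeholder, not a real file:
# root = read_certidao_file("path/to/saved_page.html")
# for child in root:
#     print(child.tag, child.text)
```

The surrounding html, head, and body tags are never parsed here, so they are disregarded automatically. If the content inside certidao turns out not to be strict XML, a tolerant HTML parser would probably be needed instead of this approach.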