I will only answer one part of the question: why does the error happen?
Suppose you read an empty file with the read_html function of the xml2 package, as in the code below:
library(xml2)
tf <- tempfile()            # path for a new temporary file
file.create(tf)             # create it as an empty file
html_erro <- read_html(tf)  # parse the empty file
In that case you get a list of two elements, both of type externalptr. You can see this with:
str(html_erro)
List of 2
$ node:<externalptr>
$ doc :<externalptr>
- attr(*, "class")= chr [1:2] "xml_document" "xml_node"
Now let's look at each element of the list, starting with $doc:
html_erro$doc
<pointer: 0x128c4d4c0>
Note that it is a pointer to the memory address 0x128c4d4c0 (the exact address will differ on your machine).
Now look at $node:
html_erro$node
<pointer: 0x0>
It is a pointer to the address 0x0, that is, a null pointer, and this is where the problem lies. When your program at some point tries to dereference this pointer (read the value it points to), it attempts to access an invalid memory address, which causes what is called a segmentation fault.
In your case, it was the html_nodes function that tried to access this address, but the same thing can happen elsewhere. For example, when you call print(html_erro), the print method for xml_document objects tries to access that pointer and triggers the segmentation fault.