How to create the <!DOCTYPE html> tag in Beautiful Soup (bs4)

0

I want to create the <!DOCTYPE html> tag with Beautiful Soup (bs4), and I wrote the following:

from bs4 import Doctype

tag = Doctype('html')

I tried the above, but it does not create the tag.

How should I proceed?

    
asked by anonymous 09.05.2018 / 16:30

2 answers

1

Create the Doctype with Beautiful Soup's element classes:

>>> from bs4 import Doctype
>>> tag = Doctype('html')
>>> type(tag)
<class 'bs4.element.Doctype'>
>>> print(tag)
html

Insert it into the HTML:

>>> from bs4 import Doctype
>>> from bs4 import BeautifulSoup

>>> html = '''<html><body></body></html>'''
>>> soup = BeautifulSoup(html, 'html.parser')

>>> tag = Doctype('html')
>>> type(tag)
<class 'bs4.element.Doctype'>
>>> tag
'html'
>>> soup.insert(0, tag)
>>> soup
<!DOCTYPE html>
<html><body></body></html>
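
The same steps can be put into a script. This is only a minimal sketch: ensure_doctype is a hypothetical helper name (it is not part of bs4), and it assumes the doctype, if present, is a top-level child of the soup. It inserts a Doctype at position 0 only when none exists yet, so calling it twice does not duplicate the declaration.

```python
from bs4 import BeautifulSoup, Doctype


def ensure_doctype(soup, name="html"):
    """Insert a doctype at the top of the document unless one already exists.

    ensure_doctype is a hypothetical helper for illustration, not a bs4 API.
    """
    # Look only at top-level nodes; a doctype normally lives there.
    if not any(isinstance(node, Doctype) for node in soup.contents):
        soup.insert(0, Doctype(name))
    return soup


soup = BeautifulSoup("<html><body></body></html>", "html.parser")
ensure_doctype(soup)
ensure_doctype(soup)  # second call finds the existing doctype and does nothing

result = str(soup)
print(result)
```

The Doctype object on its own is just a special kind of string; it only appears as <!DOCTYPE html> once it is part of a soup that is rendered.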
    
05.06.2018 / 00:57
1

If the intention is in fact to generate .html files, I believe html5lib can help.

You can install html5lib with pip:

pip install html5lib

And then use html5lib, like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p></p>', 'html5lib')

soup.body.append(soup.new_tag("a", href="https://pt.stackoverflow.com"))

print(soup)

The output has no doctype and will look something like:

<html><head></head><body><p></p><a href="https://pt.stackoverflow.com"></a></body></html>

To solve this, it is enough to write a string with the HTML5 doctype before the generated markup, for example:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p></p>', 'html5lib')

soup.body.append(soup.new_tag("a", href="https://pt.stackoverflow.com"))

source = soup.prettify("utf-8")

with open("output.html", "wb") as file:
    file.write(b'<!DOCTYPE html>\n')
    file.write(source)

print(source)

I do not know html5lib in depth, but it may be possible to do this with html5lib alone.
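
A possible alternative to writing the doctype bytes by hand is to insert a bs4 Doctype into the soup before calling prettify, so that the declaration is part of the serialized output. The sketch below uses the built-in "html.parser" so that it runs without extra packages; that parser choice, the full input document, and the temporary output path are assumptions made for illustration. Swapping the parser name for "html5lib" should behave the same way for the insert step.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup, Doctype

# html.parser does not add <html>/<body> on its own, so the input is a full document.
soup = BeautifulSoup("<html><body><p></p></body></html>", "html.parser")
soup.body.append(soup.new_tag("a", href="https://pt.stackoverflow.com"))

# Put the doctype at the very top of the tree instead of concatenating bytes later.
soup.insert(0, Doctype("html"))

# prettify with an encoding returns bytes, ready to be written in binary mode.
source = soup.prettify("utf-8")

with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = Path(tmp_dir) / "output.html"  # hypothetical location
    output_path.write_bytes(source)
    written = output_path.read_bytes()

print(source.decode("utf-8"))
```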

    
09.05.2018 / 18:53