Remove tags generated by a text editor at the end of a string

2

I'm using a text editor that, like others I've used, always generates some useless tags that I'd like to remove. I can remove the last one, but sometimes it generates more than one.

My code:

def remove_useless_tags(message):
    message = message.replace("<p><br/></p>", "") \
                .replace("<p></p>", "") \
                .replace("<p><b><br/></b></p>", "")
    # .replace("<p><br></p>", "")
    if message[-11:] == "<p><br></p>":
        message = message[:-11]
    return message

When a string looks like this: <p>Olá</p><p><br></p>, the code removes the <p><br></p> from the end. But sometimes the text comes in this format:

<p>Olá</p><p><br></p><p><br></p>
<p>Olá</p><p><br></p><p><br></p><p><br></p>

I would like to remove every <p><br></p> from the end of the string. Keep in mind that some <p><br></p> tags appear in the middle of the text and must not be removed: they are line breaks ("enters") that the user typed deliberately while writing. The problem is only the trailing ones, which are unnecessary and break the layout.

I think I can solve it with regex, but I need some help with it. Thanks!

    
asked by anonymous 09.11.2018 / 21:06

1 answer

5

If you need to match a sequence only at the end of the text, you can use the $ anchor. Your regex then only needs a group around the sequence and a quantifier on that group, so you do not have to keep repeating the replacement call.

So I recommend the regex (<p><br><\/p>)*?$ together with re.sub(pattern, replacement, string), since str.replace does not accept regular expressions.

Application in your code:

import re
[...]
def remove_useless_tags(message):
    # Raw string, so the backslash in \/ reaches the regex engine unchanged
    result = re.sub(r'(<p><br><\/p>)*?$', "", message)

    return result
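
As a quick check, the function above can be run against the strings from the question. This is only a sketch: the third sample, with a <p><br></p> in the middle of the text, is made up for illustration and is not from the original post.

```python
import re


def remove_useless_tags(message):
    # Same pattern as in the answer, written as a raw string
    return re.sub(r'(<p><br><\/p>)*?$', "", message)


# The first two samples come from the question; the third is a
# hypothetical message with a deliberate break in the middle.
samples = [
    "<p>Olá</p><p><br></p>",
    "<p>Olá</p><p><br></p><p><br></p><p><br></p>",
    "<p>Olá</p><p><br></p><p>Tudo bem?</p><p><br></p>",
]

for sample in samples:
    print(remove_useless_tags(sample))
# <p>Olá</p>
# <p>Olá</p>
# <p>Olá</p><p><br></p><p>Tudo bem?</p>
```

Only the trailing run of <p><br></p> is dropped; the one between the two paragraphs stays in place.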

Explanation of Regex

(<p><br><\/p>)*?$
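  • (<p><br><\/p>) > The sequence you want to match. The backslash before / is optional in Python.

  • *? > Lazy quantifier: matches zero or more repetitions of the sequence. Because $ must match right after it, the whole trailing run is still consumed.

  • $ > Anchors the match to the end of the string.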
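
Since the match is pinned to the end of the string, a greedy quantifier gives the same result as the lazy one. A possible variant, offered as a sketch rather than as part of the original answer, uses a non-capturing group, + instead of *? (so nothing matches when there are no trailing tags), and \Z, which in Python matches only at the very end of the string. The names TRAILING_BREAKS and strip_trailing_breaks are hypothetical.

```python
import re

# Hypothetical names; the pattern is a variant of the one explained above.
# (?:...)  groups the sequence without capturing it
# +        requires at least one repetition
# \Z       matches only at the very end of the string
TRAILING_BREAKS = re.compile(r'(?:<p><br></p>)+\Z')


def strip_trailing_breaks(message):
    # Remove the whole run of empty paragraphs at the end, if any
    return TRAILING_BREAKS.sub("", message)


print(strip_trailing_breaks("<p>Olá</p><p><br></p><p><br></p>"))
# <p>Olá</p>
```

Compiling the pattern once is a small convenience when the function is called for many messages.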

Here you also have a Regex test

    
10.11.2018 / 01:09