How to make a regular expression that finds a name and then looks for a character?

1

I was analyzing a large HTML file that basically follows this format:

<span id="mensagem" class="topo">Classes e comandos</span>

The problem is that the attributes inside the span tag vary in number and position.

The goal is to extract the text "Classes e comandos".

To do this, I need to find the next ">" after the search finds the string "mensagem", and then take every character that follows for as long as it is different from the "<" character.

So:

           (found)--------------v(found)
<span id="mensagem" class="topo">Classes e comandos</span>
                                 ||||||||||||||||||x(reaches this one and stops)
                                   (takes these)

I just need to express this as a regular expression. I'm using Notepad++. Does anyone know how to write a regular expression for this problem?

    
asked by anonymous 23.08.2017 / 03:10

2 answers

2

Answer
As the user Wellington mentioned, you should follow these steps:

  

Go to Search > Replace.
  Set the Find what field to: (<.*?(?=mensagem).*?>)(.*?)(<.*?>)|(.*)
  Set the Replace with field to: \2 or $2.
  Set the search mode to: Regular expression.
  Click the Replace All button.

This replaces the whole text with the content of each tag that has the keyword mensagem inside it.
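For anyone who wants to try the same replacement outside Notepad++, here is a minimal sketch in Python. It assumes Python 3.7 or later (where a group that did not participate is replaced by an empty string), and the function name `keep_tag_text` and the sample text are made up for illustration. Python's `re` engine is not the one Notepad++ uses, so treat it as an approximation of Replace All rather than an exact reproduction.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r'(<.*?(?=mensagem).*?>)(.*?)(<.*?>)|(.*)')


def keep_tag_text(html: str) -> str:
    """Replace every match with group 2, roughly what Replace All does."""
    return PATTERN.sub(r'\2', html)


# Hypothetical sample: only the middle line contains the keyword.
sample = (
    '<p>other text</p>\n'
    '<span id="mensagem" class="topo">Classes e comandos</span>\n'
    '<p>more text</p>'
)

result = keep_tag_text(sample)
# Lines that did not match are emptied, so drop the blank ones before printing.
print([line for line in result.splitlines() if line])  # ['Classes e comandos']
```

Note that lines without the keyword are emptied rather than removed, so blank lines are left behind.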

You can test this regex here.

If this does not solve your problem, please comment here with what you expected and what went wrong, and I will try to fix it. I hope this helps :D

Explanation of Regex
This regex has 4 capture groups. I will explain what each one does so you can understand it better.

(<.*?(?=mensagem).*?>)

Group 1 captures the whole opening tag, provided the word mensagem appears somewhere before the ">" character. For this I used a positive lookahead: whatever sits between (?= and ) is a condition that must hold at that position, without being consumed, for the part before it to be captured.
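A small sketch of group 1 on its own may make the lookahead easier to see. It is written in Python, whose `re` module supports the same `(?=...)` syntax; the variable names and sample tags are invented for illustration.

```python
import re

# Group 1 of the answer's pattern, taken on its own.
TAG_WITH_KEYWORD = re.compile(r'<.*?(?=mensagem).*?>')

with_keyword = '<span id="mensagem" class="topo">'
without_keyword = '<span id="outro" class="topo">'

# The tag matches only when "mensagem" appears before the closing ">".
print(TAG_WITH_KEYWORD.search(with_keyword).group(0))
print(TAG_WITH_KEYWORD.search(without_keyword))  # None

# The lookahead checks what follows but consumes nothing:
# the match ends right before "mensagem".
print(re.search(r'id="(?=mensagem)', with_keyword).group(0))  # id="

# Caveat: "." also matches ">", so the match can start at an earlier tag
# on the same line.
print(TAG_WITH_KEYWORD.search('<b>x</b><span id="mensagem">').group(0))
```

If the last case matters for your file, replacing `.*?` with `[^<>]*?` would probably keep the match inside a single tag, but that is a change to the answer's pattern and has not been tested in Notepad++.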

(.*?)

Group 2 only comes into play if group 1 captured something, since it is in the same expression and not after an OR operator. It captures any character except line breaks and stops as soon as the next part of the expression can match.

(<.*?>)

Group 3 captures the tag that comes after group 2. Its "<" character also serves as a delimiter, so group 2 stops capturing when it reaches it.
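The reason group 2 stops at the first "<" is the lazy quantifier `*?`. A short Python sketch comparing it with the greedy `*`, using a made-up line with two tags:

```python
import re

# Hypothetical line with two spans, to show where each quantifier stops.
line = '<span id="mensagem">Texto 01</span><span id="mensagem">Texto 02</span>'

# Lazy: stops at the first "<" that lets the rest of the pattern match.
lazy = re.search(r'>(.*?)<', line).group(1)

# Greedy: runs to the last "<" on the line.
greedy = re.search(r'>(.*)<', line).group(1)

print(lazy)    # Texto 01
print(greedy)  # Texto 01</span><span id="mensagem">Texto 02
```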

|(.*)

Group 4 is the expression after the OR operator. This means that if the regex cannot match with the previous expression, it tries to match with this one. Here I just used the "." operator to match any character other than a line break (\n), so anything that does not match your search is deleted, because everything is replaced with the contents of group 2, which is empty in that case.
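To see which side of the OR operator matched, you can inspect groups 2 and 4 for each line. A minimal Python sketch, where the helper name `groups_2_and_4` and the sample lines are assumptions for illustration:

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r'(<.*?(?=mensagem).*?>)(.*?)(<.*?>)|(.*)')


def groups_2_and_4(line: str):
    """Return (group 2, group 4) for a match anchored at the start of the line."""
    match = PATTERN.match(line)
    return match.group(2), match.group(4)


# Left side of the OR matched: group 2 holds the text, group 4 did not participate.
print(groups_2_and_4('<span id="mensagem" class="topo">Classes e comandos</span>'))
# ('Classes e comandos', None)

# Right side matched: group 4 swallowed the line, group 2 did not participate.
print(groups_2_and_4('<p>other text</p>'))
# (None, '<p>other text</p>')
```

A group that did not participate contributes nothing to the replacement, which is why the second kind of line ends up empty.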

    
24.08.2017 / 01:49
1

Follow these steps:

  

Go to Search > Replace.
  Set the Find what field to:
  Set the Replace with field to: \1 or \2.
  Set the search mode to: Regular expression.
  Click the Replace All button.

Keep in mind that this leaves only the text that was found. For example:

<div>
   <span id="mensagem" class="topo">Texto 01</span>
   <span id="mensagem" class="topo">Texto 02</span>
   <span id="mensagem" class="topo">Texto 03</span>
   <span id="mensagem" class="topo">Texto 04</span>
   <span id="mensagem" class="topo">Texto 05</span>
</div>

Only this remains:

Texto 01
Texto 02
Texto 03
Texto 04
Texto 05
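
The pattern used in this answer is not shown above, so the following Python sketch uses a hypothetical pattern of its own (`SPAN_TEXT`) just to reproduce the result on the sample. It is an assumption for illustration, not the answer's original expression, and it has not been tested in Notepad++.

```python
import re

# Hypothetical pattern: a span whose attributes mention "mensagem",
# capturing the text up to the closing tag.
SPAN_TEXT = re.compile(r'<span\b[^<>]*\bmensagem\b[^<>]*>([^<]*)</span>')

html = '''<div>
   <span id="mensagem" class="topo">Texto 01</span>
   <span id="mensagem" class="topo">Texto 02</span>
   <span id="mensagem" class="topo">Texto 03</span>
   <span id="mensagem" class="topo">Texto 04</span>
   <span id="mensagem" class="topo">Texto 05</span>
</div>'''

# findall returns the contents of the single capture group for each match.
texts = SPAN_TEXT.findall(html)
print('\n'.join(texts))
```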
    
23.08.2017 / 04:54