Regex: capture groups in which a specific word appears


I have the following situation:

text_1 = O cachorro correu com o gato
text_2 = O carro passou e o cachorro foi atrás
text_3 = Sempre que chego em casa meu cachorro pula em mim
text_4 = Ele foi correndo atrás do sonho
text_5 = O cachorro latiu para o carteiro
text_6 = Quando seu dono ordenou, corra cachorro

I want to capture groups made of "cachorro", "pul\w+", "corre\w+" and "foi", but "cachorro" (dog) must be present in every group.

I tried:

re.search(r'(?:\s(cachorro|corre\w+|foi|pul\w+)){2,}', text_n)

This gives the following matches:

text_1 = cachorro correu
text_2 = cachorro foi
text_3 = cachorro pula 
text_4 = foi correndo
text_5 = None
text_6 = corra cachorro
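For reference, here is a minimal Python reproduction of the attempt. It assumes the six sentences are plain strings; the list name texts and the helper first_match are illustrative and not from the question. With the pattern exactly as written, text_6 gives None rather than "corra cachorro", because corre\w+ requires the literal prefix "corre" and "corra" does not have it.

```python
import re

# The six sample sentences from the question, as plain Python strings.
texts = [
    "O cachorro correu com o gato",
    "O carro passou e o cachorro foi atrás",
    "Sempre que chego em casa meu cachorro pula em mim",
    "Ele foi correndo atrás do sonho",
    "O cachorro latiu para o carteiro",
    "Quando seu dono ordenou, corra cachorro",
]

# The attempted pattern: two or more consecutive listed words, each preceded
# by whitespace. Nothing forces "cachorro" to be one of them.
attempt = re.compile(r"(?:\s(cachorro|corre\w+|foi|pul\w+)){2,}")

def first_match(text):
    """Return the whole match without its leading whitespace, or None."""
    m = attempt.search(text)
    return m.group(0).strip() if m else None

results = [first_match(t) for t in texts]
for i, r in enumerate(results, start=1):
    print(f"text_{i} = {r}")
```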

My problem is the text_4 match; that result is no good for me. What I want to know is whether there is a way to match groups with regular expressions in which a particular word, in this case "cachorro" (dog), appears at least once. Other forms of the verbs correr (to run) and pular (to jump) may occur together with "cachorro".

Thanks, everyone.

    
asked by anonymous 10.08.2017 / 18:49

1 answer


Response

If you want to match the words that are preceded by "cachorro" (dog), you can use a positive lookbehind.

((?<=cachorro )corre\w+|(?<=cachorro )foi|(?<=cachorro )pul\w+)

You can see how this regex works here.
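Here is a minimal Python check of this pattern against the six sentences from the question (the list and loop are only for illustration). Because the pattern has a single capture group, re.findall returns the captured words. The text_4 false positive disappears, since neither "foi" nor "correndo" is preceded by "cachorro ".

```python
import re

texts = [
    "O cachorro correu com o gato",
    "O carro passou e o cachorro foi atrás",
    "Sempre que chego em casa meu cachorro pula em mim",
    "Ele foi correndo atrás do sonho",
    "O cachorro latiu para o carteiro",
    "Quando seu dono ordenou, corra cachorro",
]

# Each alternative is guarded by a lookbehind that requires "cachorro "
# immediately before the captured word.
pattern = re.compile(r"((?<=cachorro )corre\w+|(?<=cachorro )foi|(?<=cachorro )pul\w+)")

# With one capture group, findall returns a list of captured strings.
found = [pattern.findall(t) for t in texts]
for i, words in enumerate(found, start=1):
    print(f"text_{i} = {words}")
```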


Explanation:

((?<=cachorro )[...]

This part checks for the word "cachorro" followed by a space through a positive lookbehind: the engine verifies that this string appears immediately before the current position, but does not include it in the match, which starts right after it.
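To see that the lookbehind is zero-width, here is a small Python sketch (the sentences are illustrative, adapted from the question). The match covers only "foi", and a sentence without "cachorro " right before "foi" produces no match. Python's re module requires a lookbehind to have a fixed width, which "cachorro " satisfies.

```python
import re

sentence = "O cachorro foi atrás"

# The lookbehind must hold, but it consumes no characters,
# so "cachorro " is checked yet left out of the match.
m = re.search(r"(?<=cachorro )foi", sentence)
print(m.group(0))  # foi
print(m.span())    # (11, 14)

# Here "foi" is preceded by "Ele ", so the lookbehind fails.
miss = re.search(r"(?<=cachorro )foi", "Ele foi correndo atrás do sonho")
print(miss)        # None
```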

[...]corre\w+[...]

After that, it captures the following word if it starts with the prefix "corre" or "pul", or is exactly "foi". The fragment above shows the "corre" case.
Since the lookbehind only checks for "cachorro " without including it in the match, you can prepend "cachorro" to each match to get the result you wanted.
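Rebuilding the full pairs then comes down to string concatenation. Here is a Python sketch using the first four sentences from the question (the variable names are illustrative):

```python
import re

texts = [
    "O cachorro correu com o gato",
    "O carro passou e o cachorro foi atrás",
    "Sempre que chego em casa meu cachorro pula em mim",
    "Ele foi correndo atrás do sonho",
]

pattern = re.compile(r"((?<=cachorro )corre\w+|(?<=cachorro )foi|(?<=cachorro )pul\w+)")

# The lookbehind guarantees that "cachorro " precedes every match,
# so prepending it restores the full pair.
pairs = ["cachorro " + w for t in texts for w in pattern.findall(t)]
print(pairs)  # ['cachorro correu', 'cachorro foi', 'cachorro pula']
```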

What you did wrong
By building the group as an alternation with OR ( | ), you capture every occurrence of the words cachorro, corre\w+, foi and pul\w+ no matter which words precede them, so any two of them in a row, such as "foi correndo", count as a match even without "cachorro".

Addendum

As mentioned in the comments, if you want a preceding word other than "cachorro", you can add another alternative with OR ( | ): copy the previous expression and change the preceding word and the words you want to capture after it. Example:

((?<=cachorro )corre\w+|(?<=cachorro )foi|(?<=cachorro )pul\w+)|((?<=gato )corre\w+|(?<=gato )foi|(?<=gato )dorm\w+)
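Here is a Python sketch of this combined pattern. It assumes every lookbehind is written with the trailing space, as in "(?<=gato )". The sentence is hypothetical, not from the question. Checking which group took part in each match tells you which preceding word was found, so the pair can be rebuilt.

```python
import re

# Group 1: words after "cachorro "; group 2: words after "gato ".
pattern = re.compile(
    r"((?<=cachorro )corre\w+|(?<=cachorro )foi|(?<=cachorro )pul\w+)"
    r"|((?<=gato )corre\w+|(?<=gato )foi|(?<=gato )dorm\w+)"
)

# Hypothetical sentence, for illustration only.
sentence = "O gato dormiu e o cachorro correu"

pairs = []
for m in pattern.finditer(sentence):
    # Exactly one of the two groups participates in each match;
    # the other one is None.
    subject = "cachorro" if m.group(1) is not None else "gato"
    pairs.append(f"{subject} {m.group(0)}")

print(pairs)  # ['gato dormiu', 'cachorro correu']
```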

Here is an example of the regex above in action.

    
10.08.2017 / 21:39