Regex question: matching across line breaks

8

I created the following expression:

"<strike>.*?</strike>" 

It is meant to match all the struck-through text, but it does not work because the source code contains line breaks, as in the example below:

<p style="margin-top: 0; margin-bottom: 0"><a name="6"></a><strike>Art. 
6º São direitos sociais a educação, a saúde, o
trabalho, o lazer, a segurança, a previdência social, a proteção à maternidade e à
infância, a assistência aos desamparados, na forma desta Constituição.</strike></p>

<p style="margin-top: 0; margin-bottom: 0">
<strike><a name="art6"></a>Art.     6<sup>o</sup> São direitos sociais a educação, a saúde, o
trabalho, a moradia, o lazer, a segurança, a previdência social, a proteção à
maternidade e à infância, a assistência aos desamparados, na forma desta
Constituição.<a href="Emendas/Emc/emc26.htm#1">(Redação dada pela Emenda
Constitucional nº 26, de 2000)</a></strike></p>

I'm using the regex in the Notepad++ Find dialog.

How do I make the regex match the line breaks too?

    
asked by anonymous 11.02.2014 / 17:57

4 answers

3

Just tick the ". matches newline" checkbox.


    
11.02.2014 / 21:29
5

Most engines treat . as "any character except a line break". There is usually an option that lifts this restriction. In Ruby it is m; in most other engines (Perl, PCRE, Java, Python) it is s, also called dotall or single-line mode, and m only changes how ^ and $ behave. In Ruby's syntax it looks like this:

/<strike>.*?<\/strike>/m

The details vary between languages and implementations, so check the documentation of the engine you are using.
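
As a minimal sketch of the difference, assuming Python's re module (where the flag that changes the dot is re.DOTALL) and a shortened, hypothetical sample string standing in for the page from the question:

```python
import re

# Hypothetical sample: a <strike> element whose text spans two lines.
html = "<p><strike>Art.\n6º São direitos sociais</strike></p>"

pattern = r"<strike>.*?</strike>"

# Default behaviour: "." stops at "\n", so the element is not found.
default_matches = re.findall(pattern, html)

# With DOTALL, "." also matches "\n", so the whole element is found.
dotall_matches = re.findall(pattern, html, flags=re.DOTALL)

print(default_matches)
print(dotall_matches)
```

The pattern itself is unchanged; only the flag differs between the two calls.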

    
11.02.2014 / 18:01
3

It depends on the language you are using. Each has its own way of specifying that the dot should also match end-of-line characters.

The dotall option makes the dot match line breaks. The multiline option is a separate setting: it makes ^ and $ match at the start and end of every line instead of only at the start and end of the whole text, so it is needed only when the pattern uses those anchors.
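
The two options can be compared side by side. This is only an illustrative sketch in Python, with a made-up two-line string; the variable names are hypothetical:

```python
import re

text = "first line\nsecond line"

# MULTILINE changes the anchors: "^" now matches at the start of each line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Without MULTILINE, "^" matches only at the start of the whole string.
string_start = re.findall(r"^\w+", text)

# MULTILINE does not let "." cross a line break; DOTALL does.
across_multiline = re.search(r"first.*second", text, flags=re.MULTILINE)
across_dotall = re.search(r"first.*second", text, flags=re.DOTALL)

print(line_starts, string_start)
print(across_multiline is None, across_dotall is not None)
```

For the expression in the question, which has no ^ or $, the dotall behaviour is the one that matters.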

Java

In Java, pass the flags java.util.regex.Pattern.DOTALL and java.util.regex.Pattern.MULTILINE when creating the Pattern:

Pattern.compile("\\s+", Pattern.MULTILINE | Pattern.DOTALL);

Javascript

JavaScript originally had no such option (the s flag was only added in ES2018), but according to this answer on the English Stack Overflow you can use [\s\S] instead of the dot to achieve the same result.

  • \s matches whitespace, including line breaks and tabs
  • \S matches anything that is not whitespace (the opposite)

Therefore, [\s\S] matches every character.
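
The same character class works in other engines too. A small sketch, written in Python purely for illustration and using a hypothetical sample string, shows it matching across a line break without any flag:

```python
import re

# Hypothetical sample with a line break inside the element.
html = "<strike>line one\nline two</strike>"

# No flags: the class [\s\S] matches any character, including "\n".
matches = re.findall(r"<strike>[\s\S]*?</strike>", html)

print(matches)
```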

PHP

In PHP you can use the modifiers s (dotall) and m (multiline). Example:

<?php
$subject = "abcdef";
$pattern = '/(.*)/sm';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
print_r($matches);
?>

Python

In Python we have the constants re.DOTALL and re.MULTILINE :

import re
pattern = r"<strike>.*?</strike>"  # the expression from the question
regex = re.compile(pattern, flags=re.MULTILINE | re.DOTALL)
    
11.02.2014 / 18:02
1

The expression that satisfies the condition is:

<strike>(.|\n)*?</strike>

I could also use the expression I mentioned earlier:

<strike>.*?</strike>

and tick the ". matches newline" checkbox in the Search Mode section.
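
Outside Notepad++, the two approaches can be checked against each other. This is a rough sketch in Python, where re.DOTALL is assumed to play the role of the checkbox and the sample string is a shortened, hypothetical version of the page:

```python
import re

# Hypothetical sample: two <strike> elements, each spanning two lines.
html = (
    "<p><strike>Art.\n6º</strike></p>\n"
    "<p><strike>Art. 6<sup>o</sup>\nSão</strike></p>"
)

# Alternation: either a non-newline character or an explicit "\n".
alternation = [m.group(0) for m in re.finditer(r"<strike>(.|\n)*?</strike>", html)]

# Original expression with the dot allowed to match "\n".
dotall = [m.group(0) for m in re.finditer(r"<strike>.*?</strike>", html, flags=re.DOTALL)]

print(alternation == dotall, len(dotall))
```

Both find the same two elements on this sample; the version with the flag avoids the capturing group.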

    
11.02.2014 / 19:59