preg_match returning 0


Good afternoon. I have a string that I get from the database:

<!DOCTYPE html> <html> <head> </head> <body> <div> </div> <div> </div> <div> <h3> </h3> <p> </p> <p> </p> <p> </p> <p> </p> <p> </p> <br /><br /><br /><br /><br /><br /> <h6> </h6> <br /><br /><br /><br /> <p>Portaria n&ordm; 69 de 18/01/2017 - Publicada no DOU de 19/01/2017</p> <br /> <h3>Certificamos que</h3> <h5>{NOME_ALUNO}</h5> <h3>concluiu em {DT_APR} o <br />{NOME_CURSO}<br />realizado pela ---- na qualidade de aluno(a), perfazendo um total de {CARGA_HOR} horas.</h3> <h3><em> </em></h3> <h4><em>Cidade </em>{DATE_EXT}<em> .</em></h4> </div> </body> </html>

I'm trying to match a regex against this string:

preg_match('/^<!\w+\s\w+>\s<\w+>\s<\w+>\s<\/\w+>/', $texto);

But it always returns 0, even though the same regex gives the expected result when I test it on online regex testers.

Can anyone point out my error?

Thank you.

    
asked by anonymous 27.09.2018 / 22:49

2 answers


Explaining preg_match:

The preg_match() function accepts 5 parameters, of which only the first two are required.

  • The first parameter is the regular expression ($pattern).
  • The second parameter is the string in which the expression is searched ($subject).
  • The third parameter is an array that receives the results ($matches): index 0 holds the text that matched the full pattern, and any further indexes hold the capture groups.
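
To make the return value concrete: preg_match returns 1 when the pattern matches, 0 when it does not, and false on error, so a plain 0 means "no match" rather than a failure. A minimal sketch, using a short made-up sample string rather than the value from the database:

```php
<?php
// Hypothetical sample input, not the string from the question.
$subject = '<!DOCTYPE html> <html>';

// A match: returns 1 and fills $matches[0] with the full matched text.
$found = preg_match('/^<!\w+\s\w+>/', $subject, $matches);
var_dump($found);       // int(1)
var_dump($matches[0]);  // string(15) "<!DOCTYPE html>"

// No match: returns 0 and the third argument is left as an empty array.
$notFound = preg_match('/^<body>/', $subject, $noMatches);
var_dump($notFound);    // int(0)
var_dump($noMatches);   // empty array
```

Comparing the result with === 1 (rather than a loose truthiness check) keeps the "no match" and "error" cases apart.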

I tested your code; here is what it does:

$texto = "<!DOCTYPE html> <html> <head> </head> <body> <div> </div> <div> </div> <div> <h3> </h3> <p> </p> <p> </p> <p> </p> <p> </p> <p> </p> <br /><br /><br /><br /><br /><br /> <h6> </h6> <br /><br /><br /><br /> <p>Portaria nº 69 de 18/01/2017 - Publicada no DOU de 19/01/2017</p> <br /> <h3>Certificamos que</h3> <h5>{NOME_ALUNO}</h5> <h3>concluiu em {DT_APR} o <br />{NOME_CURSO}<br />realizado pela ---- na qualidade de aluno(a), perfazendo um total de {CARGA_HOR} horas.</h3> <h3><em> </em></h3> <h4><em>Cidade </em>{DATE_EXT}<em> .</em></h4> </div> </body> </html>";

$matches = array();

$resultado = preg_match('/^<!\w+\s\w+>\s<\w+>\s<\w+>\s<\/\w+>/', $texto, $matches);

var_dump($resultado, $matches);

The var_dump output is:

int(1) array(1) { [0]=> string(37) "<!DOCTYPE html> <html> <head> </head>" }

Explaining what your regex does, token by token:

^ anchors the match at the start of the string

<! matches the literal characters <!

\w+ matches one or more word characters ([a-zA-Z0-9_]), here DOCTYPE; the + quantifier is greedy, taking as many characters as possible and giving some back if needed

\s matches a single whitespace character (a space or one of [\r\n\t\f\v])

\w+ matches one or more word characters, here html

> matches the literal character >

\s matches a single whitespace character

< matches the literal character <

\w+ matches one or more word characters, here html

> matches the literal character >

\s matches a single whitespace character

< matches the literal character <

\w+ matches one or more word characters, here head

> matches the literal character >

\s matches a single whitespace character

< matches the literal character <

\/ matches the literal character / (escaped because / is also the pattern delimiter)

\w+ matches one or more word characters, here head

> matches the literal character >

Putting it all together gives your regex /^<!\w+\s\w+>\s<\w+>\s<\w+>\s<\/\w+>/
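
Since the pattern does match the string as pasted in the question, one possible reason for a 0 on the real data is whitespace that differs from the pasted copy, for example a leading blank or a CRLF line break between tags, because each \s in the pattern matches exactly one character. This is only an assumption about the database value; the sketch below uses a hypothetical $fromDb string to show the effect, along with a more tolerant variant of the pattern built with \s* and \s+:

```php
<?php
// The pattern from the question: exactly one whitespace character between tags.
$strict = '/^<!\w+\s\w+>\s<\w+>\s<\w+>\s<\/\w+>/';

// Same pattern, but allowing leading whitespace and runs of whitespace between tags.
$tolerant = '/^\s*<!\w+\s+\w+>\s*<\w+>\s*<\w+>\s*<\/\w+>/';

// Hypothetical database value (an assumption): CRLF line breaks between the tags.
$fromDb = "<!DOCTYPE html>\r\n<html>\r\n<head>\r\n</head>";

var_dump(preg_match($strict, $fromDb));    // int(0): "\r\n" is two whitespace characters
var_dump(preg_match($tolerant, $fromDb));  // int(1)
```

Dumping the raw value with var_dump($texto) or bin2hex($texto) is a quick way to check whether the stored string really has the whitespace you expect.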

If you only want to remove the HTML tags from a string, use the strip_tags() function:

(PHP 4, PHP 5, PHP 7)

strip_tags - Removes HTML and PHP tags from a string

strip_tags(string $str [, string $allowable_tags])

  

Parameters

str The input string.

allowable_tags You can use the second parameter, which is optional, to indicate tags that should not be removed.

  

Note:

HTML comments and PHP tags are also removed. This cannot be changed with allowable_tags.

Example: strip_tags()

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

The above example will print:

Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>

Reference: PHP: strip_tags

    
27.09.2018 / 23:21

Pass a third parameter to the function to receive the captured match:

preg_match('/^<!\w+\s\w+>\s<\w+>\s<\w+>\s<\/\w+>/', $texto, $resultado);

For this string, the value of $resultado[0] will be <!DOCTYPE html> <html> <head> </head>.

To remove what the regex matched, use preg_replace and reassign the result to the $texto variable:

$texto = preg_replace('/^<!\w+\s\w+>\s<\w+>\s<\w+>\s<\/\w+>/', '', $texto);

Or, if you want to store the result in another variable without changing $texto:

$novotexto = preg_replace('/^<!\w+\s\w+>\s<\w+>\s<\w+>\s<\/\w+>/', '', $texto);

But regex is not the best solution for cases like this: the markup may vary, and then the pattern will no longer match. You can use substr() instead, taking everything from the position right after </head> to the end:

$novotexto = substr($texto, strpos($texto, "</head>")+7);
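
One caveat with the one-liner above: strpos returns false when </head> is not found, and false + 7 evaluates to 7, so the first seven characters would be dropped silently. A minimal sketch of a guarded version follows; the function name afterHead is made up for illustration and is not part of PHP:

```php
<?php
// Hypothetical helper: returns everything after the first "</head>",
// or the input unchanged when the tag is absent.
function afterHead(string $texto): string
{
    $marker = '</head>';
    $pos = strpos($texto, $marker);
    if ($pos === false) {
        return $texto;
    }
    return substr($texto, $pos + strlen($marker));
}

var_dump(afterHead('<head> </head> <body>'));  // string(7) " <body>"
var_dump(afterHead('<body> </body>'));         // string(14) "<body> </body>"
```

Using strlen($marker) instead of the literal 7 keeps the offset correct if the marker is ever changed.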
    
27.09.2018 / 23:06