Explaining preg_match:
The preg_match() function accepts five parameters, the first two of which are required.
- The first parameter is the regular expression ($pattern).
- The second parameter is the string in which to search for the expression ($subject).
- The third parameter is an array that receives the matched text ($matches).
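As a minimal sketch of how these three parameters work together, the snippet below runs preg_match() on a sentence taken from the text in the question. The date pattern and the variable names $subject and $found are assumptions chosen for illustration only.

```php
<?php
// Sentence taken from the certificate text in the question.
$subject = 'Portaria nº 69 de 18/01/2017 - Publicada no DOU de 19/01/2017';

// Illustrative pattern with three capture groups: day, month, year.
$found = preg_match('/(\d{2})\/(\d{2})\/(\d{4})/', $subject, $matches);

// preg_match() returns 1 on a match, 0 on no match, false on error.
var_dump($found);

// $matches[0] holds the full match; $matches[1..3] hold the groups.
// Only the first date is reported, because preg_match() stops at the first match.
print_r($matches);
```

If every occurrence is needed rather than just the first, preg_match_all() is the function to reach for.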
I tested your code, and this is what it did:
$texto = "<!DOCTYPE html> <html> <head> </head> <body> <div> </div> <div> </div> <div> <h3> </h3> <p> </p> <p> </p> <p> </p> <p> </p> <p> </p> <br /><br /><br /><br /><br /><br /> <h6> </h6> <br /><br /><br /><br /> <p>Portaria nº 69 de 18/01/2017 - Publicada no DOU de 19/01/2017</p> <br /> <h3>Certificamos que</h3> <h5>{NOME_ALUNO}</h5> <h3>concluiu em {DT_APR} o <br />{NOME_CURSO}<br />realizado pela ---- na qualidade de aluno(a), perfazendo um total de {CARGA_HOR} horas.</h3> <h3><em> </em></h3> <h4><em>Cidade </em>{DATE_EXT}<em> .</em></h4> </div> </body> </html>";
$matches = array();
$resultado = preg_match('/^<!\w+\s\w+>\s<\w+>\s<\w+>\s<\/\w+>/', $texto, $matches);
var_dump($resultado, $matches);
The var_dump output is:
int(1) array(1) { [0]=> string(37) "<!DOCTYPE html> <html> <head> </head>" }
Here is what your regex does, token by token:
^
Anchors the match at the start of the string.
<!
Matches the literal characters <!.
\w+
Matches any word character (equivalent to [a-zA-Z0-9_]).
+
Quantifier: matches one or more times, as many times as possible, giving back characters if necessary (greedy).
\s
Matches a single whitespace character (equivalent to [\r\n\t\f\v ]).
\w+
Matches any word character (equivalent to [a-zA-Z0-9_]).
+
Quantifier: matches one or more times, as many times as possible, giving back characters if necessary (greedy).
>
Matches the literal character >.
\s
Matches a single whitespace character (equivalent to [\r\n\t\f\v ]).
<
Matches the literal character <.
\w+
Matches any word character (equivalent to [a-zA-Z0-9_]).
+
Quantifier: matches one or more times, as many times as possible, giving back characters if necessary (greedy).
>
Matches the literal character >.
\s
Matches a single whitespace character (equivalent to [\r\n\t\f\v ]).
<
Matches the literal character <.
\w+
Matches one or more word characters (equivalent to [a-zA-Z0-9_]), greedily.
>
Matches the literal character >.
\s
Matches a single whitespace character (equivalent to [\r\n\t\f\v ]).
<
Matches the literal character <.
\/
Matches the literal character /.
\w+
Matches one or more word characters (equivalent to [a-zA-Z0-9_]), greedily.
>
Matches the literal character >.
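Putting all of this together gives your regex, /^<!\w+\s\w+>\s<\w+>\s<\w+>\s<\/\w+>/, which matches only the first 37 characters of the string: <!DOCTYPE html> <html> <head> </head>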
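One consequence of this breakdown is that each \s matches exactly one whitespace character, so the pattern depends on the tags being separated by a single space. The sketch below illustrates that; the two sample strings and the names $oneSpace and $indented are assumptions made up for the demonstration.

```php
<?php
$pattern = '/^<!\w+\s\w+>\s<\w+>\s<\w+>\s<\/\w+>/';

// Hypothetical inputs: the same opening tags, spaced two different ways.
$oneSpace = '<!DOCTYPE html> <html> <head> </head> <body>';
$indented = "<!DOCTYPE html>\n<html>\n  <head>\n  </head>";

// One space between tags: the pattern matches up to </head>.
var_dump(preg_match($pattern, $oneSpace, $m));
var_dump($m[0]);

// Newline plus indentation before <head>: a single \s cannot
// consume both characters, so the match fails and 0 is returned.
var_dump(preg_match($pattern, $indented));
```

Replacing each \s with \s+ (or \s*) would be one way to tolerate variable spacing, if matching the markup with a regex is really what you need.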
If you just want to remove the HTML tags from a string, use the strip_tags() function:
(PHP 4, PHP 5, PHP 7, PHP 8)
strip_tags - Strips HTML and PHP tags from a string
strip_tags(string $str [, string $allowable_tags])
Parameters
str
The input string.
allowable_tags
You can use the second parameter, which is optional, to indicate tags that should not be removed.
Note:
HTML comments and PHP tags are also removed. This cannot be changed with allowable_tags.
Example: strip_tags()
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
The above example will print:
Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>
Reference: PHP: strip_tags
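Applied to markup like the certificate in the question, strip_tags() leaves behind the whitespace that sat between the tags, so it may be worth collapsing it afterwards. The following is only a sketch: $html is a shortened, hypothetical excerpt of that template, and the whitespace cleanup with preg_replace() and trim() is my own addition, not something strip_tags() does for you.

```php
<?php
// Shortened, hypothetical excerpt of the certificate template.
$html = '<div> <h3>Certificamos que</h3> <h5>{NOME_ALUNO}</h5> <br /> '
      . '<p>perfazendo um total de {CARGA_HOR} horas.</p> </div>';

// Remove the tags, collapse runs of whitespace into one space,
// then trim the leading and trailing space.
$plain = trim(preg_replace('/\s+/', ' ', strip_tags($html)));

echo $plain;
// Certificamos que {NOME_ALUNO} perfazendo um total de {CARGA_HOR} horas.
```

The {NOME_ALUNO} and {CARGA_HOR} placeholders survive untouched, since strip_tags() only looks at angle-bracket tags.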