PHP regex: capturing the HTML tags

1

Good afternoon

I would like to ask for help to make a regex that separates the values of this string:

<table>|<tr>[<td>#VALOR#</td>]</tr>|</table>

I need the regex to split the values as follows:

match 1: <table></table>
match 2: <tr></tr>
match 3: #VALOR#

I have tried repeatedly without success. I was using something like this:

(<\s*?table\b[^>]*>).*(<\/table\b[^>]*>)

Thank you in advance

    
asked by anonymous 19.02.2016 / 15:02

3 answers

1
  // Input string
  $string = '<table>|<tr>[<td>#VALOR#</td>]</tr>|</table>';

  // Regular expression; "#" is the delimiter, so "<", ">" and "/" need no escaping
  $regex  = '#<table>\|<tr>\[<td>(.*)</td>\]</tr>\|</table>#';

  // Extract the content
  preg_match_all($regex, $string, $retorno, PREG_PATTERN_ORDER);

  // Captured value: #VALOR#
  $valor = $retorno[1][0];

  // Display the value
  echo $valor;
    
19.02.2016 / 15:18
1

I won't repeat all the explanations of why you should not use regex to parse HTML.

I think what you want is this:

~<table>.*?(<tr>.*?(<td>(.*?)</td>).*?</tr>).*?</table>~

Explanation

  • <table> - literal; the match must start with this text.
  • .*? - any characters, as few as possible, up to the next token.
  • <tr> - literal; this text must be present.
  • .*? - any characters, as few as possible, up to the next token.
  • <td> - literal; this text must be present.
  • .*? - any characters, as few as possible, up to the next token (your value ends up here).

This gives you 4 groups:

  • 0 - The entire matched string.
  • 1 - From <tr> to </tr>.
  • 2 - From <td> to </td>.
  • 3 - The value inside <td>.
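
As a rough illustration of the groups listed above, the pattern can be run with preg_match against the string from the question. The variable names $subject, $pattern and $groups are placeholders chosen for this sketch.

```php
<?php
// Input string from the question.
$subject = '<table>|<tr>[<td>#VALOR#</td>]</tr>|</table>';

// Pattern from this answer; "~" is the delimiter, so "/" needs no escaping.
$pattern = '~<table>.*?(<tr>.*?(<td>(.*?)</td>).*?</tr>).*?</table>~';

if (preg_match($pattern, $subject, $groups)) {
    echo $groups[0], "\n"; // entire match
    echo $groups[1], "\n"; // from <tr> to </tr>
    echo $groups[2], "\n"; // from <td> to </td>
    echo $groups[3], "\n"; // value inside <td>
}
```

Note that . does not match a newline by default, so markup spread over several lines would need the s modifier after the closing delimiter.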

Addendum

Example

REGEX101

    
01.11.2016 / 17:10
0

You cannot get exactly those three matches directly. A single capturing group cannot capture WW from the string WAW, because a group only captures a contiguous piece of the input, even if you use non-capturing groups.

What you can do, however, is the following:

$string = "<table>|<tr>[<td>#VALOR#</td>]</tr>|</table>";

$regex = "#(<table>)\|(<tr>)\[<td>([^<]*)<\/td>\](<\/tr>)\|(<\/table>)#";

preg_match($regex, $string, $retorno);

$match1 = $retorno[1] . $retorno[5];
$match2 = $retorno[2] . $retorno[4];
$match3 = $retorno[3];

echo $match1 . "\n";
echo $match2 . "\n";
echo $match3 . "\n";

At the end, the variables $match1, $match2 and $match3 will hold the values <table></table>, <tr></tr> and #VALOR#, respectively, which is what you want.

You can see the regex working on regex101.

Considerations:

  • The regex assumes that the only variable part of your string is "#VALOR#", which can be any string that does not contain the character <.

  • The regex does not handle whitespace. If the string started with < table>, all the captures would fail.
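
If whitespace inside the tags has to be tolerated, one possible adjustment is to allow \s* after < and before > in each tag. This is only a sketch and an assumption about what the input might look like, not part of the regex above; the sample input with the extra spaces is made up for illustration.

```php
<?php
// Hypothetical input: same string, but with blanks inside the first tag.
$string = '< table >|<tr>[<td>#VALOR#</td>]</tr>|</table>';

// \s* accepts optional whitespace right after "<" and right before ">".
$regex = '#(<\s*table\s*>)\|(<\s*tr\s*>)\[<\s*td\s*>([^<]*)<\s*/td\s*>\]'
       . '(<\s*/tr\s*>)\|(<\s*/table\s*>)#';

if (preg_match($regex, $string, $retorno)) {
    $match1 = $retorno[1] . $retorno[5]; // opening + closing table tag
    $match2 = $retorno[2] . $retorno[4]; // opening + closing tr tag
    $match3 = $retorno[3];               // value inside <td>

    echo $match1, "\n";
    echo $match2, "\n";
    echo $match3, "\n";
}
```

The captured tags keep whatever whitespace they had in the input, so $match1 here contains the blanks from the opening tag.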

26.02.2016 / 04:29