Retrieve content between custom tags using RegEx


I need to capture the content between custom tags that share a default identifier, such as <:item>Conteúdo</item>, but I can't make the closing tag follow the opening tag's name as well. For now the only form I can capture is <:item>Conteúdo</end>, with the same default closing tag for every tag.

Current RegEx:

preg_match_all("~<:(.*?)>(.*?)</end>~si", $conteudo, $retorno);

What regular expression would find an opening tag and its matching closing tag, even when tags with the same name are nested in a parent-child hierarchy?

    
asked by anonymous 18.09.2014 / 20:34

2 answers


1) If you ONLY need the content of the tag, you can use the regex below, which removes everything between < and >:

$conteudo = '<:item>Conteúdo</item>';
print_r( preg_replace("/<.*?>/", "", $conteudo) );

Example available at ideone

2) If you need to match the tag itself together with its content, you can use the regex below:

$conteudo = '<:item>Conteúdo</item>';
preg_match_all( '~<.+?>(.+?)<\/.+?>~' , $conteudo , $retorno );
echo $retorno[1][0];

Example available at ideone
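Neither pattern above checks that the closing tag repeats the opening tag's name, which is what the question asks for. A minimal sketch of one way to enforce it, assuming tag names contain only letters, digits, underscores and hyphens, is a backreference: \1 must match exactly the text captured by group 1. The sample string is made up for illustration.

```php
<?php
$conteudo = '<:item>Conteúdo</item><:valor>Value</valor><:item>Unpaired</end>';

// Group 1 captures the tag name; \1 forces the closing tag to repeat that name.
// The 's' modifier lets the content span several lines.
preg_match_all('~<:([\w-]+)>(.*?)</\1>~s', $conteudo, $retorno);

print_r($retorno[1]); // tag names
print_r($retorno[2]); // contents
```

The unpaired <:item>Unpaired</end> is skipped because no </item> follows it. A nested tag with the same name is still cut at the first </item>, so real nesting needs a different approach.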

Update

Step 1) Replace each <...> tag with a | marker.
Result: |HEADER||MAIN|ITEM||

Step 2) Collapse the double markers || into a single |.
Result: |HEADER|MAIN|ITEM|

Step 3) Split the string on the markers and filter out the empty values.
Result: array( 1 => 'HEADER' , 2 => 'MAIN' , 3 => 'ITEM' )

$string = '<:header>HEADER</header><:main>MAIN<:item>ITEM</item></main>';

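// step 1: replace every <...> tag with a | marker
$string = preg_replace( '/<.*?>/' , '|' , $string );

// step 2: collapse runs of | into a single |
$string = preg_replace('/\|+/', '|', $string);

// step 3: split on | and drop the empty strings left at both ends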
$string = array_filter( explode( '|' , $string ) );
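The three steps can also be collapsed into a single call. This sketch swaps in preg_split() with the PREG_SPLIT_NO_EMPTY flag, which splits on every tag and discards the empty pieces; unlike array_filter(), the result is reindexed from 0.

```php
<?php
$string = '<:header>HEADER</header><:main>MAIN<:item>ITEM</item></main>';

// Split on every <...> tag; PREG_SPLIT_NO_EMPTY drops the empty strings
// produced between adjacent tags and at both ends of the input.
$parts = preg_split('/<[^>]*>/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts); // $parts is ['HEADER', 'MAIN', 'ITEM']
```

Like the original steps, this loses which tag each piece came from and flattens the MAIN/ITEM nesting.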

Note that this is NOT ideal; it only works around the problem. The way you are generating this string is inappropriate. See a demo on ideone.

    
18.09.2014 / 22:17

I'm not sure this is what you want, but it works:

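$html = "<:item>ConteúdoA</item><:valor>ConteúdoB</valor><:tag>ConteúdoC</tag><:teste>ConteúdoD</teste>";

// Collect the names of all opening tags (<:name>)
preg_match_all('/<:(.*?)>/', $html, $arrTag);

foreach (array_unique($arrTag[1]) as $tag)
{
    echo $tag, PHP_EOL;
    // Escape the name so any regex metacharacters in it are matched literally
    $name = preg_quote($tag, '/');
    // Capture the content between <:name> and its matching </name>
    preg_match_all('/<:' . $name . '>(.+?)<\/' . $name . '>/s', $html, $conteudo);
    print_r($conteudo);
}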
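Both answers still break when a tag contains another tag with the same name, as the question asks: a lazy regex stops at the first matching closing tag. One alternative, not taken from either answer, is to find all tags with preg_match_all() and PREG_OFFSET_CAPTURE and pair them with a stack, the way a simple parser would. This is a sketch under stated assumptions: extractCustomTags() is a hypothetical helper name, and tag names are assumed to match [\w-]+.

```php
<?php
// Hypothetical helper (not a PHP built-in): pairs each <:name> with its matching
// </name> using a stack, so nested tags with the same name are handled.
// Returns ['tag' => name, 'content' => inner text] entries, innermost first.
function extractCustomTags(string $input): array
{
    // Group 1 is ':' (opening tag) or '/' (closing tag); group 2 is the tag name.
    preg_match_all('~<([:/])([\w-]+)>~', $input, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $stack = [];   // opening tags still waiting for their closing tag
    $result = [];  // pairs, in the order their closing tags appear
    foreach ($matches as $m) {
        [$fullTag, $offset] = $m[0];
        $name = $m[2][0];
        if ($m[1][0] === ':') {
            // Remember the byte offset where this element's content starts.
            $stack[] = ['name' => $name, 'start' => $offset + strlen($fullTag)];
        } elseif ($stack !== [] && end($stack)['name'] === $name) {
            // The closing tag matches the innermost open tag: emit the pair.
            $open = array_pop($stack);
            $result[] = [
                'tag' => $name,
                'content' => substr($input, $open['start'], $offset - $open['start']),
            ];
        }
        // A closing tag that does not match the innermost open tag is ignored.
    }
    return $result;
}

print_r(extractCustomTags('<:item>A<:item>B</item>C</item><:main>D</main>'));
```

Offsets from PREG_OFFSET_CAPTURE are in bytes, which is what substr() expects, so multibyte content such as Conteúdo is cut correctly.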
    
18.09.2014 / 21:56