Get content between tags [x] and [/x] with a regular expression

1

My question is as follows: I have the content below, which comes from a table in a database, and I would like to use a regular expression (or something better, if there is one) to extract just the content that sits between the bracketed tags.

  

[pt-br]

     

What is Lorem Ipsum?

     

Lorem Ipsum is simply dummy print text and composition industry. Lorem Ipsum has been the standard text of the industry since the 1500s, when an unknown printer took a type galley and scrambled it to make a specimen type book. He survived not only five centuries, but also the leap to electronic composition, remaining essentially unchanged. Was popularized in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with such as Aldus PageMaker, including versions of Lorem Ipsum.

     

[/pt-br]

  

[en-us]

     

What is Lorem Ipsum?

     

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularized in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

     

[/en-us]

    
asked by anonymous 02.02.2017 / 00:21

4 answers

2
<?php

$texto = "[pt-br]

Qual é Lorem Ipsum?

Lorem Ipsum é simplesmente texto manequim da impressão e composição 
indústria. Lorem Ipsum tem sido o texto padrão do manequim da indústria
desde os anos 1500, quando uma impressora desconhecida tomou uma galera 
de tipo e mexidos-lo para fazer um livro tipo espécime. Ele sobreviveu 
não apenas cinco séculos, mas também o salto para composição eletrônica,
permanecendo essencialmente inalterado. Foi popularizado na década de 1960
com o lançamento de folhas Letraset contendo Lorem Ipsum passagens, e mais 
recentemente com software de editoração como Aldus PageMaker, incluindo
versões de Lorem Ipsum.

[/pt-br]";

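$output = array();
// Find every bracketed tag ([pt-br], [/pt-br], ...); $output[0] holds the full matches.
preg_match_all("/\[(.*?)\]/", $texto, $output);
// Strip the tags, leaving only the text that was between them.
$texto = str_replace($output[0],'', $texto);
echo $texto;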

Examples:

Separating into an array:

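// $texto must be the original string here, before the tags were stripped.
$output = array();
// Collect every bracketed tag in order of appearance: [pt-br], [/pt-br], ...
preg_match_all("/\[(.*?)\]/", $texto, $output);
$result = array();
// Tags come in pairs: an opening tag followed by its closing tag.
for ($i = 0; $i + 1 < count($output[0]); $i += 2)
{
    $ini = strpos($texto, $output[0][$i]);
    $end = strpos($texto, $output[0][$i+1]);
    // Key: tag name without brackets. Value: the text between the pair.
    $result[str_replace(['[',']'],'',$output[0][$i])] =
        str_replace($output[0],'', substr($texto, $ini, $end-$ini));
}

var_dump($result);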


02.02.2017 / 01:41
0

You can use the following code, which works for [pt-br], [en-us], or any other tag written between square brackets.

$re = '/\[[^]]+\]([^[]+)\[\/[^]]+\]/is';
$str = '[pt-br]

Qual é Lorem Ipsum?

Lorem Ipsum é simplesmente texto manequim da impressão e composição indústria. Lorem Ipsum tem sido o texto padrão do manequim da indústria desde os anos 1500, quando uma impressora desconhecida tomou uma galera de tipo e mexidos-lo para fazer um livro tipo espécime. Ele sobreviveu não apenas cinco séculos, mas também o salto para composição eletrônica, permanecendo essencialmente inalterado. Foi popularizado na década de 1960 com o lançamento de folhas Letraset contendo Lorem Ipsum passagens, e mais recentemente com software de editoração como Aldus PageMaker, incluindo versões de Lorem Ipsum.

[/pt-br]';

preg_match($re, $str, $matches);

// Print the values that were found
print_r($matches);

In your case $matches will have two entries: index 0 holds the whole match, tags included, and index 1 holds just the text. That is, you should read the second position of the array, $matches[1].
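
Since preg_match stops at the first match, here is a minimal sketch of how all language blocks could be collected at once with preg_match_all. The sample string and the $blocks variable are only illustrative, and the pattern is a variation of the one above with an extra group for the tag name. It still assumes the text itself contains no [ character.

```php
<?php
// Hypothetical sample input with two language blocks.
$str = "[pt-br]\nTexto de exemplo.\n[/pt-br]\n[en-us]\nSample text.\n[/en-us]";

// Group 1: tag name (no ] or / allowed). Group 2: text up to the next [.
$re = '/\[([^]\/]+)\]([^[]+)\[\/[^]]+\]/s';

$blocks = [];
if (preg_match_all($re, $str, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // $m[1] is the tag name, $m[2] is the text between the tags.
        $blocks[$m[1]] = trim($m[2]);
    }
}

print_r($blocks);
```

With PREG_SET_ORDER each element of $matches is one complete match, which makes it easy to build an array keyed by language.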

    
02.02.2017 / 01:58
0

Try this regex, with the s modifier so that . also matches line breaks: (?<=\[\w{2}-\w{2}\])(.*?)(?=\[\/\w{2}-\w{2}\])
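
A sketch of how a lookaround pattern of this kind might be applied in PHP. It assumes the middle group is meant to match any text lazily (.*?) and that the s modifier is set so . also matches line breaks. The sample string and the $texts variable are hypothetical.

```php
<?php
// Hypothetical sample input with two language blocks.
$str = "[pt-br]\nTexto de exemplo.\n[/pt-br]\n[en-us]\nSample text.\n[/en-us]";

// Lookbehind: an opening tag shaped like [xx-yy] must come right before.
// Lookahead: a closing tag shaped like [/xx-yy] must come right after.
$re = '/(?<=\[\w{2}-\w{2}\])(.*?)(?=\[\/\w{2}-\w{2}\])/s';

preg_match_all($re, $str, $matches);

// $matches[1] holds the captured text of every block.
$texts = array_map('trim', $matches[1]);

print_r($texts);
```

Note that the lookarounds only check the shape of the tags. Nothing here ties the closing tag to the opening one, and the tag names themselves are not captured.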

    
02.02.2017 / 01:53
0

When you capture the content of a markup pair, the ideal is to put the tag name pattern in a capture group.

Example

  • Markup [pt-br][/pt-br], the pattern is pt-br.
  • Markup [en-us][/en-us], the pattern is en-us.

Why put it in a group? So that you can use the captured value itself to identify the closing tag.

Resolution

~\[([^]]+?)\](.*)\[/\1\]~s

Explanation

  • \[([^]]+?)\] - The opening tag, which must begin with [ and end with ]; the tag name between them goes into group 1, following the rule mentioned above.
  • (.*) - Matches anything; since we have the s modifier, that includes \n.
  • \[/\1\] - This is what guarantees the match stops at the corresponding closing tag, as it must match [/ + the tag name already captured in group 1 (\1) + ].
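
A minimal sketch of the pattern in use, assuming the closing part carries the backreference \1 to group 1 as the explanation describes. The sample string and the $result variable are illustrative only.

```php
<?php
// Hypothetical sample input with two language blocks.
$str = "[pt-br]\nTexto de exemplo.\n[/pt-br]\n[en-us]\nSample text.\n[/en-us]";

// \1 repeats whatever group 1 captured, so [pt-br] can only be
// closed by [/pt-br] and [en-us] only by [/en-us].
$re = '~\[([^]]+?)\](.*)\[/\1\]~s';

$result = [];
preg_match_all($re, $str, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    // $m[1] is the tag name, $m[2] is everything between the pair.
    $result[$m[1]] = trim($m[2]);
}

print_r($result);
```

Because (.*) is greedy, a tag that appears more than once in the same text would be matched from its first opening to its last closing. Switching to (.*?) is one way to stop at the nearest closing tag instead.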

Problems

  • This is the same idea as HTML, and the ideal tool for handling these cases is a parser, not a regex.
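
To illustrate the parser route, here is a rough sketch of a small scanner built only on string functions. parseBlocks is a hypothetical helper written for this example, not something from a library. It handles flat [tag]...[/tag] pairs and makes no attempt at nested or repeated tags.

```php
<?php
/**
 * Hypothetical helper: scans $text for [tag]...[/tag] pairs using
 * string functions only and returns an array of tag => content.
 */
function parseBlocks(string $text): array
{
    $blocks = [];
    $offset = 0;
    while (($open = strpos($text, '[', $offset)) !== false) {
        $close = strpos($text, ']', $open);
        if ($close === false) {
            break; // unterminated tag, stop scanning
        }
        $tag = substr($text, $open + 1, $close - $open - 1);
        $endTag = '[/' . $tag . ']';
        $end = strpos($text, $endTag, $close + 1);
        if ($tag === '' || $tag[0] === '/' || $end === false) {
            // Not an opening tag with a matching close: skip this bracket.
            $offset = $open + 1;
            continue;
        }
        // Keep the text between the opening tag and its closing tag.
        $blocks[$tag] = trim(substr($text, $close + 1, $end - $close - 1));
        $offset = $end + strlen($endTag);
    }
    return $blocks;
}

// Hypothetical sample input with two language blocks.
$str = "[pt-br]\nTexto de exemplo.\n[/pt-br]\n[en-us]\nSample text.\n[/en-us]";
$parsed = parseBlocks($str);
print_r($parsed);
```

Unlike the regex versions, stray brackets inside the text that have no matching closing tag are simply skipped here.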


    
02.02.2017 / 16:39