Capture and filter result

3

I have a string

<div></div>
[........]
<p>Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim</p>
<p></p>
[........]

How do I get the contents of the first <p>, i.e. Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim

then remove everything in parentheses from that content (including the parentheses themselves), and finally keep only the text up to the first period, so that the final result is:

Ola meu nome é pseudomatica, etc.

    
asked by anonymous 20.12.2014 / 22:23

2 answers

6

Get the value of the first <p>

One practical way is to load this HTML into a DOM using PHP's DOMDocument class:

$html = '<div></div>
<p>Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim</p>
<p></p>';

$dom = new DOMDocument;
$dom->loadHTML($html);

// of all the <p> elements, keep the text of the first one
$p = $dom->getElementsByTagName('p')->item(0)->nodeValue;

// split the text on '.' and keep the first part
$texto = explode(".", $p)[0];

Example on Ideone:

var_dump(explode(".", $p)[0]); // string(48) "Ola meu nome é pseudomatica (sou normal), etc"

Removing parentheses and their contents

Then you can use a regular expression to remove the text in parentheses including the parentheses:

$texto = explode(".", $p)[0];
$textoFinal = preg_replace("/\([^)]+\)/","", $texto);

Example on Ideone:

var_dump(preg_replace("/\([^)]+\)/","", $texto));  // Ola meu nome é pseudomatica , etc
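
For reference, the two steps above could be combined into one function. This is only a minimal sketch: the name firstSentenceWithoutParens is hypothetical, the input is assumed to be UTF-8 (hence the XML declaration prefix, a common workaround so that loadHTML does not misread accented characters), and the order of the steps is swapped compared to the code above so that the parentheses are removed before the text is cut at the first period.

```php
<?php
// Hypothetical helper (not part of the original answer): DOM lookup,
// parenthesis removal, then cut at the first period.
function firstSentenceWithoutParens(string $html): ?string
{
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // keep parser warnings quiet
    // Assumption: $html is UTF-8; the prefix hints the encoding to loadHTML.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $p = $dom->getElementsByTagName('p')->item(0);
    if ($p === null) {
        return null; // no <p> in the input
    }

    // Remove "(...)" together with any whitespace right before it.
    $text = preg_replace('/\s*\([^)]*\)/', '', $p->nodeValue);

    // Keep everything up to and including the first period, if there is one.
    $end = strpos($text, '.');
    return $end === false ? $text : substr($text, 0, $end + 1);
}

$html = '<div></div>
<p>Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim</p>
<p></p>';

echo firstSentenceWithoutParens($html), PHP_EOL;
```

Removing the parentheses first means a period inside a parenthetical does not end the sentence early, and using strpos/substr instead of explode keeps the trailing period that the question asks for.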
    
20.12.2014 / 23:08
2

I confess that without the example of how the string should look after being captured and cleaned, this would have been almost impossible to answer.

And these "placeholders" [...] made it even harder.

Well, first you have to find the text up to the first period:

preg_match( '/<p>(.*?\.).*?<\/p>/', $string, $text );

If the paragraph is found, the variable $text will have two indexes: the first holds everything that was matched, and the second only the captured part inside the <p>, up to the first period.

Once captured, you clean it:

preg_replace( '/\s\(.*?\)/', '', $text[ 1 ] );

The cleaning works by finding a whitespace character followed by an opening parenthesis, anything inside it, and a closing parenthesis.

Once this fragment is found, it is removed entirely, resulting in the string:

Ola meu nome é pseudomatica, etc.
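
The leading \s in the pattern is what keeps the comma attached to the preceding word. A small side-by-side sketch with the pattern from the other answer, which only matches the parentheses themselves (the variable names here are just for illustration):

```php
<?php
$sample = 'Ola meu nome é pseudomatica (sou normal), etc.';

// Pattern from the other answer: only the parentheses and their contents.
$keepsSpace = preg_replace('/\([^)]+\)/', '', $sample);

// Pattern from this answer: the whitespace before "(" is removed as well.
$dropsSpace = preg_replace('/\s\(.*?\)/', '', $sample);

echo $keepsSpace, PHP_EOL; // Ola meu nome é pseudomatica , etc.
echo $dropsSpace, PHP_EOL; // Ola meu nome é pseudomatica, etc.
```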

The complete code:

$string = '<div></div>
[........]
<p>Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim</p>
<p></p>
[........]';

if( preg_match( '/<p>(.*?\.).*?<\/p>/', $string, $text ) != 0 ) {

    echo preg_replace( '/\s\(.*?\)/', '', $text[ 1 ] );
}
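
If the real markup is less tidy than the sample, the same idea might be wrapped in a function along these lines. This is a sketch under assumptions, not a replacement for the code above: the name firstSentenceRegex is hypothetical, and the pattern is a variation that also accepts attributes on the <p> tag and text spread over several lines, but it assumes there are no nested tags before the first period.

```php
<?php
// Hypothetical helper; the pattern is a variation on the one in the answer.
function firstSentenceRegex(string $string): ?string
{
    // <p\b[^>]*> also accepts attributes such as class="...".
    // [^<]*? matches plain text (including line breaks) but never crosses a
    // tag, and the lazy quantifier stops at the first period.
    if (preg_match('/<p\b[^>]*>([^<]*?\.)/i', $string, $text) !== 1) {
        return null; // no paragraph with a period was found
    }

    // Remove "(...)" together with the whitespace right before it.
    return preg_replace('/\s*\(.*?\)/s', '', $text[1]);
}

$string = '<div></div>
[........]
<p>Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim</p>
<p></p>
[........]';

echo firstSentenceRegex($string), PHP_EOL;
```

Note that a paragraph without any period is skipped by this pattern, so the function returns the first sentence of the first paragraph that actually contains one.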
    
20.12.2014 / 22:38