Traversing HTML content and removing parts of it using PHP

1

For example, I have this code, which removes every DIV that has the contextual class from the HTML passed in the string $sHTML:

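$nPosIni = strpos($sHTML, '<div class="contextual">');
// Remove every div with the contextual class.
// strpos() returns false when nothing is found, and 0 is a valid position.
while ($nPosIni !== false) {
    $nPosFim = strpos($sHTML, '</div>', $nPosIni);
    if ($nPosFim === false) {
        break; // unclosed div: stop instead of cutting at the wrong place
    }
    $sHTML = substr($sHTML, 0, $nPosIni) .
             substr($sHTML, ($nPosFim + strlen("</div>")));
    $nPosIni = strpos($sHTML, '<div class="contextual">');
}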

What I need now is to remove another <div>, with a different class, from the HTML, while keeping only the <h3> CONTEÚDO </h3> that is inside that <div>.

I have tried several approaches but could not find an efficient one. Does anyone know of a good practice for this?

NOTE: The code I am using does not accept scripts or functions, only PHP, HTML and CSS...

EXAMPLE HTML:

<html>
    <head></head>
    <body>
        <div class="xy">
            <h3> conteúdo </h3>
        </div>
    </body>
</html>

EXPECTED HTML:

<html>
    <head></head>
    <body>
        <h3> conteúdo </h3>
    </body>
</html>
    
asked by anonymous 05.10.2017 / 22:47

2 answers

1

As I commented, the best way to manipulate HTML with PHP is to use the native DOM classes. In this case, I used the DOMDocument and DOMXPath classes directly. The code is commented step by step, so I think it will be easy to follow:

<?php

$html = <<<HTML
<html>
    <head></head>
    <body>
        <div class="xy">
            <h3> conteúdo </h3>
        </div>
    </body>
</html>
HTML;

// 1. Create a DOMDocument instance:
$dom = new DOMDocument();

// 2. Load the HTML from a string:
$dom->loadHTML($html);

// 3. Create a DOMXPath instance:
$xpath = new DOMXPath($dom);

// 4. Find every 'div' element whose class attribute is exactly 'xy':
$nodes = $xpath->query("//div[@class='xy']");

// 5. Loop over the elements found:
foreach ($nodes as $node) {

    // 6. Get the first 'h3' element inside the 'div':
    $h3 = $node->getElementsByTagName("h3")->item(0);

    // 7. Replace the 'div' with its 'h3' (skipped when the 'div' has no 'h3'):
    if ($h3 !== null) {
        $node->parentNode->replaceChild($h3, $node);
    }
}

// 8. Output the final HTML:
echo $dom->saveHTML(), PHP_EOL;

See it working at Repl.it
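
One thing to watch: the query //div[@class='xy'] only matches when the class attribute is exactly "xy", so a div with class="xy box" would be skipped. Below is a minimal sketch of a variation that matches the class as one token among several. The function name replaceDivsWithH3 and the sample markup are hypothetical, and the sketch assumes the class name contains no quotes or spaces.

```php
<?php

// Hypothetical helper (not part of the answer above): replaces every <div>
// that carries the given class with its first <h3> descendant.
// Assumption: $class contains no quotes or whitespace.
function replaceDivsWithH3(string $html, string $class): string
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // Pad the class attribute with spaces so the class is matched as a whole
    // token: class="xy box" matches 'xy', class="xyz" does not.
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    foreach ($xpath->query($query) as $div) {
        $h3 = $div->getElementsByTagName('h3')->item(0);

        // A div without an <h3> is left untouched.
        if ($h3 !== null) {
            $div->parentNode->replaceChild($h3, $div);
        }
    }

    return $dom->saveHTML();
}

// Usage with hypothetical markup:
$result = replaceDivsWithH3(
    '<html><body>'
    . '<div class="xy box"><p>drop</p><h3>keep</h3></div>'
    . '<div class="xyz"><h3>other</h3></div>'
    . '</body></html>',
    'xy'
);

echo $result, PHP_EOL;
```

The first div is replaced by its h3, while the div with class="xyz" stays as it was.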

    
06.10.2017 / 15:07
0

You can replace the div using the preg_replace_callback() function, leaving only the h3 tag.

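// Match each <div class="xy">...</div> (s: dot matches newlines, i: case-insensitive).
$new_sHTML = preg_replace_callback('/<div class="xy">.*?<\/div>/si',
  function ($match) {
    // Keep only the first <h3>...</h3> found inside the matched div.
    preg_match('/<h3.*?<\/h3>/si', $match[0], $h3);
    // If the div has no h3, it is removed entirely.
    return $h3[0] ?? '';
  }, $sHTML
);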

See the example working at link
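
A caveat worth checking before relying on the regular expression: the lazy .*? stops at the first </div> it finds, so a div.xy that contains another div is cut short. The sketch below applies the same pattern to a hypothetical nested input to show the effect.

```php
<?php

// Hypothetical input: the 'xy' div wraps an inner <div>.
$sHTML = '<div class="xy"><div><h3>title</h3></div></div>';

$new_sHTML = preg_replace_callback('/<div class="xy">.*?<\/div>/si',
  function ($match) {
    preg_match('/<h3.*?<\/h3>/si', $match[0], $h3);
    return $h3[0] ?? '';
  }, $sHTML
);

// The match ended at the inner </div>, so the outer closing tag is left behind.
echo $new_sHTML, PHP_EOL; // prints: <h3>title</h3></div>
```

For markup that may contain nested divs, a DOM parser is the safer choice.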

    
06.10.2017 / 02:42