How to minify HTML with PHP without affecting pre and textarea tags?

When I minify HTML with the function below, it breaks tags such as pre and textarea, whose content must keep its whitespace exactly as written in the source.

The function is as follows:

function compressHtml( $buffer ) {
    if( $this->_app->getEnvironment() == 'development' ) {
        return $buffer;
    }
    $pattern = array(
        '/\>[^\S ]+/s',         // strip whitespace (except spaces) after tags
        '/[^\S ]+\</s',         // strip whitespace (except spaces) before tags
        '/(\s)+/s',             // collapse each run of whitespace into a single character
        '/<!--(.|\s)*?-->/',    // remove HTML comments
        '#(?://)?<!\[CDATA\[(.*?)(?://)?\]\]>#s' // leave CDATA content alone, rewrapping only its markers
    );
    $replacement = array(
        '>',
        '<',
        '\1',
        '',
        "//<![CDATA[\n".'\1'."\n//]]>" // \1 puts the captured CDATA content back
    );
    $html = preg_replace( $pattern, $replacement, $buffer );
    return trim( $html );
}
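
The problem can be reproduced in isolation. A minimal sketch, assuming a hypothetical two-line pre block as input and applying only the third pattern from the function above:

```php
<?php
// Hypothetical input: a <pre> block whose line break and indentation matter.
$html = "<pre>line 1\n    line 2</pre>";

// '/(\s)+/s' replaces every run of whitespace with the last character of
// that run, so the newline plus four spaces becomes a single space.
echo preg_replace('/(\s)+/s', '\1', $html);
// Output: <pre>line 1 line 2</pre>
```

The line break inside the pre block is lost, which changes how the browser renders it.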

I already apply GZIP compression after this step, and I know there are libraries that do this job, but I want a hand-rolled solution so that I do not have to manage extra library dependencies.

    
asked by anonymous 28.10.2015 / 19:36

1 answer

You can replace your function with the one below, but be careful with inline CSS and JavaScript: it protects script blocks, but not style blocks or inline event-handler attributes.

function compressHtml($buffer)
{
    // Initialize the variables that will hold the matched tags
    $foundTxt = null;
    $foundPre = null;
    $foundCode = null;
    $foundScript = null;

    // Find the textarea, pre, code and script tags
    preg_match_all('#\<textarea.*\>.*\<\/textarea\>#Uis', $buffer, $foundTxt);
    preg_match_all('#\<pre.*\>.*\<\/pre\>#Uis', $buffer, $foundPre);
    preg_match_all('#\<code.*\>.*\<\/code\>#Uis', $buffer, $foundCode);
    preg_match_all('#\<script.*\>.*\<\/script\>#Uis', $buffer, $foundScript);

    // Replace the content of each tag with a placeholder (its index), so
    // the original content can be put back intact afterwards.
    // Example: <textarea>$index</textarea> / <pre>$index</pre>
    $buffer = str_replace($foundTxt[0], array_map(function($el){ return '<textarea>'.$el.'</textarea>'; }, array_keys($foundTxt[0])), $buffer);
    $buffer = str_replace($foundPre[0], array_map(function($el){ return '<pre>'.$el.'</pre>'; }, array_keys($foundPre[0])), $buffer);
    $buffer = str_replace($foundCode[0], array_map(function($el){ return '<code>'.$el.'</code>'; }, array_keys($foundCode[0])), $buffer);
    $buffer = str_replace($foundScript[0], array_map(function($el){ return '<script>'.$el.'</script>'; }, array_keys($foundScript[0])), $buffer);

    // Minify the HTML
    $search = array(
        '/\>[^\S ]+/s',         // strip whitespace (except spaces) after tags
        '/[^\S ]+\</s',         // strip whitespace (except spaces) before tags
        '/(\s)+/s',             // collapse each run of whitespace into a single character
        '/<!--(.|\s)*?-->/',    // remove HTML comments
        '#(?://)?<!\[CDATA\[(.*?)(?://)?\]\]>#s' // put CDATA markers on separate lines
    );

    $replace = array(
        '>',
        '<',
        '\1',
        '',
        "//<![CDATA[\n".'\1'."\n//]]>" // \1 puts the captured CDATA content back
    );

    $buffer = preg_replace($search, $replace, $buffer);

    // Swap the placeholders back for the original content
    $buffer = str_replace(array_map(function($el){ return '<textarea>'.$el.'</textarea>'; }, array_keys($foundTxt[0])), $foundTxt[0], $buffer);
    $buffer = str_replace(array_map(function($el){ return '<pre>'.$el.'</pre>'; }, array_keys($foundPre[0])), $foundPre[0], $buffer);
    $buffer = str_replace(array_map(function($el){ return '<code>'.$el.'</code>'; }, array_keys($foundCode[0])), $foundCode[0], $buffer);
    $buffer = str_replace(array_map(function($el){ return '<script>'.$el.'</script>'; }, array_keys($foundScript[0])), $foundScript[0], $buffer);

    return $buffer;
}

This is not the best solution, but it shows the basic process.
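
One possible refinement, as a sketch rather than a definitive implementation: numeric placeholders such as <pre>0</pre> can collide with identical markup that is already on the page, so the variant below avoids placeholders altogether. It locates the protected elements with preg_match_all and PREG_OFFSET_CAPTURE, minifies only the text between them, and copies the protected elements through verbatim. The names minifyChunk and minifyHtmlPreserving are hypothetical, PHP 7.1 or later is assumed, and style is added to the protected list as an assumption. Unlike the function above, it collapses whitespace runs to a plain space and does not touch CDATA markers outside the protected elements.

```php
<?php
// Minify one chunk of ordinary markup (no protected elements inside it).
function minifyChunk(string $chunk): string
{
    $result = preg_replace(
        array(
            '/<!--.*?-->/s', // remove HTML comments
            '/>[^\S ]+/',    // strip whitespace (except spaces) after tags
            '/[^\S ]+</',    // strip whitespace (except spaces) before tags
            '/\s+/',         // collapse each whitespace run to one space
        ),
        array('', '>', '<', ' '),
        $chunk
    );

    // preg_replace returns null on a PCRE error; fall back to the input.
    return $result ?? $chunk;
}

// Minify HTML, leaving pre, textarea, script and style elements untouched.
function minifyHtmlPreserving(string $html): string
{
    $protected = '#<(pre|textarea|script|style)\b[^>]*>.*?</\1\s*>#is';

    // No protected element (or a PCRE error): minify everything at once.
    if (!preg_match_all($protected, $html, $matches, PREG_OFFSET_CAPTURE)) {
        return minifyChunk($html);
    }

    $out = '';
    $offset = 0;
    foreach ($matches[0] as [$block, $pos]) {
        // Minify the markup before this element, then copy the element as is.
        $out .= minifyChunk(substr($html, $offset, $pos - $offset)) . $block;
        $offset = $pos + strlen($block);
    }

    // Minify whatever follows the last protected element.
    return $out . minifyChunk(substr($html, $offset));
}

echo minifyHtmlPreserving("<div>\n    <p>Hello   world</p>\n    <pre>a\n  b</pre>\n</div>");
```

Because the function takes and returns a string, it can be passed to ob_start() as the output callback in the same way as compressHtml. Nested or unclosed protected tags are not handled by this sketch.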

    
28.10.2015 / 19:36