PHP syntax highlighter - DOM object does not update HTML


I've been developing code to do syntax highlighting, but it is failing to update the HTML.

The code is extensive, and unfortunately all of it is needed.

I have a main file named highlight.php:

<?php

final class highlight {

    private static $langs = array();
    private static $exts = array();
    private static $default_replace = array(
            'tag'=>'span',
            'text'=>'$1'
        );

    static function highlight_string($lang, $code) {
        if(isset(self::$langs[$lang]))
        {
            $lang_defs = &self::$langs[$lang];

            $dom = new DOMDocument('1.0', 'utf-8');

            $element = $dom->createElement('code', $code);

            $element->setAttribute('class','highlight '.$lang);

            $dom->appendChild($element);

            foreach($lang_defs as $k=>&$lang_def)
            {
                $html = '';

                while($child = &$element->firstChild)
                {

                    if($child->nodeType === 3)
                    {

                        if(!isset($lang_def['replace']))
                        {
                            $lang_def['replace'] = self::$default_replace;
                        }

                        switch(gettype($lang_def['replace']))
                        {
                            case 'string':
                                $html .= preg_replace(
                                        $lang_def['match'],
                                        $lang_def['replace'],
                                        $child->nodeValue
                                    );
                                break;
                            case 'array':
                                $html .= preg_replace(
                                        $lang_def['match'],
                                        '<' . $lang_def['replace']['tag'] .
                                        ' class="' . $lang_def['class'] . '">' .
                                            $lang_def['replace']['text'] .
                                        '</' . $lang_def['replace']['tag'] . '>',
                                        $child->nodeValue
                                    );
                                break;
                            case 'object':
                                $html .= preg_replace_callback(
                                        $lang_def['match'],
                                        $lang_def['replace'],
                                        $child->nodeValue
                                    );
                                break;

                        }
                    }
                    else
                    {
                        $html .= $child->nodeValue;
                    }

                    $element->removeChild($child);

                }

                if($html)
                {
                    $fragment = $dom->createDocumentFragment();

                    $fragment->appendXML(isset($lang_def['patch'])?$lang_def['patch']($html):$html);

                    $element->appendChild($fragment);

                    if($element->nodeValue == $code)
                    {
                        trigger_error('Syntax highlight failed on the rule no. '.$k.', for the language '.$lang,E_USER_WARNING);

                        return false;
                    }

                    echo '<br>',htmlentities($dom->saveXML()),'<br>',var_dump($element);
                }
            }

            //removes the xml declaration
            return trim(
                    str_replace(
                        '<?xml version="1.0" encoding="utf-8"?>',
                        '',
                        $dom->saveXML()
                    ),
                    "\r\n"
                );

        }
        else
        {
            return false;
        }
    }

    static function highlight_file($file) {

        if(@is_file($file))
        {
            if(preg_match('@(?P<file>.*)\.(?P<ext>[^\.]*)$@', $file, $name) && isset(self::$langs[$name['ext']]))
            {
                return self::highlight_string(self::$langs[$name['ext']], file_get_contents($file));
            }
        }
        else
        {
            return false;
        }
    }

    static function add_lang($lang, $defs){
        switch(gettype($defs)){
            case 'string':
                $defs = (array)include($defs);
                if( $defs === array() )
                {
                    return false;
                }
            case 'array':

                self::$langs[$lang] = $defs['lang'];

                foreach( $defs['exts'] as $ext)
                {
                    self::$exts[$ext] = $lang; 
                }

                break;
            default:
                return false;
        }
        return true;
    }

    static function lang_loaded($lang) {
        return isset(self::$langs[$lang]);
    }

};

This is the file where the failure occurs: the trigger_error() call fires on every run. That call runs when the resulting HTML is the same as the original code, which is meant to indicate that the highlighting failed.
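To narrow this down, here is a minimal stand-alone sketch of what the `$element->nodeValue == $code` check compares. It is not part of highlight.php, and the hard-coded span only simulates a rule that succeeded. In PHP's DOM extension, the `nodeValue` of an element is its text content, so markup inside the element is not part of the comparison.

```php
<?php
// Stand-alone check (hypothetical, not taken from highlight.php).
$code = 'select 1,"2";';

$dom = new DOMDocument('1.0', 'utf-8');
$element = $dom->createElement('code');
$dom->appendChild($element);

// Simulate a rule that succeeded: the keyword is wrapped in a span.
$fragment = $dom->createDocumentFragment();
$fragment->appendXML('<span class="keyword">select</span> 1,"2";');
$element->appendChild($fragment);

// nodeValue of an element is its text content only, without any tags,
// so it still equals the original code after a successful highlight.
$same_text = ($element->nodeValue === $code);

// The serialized markup, by contrast, does contain the span.
$markup = $dom->saveXML($element);
$has_span = (strpos($markup, '<span class="keyword">select</span>') !== false);

var_dump($same_text, $has_span);
```

If both values come out true, the comparison cannot tell a highlighted element from an untouched one; comparing against the serialized markup (or against `$html`) may be a more reliable failure test.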

The language definitions are loaded from separate files. For example, this is the file sql.php:

<?php
    return array(
        'exts'=>array('sql'),
        'lang'=>array(

            array(
                'class'=>'string',
                'match'=>'/([bn]?"(?:[^"]|[\"]")*"|[bn]?\'(?:[^\']|[\\']\')*\')(?=[\b\s\(\),;\$#\+\-\*\/]|$)/'
            ),
            array(
                'class'=>'comment',
                'match'=>'/((?:\/\/|\-\-\s|#)[^\r\n]*|\/\*(?:[^*]|\*[^\/])*(?:\*\/|$))/',
                'patch'=>function($html){
                    //step one: try to fix the spans
                    $html = preg_replace(
                            '/((?:\/\/|\-\-\s|#)[^\r\n]*|\/\*(?:[^*]|\*[^\/])*(?:\*\/|$))/',
                            '$1</span>',
                            $html
                        );

                    //step 2: fix single-line comments (-- and #)
                    $html = preg_replace_callback(
                            '/<span class="comment">((?:#|-- |\/\/)(?:.|<\/span><span class="[^"]+">([^<])<\/span>)*)([\r\n]|$)/',
                            function($matches){
                                return '<span class="comment">'.
                                    //cleans up all spans
                                    preg_replace(
                                            '/<\/?span(?: class="[^"]+")?>/',
                                            $matches[1].$matches[2],
                                            ''
                                        ).'</span>'.$matches[3];
                            },
                            $html
                        );

                    //step 3: fix multi-line comments
                    return preg_replace_callback(
                            '/<span class="comment">(\/\*(?:[^*]|\*[^\/])+(?:\*\/(?:<\/span>)?|$))/',
                            function($matches){
                                return '<span class="comment">'.
                                    //cleans up all spans
                                    preg_replace(
                                        '/<\/?span(?: class="[^"]+")?>/',
                                        $matches[1],
                                        ''
                                    ).'</span>';
                            },
                            $html
                        );
                }
            ),
            array(
                /*
                 * numbers aren't that 'regular' and many edge-cases were left behind    
                 * with the help of @MLM (http://stackoverflow.com/users/796832/mlm),    
                 * we were able to make this work.    
                 * he took over the regex and patched it all up, I did the replace string    
                 */
                'match'=>'/((?:^|\b|\(|\s|,))(?![a-z_]+)([+\-]?\d+(?:\.\d+)?(?:[eE]-?\d+)?)((?=$|\b|\s|\(|\)|,|;))/',
                'replace'=>'$1<span class="number">$2</span>$3'
            ),
            array(
                'class'=>'name',
                'match'=>'/(`[^`]+`)/'
            ),
            array(
                'class'=>'var',
                'match'=>'/(@@?[a-z_][a-z_\d]*)/'
            ),
            array(
                'class'=>'keyword',
                //the keyword rule must have an additional check ('(?!\()' after the name), due to the function replace()
                'match'=>'/\b(accessible|add|all|alter|analyze|and|as|asc|asensitive|before|between|bigint|binary|blob|both|by|call|cascade|case|change|char|character|check|collate|column|condition|constraint|continue|convert|create|cross|current_date|current_time|current_timestamp|current_user|cursor|database|databases|day_hour|day_microsecond|day_minute|day_second|dec|decimal|declare|default|delayed|delete|desc|describe|deterministic|distinct|distinctrow|div|double|drop|dual|each|else|elseif|enclosed|escaped|exists|exit|explain|false|fetch|float|float4|float8|for|force|foreign|from|fulltext|generated|get|grant|group|having|high_priority|hour_microsecond|hour_minute|hour_second|if|ignore|in|index|infile|inner|inout|insensitive|insert|int|int1|int2|int3|int4|int8|integer|interval|into|io_after_gtids|io_before_gtids|is|iterate|join|key|keys|kill|leading|leave|left|like|limit|linear|lines|load|localtime|localtimestamp|lock|long|longblob|longtext|loop|low_priority|master_bind|master_ssl_verify_server_cert|match|maxvalue|mediumblob|mediumint|mediumtext|middleint|minute_microsecond|minute_second|mod|modifies|natural|nonblocking|not|no_write_to_binlog|null|numeric|on|optimize|optimizer_costs|option|optionally|or|order|out|outer|outfile|parse_gcol_expr|partition|precision|primary|procedure|purge|range|read|reads|read_write|real|references|regexp|release|rename|repeat|replace(?!\()|require|resignal|restrict|return|revoke|right|rlike|schema|schemas|second_microsecond|select|sensitive|separator|set|show|signal|smallint|spatial|specific|sql|sqlexception|sqlstate|sqlwarning|sql_big_result|sql_calc_found_rows|sql_small_result|ssl|starting|stored|straight_join|table|terminated|then|tinyblob|tinyint|tinytext|to|trailing|trigger|true|undo|union|unique|unlock|unsigned|update|usage|use|using|utc_date|utc_time|utc_timestamp|values|varbinary|varchar|varcharacter|varying|virtual|when|where|while|with|write|xor|year_month|zerofill)\b/i'
            ),
            array(
                'class'=>'func',
                'match'=>'/\b([a-z_][a-z_\d]*)\b(?=\()/i'
            ),
            array(
                'class'=>'name',
                'match'=>'/\b([a-z\_][a-z_\d]*)\b/i'
            )
        )
    );
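As a quick illustration of the format, the sketch below applies a single rule to a plain string the same way the 'array' branch of highlight_string() does. The function name `apply_rule` is hypothetical and exists only for this example; the rule itself is the 'var' rule from sql.php.

```php
<?php
// Hypothetical helper mirroring the 'array' branch of highlight_string().
function apply_rule(array $rule, string $text): string {
    // Fall back to the same default as highlight::$default_replace.
    $replace = $rule['replace'] ?? array('tag' => 'span', 'text' => '$1');

    return preg_replace(
        $rule['match'],
        '<' . $replace['tag'] . ' class="' . $rule['class'] . '">' .
            $replace['text'] .
        '</' . $replace['tag'] . '>',
        $text
    );
}

// The 'var' rule, copied from sql.php.
$rule = array(
    'class' => 'var',
    'match' => '/(@@?[a-z_][a-z_\d]*)/'
);

echo apply_rule($rule, 'select @total;');
// prints: select <span class="var">@total</span>;
```

Running each rule this way on a plain string, outside the DOM code, makes it possible to check the regular expressions separately from the DOM handling.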

All of this is called from the file index.php:

<style>
.highlight, .highlight *{
    background:black;
    color:white;
    font-family:'Consolas',monospace;
    font-size:16px;
    word-wrap: break-word;
    /*forces whitespace to stay there*/
    white-space: pre;
}

.highlight.sql .keyword{color:teal;}
.highlight.sql .string{color:red;}
.highlight.sql .func{color:purple;}
.highlight.sql .number{color:#0F0;}
.highlight.sql .name{color:olive;}
.highlight.sql .var{color:green;}
.highlight.sql .comment{color:gray;}
</style>

Testing highlight of a string:

<?php

    include('highlight.php');

    highlight::add_lang('sql','lang/sql.php');

    echo highlight::highlight_string('sql','select 1,"2";');

?>

This should echo the following HTML:

<code class="highlight sql"><span class="keyword">select</span> <span class="number">1</span>,<span class="string">"2"</span>;</code>

Therefore, the final file structure is as follows:

/
|--- index.php
|--- highlight.php
|--- /lang/
     |--- sql.php
     |--- ...

I've tried lots of experiments and they all failed.

One of them was the following code:

<?

$dom = new DOMDocument('1.0', 'utf-8');

$element = $dom->createElement('code', 'select 1,"2","test"; #test');

$element->setAttribute('class','highlight');

$dom->appendChild($element);

while($element->childNodes->length){
    $element->removeChild($element->firstChild);
}

$fragment = $dom->createDocumentFragment();

$fragment->appendXML('<span class="dest">select</span> 1,"2","test"; ');

$element->appendChild($fragment);

echo '<br>',htmlentities($dom->saveXML());

This works, and it's quite similar! In the class, though, the code is generated correctly but the HTML of the code tag is never updated.
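One difference from the class that may be worth checking: the experiment replaces the children once, while highlight_string() rebuilds `$html` for every rule and reads non-text children through `nodeValue`. The sketch below is stand-alone, with variable names `$flat` and `$kept` chosen only for this example. It compares reading a child through `nodeValue` with serializing it through `saveXML($child)`.

```php
<?php
$dom = new DOMDocument('1.0', 'utf-8');
$element = $dom->createElement('code');
$dom->appendChild($element);

// State after a first rule has run: one span plus a trailing text node.
$fragment = $dom->createDocumentFragment();
$fragment->appendXML('<span class="keyword">select</span> 1');
$element->appendChild($fragment);

$flat = ''; // built the way the else-branch in highlight_string() builds $html
$kept = ''; // built by serializing each child instead

foreach ($element->childNodes as $child) {
    $flat .= $child->nodeValue;     // text only, the span tags are dropped
    $kept .= $dom->saveXML($child); // markup of the child is preserved
}

echo $flat, "\n"; // select 1
echo $kept, "\n"; // <span class="keyword">select</span> 1
```

If that is what happens in the class, spans added by one rule would be flattened back to plain text when the next rule runs, which could explain why only the experiment keeps its markup.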

What am I doing wrong in the code?

    
asked by anonymous 18.05.2015 / 22:28
