BBCode parser: ignoring what's inside [code]

2

I wrote a BBCode parser, based on a few existing ones, to meet my needs, but I have some problems.

  • The content inside [code] should be ignored, but I do not know how to do this, since it contains the same tags that are parsed everywhere else.
  • I tried the following, but it does not fully work:

    $pos = strpos($text, '[code]');
    $code = "";

    if ($pos !== false) {

        $code = substr($text, $pos, strpos($text, '[/code]') - $pos);
        $text = str_replace($code . '[/code]', '', $text);
        $code = substr($code, 6);
    }
    
        
    asked by anonymous 11.08.2014 / 17:42

    6 answers

    4

    Maybe this answer is MUCH more than you need, but in my opinion a single regular expression, or a solution based on the positions of certain characters, is not enough (all the more so because the latter requires the input data to be perfectly normalized).

    So, I propose an object-oriented solution where a Parser applies as many replacement strategies as you have:

    First the structure of the files:

    |-\BBCode
    | |-\BBCode\Parser.php
    | \-\BBCode\Parsers
    |   |-\BBCode\Parsers\Code.php
    |   |-\BBCode\Parsers\Emphasis.php
    |   |-\BBCode\Parsers\Parser.php
    |   \-\BBCode\Parsers\Strong.php
    \-\index.php
    

    BBCode\Parser.php is the class that gives access to the different parsing and replacement strategies:

    <?php
    
    namespace BBCode;
    
    class Parser {
    
        /**
         * Available Parsers
         *
         * @var array parsers
         */
        private $parsers = array();
    
        /**
         * Input Text (with BBCodes)
         *
         * @var string $text;
         */
        protected $text;
    
        /**
         * Output Text (parsed)
         *
         * @var string $output;
         */
        protected $output;
    
        /**
         * Parser Constructor
         * Prepares the text to be parsed
         */
        public function __construct( $text ) {
    
            // Preparing text
    
            $text = $this -> prepare( $text );
    
            $this -> text = $this -> output = $text;
        }
    
        /**
         * Add new BBCode Parser to be used
         *
         * @param Parsers\Parser $parser
         *  BBCode Parser
         *
         * @return BBCode\Parser
         *  Parser Object (Fluent Interface)
         */
        public function addParser( Parsers\Parser $parser ) {
    
            $this -> parsers[] = $parser;
    
            return $this;
        }
    
        /**
         * Parses BBCodes
         *
         * @return BBCode\Parser
         *  Parser Object (Fluent Interface)
         */
        public function parse() {
    
            foreach( $this -> parsers as $parser ) {
    
                $this -> output = $parser -> parse( $this -> output );
            }
    
            return $this;
        }
    
        // Accessors
    
        /**
         * Get output (parsed) text
         *
         * @return string
         *  Parsed text
         */
        public function getText() {
            return $this -> output;
        }
    
        // Auxiliary Methods
    
        /**
         * Applies some routines over input text
         * allowing easier parsing
         *
         * @param string $text
         *  Text to cleanup
         *
         * @return string
         *  Cleaned text
         */
        private function prepare( $text ) {
    
            // Trimming leading and trailing whitespace
    
            $text = trim( $text );
    
            // Removing duplicated spaces
    
            $text = preg_replace( '/\s{2,}/', ' ', $text );
    
            return $text;
        }
    }
    

    It looks like a lot only because of the comments, but it is really very simple. Besides the properties, it has:

    • A constructor that receives the input text, which will then be handled by each individual Parser;
    • A Parser::addParser() method through which we can add new parsing strategies, all enforced by an interface and polymorphism through type hinting;
    • A Parser::parse() method that iterates over the collection of Parsers and applies each one to the text;
    • A getter that returns the text with the BBCodes replaced by the appropriate tags.

    We also have a private method that simplifies the regular expressions the parsing strategies need. I added just two routines: one that trims whitespace around the string and one that collapses repeated whitespace.

    With these two routines we do not need, for example, word boundaries (\b), anchors (^ and $) or the whitespace class (\s) in the patterns.
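
    To see what these two routines do to the input, here is a minimal standalone sketch of the same two calls. The function name prepareText is hypothetical; in the class above this logic lives in the private prepare() method. Note that, because \s also matches line breaks, runs of whitespace inside [code] blocks are collapsed as well.

```php
<?php

// Hypothetical standalone version of the prepare() routine shown above.
function prepareText(string $text): string
{
    // Strip whitespace from both ends of the string.
    $text = trim($text);

    // Collapse any run of two or more whitespace characters into one space.
    return preg_replace('/\s{2,}/', ' ', $text);
}

echo prepareText("  [b]bold[/b]     text\n\n[code]a    b[/code]  ");
// prints: [b]bold[/b] text [code]a b[/code]
```

    If the formatting of code blocks matters, this normalization would have to skip their contents.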

    Then we have the classes responsible for parsing strategies:

    Strong.php

    <?php
    
    namespace BBCode\Parsers;
    
    class Strong implements Parser {
    
        /**
         * Parses found BBCodes
         *
         * @param string $text
         *  Input text to parse
         */
        public function parse( $text ) {
    
            $text = $this -> applyParsingRestrictions( $text );
    
            return preg_replace_callback(
    
                '/\[b\](.*?)\[\/b\]/',
    
                function( $matches ) {
                    return sprintf( '<strong>%s</strong>', $matches[ 1 ] );
                },
    
                $text
            );
        }
    
        // Auxiliary methods
    
        /**
         * Apply parsing restrictions against nested BBCodes
         *
         * @param string $text
         *  Input Text to analyze
         *
         * @return string
         *  Input text with nested BBCodes stripped
         */
        private function applyParsingRestrictions( $text ) {
    
            if( preg_match( '/((?<=\[code\])\[b\])(.*)(\[\/b\](?=\[\/code\]))/', $text, $matches ) ) {
    
                $text = str_replace(
    
                    sprintf( '[b]%s[/b]', $matches[ 2 ] ), $matches[ 2 ], $text
                );
            }
    
            return $text;
        }
    }
    

    Emphasis.php

    <?php
    
    namespace BBCode\Parsers;
    
    class Emphasis implements Parser {
    
        /**
         * Parses found BBCodes
         *
         * @param string $text
         *  Input text to parse
         */
        public function parse( $text ) {
    
            return preg_replace_callback(
    
                '/\[i\](.*?)\[\/i\]/',
    
                function( $matches ) {
                    return sprintf( '<em>%s</em>', $matches[ 1 ] );
                },
    
                $text
            );
        }
    }
    

    Code.php

    <?php
    
    namespace BBCode\Parsers;
    
    class Code implements Parser {
    
        /**
         * Parses found BBCodes
         *
         * @param string $text
         *  Input text to parse
         */
        public function parse( $text ) {
    
            return preg_replace_callback(
    
                '/\[code\](.*?)\[\/code\]/',
    
                function( $matches ) {
                    return sprintf( '<code>%s</code>', $matches[ 1 ] );
                },
    
                $text
            );
        }
    }
    

    You can create as many strategies as you need, all of them implementing the method defined in the Parsers\Parser.php interface:

    <?php
    
    namespace BBCode\Parsers;
    
    interface Parser {
    
        /**
         * Parses found BBCodes
         *
         * @param string $text
         *  Input text to parse
         */
        public function parse( $text );
    }
    

    The replacement routines are almost self-explanatory: each one is a simple regular-expression replacement. I opted for preg_replace_callback() because it is more readable.

    The trick that (finally) ties this answer to the question is demonstrated only in the Strong.php class, in Strong::applyParsingRestrictions().

    Before the [b] and [/b] tags are replaced with their <strong> and </strong> counterparts, I search for the [code] BBCode. If a bold BBCode is found inside a code BBCode, instead of going ahead with the replacement by HTML tags, we remove the bold BBCode from the input text.

    The idea is basically the one posted by Guillermo Lautert, using lookbehinds and lookaheads. We look behind for the opening code BBCode and look ahead for the closing one; if both are found, we remove the bold BBCodes inside. Note that, as written, the pattern only matches when [b] comes immediately after [code] and [/b] immediately before [/code].

    Back in the Parsers\Parser::parse() interface method, if no other bold BBCode remains, the preg_replace_callback() callback is never invoked and the text passes unchanged to the next Parser in the collection.

    To use this all we have:

    <?php
    
    // Autoloading
    
    spl_autoload_register( function( $classname ) {
    
        $classname = stream_resolve_include_path(
    
            str_replace( '\\', DIRECTORY_SEPARATOR, $classname ) . '.php'
        );
    
        if( $classname !== FALSE ) {
    
            include $classname;
        }
    });
    
    $parser = new BBCode\Parser(
    
        '[code][b]This[/b][/code]       [code][i]is[/i][/code] my [b]text[/b]  !'
    );
    
    $parser -> addParser( new BBCode\Parsers\Strong )
            -> addParser( new BBCode\Parsers\Emphasis )
            -> addParser( new BBCode\Parsers\Code );
    
    echo $parser -> parse() -> getText();
    
    ?>
    

    And the output is:

    <code>This</code> <code><em>is</em></code> my <strong>text</strong> !
    

    Here you can see the restriction in action. Our input string has a bold BBCode inside a code BBCode. Because of the restriction, the bold one was removed, leaving only the code.

    This does not affect the bold BBCode further along, which works normally.

    But look at what happened to the italics (emphasis) BBCode. As no restriction rule was defined for it, the resulting string ended up with an <em> element inside the <code> element.
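
    A per-tag restriction has to be repeated in every parser. As an alternative sketch (not part of the classes above; the function name parseOutsideCode and the placeholder format are assumptions made for illustration), the [code] blocks can be swapped for placeholders before the other replacements run and restored afterwards, so nothing inside them is ever parsed. It assumes the input never contains NUL bytes.

```php
<?php

// Hypothetical helper: converts [b] and [i] everywhere except inside [code].
function parseOutsideCode(string $text): string
{
    $blocks = [];

    // 1. Replace each [code]...[/code] block with a placeholder and remember its content.
    $text = preg_replace_callback(
        '/\[code\](.*?)\[\/code\]/s',
        function (array $m) use (&$blocks) {
            $blocks[] = $m[1];
            return "\x00" . (count($blocks) - 1) . "\x00";
        },
        $text
    );

    // 2. Run the ordinary replacements; the placeholders contain no BBCode.
    $text = preg_replace('/\[b\](.*?)\[\/b\]/s', '<strong>$1</strong>', $text);
    $text = preg_replace('/\[i\](.*?)\[\/i\]/s', '<em>$1</em>', $text);

    // 3. Restore the code blocks, escaped so they display literally.
    return preg_replace_callback(
        '/\x00(\d+)\x00/',
        function (array $m) use ($blocks) {
            return '<code>' . htmlspecialchars($blocks[(int) $m[1]]) . '</code>';
        },
        $text
    );
}

echo parseOutsideCode('[code][b]This[/b][/code] [code][i]is[/i][/code] my [b]text[/b] !');
// prints: <code>[b]This[/b]</code> <code>[i]is[/i]</code> my <strong>text</strong> !
```

    Unlike the restriction in Strong.php, which strips the inner tags, this sketch keeps them as literal text inside the code element.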

    14.08.2014 / 20:15
    4

    Possible solution:

    <?php
    
       $code = '
       teste[code]123[/code]bla 
       teste[code]456[/code]bla 
       teste[code]789[/code]bla 
       teste[code]xyz[/code]bla
    ';
    
       While ( $pos = stripos( ' '.$code, '[code]') ) {
          $left = substr( $code, 0, $pos - 1 );
          $code = substr( $code, $pos + 5 );
          $right = substr( $code, stripos( $code, '[/code]' ) + 7 );
    
          // If you want to do something with the removed code, do it on this line:
          echo htmlentities( 'Removido: '.substr( $code, 0, stripos( $code, '[/code]' ) ) ).'<br>';
    
          $code = $left.$right;
       }
    
       echo 'Resultado: '.nl2br( htmlentities( $code ) );
    
    ?>
    

    This loop basically removes everything between [code] and [/code] from the original string, including the tags themselves. Some considerations:

    • If you want to match only lowercase [code], change stripos to strpos;
    • If you want to do something with the removed code, put your logic on the line right below the comment;
    • Depending on how you process the data, it may be better to simply skip these blocks during processing rather than actually removing them from the original string;
    • The code above does not account for unclosed tags; it is up to you to decide whether an unclosed tag should run to the end of the line or be left as is.
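
    The considerations above can be sketched as a function that collects the removed blocks instead of echoing them and stops at the first unclosed [code], leaving it as is. The function name stripCodeBlocks and its return shape are assumptions made for this illustration:

```php
<?php

// Hypothetical helper: removes [code]...[/code] blocks and returns
// the remaining text together with the removed contents.
function stripCodeBlocks(string $text): array
{
    $removed = [];
    $offset  = 0;

    while (($open = stripos($text, '[code]', $offset)) !== false) {
        $close = stripos($text, '[/code]', $open + 6);

        if ($close === false) {
            // Unclosed tag: leave the rest of the text untouched.
            break;
        }

        // Content between the tags (6 is the length of "[code]").
        $removed[] = substr($text, $open + 6, $close - $open - 6);

        // Cut out the whole block (7 is the length of "[/code]").
        $text = substr($text, 0, $open) . substr($text, $close + 7);

        // Continue scanning from where the block used to start.
        $offset = $open;
    }

    return [$text, $removed];
}

[$clean, $removed] = stripCodeBlocks('teste[code]123[/code]bla teste[code]xyz');
echo $clean, "\n";           // prints: testebla teste[code]xyz
echo implode(',', $removed); // prints: 123
```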

    Result:

    Removido: 123
    Removido: 456
    Removido: 789
    Removido: xyz
    Resultado: 
     testebla 
     testebla 
     testebla 
     testebla
    
        
    13.08.2014 / 21:35
    1

    Daniel, I am not sure I understood what you need, but try:

        $str = "texte 1 [code] texte code [/code] texte 2";     
    
        preg_match('/(?<=\[code\]).*(?=\[\/code\])/', $str, $match);
    
        $strCode = $match[0];
    

    This will return everything inside [code][/code].
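
    If the text may contain more than one [code] block, the greedy .* above would capture everything from the first [code] to the last [/code]. A minimal sketch for that case, assuming a lazy quantifier and preg_match_all (the sample string is made up for illustration):

```php
<?php

$str = "text 1 [code]first[/code] text 2 [code]second[/code] text 3";

// .*? is lazy, so each match stops at the nearest [/code];
// the s modifier lets . match line breaks inside a block.
preg_match_all('/(?<=\[code\]).*?(?=\[\/code\])/s', $str, $matches);

print_r($matches[0]);
// $matches[0] holds "first" and "second"
```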

        
    13.08.2014 / 21:02
    1

    I think it is more or less this; just change the function passed to preg_replace_callback to do what you need:

    $text = "Teste de string com code: [code]<p>teste</p>[/code] e continuação de teste com outro code: [code]<p>teste 2</p>[/code] com mais texto.";
    
    $text = preg_replace_callback('/\[code\](.*?)\[\/code\]/i',
            function ($matches) {
                return ($matches[1] ? '<div class="code">' . htmlspecialchars($matches[1], ENT_COMPAT,'ISO-8859-1') . '</div>' : '');
            }, $text);
    
    echo $text;
    
    // output
    
    // Teste de string com code: <div class="code">&lt;p&gt;teste&lt;/p&gt;</div> e continuação de teste com outro code: <div class="code">&lt;p&gt;teste 2&lt;/p&gt;</div> com mais texto.
    

    In this example, the function wraps the contents of each [code] tag in a div and escapes the HTML characters so that they are displayed literally. The .code class can be used to style the div with a background, border, font, etc.

    And if an empty [code][/code] pair is found, it is removed without creating an empty div.

        
    14.08.2014 / 17:21
    0

    You can do it with a regular expression like this:

    <?php 
    function removeBB($texto) { 
        $regex = '|[[\/\!]*?[^\[\]]*?]|si';  
        return preg_replace($regex, '', $texto); 
    } 
    $s = "[url=http://google.com.br]Google[/url]"; 
    echo removeBB($s); 
    ?> 
    
        
    15.08.2014 / 15:22
    -1

    This regex can solve the problem.

    <?php
    
    $string = 'texto texto la la la [code]<html><iframe><p>codigo html.</p></iframe></html>[/code] continuacao do texto e bla bla bla';
    $pattern = '%(\[code\].*\[/code\])%';
    // Single quotes: inside double quotes PHP would try to interpolate "${1}" as a variable.
    $replacement = '[code][/code]';
    $response = preg_replace($pattern, $replacement, $string);
    
    print $response;
    
    ?>
    

    Output: texto texto la la la [code][/code] continuacao do texto e bla bla bla

        
    13.08.2014 / 18:18