BBCode Parser ignore what's inside [code]

I wrote a BBCode parser, based on a few existing ones, to meet my needs, but I have a problem.

The content inside [code] should be ignored, but I do not know how to do this, since it can contain all the other tags that get parsed.

I tried the following, but it does not fully work:

$pos = strpos($text, '[code]');
$code = "";

if ($pos !== false) {
    $code = substr($text, $pos, strpos($text, '[/code]') - $pos);
    $text = str_replace($code . '[/code]', '', $text);
    $code = substr($code, 6);
}

php regex

asked by anonymous 11.08.2014 / 17:42

6 answers

score 4 · Accepted Answer

This answer may be MUCH more than you need, but in my opinion a single regular expression, or a solution based on the positions of certain characters, is not enough (all the more so because the latter requires perfectly normalized input).

So I propose an object-oriented solution in which a Parser applies as many replacement strategies as you have.

First, the file structure:

|-\BBCode
| |-\BBCode\Parser.php
| \-\BBCode\Parsers
|   |-\BBCode\Parsers\Code.php
|   |-\BBCode\Parsers\Emphasis.php
|   |-\BBCode\Parsers\Parser.php
|   \-\BBCode\Parsers\Strong.php
\-\index.php

BBCode\Parser.php is the class that gives access to the different parsing and replacement strategies:

<?php

namespace BBCode;

class Parser {

    /**
     * Available Parsers
     *
     * @var array parsers
     */
    private $parsers = array();

    /**
     * Input Text (with BBCodes)
     *
     * @var string $text;
     */
    protected $text;

    /**
     * Output Text (parsed)
     *
     * @var string $output;
     */
    protected $output;

    /**
     * Parser Constructor
     * Prepares the text to be parsed
     */
    public function __construct( $text ) {

        // Preparing text

        $text = $this -> prepare( $text );

        $this -> text = $this -> output = $text;
    }

    /**
     * Add new BBCode Parser to be used
     *
     * @param Parsers\Parser $parser
     *  BBCode Parser
     *
     * @return BBCode\Parser
     *  Parser Object (Fluent Interface)
     */
    public function addParser( Parsers\Parser $parser ) {

        $this -> parsers[] = $parser;

        return $this;
    }

    /**
     * Parses BBCodes
     *
     * @return BBCode\Parser
     *  Parser Object (Fluent Interface)
     */
    public function parse() {

        foreach( $this -> parsers as $parser ) {

            $this -> output = $parser -> parse( $this -> output );
        }

        return $this;
    }

    // Accessors

    /**
     * Get output (parsed) text
     *
     * @return string
     *  Parsed text
     */
    public function getText() {
        return $this -> output;
    }

    // Auxiliary Methods

    /**
     * Applies some routines over input text
     * allowing easier parsing
     *
     * @param string $text
     *  Text to cleanup
     *
     * @return string
     *  Cleaned text
     */
    private function prepare( $text ) {

        // Trimming leading and trailing whitespace

        $text = trim( $text );

        // Collapsing runs of whitespace into a single space

        $text = preg_replace( '/\s{2,}/', ' ', $text );

        return $text;
    }
}

It looks long only because of the comments, but it is really very simple. Besides the properties, it has:

The constructor, which receives the input text that each individual Parser will handle;
A Parser::addParser() method, through which we can add new parsing strategies, all secured by interfaces and polymorphism through type hinting;
A Parser::parse() method that iterates over the collection of Parsers and applies each one to the input text;
A getter that returns the text with the BBCodes replaced by the appropriate tags.

We also have a private method that simplifies the regular expressions the parsing strategies may need. I added just two routines: one to strip whitespace from both ends of the string and one to collapse duplicated whitespace.

Thanks to these two routines we do not need, for example, word boundaries (\b), anchors (^ and $) or the whitespace escape (\s) in the patterns.
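To make the effect of the two routines concrete, here is a minimal sketch that copies them into a standalone function. The name prepareText() is hypothetical and exists only for this illustration; in the class above the logic lives in the private Parser::prepare() method.

```php
<?php

// Standalone copy of the two routines from Parser::prepare(),
// written as a plain function (hypothetical name) for illustration.
function prepareText( $text ) {

    // Strip whitespace from both ends of the string
    $text = trim( $text );

    // Collapse any run of two or more whitespace characters into one space
    return preg_replace( '/\s{2,}/', ' ', $text );
}

$prepared = prepareText( "  [b]bold[/b]   \n\n  [i]italic[/i]  " );

echo $prepared; // [b]bold[/b] [i]italic[/i]
```

Note that the collapse applies to the whole input, including whatever sits inside [code], so indentation inside code blocks is not preserved by this step.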

Then we have the classes responsible for the parsing strategies:

Strong.php

<?php

namespace BBCode\Parsers;

class Strong implements Parser {

    /**
     * Parses found BBCodes
     *
     * @param string $text
     *  Input text to parse
     */
    public function parse( $text ) {

        $text = $this -> applyParsingRestrictions( $text );

        return preg_replace_callback(

            '/\[b\](.*?)\[\/b\]/',

            function( $matches ) {
                return sprintf( '<strong>%s</strong>', $matches[ 1 ] );
            },

            $text
        );
    }

    // Auxiliary methods

    /**
     * Apply parsing restrictions against nested BBCodes
     *
     * @param string $text
     *  Input Text to analyze
     *
     * @return string
     *  Input text with nested BBCodes stripped
     */
    private function applyParsingRestrictions( $text ) {

        if( preg_match( '/((?<=\[code\])\[b\])(.*)(\[\/b\](?=\[\/code\]))/', $text, $matches ) ) {

            $text = str_replace(

                sprintf( '[b]%s[/b]', $matches[ 2 ] ), $matches[ 2 ], $text
            );
        }

        return $text;
    }
}

Emphasis.php

<?php

namespace BBCode\Parsers;

class Emphasis implements Parser {

    /**
     * Parses found BBCodes
     *
     * @param string $text
     *  Input text to parse
     */
    public function parse( $text ) {

        return preg_replace_callback(

            '/\[i\](.*?)\[\/i\]/',

            function( $matches ) {
                return sprintf( '<em>%s</em>', $matches[ 1 ] );
            },

            $text
        );
    }
}

Code.php

<?php

namespace BBCode\Parsers;

class Code implements Parser {

    /**
     * Parses found BBCodes
     *
     * @param string $text
     *  Input text to parse
     */
    public function parse( $text ) {

        return preg_replace_callback(

            '/\[code\](.*?)\[\/code\]/',

            function( $matches ) {
                return sprintf( '<code>%s</code>', $matches[ 1 ] );
            },

            $text
        );
    }
}

You can create as many strategies as you need, all of them implementing the method defined in the Parsers\Parser.php interface:

<?php

namespace BBCode\Parsers;

interface Parser {

    /**
     * Parses found BBCodes
     *
     * @param string $text
     *  Input text to parse
     */
    public function parse( $text );
}
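As an illustration of adding one more strategy, here is a minimal sketch of an underline parser. The Underline class and the [u] tag are assumptions made for this example and are not part of the files listed earlier. The interface is redeclared without a namespace only so that the snippet runs on its own; in the layout above it lives in BBCode\Parsers.

```php
<?php

// Redeclared without a namespace only so this snippet runs on its own;
// in the layout above it lives in BBCode\Parsers.
interface Parser {
    public function parse( $text );
}

// Hypothetical strategy: [u]...[/u] becomes <u>...</u>
class Underline implements Parser {

    public function parse( $text ) {

        return preg_replace_callback(

            '/\[u\](.*?)\[\/u\]/',

            function( $matches ) {
                return sprintf( '<u>%s</u>', $matches[ 1 ] );
            },

            $text
        );
    }
}

$underline = new Underline;

echo $underline -> parse( 'An [u]underlined[/u] word' );
// An <u>underlined</u> word
```

With the real namespaces in place, such a class would be registered like the others, through addParser().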

The replacement routines are almost self-explanatory: each one is a simple regular-expression replacement. I opted for preg_replace_callback() because I find it more readable.
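For comparison, the same replacement can be written with plain preg_replace() and a $1 backreference. The sketch below only shows that, for this simple case, both forms give the same result; the variable names are chosen for the example.

```php
<?php

$text = 'This is [b]bold[/b] text';

// Callback form, as used by the strategy classes
$withCallback = preg_replace_callback(
    '/\[b\](.*?)\[\/b\]/',
    function( $matches ) {
        return sprintf( '<strong>%s</strong>', $matches[ 1 ] );
    },
    $text
);

// Plain preg_replace() form, using the $1 backreference
$withReplace = preg_replace( '/\[b\](.*?)\[\/b\]/', '<strong>$1</strong>', $text );

var_dump( $withCallback === $withReplace ); // bool(true)
```

The callback form becomes more useful once the replacement needs logic, such as escaping the matched text.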

The trick that (finally) ties this answer to the question is shown only in the Strong.php class, in Strong::applyParsingRestrictions().

Before the [b] and [/b] tags are replaced by their <strong> and </strong> counterparts, I run a search for the [code] BBCode. If the bold BBCode is found inside a code BBCode, then instead of going ahead with the replacement by HTML tags, we remove the bold BBCode from the input text.

The idea is basically the one posted by Guillermo Lautert, using lookbehinds and lookaheads. We look behind for the opening of the code BBCode and look ahead for its closing; if we find both, we remove the bold BBCode that sits inside.

Back in the Parsers\Parser::parse() interface method, if no other bold BBCode remains, the preg_replace_callback() callback is never invoked and the text passes unchanged to the next Parser in the collection.
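The lookbehind and lookahead can be tried in isolation. The sketch below reuses the exact pattern from Strong::applyParsingRestrictions(); the sample strings and variable names are made up for the demonstration.

```php
<?php

// Same pattern used in Strong::applyParsingRestrictions()
$pattern = '/((?<=\[code\])\[b\])(.*)(\[\/b\](?=\[\/code\]))/';

$inside  = '[code][b]This[/b][/code] and [b]that[/b]';
$outside = 'Only [b]that[/b] here';

// [b] directly after [code] and [/b] directly before [/code]: match
if( preg_match( $pattern, $inside, $matches ) ) {

    // $matches[ 2 ] holds the text between the bold tags
    $inside = str_replace(
        sprintf( '[b]%s[/b]', $matches[ 2 ] ), $matches[ 2 ], $inside
    );
}

echo $inside, PHP_EOL; // [code]This[/code] and [b]that[/b]

// No surrounding [code], so the lookbehind fails and nothing is removed
var_dump( preg_match( $pattern, $outside ) ); // int(0)
```

The pattern only matches when [b] comes immediately after [code] and [/b] immediately before [/code]. Input such as [code]x [b]y[/b] z[/code] is not caught, and since preg_match() stops at the first match, only one code block per text is handled.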

To use all of this:

<?php

// Autoloading

spl_autoload_register( function( $classname ) {

    $classname = stream_resolve_include_path(

        str_replace( '\\', DIRECTORY_SEPARATOR, $classname ) . '.php'
    );

    if( $classname !== FALSE ) {

        include $classname;
    }
});

$parser = new BBCode\Parser(

    '[code][b]This[/b][/code]       [code][i]is[/i][/code] my [b]text[/b]  !'
);

$parser -> addParser( new BBCode\Parsers\Strong )
        -> addParser( new BBCode\Parsers\Emphasis )
        -> addParser( new BBCode\Parsers\Code );

echo $parser -> parse() -> getText();

?>

And the output is:

<code>This</code> <code><em>is</em></code> my <strong>text</strong> !

See the restriction in action. Our input string has a bold BBCode inside a code BBCode. Because of the restriction, the bold one was removed and only the code one was left.

This does not affect the bold BBCode further ahead, which works normally.

But look at what happened to the italics BBCode (emphasis). Since no restriction was defined for it, the resulting string ended up with an <em> tag inside the <code> tag.
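One way to close that gap is to give Emphasis the same restriction that Strong has. The sketch below is an assumption about how that could look, not part of the original classes: it copies Strong::applyParsingRestrictions() and swaps [b] for [i]. The interface is redeclared without a namespace only so that the snippet runs on its own.

```php
<?php

// Redeclared without a namespace only so this snippet runs on its own
interface Parser {
    public function parse( $text );
}

// Sketch: Emphasis with the same restriction Strong applies
class Emphasis implements Parser {

    public function parse( $text ) {

        $text = $this -> applyParsingRestrictions( $text );

        return preg_replace_callback(
            '/\[i\](.*?)\[\/i\]/',
            function( $matches ) {
                return sprintf( '<em>%s</em>', $matches[ 1 ] );
            },
            $text
        );
    }

    // Strips [i]...[/i] when it sits directly inside [code]...[/code]
    private function applyParsingRestrictions( $text ) {

        if( preg_match( '/((?<=\[code\])\[i\])(.*)(\[\/i\](?=\[\/code\]))/', $text, $matches ) ) {

            $text = str_replace(
                sprintf( '[i]%s[/i]', $matches[ 2 ] ), $matches[ 2 ], $text
            );
        }

        return $text;
    }
}

$emphasis = new Emphasis;

echo $emphasis -> parse( '[code][i]is[/i][/code] my [i]text[/i]' );
// [code]is[/code] my <em>text</em>
```

Because the restriction would be duplicated in every strategy, it may be worth moving it into a shared base class if more tags are added.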

14.08.2014 / 20:15