This answer may be MUCH more than you need, but in my opinion a single regular expression, or a solution based on the positions of certain characters, is not enough (all the more so because the latter requires the input data to be perfectly normalized).
So I propose an object-oriented solution in which a Parser applies as many replacement strategies as you have.
First, the file structure:
|-\BBCode
| |-\BBCode\Parser.php
| \-\BBCode\Parsers
| |-\BBCode\Parsers\Code.php
| |-\BBCode\Parsers\Emphasis.php
| |-\BBCode\Parsers\Parser.php
| \-\BBCode\Parsers\Strong.php
\-\index.php
BBCode\Parser.php is the class that gives us access to the different parsing and replacement strategies:
<?php
namespace BBCode;
class Parser {
/**
* Available Parsers
*
* @var array $parsers
*/
private $parsers = array();
/**
* Input Text (with BBCodes)
*
* @var string $text;
*/
protected $text;
/**
* Output Text (parsed)
*
* @var string $output;
*/
protected $output;
/**
* Parser Constructor
* Prepares the text to be parsed
*/
public function __construct( $text ) {
// Preparing text
$text = $this -> prepare( $text );
$this -> text = $this -> output = $text;
}
/**
* Add new BBCode Parser to be used
*
* @param Parsers\Parser $parser
* BBCode Parser
*
* @return BBCode\Parser
* Parser Object (Fluent Interface)
*/
public function addParser( Parsers\Parser $parser ) {
$this -> parsers[] = $parser;
return $this;
}
/**
* Parses BBCodes
*
* @return BBCode\Parser
* Parser Object (Fluent Interface)
*/
public function parse() {
foreach( $this -> parsers as $parser ) {
$this -> output = $parser -> parse( $this -> output );
}
return $this;
}
// Accessors
/**
* Get output (parsed) text
*
* @return string
* Parsed text
*/
public function getText() {
return $this -> output;
}
// Auxiliary Methods
/**
* Applies some routines over input text
* allowing easier parsing
*
* @param string $text
* Text to cleanup
*
* @return string
* Cleaned text
*/
private function prepare( $text ) {
// Cleaning trailing spaces
$text = trim( $text );
// Removing duplicated spaces
$text = preg_replace( '/\s{2,}/', ' ', $text );
return $text;
}
}
It looks like a lot only because of the comments, but it is really very simple. Besides the properties, of course, it has:
- The constructor, which receives the input data that each individual Parser will handle;
- A Parser::addParser() method through which we can add new parsing strategies, all secured by interfaces and polymorphism through type hinting;
- A method that iterates over the collection of Parsers and applies each one to the text;
- A getter that returns the text with the BBCodes replaced by the appropriate tags.
We also have a private method that simplifies the regular expressions the parsing strategies may need. I've added just two routines: one to trim whitespace around the string and one to collapse duplicated whitespace.
With these two routines we can, for example, do without word boundaries (\b), anchors (^ and $) or the whitespace class (\s) in the strategies' patterns.
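As a quick illustration of what those two routines do on their own, here is a minimal standalone sketch. The function name prepareText is hypothetical; it only mirrors the body of the private prepare() method shown above.

```php
<?php
// Hypothetical standalone copy of Parser::prepare(), for illustration only.
function prepareText( $text ) {
    // Strip whitespace from both ends of the string
    $text = trim( $text );
    // Collapse any run of two or more whitespace characters
    // (spaces, tabs, newlines) into a single space
    return preg_replace( '/\s{2,}/', ' ', $text );
}

$clean = prepareText( "  [b]bold[/b]   and \n\n [i]italic[/i]  " );
echo $clean; // [b]bold[/b] and [i]italic[/i]
```

Note that a lone tab or newline is left as it is, since the pattern only matches runs of two or more whitespace characters.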
Then we have the classes responsible for parsing strategies:
Strong.php
<?php
namespace BBCode\Parsers;
class Strong implements Parser {
/**
* Parses found BBCodes
*
* @param string $text
* Input text to parse
*/
public function parse( $text ) {
$text = $this -> applyParsingRestrictions( $text );
return preg_replace_callback(
'/\[b\](.*?)\[\/b\]/',
function( $matches ) {
return sprintf( '<strong>%s</strong>', $matches[ 1 ] );
},
$text
);
}
// Auxiliary methods
/**
* Apply parsing restrictions against nested BBCodes
*
* @param string $text
* Input Text to analyze
*
* @return string
* Input text with nested BBCodes stripped
*/
private function applyParsingRestrictions( $text ) {
// Lazy (.*?) so the match ends at the first [/b] that is followed by [/code]
if( preg_match( '/((?<=\[code\])\[b\])(.*?)(\[\/b\](?=\[\/code\]))/', $text, $matches ) ) {
$text = str_replace(
sprintf( '[b]%s[/b]', $matches[ 2 ] ), $matches[ 2 ], $text
);
}
return $text;
}
}
Emphasis.php
<?php
namespace BBCode\Parsers;
class Emphasis implements Parser {
/**
* Parses found BBCodes
*
* @param string $text
* Input text to parse
*/
public function parse( $text ) {
return preg_replace_callback(
'/\[i\](.*?)\[\/i\]/',
function( $matches ) {
return sprintf( '<em>%s</em>', $matches[ 1 ] );
},
$text
);
}
}
Code.php
<?php
namespace BBCode\Parsers;
class Code implements Parser {
/**
* Parses found BBCodes
*
* @param string $text
* Input text to parse
*/
public function parse( $text ) {
return preg_replace_callback(
'/\[code\](.*?)\[\/code\]/',
function( $matches ) {
return sprintf( '<code>%s</code>', $matches[ 1 ] );
},
$text
);
}
}
You can create as many strategies as you need, all of them implementing the method defined in the Parsers\Parser.php interface:
<?php
namespace BBCode\Parsers;
interface Parser {
/**
* Parses found BBCodes
*
* @param string $text
* Input text to parse
*/
public function parse( $text );
}
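To give an idea of how a further strategy might look, here is a minimal sketch of an extra parser for underlined text. The Underline class and the [u] BBCode are assumptions made for illustration and are not part of the files listed above. The interface is repeated only so that the snippet runs on its own.

```php
<?php
namespace BBCode\Parsers;

// Same interface as Parsers\Parser.php, repeated so this sketch is self-contained
interface Parser {
    public function parse( $text );
}

// Hypothetical extra strategy: [u]...[/u] becomes <u>...</u>
class Underline implements Parser {
    public function parse( $text ) {
        return preg_replace_callback(
            '/\[u\](.*?)\[\/u\]/',
            function( $matches ) {
                return sprintf( '<u>%s</u>', $matches[ 1 ] );
            },
            $text
        );
    }
}

$underline = new Underline;
$result = $underline -> parse( 'Some [u]underlined[/u] text' );
echo $result; // Some <u>underlined</u> text
```

Such a class would be registered like the others, through addParser().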
The replacement routines are almost self-explanatory: each is a simple regular-expression replacement. I opted for preg_replace_callback() because I find it more readable.
The trick that (finally) ties this answer to the question is demonstrated only in the Strong.php class, through Strong::applyParsingRestrictions().
Before the [b] and [/b] tags are replaced by their <strong> and </strong> counterparts, I run a search for the [code] BBCode. If the bold BBCode is found inside a code BBCode, then instead of going ahead with the replacement by HTML tags, we remove the bold BBCode from the input text.
The idea is basically the one posted by Guillermo Lautert, using lookbehinds and lookaheads: we look behind for the opening of the code BBCode and look ahead for its closing; if both are found, we remove the bold BBCodes inside.
Back in the Parsers\Parser::parse() interface method, if no other bold BBCode remains, the preg_replace_callback() callback is never invoked and the text is passed unchanged to the next Parser in the collection.
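The lookaround pattern in Strong.php only fires when [b] comes immediately after [code] and [/b] immediately before [/code], and it handles a single occurrence. If the restriction needs to cover any tag anywhere inside a code block, one possible variation is sketched below. It swaps the lookarounds for a callback that visits each [code] block, and the helper name stripTagInsideCode is hypothetical.

```php
<?php
// Hypothetical helper, not part of the classes above: removes the BBCode
// $tag (for example 'b' or 'i') wherever it appears inside [code]...[/code].
function stripTagInsideCode( $text, $tag ) {
    $tag = preg_quote( $tag, '/' );
    return preg_replace_callback(
        // Visit each [code] block separately (lazy match)
        '/\[code\](.*?)\[\/code\]/s',
        function( $matches ) use ( $tag ) {
            // Drop the opening and closing BBCode, keeping their content
            $inner = preg_replace( '/\[\/?' . $tag . '\]/', '', $matches[ 1 ] );
            return '[code]' . $inner . '[/code]';
        },
        $text
    );
}

$input  = '[code][b]This[/b][/code] [code][i]is[/i][/code] my [b]text[/b] !';
$output = stripTagInsideCode( stripTagInsideCode( $input, 'b' ), 'i' );
echo $output; // [code]This[/code] [code]is[/code] my [b]text[/b] !
```

A strategy could call such a helper at the start of its parse() method, in the same place where Strong calls applyParsingRestrictions().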
To use all of this, we have:
<?php
// Autoloading
spl_autoload_register( function( $classname ) {
$classname = stream_resolve_include_path(
str_replace( '\\', DIRECTORY_SEPARATOR, $classname ) . '.php'
);
if( $classname !== FALSE ) {
include $classname;
}
});
$parser = new BBCode\Parser(
'[code][b]This[/b][/code] [code][i]is[/i][/code] my [b]text[/b] !'
);
$parser -> addParser( new BBCode\Parsers\Strong )
-> addParser( new BBCode\Parsers\Emphasis )
-> addParser( new BBCode\Parsers\Code );
echo $parser -> parse() -> getText();
?>
And we get as output:
<code>This</code> <code><em>is</em></code> my <strong>text</strong> !
Notice the restriction in action. Our input string has a bold BBCode inside a code BBCode; because of the restriction, we removed the bold one and kept the code.
This does not affect the bold BBCode that appears further on, which works normally.
But look at what happened to the italic (emphasis) BBCode. As no restriction rule was defined for it, the resulting string ended up with an <em> tag inside the <code> tag.
14.08.2014 / 20:15