Regular expressions are not the best fit for editing HTML:
solutions based on HTML/XML parsers are more appropriate.
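As a rough illustration of the parser-based route, the sketch below uses PHP's DOMDocument and DOMXPath to link bare URLs only in text nodes that are not already inside an <a> element. The function name linkify_with_dom, the wrapper id linkify-root and the simplified URL pattern are assumptions made for this example. It is not the code from the question, and it only handles http(s) links, not images or www. addresses.

```php
<?php
// Hypothetical helper (not from the question): link bare http(s) URLs found in
// text nodes, skipping any text that already sits inside an <a> element.
function linkify_with_dom(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // keep parser warnings quiet
    // Wrap the fragment in a known container so it can be located afterwards.
    $doc->loadHTML('<div id="linkify-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="linkify-root"]')->item(0);

    // Copy the node list first, because the loop below modifies the tree.
    $textNodes = iterator_to_array($xpath->query('.//text()[not(ancestor::a)]', $root));

    foreach ($textNodes as $node) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd-indexed parts are the captured URLs.
        $parts = preg_split('~(?<![\'"])(https?://[^\s<>"\']+)~', $node->nodeValue,
                            -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 3) {
            continue;                               // no URL in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $part);
                $a->setAttribute('target', '_blank');
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo linkify_with_dom('see http://n.u.pt/ and <a href="http://x.org/">http://x.org/</a>'), "\n";
```

Attribute values are never text nodes, and text inside existing links is excluded by the XPath filter, so no protection marker is needed here. The quote lookbehind is kept only for URLs that appear quoted in ordinary text.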
However, keeping the approach of the original question:
$reply = preg_replace('~(?<![\'"])(https?://\S+\.(gif|jpe?g|png))~i', # image URL -> <img>
    "<img src='$1'>", $reply);
$reply = preg_replace('~(?<![\'"])(https?://\S+)~', # http(s) URL -> <a href>
    "<a href='$1' target='_blank'>$1</a>", $reply);
$reply = preg_replace('~(?<![\'"/])(www\.\w\S+)~', # www.x -> <a href>
    "<a href='http://$1' target='_blank'>$1</a>", $reply);
EDIT 1: (removed, since it is covered by EDIT 2)
EDIT 2:
A URL is replaced with <a href='URL'>URL</a> only if it does not
immediately follow a ' or a ".
To handle links that are already annotated, and other cases that should not be annotated,
I added:
a ''PrOtEcT'' mark before the address (it places a ' immediately before the address,
so the address will not be annotated);
at the end, the mark is removed.
That is:
$contexto_esq = ' <a\s.*?>\s* # protect after <a ...>
    | " # protect after "
    ';
$txt = preg_replace("~($contexto_esq)( https? | www\. )~ix", # add ''PrOtEcT''
    "$1''PrOtEcT''$2", $txt);
$txt = preg_replace('~(?<![\'"])(https?://\S+\.(gif|jpe?g|png))~i', # image URL -> <img>
    "<img src='$1'>", $txt);
$txt = preg_replace('~(?<![\'"])(https?://\S+)~', # http(s) URL -> <a href>
    "<a href='$1' target='_blank'>$1</a>", $txt);
$txt = preg_replace('~(?<![\'"/])(www\.\w\S+)~', # www.x -> <a href>
    "<a href='http://$1' target='_blank'>$1</a>", $txt);
$txt = preg_replace("~''PrOtEcT''~", "", $txt); # remove ''PrOtEcT''
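To make the marker trick concrete, here is a small trace on a made-up input (the sample string and the compact, non-extended form of the protection pattern are assumptions for illustration). It prints the text right after the protection step, then checks that the annotation rules leave the protected addresses alone.

```php
<?php
// Made-up input: one link that is already annotated and one quoted address.
$input = '<a href="http://n.u.pt/">http://n.u.pt/</a> and "www.di.br"';

// Step 1: insert the marker after an opening <a ...> tag or after a double quote.
$protected = preg_replace('~(<a\s.*?>\s*|")(https?|www\.)~i', "$1''PrOtEcT''$2", $input);
echo $protected, "\n";
// <a href="http://n.u.pt/">''PrOtEcT''http://n.u.pt/</a> and "''PrOtEcT''www.di.br"

// Step 2: the annotation rules now see a ' right before each protected address,
// so their lookbehind rejects it and nothing is rewritten.
$txt = preg_replace('~(?<![\'"])(https?://\S+)~', "<a href='$1' target='_blank'>$1</a>", $protected);
$txt = preg_replace('~(?<![\'"/])(www\.\w\S+)~', "<a href='http://$1' target='_blank'>$1</a>", $txt);

// Step 3: removing the marker restores the original text.
$final = str_replace("''PrOtEcT''", '', $txt);
var_dump($final === $input); // bool(true)
```

The marker works only because it ends in a quote character that the lookbehind already rejects. Any other marker text would need its own exclusion in each rule.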
Test policy: since the requirements of the exercise are not fixed and keep changing,
it is crucial to use a set of test samples (which ideally should
be part of the question statement):
$txt = <<<EOD
1) annotate with <a>:
http://n.u.pt/ plus www.di.br and http://www.di.br
2) do not annotate:
"http://n.u.pt/sss1" and 'www.di.br' and 'http://www.di.br'
3) annotate with <img>:
http://n.u.pt/sss2.jpg and also http://www.di.br/dir/f.png
4) do not annotate:
"http://n.u.pt/sss3.jpg" and also 'http://www.di.br/dir/f.png'
5) already annotated:
<a href="http://n.u.pt/" target="_blank">http://n.u.pt/</a>
6) already annotated:
<a href="http://n.u.pt/" target="_blank"> http://n.u.pt/f.jpg</a>
7) do not annotate:
href="http://www.yoble.com.br/Community/434"> <img
EOD;
For this input, the code produces:
1) annotate with <a>:
<a href='http://n.u.pt/' target='_blank'>http://n.u.pt/</a> plus <a href='http://www.di.br' target='_blank'>www.di.br</a> and <a href='http://www.di.br' target='_blank'>http://www.di.br</a>
2) do not annotate:
"http://n.u.pt/sss1" and 'www.di.br' and 'http://www.di.br'
3) annotate with <img>:
<img src='http://n.u.pt/sss2.jpg'> and also <img src='http://www.di.br/dir/f.png'>
4) do not annotate:
"http://n.u.pt/sss3.jpg" and also 'http://www.di.br/dir/f.png'
5) already annotated:
<a href="http://n.u.pt/" target="_blank">http://n.u.pt/</a>
6) already annotated:
<a href="http://n.u.pt/" target="_blank"> http://n.u.pt/f.jpg</a>
7) do not annotate:
href="http://www.yoble.com.br/Community/434"> <img
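Since the requirements keep changing, it may help to turn a few of these samples into an automatic check that is rerun after every change to the patterns. The sketch below is one possible way to do that. The function name annotate_links, the $cases table and the compact form of the protection pattern are assumptions for this example, and the expected strings are adapted from the samples above.

```php
<?php
// Hypothetical wrapper around the marker step and the three annotation rules.
function annotate_links(string $txt): string
{
    $mark = "''PrOtEcT''";
    // Protect addresses that follow an opening <a ...> tag or a double quote.
    $txt = preg_replace('~(<a\s.*?>\s*|")(https?|www\.)~i', '$1' . $mark . '$2', $txt);
    $txt = preg_replace('~(?<![\'"])(https?://\S+\.(gif|jpe?g|png))~i', "<img src='$1'>", $txt);
    $txt = preg_replace('~(?<![\'"])(https?://\S+)~', "<a href='$1' target='_blank'>$1</a>", $txt);
    $txt = preg_replace('~(?<![\'"/])(www\.\w\S+)~', "<a href='http://$1' target='_blank'>$1</a>", $txt);
    return str_replace($mark, '', $txt);
}

// input => expected output
$cases = [
    'http://n.u.pt/ plus www.di.br' =>
        "<a href='http://n.u.pt/' target='_blank'>http://n.u.pt/</a> plus "
        . "<a href='http://www.di.br' target='_blank'>www.di.br</a>",
    '"http://n.u.pt/sss1"' => '"http://n.u.pt/sss1"',
    'http://n.u.pt/sss2.jpg' => "<img src='http://n.u.pt/sss2.jpg'>",
    '<a href="http://n.u.pt/" target="_blank"> http://n.u.pt/f.jpg</a>' =>
        '<a href="http://n.u.pt/" target="_blank"> http://n.u.pt/f.jpg</a>',
];

// Collect every sample whose actual output differs from the expected one.
$failures = [];
foreach ($cases as $input => $expected) {
    $actual = annotate_links($input);
    if ($actual !== $expected) {
        $failures[] = "input:    $input\nexpected: $expected\nactual:   $actual";
    }
}

echo $failures ? implode("\n\n", $failures) . "\n" : "all samples pass\n";
```

New situations that come up later can then be added to the table as extra input/expected pairs before the patterns are touched.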