Regular expression for href and www / https / http links

2

Good morning. I have two regular expressions: one gets the URL from any href, and the other, which I still need to fix, should only get the URLs that are not inside an href, that is, only bare URLs that start with www, https:// or http://. The second expression must not match the URLs that are inside an href.

1st expression to get href

preg_replace('/href="(?!http:\/\/)([^"]+)"/e', '$this->href("$1", "$id", "$posi_email")', $texto);

2nd expression that should not get the urls inside a href.

preg_replace('/(www.|http:\/\/|https:\/\/)[^ ]+?([^,])+)/e', '$this->url("$1", "$id", "$posi_email")', $texto);
    
asked by anonymous 06.02.2015 / 11:13

1 answer

1

Do not use regular expressions (RegEx) to parse HTML. RegEx is not the most appropriate tool for this.

There are libraries for this in almost any language. In PHP you can do it natively with DOM or with the XML Parser. In Python you have BeautifulSoup. In Java you have jsoup. And so on.
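
As a rough sketch of the DOM route in PHP (7.4 or later, with the bundled dom extension), something like the following could handle both cases from the question. The functions rewriteHref() and rewriteUrl() are hypothetical stand-ins for your $this->href() and $this->url() methods, and the bare-URL pattern is only an assumption about what you want to count as a URL. A small regex is still used, but only on text nodes that are not inside an <a> element, so it never sees the href attributes. It uses preg_replace_callback because the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.

```php
<?php
// Hypothetical stand-ins for the asker's $this->href() and $this->url().
function rewriteHref(string $url): string
{
    return 'track.php?to=' . rawurlencode($url);
}

function rewriteUrl(string $url): string
{
    return '[url:' . $url . ']';
}

function rewriteLinks(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    // The meta tag asks libxml to read the input as UTF-8 (assumed encoding);
    // the wrapper div lets us get the fragment back without <html>/<body>.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();

    $wrap = $doc->getElementsByTagName('div')->item(0);
    if ($wrap === null) {
        return $html; // nothing parsed, hand the input back unchanged
    }
    $xpath = new DOMXPath($doc);

    // 1) URLs that live in an href attribute.
    foreach ($xpath->query('.//a[@href]', $wrap) as $a) {
        $a->setAttribute('href', rewriteHref($a->getAttribute('href')));
    }

    // 2) Bare URLs in text that is not inside an <a> element.
    $bareUrl = '~(?:https?://|www\.)[^\s<>"\',]+~i';
    foreach ($xpath->query('.//text()[not(ancestor::a)]', $wrap) as $text) {
        $text->nodeValue = preg_replace_callback(
            $bareUrl,
            fn(array $m): string => rewriteUrl($m[0]),
            $text->nodeValue
        );
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo rewriteLinks('Visit www.example.com, or <a href="page.html">this page</a>.');
```

In this example the bare www.example.com goes through rewriteUrl(), the href goes through rewriteHref(), and the link text "this page" is left alone. The sketch does not skip text inside script or style elements, and the pattern will keep a trailing period as part of a URL, so adjust both to your data.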

I can't resist linking to one of Stack Overflow's most famous questions: link

    
18.03.2017 / 22:34