Regex is capturing more than expected

3

I'm having trouble with a regex: instead of capturing only 1 de 8, as I want, it captures much more than that, see: link

This is the string I'm trying to match:

<span class='pages'>1 de 8</span><span class='current'>1</span><a class="page larger" href="http://megafilmeshd.net/category/lancamentos/page/2/">2</a><a class="page larger" href="http://megafilmeshd.net/category/lancamentos/page/3/">3</a><span class='extend'>...</span>

and regex:

<span class='pages'>(.*)<\/span>
    
asked by anonymous 17.01.2015 / 19:16

2 answers

3

To capture only

<span class='pages'>1 de 8</span>

add a question mark after the quantifier. It makes the group inside the parentheses lazy: instead of matching as much as possible, as .* does, it matches as few characters as it needs:

<span class='pages'>(.*?)<\/span>
    
17.01.2015 / 19:24
2

Your problem is that the * quantifier is greedy, which means that it will match as much input as possible before giving up. If you want it to match as little as possible, you can use its lazy variant, *? :

<span class='pages'>(.*?)<\/span>
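A minimal sketch of the difference, using Python's re module purely for illustration (the question does not say which language or regex engine is in use, and the sample string is shortened from the one in the question):

```python
import re

# Shortened version of the string from the question.
html = (
    "<span class='pages'>1 de 8</span><span class='current'>1</span>"
    "<a class=\"page larger\" "
    "href=\"http://megafilmeshd.net/category/lancamentos/page/2/\">2</a>"
    "<span class='extend'>...</span>"
)

# Greedy: .* runs to the end of the input and backtracks
# only as far as the LAST </span>.
greedy = re.search(r"<span class='pages'>(.*)</span>", html)

# Lazy: .*? expands one character at a time and stops
# at the FIRST </span> it can reach.
lazy = re.search(r"<span class='pages'>(.*?)</span>", html)

print(greedy.group(1))  # everything up to the last </span>
print(lazy.group(1))    # 1 de 8
```

The greedy group swallows the closing tag of the first span and everything after it, up to the final closing tag; the lazy group contains only the text of the first span.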

That said, think twice before using regular expressions to interpret HTML. In some very limited cases it may even work, but in general it is better to use a full HTML parser.
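As a rough illustration of the parser approach, here is a sketch using Python's standard html.parser module. The language choice and the class name PagesSpanParser are assumptions made for this example, not something from the question, and the sample string is shortened:

```python
from html.parser import HTMLParser


class PagesSpanParser(HTMLParser):
    """Hypothetical helper: collects the text of <span class='pages'> elements."""

    def __init__(self):
        super().__init__()
        self._inside = False  # True while inside a matching span
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes removed.
        if tag == "span" and ("class", "pages") in attrs:
            self._inside = True

    def handle_endtag(self, tag):
        # Simplification: assumes the target span contains no nested spans.
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.texts.append(data)


html = (
    "<span class='pages'>1 de 8</span><span class='current'>1</span>"
    "<span class='extend'>...</span>"
)

parser = PagesSpanParser()
parser.feed(html)
parser.close()
print(parser.texts)  # ['1 de 8']
```

Unlike the regex, this does not depend on the exact quoting or spacing of the attribute in the markup, although it still assumes the class attribute is exactly pages.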

    
17.01.2015 / 19:24