PHP Regex Issues


I'm trying to extract some information from a webpage, and I'm using a regex for it.

I tested the pattern on regex101.com until I got one that suits me perfectly. The pattern works on regex101.com, but when I use the same pattern in PHP, it does not match.

This is my code on regex101.com: link

Note that 7 matches are found.

This is my PHP code. With the same text and pattern, nothing matches.

$pattern = '/<span\>([0-9\/]{10})( {0,})+([A-zÁ-ÿ ]+) {1,}(&#[0-9]{1,6})*[; ]*([0-9]{2}\:{0,}[0-9]{0,2}) +[A-Z]+ +([A-z]{2,}-{0,}[A-z]*)<br \/> <\/span>([0-9]{1,2}). +([0-9]{1,5}) {0,}[0-9]{1,2} [ A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) +[0-9]{1,2} +[A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) +[0-9]{1,2} +[A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) +[0-9]{1,2} +[A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) +[0-9]{1,2} +[A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) {1,}[0-9]{1,2} +[A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) +[0-9]{1,2} +[A-zÁ-ÿ]{4,9}</';

$mirror = file_get_contents('http://resultadodojogodobicho.deunopostehoje.com/sao-paulo/');

$data = [];

preg_match_all($pattern, $mirror, $data, PREG_SET_ORDER, 0);

var_dump($data);

Does anyone know what might be happening?

    
asked by anonymous 11.08.2017 / 16:27

1 answer

There are two errors in your pattern that can cause this failure once it runs in PHP. The first one is very simple.

• By default, PHP's regex engine treats the pattern and the text as single bytes, and your text has special characters like Á , É , etc., so it must be treated as UTF-8 . To fix this, add the u flag after the closing delimiter of your regex ( /.../u ).

• The other error is also just an oversight. To match the whitespace in your text, you used a literal space. Depending on your PHP version, this can cause the engine to not identify the whitespace correctly. This error can be solved with the escapes \s or \h .
NOTE: Do not use \s in this case! It matches all whitespace character types, including line breaks ( \n )! Use \h to match only horizontal whitespace characters.
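
As a minimal sketch of both points, the snippet below uses one result line modelled on the output further down (the variable names and the shortened pattern are illustrative, not part of the original code). The "°" sign is written as its two UTF-8 bytes so the example does not depend on how the file is saved. It suggests one concrete way the missing u flag can break a match: without it, "." consumes only one of those two bytes.

```php
<?php
// Hypothetical sample line; "\xC2\xB0" is the UTF-8 encoding of "°".
$line = "1\xC2\xB0 6319  05 Cachorro";

// Shortened version of the "([0-9]{1,2}). +([0-9]{1,5})" part of the pattern.
// Without the u flag, "." consumes one byte (\xC2), so " +" then meets \xB0 and fails.
$withoutU = preg_match('/^([0-9]{1,2}). +([0-9]{1,5})/', $line);

// With the u flag, "." consumes the whole "°" code point and the match succeeds.
$withU = preg_match('/^([0-9]{1,2}).\h+([0-9]{1,5})/u', $line, $m);

var_dump($withoutU); // int(0)
var_dump($withU);    // int(1)

// \h matches horizontal whitespace only, while \s also matches line breaks.
$hOnNewline = preg_match('/\h/', "\n"); // 0
$sOnNewline = preg_match('/\s/', "\n"); // 1
```

Here $m[2] holds the four-digit number from the sample line.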

New regex:

/<span\>([0-9\/]{10})(\h{0,})+([A-zÁ-ÿ\h]+)\h{1,}(&#[0-9]{1,6})*[;\h]*([0-9]{2}\:{0,}[0-9]{0,2})\h+[A-Z]+\h+([A-z]{2,}-{0,}[A-z]*)<br\h\/>\h<\/span>([0-9]{1,2}).\h+([0-9]{1,5})\h{0,}[0-9]{1,2}\h[\hA-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h+[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h+[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h+[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h+[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h{1,}[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h+[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}</u

My result:

C:\wamp64\www\testcode.php:9:
array (size=7)
  0 => 
    array (size=21)
      0 => string '<span>12/08/2017 SÁBADO &#8211; 14 HORAS PT-SP<br /> </span>1° 6319  05 Cachorro<br /> 2° 7792  23 Urso<br /> 3° 0978  20 Peru<br /> 4° 0043  11 Cavalo<br /> 5° 8487  22 Tigre<br /> 6° 3619  05 Cachorro<br /> 7° 237  10 Coelho<' (length=242)
      1 => string '12/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'SÁBADO' (length=7)
      4 => string '&#8211' (length=6)
      5 => string '14' (length=2)
      6 => string 'PT-SP' (length=5)
      7 => string '1' (length=1)
      8 => string '6319' (length=4)
      9 => string '2' (length=1)
      10 => string '7792' (length=4)
      11 => string '3' (length=1)
      12 => string '0978' (length=4)
      13 => string '4' (length=1)
      14 => string '0043' (length=4)
      15 => string '5' (length=1)
      16 => string '8487' (length=4)
      17 => string '6' (length=1)
      18 => string '3619' (length=4)
      19 => string '7' (length=1)
      20 => string '237' (length=3)
  1 => 
    array (size=21)
      0 => string '<span>11/08/2017 SEXTA FEIRA &#8211; 18 HORAS PTN-SP<br /> </span>1° 8116  04 Borboleta<br /> 2° 2115  04 Borboleta<br /> 3° 1720  05 Cachorro<br /> 4° 7308  02 Águia<br /> 5° 2939  10 Coelho<br /> 6° 2198  25 Vaca<br /> 7° 165  17 Macaco<' (length=254)
      1 => string '11/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'SEXTA FEIRA' (length=11)
      4 => string '&#8211' (length=6)
      5 => string '18' (length=2)
      6 => string 'PTN-SP' (length=6)
      7 => string '1' (length=1)
      8 => string '8116' (length=4)
      9 => string '2' (length=1)
      10 => string '2115' (length=4)
      11 => string '3' (length=1)
      12 => string '1720' (length=4)
      13 => string '4' (length=1)
      14 => string '7308' (length=4)
      15 => string '5' (length=1)
      16 => string '2939' (length=4)
      17 => string '6' (length=1)
      18 => string '2198' (length=4)
      19 => string '7' (length=1)
      20 => string '165' (length=3)
  2 => 
    array (size=21)
      0 => string '<span>11/08/2017 SEXTA FEIRA &#8211; 14 HORAS PT-SP<br /> </span>1° 2254  14 Gato<br /> 2° 0696  24 Veado<br /> 3° 8048  12 Elefante<br /> 4° 5440  10 Coelho<br /> 5° 3019  05 Cachorro<br /> 6° 9457  15 Jacaré<br /> 7° 568  18 Porco<' (length=248)
      1 => string '11/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'SEXTA FEIRA' (length=11)
      4 => string '&#8211' (length=6)
      5 => string '14' (length=2)
      6 => string 'PT-SP' (length=5)
      7 => string '1' (length=1)
      8 => string '2254' (length=4)
      9 => string '2' (length=1)
      10 => string '0696' (length=4)
      11 => string '3' (length=1)
      12 => string '8048' (length=4)
      13 => string '4' (length=1)
      14 => string '5440' (length=4)
      15 => string '5' (length=1)
      16 => string '3019' (length=4)
      17 => string '6' (length=1)
      18 => string '9457' (length=4)
      19 => string '7' (length=1)
      20 => string '568' (length=3)
  3 => 
    array (size=21)
      0 => string '<span>10/08/2017 QUINTA FEIRA &#8211; 18 HORAS PTN-SP<br /> </span>1° 7961  16 Leão<br /> 2° 9257  15 Jacaré<br /> 3° 6104  01 Avestruz<br /> 4° 0089  23 Urso<br /> 5° 3311  03 Burro<br /> 6° 6722  06 Cabra<br /> 7° 694  24 Veado<' (length=246)
      1 => string '10/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'QUINTA FEIRA' (length=12)
      4 => string '&#8211' (length=6)
      5 => string '18' (length=2)
      6 => string 'PTN-SP' (length=6)
      7 => string '1' (length=1)
      8 => string '7961' (length=4)
      9 => string '2' (length=1)
      10 => string '9257' (length=4)
      11 => string '3' (length=1)
      12 => string '6104' (length=4)
      13 => string '4' (length=1)
      14 => string '0089' (length=4)
      15 => string '5' (length=1)
      16 => string '3311' (length=4)
      17 => string '6' (length=1)
      18 => string '6722' (length=4)
      19 => string '7' (length=1)
      20 => string '694' (length=3)
  4 => 
    array (size=21)
      0 => string '<span>10/08/2017 QUINTA FEIRA &#8211; 14 HORAS PT-SP<br /> </span>1° 6483  21 Touro<br /> 2° 3411  03 Burro<br /> 3° 8032  08 Camelo<br /> 4° 1259  15 Jacaré<br /> 5° 2156  14 Gato<br /> 6° 1341  11 Cavalo<br /> 7° 113  04 Borboleta<' (length=248)
      1 => string '10/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'QUINTA FEIRA' (length=12)
      4 => string '&#8211' (length=6)
      5 => string '14' (length=2)
      6 => string 'PT-SP' (length=5)
      7 => string '1' (length=1)
      8 => string '6483' (length=4)
      9 => string '2' (length=1)
      10 => string '3411' (length=4)
      11 => string '3' (length=1)
      12 => string '8032' (length=4)
      13 => string '4' (length=1)
      14 => string '1259' (length=4)
      15 => string '5' (length=1)
      16 => string '2156' (length=4)
      17 => string '6' (length=1)
      18 => string '1341' (length=4)
      19 => string '7' (length=1)
      20 => string '113' (length=3)
  5 => 
    array (size=21)
      0 => string '<span>09/08/2017 QUARTA FEIRA EXTRAÇÃO DAS 13:20 HORAS PT-SP<br /> </span>1• 8222  06 Cabra<br /> 2• 9302  01 Avestruz<br /> 3• 1143  11 Cavalo<br /> 4• 0626  07 Carneiro<br /> 5• 7363  16 Leão<br /> 6• 6656  14 Gato<br /> 7• 481  21 Touro<' (length=264)
      1 => string '09/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'QUARTA FEIRA EXTRAÇÃO DAS' (length=27)
      4 => string '' (length=0)
      5 => string '13:20' (length=5)
      6 => string 'PT-SP' (length=5)
      7 => string '1' (length=1)
      8 => string '8222' (length=4)
      9 => string '2' (length=1)
      10 => string '9302' (length=4)
      11 => string '3' (length=1)
      12 => string '1143' (length=4)
      13 => string '4' (length=1)
      14 => string '0626' (length=4)
      15 => string '5' (length=1)
      16 => string '7363' (length=4)
      17 => string '6' (length=1)
      18 => string '6656' (length=4)
      19 => string '7' (length=1)
      20 => string '481' (length=3)
  6 => 
    array (size=21)
      0 => string '<span>08/08/2017 TERÇA FEIRA &#8211; 18 HORAS PTN-SP<br /> </span>1• 3686  22 Tigre<br /> 2• 8315  04 Borboleta<br /> 3• 0928  07  Carneiro<br /> 4• 8461  16 Leão<br /> 5• 6494  24 Veado<br /> 6• 7884  21 Touro<br /> 7• 649  13 Galo<' (length=257)
      1 => string '08/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'TERÇA FEIRA' (length=12)
      4 => string '&#8211' (length=6)
      5 => string '18' (length=2)
      6 => string 'PTN-SP' (length=6)
      7 => string '1' (length=1)
      8 => string '3686' (length=4)
      9 => string '2' (length=1)
      10 => string '8315' (length=4)
      11 => string '3' (length=1)
      12 => string '0928' (length=4)
      13 => string '4' (length=1)
      14 => string '8461' (length=4)
      15 => string '5' (length=1)
      16 => string '6494' (length=4)
      17 => string '6' (length=1)
      18 => string '7884' (length=4)
      19 => string '7' (length=1)
      20 => string '649' (length=3)
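
For anyone who wants to try this without fetching the page, here is a reduced sketch. It assumes a hard-coded sample string copied from the first match above and a much shorter, hypothetical pattern that captures only the header fields (date, weekday, hour, and code). The accented range is written as \x{C1}-\x{FF} so the pattern does not depend on the file encoding. It is not the full pattern from this answer.

```php
<?php
// Hypothetical sample copied from the first match; accented bytes are escaped explicitly.
$html = "<span>12/08/2017 S\xC3\x81BADO &#8211; 14 HORAS PT-SP<br /> </span>"
      . "1\xC2\xB0 6319  05 Cachorro<br /> 2\xC2\xB0 7792  23 Urso<";

// Reduced pattern (illustrative only): header fields, with the u flag and \h.
$pattern = '/<span>([0-9\/]{10})\h+([A-Za-z\x{C1}-\x{FF}\h]+?)\h+(?:&#[0-9]{1,6};\h*)?'
         . '([0-9]{2}(?::[0-9]{2})?)\h+[A-Z]+\h+([A-Z]{2,}-[A-Z]+)<br\h\/>/u';

$data = [];
$count = preg_match_all($pattern, $html, $data, PREG_SET_ORDER);

if ($count === false) {
    // preg_match_all() returns false on failure; preg_last_error() gives the error code.
    echo 'Regex error code: ', preg_last_error(), PHP_EOL;
} else {
    foreach ($data as $row) {
        // $row[1] = date, $row[2] = weekday, $row[3] = hour, $row[4] = code
        echo $row[1], ' | ', $row[2], ' | ', $row[3], ' | ', $row[4], PHP_EOL;
    }
}
```

Checking the return value against false is worth keeping in the real script too, since it separates "the pattern found nothing" from "the regex engine reported an error".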
    
13.08.2017 / 14:30