Capture data following pattern to last space

-1

I'm scraping a website and would like to capture the value that follows each label in the pattern below:

Advogado: XXXXX Número do Processo: XXXXXX OutroCampo: XXXXX

In general, these pieces of information are separated by a space, so what should be captured is, for example, Advogado: Bill Gates (followed by a space or tab).

Pattern:

FIELD_NAME: (optional space) value to be captured (trailing space)

I started with this regex, but it only captures the label at the beginning and not the value that follows it:

regex: \w+:\s{1}

    
asked by anonymous 04.01.2019 / 01:13

2 answers

1

See if that's what you need:

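<?php

$string = 'Advogado: XXX XX Número do Processo: XX XXXX OutroCampo: XXX XX';

// Odd-numbered groups capture the labels; even-numbered groups capture the values between them.
preg_match_all('/(Advogado\:)(.+?)(Número\sdo\sProcesso\:)(.+?)(OutroCampo\:)(.+?)$/', $string, $matches);

// The captured values keep their surrounding spaces; wrap them in trim() if that matters.
echo 'MATCHES: <br>';
echo 'Advogado: '.$matches[2][0].'<br>';
echo 'Processo: '.$matches[4][0].'<br>';
echo 'Outro campo: '.$matches[6][0].'<br>';

// Dump the full match array for inspection.
echo '<pre>';
print_r($matches);
echo '</pre>';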
  

Output:

MATCHES: 
Advogado: XXX XX 
Processo: XX XXXX 
Outro campo: XXX XX
Array
(
    [0] => Array
        (
            [0] => Advogado: XXX XX Número do Processo: XX XXXX OutroCampo: XXX XX
        )

    [1] => Array
        (
            [0] => Advogado:
        )

    [2] => Array
        (
            [0] =>  XXX XX 
        )

    [3] => Array
        (
            [0] => Número do Processo:
        )

    [4] => Array
        (
            [0] =>  XX XXXX 
        )

    [5] => Array
        (
            [0] => OutroCampo:
        )

    [6] => Array
        (
            [0] =>  XXX XX
        )

)

Example on RegEx101.com
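
If the field names are known in advance, one possible variation is to list them in an alternation and stop each value at the next label with a lookahead, which also drops the surrounding spaces. This is only a sketch under that assumption: the `extrairCampos` helper and its `$campos` parameter are illustrative names, not part of the original answer, and the arrow function requires PHP 7.4 or later.

```php
<?php

// Hypothetical helper: returns [label => value] for every known label found in $texto.
function extrairCampos(string $texto, array $campos): array
{
    // Escape each label and join them: Advogado|Número do Processo|OutroCampo
    $rotulos = implode('|', array_map(fn($c) => preg_quote($c, '/'), $campos));

    // Label, colon, lazy value, then stop right before the next label or the end of the string.
    $padrao = '/(' . $rotulos . '):\s*(.*?)\s*(?=(?:' . $rotulos . '):|$)/u';

    preg_match_all($padrao, $texto, $m, PREG_SET_ORDER);

    $resultado = [];
    foreach ($m as $grupo) {
        $resultado[$grupo[1]] = $grupo[2];
    }
    return $resultado;
}

$string = 'Advogado: XXX XX Número do Processo: XX XXXX OutroCampo: XXX XX';
$dados = extrairCampos($string, ['Advogado', 'Número do Processo', 'OutroCampo']);

print_r($dados);
```

Because the values are matched lazily and the spaces sit outside the capturing group, no trim() is needed afterwards. Adding a new field only means adding its label to the list.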

    
04.01.2019 / 20:27
1

Using spaces as the field delimiter while also allowing spaces inside the field values can become a problem. It is unlikely, but if the lawyer's name were "Número do Processo: XXXXXX OutroCampo: XXXXX", the input would be hard to validate.
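
To make the ambiguity concrete, here is a small sketch with a made-up input in which the lawyer's "name" itself contains a field label. The input string is hypothetical; a lazy and a greedy quantifier each pick a different value, and the regex alone cannot tell which one was intended.

```php
<?php

// Hypothetical input: the value of "Advogado" contains the next label.
$entrada = 'Advogado: Número do Processo: 1 Número do Processo: 2';

// Lazy: stops at the first occurrence of the label.
preg_match('/^Advogado:\s?(.*?)\s?Número do Processo/', $entrada, $lazy);

// Greedy: stretches up to the last occurrence of the label.
preg_match('/^Advogado:\s?(.*)\s?Número do Processo/', $entrada, $greedy);

var_dump($lazy[1]);   // empty string
var_dump($greedy[1]); // everything up to the last label, trailing space included
```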

Anyway, I imagine that all you need is a simple '/^Advogado:\s?(.*?)\s?Número do Processo/'. There is no reason to match a sequence of letters with \w+, since the first word will always be Advogado. The lazy (.*?) keeps the optional trailing space out of the captured group.

Example:

$entrada = "Advogado: Bill Gates Número do Processo: XXXXXX OutroCampo: XXXXX";
preg_match('/^Advogado:\s?(.*?)\s?Número do Processo/', $entrada, $match);

// the whole match: 'Advogado: Bill Gates Número do Processo'
echo $match[0];

echo '<br>';

// only the parenthesized group: 'Bill Gates'
echo $match[1];
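
On scraped pages the field may be missing, in which case preg_match returns 0 and $match[1] is not set. A minimal sketch of a guard around the same idea follows; `extrairAdvogado` is a hypothetical helper name, and the lazy `(.*?)` is used so that the captured name carries no trailing space.

```php
<?php

// Hypothetical helper: returns the lawyer's name, or null when the pattern is absent.
function extrairAdvogado(string $entrada): ?string
{
    if (preg_match('/^Advogado:\s?(.*?)\s?Número do Processo/', $entrada, $match) === 1) {
        return $match[1];
    }
    return null;
}

var_dump(extrairAdvogado('Advogado: Bill Gates Número do Processo: XXXXXX OutroCampo: XXXXX'));
var_dump(extrairAdvogado('texto sem os campos esperados'));
```

Returning null instead of reading $match blindly lets the calling code decide how to handle pages where the label is absent.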
    
04.01.2019 / 01:43