Although it is possible to write a regex (possibly a quite complicated one) involving lookaheads and lookbehinds, I find it easier to use a little "trick": capture groups.
Basically, if you have a string like this:
$texto = "123 abc '456' def789'112' ghi";
As far as I understand, you only want to capture 123 and 789, since they are the numbers that are not enclosed in single quotation marks ('). So you could use an expression like this:
preg_match_all("/\'\d+\'|(\d+)/", $texto, $matches);
This regex uses alternation (|) to say that you want one thing or another. These "things" are:
- a number in single quotation marks: '\d+', or
- a number (without the quotation marks) inside parentheses, to form a capture group: (\d+)
Note that the single quotes in the regex are escaped with \ because the pattern is inside a string (inside a double-quoted PHP string this escape is optional, but harmless).
With this, a match of the regex falls into one of two cases:
- if the number is in single quotation marks, it matches the first alternative
- otherwise, it matches the second alternative
In the first case the capture group is not filled; in the second case it is.
So, to get the numbers that are not enclosed in single quotation marks, just check whether the capture group was filled. To get the array back in a format that makes this check easier, we can use the PREG_SET_ORDER option:
$texto = "123 abc '456' def789'112' ghi";
preg_match_all("/\'\d+\'|(\d+)/", $texto, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
This code produces the following output:
array(4) {
[0]=>
array(2) {
[0]=>
string(3) "123"
[1]=>
string(3) "123"
}
[1]=>
array(1) {
[0]=>
string(5) "'456'"
}
[2]=>
array(2) {
[0]=>
string(3) "789"
[1]=>
string(3) "789"
}
[3]=>
array(1) {
[0]=>
string(5) "'112'"
}
}
Notice that for the matches that fall in the second case (the number is not in single quotation marks), the array has 2 positions. The first one holds the whole match, and the second holds the capture group (here they are the same, but depending on the expression they may differ).
When the number is enclosed in quotation marks, its array has only one position, because in that case the capture group is not filled.
Then just loop through the $matches array and check which of the inner arrays have the capture group set (that is, check whether the size is greater than 1):
foreach ($matches as $m) {
    if (count($m) > 1) { // capture group filled (the number is not in quotation marks)
        echo $m[1] . "\n";
    }
}
The output of this foreach is:
123
789
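To reuse the steps above, they can be wrapped in a small function. This is only a sketch: the name unquotedNumbers is hypothetical (it is not part of the original code), and it assumes the same regex and the PREG_SET_ORDER behaviour shown in the output above, where the inner array has no index 1 when the capture group is not filled.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the original answer):
// returns the numbers that are NOT enclosed in single quotation marks.
function unquotedNumbers(string $text): array
{
    // Same regex as above; PREG_SET_ORDER yields one inner array per match.
    preg_match_all("/'\d+'|(\d+)/", $text, $matches, PREG_SET_ORDER);

    $numbers = [];
    foreach ($matches as $m) {
        // Index 1 only exists when the capture group took part in the match.
        if (isset($m[1])) {
            $numbers[] = $m[1];
        }
    }
    return $numbers;
}

print_r(unquotedNumbers("123 abc '456' def789'112' ghi"));
```

For the example string this should give the same two values as the foreach above (123 and 789), returned as strings.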
If you want numbers with decimal places, just change \d+ to \d+\.\d+ (which inside the string can also be written with doubled backslashes, \\d+\\.\\d+) or to any other expression you are using to match numbers.
If the decimal places are optional, for example, you can use \d+(?:\.\d+)?. It is not the specific focus of the question, but validating numbers can become tricky, since it all depends on which cases you want to consider.
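As a rough illustration of the optional-decimal variant, the sketch below plugs \d+(?:\.\d+)? into both alternatives of the same regex. The sample string and the variable names ($number, $pattern, $decimals) are made up for this example, and it only covers the simple "digits, optionally followed by a dot and more digits" format.

```php
<?php
// Sub-pattern for a number: integer part plus an optional fractional part.
$number = '\d+(?:\.\d+)?';

// Quoted number (ignored) OR unquoted number (captured in group 1).
$pattern = "/'" . $number . "'|(" . $number . ")/";

// Made-up sample input, assumed only for this illustration.
$texto = "1.5 abc '2.75' def3 '4'";

preg_match_all($pattern, $texto, $matches, PREG_SET_ORDER);

$decimals = [];
foreach ($matches as $m) {
    if (count($m) > 1) { // capture group filled: the number is not quoted
        $decimals[] = $m[1];
    }
}

print_r($decimals);
```

Here the unquoted values are 1.5 and 3, while '2.75' and '4' are consumed by the first alternative and skipped.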
As reminded by @fernandosavio in
19.12.2018 / 17:52