Regular expression to clone a value in a field

I have the following values:

PRO|00000001|GASOLINA ADITIVADA|0101001|27101259|

I would like an expression that changes them to look like this:

PRO|00000001|GASOLINA ADITIVADA|00000001|27101259|

I already have a rough idea of what it should look like:

(^PRO\|)(\d*)(.\w*.)

But the problem is that it only matches:

PRO|00000001|GASOLINA 

And I'm not getting the rest of the value, ADITIVADA|.

asked by anonymous 26.11.2018 / 17:10

1 answer

I find it easier to split the line on | and then concatenate the fields you need, but if you want to use a regex, here goes.
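A minimal sketch of the split approach, assuming Python (the question does not name a language). The helper name clone_field and its parameters are made up for illustration:

```python
def clone_field(line: str, src: int = 1, dst: int = 3) -> str:
    """Copy the field at index src over the field at index dst (0-based)."""
    fields = line.split("|")
    fields[dst] = fields[src]
    # The trailing "|" yields an empty last element, so join restores it.
    return "|".join(fields)


line = "PRO|00000001|GASOLINA ADITIVADA|0101001|27101259|"
print(clone_field(line))  # PRO|00000001|GASOLINA ADITIVADA|00000001|27101259|
```

This does not check that the line starts with PRO or that the fields contain digits; it only moves values around.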

If your entries are always separated by | and always in this order, you can be more specific, saying exactly what you want and what you do not want.

If you only want the lines that start with "PRO" and contain "GASOLINA ADITIVADA", you can use these texts explicitly. Otherwise, you can use [^|], which means "any character other than |".

Using the dot (.) is too broad because it means "any character". By explicitly using | for the field separator and [^|] for "anything other than the separator", the regex becomes more specific to your data.
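To see why the pattern from the question stops after "GASOLINA ", here is a small sketch, again assuming Python's re module. The first dot matches the |, \w* stops at the space (a space is not a word character), and the second dot matches that space:

```python
import re

line = "PRO|00000001|GASOLINA ADITIVADA|0101001|27101259|"

# Pattern from the question: too loose, and it ends right after the space.
loose = re.match(r"(^PRO\|)(\d*)(.\w*.)", line)
print(repr(loose.group(0)))  # 'PRO|00000001|GASOLINA '

# [^|]+ keeps going until the next separator, so the space is included.
strict = re.match(r"^PRO\|\d+\|([^|]+)\|", line)
print(repr(strict.group(1)))  # 'GASOLINA ADITIVADA'
```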

Another detail is deciding whether to use + instead of *. The * means "zero or more occurrences", i.e. an empty field is also valid. The + means "one or more occurrences", i.e. the field cannot be empty.

The same goes for the numbers, because \d* will accept an empty field. It is better to use \d+, which requires at least one digit. Or, if you know the exact quantity, use \d{8} for exactly 8 digits, \d{8,} for "8 or more digits", or even \d{8,20} for "between 8 and 20 digits". Choose what best fits your use cases and adjust the quantities to what you need.
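The quantifier differences can be checked quickly. This is only an illustrative sketch in Python; field_ok is a hypothetical helper that tests a single field against a pattern:

```python
import re


def field_ok(pattern: str, field: str) -> bool:
    """Return True if the whole field matches the pattern."""
    return re.fullmatch(pattern, field) is not None


print(field_ok(r"\d*", ""))                # True: empty field is accepted
print(field_ok(r"\d+", ""))                # False: at least one digit required
print(field_ok(r"\d{8}", "00000001"))      # True: exactly 8 digits
print(field_ok(r"\d{8}", "0101001"))       # False: only 7 digits
print(field_ok(r"\d{8,20}", "27101259"))   # True: between 8 and 20 digits
```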

Anyway, a regex option would be:

^PRO\|\d+\|[^|]+\|\d+\|.*$

Note that | must be escaped and written as \|, since an unescaped | means alternation (i.e. PRO|\d+ means "PRO" or digits). With this we have:
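A short sketch of the difference between the escaped and unescaped |, assuming Python's re module (the test strings are made up):

```python
import re

# Unescaped: "PRO" OR one or more digits, so a bare number matches too.
alt_number = re.fullmatch(r"PRO|\d+", "12345") is not None
alt_record = re.fullmatch(r"PRO|\d+", "PRO|12345") is not None

# Escaped: the literal text "PRO|" followed by one or more digits.
lit_record = re.fullmatch(r"PRO\|\d+", "PRO|12345") is not None

print(alt_number, alt_record, lit_record)  # True False True
```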

  • ^PRO\| : starts with "PRO", followed by |
  • \d+\| : digits, followed by |
  • [^|]+\| : one or more characters that are not | , followed by |
  • \d+\| : digits, followed by |
  • .*$ : zero or more characters, to the end of the string ( $ )
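As a quick check of the pattern above, here is a sketch in Python. Only the first sample line comes from the question; the other two are invented to show rejections:

```python
import re

PATTERN = re.compile(r"^PRO\|\d+\|[^|]+\|\d+\|.*$")

samples = [
    "PRO|00000001|GASOLINA ADITIVADA|0101001|27101259|",  # from the question
    "PRO||GASOLINA ADITIVADA|0101001|27101259|",          # invented: empty 2nd field
    "ABC|00000001|GASOLINA ADITIVADA|0101001|27101259|",  # invented: wrong prefix
]

results = [PATTERN.match(s) is not None for s in samples]
print(results)  # [True, False, False]
```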

As for the substitution itself, it depends on the language you are using, since each has its own functions for replacing strings with a regex.

In any case, you usually use parentheses to group the parts you want to capture, so the regex looks like this:

^(PRO\|)(\d+\|)([^|]+\|)\d+\|(.*)$

The first pair of parentheses is (PRO\|), so this is the first group; the second pair is (\d+\|) (the digits plus |), so this is the second group, and so on.

To do the replacement, you use the $1 syntax to refer to the first group, $2 for the second, and so on. Depending on the language or engine, the syntax is \1, \2, etc. instead. The replacement string would therefore be $1$2$3$2$4 (group 2 is repeated in place of the fourth field).
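For illustration, here is how the substitution might look in Python, which is an assumption since the question does not say which language is in use. Python's re module refers to groups in the replacement string with backslashes rather than dollar signs:

```python
import re

PATTERN = re.compile(r"^(PRO\|)(\d+\|)([^|]+\|)\d+\|(.*)$")

line = "PRO|00000001|GASOLINA ADITIVADA|0101001|27101259|"

# Group 2 is written twice: once in its own place, once instead of field 4.
result = PATTERN.sub(r"\1\2\3\2\4", line)
print(result)  # PRO|00000001|GASOLINA ADITIVADA|00000001|27101259|
```

Lines that do not match the pattern are returned unchanged by sub, so it is safe to apply this to every line of a file.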

    
26.11.2018 / 17:45