Delete letters before or after a number

2

Hello, I have a string (var test) and I'd like to do some operations on it. The desired result is a number preceded by the letter p or s, or by ps (e.g. ps1), and followed by any letters a-z, in any quantity, but without repeating the same letter (e.g. p1abd would match, but p1abbd would not).

I thought the ideal would be test[i].match(/\b(p|s|ps)\d[a-z]*\b/), but, as you can see in the 1st Example, the array comes with 2 values, the second being the letter(s) before the number, which is undesirable: I just want one value, the first. It seems this has to do with the parentheses, but I could not find another combination that worked.

In the 2nd Example the output looks exactly as I want, but the regex is wrong: it accepts the letters p and s, but does not match ps.

In the 3rd Example I want to delete everything except the letters after the number, but I had problems, probably because of the array with two values. In the 4th Example I want to delete everything except the number between the letters.

For the 3rd and 4th Examples I know there are simpler ways, for example test[i].match(/\d/g).toString() to display only the number. But I'd like to know, for learning purposes, how to isolate the part of the pattern to be deleted, as I did in the 3rd Example. I tried something like ...replace(/[p|s|ps][^\d][a-z]*/, '')), but it did not work.

var test = 'xyz p1abc xyz; xyz s3de xyz; xyz ps2fgh xyz'; // p1abc, s3de, ps2fgh
test = test.split(';');

for (var i = 0; i < test.length; i++) {
  test[i] = test[i].replace(/^\s+|\s+$/g, ''); // trim leading/trailing whitespace

  // Show the letters 'p', 's' or 'ps' BEFORE the number, and any letters AFTER it, in any quantity.
  console.log(test[i].match(/\b(p|s|ps)\d[a-z]*\b/)); // 1st Example
  // Result:
  // Array [ "p1abc", "p" ]   // repeats the letter p
  // Array [ "s3de", "s" ]    // repeats the letter s
  // Array [ "ps2fgh", "ps" ] // repeats the letters ps

  console.log(test[i].match(/\b[p|s|ps]\d[a-z]*\b/)); // 2nd Example
  // Only fails because it does not include 'ps'. Result:
  // Array [ "p1abc" ]
  // Array [ "s3de" ]
  // null

  // Show only the letters AFTER the number. ([a-z]*) // 3rd Example
  console.log(test[i].match(/\b[p|s|ps]\d[a-z]*\b/).toString().replace(/[p|s|ps]\d[^a-z]*/, ''));
  // Again, only fails because it does not include 'ps'. Result:
  // TypeError: test[i].match(...) is null
  // "abc"
  // "de"

  // Show only the number. (\d) // 4th Example
  console.log(test[i].match(/\b[p|s|ps]\d[a-z]*\b/).toString().replace(/[p|s|ps]\d[a-z]*/, ''));
  // I found no solution for this one.
}
    
asked by anonymous 18.03.2015 / 05:33

1 answer

3

You need to understand how capture groups work:

  • The first value returned is always the entire match (this cannot be changed);
  • For every (...) in the regex, an additional value is returned, in consecutive indexes, corresponding to that part of the regex; it can be empty (e.g., (a?)b applied to b yields an empty group).
  • If you do not want a group to be captured, use (?:...).

Example:

\b(p|s|ps)(\d)([a-z]*)\b

Three capture groups. Some possible results:

p1abc ==> "p1abc", "p", "1", "abc"
ps2   ==> "ps2", "ps", "2", ""

Another example:

\b(?:p|s|ps)\d[a-z]*\b

No capture groups:

p1abc ==> "p1abc"
ps2   ==> "ps2"

In the next three examples, only one part of the string is captured and the others are not:

\b(p|s|ps)\d[a-z]*\b
    p1abc ==> "p1abc", "p"
    ps2   ==> "ps2", "ps"

\b(?:p|s|ps)(\d)[a-z]*\b
    p1abc ==> "p1abc", "1"
    ps2   ==> "ps2", "2"

\b(?:p|s|ps)\d([a-z]*)\b
    p1abc ==> "p1abc", "abc"
    ps2   ==> "ps2", ""
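
The patterns above can be tried directly in JavaScript with String.prototype.match. This is only a minimal sketch: the helper name groups is hypothetical (it is not part of the question) and just copies the match result into a plain array so the indexes are easy to inspect.

```javascript
// Hypothetical helper: returns the match result as a plain array, or null.
function groups(regex, text) {
  var m = text.match(regex); // null when nothing matches
  return m ? m.slice() : null;
}

// Three capture groups: index 0 is the whole match, 1..3 are the groups.
var withGroups = groups(/\b(p|s|ps)(\d)([a-z]*)\b/, 'xyz ps2fgh xyz');
// ["ps2fgh", "ps", "2", "fgh"]

// Non-capturing group (?:...): only the whole match is returned.
var noGroups = groups(/\b(?:p|s|ps)\d[a-z]*\b/, 'xyz ps2fgh xyz');
// ["ps2fgh"]

// A group that matched nothing still occupies its index, as an empty string.
var emptySuffix = groups(/\b(?:p|s|ps)\d([a-z]*)\b/, 'xyz ps2 xyz');
// ["ps2", ""]
```

With the non-capturing version, noGroups[0] is the single value the question asks for.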

Substitutions

Once you have established capture groups in your regex, in addition to returning them in the match method you can also reference them and use them during a replacement in replace. You do this using $n, where n is the index of the group (starting with 1). For example, assuming the pattern with all three groups, let's say you want to replace the prefix, digit, or suffix with "foo", keeping the rest intact:

"p1abc".replace(/\b(p|s|ps)(\d)([a-z]*)\b/, 'foo$2$3'); // foo1abc
"p1abc".replace(/\b(p|s|ps)(\d)([a-z]*)\b/, '$1foo$3'); // pfooabc
"p1abc".replace(/\b(p|s|ps)(\d)([a-z]*)\b/, '$1$2foo'); // p1foo

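The same $n references can cover the 3rd and 4th Examples from the question (keep only the suffix, or only the digit). The sketch below assumes the three-group pattern from this answer; the helper name extract and the variable names are hypothetical.

```javascript
var pattern = /\b(p|s|ps)(\d)([a-z]*)\b/;

// Hypothetical helper: finds the first token matching `pattern` in `text`
// and rewrites that token with `template` ($1 = prefix, $2 = digit, $3 = suffix).
// Returns null when nothing matches, instead of throwing a TypeError.
function extract(text, template) {
  var m = text.match(pattern);
  return m ? m[0].replace(pattern, template) : null;
}

var test = 'xyz p1abc xyz; xyz s3de xyz; xyz ps2fgh xyz'.split(';');

// 3rd Example: only the letters after the number.
var suffixes = test.map(function (t) { return extract(t, '$3'); });
// ["abc", "de", "fgh"]

// 4th Example: only the number.
var digits = test.map(function (t) { return extract(t, '$2'); });
// ["1", "3", "2"]
```

Reading m[3] or m[2] straight from the match result gives the same values; the replace form is shown only because it mirrors what the question was attempting.
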
Observations

  • If you really cannot use capture groups, and need to match only part of the string, read about lookarounds. These regexes, for example, match only the prefix, only the number and only the suffix:

    \b(?:p|s|ps)(?=\d[a-z]*\b)
    (?<=\bp|\bs|\bps)\d(?=[a-z]*\b)
    (?<=\bp\d|\bs\d|\bps\d)[a-z]*\b
    

    But avoid this if you can: capture groups are much simpler to understand and apply, and have fewer restrictions than lookarounds. For example, JavaScript only gained lookbehind support in ES2018, and the vast majority of languages accept only fixed-length lookbehinds.

  • Your 2nd Example did not work because square brackets match a single character among those listed. Your [p|s|ps] would match p, | or s!

  • If you do not want the letters at the end to repeat, I suggest checking this outside the regex. It is theoretically possible (it is a regular language), but in practice the number of states would equal the number of possible combinations (since the regex would need to "remember" which letters have already appeared in order not to allow them again). The performance would probably be pretty bad...

    But... it is not impractical after all! This answer on SOen shows a way to match a string without repeated characters by combining a capture group, a backreference (a reference to an already captured group) and a negative lookahead:

    ^(?:([A-Za-z])(?!.*\1))*$
    

    Adapted to your case, with all the capture groups (you cannot get rid of the last one, so I gave up and included them all), it would look like this:

    \b(p|s|ps)(\d)((?:([a-z])(?![a-z]*\4))*)\b
    

    See an example on Rubular: the matched portion is highlighted, and capture groups 1 through 3 show the prefix, the digit and the suffix (group 4 is useless).
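
Both options for the no-repeat rule can be sketched in JavaScript. The regex below is the adapted pattern with the backreference \4 written out; hasUniqueSuffix is a hypothetical helper illustrating the "do it outside the regex" suggestion, and it assumes the token has already been isolated from the surrounding text.

```javascript
// Regex approach: group 4 is one suffix letter, and (?![a-z]*\4) rejects it
// when the same letter appears again later in the suffix.
var noRepeat = /\b(p|s|ps)(\d)((?:([a-z])(?![a-z]*\4))*)\b/;

// Plain-JavaScript approach (hypothetical helper): match the token first,
// then check the suffix letter by letter.
function hasUniqueSuffix(token) {
  var m = token.match(/^(p|s|ps)(\d)([a-z]*)$/);
  if (!m) return false;
  var seen = {};
  for (var i = 0; i < m[3].length; i++) {
    var letter = m[3][i];
    if (seen[letter]) return false; // letter already appeared
    seen[letter] = true;
  }
  return true;
}

var okRegex = noRepeat.test('p1abd');       // true
var badRegex = noRepeat.test('p1abbd');     // false: b repeats
var okPlain = hasUniqueSuffix('p1abd');     // true
var badPlain = hasUniqueSuffix('p1abbd');   // false
```

The second version is longer, but it keeps the regex simple and makes the uniqueness rule explicit.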

18.03.2015 / 06:47