Remove duplicate characters in a string unless they form a digraph


How can I use a regex to remove duplicate characters in a string, except when they form a digraph (rr, ss)? For example:

  

Oiiiii => Oi

Aloooo => Alo

Passado => Passado

Carroooo => Carro

If rr or ss appears at the beginning or end of a word, the duplication can be removed as well, e.g.:

Carrosss => Carros

    
asked by anonymous 21.04.2018 / 20:56

1 answer


Let me start by saying that this regex does not cover 100% of your cases, but it handles almost all of them. Honestly, I do not see a way to cover the rest without drastically complicating the regex, and perhaps resorting to hand-written code.

But let's start with the regex itself:

([^rs])(?=\1+)|(rr)(?=r+)|(ss)(?=s+)


Explanation:

([^rs])  - Any character other than r or s
(?=\1+)  - Followed by one or more copies of that same character
|(rr)    - Or two r's
(?=r+)   - Followed by more r's
|(ss)    - Or two s's
(?=s+)   - Followed by more s's

The replacement is the empty string, because what the regex matches is exactly the duplicate letters that you want to remove.

Test input:

oiiiiiiii amiggggos passssado Carrrrrrrros

Output:

oi amigos passado Carros
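As a minimal sketch of that substitution, assuming Python's re module (the original answer does not name a language), the test can be reproduced as follows. The name remove_duplicates is only illustrative, and the repeat counts of the test words are spelled out so that the even runs of r and s are visible:

```python
import re

# Pattern from the answer; \1 refers back to the character captured by the first group.
DUPLICATES = re.compile(r"([^rs])(?=\1+)|(rr)(?=r+)|(ss)(?=s+)")


def remove_duplicates(text: str) -> str:
    """Delete every match, i.e. the surplus repeated letters."""
    return DUPLICATES.sub("", text)


# Same words as the test above, built with explicit repeat counts.
words = [
    "o" + "i" * 8,
    "ami" + "g" * 4 + "os",
    "pa" + "s" * 4 + "ado",
    "Ca" + "r" * 8 + "os",
]
result = remove_duplicates(" ".join(words))
print(result)  # oi amigos passado Carros
```

Because the rr and ss alternatives consume the letters two at a time, a run with an even number of r's or s's ends up as the digraph.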

You can always adapt the regex to other letters that you want to allow doubled by changing the [^rs] class and adding groups like (rr) for whatever letters you want.
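One way to make that adjustment less error-prone is to generate the pattern from the list of letters. The sketch below assumes Python and plain letters as input; build_pattern is a hypothetical helper, not something from the original answer:

```python
import re


def build_pattern(keep_doubled: str) -> re.Pattern:
    """Build the regex for a set of letters that may stay doubled (hypothetical helper)."""
    # Any character outside the set, followed by copies of itself.
    single = rf"([^{re.escape(keep_doubled)}])(?=\1+)"
    # For each letter in the set: a pair of it, followed by more of the same letter.
    doubled = [
        rf"({re.escape(c)}{re.escape(c)})(?={re.escape(c)}+)" for c in keep_doubled
    ]
    return re.compile("|".join([single] + doubled))


# With "rs" this reproduces the regex from the answer.
print(build_pattern("rs").pattern)

# Also allowing "oo" to stay doubled:
print(build_pattern("rso").sub("", "cooool"))  # cool
```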

Note that if you enter Carross, the regex cannot tell that it was supposed to be Carros; handling that case would complicate things considerably.
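If that case matters, one option is a second pass in code rather than a more complicated regex. The following is only a sketch, assuming Python; the EDGE_DIGRAPH pattern and the clean_word_edges name are assumptions and not part of the original answer. It collapses a run of r or s that touches a word boundary, which matches the rule in the question that rr or ss at the beginning or end of a word can be reduced:

```python
import re

# Pattern from the answer.
DUPLICATES = re.compile(r"([^rs])(?=\1+)|(rr)(?=r+)|(ss)(?=s+)")
# Hypothetical second pass: a repeated r or s at the start or at the end of a word.
EDGE_DIGRAPH = re.compile(r"\b([rs])\1+|([rs])\2+\b")


def clean_word_edges(text: str) -> str:
    """Apply the answer's regex, then reduce rr/ss at word edges to one letter."""
    text = DUPLICATES.sub("", text)
    # Only one of the two groups takes part in a match; keep a single copy of it.
    return EDGE_DIGRAPH.sub(lambda m: m.group(1) or m.group(2), text)


print(DUPLICATES.sub("", "Carross"))  # Carross (the first pass alone leaves it)
print(clean_word_edges("Carross"))    # Carros
print(clean_word_edges("passado"))    # passado (digraph inside the word is kept)
```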

    
21.04.2018 / 21:39