Regular expression to retrieve strings that begin with a colon (:)

4

I need a regular expression to retrieve a list of strings that start with a colon (":") and end with a space character or a closing parenthesis (")").

Example:

String texto = "(:TEXTOQUALQUER NADA DO FOI :TEXTOQQDENOVO SERÁ DE NOVO :TEXTOQQMAIS DO JEITO QUE UM DIA :TEXTO3343)";

Note

There is no fixed pattern for the "keywords", the words that are preceded by the colon; they vary in size and are followed by white space or a closing parenthesis, as already mentioned. How can I get only the list of these strings?

Expected result

[TEXTOQUALQUER, TEXTOQQDENOVO, TEXTOQQMAIS, TEXTO3343]

    
asked by anonymous 11.12.2014 / 20:15

2 answers

5

The expression suggested by Sergio in the comments seems to be the simplest way, except for the closing parenthesis (which was not mentioned in the question) and the missing white space (as pointed out by Gustavo Cinque in the comments). My suggestion is to use it to find all matches:

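import java.util.*;
import java.util.regex.*;

List<String> resultado = new ArrayList<String>();
// Inside a character class ')' needs no escape; "\)" is not a valid Java string escape.
Matcher m = Pattern.compile(":([^:) ]+)").matcher(texto);
while ( m.find() )
    resultado.add(m.group(1)); // group 1 is the keyword without the leading colon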
  

Note: My previous answer (in the edit history) does not apply in this case: first, because it is no longer necessary to use trim (the matched string no longer contains whitespace); second, because it is not necessary to remove the middle spaces (likewise).
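
As a self-contained sketch of the regex approach above, the matching loop can be wrapped in a small program. The class name KeywordExtractor and the method name extract are hypothetical, chosen only for this illustration, and the sample string is a shortened version of the one in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class KeywordExtractor {
    // A colon followed by one or more characters that are not ':', ')' or a space.
    private static final Pattern KEYWORD = Pattern.compile(":([^:) ]+)");

    static List<String> extract(String texto) {
        List<String> resultado = new ArrayList<String>();
        Matcher m = KEYWORD.matcher(texto);
        while (m.find()) {
            // Group 1 holds the keyword without the leading colon.
            resultado.add(m.group(1));
        }
        return resultado;
    }

    public static void main(String[] args) {
        // Shortened sample in the same shape as the string from the question.
        String texto = "(:TEXTOQUALQUER NADA :TEXTOQQDENOVO DE NOVO :TEXTO3343)";
        System.out.println(extract(texto));
        // prints [TEXTOQUALQUER, TEXTOQQDENOVO, TEXTO3343]
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call, which only matters if extract is called many times.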

    
11.12.2014 / 21:04
1

Answer without using RegEx:

import java.util.*;

class Program {
    public static void main (String[] args) {
        String texto = "(:TEXTOQUALQUER NADA DO FOI :TEXTOQQDENOVO SERÁ DE NOVO :TEXTOQQMAIS DO JEITO QUE UM DIA :TEXTO3343)";
        List<String> textos = new ArrayList<String>();
        while (texto.indexOf(":") != -1) { // stop when no ':' is left
            texto = texto.substring(texto.indexOf(":") + 1);
            int posicaoParentese = texto.indexOf(")");
            int posicaoEspaco = texto.indexOf(" ");
            // Treat a missing delimiter as "end of string" so substring() cannot go out of range.
            int posicaoFinal = Math.min((posicaoParentese == -1 ? texto.length() : posicaoParentese), (posicaoEspaco == -1 ? texto.length() : posicaoEspaco));
            textos.add(texto.substring(0, posicaoFinal));
            texto = texto.substring(posicaoFinal);
        }
        for (String item : textos) System.out.println(item);
    }
}

See it running on ideone and on Coding Ground. I also put it on GitHub for future reference.

I will leave the previous attempts to help anyone who has a similar problem. The question was rather confusing, forcing the answers (not just mine) to be edited to arrive at the desired result. I hope it is right now.

Rereading your question, I think you want something else; I think it would be just this:

import java.util.*;

class Program {
    public static void main (String[] args) {
        String texto = "(:TEXTOQUALQUER NADA DO FOI :TEXTOQQDENOVO SERÁ DE NOVO :TEXTOQQMAIS DO JEITO QUE UM DIA :TEXTO3343)";
        List<String> textos = new ArrayList<String>();
        while (texto.indexOf(":") != -1) { // stop when no ':' is left
            texto = texto.substring(texto.indexOf(":") + 1);
            int posicaoParentese = texto.indexOf(")");
            int posicaoEspaco = texto.indexOf(" ");
            // Treat a missing delimiter as "end of string" so substring() cannot go out of range.
            int posicaoFinal = Math.min((posicaoParentese == -1 ? texto.length() : posicaoParentese), (posicaoEspaco == -1 ? texto.length() : posicaoEspaco));
            textos.add(texto.substring(0, posicaoFinal));
            texto = texto.substring(posicaoFinal);
        }
        for (String item : textos) System.out.println(item);
    }
}

See it running on ideone and on Coding Ground. I also put it on GitHub for future reference.

If this has not been answered yet: you do not need RegEx for this, just a split():

class Program {
    public static void main (String[] args) {
        String texto = "(:TEXTOQUALQUER NADA DO FOI :TEXTOQQDENOVO SERÁ DE NOVO :TEXTOQQMAIS DO JEITO QUE UM DIA :TEXTO3343)";
        String[] textos = texto.split(":");
        for (String item : textos) System.out.println(item);
    }
}

See it running on ideone and on Coding Ground. I also put it on GitHub for future reference.

If you do not want what comes before the first ":", simply ignore element 0 of the array (textos[0]).
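
A plain split(":") leaves everything up to the next colon in each element, not just the keyword. The sketch below, an assumption about how the split idea could be completed rather than part of the original answer, skips element 0 and cuts each remaining piece at the first space or closing parenthesis. The class name SplitKeywords and the method name extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, shown only to illustrate the split-based idea.
class SplitKeywords {
    static List<String> extract(String texto) {
        List<String> textos = new ArrayList<String>();
        String[] partes = texto.split(":");
        // Start at 1: element 0 is whatever precedes the first colon.
        for (int i = 1; i < partes.length; i++) {
            // Split again on a space or ')' and keep only the first token.
            String[] tokens = partes[i].split("[ )]");
            // Skip pieces where the colon is not directly followed by a keyword.
            if (tokens.length > 0 && !tokens[0].isEmpty()) {
                textos.add(tokens[0]);
            }
        }
        return textos;
    }

    public static void main(String[] args) {
        // Shortened sample in the same shape as the string from the question.
        String texto = "(:TEXTOQUALQUER NADA :TEXTOQQDENOVO DE NOVO :TEXTO3343)";
        for (String item : extract(texto)) System.out.println(item);
    }
}
```

Note that split() itself takes a regular expression, so this version avoids Pattern and Matcher but not regex syntax altogether.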

    
11.12.2014 / 20:39