I have text in a StringBuffer and I need to find and mark the words that appear more than once. At first I used a circular queue with 10 positions, because I only care about words repeated within a "radius" of 10 words.
Note that repeated words should be marked only if the occurrences are within 10 words of each other. If they are more than 10 words apart, they should not be marked.
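To make the rule above concrete, here is a minimal sketch of the distance check on its own, separate from any text handling. The class name RepeatWindow, the method repeatedWithin and the radius parameter are hypothetical names chosen for illustration, and the sketch assumes that "within a radius" means the positions of two equal words differ by at most radius. Comparison is case-sensitive, like equals in the code further down.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RepeatWindow {
    // Returns one flag per word: true when an equal word occurs
    // at most `radius` positions before or after it.
    static boolean[] repeatedWithin(List<String> words, int radius) {
        boolean[] repeated = new boolean[words.size()];
        // Position of the most recent occurrence of each word.
        Map<String, Integer> lastSeen = new HashMap<>();
        for (int i = 0; i < words.size(); i++) {
            Integer prev = lastSeen.put(words.get(i), i);
            // The most recent occurrence is the closest earlier one,
            // so checking it alone is enough.
            if (prev != null && i - prev <= radius) {
                repeated[prev] = true; // flag the earlier occurrence
                repeated[i] = true;    // flag the current occurrence
            }
        }
        return repeated;
    }
}
```

Because the flags are indexed by position, both the earlier and the later occurrence of a pair end up flagged, not only the later one.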
The Contem method returns "null" if there is no repetition, or otherwise returns the repeated word. string is just the variable that holds the full text.
StringBuffer stringProximas = new StringBuffer();
String has = "";
Pattern pR = Pattern.compile("[a-zA-Zà-úÀ-Ú]+");
Matcher mR = pR.matcher(string);
while(mR.find()){
    word = mR.group();
    nextWord.Inserir(word);//insert into the list
    has = nextWord.Contem();//checks whether the list contains equal words
    //an if to check whether has is null or not
    //and here the repeated word is marked, if has is not null
    mR.appendReplacement(stringProximas, "");
    stringProximas.append(has);
}
public void Inserir(String palavra){
    if(this.list[9].equals("null")){
        if(this.list[0].equals("null")){
            this.list[this.fim]=palavra;
        }else{
            this.fim++;
            this.list[this.fim] = palavra;
        }
    }else{
        //wrap the fim pointer around to position 0
        if(this.inicio == 0 && this.fim == 9){
            this.inicio++;
            this.fim = 0;
            this.list[this.fim] = palavra;
        }else if(this.inicio == 9 && this.fim == 8){//wrap the inicio pointer around to position 0
            this.inicio = 0;
            this.fim++;
            this.list[this.fim] = palavra;
        }else{
            this.inicio++;
            this.fim++;
            this.list[this.fim] = palavra;
        }
    }
}
public String Contem() throws Exception{
    for(int i=0;i<this.list.length;i++){
        for(int j=i+1;j<this.list.length;j++){
            if(this.list[i].equals(this.list[j]) && (!this.list[i].equals("null") || !this.list[j].equals("null"))){
                //do not report the same repetition more than once
                if(!this.list[i].equals("?")){
                    this.list[i] = "?";//this will probably be removed
                    return this.list[j];
                }
            }
        }
    }
    return "null";
}
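For comparison, the bookkeeping done by hand in Inserir and Contem could probably be delegated to java.util.ArrayDeque. The sketch below is only an illustration under that assumption: the class name WordWindow and its insert method are hypothetical, and insert merges the two steps by reporting whether the new word already occurs among the previous capacity words.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class WordWindow {
    private final int capacity;
    private final Deque<String> window = new ArrayDeque<>();

    WordWindow(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    // Adds a word and reports whether an equal word was already
    // among the previous `capacity` words.
    boolean insert(String word) {
        boolean seen = window.contains(word);
        if (window.size() == capacity) {
            window.removeFirst(); // drop the oldest word
        }
        window.addLast(word);
        return seen;
    }
}
```

With new WordWindow(10), each call to insert compares the incoming word against the ten words before it. Like the queue above, it can only report a repetition at the moment the later occurrence arrives.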
My big problem: when I find a repeated word, I can only mark the second occurrence. Even though the first one is still in the queue, the variable word holds the second one, and because of the while loop I cannot go back and mark the first.
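One possible way around this, sketched here as an assumption and not as a tested fix for the code above, is to split the work into two passes: first record the start and end offsets of every match and decide which ones are repeated, then rebuild the text once all decisions are known. The class RepeatMarker, the method mark, the radius parameter and the <b>...</b> marker are all hypothetical choices for illustration. The pattern is the same character class as in the question, written with Unicode escapes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatMarker {
    // Same character class as the pattern in the question.
    private static final Pattern WORD =
            Pattern.compile("[a-zA-Z\u00e0-\u00fa\u00c0-\u00da]+");

    static String mark(String text, int radius) {
        // Pass 1: remember where each word sits and whether it repeats.
        List<int[]> spans = new ArrayList<>();
        List<Boolean> repeated = new ArrayList<>();
        Map<String, Integer> lastSeen = new HashMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            int index = spans.size();
            spans.add(new int[] {m.start(), m.end()});
            repeated.add(false);
            Integer prev = lastSeen.put(m.group(), index);
            if (prev != null && index - prev <= radius) {
                repeated.set(prev, true);  // earlier occurrence
                repeated.set(index, true); // current occurrence
            }
        }
        // Pass 2: copy the text, wrapping every flagged word.
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (int i = 0; i < spans.size(); i++) {
            int[] span = spans.get(i);
            out.append(text, cursor, span[0]); // text between words
            String word = text.substring(span[0], span[1]);
            out.append(repeated.get(i) ? "<b>" + word + "</b>" : word);
            cursor = span[1];
        }
        out.append(text, cursor, text.length()); // trailing text
        return out.toString();
    }
}
```

For example, RepeatMarker.mark("Our day to day is complicated.", 10) returns "Our <b>day</b> to <b>day</b> is complicated.", with both occurrences wrapped. The trade-off is that the offsets of all matches are kept in memory until the second pass.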
I'm using this text as an example:
Nowadays, you have to be smart. Our day to day is complicated.
The method should return, for example (I put the repeated words in bold here, but they don't have to be marked that way):
Today in day, it is necessary to be smart. Our day to day is complicated.