Regular expression to match a whole word, ignoring case, in accented words

Question

I need to write a program that searches for a particular word in a set of texts and highlights every occurrence of that word within the text.

For this I developed the following method:

 public void grifarTexto(Relato relato, String texto) {
    relato.setDescricaoRelato(relato.getDescricaoRelato()
        .replaceAll("(?i)(" + texto + ")", "<mark>$1</mark>"));
 }

But I ran into two problems.

First, I would like it to match only the whole word, but when I add the start (^) and end ($) anchors, nothing in the text gets highlighted.

Method used:

 public void grifarTexto(Relato relato, String texto) {
    relato.setDescricaoRelato(relato.getDescricaoRelato()
        .replaceAll("(?i)^(" + texto + ")$", "<mark>$1</mark>"));
 }

Second, it correctly ignores the difference between uppercase and lowercase letters, except when a letter has an accent. For example, when I search for the word mão ("hand"):

MÃO (not highlighted)
mão (highlighted)
mÃo (not highlighted)
MãO (highlighted)

In other words, it does not ignore case for accented letters.

I tested these expressions on the Rubular site to check that they were correct, and the results there appear to be fine.

Does anyone know which regular expression I should use to get the validations I want?

java regex

asked by anonymous 15.07.2017 / 21:47

2 answers

score 0

RESOLVED

Thank you, Guilherme, it worked out!

One detail is that I ended up doing the following:

relato.setDescricaoRelato(relato.getDescricaoRelato().replaceAll("(?i)(?u)(\\b" + texto + "\\b)", "<mark>$1</mark>"));

18.07.2017 / 06:23

score 2 · Accepted Answer

Since you are searching for words inside a text, the delimiters you need are not ^ (start) and $ (end), because these refer to the entire string.

^ - start of string
$ - end of string

For this case, use \b (word boundary), which matches at the edges of a word.
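The difference can be seen in a minimal sketch. The class name, the `mark` helper, and the sample sentence are made up for illustration and are not part of the question's code:

```java
final class AnchorVsBoundary {
    // Wraps the first capturing group of every match in <mark> tags.
    static String mark(String text, String regex) {
        return text.replaceAll(regex, "<mark>$1</mark>");
    }

    public static void main(String[] args) {
        String text = "the cat sat on the catalog";

        // ^ and $ require the whole input to be exactly "cat", so nothing matches.
        System.out.println(mark(text, "^(cat)$"));
        // prints: the cat sat on the catalog

        // \b only requires a word edge on each side, so the standalone "cat"
        // is wrapped while the "cat" inside "catalog" is left alone.
        System.out.println(mark(text, "\\b(cat)\\b"));
        // prints: the <mark>cat</mark> sat on the catalog
    }
}
```

Note that inside a Java string literal the boundary has to be written as `\\b`; a single `\b` is the backspace character.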

As for the accented words, the problem is that Java, like PHP, by default uses only the plain ASCII table for case-insensitive matching, i.e. it is limited to code points 0 to 127.
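The behaviour reported in the question can be reproduced with a small sketch. The class and method names are illustrative, and the accented letters are written as Unicode escapes so the example does not depend on the source file encoding. The contrast case uses `Pattern.UNICODE_CASE`, the flag behind the inline `(?u)` modifier:

```java
import java.util.regex.Pattern;

final class AsciiCaseDemo {
    // Full-string match with CASE_INSENSITIVE plus any extra flags supplied.
    static boolean matchesIgnoringCase(String regex, String input, int extraFlags) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | extraFlags)
                .matcher(input)
                .matches();
    }

    public static void main(String[] args) {
        String lower = "m\u00e3o"; // all lowercase, with a-tilde
        String upper = "M\u00c3O"; // all uppercase, with A-tilde
        String mixed = "M\u00e3O"; // uppercase M and O, lowercase a-tilde

        // ASCII-only folding: M/m and O/o fold, but the accented letter does not.
        System.out.println(matchesIgnoringCase(lower, upper, 0)); // false
        System.out.println(matchesIgnoringCase(lower, mixed, 0)); // true

        // Unicode-aware folding also treats the two accented letters as equal.
        System.out.println(matchesIgnoringCase(lower, upper, Pattern.UNICODE_CASE)); // true
    }
}
```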

To solve this problem you need to use the modifier:

Pattern.UNICODE_CHARACTER_CLASS

You could do something like this:

Pattern p = Pattern.compile("\\b" + texto + "\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
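Putting the pieces together, a highlighting helper might look like the sketch below. The class and method names are hypothetical, and two things go beyond the original answer: `Pattern.quote` is added on the assumption that the search term should be treated as literal text rather than as a regex, and `Pattern.UNICODE_CASE` is spelled out explicitly even though `UNICODE_CHARACTER_CLASS` already implies it.

```java
import java.util.regex.Pattern;

final class WordHighlighter {
    // Wraps every whole-word, case-insensitive occurrence of `word` in <mark> tags.
    static String highlight(String text, String word) {
        Pattern p = Pattern.compile(
                // Pattern.quote keeps regex metacharacters in `word` from being interpreted.
                "\\b(" + Pattern.quote(word) + ")\\b",
                Pattern.CASE_INSENSITIVE            // ignore case ...
                        | Pattern.UNICODE_CASE      // ... for non-ASCII letters too
                        | Pattern.UNICODE_CHARACTER_CLASS); // Unicode-aware \b
        // $1 re-inserts the matched text, so its original casing is preserved.
        return p.matcher(text).replaceAll("<mark>$1</mark>");
    }
}
```

With this sketch, searching "A MÃO e a mão, mãos" for "mão" should wrap both "MÃO" and "mão" but leave "mãos" untouched, since there is no word boundary between the "o" and the "s".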