What you need is a Unicode-aware regex. The code points of accented characters are not ordered in a way that makes much sense to mere mortals, so ranges like á-ú do not work.
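To illustrate why such a range is unreliable, here is a minimal sketch. The class name RangeDemo and the sample characters are chosen for illustration and are not part of the original answer. The range á-ú covers U+00E1 to U+00FA, so it misses à (U+00E0), Á (U+00C1) and ũ (U+0169), yet it accepts the division sign ÷ (U+00F7). The characters are written as Unicode escapes so that the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

class RangeDemo {
    // Naive range from U+00E1 (a with acute) to U+00FA (u with acute).
    static final Pattern NAIVE = Pattern.compile("[\\u00E1-\\u00FA]");

    static boolean naiveMatch(String s) {
        return NAIVE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(naiveMatch("\u00E3")); // a with tilde: true
        System.out.println(naiveMatch("\u00E0")); // a with grave: false
        System.out.println(naiveMatch("\u00C1")); // capital A with acute: false
        System.out.println(naiveMatch("\u0169")); // u with tilde: false
        System.out.println(naiveMatch("\u00F7")); // division sign: true
    }
}
```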
The class \p{Letter} (or simply \p{L}) represents letters in general. However, it also covers non-Latin letters (Cyrillic, Greek, Hebrew, Chinese, Arabic, etc.).
The class \p{IsLatin} matches characters of the Latin script. However, it is not limited to letters: it also accepts other Latin-script symbols (Roman numeral characters such as Ⅻ, for example).
So the solution is to use the intersection of these two sets, with [\p{L}&&[\p{IsLatin}]] or with [\p{IsLatin}&&[\p{L}]].
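As a rough illustration of how the three classes differ, the sketch below tests single characters against each one. The class name ClassDemo, the classify helper and the sample characters are assumptions made for this example, not part of the original answer. Each line prints three booleans: whether the character matches \p{L}, \p{IsLatin}, and the intersection, in that order.

```java
import java.util.regex.Pattern;

class ClassDemo {
    static final Pattern LETTER = Pattern.compile("\\p{L}");
    static final Pattern LATIN = Pattern.compile("\\p{IsLatin}");
    static final Pattern LATIN_LETTER = Pattern.compile("[\\p{L}&&[\\p{IsLatin}]]");

    // Returns "<letter> <latin> <latin letter>" for a one-character string.
    static String classify(String ch) {
        return LETTER.matcher(ch).matches() + " "
                + LATIN.matcher(ch).matches() + " "
                + LATIN_LETTER.matcher(ch).matches();
    }

    public static void main(String[] args) {
        System.out.println(classify("\u00E9")); // e with acute: true true true
        System.out.println(classify("\u044F")); // Cyrillic ya: true false false
        System.out.println(classify("\u216B")); // Roman numeral twelve: false true false
        System.out.println(classify("("));      // parenthesis: false false false
    }
}
```

Only the first character passes the intersection: the Cyrillic letter is a letter but not Latin, and the Roman numeral belongs to the Latin script but is not a letter.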
Here is the resulting code:
import java.util.regex.Pattern;

class RegexTest {

    /* Validation: Latin letters, digits, spaces, apostrophes and hyphens.
       Inside a character class no '|' or '(' is needed; there they would
       be accepted as literal characters. */
    public static final Pattern PADRAO = Pattern.compile(
            "^[[\\p{L}&&[\\p{IsLatin}]]0-9 '\\-]+$");

    /* Positive tests. */
    public static String[] positivos = {
        "á é í ó ú",
        "ã ẽ ĩ õ ũ",
        "Á È Ĩ Ã ó",
        "aeiou",
        "abc def ghi",
        "um 23 45",
        "Um - 2 - tres quatro",
        "Um' 2 três' quatro",
        "maçã",
        "Â Ê Î ô û",
        "á Ae Éi Ĩô O",
        "O rato roeu a roupa do rei de Roma",
        "áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙäëïöüÄËÏÖÜâêîôûÂÊÎÔÛãẽĩñõũÃẼĨÑÕŨçÇ"
    };

    /* Negative tests. */
    public static String[] negativos = {
        ".",
        "*",
        "/",
        "<",
        "≃",
        "^",
        "~",
        "()",
        "#",
        "中国"
    };

    public static void main(String[] args) {
        for (final String s : positivos) {
            boolean b = isValid(s);
            System.out.println(b + (b ? " ok - " : " oops - ") + s);
        }
        for (final String s : negativos) {
            boolean b = isValid(s);
            System.out.println(b + (b ? " oops - " : " ok - ") + s);
        }
    }

    public static boolean isValid(final String string) {
        return PADRAO.matcher(string).matches();
    }
}
Here's the output:
true ok - á é í ó ú
true ok - ã ẽ ĩ õ ũ
true ok - Á È Ĩ Ã ó
true ok - aeiou
true ok - abc def ghi
true ok - um 23 45
true ok - Um - 2 - tres quatro
true ok - Um' 2 três' quatro
true ok - maçã
true ok - Â Ê Î ô û
true ok - á Ae Éi Ĩô O
true ok - O rato roeu a roupa do rei de Roma
true ok - áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙäëïöüÄËÏÖÜâêîôûÂÊÎÔÛãẽĩñõũÃẼĨÑÕŨçÇ
false ok - .
false ok - *
false ok - /
false ok - <
false ok - ≃
false ok - ^
false ok - ~
false ok - ()
false ok - #
false ok - 中国
See it working on ideone.
Ah, one more detail: an object of class Pattern is expensive to build, but it is immutable, thread-safe and can be reused at will once it has been created. So, whenever possible, build it once in static scope rather than creating and re-creating Pattern objects multiple times.
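A minimal sketch of that advice follows. The class NameValidator and its two method names are hypothetical, introduced only for this comparison. Both methods return the same result, but the second goes through String.matches, which compiles the regex again on every call.

```java
import java.util.regex.Pattern;

class NameValidator {
    // Compiled once when the class is loaded, then shared by every call.
    private static final Pattern LATIN_TEXT =
            Pattern.compile("[[\\p{L}&&[\\p{IsLatin}]]0-9 '\\-]+");

    // Preferred: reuses the precompiled, thread-safe Pattern.
    static boolean isValidReused(String s) {
        return LATIN_TEXT.matcher(s).matches();
    }

    // Same result, but the pattern is recompiled on each invocation.
    static boolean isValidRecompiled(String s) {
        return s.matches("[[\\p{L}&&[\\p{IsLatin}]]0-9 '\\-]+");
    }

    public static void main(String[] args) {
        System.out.println(isValidReused("ma\u00E7\u00E3"));     // true
        System.out.println(isValidRecompiled("ma\u00E7\u00E3")); // true
        System.out.println(isValidReused("#"));                  // false
    }
}
```

The difference only matters when validation runs many times, for example inside a loop or in a request handler, which is where the static field pays off.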