REGEX - problem with validation

2

I need to validate text that may contain only the following:

   letters (accented or not), spaces, apostrophe ('), hyphen (-) and numbers (0-9)

For this I wrote the following:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    /** validation */
    public static final String PATTERN= "^[A-Z|a-z|0-9| |Á-Ú|á-ú|Ã-Ũ|ã-ũ|'|-]+$";
    /** positive tests */
    public static String[] itens = { "á é í ó ú", "ã ẽ ĩ õ ũ", "Á È Ĩ Ã ó", "aeiou", "abc def ghi", "um 23 45",
                                     "Um - 2 - tres quatro", "Um' 2  três' quatro", "maçã", "Â Ê Î ô û", "á Ae Éi Ĩô O"};

    public static void main(String[] args) {
        for(final String s : itens) {
            boolean b = isValid(s);
            System.out.println(b+" : "+s);
        }
    }
    public static boolean isValid(final String string) {
        Pattern p = Pattern.compile(PATTERN);
        Matcher m = p.matcher(string);
        return m.matches();
    }
}

In theory, every item should return true.

But the string ã ẽ ĩ õ ũ returns false.

How can I do this validation?

Follow the Ideone link

    
asked by anonymous 08.08.2017 / 23:13

1 answer

3

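What you need is a Unicode-aware regex. A range such as á-ú is evaluated by code point, and the accented characters are not laid out in an order that makes much sense to mere mortals, so ranges like that do not work. For example, ẽ is U+1EBD, far outside the ã-ũ range (U+00E3 to U+0169), which is why ã ẽ ĩ õ ũ fails.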
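To make the code point issue concrete, here is a small sketch. The class name RangeDemo and the helper inRange are made up for this illustration. The characters are written as Unicode escapes so that the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

class RangeDemo {
    /** Mirrors how a regex range lo-hi is tested: by comparing code points. */
    static boolean inRange(int lo, int hi, String s) {
        int cp = s.codePointAt(0);
        return lo <= cp && cp <= hi;
    }

    public static void main(String[] args) {
        String iTilde = "\u0129"; // small i with tilde
        String eTilde = "\u1EBD"; // small e with tilde

        // Range from small a with tilde (U+00E3) to small u with tilde (U+0169).
        System.out.println(inRange(0x00E3, 0x0169, iTilde));
        System.out.println(inRange(0x00E3, 0x0169, eTilde));

        // The same check done by the regex engine itself.
        Pattern range = Pattern.compile("[\u00E3-\u0169]");
        System.out.println(range.matcher(iTilde).matches());
        System.out.println(range.matcher(eTilde).matches());
    }
}
```

The i with tilde falls inside the range and the e with tilde does not, so the program prints true, false, true, false.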

The class \p{Letter} (or simply \p{L} ) represents letters in general. However, it also covers non-Latin letters (Cyrillic, Greek, Hebrew, Chinese, Arabic, etc.).

Class \p{IsLatin} matches characters that belong to the Latin script. However, that script is not made up of letters alone: it also includes some non-letter symbols (the Roman numeral characters such as Ⅷ, for example).

So the solution is to use the intersection of these two sets with [\p{L}&&[\p{IsLatin}]] or with [\p{IsLatin}&&[\p{L}]] .
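A minimal sketch of that intersection on its own, with one sample character per case. The names LatinLetterDemo, LATIN_LETTER and isLatinLetter are hypothetical and only serve this illustration. Note that the backslashes are doubled because the regex sits inside a Java string literal.

```java
import java.util.regex.Pattern;

class LatinLetterDemo {
    // Letters (\p{L}) that also belong to the Latin script (\p{IsLatin}).
    static final Pattern LATIN_LETTER = Pattern.compile("[\\p{L}&&[\\p{IsLatin}]]");

    static boolean isLatinLetter(String s) {
        return LATIN_LETTER.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Small e with tilde (U+1EBD): a letter in the Latin script.
        System.out.println(isLatinLetter("\u1EBD"));
        // CJK ideograph (U+4E2D): a letter, but not in the Latin script.
        System.out.println(isLatinLetter("\u4E2D"));
        // Roman numeral eight (U+2167): Latin script, but a number (Nl), not a letter.
        System.out.println(isLatinLetter("\u2167"));
        // Opening parenthesis: neither a letter nor in the Latin script.
        System.out.println(isLatinLetter("("));
    }
}
```

Only the first character should satisfy both conditions; the other three each miss at least one side of the intersection.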

Here is the resulting code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTest {

    /* Validation. */
    // Inside [...] the characters '(' and '|' are literals, so they are left out.
    public static final Pattern PADRAO = Pattern.compile(
            "^[[\\p{L}&&[\\p{IsLatin}]]0-9 '-]+$");

    /* Positive tests. */
    public static String[] positivos = {
            "á é í ó ú",
            "ã ẽ ĩ õ ũ",
            "Á È Ĩ Ã ó",
            "aeiou",
            "abc def ghi",
            "um 23 45",
            "Um - 2 - tres quatro",
            "Um' 2  três' quatro",
            "maçã",
            "Â Ê Î ô û",
            "á Ae Éi Ĩô O",
            "O rato roeu a roupa do rei de Roma",
            "áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙäëïöüÄËÏÖÜâêîôûÂÊÎÔÛãẽĩñõũÃẼĨÑÕŨçÇ"
    };

    /* Negative tests. */
    public static String[] negativos = {
            ".",
            "*",
            "/",
            "<",
            "≃",
            "^",
            "~",
            "()",
            "#",
            "中国"
    };

    public static void main(String[] args) {
        for (final String s : positivos) {
            boolean b = isValid(s);
            System.out.println(b + (b ? " ok - " : " oops - ") + s);
        }
        for (final String s : negativos) {
            boolean b = isValid(s);
            System.out.println(b + (b ? " oops - " : " ok - ") + s);
        }
    }

    public static boolean isValid(final String string) {
        return PADRAO.matcher(string).matches();
    }
}

Here's the output:

true ok - á é í ó ú
true ok - ã ẽ ĩ õ ũ
true ok - Á È Ĩ Ã ó
true ok - aeiou
true ok - abc def ghi
true ok - um 23 45
true ok - Um - 2 - tres quatro
true ok - Um' 2  três' quatro
true ok - maçã
true ok - Â Ê Î ô û
true ok - á Ae Éi Ĩô O
true ok - O rato roeu a roupa do rei de Roma
true ok - áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙäëïöüÄËÏÖÜâêîôûÂÊÎÔÛãẽĩñõũÃẼĨÑÕŨçÇ
false ok - .
false ok - *
false ok - /
false ok - <
false ok - ≃
false ok - ^
false ok - ~
false ok - ()
false ok - #
false ok - 中国

See here working on ideone.
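A side note on the pattern in the question: inside a character class the | is not an "or" operator but an ordinary character, so a class written as [A-Z|a-z|0-9] also accepts a literal |. The sketch below uses two simplified patterns, which are assumptions made up for this illustration and not the real ones from the question, to show the difference.

```java
import java.util.regex.Pattern;

class PipeInClassDemo {
    // Hypothetical simplified patterns, used only to isolate the effect of '|'.
    static final Pattern WITH_PIPES = Pattern.compile("[a-z|0-9]+");
    static final Pattern WITHOUT_PIPES = Pattern.compile("[a-z0-9]+");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Both classes accept letters and digits.
        System.out.println(matches(WITH_PIPES, "abc123"));
        System.out.println(matches(WITHOUT_PIPES, "abc123"));
        // Only the first one lets a literal '|' through.
        System.out.println(matches(WITH_PIPES, "a|1"));
        System.out.println(matches(WITHOUT_PIPES, "a|1"));
    }
}
```

This prints true, true, true, false. Listing the allowed characters side by side, with no separators, keeps the | out of the accepted set.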

Ah, one more detail: a Pattern object is expensive to build, but it is immutable, thread-safe and can be reused at will once it is created. So, whenever possible, build it once in a static field instead of creating and re-creating Pattern instances multiple times.
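As a rough sketch of the difference, the two hypothetical methods below give the same answer, but the first reuses one compiled Pattern while the second compiles the regex again on every call (String.matches delegates to Pattern.matches, which compiles its argument each time). The class name ReuseDemo and the digits-only pattern are assumptions chosen for brevity.

```java
import java.util.regex.Pattern;

class ReuseDemo {
    // Compiled once when the class is loaded, then shared by every call.
    static final Pattern DIGITS = Pattern.compile("[0-9]+");

    /** Reuses the precompiled pattern; only a lightweight Matcher is created. */
    static boolean isDigitsReused(String s) {
        return DIGITS.matcher(s).matches();
    }

    /** Same result, but the regex is compiled again on each invocation. */
    static boolean isDigitsRecompiled(String s) {
        return s.matches("[0-9]+");
    }

    public static void main(String[] args) {
        String[] samples = { "12345", "12a45", "" };
        for (String s : samples) {
            System.out.println(isDigitsReused(s) + " " + isDigitsRecompiled(s));
        }
    }
}
```

For a handful of calls the cost is negligible; the static field starts to pay off when the validation runs in a loop or on every request.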

    
08.08.2017 / 23:48