How do I validate a name with a regular expression?

7

I have tried several ways to write a regular expression that validates a name under these rules:

  • maximum length of 60 characters
  • cannot contain digits
  • cannot contain accents, cedillas, or punctuation
  • the first letter of each name must be uppercase and the rest lowercase

Examples of valid names:

    • Jose da Silva
    • Nycolas Merino
    • Antonio Ferreira Pacheco

    Examples of invalid names:

    • Jose da silva
    • Nycolas merino
    • antonio Ferreira pacheco

    What I was able to create is this: [A-Z][a-z]+[[ ][A-Z][a-z]+]* However, it only validates the first and second names; if the person has three names, it does not check that the first character is uppercase, and it does not enforce the 60-character limit. Yes, I need to do this with a regular expression! If you want to test the expression, you can do so on this site: link

        
    asked by anonymous 04.11.2014 / 18:37

    2 answers

    7

    Your general idea is fine (match the first name, then match zero or more times a space followed by another name). The problem is the use of brackets ( [] ) in the second part of the expression: brackets match one and only one character from the listed options. Replacing them with parentheses should solve the problem:

    [A-Z][a-z]+([ ][A-Z][a-z]+)*
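
To make the difference between a character class and a group concrete, here is a minimal sketch using Python's `re` module. The choice of Python and the helper name `is_name` are assumptions for illustration only; the question does not say which engine is in use.

```python
import re

# A character class matches exactly one character from its set.
one_char = re.compile(r"[ab]")
# A group treats a whole sequence as a unit that a quantifier can repeat.
sequence = re.compile(r"(ab)*")

# The corrected pattern: a capitalized word, then zero or more
# repetitions of the group "space + capitalized word".
NAME = re.compile(r"[A-Z][a-z]+([ ][A-Z][a-z]+)*")


def is_name(text):
    """Hypothetical helper: True if the whole string fits NAME."""
    return NAME.fullmatch(text) is not None


print(one_char.fullmatch("ab") is None)          # True: "ab" is two characters
print(sequence.fullmatch("ababab") is not None)  # True: "ab" repeated three times
print(is_name("Antonio Ferreira Pacheco"))       # True
print(is_name("Nycolas merino"))                 # False
```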
    

    Note that, depending on how this expression is used, it can match only part of a string (e.g., in 123Fulano Beltrano456 the "middle" would be matched). If you want to ensure that the expression matches only the entire string, one way is to use the start ( ^ ) and end ( $ ) anchors:

    ^[A-Z][a-z]+([ ][A-Z][a-z]+)*$
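
The partial-match behavior can be reproduced with a short sketch, again assuming Python's `re` module (the constant names are made up for this example):

```python
import re

UNANCHORED = re.compile(r"[A-Z][a-z]+([ ][A-Z][a-z]+)*")
ANCHORED = re.compile(r"^[A-Z][a-z]+([ ][A-Z][a-z]+)*$")

sample = "123Fulano Beltrano456"

# Without anchors, search() finds the name buried in the middle.
partial = UNANCHORED.search(sample)
print(partial.group(0))                                # Fulano Beltrano

# With anchors, the surrounding digits make the whole string fail.
print(ANCHORED.search(sample) is None)                 # True
print(ANCHORED.search("Fulano Beltrano") is not None)  # True
```

In Python specifically, `$` also matches just before a single trailing newline, so `re.fullmatch` with the unanchored pattern is a slightly stricter way to require that the entire string matches.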
    

    Finally, if you have problems with capture groups, mark the parenthesized expression as non-capturing:

    ^[A-Z][a-z]+(?:[ ][A-Z][a-z]+)*$
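
As a rough illustration of what "problems with capture groups" can look like, the sketch below (assuming Python's `re` module) compares what each variant reports after a successful match. Both accept the same strings; only the captured data differs.

```python
import re

CAPTURING = re.compile(r"^[A-Z][a-z]+([ ][A-Z][a-z]+)*$")
NON_CAPTURING = re.compile(r"^[A-Z][a-z]+(?:[ ][A-Z][a-z]+)*$")

text = "Antonio Ferreira Pacheco"
with_group = CAPTURING.match(text)
without_group = NON_CAPTURING.match(text)

# A repeated capturing group keeps only its last repetition.
print(with_group.groups())     # (' Pacheco',)
# With (?:...) nothing is captured at all.
print(without_group.groups())  # ()
```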
    

    As for validating a specific length, my answer to a related question ("2 regular expressions in 1") shows a way to do this using lookarounds (i.e., test the string against the first regex without consuming it, then test it again against the second regex):

    (?=^.{2,60}$)^[A-Z][a-z]+(?:[ ][A-Z][a-z]+)*$
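
A quick way to check the length lookahead is to build strings right at and just over the limit. This is a sketch assuming Python's `re` module, which supports lookaheads; the test strings are artificial.

```python
import re

LIMITED = re.compile(r"(?=^.{2,60}$)^[A-Z][a-z]+(?:[ ][A-Z][a-z]+)*$")

at_limit = "A" + "b" * 59    # exactly 60 characters
over_limit = "A" + "b" * 60  # 61 characters

# The lookahead checks the length first without consuming anything,
# then the rest of the pattern checks the shape from the same position.
print(LIMITED.match(at_limit) is not None)          # True
print(LIMITED.match(over_limit) is None)            # True
print(LIMITED.match("Nycolas Merino") is not None)  # True
```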
    

    Example on Rubular. P.S. If you are using this regex within XML, lookaheads may not be available. I don't think that is the case here, but make sure the engine you use supports this feature. Otherwise, there is little I can suggest for validating the length; ideally you would do that in a separate step (as suggested by Guill in the comments).
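
If the engine has no lookaheads, the separate-step approach might look like the sketch below. It is written in Python only for illustration, and `is_valid_name`, `NAME_SHAPE`, and `MAX_LENGTH` are hypothetical names that do not appear in the question.

```python
import re

# Shape check only; the length is handled outside the regex.
NAME_SHAPE = re.compile(r"[A-Z][a-z]+(?:[ ][A-Z][a-z]+)*")
MAX_LENGTH = 60


def is_valid_name(text: str) -> bool:
    """Hypothetical validator: length check first, then the regex."""
    if len(text) > MAX_LENGTH:
        return False
    return NAME_SHAPE.fullmatch(text) is not None


print(is_valid_name("Nycolas Merino"))  # True
print(is_valid_name("A" + "b" * 60))    # False: 61 characters
print(is_valid_name("Nycolas merino"))  # False: lowercase second name
```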

    Note that some of your "valid" names are rejected by this regex: those that have "da" in the middle (starting with a lowercase letter). If you want to make an exception for "da" (and maybe also for "do", "de" and "e") you can do something like:

    (?=^.{2,60}$)^[A-Z][a-z]+(?:[ ](?:das?|dos?|de|e|[A-Z][a-z]+))*$
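
The sketch below runs this final pattern against the six example names from the question, assuming Python's `re` module:

```python
import re

PATTERN = re.compile(
    r"(?=^.{2,60}$)^[A-Z][a-z]+(?:[ ](?:das?|dos?|de|e|[A-Z][a-z]+))*$"
)

valid = ["Jose da Silva", "Nycolas Merino", "Antonio Ferreira Pacheco"]
invalid = ["Jose da silva", "Nycolas merino", "antonio Ferreira pacheco"]

for name in valid + invalid:
    verdict = "accepted" if PATTERN.match(name) else "rejected"
    print(f"{name!r}: {verdict}")
```

One thing to keep in mind: because the particles sit in the same alternation as the capitalized words, a string that ends in a particle, such as "Jose da", is also accepted by this pattern.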
    

    Updated example.

        
    04.11.2014 / 19:58
    1

    For names in Portuguese, building on what @mgibsonbr said and combining it with a few other things I found online, I arrived at a near-perfect regex for Portuguese names:

    /(?=^.{2,60}$)^[A-ZÀÁÂĖÈÉÊÌÍÒÓÔÕÙÚÛÇ][a-zàáâãèéêìíóôõùúç]+(?:[ ](?:das?|dos?|de|e|[A-Z][a-z]+))*$/
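
For anyone who wants to try it, here is a small sketch in Python (an assumption; the slashes suggest the original was written as a regex literal in another language). The pattern is copied as posted, minus the slashes. As written, the accented character classes apply only to the first word, so accents in later words are still rejected.

```python
import re

# Pattern copied from the answer above, without the surrounding slashes.
PT_NAME = re.compile(
    r"(?=^.{2,60}$)^[A-ZÀÁÂĖÈÉÊÌÍÒÓÔÕÙÚÛÇ][a-zàáâãèéêìíóôõùúç]+"
    r"(?:[ ](?:das?|dos?|de|e|[A-Z][a-z]+))*$"
)

print(PT_NAME.match("José da Silva") is not None)  # True: accent in the first word
print(PT_NAME.match("Jose da Conceição") is None)  # True: accent in a later word
```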
    
        
    07.05.2017 / 10:22