Validate first and last name with regular expression

Question

Validate first and last name with regular expression

Navigation

#1 by (18 votes)
#2 by (16 votes)
#3 by (9 votes)
#4 by (0 votes)

17

I need to create a regular expression that validates whether the format of the user's first and last name is valid, for example:

Maria Silva | true
Maria  Silva | false
Maria silva | false
maria Silva | false
MariaSilva  | false

In other words, it can only accept when the first and last name is in capital letters and with only one space between the words, I tried to use this expression:

[A-Z][a-z].* [A-Z][a-z].*

However, this is accepting modes that do not fit the pattern I need.

regex

asked by anonymous 03.10.2017 / 20:20

4 answers

16

TL; DR

The regex is:

^(?:[\p{Lu}&&[\p{IsLatin}]])(?:(?:')?(?:[\p{Ll}&&[\p{IsLatin}]]))+(?:\-(?:[\p{Lu}&&[\p{IsLatin}]])(?:(?:')?(?:[\p{Ll}&&[\p{IsLatin}]]))+)*(?: (?:(?:e|y|de(?:(?: la| las| lo| los))?|do|dos|da|das|del|van|von|bin|le) )?(?:(?:(?:d'|D'|O'|Mc|Mac|al\-))?(?:[\p{Lu}&&[\p{IsLatin}]])(?:(?:')?(?:[\p{Ll}&&[\p{IsLatin}]]))+|(?:[\p{Lu}&&[\p{IsLatin}]])(?:(?:')?(?:[\p{Ll}&&[\p{IsLatin}]]))+(?:\-(?:[\p{Lu}&&[\p{IsLatin}]])(?:(?:')?(?:[\p{Ll}&&[\p{IsLatin}]]))+)*))+(?: (?:Jr\.|II|III|IV))?$

According to the regex unicode page, \p{IsLatin} is supported by Java, C #, PHP , Perl and Ruby. You can test this regular expression on the regexplanet site.

Detailed explanation

First, let's define some rules for full names:

At least two names (although there are people who only have one name, such as the Emperor Akihito of Japan , but let's leave them out).
Exactly a space separating names.
The same name or surname can be composed, separated by a hyphen. There may be more than one hyphen (eg "Louis Augusto de Saxe-Coburgo-Gota ")
Accepting accents.
The beginning of each word must be uppercase and the other lowercase letters.
Must accept entirely minuscule conjunctions.
Surnames such as " O'Brian ", " d'Alembert " and " McDonald " should be accepted.
There can not be two or more conjunctions followed (eg " da "). However " María Antonieta de las Nieves is a valid name, so the conjunction can be composed.
Names and surnames (but not conjunctions) must have at least two letters.
Samuel Eto'o "Samuel Eto'o" ). But they can not be at the beginning or the end of the word, nor be consecutive.
Some names such as "Martin Luther King Jr." and " William Henry Gates III "has suffixes."

To do this with regex, let's stipulate the following:

Surnames can be prefixed with " d '", " D' ",

So, the structure of the name would be this:

NOME-COMPLETO := PRENOME (espaço [CONJUNÇÃO espaço] SOBRENOME)+ (espaço SUFIXO)?
SOBRENOME := (PREFIXO)? NOME | PRENOME
PRENOME := NOME ("-" NOME)*
NOME := MAIÚSCULA (("'")? MINÚSCULA)+
PREFIXO := "d'" | "O'" | "Mc" | "Mac" | "al-"
SUFIXO = "Jr." | "II" | "III" | "IV"
CONJUNÇÃO := "e" | "y" | "de" (" lo" | " los" | " la" | " las")? | "do" | "dos" | "da" | "das" | "del" | "van" | "von" | "bin" | "le"
MAIÚSCULA := [\p{Lu}&&[\p{IsLatin}]]
MINÚSCULA := [\p{Ll}&&[\p{IsLatin}]]

This [\p{Lu}&&[\p{IsLatin}]] rule is responsible for recognizing a character that is at the intersection between the set of uppercase letters ( \p{Lu} ) and latin characters ( \p{IsLatin} ). Therefore, this also accepts accented upper Latin characters. The ( \p{Ll} ) is for lowercase letters. See more about the character classes in this other answer of mine and also in this link .

The above rule set can be read as a context-free grammar. However, it can be reduced in a regular expression, since there are no recursive rules in it. To do this, simply replace the rules that are below in the above rules.

However, how to construct this regex manually is a boring, labor-intensive, very error-prone process and the resulting regex is a monstrosity, especially if you have to change something from time to time, I made a program that builds the corresponding regex and also tests it with several different names. Here's the program (in Java):

Building and testing regex

import java.util.regex.Pattern;
import java.util.StringJoiner;

class TesteRegex {

    private static final String MAIUSCULA = "(?:[\p{Lu}&&[\p{IsLatin}]])";
    private static final String MINUSCULA = "(?:[\p{Ll}&&[\p{IsLatin}]])";

    private static final String PREFIXO = choice("d'", "D'", "O'", "Mc", "Mac", "al\-");
    private static final String SUFIXO = choice("Jr\.", "II", "III", "IV");
    private static final String CONJUNCAO = choice("e", "y", "de" + opt(choice(" la", " las", " lo", " los")), "do", "dos", "da", "das", "del", "van", "von", "bin", "le");
    private static final String NOME = MAIUSCULA + plus(opt("'") + MINUSCULA);
    private static final String PRENOME = NOME + star("\-" + NOME);
    private static final String SOBRENOME = choice(opt(PREFIXO) + NOME, PRENOME);
    private static final String NOME_COMPLETO = "^" + PRENOME + plus(" " + opt(CONJUNCAO + " ") + SOBRENOME) + opt(" " + SUFIXO) + "$";

    private static String opt(String in) {
        return "(?:" + in + ")?";
    }

    private static String plus(String in) {
        return "(?:" + in + ")+";
    }

    private static String star(String in) {
        return "(?:" + in + ")*";
    }

    private static String choice(String... in) {
        StringJoiner sj = new StringJoiner("|", "(?:", ")");
        for (String s : in) {
            sj.add(s);
        }
        return sj.toString();
    }

    private static final Pattern REGEX_NOME = Pattern.compile(NOME_COMPLETO);

    private static final String[] NOMES = {
        "Maria Silva",
        "Pedro Carlos",
        "Luiz Antônio",
        "Albert Einstein",
        "João Doria",
        "Barack Obama",
        "Friedrich von Hayek",
        "Ludwig van Beethoven",
        "Jeanne d'Arc",
        "Saddam Hussein al-Tikriti",
        "Osama bin Mohammed bin Awad bin Laden",
        "Luís Inácio Lula da Silva",
        "Getúlio Dornelles Vargas",
        "Juscelino Kubitschek de Oliveira",
        "Jean-Baptiste le Rond d'Alembert",
        "Pierre-Simon Laplace",
        "Hans Christian Ørsted",
        "Joseph Louis Gay-Lussac",
        "Scarlett O'Hara",
        "Ronald McDonald",
        "María Antonieta de las Nieves",
        "Pedro de Alcântara Francisco António João Carlos Xavier de Paula Miguel Rafael Joaquim José Gonzaga Pascoal Cipriano Serafim",
        "Luís Augusto Maria Eudes de Saxe-Coburgo-Gota",
        "Martin Luther King Jr.",
        "William Henry Gates III",
        "John William D'Arcy",
        "John D'Largy",
        "Samuel Eto'o",
        "Åsa Ekström",
        "Gregor O'Sulivan",
        "Ítalo Gonçalves"
    };

    private static final String[] LIXOS = {
        "",
        "Maria",
        "Maria-Silva",
        "Marcos E",
        "E Marcos",
        "Maria  Silva",
        "Maria Silva ",
        " Maria Silva ",
        "Maria silva",
        "maria Silva",
        "MARIA SILVA",
        "MAria Silva",
        "Maria SIlva",
        "Jean-Baptiste",
        "Jeanne d' Arc",
        "Joseph Louis Gay-lussac",
        "Pierre-simon Laplace",
        "Maria daSilva",
        "Maria~Silva",
        "Maria Silva~",
        "~Maria Silva",
        "Maria~ Silva",
        "Maria ~Silva",
        "Maria da da Silva",
        "Maria da e Silva",
        "Maria de le Silva",
        "William Henry Gates iii",
        "Martin Luther King, Jr.",
        "Martin Luther King JR",
        "Martin Luther Jr. King",
        "Martin Luther King Jr. III",
        "Maria G. Silva",
        "Maria G Silva",
        "Maria É Silva",
        "Maria wi Silva",
        "Samuel 'Etoo",
        "Samuel Etoo'",
        "Samuel Eto''o"
    };

    private static void testar(String nome) {
        boolean bom = REGEX_NOME.matcher(nome).matches();
        System.out.println("O nome [" + nome + "] é bom? " + (bom ? "Sim." : "Não."));
    }

    public static void main(String[] args) {
        System.out.println("Regex: " + NOME_COMPLETO);

        System.out.println();
        System.out.println("Esses nomes devem ser bons:");
        for (String s : NOMES) {
            testar(s);
        }

        System.out.println();
        System.out.println("Esses nomes devem ser ruins:");
        for (String s : LIXOS) {
            testar(s);
        }
    }
}

This program constructs the regex using non-catching groups (% with%), zero-or-more-times operator (% with%), one-or-more-times operator, (% with%), beginning of string (% with%) and end of string (% with%).

Here's the program working on ideone. Here is the output of this program:

Regex: ^(?:[\p{Lu}&&[\p{IsLatin}]])(?:(?:')?(?:[\p{Ll}&&[\p{IsLatin}]]))+(?:\-(?:[\p{Lu}&&[\p{IsLatin}]])(?:(?:')?(?:[\p{Ll}&&[\p{IsLatin}]]))+)*(?: (?:(?:e|y|de(?:(?: la| las| lo| los))?|do|dos|da|das|del|van|von|bin|le) )?(?:(?:(?:d'|D'|O'|Mc|Mac|al\-))?(?:[\p{Lu}&&[\p{IsLatin}]])(?:(?:')?(?:[\p{Ll}&&[\p{IsLatin}]]))+|(?:[\p{Lu}&&[\p{IsLatin}]])(?:(?:')?(?:[\p{Ll}&&[\p{IsLatin}]]))+(?:\-(?:[\p{Lu}&&[\p{IsLatin}]])(?:(?:')?(?:[\p{Ll}&&[\p{IsLatin}]]))+)*))+(?: (?:Jr\.|II|III|IV))?$

Esses nomes devem ser bons:
O nome [Maria Silva] é bom? Sim.
O nome [Pedro Carlos] é bom? Sim.
O nome [Luiz Antônio] é bom? Sim.
O nome [Albert Einstein] é bom? Sim.
O nome [João Doria] é bom? Sim.
O nome [Barack Obama] é bom? Sim.
O nome [Friedrich von Hayek] é bom? Sim.
O nome [Ludwig van Beethoven] é bom? Sim.
O nome [Jeanne d'Arc] é bom? Sim.
O nome [Saddam Hussein al-Tikriti] é bom? Sim.
O nome [Osama bin Mohammed bin Awad bin Laden] é bom? Sim.
O nome [Luís Inácio Lula da Silva] é bom? Sim.
O nome [Getúlio Dornelles Vargas] é bom? Sim.
O nome [Juscelino Kubitschek de Oliveira] é bom? Sim.
O nome [Jean-Baptiste le Rond d'Alembert] é bom? Sim.
O nome [Pierre-Simon Laplace] é bom? Sim.
O nome [Hans Christian Ørsted] é bom? Sim.
O nome [Joseph Louis Gay-Lussac] é bom? Sim.
O nome [Scarlett O'Hara] é bom? Sim.
O nome [Ronald McDonald] é bom? Sim.
O nome [María Antonieta de las Nieves] é bom? Sim.
O nome [Pedro de Alcântara Francisco António João Carlos Xavier de Paula Miguel Rafael Joaquim José Gonzaga Pascoal Cipriano Serafim] é bom? Sim.
O nome [Luís Augusto Maria Eudes de Saxe-Coburgo-Gota] é bom? Sim.
O nome [Martin Luther King Jr.] é bom? Sim.
O nome [William Henry Gates III] é bom? Sim.
O nome [John William D'Arcy] é bom? Sim.
O nome [John D'Largy] é bom? Sim.
O nome [Samuel Eto'o] é bom? Sim.
O nome [Åsa Ekström] é bom? Sim.
O nome [Gregor O'Sulivan] é bom? Sim.
O nome [Ítalo Gonçalves] é bom? Sim.

Esses nomes devem ser ruins:
O nome [] é bom? Não.
O nome [Maria] é bom? Não.
O nome [Maria-Silva] é bom? Não.
O nome [Marcos E] é bom? Não.
O nome [E Marcos] é bom? Não.
O nome [Maria  Silva] é bom? Não.
O nome [Maria Silva ] é bom? Não.
O nome [ Maria Silva ] é bom? Não.
O nome [Maria silva] é bom? Não.
O nome [maria Silva] é bom? Não.
O nome [MARIA SILVA] é bom? Não.
O nome [MAria Silva] é bom? Não.
O nome [Maria SIlva] é bom? Não.
O nome [Jean-Baptiste] é bom? Não.
O nome [Jeanne d' Arc] é bom? Não.
O nome [Joseph Louis Gay-lussac] é bom? Não.
O nome [Pierre-simon Laplace] é bom? Não.
O nome [Maria daSilva] é bom? Não.
O nome [Maria~Silva] é bom? Não.
O nome [Maria Silva~] é bom? Não.
O nome [~Maria Silva] é bom? Não.
O nome [Maria~ Silva] é bom? Não.
O nome [Maria ~Silva] é bom? Não.
O nome [Maria da da Silva] é bom? Não.
O nome [Maria da e Silva] é bom? Não.
O nome [Maria de le Silva] é bom? Não.
O nome [William Henry Gates iii] é bom? Não.
O nome [Martin Luther King, Jr.] é bom? Não.
O nome [Martin Luther King JR] é bom? Não.
O nome [Martin Luther Jr. King] é bom? Não.
O nome [Martin Luther King Jr. III] é bom? Não.
O nome [Maria G. Silva] é bom? Não.
O nome [Maria G Silva] é bom? Não.
O nome [Maria É Silva] é bom? Não.
O nome [Maria wi Silva] é bom? Não.
O nome [Samuel 'Etoo] é bom? Não.
O nome [Samuel Etoo'] é bom? Não.
O nome [Samuel Eto''o] é bom? Não.

Note that regex has accepted all the names it should accept and rejected all that it should reject. The regex produced is:

(?: ... )

03.10.2017 / 22:32

9

Regex

The Regex is this (monster): ^(?![ ])(?!.*[ ]{2})((?:e|da|do|das|dos|de|d'|D'|la|las|el|los)\s*?|(?:[A-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð'][^\s]*\s*?)(?!.*[ ]$))+$

Where global, multine and unicode / gmu is enabled

Or without special characters ^(?![ ])(?!.*[ ]{2})((?:e|da|do|das|dos|de|d'|D'|la|las|el|los)\s*?|(?:[A-Z][^\s]*\s*?)(?!.*[ ]$))+$

The links to ReGex101 and Regex Planet , with the validation of the tests can be accessed.

Note: It is long, so there are spaces for optimizations.

References

Using these global OS links:

regular expression for first and last name
Regular Expression to

disallow two consecutive white spaces in the middle of a string

A regex to test if all words are title -case

Code generated by Regex101

const regex = /^(?![ ])(?!.*[ ]{2})((?:e|da|do|das|dos|de|d'|D'|la|las|el|los)\s*?|(?:[A-Z][^\s]*\s*?)(?!.*[ ]$))+$/gmu;
const str = 'Maria  Silva
Maria silva
maria Silva
MariaSilva
 Maria Silva
Maria Silva 
Maria da Silva
Marina Silva
Maria / Silva
Maria . Silva
Maria Silva
Maria G. Silva
Maria McDuffy
Getúlio Dornelles Vargas
Maria das Flores
John Smith
John D'Largy
John Doe-Smith
John Doe Smith
Hector Sausage-Hausen
Mathias d'Arras
Martin Luther King Jr.
Ai Wong
Chao Chang
Alzbeta Bara
Marcos Assunção
Maria da Silva e Silva
Juscelino Kubitschek de Oliveira
Natalia maria
Natalia aria
Natalia orea
Maria dornelas
Samuel eto'
Maria da Costa e Silva
Samuel Eto'o
María Antonieta de las Nieves';
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the 'm'-variable.
    m.forEach((match, groupIndex) => {
        console.log('Found match, group ${groupIndex}: ${match}');
    });
}

Excel VBA code

As a programming language is not specified, a simple way to check the validation of the code in Excel will be demonstrated, since the other answers already include java.

Enable Regex in Excel

RegEx needs to be enabled, Enable Developer mode

In the 'Developer' tab, click 'Visual Basic' and the VBA window will open.

Go to 'Tools' - > 'References ...' and a window will open.

Look for 'Microsoft VBScript Regular Expressions 5.5', as in the image below. And enable this option.

VBACode

DimstrAsStringDimobjMatchesAsObjectDimwsAsWorksheet:Setws=ThisWorkbook.Sheets(1)DimiAsLonglastrow=ws.Cells(ws.Rows.Count,"A").End(xlUp).Row
For i = 1 To lastrow
    str = CStr(Cells(i, 1))
    Set objRegExp = CreateObject("VBScript.RegExp") 'New regexp
    objRegExp.Pattern = "^(?![ ])(?!.*[ ]{2})((?:e|da|do|das|dos|de|d'|D'|la|las|el|los)\s*?|(?:[A-Z][^\s]*\s*?)(?!.*[ ]$))+$"
    objRegExp.Global = True
    Set objMatches = objRegExp.Execute(str)
    If objMatches.Count <> 0 Then
        For Each m In objMatches
            ws.Cells(i, 2) = m.Value
            ws.Cells(i, 2).Interior.ColorIndex = 4
        Next
    Else
        ws.Cells(i, 2).Interior.ColorIndex = 3
        ws.Cells(i, 2)=""
    End If
Next i

Result

PCRECode

Thisisacodemadeby ctwheels in question Validate Title Full Name with Regex .

What the code is:

(?(DEFINE)
    (?# Definitions )
    (?<valid_nameChars>[\p{L}\p{Nl}])
    (?<valid_nonNameChars>[^\p{L}\p{Nl}\p{Zs}])
    (?<valid_startFirstName>(?![a-z])[\p{L}'])
    (?<valid_upperChar>(?![a-z])\p{L})
    (?<valid_nameSeparatorsSoft>[\p{Pd}'])
    (?<valid_nameSeparatorsHard>\p{Zs})
    (?<valid_nameSeparators>(?&valid_nameSeparatorsSoft)|(?&valid_nameSeparatorsHard))
    (?# Invalid combinations )
    (?<invalid_startChar>^[\p{Zs}a-z])
    (?<invalid_endChar>.*[^\p{L}\p{Nl}.\p{C}]$)
    (?<invalid_unaccompaniedSymbol>.*(?&valid_nameSeparatorsHard)(?&valid_nonNameChars)(?&valid_nameSeparatorsHard))
    (?<invalid_overTwoUpper>(?:(?&valid_nameChars)*\p{Lu}){3})
    (?<invalid>(?&invalid_startChar)|(?&invalid_endChar)|(?&invalid_unaccompaniedSymbol)|(?&invalid_overTwoUpper))
    (?# Valid combinations )
    (?<valid_name>(?:(?:(?&valid_nameChars)|(?&valid_nameSeparatorsSoft))*(?&valid_nameChars)+(?:(?&valid_nameChars)|(?&valid_nameSeparatorsSoft))*)+\.?)
    (?<valid_firstName>(?&valid_startFirstName)(?:\.|(?&valid_name)*))
    (?<valid_multipleName>(?&valid_firstName)(?=.*(?&valid_nameSeparators)(?&valid_upperChar))(?:(?&valid_nameSeparatorsHard)(?&valid_name))+)
    (?<valid>(?&valid_multipleName)|(?&valid_firstName))
)
^(?!(?&invalid))(?&valid)$

And the validation and debugging test on Regex 101

03.10.2017 / 21:11

0

I've been having this problem for name structure validation, even my system must accept foreign names.

So I was able to compile a Regex that meets the following specifications:

At least 2 words (first and last name);

Does not accept special characters, except "" apostrophe

It is not case sensitive (since I deal in the backend)

Each word must have at least two characters

Follow Regex

Can be tested on link

12.05.2018 / 21:54

How does grid system work? What is devops?

score 18 · Accepted Answer

The problem is you're using the dot, which is for any character except line terminators. If you remove it will work.

. matches any character (except for line terminators)

function valida(nome){
return !!nome.match(/[A-Z][a-z]* [A-Z][a-z]*/);
}

const testes = ["Maria Silva", "Maria  Silva", "Maria silva", "maria Silva", "MariaSilva"];
const resultados = testes.map(valida);
console.log(resultados);

Regex for names is always complicated because there are names of countries more complex than Rosa or Maria. For example "Åsa Ekström", "John O Sílilleabháin" or "Gregor O'Sulivan".

In these cases regex can be absurdly complex. And it's hard to hit all the variants ... yet one suggestion might be

function valida(nome) {
  return !!nome.match(/^[A-ZÀ-Ÿ][A-zÀ-ÿ']+\s([A-zÀ-ÿ']\s?)*[A-ZÀ-Ÿ][A-zÀ-ÿ']+$/) + ' ' + nome;
}

const testes = ["Maria Silva", "Åsa Ekström", "John Ó Súilleabháin", "Gregor O'Sulivan", "Maria  Silva", "Maria silva", "maria Silva", "MariaSilva"];
const resultados = testes.map(valida);
console.log(resultados);