Domain names can nowadays contain Unicode (UTF-8) characters. If your business rules require accepting domains with non-ASCII characters, the following routine may be useful:
function validate_domain_name($str, $force_utf8 = true)
{
    // The "u" modifier enables UTF-8 mode so that \w also matches non-ASCII letters.
    $flags = $force_utf8 ? 'u' : '';
    // Inadequate:
    //$re = '[^a-zA-Z0-9\.]';
    // Inadequate, because it does not enforce basic naming rules:
    //$re = '^(http[s]?\:\/\/)?((\w+)\.)?(([\w-]+)?)(\.[\w-]+){1,2}$';
    // This one is more consistent:
    $re = '^(?!\-)(?:[\w\d\-]{0,62}[\w\d]\.){1,126}(?!\d+)[\w\d]{1,63}$';
    if (preg_match('/' . $re . '/' . $flags, $str, $rs) && !empty($rs[0])) {
        return $rs[0];
    }
    return null;
}
// Sample inputs: only the ones marked "valid" should be returned by the function.
$samples = [
    '000-.com',
    '-000.com',
    '000.com',     // valid
    'foo-.com',
    '-foo.com',
    'foo.com',     // valid
    'foo.any',     // valid
    'お名前0.com', // valid
    'お名前.コム', // valid
];
foreach ($samples as $str) {
    echo $str, ' => domain: ', validate_domain_name($str) ?? 'invalid', PHP_EOL;
}
To disable Unicode support, pass the Boolean false as the second parameter.
The original regular expression has been adapted from this response: link
The adaptations I made were replacing a-zA-Z with \w and adding the option to include the u flag, which allows non-ASCII characters to match.
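To see the effect of the u flag in isolation, the sketch below applies the same pattern used in validate_domain_name() to an ASCII domain and a non-ASCII domain, once with the flag and once without it. It assumes PHP's usual PCRE behavior, where the u modifier turns on both UTF-8 mode and Unicode character properties, so \w matches letters such as お; without the flag, \w only matches ASCII word characters.

```php
<?php
// Same pattern as in validate_domain_name(), tested with and without the "u" modifier.
$re = '^(?!\-)(?:[\w\d\-]{0,62}[\w\d]\.){1,126}(?!\d+)[\w\d]{1,63}$';

foreach (['foo.com', 'お名前.コム'] as $domain) {
    // preg_match() returns 1 on a match, 0 on no match.
    $withU = preg_match('/' . $re . '/u', $domain);
    $withoutU = preg_match('/' . $re . '/', $domain);
    printf("%s: with u = %d, without u = %d\n", $domain, $withU, $withoutU);
}
```

With the flag, both inputs should match; without it, only foo.com should, because the Japanese characters are then treated as raw bytes that \w does not accept.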