After consulting this page and starting to sketch a quick hack (a "gambiarra"), I realized I already had a ready-made function stashed away ... And looking above, I see that the foundation was already given here by @DanielOmine, who points out that a mobile number with DDD (area code) has 11 digits and a landline has 10. PS: without the DDD they have 9 and 8 digits, respectively.
It was around the year 2000 that Brazilian phone numbers moved to 8 digits, and between 2010 and 2013 all mobile numbers moved to 9 digits. See the Anatel standards.
The generic algorithm for interpreting a free-text string that supposedly holds a phone number is based on the Anatel rule ... It is only slightly more complicated because it has to check the context and tell apart the cases with and without DDI (country code) and DDD.
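As a rough illustration of the digit counts quoted above (not the full Anatel rule), a digits-only string can be classified by its length alone. The function name `pg_temp.tel_kind`, the labels, and the sample values are all made up for this sketch; the 12 and 13 digit case assumes the two extra digits are the DDI 55.

```sql
-- Hypothetical helper, created in the session-local pg_temp schema.
-- Only the digit counts (8, 9, 10, 11) come from the text above.
CREATE OR REPLACE FUNCTION pg_temp.tel_kind(text) RETURNS text AS $f$
  SELECT CASE
    WHEN length($1) = 8  THEN 'landline, no DDD'
    WHEN length($1) = 9  THEN 'mobile, no DDD'
    WHEN length($1) = 10 THEN 'landline with DDD'
    WHEN length($1) = 11 THEN 'mobile with DDD'
    WHEN length($1) IN (12, 13) AND substr($1, 1, 2) = '55'
                         THEN 'DDI 55 + DDD + number'
    ELSE 'needs context'
  END
$f$ LANGUAGE SQL IMMUTABLE;

-- Sample inputs (already reduced to digits only).
SELECT d AS digits, pg_temp.tel_kind(d) AS kind
FROM (VALUES ('12345678'), ('912345678'), ('1112345678'),
             ('11912345678'), ('5511912345678')) AS t(d);
```

Length alone cannot settle every case, which is why the functions below also take a default DDD and a flag saying whether a DDI may be present.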
For anyone interested ... An algorithm that "sanitizes" typed-in phone data, written (in PL/SQL) and validated years ago ... I will update it if I find something better later (I am leaving this as a wiki so you can edit it).
CREATE OR REPLACE FUNCTION lib.get_string_tel(text) RETURNS text AS $f$
  -- Keep only the first phone-like fragment (capture group 1) of the input.
  SELECT regexp_replace(
    $1
    ,'^.*?((\+?55[\s-]|\+\d\d[\s-])?\s*\(?\d\d\)?[\s\-\.]*\d[\.\s]?\d\d[\d\s\.\-]+).*?$'
    ,'\1'
  )
$f$ LANGUAGE SQL IMMUTABLE;
CREATE OR REPLACE FUNCTION lib.telbr_split(
  p_tel text                        -- string containing the phone number
  ,p_pode_ddi boolean DEFAULT true  -- allow a leading DDI (55)?
  ,p_ddd_default text DEFAULT '11'  -- São Paulo context
  ,p_max int DEFAULT 21             -- maximum string length before stray digits are stripped out
) RETURNS text[] AS $f$
DECLARE
  basenum text;
  digits text;
  len int;
  ret text[];
BEGIN
  basenum := trim(p_tel);
  -- Long input is probably free text: extract the phone-like fragment first.
  IF length(basenum)>p_max THEN
    basenum := lib.get_string_tel(basenum);
  END IF;
  digits := lib.digits(basenum);
  IF p_pode_ddi THEN
    IF length(digits)>=14 THEN
      -- Too many digits: drop the first two as a DDI.
      basenum := regexp_replace(basenum,'^[^\d]*\d\d','');
      digits := lib.digits(basenum);
    ELSEIF length(digits)>=12 AND substr(digits,1,2)='55' THEN
      -- Drop the Brazilian DDI.
      basenum := regexp_replace(basenum,'^[^\d]*55','');
      digits := lib.digits(basenum);
    END IF;
  END IF;
  -- Drop leading zeros (e.g. a DDD typed as 011).
  digits := regexp_replace(digits,'^0+','');
  len := length(digits);
  -- No DDD present: fall back to the default one.
  IF len<=9 THEN
    digits := p_ddd_default||digits;
  END IF;
  RETURN array[substr(digits,1,2), substr(digits,3)];
END
$f$ LANGUAGE PLpgSQL IMMUTABLE;
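The function above calls `lib.digits`, which is not shown in this answer. A minimal sketch, assuming it simply removes every non-digit character from its argument (the original helper may well do more):

```sql
-- Assumed helper, not part of the original answer: keep only the digits of a string.
CREATE SCHEMA IF NOT EXISTS lib;

CREATE OR REPLACE FUNCTION lib.digits(text) RETURNS text AS $f$
  -- The 'g' flag removes every non-digit, not just the first one.
  SELECT regexp_replace($1, '\D', '', 'g')
$f$ LANGUAGE SQL IMMUTABLE;

SELECT lib.digits('(11) 9 1234-5678');  -- 11912345678
```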
The first function extracts what looks like a phone number from the middle of free text.
For example, the other day someone typed Iphone 6s plus (11) 9 1234-5678
into one of those Google forms.
The second function is the one that actually normalizes the number and splits it into two parts, DDD and number.
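A few calls may make the splitting behavior easier to see. This is only a sketch: it assumes both functions above have already been created, together with a `lib.digits` helper that strips every non-digit character (that helper is not shown in this answer). The sample numbers are invented.

```sql
-- Assumes lib.get_string_tel, lib.telbr_split and a digit-stripping lib.digits already exist.
SELECT lib.telbr_split('(11) 9 1234-5678');       -- {11,912345678}
SELECT lib.telbr_split('1234-5678');              -- {11,12345678}   default DDD prepended
SELECT lib.telbr_split('+55 (21) 91234-5678');    -- {21,912345678}  DDI 55 dropped
SELECT lib.telbr_split('011 1234-5678');          -- {11,12345678}   leading zero dropped
SELECT lib.telbr_split('1234-5678', true, '21');  -- {21,12345678}   caller-supplied default DDD
```

The result is a two-element text array, so the parts can be read as `(lib.telbr_split(x))[1]` for the DDD and `(lib.telbr_split(x))[2]` for the number.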
The PL/SQL family of languages (PL/pgSQL in PostgreSQL, PL/SQL in Oracle) is almost Pascal, so it is easy to port to PHP, JavaScript, etc. It just takes some patience.