How to find a String from Regular Expressions

16

I have a field in the SQL Server 2014 database that stores the client address. The problem is that everything was stored in a single varchar field.

Here are some real examples:

  

Antonio José Nerchis Street, 81
   Street: Guaira / JD Pacera, 1201
  Rua 321, N 168
  R ALTAIR GAGLIARDI, 132
  Rua Rio Grande do Sul, 243
  Rua Duque de Caxias
  R CAPELLEN, 58
  ROD ACCESSO PLINIO ARLINDO DE NEZ, 4303
  R Anibal Gazaniga, 107
  R D PEDRO I, 231
  Rua Dep. Ivan Ferreira do Amaral, 440
  R SANTA CATARINA, 711
  Rua Santa Catarina, 411
  Rua Augusto de Jesus, 77
  Marcondes Sobrinho Street, 40
  Antonio Tonelli street nº 755

I need to move this to the 3 columns:

  • Type of street (street, avenue, village ...)
  • The street itself (name of street, avenue, village ...)
  • Number of the street (number and other complements that may have been placed near the street)

Does anyone know how to do this via regular expression? If yes, how? I do not understand anything about regular expressions.

    
asked by anonymous 31.08.2016 / 20:09

3 answers

8

ATTENTION : Before performing the steps outlined in this response, make a backup of your table in case something goes wrong.

I do not think regular expression solves your problem, at least not easily.

Now imagine that you have these addresses in the endereco column of a table named tabela .

Then, you would first add the required fields:

ALTER TABLE tabela ADD COLUMN logradouro varchar(255) DEFAULT NULL;
ALTER TABLE tabela ADD COLUMN numero_logradouro varchar(6) DEFAULT NULL;
ALTER TABLE tabela ADD COLUMN tipo_logradouro varchar(5) DEFAULT NULL;

The size of the logradouro field must be the same as the endereco field.

Hence, you just have to populate those fields. Let's start by separating the type of the street from the rest:

UPDATE tabela SET
    tipo_logradouro = 'Rua',
    logradouro = SUBSTRING(endereco, 3, 9999)
WHERE UPPER(SUBSTRING(endereco, 1, 2)) = 'R ';

UPDATE tabela SET
    tipo_logradouro = 'Rua',
    logradouro = SUBSTRING(endereco, 4, 9999)
WHERE UPPER(SUBSTRING(endereco, 1, 3)) = 'R. ';

UPDATE tabela SET
    tipo_logradouro = 'Rua',
    logradouro = SUBSTRING(endereco, 5, 9999)
WHERE UPPER(SUBSTRING(endereco, 1, 4)) = 'RUA ';

UPDATE tabela SET
    tipo_logradouro = 'Avenida',
    logradouro = SUBSTRING(endereco, 4, 9999)
WHERE UPPER(SUBSTRING(endereco, 1, 3)) = 'AV ';

UPDATE tabela SET
    tipo_logradouro = 'Avenida',
    logradouro = SUBSTRING(endereco, 5, 9999)
WHERE UPPER(SUBSTRING(endereco, 1, 4)) = 'AV. ';

UPDATE tabela SET
    tipo_logradouro = 'Avenida',
    logradouro = SUBSTRING(endereco, 9, 9999)
WHERE UPPER(SUBSTRING(endereco, 1, 8)) = 'AVENIDA ';

And then, you make the same steps to square, village, road, highway, mall, etc.

Note that 9999 is just large enough to fit the entire address. If the address is varchar(123) , then changing 9999 by 123 will be enough.

Now is the time to separate the number. First we normalize:

UPDATE tabela SET logradouro = REPLACE(logradouro, 'nº ', ', ');
UPDATE tabela SET logradouro = REPLACE(logradouro, 'nº', ', ');
UPDATE tabela SET logradouro = REPLACE(logradouro, 'Nº ', ', ');
UPDATE tabela SET logradouro = REPLACE(logradouro, 'Nº', ', ');

Then we separate the number after the last comma. To find the last, we use the expression LEN(logradouro) - CHARINDEX(REVERSE(logradouro), ',') :

UPDATE tabela SET
    numero_logradouro = TRIM(SUBSTRING(logradouro, LEN(logradouro) - CHARINDEX(REVERSE(logradouro), ',') + 1, 9999)),
    logradouro = TRIM(SUBSTRING(logradouro, 1, LEN(logradouro) - CHARINDEX(REVERSE(logradouro), ',') - 1));
WHERE CHARINDEX(logradouro, ',') >= 1;

Finally, look at what records are left over or gone wrong for some reason. Pray for a few:

SELECT * FROM tabela
WHERE endereco IS NOT NULL
AND (logradouro IS NULL
    OR numero_logradouro IS NULL
    OR tipo_logradouro IS NULL
);

When you are sure that everything is correct and that all the records are correct:

ALTER TABLE tabela DROP COLUMN endereco;

Problems : What is here is still not able to handle the add-on. And if the complement is after the number, it can go wrong. If these cases are few, it is possible to treat them manually. If they are many, edit the question by putting some examples and I try to adapt here in the answer.

    
31.08.2016 / 21:58
5

The problem with an open field for user insertion is that even though it seems to obviate / default, its content can not be trusted. The ideal would be to update the system and create address tables:

CREATE TABLE tb_pais(
    id SERIAL,
    pais VARCHAR(100),
    PRIMARY KEY (id)
);

CREATE TABLE tb_uf(
    id SERIAL,
    uf VARCHAR(100),
    sigla CHAR(2),
    id_pais INTEGER REFERENCES tb_pais(id),
    PRIMARY KEY (id)
);

CREATE TABLE tb_cidade(
    id SERIAL,
    cidade VARCHAR(100),
    id_uf INTEGER REFERENCES tb_uf(id),
    PRIMARY KEY (id)
);

CREATE TABLE tb_bairro(
    id SERIAL,
    bairro VARCHAR(100),
    id_cidade INTEGER REFERENCES tb_cidade(id),
    PRIMARY KEY (id)
);

CREATE TABLE tb_logradouro(
    id SERIAL,
    logradouro VARCHAR(100),
    id_bairro INTEGER REFERENCES tb_bairro(id),
    PRIMARY KEY (id)
);

CREATE TABLE tb_cep(
    id SERIAL,
    cep VARCHAR(100),
    id_logradouro INTEGER REFERENCES tb_logradouro(id),
    PRIMARY KEY (id)
);

CREATE TABLE tb_endereco(
    id_cliente INTEGER REFERENCES tb_cliente(id),
    id_pais INTEGER REFERENCES tb_pais(id),
    id_uf INTEGER REFERENCES tb_uf(id),
    id_cidade INTEGER REFERENCES tb_cidade(id),
    id_bairro INTEGER REFERENCES tb_bairro(id),
    id_logradouro INTEGER REFERENCES tb_logradouro(id),
    id_cep INTEGER REFERENCES tb_cep(id),
    numero INTEGER
);

This eliminates duplicate names and can also develop search engines.

Once you have done this, you can start by updating the content in the database, which would consist of capturing the current content and inserting it in its place.

Let's say As commented out your current field follows the Rua endereço numero pattern, you could create a [^ ]+ (.+?) (\d+?) REGEX. Thus, group 1 would be the street and group 2 would be the number, the initial part to the first space being discarded, Rua Almeida, 123 , would be Almeida, to the street and 123 to the address number.

After this completed step, a management would be created to update the data of the street, and in case of duplicate removal and substitution of duplicate ids.

    
31.08.2016 / 21:39
4

For your data sample you can do, change the PivotTable for your actual table, create the fields in your table and try to make the change.

You can use Begin tran and if everything is right commit or rollback

   declare @Enderecos table
(
  Enderecos varchar(max),
  TipodeLogradouro varchar(200),
  Logradouro varchar(200),
  Numero varchar(200)
)

insert into @Enderecos(Enderecos) values 
('Rua Antonio José Nerchis, 81'),
('Rua: Guaira/JD Pacera, 1201'),
('Rua 321, N 168'),
('R ALTAIR GAGLIARDI, 132'),
('Rua Rio Grande do Sul, 243'),
('Rua Duque de Caxias'),
('R CAPELLEN, 58'),
('ROD ACESSO PLINIO ARLINDO DE NEZ, 4303'),
('R Anibal Gazaniga, 107'),
('R D PEDRO I, 231'),
('Rua Dep. Ivan Ferreira do Amaral, 440'),
('R SANTA CATARINA, 711'),
('Rua Santa Catarina, 411'),
('Rua Augusto de Jesus, 77'),
('Rua Marcondes Sobrinho, 40'),
('rua Antonio Tonelli nº 755 casa 2')

update @Enderecos
set TipodeLogradouro = REPLACE(REPLACE(REPLACE(LEFT(Enderecos, charindex(' ', Enderecos) - 1),',' , ''),':' , ''),';' , ''),
Numero = 
case when  Enderecos  like '%º%' then substring(Enderecos, charindex('º', Enderecos) - 1, 100 )
        else 
        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace
        (Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace
        (reverse(left(reverse(Enderecos), charindex(' ', reverse(Enderecos)) -1)        
        ) COLLATE sql_latin1_general_cp1251_ci_as, 'z', ''), 'x', ''), 'w', ''), 'y', ''), 'v', ''), 'u', ''), 't', ''), 's', ''), 'r', ''), 'q', ''), 'p', '')
        , 'o', ''), 'n', ''), 'm', ''), 'l', ''), 'k', ''), 'j', ''), 'i', ''), 'h', '')
        , 'g', ''), 'f', ''), 'e', ''), 'd', ''), 'c', ''), 'b', ''), 'a', ''),',' , ''),':' , ''),';' , ''),' ',''),'º',''),'.',''),'/',''),'\','')
      end
from @Enderecos 

update @Enderecos
set Numero = REPLACE(Numero,'N?','Nº')
update @Enderecos

set Logradouro = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(Enderecos,TipodeLogradouro,''),Numero,'')
,':' , ''),';' , ''),',',''),'Nº',''),'.',''),'\','')


select *  from @Enderecos

I changed to have a given ('rua Antonio Tonelli nº 755 casa 2') with add-in.

    
31.08.2016 / 21:22