Divide string into substrings and search for them in another

4

I am creating a function in which I will check if a endereço_A contains parts in a endereço_B , exemplifying:

address_A

AVENIDA JOÃO E MARIA

B_address

RUA JOÃO

The result should return JOÃO , which is the substring that both addresses have in common.

CREATE FUNCTION dbo.splitSearchAddress ( @addressSearch VARCHAR(MAX), @addressIn VARCHAR(MAX) )
RETURNS
    @returnList TABLE ([Name] [nvarchar] (500))
AS
BEGIN

    DECLARE @name NVARCHAR(255)
    DECLARE @pos INT
    DECLARE @separator CHAR(1)

    SET @separator = ' '

    WHILE CHARINDEX(@separator, @addressSearch) > 0
    BEGIN
        SELECT @pos  = CHARINDEX(@separator, @addressSearch)  
        SELECT @name = SUBSTRING(@addressSearch, 1, @pos-1)

        IF CHARINDEX(@name, @addressIn) > 0
        BEGIN
            INSERT INTO @returnList 
            SELECT @name

        END

        SELECT @addressSearch = SUBSTRING(@addressSearch, @pos+1, LEN(@addressSearch)-@pos)

    END

    INSERT INTO @returnList
    SELECT ISNULL(@addressSearch, 0)

    RETURN
END

The problem is when I look for something that does not contain endereço_B and it returns me the last value even though it was not found. What would be the ideal way to solve this? Do I need to be resetting the variable for each condition?

Problem example:

SELECT * FROM dbo.splitSearchAddress('RUA JOÃO E MARIA', 'AVENIDA JOÃO')

It will return me

JOÃO
E
MARIA

When should I return only

JOÃO

If the substring of endereço_A is the last position and if it is located in endereço_B , it works correctly:

SELECT * FROM dbo.splitSearchAddress('RUA MARIA JOÃO', 'AVENIDA JOÃO')
    
asked by anonymous 01.12.2016 / 15:11

2 answers

3

CHARINDEX does not work for your problem because it looks for occurrences in the String. Words like "Street Da", "STREET A" would fall into the same situation as the word "E".

When "E" is assigned there is @name variable IF CHARINDEX('E', 'AVENIDA JOÃO') is returned 3 due to occurrence of "E" in "AVENUE".

One solution I found was to break the text "RUA JOÃO E MARIA" and "AVENIDA JOÃO" into separate tables using a Split(' ') function with space delimiter, after that you can compare what is common using a INNER JOIN . My code stayed like this.

Split Function:

CREATE FUNCTION [dbo].[Split] ( @text VARCHAR(MAX),
                                      @delimiter VARCHAR(20) = ' ' ) RETURNS @Strings TABLE ( [position] INT IDENTITY PRIMARY KEY,
                                                                                                                              [value] VARCHAR(100) ) AS BEGIN DECLARE @index INT
SET @index = -1 WHILE (LEN(@text) > 0) BEGIN -- Find the first delimiter

SET @index = CHARINDEX(@delimiter, @text) IF (@index = 0)
AND (LEN(@text) > 0) BEGIN
INSERT INTO @Strings
VALUES (CAST(@text AS VARCHAR(100))) BREAK END

                IF (@index > 1) BEGIN
INSERT INTO @Strings
VALUES (CAST(LEFT(@text, @index - 1) AS VARCHAR(100)))
SET @text = RIGHT(@text, (LEN(@text) - @index)) END --Delimiter is 1st position = no @text to insert
 ELSE
SET @text = CAST(RIGHT(@text, (LEN(@text) - @index)) AS VARCHAR(100)) END RETURN END

SplitSearchAddress function: (adapted)

CREATE FUNCTION dbo.splitSearchAddress (@addressSearch VARCHAR(MAX), @addressIn VARCHAR(MAX)) RETURNS @returnList TABLE ([Name] [nvarchar] (500)) AS BEGIN DECLARE @Temp1 TABLE (NAME varchar(MAX)) DECLARE @Temp2 TABLE (NAME varchar(MAX))
INSERT INTO @Temp1
SELECT value
FROM dbo.Split(@addressSearch, ' ');


INSERT INTO @Temp2
SELECT value
FROM dbo.Split(@addressIn, ' ');


INSERT INTO @returnList
SELECT T1.NAME
FROM @Temp1 T1
    INNER JOIN @Temp2 T2 ON T1.NAME = T2.NAME RETURN 
END

Query:

SELECT * FROM dbo.splitSearchAddress('RUA JOÃO E MARIA', 'AVENIDA JOÃO')

Result:

    
01.12.2016 / 17:37
1

Based on a function to separate words by space:

if object_id('fc_split', 'TF') is null
begin
  exec('create function fc_split() returns @resultado table(item varchar(max)) as begin return end');
end
go

alter function dbo.fc_split(@frase       varchar(max),
                            @delimitador varchar(max))
returns @resultado table(item      varchar(max),
                      sequencia integer)
begin
  declare @parte     varchar(max),
          @sequencia integer;

  set @sequencia = 0;

  while charindex(@delimitador, @frase, 0) <> 0
  begin
    set @parte = substring(@frase, 1, charindex(@delimitador, @frase, 0) - 1);
    set @frase = substring(@frase, charindex(@delimitador, @frase, 0) + len(replace(@delimitador, ' ', '.')), len(@frase));

    if len(@parte) > 0
    begin
      insert into @resultado(item, sequencia) values(@parte, @sequencia + 1);

      set @sequencia = @sequencia + 1;
    end;
  end;

  if len(@frase) > 0
  begin
    insert into @resultado(item, sequencia) values(@frase, @sequencia+ 1);
  end

  return;
end
go

You only need to% w / w with the results of dividing the two sentences, resulting in the correctness of your function:

if object_id('splitSearchAddress', 'TF') is null
begin
  exec('create function splitSearchAddress() returns @resultado table(item varchar(max)) as begin return end');
end
go

alter function dbo.splitSearchAddress(@addressSearch varchar(max),
                                      @addressIn     varchar(max))
returns @returnList table(name nvarchar (500))
as
begin
  insert into @returnList(name)
  select item
    from fc_split(@addressSearch, ' ')
  intersect
  select item
    from fc_split(@addressIn, ' ');

  return
end
  

INTERSECT Operator

     

The INTERSECT operator is used to return the result of 2 or more INTERSECT . Not entering will only return the rows that are selected by all SELECT or queries . If a record exists in a data sets and not in another, it is omitted from the result.

    
01.12.2016 / 18:09