Save without accents and uppercase SQL Server

3

I am creating a database, where I need to save all varchar fields in uppercase. Searching the internet, I found the following collate that would work, but it still keeps saving with accents and in uppercase and lowercase. Here's the example:

CREATE TABLE Cidades(
  CodCidade int identity(1,1) not null,
  Descricao varchar(80) COLLATE SQL_Latin1_General_Cp1251_CS_AS not null,
  CodEstado int not null,
  CodIBGE varchar(30) not null,
  Excluido bit not null default 0,
  CONSTRAINT PK_Cidades PRIMARY KEY(CodCidade)
)

Does anyone know what's wrong for this collate not to work? Do you have any other?

    
asked by anonymous 03.10.2016 / 19:10

2 answers

1

Collation SQL can not help in this case. It will define at most that accent and case (uppercase / lowercase) should be ignored in searches.

In order to do the conversion and in fact to keep everything in uppercase, without an accent, you need to convert the text before it persists, using some specific function.

MS SQL Server has no native function to remove accent. You would have to create this function yourself in SQL, or create this function in the application layer if you have one (in C # or Java, for example).

Algorithm

One of the ways to do this is to use two arrays to make a from-to , from there you go through the word, character, search for the position of the original character in the first array, character in the second array, in this same position.

Having only the accented characters and their substitutes in the arrays, if a character is not found, just return it yourself.

In addition to character substitutions accented by non-accented characters, you should also convert all to upper case (for this the languages already include native function, including SQL Server).

Ready-made language functions

Eventually the language you use already has some functions ready to help with some of the work.

For example, C # has String.Normalize that already helps a lot. See this example : p>

static string RemoveDiacritics(string text) 
{
    var normalizedString = text.Normalize(NormalizationForm.FormD);
    var stringBuilder = new StringBuilder();

    foreach (var c in normalizedString)
    {
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }

    return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}

The code above first separates the letters of your accents or cedilla using String.Normalize , and then traverses the result by removing the accent characters that were separated from the letters, leaving only the letters

To the result of this function above you still have to apply string.UpperCase() . More or less like this:

static void Main(string[] args)
{
    var textoOriginal = "É é ç";
    var textoConvertido = RemoveDiacritics(textoOriginal).ToUpper();

    Console.WriteLine(textoConvertido);
}

The output of the above code is E E C .

    
03.10.2016 / 19:39
3

Solution 1:

Combine collation with function UPPER when adding records in the column.

INSERT INTO Cidades(Descricao, CodEstado, CodIBGE)
    values(UPPER('São Paulo'), 1, 1)

output:
CodCidade   Descricao   CodEstado   CodIBGE Excluido
1           SAO PAULO   1           1       0

Your collation should already remove the accents, but using SQL_Latin1_General_CP1253_CI_AI also works.

To search for other collations run:

select name, COLLATIONPROPERTY(name, 'CodePage') as Code_Page, description
from   sys.fn_HelpCollations()

Solution 2:

Create a separate field, such as DescricaoNormalizada , in order not to lose the original value:

Descricao varchar(80) not null,
DescricaoNormalizada varchar(80) COLLATE SQL_Latin1_General_Cp1251_CS_AS null,

Prepending with normalized value through triggers (triggers) or explicitly in INSERT, resulting in:

CodCidade   Descricao   DescricaoNormalizada    CodEstado   CodIBGE Excluido
1           São Paulo   SAO PAULO               1           1       0

Solution 3:

Do nothing in the database, leave the transformation behind the backend, or frontend if the goal is formatting.

Descricao varchar(80) not null,
    
03.10.2016 / 19:59