Lexicographic order, or collation, depends heavily on the language and alphabet you are using. It is a separate problem from choosing a character set, which Unicode has already solved.
For your question, I recommend this essential read:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Approaching the problem in C
The recommendation is always to use a Unicode representation instead of literal characters stored in char, mainly because accented Latin characters are multi-byte in an encoding such as UTF-8, so they cannot be represented correctly in a single char (typically -128 to 127) or even in an unsigned char (0 to 255).
Take as a reference:
É = LATIN CAPITAL LETTER E WITH ACUTE
It is the Unicode code point U+00C9, encoded in UTF-8 as the hex bytes C3 89, that is, it occupies 2 bytes in UTF-8.
So it has to be represented by a multibyte (or wide) character type.
Suppose the question is about receiving an input, converting it and testing it, as you put it:
I need to compare two strings ignoring the accent
One approach would look like this example, which uses the wide-character I/O functions and stores everything as wchar_t:
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

// Unicode constant represented by a wide-char type
const wchar_t E_GRANDE_ACENTO = L'\u00C9';

int main()
{
    // get the environment's default locale (on Linux usually UTF-8)
    setlocale(LC_ALL, "");
    // fputs for the wide-char type
    fputws(L"Enter the string: ", stdout);
    fflush(stdout);
    wchar_t wbuff[128];
    // fgets for the wide-char type; stop if nothing was read
    if (fgetws(wbuff, 128, stdin) == NULL)
        return 1;
    size_t len = wcslen(wbuff);
    for (size_t n = 0; n < len; ++n)
    {
        // replace every É with a plain E
        if (wbuff[n] == E_GRANDE_ACENTO)
            wbuff[n] = L'E';
    }
    wprintf(L" %ls\n", wbuff);
    return 0;
}
This is just a reference example; for a broader approach to this kind of problem, the API (UNAC) suggested by @Intrusion would be more advisable.
What about collating Unicode strings?
This may be the approach you were hoping for. I recommend the ICU (International Components for Unicode) API: it handles ordering using existing locale standards, or even a specific rule set declared when you open the collator.
Collator example using the ICU API to sort an array of Unicode strings:
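#include <unicode/ucol.h>

/* Sorts an array of NUL-terminated UTF-16 strings in ascending order
   using the en_US collation rules */
void sort_unicode(UChar *s[], int32_t listSize)
{
    UErrorCode status = U_ZERO_ERROR;
    UCollator *coll = ucol_open("en_US", &status);
    if (U_SUCCESS(status)) {
        /* bubble sort: swap adjacent strings when the first one
           collates after the second (-1 = NUL-terminated length) */
        for (int32_t i = listSize - 1; i >= 1; i--) {
            for (int32_t j = 0; j < i; j++) {
                if (ucol_strcoll(coll, s[j], -1, s[j+1], -1) == UCOL_GREATER) {
                    UChar *tmp = s[j];
                    s[j] = s[j+1];
                    s[j+1] = tmp;
                }
            }
        }
        ucol_close(coll);
    }
}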