How to do a spell check in C#?

19

I need to analyze the words contained in a database. The analysis consists only of running a spell check and showing the misspelled words in a GridView.

I have never developed anything like this, so I would appreciate some pointers.

I can start the example with this:

string[] palavrasParaCorrigir = {"batata", "conoira", "cebola", "pimentao", "beterraba"};
    
asked by anonymous 07.03.2014 / 20:58

2 answers

11

If you do not mind the fact that the spell checker is under the GPL license, a good solution would be to use NHunspell.

You can get one of its latest versions here. After adding NHunspell.dll to your project, simply use the following code to do the check:

using System.Collections.Generic;
using NHunspell;

// Load the Brazilian Portuguese affix and dictionary files.
using (Hunspell hunspell = new Hunspell("pt_br.aff", "pt_br.dic"))
{
    // The string below is a placeholder for the word to be checked.
    bool ortografia = hunspell.Spell("palavra a ser verificada");

    if (!ortografia) // The word is not spelled correctly.
    {
        /*...*/
    }

    // List of suggestions (possible intended words).
    List<string> sugestoes = hunspell.Suggest("palavra a ser verificada");
}

Note: The affix and dictionary files (.aff and .dic) can be found here.
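To apply this to the array from the question, one option is a small helper that keeps only the words a checker rejects. This is just a sketch: `SpellFilter` and `FindMisspelled` are hypothetical names, and the checker is passed in as a delegate so the helper does not depend on any particular library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SpellFilter
{
    // Returns the words that the supplied checker rejects.
    // 'isCorrect' is a stand-in for a real checker, e.g. hunspell.Spell.
    public static List<string> FindMisspelled(
        IEnumerable<string> words, Func<string, bool> isCorrect)
    {
        return words.Where(w => !isCorrect(w)).ToList();
    }
}
```

Inside the `using` block above, a call such as `SpellFilter.FindMisspelled(palavrasParaCorrigir, hunspell.Spell)` should give the list of misspelled words, which can then be shown in the grid.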

    
08.03.2014 / 16:00
14

Existing libraries such as Hunspell (already mentioned in the other answer) or Aspell will solve your problem quickly: these libraries exist for several languages and are used in many programs.

But if you want to dig a little deeper, there is an excellent article by Peter Norvig (Director of Research at Google) on the subject, "How to Write a Spelling Corrector": link

It is in English, but it basically explains how Google's search engine works out a correction when it suggests one for what we typed.

In summary: the system is based on a dictionary plus an edit-distance check of up to distance 2, that is, it considers candidate words at most two single-letter edits away from the input. In the article and its examples, the dictionary is built from a file containing a large amount of correctly spelled text; Peter Norvig used a concatenation of public-domain books for this.

When the user enters a word, the program checks whether it exists in the dictionary. If so, the word is considered correct.
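A minimal sketch of that dictionary, assuming the reference text is already loaded into a string. `WordModel`, `Train` and `IsKnown` are names made up for this example; they are not taken from the article.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordModel
{
    // Counts how often each word occurs in a body of correctly spelled text.
    // \p{L}+ matches runs of letters, including accented ones such as "ã" or "ç".
    public static Dictionary<string, int> Train(string corpus)
    {
        var counts = new Dictionary<string, int>();
        foreach (Match m in Regex.Matches(corpus.ToLowerInvariant(), @"\p{L}+"))
        {
            int n;
            counts.TryGetValue(m.Value, out n); // n stays 0 for a new word
            counts[m.Value] = n + 1;
        }
        return counts;
    }

    // A word is treated as correct when it occurs in the reference text.
    public static bool IsKnown(Dictionary<string, int> counts, string word)
    {
        return counts.ContainsKey(word.ToLowerInvariant());
    }
}
```

Keeping the counts, rather than just a set of words, is what later allows choosing the most frequent candidate.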

If it does not exist, the program generates several mutants (erroneous variations) of that word using the following techniques:

  • Swap each pair of adjacent letters;
  • Replace the letter at each position with another letter;
  • Insert a letter at each position;
  • Delete the letter at each position.
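The four techniques above can be sketched roughly as follows. This is not the article's code: `Mutations`, `Edits1` and the alphabet (a-z plus some accented letters used in Portuguese) are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

public static class Mutations
{
    // Assumed alphabet: a-z plus accented letters common in Portuguese.
    public const string Alphabet = "abcdefghijklmnopqrstuvwxyzáàâãéêíóôõúç";

    // Returns the distinct strings reachable from 'word' with a single edit.
    // The word itself also ends up in the set (a letter replaced by itself).
    public static HashSet<string> Edits1(string word, string alphabet = Alphabet)
    {
        var result = new HashSet<string>();
        for (int i = 0; i <= word.Length; i++)
        {
            string left = word.Substring(0, i);
            string right = word.Substring(i);

            if (right.Length > 0)
                result.Add(left + right.Substring(1));                       // deletion
            if (right.Length > 1)
                result.Add(left + right[1] + right[0] + right.Substring(2)); // swap
            foreach (char c in alphabet)
            {
                if (right.Length > 0)
                    result.Add(left + c + right.Substring(1));               // replacement
                result.Add(left + c + right);                                // insertion
            }
        }
        return result;
    }
}
```

For example, `Mutations.Edits1("pimentao")` contains "pimentão", because the accented letter is part of the assumed alphabet.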

From this list of mutants, it checks which ones exist in the dictionary. The one that occurs most often in the reference text is taken as the correct word.

In the example program, if no correct word is found at this point, it takes every word from the list of mutants and generates new mutants from each of them. It then checks again whether any of these exist in the dictionary.
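Put together, the lookup could look roughly like the sketch below. It is only an illustration under assumptions: `Corrector` and `Correct` are hypothetical names, `counts` is a word-frequency table built from the reference text, and `edits` is any function that produces the one-edit mutants of a word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Corrector
{
    // 'counts' maps each known word to its frequency; 'edits' produces the
    // one-edit mutants of a word. Both are supplied by the caller.
    public static string Correct(string word,
                                 IDictionary<string, int> counts,
                                 Func<string, IEnumerable<string>> edits)
    {
        if (counts.ContainsKey(word))
            return word; // already a known word

        // First pass: mutants one edit away from the input.
        List<string> firstPass = edits(word).Distinct().ToList();
        List<string> known = firstPass.Where(w => counts.ContainsKey(w)).ToList();

        // Second pass: only if nothing was found, mutate the mutants again.
        if (known.Count == 0)
            known = firstPass.SelectMany(v => edits(v))
                             .Where(w => counts.ContainsKey(w))
                             .Distinct()
                             .ToList();

        // Pick the most frequent known candidate; otherwise return the input.
        return known.Count == 0
            ? word
            : known.OrderByDescending(w => counts[w]).First();
    }
}
```

Note that the second pass grows quickly: each mutant produces a full set of mutants of its own, so for long word lists it may be worth caching results.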

At the end of the article you will find the program's code in various languages (at the time, I wrote a Java version and a Groovy one); there are versions for almost every language, including two in C#.

The only additional detail is that you may have to tweak the source code so that the range of letters covers not only a-z but also the accented letters we use in Portuguese.

Of course, you will need a dictionary in Portuguese. Alternatively, if your list consists only of product names, for example, you can use your own product list in place of the dictionary.

    
07.03.2014 / 23:09