Filter items that do not contain words in my list

6

I have a list of a custom type: List<Grupos> lista = new List<Grupos>();

And I have this code:

var txtFiltro = "noivas,unhas";
var palavrasFiltro = txtFiltro.ToLower().Split(',');
matches = lista.Where(x => !palavrasFiltro.Contains(x.Nome_Grupo.ToLower().ToString())).ToList();

This code keeps only the items whose name does not match any of the words I supply, but it only works when the name is a single word; when the name is a phrase, it does not work. How can I implement this filter?

Here is the code that user @jbueno helped me with: link

    
asked by anonymous 12.05.2016 / 16:50

2 answers

6

The algorithm presented in the link is wrong. It looks for the phrase inside the words, which cannot work. You cannot search for a longer text inside a shorter one; the longer one will never be contained in the shorter one. If that is what you wanted, it is logically impossible and makes no sense.
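To make the difference concrete, here is a minimal sketch comparing the two checks. The group name "vestidos para noivas" is a made-up example, not data from the question.

```csharp
using System;
using System.Linq;

public static class Demo
{
    public static void Main()
    {
        string[] palavrasFiltro = { "noivas", "unhas" };
        string nome = "vestidos para noivas"; // hypothetical group name

        // Array membership: true only if some element EQUALS the whole phrase.
        bool membership = palavrasFiltro.Contains(nome);

        // Substring search: true if the phrase contains some filter word.
        bool substring = palavrasFiltro.Any(p => nome.Contains(p));

        Console.WriteLine(membership); // False
        Console.WriteLine(substring);  // True
    }
}
```

The first check is what the question's code does, which is why it only works when the name is exactly one of the filter words.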

If you are looking for phrases, just use the phrase. The spaces between words will not get in the way.

If you want to use a few loose words as a filter, you need a more elaborate filter mechanism. Although this can be done 100% in LINQ, I find it worthwhile to create an extension method that solves it (you can even use LINQ inside it, if you wish). It does not even need to be an extension method, but that is more convenient to use.

For each sentence, this method has to check whether any of the strings used in the filter is contained in it.

There are more performant ways of doing this, but they are more complex; I do not know whether they are worth it.

public static bool ContainsAny(this string haystack, params string[] needles) {
    foreach (var needle in needles) {
        if (haystack.Contains(needle))
            return true;
    }
    return false;
}

See it running on dotNetFiddle.

The LINQ form:

public static bool ContainsAny(this string haystack, params string[] needles) {
    return needles.Any(x => haystack.Contains(x));
}
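As a sketch of how this could be plugged into the question's filter: the Grupos class below is a hypothetical stand-in (only its Nome_Grupo property is known from the question), and Filtrar and the sample names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's class; only Nome_Grupo is known.
public class Grupos
{
    public string Nome_Grupo { get; set; }
}

public static class StringExtensions
{
    public static bool ContainsAny(this string haystack, params string[] needles)
    {
        return needles.Any(x => haystack.Contains(x));
    }
}

public static class Demo
{
    // Keeps only the groups whose name contains none of the filter words.
    public static List<Grupos> Filtrar(List<Grupos> lista, string txtFiltro)
    {
        var palavrasFiltro = txtFiltro.ToLower().Split(',');
        return lista
            .Where(x => !x.Nome_Grupo.ToLower().ContainsAny(palavrasFiltro))
            .ToList();
    }

    public static void Main()
    {
        var lista = new List<Grupos>
        {
            new Grupos { Nome_Grupo = "Vestidos para Noivas" },
            new Grupos { Nome_Grupo = "Unhas decoradas" },
            new Grupos { Nome_Grupo = "Receitas de bolo" },
        };

        foreach (var g in Filtrar(lista, "noivas,unhas"))
            Console.WriteLine(g.Nome_Grupo); // prints only "Receitas de bolo"
    }
}
```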

Actually, if you want to match only exact words, things get more complicated. With this example (which is based on the earlier one; if this does not do what you want, neither does the other), searching for "noiva" (bride) will also find "noivado" (engagement), even if you do not want that. This search is not by words but by text fragments; it pays no attention to the structure of the text.

If you want to solve this naively, you would have to split the sentence into words and check each one for equality.

See it running on dotNetFiddle and CodingGround.
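The linked code is not reproduced here. A minimal sketch of the naive word-by-word idea might look like the following; ContainsAnyWord is a hypothetical name, and the separator list and case-insensitive comparison are assumptions rather than part of the answer.

```csharp
using System;
using System.Linq;

public static class WordExtensions
{
    // Hypothetical helper: true if any needle equals a whole word of haystack.
    public static bool ContainsAnyWord(this string haystack, params string[] needles)
    {
        // Naive tokenization: split on a handful of separators only.
        var words = haystack.Split(
            new[] { ' ', ',', '.', ';', '!', '?' },
            StringSplitOptions.RemoveEmptyEntries);

        // Compare whole words, ignoring case.
        return words.Any(w => needles.Contains(w, StringComparer.OrdinalIgnoreCase));
    }
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine("festa de noivado".Contains("noiva"));         // True (substring)
        Console.WriteLine("festa de noivado".ContainsAnyWord("noiva"));  // False (whole word)
        Console.WriteLine("vestido de noiva".ContainsAnyWord("noiva"));  // True
    }
}
```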

This does not solve all cases. More complete parsing is needed to handle them all, and it gets tricky.

Then you have to consider: what if I want both "noiva" and "noivas" ("bride" and "brides")? You have to treat them as two words. The same goes for verb conjugation, gender inflection, etc.

    
12.05.2016 / 17:20
3

I think the problem is that you are looking for the larger string inside the smaller one, i.e. you are checking whether the filter contains the phrase rather than the other way around.

I suggest:

string[] palavrasFiltro = { "noivas", "unhas" }; // kept as an array because that is what you use...
List<string> lista = new List<string>(); // ... but I prefer working with lists. Your phrases go here.
List<string> remocao = new List<string>();

foreach (string elemento in lista)
{
    foreach (string filtro in palavrasFiltro)
    {
        if (elemento.Contains(filtro))
        {
            remocao.Add(elemento);
            break;
        }
    }
}

foreach (string s in remocao)
{
    lista.Remove(s);
}
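
For comparison, the same removal can probably be written more compactly with List<T>.RemoveAll. This is only a sketch; the method name RemoverFiltrados and the sample phrases are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Demo
{
    // Removes, in place, every phrase that contains at least one filter word.
    public static void RemoverFiltrados(List<string> lista, string[] palavrasFiltro)
    {
        lista.RemoveAll(elemento => palavrasFiltro.Any(f => elemento.Contains(f)));
    }

    public static void Main()
    {
        var lista = new List<string>
        {
            "vestidos para noivas",
            "unhas decoradas",
            "receitas de bolo",
        };

        RemoverFiltrados(lista, new[] { "noivas", "unhas" });

        foreach (var s in lista)
            Console.WriteLine(s); // prints only "receitas de bolo"
    }
}
```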
    
12.05.2016 / 17:17