How does Bag of Words work and where is it used?

8

I was recently reading about artificial intelligence and came across some articles that mention a "bag of words", but I do not know what it is, and I could not find anything in Portuguese about it.

What is the "bag of words", and in which cases does it apply? If possible, please include sources.

    
asked by anonymous 29.06.2017 / 02:30

1 answer

10

Explanation

The bag-of-words model is a simplified representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order, but keeping multiplicity.
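To make the "multiset" idea concrete, here is a minimal sketch in Python. The helper name bag_of_words and the two English sentences are illustrative assumptions, not part of the original answer; tokenization is a plain whitespace split.

```python
from collections import Counter

def bag_of_words(text):
    """Return the multiset of lowercase, whitespace-separated tokens in text."""
    # Hypothetical helper for illustration; real tokenizers handle punctuation too.
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

print(a["the"])  # 2: multiplicity is kept
print(a == b)    # True: word order is discarded
```

The two sentences mean different things but produce the same bag, which is exactly the information the model throws away.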

Implementation Example

The following example models text documents using bag-of-words.

Here are two simple text documents:

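(1) John gosta de assistir filmes. Mary também gosta de filmes. (In English: "John likes to watch movies. Mary also likes movies.")

(2) John também gosta de assistir jogos de futebol. (In English: "John also likes to watch football games.")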

Based on these two text documents, a list of the distinct words is constructed as follows:

[
    "John",
    "gosta",
    "de",
    "assistir",
    "filmes",
    "Mary",
    "também",
    "futebol",
    "jogos"
]
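The construction above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names tokenize, vocabulary and to_vector are hypothetical, tokens are extracted with a simple regular expression, case is kept, and the vocabulary is ordered by first appearance (the order of the list is arbitrary and does not affect the model).

```python
import re
from collections import Counter

docs = [
    "John gosta de assistir filmes. Mary também gosta de filmes.",
    "John também gosta de assistir jogos de futebol.",
]

def tokenize(text):
    # Keep runs of word characters, dropping punctuation; case is preserved.
    return re.findall(r"\w+", text)

# Vocabulary: distinct words, in order of first appearance (an arbitrary choice).
vocabulary = []
for doc in docs:
    for word in tokenize(doc):
        if word not in vocabulary:
            vocabulary.append(word)

def to_vector(text):
    # Count each vocabulary word in the text; absent words get 0.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

vectors = [to_vector(doc) for doc in docs]

print(vocabulary)
print(vectors[0])  # [1, 2, 2, 1, 2, 1, 1, 0, 0]
print(vectors[1])  # [1, 1, 2, 1, 0, 0, 1, 1, 1]
```

Each document becomes a vector of counts over the shared vocabulary, which is the form most machine learning algorithms expect as input.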

It is also common to weight words by how often they appear:

linear(tj) = 1 − d(tj)/N

Here tj is the word being weighted, d(tj) is the number of documents (or phrases) in which the word appears, and N is the total number of documents or phrases. The result lies between 0 and 1, and is lower for words that appear in more documents.
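As a rough sketch of the formula, the snippet below computes linear(tj) for the two example documents. It assumes that d(tj) counts the documents containing the word (the reading under which the result stays between 0 and 1); the function name linear_weight and the tokenization are illustrative assumptions.

```python
import re

docs = [
    "John gosta de assistir filmes. Mary também gosta de filmes.",
    "John também gosta de assistir jogos de futebol.",
]

def linear_weight(word, documents):
    """Compute 1 - d(t)/N, assuming d(t) = number of documents containing the word."""
    n_docs = len(documents)
    # Count each document at most once, however many times the word occurs in it.
    d = sum(1 for doc in documents if word in set(re.findall(r"\w+", doc)))
    return 1 - d / n_docs

print(linear_weight("John", docs))     # 0.0: appears in both documents
print(linear_weight("filmes", docs))   # 0.5: appears in one of two documents
print(linear_weight("futebol", docs))  # 0.5
```

Under this reading, a word that occurs in every document gets weight 0, since it does not help tell the documents apart.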

Conclusion

Simply put, bag-of-words is a way of representing text. It is commonly used in machine learning, sentiment analysis, chatbots and topic modeling.

Source: Wikipedia

    
29.06.2017 / 15:09