How does an automatic categorization algorithm work?

4

I have a question.

I've noticed that sites like Yahoo Answers recognize the semantics of a question and categorize it automatically. There are mistakes, of course, but it works well most of the time.

Which method is used?

I've already thought of ways to do it, but I'd like to hear from you here.

I thought about counting keywords in the submitted text and using the counts to assign it to the category that contains those keywords. It would be a kind of scoring: each keyword found adds a point to the category that lists it, in a field like "cat_keywords" in the database.
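To make the scoring idea concrete, here is a minimal Python sketch. The categories, the keywords, and the names CAT_KEYWORDS, score_categories and categorize are all made up for illustration; the dictionary only stands in for what a "cat_keywords" field might hold.

```python
import re
from collections import Counter

# Hypothetical stand-in for a "cat_keywords" field: category -> its keywords.
CAT_KEYWORDS = {
    "Programming": {"code", "bug", "algorithm", "database"},
    "Cooking": {"recipe", "oven", "bake", "flour"},
}


def score_categories(text):
    """Give each category one point per keyword occurrence in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    scores = Counter()
    for word in words:
        for category, keywords in CAT_KEYWORDS.items():
            if word in keywords:  # set membership test
                scores[category] += 1
    return scores


def categorize(text):
    """Return the highest-scoring category, or None if nothing matched."""
    scores = score_categories(text)
    if not scores:
        return None
    return scores.most_common(1)[0][0]
```

With this made-up data, categorize("How do I bake bread without an oven?") would pick "Cooking", since "bake" and "oven" each add a point to it.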

Another question I have is about computing resources. Wouldn't such an algorithm consume a lot of resources?

    
asked by anonymous 21.03.2017 / 16:39

1 answer

0

You could solve this easily by creating a table categories, which would hold the tags, and another table category_words. In the second table you would store words that relate to the associated category. Then you would scan every word of the text and look it up there. What was typed can then return more than one category, or only the category that has the most words associated with it.

Example:

categories:

id - category
1 - Ruby
2 - PHP

category_words:

id - category_id - word
1 - 1 - Rails
2 - 1 - System
3 - 2 - Laravel
4 - 2 - System

text: I want to develop a system in Laravel

By any occurrence: Categories - Ruby and PHP (both list "System")

By most matching words: Categories - PHP ("System" and "Laravel", against only "System" for Ruby)
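The lookup above could be sketched roughly like this in Python with an in-memory SQLite database. This is only an illustration under some assumptions: the words are stored in lowercase, the text is split on non-alphanumeric characters, and the function name match_categories is made up.

```python
import re
import sqlite3

# In-memory database holding the two example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE category_words (
        id INTEGER PRIMARY KEY,
        category_id INTEGER REFERENCES categories(id),
        word TEXT
    );
    INSERT INTO categories (id, category) VALUES (1, 'Ruby'), (2, 'PHP');
    -- Assumption: words are stored in lowercase for case-insensitive matching.
    INSERT INTO category_words (category_id, word) VALUES
        (1, 'rails'), (1, 'system'), (2, 'laravel'), (2, 'system');
""")


def match_categories(text):
    """Return (category, matching word count) pairs, best match first."""
    # Distinct lowercase words, so a repeated word counts only once.
    words = sorted(set(re.findall(r"[a-z0-9]+", text.lower())))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))  # one "?" per word
    query = f"""
        SELECT c.category, COUNT(*) AS hits
        FROM category_words AS w
        JOIN categories AS c ON c.id = w.category_id
        WHERE w.word IN ({placeholders})
        GROUP BY c.id, c.category
        ORDER BY hits DESC, c.category
    """
    return conn.execute(query, words).fetchall()


print(match_categories("I want to develop a system in Laravel"))
# prints [('PHP', 2), ('Ruby', 1)]
```

Every row returned is a category matched "by occurrence", and the first row is the category with the most words. SQLite limits the number of "?" parameters per query, so a very long text would have to be looked up in batches.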

    
20.04.2017 / 17:02