Problem with the classifier in NLP

I'm developing a chatbot, and to choose the answer I'm using a Naive Bayes classifier that classifies the questions and answers. For the full project code and more details, follow the GitHub link.

I am using the TextBlob library for Python. The problem is that, after training, my classifier always returns the same message regardless of the input I give it. The message is:

  

"Everything good?"

I have not yet been able to identify the cause. I do not know whether the problem is in the way my training data is prepared or in the way I am training the classifier.

The class that performs the classification is this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from textblob.classifiers import NaiveBayesClassifier
from textblob import TextBlob
import logging

class Talk(object):
    """The Talk class is responsible for returning the response
    to a phrase, based on the exported information, using
    classification according to Bayes' theorem.
    """
    def __init__(self):
        """
        Class constructor.

        cl -> Stores the classifier
        accuracy -> Stores the accuracy of the algorithm
        """
        self.__cl = None
        self.__accuracy = 0


    def train(self, train_set):
        """
        Trains with a list of phrases and their
        respective classifications.
        """

        logging.debug('Starting intent prediction training')
        self.__cl = NaiveBayesClassifier(train_set)
        logging.debug('Intent prediction training finished')

    def test(self, test_set):
        """
        Runs tests with a list of phrases and their
        respective classifications to measure the accuracy.
        """

        logging.debug('Starting intent prediction test')
        self.__accuracy = self.__cl.accuracy(test_set)
        logging.debug('Intent prediction test finished')
        logging.info('Prediction accuracy: {}'.format(self.__accuracy))

    def response(self, phrase):
        """
        Returns the response to the phrase according to the trained classifier.
        """
        logging.debug('Analyzing the phrase "{}"'.format(phrase))
        blob = TextBlob(phrase, classifier=self.__cl)
        result = blob.classify()
        logging.debug('Response: "{}"'.format(result))
        return result

Follow the link to my file with the training and test data.

asked by anonymous 25.07.2017 / 04:58

1 answer

After many tests, I was able to figure out what the problem was.

The problem was the number of possible classes that the classifier had to distinguish.

For example, take the following training set:

oie, oi
oi, oiee
olá, oii
tudo bem?, tudo certo
td bem?, tudo bom
tudo bom?, tudo tranquilo

In the case above, every answer is different from the others. Although it is obvious to a person that some of the answers have the same meaning, the classifier cannot make that inference. In short, this example has 6 inputs and 6 output classes, which makes it very hard for a classifier to learn anything.
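This kind of imbalance can be checked before training. The snippet below is only a minimal sketch using the standard library: it assumes the training data is held as a list of (phrase, label) tuples, and the names `train_set` and `label_counts` are made up for the illustration.

```python
from collections import Counter

# The example training set as (question, answer) pairs.
# Each answer string is used directly as the class label.
train_set = [
    ("oie", "oi"),
    ("oi", "oiee"),
    ("olá", "oii"),
    ("tudo bem?", "tudo certo"),
    ("td bem?", "tudo bom"),
    ("tudo bom?", "tudo tranquilo"),
]


def label_counts(pairs):
    """Count how many training examples each label has."""
    return Counter(label for _, label in pairs)


counts = label_counts(train_set)
print("{} examples, {} classes".format(len(train_set), len(counts)))
print("max examples per class: {}".format(max(counts.values())))
# prints "6 examples, 6 classes" and "max examples per class: 1"
```

When the number of classes is close to the number of examples, as it is here, each class is represented by a single phrase and there is nothing for the classifier to generalize from.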

My solution was to define response classes:

oie, [oi]
oi, [oi]
olá, [oi]
tudo bem?, [resposta tudo bem]
td bem?, [resposta tudo bem]
tudo bom?, [resposta tudo bem]

Now the situation is completely different: I have 6 inputs and only 2 output classes, and this greatly improves the accuracy of the responses.
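The regrouping could be done along these lines. This is only a sketch under my own assumptions, not the code from the project: the `ANSWER_TO_CLASS` mapping, the `replies` table, and the `pick_reply` helper are hypothetical names, and the bracketed strings are simply used as the class labels.

```python
import random
from collections import Counter

# Hypothetical mapping from each original answer to its response class.
ANSWER_TO_CLASS = {
    "oi": "[oi]",
    "oiee": "[oi]",
    "oii": "[oi]",
    "tudo certo": "[resposta tudo bem]",
    "tudo bom": "[resposta tudo bem]",
    "tudo tranquilo": "[resposta tudo bem]",
}

# Original (question, answer) pairs from the first training set.
original = [
    ("oie", "oi"),
    ("oi", "oiee"),
    ("olá", "oii"),
    ("tudo bem?", "tudo certo"),
    ("td bem?", "tudo bom"),
    ("tudo bom?", "tudo tranquilo"),
]

# Relabel every question with its response class instead of the raw answer.
grouped = [(question, ANSWER_TO_CLASS[answer]) for question, answer in original]

# Keep the concrete answers available per class, so the bot can still
# reply with a real sentence after the classifier predicts a class.
replies = {}
for answer, response_class in ANSWER_TO_CLASS.items():
    replies.setdefault(response_class, []).append(answer)


def pick_reply(response_class, rng=random):
    """Pick one concrete answer for the predicted response class."""
    return rng.choice(replies[response_class])


class_counts = Counter(label for _, label in grouped)
print(dict(class_counts))
print(pick_reply("[oi]"))
```

The `grouped` list has the same (phrase, label) shape as before, so it can be passed to the classifier in place of the original pairs. The predicted class is then turned back into a sentence with `pick_reply`.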

    
02.08.2017 / 13:39