PYTHON - Difficulties in arranging COH-PIAH code [closed]

1

Good evening, I'm doing a course in python and this is the last exercise of the course, but I'm having a lot of trouble correcting it, when I send it it gives the following error:

"[0.5 points]: Testing text evaluation (Texts = ['Old browsers had a glorious phrase:" Navigating is accurate; living is not necessary. ") I want the spirit [d] this phrase, transformed the form to marry as I am: Living is not necessary, what is necessary is to create.I do not expect to enjoy my life, nor to enjoy it I think.I only want to make it great, to be my body and (my soul) the fire of this fire. I only want to make it of all humanity, although for that I have to lose it as mine. Every time more so I think. Every time I put more of the soul essence of my blood, the impersonal purpose of aggrandizing the homeland and contributing to the evolution of humanity. It is the form that has taken the mysticism of our Race in me. "I turned to her, Capitu had his eyes on the ground, We looked at each other slowly ... Confession of children, you were well worth two or three pages, but I want to be spared. the we say anything; the wall spoke for us. We do not move, the hands are stretched out little by little, all four, catching themselves, squeezing, merging. I did not mark the exact time of that gesture. He should have marked it; I miss a note written that night, and I would put it here with the spelling mistakes I brought, but I would not bring any, that was the difference between the student and the teenager. He knew the rules of writing without suspecting those of love; had our orgies of Latin and was a virgin of women. OUR joy in the face of a metaphysical system, our satisfaction in the presence of a construction of thought, in which the spiritual organization of the world is shown in a logical, coherent and harmonious whole, always depends eminently on aesthetics; have the same origin as pleasure, that the high satisfaction, always serene after all, that artistic activity gives us when it creates the order and the form allows us to cover with a view the chaos of life, giving it transparency. '] Signature = [4.79, 0.72, 0.56, 80.5, 2.5, 31.6]) - Failed ***** AssertionError: Expected: 2; received: 1 "

This is the code I used:

import re


def le_assinatura():
    """
    A função lê os valores dos traços linguísticos do modelo e devolve uma
    assinatura a ser comparada com os textos fornecidos.
    """
    print("Bem-vindo ao detector automático de COH-PIAH.")

    tam_m_pal = float(input("Entre o tamanho medio de palavra: "))
    type_token = float(input("Entre a relação Type-Token: "))
    h_lego = float(input("Entre a Razão Hapax Legomana: "))
    tam_m_sent = float(input("Entre o tamanho médio de sentença: "))
    compx_med = float(input("Entre a complexidade média da sentença: "))
    tam_m_frase = float(input("Entre o tamanho medio de frase: "))

    return [tam_m_pal, type_token, h_lego, tam_m_sent, compx_med, tam_m_frase]


def le_textos():
    i = 1
    textos = []
    texto = input("Digite o texto: " + str(i) + "(aperte enter para sair):")
    while texto:
        textos.append(texto)
        i += 1
        texto = input("Digite o texto: " + str(i) + "(aperte enter para sair):")
    return textos


def calcula_assinatura(texto):
    """
    Essa função recebe um texto e deve devolver a assinatura
    do texto.
    """
    if type(texto) != list:
        aux = texto
        texto = []
        texto.append(aux)
    for i in texto:
        sentencas = []
        sentencas = separa_sentencas(str(i))  
        frases = []
        num_tot_sentencas = 0
        soma_cat_sentencas = 0
        for i in range(len(sentencas)):
            frase_i = separa_frases(str(sentencas[i]))
            frases.append(frase_i)  
            num_tot_sentencas += 1
            soma_cat_sentencas = soma_cat_sentencas + len(sentencas[i])
        palavras = []
        num_tot_frases = 0
        soma_cat_frases = 0
        for lin in range(len(frases)):
            for col in range(len(frases[lin])):
                palavra_i = separa_palavras(str(frases[lin][col]))
                palavras.append(palavra_i)  
                num_tot_frases += 1
                soma_cat_frases = soma_cat_frases + len(str(frases[lin][col]))
        mtrx_para_lista = []  
        for lin in range(len(palavras)):
            for col in range(len(palavras[lin])):
                mtrx_para_lista.append(palavras[lin][col])
        palavras = mtrx_para_lista[:]
        soma_comp_palavras = 0
        num_tot_palavras = 0
        for lin in range(len(palavras)):
            for col in range(len(palavras[lin])):
                soma_comp_palavras = soma_comp_palavras + len(str(palavras[lin][col]))
            num_tot_palavras += 1
        matriz_ass_input = []
        matriz_ass_input.append(tam_m_pal(soma_comp_palavras, num_tot_palavras))
        matriz_ass_input.append(type_token(palavras, num_tot_palavras))
        matriz_ass_input.append(h_lego(palavras, num_tot_palavras))
        matriz_ass_input.append(tam_m_sent(soma_cat_sentencas, num_tot_sentencas))
        matriz_ass_input.append(compx_med(num_tot_frases, num_tot_sentencas))
        matriz_ass_input.append(tam_m_frase(soma_cat_frases, num_tot_frases))
    return matriz_ass_input  


def tam_m_pal(soma_comp_palavras, num_tot_palavras):
    if num_tot_palavras != 0:
        tam_m_pal = soma_comp_palavras / num_tot_palavras
    else:
        tam_m_pal = 0
    return tam_m_pal


def type_token(lista_palavras, num_tot_palavras):
    num_pal_dif = n_palavras_diferentes(lista_palavras)
    if num_tot_palavras != 0:
        type_token = num_pal_dif / num_tot_palavras
    else:
        type_token = 0
    return type_token


def h_lego(lista_palavras, num_tot_palavras):
    num_pal_uni = n_palavras_unicas(lista_palavras)
    if num_tot_palavras != 0:
        h_lego = num_pal_uni / num_tot_palavras
    else:
        h_lego = 0
    return h_lego


def tam_m_sent(soma_num_cat, num_sent):
    if num_sent != 0:
        tam_m_sent = soma_num_cat / num_sent
    else:
        tam_m_sent = 0
    return tam_m_sent


def compx_med(num_tot_frases, num_tot_sentencas):
    if num_tot_sentencas != 0:
        compx_med = num_tot_frases / num_tot_sentencas
    else:
        compx_med = 0
    return compx_med


def tam_m_frase(soma_cat_frases, num_tot_frases):
    if num_tot_frases != 0:
        tam_m_frase = soma_cat_frases / num_tot_frases
    else:
        tam_m_frase = 0
    return tam_m_frase


def separa_sentencas(texto):
    """
    A função recebe um texto e devolve uma lista das sentenças dentro
    do texto.
    """
    sentencas = re.split(r'[.!?]+', texto)
    if sentencas[-1] == '':
        del sentencas[-1]
    return sentencas


def separa_frases(sentenca):
    """
    A função recebe uma sentença e devolve uma lista das frases dentro
    da sentença.
    """
    return re.split(r'[,:;]+', sentenca)


def separa_palavras(frase):
    """
    A função recebe uma frase e devolve uma lista das palavras dentro
    da frase.
    """
    return frase.split()


def n_palavras_unicas(lista_palavras):
    """
    Essa função recebe uma lista de palavras e devolve o numero de palavras
    que aparecem uma única vez.
    """
    freq = dict()
    unicas = 0
    for palavra in lista_palavras:
        p = palavra.lower()
        if p in freq:
            if freq[p] == 1:
                unicas -= 1
            freq[p] += 1
        else:
            freq[p] = 1
            unicas += 1

    return unicas


def n_palavras_diferentes(lista_palavras):
    """
    Essa função recebe uma lista de palavras e devolve o numero de palavras
    diferentes utilizadas.
    """
    freq = dict()
    for palavra in lista_palavras:
        p = palavra.lower()
        if p in freq:
            freq[p] += 1
        else:
            freq[p] = 1

    return len(freq)


def compara_assinatura(ass_main, matriz_ass_input):
    """
    Essa função recebe duas assinaturas de texto e deve devolver o grau de
    similaridade nas assinaturas.
    """
    lista_Sab = []
    soma_mod = 0
    if type(matriz_ass_input[0]) is list:
        for lin in range(len(matriz_ass_input)):
            for col in range(len(matriz_ass_input[lin])):
                soma_mod += abs(ass_main[col] - matriz_ass_input[lin][col])
            Sab = soma_mod / 6
            lista_Sab.append(Sab)
        return lista_Sab
    else:
        for i in range(len(matriz_ass_input)):
            soma_mod += abs(ass_main[i] - matriz_ass_input[i])
        Sab = soma_mod / 6
        return Sab


def avalia_textos(textos_main, ass_comparadas):
    """
    Essa função recebe uma lista de textos e deve devolver o numero (0 a n-1)
    do texto com maior probabilidade de ter sido infectado por COH-PIAH.
    """
    aux_ass_com = (ass_comparadas[:])
    aux_ass_com.sort()
    for indice in range(len(ass_comparadas)):
        if aux_ass_com[0] == ass_comparadas[indice]:
            copiah = indice
    return copiah -1

If someone can give me a light, I'll insert how the program should work, to make it a bit clearer.

"Several studies have been compiled, and today the exact signature of a COH-PIAH holder is known. Your program should receive several texts and calculate the values of the different linguistic traits as follows:

Average word size is the sum of word sizes divided by the total number of words. Relationship Type-Token is the number of different words divided by the total number of words. For example, in the phrase "The cat hunted the mouse", we have 5 words in total (the, the cat, the hunted, the, the mouse) but only 4 different ones (the cat, the hunted, the mouse). In this sentence, the Type-Token ratio is 45 = 0.8 Hapax Legomana Reason is the number of words that appear only once divided by the total of words. For example, in the phrase "The cat hunted the mouse", we have 5 words in total (the, cat, the hunted, the mouse) but only 3 that appear only once (cat, hunt, mouse). In this sentence, the Hapax Legomana relation is 35 = 0.6 Average sentence length is the sum of character numbers in all sentences divided by the number of sentences (characters that separate one sentence from the other should not be counted as part of the sentence). Sentence complexity is the total number of sentences divided by the number of sentences. Average sentence size is the sum of the number of characters in each sentence divided by the number of sentences in the text (characters that separate one sentence from the other should not be counted as part of the sentence). After calculating these values for each text, you should compare it with the signature provided to those infected by COH-PIAH. The degree of similarity between two texts, a and b, is given by the formula:

Sab = Σ6i = 1 || fi, a-fi, b || 6

Where:

Sab is the degree of similarity between texts a and b; fi, a is the value of each linguistic trait i in text a; and fi, b is the value of each linguistic trait i in text b. Notice that the more similar a and b are, the lower Sab will be. For each text, you must calculate the degree of similarity with the COH-PIAH carrier signature and in the end display which text was most likely written by some infected student. "

    
asked by anonymous 01.05.2017 / 01:45

0 answers