Analyzing strings in a text file and returning the one that appears most often

Question

1

I need to analyze the strings in a text file, find the one that appears most often (if there is a tie, take all of the tied strings) and save it to another text file. I can open the file and read its lines, but I do not know how to check which string appears most. Example:

File "A" - Input

#BrasilNaCopa
#OperacaoLavaJato
#PartiuAD2
#PartiuAD2
#OperacaoLavaJato
#PartiuAD2
#DietaSegundaFeira

File "B" - Output

#PartiuAD2

python python-3.x

asked by anonymous 10.05.2017 / 23:01

2 answers

0

It's simple :) But it's important that the words from the two files are in separate lists (one for each). Here you go:

palavras = {}
for x in a:    # a and b are the two lists of words
    contador = 1
    for y in b:
        if x == y:
            contador += 1
    palavras[x] = contador

Done! You will then have a dictionary whose keys are the words and whose values are the number of times each word repeats.
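If all the lines live in a single list rather than two, the same word-to-count dictionary can be built with `collections.Counter` from the standard library. This is only a minimal sketch: the list `linhas` and its sample contents are assumptions for illustration, not part of the loop above.

```python
from collections import Counter

# Hypothetical sample data: every line of the input file, already stripped
linhas = ['BrasilNaCopa', 'OperacaoLavaJato', 'PartiuAD2', 'PartiuAD2',
          'OperacaoLavaJato', 'PartiuAD2', 'DietaSegundaFeira']

# Counter builds the word -> number of occurrences mapping in one pass
palavras = Counter(linhas)

# most_common(1) returns a list holding the single (word, count) pair at the top
mais_comum, vezes = palavras.most_common(1)[0]
print(mais_comum, vezes)  # prints: PartiuAD2 3
```

Keep in mind that `most_common(1)` returns only one entry, so on its own it does not report ties.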

11.05.2017 / 05:56

score 2 · Accepted Answer

I have little information about your case, so I'll assume the following:

File "A" has one word per line

You need to find out which word appears most often, but you do not have a list of possible words, i.e. you have to count the words inside File "A" whatever they are

Assuming File "A" has this content:

BrasilNaCopa
OperacaoLavaJato
PartiuAD2
PartiuAD2
OperacaoLavaJato
PartiuAD2
DietaSegundaFeira

What we will need to do is the following:

# Open the file "arquivoa.txt" for reading; the with block closes it afterwards
with open('arquivoa.txt', 'r') as arquivoA:
    # Read the file contents into the variable "texto"
    # "texto" is a list in which each item is one line
    texto = arquivoA.readlines()

# === IMPORTANT NOTES ===
# Printing the variable texto:
#   print(texto)
#
# Would result in:
#   ['BrasilNaCopa\n', 'OperacaoLavaJato\n', 'PartiuAD2\n', 'PartiuAD2\n', 'OperacaoLavaJato\n', 'PartiuAD2\n', 'DietaSegundaFeira\n']
#
# Note the "\n" (line break) at the end of each string; we will have
# to clean that up later

# Create a dictionary to store the count for each word
contagem = dict()

for linha in texto:
    # Remove that line break (\n) with strip()
    palavra = linha.strip()

    if palavra not in contagem:
        # If the word is not in the count yet, add it with the value 1
        contagem[palavra] = 1
    else:
        # If the word is already in the count, add 1 to its current value
        contagem[palavra] += 1

# At this point the dictionary "contagem" holds the count of every word
# Printing contagem:
#   print(contagem)
#
# Would result in (the order of the keys may vary with the Python version):
#   {'DietaSegundaFeira': 1, 'PartiuAD2': 3, 'OperacaoLavaJato': 2, 'BrasilNaCopa': 1}

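# Now get the word with the highest count
# key=contagem.get makes max() compare the counts; a plain max(contagem)
# would compare the words alphabetically instead
palavraMaisRepetida = max(contagem, key=contagem.get)

# Printing palavraMaisRepetida:
#   print(palavraMaisRepetida)
#
# Would result in:
#   PartiuAD2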

IMPORTANT

Note that I only handled the case where a single word is at the top of the count. You said that in case of a tie you should get all the words tied at the top.

I'm going to leave that part for you to complete; I think you've understood the idea, and it should be easy now.

You also still need to write the result to File "B".
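For reference, the two remaining pieces (keeping every word tied at the top and writing the result to File "B") could be sketched roughly as follows. The function names, the output filename `arquivob.txt` and the UTF-8 encoding are assumptions made for illustration; the question does not specify any of them.

```python
from collections import Counter


def palavras_mais_repetidas(linhas):
    """Return every word that shares the highest count, in order of first appearance."""
    # Strip the trailing line break from each line and skip blank lines
    contagem = Counter(linha.strip() for linha in linhas if linha.strip())
    if not contagem:
        return []
    maior = max(contagem.values())
    # Keep all words whose count equals the maximum, so ties are preserved
    return [palavra for palavra, total in contagem.items() if total == maior]


def gravar_resultado(caminho_entrada, caminho_saida):
    """Read one word per line from the input file and write the top words to the output file."""
    with open(caminho_entrada, 'r', encoding='utf-8') as entrada:
        vencedoras = palavras_mais_repetidas(entrada)
    with open(caminho_saida, 'w', encoding='utf-8') as saida:
        for palavra in vencedoras:
            saida.write(palavra + '\n')
    return vencedoras
```

Calling `gravar_resultado('arquivoa.txt', 'arquivob.txt')` with the sample File "A" above would write the single line `PartiuAD2`; if two words were tied, both would be written, one per line.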