Count how many times a word in a file appears in another file

4

I would like to count how many times each word from a list of words (file1) appears in another text (file2).

with open("corpus.txt", "r") as f1, open("lexico.txt", "r") as f2:
    file1 = f1.read()
    file2 = f2.read()

    corpus1 = file1.split(" ")

    for word in file2:
        print(word, corpus1.count(word))

corpus.txt file (file2):

I am afraid to look for other options because of the quality of light is very recommendable light very white, but the duration of everything opposite I lasted less than months the two lamps and that I put them in the assist lamp. The light I want is paler but strong enough to illuminate the room.

lexico.txt file (file1):

is
but
light

Result:

is 2
0
m 0
a 2
s 0
0
l 0
u 0
z 0
0

    
asked by anonymous 26.09.2017 / 12:04

1 answer

7

You can do the following:

count = {}

with open('corpus.txt') as f1, open('lexico.txt') as f2:
    corpus = f1.read().split() # the text, split on whitespace
    for word in f2: # words to count in the text, one per line
        w_strp = word.strip() # remove line breaks
        if w_strp != '' and w_strp not in count: # skip blank lines and words already counted
            count[w_strp] = corpus.count(w_strp)
print(count) # {'mas': 2, 'é': 2, 'luz': 4}
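
The key difference from the code in the question is that the loop above iterates over the file object f2, which yields one line at a time, whereas looping over the string returned by read() yields single characters (including line breaks). A minimal sketch of that difference, using an in-memory io.StringIO as a stand-in for lexico.txt (the content below is hypothetical, for illustration only):

```python
import io

# Stand-in for lexico.txt (hypothetical content, for illustration only)
lexico_text = "is\nbut\nlight\n"

# Iterating over the string returned by read() yields single characters
chars = [c for c in io.StringIO(lexico_text).read()]

# Iterating over the file object yields whole lines (with trailing newline),
# so strip() is needed to get the bare word
words = [line.strip() for line in io.StringIO(lexico_text)]

print(chars[:3])  # ['i', 's', '\n']
print(words)      # ['is', 'but', 'light']
```

This is why the question's output lists individual letters and blank entries instead of whole words.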

Or, using a set to discard repeated words:

count = {}

with open('corpus.txt') as f1, open('lexico.txt') as f2:
    corpus = f1.read().split()
    lexico = set(word.strip() for word in f2 if word.strip() != '') # set() to avoid repeated words

count = {l_word: corpus.count(l_word) for l_word in lexico}
print(count) # {'mas': 2, 'é': 2, 'luz': 4}

If you are sure that there are no repeated words in lexico.txt, you can simply use a list:

...
lexico = [word.strip() for word in f2 if word.strip() != '']
...

Or even, more compactly:

count = {}

with open('temp/corpus.txt') as f1, open('temp/lexico.txt') as f2:
    corpus = f1.read().split()
    count = {l_word: corpus.count(l_word) for l_word in (word.strip() for word in f2 if word.strip() != '')}

print(count) # {'mas': 2, 'é': 2, 'luz': 4}
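
All the versions above call corpus.count() once per lexicon word, which rescans the whole corpus each time. For a large corpus or a long lexicon, one possible alternative (not part of the original answer) is to build the frequency table once with collections.Counter and then look each word up. The function name count_words and the sample text are assumptions made for this sketch:

```python
from collections import Counter

def count_words(corpus_text, lexicon_lines):
    """Count how often each lexicon word occurs in corpus_text (hypothetical helper)."""
    # One pass over the corpus builds a word -> frequency table
    freq = Counter(corpus_text.split())
    # Strip line breaks and skip blank lines; a Counter returns 0 for missing words
    words = (line.strip() for line in lexicon_lines)
    return {w: freq[w] for w in words if w}

# Sample data for illustration only; with real files, pass f1.read() and f2
corpus_text = "the light is white but the light is weak"
print(count_words(corpus_text, ["is\n", "\n", "but\n", "light\n", "dark\n"]))
# {'is': 2, 'but': 1, 'light': 2, 'dark': 0}
```

Like the original code, this splits only on whitespace, so a word followed by punctuation (for example "lamp.") is counted as a different token from "lamp".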
    
26.09.2017 / 13:28