Display the most frequent word on each line and count the words per line


I have the following text, and for each line I need to display the word that appears most often and count the number of words on that line:

This is a really really really cool experiment really
Cute little experiment
Will it work maybe it will work do you think it will it will

The text above is in a file named teste.txt. I opened the file, read each line into a string, split it into words, and then counted the words into a hash:

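File.foreach("teste.txt") do |linha|
  a = linha.split   # array with the words of this line
  p a
  # group identical words and map each word to the size of its group
  b = Hash[a.group_by(&:itself).map { |word, words| [word, words.size] }]
  puts b
end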
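On Ruby 2.7 or later, Enumerable#tally builds the same word => count hash in a single call. The sketch below uses the first sample line hard-coded instead of reading teste.txt (an assumption made only to keep it self-contained) and checks that both approaches agree:

```ruby
line = "This is a really really really cool experiment really"
words = line.split

# The approach from the question: group identical words, then count each group
counts_grouped = Hash[words.group_by(&:itself).map { |word, group| [word, group.size] }]

# Enumerable#tally (Ruby 2.7+) produces the same word => count hash directly
counts_tally = words.tally

puts counts_tally == counts_grouped  # prints true
```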

Running this prints the following (each line's array of words is followed by its hash of word counts):

["This", "is", "a", "really", "really", "really", "cool", "experiment", "really"]
{"This"=>1, "is"=>1, "a"=>1, "really"=>4, "cool"=>1, "experiment"=>1}
["Cute", "little", "experiment"]
{"Cute"=>1, "little"=>1, "experiment"=>1}
["Will", "it", "work", "maybe", "it", "will", "work", "do", "you", "think", "it", "will", "it", "will"]
{"Will"=>1, "it"=>4, "work"=>2, "maybe"=>1, "will"=>3, "do"=>1, "you"=>1, "think"=>1}

At this point I'm stuck because I'm not sure how to iterate over the hash. For the first line I can find the key "really" with the value 4, but I can't figure out how to print the most frequent word for the lines below it (in the second line, for example, every word appears only once). I don't know whether this is the best way to do it, but it's the approach I came up with.

    
asked by anonymous 04.01.2018 / 01:33

1 answer


You're on the right track! First you need to count how many times each word occurs, so I'll use Enumerable#each_with_object:

frase = 'hustle hustle talent'
# split is needed to get ['hustle', 'hustle', 'talent']
frequencias = frase.split.each_with_object(Hash.new(0)) { |palavra, hash| hash[palavra] += 1 }
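The Hash.new(0) passed to each_with_object matters: it makes missing keys read as 0, so hash[palavra] += 1 works the first time a word is seen. A short sketch of what that line produces, and of what happens with a plain {} instead:

```ruby
frase = 'hustle hustle talent'

# Hash.new(0) returns 0 for keys not yet in the hash,
# so `hash[palavra] += 1` works on the first occurrence of a word
frequencias = frase.split.each_with_object(Hash.new(0)) { |palavra, hash| hash[palavra] += 1 }
puts frequencias == { 'hustle' => 2, 'talent' => 1 }  # prints true

# With a plain {} the first `+= 1` evaluates nil + 1 and raises NoMethodError
begin
  frase.split.each_with_object({}) { |palavra, hash| hash[palavra] += 1 }
rescue NoMethodError => e
  puts "plain {} fails with #{e.class}"
end
```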

So far this does the same thing as your code. Finding the word with the highest frequency is then straightforward:

frequencias.max_by { |key, value| value }
#=> ["hustle", 2]
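When several words share the top count, as on the second line of the question where every word appears once, max_by still returns a single pair; in CRuby it keeps the first one encountered, and hashes iterate in insertion order. If you want every tied word instead, one option is to compare against the maximum count (the variable names maior and empatadas are illustrative only):

```ruby
frequencias = 'Cute little experiment'.split.each_with_object(Hash.new(0)) { |palavra, hash| hash[palavra] += 1 }

# max_by picks one pair even when several words share the top count
p frequencias.max_by { |_palavra, quantidade| quantidade }

# To list every word that reaches the highest count:
maior = frequencias.values.max
empatadas = frequencias.select { |_palavra, quantidade| quantidade == maior }.keys
p empatadas  # prints ["Cute", "little", "experiment"]
```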

To print it, store the value returned by Enumerable#max_by, which is a two-element array holding the key and its value:

maior_ocorrencia = frequencias.max_by { |key, value| value }
puts "The most frequent word is: #{maior_ocorrencia[0]} (#{maior_ocorrencia[1]})"
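Because max_by returns a two-element array, you can also destructure it into two variables instead of indexing with [0] and [1]. A minimal sketch, with the hash written out literally so the snippet runs on its own:

```ruby
frequencias = { 'hustle' => 2, 'talent' => 1 }

# max_by returns [key, value]; parallel assignment splits it into two variables
palavra, quantidade = frequencias.max_by { |_palavra, qtd| qtd }
puts "The most frequent word is: #{palavra} (#{quantidade})"
# prints: The most frequent word is: hustle (2)
```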

Then apply the same logic to each line inside your File.foreach loop.
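Putting it together for every line, and also covering the second part of the question (the number of words per line is simply the size of the array returned by split), a sketch could look like the following. To keep it self-contained it iterates over the question's sample text with each_line instead of File.foreach("teste.txt"); the helper name analyze_line is hypothetical.

```ruby
# Sample text from the question; in the real script, replace
# TEXT.each_line with File.foreach("teste.txt")
TEXT = <<~EOS
  This is a really really really cool experiment really
  Cute little experiment
  Will it work maybe it will work do you think it will it will
EOS

# Hypothetical helper: returns [most frequent word, its count, number of words]
def analyze_line(line)
  words = line.split
  frequencies = words.each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }
  word, count = frequencies.max_by { |_word, n| n }
  [word, count, words.size]
end

results = TEXT.each_line.map { |line| analyze_line(line) }

results.each_with_index do |(word, count, total), i|
  puts "Line #{i + 1}: #{total} words; most frequent: #{word} (#{count})"
end
```

As in the output from the question, "Will" and "will" are counted as different words; calling downcase on the line before split would merge them.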

You can see it working on repl.it. I hope this sheds some light on the problem; if anything is unclear, leave a comment.

    
04.01.2018 / 03:41