FATAL failed to allocate memory


Good afternoon,

I'm using the rb-libsvm gem to build a model that classifies tweets as positive or negative.

From a data set of 19,000 positive tweets and 19,000 negative tweets, I tried to train the model with the code below (an adaptation of the usage example on GitHub):

require 'libsvm'

path = 'pos.txt'
documents = IO.readlines(path).map do |line|
  [1,line.tr("\n","")]
end

path2 = 'neg.txt'
documents += IO.readlines(path2).map do |line|
  [0,line.tr("\n","")]
end


# Let's create a dictionary of unique words and then we can
# create our vectors.  This is a very simple example.  If you
# were doing this in a production system you'd do things like
# stemming and removing all punctuation (in a less casual way).
#
dictionary = documents.map(&:last).map(&:split).flatten.uniq
dictionary = dictionary.map { |x| x.gsub(/\?|,|\.|\-/,'') }
training_set = []
documents.each do |doc|
  features_array = dictionary.map { |x| doc.last.include?(x) ? 1 : 0 }
  training_set << [doc.first, Libsvm::Node.features(features_array)]
end

# Let's set up libsvm so that we can test our prediction
# using the test set
#
problem = Libsvm::Problem.new
parameter = Libsvm::SvmParameter.new

parameter.cache_size = 1 # in megabytes
parameter.eps = 0.001
parameter.c   = 10

# Train classifier using training set
#
problem.set_examples(training_set.map(&:first),training_set.map(&:last))
model = Libsvm::Model.train(problem, parameter)
model.save("ic.model")

When I train with a smaller number of tweets (2,000, for example) I have no problem, but when I try with all 38,000 tweets the following error occurs:

FATAL failed to allocate memory

The error occurs in the following code snippet:

documents.each do |doc|
  features_array = dictionary.map { |x| doc.last.include?(x) ? 1 : 0 }
  training_set << [doc.first, Libsvm::Node.features(features_array)]
end

I'm new to Ruby and don't understand what causes this problem. Could anyone help?

    
asked by anonymous 09.07.2015 / 19:15

2 answers


The problem is that the machine running this code does not have enough memory to run Libsvm with 38,000 tweets.

Try reducing the number of tweets.

    
04.08.2015 / 13:27

Apparently you are running out of memory. You are keeping all of the following in memory at the same time:

  • The Array documents (one [label, text] pair per tweet);
  • The Array dictionary;
  • The Array features_array;
  • The Array training_set.

Judging by the line where the error occurs, documents and dictionary were allocated successfully, and memory ran out while building features_array and training_set.
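
To get a feel for why that loop is the part that fails, here is a rough back-of-the-envelope estimate. The dictionary size (50,000 words) and the cost per value (8 bytes) are assumptions chosen for illustration, not measurements from the question's data; the real cost of each node object is probably higher, and the dictionary itself also grows as more tweets are added.

```ruby
# Rough estimate of how many feature values the loop creates when every
# tweet gets one value per dictionary word.
# ASSUMPTION: dictionary_size and bytes_per_value are illustrative guesses,
# not figures taken from the original data set.
def dense_feature_estimate(document_count, dictionary_size, bytes_per_value)
  values = document_count * dictionary_size # one value per word per tweet
  { values: values, gib: (values * bytes_per_value) / (1024.0**3) }
end

small = dense_feature_estimate(2_000, 50_000, 8)
large = dense_feature_estimate(38_000, 50_000, 8)

puts format("2,000 tweets:  %d values, ~%.1f GiB", small[:values], small[:gib])
puts format("38,000 tweets: %d values, ~%.1f GiB", large[:values], large[:gib])
```

Under these assumptions the number of values grows with the number of tweets multiplied by the number of dictionary words, which would explain why 2,000 tweets fit in memory and 38,000 do not.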

If possible (it is hard to judge without knowing exactly what the methods you are calling do), try restructuring the code as a pipeline.

Instead of building all of these structures in memory first, try calling .set_examples as you go, inside the loop over documents.

Another option is to rewrite this loop using documents.pop, so that each element is removed from the array as it is processed and its memory can be freed.
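
A minimal sketch of that idea, using shift rather than pop so that the original order is preserved. The build_features helper and the sample data are hypothetical stand-ins (the helper replaces the Libsvm::Node.features call) so that the snippet runs without the gem.

```ruby
# Hypothetical stand-in for Libsvm::Node.features(...): returns a 0/1 array
# marking which dictionary words occur in the text.
def build_features(text, dictionary)
  dictionary.map { |word| text.include?(word) ? 1 : 0 }
end

# Tiny made-up data set, for illustration only.
documents  = [[1, "good day"], [0, "bad day"]]
dictionary = %w[good bad day]

training_set = []
# shift removes each document from the array as it is processed, so the raw
# text can be garbage-collected instead of living next to its features.
while (doc = documents.shift)
  label, text = doc
  training_set << [label, build_features(text, dictionary)]
end

puts "documents left: #{documents.size}, examples built: #{training_set.size}"
```

Note that this only releases the tweet text. training_set still grows with every tweet, so the approach helps only if the text accounts for a significant share of the memory in use.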

Ideally you would read and process one tweet at a time, but that requires more refactoring and I do not know whether it is possible in your case.
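
As a sketch of what reading one tweet at a time could look like, File.foreach yields a file line by line instead of loading it all at once the way IO.readlines does. The each_labeled_tweet helper and the temporary file are hypothetical and used only for demonstration; counting words stands in for whatever per-tweet processing the real code needs.

```ruby
require 'tempfile'

# Yields [label, text] pairs one line at a time, so the whole file is never
# held in memory. Hypothetical helper, not part of rb-libsvm.
def each_labeled_tweet(path, label)
  return enum_for(:each_labeled_tweet, path, label) unless block_given?

  File.foreach(path) { |line| yield [label, line.chomp] }
end

# Temporary file standing in for pos.txt (made-up content).
sample = Tempfile.new('pos')
sample.write("great day\nlove it\n")
sample.close

# Example of per-tweet processing: count words while streaming.
word_counts = Hash.new(0)
each_labeled_tweet(sample.path, 1) do |_label, text|
  text.split.each { |word| word_counts[word] += 1 }
end
sample.unlink

puts word_counts.inspect
```

Whether this helps in practice depends on whether the training step can accept examples incrementally; if every feature vector still has to be collected before training, streaming the input saves only the memory used by the raw text.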

    
30.08.2015 / 04:21