Good afternoon,
I'm using the rb-libsvm gem to build a model that classifies tweets as positive or negative.
From a database with 19,000 positive tweets and 19,000 negative tweets, I tried to train the model with this code (adapted from the usage example on GitHub):
require 'libsvm'
path = 'pos.txt'
documents = IO.readlines(path).map do |line|
  [1, line.tr("\n", "")]
end
path2 = 'neg.txt'
documents += IO.readlines(path2).map do |line|
  [0, line.tr("\n", "")]
end
# Let's create a dictionary of unique words and then we can
# create our vectors. This is a very simple example. If you
# were doing this in a production system you'd do things like
# stemming and removing all punctuation (in a less casual way).
#
dictionary = documents.map(&:last).map(&:split).flatten.uniq
dictionary = dictionary.map { |x| x.gsub(/\?|,|\.|\-/,'') }
training_set = []
documents.each do |doc|
  features_array = dictionary.map { |x| doc.last.include?(x) ? 1 : 0 }
  training_set << [doc.first, Libsvm::Node.features(features_array)]
end
# Let's set up libsvm so that we can test our prediction
# using the test set
#
problem = Libsvm::Problem.new
parameter = Libsvm::SvmParameter.new
parameter.cache_size = 1 # in megabytes
parameter.eps = 0.001
parameter.c = 10
# Train classifier using training set
#
problem.set_examples(training_set.map(&:first),training_set.map(&:last))
model = Libsvm::Model.train(problem, parameter)
model.save("ic.model")
When I train with a smaller number of tweets (2,000, for example) I have no problem, but when I try with all 38,000 tweets the following error occurs:
FATAL failed to allocate memory
The error occurs in the following code snippet:
documents.each do |doc|
  features_array = dictionary.map { |x| doc.last.include?(x) ? 1 : 0 }
  training_set << [doc.first, Libsvm::Node.features(features_array)]
end
I'm new to Ruby and don't understand what causes this problem. Could anyone help?
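A possible direction, offered as a sketch rather than a confirmed fix: the loop above builds one feature per dictionary word for every tweet, so the number of stored values grows as (number of tweets) x (dictionary size), and almost all of them are 0. Storing only the non-zero columns keeps each tweet's features proportional to its own length. The plain-Ruby sketch below shows that idea without the gem. The helper names (`tokenize`, `build_index`, `sparse_features`) and the two sample tweets are hypothetical, and it matches whole tokens instead of the substring test `include?` used above, so the resulting features are not identical to the original ones.

```ruby
# Sketch only: plain Ruby, no gem required. Helper names are hypothetical.

# Split a tweet into lowercase word tokens, stripping the same
# punctuation characters as the dictionary step in the question.
def tokenize(text)
  text.downcase.gsub(/\?|,|\.|-/, '').split
end

# Map every distinct word to a column index: { "word" => 0, ... }.
def build_index(texts)
  index = {}
  texts.each do |text|
    tokenize(text).each { |word| index[word] ||= index.size }
  end
  index
end

# Sparse features for one tweet: only columns whose word occurs are stored.
def sparse_features(text, index)
  features = {}
  tokenize(text).each do |word|
    column = index[word]
    features[column] = 1 if column
  end
  features
end

# Two made-up tweets standing in for pos.txt / neg.txt.
documents = [[1, "great day, love it"], [0, "bad day."]]
index = build_index(documents.map(&:last))
rows = documents.map { |label, text| [label, sparse_features(text, index)] }
rows.each { |label, features| puts "#{label} #{features.inspect}" }

# Assumption to verify against the rb-libsvm README: Libsvm::Node.features
# is documented as also accepting a hash of index => value, in which case
# each `features` hash could be passed to it in place of the dense array.
```

With this layout a tweet of ten words stores at most ten entries, regardless of how large the dictionary becomes.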