WEKA reads files in the ARFF format.
To create an ARFF file, you must define the following sections:
Relation declaration
The name of the relation, defined on the first line of the file. It is declared as:
@relation <relation name>
If the relation name contains spaces, it must be enclosed in quotation marks.
Attribute declarations
Attributes are declared by an ordered sequence of @attribute statements. Each attribute in the dataset must have its own @attribute declaration, which uniquely identifies the attribute's name and its data type. The order in which the attributes are declared is the order in which their values appear in each line of the data section.
It is declared as:
@attribute <attribute name> <data type>
The attribute name must begin with a letter, and if it contains spaces, it must be enclosed in quotation marks.
The data types supported by WEKA are:
- Numbers (real or integer): numeric
- "Free" text: string
- Nominal attributes (a predefined list of values): {<value1>, <value2>, ...}
- Date: date [<date-format>]
- Relational attributes: relational
Numeric attributes
Used for both integers and real numbers. It is declared as:
@attribute idade numeric
Nominal Attributes
Nominal attributes are defined by providing the list of possible values. For example:
@attribute classe {comprador, possivel-comprador, nao-comprador}
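If the list of labels comes from code rather than being typed by hand, the declaration above can be generated. The following is only a minimal sketch: nominal_attribute is a hypothetical helper, not part of WEKA, and it assumes that double quotes are enough to protect names or labels containing spaces.

```python
def nominal_attribute(name, values):
    """Build an @attribute line for a nominal attribute (hypothetical helper)."""
    def quote(token):
        # Quote only when the token contains whitespace, as the rules above require.
        return f'"{token}"' if any(ch.isspace() for ch in token) else token

    labels = ", ".join(quote(v) for v in values)
    # Doubled braces produce literal { and } in the formatted string.
    return "@attribute {} {{{}}}".format(quote(name), labels)


line = nominal_attribute("classe", ["comprador", "possivel-comprador", "nao-comprador"])
print(line)
```

Labels without spaces are written as-is, so the output for this call matches the declaration shown above.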
String attributes
Used for arbitrary text. It is declared as:
@attribute tweet string
Note: a string value must be enclosed in quotation marks if it contains spaces.
Data section declaration
The start of the data section is declared on a single line. It is declared as:
@data
It marks where the instance data actually begins.
Instance data
Each instance is declared on its own line, with its attribute values separated by commas and listed in the same order as the attribute declarations.
To answer your question directly, a possible ARFF file for your problem would look like this:
% Everything after the % is ignored. It can be used to insert comments
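When the instance lines are produced by a script, the quoting rule can be applied automatically. This is a rough sketch under stated assumptions: format_value and format_instance are hypothetical names, and backslash-escaping an embedded quote is an assumption about what the reader accepts, so check it against your WEKA version.

```python
def format_value(value):
    """Quote a value when it contains whitespace or a comma (hypothetical helper)."""
    text = str(value)
    if any(ch.isspace() or ch == "," for ch in text):
        # Assumption: embedded backslashes and double quotes are backslash-escaped.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return text


def format_instance(values):
    """Join the values of one instance into a single comma-separated line."""
    return ", ".join(format_value(v) for v in values)


row = format_instance(["Preciso de um galaxy s5", "compraria"])
print(row)
```

Values containing a comma are quoted too, because an unquoted comma would otherwise be read as an attribute separator.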
@relation compradores
@attribute tweet string
@attribute classe {compraria, nao-compraria}
@data
"To e morto Galaxy S5 por R$ 2,600", nao-compraria
"Preciso de um galaxy s5", compraria
"Configurando meu Galaxy s5", compraria
"Prefiro um iphone do que um galaxy s5", nao-compraria
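To sanity-check a file like this before loading it into WEKA, the data section can be read back with a few lines of Python. This is only a sketch and not WEKA's own parser: read_instances is a hypothetical helper that handles just the simple case shown here (comment lines, an @data marker, double-quoted values) and ignores missing values, sparse instances, and the attribute declarations themselves.

```python
import csv

ARFF_TEXT = '''% Comment line
@relation compradores
@attribute tweet string
@attribute classe {compraria, nao-compraria}
@data
"To e morto Galaxy S5 por R$ 2,600", nao-compraria
"Preciso de um galaxy s5", compraria
'''


def read_instances(text):
    """Return the data rows of a simple ARFF document as lists of strings."""
    # Drop blank lines and comment lines (those starting with %).
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("%")]
    # Locate the @data marker; the comparison is case-insensitive.
    start = next(i for i, ln in enumerate(lines) if ln.strip().lower() == "@data")
    # skipinitialspace drops the blank after each comma; quotes protect inner commas.
    return list(csv.reader(lines[start + 1:], skipinitialspace=True))


rows = read_instances(ARFF_TEXT)
for tweet, label in rows:
    print(label, "<-", tweet)
```

The first tweet contains a comma ("2,600"), which is why the quotation marks matter: without them the line would be split into three values instead of two.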