I created a simple grammar to parse a file whose format is very similar to JSON. However, when I try to parse the file I get a System.OutOfMemoryException. This is because of the size of the file I'm trying to parse: it is 108 MB and has 4,682,073 lines.
When I parse smaller files, everything works normally. For this file, however, I see that when the memory used by the process reaches almost 2 GB, the exception is thrown and the program stops. The exception comes from the parser code generated by the ANTLR extension for Visual Studio.
How do I run the parser on really large files with ANTLR?
More information
The machine I'm running the parser on has 8 GB of memory and a 2.8 GHz processor (Intel Core 2 Duo).
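Since the process stops at roughly 2 GB on a machine with 8 GB, it may be worth checking whether it runs as a 32-bit process. The following is a minimal sketch of such a check, assuming .NET 4.0 or later; the class name ProcessInfo is made up for illustration and is not part of my project.

```csharp
using System;

// Hypothetical diagnostic helper, not part of the original project.
public static class ProcessInfo
{
    // Describes the bitness of the current process and its managed heap usage.
    public static string Describe()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        int pointerBits = IntPtr.Size * 8;
        long managedBytes = GC.GetTotalMemory(false);
        return string.Format(
            "64-bit OS: {0}, 64-bit process: {1}, pointer size: {2} bits, managed heap: {3} bytes",
            Environment.Is64BitOperatingSystem,
            Environment.Is64BitProcess,
            pointerBits,
            managedBytes);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

A 32-bit process on Windows normally has only about 2 GB of user address space, which would match the point at which the exception appears. That is something to verify, not a confirmed cause.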
Problem example
Sample file for reading
(
:field ("ObjectName"
:field (
:field ("{6BF621F9-A0E2-49BB-A86B-3DE4750954F4}")
:field (Value)
:field (Value)
:field (
:Time ("Sun Jan 26 10:08:33 2014")
:last_modified_utc (1390730913)
:By ("Mensagem qualquer")
:From (localhost)
)
:field ("Applications/application_fw1")
:field (false)
:field (false)
)
:field ()
:field ()
:field ()
:field (0)
:field (true)
:field (true)
)
.
.
.
Thousands of other fields.
.
.
.
)
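For anyone who wants to reproduce the problem without my file, a sketch like the one below should generate an input with a comparable number of lines. The records are simplified from the sample above, and the class name, output file name and record count are assumptions chosen for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original project: writes a synthetic
// file in the same "(:field (...))" format so the problem can be reproduced.
public static class SampleFileGenerator
{
    // Writes one outer object containing recordCount simplified records.
    // Each record takes exactly 9 lines.
    public static void Write(TextWriter writer, int recordCount)
    {
        writer.WriteLine("(");
        for (int i = 0; i < recordCount; i++)
        {
            writer.WriteLine(":field (\"ObjectName" + i + "\"");
            writer.WriteLine(":field (");
            writer.WriteLine(":Time (\"Sun Jan 26 10:08:33 2014\")");
            writer.WriteLine(":last_modified_utc (1390730913)");
            writer.WriteLine(":From (localhost)");
            writer.WriteLine(")");
            writer.WriteLine(":field ()");
            writer.WriteLine(":field (true)");
            writer.WriteLine(")");
        }
        writer.WriteLine(")");
    }

    public static void Main()
    {
        // 9 lines per record, so 520,000 records give about 4.68 million lines.
        using (var writer = new StreamWriter("sample_large.txt"))
        {
            Write(writer, 520000);
        }
    }
}
```

The generated file will not match the original byte for byte, so its size on disk will differ from 108 MB; only the line count and nesting are meant to be similar.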
The grammar
grammar Objects;

/*
 * Parser Rules
 */
compileUnit
    : obj
    ;

obj
    : OPEN ID? (field)* CLOSE
    ;

field
    : ':' (ID)? obj
    ;

/*
 * Lexer Rules
 */
OPEN
    : '('
    ;

CLOSE
    : ')'
    ;

ID
    : (ALPHA | ALPHA_IN_STRING)
    ;

fragment
INT_ID
    : ('0'..'9')
    ;

fragment
ALPHA_EACH
    : 'A'..'Z' | 'a'..'z' | '_' | INT_ID | '-' | '.' | '@'
    ;

fragment
ALPHA
    : (ALPHA_EACH)+
    ;

fragment
ALPHA_IN_STRING
    : ('"' ( ~[\r\n] )+ '"')
    ;

WS
    // : ' ' -> channel(HIDDEN)
    : [ \t\r\n]+ -> skip // skip spaces, tabs, newlines
    ;
Running the parser
// text is the content of the 108 MB file to be read.
var input = new Antlr4.Runtime.AntlrInputStream(text);
var lexer = new ObjectsLexer(input);
var tokens = new Antlr4.Runtime.CommonTokenStream(lexer);
var parser = new ObjectsParser(tokens);
// Context for the compileUnit rule.
// ERROR: this is where the problem occurs, when the tree for compileUnit starts being built.
// Execution never reaches the visitor; the exception is thrown inside compileUnit().
var ctx = parser.compileUnit();
// Run the visitor.
new ObjectsVisitor().Visit(ctx);