I need to read two files and save their contents in the database. The first (smaller) one has about 49,000 rows (each row will become a record). I parse this file by instantiating an object for each row and then saving that object to the database using Doctrine. I can save about 13,000 records, after which the script dies due to memory consumption.
Although the file is large (more than 100 MB), I do not load it entirely into memory; I read one line at a time. The complete process of reading the file, creating the instances, parsing, and loading all the other data the file needs (with the exception of Doctrine) never consumes more than 30 MB (less than 2.5% of RAM). Only when Doctrine runs does memory consumption increase absurdly.
I know it is possible to increase the amount of memory available by changing php.ini directives or using ini_set, but I would not like that solution. The objects just need to be written to the database; they do not need to stay in memory.
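In other words, this is the workaround I am trying to avoid (ini_set and the memory_limit directive are standard PHP; the value shown is just an example):

// Raising the limit only postpones the problem instead of fixing the leak.
ini_set('memory_limit', '512M'); // or memory_limit = 512M in php.ini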
I tried to retrieve the elements and remove them from memory using detach:
private function clearBatch() {
    // Re-fetch the last ~1000 inserted records so they can be detached.
    $id = $this->last->getId() - 1000;
    $em = $this->getDoctrine()->getManager();
    $query = $em->createQuery('SELECT c FROM {MyObjet} c WHERE c.id >= :id')
                ->setParameter('id', $id > 0 ? $id : 0);
    foreach ($query->getResult() as $c) {
        $em->detach($c); // remove each entity from the unit of work
    }
    $query->free();  // release the query's internal resources
    $em->clear();    // detach everything still managed
    $this->getDoctrine()->resetManager();
    gc_collect_cycles(); // force a garbage-collection pass
}
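For comparison, the batch-processing pattern from the Doctrine documentation looks roughly like this (a sketch; $rows and buildEntity are placeholders, not my code):

$batchSize = 1000;
// In dev environments the SQL logger keeps every executed query in memory,
// so it is usually disabled for big imports.
$em->getConnection()->getConfiguration()->setSQLLogger(null);
foreach ($rows as $i => $row) {
    $em->persist(buildEntity($row)); // buildEntity is hypothetical
    if (($i % $batchSize) === 0) {
        $em->flush();
        $em->clear(); // detaches all entities, freeing the unit of work
    }
}
$em->flush(); // flush the final partial batch
$em->clear();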
But the script always dies. I believe Doctrine is keeping every inserted entity in memory, even though I just need to feed the database with this information; the objects do not need to stay in memory.
Every 1,000 records I persist, I run clearBatch:
public function onProcess($c) {
    $this->getDoctrine()->getManager()->persist($c);
    $this->count++;
    $this->last = $c;
    // Flush and release memory every 1000 persisted records.
    if ($this->count % 1000 == 0) {
        $this->getDoctrine()->getManager()->flush();
        $this->clearBatch();
    }
}
I even tried calling the Garbage Collector.
If I do not persist the records, I can read the whole file and my memory consumption is constant: it reaches its peak (2.4%) right at the beginning of the script and stays there until the end (this is a test-only VMware machine with 1 GB of RAM). With persistence, the script dies when memory reaches ~14%.
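Inside the script the growth can be confirmed with standard PHP functions (hypothetical instrumentation, not something I keep in the real code):

if ($this->count % 1000 == 0) {
    // memory_get_usage(true) reports the memory actually allocated from the system
    error_log(sprintf('after %d records: %.1f MB', $this->count, memory_get_usage(true) / 1048576));
}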
The onProcess function is called from the file-reading loop.
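That loop looks roughly like this (a sketch; the path and the parseLine helper are placeholders for my real parsing code):

$handle = fopen('/path/to/the/file', 'r');
while (($line = fgets($handle)) !== false) { // one line at a time, constant memory
    $c = $this->parseLine($line);            // parseLine: builds the entity from one row
    $this->onProcess($c);
}
fclose($handle);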