What is the best way to read an XML file in PHP and insert it into the database?


I am reading XML files of about 50 MB (around a hundred thousand records, possibly more), and I have two classes for reading the XML: one that uses DOM and another that uses SAX.

My problem is runtime: processing takes longer than 10 minutes, which is the maximum PHP execution time I configured in php.ini.

I've read a lot about this and know that SAX is faster than DOM for very large files, which I believe is my case.
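To make the DOM/SAX difference concrete, here is a minimal Python sketch (Python is used only so the example runs stand-alone; in PHP the rough counterparts are `DOMDocument` for DOM and `xml_parser_create()` for SAX-style parsing). The `<record id="..." name="..."/>` layout is a hypothetical stand-in, since the question does not show the real schema. DOM builds the whole tree in memory first; SAX calls a handler for each element and keeps nothing unless you store it.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document; the real files have ~100,000 <record> elements.
SAMPLE = b"""<records>
  <record id="1" name="alpha"/>
  <record id="2" name="beta"/>
  <record id="3" name="gamma"/>
</records>"""


def count_with_dom(data: bytes) -> int:
    # DOM: the entire tree is built in memory before we can inspect it.
    doc = xml.dom.minidom.parseString(data)
    return len(doc.getElementsByTagName("record"))


class RecordCounter(xml.sax.ContentHandler):
    # SAX: the parser calls startElement for every opening tag as it reads.
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "record":
            self.count += 1


def count_with_sax(data: bytes) -> int:
    handler = RecordCounter()
    xml.sax.parseString(data, handler)
    return handler.count


print(count_with_dom(SAMPLE), count_with_sax(SAMPLE))  # 3 3
```

Both give the same answer; the difference is that the SAX version's memory use does not grow with the size of the file.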

One of my mandatory requirements is that either the entire file is read and placed in the database, or nothing is inserted at all.

The question that remains is: what is the most efficient way, in terms of EXECUTION TIME, to read the XML and save it to the DB?

  • Read tag by tag, inserting each one into the DB?
  • Read everything into a data structure and insert it all at the end?
  • Some other way? Which one?
asked by anonymous 17.06.2014 / 15:34

1 answer


Let's look at the options you listed yourself:

  

Reading tag by tag and inserting into the DB

With this approach you end up with a redundant structure: for every tag read, a request is sent to the database to save it there.
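A minimal sketch of the tag-by-tag approach, assuming SQLite as a stand-in database and the same hypothetical `<record>` layout. `ET.iterparse` streams the file much like SAX, and each finished `<record>` is inserted and committed on its own. The third record carries a deliberately invalid id to simulate a failure in the middle of the file.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input: the third record has an invalid id, simulating a mid-file failure.
BROKEN = b"""<records>
  <record id="1" name="alpha"/>
  <record id="2" name="beta"/>
  <record id="x" name="gamma"/>
</records>"""


def insert_tag_by_tag(xml_bytes: bytes, conn: sqlite3.Connection) -> None:
    # iterparse yields each element when its closing tag is read.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
        if elem.tag == "record":
            conn.execute("INSERT INTO records (id, name) VALUES (?, ?)",
                         (int(elem.get("id")), elem.get("name")))
            conn.commit()  # one request per tag: this row is now permanent
            elem.clear()   # release the element so memory stays flat on big files


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
try:
    insert_tag_by_tag(BROKEN, conn)
except ValueError:
    pass  # int("x") fails on the third record
saved = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(saved)  # 2: the file was not fully read, yet two rows were kept
```

The leftover two rows are exactly the "pro" and the problem discussed next: partial data survives, which breaks the all-or-nothing requirement.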

The good:

If you cannot read and save everything, at least you keep whatever was saved. The problem with this "pro" is that it contradicts the requirement you yourself raised:

"[...] One of my mandatory requirements is that either the entire file is read and placed in the database, or NOTHING is inserted. [...]"

The bad:

If you do not keep a database connection open for the whole lifetime of the application, you get what I just called a redundant structure: every time you save a piece of information, you are forced to open a new connection to your database. That wastes a significant amount of processing and directly increases the runtime of your service, which is exactly what we do not want.
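The difference between reconnecting for every save and keeping one live connection can be sketched as follows, again with SQLite standing in for the real database and `ROWS` standing in for already-parsed records (both hypothetical). The functions return how many connections they opened; the data written is the same either way, but every extra connection is setup work repeated once per record.

```python
import os
import sqlite3
import tempfile

ROWS = [(1, "alpha"), (2, "beta"), (3, "gamma")]  # hypothetical parsed records


def save_reconnecting(db_path, rows):
    """Open a brand-new connection for every row (the redundant structure)."""
    opened = 0
    for row in rows:
        conn = sqlite3.connect(db_path)
        opened += 1
        with conn:  # commits this single row
            conn.execute("INSERT INTO records VALUES (?, ?)", row)
        conn.close()
    return opened


def save_with_live_connection(db_path, rows):
    """Keep one connection alive for the whole run."""
    conn = sqlite3.connect(db_path)
    for row in rows:
        with conn:
            conn.execute("INSERT INTO records VALUES (?, ?)", row)
    conn.close()
    return 1


def make_db(path):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
    conn.commit()
    conn.close()


def count_rows(path):
    conn = sqlite3.connect(path)
    n = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    conn.close()
    return n


with tempfile.TemporaryDirectory() as tmp:
    path_a, path_b = os.path.join(tmp, "a.db"), os.path.join(tmp, "b.db")
    make_db(path_a)
    make_db(path_b)
    reconnects = save_reconnecting(path_a, ROWS)
    live = save_with_live_connection(path_b, ROWS)
    rows_a, rows_b = count_rows(path_a), count_rows(path_b)

print(reconnects, live, rows_a, rows_b)  # 3 1 3 3
```

With a hundred thousand records, the first version would open a hundred thousand connections; against a networked database server each one also costs a round trip.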

Think of this technique as a kind of streaming: its processing is fragmented. Unless someone actually benefits from seeing partial results as they arrive, as with a video stream, I see this fragmentation as more than unnecessary; I see it as wasted effort.

Furthermore, "all or nothing" cannot be achieved here without staging the data for a later commit. Even if you do that while keeping the connection alive, staging and committing demand more from the hardware, which further degrades the performance you are looking for.
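One common way to get that "staging for a later commit" is a database transaction; in PHP, PDO exposes it through `beginTransaction()`, `commit()` and `rollBack()`. The sketch below (SQLite as a stand-in, hypothetical `<record>` layout) still streams and inserts row by row, but commits only once the whole file has been read, and rolls everything back if anything fails.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical inputs: GOOD is complete, BROKEN fails on its third record (id="x").
GOOD = b"""<records>
  <record id="1" name="alpha"/>
  <record id="2" name="beta"/>
  <record id="3" name="gamma"/>
</records>"""
BROKEN = GOOD.replace(b'id="3"', b'id="x"')


def insert_streaming_in_transaction(xml_bytes: bytes, conn: sqlite3.Connection) -> None:
    """Stream and insert row by row, but commit only once at the very end."""
    try:
        for _event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
            if elem.tag == "record":
                # sqlite3 opens a transaction implicitly before the first INSERT;
                # the rows stay staged until commit() is called.
                conn.execute("INSERT INTO records (id, name) VALUES (?, ?)",
                             (int(elem.get("id")), elem.get("name")))
                elem.clear()
        conn.commit()    # the whole file was read: make every row permanent
    except Exception:
        conn.rollback()  # something failed: discard every staged row
        raise


def run(xml_bytes: bytes) -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
    try:
        insert_streaming_in_transaction(xml_bytes, conn)
    except ValueError:
        pass  # int("x") failed; the rollback already happened
    count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    conn.close()
    return count


rows_after_good = run(GOOD)
rows_after_broken = run(BROKEN)
print(rows_after_good, rows_after_broken)  # 3 0
```

The database has to hold every uncommitted row until the end, which is the extra hardware cost mentioned above.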

On the other side of the tatami, we have:

  

Reading everything into a data structure and inserting it all at the end

In simple terms:

This is a technique widely used in recent Microsoft software; it is where Windows Phone's reputation for feeling fluid comes from, for example: all the loading is done before the application actually runs, which makes the experience smooth and free of stutters. Think of it as a video that is 100% loaded before it plays: it takes longer before you can start watching, but once it starts you watch without interruptions.

In technical terms:

This method requires only one request to the database: at the moment of saving. That single request keeps your structure from being redundant and guarantees your "all or nothing": if the save succeeded, everything was saved, which is exactly the requirement you raised.
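A minimal two-phase sketch of this approach, assuming SQLite as a stand-in database and the same hypothetical `<record>` layout: phase 1 parses the whole file into a plain list without touching the database, and phase 2 sends every row in one batched `executemany` call inside a single transaction. If parsing fails, nothing reaches the database; if the insert fails, the transaction is rolled back.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical inputs: GOOD is complete, BROKEN fails on its third record (id="x").
GOOD = b"""<records>
  <record id="1" name="alpha"/>
  <record id="2" name="beta"/>
  <record id="3" name="gamma"/>
</records>"""
BROKEN = GOOD.replace(b'id="3"', b'id="x"')


def read_all(xml_bytes: bytes) -> list:
    """Phase 1: parse the whole file into a list; the database is not touched."""
    records = []
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
        if elem.tag == "record":
            records.append((int(elem.get("id")), elem.get("name")))
            elem.clear()
    return records


def save_all(records: list, conn: sqlite3.Connection) -> None:
    """Phase 2: one batched insert inside one transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.executemany("INSERT INTO records (id, name) VALUES (?, ?)", records)


def run(xml_bytes: bytes) -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
    try:
        save_all(read_all(xml_bytes), conn)
    except ValueError:
        pass  # parsing failed before any row was sent to the database
    count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    conn.close()
    return count


print(run(GOOD), run(BROKEN))  # 3 0
```

The trade-off is memory: the whole list of records lives in RAM between the two phases, which for roughly a hundred thousand small records is usually acceptable.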

In general, the time to save everything will be shorter than with the previous technique, because there is less processing overhead: the hardware responds faster and executes everything more smoothly and, consequently, more quickly.

If you ever drop the "all or nothing" requirement, consider revisiting the streaming technique described above.

    
17.06.2014 / 16:18