In which programming language does a crawler / scraper traverse the DOM fastest?

1

I've developed a script that uses PHP's DOMDocument class to crawl a third-party site.

The script is not as fast as I expected. In which programming language would a script written for the same purpose return the DOM scan results faster?

    
asked by anonymous 23.11.2017 / 17:55

1 answer

3

Programming languages do not have speed as a characteristic. Some have features that help you get more speed. Libraries do have their own speed, but you are not obliged to use the default one. If the standard library does not meet your performance requirements, which is rare, very rare, then look for another library.

What gives the most speed is using the right data structure and the right algorithm. The difference between the right and the wrong choice can be between taking less than 1 second and taking centuries. There are cases in that proportion, and they are not few.
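As a rough illustration of the data-structure point, here is a minimal Python sketch (not the asker's PHP code) that removes duplicate URLs from a crawl queue in two ways. The function names and the example.com URLs are made up for the illustration. Both versions return the same result, but the list version rescans everything seen so far on every lookup, while the set version does a hash lookup.

```python
import time


def dedupe_with_list(urls):
    """Keep first occurrences; each membership test scans the list, O(n) per lookup."""
    seen = []
    for url in urls:
        if url not in seen:
            seen.append(url)
    return seen


def dedupe_with_set(urls):
    """Same result, but membership tests hit a hash set, O(1) on average."""
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


def time_call(func, arg):
    """Run func(arg) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(arg)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical data: 10,000 links, half of them duplicates.
    urls = [f"https://example.com/page/{i % 5000}" for i in range(10000)]
    list_result, list_secs = time_call(dedupe_with_list, urls)
    set_result, set_secs = time_call(dedupe_with_set, urls)
    assert list_result == set_result
    print(f"list: {list_secs:.3f}s  set: {set_secs:.3f}s")
```

The gap widens as the number of URLs grows, since the list version does quadratic work overall. The same kind of choice exists in any language, PHP included.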

Choosing a faster language can turn something that takes 1 minute into something that takes less than 1 second, no more than that, and only in a few cases does it make that much difference. And that is between languages with glaring differences, for example one of the worst Ruby implementations compared with very well-written Assembly.

Assembly is the language that allows the best possible performance. In practice, though, it is so hard today to write correct, fast Assembly that code written in C will almost always be faster. In some cases C++, Rust, or Fortran may do better. In Delphi, Java, and C#, to name just a few, most tasks run with minimal difference from those languages, and even where they do badly, the difference is on the order of taking about 3 seconds where C would take less than 1.

If you want to stay with scripting languages, then JavaScript (perhaps TypeScript) and Lua, especially with the LuaJIT implementation, should be the best options.

PHP does not perform that badly, especially in newer versions.

But if you have not mastered the language, programming in general, and the concepts described above, the result will not be good.

Most applications do not need as much performance as people think, and the ones that do often require hard, complex engineering work. So if a big performance gain is possible just by changing something, it is because the original was very wrong (but working, which makes people think it was right).

If you do it right, it is likely that the bottleneck is bringing the information across the network, even in "slow languages."
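One way to check this before switching languages is to time the fetch and the parse separately. The Python sketch below is only an assumption-laden illustration: `fake_fetch` is a hypothetical stand-in that sleeps instead of making a real HTTP request, the page it returns is generated, and `LinkCollector` and `profile_page` are names invented here. It uses the standard-library `html.parser` module rather than PHP's DOMDocument.

```python
import time
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags while the parser walks the markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def fake_fetch(delay_seconds=0.2):
    """Hypothetical stand-in for an HTTP request: sleeps, then returns a generated page."""
    time.sleep(delay_seconds)
    rows = "".join(f'<li><a href="/item/{i}">Item {i}</a></li>' for i in range(2000))
    return f"<html><body><ul>{rows}</ul></body></html>"


def profile_page(fetch):
    """Return the seconds spent fetching and parsing, plus the number of links found."""
    t0 = time.perf_counter()
    html = fetch()
    t1 = time.perf_counter()
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    t2 = time.perf_counter()
    return {"fetch": t1 - t0, "parse": t2 - t1, "links": len(collector.links)}


if __name__ == "__main__":
    print(profile_page(fake_fetch))
```

With a real request in place of `fake_fetch`, the two numbers show whether the parser or the network is the part worth optimizing.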

You can see a comparison of languages. But note that it is called a "game"; it is not a scientific method. If you use it to make important decisions, you may get burned.

    
23.11.2017 / 18:19