Read contents of a folder and process the files asynchronously


I have a folder with several XML files.

I would like a script that reads these files and processes each one of them.

How do I read the folder's contents and "pull" the files, given that they all have different names?

I would also like to do this asynchronously, processing more than one file at the same time, while setting a limit on the number of concurrent processes so that the processing doesn't crash. How can I do that?

    
asked by anonymous 22.05.2018 / 15:19



Use glob():

$array = glob('path/to/the/folder/*.xml');

You can specify a path pattern and use the asterisk as a wildcard: it matches any sequence of characters, of any length (including none).

That is, the code above returns an array with every file in that folder that has the .xml extension.
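A slightly fuller sketch of the glob() approach, assuming the files only need to be visited one after another. `listXmlFiles` and the `path/to/the/folder` path are hypothetical names. Two details worth knowing: glob() returns false on error rather than an array, and on Linux the pattern is case-sensitive, so `*.xml` will not match `FILE.XML`.

```php
<?php
// Hypothetical helper: returns the full paths of all .xml files directly inside $dir.
function listXmlFiles(string $dir): array
{
    // glob() returns false on error; normalise that to an empty array.
    $files = glob(rtrim($dir, '/') . '/*.xml');
    return $files === false ? [] : $files;
}

foreach (listXmlFiles('path/to/the/folder') as $path) {
    // Each $path is the folder prefix followed by the file name.
    echo basename($path), PHP_EOL;
}
```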

Another option is the SPL GlobIterator:

$iterator = new GlobIterator('path/to/the/folder/*.xml');

foreach ($iterator as $item) {
    // $item is an SplFileInfo; echoing it prints the file's path
    echo $item, PHP_EOL;
}

You could also use one of the other SPL directory iterators.

All of them will get you the result you want, but the ones specifically designed for searching files by pattern are glob and GlobIterator.
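As an example of the other SPL iterators, here is a sketch that filters a FilesystemIterator by extension with CallbackFilterIterator instead of using a glob pattern. `xmlFilesIn` is a hypothetical name. Unlike the `*.xml` glob on Linux, this version compares the extension in lower case, so `FILE.XML` is matched too; note that FilesystemIterator throws an UnexpectedValueException if the folder cannot be opened.

```php
<?php
// Hypothetical helper: yields SplFileInfo objects for the .xml files in $dir.
function xmlFilesIn(string $dir): Iterator
{
    // FilesystemIterator skips "." and ".." by default and yields SplFileInfo objects.
    $all = new FilesystemIterator($dir);

    // Keep only regular files whose extension is "xml", ignoring case.
    return new CallbackFilterIterator(
        $all,
        function (SplFileInfo $file): bool {
            return $file->isFile() && strtolower($file->getExtension()) === 'xml';
        }
    );
}
```

Usage would be `foreach (xmlFilesIn('path/to/the/folder') as $file) { echo $file->getPathname(), PHP_EOL; }`.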

Parallel Processing

For parallel processing, there are three approaches I know of:

PHP Thread

Put very simply, you extend the Thread class (provided by the pthreads extension) and implement the file's processing in its run() method:

class XmlProcessThread extends Thread {

    protected $filename;

    public function __construct($filename) { 
        $this->filename = $filename;
    }

    public function run() {
        /** use $this->filename and perform the processing **/
    }
}

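To run the processing, instantiate one thread per file and start it, then wait for all of them to finish:

$threadList = [];
foreach ($fileList as $filename) {
    $thread = new XmlProcessThread($filename);
    $thread->start();
    $threadList[] = $thread;
}

// Block until every thread has finished its run() method
foreach ($threadList as $thread) {
    $thread->join();
}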

Script via Exec

Basically, you can run PHP scripts with the exec() function. Add the & symbol at the end of the command line: this runs the script in the background, so the PHP script that started it does not wait for it to finish. The child's output must also be redirected (for example to /dev/null on Unix-like systems); otherwise exec() still waits until the program ends.

exec('php directory/thread.php filename.xml > /dev/null 2>&1 &');

Inside thread.php, read the $argv variable: $argv[0] holds the script name, followed by the arguments sent to the script (in this case, $argv[1] is filename.xml).
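A minimal sketch of what thread.php could contain, assuming the processing can be represented by parsing the file with SimpleXML. `runWorker` and the root-element report are hypothetical placeholders for your real processing; the return value is meant to become the script's exit code, so the caller can tell whether the file was processed.

```php
<?php
// Sketch of thread.php: processes the single XML file passed on the command line.
// $argv[0] is the script name; $argv[1] is the first argument (e.g. filename.xml).
function runWorker(array $argv): int
{
    if (!isset($argv[1])) {
        fwrite(STDERR, "Usage: php thread.php <file.xml>\n");
        return 1;
    }

    $filename = $argv[1];

    // Collect libxml errors instead of printing warnings for malformed files.
    libxml_use_internal_errors(true);
    $xml = simplexml_load_file($filename);
    if ($xml === false) {
        fwrite(STDERR, "Could not parse $filename\n");
        return 2;
    }

    // Placeholder for the real processing: report the root element's name.
    echo $filename, ': root element <', $xml->getName(), ">\n";
    return 0;
}
```

In the actual thread.php, the file would end with `exit(runWorker($argv));` so the exit code reaches whoever started the script.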

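Starting every script with & does not limit how many run at once, which the question asks for. One way to add a cap, sketched here as an assumption rather than the only approach, is to start the workers with proc_open() instead of exec(), keep at most `$maxConcurrent` of them running, and poll proc_get_status() to see when a slot frees up. `runWithLimit` is a hypothetical helper, and the value 4 in the usage lines is arbitrary. The commands must not end in &, because the helper itself tracks each process until it exits.

```php
<?php
// Hypothetical helper: runs shell commands in parallel, at most $maxConcurrent at a time.
// Returns the exit code of each command, keyed by its position in $commands.
function runWithLimit(array $commands, int $maxConcurrent): array
{
    $commands = array_values($commands);
    $next = 0;        // index of the next command to start
    $running = [];    // index => process resource
    $exitCodes = [];  // index => exit code

    while ($next < count($commands) || $running) {
        // Fill free slots with new processes (they inherit our stdout/stderr).
        while ($next < count($commands) && count($running) < $maxConcurrent) {
            $process = proc_open($commands[$next], [], $pipes);
            if ($process === false) {
                throw new RuntimeException("Could not start: {$commands[$next]}");
            }
            $running[$next] = $process;
            $next++;
        }

        // Collect finished processes to free their slots.
        foreach ($running as $index => $process) {
            $status = proc_get_status($process);
            if (!$status['running']) {
                // exitcode is only reliable on the first call that reports running = false.
                $exitCodes[$index] = $status['exitcode'];
                proc_close($process);
                unset($running[$index]);
            }
        }

        usleep(50000); // wait 50 ms between polls to avoid a busy loop
    }

    ksort($exitCodes);
    return $exitCodes;
}

// Usage: one worker per XML file, never more than 4 at once.
$files = glob('path/to/the/folder/*.xml') ?: [];
$commands = array_map(function (string $file): string {
    return 'php directory/thread.php ' . escapeshellarg($file);
}, $files);
$exitCodes = runWithLimit($commands, 4);
```

Polling keeps the sketch short; the trade-off is a small delay (up to the sleep interval) before a freed slot is reused.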

Distributed threads

In this approach, you create sockets, where each socket processes one file, and these sockets are run via threads (as in the first example).


Which is better?

It depends. Each approach tends to perform differently in different scenarios.

For example, distributed processing tends to be faster for workloads that need a lot of processing time, because the work is spread across other servers. However, the file may have to be sent to the other server (if that server cannot access it directly), and that transfer adds to the total cost of the processing.

On the other hand, Thread and exec run on the same server, so the threads or processes compete with each other for its resources, which will probably reduce performance.

These are just a few examples of the advantages and disadvantages. You can get a good overview from the answer to this question:

Is it always guaranteed that a multi-threaded application runs faster than using a single thread?

    
22.05.2018 / 15:29