Use glob:
$array = glob('caminho/ate/a/pasta/*.xml');
You can pass a path pattern and use the asterisk as a wildcard, which matches any sequence of characters (including none).
That is, the code above returns every file in that folder that has the .xml extension.
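To see the wildcard in action, here is a minimal sketch that builds a throwaway folder and lists only its .xml files. The temporary folder and the sample file names are assumptions made up for the demonstration. Note that glob() can return false on error, so the sketch normalises the result to an array.

```php
<?php
// Hypothetical sandbox: a temporary folder with a few sample files.
$dir = sys_get_temp_dir() . '/glob_demo_' . uniqid();
mkdir($dir);
foreach (['a.xml', 'b.xml', 'notes.txt'] as $name) {
    file_put_contents($dir . '/' . $name, '');
}

// Only the entries ending in .xml match the pattern.
$xmlFiles = glob($dir . '/*.xml');

// glob() returns false on error, so fall back to an empty array.
$xmlFiles = $xmlFiles === false ? [] : $xmlFiles;

// Keep just the file names and sort them for a stable listing.
$names = array_map('basename', $xmlFiles);
sort($names);
print_r($names);

// Clean up the sandbox.
array_map('unlink', glob($dir . '/*'));
rmdir($dir);
```

notes.txt is left out of the listing because it does not match the pattern.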
Another idea is to use the SPL GlobIterator:
$iterator = new GlobIterator('caminho/ate/a/pasta/*.xml');
foreach ($iterator as $item) {
    // Each $item is an SplFileInfo; echoing it prints the file path.
    echo $item, PHP_EOL;
}
Or one of the other SPL iterators.
All of them will get you the result you want, but glob and GlobIterator are the ones made specifically for file searches.
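As a rough comparison, the sketch below finds the same files with GlobIterator and with a general-purpose directory iterator. DirectoryIterator is my own pick for the comparison, since the answer does not say which other iterator it has in mind. The temporary folder and file names are also invented for the demo.

```php
<?php
// Hypothetical sandbox folder with sample files.
$dir = sys_get_temp_dir() . '/iter_demo_' . uniqid();
mkdir($dir);
foreach (['a.xml', 'b.xml', 'notes.txt'] as $name) {
    file_put_contents($dir . '/' . $name, '');
}

// GlobIterator: the pattern does the filtering.
$viaGlob = [];
foreach (new GlobIterator($dir . '/*.xml') as $item) {
    $viaGlob[] = $item->getFilename();
}

// DirectoryIterator: every entry is visited, so filter by hand.
$viaDirectory = [];
foreach (new DirectoryIterator($dir) as $item) {
    if ($item->isFile() && $item->getExtension() === 'xml') {
        $viaDirectory[] = $item->getFilename();
    }
}

// Iteration order is not guaranteed, so sort before comparing.
sort($viaGlob);
sort($viaDirectory);
var_dump($viaGlob === $viaDirectory);

// Clean up the sandbox.
array_map('unlink', glob($dir . '/*'));
rmdir($dir);
```

The general iterator needs an explicit extension check, which is the extra work the glob-based options save you.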
Parallel Processing
For parallel processing, I know of three approaches:
PHP Thread
In very simple terms, you extend the Thread class (provided by the pthreads extension) and put the file processing in its run() method:
class XmlProcessThread extends Thread {
    protected $filename;

    public function __construct($filename) {
        $this->filename = $filename;
    }

    public function run() {
        /** use the filename and perform the processing **/
    }
}
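To run the processing, create one thread per file and start it:
$threadList = [];
foreach ($fileList as $filename) {
    $thread = new XmlProcessThread($filename);
    $thread->start();
    $threadList[] = $thread;
}
// Wait for every thread to finish before continuing.
foreach ($threadList as $thread) {
    $thread->join();
}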
Script via Exec
Basically, you can run PHP files with the exec function. In the command line you pass to it, add the & symbol at the end; this makes the script run in the background, so the PHP script that started it does not wait for it to finish. Note that the output must also be redirected (for example to /dev/null), otherwise PHP still waits until the program ends.
exec('php diretorio/thread.php filename.xml > /dev/null 2>&1 &');
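When the file name comes from a directory listing rather than a literal, it is safer to escape it before it reaches the shell. The sketch below builds the background command with escapeshellarg; the helper name buildBackgroundCommand and its parameters are hypothetical, and the /dev/null redirection assumes a Unix-like shell.

```php
<?php
/**
 * Builds a shell command that runs a worker script in the background.
 * The function name and parameters are hypothetical, for illustration only.
 */
function buildBackgroundCommand(string $script, string $file): string
{
    // Output is redirected so that exec() does not block on the child process.
    return sprintf(
        'php %s %s > /dev/null 2>&1 &',
        escapeshellarg($script),
        escapeshellarg($file)
    );
}

$command = buildBackgroundCommand('diretorio/thread.php', 'filename.xml');
echo $command, PHP_EOL;

// exec($command); // uncomment to actually launch the worker
```

Calling the helper once per file in a loop launches one background process per file.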
In the thread.php file, you read the $argv variable. It holds the script name at index 0, followed by all the parameters sent to the script (in this case, filename.xml at index 1).
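A minimal sketch of how thread.php could pick up its parameter is shown here. The helper name fileNameFromArgs is hypothetical, and a literal array stands in for $argv so the snippet also runs without command-line parameters.

```php
<?php
/**
 * Extracts the file name from a CLI argument list.
 * Index 0 is the script path; the first real parameter is at index 1.
 * The function name is hypothetical.
 */
function fileNameFromArgs(array $args): ?string
{
    return $args[1] ?? null;
}

// In thread.php this would be fileNameFromArgs($argv).
$filename = fileNameFromArgs(['diretorio/thread.php', 'filename.xml']);

if ($filename === null) {
    fwrite(STDERR, 'Usage: php thread.php <file.xml>' . PHP_EOL);
} else {
    echo $filename, PHP_EOL;
}
```

Returning null for a missing parameter lets the script print a usage message instead of failing on an undefined index.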
Distributed threads
In this approach, you create sockets, where each socket processes one file, and these sockets are run via Thread (as in the first example).
Which is better?
Well, it depends. Each approach tends to give different results in different scenarios.
For example, distributed parallel processing tends to be faster for workloads that need a lot of processing time, since the work is spread across other servers. However, it may require sending the file to the other server (if the file is not reachable from it), and that transfer adds to the total cost of the processing.
On the other hand, Thread and exec use the same server, so the processes compete with each other for its resources, which will probably hurt performance.
These are just a few examples of the advantages and disadvantages. You can get a good overview from the answer below:
Is it always guaranteed that a multi-threaded application runs faster than using a single thread?