How to check duplicate files?

5

How can I check whether two files are the same, even if their names are different, in a Node.js application?

    
asked by anonymous 02.04.2017 / 12:45

1 answer

4

One way is to compare the cryptographic hash of each file. If the hashes are the same, the files have identical contents.

For this you can use the crypto module to compute the hashes and fs to read the files.

I created a function:

function CriarHash(Texto){

    return Hash.createHash('sha512').update(Texto).digest('hex');

}

This creates the hashes with 512-bit SHA-2. You could also use another member of the SHA-2 family, SHA-3, or BLAKE2.

I've also created a function that reads each file, hashes its contents, and returns all the hashes:
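As a rough sketch of swapping algorithms, the helper below takes the algorithm name as a parameter. The name createHashWith is hypothetical (it is not part of the answer's code), and which algorithms exist depends on the OpenSSL build behind your Node.js, so the sketch checks crypto.getHashes() first:

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash `data` with a caller-chosen algorithm.
function createHashWith(algorithm, data) {
    // Fail early with a clear message if this Node.js build lacks the algorithm.
    if (!crypto.getHashes().includes(algorithm)) {
        throw new Error('Unsupported hash algorithm: ' + algorithm);
    }
    return crypto.createHash(algorithm).update(data).digest('hex');
}

console.log(createHashWith('sha256', 'abc'));
console.log(createHashWith('sha512', 'abc').length); // 128 hex characters
```

Whatever algorithm you pick, use the same one for every file you compare, since digests from different algorithms are never comparable.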

function CriarHashArquivos(NomeArquivos){

    var ArquivosHashes = [];

    NomeArquivos.forEach(function(Nome) {

        ArquivosHashes.push(CriarHash( Arquivo.readFileSync(Nome)));

    });

    return ArquivosHashes;

}
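Note that readFileSync loads each file fully into memory. For large files, one possible alternative is to feed the hash in chunks. This is only a sketch: hashFileInChunks and the 64 KiB default chunk size are my own assumptions, not part of the code above.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical helper: hash a file in fixed-size chunks instead of reading it whole.
function hashFileInChunks(fileName, chunkSize) {
    const hash = crypto.createHash('sha512');
    const buffer = Buffer.alloc(chunkSize || 64 * 1024);
    const fd = fs.openSync(fileName, 'r');
    try {
        let bytesRead;
        // readSync returns 0 at end of file; a null position means
        // "continue from where the previous read stopped".
        while ((bytesRead = fs.readSync(fd, buffer, 0, buffer.length, null)) > 0) {
            // Only hash the bytes actually read in this round.
            hash.update(buffer.subarray(0, bytesRead));
        }
    } finally {
        fs.closeSync(fd);
    }
    return hash.digest('hex');
}

// Usage (file name is an example): hashFileInChunks('arquivo1.txt')
```

The digest is the same as hashing the whole file at once, so it can be compared with the hashes produced by the other functions here.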

All that is left is to check that all the hashes are the same, so I created this:

function HashIgual(Hashes){

    return Hashes.every( v => v === Hashes[0])

}
  

/!\ This comparison is not constant-time, so it is vulnerable to timing attacks. Node.js has a native function protected against this, crypto.timingSafeEqual (available since v6.6.0), for cases where that matters.

At the end you will have this:

var Hash = require('crypto');
var Arquivo = require('fs');

function CriarHashArquivos(NomeArquivos){

    var ArquivosHashes = [];

    NomeArquivos.forEach(function(Nome) {

        ArquivosHashes.push(CriarHash( Arquivo.readFileSync(Nome)));

    });

    return ArquivosHashes;

}

function CriarHash(Texto){

    return Hash.createHash('sha512').update(Texto).digest('hex');

}

function HashIgual(Hashes){

    return Hashes.every( v => v === Hashes[0])

}

To use it, just define the files, for example:

var NomesDosArquivos = ['arquivo1.txt', 'arquivo2.txt'];

console.log( HashIgual( CriarHashArquivos(NomesDosArquivos) ) );

It will return true if all files are equal, or false if at least one of the compared files is different.
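If you want to know which files are duplicates of each other, rather than only whether all of them match, the same hashes can be used as grouping keys. A minimal sketch, assuming a hypothetical helper named groupDuplicates that is not part of the code above:

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical helper: group file names by the SHA-512 hash of their contents.
function groupDuplicates(fileNames) {
    const byHash = new Map();
    fileNames.forEach(function (name) {
        const digest = crypto.createHash('sha512')
            .update(fs.readFileSync(name))
            .digest('hex');
        if (!byHash.has(digest)) {
            byHash.set(digest, []);
        }
        byHash.get(digest).push(name);
    });
    // Keep only the groups that contain more than one file.
    return Array.from(byHash.values()).filter(function (group) {
        return group.length > 1;
    });
}

// Usage (file names are examples):
// groupDuplicates(['arquivo1.txt', 'arquivo2.txt', 'arquivo3.txt'])
```

Each inner array in the result lists files with identical contents; files that have no duplicate are left out.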

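Node.js does have a function that is safe against timing attacks: crypto.timingSafeEqual (available since v6.6.0), which compares two Buffers of equal length. If you want to see the idea written by hand, a more secure comparison looks like this, for example: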

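function HashIgual(Hashes){

    // Accumulates every difference found; stays 0 only if all hashes match.
    var eIgual = 0;

    Hashes.forEach(function(HashArquivo){

        // A length mismatch also counts as a difference.
        eIgual |= (HashArquivo.length ^ Hashes[0].length);

        for (var i = 0; i < HashArquivo.length; ++i) {
            eIgual |= (HashArquivo.charCodeAt(i) ^ Hashes[0].charCodeAt(i));
        }

    });

    // Return a boolean, like the === version: true when all hashes are equal.
    return eIgual === 0;

}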

This XORs every pair of characters and accumulates the result, so the execution time does not depend on where the first difference occurs, as opposed to == and ===, which can stop at the first mismatch. In this case I do not see the need to use it, but if you are, for example, comparing passwords, always use constant-time functions. ;)
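For reference, Node.js ships crypto.timingSafeEqual (since v6.6.0), which performs a constant-time comparison of two Buffers. Here is a small sketch of using it on hex digests; the wrapper name hexDigestsEqual is hypothetical, and the length check is there because timingSafeEqual throws when the inputs differ in length:

```javascript
const crypto = require('crypto');

// Hypothetical helper: compare two hex digests in constant time.
function hexDigestsEqual(hexA, hexB) {
    const a = Buffer.from(hexA, 'hex');
    const b = Buffer.from(hexB, 'hex');
    // timingSafeEqual throws on different lengths, so rule that out first.
    if (a.length !== b.length) {
        return false;
    }
    return crypto.timingSafeEqual(a, b);
}

const h1 = crypto.createHash('sha512').update('same').digest('hex');
const h2 = crypto.createHash('sha512').update('same').digest('hex');
const h3 = crypto.createHash('sha512').update('other').digest('hex');

console.log(hexDigestsEqual(h1, h2)); // true
console.log(hexDigestsEqual(h1, h3)); // false
```

The early return on a length mismatch reveals only the length, which is not a secret for digests produced by a known algorithm.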

PS: I do not have much knowledge of Node.js.

02.04.2017 / 14:04