I need to write a program that finds duplicate files on my computer, so the user can decide what to do with them (e.g. delete the copies). For now, I only care about binary comparison: a file counts as a duplicate only if it is byte-for-byte identical to another file.
I know that comparing only filenames is insufficient, since the same file may have been saved under a different name.
Is there an algorithm for comparing files?
I imagine that generating a checksum for every file and comparing each one against all the others is wasteful, since most files will not have any duplicates. I also assume that file size alone is not enough to decide whether two files are equal. There may also be cases where a file is duplicated more than once.
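To make the question concrete, here is a rough sketch of one possible approach, not necessarily the best one. It assumes file size can serve as a cheap first filter (files of different sizes cannot be identical), so that checksums are computed only for files that share a size. The names `file_digest` and `find_duplicates`, the choice of SHA-256, and the chunk size are all my own assumptions for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of groups; each group is a sorted list of paths
    whose contents have the same size and the same digest."""
    # Step 1: group by size. A file with a unique size has no duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        # A group may hold more than two paths (file duplicated repeatedly).
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates("/some/dir")` would return something like a list of path lists, one per set of identical files. Matching digests make equality extremely likely but do not strictly prove it, so a final byte-by-byte check (for example `filecmp.cmp(a, b, shallow=False)`) could be added if a 100% guarantee is required.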