Is it efficient to check for file modifications using hashes?

7

Scenario

I need to implement a file change check between two points of my application. *¹

  • Point 1 - Server - I have a folder containing some product images;
  • Point 2 - Mobile Device - I have a catalog application that downloads these images from the Server to a specific folder on its SD card;

Problem

From time to time I would like to compare the images on the Device with the images on the Server, check whether there has been any modification, and, if so, download the image again;

Requirements

  • Synchronization happens over the internet, so the amount of data sent over the network must be taken into account;

Technologies

The technologies I'm using are as follows:

  • The Mobile Device application runs on Android;
  • The web service that checks and returns the images to the application is written in C# (MVC Web API);

Question

One of the options I found for implementing this is hashes. So I would like to know: is generating a hash of the file on the Device and comparing it with the hash of the file on the Server efficient for this case? Or is there a better, more efficient option? (Keep in mind that the Server may receive multiple concurrent requests for hash generation; is this a lightweight operation for the Server?)

*¹ - The relevant changes are those applied to the Server folder.

  

Note: when I say "efficiency", I mean better reliability (I accept the 99.999% hash reliability quoted by @MiguelAngelo in the comments) and better performance (time and resources, whether in processing or in network traffic).
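To illustrate what I have in mind, here is a minimal sketch of the check on the Device side (the server URL, the /api/images/{name}/hash route and the choice of SHA-1 are only assumptions for the example, not a final design):

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Scanner;

public class ImageSyncCheck {

    // Computes the SHA-1 of a local file, streaming it in 8 KB blocks.
    static String sha1Of(String path) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (InputStream in = new FileInputStream(path)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Asks the Web API for the hash of the same image (hypothetical route).
    static String serverHashOf(String imageName) throws Exception {
        URL url = new URL("http://myserver/api/images/" + imageName + "/hash");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (Scanner s = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) {
            return s.nextLine().trim();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        String name = "product1.jpg";
        String local = sha1Of("/sdcard/catalog/" + name);
        String remote = serverHashOf(name);
        if (!local.equalsIgnoreCase(remote)) {
            System.out.println(name + " changed on the Server, download it again");
        } else {
            System.out.println(name + " is up to date");
        }
    }
}
```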

    
asked by anonymous 11.06.2014 / 15:59

2 answers

4

A hash basically serves to confirm the integrity of a data stream. There are several hash algorithms.

  • Collision - different files with the same hash

    It happens, usually with large files, rarely with small files. But it depends solely on the hash algorithm you use.

  • Is it lightweight?

    It depends on which hash algorithm you use. CRC32 is usually quite fast, but collisions are more frequent.

  • Solution

    If you're going to work with large files, I recommend SHA-1 or MD5, which are not too heavy, although not the lightest. If you will work with small files, between 1 kB and 10 MB, use CRC32; it has respectable performance for a server.

It's worth pointing out that these are only pointers and an introduction to your question, based on my experience. I recommend that you test for yourself, compare the results, and choose which one is best for you.
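For example, here is a minimal sketch in plain Java (the file path is just a placeholder) that computes and times CRC32 and MD5 over the same file, so you can compare them with your own images:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;
import java.util.zip.CRC32;

public class HashComparison {

    // CRC32: very fast checksum, but with a higher chance of collisions.
    static long crc32Of(String path) throws Exception {
        CRC32 crc = new CRC32();
        try (InputStream in = new FileInputStream(path)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                crc.update(buffer, 0, read);
            }
        }
        return crc.getValue();
    }

    // MD5: heavier than CRC32, but far fewer collisions in practice.
    static String md5Of(String path) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new FileInputStream(path)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        String path = "product1.jpg"; // placeholder: use one of your real images

        long start = System.nanoTime();
        long crc = crc32Of(path);
        long crcTime = System.nanoTime() - start;

        start = System.nanoTime();
        String md5 = md5Of(path);
        long md5Time = System.nanoTime() - start;

        System.out.printf("CRC32 %08x in %d ms%n", crc, crcTime / 1_000_000);
        System.out.printf("MD5   %s in %d ms%n", md5, md5Time / 1_000_000);
    }
}
```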

    
11.06.2014 / 23:05
0

First we need to understand some things. How many images will the folders contain? What are the file sizes? Hash calculation can make the server or the user's device very slow and end up causing a timeout when multiple users are connected and multiple files are being hashed, depending on where the calculation is done.

I do not recommend doing this in real time through the web service. What we can do instead is the following: create a file (JSON, XML, TXT... at your discretion) on the Server side, and whenever an image is updated, add or update its hash in that list. Your app then fetches the updated list from time to time, as mentioned, merges it with its own copy, and knows which files to download because their hashes differ from the list. I see this as a way to also reduce traffic on the network and avoid overloading both sides of the application.
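A minimal sketch of that idea, assuming a plain-text manifest with one "name;hash" line per image (the manifest format, the URL and the device folder are only illustrative choices):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ManifestSync {

    // Parses lines in the form "image-name;hash" into a map.
    static Map<String, String> parseManifest(List<String> lines) {
        Map<String, String> manifest = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(";");
            if (parts.length == 2) {
                manifest.put(parts[0].trim(), parts[1].trim());
            }
        }
        return manifest;
    }

    // Downloads the server-side manifest (illustrative URL).
    static Map<String, String> remoteManifest() throws Exception {
        List<String> lines = new ArrayList<>();
        URL url = new URL("http://myserver/api/images/manifest");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return parseManifest(lines);
    }

    public static void main(String[] args) throws Exception {
        // Manifest saved on the device after the last successful sync.
        Path localCopy = Paths.get("/sdcard/catalog/manifest.txt");
        Map<String, String> local = Files.exists(localCopy)
                ? parseManifest(Files.readAllLines(localCopy, StandardCharsets.UTF_8))
                : new HashMap<>();

        Map<String, String> remote = remoteManifest();

        // Any image that is new or whose hash changed must be downloaded again.
        List<String> toDownload = new ArrayList<>();
        for (Map.Entry<String, String> entry : remote.entrySet()) {
            String localHash = local.get(entry.getKey());
            if (localHash == null || !localHash.equalsIgnoreCase(entry.getValue())) {
                toDownload.add(entry.getKey());
            }
        }
        System.out.println("Images to download: " + toDownload);
    }
}
```

The nice part of this design is that the Server only recalculates a hash when an image actually changes, so concurrent clients just read a small list instead of triggering hash calculations on every request.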

As for the comparison between MD5, SHA-1, CRC32, SHA-256... you will only feel a difference with large files or with a large number of files.

I hope I have helped.

    
27.12.2016 / 13:55