Handling large files with Git

Scenario

Some time ago, I tried to use Git to version some backups, mostly small files. Git behaved very well versioning them when there were not big changes from one commit to the next, but on one particular server there were large binary files that Git could not handle; I could not even commit.

Problem

Git did not behave well with these files (the errors were related to memory problems), so the real limits of handling binaries with Git remain an open question for me. Of course, handling binaries is not the purpose of Git, but the information I found back then was not sufficiently clear.

Question

  • What is the relationship between the size limit of a binary file that Git can handle and the machine's processing capacity and memory?
  • Is it safe to keep binaries in Git, even small ones, versioned across many commits?
  • What methods can we use to tune Git so that it behaves better when versioning binaries cannot be avoided?

    You can cite solutions like Git Annex or Git Bup, but only as an aside; the question is about the behavior of plain Git, without plugins or forks.

        
    asked by anonymous 22.01.2014 / 13:46

    2 answers


    The primary reason git does not handle very large files well is that it passes them through xdelta, which usually means that it tries to load the entire contents of the file into memory at once.

    If it did not, you would have to store the full contents of every revision of every file, even when you changed only a few bytes of that file. That would be terribly inefficient in terms of disk usage, and git is known for its extremely efficient repository format.
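
    To see where the memory pressure comes from, here is a small shell sketch (assuming a POSIX shell, GNU dd, and /dev/urandom; the 100 MB size and the repository name are arbitrary). Each revision of the binary is first stored as a full object, and it is the delta compression during git gc that has to load the file contents into memory:

      # create a throwaway repository containing a ~100 MB binary
      git init big-demo && cd big-demo
      dd if=/dev/urandom of=data.bin bs=1M count=100
      git add data.bin && git commit -m "first version"

      # change only a few bytes and commit again
      printf 'xxxx' | dd of=data.bin bs=1 seek=0 conv=notrunc
      git add data.bin && git commit -m "tiny change"

      # both revisions exist as full loose objects at this point
      git count-objects -v -H

      # repacking tries to delta-compress them, loading the file into memory
      git gc
      git count-objects -v -H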

    You can try to tweak these server parameters:

    [core]
      # maximum number of bytes mapped into memory from pack files at once
      packedGitLimit = 128m
      # size of each mmap window used to access pack files
      packedGitWindowSize = 128m

    [pack]
      # memory used to cache deltas while packing objects
      deltaCacheSize = 128m
      # maximum size of each generated pack file
      packSizeLimit = 128m
      # memory limit for the delta search window during repack
      windowMemory = 128m
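
    If you prefer not to edit the config file by hand, the same limits can be set with git config, run inside the repository on the server (just a sketch; the 128m values are the ones suggested above and should be adjusted to the memory actually available):

      git config core.packedGitLimit 128m
      git config core.packedGitWindowSize 128m
      git config pack.deltaCacheSize 128m
      git config pack.packSizeLimit 128m
      git config pack.windowMemory 128m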
    

    I think git-annex and this type of solution really are the best, because of the way git is built. You can work around these issues, but you will end up with a highly customized git server, and it would not work "out of the box" in other environments if you need to migrate the server.

        
    22.01.2014 / 17:35

    Git has great difficulty with large files (> 50 MB) and wastes a lot of resources on large repositories (> 10 GB).

    1) If you are running your own Git server, you will have to set a maximum size for the files in the repository (see the hook sketch after this list). On GitHub, the maximum file size is 100 MB, and at 50 MB it already gives you a warning.

    2) Git was not meant to version binary files. It is better to use rsync and copy them somewhere else (see the example after this list).

    3) There is a solution called git-annex for managing large files. Take a look at the link.
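
    Regarding item 1, this is a minimal sketch of how a size limit could be enforced on your own server, assuming a bare repository and the standard hooks/pre-receive hook; the 50 MB limit is only an example value:

      #!/bin/sh
      # pre-receive sketch: reject pushes that introduce blobs above a size limit
      MAX=$((50 * 1024 * 1024))   # example limit: 50 MB

      while read oldrev newrev refname; do
          # skip ref deletions (newrev is all zeros)
          [ "$newrev" = "0000000000000000000000000000000000000000" ] && continue

          # check every object that the push introduces
          git rev-list --objects "$newrev" --not --all |
          while read sha path; do
              if [ "$(git cat-file -t "$sha")" = "blob" ] &&
                 [ "$(git cat-file -s "$sha")" -gt "$MAX" ]; then
                  echo "Rejected: '$path' is larger than $MAX bytes" >&2
                  exit 1
              fi
          done || exit 1
      done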
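
    And for item 2, a quick example of the rsync approach; the source directory and the destination host are hypothetical placeholders:

      # copy the backups to another machine instead of committing them to git
      rsync -avz --delete /srv/backups/ user@backup-host:/srv/backups/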

        
    22.01.2014 / 15:22