Additions and Deletions per file in commit

I'm using the git log --stat <commit> command to get the additions and deletions made in a commit.

Ex:

commit 1a1a
Author: Gabriel Hardoim
Date: 2018-08-20 20:30:40

arquivo.java    | 3 +++
arquivo.css     | 3 ---

2 files changed, 3 insertions(+), 3 deletions(-)

So far so good, but when lines have been both added and removed in the same file, it displays a number of changes per file greater than the total number of changes.

Ex:

commit 1a1a
Author: Gabriel Hardoim
Date: 2018-08-20 20:30:40

arquivo.java    | 5 +----
arquivo.css     | 5 ++++-

2 files changed, 5 insertions(+), 5 deletions(-)

In such cases:

  • How do I find out how many additions / deletions have been made to each file?
  • Is there any command or flag that can help me in this regard?
  

P.S.: When the number of changes is very large, the count of + and - signs next to the file is not as clear as it is for small changes.

    
asked by anonymous 11.09.2018 / 18:42

1 answer

Let's understand why it looks like git "displays a number of changes per file greater than the total number of changes."

Let's say I have a file with 3 lines:

primeira linha do arquivo
segunda linha do arquivo
terceira linha do arquivo

Let's also assume that this is the content of the last commit (in my case, the repository has only one commit):

$ git log --oneline
792eda8 primeiro commit

Next I edit the file, changing the second line and adding a fourth line:

primeira linha do arquivo
mudando a segunda linha do arquivo, blablabla etc
terceira linha do arquivo
adicionar quarta linha

And I commit it:

$ git add arq.txt
warning: LF will be replaced by CRLF in arq.txt.

$ git commit -m "mudar arquivo"
[master d0be287] mudar arquivo
1 file changed, 2 insertions(+), 1 deletion(-)

$ git log --oneline
d0be287 mudar arquivo
792eda8 primeiro commit

If we use git log to see the changes, we get 2 insertions and 1 deletion:

$ git log --stat d0be287 -1
commit d0be287ddf28aa910d8fa9d002a609aa8056e357
Author: Fulano de Tal <[email protected]>
Date:   2018-09-12 09:07:34

    mudar arquivo

 arq.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

And if we use git diff, we can see the changes in more detail:

$ git diff 792eda8 d0be287
diff --git a/arq.txt b/arq.txt
index 7e12196..757a5bd 100644
--- a/arq.txt
+++ b/arq.txt
@@ -1,3 +1,4 @@
 primeira linha do arquivo
-segunda linha do arquivo
+mudando a segunda linha do arquivo, blablabla etc
 terceira linha do arquivo
+adicionar quarta linha

In this case, I'm looking at the difference between commits 792eda8 and d0be287.

Now notice how the change to the second line is shown. The text "segunda linha do arquivo" appears as "removed" (it has a - in front) and "mudando a segunda linha do arquivo, blablabla etc" appears as "added" (it has a + in front).

So, although the change was just "edit one line", git counts it as one insertion and one deletion.
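To make that counting rule concrete, here is a minimal sketch that counts the + and - lines inside the hunks of a unified diff. The function name count_diff_lines is hypothetical and this only approximates what git does internally; the sample input is the diff shown above.

```shell
# Count inserted and deleted lines in a unified diff read from stdin.
# count_diff_lines is a hypothetical helper, not a git command.
count_diff_lines() {
  awk '
    /^diff --git / { in_hunk = 0; next }  # a new file section starts
    /^@@/          { in_hunk = 1; next }  # hunk header: content lines follow
    in_hunk && /^\+/ { ins++ }            # line present only in the new version
    in_hunk && /^-/  { del++ }            # line present only in the old version
    END { printf "insertions=%d deletions=%d\n", ins, del }
  '
}

# The edited second line contributes one "-" and one "+";
# the new fourth line contributes one more "+".
count_diff_lines <<'EOF'
diff --git a/arq.txt b/arq.txt
index 7e12196..757a5bd 100644
--- a/arq.txt
+++ b/arq.txt
@@ -1,3 +1,4 @@
 primeira linha do arquivo
-segunda linha do arquivo
+mudando a segunda linha do arquivo, blablabla etc
 terceira linha do arquivo
+adicionar quarta linha
EOF
# prints: insertions=2 deletions=1
```

Only lines after an @@ hunk header are counted, so the "--- a/arq.txt" and "+++ b/arq.txt" file headers are not mistaken for changes.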

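The fourth line appears as added (with + in front), so the result is 2 insertions and 1 deletion. The number that --stat prints after the | is the sum of the two (here, 3), which is why it can look larger than either total. You can get a summary of this with the options --stat, --shortstat and --numstat, which show the same information in different formats: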

$ git diff 792eda8 d0be287 --stat
 arq.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

$ git diff 792eda8 d0be287 --shortstat
 1 file changed, 2 insertions(+), 1 deletion(-)

$ git diff 792eda8 d0be287 --numstat
2       1       arq.txt

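--numstat is the most script-friendly option: for each file it prints the number of insertions, the number of deletions and the file name, separated by tabs (in the case above, 2 insertions and 1 deletion). It also works with git log, so git log --numstat <commit> gives you the per-file counts directly.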
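As a rough sketch of how a script could consume that format, the function below turns --numstat lines into a per-file summary. The name numstat_summary is hypothetical, and the sample counts are my reading of the +/- graph in the question's second example, not real output.

```shell
# Summarize `git diff --numstat` or `git log --numstat` output read from stdin.
# Each data line is: <insertions> TAB <deletions> TAB <path>
# numstat_summary is a hypothetical helper name.
numstat_summary() {
  awk -F '\t' '
    NF < 3    { next }                              # blank lines, commit headers
    $1 == "-" { printf "%s: binary\n", $3; next }   # git prints "-" for binary files
    { printf "%s: +%d -%d (total %d)\n", $3, $1, $2, $1 + $2 }
  '
}

# Assumed counts for the two files in the question (1+4 and 4+1).
printf '1\t4\tarquivo.java\n4\t1\tarquivo.css\n' | numstat_summary
# prints:
# arquivo.java: +1 -4 (total 5)
# arquivo.css: +4 -1 (total 5)
```

Inside a repository you would feed it real data, for example git log --numstat -1 <commit> | numstat_summary. Header lines without tabs are skipped, although a commit message that itself contains tabs could confuse this simple filter.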

But if you do not want a change to a single line to be counted twice (as an insertion and a deletion) and only want to know that the line was modified, another option is to use --word-diff, which shows the differences word by word (where "words" are delimited by whitespace):

$ git diff 792eda8 d0be287 --word-diff
diff --git a/arq.txt b/arq.txt
index 7e12196..757a5bd 100644
--- a/arq.txt
+++ b/arq.txt
@@ -1,3 +1,4 @@
primeira linha do arquivo
{+mudando a+} segunda linha do [-arquivo-]{+arquivo, blablabla etc+}
terceira linha do arquivo
{+adicionar quarta linha+}

Notice that the changes to the second line are now all shown on a single line, marking the words that were added and removed.

You can also use --word-diff-regex to supply a regular expression that defines what counts as a word. In your case, you could use ^.*$, which means "zero or more characters (.*) from the beginning (^) to the end ($) of the line", so the entire line is treated as a single word. The result is:

$ git diff 792eda8 d0be287 --word-diff-regex="^.*$"
diff --git a/arq.txt b/arq.txt
index 7e12196..757a5bd 100644
--- a/arq.txt
+++ b/arq.txt
@@ -1,3 +1,4 @@
primeira linha do arquivo
[-segunda linha do arquivo-]{+mudando a segunda linha do arquivo, blablabla etc+}
terceira linha do arquivo
{+adicionar quarta linha+}

The second line still appears as a deletion plus an insertion, but at least all the information is now on a single line. With this, you can use egrep (available in most Unix-like shells; grep -E is the equivalent, non-deprecated spelling) to count only these lines.

$ git diff 792eda8 d0be287 --word-diff-regex="^.*$" | egrep -e "^(\[-|\{\+)" -c
2

Here I'm only considering lines that start with [- or {+. Since the entire line is treated as a single word, any changed line is guaranteed to begin with one of these markers. Then I use the -c option to return the number of matching lines (in this case, 2). Without the -c option, the output is the changed lines themselves:

$ git diff 792eda8 d0be287 --word-diff-regex="^.*$" | egrep -e "^(\[-|\{\+)"
[-segunda linha do arquivo-]{+mudando a segunda linha do arquivo, blablabla etc+}
{+adicionar quarta linha+}

You can also change the egrep regex to select each kind of modification separately:

  • egrep -e "^\[-.*-\]\{\+" - lines that have been modified (start with [- and also have {+ )
  • egrep -e "^\[-.*-\]$" - lines that have been removed (start with [- and end with -] )
  • egrep -e "^\{\+.*\+\}$" - lines that have been added (start with {+ and end with +} )

Keep in mind that there may be false positives if any of the delimiters ([-, {+, etc.) are part of a line's own content.
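The three patterns can also be applied in one pass. This is only a sketch: classify_word_diff is a hypothetical helper name, it expects the output of git diff --word-diff-regex="^.*$" on stdin, and it has the same false-positive caveat as the egrep versions.

```shell
# Classify whole-line word-diff output read from stdin.
# classify_word_diff is a hypothetical helper name.
classify_word_diff() {
  awk '
    /^\[-.*-\]\{\+.*\+\}$/ { modified++; next }  # [-old-]{+new+}
    /^\[-.*-\]$/           { removed++;  next }  # [-old-]
    /^\{\+.*\+\}$/         { added++;    next }  # {+new+}
    END { printf "modified=%d added=%d removed=%d\n", modified, added, removed }
  '
}

# Body of the word-diff output shown earlier in this answer.
classify_word_diff <<'EOF'
primeira linha do arquivo
[-segunda linha do arquivo-]{+mudando a segunda linha do arquivo, blablabla etc+}
terceira linha do arquivo
{+adicionar quarta linha+}
EOF
# prints: modified=1 added=1 removed=0
```

The "modified" pattern is tested first because a modified line also starts with [-, so the order of the rules matters.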

You could also use git diff d0be287~, where d0be287~ means "the commit before d0be287" (for more details on this syntax, see the gitrevisions documentation). With a single commit argument, git diff compares that commit with your working tree, which matches HEAD (the tip of the branch you are currently on) only when you have no uncommitted changes.
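If you want the per-file numbers for exactly one commit, regardless of what is checked out, one option is to name both sides of the comparison explicitly. A small sketch, assuming the commit has a parent (it will fail on the very first commit); commit_file_stats is a hypothetical helper name:

```shell
# Print per-file insertions/deletions for a single commit by diffing it
# against its first parent. commit_file_stats is a hypothetical helper name.
commit_file_stats() {
  # "$1~" is the parent of the given commit; naming both sides keeps the
  # result independent of HEAD and of uncommitted changes.
  git diff --numstat "$1~" "$1" |
    awk -F '\t' '{ printf "%s: +%s -%s\n", $3, $1, $2 }'
}

# Usage, from inside the repository:
#   commit_file_stats d0be287
```

The counts are printed as strings, so binary files keep the "-" placeholders that --numstat uses for them.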

    
12.09.2018 / 14:47