git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Barret Rhoden <brho@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "René Scharfe" <l.s.r@web.de>,
	git@vger.kernel.org, "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"David Kastrup" <dak@gnu.org>, "Jeff King" <peff@peff.net>,
	"Jeff Smith" <whydoubt@gmail.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Stefan Beller" <stefanbeller@gmail.com>
Subject: Re: [PATCH v2 2/3] blame: add the ability to ignore commits and their changes
Date: Fri, 1 Feb 2019 15:37:02 -0500	[thread overview]
Message-ID: <20190201153702.2d029143@gnomeregan.cam.corp.google.com> (raw)
In-Reply-To: <xmqqo987b2g5.fsf@gitster-ct.c.googlers.com>

On 2019-01-23 at 11:26 Junio C Hamano <gitster@pobox.com> wrote:
> Yeah, and if the original had two adjacent lines, and replacement
> has three adjacent lines, the algorithm would not even know if 
> 
>  - the first line in the original was split into first two in the
>    update and the second line was modified in place; or
> 
>  - the first line in the original was modified in place and the
>    second line was split into the latter two lines in the update
> 
> In short, there is no answer to "what is the corresponding line of
> this line before this commit changed it?" in general, and any
> algorithm, as long as it tries to see what was the "corresponding
> line" of the line that is blamed to a commit, would not produce
> results human readers would expect all the time.
> 
> As you said, heuristics may get us far enough to be useful, though
> ;-).

Yeah.  We can do one more thing: when we ignore a change that added
more lines than it removed, we can at least not report totally
unrelated commits.  

For example, the parent of an ignored commit has this, say at line 11:

commit-a 11) void new_func_1(void *x, void *y);
commit-b 12) void new_func_2(void *x, void *y);
commit-c 13) some_line_c
commit-d 14) some_line_d

After a commit 'X', we have:

commit-X 11) void new_func_1(void *x,
commit-X 12)                 void *y);
commit-X 13) void new_func_2(void *x,
commit-X 14)                 void *y);
commit-c 15) some_line_c
commit-d 16) some_line_d

In my existing code, if you ignore commit X, the blames look like this:

commit-a 11) void new_func_1(void *x,
commit-b 12)                 void *y);
commit-c 13) void new_func_2(void *x,
commit-d 14)                 void *y);
commit-c 15) some_line_c
commit-d 16) some_line_d

Lines 13 and 14 are blamed on the nearby commits C and D.  The reason
is the blame entry for X is four lines long, rooted at line 11, but when
we look back through the history, we'll look at the parent's image of
the file where that diff hunk is only two lines long.  The extra two
lines just blame whatever follows the diff hunk in the parent's image.
In this case, it is just the lines C and D repeated again.

I can detect this situation when we ignore the diffs from commit X.  If
X added more lines than it removed, then I only pass the number of
lines to the parent that the parent had.  The rest get their own
blame_entry, marked 'unblamable', which I'll catch when we create the
output.  The parent can't find blame for lines that don't exist in its
image of the file.  

With that change, the above example blames like this:

commit-a 11) void new_func_1(void *x,
commit-b 12)                 void *y);
00000000 13) void new_func_2(void *x,
00000000 14)                 void *y);
commit-c 15) some_line_c
commit-d 16) some_line_d

As we discussed, we still can never be certain about which commits
*should* be blamed for which lines.  (Note that line 12 was blamed on
B, though B was the commit for new_func_2(), not new_func_1()).  But I
can avoid blaming commits that just happen to be in the area and would
only be 'correct' due to dumb luck.

This also handles cases where an ignored commit only adds lines,
which is a specific case of "added more than removed."

Handling this case is a surprisingly small change.  I'll post it once I
sort out the tests and other comments on this patch series.  

For now, I went with unconditionally marking the commit as all 0s for
the case where we know it is 'wrong.'  I can do that conditionally
based on blame.markIgnoredLines if you want, though I think any commit
attributed to those lines would be misleading.

Thanks,

Barret


  reply	other threads:[~2019-02-01 20:37 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-07 21:30 [PATCH] blame: add the ability to ignore commits Barret Rhoden
2019-01-07 23:13 ` Junio C Hamano
2019-01-08 16:27   ` Barret Rhoden
2019-01-08 18:26     ` Junio C Hamano
2019-01-09 20:48       ` Barret Rhoden
2019-01-10 22:29         ` Junio C Hamano
2019-01-14 15:19           ` Barret Rhoden
2019-01-14 17:45             ` Junio C Hamano
2019-01-14 18:26               ` Randall S. Becker
2019-01-14 19:03                 ` Barret Rhoden
2019-01-15  6:08                 ` Junio C Hamano
2019-01-09 21:21       ` Stefan Beller
2019-01-08 13:12 ` Ævar Arnfjörð Bjarmason
2019-01-08 16:41   ` Barret Rhoden
2019-01-08 18:04     ` Barret Rhoden
2019-01-17 20:29 ` [PATCH v2 0/3] " Barret Rhoden
2019-01-17 20:29   ` [PATCH v2 1/3] Move init_skiplist() outside of fsck Barret Rhoden
2019-01-18  9:25     ` Johannes Schindelin
2019-01-18  9:45     ` Ævar Arnfjörð Bjarmason
2019-01-18 17:36       ` Junio C Hamano
2019-01-18 20:59         ` Johannes Schindelin
2019-01-18 21:30           ` Jeff King
2019-01-18 22:26             ` Ævar Arnfjörð Bjarmason
2019-01-22  7:12               ` Jeff King
2019-01-22  9:46                 ` Ævar Arnfjörð Bjarmason
2019-01-22 17:54                   ` Junio C Hamano
2019-01-22 18:28                   ` Jeff King
2019-01-17 20:29   ` [PATCH v2 2/3] blame: add the ability to ignore commits and their changes Barret Rhoden
2019-01-18  9:47     ` Johannes Schindelin
2019-01-18 17:33       ` Barret Rhoden
2019-01-20 18:19     ` René Scharfe
2019-01-22 15:26       ` Barret Rhoden
2019-01-22 18:14       ` Junio C Hamano
2019-01-22 19:35         ` Barret Rhoden
2019-01-23 19:26           ` Junio C Hamano
2019-02-01 20:37             ` Barret Rhoden [this message]
2019-01-17 20:29   ` [PATCH v2 3/3] blame: add a config option to mark ignored lines Barret Rhoden
2019-01-18 10:03     ` Johannes Schindelin
2019-01-18  9:52   ` [PATCH v2 0/3] blame: add the ability to ignore commits Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190201153702.2d029143@gnomeregan.cam.corp.google.com \
    --to=brho@google.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=dak@gnu.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    --cc=stefanbeller@gmail.com \
    --cc=whydoubt@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).