All of lore.kernel.org
 help / color / mirror / Atom feed
From: Barret Rhoden <brho@google.com>
To: Harrison McCullough <mccullough.harrison@gmail.com>
Cc: "René Scharfe" <l.s.r@web.de>, git@vger.kernel.org
Subject: Re: git blame --ignore-rev does not work
Date: Fri, 2 Oct 2020 20:56:27 -0400	[thread overview]
Message-ID: <c77eb54a-ecfd-508b-ed6a-030e66d8e257@google.com> (raw)
In-Reply-To: <CAHLeu+zSaTwPEDQ=CuFua0NdEppM+OjFaREU+Yiy9udK1OUK4w@mail.gmail.com>

Hi -

On 10/2/20 6:52 PM, Harrison McCullough wrote:
> Thank you for your feedback! I do have some more information to
> provide that is confusing.
> 
> I tried running `git blame -w`, and this correctly ignores the
> revision I tried to ignore with `--ignore-rev`, etc. So it appears
> that the algorithm to attribute lines to commits is capable of
> ignoring the commit in question (in the lines I've inspected) but it's
> not doing it when I use the "ignore-rev" capability—only the "ignore
> whitespace changes" capability.
> 
> Does anyone have any ideas about why that may be the case? Does the
> "ignore whitespace" and "ignore commit" algorithms use different
> logic? I would have assumed that they shared most of the logic.

Yeah, the logic is a little different.  IIRC, the "whitespace ignore" 
affects individual diff_hunks that are generated during the blame 
process, generated by calling into xdiff.  That's when blame tries and 
figure out what lines change from target/child to parent.  So the "blame 
chunk" that gets analyzed is different, depending on whitespace-ignore 
or not.

The "blame ignore" logic happens after that, and it operates on the 
output of xdiff.  If the 'target' is one of the ignored commits, it 
looks at those chunks produced by xdiff (differences from target commit 
to parent) and for each line in the target commit, attempts to figure 
out what line in the parent to match it to.

The matching process is imperfect.  It uses a fingerprinting algorithm 
to find a likely candidate line in the parent's diff hunk.  The 
fingerprinting is based on matching the two-character pairs in each 
string.  (I didn't write the fingerprinting, btw).  It does some ranking 
of the lines, picks the best one, and a few other 'search quality' 
things.  etc.  If it fails to find a matching line in the hunk, it'll do 
a simple O(n) scan in the entire file.  But when it looks in the entire 
file, it has a threshold of similarity, so you don't match arbitrary 
things.  (Threshold is 10 two-letter pairs, I think).

Anyway, my guess is that when you use the ignore-whitespace option, it 
changes the diff hunks enough that blame_chunk() gives a different 
result.  This could be because there are larger diff hunks (more search 
space for the "good" initial scan).  Or maybe because you have a 
different set of two-character pairs or something.

> I would love to provide a concrete example, but the only time I've
> been able to reproduce this is with proprietary code. I'll try to
> create a new repository with a similar commit and see if I can ignore
> it there.

I'd be glad to take a look, though I don't know how fixable it is. 
My guess is you're hitting right on the edge of the "find a similar line 
inside a hunk" and "couldn't find a good change in the entire file" 
threshold, and any change would just be tuning it for your repo.  But 
maybe not, and I'd like for --ignore-rev to work for you.

> For the information of those interested, the commit I'm trying to
> ignore is a "reformat the world" commit. We introduced the tool
> "astyle" into our codebase, and as part of that effort I ran astyle
> over our entire codebase.

Same situation for me - reformatted a lot of code and didn't want to 
break blame.  =)  That was the original motivation.  I've actually used 
it a lot for other things now, such as finding out which commit changed 
a line in a particular way.  git blame, look around.  git show XXX, 
realize i want and older commit, git blame --ignore-rev XXX, etc.

> Is it possible that the commit isn't being ignored because it's too
> big? It did change over 1300 files....

That should be OK.  The algorithm doesn't care about the other files in 
the commit - only the one that you are blaming.

Thanks,

Barret


> 
> On Fri, Oct 2, 2020 at 4:44 PM Barret Rhoden <brho@google.com> wrote:
>>
>> Hi -
>>
>> On 10/2/20 5:40 PM, René Scharfe wrote:
>> [snip]
>>> I don't know if these revisions are not ignored due to bugs or because
>>> the feature just isn't strong enough, yet, but I would expect your
>>> particular case to be represented by at least one of these...
>>
>> Correct.
>>
>> When skipping a revision, the algorithm attempts to find another
>> revision that could be responsible for the change.  But it might not be
>> able to find anything.  Consider a commit that just adds a few lines to
>> a file with only 'foo' and 'bar':
>>
>> commit: "Adding Lines"
>> -------------
>>    foo
>> +No commit
>> +ever touched
>> +these lines
>>    bar
>>
>> If we ignored that revision, which commit do we assign those lines to?
>> If they were "similar" to the existing lines, then the algorithm might
>> match.  But in general, we can't find 'correct' (as defined by a user)
>> matches for arbitrary changes.
>>
>> I usually run git with these settings:
>>
>> [blame]
>>           ignorerevsfile = .git-blame-ignore-revs
>>           markIgnoredLines = true
>>           markUnblamableLines = true
>>
>> Which points out when --ignore-revs is doing something.
>>
>> Thanks,
>>
>> Barret
>>
>>
> 
> 


      reply	other threads:[~2020-10-03  0:56 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-30 21:15 git blame --ignore-rev does not work Harrison McCullough
2020-10-02 21:40 ` René Scharfe
2020-10-02 22:44   ` Barret Rhoden
2020-10-02 22:52     ` Harrison McCullough
2020-10-03  0:56       ` Barret Rhoden [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c77eb54a-ecfd-508b-ed6a-030e66d8e257@google.com \
    --to=brho@google.com \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    --cc=mccullough.harrison@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.