All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Braun <thomas.braun@virtuell-zuhause.de>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com, peff@peff.net
Subject: Re: [PATCH v1 1/2] log -G: Ignore binary files
Date: Wed, 28 Nov 2018 12:31:19 +0100 (CET)	[thread overview]
Message-ID: <1544218571.1651.1543404679825@ox.hosteurope.de> (raw)
In-Reply-To: <87k1l5zabd.fsf@evledraar.gmail.com>

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> hat am 22. November 2018 um 11:16 geschrieben:

[...]

> >
> > +test_expect_success 'log -G ignores binary files' '
> > +	rm -rf .git &&
> > +	git init &&
> > +	printf "a\0b" >data.bin &&
> > +	git add data.bin &&
> > +	git commit -m "message" &&
> > +	git log -G a >result &&
> 
> Would be less confusing as "-Ga" since that's the invocation we
> document, even though I see (but wasn't aware that...) "-G a" works too.

Done.

> > +	test_must_be_empty result
> > +'
> > +
> > +test_expect_success 'log -G looks into binary files with textconv filter' '
> > +	rm -rf .git &&
> > +	git init &&
> > +	echo "* diff=bin" > .gitattributes &&
> > +	printf "a\0b" >data.bin &&
> > +	git add data.bin &&
> > +	git commit -m "message" &&
> > +	git -c diff.bin.textconv=cat log -G a >actual &&
> > +	git log >expected &&
> > +	test_cmp actual expected
> > +'
> > +
> >  test_done
> 
> This patch seems like the wrong direction to me. In particular the
> assertion that "the concept of differences only makes sense for text
> files". That's just not true. This patch breaks this:
> 
>     (
>         rm -rf /tmp/g-test &&
>         git init /tmp/g-test &&
>         cd /tmp/g-test &&
>         for i in {1..10}; do
>             echo "Always matching thensome 5" >file &&
>             printf "a thensome %d binary \0" $i >>file &&
>             git add file &&
>             git commit -m"Bump $i"
>         done &&
>         git log -Gthensome.*5
>     )
> 
> Right now this will emit 3/10 patches, and the right ones! I.e. "Bump
> [156]". The 1st one because it introduces the "Always matching thensome
> 5". Then 5/6 because the add/remove the string "a thensome 5 binary",
> respectively. Which matches /thensome.*5/.

log -p does not show you the patch text in your example because it is treated
as binary. And currently "log -G" has a different opinion into what it looks
and what it ignores. My patch tries to bring both more in line.
 
> I.e. in the first one we do a regex match against the content here
> because we don't have both sides:
> https://github.com/git/git/blob/v2.19.2/diffcore-pickaxe.c#L48-L53
> 
> And then for the later ones where we have both sides we end up in
> diffgrep_consume():
> https://github.com/git/git/blob/v2.19.2/diffcore-pickaxe.c#L27-L36
> 
> I think there may be a real issue here to address, which might be some
> combination of:
> 
>  a) Even though the diffcore can do a binary diff internally, this is
>     not what it exposes with "-p", we just say "Binary files differ".
> 
>     I don't know how to emit the raw version we'll end up passing to
>     diffgrep_consume() in this case. Is it just --binary without the
>     encoding? I don't know...
> 
>  b) Your test case shows that you're matching a string at a \0
>     boundary. Is this perhaps something you ran into? I.e. that we don't
>     have some -F version of -G so we can't supply regexes that match
>     past a \0? I had some related work on grep for this that hasn't been
>     carried over to the diffcore:
> 
>         git log --grep='grep:.*\\0' --author=Ævar
> 
>  c) Is this binary diff we end up matching against just bad in some
>     cases? I haven't dug but that wouldn't surprise me, i.e. that it's
>     trying to be line-based so we'll overmatch in many cases.
> 
> So maybe this is something that should be passed down as a flag? See a
> recent discussion at
> https://public-inbox.org/git/87lg77cmr1.fsf@evledraar.gmail.com/ for how
> that could be done.

It is not about the \0 boundary. v2 of the patches will clarify that. My main
motiviation is to speed up "log -G" as that takes a considerable amount of time 
when it wades through MBs of binary files which change often. And in multiple places
I can already treat binary files differently (e.g. turn off delta compression, skip
trying to diff them, no EOL normalization). And for me making log -G ignore what git 
thinks are binary files is making the line clearer between what should be treated as binary
and what as text.

> Also if we don't have some tests already that were failing with this
> patch we really should have those as "let's test the current behavior
> first". Unfortunately tests in this area are really lacking, see
> e.g. my:
> 
>     git log --author=Junio --min-parents=2 --grep=ab/.*grep
> 
> For some series of patches to grep where to get one patch in I needed to
> often lead with 5-10 test patches to convince reviewers that I knew what
> I was changing, and also to be comfortable that I'd covered all the edge
> cases we currently supported, but weren't testing for.

I'm happy to add more test cases to convince everyone involved :)

  parent reply	other threads:[~2018-11-28 11:31 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-21 20:52 [PATCH 0/2] Teach log -G to ignore binary files Thomas Braun
2018-11-21 20:52 ` [PATCH v1 1/2] log -G: Ignore " Thomas Braun
2018-11-21 20:52   ` [PATCH v1 2/2] log -S: Add test which searches in " Thomas Braun
2018-11-21 21:00     ` [PATCH 0/2] Teach log -G to ignore " Thomas Braun
2018-11-28 11:32       ` [PATCH v2] log -G: Ignore " Thomas Braun
2018-11-28 12:54         ` Ævar Arnfjörð Bjarmason
2018-12-14 18:44           ` Thomas Braun
2018-11-29  7:10         ` Junio C Hamano
2018-11-29  7:22           ` Junio C Hamano
2018-12-14 18:45             ` Thomas Braun
2018-12-14 18:45           ` Thomas Braun
2018-12-14 18:49       ` [PATCH v3] log -G: ignore " Thomas Braun
2018-12-26 23:24         ` Junio C Hamano
2018-11-22  1:34     ` [PATCH v1 2/2] log -S: Add test which searches in " Junio C Hamano
2018-11-28 11:31       ` Thomas Braun
2018-11-22  9:14     ` Ævar Arnfjörð Bjarmason
2018-11-24  2:27       ` Junio C Hamano
2018-11-28 11:31       ` Thomas Braun
2018-11-22  1:29   ` [PATCH v1 1/2] log -G: Ignore " Junio C Hamano
2018-11-28 11:31     ` Thomas Braun
2018-11-22 10:16   ` Ævar Arnfjörð Bjarmason
2018-11-22 16:27     ` Jeff King
2018-11-28 11:31     ` Thomas Braun [this message]
2018-11-28 11:31     ` Thomas Braun
2018-11-22 16:20   ` Jeff King
2018-11-24  2:32     ` Junio C Hamano
2018-11-28 11:31     ` Thomas Braun
2018-11-26 20:19   ` Stefan Beller
2018-11-27  0:51     ` Junio C Hamano
2018-11-28 11:31       ` Thomas Braun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1544218571.1651.1543404679825@ox.hosteurope.de \
    --to=thomas.braun@virtuell-zuhause.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.