All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jacob Keller <jacob.keller@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Stefan Beller <sbeller@google.com>,
	Junio C Hamano <gitster@pobox.com>,
	Git mailing list <git@vger.kernel.org>,
	Jens Lehmann <Jens.Lehmann@web.de>
Subject: Re: weird diff output?
Date: Wed, 6 Apr 2016 10:47:31 -0700	[thread overview]
Message-ID: <CA+P7+xpX_xR9wVdRPgymXe0wRjDY2USRx2PyWJMKTjAepWpP+A@mail.gmail.com> (raw)
In-Reply-To: <20160331134750.GA29790@sigill.intra.peff.net>

On Thu, Mar 31, 2016 at 6:47 AM, Jeff King <peff@peff.net> wrote:
> On Wed, Mar 30, 2016 at 12:31:30PM -0700, Jacob Keller wrote:
>
>> So far I've only found a single location that ends up looking worse
>> within the Linux kernel. Diffs of some Kbuild settings result in
>> something like
>>
>> before:
>>
>>           If unsure, say Y.
>> +
>> +config RMI4_I2C
>> +       tristate "RMI4 I2C Support"
>> +       depends on RMI4_CORE && I2C
>> +       help
>> +         Say Y here if you want to support RMI4 devices connected to an I2C
>> +         bus.
>> +
>> +         If unsure, say Y.
>>
>> after:
>>
>>           required for all RMI4 device support.
>>
>> +         If unsure, say Y.
>> +
>> +config RMI4_I2C
>> +       tristate "RMI4 I2C Support"
>> +       depends on RMI4_CORE && I2C
>> +       help
>> +         Say Y here if you want to support RMI4 devices connected to an I2C
>> +         bus.
>> +
>>           If unsure, say Y.
>>
>> So in this particular instance which has multiple blank lines and is a
>> similar issue as with Stefan's note above, this is where the heuristic
>> falls apart. At least for C code this is basically vanishingly small
>> compared to the number of comment header fix ups.
>>
>> I think it may be that Stefan's suggestions above may be on the right
>> track to resolve that too.
>
> This is a tricky one. There _aren't_ actually multiple blank lines in
> the ambiguous area, because this particular example comes at the very
> end of the file. Try:
>
>   git show 8d99758dee3 drivers/input/rmi4/Kconfig
>
> which adds a block in the middle of the file. It looks good both before
> and after running through the script. Now look at this example:
>
>   git show fdf51604f10 drivers/input/rmi4/Kconfig
>
> which looks like:
>
> diff --git a/drivers/input/rmi4/Kconfig b/drivers/input/rmi4/Kconfig
> index 5ea60e3..cc3f7c5 100644
> --- a/drivers/input/rmi4/Kconfig
> +++ b/drivers/input/rmi4/Kconfig
> @@ -8,3 +8,12 @@ config RMI4_CORE
>           required for all RMI4 device support.
>
>           If unsure, say Y.
> +
> +config RMI4_I2C
> +       tristate "RMI4 I2C Support"
> +       depends on RMI4_CORE && I2C
> +       help
> +         Say Y here if you want to support RMI4 devices connected to an I2C
> +         bus.
> +
> +         If unsure, say Y.
>
>
> Note that there is no trailing context, as we're adding at the end of
> the file. So the ambiguous portion consists of only two lines: an empty
> line, and "If unsure...". And we bump the latter to the top, per the
> heuristic (it's the exact opposite of every other case, where the blank
> line is a true delimiter).
>
> As a human, I think the indentation here is the real syntactic clue. But
> getting into indentation heuristics is probably insane.
>
> The script could probably make this work by disabling itself if the hunk
> is at the end of the diffed file (i.e., we don't see more context lines
> afterward). That covers any case like this where newline _is_ a
> delimiter, but we just have some internal newlines, too. It wouldn't
> cover the case where we had internal newlines but used some other
> paragraph delimiter, but based on the results so far, that seems rather
> rare.
>
> Something like this:
>
> --- foo.pl.orig 2016-03-31 09:44:44.281232230 -0400
> +++ foo.pl      2016-03-31 09:44:34.901232632 -0400
> @@ -24,13 +24,15 @@
>        push @hunk, $_;
>        $state = STATE_IN_CHUNK;
>      } else {
> -      flush_hunk();
> +      print @hunk;
> +      @hunk = ();
>        $state = STATE_NONE;
>        print;
>      }
>    }
>  }
> -flush_hunk();
> +print @hunk;
> +@hunk = ();
>
>  sub flush_hunk {
>    my $context_len = 0;
>
> -Peff

I started attempting to implement this heuristic within xdiff, but I
am at a loss as to how xdiff actually works. I suspect this would go
in xdi_change_compact or after it, but I really don't understand how
xdiff represents the diffs at all...

Thanks,
Jake

  reply	other threads:[~2016-04-06 17:48 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-29  0:26 weird diff output? Jacob Keller
2016-03-29 17:37 ` Stefan Beller
2016-03-29 17:54   ` Junio C Hamano
2016-03-29 18:16     ` Stefan Beller
2016-03-29 23:05       ` Jacob Keller
2016-03-30  0:04         ` Junio C Hamano
2016-03-30  4:55         ` Jeff King
2016-03-30  6:05           ` Stefan Beller
2016-03-30  6:05           ` Jacob Keller
2016-03-30 19:14             ` Jacob Keller
2016-03-30 19:31               ` Jacob Keller
2016-03-30 19:40                 ` Stefan Beller
2016-04-01 19:04                   ` Junio C Hamano
2016-03-31 13:47                 ` Jeff King
2016-04-06 17:47                   ` Jacob Keller [this message]
2016-04-12 19:34                     ` Stefan Beller
2016-04-14 13:56                       ` Davide Libenzi
2016-04-14 18:34                         ` Jeff King
2016-04-14 21:05                           ` Stefan Beller
2016-04-15  0:07                             ` [RFC PATCH, WAS: "weird diff output?"] Implement better chunk heuristics Stefan Beller
2016-04-15  0:26                               ` Jacob Keller
2016-04-15  0:43                                 ` Stefan Beller
2016-04-15  2:07                                   ` Jacob Keller
2016-04-15  2:09                               ` Junio C Hamano
2016-04-15  3:33                                 ` Stefan Beller
2016-04-15  0:21                             ` weird diff output? Jacob Keller
2016-04-15  2:18                             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+P7+xpX_xR9wVdRPgymXe0wRjDY2USRx2PyWJMKTjAepWpP+A@mail.gmail.com \
    --to=jacob.keller@gmail.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.