All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jacob Keller <jacob.keller@gmail.com>,
	Git mailing list <git@vger.kernel.org>,
	Jens Lehmann <Jens.Lehmann@web.de>
Subject: Re: weird diff output?
Date: Tue, 29 Mar 2016 11:16:57 -0700	[thread overview]
Message-ID: <CAGZ79kZiiOgxh6vMDnaJ_b+VVGrFBfGzZukTN6OEBxUV9-2vQw@mail.gmail.com> (raw)
In-Reply-To: <xmqqzithxj8l.fsf@gitster.mtv.corp.google.com>

On Tue, Mar 29, 2016 at 10:54 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> I thought this is an optimization for C code where you have a diff like:
>>
>>     int existingStuff1(..) {
>>     ...
>>     }
>>     +
>>     + int foo(..) {
>>     +...
>>     +}
>>
>>     int existingStuff2(...) {
>>     ...
>>
>> Note that the closing '}' could be taken from the method existingStuff1 instead
>> of correctly closing foo.
>
> That is a less optimal output.  Another possible output would be
> like so:
>
>       int existingStuff1(..) {
>       ...
>       }
>
>      + int foo(..) {
>      +...
>      +}
>      +
>       int existingStuff2(...) {
>
> All three are valid output, and ...
>
>> So the correct heuristic really depends on what kind of text we
>> are diffing.
>
> ... this realization is correct.
>
> I have a feeling that any heuristic would be correct half of the
> time, including the ehuristic implemented in the current code.  The
> readers of patches have inherent bias.  They do not notice when the
> hunk is formed to match their expectation, but they will notice and
> remember when they see something less optimal.
>

We have 3 possible diffs:
1) closing brace and newline before the chunk
2) newline before, closing brace after the chunk
3) closing brace and newline after the chunk

For C code we may want to conclude that 3) is best. (appeals the bias of
most people) 2 is slightly worse, whereas 1) is absolutely worst.

Now looking at the code Jacob found strange:

>  cat > expect <<EOF
> + expected results ...
> + EOF
> +test_expect_failure  ... '
> + ...
> + '
> +
> +cat > expect <<EOF

This can be written in two ways:

1) "cat > expect <<EOF" before the diff chunk
2) "cat > expect <<EOF" after the diff chunk

We claim 1) is better than 2).
This is different from the C code as now we want to have the
same lines before not after.

To find a heuristic, which appeals both the C code
and the shell code, we could take the empty line
as a strong hint for the divider:

1) determine the amount of diff which is ambiguous, i.e. can
   go before or after the chunk.
2) Does the ambiguous part contain an empty line?
3) If not, I have no offer for you, stop.
4) divide the ambiguous chunk by the empty line,
5) put the lines *after* the empty line in front of the chunk
6) put the part before (including) the empty line after the
   chunk
7) Observe output:

>       }
>
>      + int foo(..) {
>      +...
>      +}
>      +
>       int existingStuff2(...) {

> test_expect_failure ... '
> existing test ...
> '
>
> + cat > expect <<EOF
> + expected results ...
> + EOF
> +test_expect_failure  ... '
> + ...
> + '
> +
> cat > expect <<EOF

This is what we want in both cases.
And I would argue it would appease many other kinds of text as well, because
an empty line is usually a strong indicator for any text that a
different thing comes along.
(Other programming languages, such as Java, C++ and any other C like
language behaves
that way; even when writing latex figures you'd rather want to break
at new lines?)

Thanks,
Stefan

  reply	other threads:[~2016-03-29 18:17 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-29  0:26 weird diff output? Jacob Keller
2016-03-29 17:37 ` Stefan Beller
2016-03-29 17:54   ` Junio C Hamano
2016-03-29 18:16     ` Stefan Beller [this message]
2016-03-29 23:05       ` Jacob Keller
2016-03-30  0:04         ` Junio C Hamano
2016-03-30  4:55         ` Jeff King
2016-03-30  6:05           ` Stefan Beller
2016-03-30  6:05           ` Jacob Keller
2016-03-30 19:14             ` Jacob Keller
2016-03-30 19:31               ` Jacob Keller
2016-03-30 19:40                 ` Stefan Beller
2016-04-01 19:04                   ` Junio C Hamano
2016-03-31 13:47                 ` Jeff King
2016-04-06 17:47                   ` Jacob Keller
2016-04-12 19:34                     ` Stefan Beller
2016-04-14 13:56                       ` Davide Libenzi
2016-04-14 18:34                         ` Jeff King
2016-04-14 21:05                           ` Stefan Beller
2016-04-15  0:07                             ` [RFC PATCH, WAS: "weird diff output?"] Implement better chunk heuristics Stefan Beller
2016-04-15  0:26                               ` Jacob Keller
2016-04-15  0:43                                 ` Stefan Beller
2016-04-15  2:07                                   ` Jacob Keller
2016-04-15  2:09                               ` Junio C Hamano
2016-04-15  3:33                                 ` Stefan Beller
2016-04-15  0:21                             ` weird diff output? Jacob Keller
2016-04-15  2:18                             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kZiiOgxh6vMDnaJ_b+VVGrFBfGzZukTN6OEBxUV9-2vQw@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jacob.keller@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.