git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: Philippe Blain <levraiphilippeblain@gmail.com>,
	Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH 0/3] built-in add -p: support diff-so-fancy better
Date: Wed, 24 Aug 2022 23:05:38 +0200 (CEST)	[thread overview]
Message-ID: <34qs5607-001o-0n6o-rson-9587q8r909qn@tzk.qr> (raw)
In-Reply-To: <xmqqmtbt7b5p.fsf@gitster.g>

Hi Junio & Philippe,

On Wed, 24 Aug 2022, Junio C Hamano wrote:

> Philippe Blain <levraiphilippeblain@gmail.com> writes:
>
> >>>> However, I've tried it on a more "real-life" example, and then I get:
> >>>>
> >>>>     error: mismatched output from interactive.diffFilter
> >>>>     hint: Your filter must maintain a one-to-one correspondence
> >>>>     hint: between its input and output lines.
> >>>>
> >>>> This is despite using "diff-so-fancy --patch" as interactive.diffFilter, which should
> >>>> keep the number of lines the same.
> >>>
> >>> Would you mind sharing the example with me?
> >>
> >> In trying to write a reproducer, I realize that this is in fact a separate "bug".
> >> I happened to have a submodule in the repository I was testing, and it had modified
> >> content. This is what was causing the "mismatched output" error, although I'm not
> >> sure why. If I use a pathspec that does not include the submodule when invoking
> >> 'git add -p', it works correctly.
> >>
> >> I'm a bit out of time now but I'll try to write a separate
> >> reproducer for this later today.
> >
> > So in trying to write a new reproducer, I found the cause of the
> > bug. ...

Ah. Submodules. The gift that keeps on giving.

I'm not _really_ complaining, though. It seems that with beautiful
regularity, submodules come up as crucial parts of attack vectors in
exploits that are reported to the git-security mailing list, requiring
somebody with my abilities to implement fixes. Therefore, submodules award
me with a sort of job security. :-)

> I somehow had an impression that that the C reimplementation was
> designed to be a bug-for-bug compatible faithful rewrite of the
> original scripted version,

Yes, it was designed as that, with a few things done differently, though,
such as avoiding unnecessary `git diff` invocations, and using C-native
data structures and memory management.

One unfortunate consequence of avoiding the `git diff` invocations of
`git-add--interactive.perl` that are not actually necessary is that I
missed that one of those invocations specifically ignores dirty
submodules, and that the Perl script then only processes files whose
numstat shows up in _that_ diff output.

Fortunately, the fix is easy: simply ignore dirty submodules in _all_ `git
diff` invocations of `add -p`.

I will submit the next iteration as soon as the PR builds pass.

> but looking at the way how this bug is handled, I am not sure (it looks
> as if the initial fix was to patch the code to match the symptom, not to
> make the code to behave more faithfully to the scripted version).

The built-in `add -p` does something slightly different from the Perl
version when it comes to the line ranges in hunk headers: instead of
storing stringified versions of them and parsing & re-rendering them when
the hunks are modified, the C version stores the actual numbers, adjusts
them as needed (e.g. when hunks are split), and, crucially, renders them
on the fly when displaying the hunks.

That means that also the colorized hunk headers are generated on the fly
whenever the hunks are displayed, and they are never stored in rendered
form, neither while parsing diffs nor when hunks are split or otherwise
edited. That's why the built-in `add -p` looked for a `@@ ... @@` part in
the colorized diff, so that it can display the remainder of that line
after showing the rendered line range.

And yes, this is a deviation from the bug-for-bug principle I follow for
conversions from scripted commands to built-in C code, but it was for a
good reason: memory management in C is less trivial than in Perl (where
one essentially YOLOs memory management), and I was not prepared to invent
a rope data structure just to be able to replace parts of a long string
buffer with rendered text of a different length, nor was I prepared to
call `memmove()` tons of times.

Note that the Perl version does something slightly iffy, too: if a diff
colorizer is configured, it initially shows the original hunk headers
(e.g. "@ file:16 @" in the `diff-so-fancy` case) but when the hunk is
modified in any way, it generates "@@ -1,6 +1,7 @@"-style hunk headers and
does not show the ones generated by the diff colorizer _at all_ anymore.

In this patch series, I teach the built-in `add -p` to be more lenient and
simply show the entire original hunk header after the rendered line range
if no `@@ ... @@` part was found. If that part was found, we still replace
it with the rendered line range.

Yes, it is a deviation from what the Perl version does. I consider it an
improvement, though.

> As with any big rewrite, a complete re-implementation always has risks
> to introduce unintended regressions and that is why starting with
> faithful rewrite with the staged transition plan is valuable.
>
> In this case, the staged transition plan, the step to flip the
> default without before remove the old one, is working beautifully.
> And it was only made possible with the actual users reporting issues
> before they say "the new one is broken, so let's use the knob to
> retain the old implementation".

Indeed, and thank you, Philippe, for bringing this to the Git mailing list
so that I could do something about the bug.

Ciao,
Dscho

  reply	other threads:[~2022-08-24 21:05 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-23 18:04 [PATCH 0/3] built-in add -p: support diff-so-fancy better Johannes Schindelin via GitGitGadget
2022-08-23 18:04 ` [PATCH 1/3] t3701: redefine what is "bogus" output of a diff filter Johannes Schindelin via GitGitGadget
2022-08-23 18:04 ` [PATCH 2/3] add -p: gracefully ignore unparseable hunk headers in colored diffs Johannes Schindelin via GitGitGadget
2022-08-23 18:04 ` [PATCH 3/3] add -p: handle `diff-so-fancy`'s hunk headers better Johannes Schindelin via GitGitGadget
2022-08-24  3:49 ` [PATCH 0/3] built-in add -p: support diff-so-fancy better Philippe Blain
2022-08-24  6:27   ` Johannes Schindelin
2022-08-24 13:21     ` Philippe Blain
2022-08-24 17:49       ` Philippe Blain
2022-08-24 18:24         ` Junio C Hamano
2022-08-24 21:05           ` Johannes Schindelin [this message]
2022-08-24 21:37             ` Junio C Hamano
2022-08-24 21:21 ` [PATCH v2 0/4] " Johannes Schindelin via GitGitGadget
2022-08-24 21:21   ` [PATCH v2 1/4] t3701: redefine what is "bogus" output of a diff filter Johannes Schindelin via GitGitGadget
2022-08-24 21:21   ` [PATCH v2 2/4] add -p: gracefully ignore unparseable hunk headers in colored diffs Johannes Schindelin via GitGitGadget
2022-08-29  7:56     ` Junio C Hamano
2022-08-24 21:21   ` [PATCH v2 3/4] add -p: handle `diff-so-fancy`'s hunk headers better Johannes Schindelin via GitGitGadget
2022-08-29  8:06     ` Junio C Hamano
2022-08-29 13:32       ` Johannes Schindelin
2022-08-29 17:19         ` Junio C Hamano
2022-08-30 14:14           ` Johannes Schindelin
2022-08-24 21:21   ` [PATCH v2 4/4] add -p: ignore dirty submodules Johannes Schindelin via GitGitGadget
2022-08-24 22:11   ` [PATCH v2 0/4] built-in add -p: support diff-so-fancy better Junio C Hamano
2022-08-25  0:18   ` Philippe Blain
2022-08-26 11:43     ` Johannes Schindelin
2022-08-26 23:15       ` Philippe Blain
2022-08-29 15:11   ` [PATCH v3 0/5] " Johannes Schindelin via GitGitGadget
2022-08-29 15:11     ` [PATCH v3 1/5] t3701: redefine what is "bogus" output of a diff filter Johannes Schindelin via GitGitGadget
2022-08-30 13:17       ` Phillip Wood
2022-08-30 21:36         ` Junio C Hamano
2022-08-31  9:26           ` Phillip Wood
2022-08-31 15:36             ` Jeff King
2022-08-31 15:47               ` Jeff King
2022-08-31 19:57                 ` Johannes Schindelin
2022-08-29 15:11     ` [PATCH v3 2/5] add -p: gracefully ignore unparseable hunk headers in colored diffs Johannes Schindelin via GitGitGadget
2022-08-29 15:11     ` [PATCH v3 3/5] add -p: insert space in colored hunk header as needed Johannes Schindelin via GitGitGadget
2022-08-29 15:11     ` [PATCH v3 4/5] add -p: handle `diff-so-fancy`'s hunk headers better Johannes Schindelin via GitGitGadget
2022-08-30 13:23       ` Phillip Wood
2022-08-29 15:11     ` [PATCH v3 5/5] add -p: ignore dirty submodules Johannes Schindelin via GitGitGadget
2022-08-30 13:26       ` Phillip Wood
2022-08-31 20:05         ` Johannes Schindelin
2022-08-31 20:19           ` Junio C Hamano
2022-08-31 20:38             ` Johannes Schindelin
2022-08-29 18:01     ` [PATCH v3 0/5] built-in add -p: support diff-so-fancy better Junio C Hamano
2022-08-30 14:22       ` Johannes Schindelin
2022-08-30 13:29     ` Phillip Wood
2022-08-31 20:44       ` Johannes Schindelin
2022-08-31 20:31     ` [PATCH v4 0/3] " Johannes Schindelin via GitGitGadget
2022-08-31 20:31       ` [PATCH v4 1/3] add -p: detect more mismatches between plain vs colored diffs Johannes Schindelin via GitGitGadget
2022-09-01 13:19         ` Phillip Wood
2022-08-31 20:31       ` [PATCH v4 2/3] add -p: gracefully handle unparseable hunk headers in " Johannes Schindelin via GitGitGadget
2022-09-01 13:53         ` Phillip Wood
2022-09-01 15:09           ` Johannes Schindelin
2022-08-31 20:31       ` [PATCH v4 3/3] add -p: ignore dirty submodules Johannes Schindelin via GitGitGadget
2022-09-01 15:45         ` Jeff King
2022-09-01 15:49           ` Jeff King
2022-09-01 16:17         ` Junio C Hamano
2022-09-02  8:53           ` Johannes Schindelin
2022-09-01 13:55       ` [PATCH v4 0/3] built-in add -p: support diff-so-fancy better Phillip Wood
2022-09-01 16:19         ` Junio C Hamano
2022-09-01 15:42       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
2022-09-01 15:42         ` [PATCH v5 1/3] add -p: detect more mismatches between plain vs colored diffs Johannes Schindelin via GitGitGadget
2022-09-01 15:42         ` [PATCH v5 2/3] add -p: gracefully handle unparseable hunk headers in " Johannes Schindelin via GitGitGadget
2022-09-01 16:03           ` Phillip Wood
2022-09-01 15:42         ` [PATCH v5 3/3] add -p: ignore dirty submodules Johannes Schindelin via GitGitGadget
2022-09-01 16:55           ` Junio C Hamano
2022-09-01 16:04         ` [PATCH v5 0/3] built-in add -p: support diff-so-fancy better Phillip Wood
2022-09-01 16:54           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=34qs5607-001o-0n6o-rson-9587q8r909qn@tzk.qr \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=levraiphilippeblain@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).