git.vger.kernel.org archive mirror
* Getting clean diff data from git-mailinfo
@ 2020-02-21 17:14 Konstantin Ryabitsev
  2020-02-22 16:47 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Konstantin Ryabitsev @ 2020-02-21 17:14 UTC
  To: git

Hello:

Git-mailinfo is a handy utility to quickly parse the contents of a 
message containing a patch. However, I'm curious why there isn't a way 
to get just the diff data, without all the surrounding junk. E.g.:

curl https://lore.kernel.org/driverdev-devel/20200221123817.16643-1-ajay.kathat@microchip.com/raw  \
  | git mailinfo msg patch > info

The contents of "msg" are already munged to reduce it to exactly what 
would be in the commit message (properly processing the extra From: 
header), but the contents of "patch" contain all the junk from around 
the diff, like the diffstat, git version info, and the list trailer.

Is there a git-native command to further clean up the "patch" file to 
get just diff contents (i.e. as returned by "git diff" after this patch 
is applied)?

-K

* Re: Getting clean diff data from git-mailinfo
  2020-02-21 17:14 Getting clean diff data from git-mailinfo Konstantin Ryabitsev
@ 2020-02-22 16:47 ` Junio C Hamano
  2020-02-22 16:56   ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2020-02-22 16:47 UTC
  To: Konstantin Ryabitsev; +Cc: git

Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> Is there a git-native command to further clean up the "patch" file to 
> get just diff contents (i.e. as returned by "git diff" after this patch 
> is applied)?

There isn't one, as Git did not need one ;-)

The "git am" toolchain is tasked to take a reasonably formatted
e-mailed patch generated by tools other people use.  When fed a
piece of e-mail, after it was split out of a mailbox by the "git
mailsplit" program, the "git mailinfo" program is asked to

 (1) gather metainfo for author identity
 (2) gather commit log message material
 (3) collect the input for "git apply"

The e-mail header is parsed for (1) and the first line of (2).  The
e-mail body is then scanned to find the boundary between (2) and
(3).  This is done to keep cruft out of the end of (2) as much as
possible, because (2) is something a human user has to clean up
while applying, whereas (3) is processed mechanically.  For that,
the line between (2) and (3) is drawn:

 (a) at a "---\n" line, for output by "git format-patch";

 (b) at an "Index: " line, which often comes from a CVS repository;

 (c) at a "diff -" line, which can catch handmade patch e-mail using
     GNU and BSD diff.

And that is why the diffstat and the commentary to the maintainer,
which are written after the "---\n" line but before the diff, are
thrown into (3).
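
As a rough illustration of the boundary rules (a) to (c) above, the
following Python sketch splits the lines of an e-mail body into the
log message part (2) and the "git apply" input (3).  It is a
simplification written for this discussion, not the code in Git's
mailinfo; the function name split_log_and_patch is hypothetical, and
keeping the boundary line itself on the (3) side is an assumption of
the sketch.

```python
def split_log_and_patch(body_lines):
    """Split e-mail body lines into (log_message, patch_input).

    Simplified sketch of rules (a)-(c): the boundary is the first line
    that is exactly "---", or starts with "Index: ", or starts with
    "diff -".  Assumption: the boundary line goes to the patch side.
    """
    for i, line in enumerate(body_lines):
        text = line.rstrip("\n")
        if (text == "---"
                or text.startswith("Index: ")
                or text.startswith("diff -")):
            return body_lines[:i], body_lines[i:]
    # No boundary found: everything is log message, nothing to apply.
    return body_lines, []
```

With a format-patch style body, everything from the "---" line on,
including the diffstat, lands in the second list.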

Now, if "git apply" were less smart and required a pure diff without
anything else around it as its input, then we may have had split (3)
into three pieces:

 (3a) material before the pure diff (e.g. diffstat, etc.)
 (3b) pure diff
 (3c) trailing junk (e.g. base-commit info, e-mail signature, etc.)

But because "git apply" was designed to be usable on the whole of a
plain text e-mail, roughly as a "GNU diff" replacement, it does not
require (3a) and (3c) to be cleansed out of its input.

So, because there is no such need so far, there is no tool in the
Git toolbox to split (3) into three pieces.

You're welcome to write one, but the current toolset does not need
it.

Thanks.

* Re: Getting clean diff data from git-mailinfo
  2020-02-22 16:47 ` Junio C Hamano
@ 2020-02-22 16:56   ` Junio C Hamano
  2020-02-22 18:00     ` Andreas Schwab
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2020-02-22 16:56 UTC
  To: Konstantin Ryabitsev; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> ... then we may have had split (3) into three pieces:
>
>  (3a) material before the pure diff (e.g. diffstat, etc.)
>  (3b) pure diff
>  (3c) trailing junk (e.g. base-commit info, e-mail signature, etc.)
> ...
> So, because there is no such need so far, there is no tool in the
> Git toolbox to split (3) into three pieces.
>
> You're welcome to write one, but the current toolset does not need
> it.

Writing something that reads (3), discards the lines before the
first "diff --git", copies each hunk to the output while counting
its lines against the numbers on the "@@ ... @@" line, repeats the
process as long as the next line is "diff --git" (i.e. beginning of
the patch for the next path) or "@@ ... @@" (i.e. another hunk in
the patch for the current path), and discards the rest once it sees
something else may be trivial.
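
A minimal Python sketch of such a (3b) extractor follows.  It is an
illustration under stated assumptions, not an existing Git tool: it
handles only text patches in "diff --git" format (no binary
patches), it recognises per-path header lines by the prefixes listed
in HEADER_PREFIXES, and it treats an empty line inside a hunk as a
context line whose leading space was stripped.  The names
extract_pure_diff, HUNK_RE and HEADER_PREFIXES are hypothetical.

```python
import re

# "@@ -start[,count] +start[,count] @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")

# Header lines that may sit between "diff --git" and the first hunk.
HEADER_PREFIXES = (
    "index ", "--- ", "+++ ", "old mode ", "new mode ",
    "new file mode ", "deleted file mode ",
    "similarity index ", "dissimilarity index ",
    "rename from ", "rename to ", "copy from ", "copy to ",
)


def extract_pure_diff(lines):
    """Return only the diff lines of a mailinfo "patch" file."""
    out = []
    i, n = 0, len(lines)
    # Discard everything before the first "diff --git" line.
    while i < n and not lines[i].startswith("diff --git "):
        i += 1
    while i < n and lines[i].startswith("diff --git "):
        out.append(lines[i])
        i += 1
        while i < n and lines[i].startswith(HEADER_PREFIXES):
            out.append(lines[i])
            i += 1
        while i < n:
            m = HUNK_RE.match(lines[i])
            if not m:
                break
            old = int(m.group(1)) if m.group(1) is not None else 1
            new = int(m.group(2)) if m.group(2) is not None else 1
            out.append(lines[i])
            i += 1
            # Copy exactly as many lines as the hunk header promises.
            while old > 0 or new > 0:
                if i >= n:
                    raise ValueError("hunk shorter than its header claims")
                tag = lines[i][:1]
                if tag in (" ", "\n", ""):
                    old -= 1
                    new -= 1
                elif tag == "-":
                    old -= 1
                elif tag == "+":
                    new -= 1
                elif tag != "\\":  # "\ No newline at end of file"
                    raise ValueError("unexpected line in hunk: %r" % lines[i])
                if old < 0 or new < 0:
                    raise ValueError("hunk does not match its header counts")
                out.append(lines[i])
                i += 1
            if i < n and lines[i].startswith("\\"):
                out.append(lines[i])
                i += 1
    # Anything left after the last hunk is trailing junk and is dropped.
    return out
```

Because the sketch trusts the counts on the "@@ ... @@" lines, it
raises ValueError instead of guessing when a hunk body does not match
them.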

But in practice, people edit their diff [*1*], forgetting to update
the line counts on the "@@ ... @@" lines, and it helps the maintainer
to have the whole (3), not only (3b), in a single file to recover
from such a broken patch submission.

So adding another tool to produce (3b) only is fine, but an attempt
to get rid of (3) and to claim that (3b) replaces the need for (3)
is highly discouraged.

Thanks.


[Footnote]

*1* Even when people edit without changing the line numbers (imagine
    a typofix on a '+' line), I saw that "patch" mode of Emacs broke
    the line count on "@@ ...@@" line of the last hunk when the
    patch ends with certain patterns.


* Re: Getting clean diff data from git-mailinfo
  2020-02-22 16:56   ` Junio C Hamano
@ 2020-02-22 18:00     ` Andreas Schwab
  0 siblings, 0 replies; 4+ messages in thread
From: Andreas Schwab @ 2020-02-22 18:00 UTC
  To: Junio C Hamano; +Cc: Konstantin Ryabitsev, git

On Feb 22 2020, Junio C Hamano wrote:

> *1* Even when people edit without changing the line numbers (imagine
>     a typofix on a '+' line), I saw that "patch" mode of Emacs broke
>     the line count on "@@ ...@@" line of the last hunk when the
>     patch ends with certain patterns.

For example, when followed by the "-- " signature of git format-patch,
as that makes the output ambiguous.
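
To make the ambiguity concrete: the signature separator "-- " begins
with "-", so by its first character alone it cannot be told apart
from the removal of a line whose content is "- ".  The Python sketch
below shows what a tool that recomputes hunk counts purely from line
prefixes would get; it is a hypothetical illustration, not a
description of what Emacs actually does, and the name
recount_by_prefix and the version string are made up for the example.

```python
def recount_by_prefix(lines_after_header):
    """Recompute (old, new) hunk counts from line prefixes alone."""
    old = new = 0
    for line in lines_after_header:
        tag = line[:1]
        if tag == " ":
            old += 1
            new += 1
        elif tag == "-":
            old += 1
        elif tag == "+":
            new += 1
        elif tag == "\\":
            continue  # "\ No newline at end of file" is not counted
        else:
            break  # first line that cannot belong to the hunk
    return old, new


tail = [
    " int main(void)\n",
    " {\n",
    "-\treturn 1;\n",
    "+\treturn 0;\n",
    "-- \n",        # signature separator, not a removed line
    "2.25.0\n",     # illustrative version string
]
print(recount_by_prefix(tail))      # (4, 3): the signature was counted
print(recount_by_prefix(tail[:4]))  # (3, 3): the real hunk
```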

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
