linux-kernel.vger.kernel.org archive mirror
From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nathan Chancellor <nathan@kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	KVM list <kvm@vger.kernel.org>,
	virtualization@lists.linux-foundation.org,
	Netdev <netdev@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	mie@igel.co.jp
Subject: Re: [GIT PULL] virtio: last minute fixup
Date: Wed, 11 May 2022 08:51:40 -0400
Message-ID: <20220511125140.ormw47yluv4btiey@meerkat.local>
In-Reply-To: <CAHk-=wgAk3NEJ2PHtb0jXzCUOGytiHLq=rzjkFKfpiuH-SROgA@mail.gmail.com>

On Tue, May 10, 2022 at 04:50:47PM -0700, Linus Torvalds wrote:
> > For what it's worth, as someone who is frequently tracking down and
> > reporting issues, a link to the mailing list post in the commit message
> > makes it much easier to get these reports into the right hands, as the
> > original posting is going to have all relevant parties in one location
> > and it will usually have all the context necessary to triage the
> > problem.
> 
> Honestly, I think such a thing would be trivial to automate with
> something like just a patch-id lookup, rather than a "Link:".

I'm not sure that's quite reliable, and I'm speaking from experience of
running git-patchwork-bot, which attempts to match commits to patches.
Patch-id has these important disadvantages:

1. git-patch-id can be fragile: if the maintainer makes changes such as adding
   curly braces, renaming a variable, or editing a code comment for clarity, the
   patch-id stops matching. This happens routinely with git-patchwork-bot,
   and patchwork uses an even laxer algorithm than git-patch-id. In fact, I
   had to hack git-patchwork-bot to fall back on Link: tags to match by
   message-id to address some of the maintainers' complaints.

2. git-patch-id doesn't include the author, date, or commit message, all of
   which can be important for establishing provenance and attribution, and
   their absence can confuse automation. E.g. an author submits a patch as
   part of a large series, gets told to break it apart, then submits it as
   part of a different series. Automated processes trying to match commits to
   submissions won't be able to tell which series the commit came from.
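
To illustrate both points, here is a minimal Python sketch. It is NOT the
algorithm git-patch-id or patchwork uses; toy_patch_id() is a hypothetical
stand-in that hashes only the added and removed lines of a diff with
whitespace stripped. The sample diff is made up for the example.

```python
import hashlib


def toy_patch_id(diff_text: str) -> str:
    """Toy stand-in for a patch-id (assumption: not git's real algorithm).

    Hashes only the '+' and '-' lines of a unified diff, ignoring all
    whitespace, so the author, date and commit message never enter the hash.
    """
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        # Skip file headers and hunk headers; they carry no content changes.
        if line.startswith(("+++", "---", "@@")):
            continue
        if line.startswith(("+", "-")):
            h.update("".join(line.split()).encode())
    return h.hexdigest()


# The patch as posted to the list (made-up example).
submitted = (
    "--- a/foo.c\n"
    "+++ b/foo.c\n"
    "@@ -1,3 +1,4 @@\n"
    " int f(int x)\n"
    " {\n"
    "+\tif (x < 0) return -1;\n"
    " \treturn x;\n"
    " }\n"
)

# The maintainer re-indents the line: the toy id survives this.
whitespace_only = submitted.replace("\tif (x < 0)", "        if (x  <  0)")

# The maintainer adds curly braces while applying: the toy id changes.
applied = submitted.replace("if (x < 0) return -1;", "if (x < 0) { return -1; }")

same_after_whitespace = toy_patch_id(submitted) == toy_patch_id(whitespace_only)
same_after_braces = toy_patch_id(submitted) == toy_patch_id(applied)
```

Here same_after_whitespace is True and same_after_braces is False. And
because the function only ever sees the diff, the same change posted in two
different series yields the same id, so the id alone cannot say which
submission a commit came from.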

Cregit folks (cregit.linuxsources.org) have encountered all of these and I
know from talking to them that they are quite happy to have a way to match
commit provenance to the exact messages in the archives.

> Wouldn't it be cool if you had some webby interface to just go from
> commit SHA1 to patch ID to a lore.kernel.org lookup of where said
> patch was done?

Yes, it's https://cregit.linuxsources.org/ and it's... okay. :) It certainly
doesn't manage to match all commits to patches despite having access to all of
the lore.kernel.org archives.

> My argument here really is that "find where this commit was posted" is
> 
>  (a) not generally the most interesting thing
> 
>  (b) doesn't even need that "Link:" line.
> 
> but what *is* interesting, and where the "Link:" line is very useful,
> is finding where the original problem that *caused* that patch to be
> posted in the first place.

I think the disconnect here is that you're approaching this from the
perspective of a human being, while what many want is a dumb and reliable way
to match commits to ML submissions, which will allow improving unattended
automation.

> So that whole "searching is often an option" is true for pretty much
> _any_ Link:, but I think that for the whole "original submission" it's
> so mindless and can be automated that it really doesn't add much real
> value at all.

Believe me, I've tried, and I really, really like having a fool-proof way to
match commits directly to the exact ML submissions. :( Even a 99%-reliable
fuzzy matching algorithm has a high enough failure rate to annoy maintainers
-- I have many "git-patchwork-bot missed this commit" complaints in the queue
to prove this.

I think we should simply disambiguate the trailer added by tooling like b4.
Instead of using Link:, it can go back to using Message-Id, which is already
standard with git -- it's trivial for git.kernel.org to link them to
lore.kernel.org.

Before:

    Signed-off-by: Main Tainer <main.tainer@linux.dev>
    Link: https://lore.kernel.org/r/CAHk-=wgAk3NEJ2PHtb0jXzCUOGytiHLq=rzjkFKfpiuH-SROgA@mail.gmail.com

After:

    Signed-off-by: Main Tainer <main.tainer@linux.dev>
    Message-Id: <CAHk-=wgAk3NEJ2PHtb0jXzCUOGytiHLq=rzjkFKfpiuH-SROgA@mail.gmail.com>

This would allow people to still use Link: for things like linking to actual
ML discussions. I know this pollutes commits a bit, but I would argue that
this is a worthwhile trade-off that allows us to improve our automation and
better scale maintainers.
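
As a rough sketch of how tooling could turn such a trailer into a lore link,
something like the following would do. The names TRAILER_RE and lore_links()
are hypothetical and not taken from b4 or git-patchwork-bot; the URL prefix is
the one used in the "Before" example, and the percent-encoding of unusual
characters is an assumption on my part.

```python
import re
from urllib.parse import quote

# Same prefix as the Link: trailer in the "Before" example.
LORE_BASE = "https://lore.kernel.org/r/"

# Matches a trailer line of the form "Message-Id: <id@host>".
TRAILER_RE = re.compile(
    r"^[ \t]*Message-Id:[ \t]*<([^<>\s]+)>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def lore_links(commit_message: str) -> list[str]:
    """Return one lore URL per Message-Id trailer found in the message."""
    return [
        # Leave '@' and '=' as-is; escape anything else unusual
        # (assumption about what the archive expects).
        LORE_BASE + quote(msgid, safe="@=")
        for msgid in TRAILER_RE.findall(commit_message)
    ]


commit_message = """\
virtio: example subject line

Example body text.

Signed-off-by: Main Tainer <main.tainer@linux.dev>
Message-Id: <CAHk-=wgAk3NEJ2PHtb0jXzCUOGytiHLq=rzjkFKfpiuH-SROgA@mail.gmail.com>
"""

links = lore_links(commit_message)
```

With the trailer from the "After" example, links holds the single URL shown in
the "Before" example, and a commit carrying only human-added Link: trailers
yields an empty list, which is the disambiguation I'm after.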

-K
