All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC: Github PR bot questions
@ 2021-06-16 17:18 Konstantin Ryabitsev
  2021-06-16 17:24 ` Drew DeVault
                   ` (10 more replies)
  0 siblings, 11 replies; 53+ messages in thread
From: Konstantin Ryabitsev @ 2021-06-16 17:18 UTC (permalink / raw)
  To: users, workflows

Hi, all:

I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
pull requests on a project and convert them into fully well-formed patch
series. This would be a one-way operation, effectively turning Github into a
fancy "git-send-email" replacement. That said, it would have the following
benefits for both submitters and maintainers:

- submitters would no longer need to navigate their way around
  git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
  have a patch-friendly outgoing mail gateway to properly contribute patches
- subsystem maintainers can configure whatever CI pre-checks they want before
  the series is sent to them for review (and we can work on a library of
  Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
- the bot should (eventually) be clever enough to automatically track v1..vX
  on pull request updates, assuming the API makes it straightforward

A this point, I need your input to make sure I'm not going down any wrong
paths:

- My general assumption is that putting this bot on github.com/torvalds/linux
  would not be useful, as this will probably result in more noise than signal.
  I expect that subsystem maintainers would prefer to configure their own
  GitHub projects so they can have full control on what kind of CI prechecks
  must succeed before the series is sent out. Is that a valid assumption, or
  should I be working towards having a single point of submission on each
  forge platform (Github, Gitlab, etc)?

- We can *probably* track when patch series get applied and auto-close pull
  requests that are accepted -- but it's not going to be perfect (we'd
  basically be using git-patch-id to match commits to pull requests). Or is it
  better to auto-close the pull request right after it's sent to the list with
  a message like "thank you, please monitor your email for the rest of the
  process"? The latter is much easier for me, of course. :)

I'll probably have more questions as I go along, but I wanted to start with
these two.

Thanks,
-K

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:18 RFC: Github PR bot questions Konstantin Ryabitsev
@ 2021-06-16 17:24 ` Drew DeVault
  2021-06-16 17:47 ` Johannes Berg
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 53+ messages in thread
From: Drew DeVault @ 2021-06-16 17:24 UTC (permalink / raw)
  To: Konstantin Ryabitsev, users, workflows

FYI I have some prior art doing this in the reverse:

https://lists.alpinelinux.org/~alpine/aports/patches/3527
https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/21708

via dispatch.sr.ht:

https://git.sr.ht/~sircmpwn/dispatch.sr.ht

I would be quite happy to see support for doing this the other way
around, too.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:18 RFC: Github PR bot questions Konstantin Ryabitsev
  2021-06-16 17:24 ` Drew DeVault
@ 2021-06-16 17:47 ` Johannes Berg
  2021-06-16 17:55   ` Konstantin Ryabitsev
       [not found] ` <CAJhbpm_BgbSx581HU0mTCkcE28n_hRx=tv74az_mE2VBmPtrVA@mail.gmail.com>
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 53+ messages in thread
From: Johannes Berg @ 2021-06-16 17:47 UTC (permalink / raw)
  To: Konstantin Ryabitsev, users, workflows

On Wed, 2021-06-16 at 13:18 -0400, Konstantin Ryabitsev wrote:
> 
> - the bot should (eventually) be clever enough to automatically track v1..vX
>   on pull request updates, assuming the API makes it straightforward


> - We can *probably* track when patch series get applied and auto-close pull
>   requests that are accepted -- but it's not going to be perfect (we'd
>   basically be using git-patch-id to match commits to pull requests). Or is it
>   better to auto-close the pull request right after it's sent to the list with
>   a message like "thank you, please monitor your email for the rest of the
>   process"? The latter is much easier for me, of course. :)

Those two conflict, no? I mean, if you close it, then there won't really
be a v2/vX?

Given that many (if not most) patch series require a v2 or more, it
would seem closing it immediately would be a bit confusing, and then
what do you even do to get a v2?

johannes


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:47 ` Johannes Berg
@ 2021-06-16 17:55   ` Konstantin Ryabitsev
  2021-06-16 18:13     ` Miguel Ojeda
  0 siblings, 1 reply; 53+ messages in thread
From: Konstantin Ryabitsev @ 2021-06-16 17:55 UTC (permalink / raw)
  To: Johannes Berg; +Cc: users, workflows

On Wed, Jun 16, 2021 at 07:47:29PM +0200, Johannes Berg wrote:
> > - We can *probably* track when patch series get applied and auto-close pull
> >   requests that are accepted -- but it's not going to be perfect (we'd
> >   basically be using git-patch-id to match commits to pull requests). Or is it
> >   better to auto-close the pull request right after it's sent to the list with
> >   a message like "thank you, please monitor your email for the rest of the
> >   process"? The latter is much easier for me, of course. :)
> 
> Those two conflict, no? I mean, if you close it, then there won't really
> be a v2/vX?
> 
> Given that many (if not most) patch series require a v2 or more, it
> would seem closing it immediately would be a bit confusing, and then
> what do you even do to get a v2?

Unless I'm misreading the docs, the submitter should be able to reopen the
closed pull request, so it's not a permanent action. This may actually make it
easier to track things for v1/v2, since we wouldn't need to continuously poll
open pull requests to see if any of them changed.

-K

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
       [not found] ` <CAJhbpm_BgbSx581HU0mTCkcE28n_hRx=tv74az_mE2VBmPtrVA@mail.gmail.com>
@ 2021-06-16 18:05   ` Konstantin Ryabitsev
  0 siblings, 0 replies; 53+ messages in thread
From: Konstantin Ryabitsev @ 2021-06-16 18:05 UTC (permalink / raw)
  To: Ido Rosen; +Cc: users, workflows

On Wed, Jun 16, 2021 at 10:48:59AM -0700, Ido Rosen wrote:
> I’d suggest limiting the feature set to automatically responding to GitHub
> PRs with a link to a tutorial on how to contribute patches etc.  Doing too
> much for them may create more harm than good in the long run because it
> removes the incentive to learn the contribution process.

I do understand that sentiment, to a degree; in fact, I often compare our
process to submitting a paper to a peer-reviewed journal. You can't just email
your napkin thoughts -- it must be formatted in a very specific way, list its
references in a very specific way etc, etc.

However, that is not a perfect analogy. Our code review process must also
allow for what is effectively a "report a typo" link. Currently, this is
extremely onerous for anyone, as a 15-minute affair suddenly becomes a
herculean effort. The goal of this work is to make drive-by patches easier
without also burying maintainers under a pile of junk submissions.

> An exception to this would be documentation patches and typo fixes: the
> GitHub web editor makes it very easy to edit documentation and other
> changes that don’t require compilation to test (e.g. spelling errors in
> comments, man pages, txt files, etc.).

Or it could be a tiny change in the driver string output, so I'm not sure it
makes sense to limit this to just docs (especially with many docs being
sourced straight from .c files).

-K

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:18 RFC: Github PR bot questions Konstantin Ryabitsev
                   ` (2 preceding siblings ...)
       [not found] ` <CAJhbpm_BgbSx581HU0mTCkcE28n_hRx=tv74az_mE2VBmPtrVA@mail.gmail.com>
@ 2021-06-16 18:11 ` Miguel Ojeda
  2021-06-16 18:22   ` Konstantin Ryabitsev
  2021-06-16 20:10 ` Willy Tarreau
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 53+ messages in thread
From: Miguel Ojeda @ 2021-06-16 18:11 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, workflows

Hi Konstantin,

On Wed, Jun 16, 2021 at 7:18 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> pull requests on a project and convert them into fully well-formed patch
> series. This would be a one-way operation, effectively turning Github into a
> fancy "git-send-email" replacement. That said, it would have the following
> benefits for both submitters and maintainers:

For Rust for Linux, I have a GitHub bot that reviews PRs and spots the
usual mistakes in commit messages (tags, formatting, lkml vs. lore
links, that sort of thing). It has been very effective so far to teach
newcomers how to follow the kernel development process.

I am also extending it to take Acks, Reviewed-by's, Tested-by's, etc.,
and then performing the merge only if the CI passes (which includes
running tests under QEMU, code formatting, lints, etc.) after applying
each patch.

So, in a way, I am trying to go a bit further by bringing the kernel
development workflow into GitHub (and similar services), rather than
keeping the ML as the main place for code review etc. I know it can be
controversial, but hey, I want to try. :)

I also thought about having the bot parse the ML and create PRs
automatically from that (i.e. the reverse direction from what you want
to achieve), and I thought about reusing some of your `b4` code for
that, but it will depend on the amount of patches we receive in the ML
etc. One could even make it bi-directional to some degree, but I am
not sure it is worth the complexity.

By the way, the gccrs project (also in GitHub, but also following
GCC's workflow which involves MLs) also wanted something like ML-to-PR
direction, and I suggested to them taking a look at `b4` too.

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:55   ` Konstantin Ryabitsev
@ 2021-06-16 18:13     ` Miguel Ojeda
  2021-06-17 17:07       ` Serge E. Hallyn
  0 siblings, 1 reply; 53+ messages in thread
From: Miguel Ojeda @ 2021-06-16 18:13 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Johannes Berg, users, workflows

On Wed, Jun 16, 2021 at 7:55 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> Unless I'm misreading the docs, the submitter should be able to reopen the
> closed pull request, so it's not a permanent action. This may actually make it
> easier to track things for v1/v2, since we wouldn't need to continuously poll
> open pull requests to see if any of them changed.

For versioning, what I request at the moment is that people force-push
their PR, overwriting the previous set of commits, rather than sending
a new PR. Then people write in the GitHub-PR-message (which we use as
the cover letter) the diff/updates, if any. The bot detects changes
and pushes a new review every time, and hides its own past messages
(i.e. for previous versions).

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 18:11 ` Miguel Ojeda
@ 2021-06-16 18:22   ` Konstantin Ryabitsev
  2021-06-16 18:38     ` Miguel Ojeda
  0 siblings, 1 reply; 53+ messages in thread
From: Konstantin Ryabitsev @ 2021-06-16 18:22 UTC (permalink / raw)
  To: Miguel Ojeda; +Cc: users, workflows

On Wed, Jun 16, 2021 at 08:11:23PM +0200, Miguel Ojeda wrote:
> So, in a way, I am trying to go a bit further by bringing the kernel
> development workflow into GitHub (and similar services), rather than
> keeping the ML as the main place for code review etc. I know it can be
> controversial, but hey, I want to try. :)

That's pretty cool, but I'm opposed to this on theological grounds. :)
I think there's intrinsic value to development happening outside of any
proprietary environments, especially if this becomes cornerstone to how a
subsystem is maintained.

I'm not saying Github is going to go away any time, but on the other hand we
just watched Freenode completely implode within the span of 3 weeks, so I'm
extra wary of introducing any single points of value into the development
workflow. If my pr bot goes away, it'll at most reduce the convenience of the
submission process, but not hobble it in any way, so I intend to just treat
Github as a developer frontend tool.

> By the way, the gccrs project (also in GitHub, but also following
> GCC's workflow which involves MLs) also wanted something like ML-to-PR
> direction, and I suggested to them taking a look at `b4` too.

Sure, but just be wary that b4 internal structure is still under heavy
development, so I wouldn't recommend treating it as a stable library until it
reaches 1.0.

Best,
-K

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 18:22   ` Konstantin Ryabitsev
@ 2021-06-16 18:38     ` Miguel Ojeda
  0 siblings, 0 replies; 53+ messages in thread
From: Miguel Ojeda @ 2021-06-16 18:38 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, workflows

On Wed, Jun 16, 2021 at 8:22 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> That's pretty cool, but I'm opposed to this on theological grounds. :)
> I think there's intrinsic value to development happening outside of any
> proprietary environments, especially if this becomes cornerstone to how a
> subsystem is maintained.

GitHub being proprietary does not really matter -- we can migrate to
any other similar service quite easily (even a kernel.org-hosted one!,
e.g. a hosted GitLab instance).

> I'm not saying Github is going to go away any time, but on the other hand we
> just watched Freenode completely implode within the span of 3 weeks, so I'm
> extra wary of introducing any single points of value into the development
> workflow. If my pr bot goes away, it'll at most reduce the convenience of the
> submission process, but not hobble it in any way, so I intend to just treat
> Github as a developer frontend tool.

Sure -- that is why the bot does not depend on any "advanced" or
GitHub-specific feature: currently it only needs to read
commits/PR/messages and post messages (i.e. the same as your bot),
which is basic functionality that any GitHub/GitLab/...-like service
must provide.

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:18 RFC: Github PR bot questions Konstantin Ryabitsev
                   ` (3 preceding siblings ...)
  2021-06-16 18:11 ` Miguel Ojeda
@ 2021-06-16 20:10 ` Willy Tarreau
  2021-06-17 15:11   ` Konstantin Ryabitsev
  2021-06-16 20:24 ` Linus Torvalds
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 53+ messages in thread
From: Willy Tarreau @ 2021-06-16 20:10 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, workflows

Hi Konstantin,

On Wed, Jun 16, 2021 at 01:18:13PM -0400, Konstantin Ryabitsev wrote:
> - We can *probably* track when patch series get applied and auto-close pull
>   requests that are accepted

In haproxy we have a bot that forwards PRs to the mailing list. It
CCs the author and sends a gentle message saying that reviews are
made in public using plain-text e-mails, and that for this reason
the PR is automatically closed. And I think it's easier for both
sides, because users who are used to PRs would probably count a bit
too much on the "github mode", where they're certain that someone
will eventually notice that the PR count went from 1137 to 1138 and
will have a look, while it's certain that there will be quite some
losses, and by warning the submitter upfront there is less surprise.

In addition, responding to reviews over e-mail and having the submitter
do the same along the discussion basically means that everyone will
forget about closing the issue. Also one of the benefits of e-mail is
that maintainers can ultimately fix some typos in the commit message
or adjust the subject before committing, something which will quickly
confuse the bot.

For these reasons I think that instant close (with a gentle message)
will be more robust.

Just my two cents,
Willy

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:18 RFC: Github PR bot questions Konstantin Ryabitsev
                   ` (4 preceding siblings ...)
  2021-06-16 20:10 ` Willy Tarreau
@ 2021-06-16 20:24 ` Linus Torvalds
  2021-06-17 15:09   ` Konstantin Ryabitsev
  2021-06-16 21:11 ` Rob Herring
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 53+ messages in thread
From: Linus Torvalds @ 2021-06-16 20:24 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, workflows

On Wed, Jun 16, 2021 at 10:18 AM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> - My general assumption is that putting this bot on github.com/torvalds/linux
>   would not be useful, as this will probably result in more noise than signal.

Yeah, I've had to turn off all emails from github because it's too
noisy. Github also makes it very easyt to "invite" people to be team
members, whether they want to or not, so I'm a "member" of random
projects. There needs to be some way to make this _very_ much opt-in,
not "anybody can do anything whether the target is ok with it or not".

I do think that having a way to generate patch series would be good,
but I really really hope your robot would have

 (a) some limit on sizes

 (b) check that the commit messages are well-formed

I've seen a lot of github code that doesn't have the Linux kernel kind
of "good commit messages required".

And no, it's not just about proper sign-off chains. It's about things
like "proper one-line header". It's about things like "actual
whitespace and proper line breaks and indentation for quoted text
etc".

I see way too many projects that have a mess of unwrapped (or purely
auto-wrapped) wall of text for commit messages. I think some of the
"do commits in the browser" workflows seem to think it's ok to have
reflowed text and not have proper formatting of quoting (error
messages) etc.

Yes, yes, that happens in emailed patches too (I actually spend time
on emailed patches making sure 'dmesg' snippets are cleaned up etc -
not everybody does), but it happens a _lot_ in other projects that
don't have quote the same quality requirements for commit messages
that we do.

               Linus

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:18 RFC: Github PR bot questions Konstantin Ryabitsev
                   ` (5 preceding siblings ...)
  2021-06-16 20:24 ` Linus Torvalds
@ 2021-06-16 21:11 ` Rob Herring
  2021-06-16 21:18   ` Stefano Stabellini
                     ` (3 more replies)
  2021-06-17  6:37 ` Dmitry Vyukov
                   ` (3 subsequent siblings)
  10 siblings, 4 replies; 53+ messages in thread
From: Rob Herring @ 2021-06-16 21:11 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, workflows

On Wed, Jun 16, 2021 at 11:18 AM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> Hi, all:
>
> I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> pull requests on a project and convert them into fully well-formed patch
> series. This would be a one-way operation, effectively turning Github into a
> fancy "git-send-email" replacement. That said, it would have the following
> benefits for both submitters and maintainers:

What makes this specific to Github PRs? A Github PR is really just a
git branch plus a target at least to the extent we would use it here.
The more of this that works on just a git branch, the more widely
useful it would be.

> - submitters would no longer need to navigate their way around
>   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
>   have a patch-friendly outgoing mail gateway to properly contribute patches

Presumably, the bot would rely on get_maintainer.pl or it would get
who to send to based on GH repo and reviewers? Without work on
get_maintainer.pl, I don't think it will work well beyond simple
cases.

> - subsystem maintainers can configure whatever CI pre-checks they want before
>   the series is sent to them for review (and we can work on a library of
>   Github actions, so nobody needs to reimplement checkpatch.pl multiple times)

What about all the patches that don't come from the GH PR? Those need
CI pre-checks too. We're going to implement CI twice? The biggest
issue I have on CI checks is applying patches. My algorithm is apply
to my current base (last rc1 typically) or give up. I'm sure it could
be a lot smarter trying several branches or looking at base-commit
(not consistently used) or the git diff treeish hashes. What I'd
really like is some bot or script that's applying series and
publishing git branches with a messageid to git branch tool. 0-day is
doing this now. Basically, the opposite direction as others have
mentioned.

> - the bot should (eventually) be clever enough to automatically track v1..vX
>   on pull request updates, assuming the API makes it straightforward
>
> A this point, I need your input to make sure I'm not going down any wrong
> paths:
>
> - My general assumption is that putting this bot on github.com/torvalds/linux
>   would not be useful, as this will probably result in more noise than signal.
>   I expect that subsystem maintainers would prefer to configure their own
>   GitHub projects so they can have full control on what kind of CI prechecks
>   must succeed before the series is sent out. Is that a valid assumption, or
>   should I be working towards having a single point of submission on each
>   forge platform (Github, Gitlab, etc)?

I think it needs to be per maintainer in terms of what checks run, but
if submission is per maintainer project then the problem will be how
does the submitter know where to send something? get_maintainer.pl
tells them? It doesn't do a great job of that IMO. There's not a clear
distinction of who applies my patch and others Cc'ed (file
maintainers).

I've kind of reached the conclusion that relying on submitters to get
it right is never going to work (is Cc the DT list for DT patches so
PW picks them up so hard!?). I think the model needs to be send
patches to 'the kernel' and then maintainers have tools to extract all
the patches they are interested in (the planned lore
local-email-interface).

I'm sure there are maintainers who want nothing to do with Github or
anything else. So it's got to work without any maintainer involvement.
The submitter can't be expected to figure out who will and will not
take GH PR based submissions.

Rob

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 21:11 ` Rob Herring
@ 2021-06-16 21:18   ` Stefano Stabellini
  2021-06-16 21:59     ` Rob Herring
  2021-06-16 22:33   ` James Bottomley
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 53+ messages in thread
From: Stefano Stabellini @ 2021-06-16 21:18 UTC (permalink / raw)
  To: Rob Herring; +Cc: Konstantin Ryabitsev, users, workflows

On Wed, 16 Jun 2021, Rob Herring wrote:
> > - subsystem maintainers can configure whatever CI pre-checks they want before
> >   the series is sent to them for review (and we can work on a library of
> >   Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
> 
> What about all the patches that don't come from the GH PR? Those need
> CI pre-checks too. We're going to implement CI twice? The biggest
> issue I have on CI checks is applying patches. My algorithm is apply
> to my current base (last rc1 typically) or give up. I'm sure it could
> be a lot smarter trying several branches or looking at base-commit
> (not consistently used) or the git diff treeish hashes. What I'd
> really like is some bot or script that's applying series and
> publishing git branches with a messageid to git branch tool. 0-day is
> doing this now. Basically, the opposite direction as others have
> mentioned.

It exists: it is called patchew and we are using it in QEMU and in Xen
Project (https://gitlab.com/xen-project/patchew).

It takes patch series off of a mailing list and commits them to a branch
to trigger a CI-loop run. It is also able to send back a "passed" or
"failed" email to the mailing list. It is great!

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 21:18   ` Stefano Stabellini
@ 2021-06-16 21:59     ` Rob Herring
  0 siblings, 0 replies; 53+ messages in thread
From: Rob Herring @ 2021-06-16 21:59 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Konstantin Ryabitsev, users, workflows

On Wed, Jun 16, 2021 at 3:18 PM Stefano Stabellini
<sstabellini@kernel.org> wrote:
>
> On Wed, 16 Jun 2021, Rob Herring wrote:
> > > - subsystem maintainers can configure whatever CI pre-checks they want before
> > >   the series is sent to them for review (and we can work on a library of
> > >   Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
> >
> > What about all the patches that don't come from the GH PR? Those need
> > CI pre-checks too. We're going to implement CI twice? The biggest
> > issue I have on CI checks is applying patches. My algorithm is apply
> > to my current base (last rc1 typically) or give up. I'm sure it could
> > be a lot smarter trying several branches or looking at base-commit
> > (not consistently used) or the git diff treeish hashes. What I'd
> > really like is some bot or script that's applying series and
> > publishing git branches with a messageid to git branch tool. 0-day is
> > doing this now. Basically, the opposite direction as others have
> > mentioned.
>
> It exists: it is called patchew and we are using it in QEMU and in Xen
> Project (https://gitlab.com/xen-project/patchew).
>
> It takes patch series off of a mailing list and commits them to a branch
> to trigger a CI-loop run. It is also able to send back a "passed" or
> "failed" email to the mailing list. It is great!

Yes, I've seen that. To me, that's really just a patchwork
alternative, and standing up a web server is not what I want. Now
maybe if someone else does and I can fetch what I need then it would
work. Otherwise, it's just another project with patch applying code
that can't be used on its own. But I doubt the patch applying would do
what's needed for Linux. According to documentation, it says it can
'Apply the patch series on top of git master'. My CI already does
that. I need apply to this git tree containing all these subsystem
trees (i.e. linux-next) and find the right base yourself. Note that
applying to linux-next itself is wrong as that's not a tree any
maintainer can apply patches to.

At the end of the day, I just want to do:

git fetch agitserver $(msgid2branch $msgid)

or

b4 am -o $msgid | a-smart-git-am-that-finds-the-right-commit-base

Rob

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 21:11 ` Rob Herring
  2021-06-16 21:18   ` Stefano Stabellini
@ 2021-06-16 22:33   ` James Bottomley
  2021-06-17 14:18     ` Rob Herring
  2021-06-17  6:52   ` Mauro Carvalho Chehab
  2021-06-17 14:47   ` Konstantin Ryabitsev
  3 siblings, 1 reply; 53+ messages in thread
From: James Bottomley @ 2021-06-16 22:33 UTC (permalink / raw)
  To: Rob Herring, Konstantin Ryabitsev; +Cc: users, workflows

On Wed, 2021-06-16 at 15:11 -0600, Rob Herring wrote:
> > - subsystem maintainers can configure whatever CI pre-checks they
> > want before the series is sent to them for review (and we can work
> > on a library of Github actions, so nobody needs to reimplement
> > checkpatch.pl multiple times)
> 
> What about all the patches that don't come from the GH PR? Those need
> CI pre-checks too. We're going to implement CI twice? The biggest
> issue I have on CI checks is applying patches. My algorithm is apply
> to my current base (last rc1 typically) or give up. I'm sure it could
> be a lot smarter trying several branches or looking at base-commit
> (not consistently used) or the git diff treeish hashes. What I'd
> really like is some bot or script that's applying series and
> publishing git branches with a messageid to git branch tool. 0-day is
> doing this now. Basically, the opposite direction as others have
> mentioned.

I've got to say my experience with Github CIs has been pretty
unpleasant.  Pretty much every project I've ever pushed to has had at
least one commit reject because of a bug in the CI rather than the
commit which they usually dump on the submitter to fix.  As an endless
devops make work project, I'm sure they're fine, but what we have now
with 0-day is pretty much good enough for most kernel work, plus if it
goes wrong we can ignore it and somebody else fixes it ...

James



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:18 RFC: Github PR bot questions Konstantin Ryabitsev
                   ` (6 preceding siblings ...)
  2021-06-16 21:11 ` Rob Herring
@ 2021-06-17  6:37 ` Dmitry Vyukov
  2021-06-17  7:30 ` Greg KH
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 53+ messages in thread
From: Dmitry Vyukov @ 2021-06-17  6:37 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, workflows

think On Wed, Jun 16, 2021 at 7:18 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> Hi, all:
>
> I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> pull requests on a project and convert them into fully well-formed patch
> series. This would be a one-way operation, effectively turning Github into a
> fancy "git-send-email" replacement. That said, it would have the following
> benefits for both submitters and maintainers:
>
> - submitters would no longer need to navigate their way around
>   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
>   have a patch-friendly outgoing mail gateway to properly contribute patches
> - subsystem maintainers can configure whatever CI pre-checks they want before
>   the series is sent to them for review (and we can work on a library of
>   Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
> - the bot should (eventually) be clever enough to automatically track v1..vX
>   on pull request updates, assuming the API makes it straightforward
>
> A this point, I need your input to make sure I'm not going down any wrong
> paths:
>
> - My general assumption is that putting this bot on github.com/torvalds/linux
>   would not be useful, as this will probably result in more noise than signal.
>   I expect that subsystem maintainers would prefer to configure their own
>   GitHub projects so they can have full control on what kind of CI prechecks
>   must succeed before the series is sent out. Is that a valid assumption, or
>   should I be working towards having a single point of submission on each
>   forge platform (Github, Gitlab, etc)?

Hi Konstantin,

This is exciting!
I think it will be more useful in the long run to have it on a single
github repo with multiple branches (single point of submission). The
advantages I see are:
 - having single integration point with testing systems
 - no version skew, no broken deployments that need maintenance
 - no radically different configurations, these rules are like code
style (does not matter which one exactly we use as long as it's
consistent across the project)
 - much higher RoI for testing/CI/tool experts contributions (this
addresses one of the main pain points of the current kernel testing --
it's simply not possible to contribute to it. Why would I contribute
only to a single subsystem testing that runs on somebody's personal
machine which may disappear tomorrow? and how do I even choose one
subsystem if I don't have personal interest in any?)
I also assume that lots of maintainers either won't have lots of
interest in configuring/maintaining this, or will have some interest
initially but will lose it over time.
For once: it will be possible to have proper documentation on the
process (as compared to current per-subsystem rules, which are usually
not documented again because of low RoI for anything related to a
single subsystem only).

If we have a single point of submission, will it be possible to have
some per-branch/subsystem settings? If yes, that may be a good
compromise: having a well-defined set of preferences (maintainer can
choose A or B, or opt-in into a new static check or not) that are
managed centrally.



> - We can *probably* track when patch series get applied and auto-close pull
>   requests that are accepted -- but it's not going to be perfect (we'd
>   basically be using git-patch-id to match commits to pull requests). Or is it
>   better to auto-close the pull request right after it's sent to the list with
>   a message like "thank you, please monitor your email for the rest of the
>   process"? The latter is much easier for me, of course. :)
>
> I'll probably have more questions as I go along, but I wanted to start with
> these two.
>
> Thanks,
> -K

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 21:11 ` Rob Herring
  2021-06-16 21:18   ` Stefano Stabellini
  2021-06-16 22:33   ` James Bottomley
@ 2021-06-17  6:52   ` Mauro Carvalho Chehab
  2021-06-17  8:20     ` Dmitry Vyukov
  2021-06-17 14:02     ` Rob Herring
  2021-06-17 14:47   ` Konstantin Ryabitsev
  3 siblings, 2 replies; 53+ messages in thread
From: Mauro Carvalho Chehab @ 2021-06-17  6:52 UTC (permalink / raw)
  To: Rob Herring; +Cc: Konstantin Ryabitsev, users, workflows

Em Wed, 16 Jun 2021 15:11:33 -0600
Rob Herring <robh@kernel.org> escreveu:

> On Wed, Jun 16, 2021 at 11:18 AM Konstantin Ryabitsev
> <konstantin@linuxfoundation.org> wrote:
> >
> > Hi, all:
> >
> > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > pull requests on a project and convert them into fully well-formed patch
> > series. This would be a one-way operation, effectively turning Github into a
> > fancy "git-send-email" replacement. That said, it would have the following
> > benefits for both submitters and maintainers:  
> 
> What makes this specific to Github PRs? A Github PR is really just a
> git branch plus a target at least to the extent we would use it here.
> The more of this that works on just a git branch, the more widely
> useful it would be.
> 
> > - submitters would no longer need to navigate their way around
> >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> >   have a patch-friendly outgoing mail gateway to properly contribute patches  
> 
> Presumably, the bot would rely on get_maintainer.pl or it would get
> who to send to based on GH repo and reviewers? Without work on
> get_maintainer.pl, I don't think it will work well beyond simple
> cases.

Some sanity test is needed, as otherwise it will end by trying to send
the patch to a large number of people.

> 
> > - subsystem maintainers can configure whatever CI pre-checks they want before
> >   the series is sent to them for review (and we can work on a library of
> >   Github actions, so nobody needs to reimplement checkpatch.pl multiple times)  
> 
> What about all the patches that don't come from the GH PR? Those need
> CI pre-checks too. We're going to implement CI twice? 

> The biggest
> issue I have on CI checks is applying patches. My algorithm is apply
> to my current base (last rc1 typically) or give up. I'm sure it could
> be a lot smarter trying several branches or looking at base-commit
> (not consistently used) or the git diff treeish hashes. 

Finding the common parent for a PR is easy. In the case of github,
all the script need is to get the common ancestor between the forked
tree (from where the PR originated) and the tree where the PR is being
submitted.

It shouldn't be hard to add this at a patch 00/xx.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:18 RFC: Github PR bot questions Konstantin Ryabitsev
                   ` (7 preceding siblings ...)
  2021-06-17  6:37 ` Dmitry Vyukov
@ 2021-06-17  7:30 ` Greg KH
  2021-06-17 14:59   ` Konstantin Ryabitsev
  2021-06-17  8:24 ` Christoph Hellwig
  2021-06-17 20:42 ` Brendan Higgins
  10 siblings, 1 reply; 53+ messages in thread
From: Greg KH @ 2021-06-17  7:30 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, workflows

On Wed, Jun 16, 2021 at 01:18:13PM -0400, Konstantin Ryabitsev wrote:
> - My general assumption is that putting this bot on github.com/torvalds/linux
>   would not be useful, as this will probably result in more noise than signal.
>   I expect that subsystem maintainers would prefer to configure their own
>   GitHub projects so they can have full control on what kind of CI prechecks
>   must succeed before the series is sent out. Is that a valid assumption, or
>   should I be working towards having a single point of submission on each
>   forge platform (Github, Gitlab, etc)?

What ever repo you put this on, it's going to take constant maintenance
to keep it up to date and prune out the PRs that are going to accumulate
there, as well as deal with the obvious spam and abuse issues that
popular trees always accumulate.

Linus has already said to not do it on his "tree", and I offer a full
"all branches" kernel mirror on github as well, but I don't want to be
responsible for this type of mess.

So perhaps you get a new kernel.org "mirror" account somehow and put it
there?  But again, someone will be responsible for keeping it alive and
clean, a thankless task that will take constant work.

> - We can *probably* track when patch series get applied and auto-close pull
>   requests that are accepted -- but it's not going to be perfect (we'd
>   basically be using git-patch-id to match commits to pull requests). Or is it
>   better to auto-close the pull request right after it's sent to the list with
>   a message like "thank you, please monitor your email for the rest of the
>   process"? The latter is much easier for me, of course. :)

Auto-close is the only way to do this, otherwise someone will have to go
back and clean it up on their own.  Again, who is going to do that?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17  6:52   ` Mauro Carvalho Chehab
@ 2021-06-17  8:20     ` Dmitry Vyukov
  2021-06-17  8:55       ` Mauro Carvalho Chehab
  2021-06-17 14:02     ` Rob Herring
  1 sibling, 1 reply; 53+ messages in thread
From: Dmitry Vyukov @ 2021-06-17  8:20 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Rob Herring, Konstantin Ryabitsev, users, workflows

On Thu, Jun 17, 2021 at 8:53 AM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Wed, 16 Jun 2021 15:11:33 -0600
> Rob Herring <robh@kernel.org> escreveu:
>
> > On Wed, Jun 16, 2021 at 11:18 AM Konstantin Ryabitsev
> > <konstantin@linuxfoundation.org> wrote:
> > >
> > > Hi, all:
> > >
> > > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > > pull requests on a project and convert them into fully well-formed patch
> > > series. This would be a one-way operation, effectively turning Github into a
> > > fancy "git-send-email" replacement. That said, it would have the following
> > > benefits for both submitters and maintainers:
> >
> > What makes this specific to Github PRs? A Github PR is really just a
> > git branch plus a target at least to the extent we would use it here.
> > The more of this that works on just a git branch, the more widely
> > useful it would be.
> >
> > > - submitters would no longer need to navigate their way around
> > >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> > >   have a patch-friendly outgoing mail gateway to properly contribute patches
> >
> > Presumably, the bot would rely on get_maintainer.pl or it would get
> > who to send to based on GH repo and reviewers? Without work on
> > get_maintainer.pl, I don't think it will work well beyond simple
> > cases.
>
> Some sanity test is needed, as otherwise it will end by trying to send
> the patch to a large number of people.

I think this system needs to use get_maintainer.pl results as is and
any fixing/filtering/sanity checking needs to go into
get_maintainer.pl itself.
get_maintainer.pl is what is used by lots of contributors, the only
option for any automated systems, what is used by new contributors if
they don't use this system anyway. And even experienced developers
know internal rules only for a few subsystems and use
get_maintainer.pl when sending a one-off patch to another subsystem
(what else?).

I don't see where we are getting if we accept get_maintainer.pl
produces bad results and needs additional fixing in every system out
there (dozens) and when used by humans. All systems would need the
same filtering/checking rules and they need to keep in sync. What a
kernel developer would even need to do to fix something (add/remove
themselves)? Go and talk to a large unknown set of systems that
duplicate the same additional rules?

And the only way to surface actual issues with get_maintainer.pl is to
start using it. In fact it's already widely used as is, so I am not
sure it's particularly bad.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:18 RFC: Github PR bot questions Konstantin Ryabitsev
                   ` (8 preceding siblings ...)
  2021-06-17  7:30 ` Greg KH
@ 2021-06-17  8:24 ` Christoph Hellwig
  2021-06-17  8:33   ` Jiri Kosina
  2021-06-17 20:42 ` Brendan Higgins
  10 siblings, 1 reply; 53+ messages in thread
From: Christoph Hellwig @ 2021-06-17  8:24 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, workflows

Please opt all subsystems I maintain out of this crap.  The last thing
I need is patches from people that can't deal with a sane workflow.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17  8:24 ` Christoph Hellwig
@ 2021-06-17  8:33   ` Jiri Kosina
  2021-06-17  9:52     ` Dmitry Vyukov
  0 siblings, 1 reply; 53+ messages in thread
From: Jiri Kosina @ 2021-06-17  8:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Konstantin Ryabitsev, users, workflows

On Thu, 17 Jun 2021, Christoph Hellwig wrote:

> Please opt all subsystems I maintain out of this crap.  The last thing
> I need is patches from people that can't deal with a sane workflow.

I agree with Christoph. This (if at all) really needs to be on an explicit 
maintainer opt-in basis only (and I am not going to opt in with my 
subsystems :) ).

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17  8:20     ` Dmitry Vyukov
@ 2021-06-17  8:55       ` Mauro Carvalho Chehab
  2021-06-17  9:33         ` Dmitry Vyukov
  2021-06-17 14:33         ` Rob Herring
  0 siblings, 2 replies; 53+ messages in thread
From: Mauro Carvalho Chehab @ 2021-06-17  8:55 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: Rob Herring, Konstantin Ryabitsev, users, workflows

Em Thu, 17 Jun 2021 10:20:31 +0200
Dmitry Vyukov <dvyukov@google.com> escreveu:

> On Thu, Jun 17, 2021 at 8:53 AM Mauro Carvalho Chehab
> <mchehab+huawei@kernel.org> wrote:
> >
> > Em Wed, 16 Jun 2021 15:11:33 -0600
> > Rob Herring <robh@kernel.org> escreveu:
> >  
> > > On Wed, Jun 16, 2021 at 11:18 AM Konstantin Ryabitsev
> > > <konstantin@linuxfoundation.org> wrote:  
> > > >
> > > > Hi, all:
> > > >
> > > > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > > > pull requests on a project and convert them into fully well-formed patch
> > > > series. This would be a one-way operation, effectively turning Github into a
> > > > fancy "git-send-email" replacement. That said, it would have the following
> > > > benefits for both submitters and maintainers:  
> > >
> > > What makes this specific to Github PRs? A Github PR is really just a
> > > git branch plus a target at least to the extent we would use it here.
> > > The more of this that works on just a git branch, the more widely
> > > useful it would be.
> > >  
> > > > - submitters would no longer need to navigate their way around
> > > >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> > > >   have a patch-friendly outgoing mail gateway to properly contribute patches  
> > >
> > > Presumably, the bot would rely on get_maintainer.pl or it would get
> > > who to send to based on GH repo and reviewers? Without work on
> > > get_maintainer.pl, I don't think it will work well beyond simple
> > > cases.  
> >
> > Some sanity test is needed, as otherwise it will end by trying to send
> > the patch to a large number of people.  
> 
> I think this system needs to use get_maintainer.pl results as is and
> any fixing/filtering/sanity checking needs to go into
> get_maintainer.pl itself.
> get_maintainer.pl is what is used by lots of contributors, the only
> option for any automated systems, what is used by new contributors if
> they don't use this system anyway. And even experienced developers
> know internal rules only for a few subsystems and use
> get_maintainer.pl when sending a one-off patch to another subsystem
> (what else?).
> 
> I don't see where we are getting if we accept get_maintainer.pl
> produces bad results and needs additional fixing in every system out
> there (dozens) and when used by humans. All systems would need the
> same filtering/checking rules and they need to keep in sync. What a
> kernel developer would even need to do to fix something (add/remove
> themselves)? Go and talk to a large unknown set of systems that
> duplicate the same additional rules?
> 
> And the only way to surface actual issues with get_maintainer.pl is to
> start using it. In fact it's already widely used as is, so I am not
> sure it's particularly bad.

I'm not saying that get_maintainer.pl produces bad result. Depending 
on what is done, it could produce a very large output.

Let's suppose that someone do something like globally renaming a
widely-used kAPI, e. g. something like:

	$ git ls-files|xargs sed s,mutex_,new_mutex_, -i 

A change like that would touch lots of subsystems, making get_maintainer.pl
to spend a lot of time processing it, and producing thousands of
entries (btw, we had a change somewhat similar to the above a long time
ago when mutex API was introduced and most of the semaphores were converted
to use mutex kAPI instead).

People that use to work with github/gitlab won't care much on doing a
change like that on a single patch, but this is something that won't
work on a PR -> email interface - nor maintainers would like to review
such big patch touching on multiple subsystems.

So, a bot would need to not only check the size of the patches,
but also the output of checkpatch.pl.

It will also need to have a timeout to abort checkpatch.pl, not only
to avoid the bot to become out of service for a long time, but also
because checkpatch.pl taking more than a minute or so is a good 
indication that the patch is not good - as it is touching too many
stuff.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17  8:55       ` Mauro Carvalho Chehab
@ 2021-06-17  9:33         ` Dmitry Vyukov
  2021-06-17  9:52           ` Geert Uytterhoeven
  2021-06-17 14:33         ` Rob Herring
  1 sibling, 1 reply; 53+ messages in thread
From: Dmitry Vyukov @ 2021-06-17  9:33 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Rob Herring, Konstantin Ryabitsev, users, workflows

On Thu, Jun 17, 2021 at 10:55 AM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Thu, 17 Jun 2021 10:20:31 +0200
> Dmitry Vyukov <dvyukov@google.com> escreveu:
>
> > On Thu, Jun 17, 2021 at 8:53 AM Mauro Carvalho Chehab
> > <mchehab+huawei@kernel.org> wrote:
> > >
> > > Em Wed, 16 Jun 2021 15:11:33 -0600
> > > Rob Herring <robh@kernel.org> escreveu:
> > >
> > > > On Wed, Jun 16, 2021 at 11:18 AM Konstantin Ryabitsev
> > > > <konstantin@linuxfoundation.org> wrote:
> > > > >
> > > > > Hi, all:
> > > > >
> > > > > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > > > > pull requests on a project and convert them into fully well-formed patch
> > > > > series. This would be a one-way operation, effectively turning Github into a
> > > > > fancy "git-send-email" replacement. That said, it would have the following
> > > > > benefits for both submitters and maintainers:
> > > >
> > > > What makes this specific to Github PRs? A Github PR is really just a
> > > > git branch plus a target at least to the extent we would use it here.
> > > > The more of this that works on just a git branch, the more widely
> > > > useful it would be.
> > > >
> > > > > - submitters would no longer need to navigate their way around
> > > > >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> > > > >   have a patch-friendly outgoing mail gateway to properly contribute patches
> > > >
> > > > Presumably, the bot would rely on get_maintainer.pl or it would get
> > > > who to send to based on GH repo and reviewers? Without work on
> > > > get_maintainer.pl, I don't think it will work well beyond simple
> > > > cases.
> > >
> > > Some sanity test is needed, as otherwise it will end by trying to send
> > > the patch to a large number of people.
> >
> > I think this system needs to use get_maintainer.pl results as is and
> > any fixing/filtering/sanity checking needs to go into
> > get_maintainer.pl itself.
> > get_maintainer.pl is what is used by lots of contributors, the only
> > option for any automated systems, what is used by new contributors if
> > they don't use this system anyway. And even experienced developers
> > know internal rules only for a few subsystems and use
> > get_maintainer.pl when sending a one-off patch to another subsystem
> > (what else?).
> >
> > I don't see where we are getting if we accept get_maintainer.pl
> > produces bad results and needs additional fixing in every system out
> > there (dozens) and when used by humans. All systems would need the
> > same filtering/checking rules and they need to keep in sync. What a
> > kernel developer would even need to do to fix something (add/remove
> > themselves)? Go and talk to a large unknown set of systems that
> > duplicate the same additional rules?
> >
> > And the only way to surface actual issues with get_maintainer.pl is to
> > start using it. In fact it's already widely used as is, so I am not
> > sure it's particularly bad.
>
> I'm not saying that get_maintainer.pl produces bad result. Depending
> on what is done, it could produce a very large output.
>
> Let's suppose that someone do something like globally renaming a
> widely-used kAPI, e. g. something like:
>
>         $ git ls-files|xargs sed s,mutex_,new_mutex_, -i
>
> A change like that would touch lots of subsystems, making get_maintainer.pl
> to spend a lot of time processing it, and producing thousands of
> entries (btw, we had a change somewhat similar to the above a long time
> ago when mutex API was introduced and most of the semaphores were converted
> to use mutex kAPI instead).
>
> People that use to work with github/gitlab won't care much on doing a
> change like that on a single patch, but this is something that won't
> work on a PR -> email interface - nor maintainers would like to review
> such big patch touching on multiple subsystems.
>
> So, a bot would need to not only check the size of the patches,
> but also the output of checkpatch.pl.
>
> It will also need to have a timeout to abort checkpatch.pl, not only
> to avoid the bot to become out of service for a long time, but also
> because checkpatch.pl taking more than a minute or so is a good
> indication that the patch is not good - as it is touching too many
> stuff.

Hi Mauro,

Interesting point.

But maybe it still can be incorporated into get_maintiners.pl?... at
least some part of it?
It may be useful for a new contributor who invokes it manually to get
the same "your patch is probably too big, consider splitting it"
message. And any other automated system probably wants the same
threshold for "running too long"/"returning too many emails" w/o
duplicating the values on its side.
Not sure what's the best interface for this... either
get_maintiners.pl can accept an additional flag that will make it fail
for the "no, don't mail this patch automatically" case. Or it can
always produce an additional recognizable message at the end for
"that's too many emails, consider splitting the patch".
What do you think?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17  8:33   ` Jiri Kosina
@ 2021-06-17  9:52     ` Dmitry Vyukov
  2021-06-17 10:09       ` Christoph Hellwig
  2021-06-17 14:23       ` Miguel Ojeda
  0 siblings, 2 replies; 53+ messages in thread
From: Dmitry Vyukov @ 2021-06-17  9:52 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Christoph Hellwig, Konstantin Ryabitsev, users, workflows

On Thu, Jun 17, 2021 at 10:33 AM Jiri Kosina <jikos@kernel.org> wrote:
>
> On Thu, 17 Jun 2021, Christoph Hellwig wrote:
>
> > Please opt all subsystems I maintain out of this crap.  The last thing
> > I need is patches from people that can't deal with a sane workflow.
>
> I agree with Christoph. This (if at all) really needs to be on an explicit
> maintainer opt-in basis only (and I am not going to opt in with my
> subsystems :) ).

Hi Jiri, Christoph,

You don't want to accept these contributions even if you won't be able
to tell the difference on your end? But why? This looks really
orthogonal to a person's expertise and proficiency in software
development. Whether that's inability to deal with email, or no desire
to spend time on that, or just personal preferences.

On the other hand this workflow has the potential to ensure that you
never need to remind to run checkpatch.pl, nor spend time on writing
code formatting comments and re-reviewing v2 because code formatting
will be enforced, etc. So I see how this is actually beneficial for
maintainers.

For what it's worth I currently can't send emails with git because
Gmail with phone auth does accept plain passwords nor ASP. As far as I
understand proper two-factor auth would work, but phone auth is
different (kind of 1.5-factor). I know it's solvable, e.g. I can get
another account somewhere, but I don't see setting this up and
maintaining and checking a second account as a worthy time investment.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17  9:33         ` Dmitry Vyukov
@ 2021-06-17  9:52           ` Geert Uytterhoeven
  0 siblings, 0 replies; 53+ messages in thread
From: Geert Uytterhoeven @ 2021-06-17  9:52 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Mauro Carvalho Chehab, Rob Herring, Konstantin Ryabitsev, users,
	workflows

On Thu, Jun 17, 2021 at 11:33 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, Jun 17, 2021 at 10:55 AM Mauro Carvalho Chehab
> <mchehab+huawei@kernel.org> wrote:
> > Em Thu, 17 Jun 2021 10:20:31 +0200
> > Dmitry Vyukov <dvyukov@google.com> escreveu:
> >
> > > On Thu, Jun 17, 2021 at 8:53 AM Mauro Carvalho Chehab
> > > <mchehab+huawei@kernel.org> wrote:
> > > >
> > > > Em Wed, 16 Jun 2021 15:11:33 -0600
> > > > Rob Herring <robh@kernel.org> escreveu:
> > > >
> > > > > On Wed, Jun 16, 2021 at 11:18 AM Konstantin Ryabitsev
> > > > > <konstantin@linuxfoundation.org> wrote:
> > > > > >
> > > > > > Hi, all:
> > > > > >
> > > > > > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > > > > > pull requests on a project and convert them into fully well-formed patch
> > > > > > series. This would be a one-way operation, effectively turning Github into a
> > > > > > fancy "git-send-email" replacement. That said, it would have the following
> > > > > > benefits for both submitters and maintainers:
> > > > >
> > > > > What makes this specific to Github PRs? A Github PR is really just a
> > > > > git branch plus a target at least to the extent we would use it here.
> > > > > The more of this that works on just a git branch, the more widely
> > > > > useful it would be.
> > > > >
> > > > > > - submitters would no longer need to navigate their way around
> > > > > >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> > > > > >   have a patch-friendly outgoing mail gateway to properly contribute patches
> > > > >
> > > > > Presumably, the bot would rely on get_maintainer.pl or it would get
> > > > > who to send to based on GH repo and reviewers? Without work on
> > > > > get_maintainer.pl, I don't think it will work well beyond simple
> > > > > cases.
> > > >
> > > > Some sanity test is needed, as otherwise it will end by trying to send
> > > > the patch to a large number of people.
> > >
> > > I think this system needs to use get_maintainer.pl results as is and
> > > any fixing/filtering/sanity checking needs to go into
> > > get_maintainer.pl itself.
> > > get_maintainer.pl is what is used by lots of contributors, the only
> > > option for any automated systems, what is used by new contributors if
> > > they don't use this system anyway. And even experienced developers
> > > know internal rules only for a few subsystems and use
> > > get_maintainer.pl when sending a one-off patch to another subsystem
> > > (what else?).
> > >
> > > I don't see where we are getting if we accept get_maintainer.pl
> > > produces bad results and needs additional fixing in every system out
> > > there (dozens) and when used by humans. All systems would need the
> > > same filtering/checking rules and they need to keep in sync. What a
> > > kernel developer would even need to do to fix something (add/remove
> > > themselves)? Go and talk to a large unknown set of systems that
> > > duplicate the same additional rules?
> > >
> > > And the only way to surface actual issues with get_maintainer.pl is to
> > > start using it. In fact it's already widely used as is, so I am not
> > > sure it's particularly bad.
> >
> > I'm not saying that get_maintainer.pl produces bad result. Depending
> > on what is done, it could produce a very large output.
> >
> > Let's suppose that someone do something like globally renaming a
> > widely-used kAPI, e. g. something like:
> >
> >         $ git ls-files|xargs sed s,mutex_,new_mutex_, -i
> >
> > A change like that would touch lots of subsystems, making get_maintainer.pl
> > to spend a lot of time processing it, and producing thousands of
> > entries (btw, we had a change somewhat similar to the above a long time
> > ago when mutex API was introduced and most of the semaphores were converted
> > to use mutex kAPI instead).
> >
> > People that use to work with github/gitlab won't care much on doing a
> > change like that on a single patch, but this is something that won't
> > work on a PR -> email interface - nor maintainers would like to review
> > such big patch touching on multiple subsystems.
> >
> > So, a bot would need to not only check the size of the patches,
> > but also the output of checkpatch.pl.
> >
> > It will also need to have a timeout to abort checkpatch.pl, not only
> > to avoid the bot to become out of service for a long time, but also
> > because checkpatch.pl taking more than a minute or so is a good
> > indication that the patch is not good - as it is touching too many
> > stuff.
>
> Hi Mauro,
>
> Interesting point.
>
> But maybe it still can be incorporated into get_maintiners.pl?... at
> least some part of it?

There's definitely room for improvement.
For changes to drivers, it usually works fine without a need to do
manual tuning.
But if you touch a Makefile or Kconfig file, get_maintainer.pl may
list all former contributors to the file.

Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17  9:52     ` Dmitry Vyukov
@ 2021-06-17 10:09       ` Christoph Hellwig
  2021-06-17 14:57         ` Konstantin Ryabitsev
  2021-06-17 14:23       ` Miguel Ojeda
  1 sibling, 1 reply; 53+ messages in thread
From: Christoph Hellwig @ 2021-06-17 10:09 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jiri Kosina, Christoph Hellwig, Konstantin Ryabitsev, users, workflows

On Thu, Jun 17, 2021 at 11:52:04AM +0200, Dmitry Vyukov wrote:
> You don't want to accept these contributions even if you won't be able
> to tell the difference on your end? But why? This looks really
> orthogonal to a person's expertise and proficiency in software
> development. Whether that's inability to deal with email, or no desire
> to spend time on that, or just personal preferences.

Because I don't waste my time on the kind of crap that comes from
github.  If you build a separate webinterface that allows anyone to send
a proper series from a git tree that is all fine.  But github is toxic.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17  6:52   ` Mauro Carvalho Chehab
  2021-06-17  8:20     ` Dmitry Vyukov
@ 2021-06-17 14:02     ` Rob Herring
  1 sibling, 0 replies; 53+ messages in thread
From: Rob Herring @ 2021-06-17 14:02 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Konstantin Ryabitsev, users, workflows

On Thu, Jun 17, 2021 at 12:52 AM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Wed, 16 Jun 2021 15:11:33 -0600
> Rob Herring <robh@kernel.org> escreveu:
>
> > On Wed, Jun 16, 2021 at 11:18 AM Konstantin Ryabitsev
> > <konstantin@linuxfoundation.org> wrote:
> > >
> > > Hi, all:
> > >
> > > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > > pull requests on a project and convert them into fully well-formed patch
> > > series. This would be a one-way operation, effectively turning Github into a
> > > fancy "git-send-email" replacement. That said, it would have the following
> > > benefits for both submitters and maintainers:
> >
> > What makes this specific to Github PRs? A Github PR is really just a
> > git branch plus a target at least to the extent we would use it here.
> > The more of this that works on just a git branch, the more widely
> > useful it would be.
> >
> > > - submitters would no longer need to navigate their way around
> > >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> > >   have a patch-friendly outgoing mail gateway to properly contribute patches
> >
> > Presumably, the bot would rely on get_maintainer.pl or it would get
> > who to send to based on GH repo and reviewers? Without work on
> > get_maintainer.pl, I don't think it will work well beyond simple
> > cases.
>
> Some sanity test is needed, as otherwise it will end by trying to send
> the patch to a large number of people.
>
> >
> > > - subsystem maintainers can configure whatever CI pre-checks they want before
> > >   the series is sent to them for review (and we can work on a library of
> > >   Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
> >
> > What about all the patches that don't come from the GH PR? Those need
> > CI pre-checks too. We're going to implement CI twice?
>
> > The biggest
> > issue I have on CI checks is applying patches. My algorithm is apply
> > to my current base (last rc1 typically) or give up. I'm sure it could
> > be a lot smarter trying several branches or looking at base-commit
> > (not consistently used) or the git diff treeish hashes.
>
> Finding the common parent for a PR is easy. In the case of github,
> all the script need is to get the common ancestor between the forked
> tree (from where the PR originated) and the tree where the PR is being
> submitted.

I'm not interested in doing it in Github. It's about applying a patch
from a list, so how would Github help there anyways? It's not easy in
git either because there's not a command to do it AFAIK. I suspect
much of what's needed exists as git-am can find the merge base. I
think it's something like for a given set of branches or refs, which
ones match all the base treeishs in the diff. Then you'd know for
certain that the patch would apply to those branches. If that doesn't
work, then you could fallback to just try to apply patches to
different branches. It would be nice if the fallback could work
without doing a checkout though.

> It shouldn't be hard to add this at a patch 00/xx.

Not hard and we have that with base-commit, but that's left up to the
submitter. I don't use base-commit in my patches because it's more
work. For example, it doesn't work if you didn't setup a tracking
branch. In general, for anything left up to the submitter, we have to
deal with the submitter not doing it. If it is built-in to the tools
(git) then we can start relying on it.

Maybe if patch 0 became more automated like a git PR email, then it
would work. But then what happens on single patches without a cover.

Rob

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 22:33   ` James Bottomley
@ 2021-06-17 14:18     ` Rob Herring
  2021-06-17 14:27       ` James Bottomley
  0 siblings, 1 reply; 53+ messages in thread
From: Rob Herring @ 2021-06-17 14:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: Konstantin Ryabitsev, users, workflows

On Wed, Jun 16, 2021 at 4:33 PM James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> On Wed, 2021-06-16 at 15:11 -0600, Rob Herring wrote:
> > > - subsystem maintainers can configure whatever CI pre-checks they
> > > want before the series is sent to them for review (and we can work
> > > on a library of Github actions, so nobody needs to reimplement
> > > checkpatch.pl multiple times)
> >
> > What about all the patches that don't come from the GH PR? Those need
> > CI pre-checks too. We're going to implement CI twice? The biggest
> > issue I have on CI checks is applying patches. My algorithm is apply
> > to my current base (last rc1 typically) or give up. I'm sure it could
> > be a lot smarter trying several branches or looking at base-commit
> > (not consistently used) or the git diff treeish hashes. What I'd
> > really like is some bot or script that's applying series and
> > publishing git branches with a messageid to git branch tool. 0-day is
> > doing this now. Basically, the opposite direction as others have
> > mentioned.
>
> I've got to say my experience with Github CIs has been pretty
> unpleasant.  Pretty much every project I've ever pushed to has had at
> least one commit reject because of a bug in the CI rather than the
> commit which they usually dump on the submitter to fix.  As an endless
> devops make work project, I'm sure they're fine, but what we have now
> with 0-day is pretty much good enough for most kernel work, plus if it
> goes wrong we can ignore it and somebody else fixes it ...

It's the making a git branch somewhere that I'm interested in, not the
Github part of it. If someone wants to tie GH CI to that and send out
replies to patches, then fine. We can use them if useful or ignore if
not.

0-day is a bit unpredictable in terms of response time. I often only
get reports after things land in linux-next which is a bit late IMO.
What is run and the priorities are all opaque.

Rob

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17  9:52     ` Dmitry Vyukov
  2021-06-17 10:09       ` Christoph Hellwig
@ 2021-06-17 14:23       ` Miguel Ojeda
  1 sibling, 0 replies; 53+ messages in thread
From: Miguel Ojeda @ 2021-06-17 14:23 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jiri Kosina, Christoph Hellwig, Konstantin Ryabitsev, users, workflows

On Thu, Jun 17, 2021 at 11:52 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> You don't want to accept these contributions even if you won't be able
> to tell the difference on your end? But why? This looks really
> orthogonal to a person's expertise and proficiency in software
> development. Whether that's inability to deal with email, or no desire
> to spend time on that, or just personal preferences.

Agreed -- torvalds/linux may attract a lot of spam, but the LKML also
does. Any sufficiently "famous" repo/service/ML will, after all.

> On the other hand this workflow has the potential to ensure that you
> never need to remind to run checkpatch.pl, nor spend time on writing
> code formatting comments and re-reviewing v2 because code formatting
> will be enforced, etc. So I see how this is actually beneficial for
> maintainers.

Also agreed -- it is particularly useful to teach newcomers and to
save time for maintainers having to explain things. Even if a
maintainer has a set of email templates for the usual things, it takes
time vs. not even having to read the email.

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 14:18     ` Rob Herring
@ 2021-06-17 14:27       ` James Bottomley
  0 siblings, 0 replies; 53+ messages in thread
From: James Bottomley @ 2021-06-17 14:27 UTC (permalink / raw)
  To: Rob Herring; +Cc: Konstantin Ryabitsev, users, workflows

On Thu, 2021-06-17 at 08:18 -0600, Rob Herring wrote:
> On Wed, Jun 16, 2021 at 4:33 PM James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Wed, 2021-06-16 at 15:11 -0600, Rob Herring wrote:
> > > > - subsystem maintainers can configure whatever CI pre-checks
> > > > they want before the series is sent to them for review (and we
> > > > can work on a library of Github actions, so nobody needs to
> > > > reimplement checkpatch.pl multiple times)
> > > 
> > > What about all the patches that don't come from the GH PR? Those
> > > need CI pre-checks too. We're going to implement CI twice? The
> > > biggest issue I have on CI checks is applying patches. My
> > > algorithm is apply to my current base (last rc1 typically) or
> > > give up. I'm sure it could be a lot smarter trying several
> > > branches or looking at base-commit (not consistently used) or the
> > > git diff treeish hashes. What I'd really like is some bot or
> > > script that's applying series and publishing git branches with a
> > > messageid to git branch tool. 0-day is doing this now. Basically,
> > > the opposite direction as others have mentioned.
> > 
> > I've got to say my experience with Github CIs has been pretty
> > unpleasant.  Pretty much every project I've ever pushed to has had
> > at least one commit reject because of a bug in the CI rather than
> > the commit which they usually dump on the submitter to fix.  As an
> > endless devops make work project, I'm sure they're fine, but what
> > we have now with 0-day is pretty much good enough for most kernel
> > work, plus if it goes wrong we can ignore it and somebody else
> > fixes it ...
> 
> It's the making a git branch somewhere that I'm interested in, not
> the Github part of it. If someone wants to tie GH CI to that and send
> out replies to patches, then fine. We can use them if useful or
> ignore if not.
> 
> 0-day is a bit unpredictable in terms of response time. I often only
> get reports after things land in linux-next which is a bit late IMO.
> What is run and the priorities are all opaque.

You can specifically ask it (or rather it's handlers) to send you or
the mailing list a success report when the tests you've requested have
run.  I think they also respond to triggers (please test this branch
now).

I suspect what we could all do with is a nice how 0-day can work for
you presentation from its handlers so all of us know all of the tricks.

James



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17  8:55       ` Mauro Carvalho Chehab
  2021-06-17  9:33         ` Dmitry Vyukov
@ 2021-06-17 14:33         ` Rob Herring
  2021-06-17 15:24           ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 53+ messages in thread
From: Rob Herring @ 2021-06-17 14:33 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Dmitry Vyukov, Konstantin Ryabitsev, users, workflows

On Thu, Jun 17, 2021 at 2:55 AM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Thu, 17 Jun 2021 10:20:31 +0200
> Dmitry Vyukov <dvyukov@google.com> escreveu:
>
> > On Thu, Jun 17, 2021 at 8:53 AM Mauro Carvalho Chehab
> > <mchehab+huawei@kernel.org> wrote:
> > >
> > > Em Wed, 16 Jun 2021 15:11:33 -0600
> > > Rob Herring <robh@kernel.org> escreveu:
> > >
> > > > On Wed, Jun 16, 2021 at 11:18 AM Konstantin Ryabitsev
> > > > <konstantin@linuxfoundation.org> wrote:
> > > > >
> > > > > Hi, all:
> > > > >
> > > > > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > > > > pull requests on a project and convert them into fully well-formed patch
> > > > > series. This would be a one-way operation, effectively turning Github into a
> > > > > fancy "git-send-email" replacement. That said, it would have the following
> > > > > benefits for both submitters and maintainers:
> > > >
> > > > What makes this specific to Github PRs? A Github PR is really just a
> > > > git branch plus a target at least to the extent we would use it here.
> > > > The more of this that works on just a git branch, the more widely
> > > > useful it would be.
> > > >
> > > > > - submitters would no longer need to navigate their way around
> > > > >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> > > > >   have a patch-friendly outgoing mail gateway to properly contribute patches
> > > >
> > > > Presumably, the bot would rely on get_maintainer.pl or it would get
> > > > who to send to based on GH repo and reviewers? Without work on
> > > > get_maintainer.pl, I don't think it will work well beyond simple
> > > > cases.
> > >
> > > Some sanity test is needed, as otherwise it will end by trying to send
> > > the patch to a large number of people.
> >
> > I think this system needs to use get_maintainer.pl results as is and
> > any fixing/filtering/sanity checking needs to go into
> > get_maintainer.pl itself.
> > get_maintainer.pl is what is used by lots of contributors, the only
> > option for any automated systems, what is used by new contributors if
> > they don't use this system anyway. And even experienced developers
> > know internal rules only for a few subsystems and use
> > get_maintainer.pl when sending a one-off patch to another subsystem
> > (what else?).
> >
> > I don't see where we are getting if we accept get_maintainer.pl
> > produces bad results and needs additional fixing in every system out
> > there (dozens) and when used by humans. All systems would need the
> > same filtering/checking rules and they need to keep in sync. What a
> > kernel developer would even need to do to fix something (add/remove
> > themselves)? Go and talk to a large unknown set of systems that
> > duplicate the same additional rules?
> >
> > And the only way to surface actual issues with get_maintainer.pl is to
> > start using it. In fact it's already widely used as is, so I am not
> > sure it's particularly bad.
>
> I'm not saying that get_maintainer.pl produces bad result. Depending
> on what is done, it could produce a very large output.
>
> Let's suppose that someone do something like globally renaming a
> widely-used kAPI, e. g. something like:
>
>         $ git ls-files|xargs sed s,mutex_,new_mutex_, -i
>
> A change like that would touch lots of subsystems, making get_maintainer.pl
> to spend a lot of time processing it, and producing thousands of
> entries (btw, we had a change somewhat similar to the above a long time
> ago when mutex API was introduced and most of the semaphores were converted
> to use mutex kAPI instead).

What I end up doing in those cases is only Cc'ing the subsystem
maintainers. But that's a manual step of dropping all the driver and
SoC maintainers. A related problem is if you want to put who should
apply the patch on To. That's maybe as simple as whether the
maintainer entry has a git tree. I'd tackle that, but it's Perl, no
thanks.

Rob

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 21:11 ` Rob Herring
                     ` (2 preceding siblings ...)
  2021-06-17  6:52   ` Mauro Carvalho Chehab
@ 2021-06-17 14:47   ` Konstantin Ryabitsev
  2021-06-17 15:25     ` Steven Rostedt
  2021-06-17 17:15     ` Rob Herring
  3 siblings, 2 replies; 53+ messages in thread
From: Konstantin Ryabitsev @ 2021-06-17 14:47 UTC (permalink / raw)
  To: Rob Herring; +Cc: users, workflows

On Wed, Jun 16, 2021 at 03:11:33PM -0600, Rob Herring wrote:
> > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > pull requests on a project and convert them into fully well-formed patch
> > series. This would be a one-way operation, effectively turning Github into a
> > fancy "git-send-email" replacement. That said, it would have the following
> > benefits for both submitters and maintainers:
> 
> What makes this specific to Github PRs? A Github PR is really just a
> git branch plus a target at least to the extent we would use it here.
> The more of this that works on just a git branch, the more widely
> useful it would be.

It's not specific to GH at all. The same bot will be able to perform similar
actions to emails created by git-request-pull, e.g.:

- submitter runs git request-pull instead of git-format-patch
- submitter sends the output to a dedicated mailing list like
  pulls@lists.linux.dev
- the bot auto-converts these requests into patch series and sends them to
  proper destinations

This is more cumbersome to implement, though, which is why I want to get it
done with GH first, as this gets us some immediate perks:

1. we get a fast, stable remote to pull from instead of potentially slow,
   broken remote that's intermittently working
2. we can offload all sanity checking to github instead of reimplementing them
   with our own CI
3. we end up doing a lot less state tracking for v1..v2..v3 with github

Once the GH implementation is working, I can adapt it to also support other
forges and pull requests sent to mailing lists.

> > - submitters would no longer need to navigate their way around
> >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> >   have a patch-friendly outgoing mail gateway to properly contribute patches
> 
> Presumably, the bot would rely on get_maintainer.pl or it would get
> who to send to based on GH repo and reviewers? Without work on
> get_maintainer.pl, I don't think it will work well beyond simple
> cases.

The bot will actually rely on git-send-email, which can be configured to use
"tocmd" and "cccmd" to get the necessary info from get_maintainer.pl. E.g. in
my tests I have the following:

    tocmd = "$(git rev-parse --show-toplevel)/scripts/get_maintainer.pl --norolestats --nol"
    cccmd = "$(git rev-parse --show-toplevel)/scripts/get_maintainer.pl --norolestats --nom"

This does the right thing *most* of the time, and if it's not doing the right
thing, then it's the fault of get_maintainer.pl. :)

> > - subsystem maintainers can configure whatever CI pre-checks they want before
> >   the series is sent to them for review (and we can work on a library of
> >   Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
> 
> What about all the patches that don't come from the GH PR? Those need
> CI pre-checks too. We're going to implement CI twice?

Most likely, yes, though we can certainly weigh how much we want to do on the
GH side. One thing I've thought about is letting bot inject a Tested-by: into
the patches it creates in order to reflect what's been already done, e.g.:

    Tested-by: GH Preflight Bot <ghbot@kernel.org>

There is indeed a lot of duplicated CI testing happening for Linux patches,
but it's a separate problem that I believe is being looked at by the Kernel CI
folks.

> The biggest issue I have on CI checks is applying patches. My algorithm is
> apply to my current base (last rc1 typically) or give up. I'm sure it could
> be a lot smarter trying several branches or looking at base-commit (not
> consistently used) or the git diff treeish hashes. What I'd really like is
> some bot or script that's applying series and publishing git branches with a
> messageid to git branch tool. 0-day is doing this now. Basically, the
> opposite direction as others have mentioned.

b4 will try to do this for you with -g, but it will only check against the
last 10 tags, as otherwise this takes a very long time, especially on series
that modify a lot of files. It can probably be a lot more intelligent about it
and work more like git bisect. I'll look into improving this feature.

> I think it needs to be per maintainer in terms of what checks run, but
> if submission is per maintainer project then the problem will be how
> does the submitter know where to send something? get_maintainer.pl
> tells them? It doesn't do a great job of that IMO. There's not a clear
> distinction of who applies my patch and others Cc'ed (file
> maintainers).

Yes, I'm leaning further towards having the submission point be a single
project per forge, and then just running some bare-minimum checks, similar to
what would be expected of the submitter using git-format-patch.

> I've kind of reached the conclusion that relying on submitters to get
> it right is never going to work (is Cc the DT list for DT patches so
> PW picks them up so hard!?). I think the model needs to be send
> patches to 'the kernel' and then maintainers have tools to extract all
> the patches they are interested in (the planned lore
> local-email-interface).

Yes! I'm hoping that we'll soon get to the point where "just send your patch
to linux-kernel@vger.kernel.org" becomes a reasonable thing to say again. E.g.
See this tread here:
https://public-inbox.org/meta/20210426164454.5zd5kgugfhfwfkpo@nitro.local/t/#u

However, I'm approaching this from multiple ends, so fixing up
get_maintainer.pl to return something reasonable also needs to happen imo.

-K

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 10:09       ` Christoph Hellwig
@ 2021-06-17 14:57         ` Konstantin Ryabitsev
  2021-06-17 15:16           ` Mark Brown
  0 siblings, 1 reply; 53+ messages in thread
From: Konstantin Ryabitsev @ 2021-06-17 14:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dmitry Vyukov, Jiri Kosina, users, workflows

On Thu, Jun 17, 2021 at 11:09:34AM +0100, Christoph Hellwig wrote:
> > You don't want to accept these contributions even if you won't be able
> > to tell the difference on your end? But why? This looks really
> > orthogonal to a person's expertise and proficiency in software
> > development. Whether that's inability to deal with email, or no desire
> > to spend time on that, or just personal preferences.
> 
> Because I don't waste my time on the kind of crap that comes from
> github.  If you build a separate webinterface that allows anyone to send
> a proper series from a git tree that is all fine.  But github is toxic.

Won't this just end up reimplementing a lot of stuff that we already get "for
free" from Github and other forges? Yes, I know Github is proprietary, but so
are many SMTP gateways used to send the patch series. I don't see how what
the GH bot would do is different from:

- a developer uses some proprietary IDE to develop the code
- they send patches via their Exchange infrastructure

As long as we don't make proprietary or cloud tools any kind of dependency,
then we're not painting ourselves into any corners. As I mentioned earlier,
I'm also working on a version of the same bot that would work with pull
requests generated with "git-request-pull" and convert them into patch series
for review -- this would be a suitable alternative for people who aren't
interested in using Github.

-K

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17  7:30 ` Greg KH
@ 2021-06-17 14:59   ` Konstantin Ryabitsev
  0 siblings, 0 replies; 53+ messages in thread
From: Konstantin Ryabitsev @ 2021-06-17 14:59 UTC (permalink / raw)
  To: Greg KH; +Cc: users, workflows

On Thu, Jun 17, 2021 at 09:30:02AM +0200, Greg KH wrote:
> What ever repo you put this on, it's going to take constant maintenance
> to keep it up to date and prune out the PRs that are going to accumulate
> there, as well as deal with the obvious spam and abuse issues that
> popular trees always accumulate.
> 
> Linus has already said to not do it on his "tree", and I offer a full
> "all branches" kernel mirror on github as well, but I don't want to be
> responsible for this type of mess.
> 
> So perhaps you get a new kernel.org "mirror" account somehow and put it
> there?  But again, someone will be responsible for keeping it alive and
> clean, a thankless task that will take constant work.

Well, part of the goal at the LF is to make maintainers' lives easier, so if
we can dedicate part of someone's LF-paid time to make the lives of 100
maintainers better, then it's a worthwhile investment. :)

-K

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 20:24 ` Linus Torvalds
@ 2021-06-17 15:09   ` Konstantin Ryabitsev
  0 siblings, 0 replies; 53+ messages in thread
From: Konstantin Ryabitsev @ 2021-06-17 15:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: users, workflows

On Wed, Jun 16, 2021 at 01:24:52PM -0700, Linus Torvalds wrote:
> Yeah, I've had to turn off all emails from github because it's too
> noisy. Github also makes it very easyt to "invite" people to be team
> members, whether they want to or not, so I'm a "member" of random
> projects. There needs to be some way to make this _very_ much opt-in,
> not "anybody can do anything whether the target is ok with it or not".
> 
> I do think that having a way to generate patch series would be good,
> but I really really hope your robot would have
> 
>  (a) some limit on sizes
> 
>  (b) check that the commit messages are well-formed

I'm hoping that these 2 will be handled by Github pre-flight CI tests before
the pull request is even given the status "approved" and processed by the bot.
It's much easier to separate meatballs from the flies before they leave the
sausage factory. :)

> I've seen a lot of github code that doesn't have the Linux kernel kind
> of "good commit messages required".

Heh, and then there are pull requests like this one:
https://github.com/torvalds/linux/pull/779

It's a single commit that changes 44,000 files, so at least it would be easy
to catch this with pre-flight checks so the pull request never receives the
"approved" status.

In other words, I am fully aware of a lot of potentially negative outcomes of
this work, and I hope we'll be able to add sufficient anti-abuse measures so
it doesn't create useless work for anyone on the receiving end of the bot.

-K

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 20:10 ` Willy Tarreau
@ 2021-06-17 15:11   ` Konstantin Ryabitsev
  2021-06-17 15:25     ` Willy Tarreau
  0 siblings, 1 reply; 53+ messages in thread
From: Konstantin Ryabitsev @ 2021-06-17 15:11 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: users, workflows

On Wed, Jun 16, 2021 at 10:10:11PM +0200, Willy Tarreau wrote:
> In haproxy we have a bot that forwards PRs to the mailing list. It
> CCs the author and sends a gentle message saying that reviews are
> made in public using plain-text e-mails, and that for this reason
> the PR is automatically closed. And I think it's easier for both
> sides, because users who are used to PRs would probably count a bit
> too much on the "github mode", where they're certain that someone
> will eventually notice that the PR count went from 1137 to 1138 and
> will have a look, while it's certain that there will be quite some
> losses, and by warning the submitter upfront there is less surprise.

Good to know. I was only aware of GitGitGadget, so knowing that HAProxy does
it as well is useful. Unfortunately, I can't really use much of GitGitGadget
code as it's too closely integrated into Azure and is written in typescript,
which I don't know at all.

-K

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 14:57         ` Konstantin Ryabitsev
@ 2021-06-17 15:16           ` Mark Brown
  2021-06-17 15:24             ` Laurent Pinchart
  2021-06-17 15:31             ` Paolo Bonzini
  0 siblings, 2 replies; 53+ messages in thread
From: Mark Brown @ 2021-06-17 15:16 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Christoph Hellwig, Dmitry Vyukov, Jiri Kosina, users, workflows

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

On Thu, Jun 17, 2021 at 10:57:28AM -0400, Konstantin Ryabitsev wrote:
> On Thu, Jun 17, 2021 at 11:09:34AM +0100, Christoph Hellwig wrote:

> > Because I don't waste my time on the kind of crap that comes from
> > github.  If you build a separate webinterface that allows anyone to send
> > a proper series from a git tree that is all fine.  But github is toxic.

> Won't this just end up reimplementing a lot of stuff that we already get "for
> free" from Github and other forges? Yes, I know Github is proprietary, but so
> are many SMTP gateways used to send the patch series. I don't see how what
> the GH bot would do is different from:

I think part of the concern here is that people have some standard
expectations for how projects they work with on Github are going to
function so if people end up using Github to submit patches we may end
up with some culture and process mismatches which could cause issues.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 15:16           ` Mark Brown
@ 2021-06-17 15:24             ` Laurent Pinchart
  2021-06-17 16:36               ` Geert Uytterhoeven
  2021-06-17 18:43               ` Miguel Ojeda
  2021-06-17 15:31             ` Paolo Bonzini
  1 sibling, 2 replies; 53+ messages in thread
From: Laurent Pinchart @ 2021-06-17 15:24 UTC (permalink / raw)
  To: Mark Brown
  Cc: Konstantin Ryabitsev, Christoph Hellwig, Dmitry Vyukov,
	Jiri Kosina, users, workflows

On Thu, Jun 17, 2021 at 04:16:59PM +0100, Mark Brown wrote:
> On Thu, Jun 17, 2021 at 10:57:28AM -0400, Konstantin Ryabitsev wrote:
> > On Thu, Jun 17, 2021 at 11:09:34AM +0100, Christoph Hellwig wrote:
> 
> > > Because I don't waste my time on the kind of crap that comes from
> > > github.  If you build a separate webinterface that allows anyone to send
> > > a proper series from a git tree that is all fine.  But github is toxic.
> 
> > Won't this just end up reimplementing a lot of stuff that we already get "for
> > free" from Github and other forges? Yes, I know Github is proprietary, but so
> > are many SMTP gateways used to send the patch series. I don't see how what
> > the GH bot would do is different from:
> 
> I think part of the concern here is that people have some standard
> expectations for how projects they work with on Github are going to
> function so if people end up using Github to submit patches we may end
> up with some culture and process mismatches which could cause issues.

There are some features (or lack thereof) of git..b that I suspect
actively decrease the quality of the hosted software. For instance, the
inability to comment on the commit messages during review can play a
role in the average low quality of those messages. Similarly, review is
often based on changes, not on individual commits, which results in
commits being badly split (or not split at all, it's common to see very
large commits with a "fix stuff" commit message).

Developers who have only been exposed to those platforms are very likely
to never have learnt the importance of commit messages, and of proper
split of changes across commits. Those are issues that are inherent to
those platforms and that we will likely need to handle in an automated
way (at least to some extent) or maintainers will become crazy (I know
we already suffer from those issues with the mailing list-based
workflow, but I believe it would get worse, not better, and some of our
maintainers are already suffering way more than they should).

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 14:33         ` Rob Herring
@ 2021-06-17 15:24           ` Mauro Carvalho Chehab
  2021-06-17 15:38             ` Rob Herring
  2021-06-17 15:45             ` Christoph Hellwig
  0 siblings, 2 replies; 53+ messages in thread
From: Mauro Carvalho Chehab @ 2021-06-17 15:24 UTC (permalink / raw)
  To: Rob Herring; +Cc: Dmitry Vyukov, Konstantin Ryabitsev, users, workflows

Em Thu, 17 Jun 2021 08:33:29 -0600
Rob Herring <robh@kernel.org> escreveu:

> On Thu, Jun 17, 2021 at 2:55 AM Mauro Carvalho Chehab
> <mchehab+huawei@kernel.org> wrote:
> >
> > Em Thu, 17 Jun 2021 10:20:31 +0200
> > Dmitry Vyukov <dvyukov@google.com> escreveu:
> >  
> > > On Thu, Jun 17, 2021 at 8:53 AM Mauro Carvalho Chehab
> > > <mchehab+huawei@kernel.org> wrote:  
> > > >
> > > > Em Wed, 16 Jun 2021 15:11:33 -0600
> > > > Rob Herring <robh@kernel.org> escreveu:
> > > >  
> > > > > On Wed, Jun 16, 2021 at 11:18 AM Konstantin Ryabitsev
> > > > > <konstantin@linuxfoundation.org> wrote:  
> > > > > >
> > > > > > Hi, all:
> > > > > >
> > > > > > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > > > > > pull requests on a project and convert them into fully well-formed patch
> > > > > > series. This would be a one-way operation, effectively turning Github into a
> > > > > > fancy "git-send-email" replacement. That said, it would have the following
> > > > > > benefits for both submitters and maintainers:  
> > > > >
> > > > > What makes this specific to Github PRs? A Github PR is really just a
> > > > > git branch plus a target at least to the extent we would use it here.
> > > > > The more of this that works on just a git branch, the more widely
> > > > > useful it would be.
> > > > >  
> > > > > > - submitters would no longer need to navigate their way around
> > > > > >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> > > > > >   have a patch-friendly outgoing mail gateway to properly contribute patches  
> > > > >
> > > > > Presumably, the bot would rely on get_maintainer.pl or it would get
> > > > > who to send to based on GH repo and reviewers? Without work on
> > > > > get_maintainer.pl, I don't think it will work well beyond simple
> > > > > cases.  
> > > >
> > > > Some sanity test is needed, as otherwise it will end by trying to send
> > > > the patch to a large number of people.  
> > >
> > > I think this system needs to use get_maintainer.pl results as is and
> > > any fixing/filtering/sanity checking needs to go into
> > > get_maintainer.pl itself.
> > > get_maintainer.pl is what is used by lots of contributors, the only
> > > option for any automated systems, what is used by new contributors if
> > > they don't use this system anyway. And even experienced developers
> > > know internal rules only for a few subsystems and use
> > > get_maintainer.pl when sending a one-off patch to another subsystem
> > > (what else?).
> > >
> > > I don't see where we are getting if we accept get_maintainer.pl
> > > produces bad results and needs additional fixing in every system out
> > > there (dozens) and when used by humans. All systems would need the
> > > same filtering/checking rules and they need to keep in sync. What a
> > > kernel developer would even need to do to fix something (add/remove
> > > themselves)? Go and talk to a large unknown set of systems that
> > > duplicate the same additional rules?
> > >
> > > And the only way to surface actual issues with get_maintainer.pl is to
> > > start using it. In fact it's already widely used as is, so I am not
> > > sure it's particularly bad.  
> >
> > I'm not saying that get_maintainer.pl produces bad result. Depending
> > on what is done, it could produce a very large output.
> >
> > Let's suppose that someone do something like globally renaming a
> > widely-used kAPI, e. g. something like:
> >
> >         $ git ls-files|xargs sed s,mutex_,new_mutex_, -i
> >
> > A change like that would touch lots of subsystems, making get_maintainer.pl
> > to spend a lot of time processing it, and producing thousands of
> > entries (btw, we had a change somewhat similar to the above a long time
> > ago when mutex API was introduced and most of the semaphores were converted
> > to use mutex kAPI instead).  
> 
> What I end up doing in those cases is only Cc'ing the subsystem
> maintainers. But that's a manual step of dropping all the driver and
> SoC maintainers.

Yeah, surely it would be a lot better if the maintainer's file would
have a way to distinguish between driver and subsystem maintainers.

> A related problem is if you want to put who should
> apply the patch on To. That's maybe as simple as whether the
> maintainer entry has a git tree. 

It is not that simple. Driver maintainers usually also point to the
subsystem's tree.

One way would be to look at the committer for the file.

Another one would be to check if there are multiple maintainers
for the same file, with different entries. If so, the first
entry is the driver maintainer, and the last one, the subsystem
maintainer. Yet, this is heuristics, and may lead to errors.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 15:11   ` Konstantin Ryabitsev
@ 2021-06-17 15:25     ` Willy Tarreau
  0 siblings, 0 replies; 53+ messages in thread
From: Willy Tarreau @ 2021-06-17 15:25 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, workflows

On Thu, Jun 17, 2021 at 11:11:55AM -0400, Konstantin Ryabitsev wrote:
> On Wed, Jun 16, 2021 at 10:10:11PM +0200, Willy Tarreau wrote:
> > In haproxy we have a bot that forwards PRs to the mailing list. It
> > CCs the author and sends a gentle message saying that reviews are
> > made in public using plain-text e-mails, and that for this reason
> > the PR is automatically closed. And I think it's easier for both
> > sides, because users who are used to PRs would probably count a bit
> > too much on the "github mode", where they're certain that someone
> > will eventually notice that the PR count went from 1137 to 1138 and
> > will have a look, while it's certain that there will be quite some
> > losses, and by warning the submitter upfront there is less surprise.
> 
> Good to know. I was only aware of GitGitGadget, so knowing that HAProxy does
> it as well is useful. Unfortunately, I can't really use much of GitGitGadget
> code as it's too closely integrated into Azure and is written in typescript,
> which I don't know at all.

Note, I was speaking about a return on experience, not particularly about
code. I don't know what language the bot is written in, it's entirely
managed by one of our great contributors, but if you're interested in
exchanging with him I can put you in contact with him off-list.

Anyway if you're interested in seeking some feedback on such practices
from other projects, just look for old ones which adopted github long
after having solved their versionning issues. Very often the github
workflow doesn't suit them at all. Nginx automatically closes PRs for
example. For Apache it's not very clear. FreeBSD uses github as a mirror
and has very few PRs. Gcc also uses it as a mirror and has very few PRs.
Probably these ones use some form of automated bots to close them, or
do that in batches once a year.

Willy

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 14:47   ` Konstantin Ryabitsev
@ 2021-06-17 15:25     ` Steven Rostedt
  2021-06-17 15:48       ` Christoph Hellwig
  2021-06-17 17:15     ` Rob Herring
  1 sibling, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2021-06-17 15:25 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Rob Herring, users, workflows

On Thu, 17 Jun 2021 10:47:41 -0400
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:

> It's not specific to GH at all. The same bot will be able to perform similar
> actions to emails created by git-request-pull, e.g.:
> 
> - submitter runs git request-pull instead of git-format-patch
> - submitter sends the output to a dedicated mailing list like
>   pulls@lists.linux.dev
> - the bot auto-converts these requests into patch series and sends them to
>   proper destinations

I know many people that this will be useful for. Especially those that have
to send with their work email, and have to fight Exchange servers. And
gmail isn't very good at sending patches either.

Let me see if I understand the proposed work flow, and perhaps add some
myself.

1) Someone clones Linus's tree or some subsystem tree into their github
   account.

2) Does some work.

3) Pushes it to their account on github.

4) Runs git pull-request that will go to pulls@lists.kernel.dev

5) Then this bot can pull the request into a kernel.org tree.

6) Runs checkpatch on each commit.

   On errors, sends a report to the submitter, and stops here.

7) Could have the zero-day-bot run against it.

   On errors, send it back to the submitter.

8) If it passes all the above, it then gets broken into a patch series and
   then sent out to the mailing lists and Cc's maintainers based on the
   get_maintainers list. Probably should be filtered a bit.

Is this what is being proposed?

-- Steve

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 15:16           ` Mark Brown
  2021-06-17 15:24             ` Laurent Pinchart
@ 2021-06-17 15:31             ` Paolo Bonzini
  2021-06-17 17:06               ` Stefano Stabellini
  1 sibling, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2021-06-17 15:31 UTC (permalink / raw)
  To: Mark Brown, Konstantin Ryabitsev
  Cc: Christoph Hellwig, Dmitry Vyukov, Jiri Kosina, users, workflows

On 17/06/21 17:16, Mark Brown wrote:
>> Won't this just end up reimplementing a lot of stuff that we already get "for
>> free" from Github and other forges? Yes, I know Github is proprietary, but so
>> are many SMTP gateways used to send the patch series. I don't see how what
>> the GH bot would do is different from:
> I think part of the concern here is that people have some standard
> expectations for how projects they work with on Github are going to
> function so if people end up using Github to submit patches we may end
> up with some culture and process mismatches which could cause issues.

This is true, and it's why I liked the suggestion of a bot that 
auto-closes the pull request but still 1) removes the need for the 
submitter to set up "git send-email" and 2) only does the send after 
basic checks for Signed-off-by and the like (checkpatch is already a bit 
borderline).

Paolo


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 15:24           ` Mauro Carvalho Chehab
@ 2021-06-17 15:38             ` Rob Herring
  2021-06-17 15:45             ` Christoph Hellwig
  1 sibling, 0 replies; 53+ messages in thread
From: Rob Herring @ 2021-06-17 15:38 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Dmitry Vyukov, Konstantin Ryabitsev, users, workflows

On Thu, Jun 17, 2021 at 9:24 AM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Thu, 17 Jun 2021 08:33:29 -0600
> Rob Herring <robh@kernel.org> escreveu:
>
> > On Thu, Jun 17, 2021 at 2:55 AM Mauro Carvalho Chehab
> > <mchehab+huawei@kernel.org> wrote:
> > >
> > > Em Thu, 17 Jun 2021 10:20:31 +0200
> > > Dmitry Vyukov <dvyukov@google.com> escreveu:
> > >
> > > > On Thu, Jun 17, 2021 at 8:53 AM Mauro Carvalho Chehab
> > > > <mchehab+huawei@kernel.org> wrote:
> > > > >
> > > > > Em Wed, 16 Jun 2021 15:11:33 -0600
> > > > > Rob Herring <robh@kernel.org> escreveu:
> > > > >
> > > > > > On Wed, Jun 16, 2021 at 11:18 AM Konstantin Ryabitsev
> > > > > > <konstantin@linuxfoundation.org> wrote:
> > > > > > >
> > > > > > > Hi, all:
> > > > > > >
> > > > > > > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > > > > > > pull requests on a project and convert them into fully well-formed patch
> > > > > > > series. This would be a one-way operation, effectively turning Github into a
> > > > > > > fancy "git-send-email" replacement. That said, it would have the following
> > > > > > > benefits for both submitters and maintainers:
> > > > > >
> > > > > > What makes this specific to Github PRs? A Github PR is really just a
> > > > > > git branch plus a target at least to the extent we would use it here.
> > > > > > The more of this that works on just a git branch, the more widely
> > > > > > useful it would be.
> > > > > >
> > > > > > > - submitters would no longer need to navigate their way around
> > > > > > >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> > > > > > >   have a patch-friendly outgoing mail gateway to properly contribute patches
> > > > > >
> > > > > > Presumably, the bot would rely on get_maintainer.pl or it would get
> > > > > > who to send to based on GH repo and reviewers? Without work on
> > > > > > get_maintainer.pl, I don't think it will work well beyond simple
> > > > > > cases.
> > > > >
> > > > > Some sanity test is needed, as otherwise it will end by trying to send
> > > > > the patch to a large number of people.
> > > >
> > > > I think this system needs to use get_maintainer.pl results as is and
> > > > any fixing/filtering/sanity checking needs to go into
> > > > get_maintainer.pl itself.
> > > > get_maintainer.pl is what is used by lots of contributors, the only
> > > > option for any automated systems, what is used by new contributors if
> > > > they don't use this system anyway. And even experienced developers
> > > > know internal rules only for a few subsystems and use
> > > > get_maintainer.pl when sending a one-off patch to another subsystem
> > > > (what else?).
> > > >
> > > > I don't see where we are getting if we accept get_maintainer.pl
> > > > produces bad results and needs additional fixing in every system out
> > > > there (dozens) and when used by humans. All systems would need the
> > > > same filtering/checking rules and they need to keep in sync. What a
> > > > kernel developer would even need to do to fix something (add/remove
> > > > themselves)? Go and talk to a large unknown set of systems that
> > > > duplicate the same additional rules?
> > > >
> > > > And the only way to surface actual issues with get_maintainer.pl is to
> > > > start using it. In fact it's already widely used as is, so I am not
> > > > sure it's particularly bad.
> > >
> > > I'm not saying that get_maintainer.pl produces bad result. Depending
> > > on what is done, it could produce a very large output.
> > >
> > > Let's suppose that someone do something like globally renaming a
> > > widely-used kAPI, e. g. something like:
> > >
> > >         $ git ls-files|xargs sed s,mutex_,new_mutex_, -i
> > >
> > > A change like that would touch lots of subsystems, making get_maintainer.pl
> > > to spend a lot of time processing it, and producing thousands of
> > > entries (btw, we had a change somewhat similar to the above a long time
> > > ago when mutex API was introduced and most of the semaphores were converted
> > > to use mutex kAPI instead).
> >
> > What I end up doing in those cases is only Cc'ing the subsystem
> > maintainers. But that's a manual step of dropping all the driver and
> > SoC maintainers.
>
> Yeah, surely it would be a lot better if the maintainer's file would
> have a way to distinguish between driver and subsystem maintainers.
>
> > A related problem is if you want to put who should
> > apply the patch on To. That's maybe as simple as whether the
> > maintainer entry has a git tree.
>
> It is not that simple. Driver maintainers usually also point to the
> subsystem's tree.

I somewhat suspected that. Perhaps it should be explicit then.

Or remove those as it's redundant information.

> One way would be to look at the committer for the file.
>
> Another one would be to check if there are multiple maintainers
> for the same file, with different entries. If so, the first
> entry is the driver maintainer, and the last one, the subsystem
> maintainer. Yet, this is heuristics, and may lead to errors.

That's every binding file. :( There's me, the subsystem maintainer and
the file/device maintainer. Could be a 4th one if there's file
wildcard matching.

Rob

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 15:24           ` Mauro Carvalho Chehab
  2021-06-17 15:38             ` Rob Herring
@ 2021-06-17 15:45             ` Christoph Hellwig
  1 sibling, 0 replies; 53+ messages in thread
From: Christoph Hellwig @ 2021-06-17 15:45 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rob Herring, Dmitry Vyukov, Konstantin Ryabitsev, users, workflows

On Thu, Jun 17, 2021 at 05:24:50PM +0200, Mauro Carvalho Chehab wrote:
> > > A change like that would touch lots of subsystems, making get_maintainer.pl
> > > to spend a lot of time processing it, and producing thousands of
> > > entries (btw, we had a change somewhat similar to the above a long time
> > > ago when mutex API was introduced and most of the semaphores were converted
> > > to use mutex kAPI instead).  
> > 
> > What I end up doing in those cases is only Cc'ing the subsystem
> > maintainers. But that's a manual step of dropping all the driver and
> > SoC maintainers.
> 
> Yeah, surely it would be a lot better if the maintainer's file would
> have a way to distinguish between driver and subsystem maintainers.

Well, the problem here is really that get_mainters is broken and things
anyone touching a file would care about future patches.  Right now the
only workaround for that is to get yourself added to
.get_maintainer.ignore.

The actual MAINTAINERS file processing issues are minor compared
to that.  One thing that I'd love is a way to express that yes, there
are maintainers, and no you should not Cc them but just send the
patches to the list.  Maybe even with a core files vs drivers split
as suggested by you above.  But compared to the messed up git
heuristics that is pretty much a minor issue.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 15:25     ` Steven Rostedt
@ 2021-06-17 15:48       ` Christoph Hellwig
  2021-06-17 15:53         ` Laurent Pinchart
  0 siblings, 1 reply; 53+ messages in thread
From: Christoph Hellwig @ 2021-06-17 15:48 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Konstantin Ryabitsev, Rob Herring, users, workflows

On Thu, Jun 17, 2021 at 11:25:56AM -0400, Steven Rostedt wrote:
> Let me see if I understand the proposed work flow, and perhaps add some
> myself.
> 
> 1) Someone clones Linus's tree or some subsystem tree into their github
>    account.
> 
> 2) Does some work.
> 
> 3) Pushes it to their account on github.
> 
> 4) Runs git pull-request that will go to pulls@lists.kernel.dev
> 
> 5) Then this bot can pull the request into a kernel.org tree.
> 
> 6) Runs checkpatch on each commit.
> 
>    On errors, sends a report to the submitter, and stops here.
> 
> 7) Could have the zero-day-bot run against it.
> 
>    On errors, send it back to the submitter.
> 
> 8) If it passes all the above, it then gets broken into a patch series and
>    then sent out to the mailing lists and Cc's maintainers based on the
>    get_maintainers list. Probably should be filtered a bit.
> 
> Is this what is being proposed?

Minus checkpatch.pl which actively causes harm these days this would be
incredibly useful.  I'm all for this while I'm totally opposed to
plugging into the fashionable SAAS desaster of the day.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 15:48       ` Christoph Hellwig
@ 2021-06-17 15:53         ` Laurent Pinchart
  0 siblings, 0 replies; 53+ messages in thread
From: Laurent Pinchart @ 2021-06-17 15:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Steven Rostedt, Konstantin Ryabitsev, Rob Herring, users, workflows

Hi Christoph,

On Thu, Jun 17, 2021 at 04:48:59PM +0100, Christoph Hellwig wrote:
> On Thu, Jun 17, 2021 at 11:25:56AM -0400, Steven Rostedt wrote:
> > Let me see if I understand the proposed work flow, and perhaps add some
> > myself.
> > 
> > 1) Someone clones Linus's tree or some subsystem tree into their github
> >    account.
> > 
> > 2) Does some work.
> > 
> > 3) Pushes it to their account on github.
> > 
> > 4) Runs git pull-request that will go to pulls@lists.kernel.dev
> > 
> > 5) Then this bot can pull the request into a kernel.org tree.
> > 
> > 6) Runs checkpatch on each commit.
> > 
> >    On errors, sends a report to the submitter, and stops here.
> > 
> > 7) Could have the zero-day-bot run against it.
> > 
> >    On errors, send it back to the submitter.
> > 
> > 8) If it passes all the above, it then gets broken into a patch series and
> >    then sent out to the mailing lists and Cc's maintainers based on the
> >    get_maintainers list. Probably should be filtered a bit.
> > 
> > Is this what is being proposed?
> 
> Minus checkpatch.pl which actively causes harm these days this would be
> incredibly useful.  I'm all for this while I'm totally opposed to
> plugging into the fashionable SAAS desaster of the day.

For the heuristics-based checkers that don't provide a 100% trustable
pass/fail result, we could send a report to the submitter with the
option to override the checker and go ahead, or submit a fixed pull
request. Assuming that good faith would be the norm rather than the
exception, I would expect submitters to take this into account more
often than not, which can reduce the load on maintainers.

This however doesn't solve the problem of differences between subsystems
in, for instance, coding style.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 15:24             ` Laurent Pinchart
@ 2021-06-17 16:36               ` Geert Uytterhoeven
  2021-06-17 18:43               ` Miguel Ojeda
  1 sibling, 0 replies; 53+ messages in thread
From: Geert Uytterhoeven @ 2021-06-17 16:36 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Mark Brown, Konstantin Ryabitsev, Christoph Hellwig,
	Dmitry Vyukov, Jiri Kosina, users, workflows

Hi Laurent,

On Thu, Jun 17, 2021 at 5:24 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
> On Thu, Jun 17, 2021 at 04:16:59PM +0100, Mark Brown wrote:
> > On Thu, Jun 17, 2021 at 10:57:28AM -0400, Konstantin Ryabitsev wrote:
> > > On Thu, Jun 17, 2021 at 11:09:34AM +0100, Christoph Hellwig wrote:
> > > > Because I don't waste my time on the kind of crap that comes from
> > > > github.  If you build a separate webinterface that allows anyone to send
> > > > a proper series from a git tree that is all fine.  But github is toxic.
> >
> > > Won't this just end up reimplementing a lot of stuff that we already get "for
> > > free" from Github and other forges? Yes, I know Github is proprietary, but so
> > > are many SMTP gateways used to send the patch series. I don't see how what
> > > the GH bot would do is different from:
> >
> > I think part of the concern here is that people have some standard
> > expectations for how projects they work with on Github are going to
> > function so if people end up using Github to submit patches we may end
> > up with some culture and process mismatches which could cause issues.
>
> There are some features (or lack thereof) of git..b that I suspect
> actively decrease the quality of the hosted software. For instance, the
> inability to comment on the commit messages during review can play a
> role in the average low quality of those messages. Similarly, review is
> often based on changes, not on individual commits, which results in
> commits being badly split (or not split at all, it's common to see very
> large commits with a "fix stuff" commit message).

I thought so, too, until I received a first comment on an individual commit
in a github PR yesterday[1].

> Developers who have only been exposed to those platforms are very likely
> to never have learnt the importance of commit messages, and of proper
> split of changes across commits. Those are issues that are inherent to
> those platforms and that we will likely need to handle in an automated
> way (at least to some extent) or maintainers will become crazy (I know
> we already suffer from those issues with the mailing list-based
> workflow, but I believe it would get worse, not better, and some of our
> maintainers are already suffering way more than they should).

That's definitely not a problem that originated on git..b. Been like that
in the corporate world before.  Guess what happened after switching
from a more-forgiving VCS to git, and seeing

    Aborting commit due to empty commit message.

? ... Dummy commit messages, of course.

[1] https://github.com/esmil/linux/commit/f5b077c3e80d85b1c2749999c9f74491f69a6ceb#comments

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 15:31             ` Paolo Bonzini
@ 2021-06-17 17:06               ` Stefano Stabellini
  2021-06-17 22:35                 ` Jiri Kosina
  0 siblings, 1 reply; 53+ messages in thread
From: Stefano Stabellini @ 2021-06-17 17:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Mark Brown, Konstantin Ryabitsev, Christoph Hellwig,
	Dmitry Vyukov, Jiri Kosina, users, workflows

On Thu, 17 Jun 2021, Paolo Bonzini wrote:
> On 17/06/21 17:16, Mark Brown wrote:
> > > Won't this just end up reimplementing a lot of stuff that we already get
> > > "for
> > > free" from Github and other forges? Yes, I know Github is proprietary, but
> > > so
> > > are many SMTP gateways used to send the patch series. I don't see how what
> > > the GH bot would do is different from:
> > I think part of the concern here is that people have some standard
> > expectations for how projects they work with on Github are going to
> > function so if people end up using Github to submit patches we may end
> > up with some culture and process mismatches which could cause issues.
> 
> This is true, and it's why I liked the suggestion of a bot that auto-closes
> the pull request but still 1) removes the need for the submitter to set up
> "git send-email" and 2) only does the send after basic checks for
> Signed-off-by and the like (checkpatch is already a bit borderline).

+1 for this idea

This would be very useful. Obviously I have no issues with emails and
everyone in this thread has no issues with emails otherwise we wouldn't
be here. (Personally I despise the github workflow.)


But newcomers often complain that it is too difficult to configure their
company email to work with git send-email and the LKML. Things such as:

- the company only offers Outlook as a client
- the company automatically adds the annoying disclaimer at the end of
  every email ("IMPORTANT NOTICE: The contents of this email and any
  attachments are confidential and may also be privileged...")

My comeback was: you don't have to use the company email. Setup your own
email. Their answer was that they are not allowed to do company work on
their private emails.


The GH bot would allow them to skip the git send-email setup. However,
in all fairness, they would still be stuck to Outlook with the
disclaimer to reply to code reviews unless they do something.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 18:13     ` Miguel Ojeda
@ 2021-06-17 17:07       ` Serge E. Hallyn
  0 siblings, 0 replies; 53+ messages in thread
From: Serge E. Hallyn @ 2021-06-17 17:07 UTC (permalink / raw)
  To: Miguel Ojeda; +Cc: Konstantin Ryabitsev, Johannes Berg, users, workflows

On Wed, Jun 16, 2021 at 08:13:55PM +0200, Miguel Ojeda wrote:
> On Wed, Jun 16, 2021 at 7:55 PM Konstantin Ryabitsev
> <konstantin@linuxfoundation.org> wrote:
> >
> > Unless I'm misreading the docs, the submitter should be able to reopen the
> > closed pull request, so it's not a permanent action. This may actually make it
> > easier to track things for v1/v2, since we wouldn't need to continuously poll
> > open pull requests to see if any of them changed.
> 
> For versioning, what I request at the moment is that people force-push
> their PR, overwriting the previous set of commits, rather than sending
> a new PR. Then people write in the GitHub-PR-message (which we use as
> the cover letter) the diff/updates, if any. The bot detects changes
> and pushes a new review every time, and hides its own past messages
> (i.e. for previous versions).

In Jenkins that gives you /1 /2 /3 etc versions, but IIRC doesn't doing
this in github lead to lost commits?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 14:47   ` Konstantin Ryabitsev
  2021-06-17 15:25     ` Steven Rostedt
@ 2021-06-17 17:15     ` Rob Herring
  1 sibling, 0 replies; 53+ messages in thread
From: Rob Herring @ 2021-06-17 17:15 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, workflows

On Thu, Jun 17, 2021 at 8:47 AM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> On Wed, Jun 16, 2021 at 03:11:33PM -0600, Rob Herring wrote:
> > > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > > pull requests on a project and convert them into fully well-formed patch
> > > series. This would be a one-way operation, effectively turning Github into a
> > > fancy "git-send-email" replacement. That said, it would have the following
> > > benefits for both submitters and maintainers:
> >
> > What makes this specific to Github PRs? A Github PR is really just a
> > git branch plus a target at least to the extent we would use it here.
> > The more of this that works on just a git branch, the more widely
> > useful it would be.
>
> It's not specific to GH at all. The same bot will be able to perform similar
> actions to emails created by git-request-pull, e.g.:
>
> - submitter runs git request-pull instead of git-format-patch
> - submitter sends the output to a dedicated mailing list like
>   pulls@lists.linux.dev
> - the bot auto-converts these requests into patch series and sends them to
>   proper destinations

This seems like 2 separate problems. The first is automating the steps
from a branch to patch series. The second is just avoiding email
configuration issues.
The first would be useful to everyone, so it would be great if we
could keep these as separate tools. This is why b4 is so great and has
been pretty widely adopted. Maintainers can plug-in whatever parts of
it they want for their existing workflow.


> This is more cumbersome to implement, though, which is why I want to get it
> done with GH first, as this gets us some immediate perks:
>
> 1. we get a fast, stable remote to pull from instead of potentially slow,
>    broken remote that's intermittently working
> 2. we can offload all sanity checking to github instead of reimplementing them
>    with our own CI
> 3. we end up doing a lot less state tracking for v1..v2..v3 with github
>
> Once the GH implementation is working, I can adapt it to also support other
> forges and pull requests sent to mailing lists.
>
> > > - submitters would no longer need to navigate their way around
> > >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> > >   have a patch-friendly outgoing mail gateway to properly contribute patches
> >
> > Presumably, the bot would rely on get_maintainer.pl or it would get
> > who to send to based on GH repo and reviewers? Without work on
> > get_maintainer.pl, I don't think it will work well beyond simple
> > cases.
>
> The bot will actually rely on git-send-email, which can be configured to use
> "tocmd" and "cccmd" to get the necessary info from get_maintainer.pl. E.g. in
> my tests I have the following:
>
>     tocmd = "$(git rev-parse --show-toplevel)/scripts/get_maintainer.pl --norolestats --nol"
>     cccmd = "$(git rev-parse --show-toplevel)/scripts/get_maintainer.pl --norolestats --nom"
>
> This does the right thing *most* of the time, and if it's not doing the right
> thing, then it's the fault of get_maintainer.pl. :)

True, but I suspect the complaints will be directed at the bot rather
than generating fixes to get_maintainer.pl.

> > > - subsystem maintainers can configure whatever CI pre-checks they want before
> > >   the series is sent to them for review (and we can work on a library of
> > >   Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
> >
> > What about all the patches that don't come from the GH PR? Those need
> > CI pre-checks too. We're going to implement CI twice?
>
> Most likely, yes, though we can certainly weigh how much we want to do on the
> GH side. One thing I've thought about is letting bot inject a Tested-by: into
> the patches it creates in order to reflect what's been already done, e.g.:
>
>     Tested-by: GH Preflight Bot <ghbot@kernel.org>
>
> There is indeed a lot of duplicated CI testing happening for Linux patches,
> but it's a separate problem that I believe is being looked at by the Kernel CI
> folks.
>
> > The biggest issue I have on CI checks is applying patches. My algorithm is
> > apply to my current base (last rc1 typically) or give up. I'm sure it could
> > be a lot smarter trying several branches or looking at base-commit (not
> > consistently used) or the git diff treeish hashes. What I'd really like is
> > some bot or script that's applying series and publishing git branches with a
> > messageid to git branch tool. 0-day is doing this now. Basically, the
> > opposite direction as others have mentioned.
>
> b4 will try to do this for you with -g, but it will only check against the
> last 10 tags, as otherwise this takes a very long time, especially on series
> that modify a lot of files. It can probably be a lot more intelligent about it
> and work more like git bisect. I'll look into improving this feature.

Humm, I can't seem to get -g to tell me anything but 'current tree'.

How I think this could work is extracting all the files and their base
treeish hash from a patch and iterate thru branches (user and/or
project specific) and find branch which has matching set of treeish
hashes. Seems like this wouldn't be too hard for someone that knows
the git internals.

Rob

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 15:24             ` Laurent Pinchart
  2021-06-17 16:36               ` Geert Uytterhoeven
@ 2021-06-17 18:43               ` Miguel Ojeda
  1 sibling, 0 replies; 53+ messages in thread
From: Miguel Ojeda @ 2021-06-17 18:43 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Mark Brown, Konstantin Ryabitsev, Christoph Hellwig,
	Dmitry Vyukov, Jiri Kosina, users, workflows

On Thu, Jun 17, 2021 at 5:24 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> There are some features (or lack thereof) of git..b that I suspect
> actively decrease the quality of the hosted software. For instance, the
> inability to comment on the commit messages during review can play a
> role in the average low quality of those messages. Similarly, review is

It only decreases the quality of the repository if you allow the
commits to go in.

The same happens in the LKML -- some people have sent bad messages,
but we correct them and they learn.

Also, as soon as you have a queue of well-written PRs (equivalently:
messages in lore), then it is easier for others to know what the
customs for a given project are.

> Developers who have only been exposed to those platforms are very likely
> to never have learnt the importance of commit messages, and of proper
> split of changes across commits. Those are issues that are inherent to
> those platforms and that we will likely need to handle in an automated
> way (at least to some extent) or maintainers will become crazy (I know
> we already suffer from those issues with the mailing list-based
> workflow, but I believe it would get worse, not better, and some of our
> maintainers are already suffering way more than they should).

I am not sure it would get worse: in Rust for Linux, contributors that
were new to the kernel process learnt quite quickly how to write
proper commit messages.

I think all depends on how willing to learn people is and how strict
maintainers are enforcing things.

But I agree that a bot that automatically performs the basic reviews
helps *a lot*.

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-16 17:18 RFC: Github PR bot questions Konstantin Ryabitsev
                   ` (9 preceding siblings ...)
  2021-06-17  8:24 ` Christoph Hellwig
@ 2021-06-17 20:42 ` Brendan Higgins
  10 siblings, 0 replies; 53+ messages in thread
From: Brendan Higgins @ 2021-06-17 20:42 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, workflows

On Wed, Jun 16, 2021 at 10:18 AM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> Hi, all:
>
> I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> pull requests on a project and convert them into fully well-formed patch
> series. This would be a one-way operation, effectively turning Github into a
> fancy "git-send-email" replacement. That said, it would have the following
> benefits for both submitters and maintainers:

Neat! I have been going from the other direction, trying to take
patches sent to LKML (actually from the kselftest mailing list) and
upload them to Gerrit:

https://linux-review.googlesource.com/

My thinking was that most developers would want to keep sending
patches via git-send-email, but sometimes it is really nice to have a
side-by side diff without having to download a patch and apply it
somewhere. Also, sometimes it is nice to have all the comments on code
in a single place (I have got it able to extract comments from emails
- although not super reliably).

I was thinking it might be nice to be able to go both directions, but
it had not been on the top of my priority list.

> - submitters would no longer need to navigate their way around
>   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
>   have a patch-friendly outgoing mail gateway to properly contribute patches

I like having the tools run automatically, that was something I wanted
to do with my LKML-Gerrit-Bridge at some point.

> - subsystem maintainers can configure whatever CI pre-checks they want before
>   the series is sent to them for review (and we can work on a library of
>   Github actions, so nobody needs to reimplement checkpatch.pl multiple times)

That would be awesome!

> - the bot should (eventually) be clever enough to automatically track v1..vX
>   on pull request updates, assuming the API makes it straightforward

Yeah, that's something I have wanted to do with the
LKML-Gerrit-Bridge. This would be super awesome!

> A this point, I need your input to make sure I'm not going down any wrong
> paths:
>
> - My general assumption is that putting this bot on github.com/torvalds/linux
>   would not be useful, as this will probably result in more noise than signal.
>   I expect that subsystem maintainers would prefer to configure their own
>   GitHub projects so they can have full control on what kind of CI prechecks
>   must succeed before the series is sent out. Is that a valid assumption, or
>   should I be working towards having a single point of submission on each
>   forge platform (Github, Gitlab, etc)?

I would like to be able to run KUnit tests, and I am sure other
maintainers would want to run other tests. Beyond that, I don't think
I would want to deviate from your above defaults too much.

> - We can *probably* track when patch series get applied and auto-close pull
>   requests that are accepted -- but it's not going to be perfect (we'd
>   basically be using git-patch-id to match commits to pull requests). Or is it
>   better to auto-close the pull request right after it's sent to the list with
>   a message like "thank you, please monitor your email for the rest of the
>   process"? The latter is much easier for me, of course. :)

That would be super awesome as well. My plan with the
LKML-Gerrit-Bridge was to leave it open until applied and then have
some sort of timeout.

> I'll probably have more questions as I go along, but I wanted to start with
> these two.

Not sure how far along you are with this. But I would love to see this
happen. It sounds like you are planning on supporting most of the
features I am trying to get in LKML-Gerrit-Bridge, but not all - I
mainly would like to get patches uploaded from the mailing lists. I am
not sure how much we could collaborate here depending on how far along
you are. I saw some concerns in some other emails about relying on
open source, Gerrit could be a solution to that. Anyway, let me know
if you are interested in my project. You can see my code here:

https://github.com/google/lkml-gerrit-bridge

Also note, I said "I" above, but I have gotten some contributions from
others. I'm not trying to take credit away from them, more that I
don't want them to take blame for any of my bad ideas. Also, this is
in NO WAY a plug for my project. It's not very stable right now, and
has some issues - it is very experimental. I am just hoping that it
can maybe be of some help to you.

Cheers

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: RFC: Github PR bot questions
  2021-06-17 17:06               ` Stefano Stabellini
@ 2021-06-17 22:35                 ` Jiri Kosina
  0 siblings, 0 replies; 53+ messages in thread
From: Jiri Kosina @ 2021-06-17 22:35 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Paolo Bonzini, Mark Brown, Konstantin Ryabitsev,
	Christoph Hellwig, Dmitry Vyukov, users, workflows

On Thu, 17 Jun 2021, Stefano Stabellini wrote:

> But newcomers often complain that it is too difficult to configure their
> company email to work with git send-email and the LKML. Things such as:
> 
> - the company only offers Outlook as a client
> - the company automatically adds the annoying disclaimer at the end of
>   every email ("IMPORTANT NOTICE: The contents of this email and any
>   attachments are confidential and may also be privileged...")

AFAIK Linux Foundation is currently in the process of setting up a 
@linux.dev e-mail domain for kernel developers (hosted by migadu.com) that 
doesn't suffer from any of the above.

So either the employer wants to be recognized in the commit / LKML 
messages, and they are able to get their mail systems to behave [1], or 
they are not, and then people will eventually be able to use LF-provided 
@linux.dev address, and cast an indirect public shame on their employer by 
doing so :)

[1] @linux.intel.com, @linux.ibm.com, @suse.[de,cz], you name it .. that's 
    exactly the reason for existence of those email domains

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2021-06-17 22:35 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-16 17:18 RFC: Github PR bot questions Konstantin Ryabitsev
2021-06-16 17:24 ` Drew DeVault
2021-06-16 17:47 ` Johannes Berg
2021-06-16 17:55   ` Konstantin Ryabitsev
2021-06-16 18:13     ` Miguel Ojeda
2021-06-17 17:07       ` Serge E. Hallyn
     [not found] ` <CAJhbpm_BgbSx581HU0mTCkcE28n_hRx=tv74az_mE2VBmPtrVA@mail.gmail.com>
2021-06-16 18:05   ` Konstantin Ryabitsev
2021-06-16 18:11 ` Miguel Ojeda
2021-06-16 18:22   ` Konstantin Ryabitsev
2021-06-16 18:38     ` Miguel Ojeda
2021-06-16 20:10 ` Willy Tarreau
2021-06-17 15:11   ` Konstantin Ryabitsev
2021-06-17 15:25     ` Willy Tarreau
2021-06-16 20:24 ` Linus Torvalds
2021-06-17 15:09   ` Konstantin Ryabitsev
2021-06-16 21:11 ` Rob Herring
2021-06-16 21:18   ` Stefano Stabellini
2021-06-16 21:59     ` Rob Herring
2021-06-16 22:33   ` James Bottomley
2021-06-17 14:18     ` Rob Herring
2021-06-17 14:27       ` James Bottomley
2021-06-17  6:52   ` Mauro Carvalho Chehab
2021-06-17  8:20     ` Dmitry Vyukov
2021-06-17  8:55       ` Mauro Carvalho Chehab
2021-06-17  9:33         ` Dmitry Vyukov
2021-06-17  9:52           ` Geert Uytterhoeven
2021-06-17 14:33         ` Rob Herring
2021-06-17 15:24           ` Mauro Carvalho Chehab
2021-06-17 15:38             ` Rob Herring
2021-06-17 15:45             ` Christoph Hellwig
2021-06-17 14:02     ` Rob Herring
2021-06-17 14:47   ` Konstantin Ryabitsev
2021-06-17 15:25     ` Steven Rostedt
2021-06-17 15:48       ` Christoph Hellwig
2021-06-17 15:53         ` Laurent Pinchart
2021-06-17 17:15     ` Rob Herring
2021-06-17  6:37 ` Dmitry Vyukov
2021-06-17  7:30 ` Greg KH
2021-06-17 14:59   ` Konstantin Ryabitsev
2021-06-17  8:24 ` Christoph Hellwig
2021-06-17  8:33   ` Jiri Kosina
2021-06-17  9:52     ` Dmitry Vyukov
2021-06-17 10:09       ` Christoph Hellwig
2021-06-17 14:57         ` Konstantin Ryabitsev
2021-06-17 15:16           ` Mark Brown
2021-06-17 15:24             ` Laurent Pinchart
2021-06-17 16:36               ` Geert Uytterhoeven
2021-06-17 18:43               ` Miguel Ojeda
2021-06-17 15:31             ` Paolo Bonzini
2021-06-17 17:06               ` Stefano Stabellini
2021-06-17 22:35                 ` Jiri Kosina
2021-06-17 14:23       ` Miguel Ojeda
2021-06-17 20:42 ` Brendan Higgins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.