[Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
@ 2018-09-04 20:16 Sasha Levin
  2018-09-04 20:53 ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Sasha Levin @ 2018-09-04 20:16 UTC (permalink / raw)
  To: ksummit-discuss

Hi folks,

I've previously sent a mail
(https://lwn.net/ml/linux-kernel/20180501163818.GD1468@sasha-vm/) about
this topic, so I won't repeat everything here.

>From the dicussion that ensued, it seems that there is an agreement that
there is a problem with the process, but different maintainers have
different ideas on how to resolve this. Achieving any sort of a
consensus over mails seems to be impossible.

I'd like to have a discussion about this topic. It is my belief that
this is a "low hanging fruit" and by addressing it we can prevent a good
portion of the bugs we have to deal with from creeping in in the first
place.

If there is more information folks need to support making a decision I'd
be happy to work on it so it will be available during discussions. I
could also propose a few possible solutions, but I suspect we have more
than enough ideas about how that might look - we just need to reach a
consensus about one.

Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-04 20:16 [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches Sasha Levin
@ 2018-09-04 20:53 ` Daniel Vetter
  2018-09-05 14:17   ` Steven Rostedt
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2018-09-04 20:53 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit-discuss

On Tue, Sep 4, 2018 at 10:16 PM, Sasha Levin via Ksummit-discuss
<ksummit-discuss@lists.linuxfoundation.org> wrote:
> Hi folks,
>
> I've previously sent a mail
> (https://lwn.net/ml/linux-kernel/20180501163818.GD1468@sasha-vm/) about
> this topic, so I won't repeat everything here.

That thread sprawled all over the place, and I honestly don't even
recollect half of it. I'd very much appreciate a summary of the
problems and different viewpoints shared in there. Otherwise I think
we'll just have the exact same discussion once more ...

Thanks, Daniel

> From the dicussion that ensued, it seems that there is an agreement that
> there is a problem with the process, but different maintainers have
> different ideas on how to resolve this. Achieving any sort of a
> consensus over mails seems to be impossible.
>
> I'd like to have a discussion about this topic. It is my belief that
> this is a "low hanging fruit" and by addressing it we can prevent a good
> portion of the bugs we have to deal with from creeping in in the first
> place.
>
> If there is more information folks need to support making a decision I'd
> be happy to work on it so it will be available during discussions. I
> could also propose a few possible solutions, but I suspect we have more
> than enough ideas about how that might look - we just need to reach a
> consensus about one.
>
>
> Thanks,
> Sasha
> _______________________________________________
> Ksummit-discuss mailing list
> Ksummit-discuss@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-04 20:53 ` Daniel Vetter
@ 2018-09-05 14:17   ` Steven Rostedt
  2018-09-07  0:51     ` Sasha Levin
  0 siblings, 1 reply; 138+ messages in thread
From: Steven Rostedt @ 2018-09-05 14:17 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit-discuss

On Tue, 4 Sep 2018 22:53:23 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> > I've previously sent a mail
> > (https://lwn.net/ml/linux-kernel/20180501163818.GD1468@sasha-vm/) about
> > this topic, so I won't repeat everything here.  
> 
> That thread sprawled all over the place, and I honestly don't even
> recollect half of it. I'd very much appreciate a summary of the
> problems and different viewpoints shared in there. Otherwise I think
> we'll just have the exact same discussion once more ...

I was thinking the exact same thing.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-05 14:17   ` Steven Rostedt
@ 2018-09-07  0:51     ` Sasha Levin
  2018-09-07  1:09       ` Steven Rostedt
  2018-09-07  1:09       ` Linus Torvalds
  0 siblings, 2 replies; 138+ messages in thread
From: Sasha Levin @ 2018-09-07  0:51 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

On Wed, Sep 05, 2018 at 10:17:10AM -0400, Steven Rostedt wrote:
>On Tue, 4 Sep 2018 22:53:23 +0200
>Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
>> > I've previously sent a mail
>> > (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flwn.net%2Fml%2Flinux-kernel%2F20180501163818.GD1468%40sasha-vm%2F&amp;data=02%7C01%7CAlexander.Levin%40microsoft.com%7C7643e868bb5e4d76817c08d6133a46cb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636717538346291007&amp;sdata=MEleczglT%2FGM46uI0F4hwbofY9ARD2NEQLS7NW37ntg%3D&amp;reserved=0) about
>> > this topic, so I won't repeat everything here.
>>
>> That thread sprawled all over the place, and I honestly don't even
>> recollect half of it. I'd very much appreciate a summary of the
>> problems and different viewpoints shared in there. Otherwise I think
>> we'll just have the exact same discussion once more ...
>
>I was thinking the exact same thing.

Assuming you've read the original mail, it appears that most parties who
participated in the discussion agreed that there's an issue where
patches that go in during (late) -rc cycles seems to be less tested and
are buggier than they should be.

Most of that thread discussed possible solutions such as:

 - Not taking non-critical patches past -rcX (-rc4 seemed to be a
   popular one).
 - -rc patches must fix something introduced in the current merge
   window. Patches fixing anything older should go in the next merge
   window.
 - 1 or more weeks at the end of the cycle where nothing is taken at all
   and we only run testing.
 - Mandate X days/weeks in linux-next before a patch goes in.

We've never reached a conclusion because maintainers have different
approach to this and different pain points, so it seemed difficult
finding a one-size-fits-all solution.

If you look at the few last -rc cycles of every release in recent
history, almost all of them were written within 2-3 days of being
merged. There is no way to properly test these patches.

Furthermore, these patches often end up in Stable, which is quite bad of
the Stable kernel's regression rates.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  0:51     ` Sasha Levin
@ 2018-09-07  1:09       ` Steven Rostedt
  2018-09-07 20:12         ` Greg KH
  2018-09-07  1:09       ` Linus Torvalds
  1 sibling, 1 reply; 138+ messages in thread
From: Steven Rostedt @ 2018-09-07  1:09 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit-discuss

On Fri, 7 Sep 2018 00:51:42 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> Assuming you've read the original mail, it appears that most parties who
> participated in the discussion agreed that there's an issue where
> patches that go in during (late) -rc cycles seems to be less tested and
> are buggier than they should be.
> 
> Most of that thread discussed possible solutions such as:
> 
>  - Not taking non-critical patches past -rcX (-rc4 seemed to be a
>    popular one).
>  - -rc patches must fix something introduced in the current merge
>    window. Patches fixing anything older should go in the next merge
>    window.

Interesting, because this is exactly what Linus blew up about that made
headlines and a loss of a kernel developer 5 years ago:

  https://lore.kernel.org/lkml/1373593870.17876.70.camel@gandalf.local.home/T/#mb7018718ce288b55fe041778721004cd62cd00a1


>  - 1 or more weeks at the end of the cycle where nothing is taken at all
>    and we only run testing.
>  - Mandate X days/weeks in linux-next before a patch goes in.
> 
> We've never reached a conclusion because maintainers have different
> approach to this and different pain points, so it seemed difficult
> finding a one-size-fits-all solution.
> 
> If you look at the few last -rc cycles of every release in recent
> history, almost all of them were written within 2-3 days of being
> merged. There is no way to properly test these patches.

Yep, and that's caused by the design of the kernel development work
flow. Linus sets a fast paced cycle, and patches will get in fast.
That's actually what makes stable a reason to keep around. If anything,
the wait period from entering Linus's tree to going into stable (for
everything but the embargo like fixes) should probably be a week or
two, being in Linus's tree is usually the best testing of any patch, as
that's the tree that probably gets the most testing. (We don't need QA,
that's what users are for ;-)

-- Steve

> 
> Furthermore, these patches often end up in Stable, which is quite bad of
> the Stable kernel's regression rates.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  0:51     ` Sasha Levin
  2018-09-07  1:09       ` Steven Rostedt
@ 2018-09-07  1:09       ` Linus Torvalds
  2018-09-07  1:49         ` Sasha Levin
  1 sibling, 1 reply; 138+ messages in thread
From: Linus Torvalds @ 2018-09-07  1:09 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Thu, Sep 6, 2018 at 5:51 PM Sasha Levin via Ksummit-discuss
<ksummit-discuss@lists.linuxfoundation.org> wrote:
>
> Assuming you've read the original mail, it appears that most parties who
> participated in the discussion agreed that there's an issue where
> patches that go in during (late) -rc cycles seems to be less tested and
> are buggier than they should be.

Some parties didn't participate, because it was pointless.

Honestly, is that even interesting data?

Seriously.

OF COURSE the patches that go in during late -rc cycles are newer and
less tested. Anything else would be insane and stupid.

Patches that go in through the merge window should have been around
for a while. They should have been in -next. They should have gone
through a lot of testing by the developer, and by all the test-bots
etc we have.

So by any measure, the normal development patches should *absolutely*
be the ones that are

 (a) most tested, most of those patches have been around for a long
time and discussed. Sometimes for months. Sometimes for closer to a
_year_ before a feature really gets merged.

 (b) hopefully the patches I pull during the merge window mostly
pretty normal and noncontroversial. Sure, some of them have bugs too,
but on average, you'd expect a patch to be good.

 (c) not very subtle at all on average. Again, most patches are just
not very "interesting". They're bread-and-butter trivial changes.

Now, compare that to something that goes in late in the rc timeframe.

Ask yourself what kind of patch does that. Really ask yourself that,
and ask yourself what caused that patch to go in late in the rc.

I'll tell you:

 (a) it's likely a nastier issue than most patches. It wasn't some
simple thing, and it wasn't an obvious problem.

 (b) it's subtle. It took a while to even find the bug, much less fix it.

 (c) it sure as hell isn't going to be a patch that has been around
for a long time and that has gone through a lot of linux-next etc.

So OF COURSE the patches that come in late during rc not only see less
testing, but they are for subtler issues to begin with! They are fixes
for unusual corner cases that the developer didn't think of.

So exactly what do you think it proves that late rc patches then might
be buggier than average?

I claim it proves nothing at all. It's just a direct consequence of
late rc patches being _different_, and being much more difficult
issues than your average patch.

Not all patches start out the same. Saying "a higher percentage of rc
patches are buggy and less tested than during the merge window" isn't
even worth commenting on.

                  Linus

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  1:09       ` Linus Torvalds
@ 2018-09-07  1:49         ` Sasha Levin
  2018-09-07  2:31           ` Linus Torvalds
                             ` (4 more replies)
  0 siblings, 5 replies; 138+ messages in thread
From: Sasha Levin @ 2018-09-07  1:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit

On Thu, Sep 06, 2018 at 06:09:43PM -0700, Linus Torvalds wrote:
>On Thu, Sep 6, 2018 at 5:51 PM Sasha Levin via Ksummit-discuss
><ksummit-discuss@lists.linuxfoundation.org> wrote:
>>
>> Assuming you've read the original mail, it appears that most parties who
>> participated in the discussion agreed that there's an issue where
>> patches that go in during (late) -rc cycles seems to be less tested and
>> are buggier than they should be.
>
>Some parties didn't participate, because it was pointless.
>
>Honestly, is that even interesting data?
>
>Seriously.
>
>OF COURSE the patches that go in during late -rc cycles are newer and
>less tested. Anything else would be insane and stupid.
>
>Patches that go in through the merge window should have been around
>for a while. They should have been in -next. They should have gone
>through a lot of testing by the developer, and by all the test-bots
>etc we have.
>
>So by any measure, the normal development patches should *absolutely*
>be the ones that are
>
> (a) most tested, most of those patches have been around for a long
>time and discussed. Sometimes for months. Sometimes for closer to a
>_year_ before a feature really gets merged.
>
> (b) hopefully the patches I pull during the merge window mostly
>pretty normal and noncontroversial. Sure, some of them have bugs too,
>but on average, you'd expect a patch to be good.
>
> (c) not very subtle at all on average. Again, most patches are just
>not very "interesting". They're bread-and-butter trivial changes.
>
>Now, compare that to something that goes in late in the rc timeframe.
>
>Ask yourself what kind of patch does that. Really ask yourself that,
>and ask yourself what caused that patch to go in late in the rc.
>I'll tell you:
>
> (a) it's likely a nastier issue than most patches. It wasn't some
>simple thing, and it wasn't an obvious problem.
>
> (b) it's subtle. It took a while to even find the bug, much less fix it.
>
> (c) it sure as hell isn't going to be a patch that has been around
>for a long time and that has gone through a lot of linux-next etc.
>
>So OF COURSE the patches that come in late during rc not only see less
>testing, but they are for subtler issues to begin with! They are fixes
>for unusual corner cases that the developer didn't think of.

This is where you're wrong because I suspect that you don't see what's
going in (late) -rc cycles

You're saying that patches that come in during -rc cycles are more
difficult and tricky, and simultaneously you're saying that it's
completely fine taking them in without any testing at all. Does that
sound like a viable testing strategy?

>From your description, one would think that folks merge all their shiny
new features during the merge window and then spend the next two months
testing that new code, finding bugs and sending you fixes, and as time
goes by fixes are more difficult so it's okay to sneak them in late
during late -rc cycles.

This is complete bullshit.

Look at v4.17-rc8..v4.18: how many of those commits fix something
introduced in the v4.17 merge window vs fixing something older?

This is a *huge* reason why we see regressions in Stable. Take a look at
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2018-September/005287.html
for a list of recent user visible regressions the CoreOS folks have
observed this year. Do you want to know when they were merged? Let me
help you: all but one were merged in -rc5 or later.

>So exactly what do you think it proves that late rc patches then might
>be buggier than average?

It proves that your rules on what you take during late -rc cycles make 0
sense. It appears that once we passed -rc5 you will take anything that
looks like a fix, even if it's completely unrelated to the current merge
window or if it's riskier to take that patch than revert whatever new
code that was merged in.

>I claim it proves nothing at all. It's just a direct consequence of
>late rc patches being _different_, and being much more difficult
>issues than your average patch.
>
>Not all patches start out the same. Saying "a higher percentage of rc
>patches are buggy and less tested than during the merge window" isn't
>even worth commenting on.

How can you justify sneaking a patch that spent 0 days in linux-next,
never ran through any of our automated test frameworks and was never
tested by a single real user into a Stable kernel release?

What's the point in all the testing effort that's going on if after -rc5
you just say "fuck it" and take stuff that didn't go through any
testing?

What's the rush in pulling in untested fixes for bugs that were
introduced in previous releases? Why can't they wait until the next
merge window?


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  1:49         ` Sasha Levin
@ 2018-09-07  2:31           ` Linus Torvalds
  2018-09-07  2:45             ` Steven Rostedt
  2018-09-07 14:54             ` Sasha Levin
  2018-09-07  2:33           ` Steven Rostedt
                             ` (3 subsequent siblings)
  4 siblings, 2 replies; 138+ messages in thread
From: Linus Torvalds @ 2018-09-07  2:31 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Thu, Sep 6, 2018 at 6:49 PM Sasha Levin
<Alexander.Levin@microsoft.com> wrote:
>
> You're saying that patches that come in during -rc cycles are more
> difficult and tricky, and simultaneously you're saying that it's
> completely fine taking them in without any testing at all. Does that
> sound like a viable testing strategy?

It sounds like *reality*.

Note that I'm making the simple argument that there is a selection
bias going on. The patches coming in are simply different.

What do you suggest you do about a patch that is a bug fix? Delay it
until it has two weeks of testing? Which it won't get, because nobody
actually runs it until it is merged?

THAT is my argument. There _is_ no viable testing strategy once you're
in the bug fix territory.

The testing was hopefully done on the stuff in the merge window, so
that the bug fix territory might be smaller, but once you have a fix
for something, what are your choices?

Wait until the next merge window? Not apply it at all and have it
percolate back in stable?

That sounds _way_ crazier to me.

Revert? Which we do do, btw, but it has its own set of serious
problems too, and we've had bugs due to *that* (because then it turns
out there were patches that built on the code that weren't obvious and
so we had semantic conflicts).

But do you not realize what this means: this means *by*definition*
that the fixes get less testing. That's just how it is.

THAT is my argument. They are statistically very different animals
from the development patches. And they *will* stand out because they
are different, and you'd actually expect them to stand ou tout _more_
the further in the rc series you get.

And then when you look at percentages of breakage, yes, the fixes look
bad. But that, I think, really is because of the fundamental selection
bias.

> Look at v4.17-rc8..v4.18: how many of those commits fix something
> introduced in the v4.17 merge window vs fixing something older?

What is the relevance of that question?

Seriously.

What does it matter whether they fixed something older or something in
that release?

And notice also how it doesn't matter to the bias question. Sure,
fixes come in during the merge window too (and early rc too). But
there they are simply statistically not as noticeable.

> This is a *huge* reason why we see regressions in Stable.

No.

The *stable* people are the ones that were supposed to be the careful ones.

Instead, you use automated scripts and hoover things up, and then you
try to blame the development tree for getting stuff that regresses in
your tree.

What's the logic of that, again?

Now, don't get me wrong. I'd like to get even fewer changes in during
late rc, I do think we actually agree on that. But I don't think that
really changes the *problem*. It just shifts the problem around, it
doesn't change it in any fundamental way. You still end up with the
same situation eventually.

Also, don't get me wrong another way: I'm not actually blaming the
stable people either. Because I think you guys end up being in the
exact same situation - even if *you* are careful, and you delay
applying stable patches, it really doesn't make the problem go away,
it just shifts it later in time.

> Take a look at
> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2018-September/005287.html
> for a list of recent user visible regressions the CoreOS folks have
> observed this year. Do you want to know when they were merged? Let me
> help you: all but one were merged in -rc5 or later.

And hey, here's another way of looking at it: those were seen to be
serious fixes (there's at least one CVE in there) that came in late,
and then the fix had a subtle interaction that people didn't realize
or catch.

The "rc5 or later" is actually time-wise about 1/3 of everything. It's
not some insignificant fraction. And yes, the patches that come in in
that timeframe are often going to be _way_ subtler than the ones that
get delayed until later because people don't think they are as
critical.

> >So exactly what do you think it proves that late rc patches then might
> >be buggier than average?
>
> It proves that your rules on what you take during late -rc cycles make 0
> sense. It appears that once we passed -rc5 you will take anything that
> looks like a fix, even if it's completely unrelated to the current merge
> window or if it's riskier to take that patch than revert whatever new
> code that was merged in.

But that's not the rule.

Post-rc5 has nothing to do with "current merge window". You just made
that up. And that rule would make zero sense indeed.

Basically, during rc5, I should take anything that would be marked for
stable. The only difference between "current merge window" or
"previous" is that _if_ it was actually the current merge window,
there won't be a "stable" tag, because it isn't relevant for older
kernels.

See what I'm saying?

You're basically trying to make the rule be "don't take stuff that is
marked for stable". But *THAT* would be truly incredibly insane, and
actually cut down testing even further, because now you lose the
testing that mainline kernels *do* get (well, I hope they do - more
than linux-next, for sure).

So the "even if it's completely unrelated to the current merge window"
argument of yours makes absolute zero sense.

What would you actually want us to do?

Delay fixes until the next merge window?

> How can you justify sneaking a patch that spent 0 days in linux-next,
> never ran through any of our automated test frameworks and was never
> tested by a single real user into a Stable kernel release?

I'm not doing that. YOU ARE.

I'm putting it into the development release. You're not supposed to
take it without testing. Being *in* the development tree is what gets
it actual real-life testing, Sasha.

Really.

For *stable*, you should be waiting for a week or two before you
actually apply it. That's what Greg claims he does (ie he delays it
until a "one past" rc release - if it went into rc5, he'll take it
after rc6). Of course, there are exceptions there too, but that's my
understanding of what the default stable flow should be.

That way you get *way* better testing than linux-next ever gives you,
because hardly anybody runs linux-next outside of bots (which do find
a lot, don't get me wrong, but they miss a *ton*).

But at the same time, we should all admit that what gets even more
testing is not just when it hits stable, but when it hits a distro
_because_ it hit stable.

Anybody who thinks that that won't show problems that didn't get found
in testing is living in a dream world. It will. The regressions in
stable are inevitable.

It's called "reality". Tough, and we all wish it wasn't all nasty and
complex, but that messiness is fundamentally what makes reality
different from theory.

                     Linus

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  1:49         ` Sasha Levin
  2018-09-07  2:31           ` Linus Torvalds
@ 2018-09-07  2:33           ` Steven Rostedt
  2018-09-07  2:52           ` Guenter Roeck
                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 138+ messages in thread
From: Steven Rostedt @ 2018-09-07  2:33 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Fri, 7 Sep 2018 01:49:31 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> Look at v4.17-rc8..v4.18: how many of those commits fix something
> introduced in the v4.17 merge window vs fixing something older?

I will note that from my own experience, I have found a lot of older
bugs because new code that was added triggers the easier. Or I will
find old bugs while debugging a bug that was introduced in the last
merge window.

Why should this bug that I found and fixed be treated differently than
a bug that was introduced in the merge window. A lot of times, these
older bugs become "crap, this is bad, I need to get this to stable
ASAP" too.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  2:31           ` Linus Torvalds
@ 2018-09-07  2:45             ` Steven Rostedt
  2018-09-07  3:43               ` Linus Torvalds
  2018-09-07  8:40               ` Geert Uytterhoeven
  2018-09-07 14:54             ` Sasha Levin
  1 sibling, 2 replies; 138+ messages in thread
From: Steven Rostedt @ 2018-09-07  2:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit

On Thu, 6 Sep 2018 19:31:18 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> What do you suggest you do about a patch that is a bug fix? Delay it
> until it has two weeks of testing? Which it won't get, because nobody
> actually runs it until it is merged?

A good statistic to take is to see what found a bug that was caused by
something added after rc5. Was it a bot (then there's credence that
running through linux-next would be helpful). Is it because someone
triggered it because it was in Linus's tree (which means the bug
wouldn't show up until it hit Linus's tree). Or was it something that
was discovered when it got into the distros?

Really, the only testing coverage that a patch gets in linux-next is by
the bots that are run on them. I will agree that the number of bots and
automated tests are getting better. I don't push even late fixes to
Linus without waiting for the zero day bot to give me the OK, because
sometimes it finds a subtle mistake I made, which would embarrass me if
it was found after I pushed it to Linus.

Another issue about having fixes sit in linux-next for some time after
-rc5, is that by that time, linux-next is filled with new development
code waiting for the next merge window. A subtle fix for a bug that
wasn't caught by linux-next in the first place (how else would that bug
still be around by rc5?) is highly likely not to catch a bug with the
fix to that subtle bug.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  1:49         ` Sasha Levin
  2018-09-07  2:31           ` Linus Torvalds
  2018-09-07  2:33           ` Steven Rostedt
@ 2018-09-07  2:52           ` Guenter Roeck
  2018-09-07 14:37             ` Laura Abbott
  2018-09-07  3:38           ` Al Viro
  2018-09-07  4:27           ` Theodore Y. Ts'o
  4 siblings, 1 reply; 138+ messages in thread
From: Guenter Roeck @ 2018-09-07  2:52 UTC (permalink / raw)
  To: Sasha Levin, Linus Torvalds; +Cc: ksummit

On 09/06/2018 06:49 PM, Sasha Levin via Ksummit-discuss wrote:
> 
> This is a *huge* reason why we see regressions in Stable. Take a look at
> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2018-September/005287.html
> for a list of recent user visible regressions the CoreOS folks have
> observed this year. Do you want to know when they were merged? Let me
> help you: all but one were merged in -rc5 or later.
> 

My conclusion from that would be that patches are applied to stable
before they had time to soak in mainline. Your argument against
accepting patches into mainline might as well be applied to patches
applied to stable.

I think you are a bit hypocritical arguing that patches should be
restricted from being accepted into mainline ... when at the same
time patches are at least sometimes applied almost immediately to
stable releases from there. Plus, some if not many of the patches
applied to stable releases nowadays don't really fix critical or
even severe bugs. If the patches mentioned above indeed caused
regressions in mainline, those regressions should have been found
and fixed _before_ the patches made it into stable releases.
Blaming mainline for the problem is just shifting the blame.

I would argue that, if anything, the rules for accepting patches into
_stable_ releases should be much more strict than they are today.
If anything, we need to look into that, not into restricting patch
access to mainline.

Guenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  1:49         ` Sasha Levin
                             ` (2 preceding siblings ...)
  2018-09-07  2:52           ` Guenter Roeck
@ 2018-09-07  3:38           ` Al Viro
  2018-09-07  4:27           ` Theodore Y. Ts'o
  4 siblings, 0 replies; 138+ messages in thread
From: Al Viro @ 2018-09-07  3:38 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Fri, Sep 07, 2018 at 01:49:31AM +0000, Sasha Levin via Ksummit-discuss wrote:

> This is a *huge* reason why we see regressions in Stable. Take a look at
> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2018-September/005287.html
> for a list of recent user visible regressions the CoreOS folks have
> observed this year. Do you want to know when they were merged? Let me
> help you: all but one were merged in -rc5 or later.

	Umm...  Looking.  Looks like the nastiest one in there is a TCP
regression, caused by inclusion of "tcp: avoid integer overflows in
tcp_rcv_space_adjust()" in 4.14.48.  Fixed by inclusion of "tcp: do not
overshoot window_clamp in tcp_rcv_space_adjust()" into 4.14.50.  

	Note that in mainline the latter is actually a sodding *PARENT*
of the former, so no amount of soaking in, testing, etc. would have caught
the problem.  Simply because there wasn't one.

	That, presumably, is the one exception you mention?  I honestly
went by the apparent impact - digging through their (github) user interface
is not my idea of fun ;-/

	Could you post summaries of the other 4?  Or links to such, if that
got covered at some point in that thread...

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  2:45             ` Steven Rostedt
@ 2018-09-07  3:43               ` Linus Torvalds
  2018-09-07  8:52                 ` Daniel Vetter
  2018-09-07  8:40               ` Geert Uytterhoeven
  1 sibling, 1 reply; 138+ messages in thread
From: Linus Torvalds @ 2018-09-07  3:43 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

On Thu, Sep 6, 2018 at 7:45 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Really, the only testing coverage that a patch gets in linux-next is by
> the bots that are run on them. I will agree that the number of bots and
> automated tests are getting better.

I do have to agree with this.

The bots have gone from finding build issues due to configurations to
actually being pretty damn good for many things.

So linux-next test coverage has clearly improved. It finds real issues
(ok, build problems are real issues too, but I think we all agree they
are trivial in comparison), and thanks to having a lot of debug stuff
enabled, testing these days often finds stuff that is core kernel and
triggerable with KASAN or lockdep etc.

It's been very helpful.

But the automated tests are still very limited, particularly when it
comes to hardware issues. Most bot runs tend to be in virtual machines
and/or on a fairly small set of hardware, and usually for a fairly
limited set of loads.

It still does find things, but to just take that list of five stable
regressions that was posted, none of them look like they'd necessarily
have been found by one of the automated boot bots.

And while I think more real people run development kernels with real
loads, I don't think the coverage is *that* good there either. For
example, I tend find one or two major bugs (as in "doesn't boot") kind
of issues that were never found in linux-next pretty much every single
merge window (I consider it a good merge window if I didn't have to
bisect anything).

And _my_ hardware and usage really is pretty damn basic, which should
tell you about how little some of the automated stuff catches.

(Side note: I think it's improving even on the hardware side. I think
the i915 people must be running a _lot_ more testing before pushing to
me, because while GPU issues used to be one of the areas that was one
of the common causes, and it really hasn't been that lately).

And I suspect most people who run development kernels actually end up
running fairly similar hardware (ie "fairly modern workstation" kind
of hardware).

The thing that starts seeing more actual users tends to be the distros
that have "test" versions. At least with Fedora has had a
bleeding-edge rawhide that often has quite recent kernels, and it has
often found things early.

And then there is stable and actual distro users.

And that really is when you find the _much_ wider hardware, and the
odder cases. Sure, the early testing has hopefully found the _core_
problems, but sometimes there are core problems that are simply
triggered by specific hardware patterns or software uses.

> Another issue about having fixes sit in linux-next for some time after
> -rc5, is that by that time, linux-next is filled with new development
> code waiting for the next merge window. A subtle fix for a bug that
> wasn't caught by linux-next in the first place (how else would that bug
> still be around by rc5?) is highly likely not to catch a bug with the
> fix to that subtle bug.

Also note that going into my tree does mean that now linux-next covers
it. So it's not like a patch being accepted should ever make for
_less_ coverage.

If the bug isn't found in the development tree for a week, and ithe
stable people take it, I think we can just all agree that the
automation in  linux-next simply didn't find it.

So the argument that we should delay bug-fixes in order for them to
get more coverage in linux-next seems entirely mis-guided.

Now, if the argument is that people send me stuff that doesn't even
_pretend_ to be a bug-fix, and that this is a problem, then I agree
whole-heartedly with that being a problem. I do occasionally complain
loudly about _that_ problem. It doesn't affect the stable kernels,
perhaps, but it affects the general stability of the development
process.

                Linus

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  1:49         ` Sasha Levin
                             ` (3 preceding siblings ...)
  2018-09-07  3:38           ` Al Viro
@ 2018-09-07  4:27           ` Theodore Y. Ts'o
  2018-09-07  5:45             ` Stephen Rothwell
                               ` (2 more replies)
  4 siblings, 3 replies; 138+ messages in thread
From: Theodore Y. Ts'o @ 2018-09-07  4:27 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Fri, Sep 07, 2018 at 01:49:31AM +0000, Sasha Levin via Ksummit-discuss wrote:
> 
> How can you justify sneaking a patch that spent 0 days in linux-next,
> never ran through any of our automated test frameworks and was never
> tested by a single real user into a Stable kernel release?

At least for file system patches, my file system regression testing
(gce-xfstests) beats *all* of the Linux-next bots.  And in fact, the
regression tests actually catch more problems than users, because most
users' file system workloads are incredibly boring.  :-)

It might be different for fixes in hardware drivers, where a fix for
Model 785 might end up breaking Model 770.  But short of the driver
developer having an awesomely huge set of hardware in their testing
lab, what are they going to do?  And is holding off until the Merge
window really going to help find the regression?  The linux-bots
aren't likely to find such problems!

As far as users testing Linux-next --- I'm willing to try running
anything past, say, -rc3 on my laptop.  But running linux-next?  Heck,
no!  That's way too scary for me.

Side bar comment:

There actually is a perverse incentive to having all of the test
'bots, which is that I suspect some people have come to rely on it to
catch problems.  I generally run a full set of regression tests before
I push an update to git.kernel.org (it only takes about 2 hours, and
12 VM's :-); and by the time we get to the late -rc's I *always* will
do a full regression test.

In the early-to-mid- rc's, sometimes if I'm in a real rush, I'll just
run the 15 minute smoke test; but I'll do at least *some* testing.

But other trees seem to be much more loosey-goosey about what they
will push to linux-next, since they want to let the 'bots catch
problems.  With the net result that they scare users away from wanting
to use linux-next.

       		    	      		- Ted

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  4:27           ` Theodore Y. Ts'o
@ 2018-09-07  5:45             ` Stephen Rothwell
  2018-09-07  9:13             ` Daniel Vetter
  2018-09-07 14:56             ` Sasha Levin
  2 siblings, 0 replies; 138+ messages in thread
From: Stephen Rothwell @ 2018-09-07  5:45 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 393 bytes --]

On Fri, 7 Sep 2018 00:27:54 -0400 "Theodore Y. Ts'o" <tytso@mit.edu> wrote:
>
> But other trees seem to be much more loosey-goosey about what they
> will push to linux-next, since they want to let the 'bots catch
> problems.  With the net result that they scare users away from wanting
> to use linux-next.

And irritate the linux-next maintainer ;-)

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  2:45             ` Steven Rostedt
  2018-09-07  3:43               ` Linus Torvalds
@ 2018-09-07  8:40               ` Geert Uytterhoeven
  2018-09-07  9:07                 ` Daniel Vetter
  1 sibling, 1 reply; 138+ messages in thread
From: Geert Uytterhoeven @ 2018-09-07  8:40 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

On Fri, Sep 7, 2018 at 4:45 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> Another issue about having fixes sit in linux-next for some time after
> -rc5, is that by that time, linux-next is filled with new development
> code waiting for the next merge window. A subtle fix for a bug that
> wasn't caught by linux-next in the first place (how else would that bug
> still be around by rc5?) is highly likely not to catch a bug with the
> fix to that subtle bug.

Or the issue may never show up in linux-next at all.

With strict by-subsystem merge policies, splitting a complicated patch set
by subsystem and scheduling it for inclusion across multiple kernel
versions can be challenging.  If anything slightly related is applied
independently, or due to a scheduling oversight or merge window miss, this
may lead to a regression in mainline that is never present in linux-next,
and usually only detected late, leading to a fix after -rc5.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  3:43               ` Linus Torvalds
@ 2018-09-07  8:52                 ` Daniel Vetter
  0 siblings, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2018-09-07  8:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit

On Fri, Sep 7, 2018 at 5:43 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> (Side note: I think it's improving even on the hardware side. I think
> the i915 people must be running a _lot_ more testing before pushing to
> me, because while GPU issues used to be one of the areas that was one
> of the common causes, and it really hasn't been that lately).

We run a _lot_ more before even pushing to linux-next :-) I think
rule-of-thumb is that pre-merge we burn down one machine-week on every
patch series when it gets posted (every time it gets posted), and
post-merge that goes up to about a machine-month. Of course repeated
plenty of times - I think we do a handful of the one-month runs each
week, and about 20 of the one-week runs per day.

Pretty much all the things that do still slip through are for features
we haven't figured out how to test in a fully automated way (some
obscure display features - we e.g. _do_ have fully automated rigs for
hotplug testing). Everywhere else coverage is pretty awesome, because
we have lots of tests and lots of different machines.

Aside: We also subject linux-next to the same torture, so we know how
many machines will die when we pull in -rc1. It's hard to tell,
because lots of noise in the data, but I think overall system
stability of linux-next has improved. And I think it's been a while
since we last catched someone pushing untested patches to linux-next
that failed on our entire farm :-)
-Daniel
---
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  8:40               ` Geert Uytterhoeven
@ 2018-09-07  9:07                 ` Daniel Vetter
  2018-09-07  9:28                   ` Geert Uytterhoeven
  2018-09-07 17:05                   ` Olof Johansson
  0 siblings, 2 replies; 138+ messages in thread
From: Daniel Vetter @ 2018-09-07  9:07 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: ksummit

On Fri, Sep 7, 2018 at 10:40 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> On Fri, Sep 7, 2018 at 4:45 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>> Another issue about having fixes sit in linux-next for some time after
>> -rc5, is that by that time, linux-next is filled with new development
>> code waiting for the next merge window. A subtle fix for a bug that
>> wasn't caught by linux-next in the first place (how else would that bug
>> still be around by rc5?) is highly likely not to catch a bug with the
>> fix to that subtle bug.
>
> Or the issue may never show up in linux-next at all.
>
> With strict by-subsystem merge policies, splitting a complicated patch set
> by subsystem and scheduling it for inclusion across multiple kernel
> versions can be challenging.  If anything slightly related is applied
> independently, or due to a scheduling oversight or merge window miss, this
> may lead to a regression in mainline that is never present in linux-next,
> and usually only detected late, leading to a fix after -rc5.

Aside: I'm still baffled at how much soc people split up their work.
As you point out, testing becomes a game of luck, because integration
happens only in the merge window for real.

Anything I'm involved in I'm insisting on a proper topic branch that
everyone pulls in, to make sure that we can test the full interactions
when we actually feature-freeze before the merge window. Usually that
means baking the merge into the drm side, because we do a lot more
testing than others (e.g. upstream intel audio validation is done
through the drm trees too - it doesn't work that great because it's
post-merge only for sound).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  4:27           ` Theodore Y. Ts'o
  2018-09-07  5:45             ` Stephen Rothwell
@ 2018-09-07  9:13             ` Daniel Vetter
  2018-09-07 11:32               ` Mark Brown
  2018-09-07 21:06               ` Mauro Carvalho Chehab
  2018-09-07 14:56             ` Sasha Levin
  2 siblings, 2 replies; 138+ messages in thread
From: Daniel Vetter @ 2018-09-07  9:13 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: ksummit

On Fri, Sep 7, 2018 at 6:27 AM, Theodore Y. Ts'o <tytso@mit.edu> wrote:
> On Fri, Sep 07, 2018 at 01:49:31AM +0000, Sasha Levin via Ksummit-discuss wrote:
>>
>> How can you justify sneaking a patch that spent 0 days in linux-next,
>> never ran through any of our automated test frameworks and was never
>> tested by a single real user into a Stable kernel release?
>
> At least for file system patches, my file system regression testing
> (gce-xfstests) beats *all* of the Linux-next bots.  And in fact, the
> regression tests actually catch more problems than users, because most
> users' file system workloads are incredibly boring.  :-)
>
> It might be different for fixes in hardware drivers, where a fix for
> Model 785 might end up breaking Model 770.  But short of the driver
> developer having an awesomely huge set of hardware in their testing
> lab, what are they going to do?  And is holding off until the Merge
> window really going to help find the regression?  The linux-bots
> aren't likely to find such problems!
>
> As far as users testing Linux-next --- I'm willing to try running
> anything past, say, -rc3 on my laptop.  But running linux-next?  Heck,
> no!  That's way too scary for me.
>
> Side bar comment:
>
> There actually is a perverse incentive to having all of the test
> 'bots, which is that I suspect some people have come to rely on it to
> catch problems.  I generally run a full set of regression tests before
> I push an update to git.kernel.org (it only takes about 2 hours, and
> 12 VM's :-); and by the time we get to the late -rc's I *always* will
> do a full regression test.

This is what imo a well-run subsystem should sound like from a testing
pov. All the subsystem specific testing should be done before merging.
Post-merge is only for integration testing and catching the long-tail
issues that need months/years of machine time to surface.

Of course this is much harder for anything that needs physical
hardware, but even for driver subsystems there's lots you can do with
test-drivers, selftests and a pile of emulation, to at least catch
bugs in generic code. And for reasonably sized teams like drm/i915
building a proper CI is a very obvious investement that will pay off.

> In the early-to-mid- rc's, sometimes if I'm in a real rush, I'll just
> run the 15 minute smoke test; but I'll do at least *some* testing.
>
> But other trees seem to be much more loosey-goosey about what they
> will push to linux-next, since they want to let the 'bots catch
> problems.  With the net result that they scare users away from wanting
> to use linux-next.

Yeah, if maintainers see linux-next as their personal testing ground
then it becomes useless for actual integration testing. But it's not
quite as bleak as it was 2 years ago I think, at least from what I'm
seeing when in our linux-next runs for drm/i915. It still does need to
get a lot better though.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  9:07                 ` Daniel Vetter
@ 2018-09-07  9:28                   ` Geert Uytterhoeven
  2018-09-07 17:05                   ` Olof Johansson
  1 sibling, 0 replies; 138+ messages in thread
From: Geert Uytterhoeven @ 2018-09-07  9:28 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit-discuss

Hi Daniel,

On Fri, Sep 7, 2018 at 11:07 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> On Fri, Sep 7, 2018 at 10:40 AM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > On Fri, Sep 7, 2018 at 4:45 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >> Another issue about having fixes sit in linux-next for some time after
> >> -rc5, is that by that time, linux-next is filled with new development
> >> code waiting for the next merge window. A subtle fix for a bug that
> >> wasn't caught by linux-next in the first place (how else would that bug
> >> still be around by rc5?) is highly likely not to catch a bug with the
> >> fix to that subtle bug.
> >
> > Or the issue may never show up in linux-next at all.
> >
> > With strict by-subsystem merge policies, splitting a complicated patch set
> > by subsystem and scheduling it for inclusion across multiple kernel
> > versions can be challenging.  If anything slightly related is applied
> > independently, or due to a scheduling oversight or merge window miss, this
> > may lead to a regression in mainline that is never present in linux-next,
> > and usually only detected late, leading to a fix after -rc5.
>
> Aside: I'm still baffled at how much soc people split up their work.
> As you point out, testing becomes a game of luck, because integration
> happens only in the merge window for real.

SoC maintainers usually still have their own integration branches. E.g.
for Renesas ARM SoCs we have renesas-devel (DTS and drivers/soc/
integration) and renesas-drivers (renesas-devel + various subsystem
for-next branches).

> Anything I'm involved in I'm insisting on a proper topic branch that
> everyone pulls in, to make sure that we can test the full interactions
> when we actually feature-freeze before the merge window. Usually that
> means baking the merge into the drm side, because we do a lot more
> testing than others (e.g. upstream intel audio validation is done
> through the drm trees too - it doesn't work that great because it's
> post-merge only for sound).

We used to have topic branches for e.g. clock definitions, to be included
by both driver and DTS, but these days we start with a few hardcoded clock
numbers in the DTS, and replace them by symbols in the next release.
Less need for setup and communication for shared topic branches.

The original issue I described above usually doesn't show up for new
development, but for converting from e.g. old board code to new DT-based
code. In se, that should disappear, eventually ;-)

For new development, all pieces go in separately in maintainers trees
(clocks, pinctrl, DT bindings, drivers, DTS, ...), and start becoming
operational when all pieces have fallen together. That's one nice
property of DT ;-)

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  9:13             ` Daniel Vetter
@ 2018-09-07 11:32               ` Mark Brown
  2018-09-07 21:06               ` Mauro Carvalho Chehab
  1 sibling, 0 replies; 138+ messages in thread
From: Mark Brown @ 2018-09-07 11:32 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 2175 bytes --]

On Fri, Sep 07, 2018 at 11:13:20AM +0200, Daniel Vetter wrote:

> Of course this is much harder for anything that needs physical
> hardware, but even for driver subsystems there's lots you can do with
> test-drivers, selftests and a pile of emulation, to at least catch
> bugs in generic code. And for reasonably sized teams like drm/i915
> building a proper CI is a very obvious investement that will pay off.

It does depend what the primary focus of the QA people is of course -
one of the challenges that comes from people shipping stable in products
is that this tends to be where all the QA investment from vendors goes.
Vendors who are taking a longer term view of their investment in
upstream (and especially those who realistically still have to ship an
out of tree patch stack on top of whatever's upstream) often work on the
basis that they'll figure things out when they pick up the release for
production.  It's a cost, they know it's a cost but there's also costs
in having QA tracking upstream.

> > But other trees seem to be much more loosey-goosey about what they
> > will push to linux-next, since they want to let the 'bots catch
> > problems.  With the net result that they scare users away from wanting
> > to use linux-next.

> Yeah, if maintainers see linux-next as their personal testing ground
> then it becomes useless for actual integration testing. But it's not
> quite as bleak as it was 2 years ago I think, at least from what I'm
> seeing when in our linux-next runs for drm/i915. It still does need to
> get a lot better though.

My experience has been a lot better here working on embedded - yes,
things do occasionally collapse horribly but for the most part it's a
reasonably solid basis to work from for as long as I can remember and
when things do break they normally get fixed fast enough.  Things were a
bit worse before KernelCI and Olof's boot farm, over time (and with the
breakage the DT transition introduced winding down) those have helped
quite a bit.  I think some of that's down to differences in the
underlying hardware with more being directly visible it's easier to unit
test, don't know if there's anything else going on.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  2:52           ` Guenter Roeck
@ 2018-09-07 14:37             ` Laura Abbott
  2018-09-07 15:06               ` Sasha Levin
  0 siblings, 1 reply; 138+ messages in thread
From: Laura Abbott @ 2018-09-07 14:37 UTC (permalink / raw)
  To: Guenter Roeck, Sasha Levin, Linus Torvalds; +Cc: ksummit

On 09/06/2018 07:52 PM, Guenter Roeck wrote:
> On 09/06/2018 06:49 PM, Sasha Levin via Ksummit-discuss wrote:
>>
>> This is a *huge* reason why we see regressions in Stable. Take a look at
>> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2018-September/005287.html
>> for a list of recent user visible regressions the CoreOS folks have
>> observed this year. Do you want to know when they were merged? Let me
>> help you: all but one were merged in -rc5 or later.
>>
> 
> My conclusion from that would be that patches are applied to stable
> before they had time to soak in mainline. Your argument against
> accepting patches into mainline might as well be applied to patches
> applied to stable.
> 
> I think you are a bit hypocritical arguing that patches should be
> restricted from being accepted into mainline ... when at the same
> time patches are at least sometimes applied almost immediately to
> stable releases from there. Plus, some if not many of the patches
> applied to stable releases nowadays don't really fix critical or
> even severe bugs. If the patches mentioned above indeed caused
> regressions in mainline, those regressions should have been found
> and fixed _before_ the patches made it into stable releases.
> Blaming mainline for the problem is just shifting the blame.
> 
> I would argue that, if anything, the rules for accepting patches into
> _stable_ releases should be much more strict than they are today.
> If anything, we need to look into that, not into restricting patch
> access to mainline.

Part of my proposal for a longer -rc time for stable was for this
exact problem: patches that have been merged in mainline but
tagged for stable may not have had time to test to find all
bugs. The thought was a longer stable -rc cycle would help
in finding those. I think you've hit upon the real problem
though which is that the patches probably shouldn't have been
in stable in the first place.

Thanks,
Laura

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  2:31           ` Linus Torvalds
  2018-09-07  2:45             ` Steven Rostedt
@ 2018-09-07 14:54             ` Sasha Levin
  2018-09-07 15:52               ` Linus Torvalds
  1 sibling, 1 reply; 138+ messages in thread
From: Sasha Levin @ 2018-09-07 14:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit

On Thu, Sep 06, 2018 at 07:31:18PM -0700, Linus Torvalds wrote:
>On Thu, Sep 6, 2018 at 6:49 PM Sasha Levin
><Alexander.Levin@microsoft.com> wrote:
>>
>> You're saying that patches that come in during -rc cycles are more
>> difficult and tricky, and simultaneously you're saying that it's
>> completely fine taking them in without any testing at all. Does that
>> sound like a viable testing strategy?
>
>It sounds like *reality*.
>
>Note that I'm making the simple argument that there is a selection
>bias going on. The patches coming in are simply different.

Let's split this argument into two:

1. You argue that fixes for features that were merged in the current
window are getting more and more tricky as -rc cycles go on, and I agree
with that.

2. You argue that stable fixes (i.e. fixes for bugs introduced in
previous kernel versions) are getting trickier as -rc cycles go on -
which I completely disagree with.

Stable fixes look the same whether they showed up during the merge
window, -rc1 or -rc8, they are disconnected from whatever stage we're at
in the release cycle.

If you agree with me on that, maybe you could explain why most of the
stable regressions seem to show up in -rc5 or later? Shouldn't there be
an even distribution of stable regressions throughout the release cycle?

>What do you suggest you do about a patch that is a bug fix? Delay it
>until it has two weeks of testing? Which it won't get, because nobody
>actually runs it until it is merged?
>
>THAT is my argument. There _is_ no viable testing strategy once you're
>in the bug fix territory.

Sure, the various bots cover much less ground than actual users testing
stuff out.

However, your approach discourages further development of those bots. If
you're not going to use them properly then what's the point in investing
more effort into them.

If what you're saying is that it's pointless testing anything that comes
during late -rc windows then what reason I have to keep it in my
testing pipeline? I'll just drop all my upstream/-next testing and focus
on testing stable branches.

>> This is a *huge* reason why we see regressions in Stable.
>
>No.
>
>The *stable* people are the ones that were supposed to be the careful ones.
>
>Instead, you use automated scripts and hoover things up, and then you
>try to blame the development tree for getting stuff that regresses in
>your tree.

Yes, because stuff that regresses in my tree usually regresses your tree
as well.

This is also not about "hoovering": out of those 5 CoreOS issues, 2 came
in through David Miller's tree. David is probably the best person you
can have doing the net/ stable work and yet things still sneak in.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  4:27           ` Theodore Y. Ts'o
  2018-09-07  5:45             ` Stephen Rothwell
  2018-09-07  9:13             ` Daniel Vetter
@ 2018-09-07 14:56             ` Sasha Levin
  2018-09-07 15:07               ` Jens Axboe
  2 siblings, 1 reply; 138+ messages in thread
From: Sasha Levin @ 2018-09-07 14:56 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: ksummit

On Fri, Sep 07, 2018 at 12:27:54AM -0400, Theodore Y. Ts'o wrote:
>As far as users testing Linux-next --- I'm willing to try running
>anything past, say, -rc3 on my laptop.  But running linux-next?  Heck,
>no!  That's way too scary for me.

That's why linux-next has a pending-fixes branch. IMO it makes more
sense to run that than a random -rcX release.


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 14:37             ` Laura Abbott
@ 2018-09-07 15:06               ` Sasha Levin
  2018-09-07 15:54                 ` Laura Abbott
  2018-09-07 21:32                 ` Dan Carpenter
  0 siblings, 2 replies; 138+ messages in thread
From: Sasha Levin @ 2018-09-07 15:06 UTC (permalink / raw)
  To: Laura Abbott; +Cc: ksummit

On Fri, Sep 07, 2018 at 07:37:06AM -0700, Laura Abbott wrote:
>On 09/06/2018 07:52 PM, Guenter Roeck wrote:
>>On 09/06/2018 06:49 PM, Sasha Levin via Ksummit-discuss wrote:
>>>
>>>This is a *huge* reason why we see regressions in Stable. Take a look at
>>>https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.linuxfoundation.org%2Fpipermail%2Fksummit-discuss%2F2018-September%2F005287.html&amp;data=02%7C01%7CAlexander.Levin%40microsoft.com%7Cf206ee69bd71452d0d8d08d614cf64a5%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636719278316194423&amp;sdata=rYS7lCcdGMzo0kbFowVot790z8GV2Alr1ynEBd1X6qA%3D&amp;reserved=0
>>>for a list of recent user visible regressions the CoreOS folks have
>>>observed this year. Do you want to know when they were merged? Let me
>>>help you: all but one were merged in -rc5 or later.
>>>
>>
>>My conclusion from that would be that patches are applied to stable
>>before they had time to soak in mainline. Your argument against
>>accepting patches into mainline might as well be applied to patches
>>applied to stable.
>>
>>I think you are a bit hypocritical arguing that patches should be
>>restricted from being accepted into mainline ... when at the same
>>time patches are at least sometimes applied almost immediately to
>>stable releases from there. Plus, some if not many of the patches
>>applied to stable releases nowadays don't really fix critical or
>>even severe bugs. If the patches mentioned above indeed caused
>>regressions in mainline, those regressions should have been found
>>and fixed _before_ the patches made it into stable releases.
>>Blaming mainline for the problem is just shifting the blame.
>>
>>I would argue that, if anything, the rules for accepting patches into
>>_stable_ releases should be much more strict than they are today.
>>If anything, we need to look into that, not into restricting patch
>>access to mainline.
>
>Part of my proposal for a longer -rc time for stable was for this
>exact problem: patches that have been merged in mainline but
>tagged for stable may not have had time to test to find all
>bugs. The thought was a longer stable -rc cycle would help
>in finding those. I think you've hit upon the real problem
>though which is that the patches probably shouldn't have been
>in stable in the first place.

Let me use the CoreOS example here again. Here are the 5 user visible
stable regressions they had this year:

8844618d8aa ("ext4: only look at the bg_flags field if it is valid")
f46ecbd97f5 ("cifs: Fix slab-out-of-bounds in send_set_info() on SMB2
ACE setting")
a6f81fcb2c3 ("tcp: avoid integer overflows in tcp_rcv_space_adjust()")
7b2ee50c0cd ("hv_netvsc: common detach logic")
f599c64fdf7 ("xen-netfront: Fix race between device setup and open")
a93bf0ff449 ("vxlan: update skb dst pmtu on tx path")

Which of those patches would you not take in a stable tree in the first
place?

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 14:56             ` Sasha Levin
@ 2018-09-07 15:07               ` Jens Axboe
  2018-09-07 20:58                 ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 138+ messages in thread
From: Jens Axboe @ 2018-09-07 15:07 UTC (permalink / raw)
  To: Sasha Levin, Theodore Y. Ts'o; +Cc: ksummit

On 9/7/18 8:56 AM, Sasha Levin via Ksummit-discuss wrote:
> On Fri, Sep 07, 2018 at 12:27:54AM -0400, Theodore Y. Ts'o wrote:
>> As far as users testing Linux-next --- I'm willing to try running
>> anything past, say, -rc3 on my laptop.  But running linux-next?  Heck,
>> no!  That's way too scary for me.
> 
> That's why linux-next has a pending-fixes branch. IMO it makes more
> sense to run that than a random -rcX release.

I'm pretty convinced that linux-next is very useful as integration
testing. On numerous occasions I learn of conflicts that will impact me
for the merge window, and Stephen is great at providing merge fixes that
helps everybody out. I'm much less convinced that it's useful for
runtime testing. It's extremely rare that I get a bug report on
linux-next, whereas I get them after patches have been merged into
Linus's tree all the time. Nobody is going to be running that
pending-fixes branch.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 14:54             ` Sasha Levin
@ 2018-09-07 15:52               ` Linus Torvalds
  2018-09-07 16:17                 ` Linus Torvalds
  2018-09-10 19:43                 ` Sasha Levin
  0 siblings, 2 replies; 138+ messages in thread
From: Linus Torvalds @ 2018-09-07 15:52 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Fri, Sep 7, 2018 at 7:54 AM Sasha Levin
<Alexander.Levin@microsoft.com> wrote:
>
> 1. You argue that fixes for features that were merged in the current
> window are getting more and more tricky as -rc cycles go on, and I agree
> with that.

Well, yes, and no. There's two sides to my argument.

Yes, for the current merge window, one issue is that the fixes get
trickier as time goes on (just based on "it took longer to find"). But
that wasn't actually the *bulk* of the argument.

The bulk of the argument is that there's a selection bias, which shows
up as "fixes look worse", and that *also* gets worse as you get later
in the rc period.

> 2. You argue that stable fixes (i.e. fixes for bugs introduced in
> previous kernel versions) are getting trickier as -rc cycles go on -
> which I completely disagree with.

No, this is not the "trickier because it took longer to find". This is
mostly the "fixes during the merge window get lost in the noise"
argument.

Why does rc5+ look worse than the merge window when you do statistics?
Because when you look for fixes *early* in the release, you are simply
mixing those fixes up with a lot of "background noise".

Note that this is true even if you were to look _only_ at fixes. The
simple non-critical fixes don't tend to get pushed to me during the
later rc series at all. If it's not critical, but simply fixes some
random issue, people put it in their "next" branch.

And *that* gets more common as the rc series gets later.

So you have a double whammy. Later rc's get fewer patches overall -
obviously there shouldn't be anything *but* fixes, but we all know
that's not entirely true - and even when it comes to fixes it gets
fewer of the of the trivial non-critical ones.

What are left? During the later rc series, I argue that even for
stable fixes, you *should* expect to see more of the nasty kinds of
fixes, and - again, BY DEFINITION - fixes that got less testing time
in linux-next.

Why the "BY DEFINITION"? Simply exactly because of that simple issue
of "people thought this was a critical issue, so they pushed it late
in the rc rather than putting it in their pile for the next merge
window" issue.

Don't you see how that *directly* translates into your "less testing
time" metric?

It's not even a correlation, it's literally just direct causation.

But this is not something we can or we should change. A more important
fix *should* go on earlier, for chrissake! That's such an obvious
thing that I really don't see anybody seriously arguing anything else.

Put another way: of _course_ the simple and less important stuff gets
delayed more, and of _course_ that means that they look better in your
"testing time metrics".

And of _course_ the simple stuff causes less problems.

So this is what my argument really boils down to: the more critical a
patch is, the more likely it is to be pushed more aggressively, which
in turn makes it statistically much more likely to show up not only
during the latter part of the development cycle, but it will directly
mean that it looks "less tested".

And AT THE SAME TIME, the more critical a patch is, the more likely it
is to also show up as a problem spot for distros. Because, by
definition, it touched something critical and likely subtle.

End result: BY DEFINITION you'll see a correlation between "less
testing" and "more problems".

But THAT is correlation. That's not the fundamental causation.

Now, I agree that it's correlation that makes sense to treat as
causation. It just is very tempting to say: "less testing obviously
means more problems". And I do think that it's very possibly a real
causal property as well, but my argument has been that it's not at all
obviously so, exactly because I would expect that correlation to exist
even if there was absolutely ZERO causality.

See what my argument is? You're arguing from correlation. And I think
there is a much more direct causal argument that explains a lot of the
correlation.

> Stable fixes look the same whether they showed up during the merge
> window, -rc1 or -rc8, they are disconnected from whatever stage we're at
> in the release cycle.

See above. That's simply not true. An unimportant stable fix is less
likely to show up in rc8 than in the merge window. Again, for the
selection bias.

The stuff that shows up in late rc's really is supposed to be somewhat special.

Will there be critical stable fixes during merge window and early
rc's? Yes. But they will be statistically fewer, simply because
there's a lot of the non-critical stuff.

> If you agree with me on that, maybe you could explain why most of the
> stable regressions seem to show up in -rc5 or later? Shouldn't there be
> an even distribution of stable regressions throughout the release cycle?

First off, I obviously don't agree with your.

But secondly, an N=5 is likely not statistically relevant anyway.

And thirdly, clearly some of the problems stable has isn't about the
patch itself, which was fine in mainline. Even in your N=5 case, we
had at least one of those (the TCP one), where the problem was that
another patch it depended on hadn't been backported.

That, btw, might be another "later rcs look worse in stable". Simply
because fixes in later rcs obviously have way more of the "we found
this in this cycle because of the _other_ changes we were working on
during this release". Maybe the other changes _triggered_ the problem
more easily, for example. So then you find a (subtle) bug, and realize
that the bug has been there for years, and mark it for stable.

And guess what? That fix for a N-year-old bug is now fundamentally
more likely to depend on all the changes you just did, which weren't
necessarily marked for stable, because they supposedly weren't
bugfixes.

See? I'm just arguing that there can be correlations with problems
that are much more likely than "it spent only 3 days in next before it
got into mainline".

> Sure, the various bots cover much less ground than actual users testing
> stuff out.
>
> However, your approach discourages further development of those bots.

So that I absolutely do *not* want to do, and not want to be seen doing.

But honestly, I do not think "it got merged early" should even be seen
as that kind of argument. There should be *more* bots testing things I
merge. Because even when you test linux-next, you're by implication
testing the stuff I'm merging, since mainline too gets merged into
linux-next.

So I do think that it's true that

 (a) bots generally haven't hit the issues in question, because if
they had, they would have been seen and noted _before_ they made it to
stable

 (b) bots potentially *cannot* hit it in mainline or linux-next,
because what gets back-ported is not "mainline or linux-next", but a
tiny tiny percentage of it, and the very act of backporting may be the
thing that introduces the problem

but neither of those arguments is an argument to discourage further
development of bots. Quite the reverse.

                 Linus

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 15:06               ` Sasha Levin
@ 2018-09-07 15:54                 ` Laura Abbott
  2018-09-07 16:09                   ` Sasha Levin
  2018-09-07 21:32                 ` Dan Carpenter
  1 sibling, 1 reply; 138+ messages in thread
From: Laura Abbott @ 2018-09-07 15:54 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On 09/07/2018 08:06 AM, Sasha Levin wrote:
> On Fri, Sep 07, 2018 at 07:37:06AM -0700, Laura Abbott wrote:
>> On 09/06/2018 07:52 PM, Guenter Roeck wrote:
>>> On 09/06/2018 06:49 PM, Sasha Levin via Ksummit-discuss wrote:
>>>>
>>>> This is a *huge* reason why we see regressions in Stable. Take a look at
>>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.linuxfoundation.org%2Fpipermail%2Fksummit-discuss%2F2018-September%2F005287.html&amp;data=02%7C01%7CAlexander.Levin%40microsoft.com%7Cf206ee69bd71452d0d8d08d614cf64a5%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636719278316194423&amp;sdata=rYS7lCcdGMzo0kbFowVot790z8GV2Alr1ynEBd1X6qA%3D&amp;reserved=0
>>>> for a list of recent user visible regressions the CoreOS folks have
>>>> observed this year. Do you want to know when they were merged? Let me
>>>> help you: all but one were merged in -rc5 or later.
>>>>
>>>
>>> My conclusion from that would be that patches are applied to stable
>>> before they had time to soak in mainline. Your argument against
>>> accepting patches into mainline might as well be applied to patches
>>> applied to stable.
>>>
>>> I think you are a bit hypocritical arguing that patches should be
>>> restricted from being accepted into mainline ... when at the same
>>> time patches are at least sometimes applied almost immediately to
>>> stable releases from there. Plus, some if not many of the patches
>>> applied to stable releases nowadays don't really fix critical or
>>> even severe bugs. If the patches mentioned above indeed caused
>>> regressions in mainline, those regressions should have been found
>>> and fixed _before_ the patches made it into stable releases.
>>> Blaming mainline for the problem is just shifting the blame.
>>>
>>> I would argue that, if anything, the rules for accepting patches into
>>> _stable_ releases should be much more strict than they are today.
>>> If anything, we need to look into that, not into restricting patch
>>> access to mainline.
>>
>> Part of my proposal for a longer -rc time for stable was for this
>> exact problem: patches that have been merged in mainline but
>> tagged for stable may not have had time to test to find all
>> bugs. The thought was a longer stable -rc cycle would help
>> in finding those. I think you've hit upon the real problem
>> though which is that the patches probably shouldn't have been
>> in stable in the first place.
> 
> Let me use the CoreOS example here again. Here are the 5 user visible
> stable regressions they had this year:
> 
> 8844618d8aa ("ext4: only look at the bg_flags field if it is valid")
> f46ecbd97f5 ("cifs: Fix slab-out-of-bounds in send_set_info() on SMB2
> ACE setting")
> a6f81fcb2c3 ("tcp: avoid integer overflows in tcp_rcv_space_adjust()")
> 7b2ee50c0cd ("hv_netvsc: common detach logic")
> f599c64fdf7 ("xen-netfront: Fix race between device setup and open")
> a93bf0ff449 ("vxlan: update skb dst pmtu on tx path")
> 
> Which of those patches would you not take in a stable tree in the first
> place?
Okay let me see if I can choose my wording better so I'm not going
around in circles.

I don't disagree that those patches look like they should go in stable.
My issue is that a stable release went out with those patches in them
when they were buggy. We're very good at finding patches for stable
which fix bugs. We're less good at finding buggy patches themselves
in stable. Can we make more of a distinction between patches that
are proposed for stable (all of those patches) and patches that
have had enough testing to be included in stable (probably not those
patches)? I'd like to answer the question of what more could be done
(testing?) to identify those patches which are tagged as fixing
bugs but are also still buggy.

Thanks,
Laura

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 15:54                 ` Laura Abbott
@ 2018-09-07 16:09                   ` Sasha Levin
  2018-09-07 20:23                     ` Greg KH
  0 siblings, 1 reply; 138+ messages in thread
From: Sasha Levin @ 2018-09-07 16:09 UTC (permalink / raw)
  To: Laura Abbott; +Cc: ksummit

On Fri, Sep 07, 2018 at 08:54:54AM -0700, Laura Abbott wrote:
>I don't disagree that those patches look like they should go in stable.
>My issue is that a stable release went out with those patches in them
>when they were buggy. We're very good at finding patches for stable
>which fix bugs. We're less good at finding buggy patches themselves
>in stable. Can we make more of a distinction between patches that
>are proposed for stable (all of those patches) and patches that
>have had enough testing to be included in stable (probably not those
>patches)? I'd like to answer the question of what more could be done
>(testing?) to identify those patches which are tagged as fixing
>bugs but are also still buggy.

I agree.

What are your thoughts about a stable-next branch of sorts where we can
push stable tagged fixes as soon as they hit either Linus's tree or
maybe the pending-fixes branch in linux-next?

This way we'll have a longer term stable tree to test, and Greg can just
cut releases from there.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 15:52               ` Linus Torvalds
@ 2018-09-07 16:17                 ` Linus Torvalds
  2018-09-07 21:39                   ` Mauro Carvalho Chehab
  2018-09-09 12:50                   ` Stephen Rothwell
  2018-09-10 19:43                 ` Sasha Levin
  1 sibling, 2 replies; 138+ messages in thread
From: Linus Torvalds @ 2018-09-07 16:17 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Fri, Sep 7, 2018 at 8:52 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> See? I'm just arguing that there can be correlations with problems
> that are much more likely than "it spent only 3 days in next before it
> got into mainline".

Btw, another source of these kinds of non-causal correlations might be
just how people work.

For example, for me, the merge window is my busiest season by far
(obviously).  I try to schedule my time off to coincide with late in
the rc, because it's just quieter: at that point I'm mostly waiting
for stuff.

But that's actually _supposed_ to be just me (and maybe Stephen Rothwell).

And for maintainers, it can be the exact reverse. In Vancouver Greg
said that normally, for him, the merge window is when he can take a
break, because the bulk of his work is the "leading up to merge
window" time, since that's when he works with people to set up the
branches for the next merge window.

So *during* the merge window, the development tree actually looks
really busy, but maintainers may be taking it easy and are basically
in the "my work is done, now I'm waiting for reports". So exactly the
reverse of my situation, and exactly the reverse of what it *looks*
like in the development tree.

Everybody thinks that the merge window is when all the work goes on,
but in reality, the reverse can true. The merge window and the early
rc's can (and almost certainly _should_) be the time when a maintainer
takes a breather.

And guess what? Last merge window was the exception to that rule. With
the timing of L1TF, we had the stable tree work happening on patches
that were merged during the merge window.

And honestly, I'd not be surprised at all if the usual "stable patches
that came in rc5+ caused more problems" is reversed this time around.
We already know that this time around, we had tons of issues with the
stable tree with patches that were merged in mainline during the merge
window.

Of course, there will - once again - be a very strong correlation with
"it wasn't in linux-next". But this time the correlation won't be with
"rc5+".

And - once again - the correlation is real, but it's incidental to the
*real* causal relationship. It's just that - correlation. The real
causality just happened to be different this time, and so you won't
find the usual "rc5+" correlation, but you will still find the "less
time in linux-next" one.

But normally, I'd actually expect that "late rc" is exactly when
maintainers are doing most of their work, and then they see "oh, this
patch needs to go in *now*, so I'll send it to Linus in a fixes pull".

So none of my arguments are "testing is bad". I really don't want it
to appear that I make that argument.

But honestly, my reaction remains that "late rc fixes are more likely
to cause nasty problems" sounds very natural to me, and sounds largely
independent of the testing issue.

           Linus

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  9:07                 ` Daniel Vetter
  2018-09-07  9:28                   ` Geert Uytterhoeven
@ 2018-09-07 17:05                   ` Olof Johansson
  1 sibling, 0 replies; 138+ messages in thread
From: Olof Johansson @ 2018-09-07 17:05 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit

On Fri, Sep 7, 2018 at 2:07 AM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> On Fri, Sep 7, 2018 at 10:40 AM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
>> On Fri, Sep 7, 2018 at 4:45 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>>> Another issue about having fixes sit in linux-next for some time after
>>> -rc5, is that by that time, linux-next is filled with new development
>>> code waiting for the next merge window. A subtle fix for a bug that
>>> wasn't caught by linux-next in the first place (how else would that bug
>>> still be around by rc5?) is highly likely not to catch a bug with the
>>> fix to that subtle bug.
>>
>> Or the issue may never show up in linux-next at all.
>>
>> With strict by-subsystem merge policies, splitting a complicated patch set
>> by subsystem and scheduling it for inclusion across multiple kernel
>> versions can be challenging.  If anything slightly related is applied
>> independently, or due to a scheduling oversight or merge window miss, this
>> may lead to a regression in mainline that is never present in linux-next,
>> and usually only detected late, leading to a fix after -rc5.
>
> Aside: I'm still baffled at how much soc people split up their work.
> As you point out, testing becomes a game of luck, because integration
> happens only in the merge window for real.
>
> Anything I'm involved in I'm insisting on a proper topic branch that
> everyone pulls in, to make sure that we can test the full interactions
> when we actually feature-freeze before the merge window. Usually that
> means baking the merge into the drm side, because we do a lot more
> testing than others (e.g. upstream intel audio validation is done
> through the drm trees too - it doesn't work that great because it's
> post-merge only for sound).

As Geert mentioned, each platform maintainer usually merges their
pending work into an integration tree on their own, and test that.

There are a few reasons for why we've been splitting it up as much as
we have, but most of them come back to some aspect of scale.

You're one large vendor, coordinating with a few other maintainers
isn't so bad. When you've got 10 different vendors all coordinating,
and some of those sharing the drivers that they need to coordinate
about, it quickly can get into an enormous conflict-ridden mess.

One option there is to merge everything through arm-soc (I guess
that's the equivalent of what you've been doing), but we've been
pretty careful about making sure we spread out the load of
merging/reviewing/maintaining ARM-related things across the community
and not just on us. If we can avoid adding dependencies by doing just
a little bit more work, that's normally what we prefer.

It also forces people to structure their work a bit more into
separating cleanups and new features, which in my opinion isn't that
bad idea. It usually helps things like backports where needed too,
since if you intermingle the two you end up with a pretty heavy list
of dependencies for those looking to do things like driver backports
or stable fixes.

-Olof

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  1:09       ` Steven Rostedt
@ 2018-09-07 20:12         ` Greg KH
  2018-09-07 21:12           ` Greg KH
  0 siblings, 1 reply; 138+ messages in thread
From: Greg KH @ 2018-09-07 20:12 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

On Thu, Sep 06, 2018 at 09:09:31PM -0400, Steven Rostedt wrote:
> On Fri, 7 Sep 2018 00:51:42 +0000
> Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> 
> > Assuming you've read the original mail, it appears that most parties who
> > participated in the discussion agreed that there's an issue where
> > patches that go in during (late) -rc cycles seems to be less tested and
> > are buggier than they should be.
> > 
> > Most of that thread discussed possible solutions such as:
> > 
> >  - Not taking non-critical patches past -rcX (-rc4 seemed to be a
> >    popular one).
> >  - -rc patches must fix something introduced in the current merge
> >    window. Patches fixing anything older should go in the next merge
> >    window.
> 
> Interesting, because this is exactly what Linus blew up about that made
> headlines and a loss of a kernel developer 5 years ago:
> 
>   https://lore.kernel.org/lkml/1373593870.17876.70.camel@gandalf.local.home/T/#mb7018718ce288b55fe041778721004cd62cd00a1

And it turns out that today I am feeling the same way again as I said so
here:
	https://lore.kernel.org/lkml/20130711214830.611455274@linuxfoundation.org/

Looking at the patches in this -rc1 merge window that were marked for
stable, and some of the dates of them (really old for some subsystems),
it makes me "wonder" why they were postponed so for -rc1, and didn't go
into the -final release.

I know I have done this for "small" patches, or stuff that comes in late
in the -rc cycle that just really does not matter much.  Or for things
that I want to see "bake" in linux-next more.  But even with that, I
don't think that's what is happening here, I think maintainers are just
waiting until -rc1 as it's "easier".   I really have no other
explaination.

Now I can't reject the patches as they are good fixes, and they are now
in Linus's tree.  But the "delay" is worrying to me.  I don't know what
to do about it...

Look at what comes out in this next round of stable releases, and tell
me that all of those really deserved to wait for -rc1.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 16:09                   ` Sasha Levin
@ 2018-09-07 20:23                     ` Greg KH
  2018-09-07 21:13                       ` Sasha Levin
  0 siblings, 1 reply; 138+ messages in thread
From: Greg KH @ 2018-09-07 20:23 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Fri, Sep 07, 2018 at 04:09:46PM +0000, Sasha Levin via Ksummit-discuss wrote:
> On Fri, Sep 07, 2018 at 08:54:54AM -0700, Laura Abbott wrote:
> >I don't disagree that those patches look like they should go in stable.
> >My issue is that a stable release went out with those patches in them
> >when they were buggy. We're very good at finding patches for stable
> >which fix bugs. We're less good at finding buggy patches themselves
> >in stable. Can we make more of a distinction between patches that
> >are proposed for stable (all of those patches) and patches that
> >have had enough testing to be included in stable (probably not those
> >patches)? I'd like to answer the question of what more could be done
> >(testing?) to identify those patches which are tagged as fixing
> >bugs but are also still buggy.

I am "supposed" to be waiting a full -rc cycle from when a patch hits
Linus's tree and when I push it out in a stable release.  Most of the
time that happens, but every once in a while I am ahead of the game and
get it out the same week.

So far, no one has noticed, or if they have, they have not told me :)

Unfortunately (or fortunately depending on your viewpoint), due to the
security mess lately, I've been backlogged and have been keeping pretty
true to the "wait for an -rc" rule.

Note, that this doesn't come into play for things that I "think" are
security issues, or patches that are just so "obviously correct" that I
pick up being merged in -rc1 before -rc1 is out to try to stay ahead of
the mess that happens after -rc1 is released (too many patches,
different email thread, etc.)

So it's a mixed bag, I have been trying to wait, yet people somehow feel
I'm not waiting long enough?  How long is enough?  Given that the number
of regressions we have is _very_ low overall, and 0 is impossible to
ever hit, I don't know what else to do at this point in time.

> What are your thoughts about a stable-next branch of sorts where we can
> push stable tagged fixes as soon as they hit either Linus's tree or
> maybe the pending-fixes branch in linux-next?
> 
> This way we'll have a longer term stable tree to test, and Greg can just
> cut releases from there.

No one will pay attention to "stable-next", why would putting something
there be any different from what I do now?  We run all of the normal
bots on the stable-rc releases, putting it out for a week longer would
not cause anything else to happen.

Except for a delay to have any patch in Linus's tree show up to be a
problem, and then a fix show up there.  But then I would have to notice
that this patch that showed up now, really fixes something that I need
to apply now, and not wait for a -rc release before applying.  But wait,
that fix was buggy and it should have soaked for longer...

See, I just can't win :)

So I'll stick to the "wait for a -rc" for now, as it seems like the best
middle ground we have come up with here so far.

And I don't want to maintain yet-another-tree...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 15:07               ` Jens Axboe
@ 2018-09-07 20:58                 ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 138+ messages in thread
From: Mauro Carvalho Chehab @ 2018-09-07 20:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: ksummit

Em Fri, 7 Sep 2018 09:07:41 -0600
Jens Axboe <axboe@kernel.dk> escreveu:

> On 9/7/18 8:56 AM, Sasha Levin via Ksummit-discuss wrote:
> > On Fri, Sep 07, 2018 at 12:27:54AM -0400, Theodore Y. Ts'o wrote:  
> >> As far as users testing Linux-next --- I'm willing to try running
> >> anything past, say, -rc3 on my laptop.  But running linux-next?  Heck,
> >> no!  That's way too scary for me.  
> > 
> > That's why linux-next has a pending-fixes branch. IMO it makes more
> > sense to run that than a random -rcX release.  
> 
> I'm pretty convinced that linux-next is very useful as integration
> testing. On numerous occasions I learn of conflicts that will impact me
> for the merge window, and Stephen is great at providing merge fixes that
> helps everybody out. I'm much less convinced that it's useful for
> runtime testing. It's extremely rare that I get a bug report on
> linux-next, whereas I get them after patches have been merged into
> Linus's tree all the time. Nobody is going to be running that
> pending-fixes branch.

Same applies here: the stuff I usually get from linux-next bots
are usually due to some random config that, and, while it is nice
to fix, there's no real impact in practice, as it usually means 
building a driver for an architecture where it doesn't apply, or
a weird mix of modules/builtin drivers with no users.

Yet, I always wait for a patch to be merged at next before sending
upstream, although I usually don't wait for a long time after
-next in order to send stuff upstream, as I don't expect users
to actually test what's at -next.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07  9:13             ` Daniel Vetter
  2018-09-07 11:32               ` Mark Brown
@ 2018-09-07 21:06               ` Mauro Carvalho Chehab
  2018-09-08  9:44                 ` Laurent Pinchart
  1 sibling, 1 reply; 138+ messages in thread
From: Mauro Carvalho Chehab @ 2018-09-07 21:06 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit

Em Fri, 7 Sep 2018 11:13:20 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> On Fri, Sep 7, 2018 at 6:27 AM, Theodore Y. Ts'o <tytso@mit.edu> wrote:
> > On Fri, Sep 07, 2018 at 01:49:31AM +0000, Sasha Levin via Ksummit-discuss wrote:  

> > There actually is a perverse incentive to having all of the test
> > 'bots, which is that I suspect some people have come to rely on it to
> > catch problems.  I generally run a full set of regression tests before
> > I push an update to git.kernel.org (it only takes about 2 hours, and
> > 12 VM's :-); and by the time we get to the late -rc's I *always* will
> > do a full regression test.  
> 
> This is what imo a well-run subsystem should sound like from a testing
> pov. All the subsystem specific testing should be done before merging.
> Post-merge is only for integration testing and catching the long-tail
> issues that need months/years of machine time to surface.
> 
> Of course this is much harder for anything that needs physical
> hardware, but even for driver subsystems there's lots you can do with
> test-drivers, selftests and a pile of emulation, to at least catch
> bugs in generic code. And for reasonably sized teams like drm/i915
> building a proper CI is a very obvious investement that will pay off.

IMHO, CI would do even a better job for smaller teams, as they won't
have much resources for testing, but the problem here is that those
teams probably lack resources and money to invest on a physical hardware
to setup a CI infra and to buy the myriad of different hardware to
do regression testing.

Also, some devices are harder to test: how would you check if a camera
microphone is working? How to check if the camera captured images
are ok?

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 20:12         ` Greg KH
@ 2018-09-07 21:12           ` Greg KH
  0 siblings, 0 replies; 138+ messages in thread
From: Greg KH @ 2018-09-07 21:12 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

On Fri, Sep 07, 2018 at 10:12:24PM +0200, Greg KH wrote:
> On Thu, Sep 06, 2018 at 09:09:31PM -0400, Steven Rostedt wrote:
> > On Fri, 7 Sep 2018 00:51:42 +0000
> > Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> > 
> > > Assuming you've read the original mail, it appears that most parties who
> > > participated in the discussion agreed that there's an issue where
> > > patches that go in during (late) -rc cycles seems to be less tested and
> > > are buggier than they should be.
> > > 
> > > Most of that thread discussed possible solutions such as:
> > > 
> > >  - Not taking non-critical patches past -rcX (-rc4 seemed to be a
> > >    popular one).
> > >  - -rc patches must fix something introduced in the current merge
> > >    window. Patches fixing anything older should go in the next merge
> > >    window.
> > 
> > Interesting, because this is exactly what Linus blew up about that made
> > headlines and a loss of a kernel developer 5 years ago:
> > 
> >   https://lore.kernel.org/lkml/1373593870.17876.70.camel@gandalf.local.home/T/#mb7018718ce288b55fe041778721004cd62cd00a1
> 
> And it turns out that today I am feeling the same way again as I said so
> here:
> 	https://lore.kernel.org/lkml/20130711214830.611455274@linuxfoundation.org/
> 
> Looking at the patches in this -rc1 merge window that were marked for
> stable, and some of the dates of them (really old for some subsystems),
> it makes me "wonder" why they were postponed so for -rc1, and didn't go
> into the -final release.

Ok, my mistake, I was looking at stuff that hit between -rc1 and -rc2 as
well as -rc1 patches, so I might be totally wrong here.  It just "feels"
a little odd that some of those patches had such "old" dates on them...

Anyway, no rant from me at the moment :)

greg k-h

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 20:23                     ` Greg KH
@ 2018-09-07 21:13                       ` Sasha Levin
  2018-09-07 22:27                         ` Linus Torvalds
  0 siblings, 1 reply; 138+ messages in thread
From: Sasha Levin @ 2018-09-07 21:13 UTC (permalink / raw)
  To: Greg KH; +Cc: ksummit

On Fri, Sep 07, 2018 at 10:23:28PM +0200, Greg KH wrote:
>On Fri, Sep 07, 2018 at 04:09:46PM +0000, Sasha Levin via Ksummit-discuss wrote:
>> What are your thoughts about a stable-next branch of sorts where we can
>> push stable tagged fixes as soon as they hit either Linus's tree or
>> maybe the pending-fixes branch in linux-next?
>>
>> This way we'll have a longer term stable tree to test, and Greg can just
>> cut releases from there.
>
>No one will pay attention to "stable-next", why would putting something
>there be any different from what I do now?  We run all of the normal
>bots on the stable-rc releases, putting it out for a week longer would
>not cause anything else to happen.

We run bots on stable-rc, but the point I think Laura was trying to make
(and I agree with) is that the 2-3 days of stable-rc isn't enough for
non-bot tests. We'd like to have actual users run stable-rc as well.

So yes, putting it for longer will add a lot more testing.

So the stable-next is just a way for folks to test out new stable
commits without you having to do longer -rc cycles or maintaining extra
trees.

Right now your workflow seems to be:

1. Grab a batch of ~2-3 week old commits from Linus's tree.
2. Review, basic tests and send stable-rc notification.
3. Wait a few days for reviews.
4. Ship it.

The part that's tricky here is that there are only a few days during
step 3 to test out that stable-rc kernel. Not enough for Fedora to let
their testers to get it and play around with.

With a -next branch, this might look something like this:

1. Grab stable tagged commits as they go in Linus's tree and put them on
top of the appropriate stable-next branches (i.e. linux-4.14.y-next).
2. X times a week pick a batch of ~2-3 week old commits, put them in the
-rc branch and send out a review request.
3. Wait a few days for reviews.
4. Ship it.

So it's very similar, but between steps 1 and 2 folks have a chance to
further test out stable commits. This is something that Fedora, for
example, could offer to it's testers as a kernel option.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 15:06               ` Sasha Levin
  2018-09-07 15:54                 ` Laura Abbott
@ 2018-09-07 21:32                 ` Dan Carpenter
  2018-09-07 21:43                   ` Sasha Levin
  2018-09-10  7:53                   ` Jan Kara
  1 sibling, 2 replies; 138+ messages in thread
From: Dan Carpenter @ 2018-09-07 21:32 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Fri, Sep 07, 2018 at 03:06:24PM +0000, Sasha Levin via Ksummit-discuss wrote:
> Let me use the CoreOS example here again. Here are the 5 user visible
> stable regressions they had this year:
> 
> 8844618d8aa ("ext4: only look at the bg_flags field if it is valid")

The fix was 501228470077 ("ext4: fix check to prevent initializing
reserved inodes").  The bug was found by running the test suite but with
nojournal?

> f46ecbd97f5 ("cifs: Fix slab-out-of-bounds in send_set_info() on SMB2
> ACE setting")

What was the bug with this one?

> a6f81fcb2c3 ("tcp: avoid integer overflows in tcp_rcv_space_adjust()")

My understanding was that this one was applied without a patch it
depended on? 02db55718d53 ("tcp: do not overshoot window_clamp in
tcp_rcv_space_adjust()")

> 7b2ee50c0cd ("hv_netvsc: common detach logic")

The patch summary sells this as a cleanup but it's a bugfix.  The fix
for it was commit 52acf73b6e9a ("hv_netvsc: Fix a network regression
after ifdown/ifup").  It took two months for anyone to notice the if
up/down sometimes fails.  Are there any standard tests for network
drivers?  There is no way we're going to hold back the patch for two
months.

> f599c64fdf7 ("xen-netfront: Fix race between device setup and open")

Two bugs:
cb257783c292 ("xen-netfront: Fix mismatched rtnl_unlock")
45c8184c1bed ("xen-netfront: Update features after registering netdev")

We should add a static checker warning to prevent the first one from
re-occuring.  Just send an email to Julia or me.  For the second one, it
really feels like we should have a test suite to see if setting the MTU
works.

> a93bf0ff449 ("vxlan: update skb dst pmtu on tx path")

The fix was commit f15ca723c1eb ("net: don't call update_pmtu
unconditionally").

Why does this patch add a NULL check for "dst"?  Is that required?  The
original code generated a static checker warning for me that "error:
potential null dereference 'dst'.  (skb_dst returns null)".  I have 54
places where the skb_dst() return isn't checked but I don't really
understand the code so I ignore those.

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 16:17                 ` Linus Torvalds
@ 2018-09-07 21:39                   ` Mauro Carvalho Chehab
  2018-09-09 12:50                   ` Stephen Rothwell
  1 sibling, 0 replies; 138+ messages in thread
From: Mauro Carvalho Chehab @ 2018-09-07 21:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit

Em Fri, 7 Sep 2018 09:17:18 -0700
Linus Torvalds <torvalds@linux-foundation.org> escreveu:

> On Fri, Sep 7, 2018 at 8:52 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:

> And for maintainers, it can be the exact reverse. In Vancouver Greg
> said that normally, for him, the merge window is when he can take a
> break, because the bulk of his work is the "leading up to merge
> window" time, since that's when he works with people to set up the
> branches for the next merge window.

That's exactly what I do here :-)

During the merge window, I usually take off my maintainer's hat and
I either do some usespace stuff or do some development myself.

It is not uncommon that, whatever I'm doing during the merge window
would require part of the -rc1 week to finish. So, typically, fixes
start being merged during -rc2 week, meaning that they'll reach
upstream by -rc3 week or later.

From subsystem developer's side, what I notice from my chair is that
most people developing new features work to cope with the Kernel 
merging cycle, meaning that they do their testing and development
focused on getting their stuff merged up to the -rc6 week. The 
vast majority of pull requests I receive are sent during the -rc6
week.

Most of the bug fixes we have actually are discovered from
people are working on new stuff.

Ok, pure bug reports (and fixes) can be merged anytime, but,
due to the core developer's working cycle, in practice that means
that I receive more bug fixes (both critical and non-critical)
late at the -rc cycle.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 21:32                 ` Dan Carpenter
@ 2018-09-07 21:43                   ` Sasha Levin
  2018-09-08 13:20                     ` Dan Carpenter
  2018-09-10  8:23                     ` Jan Kara
  2018-09-10  7:53                   ` Jan Kara
  1 sibling, 2 replies; 138+ messages in thread
From: Sasha Levin @ 2018-09-07 21:43 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: ksummit

On Sat, Sep 08, 2018 at 12:32:13AM +0300, Dan Carpenter wrote:
>On Fri, Sep 07, 2018 at 03:06:24PM +0000, Sasha Levin via Ksummit-discuss wrote:
>> Let me use the CoreOS example here again. Here are the 5 user visible
>> stable regressions they had this year:
>>
>> 8844618d8aa ("ext4: only look at the bg_flags field if it is valid")
>
>The fix was 501228470077 ("ext4: fix check to prevent initializing
>reserved inodes").  The bug was found by running the test suite but with
>nojournal?

It looks like it was found when people tried mounting 3TB+ filesystems
and failed.

>> f46ecbd97f5 ("cifs: Fix slab-out-of-bounds in send_set_info() on SMB2
>> ACE setting")
>
>What was the bug with this one?

https://github.com/coreos/bugs/issues/2480

>> a6f81fcb2c3 ("tcp: avoid integer overflows in tcp_rcv_space_adjust()")
>
>My understanding was that this one was applied without a patch it
>depended on? 02db55718d53 ("tcp: do not overshoot window_clamp in
>tcp_rcv_space_adjust()")

Right. I would thing this would get caught using automatic regression as
apparently it reduced network throughput down to 300bytes/sec.

>> 7b2ee50c0cd ("hv_netvsc: common detach logic")
>
>The patch summary sells this as a cleanup but it's a bugfix.  The fix
>for it was commit 52acf73b6e9a ("hv_netvsc: Fix a network regression
>after ifdown/ifup").  It took two months for anyone to notice the if
>up/down sometimes fails.  Are there any standard tests for network
>drivers?  There is no way we're going to hold back the patch for two
>months.

These examples were less about "keep it waiting longer" and more to show
that it'll be hard and/or pointless trying to restrict what goes in
Stable as regressions come from commits that are "obviously" stable
material.

>> f599c64fdf7 ("xen-netfront: Fix race between device setup and open")
>
>Two bugs:
>cb257783c292 ("xen-netfront: Fix mismatched rtnl_unlock")
>45c8184c1bed ("xen-netfront: Update features after registering netdev")
>
>We should add a static checker warning to prevent the first one from
>re-occuring.  Just send an email to Julia or me.  For the second one, it
>really feels like we should have a test suite to see if setting the MTU
>works.

It appears that there's a lot missing with how network devices are
getting tested.

--
Thanks,
Sasha

>> a93bf0ff449 ("vxlan: update skb dst pmtu on tx path")
>
>The fix was commit f15ca723c1eb ("net: don't call update_pmtu
>unconditionally").
>
>Why does this patch add a NULL check for "dst"?  Is that required?  The
>original code generated a static checker warning for me that "error:
>potential null dereference 'dst'.  (skb_dst returns null)".  I have 54
>places where the skb_dst() return isn't checked but I don't really
>understand the code so I ignore those.
>
>regards,
>dan carpenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 21:13                       ` Sasha Levin
@ 2018-09-07 22:27                         ` Linus Torvalds
  2018-09-07 22:43                           ` Guenter Roeck
  0 siblings, 1 reply; 138+ messages in thread
From: Linus Torvalds @ 2018-09-07 22:27 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Greg Kroah-Hartman, ksummit

On Fri, Sep 7, 2018 at 2:13 PM Sasha Levin via Ksummit-discuss
<ksummit-discuss@lists.linuxfoundation.org> wrote:
>
> 1. Grab a batch of ~2-3 week old commits from Linus's tree.
> 2. Review, basic tests and send stable-rc notification.

Side note: maybe the stable grabbing and testing could be automated?

IOW, right now the stable people intentionally (generally) wait a week
before they even start. Maybe there could be an automated queue for
"this has been marked for stable" (and the whole "fixes:" magic that
you guys already trigger on) that gets applied to the previous stable
tree, and starts testing immediately.

Because one of the patterns we *do* obviously see is that something
was fine in mainline, but then broke in stable because of an unforseen
lack of depdenencies. Sure, it's probably pretty rare (and *many*
dependencies willl show up as an actual conflict), but I think the
times it does happen it's particularly painful because it can be so
non-obvious.

So maybe an automated "linux-next" that starts happening *before* the
rc stage would catch some things?

Done right, maybe it can be helpful to the stable flow in other ways
too (ie trigger "oops, this doesn't even apply" flow even before you
guys start actively looking at patches)?

Of course, it's easy to say "maybe we could add automation". Possibly
it would be really difficult to actually do that due to  conflicts etc
being *so* common that it just ends up being an unhelpful mess.

             Linus

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 22:27                         ` Linus Torvalds
@ 2018-09-07 22:43                           ` Guenter Roeck
  2018-09-07 22:53                             ` Linus Torvalds
  2018-09-10 16:20                             ` Dan Rue
  0 siblings, 2 replies; 138+ messages in thread
From: Guenter Roeck @ 2018-09-07 22:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Greg Kroah-Hartman, ksummit

On Fri, Sep 07, 2018 at 03:27:01PM -0700, Linus Torvalds wrote:
> On Fri, Sep 7, 2018 at 2:13 PM Sasha Levin via Ksummit-discuss
> <ksummit-discuss@lists.linuxfoundation.org> wrote:
> >
> > 1. Grab a batch of ~2-3 week old commits from Linus's tree.
> > 2. Review, basic tests and send stable-rc notification.
> 
> Side note: maybe the stable grabbing and testing could be automated?
> 
> IOW, right now the stable people intentionally (generally) wait a week
> before they even start. Maybe there could be an automated queue for
> "this has been marked for stable" (and the whole "fixes:" magic that
> you guys already trigger on) that gets applied to the previous stable
> tree, and starts testing immediately.
> 
> Because one of the patterns we *do* obviously see is that something
> was fine in mainline, but then broke in stable because of an unforseen
> lack of depdenencies. Sure, it's probably pretty rare (and *many*
> dependencies willl show up as an actual conflict), but I think the
> times it does happen it's particularly painful because it can be so
> non-obvious.
> 
> So maybe an automated "linux-next" that starts happening *before* the
> rc stage would catch some things?
> 

And it does, as soon as Greg publishes a set of patches. At the very
least 0day runs on those, as well as my builders. There is a question
of scalability, though. I am sure that will improve over time as more
test resources become available, but six stable releases plus mainline
plus next plus whatever contributing branches covered by 0day and others
does take a lot of resources.

Personally I would suggest to further improve test coverage, not to add
more branches to test. More hardware for sure, but also adding more tests
such as the network testing suggested by Sasha.

Guenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 22:43                           ` Guenter Roeck
@ 2018-09-07 22:53                             ` Linus Torvalds
  2018-09-07 22:57                               ` Sasha Levin
  2018-09-10 16:20                             ` Dan Rue
  1 sibling, 1 reply; 138+ messages in thread
From: Linus Torvalds @ 2018-09-07 22:53 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Greg Kroah-Hartman, ksummit

On Fri, Sep 7, 2018 at 3:43 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Fri, Sep 07, 2018 at 03:27:01PM -0700, Linus Torvalds wrote:
> >
> > So maybe an automated "linux-next" that starts happening *before* the
> > rc stage would catch some things?
>
> And it does, as soon as Greg publishes a set of patches.

Yes, yes, I was clearly not explaining myself well.

I see all the reports that you (and Nathan, and Shuah, and others) do
for stable rc's. So I very much know that happens.

But I was literally thinking of that week or two *before* Greg
actually picks up the stable patches because he wants to have them get
some testing in mainline first.

*If* the same kinds of scripts that Greg and Sasha already use to pick
up their stable patches could be automated early, maybe the patches
would also get a bit of special testing in the *context* of the stable
tree? A special automated "these are marked for stable, but haven't
been picked up yet" stable-next testing thing?

                  Linus

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 22:53                             ` Linus Torvalds
@ 2018-09-07 22:57                               ` Sasha Levin
  2018-09-07 23:52                                 ` Guenter Roeck
  2018-09-08 16:33                                 ` Greg Kroah-Hartman
  0 siblings, 2 replies; 138+ messages in thread
From: Sasha Levin @ 2018-09-07 22:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Greg Kroah-Hartman, ksummit

On Fri, Sep 07, 2018 at 03:53:08PM -0700, Linus Torvalds wrote:
>On Fri, Sep 7, 2018 at 3:43 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> On Fri, Sep 07, 2018 at 03:27:01PM -0700, Linus Torvalds wrote:
>> >
>> > So maybe an automated "linux-next" that starts happening *before* the
>> > rc stage would catch some things?
>>
>> And it does, as soon as Greg publishes a set of patches.
>
>Yes, yes, I was clearly not explaining myself well.
>
>I see all the reports that you (and Nathan, and Shuah, and others) do
>for stable rc's. So I very much know that happens.
>
>But I was literally thinking of that week or two *before* Greg
>actually picks up the stable patches because he wants to have them get
>some testing in mainline first.
>
>*If* the same kinds of scripts that Greg and Sasha already use to pick
>up their stable patches could be automated early, maybe the patches
>would also get a bit of special testing in the *context* of the stable
>tree? A special automated "these are marked for stable, but haven't
>been picked up yet" stable-next testing thing?

I agree. This is what I was suggesting with stable-next branches.

I actually had something to do automatic testing of such commits as they
get pushed upstream, sending mails with the results. For example: 
https://lkml.org/lkml/2018/4/4/923 but folks complained it was too
noisy.

Maybe I can use it to build -next branches instead.


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 22:57                               ` Sasha Levin
@ 2018-09-07 23:52                                 ` Guenter Roeck
  2018-09-08 16:33                                 ` Greg Kroah-Hartman
  1 sibling, 0 replies; 138+ messages in thread
From: Guenter Roeck @ 2018-09-07 23:52 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Greg Kroah-Hartman, ksummit

On Fri, Sep 07, 2018 at 10:57:45PM +0000, Sasha Levin wrote:
> On Fri, Sep 07, 2018 at 03:53:08PM -0700, Linus Torvalds wrote:
> >On Fri, Sep 7, 2018 at 3:43 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >>
> >> On Fri, Sep 07, 2018 at 03:27:01PM -0700, Linus Torvalds wrote:
> >> >
> >> > So maybe an automated "linux-next" that starts happening *before* the
> >> > rc stage would catch some things?
> >>
> >> And it does, as soon as Greg publishes a set of patches.
> >
> >Yes, yes, I was clearly not explaining myself well.
> >
> >I see all the reports that you (and Nathan, and Shuah, and others) do
> >for stable rc's. So I very much know that happens.
> >
> >But I was literally thinking of that week or two *before* Greg
> >actually picks up the stable patches because he wants to have them get
> >some testing in mainline first.
> >
> >*If* the same kinds of scripts that Greg and Sasha already use to pick
> >up their stable patches could be automated early, maybe the patches
> >would also get a bit of special testing in the *context* of the stable
> >tree? A special automated "these are marked for stable, but haven't
> >been picked up yet" stable-next testing thing?
> 
> I agree. This is what I was suggesting with stable-next branches.
> 
> I actually had something to do automatic testing of such commits as they
> get pushed upstream, sending mails with the results. For example: 
> https://lkml.org/lkml/2018/4/4/923 but folks complained it was too
> noisy.
> 
Wrong lesson, I think. You can still run those tests, just don't send
result e-mails but use the result as decision criteria if the patch should
be included in a stable release (and, if not, inform the submitter that
it won't be applied to stable because the test failed).

Guenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 21:06               ` Mauro Carvalho Chehab
@ 2018-09-08  9:44                 ` Laurent Pinchart
  2018-09-08 11:48                   ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 138+ messages in thread
From: Laurent Pinchart @ 2018-09-08  9:44 UTC (permalink / raw)
  To: ksummit-discuss; +Cc: Mauro Carvalho Chehab

On Saturday, 8 September 2018 00:06:33 EEST Mauro Carvalho Chehab wrote:
> Em Fri, 7 Sep 2018 11:13:20 +0200 Daniel Vetter escreveu:
> > On Fri, Sep 7, 2018 at 6:27 AM, Theodore Y. Ts'o <tytso@mit.edu> wrote:
> > > On Fri, Sep 07, 2018 at 01:49:31AM +0000, Sasha Levin via
> > > Ksummit-discuss wrote:
> > > 
> > > There actually is a perverse incentive to having all of the test
> > > 'bots, which is that I suspect some people have come to rely on it to
> > > catch problems.  I generally run a full set of regression tests before
> > > I push an update to git.kernel.org (it only takes about 2 hours, and
> > > 12 VM's :-); and by the time we get to the late -rc's I *always* will
> > > do a full regression test.
> > 
> > This is what imo a well-run subsystem should sound like from a testing
> > pov. All the subsystem specific testing should be done before merging.
> > Post-merge is only for integration testing and catching the long-tail
> > issues that need months/years of machine time to surface.
> > 
> > Of course this is much harder for anything that needs physical
> > hardware, but even for driver subsystems there's lots you can do with
> > test-drivers, selftests and a pile of emulation, to at least catch
> > bugs in generic code. And for reasonably sized teams like drm/i915
> > building a proper CI is a very obvious investement that will pay off.
> 
> IMHO, CI would do even a better job for smaller teams, as they won't
> have much resources for testing, but the problem here is that those
> teams probably lack resources and money to invest on a physical hardware
> to setup a CI infra and to buy the myriad of different hardware to
> do regression testing.
> 
> Also, some devices are harder to test: how would you check if a camera
> microphone is working? How to check if the camera captured images
> are ok?

The same way you would check the display output. Cameras can be pointed at 
known scenes with controlled lightning. TV capture cards can be fed a known 
signal. Even for microphone testing we could put the camera in a sound-proof 
enclosure, with an audio source. Solutions exist, whether we have the budget 
to implement them is the real question.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-08  9:44                 ` Laurent Pinchart
@ 2018-09-08 11:48                   ` Mauro Carvalho Chehab
  2018-09-09 14:26                     ` Laurent Pinchart
  0 siblings, 1 reply; 138+ messages in thread
From: Mauro Carvalho Chehab @ 2018-09-08 11:48 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: ksummit-discuss

Em Sat, 08 Sep 2018 12:44:32 +0300
Laurent Pinchart <laurent.pinchart@ideasonboard.com> escreveu:

> On Saturday, 8 September 2018 00:06:33 EEST Mauro Carvalho Chehab wrote:
> > Em Fri, 7 Sep 2018 11:13:20 +0200 Daniel Vetter escreveu:  
> > > On Fri, Sep 7, 2018 at 6:27 AM, Theodore Y. Ts'o <tytso@mit.edu> wrote:  
> > > > On Fri, Sep 07, 2018 at 01:49:31AM +0000, Sasha Levin via
> > > > Ksummit-discuss wrote:
> > > > 
> > > > There actually is a perverse incentive to having all of the test
> > > > 'bots, which is that I suspect some people have come to rely on it to
> > > > catch problems.  I generally run a full set of regression tests before
> > > > I push an update to git.kernel.org (it only takes about 2 hours, and
> > > > 12 VM's :-); and by the time we get to the late -rc's I *always* will
> > > > do a full regression test.  
> > > 
> > > This is what imo a well-run subsystem should sound like from a testing
> > > pov. All the subsystem specific testing should be done before merging.
> > > Post-merge is only for integration testing and catching the long-tail
> > > issues that need months/years of machine time to surface.
> > > 
> > > Of course this is much harder for anything that needs physical
> > > hardware, but even for driver subsystems there's lots you can do with
> > > test-drivers, selftests and a pile of emulation, to at least catch
> > > bugs in generic code. And for reasonably sized teams like drm/i915
> > > building a proper CI is a very obvious investement that will pay off.  
> > 
> > IMHO, CI would do even a better job for smaller teams, as they won't
> > have much resources for testing, but the problem here is that those
> > teams probably lack resources and money to invest on a physical hardware
> > to setup a CI infra and to buy the myriad of different hardware to
> > do regression testing.
> > 
> > Also, some devices are harder to test: how would you check if a camera
> > microphone is working? How to check if the camera captured images
> > are ok?  
> 
> The same way you would check the display output. Cameras can be pointed at 
> known scenes with controlled lightning. TV capture cards can be fed a known 
> signal. Even for microphone testing we could put the camera in a sound-proof 
> enclosure, with an audio source. Solutions exist, whether we have the budget 
> to implement them is the real question.

Solutions exist, but they require a hole new kind of environment control.

In the case of DRM (and TV cards), display output can be tested with some
HDMI grabber card. No need for a "controlled lightning environment" or
anything like that. Once it is set, people can just place it into a
random datacenter located anywhere and forget about it.

However, in the case of hardware like cameras, microphones, speakers,
keyboards, mice, touchscreen, etc, it is a way more complex, as the
environment will require adjustments (a silent room, specific
lightning, mechanical components, etc) and a more proactive supervision,
as it would tend to produce more false positive errors if something
changes there. A normal datacenter won't fit those needs.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 21:43                   ` Sasha Levin
@ 2018-09-08 13:20                     ` Dan Carpenter
  2018-09-10  8:23                     ` Jan Kara
  1 sibling, 0 replies; 138+ messages in thread
From: Dan Carpenter @ 2018-09-08 13:20 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Fri, Sep 07, 2018 at 09:43:58PM +0000, Sasha Levin via Ksummit-discuss wrote:
> >> f46ecbd97f5 ("cifs: Fix slab-out-of-bounds in send_set_info() on SMB2
> >> ACE setting")
> >
> >What was the bug with this one?
> 
> https://github.com/coreos/bugs/issues/2480
> 

So this is another one where we backported a patch without pulling the
earlier changes.

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 22:57                               ` Sasha Levin
  2018-09-07 23:52                                 ` Guenter Roeck
@ 2018-09-08 16:33                                 ` Greg Kroah-Hartman
  2018-09-08 18:35                                   ` Guenter Roeck
  2018-09-09  4:36                                   ` Sasha Levin
  1 sibling, 2 replies; 138+ messages in thread
From: Greg Kroah-Hartman @ 2018-09-08 16:33 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Fri, Sep 07, 2018 at 10:57:45PM +0000, Sasha Levin via Ksummit-discuss wrote:
> On Fri, Sep 07, 2018 at 03:53:08PM -0700, Linus Torvalds wrote:
> >On Fri, Sep 7, 2018 at 3:43 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >>
> >> On Fri, Sep 07, 2018 at 03:27:01PM -0700, Linus Torvalds wrote:
> >> >
> >> > So maybe an automated "linux-next" that starts happening *before* the
> >> > rc stage would catch some things?
> >>
> >> And it does, as soon as Greg publishes a set of patches.
> >
> >Yes, yes, I was clearly not explaining myself well.
> >
> >I see all the reports that you (and Nathan, and Shuah, and others) do
> >for stable rc's. So I very much know that happens.
> >
> >But I was literally thinking of that week or two *before* Greg
> >actually picks up the stable patches because he wants to have them get
> >some testing in mainline first.
> >
> >*If* the same kinds of scripts that Greg and Sasha already use to pick
> >up their stable patches could be automated early, maybe the patches
> >would also get a bit of special testing in the *context* of the stable
> >tree? A special automated "these are marked for stable, but haven't
> >been picked up yet" stable-next testing thing?
> 
> I agree. This is what I was suggesting with stable-next branches.

Ok, this sounds semi-reasonable.  I'll knock something up this week to
see if it's viable to do automated and then have it get sent to 0-day to
do a basic "smoke test".

let's see how that works...

greg k-h

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-08 16:33                                 ` Greg Kroah-Hartman
@ 2018-09-08 18:35                                   ` Guenter Roeck
  2018-09-10 13:47                                     ` Mark Brown
  2018-09-09  4:36                                   ` Sasha Levin
  1 sibling, 1 reply; 138+ messages in thread
From: Guenter Roeck @ 2018-09-08 18:35 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Sasha Levin; +Cc: ksummit

On 09/08/2018 09:33 AM, Greg Kroah-Hartman wrote:
> On Fri, Sep 07, 2018 at 10:57:45PM +0000, Sasha Levin via Ksummit-discuss wrote:
>> On Fri, Sep 07, 2018 at 03:53:08PM -0700, Linus Torvalds wrote:
>>> On Fri, Sep 7, 2018 at 3:43 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>
>>>> On Fri, Sep 07, 2018 at 03:27:01PM -0700, Linus Torvalds wrote:
>>>>>
>>>>> So maybe an automated "linux-next" that starts happening *before* the
>>>>> rc stage would catch some things?
>>>>
>>>> And it does, as soon as Greg publishes a set of patches.
>>>
>>> Yes, yes, I was clearly not explaining myself well.
>>>
>>> I see all the reports that you (and Nathan, and Shuah, and others) do
>>> for stable rc's. So I very much know that happens.
>>>
>>> But I was literally thinking of that week or two *before* Greg
>>> actually picks up the stable patches because he wants to have them get
>>> some testing in mainline first.
>>>
>>> *If* the same kinds of scripts that Greg and Sasha already use to pick
>>> up their stable patches could be automated early, maybe the patches
>>> would also get a bit of special testing in the *context* of the stable
>>> tree? A special automated "these are marked for stable, but haven't
>>> been picked up yet" stable-next testing thing?
>>
>> I agree. This is what I was suggesting with stable-next branches.
> 
> Ok, this sounds semi-reasonable.  I'll knock something up this week to
> see if it's viable to do automated and then have it get sent to 0-day to
> do a basic "smoke test".

This is a good idea to help finding build errors earlier, but having it
in place would not have helped avoiding any of the reported regressions
on stable releases.

On the other side, adding networking tests and some basic virtual/network
file system tests to the existing test suites would have caught several
of the recent regressions. Not all of them, for sure, but at least some.
If I had to choose one improvement, doing that would be my preference.

Guenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-08 16:33                                 ` Greg Kroah-Hartman
  2018-09-08 18:35                                   ` Guenter Roeck
@ 2018-09-09  4:36                                   ` Sasha Levin
  1 sibling, 0 replies; 138+ messages in thread
From: Sasha Levin @ 2018-09-09  4:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: ksummit

On Sat, Sep 08, 2018 at 06:33:51PM +0200, Greg Kroah-Hartman wrote:
>On Fri, Sep 07, 2018 at 10:57:45PM +0000, Sasha Levin via Ksummit-discuss wrote:
>> On Fri, Sep 07, 2018 at 03:53:08PM -0700, Linus Torvalds wrote:
>> >On Fri, Sep 7, 2018 at 3:43 PM Guenter Roeck <linux@roeck-us.net> wrote:
>> >>
>> >> On Fri, Sep 07, 2018 at 03:27:01PM -0700, Linus Torvalds wrote:
>> >> >
>> >> > So maybe an automated "linux-next" that starts happening *before* the
>> >> > rc stage would catch some things?
>> >>
>> >> And it does, as soon as Greg publishes a set of patches.
>> >
>> >Yes, yes, I was clearly not explaining myself well.
>> >
>> >I see all the reports that you (and Nathan, and Shuah, and others) do
>> >for stable rc's. So I very much know that happens.
>> >
>> >But I was literally thinking of that week or two *before* Greg
>> >actually picks up the stable patches because he wants to have them get
>> >some testing in mainline first.
>> >
>> >*If* the same kinds of scripts that Greg and Sasha already use to pick
>> >up their stable patches could be automated early, maybe the patches
>> >would also get a bit of special testing in the *context* of the stable
>> >tree? A special automated "these are marked for stable, but haven't
>> >been picked up yet" stable-next testing thing?
>>
>> I agree. This is what I was suggesting with stable-next branches.
>
>Ok, this sounds semi-reasonable.  I'll knock something up this week to
>see if it's viable to do automated and then have it get sent to 0-day to
>do a basic "smoke test".
>
>let's see how that works...

I've pushed an autogenerated branch for 4.14 here: https://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux-stable.git/log/?h=linux-4.14.y-next

Would be interesting to compare with what you end up with.


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 16:17                 ` Linus Torvalds
  2018-09-07 21:39                   ` Mauro Carvalho Chehab
@ 2018-09-09 12:50                   ` Stephen Rothwell
  2018-09-10 20:05                     ` Tony Lindgren
  1 sibling, 1 reply; 138+ messages in thread
From: Stephen Rothwell @ 2018-09-09 12:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 852 bytes --]

Hi Linus,

On Fri, 7 Sep 2018 09:17:18 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> For example, for me, the merge window is my busiest season by far
> (obviously).  I try to schedule my time off to coincide with late in
> the rc, because it's just quieter: at that point I'm mostly waiting
> for stuff.
> 
> But that's actually _supposed_ to be just me (and maybe Stephen Rothwell).

Actually, my quiet time is about rc1 to rc3, because everything I was
dealing with is now in your tree and people haven't added new stuff to
their trees yet (at least not much).  My busiest time is rc6 to the
middle of the merge window as people stuff in all the bits and pieces
that need to be merged (and manage to create conflicts with what you
have already merged during the merge window) :-(

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-08 11:48                   ` Mauro Carvalho Chehab
@ 2018-09-09 14:26                     ` Laurent Pinchart
  2018-09-10 22:14                       ` Eduardo Valentin
  0 siblings, 1 reply; 138+ messages in thread
From: Laurent Pinchart @ 2018-09-09 14:26 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: ksummit-discuss

On Saturday, 8 September 2018 14:48:22 EEST Mauro Carvalho Chehab wrote:
> Em Sat, 08 Sep 2018 12:44:32 +0300 Laurent Pinchart escreveu:
> > On Saturday, 8 September 2018 00:06:33 EEST Mauro Carvalho Chehab wrote:
> >> Em Fri, 7 Sep 2018 11:13:20 +0200 Daniel Vetter escreveu:
> >>> On Fri, Sep 7, 2018 at 6:27 AM, Theodore Y. Ts'o wrote:
> >>>> On Fri, Sep 07, 2018 at 01:49:31AM +0000, Sasha Levin via
> >>>> Ksummit-discuss wrote:
> >>>> 
> >>>> There actually is a perverse incentive to having all of the test
> >>>> 'bots, which is that I suspect some people have come to rely on it
> >>>> to catch problems.  I generally run a full set of regression tests
> >>>> before I push an update to git.kernel.org (it only takes about 2
> >>>> hours, and 12 VM's :-); and by the time we get to the late -rc's I
> >>>> *always* will do a full regression test.
> >>> 
> >>> This is what imo a well-run subsystem should sound like from a testing
> >>> pov. All the subsystem specific testing should be done before merging.
> >>> Post-merge is only for integration testing and catching the long-tail
> >>> issues that need months/years of machine time to surface.
> >>> 
> >>> Of course this is much harder for anything that needs physical
> >>> hardware, but even for driver subsystems there's lots you can do with
> >>> test-drivers, selftests and a pile of emulation, to at least catch
> >>> bugs in generic code. And for reasonably sized teams like drm/i915
> >>> building a proper CI is a very obvious investement that will pay off.
> >> 
> >> IMHO, CI would do even a better job for smaller teams, as they won't
> >> have much resources for testing, but the problem here is that those
> >> teams probably lack resources and money to invest on a physical hardware
> >> to setup a CI infra and to buy the myriad of different hardware to
> >> do regression testing.
> >> 
> >> Also, some devices are harder to test: how would you check if a camera
> >> microphone is working? How to check if the camera captured images
> >> are ok?
> > 
> > The same way you would check the display output. Cameras can be pointed at
> > known scenes with controlled lightning. TV capture cards can be fed a
> > known signal. Even for microphone testing we could put the camera in a
> > sound-proof enclosure, with an audio source. Solutions exist, whether we
> > have the budget to implement them is the real question.
> 
> Solutions exist, but they require a hole new kind of environment control.
> 
> In the case of DRM (and TV cards), display output can be tested with some
> HDMI grabber card. No need for a "controlled lightning environment" or
> anything like that. Once it is set, people can just place it into a
> random datacenter located anywhere and forget about it.
> 
> However, in the case of hardware like cameras, microphones, speakers,
> keyboards, mice, touchscreen, etc, it is a way more complex, as the
> environment will require adjustments (a silent room, specific
> lightning, mechanical components, etc) and a more proactive supervision,
> as it would tend to produce more false positive errors if something
> changes there. A normal datacenter won't fit those needs.

We would have to build hardware (in the generic sense, not necessarily 
electronics), but that's not specific to cameras. An exclosure with a scene, a 
light and a camera wouldn't necessarily be larger than someone of the ARM 
development boards I've had the "pleasure" to work with.

Again, solutions exist, it's a matter of how willing we are to implement them. 
If we consider testing crucial, then we have to invest resources in making it 
happen. If we don't invest the resources, then we can't claim that we value 
these particular tests very high.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 21:32                 ` Dan Carpenter
  2018-09-07 21:43                   ` Sasha Levin
@ 2018-09-10  7:53                   ` Jan Kara
  1 sibling, 0 replies; 138+ messages in thread
From: Jan Kara @ 2018-09-10  7:53 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: ksummit

On Sat 08-09-18 00:32:13, Dan Carpenter wrote:
> On Fri, Sep 07, 2018 at 03:06:24PM +0000, Sasha Levin via Ksummit-discuss wrote:
> > Let me use the CoreOS example here again. Here are the 5 user visible
> > stable regressions they had this year:
> > 
> > 8844618d8aa ("ext4: only look at the bg_flags field if it is valid")
> 
> The fix was 501228470077 ("ext4: fix check to prevent initializing
> reserved inodes").  The bug was found by running the test suite but with
> nojournal?
 
Yeah, this could have been caught by automatic testing ext4 people do, just
the configuration it happens in was not among the tested ones... The test
matrix is too big to test everything so only common configs get tested.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 21:43                   ` Sasha Levin
  2018-09-08 13:20                     ` Dan Carpenter
@ 2018-09-10  8:23                     ` Jan Kara
  1 sibling, 0 replies; 138+ messages in thread
From: Jan Kara @ 2018-09-10  8:23 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit, Dan Carpenter

On Fri 07-09-18 21:43:58, Sasha Levin via Ksummit-discuss wrote:
> On Sat, Sep 08, 2018 at 12:32:13AM +0300, Dan Carpenter wrote:
> >> 7b2ee50c0cd ("hv_netvsc: common detach logic")
> >
> >The patch summary sells this as a cleanup but it's a bugfix.  The fix
> >for it was commit 52acf73b6e9a ("hv_netvsc: Fix a network regression
> >after ifdown/ifup").  It took two months for anyone to notice the if
> >up/down sometimes fails.  Are there any standard tests for network
> >drivers?  There is no way we're going to hold back the patch for two
> >months.
> 
> These examples were less about "keep it waiting longer" and more to show
> that it'll be hard and/or pointless trying to restrict what goes in
> Stable as regressions come from commits that are "obviously" stable
> material.

I agree neither of these regressions would likely be prevented by waiting
longer and I also agree all those fixes should have been taken into stable.
I don't agree with the conclusion "it'll be hard and/or pointless trying to
restrict what goes in Stable" - in my opinion every patch included into
stable carries a risk of a similar regression. The more patches you include
the higher the chances of a regression. So you have to make sure the
problems fixed by included patches are serious enough that they outweight
this risk. Whether the bar for patch inclusion into stable is high enough
is a question some people dispute... And I agree it's a tough decision
because different people have different ideas on what is important enough
and it also obviously depends on the use case.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-08 18:35                                   ` Guenter Roeck
@ 2018-09-10 13:47                                     ` Mark Brown
  0 siblings, 0 replies; 138+ messages in thread
From: Mark Brown @ 2018-09-10 13:47 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Greg Kroah-Hartman, ksummit

[-- Attachment #1: Type: text/plain, Size: 653 bytes --]

On Sat, Sep 08, 2018 at 11:35:54AM -0700, Guenter Roeck wrote:

> On the other side, adding networking tests and some basic virtual/network
> file system tests to the existing test suites would have caught several
> of the recent regressions. Not all of them, for sure, but at least some.
> If I had to choose one improvement, doing that would be my preference.

And also using more of the testsuites which we have but aren't widely
deployed outside of the subsystem maintainers/developers - we have a
bunch of things, especially in graphics and storage related areas but
they're not joined up with any of the stable testing stuff yet as far as
I know.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 22:43                           ` Guenter Roeck
  2018-09-07 22:53                             ` Linus Torvalds
@ 2018-09-10 16:20                             ` Dan Rue
  1 sibling, 0 replies; 138+ messages in thread
From: Dan Rue @ 2018-09-10 16:20 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Greg Kroah-Hartman, ksummit

On Fri, Sep 07, 2018 at 03:43:46PM -0700, Guenter Roeck wrote:
> On Fri, Sep 07, 2018 at 03:27:01PM -0700, Linus Torvalds wrote:
> > On Fri, Sep 7, 2018 at 2:13 PM Sasha Levin via Ksummit-discuss
> > <ksummit-discuss@lists.linuxfoundation.org> wrote:
> > >
> > > 1. Grab a batch of ~2-3 week old commits from Linus's tree.
> > > 2. Review, basic tests and send stable-rc notification.
> > 
> > Side note: maybe the stable grabbing and testing could be automated?
> > 
> > IOW, right now the stable people intentionally (generally) wait a week
> > before they even start. Maybe there could be an automated queue for
> > "this has been marked for stable" (and the whole "fixes:" magic that
> > you guys already trigger on) that gets applied to the previous stable
> > tree, and starts testing immediately.
> > 
> > Because one of the patterns we *do* obviously see is that something
> > was fine in mainline, but then broke in stable because of an unforseen
> > lack of depdenencies. Sure, it's probably pretty rare (and *many*
> > dependencies willl show up as an actual conflict), but I think the
> > times it does happen it's particularly painful because it can be so
> > non-obvious.
> > 
> > So maybe an automated "linux-next" that starts happening *before* the
> > rc stage would catch some things?
> > 
> 
> And it does, as soon as Greg publishes a set of patches. At the very
> least 0day runs on those, as well as my builders. There is a question
> of scalability, though. I am sure that will improve over time as more
> test resources become available, but six stable releases plus mainline
> plus next plus whatever contributing branches covered by 0day and others
> does take a lot of resources.

We build and test every push to the stable-rc branches, mainline, and
next, too. Branches have a carrying cost (maintenance, triage, etc -
this cost goes down as tooling improves), and pushes have an
infrastructure (build and test) cost. I'll agree with Guenter that I'd
rather see better coverage on fewer branches. It's also important for
tree maintainers to realize the downstream consequences of a push on the
CI/CD environments. Sometimes we see a lot of pushes with little or no
changes, which causes duplicate testing and delays results. On the other
hand, it is nice to have bisection points within a branch for when
problems do emerge.

The fewer branches we track for CI/CD, the more attention they will
receive, the deeper the testing can go, and the better the results will
be. -next is a good example of this principle.

In fact, we recently stopped building and testing the stable release
branches altogether, because it's redundant if you're already testing
every push on the stable-rc branches.

This is also true for test suites/frameworks. I'd prefer to see more
activity around fewer test suites, rather than a proliferation of
different ways of writing and running tests. The cost of adding a test
suite to CI/CD is high, especially if you do it well, and so as we see
efforts centralize around fewer suites, coverage and test quality will
improve in general.

Dan

> 
> Personally I would suggest to further improve test coverage, not to add
> more branches to test. More hardware for sure, but also adding more tests
> such as the network testing suggested by Sasha.

> 
> Guenter
> _______________________________________________
> Ksummit-discuss mailing list
> Ksummit-discuss@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-07 15:52               ` Linus Torvalds
  2018-09-07 16:17                 ` Linus Torvalds
@ 2018-09-10 19:43                 ` Sasha Levin
  2018-09-10 20:45                   ` Steven Rostedt
  1 sibling, 1 reply; 138+ messages in thread
From: Sasha Levin @ 2018-09-10 19:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ksummit

On Fri, Sep 07, 2018 at 08:52:40AM -0700, Linus Torvalds wrote:
>So this is what my argument really boils down to: the more critical a
>patch is, the more likely it is to be pushed more aggressively, which
>in turn makes it statistically much more likely to show up not only
>during the latter part of the development cycle, but it will directly
>mean that it looks "less tested".
>
>And AT THE SAME TIME, the more critical a patch is, the more likely it
>is to also show up as a problem spot for distros. Because, by
>definition, it touched something critical and likely subtle.
>
>End result: BY DEFINITION you'll see a correlation between "less
>testing" and "more problems".
>
>But THAT is correlation. That's not the fundamental causation.
>
>Now, I agree that it's correlation that makes sense to treat as
>causation. It just is very tempting to say: "less testing obviously
>means more problems". And I do think that it's very possibly a real
>causal property as well, but my argument has been that it's not at all
>obviously so, exactly because I would expect that correlation to exist
>even if there was absolutely ZERO causality.
>
>See what my argument is? You're arguing from correlation. And I think
>there is a much more direct causal argument that explains a lot of the
>correlation.

Both of us agree that patches in later -rc cycles are buggier. We don't
agree on why, but I think that it actually doesn't matter much. For the
sake of the argument, let's go with what you're saying and assume that
they're buggier because they are are more critical, tricky and subtle.

So we have this time period of a few weeks where we know that we're
going to see tricky patches. What can we do to better deal with it?
Saying that we'll just see more bugs and we should just live with it
because it's "BY DEFINITION" is not really a good answer IMO.

For stable trees, we can address that by waiting even longer before
picking up -rc5+ stuff, but that will move us further away from your
tree which is an undesirable effect.

I don't have anything beyond guesses, but I don't think the
solution here is WONTFIX.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-09 12:50                   ` Stephen Rothwell
@ 2018-09-10 20:05                     ` Tony Lindgren
  0 siblings, 0 replies; 138+ messages in thread
From: Tony Lindgren @ 2018-09-10 20:05 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: ksummit

* Stephen Rothwell <sfr@canb.auug.org.au> [180909 12:55]:
> Hi Linus,
> 
> On Fri, 7 Sep 2018 09:17:18 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> > For example, for me, the merge window is my busiest season by far
> > (obviously).  I try to schedule my time off to coincide with late in
> > the rc, because it's just quieter: at that point I'm mostly waiting
> > for stuff.
> > 
> > But that's actually _supposed_ to be just me (and maybe Stephen Rothwell).
> 
> Actually, my quiet time is about rc1 to rc3, because everything I was
> dealing with is now in your tree and people haven't added new stuff to
> their trees yet (at least not much).  My busiest time is rc6 to the
> middle of the merge window as people stuff in all the bits and pieces
> that need to be merged (and manage to create conflicts with what you
> have already merged during the merge window) :-(

Yeah with Linux next, rc6 and later seems to be where we often
see surprise regressions.

What has worked good for me for past three or so years is keep
testing Linux next few times a week and report the regressions.
Usually the regressions in Linux next get fixed fast within a
few days.

Before I was doing that I would always end up wasting my time
chasing multiple regressions during the -rc cycle.

I think we should always aim for a regression free -rc1.

Not sure what we could do to get people test Linux next
more and keep it regression free. But that could potentially
make the -rc cycles very quiet.

Regards,

Tony

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 19:43                 ` Sasha Levin
@ 2018-09-10 20:45                   ` Steven Rostedt
  2018-09-10 21:20                     ` Guenter Roeck
                                       ` (2 more replies)
  0 siblings, 3 replies; 138+ messages in thread
From: Steven Rostedt @ 2018-09-10 20:45 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Mon, 10 Sep 2018 19:43:11 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> On Fri, Sep 07, 2018 at 08:52:40AM -0700, Linus Torvalds wrote:
> >So this is what my argument really boils down to: the more critical a
> >patch is, the more likely it is to be pushed more aggressively, which
> >in turn makes it statistically much more likely to show up not only
> >during the latter part of the development cycle, but it will directly
> >mean that it looks "less tested".
> >
> >And AT THE SAME TIME, the more critical a patch is, the more likely it
> >is to also show up as a problem spot for distros. Because, by
> >definition, it touched something critical and likely subtle.
> >
> >End result: BY DEFINITION you'll see a correlation between "less
> >testing" and "more problems".
> >
> >But THAT is correlation. That's not the fundamental causation.
> >
> >Now, I agree that it's correlation that makes sense to treat as
> >causation. It just is very tempting to say: "less testing obviously
> >means more problems". And I do think that it's very possibly a real
> >causal property as well, but my argument has been that it's not at all
> >obviously so, exactly because I would expect that correlation to exist
> >even if there was absolutely ZERO causality.
> >
> >See what my argument is? You're arguing from correlation. And I think
> >there is a much more direct causal argument that explains a lot of the
> >correlation.  
> 
> Both of us agree that patches in later -rc cycles are buggier. We don't
> agree on why, but I think that it actually doesn't matter much. For the
> sake of the argument, let's go with what you're saying and assume that
> they're buggier because they are are more critical, tricky and subtle.
> 
> So we have this time period of a few weeks where we know that we're
> going to see tricky patches. What can we do to better deal with it?
> Saying that we'll just see more bugs and we should just live with it
> because it's "BY DEFINITION" is not really a good answer IMO.
> 
> For stable trees, we can address that by waiting even longer before
> picking up -rc5+ stuff, but that will move us further away from your
> tree which is an undesirable effect.
> 
> I don't have anything beyond guesses, but I don't think the
> solution here is WONTFIX.
> 

I think it may be more of CANTFIX.

The bugs introduced after -rc5 are more subtle and harder to trigger. I
(and I presume Linus, but he can talk for himself) don't believe that
keeping it in linux-next any longer will help find them, unless the
bots get better to do so. The problem is that these bugs are not going
to be triggered until they get into the mainline kernel and perhaps not
even until they get into the distros. We want to find them before that,
but it's not until they are used in production environments that they
will get found.

The best we can do is make the automated testing of linux-next better
such that there's less -rc5 patches that need to go in in the first
place.

I do think that anything that goes into -rc5 or later should be tested
by the developer and the 0day bot, to make sure they don't introduce
some silly bug. But linux-next was mainly to deal with bugs caused by
integration of various sub systems. But -rc5 fixes only care about
integrating with mainline. And as Linus pointed out, when it gets into
mainline, it will then be pulled into linux-next where it gets
integrated with new code coming into the next merge window.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 20:45                   ` Steven Rostedt
@ 2018-09-10 21:20                     ` Guenter Roeck
  2018-09-10 21:46                       ` Steven Rostedt
                                         ` (2 more replies)
  2018-09-10 23:01                     ` Eduardo Valentin
  2018-09-10 23:38                     ` Sasha Levin
  2 siblings, 3 replies; 138+ messages in thread
From: Guenter Roeck @ 2018-09-10 21:20 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

On Mon, Sep 10, 2018 at 04:45:19PM -0400, Steven Rostedt wrote:
> 
> The best we can do is make the automated testing of linux-next better
> such that there's less -rc5 patches that need to go in in the first
> place.
> 

Would that help ? -next has been more or less unusable for a week or so.
Maybe it is just a bad time (it hasn't been as bad as it is right now
for quite some time), but

Build results:
	total: 135 pass: 133 fail: 2
Qemu test results:
	total: 315 pass: 112 fail: 203

on next-20180910 doesn't really make me very confident that useful regression
tests on -next are even possible. it seems to me that -next is quite often
used as dumping ground for sparsely tested changes, and is far from "ready
for upstream".

Guenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 21:20                     ` Guenter Roeck
@ 2018-09-10 21:46                       ` Steven Rostedt
  2018-09-10 23:03                         ` Eduardo Valentin
  2018-09-11  0:47                         ` Stephen Rothwell
  2018-09-11  0:43                       ` Stephen Rothwell
  2018-09-11 11:18                       ` Mark Brown
  2 siblings, 2 replies; 138+ messages in thread
From: Steven Rostedt @ 2018-09-10 21:46 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: ksummit

On Mon, 10 Sep 2018 14:20:19 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> On Mon, Sep 10, 2018 at 04:45:19PM -0400, Steven Rostedt wrote:
> > 
> > The best we can do is make the automated testing of linux-next better
> > such that there's less -rc5 patches that need to go in in the first
> > place.
> >   
> 
> Would that help ? -next has been more or less unusable for a week or so.
> Maybe it is just a bad time (it hasn't been as bad as it is right now
> for quite some time), but
> 
> Build results:
> 	total: 135 pass: 133 fail: 2
> Qemu test results:
> 	total: 315 pass: 112 fail: 203
> 
> on next-20180910 doesn't really make me very confident that useful regression
> tests on -next are even possible. it seems to me that -next is quite often
> used as dumping ground for sparsely tested changes, and is far from "ready
> for upstream".
>

Honestly, I think this is something that Linus should yell at
maintainers for. I treat my pushes into linux-next the same as I treat
my pull requests to Linus. I don't push anything into next until it's
been fully run through my test suite, and passes. That also makes it
easier for me to know that whatever I have in next is also ready for
Linus (the way it was suppose to be).

With the 0day bot, I think it's become much better. But honestly, I
think any branch that causes next to fail to build, or run basic tests,
should be taken out of linux-next and a nasty message sent to the
guilty maintainer. With the exception that a breakage was caused by two
conflicting commits (for example, one that changes an API, and another
branch that uses that API without the update). Those types of breakages
is what linux-next is made for. But if the branch being pulled into
linux-next breaks something without the integration, then that's
unacceptable.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-09 14:26                     ` Laurent Pinchart
@ 2018-09-10 22:14                       ` Eduardo Valentin
  0 siblings, 0 replies; 138+ messages in thread
From: Eduardo Valentin @ 2018-09-10 22:14 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Mauro Carvalho Chehab, ksummit-discuss

Hello,

On Sun, Sep 09, 2018 at 05:26:48PM +0300, Laurent Pinchart wrote:
> On Saturday, 8 September 2018 14:48:22 EEST Mauro Carvalho Chehab wrote:
> > Em Sat, 08 Sep 2018 12:44:32 +0300 Laurent Pinchart escreveu:
> > > On Saturday, 8 September 2018 00:06:33 EEST Mauro Carvalho Chehab wrote:
> > >> Em Fri, 7 Sep 2018 11:13:20 +0200 Daniel Vetter escreveu:
> > >>> On Fri, Sep 7, 2018 at 6:27 AM, Theodore Y. Ts'o wrote:
> > >>>> On Fri, Sep 07, 2018 at 01:49:31AM +0000, Sasha Levin via
> > >>>> Ksummit-discuss wrote:
> > >>>> 
> > >>>> There actually is a perverse incentive to having all of the test
> > >>>> 'bots, which is that I suspect some people have come to rely on it
> > >>>> to catch problems.  I generally run a full set of regression tests
> > >>>> before I push an update to git.kernel.org (it only takes about 2
> > >>>> hours, and 12 VM's :-); and by the time we get to the late -rc's I
> > >>>> *always* will do a full regression test.
> > >>> 
> > >>> This is what imo a well-run subsystem should sound like from a testing
> > >>> pov. All the subsystem specific testing should be done before merging.
> > >>> Post-merge is only for integration testing and catching the long-tail
> > >>> issues that need months/years of machine time to surface.
> > >>> 
> > >>> Of course this is much harder for anything that needs physical
> > >>> hardware, but even for driver subsystems there's lots you can do with
> > >>> test-drivers, selftests and a pile of emulation, to at least catch
> > >>> bugs in generic code. And for reasonably sized teams like drm/i915
> > >>> building a proper CI is a very obvious investement that will pay off.
> > >> 
> > >> IMHO, CI would do even a better job for smaller teams, as they won't
> > >> have much resources for testing, but the problem here is that those
> > >> teams probably lack resources and money to invest on a physical hardware
> > >> to setup a CI infra and to buy the myriad of different hardware to
> > >> do regression testing.
> > >> 
> > >> Also, some devices are harder to test: how would you check if a camera
> > >> microphone is working? How to check if the camera captured images
> > >> are ok?
> > > 
> > > The same way you would check the display output. Cameras can be pointed at
> > > known scenes with controlled lightning. TV capture cards can be fed a
> > > known signal. Even for microphone testing we could put the camera in a
> > > sound-proof enclosure, with an audio source. Solutions exist, whether we
> > > have the budget to implement them is the real question.
> > 
> > Solutions exist, but they require a hole new kind of environment control.
> > 
> > In the case of DRM (and TV cards), display output can be tested with some
> > HDMI grabber card. No need for a "controlled lightning environment" or
> > anything like that. Once it is set, people can just place it into a
> > random datacenter located anywhere and forget about it.
> > 
> > However, in the case of hardware like cameras, microphones, speakers,
> > keyboards, mice, touchscreen, etc, it is a way more complex, as the
> > environment will require adjustments (a silent room, specific
> > lightning, mechanical components, etc) and a more proactive supervision,
> > as it would tend to produce more false positive errors if something
> > changes there. A normal datacenter won't fit those needs.
> 
> We would have to build hardware (in the generic sense, not necessarily 
> electronics), but that's not specific to cameras. An exclosure with a scene, a 
> light and a camera wouldn't necessarily be larger than someone of the ARM 
> development boards I've had the "pleasure" to work with.
> 
> Again, solutions exist, it's a matter of how willing we are to implement them. 
> If we consider testing crucial, then we have to invest resources in making it 
> happen. If we don't invest the resources, then we can't claim that we value 
> these particular tests very high.

Yeah, taking the camera case aside for a moment, thermal has similar
issues. For proper characterization and testing of the control loop
algorithms, one requires a controlled environment to isolate variability
across runs (e.g. ambient temperature).

But there two aspects here. While I do agree with Mauro that a CI may
have more value for a targeted team (say specific driver, specific board,
specific product, etc) and prohibitive for a larger project (do we want
to setup all cameras supported by v4l into a CI, or all thermal sensors
supported by thermal subsystem into a CI?), limiting the scope of the
testing and getting at least some automation, maybe based on emulation,
to test out the core code and maybe a subset of drivers, it is still
worth to have into a CI setup. Of course, the sizing and investment
needed may change from subsystem to subsystem.


> 
> -- 
> Regards,
> 
> Laurent Pinchart
> 
> 
> 
> _______________________________________________
> Ksummit-discuss mailing list
> Ksummit-discuss@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 20:45                   ` Steven Rostedt
  2018-09-10 21:20                     ` Guenter Roeck
@ 2018-09-10 23:01                     ` Eduardo Valentin
  2018-09-10 23:12                       ` Steven Rostedt
  2018-09-10 23:38                     ` Sasha Levin
  2 siblings, 1 reply; 138+ messages in thread
From: Eduardo Valentin @ 2018-09-10 23:01 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

On Mon, Sep 10, 2018 at 04:45:19PM -0400, Steven Rostedt wrote:
> On Mon, 10 Sep 2018 19:43:11 +0000
> Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> 
> > On Fri, Sep 07, 2018 at 08:52:40AM -0700, Linus Torvalds wrote:
> > >So this is what my argument really boils down to: the more critical a
> > >patch is, the more likely it is to be pushed more aggressively, which
> > >in turn makes it statistically much more likely to show up not only
> > >during the latter part of the development cycle, but it will directly
> > >mean that it looks "less tested".
> > >
> > >And AT THE SAME TIME, the more critical a patch is, the more likely it
> > >is to also show up as a problem spot for distros. Because, by
> > >definition, it touched something critical and likely subtle.
> > >
> > >End result: BY DEFINITION you'll see a correlation between "less
> > >testing" and "more problems".
> > >
> > >But THAT is correlation. That's not the fundamental causation.
> > >
> > >Now, I agree that it's correlation that makes sense to treat as
> > >causation. It just is very tempting to say: "less testing obviously
> > >means more problems". And I do think that it's very possibly a real
> > >causal property as well, but my argument has been that it's not at all
> > >obviously so, exactly because I would expect that correlation to exist
> > >even if there was absolutely ZERO causality.
> > >
> > >See what my argument is? You're arguing from correlation. And I think
> > >there is a much more direct causal argument that explains a lot of the
> > >correlation.  
> > 
> > Both of us agree that patches in later -rc cycles are buggier. We don't
> > agree on why, but I think that it actually doesn't matter much. For the
> > sake of the argument, let's go with what you're saying and assume that
> > they're buggier because they are are more critical, tricky and subtle.
> > 
> > So we have this time period of a few weeks where we know that we're
> > going to see tricky patches. What can we do to better deal with it?
> > Saying that we'll just see more bugs and we should just live with it
> > because it's "BY DEFINITION" is not really a good answer IMO.
> > 
> > For stable trees, we can address that by waiting even longer before
> > picking up -rc5+ stuff, but that will move us further away from your
> > tree which is an undesirable effect.
> > 
> > I don't have anything beyond guesses, but I don't think the
> > solution here is WONTFIX.
> > 
> 
> I think it may be more of CANTFIX.
> 
> The bugs introduced after -rc5 are more subtle and harder to trigger. I
> (and I presume Linus, but he can talk for himself) don't believe that
> keeping it in linux-next any longer will help find them, unless the
> bots get better to do so. The problem is that these bugs are not going
> to be triggered until they get into the mainline kernel and perhaps not
> even until they get into the distros. We want to find them before that,
> but it's not until they are used in production environments that they
> will get found.
> 

I agree that leaving in linux-next, with no improvements to bots, would
not help much. Maybe it will complicate the life of stable tree maintainers and
consumers. 

One thing that could be done to help is to ask from developers for
some sort of selftest that can be executed by the bots and used while
backporting their fixes to stable. That way the developer can have a way
to tell how to check if the kernel did not regress and whoever wants to
try out the fix can validate it. Of course, can this really fly, that is
a different story. Not sure the community will end up in a place where
all patches post -rc5 requires a selftest :-)

And of course, there is the other type of regression, which is the fix /
backport causing issue on other parts of the kernel/subsystem. Maybe
forcing each subsystem to have some sort of selftest/sanity check would
be one way to improve the reliability of the results of the bots
overall.

> 
> -- Steve
> _______________________________________________
> Ksummit-discuss mailing list
> Ksummit-discuss@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 21:46                       ` Steven Rostedt
@ 2018-09-10 23:03                         ` Eduardo Valentin
  2018-09-10 23:13                           ` Steven Rostedt
  2018-09-11  0:49                           ` Stephen Rothwell
  2018-09-11  0:47                         ` Stephen Rothwell
  1 sibling, 2 replies; 138+ messages in thread
From: Eduardo Valentin @ 2018-09-10 23:03 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

On Mon, Sep 10, 2018 at 05:46:38PM -0400, Steven Rostedt wrote:
> On Mon, 10 Sep 2018 14:20:19 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Mon, Sep 10, 2018 at 04:45:19PM -0400, Steven Rostedt wrote:
> > > 
> > > The best we can do is make the automated testing of linux-next better
> > > such that there's less -rc5 patches that need to go in in the first
> > > place.
> > >   
> > 
> > Would that help ? -next has been more or less unusable for a week or so.
> > Maybe it is just a bad time (it hasn't been as bad as it is right now
> > for quite some time), but
> > 
> > Build results:
> > 	total: 135 pass: 133 fail: 2
> > Qemu test results:
> > 	total: 315 pass: 112 fail: 203
> > 
> > on next-20180910 doesn't really make me very confident that useful regression
> > tests on -next are even possible. it seems to me that -next is quite often
> > used as dumping ground for sparsely tested changes, and is far from "ready
> > for upstream".
> >
> 
> Honestly, I think this is something that Linus should yell at
> maintainers for. I treat my pushes into linux-next the same as I treat
> my pull requests to Linus. I don't push anything into next until it's
> been fully run through my test suite, and passes. That also makes it
> easier for me to know that whatever I have in next is also ready for
> Linus (the way it was suppose to be).


Shouldn't we all be doing that?

> 
> With the 0day bot, I think it's become much better. But honestly, I
> think any branch that causes next to fail to build, or run basic tests,
> should be taken out of linux-next and a nasty message sent to the
> guilty maintainer. With the exception that a breakage was caused by two
> conflicting commits (for example, one that changes an API, and another
> branch that uses that API without the update). Those types of breakages
> is what linux-next is made for. But if the branch being pulled into
> linux-next breaks something without the integration, then that's
> unacceptable.

I thought that was the case already, everthing that goes to linux-next
is ready to go to Linus.

> 
> -- Steve
> _______________________________________________
> Ksummit-discuss mailing list
> Ksummit-discuss@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 23:01                     ` Eduardo Valentin
@ 2018-09-10 23:12                       ` Steven Rostedt
  2018-09-10 23:32                         ` Eduardo Valentin
  0 siblings, 1 reply; 138+ messages in thread
From: Steven Rostedt @ 2018-09-10 23:12 UTC (permalink / raw)
  To: Eduardo Valentin; +Cc: ksummit

On Mon, 10 Sep 2018 16:01:06 -0700
Eduardo Valentin <edubezval@gmail.com> wrote:

> One thing that could be done to help is to ask from developers for
> some sort of selftest that can be executed by the bots and used while
> backporting their fixes to stable. That way the developer can have a way

We have that already, it's tools/testing/selftests/...

There's a series of ftrace selftests there that I run before running my
own more complicated tests. There's still tests I need to move to that
selftest directory and out of my own suite, but there's some tests that
are too complicated for the the selftests directory.


> to tell how to check if the kernel did not regress and whoever wants to
> try out the fix can validate it. Of course, can this really fly, that is
> a different story. Not sure the community will end up in a place where
> all patches post -rc5 requires a selftest :-)
> 
> And of course, there is the other type of regression, which is the fix /
> backport causing issue on other parts of the kernel/subsystem. Maybe
> forcing each subsystem to have some sort of selftest/sanity check would
> be one way to improve the reliability of the results of the bots
> overall.

Heh, "forcing"? That hasn't been able to work yet ;-)

Also, tests for others that don't have the necessary hardware is not
going to help much. A lot of regressions show up on hardware that we
don't use.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 23:03                         ` Eduardo Valentin
@ 2018-09-10 23:13                           ` Steven Rostedt
  2018-09-11 15:42                             ` Steven Rostedt
  2018-09-11  0:49                           ` Stephen Rothwell
  1 sibling, 1 reply; 138+ messages in thread
From: Steven Rostedt @ 2018-09-10 23:13 UTC (permalink / raw)
  To: Eduardo Valentin; +Cc: ksummit

On Mon, 10 Sep 2018 16:03:03 -0700
Eduardo Valentin <edubezval@gmail.com> wrote:

> I thought that was the case already, everthing that goes to linux-next
> is ready to go to Linus.

It's suppose to be, but not always, and this is why I suggested that
Linus start yelling at those that are not doing it.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 23:12                       ` Steven Rostedt
@ 2018-09-10 23:32                         ` Eduardo Valentin
  2018-09-10 23:38                           ` Guenter Roeck
  0 siblings, 1 reply; 138+ messages in thread
From: Eduardo Valentin @ 2018-09-10 23:32 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

On Mon, Sep 10, 2018 at 07:12:39PM -0400, Steven Rostedt wrote:
> On Mon, 10 Sep 2018 16:01:06 -0700
> Eduardo Valentin <edubezval@gmail.com> wrote:
> 
> > One thing that could be done to help is to ask from developers for
> > some sort of selftest that can be executed by the bots and used while
> > backporting their fixes to stable. That way the developer can have a way
> 
> We have that already, it's tools/testing/selftests/...

well, yes, but what I really meant was to populate that directory with a
full set of tests that can detect regressions, on all subsystems

> 
> There's a series of ftrace selftests there that I run before running my
> own more complicated tests. There's still tests I need to move to that
> selftest directory and out of my own suite, but there's some tests that
> are too complicated for the the selftests directory.
> 
> 
> > to tell how to check if the kernel did not regress and whoever wants to
> > try out the fix can validate it. Of course, can this really fly, that is
> > a different story. Not sure the community will end up in a place where
> > all patches post -rc5 requires a selftest :-)
> > 
> > And of course, there is the other type of regression, which is the fix /
> > backport causing issue on other parts of the kernel/subsystem. Maybe
> > forcing each subsystem to have some sort of selftest/sanity check would
> > be one way to improve the reliability of the results of the bots
> > overall.
> 
> Heh, "forcing"? That hasn't been able to work yet ;-)
> 

:-)

> Also, tests for others that don't have the necessary hardware is not
> going to help much. A lot of regressions show up on hardware that we
> don't use.
> 

I agree. Thermal is one of those weird cases one would find most of real
problems while putting devices inside a thermal chamber and running real
workloads in a controlled manner. And on top of that, those are many
times not enough, and only end users would really trigger corner cases
that can really be seen when the device gets into a person's hand.

But still, the fact that selftests do not get all bugs does not mean it
cannot be used to catch at least a subset of it.

Also, some CI / bots do have a rig of hardware attached (kernelCI for
one). But yeah, I agree, hardware availability is a real issue.

> -- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 20:45                   ` Steven Rostedt
  2018-09-10 21:20                     ` Guenter Roeck
  2018-09-10 23:01                     ` Eduardo Valentin
@ 2018-09-10 23:38                     ` Sasha Levin
  2 siblings, 0 replies; 138+ messages in thread
From: Sasha Levin @ 2018-09-10 23:38 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

On Mon, Sep 10, 2018 at 04:45:19PM -0400, Steven Rostedt wrote:
>On Mon, 10 Sep 2018 19:43:11 +0000
>Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>
>> On Fri, Sep 07, 2018 at 08:52:40AM -0700, Linus Torvalds wrote:
>> >So this is what my argument really boils down to: the more critical a
>> >patch is, the more likely it is to be pushed more aggressively, which
>> >in turn makes it statistically much more likely to show up not only
>> >during the latter part of the development cycle, but it will directly
>> >mean that it looks "less tested".
>> >
>> >And AT THE SAME TIME, the more critical a patch is, the more likely it
>> >is to also show up as a problem spot for distros. Because, by
>> >definition, it touched something critical and likely subtle.
>> >
>> >End result: BY DEFINITION you'll see a correlation between "less
>> >testing" and "more problems".
>> >
>> >But THAT is correlation. That's not the fundamental causation.
>> >
>> >Now, I agree that it's correlation that makes sense to treat as
>> >causation. It just is very tempting to say: "less testing obviously
>> >means more problems". And I do think that it's very possibly a real
>> >causal property as well, but my argument has been that it's not at all
>> >obviously so, exactly because I would expect that correlation to exist
>> >even if there was absolutely ZERO causality.
>> >
>> >See what my argument is? You're arguing from correlation. And I think
>> >there is a much more direct causal argument that explains a lot of the
>> >correlation.
>>
>> Both of us agree that patches in later -rc cycles are buggier. We don't
>> agree on why, but I think that it actually doesn't matter much. For the
>> sake of the argument, let's go with what you're saying and assume that
>> they're buggier because they are are more critical, tricky and subtle.
>>
>> So we have this time period of a few weeks where we know that we're
>> going to see tricky patches. What can we do to better deal with it?
>> Saying that we'll just see more bugs and we should just live with it
>> because it's "BY DEFINITION" is not really a good answer IMO.
>>
>> For stable trees, we can address that by waiting even longer before
>> picking up -rc5+ stuff, but that will move us further away from your
>> tree which is an undesirable effect.
>>
>> I don't have anything beyond guesses, but I don't think the
>> solution here is WONTFIX.
>>
>
>I think it may be more of CANTFIX.
>
>The bugs introduced after -rc5 are more subtle and harder to trigger. I
>(and I presume Linus, but he can talk for himself) don't believe that
>keeping it in linux-next any longer will help find them, unless the
>bots get better to do so. The problem is that these bugs are not going
>to be triggered until they get into the mainline kernel and perhaps not
>even until they get into the distros. We want to find them before that,
>but it's not until they are used in production environments that they
>will get found.

If you're fixing something in -rc8, which is, according to Linus, only
for *critical* fixes that are usually complex, you better have tested
that code before pushing in.

Is it on obscure hardware no one has access too? I can't imagine what
makes that bug critical then.

Otherwise, yes, it should be a requirement that a patch was reasonably
tested before being merged, this is more true for those late -rc
critical fixes.

>The best we can do is make the automated testing of linux-next better
>such that there's less -rc5 patches that need to go in in the first
>place.

Being in -next is not only about running it through automatic bots.
Being on 0day means, in practice, "amount of days humans had to
review/test that code".

I didn't want to count days-in-next just to credit automatic testing,
but also as an indicator of how many eyeballs a commit attracted before
being merged.

>I do think that anything that goes into -rc5 or later should be tested
>by the developer and the 0day bot, to make sure they don't introduce
>some silly bug. But linux-next was mainly to deal with bugs caused by
>integration of various sub systems. But -rc5 fixes only care about
>integrating with mainline. And as Linus pointed out, when it gets into
>mainline, it will then be pulled into linux-next where it gets
>integrated with new code coming into the next merge window.

It would be nice if every bug coming in that late would have a
Tested-by: tag. Isn't it a requirement that patches should be tested
anyways?

Require that every patch was sent to lkml? Is it a big ask?

If the patches are so complex and subtle, require at least one
reviewed-by/acked-by?


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 23:32                         ` Eduardo Valentin
@ 2018-09-10 23:38                           ` Guenter Roeck
  0 siblings, 0 replies; 138+ messages in thread
From: Guenter Roeck @ 2018-09-10 23:38 UTC (permalink / raw)
  To: Eduardo Valentin; +Cc: ksummit

On Mon, Sep 10, 2018 at 04:32:40PM -0700, Eduardo Valentin wrote:
> On Mon, Sep 10, 2018 at 07:12:39PM -0400, Steven Rostedt wrote:
> > On Mon, 10 Sep 2018 16:01:06 -0700
> > Eduardo Valentin <edubezval@gmail.com> wrote:
> > 
> > > One thing that could be done to help is to ask from developers for
> > > some sort of selftest that can be executed by the bots and used while
> > > backporting their fixes to stable. That way the developer can have a way
> > 
> > We have that already, it's tools/testing/selftests/...
> 
> well, yes, but what I really meant was to populate that directory with a
> full set of tests that can detect regressions, on all subsystems
> 
> > 
> > There's a series of ftrace selftests there that I run before running my
> > own more complicated tests. There's still tests I need to move to that
> > selftest directory and out of my own suite, but there's some tests that
> > are too complicated for the the selftests directory.
> > 
> > 
> > > to tell how to check if the kernel did not regress and whoever wants to
> > > try out the fix can validate it. Of course, can this really fly, that is
> > > a different story. Not sure the community will end up in a place where
> > > all patches post -rc5 requires a selftest :-)
> > > 
> > > And of course, there is the other type of regression, which is the fix /
> > > backport causing issue on other parts of the kernel/subsystem. Maybe
> > > forcing each subsystem to have some sort of selftest/sanity check would
> > > be one way to improve the reliability of the results of the bots
> > > overall.
> > 
> > Heh, "forcing"? That hasn't been able to work yet ;-)
> > 
> 
> :-)
> 
> > Also, tests for others that don't have the necessary hardware is not
> > going to help much. A lot of regressions show up on hardware that we
> > don't use.
> > 
> 
> I agree. Thermal is one of those weird cases one would find most of real
> problems while putting devices inside a thermal chamber and running real
> workloads in a controlled manner. And on top of that, those are many
> times not enough, and only end users would really trigger corner cases
> that can really be seen when the device gets into a person's hand.
> 
> But still, the fact that selftests do not get all bugs does not mean it
> cannot be used to catch at least a subset of it.
> 
> Also, some CI / bots do have a rig of hardware attached (kernelCI for
> one). But yeah, I agree, hardware availability is a real issue.
> 

I think a lot of this could be resolved with qemu. That is not perfect
either, but simulating environmental conditions to trigger a system
response should be quite straightforward and much less costly than
thermal chambers.

Guenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 21:20                     ` Guenter Roeck
  2018-09-10 21:46                       ` Steven Rostedt
@ 2018-09-11  0:43                       ` Stephen Rothwell
  2018-09-11 16:49                         ` Guenter Roeck
  2018-09-11 11:18                       ` Mark Brown
  2 siblings, 1 reply; 138+ messages in thread
From: Stephen Rothwell @ 2018-09-11  0:43 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 1105 bytes --]

Hi Guenter,

On Mon, 10 Sep 2018 14:20:19 -0700 Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Mon, Sep 10, 2018 at 04:45:19PM -0400, Steven Rostedt wrote:
> > 
> > The best we can do is make the automated testing of linux-next better
> > such that there's less -rc5 patches that need to go in in the first
> > place.
> >   
> 
> Would that help ? -next has been more or less unusable for a week or so.
> Maybe it is just a bad time (it hasn't been as bad as it is right now
> for quite some time), but
> 
> Build results:
> 	total: 135 pass: 133 fail: 2
> Qemu test results:
> 	total: 315 pass: 112 fail: 203

I assume that most of that is the mount api changes.  I also assume you
have reported these?

> on next-20180910 doesn't really make me very confident that useful regression
> tests on -next are even possible. it seems to me that -next is quite often
> used as dumping ground for sparsely tested changes, and is far from "ready
> for upstream".

Well, we do get some of that, but also some things are harder to test
in isolation.
-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 21:46                       ` Steven Rostedt
  2018-09-10 23:03                         ` Eduardo Valentin
@ 2018-09-11  0:47                         ` Stephen Rothwell
  2018-09-11 17:35                           ` Linus Torvalds
  1 sibling, 1 reply; 138+ messages in thread
From: Stephen Rothwell @ 2018-09-11  0:47 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 879 bytes --]

Hi Steve,

On Mon, 10 Sep 2018 17:46:38 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Honestly, I think this is something that Linus should yell at
> maintainers for. I treat my pushes into linux-next the same as I treat
> my pull requests to Linus. I don't push anything into next until it's
> been fully run through my test suite, and passes. That also makes it
> easier for me to know that whatever I have in next is also ready for
> Linus (the way it was suppose to be).

I am pretty sure that Linus does not follow linux-next (except to see
what is still coming during the merge window), so maybe I should be
yelling at people - but yelling is not one of my better skills :-)

In the current particular case it is runtime testing that is failing
and I only do a small amount of that (and it is not failing for me).

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 23:03                         ` Eduardo Valentin
  2018-09-10 23:13                           ` Steven Rostedt
@ 2018-09-11  0:49                           ` Stephen Rothwell
  2018-09-11  1:01                             ` Al Viro
  1 sibling, 1 reply; 138+ messages in thread
From: Stephen Rothwell @ 2018-09-11  0:49 UTC (permalink / raw)
  To: Eduardo Valentin; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 2284 bytes --]

Hi Eduardo,

On Mon, 10 Sep 2018 16:03:03 -0700 Eduardo Valentin <edubezval@gmail.com> wrote:
>
> On Mon, Sep 10, 2018 at 05:46:38PM -0400, Steven Rostedt wrote:
> > Honestly, I think this is something that Linus should yell at
> > maintainers for. I treat my pushes into linux-next the same as I treat
> > my pull requests to Linus. I don't push anything into next until it's
> > been fully run through my test suite, and passes. That also makes it
> > easier for me to know that whatever I have in next is also ready for
> > Linus (the way it was suppose to be).  
> 
> Shouldn't we all be doing that?

Yes.

> > With the 0day bot, I think it's become much better. But honestly, I
> > think any branch that causes next to fail to build, or run basic tests,
> > should be taken out of linux-next and a nasty message sent to the
> > guilty maintainer. With the exception that a breakage was caused by two
> > conflicting commits (for example, one that changes an API, and another
> > branch that uses that API without the update). Those types of breakages
> > is what linux-next is made for. But if the branch being pulled into
> > linux-next breaks something without the integration, then that's
> > unacceptable.  
> 
> I thought that was the case already, everthing that goes to linux-next
> is ready to go to Linus.

And, again, yes.

This is what I tell every maintainer that adds a tree to linux-next:

"Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgement of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
     * submitted under GPL v2 (or later) and include the Contributor's
        Signed-off-by,
     * posted to the relevant mailing list,
     * reviewed by you (or another maintainer of your subsystem tree),
     * successfully unit tested, and 
     * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary."

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11  0:49                           ` Stephen Rothwell
@ 2018-09-11  1:01                             ` Al Viro
  0 siblings, 0 replies; 138+ messages in thread
From: Al Viro @ 2018-09-11  1:01 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: ksummit

On Tue, Sep 11, 2018 at 10:49:20AM +1000, Stephen Rothwell wrote:
>      * submitted under GPL v2 (or later) and include the Contributor's
>         Signed-off-by,

Incidentally, the wording is a bit off here.  "Compatible with GPL v2" is
the real test - at least I bloody well hope so, since by default my stuff
is v2-only, when it's non-trivial enough to be copyrightable in the first
place.  I do *not* give permission to distribute it under the terms of
any later versions of GPL, which is what "v2 or later" usually refers to.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 21:20                     ` Guenter Roeck
  2018-09-10 21:46                       ` Steven Rostedt
  2018-09-11  0:43                       ` Stephen Rothwell
@ 2018-09-11 11:18                       ` Mark Brown
  2018-09-11 17:02                         ` Guenter Roeck
  2 siblings, 1 reply; 138+ messages in thread
From: Mark Brown @ 2018-09-11 11:18 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 1667 bytes --]

On Mon, Sep 10, 2018 at 02:20:19PM -0700, Guenter Roeck wrote:

> Would that help ? -next has been more or less unusable for a week or so.
> Maybe it is just a bad time (it hasn't been as bad as it is right now
> for quite some time), but

> Build results:
> 	total: 135 pass: 133 fail: 2
> Qemu test results:
> 	total: 315 pass: 112 fail: 203

> on next-20180910 doesn't really make me very confident that useful regression
> tests on -next are even possible. it seems to me that -next is quite often
> used as dumping ground for sparsely tested changes, and is far from "ready
> for upstream".

I suspect this is something where if someone starts consistently
reporting test results things will get a lot better if someone
consistently reports test results and chases people to fix problems.  I
expect it to go like builds - used to see huge numbers of build and boot
failures in -next, and even in mainline, but ever since people started
actively pushing on them the results have got much better to the point
where it's the exeception rather than the rule.  You can see it
happening if you look at the build error/warning results from releases
over a few years (stable doesn't show it so clearly any more as a lot of
these fixes got backported there).

FWIW kernelci isn't nearly so bad on -next today - only four build
failures from the configurations it tests (someone managed to break
arm64) and the boot tests are clean apart from one board that's been
having what look like intermittent board specific issues.  

   https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20180911/

No testsuites run there though.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-10 23:13                           ` Steven Rostedt
@ 2018-09-11 15:42                             ` Steven Rostedt
  2018-09-11 17:40                               ` Tony Lindgren
  0 siblings, 1 reply; 138+ messages in thread
From: Steven Rostedt @ 2018-09-11 15:42 UTC (permalink / raw)
  To: Eduardo Valentin; +Cc: ksummit

On Mon, 10 Sep 2018 19:13:29 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 10 Sep 2018 16:03:03 -0700
> Eduardo Valentin <edubezval@gmail.com> wrote:
> 
> > I thought that was the case already, everthing that goes to linux-next
> > is ready to go to Linus.  
> 
> It's suppose to be, but not always, and this is why I suggested that
> Linus start yelling at those that are not doing it.
> 

This may have come across a bit too strong. We don't need Linus to
yell, but there should definitely be consequences for any maintainer
that pushes untested code to linux-next. At a bare minimum, all code
that goes into linux-next should have passed 0day bot. Push code to a
non-linux-next branch on kernel.org, wait a few days, if you don't get
any reports that a bot caught something broken, you should be good to
go (also you can opt-in to get reports on 0day success, which I do, to
make that cycle even shorter). And that's a pretty low bar to have to
pass. Ideally, all maintainers should have a set of tests they run
before pushing anything to linux-next, or to Linus in the late -rcs. If
you can't be bothered just to rely on at least 0day then you should not
be a maintainer.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11  0:43                       ` Stephen Rothwell
@ 2018-09-11 16:49                         ` Guenter Roeck
  2018-09-11 17:47                           ` Guenter Roeck
  0 siblings, 1 reply; 138+ messages in thread
From: Guenter Roeck @ 2018-09-11 16:49 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: ksummit

On Tue, Sep 11, 2018 at 10:43:39AM +1000, Stephen Rothwell wrote:
> Hi Guenter,
> 
> On Mon, 10 Sep 2018 14:20:19 -0700 Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > On Mon, Sep 10, 2018 at 04:45:19PM -0400, Steven Rostedt wrote:
> > > 
> > > The best we can do is make the automated testing of linux-next better
> > > such that there's less -rc5 patches that need to go in in the first
> > > place.
> > >   
> > 
> > Would that help ? -next has been more or less unusable for a week or so.
> > Maybe it is just a bad time (it hasn't been as bad as it is right now
> > for quite some time), but
> > 
> > Build results:
> > 	total: 135 pass: 133 fail: 2
> > Qemu test results:
> > 	total: 315 pass: 112 fail: 203
> 
> I assume that most of that is the mount api changes.  I also assume you
> have reported these?
> 
I think so. I just noticed that the failure pattern changed yesterday,
and did not have time to run bisect. So, no, I have not reported this
specific failure.

> > on next-20180910 doesn't really make me very confident that useful regression
> > tests on -next are even possible. it seems to me that -next is quite often
> > used as dumping ground for sparsely tested changes, and is far from "ready
> > for upstream".
> 
> Well, we do get some of that, but also some things are harder to test
> in isolation.
> -- 
> Cheers,
> Stephen Rothwell

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 11:18                       ` Mark Brown
@ 2018-09-11 17:02                         ` Guenter Roeck
  2018-09-11 17:12                           ` Jani Nikula
                                             ` (4 more replies)
  0 siblings, 5 replies; 138+ messages in thread
From: Guenter Roeck @ 2018-09-11 17:02 UTC (permalink / raw)
  To: Mark Brown; +Cc: ksummit

On Tue, Sep 11, 2018 at 12:18:53PM +0100, Mark Brown wrote:
> On Mon, Sep 10, 2018 at 02:20:19PM -0700, Guenter Roeck wrote:
> 
> > Would that help ? -next has been more or less unusable for a week or so.
> > Maybe it is just a bad time (it hasn't been as bad as it is right now
> > for quite some time), but
> 
> > Build results:
> > 	total: 135 pass: 133 fail: 2
> > Qemu test results:
> > 	total: 315 pass: 112 fail: 203
> 
> > on next-20180910 doesn't really make me very confident that useful regression
> > tests on -next are even possible. it seems to me that -next is quite often
> > used as dumping ground for sparsely tested changes, and is far from "ready
> > for upstream".
> 
> I suspect this is something where if someone starts consistently
> reporting test results things will get a lot better if someone
> consistently reports test results and chases people to fix problems.  I
> expect it to go like builds - used to see huge numbers of build and boot
> failures in -next, and even in mainline, but ever since people started
> actively pushing on them the results have got much better to the point
> where it's the exeception rather than the rule.  You can see it
> happening if you look at the build error/warning results from releases
> over a few years (stable doesn't show it so clearly any more as a lot of
> these fixes got backported there).
> 

FWIW, for the most part I stopped reporting issues with -next after some people
yelled at me for the 'noise' I was creating. Along the line of "This has been
fixed in branch xxx; why don't you do your homework and check there", with
branch xxx not even being in -next. I don't mind "this has already been
reported/fixed", quite the contrary, but the "why don't you do your homework"
got me over the edge.

To even consider reporting issues in -next on a more regular basis, I'd like
to see a common agreement that reporting such issues does not warrant being
yelled at, even if the issue has been fixed somewhere or if it has already
been reported. Otherwise I'll stick with doing what I do now: If something
is broken for more than a week, I _may_ start looking at it if I have some
spare time and/or need a break from my day-to-day work.

> FWIW kernelci isn't nearly so bad on -next today - only four build
> failures from the configurations it tests (someone managed to break
> arm64) and the boot tests are clean apart from one board that's been
> having what look like intermittent board specific issues.  
> 
>    https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20180911/
> 
> No testsuites run there though.

It doesn't require test suites. The crash happens on reboot/poweroff when
unmounting the root file system. initrd/initramfs boots don't see the
problem. Pretty much every architecture except arm (for whatever reason)
should see the problem.

Guenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:02                         ` Guenter Roeck
@ 2018-09-11 17:12                           ` Jani Nikula
  2018-09-11 17:31                             ` Mark Brown
  2018-09-11 18:03                             ` Geert Uytterhoeven
  2018-09-11 17:22                           ` James Bottomley
                                             ` (3 subsequent siblings)
  4 siblings, 2 replies; 138+ messages in thread
From: Jani Nikula @ 2018-09-11 17:12 UTC (permalink / raw)
  To: Guenter Roeck, Mark Brown; +Cc: ksummit

On Tue, 11 Sep 2018, Guenter Roeck <linux@roeck-us.net> wrote:
> FWIW, for the most part I stopped reporting issues with -next after some people
> yelled at me for the 'noise' I was creating. Along the line of "This has been
> fixed in branch xxx; why don't you do your homework and check there", with
> branch xxx not even being in -next.

What would be the reason for *not* having all the branches, including
fixes, of a subsystem/driver in linux-next? Baffled.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:02                         ` Guenter Roeck
  2018-09-11 17:12                           ` Jani Nikula
@ 2018-09-11 17:22                           ` James Bottomley
  2018-09-11 17:56                             ` Mark Brown
                                               ` (2 more replies)
  2018-09-11 17:26                           ` Mark Brown
                                             ` (2 subsequent siblings)
  4 siblings, 3 replies; 138+ messages in thread
From: James Bottomley @ 2018-09-11 17:22 UTC (permalink / raw)
  To: Guenter Roeck, Mark Brown; +Cc: ksummit

On Tue, 2018-09-11 at 10:02 -0700, Guenter Roeck wrote:
> On Tue, Sep 11, 2018 at 12:18:53PM +0100, Mark Brown wrote:
> > On Mon, Sep 10, 2018 at 02:20:19PM -0700, Guenter Roeck wrote:
> > 
> > > Would that help ? -next has been more or less unusable for a week
> > > or so. Maybe it is just a bad time (it hasn't been as bad as it
> > > is right now for quite some time), but Build results:
> > > 	total: 135 pass: 133 fail: 2
> > > Qemu test results:
> > > 	total: 315 pass: 112 fail: 203
> > > on next-20180910 doesn't really make me very confident that
> > > useful regression tests on -next are even possible. it seems to
> > > me that -next is quite often used as dumping ground for sparsely
> > > tested changes, and is far from "ready for upstream".
> > 
> > I suspect this is something where if someone starts consistently
> > reporting test results things will get a lot better if someone
> > consistently reports test results and chases people to fix
> > problems.  I expect it to go like builds - used to see huge numbers
> > of build and boot failures in -next, and even in mainline, but ever
> > since people started actively pushing on them the results have got
> > much better to the point where it's the exeception rather than the
> > rule.  You can see it happening if you look at the build
> > error/warning results from releases over a few years (stable
> > doesn't show it so clearly any more as a lot of these fixes got
> > backported there).
> > 
> 
> FWIW, for the most part I stopped reporting issues with -next after
> some people yelled at me for the 'noise' I was creating. Along the
> line of "This has been fixed in branch xxx; why don't you do your
> homework and check there", with branch xxx not even being in -next. I
> don't mind "this has already been reported/fixed", quite the
> contrary, but the "why don't you do your homework" got me over the
> edge.

Not to excuse rudeness, we always try to be polite on lists when this
happens, but -next builds on Australian time, so when we find and fix
an issue there can be up to 24h before it propagates.  In that time,
particularly if it's a stupid bug, it gets picked up and flagged by a
number of self contained 0day type projects and possibly a couple of
coccinelle type ones as well.  It does get a bit repetitive for
maintainers to receive and have to respond to 4 or 5 bug reports for
something they just fixed ...

Perhaps the -next tracking projects could have some sort of co-
ordination list to prevent the five bug reports for the same issue
problem?

James

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:02                         ` Guenter Roeck
  2018-09-11 17:12                           ` Jani Nikula
  2018-09-11 17:22                           ` James Bottomley
@ 2018-09-11 17:26                           ` Mark Brown
  2018-09-11 18:45                           ` Steven Rostedt
  2018-09-12  9:03                           ` Dan Carpenter
  4 siblings, 0 replies; 138+ messages in thread
From: Mark Brown @ 2018-09-11 17:26 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 2335 bytes --]

On Tue, Sep 11, 2018 at 10:02:12AM -0700, Guenter Roeck wrote:

> FWIW, for the most part I stopped reporting issues with -next after some people
> yelled at me for the 'noise' I was creating. Along the line of "This has been
> fixed in branch xxx; why don't you do your homework and check there", with
> branch xxx not even being in -next. I don't mind "this has already been
> reported/fixed", quite the contrary, but the "why don't you do your homework"
> got me over the edge.

Ugh, yeah - that sort of response is super annoying, especially when it
also comes along with something about not fixing -next for a while for
some process reason.  I have found it's very rare these days fortunately
but it has happened.

> To even consider reporting issues in -next on a more regular basis, I'd like
> to see a common agreement that reporting such issues does not warrant being
> yelled at, even if the issue has been fixed somewhere or if it has already
> been reported. Otherwise I'll stick with doing what I do now: If something
> is broken for more than a week, I _may_ start looking at it if I have some
> spare time and/or need a break from my day-to-day work.

I'd say that should be true in general, being pointed at some previous
discussion or whatever is clearly fine but it's unreasonable to expect
people doing general purpose testing to know about random other threads
or branches.  It's especially true if it's something that disrupts other
testing in an integration tree.

> > FWIW kernelci isn't nearly so bad on -next today - only four build
> > failures from the configurations it tests (someone managed to break
> > arm64) and the boot tests are clean apart from one board that's been
> > having what look like intermittent board specific issues.  

> >    https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20180911/

> > No testsuites run there though.

> It doesn't require test suites. The crash happens on reboot/poweroff when
> unmounting the root file system. initrd/initramfs boots don't see the
> problem. Pretty much every architecture except arm (for whatever reason)
> should see the problem.

You'd also need to reboot or power off which KernelCI doesn't do for
boot tests - it's just happy if we make it as far as a prompt then kills
the power.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:12                           ` Jani Nikula
@ 2018-09-11 17:31                             ` Mark Brown
  2018-09-11 17:41                               ` Daniel Vetter
  2018-09-11 18:03                             ` Geert Uytterhoeven
  1 sibling, 1 reply; 138+ messages in thread
From: Mark Brown @ 2018-09-11 17:31 UTC (permalink / raw)
  To: Jani Nikula; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 736 bytes --]

On Tue, Sep 11, 2018 at 08:12:37PM +0300, Jani Nikula wrote:
> On Tue, 11 Sep 2018, Guenter Roeck <linux@roeck-us.net> wrote:

> > FWIW, for the most part I stopped reporting issues with -next after some people
> > yelled at me for the 'noise' I was creating. Along the line of "This has been
> > fixed in branch xxx; why don't you do your homework and check there", with
> > branch xxx not even being in -next.

> What would be the reason for *not* having all the branches, including
> fixes, of a subsystem/driver in linux-next? Baffled.

Some people only put things into -next after they've passed QA (like
Steven's thing about 0day) so you'll see branches that are undergoing QA
in git before they get merged into the -next branch.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11  0:47                         ` Stephen Rothwell
@ 2018-09-11 17:35                           ` Linus Torvalds
  0 siblings, 0 replies; 138+ messages in thread
From: Linus Torvalds @ 2018-09-11 17:35 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: ksummit

On Mon, Sep 10, 2018 at 2:47 PM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> I am pretty sure that Linus does not follow linux-next (except to see
> what is still coming during the merge window),

Correct. I follow next only for tracking how much to expect, and then
verifying the occasional borderline pull (if if something looks fishy,
I might check :was it in linux-next?"). I don't really track it
otherwise.

>                   so maybe I should be
> yelling at people - but yelling is not one of my better skills :-)

I wish I wasn't always the bad guy ("when daddy gets home, you'll
really get it"), but one thing that might be worth doing is to just
track what causes failures, and inform me.

If a failure persists, I'll know at least to not pull it.

But that requires that the failures be pinpointed at lest to which
linux-next pull it was, so that you (or somebody else who runs tests)
can do more than "build/boot failed for linux-next", and actually
point to where the failure came in.

             Linus

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 15:42                             ` Steven Rostedt
@ 2018-09-11 17:40                               ` Tony Lindgren
  2018-09-11 17:47                                 ` James Bottomley
  0 siblings, 1 reply; 138+ messages in thread
From: Tony Lindgren @ 2018-09-11 17:40 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

* Steven Rostedt <rostedt@goodmis.org> [180911 15:46]:
> On Mon, 10 Sep 2018 19:13:29 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Mon, 10 Sep 2018 16:03:03 -0700
> > Eduardo Valentin <edubezval@gmail.com> wrote:
> > 
> > > I thought that was the case already, everthing that goes to linux-next
> > > is ready to go to Linus.  
> > 
> > It's suppose to be, but not always, and this is why I suggested that
> > Linus start yelling at those that are not doing it.
> > 
> 
> This may have come across a bit too strong. We don't need Linus to
> yell, but there should definitely be consequences for any maintainer
> that pushes untested code to linux-next. At a bare minimum, all code
> that goes into linux-next should have passed 0day bot. Push code to a
> non-linux-next branch on kernel.org, wait a few days, if you don't get
> any reports that a bot caught something broken, you should be good to
> go (also you can opt-in to get reports on 0day success, which I do, to
> make that cycle even shorter). And that's a pretty low bar to have to
> pass. Ideally, all maintainers should have a set of tests they run
> before pushing anything to linux-next, or to Linus in the late -rcs. If
> you can't be bothered just to rely on at least 0day then you should not
> be a maintainer.

Based on the regressions I seem to hit quite a few Linux next
regressions could have been avoided if Andrew's mm tree had seem some
more testing before being added to next. Probably because Andrew
queues lots of complicated patches :)

So yeah what you're suggesting might help with that if we establish
a let's say 24 hour period before adding branches to next. At
least that gives the automated systems a chance to test stuff
before it hits next. And people who want to can then test various
branches separately in advance.

Regards,

Tony

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:31                             ` Mark Brown
@ 2018-09-11 17:41                               ` Daniel Vetter
  2018-09-11 18:54                                 ` Mark Brown
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2018-09-11 17:41 UTC (permalink / raw)
  To: Mark Brown; +Cc: ksummit

On Tue, Sep 11, 2018 at 7:31 PM, Mark Brown <broonie@kernel.org> wrote:
> On Tue, Sep 11, 2018 at 08:12:37PM +0300, Jani Nikula wrote:
>> On Tue, 11 Sep 2018, Guenter Roeck <linux@roeck-us.net> wrote:
>
>> > FWIW, for the most part I stopped reporting issues with -next after some people
>> > yelled at me for the 'noise' I was creating. Along the line of "This has been
>> > fixed in branch xxx; why don't you do your homework and check there", with
>> > branch xxx not even being in -next.
>
>> What would be the reason for *not* having all the branches, including
>> fixes, of a subsystem/driver in linux-next? Baffled.
>
> Some people only put things into -next after they've passed QA (like
> Steven's thing about 0day) so you'll see branches that are undergoing QA
> in git before they get merged into the -next branch.

This is why we have a pre-merge CI SLA of mean latency < 6h for the
full pre-merge run. This is from the time your patch hits the m-l to
when the most extensive runs have completed (representing about 1 week
of machine). Early smoke-test results show up much earlier. In
practice this means you're almost always limited by review
turn-around, and not by CI. Exactly to avoid the "the regression fix
is ready except not yet fully tested" issues.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:40                               ` Tony Lindgren
@ 2018-09-11 17:47                                 ` James Bottomley
  2018-09-11 18:12                                   ` Eduardo Valentin
  2018-09-11 18:39                                   ` Steven Rostedt
  0 siblings, 2 replies; 138+ messages in thread
From: James Bottomley @ 2018-09-11 17:47 UTC (permalink / raw)
  To: Tony Lindgren, Steven Rostedt; +Cc: ksummit

On Tue, 2018-09-11 at 10:40 -0700, Tony Lindgren wrote:
> * Steven Rostedt <rostedt@goodmis.org> [180911 15:46]:
> > On Mon, 10 Sep 2018 19:13:29 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Mon, 10 Sep 2018 16:03:03 -0700
> > > Eduardo Valentin <edubezval@gmail.com> wrote:
> > > 
> > > > I thought that was the case already, everthing that goes to
> > > > linux-next is ready to go to Linus.  
> > > 
> > > It's suppose to be, but not always, and this is why I suggested
> > > that Linus start yelling at those that are not doing it.
> > > 
> > 
> > This may have come across a bit too strong. We don't need Linus to
> > yell, but there should definitely be consequences for any
> > maintainer that pushes untested code to linux-next. At a bare
> > minimum, all code that goes into linux-next should have passed 0day
> > bot. Push code to a non-linux-next branch on kernel.org, wait a few
> > days, if you don't get any reports that a bot caught something
> > broken, you should be good to go (also you can opt-in to get
> > reports on 0day success, which I do, to make that cycle even
> > shorter). And that's a pretty low bar to have to
> > pass. Ideally, all maintainers should have a set of tests they run
> > before pushing anything to linux-next, or to Linus in the late
> > -rcs. If you can't be bothered just to rely on at least 0day then
> > you should not be a maintainer.
> 
> Based on the regressions I seem to hit quite a few Linux next
> regressions could have been avoided if Andrew's mm tree had seem some
> more testing before being added to next. Probably because Andrew
> queues lots of complicated patches :)
> 
> So yeah what you're suggesting might help with that if we establish
> a let's say 24 hour period before adding branches to next. At
> least that gives the automated systems a chance to test stuff
> before it hits next. And people who want to can then test various
> branches separately in advance.

I really don't think that helps.  The 0day mailing list bot seems to be
a bit overloaded and about 80% of the automation isn't run *unless*
your branch hits -next.  Our criterion for -next queueing is local
tests pass, code inspection complete and 0day ML didn't complain. 
However, we still get quite a few reports from the -next automated
testing even after our local stuff.  I really don't see what delaying
into -next buys you except delaying finding and fixing bugs.

This applies to 0day as well, because, by agreement, it has a much
deeper set of xfstest runs for branches we actually queue for -next
rather than the more cursory set of tests it runs on ML patches.

James

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 16:49                         ` Guenter Roeck
@ 2018-09-11 17:47                           ` Guenter Roeck
  0 siblings, 0 replies; 138+ messages in thread
From: Guenter Roeck @ 2018-09-11 17:47 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: ksummit

On Tue, Sep 11, 2018 at 09:49:38AM -0700, Guenter Roeck wrote:
> On Tue, Sep 11, 2018 at 10:43:39AM +1000, Stephen Rothwell wrote:
> > Hi Guenter,
> > 
> > On Mon, 10 Sep 2018 14:20:19 -0700 Guenter Roeck <linux@roeck-us.net> wrote:
> > >
> > > On Mon, Sep 10, 2018 at 04:45:19PM -0400, Steven Rostedt wrote:
> > > > 
> > > > The best we can do is make the automated testing of linux-next better
> > > > such that there's less -rc5 patches that need to go in in the first
> > > > place.
> > > >   
> > > 
> > > Would that help ? -next has been more or less unusable for a week or so.
> > > Maybe it is just a bad time (it hasn't been as bad as it is right now
> > > for quite some time), but
> > > 
> > > Build results:
> > > 	total: 135 pass: 133 fail: 2
> > > Qemu test results:
> > > 	total: 315 pass: 112 fail: 203
> > 
> > I assume that most of that is the mount api changes.  I also assume you
> > have reported these?
> > 
> I think so. I just noticed that the failure pattern changed yesterday,
> and did not have time to run bisect. So, no, I have not reported this
> specific failure.
> 
Now reported.

Guenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:22                           ` James Bottomley
@ 2018-09-11 17:56                             ` Mark Brown
  2018-09-11 18:00                               ` James Bottomley
  2018-09-11 18:07                             ` Geert Uytterhoeven
  2018-09-12  9:09                             ` Dan Carpenter
  2 siblings, 1 reply; 138+ messages in thread
From: Mark Brown @ 2018-09-11 17:56 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 1322 bytes --]

On Tue, Sep 11, 2018 at 10:22:10AM -0700, James Bottomley wrote:

> Not to excuse rudeness, we always try to be polite on lists when this
> happens, but -next builds on Australian time, so when we find and fix
> an issue there can be up to 24h before it propagates.  In that time,
> particularly if it's a stupid bug, it gets picked up and flagged by a
> number of self contained 0day type projects and possibly a couple of
> coccinelle type ones as well.  It does get a bit repetitive for
> maintainers to receive and have to respond to 4 or 5 bug reports for
> something they just fixed ...

I'd have thought most maintainers would be pretty used to sending
repetitive e-mail by now :)

> Perhaps the -next tracking projects could have some sort of co-
> ordination list to prevent the five bug reports for the same issue
> problem?

We're not *that* overburdened with people running testing, and
especially not with people actively reporting the results, at the
minute.  A lot of the testers do have different focuses too which
reduces overlap quite a bit.  When I've seen this happening it's more
been collisions with people testing a specific platform for their own
use duplicating things.  There is kernel-build-reports@lists.linaro.org
but it's very much a firehose right now, really this would want a bug
tracker.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:56                             ` Mark Brown
@ 2018-09-11 18:00                               ` James Bottomley
  2018-09-11 18:16                                 ` Mark Brown
  0 siblings, 1 reply; 138+ messages in thread
From: James Bottomley @ 2018-09-11 18:00 UTC (permalink / raw)
  To: Mark Brown; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 1208 bytes --]

On Tue, 2018-09-11 at 18:56 +0100, Mark Brown wrote:
> On Tue, Sep 11, 2018 at 10:22:10AM -0700, James Bottomley wrote:
> 
> > Not to excuse rudeness, we always try to be polite on lists when
> > this happens, but -next builds on Australian time, so when we find
> > and fix an issue there can be up to 24h before it propagates.  In
> > that time, particularly if it's a stupid bug, it gets picked up and
> > flagged by a number of self contained 0day type projects and
> > possibly a couple of coccinelle type ones as well.  It does get a
> > bit repetitive for maintainers to receive and have to respond to 4
> > or 5 bug reports for something they just fixed ...
> 
> I'd have thought most maintainers would be pretty used to sending
> repetitive e-mail by now :)
> 
> > Perhaps the -next tracking projects could have some sort of co-
> > ordination list to prevent the five bug reports for the same issue
> > problem?
> 
> We're not *that* overburdened with people running testing, and
> especially not with people actively reporting the results, at the
> minute.

We are if a maintainer snapping at someone causes tests not to be run
or results not to be reported.

James

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:12                           ` Jani Nikula
  2018-09-11 17:31                             ` Mark Brown
@ 2018-09-11 18:03                             ` Geert Uytterhoeven
  1 sibling, 0 replies; 138+ messages in thread
From: Geert Uytterhoeven @ 2018-09-11 18:03 UTC (permalink / raw)
  To: Jani Nikula; +Cc: ksummit-discuss

On Tue, Sep 11, 2018 at 7:13 PM Jani Nikula <jani.nikula@intel.com> wrote:
> On Tue, 11 Sep 2018, Guenter Roeck <linux@roeck-us.net> wrote:
> > FWIW, for the most part I stopped reporting issues with -next after some people
> > yelled at me for the 'noise' I was creating. Along the line of "This has been
> > fixed in branch xxx; why don't you do your homework and check there", with
> > branch xxx not even being in -next.
>
> What would be the reason for *not* having all the branches, including
> fixes, of a subsystem/driver in linux-next? Baffled.

The same reason why there may be a periof of a few weeks in between
receiving an "applied" email response, and discovering the branch has
finally been updated on git.kernel.org?

"release early, release often".

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:22                           ` James Bottomley
  2018-09-11 17:56                             ` Mark Brown
@ 2018-09-11 18:07                             ` Geert Uytterhoeven
  2018-09-12  9:09                             ` Dan Carpenter
  2 siblings, 0 replies; 138+ messages in thread
From: Geert Uytterhoeven @ 2018-09-11 18:07 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit-discuss

On Tue, Sep 11, 2018 at 7:22 PM James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Tue, 2018-09-11 at 10:02 -0700, Guenter Roeck wrote:
> > FWIW, for the most part I stopped reporting issues with -next after
> > some people yelled at me for the 'noise' I was creating. Along the
> > line of "This has been fixed in branch xxx; why don't you do your
> > homework and check there", with branch xxx not even being in -next. I
> > don't mind "this has already been reported/fixed", quite the
> > contrary, but the "why don't you do your homework" got me over the
> > edge.
>
> Not to excuse rudeness, we always try to be polite on lists when this
> happens, but -next builds on Australian time, so when we find and fix
> an issue there can be up to 24h before it propagates.  In that time,
> particularly if it's a stupid bug, it gets picked up and flagged by a
> number of self contained 0day type projects and possibly a couple of
> coccinelle type ones as well.  It does get a bit repetitive for
> maintainers to receive and have to respond to 4 or 5 bug reports for
> something they just fixed ...
>
> Perhaps the -next tracking projects could have some sort of co-
> ordination list to prevent the five bug reports for the same issue
> problem?

Like, http://vger.kernel.org/vger-lists.html#linux-next ?

/me is guilty of not being subscribed
(but usually I do Google for independent fixes before reporting issues).

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:47                                 ` James Bottomley
@ 2018-09-11 18:12                                   ` Eduardo Valentin
  2018-09-11 18:17                                     ` Geert Uytterhoeven
  2018-09-11 18:19                                     ` James Bottomley
  2018-09-11 18:39                                   ` Steven Rostedt
  1 sibling, 2 replies; 138+ messages in thread
From: Eduardo Valentin @ 2018-09-11 18:12 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

Hey,

On Tue, Sep 11, 2018 at 10:47:02AM -0700, James Bottomley wrote:
> On Tue, 2018-09-11 at 10:40 -0700, Tony Lindgren wrote:
> > * Steven Rostedt <rostedt@goodmis.org> [180911 15:46]:
> > > On Mon, 10 Sep 2018 19:13:29 -0400
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > On Mon, 10 Sep 2018 16:03:03 -0700
> > > > Eduardo Valentin <edubezval@gmail.com> wrote:
> > > > 
> > > > > I thought that was the case already, everthing that goes to
> > > > > linux-next is ready to go to Linus.  
> > > > 
> > > > It's suppose to be, but not always, and this is why I suggested
> > > > that Linus start yelling at those that are not doing it.
> > > > 
> > > 
> > > This may have come across a bit too strong. We don't need Linus to
> > > yell, but there should definitely be consequences for any
> > > maintainer that pushes untested code to linux-next. At a bare
> > > minimum, all code that goes into linux-next should have passed 0day
> > > bot. Push code to a non-linux-next branch on kernel.org, wait a few
> > > days, if you don't get any reports that a bot caught something
> > > broken, you should be good to go (also you can opt-in to get
> > > reports on 0day success, which I do, to make that cycle even
> > > shorter). And that's a pretty low bar to have to
> > > pass. Ideally, all maintainers should have a set of tests they run
> > > before pushing anything to linux-next, or to Linus in the late
> > > -rcs. If you can't be bothered just to rely on at least 0day then
> > > you should not be a maintainer.
> > 
> > Based on the regressions I seem to hit quite a few Linux next
> > regressions could have been avoided if Andrew's mm tree had seem some
> > more testing before being added to next. Probably because Andrew
> > queues lots of complicated patches :)
> > 
> > So yeah what you're suggesting might help with that if we establish
> > a let's say 24 hour period before adding branches to next. At
> > least that gives the automated systems a chance to test stuff
> > before it hits next. And people who want to can then test various
> > branches separately in advance.
> 
> I really don't think that helps.  The 0day mailing list bot seems to be

I think the idea is to minimize the failures on -next, and use the
linux-next tree for its original purpose: check for integration issues.

> a bit overloaded and about 80% of the automation isn't run *unless*
> your branch hits -next.  Our criterion for -next queueing is local

Oh, I see. I was not aware of such dependency of the 0day bot. The
kernelCI bot, which I mainly use after my local test, has no such
requirement.

> tests pass, code inspection complete and 0day ML didn't complain. 
> However, we still get quite a few reports from the -next automated
> testing even after our local stuff.  I really don't see what delaying
> into -next buys you except delaying finding and fixing bugs.
> 

having a larger build/boot/test coverage before pushing on linux-next.
But given the 0day dependency on -next itself, I am not sure it is worth
it. Why is that a thing anyways? 0day cannot test individual branches?


> This applies to 0day as well, because, by agreement, it has a much
> deeper set of xfstest runs for branches we actually queue for -next
> rather than the more cursory set of tests it runs on ML patches.
> 
> James
> 
> _______________________________________________
> Ksummit-discuss mailing list
> Ksummit-discuss@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 18:00                               ` James Bottomley
@ 2018-09-11 18:16                                 ` Mark Brown
  0 siblings, 0 replies; 138+ messages in thread
From: Mark Brown @ 2018-09-11 18:16 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 1147 bytes --]

On Tue, Sep 11, 2018 at 11:00:32AM -0700, James Bottomley wrote:
> On Tue, 2018-09-11 at 18:56 +0100, Mark Brown wrote:
> > On Tue, Sep 11, 2018 at 10:22:10AM -0700, James Bottomley wrote:

> > > Perhaps the -next tracking projects could have some sort of co-
> > > ordination list to prevent the five bug reports for the same issue
> > > problem?

> > We're not *that* overburdened with people running testing, and
> > especially not with people actively reporting the results, at the
> > minute.

> We are if a maintainer snapping at someone causes tests not to be run
> or results not to be reported.

That needn't follow - we don't know that this was lots of the automated
testing people reporting the same issue, one automated testing person
and some other people who were just trying to do work on the subsystem,
just the maintainer having a bad day or something else.  The automated
testing people coordinating with each other is only going to help if
it's only them.  In this case it sounds like it might be as much about
having someone report a bug who wasn't familiar with the unusual
processes the subsystem was using as anything else.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 18:12                                   ` Eduardo Valentin
@ 2018-09-11 18:17                                     ` Geert Uytterhoeven
  2018-09-12 15:15                                       ` Eduardo Valentin
  2018-09-11 18:19                                     ` James Bottomley
  1 sibling, 1 reply; 138+ messages in thread
From: Geert Uytterhoeven @ 2018-09-11 18:17 UTC (permalink / raw)
  To: Eduardo Valentin; +Cc: James Bottomley, ksummit-discuss

On Tue, Sep 11, 2018 at 8:13 PM Eduardo Valentin <edubezval@gmail.com> wrote:
> But given the 0day dependency on -next itself, I am not sure it is worth
> it. Why is that a thing anyways? 0day cannot test individual branches?

0day does test individual branches (I get email reports for each branch I
push to git.kernel.org).  The number of tests run depends on the load,
though, AFAIK.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 18:12                                   ` Eduardo Valentin
  2018-09-11 18:17                                     ` Geert Uytterhoeven
@ 2018-09-11 18:19                                     ` James Bottomley
  2018-09-12 15:17                                       ` Eduardo Valentin
  1 sibling, 1 reply; 138+ messages in thread
From: James Bottomley @ 2018-09-11 18:19 UTC (permalink / raw)
  To: Eduardo Valentin; +Cc: ksummit

On September 11, 2018 11:12:51 AM PDT, Eduardo Valentin <edubezval@gmail.com> wrote:
>Hey,
>
>On Tue, Sep 11, 2018 a 10:47:02AM -0700, James Bottomley wrote:
>> On Tue, 2018-09-11 at 10:40 -0700, Tony Lindgren wrote:
>> > * Steven Rostedt <rostedt@goodmis.org> [180911 15:46]:
>> > > On Mon, 10 Sep 2018 19:13:29 -0400
>> > > Steven Rostedt <rostedt@goodmis.org> wrote:
>> > > 
>> > > > On Mon, 10 Sep 2018 16:03:03 -0700
>> > > > Eduardo Valentin <edubezval@gmail.com> wrote:
>> > > > 
>> > > > > I thought that was the case already, everthing that goes to
>> > > > > linux-next is ready to go to Linus.  
>> > > > 
>> > > > It's suppose to be, but not always, and this is why I suggested
>> > > > that Linus start yelling at those that are not doing it.
>> > > > 
>> > > 
>> > > This may have come across a bit too strong. We don't need Linus
>to
>> > > yell, but there should definitely be consequences for any
>> > > maintainer that pushes untested code to linux-next. At a bare
>> > > minimum, all code that goes into linux-next should have passed
>0day
>> > > bot. Push code to a non-linux-next branch on kernel.org, wait a
>few
>> > > days, if you don't get any reports that a bot caught something
>> > > broken, you should be good to go (also you can opt-in to get
>> > > reports on 0day success, which I do, to make that cycle even
>> > > shorter). And that's a pretty low bar to have to
>> > > pass. Ideally, all maintainers should have a set of tests they
>run
>> > > before pushing anything to linux-next, or to Linus in the late
>> > > -rcs. If you can't be bothered just to rely on at least 0day then
>> > > you should not be a maintainer.
>> > 
>> > Based on the regressions I seem to hit quite a few Linux next
>> > regressions could have been avoided if Andrew's mm tree had seem
>some
>> > more testing before being added to next. Probably because Andrew
>> > queues lots of complicated patches :)
>> > 
>> > So yeah what you're suggesting might help with that if we establish
>> > a let's say 24 hour period before adding branches to next. At
>> > least that gives the automated systems a chance to test stuff
>> > before it hits next. And people who want to can then test various
>> > branches separately in advance.
>> 
>> I really don't think that helps.  The 0day mailing list bot seems to
>be
>
>I think the idea is to minimize the failures on -next, and use the
>linux-next tree for its original purpose: check for integration issues.

We thought the patch was ready based on our acceptance criteria. That's why it went into our -next integration branch.  Finding bugs after we thought it was ready is a legitimate integration issue...


>> a bit overloaded and about 80% of the automation isn't run *unless*
>> your branch hits -next.  Our criterion for -next queueing is local
>
>Oh, I see. I was not aware of such dependency of the 0day bot. The
>kernelCI bot, which I mainly use after my local test, has no such
>requirement.
>
>> tests pass, code inspection complete and 0day ML didn't complain. 
>> However, we still get quite a few reports from the -next automated
>> testing even after our local stuff.  I really don't see what delaying
>> into -next buys you except delaying finding and fixing bugs.
>> 
>
>having a larger build/boot/test coverage before pushing on linux-next.
>But given the 0day dependency on -next itself, I am not sure it is
>worth
>it. Why is that a thing anyways? 0day cannot test individual branches?

0day does test most branches it finds but its better to work off a curated list which the -next build bot has.

James

>> This applies to 0day as well, because, by agreement, it has a much
>> deeper set of xfstest runs for branches we actually queue for -next
>> rather than the more cursory set of tests it runs on ML patches.
>> 
>> James
>> 
>> _______________________________________________
>> Ksummit-discuss mailing list
>> Ksummit-discuss@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss
>_______________________________________________
>Ksummit-discuss mailing list
>Ksummit-discuss@lists.linuxfoundation.org
>https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:47                                 ` James Bottomley
  2018-09-11 18:12                                   ` Eduardo Valentin
@ 2018-09-11 18:39                                   ` Steven Rostedt
  2018-09-11 20:09                                     ` James Bottomley
  1 sibling, 1 reply; 138+ messages in thread
From: Steven Rostedt @ 2018-09-11 18:39 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

On Tue, 11 Sep 2018 10:47:02 -0700
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> I really don't think that helps.  The 0day mailing list bot seems to be
> a bit overloaded and about 80% of the automation isn't run *unless*

Really? I get reports on my branches about a lot of issues without ever
pushing it to next. Maybe I lucked out and these issues were caught by
the 20%.

When I have a branch ready to test, I push it to my local branch on
kernel.org and then kick off my own test suite. I like to see which one
will catch any bugs first. I usually get a response from 0day and a
couple of hours later my tests will fail with the same error. Thus I
found 0day to be quite efficient.

> your branch hits -next.  Our criterion for -next queueing is local
> tests pass, code inspection complete and 0day ML didn't complain. 
> However, we still get quite a few reports from the -next automated
> testing even after our local stuff.  I really don't see what delaying
> into -next buys you except delaying finding and fixing bugs.

But you state you have your own local tests, which I think could be
enough of a requirement. Although running allmod and allyes should be
part of a local test, but I think 0day does that too (as that's usually
the test that fails most often that 0day catches first).

> 
> This applies to 0day as well, because, by agreement, it has a much
> deeper set of xfstest runs for branches we actually queue for -next
> rather than the more cursory set of tests it runs on ML patches.

What about the tests on local branches? Not what you post to the ML.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:02                         ` Guenter Roeck
                                             ` (2 preceding siblings ...)
  2018-09-11 17:26                           ` Mark Brown
@ 2018-09-11 18:45                           ` Steven Rostedt
  2018-09-11 18:57                             ` Daniel Vetter
  2018-09-12  9:03                           ` Dan Carpenter
  4 siblings, 1 reply; 138+ messages in thread
From: Steven Rostedt @ 2018-09-11 18:45 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: ksummit

On Tue, 11 Sep 2018 10:02:12 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> FWIW, for the most part I stopped reporting issues with -next after some people
> yelled at me for the 'noise' I was creating. Along the line of "This has been
> fixed in branch xxx; why don't you do your homework and check there", with
> branch xxx not even being in -next. I don't mind "this has already been
> reported/fixed", quite the contrary, but the "why don't you do your homework"
> got me over the edge.

A bug reporter should never be yelled at. The correct response should
be "Thank you for the report, but we have already fixed that bug in XYZ
branch, would you mind testing that?"

That's how I respond to such reports.

I've been told that yelling is never appropriate (although I may not
totally agree with that statement), but nasty messages to people
reporting bugs to your code (regardless if it's fixed or not someplace
else) is totally uncalled for.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:41                               ` Daniel Vetter
@ 2018-09-11 18:54                                 ` Mark Brown
  0 siblings, 0 replies; 138+ messages in thread
From: Mark Brown @ 2018-09-11 18:54 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit

[-- Attachment #1: Type: text/plain, Size: 961 bytes --]

On Tue, Sep 11, 2018 at 07:41:40PM +0200, Daniel Vetter wrote:
> On Tue, Sep 11, 2018 at 7:31 PM, Mark Brown <broonie@kernel.org> wrote:

> > Some people only put things into -next after they've passed QA (like
> > Steven's thing about 0day) so you'll see branches that are undergoing QA
> > in git before they get merged into the -next branch.

> This is why we have a pre-merge CI SLA of mean latency < 6h for the
> full pre-merge run. This is from the time your patch hits the m-l to
> when the most extensive runs have completed (representing about 1 week
> of machine). Early smoke-test results show up much earlier. In
> practice this means you're almost always limited by review
> turn-around, and not by CI. Exactly to avoid the "the regression fix
> is ready except not yet fully tested" issues.

Right, though CI like that can't do the longer running tests that some
subsystems have for various things.  Don't know if that was the case
here mind you.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 18:45                           ` Steven Rostedt
@ 2018-09-11 18:57                             ` Daniel Vetter
  2018-09-11 20:15                               ` Thomas Gleixner
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2018-09-11 18:57 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

On Tue, Sep 11, 2018 at 8:45 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 11 Sep 2018 10:02:12 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
>
>> FWIW, for the most part I stopped reporting issues with -next after some people
>> yelled at me for the 'noise' I was creating. Along the line of "This has been
>> fixed in branch xxx; why don't you do your homework and check there", with
>> branch xxx not even being in -next. I don't mind "this has already been
>> reported/fixed", quite the contrary, but the "why don't you do your homework"
>> got me over the edge.
>
> A bug reporter should never be yelled at. The correct response should
> be "Thank you for the report, but we have already fixed that bug in XYZ
> branch, would you mind testing that?"
>
> That's how I respond to such reports.
>
> I've been told that yelling is never appropriate (although I may not
> totally agree with that statement), but nasty messages to people
> reporting bugs to your code (regardless if it's fixed or not someplace
> else) is totally uncalled for.

I don't report bugs if there's any yelling involved, period.

It's not fun, and me not being the direct target doesn't make it
better really. The other bit is also that the risk of yelling sends at
least some people into panic mode, and that's the last you need when
trying to get some tricky bug fixed. A maintainer who's brain is busy
freaking out about Linus jumping at them, when you need 100% of their
brain to understand the bug, is just plain not much use. E.g. I
stopped calling regressions regressions, exactly to avoid the panic
reaction and yelling. And for some subsystems I outright stopped
reporting issues, because stuffing a small fixup patch into our local
tree and maintaining it forever is less headaches than the shouting
and yelling the report will cause. Note: There's also other reasons we
have patches in our local fixup tree, so don't go looking there now
and shredding the involved subsystem maintainers, pls.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 18:39                                   ` Steven Rostedt
@ 2018-09-11 20:09                                     ` James Bottomley
  2018-09-11 20:31                                       ` Steven Rostedt
  0 siblings, 1 reply; 138+ messages in thread
From: James Bottomley @ 2018-09-11 20:09 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

On Tue, 2018-09-11 at 14:39 -0400, Steven Rostedt wrote:
> On Tue, 11 Sep 2018 10:47:02 -0700
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>  
> > I really don't think that helps.  The 0day mailing list bot seems
> > to be a bit overloaded and about 80% of the automation isn't run
> > *unless*
> 
> Really? I get reports on my branches about a lot of issues without
> ever pushing it to next. Maybe I lucked out and these issues were
> caught by the 20%.
> 
> When I have a branch ready to test, I push it to my local branch on
> kernel.org and then kick off my own test suite. I like to see which
> one will catch any bugs first. I usually get a response from 0day and
> a couple of hours later my tests will fail with the same error. Thus
> I found 0day to be quite efficient.

I treat -next as the integration tree: I push to it when we're ready to
integrate.  I don't think that's unreasonable and I do see it being
unreasonable to impose a delay on this process.

> > your branch hits -next.  Our criterion for -next queueing is local
> > tests pass, code inspection complete and 0day ML didn't complain. 
> > However, we still get quite a few reports from the -next automated
> > testing even after our local stuff.  I really don't see what
> > delaying into -next buys you except delaying finding and fixing
> > bugs.
> 
> But you state you have your own local tests, which I think could be
> enough of a requirement. Although running allmod and allyes should be
> part of a local test, but I think 0day does that too (as that's
> usually the test that fails most often that 0day catches first).

So when local tests are complete we're ready for integration.

> > This applies to 0day as well, because, by agreement, it has a much
> > deeper set of xfstest runs for branches we actually queue for -next
> > rather than the more cursory set of tests it runs on ML patches.
> 
> What about the tests on local branches? Not what you post to the ML.

I don't really see any point having a local -pre-next pass and hoping
0day will find it.  It's much more valuable (and faster) to push to
-next and have all the integrated tests work on it once we've done
everything we can locally.

James

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 18:57                             ` Daniel Vetter
@ 2018-09-11 20:15                               ` Thomas Gleixner
  0 siblings, 0 replies; 138+ messages in thread
From: Thomas Gleixner @ 2018-09-11 20:15 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit

On Tue, 11 Sep 2018, Daniel Vetter wrote:
> and yelling the report will cause. Note: There's also other reasons we
> have patches in our local fixup tree, so don't go looking there now
> and shredding the involved subsystem maintainers, pls.

I nevertheless went to look just to see what touches my areas of
interest. And interestingly I found two patches which I looked at before
and both were discussed on the mailing list in civilized ways. Both ended
up with recommendations and requests for change from reviewers and then
nada.

One is an enhancement and the other is a bug fix. I dont care about the
former too much other than having wasted time for reviewing it and thinking
about a cleaner solution. That happens and I know that everyone is
overworked.

But for the bug fix I really have to ask _WHY_ this is carried in some
random drm branch and not followed up for more than 18 month. The
maintainer in question was obviously putting it aside and waiting for the
review comment to be addressed and then forgot about it, but you cannot
have forgotten about it because you or whoever is in charge of that tree
rebased it a gazillion of times.

There is another fix which has a similar fate, which has a bugzilla entry
updated a few days ago telling the world that the fix is not upstreamed
yet, but now documented in the BZ so everyone can find it.

Nothing of this has to do with the 'oh I gave up on talking to those people
because they are so hard to work with' AFAICT. So what are you trying to
tell us here?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 20:09                                     ` James Bottomley
@ 2018-09-11 20:31                                       ` Steven Rostedt
  2018-09-11 22:53                                         ` James Bottomley
  0 siblings, 1 reply; 138+ messages in thread
From: Steven Rostedt @ 2018-09-11 20:31 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

On Tue, 11 Sep 2018 16:09:32 -0400
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Tue, 2018-09-11 at 14:39 -0400, Steven Rostedt wrote:
> > On Tue, 11 Sep 2018 10:47:02 -0700
> > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >    
> > > I really don't think that helps.  The 0day mailing list bot seems
> > > to be a bit overloaded and about 80% of the automation isn't run
> > > *unless*  
> > 
> > Really? I get reports on my branches about a lot of issues without
> > ever pushing it to next. Maybe I lucked out and these issues were
> > caught by the 20%.
> > 
> > When I have a branch ready to test, I push it to my local branch on
> > kernel.org and then kick off my own test suite. I like to see which
> > one will catch any bugs first. I usually get a response from 0day and
> > a couple of hours later my tests will fail with the same error. Thus
> > I found 0day to be quite efficient.  
> 
> I treat -next as the integration tree: I push to it when we're ready to
> integrate.  I don't think that's unreasonable and I do see it being
> unreasonable to impose a delay on this process.

I believe that's exactly what -next is for. Integration with
development from other sub systems.

> 
> > > your branch hits -next.  Our criterion for -next queueing is local
> > > tests pass, code inspection complete and 0day ML didn't complain. 
> > > However, we still get quite a few reports from the -next automated
> > > testing even after our local stuff.  I really don't see what
> > > delaying into -next buys you except delaying finding and fixing
> > > bugs.  
> > 
> > But you state you have your own local tests, which I think could be
> > enough of a requirement. Although running allmod and allyes should be
> > part of a local test, but I think 0day does that too (as that's
> > usually the test that fails most often that 0day catches first).  
> 
> So when local tests are complete we're ready for integration.

If your local tests catch most common bugs. (testing on non-SMP,
x86_32, and other non-common cases where things can break).

> 
> > > This applies to 0day as well, because, by agreement, it has a much
> > > deeper set of xfstest runs for branches we actually queue for -next
> > > rather than the more cursory set of tests it runs on ML patches.  
> > 
> > What about the tests on local branches? Not what you post to the ML.  
> 
> I don't really see any point having a local -pre-next pass and hoping
> 0day will find it.  It's much more valuable (and faster) to push to
> -next and have all the integrated tests work on it once we've done
> everything we can locally.

Why not do what I do and push to a -pre-next branch when you kick off
your local tests? Then perhaps 0day may catch something you introduced
before it heads off to -next. -next is about catching bugs due to
integration with other sub systems, it shouldn't be used to catch bugs
that could have been caught without the integration.

Kind of like if you give blood in the US. There's a note that tells you
to not use blood donation as a way to get tested for a particular
disease. Just get tested for it before giving blood and possibly
infecting others.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 20:31                                       ` Steven Rostedt
@ 2018-09-11 22:53                                         ` James Bottomley
  2018-09-11 23:04                                           ` Sasha Levin
                                                             ` (2 more replies)
  0 siblings, 3 replies; 138+ messages in thread
From: James Bottomley @ 2018-09-11 22:53 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

On Tue, 2018-09-11 at 16:31 -0400, Steven Rostedt wrote:
> On Tue, 11 Sep 2018 16:09:32 -0400
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > On Tue, 2018-09-11 at 14:39 -0400, Steven Rostedt wrote:
[...]
> > > > This applies to 0day as well, because, by agreement, it has a
> > > > much deeper set of xfstest runs for branches we actually queue
> > > > for -next rather than the more cursory set of tests it runs on
> > > > ML patches.  
> > > 
> > > What about the tests on local branches? Not what you post to the
> > > ML.  
> > 
> > I don't really see any point having a local -pre-next pass and
> > hoping 0day will find it.  It's much more valuable (and faster) to
> > push to -next and have all the integrated tests work on it once
> > we've done everything we can locally.
> 
> Why not do what I do and push to a -pre-next branch when you kick off
> your local tests?

Because there's no point.  As I said, when we complete the local
criteria the branch is ready for integration.  We push to -next and
*all* the built bots tell us if there are any problems (which I don't
expect there are but there's room for me to be wrong) ... including
0day.  I don't see what the delay and the process hassle would buy us
if we only get a review by 0day in the -pre-next branch.  It seems more
efficient to let every bot loose on what we think is mergeable.


James

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 22:53                                         ` James Bottomley
@ 2018-09-11 23:04                                           ` Sasha Levin
  2018-09-11 23:11                                             ` James Bottomley
  2018-09-11 23:22                                           ` Tony Lindgren
  2018-09-12 20:24                                           ` Steven Rostedt
  2 siblings, 1 reply; 138+ messages in thread
From: Sasha Levin @ 2018-09-11 23:04 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

On Tue, Sep 11, 2018 at 06:53:29PM -0400, James Bottomley wrote:
>On Tue, 2018-09-11 at 16:31 -0400, Steven Rostedt wrote:
>> On Tue, 11 Sep 2018 16:09:32 -0400
>> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>
>> > On Tue, 2018-09-11 at 14:39 -0400, Steven Rostedt wrote:
>[...]
>> > > > This applies to 0day as well, because, by agreement, it has a
>> > > > much deeper set of xfstest runs for branches we actually queue
>> > > > for -next rather than the more cursory set of tests it runs on
>> > > > ML patches.  
>> > >
>> > > What about the tests on local branches? Not what you post to the
>> > > ML.  
>> >
>> > I don't really see any point having a local -pre-next pass and
>> > hoping 0day will find it.  It's much more valuable (and faster) to
>> > push to -next and have all the integrated tests work on it once
>> > we've done everything we can locally.
>>
>> Why not do what I do and push to a -pre-next branch when you kick off
>> your local tests?
>
>Because there's no point.  As I said, when we complete the local
>criteria the branch is ready for integration.  We push to -next and
>*all* the built bots tell us if there are any problems (which I don't
>expect there are but there's room for me to be wrong) ... including
>0day.  I don't see what the delay and the process hassle would buy us
>if we only get a review by 0day in the -pre-next branch.  It seems more
>efficient to let every bot loose on what we think is mergeable.

The problem with that approach is that it will break build more often,
and if build is broken then usually no automated testing are getting
done for that day.

So you're not hurting just your tree, you're basically stopping any
automated testing on -next.

And yes, bots are already building way more things than just -next. Both
0day and kernelci build individual maintainer branches as well. Maybe a
simple solution would be to just add your pre-next branch while you run
your own tests and have it added to 0day and kernelci?


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 23:04                                           ` Sasha Levin
@ 2018-09-11 23:11                                             ` James Bottomley
  2018-09-11 23:20                                               ` Sasha Levin
  0 siblings, 1 reply; 138+ messages in thread
From: James Bottomley @ 2018-09-11 23:11 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit

On Tue, 2018-09-11 at 23:04 +0000, Sasha Levin via Ksummit-discuss
wrote:
> On Tue, Sep 11, 2018 at 06:53:29PM -0400, James Bottomley wrote:
> > On Tue, 2018-09-11 at 16:31 -0400, Steven Rostedt wrote:
> > > On Tue, 11 Sep 2018 16:09:32 -0400
> > > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > > 
> > > > On Tue, 2018-09-11 at 14:39 -0400, Steven Rostedt wrote:
> > 
> > [...]
> > > > > > This applies to 0day as well, because, by agreement, it has
> > > > > > a much deeper set of xfstest runs for branches we actually
> > > > > > queue for -next rather than the more cursory set of tests
> > > > > > it runs on ML patches.  
> > > > > 
> > > > > What about the tests on local branches? Not what you post to
> > > > > the ML.  
> > > > 
> > > > I don't really see any point having a local -pre-next pass and
> > > > hoping 0day will find it.  It's much more valuable (and faster)
> > > > to push to -next and have all the integrated tests work on it
> > > > once we've done everything we can locally.
> > > 
> > > Why not do what I do and push to a -pre-next branch when you kick
> > > off your local tests?
> > 
> > Because there's no point.  As I said, when we complete the local
> > criteria the branch is ready for integration.  We push to -next and
> > *all* the built bots tell us if there are any problems (which I
> > don't expect there are but there's room for me to be wrong) ...
> > including 0day.  I don't see what the delay and the process hassle
> > would buy us if we only get a review by 0day in the -pre-next
> > branch.  It seems more efficient to let every bot loose on what we
> > think is mergeable.
> 
> The problem with that approach is that it will break build more
> often, and if build is broken then usually no automated testing are
> getting done for that day.

You're making the wrong assumption: most bugs aren't build breaks.  We
do occasionally have them, usually because of an obscure config
interaction issue, but it's a tiny percentage, so the impact to the
entire tree usually isn't great.

However, think of this like release early, release often.  If we think
a branch is ready but there's a lurking bug, even a build related one,
it's better we find out sooner than later, so it's still better we
expose it ASAP to the full test machinery.  If it's one the local
criteria should have caught, I'm sure there'll be a huge line up of
people wanting to complain (which is why we try to make sure any
lurking bugs aren't build related).
  
James

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 23:11                                             ` James Bottomley
@ 2018-09-11 23:20                                               ` Sasha Levin
  2018-09-12 15:41                                                 ` Eduardo Valentin
  0 siblings, 1 reply; 138+ messages in thread
From: Sasha Levin @ 2018-09-11 23:20 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

On Tue, Sep 11, 2018 at 07:11:48PM -0400, James Bottomley wrote:
>On Tue, 2018-09-11 at 23:04 +0000, Sasha Levin via Ksummit-discuss
>wrote:
>> On Tue, Sep 11, 2018 at 06:53:29PM -0400, James Bottomley wrote:
>> > On Tue, 2018-09-11 at 16:31 -0400, Steven Rostedt wrote:
>> > > On Tue, 11 Sep 2018 16:09:32 -0400
>> > > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>> > >
>> > > > On Tue, 2018-09-11 at 14:39 -0400, Steven Rostedt wrote:
>> >
>> > [...]
>> > > > > > This applies to 0day as well, because, by agreement, it has
>> > > > > > a much deeper set of xfstest runs for branches we actually
>> > > > > > queue for -next rather than the more cursory set of tests
>> > > > > > it runs on ML patches.  
>> > > > >
>> > > > > What about the tests on local branches? Not what you post to
>> > > > > the ML.  
>> > > >
>> > > > I don't really see any point having a local -pre-next pass and
>> > > > hoping 0day will find it.  It's much more valuable (and faster)
>> > > > to push to -next and have all the integrated tests work on it
>> > > > once we've done everything we can locally.
>> > >
>> > > Why not do what I do and push to a -pre-next branch when you kick
>> > > off your local tests?
>> >
>> > Because there's no point.  As I said, when we complete the local
>> > criteria the branch is ready for integration.  We push to -next and
>> > *all* the built bots tell us if there are any problems (which I
>> > don't expect there are but there's room for me to be wrong) ...
>> > including 0day.  I don't see what the delay and the process hassle
>> > would buy us if we only get a review by 0day in the -pre-next
>> > branch.  It seems more efficient to let every bot loose on what we
>> > think is mergeable.
>>
>> The problem with that approach is that it will break build more
>> often, and if build is broken then usually no automated testing are
>> getting done for that day.
>
>You're making the wrong assumption: most bugs aren't build breaks.  We
>do occasionally have them, usually because of an obscure config
>interaction issue, but it's a tiny percentage, so the impact to the
>entire tree usually isn't great.

Right, most bugs aren't build/boot bugs, but between all various
subsystems there's always the one that snuck through. It only takes one
to kill build.

>However, think of this like release early, release often.  If we think
>a branch is ready but there's a lurking bug, even a build related one,
>it's better we find out sooner than later, so it's still better we
>expose it ASAP to the full test machinery.  If it's one the local
>criteria should have caught, I'm sure there'll be a huge line up of
>people wanting to complain (which is why we try to make sure any
>lurking bugs aren't build related).

Right, I'm not saying delay it, I'm just saying that you should feed it
to the bots *while* you run your testsuites, even if to just get build
coverage.

Also remember that -next releases once a day in a very predictable
timing, there's usually a few hours to spare between your push and the
time linux-next gets constructed, so ASAP isn't really all that critical
here as long as it gets in the same day.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 22:53                                         ` James Bottomley
  2018-09-11 23:04                                           ` Sasha Levin
@ 2018-09-11 23:22                                           ` Tony Lindgren
  2018-09-11 23:29                                             ` James Bottomley
  2018-09-12 10:04                                             ` Mark Brown
  2018-09-12 20:24                                           ` Steven Rostedt
  2 siblings, 2 replies; 138+ messages in thread
From: Tony Lindgren @ 2018-09-11 23:22 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

* James Bottomley <James.Bottomley@HansenPartnership.com> [180911 22:58]:
> On Tue, 2018-09-11 at 16:31 -0400, Steven Rostedt wrote:
> > 
> > Why not do what I do and push to a -pre-next branch when you kick off
> > your local tests?
> 
> Because there's no point.  As I said, when we complete the local
> criteria the branch is ready for integration.  We push to -next and
> *all* the built bots tell us if there are any problems (which I don't
> expect there are but there's room for me to be wrong) ... including
> 0day.  I don't see what the delay and the process hassle would buy us
> if we only get a review by 0day in the -pre-next branch.  It seems more
> efficient to let every bot loose on what we think is mergeable.

Well what we're after is providing a trigger for people writing test
scripts to test individual branches before they get merged into next.

With the goal of trying to keep next usable constantly.

Establishing a branch naming standard like "-pre-next" would allow
the scripts to test the various branches where available before
they hit next and warn about issues.

Regards,

Tony

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 23:22                                           ` Tony Lindgren
@ 2018-09-11 23:29                                             ` James Bottomley
  2018-09-12 11:55                                               ` Geert Uytterhoeven
  2018-09-12 10:04                                             ` Mark Brown
  1 sibling, 1 reply; 138+ messages in thread
From: James Bottomley @ 2018-09-11 23:29 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: ksummit

On Tue, 2018-09-11 at 16:22 -0700, Tony Lindgren wrote:
> * James Bottomley <James.Bottomley@HansenPartnership.com> [180911
> 22:58]:
> > On Tue, 2018-09-11 at 16:31 -0400, Steven Rostedt wrote:
> > > 
> > > Why not do what I do and push to a -pre-next branch when you kick
> > > off
> > > your local tests?
> > 
> > Because there's no point.  As I said, when we complete the local
> > criteria the branch is ready for integration.  We push to -next and
> > *all* the built bots tell us if there are any problems (which I
> > don't
> > expect there are but there's room for me to be wrong) ... including
> > 0day.  I don't see what the delay and the process hassle would buy
> > us
> > if we only get a review by 0day in the -pre-next branch.  It seems
> > more
> > efficient to let every bot loose on what we think is mergeable.
> 
> Well what we're after is providing a trigger for people writing test
> scripts to test individual branches before they get merged into next.
> 
> With the goal of trying to keep next usable constantly.
> 
> Establishing a branch naming standard like "-pre-next" would allow
> the scripts to test the various branches where available before
> they hit next and warn about issues.

I still don't get the purpose: as I've said several times, SCSI pushes
to -next when it thinks the patches are ready for merging.  Almost none
of the subsequently discovered bugs (by both bots and humans) affect
anything other than SCSI (and usually only a specific driver) so there
would have been no benefit to testing them in a separate branch and
indeed probably the detriment of diverting resources.

That's my point: from my point of view the -next process is actually
working; I don't see a reason to complicate it.

James

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:02                         ` Guenter Roeck
                                             ` (3 preceding siblings ...)
  2018-09-11 18:45                           ` Steven Rostedt
@ 2018-09-12  9:03                           ` Dan Carpenter
  4 siblings, 0 replies; 138+ messages in thread
From: Dan Carpenter @ 2018-09-12  9:03 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: ksummit

On Tue, Sep 11, 2018 at 10:02:12AM -0700, Guenter Roeck wrote:
> FWIW, for the most part I stopped reporting issues with -next after some people
> yelled at me for the 'noise' I was creating. Along the line of "This has been
> fixed in branch xxx; why don't you do your homework and check there", with
> branch xxx not even being in -next. I don't mind "this has already been
> reported/fixed", quite the contrary, but the "why don't you do your homework"
> got me over the edge.
> 
> To even consider reporting issues in -next on a more regular basis, I'd like
> to see a common agreement that reporting such issues does not warrant being
> yelled at, even if the issue has been fixed somewhere or if it has already
> been reported. Otherwise I'll stick with doing what I do now: If something
> is broken for more than a week, I _may_ start looking at it if I have some
> spare time and/or need a break from my day-to-day work.

I ran into the same issue with static checker warnings.  I tried to make
it more obvious that my bug reports were basically automated.  I tried
to make the language a bit stilted and robotic.  Robots don't judge you
and it does no good to get angry with a robot.

The static checker people sometimes do send duplicate fixes.  But really
if people don't want that they should stop writing buggy code to begin
with.  I think everyone is used to getting duplicate warnings and
patches now maybe so they don't complain so much any more.  (to me).

The deal with using a different tree.  I do that for networking because
Dave Miller is a very busy maintainer, but for everyone else I just use
linux-next.  Some people have complained that they are special but all
the static checker devs assured them that they are not.  We are not
going to read the custom documentation for sending patches to their
git tree.  Unless they can script it so it works automatically for
everyone then, sorry.  I will still send them bug reports, but I'm not
going to send custom formatted patches.  Just give me Reported-by credit
and custom format it your own blasted self.  :)

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 17:22                           ` James Bottomley
  2018-09-11 17:56                             ` Mark Brown
  2018-09-11 18:07                             ` Geert Uytterhoeven
@ 2018-09-12  9:09                             ` Dan Carpenter
  2 siblings, 0 replies; 138+ messages in thread
From: Dan Carpenter @ 2018-09-12  9:09 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

On Tue, Sep 11, 2018 at 10:22:10AM -0700, James Bottomley wrote:
> Not to excuse rudeness, we always try to be polite on lists when this
> happens, but -next builds on Australian time, so when we find and fix
> an issue there can be up to 24h before it propagates.  In that time,
> particularly if it's a stupid bug, it gets picked up and flagged by a
> number of self contained 0day type projects and possibly a couple of
> coccinelle type ones as well.  It does get a bit repetitive for
> maintainers to receive and have to respond to 4 or 5 bug reports for
> something they just fixed ...
> 

Figure out how people are generating the bug reports and do it yourself
pre-emptively.  That's what other trees do.

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 23:22                                           ` Tony Lindgren
  2018-09-11 23:29                                             ` James Bottomley
@ 2018-09-12 10:04                                             ` Mark Brown
  1 sibling, 0 replies; 138+ messages in thread
From: Mark Brown @ 2018-09-12 10:04 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: James Bottomley, ksummit

[-- Attachment #1: Type: text/plain, Size: 706 bytes --]

On Tue, Sep 11, 2018 at 04:22:49PM -0700, Tony Lindgren wrote:

> Well what we're after is providing a trigger for people writing test
> scripts to test individual branches before they get merged into next.

> With the goal of trying to keep next usable constantly.

One issue here is that the systems to build and run tests aren't free -
some will be able to cope with scaling up but a lot wouldn't be able to
add many extra branches in.  

> Establishing a branch naming standard like "-pre-next" would allow
> the scripts to test the various branches where available before
> they hit next and warn about issues.

Add enough delays in and someone will end up making a linux-next-next...

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 23:29                                             ` James Bottomley
@ 2018-09-12 11:55                                               ` Geert Uytterhoeven
  2018-09-12 12:03                                                 ` Laurent Pinchart
  2018-09-12 12:36                                                 ` James Bottomley
  0 siblings, 2 replies; 138+ messages in thread
From: Geert Uytterhoeven @ 2018-09-12 11:55 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit-discuss

Hi James,

On Wed, Sep 12, 2018 at 1:29 AM James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Tue, 2018-09-11 at 16:22 -0700, Tony Lindgren wrote:
> > * James Bottomley <James.Bottomley@HansenPartnership.com> [180911
> > 22:58]:
> > > On Tue, 2018-09-11 at 16:31 -0400, Steven Rostedt wrote:
> > > >
> > > > Why not do what I do and push to a -pre-next branch when you kick
> > > > off
> > > > your local tests?
> > >
> > > Because there's no point.  As I said, when we complete the local
> > > criteria the branch is ready for integration.  We push to -next and
> > > *all* the built bots tell us if there are any problems (which I
> > > don't
> > > expect there are but there's room for me to be wrong) ... including
> > > 0day.  I don't see what the delay and the process hassle would buy
> > > us
> > > if we only get a review by 0day in the -pre-next branch.  It seems
> > > more
> > > efficient to let every bot loose on what we think is mergeable.
> >
> > Well what we're after is providing a trigger for people writing test
> > scripts to test individual branches before they get merged into next.
> >
> > With the goal of trying to keep next usable constantly.
> >
> > Establishing a branch naming standard like "-pre-next" would allow
> > the scripts to test the various branches where available before
> > they hit next and warn about issues.
>
> I still don't get the purpose: as I've said several times, SCSI pushes
> to -next when it thinks the patches are ready for merging.  Almost none
> of the subsequently discovered bugs (by both bots and humans) affect
> anything other than SCSI (and usually only a specific driver) so there
> would have been no benefit to testing them in a separate branch and
> indeed probably the detriment of diverting resources.
>
> That's my point: from my point of view the -next process is actually
> working; I don't see a reason to complicate it.

Good. Then this discussion wasn't targeted to the SCSI people, but to
other maintainers pushing brown paper bags and other trivial breakages
they should have caught beforehand to linux-next ;-)

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 11:55                                               ` Geert Uytterhoeven
@ 2018-09-12 12:03                                                 ` Laurent Pinchart
  2018-09-12 12:29                                                   ` Thomas Gleixner
  2018-09-12 12:36                                                 ` James Bottomley
  1 sibling, 1 reply; 138+ messages in thread
From: Laurent Pinchart @ 2018-09-12 12:03 UTC (permalink / raw)
  To: ksummit-discuss; +Cc: James Bottomley

On Wednesday, 12 September 2018 14:55:44 EEST Geert Uytterhoeven wrote:
> On Wed, Sep 12, 2018 at 1:29 AM James Bottomley wrote:
> > On Tue, 2018-09-11 at 16:22 -0700, Tony Lindgren wrote:
> >> * James Bottomley [180911 22:58]:
> >>> On Tue, 2018-09-11 at 16:31 -0400, Steven Rostedt wrote:
> >>> Why not do what I do and push to a -pre-next branch when you kick
> >>>> off your local tests?
> >>> 
> >>> Because there's no point.  As I said, when we complete the local
> >>> criteria the branch is ready for integration.  We push to -next and
> >>> *all* the built bots tell us if there are any problems (which I
> >>> don't expect there are but there's room for me to be wrong) ...
> >>> including 0day.  I don't see what the delay and the process hassle
> >>> would buy us if we only get a review by 0day in the -pre-next branch. 
> >>> It seems more efficient to let every bot loose on what we think is
> >>> mergeable.
> >> 
> >> Well what we're after is providing a trigger for people writing test
> >> scripts to test individual branches before they get merged into next.
> >> 
> >> With the goal of trying to keep next usable constantly.
> >> 
> >> Establishing a branch naming standard like "-pre-next" would allow
> >> the scripts to test the various branches where available before
> >> they hit next and warn about issues.
> > 
> > I still don't get the purpose: as I've said several times, SCSI pushes
> > to -next when it thinks the patches are ready for merging.  Almost none
> > of the subsequently discovered bugs (by both bots and humans) affect
> > anything other than SCSI (and usually only a specific driver) so there
> > would have been no benefit to testing them in a separate branch and
> > indeed probably the detriment of diverting resources.
> > 
> > That's my point: from my point of view the -next process is actually
> > working; I don't see a reason to complicate it.
> 
> Good. Then this discussion wasn't targeted to the SCSI people, but to
> other maintainers pushing brown paper bags and other trivial breakages
> they should have caught beforehand to linux-next ;-)

That's a behaviour that has been annoying me lately, maintainers should have 
no special privilege when it comes to pushing code upstream. All patches 
should be posted publicly, given enough time to be reviewed, and review 
comments should be addressed before anything is merged to a -next branch.  
Unfortunately that's not always the case :-S

I'm not sure whether we're actually doing better or worse in this area, as I 
haven't studied it across the kernel, but I've been bothered by that problem 
in several real cases.

I would even go as far as saying that all patches should have Reviewed-by or 
Acked-by tag, without enforcing that rule too strictly (I'm thinking in 
particular about drivers that only a single person cares about, it's sometimes 
hard to get patches reviewed).

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 12:03                                                 ` Laurent Pinchart
@ 2018-09-12 12:29                                                   ` Thomas Gleixner
  2018-09-12 12:53                                                     ` Laurent Pinchart
  0 siblings, 1 reply; 138+ messages in thread
From: Thomas Gleixner @ 2018-09-12 12:29 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: James Bottomley, ksummit-discuss

On Wed, 12 Sep 2018, Laurent Pinchart wrote:
> On Wednesday, 12 September 2018 14:55:44 EEST Geert Uytterhoeven wrote:
> > Good. Then this discussion wasn't targeted to the SCSI people, but to
> > other maintainers pushing brown paper bags and other trivial breakages
> > they should have caught beforehand to linux-next ;-)
> 
> That's a behaviour that has been annoying me lately, maintainers should have 
> no special privilege when it comes to pushing code upstream. All patches 
> should be posted publicly, given enough time to be reviewed, and review 
> comments should be addressed before anything is merged to a -next branch.  
> Unfortunately that's not always the case :-S

Come on. Do you really expect me to wait for review when I fix up the
internal testing/ 0-day fallout which is often enough something trivial?
Do you really expect me to wait for review when I worked with a bug
reporter to decode something and have a 100% explanation that it fixes the
root cause and not the symptom?

1) Our review capacity is small enough already, so we don't have to
   throw more stuff out for review.

2) With that modus, bugs will stay unfixed way longer and merging of code
   will even be more delayed.

If I don't have special rights as a maintainer and you don't trust me that
I use my common sense when I'm using these special rights, then you
degraded me to a patch juggling monkey. On the day this happens, I'll step
down.

> I would even go as far as saying that all patches should have Reviewed-by or 
> Acked-by tag, without enforcing that rule too strictly (I'm thinking in 
> particular about drivers that only a single person cares about, it's sometimes 
> hard to get patches reviewed).

If we enforce that, then a large part of reviewed-by and acked-by tags will
just come from coworkers or other affiliates and have no value at all.

That's anyway a growing disease that patches already carry reviewed tags
when they are posted the first time and then you look at them and they have
at least one easy to spot or easy to detect by tools bug in them.

Great value, really. What we need is more competent review capacity, more
people who care and not some silly rules which are going to be played on
the day they are made.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 11:55                                               ` Geert Uytterhoeven
  2018-09-12 12:03                                                 ` Laurent Pinchart
@ 2018-09-12 12:36                                                 ` James Bottomley
  2018-09-12 13:38                                                   ` Guenter Roeck
  1 sibling, 1 reply; 138+ messages in thread
From: James Bottomley @ 2018-09-12 12:36 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: ksummit-discuss

On Wed, 2018-09-12 at 13:55 +0200, Geert Uytterhoeven wrote:
> Hi James,
> 
> On Wed, Sep 12, 2018 at 1:29 AM James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Tue, 2018-09-11 at 16:22 -0700, Tony Lindgren wrote:
> > > * James Bottomley <James.Bottomley@HansenPartnership.com> [180911
> > > 22:58]:
> > > > On Tue, 2018-09-11 at 16:31 -0400, Steven Rostedt wrote:
> > > > > 
> > > > > Why not do what I do and push to a -pre-next branch when you
> > > > > kick off your local tests?
> > > > 
> > > > Because there's no point.  As I said, when we complete the
> > > > local criteria the branch is ready for integration.  We push to
> > > > -next and *all* the built bots tell us if there are any
> > > > problems (which I don't expect there are but there's room for
> > > > me to be wrong) ... including 0day.  I don't see what the delay
> > > > and the process hassle would buy us if we only get a review by
> > > > 0day in the -pre-next branch.  It seems more efficient to let
> > > > every bot loose on what we think is mergeable.
> > > 
> > > Well what we're after is providing a trigger for people writing
> > > test scripts to test individual branches before they get merged
> > > into next.
> > > 
> > > With the goal of trying to keep next usable constantly.
> > > 
> > > Establishing a branch naming standard like "-pre-next" would
> > > allow the scripts to test the various branches where available
> > > before they hit next and warn about issues.
> > 
> > I still don't get the purpose: as I've said several times, SCSI
> > pushes to -next when it thinks the patches are ready for
> > merging.  Almost none of the subsequently discovered bugs (by both
> > bots and humans) affect anything other than SCSI (and usually only
> > a specific driver) so there would have been no benefit to testing
> > them in a separate branch and indeed probably the detriment of
> > diverting resources.
> > 
> > That's my point: from my point of view the -next process is
> > actually working; I don't see a reason to complicate it.
> 
> Good. Then this discussion wasn't targeted to the SCSI people, but to
> other maintainers pushing brown paper bags and other trivial
> breakages they should have caught beforehand to linux-next ;-)

Look, shit happens occasionally.  What then happens is that you get a
note from Stephen saying your tree is dropped for a day for crapping on
the carpet and you fix it.  -next still builds without you so I don't
get what all the fuss is about.  From my point of view the -next
process works very well and I don't see a need to complicate it with a
-next-next or a -pre-next or whatever.

James

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 12:29                                                   ` Thomas Gleixner
@ 2018-09-12 12:53                                                     ` Laurent Pinchart
  2018-09-12 13:10                                                       ` Alexandre Belloni
  2018-09-12 14:11                                                       ` Thomas Gleixner
  0 siblings, 2 replies; 138+ messages in thread
From: Laurent Pinchart @ 2018-09-12 12:53 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: James Bottomley, ksummit-discuss

Hi Thomas,

On Wednesday, 12 September 2018 15:29:25 EEST Thomas Gleixner wrote:
> On Wed, 12 Sep 2018, Laurent Pinchart wrote:
> > On Wednesday, 12 September 2018 14:55:44 EEST Geert Uytterhoeven wrote:
> >> Good. Then this discussion wasn't targeted to the SCSI people, but to
> >> other maintainers pushing brown paper bags and other trivial breakages
> >> they should have caught beforehand to linux-next ;-)
> > 
> > That's a behaviour that has been annoying me lately, maintainers should
> > have no special privilege when it comes to pushing code upstream. All
> > patches should be posted publicly, given enough time to be reviewed, and
> > review comments should be addressed before anything is merged to a -next
> > branch. Unfortunately that's not always the case :-S
> 
> Come on. Do you really expect me to wait for review when I fix up the
> internal testing/ 0-day fallout which is often enough something trivial?
> Do you really expect me to wait for review when I worked with a bug
> reporter to decode something and have a 100% explanation that it fixes the
> root cause and not the symptom?
> 
> 1) Our review capacity is small enough already, so we don't have to
>    throw more stuff out for review.
> 
> 2) With that modus, bugs will stay unfixed way longer and merging of code
>    will even be more delayed.

I don't expect to wait for review forever, but I expect maintainers to give an 
opportunity to reviewers to review patches. We obviously need to consider the 
balance between review opportunity and problems (such as build breakages) that 
could affect hundreds of developers if left unfixed even for a few days.

Too often I've noticed changes to code I maintain that introduced bugs or 
other issues, performed by a maintainer who didn't even bother to post the 
patch before pushing to to his -next branch, who didn't CC me (I could take 
part of the blame for not reading mailing lists with enough attention, but the 
volume is very high), or, possibly worse, who sent a patch out, received my 
review on the same day, and completely ignored it. The last issue is very 
demotivating for reviewers. Those changes were not at all urgent, some of them 
were "cleanups", or replacement of a deprecated API by a new one. That's very 
different than fixing a build breakage in -next which clearly can't wait.

> If I don't have special rights as a maintainer and you don't trust me that
> I use my common sense when I'm using these special rights, then you
> degraded me to a patch juggling monkey. On the day this happens, I'll step
> down.

Maintainers are much more than patch juggling monkeys, otherwise they could be 
replaced by machines. I believe that maintainers are given the huge 
responsibility of taking care of their community. Fostering a productive work 
environment, attracting (and keeping) talented developers and reviewers is a 
huge and honourable task, and gets my full respect. On top of that, if a 
maintainer has great technical skills, it's even better, and I've learnt a lot 
from talented maintainers over the time. I however believe that technical 
skills are not an excuse for not leading by example and showing what the good 
practices are by applying them.

(This goes without saying, but even better when said explicitly, there's not 
judgment about your or any particular maintainer's technical or non-technical 
skills here)

> > I would even go as far as saying that all patches should have Reviewed-by
> > or Acked-by tag, without enforcing that rule too strictly (I'm thinking
> > in particular about drivers that only a single person cares about, it's
> > sometimes hard to get patches reviewed).
> 
> If we enforce that, then a large part of reviewed-by and acked-by tags will
> just come from coworkers or other affiliates and have no value at all.

That's a concern I share, and one of the reasons why I have my doubts about 
some of the maintainership experiments in the DRM subsystem. The proponents of 
the changes there pointed out to me that development has sped up as a result, 
but I think the costs associated with the acceleration haven't been fully 
evaluated.

> That's anyway a growing disease that patches already carry reviewed tags
> when they are posted the first time and then you look at them and they have
> at least one easy to spot or easy to detect by tools bug in them.

Do you get bothered that they carry a tag when they are posted the first time, 
or only that they do so *and* have clear problems ?

> Great value, really. What we need is more competent review capacity, more
> people who care and not some silly rules which are going to be played on
> the day they are made.

That we agree on, it's not about rules, it's about agreeing on a goal (and 
hopefully making sure it will be an improvement) and working to achieve it. I 
also agree we need more competent reviewers (as in a larger number of 
competent reviewers, I'm not implying that our current reviewers are 
incompetent), and I think that goes back to my point: if maintainers 
discourage reviewers (or even developers) on a regular basis, then they're not 
doing a very good job, even if their technical skills are high (again not 
pointing fingers to anyone in particular, it's a general concern).

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 12:53                                                     ` Laurent Pinchart
@ 2018-09-12 13:10                                                       ` Alexandre Belloni
  2018-09-12 13:30                                                         ` Thomas Gleixner
  2018-09-12 23:16                                                         ` Laurent Pinchart
  2018-09-12 14:11                                                       ` Thomas Gleixner
  1 sibling, 2 replies; 138+ messages in thread
From: Alexandre Belloni @ 2018-09-12 13:10 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: James Bottomley, ksummit-discuss

On 12/09/2018 15:53:59+0300, Laurent Pinchart wrote:
> Hi Thomas,
> 
> On Wednesday, 12 September 2018 15:29:25 EEST Thomas Gleixner wrote:
> > On Wed, 12 Sep 2018, Laurent Pinchart wrote:
> > > On Wednesday, 12 September 2018 14:55:44 EEST Geert Uytterhoeven wrote:
> > >> Good. Then this discussion wasn't targeted to the SCSI people, but to
> > >> other maintainers pushing brown paper bags and other trivial breakages
> > >> they should have caught beforehand to linux-next ;-)
> > > 
> > > That's a behaviour that has been annoying me lately, maintainers should
> > > have no special privilege when it comes to pushing code upstream. All
> > > patches should be posted publicly, given enough time to be reviewed, and
> > > review comments should be addressed before anything is merged to a -next
> > > branch. Unfortunately that's not always the case :-S
> > 
> > Come on. Do you really expect me to wait for review when I fix up the
> > internal testing/ 0-day fallout which is often enough something trivial?
> > Do you really expect me to wait for review when I worked with a bug
> > reporter to decode something and have a 100% explanation that it fixes the
> > root cause and not the symptom?
> > 
> > 1) Our review capacity is small enough already, so we don't have to
> >    throw more stuff out for review.
> > 
> > 2) With that modus, bugs will stay unfixed way longer and merging of code
> >    will even be more delayed.
> 
> I don't expect to wait for review forever, but I expect maintainers to give an 
> opportunity to reviewers to review patches. We obviously need to consider the 
> balance between review opportunity and problems (such as build breakages) that 
> could affect hundreds of developers if left unfixed even for a few days.
> 
> Too often I've noticed changes to code I maintain that introduced bugs or 
> other issues, performed by a maintainer who didn't even bother to post the 
> patch before pushing to to his -next branch, who didn't CC me (I could take 
> part of the blame for not reading mailing lists with enough attention, but the 
> volume is very high), or, possibly worse, who sent a patch out, received my 
> review on the same day, and completely ignored it. The last issue is very 
> demotivating for reviewers. Those changes were not at all urgent, some of them 
> were "cleanups", or replacement of a deprecated API by a new one. That's very 
> different than fixing a build breakage in -next which clearly can't wait.
> 
> > If I don't have special rights as a maintainer and you don't trust me that
> > I use my common sense when I'm using these special rights, then you
> > degraded me to a patch juggling monkey. On the day this happens, I'll step
> > down.
> 
> Maintainers are much more than patch juggling monkeys, otherwise they could be 
> replaced by machines. I believe that maintainers are given the huge 
> responsibility of taking care of their community. Fostering a productive work 
> environment, attracting (and keeping) talented developers and reviewers is a 
> huge and honourable task, and gets my full respect. On top of that, if a 
> maintainer has great technical skills, it's even better, and I've learnt a lot 
> from talented maintainers over the time. I however believe that technical 
> skills are not an excuse for not leading by example and showing what the good 
> practices are by applying them.
> 
> (This goes without saying, but even better when said explicitly, there's not 
> judgment about your or any particular maintainer's technical or non-technical 
> skills here)
> 
> > > I would even go as far as saying that all patches should have Reviewed-by
> > > or Acked-by tag, without enforcing that rule too strictly (I'm thinking
> > > in particular about drivers that only a single person cares about, it's
> > > sometimes hard to get patches reviewed).
> > 
> > If we enforce that, then a large part of reviewed-by and acked-by tags will
> > just come from coworkers or other affiliates and have no value at all.
> 
> That's a concern I share, and one of the reasons why I have my doubts about 
> some of the maintainership experiments in the DRM subsystem. The proponents of 
> the changes there pointed out to me that development has sped up as a result, 
> but I think the costs associated with the acceleration haven't been fully 
> evaluated.
> 
> > That's anyway a growing disease that patches already carry reviewed tags
> > when they are posted the first time and then you look at them and they have
> > at least one easy to spot or easy to detect by tools bug in them.
> 
> Do you get bothered that they carry a tag when they are posted the first time, 
> or only that they do so *and* have clear problems ?
> 

I guess the issue is that it is very difficult to trust reviewed-by or
acked-by tags that are coming from coworkers/affiliates, especially more
when the review didn't happen publicly. From my point of view, that
review may or may not have happened.

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 13:10                                                       ` Alexandre Belloni
@ 2018-09-12 13:30                                                         ` Thomas Gleixner
  2018-09-12 23:16                                                         ` Laurent Pinchart
  1 sibling, 0 replies; 138+ messages in thread
From: Thomas Gleixner @ 2018-09-12 13:30 UTC (permalink / raw)
  To: Alexandre Belloni; +Cc: James Bottomley, ksummit-discuss

On Wed, 12 Sep 2018, Alexandre Belloni wrote:
> On 12/09/2018 15:53:59+0300, Laurent Pinchart wrote:
> >
> > Do you get bothered that they carry a tag when they are posted the first time, 
> > or only that they do so *and* have clear problems ?

The tag does not bother me too much when the patch is correct, albeit I
prefer public review on the mailing list.

Though if you get a trivial comment typo fix with 5 reviewed tags already
applied then I really have to ask whether people have actually understood
what review means. These things emerge from 'have to follow rules no matter
what' processes in companies or they simply are caused by statistics
dressing. Both reasons are not helping the cause.

> I guess the issue is that it is very difficult to trust reviewed-by or
> acked-by tags that are coming from coworkers/affiliates, especially more
> when the review didn't happen publicly. From my point of view, that
> review may or may not have happened.

When the patch is obviously buggy, it's irrelevant if it happened or
not. It's worthless in both cases.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 12:36                                                 ` James Bottomley
@ 2018-09-12 13:38                                                   ` Guenter Roeck
  2018-09-12 13:59                                                     ` Tony Lindgren
  0 siblings, 1 reply; 138+ messages in thread
From: Guenter Roeck @ 2018-09-12 13:38 UTC (permalink / raw)
  To: James Bottomley, Geert Uytterhoeven; +Cc: ksummit-discuss

On 09/12/2018 05:36 AM, James Bottomley wrote:
> 
> Look, shit happens occasionally.  What then happens is that you get a
> note from Stephen saying your tree is dropped for a day for crapping on
> the carpet and you fix it.  -next still builds without you so I don't
> get what all the fuss is about.  From my point of view the -next
> process works very well and I don't see a need to complicate it with a
> -next-next or a -pre-next or whatever.
> 
Not sure if I agree with the "works very well" part. I would say it works,
for the most part, decently well, and we would be in a much worse situation
without it. I would hope for code in -next to be tested a bit better,
and problems to be fixed faster, but one can not have everything.

However, I definitely agree that we don't need -next-next or -pre-next.
That would not improve the situation at all; it would just create even
more noise and result in people testing their code even less before
publishing it.

Guenter

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 13:38                                                   ` Guenter Roeck
@ 2018-09-12 13:59                                                     ` Tony Lindgren
  0 siblings, 0 replies; 138+ messages in thread
From: Tony Lindgren @ 2018-09-12 13:59 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: James Bottomley, ksummit-discuss

* Guenter Roeck <linux@roeck-us.net> [180912 13:42]:
> On 09/12/2018 05:36 AM, James Bottomley wrote:
> > 
> > Look, shit happens occasionally.  What then happens is that you get a
> > note from Stephen saying your tree is dropped for a day for crapping on
> > the carpet and you fix it.  -next still builds without you so I don't
> > get what all the fuss is about.  From my point of view the -next
> > process works very well and I don't see a need to complicate it with a
> > -next-next or a -pre-next or whatever.

Fine so I trust James and many maintainers to do proper testing before
next. And I do testing too for the stuff I queue. And in many cases the
damage is quite limited in case of issues.

But then we have constant stream of patches that affect everybody and cause
regressions that don't seem to have much testing done on them before they
hit next.

> Not sure if I agree with the "works very well" part. I would say it works,
> for the most part, decently well, and we would be in a much worse situation
> without it. I would hope for code in -next to be tested a bit better,
> and problems to be fixed faster, but one can not have everything.

I think we can do much better. If we get next into usable shape then more
people will use it for testing. And then the -rc cycle becomes easy as
things have been mostly done already in next. Well, at least for me the
-rc cycle is now much easier after doing constant testing with next.

After all, next is really our common development tree, right? :)

So "How to keep Linux next working and usable continuously" might actually
be a somewhat constructive topic to dicuss.

> However, I definitely agree that we don't need -next-next or -pre-next.
> That would not improve the situation at all; it would just create even
> more noise and result in people testing their code even less before
> publishing it.

We could still establish a standard naming for "please-test-me" type
branches. Then we would at least provide a trigger for the test scripts
to go test them.

Regards,

Tony

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 12:53                                                     ` Laurent Pinchart
  2018-09-12 13:10                                                       ` Alexandre Belloni
@ 2018-09-12 14:11                                                       ` Thomas Gleixner
  2018-09-19  8:26                                                         ` Laurent Pinchart
  1 sibling, 1 reply; 138+ messages in thread
From: Thomas Gleixner @ 2018-09-12 14:11 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: James Bottomley, ksummit-discuss

On Wed, 12 Sep 2018, Laurent Pinchart wrote:
> 
> Maintainers are much more than patch juggling monkeys, otherwise they could be 
> replaced by machines. I believe that maintainers are given the huge 
> responsibility of taking care of their community. Fostering a productive work 
> environment, attracting (and keeping) talented developers and reviewers is a 
> huge and honourable task, and gets my full respect. On top of that, if a 
> maintainer has great technical skills, it's even better, and I've learnt a lot 
> from talented maintainers over the time. I however believe that technical 
> skills are not an excuse for not leading by example and showing what the good 
> practices are by applying them.

I surely agree, but reality is different.

I definitely apply my own patches w/o a review tag from time to time. And
aside of obvious typo cleanups/fixlets, which really can and have to do
without, all of my patches are posted to LKML and I carefuly respect and
address review comments.

Though, what am I supposed to do if nothing happens? Repost them five times
to annoy people? Been there, tried that. Does not help.

Most of these patches are refactoring and cleanups of the subsystems I
maintain and I do them for three reasons:

  1) Making the code more maintainable, which in the first place serves the
     egoistic bastard I am, because it makes my life as a maintainer
     simpler in the long run. It also allows others to work easier on top
     of that, which again makes it easier for me to review.

  2) During review of a feature patch submitted by someone else, I notice
     that the code is crap already and the feature adds more crap to it.

     So I first try to nudge the submitter to fix that up, but either it's
     outside their expertise level or they are simply telling me: 'I need
     to get this in and cleanup is outside of the scope of my task'.

     For the latter, I just refuse to merge it most of the times, but then
     I already identified how it should be done and go back to #1

  3) New hardware, new levels of scale unearth shortcomings in the code. I
     get problem reports and because I deeply care about the stuff I'm
     responsible for, I go and fix it if nobody else cares. Guess what,
     often enough I do not even get a tested-by reply by the people who
     complained in the first place. But with the knowledge of the problem
     and the solution, I would be outright stupid to just put them into
     /dev/null because applying them again makes my life easier.

So again, it's a problem which has to do with the lack of review capacity
and the lack of people who really care beyond the brim of their teacup.

The 'Make feature X work upstream' task mentality of companies is part of
the problem along with the expectation, that maintainers will coach,
educate and babysit their newbies when they have been tasked with problems
way over their expertise levels. Especially the last part is frustrating
for everyone. The submitter has worked on this feature for a long time just
to get it shredded in pieces and then after I got frustrated by the review
ping pong, I give up and fix it myself in order to have time for other
things on that ever growing todo list.

This simply cannot scale at all and I'm well aware of it, but I completely
disagree that this can be fixed by more formalistic processes, gitlab or
whatever people dream up.

It has to be fixed at the mindset level. A code base as large and as
complex as the kernel needs continous refactoring and cannot be used as
dumping ground for new features in a drive by mode.

Aside of that, I see people working for large companies doing reviews in
their spare time, because they care about it. But that's just wrong, they
should be able to enjoy their spare time as anybody else and get the time
to review during their work hours.

I surely encourage people to review things and I offload quite some of the
work to people who care, but finding them and keeping them on board is hard
because their daily work just does not allow them to keep up.

I'm definitely open for new ideas and new ways to work, but OTOH I'm not
interested at all in the 'fix the symptoms' approach and thereby hoping
that the root cause will cure itself. It simply does not work independent
of the problem space.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 18:17                                     ` Geert Uytterhoeven
@ 2018-09-12 15:15                                       ` Eduardo Valentin
  0 siblings, 0 replies; 138+ messages in thread
From: Eduardo Valentin @ 2018-09-12 15:15 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: James Bottomley, ksummit-discuss

On Tue, Sep 11, 2018 at 08:17:33PM +0200, Geert Uytterhoeven wrote:
> On Tue, Sep 11, 2018 at 8:13 PM Eduardo Valentin <edubezval@gmail.com> wrote:
> > But given the 0day dependency on -next itself, I am not sure it is worth
> > it. Why is that a thing anyways? 0day cannot test individual branches?
> 
> 0day does test individual branches (I get email reports for each branch I
> push to git.kernel.org).  The number of tests run depends on the load,
> though, AFAIK.

Yeah, looking back at my mailbox, I do see 0day reports on branches I
update at my git trees on kernel.org. Maybe the issue is more on what
tests it runs on individual branches. And Yes, I also think sometimes it
take longer for it to send a report, so, yeah, depends on load.

> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 18:19                                     ` James Bottomley
@ 2018-09-12 15:17                                       ` Eduardo Valentin
  0 siblings, 0 replies; 138+ messages in thread
From: Eduardo Valentin @ 2018-09-12 15:17 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

On Tue, Sep 11, 2018 at 11:19:20AM -0700, James Bottomley wrote:
> On September 11, 2018 11:12:51 AM PDT, Eduardo Valentin <edubezval@gmail.com> wrote:
> >Hey,
> >
> >On Tue, Sep 11, 2018 a 10:47:02AM -0700, James Bottomley wrote:
> >> On Tue, 2018-09-11 at 10:40 -0700, Tony Lindgren wrote:
> >> > * Steven Rostedt <rostedt@goodmis.org> [180911 15:46]:
> >> > > On Mon, 10 Sep 2018 19:13:29 -0400
> >> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> >> > > 
> >> > > > On Mon, 10 Sep 2018 16:03:03 -0700
> >> > > > Eduardo Valentin <edubezval@gmail.com> wrote:
> >> > > > 
> >> > > > > I thought that was the case already, everthing that goes to
> >> > > > > linux-next is ready to go to Linus.  
> >> > > > 
> >> > > > It's suppose to be, but not always, and this is why I suggested
> >> > > > that Linus start yelling at those that are not doing it.
> >> > > > 
> >> > > 
> >> > > This may have come across a bit too strong. We don't need Linus
> >to
> >> > > yell, but there should definitely be consequences for any
> >> > > maintainer that pushes untested code to linux-next. At a bare
> >> > > minimum, all code that goes into linux-next should have passed
> >0day
> >> > > bot. Push code to a non-linux-next branch on kernel.org, wait a
> >few
> >> > > days, if you don't get any reports that a bot caught something
> >> > > broken, you should be good to go (also you can opt-in to get
> >> > > reports on 0day success, which I do, to make that cycle even
> >> > > shorter). And that's a pretty low bar to have to
> >> > > pass. Ideally, all maintainers should have a set of tests they
> >run
> >> > > before pushing anything to linux-next, or to Linus in the late
> >> > > -rcs. If you can't be bothered just to rely on at least 0day then
> >> > > you should not be a maintainer.
> >> > 
> >> > Based on the regressions I seem to hit quite a few Linux next
> >> > regressions could have been avoided if Andrew's mm tree had seem
> >some
> >> > more testing before being added to next. Probably because Andrew
> >> > queues lots of complicated patches :)
> >> > 
> >> > So yeah what you're suggesting might help with that if we establish
> >> > a let's say 24 hour period before adding branches to next. At
> >> > least that gives the automated systems a chance to test stuff
> >> > before it hits next. And people who want to can then test various
> >> > branches separately in advance.
> >> 
> >> I really don't think that helps.  The 0day mailing list bot seems to
> >be
> >
> >I think the idea is to minimize the failures on -next, and use the
> >linux-next tree for its original purpose: check for integration issues.
> 
> We thought the patch was ready based on our acceptance criteria. That's why it went into our -next integration branch.  Finding bugs after we thought it was ready is a legitimate integration issue...
> 
> 
> >> a bit overloaded and about 80% of the automation isn't run *unless*
> >> your branch hits -next.  Our criterion for -next queueing is local
> >
> >Oh, I see. I was not aware of such dependency of the 0day bot. The
> >kernelCI bot, which I mainly use after my local test, has no such
> >requirement.
> >
> >> tests pass, code inspection complete and 0day ML didn't complain. 
> >> However, we still get quite a few reports from the -next automated
> >> testing even after our local stuff.  I really don't see what delaying
> >> into -next buys you except delaying finding and fixing bugs.
> >> 
> >
> >having a larger build/boot/test coverage before pushing on linux-next.
> >But given the 0day dependency on -next itself, I am not sure it is
> >worth
> >it. Why is that a thing anyways? 0day cannot test individual branches?
> 
> 0day does test most branches it finds but its better to work off a curated list which the -next build bot has.

Oh I see your point now. This is really a load / capacity issue then. I
am assuming, based on this dicussion, that 0day would put resources
first on stuff that is already in -next, then test individual branches,
when possible. 

> 
> James
> 
> >> This applies to 0day as well, because, by agreement, it has a much
> >> deeper set of xfstest runs for branches we actually queue for -next
> >> rather than the more cursory set of tests it runs on ML patches.
> >> 
> >> James
> >> 
> >> _______________________________________________
> >> Ksummit-discuss mailing list
> >> Ksummit-discuss@lists.linuxfoundation.org
> >> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss
> >_______________________________________________
> >Ksummit-discuss mailing list
> >Ksummit-discuss@lists.linuxfoundation.org
> >https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss
> 
> 
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 23:20                                               ` Sasha Levin
@ 2018-09-12 15:41                                                 ` Eduardo Valentin
  0 siblings, 0 replies; 138+ messages in thread
From: Eduardo Valentin @ 2018-09-12 15:41 UTC (permalink / raw)
  To: Sasha Levin; +Cc: James Bottomley, ksummit

On Tue, Sep 11, 2018 at 11:20:01PM +0000, Sasha Levin via Ksummit-discuss wrote:
> On Tue, Sep 11, 2018 at 07:11:48PM -0400, James Bottomley wrote:
> >On Tue, 2018-09-11 at 23:04 +0000, Sasha Levin via Ksummit-discuss
> >wrote:
> >> On Tue, Sep 11, 2018 at 06:53:29PM -0400, James Bottomley wrote:
> >> > On Tue, 2018-09-11 at 16:31 -0400, Steven Rostedt wrote:
> >> > > On Tue, 11 Sep 2018 16:09:32 -0400
> >> > > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >> > >
> >> > > > On Tue, 2018-09-11 at 14:39 -0400, Steven Rostedt wrote:
> >> >
> >> > [...]
> >> > > > > > This applies to 0day as well, because, by agreement, it has
> >> > > > > > a much deeper set of xfstest runs for branches we actually
> >> > > > > > queue for -next rather than the more cursory set of tests
> >> > > > > > it runs on ML patches.  
> >> > > > >
> >> > > > > What about the tests on local branches? Not what you post to
> >> > > > > the ML.  
> >> > > >
> >> > > > I don't really see any point having a local -pre-next pass and
> >> > > > hoping 0day will find it.  It's much more valuable (and faster)
> >> > > > to push to -next and have all the integrated tests work on it
> >> > > > once we've done everything we can locally.
> >> > >
> >> > > Why not do what I do and push to a -pre-next branch when you kick
> >> > > off your local tests?
> >> >
> >> > Because there's no point.  As I said, when we complete the local
> >> > criteria the branch is ready for integration.  We push to -next and
> >> > *all* the built bots tell us if there are any problems (which I
> >> > don't expect there are but there's room for me to be wrong) ...
> >> > including 0day.  I don't see what the delay and the process hassle
> >> > would buy us if we only get a review by 0day in the -pre-next
> >> > branch.  It seems more efficient to let every bot loose on what we
> >> > think is mergeable.
> >>
> >> The problem with that approach is that it will break build more
> >> often, and if build is broken then usually no automated testing are
> >> getting done for that day.
> >
> >You're making the wrong assumption: most bugs aren't build breaks.  We
> >do occasionally have them, usually because of an obscure config
> >interaction issue, but it's a tiny percentage, so the impact to the
> >entire tree usually isn't great.
> 
> Right, most bugs aren't build/boot bugs, but between all various
> subsystems there's always the one that snuck through. It only takes one
> to kill build.


Right, I agree build/boot bugs are not necessarily the more complex
issues.  But one single boot problem in linux next has a huge impact on
everyone that cares about testing it, and catching it to sooner the
better, no?

Also one maintainer may consider a config combination as an obscure one,
but not every may think that way. Having bots build/boot testing
branches on multiple configs is a thing that actually helps, specially
if done before pushing to -next. Then again, I am not saying that we
should push the responsibility of maintainers to bots to test stuff, but
improving the test coverage is always welcome, IMO.

> 
> >However, think of this like release early, release often.  If we think
> >a branch is ready but there's a lurking bug, even a build related one,
> >it's better we find out sooner than later, so it's still better we
> >expose it ASAP to the full test machinery.  If it's one the local
> >criteria should have caught, I'm sure there'll be a huge line up of
> >people wanting to complain (which is why we try to make sure any
> >lurking bugs aren't build related).

I agree. Looks like everyone wants to improve the testing coverage, the
disagreement seams to be on when and how.


> 
> Right, I'm not saying delay it, I'm just saying that you should feed it
> to the bots *while* you run your testsuites, even if to just get build
> coverage.

I agree with this proposal, specially if the idea is to get more people
to use a stabilized -next.

However, I also tend to agree that other nasty bugs will be found only
when changes hit a release or -rc kernel, or when they hit a stable
release. Specially when those gets released by distros. Obviously, this
should not prevent us to try to improve the process, specially wrt
testing.


> 
> Also remember that -next releases once a day in a very predictable
> timing, there's usually a few hours to spare between your push and the
> time linux-next gets constructed, so ASAP isn't really all that critical
> here as long as it gets in the same day.
> 
> --
> Thanks,
> Sasha
> _______________________________________________
> Ksummit-discuss mailing list
> Ksummit-discuss@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-11 22:53                                         ` James Bottomley
  2018-09-11 23:04                                           ` Sasha Levin
  2018-09-11 23:22                                           ` Tony Lindgren
@ 2018-09-12 20:24                                           ` Steven Rostedt
  2018-09-12 20:29                                             ` Sasha Levin
  2018-09-13  0:19                                             ` Stephen Rothwell
  2 siblings, 2 replies; 138+ messages in thread
From: Steven Rostedt @ 2018-09-12 20:24 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit

On Tue, 11 Sep 2018 18:53:29 -0400
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> > Why not do what I do and push to a -pre-next branch when you kick off
> > your local tests?  
> 
> Because there's no point.  As I said, when we complete the local
> criteria the branch is ready for integration.  We push to -next and
> *all* the built bots tell us if there are any problems (which I don't
> expect there are but there's room for me to be wrong) ... including
> 0day.  I don't see what the delay and the process hassle would buy us
> if we only get a review by 0day in the -pre-next branch.  It seems more
> efficient to let every bot loose on what we think is mergeable.

Stephen,

If a bot discovers a new failure in linux-next, do you look to see
which tree caused it? And then create a new linux-next without that
tree?

If not, then perhaps we should do so.

-- Steve

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 20:24                                           ` Steven Rostedt
@ 2018-09-12 20:29                                             ` Sasha Levin
  2018-09-13  0:19                                             ` Stephen Rothwell
  1 sibling, 0 replies; 138+ messages in thread
From: Sasha Levin @ 2018-09-12 20:29 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: James Bottomley, ksummit

On Wed, Sep 12, 2018 at 04:24:22PM -0400, Steven Rostedt wrote:
>On Tue, 11 Sep 2018 18:53:29 -0400
>James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
>> > Why not do what I do and push to a -pre-next branch when you kick off
>> > your local tests?
>>
>> Because there's no point.  As I said, when we complete the local
>> criteria the branch is ready for integration.  We push to -next and
>> *all* the built bots tell us if there are any problems (which I don't
>> expect there are but there's room for me to be wrong) ... including
>> 0day.  I don't see what the delay and the process hassle would buy us
>> if we only get a review by 0day in the -pre-next branch.  It seems more
>> efficient to let every bot loose on what we think is mergeable.
>
>Stephen,
>
>If a bot discovers a new failure in linux-next, do you look to see
>which tree caused it? And then create a new linux-next without that
>tree?
>
>If not, then perhaps we should do so.

I suspect that by the time Stephen finishes merging everything, pushes
the tree out and receives back failure reports he's already about to hit
the bed.

Maybe it'll be useful adding someone who can revert a patch/merge on
Stephen's off-hours.


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 13:10                                                       ` Alexandre Belloni
  2018-09-12 13:30                                                         ` Thomas Gleixner
@ 2018-09-12 23:16                                                         ` Laurent Pinchart
  1 sibling, 0 replies; 138+ messages in thread
From: Laurent Pinchart @ 2018-09-12 23:16 UTC (permalink / raw)
  To: Alexandre Belloni; +Cc: James Bottomley, ksummit-discuss

Hi Alex,

On Wednesday, 12 September 2018 16:10:01 EEST Alexandre Belloni wrote:
> On 12/09/2018 15:53:59+0300, Laurent Pinchart wrote:
> > On Wednesday, 12 September 2018 15:29:25 EEST Thomas Gleixner wrote:
> >> On Wed, 12 Sep 2018, Laurent Pinchart wrote:
> >>> On Wednesday, 12 September 2018 14:55:44 EEST Geert Uytterhoeven wrote:
> >>>> Good. Then this discussion wasn't targeted to the SCSI people, but to
> >>>> other maintainers pushing brown paper bags and other trivial
> >>>> breakages they should have caught beforehand to linux-next ;-)
> >>> 
> >>> That's a behaviour that has been annoying me lately, maintainers
> >>> should have no special privilege when it comes to pushing code
> >>> upstream. All patches should be posted publicly, given enough time to
> >>> be reviewed, and review comments should be addressed before anything
> >>> is merged to a -next branch. Unfortunately that's not always the case
> >>> :-S
> >> 
> >> Come on. Do you really expect me to wait for review when I fix up the
> >> internal testing/ 0-day fallout which is often enough something trivial?
> >> Do you really expect me to wait for review when I worked with a bug
> >> reporter to decode something and have a 100% explanation that it fixes
> >> the root cause and not the symptom?
> >> 
> >> 1) Our review capacity is small enough already, so we don't have to
> >>    throw more stuff out for review.
> >> 
> >> 2) With that modus, bugs will stay unfixed way longer and merging of
> >>    code will even be more delayed.
> > 
> > I don't expect to wait for review forever, but I expect maintainers to
> > give an opportunity to reviewers to review patches. We obviously need to
> > consider the balance between review opportunity and problems (such as
> > build breakages) that could affect hundreds of developers if left unfixed
> > even for a few days.
> > 
> > Too often I've noticed changes to code I maintain that introduced bugs or
> > other issues, performed by a maintainer who didn't even bother to post the
> > patch before pushing to to his -next branch, who didn't CC me (I could
> > take part of the blame for not reading mailing lists with enough
> > attention, but the volume is very high), or, possibly worse, who sent a
> > patch out, received my review on the same day, and completely ignored it.
> > The last issue is very demotivating for reviewers. Those changes were not
> > at all urgent, some of them were "cleanups", or replacement of a
> > deprecated API by a new one. That's very different than fixing a build
> > breakage in -next which clearly can't wait.
> > 
> >> If I don't have special rights as a maintainer and you don't trust me
> >> that I use my common sense when I'm using these special rights, then you
> >> degraded me to a patch juggling monkey. On the day this happens, I'll
> >> step down.
> > 
> > Maintainers are much more than patch juggling monkeys, otherwise they
> > could be replaced by machines. I believe that maintainers are given the
> > huge responsibility of taking care of their community. Fostering a
> > productive work environment, attracting (and keeping) talented developers
> > and reviewers is a huge and honourable task, and gets my full respect. On
> > top of that, if a maintainer has great technical skills, it's even
> > better, and I've learnt a lot from talented maintainers over the time. I
> > however believe that technical skills are not an excuse for not leading
> > by example and showing what the good practices are by applying them.
> > 
> > (This goes without saying, but even better when said explicitly, there's
> > not judgment about your or any particular maintainer's technical or
> > non-technical skills here)
> > 
> >>> I would even go as far as saying that all patches should have
> >>> Reviewed-by or Acked-by tag, without enforcing that rule too strictly
> >>> (I'm thinking in particular about drivers that only a single person
> >>> cares about, it's sometimes hard to get patches reviewed).
> >> 
> >> If we enforce that, then a large part of reviewed-by and acked-by tags
> >> will just come from coworkers or other affiliates and have no value at
> >> all.
> > 
> > That's a concern I share, and one of the reasons why I have my doubts
> > about some of the maintainership experiments in the DRM subsystem. The
> > proponents of the changes there pointed out to me that development has
> > sped up as a result, but I think the costs associated with the
> > acceleration haven't been fully evaluated.
> > 
> >> That's anyway a growing disease that patches already carry reviewed tags
> >> when they are posted the first time and then you look at them and they
> >> have at least one easy to spot or easy to detect by tools bug in them.
> > 
> > Do you get bothered that they carry a tag when they are posted the first
> > time, or only that they do so *and* have clear problems ?
> 
> I guess the issue is that it is very difficult to trust reviewed-by or
> acked-by tags that are coming from coworkers/affiliates, especially more
> when the review didn't happen publicly. From my point of view, that
> review may or may not have happened.

I'm not sure we can put more trust on public review when the reply only 
contains a Reviewed-by without any other comment, just because the review is 
public. In the end it's a matter of trusting the reviewer. I'll try that you 
have carefully reviewed an RTC patch carrying your R-b tag when it gets posted 
by another bootlin developer, because I trust you. I won't have much trust in 
a R-b coming in an otherwise empty e-mail from someone who has never sent or 
reviewed any patch and who has no history of open source contributions, even 
if that review e-mail is public and done by a non-coworker/affiliate.

I don't think we disagree fundamentally here, R-b carries a value when the 
reviewer has done enough work to gain trust from the community, regardless of 
whether reviews are first done in private or don't contain any other 
information than the R-b tag.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 20:24                                           ` Steven Rostedt
  2018-09-12 20:29                                             ` Sasha Levin
@ 2018-09-13  0:19                                             ` Stephen Rothwell
  2018-09-13 11:39                                               ` Mark Brown
  1 sibling, 1 reply; 138+ messages in thread
From: Stephen Rothwell @ 2018-09-13  0:19 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: James Bottomley, ksummit

[-- Attachment #1: Type: text/plain, Size: 2203 bytes --]

Hi Steve,

On Wed, 12 Sep 2018 16:24:22 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 11 Sep 2018 18:53:29 -0400
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > > Why not do what I do and push to a -pre-next branch when you kick off
> > > your local tests?    
> > 
> > Because there's no point.  As I said, when we complete the local
> > criteria the branch is ready for integration.  We push to -next and
> > *all* the built bots tell us if there are any problems (which I don't
> > expect there are but there's room for me to be wrong) ... including
> > 0day.  I don't see what the delay and the process hassle would buy us
> > if we only get a review by 0day in the -pre-next branch.  It seems more
> > efficient to let every bot loose on what we think is mergeable.  
> 
> If a bot discovers a new failure in linux-next, do you look to see
> which tree caused it? And then create a new linux-next without that
> tree?

Well, obviously, that depends.  Firstly, I have only once done 2
linux-next release in one day (and that was way back in 2008) as it is
mostly just to much work (a minimal linux-next release takes 4 hours or
more ... just before the merge window opens, it can take over 12 hours).

Sometimes a bot will actually identify the actual commit and sometimes
not.  Sometimes I don't even see the notifications :-(

Reverting a whole tree can be a real challenge in itself.  I currently
have to consider 286 branches for merging every day (a lot are empty
especially in the first few -rcs, obviously) and if the branch has been
merged early, then there is a reasonable chance that there is some
interaction with later merges.

An alternative is to reset linux-next to just before the offending
branch was merged and remerge all the following branches without doing
the intermediate builds (optimistically).

As Sacha said, I often don't see reports until after I have finished
for the day (or woken up the next morning), so the best chance is to
fix the next linux-next release.

> If not, then perhaps we should do so.

I will think about how I could go about it.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-13  0:19                                             ` Stephen Rothwell
@ 2018-09-13 11:39                                               ` Mark Brown
  2018-09-19  6:27                                                 ` Stephen Rothwell
  0 siblings, 1 reply; 138+ messages in thread
From: Mark Brown @ 2018-09-13 11:39 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: James Bottomley, ksummit

[-- Attachment #1: Type: text/plain, Size: 626 bytes --]

On Thu, Sep 13, 2018 at 10:19:55AM +1000, Stephen Rothwell wrote:
> On Wed, 12 Sep 2018 16:24:22 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:

> > If a bot discovers a new failure in linux-next, do you look to see
> > which tree caused it? And then create a new linux-next without that
> > tree?

> Sometimes a bot will actually identify the actual commit and sometimes
> not.  Sometimes I don't even see the notifications :-(

Do you want to see these sent to you directly as a matter of course
(rather than just if they're particularly disruptive)?  I don't CC you
on stuff normally because I figure it'd be too noisy.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-13 11:39                                               ` Mark Brown
@ 2018-09-19  6:27                                                 ` Stephen Rothwell
  2018-09-19 17:24                                                   ` Mark Brown
  0 siblings, 1 reply; 138+ messages in thread
From: Stephen Rothwell @ 2018-09-19  6:27 UTC (permalink / raw)
  To: Mark Brown; +Cc: James Bottomley, ksummit

[-- Attachment #1: Type: text/plain, Size: 935 bytes --]

Hi Mark,

On Thu, 13 Sep 2018 12:39:19 +0100 Mark Brown <broonie@kernel.org> wrote:
>
> On Thu, Sep 13, 2018 at 10:19:55AM +1000, Stephen Rothwell wrote:
> > On Wed, 12 Sep 2018 16:24:22 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:  
> 
> > > If a bot discovers a new failure in linux-next, do you look to see
> > > which tree caused it? And then create a new linux-next without that
> > > tree?  
> 
> > Sometimes a bot will actually identify the actual commit and sometimes
> > not.  Sometimes I don't even see the notifications :-(  
> 
> Do you want to see these sent to you directly as a matter of course
> (rather than just if they're particularly disruptive)?  I don't CC you
> on stuff normally because I figure it'd be too noisy.

I do get your summaries each evening on the linux-next list, though,
right?  Certainly I guess I should know about disruptive problems.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-12 14:11                                                       ` Thomas Gleixner
@ 2018-09-19  8:26                                                         ` Laurent Pinchart
  2018-09-20  9:02                                                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 138+ messages in thread
From: Laurent Pinchart @ 2018-09-19  8:26 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: James Bottomley, ksummit-discuss

Hi Thomas,

On Wednesday, 12 September 2018 17:11:34 EEST Thomas Gleixner wrote:
> On Wed, 12 Sep 2018, Laurent Pinchart wrote:
> > Maintainers are much more than patch juggling monkeys, otherwise they
> > could be replaced by machines. I believe that maintainers are given the
> > huge responsibility of taking care of their community. Fostering a
> > productive work environment, attracting (and keeping) talented developers
> > and reviewers is a huge and honourable task, and gets my full respect. On
> > top of that, if a maintainer has great technical skills, it's even
> > better, and I've learnt a lot from talented maintainers over the time. I
> > however believe that technical skills are not an excuse for not leading
> > by example and showing what the good practices are by applying them.
> 
> I surely agree, but reality is different.
> 
> I definitely apply my own patches w/o a review tag from time to time. And
> aside of obvious typo cleanups/fixlets, which really can and have to do
> without, all of my patches are posted to LKML and I carefuly respect and
> address review comments.
> 
> Though, what am I supposed to do if nothing happens? Repost them five times
> to annoy people? Been there, tried that. Does not help.

Certainly not. I do apply my own patches without Reviewed-by tags from time to 
time as well because nobody cared enough about a particular driver to review 
the code. When working in corporate environment, or on code that is developed 
by a group of people, it gets a bit easier to find interested (or in the 
corporate case one could argue coerced) reviewers, but in the general case 
it's not always possible. I trust that you will actively try to find a 
reviewer for changes you have doubts about yourself, while you won't 
necessarily go and ping people for a typo fix.

I don't see anything wrong with this. My point with maintainers privilege was 
that maintainers shouldn't bypass the normal review procedure of sending 
patches out to public mailing lists, CC'ing the appropriate developers, giving 
a bit of time for the review to take place, and incorporating review comments 
in new versions. If no reviewer shows up, it's business as usual, but not 
different than what happens with other developers than subsystem maintainers, 
code can still be merged (I'd love our review rate to be improved, but that's 
not specific to maintainers patches, it's a general issue).

As I believe I have mentioned before a case where a maintainer submitted a 
patch touching my code, I reviewed it within a few hours, asking for changes, 
and the patch was still merged as-is. That's very demotivating for reviewers. 
Worse, I've pointed out the problem twice by replying to my original review, 
and still haven't received an answer. This is the kind of maintainers 
privilege culture that I think isn't acceptable.

> Most of these patches are refactoring and cleanups of the subsystems I
> maintain and I do them for three reasons:
> 
>   1) Making the code more maintainable, which in the first place serves the
>      egoistic bastard I am, because it makes my life as a maintainer
>      simpler in the long run. It also allows others to work easier on top
>      of that, which again makes it easier for me to review.
> 
>   2) During review of a feature patch submitted by someone else, I notice
>      that the code is crap already and the feature adds more crap to it.
> 
>      So I first try to nudge the submitter to fix that up, but either it's
>      outside their expertise level or they are simply telling me: 'I need
>      to get this in and cleanup is outside of the scope of my task'.
> 
>      For the latter, I just refuse to merge it most of the times, but then
>      I already identified how it should be done and go back to #1
> 
>   3) New hardware, new levels of scale unearth shortcomings in the code. I
>      get problem reports and because I deeply care about the stuff I'm
>      responsible for, I go and fix it if nobody else cares. Guess what,
>      often enough I do not even get a tested-by reply by the people who
>      complained in the first place. But with the knowledge of the problem
>      and the solution, I would be outright stupid to just put them into
>      /dev/null because applying them again makes my life easier.
> 
> So again, it's a problem which has to do with the lack of review capacity
> and the lack of people who really care beyond the brim of their teacup.
> 
> The 'Make feature X work upstream' task mentality of companies is part of
> the problem along with the expectation, that maintainers will coach,
> educate and babysit their newbies when they have been tasked with problems
> way over their expertise levels. Especially the last part is frustrating
> for everyone. The submitter has worked on this feature for a long time just
> to get it shredded in pieces and then after I got frustrated by the review
> ping pong, I give up and fix it myself in order to have time for other
> things on that ever growing todo list.
> 
> This simply cannot scale at all and I'm well aware of it, but I completely
> disagree that this can be fixed by more formalistic processes, gitlab or
> whatever people dream up.

No disagreement here. While gitlab offers interesting features (such as CI 
integration), no tool will magically improve our review capacity (a new tool 
could cause a marginal influx of new reviewers currently put off by the need 
to use e-mail, but I think that in many cases it would be canceled by the 
exodus of current reviewers who would be forced to use something else - gerrit 
comes to mind, I think that particular tool it could kill the kernel 
community).

> It has to be fixed at the mindset level. A code base as large and as
> complex as the kernel needs continous refactoring and cannot be used as
> dumping ground for new features in a drive by mode.
> 
> Aside of that, I see people working for large companies doing reviews in
> their spare time, because they care about it. But that's just wrong, they
> should be able to enjoy their spare time as anybody else and get the time
> to review during their work hours.

I've successfully negotiated in the past budget (as in time) with a customer 
to review code in subsystems of interest not directly related to the 
customer's needs. My main argument was that review was allowing the team to be 
recognized as a major actor in the subsystem, and to influence technical 
decisions in a direction as favourable as possible for the customer (but not 
at the detriment of others of course). This was unfortunately an exception 
rather than a rule, but I think that if we could hammer the message in at a 
larger scale, there would be hope for improvement.

> I surely encourage people to review things and I offload quite some of the
> work to people who care, but finding them and keeping them on board is hard
> because their daily work just does not allow them to keep up.
> 
> I'm definitely open for new ideas and new ways to work, but OTOH I'm not
> interested at all in the 'fix the symptoms' approach and thereby hoping
> that the root cause will cure itself. It simply does not work independent
> of the problem space.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-19  6:27                                                 ` Stephen Rothwell
@ 2018-09-19 17:24                                                   ` Mark Brown
  2018-09-19 21:42                                                     ` Stephen Rothwell
  0 siblings, 1 reply; 138+ messages in thread
From: Mark Brown @ 2018-09-19 17:24 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: James Bottomley, ksummit

[-- Attachment #1: Type: text/plain, Size: 610 bytes --]

On Wed, Sep 19, 2018 at 04:27:31PM +1000, Stephen Rothwell wrote:
> On Thu, 13 Sep 2018 12:39:19 +0100 Mark Brown <broonie@kernel.org> wrote:

> > Do you want to see these sent to you directly as a matter of course
> > (rather than just if they're particularly disruptive)?  I don't CC you
> > on stuff normally because I figure it'd be too noisy.

> I do get your summaries each evening on the linux-next list, though,
> right?  Certainly I guess I should know about disruptive problems.

Yes, everything's on the list - I just don't CC you personally on
everything unless it's urgent, but I can if you want.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-19 17:24                                                   ` Mark Brown
@ 2018-09-19 21:42                                                     ` Stephen Rothwell
  0 siblings, 0 replies; 138+ messages in thread
From: Stephen Rothwell @ 2018-09-19 21:42 UTC (permalink / raw)
  To: Mark Brown; +Cc: James Bottomley, ksummit

[-- Attachment #1: Type: text/plain, Size: 843 bytes --]

Hi Mark,

On Wed, 19 Sep 2018 10:24:48 -0700 Mark Brown <broonie@kernel.org> wrote:
>
> On Wed, Sep 19, 2018 at 04:27:31PM +1000, Stephen Rothwell wrote:
> > On Thu, 13 Sep 2018 12:39:19 +0100 Mark Brown <broonie@kernel.org> wrote:  
> 
> > > Do you want to see these sent to you directly as a matter of course
> > > (rather than just if they're particularly disruptive)?  I don't CC you
> > > on stuff normally because I figure it'd be too noisy.  
> 
> > I do get your summaries each evening on the linux-next list, though,
> > right?  Certainly I guess I should know about disruptive problems.  
> 
> Yes, everything's on the list - I just don't CC you personally on
> everything unless it's urgent, but I can if you want.

No, I do see everything on the linux-next list, so that's fine.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-19  8:26                                                         ` Laurent Pinchart
@ 2018-09-20  9:02                                                           ` Rafael J. Wysocki
  2018-09-20 10:10                                                             ` Laurent Pinchart
  0 siblings, 1 reply; 138+ messages in thread
From: Rafael J. Wysocki @ 2018-09-20  9:02 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: James Bottomley, ksummit-discuss

On Wednesday, September 19, 2018 10:26:20 AM CEST Laurent Pinchart wrote:
> Hi Thomas,
> 
> On Wednesday, 12 September 2018 17:11:34 EEST Thomas Gleixner wrote:
> > On Wed, 12 Sep 2018, Laurent Pinchart wrote:
> > > Maintainers are much more than patch juggling monkeys, otherwise they
> > > could be replaced by machines. I believe that maintainers are given the
> > > huge responsibility of taking care of their community. Fostering a
> > > productive work environment, attracting (and keeping) talented developers
> > > and reviewers is a huge and honourable task, and gets my full respect. On
> > > top of that, if a maintainer has great technical skills, it's even
> > > better, and I've learnt a lot from talented maintainers over the time. I
> > > however believe that technical skills are not an excuse for not leading
> > > by example and showing what the good practices are by applying them.
> > 
> > I surely agree, but reality is different.
> > 
> > I definitely apply my own patches w/o a review tag from time to time. And
> > aside of obvious typo cleanups/fixlets, which really can and have to do
> > without, all of my patches are posted to LKML and I carefuly respect and
> > address review comments.
> > 
> > Though, what am I supposed to do if nothing happens? Repost them five times
> > to annoy people? Been there, tried that. Does not help.
> 
> Certainly not. I do apply my own patches without Reviewed-by tags from time to 
> time as well because nobody cared enough about a particular driver to review 
> the code. When working in corporate environment, or on code that is developed 
> by a group of people, it gets a bit easier to find interested (or in the 
> corporate case one could argue coerced) reviewers, but in the general case 
> it's not always possible. I trust that you will actively try to find a 
> reviewer for changes you have doubts about yourself, while you won't 
> necessarily go and ping people for a typo fix.
> 
> I don't see anything wrong with this. My point with maintainers privilege was 
> that maintainers shouldn't bypass the normal review procedure of sending 
> patches out to public mailing lists, CC'ing the appropriate developers, giving 
> a bit of time for the review to take place, and incorporating review comments 
> in new versions. If no reviewer shows up, it's business as usual, but not 
> different than what happens with other developers than subsystem maintainers, 
> code can still be merged (I'd love our review rate to be improved, but that's 
> not specific to maintainers patches, it's a general issue).
> 
> As I believe I have mentioned before a case where a maintainer submitted a 
> patch touching my code, I reviewed it within a few hours, asking for changes, 
> and the patch was still merged as-is. That's very demotivating for reviewers. 
> Worse, I've pointed out the problem twice by replying to my original review, 
> and still haven't received an answer. This is the kind of maintainers 
> privilege culture that I think isn't acceptable.

Agreed, but how many maintainers do that?

Also, did you talk about the situation to anyone except for the given
maintainer?

Such practices are potentially dangerous from the technical standpoint too,
so it would be good to bring them more to light when such things happen.

That said, talking about "maintainers privilege" in general terms sort of puts
all maintainers into one "ugly" bucket which many of them don't deserve.

> > Most of these patches are refactoring and cleanups of the subsystems I
> > maintain and I do them for three reasons:
> > 
> >   1) Making the code more maintainable, which in the first place serves the
> >      egoistic bastard I am, because it makes my life as a maintainer
> >      simpler in the long run. It also allows others to work easier on top
> >      of that, which again makes it easier for me to review.
> > 
> >   2) During review of a feature patch submitted by someone else, I notice
> >      that the code is crap already and the feature adds more crap to it.
> > 
> >      So I first try to nudge the submitter to fix that up, but either it's
> >      outside their expertise level or they are simply telling me: 'I need
> >      to get this in and cleanup is outside of the scope of my task'.
> > 
> >      For the latter, I just refuse to merge it most of the times, but then
> >      I already identified how it should be done and go back to #1
> > 
> >   3) New hardware, new levels of scale unearth shortcomings in the code. I
> >      get problem reports and because I deeply care about the stuff I'm
> >      responsible for, I go and fix it if nobody else cares. Guess what,
> >      often enough I do not even get a tested-by reply by the people who
> >      complained in the first place. But with the knowledge of the problem
> >      and the solution, I would be outright stupid to just put them into
> >      /dev/null because applying them again makes my life easier.
> > 
> > So again, it's a problem which has to do with the lack of review capacity
> > and the lack of people who really care beyond the brim of their teacup.
> > 
> > The 'Make feature X work upstream' task mentality of companies is part of
> > the problem along with the expectation, that maintainers will coach,
> > educate and babysit their newbies when they have been tasked with problems
> > way over their expertise levels. Especially the last part is frustrating
> > for everyone. The submitter has worked on this feature for a long time just
> > to get it shredded in pieces and then after I got frustrated by the review
> > ping pong, I give up and fix it myself in order to have time for other
> > things on that ever growing todo list.
> > 
> > This simply cannot scale at all and I'm well aware of it, but I completely
> > disagree that this can be fixed by more formalistic processes, gitlab or
> > whatever people dream up.
> 
> No disagreement here. While gitlab offers interesting features (such as CI 
> integration), no tool will magically improve our review capacity (a new tool 
> could cause a marginal influx of new reviewers currently put off by the need 
> to use e-mail, but I think that in many cases it would be canceled by the 
> exodus of current reviewers who would be forced to use something else - gerrit 
> comes to mind, I think that particular tool it could kill the kernel 
> community).

Well, to be a meaningful reviewer, you need to be sufficiently familiar with
the code in question and, to put it bluntly, switching over to a new tool
won't make people magically acquire that knowledge.

Honestly, do we have any research data on how many people actually are put off
by the "unfriendly" e-mail use for patch review requirement or is it just pure
speculation?

> > It has to be fixed at the mindset level. A code base as large and as
> > complex as the kernel needs continous refactoring and cannot be used as
> > dumping ground for new features in a drive by mode.
> > 
> > Aside of that, I see people working for large companies doing reviews in
> > their spare time, because they care about it. But that's just wrong, they
> > should be able to enjoy their spare time as anybody else and get the time
> > to review during their work hours.
> 
> I've successfully negotiated in the past budget (as in time) with a customer 
> to review code in subsystems of interest not directly related to the 
> customer's needs. My main argument was that review was allowing the team to be 
> recognized as a major actor in the subsystem, and to influence technical 
> decisions in a direction as favourable as possible for the customer (but not 
> at the detriment of others of course). This was unfortunately an exception 
> rather than a rule, but I think that if we could hammer the message in at a 
> larger scale, there would be hope for improvement.

In the first place, as stated above, there need to be more people sufficiently
failiar with the code where the review is needed and, importantly enough, with
the assumptions behind it.  Unfortunately, this requires quite a bit of
learning and, in many cases, significant involvement in the development of
the code in question.  For mature and complex pieces of code this means
a steep learning curve for pretty much no benefit at least to start with,
unless you have a vested interest in that code for some reason.

IMO the only way to improve the situation in that respect would be to find a
way to retain the people who had already invested time and effort in thorough
understanding of some kernel code in the community as reviewers, but
unfortunately I don't see any easy way to achieve that.  Also, I don't really
think that the tooling and workflow organization changes discussed in this
thread and elsewhere are likely to really help with this particular thing.

Cheers,
Rafael

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-20  9:02                                                           ` Rafael J. Wysocki
@ 2018-09-20 10:10                                                             ` Laurent Pinchart
  2018-09-20 11:00                                                               ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Laurent Pinchart @ 2018-09-20 10:10 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: James Bottomley, ksummit-discuss

Hi Rafael,

On Thursday, 20 September 2018 12:02:33 EEST Rafael J. Wysocki wrote:
> On Wednesday, September 19, 2018 10:26:20 AM CEST Laurent Pinchart wrote:
> > On Wednesday, 12 September 2018 17:11:34 EEST Thomas Gleixner wrote:
> >> On Wed, 12 Sep 2018, Laurent Pinchart wrote:
> >>> Maintainers are much more than patch juggling monkeys, otherwise they
> >>> could be replaced by machines. I believe that maintainers are given
> >>> the huge responsibility of taking care of their community. Fostering a
> >>> productive work environment, attracting (and keeping) talented
> >>> developers and reviewers is a huge and honourable task, and gets my
> >>> full respect. On top of that, if a maintainer has great technical
> >>> skills, it's even better, and I've learnt a lot from talented
> >>> maintainers over the time. I however believe that technical skills are
> >>> not an excuse for not leading by example and showing what the good
> >>> practices are by applying them.
> >> 
> >> I surely agree, but reality is different.
> >> 
> >> I definitely apply my own patches w/o a review tag from time to time.
> >> And aside of obvious typo cleanups/fixlets, which really can and have to
> >> do without, all of my patches are posted to LKML and I carefuly respect
> >> and address review comments.
> >> 
> >> Though, what am I supposed to do if nothing happens? Repost them five
> >> times to annoy people? Been there, tried that. Does not help.
> > 
> > Certainly not. I do apply my own patches without Reviewed-by tags from
> > time to time as well because nobody cared enough about a particular
> > driver to review the code. When working in corporate environment, or on
> > code that is developed by a group of people, it gets a bit easier to find
> > interested (or in the corporate case one could argue coerced) reviewers,
> > but in the general case it's not always possible. I trust that you will
> > actively try to find a reviewer for changes you have doubts about
> > yourself, while you won't necessarily go and ping people for a typo fix.
> > 
> > I don't see anything wrong with this. My point with maintainers privilege
> > was that maintainers shouldn't bypass the normal review procedure of
> > sending patches out to public mailing lists, CC'ing the appropriate
> > developers, giving a bit of time for the review to take place, and
> > incorporating review comments in new versions. If no reviewer shows up,
> > it's business as usual, but not different than what happens with other
> > developers than subsystem maintainers, code can still be merged (I'd love
> > our review rate to be improved, but that's not specific to maintainers
> > patches, it's a general issue).
> > 
> > As I believe I have mentioned before a case where a maintainer submitted a
> > patch touching my code, I reviewed it within a few hours, asking for
> > changes, and the patch was still merged as-is. That's very demotivating
> > for reviewers. Worse, I've pointed out the problem twice by replying to
> > my original review, and still haven't received an answer. This is the
> > kind of maintainers privilege culture that I think isn't acceptable.
> 
> Agreed, but how many maintainers do that?

This specific example is a single person, but that's not an isolated incident. 
I don't have exact numbers though.

> Also, did you talk about the situation to anyone except for the given
> maintainer?

I haven't reported the situation higher up. I more or less gave up due to lack 
of time. And that's not a good thing: I know I got tough enough to think I can 
live with this kind of behaviour, but giving up has an effect on all the other 
developers who could also be subject to this.

> Such practices are potentially dangerous from the technical standpoint too,
> so it would be good to bring them more to light when such things happen.
> 
> That said, talking about "maintainers privilege" in general terms sort of
> puts all maintainers into one "ugly" bucket which many of them don't
> deserve.

I certainly didn't want to imply that the above behaviour was the norm from 
all maintainers, and I do apologize if someone took it personally when that 
wasn't intended, but I have received reports that it's not an isolate case 
either. We do have maintainers privileges in the kernel in the sense that 
maintainers can get away with lots of questionable actions. It does *not* mean 
that those privileges are used and abused by everyone, I am personally of the 
opinion that an overwhelming majority of maintainers do their best, and most 
of them do a good job (mistakes happen, but as long as they're not recurring, 
I have no concern there). There are however cases of abuse, and I don't think 
we should be silent about them on the basis that putting them under the 
spotlight would cast a bad light on all the good maintainers we have. Quite 
the contrary, I think that ignoring issues has a more damaging potential for 
the community as a whole.

I'm thus open to proposal for an alternative vocabulary to discuss what I have 
described so far as maintainers privileges.

> >> Most of these patches are refactoring and cleanups of the subsystems I
> >> 
> >> maintain and I do them for three reasons:
> >>   1) Making the code more maintainable, which in the first place serves
> >>      the egoistic bastard I am, because it makes my life as a maintainer
> >>      simpler in the long run. It also allows others to work easier on
> >>      top of that, which again makes it easier for me to review.
> >>   
> >>   2) During review of a feature patch submitted by someone else, I
> >>      notice that the code is crap already and the feature adds more crap
> >>      to it.
> >>      
> >>      So I first try to nudge the submitter to fix that up, but either
> >>      it's outside their expertise level or they are simply telling me:
> >>      'I need to get this in and cleanup is outside of the scope of my
> >>      task'.
> >>      
> >>      For the latter, I just refuse to merge it most of the times, but
> >>      then I already identified how it should be done and go back to #1
> >>   
> >>   3) New hardware, new levels of scale unearth shortcomings in the code.
> >>      I get problem reports and because I deeply care about the stuff I'm
> >>      responsible for, I go and fix it if nobody else cares. Guess what,
> >>      often enough I do not even get a tested-by reply by the people who
> >>      complained in the first place. But with the knowledge of the
> >>      problem and the solution, I would be outright stupid to just put
> >>      them into /dev/null because applying them again makes my life
> >>      easier.
> >> 
> >> So again, it's a problem which has to do with the lack of review
> >> capacity and the lack of people who really care beyond the brim of their
> >> teacup.
> >> 
> >> The 'Make feature X work upstream' task mentality of companies is part
> >> of the problem along with the expectation, that maintainers will coach,
> >> educate and babysit their newbies when they have been tasked with
> >> problems way over their expertise levels. Especially the last part is
> >> frustrating for everyone. The submitter has worked on this feature for a
> >> long time just to get it shredded in pieces and then after I got
> >> frustrated by the review ping pong, I give up and fix it myself in order
> >> to have time for other things on that ever growing todo list.
> >> 
> >> This simply cannot scale at all and I'm well aware of it, but I
> >> completely disagree that this can be fixed by more formalistic
> >> processes, gitlab or whatever people dream up.
> > 
> > No disagreement here. While gitlab offers interesting features (such as CI
> > integration), no tool will magically improve our review capacity (a new
> > tool could cause a marginal influx of new reviewers currently put off by
> > the need to use e-mail, but I think that in many cases it would be
> > canceled by the exodus of current reviewers who would be forced to use
> > something else - gerrit comes to mind, I think that particular tool it
> > could kill the kernel community).
> 
> Well, to be a meaningful reviewer, you need to be sufficiently familiar with
> the code in question and, to put it bluntly, switching over to a new tool
> won't make people magically acquire that knowledge.

Slightly out of topic, this is one of my concerns with the DRM multi-
committers model. The subsystem has gained lots of reviewers in the sense that 
people have been pushed to cross-review patches (which in itself is not a bad 
idea). However, I'm worried that the global review knowledge (that's a notion 
that may be worth a more formal definition) gets diluted in the process, 
lowering the value of each review independently. I don't know how to fix that 
though, and what the right balance would be between a single reviewer with all 
the knowledge and a myriad of reviewers with little knowledge each.

> Honestly, do we have any research data on how many people actually are put
> off by the "unfriendly" e-mail use for patch review requirement or is it
> just pure speculation?

I believe Daniel has more information.

> >> It has to be fixed at the mindset level. A code base as large and as
> >> complex as the kernel needs continous refactoring and cannot be used as
> >> dumping ground for new features in a drive by mode.
> >> 
> >> Aside of that, I see people working for large companies doing reviews in
> >> their spare time, because they care about it. But that's just wrong,
> >> they should be able to enjoy their spare time as anybody else and get
> >> the time to review during their work hours.
> > 
> > I've successfully negotiated in the past budget (as in time) with a
> > customer to review code in subsystems of interest not directly related to
> > the customer's needs. My main argument was that review was allowing the
> > team to be recognized as a major actor in the subsystem, and to influence
> > technical decisions in a direction as favourable as possible for the
> > customer (but not at the detriment of others of course). This was
> > unfortunately an exception rather than a rule, but I think that if we
> > could hammer the message in at a larger scale, there would be hope for
> > improvement.
> 
> In the first place, as stated above, there need to be more people
> sufficiently failiar with the code where the review is needed and,
> importantly enough, with the assumptions behind it.  Unfortunately, this
> requires quite a bit of learning and, in many cases, significant
> involvement in the development of the code in question.  For mature and
> complex pieces of code this means a steep learning curve for pretty much no
> benefit at least to start with, unless you have a vested interest in that
> code for some reason.
> 
> IMO the only way to improve the situation in that respect would be to find a
> way to retain the people who had already invested time and effort in
> thorough understanding of some kernel code in the community as reviewers,
> but unfortunately I don't see any easy way to achieve that.

I don't think there's an easy way, but I've also been in touch with customers 
who were willing to pay for just development guidance and patch review. That 
was a small minority though (as in a single one :-)), but if we could show 
that companies benefits from such services, it could create a business case 
for kernel developers.

> Also, I don't really think that the tooling and workflow organization
> changes discussed in this thread and elsewhere are likely to really help
> with this particular thing.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-20 10:10                                                             ` Laurent Pinchart
@ 2018-09-20 11:00                                                               ` Daniel Vetter
  2018-09-20 11:08                                                                 ` Laurent Pinchart
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2018-09-20 11:00 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: James Bottomley, ksummit

On Thu, Sep 20, 2018 at 12:10 PM, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
> On Thursday, 20 September 2018 12:02:33 EEST Rafael J. Wysocki wrote:
>> Honestly, do we have any research data on how many people actually are put
>> off by the "unfriendly" e-mail use for patch review requirement or is it
>> just pure speculation?
>
> I believe Daniel has more information.

I think 2 people left the drm/i915 team (and kernel development at
large) explicitly because of our archaic toolchain. A pile more left,
where the archaic toolchain at least motivated a switch in
teams/projects. No solid data yet on what happens when we'd enable
merge request, but some of the engineers (who never contributed to the
kernel before) I've chatted with are absolutely raving about the mere
possibility even if very small&distant. This might be biased towards
coporate teams, since I have no data on the people who might join as
volunteers or from random places - I only know this because they all
worked for Intel.

Also, no data ofc about how many we might lose if we force a merge
request flow instead of patches scribbled on chalk boards :-) And a
good middle ground should be achievable, giving us everyone.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-20 11:00                                                               ` Daniel Vetter
@ 2018-09-20 11:08                                                                 ` Laurent Pinchart
  2018-09-20 11:49                                                                   ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Laurent Pinchart @ 2018-09-20 11:08 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: James Bottomley, ksummit

Hi Daniel,

On Thursday, 20 September 2018 14:00:47 EEST Daniel Vetter wrote:
> On Thu, Sep 20, 2018 at 12:10 PM, Laurent Pinchart wrote:
> > On Thursday, 20 September 2018 12:02:33 EEST Rafael J. Wysocki wrote:
> >> Honestly, do we have any research data on how many people actually are
> >> put off by the "unfriendly" e-mail use for patch review requirement or is
> >> it just pure speculation?
> > 
> > I believe Daniel has more information.
> 
> I think 2 people left the drm/i915 team (and kernel development at
> large) explicitly because of our archaic toolchain. A pile more left,
> where the archaic toolchain at least motivated a switch in
> teams/projects. No solid data yet on what happens when we'd enable
> merge request, but some of the engineers (who never contributed to the
> kernel before) I've chatted with are absolutely raving about the mere
> possibility even if very small&distant.

Have you asked them for precise points that bother them (up to the point they 
wouldn't contribute) in the existing process ?

> This might be biased towards coporate teams, since I have no data on the
> people who might join as volunteers or from random places - I only know this
> because they all worked for Intel.

Thank you for providing a bit of data.

> Also, no data ofc about how many we might lose if we force a merge
> request flow instead of patches scribbled on chalk boards :-)

Is that the opposite of the patches through instagram workflow ? :-)

The answer obviously depends on what new process we would put in place. 
There's a wide range of options between enabling pull requests through git 
pull and forcing the whole review and merge process through gerrit. In the 
latter case you'd lose me :-)

> And a good middle ground should be achievable, giving us everyone.

I believe so as well, if we manage to reach a sensible consensus.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches
  2018-09-20 11:08                                                                 ` Laurent Pinchart
@ 2018-09-20 11:49                                                                   ` Daniel Vetter
  0 siblings, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2018-09-20 11:49 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: James Bottomley, ksummit

On Thu, Sep 20, 2018 at 1:08 PM, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
> On Thursday, 20 September 2018 14:00:47 EEST Daniel Vetter wrote:
>> On Thu, Sep 20, 2018 at 12:10 PM, Laurent Pinchart wrote:
>> > On Thursday, 20 September 2018 12:02:33 EEST Rafael J. Wysocki wrote:
>> >> Honestly, do we have any research data on how many people actually are
>> >> put off by the "unfriendly" e-mail use for patch review requirement or is
>> >> it just pure speculation?
>> >
>> > I believe Daniel has more information.
>>
>> I think 2 people left the drm/i915 team (and kernel development at
>> large) explicitly because of our archaic toolchain. A pile more left,
>> where the archaic toolchain at least motivated a switch in
>> teams/projects. No solid data yet on what happens when we'd enable
>> merge request, but some of the engineers (who never contributed to the
>> kernel before) I've chatted with are absolutely raving about the mere
>> possibility even if very small&distant.
>
> Have you asked them for precise points that bother them (up to the point they
> wouldn't contribute) in the existing process ?

Frankly, I was a bit much in denial that patches on mailing lists
might not be the most awesome thing ever. One of them spent
cocnsiderable time working on the fd.o patchwrok, in a in hindsight
futile attempt to fix things up. So I guess it's all the reasons
already discussed on why patchwork isn't really the promised land that
can make sense of the chaos on mailing lists.

>> This might be biased towards coporate teams, since I have no data on the
>> people who might join as volunteers or from random places - I only know this
>> because they all worked for Intel.
>
> Thank you for providing a bit of data.
>
>> Also, no data ofc about how many we might lose if we force a merge
>> request flow instead of patches scribbled on chalk boards :-)
>
> Is that the opposite of the patches through instagram workflow ? :-)

Ah, should have looked up the quote first. It's "patches carved into
stone tablets":

https://www.youtube.com/watch?v=L8OOzaqS37s
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

end of thread, other threads:[~2018-09-20 11:49 UTC | newest]

Thread overview: 138+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-04 20:16 [Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches Sasha Levin
2018-09-04 20:53 ` Daniel Vetter
2018-09-05 14:17   ` Steven Rostedt
2018-09-07  0:51     ` Sasha Levin
2018-09-07  1:09       ` Steven Rostedt
2018-09-07 20:12         ` Greg KH
2018-09-07 21:12           ` Greg KH
2018-09-07  1:09       ` Linus Torvalds
2018-09-07  1:49         ` Sasha Levin
2018-09-07  2:31           ` Linus Torvalds
2018-09-07  2:45             ` Steven Rostedt
2018-09-07  3:43               ` Linus Torvalds
2018-09-07  8:52                 ` Daniel Vetter
2018-09-07  8:40               ` Geert Uytterhoeven
2018-09-07  9:07                 ` Daniel Vetter
2018-09-07  9:28                   ` Geert Uytterhoeven
2018-09-07 17:05                   ` Olof Johansson
2018-09-07 14:54             ` Sasha Levin
2018-09-07 15:52               ` Linus Torvalds
2018-09-07 16:17                 ` Linus Torvalds
2018-09-07 21:39                   ` Mauro Carvalho Chehab
2018-09-09 12:50                   ` Stephen Rothwell
2018-09-10 20:05                     ` Tony Lindgren
2018-09-10 19:43                 ` Sasha Levin
2018-09-10 20:45                   ` Steven Rostedt
2018-09-10 21:20                     ` Guenter Roeck
2018-09-10 21:46                       ` Steven Rostedt
2018-09-10 23:03                         ` Eduardo Valentin
2018-09-10 23:13                           ` Steven Rostedt
2018-09-11 15:42                             ` Steven Rostedt
2018-09-11 17:40                               ` Tony Lindgren
2018-09-11 17:47                                 ` James Bottomley
2018-09-11 18:12                                   ` Eduardo Valentin
2018-09-11 18:17                                     ` Geert Uytterhoeven
2018-09-12 15:15                                       ` Eduardo Valentin
2018-09-11 18:19                                     ` James Bottomley
2018-09-12 15:17                                       ` Eduardo Valentin
2018-09-11 18:39                                   ` Steven Rostedt
2018-09-11 20:09                                     ` James Bottomley
2018-09-11 20:31                                       ` Steven Rostedt
2018-09-11 22:53                                         ` James Bottomley
2018-09-11 23:04                                           ` Sasha Levin
2018-09-11 23:11                                             ` James Bottomley
2018-09-11 23:20                                               ` Sasha Levin
2018-09-12 15:41                                                 ` Eduardo Valentin
2018-09-11 23:22                                           ` Tony Lindgren
2018-09-11 23:29                                             ` James Bottomley
2018-09-12 11:55                                               ` Geert Uytterhoeven
2018-09-12 12:03                                                 ` Laurent Pinchart
2018-09-12 12:29                                                   ` Thomas Gleixner
2018-09-12 12:53                                                     ` Laurent Pinchart
2018-09-12 13:10                                                       ` Alexandre Belloni
2018-09-12 13:30                                                         ` Thomas Gleixner
2018-09-12 23:16                                                         ` Laurent Pinchart
2018-09-12 14:11                                                       ` Thomas Gleixner
2018-09-19  8:26                                                         ` Laurent Pinchart
2018-09-20  9:02                                                           ` Rafael J. Wysocki
2018-09-20 10:10                                                             ` Laurent Pinchart
2018-09-20 11:00                                                               ` Daniel Vetter
2018-09-20 11:08                                                                 ` Laurent Pinchart
2018-09-20 11:49                                                                   ` Daniel Vetter
2018-09-12 12:36                                                 ` James Bottomley
2018-09-12 13:38                                                   ` Guenter Roeck
2018-09-12 13:59                                                     ` Tony Lindgren
2018-09-12 10:04                                             ` Mark Brown
2018-09-12 20:24                                           ` Steven Rostedt
2018-09-12 20:29                                             ` Sasha Levin
2018-09-13  0:19                                             ` Stephen Rothwell
2018-09-13 11:39                                               ` Mark Brown
2018-09-19  6:27                                                 ` Stephen Rothwell
2018-09-19 17:24                                                   ` Mark Brown
2018-09-19 21:42                                                     ` Stephen Rothwell
2018-09-11  0:49                           ` Stephen Rothwell
2018-09-11  1:01                             ` Al Viro
2018-09-11  0:47                         ` Stephen Rothwell
2018-09-11 17:35                           ` Linus Torvalds
2018-09-11  0:43                       ` Stephen Rothwell
2018-09-11 16:49                         ` Guenter Roeck
2018-09-11 17:47                           ` Guenter Roeck
2018-09-11 11:18                       ` Mark Brown
2018-09-11 17:02                         ` Guenter Roeck
2018-09-11 17:12                           ` Jani Nikula
2018-09-11 17:31                             ` Mark Brown
2018-09-11 17:41                               ` Daniel Vetter
2018-09-11 18:54                                 ` Mark Brown
2018-09-11 18:03                             ` Geert Uytterhoeven
2018-09-11 17:22                           ` James Bottomley
2018-09-11 17:56                             ` Mark Brown
2018-09-11 18:00                               ` James Bottomley
2018-09-11 18:16                                 ` Mark Brown
2018-09-11 18:07                             ` Geert Uytterhoeven
2018-09-12  9:09                             ` Dan Carpenter
2018-09-11 17:26                           ` Mark Brown
2018-09-11 18:45                           ` Steven Rostedt
2018-09-11 18:57                             ` Daniel Vetter
2018-09-11 20:15                               ` Thomas Gleixner
2018-09-12  9:03                           ` Dan Carpenter
2018-09-10 23:01                     ` Eduardo Valentin
2018-09-10 23:12                       ` Steven Rostedt
2018-09-10 23:32                         ` Eduardo Valentin
2018-09-10 23:38                           ` Guenter Roeck
2018-09-10 23:38                     ` Sasha Levin
2018-09-07  2:33           ` Steven Rostedt
2018-09-07  2:52           ` Guenter Roeck
2018-09-07 14:37             ` Laura Abbott
2018-09-07 15:06               ` Sasha Levin
2018-09-07 15:54                 ` Laura Abbott
2018-09-07 16:09                   ` Sasha Levin
2018-09-07 20:23                     ` Greg KH
2018-09-07 21:13                       ` Sasha Levin
2018-09-07 22:27                         ` Linus Torvalds
2018-09-07 22:43                           ` Guenter Roeck
2018-09-07 22:53                             ` Linus Torvalds
2018-09-07 22:57                               ` Sasha Levin
2018-09-07 23:52                                 ` Guenter Roeck
2018-09-08 16:33                                 ` Greg Kroah-Hartman
2018-09-08 18:35                                   ` Guenter Roeck
2018-09-10 13:47                                     ` Mark Brown
2018-09-09  4:36                                   ` Sasha Levin
2018-09-10 16:20                             ` Dan Rue
2018-09-07 21:32                 ` Dan Carpenter
2018-09-07 21:43                   ` Sasha Levin
2018-09-08 13:20                     ` Dan Carpenter
2018-09-10  8:23                     ` Jan Kara
2018-09-10  7:53                   ` Jan Kara
2018-09-07  3:38           ` Al Viro
2018-09-07  4:27           ` Theodore Y. Ts'o
2018-09-07  5:45             ` Stephen Rothwell
2018-09-07  9:13             ` Daniel Vetter
2018-09-07 11:32               ` Mark Brown
2018-09-07 21:06               ` Mauro Carvalho Chehab
2018-09-08  9:44                 ` Laurent Pinchart
2018-09-08 11:48                   ` Mauro Carvalho Chehab
2018-09-09 14:26                     ` Laurent Pinchart
2018-09-10 22:14                       ` Eduardo Valentin
2018-09-07 14:56             ` Sasha Levin
2018-09-07 15:07               ` Jens Axboe
2018-09-07 20:58                 ` Mauro Carvalho Chehab

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.