All of lore.kernel.org
 help / color / mirror / Atom feed
* bug-introducing patches (or: -rc cycles suck)
@ 2018-04-30 17:58 Sasha Levin
  2018-04-30 18:38 ` Greg KH
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Sasha Levin @ 2018-04-30 17:58 UTC (permalink / raw)
  To: Greg KH, julia.lawall; +Cc: linux-kernel

Working on AUTOSEL, it became even more obvious to me how difficult it is for a patch to get a proper review. Maintainers found it difficult to keep up with the upstream work for their subsystem, and reviewing additional -stable patches put even more load on them which some suggested would be more than what they can handle.

While AUTOSEL tries to understand if a patch fixes a bug, this was a bit late: the bug was already introduced, folks already have to deal with it, and the kernel is broken. I was wondering if I can do a similar process to AUTOSEL, but teach the AI about bug-introducing patches.

When someone fixes a bug, he would describe the patch differently than he would if he was writing a new feature. This lets AUTOSEL build on different commit message constructs, among various inputs, to recognize bug fixes. However, people are unaware that they introduce a bug, so the commit message for bug introducing patches is essentially the same as for commits that don't introduce a bug. This meant that I had to try and source data out of different sources.

Few of the parameters I ended up using are:
 - -next data (days spent in -next, changes in the patch between -next trees, ...)
 - Mailing list data (was this patch ever sent to a ML? How long before it was merged? How many replies did it get? ...)
 - Author/commiter/maintainer chain data. Just like sports, some folks are more likely to produce better results than others. This goes beyond just "skill", but also looks at things such as whether the author patches a subsystem he's "familiar with" (== subsystem where most of his patches usually go), or is he modifying a subsystem he never sent a patch for.
 - Patch complexity metrics - various code metrics to indicate how "complex" a patch is. Think 100 lines of whitespace fixes vs 100 lines that significantly changes a subsystem.
 - Kernel process correctness - I tried using "violations" of the kernel process (patch formatting, correctness of the mailing to lkml, etc) as an indicator of how familiar the author is with the kernel, with the presumption that folks who are newer to kernel development are more likely to introduce bugs

Running an initial iteration on a set of commits made two things very obvious to me:

1. -rc releases suck. seriously suck. The quality of commits that went in -rc cycles was much worse that merge window commit:
 - All commits had the same chance of introducing a bug whether they came in a merge window or an -rc cycle. This means that -rc commits mostly end up replacing obvious bugs with less obvious ones.
 - While the average merge window commit changes, on average, 3x more lines than an -rc commit, the chances of a bug introduced per patch is the same, which means that bugs-per-line metric of code is much higher with -rc patches.
 - A merge window commit spent 50% more days, on average, in -next than a -rc commit.
 - The number of -rc commits that never saw any mailing list or has never been replied to on a mailing list was **way** higher than merge window commits.
 - For some reason, the odds of a -rc commit to be targetted for -stable is over 20%, while for merge window commits it's about 3%. I can't quite explain why that happens, but this would suggest that -rc commits end up hurting -stable pretty badly.

2. Maintainers need to stop writing patches, commiting them, and pushing them in without reviews.
In -rc cycles there is quite a large number of commits that were either written by maintainers, commited, and merged upstream the same day. These patches are very likely to introduce a new bug.


I don't really have a proposal beyond "tighten up -rc cycles", but I think it's a discussion worth having. We have enough data to show what parts of kernel development work, and what parts are just hurting us.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bug-introducing patches (or: -rc cycles suck)
  2018-04-30 17:58 bug-introducing patches (or: -rc cycles suck) Sasha Levin
@ 2018-04-30 18:38 ` Greg KH
  2018-04-30 19:09 ` Willy Tarreau
  2018-04-30 19:12 ` Julia Lawall
  2 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2018-04-30 18:38 UTC (permalink / raw)
  To: Sasha Levin; +Cc: julia.lawall, linux-kernel

On Mon, Apr 30, 2018 at 05:58:30PM +0000, Sasha Levin wrote:
> Working on AUTOSEL, it became even more obvious to me how difficult it is for a patch to get a proper review. Maintainers found it difficult to keep up with the upstream work for their subsystem, and reviewing additional -stable patches put even more load on them which some suggested would be more than what they can handle.
> 

<snip>

/me hands Sasha some extra '\n' characters

This is great info.  Can you reformat your email to be sane and send
this to the ksummit discuss mailing list?  I think the audience there
might be a bit more receptive than the noise that is lkml at times :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bug-introducing patches (or: -rc cycles suck)
  2018-04-30 17:58 bug-introducing patches (or: -rc cycles suck) Sasha Levin
  2018-04-30 18:38 ` Greg KH
@ 2018-04-30 19:09 ` Willy Tarreau
  2018-05-01 16:19   ` Sasha Levin
  2018-04-30 19:12 ` Julia Lawall
  2 siblings, 1 reply; 8+ messages in thread
From: Willy Tarreau @ 2018-04-30 19:09 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Greg KH, julia.lawall, linux-kernel

Hi Sasha,

On Mon, Apr 30, 2018 at 05:58:30PM +0000, Sasha Levin wrote:
>  - For some reason, the odds of a -rc commit to be targetted for -stable is
>  over 20%, while for merge window commits it's about 3%. I can't quite
>  explain why that happens, but this would suggest that -rc commits end up
>  hurting -stable pretty badly.

Often, merge window collects work that has been done during the previous
cycle and which is prepared to target this merge window. Fixes that happen
during this period very likely tend to either be remerged with the patches
before they are submitted if they concern the code to be submitted, or are
delayed to after the work gets merged. As a result few of the pre-rc1 patches
get backported while the next ones mostly contain fixes. By the way, you
probably also noticed it when backporting patches to your stable releases,
the mainline commit almost never comes from a merge window.

> 2. Maintainers need to stop writing patches, commiting them, and pushing them
> in without reviews. In -rc cycles there is quite a large number of commits
> that were either written by maintainers, commited, and merged upstream the
> same day. These patches are very likely to introduce a new bug.

Developers are humans before anything else. We probably all address most
bug reports the same way : "ah, of course, stupid me, now that's fixed".
Keep in mind that for the developer, the pressure has lowered now that
the code got merged, and that mentally the fix is "on top" of the initial
work and no more part of it. It often means a narrower mental image of
how the fix fits in the whole code.

I think that you'll also notice that fixes that address bugs introduced
during the merge window of the same version will more often introduce
bugs than the ones which address 6-months old bugs which require some
deeper thinking. In short it indicates that we tend to believe we are
better than we really are, especially very late at night.

> I don't really have a proposal beyond "tighten up -rc cycles", but I think
> it's a discussion worth having. We have enough data to show what parts of
> kernel development work, and what parts are just hurting us.

I'm inclined to believe that making individuals aware of their own
mistakes can help. I personally like to try to understand how I managed
to introduce a bug, it's always useful. Very often it's around "I was
pretty sure it didn't require testing, the change was so obvious". We
all know this feeling when you write 100 lines in a new file, you
compile, and it builds without any warning and apparently works, and
suddenly you think "uh oh, what did I do wrong?" and you have no idea
where to start to look for possible mistakes.

Probably that some statistics on mistake classifications and maybe some
affected subsystems (if that doesn't blame anyone) could be useful.

Just my two cents,
Willy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bug-introducing patches (or: -rc cycles suck)
  2018-04-30 17:58 bug-introducing patches (or: -rc cycles suck) Sasha Levin
  2018-04-30 18:38 ` Greg KH
  2018-04-30 19:09 ` Willy Tarreau
@ 2018-04-30 19:12 ` Julia Lawall
  2018-05-01 16:24   ` Sasha Levin
  2 siblings, 1 reply; 8+ messages in thread
From: Julia Lawall @ 2018-04-30 19:12 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Greg KH, linux-kernel



On Mon, 30 Apr 2018, Sasha Levin wrote:

> Working on AUTOSEL, it became even more obvious to me how difficult it is for a patch to get a proper review. Maintainers found it difficult to keep up with the upstream work for their subsystem, and reviewing additional -stable patches put even more load on them which some suggested would be more than what they can handle.
>
> While AUTOSEL tries to understand if a patch fixes a bug, this was a bit late: the bug was already introduced, folks already have to deal with it, and the kernel is broken. I was wondering if I can do a similar process to AUTOSEL, but teach the AI about bug-introducing patches.
>
> When someone fixes a bug, he would describe the patch differently than he would if he was writing a new feature. This lets AUTOSEL build on different commit message constructs, among various inputs, to recognize bug fixes. However, people are unaware that they introduce a bug, so the commit message for bug introducing patches is essentially the same as for commits that don't introduce a bug. This meant that I had to try and source data out of different sources.
>
> Few of the parameters I ended up using are:
>  - -next data (days spent in -next, changes in the patch between -next trees, ...)
>  - Mailing list data (was this patch ever sent to a ML? How long before it was merged? How many replies did it get? ...)
>  - Author/commiter/maintainer chain data. Just like sports, some folks are more likely to produce better results than others. This goes beyond just "skill", but also looks at things such as whether the author patches a subsystem he's "familiar with" (== subsystem where most of his patches usually go), or is he modifying a subsystem he never sent a patch for.
>  - Patch complexity metrics - various code metrics to indicate how "complex" a patch is. Think 100 lines of whitespace fixes vs 100 lines that significantly changes a subsystem.
>  - Kernel process correctness - I tried using "violations" of the kernel process (patch formatting, correctness of the mailing to lkml, etc) as an indicator of how familiar the author is with the kernel, with the presumption that folks who are newer to kernel development are more likely to introduce bugs

I'm not completely sure to understand what you are doing.  Is there also
some connection to things that are identified in some way as being bug
introducing patches?  Or are you just using these as metrics of low
quality?

I wonder how far one could get by just collecting the set of patches that
are referenced with fixes tags by stable patches, and then using machine
learning taking into account only the code to find other patches that make
similar changes.

julia

> Running an initial iteration on a set of commits made two things very obvious to me:
>
> 1. -rc releases suck. seriously suck. The quality of commits that went in -rc cycles was much worse that merge window commit:
>  - All commits had the same chance of introducing a bug whether they came in a merge window or an -rc cycle. This means that -rc commits mostly end up replacing obvious bugs with less obvious ones.
>  - While the average merge window commit changes, on average, 3x more lines than an -rc commit, the chances of a bug introduced per patch is the same, which means that bugs-per-line metric of code is much higher with -rc patches.
>  - A merge window commit spent 50% more days, on average, in -next than a -rc commit.
>  - The number of -rc commits that never saw any mailing list or has never been replied to on a mailing list was **way** higher than merge window commits.
>  - For some reason, the odds of a -rc commit to be targetted for -stable is over 20%, while for merge window commits it's about 3%. I can't quite explain why that happens, but this would suggest that -rc commits end up hurting -stable pretty badly.
>
> 2. Maintainers need to stop writing patches, commiting them, and pushing them in without reviews.
> In -rc cycles there is quite a large number of commits that were either written by maintainers, commited, and merged upstream the same day. These patches are very likely to introduce a new bug.
>
>
> I don't really have a proposal beyond "tighten up -rc cycles", but I think it's a discussion worth having. We have enough data to show what parts of kernel development work, and what parts are just hurting us.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bug-introducing patches (or: -rc cycles suck)
  2018-04-30 19:09 ` Willy Tarreau
@ 2018-05-01 16:19   ` Sasha Levin
  2018-05-01 16:50     ` Willy Tarreau
  0 siblings, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2018-05-01 16:19 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Greg KH, julia.lawall, linux-kernel

On Mon, Apr 30, 2018 at 09:09:18PM +0200, Willy Tarreau wrote:
>Hi Sasha,
>
>On Mon, Apr 30, 2018 at 05:58:30PM +0000, Sasha Levin wrote:
>>  - For some reason, the odds of a -rc commit to be targetted for -stable is
>>  over 20%, while for merge window commits it's about 3%. I can't quite
>>  explain why that happens, but this would suggest that -rc commits end up
>>  hurting -stable pretty badly.
>
>Often, merge window collects work that has been done during the previous
>cycle and which is prepared to target this merge window. Fixes that happen
>during this period very likely tend to either be remerged with the patches
>before they are submitted if they concern the code to be submitted, or are
>delayed to after the work gets merged. As a result few of the pre-rc1 patches
>get backported while the next ones mostly contain fixes. By the way, you
>probably also noticed it when backporting patches to your stable releases,
>the mainline commit almost never comes from a merge window.

I'm not sure I understand/agree with this explanation. You're saying
that commits that fix issues in newly introduced features got folded in
the feature before it was sent during the merge window, so then there
was no need for them to be tagged for stable?

This would be also true for -rc cycle patches if they fix a commit that
was introduced in that merge window: patches that fix a feature that got
in that same merge window don't need to be tagged for stable either
since the feature didn't exist in a previous release.

The way I see it is that -stable commits fix a bug that was introduced
in a feature that exists in a kernel that was already released. At that
point, the fix can come in at any point in time, whether the fix was
created during the merge window, or during an -rc cycle.

It also appears that pretty much the same ratio of commits are tagged
for -stable accross all -rc cycles, so there are no spikes at any point
during the cycle, which seems to suggest that there is no particular
relationship between when a -stable commit is created to the stage in a
release cycle of the current kernel.

>> 2. Maintainers need to stop writing patches, commiting them, and pushing them
>> in without reviews. In -rc cycles there is quite a large number of commits
>> that were either written by maintainers, commited, and merged upstream the
>> same day. These patches are very likely to introduce a new bug.
>
>Developers are humans before anything else. We probably all address most
>bug reports the same way : "ah, of course, stupid me, now that's fixed".
>Keep in mind that for the developer, the pressure has lowered now that
>the code got merged, and that mentally the fix is "on top" of the initial
>work and no more part of it. It often means a narrower mental image of
>how the fix fits in the whole code.
>
>I think that you'll also notice that fixes that address bugs introduced
>during the merge window of the same version will more often introduce
>bugs than the ones which address 6-months old bugs which require some
>deeper thinking. In short it indicates that we tend to believe we are
>better than we really are, especially very late at night.

I very much agree. I also think that "upper-level" maintainers, and
Linus in particular have to stop this behavior. Yes, folks who do these
patches are often very familiar with the subsystem, but this doesn't
mean that they don't make mistakes.

It's as if during -rc cycles all rules are void and bug fixes are now
no be collected and merged in as fast as humanly possible without any
regard to how well these fixes were tested.

With merge window stuff, Linus will make lots of noise if commits didn't
spend any time in -next (see https://lkml.org/lkml/2017/2/23/611) for
example. But it seems that -rc commits don't have that requirement.

>> I don't really have a proposal beyond "tighten up -rc cycles", but I think
>> it's a discussion worth having. We have enough data to show what parts of
>> kernel development work, and what parts are just hurting us.
>
>I'm inclined to believe that making individuals aware of their own
>mistakes can help. I personally like to try to understand how I managed
>to introduce a bug, it's always useful. Very often it's around "I was
>pretty sure it didn't require testing, the change was so obvious". We
>all know this feeling when you write 100 lines in a new file, you
>compile, and it builds without any warning and apparently works, and
>suddenly you think "uh oh, what did I do wrong?" and you have no idea
>where to start to look for possible mistakes.
>
>Probably that some statistics on mistake classifications and maybe some
>affected subsystems (if that doesn't blame anyone) could be useful.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bug-introducing patches (or: -rc cycles suck)
  2018-04-30 19:12 ` Julia Lawall
@ 2018-05-01 16:24   ` Sasha Levin
  0 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2018-05-01 16:24 UTC (permalink / raw)
  To: Julia Lawall; +Cc: Greg KH, linux-kernel

On Mon, Apr 30, 2018 at 09:12:08PM +0200, Julia Lawall wrote:
>
>
>On Mon, 30 Apr 2018, Sasha Levin wrote:
>
>> Working on AUTOSEL, it became even more obvious to me how difficult it is for a patch to get a proper review. Maintainers found it difficult to keep up with the upstream work for their subsystem, and reviewing additional -stable patches put even more load on them which some suggested would be more than what they can handle.
>>
>> While AUTOSEL tries to understand if a patch fixes a bug, this was a bit late: the bug was already introduced, folks already have to deal with it, and the kernel is broken. I was wondering if I can do a similar process to AUTOSEL, but teach the AI about bug-introducing patches.
>>
>> When someone fixes a bug, he would describe the patch differently than he would if he was writing a new feature. This lets AUTOSEL build on different commit message constructs, among various inputs, to recognize bug fixes. However, people are unaware that they introduce a bug, so the commit message for bug introducing patches is essentially the same as for commits that don't introduce a bug. This meant that I had to try and source data out of different sources.
>>
>> Few of the parameters I ended up using are:
>>  - -next data (days spent in -next, changes in the patch between -next trees, ...)
>>  - Mailing list data (was this patch ever sent to a ML? How long before it was merged? How many replies did it get? ...)
>>  - Author/commiter/maintainer chain data. Just like sports, some folks are more likely to produce better results than others. This goes beyond just "skill", but also looks at things such as whether the author patches a subsystem he's "familiar with" (== subsystem where most of his patches usually go), or is he modifying a subsystem he never sent a patch for.
>>  - Patch complexity metrics - various code metrics to indicate how "complex" a patch is. Think 100 lines of whitespace fixes vs 100 lines that significantly changes a subsystem.
>>  - Kernel process correctness - I tried using "violations" of the kernel process (patch formatting, correctness of the mailing to lkml, etc) as an indicator of how familiar the author is with the kernel, with the presumption that folks who are newer to kernel development are more likely to introduce bugs
>
>I'm not completely sure to understand what you are doing.  Is there also
>some connection to things that are identified in some way as being bug
>introducing patches?  Or are you just using these as metrics of low
>quality?

Yes! My theory is that the things I listed above are actually better at
identifying bug introducing commits than plain code patterns or metrics.

To some extent, Coccinelle, smatch, etc already deal with identifying
problematic code patterns and addressing them.

>I wonder how far one could get by just collecting the set of patches that
>are referenced with fixes tags by stable patches, and then using machine
>learning taking into account only the code to find other patches that make
>similar changes.

This is exactly the training set I used. I didn't try looking at the
code itself because I don't have a good idea about how to turn code
patterns into something meaningfull for ML. Code metrics didn't prove
to be too useful for AUTOSEL so I sort of ignored it here (I only used
the same metrics we use for AUTOSEL).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bug-introducing patches (or: -rc cycles suck)
  2018-05-01 16:19   ` Sasha Levin
@ 2018-05-01 16:50     ` Willy Tarreau
  2018-05-01 17:21       ` Sasha Levin
  0 siblings, 1 reply; 8+ messages in thread
From: Willy Tarreau @ 2018-05-01 16:50 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Greg KH, julia.lawall, linux-kernel

On Tue, May 01, 2018 at 04:19:35PM +0000, Sasha Levin wrote:
> On Mon, Apr 30, 2018 at 09:09:18PM +0200, Willy Tarreau wrote:
> >Hi Sasha,
> >
> >On Mon, Apr 30, 2018 at 05:58:30PM +0000, Sasha Levin wrote:
> >>  - For some reason, the odds of a -rc commit to be targetted for -stable is
> >>  over 20%, while for merge window commits it's about 3%. I can't quite
> >>  explain why that happens, but this would suggest that -rc commits end up
> >>  hurting -stable pretty badly.
> >
> >Often, merge window collects work that has been done during the previous
> >cycle and which is prepared to target this merge window. Fixes that happen
> >during this period very likely tend to either be remerged with the patches
> >before they are submitted if they concern the code to be submitted, or are
> >delayed to after the work gets merged. As a result few of the pre-rc1 patches
> >get backported while the next ones mostly contain fixes. By the way, you
> >probably also noticed it when backporting patches to your stable releases,
> >the mainline commit almost never comes from a merge window.
> 
> I'm not sure I understand/agree with this explanation. You're saying
> that commits that fix issues in newly introduced features got folded in
> the feature before it was sent during the merge window, so then there
> was no need for them to be tagged for stable?

No, what I mean is that often a developer is either in development mode
or in bug-fixing mode but it's often quite hard to quickly switch between
the two. So when you're finishing what you were doing to meet the merge
window deadline and you receive bug fixes, it's natural to hold on a few
fixes because it's hard to switch to the review mode. However, if some
fixes concern the code you're about to submit, it's not bug fixing but
fixes for your development in progress and that doesn't require as much
effort, so these updates can often be remerged before being submitted.

> This would be also true for -rc cycle patches if they fix a commit that
> was introduced in that merge window: patches that fix a feature that got
> in that same merge window don't need to be tagged for stable either
> since the feature didn't exist in a previous release.

Definitely, unless the analysis is wrong and the fix addresses the tipping
part of the iceberg, as occasionally happens of course.

> The way I see it is that -stable commits fix a bug that was introduced
> in a feature that exists in a kernel that was already released. At that
> point, the fix can come in at any point in time, whether the fix was
> created during the merge window, or during an -rc cycle.

Yes, except if it requires some review by someone really busy finishing
some work for the merge window that's about to close.

> It also appears that pretty much the same ratio of commits are tagged
> for -stable accross all -rc cycles, so there are no spikes at any point
> during the cycle, which seems to suggest that there is no particular
> relationship between when a -stable commit is created to the stage in a
> release cycle of the current kernel.

Not much surprising to me. After all, -rc are "let's observe and fix",
and it's expected that bugs are randomly met and fixed during that
period.

> >I think that you'll also notice that fixes that address bugs introduced
> >during the merge window of the same version will more often introduce
> >bugs than the ones which address 6-months old bugs which require some
> >deeper thinking. In short it indicates that we tend to believe we are
> >better than we really are, especially very late at night.
> 
> I very much agree. I also think that "upper-level" maintainers, and
> Linus in particular have to stop this behavior.

Well it's easier said than done. You don't really choose when you can
become creative or efficient. For some people it's when everyone else
is asleep, for others it's when they can have 8 uninterrupted hours
in front of them to work on a complex bug. I think it's more efficient
to let people be aware of their limits than to try to overcome them.
The typical thought "I'm too stupid now, let's go to bed" followed the
next morning with a review starting to think "what did I break last
night" is already quite profitable provided people are humble enough
to think like this.

> Yes, folks who do these
> patches are often very familiar with the subsystem, but this doesn't
> mean that they don't make mistakes.

But we all do mistakes all the time. And quite frankly I find that the
recent kernels quality in the early stages after the release is much
better than what it used to be. Kernels build fine, boot fine on most
hardware, and after a few stable versions you can really start to
forget to update them because you don't meet the crashes anymore. Just
a simple example (please don't reproduce, I'm not proud of it), when
I replaced my PC, it came with 4.4.6. I thought "I'll have to upgrade
next week". But I had so many trouble with its crappy bogus BIOS that
I was afraid to reboot it. Then I had hundreds of xterms spread over
multiple displays and it was never the best moment to reboot. Finally
it happened 550 days later. Yes, the 6th maintenance release of 4.4
lasted 550 days on a developers machine doing all sort of stuff without
even a scary message in dmesg. Of course in terms of security it's
terrible. But we didn't see this level of stability in 2.6.x nor in
the early 3.x versions.

> It's as if during -rc cycles all rules are void and bug fixes are now
> no be collected and merged in as fast as humanly possible without any
> regard to how well these fixes were tested.

These stages are supposed to serve to collect fixes, and fixes are
supposed to be tested. Often it's worse to let a fix rot somewhere
than to get it. At the very least by merging it you expose it more
quickly and you have more chances to know if you missed anything.

I remember in the past some people arguing that we shouldn't backport
fixes that haven't experienced a release yet, but that would make the
situation even worse, with no stable fix for the 3 months following a
release. The overall amount of reverts in stable kernels remains very
low, which indicates to me that the overall quality is quite good,
eventhough the process causes gray hair to the involved people (well
for those still having hair).

That's overall why I think that your work can be useful to raise
awareness of what behaviours decrease the level of quality so that
everyone can try to improve a bit, but I don't think there is that
much to squeeze without hitting the wall of what a human brain can
reasonably deal with. And extra process is a mental pressure just
like dealing with bugs, so comes a point where process competes
with quality.

Willy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bug-introducing patches (or: -rc cycles suck)
  2018-05-01 16:50     ` Willy Tarreau
@ 2018-05-01 17:21       ` Sasha Levin
  0 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2018-05-01 17:21 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Greg KH, julia.lawall, linux-kernel

On Tue, May 01, 2018 at 06:50:51PM +0200, Willy Tarreau wrote:
>On Tue, May 01, 2018 at 04:19:35PM +0000, Sasha Levin wrote:
>> On Mon, Apr 30, 2018 at 09:09:18PM +0200, Willy Tarreau wrote:
>> >Hi Sasha,
>> >
>> >On Mon, Apr 30, 2018 at 05:58:30PM +0000, Sasha Levin wrote:
>> >>  - For some reason, the odds of a -rc commit to be targetted for -stable is
>> >>  over 20%, while for merge window commits it's about 3%. I can't quite
>> >>  explain why that happens, but this would suggest that -rc commits end up
>> >>  hurting -stable pretty badly.
>> >
>> >Often, merge window collects work that has been done during the previous
>> >cycle and which is prepared to target this merge window. Fixes that happen
>> >during this period very likely tend to either be remerged with the patches
>> >before they are submitted if they concern the code to be submitted, or are
>> >delayed to after the work gets merged. As a result few of the pre-rc1 patches
>> >get backported while the next ones mostly contain fixes. By the way, you
>> >probably also noticed it when backporting patches to your stable releases,
>> >the mainline commit almost never comes from a merge window.
>>
>> I'm not sure I understand/agree with this explanation. You're saying
>> that commits that fix issues in newly introduced features got folded in
>> the feature before it was sent during the merge window, so then there
>> was no need for them to be tagged for stable?
>
>No, what I mean is that often a developer is either in development mode
>or in bug-fixing mode but it's often quite hard to quickly switch between
>the two. So when you're finishing what you were doing to meet the merge
>window deadline and you receive bug fixes, it's natural to hold on a few
>fixes because it's hard to switch to the review mode. However, if some
>fixes concern the code you're about to submit, it's not bug fixing but
>fixes for your development in progress and that doesn't require as much
>effort, so these updates can often be remerged before being submitted.

I see. But then, wouldn't there be a spike of -stable patches for -rc1
and -rc2?

[snip]
>> It also appears that pretty much the same ratio of commits are tagged
>> for -stable accross all -rc cycles, so there are no spikes at any point
>> during the cycle, which seems to suggest that there is no particular
>> relationship between when a -stable commit is created to the stage in a
>> release cycle of the current kernel.
>
>Not much surprising to me. After all, -rc are "let's observe and fix",
>and it's expected that bugs are randomly met and fixed during that
>period.

So for bugs found and fixed during -rc6+, those fixes should be merged
around the next merge window (time for reviews + time to bake in
stable), but this doesn't seem to happen. Maybe the lack of -stable
commits during merge windows is a symptom of the problem?

>> >I think that you'll also notice that fixes that address bugs introduced
>> >during the merge window of the same version will more often introduce
>> >bugs than the ones which address 6-months old bugs which require some
>> >deeper thinking. In short it indicates that we tend to believe we are
>> >better than we really are, especially very late at night.
>>
>> I very much agree. I also think that "upper-level" maintainers, and
>> Linus in particular have to stop this behavior.
>
>Well it's easier said than done. You don't really choose when you can
>become creative or efficient. For some people it's when everyone else
>is asleep, for others it's when they can have 8 uninterrupted hours
>in front of them to work on a complex bug. I think it's more efficient
>to let people be aware of their limits than to try to overcome them.
>The typical thought "I'm too stupid now, let's go to bed" followed the
>next morning with a review starting to think "what did I break last
>night" is already quite profitable provided people are humble enough
>to think like this.

I'm not saying that patches should be rejected, they should just be told
to spend more time in -next gathering reviews and tests.

Linus would basically say "resend this once the patch has been in -next
for 21 days". That's all.

Heck, we could automate this and check pull requests send to Linus and
warn about "new" patches.

>> Yes, folks who do these
>> patches are often very familiar with the subsystem, but this doesn't
>> mean that they don't make mistakes.
>
>But we all do mistakes all the time. And quite frankly I find that the
>recent kernels quality in the early stages after the release is much
>better than what it used to be. Kernels build fine, boot fine on most
>hardware, and after a few stable versions you can really start to
>forget to update them because you don't meet the crashes anymore. Just
>a simple example (please don't reproduce, I'm not proud of it), when
>I replaced my PC, it came with 4.4.6. I thought "I'll have to upgrade
>next week". But I had so many trouble with its crappy bogus BIOS that
>I was afraid to reboot it. Then I had hundreds of xterms spread over
>multiple displays and it was never the best moment to reboot. Finally
>it happened 550 days later. Yes, the 6th maintenance release of 4.4
>lasted 550 days on a developers machine doing all sort of stuff without
>even a scary message in dmesg. Of course in terms of security it's
>terrible. But we didn't see this level of stability in 2.6.x nor in
>the early 3.x versions.
>
>> It's as if during -rc cycles all rules are void and bug fixes are now
>> no be collected and merged in as fast as humanly possible without any
>> regard to how well these fixes were tested.
>
>These stages are supposed to serve to collect fixes, and fixes are
>supposed to be tested. Often it's worse to let a fix rot somewhere
>than to get it. At the very least by merging it you expose it more
>quickly and you have more chances to know if you missed anything.

Linus's tree isn't a testing tree anymore. In reality, it's just a
cadence/sync point for kernel devs.

Integration and testing happen in -next. The various bots we have run on
-next, most folks are doing their custom testing on -next (we do...).

Linus's tree is was "demoted" as a result of the significant improvement
in testing automation and the capacity to be able to test a fast
changing tree such as -next.

So keeping patches out of Linus's tree isn't really equals to "letting
them rot", it just means "let them get more testing".

>I remember in the past some people arguing that we shouldn't backport
>fixes that haven't experienced a release yet, but that would make the
>situation even worse, with no stable fix for the 3 months following a
>release. The overall amount of reverts in stable kernels remains very
>low, which indicates to me that the overall quality is quite good,
>eventhough the process causes gray hair to the involved people (well
>for those still having hair).

Right, the statistics didn't support the policy change. The -stable
kernel is better at not introducing bugs because (IMO) the commits get
even more reviews.

Countless times my requests for reviews of -stable commits have
uncovered a bug in mainline.

>That's overall why I think that your work can be useful to raise
>awareness of what behaviours decrease the level of quality so that
>everyone can try to improve a bit, but I don't think there is that
>much to squeeze without hitting the wall of what a human brain can
>reasonably deal with. And extra process is a mental pressure just
>like dealing with bugs, so comes a point where process competes
>with quality.

I'm trying to come up with a way where, similar to AUTOSEL, humans won't
need to do much more work.

I'm also not advocating for *more* process, I'm advocating for a
*different* process.

Linus, as he already stated himself, is looking at how long a patch
spent in -next before he pulls it. I'm suggesting to improve and build
on that. Look at how long a patch was in -next, how many people reviewed
it, how much mailing list discussion it triggered, etc.

What I want to end up with is a tool to make maintainer's life easier by
highlighting "dangerous" patches that require more careful review. It's
much more time efficient to keep bugs out than deal with them later.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-05-01 17:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-30 17:58 bug-introducing patches (or: -rc cycles suck) Sasha Levin
2018-04-30 18:38 ` Greg KH
2018-04-30 19:09 ` Willy Tarreau
2018-05-01 16:19   ` Sasha Levin
2018-05-01 16:50     ` Willy Tarreau
2018-05-01 17:21       ` Sasha Levin
2018-04-30 19:12 ` Julia Lawall
2018-05-01 16:24   ` Sasha Levin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.