* stable? quality assurance?
@ 2010-07-11  7:18 Martin Steigerwald
  2010-07-11  8:39 ` Eric Dumazet
                   ` (4 more replies)
  0 siblings, 5 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-11  7:18 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2396 bytes --]


Hi!

2.6.34 was a desaster for me: bug #15969 - patch was available before 
2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well 
as most important two complete lockups - well maybe just X.org and radeon 
KMS, I didn't start my second laptop to SSH into the locked up one - on my 
ThinkPad T42. I fixed the first one with the patch, but after the lockups I 
just downgraded to 2.6.33 again.

I still actually *use* my machines for something else than hunting patches 
for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel" 
(accentuation from me). I know of the argument that one should use a 
distro kernel for machines that are for production use. But frankly, does 
that justify to deliver in advance known crap to the distributors? What 
impact do partly grave bugs reported on bugzilla have on the release 
decision?

And how about people who have their reasons - mine is TuxOnIce - to 
compile their own kernels?

Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the 
freezes as well. So far so good.

Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the 
website. And I just again always wait for .2 or .3, as with 2.6.34.1 I 
still have some problems like the hang on hibernation reported in

hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1

on this mailing list just a moment ago. But then 2.6.33 did hang with 
TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since 
2.6.34 did not hang with it anymore which was a reason for me to try 
2.6.34 earlier.

I am quite a bit worried about the quality of the recent kernels. Some 
iterations earlier I just compiled them, partly even rc-ones which I do 
not expect to be stable, and they just worked. But in the recent times .0, 
partly even .1 or .2 versions haven't been stable for me quite some times 
already and thus they better not be advertised as such on kernel.org I 
think. I am willing to risk some testing and do bug reports, but these are 
still production machines, I do not have any spare test machines, and 
there needs to be some balance, i.e. the kernels should basically work. 
Thus I for sure will be more reluctant to upgrade in the future.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: stable? quality assurance?
  2010-07-11  7:18 stable? quality assurance? Martin Steigerwald
@ 2010-07-11  8:39 ` Eric Dumazet
  2010-07-11 14:22   ` Martin Steigerwald
                     ` (2 more replies)
  2010-07-11 13:16 ` Ted Ts'o
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 72+ messages in thread
From: Eric Dumazet @ 2010-07-11  8:39 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

Le dimanche 11 juillet 2010 à 09:18 +0200, Martin Steigerwald a écrit :
> Hi!
> 
> 2.6.34 was a desaster for me: bug #15969 - patch was available before 
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well 
> as most important two complete lockups - well maybe just X.org and radeon 
> KMS, I didn't start my second laptop to SSH into the locked up one - on my 
> ThinkPad T42. I fixed the first one with the patch, but after the lockups I 
> just downgraded to 2.6.33 again.
> 
> I still actually *use* my machines for something else than hunting patches 
> for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel" 
> (accentuation from me). I know of the argument that one should use a 
> distro kernel for machines that are for production use. But frankly, does 
> that justify to deliver in advance known crap to the distributors? What 
> impact do partly grave bugs reported on bugzilla have on the release 
> decision?
> 
> And how about people who have their reasons - mine is TuxOnIce - to 
> compile their own kernels?
> 
> Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the 
> freezes as well. So far so good.
> 
> Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the 
> website. And I just again always wait for .2 or .3, as with 2.6.34.1 I 
> still have some problems like the hang on hibernation reported in
> 
> hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
> 
> on this mailing list just a moment ago. But then 2.6.33 did hang with 
> TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since 
> 2.6.34 did not hang with it anymore which was a reason for me to try 
> 2.6.34 earlier.
> 
> I am quite a bit worried about the quality of the recent kernels. Some 
> iterations earlier I just compiled them, partly even rc-ones which I do 
> not expect to be stable, and they just worked. But in the recent times .0, 
> partly even .1 or .2 versions haven't been stable for me quite some times 
> already and thus they better not be advertised as such on kernel.org I 
> think. I am willing to risk some testing and do bug reports, but these are 
> still production machines, I do not have any spare test machines, and 
> there needs to be some balance, i.e. the kernels should basically work. 
> Thus I for sure will be more reluctant to upgrade in the future.
> 
> Ciao,

Anybody running the latest kernel on a production machine is living
dangerously. Don't you already know that?

When 2.6.X is released, everybody knows it contains at least 100 bugs.

It was true for all previous values of X, it will be true for all
future values.

If you want to be safer, use a one year old kernel, with all stable
patches in.

Something like 2.6.32.16: it's probably more stable than all 2.6.X
kernels.

If 2.6.33 runs OK on your machine, you are lucky, since 2.6.33.6
contains numerous bug fixes.




* Re: stable? quality assurance?
  2010-07-11  7:18 stable? quality assurance? Martin Steigerwald
  2010-07-11  8:39 ` Eric Dumazet
@ 2010-07-11 13:16 ` Ted Ts'o
  2010-07-11 18:02   ` Anca Emanuel
                     ` (2 more replies)
  2010-07-11 13:56 ` Lee Mathers
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 72+ messages in thread
From: Ted Ts'o @ 2010-07-11 13:16 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

On Sun, Jul 11, 2010 at 09:18:41AM +0200, Martin Steigerwald wrote:
> 
> I still actually *use* my machines for something else than hunting patches 
> for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel" 
> (accentuation from me). I know of the argument that one should use a 
> distro kernel for machines that are for production use. But frankly, does 
> that justify to deliver in advance known crap to the distributors? What 
> impact do partly grave bugs reported on bugzilla have on the release 
> decision?

So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
I find bugs, I report them and I help fix them.  If more people did
that, then the 2.6.X.0 releases would be more stable.  But kernel
development is a volunteer effort, so it's up to the volunteers to
test and fix bugs during the -rc4, -rc5 and -rc6 time frame.  But if
the work tails off, because the developers are busily working on new
features for the new release, then past a certain point delaying the
release yields diminishing returns.  This is why we do time-based
releases.

It is possible to do other types of release strategies, but look at
Debian Obsolete^H^H^H^H^H^H^H^H Stable if you want to see what happens
if you insist on waiting until all release blockers are fixed (and
even with Debian, past a certain point the release engineer will still
just reclassify bugs as no longer being release blockers --- after the
stable release has slipped for months or years past the original
projected release date.)

So if you and others like you are willing to help, then the quality of
the Linux kernels can continue to improve.  But simply complaining
about it is not likely to solve things, since threatening not to be
willing to upgrade kernels is generally not going to motivate many, if
not most, of the volunteers who work on stabilizing the kernel.

> I am willing to risk some testing and do bug reports, but these are 
> still production machines, I do not have any spare test machines, and 
> there needs to be some balance, i.e. the kernels should basically work. 

So you want the latest and greatest new features in a brand-new kernel
release, but you're not willing to pay for test machines, and you're
not willing to pay for a distribution support...  The fact that you
are willing to do some testing is appreciated, but remember, there's
no such thing as a free lunch.  Linux may be a very good bargain (look
at how much Oracle has increased its support contracts for Solaris!),
but it's still not a free lunch.  At the end of the day, you get what
you put into it.

Best regards,

						- Ted




* Re: stable? quality assurance?
  2010-07-11  7:18 stable? quality assurance? Martin Steigerwald
  2010-07-11  8:39 ` Eric Dumazet
  2010-07-11 13:16 ` Ted Ts'o
@ 2010-07-11 13:56 ` Lee Mathers
  2010-07-11 14:51   ` Martin Steigerwald
  2010-07-12 19:46 ` stable? quality assurance? Nix
       [not found] ` <AANLkTimEdVsmIgXBbmhsq75ElQvGAI8avsM8-wlDpm4z@mail.gmail.com>
  4 siblings, 1 reply; 72+ messages in thread
From: Lee Mathers @ 2010-07-11 13:56 UTC (permalink / raw)
  To: Martin Steigerwald, linux-kernel

Wow!

First question what is a "desaster"?

Second question, what makes you so important that you feel you can
makes demands and comments as you did.

If indeed these are production systems and you are an administrator of
said production systems, I suggest you need to do a little more
homework to expand your knowledge base.

I would follow Eric's advice.  It's sound advice and better yet it was free.

Hope you have better luck in getting your systems running well.

On 7/11/10, Martin Steigerwald <Martin@lichtvoll.de> wrote:
>
> Hi!
>
> 2.6.34 was a desaster for me: bug #15969 - patch was available before
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well
> as most important two complete lockups - well maybe just X.org and radeon
> KMS, I didn't start my second laptop to SSH into the locked up one - on my
> ThinkPad T42. I fixed the first one with the patch, but after the lockups I
> just downgraded to 2.6.33 again.
>
> I still actually *use* my machines for something else than hunting patches
> for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel"
> (accentuation from me). I know of the argument that one should use a
> distro kernel for machines that are for production use. But frankly, does
> that justify to deliver in advance known crap to the distributors? What
> impact do partly grave bugs reported on bugzilla have on the release
> decision?
>
> And how about people who have their reasons - mine is TuxOnIce - to
> compile their own kernels?
>
> Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the
> freezes as well. So far so good.
>
> Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the
> website. And I just again always wait for .2 or .3, as with 2.6.34.1 I
> still have some problems like the hang on hibernation reported in
>
> hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
>
> on this mailing list just a moment ago. But then 2.6.33 did hang with
> TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
> 2.6.34 did not hang with it anymore which was a reason for me to try
> 2.6.34 earlier.
>
> I am quite a bit worried about the quality of the recent kernels. Some
> iterations earlier I just compiled them, partly even rc-ones which I do
> not expect to be stable, and they just worked. But in the recent times .0,
> partly even .1 or .2 versions haven't been stable for me quite some times
> already and thus they better not be advertised as such on kernel.org I
> think. I am willing to risk some testing and do bug reports, but these are
> still production machines, I do not have any spare test machines, and
> there needs to be some balance, i.e. the kernels should basically work.
> Thus I for sure will be more reluctant to upgrade in the future.
>
> Ciao,
> --
> Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
> GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
>

-- 
Sent from my mobile device


* Re: stable? quality assurance?
  2010-07-11  8:39 ` Eric Dumazet
@ 2010-07-11 14:22   ` Martin Steigerwald
  2010-07-11 14:52     ` Martin Steigerwald
  2010-07-11 15:58   ` William Pitcock
  2010-07-11 17:04   ` Heinz Diehl
  2 siblings, 1 reply; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-11 14:22 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 2998 bytes --]

Am Sonntag 11 Juli 2010 schrieb Eric Dumazet:
> Le dimanche 11 juillet 2010 à 09:18 +0200, Martin Steigerwald a écrit :
> > Hi!

Hi Eric,

> > 2.6.34 was a desaster for me: bug #15969 - patch was available before
> > 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as
> > well as most important two complete lockups - well maybe just X.org
> > and radeon KMS, I didn't start my second laptop to SSH into the
> > locked up one - on my ThinkPad T42. I fixed the first one with the
> > patch, but after the lockups I just downgraded to 2.6.33 again.
> > 
> > I still actually *use* my machines for something else than hunting
> > patches for kernel bugs and on kernel.org it is written "Latest
> > *Stable* Kernel" (accentuation from me). I know of the argument that
[...]

> > advertised as such on kernel.org I think. I am willing to risk some
> > testing and do bug reports, but these are still production machines,
> > I do not have any spare test machines, and there needs to be some
> > balance, i.e. the kernels should basically work. Thus I for sure
> > will be more reluctant to upgrade in the future.
> > 
> > Ciao,
> 
> Anybody running the latest kernel on a production machine is living
> dangerously. Don't you already know that?

Yes, and I indicated it above. But in my perception - admittedly a rather 
subjective one - the balance between stable and unstable that existed about 
1 or 2 years ago has been lost. In my personal experience it has gotten much 
worse recently, to the extent that I skipped some major kernel versions 
completely, for example 2.6.30.

And these are not servers - those use distro kernels.

> When 2.6.X is released, everybody knows it contains at least 100 bugs.

Then why is it still labeled "stable" on kernel.org? It is not. It is at 
most beta quality software.

It's no more stable than KDE 4.0 was, but at least they mentioned that in 
the release notes.

> It was true for all previous values of X, it will be true for all
> future values.
> 
> If you want to be safer, use a one year old kernel, with all stable
> patches in.
> 
> Something like 2.6.32.16: it's probably more stable than all 2.6.X
> kernels.
> 
> If 2.6.33 runs OK on your machine, you are lucky, since 2.6.33.6
> contains numerous bug fixes.

Actually it was 2.6.33.1 with userspace software suspend and it had pretty 
good uptimes above 20 days - only interrupted by installing 2.6.34.

Well, if everybody else takes this for granted, I will just mentally replace 
that "stable" on kernel.org with "beta quality" - from my perception it does 
not even have release candidate status in recent iterations - and be done 
with it.

And as soon as the kernel contains a performant hibernation infrastructure 
I will probably just use distro kernels and be done with it.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: stable? quality assurance?
  2010-07-11 13:56 ` Lee Mathers
@ 2010-07-11 14:51   ` Martin Steigerwald
  2010-07-11 17:22     ` Willy Tarreau
                       ` (2 more replies)
  0 siblings, 3 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-11 14:51 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 3155 bytes --]


Hi Lee,

Am Sonntag 11 Juli 2010 schrieb Lee Mathers:
> Wow!
> 
> First question what is a "desaster"?

For me, for example, the machine or at least the complete desktop freezing 
randomly. And actually I said "for me", as you can reread at the bottom of 
your top-posting.

> Second question, what makes you so important that you feel you can
> make demands and comments as you did?

Since when do I need to be considered important by you or anyone else 
to make comments? Actually I think I do not - this is still an open 
mailing list, isn't it? And I won't waste my time with proofs that I 
contributed to free software here and there - also to kernel testing, as 
Ingo Molnar, for example, could testify from the early CFS days when I 
compiled roughly a kernel a day, and once to kernel documentation.

I also do not get why you are attacking me personally. It seems that you 
feel personally attacked by me. But I did not attack anyone. I just 
questioned the quality of the kernel and its current quality assurance 
process. Nobody is personally at fault if any of that is lacking.

One reason for a demand for me is best expressed by this question: Does 
the kernel developer community want to encourage that a group of advanced 
Linux users - but mostly non-developers - compile their own vanilla or 
near-vanilla kernels, provide wider testing and report a bug now and 
then?

I can live with either answer. If not, I just will be much more reluctant 
to try out new kernels.

But I have experienced working productively with kernel developers like 
Ingo and TuxOnIce developer Nigel, who were pretty interested in my usage 
of the latest kernels.

I admit my wording could have been friendlier, too, but I was just 
frustrated by my recent experiences. What I wanted to achieve is to raise 
the question of whether kernel quality has actually decreased and, more 
importantly, to point out that something needs to be done to make it more 
stable again.

Well Linus has at least been a bit more reluctant to take big changes 
after rc1 this cycle, so maybe 2.6.35 will be better again.

> If indeed these are production systems and you are an administrator of
> said production systems, I suggest you need to do a little more
> homework to expand your knowledge base.

These are production systems that have some fault tolerance, i.e. not 
servers, but laptops and one of my workstations (not yet all of them). But 
for me a certain balance has to be met. I will just downgrade and drop 
newer kernels, or even start skipping whole major versions on a regular 
basis, if that turns out to be the only way to have stable enough machines 
for me. One approach would be to stick to the stable kernels that Greg and 
the stable team maintain for a longer time.

> Hope you have better luck in getting your systems running well.

Thanks. I certainly will. If need be by downgrading.

I hope that someone answers who actually can take some critique. From the 
current replies I perceive a lack of that ability.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: stable? quality assurance?
  2010-07-11 14:22   ` Martin Steigerwald
@ 2010-07-11 14:52     ` Martin Steigerwald
  0 siblings, 0 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-11 14:52 UTC (permalink / raw)
  To: linux-kernel

Am Sonntag 11 Juli 2010 schrieb Martin Steigerwald:
> worse recently, to the extent that I skipped some major
> kernel versions completely, for example 2.6.30.

Okay, not some, but one.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: stable? quality assurance?
  2010-07-11  8:39 ` Eric Dumazet
  2010-07-11 14:22   ` Martin Steigerwald
@ 2010-07-11 15:58   ` William Pitcock
  2010-07-11 16:34     ` Eric Dumazet
  2010-07-16  6:59     ` Greg KH
  2010-07-11 17:04   ` Heinz Diehl
  2 siblings, 2 replies; 72+ messages in thread
From: William Pitcock @ 2010-07-11 15:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel


----- "Eric Dumazet" <eric.dumazet@gmail.com> wrote:

> Le dimanche 11 juillet 2010 à 09:18 +0200, Martin Steigerwald a écrit :
> > Hi!
> > 
> > 2.6.34 was a desaster for me: bug #15969 - patch was available before 
> > 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well 
> > as most important two complete lockups - well maybe just X.org and radeon 
> > KMS, I didn't start my second laptop to SSH into the locked up one - on my 
> > ThinkPad T42. I fixed the first one with the patch, but after the lockups I 
> > just downgraded to 2.6.33 again.
> > 
> > I still actually *use* my machines for something else than hunting patches 
> > for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel" 
> > (accentuation from me). I know of the argument that one should use a 
> > distro kernel for machines that are for production use. But frankly, does 
> > that justify to deliver in advance known crap to the distributors? What 
> > impact do partly grave bugs reported on bugzilla have on the release 
> > decision?
> > 
> > And how about people who have their reasons - mine is TuxOnIce - to 
> > compile their own kernels?
> > 
> > Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the 
> > freezes as well. So far so good.
> > 
> > Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the 
> > website. And I just again always wait for .2 or .3, as with 2.6.34.1 I 
> > still have some problems like the hang on hibernation reported in
> > 
> > hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
> > 
> > on this mailing list just a moment ago. But then 2.6.33 did hang with 
> > TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since 
> > 2.6.34 did not hang with it anymore which was a reason for me to try 
> > 2.6.34 earlier.
> > 
> > I am quite a bit worried about the quality of the recent kernels. Some 
> > iterations earlier I just compiled them, partly even rc-ones which I do 
> > not expect to be stable, and they just worked. But in the recent times .0, 
> > partly even .1 or .2 versions haven't been stable for me quite some times 
> > already and thus they better not be advertised as such on kernel.org I 
> > think. I am willing to risk some testing and do bug reports, but these are 
> > still production machines, I do not have any spare test machines, and 
> > there needs to be some balance, i.e. the kernels should basically work. 
> > Thus I for sure will be more reluctant to upgrade in the future.
> > 
> > Ciao,
> 
> Anybody running the latest kernel on a production machine is living
> dangerously. Don't you already know that?
> 
> When 2.6.X is released, everybody knows it contains at least 100 bugs.
> 
> It was true for all previous values of X, it will be true for all
> future values.
> 
> If you want to be safer, use a one year old kernel, with all stable
> patches in.
> 
> Something like 2.6.32.16: it's probably more stable than all 2.6.X
> kernels.

2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
as a Xen domU.  I would say 2.6.32.12 is the best choice since who knows
what other regressions there are in .16.

William


* Re: stable? quality assurance?
  2010-07-11 15:58   ` William Pitcock
@ 2010-07-11 16:34     ` Eric Dumazet
  2010-07-16  6:59     ` Greg KH
  1 sibling, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2010-07-11 16:34 UTC (permalink / raw)
  To: William Pitcock; +Cc: linux-kernel

Le dimanche 11 juillet 2010 à 19:58 +0400, William Pitcock a écrit :
> ----- "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
> > 
> > Something like 2.6.32.16: it's probably more stable than all 2.6.X
> > kernels.
> 
> 2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
> as a Xen domU.  I would say 2.6.32.12 is the best choice since who knows
> what other regressions there are in .16.
> 

Yea, strictly speaking, you can be sure no kernel will be bug free,
ever.

This is why I said "probably more stable" ;)





* Re: stable? quality assurance?
  2010-07-11  8:39 ` Eric Dumazet
  2010-07-11 14:22   ` Martin Steigerwald
  2010-07-11 15:58   ` William Pitcock
@ 2010-07-11 17:04   ` Heinz Diehl
  2 siblings, 0 replies; 72+ messages in thread
From: Heinz Diehl @ 2010-07-11 17:04 UTC (permalink / raw)
  To: linux-kernel

On 11.07.2010, Eric Dumazet wrote: 

> When 2.6.X is released, everybody knows it contains at least 100 bugs.
[....]

http://s5.directupload.net/file/d/2217/ckghonrx_jpg.htm

:-)



* Re: stable? quality assurance?
  2010-07-11 14:51   ` Martin Steigerwald
@ 2010-07-11 17:22     ` Willy Tarreau
  2010-07-11 21:38       ` Rafael J. Wysocki
                         ` (3 more replies)
  2010-07-11 19:49     ` Stefan Richter
  2010-07-13 11:11     ` Alejandro Riveira Fernández
  2 siblings, 4 replies; 72+ messages in thread
From: Willy Tarreau @ 2010-07-11 17:22 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

Hi Martin,

On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> I hope that someone answers who actually can take some critique. From the 
> current replies I perceive a lack of that ability.

Well, I'll try to, then :-)

There were some threads in the past about kernel releases quality,
where Linus explained why it could not be completely black or white.

Among the things he explained, I remember that one primary concern
was the inability to slow down development. I mean, if he waits 2 more
weeks for things to stabilize, then there will be two more weeks of
crap^H^H^H^Hdevelopment merged in next merge window, so in fact this
will just shift dates and not quality.

There are also some regressions that get merged with every pre-release.
Thus, assuming he would wait for one more pre-release to merge the
fixes you spotted, 2 or 3 more would appear, so there's a point where
it must be decided when to release.

Right now it's released when he feels it "good enough". This can be
very subjective, but I'd think that "good enough" basically means
that the kernel will be able to live in its stable branch without
major changes and without reverting features.

Also, you have to consider that there are several types of users.
Some of them are developers who will run the latest -git kernel at
some point. Some of them will be enthusiasts waiting for a feature,
and who will run every -rc kernel once the feature is merged, to
ensure it does not break before the release. There are also janitors
and the curious ones who'll basically run a few of the last -rc as
time permits to see if they can spot a few last-minute issues before
the release. There are the brave ones who systematically download
the dot-0 release once Linus announces it and will proudly run it
to show their friends how it's better than the last one. There are
those who need a bit of stability (eg: professional laptop or home
server) and will prefer to wait for a few stable releases to ensure
they won't waste their time on a big stupid issue that all other ones
above will have immediately spotted for them. And there are the ones
who run production servers who will either use distro kernels or
long-term stable kernels, with a more or less long qualification
process between upgrades.

It's just an ecosystem where you have to find your place. From your
description, I think you're just before the last ones above: you need
something which works, even though it's not critical, so you could
very well wait for 2-3 stable updates before upgrading (that does
not prevent you from testing earlier on other systems if you want
to test performance, new features, regressions, etc...).

It's not really advisable to call dot-0 releases "unstable" because
it will only result in shifting the adoption point between the user
classes above. We need to have enthusiasts who proudly say "hey
look, dot-0 and it's already rock solid". We've all seen some of them
and they're the ones who help reporting issues that get fixed in the
next stable release.

I think that the most reasonable thing to do is to assume your need
for stability and always refrain from running on the latest release.

Speaking for myself, I tend to run rock solid kernels for my data (my
file server was still on 2.4.37.9 till this afternoon, I just upgraded
it to 2.6). The distro's kernel currently is 2.6.33.4 and I'm going to
switch it back to 2.6.32.x or 2.6.27.x because I'd rather have something
fully tested there. My desktop which regularly reaches 50-100 days
uptime runs on whatever looks stable enough for the job when I upgrade.
Usually it's one of Greg's long term stable series. 2.6.27.x or
2.6.32.x, with x >= 10. My work laptop is on similar kernels. My
netbook is generally running experimental code, it does not matter
much. It's where I'd try 2.6.35-rc for instance, or where I test
2.6.32.x-rc when Greg announces them.

You see, there's a kernel for everyone, and for every usage. You just
have to make your choice. And when you don't know or don't want to
guess, stick to the distro's kernel.

Regards,
Willy



* Re: stable? quality assurance?
  2010-07-11 13:16 ` Ted Ts'o
@ 2010-07-11 18:02   ` Anca Emanuel
  2010-07-12  6:46   ` David Newall
  2010-09-04 17:12   ` Martin Steigerwald
  2 siblings, 0 replies; 72+ messages in thread
From: Anca Emanuel @ 2010-07-11 18:02 UTC (permalink / raw)
  To: Ted Ts'o, Martin Steigerwald, linux-kernel

Offtopic.

I'm using Ubuntu 10.04 and kernel 2.6.35-rc1 from kernel.ubuntu.com
Working fine (stable, but my webcam is still not working).

Using this https://wiki.ubuntu.com/KernelTeam/GitKernelBuild tutorial
to compile the kernel, but without success (it finishes the compile but
produces no deb packages).
I did it from VirtualBox some weeks ago, and grub could not mount.

Is there any tutorial how to build the kernel for Ubuntu 10.04 ?

Please test it yourself in (Ubuntu 10.04):
sudo cfdisk
result: Bad primary partition 1. (any kernel, any environment).


* Re: stable? quality assurance?
  2010-07-11 14:51   ` Martin Steigerwald
  2010-07-11 17:22     ` Willy Tarreau
@ 2010-07-11 19:49     ` Stefan Richter
  2010-07-13 11:11     ` Alejandro Riveira Fernández
  2 siblings, 0 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-11 19:49 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel, Lee Mathers

Martin Steigerwald wrote:
> One reason for a demand for me is best expressed by this question: Does 
> the kernel developer community want to encourage that a group of advanced 
> Linux users - but mostly non-developers - compile their own vanilla or 
> vanilla-near kernels, provide wider testing and report a bug now and 
> then?

Yes, testing is desired --- in order to shake out bugs that are not
manifest on the developer's systems.  Remember that the kernel is a
special program in which there are many classes of bugs that can only be
reproduced on special hardware and/or with special workloads.

Alas, there are not only new bugs in new features but also new bugs in
existing features, a.k.a. regressions.  And like new bugs, many
regressions cannot be found by the developers themselves on their
test systems.

You mentioned two particular regressions in your initial posting.  Do
you have suggestions how they could have been prevented in the first
place?  Or how they could have been handled better than they were?

Do you see subsystems of the kernel in which regressions are not taken
as seriously as in other ones?

> Well Linus has at least been a bit more reluctant to take big changes 
> after rc1 this cycle, so maybe 2.6.35 will be better again.

2.6.35 will only be better if this (gradual) change of procedure means
that -rc kernels are going to be tested more and new bugs are going to
be found and fixed quicker in the -rc phase than before.  And 2.6.36+
will only be better if the stricter post -rc1 merges do not motivate
developers to put even more hastily assembled under-tested crap into
their pre -rc1 pull requests than they already do.

[PS: 2.6.34 works very well for me, as most 2.6.x releases do.]
[PS2:  When on lkml, please use reply-to-all, not reply-to-list-only.]
-- 
Stefan Richter
-=====-==-=- -=== -=-==
http://arcgraph.de/sr/

* Re: stable? quality assurance?
  2010-07-11 17:22     ` Willy Tarreau
@ 2010-07-11 21:38       ` Rafael J. Wysocki
  2010-07-12  4:17         ` Willy Tarreau
  2010-07-12  9:56       ` Martin Steigerwald
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 72+ messages in thread
From: Rafael J. Wysocki @ 2010-07-11 21:38 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Martin Steigerwald, linux-kernel

On Sunday, July 11, 2010, Willy Tarreau wrote:
> Hi Martin,
> 
> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From the 
> > current replies I perceive a lack of that ability.
> 
> well, I'll try to do then :-)
> 
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
> 
> Among the things he explained, I remember that one primary concern
> was the inability to slow down development. I mean, if he waits 2 more
> weeks for things to stabilize, then there will be two more weeks of
> crap^H^H^H^Hdevelopment merged in next merge window, so in fact this
> will just shift dates and not quality.
...
> It's not really advisable to call dot-0 releases "unstable" because
> it will only result in shifting the adoption point between the user
> classes above.

IMnshO it's not exactly fair to call them "stable" either.  I tend to call them
"major releases" which basically reflects what they are - events in the
development process that each start a new merge window.  Nothing more, either
way.

Rafael

* Re: stable? quality assurance?
  2010-07-11 21:38       ` Rafael J. Wysocki
@ 2010-07-12  4:17         ` Willy Tarreau
  0 siblings, 0 replies; 72+ messages in thread
From: Willy Tarreau @ 2010-07-12  4:17 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Martin Steigerwald, linux-kernel

Hi Rafael,

On Sun, Jul 11, 2010 at 11:38:28PM +0200, Rafael J. Wysocki wrote:
> > It's not really advisable to call dot-0 releases "unstable" because
> > it will only result in shifting the adoption point between the user
> > classes above.
> 
> IMnshO it's not exactly fair to call them "stable" either.  I tend to call them
> "major releases" which basically reflects what they are - events in the
> development process that each start a new merge window.  Nothing more, either
> way.

Indeed, just exactly that. Maybe the confusion comes from the title
"Latest Stable Kernel" on kernel.org, which we could rename to "Latest
Kernel Release", since that is what it reflects?

Willy


* Re: stable? quality assurance?
  2010-07-11 13:16 ` Ted Ts'o
  2010-07-11 18:02   ` Anca Emanuel
@ 2010-07-12  6:46   ` David Newall
       [not found]     ` <AANLkTilGjfx9sb66qVfZn1SeFPURHUrrdE7JCrild8VX@mail.gmail.com>
  2010-09-04 17:12   ` Martin Steigerwald
  2 siblings, 1 reply; 72+ messages in thread
From: David Newall @ 2010-07-12  6:46 UTC (permalink / raw)
  To: Ted Ts'o, Martin Steigerwald, linux-kernel

Ted Ts'o wrote:
> It is possible to do other types of release strategies, but look at
> Debian Obsolete^H^H^H^H^H^H^H^H Stable if you want to see what happens
> if you insist on waiting until all release blockers are fixed

I don't know if Ted intended to be snide, but that is how he sounded.  
And yet, his comment was a fair reflection of how core developers seem 
to feel about stability, namely that a stable kernel is obsolete and 
therefore not particularly desirable.  (I use the word "stable" in its 
common English meaning, not the almost inexplicable Tux variation.)

I think the truth is that linux kernels are only ever stable as released 
by distributions, and then only the more conservative of them.  What 
comes direct from kernel.org, I mean those called "latest stable", are 
an exercise in dissembling.  It's stable because someone calls it 
stable, even though it crashes and has regressions?  That's not stable, 
that's just misleading.

Stable kernels *could* be stable.  Debian succeeds.  If it takes them a 
long time, that is only because the core developers fail to release 
reasonable quality kernels.  Don't sneer at them because they do the 
right thing; do the right thing yourself so that they can produce more 
timely updates.

I don't expect fair consideration of these comments; why change when 
shooting the messenger is so much more satisfying?

* Re: stable? quality assurance?
  2010-07-11 17:22     ` Willy Tarreau
  2010-07-11 21:38       ` Rafael J. Wysocki
@ 2010-07-12  9:56       ` Martin Steigerwald
  2010-07-12 15:43       ` Martin Steigerwald
  2010-09-04 16:38       ` Martin Steigerwald
  3 siblings, 0 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-12  9:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Willy Tarreau

On Sunday, 11 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy,
 
> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From
> > the current replies I perceive a lack of that ability.
> 
> well, I'll try to do then :-)
> 
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
[...]
> You see, there's a kernel for everyone, and for every usage. You just
> have to make your choice. And when you don't know or don't want to
> guess, stick to the distro's kernel.

Wow! Thanks to you and all the others who provided such constructive 
feedback.

I need a bit of time to digest and think through it. I will answer then.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Fwd: stable? quality assurance?
       [not found]     ` <AANLkTilGjfx9sb66qVfZn1SeFPURHUrrdE7JCrild8VX@mail.gmail.com>
@ 2010-07-12 12:35       ` Marcin Letyns
  2010-07-12 12:42         ` Alexey Dobriyan
  2010-07-12 15:56       ` David Newall
  1 sibling, 1 reply; 72+ messages in thread
From: Marcin Letyns @ 2010-07-12 12:35 UTC (permalink / raw)
  To: linux-kernel

---------- Forwarded message ----------
From: Marcin Letyns <mletyns@gmail.com>
Date: 2010/7/12
Subject: Re: stable? quality assurance?
To: David Newall <davidn@davidnewall.com>


2010/7/12 David Newall <davidn@davidnewall.com>:
>
> I don't know if Ted intended to be snide, but that is how he sounded.  And
> yet, his comment was a fair reflection of how core developers seem to feel
> about stability, namely that a stable kernel is obsolete and therefore not
> particularly desirable.  (I use the word "stable" in its common English
> meaning, not the almost inexplicable Tux variation.)

What about a BSD variation? The last time I tried FreeBSD it wasn't
stable. It had problems with my hard drive controller. There are many
regressions introduced in newer releases. I see you don't want Linux
to be developed rapidly (remember your lame "slow down, please"?).

> I think the truth is that linux kernels are only ever stable as released by
> distributions, and then only the more conservative of them.  What comes
> direct from kernel.org, I mean those called "latest stable", are an exercise
> in dissembling.  It's stable because someone calls it stable, even though it
> crashes and has regressions?  That's not stable, that's just misleading.

Show me a "stable" kernel. Windows, *BSD, Solaris, OS X? There's none.
I've never had problems with the newest mainline kernels, because
they're rock stable and rock solid for me. Why don't you go to freebsd.com
and complain that they should stop calling some of the
FreeBSD releases stable? There are regressions and crashes, but I
guess that's a *BSD variation of the term "stable".

> Stable kernels *could* be stable.  Debian succeeds.  If it takes them a long
> time, that is only because the core developers fail to release reasonable
> quality kernels.  Don't sneer at them because they do the right thing; do
> the right thing yourself so that they can produce more timely updates.

While there's Debian with a stable kernel, what the hell do you
want? :> I don't want Debian with its old user space and its old
kernel. If this is what you want, then what are you complaining about
here? You want everyone to go Debian's way? Btw. it takes
Debian developers a long time to make a release, mainly because of the
user space...

> I don't expect fair consideration of these comments; why change when
> shooting the messenger is so much more satisfying?

You missed the point, so what do you expect? Btw. slowing down would
be very stupid. If you don't know why, it's because you're missing the
point.


* Re: stable? quality assurance?
  2010-07-12 12:35       ` Fwd: " Marcin Letyns
@ 2010-07-12 12:42         ` Alexey Dobriyan
       [not found]           ` <AANLkTik64lxDiCN-eRo3i_-cTqAvCzbaRI4EEXoD44Vj@mail.gmail.com>
  2010-07-12 14:57           ` Valdis.Kletnieks
  0 siblings, 2 replies; 72+ messages in thread
From: Alexey Dobriyan @ 2010-07-12 12:42 UTC (permalink / raw)
  To: Marcin Letyns; +Cc: linux-kernel

On Mon, Jul 12, 2010 at 3:35 PM, Marcin Letyns <mletyns@gmail.com> wrote:
> Last time I tried freebsd it wasn't stable. It had problems with my hard
> drive controller.

This thread needs more anecdotal evidence.

* Fwd: stable? quality assurance?
       [not found]           ` <AANLkTik64lxDiCN-eRo3i_-cTqAvCzbaRI4EEXoD44Vj@mail.gmail.com>
@ 2010-07-12 12:52             ` Marcin Letyns
  0 siblings, 0 replies; 72+ messages in thread
From: Marcin Letyns @ 2010-07-12 12:52 UTC (permalink / raw)
  To: linux-kernel

---------- Forwarded message ----------
From: Marcin Letyns <mletyns@gmail.com>
Date: 2010/7/12
Subject: Re: stable? quality assurance?
To: Alexey Dobriyan <adobriyan@gmail.com>


2010/7/12 Alexey Dobriyan <adobriyan@gmail.com>:
> On Mon, Jul 12, 2010 at 3:35 PM, Marcin Letyns <mletyns@gmail.com>
>
>
> This thread needs more anecdotal evidence.
>

This for sure! However, why should I bother providing something while
others don't? :> Anyway, I won't install FreeBSD anymore and I'm not
interested in helping them.

* Re: stable? quality assurance?
  2010-07-12 12:42         ` Alexey Dobriyan
       [not found]           ` <AANLkTik64lxDiCN-eRo3i_-cTqAvCzbaRI4EEXoD44Vj@mail.gmail.com>
@ 2010-07-12 14:57           ` Valdis.Kletnieks
  1 sibling, 0 replies; 72+ messages in thread
From: Valdis.Kletnieks @ 2010-07-12 14:57 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Marcin Letyns, linux-kernel

On Mon, 12 Jul 2010 15:42:32 +0300, Alexey Dobriyan said:
> On Mon, Jul 12, 2010 at 3:35 PM, Marcin Letyns <mletyns@gmail.com> wrote:
> > Last time I tried freebsd it wasn't stable. It had problems with my hard
> > drive controller.
> 
> This thread needs more anecdotal evidence.

To be fair, the continual re-appearance of this thread is *always* anecdotal.

It's always somebody who has trouble getting it to work on *their* hardware, or
with *their* software, and insisting that stuff doesn't get shipped unless it
works properly on everything.  Apparently, having it work on 99.997% of the
gear out there isn't good enough for them.  Then there's the inevitable call
for "no shipping with blocker bugs" - never with a good objective definition of
what constitutes a "blocker" bug.

Ted had it right - you insist on fixing *everything*, you end up with
Debian Obsolete.  It's the nature of the beast - you *will* detect regressions
at something resembling an exponential-decay curve. The only question that remains is
how close to zero it has to decay before the ship date - and there's no single
answer for that which fits everybody.  One point to note is that if you ship
earlier, the decay rate increases because of wider deployment.  As a result,
it's quite probable that you get to some objective level of "stable" faster
by releasing early and then releasing a half-dozen dot releases, instead of
waiting for the 3 or 4 dozen people testing it before release to shake out all
the bugs (which obviously won't happen due to things like access to hardware).
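Valdis's argument can be made concrete with a toy model. This is purely illustrative and rests on assumptions that are not in the thread: undetected regressions are taken to decay as N(t) = N0 * exp(-k * t), the decay constant k is assumed to grow in proportion to the number of testers, and the tester counts and per-tester rate below are made-up numbers. Under those assumptions, the time to reach any fixed "stable enough" threshold shrinks in proportion to the size of the testing population.

```python
import math


def days_to_threshold(initial_bugs: float, rate_per_tester: float,
                      testers: int, threshold: float) -> float:
    """Days until expected undetected regressions drop below `threshold`.

    Toy model (an assumption, not a measured law):
        N(t) = initial_bugs * exp(-rate_per_tester * testers * t)
    so the decay constant grows linearly with the number of testers.
    """
    if not (0 < threshold < initial_bugs):
        raise ValueError("threshold must be between 0 and initial_bugs")
    k = rate_per_tester * testers  # wider deployment -> faster decay
    return math.log(initial_bugs / threshold) / k


if __name__ == "__main__":
    # Hypothetical numbers: ~40 pre-release testers ("3 or 4 dozen")
    # versus a much larger user base once a release is published.
    for label, testers in (("pre-release only", 40), ("after release", 4000)):
        days = days_to_threshold(initial_bugs=100, rate_per_tester=0.001,
                                 testers=testers, threshold=1)
        print(f"{label:>16}: {days:7.1f} days")
```

The model deliberately ignores the cost side: releasing early also exposes the wider user base to the remaining bugs, which is exactly the trade-off being argued about in this thread.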


* Re: stable? quality assurance?
  2010-07-11 17:22     ` Willy Tarreau
  2010-07-11 21:38       ` Rafael J. Wysocki
  2010-07-12  9:56       ` Martin Steigerwald
@ 2010-07-12 15:43       ` Martin Steigerwald
  2010-07-12 17:36         ` Willy Tarreau
  2010-07-12 17:55         ` Stefan Richter
  2010-09-04 16:38       ` Martin Steigerwald
  3 siblings, 2 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-12 15:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Willy Tarreau

On Sunday, 11 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy,
 
> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From
> > the current replies I perceive a lack of that ability.
> 
> well, I'll try to do then :-)
> 
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
> 
> Among the things he explained, I remember that one primary concern
> was the inability to slow down development. I mean, if he waits 2 more
> weeks for things to stabilize, then there will be two more weeks of
> crap^H^H^H^Hdevelopment merged in next merge window, so in fact this
> will just shift dates and not quality.

Would it make that much of a difference? Linus could still say no to 
obvious crap, couldn't he?

> There are also some regressions that get merged with every pre-release.
> Thus, assuming he would wait for one more pre-release to merge the
> fixes you spotted, 2 or 3 more would appear, so there's a point where
> it must be decided when to release.

Some sort of bug classification could help here, I think. Something that
helps Linus decide whether another release candidate round is worth it
or not.

Actually I think the bug I mentioned where the USB soundcard does not work
after resume (bug #15788) wouldn't warrant a new release candidate round,
especially as it didn't have a patch yet and will likely just affect a
minority of users. Still, it would be good if it were fixed in time. I do
think that the bug where Radeon KMS does not work after resume (#15969)
qualifies, since it causes loss of data handled by the current X session(s);
sure, I normally save my stuff before hibernating, but... And it actually
had a patch that had been tested! The desktop freeze bug I mentioned would
slip, because I didn't report it, and apart from a Debian bug report I found,
it wasn't confirmed at all. A reported and confirmed desktop freeze would
qualify, IMHO.

I have read postings from Linus saying that he reads the regression
list kindly provided by Rafael. 15788 was in there, but IMHO wouldn't
qualify (see posting "2.6.34-rc5: Reported regressions from 2.6.33"). But
15969 was not; it was reported against rc7, so too late for Rafael's manual
report. So yes, I see how it could have slipped.

Maybe one approach would be to dynamically generate the list from all bug
reports marked for 2.6.34 versions and have it posted to the kernel mailing
list after every rc. This way bug #15969 would at least have been in the
list of known regressions.

Bugzilla severity and priority fields or something similar could be used to
set the importance of a bug report, and the regression list could be sorted
by importance. Another important criterion would be whether someone else can
confirm it, i.e. reproduce it. Even if I had reported those desktop freezes,
unless someone confirmed them they might have been specific to my machine. A
"confirm" or vote button might be good, so that the number of confirmations
could be counted.
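The proposal above (sort the regression list by importance, using severity plus the number of people who can reproduce a bug) could be prototyped roughly as follows. This is a hypothetical sketch: the severity names, the `Regression` record, and the sample entries are invented for illustration and do not reflect Bugzilla's actual schema or the real state of any bug.

```python
from dataclasses import dataclass

# Hypothetical severity scale, most severe first; Bugzilla's real
# field values may differ.
SEVERITY_RANK = {"critical": 0, "high": 1, "normal": 2, "low": 3}


@dataclass
class Regression:
    bug_id: int
    title: str
    severity: str
    confirmations: int  # distinct users who could reproduce it


def sort_regressions(bugs):
    """Most severe first; among equal severity, most-confirmed first."""
    return sorted(bugs,
                  key=lambda b: (SEVERITY_RANK[b.severity], -b.confirmations))


if __name__ == "__main__":
    # Invented sample entries, not real bug data.
    sample = [
        Regression(1, "USB audio missing after resume", "normal", 1),
        Regression(2, "display dead after resume, session lost", "high", 3),
        Regression(3, "desktop freeze", "high", 0),
    ]
    for bug in sort_regressions(sample):
        print(bug.bug_id, bug.severity, bug.confirmations, bug.title)
```

Sorting on a tuple puts severity first and uses the confirmation count only to break ties, so a single-reporter high-severity bug still outranks a widely confirmed minor one.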

It would need some triaging and classifying and I am willing to help with 
that.

> Right now it's released when he feels it "good enough". This can be
> very subjective, but I'd think that "good enough" basically means
> that the kernel will be able to live in its stable branch without
> major changes and without reverting features.

Okay, so those are two different definitions of stable. I mean stable enough
for (adventurous) end users, whereas this is more of a development point of
view.
 
> Also, you have to consider that there are several types of users.
> Some of them are developers who will run a latest -git kernel at
> some point. Some of them will be enthusiasts waiting for a feature,
> and who will run every -rc kernel once the feature is merged, to
> ensure it does not break before the release. There are also janitors
> and the curious ones who'll basically run a few of the last -rc as
> time permits to see if they can spot a few last-minute issues before
> the release. There are the brave ones who systematically download
> the dot-0 release once Linus announces it and will proudly run it
> to show their friends how it's better than the last one. There are
> those who need a bit of stability (eg: professional laptop or home
> server) and will prefer to wait for a few stable releases to ensure
> they won't waste their time on a big stupid issue that all other ones
> above will have immediately spotted for them. And there are the ones
> who run production servers who will either use distro kernels or
> long term stable kernels, with a more or less long qualification
> process between upgrades.

Yes, stable enough for whom? I see.

> It's just an ecosystem where you have to find your place. From your
> description, I think you're before the last ones above, you need
> something which works, even though it's not critical, so you could
> very well wait for 2-3 stable updates before upgrading (that does
> not prevent you from testing earlier on other systems if you want
> to test performance, new features, regressions, etc...).

ACK.

> It's not really advisable to call dot-0 releases "unstable" because
> it will only result in shifting the adoption point between the user
> classes above. We need to have enthusiasts who proudly say "hey
> look, dot-0 and it's already rock solid". We've all seen some of them
> and they're the ones who help reporting issues that get fixed in the
> next stable release.

I do think the claim should be honest. "Stable" IMHO is not, at least from
a user's point of view. "Unstable" isn't either, because a dot-0 kernel is
not guaranteed to be unstable ;). So I agree with Rafael's "major release"
approach.

> I think that the most reasonable thing to do is to assume your need
> for stability and always refrain from running on the latest release.
> 
> Speaking for myself, I tend to run rock solid kernels for my data (my
[...]
> You see, there's a kernel for everyone, and for every usage. You just
> have to make your choice. And when you don't know or don't want to
> guess, stick to the distro's kernel.

Yes. As told already I will rebalance my decision on which kernel to use. 
And I now better understand some of the problems. Thanks.

But beyond that, I do think it's worth thinking about ways to improve the
process of ensuring as much stability as is sensibly possible. A dot-0 kernel
won't be error-free, but I do not find it satisfying to simply declare the
current process "the best we can have". And I do think it can be
improved upon. I do not do kernel development, but I am willing to help
with collecting information about the current state of the kernel and to
help with bug triaging as well as I can, whenever I manage to find the time.
I do have some experience with quality management, as I coordinated the beta
test of some AmigaOS versions, but that was in a closed group. Here it's a
different scale and I believe it needs somewhat different approaches.

I will reply to other posts in this thread over the next few days.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: stable? quality assurance?
       [not found]     ` <AANLkTilGjfx9sb66qVfZn1SeFPURHUrrdE7JCrild8VX@mail.gmail.com>
  2010-07-12 12:35       ` Fwd: " Marcin Letyns
@ 2010-07-12 15:56       ` David Newall
  2010-07-12 17:48         ` Marcin Letyns
                           ` (2 more replies)
  1 sibling, 3 replies; 72+ messages in thread
From: David Newall @ 2010-07-12 15:56 UTC (permalink / raw)
  To: Marcin Letyns; +Cc: Linux Kernel Mailing List

Marcin,

>> I don't expect fair consideration of these comments; why change when
>> shooting the messenger is so much more satisfying?
>>     
Q.E.D.


First, for the sake of brevity, I want it agreed that we're talking 
about new kernels, not those which are old, time-tested and patched.

I didn't notice anyone say they want Linux development to slow down; 
rather, and not just in this thread but in many threads before, that 
kernels released as "stable" fail to meet the common meaning of that 
word; and this needs to be improved.  Predictably, the common response 
sounds a bit like "shut up, go away, you're an idiot, it doesn't happen 
to me."  These are not useful as they serve not one whit to improve the 
situation, but give pause to those who might otherwise want to bring up 
a valid issue, once more.

Expectations are key to the problem.  When Linus says, "here is a shiny 
new, stable kernel", he creates expectations.  When that kernel proves 
unstable, those expectations are dashed and confidence in Linux 
suffers.  There's no reason why development methods need to change in 
order to reduce the number of flaky "stable" kernels.  It would be 
sufficient to replace the somewhat deceptive word "stable" with one that 
is more accurate; beta or gamma test make sense as they already have 
industry acceptance.  Clearly "stable" is not appropriate, as implicitly 
agreed by others who have advised: "don't use in production"; "wait at 
least a year"; and more.

Thus 2.6.34 is the latest gamma-test kernel.  It's not stable and I 
doubt anybody honestly thinks otherwise.

As to whether other operating systems are stable, well that's a fair 
question.  I agree that few large bodies of computer code are flawless, 
and so stability can be relative.  In that spirit I venture to put the 
stipulated kernels into order of decreasing reliability: Best is BSD, 
Solaris & OS X; then Windows; and then there's Linux.  If named 
distributions had been included, the list would look better (for us); 
they'd go in the first group.  Thank goodness for the Debian, Red Hat 
and Novell (to name just a few) for giving the world something which 
does, at least largely, meet expectations.

* Re: stable? quality assurance?
  2010-07-12 15:43       ` Martin Steigerwald
@ 2010-07-12 17:36         ` Willy Tarreau
  2010-07-12 19:56           ` Martin Steigerwald
  2010-07-12 17:55         ` Stefan Richter
  1 sibling, 1 reply; 72+ messages in thread
From: Willy Tarreau @ 2010-07-12 17:36 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

Hi Martin,

On Mon, Jul 12, 2010 at 05:43:56PM +0200, Martin Steigerwald wrote:
> > Among the things he explained, I remember that one primary concern
> > was the inability to slow down development. I mean, if he waits 2 more
> > weeks for things to stabilize, then there will be two more weeks of
> > crap^H^H^H^Hdevelopment merged in next merge window, so in fact this
> > will just shift dates and not quality.
> 
> Would it make that much of a difference? Linus could still say no to 
> obvious crap, couldn't he?

It's not "obvious" crap; it's that the developers will simply have
advanced two more weeks in their schedule, so their merge will
be larger, as it will contain some parts that would have gone into the next
release had the kernel been released earlier. And it will not be possible to
delay merging because among them there's always the killer feature
everybody wants. This is the reason for the strict merge window.

> > There are also some regressions that get merged with every pre-release.
> > Thus, assuming he would wait for one more pre-release to merge the
> > fixes you spotted, 2 or 3 more would appear, so there's a point where
> > it must be decided when to release.
> 
> Some sort of classifying bugs could help here I think. Something that 
> helps Linus to decide whether it is worth to do another release candidate 
> round or not.

Maybe sometimes that could indeed help, but that must not be done too
often, otherwise releases slip and patches get even bigger.

(...)
> I do 
> think that the Radeon KMS does not work after resume bug (#15969) does 
> qualify since it causes loss of data handled by the current X session(s) - 
> sure I normally save my stuff before hibernating, but... And it actually 
> had a patch that has been tested!

Then the problem should be examined from this side: why didn't this patch get
merged in time? Maybe the maintainer needed more time to recheck it, maybe
he was on holiday, maybe he was ill on the wrong day, maybe he had already
merged tons of fixes and preferred to keep this one for next time, ... But
even if there are fixes pending, this should not be a reason to *delay*
releases, otherwise we go back to the problem above, with the added problem
of new regressions being reported that also have tested fixes available...

(...)
> Maybe an approach would be to dynamically generate the list from all bug 
> reports marked for 2.6.34 versions and have it posted to kernel mailing 
> list after every rc. This way bug #15969 would at least have been in the 
> list of known regressions.

In fact, Rafael regularly emits this list, and the respective maintainers
are informed. That means to me that there's little hope that you'll get the
maintainers to merge and send a fix they did not manage to do. What *could*
be improved, though, would be if Linus publicly stated the deadline for the
last fixes, as Greg does with the stable branch. That could encourage some of
them to finish a little merge work in time instead of assuming it's too
late.

> Bugzilla severity and priority fields or something similar could be used to 
> set the importance of a bug report and the regression list could be sorted 
> by importance. One important criterion also would be whether someone could 
> confirm it, reproduce it. Even when I reported those desktop freezes, 
> unless someone confirmed them it might just happen for me. Well a "confirm" 
> or vote button might be good, so that the amount of confirmations could be 
> counted. 

Maybe that could help, but it will not necessarily be the best solution. Keep
in mind that some issues may be more important but still reported only by one
user. If one reports FS corruption, you certainly don't want to wait for a few
other ones to confirm the bug for instance. Security issues don't need counting
either.
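The objection above suggests a refinement of the vote-counting idea: confirmation counts can gate ordinary reports, while some categories go straight onto the list after a single report. A minimal sketch, assuming hypothetical category labels (`fs-corruption`, `security`) and an arbitrary threshold of two confirmations; none of these names come from kernel.org's Bugzilla.

```python
from dataclasses import dataclass

# Hypothetical categories that never wait for confirmations:
# a single report of filesystem corruption or a security hole is enough.
NO_CONFIRMATION_NEEDED = {"fs-corruption", "security"}


@dataclass
class Report:
    bug_id: int
    category: str
    confirmations: int  # independent users who reproduced the bug


def needs_attention(report: Report, min_confirmations: int = 2) -> bool:
    """Decide whether a report belongs on the list for the next -rc."""
    if report.category in NO_CONFIRMATION_NEEDED:
        return True
    return report.confirmations >= min_confirmations


def triage(reports, min_confirmations: int = 2):
    """Keep reports that need attention; critical categories come first."""
    flagged = [r for r in reports if needs_attention(r, min_confirmations)]
    # False sorts before True, so critical categories lead; ties are broken
    # by descending confirmation count (Python's sort is stable otherwise).
    return sorted(flagged,
                  key=lambda r: (r.category not in NO_CONFIRMATION_NEEDED,
                                 -r.confirmations))


if __name__ == "__main__":
    sample = [
        Report(1, "fs-corruption", 0),
        Report(2, "driver", 1),
        Report(3, "driver", 4),
        Report(4, "security", 0),
    ]
    print([r.bug_id for r in triage(sample)])
```

The single-report driver bug is held back until someone else reproduces it, while the corruption and security reports are listed immediately with zero confirmations.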

(...)
> > It's not really advisable to call dot-0 releases "unstable" because
> > it will only result in shifting the adoption point between the user
> > classes above. We need to have enthusiasts who proudly say "hey
> > look, dot-0 and it's already rock solid". We've all seen some of them
> > and they're the ones who help reporting issues that get fixed in the
> > next stable release.
> 
> I do think the claim should be honest. "stable" IMHO is not, at least from 
> a user's point of view. "unstable" isn't either, cause a dot-0 kernel is 
> not guaranteed to be unstable ;). So I agree with the major release kernel 
> approach from Rafael.

But it's also the starting point of the stable branch. And what about the
-stable branch itself? Sometimes an awful bug will prevent the kernel from
even booting for most users, and a single patch will be present in the
stable branch to fix this early. Same if a major security issue gets
discovered at the time of release: it's possible that the stable branch
only contains one patch. That does not make it any more stable than
the main branch either, even though it's called "stable". Maybe we should
indicate on www.kernel.org that a new release has generally received
little testing but should be good enough for experienced users to test,
and that stable releases before .3 or .4 are not recommended for general
use.

> But beyond that, I do think it's worth thinking about ways to improve the 
> process of ensuring as much stability as sensibly possible. A dot-0 kernel 
> won't be error-free - but I find just claiming the current process as "the 
> best we can have" not actually satisfying. And I do think it can be 
> improved upon. I do not do kernel development, but I am willing to help 
> with collecting information about the current state of the kernel, help 
> with bug triaging as good as I can and manage to take time. I do have some 
> experience with quality management as I coordinated the betatest of some 
> AmigaOS versions, but then this has been in a closed group. Here it's a 
> different scale and I believe it needs somewhat different approaches.

In fact, I think we're at a point where the development process scales
linearly with every brain and every pair of eyeballs. There are two
orthogonal axes to scale, one on the quality and one on the quantity.
Both are required, but the time spent on one is not spent on the other
one. Customers want quantity (features) and expect implicit quality.
It is possible for some people to bring a lot of added value, a lot
more than they would through their share of brain time on code. This is
the case for Rafael and Greg who noticeably enhance quality, but it's
not limited to them either. Code reviews, bug reviews, -next branch, etc...
are all geared towards quality. But one thing is sure: there are far
fewer people working on quality than there are working on features, so I
think that if you want to help, there is possibly a way to noticeably
improve quality with one more guy there, though you have to find how
to spend that time efficiently!

Regards,
Willy


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: stable? quality assurance?
  2010-07-12 15:56       ` David Newall
@ 2010-07-12 17:48         ` Marcin Letyns
  2010-07-12 18:00         ` Stefan Richter
  2010-07-13 16:50         ` Theodore Tso
  2 siblings, 0 replies; 72+ messages in thread
From: Marcin Letyns @ 2010-07-12 17:48 UTC (permalink / raw)
  To: David Newall; +Cc: Linux Kernel Mailing List

2010/7/12 David Newall <davidn@davidnewall.com>:
>
> First, for the sake of brevity, I want it agreed that we're talking about
> new kernels, not those which are old, time-tested and patched.
>
> I didn't notice anyone say they want Linux development to slow down; rather,
> and not just in this thread but in many threads before, people said that kernels
> released as "stable" fail to meet the common meaning of that word, and this
> needs to be improved.

I remember when Greg (correct me if I'm wrong) said something like
there are no more stable releases; it is the distros that should
choose a 'proper' kernel. This seems to be working well: Ubuntu
usually ships a kernel one release older, as does Debian (though it is
much more restrictive), and so do some other distros. Those who want to
live on the bleeding edge choose Fedora with the latest kernel etc.
Personally, I consider the LTS kernel to be the stable one and IMHO, as
someone said earlier in this thread, the latest mainline kernel
shouldn't be called stable, but something else.

> Predictably, the common response sounds a bit like
> "shut up, go away, you're an idiot, it doesn't happen to me."  These are not
> useful as they serve not one whit to improve the situation, but give pause
> to those who might otherwise want to bring up a valid issue, once more.

Yes, I apologize for this. After reading your response, such
complaints are much clearer to me.

> There's no
> reason why development methods need to change in order to reduce the number
> of flaky "stable" kernels.  It would be sufficient to replace the somewhat
> deceptive word "stable" with one that is more accurate; beta or gamma test
> make sense as they already have industry acceptance.  Clearly "stable" is
> not appropriate, as implicitly agreed by others who have advised: "don't use
> in production"; "wait at least a year"; and more.
>
> Thus 2.6.34 is the latest gamma-test kernel.  It's not stable and I doubt
> anybody honestly thinks otherwise.

This is the whole point IMHO. :D Fully agree with you here.

> As to whether other operating systems are stable, well that's a fair
> question.  I agree that few large bodies of computer code are flawless, and
> so stability can be relative.  In that spirit I venture to put the
> stipulated kernels into order of decreasing reliability: Best is BSD,
> Solaris & OS X; then Windows; and then there's Linux.  If named
> distributions had been included, the list would look better (for us); they'd
> go in the first group.  Thank goodness for the Debian, Red Hat and Novell
> (to name just a few) for giving the world something which does, at least
> largely, meet expectations.
>

In my opinion you shouldn't compare the latest Linux kernel to other
operating systems (though such a comparison would be fair if the latest
Linux kernel were a 'real' stable one); rather, you should compare
proper Linux distributions: Debian and RHEL to FreeBSD and Solaris,
OpenSuse and Kubuntu to Windows and OS X, etc. Otherwise, it's
like comparing some *BSD development branch to Debian.

A situation similar to the one described in this thread arises with
Fedora. Some people (Linux newbies etc.) may consider Fedora
just another ordinary Linux distribution, but they're wrong.
Fedora usually ships the latest, experimental stuff, and if some
newbie (or even a developer) decides to use Fedora and then discovers
that things simply break, he may conclude that Linux is a mess. Fedora
shipped the KDE 4.0 development release and even Linus was taken in,
because he probably thought it was a stable KDE release. IMHO there
should be a notice about what people have to deal with.

* Re: stable? quality assurance?
  2010-07-12 15:43       ` Martin Steigerwald
  2010-07-12 17:36         ` Willy Tarreau
@ 2010-07-12 17:55         ` Stefan Richter
  1 sibling, 0 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-12 17:55 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel, Willy Tarreau

Martin Steigerwald wrote:
> Bugzilla severity and priority fields or something similar could be used to 
> set the importance of a bug report and the regression list could be sorted 
> by importance. One important criterion would also be whether someone else could
> confirm or reproduce it. Even when I reported those desktop freezes,
> unless someone confirmed them, it might have been happening only to me. A "confirm"
> or vote button might be good, so that the number of confirmations could be
> counted.

"I can reproduce it" comments are often very helpful.  "It is important
to me (and it should be to you too)" comments perhaps not so much.

If a bug doesn't make any progress, it may be because the cause of the
bug (i.e. which subsystem is at fault or when the bug was introduced) is
not known well enough.  In such a case, more reproducers won't really
help (let alone stating that it is important to somebody); then somebody
needs to delve deeper into it and narrow the cause further down.

A bug which can be reproduced by several people is usually a bug that
can be reproduced quite reliably, and hence is a bug whose cause can
likely be found by bisection.  A bug report with the culprit git
commit ID attached (at least as far as the reporter could determine),
Cc'd to the author and committer of that commit, has a better chance of
getting fixed quickly than others.
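As a rough illustration of why bisection narrows a regression down so quickly, here is a toy bash sketch of the search itself. The commit names c0..c8 and the is_bad function are hypothetical stand-ins for "build this kernel, boot it, try to reproduce"; the sketch assumes the oldest commit is good, the newest is bad, and that the regression is a single transition point.

```bash
#!/usr/bin/env bash
# Toy bisection: commits are ordered oldest to newest, the first is known
# good and the last is known bad. Each loop iteration halves the range, so
# n candidates need about log2(n) tests instead of n.

# first_bad COMMIT... : prints the first commit for which is_bad succeeds.
# Assumes every commit after the first bad one is also bad.
first_bad() {
    local commits=("$@")
    local lo=0 hi=$(( ${#commits[@]} - 1 ))   # lo: known good, hi: known bad
    while (( hi - lo > 1 )); do
        local mid=$(( (lo + hi) / 2 ))
        if is_bad "${commits[mid]}"; then
            hi=$mid                           # regression is at or before mid
        else
            lo=$mid                           # regression is after mid
        fi
    done
    printf '%s\n' "${commits[hi]}"
}

# Hypothetical test: pretend the regression was introduced in commit c5.
is_bad() {
    case "$1" in
        c5|c6|c7|c8) return 0 ;;
        *) return 1 ;;
    esac
}

first_bad c0 c1 c2 c3 c4 c5 c6 c7 c8   # prints: c5
```

Here nine candidates need only three tests (c4, c6, c5). In a real kernel tree git does this bookkeeping for you: "git bisect start", "git bisect bad v2.6.34", "git bisect good v2.6.33", then "git bisect good" or "git bisect bad" after testing each kernel git checks out, and "git bisect reset" when done.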

So, votes don't help IMO; good reports do.  And the reports need to be
early enough --- i.e. somebody needs to run -rc kernels --- since coming
up with a fix, validating the fix, and merging it may take time.

If there is little progress on a regression for which at least the
faulty subsystem is known, and the release goes by, the merge window
opens, and you see a pull request for that subsystem, then reply to that
pull request with a friendly reminder that there is still an unresolved
regression in that subsystem waiting for attention.

[...]
> As told already I will rebalance my decision on which kernel to use.

If or when you cannot spare resources to test a kernel yourself (be it
Linus' final release, or an -rc, not to mention linux-next), you can
also look out for Rafael's regression lists around the time of a final
release, to get a picture whether it is a worse or better one.
-- 
Stefan Richter
-=====-==-=- -=== -==--
http://arcgraph.de/sr/

* Re: stable? quality assurance?
  2010-07-12 15:56       ` David Newall
  2010-07-12 17:48         ` Marcin Letyns
@ 2010-07-12 18:00         ` Stefan Richter
  2010-07-12 19:58           ` David Newall
  2010-07-13 16:50         ` Theodore Tso
  2 siblings, 1 reply; 72+ messages in thread
From: Stefan Richter @ 2010-07-12 18:00 UTC (permalink / raw)
  To: David Newall; +Cc: Marcin Letyns, Linux Kernel Mailing List

David Newall wrote:
> Thus 2.6.34 is the latest gamma-test kernel.  It's not stable and I
> doubt anybody honestly thinks otherwise.

It works stably for what I use it for.

If it doesn't for you, then I hope you are already in contact with the
respective subsystem developers to get the regressions that you
experience fixed.
-- 
Stefan Richter
-=====-==-=- -=== -==--
http://arcgraph.de/sr/

* Re: stable? quality assurance?
  2010-07-11  7:18 stable? quality assurance? Martin Steigerwald
                   ` (2 preceding siblings ...)
  2010-07-11 13:56 ` Lee Mathers
@ 2010-07-12 19:46 ` Nix
       [not found] ` <AANLkTimEdVsmIgXBbmhsq75ElQvGAI8avsM8-wlDpm4z@mail.gmail.com>
  4 siblings, 0 replies; 72+ messages in thread
From: Nix @ 2010-07-12 19:46 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

On 11 Jul 2010, Martin Steigerwald said:

> 2.6.34 was a disaster for me: bug #15969 - patch was available before 
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well 
> as most important two complete lockups - well maybe just X.org and radeon 
> KMS, I didn't start my second laptop to SSH into the locked up one - on my 
> ThinkPad T42. I fixed the first one with the patch, but after the lockups I 
> just downgraded to 2.6.33 again.
[...]
> hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
>
> on this mailing list just a moment ago. But then 2.6.33 did hang with 
> TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since 
> 2.6.34 did not hang with it anymore which was a reason for me to try 
> 2.6.34 earlier.

To introduce yet more anecdata into this thread, I too had problems with
TuxOnIce-driven suspend/resume from just post-2.6.32 to just pre-2.6.34.
The solution was, surprise surprise, to *raise a bug report*, whereupon
in short order I had a workaround. In 2.6.34, the problem vanished as
mysteriously as it appeared, as did the bug whereby X coredumped and the
screen stayed dark forever upon quitting X. 2.6.34 and 2.6.34.1 have
worked better for me than any kernel I've used since 2.6.30, with no
bugs noticeable on any of my machines (that's a first since 2.6.26).

I speculate that there may be some subtle memory overwrite inside
the Radeon KMS and/or DRM code, obscure enough that it is
relatively easily perturbed by changes elsewhere in the kernel.

But nonetheless, one cannot extrapolate from a single bug in a subsystem
as complex as DRM/KMS to the quality of the entire kernel. This is
doubly true given the degree of difference between different cards
labelled as Radeons: I'd venture to state that most of the Radeon bugs
I've seen flow past over the last year or so only affect a small subset
of cards: but if you add them all up, it's likely that most users have
been bitten by at least one. But the problem here is not the kernel
developers, nor the kernel quality: it's that ATI Radeons are a
horrifically complicated and tangled web of slightly variable hardware.
(In this they are no different from any other modern graphics card.)


Martin, might I suggest considering stable kernels 'experimental' until
at least .1 is out? Before Linus releases a kernel, its only users are
dedicated masochists and developers: after the release, piles of regular
early adopters pour in, and heaps of bug reports head to lkml and fixes
head to -stable. The .1 kernels, with fixes for some of those, are the
first you can really call *stable*, as they've got fixes for bugs
isolated after testing by a much larger userbase of suckers.

  -- N., dedicated sucker and masochist

* Re: stable? quality assurance?
  2010-07-12 17:36         ` Willy Tarreau
@ 2010-07-12 19:56           ` Martin Steigerwald
  2010-07-12 23:03             ` Stefan Richter
  0 siblings, 1 reply; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-12 19:56 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel

On Monday 12 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy,

for now I downgraded to 2.6.33.2 and started a compile of 2.6.33.6. I hit
yet another bug, but that's a TuxOnIce one (nevertheless reported at
bugzilla.kernel.org as #15873). And after I rebooted because the resume
did not work, the machine locked up again while just playing an AVI
file from a photo SD card; I *think* it's that dubious freeze bug I
mentioned before. Since I am giving a Linux training this week, I decided
to downgrade now. Again I didn't try to SSH into the machine, but it was
after eight o'clock after a long work day, it's really hot here, and I just
couldn't face collecting information about the bug, work that
might easily have taken two or more hours. Actually, I also do not know
what to do with such a random freeze bug. How do I best approach it without
sinking insane amounts of time into it?

The last freeze bug I had was with my ThinkPad T23 when plugging in and
later removing the eSATA PCMCIA card. It worked for quite a few kernel
versions, but starting with a certain version it began to freeze on
removal, up to 2.6.33, where I last tried, I think. And there I had at
least found out in which situation it happens.

What do I do with such bugs? Back then I just decided not to use the eSATA
PCMCIA card in that ThinkPad T23 again, which isn't that unreasonable, I
think. I didn't even report it, which, granted, might be the reason that
it's not yet fixed.

I am willing to do some testing, but I also like to use Linux. And beyond a
certain amount it's just too much for me. Frankly, for me it's all
happening too fast. I experienced it with some KDE 4 versions (later ones
like 4.3 and 4.4!) where I reported so many bugs I easily stumbled upon
that at some point I just gave up reporting anything. Sure, I wanted Radeon
DRM KMS. It's great. But I really hope things will be more stable again
soon. A new feature is great, when it works. That said, I am not sure
whether the recent freeze bug on my ThinkPad T42 is related to Radeon DRM.

I think I will wait for 2.6.34.2 or .3 and then try again. If it happens
again, hopefully at a moment when I have the nerve to deal with such bugs, I
will fire up my second notebook and try to SSH into the machine. If that
works, I could at least look into dmesg and the X.org logs.

That's what I meant: for me personally, the balance is lost. The kernel does
not have to be perfect, but I am experiencing just too many issues,
including quite nasty ones, at the moment. 2.6.33.2 with userspace software
suspend was stable, as was 2.6.32 with TuxOnIce. Thus I am trying 2.6.33.6.

> On Mon, Jul 12, 2010 at 05:43:56PM +0200, Martin Steigerwald wrote:
> > > Among the things he explained, I remember that one of primary
> > > concern was the inability to slow down development. I mean, if he
> > > waits 2 more weeks for things to stabilize, then there will be two
> > > more weeks of crap^H^H^H^Hdevelopment merged in next merge window,
> > > so in fact this will just shift dates and not quality.
> > 
> > Would it make that much of a difference? Linus could still say no to
> > obvious crap, couldn't he?
> 
> It's not "obvious" crap, it's that the developers will simply have
> advanced two more weeks ahead of their schedule, so their merge will
> be larger as it will contain some parts that ought to be in next
> release should the kernel be released earlier. And it will not be
> possible to delay merging because among them there's always the killer
> feature everybody wants. This is the reason for the strict merge
> window.

Hmmm, it could also be used as two more weeks for testing the new stuff
that is about to go in, but that might just be wishful thinking...

Is Linux kernel development really balanced between feature work and
stabilization work? Currently, at least from my personal perception, it is
not. Development is going so fast: can you all cope with that speed? Maybe
it's just time to *slow it down* a bit? Does it really scale? I am
overwhelmed. Several times I just had enough of it. Others have had other
experiences, so it might just be me having lots of bad luck. What are the
experiences of others?

Actually I think a bit more shift to quality work couldn't harm.

> > > There are also some regressions that get merged with every
> > > pre-release. Thus, assuming he would wait for one more pre-release
> > > to merge the fixes you spotted, 2 or 3 more would appear, so
> > > there's a point where it must be decided when to release.
> > 
> > Some sort of bug classification could help here, I think. Something that
> > helps Linus to decide whether it is worth to do another release
> > candidate round or not.
> 
> Maybe sometimes that could indeed help, but that must not be done too
> often, otherwise releases slip and patches get even bigger.
> 
> (...)
> 
> > I do
> > think that the "Radeon KMS does not work after resume" bug (#15969)
> > does qualify since it causes loss of data handled by the current X
> > session(s) - sure I normally save my stuff before hibernating,
> > but... And it actually had a patch that has been tested!
> 
> Then the problem should be checked on this side : why this patch didn't
> get merged in time ? Maybe the maintainer needed more time to recheck
> it, maybe he was on holiday, maybe he was ill on the wrong day, maybe
> he had already merged tons of fixes and preferred to get this one for
> next time, ... But even if there are fixes pending, this should not be
> a reason to *delay* releases, otherwise we go back to the problem
> above, with also the problem of new regressions reported with tested
> fixes available...
> 
> (...)

Well, it should only be done for major regressions, I think. I still think
some sorting of the regression list by importance and by tested patch
availability could help. I think that the Radeon DRM fix was quite
low-hanging fruit.

> > Maybe an approach would be to dynamically generate the list from all
> > bug reports marked for 2.6.34 versions and have it posted to kernel
> > mailing list after every rc. This way bug #15969 would at least have
> > been in the list of known regressions.
> 
> In fact, Rafael regularly emits this list, and the respective
> maintainers are informed. That means to me that there's little hope
> that you'll get the maintainers to merge and send a fix they did not
> manage to do. What *could* be improved though would be if Linus
> publicly states the deadline for last fixes, as Greg does with the
> stable branch. That could give some of them hope of finishing a little
> merge work in time instead of considering it too late.

Hmmm, I did not find any regression list after 2.6.34-rc5 but before 2.6.35
on the kernel mailing list here. And the bug and its fix came with rc7. What
if the list were generated right after every rc? I wouldn't want to demand
of anyone to do it that often, but with some automation and a team of people
triaging and collecting regressions...

> > Bugzilla severity and priority fields or something similar could be
> > used to set the importance of a bug report and the regression list
> > could be sorted by importance. One important criterion would also be
> > whether someone else could confirm or reproduce it. Even when I reported
> > those desktop freezes, unless someone confirmed them, it might have been
> > happening only to me. A "confirm" or vote button might be good, so
> > that the number of confirmations could be counted.
> 
> Maybe that could help, but it will not necessarily be the best
> solution. Keep in mind that some issues may be more important but
> still reported only by one user. If one reports FS corruption, you
> certainly don't want to wait for a few other ones to confirm the bug
> for instance. Security issues don't need counting either.

Okay, granted. It would just be an indication.

But a complete or desktop freeze bug could lead to huge data loss, too,
depending on when the user last saved his data. So is it really
that much less important?

> > > It's not really advisable to call dot-0 releases "unstable" because
> > > it will only result in shifting the adoption point between the user
> > > classes above. We need to have enthusiasts who proudly say "hey
> > > look, dot-0 and it's already rock solid". We've all seen some of
> > > them and they're the ones who help reporting issues that get fixed
> > > in the next stable release.
> > 
> > I do think the claim should be honest. "stable" IMHO is not, at least
> > from a user's point of view. "unstable" isn't either, because a dot-0
> > kernel is not guaranteed to be unstable ;). So I agree with the
> > major release kernel approach from Rafael.
> 
> But it's also the starting point of the stable branch. And what about
> the -stable branch itself? Sometimes an awful bug will prevent the
> kernel from even booting for most users, and a single patch will be
> present in the stable branch to fix this early. Same if a major
> security issue gets discovered at the time of release: it's possible
> that the stable branch only contains one patch. That does not make
> it any more stable than the main branch, even though it's called
> "stable". Maybe we should indicate on www.kernel.org that a new
> release has generally received little testing but should be good
> enough for experienced users to test, and that stable releases
> before .3-.4 are not recommended for general use.

I thought about calling it a "major kernel release" or something like that
from dot-0, and then "stable" after the stable patches settle. But on what
criterion would we decide that? Just .3 or .4? Or when there have been some
dot releases with few patches? But then what if Greg just takes a bit longer
to make the next one and it contains more patches for that reason alone?

> > But beyond that, I do think it's worth thinking about ways to improve
> > the process of ensuring as much stability as sensibly possible. A
> > dot-0 kernel won't be error-free, but I do not find it satisfying to
> > just declare the current process "the best we can have".
> > And I do think it can be improved upon. I do not do kernel
> > development, but I am willing to help with collecting information
> > about the current state of the kernel, and to help with bug triaging
> > as well as I can and as time permits. I do have some experience
> > with quality management, as I coordinated the beta test of some
> > AmigaOS versions, but that was in a closed group. Here
> > it's a different scale and I believe it needs somewhat different
> > approaches.
> 
> In fact, I think we're at a point where the development process scales
> linearly with every brain and every pair of eyeballs. There are two
> orthogonal axes to scale, one on the quality and one on the quantity.
> Both are required, but the time spent on one is not spent on the other
> one. Customers want quantity (features) and expect implicit quality.

Don't customers also want stability? I certainly want it. So do many people
running servers, in my experience.

> It is possible for some people to bring a lot of added value, a lot
> more than they would through their share of brain time on code. This is
> the case for Rafael and Greg who noticeably enhance quality, but it's
> not limited to them either. Code reviews, bug reviews, -next branch,
> etc... are all geared towards quality. But one thing is sure: there
> are far fewer people working on quality than there are working on
> features, so I think that if you want to help, there is possibly a way
> to noticeably improve quality with one more guy there, though you have
> to find how to spend that time efficiently!

Yes, and I haven't found that yet. I am not at a stage where I can just read
kernel code and actually understand what it does. Where I might be able to
start helping is with collecting and categorizing bug and regression
information, bug triaging and the like. For some bugs at least. I think there
are bugs where I just do not understand enough to do anything helpful.

Last post for today. Enough of computing.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: stable? quality assurance?
  2010-07-12 18:00         ` Stefan Richter
@ 2010-07-12 19:58           ` David Newall
  2010-07-12 21:11             ` Stefan Richter
                               ` (2 more replies)
  0 siblings, 3 replies; 72+ messages in thread
From: David Newall @ 2010-07-12 19:58 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Marcin Letyns, Linux Kernel Mailing List

Stefan Richter wrote:
> David Newall wrote:
>   
>> Thus 2.6.34 is the latest gamma-test kernel.  It's not stable and I
>> doubt anybody honestly thinks otherwise.
>>     
>
> It works stably for what I use it for.
>   
Mea culpa.  I didn't mean that 2.6.34 is unstable, but that the term 
"stable" is not appropriate for a newly released kernel; "gamma" should 
be used instead.

Merely six months ago 2.6.32 was released; today we're preparing for 
2.6.35; a new kernel every two months!  Perhaps 2.6.31 is truly the 
latest stable kernel; or else 2.6.27 is, which is the other 2.6 on the 
front page of kernel.org.  I'm pretty sure 2.4 is stable (which might 
explain why I see it embedded *much* more frequently than 2.6.)

> If it doesn't for you, then I hope you are already in contact with the
> respective subsystem developers to get the regressions that you
> experience fixed.
>   
(Segue to a problem which follows from calling bleeding-edge kernels 
"stable".)

When reporting bugs, the first response is often, "we're not interested 
in such an old kernel; try it with the latest."  That's not hugely 
useful when the latest kernels are not suitable for production use.  If 
kernels weren't marked stable until they had earned the moniker, for 
example 2.6.27, then the expectation of developers and of users would be 
consistent: developers could expect users to try it again with latest 
stable kernel, and users could reasonably expect that trying it wouldn't 
break their system.

* Re: stable? quality assurance?
  2010-07-12 19:58           ` David Newall
@ 2010-07-12 21:11             ` Stefan Richter
  2010-07-12 21:39             ` Martin Steigerwald
  2010-07-15  7:23             ` david
  2 siblings, 0 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-12 21:11 UTC (permalink / raw)
  To: David Newall; +Cc: Marcin Letyns, Linux Kernel Mailing List

David Newall wrote:
> Stefan Richter wrote:
>> If it doesn't for you, then I hope you are already in contact with the
>> respective subsystem developers to get the regressions that you
>> experience fixed.
>>   
> (Segue to a problem which follows from calling bleeding-edge kernels
> "stable".)
> 
> When reporting bugs, the first response is often, "we're not interested
> in such an old kernel; try it with the latest."

Because bug fixes are continuously going into the new kernels.

> That's not hugely useful when the latest kernels are not suitable for
> production use.

"I have this bug here." - "It might be fixed in 2.6.mn.  Try it." - "I
don't want to because I got burned by 2.6.jk."  Well, then don't do it
and keep using the old buggy kernel.  Or use a forked kernel where
somebody adds bugfix backports and feature backports as you require
them, if that somebody does a really good job.

> If kernels weren't marked stable until they had earned the moniker,
> for example 2.6.27, then the expectation of developers and of users
> would be consistent:

2.6.27.y is what you call stable exactly because none of the boatloads
of bug fixes and improvements of each subsequent 2.6.x release goes into
it anymore.

That's the nature of the beast.  You can't have your cake and eat it too.
Which is why it is important that we keep the regression count in new
kernels low and try to detect and fix regressions as early as possible.
I admit that I do not really help with this myself outside the subsystem
which I maintain.  I usually start running -rc kernels only at later -rcs
(say, -rc5, only sometimes earlier) and don't test them beyond the one
or two configurations that I use personally.  There were occasionally
regressions in the subsystem that I maintain but they were few and
always fixed quickly, and each one was a lesson how to do better.  So,
for that subsystem, the "Latest Stable Kernel" that is advertised on the
front page of kernel.org really and truly /is/ the latest stable release
that is recommended for production use, as far as that subsystem is
concerned.
-- 
Stefan Richter
-=====-==-=- -=== -==--
http://arcgraph.de/sr/

* Re: stable? quality assurance?
  2010-07-12 19:58           ` David Newall
  2010-07-12 21:11             ` Stefan Richter
@ 2010-07-12 21:39             ` Martin Steigerwald
  2010-07-12 22:44               ` Stefan Richter
  2010-07-15  7:23             ` david
  2 siblings, 1 reply; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-12 21:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Newall, Stefan Richter, Marcin Letyns

On Monday 12 July 2010, David Newall wrote:
> Stefan Richter wrote:
> > David Newall wrote:
> >> Thus 2.6.34 is the latest gamma-test kernel.  It's not stable and I
> >> doubt anybody honestly thinks otherwise.
> > 
> > It works stably for what I use it for.
> 
> Mea culpa.  I didn't mean that 2.6.34 is unstable, but that the term
> "stable" is not appropriate for a newly released kernel; "gamma" should
> be used instead.

I indeed think stable should mean "stable for the majority of users". It's
difficult to estimate. But I doubt that every dot-0 release qualified for
that.

> Merely six months ago 2.6.32 was released; today we're preparing for
> 2.6.35; a new kernel every two months!  Perhaps 2.6.31 is truly the
> latest stable kernel; or else 2.6.27 is, which is the other 2.6 on
> the front page of kernel.org.  I'm pretty sure 2.4 is stable (which
> might explain why I see it embedded *much* more frequently than 2.6.)

I have these metrics:

martin@shambhala:~> uprecords -m 20 | cut -c1-70
     #               Uptime | System                                  
----------------------------+-----------------------------------------
     1    36 days, 09:57:31 | Linux 2.6.32.3-tp42-toi-  Tue Jan 12 09:
     2    31 days, 01:07:24 | Linux 2.6.26.5-tp42-toi-  Tue Sep 30 13:
     3    24 days, 13:29:07 | Linux 2.6.33.2-tp42-toi-  Mon May 31 22:
     4    21 days, 15:08:21 | Linux 2.6.29.2-tp42-toi-  Tue Apr 28 22:
     5    19 days, 21:22:14 | Linux 2.6.33.2-tp42-toi-  Tue May 11 17:
     6    19 days, 09:49:05 | Linux 2.6.32.8-tp42-toi-  Fri Mar  5 11:
     7    18 days, 02:31:41 | Linux 2.6.29.6-tp42-toi-  Thu Jul  9 09:
     8    17 days, 12:38:36 | Linux 2.6.28.8-tp42-toi-  Wed Mar 18 10:
     9    16 days, 16:10:28 | Linux 2.6.31-tp42-toi-3.  Tue Sep 22 21:
    10    15 days, 14:39:26 | Linux 2.6.28.4-tp42-toi-  Mon Feb  9 22:
    11    15 days, 13:58:12 | Linux 2.6.27.7-tp42-toi-  Tue Dec  9 22:
    12    13 days, 21:11:06 | Linux 2.6.31-rc7-tp42-to  Mon Aug 31 21:
    13    13 days, 18:34:00 | Linux 2.6.29.2-tp42-toi-  Wed May 27 19:
    14    12 days, 21:54:18 | Linux 2.6.26.5-tp42-toi-  Fri Oct 31 13:
    15    10 days, 22:02:14 | Linux 2.6.28.7-tp42-toi-  Thu Feb 26 16:
    16    10 days, 16:29:02 | Linux 2.6.33.2-tp42-toi-  Fri Jun 25 19:
    17    10 days, 08:04:52 | Linux 2.6.26.2-tp42-toi-  Thu Sep 18 14:
    18    10 days, 03:52:30 | Linux 2.6.31.3-tp42-toi-  Thu Oct 15 09:
    19     9 days, 22:03:29 | Linux 2.6.31.5-tp42-toi-  Tue Nov  3 11:
    20     9 days, 00:24:22 | Linux 2.6.29.2-tp42-toi-  Thu Jun 25 14:
----------------------------+-----------------------------------------
-> 116     0 days, 00:52:03 | Linux 2.6.33.6-tp42-toi-  Mo
----------------------------+-----------------------------------------
1up in     0 days, 00:31:56 | at                        Mon Jul 12 23:
t10 in    15 days, 13:47:24 | at                        Wed Jul 28 12:
no1 in    36 days, 09:05:29 | at                        Wed Aug 18 08:
    up   608 days, 02:40:08 | since                     Thu Sep 18 14:
  down    54 days, 06:12:57 | since                     Thu Sep 18 14:
   %up               91.808 | since                     Thu Sep 18 14:

There are 228 entries in total since 2.6.26, with

martin@shambhala:~> uprecords -m 300 | cut -c1-70 | grep "0 days" | wc -l
148

of them lasting less than one day.
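
The plain grep "0 days" above also matches rows such as "10 days, 22:02:14" and the "1up in 0 days" summary line. The 148 therefore probably overstates the number of sub-day sessions somewhat. Below is a minimal sketch of a stricter count. It assumes the uprecords column layout shown above. The helper names count_short_sessions, to_seconds and percent_up are made up for illustration. percent_up simply recomputes the %up figure as up / (up + down).

```shell
# Sketch only: helper names are hypothetical, and the patterns assume the
# uprecords layout shown above (cut to 70 columns).

# Count ranked entries whose uptime is below one day. Anchoring on the
# rank number keeps rows like "10 days" and the "1up in" summary row
# from matching, unlike a plain grep "0 days".
count_short_sessions() {
    grep -cE '^ *(-> *)?[0-9]+ +0 days,'
}

# Convert an uprecords duration such as "608 days, 02:40:08" to seconds.
to_seconds() {
    printf '%s\n' "$1" |
        awk '{ split($3, t, ":"); print $1 * 86400 + t[1] * 3600 + t[2] * 60 + t[3] }'
}

# Percentage of time up, given the "up" and "down" durations.
percent_up() {
    awk -v u="$(to_seconds "$1")" -v d="$(to_seconds "$2")" \
        'BEGIN { printf "%.3f\n", 100 * u / (u + d) }'
}

# Usage (uprecords must be installed):
#   uprecords -m 300 | cut -c1-70 | count_short_sessions
#   percent_up "608 days, 02:40:08" "54 days, 06:12:57"
```

With the up and down totals above, percent_up reproduces the 91.808 that uprecords reports.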

Of course these numbers can't be read without knowing my experiences and 
the reasons for rebooting; sometimes I had just messed up some kernel 
option and compiled another kernel.

AFAIR 2.6.26 up to 2.6.32 have been fine, except 2.6.30, where TuxOnIce 
just didn't work, but I am not yet sure whether this was caused by 
TuxOnIce or by some problem in the general hibernation infrastructure. I 
then just skipped 2.6.30. Since I only tried 2.6.31 on my T42, I got a 
whopping uptime of over 100 days with 2.6.29 on my T23! That's stable. 
Any kernel that reproducibly reaches more than 15 or 30 days is quite 
stable in my own subjective view. Most kernels that got that far would 
likely have lasted much longer if I hadn't just compiled the next one, be 
it a dot release or a major release.

All this without Radeon KMS!

2.6.33.2 was only stable when I used Radeon KMS without TuxOnIce. OK, so 
it might be a TuxOnIce problem, but at least the quite frequent hangs on 
hibernation that I had with 2.6.33.2, at the point where the screen goes 
black for a few seconds and then comes back, were gone with 2.6.34. Maybe 
they are also gone with 2.6.33.6, since it carries some more Radeon DRM 
fixes.

2.6.34 has not yet reached an uptime of more than 2 or 3 days.

Well, maybe Nix is right and it's just that Radeon KMS has not been 
stabilized enough, while the rest of the kernel is quite stable.

And if the combination of 2.6.33 (now .6) and userspace software suspend 
works for me (for the first time; often it was TuxOnIce that worked, but 
none of the in-kernel methods I tried from time to time), so be it for the 
time being, even if userspace software suspend is way slower and doesn't 
saturate the disk when writing the image.

> > If it doesn't for you, then I hope you are already in contact with
> > the respective subsystem developers to get the regressions that you
> > experience fixed.
> 
> (Segue to a problem which follows from calling bleeding-edge kernels
> "stable".)
> 
> When reporting bugs, the first response is often, "we're not interested
> in such an old kernel; try it with the latest."  That's not hugely
> useful when the latest kernels are not suitable for production use.  If
> kernels weren't marked stable until they had earned the moniker, for
> example 2.6.27, then the expectation of developers and of users would
> be consistent: developers could expect users to try it again with
> latest stable kernel, and users could reasonably expect that trying it
> wouldn't break their system.

I think that's really a question of how to attract more widespread 
testing. For more widespread testing, a kernel needs to be stable enough 
that enough users are willing to deal with it. But without more 
widespread testing, it might not get there.

I just dropped 2.6.34 for now and I will wait for more dot releases. Maybe 
I really am the only one for whom 2.6.34 doesn't work, or maybe other 
people just did the same out of frustration without saying so here or in 
Bugzilla.

Maybe providing better ways to report bugs and gather information, even on 
freeze bugs, without too much manual setup could help. I certainly think 
that the enhanced DrKonqi crash reporter in KDE 4.3 and up has helped 
users provide *good bug reports*. Maybe there could be something like that 
for the kernel, plus an easy option to have the kernel store backtraces 
even for hard crashes. Unfortunately there is no reset button on 
notebooks, so memory might be the wrong place. One could dedicate some 
ring buffer space on the swap partition for that, or something similar; 
that area should be writable even when no filesystem is working anymore. 
On the next reboot, the bug report application would recover the crash 
data from there. This would impose a risk that, on severe memory 
corruption, the kernel writes crash data somewhere it shouldn't. A USB 
stick comes to mind, but what if the USB stack doesn't work anymore?

Then again, not every bug is a freeze bug, and maybe something could be 
done for non-freeze bugs, like an application that records selected data 
while the user reproduces the bug, just as the enhanced DrKonqi collects 
crash data and even helps the user install the necessary debug packages.

But I think when a kernel behaves too unstably for lots of users, they 
just drop it. Some bugs are okay, but freeze bugs especially, and 
filesystem corruption bugs even more so, scare away anyone who isn't a 
die-hard kernel debugger bisecting a kernel a day.

Maybe I just had lots of bad luck, so I would love to hear about other 
experiences; some people have already said 2.6.34 works pretty stably for 
them.

I will leave 2.6.34.1 on my T23, which has a Savage graphics chip that may 
never get KMS (who knows), and on the workstation at work, which doesn't 
use Radeon KMS because of its rock-solid Debian Lenny userspace. Maybe 
this will at least shed some light on whether most of my issues have been 
Radeon KMS related.

As a side note: Ext4 is absolutely rock solid for me! So is XFS on my 
T23, and even BTRFS for /home on the T23 and for a work directory on the 
workstation (not yet on my production T42).

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: stable? quality assurance?
  2010-07-12 21:39             ` Martin Steigerwald
@ 2010-07-12 22:44               ` Stefan Richter
  0 siblings, 0 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-12 22:44 UTC
  To: Martin Steigerwald; +Cc: linux-kernel, David Newall, Marcin Letyns

Martin Steigerwald wrote:
> And when the combination of 2.6.33 now .6 and userspace software suspend 
> works for me - for the first time, often it was TuxOnIce that worked, but 
> not any in kernel method I tried from time to time - so be it for the time 
> being, even if userspace software suspend is way slower and doesn't 
> satisfy the disk on writing the image.

BTW, relying in the long term on a fairly fundamental kernel component
that is not in mainline (for whatever reason) almost guarantees you a lot
of recurring pain, one way or another.
-- 
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/

* Re: stable? quality assurance?
  2010-07-12 19:56           ` Martin Steigerwald
@ 2010-07-12 23:03             ` Stefan Richter
  2010-07-13 10:30               ` Martin Steigerwald
  2010-07-15  7:32               ` david
  0 siblings, 2 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-12 23:03 UTC
  To: Martin Steigerwald; +Cc: Willy Tarreau, linux-kernel

Martin Steigerwald wrote:
> I think I wait for 2.6.34.2 or .3 and then try again. If it then happens 
> again, hopefully in a moment where I have nerve to deal with such bugs, I 
> fire up my second notebook and try to SSH into the machine. If that works I 
> at least could look into dmesg and X.org logs.

netconsole might be required.
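
netconsole sends kernel log messages as UDP packets to another machine. An oops printed during a freeze may then be readable on the second notebook even when X hides the console. The sketch below assumes a wired interface whose driver supports netpoll, which netconsole depends on. The helper netconsole_param is hypothetical; it only assembles the documented netconsole= option string. The addresses, ports and interface name are placeholders.

```shell
# Hypothetical helper; the output follows the netconsole= syntax
#   [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
netconsole_param() {
    # $1 src-port, $2 src-ip, $3 dev, $4 tgt-port, $5 tgt-ip,
    # $6 tgt-macaddr (optional; may be left empty)
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-}"
}

# Usage sketch (placeholders throughout). On the laptop that freezes, as root:
#   modprobe netconsole netconsole="$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20)"
#   dmesg -n 8        # send messages of every log level to the consoles
# On the second notebook, listen for the UDP packets
# (traditional netcat syntax; OpenBSD nc uses "nc -u -l 6666"):
#   nc -u -l -p 6666
```

If netconsole is built into the kernel, the same string can be given on the kernel command line as netconsole=... instead of loading the module.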

...
> Is the Linux kernel development really in balance with feature work and 
> stabilization work? Currently at least from my personal perception it is 
> not. Development goes that fast - can you all cope with that speed? Maybe 
> its just time to *slow it down* a bit?

If those who introduced the regressions are identified and asked to debug
and fix them, the balance should be corrected, and perhaps more
precautions will be taken in the future.  Alas, finding the point in
history at which the kernel regressed might take a lot more time than
actually fixing it afterwards.  In that case, maybe give the author of
the bug an estimate of the volunteer hours that were spent on reporting
it, to put its repercussions into perspective.  OTOH I suspect a lack of
responsibility among the developers is not so much the issue here;
rather, the number of people who take the time for -rc tests (not to
mention linux-next tests) _and_ file reports is rather low.  Plus, a
good bug report often requires experience or good intuition, besides
patience and rigor.
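
Finding the point in history at which the kernel regressed is what git bisect automates. It binary-searches the commits between a known-good and a known-bad version, so roughly log2(N) test builds are needed for N commits. The toy demonstration below runs git bisect run on a throwaway repository instead of a kernel tree, using a hypothetical bisect_demo function. For a real regression, the starting point would be something like "git bisect start v2.6.34 v2.6.33". Each candidate would then be built, booted and marked good or bad.

```shell
# Hypothetical demo: build a throwaway linear history of $1 commits in
# which a "regression" appears at commit $2 (must be 2..$1), then let
# git bisect find it. Prints the subject of the first bad commit.
bisect_demo() (
    total=$1 bad_at=$2
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q .
    git config user.name "Bisect Demo"
    git config user.email "demo@example.invalid"
    i=1
    while [ "$i" -le "$total" ]; do
        if [ "$i" -ge "$bad_at" ]; then state=BROKEN; else state=ok; fi
        echo "$state $i" > feature.txt
        git add feature.txt
        git commit -q -m "commit $i"
        i=$((i + 1))
    done
    # First argument is the bad revision, the rest are known-good ones.
    git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" > /dev/null 2>&1
    # git bisect run treats exit status 0 as good and 1 as bad.
    git bisect run sh -c '! grep -q BROKEN feature.txt' > /dev/null 2>&1
    # When the run is done, refs/bisect/bad names the first bad commit.
    git log -1 --format=%s refs/bisect/bad
    git bisect reset > /dev/null 2>&1
    cd / && rm -rf "$repo"
)
```

For example, bisect_demo 16 11 prints "commit 11". With a kernel, each step costs a build and a boot, which is where the hours Stefan mentions go.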

There were discussions in the past on how to attract more enthusiasts who
are willing and able to test prereleases.  But maybe (just maybe) there
are more ways in which the developers themselves could perform more
extensive or more systematic tests.
-- 
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/

* Re: stable? quality assurance?
  2010-07-12 23:03             ` Stefan Richter
@ 2010-07-13 10:30               ` Martin Steigerwald
  2010-07-15  7:32               ` david
  1 sibling, 0 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-13 10:30 UTC
  To: Stefan Richter; +Cc: Willy Tarreau, linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 2218 bytes --]

On Tuesday, 13 July 2010, Stefan Richter wrote:
> ...
> 
> > Is the Linux kernel development really in balance with feature work
> > and  stabilization work? Currently at least from my personal
> > perception it is not. Development goes that fast - can you all cope
> > with that speed? Maybe its just time to slow it down a bit?
> 
> If those who added the regressions are found out and asked to debug and
> fix them, the balance should be corrected and perhaps more precautions
> being taken in the future.  Alas, finding the point in history at which
> the kernel regressed might take a lot more time than to actually fix it
> then.  In that case, maybe give the author of the bug an estimate of
> the volunteered hours that were spent on reporting this bug, to put
> the repercussions into it into perspective.  OTOH I suspect a lack of
> responsibility at the developers is not so much an issue here, more so
> that the number of people who take the time for -rc tests (not to
> mention linux-next tests) and to file reports is rather low.  Plus, a
> good bug report often requires experience or good intuition, besides
> patience and rigor.
> 
> There were discussions in the past on how more enthusiasts who are
> willing and able to test prereleases could be attracted.  But maybe
> (just maybe) there are more ways in which the developers themselves
> could perform more extensive/ more systematic tests.

Well, I have reported it now, although the report contains hardly any 
information on how to reproduce it, nor any other debug information. I 
did not report it before because I didn't find the information I could 
provide very helpful, and until yesterday I thought it might just have 
been these two freezes and that's it. But maybe reporting early is better 
than not reporting at all.
 
Bug 16376 -  random - possibly Radeon DRM KMS related - freezes
https://bugzilla.kernel.org/show_bug.cgi?id=16376

I will look through the logs this afternoon, while my students are 
learning vi/vim, to see whether I'm lucky and find anything, but I doubt 
it.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: stable? quality assurance?
  2010-07-11 14:51   ` Martin Steigerwald
  2010-07-11 17:22     ` Willy Tarreau
  2010-07-11 19:49     ` Stefan Richter
@ 2010-07-13 11:11     ` Alejandro Riveira Fernández
  2010-07-13 12:50       ` rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?) Stefan Richter
  2 siblings, 1 reply; 72+ messages in thread
From: Alejandro Riveira Fernández @ 2010-07-13 11:11 UTC
  To: Martin Steigerwald; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 840 bytes --]

On Sun, 11 Jul 2010 16:51:42 +0200,
Martin Steigerwald <Martin@lichtvoll.de> wrote:


> 
> One reason for a demand for me is best expressed by this question: Does 
> the kernel developer community want to encourage that a group of advanced 
> Linux users - but mostly non-developers - compile their own vanilla or 
> valnilla near kernels, provide wider testing and report a bug now and 
> then?
> 
> I can live with either answer. If not, I just will be much more reluctant 
> to try out new kernels.

 I for one stopped booting into -rc kernels.
 The fact that I still have to patch my kernels with a *one*-liner
 since the 2.6.29 kernel [1] does not give me confidence in the "test,
 report/bisect and it will be fixed" promise some have made in this
 thread.
 
 [1] https://bugzilla.kernel.org/show_bug.cgi?id=13362
 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

* rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
  2010-07-13 11:11     ` Alejandro Riveira Fernández
@ 2010-07-13 12:50       ` Stefan Richter
  2010-07-13 15:35         ` John W. Linville
  2010-07-13 18:06         ` Alejandro Riveira Fernández
  0 siblings, 2 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-13 12:50 UTC
  To: Alejandro Riveira Fernández
  Cc: Martin Steigerwald, linux-kernel, Johannes Berg,
	John W. Linville, linux-wireless

Alejandro Riveira Fernández wrote:
>  I for one stopped booting into -rc kernels.
>  The fact that still have to patch my kernels with a *one* liner
>  since 2.6.29 kernel [1] does not give me confidence on the "test
>  report/bisect and it will be fixed" promise some have made in this
>  threath
>  
>  [1] https://bugzilla.kernel.org/show_bug.cgi?id=13362

There were promises made in this thread?  Then I must have read a
different mailing list or so.

I do not know why your WLAN regression has not been fixed yet, but at
least it seems rather plausible why commit
7e0986c17f695952ce5d61ed793ce048ba90a661 is not going to be reverted (if
such a revert is the one-liner that you are referring to).

Why is one reporter's rt2500 OK now but not yours?  Are there different
card revisions or firmware versions out there that require quirk handling?
-- 
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/

* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
  2010-07-13 12:50       ` rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?) Stefan Richter
@ 2010-07-13 15:35         ` John W. Linville
  2010-07-13 18:19           ` Alejandro Riveira Fernández
  2010-07-13 18:06         ` Alejandro Riveira Fernández
  1 sibling, 1 reply; 72+ messages in thread
From: John W. Linville @ 2010-07-13 15:35 UTC
  To: Stefan Richter
  Cc: Alejandro Riveira Fernández, Martin Steigerwald,
	linux-kernel, Johannes Berg, linux-wireless

On Tue, Jul 13, 2010 at 02:50:14PM +0200, Stefan Richter wrote:
> Alejandro Riveira Fernández wrote:
> >  I for one stopped booting into -rc kernels.
> >  The fact that still have to patch my kernels with a *one* liner
> >  since 2.6.29 kernel [1] does not give me confidence on the "test
> >  report/bisect and it will be fixed" promise some have made in this
> >  threath
> >  
> >  [1] https://bugzilla.kernel.org/show_bug.cgi?id=13362
> 
> There were promises made in this thread?  Then I must have read a
> different mailinglist or so.
> 
> I do not know why your WLAN regression has not been fixed yet, but at
> least it seems rather plausible why commit
> 7e0986c17f695952ce5d61ed793ce048ba90a661 is not going to be reverted (if
> such a revert is the one-liner that you are referring to).
> 
> Why is one reporter's rt2500 OK now though but not yours?  Are there
> different card revisions or firmwares out there that require quirk handling?

The patch (7e0986c1) corrects an obvious error.  Reverting it might
improve your (i.e. Alejandro) performance, but it seems likely to
cause connectivity problems for others.

The fact that reverting 7e0986c1 helps you suggests that rt2500usb
isn't using the basic_rates map properly.  But after reviewing the
code and the data I have, I can't see what would be causing that.
It is at least possible that your AP is sending bad rate information.
Have you tried this device with other APs?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

* Re: stable? quality assurance?
  2010-07-12 15:56       ` David Newall
  2010-07-12 17:48         ` Marcin Letyns
  2010-07-12 18:00         ` Stefan Richter
@ 2010-07-13 16:50         ` Theodore Tso
  2010-07-13 20:45           ` David Newall
  2 siblings, 1 reply; 72+ messages in thread
From: Theodore Tso @ 2010-07-13 16:50 UTC
  To: David Newall; +Cc: Marcin Letyns, Linux Kernel Mailing List


On Jul 12, 2010, at 11:56 AM, David Newall wrote:

> Thus 2.6.34 is the latest gamma-test kernel.  It's not stable and I doubt anybody honestly thinks otherwise.

Stable is relative.  Some people are willing to consider 
Fedora "stable".  Other people will only use a RHEL 
kernel, and there are those who are using RHEL 4 
or even RHEL 3 because they are extremely risk-averse.

So arguments about whether or not a specific kernel 
version deserves to be called "stable" are going to be 
a waste of time and electrons because it's all about 
expectations.

But the one huge thing that people are forgetting is that
the fundamental premise behind open source is "scratch
your own itch".    That means that people who own a 
specific piece of hardware have to collectively be responsible
for making sure that it works.   It's not possible for me to
assure that some eSATA PCMCIA card on a T23 laptop
still works, because I don't own the hardware.   So the only
way we know whether or not there is a regression is if
there is *someone* who owns that hardware who is
willing to try it out, hopefully during -rc3 or -rc4, let
us know if there is a problem, and hopefully help us
debug the problem.

If people say, "-rc3 isn't stable; I'll wait until
-rc5 to test things," then it will be that much later before
we discover a potential problem with the T23 laptop, and 
before we can fix it.   If people say, "2.6.34.0 isn't stable;
I refuse to run a kernel until 2.6.34.4," and they are the 
only person with the T23 eSATA device, then we won't hear
about the problem until 2.6.34.4, and it might not get fixed
until 2.6.34.5 or 2.6.34.6!

What this means is that, yes, "stable" basically means "stable
for the core kernel developers".   You can say that this isn't
correct, and maybe even dishonest, but if we wait until 2.6.34.N
before we call a release "stable", and this discourages users
from testing 2.6.34.M for M<N, it just delays when bugs will
be found and fixed.

This is why to me, arguing that 2.6.34.0 is not "stable" really
isn't useful.   If you really want to frequently update your kernel
and use the latest and greatest, part of the price that you have
to pay is to help us with the testing, bug reporting, and root
cause determination.

If you don't like this, your other choice is to pay $$$ to the
folks who provide support for Solaris and OS X, and accept
the restrictions in hardware implied by Solaris and OS X.
(Hint: neither supports a Thinkpad T23.)   But to compare
Linux, especially the non-distribution source code distribution
from kernel.org with operating systems that have very different
business models is to fundamentally misunderstand
how things work in the Linux world.

If you want that kind of stability, then you will need to use an
older kernel.  Or use a distribution kernel which has a support
and testing and business model compatible with your desires.
Fedora for example uses kernels which are six months out of 
date, because during those six months, the people who use the
testing versions of Fedora are doing testing and helping with
the bug fixing.   Red Hat uses this free testing pool to improve
the testing and stability of Red Hat Enterprise Linux, so if you 
are willing to live with a 2-3 year release cycle, RHEL will be 
more stable than Fedora.  And if you need to make sure that
bugs are fixed very quickly, and you can call and demand 
a developer's attention, you can pay $$$ for a support contract.

I will say once again.   There is no such thing as a free lunch.
Linux is a better deal than most, and you have multiple
choices about how frequently you update, whether you let
someone else decide whether or not a particular kernel
release plus patches is "stable", or more accurately,
"stable enough", and you can choose how much you are willing
to pay, either in personal time and effort, or $$$ to some support
organization.

But demanding that kernel.org become "more stable" when it
is supported purely by volunteers is simply not reasonable.

-- Ted


* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
  2010-07-13 12:50       ` rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?) Stefan Richter
  2010-07-13 15:35         ` John W. Linville
@ 2010-07-13 18:06         ` Alejandro Riveira Fernández
  2010-07-13 19:18           ` Stefan Richter
  1 sibling, 1 reply; 72+ messages in thread
From: Alejandro Riveira Fernández @ 2010-07-13 18:06 UTC
  To: Stefan Richter
  Cc: Martin Steigerwald, linux-kernel, Johannes Berg,
	John W. Linville, linux-wireless

On Tue, 13 Jul 2010 14:50:14 +0200,
Stefan Richter <stefanr@s5r6.in-berlin.de> wrote:

> Alejandro Riveira Fernández wrote:
> >  I for one stopped booting into -rc kernels.
> >  The fact that still have to patch my kernels with a *one* liner
> >  since 2.6.29 kernel [1] does not give me confidence on the "test
> >  report/bisect and it will be fixed" promise some have made in this
> >  threath
> >  
> >  [1] https://bugzilla.kernel.org/show_bug.cgi?id=13362
> 
> There were promises made in this thread?  Then I must have read a
> different mailinglist or so.

 OK, no promises.
 Maybe I read too much into Mr Tso's previous mail. My apologies.
 [quote]
 > So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
 > I find bugs, I report them and I help fix them.  If more people did
 > that, then the 2.6.X.0 releases would be more stable.  But kernel
 > development is a volunteer effort, so it's up to the volunteers to
 > test and fix bugs during the rc4, -rc5 and -rc6 time frame. 
 
 [...]
 > [...]                         Linux may be a very good bargain (look
 > at how much Oracle has increased its support contracts for Solaris!),
 > but it's still not a free lunch.  At the end of the day, you get what
 > you put into it.

  I tested the kernels, I reported the bugs, and I helped (to the best of
  my knowledge; I'm not a programmer).
  I got no result.


* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
  2010-07-13 15:35         ` John W. Linville
@ 2010-07-13 18:19           ` Alejandro Riveira Fernández
  2010-07-13 18:38             ` John W. Linville
  0 siblings, 1 reply; 72+ messages in thread
From: Alejandro Riveira Fernández @ 2010-07-13 18:19 UTC
  To: John W. Linville
  Cc: Stefan Richter, Martin Steigerwald, linux-kernel, Johannes Berg,
	linux-wireless

On Tue, 13 Jul 2010 11:35:31 -0400,
"John W. Linville" <linville@tuxdriver.com> wrote:


> 
> The patch (7e0986c1) corrects an obvious error.  Reverting it might
> improve your (i.e. Alejandro) performance, but it seems likely to
> cause connectivity problems for others.
> 
> The fact that reverting 7e098c1 helps you suggests that rt2500usb
  
  My card is PCI, so it would be rt2500pci.
  
> isn't using the basic_rates map properly.  But after reviewing the
> code and the data I have, I can't see what would be causing that.
> It is at least possible that your AP is sending bad rate information.
> Have you tried this device with other APs?
 
 No; this is a desktop PC that connects to my home router/AP. A new Wi-Fi
 card is cheaper than a new AP...


> 
> John

* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
  2010-07-13 18:19           ` Alejandro Riveira Fernández
@ 2010-07-13 18:38             ` John W. Linville
  2010-07-13 19:07               ` Alejandro Riveira Fernández
  0 siblings, 1 reply; 72+ messages in thread
From: John W. Linville @ 2010-07-13 18:38 UTC
  To: Alejandro Riveira Fernández
  Cc: Stefan Richter, Martin Steigerwald, linux-kernel, Johannes Berg,
	linux-wireless

On Tue, Jul 13, 2010 at 08:19:27PM +0200, Alejandro Riveira Fernández wrote:
> On Tue, 13 Jul 2010 11:35:31 -0400,
> "John W. Linville" <linville@tuxdriver.com> wrote:
> 
> 
> > 
> > The patch (7e0986c1) corrects an obvious error.  Reverting it might
> > improve your (i.e. Alejandro) performance, but it seems likely to
> > cause connectivity problems for others.
> > 
> > The fact that reverting 7e098c1 helps you suggests that rt2500usb
>   
>   my card is pci so it would be rt2500pci
  
Sorry, typo...

> > isn't using the basic_rates map properly.  But after reviewing the
> > code and the data I have, I can't see what would be causing that.
> > It is at least possible that your AP is sending bad rate information.
> > Have you tried this device with other APs?
>  
>  No; this is a desktop pc that connects to my home router/AP. A new wifi
>  card is cheaper than a new AP ...

Perhaps you could capture some beacons from that AP?

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
  2010-07-13 18:38             ` John W. Linville
@ 2010-07-13 19:07               ` Alejandro Riveira Fernández
  0 siblings, 0 replies; 72+ messages in thread
From: Alejandro Riveira Fernández @ 2010-07-13 19:07 UTC
  To: John W. Linville
  Cc: Stefan Richter, Martin Steigerwald, linux-kernel, Johannes Berg,
	linux-wireless

On Tue, 13 Jul 2010 14:38:52 -0400,
"John W. Linville" <linville@tuxdriver.com> wrote:

> 
> > > isn't using the basic_rates map properly.  But after reviewing the
> > > code and the data I have, I can't see what would be causing that.
> > > It is at least possible that your AP is sending bad rate information.
> > > Have you tried this device with other APs?

 I do not know; I captured some debug data for Ivo back in the day, and
 from what he said, all the info passed to the card was correct...
 See http://lkml.org/lkml/2009/5/25/163 (the link is in Bugzilla) in case
 you missed it.

> >  
> >  No; this is a desktop pc that connects to my home router/AP. A new wifi
> >  card is cheaper than a new AP ...
> 
> Perhaps you could capture some beacons from that AP?

  If you explain how, I can try.
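
As for capturing beacons: in the 802.11 Supported Rates element, each octet carries a rate in units of 500 kbit/s in its low seven bits. The top bit marks the rate as part of the basic rate set. To my understanding, mac80211 passes the basic rates to drivers as a bitmap (basic_rates) in which bit i refers to entry i of the driver's rate table for the current band. The sketch below decodes such octets and builds that kind of bitmap. The following are all assumptions, not taken from this thread: the capture commands in the comments, the interface names, the helper names decode_rates and basic_rates_bitmap, and the rate-table order.

```shell
# Capturing beacons first (root required; interface names are assumptions,
# and the driver must support monitor mode):
#   iw dev wlan0 interface add mon0 type monitor
#   ip link set mon0 up
#   tcpdump -i mon0 -w beacons.pcap 'type mgt subtype beacon'
# The Supported Rates element (ID 1) of one beacon can then be read as
# hex octets, e.g. in Wireshark, and passed to decode_rates.

# Each octet: bits 0-6 = rate in 500 kbit/s units, bit 7 = basic rate.
# Prints the rates in Mbit/s, with "*" marking basic rates.
decode_rates() (
    out=
    for byte in "$@"; do
        v=$((0x$byte))
        mbps=$(awk -v r=$((v & 127)) 'BEGIN { printf "%g", r / 2 }')
        [ $((v & 128)) -ne 0 ] && mbps="$mbps*"
        out="$out${out:+ }$mbps"
    done
    printf '%s\n' "$out"
)

# Assumed order of the driver's 2.4 GHz rate table (Mbit/s); bit i of
# the bitmap is set when table entry i is marked as a basic rate.
RATE_TABLE="1 2 5.5 11 6 9 12 18 24 36 48 54"

basic_rates_bitmap() {
    # $1: output of decode_rates, e.g. "1* 2* 5.5* 11* 6 9 12 18"
    awk -v list="$1" -v table="$RATE_TABLE" 'BEGIN {
        n = split(table, t, " "); m = split(list, l, " "); map = 0
        for (i = 1; i <= n; i++)
            for (j = 1; j <= m; j++)
                if (l[j] == (t[i] "*")) map += 2 ^ (i - 1)
        printf "0x%x\n", map
    }'
}
```

For example, the octets 82 84 8b 96 0c 12 18 24 decode to "1* 2* 5.5* 11* 6 9 12 18". Under the assumed table, that gives the bitmap 0xf. This is one way to compare what the AP advertises with what the driver ends up being told.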

> 

* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
  2010-07-13 18:06         ` Alejandro Riveira Fernández
@ 2010-07-13 19:18           ` Stefan Richter
  0 siblings, 0 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-13 19:18 UTC
  To: Alejandro Riveira Fernández
  Cc: Martin Steigerwald, linux-kernel, Johannes Berg,
	John W. Linville, linux-wireless

Alejandro Riveira Fernández wrote:
> On Tue, 13 Jul 2010 14:50:14 +0200,
> Stefan Richter <stefanr@s5r6.in-berlin.de> wrote:
>> There were promises made in this thread?  Then I must have read a
>> different mailinglist or so.
> 
>  Ok no promises.
>  Maybe I read to much in to Mr Tso previous mail. My apologies
>  [quote]
>  > So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
>  > I find bugs, I report them and I help fix them.  If more people did
>  > that, then the 2.6.X.0 releases would be more stable.  But kernel
>  > development is a volunteer effort, so it's up to the volunteers to
>  > test and fix bugs during the rc4, -rc5 and -rc6 time frame. 
>  
>  [...]
>  > [...]                         Linux may be a very good bargain (look
>  > at how much Oracle has increased its support contracts for Solaris!),
>  > but it's still not a free lunch.  At the end of the day, you get what
>  > you put into it.
> 
>   I tested the kernels i reported the bugs and helped (to the best of my
>   knowledge; I'm not a programmer) 
>   I got no result.

"You get what you put into it" probably did not mean "report a bug, get
it fixed, every time".  Often enough, kernel bugs or hardware quirks are
very hard to fix without direct access to affected hardware.

Here is how my involvement with Linux started:  I reported a bug but
nobody reacted.  I collected some more information, reported the bug
again, and it was immediately fixed by the driver authors.  From then on
I kept following driver development as a tester and answered user
questions.  A few years later, the driver authors all had left for other
projects but there were still bugs to tackle.  So I started to write and
submit bug fixes myself.  (I'm not a programmer either but by then I
already knew a lot about the subsystem.)
-- 
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/

* Re: stable? quality assurance?
  2010-07-13 16:50         ` Theodore Tso
@ 2010-07-13 20:45           ` David Newall
  2010-07-14  6:33             ` Theodore Tso
  0 siblings, 1 reply; 72+ messages in thread
From: David Newall @ 2010-07-13 20:45 UTC
  To: Theodore Tso; +Cc: Marcin Letyns, Linux Kernel Mailing List

Theodore Tso wrote:
> What this means is yes that stable basically means, "stable
> for the core kernel developers".   You can say that this isn't
> correct, and maybe even dishonest, but if we wait until 2.6.34.N
> before we call a release "stable", and this discourages users
> from testing 2.6.34.M for M<N, it just delays when bugs will
> be found and fixed.
>   

Calling it stable instils and reinforces, in typical users, a Pavlovian 
response that recent Linux kernels are dangerous and unreliable; one year 
old was suggested as a safe benchmark. Since typical users are 99% of the 
population, testing hardly begins until a kernel is "sufficiently old." 
This Pavlovian response is what really delays finding and fixing bugs. 
Being up-front and saying which kernels are likely to fail would help 
many users calculate the risk and improve their willingness to try newer 
kernels. "Sufficiently old" might well come down to six months, maybe four.

That is to say, instead of taking a year to pass gamma-testing, new 
kernels could be passed in six months or less. That would be a big 
improvement in stability and quality assurance however you dice it.


> But demanding that kernel.org become "more stable" when it
> is supported by purely volunteers is simply not reasonable.

Let's not be hysterical; nobody made any demands. Semantics aside, the 
suggestion is reasonable because it affects developers' workloads not 
one whit. The only change is the label that Linus applies to new releases.

* Re: stable? quality assurance?
  2010-07-13 20:45           ` David Newall
@ 2010-07-14  6:33             ` Theodore Tso
  0 siblings, 0 replies; 72+ messages in thread
From: Theodore Tso @ 2010-07-14  6:33 UTC (permalink / raw)
  To: David Newall; +Cc: Marcin Letyns, Linux Kernel Mailing List


On Jul 13, 2010, at 4:45 PM, David Newall wrote:
> 
> Calling it stable instils and reinforces a Pavlovian response in typical users, that recent Linux kernels are dangerous and unreliable; one year old was suggested as a safe benchmark. Typical users being 99% of the population, testing hardly begins until a kernel is "sufficiently old." This Pavlovian response is what really delays finding and fixing bugs. Being up-front and saying which kernels are likely to fail would help many users calculate the risk and improve their willingness to try newer kernels. "Sufficiently old" might well come down to six months, maybe four.

Most typical users should be using distribution kernels.  Period.

We can't say which kernels are likely to fail, because we don't know.  If people don't test newer kernels, the mere passage of time, whether it's four months, or six months, or a year, or two years, is not going to magically make problems go away and get fixed.   That only happens if someone steps up and tries it out, and if it breaks submits bug reports or patches.   A fairly large number of Linux developers seem to prefer relatively recent vintage Thinkpads, preferably without Nvidia or ATI chipsets.   These laptops are generally safe and reliable by -rc3 or so --- because if they aren't the Linux developers step up and complain and do code bisections and they fix the problem.

If someone has a T23 laptop, and they help out by doing the same, then it will also be safe and reliable by the time of 2.6.X.0.   If they just kvetch and complain, and stamp their feet, and say "Linux is unsafe and unreliable", and no other T23 owners step up to the challenge, then two years might go by and the same kernel might still be unreliable --- for them.

-- Ted


* Re: stable? quality assurance?
  2010-07-12 19:58           ` David Newall
  2010-07-12 21:11             ` Stefan Richter
  2010-07-12 21:39             ` Martin Steigerwald
@ 2010-07-15  7:23             ` david
  2 siblings, 0 replies; 72+ messages in thread
From: david @ 2010-07-15  7:23 UTC (permalink / raw)
  To: David Newall; +Cc: Stefan Richter, Marcin Letyns, Linux Kernel Mailing List

On Tue, 13 Jul 2010, David Newall wrote:

> (Segue to a problem which follows from calling bleeding-edge kernels 
> "stable".)
>
> When reporting bugs, the first response is often, "we're not interested in 
> such an old kernel; try it with the latest."  That's not hugely useful when 
> the latest kernels are not suitable for production use.  If kernels weren't 
> marked stable until they had earned the moniker, for example 2.6.27, then the 
> expectation of developers and of users would be consistent: developers could 
> expect users to try it again with latest stable kernel, and users could 
> reasonably expect that trying it wouldn't break their system.

2.6.27 didn't get declared 'stable' because it had very few bugs, it was 
declared 'stable' because someone volunteered to maintain it longer and 
back-port patches to it long past the normal process.

2.6.32 got declared 'long-term stable' before 2.6.33 was released, again 
not because it was especially good, but because it didn't appear to be 
especially bad and several distros were shipping kernels based on it, so 
again someone volunteered (or was volunteered by the distro that pays 
their paycheck) to back-port patches to it longer.

I have been running kernel.org kernels on my production systems for >13 
years. I am _very_ short of time, so I generally don't get a chance to 
test the -rc kernels (once in a while I do get a chance to do so on my 
laptop). What I do is every 2-3 kernel releases I wait a couple days after 
the kernel release to see if there are show-stopper bugs, and if nothing 
shows up (which is the common case for the last several years) I compile a 
kernel and load it on machines in my lab. I try to have a selection of 
machines that match the systems I have in production in what I have found 
are the 'important' ways (a definition that changes once in a while when 
I find something that should 'just work' that doesn't ;-). This primarily 
includes systems with all the network card types and RAID card types that 
I use in production, but now also includes a machine with an SSD (after I 
found a bug that only affected that combination).

If my lab machines don't crash immediately, I leave them running (usually 
not even stress testing them, again for lack of time) for a week or so, then I 
put the new kernel on my development machines, wait a few days, then put 
them on QA machines, wait a few days, then put them in production. I have 
the old kernel around so that I can re-boot into it if needed.

This tends to work very well for me. It's not perfect and every couple of 
cycles I run into grief and have to report a bug to the kernel list. 
Usually I find it before I get into production, but I have run into cases 
that got all the way into production before I found a problem.

With the 'new' -stable series, I generally wait until at least 2.6.x.1 is 
released before I consider it ready to go anywhere outside my lab (I'll 
still install the 2.6.x kernel in the lab, but I'll wait for the 
additional testing that comes with the .1 stable kernels before moving it 
on).

I don't go through this entire process with the later -stable kernels. If 
I'm already running 2.6.x and a 2.6.x.y is released that contains 
fixes that look relevant to the configuration that I run 
(which lets out the majority of changes, I do fairly minimal kernel 
configs) I will just test it in the lab to do a smoke test, then schedule 
a rollout through the rest of my network. If there are no problems before 
I get permission to deploy to production I put it on half my boxes, 
failover to them, then wait a little bit (a day to a week) before 
upgrading the backups.

This write-up actually makes it sound like I spend a lot of time working 
with kernels, but I really don't. I'll spend a couple of half days twice a year 
on testing, and then additional time rolling it out to the 150+ clusters 
of servers I have in place. If you can't spend at least this much time on 
the kernel you are probably better off just running your distro kernel, 
but even there you really should do a very similar set of tests on its 
kernel releases.

There's another department in my company that uses distro kernels (big 
name distro, but I will avoid flames by not naming names) without the 
testing routine that I use and my track record for stability compares 
favorably to theirs over the last 7 years or so (they haven't been 
running linux as long as I have, so we can't go back as far ;-) They also 
do more updates than I do simply because they can't as easily look at the 
kernel release and decide it doesn't apply to them.

David Lang

* Re: stable? quality assurance?
  2010-07-12 23:03             ` Stefan Richter
  2010-07-13 10:30               ` Martin Steigerwald
@ 2010-07-15  7:32               ` david
  1 sibling, 0 replies; 72+ messages in thread
From: david @ 2010-07-15  7:32 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Martin Steigerwald, Willy Tarreau, linux-kernel

On Tue, 13 Jul 2010, Stefan Richter wrote:

> Plus, a
> good bug report often requires experience or good intuition, besides
> patience and rigor.

In my experience these are less of a requirement than patience and 
persistence. With these attributes you will be able to work your way 
through figuring out what data is needed for the bug report by answering 
questions (and, if you get no response, trying again).

Nobody starts off knowing how to report a bug, and frequently you don't 
start off knowing all the info that will be needed to solve the bug, but 
if you report it and keep digging you will almost always get help.

David Lang

* Re: stable? quality assurance?
       [not found] ` <AANLkTimEdVsmIgXBbmhsq75ElQvGAI8avsM8-wlDpm4z@mail.gmail.com>
@ 2010-07-15  9:09   ` Valeo de Vries
  2010-07-16  7:00     ` Greg KH
  0 siblings, 1 reply; 72+ messages in thread
From: Valeo de Vries @ 2010-07-15  9:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Martin

On 11 July 2010 08:18, Martin Steigerwald <Martin@lichtvoll.de> wrote:
>
> Hi!
>
> 2.6.34 was a desaster for me: bug #15969 - patch was availble before
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well
> as most important two complete lockups - well maybe just X.org and radeon
> KMS, I didn't start my second laptop to SSH into the locked up one - on my
> ThinkPad T42. I fixed the first one with the patch, but after the lockups I
> just downgraded to 2.6.33 again.
>
> I still actually *use* my machines for something else than hunting patches
> for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel"
> (accentuation from me). I know of the argument that one should use a
> distro kernel for machines that are for production use. But frankly, does
> that justify to deliver in advance known crap to the distributors? What
> impact do partly grave bugs reported on bugzilla have on the release
> decision?
>
> And how about people who have their reasons - mine is TuxOnIce - to
> compile their own kernels?
>
> Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the
> freezes as well. So far so good.
>
> Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the
> website. And I just again always wait for .2 or .3, as with 2.6.34.1 I
> still have some problems like the hang on hibernation reported in
>
> hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
>
> on this mailing list just a moment ago. But then 2.6.33 did hang with
> TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
> 2.6.34 did not hang with it anymore which was a reason for me to try
> 2.6.34 earlier.
>
> I am quite a bit worried about the quality of the recent kernels. Some
> iterations earlier I just compiled them, partly even rc-ones which I do
> not expact to be table, and they just worked. But in the recent times .0,
> partly even .1 or .2 versions haven't been stable for me quite some times
> already and thus they better not be advertised as such on kernel.org I
> think. I am willing to risk some testing and do bug reports, but these are
> still production machines, I do not have any spare test machines, and
> there needs to be some balance, i.e. the kernels should basically work.
> Thus I for sure will be more reluctant to upgrade in the future.

Ooh, it's been a while since I've partaken in a LKML flamewar. ;-)

On a slightly less childish note, I agree with a few of your points. I have
noticed *stable* releases (I'm talking distro kernels here) being less than
stable on occasion recently (the sporadic hard lock-up, bdi-writeback
taking damn long, the recent 'umount with dirty buffers taking an ice-age
to complete' bug). Additionally, there seem to have been some very
chunky point-releases in the last 3-6 months, many containing patches
that really should have been kept for the next Linus kernel.org kernel, IMO.
These annoyances drove me away from Linux for a good few months... it's
amazing what working full-time with Windows can do to one's soul, though!

That said, from what I've seen of late, there's only one guy (Greg) handling
most of the stable stuff (there are probably others working behind the
scenes), and he has a hell of a lot on his plate. So if you, like me, want to
see more reliable stable releases, I'd recommend offering to help out
in some way, e.g. reviewing/testing stable patches, as telling volunteers their shit
doesn't tend to gain you much at all, generally. :-)

Valeo

* Re: stable? quality assurance?
  2010-07-11 15:58   ` William Pitcock
  2010-07-11 16:34     ` Eric Dumazet
@ 2010-07-16  6:59     ` Greg KH
  2010-08-05  3:27       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 72+ messages in thread
From: Greg KH @ 2010-07-16  6:59 UTC (permalink / raw)
  To: William Pitcock; +Cc: Eric Dumazet, linux-kernel

On Sun, Jul 11, 2010 at 07:58:42PM +0400, William Pitcock wrote:
> 2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
> as a Xen domU.  I would say 2.6.32.12 is the best choice since who knows
> what other regressions there are in .16.

Did you happen to tell the stable maintainer about this and do a simple
'git bisect' to find the offending patch so that it can be resolved?

{sigh}


* Re: stable? quality assurance?
  2010-07-15  9:09   ` Valeo de Vries
@ 2010-07-16  7:00     ` Greg KH
  2010-07-16  7:19       ` Justin P. Mattock
                         ` (2 more replies)
  0 siblings, 3 replies; 72+ messages in thread
From: Greg KH @ 2010-07-16  7:00 UTC (permalink / raw)
  To: Valeo de Vries; +Cc: linux-kernel, Martin

On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
> That said, from what I've seen of late, there's only one guy (Greg) handling
> most of the stable stuff (there are probably others working behind the
> scenes), and he has a hell of a lot on his plate.

Nope, it's just me :)

thanks,

greg "i need some minions" k-h

* Re: stable? quality assurance?
  2010-07-16  7:00     ` Greg KH
@ 2010-07-16  7:19       ` Justin P. Mattock
  2010-07-16 15:25       ` Randy Dunlap
  2010-07-16 15:34       ` Valeo de Vries
  2 siblings, 0 replies; 72+ messages in thread
From: Justin P. Mattock @ 2010-07-16  7:19 UTC (permalink / raw)
  To: Greg KH; +Cc: Valeo de Vries, linux-kernel, Martin

On 07/16/2010 12:00 AM, Greg KH wrote:
> On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
>> That said, from what I've seen of late, there's only one guy (Greg) handling
>> most of the stable stuff (there are probably others working behind the
>> scenes), and he has a hell of a lot on his plate.
>
> Nope, it's just me :)
>
> thanks,
>
> greg "i need some minions" k-h


you need some minions...

Justin P. Mattock

* Re: stable? quality assurance?
  2010-07-16  7:00     ` Greg KH
  2010-07-16  7:19       ` Justin P. Mattock
@ 2010-07-16 15:25       ` Randy Dunlap
  2010-07-16 15:34       ` Valeo de Vries
  2 siblings, 0 replies; 72+ messages in thread
From: Randy Dunlap @ 2010-07-16 15:25 UTC (permalink / raw)
  To: Greg KH; +Cc: Valeo de Vries, linux-kernel, Martin

On Fri, 16 Jul 2010 00:00:10 -0700 Greg KH wrote:

> On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
> > That said, from what I've seen of late, there's only one guy (Greg) handling
> > most of the stable stuff (there are probably others working behind the
> > scenes), and he has a hell of a lot on his plate.
> 
> Nope, it's just me :)
> 
> thanks,
> 
> greg "i need some minions" k-h
> --

Chris Wright is still listed in MAINTAINERS...

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

* Re: stable? quality assurance?
  2010-07-16  7:00     ` Greg KH
  2010-07-16  7:19       ` Justin P. Mattock
  2010-07-16 15:25       ` Randy Dunlap
@ 2010-07-16 15:34       ` Valeo de Vries
  2 siblings, 0 replies; 72+ messages in thread
From: Valeo de Vries @ 2010-07-16 15:34 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel

On 16 July 2010 08:00, Greg KH <greg@kroah.com> wrote:
> On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
>> That said, from what I've seen of late, there's only one guy (Greg) handling
>> most of the stable stuff (there are probably others working behind the
>> scenes), and he has a hell of a lot on his plate.
>
> Nope, it's just me :)
>
> thanks,
>
> greg "i need some minions" k-h

I thought that was the case, alas.

I'm not sure how much time I could commit, but I'd be interested in
helping out, even if it's just reviewing and testing patches heading
for stable. Are there any specific areas you could use a hand with
though?

Valeo

* Re: stable? quality assurance?
  2010-07-16  6:59     ` Greg KH
@ 2010-08-05  3:27       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 72+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-05  3:27 UTC (permalink / raw)
  To: Greg KH; +Cc: William Pitcock, Eric Dumazet, linux-kernel

  On 07/15/2010 11:59 PM, Greg KH wrote:
> On Sun, Jul 11, 2010 at 07:58:42PM +0400, William Pitcock wrote:
>> 2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
>> as a Xen domU.  I would say 2.6.32.12 is the best choice since who knows
>> what other regressions there are in .16.
> Did you happen to tell the stable maintainer about this and do a simple
> 'git bisect' to find the offending patch so that it can be resolved?

If it is compiled on Debian, then it's probably that cmpxchg memory 
argument bug which hits in pvclock.c.

     J

* Re: stable? quality assurance?
  2010-07-11 17:22     ` Willy Tarreau
                         ` (2 preceding siblings ...)
  2010-07-12 15:43       ` Martin Steigerwald
@ 2010-09-04 16:38       ` Martin Steigerwald
  2010-09-04 18:46         ` Ted Ts'o
                           ` (2 more replies)
  3 siblings, 3 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-04 16:38 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 4902 bytes --]

On Sunday 11 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy, hi everyone else reading this,

> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From
> > the  current replies I perceive a lack of that ability.
> 
> well, I'll try to do then :-)
> 
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
> 
> Among the things he explained, I remember that one of primary concern
> was the inability to slow down development. I mean, if he waits 2 more
> weeks for things to stabilize, then there will be two more weeks of
> crap^H^H^H^Hdevelopment merged in next merge window, so in fact this
> will just shift dates and not quality.

While bisecting [Bug 16376] random - possibly Radeon DRM KMS related 
freezes, which goes very slowly due to having lots of unbootable kernels 
with an ext4 / readahead related backtrace during boot, I had an idea:

I think the main problem is that the current development process does not give 
time for quality work and bug fixing. As I understand it, currently it's just 
a constant development of new features, with bug fixing and quality work 
having to be squeezed in alongside that development:

- before 2.6.36 is released developers aim at developing new stuff for 
2.6.37.

- after 2.6.36 is released developers aim at getting as much stuff as possible into 
2.6.37 and then, after two weeks, at developing new features for 2.6.38.

This process does not take bug fixing into account at all, because after the 
merge window has closed, developers hurry to get their stuff ready for the 
next window.

In that model extending the freeze period after rc1 doesn't help at all, 
because, as you say, more "crap^H^H^H^Hdevelopment" gets collected for the 
next kernel.

But is that a *given* that no one has any influence over? Is 
collecting changes for the next kernel like rain that either pours down or not 
- usually pours down in this case, like in August in Germany ;)? Who feeds 
Linus with new stuff during the merge window? From what I understand of the 
Linux development process, it's mainly the subsystem maintainers and Andrew 
Morton.

What if those people stopped collecting new stuff for Linus, except bugfixes, 
about two or three weeks before the next kernel is released? This would 
give the subsystem trees and the mm tree some time to stabilize a bit, so 
that Linus gets higher-quality stuff in the first place. And more importantly, 
since developers know that subsystem maintainers and Andrew only collect 
bugfixes 2-3 weeks before the release of a stable kernel, they can also 
spend some time on quality work.

Of course, developers can still decide "Well, 2.6.37 is closed already" 
and continue developing for 2.6.38 even earlier, but I still think 
this would help to slow things down a bit during the critical phase 
before releasing a stable kernel. Because when I know my subsystem 
maintainer or Andrew won't be taking my stuff anyway until the stable 
kernel is released, I can take a little time for other things.

The main idea here is to have a two-staged freeze process and to 
distribute the "I am only taking bug fixes" work to more people than Linus.

For this to work properly, I think at the time of the release of the 
stable kernel, subsystem maintainers and Andrew should branch their trees. 
For example, when 2.6.36 is released:

- tree 
  => 2.6.36-stable-tree
  => tree, where 2.6.37 stuff will be going in

Thus when subsystem maintainers take new stuff during the merge window, it 
will already be for the next kernel release, not for the current one, 
except for bugfix work. That said, I think the criteria for bugfix work should 
not be as strict as for the stable patches Greg collects.

Thus it needs to be clear: no new stuff for the next kernel starting two 
weeks before the current stable kernel is released.

I think this could help. It's a bit like the two-staged development 
process of Debian, but with the freeze period for "unstable" being a fixed 
time interval of about 2 weeks instead of RC=0 for stable ;). It's a bit of 
a formal shift of attention to the stable kernel about 2 weeks before its 
release. Developers might find creative ways to circumvent it, or they 
might understand that this process serves the purpose of improving kernel 
quality.

If you think these two weeks cannot be squeezed into the three-monthly 
development cycle, a four-monthly development cycle might do. But actually 
I don't see why these two weeks could not be made to fit in there.

Installing and testing the next kernel after yet another mail to this thread,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: stable? quality assurance?
  2010-07-11 13:16 ` Ted Ts'o
  2010-07-11 18:02   ` Anca Emanuel
  2010-07-12  6:46   ` David Newall
@ 2010-09-04 17:12   ` Martin Steigerwald
  2 siblings, 0 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-04 17:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ted Ts'o

[-- Attachment #1: Type: Text/Plain, Size: 6297 bytes --]


Hi Ted,

I wanted to answer this for a long time...

On Sunday 11 July 2010, Ted Ts'o wrote:
> On Sun, Jul 11, 2010 at 09:18:41AM +0200, Martin Steigerwald wrote:
> > I still actually *use* my machines for something else than hunting
> > patches for kernel bugs and on kernel.org it is written "Latest
> > *Stable* Kernel" (accentuation from me). I know of the argument that
> > one should use a distro kernel for machines that are for production
> > use. But frankly, does that justify to deliver in advance known crap
> > to the distributors? What impact do partly grave bugs reported on
> > bugzilla have on the release decision?
> 
> So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
> I find bugs, I report them and I help fix them.  If more people did
> that, then the 2.6.X.0 releases would be more stable.  But kernel
> development is a volunteer effort, so it's up to the volunteers to
> test and fix bugs during the rc4, -rc5 and -rc6 time frame.  But if
> the work tails off, because the developers are busily working on new
> features for the new release, then past a certain point, delaying the
> release reaches a point of diminishing returns.  This is why we do
> time-based releases.

It certainly helps the quality of the kernel if people test rc kernels 
and report bugs, but I think you at least partly missed my point. I wrote 
in my initial mail:

> 2.6.34 was a desaster for me: bug #15969 - patch was availble before 
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as 
> well as most important two complete lockups - well maybe just

So two out of the three bugs I experienced - the third one being [Bug 16376] 
random - possibly Radeon DRM KMS freezes, which I am currently bisecting - 
had actually been reported by testers who tested rc kernels. One even 
had a patch prior to the release of 2.6.34.

So for these two bugs testing rc kernels clearly has not helped raising 
the *release* kernel quality.

I now understand that deferring a stable kernel release can cause a lot of 
pain. But I still wonder why at least the patch from bug 15969 was not 
taken prior to the release. Not to assign blame, but to possibly find ways 
to improve the process. I can't check bugzilla right now due to too many 
MySQL connections on the server - already reported, but presumably already 
known to the admins anyway - but AFAIR the patch had been available and 
AFAIR also tested well before the release.

So my question still stands: can anything be improved in getting as many 
bugfix patches as possible from Bugzilla into the stable kernel, at least 
for critical bugs like "does not boot" or "only garbage on screen after 
booting"?

I can accept that bug 15788 would have been missed by that, but this bug 
was not that important - it was just the tip of the iceberg.

> It is possible to do other types of release strategies, but look at
> Debian Obsolete^H^H^H^H^H^H^H^H Stable if you want to see what happens
> if you insist on waiting until all release blockers are fixed (and
> even with Debian, past a certain point the release engineer will still
> just reclassify bugs as no longer being release blockers --- after the
> stable release has slipped for months or years past the original
> projected release date.)

I made a suggestion on how to improve the development process while still 
holding to time-based releases in my other mail to this thread today.

> So if you and others like you are willing to help, then the quality of
> the Linux kernels can continue to improve.  But simply complaining
> about it is not likely to solve things, since threating to not be
> willing to upgrade kernels is generally not going to motivate many, if
> not most, of the volunteers who work on stablizing the kernel.

I do, but I need to balance this. I have already spent quite a few hours 
bisecting that freeze bug mentioned above and it might take some more 
weeks to nail it down.

And it was not a threat at all. I just have to balance how much 
instability I can take on systems that I use for my daily stuff.

> > I am willing to risk some testing and do bug reports, but these are
> > still production machines, I do not have any spare test machines, and
> > there needs to be some balance, i.e. the kernels should basically
> > work.
> 
> So you want the latest and greatest new features in a brand-new kernel
> release, but you're not willing to pay for test machines, and you're
> not willing to pay for a distribution support...  The fact that you
> are willing to do some testing is appreciated, but remember, there's
> no such thing as a free lunch.  Linux may be a very good bargain (look
> at how much Oracle has increased its support contracts for Solaris!),
> but it's still not a free lunch.  At the end of the day, you get what
> you put into it.

Ted, I think there is no need to attack me like that. Actually all of the 
bugs have been on my laptop that I use for work *and* private work. Most 
of the time I spent on these bugs have been during my spare volunteer time 
as well. And we are still a small company.

When I apply what you wrote above, the only sane thing would be to use a 
distro kernel and be done with it - which means less testing of recent 
kernels. Still even then that likely radeon kms related freeze could have 
slipped even into Debian stable kernel, considering that no one posted to 
the bug report that he was able to reproduce the bug.

Then I'd just accept the slower turn-around cycles with in-kernel or 
userspace software suspend and be done with compiling TuxOnIce kernels. 

But I am not there yet, because compiling TuxOnIce kernels worked pretty 
well from 2.6.11 to 2.6.33. And I want to help as well as I can. 
Hopefully after bisecting the Radeon KMS related freeze bug things will be 
calmer again - although there is another weird, possibly difficult to track 
bug left. Maybe I just had lots of bad luck with 2.6.34, and after 
tracking down those two bugs things will be calmer again. The Radeon KMS 
stuff has been a big change as well.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: stable? quality assurance?
  2010-09-04 16:38       ` Martin Steigerwald
@ 2010-09-04 18:46         ` Ted Ts'o
  2010-09-04 19:11           ` Martin Steigerwald
  2010-09-04 19:24         ` Stefan Richter
  2010-09-05  8:35         ` Avi Kivity
  2 siblings, 1 reply; 72+ messages in thread
From: Ted Ts'o @ 2010-09-04 18:46 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

On Sat, Sep 04, 2010 at 06:38:59PM +0200, Martin Steigerwald wrote:
> 
> During bisecting [Bug 16376] random - possibly Radeon DRM KMS related 
> freezes, which goes very slowly due to having lots of unbootable kernels 
> with an ext4 / readahead related backtrace during boot, I had an idea:

So I'm not sure what you're referring to here.  If there's an ext4
bug, why haven't you reported it to the linux-ext4 list?  I've done a
Google search for "Steigerwald ext4 readahead" and I can't find any
bug report related to kernel oops that are ext4/readahead-related.

No one else has reported such a bug to me, and I run a complete set of
regression tests before I push ext4 changes to Linus.  So I'm not sure
what you're seeing.  But complaining about it in passing on an e-mail
without sending a formal bug report to the linux-ext4 mailing list is
not likely to solve your problem...

						- Ted

* Re: stable? quality assurance?
  2010-09-04 18:46         ` Ted Ts'o
@ 2010-09-04 19:11           ` Martin Steigerwald
  2010-09-04 23:23             ` Ted Ts'o
  0 siblings, 1 reply; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-04 19:11 UTC (permalink / raw)
  To: Ted Ts'o, linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 3448 bytes --]

Am Samstag 04 September 2010 schrieb Ted Ts'o:
> On Sat, Sep 04, 2010 at 06:38:59PM +0200, Martin Steigerwald wrote:
> > During bisecting [Bug 16376] random - possibly Radeon DRM KMS related
> > freezes, which goes very slowly due to having lots of unbootable
> > kernels with an ext4 / readahead related backtrace during boot, I had an idea:
> 
> So I'm not sure what you're referring to here.  If there's an ext4
> bug, why haven't you reported it to the linux-ext4 list?  I've done a
> Google search for "Steigerwald ext4 readahead" and I can't find any
> bug report related to kernel oops that are ext4/readahead-related.
> 
> No one else has reported such a bug to me, and I run a complete set of
> regression tests before I push ext4 changes to Linus.  So I'm not sure
> what you're seeing.  But complaining about it in passing on an e-mail
> without sending a formal bug report to the linux-ext4 mailing list is
> not likely to solve your problem...

Stop! I think we are misunderstanding each other.

It's a bug I stumbled across during the bisecting process. Neither 2.6.33 
nor 2.6.34 is affected, only some kernels in between. So I didn't think it 
was worth reporting.

I took a photo of part of the backtrace, though, so if you want I will open 
a bug report about it nonetheless. But I really think it was fixed during 
the 2.6.33 to 2.6.34 development cycle.

For now I just skipped the affected kernels in the bisection, in the hope 
that none of them is the last good or first bad one regarding the freeze 
bug. Since so far all the kernels that showed this issue have been from a 
USB merge, I don't think the freeze bug is in there.

It's from:

# skip: [124d255382ddd37ffa920e9f5183efa54bbfe4f2] USB: pl2303: remove 
unnecessary reset of usb_device in urbs

to

# skip: [c68bb0d738897ed39b90c7ccb22e01c938117051] USB: cxacru: document 
how to interact with the flash memory

I did not test booting every single one of those >100 revisions, but got 
fed up with this after the fifth non-booting kernel or so. I didn't get why 
git bisect insisted on taking me back into this range of commits - even 
into the middle between two skips! - instead of just readjusting the binary 
search so that the range is hit later in the process, because then it 
might not have been hit again at all. In the end I manually skipped every 
commit in this USB merge. The ext4 readahead issue must have been 
introduced before that merge and fixed somewhere after it. But at a quick 
glance I didn't find the commit that might have fixed it.

I do not even know whether it's ext4 related at all, but ext4 and readahead 
showed up in that backtrace.

So I just wanted to show that I am seriously working on tracking down that 
likely Radeon KMS related freeze bug, and that it is time-consuming for me 
due to having lots of unbootable kernels. I got another one of these, with 
"Destination address too large", before the initrd even seemed to have done 
anything. I skipped that one commit as well, and now git bisect seems to 
have taken me to a good one again, let's see. At least it hasn't frozen so 
far, and I had better press send now ;-). But going by my bet on where the 
offending commit might be, this should be a good one. I am learning a lot 
about how to bisect a kernel right now ;).

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: stable? quality assurance?
  2010-09-04 16:38       ` Martin Steigerwald
  2010-09-04 18:46         ` Ted Ts'o
@ 2010-09-04 19:24         ` Stefan Richter
  2010-09-04 19:34           ` Stefan Richter
  2010-09-04 20:21           ` Martin Steigerwald
  2010-09-05  8:35         ` Avi Kivity
  2 siblings, 2 replies; 72+ messages in thread
From: Stefan Richter @ 2010-09-04 19:24 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

Martin Steigerwald wrote:
> I think main problem is that the current development process does not give 
> time for quality work and bug fixing.

This has little to do with process.

Put simply, the paid developers work on what they are paid for.  The
volunteers work on what they are interested in.

If you feel that too little work is spent on stabilization and bug fixing, pay
someone or take matters into your own hands.  I.e. report bugs and work with
the developers to get the bugs fixed.

The current development process OTOH gives plenty of time for quality work and
bug fixing:

  - There are several stages at which new code can be tested:
    When it lives in subsystem development trees,
    when it has been pulled into the linux-next tree,
    when it has been pulled into Linus' tree.

  - Bug fixes are pulled by Linus at almost any time, whenever they are ready.
    (Of course, since fixes can and do introduce regressions too, only
    critical fixes are accepted in later -rcs.)

  - New code submissions are pulled by Linus in a fairly reliable cycle
    with reasonable frequency (less than three months).  That way,
    developers know that if their stuff did not quite make the cut for a
    mainline merge in month N, they can try again in month
    N+2 or N+3.  They are not left to guess whether their next chance
    will be in half a year or two years or next week.  Hence, nobody
    needs to panic and rush things when a merge window draws near.
    Plus, the code and the repository are open, so anybody can ship
    features to customers at any time independently of Linus' release
    cycle.  Linux distributors do this all the time.

> But is that a *given* that no one actually has any influence to? Is 
> collecting changes for next kernel like rain that either pours down or not 
> - usually pours down in this case like in August in Germany ;)? Who feeds 
> Linus with new stuff during the merge window? From what I understand of the 
> Linux development process it's mainly the subsystem maintainers and Andrew 
> Morton.
> 
> What if those people stop collecting new stuff for Linus except bugfixes 
> about two or three weeks before the next kernel is released?

Most of the maintainers are responsible enough to put only stuff into
linux-next which belongs there, i.e. tested, release-ready stuff.  Likewise
with submissions to Linus during the merge window.

Only some maintainers do in fact try to submit rushed, untested crap.
Sometimes they get caught red-handed.

The release-ready submissions that come via responsible maintainers still
contain some regressions though.  This is inevitable.  There are fewer
regressions if there are more enthusiasts who test development trees and
linux-next.  There are fewer regressions in Linus' releases if there are more
enthusiasts who test -rc kernels.  (And submit good bug reports and work with
the developers on them.)  And vice versa.

Process does not do much to prevent bugs or fix bugs.  People do. :-)

However, you can hardly tell people to implement fewer features and fix more
bugs if they don't owe you anything.
-- 
Stefan Richter
-=====-==-=- =--= --=--
http://arcgraph.de/sr/

* Re: stable? quality assurance?
  2010-09-04 19:24         ` Stefan Richter
@ 2010-09-04 19:34           ` Stefan Richter
  2010-09-04 20:21           ` Martin Steigerwald
  1 sibling, 0 replies; 72+ messages in thread
From: Stefan Richter @ 2010-09-04 19:34 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

Stefan Richter wrote:
> Process does not do much to prevent bugs or fix bugs.  People do. :-)
> 
> However, you can hardly tell people to implement fewer features and fix more
> bugs if they don't owe you anything.

PS:  When a tester has sunk a lot of time into a bisection or generally into a
good bug report, like you did recently according to your other post, then the
developer of the bug surely owes you something...  But I am sure that most
developers do appreciate such work a lot.
-- 
Stefan Richter
-=====-==-=- =--= --=--
http://arcgraph.de/sr/

* Re: stable? quality assurance?
  2010-09-04 19:24         ` Stefan Richter
  2010-09-04 19:34           ` Stefan Richter
@ 2010-09-04 20:21           ` Martin Steigerwald
  2010-09-04 22:50             ` Stefan Richter
  2010-09-04 23:16             ` Ted Ts'o
  1 sibling, 2 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-04 20:21 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 6220 bytes --]

Am Samstag 04 September 2010 schrieb Stefan Richter:
> Martin Steigerwald wrote:
> > I think main problem is that the current development process does not
> > give time for quality work and bug fixing.
> 
> This has little to do with process.
> 
> Put simply, the paid developers work on what they are paid for.  The
> volunteers work on what they are interested in.

And they are paid for features instead of fixing bugs? I doubt enterprise 
customers have this preference. I admit, however, that they have no reason 
to pay for fixing my bug unless they experience it too.

> If you feel that too little work is spent on stabilization and bug
> fixing, pay someone or take matters into your own hands.  I.e. report
> bugs and work with the developers to get the bugs fixed.

I already do that for the bugs I encounter.

> The current development process OTOH gives plenty of time for quality
> work and bug fixing:
> 
>   - There are several stages at which new code can be tested:
>     When it lives in subsystem development trees,
>     when it has been pulled into the linux-next tree,
>     when it has been pulled into Linus' tree.
>
>   - Bug fixes are pulled by Linus at almost any time, whenever they are
> ready. (Of course, since fixes can and do introduce regressions too,
> only critical fixes are accepted in later -rcs.)
> 
>   - New code submissions are pulled by Linus in a fairly reliable cycle
>     with reasonable frequency (less than three months).  That way,
>     developers know that if their stuff did not quite make the cut for a
>     mainline merge in month N, they can try again in month
>     N+2 or N+3.  They are not left to guess whether their next chance
[...]

I will think a bit more about this. But my first impression is that all of 
these provisions currently compete with the time spent on feature work. If 
there is no stabilization or some sort of freeze period, the pace won't slow 
down enough to give stabilization a realistic chance.
 
> > But is that a *given* that no one actually has any influence to? Is
> > collecting changes for next kernel like rain that either pours down
> > or not - usually pours down in this case like in August in Germany
> > ;)? Who feeds Linus with new stuff during the merge window? From
> > what I understand of the Linux development process it's mainly the
> > subsystem maintainers and Andrew Morton.
> > 
> > What if those people stop collecting new stuff for Linus except
> > bugfixes about two or three weeks before the next kernel is released?
> 
> Most of the maintainers are responsible enough to put only stuff into
> linux-next which belongs there, i.e. tested, release-ready stuff. 
> Likewise with submissions to Linus during the merge window.
> 
> Only some maintainers do in fact try to submit rushed, untested crap.
> Sometimes they get caught red-handed.
> 
> The release-ready submissions that come via responsible maintainers
> still contain some regressions though.  This is inevitable.  There are
> fewer regressions if there are more enthusiasts who test development
> trees and linux-next.  There are fewer regressions in Linus' releases
> if there are more enthusiasts who test -rc kernels.  (And submit good
> bug reports and work with the developers on them.)  And vice versa.
> 
> Process does not do much to prevent bugs or fix bugs.  People do. :-)

Yes, my suggestion does not guarantee that people report and fix bugs. But 
it gives more room for doing so, especially for fixing the open and known 
regressions. Again, two of those that I mentioned initially had already 
been reported by people *during* the rc phase. Still the stable kernel did 
not receive the bug fix patch for the nastier of the two in time.

That is what I am concerned about. If people test, report, and someone 
even writes a patch, and yet it's not in the stable kernel, then what did 
they do it for?

Okay, it was in 2.6.35.1, but when a major and reported regression is only 
fixed in stable patches, I still think that any release without at least two 
or three stable patches should not be called stable at all - it's just 
misleading then. And I think I am perfectly entitled to that opinion. 
Anyway, I will relabel kernels in my mind and no longer consider a kernel 
without stable patches stable. I did so in theory before, but now I have 
experienced it for myself for the first time.

> However, you can hardly tell people to implement fewer features and fix
> more bugs if they don't owe you anything.

Sorry for the demanding tone in my post that started the thread, but in 
the post you are replying to I merely made a suggestion. No one owes me 
anything, and I am aware of that.

But even if I do not prepend each of my mails with a list of what I have 
done for the kernel - which is clearly less than what any core kernel 
developer or even a casual kernel developer has done for it - I can still 
make a valuable suggestion.

That said, for some time I compiled a kernel every day or two to help Ingo 
Molnar test a use case for his CFS scheduler. And I regularly test new 
TuxOnIce kernels and report back to Nigel how they fare. I report bugs for 
other open source projects like KDE or Debian as well and contribute a bit 
here and there, like my first Debian package "fio".

And this work has mostly been enjoyable. Neither Ingo, nor Nigel, nor Jens 
Axboe asked me what I did for the kernel before working with me. They have 
just been happy about the feedback I gave.

I admit my initial post was well suited to provoke the kind of "what did 
you do?" feedback, as it actually was demanding. But then I really was 
frustrated with the kernel, and I think sometimes an opinionated post like 
my "stable? quality assurance?" can be quite good. If I think a kernel is 
crap, why should I be prohibited from telling its developers? At least I 
learned a lot and even started bisecting that bug, even though it takes an 
insane amount of time to do so.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: stable? quality assurance?
  2010-09-04 20:21           ` Martin Steigerwald
@ 2010-09-04 22:50             ` Stefan Richter
  2010-09-04 23:16             ` Ted Ts'o
  1 sibling, 0 replies; 72+ messages in thread
From: Stefan Richter @ 2010-09-04 22:50 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

Martin Steigerwald wrote:
> Am Samstag 04 September 2010 schrieb Stefan Richter:
>> Put simply, the paid developers work on what they are paid for.  The
>> volunteers work on what they are interested in.
> 
> And they are paid for features instead of fixing bugs?

There are lots of people who fix bugs on paid time or are even specifically
paid to fix bugs.

[...]
> I will think a bit more about this. But my first impression is that all of 
> these provisions currently compete with the time spent on feature work. If 
> there is no stabilization or some sort of freeze period, the pace won't slow 
> down enough to give stabilization a realistic chance.

Linus' merge--rc--release cycle only influences what is pulled into the
mainline when.  It does not prevent anyone from implementing a new feature or
stabilizing an existing feature at any time.

[...]
>> However, you can hardly tell people to implement fewer features and fix
>> more bugs if they don't owe you anything.
> 
> Sorry for the demanding tone in my post that started the thread, but in 
> the post you are replying to I merely made a suggestion. No one owes me 
> anything, and I am aware of that.
> 
> But even if I do not prepend each of my mails with a list of what I have 
> done for the kernel - which is clearly less than what any core kernel 
> developer or even a casual kernel developer has done for it - I can still 
> make a valuable suggestion.
> 
> That said, for some time I compiled a kernel every day or two to help Ingo 
> Molnar test a use case for his CFS scheduler. And I regularly test new 
> TuxOnIce kernels and report back to Nigel how they fare. I report bugs for 
> other open source projects like KDE or Debian as well and contribute a bit 
> here and there, like my first Debian package "fio".
> 
> And this work has mostly been enjoyable. Neither Ingo, nor Nigel, nor Jens 
> Axboe asked me what I did for the kernel before working with me. They have 
> just been happy about the feedback I gave.
> 
> I admit my initial post was well suited to provoke the kind of "what did 
> you do?" feedback, as it actually was demanding.

By the sentence above I merely meant to say that you or I or anybody cannot
lay out work schedules for others who are not our employees. :-)

> But then I really was frustrated 
> with the kernel, and I think sometimes an opinionated post like my 
> "stable? quality assurance?" can be quite good. If I think a kernel is 
> crap, why should I be prohibited from telling its developers?

It is not prohibited.  OTOH I don't know how useful it is at this general
level.  There are lots of subsystem projects in the kernel project, all in
different situations regarding how mature their subsystem is, how many
developers and testers they have, what their balance of new features vs.
stabilization work is.
-- 
Stefan Richter
-=====-==-=- =--= --=--
http://arcgraph.de/sr/

* Re: stable? quality assurance?
  2010-09-04 20:21           ` Martin Steigerwald
  2010-09-04 22:50             ` Stefan Richter
@ 2010-09-04 23:16             ` Ted Ts'o
  1 sibling, 0 replies; 72+ messages in thread
From: Ted Ts'o @ 2010-09-04 23:16 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Stefan Richter, linux-kernel

On Sat, Sep 04, 2010 at 10:21:34PM +0200, Martin Steigerwald wrote:
> Am Samstag 04 September 2010 schrieb Stefan Richter:
> > Martin Steigerwald wrote:
> > > I think main problem is that the current development process does not
> > > give time for quality work and bug fixing.
> > 
> > This has little to do with process.
> > 
> > Put simply, the paid developers work on what they are paid for.  The
> > volunteers work on what they are interested in.
> 
> And they are paid for features instead of fixing bugs? I doubt enterprise 
> customers have this preference. I admit, however, that they have no reason 
> to pay for fixing my bug unless they experience it too.

Kernel developers are paid to work on features, yes.  They are not paid
to fix bugs for random folks who want to run the latest stable kernel.

There are separate groups of people who work on stabilizing kernels for
the community and enterprise distributions.  These folks tend to spend about
3 months stabilizing a community distribution, and probably 6-9 months
stabilizing a kernel for an enterprise distribution.  These folks also
tend to do most of the performance tuning on kernels destined for enterprise
distributions.  Obviously some developers who happen to be employed
by distributions will help out in stabilizing an enterprise kernel, but
usually they get called in to fix a bug after the testers have found
it.

You can argue that this maybe shouldn't be the way things work, but
you're not the one paying the salaries for the enterprise
distributions.  I'm sure if enough enterprise distribution customers
were willing to pay the enterprise distro folks to stabilize each
2.6.x kernel, the distros would put their people on it.  I know there
are some kernel developers who would prefer it if enterprise distros
didn't spend so long stabilizing the tree, but instead worked on
stabilizing each and every mainline release.

> I will think a bit more about this. But my first impression is that all of 
> these provisions currently compete with the time spent on feature work. If 
> there is no stabilization or some sort of freeze period, the pace won't slow 
> down enough to give stabilization a realistic chance.

Again, you can't force developers to work on stabilization.  Many will
work on bugs because they want their driver or their file system to
have a good reputation.  And we do have people like Rafael, who tracks
regressions; if there is a regression, and the patch isn't being
accepted by the maintainer, nag the maintainer; make sure it's in the
kernel bugzilla, and nag Rafael, who normally will also ping
maintainers when there is a known bug fix.  In the worst case, send
mail to Linus.  You are empowered to do this.   So do it!

And BTW, if the fix is reported in -rc7, to be fair, sometimes the
maintainer simply won't have time to test and quality control the
patch before Linus does a release.  So having something show up in a
2.6.x.y release really isn't the end of the world.  And the bug did
get fixed.  It just didn't get fixed in time for *your* needs, but
especially in the case of drivers (and your problems seemed to be
mostly driver related problems), remember that sometimes the driver
maintainer is a volunteer.

(One of the advantages of sticking with an Intel video chipset is that
the maintainer is paid by Intel to support the Intel video drivers, and he
is normally quite responsive.  In contrast, the Radeon support is, if
I recall correctly, all done by volunteers, and regardless of whether
or not the driver maintainers are paid full-time to work on supporting
the driver, or volunteers, people do go on vacation during the summer
months...)

As far as whether or not a kernel is stable, I think the answer is, it's
stable if it's stable for *you*.  As I've said, with the hardware I've
chosen, very often it's stable by -rc3 or -rc4.  For others, they may
need to wait until several 2.6.x.y releases have gone by.  I tend to
complain when drivers I care about are broken in the -rc2 or -rc3 time
frame.  But if people wait until -rc7 to try out the kernel, then it
might not get fixed before 2.6.35 comes out.

						- Ted

* Re: stable? quality assurance?
  2010-09-04 19:11           ` Martin Steigerwald
@ 2010-09-04 23:23             ` Ted Ts'o
  2010-09-05  7:59               ` Martin Steigerwald
  0 siblings, 1 reply; 72+ messages in thread
From: Ted Ts'o @ 2010-09-04 23:23 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

On Sat, Sep 04, 2010 at 09:11:34PM +0200, Martin Steigerwald wrote:
> 
> Stop! I think we are misunderstanding each other.
> 
> It's a bug I stumbled across during the bisecting process. Neither 2.6.33 
> nor 2.6.34 is affected, only some kernels in between. So I didn't think it 
> was worth reporting.
> 
> I took a photo of part of the backtrace, though, so if you want I will open 
> a bug report about it nonetheless. But I really think it was fixed during 
> the 2.6.33 to 2.6.34 development cycle.

FYI, it's fair game to send a note to LKML with the backtrace, saying,
I'm getting this weird stack trace while trying to do a bisect; it
looks like it's fixed in 2.6.34, does it look familiar?  If so,
someone might be able to point you at the commit that fixes the bug,
and then you can apply that patch by hand while doing the bisect at
each step (and then unapply it before doing the next bisect
iteration).
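A minimal sketch of this patch-at-each-step approach, on a throwaway repository with made-up history (not the kernel): a side bug breaks the file "boot" in c3..c5 and c6 fixes it, while the regression being hunted is planted in c4, right inside the untestable stretch. The fix is extracted as a patch, and the test script applies it with "git apply" only where it applies cleanly, runs the test, and discards the change with "git checkout -- ." before git bisect moves on. File names, tags and the test logic are assumptions for illustration; exit code 125 is what "git bisect run" treats as "skip this commit".

```shell
#!/bin/sh
# Demo on a throwaway repository: apply a known fix for an unrelated
# "side" bug at every bisect step, then unapply it before moving on.
set -eu

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Linear history c1..c10. "version" holds N. "boot" is broken from c3
# (side bug introduced) until c6 (side bug fixed).
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "$i" > version
    case $i in
        1|6) echo ok > boot ;;
        3)   echo broken > boot ;;
    esac
    git add version boot
    git commit -q -m "commit $i"
    git tag "c$i"
done

# Save only the side-bug fix (the change to "boot" in c6) as a patch file.
export FIX_PATCH="$tmp/fix.patch"
git diff --no-color c5 c6 -- boot > "$FIX_PATCH"

cat > "$tmp/test.sh" <<'EOF'
#!/bin/sh
# Apply the fix only where it applies cleanly, i.e. where the side bug exists.
if git apply --check "$FIX_PATCH" 2>/dev/null; then
    git apply "$FIX_PATCH"
fi
if [ "$(cat boot)" != ok ]; then
    status=125   # still "unbootable": let git bisect skip this commit
elif [ "$(cat version)" -ge 4 ]; then
    status=1     # bad: the regression being hunted first appears in c4
else
    status=0     # good
fi
git checkout -q -- .   # unapply the fix so the next checkout starts clean
exit "$status"
EOF
chmod +x "$tmp/test.sh"

git bisect start c10 c1 > /dev/null
git bisect run "$tmp/test.sh" > /dev/null

result=$(git describe --tags refs/bisect/bad)
echo "first bad commit: $result"
git bisect reset > /dev/null 2>&1
```

Without the patch, c3..c5 would all have to be skipped, and git bisect could only report that the first bad commit is one of several candidates.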

> For now I just skipped the affected kernels in the bisection, in the hope 
> that none of them is the last good or first bad one regarding the freeze 
> bug. Since so far all the kernels that showed this issue have been from a 
> USB merge, I don't think the freeze bug is in there.

Are you actually booting off of a USB device?  Even if you are, it
seems... strange... that a series of USB patches would cause an
ext4/readahead kernel OOPS.  Can you disable using USB devices, which
would hopefully prevent the problem from showing up?

Note by the way, that you don't have to try compiling at the points
chosen by "git bisect".  If you run into problems, you can try going
to the head of the USB patches, and if that works, report that
particular commit as "good" or "bad".

> So I just wanted to show that I am seriously working on tracking down that 
> likely Radeon KMS related freeze bug, and that it is time-consuming for me 
> due to having lots of unbootable kernels.

Have you reported this bug to the maintainer?  Is he helping you out?
Have you looked at the various Radeon-related commits between 2.6.34
and 2.6.33?  I imagine there probably aren't that many of them.  You
might try testing commits just before and after the Radeon-related
commits, which might speed up the git bisect significantly.
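A sketch of narrowing the search this way, on a throwaway repository: "git bisect start <bad> <good> -- <paths>" only offers commits that touch the given paths. The directories "radeon" and "usb" are hypothetical stand-ins for kernel subsystems, the regression is planted in the radeon commit c12, and the script counts how many commits actually get tested. This only works if the culprit really touches those paths; otherwise the bisection ends on the wrong commit.

```shell
#!/bin/sh
# Demo on a throwaway repository: restrict "git bisect" to commits that touch
# one directory. "radeon/" and "usb/" are stand-ins for kernel subsystems.
set -eu

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"
mkdir radeon usb

# 20 commits c1..c20: every 4th one touches radeon/, the others touch usb/.
# The regression is planted in the radeon commit c12.
i=1
while [ "$i" -le 20 ]; do
    if [ $((i % 4)) -eq 0 ]; then
        if [ "$i" -ge 12 ]; then state=bug; else state=ok; fi
        echo "$i $state" > radeon/state
        git add radeon/state
    else
        echo "$i" > usb/state
        git add usb/state
    fi
    git commit -q -m "commit $i"
    git tag "c$i"
    i=$((i + 1))
done

# Test script: record each run, report bad (exit 1) if the planted bug is present.
export COUNT_FILE="$tmp/runs"
: > "$COUNT_FILE"
cat > "$tmp/test.sh" <<'EOF'
#!/bin/sh
echo run >> "$COUNT_FILE"
! grep -q bug radeon/state 2>/dev/null
EOF
chmod +x "$tmp/test.sh"

# Only commits touching radeon/ (c4, c8, c12, c16, c20) are candidates.
git bisect start c20 c1 -- radeon > /dev/null
git bisect run "$tmp/test.sh" > /dev/null

result=$(git describe --tags refs/bisect/bad)
tested=$(wc -l < "$COUNT_FILE" | tr -d ' ')
echo "first bad commit: $result after $tested test runs"
git bisect reset > /dev/null 2>&1
```

On the real tree, "git log --oneline v2.6.33..v2.6.34 -- drivers/gpu/drm/radeon" would list the candidate commits first, and the same path can be passed to "git bisect start" after "--".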

						- Ted

* Re: stable? quality assurance?
  2010-09-04 23:23             ` Ted Ts'o
@ 2010-09-05  7:59               ` Martin Steigerwald
  0 siblings, 0 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-05  7:59 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 3725 bytes --]

Am Sonntag 05 September 2010 schrieb Ted Ts'o:
> On Sat, Sep 04, 2010 at 09:11:34PM +0200, Martin Steigerwald wrote:
> > Stop! I think we are misunderstanding each other.
> > 
> > It's a bug I stumbled across during the bisecting process. Neither
> > 2.6.33 nor 2.6.34 is affected, only some kernels in between. So I
> > didn't think it was worth reporting.
> > 
> > I took a photo of part of the backtrace, though, so if you want I will
> > open a bug report about it nonetheless. But I really think it was
> > fixed during the 2.6.33 to 2.6.34 development cycle.
> 
> FYI, it's fair game to send a note to LKML with the backtrace, saying,
> I'm getting this weird stack trace while trying to do a bisect; it
> looks like it's fixed in 2.6.34, does it look familiar?  If so,
> someone might be able to point you at the commit that fixes the bug,
> and then you can apply that patch by hand while doing the bisect at
> each step (and then unapply it before doing the next bisect
> iteration).

Thanks. Following your advice, I am asking for help with bisecting this bug 
again. See the thread "help with git bisecting a bug 16376: random - possibly 
Radeon DRM KMS related - freezes". I put you on Cc for the Ext4 / 
readahead related backtrace.
 
> > For now I just skipped the affected kernels in the bisection, in the
> > hope that none of them is the last good or first bad one regarding
> > the freeze bug. Since so far all the kernels that showed this issue
> > have been from a USB merge, I don't think the freeze bug is in there.
> 
> Are you actually booting off of a USB device?  Even if you are, it
> seems... strange... that a series of USB patches would cause an
> ext4/readahead kernel OOPS.  Can you disable using USB devices, which
> would hopefully prevent the problem from showing up?

Nope. I think the bug is completely unrelated to the commits from the USB 
merge. I think the USB commits just had the bad luck of being merged 
between the point where the other bug was introduced and the point where 
it was fixed.

> Note by the way, that you don't have to try compiling at the points
> chosen by "git bisect".  If you run into problems, you can try going
> to the head of the USB patches, and if that works, report that
> particular commit as "good" or "bad".

Yes, that's what the git reset --hard example should do. But I wondered 
how to do it exactly. I saw "git reset --hard HEAD~3" in the manpage to go 
three commits back and only later found out that I could give a commit id 
to "git reset". Is just going to the head of that USB merge and testing 
that better than skipping the complete range? Anyway, I really think that 
none of the commits in there caused or fixed that bug.

> > So I just wanted to show that I am seriously working on tracking down
> > that likely Radeon KMS related freeze bug, and that it is
> > time-consuming for me due to having lots of unbootable kernels.
> 
> Have you reported this bug to the maintainer?  Is he helping you out?
> Have you looked at the various Radeon-related commits between 2.6.34
> and 2.6.33?  I imagine there probably aren't that many of them.  You
> might try testing commits just before and after the Radeon-related
> commits, which might speed up the git bisect significantly.

Yes, of course. I also posted my previous git bisect results already. I 
wanted to add a comment with the current results yesterday, but bugzilla 
had too many MySQL connections for an extended period of time. Now I have 
done so, asking more specifically for help[1].

[1] https://bugzilla.kernel.org/show_bug.cgi?id=16376#c38

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: stable? quality assurance?
  2010-09-04 16:38       ` Martin Steigerwald
  2010-09-04 18:46         ` Ted Ts'o
  2010-09-04 19:24         ` Stefan Richter
@ 2010-09-05  8:35         ` Avi Kivity
  2010-09-05  9:48           ` Martin Steigerwald
  2 siblings, 1 reply; 72+ messages in thread
From: Avi Kivity @ 2010-09-05  8:35 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

  On 09/04/2010 07:38 PM, Martin Steigerwald wrote:
> Am Sonntag 11 Juli 2010 schrieb Willy Tarreau:
>> Hi Martin,
> Hi Willy, hi everyone else reading this,
>

Interesting, how do you expect Willy to read this if you don't copy him?

Don't trim cc lists if you want people to read your email, especially on 
a high volume list like lkml.

-- 
error compiling committee.c: too many arguments to function


* Re: stable? quality assurance?
  2010-09-05  8:35         ` Avi Kivity
@ 2010-09-05  9:48           ` Martin Steigerwald
  0 siblings, 0 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-05  9:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 1007 bytes --]

Am Sonntag 05 September 2010 schrieb Avi Kivity:
>   On 09/04/2010 07:38 PM, Martin Steigerwald wrote:
> > Am Sonntag 11 Juli 2010 schrieb Willy Tarreau:
> >> Hi Martin,
> > 
> > Hi Willy, hi everyone else reading this,
> 
> Interesting, how do you expect Willy to read this if you don't copy
> him?
> 
> Don't trim cc lists if you want people to read your email, especially on
> a high volume list like lkml.

It was a mistake. I sent another copy with him on Cc, and he has actually 
already replied. There are mailing lists, like all the Debian ones, where 
Cc's are usually not wanted - even on the higher volume lists - and mailing 
lists where they are wanted, like most Linux kernel related ones. Sometimes 
when I switch from Debian lists to Linux kernel related ones I forget the 
Cc. It would be nice to have a default setting per folder in KMail for 
this. ;)

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: stable? quality assurance?
  2010-09-04 19:33   ` Martin Steigerwald
@ 2010-09-04 20:19     ` Willy Tarreau
  0 siblings, 0 replies; 72+ messages in thread
From: Willy Tarreau @ 2010-09-04 20:19 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

On Sat, Sep 04, 2010 at 09:33:27PM +0200, Martin Steigerwald wrote:
> > Thus at one point you can't hope to get bug reports anymore.
> > When you see an -rc7 or -rc8, you think "hey, -rc4 was OK, let's
> > wait for -final and install it".
> 
> That fits perfectly well. If the first RCs are well tested, then ideally
> all major issues should be fixed by the time rc7 or rc8 is reached. And
> thus the time can be spent on fixing the major remaining open regressions.

OK I see that you're talking about *open* regressions. I thought you were
talking about bugs in general. I think (but that's my own feeling) that as
soon as the cause of a regression is narrowed down enough to identify the
commit that caused it, it gets fixed quickly (though I have no numbers on
the subject). But when someone says "I was doing this or that when my
kernel froze", it can be anything. Drivers are different because they
impact fewer people than the core. However, the developers don't always
have access to the hardware combination causing a reproducible error case.

> I guess
> those who reported these regressions are interested in testing a fix.

I really think that there's good interactivity when the bug is spotted.
The hard part is the one before.

> >   - people concerned by stability don't test every release. They test
> > when they can, precisely because they can't impact production. So they
> > don't contribute bug reports in time. And as the 2.4 maintainer, I'm
> > well aware of that, because when I break something, I only know about
> > it 3-4 months later.
> 
> How does this affect my suggestion above? If, as you say, the first RCs are
> tested better, and if, as I assume, those who reported regressions have an
> interest in testing their fixes, I think this can work out nicely.

But you can't have developers sit on their code for 4 months waiting for
bug reports to come in. And if you're talking about open bugs only, each
one of them will think the issue is probably in the other one's code.
That's a common problem in development teams.

> Aside from that, I am not sure whether most people step in as early as rc1
> or rc2. When I tested rc kernels (I have done so a few times) I usually
> waited for rc3 or rc4 so I could be somewhat confident that really major
> issues were already fixed.

I think that people waiting for a specific feature will immediately jump on
rc1 or rc2. People who are curious about what was stuffed in the new kernel
will likely wait for rc3/4, hoping to get something they can run a day long.

> > I think that trying to evaluate and publish quality per developer or
> > maintainer can have a better effect because everyone in the commit
> > chain is responsible. But even doing that is hard because some changes
> > touch everything and it's not obvious to say that Mr X or Y has done
> > some crap.
> 
> And who judges what is crap? Build failures could be tracked
> automatically, and perhaps even some performance regressions, as the
> automated tests from Phoronix show. Boot failures or freezes are even more
> important, though. But then, you are probably not judging the quality of the
> work of the developer but the difficulty of the area he works on.

I agree with you in general on this point, which makes the issue even harder
to solve. However, some bugs are definitely caused by crap (look at Al
Viro's occasional audit reports; missing locks and things like this should
not get merged). Every developer starts out inexperienced, and may humbly ask
for help.

> Nix pointed out that programming ATI Radeon cards can be quite 
> challenging. And I do have lots of respect for the Radeon KMS related 
> work. So I think it would be unfair to point at one of the Radeon KMS 
> developers and say to him "you did crap" for example.

100% agreed. It's the same in my opinion for every piece of code that
relies on configs that are hard to obtain. For instance, if a driver
breaks on configs with more than 256 CPUs or 1 TB of RAM, we can't
necessarily blame the author for not being able to test his code in
such situations.

> I think crap does happen and am more concerned about how to handle it when 
> it does.

OK, but when an unusual config is required, sometimes the author cannot do
much to get his code fixed.

> Okay, my contribution then: I report bugs. I have reported 4-5 kernel bugs
> recently. I reported some before, but only occasionally.

That's really nice.

> I didn't
> face that many bugs prior to 2.6.34, which contributed to my admittedly
> very subjective impression that kernel quality has declined.

Possible, but it's also possible that the new bugs affect an area that
you're using much more than the ones affected by bugs in older versions.
It's also possible that you became better at noticing bugs.

> > Last, developers must not betray their users' trust. When they're not
> > certain of their code, this must be advertised (this is often the case
> > but not always). That greatly helps end users select only reliable
> > features and experience more stability.
> 
> Well, for me a balance must be struck: a kernel has to work well enough
> for me to use it regularly.

That's what everyone looks for, and obviously the threshold is not the same
for everyone, and the bugs don't affect everyone. You see, while 2.4 is in
feature freeze and thought to be very stable by its users (and I occasionally
encounter systems with 2 years of uptime under permanent stress), I would
not be surprised if some people considered it still not stable enough for
their usages. It's just a matter of personal taste.

> And currently 2.6.34 up to 2.6.36-rc2 on my ThinkPad
> T42 simply do not fulfil that criterion. What annoys me most: Radeon KMS
> already worked perfectly stably on 2.6.33 for me. So the issue was not in
> the initial version of Radeon KMS; it was introduced afterwards. Thus
> a supposedly more mature and stable version of it is working less stably
> for me.

That's where you're wrong. 2.6.34 is not supposed to be a more
mature and stable version than 2.6.33. It's supposed to be a more *advanced*
version. Some issues were fixed, some features were added, some improvements
were made, and many bugs were added in that whole process. In my opinion,
there's a rule to follow concerning kernel upgrades: you should only upgrade
for at least one of these 4 reasons:
  - test new kernels
  - get new features
  - fix a known bug
  - remain on a supported version

It's very likely that you'll regularly switch between newer and older kernels
to switch between the first 2 and the last 2 reasons. But people who upgrade
just to be on the edge and who don't even contribute bug reports back are just
looking for trouble in my opinion.

Regards,
Willy


* Re: stable? quality assurance?
  2010-09-04 17:22 ` Willy Tarreau
@ 2010-09-04 19:33   ` Martin Steigerwald
  2010-09-04 20:19     ` Willy Tarreau
  0 siblings, 1 reply; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-04 19:33 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 7236 bytes --]


Hi again,

On Saturday, 04 September 2010, Willy Tarreau wrote:
> On Sat, Sep 04, 2010 at 06:42:19PM +0200, Martin Steigerwald wrote:
> (...)
> 
> > The main idea here is to have a two-staged freeze process and to
> > distribute the "I am only taking bug fixes" work to more people than
> > Linus.
> > 
> > For this to work properly, I think that when the stable kernel is
> > released, subsystem maintainers and Andrew should branch their
> > trees. For example, when 2.6.36 is released:
> > 
> > - tree
> > 
> >   => 2.6.36-stable-tree
> >   => tree, where 2.6.37 stuff will be going in
> > 
> > Thus when subsystem maintainers take new stuff during the merge
> > window, it will already be for the next kernel release, not for the
> > current one, except for bug-fix work. That said, I think the criteria
> > for bug-fix work should not be as strict as those for the stable
> > patches Greg collects.
> > 
> > Thus it needs to be clear: no new stuff for the next kernel, starting
> > two weeks before the current stable kernel is released.
> 
> While I respect your beliefs on this matter (they once were mine too),
> I now realize I was wrong for several reasons :
>   - most developers want to create. They (generally) test what they
> create, they believe it's flawless because it works for them. No need
> for more testing, go on with new features ; if you refuse to merge
> their new work for some time, they work on their own tree and push you
> more work at once next time.
> 
>   - developers need real world use cases. That means more testers.
> Developers are bad testers because they don't trigger the unexpected
> use cases. And how do you get good testers ? by motivating end users
> to test your code. Most testers will only test a new kernel to get a
> new feature. If it works for them, no need to push the tests further.
> So that means that the first RCs are the most tested, and that the
> later ones are the least tested. Thus at one point you can't hope to
> get bug reports anymore. When you see an -rc7 or -rc8, you think "hey,
> -rc4 was OK, let's wait for -final and install it".

That fits perfectly well. If the first RCs are well tested, then ideally
all major issues should be fixed by the time rc7 or rc8 is reached. And
thus the time can be spent on fixing the major remaining open regressions.
I guess those who reported these regressions are interested in testing a fix.

For me, too, features have been the number one reason to upgrade kernels,
but then it's not a yes-or-no decision; it's more about tuning how much new
feature work each stable kernel release should get, and about paying a
little more attention to making a stable kernel release stable.

>   - people concerned by stability don't test every release. They test
> when they can, precisely because they can't impact production. So they
> don't contribute bug reports in time. And as the 2.4 maintainer, I'm
> well aware of that, because when I break something, I only know about
> it 3-4 months later.

How does this affect my suggestion above? If, as you say, the first RCs are
tested better, and if, as I assume, those who reported regressions have an
interest in testing their fixes, I think this can work out nicely.

Aside from that, I am not sure whether most people step in as early as rc1
or rc2. When I tested rc kernels (I have done so a few times) I usually
waited for rc3 or rc4 so I could be somewhat confident that really major
issues were already fixed.

> For this reason, I think the release rhythm can't much be changed.

I still disagree, for the reasons given above, and because I think that if
something does not work out perfectly it can still be improved. But I am
interested in your other suggestions as well, because maybe it's not so much
the release process but something else that is the issue here:

> I think that trying to evaluate and publish quality per developer or
> maintainer can have a better effect because everyone in the commit
> chain is responsible. But even doing that is hard because some changes
> touch everything and it's not obvious to say that Mr X or Y has done
> some crap.

And who judges what is crap? Build failures could be tracked
automatically, and perhaps even some performance regressions, as the
automated tests from Phoronix show. Boot failures or freezes are even more
important, though. But then, you are probably not judging the quality of the
work of the developer but the difficulty of the area he works on.

Nix pointed out that programming ATI Radeon cards can be quite 
challenging. And I do have lots of respect for the Radeon KMS related 
work. So I think it would be unfair to point at one of the Radeon KMS 
developers and say to him "you did crap" for example.

I think crap does happen and am more concerned about how to handle it when 
it does.

> In my opinion, reporting bugs is the most effective way of improving
> quality. If you report 10 bugs in a week on the same driver, there are
> chances that at one point this driver's author will want to take some
> time to audit his code and find other bugs before you next point your
> finger at him/her. As you see, the goal is not just to report bugs to
> get them fixed, but to educate bug authors.

Okay, my contribution then: I report bugs. I have reported 4-5 kernel bugs
recently. I reported some before, but only occasionally. I didn't
face that many bugs prior to 2.6.34, which contributed to my admittedly
very subjective impression that kernel quality has declined.

> I can tell you that I am an author of quite a number of bugs in another
> project (haproxy), and I absolutely hate it when a bug is detected in
> production (especially given the product's goal), to the point that the
> code is generally reworked 2, 3, 5, 10 times before being committed. Of
> course it is still not enough to catch all bugs, but since the product
> has got a widely accepted reputation of being rock solid, I think it
> works quite well after all.

Interesting project. At work, I am currently implementing a highly available
active/passive load balancer cluster using Corosync, Pacemaker and the IPVS
frontend Ldirectord.

> Last, developers must not betray their users' trust. When they're not
> certain of their code, this must be advertised (this is often the case
> but not always). That greatly helps end users select only reliable
> features and experience more stability.

Well, for me a balance must be struck: a kernel has to work well enough for
me to use it regularly. And currently 2.6.34 up to 2.6.36-rc2 on my ThinkPad
T42 simply do not fulfil that criterion. What annoys me most: Radeon KMS
already worked perfectly stably on 2.6.33 for me. So the issue was not in
the initial version of Radeon KMS; it was introduced afterwards. Thus
a supposedly more mature and stable version of it is working less stably
for me.

2.6.33-tp42-01231-g11b897c has been good to me so far. I am glad it has
not frozen yet. I'd better press send now.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: stable? quality assurance?
  2010-09-04 16:42 Martin Steigerwald
@ 2010-09-04 17:22 ` Willy Tarreau
  2010-09-04 19:33   ` Martin Steigerwald
  0 siblings, 1 reply; 72+ messages in thread
From: Willy Tarreau @ 2010-09-04 17:22 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-kernel

Hi Martin,

On Sat, Sep 04, 2010 at 06:42:19PM +0200, Martin Steigerwald wrote:
(...)
> The main idea here is to have a two-staged freeze process and to 
> distribute the "I am only taking bug fixes" work to more people than Linus.
> 
> For this to work properly, I think that when the stable kernel is
> released, subsystem maintainers and Andrew should branch their trees.
> For example, when 2.6.36 is released:
> 
> - tree 
>   => 2.6.36-stable-tree
>   => tree, where 2.6.37 stuff will be going in
> 
> Thus when subsystem maintainers take new stuff during the merge window, it
> will already be for the next kernel release, not for the current one,
> except for bug-fix work. That said, I think the criteria for bug-fix work
> should not be as strict as those for the stable patches Greg collects.
> 
> Thus it needs to be clear: no new stuff for the next kernel, starting two
> weeks before the current stable kernel is released.

While I respect your beliefs on this matter (they once were mine too), I now
realize I was wrong for several reasons :
  - most developers want to create. They (generally) test what they create,
    they believe it's flawless because it works for them. No need for more
    testing, go on with new features ; if you refuse to merge their new work
    for some time, they work on their own tree and push you more work at once
    next time.

  - developers need real world use cases. That means more testers. Developers
    are bad testers because they don't trigger the unexpected use cases. And
    how do you get good testers ? by motivating end users to test your code.
    Most testers will only test a new kernel to get a new feature. If it works
    for them, no need to push the tests further. So that means that the first
    RCs are the most tested, and that the later ones are the least tested.
    Thus at one point you can't hope to get bug reports anymore. When you see
    an -rc7 or -rc8, you think "hey, -rc4 was OK, let's wait for -final and
    install it".

  - people concerned by stability don't test every release. They test when
    they can, precisely because they can't impact production. So they don't
    contribute bug reports in time. And as the 2.4 maintainer, I'm well
    aware of that, because when I break something, I only know about it 3-4
    months later.

For this reason, I think the release rhythm can't much be changed. I think
that trying to evaluate and publish quality per developer or maintainer can
have a better effect because everyone in the commit chain is responsible.
But even doing that is hard because some changes touch everything and it's
not obvious to say that Mr X or Y has done some crap.

In my opinion, reporting bugs is the most effective way of improving
quality. If you report 10 bugs in a week on the same driver, there are
chances that at one point this driver's author will want to take some
time to audit his code and find other bugs before you next point your
finger at him/her. As you see, the goal is not just to report bugs to
get them fixed, but to educate bug authors.

I can tell you that I am an author of quite a number of bugs in another
project (haproxy), and I absolutely hate it when a bug is detected in
production (especially given the product's goal), to the point that the
code is generally reworked 2, 3, 5, 10 times before being committed. Of
course it is still not enough to catch all bugs, but since the product
has got a widely accepted reputation of being rock solid, I think it
works quite well after all.

Last, developers must not betray their users' trust. When they're not
certain of their code, this must be advertised (this is often the case
but not always). That greatly helps end users select only reliable features
and experience more stability.

Regards,
Willy


* Re: stable? quality assurance?
@ 2010-09-04 16:42 Martin Steigerwald
  2010-09-04 17:22 ` Willy Tarreau
  0 siblings, 1 reply; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-04 16:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: Willy Tarreau

[-- Attachment #1: Type: text/plain, Size: 4929 bytes --]

Sorry, forgot Cc again.

On Sunday, 11 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy, hi everyone else reading this,

> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From
> > the  current replies I perceive a lack of that ability.
> 
> well, I'll try to do then :-)
> 
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
> 
> Among the things he explained, I remember that one of the primary concerns
> was the inability to slow down development. I mean, if he waits 2 more
> weeks for things to stabilize, then there will be two more weeks of
> crap^H^H^H^Hdevelopment merged in next merge window, so in fact this
> will just shift dates and not quality.

While bisecting [Bug 16376] random - possibly Radeon DRM KMS related
freezes, which is going very slowly because many of the kernels are
unbootable with an ext4 / readahead related backtrace during boot, I had
an idea:

I think the main problem is that the current development process does not
leave time for quality work and bug fixing. As I understand it, currently
it's just constant development of new features, with bug fixing and quality
work having to be done alongside that development:

- before 2.6.36 is released, developers aim at developing new stuff for
2.6.37.

- after 2.6.36 is released, developers aim at getting as much stuff as
possible into 2.6.37 and then, after two weeks, at developing new features
for 2.6.38.

This process does not take bug fixing into account at all, because after the
merge window has closed, developers hurry to get their stuff ready for the
next window.

In that model, extending the freeze period after rc1 doesn't help at all,
because, as you say, more "crap^H^H^H^Hdevelopment" gets collected for the
next kernel.

But is that a *given* that no one actually has any influence over? Is
collecting changes for the next kernel like rain that either pours down or
not - usually pours down in this case, like in August in Germany ;)? Who
feeds Linus with new stuff during the merge window? From what I understand
of the Linux development process, it's mainly the subsystem maintainers and
Andrew Morton.

What if those people stopped collecting new stuff for Linus, except bugfixes,
about two or three weeks before the next kernel is released? This would
give the subsystem trees and the mm tree some time to stabilize a bit, so
that Linus gets higher-quality stuff in the first place. And more
importantly, since developers know that subsystem maintainers and Andrew
only collect bugfixes 2-3 weeks before the release of a stable kernel, they
can spend some time on quality work as well.

Of course, developers can still decide "Well, 2.6.37 is closed already"
and continue developing for 2.6.38 even earlier, but I still think this
would help to slow things down a bit during the critical phase before a
stable kernel is released. Because when I know my subsystem maintainer or
Andrew won't be taking my stuff anyway before the stable kernel is
released, I can take a little time for other things.

The main idea here is to have a two-staged freeze process and to 
distribute the "I am only taking bug fixes" work to more people than Linus.

For this to work properly, I think that when the stable kernel is
released, subsystem maintainers and Andrew should branch their trees.
For example, when 2.6.36 is released:

- tree 
  => 2.6.36-stable-tree
  => tree, where 2.6.37 stuff will be going in

Thus when subsystem maintainers take new stuff during the merge window, it
will already be for the next kernel release, not for the current one,
except for bug-fix work. That said, I think the criteria for bug-fix work
should not be as strict as those for the stable patches Greg collects.

Thus it needs to be clear: no new stuff for the next kernel, starting two
weeks before the current stable kernel is released.

I think this could help. It's a bit like the two-staged development
process of Debian, but with the freeze period for "unstable" being a fixed
time interval of about 2 weeks instead of RC=0 for stable ;). It's a bit of
a formal shift of attention to the stable kernel about 2 weeks before its
release. Developers might find creative ways to circumvent it, or they
might understand that this process serves the purpose of improving kernel
quality.

If you think these two weeks cannot be squeezed into the three-monthly
development cycle, a four-monthly development cycle might do. But actually
I don't see why these two weeks could not be made to fit in there.

Installing and testing the next kernel after yet another mail to this thread,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

Thread overview: 72+ messages
2010-07-11  7:18 stable? quality assurance? Martin Steigerwald
2010-07-11  8:39 ` Eric Dumazet
2010-07-11 14:22   ` Martin Steigerwald
2010-07-11 14:52     ` Martin Steigerwald
2010-07-11 15:58   ` William Pitcock
2010-07-11 16:34     ` Eric Dumazet
2010-07-16  6:59     ` Greg KH
2010-08-05  3:27       ` Jeremy Fitzhardinge
2010-07-11 17:04   ` Heinz Diehl
2010-07-11 13:16 ` Ted Ts'o
2010-07-11 18:02   ` Anca Emanuel
2010-07-12  6:46   ` David Newall
     [not found]     ` <AANLkTilGjfx9sb66qVfZn1SeFPURHUrrdE7JCrild8VX@mail.gmail.com>
2010-07-12 12:35       ` Fwd: " Marcin Letyns
2010-07-12 12:42         ` Alexey Dobriyan
     [not found]           ` <AANLkTik64lxDiCN-eRo3i_-cTqAvCzbaRI4EEXoD44Vj@mail.gmail.com>
2010-07-12 12:52             ` Fwd: " Marcin Letyns
2010-07-12 14:57           ` Valdis.Kletnieks
2010-07-12 15:56       ` David Newall
2010-07-12 17:48         ` Marcin Letyns
2010-07-12 18:00         ` Stefan Richter
2010-07-12 19:58           ` David Newall
2010-07-12 21:11             ` Stefan Richter
2010-07-12 21:39             ` Martin Steigerwald
2010-07-12 22:44               ` Stefan Richter
2010-07-15  7:23             ` david
2010-07-13 16:50         ` Theodore Tso
2010-07-13 20:45           ` David Newall
2010-07-14  6:33             ` Theodore Tso
2010-09-04 17:12   ` Martin Steigerwald
2010-07-11 13:56 ` Lee Mathers
2010-07-11 14:51   ` Martin Steigerwald
2010-07-11 17:22     ` Willy Tarreau
2010-07-11 21:38       ` Rafael J. Wysocki
2010-07-12  4:17         ` Willy Tarreau
2010-07-12  9:56       ` Martin Steigerwald
2010-07-12 15:43       ` Martin Steigerwald
2010-07-12 17:36         ` Willy Tarreau
2010-07-12 19:56           ` Martin Steigerwald
2010-07-12 23:03             ` Stefan Richter
2010-07-13 10:30               ` Martin Steigerwald
2010-07-15  7:32               ` david
2010-07-12 17:55         ` Stefan Richter
2010-09-04 16:38       ` Martin Steigerwald
2010-09-04 18:46         ` Ted Ts'o
2010-09-04 19:11           ` Martin Steigerwald
2010-09-04 23:23             ` Ted Ts'o
2010-09-05  7:59               ` Martin Steigerwald
2010-09-04 19:24         ` Stefan Richter
2010-09-04 19:34           ` Stefan Richter
2010-09-04 20:21           ` Martin Steigerwald
2010-09-04 22:50             ` Stefan Richter
2010-09-04 23:16             ` Ted Ts'o
2010-09-05  8:35         ` Avi Kivity
2010-09-05  9:48           ` Martin Steigerwald
2010-07-11 19:49     ` Stefan Richter
2010-07-13 11:11     ` Alejandro Riveira Fernández
2010-07-13 12:50       ` rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?) Stefan Richter
2010-07-13 15:35         ` John W. Linville
2010-07-13 18:19           ` Alejandro Riveira Fernández
2010-07-13 18:38             ` John W. Linville
2010-07-13 19:07               ` Alejandro Riveira Fernández
2010-07-13 18:06         ` Alejandro Riveira Fernández
2010-07-13 19:18           ` Stefan Richter
2010-07-12 19:46 ` stable? quality assurance? Nix
     [not found] ` <AANLkTimEdVsmIgXBbmhsq75ElQvGAI8avsM8-wlDpm4z@mail.gmail.com>
2010-07-15  9:09   ` Valeo de Vries
2010-07-16  7:00     ` Greg KH
2010-07-16  7:19       ` Justin P. Mattock
2010-07-16 15:25       ` Randy Dunlap
2010-07-16 15:34       ` Valeo de Vries
2010-09-04 16:42 Martin Steigerwald
2010-09-04 17:22 ` Willy Tarreau
2010-09-04 19:33   ` Martin Steigerwald
2010-09-04 20:19     ` Willy Tarreau
