linux-kernel.vger.kernel.org archive mirror
* Slow DOWN, please!!!
@ 2008-04-30  2:03 David Miller
  2008-04-30  4:03 ` David Newall
                   ` (2 more replies)
  0 siblings, 3 replies; 229+ messages in thread
From: David Miller @ 2008-04-30  2:03 UTC (permalink / raw)
  To: linux-kernel


This is starting to get beyond frustrating for me.

Yesterday, I spent the whole day bisecting boot failures
on my system due to the totally untested linux/bitops.h
optimization, which I fully analyzed and debugged.

Today, I had hoped that I could get some work done of my
own, but that's not the case.

Yet another bootup regression got added within the last 24
hours.

I don't mind fixing a regression or two during the merge
window, but THIS IS ABSOLUTELY, FUCKING, RIDICULOUS!

The tree breaks every day, and it's becoming an extremely
non-fun environment to work in.

We need to slow down the merging, we need to review things
more, we need people to test their fucking changes!

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30  2:03 Slow DOWN, please!!! David Miller
@ 2008-04-30  4:03 ` David Newall
  2008-04-30  4:18   ` David Miller
                     ` (2 more replies)
  2008-04-30 14:48 ` Peter Teoh
  2008-04-30 19:36 ` Rafael J. Wysocki
  2 siblings, 3 replies; 229+ messages in thread
From: David Newall @ 2008-04-30  4:03 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, Linus Torvalds

David Miller wrote:
> We need to slow down the merging, we need to review things
> more, we need people to test their fucking changes!

Yes.  The Linux process is becoming unreliable.  Newly "stable" versions
have stability problems.  The development process looks childish. 
Seasoned developers say not to worry, that the process works.  I do
worry.  BSD seems more attractive, and it may even be worth the
considerable effort to switch my entire client-base.  Linux was lucky to
gain the foothold that it did: traditionally, BSD had a better system
with a less restrictive licence, so it is surprising that manufacturers
chose to go with Linux.  BSD still has a less restrictive licence, and
when the mainstream press becomes interested in Linux's quality problems
its adoption will fall.  BSD is still a good, maybe even better, option.

Linus, this is your baby and so it's your problem.  Only you have the
influence to change things.


* Re: Slow DOWN, please!!!
  2008-04-30  4:03 ` David Newall
@ 2008-04-30  4:18   ` David Miller
  2008-04-30 13:04     ` David Newall
  2008-04-30  7:11   ` Tarkan Erimer
  2008-04-30 14:55   ` Russ Dill
  2 siblings, 1 reply; 229+ messages in thread
From: David Miller @ 2008-04-30  4:18 UTC (permalink / raw)
  To: davidn; +Cc: linux-kernel, torvalds

From: David Newall <davidn@davidnewall.com>
Date: Wed, 30 Apr 2008 13:33:29 +0930

> Yes.

Please don't use my posting as an opportunity to portray
BSD as the best thing since sliced bread.

We're having ONE bad merge window, we're facing the problem
head on, RIGHT NOW, to prevent it in the future.  It's
not a severe ongoing issue as you portray it to be.


* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01  0:31       ` RFC: starting a kernel-testers group for newbies Adrian Bunk
@ 2008-04-30  7:03         ` Arjan van de Ven
  2008-05-01  8:13           ` Andrew Morton
  2008-05-01 11:30           ` Adrian Bunk
  2008-05-01  0:41         ` David Miller
  1 sibling, 2 replies; 229+ messages in thread
From: Arjan van de Ven @ 2008-04-30  7:03 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Linus Torvalds, Andrew Morton, Rafael J. Wysocki, davem,
	linux-kernel, jirislaby, Steven Rostedt

On Thu, 1 May 2008 03:31:25 +0300
Adrian Bunk <bunk@kernel.org> wrote:

> On Wed, Apr 30, 2008 at 01:31:08PM -0700, Linus Torvalds wrote:
> > 
> > 
> > On Wed, 30 Apr 2008, Andrew Morton wrote:
> > > 
> > > <jumps up and down>
> > > 
> > > There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!
> > 
> > The problem I see with both -mm and linux-next is that they tend to
> > be better at finding the "physical conflict" kind of issues (ie the
> > merge itself fails) than the "code looks ok but doesn't actually
> > work" kind of issue.
> > 
> > Why?
> > 
> > The tester base is simply too small.
> > 
> > Now, if *that* could be improved, that would be wonderful, but I'm
> > not seeing it as very likely.
> > 
> > I think we have fairly good penetration these days with the regular
> > -git tree, but I think that one is quite frankly a *lot* less scary
> > than -mm or -next are, and there it has been an absolutely huge
> > boon to get the kernel into the Fedora test-builds etc (and I
> > _think_ Ubuntu and SuSE also started something like that).
> > 
> > So I'm very pessimistic about getting a lot of test coverage before
> > -rc1.
> > 
> > Maybe too pessimistic, who knows?
> 
> First of all:
> I 100% agree with Andrew that our biggest problems are in reviewing
> code and resolving bugs, not in finding bugs (we already have far too
> many unresolved bugs).

I would argue instead that we don't know which bugs to fix first.
We're never going to fix all bugs, and to be honest, that's ok.
As long as we fix the important bugs, we're doing really well.
And at least for the kerneloops.org reported issues, we're doing quite ok.

For me, 'important' is a combination of the effect of the bug and the number of people
it'll hit. A compiler warning on parisc matters less than easy-to-trigger filesystem corruption
in ext3, by that measure: more people will hit the latter, and its effect is more grave.
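That notion of importance, effect weighted by the number of people hit, could be sketched as a trivial scoring pass. The bug names, severity values, and user counts below are all made up:

```shell
#!/bin/sh
set -e
bugs=$(mktemp)
# columns: bug-name severity(1-10) estimated-users-hit  (all invented)
cat > "$bugs" <<'EOF'
ext3-corruption 9 50000
i915-crash 7 20000
parisc-warning 2 30
EOF
# score = severity * estimated users hit; highest-priority bug first
scored=$(awk '{ printf "%d %s\n", $2 * $3, $1 }' "$bugs" | sort -rn)
echo "$scored"
rm -f "$bugs"
```

The severity column is exactly the kind of non-quantifiable judgement call discussed later in the thread; the sketch only shows that once you commit to numbers, the ranking itself is mechanical.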


For oopses and WARN_ON()s, we're getting the hang of this now with kerneloops.org,
at least for the oopses that aren't outright fatal. One thing I've learned is that
lkml is a poor representation of what people actually hit; it's a very, very selective
audience.
Oopses/warnings are only a subset of the bugs, of course... but still.

So there are a few things we (and you / the janitors) can do over time to get better data on what issues
people hit:
1) Get automated collection of issues more widespread. The wider our net, the better we know which
   issues get hit a lot, and simply the more data we have on when things start, when they stop, etc.
   Especially if you get a lot of testers in your project, I'd like them to install the client for easy
   reporting of issues.
2) We should add more WARN_ON()s on "known bad" conditions. If it WARN_ON()s, we can learn about it via
   the automated collection. And we can then do the statistics to figure out which ones happen a lot.
3) We need to get persistent-across-reboot oops saving going; there are some avenues for this.
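The statistics in point 2 can be as crude as a frequency count over the collected reports. A toy sketch follows; the WARNING signatures are invented, and this is in no way the actual kerneloops.org pipeline:

```shell
#!/bin/sh
set -e
reports=$(mktemp)
# one collected report per line (invented signatures)
cat > "$reports" <<'EOF'
WARNING: at drivers/gpu/drm/i915/i915_gem.c:1234
WARNING: at fs/ext3/inode.c:567
WARNING: at drivers/gpu/drm/i915/i915_gem.c:1234
WARNING: at drivers/gpu/drm/i915/i915_gem.c:1234
WARNING: at fs/ext3/inode.c:567
WARNING: at arch/parisc/kernel/traps.c:89
EOF
# rank signatures by how often they were hit, most frequent first
ranked=$(sort "$reports" | uniq -c | sort -rn)
echo "$ranked"
rm -f "$reports"
```

The widest-net argument in point 1 is what makes such a count meaningful: with few reporters the top of the list is noise, with many it approximates what users actually hit.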




* Re: Slow DOWN, please!!!
  2008-04-30  4:03 ` David Newall
  2008-04-30  4:18   ` David Miller
@ 2008-04-30  7:11   ` Tarkan Erimer
  2008-04-30 13:28     ` David Newall
  2008-04-30 14:55   ` Russ Dill
  2 siblings, 1 reply; 229+ messages in thread
From: Tarkan Erimer @ 2008-04-30  7:11 UTC (permalink / raw)
  To: David Newall; +Cc: David Miller, linux-kernel, Linus Torvalds

David Newall wrote:
> Yes.  The Linux process is becoming unreliable.  Newly "stable" versions
> have stability problems.  The development process looks childish. 
> Seasoned developers say not to worry, that the process works.  I do
> worry.  BSD seems more attractive, and it may even be worth the
> considerable effort to switch my entire client-base.  Linux was lucky to
> gain the foothold that it did: traditionally, BSD had a better system
> with a less restrictive licence, so it is surprising that manufacturers
> chose to go with Linux.  BSD still has a less restrictive licence and
> when mainstream press becomes interested in Linux's quality problems
> its adoption will fall.  BSD is still a good, maybe even better, option.
>
> Linus, this is your baby and so it's your problem.  Only you have the
> influence to change things.
>   
I completely disagree with your foolish and nonsensical comments about the 
Linux Kernel and the Linux OS. It's perfectly clear that you don't 
understand well enough how the Linux development process works. If you 
think that the recently released kernels are not stable, then you can 
wait for the 2.6.x.y series or use the distro kernels. All of 
your comments are pointless and baseless. You are free to choose BSD or 
whatever you want to use. No one is putting a gun to your head to use 
Linux :-)

I can very easily say, from my own experience, that the Linux Kernel is 
PERFECTLY STABLE! I work at one of the largest ISPs in my country, and 
I use Linux very intensively under very high loads, and I have NEVER, NEVER 
faced any problem caused by a fault of the Linux Kernel in my 
environments. For example, many of our mail servers run on Linux, and all 
day long they process hundreds of thousands of emails without any downtime or 
trouble!

Manufacturers mostly choose Linux over the BSD flavors simply 
because the Linux kernel is technically superior to the BSDs and 
others. When it comes to licences: if the GPL is bad, the BSD licence is 
far worse. The GPL protects your freedom and the openness of the 
code by requiring that changes to the source be returned in open 
form. For BSD, it is the opposite. Anyone is free to take your code, 
and there is NO PROTECTION to prevent it from becoming closed 
(proprietary) source. Can you imagine a company (like Microsoft) 
taking your whole kernel source code and creating a PROPRIETARY OS (like 
Windows!), making a fool of you? Why? Simply because the BSD licence 
allows it! No need to return the code! Do you really think the 
BSD licence is more free when it makes a fool of you?


* Re: Slow DOWN, please!!!
  2008-04-30  4:18   ` David Miller
@ 2008-04-30 13:04     ` David Newall
  2008-04-30 13:18       ` Michael Kerrisk
  2008-04-30 14:51       ` Linus Torvalds
  0 siblings, 2 replies; 229+ messages in thread
From: David Newall @ 2008-04-30 13:04 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, torvalds

David Miller wrote:
> We're having ONE bad merge window, we're facing the problem
> head on, RIGHT NOW, to prevent it in the future.  It's
> not a severe ongoing issue as you portray it to be.

No.  The problem is more than just a bad merge window.  There is poor or
non-existent review; frequent "regressions"; release of kernels as
stable when they are not.  There is resentment and resistance to even
acknowledging these problems.  Take, as an example, the desire to NOT
record who gives good code and who gives bugs: that one clearly hit a
nerve, which it should not have except from people who feel guilty.

I don't claim that BSD is perfect, but it appears to have consistently
good quality.  Old Linux kernels also had that; new ones, not so much.


* Re: Slow DOWN, please!!!
  2008-04-30 13:04     ` David Newall
@ 2008-04-30 13:18       ` Michael Kerrisk
  2008-04-30 14:51       ` Linus Torvalds
  1 sibling, 0 replies; 229+ messages in thread
From: Michael Kerrisk @ 2008-04-30 13:18 UTC (permalink / raw)
  To: David Newall; +Cc: David Miller, linux-kernel, torvalds

> Take, as an example, the desire to NOT
>  record who gives good code and who gives bugs: that one clearly hit a
>  nerve, which it should not have except from people who feel guilty.

Speaking as someone who has found quite a few kernel bugs, but written
few (because I've written little kernel code ;-))...

No.  It hit a nerve because it's simply the wrong way of going about
things.  There is no use in assigning blame.

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html


* Re: Slow DOWN, please!!!
  2008-04-30  7:11   ` Tarkan Erimer
@ 2008-04-30 13:28     ` David Newall
  2008-04-30 13:38       ` Mike Galbraith
  2008-04-30 14:41       ` mws
  0 siblings, 2 replies; 229+ messages in thread
From: David Newall @ 2008-04-30 13:28 UTC (permalink / raw)
  To: Tarkan Erimer; +Cc: David Miller, linux-kernel, Linus Torvalds

Tarkan Erimer wrote:
> I completely disagree with your foolish and nonsensical comments about
> the Linux Kernel and the Linux OS. It's perfectly clear that you don't
> understand well enough how the Linux development process works.
> If you think that the recently released kernels are not stable, then
> you can wait for the 2.6.x.y series or use the distro kernels.
> All of your comments are pointless and baseless. You are free to choose
> BSD or whatever you want to use. No one is putting a gun to your head
> to use Linux :-)

The problem is not exactly faults in recently released kernels, rather
that introduction of faults is common when it should be rare, and
kernels are released as stable when they are fragile.  Ignoring a
problem, and not caring whether users migrate to BSD, is foolishness.  Of
course you don't want people to migrate to BSD, so don't pretend that
you don't care.

> Do you really think the BSD licence is more free when it makes a
> fool of you?

It is a matter of transparent fact that BSD's licence is less
restrictive than Linux's.  Whether that is desirable is not something
that need be discussed at this juncture.  My point in raising BSD was
that, from a commercial point of view, BSD is attractive in a way that
Linux is not.  The many commercial vendors who have been taken to task
for not honouring their GPL obligations are strong demonstrations of
that.  Do not pretend that Linux is sacrosanct.  BSD would be an easy
swap for vendors should Linux gain a reputation for poor quality (and it
already runs Linux applications.)

Reputations snowball.  By the time anybody notices that a good one has
become tarnished it could be too late, and take too long, to rectify. 
I'm sure somebody else observed approximately this just yesterday, so
it's not just me, is it?

I won't champion this because it's unimportant to me.  Linux's quality
problems are not my problems.  I do what I can to help Linux, but I'm
not religious about operating systems and I know that good, free
operating systems will continue to thrive, even if Linux dies, just as
they did before Linux was born.

Ignore the problem, even shoot the messenger, if you like; or be adult,
consider the proposition dispassionately, and take steps from there.

I've said my bit, in fact more than I wanted to, so I choose to stop here.


* Re: Slow DOWN, please!!!
  2008-04-30 13:28     ` David Newall
@ 2008-04-30 13:38       ` Mike Galbraith
  2008-04-30 14:41       ` mws
  1 sibling, 0 replies; 229+ messages in thread
From: Mike Galbraith @ 2008-04-30 13:38 UTC (permalink / raw)
  To: David Newall; +Cc: Tarkan Erimer, David Miller, linux-kernel, Linus Torvalds


On Wed, 2008-04-30 at 22:58 +0930, David Newall wrote:

> Ignore the problem, even shoot the messenger, if you like;

BANG!

#include <chicken_little_headstone.h>



* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01  8:13           ` Andrew Morton
@ 2008-04-30 14:15             ` Arjan van de Ven
  2008-05-01 12:42               ` David Woodhouse
  2008-05-04 12:45               ` Rene Herman
  2008-05-01  9:16             ` RFC: starting a kernel-testers group for newbies Frans Pop
  1 sibling, 2 replies; 229+ messages in thread
From: Arjan van de Ven @ 2008-04-30 14:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adrian Bunk, Linus Torvalds, Rafael J. Wysocki, davem,
	linux-kernel, jirislaby, Steven Rostedt

On Thu, 1 May 2008 01:13:46 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 30 Apr 2008 00:03:38 -0700 Arjan van de Ven
> <arjan@infradead.org> wrote:
> 
> > > First of all:
> > > I 100% agree with Andrew that our biggest problems are in
> > > reviewing code and resolving bugs, not in finding bugs (we
> > > already have far too many unresolved bugs).
> > 
> > I would argue instead that we don't know which bugs to fix first.
> 
> <boggle>
> 
> How about "a bug which we just added"?  One which is repeatable. 
> Repeatable by a tester who is prepared to work with us on resolving
> it. Those bugs.
> 
> Rafael has a list of them.  We release kernels when that list still
> has tens of unfixed regressions dating back up to a couple of months.
> 


I know he does. But I will still argue that if that is all we work from, and we treat
all of those equally, we're doing the wrong thing.
I'm sorry, but I really do not consider an "ext4 doesn't compile on m68k", which is
on that list, to be as relevant as an "i915 drm driver crashes" bug which has been with
us for a while and is not on that list, just based on the total user base for each of those.

Does that mean nobody should fix the m68k bug?
Someone who cares about m68k for sure should work on it, or, if it's easy for an ext4 developer,
sure. But if the ext4 person has to spend 8 hours figuring out cross-compilers, I say
we're doing something very wrong here. (No offense to the m68k people, but there are just
a few of you; maybe I should have picked voyager instead.)

Maybe that's a "boggle" for you, but for me it's symptomatic of where we are today:
we don't make (effective) prioritization decisions. Such decisions are hard, because they
effectively mean telling people "I'm sorry, but your bug is not yet important". That's
unpopular, especially if the reporter is very motivated on lkml. And it will involve a
certain amount of non-quantifiable judgement calls, which also means we won't always be
right. Another hard thing is that lkml is a very self-selecting audience. A bug may be
reported three times there but never hit otherwise, while another bug might not be reported
at all (or only once) while thousands and thousands of people are hitting it.

Not that we're doing all that badly; we ARE fixing the bugs (at least the oopses/warnings) that
are frequently hit. So I wouldn't blindly say we're doing a bad job at prioritizing. I would
rather say that if we focus only on what is left over, without doing a reality check,
we'll *always* have a negative view of quality, since there will *always* be bugs we don't
fix. Linux has well over ten million users (many more if you count embedded devices).
A lot of them will have "standard" hardware, and a bunch of them will have "weird" stuff.
Cosmic rays happen. As do overclocking and bad DIMMs. And some BIOSes are just weird, etc.
If we do not prioritize effectively we'll be stuck forever chasing ghosts, or we'll be stuck
saying "our quality sucks" forever without making progress.

Another trap is to look only at what goes wrong, not at what goes right... we tend to see only
what goes wrong on lkml, and it's an easy trap to fall into doom-thinking that way.
Are we doing worse on quality? My (subjective) opinion is that we are doing better than last year.
We are focused more on quality. We are fixing the bugs that people hit most. We are fixing most
of the regressions (yes, not all). Subsystems are seeing flat or falling bug counts and bug rates. Take ACPI:
the number of outstanding bugs *halved* over the last year. Of course you can pick a single
bug and say "but this one did not get fixed", but that just loses the big picture (and
proves the point :). All of this with a growing userbase and a rate of development that's a bit
faster than last year as well.

Can we do better? Always. More testing will help, both to detect things early and by
letting us figure out which bugs are important. Just saying "more testing is not relevant
because we're not even fixing the bugs we have now" is simply incorrect. Sorry.
More testers help. A wider range of hardware and usage lets us find better patterns
in the hard-to-track-down bugs. More testers means more people willing to see if they
can diagnose the bugs at least somewhat themselves, via bisection or otherwise. That's important,
because that's the part of the problem that scales well with a growing userbase.


* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 11:30           ` Adrian Bunk
@ 2008-04-30 14:20             ` Arjan van de Ven
  2008-05-01 12:53               ` Rafael J. Wysocki
  2008-05-01 13:21               ` Adrian Bunk
  0 siblings, 2 replies; 229+ messages in thread
From: Arjan van de Ven @ 2008-04-30 14:20 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Linus Torvalds, Andrew Morton, Rafael J. Wysocki, davem,
	linux-kernel, jirislaby, Steven Rostedt

On Thu, 1 May 2008 14:30:38 +0300
Adrian Bunk <bunk@kernel.org> wrote:

> On Wed, Apr 30, 2008 at 12:03:38AM -0700, Arjan van de Ven wrote:
> > On Thu, 1 May 2008 03:31:25 +0300
> > Adrian Bunk <bunk@kernel.org> wrote:
> > 
> > > On Wed, Apr 30, 2008 at 01:31:08PM -0700, Linus Torvalds wrote:
> > > > 
> > > > 
> > > > On Wed, 30 Apr 2008, Andrew Morton wrote:
> > > > > 
> > > > > <jumps up and down>
> > > > > 
> > > > > There should be nothing in 2.6.x-rc1 which wasn't in
> > > > > 2.6.x-mm1!
> > > > 
> > > > The problem I see with both -mm and linux-next is that they
> > > > tend to be better at finding the "physical conflict" kind of
> > > > issues (ie the merge itself fails) than the "code looks ok but
> > > > doesn't actually work" kind of issue.
> > > > 
> > > > Why?
> > > > 
> > > > The tester base is simply too small.
> > > > 
> > > > Now, if *that* could be improved, that would be wonderful, but
> > > > I'm not seeing it as very likely.
> > > > 
> > > > I think we have fairly good penetration these days with the
> > > > regular -git tree, but I think that one is quite frankly a
> > > > *lot* less scary than -mm or -next are, and there it has been
> > > > an absolutely huge boon to get the kernel into the Fedora
> > > > test-builds etc (and I _think_ Ubuntu and SuSE also started
> > > > something like that).
> > > > 
> > > > So I'm very pessimistic about getting a lot of test coverage
> > > > before -rc1.
> > > > 
> > > > Maybe too pessimistic, who knows?
> > > 
> > > First of all:
> > > I 100% agree with Andrew that our biggest problems are in
> > > reviewing code and resolving bugs, not in finding bugs (we
> > > already have far too many unresolved bugs).
> > 
> > I would argue instead that we don't know which bugs to fix first.
> > We're never going to fix all bugs, and to be honest, that's ok.
> >...
> 
> That might be OK.
> 
> But our current status quo is not OK:
> 
> Check Rafael's regressions lists asking yourself
> "How many regressions are older than two weeks?" 

"ext4 doesn't compile on m68k".
YAWN.

Wrong question...
"How many bugs that a sizable portion of users will hit in reality are there?"
is the right question to ask...


> 
> We have unmaintained and de facto unmaintained parts of the kernel
> where even issues that might be easy to fix don't get fixed.

And how many people are hitting those issues? If a part of the kernel is really
important to enough people, there tends to be someone who stands up either to fix
the issue or to start de facto maintaining that part.
And yes, I know there are parts where that doesn't hold. But to be honest, there are
not that many of them that have active development (and thus get the biggest
share of regressions).

> 
> >...
> > So there are a few things we (and you / the janitors) can do over time to
> > get better data on what issues people hit:
> > 1) Get automated collection of issues more widespread. The wider
> > our net, the better we know which issues get hit a lot, and simply
> > the more data we have on when things start, when they stop, etc.
> > Especially if you get a lot of testers in your project, I'd
> > like them to install the client for easy reporting of issues.
> > 2) We should add more WARN_ON()s on "known bad" conditions. If it
> > WARN_ON()s, we can learn about it via the automated collection.
> > And we can then do the statistics to figure out which ones happen a
> > lot.
> > 3) We need to get persistent-across-reboot oops saving going;
> > there are some avenues for this
> 
> No disagreement on this, it's just a different issue than our bug
> fixing problem.

No it's not! Knowing earlier and better which bugs get hit is NOT different
from our bug fixing "problem"; it's in fact an essential part of the solution to it!

> 


* Re: Slow DOWN, please!!!
  2008-05-01 11:38                       ` Rafael J. Wysocki
@ 2008-04-30 14:28                         ` Arjan van de Ven
  2008-05-01 12:41                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 229+ messages in thread
From: Arjan van de Ven @ 2008-04-30 14:28 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Willy Tarreau, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

On Thu, 1 May 2008 13:38:33 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> BTW, we seem to underestimate testing in this discussion.  In fact,
> the vast majority of kernel bugs are discovered by testing, so
> perhaps the way to go is to make regular testing of the new code a
> part of the process.

well.. -rc1 to -rc8 are doing that already, somewhat.
Can we do better? Always. The more testing the better, and the more
testers the better. 


* Re: Slow DOWN, please!!!
  2008-04-30 13:28     ` David Newall
  2008-04-30 13:38       ` Mike Galbraith
@ 2008-04-30 14:41       ` mws
  1 sibling, 0 replies; 229+ messages in thread
From: mws @ 2008-04-30 14:41 UTC (permalink / raw)
  To: David Newall; +Cc: Tarkan Erimer, David Miller, linux-kernel, Linus Torvalds

David Newall wrote:
> [...]
>
> I've said my bit, in fact more than I wanted to, so I choose to stop here.
In all the time spent up here discussing nonsense (from my PoV, it is),
several bug fixes, regression fixes, and new drivers have been completed or started.

Let's concentrate back on what counts - shouldn't we?

my 2ct
marcel




* Re: Slow DOWN, please!!!
  2008-04-30  2:03 Slow DOWN, please!!! David Miller
  2008-04-30  4:03 ` David Newall
@ 2008-04-30 14:48 ` Peter Teoh
  2008-04-30 19:36 ` Rafael J. Wysocki
  2 siblings, 0 replies; 229+ messages in thread
From: Peter Teoh @ 2008-04-30 14:48 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel

On Wed, Apr 30, 2008 at 10:03 AM, David Miller <davem@davemloft.net> wrote:
>
>  This is starting to get beyond frustrating for me.
>
> [...]
>
>  We need to slow down the merging, we need to review things
>  more, we need people to test their fucking changes!
>  --

Just some comments:

As on a football team, everyone has an important role to
play.  And you had better let go of the ball as fast as you can, otherwise you
are going to tire yourself out easily.

So, in a development team, if you think there is an unequal
distribution of workload, make noise.  Or think of some means of
balancing the workload automatically - specifically in the area of change
review.  (At other times, it is not easy to pass the load
around... e.g., what if the bug happens only on your machines and not on
others?)

1.  Generally, the more people who have reviewed the work, the higher the
chances that the piece of work is OK.

2.  The more varied the real-world testing, the better.  "Varied"
here means testing by users of different background
skills, with different applications running, and - most important - on
different base kernel versions to which the patch is applied and tested, etc.

3.  Based on those two numbers alone, we can immediately have
some measure of confidence in the patch - correct?

4.  So we could put all of this on a web page - the patches themselves,
and the reviewers/testers who have worked on each.

When someone comes in and reviews, the review counter increases by one.  Likewise,
the tester counter increases by one after testing.

And I suppose everyone will tend to cover the patches that are less
covered by others - automatic balancing of workload, done in a
distributed manner.

Avoid requiring too much information to be filled in, though... you will
discourage people from taking up the work; let participants spend their
precious time on reviewing instead.

So prior to consolidating sources, just by looking at the numbers,
you can see how successful the consolidation will be.  If something is less well
tested, then avoid including it in the consolidation...

Comments please...
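The counter scheme sketched above could be as simple as tallying a flat event log; the patch names and events below are made up for illustration:

```shell
#!/bin/sh
set -e
events=$(mktemp)
# one "<patch> <reviewed|tested>" event per line (all names invented)
cat > "$events" <<'EOF'
bitops-opt reviewed
bitops-opt tested
sched-fix reviewed
sched-fix reviewed
sched-fix tested
sched-fix tested
EOF
# per-patch counters: how often each patch was reviewed and tested,
# as a crude confidence measure before consolidation
summary=$(awk '{ c[$1 " " $2]++ } END { for (k in c) print c[k], k }' "$events" | sort -k2)
echo "$summary"
rm -f "$events"
```

A real implementation would of course live behind the proposed web page rather than a text file, but the confidence numbers themselves are just counts like these.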
-- 
Regards,
Peter Teoh


* Re: Slow DOWN, please!!!
  2008-04-30 13:04     ` David Newall
  2008-04-30 13:18       ` Michael Kerrisk
@ 2008-04-30 14:51       ` Linus Torvalds
  2008-04-30 18:21         ` David Newall
  1 sibling, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 14:51 UTC (permalink / raw)
  To: David Newall; +Cc: David Miller, linux-kernel



On Wed, 30 Apr 2008, David Newall wrote:
> 
> I don't claim BSD to be perfect, but it appears to have a consistently
> good quality.

Lol. You should try VMS. Now *there* was a stable system.

Oh, but it didn't actually make any progress, did it?

The fact is, we're merging a lot. It comes from having a lot of 
development. If you don't want that, then you're a fool - because you 
aren't looking at the long term.

>  Old Linux kernels also have that; new ones not so.

Can you point to any actual stability problem?

The problem under discussion is the fact that some people are unhappy 
because we had some merge trouble. The fact is, the problems got fixed in 
a few days. And yes, we will probably have to make Ingo follow the 
rules that pretty much everybody else also follows, and no, it's not going 
to solve all problems either - the fundamental issue is that we are just 
too damn good at development.

And that's not a big problem in my view, as long as we are also able 
to handle the _result_ of that flood of patches. Which, quite frankly, we 
are.

DavidN, you just have an agenda, and you think that mentioning BSD as some 
kind of shining example of goodness is a good way to reach that agenda. It 
isn't. It just shows that you don't understand the issue, and that you 
think that "threatening" developers by saying you'll switch is a great way 
to make PR.

But you know what? I really don't care one _whit_ what you do. You can 
switch to Vista for all I care, and I really don't mind. All I care about 
is doing a good job technically. 

And you just show that you don't have a clue what you are talking about. 
If you want a stable kernel, don't follow the current -git tree. Don't mind 
the fact that in two weeks we merge 

	 6672 files changed, 373817 insertions(+), 285901 deletions(-)

and instead look at something like the enterprise kernels or another tree 
that lags the development tree by half a year or more exactly _because_ 
they care about stable, not development.
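(A summary line of that shape is what git prints for a range diff; the
tag names below are only hypothetical examples, run inside a kernel
checkout:)

```shell
# One-line change summary between two release tags (tag names are
# examples, not the actual range quoted above):
git diff --shortstat v2.6.24..v2.6.25-rc1
# prints something of the form:
#  NNNN files changed, NNNNNN insertions(+), NNNNNN deletions(-)
```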

In short: what do you think the git tree is? Is it something that should 
prioritize good development, or is it something that should worry about 
you making inane arguments? Ask yourself that.

			Linus


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30  4:03 ` David Newall
  2008-04-30  4:18   ` David Miller
  2008-04-30  7:11   ` Tarkan Erimer
@ 2008-04-30 14:55   ` Russ Dill
  2 siblings, 0 replies; 229+ messages in thread
From: Russ Dill @ 2008-04-30 14:55 UTC (permalink / raw)
  To: linux-kernel

David Newall <davidn <at> davidnewall.com> writes:

> Yes.  The Linux process is becoming unreliable.  Newly "stable" versions
> have stability problems.  The development process looks childish. 
> Seasoned developers say not to worry, that the process works.  I do
> worry.  BSD seems more attractive, and it may even be worth the
> considerable effort to switch my entire client-base.  Linux was lucky to
> gain the foothold that it did: traditionally, BSD had a better system
> with a less restrictive licence, so it is surprising that manufacturers
> chose to go with Linux.  BSD still has a less restrictive licence and
> when mainstream press becomes interested in Linux's quality problems
> its adoption will fall.  BSD is still a good, maybe even better, option.
> 
> Linus, this is your baby and so it's your problem.  Only you have the
> influence to change things.
> 

Can you please point me in the direction of the BSD kernel lists so that I can
inject useless snarky flamebait any time someone attempts a little process
improvement? I don't do any BSD kernel development, but I like to stir things up.


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 12:42               ` David Woodhouse
@ 2008-04-30 15:02                 ` Arjan van de Ven
  2008-05-05 10:03                 ` Benny Halevy
  1 sibling, 0 replies; 229+ messages in thread
From: Arjan van de Ven @ 2008-04-30 15:02 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Andrew Morton, Adrian Bunk, Linus Torvalds, Rafael J. Wysocki,
	davem, linux-kernel, jirislaby, Steven Rostedt

On Thu, 01 May 2008 13:42:44 +0100
David Woodhouse <dwmw2@infradead.org> wrote:

> On Wed, 2008-04-30 at 07:15 -0700, Arjan van de Ven wrote:
> > Maybe that's a "boggle" for you; but for me that's symptomatic of
> > where we are today: We don't make (effective) prioritization
> > decisions. Such decisions are hard, because it effectively means
> > telling people "I'm sorry but your bug is not yet important". 
> 
> It's not that clear-cut, either. Something which manifests itself as a
> build failure or an immediate test failure on m68k alone, might
> actually turn out to cause subtle data corruption on other platforms.
> 
> You can't always know that it isn't important, just because it only
> shows up in some esoteric circumstances. You only really know how
> important it was _after_ you've fixed it.
> 
> That obviously doesn't help us to prioritise.

absolutely. I'm not going to argue that prioritization is easy. Or 
that we'll be able to get it right all the time.
Doesn't mean we shouldn't try at least somewhat..

> 

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 12:41                           ` Rafael J. Wysocki
@ 2008-04-30 15:06                             ` Arjan van de Ven
  0 siblings, 0 replies; 229+ messages in thread
From: Arjan van de Ven @ 2008-04-30 15:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Willy Tarreau, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

On Thu, 1 May 2008 14:41:05 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> 
> The testing is not really a part of the process right now, though.
> We somehow hope that the kernel will be tested sufficiently before a
> major release, but we don't measure the testing coverage, for
> example.

Well. Take 2.6.25... we know Fedora shipped it in their alphas and
betas (and in rawhide). Those are used by a lot of people; so for me
that's a whole bunch of coverage right there. Is it perfect? No.
But in a way it's in the spirit of open source: the people who care
about a stable release the most (distros) [1] helped us get this
tested. The other people on this thread, who care greatly, at least
also help us test in general.


[1] Not trying to say that any single person wouldn't care; but a
distro tends to care more due to the sheer number of users... 

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 14:51       ` Linus Torvalds
@ 2008-04-30 18:21         ` David Newall
  2008-04-30 18:27           ` Linus Torvalds
  0 siblings, 1 reply; 229+ messages in thread
From: David Newall @ 2008-04-30 18:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, linux-kernel

Linus Torvalds wrote:
> Can you point to any actual stability problem?
>   

Well of course.  So could you, because they are a matter of public record
on the list.  Don't pretend otherwise.  Just to give you some recent,
personal bugaboos, and not even drawing on the many hundreds of relevant
messages on LKML each month:

1. Out of memory, caused by apparent leak somewhere, resulting in
machine effectively hanging for a minute or two (massive disk i/o)
culminating in termination of one or more processes.  (For what it's
worth: 512MB, no swap.)  Problem takes a couple of days to develop
(hence I suspect a leak.)  This is running only Firefox, Thunderbird and
Evince, plus whatever xubuntu wants.  Restarting the killed
application(s) causes the problem to recur.  Restarting X doesn't help. 
Killing almost all processes also doesn't help.  Reboot is required. 
This problem seems not to be in 2.6.17, but is in 2.6.22 (plus whatever
patches xubuntu use) and 2.6.23.  I'm still testing 2.6.25, but probably
going to have to abandon it and go backwards, because...

2. Suspend to disk doesn't resume properly (two out of three times.) 
System comes back but X has severe weirdness.  Draws frames and title
bar, but not window contents.  Text-mode is just as bad: Screen is blank
(erased font table, perhaps?)  Subsequent suspend to disk doesn't resume
at all.

Note the wide range of kernels exhibiting problem 1.  I don't even want
to think about problem 2 at this stage; I just want to stop having to
reboot to reclaim memory, especially when a mate who does Windows
training visits!


> the fundamental issue is that we are just 
> too damn good at development.
>   

Not so good.  The process is flawed.  Inadequate testing.  Inadequate
review.  This has been mentioned by others, so you know I'm not making
it up.  The real fundamental issue is that people are too keen to
release and don't appear to care enough about correctness.

> you think that mentioning BSD as some 
> kind of shining example of goodness is a good way to reach that agenda.

Yes, BSD does seem to be a shining example of goodness, but I didn't
mention it because I think people should switch.  I did so to warn of
competition, to say that the world does not owe Linux a second chance
and isn't going to give it one.  It's pointless to debate the relative
merits of the two systems because, aside from the kernel, they are
identical; and there's little that matters between the kernels, other
than one appears to have a careful, robust and professional development
process.  Make no mistake about this point: I'm not saying that BSD is
better, rather that Linux cannot lose credibility and survive.

> But you know what? I really don't care one _whit_ what you do. You can 
> switch to Vista for all I care, and I really don't mind. All I care about 
> is doing a good job technically. 
>   
Sadly, you're doing a bad technical job in certain important areas. 
You're pushing out buggy kernels and claiming that they're stable.  This
can't continue.  Attrition to BSD is the risk, not some threat that I'm
making.

> And you just show that you don't have a clue what you are talking about. 
> If you want stable kernel, don't follow the current -git tree.

Why are you bringing up git trees (which I don't use)?  I'm presently
plagued with a problem that's 2.6.22 or older, extending to at least
2.6.23 and maybe still current.  I've said quite clearly that I'm
talking about "stable" kernels, yet you presume I mean the git tree. 
Yet it's not the specifics of the problem I'm having that matters, it's
the systemic problems in Linux's development process.

I don't think I've anything to add unless the topic evolves in a
direction that asks what should be changed.  I'm posting this only
because I want on record the answer to the question about actual
stability problems.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 18:21         ` David Newall
@ 2008-04-30 18:27           ` Linus Torvalds
  2008-04-30 18:55             ` David Newall
  2008-04-30 19:06             ` Chris Friesen
  0 siblings, 2 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 18:27 UTC (permalink / raw)
  To: David Newall; +Cc: David Miller, linux-kernel



On Thu, 1 May 2008, David Newall wrote:
> 
> Why are you bringing up git trees (which I don't use)?  I'm presently
> plagued with a problem that's 2.6.22 or older, extending to at least
> 2.6.23 and maybe still current.

Ok, *PLONK*.

You're on an old kernel, don't know if your problem is fixed, and ask us 
to slow down development.

That makes sense.

Go away.

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 18:27           ` Linus Torvalds
@ 2008-04-30 18:55             ` David Newall
  2008-04-30 19:08               ` Linus Torvalds
  2008-04-30 19:06             ` Chris Friesen
  1 sibling, 1 reply; 229+ messages in thread
From: David Newall @ 2008-04-30 18:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, linux-kernel

Linus Torvalds wrote:
> You're on an old kernel, don't know if your problem is fixed, and ask us 
> to slow down development.

I just finished telling you that I'm currently trying 2.6.25.  But you
couldn't have read that with any care at all, because I also just
finished telling you that it's not the specifics of the problem I'm
having that matters, it's the systemic problems in Linux's development
process.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 18:27           ` Linus Torvalds
  2008-04-30 18:55             ` David Newall
@ 2008-04-30 19:06             ` Chris Friesen
  2008-04-30 19:13               ` Linus Torvalds
  1 sibling, 1 reply; 229+ messages in thread
From: Chris Friesen @ 2008-04-30 19:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Newall, David Miller, linux-kernel

Linus Torvalds wrote:
> 
> On Thu, 1 May 2008, David Newall wrote:
> 
>>Why are you bringing up git trees (which I don't use)?  I'm presently
>>plagued with a problem that's 2.6.22 or older, extending to at least
>>2.6.23 and maybe still current.
> 
> 
> Ok, *PLONK*.
> 
> You're on an old kernel, don't know if your problem is fixed, and ask us 
> to slow down development.
> 
> That makes sense.
> 
> Go away.

He did say that he was testing 2.6.25, and that suspend-to-disk was 
broken in 2.6.25.

Chris

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 18:55             ` David Newall
@ 2008-04-30 19:08               ` Linus Torvalds
  2008-04-30 19:16                 ` David Newall
  0 siblings, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 19:08 UTC (permalink / raw)
  To: David Newall; +Cc: David Miller, linux-kernel



On Thu, 1 May 2008, David Newall wrote:
> 
> I just finished telling you that I'm currently trying 2.6.25.  But you
> couldn't have read that with any care at all, because I also just
> finished telling you that it's not the specifics of the problem I'm
> having that matters, it's the systemic problems in Linux's development
> process.

No. What you told us was nothing like that at all. What you told us was 
that you totally ignored the issue I brought up, namely that development 
happens, and that you have the choice of stagnating or accepting it.

You point to it as some "systemic problem", and I told you that it's a 
sign of fast development. Things change. You didn't listen, or understand.

If you want systemic problems, it is your kind of "bug report" that isn't 
anything like a bug report. Make a real report, don't whine. Push the 
_report_, not your inane agenda. Talk about *technology*, not about how 
you wish everything revolved around you and your wishes.

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 19:06             ` Chris Friesen
@ 2008-04-30 19:13               ` Linus Torvalds
  2008-04-30 19:22                 ` David Newall
  0 siblings, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 19:13 UTC (permalink / raw)
  To: Chris Friesen; +Cc: David Newall, David Miller, linux-kernel



On Wed, 30 Apr 2008, Chris Friesen wrote:
> 
> He did say that he was testing 2.6.25, and that suspend-to-disk was 
> broken in 2.6.25.

Neither of which had anything to do with the whole "slow down" argument.

If you have a bug, make a bug report, and push it, and make people aware 
of it. But don't make it an argument for development to slow down.

Should we all stand around with our thumbs up our *ss because somebody has 
a bug? Should the other developers just stop, because suspend-to-disk is 
broken for somebody? Should everything come to a standstill because David 
Newall doesn't like how there are other things going on that are 
independent of _his_ problems?

Do you really believe that?

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 19:08               ` Linus Torvalds
@ 2008-04-30 19:16                 ` David Newall
  2008-04-30 19:25                   ` Linus Torvalds
  0 siblings, 1 reply; 229+ messages in thread
From: David Newall @ 2008-04-30 19:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, linux-kernel

Linus Torvalds wrote:
> On Thu, 1 May 2008, David Newall wrote:
>   
>> I just finished telling you that I'm currently trying 2.6.25.  But you
>> couldn't have read that with any care at all, because I also just
>> finished telling you that it's not the specifics of the problem I'm
>> having that matters, it's the systemic problems in Linux's development
>> process.
>>     
>
> No. What you told us was nothing like that at all.

Don't be foolish, Linus.  It was exactly like that, almost to the point
of quoting myself.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 19:13               ` Linus Torvalds
@ 2008-04-30 19:22                 ` David Newall
  2008-04-30 19:42                   ` Linus Torvalds
  0 siblings, 1 reply; 229+ messages in thread
From: David Newall @ 2008-04-30 19:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Friesen, David Miller, linux-kernel

Linus Torvalds wrote:
> Should everything come to a standstill because David 
> Newall doesn't like how there are other things going on that are 
> independent of _his_ problems?
>   

You're being a nasty piece of work this day, Linus, and you're fibbing
by mischaracterising what I said which, by the way, included, "it's not
the specifics of the problem I'm having that matters".  You're taking
this far too personally.  Get a grip.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 19:16                 ` David Newall
@ 2008-04-30 19:25                   ` Linus Torvalds
  2008-05-01  4:31                     ` David Newall
  0 siblings, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 19:25 UTC (permalink / raw)
  To: David Newall; +Cc: David Miller, linux-kernel



On Thu, 1 May 2008, David Newall wrote:
> >
> > No. What you told us was nothing like that at all.
> 
> Don't be foolish, Linus.  It was exactly like that, almost to the point
> of quoting myself.

You misunderstand.

I object to your _idiotic_ claim that there are "systemic problems", where 
your "solution" to them is apparently to stop making releases and stop 
making forward progress. 

That's why I said what you told us was nothing like that. What you told us 
were your personal problems, not "systemic" issues.

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30  2:03 Slow DOWN, please!!! David Miller
  2008-04-30  4:03 ` David Newall
  2008-04-30 14:48 ` Peter Teoh
@ 2008-04-30 19:36 ` Rafael J. Wysocki
  2008-04-30 20:00   ` Andrew Morton
                     ` (2 more replies)
  2 siblings, 3 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 19:36 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, Andrew Morton, Linus Torvalds, Jiri Slaby

On Wednesday, 30 of April 2008, David Miller wrote:
> 
> This is starting to get beyond frustrating for me.
> 
> Yesterday, I spent the whole day bisecting boot failures
> on my system due to the totally untested linux/bitops.h
> optimization, which I fully analyzed and debugged.
> 
> Today, I had hoped that I could get some work done of my
> own, but that's not the case.
> 
> Yet another bootup regression got added within the last 24
> hours.
> 
> I don't mind fixing a regression or two during the merge
> window but THIS IS ABSOLUTELY, FUCKING, RIDICULOUS!
> 
> The tree breaks every day, and it's becoming an extremely
> non-fun environment to work in.
> 
> We need to slow down the merging, we need to review things
> more, we need people to test their fucking changes!

Well, I must say I second that.

I'm not seeing regressions myself this time (well, except for the one that
Jiri fixed), but I did find a few of them during the post-2.6.24 merge window
and I wouldn't like to repeat that experience, so to speak.

IMO, the merge window is way too short for actually testing anything.  I rebuild
the kernel once or even twice a day and there's no way I can really test it.
I can only check if it breaks right away.  And if it does, there's no time to
find out what broke it before the next few hundred commits land on top of
that.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 19:22                 ` David Newall
@ 2008-04-30 19:42                   ` Linus Torvalds
  0 siblings, 0 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 19:42 UTC (permalink / raw)
  To: David Newall; +Cc: Chris Friesen, David Miller, linux-kernel



On Thu, 1 May 2008, David Newall wrote:
>
> You're taking this far too personally.

Umm. If you didn't want a personal opinion, why did you Cc me in the first 
place then, and ask for my input?

I gave my input to you. I think your arguments are ludicrous, to the point 
of being totally idiotic. You complain how I don't release kernels that 
are stable, but without any suggestions on what the issue might be, apart 
from apparently me merging too much and making too many releases.

But do you really expect me to stop merging, or hold up releases that fix 
hundreds of issues, just because there are other issues pending? Do you 
really think development can be stopped? Trust me, we've tried. Every 
time, it just leads to worse problems when the floodgates are then opened.

And yes, there is a solution: don't develop so much. Don't allow thousands 
of developers to be involved. Do a small core group, and make development 
so hard or inconvenient that you only have a few tens of people who write 
code, and vet them and force them to jump through hoops when adding new 
features (or fixing old ones, for that matter).

And yes, that *does* result in a "stable" system. Never mind that it's 
stable for all the wrong reasons, and generally doesn't actually work well 
across a dynamic environment (whether the hardware base below or user 
space above).

See? This is why I think your arguments are so silly and misguided. 

But if you actually have real constructive ideas on things to actually 
*do*, please do mention them. We've changed our models over time, several 
times, exactly because we've searched for better ways to do things. But do 
realize that

 (a) we can't just stop, or even really slow down. We can only try to 
     regulate and to some degree direct the flood, not hold it up for any 
     particular issue.

 (b) We do have process in place, and it may not be perfect, but I doubt 
     anything is, and what we do have actually has evolved over the years. 

     And that's not just my process (ie "two-week merge window, followed 
     by about 6-8 weeks of fixups"), but the whole process both before and 
     after it (Andrew and now linux-next in front of it, and stable kernel 
     tree and the vendors after it).

 (c) the "big picture" discussion is separate from individual issues. If 
     you want your suspend-to-disk issue resolved, or a memory leak 
     solved, you don't solve those by trying to complain about other parts 
     of the system, that are totally separate.

     The global flow of patches and releases is not something that we can 
     hold up for _any_ of your individual problems. I do end up delaying 
     releases for really core things, so individual problems do obviously 
     affect (for example) the release timing. But the solution to them is 
     not in complaining about slowing down development, it is about 
     actually trying to engage the developers of *that* feature in *that* 
     particular bug.

And finally, trust me, if you want to have people care about your 
problems, the last thing you want to do is say "I might switch to BSD". 
Because quite frankly, I really don't care. People who think that threats 
like that work in any productive way can go screw themselves. I'll flame 
idiots like that, and my likelihood of helping people who think 
they hold a gun to my head is almost zero.

		Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 19:36 ` Rafael J. Wysocki
@ 2008-04-30 20:00   ` Andrew Morton
  2008-04-30 20:20     ` Rafael J. Wysocki
  2008-04-30 20:05   ` Linus Torvalds
  2008-04-30 20:15   ` Andrew Morton
  2 siblings, 1 reply; 229+ messages in thread
From: Andrew Morton @ 2008-04-30 20:00 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: davem, linux-kernel, torvalds, jirislaby

On Wed, 30 Apr 2008 21:36:57 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Wednesday, 30 of April 2008, David Miller wrote:
> > 
> > This is starting to get beyond frustrating for me.
> > 
> > Yesterday, I spent the whole day bisecting boot failures
> > on my system due to the totally untested linux/bitops.h
> > optimization, which I fully analyzed and debugged.
> > 
> > Today, I had hoped that I could get some work done of my
> > own, but that's not the case.
> > 
> > Yet another bootup regression got added within the last 24
> > hours.
> > 
> > I don't mind fixing a regression or two during the merge
> > window but THIS IS ABSOLUTELY, FUCKING, RIDICULOUS!
> > 
> > The tree breaks every day, and it's becoming an extremely
> > non-fun environment to work in.
> > 
> > We need to slow down the merging, we need to review things
> > more, we need people to test their fucking changes!
> 
> Well, I must say I second that.

ooh, fun thread.

One of the main reasons for -mm (probably _the_ main reason) is to weed out
other-developer-impacting regressions before they hit mainline and, umm,
affect developers.

But there are implementation problems:

a) developers aren't testing -mm enough

b) -mm releases have become too slow, and (hence) too unstable

c) people are slamming changes into mainline which have never been seen
   in -mm.  Lots of changes.

So here's how we're going to fix David's problem:

- Everyone gets their stuff into linux-next.

- Lots of people _test_ linux-next.  Just once a week.

Those two steps will improve the merge-window chaos a lot.  Things will get
better.




The remaining open problem is what do we do about the shiny new code which
is getting slammed into the merge window?

Well, it's very easy to tell whether code which appears in the merge window
was present in linux-next.
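One way such a check could work is git's patch-identity machinery:
"git cherry" compares commits by patch content rather than by SHA, so
commits rebased between linux-next and mainline still match.  The
remote and tag names below are assumptions, not a real setup:

```shell
# List commits reachable from the -rc1 tag but not from the previous
# release, whose patch content never appeared in linux-next.  Lines
# starting with "+" are commits absent from next; "-" means the same
# change was already there.  Names are hypothetical examples.
git fetch next
git cherry next/master v2.6.26-rc1 v2.6.25 | grep '^+'
```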

Our first way of preventing people from shoving inadequately-cooked code
into the merge window is suasion (aka flaming their titties off).  If that
proves insufficient and if we still have a sufficiently large problem that
we need to do something about it then sure, let's reevaluate.

But one thing at a time.  For the 2.6.27 release let us concentrate on two
things:

- get your stuff into linux-next

- test linux-next.

If merge-window stability is still a problem after that then let's revisit?

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 19:36 ` Rafael J. Wysocki
  2008-04-30 20:00   ` Andrew Morton
@ 2008-04-30 20:05   ` Linus Torvalds
  2008-04-30 20:14     ` Linus Torvalds
                       ` (2 more replies)
  2008-04-30 20:15   ` Andrew Morton
  2 siblings, 3 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 20:05 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: David Miller, linux-kernel, Andrew Morton, Jiri Slaby



On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
> 
> IMO, the merge window is way too short for actually testing anything.

That is largely on purpose.

There's two choices:

 - have a longer and calmer merge window, spread out the joy, and have 
   people test and fix their things during the merge window too. In other 
   words, less black-and-white.

 - Really short merge window, and use the extra time *after* it to fix the 
   issues.

and I've obviously gone for the latter. In fact, I'd personally like to 
make it even shorter, because the problem with the long merge window can 
be summed up very simply:

   Long merge windows don't work - because rather than test more, it just 
   means that people will use them to make more changes!

So one of the major things about the short merge window is that it's 
hopefully encouraging people to have things ready by the time the merge 
window opens, because it's too late to do anything later.

And yes, we could have some other way of enforcing that - allow the merge 
window to be longer, but have some other mechanism to make sure that I 
only merge old code. 

In fact, I'd personally *love* to have a hard rule that says "I will only 
pull from trees that were already 'done' by the time the window opened", 
and we've been kind-of moving in that direction.

But that wish is counteracted by the fact that the merges themselves do 
need some development, so expecting everything to be ready before-hand is 
simply not realistic. 

Also, while I'd like trees to be ready when the window opens, at the same 
time I do think that it's good to spread out some of it, and get *some* 
basic testing - even if it's just a nightly build and a few tens of 
developers.

> I rebuild the kernel once or even twice a day and there's no way I can 
> really test it. I can only check if it breaks right away.

And really, that's all that we'd expect during the merge window. We want 
to find the *obvious* problems - build issues, and the things that hit 
everybody, but let's face it, the subtle ones will take time to find 
regardless.

Then, the short merge window means that we have more time when we really 
don't have big changes going in to find the subtle ones.

(And making the release cycle longer would *not* help - that would just 
make the next merge window more painful, so while it can, and does, work 
for some individual release with particular problems, it's not a solution 
in the long run).

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:05   ` Linus Torvalds
@ 2008-04-30 20:14     ` Linus Torvalds
  2008-04-30 20:56       ` Rafael J. Wysocki
  2008-04-30 23:34       ` Greg KH
  2008-04-30 20:45     ` Rafael J. Wysocki
  2008-04-30 23:29     ` Paul Mackerras
  2 siblings, 2 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 20:14 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: David Miller, linux-kernel, Andrew Morton, Jiri Slaby



On Wed, 30 Apr 2008, Linus Torvalds wrote:
> 
> In fact, I'd personally like to make it even shorter

Just to clarify: I'd actually like to make the merge window be just a 
week. If even that.

With linux-next hopefully stepping up to be a place where the actual 
_conflicts_ (which are usually not the big problem, they are just 
inconvenient from a timing standpoint) can get found and handled early, a 
shorter merge window should be technically possible.

HOWEVER. Even now, at two weeks, we do have issues where timing just 
doesn't fit some developer, because of conferences or vacations or just 
random personal issues or whatever. There are always people who grumble 
because the window didn't work for them.

Of course, they should have had it all ready, but somehow that simply 
doesn't happen. I think it's against most human nature to be quite _that_ 
forward-looking.

And maybe everything would be ok if we could also shorten the actual 
release cycle, so that if you miss one merge window for some random 
conference or other (or just a *really* bad hair-day and you didn't get 
your act together), you wouldn't mind too much and you'd just hit the next 
one instead.

But that, in turn, is unrealistic because when bugs do happen, the latency 
you get between testers and developers is long enough that I really don't 
think we can shorten the after-merge-window thing much. Six weeks seems to 
be already pushing it.

And as mentioned, a longer after-merge-window-stabilization phase is just 
going to aggravate the problem next time around.

We could have staggered releases, but let's face it, that's what -mm and 
linux-next and stable are all about.

		Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 19:36 ` Rafael J. Wysocki
  2008-04-30 20:00   ` Andrew Morton
  2008-04-30 20:05   ` Linus Torvalds
@ 2008-04-30 20:15   ` Andrew Morton
  2008-04-30 20:31     ` Linus Torvalds
  2 siblings, 1 reply; 229+ messages in thread
From: Andrew Morton @ 2008-04-30 20:15 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: davem, linux-kernel, torvalds, jirislaby

On Wed, 30 Apr 2008 21:36:57 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> IMO, the merge window is way too short for actually testing anything.

<jumps up and down>

There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!

_anything_ which appears in 2.6.x-rc1 and which wasn't in 2.6.x-mm1 was
snuck in too late (OK, apart from trivia and bugfixes).


If we decide that we need to fix the oh-shit-lets-slam-this-in-and-hope
problem, then I expect we can do so via fairly reliable means.

But the first attempt at solving it should be to ask people to not do that.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:00   ` Andrew Morton
@ 2008-04-30 20:20     ` Rafael J. Wysocki
  0 siblings, 0 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 20:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davem, linux-kernel, torvalds, jirislaby

On Wednesday, 30 of April 2008, Andrew Morton wrote:
> On Wed, 30 Apr 2008 21:36:57 +0200
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > On Wednesday, 30 of April 2008, David Miller wrote:
> > > 
> > > This is starting to get beyond frustrating for me.
> > > 
> > > Yesterday, I spent the whole day bisecting boot failures
> > > on my system due to the totally untested linux/bitops.h
> > > optimization, which I fully analyzed and debugged.
> > > 
> > > Today, I had hoped that I could get some work done of my
> > > own, but that's not the case.
> > > 
> > > Yet another bootup regression got added within the last 24
> > > hours.
> > > 
> > > I don't mind fixing the regression or two during the merge
> > > window but THIS IS ABSOLUTELY, FUCKING, RIDICULOUS!
> > > 
> > > The tree breaks every day, and it's becoming an extremely
> > > non-fun environment to work in.
> > > 
> > > We need to slow down the merging, we need to review things
> > > more, we need people to test their fucking changes!
> > 
> > Well, I must say I second that.
> 
> ooh, fun thread.
> 
> One of the main reasons for -mm (probably _the_ main reason) is to weed out
> other-developer-impacting regressions before they hit mainline and, umm,
> affect developers.
> 
> But there are implementation problems:
> 
> a) developers aren't testing -mm enough
> 
> b) -mm releases have become too slow, and (hence) too unstable
> 
> c) people are slamming changes into mainline which have never been seen
>    in -mm.  Lots of changes.

Yeah.

> So here's how we're going to fix David's problem:
> 
> - Everyone gets their stuff into linux-next.
> 
> - Lots of people _test_ linux-next.  Just once a week.

For this to happen, the mainline would need to change less often than once
a day after the merge window.
 
> Those two steps will improve the merge-window chaos a lot.  Things will get
> better.

Not until we make a rule that nothing that didn't go through linux-next is
mergeable unless it's an obvious bugfix that has no side effects.
 
> The remaining open problem is what do we do about the shiny new code which
> is getting slammed into the merge window?
> 
> Well, it's very easy to tell whether code which appears in the merge window
> was present in linux-next.
> 
> Our first way of preventing people from shoving inadequately-cooked code
> into the merge window is suasion (aka flaming their titties off).  If that
> proves insufficient and if we still have a sufficiently large problem that
> we need to do something about it then sure, let's reevaluate.

OK

But one thing at a time.  For the 2.6.27 release let us concentrate on two
> things
> 
> - get your stuff into linux-next
> 
> - test linux-next.
> 
> If merge-window stability is still a problem after that then let's revisit?

I'll see you in the analogous thread during the next merge window. ;-)

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:15   ` Andrew Morton
@ 2008-04-30 20:31     ` Linus Torvalds
  2008-04-30 20:47       ` Dan Noe
                         ` (3 more replies)
  0 siblings, 4 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 20:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rafael J. Wysocki, davem, linux-kernel, jirislaby



On Wed, 30 Apr 2008, Andrew Morton wrote:
> 
> <jumps up and down>
> 
> There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!

The problem I see with both -mm and linux-next is that they tend to be 
better at finding the "physical conflict" kind of issues (ie the merge 
itself fails) than the "code looks ok but doesn't actually work" kind of 
issue.

Why?

The tester base is simply too small.

Now, if *that* could be improved, that would be wonderful, but I'm not 
seeing it as very likely.

I think we have fairly good penetration these days with the regular -git 
tree, but I think that one is quite frankly a *lot* less scary than -mm or 
-next are, and there it has been an absolutely huge boon to get the kernel 
into the Fedora test-builds etc (and I _think_ Ubuntu and SuSE also 
started something like that).

So I'm very pessimistic about getting a lot of test coverage before -rc1.

Maybe too pessimistic, who knows?

		Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:05   ` Linus Torvalds
  2008-04-30 20:14     ` Linus Torvalds
@ 2008-04-30 20:45     ` Rafael J. Wysocki
  2008-04-30 21:37       ` Linus Torvalds
  2008-05-01 13:54       ` Stefan Richter
  2008-04-30 23:29     ` Paul Mackerras
  2 siblings, 2 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 20:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Wednesday, 30 of April 2008, Linus Torvalds wrote:
> 
> On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
> > 
> > IMO, the merge window is way too short for actually testing anything.
> 
> That is largely on purpose.
> 
> There's two choices:

Oh well, I don't think it's really that simple.

>  - have a longer and calmer merge window, spread out the joy, and have 
>    people test and fix their things during the merge window too. In other 
>    words, less black-and-white.
> 
>  - Really short merge window, and use the extra time *after* it to fix the 
>    issues.
> 
> and I've obviously gone for the latter. In fact, I'd personally like to 
> make it even shorter, because the problem with the long merge window can 
> be summed up very simply:
> 
>    Long merge windows don't work - because rather than test more, it just 
>    means that people will use them to make more changes!

And what do you think is happening _after_ the merge window closes, when
we're supposed to be fixing bugs?  People work on new code.  And, in fact, they
have to, if they want to be ready for the next merge window.

> So one of the major things about the short merge window is that it's 
> hopefully encouraging people to have things ready by the time the merge 
> window opens, because it's too late to do anything later.
> 
> And yes, we could have some other way of enforcing that - allow the merge 
> window to be longer, but have some other mechanism to make sure that I 
> only merge old code. 

How about, instead, putting limits on the amount of stuff that's going to be
merged during the next window?

> In fact, I'd personally *love* to have a hard rule that says "I will only 
> pull from trees that were already 'done' by the time the window opened", 
> and we've been kind-of moving in that direction.

Well, and when's the time for fixing bugs?  Surely not during the merge window
and also not after that, because otherwise people won't be ready for the next
merge window with the new code.

> But that wish is counteracted by the fact that the merges themselves do 
> need some development, so expecting everything to be ready before-hand is 
> simply not realistic. 
> 
> Also, while I'd like trees to be ready when the window opens, at the same 
> time I do think that it's good to spread out some of it, and get *some* 
> basic testing - even if it's just a nightly build and a few tens of 
> developers.
> 
> > I rebuild the kernel once or even twice a day and there's no way I can 
> > really test it. I can only check if it breaks right away.
> 
> And really, that's all that we'd expect during the merge window. We want 
> to find the *obvious* problems - build issues, and the things that hit 
> everybody, but let's face it, the subtle ones will take time to find 
> regardless.

Exactly.  Moreover, the code is now being merged at a pace that makes it
physically impossible to review it given the human resources we have.
 
> Then, the short merge window means that we have more time when we really 
> don't have big changes going in to find the subtle ones.

Sorry to say that, but I don't think this is realistic.  What happens after the
merge window is that people go and develop new stuff.  They look at the already
merged code only if they have to.  Also, there are only a _few_ people testing
the kernel carefully enough to see the more subtle problems, let alone
debugging and fixing them.

> (And making the release cycle longer would *not* help - that would just 
> make the next merge window more painful, so while it can, and does, work 
> for some individual release with particular problems, it's not a solution 
> in the long run).

My point is, given the width of the merge window, there's too much stuff
going in during it.  As far as I'm concerned, the window can be a week long
or whatever, but let's make fewer commits over a unit of time.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:31     ` Linus Torvalds
@ 2008-04-30 20:47       ` Dan Noe
  2008-04-30 20:59         ` Andrew Morton
  2008-04-30 20:54       ` Andrew Morton
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 229+ messages in thread
From: Dan Noe @ 2008-04-30 20:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Rafael J. Wysocki, davem, linux-kernel, jirislaby

On 4/30/2008 16:31, Linus Torvalds wrote:
> 
> On Wed, 30 Apr 2008, Andrew Morton wrote:
>> <jumps up and down>
>>
>> There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!
> 
> The problem I see with both -mm and linux-next is that they tend to be 
> better at finding the "physical conflict" kind of issues (ie the merge 
> itself fails) than the "code looks ok but doesn't actually work" kind of 
> issue.
> 
> Why?
> 
> The tester base is simply too small.
> 
> Now, if *that* could be improved, that would be wonderful, but I'm not 
> seeing it as very likely.

Perhaps we should be clear and simple about what potential testers 
should be running at any given point in time.  With -mm, linux-next, 
linux-2.6, etc, as a newcomer I find it difficult to know where my 
testing time and energy is best directed.

Is linux-next the right thing to be running at this point?  Is there a 
need for testing in a particular tree (netdev, x86, etc)?

Cheers,
Dan

-- 
                     /--------------- - -  -  -   -   -
                     |  Dan Noe
                     |  http://isomerica.net/~dpn/

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:31     ` Linus Torvalds
  2008-04-30 20:47       ` Dan Noe
@ 2008-04-30 20:54       ` Andrew Morton
  2008-04-30 21:21         ` David Miller
                           ` (2 more replies)
  2008-04-30 21:52       ` H. Peter Anvin
  2008-05-01  0:31       ` RFC: starting a kernel-testers group for newbies Adrian Bunk
  3 siblings, 3 replies; 229+ messages in thread
From: Andrew Morton @ 2008-04-30 20:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: rjw, davem, linux-kernel, jirislaby

On Wed, 30 Apr 2008 13:31:08 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Wed, 30 Apr 2008, Andrew Morton wrote:
> > 
> > <jumps up and down>
> > 
> > There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!
> 
> The problem I see with both -mm and linux-next is that they tend to be 
> better at finding the "physical conflict" kind of issues (ie the merge 
> itself fails) than the "code looks ok but doesn't actually work" kind of 
> issue.
> 
> Why?
> 
> The tester base is simply too small.
> 
> Now, if *that* could be improved, that would be wonderful, but I'm not 
> seeing it as very likely.
> 
> I think we have fairly good penetration these days with the regular -git 
> tree, but I think that one is quite frankly a *lot* less scary than -mm or 
> -next are, and there it has been an absolutely huge boon to get the kernel 
> into the Fedora test-builds etc (and I _think_ Ubuntu and SuSE also 
> started something like that).
> 
> So I'm very pessimistic about getting a lot of test coverage before -rc1.
> 
> Maybe too pessimistic, who knows?
> 

Well.  We'll see.

linux-next is more than another-tree-to-test.  It is (or will be) a change
in our processes and culture.  For a start, subsystem maintainers can no
longer whack away at their own tree as if the rest of us don't exist. 
They now have to be more mindful of merge issues.

Secondly, linux-next is more accessible than -mm: more releases, more
stable, better tested by he-who-releases it, available via git:// etc.  It
should be very easy for developers to do their weekly "does linux-next
boot" test.

Plus, of course, people who complain about merge-window breakage only to
find that the breakage was already in linux-next except they didn't test it
will not have a leg to stand on.


I feared that linux-next wouldn't work: that Stephen would stomp off in
disgust at all the crap people send at him.  But in fact it seems to be
going very well from that POV.

I get the impression that we're seeing very little non-Stephen testing of
linux-next at this stage.  I hope we can ramp that up a bit, initially by
having core developers doing at least some basic sanity testing.



linux-next does little to address our two largest (IMO) problems:
inadequate review and inadequate response to bug and regression reports. 
But those problems are harder to fix..


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:14     ` Linus Torvalds
@ 2008-04-30 20:56       ` Rafael J. Wysocki
  2008-04-30 23:34       ` Greg KH
  1 sibling, 0 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 20:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Wednesday, 30 of April 2008, Linus Torvalds wrote:
> 
> On Wed, 30 Apr 2008, Linus Torvalds wrote:
> > 
> > In fact, I'd personally like to make it even shorter
> 
> Just to clarify: I'd actually like to make the merge window be just a 
> week. If even that.
> 
> With linux-next hopefully stepping up to be a place where the actual 
> _conflicts_ (which are usually not the big problem, they are just 
> inconvenient from a timing standpoint) can get found and handled early, a 
> shorter merge window should be technically possible.

That even might be better, if there's less code merged as a result.

> HOWEVER. Even now, at two weeks, we do have issues where timing just 
> doesn't fit some developer, because of conferences or vacations or just 
> random personal issues or whatever. There are always people who grumble 
> because the window didn't work for them.

Well, where's it stated that you have to develop new code for each merge
window?  By making shorter merge windows with less code merged in each of
them, we could actually improve things.

> Of course, they should have had it all ready, but somehow that simply 
> doesn't happen. I think it's against most human nature to be quite _that_ 
> forward-looking.
> 
> And maybe everything would be ok if we could also shorten the actual 
> release cycle, so that if you miss one merge window for some random 
> conference or other (or just a *really* bad hair-day and you didn't get 
> your act together), you wouldn't mind too much and you'd just hit the next 
> one instead.

Exactly.

> But that, in turn, is unrealistic because when bugs do happen, the latency 
> you get between testers and developers is long enough that I really don't 
> think we can shorten the after-merge-window thing much. Six weeks seems to 
> be already pushing it.

That depends on the amount of bugs introduced during the merge window.  With
shorter merge windows we may introduce fewer bugs per merge window and
the most subtle ones take more than six weeks to find anyway.

> And as mentioned, a longer after-merge-window-stabilization phase is just 
> going to aggravate the problem next time around.
> 
> We could have staggered releases, but let's face it, that's what -mm and 
> linux-next and stable are all about.

Well, that's assuming that people test linux-next and -mm etc., but frankly I'm
not seeing that happening.  Hopefully, things are going to improve.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:47       ` Dan Noe
@ 2008-04-30 20:59         ` Andrew Morton
  2008-04-30 21:30           ` Rafael J. Wysocki
  2008-04-30 22:53           ` Mariusz Kozlowski
  0 siblings, 2 replies; 229+ messages in thread
From: Andrew Morton @ 2008-04-30 20:59 UTC (permalink / raw)
  To: Dan Noe; +Cc: torvalds, rjw, davem, linux-kernel, jirislaby

On Wed, 30 Apr 2008 16:47:00 -0400
Dan Noe <dpn@isomerica.net> wrote:

> On 4/30/2008 16:31, Linus Torvalds wrote:
> > 
> > On Wed, 30 Apr 2008, Andrew Morton wrote:
> >> <jumps up and down>
> >>
> >> There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!
> > 
> > The problem I see with both -mm and linux-next is that they tend to be 
> > better at finding the "physical conflict" kind of issues (ie the merge 
> > itself fails) than the "code looks ok but doesn't actually work" kind of 
> > issue.
> > 
> > Why?
> > 
> > The tester base is simply too small.
> > 
> > Now, if *that* could be improved, that would be wonderful, but I'm not 
> > seeing it as very likely.
> 
> Perhaps we should be clear and simple about what potential testers 
> should be running at any given point in time.  With -mm, linux-next, 
> linux-2.6, etc, as a newcomer I find it difficult to know where my 
> testing time and energy is best directed.

-mm consists of the sum of

a) the ~80 subsystem maintainers' trees (git and quilt)

b) the ~100 subsystem trees which are hosted only in -mm.


linux-next consists of only a)

Soon I shall remove a) from -mm and will replace it with linux-next (this
should be a no-op).

Later, I shall start feeding those 100 random subsystems into linux-next
as well (somehow).

> Is linux-next the right thing to be running at this point?

yes.  85% of the code which goes into Linux goes via the ~80 subsystem
maintainers' trees and is (or should be) in linux-next.  The other 15%
is the hosted-in-mm work.

>  Is there a 
> need for testing in a particular tree (netdev, x86, etc)?

No, please test the sum-of-all-trees in linux-next.  If you hit problems
then, as part of the problem resolving process a developer _might_ ask you
to test one tree specifically, but that would be a pretty unusual
circumstance.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:54       ` Andrew Morton
@ 2008-04-30 21:21         ` David Miller
  2008-04-30 21:47           ` Rafael J. Wysocki
                             ` (3 more replies)
  2008-04-30 21:42         ` Dmitri Vorobiev
  2008-05-09  9:28         ` Jiri Kosina
  2 siblings, 4 replies; 229+ messages in thread
From: David Miller @ 2008-04-30 21:21 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, rjw, linux-kernel, jirislaby

From: Andrew Morton <akpm@linux-foundation.org>
Date: Wed, 30 Apr 2008 13:54:05 -0700

> linux-next does little to address our two largest (IMO) problems:
> inadequate review and inadequate response to bug and regression reports. 
> But those problems are harder to fix..

This is all about positive and negative reinforcement.

The people who sit and git bisect their lives away to get the
regressions fixed need more positive reinforcement.  And the people
who stick these regressions into the tree need more negative
reinforcement.

The current way of dealing with folks who stick broken crud into the
tree results in zero change in behavior.

People who insert the bum changes into the tree only really have one
core thing that they are sensitive to: their reputation.  That's why
there is an enormous reluctance to even suggest reverts, it looks bad
for them and it also makes more work for them in the end.

I guess what these folks are truly afraid of is that someone will
start tracking reverts and post their results in some presentation
at some big conference.  I say that would be a good thing.  To
be honest, hitting the revert button more aggressively and putting
the fear of being the "revert king" into everyone's minds might
really help with this problem.

Currently there is no sufficient negative pushback on people who
insert broken crud into the tree.  So it should be no surprise that it
continues.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:59         ` Andrew Morton
@ 2008-04-30 21:30           ` Rafael J. Wysocki
  2008-04-30 21:37             ` Andrew Morton
  2008-04-30 22:08             ` Linus Torvalds
  2008-04-30 22:53           ` Mariusz Kozlowski
  1 sibling, 2 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 21:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dan Noe, torvalds, davem, linux-kernel, jirislaby, Stephen Rothwell

On Wednesday, 30 of April 2008, Andrew Morton wrote:
> On Wed, 30 Apr 2008 16:47:00 -0400
> Dan Noe <dpn@isomerica.net> wrote:
> 
> > On 4/30/2008 16:31, Linus Torvalds wrote:
> > > 
> > > On Wed, 30 Apr 2008, Andrew Morton wrote:
> > >> <jumps up and down>
> > >>
> > >> There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!
> > > 
> > > The problem I see with both -mm and linux-next is that they tend to be 
> > > better at finding the "physical conflict" kind of issues (ie the merge 
> > > itself fails) than the "code looks ok but doesn't actually work" kind of 
> > > issue.
> > > 
> > > Why?
> > > 
> > > The tester base is simply too small.
> > > 
> > > Now, if *that* could be improved, that would be wonderful, but I'm not 
> > > seeing it as very likely.
> > 
> > Perhaps we should be clear and simple about what potential testers 
> > should be running at any given point in time.  With -mm, linux-next, 
> > linux-2.6, etc, as a newcomer I find it difficult to know where my 
> > testing time and energy is best directed.
> 
> -mm consists of the sum of
> 
> a) the ~80 subsystem maintainers' trees (git and quilt)
> 
> b) the ~100 subsystem trees which are hosted only in -mm.
> 
> 
> linux-next consists of only a)
> 
> Soon I shall remove a) from -mm and will replace it with linux-next (this
> should be a no-op).
> 
> Later, I shall start feeding those 100 random subsystems into linux-next
> as well (somehow).
> 
> > Is linux-next the right thing to be running at this point?
> 
> yes.  85% of the code which goes into Linux goes via the ~80 subsystem
> maintainers' trees and is (or should be) in linux-next.  The other 15%
> is the hosted-in-mm work.
> 
> >  Is there a 
> > need for testing in a particular tree (netdev, x86, etc)?
> 
> No, please test the sum-of-all-trees in linux-next.  If you hit problems
> then, as part of the problem resolving process a developer _might_ ask you
> to test one tree specifically, but that would be a pretty unusual
> circumstance.

How bisectable is linux-next, BTW?

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 21:30           ` Rafael J. Wysocki
@ 2008-04-30 21:37             ` Andrew Morton
  2008-04-30 22:08             ` Linus Torvalds
  1 sibling, 0 replies; 229+ messages in thread
From: Andrew Morton @ 2008-04-30 21:37 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: dpn, torvalds, davem, linux-kernel, jirislaby, sfr

On Wed, 30 Apr 2008 23:30:20 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> > No, please test the sum-of-all-trees in linux-next.  If you hit problems
> > then, as part of the problem resolving process a developer _might_ ask you
> > to test one tree specifically, but that would be a pretty unusual
> > circumstance.
> 
> How bisectable is linux-next, BTW?

don't know.  Fully, one hopes.

Laurent Riffard did a successful bisection last month; I don't see many
other signs on the linux-next list.


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:45     ` Rafael J. Wysocki
@ 2008-04-30 21:37       ` Linus Torvalds
  2008-04-30 22:23         ` Rafael J. Wysocki
  2008-05-01 13:54       ` Stefan Richter
  1 sibling, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 21:37 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: David Miller, linux-kernel, Andrew Morton, Jiri Slaby



On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
> > 
> >    Long merge windows don't work - because rather than test more, it just 
> >    means that people will use them to make more changes!
> 
> And what do you think is happening _after_ the merge window closes, when
> we're supposed to be fixing bugs?  People work on new code.  And, in fact, they
> have to, if they want to be ready for the next merge window.

Oh, I agree. But at that point, the issue you brought up - of testing and 
then having the code change under you wildly - has at least gone away.

And I think you are missing a big issue:

> Sorry to say that, but I don't think this is realistic.  What happens after the merge
> window is people go and develop new stuff.

From a testing standpoint, the *developers* aren't ever even the main 
issue. Yes, we get test coverage that way too, but we should really aim 
for getting most of the non-obvious issues from the user community, and 
not primarily from developers.

So the whole point of the merge window is *not* to have developers testing 
their code during the six subsequent weeks, but to have *users* able to 
use -rc1 and report issues!

That's why the distro "testing" trees are so important. And that's why 
it's so important that -rc1 be timely. 

> My point is, given the width of the merge window, there's too much stuff
> going in during it.  As far as I'm concerned, the window can be a week long
> or whatever, but let's make fewer commits over a unit of time.

I'm not following that logic. 

A single merge will bring in easily thousands of commits. It doesn't 
matter if the merge window is a day or a week or two weeks, the merge will 
be one event.

And there's no way to avoid the fact that during the merge window, we will 
get something on the order of ten thousand commits (eg 2.6.24->25-rc1 was 
9629 commits).

So your "fewer commits over a unit of time" doesn't make sense. We have 
those ten thousand commits. They need to go in. They cannot take forever. 
Ergo, you *will* have a thousand commits a day during the merge window.

We can spread it out a bit (and I do to some degree), but in many ways 
that is just going to be more painful. So it's actually easier if we can 
get about half of the merges done early, so that people like Andrew then 
has at least most of the base set for him by the first few days of the 
merge window.

So here's the math: 3,500 commits per month. That's just the *average* 
speed, it's sometimes more. And we *cannot* merge them continuously, 
because we need to have a stabler period for testing. And remember: those 
3,500 commits don't stop happening just because they aren't merged. You 
should think of them as a constant pressure.

So 3,500 commits per month, but with a stable period (that is *longer* 
than the merge window) that means that the merge window needs to merge 
that constant stream of commits *faster* than they happen, so that we can 
then have that breather when we try to get users to test it. Let's say 
that we have a 1:3 ratio (which is fairly close to what we have), and that 
means that we need to merge 3,500 commits in a week. 

That's just simple *math*. So when you say "let's make fewer commits over 
a unit of time" I can onyl shake my head and wonder what the hell you are 
talking about. The merge window _needs_ to do those 3,500 commits per 
week. Otherwise they don't get merged!
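[Editor's note: the arithmetic above can be restated mechanically. The figures below are the ones quoted in the message (~3,500 commits/month, a roughly 1:3 merge-to-stabilization ratio); the script only re-derives the required merge rate from them:]

```shell
# Figures from the message; the month is approximated as four weeks.
commits_per_month=3500
merge_weeks=1        # proposed one-week merge window
stable_weeks=3       # 1:3 ratio: three weeks of stabilization per week of merging

cycle_weeks=$((merge_weeks + stable_weeks))
# Everything that accumulates over a whole cycle has to land during the
# merge window, so the required rate per merge-window week is:
per_window_week=$((commits_per_month * cycle_weeks / 4 / merge_weeks))
echo "need to merge ~${per_window_week} commits per merge-window week"
```

With these inputs the required rate works out to the 3,500-commits-per-week figure the message arrives at.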

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:54       ` Andrew Morton
  2008-04-30 21:21         ` David Miller
@ 2008-04-30 21:42         ` Dmitri Vorobiev
  2008-04-30 22:06           ` Jiri Slaby
  2008-04-30 22:10           ` Andrew Morton
  2008-05-09  9:28         ` Jiri Kosina
  2 siblings, 2 replies; 229+ messages in thread
From: Dmitri Vorobiev @ 2008-04-30 21:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, rjw, davem, linux-kernel, jirislaby, Ingo Molnar

Andrew Morton wrote:
> On Wed, 30 Apr 2008 13:31:08 -0700 (PDT)
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>>
>> On Wed, 30 Apr 2008, Andrew Morton wrote:
>>> <jumps up and down>
>>>
>>> There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!
>> The problem I see with both -mm and linux-next is that they tend to be 
>> better at finding the "physical conflict" kind of issues (ie the merge 
>> itself fails) than the "code looks ok but doesn't actually work" kind of 
>> issue.
>>
>> Why?
>>
>> The tester base is simply too small.
>>
>> Now, if *that* could be improved, that would be wonderful, but I'm not 
>> seeing it as very likely.
>>
>> I think we have fairly good penetration these days with the regular -git 
>> tree, but I think that one is quite frankly a *lot* less scary than -mm or 
>> -next are, and there it has been an absolutely huge boon to get the kernel 
>> into the Fedora test-builds etc (and I _think_ Ubuntu and SuSE also 
>> started something like that).
>>
>> So I'm very pessimistic about getting a lot of test coverage before -rc1.
>>
>> Maybe too pessimistic, who knows?
>>
> 
> Well.  We'll see.
> 
> linux-next is more than another-tree-to-test.  It is (or will be) a change
> in our processes and culture.  For a start, subsystem maintainers can no
> longer whack away at their own tree as if the rest of us don't exist. 
> They now have to be more mindful of merge issues.
> 
> Secondly, linux-next is more accessible than -mm: more releases, more
> stable, better tested by he-who-releases it, available via git:// etc.

Andrew, the latter thing is a very good point. For me personally, the fact
that -mm is not available via git is the major obstacle to trying your
tree more frequently than just a few times per year. How difficult would
it be for you to switch to git? I guess there are good reasons for still
using the source code management system from the last century; please
correct me if I'm wrong, but I believe that using a modern SCM system could
make life easier for you and your testers, no?

> 
> I get the impression that we're seeing very little non-Stephen testing of
> linux-next at this stage.  I hope we can ramp that up a bit, initially by
> having core developers doing at least some basic sanity testing.
> 

For busy (or lazy) people like myself, the big problem with linux-next is
the frequent merge breakages, when pulling the tree stops with "you are in
the middle of a merge conflict". Perhaps there is a better way to resolve
this without just removing the whole repo and cloning it once again - this
is what I'm doing; please flame me for stupidity or ignorance if I simply
am not aware of some git feature that could be useful in such cases.
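For reference, a full re-clone should never be needed in this situation: a conflicted pull can be backed out with `git merge --abort`, or the checkout can be forced to match the remote with `git fetch` followed by `git reset --hard`. A minimal sketch against a throwaway local repository (all paths, file names, and the "upstream" repo are hypothetical stand-ins):

```shell
set -eu
tmp=$(mktemp -d)
export HOME="$tmp"                         # keep git config inside the sandbox
git config --global user.email you@example.com
git config --global user.name  you
git config --global init.defaultBranch master
git config --global pull.rebase false

# A stand-in for the published tree.
cd "$tmp" && git init -q upstream && cd upstream
echo base > file && git add file && git commit -qm base

git clone -q "$tmp/upstream" "$tmp/work"   # our local copy (still at "base")
echo rewritten > file && git commit -qam rewritten   # upstream moves on

cd "$tmp/work"
echo local > file && git commit -qam local # we diverge on the same line
git pull -q origin master 2>/dev/null || true  # conflicts: stuck mid-merge
git merge --abort                          # escape 1: back out of the merge
git fetch -q origin
git reset -q --hard origin/master          # escape 2: match upstream exactly
cat file                                   # prints "rewritten"
```

The `reset --hard` discards local commits and the half-finished merge alike, which is usually exactly what you want for a tree you only consume.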

Finally, while I'm at it, I'd like to make another technical comment.
My development zoo is a pretty fast 4-way Xeon server, where I keep a handful
of trees, a few cross-toolchains, Qemu, etc. The network setup in our
organization is such that I can use git only over http from that server. This
cannot be changed, it's the company policy. In view of that, it's a pity that
quite a few tree owners don't make sure that http access to their trees works
(I added Ingo to the Cc: list in the hope that this will be corrected soon for
the x86 tree, which I am using quite extensively), and I have to use a much
slower machine (a two and a half year old laptop) for these trees. Please see
this:

<<<<<<<

[dmitri.vorobiev@amber ~]$ git clone http://www.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
Initialized empty Git repository in /home/dmitri.vorobiev/linux-2.6-x86/.git/
Getting alternates list for http://www.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
Also look at http://www.kernel.org/home/ftp/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/
Getting pack list for http://www.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
Getting index for pack ded7039bef9c148e5bb991a1b61da1d67c0ad3c2
Getting pack list for http://www.kernel.org/home/ftp/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/
error: Unable to find 08acd4f8af42affd8cbed81cc1b69fa12ddb213f under http://www.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
Cannot obtain needed object 08acd4f8af42affd8cbed81cc1b69fa12ddb213f
[dmitri.vorobiev@amber ~]$ 

<<<<<<<

Thanks,
Dmitri


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 21:21         ` David Miller
@ 2008-04-30 21:47           ` Rafael J. Wysocki
  2008-04-30 22:02           ` Dmitri Vorobiev
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 21:47 UTC (permalink / raw)
  To: David Miller; +Cc: akpm, torvalds, linux-kernel, jirislaby

On Wednesday, 30 of April 2008, David Miller wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Wed, 30 Apr 2008 13:54:05 -0700
> 
> > linux-next does little to address our two largest (IMO) problems:
> > inadequate review and inadequate response to bug and regression reports. 
> > But those problems are harder to fix..
> 
> This is all about positive and negative reinforcement.
> 
> The people who sit and git bisect their lives away to get the
> regressions fixed need more positive reinforcement.  And the people
> who stick these regressions into the tree need more negative
> reinforcement.
> 
> The current way of dealing with folks who stick broken crud into the
> tree results in zero change in behavior.
> 
> People who insert the bum changes into the tree only really have one
> core thing that they are sensitive to, their reputation.  That's why
> there is an enormous reluctance to even suggest reverts, it looks bad
> for them and it also makes more work for them in the end.
> 
> I guess what these folks are truly afraid of is that someone will
> start tracking reverts and post their results in some presentation
> at some big conference.  I say that would be a good thing.  To
> be honest, hitting the revert button more aggressively and putting
> the fear of being the "revert king" into everyone's minds might
> really help with this problem.

Well, probably ...

> Currently there is insufficient negative pushback on people who
> insert broken crud into the tree.  So it should be no surprise that it
> continues.

... but that should also point at the trees through which the bugs are
introduced.

I mean, the maintainers should be more careful about what they take into their
trees and push upstream.  If that happens, they'll (hopefully) put some more
pressure on patch submitters.


* Re: Slow DOWN, please!!!
  2008-04-30 20:31     ` Linus Torvalds
  2008-04-30 20:47       ` Dan Noe
  2008-04-30 20:54       ` Andrew Morton
@ 2008-04-30 21:52       ` H. Peter Anvin
  2008-05-01  3:24         ` Bob Tracy
  2008-05-01 16:39         ` Valdis.Kletnieks
  2008-05-01  0:31       ` RFC: starting a kernel-testers group for newbies Adrian Bunk
  3 siblings, 2 replies; 229+ messages in thread
From: H. Peter Anvin @ 2008-04-30 21:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Rafael J. Wysocki, davem, linux-kernel, jirislaby

Linus Torvalds wrote:
> 
> The tester base is simply too small.
> 
> Now, if *that* could be improved, that would be wonderful, but I'm not 
> seeing it as very likely.
> 

One thing is that we keep fragmenting the tester base by adding new 
confidence levels: we now have -mm, -next, mainline -git, mainline -rc, 
mainline release, stable, distro testing, and distro release (and some 
distros even have aggressive versus conservative tracks.)  Furthermore, 
thanks to craniorectal immersion on the part of graphics vendors, a lot 
of users have to run proprietary drivers on their "main work" systems, 
which means they can't even test newer releases even if they would dare.

This fragmentation is largely intentional, of course -- everyone can 
pick a risk level appropriate for them -- but it does mean:

a) The lag for a patch to ride through the pipeline is pretty long.
b) The fraction of people who are going to use the more aggressive trees 
for "real work" testing is going to be small.

	-hpa



* Re: Slow DOWN, please!!!
  2008-04-30 21:21         ` David Miller
  2008-04-30 21:47           ` Rafael J. Wysocki
@ 2008-04-30 22:02           ` Dmitri Vorobiev
  2008-04-30 22:19           ` Ingo Molnar
  2008-05-02 13:37           ` Helge Hafting
  3 siblings, 0 replies; 229+ messages in thread
From: Dmitri Vorobiev @ 2008-04-30 22:02 UTC (permalink / raw)
  To: David Miller; +Cc: akpm, torvalds, rjw, linux-kernel, jirislaby, Ingo Molnar

David Miller writes:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Wed, 30 Apr 2008 13:54:05 -0700
> 
>> linux-next does little to address our two largest (IMO) problems:
>> inadequate review and inadequate response to bug and regression reports. 
>> But those problems are harder to fix..
> 
> This is all about positive and negative reinforcement.
> 
> The people who sit and git bisect their lives away to get the
> regressions fixed need more positive reinforcement.  And the people
> who stick these regressions into the tree need more negative
> reinforcement.
> 
> The current way of dealing with folks who stick broken crud into the
> tree results in zero change in behavior.
> 
> People who insert the bum changes into the tree only really have one
> core thing that they are sensitive to, their reputation.  That's why
> there is an enormous reluctance to even suggest reverts, it looks bad
> for them and it also makes more work for them in the end.
> 
> I guess what these folks are truly afraid of is that someone will
> start tracking reverts and post their results in some presentation
> at some big conference.  I say that would be a good thing.  To
> be honest, hitting the revert button more aggressively and putting
> the fear of being the "revert king" into everyone's minds might
> really help with this problem.
> 
> Currently there is insufficient negative pushback on people who
> insert broken crud into the tree.  So it should be no surprise that it
> continues.

I'm not a frequent poster to this mailing list, but I do spend a good
portion of my life reading it. Please excuse me for expressing my very
personal opinion, but I thought you might probably be interested in a
detached view of the situation.

I think that many have guessed that I would like to talk about the attacks
on Ingo and back. Believe me, this fight looks childish, as it has become
obvious that it went beyond the purely technical disputes which Linus is so
keen on rightfully writing about.

In no case am I implying any kind of offense, but I do believe that bad
emotions hinder the community from advancing with the technical things.

Dmitri


* Re: Slow DOWN, please!!!
  2008-04-30 21:42         ` Dmitri Vorobiev
@ 2008-04-30 22:06           ` Jiri Slaby
  2008-04-30 22:10           ` Andrew Morton
  1 sibling, 0 replies; 229+ messages in thread
From: Jiri Slaby @ 2008-04-30 22:06 UTC (permalink / raw)
  To: Dmitri Vorobiev
  Cc: Andrew Morton, Linus Torvalds, rjw, davem, linux-kernel, Ingo Molnar

On 04/30/2008 11:42 PM, Dmitri Vorobiev wrote:
> For busy (or lazy) people like myself, the big problem with linux-next is
> the frequent merge breakages, when pulling the tree stops with "you are in
> the middle of a merge conflict". Perhaps there is a better way to resolve
> this without just removing the whole repo and cloning it once again - this

If this is still an issue with -next, I would say we won't get many testers. I 
gave up after the first time I was hit by that and went back to pure -mm.

I think Greg KH asked why this happens (Stephen rebases?); if you search the 
archives, I'm sure you'll find it.


* Re: Slow DOWN, please!!!
  2008-04-30 21:30           ` Rafael J. Wysocki
  2008-04-30 21:37             ` Andrew Morton
@ 2008-04-30 22:08             ` Linus Torvalds
  1 sibling, 0 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 22:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, Dan Noe, davem, linux-kernel, jirislaby, Stephen Rothwell



On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
> 
> How bisectable is linux-next, BTW?

Each _individual_ release will be entirely bisectable, since it's all git 
trees, and at no point does anything collapse individual commits together 
like -mm does.

HOWEVER. 

Due to the way linux-next works, each individual release will be basically 
unrelated to the previous one, so it gets a bit more exciting indeed when 
you say "the last linux-next version worked for me, but the current one 
does not".

Git can actually do this - you can make the previous (good) linux-next 
version be one branch, and the not-directly-related next linux-next build 
be another, and then "git bisect" will _technically_ work, but:

 - it will not necessarily be as efficient (because the linux-next trees 
   will have re-done all the merges, so there will be new commits and 
   patterns in between them)

 - but much more distressingly, if the individual git trees that got 
   merged into linux-next were also using rebasing etc, now even all the 
   *base* commits will be different, and saying that the old release was 
   good tells you almost nothing about the new release!

   (The good news is that if only a couple of trees do that, the bisection 
   information from the other trees that don't do it will still be valid 
   and useful and help bisection)

 - also, while it's very easy for somebody who knows and understands git 
   branches, it's technically still quite a bit more challenging than just 
   following a single tree that never rebases (ie mine) and just bisecting 
   within that one.

So yes, git bisect will work in linux-next, and the fundamental nature of 
git-bisect will not change at all, but it's going to be a bit weaker 
"between different versions" of linux-next than it would be for the normal 
git tree that doesn't do the "merge different trees all over again" thing 
that linux-next does.

		Linus


* Re: Slow DOWN, please!!!
  2008-04-30 21:42         ` Dmitri Vorobiev
  2008-04-30 22:06           ` Jiri Slaby
@ 2008-04-30 22:10           ` Andrew Morton
  2008-04-30 22:19             ` Linus Torvalds
                               ` (2 more replies)
  1 sibling, 3 replies; 229+ messages in thread
From: Andrew Morton @ 2008-04-30 22:10 UTC (permalink / raw)
  To: Dmitri Vorobiev; +Cc: torvalds, rjw, davem, linux-kernel, jirislaby, mingo

On Thu, 01 May 2008 01:42:59 +0400
Dmitri Vorobiev <dmitri.vorobiev@gmail.com> wrote:

> Andrew Morton writes:
> > On Wed, 30 Apr 2008 13:31:08 -0700 (PDT)
> > Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> >>
> >> On Wed, 30 Apr 2008, Andrew Morton wrote:
> >>> <jumps up and down>
> >>>
> >>> There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!
> >> The problem I see with both -mm and linux-next is that they tend to be 
> >> better at finding the "physical conflict" kind of issues (ie the merge 
> >> itself fails) than the "code looks ok but doesn't actually work" kind of 
> >> issue.
> >>
> >> Why?
> >>
> >> The tester base is simply too small.
> >>
> >> Now, if *that* could be improved, that would be wonderful, but I'm not 
> >> seeing it as very likely.
> >>
> >> I think we have fairly good penetration these days with the regular -git 
> >> tree, but I think that one is quite frankly a *lot* less scary than -mm or 
> >> -next are, and there it has been an absolutely huge boon to get the kernel 
> >> into the Fedora test-builds etc (and I _think_ Ubuntu and SuSE also 
> >> started something like that).
> >>
> >> So I'm very pessimistic about getting a lot of test coverage before -rc1.
> >>
> >> Maybe too pessimistic, who knows?
> >>
> > 
> > Well.  We'll see.
> > 
> > linux-next is more than another-tree-to-test.  It is (or will be) a change
> > in our processes and culture.  For a start, subsystem maintainers can no
> > longer whack away at their own tree as if the rest of us don't exist. 
> > They now have to be more mindful of merge issues.
> > 
> > Secondly, linux-next is more accessible than -mm: more releases, more
> > stable, better tested by he-who-releases it, available via git:// etc.
> 
> Andrew, the latter thing is a very good point. For me personally, the fact
> that -mm is not available via git is the major obstacle for trying your
> tree more frequently than just a few times per year.

Every -mm release is available via git://, as described in the release
announcements.

The scripts which do this are a bit cantankerous but I believe they do
work.

<tests it>

yup, 2.6.25-mm1 is there.

> How difficult would it
> be for you to switch to git?

Fatal, I expect.  A tool which manages source-code files is just the wrong
paradigm.  I manage _changes_ against someone else's source files.

> I guess there are good reasons for still
> using the source code management system from the last century; please
> correct me if I'm wrong, but I believe that using a modern SCM system could
> make life easier for you and your testers, no?
> 
> > 
> > I get the impression that we're seeing very little non-Stephen testing of
> > linux-next at this stage.  I hope we can ramp that up a bit, initially by
> > having core developers doing at least some basic sanity testing.
> > 
> 
> For busy (or lazy) people like myself, the big problem with linux-next is
> the frequent merge breakages, when pulling the tree stops with "you are in
> the middle of a merge conflict".

Really?  Doesn't Stephen handle all those problems?  It should be a clean
fetch each time?


> Perhaps, there is a better way to resolve
> this without just removing the whole repo and cloning it once again - this
> is what I'm doing, please flame me for stupidity or ignorance if I simply
> am not aware of some git feature that could be useful in such cases.
> 
> Finally, while the list is at it, I'd like to make another technical comment.
> My development zoo is a pretty fast 4-way Xeon server, where I keep a handful
> of trees, a few cross-toolchains, Qemu, etc. The network setup in our
> organization is such that I can use git only over http from that server.

Don't know what to do about that, sorry.  An off-site git->http proxy might
work, but I doubt if anyone has written the code.



* Re: Slow DOWN, please!!!
  2008-04-30 21:21         ` David Miller
  2008-04-30 21:47           ` Rafael J. Wysocki
  2008-04-30 22:02           ` Dmitri Vorobiev
@ 2008-04-30 22:19           ` Ingo Molnar
  2008-04-30 22:22             ` David Miller
                               ` (2 more replies)
  2008-05-02 13:37           ` Helge Hafting
  3 siblings, 3 replies; 229+ messages in thread
From: Ingo Molnar @ 2008-04-30 22:19 UTC (permalink / raw)
  To: David Miller; +Cc: akpm, torvalds, rjw, linux-kernel, jirislaby


* David Miller <davem@davemloft.net> wrote:

> > linux-next does little to address our two largest (IMO) problems: 
> > inadequate review and inadequate response to bug and regression 
> > reports. But those problems are harder to fix..
> 
> This is all about positive and negative reinforcement.
> 
> The people who sit and git bisect their lives away to get the 
> regressions fixed need more positive reinforcement.  And the people 
> who stick these regressions into the tree need more negative 
> reinforcement.

What we need is not 'negative reinforcement'. That is just nasty, open 
warfare between isolated parties, expressed in a politically correct 
way.

The core problem is that every maintainer has his own subjective, 
asymmetric view of and experience with this matter: to him, his own tree is 
almost problem-free and most problems are very easy to fix, while the 
problems in other trees are a nuisance that should never have been put 
upstream.

Also, people get defensive when their regressions get pointed out in 
anything but the most respectful and casual manner.

For example, how on earth do i tell you that during the v2.6.24 merge 
window, half of all x86 test-machines for me and others were broken 
because they had no networking, for more than a week in a row? Are you 
surprised about this (true) experience we had? Do you feel insulted? Do 
you feel unfairly handled and slandered?

The same goes in the other direction as well - you were just hit by 
scheduler tree related regressions that were only triggered on your 
128-way sparc64, but not on our 64-way x86 and smaller boxes.

The thing is, what we really need is more cooperation and earlier 
integration - more people actually testing linux-next occasionally to 
see how things will look in the next merge window.

linux-next doing build tests is fine, but the nasty regressions that 
will hit your box can only be solved if _you_ boot linux-next at least 
once before the merge window opens. The regressions that will hit my box 
can only be avoided if i test your tree.

hm? And can we please somehow talk about this without flaming each other 
in the process?

	Ingo


* Re: Slow DOWN, please!!!
  2008-04-30 22:10           ` Andrew Morton
@ 2008-04-30 22:19             ` Linus Torvalds
  2008-04-30 22:28               ` Dmitri Vorobiev
  2008-05-01 23:06               ` Kevin Winchester
  2008-04-30 23:04             ` Dmitri Vorobiev
  2008-05-01  6:15             ` Jan Engelhardt
  2 siblings, 2 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 22:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dmitri Vorobiev, rjw, davem, linux-kernel, jirislaby, mingo



On Wed, 30 Apr 2008, Andrew Morton wrote:
> > For busy (or lazy) people like myself, the big problem with linux-next is
> > the frequent merge breakages, when pulling the tree stops with "you are in
> > the middle of a merge conflict".
> 
> Really?  Doesn't Stephen handle all those problems?  It should be a clean
> fetch each time?

It should indeed be a clean fetch, but I wonder if Dmitri perhaps does a 
"git pull" - which will do the fetch, but then try to _merge_ that fetched 
state into whatever the last base Dmitri happened to have.

Dmitri: you cannot just "git pull" on linux-next, because each version of 
linux-next is independent of the next one. What you should do is basically

	# Set this up just once..
	git remote add linux-next git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git

and then after that, you keep on just doing

	git fetch linux-next
	git checkout linux-next/master

which will get you the actual objects and check out the state of that 
remote (and then you'll normally never be on a local branch on that tree, 
git will end up using a so-called "detached head" for this).

IOW, you should never need to do any merges, because Stephen did all those 
in linux-next already.
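That fetch-then-checkout pattern can be tried out end-to-end against a throwaway local repository standing in for linux-next (the paths and names below are hypothetical; the real URL is the git:// one quoted above):

```shell
set -eu
tmp=$(mktemp -d)
export HOME="$tmp"                      # sandbox the git config
git config --global user.email you@example.com
git config --global user.name  you
git config --global init.defaultBranch master

cd "$tmp" && git init -q next-mirror && cd next-mirror   # linux-next stand-in
echo v1 > state && git add state && git commit -qm v1

git init -q "$tmp/local" && cd "$tmp/local"
echo mine > notes && git add notes && git commit -qm mine

# Set this up just once (a local path instead of the git:// URL):
git remote add linux-next "$tmp/next-mirror"

# ...then on every update: fetch, and check out the remote state directly.
git fetch -q linux-next
git checkout -q linux-next/master       # detached HEAD - no local merge at all

git symbolic-ref -q HEAD || echo "detached HEAD"   # prints "detached HEAD"
cat state                                          # prints "v1"
```

The failing `git symbolic-ref -q HEAD` at the end confirms the "detached head" state: no local branch is involved, so there is never a merge for the tester to resolve.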

			Linus


* Re: Slow DOWN, please!!!
  2008-04-30 22:19           ` Ingo Molnar
@ 2008-04-30 22:22             ` David Miller
  2008-04-30 22:39               ` Rafael J. Wysocki
  2008-04-30 22:35             ` Ingo Molnar
  2008-05-05  3:04             ` Rusty Russell
  2 siblings, 1 reply; 229+ messages in thread
From: David Miller @ 2008-04-30 22:22 UTC (permalink / raw)
  To: mingo; +Cc: akpm, torvalds, rjw, linux-kernel, jirislaby

From: Ingo Molnar <mingo@elte.hu>
Date: Thu, 1 May 2008 00:19:36 +0200

> The same goes in the other direction as well - you were just hit by 
> scheduler tree related regressions that were only triggered on your 
> 128-way sparc64, but not on our 64way x86 and smaller boxes.

You keep saying this over and over again, but the powerpc folks hit
this stuff too.


* Re: Slow DOWN, please!!!
  2008-04-30 21:37       ` Linus Torvalds
@ 2008-04-30 22:23         ` Rafael J. Wysocki
  2008-04-30 22:31           ` Linus Torvalds
  2008-04-30 22:40           ` david
  0 siblings, 2 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 22:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Wednesday, 30 of April 2008, Linus Torvalds wrote:
> 
> On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
> > > 
> > >    Long merge windows don't work - because rather than test more, it just 
> > >    means that people will use them to make more changes!
> > 
> > And what do you think is happening _after_ the merge window closes, when
> > we're supposed to be fixing bugs?  People work on new code.  And, in fact, they
> > have to, if they want to be ready for the next merge window.
> 
> Oh, I agree. But at that point, the issue you brought up - of testing and 
> then having the code change under you wildly - has at least gone away.
> 
> And I think you are missing a big issue:
> 
> > Sorry to say that, but I don't think this is realistic.  What happens after the merge
> > window is people go and develop new stuff.
> 
> From a testing standpoint, the *developers* aren't ever even the main 
> issue. Yes, we get test coverage that way too, but we should really aim 
> for getting most of the non-obvious issues from the user community, and 
> not primarily from developers.
> 
> So the whole point of the merge window is *not* to have developers testing 
> their code during the six subsequent weeks, but to have *users* able to 
> use -rc1 and report issues!
> 
> That's why the distro "testing" trees are so important. And that's why 
> it's so important that -rc1 be timely. 

That's correct, but since developers are already working on new code at that
point, the bug reports in fact distract them and make them go back to the "old"
stuff, recall why they made those particular changes, etc.  As a result, the
developers often do not take the bug reports seriously enough, especially if
they do not finger the "guilty" change.  That, in turn, makes the users believe
there's no point in testing and reporting bugs.

> > My point is, given the width of the merge window, there's too much stuff
> > going in during it.  As far as I'm concerned, the window can be a week long
> > or whatever, but let's make fewer commits over a unit of time.
> 
> I'm not following that logic. 
> 
> A single merge will bring in easily thousands of commits. It doesn't 
> matter if the merge window is a day or a week or two weeks, the merge will 
> be one event.

No, technically it doesn't.

> And there's no way to avoid the fact that during the merge window, we will 
> get something on the order of ten thousand commits (eg 2.6.24->25-rc1 was 
> 9629 commits).

Well, do we _have_ _to_ take that much?  I know we _can_, but is this really
necessary?

> So your "fewer commits over a unit of time" doesn't make sense.

Oh, yes it does.  Equally well you could say that having brakes in a car
didn't make sense, even if you could drive it as fast as the engine allowed
you to. ;-)

> We have those ten thousand commits. They need to go in. They cannot take
> forever.

But perhaps some of them can wait a bit longer.
 
> Ergo, you *will* have a thousand commits a day during the merge window.

That's only if you insist on handling everything that people push to you.

> We can spread it out a bit (and I do to some degree), but in many ways 
> that is just going to be more painful. So it's actually easier if we can 
> get about half of the merges done early, so that people like Andrew then 
> has at least most of the base set for him by the first few days of the 
> merge window.
> 
> So here's the math: 3,500 commits per month. That's just the *average* 
> speed, it's sometimes more. And we *cannot* merge them continuously, 
> because we need to have a stabler period for testing. And remember: those 
> 3,500 commits don't stop happening just because they aren't merged. You 
> should think of them as a constant pressure.
> 
> So 3,500 commits per month, but with a stable period (that is *longer* 
> than the merge window) that means that the merge window needs to merge 
> that constant stream of commits *faster* than they happen, so that we can 
> then have that breather when we try to get users to test it. Let's say 
> that we have a 1:3 ratio (which is fairly close to what we have), and that 
> means that we need to merge 3,500 commits in a week. 
> 
> That's just simple *math*. So when you say "let's make fewer commits over 
> a unit of time" I can only shake my head and wonder what the hell you are 
> talking about. The merge window _needs_ to do those 3,500 commits per 
> week. Otherwise they don't get merged!
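The arithmetic quoted above is worth spelling out with the mail's own numbers - roughly 3,500 commits arriving per month, and a merge-to-stabilization ratio of about 1:3 (the cycle length and ratio below are the mail's approximations, not exact figures):

```shell
# Numbers from the mail above; ratio and cycle length are approximations.
commits_per_month=3500
window_weeks=1                      # merge window
stabilize_weeks=3                   # testing/stabilization (the ~1:3 ratio)
cycle_weeks=$((window_weeks + stabilize_weeks))

# One ~4-week cycle accumulates about a month's worth of commits, and all
# of them have to land during the merge window alone:
arrived=$((commits_per_month * cycle_weeks / 4))
rate=$((arrived / window_weeks))
echo "must merge $rate commits in a $window_weeks-week window (~$((rate / 7))/day)"
```

Shortening the window does not change `arrived`; it only pushes `rate` higher, which is the point being made.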

Surely, they don't, but maybe they don't have to.

You can technically handle merging even more, but what about quality?  Do we
have a quality assurance process in place?  If we do, what is it?  How is it
able to handle the 3500 commits a week?  Assuming it is, will it be able to
handle more and what's the limit?

IMO, there has to be a limit somewhere, or we will end up in a spiral driving
everybody mad.

Thanks,
Rafael


* Re: Slow DOWN, please!!!
  2008-04-30 22:19             ` Linus Torvalds
@ 2008-04-30 22:28               ` Dmitri Vorobiev
  2008-05-01 16:26                 ` Diego Calleja
  2008-05-01 23:06               ` Kevin Winchester
  1 sibling, 1 reply; 229+ messages in thread
From: Dmitri Vorobiev @ 2008-04-30 22:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, rjw, davem, linux-kernel, jirislaby, mingo

Linus Torvalds writes:
> 
> On Wed, 30 Apr 2008, Andrew Morton wrote:
>>> For busy (or lazy) people like myself, the big problem with linux-next is
>>> the frequent merge breakages, when pulling the tree stops with "you are in
>>> the middle of a merge conflict".
>> Really?  Doesn't Stephen handle all those problems?  It should be a clean
>> fetch each time?
> 
> It should indeed be a clean fetch, but I wonder if Dmitri perhaps does a 
> "git pull" - which will do the fetch, but then try to _merge_ that fetched 
> state into whatever the last base Dmitri happened to have.
> 
> Dmitry: you cannot just "git pull" on linux-next, because each version of 
> linux-next is independent of the next one. What you should do is basically
> 
> 	# Set this up just once..
> 	git remote add linux-next git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
> 
> and then after that, you keep on just doing
> 
> 	git fetch linux-next
> 	git checkout linux-next/master
> 
> which will get you the actual objects and check out the state of that 
> remote (and then you'll normally never be on a local branch on that tree, 
> git will end up using a so-called "detached head" for this).
> 
> IOW, you should never need to do any merges, because Stephen did all those 
> in linux-next already.

Linus, thanks a lot for the detailed explanation. Indeed, it seems that I foolishly
tried to duplicate Stephen's work. In the future I'll do as you suggest here.

Dmitri

> 
> 			Linus
> 



* Re: Slow DOWN, please!!!
  2008-04-30 22:23         ` Rafael J. Wysocki
@ 2008-04-30 22:31           ` Linus Torvalds
  2008-04-30 22:41             ` Andrew Morton
                               ` (2 more replies)
  2008-04-30 22:40           ` david
  1 sibling, 3 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 22:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: David Miller, linux-kernel, Andrew Morton, Jiri Slaby



On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> 
> > And there's no way to avoid the fact that during the merge window, we will 
> > get something on the order of ten thousand commits (eg 2.6.24->25-rc1 was 
> > 9629 commits).
> 
> Well, do we _have_ _to_ take that much?  I know we _can_, but is this really
> necessary?

Do you want me to stop merging your code?

Do you think anybody else does?

Any suggestions on how to convince people that their code is not worth 
merging?

		Linus


* Re: Slow DOWN, please!!!
  2008-04-30 22:19           ` Ingo Molnar
  2008-04-30 22:22             ` David Miller
@ 2008-04-30 22:35             ` Ingo Molnar
  2008-04-30 22:49               ` Andrew Morton
  2008-04-30 22:51               ` David Miller
  2008-05-05  3:04             ` Rusty Russell
  2 siblings, 2 replies; 229+ messages in thread
From: Ingo Molnar @ 2008-04-30 22:35 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, torvalds, rjw, linux-kernel, jirislaby, Thomas Gleixner


* Ingo Molnar <mingo@elte.hu> wrote:

> What we need is not 'negative reinforcement'. That is just nasty, open 
> warfare between isolated parties, expressed in a politically correct 
> way.

in more detail: any "negative reinforcement" should be on the 
_technical_ level, i.e. when changes are handled - not at the broad tree 
level.

Sure, there are exceptions, etc. - but by the time stuff goes upstream 
it's too late and we've got to fix stuff instead of trying to push back 
on each other.

by earlier integration (= linux-next) we can do the pushback much 
earlier, in a much more granular, much more technical and much less 
personal way: "hey Ingo, your new sched-dizzy-blah patch broke stuff 
here, zap it" or "hey Dave, that socket-foo rewrite just broke things 
here, zap it".

git-revert _kind of_ makes that possible too, but people still feel too 
personal about reverts - they take them as an intrusion into their subsystem 
and regard them as an attack on their competence as a maintainer.

and this is all so typical btw.: the most effective measure against 
human warfare is for people to see each other and to talk to each other.

[ That's one reason why i am so worried about mailing list isolation.
  People get more distant, they mean less to each other, work less with
  each other => Linux suffers. I do accept that for some people lkml is
  simply too noisy - but i think the cure is worse than the disease. ]

	Ingo

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:22             ` David Miller
@ 2008-04-30 22:39               ` Rafael J. Wysocki
  2008-04-30 22:54                 ` david
  2008-04-30 23:12                 ` Willy Tarreau
  0 siblings, 2 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 22:39 UTC (permalink / raw)
  To: David Miller; +Cc: mingo, akpm, torvalds, linux-kernel, jirislaby

On Thursday, 1 of May 2008, David Miller wrote:
> From: Ingo Molnar <mingo@elte.hu>
> Date: Thu, 1 May 2008 00:19:36 +0200
> 
> > The same goes in the other direction as well - you were just hit by 
> > scheduler tree related regressions that were only triggered on your 
> > 128-way sparc64, but not on our 64way x86 and smaller boxes.
> 
> You keep saying this over and over again, but the powerpc folks hit
> this stuff too.

Well, I think that some changes need some wider testing anyway.

They may be correct from the author's point of view and even from the knowledge
and point of view of the maintainer who takes them into his tree.  That's
because no one knows everything and it'll always be like this.

Still, with the current process such "suspicious" changes go in as parts of
large series of commits and need to be "rediscovered" by the affected testers
with the help of bisection.  Moreover, many changes of this kind may go in from
many different sources at the same time and that's really problematic.

In fact, so many changes go in at a time during a merge window that we often
can't really say which of them causes the breakage observed by testers, and
bisection, which IMO should really be a last-resort tool, is used as the main
debugging technique.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:23         ` Rafael J. Wysocki
  2008-04-30 22:31           ` Linus Torvalds
@ 2008-04-30 22:40           ` david
  2008-04-30 23:45             ` Rafael J. Wysocki
  1 sibling, 1 reply; 229+ messages in thread
From: david @ 2008-04-30 22:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Thu, 1 May 2008, Rafael J. Wysocki wrote:

> On Wednesday, 30 of April 2008, Linus Torvalds wrote:
>>
>> On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
>> So your "fewer commits over a unit of time" doesn't make sense.
>
> Oh, yes it does.  Equally well you could say that having brakes in a car
> didn't make sense, even if you could drive it as fast as the engine allowed
> you to. ;-)
>
>> We have those ten thousand commits. They need to go in. They cannot take
>> forever.
>
> But perhaps some of them can wait a bit longer.

not really, if patches are produced at a rate of 1000/week and you decide 
to only accept 2000 of them this month, a month later you have 6000 
patches to deal with. history has shown that developers do not stop 
developing if their patches are not accepted, they just fork and go their 
own way.
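A back-of-envelope check of the rates quoted above (the 1000 patches/week and 2000 accepted per month are the hypothetical numbers from the mail; the four-week month is an added assumption):

```shell
# Backlog arithmetic for the hypothetical rates above.
produced_per_week=1000
produced_per_month=$(( produced_per_week * 4 ))           # assume a 4-week month: 4000
accepted_this_month=2000
leftover=$(( produced_per_month - accepted_this_month ))  # 2000 patches deferred
pending_next_month=$(( leftover + produced_per_month ))
echo "$pending_next_month patches to deal with"           # prints: 6000 patches to deal with
```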

David Lang


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:31           ` Linus Torvalds
@ 2008-04-30 22:41             ` Andrew Morton
  2008-04-30 23:23               ` Rafael J. Wysocki
                                 ` (2 more replies)
  2008-04-30 22:46             ` Willy Tarreau
  2008-04-30 23:03             ` Rafael J. Wysocki
  2 siblings, 3 replies; 229+ messages in thread
From: Andrew Morton @ 2008-04-30 22:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: rjw, davem, linux-kernel, jirislaby

On Wed, 30 Apr 2008 15:31:22 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Any suggestions on how to convince people that their code is not worth 
> merging?

Raise the quality.  Then the volume will automatically decrease.

Which leads us to...  the volume isn't a problem per se.  The problem is
quality.  It's the fact that they vary inversely which makes us say "slow
down".

So David's Subject: should have been "Do Better, please".  Slowing down is
just a side-effect.  And, we expect, a tool.


We should be discussing how to raise the quality of our work.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:31           ` Linus Torvalds
  2008-04-30 22:41             ` Andrew Morton
@ 2008-04-30 22:46             ` Willy Tarreau
  2008-04-30 22:52               ` Andrew Morton
  2008-04-30 23:20               ` Linus Torvalds
  2008-04-30 23:03             ` Rafael J. Wysocki
  2 siblings, 2 replies; 229+ messages in thread
From: Willy Tarreau @ 2008-04-30 22:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Wed, Apr 30, 2008 at 03:31:22PM -0700, Linus Torvalds wrote:
> 
> 
> On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > 
> > > And there's no way to avoid the fact that during the merge window, we will 
> > > get something on the order of ten thousand commits (eg 2.6.24->25-rc1 was 
> > > 9629 commits).
> > 
> > Well, do we _have_ _to_ take that much?  I know we _can_, but is this really
> > necessary?
> 
> Do you want me to stop merging your code?
> 
> Do you think anybody else does?
> 
> Any suggestions on how to convince people that their code is not worth 
> merging?

I think you're approaching a solution, Linus. If developers take a refusal
as a punishment, maybe you can use that against trees which have too many
unresolved regressions. This would be really unfair to subsystem maintainers
who themselves merge a lot of work, but recursively they may apply the
same principle to their own developers, so that everybody knows that it's
not worth working on new code past the point where too many regressions are
reported.

Willy


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:35             ` Ingo Molnar
@ 2008-04-30 22:49               ` Andrew Morton
  2008-04-30 22:51               ` David Miller
  1 sibling, 0 replies; 229+ messages in thread
From: Andrew Morton @ 2008-04-30 22:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: davem, torvalds, rjw, linux-kernel, jirislaby, tglx

On Thu, 1 May 2008 00:35:09 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> git-revert _kind of_ makes that possible too, but people still feel too 
> personal about reverts - they take it as intrusion into their subsystem 
> and regard it as an attack against their competence as a maintainer.

I'd question this.  People often seem pretty happy to yank their stuff out
of there - it relieves ongoing embarrassment and it relieves time pressure
- they can have another go and get it right at their leisure.

Of course, reverting is easy.  The hard part is often finding the thing
which needs to be reverted.
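A throwaway sketch of that hard part, using `git bisect run` to find the culprit and `git revert` to back it out; the disposable repo, the file names, and the stand-in "build test" below are all invented for illustration:

```shell
# Build a disposable repo with five commits; commit 3 introduces the "bug".
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5; do
    if [ "$i" = 3 ]; then
        echo "BUG" > bug.c            # the breakage sneaks in here
        git add bug.c
    else
        echo "change $i" >> file.c
        git add file.c
    fi
    git commit -qm "commit $i"
done

# Stand-in for "make && boot test": exits non-zero while the bug is present.
printf '#!/bin/sh\n! test -f bug.c\n' > check.sh
chmod +x check.sh

# Let bisect do the searching: HEAD is bad, the first commit was good.
git bisect start HEAD HEAD~4 >/dev/null
git bisect run ./check.sh >/dev/null
culprit=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

# Once found, the revert itself really is the easy part.
git revert --no-edit "$culprit" >/dev/null
./check.sh && echo "bug gone"
```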


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:35             ` Ingo Molnar
  2008-04-30 22:49               ` Andrew Morton
@ 2008-04-30 22:51               ` David Miller
  2008-05-01  1:40                 ` Ingo Molnar
  2008-05-01  2:48                 ` Adrian Bunk
  1 sibling, 2 replies; 229+ messages in thread
From: David Miller @ 2008-04-30 22:51 UTC (permalink / raw)
  To: mingo; +Cc: akpm, torvalds, rjw, linux-kernel, jirislaby, tglx

From: Ingo Molnar <mingo@elte.hu>
Date: Thu, 1 May 2008 00:35:09 +0200

> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > What we need is not 'negative reinforcement'. That is just nasty, open 
> > warfare between isolated parties, expressed in a politically correct 
> > way.
> 
> in more detail: any "negative reinforcement" should be on the 
> _technical_ level, i.e. when changes are handled - not at the broad tree 
> level.

Sure, and I'll provide some right here.

Ingo, let me know what I need to do to change your behavior in
situations like the one I'm about to describe, ok?

Today, you merged in this bogus "regression fix".

commit ae3a0064e6d69068b1c9fd075095da062430bda9
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed Apr 30 00:15:31 2008 +0200

    inlining: do not allow gcc below version 4 to optimize inlining
    
    fix the condition to match intention: always use the old inlining
    behavior on all gcc versions below 4.
    
    this should solve the UML build problem.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Did you actually read the UML build failure report?

Adrian Bunk specifically stated that the UML build failure regression
occurs with GCC version 4.3.

Next, did you test this regression fix?

Next, if you could not test this regression fix, did you wait
patiently for the bug reporter to validate your fix?  Adrian
responded that it didn't fix the problem, but that was after
you queued this up to Linus already.

This proves my main beef with you, Ingo.  You're way too trigger happy,
you merge things in too quickly, without checks and without
verifications.

To an arbitrary person reading the commit logs, the above
looks like you fixed something, when you actually didn't fix
anything.

And let's address this specific inlining optimization and all the
fallout it's generating.  You said you merged this thing in because
you didn't want to "wait a year for such a useful feature."  In
hindsight, that's exactly what we should have done, waited until we
could sort out all of these issues.  Yes, even if it would take a
year.

Now we're forced to sort it out somehow, unless you can get beyond
your pride and revert the original change.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:46             ` Willy Tarreau
@ 2008-04-30 22:52               ` Andrew Morton
  2008-04-30 23:21                 ` Willy Tarreau
  2008-04-30 23:20               ` Linus Torvalds
  1 sibling, 1 reply; 229+ messages in thread
From: Andrew Morton @ 2008-04-30 22:52 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: torvalds, rjw, davem, linux-kernel, jirislaby

On Thu, 1 May 2008 00:46:10 +0200
Willy Tarreau <w@1wt.eu> wrote:

> On Wed, Apr 30, 2008 at 03:31:22PM -0700, Linus Torvalds wrote:
> > 
> > 
> > On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > > 
> > > > And there's no way to avoid the fact that during the merge window, we will 
> > > > get something on the order of ten thousand commits (eg 2.6.24->25-rc1 was 
> > > > 9629 commits).
> > > 
> > > Well, do we _have_ _to_ take that much?  I know we _can_, but is this really
> > > necessary?
> > 
> > Do you want me to stop merging your code?
> > 
> > Do you think anybody else does?
> > 
> > Any suggestions on how to convince people that their code is not worth 
> > merging?
> 
> I think you're approaching a solution Linus. If developers take a refusal
> as a punishment, maybe you can use that for trees which have too many
> unresolved regressions. This would be really unfair to subsystem maintainers
> which themselves merge a lot of work, but recursively they may apply the
> same principle to their own developers, so that everybody knows that it's
> not worth working on next code past a point where too many regressions are
> reported.
> 

Well.  If we were good enough at tracking bug reports and regressions we
could look at the status of subsystem X and say "no new features for you".

That would be a drastic step even if we had the information to do it (which
we don't).

It would certainly put the pigeon amongst the cats tho.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:59         ` Andrew Morton
  2008-04-30 21:30           ` Rafael J. Wysocki
@ 2008-04-30 22:53           ` Mariusz Kozlowski
  2008-04-30 23:11             ` Andrew Morton
  2008-05-02 10:20             ` Andi Kleen
  1 sibling, 2 replies; 229+ messages in thread
From: Mariusz Kozlowski @ 2008-04-30 22:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dan Noe, torvalds, rjw, davem, linux-kernel, jirislaby

Hello,

> > Perhaps we should be clear and simple about what potential testers 
> > should be running at any given point in time.  With -mm, linux-next, 
> > linux-2.6, etc, as a newcomer I find it difficult to know where my 
> > testing time and energy is best directed.

Speaking of energy and time of a tester. I'd like to know where these resources
should be directed from the arch point of view. Once I had a plan to buy as
many arches as I could get and run a farm of test boxes 8-) But that's hard
because of various reasons (money, time, room, energy). What arches need more
attention? Which are forgotten? Which are going away? For example, does buying
an alphaserver DS 20 (hey - it's cheap) and running tests on it make sense
these days?

	Mariusz

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:39               ` Rafael J. Wysocki
@ 2008-04-30 22:54                 ` david
  2008-04-30 23:12                 ` Willy Tarreau
  1 sibling, 0 replies; 229+ messages in thread
From: david @ 2008-04-30 22:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: David Miller, mingo, akpm, torvalds, linux-kernel, jirislaby

On Thu, 1 May 2008, Rafael J. Wysocki wrote:

> On Thursday, 1 of May 2008, David Miller wrote:
>> From: Ingo Molnar <mingo@elte.hu>
>> Date: Thu, 1 May 2008 00:19:36 +0200
>>
>>> The same goes in the other direction as well - you were just hit by
>>> scheduler tree related regressions that were only triggered on your
>>> 128-way sparc64, but not on our 64way x86 and smaller boxes.
>>
>> You keep saying this over and over again, but the powerpc folks hit
>> this stuff too.
>
> Well, I think that some changes need some wider testing anyway.
>
> They may be correct from the author's point of view and even from the knowledge
> and point of view of the maintainer who takes them into his tree.  That's
> because no one knows everything and it'll always be like this.

I think this is a very important point to keep in mind.

> Still, with the current process such "suspicious" changes go in as parts of
> large series of commits and need to be "rediscovered" by the affected testers
> with the help of bisection.  Moreover, many changes of this kind may go in from
> many different sources at the same time and that's really problematic.

git makes it easy to have many branches that get merged upstream, would it 
really help much if these changes were initially done as separate branches 
and then merged in?

if so there are two ways to do this

have Ingo (and others) create a small forest of branches that get merged 
into linux-next

have Ingo (and others) create a small forest of branches that get merged 
into one 'please pull' branch that gets merged into linux-next

the second has the advantage that merge conflicts between the different 
branches will be resolved before they go upstream, and there's less work 
to be done upstream (as the upstream doesn't need to keep adding branches 
to pull)

the first may have an advantage in terms of making the different branches 
more visible.
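The second variant above can be sketched with a few git commands; everything here (repo, branch and file names) is invented, and the octopus merge into a single "please pull" branch is just one way to fold the forest together:

```shell
# Disposable repo standing in for a subsystem maintainer's tree.
set -e
tree=$(mktemp -d)
cd "$tree"
git -c init.defaultBranch=main init -q
git config user.email demo@example.com
git config user.name demo
echo base > README
git add README
git commit -qm "base"
base=$(git symbolic-ref --short HEAD)   # works with any default branch name

# A small forest of independent topic branches, each off the same base.
git checkout -q -b topic/sched "$base"
echo "sched work" > sched.c; git add sched.c; git commit -qm "sched: feature"
git checkout -q -b topic/net "$base"
echo "net work" > net.c; git add net.c; git commit -qm "net: feature"

# Fold the whole forest into one "please pull" branch; merge conflicts
# between the topics surface here, before anything goes upstream.
git checkout -q -b for-next "$base"
git merge -q --no-edit topic/sched topic/net    # octopus merge of both topics

# Upstream (linux-next in this thread) then only has one branch to pull.
git checkout -q "$base"
git merge -q --no-edit for-next
```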

> In fact, so many changes go in at a time during a merge window that we often
> can't really say which of them causes the breakage observed by testers, and
> bisection, which IMO should really be a last-resort tool, is used as the main
> debugging technique.

there are always going to be cases where the problem can only be found by 
bisecting it, but I agree that there seems to be a little too much 
reliance on bisecting (but that was a heated topic a few weeks ago, let's 
not re-hash it now)

David Lang

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:31           ` Linus Torvalds
  2008-04-30 22:41             ` Andrew Morton
  2008-04-30 22:46             ` Willy Tarreau
@ 2008-04-30 23:03             ` Rafael J. Wysocki
  2 siblings, 0 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 23:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, linux-kernel, Andrew Morton, Jiri Slaby, Greg KH

On Thursday, 1 of May 2008, Linus Torvalds wrote:
> 
> On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > 
> > > And there's no way to avoid the fact that during the merge window, we will 
> > > get something on the order of ten thousand commits (eg 2.6.24->25-rc1 was 
> > > 9629 commits).
> > 
> > Well, do we _have_ _to_ take that much?  I know we _can_, but is this really
> > necessary?
> 
> Do you want me to stop merging your code?

Well, no, but actually there are only a few of my patches in this merge
window. :-)

Moreover, if the maintainers who took them told me they would be scheduled for
the next merge window, I wouldn't mind.  That actually happened to some of my
patches that are in Greg's tree at the moment and that's fine (although I
consider the patches as important).

IMO, this is a question of balance.  Of course, a maintainer can take
everything from everyone, but at the same time he can have a look at the
patches and say "Well, I have lots of stuff scheduled for this merge window
already, this stuff of yours will wait for the next merge window.  Please
improve the code or review the others' patches in the meantime".  The only
thing is to give everyone a fair treatment, which may be a challenge.

> Do you think anybody else does?

I think the majority of developers would understand if you told them you could
only merge a limited amount of changes in a single merge window, provided that
they would be treated fairly.

When you take everything from everyone, you actually reward people who are
able to develop more code between merge windows.  Not necessarily those who
spend time on different important activities, such as reviewing the others'
code, bug tracking etc.

> Any suggestions on how to convince people that their code is not worth 
> merging?

That shouldn't be necessary. :-)

The point is to tell people to develop the code less rapidly, so to speak.
Or maybe more carefully.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:10           ` Andrew Morton
  2008-04-30 22:19             ` Linus Torvalds
@ 2008-04-30 23:04             ` Dmitri Vorobiev
  2008-05-01 15:19               ` Jim Schutt
  2008-05-01  6:15             ` Jan Engelhardt
  2 siblings, 1 reply; 229+ messages in thread
From: Dmitri Vorobiev @ 2008-04-30 23:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, rjw, davem, linux-kernel, jirislaby, mingo

Andrew Morton wrote:

[skipped]

>> Finally, while the list is at it, I'd like to make another technical comment.
>> My development zoo is a pretty fast 4-way Xeon server, where I keep a handful
>> of trees, a few cross-toolchains, Qemu, etc. The network setup in our
>> organization is such that I can use git only over http from that server.
> 
> Don't know what to do about that, sorry.  An off-site git->http proxy might
> work, but I doubt if anyone has written the code.

But there is another solution, which I believe is straightforward: have the tree
maintainer set up his tree properly.

Dmitri

> 
> 


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:53           ` Mariusz Kozlowski
@ 2008-04-30 23:11             ` Andrew Morton
  2008-05-12  9:27               ` Ben Dooks
  2008-05-02 10:20             ` Andi Kleen
  1 sibling, 1 reply; 229+ messages in thread
From: Andrew Morton @ 2008-04-30 23:11 UTC (permalink / raw)
  To: Mariusz Kozlowski; +Cc: dpn, torvalds, rjw, davem, linux-kernel, jirislaby

On Thu, 1 May 2008 00:53:31 +0200
Mariusz Kozlowski <m.kozlowski@tuxland.pl> wrote:

> Hello,
> 
> > > Perhaps we should be clear and simple about what potential testers 
> > > should be running at any given point in time.  With -mm, linux-next, 
> > > linux-2.6, etc, as a newcomer I find it difficult to know where my 
> > > testing time and energy is best directed.
> 
> Speaking of energy and time of a tester. I'd like to know where these resources
> should be directed from the arch point of view. Once I had a plan to buy as
> many arches as I could get and run a farm of test boxes 8-) But that's hard
> because of various reasons (money, time, room, energy). What arches need more
> attention? Which are forgotten? Which are going away? For example does buying
> an alphaserver DS 20 (hey - it's cheap) and running tests on it makes sense
> these days?
> 

gee.

I think to a large extent this problem solves itself - the "more important"
architectures have more people using them, so they get more testing and
more immediate testing.

However there are gaps.  I'd say that arm is one of the more important
architectures, but many people who are interested in arm tend to shy away
from bleeding-edge kernels for various reasons.  Mainly because they have
real products to get out the door, rather than dinking around with mainline
kernel development.  So testing bleeding-edge on some arm systems would be
good, I expect.

otoh, the platform we break most often is surely plain-old-PCs.  If it's
bugs you're looking for, I expect that dumpster-diving for as many
different PCs as you can and trying to get them to boot (let alone suspend
and resume!) would keep you entertained ;)


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:39               ` Rafael J. Wysocki
  2008-04-30 22:54                 ` david
@ 2008-04-30 23:12                 ` Willy Tarreau
  2008-04-30 23:59                   ` Rafael J. Wysocki
  2008-05-01  0:15                   ` Chris Shoemaker
  1 sibling, 2 replies; 229+ messages in thread
From: Willy Tarreau @ 2008-04-30 23:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: David Miller, mingo, akpm, torvalds, linux-kernel, jirislaby

On Thu, May 01, 2008 at 12:39:01AM +0200, Rafael J. Wysocki wrote:
> On Thursday, 1 of May 2008, David Miller wrote:
> > From: Ingo Molnar <mingo@elte.hu>
> > Date: Thu, 1 May 2008 00:19:36 +0200
> > 
> > > The same goes in the other direction as well - you were just hit by 
> > > scheduler tree related regressions that were only triggered on your 
> > > 128-way sparc64, but not on our 64way x86 and smaller boxes.
> > 
> > You keep saying this over and over again, but the powerpc folks hit
> > this stuff too.
> 
> Well, I think that some changes need some wider testing anyway.
> 
> They may be correct from the author's point of view and even from the knowledge
> and point of view of the maintainer who takes them into his tree.  That's
> because no one knows everything and it'll always be like this.
> 
> Still, with the current process such "suspicious" changes go in as parts of
> large series of commits and need to be "rediscovered" by the affected testers
> with the help of bisection.  Moreover, many changes of this kind may go in from
> many different sources at the same time and that's really problematic.

That's very true IMHO, and it's something which has progressively
appeared since we started merging large amounts of code at once. In the
"good old days", when something did not work, the first one to discover
it could quickly report it on LKML: "hey, my 128-way sparc64 does not boot
anymore, anybody has any clue?", and another one would immediately find
this mail (better signal/noise ratio on LKML at that time) and say
"oops, I suspect that change, try to revert it".

Now, it's close to impossible. Maintainers frequently ask for bisection,
in part because nobody knows what code is merged, and they have to pull
Linus' tree to know when their changes have been pulled. That may be
part of the "fun" aspect that Davem is seeing going away in exchange
for more administrative relations. But if we agree that nobody knows
all the changes, we must agree that we need tools to track them, and
tools are fundamentally incompatible with smart human relations.

> In fact, so many changes go in at a time during a merge window that we often
> can't really say which of them causes the breakage observed by testers, and
> bisection, which IMO should really be a last-resort tool, is used as the main
> debugging technique.

Maybe we could slightly improve the process by releasing more often, but
based on topics. Small sets of minimally-overlapping topics would get
merged in each release, and other topics would only be allowed to pull
fixes. That way everybody still gets some work merged, everybody tests
and problems are more easily spotted.

I know this is in part what Andrew tries to do when proposing to
integrate trees, but maybe some approximate rules should be proposed
in order for developers to organize their works. This would begin
with announcing topics to be considered for next branch very early.
This would also make it more natural for developers to have creation
and bug-tracking phases.

Willy


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:46             ` Willy Tarreau
  2008-04-30 22:52               ` Andrew Morton
@ 2008-04-30 23:20               ` Linus Torvalds
  2008-05-01  0:42                 ` Rafael J. Wysocki
  2008-05-01  1:30                 ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-04-30 23:20 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Rafael J. Wysocki, David Miller, linux-kernel, Andrew Morton, Jiri Slaby



On Thu, 1 May 2008, Willy Tarreau wrote:
> > 
> > Any suggestions on how to convince people that their code is not worth 
> > merging?
> 
> I think you're approaching a solution Linus. If developers take a refusal
> as a punishment, maybe you can use that for trees which have too many
> unresolved regressions.

Heh. It's been done. In fact, it's done all the time on a smaller scale. 
It's how I've enforced some cleanliness or process issues ("I won't pull 
that because it's too ugly"). I see similar messages floating around about 
individual patches.

That said, I don't think it really works that well as "the solution": it 
works as a small part of the bigger picture, but no, we can't see 
punishment as the primary model for encouraging better behaviour. 

First off, and maybe this is not true, but I don't think it is a very 
healthy way to handle issues in general. I may come off as an opinionated 
bastard in discussions like these, and I am, but when it actually comes to 
maintaining code, I really prefer a much softer approach.

I want to _trust_ people, and I really don't want to be a "you need to do 
'xyz' or else" kind of guy. 

So I'll happily say "I can't merge this, because xyz", where 'xyz' is 
something that is related to the particular code that is actually merged. 
But quite frankly, holding up _unrelated_ fixes, because some other issue 
hasn't been resolved, I really try to not do that. 

So I'll say "I don't want to merge this, because quite frankly, we've had 
enough code for this merge window already, it can wait". That tends to 
happen at the end of the merge window, but it's not a threat, it's just me 
being tired of the worries of inevitable new issues at the end of the 
window.

And I personally feel that this is important to keep people motivated. 
Being too stick-oriented isn't healthy.

The other reason I don't believe in the "won't merge until you do 'xyz'" 
kind of thing as a main development model is that it traditionally hasn't 
worked.  People simply disagree, the vendors will take the code that their 
customers need, the users will get the tree that works for them, and 
saying "I won't merge it" won't help anybody if it's actually useful.

Finally, the people I work with may not be perfect, but most maintainers 
are pretty much experts within their own area. At some point you have to 
ask yourself: "Could I do better? Would I have the time? Could I find 
somebody else to do better?" and not just in a theoretical way. And if the 
answer is "no", then at that point, what else can you do? 

Yes, we have personalities that clash, and merge problems. And let's face 
it, as kernel developers, we aren't exactly a very "cuddly" group of 
people. People are opinionated and not afraid to speak their mind. But on 
the whole, I think the kernel development community is actually driven a 
lot more by _positive_ things than by the stick of "I won't get merged 
unless I shape up".

So quite frankly, I'd personally much rather have a process that 
encourages people to have so much _pride_ in what they do that they want 
it to be seen as being really good (and hopefully then that pride means 
that they won't take crap!) than having a chain of fear that trickles 
down.

So this is why, for example, I have so strongly encouraged git maintainers 
to think of their public trees as "releases". Because I think people act 
differently when they *think* of their code as a release than when they 
think of it as a random development tree.

I do _not_ want to slow down development by setting some kind of "quality 
bar" - but I do believe that we should keep our quality high, not because 
of any hoops we need to jump through, but because we take pride in the 
thing we do.

[ An example of this: I don't believe code review tends to much help in 
  itself, but I *do* believe that the process of doing code review makes 
  people more aware of the fact that others are looking at the code they 
  produce, and that in turn makes the code often better to start with.

  And I think publicly announced git trees and -mm and linux-next are 
  great partly because they end up doing that same thing. I heartily 
  encourage submaintainers to always Cc: linux-kernel when they send me a 
  "please pull" request - I don't know if anybody else ever really pulls 
  that tree, but I do think that it's very healthy to write that message 
  and think of it as a publication event. ]

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:52               ` Andrew Morton
@ 2008-04-30 23:21                 ` Willy Tarreau
  2008-04-30 23:38                   ` Chris Shoemaker
  0 siblings, 1 reply; 229+ messages in thread
From: Willy Tarreau @ 2008-04-30 23:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, rjw, davem, linux-kernel, jirislaby

On Wed, Apr 30, 2008 at 03:52:52PM -0700, Andrew Morton wrote:
> On Thu, 1 May 2008 00:46:10 +0200
> Willy Tarreau <w@1wt.eu> wrote:
> 
> > On Wed, Apr 30, 2008 at 03:31:22PM -0700, Linus Torvalds wrote:
> > > 
> > > 
> > > On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > > > 
> > > > > And there's no way to avoid the fact that during the merge window, we will 
> > > > > get something on the order of ten thousand commits (eg 2.6.24->25-rc1 was 
> > > > > 9629 commits).
> > > > 
> > > > Well, do we _have_ _to_ take that much?  I know we _can_, but is this really
> > > > necessary?
> > > 
> > > Do you want me to stop merging your code?
> > > 
> > > Do you think anybody else does?
> > > 
> > > Any suggestions on how to convince people that their code is not worth 
> > > merging?
> > 
> > I think you're approaching a solution Linus. If developers take a refusal
> > as a punishment, maybe you can use that for trees which have too many
> > unresolved regressions. This would be really unfair to subsystem maintainers
> > which themselves merge a lot of work, but recursively they may apply the
> > same principle to their own developers, so that everybody knows that it's
> > not worth working on next code past a point where too many regressions are
> > reported.
> > 
> 
> Well.  If we were good enough at tracking bug reports and regressions we
> could look at the status of subsystem X and say "no new features for you".
> 
> That would be a drastic step even if we had the information to do it (which
> we don't).

We already have some information, Rafael is tracking this info. But we need
other developers to look at others' bugs. If we considered that for each
release, the *worst* subsystem does not get any new features merged, maybe
the ones who really want to get theirs merged will quickly take a look at
their not-so-friendly coworkers' work to try to get their score up and
avoid getting spotted.

After all, that's what we want to achieve: better cross-testing. For
2.6.27, we would probably have Davem happy to report one hundred
bugs brought by Ingo and ban him from the next merge. But if that's the
only way to find 100 bugs in one release cycle, hey, that's quite
efficient! And in turn, Ingo would have more time to fix (or deny)
bugs assigned to him, then take a look at his accuser's code for next
release.

Not very moral, but the kernel team has evolved from a small team of
buddies to a large enterprise. And to survive this evolution, we may
need to apply the immoral principles found in big companies.

Willy


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:41             ` Andrew Morton
@ 2008-04-30 23:23               ` Rafael J. Wysocki
  2008-04-30 23:41                 ` david
  2008-05-01  0:57               ` Adrian Bunk
  2008-05-01 12:31               ` Tarkan Erimer
  2 siblings, 1 reply; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 23:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, davem, linux-kernel, jirislaby

On Thursday, 1 of May 2008, Andrew Morton wrote:
> On Wed, 30 Apr 2008 15:31:22 -0700 (PDT)
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > Any suggestions on how to convince people that their code is not worth 
> > merging?
> 
> Raise the quality.  Then the volume will automatically decrease.
> 
> Which leads us to...  the volume isn't a problem per se.  The problem is
> quality.  It's the fact that they vary inversely which makes us say "slow
> down".
> 
> So David's Subject: should have been "Do Better, please".  Slowing down is
> just a side-effect.  And, we expect, a tool.
> 
> 
> We should be discussing how to raise the quality of our work.

I violently agree.

One of the (obvious?) ways in which we can raise the quality of the code
overall is to spend more time on reviewing the others' code and discussing that
code.  It follows from my experience that the quality of patches improves
dramatically if they are discussed while being developed.  Of course, that
requires time, but it's time well spent.

For this reason, there should be a mechanism in place that will encourage
people to review the existing code, even the code that hasn't changed for
a long time, and to review and discuss patches submitted by the other people
instead of producing new code.

Also, the patches that were thoroughly discussed during their development
should be regarded as more trustworthy than the ones that were not discussed
at all.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:05   ` Linus Torvalds
  2008-04-30 20:14     ` Linus Torvalds
  2008-04-30 20:45     ` Rafael J. Wysocki
@ 2008-04-30 23:29     ` Paul Mackerras
  2008-05-01  1:57       ` Jeff Garzik
  2008-05-01  3:47       ` Linus Torvalds
  2 siblings, 2 replies; 229+ messages in thread
From: Paul Mackerras @ 2008-04-30 23:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

Linus Torvalds writes:

> So one of the major things about the short merge window is that it's 
> hopefully encouraging people to have things ready by the time the merge 
> window opens, because it's too late to do anything later.

Having things ready by the time the merge window opens is difficult
when you don't know when the merge window is going to open.  OK, after
you release a -rc6 or -rc7, we know it's close, but it could still be
three weeks off at that point.  Or it could be tomorrow.

That's mitigated at the moment by having the merge window be two weeks
long.  So if you open the merge window at a point where I, or someone
downstream of me, thought we still had two weeks to go, we can hurry
up and try to get stuff finished within the first week and still get
it merged.

But if you made a really hard and fast rule that only stuff that is in
linux-next at the point where the merge window opens can be merged,
AND the point at which the merge window opens is unknown and
unpredictable within a period of about 4 weeks, then that makes it
really tough for those of us downstream of you to plan our work.

By the way, if you do want to make that rule, then there's a really
easy way to do it - just pull linux-next, and make that one pull be
the entire merge window. :)  But please give us at least a week's
notice that you're going to do that.

Paul.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:14     ` Linus Torvalds
  2008-04-30 20:56       ` Rafael J. Wysocki
@ 2008-04-30 23:34       ` Greg KH
  1 sibling, 0 replies; 229+ messages in thread
From: Greg KH @ 2008-04-30 23:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Wed, Apr 30, 2008 at 01:14:39PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 30 Apr 2008, Linus Torvalds wrote:
> > 
> > In fact, I'd personally like to make it even shorter
> 
> Just to clarify: I'd actually like to make the merge window be just a 
> week. If even that.

I'd go for that.  The only one with a possible problem might be Andrew
due to his need to rebase his 1000+ individual patches before he sends
them to you :)

Everyone else should have things queued up and ready to go for you as
it's not like we don't have some warning that the window is about to
open up...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:21                 ` Willy Tarreau
@ 2008-04-30 23:38                   ` Chris Shoemaker
  0 siblings, 0 replies; 229+ messages in thread
From: Chris Shoemaker @ 2008-04-30 23:38 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andrew Morton, torvalds, rjw, davem, linux-kernel, jirislaby


On Thu, May 01, 2008 at 01:21:43AM +0200, Willy Tarreau wrote:
> Not very moral, but the kernel team has evolved from a small team of
> buddies to a large enterprise. And to survive this evolution, we may
> need to apply the immoral principles found in big companies.

On the contrary, I call this "keeping everybody else honest".

-chris

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:23               ` Rafael J. Wysocki
@ 2008-04-30 23:41                 ` david
  2008-04-30 23:51                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 229+ messages in thread
From: david @ 2008-04-30 23:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, Linus Torvalds, davem, linux-kernel, jirislaby

On Thu, 1 May 2008, Rafael J. Wysocki wrote:

> On Thursday, 1 of May 2008, Andrew Morton wrote:
>> On Wed, 30 Apr 2008 15:31:22 -0700 (PDT)
>> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Also, the patches that were thoroughly discussed during their development
> should be regarded as more trustworthy than the ones that were not discussed
> at all.

but you don't have any way of knowing how much discussion took place on 
any particular patch. that discussion could have taken place in many 
different places, and you don't have the ability to monitor them all.

David Lang

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:40           ` david
@ 2008-04-30 23:45             ` Rafael J. Wysocki
  2008-04-30 23:57               ` david
  2008-05-01  0:38               ` Adrian Bunk
  0 siblings, 2 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 23:45 UTC (permalink / raw)
  To: david
  Cc: Linus Torvalds, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Thursday, 1 of May 2008, david@lang.hm wrote:
> On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> 
> > On Wednesday, 30 of April 2008, Linus Torvalds wrote:
> >>
> >> On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
> >> So your "fewer commits over a unit of time" doesn't make sense.
> >
> > Oh, yes it does.  Equally well you could say that having brakes in a car
> > didn't make sense, even if you could drive it as fast as the engine allowed
> > you to. ;-)
> >
> >> We have those ten thousand commits. They need to go in. They cannot take
> >> forever.
> >
> > But perhaps some of them can wait a bit longer.
> 
> not really, if patches are produced at a rate of 1000/week and you decide 
> to only accept 2000 of them this month, a month later you have 6000 
> patches to deal with.

Well, I think you know how TCP works.  The sender can only send as much
data as the receiver lets it, no matter how much data there are to send.
I'm thinking about an analogous approach.

If the developers who produce those patches know in advance about the rate
limit and are promised to be treated fairly, they should be able to organize
their work in a different way.
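[Not part of the original mail; a toy illustration of the TCP analogy above. Rafael's point is credit-based flow control: the receiver advertises a window, and the sender may have at most that many items in flight until earlier ones are acknowledged. A minimal sketch, with entirely hypothetical names (this is an analogy, not proposed tooling):

```python
class MergeWindow:
    """Credit-based limiter, loosely analogous to a TCP receive window:
    the 'receiver' (integrator) grants a fixed number of slots, and a
    'sender' (submitter) may only have that many patches in flight
    until earlier ones are acked (merged or rejected)."""

    def __init__(self, window):
        self.window = window      # credits advertised by the integrator
        self.in_flight = []       # patches submitted but not yet acked

    def try_send(self, patch):
        # As in TCP, sending stalls once the advertised window is full.
        if len(self.in_flight) >= self.window:
            return False
        self.in_flight.append(patch)
        return True

    def ack(self, patch):
        # Handling a patch frees a credit for the next submission.
        self.in_flight.remove(patch)

w = MergeWindow(window=2)
assert w.try_send("patch-1")
assert w.try_send("patch-2")
assert not w.try_send("patch-3")   # window full: the sender must wait
w.ack("patch-1")
assert w.try_send("patch-3")       # credit freed, sending resumes
```

The known rate limit is what lets the sender plan: it can keep developing, but it queues locally instead of flooding the receiver.]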

> history has shown that developers do not stop developing if their patches are
> not accepted, they just fork and go their own way.

That's mostly when they feel that they are treated unfairly.

OTOH, insisting that your patches should be merged at the same rate that you're
able to develop them is unreasonable to me.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:41                 ` david
@ 2008-04-30 23:51                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 23:51 UTC (permalink / raw)
  To: david; +Cc: Andrew Morton, Linus Torvalds, davem, linux-kernel, jirislaby

On Thursday, 1 of May 2008, david@lang.hm wrote:
> On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> 
> > On Thursday, 1 of May 2008, Andrew Morton wrote:
> >> On Wed, 30 Apr 2008 15:31:22 -0700 (PDT)
> >> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> > Also, the patches that were thoroughly discussed during their development
> > should be regarded as more trustworthy than the ones that were not discussed
> > at all.
> 
> but you don't have any way of knowing how much discussion took place on 
> any particular patch. that discussion could have taken place in many 
> different places, and you don't have the ability to monitor them all.

Not at the moment, but there may be a way to do that if we think about it
more thoroughly.

One idea may be to add a "Commented-by:" tag in which to place people who
provided valuable comments to the patch author and/or maintainer (as a comma
separated list, for example, in analogy with the email Cc lists), especially if
the patch has been changed as a result of the comments.
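[Not part of the original mail; a sketch of how such a tag could work. "Commented-by:" is only Rafael's proposal here, not an existing kernel convention, but it could be parsed out of a commit message the same way Signed-off-by:/Acked-by: trailers are:

```python
def parse_commented_by(message):
    """Collect names from hypothetical 'Commented-by:' trailer lines,
    accepting a comma-separated list on each line (as proposed)."""
    people = []
    for line in message.splitlines():
        line = line.strip()
        if line.lower().startswith("commented-by:"):
            value = line.split(":", 1)[1]
            people.extend(p.strip() for p in value.split(",") if p.strip())
    return people

msg = """fix foo oops

Signed-off-by: A Developer <a@example.org>
Commented-by: R. Reviewer <r@example.org>, O. Other <o@example.org>
"""
print(parse_commented_by(msg))
# → ['R. Reviewer <r@example.org>', 'O. Other <o@example.org>']
```

A tool walking the history could then count how many distinct commenters each merged patch had, giving a rough "was this discussed?" signal.]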

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:45             ` Rafael J. Wysocki
@ 2008-04-30 23:57               ` david
  2008-05-01  0:01                 ` Chris Shoemaker
  2008-05-01  0:38               ` Adrian Bunk
  1 sibling, 1 reply; 229+ messages in thread
From: david @ 2008-04-30 23:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Thu, 1 May 2008, Rafael J. Wysocki wrote:

> On Thursday, 1 of May 2008, david@lang.hm wrote:
>> On Thu, 1 May 2008, Rafael J. Wysocki wrote:
>>
>>> On Wednesday, 30 of April 2008, Linus Torvalds wrote:
>>>>
>>>> On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
>>>> So your "fewer commits over a unit of time" doesn't make sense.
>>>
>>> Oh, yes it does.  Equally well you could say that having brakes in a car
>>> didn't make sense, even if you could drive it as fast as the engine allowed
>>> you to. ;-)
>>>
>>>> We have those ten thousand commits. They need to go in. They cannot take
>>>> forever.
>>>
>>> But perhaps some of them can wait a bit longer.
>>
>> not really, if patches are produced at a rate of 1000/week and you decide
>> to only accept 2000 of them this month, a month later you have 6000
>> patches to deal with.
>
> Well, I think you know how TCP works.  The sender can only send as much
> data as the receiver lets it, no matter how much data there are to send.
> I'm thinking about an analogous approach.
>
> If the developers who produce those patches know in advance about the rate
> limit and are promised to be treated fairly, they should be able to organize
> their work in a different way.

they will make the patches bigger to get the changes in with a smaller number 
of patches. Arbitrary limits invite gaming the system :-)

>> history has shown that developers do not stop developing if their patches are
>> not accepted, they just fork and go their own way.
>
> That's mostly when they feel that they are treated unfairly.
>
> OTOH, insisting that your patches should be merged at the same rate that you're
> able to develop them is unreasonable to me.

it's not necessarily the individuals that fork, it's the distros who want 
to include the fixes and other changes from those individuals that create 
the fork.

David Lang

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:12                 ` Willy Tarreau
@ 2008-04-30 23:59                   ` Rafael J. Wysocki
  2008-05-01  0:15                   ` Chris Shoemaker
  1 sibling, 0 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-04-30 23:59 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: David Miller, mingo, akpm, torvalds, linux-kernel, jirislaby

On Thursday, 1 of May 2008, Willy Tarreau wrote:
> On Thu, May 01, 2008 at 12:39:01AM +0200, Rafael J. Wysocki wrote:
> > On Thursday, 1 of May 2008, David Miller wrote:
> > > From: Ingo Molnar <mingo@elte.hu>
> > > Date: Thu, 1 May 2008 00:19:36 +0200
> > > 
> > > > The same goes in the other direction as well - you were just hit by 
> > > > scheduler tree related regressions that were only triggered on your 
> > > > 128-way sparc64, but not on our 64way x86 and smaller boxes.
> > > 
> > > You keep saying this over and over again, but the powerpc folks hit
> > > this stuff too.
> > 
> > Well, I think that some changes need some wider testing anyway.
> > 
> > They may be correct from the author's point of view and even from the knowledge
> > and point of view of the maintainer who takes them into his tree.  That's
> > because no one knows everything and it'll always be like this.
> > 
> > Still, with the current process such "suspicious" changes go in as parts of
> > large series of commits and need to be "rediscovered" by the affected testers
> > with the help of bisection.  Moreover, many changes of this kind may go in from
> > many different sources at the same time and that's really problematic.
> 
> That's very true IMHO and is the thing which has been progressively
> appearing since we merge large amounts of code at once. In the "good
> old days", something did not work, the first one to discover it could
> quickly report it on LKML : "hey, my 128-way sparc64 does not boot
> anymore, anybody has any clue", and another one immediately found
> this mail (better signal/noise ratio on LKML at this time) and say
> "oops, I suspect that change, try to revert it".
> 
> Now, it's close to impossible. Maintainers frequently ask for bisection,
> in part because nobody knows what code is merged, and they have to pull
> Linus' tree to know when their changes have been pulled. That may be
> part of the "fun" aspect that Davem is seeing going away in exchange
> for more administrative relations. But if we agree that nobody knows
> all the changes, we must agree that we need tools to track them, and
> tools are fundamentally incompatible with smart human relations.
> 
> > In fact, so many changes go in at a time during a merge window, that we often
> > can't really say which of them causes the breakage observed by testers, and
> > bisection, which IMO should really be a last-resort tool, is used as the main
> > debugging technique.
> 
> Maybe we could slightly improve the process by releasing more often, but
> based on topics. Small sets of minimally-overlapping topics would get
> merged in each release, and other topics would only be allowed to pull
> fixes. That way everybody still gets some work merged, everybody tests
> and problems are more easily spotted.

I like this idea.

> I know this is in part what Andrew tries to do when proposing to
> integrate trees, but maybe some approximate rules should be proposed
> in order for developers to organize their works. This would begin
> with announcing topics to be considered for next branch very early.
> This would also make it more natural for developers to have creation
> and bug-tracking phases.

Yes, that's reasonable.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:57               ` david
@ 2008-05-01  0:01                 ` Chris Shoemaker
  2008-05-01  0:14                   ` david
  0 siblings, 1 reply; 229+ messages in thread
From: Chris Shoemaker @ 2008-05-01  0:01 UTC (permalink / raw)
  To: david
  Cc: Rafael J. Wysocki, Linus Torvalds, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

On Wed, Apr 30, 2008 at 04:57:38PM -0700, david@lang.hm wrote:
>>> history has shown that developers do not stop developing if their patches are
>>> not accepted, they just fork and go their own way.
>>
>> That's mostly when they feel that they are treated unfairly.
>>
>> OTOH, insisting that your patches should be merged at the same rate that you're
>> able to develop them is unreasonable to me.
>
> it's not necessarily the individuals that fork, it's the distros who want 
> to include the fixes and other changes from those individuals that create 
> the fork.

Is that really bad?  Isn't that effectively equivalent to "increased testing of
earlier integrations"?

-chris

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  0:01                 ` Chris Shoemaker
@ 2008-05-01  0:14                   ` david
  2008-05-01  0:38                     ` Linus Torvalds
  0 siblings, 1 reply; 229+ messages in thread
From: david @ 2008-05-01  0:14 UTC (permalink / raw)
  To: Chris Shoemaker
  Cc: Rafael J. Wysocki, Linus Torvalds, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

On Wed, 30 Apr 2008, Chris Shoemaker wrote:

> On Wed, Apr 30, 2008 at 04:57:38PM -0700, david@lang.hm wrote:
>>>> history has shown that developers do not stop developing if their patches are
>>>> not accepted, they just fork and go their own way.
>>>
>>> That's mostly when they feel that they are treated unfairly.
>>>
>>> OTOH, insisting that your patches should be merged at the same rate that you're
>>> able to develop them is unreasonable to me.
>>
>> it's not necessarily the individuals that fork, it's the distros who want
>> to include the fixes and other changes from those individuals that create
>> the fork.
>
> Is that really bad?  Isn't that effectively equivalent to "increased testing of
> earlier integrations"?

not if there are so many changes that the testing isn't really relevant to 
mainline.

not if the changes don't get into mainline.

look at the mess of the distro kernels in the 2.5 and earlier days. having 
them maintain a large body of patches didn't work for them or for the 
mainline kernel.

David Lang

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:12                 ` Willy Tarreau
  2008-04-30 23:59                   ` Rafael J. Wysocki
@ 2008-05-01  0:15                   ` Chris Shoemaker
  2008-05-01  5:09                     ` Willy Tarreau
  1 sibling, 1 reply; 229+ messages in thread
From: Chris Shoemaker @ 2008-05-01  0:15 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Rafael J. Wysocki, David Miller, mingo, akpm, torvalds,
	linux-kernel, jirislaby

On Thu, May 01, 2008 at 01:12:21AM +0200, Willy Tarreau wrote:
> On Thu, May 01, 2008 at 12:39:01AM +0200, Rafael J. Wysocki wrote:
> > In fact, so many changes go in at a time during a merge window, that we often
> > can't really say which of them causes the breakage observed by testers, and
> > bisection, which IMO should really be a last-resort tool, is used as the main
> > debugging technique.
> 
> Maybe we could slightly improve the process by releasing more often, but
> based on topics. Small sets of minimally-overlapping topics would get
> merged in each release, and other topics would only be allowed to pull
> fixes. That way everybody still gets some work merged, everybody tests
> and problems are more easily spotted.
> 
> I know this is in part what Andrew tries to do when proposing to
> integrate trees, but maybe some approximate rules should be proposed
> in order for developers to organize their works. This would begin
> with announcing topics to be considered for next branch very early.
> This would also make it more natural for developers to have creation
> and bug-tracking phases.

What would this look like, notionally?  Say the releases were twice as
frequent with Stage A and Stage B.  How could the topic be grouped
into the stages?  Could bugfixes of any type be merged in either
window?  Would this only apply to "new" features, API changes, etc? or
would maintenance-type changes have to be assigned to a stage, too?

-chris

^ permalink raw reply	[flat|nested] 229+ messages in thread

* RFC: starting a kernel-testers group for newbies
  2008-04-30 20:31     ` Linus Torvalds
                         ` (2 preceding siblings ...)
  2008-04-30 21:52       ` H. Peter Anvin
@ 2008-05-01  0:31       ` Adrian Bunk
  2008-04-30  7:03         ` Arjan van de Ven
  2008-05-01  0:41         ` David Miller
  3 siblings, 2 replies; 229+ messages in thread
From: Adrian Bunk @ 2008-05-01  0:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt

On Wed, Apr 30, 2008 at 01:31:08PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 30 Apr 2008, Andrew Morton wrote:
> > 
> > <jumps up and down>
> > 
> > There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!
> 
> The problem I see with both -mm and linux-next is that they tend to be 
> better at finding the "physical conflict" kind of issues (ie the merge 
> itself fails) than the "code looks ok but doesn't actually work" kind of 
> issue.
> 
> Why?
> 
> The tester base is simply too small.
> 
> Now, if *that* could be improved, that would be wonderful, but I'm not 
> seeing it as very likely.
> 
> I think we have fairly good penetration these days with the regular -git 
> tree, but I think that one is quite frankly a *lot* less scary than -mm or 
> -next are, and there it has been an absolutely huge boon to get the kernel 
> into the Fedora test-builds etc (and I _think_ Ubuntu and SuSE also 
> started something like that).
> 
> So I'm very pessimistic about getting a lot of test coverage before -rc1.
> 
> Maybe too pessimistic, who knows?

First of all:
I 100% agree with Andrew that our biggest problems are in reviewing code 
and resolving bugs, not in finding bugs (we already have far too many 
unresolved bugs).

But although testing mustn't replace code reviews it is a great help, 
especially for identifying regressions early.

Finding testers should actually be relatively easy since it doesn't 
require much knowledge from the testers.

And it could even solve a second problem:

It could be a way for getting newbies into kernel development.

We actually only rarely have tasks suitable as janitor tasks for 
newbies, and the results of people who neither know the kernel
nor C running checkpatch on files in the kernel have already
been discussed extensively...

I'll try to do this:
- create some Wiki page
- get a mailing list at vger
- point newbies to this mailing list
- tell people there which kernels to test
- figure out and document stuff like how to bisect between -next kernels
- help them to do whatever is required for a proper bug report

> 		Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  0:14                   ` david
@ 2008-05-01  0:38                     ` Linus Torvalds
  2008-05-01  1:39                       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01  0:38 UTC (permalink / raw)
  To: david
  Cc: Chris Shoemaker, Rafael J. Wysocki, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby



On Wed, 30 Apr 2008, david@lang.hm wrote:
> 
> look at the mess of the distro kernels in the 2.5 and earlier days. having
> them maintain a large body of patches didn't work for them or for the mainline
> kernel.

Exactly. 

I do think Rafael's TCP analogy is somewhat germane, but it misses the 
point that the longer the queue gets, the *worse* the quality gets. It 
gets worse because the queued-up patches don't actually get tested any 
more during their queueing, and because everybody else who isn't 
intimately involved with production of said patches just gets *less* 
inclined to look at a big patch-queue than a small one.

So having a long queue and trying to manage it (by some kind of negative 
feedback) is counter-productive, because by the time that situation 
happens, you're basically screwed already.

That's what we largely had with the Xen merge, for example. A lot of the 
code had been around for basically _forever_, and the people involved in 
reviewing it got really tired of it, and there was no way in *hell* a new 
person would ever start reviewing the huge backlog. Once it is massive, 
it's just too massive.

So trying to push back from the destination is really painful. It's also 
aggravating for everybody else. When people were complaining about me not 
scaling (remember those flame-wars? Now the complaint is basically the 
reverse), it was very painful for everybody, and most of all me. 

So I really really hope that if we need throttling (and I do want to point 
out that I'm not entirely sure we do - I think the issue is not "number of 
commits", but "quality of code", and I do _not_ agree that the two are 
directly related in any way), it should be source-based.

Trying to make sure that the source throttles, and not by making 
developers feel unproductive. And quite frankly, most things that throttle 
the source are of the annoying and non-productive kind. The classic source 
throttle tends to be to make it very "expensive" to do development, by 
introducing various barriers.

The barriers are usually "you need to have <n> other people look at it", 
or "you need to pass this five-hour test-suite", and almost invariably, 
the big issue is not code quality, but literally to slow things down. And 
call me crazy, but I think that a process that is designed to not 
primarily get quality, but slow things down, is likely to generate not 
just bad feelings, but actually much worse code too!

And the thing is, I don't even think our main problem is "lots of 
changes". I think we've actually been very successful at managing lots of 
change. Our problems are elsewhere.

So I think our primary problems are:

 - making mistakes is inevitable and cannot be avoided, but we can still 
   add more layers to make it less likely. But these should *not* be aimed 
   at being cumbersome to slow things down - they should basically 
   pipeline perfectly, so that there is no frustrating ping-pong latency.

   And linux-next falls into this kind of category: it doesn't really slow
   down development, but it would be another "pipeline stage" in the 
   process.

   (In contrast, requiring every patch to have <n> reviewed-by etc would 
   add huge latencies and slow down things hugely, and just generally be 
   make-believe work once everybody started gaming the system because it's 
   so irritating)

 - we do want more testing as part of the pipeline (but again, not 
   synchronously - but to speed up feedback for when things go wrong. So 
   it wouldn't get rid of the errors, but if it happens quickly enough, 
   maybe we'd catch things early in the development pipeline before it 
   even hits my tree)

   Having more linux-next testing would be great.

 - Regular *user* debuggability and reporting.

   Quite frankly, I think the reason a lot of people really like being 
   able to bisect bugs is not that "git bisect" is such an inherently cool 
   program, but because it is a really great tool for *users* to 
   participate in the debugging, in ways oops reports etc were not.

   Similarly, I love the oops/warning report statistics that Arjan sends 
   out. With vendor support users help debugging and reporting without 
   even necessarily knowing about it. Things like *that* matter a lot.
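[Not part of the original mail; a toy model of the bisection Linus refers to. What `git bisect` automates is plain binary search over the commit history: assuming the tree flips from good to bad exactly once, each build-and-test halves the suspect range, so even ~10000 commits need only about 14 tests. The commit list and `is_bad` predicate below are made up, standing in for a user's build-and-boot cycle:

```python
def bisect(commits, is_bad):
    """Return (first bad commit, number of tests), assuming history
    goes from good to bad exactly once -- the property git bisect
    relies on."""
    lo, hi = 0, len(commits) - 1      # invariant: first bad commit in [lo, hi]
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1                    # one build-and-test per step
        if is_bad(commits[mid]):
            hi = mid                  # regression is at mid or earlier
        else:
            lo = mid + 1              # regression is after mid
    return commits[lo], steps

# 10000 commits, regression introduced at commit 9629
# (roughly the 2.6.24 -> 2.6.25-rc1 scale mentioned in this thread)
commits = list(range(10000))
culprit, steps = bisect(commits, lambda c: c >= 9629)
print(culprit, steps)
```

The logarithmic step count is exactly why a user with no kernel knowledge can participate: a dozen reboots pinpoint one commit out of ten thousand.]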

Notice how none of the above are about slowing down development.  I don't 
think quality and speed of development are related. In fact, I think 
quality and speed often go hand-in-hand: the same way some of the best 
programmers are also the most productive, I think some of the most 
productive flows are likely to generate the best code!

		Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:45             ` Rafael J. Wysocki
  2008-04-30 23:57               ` david
@ 2008-05-01  0:38               ` Adrian Bunk
  2008-05-01  0:56                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 229+ messages in thread
From: Adrian Bunk @ 2008-05-01  0:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, Linus Torvalds, David Miller, linux-kernel, Andrew Morton,
	Jiri Slaby

On Thu, May 01, 2008 at 01:45:38AM +0200, Rafael J. Wysocki wrote:
> On Thursday, 1 of May 2008, david@lang.hm wrote:
> > On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > 
> > > On Wednesday, 30 of April 2008, Linus Torvalds wrote:
> > >>
> > >> On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
> > >> So your "fewer commits over a unit of time" doesn't make sense.
> > >
> > > Oh, yes it does.  Equally well you could say that having brakes in a car
> > > didn't make sense, even if you could drive it as fast as the engine allowed
> > > you to. ;-)
> > >
> > >> We have those ten thousand commits. They need to go in. They cannot take
> > >> forever.
> > >
> > > But perhaps some of them can wait a bit longer.
> > 
> > not really, if patches are produced at a rate of 1000/week and you decide 
> > to only accept 2000 of them this month, a month later you have 6000 
> > patches to deal with.
> 
> Well, I think you know how TCP works.  The sender can only send as much
> data as the receiver lets it, no matter how much data there are to send.
> I'm thinking about an analogous approach.
> 
> If the developers who produce those patches know in advance about the rate
> limit and are promised to be treated fairly, they should be able to organize
> their work in a different way.
>...

We cannot control who develops what.

When someone wants some feature or wants to get Linux running on his 
hardware he will always develop the code.

We can only control what we merge.

And the main rationale for the 2.6 development model was that we no 
longer want distributions to ship kernels with insane amounts of 
patches.

> Thanks,
> Rafael

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
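
Rafael's TCP analogy above can be sketched as a receiver-advertised window: the maintainer announces how many patches he will take per merge window, and the rest queue up - which is exactly the backlog david@lang.hm objects to. A toy model (hypothetical illustration, not actual kernel tooling):

```python
from collections import deque

class MergeWindow:
    """Toy model of a TCP-style rate limit on merging: the 'receiver'
    (maintainer) advertises a window; excess patches stay in a backlog."""

    def __init__(self, window):
        self.window = window    # patches accepted per merge window
        self.backlog = deque()  # submitted but not yet merged

    def submit(self, patches):
        self.backlog.extend(patches)

    def merge_cycle(self):
        merged = []
        while self.backlog and len(merged) < self.window:
            merged.append(self.backlog.popleft())
        return merged

q = MergeWindow(window=2000)
q.submit(range(4000))            # roughly 1000 patches/week over a month
print(len(q.merge_cycle()))      # → 2000 merged this window
print(len(q.backlog))            # → 2000 carried over: the backlog grows
```

The point of contention is that, unlike TCP, the patch "senders" here do not slow down when the window shrinks - the queue just keeps growing.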


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01  0:31       ` RFC: starting a kernel-testers group for newbies Adrian Bunk
  2008-04-30  7:03         ` Arjan van de Ven
@ 2008-05-01  0:41         ` David Miller
  2008-05-01 13:23           ` Adrian Bunk
  1 sibling, 1 reply; 229+ messages in thread
From: David Miller @ 2008-05-01  0:41 UTC (permalink / raw)
  To: bunk; +Cc: torvalds, akpm, rjw, linux-kernel, jirislaby, rostedt

From: Adrian Bunk <bunk@kernel.org>
Date: Thu, 1 May 2008 03:31:25 +0300

> - get a mailing list at vger

kernel-testers@vger.kernel.org has been created, feel free to
use it.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:20               ` Linus Torvalds
@ 2008-05-01  0:42                 ` Rafael J. Wysocki
  2008-05-01  1:19                   ` Linus Torvalds
  2008-05-01  1:30                 ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01  0:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Thursday, 1 of May 2008, Linus Torvalds wrote:
> 
> On Thu, 1 May 2008, Willy Tarreau wrote:
[--snip--]

> I do _not_ want to slow down development by setting some kind of "quality 
> bar" - but I do believe that we should keep our quality high, not because 
> of any hoops we need to jump through, but because we take pride in the 
> thing we do.

Well, we certainly should, but do we always remember it?  Honest, guv?

> [ An example of this: I don't believe code review tends to much help in 
>   itself, but I *do* believe that the process of doing code review makes 
>   people more aware of the fact that others are looking at the code they 
>   produce, and that in turn makes the code often better to start with.

It may help directly, for example when people realize that they work on
conflicting or just related changes.

>   And I think publicly announced git trees and -mm and linux-next are 
>   great partly because they end up doing that same thing. I heartily 
>   encourage submaintainers to always Cc: linux-kernel when they send me a 
>   "please pull" request - I don't know if anybody else ever really pulls 
>   that tree, but I do think that it's very healthy to write that message 
>   and think of it as a publication event. ]

I totally agree with that.

Still, the issue at hand is that
(1) The code merged during a merge window is somewhat opaque from the tester's
    point of view and if a regression is found, the only practical means to
    figure out what caused it is to carry out a bisection (which generally is
    unpleasant, to put it lightly).
(2) Many regressions are introduced during merge windows (relative to the
    total amount of code merged they are a few, but the raw numbers are
    significant) and because of (1) the process of removing them is generally
    painful for the affected people.
(3) The suspicion is that the number of regressions introduced during merge
    windows has something to do with the quality of code being below
    expectations, that in turn may be related to the fact that it's being
    developed very rapidly.

My opinion is that we need to solve this issue sooner rather than later and so
the question is how we are going to approach that.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  0:38               ` Adrian Bunk
@ 2008-05-01  0:56                 ` Rafael J. Wysocki
  2008-05-01  1:25                   ` Adrian Bunk
  0 siblings, 1 reply; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01  0:56 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: david, Linus Torvalds, David Miller, linux-kernel, Andrew Morton,
	Jiri Slaby

On Thursday, 1 of May 2008, Adrian Bunk wrote:
> On Thu, May 01, 2008 at 01:45:38AM +0200, Rafael J. Wysocki wrote:
> > On Thursday, 1 of May 2008, david@lang.hm wrote:
> > > On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > > 
> > > > On Wednesday, 30 of April 2008, Linus Torvalds wrote:
> > > >>
> > > >> On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
> > > >> So your "fewer commits over a unit of time" doesn't make sense.
> > > >
> > > > Oh, yes it does.  Equally well you could say that having brakes in a car
> > > > didn't make sense, even if you could drive it as fast as the engine allowed
> > > > you to. ;-)
> > > >
> > > >> We have those ten thousand commits. They need to go in. They cannot take
> > > >> forever.
> > > >
> > > > But perhaps some of them can wait a bit longer.
> > > 
> > > not really, if patches are produced at a rate of 1000/week and you decide 
> > > to only accept 2000 of them this month, a month later you have 6000 
> > > patches to deal with.
> > 
> > Well, I think you know how TCP works.  The sender can only send as much
> > data as the receiver lets it, no matter how much data there are to send.
> > I'm thinking about an analogous approach.
> > 
> > If the developers who produce those patches know in advance about the rate
> > limit and are promised to be treated fairly, they should be able to organize
> > their work in a different way.
> >...
> 
> We cannot control who develops what.

We don't need to.

> When someone wants some feature or wants to get Linux running on his 
> hardware he will always develop the code.
> 
> We can only control what we merge.

To be exact, we control what we merge and when.  There's no rule saying that
every patch has to be merged as soon as it appears to be ready for merging,
or during the nearest merge window, AFAICS.

> And the main rationale for the 2.6 development model was that we no 
> longer want distributions to ship kernels with insane amounts of 
> patches.

This was an argument against starting a separate development branch in analogy
with 2.5, IIRC, and I agree with that.

Still, I think we don't need to merge patches at the current rate and it might
help improve their overall quality if we didn't.  Of course, the latter is only
a speculation, although it's based on my experience.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:41             ` Andrew Morton
  2008-04-30 23:23               ` Rafael J. Wysocki
@ 2008-05-01  0:57               ` Adrian Bunk
  2008-05-01  1:25                 ` Linus Torvalds
  2008-05-01  1:35                 ` Theodore Tso
  2008-05-01 12:31               ` Tarkan Erimer
  2 siblings, 2 replies; 229+ messages in thread
From: Adrian Bunk @ 2008-05-01  0:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, rjw, davem, linux-kernel, jirislaby

On Wed, Apr 30, 2008 at 03:41:24PM -0700, Andrew Morton wrote:
> On Wed, 30 Apr 2008 15:31:22 -0700 (PDT)
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > Any suggestions on how to convince people that their code is not worth 
> > merging?
> 
> Raise the quality.  Then the volume will automatically decrease.

100% ACK to "Raise the quality" (no matter whether it influences the volume).

> Which leads us to...  the volume isn't a problem per-se.  The problem is
> quality.  It's the fact that they vary inversely which makes us say "slow
> down".
> 
> So David's Subject: should have been "Do Better, please".  Slowing down is
> just a side-effect.  And, we expect, a tool.
> 
> 
> We should be discussing how to raise the quality of our work.

One big problem I see is Linus wanting to merge all drivers regardless 
of the quality.

Linus said in [1]:
"I'd really rather have the driver merged, and then *other* people can 
 send patches!"

The problem is that such "other people" do not exist (except perhaps Al) 
for non-trivial stuff.

My favorite gem from this driver we merged in 2.6.25 is:
  grep -C4 volatile drivers/infiniband/hw/nes/nes_nic.c

Fixing such stuff isn't a "janitorial kind of thing", and people are 
actually more motivated to fix their code to get it into the kernel 
than to fix their code after it went into the kernel.

I am not saying we shouldn't merge such a driver at all or set 
unrealistically high quality goals - I'm for merging all code of good 
quality that provides functionality not yet in the kernel.

But we need some minimum quality level.

cu
Adrian

[1] http://lkml.org/lkml/2008/2/21/334

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 15:49                 ` Andrew Morton
@ 2008-05-01  1:13                   ` Arjan van de Ven
  2008-05-02  9:00                     ` Adrian Bunk
  2008-05-01 16:38                   ` Steven Rostedt
  2008-05-01 17:24                   ` Theodore Tso
  2 siblings, 1 reply; 229+ messages in thread
From: Arjan van de Ven @ 2008-05-01  1:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adrian Bunk, Linus Torvalds, Rafael J. Wysocki, davem,
	linux-kernel, jirislaby, Steven Rostedt

On Thu, 1 May 2008 08:49:19 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> > Granted that compared to x86 there's not a sizable portion of users 
> > crazy enough to run Linux on powerpc machines...
> 
> Another fallacy which Arjan is pushing (even though he doesn't appear
> to have realised it) is "all hardware is the same".

no I'm pushing "some classes of hardware are much more popular/relevant
than others".


 
> Well, it isn't.  And most of our bugs are hardware-specific.  So, I'd
> venture, most of our bugs don't affect most people.  So, over time, by
> Arjan's "important to enough people" observation we just get more and
> more and more unfixed bugs.

I did not say "most people". I believe "most people" aren't hitting
bugs right now (or there would be a lot more screaming).
What I do believe is that *within the bugs that hit*, even the hardware
specific ones, there's a clear prioritization by how many people hit
the bug (or have the hardware in general).

> 
> And I believe this effect has been occurring.
> 

> And please stop regaling us with this kerneloops.org stuff.  It just
> isn't very interesting, useful or representative when considering the
> whole problem.  Very few kernel bugs result in a trace, and when they
> do they are usually easy to fix and, because of this, they will get
> fixed, often quickly.  I expect
> netdevwatchdogeth0transmittimedout.org would tell a different story.

now that's a fallacy of your own.  If you care about that one, it's 1)
trivial to track and/or 2) it could contain a WARN_ON_ONCE(), at which
point it's automatically tracked (and with more useful information, I
suspect, since it suddenly has a full backtrace including driver info
in it).
By your argument we should work hard to make sure we're better at
creating traces for the cases where we detect something going wrong.
(I would not argue against that, fwiw.)

> I figure that after a bug is reported we have maybe 24 to 48 hours to
> send a good response before our chances of _ever_ fixing it have
> begun to decline sharply due to the clever minds at the other end.
> 
> Which leads us to Arjan's third fallacy:
> 
>    "How many bugs that a sizable portion of users will hit in reality
>    are there?" is the right question to ask...
> 
> well no, it isn't.  Because approximately zero of the hardware bugs

if it's a hardware bug there's little we can do.
If it's a hardware specific bug, yeah then it becomes a function of how
popular that hardware is.

> affect a sizeable portion of users.  With this logic we will end up
> with more and more and more and more bugs each of which affect a tiny
> number of users. Hundreds of different bugs.  You know where this
> process ends up.

Given that a normal PC has maybe 10 components... 
yes we don't want bugcreep that affects common hardware over time.
At the same time, by your argument, a bug that hits a piece of hardware
of which 5 are made (or left on this planet) is equally important to
a bug in something that 
> 
> Arjan's fourth fallacy: "We don't make (effective) prioritization
> decisions." lol.  This implies that someone somewhere once sat down
> and wondered which bug he should most effectively work on.  Well, we
> don't do that.  We ignore _all_ the bugs in favour of busily writing
> new ones

This statement is so ridiculous and so self-contradictory, given what you
said before, that I'm not even going to respond to it. 

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  0:42                 ` Rafael J. Wysocki
@ 2008-05-01  1:19                   ` Linus Torvalds
  2008-05-01  1:31                     ` Andrew Morton
                                       ` (2 more replies)
  0 siblings, 3 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01  1:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Willy Tarreau, David Miller, linux-kernel, Andrew Morton, Jiri Slaby



On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> 
> > I do _not_ want to slow down development by setting some kind of "quality 
> > bar" - but I do believe that we should keep our quality high, not because 
> > of any hoops we need to jump through, but because we take pride in the 
> > thing we do.
> 
> Well, we certainly should, but do we always remember it?  Honest, guv?

Hey, guv, do you _honestly_ believe that some kind of ISO-9000-like 
process generates quality?

And I dislike how people try to conflate "quality" and "merging speed" as 
if there was any reason what-so-ever to believe that they are related.

You (and Andrew) have tried to argue that slowing things down results in 
better quality, and I simply don't for a moment believe that. I believe 
the exact opposite.

The way to get good quality is not to put barriers up in front of 
developers, but totally the reverse - by helping them. And yes, that help 
can quite possibly be in the form of "process" - by making things more 
streamlined, and by having people not have to waste time on wondering 
where they should send things etc.

But the notion that we should even _try_ to aim to slow things down, that 
one I find unlikely to be true, and I don't even understand why anybody 
would find it a logical goal?

Of course, you will have fewer new bugs if you have fewer changes. But 
that's not a goal, that's a tautology and totally uninteresting. A small 
program is likely to have fewer bugs, but that doesn't make something 
small "better" than something large that does more.

Similarly, a stagnant development community will introduce new bugs more 
seldom. But does that make a stagnant one better than a vibrant one? Hell 
no.

So what I'm arguing against here is not that we should aim for worse 
quality, but I'm arguing against the false dichotomy of believing that 
quality is incompatible with lots of change. 

So if we can get the discussion *away* from the "let's slow things down", 
then I'm interested. Because at that point we don't have to fight made-up 
arguments about something irrelevant.

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  0:56                 ` Rafael J. Wysocki
@ 2008-05-01  1:25                   ` Adrian Bunk
  2008-05-01 12:05                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 229+ messages in thread
From: Adrian Bunk @ 2008-05-01  1:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, Linus Torvalds, David Miller, linux-kernel, Andrew Morton,
	Jiri Slaby

On Thu, May 01, 2008 at 02:56:23AM +0200, Rafael J. Wysocki wrote:
> On Thursday, 1 of May 2008, Adrian Bunk wrote:
> > On Thu, May 01, 2008 at 01:45:38AM +0200, Rafael J. Wysocki wrote:
> > > On Thursday, 1 of May 2008, david@lang.hm wrote:
> > > > On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > > > 
> > > > > On Wednesday, 30 of April 2008, Linus Torvalds wrote:
> > > > >>
> > > > >> On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
> > > > >> So your "fewer commits over a unit of time" doesn't make sense.
> > > > >
> > > > > Oh, yes it does.  Equally well you could say that having brakes in a car
> > > > > didn't make sense, even if you could drive it as fast as the engine allowed
> > > > > you to. ;-)
> > > > >
> > > > >> We have those ten thousand commits. They need to go in. They cannot take
> > > > >> forever.
> > > > >
> > > > > But perhaps some of them can wait a bit longer.
> > > > 
> > > > not really, if patches are produced at a rate of 1000/week and you decide 
> > > > to only accept 2000 of them this month, a month later you have 6000 
> > > > patches to deal with.
> > > 
> > > Well, I think you know how TCP works.  The sender can only send as much
> > > data as the receiver lets it, no matter how much data there are to send.
> > > I'm thinking about an analogous approach.
> > > 
> > > If the developers who produce those patches know in advance about the rate
> > > limit and are promised to be treated fairly, they should be able to organize
> > > their work in a different way.
> > >...
> > 
> > We cannot control who develops what.
> 
> We don't need to.
> 
> > When someone wants some feature or wants to get Linux running on his 
> > hardware he will always develop the code.
> > 
> > We can only control what we merge.
> 
> To be exact, we control what we merge and when.  There's no rule saying that
> every patch has to be merged as soon as it appears to be ready for merging,
> or during the nearest merge window, AFAICS.

What currently gets applied to the kernel is between two and three 
million changed lines per year.

We can discuss when and how to apply them.

But unless we want to create an ever-growing backlog, we have to change 
roughly 200,000 lines per month on average.

Even with higher quality criteria that might result in some code not 
being merged, we will still be at > 100,000 lines per month on average.
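
Spelled out, the per-month figure is simple division (taking 2.4 million as an illustrative midpoint of "two and three million"):

```python
# Adrian's estimate: two to three million changed lines per year.
# Use a midpoint purely for illustration.
lines_per_year = 2_400_000
lines_per_month = lines_per_year // 12
print(lines_per_month)  # → 200000, the "roughly 200,000 lines per month"
```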

> > And the main rationale for the 2.6 development model was that we no 
> > longer want distributions to ship kernels with insane amounts of 
> > patches.
> 
> This was an argument against starting a separate development branch in analogy
> with 2.5, IIRC, and I agree with that.
>
> Still, I think we don't need to merge patches at the current rate and it might
> help improve their overall quality if we didn't.  Of course, the latter is only
> a speculation, although it's based on my experience.

See above - what do you want to do if we merged less and had a backlog 
of, let's say, one million lines to change after one year, much of it 
already in distribution kernels?

I also don't like this situation, but we have to cope with it.

> Thanks,
> Rafael

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  0:57               ` Adrian Bunk
@ 2008-05-01  1:25                 ` Linus Torvalds
  2008-05-01  2:13                   ` Adrian Bunk
  2008-05-01  1:35                 ` Theodore Tso
  1 sibling, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01  1:25 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Andrew Morton, rjw, davem, linux-kernel, jirislaby



On Thu, 1 May 2008, Adrian Bunk wrote:
> 
> One big problem I see is Linus wanting to merge all drivers regardless 
> of the quality.

That's not what I said.

What I said was that I think we get *better* quality by merging early.

In other words, you're turning the whole argument on its head, and 
incorrectly so.

I claim that you are the one that is arguing for *worse* quality, by 
arguing for a process that is KNOWN to tend to generate bad code 
(out-of-tree drivers) as opposed to one that tends to fix things over time 
(and note the "tends" in both cases - there are counter-examples, but 
the trend is so clear that anybody who disputes it would seem to be either 
blind or lying).

So here's my challenge: give me *one* reason to believe that quality 
improves more out-of-tree than it does in-tree, and then you'll have a 
point. But you'd better be able to explain the ton of historical data we 
have that proves otherwise.

Until you do that, your blathering is just that - total blathering. The 
process I advocate is the one that has historical data on its side. Yours 
is just a failed theory.

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:20               ` Linus Torvalds
  2008-05-01  0:42                 ` Rafael J. Wysocki
@ 2008-05-01  1:30                 ` Jeremy Fitzhardinge
  2008-05-01  5:35                   ` Willy Tarreau
  1 sibling, 1 reply; 229+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-01  1:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, Rafael J. Wysocki, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

Linus Torvalds wrote:
>   And I think publicly announced git trees and -mm and linux-next are 
>   great partly because they end up doing that same thing. I heartily 
>   encourage submaintainers to always Cc: linux-kernel when they send me a 
>   "please pull" request - I don't know if anybody else ever really pulls 
>   that tree, but I do think that it's very healthy to write that message 
>   and think of it as a publication event. ]
>   

And, ideally, they would have posted the changes as patches to the list 
for review anyway, so there shouldn't be anything surprising in that pull...

    J

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  1:19                   ` Linus Torvalds
@ 2008-05-01  1:31                     ` Andrew Morton
  2008-05-01  1:43                       ` Linus Torvalds
  2008-05-01  1:40                     ` Linus Torvalds
  2008-05-01  5:50                     ` Willy Tarreau
  2 siblings, 1 reply; 229+ messages in thread
From: Andrew Morton @ 2008-05-01  1:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Willy Tarreau, David Miller, linux-kernel, Jiri Slaby

On Wed, 30 Apr 2008 18:19:56 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:

> You (and Andrew) have tried to argue that slowing things down results in 
> better quality,

eh?  I argued the opposite: that increasing quality will as a side-effect
slow things down.

If we simply throttled things, people would spend more time watching the
shopping channel while merging smaller amounts of the same old crap.


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  0:57               ` Adrian Bunk
  2008-05-01  1:25                 ` Linus Torvalds
@ 2008-05-01  1:35                 ` Theodore Tso
  1 sibling, 0 replies; 229+ messages in thread
From: Theodore Tso @ 2008-05-01  1:35 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andrew Morton, Linus Torvalds, rjw, davem, linux-kernel, jirislaby

On Thu, May 01, 2008 at 03:57:27AM +0300, Adrian Bunk wrote:
> > 
> > We should be discussing how to raise the quality of our work.
> 
> One big problem I see is Linus wanting to merge all drivers regardless 
> of the quality.
> 
> Linus said in [1]:
> "I'd really rather have the driver merged, and then *other* people can 
>  send patches!"
> 
> The problem is that such "other people" do not exist (except perhaps Al) 
> for non-trivial stuff.

Sure, but that's not the cause of the problems that people like DavidN
whine about, or of the problems that frustrate David Miller and/or Ingo
Molnar.  The problems that cause whining and/or frustration are when
changes in core code break other maintainers' work.  That is a TOTALLY
DIFFERENT problem from lower-quality device drivers getting merged.
In general, those device drivers don't cause problems for people who
don't have the relevant hardware, and worst case, the device driver can
just be CONFIG'ed out.

So this is a totally different issue, and whether or not we merge new
device drivers, and at what quality level (from "it compiles, ship
it!", to every single checkpatch, sparse, and Christoph Hellwig nitpick
has to be addressed *AND* then the submitter has to give a bottle of
high-quality alcohol to a Maintainer :-) is completely orthogonal to
the question of whether we can, in a King Canute fashion, compel
developers to stop developing by commanding them not to send pull
requests or by refusing to merge their work into mainline.

If we don't merge their work, and it's really cool features that our
end users are demanding, it will just flow into the distros via
out-of-tree patches, much like it did during the 2.4/2.5 era.  And
maybe the current enterprise distros will try to hold it back, but if
end users start saying things like "We want containers!!" and start
voting with their feet for a distro that is willing to merge OpenVZ
patches, it doesn't matter how much we try to tell the tide to stop
flowing in.  So yes, we can apply some amount of backpressure, but the
real challenge is to figure out how we can work smarter and flush out
the bugs faster.

							- Ted

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  0:38                     ` Linus Torvalds
@ 2008-05-01  1:39                       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 229+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-01  1:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: david, Chris Shoemaker, Rafael J. Wysocki, David Miller,
	linux-kernel, Andrew Morton, Jiri Slaby

Linus Torvalds wrote:
> That's what we largely had with the Xen merge, for example. A lot of the 
> code had been around for basically _forever_, and the people involved in 
> reviewing it got really tired of it, and there was no way in *hell* a new 
> person would ever start reviewing the huge backlog. Once it is massive, 
> it's just too massive.
>   

Heh.  The Xen code in the kernel now is a complete rewrite, with only 
trace elements from the original patchset.  And yes, that's partly 
because the original patches were unreviewable.

    J

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  1:19                   ` Linus Torvalds
  2008-05-01  1:31                     ` Andrew Morton
@ 2008-05-01  1:40                     ` Linus Torvalds
  2008-05-01  1:51                       ` David Miller
                                         ` (4 more replies)
  2008-05-01  5:50                     ` Willy Tarreau
  2 siblings, 5 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01  1:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Willy Tarreau, David Miller, linux-kernel, Andrew Morton, Jiri Slaby



On Wed, 30 Apr 2008, Linus Torvalds wrote:
> 
> You (and Andrew) have tried to argue that slowing things down results in 
> better quality,

Sorry, not Andrew. DavidN.

Andrew argued the other way (quality->slower), which I also happen to not 
necessarily believe in, but that's a separate argument.

Nobody should ever argue against raising quality.

The question could be about "at what cost"? (although I think that's not 
necessarily a good argument, since I personally suspect that good quality 
code comes from _lowering_ costs, not raising them).

But what's really relevant is "how?"

Now, we do know that open-source code tends to be higher quality (along a 
number of metrics) than closed source code, and my argument is that it's 
not because of bike-shedding (aka code review), but simply because the 
code is out there and available and visible.

And as a result of that, my personal belief is that the best way to raise 
quality of code is to distribute it. Yes, as patches for discussion, but 
even more so as a part of a cohesive whole - as _merged_ patches!

The thing is, the quality of individual patches isn't what matters! What 
matters is the quality of the end result. And people are going to be a lot 
more involved in looking at, testing, and working with code that is 
merged, rather than code that isn't.

So _my_ answer to the "how do we raise quality" is actually the exact 
reverse of what you guys seem to be arguing.

IOW, I argue that the high speed of merging very much is a big part of 
what gives us quality in the end. It may result in bugs along the way, but 
it also results in fixes, and lots of people looking at the result (and 
looking at it in *context*, not just as a patch flying around).

And yes, maybe that sounds counter-intuitive. But hey, people thought open 
source was counter-intuitive. I spent years explaining why it should work 
at all!

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:51               ` David Miller
@ 2008-05-01  1:40                 ` Ingo Molnar
  2008-05-01  2:48                 ` Adrian Bunk
  1 sibling, 0 replies; 229+ messages in thread
From: Ingo Molnar @ 2008-05-01  1:40 UTC (permalink / raw)
  To: David Miller; +Cc: akpm, torvalds, rjw, linux-kernel, jirislaby, tglx


* David Miller <davem@davemloft.net> wrote:

> Ingo, let me know what I need to do to change your behavior in 
> situations like the one I'm about to describe, ok?
> 
> Today, you merged in this bogus "regression fix".

the motivation of that fix wasn't UML - that was just an (indeed 
incorrect) after-thought when I wrote up the commit log. The fix is 
obviously right - although it doesn't fix UML.

btw., did you see my stream of fixes about UML?

> To an arbitrary person reading the commit logs, the above looks like 
> you fixed something, when you actually didn't fix anything.

it is wrong that it "doesn't fix anything". Look at the change itself:

- * Force always-inline if the user requests it so via the .config:
+ * Force always-inline if the user requests it so via the .config,
+ * or if gcc is too old:
  */
 #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \
-    !defined(CONFIG_OPTIMIZE_INLINING) && (__GNUC__ >= 4)
+    !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)

before the change it was only possible to disable the optimization on 
gcc 4 and above. The intended (and now implemented) condition is to only 
change anything on gcc 4 and above. I.e., on gcc 3.x the config option has 
no effect at all - and that's what we want.

	Ingo

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  1:31                     ` Andrew Morton
@ 2008-05-01  1:43                       ` Linus Torvalds
  2008-05-01 10:59                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01  1:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, Willy Tarreau, David Miller, linux-kernel, Jiri Slaby



On Wed, 30 Apr 2008, Andrew Morton wrote:
> 
> eh?  I argued the opposite: that increasing quality will as a side-effect
> slow things down.

Yes, my bad, I realized that when I read through my message and already 
sent out a fix for my buggy email ;)

> If we simply throttled things, people would spend more time watching the
> shopping channel while merging smaller amounts of the same old crap.

I agree totally. And although some of the time would probably _also_ be 
spent on the frustrating crap that was designed to do the throttling, that 
isn't much more productive than watching the shopping channel would be ...

		Linus


* Re: Slow DOWN, please!!!
  2008-05-01  1:40                     ` Linus Torvalds
@ 2008-05-01  1:51                       ` David Miller
  2008-05-01  2:01                         ` Linus Torvalds
  2008-05-01  2:21                       ` Al Viro
                                         ` (3 subsequent siblings)
  4 siblings, 1 reply; 229+ messages in thread
From: David Miller @ 2008-05-01  1:51 UTC (permalink / raw)
  To: torvalds; +Cc: rjw, w, linux-kernel, akpm, jirislaby

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 30 Apr 2008 18:40:39 -0700 (PDT)

> IOW, I argue that the high speed of merging very much is a big part of 
> what gives us quality in the end. It may result in bugs along the way, but 
> it also results in fixes, and lots of people looking at the result (and 
> looking at it in *context*, not just as a patch flying around).

This is a huge burden to put on people.

The more broken stuff you merge, the more people are forced to track
these problems down so that they can get their own work done.

It punishes people who do put forth the effort to let new changes cook
properly, before pushing, and thus avoid putting turds into the tree.

You really have to think about the ramifications of this system.


* Re: Slow DOWN, please!!!
  2008-04-30 23:29     ` Paul Mackerras
@ 2008-05-01  1:57       ` Jeff Garzik
  2008-05-01  2:52         ` Frans Pop
  2008-05-01  3:47       ` Linus Torvalds
  1 sibling, 1 reply; 229+ messages in thread
From: Jeff Garzik @ 2008-05-01  1:57 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Linus Torvalds, Rafael J. Wysocki, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

Paul Mackerras wrote:
> By the way, if you do want to make that rule, then there's a really
> easy way to do it - just pull linux-next, and make that one pull be
> the entire merge window. :)

That's a unique and interesting idea...

	Jeff




* Re: Slow DOWN, please!!!
  2008-05-01  1:51                       ` David Miller
@ 2008-05-01  2:01                         ` Linus Torvalds
  2008-05-01  2:17                           ` David Miller
  0 siblings, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01  2:01 UTC (permalink / raw)
  To: David Miller; +Cc: rjw, w, linux-kernel, akpm, jirislaby



On Wed, 30 Apr 2008, David Miller wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Wed, 30 Apr 2008 18:40:39 -0700 (PDT)
> 
> > IOW, I argue that the high speed of merging very much is a big part of 
> > what gives us quality in the end. It may result in bugs along the way, but 
> > it also results in fixes, and lots of people looking at the result (and 
> > looking at it in *context*, not just as a patch flying around).
> 
> This is a huge burden to put on people.
> 
> The more broken stuff you merge, the more people are forced to track
> these problems down so that they can get their own work done.

I'm not saying we should merge crap.

You can take any argument too far, and clearly it doesn't mean that we 
should just accept *anything*, because it will magically be gilded by its 
mere inclusion into the kernel. No, I'm not going to argue that.

But I do want to argue against the notion that the only way to raise 
quality is to do it before it gets merged. It's often better to merge 
early, and fix the issues the merge brings up early too!

Release early, release often. That was the watch-word early in Linux 
kernel development, and there was a reason for it. And it _worked_. Did it 
mean "release crap, release anything"? No. But it did mean that things got 
lots more exposure - even if those "things" were sometimes bugs.

			Linus


* Re: Slow DOWN, please!!!
  2008-05-01  1:25                 ` Linus Torvalds
@ 2008-05-01  2:13                   ` Adrian Bunk
  2008-05-01  2:30                     ` Linus Torvalds
  0 siblings, 1 reply; 229+ messages in thread
From: Adrian Bunk @ 2008-05-01  2:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, rjw, davem, linux-kernel, jirislaby

On Wed, Apr 30, 2008 at 06:25:50PM -0700, Linus Torvalds wrote:
> 
> 
> On Thu, 1 May 2008, Adrian Bunk wrote:
> > 
> > One big problem I see is Linus wanting to merge all drivers regardless 
> > of the quality.
> 
> That's not what I said.
> 
> What I said was that I think we get *better* quality by merging early.
> 
> In other words, you're turning the whole argument on its head, and 
> incorrectly so.
> 
> I claim that you are the one that is arguing for *worse* quality, by 
> arguing for a process that is KNOWN to tend to generate bad code 
> (out-of-tree drivers) as opposed to one that tends to fix things over time 
> (and note the "tends" in both cases - there are counter-examples, but 
> the trend is so clear that anybody who disputes it would seem to be either 
> blind or lying).
>...

I am *not* saying it should have stayed out-of-tree.

I am saying that it was merged too early, and that there are points that 
should have been addressed before the driver got merged.

Get it submitted for review to linux-kernel.
Give the maintainers some time to incorporate all comments.
Even one month later it could still have made it into 2.6.25.

The only problem with my suggestion is that it's currently pretty random 
whether someone takes the time to review such a driver on linux-kernel.

And even if I'm taking fire for this again (and this is different from 
newbies running checkpatch on the kernel): for driver submissions it 
actually makes sense to tell the submitter to fix the checkpatch errors 
[1], and it would have made the driver better in this case (again, it 
could still have made it into 2.6.25).

People are actually more motivated to fix their code for getting it into 
the kernel than to fix their code after it went into the kernel, so we 
might get better quality when merging a bit later.

> 			Linus

cu
Adrian

[1] not necessarily all checkpatch warnings

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: Slow DOWN, please!!!
  2008-05-01  2:01                         ` Linus Torvalds
@ 2008-05-01  2:17                           ` David Miller
  0 siblings, 0 replies; 229+ messages in thread
From: David Miller @ 2008-05-01  2:17 UTC (permalink / raw)
  To: torvalds; +Cc: rjw, w, linux-kernel, akpm, jirislaby

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 30 Apr 2008 19:01:12 -0700 (PDT)

> I'm not saying we should merge crap.

That's exactly what's been happening this merge window though.

And throughout this, Andrew Morton has been the only person with the
balls and lack of ego problems to revert regression-causing changes he
introduced.


* Re: Slow DOWN, please!!!
  2008-05-01  1:40                     ` Linus Torvalds
  2008-05-01  1:51                       ` David Miller
@ 2008-05-01  2:21                       ` Al Viro
  2008-05-01  5:19                         ` david
  2008-05-04  3:26                         ` Rene Herman
  2008-05-01  2:31                       ` Nigel Cunningham
                                         ` (2 subsequent siblings)
  4 siblings, 2 replies; 229+ messages in thread
From: Al Viro @ 2008-05-01  2:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Willy Tarreau, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

On Wed, Apr 30, 2008 at 06:40:39PM -0700, Linus Torvalds wrote:

> Now, we do know that open-source code tends to be higher quality (along a 
> number of metrics) than closed source code, and my argument is that it's 
> not because of bike-shedding (aka code review), but simply because the 
> code is out there and available and visible.
 
Really?  And how, pray tell, will being out there magically improve the
code?  "With enough eyes all bugs are shallow" stuff out of ESR's arse?

FWIW, after the last month's flamefests I decided to actually do something
about review density of code in the areas I'm theoretically responsible
for.  Namely, do systematic review of core data structure handling (starting
with the place where most of the codepaths get into VFS - descriptor tables
and struct file), doing both blow-by-blow writeup on how that sort of things
is done and documentation of the life cycle/locking rules/assertions made
by code/etc.  I made one bad mistake that held things back for quite
a while - sending a heads-up for one of the worse bugs found in the process
to never-sufficiently-damned vendor-sec.  That's the last time I'm doing that, TYVM...

Anyway, I'm going to get the notes on that stuff in order and put them in
the open.  I really hope that other folks will join the fun afterwards.
The goal is to get a coherent braindump that would be sufficient for
people new to the area wanting to understand and review VFS-related code -
both in the tree and in new patches.

files_struct/fdtable handling is mostly dealt with, struct file is only
partially done - unfortunately, struct file_lock has to be dealt with
before that and it's a (predictable) nightmare.  On the other end of
things, fs_struct is not really started, vfsmount review is partially
done, dentry/superblock/inode not even touched.

Even with what little had been covered... well, let's just say that it
caught quite a few fun turds.  With typical age around 3-4 years.  And
VFS is not the messiest part of the tree...


* Re: Slow DOWN, please!!!
  2008-05-01  2:13                   ` Adrian Bunk
@ 2008-05-01  2:30                     ` Linus Torvalds
  2008-05-01 18:54                       ` Adrian Bunk
  2008-05-14 14:55                       ` Pavel Machek
  0 siblings, 2 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01  2:30 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Andrew Morton, rjw, davem, linux-kernel, jirislaby



On Thu, 1 May 2008, Adrian Bunk wrote:
> 
> I am saying that it was merged too early, and that there are points that 
> should have been addressed before the driver got merged.
> 
> Get it submitted for review to linux-kernel.
> Give the maintainers some time to incorporate all comments.
> Even one month later it could still have made it into 2.6.25.
> 
> The only problem with my suggestion is that it's currently pretty random 
> whether someone takes the time to review such a driver on linux-kernel.

Now, I do agree that we could/should have some more process in general. I 
really _would_ like to have a process in place that basically says:

 - everything must have gone through lkml at least once

 - after that point, it should have been in linux-next or the -mm queue

 - and then it can get merged (and if it didn't get any review by then, 
   maybe it was because nobody was interested, and it simply won't be 
   getting any until it oopses or catches peoples interest some other way)

HOWEVER.

That process doesn't actually work for everything anyway (a lot of trivial 
fixes are really best not being so noisy, and various patches that are 
specific to some subsystem really _are_ better off just discussed on that 
subsystem mailing lists).

And perhaps more pertinently, right now that kind of process is very 
inconvenient (to the point of effectively being impossible) for me to 
check. Obviously, if the patch comes from Andrew, I know it was in -mm, 
and I seldom drop those patches for obvious reasons anyway, but the last 
thing we want is some process that depends even _more_ on Andrew being a 
burnt-out-excuse-for-a-man in a few years (*).

So I could ask for people to always have pointers to "it was discussed 
here" on patches they send (and I'd likely mostly trust them without even 
bothering to verify), the same way -git maintainers often talk about "most 
of this has been in -mm for the last two months".

That might work. But then there would still be the patches that are 
obvious and don't need them.

And then even the obvious patches do break. And people will complain. Even 
though requiring that kind of process for the stupid stuff would just slow 
everybody down, and would be really painful.

So one of my _personal_ reasons I don't want to put too much process in 
place is that I don't think process is appropriate for everything, and yet 
even the stuff that obviously doesn't need or want process (speling fixes 
and build failures) _will_ cause problems, and then people will whine 
about them not being there.

			Linus

(*) Andrew, no offense. I'm sure you'd be a magnificent burnt-out-excuse- 
for-a-man.


* Re: Slow DOWN, please!!!
  2008-05-01  1:40                     ` Linus Torvalds
  2008-05-01  1:51                       ` David Miller
  2008-05-01  2:21                       ` Al Viro
@ 2008-05-01  2:31                       ` Nigel Cunningham
  2008-05-01 18:32                         ` Stephen Clark
  2008-05-01  3:53                       ` Frans Pop
  2008-05-01 11:38                       ` Rafael J. Wysocki
  4 siblings, 1 reply; 229+ messages in thread
From: Nigel Cunningham @ 2008-05-01  2:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Willy Tarreau, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

Hi.

On Wed, 2008-04-30 at 18:40 -0700, Linus Torvalds wrote:
> The thing is, the quality of individual patches isn't what matters! What 
> matters is the quality of the end result. And people are going to be a lot 
> more involved in looking at, testing, and working with code that is 
> merged, rather than code that isn't.

No. People generally expect that code that has been merged does work, so
they don't look at it unless they're forced to (by a bug or the desire
to make further modifications in that code) and they don't explicitly
seek to test it. They just seek to use it.

When it doesn't work, some of us will go and seek to find the cause,
others (most?) will simply roll back to whatever they last found to be
reliable.

Out of tree code has the same issues.

The only time code really gets looked at and tested is when there's a
problem, or when people are explicitly choosing to inspect it (pre-merge
reviews, eg).

So my answer to the "how do we raise quality" question would be that
when writing the code, we put time and effort into properly analysing
the problem and developing a solution, we put time and effort into
carefully testing the solution, and we put code in that will help the
end-user help us to debug issues later (without them necessarily needing
to git-bisect). After all, good software isn't the result of random (or
semi-random), unconsidered modifications, but of planning, thought and
attention to detail.

In other words, I'm arguing that the speed of merging should be
irrelevant. What's relevant is the quality of the work done in the first
place.

If you want better quality code, penalise the people who get buggy code
merged. Give them a reason to get it in a better state before they try
to merge. Of course Linus alone can't do that.

Nigel



* Re: Slow DOWN, please!!!
  2008-04-30 22:51               ` David Miller
  2008-05-01  1:40                 ` Ingo Molnar
@ 2008-05-01  2:48                 ` Adrian Bunk
  1 sibling, 0 replies; 229+ messages in thread
From: Adrian Bunk @ 2008-05-01  2:48 UTC (permalink / raw)
  To: David Miller; +Cc: mingo, akpm, torvalds, rjw, linux-kernel, jirislaby, tglx

On Wed, Apr 30, 2008 at 03:51:49PM -0700, David Miller wrote:
> From: Ingo Molnar <mingo@elte.hu>
> Date: Thu, 1 May 2008 00:35:09 +0200
> 
> > 
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > What we need is not 'negative reinforcement'. That is just nasty, open 
> > > warfare between isolated parties, expressed in a politically correct 
> > > way.
> > 
> > in more detail: any "negative reinforcement" should be on the 
> > _technical_ level, i.e. when changes are handled - not at the broad tree 
> > level.
> 
> Sure, and I'll provide some right here.
> 
> Ingo, let me know what I need to do to change your behavior in
> situations like the one I'm about to describe, ok?
> 
> Today, you merged in this bogus "regression fix".
> 
> commit ae3a0064e6d69068b1c9fd075095da062430bda9
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Wed Apr 30 00:15:31 2008 +0200
> 
>     inlining: do not allow gcc below version 4 to optimize inlining
>     
>     fix the condition to match intention: always use the old inlining
>     behavior on all gcc versions below 4.
>     
>     this should solve the UML build problem.
>     
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> Did you actually read the UML build failure report?
> 
> Adrian Bunk specifically stated that the UML build failure regression
> occurs with GCC version 4.3
> 
> Next, did you test this regression fix?
> 
> Next, if you could not test this regression fix, did you wait
> patiently for the bug reporter to validate your fix?  Adrian
> responded that it didn't fix the problem, but that was after
> you queued this up to Linus already.
>...

You got the facts wrong; it is even worse:

It was Ingo himself who reported this bug. [1]

Ingo managed to send an untested and not working patch for a bug he 
reported himself...

cu
Adrian

BTW: I finally figured out what is behind the problems on UML, and this
     is not related to any recent kernel changes.
     Patch comes when I'm awake again.

[1] http://lkml.org/lkml/2008/4/26/151

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: Slow DOWN, please!!!
  2008-05-01  1:57       ` Jeff Garzik
@ 2008-05-01  2:52         ` Frans Pop
  0 siblings, 0 replies; 229+ messages in thread
From: Frans Pop @ 2008-05-01  2:52 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: paulus, torvalds, rjw, davem, linux-kernel, akpm, jirislaby

Jeff Garzik wrote:
> Paul Mackerras wrote:
>> By the way, if you do want to make that rule, then there's a really
>> easy way to do it - just pull linux-next, and make that one pull be
>> the entire merge window. :)
> 
> That's a unique and interesting idea...

Full ack.

Especially if there was some kind of "pre-merge linux-next freeze" where 
people (arch maintainers, kernel testers) would be actively invited to do 
pre-merge testing.

During that period only changes that fix reported issues (be it build issues 
or regressions) would be allowed:
- either a revert of the problematic commit
- or a targeted fix

This could even hugely improve the bisectability of mainline after the merge 
as such changes could be merged/rebased into the subsystem tree _before_ 
Linus pulls them into mainline.

Currently I avoid -next and -mm and I also don't do any merge window 
testing. Why? Too much flux, too many issues, too much energy required.
But if there was some sort of pre-merge call for testing of an identifiable 
and relatively stable tree, I would definitely participate in that and be 
willing to spend time to bisect the hell out of any issues I'd find.

Cheers,
FJP


* Re: Slow DOWN, please!!!
  2008-04-30 21:52       ` H. Peter Anvin
@ 2008-05-01  3:24         ` Bob Tracy
  2008-05-01 16:39         ` Valdis.Kletnieks
  1 sibling, 0 replies; 229+ messages in thread
From: Bob Tracy @ 2008-05-01  3:24 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Andrew Morton, Rafael J. Wysocki, davem,
	linux-kernel, jirislaby

H. Peter Anvin wrote:
> Linus Torvalds wrote:
> > 
> > The tester base is simply too small.
> > 
> > Now, if *that* could be improved, that would be wonderful, but I'm not 
> > seeing it as very likely.
> > 
> 
> One thing is that we keep fragmenting the tester base by adding new 
> confidence levels: we now have -mm, -next, mainline -git, mainline -rc, 
> mainline release, stable, distro testing, and distro release (and some 
> distros even have aggressive versus conservative tracks.)  Furthermore, 
> thanks to craniorectal immersion on the part of graphics vendors, a lot 
> of users have to run proprietary drivers on their "main work" systems, 
> which means they can't even test newer releases even if they would dare.

Since I poke my head out of the foxhole every once in a while with a
relatively late-breaking bug report, I thought I should chime in...
Mr. Anvin has pretty much nailed it...

As the kernel development process has evolved, which "confidence level"
I select has evolved as well.  The thing that *hasn't* changed through
the years is, I tend to pick a "confidence level" that is appropriately
close to "mainline" and has an update release schedule roughly compatible
with my ability to keep up with it.  Specifically, if it takes me several
hours to download a patch set, apply it, build the new kernel, and test
on multiple platforms/architectures, then the update release schedule is
probably going to have to be no more often than twice a week if I'm going
to be at all interested in even trying to keep up with it.  In 2008, the
"-rcX" updates are a good fit.  In the not-too-distant past, keeping up
with 2.5.X.Y was no problem.

Yes, I realize I don't *have* to test every revision level in every
major tree, but I don't have to think about which one to pick for testing
if I can keep up with the update release schedule :-).

-- 
------------------------------------------------------------------------
Bob Tracy          |  "I was a beta tester for dirt.  They never did
rct@frus.com       |   get all the bugs out." - Steve McGrew on /.
------------------------------------------------------------------------


* Re: Slow DOWN, please!!!
  2008-04-30 23:29     ` Paul Mackerras
  2008-05-01  1:57       ` Jeff Garzik
@ 2008-05-01  3:47       ` Linus Torvalds
  2008-05-01  4:17         ` Jeff Garzik
  1 sibling, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01  3:47 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Rafael J. Wysocki, David Miller, linux-kernel, Andrew Morton, Jiri Slaby



On Thu, 1 May 2008, Paul Mackerras wrote:
> 
> Having things ready by the time the merge window opens is difficult
> when you don't know when the merge window is going to open.  OK, after
> you release a -rc6 or -rc7, we know it's close, but it could still be
> three weeks off at that point.  Or it could be tomorrow.

Well, if the tree is ready, you shouldn't need to care ;)

That said:

> By the way, if you do want to make that rule, then there's a really
> easy way to do it - just pull linux-next, and make that one pull be
> the entire merge window. :)  But please give us at least a week's
> notice that you're going to do that.

I'm not going to pull linux-next, because I hate how it gets rebuilt every 
time it gets done, so I would basically have to pick one at random, and 
then that would be it.

I also do actually try to spread the early pulls out a _bit_, so that 
if/when problems happen, there's some amount of information in the fact 
that something started showing up between -git2 and -git3.

HOWEVER.

One thing that was discussed when linux-next was starting up was whether I 
would maintain a next branch myself, that people could actually depend on 
(unlike linux-next, which gets rebuilt).

And while I could do that for really core infrastructure changes, I really 
would hate to see something like that become part of the flow - because 
I'd hope things that really require it should be so rare that it's not 
worth it for me to maintain a separate branch for it.

But there could be some kind of carrot here - maybe I could maintain a 
"next" branch myself, not for core infrastructure, but for stuff where the 
maintainer says "hey, I'm ready early, you can pull me into 'next' 
already".

In other words, it wouldn't be "core infrastructure", it would simply be 
stuff that you already know you'd send to me on the first day of the merge 
window. And if by maintaining a "next" branch I could encourage people to 
go early, _and_ let others perhaps build on it and sort out merge 
conflicts (which you can't do well on linux-next, exactly because it's a 
bit of a quick-sand and you cannot depend on merging the same order or 
even the same base in the end), maybe me having a 'next' branch would be 
worth it.

But it would have to be low-maintenance. Something I might open after 
-rc4, say, and something where I'd expect people to only ask me to pull 
_once_ (because they really are mostly ready, and can sort out the rest 
after the merge window), and if they have no open regressions (again, the 
"carrot" for good behaviour).

I'm not saying it's a great idea, but if that kind of flow makes sense to 
people, maybe it should be on the table as an idea or at least see if it 
might work.

But let's see how linux-next works out. Maybe all the subsystem 
maintainers can just get their tree in shape, see that it merges in 
linux-next, and not even need anything else. Then, when the merge window 
opens, if you're ready, just let me know.

			Linus


* Re: Slow DOWN, please!!!
  2008-05-01  1:40                     ` Linus Torvalds
                                         ` (2 preceding siblings ...)
  2008-05-01  2:31                       ` Nigel Cunningham
@ 2008-05-01  3:53                       ` Frans Pop
  2008-05-01 11:38                       ` Rafael J. Wysocki
  4 siblings, 0 replies; 229+ messages in thread
From: Frans Pop @ 2008-05-01  3:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: rjw, w, davem, linux-kernel, akpm, jirislaby

Linus Torvalds wrote:
> IOW, I argue that the high speed of merging very much is a big part of
> what gives us quality in the end. It may result in bugs along the way, but
> it also results in fixes, and lots of people looking at the result (and
> looking at it in *context*, not just as a patch flying around).

The main problem as I see it is with the huge number of hard, confirmed bugs 
that are *not* getting fixed.

With the current development model, developers only really care about 
current regressions. In a large part this is due to the excellent work of 
Rafael with his tracking of regressions since the previous release.
But it does mean older regressions fall by the wayside, even if they've been 
confirmed, bisected and the submitter is responsive.
For a while Natalie Protasevich did some work on trying to get attention for 
older regressions, but that effort seems to have died out.

Two concrete examples from my personal experience:
- http://bugzilla.kernel.org/show_bug.cgi?id=9749; the error:
  sysctl table check failed:
  /dev/parport/parport0/devices/ppdev0/timeslice  Sysctl already exists
  First reported for 2.6.24-rc5, just now confirmed with 2.6.25
  Acknowledged by maintainer, but no follow-up [1].

- http://bugzilla.kernel.org/show_bug.cgi?id=9310; the error:
  completely blank console with FRAMEBUFFER_CONSOLE_DETECT_PRIMARY set when
  framebuffer is active, but no VGA=xxx parameter is passed
  First reported for 2.6.23, confirmed for 2.6.24-rc6, almost certainly
  still present in 2.6.25
  Acknowledged by maintainer, but no follow-up despite later pings.


Another issue is that sometimes developers really are too eager to get their 
changes into mainline even when there are known issues or when they know in 
their heart that the changes have not received enough testing.

Example is a scheduler change [2] that causes a completely reproducible 
regression (music skips and key repeats) on my box with one specific 
workload. Ingo and Peter have been great doing debugging after I reported 
it for 2.6.25-rc8 and it was reverted just before the release, but I was 
very surprised to see the patch resubmitted for 2.6.26 without the 
regression being resolved first.
It is now confirmed to still be there and there has been additional effort 
on it, but so far without result.

This really is nothing against Ingo (in fact he is in my experience one of 
the most responsive developers when issues are reported), but in this case 
I personally do feel the patch should not have been reintroduced into 
mainline before the regression had been sorted out.

Cheers,
FJP

[1] Update: Eric just added a nice reply in Bugzilla.
[2] http://bugzilla.kernel.org/show_bug.cgi?id=10428
    http://lkml.org/lkml/2008/4/19/181


* Re: Slow DOWN, please!!!
  2008-05-01  3:47       ` Linus Torvalds
@ 2008-05-01  4:17         ` Jeff Garzik
  2008-05-01  4:46           ` Linus Torvalds
  2008-05-01  9:17           ` Alan Cox
  0 siblings, 2 replies; 229+ messages in thread
From: Jeff Garzik @ 2008-05-01  4:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Mackerras, Rafael J. Wysocki, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

Linus Torvalds wrote:
> But there could be some kind of carrot here - maybe I could maintain a 
> "next" branch myself, not for core infrastructure, but for stuff where the 
> maintainer says "hey, I'm ready early, you can pull me into 'next' 
> already".
> 
> In other words, it wouldn't be "core infrastructure", it would simply be 
> stuff that you already know you'd send to me on the first day of the merge 
> window. And if by maintaining a "next" branch I could encourage people to 
> go early, _and_ let others perhaps build on it and sort out merge 
> conflicts (which you can't do well on linux-next, exactly because it's a 
> bit of a quick-sand and you cannot depend on merging the same order or 
> even the same base in the end), maybe me having a 'next' branch would be 
> worth it.

linux-next is _supposed_ to be solely the stuff that is ready to be sent 
to you upon window-open.

The only thing that isn't reliable are the commit ids -- and that's at 
the request of a large majority of maintainers, who noted to Stephen R 
that the branch he was pulling from them might get rebased -- thus 
necessitating the daily tree regeneration.

So, I think a 'next' branch from you would open cans o' worms:

- one more tree to test, and judging from linux-next and -mm it's tough 
to get developers to test more than just upstream

- is the value of holy penguin pee great enough to overcome this 
one-more-tree-to-test obstacle?

- opens all the debates about running parallel branches, such as, would 
it be better to /branch/ for 2.6.X-rc, and then keep going full steam on 
the trunk?  After all, the primary logic behind 2.6.X-rc is to only take 
bug fixes, theoretically focusing developers more on that task.  But now 
we are slowly undoing that logic, or at least openly admitting that has 
been the reality all along.

	Jeff





* Re: Slow DOWN, please!!!
  2008-04-30 19:25                   ` Linus Torvalds
@ 2008-05-01  4:31                     ` David Newall
  2008-05-01  4:37                       ` David Miller
                                         ` (2 more replies)
  0 siblings, 3 replies; 229+ messages in thread
From: David Newall @ 2008-05-01  4:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, linux-kernel

Linus Torvalds wrote:
> I object to your _idiotic_ claim that there are "systemic problems", where 
> your "solution" to them is apparently to stop making releases and stop 
> making forward progress.
>   

I did not say to stop making releases or forward progress. You
completely made that up! I said there are systemic problems, namely
inadequate testing and review. Slow down; don't snatch up crap changes.
Only accept them when they are properly tested and properly reviewed.


> That's why I said what you told us was nothing like that. What you told us were 
> your personal problems, not "systemic" issues.

You asked me to give a specific problem, so I did, but I also said that
the particulars of those problems weren't the point. You have ignored or
twisted everything I said. Did you ask me for a specific problem purely
to attack me with it? Perhaps you did.



Linus Torvalds also wrote:
> You complain how I don't release kernels that 
> are stable, but without any suggestions on what the issue might be

You do release kernels that are unstable, and you call them "stable",
but I'm sure I said that inadequate review and testing are causes, which
I think counts as a suggestion on what the issue might be. It's been a
recurring theme in this thread, and I'm talking about what everybody
else is saying, not what I'm saying, so again, you know that I'm not
making this up.

Stop telling the world that 2.6.25 is ready for them when you know it's
not. It's now ready for beta testing, and no more. Is 2.6.24 ready for
the world yet? There are still problems being reported with it.


> And yes, there is a solution: don't develop so much. Don't allow thousands 
> of developers to be involved. Do a small core group, and make development 
> so hard or inconvenient that you only have a few tens of people who write 
> code, and vet them and force them to jump through hoops when adding new 
> features (or fixing old ones, for that matter).
>   

You're being absurd, even hysterical. How about you require test plans
and test results? Is it possible to require serious, independent code
review?

And let me talk about code review. When one puts one's name to a
Reviewed-by tag, one takes joint responsibility for the result. There
needs to be some sort of balanced accounting. Presently it's all glory:
the records show who has contributed code that made it to mainline, but
nobody counts who broke the system. There's no motive to do a good job;
in fact the opposite is true. The more crap you can sneak in, the more
glory you get.

Don't you go and twist this into some sort of "David wants to point
fingers at people who regularly introduce bugs, which we don't want to
do" and ignore the problem. There is a problem; this entire thread is
testimony to that. You, Linus, are ultimately responsible for what goes
in so you have to acknowledge that there is a problem, you have to stop
shooting the messenger, and you have to shepherd a solution.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  4:31                     ` David Newall
@ 2008-05-01  4:37                       ` David Miller
  2008-05-01 13:49                       ` Lennart Sorensen
  2008-05-01 15:28                       ` Kasper Sandberg
  2 siblings, 0 replies; 229+ messages in thread
From: David Miller @ 2008-05-01  4:37 UTC (permalink / raw)
  To: davidn; +Cc: torvalds, linux-kernel

From: David Newall <davidn@davidnewall.com>
Date: Thu, 01 May 2008 14:01:43 +0930

> Stop telling the world that 2.6.25 is ready for them when you know it's
> not. It's now ready for beta testing, and no more. Is 2.6.24 ready for
> the world yet? There are still problems being reported with it.

This rests on the absurd presumption that something is only stable when
there are zero problems with it.

Fault free software, except in extremely trivial examples, does not
exist in nature.

BTW, this points out another BS aspect of your BSD fan-boy crap: the
BSD userbase is only a tiny fraction of the number of people who use
Linux.  So you can't even compare the number of outstanding problem
reports between the two.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  4:17         ` Jeff Garzik
@ 2008-05-01  4:46           ` Linus Torvalds
  2008-05-04 13:47             ` Krzysztof Halasa
  2008-05-01  9:17           ` Alan Cox
  1 sibling, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01  4:46 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Paul Mackerras, Rafael J. Wysocki, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby



On Thu, 1 May 2008, Jeff Garzik wrote:
> 
> linux-next is _supposed_ to be solely the stuff that is ready to be sent to
> you upon window-open.

Yes, the "stuff" may be supposed to be stable. But the trees feeding it 
certainly are not. People are rebasing them etc, and it doesn't matter 
because I think linux-next starts largely from scratch next time around.

> So, I think a 'next' branch from you would open cans o worms:
> 
> - one more tree to test, and judging from linux-next and -mm it's tough to get
> developers to test more than just upstream
> 
> - is the value of holy penguin pee great enough to overcome this
> another-tree-to-test obstacle?
> 
> - opens all the debates about running parallel branches, such as, would it be
> better to /branch/ for 2.6.X-rc, and then keep going full steam on the trunk?

I do agree. And maybe I should have made it clear that I think it's worth 
it to me only if it then means that the merge window can shrink.

If I'd have both a 'next' branch _and_ a full 2-week merge window, there's 
no upside.

Btw, it wouldn't be another tree to test, since it would presumably be what 
'linux-next' starts out from - so it would purely be something that 
doesn't have the constant re-merging of the more wild-and-crazy 
'linux-next' tree.

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  0:15                   ` Chris Shoemaker
@ 2008-05-01  5:09                     ` Willy Tarreau
  0 siblings, 0 replies; 229+ messages in thread
From: Willy Tarreau @ 2008-05-01  5:09 UTC (permalink / raw)
  To: Chris Shoemaker
  Cc: Rafael J. Wysocki, David Miller, mingo, akpm, torvalds,
	linux-kernel, jirislaby

On Wed, Apr 30, 2008 at 08:15:00PM -0400, Chris Shoemaker wrote:
> On Thu, May 01, 2008 at 01:12:21AM +0200, Willy Tarreau wrote:
> > On Thu, May 01, 2008 at 12:39:01AM +0200, Rafael J. Wysocki wrote:
> > > In fact, so many changes go in at a time during a merge window, that we often
> > > can't really say which of them causes the breakage observed by testers, and
> > > bisection, which IMO should really be a last-resort tool, is used as the main
> > > debugging technique.
> > 
> > Maybe we could slightly improve the process by releasing more often, but
> > based on topics. Small sets of minimally-overlapping topics would get
> > merged in each release, and other topics would only be allowed to pull
> > fixes. That way everybody still gets some work merged, everybody tests
> > and problems are more easily spotted.
> > 
> > I know this is in part what Andrew tries to do when proposing to
> > integrate trees, but maybe some approximate rules should be proposed
> > in order for developers to organize their works. This would begin
> > with announcing topics to be considered for next branch very early.
> > This would also make it more natural for developers to have creation
> > and bug-tracking phases.
> 
> What would this look like, notionally?  Say the releases were twice as
> frequent with Stage A and Stage B.  How could the topic be grouped
> into the stages?  Could bugfixes of any type be merged in either
> window?  Would this only apply to "new" features, API changes, etc? or
> would maintenance-type changes have to be assigned to a stage, too?

Bug fixes are of course always possible; we would just limit the important
changes, i.e. the ones which randomly break things and take a lot of time
to track down because everyone has changed something.

> -chris

willy


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  2:21                       ` Al Viro
@ 2008-05-01  5:19                         ` david
  2008-05-04  3:26                         ` Rene Herman
  1 sibling, 0 replies; 229+ messages in thread
From: david @ 2008-05-01  5:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Rafael J. Wysocki, Willy Tarreau, David Miller,
	linux-kernel, Andrew Morton, Jiri Slaby

On Thu, 1 May 2008, Al Viro wrote:

> FWIW, after the last month's flamefests I decided to actually do something
> about review density of code in the areas I'm theoretically responsible
> for.  Namely, do systematic review of core data structure handling (starting
> with the place where most of the codepaths get into VFS - descriptor tables
> and struct file), doing both blow-by-blow writeup on how that sort of things
> is done and documentation of the life cycle/locking rules/assertions made
> by code/etc.  I made one bad mistake that held the things back for quite
> a while - sending heads-up for one of the worse bugs found in process to
> never-sufficiently-damned vendor-sec.  The last time I'm doing that, TYVM...
>
> Anyway, I'm going to get the notes on that stuff in order and put them in
> the open.  I really hope that other folks will join the fun afterwards.
> The goal is to get a coherent braindump that would be sufficient for
> people new to the area wanting to understand and review VFS-related code -
> both in the tree and in new patches.

Thank you. The lack of good documentation on the intent of the code has 
been a significant barrier for new people. It's (relatively) easy for a 
good programmer to look at the code and figure out how it does things, a 
bit harder to figure out what it does, but why it does it (and what it was 
actually _intended_ to do) is very hard to track down.

> files_struct/fdtable handling is mostly dealt with, struct file is only
> partially done - unfortunately, struct file_lock has to be dealt with
> before that and it's a (predictable) nightmare.  On the other end of
> things, fs_struct is not really started, vfsmount review is partially
> done, dentry/superblock/inode not even touched.
>
> Even with what little had been covered... well, let's just say that it
> caught quite a few fun turds.  With typical age around 3-4 years.  And
> VFS is not the messiest part of the tree...

It may not be the messiest part of the tree, but it's definitely one of 
the hardest to figure out the intent of.

David Lang

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  1:30                 ` Jeremy Fitzhardinge
@ 2008-05-01  5:35                   ` Willy Tarreau
  0 siblings, 0 replies; 229+ messages in thread
From: Willy Tarreau @ 2008-05-01  5:35 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Rafael J. Wysocki, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

On Wed, Apr 30, 2008 at 06:30:43PM -0700, Jeremy Fitzhardinge wrote:
> Linus Torvalds wrote:
> >  And I think publicly announced git trees and -mm and linux-next are 
> >  great partly because they end up doing that same thing. I heartily 
> >  encourage submaintainers to always Cc: linux-kernel when they send me a 
> >  "please pull" request - I don't know if anybody else ever really pulls 
> >  that tree, but I do think that it's very healthy to write that message 
> >  and think of it as a publication event. ]
> >  
> 
> And, ideally, they would have posted the changes as patches to the list 
> for review anyway, so there shouldn't be anything surprising in that pull...

Yes, it's something which has been disappearing since the switch to BK and
then git. It would be impractical and useless to post everything during
the merge window now, but if we can get everyone to pass through
linux-next, the posts will be evenly distributed and it would make sense
to require everyone to post their changes to the list at the same time.
Right now, some developers already always post their changes. Jeff, Greg
and Bartlomiej come to mind, and I must say that I'm always interested in
taking a quick look, just in case something really obvious catches my
attention (which never happens).

Willy


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  1:19                   ` Linus Torvalds
  2008-05-01  1:31                     ` Andrew Morton
  2008-05-01  1:40                     ` Linus Torvalds
@ 2008-05-01  5:50                     ` Willy Tarreau
  2008-05-01 11:53                       ` Rafael J. Wysocki
  2 siblings, 1 reply; 229+ messages in thread
From: Willy Tarreau @ 2008-05-01  5:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Wed, Apr 30, 2008 at 06:19:56PM -0700, Linus Torvalds wrote:
> 
> 
> On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > 
> > > I do _not_ want to slow down development by setting some kind of "quality 
> > > bar" - but I do believe that we should keep our quality high, not because 
> > > of any hoops we need to jump through, but because we take pride in the 
> > > thing we do.
> > 
> > Well, we certainly should, but do we always remember it?  Honest, guv?
> 
> Hey, guv, do you _honestly_ believe that some kind of ISO-9000-like 
> process generates quality?
> 
> And I dislike how people try to conflate "quality" and "merging speed" as 
> if there was any reason what-so-ever to believe that they are related.
> 
> You (and Andrew) have tried to argue that slowing things down results in 
> better quality, and I simply don't for a moment believe that. I believe 
> the exact opposite.

Note that I'm not necessarily arguing for slowing down, but for reduced
functional conflicts (which slow down may help but it's not the only
solution). I think that refining the time resolution might achieve the
same goal. Instead of merging 10000 changes, each of which has a 1% chance
of breaking some other area, and having all developers try to hunt bugs
caused by unrelated changes, I think we could do that in steps.

To illustrate: instead of changing 100 areas, with one of them causing
breakage in the other ones and 100 victims trying to hunt the bug in 99
other areas, then theirs, and finally insulting the faulty author, we
could merge 50 areas in version X and 50 in X+1 (or 3*33, or 4*25,
etc.). That way, we would only have 50 victims trying to find the bug
in 49 other areas (or 32 or 24). Fewer people wasting their time means
faster validation of changes, and possibly a faster release cycle with
better quality.

People send you their crap every two months. If you accept half of
it every month, they don't have to sleep on their code, and at the
same time at most half of them are in trouble during half the time
(since bugs are found faster).
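
To put rough numbers on the argument above (purely illustrative figures,
not anything measured): if every victim of a cross-area breakage has to
search the other areas changed in the same merge, the total hunting
effort shrinks quickly as the merge is split:

```shell
# Illustrative only: n areas changed in total, split into k merges.
# Each "victim" searches the other areas in the same merge, so the
# total is k * (n/k) * (n/k - 1) victim-area searches.
n=100
for k in 1 2 4; do
    per=$((n / k))
    work=$((k * per * (per - 1)))
    echo "k=$k merges of $per areas -> $work victim-area searches"
done
```

In this toy model, splitting one 100-area merge into two 50-area merges
roughly halves the total search work (9900 vs 4900), which is the gain
being described here.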

> So if we can get the discussion *away* from the "let's slow things down", 
> then I'm interested. Because at that point we don't have to fight made-up 
> arguments about something irrelevant.

Well, is "let's split the changes" OK?

> 			Linus

Willy


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:10           ` Andrew Morton
  2008-04-30 22:19             ` Linus Torvalds
  2008-04-30 23:04             ` Dmitri Vorobiev
@ 2008-05-01  6:15             ` Jan Engelhardt
  2 siblings, 0 replies; 229+ messages in thread
From: Jan Engelhardt @ 2008-05-01  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dmitri Vorobiev, torvalds, rjw, davem, linux-kernel, jirislaby, mingo


On Thursday 2008-05-01 00:10, Andrew Morton wrote:
>> 
>> Andrew, the latter thing is a very good point. For me personally, the fact
>> that -mm is not available via git is the major obstacle for trying your
>> tree more frequently than just a few times per year.
>
>Every -mm release is available via git://, as described in the release
>announcements.
[...]
>> How difficult would it
>> be for you to switch to git?
>
>Fatal, I expect.  A tool which manages source-code files is just the wrong
>paradigm.  I manage _changes_ against someone else's source files.

Would you mind using stgit? That way you have the patch-queue
functionality, yet a simple git-push -f will send the whole
patch stack over to a repo (without the stgit bits, that is),
leaving what looks like a regular tree with just lots of
recent commits. It does not even need extra scripts to do a
patchset->git conversion.
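
As a sketch of that suggestion (the 'publish' remote and the series file
name are hypothetical; treat this as an outline of the stgit workflow,
not a recipe):

```shell
# Maintain the patch queue with stgit, then publish it as a plain branch.
stg init                       # manage the current branch as a patch stack
stg import --series series     # import a quilt-style series file (hypothetical name)
stg push --all                 # apply every patch in the stack
git push -f publish HEAD:mm    # force-push; readers just see ordinary commits
```

The force-push is what makes the daily regeneration cheap: the published
branch is rewritten each time, but consumers only ever see a normal git
history.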

>> For busy (or lazy) people like myself, the big problem with linux-next are
>> the frequent merge breakages, when pulling the tree stops with "you are in
>> the middle of a merge conflict".
>
>Really?  Doesn't Stephen handle all those problems?  It should be a clean
>fetch each time?

Indeed, assuming the remote is set up and you have a local branch,
`git reset --hard mm/master` after a fetch is the thing.
But be sure not to have any changed files.
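
Spelled out, that update cycle might look like this ('mm' is whatever
remote name you registered for the -mm git tree; the stash line is one
way to satisfy the "no changed files" caveat):

```shell
# Refresh a local -mm checkout without attempting a merge.
git fetch mm               # get the freshly regenerated tree
git stash                  # park any local edits first, per the caveat above
git reset --hard mm/master # make the work tree match the new tree exactly
```

Because the tree is regenerated (rebased) daily, a hard reset to the
fetched branch is the right move; a pull/merge would hit conflicts
against the rewritten history.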

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-04-30  7:03         ` Arjan van de Ven
@ 2008-05-01  8:13           ` Andrew Morton
  2008-04-30 14:15             ` Arjan van de Ven
  2008-05-01  9:16             ` RFC: starting a kernel-testers group for newbies Frans Pop
  2008-05-01 11:30           ` Adrian Bunk
  1 sibling, 2 replies; 229+ messages in thread
From: Andrew Morton @ 2008-05-01  8:13 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Adrian Bunk, Linus Torvalds, Rafael J. Wysocki, davem,
	linux-kernel, jirislaby, Steven Rostedt

On Wed, 30 Apr 2008 00:03:38 -0700 Arjan van de Ven <arjan@infradead.org> wrote:

> > First of all:
> > I 100% agree with Andrew that our biggest problems are in reviewing
> > code and resolving bugs, not in finding bugs (we already have far too
> > many unresolved bugs).
> 
> I would argue instead that we don't know which bugs to fix first.

<boggle>

How about "a bug which we just added"?  One which is repeatable. 
Repeatable by a tester who is prepared to work with us on resolving it. 
Those bugs.

Rafael has a list of them.  We release kernels when that list still has tens of
unfixed regressions dating back up to a couple of months.


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01  8:13           ` Andrew Morton
  2008-04-30 14:15             ` Arjan van de Ven
@ 2008-05-01  9:16             ` Frans Pop
  2008-05-01 10:30               ` Enrico Weigelt
  1 sibling, 1 reply; 229+ messages in thread
From: Frans Pop @ 2008-05-01  9:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: arjan, bunk, torvalds, rjw, davem, linux-kernel, jirislaby, rostedt

Andrew Morton wrote:
> On Wed, 30 Apr 2008 00:03:38 -0700 Arjan van de Ven <arjan@infradead.org>
> wrote:
>> I would argue instead that we don't know which bugs to fix first.
> 
> How about "a bug which we just added"?

And leave unfixed all the regressions introduced in earlier kernel versions 
and known at the time of the release of that version but still present in 
the current version? Not to mention all the other bugs reported by users of 
recent stable versions?

> One which is repeatable. 
> Repeatable by a tester who is prepared to work with us on resolving it.

That can be true for not-so-recently introduced bugs too.

There are so many bugs out there and developers tend to focus on new ones 
leaving a lot of others unattended, both important and not so important 
ones.

Which ones should someone focus on? Maybe on the ones that they themselves 
introduced (or helped introduce). Maybe that should even sometimes be 
prioritized over introducing new bugs^W^W^Wdoing new development.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  4:17         ` Jeff Garzik
  2008-05-01  4:46           ` Linus Torvalds
@ 2008-05-01  9:17           ` Alan Cox
  1 sibling, 0 replies; 229+ messages in thread
From: Alan Cox @ 2008-05-01  9:17 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Linus Torvalds, Paul Mackerras, Rafael J. Wysocki, David Miller,
	linux-kernel, Andrew Morton, Jiri Slaby

> - opens all the debates about running parallel branches, such as, would 
> it be better to /branch/ for 2.6.X-rc, and then keep going full steam on 
> the trunk?  After all, the primary logic behind 2.6.X-rc is to only take 

That encourages developers to continue ignoring the stabilizing work.
The stall does have a side effect of refocussing them. A branch for -rc
and a monthly cycle would be interesting, as it would mean that the
pushback for not fixing stability problems would be not getting your
work pulled into the main tree if you didn't fix the bugs first - which
could be both a sufficient incentive and not as vicious as it would be
with a two-month cycle.

Alan

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01  9:16             ` RFC: starting a kernel-testers group for newbies Frans Pop
@ 2008-05-01 10:30               ` Enrico Weigelt
  2008-05-01 13:02                 ` Adrian Bunk
  0 siblings, 1 reply; 229+ messages in thread
From: Enrico Weigelt @ 2008-05-01 10:30 UTC (permalink / raw)
  To: linux kernel list


<big_snip />

Hi folks,


What do you think about Gentoo's "bug-wrangler" concept?
Maybe we could do something similar:

A tester group (which e.g. should be the entry point for newbies)
is responsible for receiving bug reports from users (maybe even 
distro maintainers who're not directly involved in kernel dev.). 
They try to reproduce the bugs and find out as much as they can,
then file a report to the actual kernel devs (only critical bugs 
are kicked directly to the devs with high priority). Maybe this 
group could also keep users informed about fixes, give some 
upgrade advice, etc.

This way we can build good technical support (independent
of distributors ;-P), newbies can learn on the job and the 
load on kernel devs is reduced, so they can better concentrate
on their core competences.


What do you think about this ?


cu
-- 
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service - http://www.metux.de/
---------------------------------------------------------------------
 Please visit the OpenSource QM Taskforce:
 	http://wiki.metux.de/public/OpenSource_QM_Taskforce
 Patches / Fixes for a lot dozens of packages in dozens of versions:
	http://patches.metux.de/
---------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  1:43                       ` Linus Torvalds
@ 2008-05-01 10:59                         ` Rafael J. Wysocki
  2008-05-01 15:26                           ` Linus Torvalds
  0 siblings, 1 reply; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 10:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Willy Tarreau, David Miller, linux-kernel, Jiri Slaby

On Thursday, 1 of May 2008, Linus Torvalds wrote:
> 
> On Wed, 30 Apr 2008, Andrew Morton wrote:
> > 
> > eh?  I argued the opposite: that increasing quality will as a side-effect
> > slow things down.
> 
> Yes, my bad, I realized that when I read through my message and already 
> sent out a fix for my buggy email ;)
> 
> > If we simply throttled things, people would spend more time watching the
> > shopping channel while merging smaller amounts of the same old crap.
> 
> I agree totally. And although some of the time would probably _also_ be 
> spent on the frustrating crap that was designed to do the throttling, that 
> isn't much more productive than watching the shopping channel would be ...

Okay, so what exactly are we going to do to address the issue that I described
in the part of my last message that you skipped?

Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-04-30  7:03         ` Arjan van de Ven
  2008-05-01  8:13           ` Andrew Morton
@ 2008-05-01 11:30           ` Adrian Bunk
  2008-04-30 14:20             ` Arjan van de Ven
  1 sibling, 1 reply; 229+ messages in thread
From: Adrian Bunk @ 2008-05-01 11:30 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Andrew Morton, Rafael J. Wysocki, davem,
	linux-kernel, jirislaby, Steven Rostedt

On Wed, Apr 30, 2008 at 12:03:38AM -0700, Arjan van de Ven wrote:
> On Thu, 1 May 2008 03:31:25 +0300
> Adrian Bunk <bunk@kernel.org> wrote:
> 
> > On Wed, Apr 30, 2008 at 01:31:08PM -0700, Linus Torvalds wrote:
> > > 
> > > 
> > > On Wed, 30 Apr 2008, Andrew Morton wrote:
> > > > 
> > > > <jumps up and down>
> > > > 
> > > > There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!
> > > 
> > > The problem I see with both -mm and linux-next is that they tend to
> > > be better at finding the "physical conflict" kind of issues (ie the
> > > merge itself fails) than the "code looks ok but doesn't actually
> > > work" kind of issue.
> > > 
> > > Why?
> > > 
> > > The tester base is simply too small.
> > > 
> > > Now, if *that* could be improved, that would be wonderful, but I'm
> > > not seeing it as very likely.
> > > 
> > > I think we have fairly good penetration these days with the regular
> > > -git tree, but I think that one is quite frankly a *lot* less scary
> > > than -mm or -next are, and there it has been an absolutely huge
> > > boon to get the kernel into the Fedora test-builds etc (and I
> > > _think_ Ubuntu and SuSE also started something like that).
> > > 
> > > So I'm very pessimistic about getting a lot of test coverage before
> > > -rc1.
> > > 
> > > Maybe too pessimistic, who knows?
> > 
> > First of all:
> > I 100% agree with Andrew that our biggest problems are in reviewing
> > code and resolving bugs, not in finding bugs (we already have far too
> > many unresolved bugs).
> 
> I would argue instead that we don't know which bugs to fix first.
> We're never going to fix all bugs, and to be honest, that's ok.
>...

That might be OK.

But our current status quo is not OK:

Check Rafael's regression lists, asking yourself
"How many regressions are older than two weeks?" 

The kernel Bugzilla currently knows about 212 open regression bugs.
(And many more have not made it into Bugzilla.)

We have unmaintained and de facto unmaintained parts of the kernel where 
even issues that might be easy to fix don't get fixed.

>...
> So there's a few things we (and you / janitors) can do over time to get better data on what issues
> people hit: 
> 1) Get automated collection of issues more widespread. The wider our net, the better we know which
>    issues get hit a lot, and plainly the more data we have on when things start, when they stop, etc.
>    Especially if you get a lot of testers in your project, I'd like them to install the client for easy reporting
>    of issues.
> 2) We should add more WARN_ON()s on "known bad" conditions. If it WARN_ON()'s, we can learn about it via
>    the automated collection. And we can then do the statistics to figure out which ones happen a lot.
> 3) We need to get persistent-across-reboot oops saving going; there are some avenues for this

No disagreement on this; it's just a different issue from our bug-fixing 
problem.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  1:40                     ` Linus Torvalds
                                         ` (3 preceding siblings ...)
  2008-05-01  3:53                       ` Frans Pop
@ 2008-05-01 11:38                       ` Rafael J. Wysocki
  2008-04-30 14:28                         ` Arjan van de Ven
  4 siblings, 1 reply; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 11:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Thursday, 1 of May 2008, Linus Torvalds wrote:
> 
> On Wed, 30 Apr 2008, Linus Torvalds wrote:
> > 
> > You (and Andrew) have tried to argue that slowing things down results in 
> > better quality,
> 
> Sorry, not Andrew. DavidN.
> 
> Andrew argued the other way (quality->slower), which I also happen to not 
> necessarily believe in, but that's a separate argument.
> 
> Nobody should ever argue against raising quality.
> 
> The question could be about "at what cost"? (although I think that's not 
> necessarily a good argument, since I personally suspect that good quality 
> code comes from _lowering_ costs, not raising them).
> 
> But what's really relevant is "how?"
> 
> Now, we do know that open-source code tends to be higher quality (along a 
> number of metrics) than closed source code, and my argument is that it's 
> not because of bike-shedding (aka code review), but simply because the 
> code is out there and available and visible.
> 
> And as a result of that, my personal belief is that the best way to raise 
> quality of code is to distribute it. Yes, as patches for discussion, but 
> even more so as a part of a cohesive whole - as _merged_ patches!
> 
> The thing is, the quality of individual patches isn't what matters! What 
> matters is the quality of the end result. And people are going to be a lot 
> more involved in looking at, testing, and working with code that is 
> merged, rather than code that isn't.
> 
> So _my_ answer to the "how do we raise quality" is actually the exact 
> reverse of what you guys seem to be arguing.
> 
> IOW, I argue that the high speed of merging very much is a big part of 
> what gives us quality in the end. It may result in bugs along the way, but 
> it also results in fixes, and lots of people looking at the result (and 
> looking at it in *context*, not just as a patch flying around).

And we introduce bugs that nobody sees until they appear in a CERT advisory.

IMnsHO, the quick merging results in lots of code that nobody looked at
except for the author, that nobody is looking at, and that nobody will
_ever_ look at.  Simply because there's no time for looking at that code,
since we're supposed to be working around the clock on preparing new code
for the next merge window, testing the already merged code, etc.  Now,
you may hope that this not-looked-at-by-anyone code is of high quality
nevertheless, but I somehow doubt it.

[Note that it's not directly related to the issue at hand, which is the fact
that people affected by regressions are heavily punished by our current
process.  Never mind, though.]

And that's not to mention bugs that appear in the code everybody looked at
and happily reach the mainline because that code has not been tested well
enough before merging.  Take SLUB as an example, if you wish.

The fact is, we're merging stuff with minimal-to-no review and with the
least testing reasonably possible.  Is _that_ supposed to produce high
quality?

Also, I'm not buying the argument that the quality of code improves over
time just because it's open and available to everyone.  That only happens
to code which someone actually looks at or attempts to modify.  This
obviously doesn't apply to the whole kernel code.

For this reason, IMO, we should do our best to ensure that the code being
merged is of high quality already at the moment we merge it.  How to achieve
that is a separate issue.

BTW, we seem to underestimate testing in this discussion.  In fact, the vast
majority of kernel bugs are discovered by testing, so perhaps the way to go
is to make regular testing of the new code a part of the process.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  5:50                     ` Willy Tarreau
@ 2008-05-01 11:53                       ` Rafael J. Wysocki
  2008-05-01 12:11                         ` Will Newton
                                           ` (2 more replies)
  0 siblings, 3 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 11:53 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Thursday, 1 of May 2008, Willy Tarreau wrote:
> On Wed, Apr 30, 2008 at 06:19:56PM -0700, Linus Torvalds wrote:
> > 
> > 
> > On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > > 
> > > > I do _not_ want to slow down development by setting some kind of "quality 
> > > > bar" - but I do believe that we should keep our quality high, not because 
> > > > of any hoops we need to jump through, but because we take pride in the 
> > > > thing we do.
> > > 
> > > Well, we certainly should, but do we always remember about it?  Honest, guv?
> > 
> > Hey, guv, do you _honestly_ believe that some kind of ISO-9000-like 
> > process generates quality?
> > 
> > And I dislike how people try to conflate "quality" and "merging speed" as 
> > if there was any reason what-so-ever to believe that they are related.
> > 
> > You (and Andrew) have tried to argue that slowing things down results in 
> > better quality, and I simply don't for a moment believe that. I believe 
> > the exact opposite.
> 
> Note that I'm not necessarily arguing for slowing down, but for reduced
> functional conflicts (which slow down may help but it's not the only
> solution). I think that refining the time resolution might achieve the
> same goal. Instead of merging 10000 changes which each have 1% chance
> of breaking any other area, and have all developers try to hunt bugs
> caused by unrelated changes, I think we could do that in steps.
> 
> To illustrate, instead of changing 100 areas with one of them causing
> breaking in the other ones, and having 100 victims try to hunt the
> bug in 99 other areas, then theirs, and finally insult the faulty
> author, we could merge 50 areas in version X and 50 in X+1 (or 3*33
> or 4*25, etc...). That way, we would only have 50 victims trying to
> find the bug in 49 other areas (or 32 or 24). Less people wasting
> their time will mean faster validation of changes, and possibly
> faster release cycle with better quality.
> 
> People send you their crap every two months. If you accept half of
> it every month, they don't have to sleep on their code, and at the
> same time at most half of them are in trouble during half the time
> (since bugs are found faster).
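Willy's arithmetic above is easy to sanity-check. A quick back-of-the-envelope
script (a sketch only; the 1% per-area breakage probability and the 100-area
figure are his illustrative numbers, not measured data):

```python
def expected_breakage(areas_per_window, p_break=0.01):
    """Chance that at least one area merged in a window breaks something,
    plus the number of potential 'victims' sharing that window."""
    p_any = 1 - (1 - p_break) ** areas_per_window
    return p_any, areas_per_window

# Merge 100 areas all at once, or split across 2/3/4 smaller windows.
for steps in (1, 2, 3, 4):
    per_window = 100 // steps
    p_any, victims = expected_breakage(per_window)
    print(f"{steps} window(s): {per_window} areas each, "
          f"P(some breakage)={p_any:.2f}, victims per incident={victims}")
```

Splitting does not reduce the total number of bugs, but it shrinks both the
probability that a given window breaks and the number of people who must hunt
an unrelated bug when one slips through.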

Well, as far as I'm concerned, that will work too.

> > So if we can get the discussion *away* from the "let's slow things down", 
> > then I'm interested. Because at that point we don't have to fight made-up 
> > arguments about something irrelevant.
> 
> well, is "let's split changes" ok ?

How about:

(1) Merge a couple of trees at a time (one tree at a time would be ideal, but
    that's impossible due to the total number of trees).
(2) After (1) give testers some time to report problems introduced by the
    merge.
(3) Wait until the most urgent problems are resolved.  Revert the offending
    changes if there's no solution within a given time.
(4) Repeat for another couple of trees.
(5) Arrange things so that every tree gets merged once every two months.

This would also give us an idea of which trees introduce more problems.
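The five steps above amount to a round-robin schedule over the subsystem
trees. A toy sketch (the tree names and the two-at-a-time batch size are
made-up placeholders, not a real list):

```python
from itertools import islice

def batched(trees, batch_size):
    """Yield successive batches of trees for staged merging."""
    it = iter(trees)
    while batch := list(islice(it, batch_size)):
        yield batch

# Hypothetical tree list; the real kernel has far more trees.
trees = ["net", "mm", "sched", "fs", "driver-core", "arm", "x86", "sound"]

# Steps (1)-(4): merge a couple of trees, let testers shake them out,
# resolve or revert, then move on.  Step (5): one slot per tree per cycle.
for slot, batch in enumerate(batched(trees, 2), start=1):
    print(f"slot {slot}: merge {batch}, then wait for regression reports")
```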

Thanks,
Rafael


* Re: Slow DOWN, please!!!
  2008-05-01  1:25                   ` Adrian Bunk
@ 2008-05-01 12:05                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 12:05 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: david, Linus Torvalds, David Miller, linux-kernel, Andrew Morton,
	Jiri Slaby

On Thursday, 1 of May 2008, Adrian Bunk wrote:
> On Thu, May 01, 2008 at 02:56:23AM +0200, Rafael J. Wysocki wrote:
> > On Thursday, 1 of May 2008, Adrian Bunk wrote:
> > > On Thu, May 01, 2008 at 01:45:38AM +0200, Rafael J. Wysocki wrote:
> > > > On Thursday, 1 of May 2008, david@lang.hm wrote:
> > > > > On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > > > > 
> > > > > > On Wednesday, 30 of April 2008, Linus Torvalds wrote:
> > > > > >>
> > > > > >> On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
> > > > > >> So your "fewer commits over a unit of time" doesn't make sense.
> > > > > >
> > > > > > Oh, yes it does.  Equally well you could say that having brakes in a car
> > > > > > didn't make sense, even if you could drive it as fast as the engine allowed
> > > > > > you to. ;-)
> > > > > >
> > > > > >> We have those ten thousand commits. They need to go in. They cannot take
> > > > > >> forever.
> > > > > >
> > > > > > But perhaps some of them can wait a bit longer.
> > > > > 
> > > > > not really, if patches are produced at a rate of 1000/week and you decide 
> > > > > to only accept 2000 of them this month, a month later you have 6000 
> > > > > patches to deal with.
> > > > 
> > > > Well, I think you know how TCP works.  The sender can only send as much
> > > > data as the receiver lets it, no matter how much data there are to send.
> > > > I'm thinking about an analogous approach.
> > > > 
> > > > If the developers who produce those patches know in advance about the rate
> > > > limit and are promised to be treated fairly, they should be able to organize
> > > > their work in a different way.
> > > >...
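The TCP analogy in the quoted text - the receiver, not the sender, dictates
the pace - could look roughly like this as a merge-queue policy (a sketch
under assumptions: the 2000-patches-per-cycle window is an arbitrary
placeholder, not a number anyone proposed):

```python
from collections import deque

class MergeWindow:
    """Admit at most `window` patches per merge cycle and queue the rest,
    the way a TCP receiver window throttles a fast sender."""
    def __init__(self, window):
        self.window = window
        self.backlog = deque()

    def submit(self, patches):
        self.backlog.extend(patches)

    def open_window(self):
        # The "receiver" drains at its own pace, oldest first, so
        # developers who know the limit in advance are treated fairly.
        n = min(self.window, len(self.backlog))
        return [self.backlog.popleft() for _ in range(n)]

mw = MergeWindow(window=2000)
mw.submit(f"patch-{i}" for i in range(5000))
admitted = mw.open_window()
print(len(admitted), len(mw.backlog))   # 2000 admitted, 3000 still queued
```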
> > > 
> > > We cannot control who develops what.
> > 
> > We don't need to.
> > 
> > > When someone wants some feature or wants to get Linux running on his 
> > > hardware he will always develop the code.
> > > 
> > > We can only control what we merge.
> > 
> > To be exact, we control what we merge and when.  There's no rule saying that
> > every patch has to be merged as soon as it appears to be ready for merging,
> > or during the nearest merge window, AFAICS.
> 
> What currently gets applied to the kernel are between two and three 
> million lines changed per year.
> 
> We can discuss when and how to apply them.
> 
> But unless we want to create an ever-growing backlog we have to merge
> roughly 200,000 changed lines per month on average.
> 
> Even with higher quality criteria that might result in some code not
> being merged, we will still be above 100,000 lines per month on average.
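The arithmetic above, together with the backlog effect david@lang.hm
described earlier in the thread, in script form (the rates are the figures
quoted in the thread, not measurements):

```python
# Adrian's estimate: 2-3 million changed lines merged per year.
lines_per_year = 2_500_000            # midpoint of the 2-3M range
print(lines_per_year // 12)           # roughly 200,000 lines/month

# david@lang.hm's point: patches arrive at ~1000/week but only 2000/month
# are accepted, so the amount of pending work grows without bound.
arrival_per_month = 4 * 1000
accept_per_month = 2000
backlog = 0
for month in range(1, 4):
    backlog += arrival_per_month - accept_per_month
    pending = backlog + arrival_per_month
    print(f"month {month}: {pending} patches to deal with")
```

Month 1 reproduces the 6000-patch figure from the thread; every further month
of rate-limiting adds another 2000 to the pile.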
> 
> > > And the main rationale for the 2.6 development model was that we do no 
> > > longer want distributions to ship kernels with insane amounts of 
> > > patches.
> > 
> > This was an argument against starting a separate development branch in analogy
> > with 2.5, IIRC, and I agree with that.
> >
> > Still, I think we don't need to merge patches at the current rate and it might
> > help improve their overall quality if we didn't.  Of course, the latter is only
> > a speculation, although it's based on my experience.
> 
> See above - what do you want to do if we'd merge less and have a backlog 
> of let's say one million lines to change after one year, much of it 
> already in distribution kernels?
> 
> I also don't like this situation, but we have to cope with it.

Well, I feel that's what Linus is trying to say too. :-)

I, for one, don't really want to cope with a situation I don't feel comfortable
in, because in the long run that leads to growing frustration.  It seems pretty
obvious to me that people generally get more and more frustrated with the
current development process and it will have to be addressed somehow anyway.

If there's a problem, and I think that there really _is_ one, we should at
least try to _address_ it instead of just trying to duck it.

Thanks,
Rafael


* Re: Slow DOWN, please!!!
  2008-05-01 11:53                       ` Rafael J. Wysocki
@ 2008-05-01 12:11                         ` Will Newton
  2008-05-01 13:16                         ` Bartlomiej Zolnierkiewicz
  2008-05-01 19:36                         ` Valdis.Kletnieks
  2 siblings, 0 replies; 229+ messages in thread
From: Will Newton @ 2008-05-01 12:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Willy Tarreau, Linus Torvalds, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

On Thu, May 1, 2008 at 12:53 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> On Thursday, 1 of May 2008, Willy Tarreau wrote:
>  > On Wed, Apr 30, 2008 at 06:19:56PM -0700, Linus Torvalds wrote:
>  > >
>  > >
>  > > On Thu, 1 May 2008, Rafael J. Wysocki wrote:
>  > > >
>  > > > > I do _not_ want to slow down development by setting some kind of "quality
>  > > > > bar" - but I do believe that we should keep our quality high, not because
>  > > > > of any hoops we need to jump through, but because we take pride in the
>  > > > > thing we do.
>  > > >
>  > > > Well, we certainly should, but do we always remember about it?  Honest, guv?
>  > >
>  > > Hey, guv, do you _honestly_ believe that some kind of ISO-9000-like
>  > > process generates quality?
>  > >
>  > > And I dislike how people try to conflate "quality" and "merging speed" as
>  > > if there was any reason what-so-ever to believe that they are related.
>  > >
>  > > You (and Andrew) have tried to argue that slowing things down results in
>  > > better quality, and I simply don't for a moment believe that. I believe
>  > > the exact opposite.
>  >
>  > Note that I'm not necessarily arguing for slowing down, but for reduced
>  > functional conflicts (which slow down may help but it's not the only
>  > solution). I think that refining the time resolution might achieve the
>  > same goal. Instead of merging 10000 changes which each have 1% chance
>  > of breaking any other area, and have all developers try to hunt bugs
>  > caused by unrelated changes, I think we could do that in steps.
>  >
>  > To illustrate, instead of changing 100 areas with one of them causing
>  > breaking in the other ones, and having 100 victims try to hunt the
>  > bug in 99 other areas, then theirs, and finally insult the faulty
>  > author, we could merge 50 areas in version X and 50 in X+1 (or 3*33
>  > or 4*25, etc...). That way, we would only have 50 victims trying to
>  > find the bug in 49 other areas (or 32 or 24). Less people wasting
>  > their time will mean faster validation of changes, and possibly
>  > faster release cycle with better quality.
>  >
>  > People send you their crap every two months. If you accept half of
>  > it every month, they don't have to sleep on their code, and at the
>  > same time at most half of them are in trouble during half the time
>  > (since bugs are found faster).
>
>  Well, as far as I'm concerned, that will work too.
>
>
>  > > So if we can get the discussion *away* from the "let's slow things down",
>  > > then I'm interested. Because at that point we don't have to fight made-up
>  > > arguments about something irrelevant.
>  >
>  > well, is "let's split changes" ok ?
>
>  How about:
>
>  (1) Merge a couple of trees at a time (one tree at a time would be ideal, but
>     that's impossible due to the total number of trees).
>  (2) After (1) give testers some time to report problems introduced by the
>     merge.
>  (3) Wait until the most urgent problems are resolved.  Revert the offending
>     changes if there's no solution within given time.
>  (4) Repeat for another couple of trees.
>  (5) Arrange things so that every tree gets merged once every two months.
>
>  This would also give us an idea of which trees introduce more problems.

Perhaps it would make sense to split the merge window in two: first
week kernel/net/mm/lib etc., second week arch/drivers/fs? Obviously
some changes are going to span those two areas, but it might help in
pinpointing where breakage was introduced, as well as quieting the
thundering herd of pull requests at the start of a merge window,
thereby allowing review to happen over a longer period.

Or I could just be dreaming...


* Re: Slow DOWN, please!!!
  2008-04-30 22:41             ` Andrew Morton
  2008-04-30 23:23               ` Rafael J. Wysocki
  2008-05-01  0:57               ` Adrian Bunk
@ 2008-05-01 12:31               ` Tarkan Erimer
  2008-05-01 15:34                 ` Stefan Richter
  2 siblings, 1 reply; 229+ messages in thread
From: Tarkan Erimer @ 2008-05-01 12:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, rjw, davem, linux-kernel, jirislaby

Andrew Morton wrote:
> So David's Subject: should have been "Do Better, please".  Slowing down is
> just a side-effect.  And, we expect, a tool.
>
>   
To improve the quality of kernel releases, maybe we can create a special
kernel testing tool. This tool should:

- Check for known bugs, regressions, compile errors, etc.

- Have a modular design (plug-in support), so that checks for these
regressions, known bugs, etc. can be implemented easily.

- Have git support, so that when it hits a bug the tool can bisect the
commits, automating the search for the buggy commit.

- Have both a console interface and an X interface, so that not just
developers but also users who want to help find issues can contribute
easily.
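The git-support item above is essentially binary search over history; the
core of what such a tool would automate (on top of `git bisect run` and a
real test script) can be sketched in a few lines. The commit list and the
badness test below are hypothetical stand-ins:

```python
def find_first_bad(commits, is_bad):
    """Binary-search a linear history for the first bad commit, assuming
    the first commit is good and the last is bad - the same contract as
    `git bisect start <bad> <good>` followed by `git bisect run`."""
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid               # regression is at mid or earlier
        else:
            lo = mid + 1           # regression was introduced after mid
    return commits[lo]

# Hypothetical history where the regression landed in commit "c6".
history = [f"c{i}" for i in range(10)]
print(find_first_bad(history, lambda c: int(c[1:]) >= 6))   # -> c6
```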

Just a few things that came to my mind when I thought about it. Any more
ideas/suggestions welcome :-) We could also create a web site for this
project where we track the known regressions, bugs, etc., so that anyone
who wants to contribute code to this tool can easily find those issues.
If anyone is interested in creating/leading such a tool, I can host a
website for the project on our system.

Cheers

Tarkan





* Re: Slow DOWN, please!!!
  2008-04-30 14:28                         ` Arjan van de Ven
@ 2008-05-01 12:41                           ` Rafael J. Wysocki
  2008-04-30 15:06                             ` Arjan van de Ven
  0 siblings, 1 reply; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 12:41 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Willy Tarreau, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

On Wednesday, 30 of April 2008, Arjan van de Ven wrote:
> On Thu, 1 May 2008 13:38:33 +0200
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > BTW, we seem to underestimate testing in this discussion.  In fact,
> > the vast majority of kernel bugs are discovered by testing, so
> > perhaps the way to go is to make regular testing of the new code a
> > part of the process.
> 
> well.. -rc1 to -rc8 are doing that already, somewhat.

Somewhat.

> Can we do better? Always. The more testing the better, and the more
> testers the better.

Testing is not really a part of the process right now, though.  We somehow
hope that the kernel will be tested sufficiently before a major release, but
we don't measure the testing coverage, for example.  Of course, that will
involve more work independent of writing code, but at some point it'll
just become a necessity.

Thanks,
Rafael


* Re: RFC: starting a kernel-testers group for newbies
  2008-04-30 14:15             ` Arjan van de Ven
@ 2008-05-01 12:42               ` David Woodhouse
  2008-04-30 15:02                 ` Arjan van de Ven
  2008-05-05 10:03                 ` Benny Halevy
  2008-05-04 12:45               ` Rene Herman
  1 sibling, 2 replies; 229+ messages in thread
From: David Woodhouse @ 2008-05-01 12:42 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, Adrian Bunk, Linus Torvalds, Rafael J. Wysocki,
	davem, linux-kernel, jirislaby, Steven Rostedt

On Wed, 2008-04-30 at 07:15 -0700, Arjan van de Ven wrote:
> Maybe that's a "boggle" for you; but for me that's symptomatic of
> where we are today: We don't make (effective) prioritization
> decisions. Such decisions are hard, because it effectively means
> telling people "I'm sorry but your bug is not yet important". 

It's not that clear-cut, either. Something which manifests itself as a
build failure or an immediate test failure on m68k alone, might actually
turn out to cause subtle data corruption on other platforms.

You can't always know that it isn't important, just because it only
shows up in some esoteric circumstances. You only really know how
important it was _after_ you've fixed it.

That obviously doesn't help us to prioritise.

-- 
dwmw2



* Re: RFC: starting a kernel-testers group for newbies
  2008-04-30 14:20             ` Arjan van de Ven
@ 2008-05-01 12:53               ` Rafael J. Wysocki
  2008-05-01 13:21               ` Adrian Bunk
  1 sibling, 0 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 12:53 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Adrian Bunk, Linus Torvalds, Andrew Morton, davem, linux-kernel,
	jirislaby, Steven Rostedt

On Wednesday, 30 of April 2008, Arjan van de Ven wrote:
> On Thu, 1 May 2008 14:30:38 +0300
> Adrian Bunk <bunk@kernel.org> wrote:
> 
> > On Wed, Apr 30, 2008 at 12:03:38AM -0700, Arjan van de Ven wrote:
> > > On Thu, 1 May 2008 03:31:25 +0300
> > > Adrian Bunk <bunk@kernel.org> wrote:
> > > 
> > > > On Wed, Apr 30, 2008 at 01:31:08PM -0700, Linus Torvalds wrote:
> > > > > 
> > > > > 
> > > > > On Wed, 30 Apr 2008, Andrew Morton wrote:
> > > > > > 
> > > > > > <jumps up and down>
> > > > > > 
> > > > > > There should be nothing in 2.6.x-rc1 which wasn't in
> > > > > > 2.6.x-mm1!
> > > > > 
> > > > > The problem I see with both -mm and linux-next is that they
> > > > > tend to be better at finding the "physical conflict" kind of
> > > > > issues (ie the merge itself fails) than the "code looks ok but
> > > > > doesn't actually work" kind of issue.
> > > > > 
> > > > > Why?
> > > > > 
> > > > > The tester base is simply too small.
> > > > > 
> > > > > Now, if *that* could be improved, that would be wonderful, but
> > > > > I'm not seeing it as very likely.
> > > > > 
> > > > > I think we have fairly good penetration these days with the
> > > > > regular -git tree, but I think that one is quite frankly a
> > > > > *lot* less scary than -mm or -next are, and there it has been
> > > > > an absolutely huge boon to get the kernel into the Fedora
> > > > > test-builds etc (and I _think_ Ubuntu and SuSE also started
> > > > > something like that).
> > > > > 
> > > > > So I'm very pessimistic about getting a lot of test coverage
> > > > > before -rc1.
> > > > > 
> > > > > Maybe too pessimistic, who knows?
> > > > 
> > > > First of all:
> > > > I 100% agree with Andrew that our biggest problems are in
> > > > reviewing code and resolving bugs, not in finding bugs (we
> > > > already have far too many unresolved bugs).
> > > 
> > > I would argue instead that we don't know which bugs to fix first.
> > > We're never going to fix all bugs, and to be honest, that's ok.
> > >...
> > 
> > That might be OK.
> > 
> > But our current status quo is not OK:
> > 
> > Check Rafael's regressions lists asking yourself
> > "How many regressions are older than two weeks?" 
> 
> "ext4 doesn't compile on m68k".
> YAWN.
> 
> Wrong question...
> "How many bugs that a sizable portion of users will hit in reality are there?"
> is the right question to ask...
> 
> 
> > 
> > We have unmaintained and de facto unmaintained parts of the kernel
> > where even issues that might be easy to fix don't get fixed.
> 
> And how many people are hitting those issues? If a part of the kernel is really
> important to enough people, there tends to be someone who stands up to either fix
> the issue or start de-facto maintaining that part.
> And yes I know there's parts where that doesn't hold. But to be honest, there's
> not that many of them that have active development (and thus get the biggest
> share of regressions)
> 
> > 
> > >...
> > > So there's a few things we (and you / janitors) can do over time to
> > > get better data on what issues people hit: 
> > > 1) Get automated collection of issues more wide spread. The wider
> > > our net the better we know which issues get hit a lot, and plain
> > > the more data we have on when things start, when they stop, etc
> > > etc. Especially if you get a lot of testers in your project, I'd
> > > like them to install the client for easy reporting of issues. 2) We
> > > should add more WARN_ON()s on "known bad" conditions. If it
> > > WARN_ON()'s, we can learn about it via the automated collection.
> > > And we can then do the statistics to figure out which ones happen a
> > > lot. 3) We need to get persistent-across-reboot oops saving going;
> > > there's some venues for this
> > 
> > No disagreement on this, its just a different issue than our bug
> > fixing problem.
> 
> No it's not! Knowing earlier and better which bugs get hit is NOT different
> to our bug fixing "problem", it's in fact an essential part to the solution of it!

Agreed.

Thanks,
Rafael


* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 10:30               ` Enrico Weigelt
@ 2008-05-01 13:02                 ` Adrian Bunk
  0 siblings, 0 replies; 229+ messages in thread
From: Adrian Bunk @ 2008-05-01 13:02 UTC (permalink / raw)
  To: Enrico Weigelt; +Cc: linux kernel list

On Thu, May 01, 2008 at 12:30:00PM +0200, Enrico Weigelt wrote:
> 
> <big_snip />
> 
> Hi folks,
> 
> 
> What do you think about Gentoo's "bug-wrangler" concept?
> Maybe we could do something similar:
> 
> A tester group (which e.g. should be the entry point for newbies)
> is responsible for receiving bug reports from users (maybe even
> distro maintainers who aren't directly involved in kernel development).
> They try to reproduce the bugs and find out as much as they can,
> then file a report to the actual kernel devs (only critical bugs
> are kicked directly to the devs with high priority). Maybe this
> group could also keep users informed about fixes, give upgrade
> advice, etc.
> 
> This way we can build good technical support (independent of
> distributors ;-P), newbies can learn on the job, and the load on
> kernel devs is reduced, so they can better concentrate on their
> core competences.
> 
> What do you think about this ?

Andrew already does more or less this.

The problems are:
- kernel bugs tend to very quickly reach the state where you need expert
  knowledge in some area, and there's definitely not much room for
  newbies in bug handling
- "try to reproduce the bugs" works for most software, but kernel
  bugs often depend on some specific hardware

> cu

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: Slow DOWN, please!!!
  2008-05-01 11:53                       ` Rafael J. Wysocki
  2008-05-01 12:11                         ` Will Newton
@ 2008-05-01 13:16                         ` Bartlomiej Zolnierkiewicz
  2008-05-01 13:53                           ` Rafael J. Wysocki
  2008-05-01 15:29                           ` Ray Lee
  2008-05-01 19:36                         ` Valdis.Kletnieks
  2 siblings, 2 replies; 229+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2008-05-01 13:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Willy Tarreau, Linus Torvalds, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

On Thursday 01 May 2008, Rafael J. Wysocki wrote:
> On Thursday, 1 of May 2008, Willy Tarreau wrote:
> > On Wed, Apr 30, 2008 at 06:19:56PM -0700, Linus Torvalds wrote:
> > > 
> > > 
> > > On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > > > 
> > > > > I do _not_ want to slow down development by setting some kind of "quality 
> > > > > bar" - but I do believe that we should keep our quality high, not because 
> > > > > of any hoops we need to jump through, but because we take pride in the 
> > > > > thing we do.
> > > > 
> > > > Well, we certainly should, but do we always remember about it?  Honest, guv?
> > > 
> > > Hey, guv, do you _honestly_ believe that some kind of ISO-9000-like 
> > > process generates quality?
> > > 
> > > And I dislike how people try to conflate "quality" and "merging speed" as 
> > > if there was any reason what-so-ever to believe that they are related.
> > > 
> > > You (and Andrew) have tried to argue that slowing things down results in 
> > > better quality, and I simply don't for a moment believe that. I believe 
> > > the exact opposite.
> > 
> > Note that I'm not necessarily arguing for slowing down, but for reduced
> > functional conflicts (which slow down may help but it's not the only
> > solution). I think that refining the time resolution might achieve the
> > same goal. Instead of merging 10000 changes which each have 1% chance
> > of breaking any other area, and have all developers try to hunt bugs
> > caused by unrelated changes, I think we could do that in steps.
> > 
> > To illustrate, instead of changing 100 areas with one of them causing
> > breaking in the other ones, and having 100 victims try to hunt the
> > bug in 99 other areas, then theirs, and finally insult the faulty
> > author, we could merge 50 areas in version X and 50 in X+1 (or 3*33
> > or 4*25, etc...). That way, we would only have 50 victims trying to
> > find the bug in 49 other areas (or 32 or 24). Less people wasting
> > their time will mean faster validation of changes, and possibly
> > faster release cycle with better quality.
> > 
> > People send you their crap every two months. If you accept half of
> > it every month, they don't have to sleep on their code, and at the
> > same time at most half of them are in trouble during half the time
> > (since bugs are found faster).
> 
> Well, as far as I'm concerned, that will work too.
> 
> > > So if we can get the discussion *away* from the "let's slow things down", 
> > > then I'm interested. Because at that point we don't have to fight made-up 
> > > arguments about something irrelevant.
> > 
> > well, is "let's split changes" ok ?
> 
> How about:
> 
> (1) Merge a couple of trees at a time (one tree at a time would be ideal, but
>     that's impossible due to the total number of trees).
> (2) After (1) give testers some time to report problems introduced by the
>     merge.
> (3) Wait until the most urgent problems are resolved.  Revert the offending
>     changes if there's no solution within given time.
> (4) Repeat for another couple of trees.
> (5) Arrange things so that every tree gets merged once every two months.
> 
> This would also give us an idea of which trees introduce more problems.

...and what would you do with such information?

I'm not actually worried about my tree, but if (theoretically) it happened
to be amongst the "problematic" ones I would be a bit pissed by the blame
shifting, especially given that it is very difficult to compare different
trees: they (usually) deal with quite different areas of the code (some
are messy and problematic, yet critical, while others can be more forgiving).

Also, slowing things down to focus on quality is really a bad idea.  You can
trust me on this one: I've tried it once on a smaller scale and it was a
big disaster, because people won't focus on quality just because you want
them to.  They'll continue to operate in the usual way and try to work
around you instead (which in turn causes extra tension that may turn into
quiet warfare).  In the end you will have a lot more problems to deal with...

The same goes for any other kind of improvement that incorporates "punishment"
as part of the process.  You are much better off helping people and getting
them to understand that they should change the way they work because it
would also be beneficial for _them_, not only for _you_.

Now, regarding the development model - I think that there is really no need
for a revolution yet; instead we should focus on refining the current process
(which works great IMO).  Just to summarize various ideas given by people:

- try to persuade the few black sheep that skipping linux-next completely for
  a whole patch series is a really bad idea and that they should spend a bit
  more time planning for the merge instead of last-minute assembly+push
  (by doing it right they could spend more time after the merge preparing for
  the next one or fixing old bugs instead of chasing new regressions; overall
  they should have _more_ time for development by doing it right)

- encourage flattening of merges during the merge window, so that instead of
  1-2 big merges per tree at the beginning of the window you have a few
  smaller ones (the majority of maintainers do it this way already)

- more testing of linux-next; distros may be of great help here (-mm and
  -next often catch bugs that you wouldn't have imagined in the first
  place, and they get fixed before the problem propagates into Linus' tree)

- more documentation to lower the entry barrier for people who would like
  to review code (what Al has mentioned in this thread is a great idea,
  so no need for me to repeat it here)

- more co-operation between people from different areas of the code
  (i.e. testing linux-next instead of your own tree)

and, just not to forget, changes happen by people actually putting the work
in, not by endless discussion.

Thanks,
Bart


* Re: RFC: starting a kernel-testers group for newbies
  2008-04-30 14:20             ` Arjan van de Ven
  2008-05-01 12:53               ` Rafael J. Wysocki
@ 2008-05-01 13:21               ` Adrian Bunk
  2008-05-01 15:49                 ` Andrew Morton
  2008-05-02  2:08                 ` Paul Mackerras
  1 sibling, 2 replies; 229+ messages in thread
From: Adrian Bunk @ 2008-05-01 13:21 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Andrew Morton, Rafael J. Wysocki, davem,
	linux-kernel, jirislaby, Steven Rostedt

On Wed, Apr 30, 2008 at 07:20:13AM -0700, Arjan van de Ven wrote:
> On Thu, 1 May 2008 14:30:38 +0300
> Adrian Bunk <bunk@kernel.org> wrote:
> 
> > On Wed, Apr 30, 2008 at 12:03:38AM -0700, Arjan van de Ven wrote:
> > > On Thu, 1 May 2008 03:31:25 +0300
> > > Adrian Bunk <bunk@kernel.org> wrote:
> > > 
> > > > On Wed, Apr 30, 2008 at 01:31:08PM -0700, Linus Torvalds wrote:
> > > > > 
> > > > > 
> > > > > On Wed, 30 Apr 2008, Andrew Morton wrote:
> > > > > > 
> > > > > > <jumps up and down>
> > > > > > 
> > > > > > There should be nothing in 2.6.x-rc1 which wasn't in
> > > > > > 2.6.x-mm1!
> > > > > 
> > > > > The problem I see with both -mm and linux-next is that they
> > > > > tend to be better at finding the "physical conflict" kind of
> > > > > issues (ie the merge itself fails) than the "code looks ok but
> > > > > doesn't actually work" kind of issue.
> > > > > 
> > > > > Why?
> > > > > 
> > > > > The tester base is simply too small.
> > > > > 
> > > > > Now, if *that* could be improved, that would be wonderful, but
> > > > > I'm not seeing it as very likely.
> > > > > 
> > > > > I think we have fairly good penetration these days with the
> > > > > regular -git tree, but I think that one is quite frankly a
> > > > > *lot* less scary than -mm or -next are, and there it has been
> > > > > an absolutely huge boon to get the kernel into the Fedora
> > > > > test-builds etc (and I _think_ Ubuntu and SuSE also started
> > > > > something like that).
> > > > > 
> > > > > So I'm very pessimistic about getting a lot of test coverage
> > > > > before -rc1.
> > > > > 
> > > > > Maybe too pessimistic, who knows?
> > > > 
> > > > First of all:
> > > > I 100% agree with Andrew that our biggest problems are in
> > > > reviewing code and resolving bugs, not in finding bugs (we
> > > > already have far too many unresolved bugs).
> > > 
> > > I would argue instead that we don't know which bugs to fix first.
> > > We're never going to fix all bugs, and to be honest, that's ok.
> > >...
> > 
> > That might be OK.
> > 
> > But our current status quo is not OK:
> > 
> > Check Rafael's regressions lists asking yourself
> > "How many regressions are older than two weeks?" 
> 
> "ext4 doesn't compile on m68k".
> YAWN.
>  
> Wrong question...
> "How many bugs that a sizable portion of users will hit in reality are there?"
> is the right question to ask...
>...

"Kernel oops while running kernbench and tbench on powerpc" took more 
than 2 months to get resolved, and we ship 2.6.25 with this regression.

Granted that compared to x86 there's not a sizable portion of users 
crazy enough to run Linux on powerpc machines...

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01  0:41         ` David Miller
@ 2008-05-01 13:23           ` Adrian Bunk
  0 siblings, 0 replies; 229+ messages in thread
From: Adrian Bunk @ 2008-05-01 13:23 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, akpm, rjw, linux-kernel, jirislaby, rostedt

On Wed, Apr 30, 2008 at 05:41:58PM -0700, David Miller wrote:
> From: Adrian Bunk <bunk@kernel.org>
> Date: Thu, 1 May 2008 03:31:25 +0300
> 
> > - get a mailing list at vger
> 
> kernel-testers@vger.kernel.org has been created, feel free to
> use it

Thanks  :-)
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  4:31                     ` David Newall
  2008-05-01  4:37                       ` David Miller
@ 2008-05-01 13:49                       ` Lennart Sorensen
  2008-05-01 15:28                       ` Kasper Sandberg
  2 siblings, 0 replies; 229+ messages in thread
From: Lennart Sorensen @ 2008-05-01 13:49 UTC (permalink / raw)
  To: David Newall; +Cc: Linus Torvalds, David Miller, linux-kernel

On Thu, May 01, 2008 at 02:01:43PM +0930, David Newall wrote:
[snip]
> Stop telling the world that 2.6.25 is ready for them when you know it's
> not. It's now ready for beta testing, and no more. Is 2.6.24 ready for
> the world yet? There are still problems being reported with it.

If a kernel release works without problems on 9999 out of 10000
machines, is it stable?  How few specific combinations of hardware are
there allowed to be with any problems before you can call it stable?
How do you know a problem you see wasn't tested by 500 people none of
whom had any problems because none of them had the hardware you do?

-- 
Len Sorensen
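
Len's rhetorical questions can be made concrete with a back-of-the-envelope
calculation (the numbers are illustrative assumptions, not measured data):
if a bug only manifests on 1 machine in 10,000, the odds that 500
independent testers all miss it are high.

```python
def p_all_testers_miss(affected_fraction, n_testers):
    """Probability that none of n_testers owns affected hardware,
    assuming testers' machines are drawn independently."""
    return (1 - affected_fraction) ** n_testers

# Hypothetical numbers: bug hits 1 machine in 10000, 500 testers.
p_miss = p_all_testers_miss(1 / 10000, 500)
print(f"chance the bug survives testing: {p_miss:.3f}")  # about 0.951
```

In this toy model the bug almost certainly slips through, which is Len's
point: "500 people tested it" says little about rare hardware combinations.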

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 13:16                         ` Bartlomiej Zolnierkiewicz
@ 2008-05-01 13:53                           ` Rafael J. Wysocki
  2008-05-01 14:35                             ` Bartlomiej Zolnierkiewicz
  2008-05-01 15:29                           ` Ray Lee
  1 sibling, 1 reply; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 13:53 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Willy Tarreau, Linus Torvalds, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

On Thursday, 1 of May 2008, Bartlomiej Zolnierkiewicz wrote:
> On Thursday 01 May 2008, Rafael J. Wysocki wrote:
> > On Thursday, 1 of May 2008, Willy Tarreau wrote:
> > > On Wed, Apr 30, 2008 at 06:19:56PM -0700, Linus Torvalds wrote:
> > > > 
> > > > 
> > > > On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > > > > 
> > > > > > I do _not_ want to slow down development by setting some kind of "quality 
> > > > > > bar" - but I do believe that we should keep our quality high, not because 
> > > > > > of any hoops we need to jump through, but because we take pride in the 
> > > > > > thing we do.
> > > > > 
> > > > > > Well, we certainly should, but do we always remember about it?  Honest, guv?
> > > > 
> > > > Hey, guv, do you _honestly_ believe that some kind of ISO-9000-like 
> > > > process generates quality?
> > > > 
> > > > And I dislike how people try to conflate "quality" and "merging speed" as 
> > > > if there was any reason what-so-ever to believe that they are related.
> > > > 
> > > > You (and Andrew) have tried to argue that slowing things down results in 
> > > > better quality, and I simply don't for a moment believe that. I believe 
> > > > the exact opposite.
> > > 
> > > Note that I'm not necessarily arguing for slowing down, but for reduced
> > > functional conflicts (which slow down may help but it's not the only
> > > solution). I think that refining the time resolution might achieve the
> > > same goal. Instead of merging 10000 changes which each have 1% chance
> > > of breaking any other area, and have all developers try to hunt bugs
> > > caused by unrelated changes, I think we could do that in steps.
> > > 
> > > To illustrate, instead of changing 100 areas with one of them causing
> > > breaking in the other ones, and having 100 victims try to hunt the
> > > bug in 99 other areas, then theirs, and finally insult the faulty
> > > author, we could merge 50 areas in version X and 50 in X+1 (or 3*33
> > > or 4*25, etc...). That way, we would only have 50 victims trying to
> > > find the bug in 49 other areas (or 32 or 24). Less people wasting
> > > their time will mean faster validation of changes, and possibly
> > > faster release cycle with better quality.
> > > 
> > > People send you their crap every two months. If you accept half of
> > > it every month, they don't have to sleep on their code, and at the
> > > same time at most half of them are in trouble during half the time
> > > (since bugs are found faster).
> > 
> > Well, as far as I'm concerned, that will work too.
> > 
> > > > So if we can get the discussion *away* from the "let's slow things down", 
> > > > then I'm interested. Because at that point we don't have to fight made-up 
> > > > arguments about something irrelevant.
> > > 
> > > well, is "let's split changes" ok ?
> > 
> > How about:
> > 
> > (1) Merge a couple of trees at a time (one tree at a time would be ideal, but
> >     that's impossible due to the total number of trees).
> > (2) After (1) give testers some time to report problems introduced by the
> >     merge.
> > (3) Wait until the most urgent problems are resolved.  Revert the offending
> >     changes if there's no solution within given time.
> > (4) Repeat for another couple of trees.
> > (5) Arrange things so that every tree gets merged once every two months.
> > 
> > This would also give us an idea of which trees introduce more problems.
> 
> ...and what would you do with such information?
> 
> I'm not actually worried about my tree but if (theoretically) it happens to
> be amongst the "problematic" ones I would be a bit pissed by blame shifting,
> especially given that it is very difficult to compare different trees as
> they (usually) deal with quite different areas of the code (some are messy
> and problematic, yet critical while others can be more forgiving).
> 
> Also slowing down things to focus on quality is really a bad idea.  You can
> trust me on this one, I've tried it once on a smaller scale and it was a
> big disaster because people won't focus on quality just because you want
> them to.  They'll continue to operate in the usual way and try to work
> around you instead (which in turn causes extra tension that may turn into
> quiet warfare).
> In the end you will have a lot more problems to deal with...

Well, I won't argue with your experience.

> Same goes for any other kind of improvement by incorporating "punishment" as
> part of the process.  You are much better off helping people and getting
> them to understand that they should apply some changes to their way of
> working because it would also be beneficial for _them_, not only for _you_.

I agree.

> Now regarding the development model - I think that there is really no need
> for a revolution yet, instead we should focus on refining the current process
> (which works great IMO), just to summarize various ideas given by people:
> 
> - try to persuade the few black sheep that skipping linux-next completely for
>   whole patch series is a really bad idea and that they should try to spend
>   a bit more time on planning for merge instead of LastMinute assembly+push
>   (by doing it right they could spend more time after merge to prepare for
>   the next one or fixing old bugs instead of chasing new regressions, overall
>   they should have _more_ time for development by doing it right)
> 
> - encourage flattening of merges during the merge window so instead of 1-2 big
>   merges per tree at the beginning of the merge you have few smaller ones
>   (majority of maintainers do it this way already)
> 
> - more testing for linux-next, distros may be of a great help here (-mm and
>   -next often catches bugs that you wouldn't have ever imagined in the first
>   place and they get fixed before the problem propagates into Linus' tree)

There are still too many bugs of this kind that make it into Linus' tree, and
they are the source of this thread.

> - more documentation for lowering the entry barrier for people who would like
>   to review the code (what Al has mentioned in this thread is a great idea
>   so no need for me to repeat it here)

Agreed.

> - more co-operation between people from different areas of the code
>   (i.e. testing linux-next instead of your own tree)

Agreed.

> and just not to forget - changes happen by people actually putting the work
> into them not by endless discussions.

Well, I'm not sure what that's supposed to mean, so I won't comment.

Thanks,
Rafael
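
Willy's split-merge arithmetic quoted above can be sketched numerically (a
hypothetical model assuming each change independently has a 1% chance of
breaking some other area — his illustrative figure, not measured data):

```python
def expected_breakers(n_changes, p_break=0.01):
    """Expected number of merged changes that break another area."""
    return n_changes * p_break

def p_any_breakage(n_changes, p_break=0.01):
    """Probability that a merge batch contains at least one breaker."""
    return 1 - (1 - p_break) ** n_changes

# One big window of 10000 changes vs. two windows of 5000 each:
big = expected_breakers(10000)    # ~100 breakers land at once
split = expected_breakers(5000)   # ~50 per window: a regression hunter
                                  # has half as many unrelated areas to
                                  # suspect, and bugs surface sooner
print(big, split, f"{p_any_breakage(10000):.4f}")
```

Either way breakage is near-certain at these volumes; the split only
shrinks how much unrelated churn each victim must search through.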

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:45     ` Rafael J. Wysocki
  2008-04-30 21:37       ` Linus Torvalds
@ 2008-05-01 13:54       ` Stefan Richter
  2008-05-01 14:06         ` Rafael J. Wysocki
  1 sibling, 1 reply; 229+ messages in thread
From: Stefan Richter @ 2008-05-01 13:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

Rafael J. Wysocki wrote:
> And what do you think is happening _after_ the merge window closes, when
> we're supposed to be fixing bugs?  People work on new code.

That's not correct.  People work on new code before, during, and after 
the merge window.  They also fix bugs before, during, and after it.

> And, in fact, they have to, if they want to be ready for the next merge
> window.

To be ready for the next merge window just means to know which code is 
sufficiently reviewed and tested, and to have it queued up and if 
necessary synchronized with other pending code.
-- 
Stefan Richter
-=====-==--- -=-= ----=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 13:54       ` Stefan Richter
@ 2008-05-01 14:06         ` Rafael J. Wysocki
  0 siblings, 0 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 14:06 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Linus Torvalds, David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Thursday, 1 of May 2008, Stefan Richter wrote:
> Rafael J. Wysocki wrote:
> > And what do you think is happening _after_ the merge window closes, when
> > we're supposed to be fixing bugs?  People work on new code.
> 
> That's not correct.  People work on new code before, during, and after 
> the merge window.  They also fix bugs before, during, and after it.

I'm not quite sure if all of them really do.  Well, I should have said "some
people" instead of just "people" to be fair.

> > And, in fact, they have to, if they want to be ready for the next merge
> > window.
> 
> To be ready for the next merge window just means to know which code is 
> sufficiently reviewed and tested, and to have it queued up and if 
> necessary synchronized with other pending code.

Of course it _should_ mean that, but the fact is that unreviewed and untested
patches are pushed to Linus, at least from time to time.  [Even some known
broken patches were pushed to Linus in the past, but we can't prevent that from
happening by any process changes.]

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 13:53                           ` Rafael J. Wysocki
@ 2008-05-01 14:35                             ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 229+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2008-05-01 14:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Willy Tarreau, Linus Torvalds, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby

On Thursday 01 May 2008, Rafael J. Wysocki wrote:
> On Thursday, 1 of May 2008, Bartlomiej Zolnierkiewicz wrote:
> > On Thursday 01 May 2008, Rafael J. Wysocki wrote:
> > > On Thursday, 1 of May 2008, Willy Tarreau wrote:
> > > > On Wed, Apr 30, 2008 at 06:19:56PM -0700, Linus Torvalds wrote:
> > > > > 
> > > > > 
> > > > > On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > > > > > 
> > > > > > > I do _not_ want to slow down development by setting some kind of "quality 
> > > > > > > bar" - but I do believe that we should keep our quality high, not because 
> > > > > > > of any hoops we need to jump through, but because we take pride in the 
> > > > > > > thing we do.
> > > > > > 
> > > > > > Well, we certainly should, but do we always remember about it?  Honest, guv?
> > > > > 
> > > > > Hey, guv, do you _honestly_ believe that some kind of ISO-9000-like 
> > > > > process generates quality?
> > > > > 
> > > > > And I dislike how people try to conflate "quality" and "merging speed" as 
> > > > > if there was any reason what-so-ever to believe that they are related.
> > > > > 
> > > > > You (and Andrew) have tried to argue that slowing things down results in 
> > > > > better quality, and I simply don't for a moment believe that. I believe 
> > > > > the exact opposite.
> > > > 
> > > > Note that I'm not necessarily arguing for slowing down, but for reduced
> > > > functional conflicts (which slow down may help but it's not the only
> > > > solution). I think that refining the time resolution might achieve the
> > > > same goal. Instead of merging 10000 changes which each have 1% chance
> > > > of breaking any other area, and have all developers try to hunt bugs
> > > > caused by unrelated changes, I think we could do that in steps.
> > > > 
> > > > To illustrate, instead of changing 100 areas with one of them causing
> > > > breaking in the other ones, and having 100 victims try to hunt the
> > > > bug in 99 other areas, then theirs, and finally insult the faulty
> > > > author, we could merge 50 areas in version X and 50 in X+1 (or 3*33
> > > > or 4*25, etc...). That way, we would only have 50 victims trying to
> > > > find the bug in 49 other areas (or 32 or 24). Less people wasting
> > > > their time will mean faster validation of changes, and possibly
> > > > faster release cycle with better quality.
> > > > 
> > > > People send you their crap every two months. If you accept half of
> > > > it every month, they don't have to sleep on their code, and at the
> > > > same time at most half of them are in trouble during half the time
> > > > (since bugs are found faster).
> > > 
> > > Well, as far as I'm concerned, that will work too.
> > > 
> > > > > So if we can get the discussion *away* from the "let's slow things down", 
> > > > > then I'm interested. Because at that point we don't have to fight made-up 
> > > > > arguments about something irrelevant.
> > > > 
> > > > well, is "let's split changes" ok ?
> > > 
> > > How about:
> > > 
> > > (1) Merge a couple of trees at a time (one tree at a time would be ideal, but
> > >     that's impossible due to the total number of trees).
> > > (2) After (1) give testers some time to report problems introduced by the
> > >     merge.
> > > (3) Wait until the most urgent problems are resolved.  Revert the offending
> > >     changes if there's no solution within given time.
> > > (4) Repeat for another couple of trees.
> > > (5) Arrange things so that every tree gets merged once every two months.
> > > 
> > > This would also give us an idea of which trees introduce more problems.
> > 
> > ...and what would you do with such information?
> > 
> > I'm not actually worried about my tree but if (theoretically) it happens to
> > be amongst the "problematic" ones I would be a bit pissed by blame shifting,
> > especially given that it is very difficult to compare different trees as
> > they (usually) deal with quite different areas of the code (some are messy
> > and problematic, yet critical while others can be more forgiving).
> > 
> > Also slowing down things to focus on quality is really a bad idea.  You can
> > trust me on this one, I've tried it once on a smaller scale and it was a
> > big disaster because people won't focus on quality just because you want
> > them to.  They'll continue to operate in the usual way and try to work
> > around you instead (which in turn causes extra tension that may turn into
> > quiet warfare).
> > In the end you will have a lot more problems to deal with...
> 
> Well, I won't argue with your experience.
> 
> > Same goes for any other kind of improvement by incorporating "punishment" as
> > part of the process.  You are much better off helping people and getting
> > them to understand that they should apply some changes to their way of
> > working because it would also be beneficial for _them_, not only for _you_.
> 
> I agree.
> 
> > Now regarding the development model - I think that there is really no need
> > for a revolution yet, instead we should focus on refining the current process
> > (which works great IMO), just to summarize various ideas given by people:
> > 
> > - try to persuade the few black sheep that skipping linux-next completely for
> >   whole patch series is a really bad idea and that they should try to spend
> >   a bit more time on planning for merge instead of LastMinute assembly+push
> >   (by doing it right they could spend more time after merge to prepare for
> >   the next one or fixing old bugs instead of chasing new regressions, overall
> >   they should have _more_ time for development by doing it right)
> > 
> > - encourage flattening of merges during the merge window so instead of 1-2 big
> >   merges per tree at the beginning of the merge you have few smaller ones
> >   (majority of maintainers do it this way already)
> > 
> > - more testing for linux-next, distros may be of a great help here (-mm and
> >   -next often catches bugs that you wouldn't have ever imagined in the first
> >   place and they get fixed before the problem propagates into Linus' tree)
> 
> There are still too many bugs of this kind that make it into Linus' tree, and
> they are the source of this thread.

Agreed, but if you trace the path of these bugs into Linus' tree, many of
them follow one of two patterns:

* -mm / -next skipped completely

* short time in -mm / -next (< 2 weeks)

[ disclaimer: this is based on my observations, no hard data to prove it ]

Please also remember that the linux-next concept is still quite _fresh_, with
_plenty_ of room for enhancements like having kernel-du-jour packages for
the most popular distros, doing more automated testing + searching for
error strings in logs etc.

> > - more documentation for lowering the entry barrier for people who would like
> >   to review the code (what Al has mentioned in this thread is a great idea
> >   so no need for me to repeat it here)
> 
> Agreed.
> 
> > - more co-operation between people from different areas of the code
> >   (i.e. testing linux-next instead of your own tree)
> 
> Agreed.
> 
> > and just not to forget - changes happen by people actually putting the work
> > into them not by endless discussions.
> 
> Well, I'm not sure what that's supposed to mean, so I won't comment.

This was not directed at you (you are doing great work BTW) but rather
at some people trolling the thread.

Thanks,
Bart

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:04             ` Dmitri Vorobiev
@ 2008-05-01 15:19               ` Jim Schutt
  0 siblings, 0 replies; 229+ messages in thread
From: Jim Schutt @ 2008-05-01 15:19 UTC (permalink / raw)
  To: linux-kernel

Dmitri Vorobiev <dmitri.vorobiev <at> gmail.com> writes:

> 
> Andrew Morton wrote:
> 

> >> The network setup in our
> >> organization is such that I can use git only over http from that server.
> > 
> > Don't know what to do about that, sorry.  An off-site git->http proxy might
> > work, but I doubt if anyone has written the code.

Maybe your organization's http proxy will let you
tunnel the git protocol through it?

GIT_PROXY_COMMAND as described in 
http://www.gelato.unsw.edu.au/archives/git/0605/20509.html
works for me, except I substitute "nc" for "socket" in the
proxy script.

I.e.:

export GIT_PROXY_COMMAND=/usr/local/bin/proxy-cmd.sh

where proxy-cmd.sh is:

#! /bin/bash
# Ask the proxy to CONNECT-tunnel to the git server ($1:$2), then pass
# stdin through.  The two reads discard the proxy's HTTP status line
# and the blank line that ends its response headers.
(echo "CONNECT $1:$2 HTTP/1.0"; echo; cat ) | \
    nc my.proxy.com proxy_port | (read a; read a; cat )

In .git/config there's also

gitproxy = /usr/local/bin/proxy-cmd.sh

-- Jim
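
The two `read a` calls in Jim's script consume the proxy's status line and
the blank line that ends the response headers, leaving a raw tunnel for the
git protocol.  That handshake can be modeled without a live proxy (the host
and reply below are made-up examples):

```python
def connect_request(host, port):
    """The request proxy-cmd.sh sends to open a tunnel."""
    return f"CONNECT {host}:{port} HTTP/1.0\r\n\r\n"

def strip_proxy_headers(reply):
    """Drop the status line and the blank line that follows it,
    mirroring the two `read a` calls in the shell script."""
    lines = reply.split("\n")
    return "\n".join(lines[2:])

# Simulated proxy reply: status line, blank line, then tunneled data.
reply = "HTTP/1.0 200 Connection established\n\ngit-data"
print(connect_request("git.kernel.org", 9418).strip())
print(strip_proxy_headers(reply))  # git-data
```

Everything after those first two lines is handed straight to git, which is
why the trick works for any TCP protocol the proxy will tunnel.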



^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 10:59                         ` Rafael J. Wysocki
@ 2008-05-01 15:26                           ` Linus Torvalds
  2008-05-01 17:09                             ` Rafael J. Wysocki
  2008-05-01 18:35                             ` Chris Frey
  0 siblings, 2 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01 15:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, Willy Tarreau, David Miller, linux-kernel, Jiri Slaby



On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> 
> Okay, so what exactly are we going to do to address the issue that I described
> in the part of my last message that you skipped?

Umm. I don't really see anything to say. You said:

> Still, the issue at hand is that
> (1) The code merged during a merge window is somewhat opaque from the
>     tester's point of view and if a regression is found, the only practical
>     means to figure out what caused it is to carry out a bisection (which
>     generally is unpleasant, to put it lightly).
> (2) Many regressions are introduced during merge windows (relative to the
>     total amount of code merged they are a few, but the raw numbers are
>     significant) and because of (1) the process of removing them is
>     generally painful for the affected people.
> (3) The suspicion is that the number of regressions introduced during merge
>     windows has something to do with the quality of code being below
>     expectations, that in turn may be related to the fact that it's being
>     developed very rapidly.

And quite frankly, (2) and (3) are both: "merge windows introduce new
bugs", and that's such an uninteresting tautology that I'm left
wordless.  And (1) is just a result of merging lots of stuff.

Of course the new bugs / regressions are introduced during the merge
window.  That's when we merge new code.  New bugs don't generally happen
when you don't get new code. 

And of course finding bugs is always painful to everybody involved.

And of course the bugs indicate something about the quality of code
being merged.  Perfect code wouldn't have bugs.

So what you are stating isn't interesting, and isn't even worthy of
discussion.  The way you state it, the only answer is: don't take new
code, then.  That's what your whole argument always seems to boil down
to, and excuse me for (yet again) finding that argument totally
pointless. 

So let me repeat:

 (1) we have new code. We always *will* have new code, hopefully. A few
     million lines per year.

     If you don't accept this, I don't have anything to say.

 (2) we need a merge window.  That is a direct result not of wanting to
     have lots of code at the same time, but of the _reverse_ issue: we
     want to have times of relative calm.

     And again, if you continue to see the merge window as the
     "problem", rather than as the INEVITABLE result of wanting to have
     a calm period, there's no point in talking to you. 

 (3) Ergo, there's a very fundamental and basic and inescapable result:
     we absolutely _will_ have times when we get lots and lots of new
     code. 

So these are not "problems".  They are *facts*.  Stating them as
problems is stupid and pointless.  I'm not going to discuss this with
you if you cannot get over this. 

So please accept the facts.

Once you accept the facts, you can state the things you can change.  But
the things you cannot change is the merge window, and the fact that we
get a lot of new code at a high rate (where the merge window will
inevitably compress that rate, so that we have _another_ window where
the rate is lower). 

So stop arguing against facts, and start arguing about other things that
can be argued about. That's all I'm saying.

			Linus
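
The bisection in Rafael's point (1) is a binary search over commit history:
`git bisect` needs only about log2(n) build-and-test cycles to isolate one
bad commit among n.  A toy model of that search (plain Python with made-up
commit numbers, not git itself):

```python
def bisect(commits, is_bad):
    """Binary-search for the first bad commit, assuming everything
    before it is good and everything from it onward is bad -- the
    same monotonicity `git bisect` relies on."""
    lo, hi = 0, len(commits) - 1        # search window: good..bad
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1                      # one "build and test" cycle
        if is_bad(commits[mid]):
            hi = mid                    # first bad commit is mid or earlier
        else:
            lo = mid + 1                # mid still good: look later
    return commits[lo], steps

# Hypothetical history: 10000 commits, regression introduced at #6789.
commits = list(range(10000))
culprit, steps = bisect(commits, lambda c: c >= 6789)
print(culprit, steps)   # 6789, found in 14 test cycles
```

The logarithmic step count is why bisection is practical at all — and the
full kernel rebuild and reboot behind each step is why it is still
"unpleasant, to put it lightly".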

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  4:31                     ` David Newall
  2008-05-01  4:37                       ` David Miller
  2008-05-01 13:49                       ` Lennart Sorensen
@ 2008-05-01 15:28                       ` Kasper Sandberg
  2008-05-01 17:49                         ` Russ Dill
  2 siblings, 1 reply; 229+ messages in thread
From: Kasper Sandberg @ 2008-05-01 15:28 UTC (permalink / raw)
  To: David Newall; +Cc: Linus Torvalds, David Miller, linux-kernel

On Thu, 2008-05-01 at 14:01 +0930, David Newall wrote:
<snip>
> 
> Linus Torvalds also wrote:
> > You complain how I don't release kernels that 
> > are stable, but without any suggestions on what the issue might be
> 
> You do release kernels that are unstable, and you call them "stable",
> but I'm sure I said that inadequate review and testing are causes, which
> I think counts as a suggestion on what the issue might be. It's been a
> repeating theme in this thread, and I'm talking about what everybody
> else is saying, not what I'm saying, so again, you know that I'm not
> making this up.

This is kind of bullshit.  You can never be sure that something works
perfectly for everyone; if there were testing so excessive that you
would be willing to make such a bold claim, any "stable" kernel would
spend years in testing.  Linux stability also seems to be okay, and
people who want to lower the risk of problems can simply choose to use
slightly older versions.

What I find more of a problem is the long-term effects and problems of
changes.

For instance, Linux has slowly and steadily been getting a lot more
sensitive to I/O, and a LOT more memory hungry.

I recently found a system with a 2.6.4 kernel, and when I upgraded to
2.6.23, I saw memory usage increase from ~250MB to around 500.  I
upgraded to .25 to see if it was some weird bug, but it is the same.

Unfortunately I cannot investigate more, as I only had the box for a
very short time, but this is a lot more concerning to me.

Unfortunately I don't think I can easily reproduce this, as I am unsure
how to actually test 2.6.0 through .24 easily.

> 
> Stop telling the world that 2.6.25 is ready for them when you know it's
> not. It's now ready for beta testing, and no more. Is 2.6.24 ready for
> the world yet? There are still problems being reported with it.
> 
> 

Well.. its doing a quite nice job on my new workstation :)

<snip>


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 13:16                         ` Bartlomiej Zolnierkiewicz
  2008-05-01 13:53                           ` Rafael J. Wysocki
@ 2008-05-01 15:29                           ` Ray Lee
  2008-05-01 19:03                             ` Willy Tarreau
  1 sibling, 1 reply; 229+ messages in thread
From: Ray Lee @ 2008-05-01 15:29 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Rafael J. Wysocki, Willy Tarreau, Linus Torvalds, David Miller,
	linux-kernel, Andrew Morton, Jiri Slaby

On Thu, May 1, 2008 at 6:16 AM, Bartlomiej Zolnierkiewicz
<bzolnier@gmail.com> wrote:
>
> On Thursday 01 May 2008, Rafael J. Wysocki wrote:
>  > How about:
>  >
>  > (1) Merge a couple of trees at a time (one tree at a time would be ideal, but
>  >     that's impossible due to the total number of trees).
>  > (2) After (1) give testers some time to report problems introduced by the
>  >     merge.
>  > (3) Wait until the most urgent problems are resolved.  Revert the offending
>  >     changes if there's no solution within given time.
>  > (4) Repeat for another couple of trees.
>  > (5) Arrange things so that every tree gets merged once every two months.
>  >
>  > This would also give us an idea of which trees introduce more problems.
>
>  ...and what would you do with such information?
[...]
>  Same goes for any other kind of improvement by incorporating "punishment" as
>  the part of the process.

When a teacher assigns grades in a class, it's not punishment, it's feedback.

I don't think anyone *intends* to push crap into the tree. However,
with the barrier to getting things into the tree so low, some may feel
there's less incentive to try to get things right the first (or
second) time. It would be nice to provide that incentive.

Normally, it'd be peer-review of the uncommitted patches. We don't
have a lot of that going on here, though. So,
peer-review-after-the-fact, ie, who placed this massive turd in the
tree, and everyone swivels an eye over there and asks what went wrong,
and how do we prevent it in the future. Those conversations seem to be
happening already, time to time.

And as a policy suggestion, if we're past rc1 and someone has
identified a commit as the root of a regression/bug, then the policy
should be just to revert it immediately, no questions asked. Let the
original author work with the person who identified the problem and
resend a fixed commit later. We lose testers in the meantime, and
perhaps the extra effort involved in having the author work out the
issues and redo the patch will help prevent drive-by patching in the
future.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 12:31               ` Tarkan Erimer
@ 2008-05-01 15:34                 ` Stefan Richter
  2008-05-02 14:05                   ` Tarkan Erimer
  0 siblings, 1 reply; 229+ messages in thread
From: Stefan Richter @ 2008-05-01 15:34 UTC (permalink / raw)
  To: Tarkan Erimer
  Cc: Andrew Morton, Linus Torvalds, rjw, davem, linux-kernel, jirislaby

Tarkan Erimer wrote:
> To improve the quality of kernel releases, maybe we can create a special 
> kernel testing tool.

A variety of bugs cannot be caught by automated tests.  Notably those 
which happen with rare hardware, or due to very specific interaction 
with hardware, or with very special workloads.

An interesting thing to investigate would be to start at the regression 
meta bugs at bugzilla.kernel.org, go through all bugs which are 
linked from there, and try to figure out
   - whether these bugs could have been found by automated or at least
     semiautomatic tests on pre-merge code, and
   - what those tests would have had to look like, e.g. what equipment
     would have been necessary.

Let's look back at the posting at the thread start:
| On Wed, Apr 30, 2008 at 10:03 AM, David Miller <davem@davemloft.net> 
wrote:
| >  Yesterday, I spent the whole day bisecting boot failures
| >  on my system due to the totally untested linux/bitops.h
| >  optimization, which I fully analyzed and debugged.
...
| >  Yet another bootup regression got added within the last 24
| >  hours.

Bootup regressions can be automatically caught if the necessary machines 
are available, and candidate code gets exposure to test parks of those 
machines.  I hear this is already being done, and increasingly so.  But 
those test parks will only ever cover a tiny fraction of existing 
hardware and cannot be subjected to all code iterations and all possible 
.config permutations, hence will have limited coverage of bugs.

And things like the bitops issue depend on review much more than on 
tests, AFAIU.
-- 
Stefan Richter
-=====-==--- -=-= ----=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 13:21               ` Adrian Bunk
@ 2008-05-01 15:49                 ` Andrew Morton
  2008-05-01  1:13                   ` Arjan van de Ven
                                     ` (2 more replies)
  2008-05-02  2:08                 ` Paul Mackerras
  1 sibling, 3 replies; 229+ messages in thread
From: Andrew Morton @ 2008-05-01 15:49 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Arjan van de Ven, Linus Torvalds, Rafael J. Wysocki, davem,
	linux-kernel, jirislaby, Steven Rostedt

On Thu, 1 May 2008 16:21:59 +0300 Adrian Bunk <bunk@kernel.org> wrote:

> > > But our current status quo is not OK:
> > > 
> > > Check Rafael's regressions lists asking yourself
> > > "How many regressions are older than two weeks?" 
> > 
> > "ext4 doesn't compile on m68k".
> > YAWN.
> >  
> > Wrong question...
> > "How many bugs that a sizable portion of users will hit in reality are there?"
> > is the right question to ask...
> >...
> 
> "Kernel oops while running kernbench and tbench on powerpc" took more 
> than 2 months to get resolved, and we ship 2.6.25 with this regression.

Precisely.  Cherry-picking a single example such as the 68k thing and then
claiming that it reflects the general situation is known as a "fallacy".

> Granted that compared to x86 there's not a sizable portion of users 
> crazy enough to run Linux on powerpc machines...

Another fallacy which Arjan is pushing (even though he doesn't appear to
have realised it) is "all hardware is the same".

Well, it isn't.  And most of our bugs are hardware-specific.  So, I'd
venture, most of our bugs don't affect most people.  So, over time, by
Arjan's "important to enough people" observation we just get more and more
and more unfixed bugs.

And I believe this effect has been occurring.

And please stop regaling us with this kerneloops.org stuff.  It just isn't
very interesting, useful or representative when considering the whole
problem.  Very few kernel bugs result in a trace, and when they do they are
usually easy to fix and, because of this, they will get fixed, often
quickly.  I expect netdevwatchdogeth0transmittimedout.org would tell a
different story.

One thing which muddies all this up is that bug reporters vanish.  Over the
years I have sent thousands and thousands of ping emails to people who have
reported bugs via email, three to six months after the fact.  Some were
solved - maybe a fifth.  About the same proportion of reporters reply and
give some reason why they cannot work on the bug.  In the majority of cases
people don't reply at all and I suspect they're in the same category of
cannot-work-on-the-bug.

And why can't they work on the bug?  Usually, because they found a
workaround.  People aren't going to spend months sitting in front of a
non-functional computer waiting for kernel developers to decide if their
machine is important enough to fix.  They will find a workaround.  They
will buy new hardware.  They will discover "noapic" (234000 google hits and
rising!).  They will swap it with a different machine.  They will switch to
a different distro which for some reason doesn't trigger the bug.  They
will use an older kernel.  They will switch to Solaris.  Etcetera.  People
are clever - they will find a way to get around it.

I figure that after a bug is reported we have maybe 24 to 48 hours to send
a good response before our chances of _ever_ fixing it have begun to
decline sharply due to the clever minds at the other end.

Which leads us to Arjan's third fallacy:

   "How many bugs that a sizable portion of users will hit in reality
   are there?" is the right question to ask...

well no, it isn't.  Because approximately zero of the hardware bugs affect
a sizeable portion of users.  With this logic we will end up with more and
more and more and more bugs each of which affect a tiny number of users. 
Hundreds of different bugs.  You know where this process ends up.

Arjan's fourth fallacy: "We don't make (effective) prioritization
decisions." lol.  This implies that someone somewhere once sat down and
wondered which bug he should most effectively work on.  Well, we don't do
that.  We ignore _all_ the bugs in favour of busily writing new ones.


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:28               ` Dmitri Vorobiev
@ 2008-05-01 16:26                 ` Diego Calleja
  2008-05-01 16:31                   ` Dmitri Vorobiev
  2008-05-02  1:48                   ` Stephen Rothwell
  0 siblings, 2 replies; 229+ messages in thread
From: Diego Calleja @ 2008-05-01 16:26 UTC (permalink / raw)
  To: Dmitri Vorobiev
  Cc: Linus Torvalds, Andrew Morton, rjw, davem, linux-kernel,
	jirislaby, mingo, Stephen Rothwell

On Thu, 01 May 2008 02:28:33 +0400, Dmitri Vorobiev <dmitri.vorobiev@gmail.com> wrote:

> Linus, thanks a lot for the detailed explanation. Indeed, it seems that I foolishly
> tried to duplicate Stephen's work. In the future I'll do as you suggest here.

That "howto" should probably be added to the linux-next announcements...
(CC'ing Stephen)

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 16:26                 ` Diego Calleja
@ 2008-05-01 16:31                   ` Dmitri Vorobiev
  2008-05-02  1:48                   ` Stephen Rothwell
  1 sibling, 0 replies; 229+ messages in thread
From: Dmitri Vorobiev @ 2008-05-01 16:31 UTC (permalink / raw)
  To: Diego Calleja
  Cc: Linus Torvalds, Andrew Morton, rjw, davem, linux-kernel,
	jirislaby, mingo, Stephen Rothwell

Diego Calleja wrote:
> On Thu, 01 May 2008 02:28:33 +0400, Dmitri Vorobiev <dmitri.vorobiev@gmail.com> wrote:
> 
>> Linus, thanks a lot for the detailed explanation. Indeed, it seems that I foolishly
>> tried to duplicate Stephen's work. In the future I'll do as you suggest here.
> 
> That "howto" should probably be added to the linux-next announcements...
> (CC'ing Stephen)
> 

Excellent idea. Thanks, Diego!

Dmitri

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 15:49                 ` Andrew Morton
  2008-05-01  1:13                   ` Arjan van de Ven
@ 2008-05-01 16:38                   ` Steven Rostedt
  2008-05-01 17:18                     ` Andrew Morton
  2008-05-01 17:24                   ` Theodore Tso
  2 siblings, 1 reply; 229+ messages in thread
From: Steven Rostedt @ 2008-05-01 16:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adrian Bunk, Arjan van de Ven, Linus Torvalds, Rafael J. Wysocki,
	davem, linux-kernel, jirislaby


On Thu, 1 May 2008, Andrew Morton wrote:
>
> Arjan's fourth fallacy: "We don't make (effective) prioritization
> decisions." lol.  This implies that someone somewhere once sat down and
> wondered which bug he should most effectively work on.  Well, we don't do
> that.  We ignore _all_ the bugs in favour of busily writing new ones.

And actually, core kernel developers are best for writing new bugs.

Really, the way I started out learning how the kernel ticks was to go and
try to solve some bugs that I was seeing (this was years ago). I get
people saying they want to become kernel developers and asking what new
feature they should work on. Well, honestly, the last thing
a newbie kernel developer should be doing is writing new bugs. We need to
send them to a URL that lists all the known bugs and have them pick one,
any one, and have them solve it. This would be the best way to learn part
of the kernel.

I even find that I understand my own code better when I'm in the debugging
phase.

People here mention different places to look at code, and besides
kerneloops.org I really don't even know where to look for bugs, because I
haven't seen a URL to point me to.

The next time someone asks me how to get started in kernel programming, I
would love to tell them to go and look here, and solve the bugs. I'm
guessing that I should just point them to:

  http://janitor.kernelnewbies.org/

and tell them to focus on real bugs (not just comments and such) to get
fixed if they really want to learn the kernel.

-- Steve


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 21:52       ` H. Peter Anvin
  2008-05-01  3:24         ` Bob Tracy
@ 2008-05-01 16:39         ` Valdis.Kletnieks
  1 sibling, 0 replies; 229+ messages in thread
From: Valdis.Kletnieks @ 2008-05-01 16:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Andrew Morton, Rafael J. Wysocki, davem,
	linux-kernel, jirislaby

[-- Attachment #1: Type: text/plain, Size: 808 bytes --]

On Wed, 30 Apr 2008 14:52:44 PDT, "H. Peter Anvin" said:

> This fragmentation is largely intentional, of course -- everyone can 
> pick a risk level appropriate for them -- but it does mean:
> 
> a) The lag for a patch to ride through the pipeline is pretty long.
> b) The section of people who are going to use the more aggressive trees 
> for "real work" testing is going to be small.

And another problem is that often, it's hard to get good "real work" coverage
over the whole tree.  I just discovered an apparent borkage somewhere in
the networking/wireless area that seems to have gotten into Linus's tree
somewhere between 24-rc8 and 24-final, just because I haven't beaten on
my wireless card in the last few weeks, so I didn't notice a regression in
'ip link show' related to the rfkill switch...


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 15:26                           ` Linus Torvalds
@ 2008-05-01 17:09                             ` Rafael J. Wysocki
  2008-05-01 17:41                               ` Linus Torvalds
  2008-05-01 18:35                             ` Chris Frey
  1 sibling, 1 reply; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 17:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Willy Tarreau, David Miller, linux-kernel, Jiri Slaby

On Thursday, 1 of May 2008, Linus Torvalds wrote:
> 
> On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > 
> > Okay, so what exactly are we going to do to address the issue that I described
> > in the part of my last message that you skipped?
> 
> Umm. I don't really see anything to say. You said:
> 
> > Still, the issue at hand is that
> > (1) The code merged during a merge window is somewhat opaque from the  tester's
> >      point of view and if a regression is found, the only practical means  to
> >     figure out what caused it is to carry out a bisection (which generally  is
> >     unpleasant, to put it lightly).
> > (2) Many regressions are introduced during merge windows (relative to the
> >     total amount of code merged they are a few, but the raw numbers are
> >     significant) and because of (1) the process of removing them is  generally
> >     painful for the affected people.
> > (3) The suspicion is that the number of regressions introduced during  merge
> >     windows has something to do with the quality of code being below
> >     expectations, that in turn may be related to the fact that it's being
> >     developed very rapidly.
> 
> And quite frankly, (2) and (3) are both: "merge windows introduce new
> bugs", and that's such an uninteresting tautology that I'm left
> wordless.

Perhaps if they introduced fewer bugs, all of that would be less frustrating to
people who get hit by them, especially by two or more at a time.  Everyone
seems to be fine with that until it happens to him personally (like it happened
to David).

> And (1) is just a result of merging lots of stuff. 
> 
> Of course the new bugs / regressions are introduced during the merge
> window.  That's when we merge new code.  New bugs don't generally happen
> when you don't get new code. 

I obviously agree with that.  The question is, however, if we can decrease the
number of bugs introduced during merge windows and you seem to be saying
that no, we can't.  Which is disappointing.

> And of course finding bugs is always painful to everybody involved.
> 
> And of course the bugs indicate something about the quality of code
> being merged.  Perfect code wouldn't have bugs.
> 
> So what you are stating isn't interesting, and isn't even worthy of
> discussion.  The way you state it, the only answer is: don't take new
> code, then.  That's what your whole argument always seems to boil down
> to, and excuse me for (yet again) finding that argument totally
> pointless. 

I have never said you shouldn't take new code at all.  That's not what I'm
saying and please don't paint me this way.

I see a problem in that you get patches that you shouldn't have got because
they are unfinished and not well thought through.  They introduce regressions
which are only possible to find using bisection because of the amount of code
merged at a time and that's frustrating.

You seem to be regarding this as a necessity, but I'm really not convinced
that you're right in that.

> So let me repeat:
> 
>  (1) we have new code. We always *will* have new code, hopefully. A few
>      million lines per year.
> 
>      If you don't accept this, I don't have anything to say.
> 
>  (2) we need a merge window.  That is a direct result not of wanting to
>      have lots of code at the same time, but of the _reverse_ issue: we
>      want to have times of relative calm.
> 
>      And again, if you continue to see the merge window as the
>      "problem", rather than as the INEVITABLE result of wanting to have
>      a calm period, there's no point in talking to you. 

However, the length of the merge window is not a predetermined thing and might
be adjusted, for example.  Other things might be changed too.

>  (3) Ergo, there's a very fundamental and basic and inescapable result:
>      we absolutely _will_ have times when we get lots and lots of new
>      code. 

But that need not include obviously broken patches.

> So these are not "problems".  They are *facts*.  Stating them as
> problems is stupid and pointless.  I'm not going to discuss this with
> you if you cannot get over this. 
> 
> So please accept the facts.
>
> Once you accept the facts, you can state the things you can change.  But
> the things you cannot change is the merge window, and the fact that we
> get a lot of new code at a high rate (where the merge window will
> inevitably compress that rate, so that we have _another_ window where
> the rate is lower). 

The problem is the (relatively small) fraction of patches pushed to you that
is broken.  Some patches are obviously broken, some of them are just not
tested well enough.  The result is pretty much the same in either case.

Now, the question is if we can get rid of that fraction by adjusting the
process somehow.  You're arguing that we can't and so be it.  [This is your
opinion and BTW there's nothing allowing me to call that unreasonable or saying
that you use made up arguments or something like this.]

My opinion is that we could at least try to do something about it.  linux-next
is probably a step in the right direction, though time will tell.  I'm afraid,
though, that I personally can't do much more than I've been doing already to
improve things.
 
> So stop arguing against facts, and start arguing about other things that
> can be argued about. That's all I'm saying.

The message that started this whole thread was not from me and I believe
it was sent for a reason.  So the fact is that at least some people lose their
patience over the current handling of merge windows.  And I'm not sure that's
necessary.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 16:38                   ` Steven Rostedt
@ 2008-05-01 17:18                     ` Andrew Morton
  0 siblings, 0 replies; 229+ messages in thread
From: Andrew Morton @ 2008-05-01 17:18 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: bunk, arjan, torvalds, rjw, davem, linux-kernel, jirislaby

On Thu, 1 May 2008 12:38:23 -0400 (EDT)
Steven Rostedt <rostedt@goodmis.org> wrote:

> People here mention different places to look at code, and besides
> kerneloops.org I really don't even know where to look for bugs, because I
> haven't seen a URL to point me to.

bugzilla.kernel.org is, umm, improving.

It would be an interesting exercise for someone to spend a few days seeing
how many of the bugzilla reports they personally can reproduce.  I'd guess
"zero".  There's a lesson in that.

The problem with bugzilla will be that it will be hard to find reports
where the reporter will be able to work with you on the fix - we've let
them go cold.

The most fruitful place to find fixable bugs is linux-kernel.  People who
report bugs there are sufficiently motivated to have actually sent the
email and the bug is still recent, so they probably haven't done the
Solaris install yet.


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 15:49                 ` Andrew Morton
  2008-05-01  1:13                   ` Arjan van de Ven
  2008-05-01 16:38                   ` Steven Rostedt
@ 2008-05-01 17:24                   ` Theodore Tso
  2008-05-01 19:26                     ` Andrew Morton
  2 siblings, 1 reply; 229+ messages in thread
From: Theodore Tso @ 2008-05-01 17:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adrian Bunk, Arjan van de Ven, Linus Torvalds, Rafael J. Wysocki,
	davem, linux-kernel, jirislaby, Steven Rostedt

On Thu, May 01, 2008 at 08:49:19AM -0700, Andrew Morton wrote:
> Another fallacy which Arjan is pushing (even though he doesn't appear to
> have realised it) is "all hardware is the same".
> 
> Well, it isn't.  And most of our bugs are hardware-specific.  So, I'd
> venture, most of our bugs don't affect most people.  So, over time, by
> Arjan's "important to enough people" observation we just get more and more
> and more unfixed bugs.
> 
> And I believe this effect has been occurring.

So the question is if we have a thousand bugs which only affect one
person each, and 70 million Linux users, how much should we beat up
ourselves that 1,000 people can't use a particular version of the
Linux kernel, versus the 99.9% of the people for which the kernel
works just fine?

Sometimes, we can't make everyone happy.

At the recent Linux Collaboration Summit, we had a local user walk up
to a microphone, and loosely paraphrased, said, "WHINE WHINE WHINE
WHINE I have have a $30 DVD drive that doesn't work with Linux.  WHINE
WHINE WHINE WHINE WHINE What are *you* going to do to fix my problem?"

Some people like James responded very diplomatically, with "Well, you
have to understand, the developer might not have your hardware, and
there's a lot of broken out here, etc., etc."  What I wanted to tell
this user was, "Ask not what the Linux development community can do
for you.  Ask what *you* can do for Linux?"  Suppose this person had
filed a kernel bugzilla bug, and it was one of the hundreds or
thousands of non-handled bugs.  Sure, it's a tragedy that bugs pile
up.  But if they pile up because of crappy hardware, that's not a
major tragedy.  If we can figure out how to blacklist it, and move on,
we should do so.  

> And why can't they work on the bug?  Usually, because they found a
> workaround.  People aren't going to spend months sitting in front of a
> non-functional computer waiting for kernel developers to decide if their
> machine is important enough to fix.  They will find a workaround.  They
> will buy new hardware.

Hey, in this particular case, if this user worked around the problem
by buying new hardware, it was probably the right solution.  As far as
we know we don't have a systematic problem where huge numbers of DVD
drives aren't working, so if there are a few odd ball ones that are
out there, we just CAN'T self-flagellate ourselves that we're not
fixing all bugs, and letting some bugs pile up.

> Which leads us to Arjan's third fallacy:
> 
>    "How many bugs that a sizable portion of users will hit in reality
>    are there?" is the right question to ask...
> 
> well no, it isn't.  Because approximately zero of the hardware bugs affect
> a sizeable portion of users.  With this logic we will end up with more and
> more and more and more bugs each of which affect a tiny number of users. 
> Hundreds of different bugs.  You know where this process ends up.

... and maybe we can't solve hardware bugs.  Or maybe crappy hardware
isn't worth holding back Linux development.  And I'm not sure ignoring
it is that horrible of a thing.  And in practice, if it's a hardware
bug in something which is very common, it *will* get noticed very
quickly and fixed.  But if it's a hardware bug in some rare piece
of hardware, the user is going to have to either (a) help us fix it,
or (b) decide that his time is more valuable and that buying another
$30 DVD drive might be a better use of his and our time.

Back when I was the serial driver maintainer, I certainly made those
kinds of triage decisions.  I knew the serial driver was working on
the vast majority of the Linux users, because if it broke in a major
ways, I would hear about it, in spades and get lots and lots of hate
mail.  And there were plenty of crappy ISA boards out there; and I
would help them out when I could, and sometimes spend more volunteer
time helping them by changing one or two outb() to outb_p()'s (yes,
that really made a difference; remember, we're talking about crappy PC
class hardware with hardware bugs), but at the end of the day, past a
certain point, even with a willing and cooperative end-user, I would
have to call it a day, and give up, and tell them to get another
serial card.  (And back in the days of ISA boards, we couldn't even
use blacklists.)

And you know what?  Linux didn't collapse into a steaming pile of dung
when I did that.  We're all volunteers, and we need to recognize there
are limits to what we can do --- otherwise, it will be way too easy to
burn out and become a bitter shell of a maintainer....

Even BSD fan boys will realize that in BSD land, you have to do even
more of this; if there's random broken hardware, or simply a lack of a
device driver, very often your only recourse is to work around the
problem by buying another serial card, or wifi card, or whatever.  And
this happens much more with BSD than Linux, simply because they
support fewer devices to begin with.

					- Ted

P.S.  We should really try to categorize bugs so we can figure out
what percentage of the bugs are device driver bugs, and what
percentage are core kernel bugs, which are "if you stress the system
too badly" sort of bugs, or "if you do something bad like yank the USB
stick without unmounting the filesystem first" sort of thing.  I think
if we did this, the numbers wouldn't look quite so scary, because it's
things like device driver problems with weird sh*t bugs are not
comparable with core functionality bugs in the SLUB allocator, for
example.


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 17:09                             ` Rafael J. Wysocki
@ 2008-05-01 17:41                               ` Linus Torvalds
  2008-05-01 18:11                                 ` Al Viro
                                                   ` (3 more replies)
  0 siblings, 4 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01 17:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, Willy Tarreau, David Miller, linux-kernel, Jiri Slaby



On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> 
> I obviously agree with that.  The question is, however, if we can decrease the
> number of bugs introduced during merge windows and you seem to be saying
> that no, we can't.  Which is disappointing.

No, that's not what I'm saying.

What I *am* saying is that as long as you concentrate on "merge window" 
and "lots of code", you're concentrating not on the problems, but on the 
facts of life. You can't change facts, and even trying is pointless.

What you should concentrate on is not how many patches there are during 
the merge window (because we can't do anything about that) or the fact 
that they all happen in a short timeframe, but about quality of patches 
_regardless_ of merge window.

So if you can make an argument that does not even *try* to change the fact 
that 
 - we have lots of patches
and
 - we have a merge window
and
 - merging patches causes bugs

but argues about quality from some other standpoint, then I can start to 
believe that you have a point.

But as long as you argue about the fact that we merge a lot of stuff, and 
that bugs come in during the merge window, I'm not interested. Arguing 
about facts is totally non-productive.

And as long as people keep saying "let's not merge broken patches" or "we 
should never have bugs", I'll just ignore those kinds of idiotic 
statements. They aren't even arguments, they are wishes, and they are 
unrealistic. If we knew they were broken and had bugs, of course we 
wouldn't merge them.

In short - I'm simply not interested in what you _wish_ reality was. 
People need to first acknowledge reality, and _then_ they may have 
solutions.

So the reality is:
 - we do have tons of patches, and they need to be merged (and furiously)

 - there *will* be bugs. And the number of bugs will inevitably be 
   relative to the number of patches. There is no "perfect", and anybody 
   who argues for a lower number of bugs by lowering the number of patches 
   is an idiot in my book.

 - there *will* be releases, even in the presence of bugs, because holding 
   everything up is simply not an option.

Those are the things that we have to accept. Anything else is just 
dreaming.

Now, what part _can_ we improve and still be realistic?

We can try to improve average quality - the number of bugs will *still* be 
relative to the size of the changes (no getting away from that), but we 
may be able to lower the absolute number of bugs. But not to zero!

And that "not to zero" is IMPORTANT. If you think you can aim for zero 
bugs, I'm simply not interested in discussing it with you. You live in a 
different universe, and we're not talking about the same reality.

And if you're not being realistic, then why the hell would I believe that 
your solutions are realistic? I'd rather take some pills and talk to the 
little purple man living under the deck in my back yard, because at least 
he's amusing, even if he doesn't make much sense either.

And I'm also not in the *least* interested in arguments like "We should 
just improve our quality of patches".

Of course everybody wishes for that. Again, it's not an argument, it's 
just an unrealistic wish, unless you can actually give a suggestion of a 
process or other thing that would actually seem to reach it (without 
assuming other impossible things like "we need more time" or "we need 
more people who just spend their day looking for bugs").

Same goes for "we should all just spend time looking at each others 
patches and trying to find bugs in them". That's not a solution, that's a 
drug-induced dream you're living in. And again, if I want to discuss 
dreams, I'd rather talk about my purple guy, and the bad things he does to 
the hedgehog that lives next door.

So do you have any productive *suggestions*? Some that involve more than 
"let's write less code" or "let's just review each others patches more".

		Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 15:28                       ` Kasper Sandberg
@ 2008-05-01 17:49                         ` Russ Dill
  2008-05-02  1:47                           ` Kasper Sandberg
  0 siblings, 1 reply; 229+ messages in thread
From: Russ Dill @ 2008-05-01 17:49 UTC (permalink / raw)
  To: linux-kernel


> I recently found a system with a 2.6.4 kernel, and when I upgraded to
> 2.6.23, I saw memory usage increase from ~250 MB to around 500 MB. I
> upgraded to .25 to see if it was some weird bug, but it is the same.
> 
> Unfortunately I cannot investigate more, as I only had the box for a
> very short time, but this is a lot more concerning to me.
> 

Memory is not something that is difficult to track. It's likely one of two things:

a) Your card now has 3D support, hooray! and X is mapping more regions, which
isn't really additional RAM usage.

b) Linux is caching more things, hooray! I'm not saying that you are one of
those people who just looks at the free number and doesn't think any further,
but you might be.

or c) the kernel has another 250 MB in kernel data structures, which seems unlikely.
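One quick way to tell the caching case from real growth is to discount page cache and buffers when reading the numbers. This is the usual free+buffers+cached rule of thumb, not an exact accounting, and the field names are the standard /proc/meminfo ones:

```shell
# Rough sketch: read /proc/meminfo and subtract page cache and buffers,
# so "used" reflects memory actually committed rather than opportunistic
# caching. All values in /proc/meminfo are reported in kB.
awk '
/^MemTotal:/ { total   = $2 }
/^MemFree:/  { free    = $2 }
/^Buffers:/  { buffers = $2 }
/^Cached:/   { cached  = $2 }
END {
    printf "total:                    %d kB\n", total
    printf "naive used (total-free):  %d kB\n", total - free
    printf "used minus cache/buffers: %d kB\n", total - free - buffers - cached
}' /proc/meminfo
```

If the second and third numbers diverge by a couple of hundred megabytes, the "extra" usage is just cache that the kernel will give back under memory pressure.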



^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 17:41                               ` Linus Torvalds
@ 2008-05-01 18:11                                 ` Al Viro
  2008-05-01 18:23                                   ` Linus Torvalds
  2008-05-01 18:50                                 ` Willy Tarreau
                                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 229+ messages in thread
From: Al Viro @ 2008-05-01 18:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Andrew Morton, Willy Tarreau, David Miller,
	linux-kernel, Jiri Slaby

On Thu, May 01, 2008 at 10:41:21AM -0700, Linus Torvalds wrote:
> Of course everybody wishes for that. Again, it's not an argument, it's 
> just a unrealistic wish, unless you can actually give a suggestion of a 
> process or other thing that would actually seem to reach it (without 
> assuming other impossible things like "we need more time" or "we need 
> more people who just spend their day looking for bugs").
> 
> Same goes for "we should all just spend time looking at each others 
> patches and trying to find bugs in them". That's not a solution, that's a 
> drug-induced dream you're living in.

As one of those obviously drug-addled freaks who _are_ looking for bugs...
Thank you so fucking much ;-/

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 18:11                                 ` Al Viro
@ 2008-05-01 18:23                                   ` Linus Torvalds
  2008-05-01 18:30                                     ` Linus Torvalds
                                                       ` (2 more replies)
  0 siblings, 3 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01 18:23 UTC (permalink / raw)
  To: Al Viro
  Cc: Rafael J. Wysocki, Andrew Morton, Willy Tarreau, David Miller,
	linux-kernel, Jiri Slaby



On Thu, 1 May 2008, Al Viro wrote:
> On Thu, May 01, 2008 at 10:41:21AM -0700, Linus Torvalds wrote:
> > 
> > Same goes for "we should all just spend time looking at each others 
> > patches and trying to find bugs in them". That's not a solution, that's a 
> > drug-induced dream you're living in.
> 
> As one of those obviously drug-addled freaks who _are_ looking for bugs...
> Thank you so fucking much ;-/

That's not what I meant, and I think you know it.

Of course as many people as possible should look at other peoples patches 
and comment on them. But saying so won't _make_ it so.  And it's also 
something that we have done since day #1 _anyway_, so anybody who thinks 
that it would improve code quality from where we already are, should 
explain how he thinks the increase would be caused, and how it would 
happen.

So when we're looking at improvement suggestions, they should be real 
suggestions that have realistic goals, not just wishes. And they 
shouldn't be the things we *already* do, because then they wouldn't 
be improvements.

In other words: do people have realistic ideas for how to make others 
spend _more_ time looking at patches? And not just _wishing_ people did 
that?

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 18:23                                   ` Linus Torvalds
@ 2008-05-01 18:30                                     ` Linus Torvalds
  2008-05-01 18:58                                     ` Willy Tarreau
  2008-05-01 19:37                                     ` Al Viro
  2 siblings, 0 replies; 229+ messages in thread
From: Linus Torvalds @ 2008-05-01 18:30 UTC (permalink / raw)
  To: Al Viro
  Cc: Rafael J. Wysocki, Andrew Morton, Willy Tarreau, David Miller,
	linux-kernel, Jiri Slaby



On Thu, 1 May 2008, Linus Torvalds wrote:
> 
> In other words: do people have realistic ideas for how to make others 
> spend _more_ time looking at patches? And not just _wishing_ people did 
> that?

Just to throw out an example:

 - make a "Random pending patch of the day" google gadget.

I know that's a bit out there, and I'm not sure the google gadget thing is 
realistic, but I bet I'm not the only one who ends up using the google 
homepage all the time. A button that says "this patch looks ok", "this 
patch looks crap", or "I dunno, give me another one to look at" might be a 
fun game that would encourage people to look at a couple of patches a day.

You get five thousand people doing that occasionally (not every day, but 
maybe when they are bored and look for something more rewarding than 
trying to find bad music videos on youtube), and maybe you'd actually get 
feedback on patches.

Make it pick a random commit that is in linux-next but hasn't been merged 
into main -git yet.

Crazy? Probably. But at least it fits my notion of "let's not just wish 
people did more patch commentary" thing.

IOW, if people are really serious about coming up with ways to improve 
code quality, I really think it needs to be about _practical_ things that 
can fit in our flow or can be extensions to it, not just wishing for 
better quality.

"If wishes were horses, beggars would ride"

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  2:31                       ` Nigel Cunningham
@ 2008-05-01 18:32                         ` Stephen Clark
  0 siblings, 0 replies; 229+ messages in thread
From: Stephen Clark @ 2008-05-01 18:32 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Linus Torvalds, Rafael J. Wysocki, Willy Tarreau, David Miller,
	linux-kernel, Andrew Morton, Jiri Slaby

Nigel Cunningham wrote:
> Hi.
> 
> On Wed, 2008-04-30 at 18:40 -0700, Linus Torvalds wrote:
>> The thing is, the quality of individual patches isn't what matters! What 
>> matters is the quality of the end result. And people are going to be a lot 
>> more involved in looking at, testing, and working with code that is 
>> merged, rather than code that isn't.
> 
> No. People generally expect that code that has been merged does work, so
> they don't look at it unless they're forced to (by a bug or the desire
> to make further modifications in that code) and they don't explicitly
> seek to test it. They just seek to use it.
> 
> When it doesn't work, some of us will go and seek to find the cause,
> others (most?) will simply roll back to whatever they last found to be
> reliable.
> 
> Out of tree code has the same issues.
> 
> The only time code really gets looked at and tested is when there's a
> problem, or when people are explicitly choosing to inspect it (pre-merge
> reviews, eg).
> 
> So my answer to the "how do we raise quality" question would be that
> when writing the code, we put time and effort into properly analysing
> the problem and developing a solution, we put time and effort into
> carefully testing the solution, and we put code in that will help the
> end-user help us to debug issues later (without them necessarily needing
> to git-bisect). After all, good software isn't the result of random (or
> semi-random), unconsidered modifications, but of planning, thought and
> attention to detail.
> 
> In other words, I'm arguing that the speed of merging should be
> irrelevant. What's relevant is the quality of the work done in the first
> place.
> 
> If you want better quality code, penalise the people who get buggy code
> merged. Give them a reason to get it in a better state before they try
> to merge. Of course Linus alone can't do that.
> 
> Nigel
> 
Amen!

-- 

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety."  (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases."  (Thomas Jefferson)



^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 15:26                           ` Linus Torvalds
  2008-05-01 17:09                             ` Rafael J. Wysocki
@ 2008-05-01 18:35                             ` Chris Frey
  2008-05-02 13:22                               ` Enrico Weigelt
  1 sibling, 1 reply; 229+ messages in thread
From: Chris Frey @ 2008-05-01 18:35 UTC (permalink / raw)
  To: linux-kernel

On Thu, May 01, 2008 at 08:26:27AM -0700, Linus Torvalds wrote:
> So let me repeat:
> 
>  (1) we have new code. We always *will* have new code, hopefully. A few
>      million lines pe year.

Pardon this comment from an inexperienced kernel hacker, but it seems to
me that one of the main problems is subsystems stomping on each other
during the merge window, and general confusion as to who is responsible
for the bugs that appear.

Perhaps a shorter merge window, using a round-robin approach, based on
subsystem, would help alleviate these issues?

This would:

	- give people a "known" tree to base their subsystem patches on,
		when their turn comes around

	- give a rough schedule if the round-robin was always consistent
		in order, or made known in advance

	- a shorter window would keep people from waiting too long for
		their turn

	- give those responsible for the currently merged subsystem
		motivation and clarity to fix bugs that do appear during
		their merge window


Problems I see with this approach:

	- those at the end of the cycle get the shaft, if previous changes
		affect their work

	- political issues with determining the order of the round-robin
		schedule


If I'm overlooking something, I'm sure someone will correct me. :-)

- Chris


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 17:41                               ` Linus Torvalds
  2008-05-01 18:11                                 ` Al Viro
@ 2008-05-01 18:50                                 ` Willy Tarreau
  2008-05-01 19:07                                   ` david
  2008-05-01 22:17                                   ` Rafael J. Wysocki
  2008-05-01 19:39                                 ` Friedrich Göpel
  2008-05-01 21:59                                 ` Rafael J. Wysocki
  3 siblings, 2 replies; 229+ messages in thread
From: Willy Tarreau @ 2008-05-01 18:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Andrew Morton, David Miller, linux-kernel, Jiri Slaby

On Thu, May 01, 2008 at 10:41:21AM -0700, Linus Torvalds wrote:
> Same goes for "we should all just spend time looking at each others 
> patches and trying to find bugs in them". That's not a solution, that's a 
> drug-induced dream you're living in.

"all" above is the wrong part. Encourage each other into reviewing code
will definitely *help* (and I did not say fix the problem, OK?). There
are persons who regularly spend some time to review code. I'm thinking
about Al, Andrew, Christoph, Arjan, and maybe many other ones I'm missing,
just that I regularly see them give advices to people who post their patches
on the list. And even if only for that, they deserve some respect, and their
efforts must not be dismissed.

Maybe they are more skilled than anyone else for this job. Maybe they're
so used to doing it that it just takes them a few minutes each time, I
don't know. I wish *more* people could be encouraged to do this work,
which is very likely painful but instructive. If the current reviewers
could give hints on how to save a lot of time, it might motivate more
people to follow them. I suspect that insisting that developers post
their less obvious work to the list(s) is a first step. Maybe at some
point, whenever we see a mail entitled "[GIT] pull request for XXX",
we should all jump on it and ask "when and where was this code reviewed?".

Once again, it's not a fix. It's just one small step towards a saner
process.

> So do you have any productive *suggestions*? Some that involve more than 
> "let's write less code" or "let's just review each others patches more".

It's not much about reviewing each others' patches, it's about showing
one's work to others first. If our developers are encouraged to work
alone in a cave late at night with itching eyes, and send their work
at once every 2 months in a sealed envelope, we'll not solve anything.

I also proposed a more repressive method, inciting the ones with really
bad scores to find crap in others' work in order to remain hidden behind
them. You explained why it would not work. Fine.

I also proposed to group merges by reduced overlapping areas, and to
shorten the merge window and make it (at least) twice as often. Rafael
also proposed to merge core first, then archs, which is a refined variation
on the same principle. I'm not sure I've seen your opinion on this.

Willy


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  2:30                     ` Linus Torvalds
@ 2008-05-01 18:54                       ` Adrian Bunk
  2008-05-14 14:55                       ` Pavel Machek
  1 sibling, 0 replies; 229+ messages in thread
From: Adrian Bunk @ 2008-05-01 18:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, rjw, davem, linux-kernel, jirislaby

On Wed, Apr 30, 2008 at 07:30:13PM -0700, Linus Torvalds wrote:
> 
> 
> On Thu, 1 May 2008, Adrian Bunk wrote:
> > 
> > I am saying that it was merged too early, and that there are points that 
> > should have been addressed before the driver got merged.
> > 
> > Get it submitted for review to linux-kernel.
> > Give the maintainers some time to incorporate all comments.
> > Even one month later it could still have made it into 2.6.25.
> > 
> > The only problem with my suggestion is that it's currently pretty random 
> > whether someone takes the time to review such a driver on linux-kernel.
> 
> Now, I do agree that we could/should have some more process in general. I 
> really _would_ like to have a process in place that basically says:
> 
>  - everything must have gone through lkml at least once
> 
>  - after that point, it should have been in linux-next or the -mm queue
> 
>  - and then it can get merged (and if it didn't get any review by then, 
>    maybe it was because nobody was interested, and it simply won't be 
>    getting any until it oopses or catches peoples interest some other way)
> 
> HOWEVER.
> 
> That process doesn't actually work for everything anyway (a lot of trivial 
> fixes are really best not being so noisy, and various patches that are 
> specific to some subsystem really _are_ better off just discussed on that 
> subsystem mailing lists).

Cc linux-kernel on 3 patches "specific to some subsystem" that add the 
word "select" to Kconfig files and I'll catch at least one bug before it 
enters your tree...

> And perhaps more pertinently, right now that kind of process is very 
> inconvenient (to the point of effectively being impossible) for me to 
> check. Obviously, if the patch comes from Andrew, I know it was in -mm, 
> and I seldom drop those patches for obvious reasons anyway, but the last 
> thing we want is some process that depends even _more_ on Andrew being a 
> burnt-out-excuse-for-a-man in a few years (*).
> 
> So I could ask for people to always have pointers to "it was discussed 
> here" on patches they send (and I'd likely mostly trust them without even 
> bothering to verify), the same way -git maintainers often talk about "most 
> of this has been in -mm for the last two months".

It should be enough to trust maintainers that they follow the rules.

And in the unlikely case that someone didn't follow them, you know whom
you have to watch closely during his next merge requests...

> That might work. But then there would still be the patches that are 
> obvious and don't need them.
> 
> And then even the obvious patches do break. And people will complain. Even 
> though requiring that kind of process for the stupid stuff would just slow 
> everybody down, and would be really painful.

There's a middle way.

Requiring the submission of bigger changes and new drivers to be Cc'ed 
to linux-kernel can help and shouldn't cause real problems.

And requiring this kind of patches to be in linux-next for some time
should also be possible.

Both can improve the quality of the kernel.

Trivial patches and bugfixes might not have to follow these rules, but 
that's similar to e.g. the current merge window process also having 
exceptions for new drivers. 

>...
> 			Linus
>...

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 18:23                                   ` Linus Torvalds
  2008-05-01 18:30                                     ` Linus Torvalds
@ 2008-05-01 18:58                                     ` Willy Tarreau
  2008-05-01 19:37                                     ` Al Viro
  2 siblings, 0 replies; 229+ messages in thread
From: Willy Tarreau @ 2008-05-01 18:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Rafael J. Wysocki, Andrew Morton, David Miller,
	linux-kernel, Jiri Slaby

Our mails have crossed each other. Just following up in this thread,
in case...

On Thu, May 01, 2008 at 11:23:43AM -0700, Linus Torvalds wrote:
> So when we're looking at improvement suggestions, they should be real 
> suggestions that have realistic goals, not just wishes. And they 
> shouldn't be the things we *already* do, because then they wouldn't 
> be improvements.

as explained in last mail, I think that we're doing that far less than
we used to because of the ease of "Linus, please pull from git://master...".

> In other words: do people have realistic ideas for how to make others 
> spend _more_ time looking at patches? And not just _wishing_ people did 
> that?

As explained, I have no problem hijacking pull requests to ask for 1) the
code and 2) the review, when it's not explicitly stated in the message
that it has been reviewed, or that it is an obvious fix. I have no problem
trusting the poster; he should just take care not to lie too often, or he
will get a reputation as a blatant liar.

The only limit is that if I'm alone doing those raids, I'll quickly end up
on every developer's blacklist and nothing will change. *YOU* too have to
enforce this policy.

Willy


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 15:29                           ` Ray Lee
@ 2008-05-01 19:03                             ` Willy Tarreau
  0 siblings, 0 replies; 229+ messages in thread
From: Willy Tarreau @ 2008-05-01 19:03 UTC (permalink / raw)
  To: Ray Lee
  Cc: Bartlomiej Zolnierkiewicz, Rafael J. Wysocki, Linus Torvalds,
	David Miller, linux-kernel, Andrew Morton, Jiri Slaby

On Thu, May 01, 2008 at 08:29:18AM -0700, Ray Lee wrote:
> And as a policy suggestion, if we're past rc1 and someone has
> identified a commit as the root of a regression/bug, then the policy
> should be just to revert it immediately, no questions asked. Let the
> original author work with the person who identified the problem and
> resend a fixed commit later. We lose testers in the meantime, and
> perhaps the extra effort involved in having the author work out the
> issues and redo the patch will help prevent drive-by patching in the
> future.

You make a valid point here: "we lose testers in the meantime". Maybe
it would help if -rc2 were released a few days after -rc1 with fixes
for the first, most obvious showstoppers (often build issues). The most
problematic ones are often fixed within an hour or so, but for most
testers it still means they have to wait for -rc2.

Most external testers might then only try -rc2 first, but that's not
a problem. What we really want is for them to test widely and not revert
back at the first problem. If only 20% of testers try -rc1, and the
remaining 80% actively wait for -rc2 three days later, then we'll get
broader testing in the first two weeks.

Willy


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 18:50                                 ` Willy Tarreau
@ 2008-05-01 19:07                                   ` david
  2008-05-01 19:28                                     ` Willy Tarreau
  2008-05-01 22:17                                   ` Rafael J. Wysocki
  1 sibling, 1 reply; 229+ messages in thread
From: david @ 2008-05-01 19:07 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Rafael J. Wysocki, Andrew Morton, David Miller,
	linux-kernel, Jiri Slaby

On Thu, 1 May 2008, Willy Tarreau wrote:

> I also proposed to group merges by reduced overlapping areas, and to
> shorten the merge window and make it (at least) twice as often. Rafael
> also proposed to merge core first, then archs, which is a refined variation
> on the same principle. I'm not sure I've seen your opinion on this.

the problem with trying to make the cycle twice as fast is that it takes 
time to hunt down the hard bugs, even when you have some idea where they 
are.

go back through the last few kernels and look at the bugs that were fixed 
in the last couple of -rc releases (and in final), would they have really 
been fixed faster if other changes hadn't taken place?

I suspect that they would not have, and if I'm right, the result of
merging half as much wouldn't be twice as many releases, but rather
approximately the same release schedule with more piling up for the
next release.

even individual git trees that do get a fair bit of testing (like 
networking for example) run into odd and hard to debug problems when 
exposed to a wider set of hardware and loads. having the networking 
changes go in every 4 months (with 4 months worth of changes) instead of 
every 2 months (with 2 months worth of changes) will just mean that there 
will be more problems in this area, and since they will be more 
concentrated in that area it will be harder to fix them all fast as the 
same group of people are needed for all of them.

if several maintainers think that you are correct that doing a merge with 
far fewer changes will be a lot faster, they can test this in the real 
world by skipping one release. just send Linus a 'no changes this time' 
instead of a pull request. If you are right the stable release will happen 
significantly faster and they can say 'I told you so' and in the next 
release have a fair chance of convincing other maintainers to skip a 
release.

it does worry me a bit that the release cycle seems to be slipping 
slightly each release, but I don't see a good way to fix this.

David Lang

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 17:24                   ` Theodore Tso
@ 2008-05-01 19:26                     ` Andrew Morton
  2008-05-01 19:39                       ` Steven Rostedt
  2008-05-02 10:23                       ` Andi Kleen
  0 siblings, 2 replies; 229+ messages in thread
From: Andrew Morton @ 2008-05-01 19:26 UTC (permalink / raw)
  To: Theodore Tso
  Cc: bunk, arjan, torvalds, rjw, davem, linux-kernel, jirislaby, rostedt

On Thu, 1 May 2008 13:24:34 -0400
Theodore Tso <tytso@MIT.EDU> wrote:

> ... and maybe we can't solve hardware bugs. 

Many, many of these are regressions.  If old-linux works on that
hardware then new-linux can too.

(still wants to know what we did 2-3 years ago which caused thousands of
people to have to resort to using noapic and other apic-related boot option
workarounds)


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 19:07                                   ` david
@ 2008-05-01 19:28                                     ` Willy Tarreau
  2008-05-01 19:46                                       ` david
  0 siblings, 1 reply; 229+ messages in thread
From: Willy Tarreau @ 2008-05-01 19:28 UTC (permalink / raw)
  To: david
  Cc: Linus Torvalds, Rafael J. Wysocki, Andrew Morton, David Miller,
	linux-kernel, Jiri Slaby

On Thu, May 01, 2008 at 12:07:53PM -0700, david@lang.hm wrote:
> On Thu, 1 May 2008, Willy Tarreau wrote:
> 
> >I also proposed to group merges by reduced overlapping areas, and to
> >shorten the merge window and make it (at least) twice as often. Rafael
> >also proposed to merge core first, then archs, which is a refined variation
> >on the same principle. I'm not sure I've seen your opinion on this.
> 
> the problem with trying to make the cycle twice as fast is that it takes 
> time to hunt down the hard bugs, even when you have some idea where they 
> are.

Of course, there'll always be bugs. They'll still slip past the release,
as many do today.

> go back through the last few kernels and look at the bugs that were fixed 
> in the last couple of -rc releases (and in final), would they have really 
> been fixed faster if other changes hadn't taken place?

Don't know. However, I think that core bugs have more impact on the rest
than other bugs. Reason to merge core first.

> I suspect that they would not have, and if I'm right the result of merging 
> half as much wouldn't be twice as many releases, but rather approximatly 
> the same release schedule with more piling up for the next release.

no, this is exactly what *not* to do. Linus is right about the risk of
getting more stuff at once. If we merge less things, we *must* be able
to speed up the process. Half the patches to cross-check in half the
time should be easier than all patches in full time. The time to fix
a problem within N patches is O(N^2).

> even individual git trees that do get a fair bit of testing (like 
> networking for example) run into odd and hard to debug problems when 
> exposed to a wider set of hardware and loads. having the networking 
> changes go in every 4 months (with 4 months worth of changes) instead of 
> every 2 months (with 2 months worth of changes) will just mean that there 
> will be more problems in this area, and since they will be more 
> concentrated in that area it will be harder to fix them all fast as the 
> same group of people are needed for all of them.

You're perfectly right, and that's exactly not what I'm proposing. BTW,
having two halves will also get more of the merge job done on the
developers' side, where testing is done before submission. So in the
end, we should also get *fewer* regressions caused by each submission.

> if several maintainers think that you are correct that doing a merge with 
> far fewer changes will be a lot faster, they can test this in the real 
> world by skipping one release. just send Linus a 'no changes this time' 
> instead of a pull request. If you are right the stable release will happen 
> significantly faster and they can say 'I told you so' and in the next 
> release have a fair chance of convincing other maintainers to skip a 
> release.

Again, this cannot work, because it would result in slowing them down,
and it's not what I'm proposing.

Willy


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 11:53                       ` Rafael J. Wysocki
  2008-05-01 12:11                         ` Will Newton
  2008-05-01 13:16                         ` Bartlomiej Zolnierkiewicz
@ 2008-05-01 19:36                         ` Valdis.Kletnieks
  2 siblings, 0 replies; 229+ messages in thread
From: Valdis.Kletnieks @ 2008-05-01 19:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Willy Tarreau, Linus Torvalds, David Miller, linux-kernel,
	Andrew Morton, Jiri Slaby


On Thu, 01 May 2008 13:53:18 +0200, "Rafael J. Wysocki" said:

> How about:
> 
> (1) Merge a couple of trees at a time (one tree at a time would be ideal, but
>     that's impossible due to the total number of trees).
> (2) After (1) give testers some time to report problems introduced by the
>     merge.
> (3) Wait until the most urgent problems are resolved.  Revert the offending
>     changes if there's no solution within given time.
> (4) Repeat for another couple of trees.
> (5) Arrange things so that every tree gets merged once every two months.

You can't get there from here (at least not very easily).

If you have 60 trees, and want a merge for each one every 2 months, you have to
average 1 tree a day.  How big a delay you want in step (2) directly impacts
how many trees you merge at once - if you want a week of cook time, you have to
merge 7 trees every Monday, and so on...
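
[Editorial sketch: the scheduling arithmetic above is easy to sanity-check.
The figures below (60 trees, 60-day cycle, 7-day cook time) are the ones
from the mail, not measurements.]

```shell
#!/bin/sh
# Back-of-the-envelope check of the schedule: 60 trees, each merged
# once per ~60-day (two-month) cycle, with a one-week cook time per batch.
trees=60
cycle_days=60
echo "average trees merged per day: $((trees / cycle_days))"

# With a week of cook time per batch, batches land weekly, so each
# batch must hold a week's worth of trees:
cook_days=7
echo "trees per weekly batch: $((trees * cook_days / cycle_days))"
```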


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 18:23                                   ` Linus Torvalds
  2008-05-01 18:30                                     ` Linus Torvalds
  2008-05-01 18:58                                     ` Willy Tarreau
@ 2008-05-01 19:37                                     ` Al Viro
  2008-05-01 19:58                                       ` Andrew Morton
  2008-05-01 20:07                                       ` Joel Becker
  2 siblings, 2 replies; 229+ messages in thread
From: Al Viro @ 2008-05-01 19:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Andrew Morton, Willy Tarreau, David Miller,
	linux-kernel, Jiri Slaby

On Thu, May 01, 2008 at 11:23:43AM -0700, Linus Torvalds wrote:
> On Thu, 1 May 2008, Al Viro wrote:
> > On Thu, May 01, 2008 at 10:41:21AM -0700, Linus Torvalds wrote:
> > > 
> > > Same goes for "we should all just spend time looking at each others 
> > > patches and trying to find bugs in them". That's not a solution, that's a 
> > > drug-induced dream you're living in.
> > 
> > As one of those obviously drug-addled freaks who _are_ looking for bugs...
> > Thank you so fucking much ;-/
> 
> That's not what I meant, and I think you know it.

FWIW, the way I'd read that had been "face it, normal folks don't *do*
that and if you hope for more people doing code review - put down your
pipe, it's not even worth talking about".  Which managed to get under
my skin, and that's not something that happens often...

Anyway, I'm glad it had been a misparsing; my apologies for the reaction.
 
> So when we're looking at improvement suggestions, they should be real 
> suggestions that have realistic goals, not just wishes. And they 
> shouldn't be the things we *already* do, because then they wouldn't 
> be improvements.
> 
> In other words: do people have realistic ideas for how to make others 
> spend _more_ time looking at patches? And not just _wishing_ people did 
> that?

The obvious answer: amount of areas where one _can_ do that depends on
some things that can be changed.  Namely:
	* one needs to understand enough of the area or know where/how
to get the information needed for that.  I've got some experience with
the latter and I suspect that most of the folks who do active reviews
have their own set of tricks for getting into the unfamiliar area fast.
Moreover, having such set of tricks is probably _the_ thing that makes
us able to do that kind of work.
	Sharing such tricks (i.e. "here's how one wades through an
unfamiliar area and gets a sense of what's going on there; here's what
one looks out for; here's how to deal with data structures; here are
the signs of problematic lifetime logics; here's how one formulates a
hypothesis about refcounting rules; here's how one verifies such and
looks for possible bugs in that area; etc.") is a Good Idea(tm).
	Having the critical areas documented with ease of review in
mind is another thing that would probably help.  And yes, it won't
happen overnight, it won't happen for all areas, and it won't be
mandatory for maintainers, etc.  The previous part (i.e. which
questions to ask about data structures, etc.) would help with that.
	FWIW, I'm trying to do that - right now I'm flipping between
wading through Cthulhu-damned fs/locks.c and its friends and getting
the notes I've got from the last month work into edible form (which
includes translation into something that resembles normal English,
among other things - more than half of that is in... well, let's call
it idiom-rich Russian).
	* patches should be visible *when* *they* *can* *be* *changed*.
If it's "Linus had pulled from linux-foo.git and that included a merge
from linux-foobar.git, which is developed on foobar-wank@hell.knows.where",
it's too late.  It's not just that you don't revert; it's that you _can't_
realistically revert in such situation - not without very massive work.
And I don't know what _can_ be done about that, other than making it
socially discouraged.  To some extent it's OK, but my impression is that
some areas are as bad as CVS-based "communities" had been and switch to
git has simply hidden the obvious signs of trouble...

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 17:41                               ` Linus Torvalds
  2008-05-01 18:11                                 ` Al Viro
  2008-05-01 18:50                                 ` Willy Tarreau
@ 2008-05-01 19:39                                 ` Friedrich Göpel
  2008-05-01 21:59                                 ` Rafael J. Wysocki
  3 siblings, 0 replies; 229+ messages in thread
From: Friedrich Göpel @ 2008-05-01 19:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Andrew Morton, Willy Tarreau, David Miller,
	linux-kernel, Jiri Slaby

On 10:41 Thu 01 May     , Linus Torvalds wrote:
> Same goes for "we should all just spend time looking at each others 
> patches and trying to find bugs in them". That's not a solution, that's a 
> drug-induced dream you're living in.

But is it smarter to discourage people from doing code review, by telling
them they won't do it anyway,
or to actively and publicly encourage them to do so, even on the chance
that it might not lead to everyone doing it?
It's kind of a self-fulfilling prophecy that way.

Trying to force it through the process is another matter entirely.


Cheers,

Friedrich Göpel

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 19:26                     ` Andrew Morton
@ 2008-05-01 19:39                       ` Steven Rostedt
  2008-05-02 10:23                       ` Andi Kleen
  1 sibling, 0 replies; 229+ messages in thread
From: Steven Rostedt @ 2008-05-01 19:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Theodore Tso, bunk, arjan, torvalds, rjw, davem, linux-kernel, jirislaby


On Thu, 1 May 2008, Andrew Morton wrote:

> On Thu, 1 May 2008 13:24:34 -0400
> Theodore Tso <tytso@MIT.EDU> wrote:
>
> > ... and maybe we can't solve hardware bugs.
>
> Many, many of these are regressions.  If old-linux works on that
> hardware then new-linux can too.
>
> (still wants to know what we did 2-3 years ago which caused thousands of
> people to have to resort to using noapic and other apic-related boot option
> workarounds)

Perhaps 2-3 years ago more people started using more hardware that
implements APIC. ;-)

-- Steve


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 19:28                                     ` Willy Tarreau
@ 2008-05-01 19:46                                       ` david
  2008-05-01 19:53                                         ` Willy Tarreau
  0 siblings, 1 reply; 229+ messages in thread
From: david @ 2008-05-01 19:46 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Rafael J. Wysocki, Andrew Morton, David Miller,
	linux-kernel, Jiri Slaby

On Thu, 1 May 2008, Willy Tarreau wrote:

> On Thu, May 01, 2008 at 12:07:53PM -0700, david@lang.hm wrote:
>> On Thu, 1 May 2008, Willy Tarreau wrote:
>>
>> I suspect that they would not have, and if I'm right the result of merging
>> half as much wouldn't be twice as many releases, but rather approximatly
>> the same release schedule with more piling up for the next release.
>
> no, this is exactly what *not* to do. Linus is right about the risk of
> getting more stuff at once. If we merge less things, we *must* be able
> to speed up the process. Half the patches to cross-check in half the
> time should be easier than all patches in full time. The time to fix
> a problem within N patches is O(N^2).

in general you are correct, however I don't think that it's the general 
bugs that end up delaying the releases; I think it's the nasty, hard to 
identify and understand bugs that delay the releases, and I don't think 
that the debugging of those will speed up much.
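As a toy model of the quoted O(N^2) claim (the bug rate and per-bug cost
here are invented purely for illustration, not measured): suppose the
number of bugs grows linearly with the number of merged patches, and
isolating each bug costs work proportional to the size of the batch it
landed in.

```shell
# Toy model of the O(N^2) debugging-cost claim.  Assumptions (invented):
# roughly 1 bug per 100 merged patches, and hunting each bug down costs
# work proportional to the whole batch size.
debug_cost() {
    patches=$1
    bugs=$(( patches / 100 ))          # bugs scale linearly with patches
    echo $(( bugs * patches ))         # each bug hunted across N patches
}

one_big_window=$(debug_cost 1000)              # all patches in one window
two_half_windows=$(( 2 * $(debug_cost 500) ))  # same patches, two windows
echo "$one_big_window vs $two_half_windows"
```

Under this (assumed) model, merging the same total work in two
half-sized windows halves the overall debugging cost, which is the
arithmetic behind "half the patches in half the time should be easier".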

>> even individual git trees that do get a fair bit of testing (like
>> networking for example) run into odd and hard to debug problems when
>> exposed to a wider set of hardware and loads. having the networking
>> changes go in every 4 months (with 4 months worth of changes) instead of
>> every 2 months (with 2 months worth of changes) will just mean that there
>> will be more problems in this area, and since they will be more
>> concentrated in that area it will be harder to fix them all fast as the
>> same group of people are needed for all of them.
>
> You're perfectly right and that's exactly not what I'm proposing. BTW,
> having two halves will also get more of the merge job done the side of
> developers, where testing is being done before submission. So in the
> end, we should also get *less* regressions caused by each submission.

Ok, I guess I don't understand what you are proposing then.

I thought that you were proposing going from 2 week merge + 6 week 
stabilize = release to 1 week merge half + 3 week stabilize = release

it now sounds as if you are saying 1 week merge + x week stabilize + 1 
week merge + x week stabilize = release

can you clarify?

>> if several maintainers think that you are correct that doing a merge with
>> far fewer changes will be a lot faster, they can test this in the real
>> world by skipping one release. just send Linus a 'no changes this time'
>> instead of a pull request. If you are right the stable release will happen
>> significantly faster and they can say 'I told you so' and in the next
>> release have a fair chance of convincing other maintainers to skip a
>> release.
>
> again, this cannot work because this would result in slowing them down,
> and it's not what I'm proposing.

if merging fewer categories of stuff doesn't speed up the release cycle 
then you are right, it would just slow things down. however I thought you 
were arguing that if we merged fewer categories of stuff each cycle we 
could speed up the cycle. I'm saying that maintainers can choose to test 
this experimentally and see if it works. if it works we can shift to doing 
more of it, if it doesn't they only delay things by a couple of months one 
time.

you would need to have several maintainers decide to participate in the 
experiment, or the difference in cycle time may not be noticeable.

David Lang

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 19:46                                       ` david
@ 2008-05-01 19:53                                         ` Willy Tarreau
  0 siblings, 0 replies; 229+ messages in thread
From: Willy Tarreau @ 2008-05-01 19:53 UTC (permalink / raw)
  To: david
  Cc: Linus Torvalds, Rafael J. Wysocki, Andrew Morton, David Miller,
	linux-kernel, Jiri Slaby

On Thu, May 01, 2008 at 12:46:41PM -0700, david@lang.hm wrote:
> On Thu, 1 May 2008, Willy Tarreau wrote:
> 
> >On Thu, May 01, 2008 at 12:07:53PM -0700, david@lang.hm wrote:
> >>On Thu, 1 May 2008, Willy Tarreau wrote:
> >>
> >>I suspect that they would not have, and if I'm right the result of merging
> >>half as much wouldn't be twice as many releases, but rather approximatly
> >>the same release schedule with more piling up for the next release.
> >
> >no, this is exactly what *not* to do. Linus is right about the risk of
> >getting more stuff at once. If we merge less things, we *must* be able
> >to speed up the process. Half the patches to cross-check in half the
> >time should be easier than all patches in full time. The time to fix
> >a problem within N patches is O(N^2).
> 
> in general you are correct, however I don't think that it's the general 
> bugs that end up delaying the releases, think it's the nasty, hard to 
> identify and understand bugs that delay the releases, and I don't think 
> that the debugging of those will speed up much.

Indirectly, yes, it should. Who do you think is chasing those nasty bugs?
More people than should be. While those people spend time on bugs
revealed by combining several trees, they don't work on fixing their
own bugs.

> >>even individual git trees that do get a fair bit of testing (like
> >>networking for example) run into odd and hard to debug problems when
> >>exposed to a wider set of hardware and loads. having the networking
> >>changes go in every 4 months (with 4 months worth of changes) instead of
> >>every 2 months (with 2 months worth of changes) will just mean that there
> >>will be more problems in this area, and since they will be more
> >>concentrated in that area it will be harder to fix them all fast as the
> >>same group of people are needed for all of them.
> >
> >You're perfectly right and that's exactly not what I'm proposing. BTW,
> >having two halves will also get more of the merge job done the side of
> >developers, where testing is being done before submission. So in the
> >end, we should also get *less* regressions caused by each submission.
> 
> Ok, I guess I don't understand what you are proposing then.
> 
> I thought that you were proposing going from 2 week merge + 6 week 
> stabilize = release to 1 week merge half + 3 week stabilize = release
> 
> it now sounds as if you are saying 1 week merge + x week stabilize + 1 
> week merge + x week stabilize = release
> 
> can you clarify?

The latter: 1 week merge for core, 2-4 weeks to stabilize depending on the
amount of changes and the complexity of some bugs, release or not at this
point (probably not), then 1 week merge for the rest, and 2-4 weeks to
stabilize.

Drivers are different. Maybe we'll find it's better to merge them with the
rest, maybe we'll find it wise to merge them all along, I don't know.

> >>if several maintainers think that you are correct that doing a merge with
> >>far fewer changes will be a lot faster, they can test this in the real
> >>world by skipping one release. just send Linus a 'no changes this time'
> >>instead of a pull request. If you are right the stable release will happen
> >>significantly faster and they can say 'I told you so' and in the next
> >>release have a fair chance of convincing other maintainers to skip a
> >>release.
> >
> >again, this cannot work because this would result in slowing them down,
> >and it's not what I'm proposing.
> 
> if merging fewer catagoies of stuff doesn't speed up the release cycle 
> then you are right, it would just slow things down. however I thought you 
> were arguing that if we merged fewer catagories of stuff each cycle we 
> could speed up the cycle. I'm saying that maintainers can choose to test 
> this experimentally and see if it works. if it works we can shift to doing 
> more of it, if it doesn't they only delay things by a couple of months one 
> time.

we should not delay things too much IMHO, especially for core changes. We
risk getting huge piles of code which break a lot of other things. Also,
core changes sometimes involve adjustments in every driver or so, so they
should not get additional delay (unless we're really bothered by the
maintainer not respecting the process).

> you would need to have several maintainers decide to participate in the 
> experiment or the difference in cycle time may not be noticable.

But it would require Linus to drive it first.

Willy


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 19:37                                     ` Al Viro
@ 2008-05-01 19:58                                       ` Andrew Morton
  2008-05-01 20:07                                       ` Joel Becker
  1 sibling, 0 replies; 229+ messages in thread
From: Andrew Morton @ 2008-05-01 19:58 UTC (permalink / raw)
  To: Al Viro; +Cc: torvalds, rjw, w, davem, linux-kernel, jirislaby

On Thu, 1 May 2008 20:37:14 +0100
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> 	* patches should be visible *when* *they* *can* *be* *changed*.
> If it's "Linus had pulled from linux-foo.git and that included a merge
> from linux-foobar.git, which is developed on foobar-wank@hell.knows.where",
> it's too late.  It's not just that you don't revert; it's that you _can't_
> realistically revert in such situation - not without very massive work.
> And I don't know what _can_ be done about that, other than making it
> socially discouraged.  To some extent it's OK, but my impression is that
> some areas are as bad as CVS-based "communities" had been and switch to
> git has simply hidden the obvious signs of trouble...

Yup.  I think the only sane+scalable way of making this happen is to
prevail upon the 100-odd subsystem maintainers to keep an eye out for code
which should be exposed to additional eyes.

There are of course many reasons _why_ such code needs the attention of
others, and those reasons have varying strengths.  Off the top of my head:

- modifies stuff outside the designated subsystem (eg: lib/pcounter.c -
  thanks Pavel)

- (having just spent an hour looking at drivers/net/sfc/ and having
  boggled at its bitmap.h): adds generic-looking infrastructure which
  should be in core kernel.  Or already _is_ in core kernel.

- Adds any kernel<->user interface which is not of the most
  trivial & standard form

- Futzes with memory management internals, adds pagefault handlers, etc.

- Ditto vfs things, I guess

- In any way attempts to work around _any_ shortcoming of any other part
  of the kernel!

- Does anything RCU related.  Every time I cc Paul on an rcu-using patch,
  he finds holes in it.

- add your own here.



But we won't find such code by going out and looking for it - we do need
the recipients of that code to say "hey, others might want to see this". 
That's very low-effort for the hey-sayer, so I expect we can do better here
quite easily.


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 19:37                                     ` Al Viro
  2008-05-01 19:58                                       ` Andrew Morton
@ 2008-05-01 20:07                                       ` Joel Becker
  1 sibling, 0 replies; 229+ messages in thread
From: Joel Becker @ 2008-05-01 20:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Rafael J. Wysocki, Andrew Morton, Willy Tarreau,
	David Miller, linux-kernel, Jiri Slaby

On Thu, May 01, 2008 at 08:37:14PM +0100, Al Viro wrote:
> 	* one needs to understand enough of the area or know where/how
> to get the information needed for that.  I've got some experience with
> the latter and I suspect that most of the folks who do active reviews
> have their own set of tricks for getting into the unfamiliar area fast.
> Moreover, having such set of tricks is probably _the_ thing that makes
> us able to do that kind of work.
> 	Sharing such (i.e. "here's how one wades through unfamiliar
> area and gets a sense of what's going on there; here's what one looks
> out for; here's how to deal with data structures; here are the signs
> of problematic lifetime logics; here's how one formulates hypothesis
> about refcounting rules; here's how one verifies such and looks for
> possible bugs in that area; etc.) is a Good Idea(tm).

<snip>

> 	FWIW, I'm trying to do that - right now I'm flipping between
> wading through Cthulhu-damned fs/locks.c and its friends and getting
> the notes I've got from the last month work into edible form (which
> includes translation into something that resembles normal English,
> among other things - more than half of that is in... well, let's call
> it idiom-rich Russian).

	I think you've just nailed one of the tricks right there.  A
long time ago, I just sat down and wrote up a "how the locking works in
the vfs" document for myself and others.  Wrote up the structures, what
each member is for, where the structure appears and disappears, and all
the call chains for all of the locks.  When I was done, I had a pretty
good idea of how everything interacted.
	I think this is a great trick for ramping up on a section of the
code - documentation is good, but you understand self-written
documentation better.

Joel

-- 

Life's Little Instruction Book #452

	"Never compromise your integrity."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 17:41                               ` Linus Torvalds
                                                   ` (2 preceding siblings ...)
  2008-05-01 19:39                                 ` Friedrich Göpel
@ 2008-05-01 21:59                                 ` Rafael J. Wysocki
  2008-05-02 12:17                                   ` Stefan Richter
  3 siblings, 1 reply; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 21:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Willy Tarreau, David Miller, linux-kernel, Jiri Slaby

On Thursday, 1 of May 2008, Linus Torvalds wrote:
> 
> On Thu, 1 May 2008, Rafael J. Wysocki wrote:
> > 
> > I obviously agree with that.  The question is, however, if we can decrease the
> > number of bugs introduced during merge windows and you seem to be saying
> > that no, we can't.  Which is disappointing.
> 
> No, that's not what I'm saying.
> 
> What I *am* saying is that as long as you concentrate on "merge window" 
> and "lots of code", you're concentrating not on the problems, but on the 
> facts of life. You can't change facts, and even trying is pointless.
> 
> What you should concentrate on is not how many patches there are during 
> the merge window (because we can't do anything about that) or the fact 
> that they all happen in a short timeframe, but about quality of patches 
> _regardless_ of merge window.
> 
> So if you can make an argument that does not even *try* to change the fact 
> that 
>  - we have lots of patches
> and
>  - we have a merge window
> and
>  - merging patches causes bugs
> 
> but argues about quality from some other standpoint, then I can start to 
> believe that you have a point.
> 
> But as long as you argue about the fact that we merge a lot of stuff, and 
> that bugs come in during the merge window, I'm not interested. Arguing 
> about facts is totally non-productive.
> 
> And as long as people keep saying "let's not merge broken patches" or "we 
> should never have bugs", I'll just ignore those kinds of idiotic 
> statements. They aren't even arguments, they are wishes, and they are 
> unrealistic. If we knew they were broken and had bugs, of course we 
> wouldn't merge them.
> 
> In short - I'm simply not interested in what you _wish_ reality was. 
> People need to first acknowledge reality, and _then_ they may have 
> solutions.
> 
> So the reality is:
>  - we do have tons of patches, and they need to be merged (and furiously)
> 
>  - there *will* be bugs. And the number of bugs will inevitably be 
>    relative to the number of patches. There is no "perfect", and anybody 
>    who argues for a lower number of bugs by lowering the number of patches 
>    is an idiot in my book.
> 
>  - there *will* be releases, even in the presense of bugs, because holding 
>    everything up is simply not an option.
> 
> Those are the things that we have to accept. Anything else is just 
> dreaming.
> 
> Now, what part _can_ we improve and still be realistic?
> 
> We can try to improve average quality - the number of bugs will *still* be 
> relative to the size of the changes (no getting away from that), but we 
> may be able to lower the absolute number of bugs. But not to zero!
> 
> And that "not to zero" is IMPORTANT. If you think you can aim for zero 
> bugs,

No, I don't.  I've never said we can _eliminate_ bugs and please don't make
things look as though I did.

> I'm simply not interested in discussing it with you. You live in a  
> different universe, and we're not talking about the same reality.
> 
> And if you're not being realistic, then why the hell would I believe that 
> your solutions are realistic? I'd rather take some pills and talk to the 
> little purple man living under the deck in my back yard, because at least 
> he's amusing, even if he doesn't make much sense either.

That's not a level of discussion I'm used to, sorry.

> And I'm also not in the *least* interested in arguments like "We should 
> just improve our quality of patches".
> 
> Of course everybody wishes for that. Again, it's not an argument, it's 
> just a unrealistic wish, unless you can actually give a suggestion of a 
> process or other thing that would actually seem to reach it (without 
> assuming other impossible things like "we need more time" or "we need 
> more people who just spend their day looking for bugs").
> 
> Same goes for "we should all just spend time looking at each others 
> patches and trying to find bugs in them".

Not necessarily trying to find bugs in them, but trying to understand how the
patched code is supposed to work and if that's really what we want.

I really think we should review each other's code more, but I do realize that
people don't do it.  Of course, I'm digressing.

> That's not a solution, that's a drug-induced dream you're living in. And
> again, if I want to discuss dreams, I'd rather talk about my purple guy, and
> the bad things he does to the hedgehog that lives next door.
> 
> So do you have any productive *suggestions*? Some that involve more than 
> "let's write less code" or "let's just review each others patches more".

I'm not sure if you find it productive, but whatever.

A general rule that the trees people want you to pull during a merge window
should have been tested in linux-next beforehand, with no additional
last-minute changes, may help.

For this to work, though, people will have to know in advance when the
merge window will start.  Which may be helpful anyway.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 18:50                                 ` Willy Tarreau
  2008-05-01 19:07                                   ` david
@ 2008-05-01 22:17                                   ` Rafael J. Wysocki
  1 sibling, 0 replies; 229+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 22:17 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Andrew Morton, David Miller, linux-kernel, Jiri Slaby

On Thursday, 1 of May 2008, Willy Tarreau wrote:
> On Thu, May 01, 2008 at 10:41:21AM -0700, Linus Torvalds wrote:
> > Same goes for "we should all just spend time looking at each others 
> > patches and trying to find bugs in them". That's not a solution, that's a 
> > drug-induced dream you're living in.
> 
> "all" above is the wrong part. Encourage each other into reviewing code
> will definitely *help* (and I did not say fix the problem, OK?). There
> are persons who regularly spend some time to review code. I'm thinking
> about Al, Andrew, Christoph, Arjan, and maybe many other ones I'm missing,
> just that I regularly see them give advices to people who post their patches
> on the list. And even if only for that, they deserve some respect, and their
> efforts must not be dismissed.
> 
> Maybe they are more skilled than anyone else for this job. Maybe they're
> so much used to do it that it just takes them a few minutes each time, I
> don't know. I wish *more* people could be encouraged to do this work,
> which is very likely painful but instructive. If the current reviewers
> could give hints on how to save a lot of time to them, it may motivate
> more to follow them. I suspect that insisting on developers to post their
> less obvious work to the list(s) is a first step. Maybe at one point we're
> all responsible when we see a mail entitled "[GIT] pull request for XXX",
> we should all jump on it and ask "when and where was this code reviewed ?".
> 
> Once again, it's not a fix. It's just one small step towards a saner process.
> 
> > So do you have any productive *suggestions*? Some that involve more than 
> > "let's write less code" or "let's just review each others patches more".
> 
> It's not much about reviewing each others' patches, it's about showing
> one's work to others first. If our developers are encouraged to work
> alone in a cave late at night with itching eyes, and send their work
> at once every 2 months in a sealed envelope, we'll not solve anything.
> 
> I also proposed a more repressive method incitating the ones with really
> bad scores to find crap in other's work in order to remain hidden behind
> them. You explained why it would not work. Fine.
> 
> I also proposed to group merges by reduced overlapping areas, and to
> shorten the merge window and make it (at least) twice as often. Rafael
> also proposed to merge core first, then archs, which is a refined variation
> on the same principle.

That wasn't me, but the idea is also worth considering IMO.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:19             ` Linus Torvalds
  2008-04-30 22:28               ` Dmitri Vorobiev
@ 2008-05-01 23:06               ` Kevin Winchester
  1 sibling, 0 replies; 229+ messages in thread
From: Kevin Winchester @ 2008-05-01 23:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Dmitri Vorobiev, rjw, davem, linux-kernel,
	jirislaby, mingo

Linus Torvalds wrote:
> 
> On Wed, 30 Apr 2008, Andrew Morton wrote:
>>> For busy (or lazy) people like myself, the big problem with linux-next are
>>> the frequent merge breakages, when pulling the tree stops with "you are in
>>> the middle of a merge conflict".
>> Really?  Doesn't Stephen handle all those problems?  It should be a clean
>> fetch each time?
> 
> It should indeed be a clean fetch, but I wonder if Dmitri perhaps does a 
> "git pull" - which will do the fetch, but then try to _merge_ that fetched 
> state into whatever the last base Dmitri happened to have.
> 
> Dmitry: you cannot just "git pull" on linux-next, because each version of 
> linux-next is independent of the next one. What you should do is basically
> 
> 	# Set this up just once..
> 	git remote add linux-next git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
> 
> and then after that, you keep on just doing
> 
> 	git fetch linux-next
> 	git checkout linux-next/master
> 
> which will get you the actual objects and check out the state of that 
> remote (and then you'll normally never be on a local branch on that tree, 
> git will end up using a so-called "detached head" for this).
> 
> IOW, you should never need to do any merges, because Stephen did all those 
> in linux-next already.
> 

Just to add some emphasis here - this is something that took me a long time to figure out, and since it is the pattern for dealing with the x86 trees, the mm git tree and linux-next, it would help if it were documented somewhere (not that I can imagine where).  Once you know it, it becomes obvious, but try staring at a merge conflict for a while trying to figure out what to do, and it gets frustrating.  I wonder how many testers abandon the mm git tree or the linux-next tree because of this.

It might be nice if git supported a command like git-remote-help or something that would fetch a predefined help file from a remote tree that describes the workflow for that tree.

But at least with an extra reply to this mail, it might creep higher in the google search results when looking for merge conflicts with linux-next.
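For what it's worth, the recipe quoted above boils down to a tiny helper
(a convenience sketch, not an existing git command; `update_next` and its
default remote name are made up here, and the remote must already have
been registered with `git remote add`):

```shell
# Sketch of the fetch-then-detached-checkout recipe for rewound trees
# like linux-next.  A plain "git pull" would try to merge the remote's
# new (unrelated) state into your old checkout; this helper never
# merges, it just moves you onto the remote's current tip as a
# detached HEAD.
update_next() {
    remote=${1:-linux-next}
    git fetch -q "$remote" &&
    git checkout -q "$remote/master"
}
```

After that, updating is a single `update_next` invocation, and no local
merge state is ever created.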

-- 
Kevin Winchester
 

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 17:49                         ` Russ Dill
@ 2008-05-02  1:47                           ` Kasper Sandberg
  2008-05-02  2:54                             ` Russ Dill
  0 siblings, 1 reply; 229+ messages in thread
From: Kasper Sandberg @ 2008-05-02  1:47 UTC (permalink / raw)
  To: Russ Dill; +Cc: linux-kernel

On Thu, 2008-05-01 at 17:49 +0000, Russ Dill wrote:
> > I Recently found a system with a 2.6.4 kernel, and when i upgraded to
> > 2.6.23, i saw memory usage increase from ~250mb to around 500. I
> > upgraded to .25 to see if it was some weird bug, but it is the same.
> > 
> > Unfortunately i cannot investigate more, as i only had the box for a
> > very short time, but this is alot more concerning to me.
> > 
> 
> Memory is not something that is difficult to track. Its likely one of two things:
> 
> a) Your card now has 3d support, hooray! and X is mapping more regions, which
> isn't really additional RAM usage.
no, that isn't it. I'm talking RAM usage. :)
> 
> b) Linux is caching more things, hooray! I'm not saying that you are one of
> those people who just looks at the free number and doesn't think any further,
> but you might be.
I'm afraid this theory also isn't the case; I know what the cache is, and
I also know how to subtract :)

> 
> or c, the kernel has another 250MB is kernel data structures, seems unlikely.
Well, yes, that seems somewhat big; however, the kernel was the ONLY
change.

I can also say that I have noticed this on my own workstation, though
that's not really as valid a case, since I have also upgraded userspace
and such over time. But it used to be that my box wouldn't use more than
~100MB to boot into X with KDE open, and about ~300MB for browsing/mail
and such, whereas these days my workstation easily uses 1.5GB of RAM for
no apparent reason.

Something certainly is fishy around here; these days people just tend to
fix it by throwing in 10 times more RAM than should really be necessary,
which I guess is because RAM prices have dropped 10-fold.
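To make "I know how to subtract" concrete, one way to compare usage
across kernels while excluding reclaimable memory is below (a sketch:
the field names are from /proc/meminfo; newer kernels also expose
MemAvailable, which is a better single estimate when present):

```shell
# Report memory used excluding page cache and buffers, which the kernel
# grows on purpose to fill otherwise-idle RAM and gives back under
# pressure.  /proc/meminfo values are in kB.
awk '/^MemTotal:/ { total = $2 }
     /^MemFree:/  { free  = $2 }
     /^Buffers:/  { buf   = $2 }
     /^Cached:/   { cache = $2 }
     END { printf "used excluding cache/buffers: %d MB\n",
                  (total - free - buf - cache) / 1024 }' /proc/meminfo
```

Comparing this number (rather than raw "used") before and after a kernel
upgrade separates genuine growth from the cache simply being bigger;
large unexplained growth here would point at kernel data structures,
which /proc/slabinfo can break down further.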
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 16:26                 ` Diego Calleja
  2008-05-01 16:31                   ` Dmitri Vorobiev
@ 2008-05-02  1:48                   ` Stephen Rothwell
  1 sibling, 0 replies; 229+ messages in thread
From: Stephen Rothwell @ 2008-05-02  1:48 UTC (permalink / raw)
  To: Diego Calleja
  Cc: Dmitri Vorobiev, Linus Torvalds, Andrew Morton, rjw, davem,
	linux-kernel, jirislaby, mingo

[-- Attachment #1: Type: text/plain, Size: 723 bytes --]

On Thu, 1 May 2008 18:26:58 +0200 Diego Calleja <diegocg@gmail.com> wrote:
>
> El Thu, 01 May 2008 02:28:33 +0400, Dmitri Vorobiev <dmitri.vorobiev@gmail.com> escribió:
> 
> > Linus, thanks a lot for the detailed explanation. Indeed, it seems that I foolishly
> > tried to duplicate Stephen's work. In the future I'll do as you suggest here.
> 
> That "howto" should probably be added to the linux-next announcements...
> (CC'ing Stephen)

This is already mentioned in the linux-next wiki
(http://linux.f-seidel.de/linux-next/pmwiki/) in the FAQ.  I will add a
link to the wiki to the announcements.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 13:21               ` Adrian Bunk
  2008-05-01 15:49                 ` Andrew Morton
@ 2008-05-02  2:08                 ` Paul Mackerras
  2008-05-02  3:10                   ` Josh Boyer
  1 sibling, 1 reply; 229+ messages in thread
From: Paul Mackerras @ 2008-05-02  2:08 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Arjan van de Ven, Linus Torvalds, Andrew Morton,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt

Adrian Bunk writes:

> "Kernel oops while running kernbench and tbench on powerpc" took more 
> than 2 months to get resolved, and we ship 2.6.25 with this regression.

That was a very subtle bug that only showed up on one particular
powerpc machine.  I was not able to replicate it on any of the powerpc
machines I have here.  Nevertheless, we found it and we have a fix for
it.  I think that's an example of the process working. :)

Paul.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-02  1:47                           ` Kasper Sandberg
@ 2008-05-02  2:54                             ` Russ Dill
  2008-05-02  7:01                               ` Kasper Sandberg
  2008-05-02 17:34                               ` Lee Mathers (TCAFS)
  0 siblings, 2 replies; 229+ messages in thread
From: Russ Dill @ 2008-05-02  2:54 UTC (permalink / raw)
  To: Kasper Sandberg; +Cc: linux-kernel

>  i can also say that i have noticed this on my own workstation, however
>  thats not really an as valid case, as i have also upgraded userspace and
>  such over time, but it used to be that my box wouldnt use more than
>  ~100mb to boot into X with kde open, and about ~300mb at browsing/mail
>  and such, but these days my workstation easily uses 1.5gb of ram for no
>  apparent reason..
>
>  something certainly is fishy around here, these days people just tend to
>  fix it by throwing 10 times more ram in than should really be necessary,
>  which i guess, is because the ram prices has dropped 10 times
>

So you aren't really contributing anything to the discussion. It could
be userspace, it could be different types of pages you are visiting,
it could be the kernel, you haven't really measured what is taking up
the memory. And of course, it's all because developers are lazy. Thanks
for the input.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-02  2:08                 ` Paul Mackerras
@ 2008-05-02  3:10                   ` Josh Boyer
  2008-05-02  4:09                     ` Paul Mackerras
  0 siblings, 1 reply; 229+ messages in thread
From: Josh Boyer @ 2008-05-02  3:10 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Adrian Bunk, Arjan van de Ven, Linus Torvalds, Andrew Morton,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt

On Fri, 2008-05-02 at 12:08 +1000, Paul Mackerras wrote:
> Adrian Bunk writes:
> 
> > "Kernel oops while running kernbench and tbench on powerpc" took more 
> > than 2 months to get resolved, and we ship 2.6.25 with this regression.
> 
> That was a very subtle bug that only showed up on one particular
> powerpc machine.  I was not able to replicate it on any of the powerpc
> machines I have here.  Nevertheless, we found it and we have a fix for
> it.  I think that's an example of the process working. :)

Was it even a regression in the classical sense of the word?  Seemed
more of a latent bug that was simply never triggered before.

josh


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-02  3:10                   ` Josh Boyer
@ 2008-05-02  4:09                     ` Paul Mackerras
  2008-05-02  8:29                       ` Adrian Bunk
  0 siblings, 1 reply; 229+ messages in thread
From: Paul Mackerras @ 2008-05-02  4:09 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Adrian Bunk, Arjan van de Ven, Linus Torvalds, Andrew Morton,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt

Josh Boyer writes:

> On Fri, 2008-05-02 at 12:08 +1000, Paul Mackerras wrote:
> > Adrian Bunk writes:
> > 
> > > "Kernel oops while running kernbench and tbench on powerpc" took more 
> > > than 2 months to get resolved, and we ship 2.6.25 with this regression.
> > 
> > That was a very subtle bug that only showed up on one particular
> > powerpc machine.  I was not able to replicate it on any of the powerpc
> > machines I have here.  Nevertheless, we found it and we have a fix for
> > it.  I think that's an example of the process working. :)
> 
> Was it even a regression in the classical sense of the word?  Seemed
> more of a latent bug that was simply never triggered before.

That's right.  The bug has been there basically forever (i.e. since
before 2.6.12-rc2 ;) and no-one has been able to trigger it reliably
before.

Paul.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-02  2:54                             ` Russ Dill
@ 2008-05-02  7:01                               ` Kasper Sandberg
  2008-05-02 17:34                               ` Lee Mathers (TCAFS)
  1 sibling, 0 replies; 229+ messages in thread
From: Kasper Sandberg @ 2008-05-02  7:01 UTC (permalink / raw)
  To: Russ Dill; +Cc: linux-kernel

On Thu, 2008-05-01 at 19:54 -0700, Russ Dill wrote:
> >  i can also say that i have noticed this on my own workstation, however
> >  thats not really an as valid case, as i have also upgraded userspace and
> >  such over time, but it used to be that my box wouldnt use more than
> >  ~100mb to boot into X with kde open, and about ~300mb at browsing/mail
> >  and such, but these days my workstation easily uses 1.5gb of ram for no
> >  apparent reason..
> >
> >  something certainly is fishy around here, these days people just tend to
> >  fix it by throwing 10 times more ram in than should really be necessary,
> >  which i guess, is because the ram prices has dropped 10 times
> >
> 
> So you aren't really contributing anything to the discussion. It could
> be userspace, it could be different types of pages you are visiting,
> it could be the kernel, you haven't really measured what is taking up
> the memory. And of course, its all because developers are lazy. Thanks
> for the input.

I think you didn't read the first part of my message..

And as I said, on my workstation I have no idea exactly WHAT has changed
to cause it.. I just came with some information..

And of course, it's all because the person writing the email forgets to
include key details such as "however, thats not really an as valid
case...". Thanks for the input..


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-02  4:09                     ` Paul Mackerras
@ 2008-05-02  8:29                       ` Adrian Bunk
  2008-05-02 10:16                         ` Paul Mackerras
  2008-05-02 14:58                         ` Linus Torvalds
  0 siblings, 2 replies; 229+ messages in thread
From: Adrian Bunk @ 2008-05-02  8:29 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Josh Boyer, Arjan van de Ven, Linus Torvalds, Andrew Morton,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt

On Fri, May 02, 2008 at 02:09:39PM +1000, Paul Mackerras wrote:
> Josh Boyer writes:
> 
> > On Fri, 2008-05-02 at 12:08 +1000, Paul Mackerras wrote:
> > > Adrian Bunk writes:
> > > 
> > > > "Kernel oops while running kernbench and tbench on powerpc" took more 
> > > > than 2 months to get resolved, and we ship 2.6.25 with this regression.
> > > 
> > > That was a very subtle bug that only showed up on one particular
> > > powerpc machine.  I was not able to replicate it on any of the powerpc
> > > machines I have here.  Nevertheless, we found it and we have a fix for
> > > it.  I think that's an example of the process working. :)
> > 
> > Was it even a regression in the classical sense of the word?  Seemed
> > more of a latent bug that was simply never triggered before.
> 
> That's right.  The bug has been there basically forever (i.e. since
> before 2.6.12-rc2 ;) and no-one has been able to trigger it reliably
> before.

But for users this is a recent regression since 2.6.24 worked
and 2.6.25 does not.

If this problem was on x86 Linus himself and some other core developers 
would most likely have debugged this issue and Linus would have delayed 
the release of 2.6.25 for getting it fixed there.

And stuff that "only showed up on one particular machine" often shows up 
on many machines (we only know this in hindsight); the "one particular 
machine" is often simply the only one, of the many machines that might 
trigger the regression, that was used for testing this -rc kernel.

This is not in any way meant against you personally, and because the 
powerpc port is among the better maintained parts of the kernel this 
regression eventually got fixed, but in many other parts of the kernel 
it would have been one more of the many regressions that are reported 
and never fixed.

> Paul.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01  1:13                   ` Arjan van de Ven
@ 2008-05-02  9:00                     ` Adrian Bunk
  0 siblings, 0 replies; 229+ messages in thread
From: Adrian Bunk @ 2008-05-02  9:00 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, Linus Torvalds, Rafael J. Wysocki, davem,
	linux-kernel, jirislaby, Steven Rostedt

On Wed, Apr 30, 2008 at 06:13:38PM -0700, Arjan van de Ven wrote:
> On Thu, 1 May 2008 08:49:19 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > Granted that compared to x86 there's not a sizable portion of users 
> > > crazy enough to run Linux on powerpc machines...
> > 
> > Another fallacy which Arjan is pushing (even though he doesn't appear
> > to have realised it) is "all hardware is the same".
> 
> no I'm pushing "some classes of hardware are much more popular/relevant
> than others".

"popular/relevant" is hard to define.

E.g. if we'd go after "popular" we should only keep architectures like 
ARM and x86 and ditch architectures like ia64 and s390 that have puny 
userbases.

And how would you define "relevant"?

> > Well, it isn't.  And most of our bugs are hardware-specific.  So, I'd
> > venture, most of our bugs don't affect most people.  So, over time, by
> > Arjan's "important to enough people" observation we just get more and
> > more and more unfixed bugs.
> 
> I did not say "most people". I believe "most people" aren't hitting
> bugs right now (or there would be a lot more screaming).
> What I do believe is that *within the bugs that hit*, even the hardware
> specific ones, there's a clear prioritization by how many people hit
> the bug (or have the hardware in general).

If your "or have the hardware in general" is meant seriously you have to
convince people that ARM must become a very high priority.

Whether or not one supports your "there's a clear prioritization" view, 
it doesn't currently work anyway, since the areas covered by people 
testing -rc kernels don't even remotely map to the most popular hardware 
in the field.

> > And I believe this effect has been occurring.
> 
> > And please stop regaling us with this kerneloops.org stuff.  It just
> > isn't very interesting, useful or representative when considering the
> > whole problem.  Very few kernel bugs result in a trace, and when they
> > do they are usually easy to fix and, because of this, they will get
> > fixed, often quickly.  I expect
> > netdevwatchdogeth0transmittimedout.org would tell a different story.
> 
> now that's a fallacy of your own.. if you care about that one, it's 1)
> trivial to track and/or 2) could contain a WARN_ON_ONCE(), at which
> point it's automatically tracked. (and more useful information I
> suspect, since it suddenly has a full backtrace including driver info
> in it)
> By your argument we should work hard to make sure we're better at
> creating traces for cases we detect something goes wrong.
> (I would not argue against that fwiw)
>...

kerneloops.org catches the easiest to solve bugs (there's a trace) and 
helps in getting them fixed.

That's a very good thing.

And if we get more bugs into this easy to resolve state that would be 
even better.

But it's only a small part of the complete picture of incoming bug 
reports.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-02  8:29                       ` Adrian Bunk
@ 2008-05-02 10:16                         ` Paul Mackerras
  2008-05-02 11:58                           ` Adrian Bunk
  2008-05-02 14:58                         ` Linus Torvalds
  1 sibling, 1 reply; 229+ messages in thread
From: Paul Mackerras @ 2008-05-02 10:16 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Josh Boyer, Arjan van de Ven, Linus Torvalds, Andrew Morton,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt

Adrian Bunk writes:

> > That's right.  The bug has been there basically forever (i.e. since
> > before 2.6.12-rc2 ;) and no-one has been able to trigger it reliably
> > before.
> 
> But for users this is a recent regression since 2.6.24 worked
> and 2.6.25 does not.

I never actually saw a statement to that effect (i.e. that 2.6.24
worked) from Kamalesh.  I think people assumed that because he
reported it against version X that version X-1 worked, but we don't
actually know that.

> If this problem was on x86 Linus himself and some other core developers 
> would most likely have debugged this issue and Linus would have delayed 
> the release of 2.6.25 for getting it fixed there.

If I had been able to replicate it, or if it had been seen on more
than one machine, I would probably have asked Linus to wait while we
fixed it.  

There's a risk management thing happening here.  Delaying a release is
a negative thing in itself, since it means that users have to wait
longer for the improvements we have made.  That has to be balanced
against the negative of some users seeing a regression.  It's not an
absolute, black-and-white kind of thing.  In this case, for a bug
being seen on only one machine, of a somewhat unusual configuration, I
considered it wasn't worth asking to delay the release.

Paul.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:53           ` Mariusz Kozlowski
  2008-04-30 23:11             ` Andrew Morton
@ 2008-05-02 10:20             ` Andi Kleen
  2008-05-02 15:33               ` Mariusz Kozlowski
  1 sibling, 1 reply; 229+ messages in thread
From: Andi Kleen @ 2008-05-02 10:20 UTC (permalink / raw)
  To: Mariusz Kozlowski
  Cc: Andrew Morton, Dan Noe, torvalds, rjw, davem, linux-kernel, jirislaby

Mariusz Kozlowski <m.kozlowski@tuxland.pl> writes:
>
> Speaking of energy and time of a tester. I'd like to know where these resources
> should be directed from the arch point of view. Once I had a plan to buy as
> many arches as I could get and run a farm of test boxes 8-) But that's hard
> because of various reasons (money, time, room, energy). What arches need more
> attention? Which are forgotten? Which are going away? For example does buying
> an alphaserver DS 20 (hey - it's cheap) and running tests on it makes sense
> these days?

A lot of bugs are not architecture specific. Or when they are architecture
specific they only affect some specific machines in that architecture.
But really a lot of bugs should happen on most architectures. Just focussing
on lots of boxes is not necessarily productive.

My recommendation would be to concentrate on deeper testing (more coverage)
on the architectures you have.

An interesting project, for example, would be to play with the kernel gcov 
patch that was recently reposted (I hope it makes mainline eventually). 
Apply that patch, run all the test suites and tests you usually run on your 
favourite test box, and check, using the coverage information, how much of 
the code compiled into your kernel was really exercised. Then think: what 
additional tests can you do to get more coverage? Write those tests? Or 
just write descriptions of what is not tested and send them to the list, 
as a project for others looking to contribute to the kernel.

-Andi

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 19:26                     ` Andrew Morton
  2008-05-01 19:39                       ` Steven Rostedt
@ 2008-05-02 10:23                       ` Andi Kleen
  1 sibling, 0 replies; 229+ messages in thread
From: Andi Kleen @ 2008-05-02 10:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Theodore Tso, bunk, arjan, torvalds, rjw, davem, linux-kernel,
	jirislaby, rostedt

Andrew Morton <akpm@linux-foundation.org> writes:
>
> (still wants to know what we did 2-3 years ago which caused thousands of
> people to have to resort to using noapic and other apic-related boot option
> workarounds)

Forcing APIC even when the BIOS didn't support them.

-Andi



^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-02 10:16                         ` Paul Mackerras
@ 2008-05-02 11:58                           ` Adrian Bunk
  0 siblings, 0 replies; 229+ messages in thread
From: Adrian Bunk @ 2008-05-02 11:58 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Josh Boyer, Arjan van de Ven, Linus Torvalds, Andrew Morton,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt

On Fri, May 02, 2008 at 08:16:49PM +1000, Paul Mackerras wrote:
> Adrian Bunk writes:
> 
> > > That's right.  The bug has been there basically forever (i.e. since
> > > before 2.6.12-rc2 ;) and no-one has been able to trigger it reliably
> > > before.
> > 
> > But for users this is a recent regression since 2.6.24 worked
> > and 2.6.25 does not.
> 
> I never actually saw a statement to that effect (i.e. that 2.6.24
> worked) from Kamalesh.  I think people assumed that because he
> reported it against version X that version X-1 worked, but we don't
> actually know that.

He reported it as

[BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

and it was in the 2.6.25 regression lists for ages.

> > If this problem was on x86 Linus himself and some other core developers 
> > would most likely have debugged this issue and Linus would have delayed 
> > the release of 2.6.25 for getting it fixed there.
> 
> If I had been able to replicate it, or if it had been seen on more
> than one machine, I would probably have asked Linus to wait while we
> fixed it.  
> 
> There's a risk management thing happening here.  Delaying a release is
> a negative thing in itself, since it means that users have to wait
> longer for the improvements we have made.  That has to be balanced
> against the negative of some users seeing a regression.  It's not an
> absolute, black-and-white kind of thing.  In this case, for a bug
> being seen on only one machine, of a somewhat unusual configuration, I
> considered it wasn't worth asking to delay the release.

No general disagreement on this.

And my example was not in any way meant against you - it's actually 
unusual and positive that a bug which once got attention by being on
the regression lists got fixed later.

Even worse is the situation with regressions people run into when 
upgrading from 2.6.22 to 2.6.24 today...  :-(

> Paul.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 21:59                                 ` Rafael J. Wysocki
@ 2008-05-02 12:17                                   ` Stefan Richter
  0 siblings, 0 replies; 229+ messages in thread
From: Stefan Richter @ 2008-05-02 12:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Andrew Morton, Willy Tarreau, David Miller,
	linux-kernel, Jiri Slaby

Rafael J. Wysocki wrote:
> A general rule that the trees people want you to pull during a merge window
> should be tested in linux-next before, with no additional last minute changes,
> may help.
> 
> For this to work, though, the people will have to know in advance when the
> merge window will start.  Which may be helpful anyway.

If I only release into my tree's for-next branch what I would release
into my tree's for-linus branch if I were to send a merge request to
Linus at this moment, then I won't need advance notice of a merge window.
IOW, treat -next every day as if the merge window was open right now.

I'm sure it is not that easy for the larger subsystems or the
infrastructure trees.  However, Linus' late -rc announcements are plenty
of advance notice, at least for a merge period as long as two weeks.
-- 
Stefan Richter
-=====-==--- -=-= ---=-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 18:35                             ` Chris Frey
@ 2008-05-02 13:22                               ` Enrico Weigelt
  0 siblings, 0 replies; 229+ messages in thread
From: Enrico Weigelt @ 2008-05-02 13:22 UTC (permalink / raw)
  To: linux kernel list


Hi folks,


<big_snip>

Just a few naive thoughts:

a) What about reducing code size ?

Some parts, IMHO, don't necessarily need to be in the kernel,
e.g. certain filesystems. Less code, fewer patches to review, less
chance of kernel bugs. Of course this might also have other 
impacts (e.g. performance), so those decisions require great care.

b) Multi-tier trees / patchlines

IMHO, a major problem is conflicting patches (e.g. a core change
causes some driver to break). In measurement instrumentation 
(eg. timesync), there's typically one primary reference point 
(eg. atomic clock) as tier-0, where (a limited set of) tier-1's 
are synchronized against, tier-2 syncs against tier-1 and so on.

So for the linux kernel, we perhaps could have something like:

* tier-0: core
* tier-1: arch
* tier-2: hw drivers
* tier-3: sw drivers
* tier-4: userland interfaces

If a change from a lower tier wants to move to its upper tier, it first
MUST fit its current mainline and be carefully checked. Of course
this introduces longer times for an individual change to go into a
release (since it has to pass several tiers), but IMHO the chance of
new bugs in a release should be reduced this way.

Of course there might be changes in a lower tier which obviously 
won't affect several intermediate tiers. Those could skip some tiers.

For example, I'm currently working on a /proc interface for 
changing process privileges. In my model, this would be settled
in #4; it shouldn't touch drivers (#2, #3), but maybe arch (#1).
So these changes could be kicked directly to #2. 


What do you think about this?


cu
-- 
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service - http://www.metux.de/
---------------------------------------------------------------------
 Please visit the OpenSource QM Taskforce:
 	http://wiki.metux.de/public/OpenSource_QM_Taskforce
 Patches / Fixes for a lot dozens of packages in dozens of versions:
	http://patches.metux.de/
---------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 21:21         ` David Miller
                             ` (2 preceding siblings ...)
  2008-04-30 22:19           ` Ingo Molnar
@ 2008-05-02 13:37           ` Helge Hafting
  3 siblings, 0 replies; 229+ messages in thread
From: Helge Hafting @ 2008-05-02 13:37 UTC (permalink / raw)
  To: David Miller; +Cc: akpm, torvalds, rjw, linux-kernel, jirislaby

David Miller wrote:
[...]
> I guess what these folks are truly afraid of is that someone will
> start tracking reverts and post their results in some presentation
> at some big conference.  I say that would be a good thing.  To
> be honest, hitting the revert button more aggressively and putting
> the fear of being the "revert king" into everyone's minds might
> really help with this problem.
>   
You will probably want to sort by "revert percentage" then.
The absolute number of reverts might make the biggest contributor
"revert king", even if his average patch quality is better than
most.
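A back-of-the-envelope sketch of that accounting (not a real metric: it
builds a toy repository, credits each "This reverts commit <sha>" back
to the *original* author, and divides by that author's commit count; the
names and history are made up):

```shell
# Toy history: alice makes two commits, bob reverts her second one.
# Then walk all revert commits, resolve the reverted sha from the
# standard "This reverts commit <sha>" body line, and report reverts
# per original author against their total commits.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email alice@example.com
git config user.name alice
echo one > f.txt; git add f.txt; git commit -q -m "good change"
echo two > f.txt; git commit -qam "bad change"
git config user.email bob@example.com
git config user.name bob
git revert --no-edit HEAD >/dev/null  # body contains "This reverts commit <sha>"

git log --format=%H --grep='This reverts commit' | while read -r r; do
    sha=$(git log -1 --format=%b "$r" |
          sed -n 's/.*This reverts commit \([0-9a-f]*\).*/\1/p' | head -1)
    author=$(git log -1 --format=%an "$sha")
    total=$(git rev-list --count --author="$author" HEAD)
    echo "$author: 1 revert out of $total commits"
done
```

This prints "alice: 1 revert out of 2 commits" -- the revert is charged
to the author of the reverted patch, not to whoever typed `git revert`,
which is what a fair "revert percentage" would need.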
> Currently there is no sufficient negative pushback on people who
> insert broken crud into the tree.  So it should be no surprise that it
> continues.

Helge Hafting

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01 15:34                 ` Stefan Richter
@ 2008-05-02 14:05                   ` Tarkan Erimer
  0 siblings, 0 replies; 229+ messages in thread
From: Tarkan Erimer @ 2008-05-02 14:05 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Andrew Morton, Linus Torvalds, rjw, davem, linux-kernel, jirislaby

Stefan Richter wrote:
> Tarkan Erimer wrote:
>> To improve the quality of kernel releases, maybe we can create a 
>> special kernel testing tool.
>
> A variety of bugs cannot be caught by automated tests.  Notably those 
> which happen with rare hardware, or due to very specific interaction 
> with hardware, or with very special workloads.
Of course, it's impossible to test all the things/scenarios. That kind 
of tool should just allow us to minimize the issues that we will face.

>
> An interesting thing to investigate would be to start at the 
> regression meta bugs at bugzilla.kernel.org, go through all bugs on 
> which are linked from there, and try to figure out
>   - if these bugs could have been found by automated or at least
>     semiautomatic tests on pre-merge code, and
>   - how those tests had to have looked like, e.g. what equipment would
>     have been necessary.
>
> Let's look back at the posting at the thread start:
> | On Wed, Apr 30, 2008 at 10:03 AM, David Miller <davem@davemloft.net> 
> wrote:
> | >  Yesterday, I spent the whole day bisecting boot failures
> | >  on my system due to the totally untested linux/bitops.h
> | >  optimization, which I fully analyzed and debugged.
> ...
> | >  Yet another bootup regression got added within the last 24
> | >  hours.
>
> Bootup regressions can be automatically caught if the necessary 
> machines are available, and candidate code gets exposure to test parks 
> of those machines.  I hear this is already being done, and 
> increasingly so.  But those test parks will ever only cover a tiny 
> fraction of existing hardware and cannot be subjected to all code 
> iterations and all possible .config permutations, hence will have 
> limited coverage of bugs.
>
> And things like the bitops issue depend on review much more than on 
> tests, AFAIU.
My idea is also to hunt bugs more easily via a tool like this that 
has a console/X interface and the ability to bisect. Then users who 
have little or no knowledge about git/bisect can easily try to find 
the problematic commits/bugs.

Tarkan


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-02  8:29                       ` Adrian Bunk
  2008-05-02 10:16                         ` Paul Mackerras
@ 2008-05-02 14:58                         ` Linus Torvalds
  2008-05-02 15:44                           ` Carlos R. Mafra
  1 sibling, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-05-02 14:58 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Paul Mackerras, Josh Boyer, Arjan van de Ven, Andrew Morton,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt



On Fri, 2 May 2008, Adrian Bunk wrote:
> 
> But for users this is a recent regression since 2.6.24 worked
> and 2.6.25 does not.

Totally and utterly immaterial.

If it's a timing-related bug, as far as developers are concerned, nothing 
they did introduced the problem.

So anybody who thinks that "process" should have caught it is just being 
stupid. 

Adrian, you're one of the absolutely *worst* in the camp of "everything 
should be perfect". You really need to realize that reality is messy, and 
things cannot be perfect.

You also need to realize and *understand* that aiming for "good" is 
actually much BETTER than trying to aim for "perfect".

Perfect is the enemy of good.

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-02 10:20             ` Andi Kleen
@ 2008-05-02 15:33               ` Mariusz Kozlowski
  0 siblings, 0 replies; 229+ messages in thread
From: Mariusz Kozlowski @ 2008-05-02 15:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Dan Noe, torvalds, rjw, davem, linux-kernel, jirislaby

Hello,

> > Speaking of energy and time of a tester. I'd like to know where these resources
> > should be directed from the arch point of view. Once I had a plan to buy as
> > many arches as I could get and run a farm of test boxes 8-) But that's hard
> > because of various reasons (money, time, room, energy). What arches need more
> > attention? Which are forgotten? Which are going away? For example does buying
> > an alphaserver DS 20 (hey - it's cheap) and running tests on it makes sense
> > these days?
> 
> A lot of bugs are not architecture specific. Or when they are architecture
> specific they only affect some specific machines in that architecture.

Yes, there are some bugs that I see only on a specific architecture.
Those which are reproducible or have an easy test case I do report to LKML,
but there are also bugs I see rarely or just once; they never come back and
sometimes, as a bonus, leave no trace - and those I usually don't report.
Providing a test case is a challenge, and one can really learn a lot.

> But really a lot of bugs should happen on most architectures. Just focussing
> on lots of boxes is not necessarily productive.

What I meant was one box per architecture, preferably an SMP one where possible - so
the number of required boxes is limited. This way instead of just cross-compiling
I could actually _run_ the kernel. On the other hand, if some arch is close
to dead and has no foreseeable future, then there is no point in testing it.

Also my thinking was that sometimes bugs from other (than x86) architectures can point to
some more generic problems. Well - I'll buy just a few more and that's it ;)

> My recommendation would be to concentrate on deeper testing (more coverage)
> on the architectures you have.

Can do.
 
> A interestig project for example would be to play with the kernel gcov patch that
> was recently reposted (I hope it makes mainline eventually). Apply that patch,
> run all the test suites and tests you usually run on your favourite test box
> and check how much of the code that is compiled into your kernel was really tested
> using the coverage information Then think: what additional tests can you do to get 
> more coverage?  Write tests then? Or just write descriptions on what is not tested 
> and send them to the list, as a project for others looking to contribute to the 
> kernel.

Sounds like a plan - will look into that.
 
	Mariusz aka arch'aeologist ;)

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-02 14:58                         ` Linus Torvalds
@ 2008-05-02 15:44                           ` Carlos R. Mafra
  2008-05-02 16:28                             ` Linus Torvalds
  0 siblings, 1 reply; 229+ messages in thread
From: Carlos R. Mafra @ 2008-05-02 15:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adrian Bunk, Paul Mackerras, Josh Boyer, Arjan van de Ven,
	Andrew Morton, Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt

On Fri  2.May'08 at  7:58:25 -0700, Linus Torvalds wrote:
> 
> 
> On Fri, 2 May 2008, Adrian Bunk wrote:
> > 
> > But for users this is a recent regression since 2.6.24 worked
> > and 2.6.25 does not.
> 
> Totally and utterly immaterial.
> 
> If it's a timing-related bug, as far as developers are concerned, nothing 
> they did introduced the problem.
> 
> So anybody who think s that "process" should have caught it is just being 
> stupid. 

So I would like to ask you what a user should do when facing what is
probably a timing-related bug, as it appears I have the bad luck
of hitting one.

See for example my comments after this one 
http://bugzilla.kernel.org/show_bug.cgi?id=10117#c11

This same problem is still present with yesterday's git, and sometimes
it hangs without hpet=disable and sometimes it doesn't. (And never
with hpet=disable in the boot command line)

And when it hangs I can see only _one_ "Switched to high resolution mode
on CPU x" message before the hang point, and when it boots fine both of
them always appear in sequence:

Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0

And using vga=6 or vga=0x0364 makes a difference in the probability
of hanging.

I am just waiting for -rc1 to be released to send an email about my
problem again, as I am unable to debug this myself.
I think this is OK on my part, right?



^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-02 15:44                           ` Carlos R. Mafra
@ 2008-05-02 16:28                             ` Linus Torvalds
  2008-05-02 17:15                               ` Carlos R. Mafra
  0 siblings, 1 reply; 229+ messages in thread
From: Linus Torvalds @ 2008-05-02 16:28 UTC (permalink / raw)
  To: Carlos R. Mafra
  Cc: Adrian Bunk, Paul Mackerras, Josh Boyer, Arjan van de Ven,
	Andrew Morton, Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt



On Fri, 2 May 2008, Carlos R. Mafra wrote:
> 
> So I would like to ask you what a user should do when facing what is
> probably a timing-related bug, as it appears I have the bad luck
> of hitting one.

Quite frankly, it will depend on the bug.

If it's *reliably* timing-related (which sounds crazy, but is not at all 
unheard of), it can be reliably bisected down to some totally unrelated 
commit that doesn't actually introduce the problem at all, but that 
reliably turns it on or off.

That can be very misleading, and can cause us to basically revert a good 
commit, only to not actually fix the bug (and possibly re-introduce the 
bug that the reverted commit tried to fix).

But sometimes it gives us a clue where the timing problem is. But quite 
frankly, that seems to be the exception rather than the rule.

There have been issues that literally seemed to depend on things like 
cacheline placement etc, where changing config options for code that was 
never actually even *run* would change timing just enough to show a bug 
pseudo-reliably or not at all.

The good news is that those timing issues are really quite rare. 

The bad news is that when they happen, they are almost totally 
undebuggable. 

> This same problem is still present with yesterday's git, and sometimes
> it hangs without hpet=disable and sometimes it doesn't. (And never
> with hpet=disable in the boot command line)

Hey, it may well be a HPET+NOHZ issue. But it could also be that HPET is 
the thing that just allows you to see the hang.

> And using vga=6 or vga=0x0364 makes a difference in the probability
> of hanging.

.. and yeah, these kinds of really odd and obviously totally unrelated 
issues are a sign of a bug that is either simply hardware instability or 
very subtly timing-related.

The reason I mention hardware instability is that there really are bugs 
that happen due to (for example) power supply instabilities. Brownouts 
under heavy load have been causes of problems, but perhaps surprisingly, 
so has _idle_ time thanks to sleep-states!

The latter is probably due to bad power conditioning on the CPU power 
lines, where the huge current swings (going from high CPU power to low, and 
back again) not only have made some motherboards "sing" (or "hum", 
depending on frequency) but also cause voltage instability and then 
the CPU crashes.

Am I saying that's the reason you see problems? Probably not. Most 
instabilities really are due to kernel bugs. But hardware instabilities do 
happen, and they can have these kinds of odd effects.

> I am just waiting for -rc1 to be released to send an email with my
> problem again, as I am unable to debug this myself.
> I think this is ok on my part, right?

Yes. You've been a good bug reporter, and kept at it. It's not your fault 
that the bug is hard to pin down. 

Quite frankly, it does sound like the hang happens somewhere around the 

	hpet_init
	hpet_acpi_add
	hpet_resources
	hpet_resources: 0xfed00000 is busy

printk's you added (correct?) and we've had tons of issues with NO_HZ, so 
at a guess it is timer-related.

(And I assume it's stable if/once it gets past that boot hang issue? That 
tends to mean that it's not some hardware instability, it's literally our 
init code).

			Linus

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-02 16:28                             ` Linus Torvalds
@ 2008-05-02 17:15                               ` Carlos R. Mafra
  2008-05-02 18:02                                 ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 229+ messages in thread
From: Carlos R. Mafra @ 2008-05-02 17:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adrian Bunk, Paul Mackerras, Josh Boyer, Arjan van de Ven,
	Andrew Morton, Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt, venkatesh.pallipadi

On Fri  2.May'08 at  9:28:08 -0700, Linus Torvalds wrote:

> Quite frankly, it does sound like the hang happens somewhere around the 
> 
> 	hpet_init
> 	hpet_acpi_add
> 	hpet_resources
> 	hpet_resources: 0xfed00000 is busy
> 
> printk's you added (correct?) and we've had tons of issues with NO_HZ, so 
> at a guess it is timer-related.

It happens a bit before that, because when it hangs it doesn't 
print the above lines, and when it does not hang these lines are
the ones printed right after the point where it would otherwise hang. 

> (And I assume it's stable if/once it gets past that boot hang issue? 

Yes, you are right. When I am lucky and the boot succeeds, my Sony laptop
is rock solid and the kernel is wonderful (even the card reader works!).

> That
> tends to mean that it's not some hardware instability, it's literally our 
> init code).

A few days ago I found this message in lkml in reply to a hpet patch
http://lkml.org/lkml/2007/5/7/361 in which the reporter also had 
a similar hang, which was cured by hpet=disable. 

So it is on my TODO list to try to check whether that patch is 
in the current -git and whether it can be reverted somehow (I 
added Venki to the Cc: now).

Thanks a lot for the answer!

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-02  2:54                             ` Russ Dill
  2008-05-02  7:01                               ` Kasper Sandberg
@ 2008-05-02 17:34                               ` Lee Mathers (TCAFS)
  2008-05-02 18:21                                 ` Andi Kleen
  1 sibling, 1 reply; 229+ messages in thread
From: Lee Mathers (TCAFS) @ 2008-05-02 17:34 UTC (permalink / raw)
  To: Russ Dill; +Cc: linux-kernel

Russ Dill wrote:
>>  i can also say that i have noticed this on my own workstation, however
>>  thats not really an as valid case, as i have also upgraded userspace and
>>  such over time, but it used to be that my box wouldnt use more than
>>  ~100mb to boot into X with kde open, and about ~300mb at browsing/mail
>>  and such, but these days my workstation easily uses 1.5gb of ram for no
>>  apparent reason..
>>
>>  something certainly is fishy around here, these days people just tend to
>>  fix it by throwing 10 times more ram in than should really be necessary,
>>  which i guess, is because the ram prices has dropped 10 times
>>
>>     
>
> So you aren't really contributing anything to the discussion. It could
> be userspace, it could be different types of pages you are visiting,
> it could be the kernel, you haven't really measured what is taking up
> the memory. And of course, its all because developers are lazy. Thanks
> for the input.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
If you're subscribing to and reading this list, at least have the courtesy 
to think about your posting before delivering it around the world.  Not to 
be rude, but do you know how dumb you look right now?  There are many 
programs that will provide detailed memory usage reports.  That would 
be a first step.  


By "cc" to the list...

*WTF*!!  For someone who's been lurking on the kernel lists on and off 
since the mid 90's, this has been one of the most stupid discussions and 
largest timesinks to date...  Is this really the LKML or the #linux 
channel on irc?


^ permalink raw reply	[flat|nested] 229+ messages in thread

* RE: RFC: starting a kernel-testers group for newbies
  2008-05-02 17:15                               ` Carlos R. Mafra
@ 2008-05-02 18:02                                 ` Pallipadi, Venkatesh
  2008-05-09 16:32                                   ` Mark Lord
  0 siblings, 1 reply; 229+ messages in thread
From: Pallipadi, Venkatesh @ 2008-05-02 18:02 UTC (permalink / raw)
  To: Carlos R. Mafra, Linus Torvalds
  Cc: Adrian Bunk, Paul Mackerras, Josh Boyer, Arjan van de Ven,
	Andrew Morton, Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt, tglx, Len Brown

 

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org 
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of 
>Carlos R. Mafra
>Sent: Friday, May 02, 2008 10:16 AM
>To: Linus Torvalds
>Cc: Adrian Bunk; Paul Mackerras; Josh Boyer; Arjan van de Ven; 
>Andrew Morton; Rafael J. Wysocki; davem@davemloft.net; 
>linux-kernel@vger.kernel.org; jirislaby@gmail.com; Steven 
>Rostedt; Pallipadi, Venkatesh
>Subject: Re: RFC: starting a kernel-testers group for newbies
>
>On Fri  2.May'08 at  9:28:08 -0700, Linus Torvalds wrote:
>
>> Quite frankly, it does sound like the hang happens somewhere 
>around the 
>> 
>> 	hpet_init
>> 	hpet_acpi_add
>> 	hpet_resources
>> 	hpet_resources: 0xfed00000 is busy
>> 
>> printk's you added (correct?) and we've had tons of issues 
>with NO_HZ, so 
>> at a guess it is timer-related.
>
>It happens a bit before that because when it hangs it doesn't 
>print the above lines, and when it does not hang these lines are
>the ones right after the point where it hangs. 
>
>> (And I assume it's stable if/once it gets past that boot hang issue? 
>
>Yes you are right. When I have luck and the boot succeeds my 
>Sony laptop
>is rock solid and the kernel is wonderful (even the card 
>reader works!).
>
>> That
>> tends to mean that it's not some hardware instability, it's 
>literally our 
>> init code).
>
>A few days ago I found this message in lkml in reply to a hpet patch
>http://lkml.org/lkml/2007/5/7/361 in which the reporter also had 
>a similar hang, which was cured by hpet=disable. 
>
>So it is in my TODO list to try to check out if that patch is 
>in the current -git and whether it can be reverted somehow (I 
>added Venki to the Cc: now)
>
>Thanks a lot for the answer!

It depends on whether HPET is being force detected based on the
chipset or whether it was exported by the BIOS in the ACPI table.

If it was force enabled and above patch is having any effect, then you
should see a message like
> Force enabled HPET at base address 0xfed00000

In any case, of late there seem to be quite a few breakages that are
related to HPET/timer interrupts. One of them was on a system which has
HPET exported by the BIOS
http://bugzilla.kernel.org/show_bug.cgi?id=10409
And the other one where we are force enabling based on chipset
http://bugzilla.kernel.org/show_bug.cgi?id=10561

And then we have the once-in-a-while hangs reported by you, Roman and Mark
here
http://bugzilla.kernel.org/show_bug.cgi?id=10377
http://bugzilla.kernel.org/show_bug.cgi?id=10117


Thanks,
Venki

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-02 17:34                               ` Lee Mathers (TCAFS)
@ 2008-05-02 18:21                                 ` Andi Kleen
  2008-05-02 21:34                                   ` Kasper Sandberg
  0 siblings, 1 reply; 229+ messages in thread
From: Andi Kleen @ 2008-05-02 18:21 UTC (permalink / raw)
  To: Lee Mathers (TCAFS); +Cc: Russ Dill, linux-kernel

"Lee Mathers (TCAFS)" <Lee.Mathers@tcafs.org> writes:

> to think about your posting before delivering it around the world.
> Not to be rude but do you know how dumb you look right now.  There are
> many programs that while provide detailed memory usage  reports.  That
> would be a first step.

To be fair, detailed memory analysis of user space can be tricky,
especially if you consider shared pages etc. And the standard tools for it 
are actually not very good (would be a very interesting area for someone
to work on, I think). Still, the poster could have done much
more research before ranting, agreed.

-Andi

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-02 18:21                                 ` Andi Kleen
@ 2008-05-02 21:34                                   ` Kasper Sandberg
  0 siblings, 0 replies; 229+ messages in thread
From: Kasper Sandberg @ 2008-05-02 21:34 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Lee Mathers (TCAFS), Russ Dill, linux-kernel

On Fri, 2008-05-02 at 20:21 +0200, Andi Kleen wrote:
> "Lee Mathers (TCAFS)" <Lee.Mathers@tcafs.org> writes:
> 
> > to think about your posting before delivering it around the world.
> > Not to be rude but do you know how dumb you look right now.  There are
> > many programs that while provide detailed memory usage  reports.  That
> > would be a first step.
> 
> To be fair detailed memory analysis of user space can be tricky,
> especially if you consider shared pages etc. And the standard tools for it 
> are actually not very good (would be a very interesting area for someone
> to work on I think) Still the poster could have done much
> more research before ranting, agreed.

I did not rant; in fact I said that I did not know enough to place any
blame, or even say what is causing it, whatever the reason might be.
There is a difference: I merely provided some information.

> 
> -Andi


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  2:21                       ` Al Viro
  2008-05-01  5:19                         ` david
@ 2008-05-04  3:26                         ` Rene Herman
  1 sibling, 0 replies; 229+ messages in thread
From: Rene Herman @ 2008-05-04  3:26 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Rafael J. Wysocki, Willy Tarreau, David Miller,
	linux-kernel, Andrew Morton, Jiri Slaby

On 01-05-08 04:21, Al Viro wrote:

> Really?  And how, pray tell, being out there will magically improve the
> code?  "With enough eyes all bugs are shallow" stuff out of ESR's arse?

In the same way that ESR's arse would improve if he'd not wear pants: by him 
going to the gym more to avoid at least a few of the many disgusted stares.

ie, the magic would be in the quality of the code being greater simply due 
to the developer being aware of the openness. The effect probably wears off 
after enough time though...

Rene.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-04-30 14:15             ` Arjan van de Ven
  2008-05-01 12:42               ` David Woodhouse
@ 2008-05-04 12:45               ` Rene Herman
  2008-05-04 13:00                 ` Pekka Enberg
  1 sibling, 1 reply; 229+ messages in thread
From: Rene Herman @ 2008-05-04 12:45 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, Adrian Bunk, Linus Torvalds, Rafael J. Wysocki,
	davem, linux-kernel, jirislaby, Steven Rostedt

On 30-04-08 16:15, Arjan van de Ven wrote:

> Does that mean nobody should fix the m68k bug? Someone who cares about
> m68k for sure should work on it, or if it's easy for an ext4 developer, 
> sure. But if the ext4 person has to spend 8 hours on it figuring cross
> compilers, I say we're doing something very wrong here. (no offense to
> the m68k people, but there's just a few of you; maybe I should have
> picked voyager instead)

On that note, I'd really like to see better binary availability of cross 
compilers. While it's improved over the last few years, mostly due to the 
crossgcc stuff, it's still a pain. Ideally they would even be available 
through the distribution package manager, but failing that, some dedicated 
place on kernel.org with x86->lots and some of the more widely used other 
combinations would quite definitely be good. Perhaps not really directly 
relevant to this thread as such, but still good.

Andrew maintain{s,ed} a number of them at

http://userweb.kernel.org/~akpm/cross-compilers/

But as you see, most of the stuff there is really old again...

Rene

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-04 12:45               ` Rene Herman
@ 2008-05-04 13:00                 ` Pekka Enberg
  2008-05-04 13:19                   ` Rene Herman
  2008-05-05 13:13                   ` crosscompiler [WAS: RFC: starting a kernel-testers group for newbies] Enrico Weigelt
  0 siblings, 2 replies; 229+ messages in thread
From: Pekka Enberg @ 2008-05-04 13:00 UTC (permalink / raw)
  To: Rene Herman
  Cc: Arjan van de Ven, Andrew Morton, Adrian Bunk, Linus Torvalds,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt, Vegard Nossum

On Sun, May 4, 2008 at 3:45 PM, Rene Herman <rene.herman@keyaccess.nl> wrote:
>  On that note, I'd really like to see better binary availability of cross
> compilers. While it's improved over the last few years mostly due to the
> crossgcc stuff it's still a pain. Ideally, they would be available through
> the distribution package manager even but failing that some dedicated place
> on kernel.org with x86->lots and some of the more widely used other
> combinations would quite definitely be good. Perhaps not really directly
> relevant to this thread as such, but still good.
>
>  Andrew maintain{s,ed} a number of them at
>
>  http://userweb.kernel.org/~akpm/cross-compilers/
>
>  But as you see, most of the stuff there is really old again...

You're most welcome to help out Vegard to do this:

http://www.kernel.org/pub/tools/crosstool/

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-04 13:00                 ` Pekka Enberg
@ 2008-05-04 13:19                   ` Rene Herman
  2008-05-05 13:13                   ` crosscompiler [WAS: RFC: starting a kernel-testers group for newbies] Enrico Weigelt
  1 sibling, 0 replies; 229+ messages in thread
From: Rene Herman @ 2008-05-04 13:19 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Arjan van de Ven, Andrew Morton, Adrian Bunk, Linus Torvalds,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt, Vegard Nossum

On 04-05-08 15:00, Pekka Enberg wrote:

> On Sun, May 4, 2008 at 3:45 PM, Rene Herman <rene.herman@keyaccess.nl> wrote:

>>  On that note, I'd really like to see better binary availability of cross
>> compilers. While it's improved over the last few years mostly due to the
>> crossgcc stuff it's still a pain. Ideally, they would be available through
>> the distribution package manager even but failing that some dedicated place
>> on kernel.org with x86->lots and some of the more widely used other
>> combinations would quite definitely be good. Perhaps not really directly
>> relevant to this thread as such, but still good.
>>
>>  Andrew maintain{s,ed} a number of them at
>>
>>  http://userweb.kernel.org/~akpm/cross-compilers/
>>
>>  But as you see, most of the stuff there is really old again...
> 
> You're most welcome to help out Vegard to do this:
> 
> http://www.kernel.org/pub/tools/crosstool/

Ah, thanks, lovely; it's just new, I see (and yes, I meant s/crossgcc/crosstool/). 
Good thing. I'll check it out and see if there's anything to add.

Rene.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  4:46           ` Linus Torvalds
@ 2008-05-04 13:47             ` Krzysztof Halasa
  2008-05-04 15:05               ` Jacek Luczak
  0 siblings, 1 reply; 229+ messages in thread
From: Krzysztof Halasa @ 2008-05-04 13:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, Paul Mackerras, Rafael J. Wysocki, David Miller,
	linux-kernel, Andrew Morton, Jiri Slaby

Personally I think the current process works reasonably well, though
we should always try to improve it further...

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 1 May 2008, Jeff Garzik wrote:
>> - opens all the debates about running parallel branches, such as, would it be
>> better to /branch/ for 2.6.X-rc, and then keep going full steam on
>> the trunk?

I think you could branch at ~ rc3 (strictly critical fixes only from
this point). This way 'next' wouldn't be low-maintenance but the
release branch would be.

I.e., the merge window would open at ~ rc3. At 'final', the merge window
would probably be already closed :-)

Something like:
- 2.6.26-rc3: 2.6.27 merge window opens, 2.6.26 - fixes only
- 1 week later: no core changes for 2.6.27 except fixes (drivers only?)

2.6.26* would receive backports from 2.6.27 (cherry-picking? applying
on 2.6.26 and merging?).

The "no open regressions" rule would make sense certainly - unless in
a specific case agreed otherwise.

Perhaps if needed you could let other people do the final release
("stable" extension) and concentrate on the trunk.

> If I'd have both a 'next' branch _and_ a full 2-week merge window, there's 
> no upside.

Shorter cycle is the big upside.

Perhaps we could start branching later at first - say at 2.6.26-rc5,
and see how it works.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-04 13:47             ` Krzysztof Halasa
@ 2008-05-04 15:05               ` Jacek Luczak
  0 siblings, 0 replies; 229+ messages in thread
From: Jacek Luczak @ 2008-05-04 15:05 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Linus Torvalds, Jeff Garzik, Paul Mackerras, Rafael J. Wysocki,
	David Miller, linux-kernel, Andrew Morton, Jiri Slaby

Krzysztof Halasa pisze:
> Personally I think the current process works reasonably well, though
> as we should always try to improve it further...
> 
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
>> On Thu, 1 May 2008, Jeff Garzik wrote:
>>> - opens all the debates about running parallel branches, such as, would it be
>>> better to /branch/ for 2.6.X-rc, and then keep going full steam on
>>> the trunk?
> 
> I think you could branch at ~ rc3 (strictly critical fixes only from
> this point). This way 'next' wouldn't be low-maintenance but the
> release branch would be.
> 
> I.e., the merge window would open at ~ rc3. At 'final', the merge window
> would probably be already closed :-)
> 
> Something like:
> - 2.6.26-rc3: 2.6.27 merge window opens, 2.6.26 - fixes only
> - 1 week later: no core changes for 2.6.27 except fixes (drivers only?)

Yep, that sounds pretty interesting. But it would be better to start something
like a "slow merge window" (explained below) around -rc4, where things really
slow down (or used to).

The idea of a "slow merge window" would look like:
	- merge only *obvious* (long awaiting) changes;
	- merge stuff (fixes) which comes to -rc releases;
	- merge non-core changes from -mm;

After releasing stable kernel the old style merge window opens.

> 2.6.26* would receive backports from 2.6.27 (cherry-picking? applying
> on 2.6.26 and merging?).
> The "no open regressions" rule would make sense certainly - unless in
> a specific case agreed otherwise.
> 
> Perhaps if needed you could let other people do the final release
> ("stable" extension) and concentrate on the trunk.
> 
>> If I'd have both a 'next' branch _and_ a full 2-week merge window, there's 
>> no upside.
> 
> Shorter cycle is the big upside.
> 
> Perhaps we could start branching later at first - say at 2.6.26-rc5,
> and see how does it work.

-Jacek

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 22:19           ` Ingo Molnar
  2008-04-30 22:22             ` David Miller
  2008-04-30 22:35             ` Ingo Molnar
@ 2008-05-05  3:04             ` Rusty Russell
  2 siblings, 0 replies; 229+ messages in thread
From: Rusty Russell @ 2008-05-05  3:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: David Miller, akpm, torvalds, rjw, linux-kernel, jirislaby

On Thursday 01 May 2008 08:19:36 Ingo Molnar wrote:
> * David Miller <davem@davemloft.net> wrote:
> > And the people who stick these regressions into the tree need more
> > negative reinforcement.
>
> What we need is not 'negative reinforcement'.

Over time as patches succeed more I reduce testing so I can "get things done 
faster".  Eventually I screw up, and get more cautious on checking.  It's a 
dynamic balance.

With reduced review comes sloppier code.  If we can't increase review, we can 
at least increase the penalty for screwing up when I do get caught.

If vger dropped all my emails for a week after I broke the kernel, I'd be far 
more careful OR I'd find efficient ways to avoid doing that (like increasing 
review, or automated testing).  Either way, it's a win.

But I'm sure everyone else is far more disciplined than I...
Rusty.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-01 12:42               ` David Woodhouse
  2008-04-30 15:02                 ` Arjan van de Ven
@ 2008-05-05 10:03                 ` Benny Halevy
  1 sibling, 0 replies; 229+ messages in thread
From: Benny Halevy @ 2008-05-05 10:03 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Arjan van de Ven, Andrew Morton, Adrian Bunk, Linus Torvalds,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt

On May. 01, 2008, 15:42 +0300, David Woodhouse <dwmw2@infradead.org> wrote:
> On Wed, 2008-04-30 at 07:15 -0700, Arjan van de Ven wrote:
>> Maybe that's a "boggle" for you; but for me that's symptomatic of
>> where we are today: We don't make (effective) prioritization
>> decisions. Such decisions are hard, because it effectively means
>> telling people "I'm sorry but your bug is not yet important". 
> 
> It's not that clear-cut, either. Something which manifests itself as a
> build failure or an immediate test failure on m68k alone, might actually
> turn out to cause subtle data corruption on other platforms.
> 
> You can't always know that it isn't important, just because it only
> shows up in some esoteric circumstances. You only really know how
> important it was _after_ you've fixed it.
> 
> That obviously doesn't help us to prioritise.
> 

Ideally, you'd do an analysis first and then prioritize, based
on the severity of the bug, its exposure, how easy it is it fix,
etc.  If while doing that you already have a fix at hand, you're
almost done :)

Recursively, there's the problem of which bugs you analyze first.
I'm inclined to say that you want to analyze most if not all bug reports
at a higher priority than working on fixing non-critical bugs.

Benny

^ permalink raw reply	[flat|nested] 229+ messages in thread

* crosscompiler [WAS: RFC: starting a kernel-testers group for newbies]
  2008-05-04 13:00                 ` Pekka Enberg
  2008-05-04 13:19                   ` Rene Herman
@ 2008-05-05 13:13                   ` Enrico Weigelt
  1 sibling, 0 replies; 229+ messages in thread
From: Enrico Weigelt @ 2008-05-05 13:13 UTC (permalink / raw)
  To: linux kernel list

* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> You're most welcome to help out Vegard to do this:
> 
> http://www.kernel.org/pub/tools/crosstool/

You could also use ct-ng:

http://ymorin.is-a-geek.org/dokuwiki/projects/crosstool

Works excellent for me :)


cu
-- 
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service - http://www.metux.de/
---------------------------------------------------------------------
 Please visit the OpenSource QM Taskforce:
 	http://wiki.metux.de/public/OpenSource_QM_Taskforce
 Patches / Fixes for a lot dozens of packages in dozens of versions:
	http://patches.metux.de/
---------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 20:54       ` Andrew Morton
  2008-04-30 21:21         ` David Miller
  2008-04-30 21:42         ` Dmitri Vorobiev
@ 2008-05-09  9:28         ` Jiri Kosina
  2008-05-09 15:00           ` Jeff Garzik
  2 siblings, 1 reply; 229+ messages in thread
From: Jiri Kosina @ 2008-05-09  9:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, rjw, davem, linux-kernel, jirislaby

On Wed, 30 Apr 2008, Andrew Morton wrote:

> I get the impression that we're seeing very little non-Stephen testing 
> of linux-next at this stage.  I hope we can ramp that up a bit, 
> initially by having core developers doing at least some basic sanity 
> testing.

Probably it would make sense also for distro vendors to make linux-next 
snapshots available in their development distro branches (redhat's 
rawhide, opensuse's factory, etc), to make it easier to test by those 
users who are willing to test if it works in their environment, but don't 
want to compile kernels themselves.

-- 
Jiri Kosina


^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-09  9:28         ` Jiri Kosina
@ 2008-05-09 15:00           ` Jeff Garzik
  0 siblings, 0 replies; 229+ messages in thread
From: Jeff Garzik @ 2008-05-09 15:00 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Andrew Morton, Linus Torvalds, rjw, davem, linux-kernel, jirislaby

Jiri Kosina wrote:
> On Wed, 30 Apr 2008, Andrew Morton wrote:
> 
>> I get the impression that we're seeing very little non-Stephen testing 
>> of linux-next at this stage.  I hope we can ramp that up a bit, 
>> initially by having core developers doing at least some basic sanity 
>> testing.

I try to test linux-next on a few SATA test boxes, but it's definitely 
not a daily thing.


> Probably it would make sense also for distro vendors to make linux-next 
> snapshots available in their development distro branches (redhat's 
> rawhide, opensuse's factory, etc), to make it easier to test by those 
> users who are willing to test if it works in their environment, but don't 
> want to compile kernels themselves.

Agreed...  any lead time on linux-next testing would be great.

	Jeff




^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-02 18:02                                 ` Pallipadi, Venkatesh
@ 2008-05-09 16:32                                   ` Mark Lord
  2008-05-09 19:30                                     ` Carlos R. Mafra
  0 siblings, 1 reply; 229+ messages in thread
From: Mark Lord @ 2008-05-09 16:32 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Carlos R. Mafra, Linus Torvalds, Adrian Bunk, Paul Mackerras,
	Josh Boyer, Arjan van de Ven, Andrew Morton, Rafael J. Wysocki,
	davem, linux-kernel, jirislaby, Steven Rostedt, tglx, Len Brown

Pallipadi, Venkatesh wrote:
>  
> 
>> -----Original Message-----
>> From: linux-kernel-owner@vger.kernel.org 
>> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of 
>> Carlos R. Mafra
>> Sent: Friday, May 02, 2008 10:16 AM
>> To: Linus Torvalds
>> Cc: Adrian Bunk; Paul Mackerras; Josh Boyer; Arjan van de Ven; 
>> Andrew Morton; Rafael J. Wysocki; davem@davemloft.net; 
>> linux-kernel@vger.kernel.org; jirislaby@gmail.com; Steven 
>> Rostedt; Pallipadi, Venkatesh
>> Subject: Re: RFC: starting a kernel-testers group for newbies
>>
>> On Fri  2.May'08 at  9:28:08 -0700, Linus Torvalds wrote:
>>
>>> Quite frankly, it does sound like the hang happens somewhere 
>> around the 
>>> 	hpet_init
>>> 	hpet_acpi_add
>>> 	hpet_resources
>>> 	hpet_resources: 0xfed00000 is busy
>>>
>>> printk's you added (correct?) and we've had tons of issues 
>> with NO_HZ, so 
>>> at a guess it is timer-related.
>> It happens a bit before that because when it hangs it doesn't 
>> print the above lines, and when it does not hang these lines are
>> the ones right after the point where it hangs. 
>>
>>> (And I assume it's stable if/once it gets past that boot hang issue? 
>> Yes you are right. When I have luck and the boot succeeds my 
>> Sony laptop
>> is rock solid and the kernel is wonderful (even the card 
>> reader works!).
>>
>>> That
>>> tends to mean that it's not some hardware instability, it's 
>> literally our 
>>> init code).
>> A few days ago I found this message in lkml in reply to a hpet patch
>> http://lkml.org/lkml/2007/5/7/361 in which the reporter also had 
>> a similar hang, which was cured by hpet=disable. 
>>
>> So it is in my TODO list to try to check out if that patch is 
>> in the current -git and whether it can be reverted somehow (I 
>> added Venki to the Cc: now)
>>
>> Thanks a lot for the answer!
> 
> It depends on whether HPET is being force detected based on the
> chipset or whether it was exported by the BIOS in the ACPI table.
> 
> If it was force enabled and the above patch is having any effect, then you
> should see a message like
>> Force enabled HPET at base address 0xfed00000
> 
> In any case, of late there seem to be quite a few breakages that are
> related to HPET/timer interrupts. One of them was on a system which has
> HPET being exported by the BIOS
> http://bugzilla.kernel.org/show_bug.cgi?id=10409
> And the other one where we are force enabling based on the chipset
> http://bugzilla.kernel.org/show_bug.cgi?id=10561
> 
> And then we have the hangs reported once in a while by you, Roman and Mark
> here
> http://bugzilla.kernel.org/show_bug.cgi?id=10377
> http://bugzilla.kernel.org/show_bug.cgi?id=10117
..

Yeah.  This particular bug first appeared when NOHZ & HPET were added.
Somebody once suggested it had something to do with an SMI interrupt
happening in the midst of HPET calibration or some such thing.

But nobody who works on the HPET code has ever shown more than a casual
interest in helping to track down and fix whatever the problem is.
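
For anyone triaging similar reports, the distinction Venki draws above (HPET exported by the BIOS in the ACPI tables vs. force-enabled from chipset detection) can be sketched roughly as below. The sample log lines are hypothetical, modelled on the messages quoted in this thread; on a real machine you would grep `dmesg` output instead of a canned file.

```shell
#!/bin/sh
# Rough triage sketch: did HPET come from the BIOS ACPI table, or was it
# force-enabled from chipset detection?  The log text below is a made-up
# example, not output from any machine in this thread.
cat > dmesg.sample <<'EOF'
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Force enabled HPET at base address 0xfed00000
hpet_resources: 0xfed00000 is busy
EOF

if grep -q '^Force enabled HPET' dmesg.sample; then
    echo "HPET force-enabled from chipset detection"
else
    echo "HPET (if present) came from the BIOS ACPI table"
fi
# Workaround reporters in this thread used for the boot hang:
# boot with the kernel parameter  hpet=disable
```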

Cheers

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-09 16:32                                   ` Mark Lord
@ 2008-05-09 19:30                                     ` Carlos R. Mafra
  2008-05-09 20:39                                       ` Mark Lord
  0 siblings, 1 reply; 229+ messages in thread
From: Carlos R. Mafra @ 2008-05-09 19:30 UTC (permalink / raw)
  To: Mark Lord
  Cc: Pallipadi, Venkatesh, Linus Torvalds, Adrian Bunk,
	Paul Mackerras, Josh Boyer, Arjan van de Ven, Andrew Morton,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt, tglx, Len Brown

On Fri  9.May'08 at 12:32:51 -0400, Mark Lord wrote:
> Pallipadi, Venkatesh wrote:
>>  
>>> -----Original Message-----
>>> From: linux-kernel-owner@vger.kernel.org 
>>> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Carlos R. Mafra
>>> Sent: Friday, May 02, 2008 10:16 AM
>>> To: Linus Torvalds
>>> Cc: Adrian Bunk; Paul Mackerras; Josh Boyer; Arjan van de Ven; Andrew 
>>> Morton; Rafael J. Wysocki; davem@davemloft.net; 
>>> linux-kernel@vger.kernel.org; jirislaby@gmail.com; Steven Rostedt; 
>>> Pallipadi, Venkatesh
>>> Subject: Re: RFC: starting a kernel-testers group for newbies
>>>
>>> On Fri  2.May'08 at  9:28:08 -0700, Linus Torvalds wrote:
>>>
>>>> Quite frankly, it does sound like the hang happens somewhere 
>>> around the 
>>>> 	hpet_init
>>>> 	hpet_acpi_add
>>>> 	hpet_resources
>>>> 	hpet_resources: 0xfed00000 is busy
>>>>
>>>> printk's you added (correct?) and we've had tons of issues 
>>> with NO_HZ, so 
>>>> at a guess it is timer-related.
>>> It happens a bit before that because when it hangs it doesn't print the 
>>> above lines, and when it does not hang these lines are
>>> the ones right after the point where it hangs. 
>>>> (And I assume it's stable if/once it gets past that boot hang issue? 
>>> Yes you are right. When I have luck and the boot succeeds my Sony laptop
>>> is rock solid and the kernel is wonderful (even the card reader works!).
>>>
>>>> That
>>>> tends to mean that it's not some hardware instability, it's 
>>> literally our 
>>>> init code).
>>> A few days ago I found this message in lkml in reply to a hpet patch
>>> http://lkml.org/lkml/2007/5/7/361 in which the reporter also had a 
>>> similar hang, which was cured by hpet=disable. 
>>> So it is in my TODO list to try to check out if that patch is in the 
>>> current -git and whether it can be reverted somehow (I added Venki to the 
>>> Cc: now)
>>>
>>> Thanks a lot for the answer!
>>
>> It depends on whether HPET is being force detected based on the
>> chipset or whether it was exported by the BIOS in the ACPI table.
>>
>> If it was force enabled and the above patch is having any effect, then you
>> should see a message like
>>> Force enabled HPET at base address 0xfed00000
>>
>> In any case, of late there seem to be quite a few breakages that are
>> related to HPET/timer interrupts. One of them was on a system which has
>> HPET being exported by the BIOS
>> http://bugzilla.kernel.org/show_bug.cgi?id=10409
>> And the other one where we are force enabling based on the chipset
>> http://bugzilla.kernel.org/show_bug.cgi?id=10561
>>
>> And then we have the hangs reported once in a while by you, Roman and Mark
>> here
>> http://bugzilla.kernel.org/show_bug.cgi?id=10377
>> http://bugzilla.kernel.org/show_bug.cgi?id=10117
> ..
>
> Yeah.  This particular bug first appeared when NOHZ & HPET were added.
> Somebody once suggested it had something to do with an SMI interrupt
> happening in the midst of HPET calibration or some such thing.
>

I said I was waiting for -rc1 to be released to send another email
about my HPET problem, but curiously with v2.6.26-rc1-6-gafa26be 
my laptop did not hang after 30+ boots and counting. 

Somewhere between 2.6.25-07000-(something) and the above kernel,
something happened which significantly changed the probability
of hanging during boot.

I could not boot more than 3 times in
a row without hanging with kernels up to 2.6.25-07000 (approximately),
and now I am still booting v2.6.26-rc1-6-gafa26be a few times a day
with no hangs yet.

Yesterday I started a "reverse" bisection, trying to find which
commit "fixed" it, but I haven't finished yet (it is past
-7200).

Of course I am not sure the latest -git won't hang on the 100th
boot, but it has definitely improved.

> But nobody who works on the HPET code has ever shown more than a casual
> interest in helping to track down and fix whatever the problem is.

Well, I would like to thank Venki for his effort because he even
answered some private emails from me about this issue and is 
tracking the bugzillas about it.

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: RFC: starting a kernel-testers group for newbies
  2008-05-09 19:30                                     ` Carlos R. Mafra
@ 2008-05-09 20:39                                       ` Mark Lord
  0 siblings, 0 replies; 229+ messages in thread
From: Mark Lord @ 2008-05-09 20:39 UTC (permalink / raw)
  To: Mark Lord, Pallipadi, Venkatesh, Linus Torvalds, Adrian Bunk,
	Paul Mackerras, Josh Boyer, Arjan van de Ven, Andrew Morton,
	Rafael J. Wysocki, davem, linux-kernel, jirislaby,
	Steven Rostedt, tglx, Len Brown

Carlos R. Mafra wrote:
> On Fri  9.May'08 at 12:32:51 -0400, Mark Lord wrote:
>> Pallipadi, Venkatesh wrote:
>>>  
>>>> -----Original Message-----
>>>> From: linux-kernel-owner@vger.kernel.org 
>>>> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Carlos R. Mafra
>>>> Sent: Friday, May 02, 2008 10:16 AM
>>>> To: Linus Torvalds
>>>> Cc: Adrian Bunk; Paul Mackerras; Josh Boyer; Arjan van de Ven; Andrew 
>>>> Morton; Rafael J. Wysocki; davem@davemloft.net; 
>>>> linux-kernel@vger.kernel.org; jirislaby@gmail.com; Steven Rostedt; 
>>>> Pallipadi, Venkatesh
>>>> Subject: Re: RFC: starting a kernel-testers group for newbies
>>>>
>>>> On Fri  2.May'08 at  9:28:08 -0700, Linus Torvalds wrote:
>>>>
>>>>> Quite frankly, it does sound like the hang happens somewhere 
>>>> around the 
>>>>> 	hpet_init
>>>>> 	hpet_acpi_add
>>>>> 	hpet_resources
>>>>> 	hpet_resources: 0xfed00000 is busy
>>>>>
>>>>> printk's you added (correct?) and we've had tons of issues 
>>>> with NO_HZ, so 
>>>>> at a guess it is timer-related.
>>>> It happens a bit before that because when it hangs it doesn't print the 
>>>> above lines, and when it does not hang these lines are
>>>> the ones right after the point where it hangs. 
>>>>> (And I assume it's stable if/once it gets past that boot hang issue? 
>>>> Yes you are right. When I have luck and the boot succeeds my Sony laptop
>>>> is rock solid and the kernel is wonderful (even the card reader works!).
>>>>
>>>>> That
>>>>> tends to mean that it's not some hardware instability, it's 
>>>> literally our 
>>>>> init code).
>>>> A few days ago I found this message in lkml in reply to a hpet patch
>>>> http://lkml.org/lkml/2007/5/7/361 in which the reporter also had a 
>>>> similar hang, which was cured by hpet=disable. 
>>>> So it is in my TODO list to try to check out if that patch is in the 
>>>> current -git and whether it can be reverted somehow (I added Venki to the 
>>>> Cc: now)
>>>>
>>>> Thanks a lot for the answer!
>>> It depends on whether HPET is being force detected based on the
>>> chipset or whether it was exported by the BIOS in the ACPI table.
>>>
>>> If it was force enabled and the above patch is having any effect, then you
>>> should see a message like
>>>> Force enabled HPET at base address 0xfed00000
>>> In any case, of late there seem to be quite a few breakages that are
>>> related to HPET/timer interrupts. One of them was on a system which has
>>> HPET being exported by the BIOS
>>> http://bugzilla.kernel.org/show_bug.cgi?id=10409
>>> And the other one where we are force enabling based on the chipset
>>> http://bugzilla.kernel.org/show_bug.cgi?id=10561
>>>
>>> And then we have the hangs reported once in a while by you, Roman and Mark
>>> here
>>> http://bugzilla.kernel.org/show_bug.cgi?id=10377
>>> http://bugzilla.kernel.org/show_bug.cgi?id=10117
>> ..
>>
>> Yeah.  This particular bug first appeared when NOHZ & HPET were added.
>> Somebody once suggested it had something to do with an SMI interrupt
>> happening in the midst of HPET calibration or some such thing.
>>
> 
> I said I was waiting for -rc1 to be released to send another email
> about my HPET problem, but curiously with v2.6.26-rc1-6-gafa26be 
> my laptop did not hang after 30+ boots and counting. 
> 
> Somewhere between 2.6.25-07000-(something) and the above kernel,
> something happened which significantly changed the probability
> of hanging during boot.
> 
> I could not boot more than 3 times in
> a row without hanging with kernels up to 2.6.25-07000 (approximately),
> and now I am still booting v2.6.26-rc1-6-gafa26be a few times a day
> with no hangs yet.
> 
> Yesterday I started a "reverse" bisection, trying to find which
> commit "fixed" it, but I haven't finished yet (it is past
> -7200).
> 
> Of course I am not sure the latest -git won't hang on the 100th
> boot, but it has definitely improved.
> 
>> But nobody who works on the HPET code has ever shown more than a casual
>> interest in helping to track down and fix whatever the problem is.
> 
> Well, I would like to thank Venki for his effort because he even
> answered some private emails from me about this issue and is 
> tracking the bugzillas about it.
..

My experience with this bug, since 2.6.20 or so, has been that it comes
and goes with even the most innocent change in the .config file,
like turning frame pointers on/off.

Cheers

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-04-30 23:11             ` Andrew Morton
@ 2008-05-12  9:27               ` Ben Dooks
  0 siblings, 0 replies; 229+ messages in thread
From: Ben Dooks @ 2008-05-12  9:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mariusz Kozlowski, dpn, torvalds, rjw, davem, linux-kernel, jirislaby

On Wed, Apr 30, 2008 at 04:11:30PM -0700, Andrew Morton wrote:
> On Thu, 1 May 2008 00:53:31 +0200
> Mariusz Kozlowski <m.kozlowski@tuxland.pl> wrote:
> 
> > Hello,
> > 
> > > > Perhaps we should be clear and simple about what potential testers 
> > > > should be running at any given point in time.  With -mm, linux-next, 
> > > > linux-2.6, etc, as a newcomer I find it difficult to know where my 
> > > > testing time and energy is best directed.
> > 
> > Speaking of energy and time of a tester. I'd like to know where these resources
> > should be directed from the arch point of view. Once I had a plan to buy as
> > many arches as I could get and run a farm of test boxes 8-) But that's hard
> > because of various reasons (money, time, room, energy). What arches need more
> > attention? Which are forgotten? Which are going away? For example does buying
> > an alphaserver DS 20 (hey - it's cheap) and running tests on it makes sense
> > these days?
> > 
> 
> gee.
> 
> I think to a large extent this problem solves itself - the "more important"
> architectures have more people using them, so they get more testing and
> more immediate testing.
> 
> However there are gaps.  I'd say that arm is one of the more important
> architectures, but many people who are interested in arm tend to shy away
> from bleeding-edge kernels for various reasons.  Mainly because they have
> real products to get out the door, rather than dinking around with mainline
> kernel development.  So testing bleeding-edge on some arm systems would be
> good, I expect.

As a matter both of personal practice and of my employer's policy, we
try to ensure we can offer our customers at least the previous 'stable'
kernel release, and that our development process tracks the kernel -rcX
candidates. We also run an autobuilder[1] which puts every -git release
through an automated build (no auto-test yet) so that we can detect
any build or configuration errors in the releases.

ARM is a fast-moving area due to the number of silicon vendors out
there who seem intent on doing their own thing, often forking the
hardware blocks they use across different development branches. I
am currently looking at merging support for the S3C6400 (new) and
finishing the S3C2443 (similar to the 6400) and the S3C24A0; this
means that I have a lot of code to look through before each release,
and a stall will just keep the backlog building, making my job
a lot more difficult.


[1] http://armlinux.simtec.co.uk/kautobuild/

-- 
Ben (ben@fluff.org, http://www.fluff.org/)

  'a smiley only costs 4 bytes'

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
  2008-05-01  2:30                     ` Linus Torvalds
  2008-05-01 18:54                       ` Adrian Bunk
@ 2008-05-14 14:55                       ` Pavel Machek
  1 sibling, 0 replies; 229+ messages in thread
From: Pavel Machek @ 2008-05-14 14:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adrian Bunk, Andrew Morton, rjw, davem, linux-kernel, jirislaby

Hi!

> > I am saying that it was merged too early, and that there are points that 
> > should have been addressed before the driver got merged.
> > 
> > Get it submitted for review to linux-kernel.
> > Give the maintainers some time to incorporate all comments.
> > Even one month later it could still have made it into 2.6.25.
> > 
> > The only problem with my suggestion is that it's currently pretty random 
> > whether someone takes the time to review such a driver on linux-kernel.
> 
> Now, I do agree that we could/should have some more process in general. I 
> really _would_ like to have a process in place that basically says:
> 
>  - everything must have gone through lkml at least once


What about 'must go through lkml at least once *outside the merge
window*'? Or is it just me?

During the merge window, I'm totally overloaded by all those patches
going in and the related lkml traffic...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: Slow DOWN, please!!!
@ 2008-04-30 20:59 devzero
  0 siblings, 0 replies; 229+ messages in thread
From: devzero @ 2008-04-30 20:59 UTC (permalink / raw)
  To: linux-kernel

yes, please !

I know Linux is an ever-moving target, but from my user's perspective I think the kernel more and more suffers from a quality problem.

The deeper I'm involved in Linux, and the more I read of lkml or bugzilla, the worse an impression I get.

OK, maybe Windows still has more bugs and the transparency of Linux gives a false impression, but it's ridiculous that things get broken this often.

E.g. I rarely saw a CD-ROM/DVD fail on Windows - but I have seen LOTS of problems with that in Linux, especially with more recent kernels.

I can somewhat understand why one of my colleagues at work (windoze evangelist, sigh) constantly teases me with "nahh, go away, I don't want to grapple with your DIY superstore OS" :)

Linux is absolutely great, but please make sure that quality and stability are priority number one!




^ permalink raw reply	[flat|nested] 229+ messages in thread

end of thread, other threads:[~2008-05-14 14:56 UTC | newest]

Thread overview: 229+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-30  2:03 Slow DOWN, please!!! David Miller
2008-04-30  4:03 ` David Newall
2008-04-30  4:18   ` David Miller
2008-04-30 13:04     ` David Newall
2008-04-30 13:18       ` Michael Kerrisk
2008-04-30 14:51       ` Linus Torvalds
2008-04-30 18:21         ` David Newall
2008-04-30 18:27           ` Linus Torvalds
2008-04-30 18:55             ` David Newall
2008-04-30 19:08               ` Linus Torvalds
2008-04-30 19:16                 ` David Newall
2008-04-30 19:25                   ` Linus Torvalds
2008-05-01  4:31                     ` David Newall
2008-05-01  4:37                       ` David Miller
2008-05-01 13:49                       ` Lennart Sorensen
2008-05-01 15:28                       ` Kasper Sandberg
2008-05-01 17:49                         ` Russ Dill
2008-05-02  1:47                           ` Kasper Sandberg
2008-05-02  2:54                             ` Russ Dill
2008-05-02  7:01                               ` Kasper Sandberg
2008-05-02 17:34                               ` Lee Mathers (TCAFS)
2008-05-02 18:21                                 ` Andi Kleen
2008-05-02 21:34                                   ` Kasper Sandberg
2008-04-30 19:06             ` Chris Friesen
2008-04-30 19:13               ` Linus Torvalds
2008-04-30 19:22                 ` David Newall
2008-04-30 19:42                   ` Linus Torvalds
2008-04-30  7:11   ` Tarkan Erimer
2008-04-30 13:28     ` David Newall
2008-04-30 13:38       ` Mike Galbraith
2008-04-30 14:41       ` mws
2008-04-30 14:55   ` Russ Dill
2008-04-30 14:48 ` Peter Teoh
2008-04-30 19:36 ` Rafael J. Wysocki
2008-04-30 20:00   ` Andrew Morton
2008-04-30 20:20     ` Rafael J. Wysocki
2008-04-30 20:05   ` Linus Torvalds
2008-04-30 20:14     ` Linus Torvalds
2008-04-30 20:56       ` Rafael J. Wysocki
2008-04-30 23:34       ` Greg KH
2008-04-30 20:45     ` Rafael J. Wysocki
2008-04-30 21:37       ` Linus Torvalds
2008-04-30 22:23         ` Rafael J. Wysocki
2008-04-30 22:31           ` Linus Torvalds
2008-04-30 22:41             ` Andrew Morton
2008-04-30 23:23               ` Rafael J. Wysocki
2008-04-30 23:41                 ` david
2008-04-30 23:51                   ` Rafael J. Wysocki
2008-05-01  0:57               ` Adrian Bunk
2008-05-01  1:25                 ` Linus Torvalds
2008-05-01  2:13                   ` Adrian Bunk
2008-05-01  2:30                     ` Linus Torvalds
2008-05-01 18:54                       ` Adrian Bunk
2008-05-14 14:55                       ` Pavel Machek
2008-05-01  1:35                 ` Theodore Tso
2008-05-01 12:31               ` Tarkan Erimer
2008-05-01 15:34                 ` Stefan Richter
2008-05-02 14:05                   ` Tarkan Erimer
2008-04-30 22:46             ` Willy Tarreau
2008-04-30 22:52               ` Andrew Morton
2008-04-30 23:21                 ` Willy Tarreau
2008-04-30 23:38                   ` Chris Shoemaker
2008-04-30 23:20               ` Linus Torvalds
2008-05-01  0:42                 ` Rafael J. Wysocki
2008-05-01  1:19                   ` Linus Torvalds
2008-05-01  1:31                     ` Andrew Morton
2008-05-01  1:43                       ` Linus Torvalds
2008-05-01 10:59                         ` Rafael J. Wysocki
2008-05-01 15:26                           ` Linus Torvalds
2008-05-01 17:09                             ` Rafael J. Wysocki
2008-05-01 17:41                               ` Linus Torvalds
2008-05-01 18:11                                 ` Al Viro
2008-05-01 18:23                                   ` Linus Torvalds
2008-05-01 18:30                                     ` Linus Torvalds
2008-05-01 18:58                                     ` Willy Tarreau
2008-05-01 19:37                                     ` Al Viro
2008-05-01 19:58                                       ` Andrew Morton
2008-05-01 20:07                                       ` Joel Becker
2008-05-01 18:50                                 ` Willy Tarreau
2008-05-01 19:07                                   ` david
2008-05-01 19:28                                     ` Willy Tarreau
2008-05-01 19:46                                       ` david
2008-05-01 19:53                                         ` Willy Tarreau
2008-05-01 22:17                                   ` Rafael J. Wysocki
2008-05-01 19:39                                 ` Friedrich Göpel
2008-05-01 21:59                                 ` Rafael J. Wysocki
2008-05-02 12:17                                   ` Stefan Richter
2008-05-01 18:35                             ` Chris Frey
2008-05-02 13:22                               ` Enrico Weigelt
2008-05-01  1:40                     ` Linus Torvalds
2008-05-01  1:51                       ` David Miller
2008-05-01  2:01                         ` Linus Torvalds
2008-05-01  2:17                           ` David Miller
2008-05-01  2:21                       ` Al Viro
2008-05-01  5:19                         ` david
2008-05-04  3:26                         ` Rene Herman
2008-05-01  2:31                       ` Nigel Cunningham
2008-05-01 18:32                         ` Stephen Clark
2008-05-01  3:53                       ` Frans Pop
2008-05-01 11:38                       ` Rafael J. Wysocki
2008-04-30 14:28                         ` Arjan van de Ven
2008-05-01 12:41                           ` Rafael J. Wysocki
2008-04-30 15:06                             ` Arjan van de Ven
2008-05-01  5:50                     ` Willy Tarreau
2008-05-01 11:53                       ` Rafael J. Wysocki
2008-05-01 12:11                         ` Will Newton
2008-05-01 13:16                         ` Bartlomiej Zolnierkiewicz
2008-05-01 13:53                           ` Rafael J. Wysocki
2008-05-01 14:35                             ` Bartlomiej Zolnierkiewicz
2008-05-01 15:29                           ` Ray Lee
2008-05-01 19:03                             ` Willy Tarreau
2008-05-01 19:36                         ` Valdis.Kletnieks
2008-05-01  1:30                 ` Jeremy Fitzhardinge
2008-05-01  5:35                   ` Willy Tarreau
2008-04-30 23:03             ` Rafael J. Wysocki
2008-04-30 22:40           ` david
2008-04-30 23:45             ` Rafael J. Wysocki
2008-04-30 23:57               ` david
2008-05-01  0:01                 ` Chris Shoemaker
2008-05-01  0:14                   ` david
2008-05-01  0:38                     ` Linus Torvalds
2008-05-01  1:39                       ` Jeremy Fitzhardinge
2008-05-01  0:38               ` Adrian Bunk
2008-05-01  0:56                 ` Rafael J. Wysocki
2008-05-01  1:25                   ` Adrian Bunk
2008-05-01 12:05                     ` Rafael J. Wysocki
2008-05-01 13:54       ` Stefan Richter
2008-05-01 14:06         ` Rafael J. Wysocki
2008-04-30 23:29     ` Paul Mackerras
2008-05-01  1:57       ` Jeff Garzik
2008-05-01  2:52         ` Frans Pop
2008-05-01  3:47       ` Linus Torvalds
2008-05-01  4:17         ` Jeff Garzik
2008-05-01  4:46           ` Linus Torvalds
2008-05-04 13:47             ` Krzysztof Halasa
2008-05-04 15:05               ` Jacek Luczak
2008-05-01  9:17           ` Alan Cox
2008-04-30 20:15   ` Andrew Morton
2008-04-30 20:31     ` Linus Torvalds
2008-04-30 20:47       ` Dan Noe
2008-04-30 20:59         ` Andrew Morton
2008-04-30 21:30           ` Rafael J. Wysocki
2008-04-30 21:37             ` Andrew Morton
2008-04-30 22:08             ` Linus Torvalds
2008-04-30 22:53           ` Mariusz Kozlowski
2008-04-30 23:11             ` Andrew Morton
2008-05-12  9:27               ` Ben Dooks
2008-05-02 10:20             ` Andi Kleen
2008-05-02 15:33               ` Mariusz Kozlowski
2008-04-30 20:54       ` Andrew Morton
2008-04-30 21:21         ` David Miller
2008-04-30 21:47           ` Rafael J. Wysocki
2008-04-30 22:02           ` Dmitri Vorobiev
2008-04-30 22:19           ` Ingo Molnar
2008-04-30 22:22             ` David Miller
2008-04-30 22:39               ` Rafael J. Wysocki
2008-04-30 22:54                 ` david
2008-04-30 23:12                 ` Willy Tarreau
2008-04-30 23:59                   ` Rafael J. Wysocki
2008-05-01  0:15                   ` Chris Shoemaker
2008-05-01  5:09                     ` Willy Tarreau
2008-04-30 22:35             ` Ingo Molnar
2008-04-30 22:49               ` Andrew Morton
2008-04-30 22:51               ` David Miller
2008-05-01  1:40                 ` Ingo Molnar
2008-05-01  2:48                 ` Adrian Bunk
2008-05-05  3:04             ` Rusty Russell
2008-05-02 13:37           ` Helge Hafting
2008-04-30 21:42         ` Dmitri Vorobiev
2008-04-30 22:06           ` Jiri Slaby
2008-04-30 22:10           ` Andrew Morton
2008-04-30 22:19             ` Linus Torvalds
2008-04-30 22:28               ` Dmitri Vorobiev
2008-05-01 16:26                 ` Diego Calleja
2008-05-01 16:31                   ` Dmitri Vorobiev
2008-05-02  1:48                   ` Stephen Rothwell
2008-05-01 23:06               ` Kevin Winchester
2008-04-30 23:04             ` Dmitri Vorobiev
2008-05-01 15:19               ` Jim Schutt
2008-05-01  6:15             ` Jan Engelhardt
2008-05-09  9:28         ` Jiri Kosina
2008-05-09 15:00           ` Jeff Garzik
2008-04-30 21:52       ` H. Peter Anvin
2008-05-01  3:24         ` Bob Tracy
2008-05-01 16:39         ` Valdis.Kletnieks
2008-05-01  0:31       ` RFC: starting a kernel-testers group for newbies Adrian Bunk
2008-04-30  7:03         ` Arjan van de Ven
2008-05-01  8:13           ` Andrew Morton
2008-04-30 14:15             ` Arjan van de Ven
2008-05-01 12:42               ` David Woodhouse
2008-04-30 15:02                 ` Arjan van de Ven
2008-05-05 10:03                 ` Benny Halevy
2008-05-04 12:45               ` Rene Herman
2008-05-04 13:00                 ` Pekka Enberg
2008-05-04 13:19                   ` Rene Herman
2008-05-05 13:13                   ` crosscompiler [WAS: RFC: starting a kernel-testers group for newbies] Enrico Weigelt
2008-05-01  9:16             ` RFC: starting a kernel-testers group for newbies Frans Pop
2008-05-01 10:30               ` Enrico Weigelt
2008-05-01 13:02                 ` Adrian Bunk
2008-05-01 11:30           ` Adrian Bunk
2008-04-30 14:20             ` Arjan van de Ven
2008-05-01 12:53               ` Rafael J. Wysocki
2008-05-01 13:21               ` Adrian Bunk
2008-05-01 15:49                 ` Andrew Morton
2008-05-01  1:13                   ` Arjan van de Ven
2008-05-02  9:00                     ` Adrian Bunk
2008-05-01 16:38                   ` Steven Rostedt
2008-05-01 17:18                     ` Andrew Morton
2008-05-01 17:24                   ` Theodore Tso
2008-05-01 19:26                     ` Andrew Morton
2008-05-01 19:39                       ` Steven Rostedt
2008-05-02 10:23                       ` Andi Kleen
2008-05-02  2:08                 ` Paul Mackerras
2008-05-02  3:10                   ` Josh Boyer
2008-05-02  4:09                     ` Paul Mackerras
2008-05-02  8:29                       ` Adrian Bunk
2008-05-02 10:16                         ` Paul Mackerras
2008-05-02 11:58                           ` Adrian Bunk
2008-05-02 14:58                         ` Linus Torvalds
2008-05-02 15:44                           ` Carlos R. Mafra
2008-05-02 16:28                             ` Linus Torvalds
2008-05-02 17:15                               ` Carlos R. Mafra
2008-05-02 18:02                                 ` Pallipadi, Venkatesh
2008-05-09 16:32                                   ` Mark Lord
2008-05-09 19:30                                     ` Carlos R. Mafra
2008-05-09 20:39                                       ` Mark Lord
2008-05-01  0:41         ` David Miller
2008-05-01 13:23           ` Adrian Bunk
2008-04-30 20:59 Slow DOWN, please!!! devzero

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).