linux-kernel.vger.kernel.org archive mirror
* Re: [lkcd-devel] Re: What's left over.
@ 2002-10-31 20:22 Andreas Herrmann
  2002-10-31 20:40 ` Linus Torvalds
  0 siblings, 1 reply; 72+ messages in thread
From: Andreas Herrmann @ 2002-10-31 20:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, lkcd-devel, lkcd-devel-admin, lkcd-general,
	Rusty Russell, Matt D. Robinson


On 10/31/02 04:46 PM, Linus Torvalds <torvalds@transmeta.com> wrote:

  > People have to realize that my kernel is not for random new
  > features. The stuff I consider important are things that people
  > use on their own, or stuff that is the base for other work.

A dump mechanism within the kernel is a base for much easier
kernel debugging.
IMHO, analyzing a dump is much more effective than guessing
at a kernel bug solely with the help of an oops message.
Using lkcd/lcrash, I have debugged quite a few problems in
kernel modules that would otherwise have been hard to track down.
It is hard to understand why developers do not want the
aid of dump/dump-analysis for kernel development.


Regards,

Andreas


* Re: What's left over.
@ 2002-11-02 10:36 Brad Hards
  2002-11-02 19:28 ` [lkcd-devel] " Matt D. Robinson
  0 siblings, 1 reply; 72+ messages in thread
From: Brad Hards @ 2002-11-02 10:36 UTC (permalink / raw)
  To: Matt D. Robinson; +Cc: Linus Torvalds, linux-kernel, lkcd-general, lkcd-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, 1 Nov 2002 13:01, Matt D. Robinson wrote:
<snip>
> Uh ... have you read the patches?  Do you see how few the
> changes are to non-dump code?  Do you know that most of those
> changes only get triggered in a crash situation anyway?
I applied the patches, and reported some issues.
http://marc.theaimsgroup.com/?l=linux-kernel&m=103520434201014&w=2
I see no signs that any of them have been addressed, although I haven't tried 
a really recent set.

> Breakage occurs when people change code areas that are used
> all the time, like VM, network, block layer, etc.
Actually, this is the area that Linux is best at. If you break it, some poor 
sod will hit the problem, and you'll know really soon.

> Look at the patches and tell me where we are causing overhead
> and serious potential breakage.  If you find problems,
> then tell us, don't just comment on breakage scenarios.

I'm a fairly typical user - I just have a couple of desktop machines and a 
server/firewall. 

I don't have 700 nodes in a cluster, and when my machines break, it's normally 
something I did. Sometimes the desktop locks up (say every second month, 
unless I'm dicking with the kernel), but I reboot and everything is happy.

LKCD doesn't really seem to do anything for me - it wouldn't really worry me 
if it went in (since I don't have to maintain it - it isn't near any of my 
code), but I'd really prefer that having the _CONFIG option set to N didn't 
make the kernel any bigger, or change any code paths.

Is this unreasonable?
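
A request like the one above is commonly met with a compile-time stub: when the option is off, the hook becomes an empty inline function that an optimizing compiler can typically drop entirely. The following is only a minimal sketch of that pattern, not code from the LKCD patches; CONFIG_CRASH_DUMP_EXAMPLE, crash_dump_hook and handle_fatal_error are made-up names used for illustration.

```c
#include <stdio.h>

/* CONFIG_CRASH_DUMP_EXAMPLE is a made-up option name for illustration. */
#ifdef CONFIG_CRASH_DUMP_EXAMPLE
/* Option on: the hook does real work (here it only logs the reason). */
static inline void crash_dump_hook(const char *reason)
{
    fprintf(stderr, "dumping: %s\n", reason);
}
#else
/* Option off: empty inline stub, so call sites add no real code. */
static inline void crash_dump_hook(const char *reason)
{
    (void)reason;
}
#endif

/* Hypothetical caller: no #ifdef is needed at the call site. */
static int handle_fatal_error(const char *reason)
{
    crash_dump_hook(reason);
    return -1;
}
```

With the option undefined, the call in handle_fatal_error resolves to the empty stub, so the code path is unchanged apart from a call the compiler is free to remove.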

Brad

BTW: I admit that I'd be pretty pissed if Linus said that my code was 
"stupid", but life isn't reasonable or fair. Take a few days off LKCD, go for 
a few walks, and worry about how to get it integrated after that.


- -- 
http://linux.conf.au. 22-25Jan2003. Perth, Aust. I'm registered. Are you?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE9w6rCW6pHgIdAuOMRAlI5AJ48ELVdExIeCr5C5HtDpU5+1ZnuBQCdEji0
t4q2NjZQVGEumrz6b+CqEEs=
=xtYY
-----END PGP SIGNATURE-----


* Re: What's left over.
@ 2002-11-01 19:18 Linus Torvalds
  2002-11-01 20:22 ` [lkcd-devel] " Matt D. Robinson
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Torvalds @ 2002-11-01 19:18 UTC (permalink / raw)
  To: Joel Becker
  Cc: Alan Cox, Bill Davidsen, Chris Friesen, Matt D. Robinson,
	Rusty Russell, Linux Kernel Mailing List, lkcd-general,
	lkcd-devel


On Fri, 1 Nov 2002, Joel Becker wrote:
> 
> 	I always liked the AIX dumper choices.  You could either dump to
> the swap area (and startup detects the dump and moves it to the
> filesystem before swapon) or provide a dedicated dump partition.  The
> latter was preferred.
> 	Either of these methods merely require the dumper to correctly
> write to one disk partition.  This is about as simple as you are going
> to get in disk dumping.

Ehh.. That was on closed hardware that was largely designed with and for
the OS.

Alan isn't worried about the "which sector do I write" kind of thing.  
That's the trivial part. Alan is worried about the fact that once you know
which sector to write, actually _doing_ so is a really hard thing. You
have bounce buffers, you have exceedingly complex drivers that work
differently in PIO and DMA modes and are more likely than not the _cause_
of a number of problems etc.

And you have a situation where interrupts are not likely to work well
(because you crashed with various locks held), so the regular driver
simply isn't likely to work all that well.

And you have a situation where there are hundreds of different kinds of 
device drivers for the disk.

In other words, the AIX situation isn't even _remotely_ comparable. A
large portion of the complexity in the PC stability space is in device
drivers. It's the thing I worry most about for 2.6.x stabilization, by 
_far_.

And if you get these things wrong, you're quite likely to stomp on your
disk. Hard. You may be trying to write the swap partition, but if the
driver gets confused, you just overwrote all your important data. At which
point it doesn't matter if your filesystem is journaling or not, since you
just potentially overwrote it.

In other words: it's a huge risk to play with the disk when the system is
already known to be unstable. The disk drivers tend to be one of the main
issues even when everything else is _stable_, for chrissake!

To add insult to injury, you will not be able to actually _test_ any of 
the real error paths in real life. Sure, you will be able to test forced 
dumps on _your_ hardware, but while that is fine in the AIX model ("we 
control the hardware, and charge the user five times what it is worth"), 
again that doesn't mean _squat_ in the PC hardware space.

See?

		Linus


* Re: What's left over.
@ 2002-11-01  6:36 Linus Torvalds
  2002-11-01  7:00 ` [lkcd-devel] " Castor Fu
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Torvalds @ 2002-11-01  6:36 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Matt D. Robinson, Rusty Russell, linux-kernel, lkcd-general, lkcd-devel


On Fri, 1 Nov 2002, Bill Davidsen wrote:
> 
>   If you really believed the stuff you say you'd put it in and promise to
> take it out if people didn't find it useful or there were inherent
> limitations.

This never works. Be honest. Nobody takes out features, they are stuck 
once they get in. Which is exactly why my job is to say "no", and why 
there is no "accepted unless proven bad". 

> It would probably take 10-30% off the time to a stable release.

Talk is cheap.

I've not seen a _single_ bug-report with a fix that attributed the
existing LKCD patches. I might be more impressed if I had. 

The basic issue is that we don't put patches in in the hope that they will
prove themselves later. Your argument is fundamentally flawed.

		Linus


* Re: [lkcd-devel] Re: What's left over.
@ 2002-10-31 22:47 Richard J Moore
  2002-10-31 23:39 ` Werner Almesberger
  0 siblings, 1 reply; 72+ messages in thread
From: Richard J Moore @ 2002-10-31 22:47 UTC (permalink / raw)
  To: Werner Almesberger
  Cc: Jeff Garzik, linux-kernel, lkcd-devel, lkcd-devel-admin,
	lkcd-general, Rusty Russell, Linus Torvalds, Matt D. Robinson


> I'm not so convinced about this. I like the Mission Critical
> approach:

and so do many people. In fact netdump, mcore and lkcd are all
complementary parts of the same need. That's why we are working with
mcrit's blessing to merge mcore into lkcd. That's a big piece of work,
which we hope to make progress with during 2003 - Suparna's the expert :-)

Richard


* Re: What's left over.
@ 2002-10-31 21:33 Rusty Russell
  2002-11-01  1:19 ` [lkcd-devel] " Matt D. Robinson
  0 siblings, 1 reply; 72+ messages in thread
From: Rusty Russell @ 2002-10-31 21:33 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Linus Torvalds, Matt D. Robinson, Rusty Russell, linux-kernel,
	lkcd-general, lkcd-devel

In message <3DC171FF.5000803@nortelnetworks.com> you write:
> Ideally I would like to see a dump framework that can have a number of 
> possible dump targets.  We should be able to dump to any combination of 
> network, serial, disk, flash, unused ram that isn't wiped over restarts, 
> etc...

Both the lkcd and ide mini-oopser have that (although the mini-oopser
has only x86-ide for now).

The mini-oopser has different aims than LKCD: they want to debug one
system, I want to make sure we're reaping OOPS reports from those 99%
of desktop users who run X and simply reboot when their machine
crashes once a month.

I did *not* put the mini-oopser on the Snowball list, because I don't
have time to polish it.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

* Re: What's left over.
@ 2002-10-31 20:59 Dave Anderson
  2002-11-01  1:25 ` [lkcd-devel] " Matt D. Robinson
  0 siblings, 1 reply; 72+ messages in thread
From: Dave Anderson @ 2002-10-31 20:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matt D. Robinson, Rusty Russell, linux-kernel, lkcd-general, lkcd-devel


On Thu, 31 Oct 2002, Linus Torvalds wrote:

>  - included features kill off (potentially better) projects.
>
>         There's a big "inertia" to features. It's often better to keep
>         features _off_ the standard kernel if they may end up being
>         further developed in totally new directions.
>
>         In particular when it comes to this project, I'm told about
>         "netdump", which doesn't try to dump to a disk, but over the net.
>         And quite frankly, my immediate reaction is to say "Hell, I
>         _never_ want the dump touching my disk, but over the network
>         sounds like a great idea".
>
> To me this says "LKCD is stupid". Which means that I'm not going to apply
> it, and I'm going to need some real reason to do so - ie being proven
> wrong in the field.
>
> (And don't get me wrong - I don't mind getting proven wrong. I change my
> opinions the way some people change underwear. And I think that's ok).

It would be most unfortunate if the existence of netdump is used as a
reason to deny LKCD's inclusion, or to simply dismiss LKCD as stupid.

On Thu, 31 Oct 2002, Matt D. Robinson wrote:

> We want to see this in the kernel, frankly, because it's a pain
> in the butt keeping up with your kernel revisions and everything
> else that goes in that changes.  And I'm sure SuSE, UnitedLinux and
> (hopefully) Red Hat don't want to spend their time having to roll
> this stuff in each and every time you roll a new kernel.

While Red Hat advocates Ingo's netdump option, we have customer
requests that are requiring us to look at LKCD disk-based dumps as an
alternative, co-existing dump mechanism.  Since the two methods are not mutually
exclusive, LKCD will never kill off netdump -- nor certainly vice-versa.  We're
all just looking for a better means to be able to
provide support to our customers, not to mention its value as a
development aid.

Dave Anderson
Red Hat, Inc.




* RE: [lkcd-devel] Re: What's left over.
@ 2002-10-31 18:17 Deepak Kumar Gupta, Noida
  0 siblings, 0 replies; 72+ messages in thread
From: Deepak Kumar Gupta, Noida @ 2002-10-31 18:17 UTC (permalink / raw)
  To: Chris Friesen, Linus Torvalds
  Cc: Matt D. Robinson, Rusty Russell, linux-kernel, lkcd-general, lkcd-devel

> Linus Torvalds wrote:
> 
> > 	In particular when it comes to this project, I'm told about
> > 	"netdump", which doesn't try to dump to a disk, but over the net.
> > 	And quite frankly, my immediate reaction is to say "Hell, I
> > 	_never_ want the dump touching my disk, but over the network
> > 	sounds like a great idea".
> > 
> > To me this says "LKCD is stupid". Which means that I'm not going to apply
> > it, and I'm going to need some real reason to do so - ie being proven
> > wrong in the field.
> 
> How do you deal with netdump when your network driver is what caused the
> crash?
> 
> Ideally I would like to see a dump framework that can have a number of
> possible dump targets.  We should be able to dump to any combination of
> network, serial, disk, flash, unused ram that isn't wiped over restarts,
> etc...
This is what LKCD with its generic interface provides. The generic
interface can include various dump targets in a very clean way.
Originally LKCD was meant for saving dumps only to disk, but its
generic interface now makes a number of dump targets possible.
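
To make the idea of pluggable dump targets concrete, here is a minimal sketch of how such an interface could look: each target supplies a write callback, and the dumper falls back to the next registered target when one fails (for example when the network driver is what crashed). This is an assumption-laden illustration, not LKCD's actual interface; struct dump_target, dump_register_target, dump_write and the two stand-in targets are all hypothetical names.

```c
#include <stddef.h>

/* Hypothetical interface: these names are illustrative, not LKCD's real API. */
struct dump_target {
    const char *name;
    /* Returns 0 on success, nonzero if the target could not take the data. */
    int (*write)(const void *buf, size_t len);
};

#define MAX_TARGETS 8

static const struct dump_target *targets[MAX_TARGETS];
static size_t ntargets;

/* Register a target; returns 0 on success, -1 if the table is full. */
static int dump_register_target(const struct dump_target *t)
{
    if (ntargets >= MAX_TARGETS)
        return -1;
    targets[ntargets++] = t;
    return 0;
}

/*
 * Try each target in registration order. Returns the name of the first
 * target that accepts the dump, or NULL if every target failed.
 */
static const char *dump_write(const void *buf, size_t len)
{
    for (size_t i = 0; i < ntargets; i++) {
        if (targets[i]->write(buf, len) == 0)
            return targets[i]->name;
    }
    return NULL;
}

/* Stand-in targets: a network target that fails, a disk target that works. */
static int net_write(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return -1;
}

static int disk_write(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return 0;
}

static const struct dump_target net_target = { "net", net_write };
static const struct dump_target disk_target = { "disk", disk_write };
```

Registering net_target before disk_target means the network is tried first and the disk is only touched if the network write fails, which is one way to order targets by how risky they are.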

Regards
Deepak.

* Re: What's left over.
@ 2002-10-31 17:25 Linus Torvalds
  2002-10-31 21:02 ` Jeff Garzik
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Torvalds @ 2002-10-31 17:25 UTC (permalink / raw)
  To: Matt D. Robinson; +Cc: Rusty Russell, linux-kernel, lkcd-general, lkcd-devel


[ Ok, this is a really serious email. If you don't get it, don't bother 
  emailing me. Instead, think about it for an hour, and if you still don't 
  get it, ask somebody you know to explain it to you. ]

On Thu, 31 Oct 2002, Matt D. Robinson wrote:
> 
> Sure, but why should they have to?  What technical reason is there
> for not including it, Linus?

There are many:

 - bloat kills:

	My job is saying "NO!"

	In other words: the question is never EVER "Why shouldn't it be
	accepted?", but it is always "Why do we really not want to live 
	without this?"

 - included features kill off (potentially better) projects.

	There's a big "inertia" to features. It's often better to keep 
	features _off_ the standard kernel if they may end up being
	further developed in totally new directions.

	In particular when it comes to this project, I'm told about
	"netdump", which doesn't try to dump to a disk, but over the net.
	And quite frankly, my immediate reaction is to say "Hell, I
	_never_ want the dump touching my disk, but over the network
	sounds like a great idea".

To me this says "LKCD is stupid". Which means that I'm not going to apply 
it, and I'm going to need some real reason to do so - ie being proven 
wrong in the field.

(And don't get me wrong - I don't mind getting proven wrong. I change my 
opinions the way some people change underwear. And I think that's ok).

> I completely don't understand your reasoning here.

Tough. That's YOUR problem.

		Linus


* Re: What's left over.
@ 2002-10-31 15:46 Linus Torvalds
  2002-10-31 19:33 ` [lkcd-devel] " Castor Fu
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Torvalds @ 2002-10-31 15:46 UTC (permalink / raw)
  To: Matt D. Robinson; +Cc: Rusty Russell, linux-kernel, lkcd-general, lkcd-devel


On Wed, 30 Oct 2002, Matt D. Robinson wrote:

> Linus Torvalds wrote:
> > > Crash Dumping (LKCD)
> > 
> > This is definitely a vendor-driven thing. I don't believe it has any
> > relevance unless vendors actively support it.
> 
> There are people within IBM in Germany, India and England, as well as
> a number of companies (Intel, NEC, Hitachi, Fujitsu), as well as SGI
> that are PAID to support this.

That's fine. And since they are paid to support it, they can apply the 
patches.  

What I'm saying by "vendor driven" is that it has no relevance for the 
standard kernel, and since it has no relevance to that, then I have no 
incentives to merge it. The crash dump is only useful with people who 
actively look at the dumps, and I don't know _anybody_ outside of the 
specialized vendors you mention who actually do that.

I will merge it when there are real users who want it - usually as a
result of having gotten used to it through a vendor who supports it. (And
by "support" I do not mean "maintain the patches", but "actively uses it"
to work out the users' problems or whatever).

Horse before the cart and all that thing.

People have to realize that my kernel is not for random new features. The
stuff I consider important are things that people use on their own, or
stuff that is the base for other work. Quite often I want vendors to merge
patches _they_ care about long long before I will merge them (examples of
this are quite common, things like reiserfs and ext3 etc).

THAT is what I mean by vendor-driven. If vendors decide they really want
the patches, and I actually start seeing noises on linux-kernel or getting
requests for it being merged from _users_ rather than developers, then
that means that the vendor is on to something.

		Linus



end of thread, other threads:[~2002-11-11 17:59 UTC | newest]

Thread overview: 72+ messages
2002-10-31 20:22 [lkcd-devel] Re: What's left over Andreas Herrmann
2002-10-31 20:40 ` Linus Torvalds
2002-10-31 20:54   ` Patrick Finnegan
2002-10-31 21:08   ` Benjamin LaHaise
2002-10-31 22:04     ` Bernhard Kaindl
2002-11-01  0:33       ` Werner Almesberger
  -- strict thread matches above, loose matches on Subject: below --
2002-11-02 10:36 Brad Hards
2002-11-02 19:28 ` [lkcd-devel] " Matt D. Robinson
2002-11-01 19:18 Linus Torvalds
2002-11-01 20:22 ` [lkcd-devel] " Matt D. Robinson
2002-11-02 13:02   ` Kai Henningsen
2002-11-01  6:36 Linus Torvalds
2002-11-01  7:00 ` [lkcd-devel] " Castor Fu
2002-10-31 22:47 Richard J Moore
2002-10-31 23:39 ` Werner Almesberger
2002-11-05 12:45   ` Suparna Bhattacharya
2002-10-31 21:33 Rusty Russell
2002-11-01  1:19 ` [lkcd-devel] " Matt D. Robinson
2002-11-01  2:59   ` Rusty Russell
2002-10-31 20:59 Dave Anderson
2002-11-01  1:25 ` [lkcd-devel] " Matt D. Robinson
2002-10-31 18:17 Deepak Kumar Gupta, Noida
2002-10-31 17:25 Linus Torvalds
2002-10-31 21:02 ` Jeff Garzik
2002-10-31 22:37   ` Werner Almesberger
2002-11-05 11:42     ` [lkcd-devel] " Suparna Bhattacharya
2002-11-05 18:00       ` Werner Almesberger
2002-11-05 18:36         ` Alan Cox
2002-11-05 19:19           ` Werner Almesberger
2002-11-05 20:10             ` Alan Cox
2002-11-05 23:25               ` Werner Almesberger
2002-11-06  0:21             ` Andy Pfiffer
2002-11-06  1:10               ` Werner Almesberger
2002-11-06  1:37                 ` Alexander Viro
2002-11-06  2:05                   ` Werner Almesberger
2002-11-07  6:04                     ` Eric W. Biederman
2002-11-07 12:17                       ` Werner Almesberger
2002-11-06  4:07                   ` Eric W. Biederman
2002-11-06  4:47                     ` Eric W. Biederman
2002-11-06 19:24                     ` Rob Landley
2002-11-10 18:35               ` Pavel Machek
2002-11-06  2:48           ` Eric W. Biederman
2002-11-06  4:29           ` Eric W. Biederman
2002-11-06  6:25             ` Linus Torvalds
2002-11-06  6:38               ` Suparna Bhattacharya
2002-11-06  7:48               ` Eric W. Biederman
2002-11-06  9:11                 ` Suparna Bhattacharya
2002-11-06 22:05                 ` Michal Jaegermann
2002-11-06 16:13               ` Eric W. Biederman
2002-11-07  8:50               ` Eric W. Biederman
2002-11-07 15:44                 ` Linus Torvalds
2002-11-09 23:05                   ` Eric W. Biederman
2002-11-09 23:33                     ` Linus Torvalds
2002-11-10  1:37                       ` Eric W. Biederman
2002-11-10  2:12                         ` Alan Cox
2002-11-10  2:16                           ` Eric W. Biederman
2002-11-10  3:03                             ` Werner Almesberger
2002-11-10  3:23                               ` Eric W. Biederman
2002-11-10 14:30                             ` Alan Cox
2002-11-10 16:56                               ` Eric W. Biederman
2002-11-10  3:17                         ` Linus Torvalds
2002-11-10  4:26                           ` Eric W. Biederman
2002-11-11 18:03                           ` Eric W. Biederman
2002-11-09 23:39                     ` Randy.Dunlap
2002-11-10  2:58                       ` Eric W. Biederman
2002-11-10 14:35                         ` Alan Cox
2002-11-10 18:13                           ` Eric W. Biederman
2002-11-10  1:31                     ` Werner Almesberger
2002-11-10  3:10                       ` Eric W. Biederman
2002-11-10  3:30                         ` Werner Almesberger
2002-11-10  3:49                           ` Eric W. Biederman
2002-11-10  3:49                         ` Linus Torvalds
2002-11-10  2:08                     ` Alan Cox
2002-11-10  2:18                       ` Eric W. Biederman
2002-11-10 14:31                         ` Alan Cox
2002-11-07 15:48                 ` Linus Torvalds
2002-11-08 18:01                 ` Alan Cox
2002-11-09 21:21         ` Pavel Machek
2002-11-11 16:27           ` Eric W. Biederman
2002-10-31 15:46 Linus Torvalds
2002-10-31 19:33 ` [lkcd-devel] " Castor Fu
