linux-kernel.vger.kernel.org archive mirror
* Re: [lkcd-general] Re: What's left over.
       [not found] <551170412@toto.iv>
@ 2002-11-04  3:03 ` Peter Chubb
  2002-11-04 13:08   ` Alan Cox
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Chubb @ 2002-11-04  3:03 UTC
  To: linux; +Cc: linux-kernel

>>>>> "linux" == linux  <linux@horizon.com> writes:


linux> While a crash dump to just half of one of those mirrors is
linux> fine, finding it might be a little bit tricky.  And the fact
linux> that the kernel reassembles the mirrors automatically on boot
linux> might make retrieving the data a little bit tricky, too.

What most other unices do is crash dump to a dedicated swap
partition.   LKCD appears to be able to do this.  So the setup of MD
etc., isn't going to affect anything.

Peter C

* Re: [lkcd-general] Re: What's left over.
  2002-11-04  3:03 ` [lkcd-general] Re: What's left over Peter Chubb
@ 2002-11-04 13:08   ` Alan Cox
  0 siblings, 0 replies; 35+ messages in thread
From: Alan Cox @ 2002-11-04 13:08 UTC
  To: Peter Chubb; +Cc: linux, Linux Kernel Mailing List

On Mon, 2002-11-04 at 03:03, Peter Chubb wrote:
> >>>>> "linux" == linux  <linux@horizon.com> writes:
> 
> 
> linux> While a crash dump to just half of one of those mirrors is
> linux> fine, finding it might be a little bit tricky.  And the fact
> linux> that the kernel reassembles the mirrors automatically on boot
> linux> might make retrieving the data a little bit tricky, too.
> 
> What most other unices do is crash dump to a dedicated swap
> partition.   LKCD appears to be able to do this.  So the setup of MD
> etc., isn't going to affect anything.

I have raid1 swap. That does make a difference to the problem space.
When we get into encrypted raid5 swap over nbd (the security paranoia
dept - store all my swap crypted split into 4 disks in four
jurisdictions...) it gets really fun.

For the normal cases it doesn't seem to be a problem, even for raid0 swap,
since before crash time you can generate a list of device/blocknumber
values and store it in the signed area.
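A minimal userspace sketch of that precomputed list for the raid0 case. It assumes a simple raid0 set with equal-sized members, a fixed chunk size and no per-disk data offset; the names (struct dump_block, raid0_map, build_blocklist, blocklist_ok) are hypothetical, and the FNV-1a checksum is only a stand-in for whatever signature would really protect the list.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the precomputed dump target list (hypothetical layout). */
struct dump_block {
    uint32_t dev;    /* index of the member disk in the raid0 set */
    uint64_t block;  /* block number on that member disk */
};

/* Map a logical swap block to (member disk, block) for a raid0 set with
 * equal-sized members; chunk_blocks is the chunk size in blocks. */
static struct dump_block raid0_map(uint64_t lblock, uint32_t ndisks,
                                   uint64_t chunk_blocks)
{
    uint64_t chunk = lblock / chunk_blocks;   /* chunk index across the set */
    uint64_t offset = lblock % chunk_blocks;  /* block within that chunk */
    struct dump_block b;

    b.dev = (uint32_t)(chunk % ndisks);                 /* chunks rotate over disks */
    b.block = (chunk / ndisks) * chunk_blocks + offset; /* position on that disk */
    return b;
}

/* Computed before any crash, while the md layer is known to be sane. */
static void build_blocklist(struct dump_block *list, size_t n, uint64_t start,
                            uint32_t ndisks, uint64_t chunk_blocks)
{
    for (size_t i = 0; i < n; i++)
        list[i] = raid0_map(start + i, ndisks, chunk_blocks);
}

/* FNV-1a over the 8 bytes of v; a placeholder, not a real signature. */
static uint32_t fnv1a_u64(uint32_t h, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        h ^= (uint32_t)(v & 0xff);
        h *= 16777619u;  /* 32-bit FNV prime */
        v >>= 8;
    }
    return h;
}

/* Checksum the fields explicitly so struct padding never matters. */
static uint32_t blocklist_sum(const struct dump_block *list, size_t n)
{
    uint32_t h = 2166136261u;  /* 32-bit FNV offset basis */

    for (size_t i = 0; i < n; i++) {
        h = fnv1a_u64(h, list[i].dev);
        h = fnv1a_u64(h, list[i].block);
    }
    return h;
}

/* At dump time: only trust the list if it still matches the saved sum. */
static int blocklist_ok(const struct dump_block *list, size_t n, uint32_t saved)
{
    return blocklist_sum(list, n) == saved;
}
```

With two disks and 4-block chunks, logical block 13 lands on disk 1, block 5. Precomputing the table means the dump path needs no help from the md layer after the crash.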


* Re: [lkcd-general] Re: What's left over.
@ 2002-11-05 20:37 Dr. Greg Wettstein
  0 siblings, 0 replies; 35+ messages in thread
From: Dr. Greg Wettstein @ 2002-11-05 20:37 UTC
  To: Alan Cox, Bill Davidsen
  Cc: Matt D. Robinson, Steven King, Linus Torvalds, Joel Becker,
	Chris Friesen, Rusty Russell, Linux Kernel Mailing List,
	lkcd-general, lkcd-devel

> On Sun, 2002-11-03 at 14:33, Bill Davidsen wrote:
> > If you define "unmaintainably bad" as "having features you don't need"
> > then I agree. But since dump to disk is in almost every other commercial
> > UNIX, maybe someone would question why it's good for others but not for
> > Linux.

Perhaps the other OS's have made bad decisions.

I've only seen one OS in the last 20 years which, by industry
consensus, seems to have some hope of becoming a viable contender to a
monopolistic position.  I would hope that we would contemplate the
factors that helped give rise to that situation.

}-- End of excerpt from Alan Cox

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-4950            WWW: http://www.enjellic.com
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"How appropriate, you fight like a cow."
                                -- Guybrush Threepwood

* Re: [lkcd-general] Re: What's left over.
  2002-11-03 16:32       ` Alan Cox
@ 2002-11-05 18:07         ` Bill Davidsen
  0 siblings, 0 replies; 35+ messages in thread
From: Bill Davidsen @ 2002-11-05 18:07 UTC
  To: Alan Cox
  Cc: Matt D. Robinson, Steven King, Linus Torvalds, Joel Becker,
	Chris Friesen, Rusty Russell, Linux Kernel Mailing List,
	lkcd-general, lkcd-devel

On 3 Nov 2002, Alan Cox wrote:

> On Sun, 2002-11-03 at 14:33, Bill Davidsen wrote:
> > If you define "unmaintainably bad" as "having features you don't need"
> > then I agree. But since dump to disk is in almost every other commercial
> > UNIX, maybe someone would question why it's good for others but not  for
> > Linux.
> 
> It isn't about features, it's about clean maintainable code. netdump to me
> doesn't mean no dump to disk option. In fact I'd rather like to be able
> to insmod dump-foo.o. The correctness issues are hard but if the
> dump-foo is standalone, resets the hardware and has an SHA integrity
> check then it can be done (think of it as a post crash variant of the
> trusted computing TCB verification problem)

I certainly don't disagree, but the one critical problem is writing the
dump to the right place, or at least not writing to the wrong place. I'd
love to have disk, net, NVram, whatever choices, but disk is the one which
would help the most. AIX and ISC have dump to swap, and swapon copies
the data back or clears it, with a fresh O/S load to ensure writing to the
right place.
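A minimal sketch of that dump-to-swap round trip, modelled on an in-memory buffer standing in for the swap partition. The header layout, the DUMP_MAGIC value and the function names are hypothetical; this is not the AIX, ISC or LKCD on-disk format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUMP_MAGIC 0x44554d50u  /* hypothetical marker value */

struct dump_header {
    uint32_t magic;   /* DUMP_MAGIC if a dump is present */
    uint32_t length;  /* bytes of dump data following the header */
};

/* Crash time: write the data, then the header, at the start of swap. */
static int dump_write(unsigned char *swap, size_t swap_size,
                      const void *data, uint32_t len)
{
    struct dump_header h = { DUMP_MAGIC, len };

    if (sizeof h + len > swap_size)
        return -1;                   /* dump does not fit */
    memcpy(swap + sizeof h, data, len);
    memcpy(swap, &h, sizeof h);      /* header written after the data */
    return 0;
}

/* swapon time: if a dump is present, copy it out (at most out_size bytes)
 * and clear the header so the space can be used as swap again.
 * Returns the number of bytes recovered, 0 if no dump was found. */
static size_t swapon_recover(unsigned char *swap, size_t swap_size,
                             void *out, size_t out_size)
{
    struct dump_header h;
    size_t n;

    if (swap_size < sizeof h)
        return 0;
    memcpy(&h, swap, sizeof h);
    if (h.magic != DUMP_MAGIC || sizeof h + h.length > swap_size)
        return 0;
    n = h.length < out_size ? h.length : out_size;
    memcpy(out, swap + sizeof h, n);
    memset(swap, 0, sizeof h);       /* invalidate so it is not re-read */
    return n;
}
```

Clearing the header after the copy is what lets the partition go straight back into service as ordinary swap.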
 
> > uses the crash dump in AIX, the person who wants to send a compressed dump
> > and money to IBM and get back a fix. Netdump assumes external resources
> 
> Lots of interesting legal issues but yes you can do it sometimes (DMCA,
> privacy, financial duties sometimes make it horribly complex). Even in
> the case where you only dump the oops it's still valuable.

Agreed, I would think about doing that with a mail server. But even an
oops like ksymoops would be helpful. I started on systems with dumps,
ksymoops is wonderful by comparison.
 
> > and a functional secure network (is the dump encrypted and I missed it?)
> > which home users surely don't have, and remote servers often lack as well.
> 
> Encrypting the dump with the new crypto lib in the kernel would be easy,
> right now it doesn't.
> 
> My disk dump concerns are purely those of correctness. That means
> 
> 1.	After loading the module getting the block list for the dump target

That could all be built as part of init, clearly you can't depend on
demand loading the module.
 
> 2.	Resetting and scratch initializing the dump device

If the modules are to be really self-sufficient, they would have to include
the driver. I'll let someone tell me that's not always the case if the
driver can have its own data area.
 
> 3.	Not relying on any code outside of the dump TCB that may have
> been corrupted

Yes, although with separate code, stack and data that's less likely. In
the bad old days self-modifying code was common.
 
> 4.	At dump time turning off all bus masters, doing the dump TCB
> verification and then dumping

The first part of that looks medium hard, particularly if the code has to
be part of the dump module.
 
> Most of the pieces already exist.

Clearly it can be done even better than the current implementation, and
given an interface standard, a complete replacement could be done.
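A minimal sketch of the load-time seal and dump-time check behind Alan's points 1, 3 and 4. It assumes the dump code and data form one contiguous region; the names are hypothetical, stopping bus masters is not modelled, and 64-bit FNV-1a stands in for the SHA digest Alan mentions (it is not cryptographic, so it only catches accidental corruption).

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a, standing in for a SHA digest; NOT cryptographic. */
static uint64_t digest(const unsigned char *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ull;  /* FNV offset basis */

    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ull;           /* FNV prime */
    }
    return h;
}

/* The code and data the dump path depends on (hypothetical layout). */
struct dump_tcb {
    const unsigned char *image;
    size_t len;
    uint64_t expected;  /* digest recorded at load time */
};

enum dump_result { DUMP_WRITTEN, DUMP_REFUSED };

/* Load time: record the digest while the system is still healthy. */
static void tcb_seal(struct dump_tcb *t, const unsigned char *image, size_t len)
{
    t->image = image;
    t->len = len;
    t->expected = digest(image, len);
}

/* Dump time, after bus masters have been stopped (not modelled here):
 * re-check the TCB and only run the dump routine if it is unchanged. */
static enum dump_result dump_if_intact(const struct dump_tcb *t,
                                       void (*do_dump)(void))
{
    if (digest(t->image, t->len) != t->expected)
        return DUMP_REFUSED;  /* something scribbled on the dump code */
    if (do_dump)
        do_dump();
    return DUMP_WRITTEN;
}
```

Refusing to dump is the conservative choice here: a corrupted dump path that writes to the wrong blocks is worse than no dump at all.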

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: [lkcd-general] Re: What's left over.
  2002-11-04 16:57   ` Alan Cox
@ 2002-11-05  9:05     ` Suparna Bhattacharya
  0 siblings, 0 replies; 35+ messages in thread
From: Suparna Bhattacharya @ 2002-11-05  9:05 UTC
  To: Alan Cox
  Cc: Linus Torvalds, Richard J Moore, Oliver Xymoron, Dave Anderson,
	Linux Kernel Mailing List, lkcd-general, lkcd-general-admin,
	Rusty Russell, Matt D. Robinson

On Mon, Nov 04, 2002 at 04:40:11PM +0000, Alan Cox wrote:
> Let me ask another question here
> 
> Other than "register_reboot_notifier()" and adding a 
> "register_exception_notifier()" chain what else does a dump tool need.
> Register_exception_notifier seems to solve about 90% of the insmod gdb 
> problem space as well ?
> 
> 

I had tried to list these in an earlier mail, added a few more
comments now marked by ">>"

1. Enabling IPI to collect CPU state on all processors in the
   system right when dump is triggered (may not be a normal
   situation, so NMIs where supported are the best option)

   >> set/register_nmi_callback could also help in part (though
   >> synchronization issues need to be thought through so that
   >> the effect on regular system operation is as low as possible),
   >> but we also need an interface to generate the NMI IPI when
   >> required, and something that generalises across all architectures.

2. Ability to quiesce (silence) the system before dumping
   (and if in non-disruptive mode, then restore it back)

   >> smp_call_function may not be the ideal option for many situations
   >> - in general we would like to have a separate "force" path
   >> available for some troublesome situations, and it would be
   >> nice to be able to tackle non-disruptive (but accurate) dumping
   >> as well.

   >> maybe 1 & 2 can be combined in some form
   >> Dump should preferably not overlap with a regularly used IPI.
 
3. Calls into dump from kernel paths (panic, oops, sysrq
   etc). 

   >> This is where your register_xxx_notifier(s) fit in

4. Exports of symbols to help with physical memory 
   traversal and verification

   >> Covers what Andi Kleen referred to as 
   >> iterate_over_memmap_and_give_me_type()
   >> (a way to figure out the type of memory - true ram or other)
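A userspace model of items 1 and 2, using POSIX threads as stand-ins for CPUs: setting a flag plays the role of the NMI/IPI, each "CPU" saves its state and checks in, the dumping side waits for all of them, then releases them again as in a non-disruptive dump. The names are hypothetical and real NMI delivery is architecture specific; this only illustrates the ordering.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CPUS 8

/* Stand-in for the registers a real NMI handler would save. */
struct cpu_state { bool valid; unsigned long work_done; };

static struct cpu_state saved[MAX_CPUS];
static atomic_int checked_in;       /* CPUs that have saved their state */
static atomic_bool dump_requested;  /* models the NMI/IPI being raised */
static atomic_bool release;         /* lets CPUs resume afterwards */

static void *cpu_thread(void *arg)
{
    int cpu = (int)(long)arg;
    unsigned long work = 0;

    while (!atomic_load(&dump_requested)) {
        work++;                          /* "normal" operation */
        sched_yield();
    }
    saved[cpu].work_done = work;         /* save state first ... */
    saved[cpu].valid = true;
    atomic_fetch_add(&checked_in, 1);    /* ... then report in */
    while (!atomic_load(&release))
        sched_yield();                   /* quiesced until the dump is done */
    return NULL;
}

/* Returns how many CPUs had their state captured, or -1 on error. */
static int rendezvous_demo(int ncpus)
{
    pthread_t t[MAX_CPUS];
    int created = 0, captured = 0;

    if (ncpus < 1 || ncpus > MAX_CPUS)
        return -1;
    memset(saved, 0, sizeof saved);
    atomic_store(&checked_in, 0);
    atomic_store(&dump_requested, false);
    atomic_store(&release, false);

    for (; created < ncpus; created++)
        if (pthread_create(&t[created], NULL, cpu_thread,
                           (void *)(long)created) != 0)
            break;

    atomic_store(&dump_requested, true);  /* "send the NMI" */
    while (atomic_load(&checked_in) < created)
        sched_yield();                    /* wait for every CPU */

    for (int i = 0; i < created; i++)     /* "write the dump" */
        if (saved[i].valid)
            captured++;

    atomic_store(&release, true);         /* non-disruptive: resume */
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return created == ncpus ? captured : -1;
}
```

A real implementation would also need a timeout for CPUs that never check in; this sketch waits forever.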

Regards
Suparna


-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India


* Re: [lkcd-general] Re: What's left over.
  2002-11-04 14:45   ` Henning P. Schmiedehausen
  2002-11-04 15:29     ` Alan Cox
@ 2002-11-05  4:57     ` Werner Almesberger
  1 sibling, 0 replies; 35+ messages in thread
From: Werner Almesberger @ 2002-11-05  4:57 UTC
  To: Henning P. Schmiedehausen; +Cc: linux-kernel

Henning P. Schmiedehausen wrote:
> Good! This means, people debugging the code have actually to think and
> don't produce "turn on debugger, step here, there, patch a band aid,
> done" solutions you see with various other "commercial products"

Unfortunately, just making it hard doesn't guarantee that they
won't try anyway. If you're lucky, at least their band aid will
be so disgusting that you won't be fooled into thinking they
might be right.

But ultimately, it's an attitude problem. Even people who learn
about their bugs by source code reading may then produce a
shabby fix.

Hmm, I wonder if Linus has ever done any protocol design,
followed by validation. I always find the havoc a protocol
validator (e.g. Spin) wreaks a very instructive demonstration
of how much source code level "correctness" really buys you :-)
(Or what chances you'd stand of realizing what happened just
from an Oops.)

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

* Re: [lkcd-general] Re: What's left over.
  2002-11-04 16:22 ` Linus Torvalds
@ 2002-11-04 16:57   ` Alan Cox
  2002-11-05  9:05     ` Suparna Bhattacharya
  0 siblings, 1 reply; 35+ messages in thread
From: Alan Cox @ 2002-11-04 16:57 UTC
  To: Linus Torvalds
  Cc: Richard J Moore, Oliver Xymoron, Dave Anderson,
	Linux Kernel Mailing List, lkcd-devel, lkcd-general,
	lkcd-general-admin, Rusty Russell, Matt D. Robinson

Let me ask another question here

Other than "register_reboot_notifier()" and adding a 
"register_exception_notifier()" chain what else does a dump tool need.
Register_exception_notifier seems to solve about 90% of the insmod gdb 
problem space as well ?
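A minimal userspace sketch of the notifier-chain pattern proposed here for exceptions. It is loosely modelled on the kernel's notifier chains (a priority-ordered list where a handler may stop further processing), but exc_notifier, register_exc_notifier(), call_exc_chain(), the return codes and the two example clients are hypothetical, not the kernel's API.

```c
#include <stddef.h>

enum { EXC_CONTINUE = 0, EXC_STOP = 1 };  /* hypothetical return codes */

struct exc_notifier {
    int (*call)(struct exc_notifier *nb, int trapnr, void *regs);
    int priority;                 /* higher runs first */
    struct exc_notifier *next;
};

static struct exc_notifier *exc_chain;

/* Insert in priority order; equal priorities keep registration order. */
static void register_exc_notifier(struct exc_notifier *nb)
{
    struct exc_notifier **p = &exc_chain;

    while (*p && (*p)->priority >= nb->priority)
        p = &(*p)->next;
    nb->next = *p;
    *p = nb;
}

/* Called from the trap/oops path; returns 1 if a handler consumed it. */
static int call_exc_chain(int trapnr, void *regs)
{
    for (struct exc_notifier *nb = exc_chain; nb; nb = nb->next)
        if (nb->call(nb, trapnr, regs) == EXC_STOP)
            return 1;
    return 0;
}

/* Example clients: a debugger that claims breakpoint traps (trap 3 on
 * x86) and a dumper that just counts what reaches it. */
static int debugger_hits, dumper_hits;

static int debugger_cb(struct exc_notifier *nb, int trapnr, void *regs)
{
    (void)nb; (void)regs;
    if (trapnr != 3)
        return EXC_CONTINUE;
    debugger_hits++;
    return EXC_STOP;
}

static int dumper_cb(struct exc_notifier *nb, int trapnr, void *regs)
{
    (void)nb; (void)trapnr; (void)regs;
    dumper_hits++;
    return EXC_CONTINUE;
}

static struct exc_notifier debugger_nb = { debugger_cb, 10, NULL };
static struct exc_notifier dumper_nb = { dumper_cb, 0, NULL };
```

Because one chain serves both clients, an in-kernel debugger and a dump module can coexist without either patching the trap handlers directly.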





* Re: [lkcd-general] Re: What's left over.
       [not found] ` <1036429035.1718.99.camel@irongate.swansea.linux.org.uk.suse.lists.linux.kernel>
@ 2002-11-04 16:53   ` Andi Kleen
  0 siblings, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2002-11-04 16:53 UTC
  To: Alan Cox; +Cc: linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> Let me ask another question here
> 
> Other than "register_reboot_notifier()" and adding a 
> "register_exception_notifier()" chain what else does a dump tool need.
> Register_exception_notifier seems to solve about 90% of the insmod gdb 
> problem space as well ?

A memory dumper needs some infrastructure to find out what page is ram
and what is hole etc.
Basically an iterate_over_memmap_and_give_me_type() function.
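A minimal sketch of the kind of helper described above, assuming the platform's memory map has already been copied into a table of page-aligned byte ranges. iterate_memmap(), enum mem_type and the example callback are hypothetical; anything the table does not cover is reported as a hole, and only RAM pages would be written to the dump.

```c
#include <stddef.h>
#include <stdint.h>

#define DUMP_PAGE_SIZE 4096u

enum mem_type { MEM_RAM, MEM_RESERVED, MEM_HOLE };

/* One entry of the memory map, covering bytes [start, end). */
struct mem_range { uint64_t start, end; enum mem_type type; };

typedef void (*page_fn)(uint64_t pfn, enum mem_type type, void *ctx);

/* Walk every page frame below 'limit' and report its type. A linear
 * search per page keeps the sketch simple rather than fast. */
static void iterate_memmap(const struct mem_range *map, size_t n,
                           uint64_t limit, page_fn fn, void *ctx)
{
    for (uint64_t addr = 0; addr < limit; addr += DUMP_PAGE_SIZE) {
        enum mem_type t = MEM_HOLE;   /* default: not in the map */

        for (size_t i = 0; i < n; i++)
            if (addr >= map[i].start && addr < map[i].end) {
                t = map[i].type;
                break;
            }
        fn(addr / DUMP_PAGE_SIZE, t, ctx);
    }
}

/* Example dumper callback: count pages that would be dumped vs skipped. */
struct dump_stats { uint64_t ram, skipped; };

static void count_dumpable(uint64_t pfn, enum mem_type type, void *ctx)
{
    struct dump_stats *s = ctx;

    (void)pfn;
    if (type == MEM_RAM)
        s->ram++;
    else
        s->skipped++;
}
```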

-Andi

* Re: [lkcd-general] Re: What's left over.
  2002-11-04 15:38         ` Patrick Finnegan
@ 2002-11-04 16:51           ` Henning P. Schmiedehausen
  0 siblings, 0 replies; 35+ messages in thread
From: Henning P. Schmiedehausen @ 2002-11-04 16:51 UTC
  To: linux-kernel

Patrick Finnegan <pat@purdueriots.com> writes:

>On Mon, 4 Nov 2002, Henning P. Schmiedehausen wrote:

>> Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
>>
>> >On Mon, 2002-11-04 at 14:45, Henning P. Schmiedehausen wrote:
>> >> Good! This means, people debugging the code have actually to think and
>> >> don't produce "turn on debugger, step here, there, patch a band aid,
>>
>> >Some of us debug hardware. Regardless of the nice theories about
>> >reviewing your code they don't actually work on hardware because no
>> >amount of code review will let you discover things like undocumented
>> >2uS deskew delays, or errors in DMA engines
>>
>> A debugger won't help you here either. A pci bus probe, a 'scope and a
>> logic analyzer do.
>>
>> (And experience, elbow grease, experience and a nice amount of ESP :-)
>> I do hate hardware. Had to debug too much of it (and just on
>> m68k/MCS-51 where the clock rates are low and the parts easy to
>> solder...).

>I find that hard to believe.  You're saying it's impossible to use a
>software debugger to debug the interface between the software and the

No. IMHO it is impossible to use a software debugger to catch 2uS
deskew delays or errors in DMA engines. That's what logic analyzers
are for. If you attach or fire up the debugger, the timing changes and
you're no longer testing the failure case but something different.

>(No Linus, I'm not pushing them, just stating my opinion.)

I am, BTW, completely of your opinion. Personally I find it horrid that
"the XIAFS resurrection" is waved through with "will be probably
accepted for the hack value" and LKCD is rejected with "bloat"
arguments.

But hey, it _is_ Linus' kernel and he may choose as he likes. I,
for example, run vendor kernels (for 2.4).

	Regards
		Henning

-- 
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen       -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH     hps@intermeta.de

Am Schwabachgrund 22  Fon.: 09131 / 50654-0   info@intermeta.de
D-91054 Buckenhof     Fax.: 09131 / 50654-20   

* Re: [lkcd-general] Re: What's left over.
  2002-11-04 11:59 Richard J Moore
  2002-11-04 12:27 ` Lars Marowsky-Bree
  2002-11-04 16:16 ` John Alvord
@ 2002-11-04 16:22 ` Linus Torvalds
  2002-11-04 16:57   ` Alan Cox
  2 siblings, 1 reply; 35+ messages in thread
From: Linus Torvalds @ 2002-11-04 16:22 UTC
  To: Richard J Moore
  Cc: Oliver Xymoron, Dave Anderson, linux-kernel, lkcd-devel,
	lkcd-general, lkcd-general-admin, Rusty Russell,
	Matt D. Robinson


On Mon, 4 Nov 2002, Richard J Moore wrote:
> 
> Are you sure? Isn't what Linus is saying that he understands that some
> problems can be solved using dumps, some from the oops message, some by
> source code inspection and some by other means? But he's not interested
> in a timely resolution;

Ok, with tons of explanation:

 - I'm clearly not interested. I've not seen any discussion of the usage 
   of the tools or how great it is, and that's apparently because all the
   LKCD people are off in their own mailing lists and do not want to have
   anything to do with the rest of the world. Except when they come out of
   the blue one week before feature freeze and _demand_ that I accept
   their patches that I've never seen before or heard anybody talk about.

   Hint: think about this part. Deeply. And then go and bother SOMEBODY 
   ELSE.

 - Since I'm not personally convinced, it's not going into my tree.

   It's as simple as that. I take stuff that I feel is good. Often that 
   feeling of goodness comes from trusting the person who sends it to me, 
   simply by past performance.  At other times, it is because I think the 
   feature is cool, or well done, or whatever.

   Hint: if you want stuff in my tree, make me trust you. Or work on 
   things that I feel are innately interesting. Don't bother dragging me 
   into your flame-wars and trying to convince me that I "must" apply your
   patches.

 - If it doesn't go into my tree, is that bad?

   NO! Open source is all about _other_ people being able to make their 
   changes. It by no means means that those changes have to be accepted 
   back: the license basically only boils down to that I must be _able_ to
   accept them back. But the really important thing, the thing that really 
   makes a difference, is that you, your dog, and your company can make
   your OWN changes.

 - If it doesn't have to happen in my tree, then whose tree _does_ it have 
   to happen in?  

   Doesn't much matter, actually. You can keep it in your tree, for all I 
   care. OSDL has already picked it up and apparently maintains it in 
   their tree. The only thing that matters is whether it gets used or not, 
   and whether it proves itself.

   More people use vendor trees than my tree. And if you don't find a 
   vendor who will apply your patches, there are several "personal 
   vendors" out there, with the -ac, -aa and -mm trees being the obvious
   ones. Many of those trees are not just used, they are also 
   obviously backed by people I do trust, which brings us back to the
   criteria for _me_ to apply patches.

 - Considering the above, if you still want it to _eventually_ make it 
   into my tree, what should you do?

   Do you think pestering me makes me like the patches any more and trust 
   you? And if it doesn't, then how do you expect it to help, considering 
   my patch acceptance criteria?

   No. The way to get it into my tree is not to whine about it. There are 
   a few different ways to get it into my tree:

	(a) prove me wrong. And btw, it doesn't help to do so in your LKCD 
	    mailing list. You need to get those patches out there to 
	    _other_ people, or convince your own people that living in 
	    your little hole just means that nobody else knows or cares 
	    about you.

	(b) If you can't convince me, convince somebody else. Maybe that 
	    somebody else is somebody I trust, and that somebody else 
	    feels that I was wrong and since _he_ believes in the project 
	    he will try to convince me about it.

	    And trust me, the people I trust don't revere me and think I'm 
	    always right. These people call me "pinhead" and tell me when
	    I'm full of shit. If these people don't believe in your
	    project, don't blame me and think it's because I "poisoned 
	    their minds". 

	(c) Push your vendor. I have absolutely _zero_ incentives to care 
	    about whining users (I care deeply about the non-whining 
	    kind), but vendors do. Sometimes they do things just to get 
	    their users off their backs.

	    And once it's in a vendor tree, that doesn't guarantee I pick 
	    it up, but it _does_ guarantee that the patch is at least
	    widely used and thus we get more easily to (a) - proving me 
	    wrong outside your own little world.

 - Never whine about a patch. I know whining works with a lot of people
   ("Oh, for chrissake, I'll just do it to get him off my back") but it 
   works remarkably badly with me. Trust me on this.

Was this clear enough? Any confusion on any particular issue? 

In short: convince somebody else. So far, the only thing that the 
discussion has convinced me of is that people somehow seem to think that
they are ENTITLED to being merged into my tree. Tough. It ain't so. That 
tree is called "Linus' tree" for a reason.  The only thing you are 
ENTITLED to is to have your own tree.

		Linus


* Re: [lkcd-general] Re: What's left over.
  2002-11-04 11:59 Richard J Moore
  2002-11-04 12:27 ` Lars Marowsky-Bree
@ 2002-11-04 16:16 ` John Alvord
  2002-11-04 16:22 ` Linus Torvalds
  2 siblings, 0 replies; 35+ messages in thread
From: John Alvord @ 2002-11-04 16:16 UTC
  To: Richard J Moore
  Cc: Oliver Xymoron, Dave Anderson, linux-kernel, lkcd-devel,
	lkcd-general, lkcd-general-admin, Rusty Russell, Linus Torvalds,
	Matt D. Robinson

On Mon, 4 Nov 2002 11:59:23 +0000, "Richard J Moore"
<richardj_moore@uk.ibm.com> wrote:

>
>
>> What he really wants is for Andrew or Alan or someone else he trusts
>> to merge it, get actual field results, and declare it useful. If
>> people start visibly passing around crash dump results on l-k and
>> solving problems with them, that'll help too. Until then all he has is
>> his gut feel to go on.
>
>Are you sure? Isn't what Linus is saying that he understands that some
>problems can be solved using dumps, some from the oops message, some by
>source code inspection and some by other means? But he's not interested
>in a timely resolution; he has a preference for solving the problems by
>looking at the source and only that way. That's his preference: arguments
>relating to timeliness and commercial considerations are of no interest to
>him - simply because they argue for benefits in which he has no interest.
>Because LKCD doesn't personally interest him he has declared that he will
>not merge it; it's up to some trusted advocate.

What you describe is certainly Linus' general philosophy.

But he also said that the feature was in "vendor push" mode, which
means if enough vendors adopt the feature he would consider. Why do
you think reiserfs got into the mainline - certainly not because he
uses it personally.

He also said he has seen no evidence of its usefulness... not one
report on L-K of kernel problems resolved.

Seems pretty clear to me...

john alvord

* Re: [lkcd-general] Re: What's left over.
  2002-11-04 15:27       ` Henning P. Schmiedehausen
@ 2002-11-04 15:38         ` Patrick Finnegan
  2002-11-04 16:51           ` Henning P. Schmiedehausen
  0 siblings, 1 reply; 35+ messages in thread
From: Patrick Finnegan @ 2002-11-04 15:38 UTC
  To: linux-kernel

On Mon, 4 Nov 2002, Henning P. Schmiedehausen wrote:

> Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
>
> >On Mon, 2002-11-04 at 14:45, Henning P. Schmiedehausen wrote:
> >> Good! This means, people debugging the code have actually to think and
> >> don't produce "turn on debugger, step here, there, patch a band aid,
>
> >Some of us debug hardware. Regardless of the nice theories about
> >reviewing your code they don't actually work on hardware because no
> >amount of code review will let you discover things like undocumented
> >2uS deskew delays, or errors in DMA engines
>
> A debugger won't help you here either. A pci bus probe, a 'scope and a
> logic analyzer do.
>
> (And experience, elbow grease, experience and a nice amount of ESP :-)
> I do hate hardware. Had to debug too much of it (and just on
> m68k/MCS-51 where the clock rates are low and the parts easy to
> solder...).

I find that hard to believe.  You're saying it's impossible to use a
software debugger to debug the interface between the software and the
hardware?  Eg. errors in the hardware that cause periodic anomalies in the
output read by the software would be one thing they could catch, along
with diagnosing that a problem is caused by flaky hardware rather than the
latest not-well-tested VM code.  In that last case, since bad hardware can
usually cause a panic, I see crash dumps as an invaluable resource ;-).
(No Linus, I'm not pushing them, just stating my opinion.)

Pat
--
Purdue University ITAP/RCS
Information Technology at Purdue
Research Computing and Storage
http://www-rcd.cc.purdue.edu

http://dilbert.com/comics/dilbert/archive/images/dilbert2040637020924.gif




* Re: [lkcd-general] Re: What's left over.
  2002-11-04 14:45   ` Henning P. Schmiedehausen
@ 2002-11-04 15:29     ` Alan Cox
  2002-11-04 15:27       ` Henning P. Schmiedehausen
  2002-11-05  4:57     ` Werner Almesberger
  1 sibling, 1 reply; 35+ messages in thread
From: Alan Cox @ 2002-11-04 15:29 UTC
  To: hps; +Cc: Linux Kernel Mailing List

On Mon, 2002-11-04 at 14:45, Henning P. Schmiedehausen wrote:
> Good! This means, people debugging the code have actually to think and
> don't produce "turn on debugger, step here, there, patch a band aid,

Some of us debug hardware. Regardless of the nice theories about
reviewing your code they don't actually work on hardware because no
amount of code review will let you discover things like undocumented 
2uS deskew delays, or errors in DMA engines



* Re: [lkcd-general] Re: What's left over.
  2002-11-04 15:29     ` Alan Cox
@ 2002-11-04 15:27       ` Henning P. Schmiedehausen
  2002-11-04 15:38         ` Patrick Finnegan
  0 siblings, 1 reply; 35+ messages in thread
From: Henning P. Schmiedehausen @ 2002-11-04 15:27 UTC
  To: linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

>On Mon, 2002-11-04 at 14:45, Henning P. Schmiedehausen wrote:
>> Good! This means, people debugging the code have actually to think and
>> don't produce "turn on debugger, step here, there, patch a band aid,

>Some of us debug hardware. Regardless of the nice theories about
>reviewing your code they don't actually work on hardware because no
>amount of code review will let you discover things like undocumented 
>2uS deskew delays, or errors in DMA engines

A debugger won't help you here either. A pci bus probe, a 'scope and a
logic analyzer do.

(And experience, elbow grease, experience and a nice amount of ESP :-)
I do hate hardware. Had to debug too much of it (and just on
m68k/MCS-51 where the clock rates are low and the parts easy to
solder...).

	Regards
		Henning

-- 
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen       -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH     hps@intermeta.de

Am Schwabachgrund 22  Fon.: 09131 / 50654-0   info@intermeta.de
D-91054 Buckenhof     Fax.: 09131 / 50654-20   

* Re: [lkcd-general] Re: What's left over.
  2002-11-04  2:44 ` [lkcd-general] " Jennie Haywood
@ 2002-11-04 14:45   ` Henning P. Schmiedehausen
  2002-11-04 15:29     ` Alan Cox
  2002-11-05  4:57     ` Werner Almesberger
  0 siblings, 2 replies; 35+ messages in thread
From: Henning P. Schmiedehausen @ 2002-11-04 14:45 UTC
  To: linux-kernel

Jennie Haywood <jehaywood@compuserve.com> writes:

>The Linux kernel  is _extremely_  painful to debug compared to AIX.

Good! This means, people debugging the code have actually to think and
don't produce "turn on debugger, step here, there, patch a band aid,
done" solutions you see with various other "commercial products" (can
anyone really say "Internet Explorer" on this list and live? ;-) )

	Regards
		Henning

-- 
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen       -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH     hps@intermeta.de

Am Schwabachgrund 22  Fon.: 09131 / 50654-0   info@intermeta.de
D-91054 Buckenhof     Fax.: 09131 / 50654-20   

* Re: [lkcd-general] Re: What's left over.
@ 2002-11-04 12:34 Richard J Moore
  0 siblings, 0 replies; 35+ messages in thread
From: Richard J Moore @ 2002-11-04 12:34 UTC
  To: Lars Marowsky-Bree
  Cc: lars, linux-kernel, lkcd-devel, lkcd-general, lkcd-general-admin



> But arguing about "I have so many fortune 100 companies just lined up ready to
> say that they support this campaign!" is marketing speak. Go away with that
> from Linux kernel, will you.

Thank-you - you have restated my point.


Richard


* Re: [lkcd-general] Re: What's left over.
  2002-11-04 11:59 Richard J Moore
@ 2002-11-04 12:27 ` Lars Marowsky-Bree
  2002-11-04 16:16 ` John Alvord
  2002-11-04 16:22 ` Linus Torvalds
  2 siblings, 0 replies; 35+ messages in thread
From: Lars Marowsky-Bree @ 2002-11-04 12:27 UTC
  To: Richard J Moore
  Cc: linux-kernel, lkcd-devel, lkcd-general, lkcd-general-admin

On 2002-11-04T11:59:23,
   Richard J Moore <richardj_moore@uk.ibm.com> said:

> So, for those of us who passionately care whether Linux has a system
> dumping mechanism, we need to regroup, we need to decide the correct
> strategy for gaining LKCD's inclusion into the kernel.  Many of the
> arguments relate to timeliness and ultimately have a commercial benefit. I
> suggest we actively campaign among the various distros who are interested
> in selling Linux to businesses and providing support. We also need to
> concentrate on consolidating the various requirements of a system crash
> dump - it's going to be much easier for everyone if there is a consensus on
> system dumping technology.

I think you are somewhat missing the point.

Both RH and UnitedLinux seem to care enough for system dump facilities that
they ship patched kernels (netdump / LKCD, respectively). Anyone who cares can
simply apply the patch themselves, if they want to compile from vanilla
sources. Just buy RH AS or any enterprise product powered by United Linux, and
off you go. I assume that your "enterprise customers" will want to do that
anyway because they need all those very useful certifications...

And since l-k (rightly!) mostly refuses to deal with crash/oops reports from
vendor patched kernels anyway, the distributors have to deal with the
diagnosis themselves already and do so as part of the support contracts.
Anyone who runs their own patched kernels probably also is able to do so.

While I can see the issue that having the patch included in the mainstream
kernel offers the usual advantages, it is by no means the absolute requirement
you make it out to be.

It appears that the facilities are all there now; so 2.6 should be the
perfect time to test the various approaches in the field. (And face it, field
experience is rather limited still, but I am very sure it will grow soon
because it is such a useful feature)

Then it can be included. This is how Linux has always worked. reiserfs has
gone through this, as has ext3, XFS, quite a few of the VM patches etc. So no
worries, nobody is being exceptionally harsh in any fashion.

But arguing about "I have so many fortune 100 companies just lined up ready to
say that they support this campaign!" is marketing speak. Go away with that
from Linux kernel, will you.

Come back when it is "I have so many fortune 100 companies actively using this
feature and have solved many problems with it!".


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Principal Squirrel 
SuSE Labs - Research & Development, SuSE Linux AG
  
"If anything can go wrong, it will." "Chance favors the prepared (mind)."
  -- Capt. Edward A. Murphy            -- Louis Pasteur

* Re: [lkcd-general] Re: What's left over.
@ 2002-11-04 11:59 Richard J Moore
  2002-11-04 12:27 ` Lars Marowsky-Bree
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Richard J Moore @ 2002-11-04 11:59 UTC (permalink / raw)
  To: Oliver Xymoron
  Cc: Dave Anderson, linux-kernel, lkcd-devel, lkcd-general,
	lkcd-general-admin, Rusty Russell, Linus Torvalds,
	Matt D. Robinson



> What he really wants is for Andrew or Alan or someone else he trusts
> to merge it, get actual field results, and declare it useful. If
> people start visibly passing around crash dump results on l-k and
> solving problems with them, that'll help too. Until then all he has is
> his gut feel to go on.

Are you sure? Isn't what Linus is saying that he understands that some
problems can be solved using dumps, some from the oops message, some by
source code inspection and some by other means? But he's not interested
in a timely resolution; he has a preference for solving the problems by
looking at the source and only that way. That's his preference: arguments
relating to timeliness and commercial considerations are of no interest to
him, simply because they argue for benefits in which he has no interest.
Because LKCD doesn't personally interest him he has declared that he will
not merge it; it's up to some trusted advocate.

So, for those of us who passionately care whether Linux has a system
dumping mechanism, we need to regroup, we need to decide the correct
strategy for gaining LKCD's inclusion into the kernel.  Many of the
arguments relate to timeliness and ultimately have a commercial benefit. I
suggest we actively campaign among the various distros who are interested
in selling Linux to businesses and providing support. We also need to
concentrate on consolidating the various requirements of a system crash
dump - it's going to be much easier for everyone if there is a consensus on
system dumping technology.


First crucial question - are there any avenues still open for 2.5?


Richard J Moore
RAS Project Lead - IBM Linux Technology Centre



* Re: [lkcd-general] Re: What's left over.
  2002-11-03 13:48 Bill Davidsen
@ 2002-11-04  2:44 ` Jennie Haywood
  2002-11-04 14:45   ` Henning P. Schmiedehausen
  0 siblings, 1 reply; 35+ messages in thread
From: Jennie Haywood @ 2002-11-04  2:44 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Alan Cox, Linus Torvalds, Chris Friesen, Matt D. Robinson,
	Rusty Russell, Linux Kernel Mailing List, lkcd-general,
	lkcd-devel

Bill Davidsen wrote:

>
> On 1 Nov 2002, Alan Cox wrote:
>
> > On Fri, 2002-11-01 at 06:34, Bill Davidsen wrote:
> > >   From the standpoint of just the driver that's true. However, the remote
> > > machine and all the network bits between them are a string of single
> > > points of failure. Isn't it good that both disk and network can be
> > > supported.
> >
> The AIX support has a group just to beat on dumps customers send. What
> more evidence is needed that people can and do use the capability?
>

AIX has 4 people doing dumps in Austin (otherwise known as ZTRANS). There are
others in other countries. The folks from other countries were brought to
Austin for training (usually for 3 months). There is usually one person in L3
doing dumps in Austin for service, although every subsystem has someone that
specializes in reading dumps for that subsystem.

The first 4 people only do a scan of the dump to see if it's a known problem.
If it's not a known problem AND it's in AIX code, it goes to whoever owns
that subsystem.

Dumps are only the beginning with AIX. Trace hooks along with dumps are VERY
useful. The trace hooks are also what the performance people use.

The Linux kernel is _extremely_ painful to debug compared to AIX.


--
Jennie Haywood
jehaywood@compuserve.com
Everyone is crazy. It's just a matter of degree.
jehaywood@yahoo.com
-
The oak tree in your backyard is just a nut that held its ground.



* Re: [lkcd-general] Re: What's left over.
  2002-11-03 17:08 linux
@ 2002-11-03 19:14 ` jw schultz
  0 siblings, 0 replies; 35+ messages in thread
From: jw schultz @ 2002-11-03 19:14 UTC (permalink / raw)
  To: linux-kernel

On Sun, Nov 03, 2002 at 05:08:23PM -0000, linux@horizon.com wrote:
> Just to complicate things, consider this setup:
> 
> # cat /proc/swaps
> Filename			Type		Size	Used	Priority
> /dev/md5                        partition	999864	16904	0
> /dev/md6                        partition	999864	16924	0
> /dev/md7                        partition	999864	16920	0
> 
> Those are all RAID-1 mirrors, a measure whose ass-saving value I have
> enjoyed.
> 
> While a crash dump to just half of one of those mirrors is fine, finding it
> might be a little bit tricky.  And the fact that the kernel reassembles
> the mirrors automatically on boot might make retrieving the data a little
> bit tricky, too.
> 
> (After a crash, the mirrors will be inconsistent, so one will get copied
> over the other, but I'm not too clear on which direction it'll happen in.)
> 
> I can't NOT reassemble at least some mirrors on boot because / is mirrored!
> 
> Now, to that, add the case that each of those is significantly smaller than
> main memory.  (2/3 size would still allow swap = 2*ram.)

You would want a dump2disk that could span devices.
Probably a module that would put a header on each part with
a dumpID and sequence#.  Compression would also help here as
well.  The right compression would actually accelerate the
process.

Early userspace would locate and assemble the pieces and put
the dump somewhere.  This might happen between mounting /
and assembling the other mirrors.  That would be up to you.
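A minimal C sketch of the header-plus-reassembly idea above, assuming a made-up on-disk header (the `dump_part_header` struct, its field names and the magic value are hypothetical, not an LKCD format): early userspace reads a header from each candidate device, keeps the pieces that carry the wanted dump ID, orders them by sequence number, and refuses to go on if any piece is missing or duplicated. Compression and the actual data copy are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header written at the start of each piece of a dump that
 * spans several devices; not an existing LKCD on-disk format. */
#define DUMP_PART_MAGIC 0x44554d50u /* "DUMP" in ASCII */

struct dump_part_header {
    uint32_t magic;   /* marks a dump piece */
    uint64_t dump_id; /* identical for all pieces of one dump */
    uint32_t seq;     /* 0-based position of this piece */
    uint32_t nparts;  /* total number of pieces in this dump */
    uint64_t length;  /* payload bytes following the header */
};

/* qsort() comparator: order pieces by sequence number. */
static int by_seq(const void *a, const void *b)
{
    const struct dump_part_header *x = a, *y = b;
    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Headers may come from several dumps and in any device order. Keep the
 * pieces of `want_id`, sort them into the front of `h`, and check that
 * pieces 0..nparts-1 are each present exactly once.
 * Returns the piece count, or -1 if the dump is incomplete. */
static int assemble(struct dump_part_header *h, int n, uint64_t want_id)
{
    int k = 0;

    assert(h != NULL || n == 0);
    for (int i = 0; i < n; i++)
        if (h[i].magic == DUMP_PART_MAGIC && h[i].dump_id == want_id)
            h[k++] = h[i];
    if (k == 0)
        return -1;
    qsort(h, (size_t)k, sizeof h[0], by_seq);
    for (int i = 0; i < k; i++)
        if (h[i].seq != (uint32_t)i || h[i].nparts != (uint32_t)k)
            return -1;
    return k;
}
```

Storing `nparts` in every header lets the reassembler notice a missing piece without knowing in advance how many devices the dump was spread over; a stale piece from an older dump is simply skipped because its `dump_id` differs.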

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

* Re: [lkcd-general] Re: What's left over.
@ 2002-11-03 17:08 linux
  2002-11-03 19:14 ` jw schultz
  0 siblings, 1 reply; 35+ messages in thread
From: linux @ 2002-11-03 17:08 UTC (permalink / raw)
  To: linux-kernel

Just to complicate things, consider this setup:

# cat /proc/swaps
Filename			Type		Size	Used	Priority
/dev/md5                        partition	999864	16904	0
/dev/md6                        partition	999864	16924	0
/dev/md7                        partition	999864	16920	0

Those are all RAID-1 mirrors, a measure whose ass-saving value I have
enjoyed.

While a crash dump to just half of one of those mirrors is fine, finding it
might be a little bit tricky.  And the fact that the kernel reassembles
the mirrors automatically on boot might make retrieving the data a little
bit tricky, too.

(After a crash, the mirrors will be inconsistent, so one will get copied
over the other, but I'm not too clear on which direction it'll happen in.)

I can't NOT reassemble at least some mirrors on boot because / is mirrored!

Now, to that, add the case that each of those is significantly smaller than
main memory.  (2/3 size would still allow swap = 2*ram.)
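To put numbers on the layout above (the Size column of /proc/swaps is in KiB): if each 999864 KiB mirror is 2/3 of RAM, RAM is about 1499796 KiB and the three mirrors together hold 2999592 KiB, i.e. swap = 2 × RAM, yet no single mirror can take a full uncompressed dump. The hypothetical `plan_dump()` below sketches the simplest split, filling each partition in turn; it ignores per-piece headers and compression.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical planner: spread a dump of `need` KiB over swap partitions
 * of `size[i]` KiB, filling each in order before moving on. Stores the KiB
 * assigned to each partition in `use` and returns how many partitions are
 * touched, or -1 if even all of them together are too small. */
static int plan_dump(const uint64_t *size, uint64_t *use, size_t n,
                     uint64_t need)
{
    size_t i = 0;

    assert((size != NULL && use != NULL) || n == 0);
    for (; i < n && need > 0; i++) {
        use[i] = need < size[i] ? need : size[i];
        need -= use[i];
    }
    for (size_t j = i; j < n; j++)
        use[j] = 0; /* untouched partitions */
    return need == 0 ? (int)i : -1;
}
```

For the RAM size above this uses all of the first partition and 499932 KiB of the second, leaving the third untouched.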


The problem is that hardware is getting more and more sophisticated and
requiring ever more elaborate device drivers.  Eventually you have to
have a cutoff and say that something is too complex to talk to after a
crash, even though it's theoretically available.  Where is that line?
USB?  iSCSI?  This situation?

A reasonable fallback is to just drop in a cheap crappy dedicated
IDE drive for catching crash dumps, but I'd like the crash dumper to
know how to wake it up from sleep mode; I'd hate to leave it spinning
all the time...

* Re: [lkcd-general] Re: What's left over.
  2002-11-03 14:33     ` Bill Davidsen
  2002-11-03 15:34       ` Bernd Eckenfels
@ 2002-11-03 16:32       ` Alan Cox
  2002-11-05 18:07         ` Bill Davidsen
  1 sibling, 1 reply; 35+ messages in thread
From: Alan Cox @ 2002-11-03 16:32 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Matt D. Robinson, Steven King, Linus Torvalds, Joel Becker,
	Chris Friesen, Rusty Russell, Linux Kernel Mailing List,
	lkcd-general, lkcd-devel

On Sun, 2002-11-03 at 14:33, Bill Davidsen wrote:
> If you define "unmaintainably bad" as "having features you don't need"
> then I agree. But since dump to disk is in almost every other commercial
> UNIX, maybe someone would question why it's good for others but not for
> Linux.

It isn't about features, it's about clean maintainable code. netdump to me
doesn't mean no dump-to-disk option. In fact I'd rather like to be able
to insmod dump-foo.o. The correctness issues are hard but if the
dump-foo is standalone, resets the hardware and has an SHA integrity
check then it can be done (think of it as a post crash variant of the
trusted computing TCB verification problem)

> uses the crash dump in AIX, the person who wants to send a compressed dump
> and money to IBM and get back a fix. Netdump assumes external resources

Lots of interesting legal issues but yes you can do it sometimes (DMCA,
privacy, financial duties sometimes make it horribly complex). Even in
the case where you only dump the oops it's still valuable.

> and a functional secure network (is the dump encrypted and I missed it?)
> which home users surely don't have, and remote servers often lack as well.

Encrypting the dump with the new crypto lib in the kernel would be easy;
right now it doesn't.

My disk dump concerns are purely those of correctness. That means

1.	After loading the module, getting the block list for the dump target

2.	Resetting and scratch initializing the dump device

3.	Not relying on any code outside of the dump TCB that may have
been corrupted

4.	At dump time turning off all bus masters, doing the dump TCB
verification and then dumping

Most of the pieces already exist.
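A rough C sketch of the bookkeeping behind steps 1 and 4, under stated assumptions: the `dump_tcb` struct and its functions are hypothetical, and 64-bit FNV-1a stands in for the SHA digest purely to keep the example short (it catches accidental corruption, not deliberate tampering). The block list is captured and sealed at load time; at dump time the digest is recomputed and nothing is written unless it still matches. Resetting the device (step 2) and turning off bus masters are hardware-specific and not shown, and the sketch does not address the hard part, which is trusting the code that performs the check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a: a short stand-in for SHA, NOT tamper-resistant. */
#define FNV_OFFSET 0xcbf29ce484222325ULL
static uint64_t fnv1a64(const void *p, size_t len, uint64_t h)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < len; i++) {
        h ^= b[i];
        h *= 0x100000001b3ULL; /* FNV 64-bit prime */
    }
    return h;
}

/* Hypothetical dump "TCB": everything the dump path will rely on. */
struct dump_tcb {
    const unsigned char *code; size_t code_len; /* dump driver text */
    uint64_t blocks[8]; size_t nblocks;         /* precomputed target blocks */
    uint64_t digest;                            /* recorded at load time */
};

static uint64_t tcb_digest(const struct dump_tcb *t)
{
    uint64_t h = fnv1a64(t->code, t->code_len, FNV_OFFSET);
    return fnv1a64(t->blocks, t->nblocks * sizeof t->blocks[0], h);
}

/* Step 1: at module load, record the block list and seal it. */
static void tcb_seal(struct dump_tcb *t, const uint64_t *blocks, size_t n)
{
    assert(n <= sizeof t->blocks / sizeof t->blocks[0]);
    memcpy(t->blocks, blocks, n * sizeof blocks[0]);
    t->nblocks = n;
    t->digest = tcb_digest(t);
}

/* Step 4: at crash time (bus masters already stopped, not shown),
 * re-verify before writing anything. Returns 1 if it is safe to dump. */
static int tcb_verify(const struct dump_tcb *t)
{
    return tcb_digest(t) == t->digest;
}
```

Because the block list is resolved while the system is still healthy, the crash-time path never has to consult filesystem or block-layer code outside the verified set.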



* Re: [lkcd-general] Re: What's left over.
  2002-11-03 14:33     ` Bill Davidsen
@ 2002-11-03 15:34       ` Bernd Eckenfels
  2002-11-03 16:32       ` Alan Cox
  1 sibling, 0 replies; 35+ messages in thread
From: Bernd Eckenfels @ 2002-11-03 15:34 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.3.96.1021103092330.5197D-100000@gatekeeper.tmr.com> you wrote:
> If you define "unmaintainably bad" as "having features you don't need"
> then I agree. But since dump to disk is in almost every other commercial
> UNIX, maybe someone would question why it's good for others but not for
> Linux.

It is even in FreeBSD or Windows > ME

Greetings
Bernd

* Re: [lkcd-general] Re: What's left over.
  2002-11-03  1:49   ` Alan Cox
@ 2002-11-03 14:33     ` Bill Davidsen
  2002-11-03 15:34       ` Bernd Eckenfels
  2002-11-03 16:32       ` Alan Cox
  0 siblings, 2 replies; 35+ messages in thread
From: Bill Davidsen @ 2002-11-03 14:33 UTC (permalink / raw)
  To: Alan Cox
  Cc: Matt D. Robinson, Steven King, Linus Torvalds, Joel Becker,
	Chris Friesen, Rusty Russell, Linux Kernel Mailing List,
	lkcd-general, lkcd-devel

On 3 Nov 2002, Alan Cox wrote:

> I would hope IBM have more intelligence than to attempt to destroy the
> product by trying to force all sorts of junk into it. The Linux world
> has a process for filtering crap; it isn't IBM applying force. That path
> leads to Star Office 5.2, Netscape 4 and other similar scales of horror
> code that become unmaintainably bad.

If you define "unmaintainably bad" as "having features you don't need"
then I agree. But since dump to disk is in almost every other commercial
UNIX, maybe someone would question why it's good for others but not for
Linux.

I can agree on stuff the non-hacker wouldn't use, but that is exactly who
uses the crash dump in AIX, the person who wants to send a compressed dump
and money to IBM and get back a fix. Netdump assumes external resources
and a functional secure network (is the dump encrypted and I missed it?)
which home users surely don't have, and remote servers often lack as well.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: [lkcd-general] Re: What's left over.
  2002-11-03  1:24 ` [lkcd-general] " Matt D. Robinson
  2002-11-03  1:49   ` Alan Cox
@ 2002-11-03  3:10   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2002-11-03  3:10 UTC (permalink / raw)
  To: Matt D. Robinson
  Cc: Alan Cox, Bill Davidsen, Steven King, Linus Torvalds,
	Joel Becker, Chris Friesen, Rusty Russell,
	Linux Kernel Mailing List, lkcd-general, lkcd-devel

On Sat, Nov 02, 2002 at 05:24:17PM -0800, Matt D. Robinson wrote:
> P.S.  IBM shouldn't have signed a contract with Red Hat without
>       requiring certain features in Red Hat's OS(es).  Pushing for
>       LKCD, kprobes, LTT, etc., wouldn't be on this list for a whole
>       variety of cases if that had been done in the first place.

Bah, it's enough that IBM's money totally fucked up the tree of one popular
distribution.


* Re: [lkcd-general] Re: What's left over.
  2002-11-03  1:24 ` [lkcd-general] " Matt D. Robinson
@ 2002-11-03  1:49   ` Alan Cox
  2002-11-03 14:33     ` Bill Davidsen
  2002-11-03  3:10   ` Christoph Hellwig
  1 sibling, 1 reply; 35+ messages in thread
From: Alan Cox @ 2002-11-03  1:49 UTC (permalink / raw)
  To: Matt D. Robinson
  Cc: Bill Davidsen, Steven King, Linus Torvalds, Joel Becker,
	Chris Friesen, Rusty Russell, Linux Kernel Mailing List,
	lkcd-general, lkcd-devel

On Sun, 2002-11-03 at 01:24, Matt D. Robinson wrote:
> P.S.  IBM shouldn't have signed a contract with Red Hat without
>       requiring certain features in Red Hat's OS(es).  Pushing for
>       LKCD, kprobes, LTT, etc., wouldn't be on this list for a whole
>       variety of cases if that had been done in the first place.

I would hope IBM have more intelligence than to attempt to destroy the
product by trying to force all sorts of junk into it. The Linux world
has a process for filtering crap; it isn't IBM applying force. That path
leads to Star Office 5.2, Netscape 4 and other similar scales of horror
code that become unmaintainably bad.

> P.S.  As an aside, too many engineers try and make product marketing
>       decisions at Red Hat.  I personally think that's really bad for
>       their business model as a whole (and I'm not referring to LKCD).

You think things like EVMS are a product marketing decision. I'm very
glad you don't run a Linux distro. It would turn into something like the
old 3com rapops rather rapidly by your models (3com rapops btw ceased to
exist and for good reasons)

Alan


* Re: [lkcd-general] Re: What's left over.
  2002-11-02 15:29 Alan Cox
@ 2002-11-03  1:24 ` Matt D. Robinson
  2002-11-03  1:49   ` Alan Cox
  2002-11-03  3:10   ` Christoph Hellwig
  0 siblings, 2 replies; 35+ messages in thread
From: Matt D. Robinson @ 2002-11-03  1:24 UTC (permalink / raw)
  To: Alan Cox
  Cc: Bill Davidsen, Steven King, Linus Torvalds, Joel Becker,
	Chris Friesen, Rusty Russell, Linux Kernel Mailing List,
	lkcd-general, lkcd-devel

On 2 Nov 2002, Alan Cox wrote:
|>On Sat, 2002-11-02 at 05:17, Bill Davidsen wrote:
|>>   I was hoping Alan would push Redhat to put this in their Linux so we
|>> could resolve some of the ongoing problems which don't write an oops to a
|>> log, but I guess none of the developers has to actually support production
|>> servers and find out why they crash.
|>
|>I think several Red Hat people would disagree very strongly. Red Hat
|>shipped with the kernel symbol decoding oops reporter for a good reason,
|>and also acquired netdump for a good reason. 

It would be great if crash dumping were an option, at the very least
to unify the netdump, oops reporter and disk dumping (for those that
want it) into a single infrastructure.  Long term, that's probably
where this is going anyway.  It takes away the religious "who is right"
argument, which is fundamentally silly.

Maybe one day.  I think quite a few Red Hat customers would
appreciate it.
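One way such a unified infrastructure could be shaped, sketched in C: each backend (disk, network, oops-only) registers a small ops structure, and the core tries the registered targets in order until one succeeds. The `dump_target` interface, `dump_register()`, `dump_run()` and the two toy backends are hypothetical illustrations, not the LKCD or netdump API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend interface: disk, network and oops-only dumpers
 * would each provide one of these. */
struct dump_target {
    const char *name;
    int (*ready)(void);                        /* usable right now? */
    int (*write)(const void *buf, size_t len); /* 0 on success */
};

#define MAX_TARGETS 4
static const struct dump_target *targets[MAX_TARGETS];
static size_t ntargets;

static int dump_register(const struct dump_target *t)
{
    assert(t != NULL);
    if (ntargets == MAX_TARGETS)
        return -1;
    targets[ntargets++] = t;
    return 0;
}

/* Try targets in registration order; return the name of the one that
 * took the dump, or NULL if none did. */
static const char *dump_run(const void *buf, size_t len)
{
    for (size_t i = 0; i < ntargets; i++)
        if (targets[i]->ready() && targets[i]->write(buf, len) == 0)
            return targets[i]->name;
    return NULL;
}

/* Two toy backends for illustration: the network link is "down", the
 * disk backend only counts the bytes it is handed. */
static size_t disk_bytes;
static int net_ready(void) { return 0; }
static int net_write(const void *buf, size_t len) { (void)buf; (void)len; return -1; }
static int disk_ready(void) { return 1; }
static int disk_write(const void *buf, size_t len) { (void)buf; disk_bytes += len; return 0; }

static const struct dump_target net_target = { "netdump", net_ready, net_write };
static const struct dump_target disk_target = { "disk", disk_ready, disk_write };
```

With the network backend reporting "not ready", `dump_run()` falls through to the disk backend; registration order is the whole policy, so the "who is right" question becomes a configuration choice.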

--Matt

P.S.  IBM shouldn't have signed a contract with Red Hat without
      requiring certain features in Red Hat's OS(es).  Pushing for
      LKCD, kprobes, LTT, etc., wouldn't be on this list for a whole
      variety of cases if that had been done in the first place.

P.S.  As an aside, too many engineers try and make product marketing
      decisions at Red Hat.  I personally think that's really bad for
      their business model as a whole (and I'm not referring to LKCD).


* Re: [lkcd-general] RE: What's left over.
  2002-10-31 22:47 Perez-Gonzalez, Inaky
@ 2002-11-01 13:06 ` Jan Iven
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Iven @ 2002-11-01 13:06 UTC (permalink / raw)
  To: 'Linus Torvalds'
  Cc: 'linux-kernel@vger.kernel.org',
	'lkcd-general@lists.sourceforge.net',
	'lkcd-devel@lists.sourceforge.net'

>>>>> "PI" == Perez-Gonzalez, Inaky <inaky.perez-gonzalez@intel.com> writes:

 >> THAT is what I mean by vendor-driven. If vendors decide they 
 >> really want the patches, and I actually start seeing noises on 
 >> linux-kernel or getting
 >> requests for it being merged from _users_ rather than developers, then
 >> that means that the vendor is on to something.

For what it is worth, CERN has been using LKCD kernels for the last
6 months or so, enabled mostly on headless farm machines (but the
kernels get deployed to desktops as well). Please consider including
it into the mainstream kernel.

Jan Iven
Linux support / CERN



* Re: [lkcd-general] Re: What's left over.
  2002-10-31 22:20 Shawn
@ 2002-10-31 23:14 ` Bernhard Kaindl
  0 siblings, 0 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2002-10-31 23:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: lkcd-general

On Thu, 31 Oct 2002, Shawn wrote:
>
> Linus has to "keep up" with all the changees coming into his inbox as
> well, and the more features, the more breakage that can happen when
> Linus accepts a patch.

Yes, but lkcd differs from the other changes because it can make life
easier even for people who don't need the patch in the first place,
and it helps quality and shortens the time to fix bugs.

If someone triggers a problem, one can set aside a free partition or set up
a network dump server and keep running; if it happens again, there is a good
chance that all that is needed to fix the problem is in the dump,
the System.map and the Kerntypes file from the kernel, which can
be consolidated into a report with symbolic stack traces of the
CPUs and tasks quite easily.

Original source, patches and configuration options are good for
analysing but not required if the Kerntypes file is there. The
config options could even be read from the dump if that were
a desired feature. :-)
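The symbolic part of such a report can be sketched in C: parse System.map lines (`<hex address> <type> <name>`), sort them by address, and map each raw stack address to the nearest symbol at or below it, printed as `symbol+0xoffset`. The function names and the sample addresses used with it are made up; type information from Kerntypes is not covered here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct ksym { uint64_t addr; char name[64]; };

static int by_addr(const void *a, const void *b)
{
    const struct ksym *x = a, *y = b;
    return (x->addr > y->addr) - (x->addr < y->addr);
}

/* Parse System.map-style text ("<hex addr> <type> <name>" per line) into
 * `syms`, sorted by address. Returns the number of symbols parsed. */
static size_t load_map(const char *text, struct ksym *syms, size_t max)
{
    size_t n = 0;
    char type;
    int used;

    while (n < max && sscanf(text, "%" SCNx64 " %c %63s%n",
                             &syms[n].addr, &type, syms[n].name, &used) == 3) {
        n++;
        text += used; /* advance past the line just parsed */
    }
    qsort(syms, n, sizeof syms[0], by_addr);
    return n;
}

/* Format `addr` as "symbol+0xoffset" using the nearest symbol at or below
 * it (binary search). Returns 0, or -1 if addr is below every symbol. */
static int resolve(const struct ksym *syms, size_t n, uint64_t addr,
                   char *out, size_t outlen)
{
    size_t lo = 0, hi = n; /* find first symbol with addr > target */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;
    const struct ksym *s = &syms[lo - 1];
    snprintf(out, outlen, "%s+0x%" PRIx64, s->name, addr - s->addr);
    return 0;
}
```

Given a map in which do_page_fault starts at 0xc0112c30, a stack slot holding 0xc0112c5e would be reported as do_page_fault+0x2e.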

> Really, Linus wants to push some of his maintanance overhead to distros,
> who get paid to do it, but also to provide sexy bullet point items for
> users, so they buy "Linux" stuff.

Sure, but the work of the distros could be even better if the base
kernel has lkcd, LTT and dprobes (you don't have to enable them if
you don't need them) because then they would have more resources
to make other even more useful things. But it's up to someone
who merges the stuff.

Bernd



* Re: [lkcd-general] Re: What's left over.
@ 2002-10-31 21:58 Richard J Moore
  0 siblings, 0 replies; 35+ messages in thread
From: Richard J Moore @ 2002-10-31 21:58 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: linux-kernel, lkcd-devel, lkcd-general, lkcd-general-admin,
	Rusty Russell, Linus Torvalds, Matt D. Robinson


> So, I think the stock kernel does need some form of disk dumping,
> regardless of any presence/absence of netdump.  But LKCD isn't there
> yet...

But if we get into 2.5 the minimal kernel piece we need, we can continue to
enhance and expand dumping capability independently of the kernel via the
dump module.  And in this respect we have been actively working on
integrating the netdump concept with lkcd.


Richard


* Re: [lkcd-general] Re: What's left over.
  2002-10-31 19:57       ` george anzinger
@ 2002-10-31 20:48         ` Stephen Hemminger
  0 siblings, 0 replies; 35+ messages in thread
From: Stephen Hemminger @ 2002-10-31 20:48 UTC (permalink / raw)
  To: george anzinger
  Cc: Patrick Mochel, Dave Craft, Linus Torvalds, Matt D. Robinson,
	Rusty Russell, Kernel List, lkcd-general, lkcd-devel

On Thu, 2002-10-31 at 11:57, george anzinger wrote:
> Stephen Hemminger wrote:
> > FYI the criteria I apply for what goes into DCL is:
> * Applies to large systems and databases
> > * Vendor support
> > * Conforms to Linux standard style
> > * Active project and maintainer that accepts feedback
> > * Community rejection has been mostly positive.
>               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Could you decode this :)
s/rejection/reaction/


* Re: [lkcd-general] Re: What's left over.
  2002-10-31 19:16     ` Stephen Hemminger
@ 2002-10-31 19:57       ` george anzinger
  2002-10-31 20:48         ` Stephen Hemminger
  0 siblings, 1 reply; 35+ messages in thread
From: george anzinger @ 2002-10-31 19:57 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Patrick Mochel, Dave Craft, Linus Torvalds, Matt D. Robinson,
	Rusty Russell, Kernel List, lkcd-general, lkcd-devel

Stephen Hemminger wrote:
> FYI the criteria I apply for what goes into DCL is:
> * Applies to large systems and databases
> * Vendor support
> * Conforms to Linux standard style
> * Active project and maintainer that accepts feedback
> * Community rejection has been mostly positive.
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Could you decode this :)

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

* Re: [lkcd-general] Re: What's left over.
  2002-10-31 18:45   ` Patrick Mochel
@ 2002-10-31 19:16     ` Stephen Hemminger
  2002-10-31 19:57       ` george anzinger
  0 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2002-10-31 19:16 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Dave Craft, Linus Torvalds, Matt D. Robinson, Rusty Russell,
	Kernel List, lkcd-general, lkcd-devel

On Thu, 2002-10-31 at 10:45, Patrick Mochel wrote:
> 
> So, this is precisely where something like OSDL's Carrier Grade and Data 
> Center working groups can come into play, amazingly enough. 
> 
> By now, nearly everyone has heard about the working groups and nearly
> every developer that has, despises them. Even I resist association with
> them. But, they can have some real value to the vendors and the OEMs in 
> exactly the way you describe. 
>
> Take for example DCL. It's a kernel tree with several base patches 
> intended to make Linux better in the data center. The base is not fancy, 
> and includes things like LKCD and kdb (I think). It's actively maintained 
> and updated more often than Linus makes a release (by virtue of 
> bitkeeper).

LKCD is in and I try to keep it up to date with the patch stream.
KDB is not in yet, because the current posted patches are not up to date
to apply cleanly against 2.5.44 or 2.5.45.

> The intent is to later have multiple child trees that implement features
> for a specific application space (e.g. databases), while maintaining the
> same base set of features. People wishing to use the most recent kernel 
> with those features can use the DCL tree directly. Or an OEM FAE can use 
> the tree to build something for the vendor, or add extra features.

CGL hasn't decided what they want to change to.
DCL is going to have one tree focused on databases.

> Note that it's not a distribution. We don't even make real releases, since 
> we don't create tarballs or patches (it's only in BK, which actually kinda 
> sucks). It's merely a means to have these features actively maintained and 
> kept in synch. 

For DCL there is both a bitkeeper tree bk://bk.osdl.org/dcl-2.5 and
regular snapshots available on sourceforge
http://osdldcl.sourceforge.net
 
> And really, that's what everyone wants. Linus doesn't want the features,
> nor do other developers, regardless of the Buzzword or Coolness factors.
> Some vendors and users do want them. The developers of the features and
> distributors of features don't want to deal with the tedium and pain of
> updating patches each and every release.
> 
> In the end, it comes down to the fact that Linus's tree is Linus's tree. 
> Other people can have their trees. I'm not going to tell you go off and 
> make your own if you want those features so bad, because I know what a 
> pain in the ass it is, and I know having someone else do it is a lot 
> easier.
> 

FYI the criteria I apply for what goes into DCL is:
* Applies to large systems and databases
* Vendor support
* Conforms to Linux standard style
* Active project and maintainer that accepts feedback
* Community rejection has been mostly positive.


> DCL and CGL have their trees, for purposes probably very very similar to 
> what your customers need. I encourage you to check them out and work with 
> them (or talk to people in your company that are). Try and make it work, 
> and everyone can be happy (relatively). And, if DCL and CGL aren't
> satisfying the space that you need, please speak up to OSDL and the 
> working groups. People are listening, and willing to take your suggestions 
> into consideration. 
> 
> Relevant URLs:
> 
> http://osdl.org/projects/cgl/
> http://osdl.org/projects/dcl/

Stephen Hemminger
Data Center Linux (DCL) Maintainer/Coordinator



* Re: [lkcd-general] Re: What's left over.
  2002-10-31 17:55 ` [lkcd-general] " Dave Craft
@ 2002-10-31 18:45   ` Patrick Mochel
  2002-10-31 19:16     ` Stephen Hemminger
  0 siblings, 1 reply; 35+ messages in thread
From: Patrick Mochel @ 2002-10-31 18:45 UTC (permalink / raw)
  To: Dave Craft
  Cc: Linus Torvalds, Matt D. Robinson, Rusty Russell, linux-kernel,
	lkcd-general, lkcd-devel


On Thu, 31 Oct 2002, Dave Craft wrote:

> On Thu, 31 Oct 2002, Linus Torvalds wrote:
> 
> > What I'm saying by "vendor driven" is that it has no relevance for the
> > standard kernel, and since it has no relevance to that, then I have no
> > incentives to merge it. The crash dump is only useful with people who
> > actively look at the dumps, and I don't know _anybody_ outside of the
> > specialized vendors you mention who actually do that.
> 
>   Unfortunately the vast majority of the customers I deal with
>   buy a distribution and then put a kernel from kernel.org
>   on.  I believe this comes about because of either needing fixes
>   or functionality that appears in later kernels but has not made
>   it into the distributions' kernels yet.
> 
>   Even if the distribution included LKCD in their kernel,
>   I lose lots of debug ability once customers switch over to
>   kernel.org and no longer have the LKCD patch.
> 
>   Thus we are currently left with having to maintain LKCD patches for
>   many arbitrary kernel.org kernels and convince customers to apply
>   it BEFORE they start encountering problems that we'll have to look at.
>   Application of patches that aren't automatically included in kernel.org
>   rarely happens with our customer set (before problems occur),
>   no matter how much we flag the issue to them up front.


So, this is precisely where something like OSDL's Carrier Grade and Data 
Center working groups can come into play, amazingly enough. 

By now, nearly everyone has heard about the working groups and nearly
every developer that has, despises them. Even I resist association with
them. But, they can have some real value to the vendors and the OEMs in 
exactly the way you describe. 

Take for example DCL. It's a kernel tree with several base patches 
intended to make Linux better in the data center. The base is not fancy, 
and includes things like LKCD and kdb (I think). It's actively maintained 
and updated more often than Linus makes a release (by virtue of 
bitkeeper).

The intent is to later have multiple child trees that implement features
for a specific application space (e.g. databases), while maintaining the
same base set of features. People wishing to use the most recent kernel 
with those features can use the DCL tree directly. Or an OEM FAE can use 
the tree to build something for the vendor, or add extra features.

Note that it's not a distribution. We don't even make real releases, since 
we don't create tarballs or patches (it's only in BK, which actually kinda 
sucks). It's merely a means to have these features actively maintained and 
kept in synch. 

And really, that's what everyone wants. Linus doesn't want the features,
nor do other developers, regardless of the Buzzword or Coolness factors.
Some vendors and users do want them. The developers of the features and
distributors of features don't want to deal with the tedium and pain of
updating patches each and every release.

In the end, it comes down to the fact that Linus's tree is Linus's tree. 
Other people can have their trees. I'm not going to tell you go off and 
make your own if you want those features so bad, because I know what a 
pain in the ass it is, and I know having someone else do it is a lot 
easier.

DCL and CGL have their trees, for purposes probably very very similar to 
what your customers need. I encourage you to check them out and work with 
them (or talk to people in your company that are). Try and make it work, 
and everyone can be happy (relatively). And, if DCL and CGL aren't
satisfying the space that you need, please speak up to OSDL and the 
working groups. People are listening, and willing to take your suggestions 
into consideration. 

Relevant URLs:

http://osdl.org/projects/cgl/
http://osdl.org/projects/dcl/

	-pat "kissing serious butt" mochel


* Re: [lkcd-general] Re: What's left over.
  2002-10-31 15:46 Linus Torvalds
@ 2002-10-31 17:55 ` Dave Craft
  2002-10-31 18:45   ` Patrick Mochel
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Craft @ 2002-10-31 17:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matt D. Robinson, Rusty Russell, linux-kernel, lkcd-general, lkcd-devel

On Thu, 31 Oct 2002, Linus Torvalds wrote:

> What I'm saying by "vendor driven" is that it has no relevance for the
> standard kernel, and since it has no relevance to that, then I have no
> incentives to merge it. The crash dump is only useful with people who
> actively look at the dumps, and I don't know _anybody_ outside of the
> specialized vendors you mention who actually do that.

  Unfortunately the vast majority of the customers I deal with
  buy a distribution and then put a kernel from kernel.org
  on.  I believe this comes about because of either needing fixes
  or functionality that appears in later kernels but has not made
  it into the distributions' kernels yet.

  Even if the distribution included LKCD in their kernel,
  I lose lots of debug ability once customers switch over to
  kernel.org and no longer have the LKCD patch.

  Thus we are currently left having to maintain LKCD patches for
  many arbitrary kernel.org kernels and to convince customers to apply
  them BEFORE they start encountering problems that we'll have to look at.
  Our customers rarely apply patches that aren't already included in
  kernel.org kernels (at least not before problems occur), no matter
  how much we flag the issue to them up front.

  I realize that in my current role I fall into the 'vendor'
  support category you speak of, but I am actually advocating
  its inclusion on behalf of real, live customers.

  Vendors can and do actually help Linux development by screening,
  researching fixes for, and/or directly fixing lots of customer
  problems that you never have to deal with.  To do that, LKCD
  is the debug weapon of choice.

  I request you reconsider the inclusion of LKCD.

  Regards, Dave

	Mail : dave@austin.ibm.com	Phone : 512-838-8248



