linux-kernel.vger.kernel.org archive mirror
* Yet another crash dump tool
@ 2004-10-13 23:05 Itsuro Oda
  2004-10-14 11:29 ` Robin Holt
  2004-12-23 11:59 ` [Fastboot] " Eric W. Biederman
  0 siblings, 2 replies; 9+ messages in thread
From: Itsuro Oda @ 2004-10-13 23:05 UTC (permalink / raw)
  To: linux-kernel, fastboot; +Cc: oda

Hello,

We released a crash dump tool called "mini kernel dump".

Please see the following URL to get the motivation and the
overview of the mini kernel dump.
http://mkdump.sourceforge.net/

http://sourceforge.net/projects/mkdump/ 

Thank you.  
-- 
Itsuro ODA <oda@valinux.co.jp>



* Re: Yet another crash dump tool
  2004-10-13 23:05 Yet another crash dump tool Itsuro Oda
@ 2004-10-14 11:29 ` Robin Holt
  2004-10-15  0:08   ` Itsuro Oda
  2004-12-23 11:59 ` [Fastboot] " Eric W. Biederman
  1 sibling, 1 reply; 9+ messages in thread
From: Robin Holt @ 2004-10-14 11:29 UTC (permalink / raw)
  To: Itsuro Oda; +Cc: linux-kernel, fastboot

On Thu, Oct 14, 2004 at 08:05:41AM +0900, Itsuro Oda wrote:
> Hello,
> 
> We released a crash dump tool called "mini kernel dump".
> 
> Please see the following URL to get the motivation and the
> overview of the mini kernel dump.
> http://mkdump.sourceforge.net/
> 
> http://sourceforge.net/projects/mkdump/ 

I am not sure why this is such a huge improvement.  The one
concern I have is that you are blindly copying all of memory to the
dump device.  Can your dump device span multiple volumes?  If I
have a system using 1TB of physical memory, but 98% of that
is allocated as huge TLB pages for users, do I _REALLY_ need to
dump them all?

lkcd, and I would hope others, only dump kernel pages unless
configured to do otherwise.  More importantly lkcd can
eliminate page cache and buffer cache pages.  Those types of
pages are seldom relevant to figuring out what actually went
wrong.
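
The selective dumping described here can be sketched roughly as follows. This is a toy illustration, not lkcd's implementation: the `PageKind` categories, the `select_pages` helper and the six-page memory map are all invented for the example.

```python
from enum import Enum, auto


class PageKind(Enum):
    """Coarse page categories; the names are invented for this sketch."""
    KERNEL = auto()
    PAGE_CACHE = auto()
    BUFFER_CACHE = auto()
    USER = auto()   # user memory, e.g. huge TLB pages
    FREE = auto()


def select_pages(page_kinds, include=frozenset({PageKind.KERNEL})):
    """Return the page frame numbers whose kind is in `include`.

    `page_kinds[pfn]` is the kind of page frame `pfn`. The default keeps
    kernel pages only; pass a larger set to dump more.
    """
    return [pfn for pfn, kind in enumerate(page_kinds) if kind in include]


# Toy memory map of six page frames (hypothetical data).
memory = [
    PageKind.KERNEL, PageKind.USER, PageKind.PAGE_CACHE,
    PageKind.KERNEL, PageKind.BUFFER_CACHE, PageKind.USER,
]
print(select_pages(memory))                  # kernel pages only
print(select_pages(memory, set(PageKind)))   # every page
```

Making "kernel pages only" the default and widening it on request mirrors the "unless configured to do otherwise" behaviour mentioned above.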

Realistically, if the basic structures telling you whether pages
are used by the kernel or not are so messed up you can not use
them for dumping, they have probably been allocated to multiple
users and will be riddled with inconsistent information.

Robin Holt


* Re: Yet another crash dump tool
  2004-10-14 11:29 ` Robin Holt
@ 2004-10-15  0:08   ` Itsuro Oda
  2004-10-15 13:06     ` Robin Holt
  0 siblings, 1 reply; 9+ messages in thread
From: Itsuro Oda @ 2004-10-15  0:08 UTC (permalink / raw)
  To: Robin Holt; +Cc: linux-kernel, fastboot

Hi,

> dump device.  Can your dump device span multiple volumes?  If I
Yes (probably, assuming a logical volume is used).

> have a system using 1TB of physical memory, but 98% of that
> is allocated as huge TLB pages for users, do I _REALLY_ need to
> dump them all?
yes, absolutely, for us.

Our target is customers' production systems, not development or debugging
systems. There may be only one chance to capture material for fault
analysis. If the kernel corrupts memory used by user processes (page cache,
buffer cache), examining the pattern of the corruption is a great help in
the analysis. (Note that I have encountered such cases many times.)
We also analyze the user processes running at the time of the crash from the dump.

> lkcd, and I would hope others, only dump kernel pages unless
> configured to do otherwise.

You should choose whichever dump tool you like.

We believe we need the whole of memory, but we understand that some, like
you, hold the opinion that saving less memory is better. We do not force
anyone to use our tool. We are making "mini kernel dump" as independent of
the kernel as possible.

Thank you.
-- 
Itsuro ODA <oda@valinux.co.jp>



* Re: Yet another crash dump tool
  2004-10-15  0:08   ` Itsuro Oda
@ 2004-10-15 13:06     ` Robin Holt
  2004-10-18  3:33       ` Itsuro Oda
  0 siblings, 1 reply; 9+ messages in thread
From: Robin Holt @ 2004-10-15 13:06 UTC (permalink / raw)
  To: Itsuro Oda; +Cc: Robin Holt, linux-kernel, fastboot

On Fri, Oct 15, 2004 at 09:08:41AM +0900, Itsuro Oda wrote:
> Hi,
> 
> > dump device.  Can your dump device span multiple volumes?  If I
> Yes (probably, assuming a logical volume is used).

1TB of memory being written to a device at 200MB/sec will take
1 1/2 hours to dump.  That seems like a long time.  I start
getting frustrated at a few minutes of lost production.  Can
you add a feature to only dump kernel pages, kernel pages +
page/buffer cache, or all of memory?  If not, this is a step
backwards in dumping.  We have seen RFPs from some potential
customers for as much as 16PB of memory.  I am not sure that
anybody builds hardware that scales to that level, but it
certainly shows you a problem.
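
The estimate above can be checked with a short calculation. This is a sketch under two assumptions: the 200 figure is taken as megabytes per second (10^6 bytes), and 1TB and 16PB are taken as binary units (2^40 and 2^50 bytes). The helper name `dump_seconds` is invented for the example.

```python
TIB = 2**40        # assumption: "1TB" read as 2^40 bytes
PIB = 2**50        # assumption: "PB" read as 2^50 bytes
MB = 10**6         # assumption: transfer rate given in megabytes


def dump_seconds(memory_bytes, bytes_per_second):
    """Time to write all of memory at a constant sequential rate."""
    return memory_bytes / bytes_per_second


rate = 200 * MB
print(f"{dump_seconds(1 * TIB, rate) / 3600:.2f} hours")    # 1.53 hours
print(f"{dump_seconds(16 * PIB, rate) / 86400:.0f} days")   # the 16PB case
```

At the same rate the 16PB case comes out at well over a thousand days, which is why a full-memory dump does not scale to that size. If the rate were megabits rather than megabytes, every figure would be eight times larger.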

> 
> > have a system using 1TB of physical memory, but 98% of that
> > is allocated as huge TLB pages for users, do I _REALLY_ need to
> > dump them all?
> yes, absolutely, for us.
> 
> Our target is customers' production systems, not development or debugging
> systems. There may be only one chance to capture material for fault
> analysis. If the kernel corrupts memory used by user processes (page cache,
> buffer cache), examining the pattern of the corruption is a great help in
> the analysis. (Note that I have encountered such cases many times.)
> We also analyze the user processes running at the time of the crash from the dump.

I have analyzed many dumps and never even had the desire to look at
the user processes.  Additionally, some of our customers have
classified data.  They require assurances that the minimal amount
of their unclassified data is being sent outside their control to
reduce the chance that someone can infer their methods.

> 
> > lkcd, and I would hope others, only dump kernel pages unless
> > configured to do otherwise.
> 
> You should choose whichever dump tool you like.
> 
> We believe we need the whole of memory, but we understand that some, like
> you, hold the opinion that saving less memory is better. We do not force
> anyone to use our tool. We are making "mini kernel dump" as independent of
> the kernel as possible.
> 


Thanks,
Robin Holt


* Re: Yet another crash dump tool
  2004-10-15 13:06     ` Robin Holt
@ 2004-10-18  3:33       ` Itsuro Oda
  0 siblings, 0 replies; 9+ messages in thread
From: Itsuro Oda @ 2004-10-18  3:33 UTC (permalink / raw)
  To: Robin Holt; +Cc: linux-kernel, fastboot, oda

Hi,

> Can
> you add a feature to only dump kernel pages, kernel pages +
> page/buffer cache, or all of memory?  If not, this is a step

Yes. In fact, the mini kernel dump already has an interface from the
operational kernel to the mini kernel that specifies which PFNs
should be dumped.
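
One plausible shape for such an interface is a bitmap with one bit per page frame. The thread does not say how mkdump actually encodes this information, so the layout below and the names `make_pfn_bitmap` and `should_dump` are assumptions made purely for illustration.

```python
def make_pfn_bitmap(total_pages, pfns_to_dump):
    """Build a bitmap with bit `pfn` set for every page frame to dump."""
    bitmap = bytearray((total_pages + 7) // 8)
    for pfn in pfns_to_dump:
        if not 0 <= pfn < total_pages:
            raise ValueError(f"pfn {pfn} out of range")
        bitmap[pfn // 8] |= 1 << (pfn % 8)
    return bitmap


def should_dump(bitmap, pfn):
    """Test whether page frame `pfn` is marked for dumping."""
    return bool(bitmap[pfn // 8] & (1 << (pfn % 8)))


# The operational kernel would fill this in; the dumping side only reads it.
bitmap = make_pfn_bitmap(20, [0, 9, 19])
print([pfn for pfn in range(20) if should_dump(bitmap, pfn)])
```

A bitmap costs one bit per page, so with 4KB pages an 8GB machine needs 256KB for it, small enough to live in a reserved area.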

It takes 2-3 minutes to dump 8GB of memory (typical for our customers),
so support for selecting which pages to dump is a relatively low priority,
but I will keep it in mind.

> backwards in dumping.  We have seen RFPs from some potential
> customers for as much as 16PB of memory.  I am not sure that
> anybody builds hardware that scales to that level, but it
> certainly shows you a problem.

Maybe we should develop another fault isolation method for such
systems.

> classified data.  They require assurances that the minimal amount
> of their unclassified data is being sent outside their control to
> reduce the chance that someone can infer their methods.

That is an important point for providing a fault analysis service, and
we are considering it. It is an operational problem rather than a
technical one. We are planning some kind of cryptographic mechanism.

Thank you.
-- 
Itsuro ODA <oda@valinux.co.jp>



* Re: [Fastboot] Yet another crash dump tool
  2004-10-13 23:05 Yet another crash dump tool Itsuro Oda
  2004-10-14 11:29 ` Robin Holt
@ 2004-12-23 11:59 ` Eric W. Biederman
  2005-01-06  1:54   ` Itsuro Oda
  1 sibling, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2004-12-23 11:59 UTC (permalink / raw)
  To: Itsuro Oda; +Cc: linux-kernel, fastboot

Itsuro Oda <oda@valinux.co.jp> writes:

> Hello,
> 
> We released a crash dump tool called "mini kernel dump".
> 
> Please see the following URL to get the motivation and the
> overview of the mini kernel dump.
> http://mkdump.sourceforge.net/
> 
> http://sourceforge.net/projects/mkdump/

While the exact details differ this seems to be strategically
the same thing as kexec crash based dumps, which are also being
developed right now.  Would you be willing to work on the kexec system
call so we can get an infrastructure that reliably does what is needed
for everyone? 

Reading your documentation it seems to indicate that you have
successfully avoided using any memory that the crashing kernel used.
Is that correct?

And just for a little active feedback.  While you safely tuck
your kernel away in your reserved area of memory it does not appear
you tuck away the data structures necessary to get there.  Which
makes me just a little nervous.

Eric


* Re: [Fastboot] Yet another crash dump tool
  2004-12-23 11:59 ` [Fastboot] " Eric W. Biederman
@ 2005-01-06  1:54   ` Itsuro Oda
  2005-01-06  5:25     ` Eric W. Biederman
  0 siblings, 1 reply; 9+ messages in thread
From: Itsuro Oda @ 2005-01-06  1:54 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, fastboot

Hi,

On 23 Dec 2004 04:59:03 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> developed right now.  Would you be willing to work on the kexec system
> call so we can get an infrastructure that reliably does what is needed
> for everyone? 

We concentrate on fault analysis. We think the original aim of
kexec (== fastboot) differs from capturing dumps. However,
since mkdump builds on the work of the kexec project, we are happy
to give something back to the kexec project.

> Reading your documentation it seems to indicate that you have
> successfully avoided using any memory that the crashing kernel used.
> Is that correct?

No. If the code or data structures used between the crash and the
start of the mini kernel (although that path is very short) are damaged,
starting the mini kernel will fail.
What we have done (and partly still have to do) is eliminate the logical
possibility of a deadlock or hang in the code that runs between the
crash and the start of the mini kernel.

> And just for a little active feedback.  While you safely tuck
> your kernel away in your reserved area of memory it does not appear
> you tuck away the data structures necessary to get there.  Which
> makes me just a little nervous.

What do you mean by "the data structures necessary to get there"?
The information needed to run the mini kernel and to capture the dump
is stored in the reserved area when the mini kernel is loaded
(while the kernel is still running normally).

> Eric

Thanks.
-- 
Itsuro ODA <oda@valinux.co.jp>



* Re: [Fastboot] Yet another crash dump tool
  2005-01-06  1:54   ` Itsuro Oda
@ 2005-01-06  5:25     ` Eric W. Biederman
  2005-01-06 23:55       ` Itsuro Oda
  0 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2005-01-06  5:25 UTC (permalink / raw)
  To: Itsuro Oda; +Cc: fastboot, linux-kernel

Itsuro Oda <oda@valinux.co.jp> writes:

> Hi,
> 
> On 23 Dec 2004 04:59:03 -0700
> ebiederm@xmission.com (Eric W. Biederman) wrote:
> 
> > developed right now.  Would you be willing to work on the kexec system
> > call so we can get an infrastructure that reliably does what is needed
> > for everyone? 
> 
> We concentrate on fault analysis. We think the original aim of
> kexec (== fastboot) differs from capturing dumps. However,
> since mkdump builds on the work of the kexec project, we are happy
> to give something back to the kexec project.

As a whole it has always been a goal.  The functionality was first
prototyped with kexec's predecessor, mcore.  At OLS this year there
appears to have been a lot of discussion, and the consensus reached was
that a kexec type mechanism was the way to go.  From comments Andrew
Morton has made I know it is something he would like to see.

Hariprasad Nellitheertha <hari@in.ibm.com>, and Vivek Goyal
<vgoyal@in.ibm.com> have recently been working on getting  the crash
dump case working.

Personally the crash dump case is not a large motivator but at the
same time I find handling the general case is quite important.
Currently I am in the process of cleaning things up and simplifying
them so hopefully we have something interesting.

> > Reading your documentation it seems to indicate that you have
> > successfully avoided using any memory that the crashing kernel used.
> > Is that correct?
> 
> No. If the code or data structures used between the crash and the
> start of the mini kernel (although that path is very short) are damaged,
> starting the mini kernel will fail.
> What we have done (and partly still have to do) is eliminate the logical
> possibility of a deadlock or hang in the code that runs between the
> crash and the start of the mini kernel.

Let me clarify my question.

One of the problems Hariprasad and Vivek seem to have been having is
keeping the crash dump kernel from using the first 1M.  You
have avoided that problem, correct?

> > And just for a little active feedback.  While you safely tuck
> > your kernel away in your reserved area of memory it does not appear
> > you tuck away the data structures necessary to get there.  Which
> > makes me just a little nervous.
> 
> What do you mean by "the data structures necessary to get there"?
> The information needed to run the mini kernel and to capture the dump
> is stored in the reserved area when the mini kernel is loaded
> (while the kernel is still running normally).

As I recall from looking at your patch (and it was obviously your last
version), you were using kmalloc or get_free_pages for some
of the data structures that controlled the loading of the mini kernel
instead of allocating those data structures from the reserved area.

I'm not quite there as currently I have one structure still not
allocated in the reserved area.  But everything else is.

As soon as I can manage to focus I will have a new patch set out.

Eric


* Re: [Fastboot] Yet another crash dump tool
  2005-01-06  5:25     ` Eric W. Biederman
@ 2005-01-06 23:55       ` Itsuro Oda
  0 siblings, 0 replies; 9+ messages in thread
From: Itsuro Oda @ 2005-01-06 23:55 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: fastboot, linux-kernel

Hi,

On 05 Jan 2005 22:25:34 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> One of the problems Hariprasad and Vivek seem to have been having is
> keeping the crash dump kernel from using the first 1M.  You
> have avoided that problem, correct?

alloc_pages(ZONE_NORMAL) is used to get the memory area for the mini kernel
in 4MB units (i386), so the pages are never under 1M.
(For x86_64, alloc_bootmem is used to reserve the memory for the mini
 kernel. alloc_pages does not guarantee memory under 4GB!!)
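
The arithmetic behind the i386 statement can be sketched as below. It rests on assumptions that are not stated in the thread: 4KB base pages, a largest allocation order of 10 (which gives the 4MB unit), ZONE_NORMAL beginning at 16MB, and every block actually being satisfied from ZONE_NORMAL. The constants and helper names are made up for the example and are not kernel symbols.

```python
PAGE_SIZE = 4096                # assumption: i386 base page size
ALLOC_ORDER = 10                # assumption: order giving the "4MB unit"
ZONE_NORMAL_START = 16 << 20    # assumption: lower memory belongs to ZONE_DMA
ONE_MB = 1 << 20


def block_bytes(order):
    """Size of a block of 2**order contiguous pages."""
    return PAGE_SIZE << order


def touches_first_megabyte(start_addr):
    """True if a block starting at `start_addr` includes memory below 1M."""
    return start_addr < ONE_MB


print(block_bytes(ALLOC_ORDER) // ONE_MB, "MB per block")
# The lowest block that can come from ZONE_NORMAL under these assumptions:
print(touches_first_megabyte(ZONE_NORMAL_START))
```

Under these assumptions the lowest possible block starts at 16MB, so no block can reach below 1M. The check says nothing about allocations that fall back to a lower zone.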

> As I recall from looking at your patch (and it was obviously your last
> version), you were using kmalloc or get_free_pages for some
> of the data structures that controlled the loading of the mini kernel
> instead of allocating those data structures from the reserved area.

Yes, kmalloc is used to get the kimage struct. Indeed, it would be safer to
put such structures in the reserved area (and write-protect them).
Thank you for pointing that out.

> Eric

Thanks.
-- 
Itsuro ODA <oda@valinux.co.jp>


