From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755603Ab0LQSTT (ORCPT );
	Fri, 17 Dec 2010 13:19:19 -0500
Received: from bedivere.hansenpartnership.com ([66.63.167.143]:43306
	"EHLO bedivere.hansenpartnership.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1754883Ab0LQSTS (ORCPT );
	Fri, 17 Dec 2010 13:19:18 -0500
Subject: Re: [concept & "good taste" review] persistent store
From: James Bottomley
To: Tony Luck
Cc: Linus Torvalds , linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, tglx@linutronix.de, mingo@elte.hu,
	greg@kroah.com, akpm@linux-foundation.org, ying.huang@intel.com,
	Borislav Petkov , David Miller , Alan Cox , Jim Keniston ,
	Kyungmin Park , Geert Uytterhoeven , "H. Peter Anvin"
In-Reply-To:
References: <4d0662e511688484b3@agluck-desktop.sc.intel.com>
Content-Type: text/plain; charset="UTF-8"
Date: Fri, 17 Dec 2010 13:19:12 -0500
Message-ID: <1292609952.2820.43.camel@mulgrave.site>
Mime-Version: 1.0
X-Mailer: Evolution 2.30.1.2
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2010-12-17 at 10:09 -0800, Tony Luck wrote:
> On Thu, Dec 16, 2010 at 10:28 PM, Tony Luck wrote:
> >> The _only_ valid reason for persistent storage is for things like
> >> oopses that kill the machine.
> >
> > Maybe I misunderstood what "KMSG_DUMP_OOPS" meant ... it
> > looked to me like this code is used for non-fatal OOPsen - ones
> > that will be logged to /var/log/messages.
>
> Thinking about this a bit more I see my experiments with
> this were hopelessly naive. There is no way to know at
> "oops" time whether the problem is going to turn out to
> be minor or fatal. So the right thing to do here is assume
> the worst and squirrel the data away safely just in case
> death is imminent.

To be honest, this is what I'd recommend even if you could tell the difference.
A lot of the oopses I see were triggered by something non-fatal
(usually a WARN_ON()) earlier in the sequence. Without seeing the
preceding WARN_ON() data, the oops is usually terrifically hard to
diagnose (often just a NULL or junk pointer deref).

James
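
To make the point about preceding context concrete, here is a rough
user-space sketch in plain C. It is not kernel code and does not use the
kmsg_dump interface; every name in it (log_ring, log_event, LOG_SLOTS and
so on) is made up for illustration. It keeps the most recent messages in a
small ring and, on every oops, copies the whole retained window to a
caller-supplied buffer standing in for the persistent store, without
trying to guess whether the oops will turn out to be fatal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_SLOTS 8    /* hypothetical: how many recent messages are kept */
#define LOG_LINE  128  /* hypothetical: maximum length of one message */

enum log_level { LOG_INFO, LOG_WARN, LOG_OOPS };

struct log_ring {
    char lines[LOG_SLOTS][LOG_LINE];
    unsigned long count;  /* total number of messages ever appended */
};

/* Append one message, overwriting the oldest entry once the ring is full. */
void log_append(struct log_ring *r, const char *msg)
{
    assert(r != NULL && msg != NULL);
    snprintf(r->lines[r->count % LOG_SLOTS], LOG_LINE, "%s", msg);
    r->count++;
}

/*
 * Copy the retained messages, oldest first, into 'out' (one per line).
 * Returns the number of complete lines written; stops early if 'out'
 * runs out of room rather than storing a truncated line.
 */
int log_snapshot(const struct log_ring *r, char *out, size_t outlen)
{
    unsigned long kept = r->count < LOG_SLOTS ? r->count : LOG_SLOTS;
    size_t used = 0;
    int lines = 0;

    if (outlen == 0)
        return 0;
    out[0] = '\0';

    for (unsigned long i = r->count - kept; i < r->count; i++) {
        int w = snprintf(out + used, outlen - used, "%s\n",
                         r->lines[i % LOG_SLOTS]);
        if (w < 0 || (size_t)w >= outlen - used) {
            out[used] = '\0';  /* drop the partial line */
            break;
        }
        used += (size_t)w;
        lines++;
    }
    return lines;
}

/*
 * Record a message. On an oops, snapshot the ring straight away: the
 * severity is not examined, so the earlier warnings are saved as well.
 * Returns the number of lines stored, or 0 if nothing was stored.
 */
int log_event(struct log_ring *r, enum log_level lvl, const char *msg,
              char *store, size_t storelen)
{
    log_append(r, msg);
    if (lvl == LOG_OOPS)
        return log_snapshot(r, store, storelen);
    return 0;
}
```

With a zero-initialised ring, logging a warning, an unrelated message and
then an oops leaves all three lines in the store, so the warning that
preceded the oops is still available when the oops itself is examined.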