Date: Mon, 20 Dec 2010 19:58:12 +0100
From: Borislav Petkov
To: Linus Torvalds
Cc: Tony Luck, "H. Peter Anvin", linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, tglx@linutronix.de, mingo@elte.hu,
	greg@kroah.com, akpm@linux-foundation.org, ying.huang@intel.com,
	David Miller, Alan Cox, Jim Keniston, Kyungmin Park,
	Geert Uytterhoeven
Subject: Re: [concept & "good taste" review] persistent store
Message-ID: <20101220185812.GA11285@liondog.tnic>
References: <4D0BEE1F.7020008@zytor.com> <20101219091752.GA16150@liondog.tnic> <20101220072632.GA28020@liondog.tnic>

On Mon, Dec 20, 2010 at 09:18:25AM -0800, Linus Torvalds wrote:
> On Sun, Dec 19, 2010 at 11:26 PM, Borislav Petkov wrote:
> >
> > IOW, the simple (maybe too simple) algo of the pstore could be something
> > like:
>
> Simple?
>
> > 1. Got a relevant message from kernel, log it.
> >
> > 2. Am I still alive?
>
> Umm.
> The "am I still alive" question is traditionally called "the
> stopping problem", and is considered to be the traditional example of
> _least_ simple problem there is. As in "fundamentally unsolvable".

Yeah, I meant simple in the sense of only two steps required.

> Did we kill X? Did we happen to hold some critical lock when oopsing?
> Was it syslogd itself that died and caused nothing further to be
> saved, even if the machine otherwise seems to be fine? Or did the
> filesystem go into read-only mode due to the problem and the rest of
> the system is fine, but the disk is never going to see the messages?
>
> In other words, the problem really is that "am I still alive" thing.
> That's a seriously impossible question to answer.

Maybe we should rephrase this as "am I still alive and well," for a
specific definition of well.

> What _can_ be answered is "did somebody write out the oops, then
> fsync, and then notify us about it?" But without explicit notification
> of "yeah, it really is saved off somewhere else", we really can't
> tell.

That could work, I should look deeper into that.

> We could do heuristics, of course, and they might even work in
> practice (like "flush after half an hour if there has been actual work
> done and the machine is clearly making progress").

Yes, this was exactly what I was trying to say! Do something in a
watchdog handler path that shows that we actually made progress. But
you're right, we'd still need the notification. My look at "did we make
progress" was too simple and there _are_ nuances which need to be
accounted for.

Thanks.

-- 
Regards/Gruss,
Boris.
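
For illustration only, the two mechanisms discussed above (explicit
"it really is saved off somewhere else" notification, and the
"flush after half an hour if progress was made" heuristic) can be
modelled in userspace roughly as follows. This is a toy Python model
under stated assumptions, not kernel code and not the pstore
interface; every name in it (PersistentStoreModel, ack_saved,
note_progress, expire_by_heuristic) is hypothetical.

```python
import time


class PersistentStoreModel:
    """Toy model: a record stays in the store until something says it is safe to drop."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable clock (assumption, eases testing)
        self._records = {}           # record id -> (timestamp, message)
        self._next_id = 0
        self._last_progress = None   # time of the last observed sign of progress

    def log(self, message):
        """Step 1: got a relevant message, log it. Returns the record id."""
        rec_id = self._next_id
        self._next_id += 1
        self._records[rec_id] = (self._clock(), message)
        return rec_id

    def ack_saved(self, rec_id):
        """Explicit notification: somebody wrote the record out, fsynced, and told us.

        Returns True if the record was pending and has now been dropped.
        """
        return self._records.pop(rec_id, None) is not None

    def note_progress(self):
        """Called from some path (e.g. a watchdog handler) that shows progress was made."""
        self._last_progress = self._clock()

    def expire_by_heuristic(self, min_age=1800.0):
        """Heuristic fallback: drop records at least min_age seconds old,
        but only if progress was observed after they were logged."""
        now = self._clock()
        dropped = []
        for rec_id, (ts, _msg) in list(self._records.items()):
            progressed = self._last_progress is not None and self._last_progress > ts
            if progressed and now - ts >= min_age:
                del self._records[rec_id]
                dropped.append(rec_id)
        return dropped

    def pending(self):
        """Messages still held, oldest first."""
        return [msg for _ts, msg in self._records.values()]
```

The model never tries to answer "am I still alive" itself: a record
goes away only on an explicit acknowledgement, or when it is old
enough and something has reported progress since it was logged.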