From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932968Ab0LTRYk (ORCPT ); Mon, 20 Dec 2010 12:24:40 -0500
Received: from smtp1.linux-foundation.org ([140.211.169.13]:59861 "EHLO
	smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932421Ab0LTRYh (ORCPT ); Mon, 20 Dec 2010 12:24:37 -0500
MIME-Version: 1.0
In-Reply-To: <20101220072632.GA28020@liondog.tnic>
References: <4D0BEE1F.7020008@zytor.com> <20101219091752.GA16150@liondog.tnic>
	<20101220072632.GA28020@liondog.tnic>
From: Linus Torvalds
Date: Mon, 20 Dec 2010 09:18:25 -0800
Message-ID:
Subject: Re: [concept & "good taste" review] persistent store
To: Borislav Petkov , Tony Luck , Tony Luck , Linus Torvalds ,
	"H. Peter Anvin" , linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, tglx@linutronix.de, mingo@elte.hu,
	greg@kroah.com, akpm@linux-foundation.org, ying.huang@intel.com,
	David Miller , Alan Cox , Jim Keniston , Kyungmin Park ,
	Geert Uytterhoeven
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Dec 19, 2010 at 11:26 PM, Borislav Petkov wrote:
>
> IOW, the simple (maybe too simple) algo of the pstore could be something
> like:

Simple?

> 1. Got a relevant message from kernel, log it.
>
> 2. Am I still alive?

Umm. The "am I still alive" question is traditionally called "the
stopping problem", and is considered to be the traditional example of
_least_ simple problem there is. As in "fundamentally unsolvable".

Did we kill X? Did we happen to hold some critical lock when oopsing?
Was it syslogd itself that died and caused nothing further to be
saved, even if the machine otherwise seems to be fine? Or did the
filesystem go into read-only mode due to the problem and the rest of
the system is fine, but the disk is never going to see the messages?

In other words, the problem really is that "am I still alive" thing.
That's a seriously impossible question to answer.

What _can_ be answered is "did somebody write out the oops, then
fsync, and then notify us about it?" But without explicit notification
of "yeah, it really is saved off somewhere else", we really can't
tell.

We could do heuristics, of course, and they might even work in
practice (like "flush after half an hour if there has been actual work
done and the machine is clearly making progress").

                      Linus
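
[Editorial sketch, not part of the original mail.] The "write out the
oops, then fsync, and then notify" ordering described above can be
illustrated roughly as follows. This is a minimal userspace sketch
under stated assumptions, not pstore's actual interface: the names
save_then_acknowledge, acknowledge and may_flush_by_heuristic are
hypothetical, and the 1800-second default merely mirrors the "half an
hour" mentioned in the mail.

```python
import os


def save_then_acknowledge(record: bytes, dest_path: str, acknowledge) -> bool:
    """Write record to dest_path, fsync it, and only then call acknowledge().

    `acknowledge` is a hypothetical callback standing in for "tell the
    persistent store this record really is saved somewhere else".
    Returns False, without acknowledging, if any step fails (for
    example because the filesystem went read-only).
    """
    try:
        fd = os.open(dest_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            view = memoryview(record)
            while len(view) > 0:
                # os.write may write fewer bytes than requested; loop until done.
                written = os.write(fd, view)
                view = view[written:]
            # Ask the OS to push the file's data to stable storage.
            os.fsync(fd)
        finally:
            os.close(fd)
    except OSError:
        return False
    # Only reached when write and fsync both succeeded.
    acknowledge()
    return True


def may_flush_by_heuristic(seconds_since_oops: float, work_done: bool,
                           min_age: float = 1800.0) -> bool:
    """Hypothetical fallback: allow a flush with no explicit notification
    only if the machine has done real work and enough time has passed."""
    return work_done and seconds_since_oops >= min_age
```

The point of the ordering is that the acknowledgement is the only
signal the store acts on: a failed open, write, or fsync leaves the
record in place. A fuller version would probably also fsync the
containing directory so that the new file's directory entry is
durable; that step is omitted here for brevity.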