From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S261623AbVFOWSa (ORCPT ); Wed, 15 Jun 2005 18:18:30 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S261622AbVFOWSM (ORCPT ); Wed, 15 Jun 2005 18:18:12 -0400
Received: from omx1-ext.sgi.com ([192.48.179.11]:62675 "EHLO omx1.americas.sgi.com")
    by vger.kernel.org with ESMTP id S261628AbVFOWJX (ORCPT );
    Wed, 15 Jun 2005 18:09:23 -0400
From: Russ Anderson
Message-Id: <200506152209.j5FM9HgD1464876@clink.americas.sgi.com>
Subject: Re: [RFC] Linux memory error handling
To: macro@linux-mips.org (Maciej W. Rozycki)
Date: Wed, 15 Jun 2005 17:09:17 -0500 (CDT)
Cc: rja@sgi.com (Russ Anderson), linux-kernel@vger.kernel.org
In-Reply-To: from "Maciej W. Rozycki" at Jun 15, 2005 04:26:13 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Maciej W. Rozycki wrote:
> On Wed, 15 Jun 2005, Russ Anderson wrote:
>
> > Handling memory errors:
> >
> >     Polling Threshold: A solid single bit error can cause a burst
> >     of correctable errors that can cause a significant logging
> >     overhead. SBE thresholding counts the number of SBEs for
> >     a given page and if too many SBEs are detected in a given
> >     period of time, the interrupt is disabled and instead
> >     linux periodically polls for corrected errors.
>
> This is highly undesirable if the same interrupt is used for MBEs. A
> page that causes an excessive number of SBEs should rather be removed from
> the available pool instead.

As a practical point I think you are right that if there are enough
SBEs to cause a performance hit, migrating the data to a different
physical page would be a prudent thing to do. But that functionality
hasn't been implemented yet.

That may not always be the right setting for all customers.
One possible way to deal with that would be to have different
threshold settings for logging and page migration. That would
provide flexibility.

> Logging should probably take recent events
> into account anyway and take care of not overloading the system, e.g. by
> keeping only statistical data instead of detailed information about each
> event under load.

That's what the SBE thresholding does. It avoids overloading
the system by switching from interrupt mode to periodic polling
mode, where detailed information can get dropped.

> > Memory handling parameters:
> >
> >     Since memory failure modes are due to specific DIMM
> >     failure characteristics, there will be no way to
> >     reach agreement on one set of thresholds that will
> >     be appropriate for all configurations. Therefore there
> >     needs to be a way to modify the thresholds. One alternative
> >     is a /proc/sys/kernel/ interface to control settings, such
> >     as polling thresholds. That provides an easy standard
> >     way of modifying thresholds to match the characteristics
> >     of the specific DIMM type.
>
> Note that scrubbing may also be required depending on hardware
> capabilities as data could have been corrected on the fly for the purpose
> of providing a correct value for the bus transaction, but memory may still
> hold corrupted data.

Good point.

> And of course not all memory is DIMM!

Another good point.

> > Uncorrected error handling:
> >
> >     Kill the application: One recovery technique to avoid a kernel
> >     panic when an application process hits an uncorrectable
> >     memory error is to SIGKILL the application. The page is
> >     marked PG_reserved to avoid re-use. A (new) PG_hard_error
> >     flag would be useful to indicate that the physical page has
> >     a hard memory error.
>
> Note we have some infrastructure for that in the MIPS port -- we kill the
> triggering process, but we don't mark the problematic memory page as
> unusable (which is an area for improvement).

MIPS has some nice features when it comes to error recovery.

Thanks,
-- 
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc          rja@sgi.com
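
For illustration only, the idea of separate thresholds for logging and page migration could be sketched roughly as below. This is a user-space sketch, not code from any kernel tree. Every name in it (sbe_config, sbe_page_state, sbe_record, the SBE_* actions) is hypothetical, and it assumes the polling threshold is never set above the migration threshold and that timestamps come from a monotonic clock in seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tunables. The thread proposes a /proc/sys/kernel/
 * interface for thresholds but names no fields; these are assumptions. */
struct sbe_config {
    unsigned int window_secs;       /* length of the counting window */
    unsigned int poll_threshold;    /* SBEs per window before polling */
    unsigned int migrate_threshold; /* SBEs per window before migration */
};

enum sbe_action {
    SBE_LOG,     /* stay in interrupt mode, log the event in detail */
    SBE_POLL,    /* disable the interrupt and poll periodically */
    SBE_MIGRATE  /* additionally request moving data off the page */
};

/* Per-page bookkeeping; zero-initialise before first use. */
struct sbe_page_state {
    uint64_t window_start; /* start of the current window, in seconds */
    unsigned int count;    /* SBEs seen in the current window */
};

/* Record one single bit error on a page at time `now` (seconds,
 * monotonic) and report which action the thresholds call for. */
enum sbe_action sbe_record(struct sbe_page_state *st,
                           const struct sbe_config *cfg, uint64_t now)
{
    /* Assumptions of this sketch, not requirements from the thread. */
    assert(cfg->window_secs > 0);
    assert(cfg->poll_threshold <= cfg->migrate_threshold);

    /* Start a fresh window once the old one has expired. */
    if (now - st->window_start >= cfg->window_secs) {
        st->window_start = now;
        st->count = 0;
    }
    st->count++;

    if (st->count >= cfg->migrate_threshold)
        return SBE_MIGRATE;
    if (st->count >= cfg->poll_threshold)
        return SBE_POLL;
    return SBE_LOG;
}
```

A caller would keep one sbe_page_state per physical page and call sbe_record() from the corrected-error handler. A site that never wants migration could set migrate_threshold very high, which is the kind of flexibility the two settings are meant to give.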