From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760262AbZFTDWx (ORCPT ); Fri, 19 Jun 2009 23:22:53 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754194AbZFTDUj (ORCPT ); Fri, 19 Jun 2009 23:20:39 -0400 Received: from mga03.intel.com ([143.182.124.21]:50791 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753473AbZFTDUF (ORCPT ); Fri, 19 Jun 2009 23:20:05 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.42,257,1243839600"; d="scan'208";a="156644150" Message-Id: <20090620031625.120958248@intel.com> References: <20090620031608.624240019@intel.com> User-Agent: quilt/0.46-1 Date: Sat, 20 Jun 2009 11:16:12 +0800 From: Wu Fengguang To: Andrew Morton Cc: LKML , Andi Kleen cc: Ingo Molnar Cc: Minchan Kim cc: Mel Gorman cc: "Wu, Fengguang" , Thomas Gleixner , "H. Peter Anvin" , Peter Zijlstra , Nick Piggin , Hugh Dickins , Andi Kleen , "riel@redhat.com" , "chris.mason@oracle.com" , "linux-mm@kvack.org" Subject: [PATCH 04/15] HWPOISON: Add new SIGBUS error codes for hardware poison signals Content-Disposition: inline; filename=poison-signal Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Andi Kleen Add new SIGBUS codes for reporting machine checks as signals. When the hardware detects an uncorrected ECC error it can trigger these signals. This is needed for telling KVM's qemu about machine checks that happen to guests, so that it can inject them, but might be also useful for other programs. I find it useful in my test programs. This patch merely defines the new types. - Define two new si_codes for SIGBUS. BUS_MCEERR_AO and BUS_MCEERR_AR * BUS_MCEERR_AO is for "Action Optional" machine checks, which means that some corruption has been detected in the background, but nothing has been consumed so far. The program can ignore those if it wants (but most programs would already get killed) * BUS_MCEERR_AR is for "Action Required" machine checks. This happens when corrupted data is consumed or the application ran into an area which has been known to be corrupted earlier. These require immediate action and cannot just returned to. Most programs would kill themselves. - They report the address of the corruption in the user address space in si_addr. - Define a new si_addr_lsb field that reports the extent of the corruption to user space. That's currently always a (small) page. The user application cannot tell where in this page the corruption happened. AK: I plan to write a man page update before anyone asks. Signed-off-by: Andi Kleen --- include/asm-generic/siginfo.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) --- sound-2.6.orig/include/asm-generic/siginfo.h +++ sound-2.6/include/asm-generic/siginfo.h @@ -82,6 +82,7 @@ typedef struct siginfo { #ifdef __ARCH_SI_TRAPNO int _trapno; /* TRAP # which caused the signal */ #endif + short _addr_lsb; /* LSB of the reported address */ } _sigfault; /* SIGPOLL */ @@ -112,6 +113,7 @@ typedef struct siginfo { #ifdef __ARCH_SI_TRAPNO #define si_trapno _sifields._sigfault._trapno #endif +#define si_addr_lsb _sifields._sigfault._addr_lsb #define si_band _sifields._sigpoll._band #define si_fd _sifields._sigpoll._fd @@ -192,7 +194,11 @@ typedef struct siginfo { #define BUS_ADRALN (__SI_FAULT|1) /* invalid address alignment */ #define BUS_ADRERR (__SI_FAULT|2) /* non-existant physical address */ #define BUS_OBJERR (__SI_FAULT|3) /* object specific hardware error */ -#define NSIGBUS 3 +/* hardware memory error consumed on a machine check: action required */ +#define BUS_MCEERR_AR (__SI_FAULT|4) +/* hardware memory error detected in process but not consumed: action optional*/ +#define BUS_MCEERR_AO (__SI_FAULT|5) +#define NSIGBUS 5 /* * SIGTRAP si_codes -- From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wu Fengguang Subject: [PATCH 04/15] HWPOISON: Add new SIGBUS error codes for hardware poison signals Date: Sat, 20 Jun 2009 11:16:12 +0800 Message-ID: <20090620031625.120958248@intel.com> References: <20090620031608.624240019@intel.com> Return-path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with SMTP id 09B126B0055 for ; Fri, 19 Jun 2009 23:19:30 -0400 (EDT) Content-Disposition: inline; filename=poison-signal Sender: owner-linux-mm@kvack.org To: Andrew Morton Cc: LKML , Andi Kleen , Ingo Molnar , Minchan Kim , Mel Gorman , "Wu, Fengguang" , Thomas Gleixner , "H. Peter Anvin" , Peter Zijlstra , Nick Piggin , Hugh Dickins , Andi Kleen , "riel@redhat.com" , "chris.mason@oracle.com" , "linux-mm@kvack.org" List-Id: linux-mm.kvack.org From: Andi Kleen Add new SIGBUS codes for reporting machine checks as signals. When the hardware detects an uncorrected ECC error it can trigger these signals. This is needed for telling KVM's qemu about machine checks that happen to guests, so that it can inject them, but might be also useful for other programs. I find it useful in my test programs. This patch merely defines the new types. - Define two new si_codes for SIGBUS. BUS_MCEERR_AO and BUS_MCEERR_AR * BUS_MCEERR_AO is for "Action Optional" machine checks, which means that some corruption has been detected in the background, but nothing has been consumed so far. The program can ignore those if it wants (but most programs would already get killed) * BUS_MCEERR_AR is for "Action Required" machine checks. This happens when corrupted data is consumed or the application ran into an area which has been known to be corrupted earlier. These require immediate action and cannot just returned to. Most programs would kill themselves. - They report the address of the corruption in the user address space in si_addr. - Define a new si_addr_lsb field that reports the extent of the corruption to user space. That's currently always a (small) page. The user application cannot tell where in this page the corruption happened. AK: I plan to write a man page update before anyone asks. Signed-off-by: Andi Kleen --- include/asm-generic/siginfo.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) --- sound-2.6.orig/include/asm-generic/siginfo.h +++ sound-2.6/include/asm-generic/siginfo.h @@ -82,6 +82,7 @@ typedef struct siginfo { #ifdef __ARCH_SI_TRAPNO int _trapno; /* TRAP # which caused the signal */ #endif + short _addr_lsb; /* LSB of the reported address */ } _sigfault; /* SIGPOLL */ @@ -112,6 +113,7 @@ typedef struct siginfo { #ifdef __ARCH_SI_TRAPNO #define si_trapno _sifields._sigfault._trapno #endif +#define si_addr_lsb _sifields._sigfault._addr_lsb #define si_band _sifields._sigpoll._band #define si_fd _sifields._sigpoll._fd @@ -192,7 +194,11 @@ typedef struct siginfo { #define BUS_ADRALN (__SI_FAULT|1) /* invalid address alignment */ #define BUS_ADRERR (__SI_FAULT|2) /* non-existant physical address */ #define BUS_OBJERR (__SI_FAULT|3) /* object specific hardware error */ -#define NSIGBUS 3 +/* hardware memory error consumed on a machine check: action required */ +#define BUS_MCEERR_AR (__SI_FAULT|4) +/* hardware memory error detected in process but not consumed: action optional*/ +#define BUS_MCEERR_AO (__SI_FAULT|5) +#define NSIGBUS 5 /* * SIGTRAP si_codes -- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org