* [PATCH v3 0/2]RAS: add the support for handling UCNA/DEFERRED error @ 2014-11-08 1:40 Chen Yucong 2014-11-08 1:40 ` [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle " Chen Yucong ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Chen Yucong @ 2014-11-08 1:40 UTC (permalink / raw) To: bp, tony.luck; +Cc: ak, aravind.gopalakrishnan, linux-edac, linux-kernel

Hi all,

At Boris's suggestion, the first patch extends the mce_severity mechanism to handle UCNA/DEFERRED errors.
Link: https://lkml.org/lkml/2014/10/23/190

v2: On the advice of Tony and Boris, the first patch also eliminates a big hack that was needed to make mce_severity() work when called from non-exception context.
Link: https://lkml.org/lkml/2014/10/27/1017

Building on the first patch, the second patch adds support for identifying and handling UCNA/DEFERRED errors in machine_check_poll.

v3: As Boris requested, the second patch also splits memory_error() out of mem_deferred_error so that memory_error() can be used separately in other code paths.
Link: https://lkml.org/lkml/2014/11/6/452

Boris also reported a warning about "MCI_STATUS_DEFERRED" and "MCI_STATUS_POISON" being redefined.

thx!
cyc

[PATCH v3 1/2] x86, mce, severity: extend the the mce_severity
[PATCH v3 2/2] x86, mce: support memory error recovery for both UCNA

^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle UCNA/DEFERRED error 2014-11-08 1:40 [PATCH v3 0/2]RAS: add the support for handling UCNA/DEFERRED error Chen Yucong @ 2014-11-08 1:40 ` Chen Yucong 2014-11-10 22:06 ` Aravind Gopalakrishnan 2014-11-08 1:40 ` [PATCH v3 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll Chen Yucong 2014-11-10 16:42 ` [PATCH v3 0/2]RAS: add the support for handling UCNA/DEFERRED error Borislav Petkov 2 siblings, 1 reply; 17+ messages in thread From: Chen Yucong @ 2014-11-08 1:40 UTC (permalink / raw) To: bp, tony.luck Cc: ak, aravind.gopalakrishnan, linux-edac, linux-kernel, Chen Yucong Until now, the mce_severity mechanism can only identify the severity of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter out DEFERRED error for ADM platform. This patch aims to extend the mce_severity mechanism for handling UCNA/DEFERRED error. In order to do this, the patch introduces a new severity level - MCE_UCNA/DEFERRED_SEVERITY. In addition, mce_severity is specific to machine check exception, and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity mechanism in non-exception context, the patch also introduces a new argument (is_excp) for mce_severity. `is_excp' is used to explicitly specify the calling context of mce_severity. 
Signed-off-by: Chen Yucong <slaoub@gmail.com> --- arch/x86/include/asm/mce.h | 4 ++++ arch/x86/kernel/cpu/mcheck/mce-internal.h | 4 +++- arch/x86/kernel/cpu/mcheck/mce-severity.c | 21 ++++++++++++++++----- arch/x86/kernel/cpu/mcheck/mce.c | 14 ++++++++------ drivers/edac/mce_amd.h | 3 --- 5 files changed, 31 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 276392f..51b26e89 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -34,6 +34,10 @@ #define MCI_STATUS_S (1ULL<<56) /* Signaled machine check */ #define MCI_STATUS_AR (1ULL<<55) /* Action required */ +/* AMD-specific bits */ +#define MCI_STATUS_DEFERRED (1ULL<<44) /* declare an uncorrected error */ +#define MCI_STATUS_POISON (1ULL<<43) /* access poisonous data */ + /* * Note that the full MCACOD field of IA32_MCi_STATUS MSR is * bits 15:0. But bit 12 is the 'F' bit, defined for corrected diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h index 09edd0b..10b4690 100644 --- a/arch/x86/kernel/cpu/mcheck/mce-internal.h +++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h @@ -3,6 +3,8 @@ enum severity_level { MCE_NO_SEVERITY, + MCE_DEFERRED_SEVERITY, + MCE_UCNA_SEVERITY = MCE_DEFERRED_SEVERITY, MCE_KEEP_SEVERITY, MCE_SOME_SEVERITY, MCE_AO_SEVERITY, @@ -21,7 +23,7 @@ struct mce_bank { char attrname[ATTR_LEN]; /* attribute name */ }; -int mce_severity(struct mce *a, int tolerant, char **msg); +int mce_severity(struct mce *a, int tolerant, char **msg, bool is_excp); struct dentry *mce_get_debugfs_dir(void); extern struct mce_bank *mce_banks; diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c index c370e1c..c61feb3 100644 --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c @@ -31,6 +31,7 @@ enum context { IN_KERNEL = 1, IN_USER = 2 }; enum ser { SER_REQUIRED = 1, NO_SER = 2 }; +enum exception { EXCP_CONTEXT = 
1, NO_EXCP = 2 }; static struct severity { u64 mask; @@ -40,6 +41,7 @@ static struct severity { unsigned char mcgres; unsigned char ser; unsigned char context; + unsigned char excp; unsigned char covered; char *msg; } severities[] = { @@ -48,6 +50,8 @@ static struct severity { #define USER .context = IN_USER #define SER .ser = SER_REQUIRED #define NOSER .ser = NO_SER +#define EXCP .excp = EXCP_CONTEXT +#define NOEXCP .excp = NO_EXCP #define BITCLR(x) .mask = x, .result = 0 #define BITSET(x) .mask = x, .result = x #define MCGMASK(x, y) .mcgmask = x, .mcgres = y @@ -71,16 +75,20 @@ static struct severity { /* When MCIP is not set something is very confused */ MCESEV( PANIC, "MCIP not set in MCA handler", - MCGMASK(MCG_STATUS_MCIP, 0) + EXCP, MCGMASK(MCG_STATUS_MCIP, 0) ), /* Neither return not error IP -- no chance to recover -> PANIC */ MCESEV( PANIC, "Neither restart nor error IP", - MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0) + EXCP, MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0) ), MCESEV( PANIC, "In kernel and no restart IP", - KERNEL, MCGMASK(MCG_STATUS_RIPV, 0) + EXCP, KERNEL, MCGMASK(MCG_STATUS_RIPV, 0) + ), + MCESEV( + DEFERRED, "Deferred error", + NOSER, MASK(MCI_STATUS_UC|MCI_STATUS_DEFERRED|MCI_STATUS_POISON, MCI_STATUS_DEFERRED) ), MCESEV( KEEP, "Corrected error", @@ -89,7 +97,7 @@ static struct severity { /* ignore OVER for UCNA */ MCESEV( - KEEP, "Uncorrected no action required", + UCNA, "Uncorrected no action required", SER, MASK(MCI_UC_SAR, MCI_STATUS_UC) ), MCESEV( @@ -178,8 +186,9 @@ static int error_context(struct mce *m) return ((m->cs & 3) == 3) ? IN_USER : IN_KERNEL; } -int mce_severity(struct mce *m, int tolerant, char **msg) +int mce_severity(struct mce *m, int tolerant, char **msg, bool is_excp) { + enum exception excp = (is_excp ? 
EXCP_CONTEXT : NO_EXCP); enum context ctx = error_context(m); struct severity *s; @@ -194,6 +203,8 @@ int mce_severity(struct mce *m, int tolerant, char **msg) continue; if (s->context && ctx != s->context) continue; + if (s->excp && excp != s->excp) + continue; if (msg) *msg = s->msg; s->covered = 1; diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 61a9668ce..453e9bf 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -668,7 +668,8 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp, if (quirk_no_way_out) quirk_no_way_out(i, m, regs); } - if (mce_severity(m, mca_cfg.tolerant, msg) >= MCE_PANIC_SEVERITY) + if (mce_severity(m, mca_cfg.tolerant, msg, true) >= + MCE_PANIC_SEVERITY) ret = 1; } return ret; @@ -754,7 +755,7 @@ static void mce_reign(void) for_each_possible_cpu(cpu) { int severity = mce_severity(&per_cpu(mces_seen, cpu), mca_cfg.tolerant, - &nmsg); + &nmsg, true); if (severity > global_worst) { msg = nmsg; global_worst = severity; @@ -1095,13 +1096,14 @@ void do_machine_check(struct pt_regs *regs, long error_code) */ add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE); - severity = mce_severity(&m, cfg->tolerant, NULL); + severity = mce_severity(&m, cfg->tolerant, NULL, true); /* - * When machine check was for corrected handler don't touch, - * unless we're panicing. + * When machine check was for corrected/deferred handler don't + * touch, unless we're panicing. */ - if (severity == MCE_KEEP_SEVERITY && !no_way_out) + if ((severity == MCE_KEEP_SEVERITY || + severity == MCE_UCNA_SEVERITY) && !no_way_out) continue; __set_bit(i, toclear); if (severity == MCE_NO_SEVERITY) { diff --git a/drivers/edac/mce_amd.h b/drivers/edac/mce_amd.h index 51b7e3a..c2359a1 100644 --- a/drivers/edac/mce_amd.h +++ b/drivers/edac/mce_amd.h @@ -32,9 +32,6 @@ #define R4(x) (((x) >> 4) & 0xf) #define R4_MSG(x) ((R4(x) < 9) ? 
rrrr_msgs[R4(x)] : "Wrong R4!") -#define MCI_STATUS_DEFERRED BIT_64(44) -#define MCI_STATUS_POISON BIT_64(43) - extern const char * const pp_msgs[]; enum tt_ids { -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle UCNA/DEFERRED error 2014-11-08 1:40 ` [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle " Chen Yucong @ 2014-11-10 22:06 ` Aravind Gopalakrishnan 2014-11-10 22:17 ` Borislav Petkov 0 siblings, 1 reply; 17+ messages in thread From: Aravind Gopalakrishnan @ 2014-11-10 22:06 UTC (permalink / raw) To: Chen Yucong, bp, tony.luck; +Cc: ak, linux-edac, linux-kernel On 11/7/2014 7:40 PM, Chen Yucong wrote: > Until now, the mce_severity mechanism can only identify the severity > of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter > out DEFERRED error for ADM platform. > > This patch aims to extend the mce_severity mechanism for handling > UCNA/DEFERRED error. In order to do this, the patch introduces a new > severity level - MCE_UCNA/DEFERRED_SEVERITY. > > In addition, mce_severity is specific to machine check exception, > and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity > mechanism in non-exception context, the patch also introduces a new > argument (is_excp) for mce_severity. `is_excp' is used to explicitly > specify the calling context of mce_severity. 
> > Signed-off-by: Chen Yucong <slaoub@gmail.com> > --- > arch/x86/include/asm/mce.h | 4 ++++ > arch/x86/kernel/cpu/mcheck/mce-internal.h | 4 +++- > arch/x86/kernel/cpu/mcheck/mce-severity.c | 21 ++++++++++++++++----- > arch/x86/kernel/cpu/mcheck/mce.c | 14 ++++++++------ > drivers/edac/mce_amd.h | 3 --- > 5 files changed, 31 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h > index 276392f..51b26e89 100644 > --- a/arch/x86/include/asm/mce.h > +++ b/arch/x86/include/asm/mce.h > @@ -34,6 +34,10 @@ > #define MCI_STATUS_S (1ULL<<56) /* Signaled machine check */ > #define MCI_STATUS_AR (1ULL<<55) /* Action required */ > > +/* AMD-specific bits */ > +#define MCI_STATUS_DEFERRED (1ULL<<44) /* declare an uncorrected error */ > +#define MCI_STATUS_POISON (1ULL<<43) /* access poisonous data */ > + > /* > * Note that the full MCACOD field of IA32_MCi_STATUS MSR is > * bits 15:0. But bit 12 is the 'F' bit, defined for corrected > diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h > index 09edd0b..10b4690 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce-internal.h > +++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h > @@ -3,6 +3,8 @@ > > enum severity_level { > MCE_NO_SEVERITY, > + MCE_DEFERRED_SEVERITY, > + MCE_UCNA_SEVERITY = MCE_DEFERRED_SEVERITY, > MCE_KEEP_SEVERITY, > MCE_SOME_SEVERITY, > MCE_AO_SEVERITY, > @@ -21,7 +23,7 @@ struct mce_bank { > char attrname[ATTR_LEN]; /* attribute name */ > }; > > -int mce_severity(struct mce *a, int tolerant, char **msg); > +int mce_severity(struct mce *a, int tolerant, char **msg, bool is_excp); > struct dentry *mce_get_debugfs_dir(void); > > extern struct mce_bank *mce_banks; > diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c > index c370e1c..c61feb3 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c > +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c > @@ -31,6 +31,7 @@ > > enum context 
{ IN_KERNEL = 1, IN_USER = 2 }; > enum ser { SER_REQUIRED = 1, NO_SER = 2 }; > +enum exception { EXCP_CONTEXT = 1, NO_EXCP = 2 }; > > static struct severity { > u64 mask; > @@ -40,6 +41,7 @@ static struct severity { > unsigned char mcgres; > unsigned char ser; > unsigned char context; > + unsigned char excp; > unsigned char covered; > char *msg; > } severities[] = { > @@ -48,6 +50,8 @@ static struct severity { > #define USER .context = IN_USER > #define SER .ser = SER_REQUIRED > #define NOSER .ser = NO_SER > +#define EXCP .excp = EXCP_CONTEXT > +#define NOEXCP .excp = NO_EXCP > #define BITCLR(x) .mask = x, .result = 0 > #define BITSET(x) .mask = x, .result = x > #define MCGMASK(x, y) .mcgmask = x, .mcgres = y > @@ -71,16 +75,20 @@ static struct severity { > /* When MCIP is not set something is very confused */ > MCESEV( > PANIC, "MCIP not set in MCA handler", > - MCGMASK(MCG_STATUS_MCIP, 0) > + EXCP, MCGMASK(MCG_STATUS_MCIP, 0) > ), > /* Neither return not error IP -- no chance to recover -> PANIC */ > MCESEV( > PANIC, "Neither restart nor error IP", > - MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0) > + EXCP, MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0) > ), > MCESEV( > PANIC, "In kernel and no restart IP", > - KERNEL, MCGMASK(MCG_STATUS_RIPV, 0) > + EXCP, KERNEL, MCGMASK(MCG_STATUS_RIPV, 0) > + ), > + MCESEV( > + DEFERRED, "Deferred error", > + NOSER, MASK(MCI_STATUS_UC|MCI_STATUS_DEFERRED|MCI_STATUS_POISON, MCI_STATUS_DEFERRED) > ), We don't need to have MCI_STATUS_POISON in the MASK() here as a deferred error is indicated by a {UC=0, Deferred = 1} (Older docs might be unclear on that..) And it still says ADM on the commit message :) - Aravind. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle UCNA/DEFERRED error 2014-11-10 22:06 ` Aravind Gopalakrishnan @ 2014-11-10 22:17 ` Borislav Petkov 2014-11-10 23:03 ` Aravind Gopalakrishnan 2014-11-10 23:32 ` Luck, Tony 0 siblings, 2 replies; 17+ messages in thread From: Borislav Petkov @ 2014-11-10 22:17 UTC (permalink / raw) To: Aravind Gopalakrishnan Cc: Chen Yucong, tony.luck, ak, linux-edac, linux-kernel On Mon, Nov 10, 2014 at 04:06:00PM -0600, Aravind Gopalakrishnan wrote: > >+ MCESEV( > >+ DEFERRED, "Deferred error", > >+ NOSER, MASK(MCI_STATUS_UC|MCI_STATUS_DEFERRED|MCI_STATUS_POISON, MCI_STATUS_DEFERRED) > > ), > > We don't need to have MCI_STATUS_POISON in the MASK() here as a deferred > error is indicated by a {UC=0, Deferred = 1} > (Older docs might be unclear on that..) Well, I think that's ok because the MASK() macro actually makes the check do: look at bits MCI_STATUS_UC|MCI_STATUS_DEFERRED|MCI_STATUS_POISON and of them three MCI_STATUS_DEFERRED has to be the only one set. And that makes sense - we want deferred errors where we didn't attempt to consume poisoned data (which is signalled by MCI_STATUS_POISON).... Provided I'm reading this mce_severity thing correct, of course - more often than not I don't because it is insanely ugly and hard to read. > And it still says ADM on the commit message :) Yeah, Advanced Devices Micro - that's the new name. :-P -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 17+ messages in thread
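Editorial note: the MASK() check Boris describes above can be modelled outside the kernel as a single comparison, (status & mask) == result. The sketch below is a standalone illustration under stated assumptions, not kernel code. mask_matches() and is_clean_deferred() are hypothetical helper names; MCI_STATUS_DEFERRED and MCI_STATUS_POISON use the bit positions from the patch, and MCI_STATUS_UC is assumed to be bit 61 of the status register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DEFERRED and POISON as defined in the patch; UC assumed to be bit 61. */
#define MCI_STATUS_UC       (1ULL << 61)
#define MCI_STATUS_DEFERRED (1ULL << 44)
#define MCI_STATUS_POISON   (1ULL << 43)

/* Hypothetical helper: the comparison that a MASK(mask, result) rule stands for. */
static bool mask_matches(uint64_t status, uint64_t mask, uint64_t result)
{
	return (status & mask) == result;
}

/* True only when DEFERRED is set and both UC and POISON are clear. */
static bool is_clean_deferred(uint64_t status)
{
	return mask_matches(status,
			    MCI_STATUS_UC | MCI_STATUS_DEFERRED | MCI_STATUS_POISON,
			    MCI_STATUS_DEFERRED);
}
```

Under these assumptions a status word with only DEFERRED set (among the three bits) matches, while DEFERRED together with POISON or UC does not, which is the reading given in the reply above.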
* Re: [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle UCNA/DEFERRED error 2014-11-10 22:17 ` Borislav Petkov @ 2014-11-10 23:03 ` Aravind Gopalakrishnan 2014-11-10 23:32 ` Luck, Tony 1 sibling, 0 replies; 17+ messages in thread From: Aravind Gopalakrishnan @ 2014-11-10 23:03 UTC (permalink / raw) To: Borislav Petkov; +Cc: Chen Yucong, tony.luck, ak, linux-edac, linux-kernel On 11/10/2014 4:17 PM, Borislav Petkov wrote: > On Mon, Nov 10, 2014 at 04:06:00PM -0600, Aravind Gopalakrishnan wrote: >>> + MCESEV( >>> + DEFERRED, "Deferred error", >>> + NOSER, MASK(MCI_STATUS_UC|MCI_STATUS_DEFERRED|MCI_STATUS_POISON, MCI_STATUS_DEFERRED) >>> ), >> We don't need to have MCI_STATUS_POISON in the MASK() here as a deferred >> error is indicated by a {UC=0, Deferred = 1} >> (Older docs might be unclear on that..) > Well, I think that's ok because the MASK() macro actually makes the > check do: > > look at bits MCI_STATUS_UC|MCI_STATUS_DEFERRED|MCI_STATUS_POISON and of > them three MCI_STATUS_DEFERRED has to be the only one set. > > And that makes sense - we want deferred errors where we didn't attempt > to consume poisoned data (which is signalled by MCI_STATUS_POISON).... Yeah, makes sense. Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Thanks, -Aravind. ^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle UCNA/DEFERRED error 2014-11-10 22:17 ` Borislav Petkov 2014-11-10 23:03 ` Aravind Gopalakrishnan @ 2014-11-10 23:32 ` Luck, Tony 2014-11-11 8:56 ` Borislav Petkov 1 sibling, 1 reply; 17+ messages in thread From: Luck, Tony @ 2014-11-10 23:32 UTC (permalink / raw) To: Borislav Petkov, Aravind Gopalakrishnan Cc: Chen Yucong, ak, linux-edac, linux-kernel

But then I tested it ...

I injected a UC error to memory - then did a simple byte write to the target line.
This resulted in two banks logging errors:

[ 124.638045] poll: CPU54 saw ec00000000010092 in bank 7
[ 124.639006] poll: severity = 0
[ 124.647333] poll: CPU54 saw b800000000200179 in bank 3
[ 124.648322] poll: severity = 1

The bank 7 error reported as severity 0 because EN=0 ... so we took no action for it.

The bank 3 error got past that hurdle, then through the next BIT(8) set indicates a
cache error. Fell at the last check because ADDRV=0.

I think the severity table entry for the "EN" check should have been skipped
when calling from the CMCI handler. Then we would have seen severity=1
from the bank 7 error. It would have passed the other tests too (BIT(7) and
ADDRV).

-Tony

^ permalink raw reply [flat|nested] 17+ messages in thread
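Editorial note: the two status words in the log above can be decoded by hand to check the EN and ADDRV observations. The sketch below is a standalone illustration, not kernel code. It assumes the usual MCi_STATUS bit positions (VAL 63, UC 61, EN 60, ADDRV 58); struct status_flags and decode_status() are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed MCi_STATUS bit positions. */
#define MCI_STATUS_VAL   (1ULL << 63)
#define MCI_STATUS_UC    (1ULL << 61)
#define MCI_STATUS_EN    (1ULL << 60)
#define MCI_STATUS_ADDRV (1ULL << 58)

/* Hypothetical container for the flags discussed in the thread. */
struct status_flags {
	bool val;
	bool uc;
	bool en;
	bool addrv;
};

/* Extract the four flags from a raw 64-bit status word. */
static struct status_flags decode_status(uint64_t status)
{
	struct status_flags f = {
		.val   = (status & MCI_STATUS_VAL) != 0,
		.uc    = (status & MCI_STATUS_UC) != 0,
		.en    = (status & MCI_STATUS_EN) != 0,
		.addrv = (status & MCI_STATUS_ADDRV) != 0,
	};
	return f;
}
```

Feeding in the logged values, 0xec00000000010092 (bank 7) decodes to UC set, EN clear, ADDRV set, and 0xb800000000200179 (bank 3) decodes to UC set, EN set, ADDRV clear, consistent with the description in the message.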
* Re: [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle UCNA/DEFERRED error 2014-11-10 23:32 ` Luck, Tony @ 2014-11-11 8:56 ` Borislav Petkov 2014-11-11 18:44 ` Luck, Tony 0 siblings, 1 reply; 17+ messages in thread From: Borislav Petkov @ 2014-11-11 8:56 UTC (permalink / raw) To: Luck, Tony Cc: Aravind Gopalakrishnan, Chen Yucong, ak, linux-edac, linux-kernel On Mon, Nov 10, 2014 at 11:32:12PM +0000, Luck, Tony wrote: > But then I tested it ... > > I injected a UC error to memory - then did a simple byte write to the target line. > This resulted in two banks logging errors: > > [ 124.638045] poll: CPU54 saw ec00000000010092 in bank 7 > [ 124.639006] poll: severity = 0 > [ 124.647333] poll: CPU54 saw b800000000200179 in bank 3 > [ 124.648322] poll: severity = 1 > > The bank 7 error reported as severity 0 because EN=0 ... so we took no action for it. How come EN is 0? Bank7 error reporting is not enabled? Why? Or the error injection thing doesn't do it? > The bank 3 error got past that hurdle, then through the next BIT(8) set indicates a > cache error. Fell at the last check because ADDRV=0. I guess you could tweak the injection path to write in a default address so that that check gets bypassed... > I think the severity table entry for the "EN" check should have been skipped > when calling from the CMCI handler. Then we would have seen severity=1 > from the bank 7 error. It would have passed the other tests too (BIT(7) and > ADDRV). ... but this is yet another example that this severity table is hard to extend and handle. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle UCNA/DEFERRED error 2014-11-11 8:56 ` Borislav Petkov @ 2014-11-11 18:44 ` Luck, Tony 2014-11-12 1:03 ` Chen Yucong 0 siblings, 1 reply; 17+ messages in thread From: Luck, Tony @ 2014-11-11 18:44 UTC (permalink / raw) To: Borislav Petkov Cc: Aravind Gopalakrishnan, Chen Yucong, ak, linux-edac, linux-kernel

>> The bank 7 error reported as severity 0 because EN=0 ... so we took no action for it.
>
> How come EN is 0? Bank7 error reporting is not enabled? Why? Or the
> error injection thing doesn't do it?

The "EN" bit is poorly named, and not well documented. Here's a clip from the SDM:

One of bullets in 15.10.4.1 Machine-Check Exception Handler for Error Recovery

When the EN flag is zero but the VAL and UC flags are one in the
IA32_MCi_STATUS register, the reported uncorrected error in this bank
is not enabled. As uncorrected errors with the EN flag = 0 are not the
source of machine check exceptions, the MCE handler should log and clear
non-enabled errors when the S bit is set and should continue searching
for enabled errors from the other IA32_MCi_STATUS registers. Note that
when IA32_MCG_CAP [24] is 0, any uncorrected error condition (VAL =1
and UC=1) including the one with the EN flag cleared are fatal and the
handler must signal the operating system to reset the system. For the
errors that do not generate machine check exceptions, the EN flag has
no meaning. See Chapter 19: Table 19-15 to find the errors that do not
generate machine check exceptions.

Unfortunately the reference to chapter 19 is stale (that is now all about
performance monitoring - I'll log a bug with the SDM editor to find the
right reference and fix this).

What this is trying to say is that the "EN" bit is to enable signaling
of machine checks - so it only has meaning when checking banks from the
machine check handler. Errors that are logged, but not signaled, or signaled
as CMCI will have MCi_STATUS.EN=0

>> The bank 3 error got past that hurdle, then through the next BIT(8) set indicates a
>> cache error. Fell at the last check because ADDRV=0.
>
> I guess you could tweak the injection path to write in a default address
> so that that check gets bypassed...

I don't think this is an injection artifact. I think on this processor the mid-level-cache
just isn't providing an address in this case. It doesn't help to make one up - our whole
game plan is to offline a page with a UC error - and we must have an address to know
which page to offline.

Perhaps the severity table entries for UCNA and DEFERRED errors should look to see
if ADDRV is set - if not, don't report this as UCNA/DEFERRED?

-Tony

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle UCNA/DEFERRED error 2014-11-11 18:44 ` Luck, Tony @ 2014-11-12 1:03 ` Chen Yucong 2014-11-12 18:28 ` Luck, Tony 0 siblings, 1 reply; 17+ messages in thread From: Chen Yucong @ 2014-11-12 1:03 UTC (permalink / raw) To: Luck, Tony Cc: Borislav Petkov, Aravind Gopalakrishnan, ak, linux-edac, linux-kernel On Tue, 2014-11-11 at 18:44 +0000, Luck, Tony wrote: > >> The bank 7 error reported as severity 0 because EN=0 ... so we took no action for it. > > > > How come EN is 0? Bank7 error reporting is not enabled? Why? Or the > > error injection thing doesn't do it? > > The "EN" bit is poorly named, and not well documented. Here's a clip from the SDM: > > One of bullets in 15.10.4.1 Machine-Check Exception Handler for Error Recovery > > When the EN flag is zero but the VAL and UC flags are one in the > IA32_MCi_STATUS register, the reported uncorrected error in this bank > is not enabled. As uncorrected errors with the EN flag = 0 are not the > source of machine check exceptions, the MCE handler should log and clear > non-enabled errors when the S bit is set and should continue searching > for enabled errors from the other IA32_MCi_STATUS registers. Note that > when IA32_MCG_CAP [24] is 0, any uncorrected error condition (VAL =1 > and UC=1) including the one with the EN flag cleared are fatal and the > handler must signal the operating system to reset the system. For the > errors that do not generate machine check exceptions, the EN flag has > no meaning. See Chapter 19: Table 19-15 to find the errors that do not > generate machine check exceptions. > > Unfortunately the reference to chapter 19 is stale (that is now all about > performance monitoring - I'll log a bug with the SDM editor to find the > right reference and fix this). 
> > What this is trying to say is that the "EN" bit is to enable signaling > of machine checks - so it only has meaning when checking banks from the > machine check handler. Errors that are logged, but not signaled, or signaled > as CMCI will have MCi_STATUS.EN=0 > > > >> The bank 3 error got past that hurdle, then through the next BIT(8) set indicates a > >> cache error. Fell at the last check because ADDRV=0. > > > > I guess you could tweak the injection path to write in a default address > > so that that check gets bypassed... > > I don't think this is an injection artifact. I think on this processor the mid-level-cache > just isn't providing an address in this case. It doesn't help to make one up - our whole > game plan is to offline a page with a UC error - and we must have an address to know > which page to offline. > > Perhaps the severity table entries for UCNA and DEFERRED errors should look to see > if ADDRV is set - if not, don't report this as UCNA/DEFERRED? > We can also find the following snippet from AMD APM Volume 2: 9.3.2 Error-Reporting Register Banks - MCi_STATUS EN—Bit 60. When set to 1, this bit indicates that the error condition is enabled in the corresponding error-reporting control register (MCi_CTL). Errors disabled by MCi_CTL do not cause a `machine-check exception'. Just as what you said, the severity table entry for the "EN" check should have been skipped when calling from the CMCI/Poll handler. As shown below: MCESEV( NO, "Not enabled", EXCP, BITCLR(MCI_STATUS_EN) ), thx! cyc ^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle UCNA/DEFERRED error 2014-11-12 1:03 ` Chen Yucong @ 2014-11-12 18:28 ` Luck, Tony 0 siblings, 0 replies; 17+ messages in thread From: Luck, Tony @ 2014-11-12 18:28 UTC (permalink / raw) To: Chen Yucong Cc: Borislav Petkov, Aravind Gopalakrishnan, ak, linux-edac, linux-kernel

> Just as what you said, the severity table entry for the "EN" check
> should have been skipped when calling from the CMCI/Poll handler.
> As shown below:
>
> MCESEV(
> 	NO, "Not enabled",
> 	EXCP, BITCLR(MCI_STATUS_EN)
> 	),

Yes - that worked. The bank 7 error now gets processed and we offline the page.

-Tony

^ permalink raw reply [flat|nested] 17+ messages in thread
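Editorial note: the effect of tagging the "Not enabled" entry with EXCP can be illustrated with a toy first-match table walk. This is a simplified model under stated assumptions, not the kernel's severities[] table. struct rule, rules[], toy_severity() and the SEV_* names are hypothetical; the second rule is a deliberately reduced stand-in for the real UCNA entry (it checks only UC, assumed to be bit 61, and ignores SER); and the fallback return value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed MCi_STATUS bit positions. */
#define MCI_STATUS_UC (1ULL << 61)
#define MCI_STATUS_EN (1ULL << 60)

enum exception { EXCP_CONTEXT = 1, NO_EXCP = 2 };

/* Toy levels chosen to line up with "severity = 0" and "severity = 1" in the log. */
enum sev { SEV_NO = 0, SEV_UCNA = 1 };

struct rule {
	uint64_t mask;
	uint64_t result;
	unsigned char excp;	/* 0 means the rule applies in any context */
	enum sev sev;
};

static const struct rule rules[] = {
	/* "Not enabled": EN clear, only honoured in exception context */
	{ MCI_STATUS_EN, 0, EXCP_CONTEXT, SEV_NO },
	/* reduced stand-in for the UCNA entry: UC set, any context */
	{ MCI_STATUS_UC, MCI_STATUS_UC, 0, SEV_UCNA },
};

/* Return the severity of the first rule whose mask and context both match. */
static enum sev toy_severity(uint64_t status, bool is_excp)
{
	enum exception excp = is_excp ? EXCP_CONTEXT : NO_EXCP;

	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		const struct rule *r = &rules[i];

		if ((status & r->mask) != r->result)
			continue;
		if (r->excp && excp != r->excp)
			continue;
		return r->sev;
	}
	return SEV_NO;	/* fallback for this sketch only */
}
```

In this model the bank 7 status word from the earlier log (0xec00000000010092, EN clear) is classified as SEV_UCNA when is_excp is false, because the EN rule is skipped outside exception context, and as SEV_NO when is_excp is true.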
* [PATCH v3 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll 2014-11-08 1:40 [PATCH v3 0/2]RAS: add the support for handling UCNA/DEFERRED error Chen Yucong 2014-11-08 1:40 ` [PATCH v3 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle " Chen Yucong @ 2014-11-08 1:40 ` Chen Yucong 2014-11-10 19:06 ` Borislav Petkov 2014-11-10 16:42 ` [PATCH v3 0/2]RAS: add the support for handling UCNA/DEFERRED error Borislav Petkov 2 siblings, 1 reply; 17+ messages in thread From: Chen Yucong @ 2014-11-08 1:40 UTC (permalink / raw) To: bp, tony.luck Cc: ak, aravind.gopalakrishnan, linux-edac, linux-kernel, Chen Yucong Uncorrected no action required (UCNA) - is a uncorrected recoverable machine check error that is not signaled via a machine check exception and, instead, is reported to system software as a corrected machine check error. UCNA errors indicate that some data in the system is corrupted, but the data has not been consumed and the processor state is valid and you may continue execution on this processor. UCNA errors require no action from system software to continue execution. Note that UCNA errors are supported by the processor only when IA32_MCG_CAP[24] (MCG_SER_P) is set. -- Intel SDM Volume 3B Deferred errors are errors that cannot be corrected by hardware, but do not cause an immediate interruption in program flow, loss of data integrity, or corruption of processor state. These errors indicate that data has been corrupted but not consumed. Hardware writes information to the status and address registers in the corresponding bank that identifies the source of the error if deferred errors are enabled for logging. Deferred errors are not reported via machine check exceptions; they can be seen by polling the MCi_STATUS registers. 
-- AMD64 APM Volume 2 Above two items, both UCNA and Deferred errors belong to detected errors, but they can't be corrected by hardware, and this is very similar to Software Recoverable Action Optional (SRAO) errors. Therefore, we can take some actions that have been used for handling SRAO errors to handle UCNA and Deferred errors. Signed-off-by: Chen Yucong <slaoub@gmail.com> --- arch/x86/kernel/cpu/mcheck/mce.c | 50 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 453e9bf..4b6e4cdf 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -575,6 +575,41 @@ static void mce_read_aux(struct mce *m, int i) } } +static bool memory_error(struct mce *m) +{ + struct cpuinfo_x86 *c = &boot_cpu_data; + + if (c->x86_vendor == X86_VENDOR_AMD) { + /* + * AMD BKDGs - Machine Check Error Codes + * + * Bit 8 of ErrCode[15:0] of MCi_STATUS is used for indicating + * a memory-specific error. Note that this field encodes info- + * rmation about memory-hierarchy level involved in the error. + */ + return (m->status & 0xff00) == BIT(8); + } else if (c->x86_vendor == X86_VENDOR_INTEL) { + /* + * Intel SDM Volume 3B - 15.9.2 Compound Error Codes + * + * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for + * indicating a memory error. Bit 8 is used for indicating a + * cache hierarchy error. The combination of bit 2 and bit 3 + * is used for indicating a `generic' cache hierarchy error + * But we can't just blindly check the above bits, because if + * bit 11 is set, then it is a bus/interconnect error - and + * either way the above bits just gives more detail on what + * bus/interconnect error happened. Note that bit 12 can be + * ignored, as it's the "filter" bit. 
+ */ + return (m->status & 0xef80) == BIT(7) || + (m->status & 0xef00) == BIT(8) || + (m->status & 0xeffc) == 0xc; + } + + return false; +} + DEFINE_PER_CPU(unsigned, mce_poll_count); /* @@ -595,6 +630,7 @@ DEFINE_PER_CPU(unsigned, mce_poll_count); void machine_check_poll(enum mcp_flags flags, mce_banks_t *b) { struct mce m; + int severity; int i; this_cpu_inc(mce_poll_count); @@ -630,6 +666,20 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b) if (!(flags & MCP_TIMESTAMP)) m.tsc = 0; + + severity = mce_severity(&m, mca_cfg.tolerant, NULL, false); + + /* + * In the cases where we don't have a valid address after all, + * do not add it into the ring buffer. + */ + if (severity == MCE_DEFERRED_SEVERITY && memory_error(&m)) { + if (m.status & MCI_STATUS_ADDRV) { + mce_ring_add(m.addr >> PAGE_SHIFT); + mce_schedule_work(); + } + } + /* * Don't get the IP here because it's unlikely to * have anything to do with the actual error location. -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
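Editorial note: the Intel branch of memory_error() in the patch above can be exercised on bare status words. The sketch below copies only the three mask comparisons from the patch into a standalone function; intel_memory_error() is a hypothetical name, BIT() is redefined locally, and the AMD branch is left out because it is disputed in the review that follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))

/*
 * Same three comparisons as the Intel branch of memory_error() in the
 * patch, applied to a raw status word. Bits 15:13 and 11:9 must be
 * clear in every case; bit 12 (the filter bit) is ignored.
 */
static bool intel_memory_error(uint64_t status)
{
	return (status & 0xef80) == BIT(7) ||	/* memory error */
	       (status & 0xef00) == BIT(8) ||	/* cache hierarchy error */
	       (status & 0xeffc) == 0xc;	/* generic cache hierarchy error */
}
```

For instance, a status word whose low 16 bits are 0x0092 satisfies the first comparison, 0x0179 satisfies the second, and 0x0880 (bit 11 set, a bus/interconnect code) satisfies none of them.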
* Re: [PATCH v3 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll
From: Borislav Petkov @ 2014-11-10 19:06 UTC
To: Chen Yucong
Cc: tony.luck, ak, aravind.gopalakrishnan, linux-edac, linux-kernel

On Sat, Nov 08, 2014 at 09:40:21AM +0800, Chen Yucong wrote:
> Uncorrected no action required (UCNA) - is a uncorrected recoverable
> machine check error that is not signaled via a machine check exception
> and, instead, is reported to system software as a corrected machine
> check error. UCNA errors indicate that some data in the system is
> corrupted, but the data has not been consumed and the processor state
> is valid and you may continue execution on this processor. UCNA errors
> require no action from system software to continue execution. Note that
> UCNA errors are supported by the processor only when IA32_MCG_CAP[24]
> (MCG_SER_P) is set.
>                                                -- Intel SDM Volume 3B
>
> Deferred errors are errors that cannot be corrected by hardware, but
> do not cause an immediate interruption in program flow, loss of data
> integrity, or corruption of processor state. These errors indicate
> that data has been corrupted but not consumed. Hardware writes information
> to the status and address registers in the corresponding bank that
> identifies the source of the error if deferred errors are enabled for
> logging. Deferred errors are not reported via machine check exceptions;
> they can be seen by polling the MCi_STATUS registers.
>                                                 -- AMD64 APM Volume 2
>
> Above two items, both UCNA and Deferred errors belong to detected
> errors, but they can't be corrected by hardware, and this is very
> similar to Software Recoverable Action Optional (SRAO) errors.
> Therefore, we can take some actions that have been used for handling
> SRAO errors to handle UCNA and Deferred errors.
>
> Signed-off-by: Chen Yucong <slaoub@gmail.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |   50 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 453e9bf..4b6e4cdf 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -575,6 +575,41 @@ static void mce_read_aux(struct mce *m, int i)
>  	}
>  }
>
> +static bool memory_error(struct mce *m)
> +{
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> +	if (c->x86_vendor == X86_VENDOR_AMD) {
> +		/*
> +		 * AMD BKDGs - Machine Check Error Codes
> +		 *
> +		 * Bit 8 of ErrCode[15:0] of MCi_STATUS is used for indicating
> +		 * a memory-specific error. Note that this field encodes info-
> +		 * rmation about memory-hierarchy level involved in the error.
> +		 */
> +		return (m->status & 0xff00) == BIT(8);

Grrr, you copied this from a patch of mine, correct? So if you copy it
why did you go and change it up and break it on top of that?! Testing
bit 8 is wrong!

NAK!

Before you go and "fix" this again, wait for me to clear up something
else before you even go do that. I'll let you know.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
* Re: [PATCH v3 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll
From: Borislav Petkov @ 2014-11-10 21:37 UTC
To: Chen Yucong
Cc: tony.luck, ak, aravind.gopalakrishnan, linux-edac, linux-kernel

On Mon, Nov 10, 2014 at 08:06:17PM +0100, Borislav Petkov wrote:
> Before you go and "fix" this again, wait for me to clear up something
> else before you even go do that. I'll let you know.

In fact, you could redo this patch in the meantime without the AMD vendor
check so that Tony can pick them up soon. I'll add the correct AMD bits
later.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
* RE: [PATCH v3 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll
From: Luck, Tony @ 2014-11-10 21:44 UTC
To: Borislav Petkov, Chen Yucong
Cc: ak, aravind.gopalakrishnan, linux-edac, linux-kernel

> In fact, you could redo this patch in the meantime without the AMD vendor
> check so that Tony can pick them up soon. I'll add the correct AMD bits
> later.

You just want this hunk deleted ...

+	if (c->x86_vendor == X86_VENDOR_AMD) {
+		/*
+		 * AMD BKDGs - Machine Check Error Codes
+		 *
+		 * Bit 8 of ErrCode[15:0] of MCi_STATUS is used for indicating
+		 * a memory-specific error. Note that this field encodes info-
+		 * rmation about memory-hierarchy level involved in the error.
+		 */
+		return (m->status & 0xff00) == BIT(8);
+	} else

I can do that without a repost. Or I can leave a reminder ...

+	if (c->x86_vendor == X86_VENDOR_AMD) {
+		/*
+		 * coming soon
+		 */
+		return false;
+	} else

-Tony
* Re: [PATCH v3 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll
From: Borislav Petkov @ 2014-11-10 21:47 UTC
To: Luck, Tony
Cc: Chen Yucong, ak, aravind.gopalakrishnan, linux-edac, linux-kernel

On Mon, Nov 10, 2014 at 09:44:26PM +0000, Luck, Tony wrote:
> > In fact, you could redo this patch in the meantime without the AMD vendor
> > check so that Tony can pick them up soon. I'll add the correct AMD bits
> > later.
>
> You just want this hunk deleted ...
>
> +	if (c->x86_vendor == X86_VENDOR_AMD) {
> +		/*
> +		 * AMD BKDGs - Machine Check Error Codes
> +		 *
> +		 * Bit 8 of ErrCode[15:0] of MCi_STATUS is used for indicating
> +		 * a memory-specific error. Note that this field encodes info-
> +		 * rmation about memory-hierarchy level involved in the error.
> +		 */
> +		return (m->status & 0xff00) == BIT(8);
> +	} else
>
> I can do that without a repost. Or I can leave a reminder ...
>
> +	if (c->x86_vendor == X86_VENDOR_AMD) {
> +		/*
> +		 * coming soon
> +		 */
> +		return false;
> +	} else

Ok, that's fine too. I'll try to have a fix for this very soon so that
we can be just in time ready to go for 3.19.

With that you can slap

Acked-by: Borislav Petkov <bp@suse.de>

to them.

Thanks Tony.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
* Re: [PATCH v3 0/2]RAS: add the support for handling UCNA/DEFERRED error
From: Borislav Petkov @ 2014-11-10 16:42 UTC
To: tony.luck
Cc: Chen Yucong, ak, aravind.gopalakrishnan, linux-edac, linux-kernel

On Sat, Nov 08, 2014 at 09:40:19AM +0800, Chen Yucong wrote:
> Hi all,
>
> At the suggestion of Boris, the first patch extends the mce_severity
> mechanism for handling UCNA/DEFERRED error.
> Link: https://lkml.org/lkml/2014/10/23/190
>
> v2:
> The first patch have also eliminated a big hack to make mce_severity()
> work when called from non-exception context on the advice of Tony and
> Boris.
> Link: https://lkml.org/lkml/2014/10/27/1017
>
> And on the basis of the first patch, the second patch adds the support
> for identifying and handling UCNA/DEFERRED error in machine_check_poll.
>
> V3:
> According to Boris, the second patch have also split `memory_error'
> from mem_deferred_error so that the memory_error() function can be used
> in other code paths separately.
> Link: https://lkml.org/lkml/2014/11/6/452
>
> Boris also reported the warning about "MCI_STATUS_POISON" and "MCI_STATUS_POISON"
> redefined.

Looks ok to me.

Tony, let me know if I should pick them up or you want to. Btw, there's
already tip/x86/ras for 3.19.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
* RE: [PATCH v3 0/2]RAS: add the support for handling UCNA/DEFERRED error
From: Luck, Tony @ 2014-11-10 18:47 UTC
To: Borislav Petkov
Cc: Chen Yucong, ak, aravind.gopalakrishnan, linux-edac, linux-kernel

> Looks ok to me.

And also to me

> Tony, let me know if I should pick them up or you want to. Btw, there's
> already tip/x86/ras for 3.19.

I can take it - if you retype the "Looks ok" in the form of an "Acked-by" tag :-)

-Tony