When we hit a UE while using the machine check safe copy routines, the
ignore_event flag is set and the event is ignored by the MCE handler.
The flag is also saved for deferred handling and printing of the MCE
event information. As of now, however, the flag is saved only if the
effective address is provided and the physical address is calculated,
which is not right.

Save the ignore_event flag regardless of whether the effective address
is provided or the physical address is calculated.

Without this change, the following log is seen when the event is to be
ignored:

[ 512.971365] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
[ 512.971509] MCE: CPU1: NIP: [c0000000000b67c0] memcpy+0x40/0x90
[ 512.971655] MCE: CPU1: Initiator CPU
[ 512.971739] MCE: CPU1: Unknown
[ 512.972209] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
[ 512.972334] MCE: CPU1: NIP: [c0000000000b6808] memcpy+0x88/0x90
[ 512.972456] MCE: CPU1: Initiator CPU
[ 512.972534] MCE: CPU1: Unknown

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
 arch/powerpc/kernel/mce.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 11f0cae086ed..db9363e131ce 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -131,6 +131,8 @@ void save_mce_event(struct pt_regs *regs, long handled,
 	 * Populate the mce error_type and type-specific error_type.
 	 */
 	mce_set_error_info(mce, mce_err);
+	if (mce->error_type == MCE_ERROR_TYPE_UE)
+		mce->u.ue_error.ignore_event = mce_err->ignore_event;
 
 	if (!addr)
 		return;
@@ -159,7 +161,6 @@ void save_mce_event(struct pt_regs *regs, long handled,
 		if (phys_addr != ULONG_MAX) {
 			mce->u.ue_error.physical_address_provided = true;
 			mce->u.ue_error.physical_address = phys_addr;
-			mce->u.ue_error.ignore_event = mce_err->ignore_event;
 			machine_check_ue_event(mce);
 		}
 	}
-- 
2.26.2
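For readers following the control flow, the change can be modelled outside the kernel. The sketch below is only an illustration under stated assumptions, not the kernel code: struct err_info, struct ue_event, NO_PHYS_ADDR (standing in for ULONG_MAX), save_ue_old() and save_ue_new() are simplified, hypothetical stand-ins, and only the UE case is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the ULONG_MAX "no physical address" marker. */
#define NO_PHYS_ADDR UINT64_MAX

/* Simplified stand-in for struct mce_error_info (only the flag we need). */
struct err_info {
	bool ignore_event;
};

/* Simplified stand-in for the UE part of struct machine_check_event. */
struct ue_event {
	bool effective_address_provided;
	bool physical_address_provided;
	bool ignore_event;
	uint64_t effective_address;
	uint64_t physical_address;
};

/* Flow before the patch: the flag is copied only when both addresses exist. */
void save_ue_old(struct ue_event *ev, const struct err_info *err,
		 uint64_t addr, uint64_t phys_addr)
{
	assert(ev && err);

	if (!addr)
		return;		/* flag is never copied on this path */

	ev->effective_address_provided = true;
	ev->effective_address = addr;
	if (phys_addr != NO_PHYS_ADDR) {
		ev->physical_address_provided = true;
		ev->physical_address = phys_addr;
		ev->ignore_event = err->ignore_event;
	}
}

/* Flow after the patch: the flag is copied before any address check. */
void save_ue_new(struct ue_event *ev, const struct err_info *err,
		 uint64_t addr, uint64_t phys_addr)
{
	assert(ev && err);

	ev->ignore_event = err->ignore_event;

	if (!addr)
		return;

	ev->effective_address_provided = true;
	ev->effective_address = addr;
	if (phys_addr != NO_PHYS_ADDR) {
		ev->physical_address_provided = true;
		ev->physical_address = phys_addr;
	}
}
```

Calling both functions with addr set to 0 and err->ignore_event set to true leaves the flag clear in the old flow and set in the new one. The same holds when addr is known but phys_addr is NO_PHYS_ADDR. Since the deferred printing path relies on the saved flag, a lost flag is presumably what lets the log above appear for events that should have been ignored.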
Hi Ganesh,

Ganesh Goudar <ganeshgr@linux.ibm.com> writes:

> When we hit a UE while using the machine check safe copy routines, the
> ignore_event flag is set and the event is ignored by the MCE handler.
> The flag is also saved for deferred handling and printing of the MCE
> event information. As of now, however, the flag is saved only if the
> effective address is provided and the physical address is calculated,
> which is not right.
>
> Save the ignore_event flag regardless of whether the effective address
> is provided or the physical address is calculated.
>
> Without this change, the following log is seen when the event is to be
> ignored:
>
> [ 512.971365] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
> [ 512.971509] MCE: CPU1: NIP: [c0000000000b67c0] memcpy+0x40/0x90
> [ 512.971655] MCE: CPU1: Initiator CPU
> [ 512.971739] MCE: CPU1: Unknown
> [ 512.972209] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
> [ 512.972334] MCE: CPU1: NIP: [c0000000000b6808] memcpy+0x88/0x90
> [ 512.972456] MCE: CPU1: Initiator CPU
> [ 512.972534] MCE: CPU1: Unknown
>
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> ---
>  arch/powerpc/kernel/mce.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index 11f0cae086ed..db9363e131ce 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -131,6 +131,8 @@ void save_mce_event(struct pt_regs *regs, long handled,
>  	 * Populate the mce error_type and type-specific error_type.
>  	 */
>  	mce_set_error_info(mce, mce_err);
> +	if (mce->error_type == MCE_ERROR_TYPE_UE)
> +		mce->u.ue_error.ignore_event = mce_err->ignore_event;
>  
>  	if (!addr)
>  		return;
> @@ -159,7 +161,6 @@ void save_mce_event(struct pt_regs *regs, long handled,
>  		if (phys_addr != ULONG_MAX) {
>  			mce->u.ue_error.physical_address_provided = true;
>  			mce->u.ue_error.physical_address = phys_addr;
> -			mce->u.ue_error.ignore_event = mce_err->ignore_event;
>  			machine_check_ue_event(mce);
>  		}
>  	}

Small nit:
Setting ignore_event can happen before the phys_addr check, under the existing
check for MCE_ERROR_TYPE_UE, instead of repeating the same condition again.

Except for the above nit,

Reviewed-by: Santosh Sivaraj <santosh@fossix.org>

Thanks,
Santosh

> -- 
> 2.26.2
On 4/20/21 12:54 PM, Santosh Sivaraj wrote:

> Hi Ganesh,
>
> Ganesh Goudar <ganeshgr@linux.ibm.com> writes:
>
>> When we hit a UE while using the machine check safe copy routines, the
>> ignore_event flag is set and the event is ignored by the MCE handler.
>> The flag is also saved for deferred handling and printing of the MCE
>> event information. As of now, however, the flag is saved only if the
>> effective address is provided and the physical address is calculated,
>> which is not right.
>>
>> Save the ignore_event flag regardless of whether the effective address
>> is provided or the physical address is calculated.
>>
>> Without this change, the following log is seen when the event is to be
>> ignored:
>>
>> [ 512.971365] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
>> [ 512.971509] MCE: CPU1: NIP: [c0000000000b67c0] memcpy+0x40/0x90
>> [ 512.971655] MCE: CPU1: Initiator CPU
>> [ 512.971739] MCE: CPU1: Unknown
>> [ 512.972209] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
>> [ 512.972334] MCE: CPU1: NIP: [c0000000000b6808] memcpy+0x88/0x90
>> [ 512.972456] MCE: CPU1: Initiator CPU
>> [ 512.972534] MCE: CPU1: Unknown
>>
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> ---
>>  arch/powerpc/kernel/mce.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
>> index 11f0cae086ed..db9363e131ce 100644
>> --- a/arch/powerpc/kernel/mce.c
>> +++ b/arch/powerpc/kernel/mce.c
>> @@ -131,6 +131,8 @@ void save_mce_event(struct pt_regs *regs, long handled,
>>  	 * Populate the mce error_type and type-specific error_type.
>>  	 */
>>  	mce_set_error_info(mce, mce_err);
>> +	if (mce->error_type == MCE_ERROR_TYPE_UE)
>> +		mce->u.ue_error.ignore_event = mce_err->ignore_event;
>>  
>>  	if (!addr)
>>  		return;
>> @@ -159,7 +161,6 @@ void save_mce_event(struct pt_regs *regs, long handled,
>>  		if (phys_addr != ULONG_MAX) {
>>  			mce->u.ue_error.physical_address_provided = true;
>>  			mce->u.ue_error.physical_address = phys_addr;
>> -			mce->u.ue_error.ignore_event = mce_err->ignore_event;
>>  			machine_check_ue_event(mce);
>>  		}
>>  	}
> Small nit:
> Setting ignore_event can happen before the phys_addr check, under the existing
> check for MCE_ERROR_TYPE_UE, instead of repeating the same condition again.

In some cases we may not get the effective address either, so the assignment
is placed before the effective address check.

>
> Except for the above nit
>
> Reviewed-by: Santosh Sivaraj <santosh@fossix.org>
>
> Thanks,
> Santosh
>> -- 
>> 2.26.2
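The placement question discussed here can be illustrated with a small user-space model. This is a sketch under assumptions, not kernel code: enum err_type, struct event and both function names are hypothetical, and the flat struct replaces the per-type data that the real event keeps under mce->u.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced set of error types; the kernel defines more. */
enum err_type { ERR_TLB, ERR_UE };

/* Flattened stand-in for the event; the real one keeps per-type data. */
struct event {
	enum err_type type;
	bool ignore_event;		/* meaningful for ERR_UE only */
	bool effective_address_provided;
	uint64_t effective_address;
};

/* Placement from the review nit: reuse the UE branch after the early exit. */
void save_under_existing_check(struct event *ev, bool ignore, uint64_t addr)
{
	assert(ev);

	if (!addr)
		return;		/* the UE branch below is never reached */

	if (ev->type == ERR_UE)
		ev->ignore_event = ignore;
	ev->effective_address_provided = true;
	ev->effective_address = addr;
}

/* Placement used by the patch: a separate UE check before the early exit. */
void save_before_addr_check(struct event *ev, bool ignore, uint64_t addr)
{
	assert(ev);

	if (ev->type == ERR_UE)
		ev->ignore_event = ignore;

	if (!addr)
		return;

	ev->effective_address_provided = true;
	ev->effective_address = addr;
}
```

With a UE event and addr equal to 0, only save_before_addr_check() records the flag, which matches the reason given above for repeating the MCE_ERROR_TYPE_UE condition. The type check is kept in both variants so that the flag is written only for UE events.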
Ganesh <ganeshgr@linux.ibm.com> writes:

> On 4/20/21 12:54 PM, Santosh Sivaraj wrote:
>
>> Hi Ganesh,
>>
>> Ganesh Goudar <ganeshgr@linux.ibm.com> writes:
>>
>>> When we hit a UE while using the machine check safe copy routines, the
>>> ignore_event flag is set and the event is ignored by the MCE handler.
>>> The flag is also saved for deferred handling and printing of the MCE
>>> event information. As of now, however, the flag is saved only if the
>>> effective address is provided and the physical address is calculated,
>>> which is not right.
>>>
>>> Save the ignore_event flag regardless of whether the effective address
>>> is provided or the physical address is calculated.
>>>
>>> Without this change, the following log is seen when the event is to be
>>> ignored:
>>>
>>> [ 512.971365] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
>>> [ 512.971509] MCE: CPU1: NIP: [c0000000000b67c0] memcpy+0x40/0x90
>>> [ 512.971655] MCE: CPU1: Initiator CPU
>>> [ 512.971739] MCE: CPU1: Unknown
>>> [ 512.972209] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
>>> [ 512.972334] MCE: CPU1: NIP: [c0000000000b6808] memcpy+0x88/0x90
>>> [ 512.972456] MCE: CPU1: Initiator CPU
>>> [ 512.972534] MCE: CPU1: Unknown
>>>
>>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>>> ---
>>>  arch/powerpc/kernel/mce.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
>>> index 11f0cae086ed..db9363e131ce 100644
>>> --- a/arch/powerpc/kernel/mce.c
>>> +++ b/arch/powerpc/kernel/mce.c
>>> @@ -131,6 +131,8 @@ void save_mce_event(struct pt_regs *regs, long handled,
>>>  	 * Populate the mce error_type and type-specific error_type.
>>>  	 */
>>>  	mce_set_error_info(mce, mce_err);
>>> +	if (mce->error_type == MCE_ERROR_TYPE_UE)
>>> +		mce->u.ue_error.ignore_event = mce_err->ignore_event;
>>>  
>>>  	if (!addr)
>>>  		return;
>>> @@ -159,7 +161,6 @@ void save_mce_event(struct pt_regs *regs, long handled,
>>>  		if (phys_addr != ULONG_MAX) {
>>>  			mce->u.ue_error.physical_address_provided = true;
>>>  			mce->u.ue_error.physical_address = phys_addr;
>>> -			mce->u.ue_error.ignore_event = mce_err->ignore_event;
>>>  			machine_check_ue_event(mce);
>>>  		}
>>>  	}
>> Small nit:
>> Setting ignore_event can happen before the phys_addr check, under the existing
>> check for MCE_ERROR_TYPE_UE, instead of repeating the same condition again.
>
> In some cases we may not get the effective address either, so the assignment
> is placed before the effective address check.

Yes, I forgot the last two lines in the changelog after I applied the patch :-)

Thanks,
Santosh

>
>>
>> Except for the above nit
>>
>> Reviewed-by: Santosh Sivaraj <santosh@fossix.org>
>>
>> Thanks,
>> Santosh
>>> -- 
>>> 2.26.2
On Wed, 7 Apr 2021 10:28:16 +0530, Ganesh Goudar wrote:
> When we hit a UE while using the machine check safe copy routines, the
> ignore_event flag is set and the event is ignored by the MCE handler.
> The flag is also saved for deferred handling and printing of the MCE
> event information. As of now, however, the flag is saved only if the
> effective address is provided and the physical address is calculated,
> which is not right.
>
> [...]

Applied to powerpc/next.

[1/1] powerpc/mce: save ignore_event flag unconditionally for UE
      https://git.kernel.org/powerpc/c/92d9d61be519f32f16c07602db5bcbe30a0836fe

cheers
On 4/7/21 10:28 AM, Ganesh Goudar wrote:
> When we hit a UE while using the machine check safe copy routines, the
> ignore_event flag is set and the event is ignored by the MCE handler.
> The flag is also saved for deferred handling and printing of the MCE
> event information. As of now, however, the flag is saved only if the
> effective address is provided and the physical address is calculated,
> which is not right.
>
> Save the ignore_event flag regardless of whether the effective address
> is provided or the physical address is calculated.
>
> Without this change, the following log is seen when the event is to be
> ignored:
>
> [ 512.971365] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
> [ 512.971509] MCE: CPU1: NIP: [c0000000000b67c0] memcpy+0x40/0x90
> [ 512.971655] MCE: CPU1: Initiator CPU
> [ 512.971739] MCE: CPU1: Unknown
> [ 512.972209] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
> [ 512.972334] MCE: CPU1: NIP: [c0000000000b6808] memcpy+0x88/0x90
> [ 512.972456] MCE: CPU1: Initiator CPU
> [ 512.972534] MCE: CPU1: Unknown
>
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> ---
> arch/powerpc/kernel/mce.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
Hi mpe, any comments on this patch?
On 4/22/21 11:31 AM, Ganesh wrote:
> On 4/7/21 10:28 AM, Ganesh Goudar wrote:
>
>> When we hit a UE while using the machine check safe copy routines, the
>> ignore_event flag is set and the event is ignored by the MCE handler.
>> The flag is also saved for deferred handling and printing of the MCE
>> event information. As of now, however, the flag is saved only if the
>> effective address is provided and the physical address is calculated,
>> which is not right.
>>
>> Save the ignore_event flag regardless of whether the effective address
>> is provided or the physical address is calculated.
>>
>> Without this change, the following log is seen when the event is to be
>> ignored:
>>
>> [ 512.971365] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
>> [ 512.971509] MCE: CPU1: NIP: [c0000000000b67c0] memcpy+0x40/0x90
>> [ 512.971655] MCE: CPU1: Initiator CPU
>> [ 512.971739] MCE: CPU1: Unknown
>> [ 512.972209] MCE: CPU1: machine check (Severe) UE Load/Store [Recovered]
>> [ 512.972334] MCE: CPU1: NIP: [c0000000000b6808] memcpy+0x88/0x90
>> [ 512.972456] MCE: CPU1: Initiator CPU
>> [ 512.972534] MCE: CPU1: Unknown
>>
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> ---
>> arch/powerpc/kernel/mce.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
> Hi mpe, any comments on this patch?
Please ignore, I see it's applied.