All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] powerpc: Handle MCE on POWER9 with only DSISR bit 33 set
@ 2017-09-21  2:04 Michael Neuling
  2017-09-21  8:18 ` Nicholas Piggin
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Neuling @ 2017-09-21  2:04 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, mikey, benh, Balbir Singh

On POWER9 DD2.1 and below, it's possible to get Machine Check
Exception (MCE) where only DSISR bit 33 is set. This will result in
the linux MCE handler seeing an unknown event, which triggers linux to
crash.

We change this by detecting unknown events in the MCE handler and
marking them as handled so that we no longer crash. We do this only on
chip revisions known to have this problem.

MCE that occurs like this is spurious, so we don't need to do anything
in terms of servicing it. If there is something that needs to be
serviced, the CPU will raise the MCE again with the correct DSISR so
that it can be serviced properly.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
v2 update commit message based on Balbir's comments
---
 arch/powerpc/kernel/mce_power.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index b76ca198e0..72ec667136 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -595,6 +595,7 @@ static long mce_handle_error(struct pt_regs *regs,
 	uint64_t addr;
 	uint64_t srr1 = regs->msr;
 	long handled;
+	unsigned long pvr;
 
 	if (SRR1_MC_LOADSTORE(srr1))
 		handled = mce_handle_derror(regs, dtable, &mce_err, &addr);
@@ -604,6 +605,20 @@ static long mce_handle_error(struct pt_regs *regs,
 	if (!handled && mce_err.error_type == MCE_ERROR_TYPE_UE)
 		handled = mce_handle_ue_error(regs);
 
+	/*
+	 * On POWER9 DD2.1 and below, it's possible to get machine
+	 * check where only DSISR bit 33 is set. This will result in
+	 * the MCE handler seeing an unknown event and us crashing.
+	 * Change this to mark as handled on these revisions.
+	 */
+	pvr = mfspr(SPRN_PVR);
+	if (((PVR_VER(pvr) == PVR_POWER9) &&
+	     (PVR_CFG(pvr) == 2) &&
+	     (PVR_MIN(pvr) <= 1)) || cpu_has_feature(CPU_FTR_POWER9_DD1))
+		/* DD2.1 and below */
+		if (mce_err.error_type == MCE_ERROR_TYPE_UNKNOWN)
+		    handled = 1;
+
 	save_mce_event(regs, handled, &mce_err, regs->nip, addr);
 
 	return handled;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] powerpc: Handle MCE on POWER9 with only DSISR bit 33 set
  2017-09-21  2:04 [PATCH v2] powerpc: Handle MCE on POWER9 with only DSISR bit 33 set Michael Neuling
@ 2017-09-21  8:18 ` Nicholas Piggin
  2017-09-21  9:57   ` Michael Neuling
  0 siblings, 1 reply; 4+ messages in thread
From: Nicholas Piggin @ 2017-09-21  8:18 UTC (permalink / raw)
  To: Michael Neuling; +Cc: mpe, linuxppc-dev

On Thu, 21 Sep 2017 12:04:34 +1000
Michael Neuling <mikey@neuling.org> wrote:

> On POWER9 DD2.1 and below, it's possible to get Machine Check
> Exception (MCE) where only DSISR bit 33 is set. This will result in
> the linux MCE handler seeing an unknown event, which triggers linux to
> crash.
> 
> We change this by detecting unknown events in the MCE handler and
> marking them as handled so that we no longer crash. We do this only on
> chip revisions known to have this problem.
> 
> MCE that occurs like this is spurious, so we don't need to do anything
> in terms of servicing it. If there is something that needs to be
> serviced, the CPU will raise the MCE again with the correct DSISR so
> that it can be serviced properly.
> 
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
> v2 update commit message based on Balbir's comments
> ---
>  arch/powerpc/kernel/mce_power.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
> index b76ca198e0..72ec667136 100644
> --- a/arch/powerpc/kernel/mce_power.c
> +++ b/arch/powerpc/kernel/mce_power.c
> @@ -595,6 +595,7 @@ static long mce_handle_error(struct pt_regs *regs,
>  	uint64_t addr;
>  	uint64_t srr1 = regs->msr;
>  	long handled;
> +	unsigned long pvr;
>  
>  	if (SRR1_MC_LOADSTORE(srr1))
>  		handled = mce_handle_derror(regs, dtable, &mce_err, &addr);
> @@ -604,6 +605,20 @@ static long mce_handle_error(struct pt_regs *regs,
>  	if (!handled && mce_err.error_type == MCE_ERROR_TYPE_UE)
>  		handled = mce_handle_ue_error(regs);
>  
> +	/*
> +	 * On POWER9 DD2.1 and below, it's possible to get machine
> +	 * check where only DSISR bit 33 is set. This will result in
> +	 * the MCE handler seeing an unknown event and us crashing.
> +	 * Change this to mark as handled on these revisions.
> +	 */
> +	pvr = mfspr(SPRN_PVR);
> +	if (((PVR_VER(pvr) == PVR_POWER9) &&
> +	     (PVR_CFG(pvr) == 2) &&
> +	     (PVR_MIN(pvr) <= 1)) || cpu_has_feature(CPU_FTR_POWER9_DD1))
> +		/* DD2.1 and below */
> +		if (mce_err.error_type == MCE_ERROR_TYPE_UNKNOWN)
> +		    handled = 1;

I might be missing something, but can you just do

  if (regs->dsisr == 0x40000000)
      return 1;

In __machine_check_early_realmode_p9() ?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] powerpc: Handle MCE on POWER9 with only DSISR bit 33 set
  2017-09-21  8:18 ` Nicholas Piggin
@ 2017-09-21  9:57   ` Michael Neuling
  2017-09-21 12:44     ` Nicholas Piggin
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Neuling @ 2017-09-21  9:57 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: mpe, linuxppc-dev

On Thu, 2017-09-21 at 18:18 +1000, Nicholas Piggin wrote:
> On Thu, 21 Sep 2017 12:04:34 +1000
> Michael Neuling <mikey@neuling.org> wrote:
>=20
> > On POWER9 DD2.1 and below, it's possible to get Machine Check
> > Exception (MCE) where only DSISR bit 33 is set. This will result in
> > the linux MCE handler seeing an unknown event, which triggers linux to
> > crash.
> >=20
> > We change this by detecting unknown events in the MCE handler and
> > marking them as handled so that we no longer crash. We do this only on
> > chip revisions known to have this problem.
> >=20
> > MCE that occurs like this is spurious, so we don't need to do anything
> > in terms of servicing it. If there is something that needs to be
> > serviced, the CPU will raise the MCE again with the correct DSISR so
> > that it can be serviced properly.
> >=20
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> > v2 update commit message based on Balbir's comments
> > ---
> > =C2=A0arch/powerpc/kernel/mce_power.c | 15 +++++++++++++++
> > =C2=A01 file changed, 15 insertions(+)
> >=20
> > diff --git a/arch/powerpc/kernel/mce_power.c
> > b/arch/powerpc/kernel/mce_power.c
> > index b76ca198e0..72ec667136 100644
> > --- a/arch/powerpc/kernel/mce_power.c
> > +++ b/arch/powerpc/kernel/mce_power.c
> > @@ -595,6 +595,7 @@ static long mce_handle_error(struct pt_regs *regs,
> > =C2=A0	uint64_t addr;
> > =C2=A0	uint64_t srr1 =3D regs->msr;
> > =C2=A0	long handled;
> > +	unsigned long pvr;
> > =C2=A0
> > =C2=A0	if (SRR1_MC_LOADSTORE(srr1))
> > =C2=A0		handled =3D mce_handle_derror(regs, dtable, &mce_err, &addr);
> > @@ -604,6 +605,20 @@ static long mce_handle_error(struct pt_regs *regs,
> > =C2=A0	if (!handled && mce_err.error_type =3D=3D MCE_ERROR_TYPE_UE)
> > =C2=A0		handled =3D mce_handle_ue_error(regs);
> > =C2=A0
> > +	/*
> > +	=C2=A0* On POWER9 DD2.1 and below, it's possible to get machine
> > +	=C2=A0* check where only DSISR bit 33 is set. This will result in
> > +	=C2=A0* the MCE handler seeing an unknown event and us crashing.
> > +	=C2=A0* Change this to mark as handled on these revisions.
> > +	=C2=A0*/
> > +	pvr =3D mfspr(SPRN_PVR);
> > +	if (((PVR_VER(pvr) =3D=3D PVR_POWER9) &&
> > +	=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0(PVR_CFG(pvr) =3D=3D 2) &&
> > +	=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0(PVR_MIN(pvr) <=3D 1)) || cpu_has_featu=
re(CPU_FTR_POWER9_DD1))
> > +		/* DD2.1 and below */
> > +		if (mce_err.error_type =3D=3D MCE_ERROR_TYPE_UNKNOWN)
> > +		=C2=A0=C2=A0=C2=A0=C2=A0handled =3D 1;
>=20
> I might be missing something, but can you just do
>=20
> =C2=A0 if (regs->dsisr =3D=3D 0x40000000)
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return 1;
>=20
> In __machine_check_early_realmode_p9() ?

You're right, thanks.

Mikey

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] powerpc: Handle MCE on POWER9 with only DSISR bit 33 set
  2017-09-21  9:57   ` Michael Neuling
@ 2017-09-21 12:44     ` Nicholas Piggin
  0 siblings, 0 replies; 4+ messages in thread
From: Nicholas Piggin @ 2017-09-21 12:44 UTC (permalink / raw)
  To: Michael Neuling; +Cc: mpe, linuxppc-dev

On Thu, 21 Sep 2017 19:57:20 +1000
Michael Neuling <mikey@neuling.org> wrote:

> On Thu, 2017-09-21 at 18:18 +1000, Nicholas Piggin wrote:
> > On Thu, 21 Sep 2017 12:04:34 +1000
> > Michael Neuling <mikey@neuling.org> wrote:
> >   
> > > On POWER9 DD2.1 and below, it's possible to get Machine Check
> > > Exception (MCE) where only DSISR bit 33 is set. This will result in
> > > the linux MCE handler seeing an unknown event, which triggers linux to
> > > crash.
> > > 
> > > We change this by detecting unknown events in the MCE handler and
> > > marking them as handled so that we no longer crash. We do this only on
> > > chip revisions known to have this problem.
> > > 
> > > MCE that occurs like this is spurious, so we don't need to do anything
> > > in terms of servicing it. If there is something that needs to be
> > > serviced, the CPU will raise the MCE again with the correct DSISR so
> > > that it can be serviced properly.
> > > 
> > > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > > ---
> > > v2 update commit message based on Balbir's comments
> > > ---
> > >  arch/powerpc/kernel/mce_power.c | 15 +++++++++++++++
> > >  1 file changed, 15 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/kernel/mce_power.c
> > > b/arch/powerpc/kernel/mce_power.c
> > > index b76ca198e0..72ec667136 100644
> > > --- a/arch/powerpc/kernel/mce_power.c
> > > +++ b/arch/powerpc/kernel/mce_power.c
> > > @@ -595,6 +595,7 @@ static long mce_handle_error(struct pt_regs *regs,
> > >  	uint64_t addr;
> > >  	uint64_t srr1 = regs->msr;
> > >  	long handled;
> > > +	unsigned long pvr;
> > >  
> > >  	if (SRR1_MC_LOADSTORE(srr1))
> > >  		handled = mce_handle_derror(regs, dtable, &mce_err, &addr);
> > > @@ -604,6 +605,20 @@ static long mce_handle_error(struct pt_regs *regs,
> > >  	if (!handled && mce_err.error_type == MCE_ERROR_TYPE_UE)
> > >  		handled = mce_handle_ue_error(regs);
> > >  
> > > +	/*
> > > +	 * On POWER9 DD2.1 and below, it's possible to get machine
> > > +	 * check where only DSISR bit 33 is set. This will result in
> > > +	 * the MCE handler seeing an unknown event and us crashing.
> > > +	 * Change this to mark as handled on these revisions.
> > > +	 */
> > > +	pvr = mfspr(SPRN_PVR);
> > > +	if (((PVR_VER(pvr) == PVR_POWER9) &&
> > > +	     (PVR_CFG(pvr) == 2) &&
> > > +	     (PVR_MIN(pvr) <= 1)) || cpu_has_feature(CPU_FTR_POWER9_DD1))
> > > +		/* DD2.1 and below */
> > > +		if (mce_err.error_type == MCE_ERROR_TYPE_UNKNOWN)
> > > +		    handled = 1;  
> > 
> > I might be missing something, but can you just do
> > 
> >   if (regs->dsisr == 0x40000000)
> >       return 1;
> > 
> > In __machine_check_early_realmode_p9() ?  
> 
> You're right, thanks.

If you leave the PVR and DD1 checks in there, it would be a good
reminder for me to convert into a quirk if I can get this version
specific quirks stuff going

https://marc.info/?l=linuxppc-embedded&m=150597337720114&w=2

Thanks,
Nick

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-09-21 12:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-21  2:04 [PATCH v2] powerpc: Handle MCE on POWER9 with only DSISR bit 33 set Michael Neuling
2017-09-21  8:18 ` Nicholas Piggin
2017-09-21  9:57   ` Michael Neuling
2017-09-21 12:44     ` Nicholas Piggin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.