All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86: Call fixup_exception() before notify_die() in math_error()
@ 2018-06-14 19:36 Siarhei Liakh
  2018-06-18 21:47 ` Andy Lutomirski
  2018-06-20  9:48 ` [tip:x86/urgent] " tip-bot for Siarhei Liakh
  0 siblings, 2 replies; 6+ messages in thread
From: Siarhei Liakh @ 2018-06-14 19:36 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov

fpu__drop() has an explicit fwait which under some conditions can trigger
a fixable FPU exception while in kernel. Thus, we should attempt to fixup
the exception first, and only call notify_die() if the fixup failed just
like in do_general_protection(). The original call sequence incorrectly
triggers KDB entry on debug kernels under particular FPU-intensive
workloads. This issue had been privately observed, fixed, and tested 
on 4.9.98, while this patch brings the fix to the upstream.

Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
---

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a535dd6..68d77a3 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -835,16 +835,18 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
 	char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" :
 						"simd exception";
 
-	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, SIGFPE) == NOTIFY_STOP)
-		return;
 	cond_local_irq_enable(regs);
 
 	if (!user_mode(regs)) {
-		if (!fixup_exception(regs, trapnr)) {
-			task->thread.error_code = error_code;
-			task->thread.trap_nr = trapnr;
+		if (fixup_exception(regs, trapnr))
+			return;
+
+		task->thread.error_code = error_code;
+		task->thread.trap_nr = trapnr;
+
+		if (notify_die(DIE_TRAP, str, regs, error_code,
+					trapnr, SIGFPE) != NOTIFY_STOP)
 			die(str, regs, error_code);
-		}
 		return;
 	}
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Call fixup_exception() before notify_die() in math_error()
  2018-06-14 19:36 [PATCH] x86: Call fixup_exception() before notify_die() in math_error() Siarhei Liakh
@ 2018-06-18 21:47 ` Andy Lutomirski
  2018-06-19  6:23   ` Thomas Gleixner
  2018-06-20  9:48 ` [tip:x86/urgent] " tip-bot for Siarhei Liakh
  1 sibling, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2018-06-18 21:47 UTC (permalink / raw)
  To: Siarhei.Liakh
  Cc: LKML, X86 ML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Lutomirski, Borislav Petkov

On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh
<Siarhei.Liakh@concurrent-rt.com> wrote:
>
> fpu__drop() has an explicit fwait which under some conditions can trigger
> a fixable FPU exception while in kernel. Thus, we should attempt to fixup
> the exception first, and only call notify_die() if the fixup failed just
> like in do_general_protection(). The original call sequence incorrectly
> triggers KDB entry on debug kernels under particular FPU-intensive
> workloads. This issue had been privately observed, fixed, and tested
> on 4.9.98, while this patch brings the fix to the upstream.

Reviewed-by: Andy Lutomirski <luto@kernel.org>

With the caveat that you are perpetuating what is arguably a bug in
some of the other entries: math_error() can now be called with IRQs
off and return with IRQs on.  If we actually start asserting good
behavior in the entry code, we'll need to fix this.

--Andy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Call fixup_exception() before notify_die() in math_error()
  2018-06-18 21:47 ` Andy Lutomirski
@ 2018-06-19  6:23   ` Thomas Gleixner
  2018-06-19 15:23     ` Andy Lutomirski
       [not found]     ` <DM5PR11MB2011397361BB5C0FB3D4FDFFB1700@DM5PR11MB2011.namprd11.prod.outlook.com>
  0 siblings, 2 replies; 6+ messages in thread
From: Thomas Gleixner @ 2018-06-19  6:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Siarhei.Liakh, LKML, X86 ML, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov

On Mon, 18 Jun 2018, Andy Lutomirski wrote:

> On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh
> <Siarhei.Liakh@concurrent-rt.com> wrote:
> >
> > fpu__drop() has an explicit fwait which under some conditions can trigger
> > a fixable FPU exception while in kernel. Thus, we should attempt to fixup
> > the exception first, and only call notify_die() if the fixup failed just
> > like in do_general_protection(). The original call sequence incorrectly
> > triggers KDB entry on debug kernels under particular FPU-intensive
> > workloads. This issue had been privately observed, fixed, and tested
> > on 4.9.98, while this patch brings the fix to the upstream.
> 
> Reviewed-by: Andy Lutomirski <luto@kernel.org>
> 
> With the caveat that you are perpetuating what is arguably a bug in
> some of the other entries: math_error() can now be called with IRQs
> off and return with IRQs on.  If we actually start asserting good
> behavior in the entry code, we'll need to fix this.

Confused. math_error() is still invoked with interrupts off. What's
different now is that notify_die() is called with interrupts conditionally
enabled while upstream it's always called with interrupts disabled.

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Call fixup_exception() before notify_die() in math_error()
  2018-06-19  6:23   ` Thomas Gleixner
@ 2018-06-19 15:23     ` Andy Lutomirski
       [not found]     ` <DM5PR11MB2011397361BB5C0FB3D4FDFFB1700@DM5PR11MB2011.namprd11.prod.outlook.com>
  1 sibling, 0 replies; 6+ messages in thread
From: Andy Lutomirski @ 2018-06-19 15:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Siarhei.Liakh, LKML, X86 ML, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov



> On Jun 18, 2018, at 11:23 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>> On Mon, 18 Jun 2018, Andy Lutomirski wrote:
>> 
>> On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh
>> <Siarhei.Liakh@concurrent-rt.com> wrote:
>>> 
>>> fpu__drop() has an explicit fwait which under some conditions can trigger
>>> a fixable FPU exception while in kernel. Thus, we should attempt to fixup
>>> the exception first, and only call notify_die() if the fixup failed just
>>> like in do_general_protection(). The original call sequence incorrectly
>>> triggers KDB entry on debug kernels under particular FPU-intensive
>>> workloads. This issue had been privately observed, fixed, and tested
>>> on 4.9.98, while this patch brings the fix to the upstream.
>> 
>> Reviewed-by: Andy Lutomirski <luto@kernel.org>
>> 
>> With the caveat that you are perpetuating what is arguably a bug in
>> some of the other entries: math_error() can now be called with IRQs
>> off and return with IRQs on.  If we actually start asserting good
>> behavior in the entry code, we'll need to fix this.
> 
> Confused. math_error() is still invoked with interrupts off. What's
> different now is that notify_die() is called with interrupts conditionally
> enabled while upstream it's always called with interrupts disabled.

True, but I don’t think that matters. What I’m grumbling about is that we can do cond_local_irq_enable() and then return without local_irq_disable().

Anyway, I think the patch is fine as is. We can unsuck the entry IRQ handling another day.

> 
> Thanks,
> 
>    tglx
> 
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Call fixup_exception() before notify_die() in math_error()
       [not found]       ` <B739D98B-BA3F-40F6-82DF-6444546B6B4C@amacapital.net>
@ 2018-06-19 19:56         ` Siarhei Liakh
  0 siblings, 0 replies; 6+ messages in thread
From: Siarhei Liakh @ 2018-06-19 19:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Andy Lutomirski, LKML, X86 ML, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov

On Tue, 19 Jun 2018, Andy Lutomirski wrote:   

> On Jun 19, 2018, at 9:15 AM, Siarhei Liakh <Siarhei.Liakh@concurrent-rt.com> wrote:
> 
> > On Mon, 18 Jun 2018, Andy Lutomirski wrote:
> > 
> > > > On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh
> > > > <Siarhei.Liakh@concurrent-rt.com> wrote:
> > > > >
> > > > > fpu__drop() has an explicit fwait which under some conditions can trigger
> > > > > a fixable FPU exception while in kernel. Thus, we should attempt to fixup
> > > > > the exception first, and only call notify_die() if the fixup failed just
> > > > > like in do_general_protection(). The original call sequence incorrectly
> > > > > triggers KDB entry on debug kernels under particular FPU-intensive
> > > > > workloads. This issue had been privately observed, fixed, and tested
> > > > > on 4.9.98, while this patch brings the fix to the upstream.
> > > > 
> > > > Reviewed-by: Andy Lutomirski <luto@kernel.org>
> > > > 
> > > > With the caveat that you are perpetuating what is arguably a bug in
> > > > some of the other entries: math_error() can now be called with IRQs
> > > > off and return with IRQs on.  If we actually start asserting good
> > > > behavior in the entry code, we'll need to fix this.
> > > 
> > > Confused. math_error() is still invoked with interrupts off. What's
> > > different now is that notify_die() is called with interrupts conditionally
> > > enabled while upstream it's always called with interrupts disabled.
> > 
> > I see that notify_die() is being called either way in upstream (ex:
> > do_general_protection() and do_iret_error() vs do_bounds() and etc.).
> > Is there some some sort of general policy/guide documentation available
> > which outlines the expectations of notify_die(), as well as its notifiers?
> 
> I doubt it.
> 
> The right fix is to delete notify_die(), not to document it. kernel debuggers should
> hook die() directly, and other users (if any) should be moved into the error handlers.

Got it. Unfortunately, this looks like a whole separate code refactoring project
which I cannot undertake at this time. In the mean time, this patch offers a fix for
an immediate issue (KDB tripped when it shouldn't) even if it does nothing to
address the deficiencies in the framework itself. 

Thank you.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip:x86/urgent] x86: Call fixup_exception() before notify_die() in math_error()
  2018-06-14 19:36 [PATCH] x86: Call fixup_exception() before notify_die() in math_error() Siarhei Liakh
  2018-06-18 21:47 ` Andy Lutomirski
@ 2018-06-20  9:48 ` tip-bot for Siarhei Liakh
  1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Siarhei Liakh @ 2018-06-20  9:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Siarhei.Liakh, linux-kernel, bpetkov, tglx, siarhei.liakh, mingo,
	luto, hpa

Commit-ID:  3ae6295ccb7cf6d344908209701badbbbb503e40
Gitweb:     https://git.kernel.org/tip/3ae6295ccb7cf6d344908209701badbbbb503e40
Author:     Siarhei Liakh <Siarhei.Liakh@concurrent-rt.com>
AuthorDate: Thu, 14 Jun 2018 19:36:07 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 20 Jun 2018 11:44:56 +0200

x86: Call fixup_exception() before notify_die() in math_error()

fpu__drop() has an explicit fwait which under some conditions can trigger a
fixable FPU exception while in kernel. Thus, we should attempt to fixup the
exception first, and only call notify_die() if the fixup failed just like
in do_general_protection(). The original call sequence incorrectly triggers
KDB entry on debug kernels under particular FPU-intensive workloads.

Andy noted, that this makes the whole conditional irq enable thing even
more inconsistent, but fixing that it outside the scope of this.

Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Borislav  Petkov" <bpetkov@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/DM5PR11MB201156F1CAB2592B07C79A03B17D0@DM5PR11MB2011.namprd11.prod.outlook.com

---
 arch/x86/kernel/traps.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 03f3d7695dac..162a31d80ad5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -834,16 +834,18 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
 	char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" :
 						"simd exception";
 
-	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, SIGFPE) == NOTIFY_STOP)
-		return;
 	cond_local_irq_enable(regs);
 
 	if (!user_mode(regs)) {
-		if (!fixup_exception(regs, trapnr)) {
-			task->thread.error_code = error_code;
-			task->thread.trap_nr = trapnr;
+		if (fixup_exception(regs, trapnr))
+			return;
+
+		task->thread.error_code = error_code;
+		task->thread.trap_nr = trapnr;
+
+		if (notify_die(DIE_TRAP, str, regs, error_code,
+					trapnr, SIGFPE) != NOTIFY_STOP)
 			die(str, regs, error_code);
-		}
 		return;
 	}
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-06-20  9:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-14 19:36 [PATCH] x86: Call fixup_exception() before notify_die() in math_error() Siarhei Liakh
2018-06-18 21:47 ` Andy Lutomirski
2018-06-19  6:23   ` Thomas Gleixner
2018-06-19 15:23     ` Andy Lutomirski
     [not found]     ` <DM5PR11MB2011397361BB5C0FB3D4FDFFB1700@DM5PR11MB2011.namprd11.prod.outlook.com>
     [not found]       ` <B739D98B-BA3F-40F6-82DF-6444546B6B4C@amacapital.net>
2018-06-19 19:56         ` Siarhei Liakh
2018-06-20  9:48 ` [tip:x86/urgent] " tip-bot for Siarhei Liakh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.