linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86: Call fixup_exception() before notify_die() in math_error()
@ 2018-06-14 19:36 Siarhei Liakh
  2018-06-18 21:47 ` Andy Lutomirski
  2018-06-20  9:48 ` [tip:x86/urgent] " tip-bot for Siarhei Liakh
  0 siblings, 2 replies; 6+ messages in thread
From: Siarhei Liakh @ 2018-06-14 19:36 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov

fpu__drop() has an explicit fwait which under some conditions can trigger
a fixable FPU exception while in kernel. Thus, we should attempt to fixup
the exception first, and only call notify_die() if the fixup failed just
like in do_general_protection(). The original call sequence incorrectly
triggers KDB entry on debug kernels under particular FPU-intensive
workloads. This issue had been privately observed, fixed, and tested 
on 4.9.98, while this patch brings the fix to the upstream.

Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
---

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a535dd6..68d77a3 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -835,16 +835,18 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
 	char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" :
 						"simd exception";
 
-	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, SIGFPE) == NOTIFY_STOP)
-		return;
 	cond_local_irq_enable(regs);
 
 	if (!user_mode(regs)) {
-		if (!fixup_exception(regs, trapnr)) {
-			task->thread.error_code = error_code;
-			task->thread.trap_nr = trapnr;
+		if (fixup_exception(regs, trapnr))
+			return;
+
+		task->thread.error_code = error_code;
+		task->thread.trap_nr = trapnr;
+
+		if (notify_die(DIE_TRAP, str, regs, error_code,
+					trapnr, SIGFPE) != NOTIFY_STOP)
 			die(str, regs, error_code);
-		}
 		return;
 	}
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Call fixup_exception() before notify_die() in math_error()
  2018-06-14 19:36 [PATCH] x86: Call fixup_exception() before notify_die() in math_error() Siarhei Liakh
@ 2018-06-18 21:47 ` Andy Lutomirski
  2018-06-19  6:23   ` Thomas Gleixner
  2018-06-20  9:48 ` [tip:x86/urgent] " tip-bot for Siarhei Liakh
  1 sibling, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2018-06-18 21:47 UTC (permalink / raw)
  To: Siarhei.Liakh
  Cc: LKML, X86 ML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Lutomirski, Borislav Petkov

On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh
<Siarhei.Liakh@concurrent-rt.com> wrote:
>
> fpu__drop() has an explicit fwait which under some conditions can trigger
> a fixable FPU exception while in kernel. Thus, we should attempt to fixup
> the exception first, and only call notify_die() if the fixup failed just
> like in do_general_protection(). The original call sequence incorrectly
> triggers KDB entry on debug kernels under particular FPU-intensive
> workloads. This issue had been privately observed, fixed, and tested
> on 4.9.98, while this patch brings the fix to the upstream.

Reviewed-by: Andy Lutomirski <luto@kernel.org>

With the caveat that you are perpetuating what is arguably a bug in
some of the other entries: math_error() can now be called with IRQs
off and return with IRQs on.  If we actually start asserting good
behavior in the entry code, we'll need to fix this.

--Andy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Call fixup_exception() before notify_die() in math_error()
  2018-06-18 21:47 ` Andy Lutomirski
@ 2018-06-19  6:23   ` Thomas Gleixner
  2018-06-19 15:23     ` Andy Lutomirski
       [not found]     ` <DM5PR11MB2011397361BB5C0FB3D4FDFFB1700@DM5PR11MB2011.namprd11.prod.outlook.com>
  0 siblings, 2 replies; 6+ messages in thread
From: Thomas Gleixner @ 2018-06-19  6:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Siarhei.Liakh, LKML, X86 ML, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov

On Mon, 18 Jun 2018, Andy Lutomirski wrote:

> On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh
> <Siarhei.Liakh@concurrent-rt.com> wrote:
> >
> > fpu__drop() has an explicit fwait which under some conditions can trigger
> > a fixable FPU exception while in kernel. Thus, we should attempt to fixup
> > the exception first, and only call notify_die() if the fixup failed just
> > like in do_general_protection(). The original call sequence incorrectly
> > triggers KDB entry on debug kernels under particular FPU-intensive
> > workloads. This issue had been privately observed, fixed, and tested
> > on 4.9.98, while this patch brings the fix to the upstream.
> 
> Reviewed-by: Andy Lutomirski <luto@kernel.org>
> 
> With the caveat that you are perpetuating what is arguably a bug in
> some of the other entries: math_error() can now be called with IRQs
> off and return with IRQs on.  If we actually start asserting good
> behavior in the entry code, we'll need to fix this.

Confused. math_error() is still invoked with interrupts off. What's
different now is that notify_die() is called with interrupts conditionally
enabled while upstream it's always called with interrupts disabled.

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Call fixup_exception() before notify_die() in math_error()
  2018-06-19  6:23   ` Thomas Gleixner
@ 2018-06-19 15:23     ` Andy Lutomirski
       [not found]     ` <DM5PR11MB2011397361BB5C0FB3D4FDFFB1700@DM5PR11MB2011.namprd11.prod.outlook.com>
  1 sibling, 0 replies; 6+ messages in thread
From: Andy Lutomirski @ 2018-06-19 15:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Siarhei.Liakh, LKML, X86 ML, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov



> On Jun 18, 2018, at 11:23 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>> On Mon, 18 Jun 2018, Andy Lutomirski wrote:
>> 
>> On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh
>> <Siarhei.Liakh@concurrent-rt.com> wrote:
>>> 
>>> fpu__drop() has an explicit fwait which under some conditions can trigger
>>> a fixable FPU exception while in kernel. Thus, we should attempt to fixup
>>> the exception first, and only call notify_die() if the fixup failed just
>>> like in do_general_protection(). The original call sequence incorrectly
>>> triggers KDB entry on debug kernels under particular FPU-intensive
>>> workloads. This issue had been privately observed, fixed, and tested
>>> on 4.9.98, while this patch brings the fix to the upstream.
>> 
>> Reviewed-by: Andy Lutomirski <luto@kernel.org>
>> 
>> With the caveat that you are perpetuating what is arguably a bug in
>> some of the other entries: math_error() can now be called with IRQs
>> off and return with IRQs on.  If we actually start asserting good
>> behavior in the entry code, we'll need to fix this.
> 
> Confused. math_error() is still invoked with interrupts off. What's
> different now is that notify_die() is called with interrupts conditionally
> enabled while upstream it's always called with interrupts disabled.

True, but I don’t think that matters. What I’m grumbling about is that we can do cond_local_irq_enable() and then return without local_irq_disable().

Anyway, I think the patch is fine as is. We can unsuck the entry IRQ handling another day.

> 
> Thanks,
> 
>    tglx
> 
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Call fixup_exception() before notify_die() in math_error()
       [not found]       ` <B739D98B-BA3F-40F6-82DF-6444546B6B4C@amacapital.net>
@ 2018-06-19 19:56         ` Siarhei Liakh
  0 siblings, 0 replies; 6+ messages in thread
From: Siarhei Liakh @ 2018-06-19 19:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Andy Lutomirski, LKML, X86 ML, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov

On Tue, 19 Jun 2018, Andy Lutomirski wrote:   

> On Jun 19, 2018, at 9:15 AM, Siarhei Liakh <Siarhei.Liakh@concurrent-rt.com> wrote:
> 
> > On Mon, 18 Jun 2018, Andy Lutomirski wrote:
> > 
> > > > On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh
> > > > <Siarhei.Liakh@concurrent-rt.com> wrote:
> > > > >
> > > > > fpu__drop() has an explicit fwait which under some conditions can trigger
> > > > > a fixable FPU exception while in kernel. Thus, we should attempt to fixup
> > > > > the exception first, and only call notify_die() if the fixup failed just
> > > > > like in do_general_protection(). The original call sequence incorrectly
> > > > > triggers KDB entry on debug kernels under particular FPU-intensive
> > > > > workloads. This issue had been privately observed, fixed, and tested
> > > > > on 4.9.98, while this patch brings the fix to the upstream.
> > > > 
> > > > Reviewed-by: Andy Lutomirski <luto@kernel.org>
> > > > 
> > > > With the caveat that you are perpetuating what is arguably a bug in
> > > > some of the other entries: math_error() can now be called with IRQs
> > > > off and return with IRQs on.  If we actually start asserting good
> > > > behavior in the entry code, we'll need to fix this.
> > > 
> > > Confused. math_error() is still invoked with interrupts off. What's
> > > different now is that notify_die() is called with interrupts conditionally
> > > enabled while upstream it's always called with interrupts disabled.
> > 
> > I see that notify_die() is being called either way in upstream (ex:
> > do_general_protection() and do_iret_error() vs do_bounds() and etc.).
> > Is there some some sort of general policy/guide documentation available
> > which outlines the expectations of notify_die(), as well as its notifiers?
> 
> I doubt it.
> 
> The right fix is to delete notify_die(), not to document it. kernel debuggers should
> hook die() directly, and other users (if any) should be moved into the error handlers.

Got it. Unfortunately, this looks like a whole separate code refactoring project
which I cannot undertake at this time. In the mean time, this patch offers a fix for
an immediate issue (KDB tripped when it shouldn't) even if it does nothing to
address the deficiencies in the framework itself. 

Thank you.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip:x86/urgent] x86: Call fixup_exception() before notify_die() in math_error()
  2018-06-14 19:36 [PATCH] x86: Call fixup_exception() before notify_die() in math_error() Siarhei Liakh
  2018-06-18 21:47 ` Andy Lutomirski
@ 2018-06-20  9:48 ` tip-bot for Siarhei Liakh
  1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Siarhei Liakh @ 2018-06-20  9:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Siarhei.Liakh, linux-kernel, bpetkov, tglx, siarhei.liakh, mingo,
	luto, hpa

Commit-ID:  3ae6295ccb7cf6d344908209701badbbbb503e40
Gitweb:     https://git.kernel.org/tip/3ae6295ccb7cf6d344908209701badbbbb503e40
Author:     Siarhei Liakh <Siarhei.Liakh@concurrent-rt.com>
AuthorDate: Thu, 14 Jun 2018 19:36:07 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 20 Jun 2018 11:44:56 +0200

x86: Call fixup_exception() before notify_die() in math_error()

fpu__drop() has an explicit fwait which under some conditions can trigger a
fixable FPU exception while in kernel. Thus, we should attempt to fixup the
exception first, and only call notify_die() if the fixup failed just like
in do_general_protection(). The original call sequence incorrectly triggers
KDB entry on debug kernels under particular FPU-intensive workloads.

Andy noted, that this makes the whole conditional irq enable thing even
more inconsistent, but fixing that it outside the scope of this.

Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Borislav  Petkov" <bpetkov@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/DM5PR11MB201156F1CAB2592B07C79A03B17D0@DM5PR11MB2011.namprd11.prod.outlook.com

---
 arch/x86/kernel/traps.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 03f3d7695dac..162a31d80ad5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -834,16 +834,18 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
 	char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" :
 						"simd exception";
 
-	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, SIGFPE) == NOTIFY_STOP)
-		return;
 	cond_local_irq_enable(regs);
 
 	if (!user_mode(regs)) {
-		if (!fixup_exception(regs, trapnr)) {
-			task->thread.error_code = error_code;
-			task->thread.trap_nr = trapnr;
+		if (fixup_exception(regs, trapnr))
+			return;
+
+		task->thread.error_code = error_code;
+		task->thread.trap_nr = trapnr;
+
+		if (notify_die(DIE_TRAP, str, regs, error_code,
+					trapnr, SIGFPE) != NOTIFY_STOP)
 			die(str, regs, error_code);
-		}
 		return;
 	}
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-06-20  9:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-14 19:36 [PATCH] x86: Call fixup_exception() before notify_die() in math_error() Siarhei Liakh
2018-06-18 21:47 ` Andy Lutomirski
2018-06-19  6:23   ` Thomas Gleixner
2018-06-19 15:23     ` Andy Lutomirski
     [not found]     ` <DM5PR11MB2011397361BB5C0FB3D4FDFFB1700@DM5PR11MB2011.namprd11.prod.outlook.com>
     [not found]       ` <B739D98B-BA3F-40F6-82DF-6444546B6B4C@amacapital.net>
2018-06-19 19:56         ` Siarhei Liakh
2018-06-20  9:48 ` [tip:x86/urgent] " tip-bot for Siarhei Liakh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).