linux-kernel.vger.kernel.org archive mirror
* [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments
@ 2015-04-20 17:09 Andrew Cooper
  2015-04-20 17:11 ` David Vrabel
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Andrew Cooper @ 2015-04-20 17:09 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	linux-kernel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	David Vrabel, Rusty Russell, lguest

There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: David Vrabel <david.vrabel@citrix.com>
CC: xen-devel <xen-devel@lists.xen.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: lguest@lists.ozlabs.org

---
This patch is RFC because I am not certain that native_save_fl() is
necessarily the correct solution on lguest, but it does seem that setup_smap()
wants to check the actual AC bit, rather than an idealised value.

A different approach, given the dual nature of the AC flag now, is to gate
setup_smap() on a kernel rpl of 0.  SMAP necessarily can't be used in a
paravirtual situation where the kernel runs in cpl > 0.

Another different approach would be to formally state that
pv_irq_ops.save_fl() needs to return all the flags, which would make
local_irq_save() safe to use in this circumstance, but that makes a hotpath
longer for the sake of a single boot time check.
---
 arch/x86/kernel/cpu/common.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a62cf04..4f2fded 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -291,10 +291,9 @@ __setup("nosmap", setup_disable_smap);
 
 static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 {
-	unsigned long eflags;
+	unsigned long eflags = native_save_fl();
 
 	/* This should have been cleared long ago */
-	raw_local_save_flags(eflags);
 	BUG_ON(eflags & X86_EFLAGS_AC);
 
 	if (cpu_has(c, X86_FEATURE_SMAP)) {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments
  2015-04-20 17:09 [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments Andrew Cooper
@ 2015-04-20 17:11 ` David Vrabel
  2015-04-21  0:35 ` Andy Lutomirski
  2015-04-21  8:36 ` [Xen-devel] [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments Jan Beulich
  2 siblings, 0 replies; 11+ messages in thread
From: David Vrabel @ 2015-04-20 17:11 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Rusty Russell, lguest

On 20/04/15 18:09, Andrew Cooper wrote:
> There appears to be no formal statement of what pv_irq_ops.save_fl() is
> supposed to return precisely.  Native returns the full flags, while lguest and
> Xen only return the Interrupt Flag, and both have comments by the
> implementations stating that only the Interrupt Flag is looked at.  This may
> have been true when initially implemented, but no longer is.
> 
> To make matters worse, the Xen PVOP leaves the upper bits undefined, making
> the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
> guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
> build, but not consistent for all builds.  It has also been a sitting timebomb
> since SMAP support was introduced.
> 
> Use native_save_fl() instead, which will obtain an accurate view of the AC
> flag.

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

David



* Re: [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments
  2015-04-20 17:09 [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments Andrew Cooper
  2015-04-20 17:11 ` David Vrabel
@ 2015-04-21  0:35 ` Andy Lutomirski
  2015-04-21  5:07   ` Rusty Russell
                     ` (2 more replies)
  2015-04-21  8:36 ` [Xen-devel] [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments Jan Beulich
  2 siblings, 3 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-04-21  0:35 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Rusty Russell, lguest, X86 ML, Denys Vlasenko

On 04/20/2015 10:09 AM, Andrew Cooper wrote:
> There appears to be no formal statement of what pv_irq_ops.save_fl() is
> supposed to return precisely.  Native returns the full flags, while lguest and
> Xen only return the Interrupt Flag, and both have comments by the
> implementations stating that only the Interrupt Flag is looked at.  This may
> have been true when initially implemented, but no longer is.
>
> To make matters worse, the Xen PVOP leaves the upper bits undefined, making
> the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
> guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
> build, but not consistent for all builds.  It has also been a sitting timebomb
> since SMAP support was introduced.
>
> Use native_save_fl() instead, which will obtain an accurate view of the AC
> flag.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: H. Peter Anvin <hpa@zytor.com>
> CC: x86@kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: David Vrabel <david.vrabel@citrix.com>
> CC: xen-devel <xen-devel@lists.xen.org>
> CC: Rusty Russell <rusty@rustcorp.com.au>
> CC: lguest@lists.ozlabs.org
>
> ---
> This patch is RFC because I am not certain that native_save_fl() is
> necessarily the correct solution on lguest, but it does seem that setup_smap()
> wants to check the actual AC bit, rather than an idealised value.
>
> A different approach, given the dual nature of the AC flag now is to gate
> setup_smap() on a kernel rpl of 0.  SMAP necessarily can't be used in a
> paravirtual situation where the kernel runs in cpl > 0.
>
> Another different approach would be to formally state that
> pv_irq_ops.save_fl() needs to return all the flags, which would make
> local_irq_save() safe to use in this circumstance, but that makes a hotpath
> longer for the sake of a single boot time check.

...which reminds me:

Why does native_restore_fl restore anything other than IF?  A branch and 
sti should be considerably faster than popf.

Also, if we did this, could Xen use PVI and then use native_restore_fl 
and avoid lots of pvops?

--Andy


* Re: [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments
  2015-04-21  0:35 ` Andy Lutomirski
@ 2015-04-21  5:07   ` Rusty Russell
  2015-04-21  8:26   ` Andrew Cooper
  2015-04-21 12:45   ` [RFC PATCH] x86/asm/irq: Don't use POPF but STI Ingo Molnar
  2 siblings, 0 replies; 11+ messages in thread
From: Rusty Russell @ 2015-04-21  5:07 UTC (permalink / raw)
  To: Andy Lutomirski, Andrew Cooper, Xen-devel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel, lguest,
	X86 ML, Denys Vlasenko

Andy Lutomirski <luto@kernel.org> writes:
> On 04/20/2015 10:09 AM, Andrew Cooper wrote:
>> There appears to be no formal statement of what pv_irq_ops.save_fl() is
>> supposed to return precisely.  Native returns the full flags, while lguest and
>> Xen only return the Interrupt Flag, and both have comments by the
>> implementations stating that only the Interrupt Flag is looked at.  This may
>> have been true when initially implemented, but no longer is.
>>
>> To make matters worse, the Xen PVOP leaves the upper bits undefined, making
>> the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
>> guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
>> build, but not consistent for all builds.  It has also been a sitting timebomb
>> since SMAP support was introduced.
>>
>> Use native_save_fl() instead, which will obtain an accurate view of the AC
>> flag.

That should work for lguest.  Indeed, it does (in practice those bits
are 0).

Tested-by: Rusty Russell <rusty@rustcorp.com.au> (lguest)

Thanks,
Rusty.

>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Thomas Gleixner <tglx@linutronix.de>
>> CC: Ingo Molnar <mingo@redhat.com>
>> CC: H. Peter Anvin <hpa@zytor.com>
>> CC: x86@kernel.org
>> CC: linux-kernel@vger.kernel.org
>> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> CC: David Vrabel <david.vrabel@citrix.com>
>> CC: xen-devel <xen-devel@lists.xen.org>
>> CC: Rusty Russell <rusty@rustcorp.com.au>
>> CC: lguest@lists.ozlabs.org
>>
>> ---
>> This patch is RFC because I am not certain that native_save_fl() is
>> necessarily the correct solution on lguest, but it does seem that setup_smap()
>> wants to check the actual AC bit, rather than an idealised value.
>>
>> A different approach, given the dual nature of the AC flag now is to gate
>> setup_smap() on a kernel rpl of 0.  SMAP necessarily can't be used in a
>> paravirtual situation where the kernel runs in cpl > 0.
>>
>> Another different approach would be to formally state that
>> pv_irq_ops.save_fl() needs to return all the flags, which would make
>> local_irq_save() safe to use in this circumstance, but that makes a hotpath
>> longer for the sake of a single boot time check.
>
> ...which reminds me:
>
> Why does native_restore_fl restore anything other than IF?  A branch and 
> sti should be considerably faster than popf.
>
> Also, if we did this, could Xen use PVI and then use native_restore_fl 
> and avoid lots of pvops?
>
> --Andy


* Re: [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments
  2015-04-21  0:35 ` Andy Lutomirski
  2015-04-21  5:07   ` Rusty Russell
@ 2015-04-21  8:26   ` Andrew Cooper
  2015-04-21 12:45   ` [RFC PATCH] x86/asm/irq: Don't use POPF but STI Ingo Molnar
  2 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2015-04-21  8:26 UTC (permalink / raw)
  To: Andy Lutomirski, Xen-devel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Rusty Russell, lguest, Denys Vlasenko

On 21/04/2015 01:35, Andy Lutomirski wrote:
> On 04/20/2015 10:09 AM, Andrew Cooper wrote:
>> There appears to be no formal statement of what pv_irq_ops.save_fl() is
>> supposed to return precisely.  Native returns the full flags, while
>> lguest and
>> Xen only return the Interrupt Flag, and both have comments by the
>> implementations stating that only the Interrupt Flag is looked at. 
>> This may
>> have been true when initially implemented, but no longer is.
>>
>> To make matters worse, the Xen PVOP leaves the upper bits undefined,
>> making
>> the BUG_ON() undefined behaviour.  Experimentally, this now trips for
>> 32bit PV
>> guests on Broadwell hardware.  The BUG_ON() is consistent for an
>> individual
>> build, but not consistent for all builds.  It has also been a sitting
>> timebomb
>> since SMAP support was introduced.
>>
>> Use native_save_fl() instead, which will obtain an accurate view of
>> the AC
>> flag.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Thomas Gleixner <tglx@linutronix.de>
>> CC: Ingo Molnar <mingo@redhat.com>
>> CC: H. Peter Anvin <hpa@zytor.com>
>> CC: x86@kernel.org
>> CC: linux-kernel@vger.kernel.org
>> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> CC: David Vrabel <david.vrabel@citrix.com>
>> CC: xen-devel <xen-devel@lists.xen.org>
>> CC: Rusty Russell <rusty@rustcorp.com.au>
>> CC: lguest@lists.ozlabs.org
>>
>> ---
>> This patch is RFC because I am not certain that native_save_fl() is
>> necessarily the correct solution on lguest, but it does seem that
>> setup_smap()
>> wants to check the actual AC bit, rather than an idealised value.
>>
>> A different approach, given the dual nature of the AC flag now is to
>> gate
>> setup_smap() on a kernel rpl of 0.  SMAP necessarily can't be used in a
>> paravirtual situation where the kernel runs in cpl > 0.
>>
>> Another different approach would be to formally state that
>> pv_irq_ops.save_fl() needs to return all the flags, which would make
>> local_irq_save() safe to use in this circumstance, but that makes a
>> hotpath
>> longer for the sake of a single boot time check.
>
> ...which reminds me:
>
> Why does native_restore_fl restore anything other than IF?  A branch
> and sti should be considerably faster than popf.

I was wondering about the performance aspect, given a comment in your
patch which removed sysret64, but I hadn't had time to investigate yet.

Unfortunately, irq_save()/irq_enable()/irq_restore() appears to be a
common pattern in the kernel, making the irq_restore() disable interrupts.

The performance improvement might be worth explicitly moving the onus
into the caller with irq_maybe_disable()/irq_maybe_enable(), but that
does involve altering a lot of common code for an architecture specific
gain.

>
> Also, if we did this, could Xen use PVI and then use native_restore_fl
> and avoid lots of pvops?

Xen HVM guests already use the native pvops in this area, so would
benefit from any improvement.  PV guests on the other hand run with
cpl > 0 and instead have a writeable mask in a piece of shared memory
with Xen, and need the pvop.

~Andrew


* Re: [Xen-devel] [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments
  2015-04-20 17:09 [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments Andrew Cooper
  2015-04-20 17:11 ` David Vrabel
  2015-04-21  0:35 ` Andy Lutomirski
@ 2015-04-21  8:36 ` Jan Beulich
  2 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2015-04-21  8:36 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: David Vrabel, x86, Thomas Gleixner, lguest, Xen-devel,
	Boris Ostrovsky, Ingo Molnar, Rusty Russell, linux-kernel,
	H. Peter Anvin

>>> On 20.04.15 at 19:09, <andrew.cooper3@citrix.com> wrote:
> A different approach, given the dual nature of the AC flag now is to gate
> setup_smap() on a kernel rpl of 0.  SMAP necessarily can't be used in a
> paravirtual situation where the kernel runs in cpl > 0.

"Can't" isn't true here - for 64-bit PV Xen guests, which already
toggle between two page table variants for kernel and user mode,
it would be possible (but perhaps expensive) to mimic the needed
behavior by introducing a 3rd set of page tables, containing only
the kernel mappings. You may recall that I had even posted an
RFC patch to that effect about a year ago.

Jan



* [RFC PATCH] x86/asm/irq: Don't use POPF but STI
  2015-04-21  0:35 ` Andy Lutomirski
  2015-04-21  5:07   ` Rusty Russell
  2015-04-21  8:26   ` Andrew Cooper
@ 2015-04-21 12:45   ` Ingo Molnar
  2015-04-21 13:09     ` Borislav Petkov
  2015-04-21 16:12     ` Linus Torvalds
  2 siblings, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2015-04-21 12:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Cooper, Xen-devel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, linux-kernel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, David Vrabel, Rusty Russell, lguest,
	Denys Vlasenko, Linus Torvalds, Borislav Petkov


* Andy Lutomirski <luto@kernel.org> wrote:

> > Another different approach would be to formally state that 
> > pv_irq_ops.save_fl() needs to return all the flags, which would 
> > make local_irq_save() safe to use in this circumstance, but that 
> > makes a hotpath longer for the sake of a single boot time check.
> 
> ...which reminds me:
> 
> Why does native_restore_fl restore anything other than IF?  A branch 
> and sti should be considerably faster than popf.

Yes, this has come up in the past, something like the patch below?

Totally untested and not signed off yet, because we'd first have to 
make sure (via irq flags debugging) that it's not used in reverse, to 
re-disable interrupts:

	local_irq_save(flags);
	local_irq_enable();
	...
	local_irq_restore(flags); /* effective local_irq_disable() */

I don't think we have many (any?) such patterns left, but it has to be 
checked first. If we have such cases then we'll have to use different 
primitives there.

But this patch should be good enough to give a good overview of the 
effects: the text impact does not look too horrible, if we decide to 
do this. The bloat of +1K on x86 defconfig is better than I first 
feared.

Thanks,

	Ingo

======================>
>From 6f01f6381e8293c360b7a89f516b8605e357d563 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 21 Apr 2015 13:32:13 +0200
Subject: [PATCH] x86/asm/irq: Don't use POPF but STI

So because the POPF instruction is slow and STI is faster on 
essentially all x86 CPUs that matter, instead of:

  ffffffff81891848:       9d                      popfq

we can do:

  ffffffff81661a2e:       41 f7 c4 00 02 00 00    test   $0x200,%r12d
  ffffffff81661a35:       74 01                   je     ffffffff81661a38 <snd_pcm_stream_unlock_irqrestore+0x28>
  ffffffff81661a37:       fb                      sti
  ffffffff81661a38:

This bloats the kernel a bit, by about 1K on the 64-bit defconfig:

   text    data     bss     dec     hex filename
   12258634        1812120 1085440 15156194         e743e2 vmlinux.before
   12259582        1812120 1085440 15157142         e74796 vmlinux.after

The other cost is the extra branching, adding extra pressure to the
branch prediction hardware and also potential branch misses.
---
 arch/x86/include/asm/irqflags.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index b77f5edb03b0..8bc2a9bc7a06 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -69,7 +69,8 @@ static inline notrace unsigned long arch_local_save_flags(void)
 
 static inline notrace void arch_local_irq_restore(unsigned long flags)
 {
-	native_restore_fl(flags);
+	if (likely(flags & X86_EFLAGS_IF))
+		native_irq_enable();
 }
 
 static inline notrace void arch_local_irq_disable(void)



* Re: [RFC PATCH] x86/asm/irq: Don't use POPF but STI
  2015-04-21 12:45   ` [RFC PATCH] x86/asm/irq: Don't use POPF but STI Ingo Molnar
@ 2015-04-21 13:09     ` Borislav Petkov
  2015-04-21 15:22       ` Ingo Molnar
  2015-04-21 16:12     ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2015-04-21 13:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, Andrew Cooper, Xen-devel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Rusty Russell, lguest, Denys Vlasenko, Linus Torvalds

On Tue, Apr 21, 2015 at 02:45:58PM +0200, Ingo Molnar wrote:
> From 6f01f6381e8293c360b7a89f516b8605e357d563 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@kernel.org>
> Date: Tue, 21 Apr 2015 13:32:13 +0200
> Subject: [PATCH] x86/asm/irq: Don't use POPF but STI
> 
> So because the POPF instruction is slow and STI is faster on 
> essentially all x86 CPUs that matter, instead of:
> 
>   ffffffff81891848:       9d                      popfq
> 
> we can do:
> 
>   ffffffff81661a2e:       41 f7 c4 00 02 00 00    test   $0x200,%r12d
>   ffffffff81661a35:       74 01                   je     ffffffff81661a38 <snd_pcm_stream_unlock_irqrestore+0x28>
>   ffffffff81661a37:       fb                      sti
>   ffffffff81661a38:
> 
> This bloats the kernel a bit, by about 1K on the 64-bit defconfig:
> 
>    text    data     bss     dec     hex filename
>    12258634        1812120 1085440 15156194         e743e2 vmlinux.before
>    12259582        1812120 1085440 15157142         e74796 vmlinux.after
> 
> the other cost is the extra branching, adding extra pressure to the
> branch prediction hardware and also potential branch misses.

Do we care? After we enable interrupts, we'll most likely go somewhere
cache "cold" anyway, so the branch misses will happen anyway.

The question is, would the cost drop from POPF -> STI cover the increase
in branch misses overhead?

Hmm, interesting.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [RFC PATCH] x86/asm/irq: Don't use POPF but STI
  2015-04-21 13:09     ` Borislav Petkov
@ 2015-04-21 15:22       ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2015-04-21 15:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Andrew Cooper, Xen-devel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Rusty Russell, lguest, Denys Vlasenko, Linus Torvalds


* Borislav Petkov <bp@alien8.de> wrote:

> On Tue, Apr 21, 2015 at 02:45:58PM +0200, Ingo Molnar wrote:
> > From 6f01f6381e8293c360b7a89f516b8605e357d563 Mon Sep 17 00:00:00 2001
> > From: Ingo Molnar <mingo@kernel.org>
> > Date: Tue, 21 Apr 2015 13:32:13 +0200
> > Subject: [PATCH] x86/asm/irq: Don't use POPF but STI
> > 
> > So because the POPF instruction is slow and STI is faster on 
> > essentially all x86 CPUs that matter, instead of:
> > 
> >   ffffffff81891848:       9d                      popfq
> > 
> > we can do:
> > 
> >   ffffffff81661a2e:       41 f7 c4 00 02 00 00    test   $0x200,%r12d
> >   ffffffff81661a35:       74 01                   je     ffffffff81661a38 <snd_pcm_stream_unlock_irqrestore+0x28>
> >   ffffffff81661a37:       fb                      sti
> >   ffffffff81661a38:
> > 
> > This bloats the kernel a bit, by about 1K on the 64-bit defconfig:
> > 
> >    text    data     bss     dec     hex filename
> >    12258634        1812120 1085440 15156194         e743e2 vmlinux.before
> >    12259582        1812120 1085440 15157142         e74796 vmlinux.after
> > 
> > the other cost is the extra branching, adding extra pressure to the
> > branch prediction hardware and also potential branch misses.
> 
> Do we care? [...]

Only if it makes stuff faster.

> [...] After we enable interrupts, we'll most likely go somewhere 
> cache "cold" anyway, so the branch misses will happen anyway.
> 
> The question is, would the cost drop from POPF -> STI cover the 
> increase in branch misses overhead?
> 
> Hmm, interesting.

So there are a few places where the POPF is a STI in 100% of the cases. 
It's probably a win there.

But my main worry would be sites that are 'multi use', such as locking 
APIs - for example spin_unlock_irqrestore(): those tend to be called 
from different code paths, and each one has a different IRQ flags 
state.

For example scheduler wakeups done from irqs-off codepaths (it's very 
common), or from irqs-on codepaths (that's very common as well). In 
the former case we won't have a STI, in the latter case we will - and 
both would hit a POPF at the end of the critical section. The 
probability of a branch prediction miss is high in this case.

So the question is, is the POPF/STI performance difference higher than 
the average cost of branch misses. If yes, then the change is probably 
a win. If not, then it's probably a loss.

My gut feeling is that we should let the hardware do it, i.e. we 
should continue to use POPF - but I can be convinced ...

Thanks,

	Ingo


* Re: [RFC PATCH] x86/asm/irq: Don't use POPF but STI
  2015-04-21 12:45   ` [RFC PATCH] x86/asm/irq: Don't use POPF but STI Ingo Molnar
  2015-04-21 13:09     ` Borislav Petkov
@ 2015-04-21 16:12     ` Linus Torvalds
  2015-04-21 22:39       ` Andy Lutomirski
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2015-04-21 16:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, Andrew Cooper, Xen-devel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, David Vrabel, Rusty Russell, lguest,
	Denys Vlasenko, Borislav Petkov

On Tue, Apr 21, 2015 at 5:45 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Totally untested and not signed off yet: because we'd first have to
> make sure (via irq flags debugging) that it's not used in reverse, to
> re-disable interrupts:

Not only might that happen in some place, I *really* doubt that a
conditional 'sti' is actually any faster. The only way it's going to
be measurably faster is if you run some microbenchmark so that the
code is hot and the branch predicts well.

"popf" is fast for the "no changes to IF" case, and is a smaller
instruction anyway. I'd really hate to make this any more complex
unless somebody has some real numbers for performance improvement
(that is *not* just some cycle timing from a bogus test-case, but real
measurements on a real load).

And even *with* real measurements, I'd worry about the "use popf to
clear IF" case.

           Linus


* Re: [RFC PATCH] x86/asm/irq: Don't use POPF but STI
  2015-04-21 16:12     ` Linus Torvalds
@ 2015-04-21 22:39       ` Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-04-21 22:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andy Lutomirski, Andrew Cooper, Xen-devel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Rusty Russell, lguest, Denys Vlasenko, Borislav Petkov

On Tue, Apr 21, 2015 at 9:12 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Apr 21, 2015 at 5:45 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> Totally untested and not signed off yet: because we'd first have to
>> make sure (via irq flags debugging) that it's not used in reverse, to
>> re-disable interrupts:
>
> Not only might that happen in some place, I *really* doubt that a
> conditional 'sti' is actually any faster. The only way it's going to
> be measurably faster is if you run some microbenchmark so that the
> code is hot and the branch predicts well.
>
> "popf" is fast for the "no changes to IF" case, and is a smaller
> instruction anyway. I'd really hate to make this any more complex
> unless somebody has some real numbers for performance improvement
> (that is *not* just some cycle timing from a bogus test-case, but real
> measurements on a real load).
>
> And even *with* real measurements, I'd worry about the "use popf to
> clear IF" case.

Fair enough.  Maybe I'll benchmark this some day.

--Andy

>
>            Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC


end of thread, other threads:[~2015-04-21 22:40 UTC | newest]

Thread overview: 11+ messages
2015-04-20 17:09 [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments Andrew Cooper
2015-04-20 17:11 ` David Vrabel
2015-04-21  0:35 ` Andy Lutomirski
2015-04-21  5:07   ` Rusty Russell
2015-04-21  8:26   ` Andrew Cooper
2015-04-21 12:45   ` [RFC PATCH] x86/asm/irq: Don't use POPF but STI Ingo Molnar
2015-04-21 13:09     ` Borislav Petkov
2015-04-21 15:22       ` Ingo Molnar
2015-04-21 16:12     ` Linus Torvalds
2015-04-21 22:39       ` Andy Lutomirski
2015-04-21  8:36 ` [Xen-devel] [PATCH] [RFC] x86/cpu: Fix SMAP check in PVOPS environments Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).