* [RFC/HACK] x86: Fast return to kernel
@ 2014-05-02 19:04 Andy Lutomirski
  2014-05-02 19:31 ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2014-05-02 19:04 UTC (permalink / raw)
  To: linux-kernel, x86, H. Peter Anvin, Linus Torvalds; +Cc: Andy Lutomirski

This speeds up my kernel_pf microbenchmark by about 17%.  The cfi
annotations need some work.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---

My test case is here:

https://gitorious.org/linux-test-utils/linux-clock-tests/source/kernel_pf.c

This could have some other interesting benefits.  For example, page faults
that happen during an NMI might not re-enable NMIs.

 arch/x86/kernel/entry_64.S | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1e96c36..922a057 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1033,9 +1033,24 @@ retint_swapgs:		/* return to user-space */
 retint_restore_args:	/* return to kernel space */
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	/*
-	 * The iretq could re-enable interrupts:
+	 * The popfq could re-enable interrupts:
 	 */
 	TRACE_IRQS_IRETQ
+
+	/* Fast return to kernel. */
+	movq RSP-ARGOFFSET(%rsp), %rsi
+	subq $16, %rsi
+	movq EFLAGS-ARGOFFSET(%rsp), %rdi
+	movq %rdi, (%rsi)
+	movq RIP-ARGOFFSET(%rsp), %rdi
+	movq %rdi, 8(%rsi)
+	movq %rsi, RIP-ARGOFFSET(%rsp)
+	RESTORE_ARGS 1,8,1
+	popq %rsp
+	popfq
+	/* Interrupts are still off because of the one-insn grace period. */
+	retq
+
 restore_args:
 	RESTORE_ARGS 1,8,1
 
-- 
1.9.0
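
For readers following the stack juggling in the new fast path, here is a rough C model of it, one statement per instruction. The assumptions are mine, not the patch's: memory is a flat array of 8-byte slots, the frame is the 64-bit hardware frame without an error code (SS, RSP, RFLAGS, CS, RIP) pushed after the CPU's 16-byte alignment, and RESTORE_ARGS 1,8,1 is taken to leave %rsp pointing at the saved-RIP slot. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy machine: memory is an array of 8-byte slots, addressed in bytes. */
#define MEM_SLOTS 64

struct toy_cpu {
	uint64_t mem[MEM_SLOTS];
	uint64_t rsp, rip, rflags;
};

static uint64_t pop64(struct toy_cpu *c)
{
	uint64_t v = c->mem[c->rsp / 8];
	c->rsp += 8;
	return v;
}

/* Hardware-style frame without error code: align RSP down to 16 bytes,
 * then push SS, RSP, RFLAGS, CS, RIP.  Returns the saved-RIP slot. */
static uint64_t push_irq_frame(struct toy_cpu *c, uint64_t saved_rsp,
			       uint64_t saved_rflags, uint64_t saved_rip)
{
	const uint64_t vals[5] = { 0x18, saved_rsp, saved_rflags, 0x10, saved_rip };
	uint64_t sp = saved_rsp & ~UINT64_C(0xF);

	for (int i = 0; i < 5; i++) {
		sp -= 8;
		c->mem[sp / 8] = vals[i];
	}
	return sp;
}

/* The patch's fast path, one C statement per instruction. */
static void fast_return(struct toy_cpu *c, uint64_t rip_slot)
{
	uint64_t rsi, rdi, new_rsp;
	size_t f = rip_slot / 8;            /* frame: RIP, CS, RFLAGS, RSP, SS */

	rsi = c->mem[f + 3];                /* movq RSP-ARGOFFSET(%rsp), %rsi */
	rsi -= 16;                          /* subq $16, %rsi */
	rdi = c->mem[f + 2];                /* movq EFLAGS-ARGOFFSET(%rsp), %rdi */
	c->mem[rsi / 8] = rdi;              /* movq %rdi, (%rsi) */
	rdi = c->mem[f];                    /* movq RIP-ARGOFFSET(%rsp), %rdi */
	c->mem[rsi / 8 + 1] = rdi;          /* movq %rdi, 8(%rsi) */
	c->mem[f] = rsi;                    /* movq %rsi, RIP-ARGOFFSET(%rsp) */

	c->rsp = rip_slot;                  /* assumed state after RESTORE_ARGS */
	new_rsp = pop64(c);                 /* popq %rsp */
	c->rsp = new_rsp;
	c->rflags = pop64(c);               /* popfq */
	c->rip = pop64(c);                  /* retq */
}
```

Running it for an interrupted RSP of 408 (8 bytes of alignment padding) and 400 (none) should give back the original RSP, RFLAGS and RIP in both cases, which is the register-restoring property the page-fault benchmark exercises. The model says nothing about when interrupts can arrive; that is the popf question raised in the replies.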


* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 19:04 [RFC/HACK] x86: Fast return to kernel Andy Lutomirski
@ 2014-05-02 19:31 ` Linus Torvalds
  2014-05-02 19:50   ` Andy Lutomirski
                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Linus Torvalds @ 2014-05-02 19:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers,
	H. Peter Anvin, Steven Rostedt

On Fri, May 2, 2014 at 12:04 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> This speeds up my kernel_pf microbenchmark by about 17%.  The cfi
> annotations need some work.

Sadly, performance of page faults in kernel mode is pretty much
completely uninteresting. It simply doesn't happen on any real load.

That said, regular *device* interrupts do often return to kernel mode
(the idle loop in particular), so if you have any way to measure that,
that might be interesting, and might show some of the same advantages.

And NMI not being re-enabled might just be a real advantage. Adding
Steven to the cc to make him aware of this patch.

So I like the patch, I just think that selling it on a "page fault
cost" basis is not very interesting. The real advantages would be
elsewhere. The page fault case is mainly a good way to test that it
restores the registers correctly.

Also, are you *really* sure that "popf" has the same one-instruction
interrupt shadow that "sti" has? Because I'm not at all sure that is
true, and it's not documented as far as I can tell. In contrast, the
one-instruction shadow after "sti" very much _is_ documented.

You may need to have separate paths for do/don't enable interrupts,
with the interrupt-enabling one clearing the IF bit on stack, and then
finishing with "popf ; sti ; retq" instead.
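
A small C sketch of the path selection described above: the on-stack RFLAGS image always has IF cleared, so popf itself never enables interrupts, and the decision to finish with sti is taken from the saved flags. X86_EFLAGS_IF is bit 9 of RFLAGS; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_IF (UINT64_C(1) << 9)   /* interrupt-enable flag */

/*
 * Hypothetical helper: build the RFLAGS image that popf will load.
 * IF is always cleared in the image; the return value says whether the
 * exit must end in "sti ; retq" (IF was set) or plain "retq".
 */
static bool prepare_rflags_image(uint64_t saved_rflags, uint64_t *image)
{
	*image = saved_rflags & ~X86_EFLAGS_IF;
	return (saved_rflags & X86_EFLAGS_IF) != 0;
}
```

The boolean stands for the choice between the two exit sequences ("popf ; retq" versus "popf ; sti ; retq"). In the entry code that choice would presumably be a branch taken before popf, since popf overwrites the flags and there are no spare registers at that point.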

Side note related to the whole IF shadow: that thing is also really
easy to get wrong in emulation etc, so we should check with the
virtualization people..

              Linus

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 19:31 ` Linus Torvalds
@ 2014-05-02 19:50   ` Andy Lutomirski
  2014-05-04 18:40     ` Ingo Molnar
  2014-05-02 19:51   ` Linus Torvalds
  2014-05-02 20:19   ` Steven Rostedt
  2 siblings, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2014-05-02 19:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers,
	H. Peter Anvin, Steven Rostedt

On Fri, May 2, 2014 at 12:31 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, May 2, 2014 at 12:04 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> This speeds up my kernel_pf microbenchmark by about 17%.  The cfi
>> annotations need some work.
>
> Sadly, performance of page faults in kernel mode is pretty much
> completely uninteresting. It simply doesn't happen on any real load.

I wonder if mlock, mlockall, MAP_POPULATE and such would benefit.
Anyone who mmaps a file and writes from the mmapped area to a socket,
pipe, or another file would benefit, I think, although I haven't
checked exactly how that works.

>
> That said, regular *device* interrupts do often return to kernel mode
> (the idle loop in particular), so if you have any way to measure that,
> that might be interesting, and might show some of the same advantages.

I can try something awful involving measuring latency of
hardware-timed packets on a SolarFlare card, but I'll have calibration
issues.  I suppose I could see if 'ping' gets faster.  In general,
this will speed up interrupts that wake userspace from idle by about
100ns on my box, since it's presumably the same size as the speedup
per loop in my silly benchmark.

I bet that lat_ctx and such would speed up, but that's unfair, since
it's still just a bug.

>
> And NMI not being re-enabled might just be a real advantage. Adding
> Steven to the cc to make him aware of this patch.
>
> So I like the patch, I just think that selling it on a "page fault
> cost" basis is not very interesting. The real advantages would be
> elsewhere. The page fault case is mainly a good way to test that it
> restores the registers correctly.
>
> Also, are you *really* sure that "popf" has the same one-instruction
> interrupt shadow that "sti" has? Because I'm not at all sure that is
> true, and it's not documented as far as I can tell. In contrast, the
> one-instruction shadow after "sti" very much _is_ documented.
>
> You may need to have separate paths for do/don't enable interrupts,
> with the interrupt-enabling one clearing the IF bit on stack, and then
> finishing with "popf ; sti ; retq" instead.

Hmm.  I think I may have mis-remembered - I can't find it either.
I'll try this.

I'll also grumble at the CFI stuff.  I'm not really sure how to test
it -- my copy of gdb isn't happy with the stack even before I start
fiddling with it.

--Andy

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 19:31 ` Linus Torvalds
  2014-05-02 19:50   ` Andy Lutomirski
@ 2014-05-02 19:51   ` Linus Torvalds
  2014-05-02 20:07     ` H. Peter Anvin
                       ` (2 more replies)
  2014-05-02 20:19   ` Steven Rostedt
  2 siblings, 3 replies; 20+ messages in thread
From: Linus Torvalds @ 2014-05-02 19:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers,
	H. Peter Anvin, Steven Rostedt, Gleb Natapov, Paolo Bonzini

On Fri, May 2, 2014 at 12:31 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Also, are you *really* sure that "popf" has the same one-instruction
> interrupt shadow that "sti" has? Because I'm not at all sure that is
> true, and it's not documented as far as I can tell. In contrast, the
> one-instruction shadow after "sti" very much _is_ documented.

Yeah, I'm pretty sure about this. The only instructions with an
interrupt shadow are "sti", "mov ss" and "pop ss".

There may be specific microarchitectures that do it for a "popf" that
enables interrupts too, but that is not documented _anywhere_ I could
find.
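
A toy C model of the difference, with an interrupt pending the whole time and IF initially clear. Interrupts are recognised only at instruction boundaries, when IF is set and the previous instruction did not open a shadow. Following the rule above, only STI opens one here (MOV SS and POP SS are not modelled), and per the SDM's description STI does so only when IF was previously clear. The enum and function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy ISA: only what is needed to show the shadow. */
enum insn { POPF_IF0, POPF_IF1, STI, RETQ };

/*
 * Returns the index of the instruction before which the pending
 * interrupt is taken, or n if no boundary inside prog allows it.
 */
static size_t first_interrupt_point(const enum insn *prog, size_t n)
{
	bool if_flag = false, shadow = false;

	for (size_t i = 0; i < n; i++) {
		if (if_flag && !shadow)
			return i;
		shadow = false;
		switch (prog[i]) {
		case POPF_IF0: if_flag = false; break;
		case POPF_IF1: if_flag = true;  break;   /* no shadow */
		case STI:      shadow = !if_flag; if_flag = true; break;
		case RETQ:     break;
		}
	}
	return n;
}
```

With "popf (IF=1) ; retq" the interrupt can be taken between popf and retq (index 1); with "popf (IF=0) ; sti ; retq" it cannot be taken until after retq (index 3).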

Btw, on the "really easy to get wrong in emulation" note and looking
at the kernel sources: it looks like KVM gets "pop ss" wrong, and only
does the shadow on "mov ss".
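
One way to state that rule as a decode-time check in an emulator. This is a hypothetical helper, not code from arch/x86/kvm/emulate.c. The encodings are FB for STI, 17 for POP SS (invalid in 64-bit mode) and 8E with ModRM.reg = 2 for MOV SS; the point is that POP SS belongs in the same bucket as MOV SS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical emulator helper: does this opcode (no prefixes) block
 * interrupts for one instruction?  STI only does so if IF was clear.
 */
static bool opens_interrupt_shadow(uint8_t opcode, uint8_t modrm, bool if_was_set)
{
	switch (opcode) {
	case 0xFB:                        /* STI */
		return !if_was_set;
	case 0x17:                        /* POP SS */
		return true;
	case 0x8E:                        /* MOV Sreg, r/m16 */
		return ((modrm >> 3) & 7) == 2;   /* reg field 2 selects SS */
	default:                          /* includes POPF (9D) */
		return false;
	}
}
```

For example, ModRM D0 (reg = 2) is "mov %ax, %ss" and opens the shadow, while D8 (reg = 3) targets DS and does not.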

                   Linus

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 19:51   ` Linus Torvalds
@ 2014-05-02 20:07     ` H. Peter Anvin
  2014-05-02 20:30     ` Thomas Gleixner
  2014-05-04 23:46     ` Paolo Bonzini
  2 siblings, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2014-05-02 20:07 UTC (permalink / raw)
  To: Linus Torvalds, Andy Lutomirski
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers,
	Steven Rostedt, Gleb Natapov, Paolo Bonzini

On 05/02/2014 12:51 PM, Linus Torvalds wrote:
> On Fri, May 2, 2014 at 12:31 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Also, are you *really* sure that "popf" has the same one-instruction
>> interrupt shadow that "sti" has? Because I'm not at all sure that is
>> true, and it's not documented as far as I can tell. In contrast, the
>> one-instruction shadow after "sti" very much _is_ documented.
> 
> Yeah, I'm pretty sure about this. The only instructions with an
> interrupt shadow are "sti", "mov ss" and "pop ss".
> 

I believe you are correct here.

	-hpa

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 19:31 ` Linus Torvalds
  2014-05-02 19:50   ` Andy Lutomirski
  2014-05-02 19:51   ` Linus Torvalds
@ 2014-05-02 20:19   ` Steven Rostedt
  2 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2014-05-02 20:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, H. Peter Anvin

On Fri, 2 May 2014 12:31:42 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> And NMI not being re-enabled might just be a real advantage. Adding
> Steven to the cc to make him aware of this patch.
> 

There's not much of an advantage for NMIs, as they seldom page fault.
We may get some due to vmalloc'd areas, but the whole nested NMI code
that I wrote was to deal with breakpoints in NMIs.

Although, this patch would have helped before my code, when doing
things like dumping stacks from NMI context, as some stack dumps can
trigger a page fault. In the past, I used to dump all tasks' states from
NMI context to find why the system locked up hard. But due to the
re-enabling of NMIs with page faults, that usually caused the system to
triple fault, and made that debugging method rather useless.
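
A toy C model of the NMI point made here and in the original mail. The SDM describes NMIs as blocked after NMI delivery until the next IRET, from whatever handler. A page fault taken inside an NMI handler and returned from with iretq therefore unblocks NMIs early; returned from with the proposed popf/ret path, it would not. Names are hypothetical, and the model ignores the nested-NMI machinery mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: delivering an NMI blocks further NMIs until the next IRET
 * (from any handler); a plain RET does not touch the blocking. */
struct nmi_state {
	bool nmi_blocked;
};

static void deliver_nmi(struct nmi_state *s)     { s->nmi_blocked = true; }
static void return_via_iret(struct nmi_state *s) { s->nmi_blocked = false; }
static void return_via_ret(struct nmi_state *s)  { (void)s; }

/* NMI arrives, its handler takes a page fault, the fault handler returns.
 * Reports whether NMIs are still blocked when the NMI handler resumes. */
static bool nmis_blocked_after_fault(bool fast_ret_path)
{
	struct nmi_state s = { false };

	deliver_nmi(&s);
	if (fast_ret_path)
		return_via_ret(&s);     /* proposed popf/ret exit */
	else
		return_via_iret(&s);    /* existing iretq exit */
	return s.nmi_blocked;
}
```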

-- Steve

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 19:51   ` Linus Torvalds
  2014-05-02 20:07     ` H. Peter Anvin
@ 2014-05-02 20:30     ` Thomas Gleixner
  2014-05-02 21:01       ` Linus Torvalds
  2014-05-04 23:46     ` Paolo Bonzini
  2 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2014-05-02 20:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, H. Peter Anvin, Steven Rostedt,
	Gleb Natapov, Paolo Bonzini

On Fri, 2 May 2014, Linus Torvalds wrote:

> On Fri, May 2, 2014 at 12:31 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Also, are you *really* sure that "popf" has the same one-instruction
> > interrupt shadow that "sti" has? Because I'm not at all sure that is
> > true, and it's not documented as far as I can tell. In contrast, the
> > one-instruction shadow after "sti" very much _is_ documented.
> 
> Yeah, I'm pretty sure about this. The only instructions with an
> interrupt shadow are "sti", "mov ss" and "pop ss".
> 
> There may be specific microarchitectures that do it for a "popf" that
> enables interrupts too, but that is not documented _anywhere_ I could
> find.

So what about manipulating the stack so that the popf does not enable
interrupts and do an explicit sti to get the benefit of the
one-instruction shadow ?

Thanks,

	tglx

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 20:30     ` Thomas Gleixner
@ 2014-05-02 21:01       ` Linus Torvalds
  2014-05-02 21:04         ` Andy Lutomirski
  2014-05-02 21:28         ` Thomas Gleixner
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2014-05-02 21:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, H. Peter Anvin, Steven Rostedt,
	Gleb Natapov, Paolo Bonzini

On Fri, May 2, 2014 at 1:30 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> So what about manipulating the stack so that the popf does not enable
> interrupts and do an explicit sti to get the benefit of the
> one-instruction shadow ?

That's what I already suggested in the original "I don't think popf
works" email.

It does get more complex since you now have to test things (there are
very much cases where we get page faults and other exceptions with
interrupts disabled), but it shouldn't be much worse.

Btw, Andy, why did you do "popq %rsp"? That just looks crazy. If the
stack isn't contiguous, the subsequent "popf" couldn't have worked
anyway. And I bet it screws with the stack engine. So you should just
have done something like "addq $16,%rsp" or whatever the constant ends
up being.

              Linus

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 21:01       ` Linus Torvalds
@ 2014-05-02 21:04         ` Andy Lutomirski
  2014-05-02 21:07           ` Linus Torvalds
  2014-05-02 21:28         ` Thomas Gleixner
  1 sibling, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2014-05-02 21:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers, H. Peter Anvin, Steven Rostedt,
	Gleb Natapov, Paolo Bonzini

On Fri, May 2, 2014 at 2:01 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, May 2, 2014 at 1:30 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> So what about manipulating the stack so that the popf does not enable
>> interrupts and do an explicit sti to get the benefit of the
>> one-instruction shadow ?
>
> That's what I already suggested in the original "I don't think popf
> works" email.
>
> It does get more complex since you now have to test things (there are
> very much cases where we get page faults and other exceptions with
> interrupts disabled), but it shouldn't be much worse.
>
> Btw, Andy, why did you do "popq %rsp"? That just looks crazy. If the
> stack isn't contiguous, the subsequent "popf" couldn't have worked
> anyway. And I bet it screws with the stack engine. So you should just
> have done something like "addq $16,%rsp" or whatever the constant ends
> up being.

Because otherwise I'd have to keep track of whether it's a zeroentry
or an errorentry.  I can't stuff the offset in a register without even
more stack hackery, since there are no available registers there.  I
could split the whole thing into two code paths, I guess.

--Andy

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 21:04         ` Andy Lutomirski
@ 2014-05-02 21:07           ` Linus Torvalds
  2014-05-02 21:37             ` H. Peter Anvin
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2014-05-02 21:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers, H. Peter Anvin, Steven Rostedt,
	Gleb Natapov, Paolo Bonzini

On Fri, May 2, 2014 at 2:04 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Because otherwise I'd have to keep track of whether it's a zeroentry
> or an errorentry.  I can't stuff the offset in a register without even
> more stack hackery, since there are no available registers there.  I
> could split the whole thing into two code paths, I guess.

Ahh. Never mind. I didn't think about the fact that the error entry
case had one more field on the stack. Your approach is all fine, it
was me not seeing the problem.

          Linus

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 21:01       ` Linus Torvalds
  2014-05-02 21:04         ` Andy Lutomirski
@ 2014-05-02 21:28         ` Thomas Gleixner
  1 sibling, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2014-05-02 21:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, H. Peter Anvin, Steven Rostedt,
	Gleb Natapov, Paolo Bonzini

On Fri, 2 May 2014, Linus Torvalds wrote:

> On Fri, May 2, 2014 at 1:30 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > So what about manipulating the stack so that the popf does not enable
> > interrupts and do an explicit sti to get the benefit of the
> > one-instruction shadow ?
> 
> That's what I already suggested in the original "I don't think popf
> works" email.

Missed that.
 
> It does get more complex since you now have to test things (there are
> very much cases where we get page faults and other exceptions with
> interrupts disabled), but it shouldn't be much worse.

Right. The extra conditional is probably not noticeable at all.

Thanks,

	tglx

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 21:07           ` Linus Torvalds
@ 2014-05-02 21:37             ` H. Peter Anvin
  2014-05-02 21:42               ` Andy Lutomirski
  0 siblings, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2014-05-02 21:37 UTC (permalink / raw)
  To: Linus Torvalds, Andy Lutomirski
  Cc: Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers, Steven Rostedt, Gleb Natapov,
	Paolo Bonzini

On 05/02/2014 02:07 PM, Linus Torvalds wrote:
> On Fri, May 2, 2014 at 2:04 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> Because otherwise I'd have to keep track of whether it's a zeroentry
>> or an errorentry.  I can't stuff the offset in a register without even
>> more stack hackery, since there are no available registers there.  I
>> could split the whole thing into two code paths, I guess.
> 
> Ahh. Never mind. I didn't think about the fact that the error entry
> case had one more field on the stack. Your approach is all fine, it
> was me not seeing the problem.
> 

I have to admit to being rather partial to the idea of simply doing
"push $0" on entry for the vectors that don't push an error code, like
the early exception handling code does.
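
A sketch of what that suggestion buys: once every vector has an error-code slot, the saved RSP sits at one fixed offset from the stack pointer at handler entry. The vector list covers only the classic architectural set, and the function names are hypothetical. Note that Andy's reply argues the existing entry code already fills such a slot, so the error code may not be what forces the "popq %rsp".

```c
#include <assert.h>
#include <stdbool.h>

/* Vectors for which the CPU pushes an error code in 64-bit mode:
 * #DF(8), #TS(10), #NP(11), #SS(12), #GP(13), #PF(14), #AC(17).
 * Later additions such as #CP(21) are left out of this sketch. */
static bool hw_pushes_error_code(unsigned vector)
{
	switch (vector) {
	case 8: case 10: case 11: case 12: case 13: case 14: case 17:
		return true;
	default:
		return false;
	}
}

/* Byte offset from %rsp at handler entry to the saved-RSP slot, with or
 * without a software "push $0" for vectors lacking an error code. */
static unsigned saved_rsp_offset(unsigned vector, bool push_zero_if_missing)
{
	unsigned off = 3 * 8;                 /* RIP, CS, RFLAGS */

	if (hw_pushes_error_code(vector) || push_zero_if_missing)
		off += 8;                         /* error-code slot */
	return off;
}
```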

	-hpa

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 21:37             ` H. Peter Anvin
@ 2014-05-02 21:42               ` Andy Lutomirski
  2014-05-02 21:44                 ` H. Peter Anvin
  0 siblings, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2014-05-02 21:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers, Steven Rostedt, Gleb Natapov,
	Paolo Bonzini

On Fri, May 2, 2014 at 2:37 PM, H. Peter Anvin <h.peter.anvin@intel.com> wrote:
> On 05/02/2014 02:07 PM, Linus Torvalds wrote:
>> On Fri, May 2, 2014 at 2:04 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>
>>> Because otherwise I'd have to keep track of whether it's a zeroentry
>>> or an errorentry.  I can't stuff the offset in a register without even
>>> more stack hackery, since there are no available registers there.  I
>>> could split the whole thing into two code paths, I guess.
>>
>> Ahh. Never mind. I didn't think about the fact that the error entry
>> case had one more field on the stack. Your approach is all fine, it
>> was me not seeing the problem.
>>
>
> I have to admit to being rather partial to the idea of simply doing
> "push $0" on entry for the vectors that don't push an error code, like
> the early exception handling code does.
>

Hah -- I think I just faked both of you out :)

I don't think this has anything to do with the error code, and I think
that the errorentry code already does more or less that: it pushes -1.

The real issue here is probably the magic 16-byte stack alignment when
a non-stack-switching interrupt happens.
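
The alignment point, sketched in C: in 64-bit mode the CPU rounds RSP down to a 16-byte boundary before pushing the frame, so the distance from the saved-RIP slot back to the interrupted RSP is 40 or 48 bytes depending on where the interrupted stack happened to be. A constant "addq" cannot cover both cases, while loading the saved RSP from the frame, as the patch does, can. This assumes no error-code slot and no IST stack switch; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the saved-RIP slot: align down to 16 bytes, then the
 * five-slot frame (SS, RSP, RFLAGS, CS, RIP) sits below that. */
static uint64_t rip_slot_for(uint64_t interrupted_rsp)
{
	uint64_t aligned = interrupted_rsp & ~UINT64_C(0xF);
	return aligned - 5 * 8;
}

/* Distance from the saved-RIP slot back up to the interrupted RSP. */
static uint64_t distance_to_interrupted_rsp(uint64_t interrupted_rsp)
{
	return interrupted_rsp - rip_slot_for(interrupted_rsp);
}
```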

--Andy

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 21:42               ` Andy Lutomirski
@ 2014-05-02 21:44                 ` H. Peter Anvin
  0 siblings, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2014-05-02 21:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers, Steven Rostedt, Gleb Natapov,
	Paolo Bonzini

On 05/02/2014 02:42 PM, Andy Lutomirski wrote:
> 
> Hah -- I think I just faked both of you out :)
> 
> I don't think this has anything to do with the error code, and I think
> that the errorentry code already does more or less that: it pushes -1.
> 
> The real issue here is probably the magic 16-byte stack alignment when
> a non-stack-switching interrupt happens.
> 

Errorentry is when there *is* an error code pushed by the hardware.  The
other variant is zeroentry, which does generate a zero error code --
eventually.  The -1 means we didn't enter the kernel through a system call.

	-hpa

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 19:50   ` Andy Lutomirski
@ 2014-05-04 18:40     ` Ingo Molnar
  2014-05-04 19:59       ` H. Peter Anvin
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2014-05-04 18:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	the arch/x86 maintainers, H. Peter Anvin, Steven Rostedt,
	Peter Zijlstra


* Andy Lutomirski <luto@amacapital.net> wrote:

> > That said, regular *device* interrupts do often return to kernel 
> > mode (the idle loop in particular), so if you have any way to 
> > measure that, that might be interesting, and might show some of 
> > the same advantages.
> 
> I can try something awful involving measuring latency of 
> hardware-timed packets on a SolarFlare card, but I'll have 
> calibration issues.  I suppose I could see if 'ping' gets faster.  
> In general, this will speed up interrupts that wake userspace from 
> idle by about 100ns on my box, since it's presumably the same size 
> as the speedup per loop in my silly benchmark.

To simulate high rate device IRQ you can generate very high frequency 
lapic IRQs by using hrtimers, that's generating a ton of per CPU lapic 
IRQs.

Thanks,

	Ingo

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-04 18:40     ` Ingo Molnar
@ 2014-05-04 19:59       ` H. Peter Anvin
  2014-05-04 21:31         ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2014-05-04 19:59 UTC (permalink / raw)
  To: Ingo Molnar, Andy Lutomirski
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	the arch/x86 maintainers, Steven Rostedt, Peter Zijlstra

On 05/04/2014 11:40 AM, Ingo Molnar wrote:
> 
> * Andy Lutomirski <luto@amacapital.net> wrote:
> 
>>> That said, regular *device* interrupts do often return to kernel 
>>> mode (the idle loop in particular), so if you have any way to 
>>> measure that, that might be interesting, and might show some of 
>>> the same advantages.
>>
>> I can try something awful involving measuring latency of 
>> hardware-timed packets on a SolarFlare card, but I'll have 
>> calibration issues.  I suppose I could see if 'ping' gets faster.  
>> In general, this will speed up interrupts that wake userspace from 
>> idle by about 100ns on my box, since it's presumably the same size 
>> and the speedup per loop in my silly benchmark.
> 
> To simulate high rate device IRQ you can generate very high frequency 
> lapic IRQs by using hrtimers, that's generating a ton of per CPU lapic 
> IRQs.
> 

The bigger question is if that helps in measuring the actual latency.
It should get more data points, to be sure.

Maybe let userspace sit in a tight loop doing RDTSC, and look for data
points too far apart to have been uninterrupted?

	-hpa

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-04 19:59       ` H. Peter Anvin
@ 2014-05-04 21:31         ` Linus Torvalds
  2014-05-04 22:01           ` H. Peter Anvin
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2014-05-04 21:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Steven Rostedt, Peter Zijlstra

On Sun, May 4, 2014 at 12:59 PM, H. Peter Anvin <h.peter.anvin@intel.com> wrote:
>
> Maybe let userspace sit in a tight loop doing RDTSC, and look for data
> points too far apart to have been uninterrupted?

That won't work, since Andy's patch improves on the "interrupt
happened in kernel space", not on the user-space interrupt case.

But some variation on that with a kernel module that does something like

 - take over one CPU and force tons of timer interrupts on that CPU
using the local APIC

 - for (say) ten billion cycles, do something like this in that kernel module:

   #define TEN_BILLION (10000000000)

        unsigned long prev = 0, sum = 0, end = rdtsc() + TEN_BILLION;
        for (;;) {
                unsigned long tsc = rdtsc();
                if (tsc > end)
                        break;
                if (tsc < prev + 500) {
                        sum += tsc - prev;
                }
                prev = tsc;
        }

and see how big a fraction of the 10 billion cycles you capture in
'sum'.  The bigger the fraction, the less time the timer interrupts
stole from your CPU.

That "500" is just a random cut-off. Any interrupt will take more than
that many TSC cycles. So the above basically counts how much
uninterrupted time that thread gets.

Hmm?

                   Linus
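
A user-space adaptation of the loop above, for trying out the accounting. Deliberate substitutions: clock_gettime(CLOCK_MONOTONIC) in nanoseconds replaces rdtsc() and TSC cycles, the first sample only seeds prev, and nothing forces APIC timer interrupts. As noted earlier in the thread, a user-space loop sees interrupts that hit user mode, which the patch does not change, so the real measurement still belongs in a kernel module.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sum the gaps between consecutive timestamps shorter than cutoff,
 * i.e. time assumed not to have been stolen by an interrupt. */
static uint64_t uninterrupted_time(const uint64_t *ts, size_t n, uint64_t cutoff)
{
	uint64_t sum = 0;

	for (size_t i = 1; i < n; i++) {
		uint64_t gap = ts[i] - ts[i - 1];
		if (gap < cutoff)
			sum += gap;
	}
	return sum;
}

/* Portable stand-in for rdtsc(): nanoseconds from CLOCK_MONOTONIC. */
static uint64_t now_ns(void)
{
	struct timespec t;

	clock_gettime(CLOCK_MONOTONIC, &t);
	return (uint64_t)t.tv_sec * UINT64_C(1000000000) + (uint64_t)t.tv_nsec;
}

/* Spin for duration_ns; return the fraction of elapsed time made up of
 * gaps shorter than cutoff_ns. */
static double uninterrupted_fraction(uint64_t duration_ns, uint64_t cutoff_ns)
{
	uint64_t start = now_ns(), prev = start, sum = 0, tsc;

	while ((tsc = now_ns()) - start < duration_ns) {
		if (tsc - prev < cutoff_ns)
			sum += tsc - prev;
		prev = tsc;
	}
	if (prev == start)
		return 0.0;               /* no samples inside the window */
	return (double)sum / (double)(prev - start);
}
```

For timestamps {100, 110, 120, 2000, 2010} and a cutoff of 500, uninterrupted_time() counts the three 10-unit gaps and skips the 1880-unit one, giving 30.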

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-04 21:31         ` Linus Torvalds
@ 2014-05-04 22:01           ` H. Peter Anvin
  0 siblings, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2014-05-04 22:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Steven Rostedt, Peter Zijlstra

On 05/04/2014 02:31 PM, Linus Torvalds wrote:
> On Sun, May 4, 2014 at 12:59 PM, H. Peter Anvin <h.peter.anvin@intel.com> wrote:
>>
>> Maybe let userspace sit in a tight loop doing RDTSC, and look for data
>> points too far apart to have been uninterrupted?
> 
> That won't work, since Andy's patch improves on the "interrupt
> happened in kernel space", not on the user-space interrupt case.
> 

I was thinking about your proposal, not Andy's.

> But some variation on that with a kernel module that does something like
> 
>  - take over one CPU and force tons of timer interrupts on that CPU
> using the local APIC
> 
>  - for (say) ten billion cycles, do something like this in that kernel module:
> 
>    #define TEN_BILLION (10000000000)
> 
>         unsigned long prev = 0, sum = 0, end = rdtsc() + TEN_BILLION;
>         for (;;) {
>                 unsigned long tsc = rdtsc();
>                 if (tsc > end)
>                         break;
>                 if (tsc < prev + 500) {
>                         sum += tsc - prev;
>                 }
>                 prev = tsc;
>         }
> 
> and see how big a fraction of the 10 billion cycles you capture in
> 'sum'.  The bigger the fraction, the less time the timer interrupts
> stole from your CPU.
> 
> That "500" is just a random cut-off. Any interrupt will take more than
> that many TSC cycles. So the above basically counts how much
> uninterrupted time that thread gets.

Yes, same idea, but in a kernel module.

	-hpa

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-02 19:51   ` Linus Torvalds
  2014-05-02 20:07     ` H. Peter Anvin
  2014-05-02 20:30     ` Thomas Gleixner
@ 2014-05-04 23:46     ` Paolo Bonzini
  2014-05-04 23:49       ` H. Peter Anvin
  2 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-04 23:46 UTC (permalink / raw)
  To: Linus Torvalds, Andy Lutomirski
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers,
	H. Peter Anvin, Steven Rostedt, Gleb Natapov

On 02/05/2014 21:51, Linus Torvalds wrote:
>> > Also, are you *really* sure that "popf" has the same one-instruction
>> > interrupt shadow that "sti" has? Because I'm not at all sure that is
>> > true, and it's not documented as far as I can tell. In contrast, the
>> > one-instruction shadow after "sti" very much _is_ documented.
> Yeah, I'm pretty sure about this. The only instructions with an
> interrupt shadow are "sti", "mov ss" and "pop ss".

Yep.

> There may be specific microarchitectures that do it for a "popf" that
> enables interrupts too, but that is not documented _anywhere_ I could
> find.
>
> Btw, on the "really easy to get wrong in emulation" note and looking
> at the kernel sources: it looks like KVM gets "pop ss" wrong, and only
> does the shadow on "mov ss".

Thanks, that's useful to know (and easy to fix).  Note that in practice 
arch/x86/kvm/emulate.c will only emulate POP SS in big real mode or if 
the stack is in MMIO memory.  The interrupt shadow will be handled by 
the processor in all other cases, and Intel calls the bit "Blocking by 
MOV SS" even if it also applies to POP SS.

Your suggested trick of splitting the return paths for IF=0/IF=1 can be 
also done like this:

	movq EFLAGS-ARGOFFSET(%rsp), %rdi
	btrq $9, %rdi		# Clear IF, save old value in CF
	movq %rdi, (%rsi)
	...
	popfq
	jnc	1f		# If IF was 0, just return
	sti			# Using STI gets us an interrupt shadow
1:
	retq

Paolo

* Re: [RFC/HACK] x86: Fast return to kernel
  2014-05-04 23:46     ` Paolo Bonzini
@ 2014-05-04 23:49       ` H. Peter Anvin
  0 siblings, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2014-05-04 23:49 UTC (permalink / raw)
  To: Paolo Bonzini, Linus Torvalds, Andy Lutomirski
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers,
	Steven Rostedt, Gleb Natapov

On 05/04/2014 04:46 PM, Paolo Bonzini wrote:
> 
> Your suggested trick of splitting the return paths for IF=0/IF=1 can be
> also done like this:
> 
>     movq EFLAGS-ARGOFFSET(%rsp), %rdi
>     btrq $9, %rdi        # Clear IF, save old value in CF
>     movq %rdi, (%rsi)
>     ...
>     popfq
>     jnc    1f        # If IF was 0, just return
>     sti            # Using STI gets us an interrupt shadow
> 1:
>     retq
> 

That doesn't work, because CF gets restored by the popfq as well.
Unfortunately.
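
Why the btrq variant fails, traced in C: btrq does put the old IF into CF, but popfq then reloads CF, along with the other flags, from the image on the stack, so jnc ends up testing the interrupted context's carry flag. The struct and function are hypothetical; the bit positions are the architectural ones (CF = bit 0, IF = bit 9).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_CF (UINT64_C(1) << 0)
#define X86_EFLAGS_IF (UINT64_C(1) << 9)

struct btr_trace {
	bool cf_after_btr;   /* CF right after btrq $9, %rdi */
	bool sti_runs;       /* whether "jnc 1f" falls through to sti */
};

static struct btr_trace trace_btr_sequence(uint64_t saved_rflags)
{
	struct btr_trace t;
	uint64_t rdi = saved_rflags;          /* movq EFLAGS-ARGOFFSET(%rsp), %rdi */
	uint64_t rflags;

	t.cf_after_btr = (rdi & X86_EFLAGS_IF) != 0;   /* btrq: CF = old bit 9 */
	rdi &= ~X86_EFLAGS_IF;                         /* btrq: clear bit 9 */

	rflags = rdi;                         /* popfq reloads CF from the image */
	t.sti_runs = (rflags & X86_EFLAGS_CF) != 0;    /* jnc skips sti if CF = 0 */
	return t;
}
```

With IF=1 and CF=0 (0x246) the sti is skipped and interrupts stay off; with IF=0 and CF=1 (0x47) the sti runs and turns them on. Either way the outcome follows the interrupted code's carry flag rather than its IF.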

	-hpa

Thread overview: 20+ messages
2014-05-02 19:04 [RFC/HACK] x86: Fast return to kernel Andy Lutomirski
2014-05-02 19:31 ` Linus Torvalds
2014-05-02 19:50   ` Andy Lutomirski
2014-05-04 18:40     ` Ingo Molnar
2014-05-04 19:59       ` H. Peter Anvin
2014-05-04 21:31         ` Linus Torvalds
2014-05-04 22:01           ` H. Peter Anvin
2014-05-02 19:51   ` Linus Torvalds
2014-05-02 20:07     ` H. Peter Anvin
2014-05-02 20:30     ` Thomas Gleixner
2014-05-02 21:01       ` Linus Torvalds
2014-05-02 21:04         ` Andy Lutomirski
2014-05-02 21:07           ` Linus Torvalds
2014-05-02 21:37             ` H. Peter Anvin
2014-05-02 21:42               ` Andy Lutomirski
2014-05-02 21:44                 ` H. Peter Anvin
2014-05-02 21:28         ` Thomas Gleixner
2014-05-04 23:46     ` Paolo Bonzini
2014-05-04 23:49       ` H. Peter Anvin
2014-05-02 20:19   ` Steven Rostedt
