* [RFC] sched, x86: Prevent resched interrupts if task in kernel mode and !CONFIG_PREEMPT
From: Kirill Tkhai @ 2015-01-23 15:53 UTC
  To: linux-kernel; +Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

It's useless to send reschedule interrupts in this situation: the earliest
point where a schedule() call is possible is sysret_careful(), and that path
already tests TIF_NEED_RESCHED directly.

So it's possible to get rid of that type of interrupt.

How about this idea? Is set_bit() cheap on x86 machines?
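
For reference, the IPI avoidance this relies on lives in resched_curr(): if the
current task on the target CPU advertises TIF_POLLING_NRFLAG, only the
need-resched flag is set and no interrupt is sent. Roughly (a simplified sketch
of kernel/sched/core.c; details may differ from the exact tree this patch is
against):

    /* Sketch: how resched_curr() elides the IPI for "polling" tasks. */
    static bool set_nr_and_not_polling(struct task_struct *p)
    {
        struct thread_info *ti = task_thread_info(p);

        /* Atomically set NEED_RESCHED; report whether the task was polling. */
        return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
    }

    void resched_curr(struct rq *rq)
    {
        struct task_struct *curr = rq->curr;
        int cpu = cpu_of(rq);

        if (test_tsk_need_resched(curr))
            return;

        if (cpu == smp_processor_id()) {
            set_tsk_need_resched(curr);
            return;
        }

        /* The remote task polls TIF_NEED_RESCHED itself, so skip the IPI. */
        if (set_nr_and_not_polling(curr))
            smp_send_reschedule(cpu);
    }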
---
 arch/x86/kernel/entry_64.S |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index c653dc4..a046ba8 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -409,6 +409,13 @@ GLOBAL(system_call_after_swapgs)
 	movq_cfi rax,(ORIG_RAX-ARGOFFSET)
 	movq  %rcx,RIP-ARGOFFSET(%rsp)
 	CFI_REL_OFFSET rip,RIP-ARGOFFSET
+#if !defined(CONFIG_PREEMPT) || !defined(SMP)
+	/*
+	 * Tell resched_curr() do not send useless interrupts to us.
+	 * Kernel isn't preemptible till sysret_careful() anyway.
+	 */
+	LOCK ; bts $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+#endif
 	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
 	jnz tracesys
 system_call_fastpath:
@@ -427,6 +434,9 @@ GLOBAL(system_call_after_swapgs)
  * Has incomplete stack frame and undefined top of stack.
  */
 ret_from_sys_call:
+#if !defined(CONFIG_PREEMPT) || !defined(SMP)
+	LOCK ; btr $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+#endif
 	movl $_TIF_ALLWORK_MASK,%edi
 	/* edi:	flagmask */
 sysret_check:





* Re: [RFC] sched, x86: Prevent resched interrupts if task in kernel mode and !CONFIG_PREEMPT
From: Peter Zijlstra @ 2015-01-23 16:07 UTC
  To: Kirill Tkhai
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, luto

On Fri, Jan 23, 2015 at 06:53:32PM +0300, Kirill Tkhai wrote:
> It's useless to send reschedule interrupts in such situations. The earliest
> point, where schedule() call is possible, is sysret_careful(). But in that
> function we directly test TIF_NEED_RESCHED.
> 
> So it's possible to get rid of that type of interrupts.
> 
> How about this idea? Is set_bit() cheap on x86 machines?

So you set TIF_POLLING_NRFLAG on syscall entry and clear it again on
exit? Thereby we avoid the IPI, because the exit path already checks for
TIF_NEED_RESCHED.

Should work I suppose, but I'm not too familiar with all that entry.S
muck. Andy might know and appreciate this.

> ---
>  arch/x86/kernel/entry_64.S |   10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index c653dc4..a046ba8 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -409,6 +409,13 @@ GLOBAL(system_call_after_swapgs)
>  	movq_cfi rax,(ORIG_RAX-ARGOFFSET)
>  	movq  %rcx,RIP-ARGOFFSET(%rsp)
>  	CFI_REL_OFFSET rip,RIP-ARGOFFSET
> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> +	/*
> +	 * Tell resched_curr() do not send useless interrupts to us.
> +	 * Kernel isn't preemptible till sysret_careful() anyway.
> +	 */
> +	LOCK ; bts $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> +#endif
>  	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>  	jnz tracesys
>  system_call_fastpath:
> @@ -427,6 +434,9 @@ GLOBAL(system_call_after_swapgs)
>   * Has incomplete stack frame and undefined top of stack.
>   */
>  ret_from_sys_call:
> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> +	LOCK ; btr $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> +#endif
>  	movl $_TIF_ALLWORK_MASK,%edi
>  	/* edi:	flagmask */
>  sysret_check:
> 
> 
> 


* Re: [RFC] sched, x86: Prevent resched interrupts if task in kernel mode and !CONFIG_PREEMPT
From: Andy Lutomirski @ 2015-01-23 16:24 UTC
  To: Peter Zijlstra
  Cc: Kirill Tkhai, linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

On Fri, Jan 23, 2015 at 8:07 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jan 23, 2015 at 06:53:32PM +0300, Kirill Tkhai wrote:
>> It's useless to send reschedule interrupts in such situations. The earliest
>> point, where schedule() call is possible, is sysret_careful(). But in that
>> function we directly test TIF_NEED_RESCHED.
>>
>> So it's possible to get rid of that type of interrupts.
>>
>> How about this idea? Is set_bit() cheap on x86 machines?
>
> So you set TIF_POLLING_NRFLAG on syscall entry and clear it again on
> exit? Thereby we avoid the IPI, because the exit path already checks for
> TIF_NEED_RESCHED.

The idle code says:

        /*
         * If the arch has a polling bit, we maintain an invariant:
         *
         * Our polling bit is clear if we're not scheduled (i.e. if
         * rq->curr != rq->idle).  This means that, if rq->idle has
         * the polling bit set, then setting need_resched is
         * guaranteed to cause the cpu to reschedule.
         */

Setting polling on non-idle tasks like this will either involve
weakening this a bit (it'll still be true for rq->idle) or changing
the polling state on context switch.
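
For context, the idle loop maintains that invariant roughly like this (heavily
simplified from kernel/sched/idle.c, not the literal code):

    static void cpu_idle_loop(void)
    {
        while (1) {
            __current_set_polling();      /* idle advertises TIF_POLLING_NRFLAG */

            while (!need_resched())
                arch_cpu_idle();          /* e.g. mwait watching TIF_NEED_RESCHED */

            __current_clr_polling();      /* stop advertising before scheduling */
            smp_mb__after_atomic();       /* make the clear visible before scheduling */

            schedule_preempt_disabled();
        }
    }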

>
> Should work I suppose, but I'm not too familiar with all that entry.S
> muck. Andy might know and appreciate this.
>
>> ---
>>  arch/x86/kernel/entry_64.S |   10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index c653dc4..a046ba8 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -409,6 +409,13 @@ GLOBAL(system_call_after_swapgs)
>>       movq_cfi rax,(ORIG_RAX-ARGOFFSET)
>>       movq  %rcx,RIP-ARGOFFSET(%rsp)
>>       CFI_REL_OFFSET rip,RIP-ARGOFFSET
>> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
>> +     /*
>> +      * Tell resched_curr() do not send useless interrupts to us.
>> +      * Kernel isn't preemptible till sysret_careful() anyway.
>> +      */
>> +     LOCK ; bts $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>> +#endif

That's kind of expensive.  What's the !SMP part for?

>>       testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>>       jnz tracesys
>>  system_call_fastpath:
>> @@ -427,6 +434,9 @@ GLOBAL(system_call_after_swapgs)
>>   * Has incomplete stack frame and undefined top of stack.
>>   */
>>  ret_from_sys_call:
>> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
>> +     LOCK ; btr $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>> +#endif

If only it were this simple.  There are lots of ways out of syscalls,
and this is only one of them :(  If we did this, I'd rather do it
through the do_notify_resume mechanism or something.

I don't see any way to do this without at least one atomic op or
smp_mb per syscall, and that's kind of expensive.

Would it make sense to try to use context tracking instead?  On
systems that use context tracking, syscalls are already expensive, and
we're already keeping track of which CPUs are in user mode.
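
For illustration, one hypothetical shape of that idea -- the helper below is
invented for the sketch, it is not the real context-tracking API:

    /*
     * Hypothetical sketch only.  cpu_in_kernel() stands in for "context
     * tracking says this CPU is currently executing kernel code".
     */
    static void resched_remote(struct task_struct *curr, int cpu)
    {
        set_tsk_need_resched(curr);

        /*
         * A CPU known to be in the kernel (with !CONFIG_PREEMPT) will test
         * TIF_NEED_RESCHED on its way back to user mode, so only a CPU in
         * user mode actually needs the interrupt.  Ordering the flag write
         * against the state read is the synchronization problem here.
         */
        if (!cpu_in_kernel(cpu))
            smp_send_reschedule(cpu);
    }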

--Andy


* Re: [RFC] sched, x86: Prevent resched interrupts if task in kernel mode and !CONFIG_PREEMPT
From: Kirill Tkhai @ 2015-01-23 17:09 UTC
  To: Andy Lutomirski
  Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin

On Fri, 23/01/2015 at 08:24 -0800, Andy Lutomirski wrote:
> On Fri, Jan 23, 2015 at 8:07 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Jan 23, 2015 at 06:53:32PM +0300, Kirill Tkhai wrote:
> >> It's useless to send reschedule interrupts in such situations. The earliest
> >> point, where schedule() call is possible, is sysret_careful(). But in that
> >> function we directly test TIF_NEED_RESCHED.
> >>
> >> So it's possible to get rid of that type of interrupts.
> >>
> >> How about this idea? Is set_bit() cheap on x86 machines?
> >
> > So you set TIF_POLLING_NRFLAG on syscall entry and clear it again on
> > exit? Thereby we avoid the IPI, because the exit path already checks for
> > TIF_NEED_RESCHED.
> 
> The idle code says:
> 
>         /*
>          * If the arch has a polling bit, we maintain an invariant:
>          *
>          * Our polling bit is clear if we're not scheduled (i.e. if
>          * rq->curr != rq->idle).  This means that, if rq->idle has
>          * the polling bit set, then setting need_resched is
>          * guaranteed to cause the cpu to reschedule.
>          */
> 
> Setting polling on non-idle tasks like this will either involve
> weakening this a bit (it'll still be true for rq->idle) or changing
> the polling state on context switch.
> 
> >
> > Should work I suppose, but I'm not too familiar with all that entry.S
> > muck. Andy might know and appreciate this.
> >
> >> ---
> >>  arch/x86/kernel/entry_64.S |   10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> >> index c653dc4..a046ba8 100644
> >> --- a/arch/x86/kernel/entry_64.S
> >> +++ b/arch/x86/kernel/entry_64.S
> >> @@ -409,6 +409,13 @@ GLOBAL(system_call_after_swapgs)
> >>       movq_cfi rax,(ORIG_RAX-ARGOFFSET)
> >>       movq  %rcx,RIP-ARGOFFSET(%rsp)
> >>       CFI_REL_OFFSET rip,RIP-ARGOFFSET
> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> >> +     /*
> >> +      * Tell resched_curr() do not send useless interrupts to us.
> >> +      * Kernel isn't preemptible till sysret_careful() anyway.
> >> +      */
> >> +     LOCK ; bts $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >> +#endif
> 
> That's kind of expensive.  What's the !SMP part for?

smp_send_reschedule() is a NOP on UP, so there is no problem there.

> 
> >>       testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >>       jnz tracesys
> >>  system_call_fastpath:
> >> @@ -427,6 +434,9 @@ GLOBAL(system_call_after_swapgs)
> >>   * Has incomplete stack frame and undefined top of stack.
> >>   */
> >>  ret_from_sys_call:
> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> >> +     LOCK ; btr $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >> +#endif
> 
> If only it were this simple.  There are lots of ways out of syscalls,
> and this is only one of them :(  If we did this, I'd rather do it
> through the do_notify_resume mechanism or something.

Yes, the syscall path is the only one I handled, just as an example.

> I don't see any way to do this without at least one atomic op or
> smp_mb per syscall, and that's kind of expensive.

JFI, doesn't x86 set_bit() only lock a small area of memory? I thought
it wasn't very expensive on this arch (some bus optimizations or
something like that).
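
(For reference, the x86 set_bit() in question boils down to a single locked
bts; roughly, simplified from arch/x86/include/asm/bitops.h:)

    /* Simplified sketch of x86 set_bit(): one lock-prefixed bts instruction. */
    static inline void set_bit(long nr, volatile unsigned long *addr)
    {
        asm volatile("lock; bts %1,%0"
                     : "+m" (*(volatile long *) addr)
                     : "Ir" (nr)
                     : "memory");
    }

The lock prefix only holds the owning cache line, but the instruction is still
a full atomic read-modify-write with full-barrier semantics, which is where the
cost comes from.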

> Would it make sense to try to use context tracking instead?  On
> systems that use context tracking, syscalls are already expensive, and
> we're already keeping track of which CPUs are in user mode.

I'll look at context_tracking, but I'm not sure about the SMP synchronization
there.

Thanks,
Kirill



* Re: [RFC] sched, x86: Prevent resched interrupts if task in kernel mode and !CONFIG_PREEMPT
  2015-01-23 17:09     ` Kirill Tkhai
@ 2015-01-24  2:36       ` Andy Lutomirski
  2015-01-26 11:58         ` Kirill Tkhai
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Lutomirski @ 2015-01-24  2:36 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin

On Fri, Jan 23, 2015 at 9:09 AM, Kirill Tkhai <ktkhai@parallels.com> wrote:
> On Fri, 23/01/2015 at 08:24 -0800, Andy Lutomirski wrote:
>> On Fri, Jan 23, 2015 at 8:07 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Fri, Jan 23, 2015 at 06:53:32PM +0300, Kirill Tkhai wrote:
>> >> ---
>> >>  arch/x86/kernel/entry_64.S |   10 ++++++++++
>> >>  1 file changed, 10 insertions(+)
>> >>
>> >> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> >> index c653dc4..a046ba8 100644
>> >> --- a/arch/x86/kernel/entry_64.S
>> >> +++ b/arch/x86/kernel/entry_64.S
>> >> @@ -409,6 +409,13 @@ GLOBAL(system_call_after_swapgs)
>> >>       movq_cfi rax,(ORIG_RAX-ARGOFFSET)
>> >>       movq  %rcx,RIP-ARGOFFSET(%rsp)
>> >>       CFI_REL_OFFSET rip,RIP-ARGOFFSET
>> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
>> >> +     /*
>> >> +      * Tell resched_curr() do not send useless interrupts to us.
>> >> +      * Kernel isn't preemptible till sysret_careful() anyway.
>> >> +      */
>> >> +     LOCK ; bts $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>> >> +#endif
>>
>> That's kind of expensive.  What's the !SMP part for?
>
> smp_send_reschedule() is NOP on UP. There is no problem.

Shouldn't it be #if !defined(CONFIG_PREEMPT) && defined(CONFIG_SMP) then?

>
>>
>> >>       testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>> >>       jnz tracesys
>> >>  system_call_fastpath:
>> >> @@ -427,6 +434,9 @@ GLOBAL(system_call_after_swapgs)
>> >>   * Has incomplete stack frame and undefined top of stack.
>> >>   */
>> >>  ret_from_sys_call:
>> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
>> >> +     LOCK ; btr $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>> >> +#endif
>>
>> If only it were this simple.  There are lots of ways out of syscalls,
>> and this is only one of them :(  If we did this, I'd rather do it
>> through the do_notify_resume mechanism or something.
>
> Yes, syscall is the only thing I did as an example.
>
>> I don't see any way to do this without at least one atomic op or
>> smp_mb per syscall, and that's kind of expensive.
>
> JFI, doesn't x86 set_bit() lock a small area of memory? I thought
> it's not very expensive on this arch (some bus optimizations or
> something like this).

An entire syscall on x86 is well under 200 cycles.  lock addl is >20
cycles for me, and I don't see why the atomic bitops would be faster.
(Oddly, mfence is slower than lock addl, which is really odd, since
lock addl implies mfence.)  So this overhead may actually matter.
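
A rough way to reproduce that comparison from user space (just a sketch;
absolute numbers vary a lot by microarchitecture and by how the loop is built):

    #include <stdio.h>
    #include <stdint.h>

    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        volatile long x = 0;
        const int n = 1000000;
        uint64_t t0, t1;
        int i;

        t0 = rdtsc();
        for (i = 0; i < n; i++)
            asm volatile("lock; addl $0, (%0)" :: "r" (&x) : "memory", "cc");
        t1 = rdtsc();
        printf("lock addl: %.1f cycles/op\n", (double)(t1 - t0) / n);

        t0 = rdtsc();
        for (i = 0; i < n; i++)
            asm volatile("mfence" ::: "memory");
        t1 = rdtsc();
        printf("mfence:    %.1f cycles/op\n", (double)(t1 - t0) / n);

        return 0;
    }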

>
>> Would it make sense to try to use context tracking instead?  On
>> systems that use context tracking, syscalls are already expensive, and
>> we're already keeping track of which CPUs are in user mode.
>
> I'll look at context_tracking, but I'm not sure some smp synchronization
> there.

It could be combinable with existing synchronization there.

--Andy


* Re: [RFC] sched, x86: Prevent resched interrupts if task in kernel mode and !CONFIG_PREEMPT
From: Kirill Tkhai @ 2015-01-26 11:58 UTC
  To: Andy Lutomirski
  Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin

On Fri, 23/01/2015 at 18:36 -0800, Andy Lutomirski wrote:
> On Fri, Jan 23, 2015 at 9:09 AM, Kirill Tkhai <ktkhai@parallels.com> wrote:
> > On Fri, 23/01/2015 at 08:24 -0800, Andy Lutomirski wrote:
> >> On Fri, Jan 23, 2015 at 8:07 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > On Fri, Jan 23, 2015 at 06:53:32PM +0300, Kirill Tkhai wrote:
> >> >> ---
> >> >>  arch/x86/kernel/entry_64.S |   10 ++++++++++
> >> >>  1 file changed, 10 insertions(+)
> >> >>
> >> >> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> >> >> index c653dc4..a046ba8 100644
> >> >> --- a/arch/x86/kernel/entry_64.S
> >> >> +++ b/arch/x86/kernel/entry_64.S
> >> >> @@ -409,6 +409,13 @@ GLOBAL(system_call_after_swapgs)
> >> >>       movq_cfi rax,(ORIG_RAX-ARGOFFSET)
> >> >>       movq  %rcx,RIP-ARGOFFSET(%rsp)
> >> >>       CFI_REL_OFFSET rip,RIP-ARGOFFSET
> >> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> >> >> +     /*
> >> >> +      * Tell resched_curr() do not send useless interrupts to us.
> >> >> +      * Kernel isn't preemptible till sysret_careful() anyway.
> >> >> +      */
> >> >> +     LOCK ; bts $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >> >> +#endif
> >>
> >> That's kind of expensive.  What's the !SMP part for?
> >
> > smp_send_reschedule() is NOP on UP. There is no problem.
> 
> Shouldn't it be #if !defined(CONFIG_PREEMPT) && defined(CONFIG_SMP) then?

Definitely, thanks.

> 
> >
> >>
> >> >>       testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >> >>       jnz tracesys
> >> >>  system_call_fastpath:
> >> >> @@ -427,6 +434,9 @@ GLOBAL(system_call_after_swapgs)
> >> >>   * Has incomplete stack frame and undefined top of stack.
> >> >>   */
> >> >>  ret_from_sys_call:
> >> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> >> >> +     LOCK ; btr $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >> >> +#endif
> >>
> >> If only it were this simple.  There are lots of ways out of syscalls,
> >> and this is only one of them :(  If we did this, I'd rather do it
> >> through the do_notify_resume mechanism or something.
> >
> > Yes, syscall is the only thing I did as an example.
> >
> >> I don't see any way to do this without at least one atomic op or
> >> smp_mb per syscall, and that's kind of expensive.
> >
> > JFI, doesn't x86 set_bit() lock a small area of memory? I thought
> > it's not very expensive on this arch (some bus optimizations or
> > something like this).
> 
> An entire syscall on x86 is well under 200 cycles.  lock addl is >20
> cycles for me, and I don't see why the atomic bitops would be faster.
> (Oddly, mfence is slower than lock addl, which is really odd, since
> lock addl implies mfence.)  So this overhead may actually matter.

Yeah, that's a really big overhead.

> >
> >> Would it make sense to try to use context tracking instead?  On
> >> systems that use context tracking, syscalls are already expensive, and
> >> we're already keeping track of which CPUs are in user mode.
> >
> > I'll look at context_tracking, but I'm not sure some smp synchronization
> > there.
> 
> It could be combinable with existing synchronization there.

I'll look at this. Thanks!

Kirill



* Re: [RFC] sched, x86: Prevent resched interrupts if task in kernel mode and !CONFIG_PREEMPT
From: Kirill Tkhai @ 2015-02-03 17:14 UTC
  To: Andy Lutomirski
  Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin

On Mon, 26/01/2015 at 14:58 +0300, Kirill Tkhai wrote:
> On Fri, 23/01/2015 at 18:36 -0800, Andy Lutomirski wrote:
> > On Fri, Jan 23, 2015 at 9:09 AM, Kirill Tkhai <ktkhai@parallels.com> wrote:
> > > On Fri, 23/01/2015 at 08:24 -0800, Andy Lutomirski wrote:
> > >> On Fri, Jan 23, 2015 at 8:07 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >> > On Fri, Jan 23, 2015 at 06:53:32PM +0300, Kirill Tkhai wrote:
> > >> >> ---
> > >> >>  arch/x86/kernel/entry_64.S |   10 ++++++++++
> > >> >>  1 file changed, 10 insertions(+)
> > >> >>
> > >> >> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > >> >> index c653dc4..a046ba8 100644
> > >> >> --- a/arch/x86/kernel/entry_64.S
> > >> >> +++ b/arch/x86/kernel/entry_64.S
> > >> >> @@ -409,6 +409,13 @@ GLOBAL(system_call_after_swapgs)
> > >> >>       movq_cfi rax,(ORIG_RAX-ARGOFFSET)
> > >> >>       movq  %rcx,RIP-ARGOFFSET(%rsp)
> > >> >>       CFI_REL_OFFSET rip,RIP-ARGOFFSET
> > >> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> > >> >> +     /*
> > >> >> +      * Tell resched_curr() do not send useless interrupts to us.
> > >> >> +      * Kernel isn't preemptible till sysret_careful() anyway.
> > >> >> +      */
> > >> >> +     LOCK ; bts $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> > >> >> +#endif
> > >>
> > >> That's kind of expensive.  What's the !SMP part for?
> > >
> > > smp_send_reschedule() is NOP on UP. There is no problem.
> > 
> > Shouldn't it be #if !defined(CONFIG_PREEMPT) && defined(CONFIG_SMP) then?
> 
> Definitely, thanks.
> 
> > 
> > >
> > >>
> > >> >>       testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> > >> >>       jnz tracesys
> > >> >>  system_call_fastpath:
> > >> >> @@ -427,6 +434,9 @@ GLOBAL(system_call_after_swapgs)
> > >> >>   * Has incomplete stack frame and undefined top of stack.
> > >> >>   */
> > >> >>  ret_from_sys_call:
> > >> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> > >> >> +     LOCK ; btr $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> > >> >> +#endif
> > >>
> > >> If only it were this simple.  There are lots of ways out of syscalls,
> > >> and this is only one of them :(  If we did this, I'd rather do it
> > >> through the do_notify_resume mechanism or something.
> > >
> > > Yes, syscall is the only thing I did as an example.
> > >
> > >> I don't see any way to do this without at least one atomic op or
> > >> smp_mb per syscall, and that's kind of expensive.
> > >
> > > JFI, doesn't x86 set_bit() lock a small area of memory? I thought
> > > it's not very expensive on this arch (some bus optimizations or
> > > something like this).
> > 
> > An entire syscall on x86 is well under 200 cycles.  lock addl is >20
> > cycles for me, and I don't see why the atomic bitops would be faster.
> > (Oddly, mfence is slower than lock addl, which is really odd, since
> > lock addl implies mfence.)  So this overhead may actually matter.
> 
> Yeah, it's really big overhead.
> 
> > >
> > >> Would it make sense to try to use context tracking instead?  On
> > >> systems that use context tracking, syscalls are already expensive, and
> > >> we're already keeping track of which CPUs are in user mode.
> > >
> > > I'll look at context_tracking, but I'm not sure some smp synchronization
> > > there.
> > 
> > It could be combinable with existing synchronization there.
> 
> I'll look at this. Thanks!

Continuing the theme: I've tried the idea with RCU. It's a quick-and-dirty
patch which prevents unnecessary interrupts.

It eliminates 2% of reschedule IPIs, at the cost of an atomic_read() in
resched_curr(). It looks like the gain is not that big...
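
The patch itself is not included in this message, so purely as an illustration
of the shape of such a check (field and helper names invented, not the posted
code):

    /* Illustration only -- not the posted patch. */
    static DEFINE_PER_CPU(atomic_t, cpu_in_kernel_nopreempt);

    static void resched_curr_maybe_ipi(struct rq *rq)
    {
        struct task_struct *curr = rq->curr;
        int cpu = cpu_of(rq);

        set_tsk_need_resched(curr);

        /* The extra cost mentioned above: one atomic_read() per remote wakeup. */
        if (!atomic_read(&per_cpu(cpu_in_kernel_nopreempt, cpu)))
            smp_send_reschedule(cpu);
    }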

Kirill


