From: Kirill Tkhai <ktkhai@parallels.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC] sched, x86: Prevent resched interrupts if task in kernel mode and !CONFIG_PREEMPT
Date: Fri, 23 Jan 2015 20:09:49 +0300
Message-ID: <1422032989.6345.26.camel@tkhai>
In-Reply-To: <CALCETrVEsNj8dvqd-mNqb5tKNQOwQEgtMRUeTtJSS8-EmntAiA@mail.gmail.com>

On Fri, 23/01/2015 at 08:24 -0800, Andy Lutomirski wrote:
> On Fri, Jan 23, 2015 at 8:07 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Jan 23, 2015 at 06:53:32PM +0300, Kirill Tkhai wrote:
> >> It's useless to send reschedule interrupts in such situations. The earliest
> >> point, where schedule() call is possible, is sysret_careful(). But in that
> >> function we directly test TIF_NEED_RESCHED.
> >>
> >> So it's possible to get rid of that type of interrupts.
> >>
> >> How about this idea? Is set_bit() cheap on x86 machines?
> >
> > So you set TIF_POLLING_NRFLAG on syscall entry and clear it again on
> > exit? Thereby we avoid the IPI, because the exit path already checks for
> > TIF_NEED_RESCHED.
> 
> The idle code says:
> 
>         /*
>          * If the arch has a polling bit, we maintain an invariant:
>          *
>          * Our polling bit is clear if we're not scheduled (i.e. if
>          * rq->curr != rq->idle).  This means that, if rq->idle has
>          * the polling bit set, then setting need_resched is
>          * guaranteed to cause the cpu to reschedule.
>          */
> 
> Setting polling on non-idle tasks like this will either involve
> weakening this a bit (it'll still be true for rq->idle) or changing
> the polling state on context switch.
> 
> >
> > Should work I suppose, but I'm not too familiar with all that entry.S
> > muck. Andy might know and appreciate this.
> >
> >> ---
> >>  arch/x86/kernel/entry_64.S |   10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> >> index c653dc4..a046ba8 100644
> >> --- a/arch/x86/kernel/entry_64.S
> >> +++ b/arch/x86/kernel/entry_64.S
> >> @@ -409,6 +409,13 @@ GLOBAL(system_call_after_swapgs)
> >>       movq_cfi rax,(ORIG_RAX-ARGOFFSET)
> >>       movq  %rcx,RIP-ARGOFFSET(%rsp)
> >>       CFI_REL_OFFSET rip,RIP-ARGOFFSET
> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> >> +     /*
> >> +      * Tell resched_curr() do not send useless interrupts to us.
> >> +      * Kernel isn't preemptible till sysret_careful() anyway.
> >> +      */
> >> +     LOCK ; bts $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >> +#endif
> 
> That's kind of expensive.  What's the !SMP part for?

smp_send_reschedule() is a no-op on UP, so there is no problem there.

> 
> >>       testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >>       jnz tracesys
> >>  system_call_fastpath:
> >> @@ -427,6 +434,9 @@ GLOBAL(system_call_after_swapgs)
> >>   * Has incomplete stack frame and undefined top of stack.
> >>   */
> >>  ret_from_sys_call:
> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> >> +     LOCK ; btr $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >> +#endif
> 
> If only it were this simple.  There are lots of ways out of syscalls,
> and this is only one of them :(  If we did this, I'd rather do it
> through the do_notify_resume mechanism or something.

Yes, I only did the syscall path, as an example.

> I don't see any way to do this without at least one atomic op or
> smp_mb per syscall, and that's kind of expensive.

Just for information: doesn't x86 set_bit() lock only a small area of
memory? I thought it was not very expensive on this arch (some bus
optimizations or something like that).
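
To make the handshake concrete, here is a rough user-space model of the
idea in C11 atomics. This is only a sketch, not kernel code: the
model_* functions, struct task_model and the F_* bits are names made up
for illustration, and the IPI is simulated with a counter. It shows
where the one atomic op on entry and the one on exit come from.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits, loosely modelled on TIF_NEED_RESCHED and
 * TIF_POLLING_NRFLAG; the values are arbitrary. */
#define F_NEED_RESCHED (1u << 0)
#define F_POLLING      (1u << 1)

struct task_model {
	atomic_uint flags;
	atomic_uint ipis_sent;	/* counts simulated resched IPIs */
};

static inline void model_init(struct task_model *t)
{
	atomic_init(&t->flags, 0);
	atomic_init(&t->ipis_sent, 0);
}

/* "Syscall entry": promise that we will test F_NEED_RESCHED ourselves
 * before returning to user mode. One atomic RMW. */
static inline void model_enter_kernel(struct task_model *t)
{
	atomic_fetch_or(&t->flags, F_POLLING);
}

/* Remote side: set F_NEED_RESCHED and look at the old flags in the
 * same atomic op. Returns true if a (simulated) IPI was sent. */
static inline bool model_resched(struct task_model *t)
{
	unsigned int old = atomic_fetch_or(&t->flags, F_NEED_RESCHED);

	if (old & F_POLLING)
		return false;	/* target notices the flag on its way out */

	atomic_fetch_add(&t->ipis_sent, 1);
	return true;
}

/* "Syscall exit": stop polling and report whether a reschedule was
 * requested meanwhile. A request that arrives after this point no
 * longer sees F_POLLING and falls back to the IPI. */
static inline bool model_exit_kernel(struct task_model *t)
{
	unsigned int old = atomic_fetch_and(&t->flags, ~F_POLLING);

	return (old & F_NEED_RESCHED) != 0;
}
```

In this model the clear on exit has to be an atomic read-modify-write
too; otherwise a request racing with the exit could be lost without
either side noticing it.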

> Would it make sense to try to use context tracking instead?  On
> systems that use context tracking, syscalls are already expensive, and
> we're already keeping track of which CPUs are in user mode.

I'll look at context_tracking, but I'm not sure about the SMP
synchronization there.

Thanks,
Kirill

