linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: fweisbec@redhat.com, Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>, X86 ML <x86@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Rik van Riel <riel@redhat.com>,
	williams@redhat.com
Subject: Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry
Date: Thu, 7 May 2015 14:44:37 +0200	[thread overview]
Message-ID: <20150507124437.GB17443@gmail.com> (raw)
In-Reply-To: <CALCETrXtd2PBLBhAdKPTxuMbE9pUSvsDLpBJnd2srJg5hk=LhQ@mail.gmail.com>


* Andy Lutomirski <luto@amacapital.net> wrote:

> > > We cannot take the lock_trace(task) from irq context, and we 
> > > probably do not need to anyway, since we do not care about a 
> > > precise stack trace for the task.
> >
> > So one worry with this and similar approaches of statistically 
> > detecting user mode would be the fact that on the way out to 
> > user-space we don't really destroy the previous call trace - we 
> > just pop off the stack (non-destructively), restore RIPs and are 
> > gone.
> >
> > We'll need that percpu flag I suspect.
> >
> > And once we have the flag, we can get rid of the per syscall RCU 
> > callback as well, relatively easily: with CMPXCHG (in 
> > synchronize_rcu()!) we can reliably sample whether a CPU is in 
> > user mode right now, while the syscall entry/exit path does not 
> > use any atomics, we can just use a simple MOV.
> >
> > Once we observe 'user mode', then we have observed quiescent state 
> > and synchronize_rcu() can continue. If we've observed kernel mode 
> > we can frob the remote task's TIF_ flags to make it go into a 
> > quiescent state publishing routine on syscall-return.
> >
> 
> How does that work?
>
> If the exit code writes the per-cpu flag and then checks 
> TIF_whatever, we need a barrier to avoid a race where we end up in 
> user mode without seeing the flag.

No, the TIF_RCU_SYNC flag would be independent of the user-mode flag.

Also, the user-mode flag would be changed from the 'kernel-mode' value 
to the 'user-mode' value after we do the regular TIF checks - just 
before we hit user-space.

The TIF_RCU_QS thing is just a fancy way for synchronize_rcu() (being 
executed on some other CPU not doing RT work) to intelligently wait 
for the remote (RT work doing) CPU to finish executing kernel code, 
without polling or so.

And yes, the TIF_RCU_QS handler on the remote CPU would have to 
execute an SMP barrier, before it allows an RCU quiescent state to 
pass. Note that the regular RCU code for quiescent state publishing 
already has that barrier, typically something like:

	this_cpu_inc(rcu_qs_ctr);

That in itself is enough for synchronize_rcu(), it could do a small 
timing loop with short sleeps, until it waits for rcu_qs_ctr to 
increase.

> I think the right solution is to accept that race and just have the 
> RCU code send an IPI (or check again) if it sees too long of a 
> period elapse in kernel mode.

I don't think there's any need for an IPI.

> I think the flag should be a counter, though.  That way a workload 
> that makes lots of syscalls will be quickly detected as going 
> through quiescent states even if it's never actually observed in 
> user mode.

Flag write is easier on the CPU than an INC/DEC: only a store to a 
percpu location, no load needed, and no value dependencies. The store 
will be program ordered, so it's a spin_unlock() work-alike.

> > The only hard requirement of this scheme from the RCU 
> > synchronization POV is that all kernel contexts that may touch RCU 
> > state need to flip this flag reliably to 'kernel mode': i.e. all 
> > irq handlers, traps, NMIs and all syscall variants need to do 
> > this.
> >
> > But once it's there, it's really neat.
> 
> We already have to do this with the current code.  I went through 
> and checked all of the IST entries a couple versions ago.

Yeah - I just mean if the primary flag is _not_ TIF driven (which is 
my suggestion, and which I thought you suggested as well) then all 
code paths need to be covered.

Things like the irq-tracking callbacks are debugging and 
instrumentation code - the user/kernel mode flag would be a hard 
requirement on RCU correctness: missing a flag update might cause the 
kernel to crash in non-obvious ways.

Thanks,

	Ingo

  reply	other threads:[~2015-05-07 12:44 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-30 21:23 [PATCH 0/3] reduce nohz_full syscall overhead by 10% riel
2015-04-30 21:23 ` [PATCH 1/3] reduce indentation in __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 2/3] remove local_irq_save from __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry riel
2015-04-30 21:56   ` Andy Lutomirski
2015-05-01  6:40   ` Ingo Molnar
2015-05-01 15:20     ` Rik van Riel
2015-05-01 15:59       ` Ingo Molnar
2015-05-01 16:03         ` Andy Lutomirski
2015-05-01 16:21           ` Ingo Molnar
2015-05-01 16:26             ` Rik van Riel
2015-05-01 16:34               ` Ingo Molnar
2015-05-01 18:05                 ` Rik van Riel
2015-05-01 18:40                   ` Ingo Molnar
2015-05-01 19:11                     ` Rik van Riel
2015-05-01 19:37                       ` Andy Lutomirski
2015-05-02  5:27                         ` Ingo Molnar
2015-05-02 18:27                           ` Rik van Riel
2015-05-03 18:41                           ` Andy Lutomirski
2015-05-07 10:35                             ` Ingo Molnar
2015-05-04  9:26                           ` Paolo Bonzini
2015-05-04 13:30                             ` Rik van Riel
2015-05-04 14:06                             ` Rik van Riel
2015-05-04 14:19                             ` Rik van Riel
2015-05-04 15:59                             ` question about RCU dynticks_nesting Rik van Riel
2015-05-04 18:39                               ` Paul E. McKenney
2015-05-04 19:39                                 ` Rik van Riel
2015-05-04 20:02                                   ` Paul E. McKenney
2015-05-04 20:13                                     ` Rik van Riel
2015-05-04 20:38                                       ` Paul E. McKenney
2015-05-04 20:53                                         ` Rik van Riel
2015-05-05  5:54                                           ` Paul E. McKenney
2015-05-06  1:49                                             ` Mike Galbraith
2015-05-06  3:44                                               ` Mike Galbraith
2015-05-06  6:06                                                 ` Paul E. McKenney
2015-05-06  6:52                                                   ` Mike Galbraith
2015-05-06  7:01                                                     ` Mike Galbraith
2015-05-07  0:59                                           ` Frederic Weisbecker
2015-05-07 15:44                                             ` Rik van Riel
2015-05-04 19:00                               ` Rik van Riel
2015-05-04 19:39                                 ` Paul E. McKenney
2015-05-04 19:59                                   ` Rik van Riel
2015-05-04 20:40                                     ` Paul E. McKenney
2015-05-05 10:53                                   ` Peter Zijlstra
2015-05-05 12:34                                     ` Paul E. McKenney
2015-05-05 13:00                                       ` Peter Zijlstra
2015-05-05 18:35                                         ` Paul E. McKenney
2015-05-05 21:09                                           ` Rik van Riel
2015-05-06  5:41                                             ` Paul E. McKenney
2015-05-05 10:48                                 ` Peter Zijlstra
2015-05-05 10:51                                   ` Peter Zijlstra
2015-05-05 12:30                                     ` Paul E. McKenney
2015-05-02  4:06                   ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry Mike Galbraith
2015-05-01 16:37             ` Ingo Molnar
2015-05-01 16:40               ` Rik van Riel
2015-05-01 16:45                 ` Ingo Molnar
2015-05-01 16:54                   ` Rik van Riel
2015-05-01 17:12                     ` Ingo Molnar
2015-05-01 17:22                       ` Rik van Riel
2015-05-01 17:59                         ` Ingo Molnar
2015-05-01 16:22           ` Rik van Riel
2015-05-01 16:27             ` Ingo Molnar
2015-05-03 13:23       ` Mike Galbraith
2015-05-03 17:30         ` Rik van Riel
2015-05-03 18:24           ` Andy Lutomirski
2015-05-03 18:52             ` Rik van Riel
2015-05-07 10:48               ` Ingo Molnar
2015-05-07 12:18                 ` Frederic Weisbecker
2015-05-07 12:29                   ` Ingo Molnar
2015-05-07 15:47                     ` Rik van Riel
2015-05-08  7:58                       ` Ingo Molnar
2015-05-07 12:22                 ` Andy Lutomirski
2015-05-07 12:44                   ` Ingo Molnar [this message]
2015-05-07 12:49                     ` Ingo Molnar
2015-05-08  6:17                       ` Paul E. McKenney
2015-05-07 12:52                     ` Andy Lutomirski
2015-05-07 15:08                       ` Ingo Molnar
2015-05-07 17:47                         ` Andy Lutomirski
2015-05-08  6:37                           ` Ingo Molnar
2015-05-08 10:59                             ` Andy Lutomirski
2015-05-08 11:27                               ` Ingo Molnar
2015-05-08 12:56                                 ` Andy Lutomirski
2015-05-08 13:27                                   ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150507124437.GB17443@gmail.com \
    --to=mingo@kernel.org \
    --cc=fweisbec@redhat.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=umgwanakikbuti@gmail.com \
    --cc=williams@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).