All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Rik van Riel <riel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Ingo Molnar <mingo@kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	X86 ML <x86@kernel.org>,
	williams@redhat.com, Andrew Lutomirski <luto@kernel.org>,
	fweisbec@redhat.com, Peter Zijlstra <peterz@infradead.org>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: question about RCU dynticks_nesting
Date: Mon, 4 May 2015 12:39:23 -0700	[thread overview]
Message-ID: <20150504193923.GX5381@linux.vnet.ibm.com> (raw)
In-Reply-To: <5547C1DC.10802@redhat.com>

On Mon, May 04, 2015 at 03:00:44PM -0400, Rik van Riel wrote:
> On 05/04/2015 11:59 AM, Rik van Riel wrote:
> 
> > However, currently the RCU code seems to use a much more
> > complex counting scheme, with a different increment for
> > kernel/task use, and irq use.
> > 
> > This counter seems to be modeled on the task preempt_counter,
> > where we do care about whether we are in task context, irq
> > context, or softirq context.
> > 
> > On the other hand, the RCU code only seems to care about
> > whether or not a CPU is in an extended quiescent state,
> > or is potentially in an RCU critical section.
> > 
> > Paul, what is the reason for RCU using a complex counter,
> > instead of a simple increment for each potential kernel/RCU
> > entry, like rcu_read_lock() does with CONFIG_PREEMPT_RCU
> > enabled?
> 
> Looking at the code for a while more, I have not found
> any reason why the rcu dynticks counter is so complex.

For the nesting counter, please see my earlier email.

> The rdtp->dynticks atomic seems to be used as a serial
> number. Odd means the cpu is in an rcu quiescent state,
> even means it is not.

Yep.

> This test is used to verify whether or not a CPU is
> in rcu quiescent state. Presumably the atomic_add_return
> is used to add a memory barrier.
> 
> 	atomic_add_return(0, &rdtp->dynticks) & 0x1)

Yep.  It is sampled remotely, hence the need for full memory barriers.
It doesn't help to sample the counter if the sampling gets reordered
with the surrounding code.  Ditto for the increments.

By the end of the year, and hopefully much sooner, I expect to have
testing infrastructure capable of detecting ordering bugs in this code.
At which point, I can start experimenting with alternative code sequences.

But full ordering is still required, and cache misses can happen.

> > In fact, would we be able to simply use tsk->rcu_read_lock_nesting
> > as an indicator of whether or not we should bother waiting on that
> > task or CPU when doing synchronize_rcu?
> 
> We seem to have two variants of __rcu_read_lock().
> 
> One increments current->rcu_read_lock_nesting, the other
> calls preempt_disable().

Yep.  The first is preemptible RCU, the second classic RCU.

> In case of the non-preemptible RCU, we could easily also
> increase current->rcu_read_lock_nesting at the same time
> we increase the preempt counter, and use that as the
> indicator to test whether the cpu is in an extended
> rcu quiescent state. That way there would be no extra
> overhead at syscall entry or exit at all. The trick
> would be getting the preempt count and the rcu read
> lock nesting count in the same cache line for each task.

But in non-preemptible RCU, we have PREEMPT=n, so there is no preempt
counter in production kernels.  Even if there was, we have to sample this
on other CPUs, so the overhead of preempt_disable() and preempt_enable()
would be where kernel entry/exit is, so I expect that this would be a
net loss in overall performance.

> In case of the preemptible RCU scheme, we would have to
> examine the per-task state (under the runqueue lock)
> to get the current task info of all CPUs, and in
> addition wait for the blkd_tasks list to empty out
> when doing a synchronize_rcu().
> 
> That does not appear to require special per-cpu
> counters; examining the per-cpu rdp and the lists
> inside it, with the rnp->lock held if doing any
> list manipulation, looks like it would be enough.
> 
> However, the current code is a lot more complicated
> than that. Am I overlooking something obvious, Paul?
> Maybe something non-obvious? :)

Ummm...  The need to maintain memory ordering when sampling task
state from remote CPUs?

Or am I completely confused about what you are suggesting?

That said, are you chasing a real system-visible performance issue
that you tracked to RCU's dyntick-idle system?

							Thanx, Paul


  reply	other threads:[~2015-05-04 19:39 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-30 21:23 [PATCH 0/3] reduce nohz_full syscall overhead by 10% riel
2015-04-30 21:23 ` [PATCH 1/3] reduce indentation in __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 2/3] remove local_irq_save from __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry riel
2015-04-30 21:56   ` Andy Lutomirski
2015-05-01  6:40   ` Ingo Molnar
2015-05-01 15:20     ` Rik van Riel
2015-05-01 15:59       ` Ingo Molnar
2015-05-01 16:03         ` Andy Lutomirski
2015-05-01 16:21           ` Ingo Molnar
2015-05-01 16:26             ` Rik van Riel
2015-05-01 16:34               ` Ingo Molnar
2015-05-01 18:05                 ` Rik van Riel
2015-05-01 18:40                   ` Ingo Molnar
2015-05-01 19:11                     ` Rik van Riel
2015-05-01 19:37                       ` Andy Lutomirski
2015-05-02  5:27                         ` Ingo Molnar
2015-05-02 18:27                           ` Rik van Riel
2015-05-03 18:41                           ` Andy Lutomirski
2015-05-07 10:35                             ` Ingo Molnar
2015-05-04  9:26                           ` Paolo Bonzini
2015-05-04 13:30                             ` Rik van Riel
2015-05-04 14:06                             ` Rik van Riel
2015-05-04 14:19                             ` Rik van Riel
2015-05-04 15:59                             ` question about RCU dynticks_nesting Rik van Riel
2015-05-04 18:39                               ` Paul E. McKenney
2015-05-04 19:39                                 ` Rik van Riel
2015-05-04 20:02                                   ` Paul E. McKenney
2015-05-04 20:13                                     ` Rik van Riel
2015-05-04 20:38                                       ` Paul E. McKenney
2015-05-04 20:53                                         ` Rik van Riel
2015-05-05  5:54                                           ` Paul E. McKenney
2015-05-06  1:49                                             ` Mike Galbraith
2015-05-06  3:44                                               ` Mike Galbraith
2015-05-06  6:06                                                 ` Paul E. McKenney
2015-05-06  6:52                                                   ` Mike Galbraith
2015-05-06  7:01                                                     ` Mike Galbraith
2015-05-07  0:59                                           ` Frederic Weisbecker
2015-05-07 15:44                                             ` Rik van Riel
2015-05-04 19:00                               ` Rik van Riel
2015-05-04 19:39                                 ` Paul E. McKenney [this message]
2015-05-04 19:59                                   ` Rik van Riel
2015-05-04 20:40                                     ` Paul E. McKenney
2015-05-05 10:53                                   ` Peter Zijlstra
2015-05-05 12:34                                     ` Paul E. McKenney
2015-05-05 13:00                                       ` Peter Zijlstra
2015-05-05 18:35                                         ` Paul E. McKenney
2015-05-05 21:09                                           ` Rik van Riel
2015-05-06  5:41                                             ` Paul E. McKenney
2015-05-05 10:48                                 ` Peter Zijlstra
2015-05-05 10:51                                   ` Peter Zijlstra
2015-05-05 12:30                                     ` Paul E. McKenney
2015-05-02  4:06                   ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry Mike Galbraith
2015-05-01 16:37             ` Ingo Molnar
2015-05-01 16:40               ` Rik van Riel
2015-05-01 16:45                 ` Ingo Molnar
2015-05-01 16:54                   ` Rik van Riel
2015-05-01 17:12                     ` Ingo Molnar
2015-05-01 17:22                       ` Rik van Riel
2015-05-01 17:59                         ` Ingo Molnar
2015-05-01 16:22           ` Rik van Riel
2015-05-01 16:27             ` Ingo Molnar
2015-05-03 13:23       ` Mike Galbraith
2015-05-03 17:30         ` Rik van Riel
2015-05-03 18:24           ` Andy Lutomirski
2015-05-03 18:52             ` Rik van Riel
2015-05-07 10:48               ` Ingo Molnar
2015-05-07 12:18                 ` Frederic Weisbecker
2015-05-07 12:29                   ` Ingo Molnar
2015-05-07 15:47                     ` Rik van Riel
2015-05-08  7:58                       ` Ingo Molnar
2015-05-07 12:22                 ` Andy Lutomirski
2015-05-07 12:44                   ` Ingo Molnar
2015-05-07 12:49                     ` Ingo Molnar
2015-05-08  6:17                       ` Paul E. McKenney
2015-05-07 12:52                     ` Andy Lutomirski
2015-05-07 15:08                       ` Ingo Molnar
2015-05-07 17:47                         ` Andy Lutomirski
2015-05-08  6:37                           ` Ingo Molnar
2015-05-08 10:59                             ` Andy Lutomirski
2015-05-08 11:27                               ` Ingo Molnar
2015-05-08 12:56                                 ` Andy Lutomirski
2015-05-08 13:27                                   ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150504193923.GX5381@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=fweisbec@redhat.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=williams@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.