From: "Paul E. McKenney" <paulmck@kernel.org>
To: Lai Jiangshan <laijs@linux.alibaba.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Josh Triplett <josh@joshtriplett.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Andy Lutomirski <luto@kernel.org>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Kees Cook <keescook@chromium.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Dave Hansen <dave.hansen@intel.com>,
	Babu Moger <Babu.Moger@amd.com>, Rik van Riel <riel@surriel.com>,
	"Chang S. Bae" <chang.seok.bae@intel.com>,
	Jann Horn <jannh@google.com>, David Windsor <dwindsor@gmail.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	Andrea Parri <andrea.parri@amarulasolutions.com>,
	Yuyang Du <duyuyang@gmail.com>,
	Richard Guy Briggs <rgb@redhat.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Michal Hocko <mhocko@suse.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	"Dmitry V. Levin" <ldv@altlinux.org>,
	rcu@vger.kernel.org, linux-arch@vger.kernel.org
Subject: Re: [PATCH 11/11] x86,rcu: use percpu rcu_preempt_depth
Date: Fri, 1 Nov 2019 09:21:09 -0700	[thread overview]
Message-ID: <20191101162109.GN20975@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <06b15cfa-620f-d6b1-61d1-8ddfba74a2c8@linux.alibaba.com>

On Fri, Nov 01, 2019 at 11:32:32PM +0800, Lai Jiangshan wrote:
> 
> 
> > On 2019/11/1 10:30 PM, Paul E. McKenney wrote:
> > On Fri, Nov 01, 2019 at 02:13:15PM +0100, Peter Zijlstra wrote:
> > > On Fri, Nov 01, 2019 at 05:58:16AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Oct 31, 2019 at 10:08:06AM +0000, Lai Jiangshan wrote:
> > > > > +/* We mask the RCU_NEED_SPECIAL bit so that it returns the real depth */
> > > > > +static __always_inline int rcu_preempt_depth(void)
> > > > > +{
> > > > > +	return raw_cpu_read_4(__rcu_preempt_depth) & ~RCU_NEED_SPECIAL;
> > > > 
> > > > Why not raw_cpu_generic_read()?
> > > > 
> > > > OK, OK, I get that raw_cpu_read_4() translates directly into an "mov"
> > > > instruction on x86, but given that x86 percpu_from_op() is able to
> > > > adjust based on operand size, why doesn't something like raw_cpu_read()
> > > > also have an x86-specific definition that adjusts based on operand size?
> > > 
> > > The reason for preempt.h was header recursion hell.
> > 
> > Fair enough, being as that is also the reason for _rcu_read_lock()
> > not being inlined.  :-/
> > 
> > > > > +}
> > > > > +
> > > > > +static __always_inline void rcu_preempt_depth_set(int pc)
> > > > > +{
> > > > > +	int old, new;
> > > > > +
> > > > > +	do {
> > > > > +		old = raw_cpu_read_4(__rcu_preempt_depth);
> > > > > +		new = (old & RCU_NEED_SPECIAL) |
> > > > > +			(pc & ~RCU_NEED_SPECIAL);
> > > > > +	} while (raw_cpu_cmpxchg_4(__rcu_preempt_depth, old, new) != old);
> > > > 
> > > > Ummm...
> > > > 
> > > > OK, as you know, I have long wanted _rcu_read_lock() to be inlineable.
> > > > But are you -sure- that an x86 cmpxchg is faster than a function call
> > > > and return?  I have strong doubts on that score.
> > > 
> > > This is a regular CMPXCHG instruction, not a LOCK prefixed one, and that
> > > should make all the difference.
> > 
> > Yes, understood, but this is also adding some arithmetic, a comparison,
> > and a conditional branch.  Are you -sure- that this is cheaper than
> > an unconditional call and return?
> 
> rcu_preempt_depth_set() is used only for exit_rcu().
> Performance doesn't matter here. And since the RCU_NEED_SPECIAL
> bit is allowed to be lost in exit_rcu(), rcu_preempt_depth_set()
> could be a single raw_cpu_write_4() if performance mattered.

That code only executes if there is a bug involving a missing
rcu_read_unlock(), but in that case it would be necessary to keep the
check of ->rcu_read_lock_special.b.blocked due to the possibility that
the exiting task was preempted some time after the rcu_read_lock()
that would have been paired with the missing rcu_read_unlock().

> (This complex code is copied from preempt.h, and I can't predict
> how rcu_preempt_depth_set() will be used in the future,
> so I keep it unchanged.)
> 
> 
> +static __always_inline void rcu_preempt_depth_inc(void)
> +{
> +	raw_cpu_add_4(__rcu_preempt_depth, 1);
> +}
> 
> This one is for rcu_read_lock(). ONE instruction.

Apologies, I did in fact confuse _inc with _set.

> +
> +static __always_inline bool rcu_preempt_depth_dec_and_test(void)
> +{
> +	return GEN_UNARY_RMWcc("decl", __rcu_preempt_depth, e,
> +			       __percpu_arg([var]));
> +}
> 
> This one is for rcu_read_unlock(), which will be 2 instructions
> ("decl" and "je"), the same as preempt_enable().
> 
> Nowadays, preempt_disable() is discouraged unless it is
> really necessary, and RCU is always encouraged. Low-overhead
> rcu_read_[un]lock() is essential.

Agreed.  And you give instruction counts below, thank you.  But is
this reduction in the number of instructions really visible in terms of
performance at the system level?

> > > > Plus multiplying the x86-specific code by 26 doesn't look good.
> > > > 
> > > > And the RCU read-side nesting depth really is a per-task thing.  Copying
> > > > it to and from the task at context-switch time might make sense if we
> > > > had a serious optimization, but it does not appear that we do.
> 
> Once upon a time, __preempt_count is also being copied to and from the
> task at context-switch, and worked well.
> 
> > > > You original patch some years back, ill-received though it was at the
> > > > time, is looking rather good by comparison.  Plus it did not require
> > > > architecture-specific code!
> > > 
> > > Right, so the per-cpu preempt_count code relies on the preempt_count
> > > being invariant over context switches. That means we never have to
> > > save/restore the thing.
> > > 
> > > For (preemptible) rcu, this is 'obviously' not the case.
> > > 
> > > That said, I've not looked over this patch series, I only got 1 actual
> > > patch, not the whole series, and I've not had time to go dig out the
> > > rest..
> > 
> > I have taken a couple of the earlier patches in the series.
> > 
> > Perhaps inlining these things is instead a job for the long anticipated
> > GCC LTO?  ;-)
> 
> Adding a kernel/offset.c and some Makefile machinery would help inline
> these things. But I don't think Linus would be happy with introducing
> kernel/offset.c. There would be 3 instructions for rcu_read_lock()
> and 5 for rcu_read_unlock(), which doesn't taste so delicious.

Adding kernel/offset.c would be to implement the original approach of
computing the offset of ->rcu_read_lock_nesting within task_struct,
correct?  (As opposed to GCC LTO.)  If so, what are your thoughts on
the GCC LTO approach?

> Moving rcu_read_lock_nesting to struct thread_info is another
> possible way. The number of instructions is also 3 and 5.

At first glance, that would require much less architecture-specific code,
so would be more attractive.  And it would be good to avoid having two
different _rcu_read_lock() and _rcu_read_unlock() implementations for
PREEMPT=y.  Here are some questions, though:

o	Do any architectures have size limits on thread_info?

o	The thread_info reference clearly cannot change at the same
	time as the current pointer.  Are there any issues with
	code that executes between these two operations, either
	when switching out or when switching in?

Any other questions that I should be asking?

And please rest assured that despite my lack of enthusiasm for your
current approach, I do very much appreciate your looking into this!!!

							Thanx, Paul

Thread overview: 45+ messages
2019-10-31 10:07 [PATCH 00/11] rcu: introduce percpu rcu_preempt_depth Lai Jiangshan
2019-10-31 10:07 ` [PATCH 01/11] rcu: avoid leaking exp_deferred_qs into next GP Lai Jiangshan
2019-10-31 13:43   ` Paul E. McKenney
2019-10-31 18:19     ` Lai Jiangshan
2019-10-31 19:00       ` Paul E. McKenney
2019-10-31 10:07 ` [PATCH 02/11] rcu: fix bug when rcu_exp_handler() in nested interrupt Lai Jiangshan
2019-10-31 13:47   ` Paul E. McKenney
2019-10-31 14:20     ` Lai Jiangshan
2019-10-31 14:31     ` Paul E. McKenney
2019-10-31 15:14       ` Lai Jiangshan
2019-10-31 18:52         ` Paul E. McKenney
2019-11-01  0:19           ` Boqun Feng
2019-11-01  2:29             ` Lai Jiangshan
2019-10-31 10:07 ` [PATCH 03/11] rcu: clean up rcu_preempt_deferred_qs_irqrestore() Lai Jiangshan
2019-10-31 13:52   ` Paul E. McKenney
2019-10-31 15:25     ` Lai Jiangshan
2019-10-31 18:57       ` Paul E. McKenney
2019-10-31 19:02         ` Paul E. McKenney
2019-10-31 10:07 ` [PATCH 04/11] rcu: cleanup rcu_preempt_deferred_qs() Lai Jiangshan
2019-10-31 14:10   ` Paul E. McKenney
2019-10-31 14:35     ` Lai Jiangshan
2019-10-31 15:07       ` Paul E. McKenney
2019-10-31 18:33         ` Lai Jiangshan
2019-10-31 22:45           ` Paul E. McKenney
2019-10-31 10:08 ` [PATCH 05/11] rcu: clean all rcu_read_unlock_special after report qs Lai Jiangshan
2019-11-01 11:54   ` Paul E. McKenney
2019-10-31 10:08 ` [PATCH 06/11] rcu: clear t->rcu_read_unlock_special in one go Lai Jiangshan
2019-11-01 12:10   ` Paul E. McKenney
2019-11-01 16:58     ` Paul E. McKenney
2019-10-31 10:08 ` [PATCH 07/11] rcu: set special.b.deferred_qs before wake_up() Lai Jiangshan
2019-10-31 10:08 ` [PATCH 08/11] rcu: don't use negative ->rcu_read_lock_nesting Lai Jiangshan
2019-11-01 12:33   ` Paul E. McKenney
2019-11-16 13:04     ` Lai Jiangshan
2019-11-17 21:53       ` Paul E. McKenney
2019-11-18  1:54         ` Lai Jiangshan
2019-11-18 14:57           ` Paul E. McKenney
2019-10-31 10:08 ` [PATCH 09/11] rcu: wrap usages of rcu_read_lock_nesting Lai Jiangshan
2019-10-31 10:08 ` [PATCH 10/11] rcu: clear the special.b.need_qs in rcu_note_context_switch() Lai Jiangshan
2019-10-31 10:08 ` [PATCH 11/11] x86,rcu: use percpu rcu_preempt_depth Lai Jiangshan
2019-11-01 12:58   ` Paul E. McKenney
2019-11-01 13:13     ` Peter Zijlstra
2019-11-01 14:30       ` Paul E. McKenney
2019-11-01 15:32         ` Lai Jiangshan
2019-11-01 16:21           ` Paul E. McKenney [this message]
2019-11-01 15:47       ` Lai Jiangshan
