All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, rostedt@goodmis.org,
	dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com,
	oleg@redhat.com, kvm@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH RFC tip/core/rcu 1/2] srcu: Allow use of Tiny/Tree SRCU from both process and interrupt context
Date: Tue, 6 Jun 2017 15:08:50 +0200	[thread overview]
Message-ID: <b77cb817-9ffb-b600-19a9-1de685049c57@redhat.com> (raw)
In-Reply-To: <20170606105343.ibhzrk6jwhmoja5t@hirez.programming.kicks-ass.net>



On 06/06/2017 12:53, Peter Zijlstra wrote:
> On Mon, Jun 05, 2017 at 03:09:50PM -0700, Paul E. McKenney wrote:
>> There would be a slowdown if 1) fast this_cpu_inc is not available and
>> cannot be implemented (this usually means that atomic_inc has implicit
>> memory barriers),
> 
> I don't get this.
> 
> How is per-cpu crud related to being strongly ordered?
> 
> this_cpu_ has 3 forms:
> 
> 	x86:		single instruction
> 	arm64,s390:	preempt_disable()+atomic_op
> 	generic:	local_irq_save()+normal_op
> 
> Only s390 is TSO, arm64 is very much a weak arch.

Right, and thus arm64 can implement a fast this_cpu_inc using LL/SC.
s390 cannot because its atomic_inc has implicit memory barriers.

s390's this_cpu_inc is *faster* than the generic one, but still pretty slow.

>>                   and 2) local_irq_save/restore is slower than disabling
>> preemption.  The main architecture with these constraints is s390, which
>> however is already paying the price in __srcu_read_unlock and has not
>> complained.
> 
> IIRC only PPC (and hopefully soon x86) has a local_irq_save() that is as
> fast as preempt_disable().

1 = arch-specific this_cpu_inc is available
2 = local_irq_save/restore as fast as preempt_disable/enable

If either 1 or 2 are true, this patch makes SRCU faster or equal

x86 (single instruction): 1 = true, 2 = false   -> ok
arm64 (weakly ordered):   1 = true, 2 = false   -> ok
powerpc:                  1 = false, 2 = true   -> ok
s390:                     1 = false, 2 = false  -> slower

For other LL/SC architectures, notably arm, fast this_cpu_* ops not yet
available, but could be written pretty easily.

>> A valid optimization on s390 would be to skip the smp_mb;
>> AIUI, this_cpu_inc implies a memory barrier (!) due to its implementation.
> 
> You mean the s390 this_cpu_inc() in specific, right? Because
> this_cpu_inc() in general does not imply any such thing.

Yes, of course, this is only for s390.

Alternatively, we could change the counters to atomic_t and use
smp_mb__{before,after}_atomic, as in the (unnecessary) srcutiny patch.
That should shave a few cycles on x86 too, since "lock inc" is faster
than "inc; mfence".  For srcuclassic (and stable) however I'd rather
keep the simple __this_cpu_inc -> this_cpu_inc change.

Paolo

  parent reply	other threads:[~2017-06-06 13:08 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-05 22:09 [PATCH RFC tip/core/rcu 0/2] srcu: All SRCU readers from both process and irq Paul E. McKenney
2017-06-05 22:09 ` [PATCH RFC tip/core/rcu 1/2] srcu: Allow use of Tiny/Tree SRCU from both process and interrupt context Paul E. McKenney
2017-06-06 10:53   ` Peter Zijlstra
2017-06-06 12:56     ` Paul E. McKenney
2017-06-06 13:08     ` Paolo Bonzini [this message]
2017-06-06 14:45       ` Christian Borntraeger
2017-06-06 15:27         ` Heiko Carstens
2017-06-06 15:37           ` Christian Borntraeger
2017-06-06 15:58             ` Paul E. McKenney
2017-06-06 16:15           ` Peter Zijlstra
2017-06-06 17:00             ` Paul E. McKenney
2017-06-06 17:20             ` Heiko Carstens
2017-06-06 16:12         ` Peter Zijlstra
2017-06-06 16:02       ` Peter Zijlstra
2017-06-06 11:09   ` Peter Zijlstra
2017-06-06 12:01     ` Paolo Bonzini
2017-06-06 12:53       ` Paul E. McKenney
2017-06-06 15:54         ` Peter Zijlstra
2017-06-06 15:59           ` Paul E. McKenney
2017-06-06 17:23   ` Peter Zijlstra
2017-06-06 17:50     ` Paul E. McKenney
2017-06-06 18:00       ` Peter Zijlstra
2017-06-06 18:22         ` Paul E. McKenney
2017-06-05 22:09 ` [PATCH RFC tip/core/rcu 2/2] srcu: Allow use of Classic " Paul E. McKenney
2017-06-06 17:05 ` [PATCH RFC tip/core/rcu 0/2] srcu: All SRCU readers from both process and irq Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b77cb817-9ffb-b600-19a9-1de685049c57@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.