* [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock
@ 2013-10-30 19:09 Michael S. Tsirkin
  2013-10-30 20:15 ` Paul E. McKenney
  2013-10-31 11:14 ` Paolo Bonzini
  0 siblings, 2 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2013-10-30 19:09 UTC (permalink / raw)
  To: paulmck, linux-kernel; +Cc: kvm, gleb, pbonzini

I noticed that srcu_read_lock/unlock both have a memory barrier,
so just by moving srcu_read_unlock earlier we can get rid of
one call to smp_mb().

Unsurprisingly, the gain is small but measurable using the unit test
microbenchmark:
before
	vmcall 1407
after
	vmcall 1357

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

--

I didn't stress test this yet, sending out for early review/flames.

Paul, could you review this patch please?
Documentation/memory-barriers.txt says that unlock has a weaker
uni-directional barrier, but in practice srcu_read_unlock calls
smp_mb().
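
(For reference, the unlock side currently looks roughly like this -- quoting
from memory, so the details may differ, but the smp_mb() is the point:)

	void __srcu_read_unlock(struct srcu_struct *sp, int idx)
	{
		preempt_disable();
		smp_mb(); /* C */  /* Avoid leaking the critical section. */
		ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) -= 1;
		preempt_enable();
	}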

Is it OK to rely on this? If not, can I add
smp_mb__after_srcu_read_unlock (making it an empty macro for now)
so we can avoid an actual extra smp_mb()?

Thanks.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8617c9d..a48fb36 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5949,8 +5949,10 @@ restore:
 
 	/* We should set ->mode before check ->requests,
 	 * see the comment in make_all_cpus_request.
+	 *
+	 * srcu_read_unlock below acts as a memory barrier.
 	 */
-	smp_mb();
+	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 
 	local_irq_disable();
 
@@ -5960,12 +5962,11 @@ restore:
 		smp_wmb();
 		local_irq_enable();
 		preempt_enable();
+		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		r = 1;
 		goto cancel_injection;
 	}
 
-	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-
 	if (req_immediate_exit)
 		smp_send_reschedule(vcpu->cpu);
 

-- 
MST

* Re: [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock
  2013-10-30 19:09 [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock Michael S. Tsirkin
@ 2013-10-30 20:15 ` Paul E. McKenney
  2013-10-30 23:26   ` Michael S. Tsirkin
  2013-10-31 11:14 ` Paolo Bonzini
  1 sibling, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2013-10-30 20:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, gleb, pbonzini

On Wed, Oct 30, 2013 at 09:09:29PM +0200, Michael S. Tsirkin wrote:
> I noticed that srcu_read_lock/unlock both have a memory barrier,
> so just by moving srcu_read_unlock earlier we can get rid of
> one call to smp_mb().
> 
> Unsurprisingly, the gain is small but measurable using the unit test
> microbenchmark:
> before
> 	vmcall 1407
> after
> 	vmcall 1357
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> --
> 
> I didn't stress test this yet, sending out for early review/flames.
> 
> Paul, could you review this patch please?
> Documentation/memory-barriers.txt says that unlock has a weaker
> uni-directional barrier, but in practice srcu_read_unlock calls
> smp_mb().
> 
> Is it OK to rely on this? If not, can I add
> smp_mb__after_srcu_read_unlock (making it an empty macro for now)
> so we can avoid an actual extra smp_mb()?

Please use smp_mb__after_srcu_read_unlock().  After all, it was not
that long ago that srcu_read_unlock() contained no memory barriers,
and perhaps some day it won't need to once again.

							Thanx, Paul

> Thanks.
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8617c9d..a48fb36 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5949,8 +5949,10 @@ restore:
> 
>  	/* We should set ->mode before check ->requests,
>  	 * see the comment in make_all_cpus_request.
> +	 *
> +	 * srcu_read_unlock below acts as a memory barrier.
>  	 */
> -	smp_mb();
> +	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> 
>  	local_irq_disable();
> 
> @@ -5960,12 +5962,11 @@ restore:
>  		smp_wmb();
>  		local_irq_enable();
>  		preempt_enable();
> +		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
>  		r = 1;
>  		goto cancel_injection;
>  	}
> 
> -	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> -
>  	if (req_immediate_exit)
>  		smp_send_reschedule(vcpu->cpu);
> 
> 
> -- 
> MST
> 


* Re: [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock
  2013-10-30 20:15 ` Paul E. McKenney
@ 2013-10-30 23:26   ` Michael S. Tsirkin
  2013-10-31  4:56     ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2013-10-30 23:26 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, kvm, gleb, pbonzini

> > Paul, could you review this patch please?
> > Documentation/memory-barriers.txt says that unlock has a weaker
> > uni-directional barrier, but in practice srcu_read_unlock calls
> > smp_mb().
> > 
> > Is it OK to rely on this? If not, can I add
> > smp_mb__after_srcu_read_unlock (making it an empty macro for now)
> > so we can avoid an actual extra smp_mb()?
> 
> Please use smp_mb__after_srcu_read_unlock().  After all, it was not
> that long ago that srcu_read_unlock() contained no memory barriers,
> and perhaps some day it won't need to once again.
> 
> 							Thanx, Paul
>

Thanks!
Something like this will be enough?
 
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index c114614..9b058ee 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -237,4 +237,18 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
 	__srcu_read_unlock(sp, idx);
 }
 
+/**
+ * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
+ *
+ * Converts the preceding srcu_read_unlock into a two-way memory barrier.
+ *
+ * Call this after srcu_read_unlock, to guarantee that all memory operations
+ * that occur after smp_mb__after_srcu_read_unlock will appear to happen after
+ * the preceding srcu_read_unlock.
+ */
+static inline void smp_mb__after_srcu_read_unlock(void)
+{
+	/* __srcu_read_unlock has smp_mb() internally so nothing to do here. */
+}
+
 #endif
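
The KVM side would then keep the early srcu_read_unlock() and follow it with
the new helper.  A rough sketch of how the vcpu_enter_guest() call site could
end up looking (illustrative only, not part of this diff):

	vcpu->mode = IN_GUEST_MODE;

	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);

	/*
	 * We should set ->mode before we check ->requests, see the
	 * comment in make_all_cpus_request.  The unlock above is not
	 * documented to be a full barrier, hence the explicit helper.
	 */
	smp_mb__after_srcu_read_unlock();

	local_irq_disable();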

* Re: [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock
  2013-10-30 23:26   ` Michael S. Tsirkin
@ 2013-10-31  4:56     ` Paul E. McKenney
  2013-10-31  6:47       ` Gleb Natapov
  2013-10-31 13:57       ` Michael S. Tsirkin
  0 siblings, 2 replies; 11+ messages in thread
From: Paul E. McKenney @ 2013-10-31  4:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, gleb, pbonzini

On Thu, Oct 31, 2013 at 01:26:05AM +0200, Michael S. Tsirkin wrote:
> > > Paul, could you review this patch please?
> > > Documentation/memory-barriers.txt says that unlock has a weaker
> > > uni-directional barrier, but in practice srcu_read_unlock calls
> > > smp_mb().
> > > 
> > > Is it OK to rely on this? If not, can I add
> > > smp_mb__after_srcu_read_unlock (making it an empty macro for now)
> > > so we can avoid an actual extra smp_mb()?
> > 
> > Please use smp_mb__after_srcu_read_unlock().  After all, it was not
> > that long ago that srcu_read_unlock() contained no memory barriers,
> > and perhaps some day it won't need to once again.
> > 
> > 							Thanx, Paul
> >
> 
> Thanks!
> Something like this will be enough?
> 
> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index c114614..9b058ee 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -237,4 +237,18 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
>  	__srcu_read_unlock(sp, idx);
>  }
> 
> +/**
> + * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
> + *
> + * Converts the preceding srcu_read_unlock into a two-way memory barrier.
> + *
> + * Call this after srcu_read_unlock, to guarantee that all memory operations
> + * that occur after smp_mb__after_srcu_read_unlock will appear to happen after
> + * the preceding srcu_read_unlock.
> + */
> +static inline void smp_mb__after_srcu_read_unlock(void)
> +{
> +	/* __srcu_read_unlock has smp_mb() internally so nothing to do here. */
> +}
> +
>  #endif

Yep, that should do it!

							Thanx, Paul


* Re: [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock
  2013-10-31  4:56     ` Paul E. McKenney
@ 2013-10-31  6:47       ` Gleb Natapov
  2013-10-31 11:11         ` Paolo Bonzini
  2013-10-31 13:57       ` Michael S. Tsirkin
  1 sibling, 1 reply; 11+ messages in thread
From: Gleb Natapov @ 2013-10-31  6:47 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Michael S. Tsirkin, linux-kernel, kvm, pbonzini

On Wed, Oct 30, 2013 at 09:56:29PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 31, 2013 at 01:26:05AM +0200, Michael S. Tsirkin wrote:
> > > > Paul, could you review this patch please?
> > > > Documentation/memory-barriers.txt says that unlock has a weaker
> > > > uni-directional barrier, but in practice srcu_read_unlock calls
> > > > smp_mb().
> > > > 
> > > > Is it OK to rely on this? If not, can I add
> > > > smp_mb__after_srcu_read_unlock (making it an empty macro for now)
> > > > so we can avoid an actual extra smp_mb()?
> > > 
> > > Please use smp_mb__after_srcu_read_unlock().  After all, it was not
> > > that long ago that srcu_read_unlock() contained no memory barriers,
> > > and perhaps some day it won't need to once again.
> > > 
> > > 							Thanx, Paul
> > >
> > 
> > Thanks!
> > Something like this will be enough?
> > 
> > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > index c114614..9b058ee 100644
> > --- a/include/linux/srcu.h
> > +++ b/include/linux/srcu.h
> > @@ -237,4 +237,18 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
> >  	__srcu_read_unlock(sp, idx);
> >  }
> > 
> > +/**
> > + * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
> > + *
> > + * Converts the preceding srcu_read_unlock into a two-way memory barrier.
> > + *
> > + * Call this after srcu_read_unlock, to guarantee that all memory operations
> > + * that occur after smp_mb__after_srcu_read_unlock will appear to happen after
> > + * the preceding srcu_read_unlock.
> > + */
> > +static inline void smp_mb__after_srcu_read_unlock(void)
> > +{
> > +	/* __srcu_read_unlock has smp_mb() internally so nothing to do here. */
> > +}
> > +
> >  #endif
> 
> Yep, that should do it!
> 
This looks dubious to me. All other smp_mb__after_* variants are there
because some atomic operations have different memory barrier semantics on
different arches, but srcu_read_unlock() has the same semantics on all
arches, so smp_mb__after_srcu_read_unlock() becomes
smp_mb__after_a_function_that_happens_to_have_mb_now_but_may_not_have_in_the_future().
How likely is it that smp_mb() will disappear from srcu_read_unlock()
(it was added for a reason, I guess)?  Maybe we should change the
documentation to say that srcu_read_unlock() is a memory barrier, which
would reflect reality.
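
(To illustrate the existing pattern: some atomic ops are not guaranteed to
imply a full barrier on every arch, so callers that need the ordering write
something like the hypothetical snippet below -- names invented:)

	atomic_inc(&sp->nr_users);
	smp_mb__after_atomic_inc();	/* barrier() on x86, smp_mb() where atomic_inc() is weaker */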

--
			Gleb.

* Re: [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock
  2013-10-31  6:47       ` Gleb Natapov
@ 2013-10-31 11:11         ` Paolo Bonzini
  2013-10-31 12:28           ` Gleb Natapov
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2013-10-31 11:11 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Paul E. McKenney, Michael S. Tsirkin, linux-kernel, kvm

Il 31/10/2013 07:47, Gleb Natapov ha scritto:
> This looks dubious to me. All other smp_mb__after_* variants are there
> because some atomic operations have different memory barrier semantics on
> different arches,

It doesn't have to be arches; unlock APIs typically have release
semantics only, but SRCU is stronger.

> but srcu_read_unlock() has the same semantics on all
> arches, so smp_mb__after_srcu_read_unlock() becomes
> smp_mb__after_a_function_that_happens_to_have_mb_now_but_may_not_have_in_the_future().
> How likely is it that smp_mb() will disappear from srcu_read_unlock()
> (it was added for a reason, I guess)?  Maybe we should change the
> documentation to say that srcu_read_unlock() is a memory barrier, which
> would reflect reality.

That would be different from all other unlock APIs.

Paolo

* Re: [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock
  2013-10-30 19:09 [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock Michael S. Tsirkin
  2013-10-30 20:15 ` Paul E. McKenney
@ 2013-10-31 11:14 ` Paolo Bonzini
  2013-10-31 11:32   ` Michael S. Tsirkin
  1 sibling, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2013-10-31 11:14 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: paulmck, linux-kernel, kvm, gleb

Il 30/10/2013 20:09, Michael S. Tsirkin ha scritto:
> I noticed that srcu_read_lock/unlock both have a memory barrier,
> so just by moving srcu_read_unlock earlier we can get rid of
> one call to smp_mb().
> 
> Unsurprisingly, the gain is small but measurable using the unit test
> microbenchmark:
> before
> 	vmcall 1407
> after
> 	vmcall 1357
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Across how many runs?  Best or average or "all runs were in that
ballpark"? :)  And what's the minimum/maximum before and after the patch?

As you say the benefit is not surprising, but the experiments should be
documented properly.

Paolo

* Re: [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock
  2013-10-31 11:14 ` Paolo Bonzini
@ 2013-10-31 11:32   ` Michael S. Tsirkin
  0 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2013-10-31 11:32 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: paulmck, linux-kernel, kvm, gleb

On Thu, Oct 31, 2013 at 12:14:15PM +0100, Paolo Bonzini wrote:
> Il 30/10/2013 20:09, Michael S. Tsirkin ha scritto:
> > I noticed that srcu_read_lock/unlock both have a memory barrier,
> > so just by moving srcu_read_unlock earlier we can get rid of
> > one call to smp_mb().
> > 
> > Unsurprisingly, the gain is small but measurable using the unit test
> > microbenchmark:
> > before
> > 	vmcall 1407
> > after
> > 	vmcall 1357
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Across how many runs?


It's the experiment that the vmexit test does:
it runs for 2^30 cycles, then divides the elapsed cycles by the
number of iterations.
You normally get in the ballpark of 1,300,000 iterations.
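
Roughly, the measurement loop has this shape (paraphrased from memory of the
vmexit test, so names and details are approximate):

	unsigned long long t1, t2;
	int i, iterations = 32;

	do {
		iterations *= 2;
		t1 = rdtsc();
		for (i = 0; i < iterations; ++i)
			vmcall();
		t2 = rdtsc();
	} while (t2 - t1 < (1ULL << 30));
	printf("vmcall %d\n", (int)((t2 - t1) / iterations));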

> Best or average or "all runs were in that
> ballpark"? :)  And what's the minimum/maximum before and after the patch?
> 
> As you say the benefit is not surprising, but the experiments should be
> documented properly.
> 
> Paolo

"All runs in that ballpark".

-- 
MST

* Re: [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock
  2013-10-31 11:11         ` Paolo Bonzini
@ 2013-10-31 12:28           ` Gleb Natapov
  0 siblings, 0 replies; 11+ messages in thread
From: Gleb Natapov @ 2013-10-31 12:28 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Paul E. McKenney, Michael S. Tsirkin, linux-kernel, kvm

On Thu, Oct 31, 2013 at 12:11:21PM +0100, Paolo Bonzini wrote:
> Il 31/10/2013 07:47, Gleb Natapov ha scritto:
> > This looks dubious to me. All other smp_mb__after_* variants are there
> > because some atomic operations have different memory barrier semantics on
> > different arches,
> 
> It doesn't have to be arches; 
Of course it doesn't have to be, but right now it is :)

>                               unlock APIs typically have release
> semantics only, but SRCU is stronger.
> 
Yes, the question is whether it is by design or an implementation detail
we should not rely on.

> > but srcu_read_unlock() has the same semantics on all
> > arches, so smp_mb__after_srcu_read_unlock() becomes
> > smp_mb__after_a_function_that_happens_to_have_mb_now_but_may_not_have_in_the_future().
> > How likely is it that smp_mb() will disappear from srcu_read_unlock()
> > (it was added for a reason, I guess)?  Maybe we should change the
> > documentation to say that srcu_read_unlock() is a memory barrier, which
> > would reflect reality.
> 
> That would be different from all other unlock APIs.
> 
As long as it is documented... smp_mb__after_srcu_read_unlock() is just
a form of documentation anyway right now. I do not have a strong objection
to smp_mb__after_srcu_read_unlock() though; the improvement is impressive
for such a small change.

--
			Gleb.

* Re: [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock
  2013-10-31  4:56     ` Paul E. McKenney
  2013-10-31  6:47       ` Gleb Natapov
@ 2013-10-31 13:57       ` Michael S. Tsirkin
  2013-11-01  8:43         ` Paul E. McKenney
  1 sibling, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2013-10-31 13:57 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, kvm, gleb, pbonzini

On Wed, Oct 30, 2013 at 09:56:29PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 31, 2013 at 01:26:05AM +0200, Michael S. Tsirkin wrote:
> > > > Paul, could you review this patch please?
> > > > Documentation/memory-barriers.txt says that unlock has a weaker
> > > > uni-directional barrier, but in practice srcu_read_unlock calls
> > > > smp_mb().
> > > > 
> > > > Is it OK to rely on this? If not, can I add
> > > > smp_mb__after_srcu_read_unlock (making it an empty macro for now)
> > > > so we can avoid an actual extra smp_mb()?
> > > 
> > > Please use smp_mb__after_srcu_read_unlock().  After all, it was not
> > > that long ago that srcu_read_unlock() contained no memory barriers,
> > > and perhaps some day it won't need to once again.
> > > 
> > > 							Thanx, Paul
> > >
> > 
> > Thanks!
> > Something like this will be enough?
> > 
> > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > index c114614..9b058ee 100644
> > --- a/include/linux/srcu.h
> > +++ b/include/linux/srcu.h
> > @@ -237,4 +237,18 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
> >  	__srcu_read_unlock(sp, idx);
> >  }
> > 
> > +/**
> > + * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
> > + *
> > + * Converts the preceding srcu_read_unlock into a two-way memory barrier.
> > + *
> > + * Call this after srcu_read_unlock, to guarantee that all memory operations
> > + * that occur after smp_mb__after_srcu_read_unlock will appear to happen after
> > + * the preceding srcu_read_unlock.
> > + */
> > +static inline void smp_mb__after_srcu_read_unlock(void)
> > +{
> > +	/* __srcu_read_unlock has smp_mb() internally so nothing to do here. */
> > +}
> > +
> >  #endif
> 
> Yep, that should do it!
> 
> 							Thanx, Paul

BTW I'm wondering about the smp_mb() within srcu_read_lock.
If we kept the index in the same memory as the buffer we
dereference, could we get rid of it and use a dependency barrier
instead? It does appear prominently in the profiles.
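
Very roughly, I mean something along these lines (hypothetical layout,
invented names, just to sketch the idea -- not how SRCU is laid out today):

	struct srcu_side {
		unsigned long __percpu *count;	/* per-CPU reader count for this side */
	};

	struct srcu_alt {
		struct srcu_side __rcu *cur;	/* currently active side */
		struct srcu_side side[2];
	};

	static inline struct srcu_side *srcu_read_lock_alt(struct srcu_alt *sp)
	{
		struct srcu_side *s;

		preempt_disable();
		s = rcu_dereference_raw(sp->cur);
		/*
		 * The increment goes through the pointer we just loaded, so
		 * the hope is that the address dependency, rather than a full
		 * smp_mb(), orders it after the choice of side.
		 */
		__this_cpu_inc(*s->count);
		preempt_enable();
		return s;
	}
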
Thoughts?


-- 
MST

* Re: [PATCH RFC] kvm: optimize out smp_mb using srcu_read_unlock
  2013-10-31 13:57       ` Michael S. Tsirkin
@ 2013-11-01  8:43         ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2013-11-01  8:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, gleb, pbonzini

On Thu, Oct 31, 2013 at 03:57:14PM +0200, Michael S. Tsirkin wrote:
> On Wed, Oct 30, 2013 at 09:56:29PM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 31, 2013 at 01:26:05AM +0200, Michael S. Tsirkin wrote:
> > > > > Paul, could you review this patch please?
> > > > > Documentation/memory-barriers.txt says that unlock has a weaker
> > > > > uni-directional barrier, but in practice srcu_read_unlock calls
> > > > > smp_mb().
> > > > > 
> > > > > Is it OK to rely on this? If not, can I add
> > > > > smp_mb__after_srcu_read_unlock (making it an empty macro for now)
> > > > > so we can avoid an actual extra smp_mb()?
> > > > 
> > > > Please use smp_mb__after_srcu_read_unlock().  After all, it was not
> > > > that long ago that srcu_read_unlock() contained no memory barriers,
> > > > and perhaps some day it won't need to once again.
> > > > 
> > > > 							Thanx, Paul
> > > >
> > > 
> > > Thanks!
> > > Something like this will be enough?
> > > 
> > > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > > index c114614..9b058ee 100644
> > > --- a/include/linux/srcu.h
> > > +++ b/include/linux/srcu.h
> > > @@ -237,4 +237,18 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
> > >  	__srcu_read_unlock(sp, idx);
> > >  }
> > > 
> > > +/**
> > > + * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
> > > + *
> > > + * Converts the preceding srcu_read_unlock into a two-way memory barrier.
> > > + *
> > > + * Call this after srcu_read_unlock, to guarantee that all memory operations
> > > + * that occur after smp_mb__after_srcu_read_unlock will appear to happen after
> > > + * the preceding srcu_read_unlock.
> > > + */
> > > +static inline void smp_mb__after_srcu_read_unlock(void)
> > > +{
> > > +	/* __srcu_read_unlock has smp_mb() internally so nothing to do here. */
> > > +}
> > > +
> > >  #endif
> > 
> > Yep, that should do it!
> > 
> > 							Thanx, Paul
> 
> BTW I'm wondering about the smp_mb() within srcu_read_lock.
> If we kept the index in the same memory as the buffer we
> dereference, could we get rid of it and use a dependency barrier
> instead? It does appear prominently in the profiles.
> Thoughts?

Unfortunately, no go:

	int __srcu_read_lock(struct srcu_struct *sp)
	{
		int idx;

		idx = ACCESS_ONCE(sp->completed) & 0x1;
		preempt_disable();
		ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1;
		smp_mb(); /* B */  /* Avoid leaking the critical section. */
		ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->seq[idx]) += 1;
		preempt_enable();
		return idx;
	}

The smp_mb() is between the two increments, and there is no dependency
between them.  There -could- be a dependency between the fetch of idx
and the first increment, but there is no ordering required there because
the rest of the algorithm will correctly handle any misordering.  Which
is why there is no memory barrier there.

							Thanx, Paul

