linux-nfs.vger.kernel.org archive mirror
* [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
@ 2019-01-03 14:17 Trond Myklebust
  2019-01-03 22:45 ` J Bruce Fields
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2019-01-03 14:17 UTC (permalink / raw)
  To: J Bruce Fields; +Cc: linux-nfs

Use READ_ONCE() to tell the compiler to not optimise away the read of
xprt->xpt_flags in svc_xprt_release_slot().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/svc_xprt.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 51d36230b6e3..94d344325e22 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -363,9 +363,11 @@ static void svc_xprt_release_slot(struct svc_rqst *rqstp)
 
 static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
 {
-	if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
+	unsigned long xpt_flags = READ_ONCE(xprt->xpt_flags);
+
+	if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE)))
 		return true;
-	if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED))) {
+	if (xpt_flags & (BIT(XPT_DATA) | BIT(XPT_DEFERRED))) {
 		if (xprt->xpt_ops->xpo_has_wspace(xprt) &&
 		    svc_xprt_slots_in_range(xprt))
 			return true;
-- 
2.20.1



* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-03 14:17 [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot() Trond Myklebust
@ 2019-01-03 22:45 ` J Bruce Fields
  2019-01-03 23:40   ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: J Bruce Fields @ 2019-01-03 22:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Thu, Jan 03, 2019 at 09:17:12AM -0500, Trond Myklebust wrote:
> Use READ_ONCE() to tell the compiler to not optimise away the read of
> xprt->xpt_flags in svc_xprt_release_slot().

What exactly is the possible race here?  And why is a READ_ONCE()
sufficient, as opposed to some memory barriers?

I may need to shut myself in a room with memory-barriers.txt, I'm pretty
hazy on these things.

--b.

> 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  net/sunrpc/svc_xprt.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 51d36230b6e3..94d344325e22 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -363,9 +363,11 @@ static void svc_xprt_release_slot(struct svc_rqst *rqstp)
>  
>  static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
>  {
> -	if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
> +	unsigned long xpt_flags = READ_ONCE(xprt->xpt_flags);
> +
> +	if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE)))
>  		return true;
> -	if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED))) {
> +	if (xpt_flags & (BIT(XPT_DATA) | BIT(XPT_DEFERRED))) {
>  		if (xprt->xpt_ops->xpo_has_wspace(xprt) &&
>  		    svc_xprt_slots_in_range(xprt))
>  			return true;
> -- 
> 2.20.1


* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-03 22:45 ` J Bruce Fields
@ 2019-01-03 23:40   ` Trond Myklebust
  2019-01-04 17:39     ` bfields
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2019-01-03 23:40 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

On Thu, 2019-01-03 at 17:45 -0500, J Bruce Fields wrote:
> On Thu, Jan 03, 2019 at 09:17:12AM -0500, Trond Myklebust wrote:
> > Use READ_ONCE() to tell the compiler to not optimise away the read
> > of
> > xprt->xpt_flags in svc_xprt_release_slot().
> 
> What exactly is the possible race here?  And why is a READ_ONCE()
> sufficient, as opposed to some memory barriers?
> 
> I may need to shut myself in a room with memory-barriers.txt, I'm
> pretty
> hazy on these things.
> 

It's not about fixing any races. It is about ensuring that the compiler
does not optimise away the read if the function is ever called from
inside a loop. Not an important fix, since I'm not aware of any cases
where this has happened. However strictly speaking, we should use
READ_ONCE() here because that variable is volatile; it can be changed
by a background action.
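
As an illustrative sketch (a hypothetical polling caller, not code from
the tree): without READ_ONCE() the compiler may hoist the load out of a
loop, so

	/* the compiler is free to read xpt_flags just once, up front: */
	while (!(xprt->xpt_flags & BIT(XPT_DATA)))
		cpu_relax();

could spin forever on a stale value, whereas

	/* READ_ONCE() goes through a volatile cast, forcing a fresh
	 * load on every iteration: */
	while (!(READ_ONCE(xprt->xpt_flags) & BIT(XPT_DATA)))
		cpu_relax();

must re-read the flags each time around.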

> --b.
> 
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  net/sunrpc/svc_xprt.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index 51d36230b6e3..94d344325e22 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -363,9 +363,11 @@ static void svc_xprt_release_slot(struct
> > svc_rqst *rqstp)
> >  
> >  static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
> >  {
> > -	if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
> > +	unsigned long xpt_flags = READ_ONCE(xprt->xpt_flags);
> > +
> > +	if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE)))
> >  		return true;
> > -	if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED))) {
> > +	if (xpt_flags & (BIT(XPT_DATA) | BIT(XPT_DEFERRED))) {
> >  		if (xprt->xpt_ops->xpo_has_wspace(xprt) &&
> >  		    svc_xprt_slots_in_range(xprt))
> >  			return true;
> > -- 
> > 2.20.1
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-03 23:40   ` Trond Myklebust
@ 2019-01-04 17:39     ` bfields
  2019-01-07 21:32       ` bfields
  0 siblings, 1 reply; 19+ messages in thread
From: bfields @ 2019-01-04 17:39 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Thu, Jan 03, 2019 at 11:40:21PM +0000, Trond Myklebust wrote:
> On Thu, 2019-01-03 at 17:45 -0500, J Bruce Fields wrote:
> > On Thu, Jan 03, 2019 at 09:17:12AM -0500, Trond Myklebust wrote:
> > > Use READ_ONCE() to tell the compiler to not optimise away the read
> > > of
> > > xprt->xpt_flags in svc_xprt_release_slot().
> > 
> > What exactly is the possible race here?  And why is a READ_ONCE()
> > sufficient, as opposed to some memory barriers?
> > 
> > I may need to shut myself in a room with memory-barriers.txt, I'm
> > pretty
> > hazy on these things.
> > 
> 
> It's not about fixing any races. It is about ensuring that the compiler
> does not optimise away the read if the function is ever called from
> inside a loop. Not an important fix, since I'm not aware of any cases
> where this has happened. However strictly speaking, we should use
> READ_ONCE() here because that variable is volatile; it can be changed
> by a background action.

I wonder if there's a race here independent of that change:

svc_xprt_enqueue() callers all do something like:

	1. change some condition
	2. call svc_xprt_enqueue() to check whether the xprt should 
	   now be enqueued.

where the conditions are settings of the xpt_flags, or socket wspace, or
xpt_nr_rqsts.

In theory if we miss some concurrent change we're OK because whoever's
making that change will then also call svc_xprt_enqueue.  But that's not
enough; e.g.:

	task 1				task 2
	------				------
	set XPT_DATA
					atomic_dec(xpt_nr_rqsts)

	check XPT_DATA && check xpt_nr_rqsts

					check XPT_DATA && check xpt_nr_rqsts

If the tasks only see their local changes, then neither sees both
conditions true, so the socket doesn't get enqueued.  (And a request
that was ready to be processed will sit around until someone else comes
along and calls svc_xprt_enqueue() on that xprt.)
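
Concretely, each side's pattern is roughly this (a simplified sketch of
the svc_data_ready() and svc_xprt_release_slot() paths, details elided):

	/* task 1: data arrived */
	set_bit(XPT_DATA, &xprt->xpt_flags);
	svc_xprt_enqueue(xprt);		/* re-checks all conditions */

	/* task 2: request slot released */
	atomic_dec(&xprt->xpt_nr_rqsts);
	svc_xprt_enqueue(xprt);		/* re-checks all conditions */

and nothing orders each task's store against the reads its own
svc_xprt_enqueue() call does afterwards.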

The code's more complicated than that and maybe there's some reason that
can't happen.

--b.


* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-04 17:39     ` bfields
@ 2019-01-07 21:32       ` bfields
  2019-01-07 22:06         ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: bfields @ 2019-01-07 21:32 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Fri, Jan 04, 2019 at 12:39:12PM -0500, bfields@fieldses.org wrote:
> I wonder if there's a race here independent of that change:
> 
> svc_xprt_enqueue() callers all do something like:
> 
> 	1. change some condition
> 	2. call svc_xprt_enqueue() to check whether the xprt should 
> 	   now be enqueued.
> 
> where the conditions are settings of the xpt_flags, or socket wspace, or
> xpt_nr_rqsts.
> 
> In theory if we miss some concurrent change we're OK because whoever's
> making that change will then also call svc_xprt_enqueue.  But that's not
> enough; e.g.:
> 
> 	task 1				task 2
> 	------				------
> 	set XPT_DATA
> 					atomic_dec(xpt_nr_rqsts)
> 
> 	check XPT_DATA && check xpt_nr_rqsts
> 
> 					check XPT_DATA && check xpt_nr_rqsts
> 
> If the tasks only see their local changes, then neither sees both
> conditions true, so the socket doesn't get enqueued.  (And a request
> that was ready to be processed will sit around until someone else comes
> along and calls svc_xprt_enqueue() on that xprt.)

So maybe we actually need

 static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
 {
+	mb();
 	if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
 		return true;
 	if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED))) {

Then whichever memory barrier executes second guarantees that the
following check sees the result of both the XPT_DATA and xpt_nr_rqsts
changes.  I think....

--b.


* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-07 21:32       ` bfields
@ 2019-01-07 22:06         ` Trond Myklebust
  2019-01-08 15:01           ` bfields
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2019-01-07 22:06 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

On Mon, 2019-01-07 at 16:32 -0500, bfields@fieldses.org wrote:
> On Fri, Jan 04, 2019 at 12:39:12PM -0500, bfields@fieldses.org wrote:
> > I wonder if there's a race here independent of that change:
> > 
> > svc_xprt_enqueue() callers all do something like:
> > 
> > 	1. change some condition
> > 	2. call svc_xprt_enqueue() to check whether the xprt should 
> > 	   now be enqueued.
> > 
> > where the conditions are settings of the xpt_flags, or socket
> > wspace, or
> > xpt_nr_rqsts.
> > 
> > In theory if we miss some concurrent change we're OK because
> > whoever's
> > making that change will then also call svc_xprt_enqueue.  But
> > that's not
> > enough; e.g.:
> > 
> > 	task 1				task 2
> > 	------				------
> > 	set XPT_DATA
> > 					atomic_dec(xpt_nr_rqsts)
> > 
> > 	check XPT_DATA && check xpt_nr_rqsts
> > 
> > 					check XPT_DATA && check
> > xpt_nr_rqsts
> > 
> > If the tasks only see their local changes, then neither sees both
> > conditions true, so the socket doesn't get enqueued.  (And a
> > request
> > that was ready to be processed will sit around until someone else
> > comes along and calls svc_xprt_enqueue() on that xprt.)
> 
> So maybe we actually need
> 
>  static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
>  {
> +	mb();

You would at best need a 'smp_rmb()'. There is nothing to gain from
adding a write barrier here, and you don't even need a read barrier in
the non-smp case.

>  	if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
>  		return true;
>  	if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED))) {
> 
> Then whichever memory barrier executes second guarantees that the
> following check sees the result of both the XPT_DATA and xpt_nr_rqsts
> changes.  I think....



-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-07 22:06         ` Trond Myklebust
@ 2019-01-08 15:01           ` bfields
  2019-01-08 16:21             ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: bfields @ 2019-01-08 15:01 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Mon, Jan 07, 2019 at 10:06:19PM +0000, Trond Myklebust wrote:
> On Mon, 2019-01-07 at 16:32 -0500, bfields@fieldses.org wrote:
> > So maybe we actually need
> > 
> >  static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
> >  {
> > +	mb();
> 
> You would at best need a 'smp_rmb()'. There is nothing to gain from
> adding a write barrier here,

That's not my understanding.

What we have is basically:

	1			2
	----			----
	WRITE to A		WRITE to B

	READ from A and B	READ from A and B

and we want to guarantee that at least one of those two reads will see
both of the writes.

A read barrier only orders reads with respect to the barrier, it doesn't
do anything about writes, so doesn't guarantee anything here.
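
This is the store-buffering pattern from memory-barriers.txt; as a
minimal sketch (with made-up variables a and b, both initially zero):

	CPU 1				CPU 2
	-----				-----
	WRITE_ONCE(a, 1);		WRITE_ONCE(b, 1);
	smp_mb();			smp_mb();
	r1 = READ_ONCE(b);		r2 = READ_ONCE(a);

With smp_mb() on both sides the outcome r1 == 0 && r2 == 0 is
forbidden; with only smp_rmb() or smp_wmb() it is still allowed.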

--b.



> and you don't even need a read barrier in
> the non-smp case.
> 
> >  	if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
> >  		return true;
> >  	if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED))) {
> > 
> > Then whichever memory barrier executes second guarantees that the
> > following check sees the result of both the XPT_DATA and xpt_nr_rqsts
> > changes.  I think....
> 
> 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 


* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-08 15:01           ` bfields
@ 2019-01-08 16:21             ` Trond Myklebust
  2019-01-09 16:51               ` bfields
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2019-01-08 16:21 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

On Tue, 2019-01-08 at 10:01 -0500, bfields@fieldses.org wrote:
> On Mon, Jan 07, 2019 at 10:06:19PM +0000, Trond Myklebust wrote:
> > On Mon, 2019-01-07 at 16:32 -0500, bfields@fieldses.org wrote:
> > > So maybe we actually need
> > > 
> > >  static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
> > >  {
> > > +	mb();
> > 
> > You would at best need a 'smp_rmb()'. There is nothing to gain from
> > adding a write barrier here,
> 
> That's not my understanding.
> 
> What we have is basically:
> 
> 	1			2
> 	----			----
> 	WRITE to A		WRITE to B
> 
> 	READ from A and B	READ from A and B
> 
> and we want to guarantee that at least one of those two reads will
> see
> both of the writes.
> 
> A read barrier only orders reads with respect to the barrier, it
> doesn't
> do anything about writes, so doesn't guarantee anything here.

In this context 'WRITE to A' and/or 'WRITE to B' are presumably the
operations of setting the flag bits in xprt->xpt_flags, no? That's not
occurring here, it is occurring elsewhere.

The test_and_set_bit(XPT_DATA, &xprt->xpt_flags) in svc_data_ready()
performs an explicit barrier, so we shouldn't really care. The other
cases where we do set_bit(XPT_DATA) don't matter since the socket has
its own locking, and so the XPT_DATA is really just a test for whether
or not we need to enqueue the svc_xprt.

In the only place where XPT_DEFERRED is set, you have an implicit write
barrier (due to a spin_unlock) between the call to set_bit() and the
call to svc_xprt_enqueue(), so all data writes are guaranteed to be
complete before any attempt to enqueue the socket.

I can't see that you really care for the case of XPT_CONN, since the
just-created socket isn't going to be visible to other cpus until
you've added it to &pool->sp_sockets (which also has implicit write
barriers due to spin locks).

I don't think you really care for the case of XPT_CLOSE either since
svc_delete_xprt() doesn't depend on any other data writes that aren't
already protected by spinlocks.


So the conclusion would be to add smp_rmb() in
svc_xprt_has_something_to_do(). No extra write barriers are needed
AFAICS.
You may still need the READ_ONCE() in order to add a data dependency
barrier (i.e. to ensure that alpha processors don't reorder reads of
the xpt_flags with other speculative reads). That should reduce to a
standard read on all non-alpha architectures.
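
Putting that together, something like the following (an untested
sketch; the tail of the function is elided):

	static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
	{
		unsigned long xpt_flags;

		smp_rmb();	/* pairs with the writers' barriers */
		xpt_flags = READ_ONCE(xprt->xpt_flags);

		if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE)))
			return true;
		if (xpt_flags & (BIT(XPT_DATA) | BIT(XPT_DEFERRED)))
			return xprt->xpt_ops->xpo_has_wspace(xprt) &&
			       svc_xprt_slots_in_range(xprt);
		return false;
	}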

> 
> --b.
> 
> 
> 
> > and you don't even need a read barrier in
> > the non-smp case.
> > 
> > >  	if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
> > >  		return true;
> > >  	if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED))) {
> > > 
> > > Then whichever memory barrier executes second guarantees that the
> > > following check sees the result of both the XPT_DATA and
> > > xpt_nr_rqsts
> > > changes.  I think....
> > 
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-08 16:21             ` Trond Myklebust
@ 2019-01-09 16:51               ` bfields
  2019-01-09 17:41                 ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: bfields @ 2019-01-09 16:51 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Tue, Jan 08, 2019 at 04:21:40PM +0000, Trond Myklebust wrote:
> On Tue, 2019-01-08 at 10:01 -0500, bfields@fieldses.org wrote:
> > On Mon, Jan 07, 2019 at 10:06:19PM +0000, Trond Myklebust wrote:
> > > On Mon, 2019-01-07 at 16:32 -0500, bfields@fieldses.org wrote:
> > > > So maybe we actually need
> > > > 
> > > >  static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
> > > >  {
> > > > +	mb();
> > > 
> > > You would at best need a 'smp_rmb()'. There is nothing to gain from
> > > adding a write barrier here,
> > 
> > That's not my understanding.
> > 
> > What we have is basically:
> > 
> > 	1			2
> > 	----			----
> > 	WRITE to A		WRITE to B
> > 
> > 	READ from A and B	READ from A and B
> > 
> > and we want to guarantee that at least one of those two reads will
> > see
> > both of the writes.
> > 
> > A read barrier only orders reads with respect to the barrier, it
> > doesn't
> > do anything about writes, so doesn't guarantee anything here.
> 
> In this context 'WRITE to A' and/or 'WRITE to B' are presumably the
> operations of setting the flag bits in xprt->xpt_flags, no?

Right, or I guess sk_sock->flags, or an atomic operation on xpt_reserved
or xpt_nr_rqsts.

> That's not occurring here, it is occurring elsewhere.

Right.  And I hadn't tried to verify whether there were corresponding
(possibly implicit) write barriers in those places, thanks for doing
that work:

> The test_and_set_bit(XPT_DATA, &xprt->xpt_flags) in svc_data_ready()
> performs an explicit barrier, so we shouldn't really care.

OK.

> The other cases where we do set_bit(XPT_DATA) don't matter since the
> socket has its own locking, and so the XPT_DATA is really just a test
> for whether or not we need to enqueue the svc_xprt.

I'm not following, apologies.

In any case it's set only on initialization or in recvfrom, and in the
recvfrom case I think the

	smp_mb__before_atomic();
	clear_bit(XPT_BUSY, &xprt->xpt_flags);

in svc_xprt_received() provides the necessary write barrier.
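
Annotated, as I read it (the barrier orders the earlier condition
store before XPT_BUSY is released):

	set_bit(XPT_DATA, &xprt->xpt_flags);	/* the condition change... */
	smp_mb__before_atomic();		/* ...is made visible before... */
	clear_bit(XPT_BUSY, &xprt->xpt_flags);	/* ...XPT_BUSY is dropped */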

But there are some exceptions in the rdma code, in svc_rdma_wc_receive
and svc_rdma_wc_read_done.

> In the only place where XPT_DEFERRED is set, you have an implicit write
> barrier (due to a spin_unlock) between the call to set_bit() and the
> call to svc_xprt_enqueue(), so all data writes are guaranteed to be
> complete before any attempt to enqueue the socket.

OK.

> I can't see that you really care for the case of XPT_CONN, since the
> just-created socket isn't going to be visible to other cpus until
> you've added it to &pool->sp_sockets (which also has implicit write
> barriers due to spin locks).
> 
> I don't think you really care for the case of XPT_CLOSE either since
> svc_delete_xprt() doesn't depend on any other data writes that aren't
> already protected by spinlocks.

OK.  Yes, I'm not worried about XPT_CONN or XPT_CLOSE.

> So the conclusion would be to add smp_rmb() in
> svc_xprt_has_something_to_do(). No extra write barriers are needed
> AFAICS.
> You may still need the READ_ONCE() in order to add a data dependency
> barrier (i.e. to ensure that alpha processors don't reorder reads of
> the xpt_flags with other speculative reads). That should reduce to a
> standard read on all non-alpha architectures.

That looks unnecessary; memory-barriers.txt says "Read memory barriers
imply data dependency barriers", and later "As of v4.15 of the Linux
kernel, an smp_read_barrier_depends() was added to READ_ONCE()".

I still wonder about:

	- the RDMA cases above.
	- svc_xprt_release_slot: no write barrier after writing to
	  xprt->xpt_nr_rqsts.
	- svc_reserve: no barrier after writing to xpt_reserved

Also svc_write_space is setting SOCK_NOSPACE and then calling
svc_xprt_enqueue.  I'm pretty sure the sk_write_space method has to have
a write barrier after that, though, so this is OK.

--b.

> 
> > 
> > --b.
> > 
> > 
> > 
> > > and you don't even need a read barrier in
> > > the non-smp case.
> > > 
> > > >  	if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
> > > >  		return true;
> > > >  	if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED))) {
> > > > 
> > > > Then whichever memory barrier executes second guarantees that the
> > > > following check sees the result of both the XPT_DATA and
> > > > xpt_nr_rqsts
> > > > changes.  I think....
> > > 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 


* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-09 16:51               ` bfields
@ 2019-01-09 17:41                 ` Trond Myklebust
  2019-01-11 21:12                   ` bfields
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2019-01-09 17:41 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

On Wed, 2019-01-09 at 11:51 -0500, bfields@fieldses.org wrote:
> On Tue, Jan 08, 2019 at 04:21:40PM +0000, Trond Myklebust wrote:
> > On Tue, 2019-01-08 at 10:01 -0500, bfields@fieldses.org wrote:
> > > On Mon, Jan 07, 2019 at 10:06:19PM +0000, Trond Myklebust wrote:
> > > > On Mon, 2019-01-07 at 16:32 -0500, bfields@fieldses.org wrote:
> > > > > So maybe we actually need
> > > > > 
> > > > >  static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
> > > > >  {
> > > > > +	mb();
> > > > 
> > > > You would at best need a 'smp_rmb()'. There is nothing to gain
> > > > from
> > > > adding a write barrier here,
> > > 
> > > That's not my understanding.
> > > 
> > > What we have is basically:
> > > 
> > > 	1			2
> > > 	----			----
> > > 	WRITE to A		WRITE to B
> > > 
> > > 	READ from A and B	READ from A and B
> > > 
> > > and we want to guarantee that at least one of those two reads
> > > will
> > > see
> > > both of the writes.
> > > 
> > > A read barrier only orders reads with respect to the barrier, it
> > > doesn't
> > > do anything about writes, so doesn't guarantee anything here.
> > 
> > In this context 'WRITE to A' and/or 'WRITE to B' are presumably the
> > operations of setting the flag bits in xprt->xpt_flags, no?
> 
> Right, or I guess sk_sock->flags, or an atomic operation on
> xpt_reserved
> or xpt_nr_rqsts.
> 
> > That's not occurring here, it is occurring elsewhere.
> 
> Right.  And I hadn't tried to verify whether there were corresponding
> (possibly implicit) write barriers in those places, thanks for doing
> that work:
> 
> > The test_and_set_bit(XPT_DATA, &xprt->xpt_flags) in
> > svc_data_ready()
> > performs an explicit barrier, so we shouldn't really care.
> 
> OK.
> 
> > The other cases where we do set_bit(XPT_DATA) don't matter since
> > the
> > socket has its own locking, and so the XPT_DATA is really just a
> > test
> > for whether or not we need to enqueue the svc_xprt.
> 
> I'm not following, apologies.
> 
> In any case it's set only on initialization or in recvfrom, and in
> the
> recvfrom case I think the
> 
> 	smp_mb__before_atomic();
> 	clear_bit(XPT_BUSY, &xprt->xpt_flags);
> 
> in svc_xprt_received() provides the necessary write barrier.
> 
> But there are some exceptions in the rdma code, in
> svc_rdma_wc_receive
> and svc_rdma_wc_read_done.
> 
> > In the only place where XPT_DEFERRED is set, you have an implicit
> > write
> > barrier (due to a spin_unlock) between the call to set_bit() and
> > the
> > call to svc_xprt_enqueue(), so all data writes are guaranteed to be
> > complete before any attempt to enqueue the socket.
> 
> OK.
> 
> > I can't see that you really care for the case of XPT_CONN, since
> > the
> > just-created socket isn't going to be visible to other cpus until
> > you've added it to &pool->sp_sockets (which also has implicit write
> > barriers due to spin locks).
> > 
> > I don't think you really care for the case of XPT_CLOSE either
> > since
> > svc_delete_xprt() doesn't depend on any other data writes that
> > aren't
> > already protected by spinlocks.
> 
> OK.  Yes, I'm not worried about XPT_CONN or XPT_CLOSE.
> 
> > So the conclusion would be to add smp_rmb() in
> > svc_xprt_has_something_to_do(). No extra write barriers are needed
> > AFAICS.
> > You may still need the READ_ONCE() in order to add a data
> > dependency
> > barrier (i.e. to ensure that alpha processors don't reorder reads
> > of
> > the xpt_flags with other speculative reads). That should reduce to
> > a
> > standard read on all non-alpha architectures.
> 
> That looks unnecessary; memory-barriers.txt says "Read memory barriers
> imply data dependency barriers", and later "As of v4.15 of the Linux
> kernel, an smp_read_barrier_depends() was added to READ_ONCE()".
> 

The above is stating that

smp_rmb();
smp_read_barrier_depends();
if (xprt->xpt_flags & ....)

is redundant and can be replaced with just

smp_rmb();
if (xprt->xpt_flags & ....)

However that's not the case for smp_rmb() followed by READ_ONCE(). That
would expand to

smp_rmb();
if (xprt->xpt_flags & ...) {
    smp_read_barrier_depends();
} else
    smp_read_barrier_depends();

which is not redundant. It is ensuring (on alpha only) that the read of
xprt->xpt_flags is also not re-ordered w.r.t. other data reads that
follow.

See, for instance, kernel/events/core.c which has several examples, or
kernel/exit.c.

> I still wonder about:
> 
> 	- the RDMA cases above.
> 	- svc_xprt_release_slot: no write barrier after writing to
> 	  xprt->xpt_nr_rqsts.
> 	- svc_reserve: no barrier after writing to xpt_reserved
> 
> Also svc_write_space is setting SOCK_NOSPACE and then calling
> svc_xprt_enqueue.  I'm pretty sure the sk_write_space method has to
> have
> a write barrier after that, though, so this is OK.
> 
> --b.
> 
> > > --b.
> > > 
> > > 
> > > 
> > > > and you don't even need a read barrier in
> > > > the non-smp case.
> > > > 
> > > > >  	if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
> > > > >  		return true;
> > > > >  	if (xprt->xpt_flags &
> > > > > ((1<<XPT_DATA)|(1<<XPT_DEFERRED))) {
> > > > > 
> > > > > Then whichever memory barrier executes second guarantees that
> > > > > the
> > > > > following check sees the result of both the XPT_DATA and
> > > > > xpt_nr_rqsts
> > > > > changes.  I think....
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-09 17:41                 ` Trond Myklebust
@ 2019-01-11 21:12                   ` bfields
  2019-01-11 21:52                     ` Chuck Lever
  0 siblings, 1 reply; 19+ messages in thread
From: bfields @ 2019-01-11 21:12 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, Chuck Lever

On Wed, Jan 09, 2019 at 05:41:36PM +0000, Trond Myklebust wrote:
> The above is stating that
> 
> smp_rmb();
> smp_read_barrier_depends();
> if (xprt->xpt_flags & ....)
> 
> is redundant and can be replaced with just
> 
> smp_rmb();
> if (xprt->xpt_flags & ....)
> 
> However that's not the case for smp_rmb() followed by READ_ONCE(). That
> would expand to
> 
> smp_rmb();
> if (xprt->xpt_flags & ...) {
>     smp_read_barrier_depends();
> } else
>     smp_read_barrier_depends();
> 
> which is not redundant. It is ensuring (on alpha only) that the read of
> xprt->xpt_flags is also not re-ordered w.r.t. other data reads that
> follow.
> 
> See, for instance, kernel/events/core.c which has several examples, or
> kernel/exit.c.

You're right, I was confused.

So, I think we need your patch plus something like this.

Chuck, maybe you could help me with the "XXX: Chuck:" parts?

(This applies on top of your patch plus another that just renames the
stupidly long svc_xprt_has_something_to_do() to svc_xprt_ready().)

--b.

commit d7356c3250d4
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Fri Jan 11 15:36:40 2019 -0500

    svcrpc: fix unlikely races preventing queueing of sockets
    
    In the rpc server, when something happens that might be a reason to wake
    up a thread to do something, what we do is
    
            - modify xpt_flags, sk_sock->flags, xpt_reserved, or
              xpt_nr_rqsts to indicate the new situation
            - call svc_xprt_enqueue() to decide whether to wake up a thread.
    
    svc_xprt_enqueue may require multiple conditions to be true before
    queueing up a thread to handle the xprt.  In the SMP case, one of the
    other CPUs may have set another required condition, and in that case,
    although both CPUs run svc_xprt_enqueue(), it's possible that neither
    call sees the writes done by the other CPU in time, and neither one
    recognizes that all the required conditions have been set.  A socket
    could therefore be ignored indefinitely.
    
    Add memory barriers to ensure that any svc_xprt_enqueue() call will
    always see the conditions changed by other CPUs before deciding to
    ignore a socket.
    
    I've never seen this race reported.  In the unlikely event it happens,
    another event will usually come along and the problem will fix itself.
    So I don't think this is worth backporting to stable.
    
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index d410ae512b02..2af21b84b3b6 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -357,6 +357,7 @@ static void svc_xprt_release_slot(struct svc_rqst *rqstp)
 	struct svc_xprt	*xprt = rqstp->rq_xprt;
 	if (test_and_clear_bit(RQ_DATA, &rqstp->rq_flags)) {
 		atomic_dec(&xprt->xpt_nr_rqsts);
+		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
 		svc_xprt_enqueue(xprt);
 	}
 }
@@ -365,6 +366,15 @@ static bool svc_xprt_ready(struct svc_xprt *xprt)
 {
 	unsigned long xpt_flags;
 
+	/*
+	 * If another cpu has recently updated xpt_flags,
+	 * sk_sock->flags, xpt_reserved, or xpt_nr_rqsts, we need to
+	 * know about it; otherwise it's possible that both that cpu and
+	 * this one could call svc_xprt_enqueue() without either
+	 * svc_xprt_enqueue() recognizing that the conditions below
+	 * are satisfied, and we could stall indefinitely:
+	 */
+	smp_rmb();
 	xpt_flags = READ_ONCE(xprt->xpt_flags);
 
 	if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE)))
@@ -479,7 +489,7 @@ void svc_reserve(struct svc_rqst *rqstp, int space)
 	if (xprt && space < rqstp->rq_reserved) {
 		atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
 		rqstp->rq_reserved = space;
-
+		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
 		svc_xprt_enqueue(xprt);
 	}
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 828b149eaaef..377244992ae8 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -316,6 +316,7 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
 	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
 	spin_unlock(&rdma->sc_rq_dto_lock);
 	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
+	/* XXX: Chuck: do we need an smp_mb__after_atomic() here? */
 	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
 		svc_xprt_enqueue(&rdma->sc_xprt);
 	goto out;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index dc1951759a8e..e1a790487d69 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -290,6 +290,7 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
 		spin_unlock(&rdma->sc_rq_dto_lock);
 
 		set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
+		/* XXX: Chuck: do we need a smp_mb__after_atomic() here? */
 		svc_xprt_enqueue(&rdma->sc_xprt);
 	}
 


* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-11 21:12                   ` bfields
@ 2019-01-11 21:52                     ` Chuck Lever
  2019-01-11 21:54                       ` Chuck Lever
  0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2019-01-11 21:52 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Trond Myklebust, Linux NFS Mailing List



> On Jan 11, 2019, at 4:12 PM, bfields@fieldses.org wrote:
> 
> On Wed, Jan 09, 2019 at 05:41:36PM +0000, Trond Myklebust wrote:
>> The above is stating that
>> 
>> smp_rmb();
>> smp_read_barrier_depends();
>> if (xprt->xpt_flags & ....)
>> 
>> is redundant and can be replaced with just
>> 
>> smp_rmb();
>> if (xprt->xpt_flags & ....)
>> 
>> However that's not the case for smp_rmb() followed by READ_ONCE(). That
>> would expand to
>> 
>> smp_rmb();
>> if (xprt->xpt_flags & ...) {
>>    smp_read_barrier_depends();
>> } else
>>    smp_read_barrier_depends();
>> 
>> which is not redundant. It is ensuring (on alpha only) that the read of
>> xprt->xpt_flags is also not re-ordered w.r.t. other data reads that
>> follow.
>> 
>> See, for instance, kernel/events/core.c which has several examples, or
>> kernel/exit.c.
> 
> You're right, I was confused.
> 
> So, I think we need your patch plus something like this.
> 
> Chuck, maybe you could help me with the "XXX: Chuck:" parts?

I haven't been following. Why do you think those are necessary?
We've had set_bit and atomic_{inc,dec} in this code for ages,
and I've never noticed a problem.

Rather than adding another CPU pipeline bubble in the RDMA code,
though, could you simply move the set_bit() call site inside the
critical sections?


> (This applies on top of your patch plus another that just renames the
> stupidly long svc_xprt_has_something_to_do() to svc_xprt_ready().)
> 
> --b.
> 
> commit d7356c3250d4
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Fri Jan 11 15:36:40 2019 -0500
> 
>    svcrpc: fix unlikely races preventing queueing of sockets
> 
>    In the rpc server, when something happens that might be a reason to wake
>    up a thread to do something, what we do is
> 
>            - modify xpt_flags, sk_sock->flags, xpt_reserved, or
>              xpt_nr_rqsts to indicate the new situation
>            - call svc_xprt_enqueue() to decide whether to wake up a thread.
> 
>    svc_xprt_enqueue may require multiple conditions to be true before
>    queueing up a thread to handle the xprt.  In the SMP case, one of the
>    other CPUs may have set another required condition, and in that case,
>    although both CPUs run svc_xprt_enqueue(), it's possible that neither
>    call sees the writes done by the other CPU in time, and neither one
>    recognizes that all the required conditions have been set.  A socket
>    could therefore be ignored indefinitely.
> 
>    Add memory barriers to ensure that any svc_xprt_enqueue() call will
>    always see the conditions changed by other CPUs before deciding to
>    ignore a socket.
> 
>    I've never seen this race reported.  In the unlikely event it happens,
>    another event will usually come along and the problem will fix itself.
>    So I don't think this is worth backporting to stable.
> 
>    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> 
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index d410ae512b02..2af21b84b3b6 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -357,6 +357,7 @@ static void svc_xprt_release_slot(struct svc_rqst *rqstp)
> 	struct svc_xprt	*xprt = rqstp->rq_xprt;
> 	if (test_and_clear_bit(RQ_DATA, &rqstp->rq_flags)) {
> 		atomic_dec(&xprt->xpt_nr_rqsts);
> +		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
> 		svc_xprt_enqueue(xprt);
> 	}
> }
> @@ -365,6 +366,15 @@ static bool svc_xprt_ready(struct svc_xprt *xprt)
> {
> 	unsigned long xpt_flags;
> 
> +	/*
> +	 * If another cpu has recently updated xpt_flags,
> +	 * sk_sock->flags, xpt_reserved, or xpt_nr_rqsts, we need to
> +	 * know about it; otherwise it's possible that both that cpu and
> +	 * this one could call svc_xprt_enqueue() without either
> +	 * svc_xprt_enqueue() recognizing that the conditions below
> +	 * are satisfied, and we could stall indefinitely:
> +	 */
> +	smp_rmb();
> 	xpt_flags = READ_ONCE(xprt->xpt_flags);
> 
> 	if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE)))
> @@ -479,7 +489,7 @@ void svc_reserve(struct svc_rqst *rqstp, int space)
> 	if (xprt && space < rqstp->rq_reserved) {
> 		atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
> 		rqstp->rq_reserved = space;
> -
> +		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
> 		svc_xprt_enqueue(xprt);
> 	}
> }
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index 828b149eaaef..377244992ae8 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -316,6 +316,7 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
> 	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
> 	spin_unlock(&rdma->sc_rq_dto_lock);
> 	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
> +	/* XXX: Chuck: do we need an smp_mb__after_atomic() here? */
> 	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
> 		svc_xprt_enqueue(&rdma->sc_xprt);
> 	goto out;
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> index dc1951759a8e..e1a790487d69 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> @@ -290,6 +290,7 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
> 		spin_unlock(&rdma->sc_rq_dto_lock);
> 
> 		set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
> +		/* XXX: Chuck: do we need a smp_mb__after_atomic() here? */
> 		svc_xprt_enqueue(&rdma->sc_xprt);
> 	}
> 

--
Chuck Lever





* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-11 21:52                     ` Chuck Lever
@ 2019-01-11 21:54                       ` Chuck Lever
  2019-01-11 22:10                         ` Bruce Fields
  0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2019-01-11 21:54 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Trond Myklebust, Linux NFS Mailing List



> On Jan 11, 2019, at 4:52 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> 
> 
>> On Jan 11, 2019, at 4:12 PM, bfields@fieldses.org wrote:
>> 
>> On Wed, Jan 09, 2019 at 05:41:36PM +0000, Trond Myklebust wrote:
>>> The above is stating that
>>> 
>>> smp_rmb();
>>> smp_read_barrier_depends();
>>> if (xprt->xpt_flags & ....)
>>> 
>>> is redundant and can be replaced with just
>>> 
>>> smp_rmb();
>>> if (xprt->xpt_flags & ....)
>>> 
>>> However that's not the case for smp_rmb() followed by READ_ONCE(). That
>>> would expand to
>>> 
>>> smp_rmb();
>>> if (xprt->xpt_flags & ...) {
>>>   smp_read_barrier_depends();
>>> } else
>>>   smp_read_barrier_depends();
>>> 
>>> which is not redundant. It is ensuring (on alpha only) that the read of
>>> xprt->xpt_flags is also not re-ordered w.r.t. other data reads that
>>> follow.
>>> 
>>> See, for instance, kernel/events/core.c which has several examples, or
>>> kernel/exit.c.
>> 
>> You're right, I was confused.
>> 
>> So, I think we need your patch plus something like this.
>> 
>> Chuck, maybe you could help me with the "XXX: Chuck:" parts?
> 
> I haven't been following. Why do you think those are necessary?
> We've had set_bit and atomic_{inc,dec} in this code for ages,
> and I've never noticed a problem.
> 
> Rather than adding another CPU pipeline bubble in the RDMA code,
> though, could you simply move the set_bit() call site inside the
> critical sections?

er, inside the preceding critical section. Just reverse the order
of the spin_unlock and the set_bit.
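
I.e., in svc_rdma_wc_receive(), something like this (sketch, untested):

	spin_lock(&rdma->sc_rq_dto_lock);
	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);	/* now inside the lock */
	spin_unlock(&rdma->sc_rq_dto_lock);
	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
		svc_xprt_enqueue(&rdma->sc_xprt);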


> 
> 
>> (This applies on top of your patch plus another that just renames the
>> stupidly long svc_xprt_has_something_to_do() to svc_xprt_ready().)
>> 
>> --b.
>> 
>> commit d7356c3250d4
>> Author: J. Bruce Fields <bfields@redhat.com>
>> Date:   Fri Jan 11 15:36:40 2019 -0500
>> 
>>   svcrpc: fix unlikely races preventing queueing of sockets
>> 
>>   In the rpc server, when something happens that might be a reason to wake
>>   up a thread to do something, what we do is
>> 
>>           - modify xpt_flags, sk_sock->flags, xpt_reserved, or
>>             xpt_nr_rqsts to indicate the new situation
>>           - call svc_xprt_enqueue() to decide whether to wake up a thread.
>> 
>>   svc_xprt_enqueue may require multiple conditions to be true before
>>   queueing up a thread to handle the xprt.  In the SMP case, one of the
>>   other CPUs may have set another required condition, and in that case,
>>   although both CPUs run svc_xprt_enqueue(), it's possible that neither
>>   call sees the writes done by the other CPU in time, and neither one
>>   recognizes that all the required conditions have been set.  A socket
>>   could therefore be ignored indefinitely.
>> 
>>   Add memory barriers to ensure that any svc_xprt_enqueue() call will
>>   always see the conditions changed by other CPUs before deciding to
>>   ignore a socket.
>> 
>>   I've never seen this race reported.  In the unlikely event it happens,
>>   another event will usually come along and the problem will fix itself.
>>   So I don't think this is worth backporting to stable.
>> 
>>   Signed-off-by: J. Bruce Fields <bfields@redhat.com>
>> 
>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>> index d410ae512b02..2af21b84b3b6 100644
>> --- a/net/sunrpc/svc_xprt.c
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -357,6 +357,7 @@ static void svc_xprt_release_slot(struct svc_rqst *rqstp)
>> 	struct svc_xprt	*xprt = rqstp->rq_xprt;
>> 	if (test_and_clear_bit(RQ_DATA, &rqstp->rq_flags)) {
>> 		atomic_dec(&xprt->xpt_nr_rqsts);
>> +		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
>> 		svc_xprt_enqueue(xprt);
>> 	}
>> }
>> @@ -365,6 +366,15 @@ static bool svc_xprt_ready(struct svc_xprt *xprt)
>> {
>> 	unsigned long xpt_flags;
>> 
>> +	/*
>> +	 * If another cpu has recently updated xpt_flags,
>> +	 * sk_sock->flags, xpt_reserved, or xpt_nr_rqsts, we need to
>> +	 * know about it; otherwise it's possible that both that cpu and
>> +	 * this one could call svc_xprt_enqueue() without either
>> +	 * svc_xprt_enqueue() recognizing that the conditions below
>> +	 * are satisfied, and we could stall indefinitely:
>> +	 */
>> +	smp_rmb();
>> 	xpt_flags = READ_ONCE(xprt->xpt_flags);
>> 
>> 	if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE)))
>> @@ -479,7 +489,7 @@ void svc_reserve(struct svc_rqst *rqstp, int space)
>> 	if (xprt && space < rqstp->rq_reserved) {
>> 		atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
>> 		rqstp->rq_reserved = space;
>> -
>> +		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
>> 		svc_xprt_enqueue(xprt);
>> 	}
>> }
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> index 828b149eaaef..377244992ae8 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -316,6 +316,7 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
>> 	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
>> 	spin_unlock(&rdma->sc_rq_dto_lock);
>> 	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
>> +	/* XXX: Chuck: do we need an smp_mb__after_atomic() here? */
>> 	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
>> 		svc_xprt_enqueue(&rdma->sc_xprt);
>> 	goto out;
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
>> index dc1951759a8e..e1a790487d69 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
>> @@ -290,6 +290,7 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
>> 		spin_unlock(&rdma->sc_rq_dto_lock);
>> 
>> 		set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
>> +		/* XXX: Chuck: do we need a smp_mb__after_atomic() here? */
>> 		svc_xprt_enqueue(&rdma->sc_xprt);
>> 	}
>> 
> 
> --
> Chuck Lever

--
Chuck Lever





* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-11 21:54                       ` Chuck Lever
@ 2019-01-11 22:10                         ` Bruce Fields
  2019-01-11 22:27                           ` Chuck Lever
  0 siblings, 1 reply; 19+ messages in thread
From: Bruce Fields @ 2019-01-11 22:10 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List

On Fri, Jan 11, 2019 at 04:54:01PM -0500, Chuck Lever wrote:
> > On Jan 11, 2019, at 4:52 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >> So, I think we need your patch plus something like this.
> >> 
> >> Chuck, maybe you could help me with the "XXX: Chuck:" parts?
> > 
> > I haven't been following. Why do you think those are necessary?

I'm worried something like this could happen:

	CPU 1				CPU 2
	-----				-----

	set XPT_DATA			dec xpt_nr_rqsts

	svc_xprt_enqueue		svc_xprt_enqueue

And both decide nothing should be done if neither sees the change that
the other made.

Maybe I'm still missing some reason that couldn't happen.

Even if it can happen, it's an unlikely race that will likely be fixed
when another event comes along a little later, which would explain why
we've never seen any reports.
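
For reference, neither update above implies any ordering by itself:
memory-barriers.txt documents that atomic operations which return a
value (like test_and_set_bit()) act as full barriers, while those that
don't (like set_bit() and atomic_dec()) imply no barrier at all:

	set_bit(XPT_DATA, &xprt->xpt_flags);	/* no implied barrier */
	atomic_dec(&xprt->xpt_nr_rqsts);	/* no implied barrier */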

> > We've had set_bit and atomic_{inc,dec} in this code for ages,
> > and I've never noticed a problem.
> > 
> > Rather than adding another CPU pipeline bubble in the RDMA code,
> > though, could you simply move the set_bit() call site inside the
> > critical sections?
> 
> er, inside the preceding critical section. Just reverse the order
> of the spin_unlock and the set_bit.

That'd do it, thanks!

--b.

> 
> 
> > 
> > 
> >> (This applies on top of your patch plus another that just renames the
> >> stupidly long svc_xprt_has_something_to_do() to svc_xprt_ready().)
> >> 
> >> --b.
> >> 
> >> commit d7356c3250d4
> >> Author: J. Bruce Fields <bfields@redhat.com>
> >> Date:   Fri Jan 11 15:36:40 2019 -0500
> >> 
> >>   svcrpc: fix unlikely races preventing queueing of sockets
> >> 
> >>   In the rpc server, when something happens that might be a reason to wake
> >>   up a thread to do something, what we do is
> >> 
> >>           - modify xpt_flags, sk_sock->flags, xpt_reserved, or
> >>             xpt_nr_rqsts to indicate the new situation
> >>           - call svc_xprt_enqueue() to decide whether to wake up a thread.
> >> 
> >>   svc_xprt_enqueue may require multiple conditions to be true before
> >>   queueing up a thread to handle the xprt.  In the SMP case, one of the
> >>   other CPUs may have set another required condition, and in that case,
> >>   although both CPUs run svc_xprt_enqueue(), it's possible that neither
> >>   call sees the writes done by the other CPU in time, and neither one
> >>   recognizes that all the required conditions have been set.  A socket
> >>   could therefore be ignored indefinitely.
> >> 
> >>   Add memory barriers to ensure that any svc_xprt_enqueue() call will
> >>   always see the conditions changed by other CPUs before deciding to
> >>   ignore a socket.
> >> 
> >>   I've never seen this race reported.  In the unlikely event it happens,
> >>   another event will usually come along and the problem will fix itself.
> >>   So I don't think this is worth backporting to stable.
> >> 
> >>   Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> >> 
> >> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> >> index d410ae512b02..2af21b84b3b6 100644
> >> --- a/net/sunrpc/svc_xprt.c
> >> +++ b/net/sunrpc/svc_xprt.c
> >> @@ -357,6 +357,7 @@ static void svc_xprt_release_slot(struct svc_rqst *rqstp)
> >> 	struct svc_xprt	*xprt = rqstp->rq_xprt;
> >> 	if (test_and_clear_bit(RQ_DATA, &rqstp->rq_flags)) {
> >> 		atomic_dec(&xprt->xpt_nr_rqsts);
> >> +		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
> >> 		svc_xprt_enqueue(xprt);
> >> 	}
> >> }
> >> @@ -365,6 +366,15 @@ static bool svc_xprt_ready(struct svc_xprt *xprt)
> >> {
> >> 	unsigned long xpt_flags;
> >> 
> >> +	/*
> >> +	 * If another cpu has recently updated xpt_flags,
> >> +	 * sk_sock->flags, xpt_reserved, or xpt_nr_rqsts, we need to
> >> +	 * know about it; otherwise it's possible that both that cpu and
> >> +	 * this one could call svc_xprt_enqueue() without either
> >> +	 * svc_xprt_enqueue() recognizing that the conditions below
> >> +	 * are satisfied, and we could stall indefinitely:
> >> +	 */
> >> +	smp_rmb();
> >> 	xpt_flags = READ_ONCE(xprt->xpt_flags);
> >> 
> >> 	if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE)))
> >> @@ -479,7 +489,7 @@ void svc_reserve(struct svc_rqst *rqstp, int space)
> >> 	if (xprt && space < rqstp->rq_reserved) {
> >> 		atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
> >> 		rqstp->rq_reserved = space;
> >> -
> >> +		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
> >> 		svc_xprt_enqueue(xprt);
> >> 	}
> >> }
> >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >> index 828b149eaaef..377244992ae8 100644
> >> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >> @@ -316,6 +316,7 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
> >> 	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
> >> 	spin_unlock(&rdma->sc_rq_dto_lock);
> >> 	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
> >> +	/* XXX: Chuck: do we need an smp_mb__after_atomic() here? */
> >> 	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
> >> 		svc_xprt_enqueue(&rdma->sc_xprt);
> >> 	goto out;
> >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> >> index dc1951759a8e..e1a790487d69 100644
> >> --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
> >> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> >> @@ -290,6 +290,7 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
> >> 		spin_unlock(&rdma->sc_rq_dto_lock);
> >> 
> >> 		set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
> >> +		/* XXX: Chuck: do we need a smp_mb__after_atomic() here? */
> >> 		svc_xprt_enqueue(&rdma->sc_xprt);
> >> 	}
> >> 
> > 
> > --
> > Chuck Lever
> 
> --
> Chuck Lever
> 
> 


* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-11 22:10                         ` Bruce Fields
@ 2019-01-11 22:27                           ` Chuck Lever
  2019-01-12  0:56                             ` Bruce Fields
  0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2019-01-11 22:27 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Trond Myklebust, Linux NFS Mailing List



> On Jan 11, 2019, at 5:10 PM, Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Fri, Jan 11, 2019 at 04:54:01PM -0500, Chuck Lever wrote:
>>> On Jan 11, 2019, at 4:52 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>>> So, I think we need your patch plus something like this.
>>>> 
>>>> Chuck, maybe you could help me with the "XXX: Chuck:" parts?
>>> 
>>> I haven't been following. Why do you think those are necessary?
> 
> I'm worried something like this could happen:
> 
> 	CPU 1				CPU 2
> 	-----				-----
> 
> 	set XPT_DATA			dec xpt_nr_rqsts
> 
> 	svc_xprt_enqueue		svc_xprt_enqueue
> 
> And both decide nothing should be done if neither sees the change that
> the other made.
> 
> Maybe I'm still missing some reason that couldn't happen.
> 
> Even if it can happen, it's an unlikely race that will likely be fixed
> when another event comes along a little later, which would explain why
> we've never seen any reports.
> 
>>> We've had set_bit and atomic_{inc,dec} in this code for ages,
>>> and I've never noticed a problem.
>>> 
>>> Rather than adding another CPU pipeline bubble in the RDMA code,
>>> though, could you simply move the set_bit() call site inside the
>>> critical sections?
>> 
>> er, inside the preceding critical section. Just reverse the order
>> of the spin_unlock and the set_bit.
> 
> That'd do it, thanks!

I can try that here and see if it results in a performance regression.


> --b.
> 
>> 
>> 
>>> 
>>> 
>>>> (This applies on top of your patch plus another that just renames the
>>>> stupidly long svc_xprt_has_something_to_do() to svc_xprt_ready().)
>>>> 
>>>> --b.
>>>> 
>>>> commit d7356c3250d4
>>>> Author: J. Bruce Fields <bfields@redhat.com>
>>>> Date:   Fri Jan 11 15:36:40 2019 -0500
>>>> 
>>>>  svcrpc: fix unlikely races preventing queueing of sockets
>>>> 
>>>>  In the rpc server, when something happens that might be a reason to wake
>>>>  up a thread to do something, what we do is
>>>> 
>>>>          - modify xpt_flags, sk_sock->flags, xpt_reserved, or
>>>>            xpt_nr_rqsts to indicate the new situation
>>>>          - call svc_xprt_enqueue() to decide whether to wake up a thread.
>>>> 
>>>>  svc_xprt_enqueue may require multiple conditions to be true before
>>>>  queueing up a thread to handle the xprt.  In the SMP case, one of the
>>>>  other CPUs may have set another required condition, and in that case,
>>>>  although both CPUs run svc_xprt_enqueue(), it's possible that neither
>>>>  call sees the writes done by the other CPU in time, and neither one
>>>>  recognizes that all the required conditions have been set.  A socket
>>>>  could therefore be ignored indefinitely.
>>>> 
>>>>  Add memory barriers to ensure that any svc_xprt_enqueue() call will
>>>>  always see the conditions changed by other CPUs before deciding to
>>>>  ignore a socket.
>>>> 
>>>>  I've never seen this race reported.  In the unlikely event it happens,
>>>>  another event will usually come along and the problem will fix itself.
>>>>  So I don't think this is worth backporting to stable.
>>>> 
>>>>  Signed-off-by: J. Bruce Fields <bfields@redhat.com>
>>>> 
>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>> index d410ae512b02..2af21b84b3b6 100644
>>>> --- a/net/sunrpc/svc_xprt.c
>>>> +++ b/net/sunrpc/svc_xprt.c
>>>> @@ -357,6 +357,7 @@ static void svc_xprt_release_slot(struct svc_rqst *rqstp)
>>>> 	struct svc_xprt	*xprt = rqstp->rq_xprt;
>>>> 	if (test_and_clear_bit(RQ_DATA, &rqstp->rq_flags)) {
>>>> 		atomic_dec(&xprt->xpt_nr_rqsts);
>>>> +		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
>>>> 		svc_xprt_enqueue(xprt);
>>>> 	}
>>>> }
>>>> @@ -365,6 +366,15 @@ static bool svc_xprt_ready(struct svc_xprt *xprt)
>>>> {
>>>> 	unsigned long xpt_flags;
>>>> 
>>>> +	/*
>>>> +	 * If another cpu has recently updated xpt_flags,
>>>> +	 * sk_sock->flags, xpt_reserved, or xpt_nr_rqsts, we need to
>>>> +	 * know about it; otherwise it's possible that both that cpu and
>>>> +	 * this one could call svc_xprt_enqueue() without either
>>>> +	 * svc_xprt_enqueue() recognizing that the conditions below
>>>> +	 * are satisfied, and we could stall indefinitely:
>>>> +	 */
>>>> +	smp_rmb();
>>>> 	xpt_flags = READ_ONCE(xprt->xpt_flags);
>>>> 
>>>> 	if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE)))
>>>> @@ -479,7 +489,7 @@ void svc_reserve(struct svc_rqst *rqstp, int space)
>>>> 	if (xprt && space < rqstp->rq_reserved) {
>>>> 		atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
>>>> 		rqstp->rq_reserved = space;
>>>> -
>>>> +		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
>>>> 		svc_xprt_enqueue(xprt);
>>>> 	}
>>>> }
>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> index 828b149eaaef..377244992ae8 100644
>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> @@ -316,6 +316,7 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
>>>> 	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
>>>> 	spin_unlock(&rdma->sc_rq_dto_lock);
>>>> 	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
>>>> +	/* XXX: Chuck: do we need an smp_mb__after_atomic() here? */
>>>> 	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
>>>> 		svc_xprt_enqueue(&rdma->sc_xprt);
>>>> 	goto out;
>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
>>>> index dc1951759a8e..e1a790487d69 100644
>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
>>>> @@ -290,6 +290,7 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
>>>> 		spin_unlock(&rdma->sc_rq_dto_lock);
>>>> 
>>>> 		set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
>>>> +		/* XXX: Chuck: do we need a smp_mb__after_atomic() here? */
>>>> 		svc_xprt_enqueue(&rdma->sc_xprt);
>>>> 	}
>>>> 

--
Chuck Lever

^ permalink raw reply	[flat|nested] 19+ messages in thread
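
To make the race described in the changelog above concrete: it is the
classic store-buffering pattern. Below is a rough userspace sketch in
C11 atomics; every name in it is invented for illustration, and the
single full fence is a stand-in for (and stronger than) the
smp_wmb()/smp_rmb() pairing the patch actually adds:

	#include <stdatomic.h>
	#include <stdbool.h>

	/* A wakeup is needed once both conditions are set; each thread
	 * sets one condition and then checks for both. */
	static atomic_int cond_a, cond_b;

	static bool both_set(void)
	{
		return atomic_load_explicit(&cond_a, memory_order_relaxed) &&
		       atomic_load_explicit(&cond_b, memory_order_relaxed);
	}

	/* Racy: the thread's own store may not yet be visible when the
	 * other thread loads, so two threads, one calling this and one
	 * calling the mirror image for cond_b, can both see false and
	 * neither performs the wakeup. */
	static bool set_a_then_check_racy(void)
	{
		atomic_store_explicit(&cond_a, 1, memory_order_relaxed);
		return both_set();
	}

	/* Fenced: with a full fence between the store and the loads in
	 * both threads, the C11 memory model forbids the outcome where
	 * both threads return false; at least one sees both flags. */
	static bool set_a_then_check_fenced(void)
	{
		atomic_store_explicit(&cond_a, 1, memory_order_relaxed);
		atomic_thread_fence(memory_order_seq_cst);
		return both_set();
	}

The kernel-side analogue is the writer updating xpt_flags or one of the
counters and issuing a barrier before calling svc_xprt_enqueue(), while
svc_xprt_ready() issues the matching barrier before it samples
xpt_flags, xpt_reserved, and xpt_nr_rqsts.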

* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-11 22:27                           ` Chuck Lever
@ 2019-01-12  0:56                             ` Bruce Fields
  2019-01-14 17:24                               ` Chuck Lever
  0 siblings, 1 reply; 19+ messages in thread
From: Bruce Fields @ 2019-01-12  0:56 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List

On Fri, Jan 11, 2019 at 05:27:30PM -0500, Chuck Lever wrote:
> 
> 
> > On Jan 11, 2019, at 5:10 PM, Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Fri, Jan 11, 2019 at 04:54:01PM -0500, Chuck Lever wrote:
> >>> On Jan 11, 2019, at 4:52 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >>>> So, I think we need your patch plus something like this.
> >>>> 
> >>>> Chuck, maybe you could help me with the "XXX: Chuck:" parts?
> >>> 
> >>> I haven't been following. Why do you think those are necessary?
> > 
> > I'm worried something like this could happen:
> > 
> > 	CPU 1				CPU 2
> > 	-----				-----
> > 
> > 	set XPT_DATA			dec xpt_nr_rqsts
> > 
> > 	svc_xprt_enqueue		svc_xprt_enqueue
> > 
> > And both decide nothing should be done if neither sees the change that
> > the other made.
> > 
> > Maybe I'm still missing some reason that couldn't happen.
> > 
> > Even if it can happen, it's an unlikely race that will likely be fixed
> > when another event comes along a little later, which would explain why
> > we've never seen any reports.
> > 
> >>> We've had set_bit and atomic_{inc,dec} in this code for ages,
> >>> and I've never noticed a problem.
> >>> 
> >>> Rather than adding another CPU pipeline bubble in the RDMA code,
> >>> though, could you simply move the set_bit() call site inside the
> >>> critical sections?
> >> 
> >> er, inside the preceding critical section. Just reverse the order
> >> of the spin_unlock and the set_bit.
> > 
> > That'd do it, thanks!
> 
> I can try that here and see if it results in a performance regression.

Thanks, I've got a version with a typo fixed at

	git://linux-nfs.org/~bfields/linux.git nfsd-next

--b.

^ permalink raw reply	[flat|nested] 19+ messages in thread
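
Spelled out, the reordering suggested in the exchange above would look
something like this against the svc_rdma_wc_receive() hunk quoted
earlier in the thread (a sketch of the idea, not a tested patch):

 	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
-	spin_unlock(&rdma->sc_rq_dto_lock);
 	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
+	spin_unlock(&rdma->sc_rq_dto_lock);
 	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
 		svc_xprt_enqueue(&rdma->sc_xprt);

Setting the bit while the lock is still held lets the unlock supply the
ordering, so no separate smp_mb__after_atomic() (the extra "CPU
pipeline bubble" Chuck wants to avoid) is needed between set_bit() and
svc_xprt_enqueue().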

* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-12  0:56                             ` Bruce Fields
@ 2019-01-14 17:24                               ` Chuck Lever
  2019-01-25 20:30                                 ` Bruce Fields
  0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2019-01-14 17:24 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Trond Myklebust, Linux NFS Mailing List



> On Jan 11, 2019, at 7:56 PM, Bruce Fields <bfields@fieldses.org> wrote:
> 
> [...]
>> I can try that here and see if it results in a performance regression.
> 
> Thanks, I've got a version with a typo fixed at
> 
> 	git://linux-nfs.org/~bfields/linux.git nfsd-next

Applied all four patches here. I don't see any performance regressions,
but my server has only a single last-level CPU cache.

--
Chuck Lever

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-14 17:24                               ` Chuck Lever
@ 2019-01-25 20:30                                 ` Bruce Fields
  2019-01-25 21:32                                   ` Chuck Lever
  0 siblings, 1 reply; 19+ messages in thread
From: Bruce Fields @ 2019-01-25 20:30 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List

On Mon, Jan 14, 2019 at 12:24:24PM -0500, Chuck Lever wrote:
> [...]
> Applied all four patches here. I don't see any performance regressions,
> but my server has only a single last-level CPU cache.

Thanks!

I'm adding a Tested-by: for you if that's OK.

--b.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
  2019-01-25 20:30                                 ` Bruce Fields
@ 2019-01-25 21:32                                   ` Chuck Lever
  0 siblings, 0 replies; 19+ messages in thread
From: Chuck Lever @ 2019-01-25 21:32 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Trond Myklebust, Linux NFS Mailing List



> On Jan 25, 2019, at 12:30 PM, Bruce Fields <bfields@fieldses.org> wrote:
> 
> [...]
> 
> Thanks!
> 
> I'm adding a Tested-by: for you if that's OK.

Sorry, yes! That's fine with me.

--
Chuck Lever

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread (newest message: 2019-01-25 21:33 UTC)

Thread overview: 19+ messages
2019-01-03 14:17 [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot() Trond Myklebust
2019-01-03 22:45 ` J Bruce Fields
2019-01-03 23:40   ` Trond Myklebust
2019-01-04 17:39     ` bfields
2019-01-07 21:32       ` bfields
2019-01-07 22:06         ` Trond Myklebust
2019-01-08 15:01           ` bfields
2019-01-08 16:21             ` Trond Myklebust
2019-01-09 16:51               ` bfields
2019-01-09 17:41                 ` Trond Myklebust
2019-01-11 21:12                   ` bfields
2019-01-11 21:52                     ` Chuck Lever
2019-01-11 21:54                       ` Chuck Lever
2019-01-11 22:10                         ` Bruce Fields
2019-01-11 22:27                           ` Chuck Lever
2019-01-12  0:56                             ` Bruce Fields
2019-01-14 17:24                               ` Chuck Lever
2019-01-25 20:30                                 ` Bruce Fields
2019-01-25 21:32                                   ` Chuck Lever
