From: Bruce Fields <bfields@fieldses.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH] SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
Date: Fri, 11 Jan 2019 17:10:30 -0500	[thread overview]
Message-ID: <20190111221030.GA28794@fieldses.org> (raw)
In-Reply-To: <6F5B73B7-E9F8-4FDB-8381-E5C02772C6A5@oracle.com>

On Fri, Jan 11, 2019 at 04:54:01PM -0500, Chuck Lever wrote:
> > On Jan 11, 2019, at 4:52 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >> So, I think we need your patch plus something like this.
> >> 
> >> Chuck, maybe you could help me with the "XXX: Chuck:" parts?
> > 
> > I haven't been following. Why do you think those are necessary?

I'm worried something like this could happen:

	CPU 1				CPU 2
	-----				-----

	set XPT_DATA			dec xpt_nr_rqsts

	svc_xprt_enqueue		svc_xprt_enqueue

And each decides there's nothing to be done, because neither sees the
change that the other one made.

Maybe I'm still missing some reason that couldn't happen.

Even if it can happen, it's an unlikely race, and another event coming
along a little later would usually fix things up, which would explain
why we've never seen any reports.
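
Spelled out as code, the pattern I'm worried about looks roughly like
the sketch below.  This is not the real sunrpc code: NR_LIMIT and
wake_up_a_thread() are made-up stand-ins for the slot limit and for
the wakeup that svc_xprt_enqueue() does.

	/* Sketch only; would need <linux/atomic.h>, <linux/bitops.h>,
	 * and svc_xprt.h for XPT_DATA. */

	static unsigned long flags;	/* stands in for xprt->xpt_flags */
	static atomic_t nr_rqsts;	/* stands in for xprt->xpt_nr_rqsts */

	#define NR_LIMIT 8		/* made-up slot limit */

	static void wake_up_a_thread(void)
	{
		/* stands in for the wakeup done by svc_xprt_enqueue() */
	}

	/* CPU 1: data has arrived */
	static void cpu1(void)
	{
		set_bit(XPT_DATA, &flags);
		/* svc_xprt_enqueue() only wakes somebody if it also
		 * sees a free slot ... */
		if (atomic_read(&nr_rqsts) < NR_LIMIT)
			wake_up_a_thread();
	}

	/* CPU 2: a request slot was just released */
	static void cpu2(void)
	{
		atomic_dec(&nr_rqsts);
		/* ... and this svc_xprt_enqueue() only wakes somebody
		 * if it also sees XPT_DATA. */
		if (test_bit(XPT_DATA, &flags))
			wake_up_a_thread();
	}

If each CPU's read can be satisfied before the other CPU's write is
visible to it, both checks fail and nobody gets woken.  The patch
below adds an smp_wmb() after the xpt_nr_rqsts/xpt_reserved updates
and an smp_rmb() before the checks in svc_xprt_ready(); the "XXX:
Chuck" comments are asking whether the set_bit() paths in the RDMA
code need the same treatment.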

> > We've had set_bit and atomic_{inc,dec} in this code for ages,
> > and I've never noticed a problem.
> > 
> > Rather than adding another CPU pipeline bubble in the RDMA code,
> > though, could you simply move the set_bit() call site inside the
> > critical sections?
> 
> er, inside the preceding critical section. Just reverse the order
> of the spin_unlock and the set_bit.

That'd do it, thanks!
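
So, if I'm following, the receive completion would end up looking
something like this (untested sketch of that reordering; the
svc_rdma_wc_read_done() case quoted below would get the same
treatment):

	spin_lock(&rdma->sc_rq_dto_lock);
	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
	/* Set XPT_DATA before dropping the lock, the idea being that
	 * the lock ordering then covers the flag update and we don't
	 * need a separate barrier here: */
	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
	spin_unlock(&rdma->sc_rq_dto_lock);

	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
		svc_xprt_enqueue(&rdma->sc_xprt);
	goto out;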

--b.

> 
> 
> > 
> > 
> >> (This applies on top of your patch plus another that just renames the
> >> stupidly long svc_xprt_has_something_to_do() to svc_xprt_ready().)
> >> 
> >> --b.
> >> 
> >> commit d7356c3250d4
> >> Author: J. Bruce Fields <bfields@redhat.com>
> >> Date:   Fri Jan 11 15:36:40 2019 -0500
> >> 
> >>   svcrpc: fix unlikely races preventing queueing of sockets
> >> 
> >>   In the rpc server, when something happens that might be a reason to wake
> >>   up a thread to do something, what we do is
> >> 
> >>           - modify xpt_flags, sk_sock->flags, xpt_reserved, or
> >>             xpt_nr_rqsts to indicate the new situation
> >>           - call svc_xprt_enqueue() to decide whether to wake up a thread.
> >> 
> >>   svc_xprt_enqueue may require multiple conditions to be true before
> >>   queueing up a thread to handle the xprt.  In the SMP case, one of the
> >>   other CPUs may have set another required condition, and in that case,
> >>   although both CPUs run svc_xprt_enqueue(), it's possible that neither
> >>   call sees the writes done by the other CPU in time, and neither one
> >>   recognizes that all the required conditions have been set.  A socket
> >>   could therefore be ignored indefinitely.
> >> 
> >>   Add memory barriers to ensure that any svc_xprt_enqueue() call will
> >>   always see the conditions changed by other CPUs before deciding to
> >>   ignore a socket.
> >> 
> >>   I've never seen this race reported.  In the unlikely event it happens,
> >>   another event will usually come along and the problem will fix itself.
> >>   So I don't think this is worth backporting to stable.
> >> 
> >>   Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> >> 
> >> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> >> index d410ae512b02..2af21b84b3b6 100644
> >> --- a/net/sunrpc/svc_xprt.c
> >> +++ b/net/sunrpc/svc_xprt.c
> >> @@ -357,6 +357,7 @@ static void svc_xprt_release_slot(struct svc_rqst *rqstp)
> >> 	struct svc_xprt	*xprt = rqstp->rq_xprt;
> >> 	if (test_and_clear_bit(RQ_DATA, &rqstp->rq_flags)) {
> >> 		atomic_dec(&xprt->xpt_nr_rqsts);
> >> +		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
> >> 		svc_xprt_enqueue(xprt);
> >> 	}
> >> }
> >> @@ -365,6 +366,15 @@ static bool svc_xprt_ready(struct svc_xprt *xprt)
> >> {
> >> 	unsigned long xpt_flags;
> >> 
> >> +	/*
> >> +	 * If another cpu has recently updated xpt_flags,
> >> +	 * sk_sock->flags, xpt_reserved, or xpt_nr_rqsts, we need to
> >> +	 * know about it; otherwise it's possible that both that cpu and
> >> +	 * this one could call svc_xprt_enqueue() without either
> >> +	 * svc_xprt_enqueue() recognizing that the conditions below
> >> +	 * are satisfied, and we could stall indefinitely:
> >> +	 */
> >> +	smp_rmb();
> >> 	xpt_flags = READ_ONCE(xprt->xpt_flags);
> >> 
> >> 	if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE)))
> >> @@ -479,7 +489,7 @@ void svc_reserve(struct svc_rqst *rqstp, int space)
> >> 	if (xprt && space < rqstp->rq_reserved) {
> >> 		atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
> >> 		rqstp->rq_reserved = space;
> >> -
> >> +		smp_wmb(); /* See smp_rmb() in svc_xprt_ready() */
> >> 		svc_xprt_enqueue(xprt);
> >> 	}
> >> }
> >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >> index 828b149eaaef..377244992ae8 100644
> >> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >> @@ -316,6 +316,7 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
> >> 	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
> >> 	spin_unlock(&rdma->sc_rq_dto_lock);
> >> 	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
> >> +	/* XXX: Chuck: do we need an smp_mb__after_atomic() here? */
> >> 	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
> >> 		svc_xprt_enqueue(&rdma->sc_xprt);
> >> 	goto out;
> >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> >> index dc1951759a8e..e1a790487d69 100644
> >> --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
> >> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> >> @@ -290,6 +290,7 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
> >> 		spin_unlock(&rdma->sc_rq_dto_lock);
> >> 
> >> 		set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
> >> +		/* XXX: Chuck: do we need a smp_mb__after_atomic() here? */
> >> 		svc_xprt_enqueue(&rdma->sc_xprt);
> >> 	}
> >> 
> > 
> > --
> > Chuck Lever
> 
> --
> Chuck Lever
> 
> 
