* [PATCH v1] svcrdma: Hold private mutex while invoking rdma_accept()
@ 2021-02-11 22:15 Chuck Lever
  2021-02-12 14:43 ` Jason Gunthorpe
  0 siblings, 1 reply; 4+ messages in thread
From: Chuck Lever @ 2021-02-11 22:15 UTC (permalink / raw)
  To: linux-nfs, linux-rdma

RDMA core mutex locking was restructured by d114c6feedfe ("RDMA/cma:
Add missing locking to rdma_accept()") [Aug 2020]. When lock
debugging is enabled, the RPC/RDMA server trips over the new lockdep
assertion in rdma_accept() because it doesn't call rdma_accept()
from its CM event handler.

As a temporary fix, have svc_rdma_accept() take the mutex
explicitly. In the meantime, let's consider how to restructure the
RPC/RDMA transport to invoke rdma_accept() from the proper context.

Calls to svc_rdma_accept() are serialized with calls to
svc_rdma_free() by the generic RPC server layer.
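
The pattern this patch applies can be sketched outside the kernel. The following is an illustrative, self-contained model using a pthread mutex, not the RDMA core API: the CM normally invokes the event handler with the ID's private handler mutex held, so a caller outside handler context (like svc_rdma_accept()) must take that mutex explicitly before entering a function that asserts it is held. All names here are invented stand-ins.

```c
#include <assert.h>
#include <pthread.h>

/* Illustrative stand-in for struct rdma_cm_id and its private
 * handler mutex; not the real structure. */
struct cm_id {
	pthread_mutex_t handler_mutex;
	int connected;
};

/* Models the lockdep assertion inside rdma_accept(): the caller must
 * already hold the handler mutex.  With a default (non-recursive)
 * mutex, trylock returns nonzero (EBUSY) when the mutex is held, so
 * this approximates lockdep_assert_held(). */
static int cma_accept(struct cm_id *id)
{
	assert(pthread_mutex_trylock(&id->handler_mutex) != 0);
	id->connected = 1;
	return 0;
}

/* Models svc_rdma_accept() after this patch: it is not a CM event
 * handler, so it takes the mutex explicitly around the call. */
static int svc_accept(struct cm_id *id)
{
	int ret;

	pthread_mutex_lock(&id->handler_mutex);   /* rdma_lock_handler() */
	ret = cma_accept(id);                     /* rdma_accept() */
	pthread_mutex_unlock(&id->handler_mutex); /* rdma_unlock_handler() */
	return ret;
}
```

Calling cma_accept() directly, without the lock, would trip the assertion, which is exactly the lockdep splat the patch is silencing.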

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index afba4e9d5425..8d14dbd45418 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -498,7 +498,9 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	}
 	conn_param.private_data = &pmsg;
 	conn_param.private_data_len = sizeof(pmsg);
+	rdma_lock_handler(newxprt->sc_cm_id);
 	ret = rdma_accept(newxprt->sc_cm_id, &conn_param);
+	rdma_unlock_handler(newxprt->sc_cm_id);
 	if (ret) {
 		trace_svcrdma_accept_err(newxprt, ret);
 		goto errout;




* Re: [PATCH v1] svcrdma: Hold private mutex while invoking rdma_accept()
  2021-02-11 22:15 [PATCH v1] svcrdma: Hold private mutex while invoking rdma_accept() Chuck Lever
@ 2021-02-12 14:43 ` Jason Gunthorpe
  2021-02-12 14:50   ` Chuck Lever
  0 siblings, 1 reply; 4+ messages in thread
From: Jason Gunthorpe @ 2021-02-12 14:43 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, linux-rdma

On Thu, Feb 11, 2021 at 05:15:30PM -0500, Chuck Lever wrote:
> RDMA core mutex locking was restructured by d114c6feedfe ("RDMA/cma:
> Add missing locking to rdma_accept()") [Aug 2020]. When lock
> debugging is enabled, the RPC/RDMA server trips over the new lockdep
> assertion in rdma_accept() because it doesn't call rdma_accept()
> from its CM event handler.
> 
> As a temporary fix, have svc_rdma_accept() take the mutex
> explicitly. In the meantime, let's consider how to restructure the
> RPC/RDMA transport to invoke rdma_accept() from the proper context.
> 
> Calls to svc_rdma_accept() are serialized with calls to
> svc_rdma_free() by the generic RPC server layer.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Fixes line

> ---
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |    2 ++
>  1 file changed, 2 insertions(+)

It could even be right like this..

This should be inside the lock though:

        newxprt->sc_cm_id->event_handler = svc_rdma_cma_handler;

But this is really funny looking: before it gets to accept, the
handler is still the listen handler, so any incoming events will just
be discarded.

However the rdma_accept() should fail if the state machine has been
moved from the accepting state, and I think the only meaningful event
that can be delivered here is disconnect. So the rdma_accept() failure
does trigger destroy_id, which is the right thing on disconnect anyhow.
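
That state-machine behavior can be modelled with a small sketch; this is a hedged illustration with invented state names, loosely mirroring the idea in cma.c rather than its actual code: accept only succeeds from the connect-request state, so a disconnect that lands first makes the later rdma_accept() fail and the caller tears the ID down.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative states; the real enum rdma_cm_state is richer. */
enum cm_state {
	STATE_CONNECT_REQUEST,
	STATE_DISCONNECTED,
	STATE_CONNECTED,
};

struct cm_id {
	enum cm_state state;
};

/* Sketch: accept is only valid from the connect-request state.  If a
 * disconnect already moved the state machine, the accept fails and
 * the caller's error path (destroy_id) does the right thing. */
static int cma_accept(struct cm_id *id)
{
	if (id->state != STATE_CONNECT_REQUEST)
		return -EINVAL;
	id->state = STATE_CONNECTED;
	return 0;
}
```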

Jason


* Re: [PATCH v1] svcrdma: Hold private mutex while invoking rdma_accept()
  2021-02-12 14:43 ` Jason Gunthorpe
@ 2021-02-12 14:50   ` Chuck Lever
  2021-02-12 14:55     ` Jason Gunthorpe
  0 siblings, 1 reply; 4+ messages in thread
From: Chuck Lever @ 2021-02-12 14:50 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Linux NFS Mailing List, linux-rdma

Hi Jason-

Thanks for your review.


> On Feb 12, 2021, at 9:43 AM, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Thu, Feb 11, 2021 at 05:15:30PM -0500, Chuck Lever wrote:
>> RDMA core mutex locking was restructured by d114c6feedfe ("RDMA/cma:
>> Add missing locking to rdma_accept()") [Aug 2020]. When lock
>> debugging is enabled, the RPC/RDMA server trips over the new lockdep
>> assertion in rdma_accept() because it doesn't call rdma_accept()
>> from its CM event handler.
>> 
>> As a temporary fix, have svc_rdma_accept() take the mutex
>> explicitly. In the meantime, let's consider how to restructure the
>> RPC/RDMA transport to invoke rdma_accept() from the proper context.
>> 
>> Calls to svc_rdma_accept() are serialized with calls to
>> svc_rdma_free() by the generic RPC server layer.
>> 
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> 
> Fixes line

It wasn't clear to me which commit should be listed. d114c6feedfe ?


>> ---
>> net/sunrpc/xprtrdma/svc_rdma_transport.c |    2 ++
>> 1 file changed, 2 insertions(+)
> 
> It could even be right like this..
> 
> This should be inside the lock though:
> 
>        newxprt->sc_cm_id->event_handler = svc_rdma_cma_handler;

OK.


> But this is really funny looking: before it gets to accept, the
> handler is still the listen handler, so any incoming events will just
> be discarded.

Yeah, not clear to me why two CM event handlers are necessary.
If they are truly needed, a comment would be helpful.


> However the rdma_accept() should fail if the state machine has been
> moved from the accepting state, and I think the only meaningful event
> that can be delivered here is disconnect. So the rdma_accept() failure
> does trigger destroy_id, which is the right thing on disconnect anyhow.

The mutex needs to be released before the ID is destroyed, right?


--
Chuck Lever





* Re: [PATCH v1] svcrdma: Hold private mutex while invoking rdma_accept()
  2021-02-12 14:50   ` Chuck Lever
@ 2021-02-12 14:55     ` Jason Gunthorpe
  0 siblings, 0 replies; 4+ messages in thread
From: Jason Gunthorpe @ 2021-02-12 14:55 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List, linux-rdma

On Fri, Feb 12, 2021 at 02:50:42PM +0000, Chuck Lever wrote:
> Hi Jason-
> 
> Thanks for your review.
> 
> 
> > On Feb 12, 2021, at 9:43 AM, Jason Gunthorpe <jgg@nvidia.com> wrote:
> > 
> > On Thu, Feb 11, 2021 at 05:15:30PM -0500, Chuck Lever wrote:
> >> RDMA core mutex locking was restructured by d114c6feedfe ("RDMA/cma:
> >> Add missing locking to rdma_accept()") [Aug 2020]. When lock
> >> debugging is enabled, the RPC/RDMA server trips over the new lockdep
> >> assertion in rdma_accept() because it doesn't call rdma_accept()
> >> from its CM event handler.
> >> 
> >> As a temporary fix, have svc_rdma_accept() take the mutex
> >> explicitly. In the meantime, let's consider how to restructure the
> >> RPC/RDMA transport to invoke rdma_accept() from the proper context.
> >> 
> >> Calls to svc_rdma_accept() are serialized with calls to
> >> svc_rdma_free() by the generic RPC server layer.
> >> 
> >> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > 
> > Fixes line
> 
> Wasn't clear to me which commit should be listed. d114c6feedfe ?

Yes, this is the earliest it can go back. Arguably it should be
backported further, but the bug from missing this lock is very small.
> > But this is really funny looking: before it gets to accept, the
> > handler is still the listen handler, so any incoming events will just
> > be discarded.
> 
> Yeah, not clear to me why two CM event handlers are necessary.
> If they are truly needed, a comment would be helpful.

Looks like the only thing it does here is discard the disconnected
event before accept, so if svc_xprt_enqueue() can run concurrently
with the accept process, they can be safely combined.
 
> > However the rdma_accept() should fail if the state machine has been
> > moved from the accepting state, and I think the only meaningful event
> > that can be delivered here is disconnect. So the rdma_accept() failure
> > does trigger destroy_id, which is the right thing on disconnect anyhow.
> 
> The mutex needs to be released before the ID is destroyed, right?

Yes, noting that the handler can potentially still be called until the
ID is destroyed, so it has to be safe against races with
svc_xprt_enqueue() too.

Though the core code has a destroy_id_handler_unlock(), which can be
called under the lock and is used to make the destruction atomic with
the handlers.
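
The shape of that is roughly the following; again a sketch with a pthread mutex and simplified bookkeeping, not the actual cma.c internals: destruction enters with the same mutex the event handler runs under, so no handler invocation can race with a half-destroyed ID.

```c
#include <assert.h>
#include <pthread.h>

/* Illustrative stand-in; not the real struct rdma_cm_id. */
struct cm_id {
	pthread_mutex_t handler_mutex;
	int destroyed;
};

/* Sketch of the destroy-under-handler-lock idea: the caller enters
 * with the handler mutex held, so destruction is atomic with any
 * in-flight event handler; the function drops the lock on its way
 * out (hence the _unlock suffix). */
static void destroy_id_handler_unlock(struct cm_id *id)
{
	id->destroyed = 1;
	pthread_mutex_unlock(&id->handler_mutex);
	/* real code would free the ID here, once no handler can run */
}

static void teardown(struct cm_id *id)
{
	pthread_mutex_lock(&id->handler_mutex);
	destroy_id_handler_unlock(id);
}
```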

Jason
 
