null pointer in rxe_mr_copy()

* null pointer in rxe_mr_copy()
@ 2022-04-11  3:34 Bob Pearson
  2022-04-11  5:14 ` Zhu Yanjun
  2022-04-12  4:11 ` Bob Pearson
  0 siblings, 2 replies; 10+ messages in thread
From: Bob Pearson @ 2022-04-11  3:34 UTC (permalink / raw)
  To: Zhu Yanjun, linux-rdma

Zhu,

Since checking for mr == NULL in rxe_mr_copy() fixes the problem you were seeing in rping,
it would be a good idea to apply the following patch, which will tell us which of
the three calls to rxe_mr_copy() is failing. My suspicion is the one in read_reply() in rxe_resp.c.
This could be caused by a race between shutting down the qp and finishing up an RDMA read.
The responder resources state machine is completely unprotected from simultaneous access by
verbs code and bh code in rxe_resp.c. rxe_resp is a tasklet, so all the accesses from there are
serialized, but if anyone makes a verbs call that touches the responder resources it could
cause problems. The most likely (only?) place this could happen is qp shutdown.
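
For anyone unfamiliar with the idea, here is a rough userspace sketch of how a
per-call-site warning separates the callers. It is not kernel code: every name
ending in _sketch, the WARN_ON_SKETCH macro, and the -1 return value are made up
for illustration, and the real WARN_ON() also dumps a stack trace.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Counts warnings so a caller can check whether one fired (illustrative only). */
static int warn_count;

/* Userspace stand-in for WARN_ON(): report the call site when cond is true. */
static int warn_on_sketch(int cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

#define WARN_ON_SKETCH(cond) \
	warn_on_sketch(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical stand-in for a memory region; not the real struct rxe_mr. */
struct mr_sketch {
	char buf[16];
};

/* Stand-in for the copy routine with a NULL check: 0 on success, -1 on error. */
static int mr_copy_sketch(struct mr_sketch *mr, const char *src, size_t len)
{
	if (!mr || len > sizeof(mr->buf))
		return -1;
	memcpy(mr->buf, src, len);
	return 0;
}

/* One of several callers; the warning here identifies this call site. */
static int write_data_in_sketch(struct mr_sketch *mr, const char *src, size_t len)
{
	WARN_ON_SKETCH(!mr);
	return mr_copy_sketch(mr, src, len);
}
```

Calling write_data_in_sketch(NULL, "abc", 4) prints one warning naming the line
inside write_data_in_sketch() and returns -1. A valid pointer prints nothing.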

Bob

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 60a31b718774..66184f5a4ddf 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -489,6 +489,7 @@ int copy_data(
 		if (bytes > 0) {
 			iova = sge->addr + offset;
 
+			WARN_ON(!mr);
 			err = rxe_mr_copy(mr, iova, addr, bytes, dir);
 			if (err)
 				goto err2;
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 1d95fab606da..6e3e86bdccd7 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -536,6 +536,7 @@ static enum resp_states write_data_in(struct rxe_qp *qp,
 	int	err;
 	int data_len = payload_size(pkt);
 
+	WARN_ON(!qp->resp.mr);
 	err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
 			  payload_addr(pkt), data_len, RXE_TO_MR_OBJ);
 	if (err) {
@@ -772,6 +773,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	if (!skb)
 		return RESPST_ERR_RNR;
 
+	WARN_ON(!mr);
 	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 			  payload, RXE_FROM_MR_OBJ);
 	if (err)
