From: Jack Wang <jinpu.wang@cloud.ionos.com>
To: linux-rdma@vger.kernel.org
Cc: bvanassche@acm.org, leon@kernel.org, dledford@redhat.com,
	jgg@ziepe.ca, danil.kipnis@cloud.ionos.com,
	jinpu.wang@cloud.ionos.com, Gioh Kim <gi-oh.kim@cloud.ionos.com>
Subject: [PATCHv2 for-next 03/12] RDMA/rtrs-clt: avoid run destroy_con_cq_qp/create_con_cq_qp in parallel
Date: Fri, 23 Oct 2020 09:43:44 +0200
Message-ID: <20201023074353.21946-4-jinpu.wang@cloud.ionos.com>
In-Reply-To: <20201023074353.21946-1-jinpu.wang@cloud.ionos.com>

Two kworkers can race with each other:

    addr_resolver kworker           reconnect kworker
    rtrs_clt_rdma_cm_handler
    rtrs_rdma_addr_resolved
    create_con_cq_qp: s.dev_ref++
    "s.dev_ref is 1"
                                    wait in create_cm fails with TIMEOUT
                                    destroy_con_cq_qp: --s.dev_ref
                                    "s.dev_ref is 0"
                                    destroy_con_cq_qp: sess->s.dev = NULL
    rtrs_cq_qp_create -> create_qp(con, sess->dev->ib_pd...)
    sess->dev is NULL, panic.

To fix the problem, use a mutex to serialize create_con_cq_qp
and destroy_con_cq_qp.

Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 15 +++++++++++++--
 drivers/infiniband/ulp/rtrs/rtrs-clt.h |  1 +
 2 files changed, 14 insertions(+), 2 deletions(-)
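
Note (not part of the patch): the serialization described above can be
illustrated with a minimal userspace sketch. It assumes POSIX threads in
place of kworkers and a pthread mutex in place of the kernel mutex. All
demo_* names are hypothetical stand-ins, not the rtrs definitions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_ROUNDS 1000

/* Hypothetical stand-ins for the device and connection structures. */
struct demo_dev { int pd; };

struct demo_con {
	pthread_mutex_t con_mutex;	/* serializes create and destroy */
	struct demo_dev *dev;
	int dev_ref;
};

static struct demo_dev demo_device = { .pd = 1 };

/* Caller must hold con_mutex. Takes a reference, then uses the device. */
static int demo_create_cq_qp(struct demo_con *con)
{
	con->dev = &demo_device;
	con->dev_ref++;
	/* Without the lock, destroy could clear dev right before this use. */
	assert(con->dev != NULL);
	return con->dev->pd;
}

/* Caller must hold con_mutex. Drops a reference, clears dev on the last. */
static void demo_destroy_cq_qp(struct demo_con *con)
{
	if (con->dev_ref > 0 && --con->dev_ref == 0)
		con->dev = NULL;
}

/* Plays the role of the addr_resolver worker. */
static void *demo_resolver(void *arg)
{
	struct demo_con *con = arg;

	for (int i = 0; i < DEMO_ROUNDS; i++) {
		pthread_mutex_lock(&con->con_mutex);
		demo_create_cq_qp(con);
		pthread_mutex_unlock(&con->con_mutex);
	}
	return NULL;
}

/* Plays the role of the reconnect worker. */
static void *demo_reconnect(void *arg)
{
	struct demo_con *con = arg;

	for (int i = 0; i < DEMO_ROUNDS; i++) {
		pthread_mutex_lock(&con->con_mutex);
		demo_destroy_cq_qp(con);
		pthread_mutex_unlock(&con->con_mutex);
	}
	return NULL;
}

/* Runs both workers; returns 1 if dev is set exactly when dev_ref > 0. */
static int demo_run(void)
{
	struct demo_con con = { .dev = NULL, .dev_ref = 0 };
	pthread_t a, b;
	int ok;

	pthread_mutex_init(&con.con_mutex, NULL);
	pthread_create(&a, NULL, demo_resolver, &con);
	pthread_create(&b, NULL, demo_reconnect, &con);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	ok = (con.dev_ref > 0) == (con.dev != NULL);
	pthread_mutex_destroy(&con.con_mutex);
	return ok;
}
```

Calling demo_run() from a test harness exercises both paths concurrently.
Because each create or destroy runs as one critical section, the device
pointer cannot be cleared between taking the reference and using it.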

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index fb840b152b37..4677e8ed29ae 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -1499,6 +1499,7 @@ static int create_con(struct rtrs_clt_sess *sess, unsigned int cid)
 	con->c.cid = cid;
 	con->c.sess = &sess->s;
 	atomic_set(&con->io_cnt, 0);
+	mutex_init(&con->con_mutex);
 
 	sess->s.con[cid] = &con->c;
 
@@ -1510,6 +1511,7 @@ static void destroy_con(struct rtrs_clt_con *con)
 	struct rtrs_clt_sess *sess = to_clt_sess(con->c.sess);
 
 	sess->s.con[con->c.cid] = NULL;
+	mutex_destroy(&con->con_mutex);
 	kfree(con);
 }
 
@@ -1520,6 +1522,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con)
 	int err, cq_vector;
 	struct rtrs_msg_rkey_rsp *rsp;
 
+	lockdep_assert_held(&con->con_mutex);
 	if (con->c.cid == 0) {
 		/*
 		 * One completion for each receive and two for each send
@@ -1593,7 +1596,7 @@ static void destroy_con_cq_qp(struct rtrs_clt_con *con)
 	 * Be careful here: destroy_con_cq_qp() can be called even
 	 * create_con_cq_qp() failed, see comments there.
 	 */
-
+	lockdep_assert_held(&con->con_mutex);
 	rtrs_cq_qp_destroy(&con->c);
 	if (con->rsp_ius) {
 		rtrs_iu_free(con->rsp_ius, DMA_FROM_DEVICE,
@@ -1625,7 +1628,9 @@ static int rtrs_rdma_addr_resolved(struct rtrs_clt_con *con)
 	struct rtrs_sess *s = con->c.sess;
 	int err;
 
+	mutex_lock(&con->con_mutex);
 	err = create_con_cq_qp(con);
+	mutex_unlock(&con->con_mutex);
 	if (err) {
 		rtrs_err(s, "create_con_cq_qp(), err: %d\n", err);
 		return err;
@@ -1938,8 +1943,9 @@ static int create_cm(struct rtrs_clt_con *con)
 
 errr:
 	stop_cm(con);
-	/* Is safe to call destroy if cq_qp is not inited */
+	mutex_lock(&con->con_mutex);
 	destroy_con_cq_qp(con);
+	mutex_unlock(&con->con_mutex);
 destroy_cm:
 	destroy_cm(con);
 
@@ -2046,7 +2052,9 @@ static void rtrs_clt_stop_and_destroy_conns(struct rtrs_clt_sess *sess)
 		if (!sess->s.con[cid])
 			break;
 		con = to_clt_con(sess->s.con[cid]);
+		mutex_lock(&con->con_mutex);
 		destroy_con_cq_qp(con);
+		mutex_unlock(&con->con_mutex);
 		destroy_cm(con);
 		destroy_con(con);
 	}
@@ -2213,7 +2221,10 @@ static int init_conns(struct rtrs_clt_sess *sess)
 		struct rtrs_clt_con *con = to_clt_con(sess->s.con[cid]);
 
 		stop_cm(con);
+
+		mutex_lock(&con->con_mutex);
 		destroy_con_cq_qp(con);
+		mutex_unlock(&con->con_mutex);
 		destroy_cm(con);
 		destroy_con(con);
 	}
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.h b/drivers/infiniband/ulp/rtrs/rtrs-clt.h
index 167acd3c90fc..b8dbd701b3cb 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.h
@@ -72,6 +72,7 @@ struct rtrs_clt_con {
 	struct rtrs_iu		*rsp_ius;
 	u32			queue_size;
 	unsigned int		cpu;
+	struct mutex		con_mutex;
 	atomic_t		io_cnt;
 	int			cm_err;
 };
-- 
2.25.1

