From: Jack Wang <jinpu.wang@cloud.ionos.com> To: linux-rdma@vger.kernel.org Cc: bvanassche@acm.org, leon@kernel.org, dledford@redhat.com, jgg@ziepe.ca, danil.kipnis@cloud.ionos.com, jinpu.wang@cloud.ionos.com, Gioh Kim <gi-oh.kim@cloud.ionos.com> Subject: [PATCHv2 for-next 03/12] RDMA/rtrs-clt: avoid run destroy_con_cq_qp/create_con_cq_qp in parallel Date: Fri, 23 Oct 2020 09:43:44 +0200 Message-ID: <20201023074353.21946-4-jinpu.wang@cloud.ionos.com> (raw) In-Reply-To: <20201023074353.21946-1-jinpu.wang@cloud.ionos.com> It could happen two kworkers race with each other: addr_resolver kworker reconnect kworker rtrs_clt_rdma_cm_handler rtrs_rdma_addr_resolved create_con_cq_qp: s.dev_ref++ "s.dev_ref is 1" wait in create_cm fails with TIMEOUT destroy_con_cq_qp: --s.dev_ref "s.dev_ref is 0" destroy_con_cq_qp: sess->s.dev = NULL rtrs_cq_qp_create -> create_qp(con, sess->dev->ib_pd...) sess->dev is NULL, panic. To fix the problem using mutex to serialize create_con_cq_qp and destroy_con_cq_qp. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 15 +++++++++++++-- drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index fb840b152b37..4677e8ed29ae 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -1499,6 +1499,7 @@ static int create_con(struct rtrs_clt_sess *sess, unsigned int cid) con->c.cid = cid; con->c.sess = &sess->s; atomic_set(&con->io_cnt, 0); + mutex_init(&con->con_mutex); sess->s.con[cid] = &con->c; @@ -1510,6 +1511,7 @@ static void destroy_con(struct rtrs_clt_con *con) struct rtrs_clt_sess *sess = to_clt_sess(con->c.sess); sess->s.con[con->c.cid] = NULL; + mutex_destroy(&con->con_mutex); kfree(con); } @@ -1520,6 +1522,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) int err, cq_vector; struct rtrs_msg_rkey_rsp *rsp; + lockdep_assert_held(&con->con_mutex); if (con->c.cid == 0) { /* * One completion for each receive and two for each send @@ -1593,7 +1596,7 @@ static void destroy_con_cq_qp(struct rtrs_clt_con *con) * Be careful here: destroy_con_cq_qp() can be called even * create_con_cq_qp() failed, see comments there. */ - + lockdep_assert_held(&con->con_mutex); rtrs_cq_qp_destroy(&con->c); if (con->rsp_ius) { rtrs_iu_free(con->rsp_ius, DMA_FROM_DEVICE, @@ -1625,7 +1628,9 @@ static int rtrs_rdma_addr_resolved(struct rtrs_clt_con *con) struct rtrs_sess *s = con->c.sess; int err; + mutex_lock(&con->con_mutex); err = create_con_cq_qp(con); + mutex_unlock(&con->con_mutex); if (err) { rtrs_err(s, "create_con_cq_qp(), err: %d\n", err); return err; @@ -1938,8 +1943,9 @@ static int create_cm(struct rtrs_clt_con *con) errr: stop_cm(con); - /* Is safe to call destroy if cq_qp is not inited */ + mutex_lock(&con->con_mutex); destroy_con_cq_qp(con); + mutex_unlock(&con->con_mutex); destroy_cm: destroy_cm(con); @@ -2046,7 +2052,9 @@ static void rtrs_clt_stop_and_destroy_conns(struct rtrs_clt_sess *sess) if (!sess->s.con[cid]) break; con = to_clt_con(sess->s.con[cid]); + mutex_lock(&con->con_mutex); destroy_con_cq_qp(con); + mutex_unlock(&con->con_mutex); destroy_cm(con); destroy_con(con); } @@ -2213,7 +2221,10 @@ static int init_conns(struct rtrs_clt_sess *sess) struct rtrs_clt_con *con = to_clt_con(sess->s.con[cid]); stop_cm(con); + + mutex_lock(&con->con_mutex); destroy_con_cq_qp(con); + mutex_unlock(&con->con_mutex); destroy_cm(con); destroy_con(con); } diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.h b/drivers/infiniband/ulp/rtrs/rtrs-clt.h index 167acd3c90fc..b8dbd701b3cb 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.h +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.h @@ -72,6 +72,7 @@ struct rtrs_clt_con { struct rtrs_iu *rsp_ius; u32 queue_size; unsigned int cpu; + struct mutex con_mutex; atomic_t io_cnt; int cm_err; }; -- 2.25.1
next prev parent reply index Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-10-23 7:43 [PATCHv2 for-next 00/12] rtrs: misc fix and cleanup Jack Wang 2020-10-23 7:43 ` [PATCHv2 for-next 01/12] RDMA/rtrs-clt: remove destroy_con_cq_qp in case route resolving failed Jack Wang 2020-10-23 7:43 ` [PATCHv2 for-next 02/12] RDMA/rtrs-clt: remove outdated comment in create_con_cq_qp Jack Wang 2020-10-23 7:43 ` Jack Wang [this message] 2020-10-23 7:43 ` [PATCHv2 for-next 04/12] RDMA/rtrs-clt: missing error from rtrs_rdma_conn_established Jack Wang 2020-10-23 7:43 ` [PATCHv2 for-next 05/12] RDMA/rtrs-srv: don't guard the whole __alloc_srv with srv_mutex Jack Wang 2020-10-23 7:43 ` [PATCHv2 for-next 06/12] RDMA/rtrs-srv: fix typo Jack Wang 2020-10-23 7:43 ` [PATCHv2 for-next 07/12] RDMA/rtrs: remove unnecessary argument dir of rtrs_iu_free Jack Wang 2020-10-23 7:43 ` [PATCHv2 for-next 08/12] RDMA/rtrs-clt: remove duplicated switch-case handling for CM error events Jack Wang 2020-10-23 7:43 ` [PATCHv2 for-next 09/12] RDMA/rtrs-clt: remove duplicated code Jack Wang 2020-10-23 7:43 ` [PATCHv2 for-next 10/12] RDMA/rtrs-srv: kill rtrs_srv_change_state_get_old Jack Wang 2020-10-23 7:43 ` [PATCHv2 for-next 11/12] RDMA/rtrs: introduce rtrs_post_send Jack Wang 2020-10-23 7:43 ` [PATCHv2 for-next 12/12] RDMA/rtrs-clt: remove 'addr' from rtrs_clt_add_path_to_arr Jack Wang 2020-10-28 16:28 ` [PATCHv2 for-next 00/12] rtrs: misc fix and cleanup Jason Gunthorpe
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20201023074353.21946-4-jinpu.wang@cloud.ionos.com \ --to=jinpu.wang@cloud.ionos.com \ --cc=bvanassche@acm.org \ --cc=danil.kipnis@cloud.ionos.com \ --cc=dledford@redhat.com \ --cc=gi-oh.kim@cloud.ionos.com \ --cc=jgg@ziepe.ca \ --cc=leon@kernel.org \ --cc=linux-rdma@vger.kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Linux-RDMA Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/linux-rdma/0 linux-rdma/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 linux-rdma linux-rdma/ https://lore.kernel.org/linux-rdma \ linux-rdma@vger.kernel.org public-inbox-index linux-rdma Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rdma AGPL code for this site: git clone https://public-inbox.org/public-inbox.git