Please consider to include following changes to upstream. I re-ordered the patch sequence so bugfix comes first, then cleanup. First 5 patches are bugfixes, we also added the Fixes tag accordingly. Thanks! Changelog: v2 -> v1: Droped "RDMA/rtrs: removed unused filed list of rtrs_iu" merged alreay. Droped "RDMA/rtrs-clt: remove unnecessary dev_ref of rtrs_sess", buggy. New "RDMA/rtrs-srv: don't guard the whole __alloc_srv with srv_mutex" v1: https://lore.kernel.org/linux-rdma/20201012131814.121096-1-jinpu.wang@cloud.ionos.com/ Danil Kipnis (1): RDMA/rtrs-clt: remove destroy_con_cq_qp in case route resolving failed Gioh Kim (4): RDMA/rtrs-clt: missing error from rtrs_rdma_conn_established RDMA/rtrs: remove unnecessary argument dir of rtrs_iu_free RDMA/rtrs-clt: remove duplicated switch-case handling for CM error events RDMA/rtrs-clt: remove duplicated code Guoqing Jiang (5): RDMA/rtrs-srv: don't guard the whole __alloc_srv with srv_mutex RDMA/rtrs-srv: fix typo RDMA/rtrs-srv: kill rtrs_srv_change_state_get_old RDMA/rtrs: introduce rtrs_post_send RDMA/rtrs-clt: remove 'addr' from rtrs_clt_add_path_to_arr Jack Wang (2): RDMA/rtrs-clt: remove outdated comment in create_con_cq_qp RDMA/rtrs-clt: avoid run destroy_con_cq_qp/create_con_cq_qp in parallel drivers/infiniband/ulp/rtrs/rtrs-clt.c | 74 ++++++++------- drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + drivers/infiniband/ulp/rtrs/rtrs-pri.h | 3 +- drivers/infiniband/ulp/rtrs/rtrs-srv.c | 119 ++++++++++--------------- drivers/infiniband/ulp/rtrs/rtrs-srv.h | 2 +- drivers/infiniband/ulp/rtrs/rtrs.c | 61 +++++-------- 6 files changed, 108 insertions(+), 152 deletions(-) -- 2.25.1
From: Danil Kipnis <danil.kipnis@cloud.ionos.com> We call destroy_con_cq_qp(con) in rtrs_rdma_addr_resolved() in case route couldn't be resolved and then again in create_cm() because nothing happens. Don't call destroy_con_cq_qp from rtrs_rdma_addr_resolved, create_cm() does the clean up already. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index 776e89231c52..9980bb4a6f78 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -1640,10 +1640,8 @@ static int rtrs_rdma_addr_resolved(struct rtrs_clt_con *con) return err; } err = rdma_resolve_route(con->c.cm_id, RTRS_CONNECT_TIMEOUT_MS); - if (err) { + if (err) rtrs_err(s, "Resolving route failed, err: %d\n", err); - destroy_con_cq_qp(con); - } return err; } -- 2.25.1
As run destroy_con_cq_qp many times doesn't work, remove the comments. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Suggested-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 9 --------- 1 file changed, 9 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index 9980bb4a6f78..fb840b152b37 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -1520,15 +1520,6 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) int err, cq_vector; struct rtrs_msg_rkey_rsp *rsp; - /* - * This function can fail, but still destroy_con_cq_qp() should - * be called, this is because create_con_cq_qp() is called on cm - * event path, thus caller/waiter never knows: have we failed before - * create_con_cq_qp() or after. To solve this dilemma without - * creating any additional flags just allow destroy_con_cq_qp() be - * called many times. - */ - if (con->c.cid == 0) { /* * One completion for each receive and two for each send -- 2.25.1
It could happen two kworkers race with each other: addr_resolver kworker reconnect kworker rtrs_clt_rdma_cm_handler rtrs_rdma_addr_resolved create_con_cq_qp: s.dev_ref++ "s.dev_ref is 1" wait in create_cm fails with TIMEOUT destroy_con_cq_qp: --s.dev_ref "s.dev_ref is 0" destroy_con_cq_qp: sess->s.dev = NULL rtrs_cq_qp_create -> create_qp(con, sess->dev->ib_pd...) sess->dev is NULL, panic. To fix the problem using mutex to serialize create_con_cq_qp and destroy_con_cq_qp. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 15 +++++++++++++-- drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index fb840b152b37..4677e8ed29ae 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -1499,6 +1499,7 @@ static int create_con(struct rtrs_clt_sess *sess, unsigned int cid) con->c.cid = cid; con->c.sess = &sess->s; atomic_set(&con->io_cnt, 0); + mutex_init(&con->con_mutex); sess->s.con[cid] = &con->c; @@ -1510,6 +1511,7 @@ static void destroy_con(struct rtrs_clt_con *con) struct rtrs_clt_sess *sess = to_clt_sess(con->c.sess); sess->s.con[con->c.cid] = NULL; + mutex_destroy(&con->con_mutex); kfree(con); } @@ -1520,6 +1522,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con) int err, cq_vector; struct rtrs_msg_rkey_rsp *rsp; + lockdep_assert_held(&con->con_mutex); if (con->c.cid == 0) { /* * One completion for each receive and two for each send @@ -1593,7 +1596,7 @@ static void destroy_con_cq_qp(struct rtrs_clt_con *con) * Be careful here: destroy_con_cq_qp() can be called even * create_con_cq_qp() failed, see comments there. */ - + lockdep_assert_held(&con->con_mutex); rtrs_cq_qp_destroy(&con->c); if (con->rsp_ius) { rtrs_iu_free(con->rsp_ius, DMA_FROM_DEVICE, @@ -1625,7 +1628,9 @@ static int rtrs_rdma_addr_resolved(struct rtrs_clt_con *con) struct rtrs_sess *s = con->c.sess; int err; + mutex_lock(&con->con_mutex); err = create_con_cq_qp(con); + mutex_unlock(&con->con_mutex); if (err) { rtrs_err(s, "create_con_cq_qp(), err: %d\n", err); return err; @@ -1938,8 +1943,9 @@ static int create_cm(struct rtrs_clt_con *con) errr: stop_cm(con); - /* Is safe to call destroy if cq_qp is not inited */ + mutex_lock(&con->con_mutex); destroy_con_cq_qp(con); + mutex_unlock(&con->con_mutex); destroy_cm: destroy_cm(con); @@ -2046,7 +2052,9 @@ static void rtrs_clt_stop_and_destroy_conns(struct rtrs_clt_sess *sess) if (!sess->s.con[cid]) break; con = to_clt_con(sess->s.con[cid]); + mutex_lock(&con->con_mutex); destroy_con_cq_qp(con); + mutex_unlock(&con->con_mutex); destroy_cm(con); destroy_con(con); } @@ -2213,7 +2221,10 @@ static int init_conns(struct rtrs_clt_sess *sess) struct rtrs_clt_con *con = to_clt_con(sess->s.con[cid]); stop_cm(con); + + mutex_lock(&con->con_mutex); destroy_con_cq_qp(con); + mutex_unlock(&con->con_mutex); destroy_cm(con); destroy_con(con); } diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.h b/drivers/infiniband/ulp/rtrs/rtrs-clt.h index 167acd3c90fc..b8dbd701b3cb 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.h +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.h @@ -72,6 +72,7 @@ struct rtrs_clt_con { struct rtrs_iu *rsp_ius; u32 queue_size; unsigned int cpu; + struct mutex con_mutex; atomic_t io_cnt; int cm_err; }; -- 2.25.1
From: Gioh Kim <gi-oh.kim@cloud.ionos.com> When rtrs_rdma_conn_established returns error (non-zero value), the error value is stored in con->cm_err and it cannot trigger rtrs_rdma_error_recovery. Finally the error of rtrs_rdma_con_established will be forgot. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index 4677e8ed29ae..ffca004625bb 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -1831,8 +1831,8 @@ static int rtrs_clt_rdma_cm_handler(struct rdma_cm_id *cm_id, cm_err = rtrs_rdma_route_resolved(con); break; case RDMA_CM_EVENT_ESTABLISHED: - con->cm_err = rtrs_rdma_conn_established(con, ev); - if (likely(!con->cm_err)) { + cm_err = rtrs_rdma_conn_established(con, ev); + if (likely(!cm_err)) { /* * Report success and wake up. Here we abuse state_wq, * i.e. wake up without state change, but we set cm_err. -- 2.25.1
From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> The purpose of srv_mutex is to protect srv_list as in put_srv, so no need to hold it when allocate memory for srv since it could be time consuming. Otherwise if one machine has limited memory, rsrv_close_work could be blocked for a longer time due to the mutex is held by get_or_create_srv since it can't get memory in time. [13327733.807781] INFO: task kworker/1:1:27478 blocked for more than 120 seconds. [13327733.854623] Tainted: G O 4.14.171-1-storage #4.14.171-1.3~deb9 [13327733.902461] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [13327733.953930] kworker/1:1 D 0 27478 2 0x80000000 [13327733.953938] Workqueue: rtrs_server_wq rtrs_srv_close_work [rtrs_server] [13327733.953939] Call Trace: [13327733.953945] ? __schedule+0x38c/0x7e0 [13327733.953946] schedule+0x32/0x80 [13327733.953948] schedule_preempt_disabled+0xa/0x10 [13327733.953949] __mutex_lock.isra.2+0x25e/0x4d0 [13327733.953954] ? put_srv+0x44/0x100 [rtrs_server] [13327733.953958] put_srv+0x44/0x100 [rtrs_server] [13327733.953961] rtrs_srv_close_work+0x16c/0x280 [rtrs_server] [13327733.953966] process_one_work+0x1c5/0x3c0 [13327733.953969] worker_thread+0x47/0x3e0 [13327733.953970] kthread+0xfc/0x130 [13327733.953972] ? trace_event_raw_event_workqueue_execute_start+0xa0/0xa0 [13327733.953973] ? kthread_create_on_node+0x70/0x70 [13327733.953974] ret_from_fork+0x1f/0x30 Let's move all the logics from __find_srv_and_get and __alloc_srv to get_or_create_srv, and remove the two functions. Then it should be safe for multiple processes to access the same srv since it is protected with srv_mutex. And since we don't want to allocate chunks with srv_mutex held, let's check the srv->refcount after get srv because the chunks could not be allocated yet. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-srv.c | 86 +++++++++++--------------- 1 file changed, 37 insertions(+), 49 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c index d6f93601712e..1cb778aff3c5 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c @@ -1328,17 +1328,42 @@ static void rtrs_srv_dev_release(struct device *dev) kfree(srv); } -static struct rtrs_srv *__alloc_srv(struct rtrs_srv_ctx *ctx, - const uuid_t *paths_uuid) +static void free_srv(struct rtrs_srv *srv) +{ + int i; + + WARN_ON(refcount_read(&srv->refcount)); + for (i = 0; i < srv->queue_depth; i++) + mempool_free(srv->chunks[i], chunk_pool); + kfree(srv->chunks); + mutex_destroy(&srv->paths_mutex); + mutex_destroy(&srv->paths_ev_mutex); + /* last put to release the srv structure */ + put_device(&srv->dev); +} + +static struct rtrs_srv *get_or_create_srv(struct rtrs_srv_ctx *ctx, + const uuid_t *paths_uuid) { struct rtrs_srv *srv; int i; + mutex_lock(&ctx->srv_mutex); + list_for_each_entry(srv, &ctx->srv_list, ctx_list) { + if (uuid_equal(&srv->paths_uuid, paths_uuid) && + refcount_inc_not_zero(&srv->refcount)) { + mutex_unlock(&ctx->srv_mutex); + return srv; + } + } + + /* need to allocate a new srv */ srv = kzalloc(sizeof(*srv), GFP_KERNEL); - if (!srv) + if (!srv) { + mutex_unlock(&ctx->srv_mutex); return NULL; + } - refcount_set(&srv->refcount, 1); INIT_LIST_HEAD(&srv->paths_list); mutex_init(&srv->paths_mutex); mutex_init(&srv->paths_ev_mutex); @@ -1347,6 +1372,8 @@ static struct rtrs_srv *__alloc_srv(struct rtrs_srv_ctx *ctx, srv->ctx = ctx; device_initialize(&srv->dev); srv->dev.release = rtrs_srv_dev_release; + list_add(&srv->ctx_list, &ctx->srv_list); + mutex_unlock(&ctx->srv_mutex); srv->chunks = kcalloc(srv->queue_depth, sizeof(*srv->chunks), GFP_KERNEL); @@ -1358,7 +1385,7 @@ static struct rtrs_srv *__alloc_srv(struct rtrs_srv_ctx *ctx, if (!srv->chunks[i]) goto err_free_chunks; } - list_add(&srv->ctx_list, &ctx->srv_list); + refcount_set(&srv->refcount, 1); return srv; @@ -1369,52 +1396,9 @@ static struct rtrs_srv *__alloc_srv(struct rtrs_srv_ctx *ctx, err_free_srv: kfree(srv); - return NULL; } -static void free_srv(struct rtrs_srv *srv) -{ - int i; - - WARN_ON(refcount_read(&srv->refcount)); - for (i = 0; i < srv->queue_depth; i++) - mempool_free(srv->chunks[i], chunk_pool); - kfree(srv->chunks); - mutex_destroy(&srv->paths_mutex); - mutex_destroy(&srv->paths_ev_mutex); - /* last put to release the srv structure */ - put_device(&srv->dev); -} - -static inline struct rtrs_srv *__find_srv_and_get(struct rtrs_srv_ctx *ctx, - const uuid_t *paths_uuid) -{ - struct rtrs_srv *srv; - - list_for_each_entry(srv, &ctx->srv_list, ctx_list) { - if (uuid_equal(&srv->paths_uuid, paths_uuid) && - refcount_inc_not_zero(&srv->refcount)) - return srv; - } - - return NULL; -} - -static struct rtrs_srv *get_or_create_srv(struct rtrs_srv_ctx *ctx, - const uuid_t *paths_uuid) -{ - struct rtrs_srv *srv; - - mutex_lock(&ctx->srv_mutex); - srv = __find_srv_and_get(ctx, paths_uuid); - if (!srv) - srv = __alloc_srv(ctx, paths_uuid); - mutex_unlock(&ctx->srv_mutex); - - return srv; -} - static void put_srv(struct rtrs_srv *srv) { if (refcount_dec_and_test(&srv->refcount)) { @@ -1813,7 +1797,11 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, } recon_cnt = le16_to_cpu(msg->recon_cnt); srv = get_or_create_srv(ctx, &msg->paths_uuid); - if (!srv) { + /* + * "refcount == 0" happens if a previous thread calls get_or_create_srv + * allocate srv, but chunks of srv are not allocated yet. + */ + if (!srv || refcount_read(&srv->refcount) == 0) { err = -ENOMEM; goto reject_w_err; } -- 2.25.1
From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> It should mean region here. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-srv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.h b/drivers/infiniband/ulp/rtrs/rtrs-srv.h index 08b0b8a6eebe..9543ae19996c 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.h +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.h @@ -62,7 +62,7 @@ struct rtrs_srv_op { /* * server side memory region context, when always_invalidate=Y, we need - * queue_depth of memory regrion to invalidate each memory region. + * queue_depth of memory region to invalidate each memory region. */ struct rtrs_srv_mr { struct ib_mr *mr; -- 2.25.1
From: Gioh Kim <gi-oh.kim@cloud.ionos.com> The direction of DMA operation is already in the rtrs_iu Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 14 ++++++-------- drivers/infiniband/ulp/rtrs/rtrs-pri.h | 3 +-- drivers/infiniband/ulp/rtrs/rtrs-srv.c | 14 ++++++-------- drivers/infiniband/ulp/rtrs/rtrs.c | 9 ++++----- 4 files changed, 17 insertions(+), 23 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index ffca004625bb..4e5da834034a 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -1236,8 +1236,7 @@ static void free_sess_reqs(struct rtrs_clt_sess *sess) if (req->mr) ib_dereg_mr(req->mr); kfree(req->sge); - rtrs_iu_free(req->iu, DMA_TO_DEVICE, - sess->s.dev->ib_dev, 1); + rtrs_iu_free(req->iu, sess->s.dev->ib_dev, 1); } kfree(sess->reqs); sess->reqs = NULL; @@ -1599,8 +1598,7 @@ static void destroy_con_cq_qp(struct rtrs_clt_con *con) lockdep_assert_held(&con->con_mutex); rtrs_cq_qp_destroy(&con->c); if (con->rsp_ius) { - rtrs_iu_free(con->rsp_ius, DMA_FROM_DEVICE, - sess->s.dev->ib_dev, con->queue_size); + rtrs_iu_free(con->rsp_ius, sess->s.dev->ib_dev, con->queue_size); con->rsp_ius = NULL; con->queue_size = 0; } @@ -2245,7 +2243,7 @@ static void rtrs_clt_info_req_done(struct ib_cq *cq, struct ib_wc *wc) struct rtrs_iu *iu; iu = container_of(wc->wr_cqe, struct rtrs_iu, cqe); - rtrs_iu_free(iu, DMA_TO_DEVICE, sess->s.dev->ib_dev, 1); + rtrs_iu_free(iu, sess->s.dev->ib_dev, 1); if (unlikely(wc->status != IB_WC_SUCCESS)) { rtrs_err(sess->clt, "Sess info request send failed: %s\n", @@ -2374,7 +2372,7 @@ static void rtrs_clt_info_rsp_done(struct ib_cq *cq, struct ib_wc *wc) out: rtrs_clt_update_wc_stats(con); - rtrs_iu_free(iu, DMA_FROM_DEVICE, sess->s.dev->ib_dev, 1); + rtrs_iu_free(iu, sess->s.dev->ib_dev, 1); rtrs_clt_change_state(sess, state); } @@ -2436,9 +2434,9 @@ static int rtrs_send_sess_info(struct rtrs_clt_sess *sess) out: if (tx_iu) - rtrs_iu_free(tx_iu, DMA_TO_DEVICE, sess->s.dev->ib_dev, 1); + rtrs_iu_free(tx_iu, sess->s.dev->ib_dev, 1); if (rx_iu) - rtrs_iu_free(rx_iu, DMA_FROM_DEVICE, sess->s.dev->ib_dev, 1); + rtrs_iu_free(rx_iu, sess->s.dev->ib_dev, 1); if (unlikely(err)) /* If we've never taken async path because of malloc problems */ rtrs_clt_change_state(sess, RTRS_CLT_CONNECTING_ERR); diff --git a/drivers/infiniband/ulp/rtrs/rtrs-pri.h b/drivers/infiniband/ulp/rtrs/rtrs-pri.h index b8e43dc4d95a..3f2918671dbe 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-pri.h +++ b/drivers/infiniband/ulp/rtrs/rtrs-pri.h @@ -287,8 +287,7 @@ struct rtrs_msg_rdma_hdr { struct rtrs_iu *rtrs_iu_alloc(u32 queue_size, size_t size, gfp_t t, struct ib_device *dev, enum dma_data_direction, void (*done)(struct ib_cq *cq, struct ib_wc *wc)); -void rtrs_iu_free(struct rtrs_iu *iu, enum dma_data_direction dir, - struct ib_device *dev, u32 queue_size); +void rtrs_iu_free(struct rtrs_iu *iu, struct ib_device *dev, u32 queue_size); int rtrs_iu_post_recv(struct rtrs_con *con, struct rtrs_iu *iu); int rtrs_iu_post_send(struct rtrs_con *con, struct rtrs_iu *iu, size_t size, struct ib_send_wr *head); diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c index 1cb778aff3c5..42ee3bf7dc52 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c @@ -577,8 +577,7 @@ static void unmap_cont_bufs(struct rtrs_srv_sess *sess) struct rtrs_srv_mr *srv_mr; srv_mr = &sess->mrs[i]; - rtrs_iu_free(srv_mr->iu, DMA_TO_DEVICE, - sess->s.dev->ib_dev, 1); + rtrs_iu_free(srv_mr->iu, sess->s.dev->ib_dev, 1); ib_dereg_mr(srv_mr->mr); ib_dma_unmap_sg(sess->s.dev->ib_dev, srv_mr->sgt.sgl, srv_mr->sgt.nents, DMA_BIDIRECTIONAL); @@ -682,8 +681,7 @@ static int map_cont_bufs(struct rtrs_srv_sess *sess) sgt = &srv_mr->sgt; mr = srv_mr->mr; free_iu: - rtrs_iu_free(srv_mr->iu, DMA_TO_DEVICE, - sess->s.dev->ib_dev, 1); + rtrs_iu_free(srv_mr->iu, sess->s.dev->ib_dev, 1); dereg_mr: ib_dereg_mr(mr); unmap_sg: @@ -735,7 +733,7 @@ static void rtrs_srv_info_rsp_done(struct ib_cq *cq, struct ib_wc *wc) struct rtrs_iu *iu; iu = container_of(wc->wr_cqe, struct rtrs_iu, cqe); - rtrs_iu_free(iu, DMA_TO_DEVICE, sess->s.dev->ib_dev, 1); + rtrs_iu_free(iu, sess->s.dev->ib_dev, 1); if (unlikely(wc->status != IB_WC_SUCCESS)) { rtrs_err(s, "Sess info response send failed: %s\n", @@ -861,7 +859,7 @@ static int process_info_req(struct rtrs_srv_con *con, if (unlikely(err)) { rtrs_err(s, "rtrs_iu_post_send(), err: %d\n", err); iu_free: - rtrs_iu_free(tx_iu, DMA_TO_DEVICE, sess->s.dev->ib_dev, 1); + rtrs_iu_free(tx_iu, sess->s.dev->ib_dev, 1); } rwr_free: kfree(rwr); @@ -906,7 +904,7 @@ static void rtrs_srv_info_req_done(struct ib_cq *cq, struct ib_wc *wc) goto close; out: - rtrs_iu_free(iu, DMA_FROM_DEVICE, sess->s.dev->ib_dev, 1); + rtrs_iu_free(iu, sess->s.dev->ib_dev, 1); return; close: close_sess(sess); @@ -929,7 +927,7 @@ static int post_recv_info_req(struct rtrs_srv_con *con) err = rtrs_iu_post_recv(&con->c, rx_iu); if (unlikely(err)) { rtrs_err(s, "rtrs_iu_post_recv(), err: %d\n", err); - rtrs_iu_free(rx_iu, DMA_FROM_DEVICE, sess->s.dev->ib_dev, 1); + rtrs_iu_free(rx_iu, sess->s.dev->ib_dev, 1); return err; } diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c index ff1093d6e4bc..48f648f573b6 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs.c +++ b/drivers/infiniband/ulp/rtrs/rtrs.c @@ -31,6 +31,7 @@ struct rtrs_iu *rtrs_iu_alloc(u32 queue_size, size_t size, gfp_t gfp_mask, return NULL; for (i = 0; i < queue_size; i++) { iu = &ius[i]; + iu->direction = dir; iu->buf = kzalloc(size, gfp_mask); if (!iu->buf) goto err; @@ -41,17 +42,15 @@ struct rtrs_iu *rtrs_iu_alloc(u32 queue_size, size_t size, gfp_t gfp_mask, iu->cqe.done = done; iu->size = size; - iu->direction = dir; } return ius; err: - rtrs_iu_free(ius, dir, dma_dev, i); + rtrs_iu_free(ius, dma_dev, i); return NULL; } EXPORT_SYMBOL_GPL(rtrs_iu_alloc); -void rtrs_iu_free(struct rtrs_iu *ius, enum dma_data_direction dir, - struct ib_device *ibdev, u32 queue_size) +void rtrs_iu_free(struct rtrs_iu *ius, struct ib_device *ibdev, u32 queue_size) { struct rtrs_iu *iu; int i; @@ -61,7 +60,7 @@ void rtrs_iu_free(struct rtrs_iu *ius, enum dma_data_direction dir, for (i = 0; i < queue_size; i++) { iu = &ius[i]; - ib_dma_unmap_single(ibdev, iu->dma_addr, iu->size, dir); + ib_dma_unmap_single(ibdev, iu->dma_addr, iu->size, iu->direction); kfree(iu->buf); } kfree(ius); -- 2.25.1
From: Gioh Kim <gi-oh.kim@cloud.ionos.com> The events returning the same error value are put together. Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index 4e5da834034a..30eda2f355e1 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -1843,20 +1843,22 @@ static int rtrs_clt_rdma_cm_handler(struct rdma_cm_id *cm_id, case RDMA_CM_EVENT_REJECTED: cm_err = rtrs_rdma_conn_rejected(con, ev); break; + case RDMA_CM_EVENT_DISCONNECTED: + /* No message for disconnecting */ + cm_err = -ECONNRESET; + break; case RDMA_CM_EVENT_CONNECT_ERROR: case RDMA_CM_EVENT_UNREACHABLE: + case RDMA_CM_EVENT_ADDR_CHANGE: + case RDMA_CM_EVENT_TIMEWAIT_EXIT: rtrs_wrn(s, "CM error event %d\n", ev->event); cm_err = -ECONNRESET; break; case RDMA_CM_EVENT_ADDR_ERROR: case RDMA_CM_EVENT_ROUTE_ERROR: + rtrs_wrn(s, "CM error event %d\n", ev->event); cm_err = -EHOSTUNREACH; break; - case RDMA_CM_EVENT_DISCONNECTED: - case RDMA_CM_EVENT_ADDR_CHANGE: - case RDMA_CM_EVENT_TIMEWAIT_EXIT: - cm_err = -ECONNRESET; - break; case RDMA_CM_EVENT_DEVICE_REMOVAL: /* * Device removal is a special case. Queue close and return 0. -- 2.25.1
From: Gioh Kim <gi-oh.kim@cloud.ionos.com> process_info_rsp checks that sg_cnt is zero twice. Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index 30eda2f355e1..1b2821d71b46 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -2264,8 +2264,12 @@ static int process_info_rsp(struct rtrs_clt_sess *sess, int i, sgi; sg_cnt = le16_to_cpu(msg->sg_cnt); - if (unlikely(!sg_cnt)) + if (unlikely(!sg_cnt || (sess->queue_depth % sg_cnt))) { + rtrs_err(sess->clt, "Incorrect sg_cnt %d, is not multiple\n", + sg_cnt); return -EINVAL; + } + /* * Check if IB immediate data size is enough to hold the mem_id and * the offset inside the memory chunk. @@ -2278,11 +2282,6 @@ static int process_info_rsp(struct rtrs_clt_sess *sess, MAX_IMM_PAYL_BITS, sg_cnt, sess->chunk_size); return -EINVAL; } - if (unlikely(!sg_cnt || (sess->queue_depth % sg_cnt))) { - rtrs_err(sess->clt, "Incorrect sg_cnt %d, is not multiple\n", - sg_cnt); - return -EINVAL; - } total_len = 0; for (sgi = 0, i = 0; sgi < sg_cnt && i < sess->queue_depth; sgi++) { const struct rtrs_sg_desc *desc = &msg->desc[sgi]; -- 2.25.1
From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> This function isn't needed since no caller checks the old_state of sess. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-srv.c | 19 +++---------------- 1 file changed, 3 insertions(+), 16 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c index 42ee3bf7dc52..c42fd470c4eb 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c @@ -113,28 +113,18 @@ static bool __rtrs_srv_change_state(struct rtrs_srv_sess *sess, return changed; } -static bool rtrs_srv_change_state_get_old(struct rtrs_srv_sess *sess, - enum rtrs_srv_state new_state, - enum rtrs_srv_state *old_state) +static bool rtrs_srv_change_state(struct rtrs_srv_sess *sess, + enum rtrs_srv_state new_state) { bool changed; spin_lock_irq(&sess->state_lock); - *old_state = sess->state; changed = __rtrs_srv_change_state(sess, new_state); spin_unlock_irq(&sess->state_lock); return changed; } -static bool rtrs_srv_change_state(struct rtrs_srv_sess *sess, - enum rtrs_srv_state new_state) -{ - enum rtrs_srv_state old_state; - - return rtrs_srv_change_state_get_old(sess, new_state, &old_state); -} - static void free_id(struct rtrs_srv_op *id) { if (!id) @@ -471,10 +461,7 @@ static int send_io_resp_imm(struct rtrs_srv_con *con, struct rtrs_srv_op *id, void close_sess(struct rtrs_srv_sess *sess) { - enum rtrs_srv_state old_state; - - if (rtrs_srv_change_state_get_old(sess, RTRS_SRV_CLOSING, - &old_state)) + if (rtrs_srv_change_state(sess, RTRS_SRV_CLOSING)) queue_work(rtrs_wq, &sess->close_work); WARN_ON(sess->state != RTRS_SRV_CLOSING); } -- 2.25.1
From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Since the three functions share the similar logic, let's introduce one common function for it. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs.c | 52 +++++++++++------------------- 1 file changed, 19 insertions(+), 33 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c index 48f648f573b6..2e3a849e0a77 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs.c +++ b/drivers/infiniband/ulp/rtrs/rtrs.c @@ -104,6 +104,22 @@ int rtrs_post_recv_empty(struct rtrs_con *con, struct ib_cqe *cqe) } EXPORT_SYMBOL_GPL(rtrs_post_recv_empty); +static int rtrs_post_send(struct ib_qp *qp, struct ib_send_wr *head, + struct ib_send_wr *wr) +{ + if (head) { + struct ib_send_wr *tail = head; + + while (tail->next) + tail = tail->next; + tail->next = wr; + } else { + head = wr; + } + + return ib_post_send(qp, head, NULL); +} + int rtrs_iu_post_send(struct rtrs_con *con, struct rtrs_iu *iu, size_t size, struct ib_send_wr *head) { @@ -126,17 +142,7 @@ int rtrs_iu_post_send(struct rtrs_con *con, struct rtrs_iu *iu, size_t size, .send_flags = IB_SEND_SIGNALED, }; - if (head) { - struct ib_send_wr *tail = head; - - while (tail->next) - tail = tail->next; - tail->next = ≀ - } else { - head = ≀ - } - - return ib_post_send(con->qp, head, NULL); + return rtrs_post_send(con->qp, head, &wr); } EXPORT_SYMBOL_GPL(rtrs_iu_post_send); @@ -168,17 +174,7 @@ int rtrs_iu_post_rdma_write_imm(struct rtrs_con *con, struct rtrs_iu *iu, if (WARN_ON(sge[i].length == 0)) return -EINVAL; - if (head) { - struct ib_send_wr *tail = head; - - while (tail->next) - tail = tail->next; - tail->next = &wr.wr; - } else { - head = &wr.wr; - } - - return ib_post_send(con->qp, head, NULL); + return rtrs_post_send(con->qp, head, &wr.wr); } EXPORT_SYMBOL_GPL(rtrs_iu_post_rdma_write_imm); @@ -195,17 +191,7 @@ int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con, struct ib_cqe *cqe, .ex.imm_data = cpu_to_be32(imm_data), }; - if (head) { - struct ib_send_wr *tail = head; - - while (tail->next) - tail = tail->next; - tail->next = ≀ - } else { - head = ≀ - } - - return ib_post_send(con->qp, head, NULL); + return rtrs_post_send(con->qp, head, &wr); } EXPORT_SYMBOL_GPL(rtrs_post_rdma_write_imm_empty); -- 2.25.1
From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Remove the argument since it is not used in the function. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index 1b2821d71b46..9d359c8f2f81 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -2161,8 +2161,7 @@ static void rtrs_clt_remove_path_from_arr(struct rtrs_clt_sess *sess) mutex_unlock(&clt->paths_mutex); } -static void rtrs_clt_add_path_to_arr(struct rtrs_clt_sess *sess, - struct rtrs_addr *addr) +static void rtrs_clt_add_path_to_arr(struct rtrs_clt_sess *sess) { struct rtrs_clt *clt = sess->clt; @@ -2937,7 +2936,7 @@ int rtrs_clt_create_path_from_sysfs(struct rtrs_clt *clt, * IO will never grab it. Also it is very important to add * path before init, since init fires LINK_CONNECTED event. */ - rtrs_clt_add_path_to_arr(sess, addr); + rtrs_clt_add_path_to_arr(sess); err = init_sess(sess); if (err) -- 2.25.1
On Fri, Oct 23, 2020 at 09:43:41AM +0200, Jack Wang wrote:
> Please consider to include following changes to upstream.
>
> I re-ordered the patch sequence so bugfix comes first, then cleanup.
> First 5 patches are bugfixes, we also added the Fixes tag accordingly.
>
> Thanks!
>
> Changelog:
> v2 -> v1:
> Droped "RDMA/rtrs: removed unused filed list of rtrs_iu" merged alreay.
> Droped "RDMA/rtrs-clt: remove unnecessary dev_ref of rtrs_sess", buggy.
> New "RDMA/rtrs-srv: don't guard the whole __alloc_srv with srv_mutex"
>
> v1:
> https://lore.kernel.org/linux-rdma/20201012131814.121096-1-jinpu.wang@cloud.ionos.com/
>
> Danil Kipnis (1):
> RDMA/rtrs-clt: remove destroy_con_cq_qp in case route resolving failed
>
> Gioh Kim (4):
> RDMA/rtrs-clt: missing error from rtrs_rdma_conn_established
> RDMA/rtrs: remove unnecessary argument dir of rtrs_iu_free
> RDMA/rtrs-clt: remove duplicated switch-case handling for CM error
> events
> RDMA/rtrs-clt: remove duplicated code
>
> Guoqing Jiang (5):
> RDMA/rtrs-srv: don't guard the whole __alloc_srv with srv_mutex
> RDMA/rtrs-srv: fix typo
> RDMA/rtrs-srv: kill rtrs_srv_change_state_get_old
> RDMA/rtrs: introduce rtrs_post_send
> RDMA/rtrs-clt: remove 'addr' from rtrs_clt_add_path_to_arr
>
> Jack Wang (2):
> RDMA/rtrs-clt: remove outdated comment in create_con_cq_qp
> RDMA/rtrs-clt: avoid run destroy_con_cq_qp/create_con_cq_qp in
> parallel
Applied to for next
Please note subjects in RDMA should lead with a capital letter:
RDMA/rtrs-clt: Remove outdated comment in create_con_cq_qp
Thanks,
Jason