All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat
@ 2021-07-12  6:07 Jack Wang
  2021-07-12  6:07 ` [PATCHv2 for-rc 1/6] RDMA/rtrs: Add error messages for failed operations Jack Wang
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12  6:07 UTC (permalink / raw)
  To: linux-rdma; +Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang

Hi Jason, hi Doug,

Please consider to include following changes to upstream.

This patchset fix a regression since b38041d50add ("RDMA/rtrs: Do not signal for heatbeat").

In commit b38041d50add, the signal flag is droped to fix the send queue full
logic, but introduced a worse bug the send queue overflow on both clt and srv
by heartbeat, sorry.

The patchset is orgnized as:
- patch1 debug patch.
- patch2 preparation.
- patch3 signal both IO and heartbeat.
- patch4 cleanup.
- patch5 cleanup
- patch6 move sq_wr_avail to account send queue full correctly.

The patches are created base v5.14-rc1.

Since v1:
* rebased to latest v5.14-rc1, target rc instread of for-next.

v1: https://lore.kernel.org/linux-rdma/20210629065321.12600-1-jinpu.wang@ionos.com/T/#t

Jack Wang (6):
  RDMA/rtrs: Add error messages for failed operations.
  RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con
  RDMA/rtrs: Enable the same selective signal for heartbeat and IO
  RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static
  RDMA/rtrs: Remove unused flags parameter
  RDMA/rtrs: Move sq_wr_avail to rtrs_con

 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 11 ++++++++---
 drivers/infiniband/ulp/rtrs/rtrs-clt.h |  1 -
 drivers/infiniband/ulp/rtrs/rtrs-pri.h |  6 +++---
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 19 ++++++++++---------
 drivers/infiniband/ulp/rtrs/rtrs-srv.h |  2 --
 drivers/infiniband/ulp/rtrs/rtrs.c     | 23 ++++++++++++++++-------
 6 files changed, 37 insertions(+), 25 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCHv2 for-rc 1/6] RDMA/rtrs: Add error messages for failed operations.
  2021-07-12  6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
@ 2021-07-12  6:07 ` Jack Wang
  2021-07-12  6:07 ` [PATCHv2 for-rc 2/6] RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con Jack Wang
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12  6:07 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Aleksei Marov, Gioh Kim

It could help debugging in case of error happens.

Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index 61919ebd92b2..870b21f551a4 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -316,6 +316,7 @@ void rtrs_send_hb_ack(struct rtrs_sess *sess)
 	err = rtrs_post_rdma_write_imm_empty(usr_con, sess->hb_cqe, imm,
 					     0, NULL);
 	if (err) {
+		rtrs_err(sess, "send HB ACK failed, errno: %d\n", err);
 		sess->hb_err_handler(usr_con);
 		return;
 	}
@@ -333,6 +334,7 @@ static void hb_work(struct work_struct *work)
 	usr_con = sess->con[0];
 
 	if (sess->hb_missed_cnt > sess->hb_missed_max) {
+		rtrs_err(sess, "HB missed max reached.\n");
 		sess->hb_err_handler(usr_con);
 		return;
 	}
@@ -348,6 +350,7 @@ static void hb_work(struct work_struct *work)
 	err = rtrs_post_rdma_write_imm_empty(usr_con, sess->hb_cqe, imm,
 					     0, NULL);
 	if (err) {
+		rtrs_err(sess, "HB send failed, errno: %d\n", err);
 		sess->hb_err_handler(usr_con);
 		return;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv2 for-rc 2/6] RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con
  2021-07-12  6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
  2021-07-12  6:07 ` [PATCHv2 for-rc 1/6] RDMA/rtrs: Add error messages for failed operations Jack Wang
@ 2021-07-12  6:07 ` Jack Wang
  2021-07-12  6:07 ` [PATCHv2 for-rc 3/6] RDMA/rtrs: Enable the same selective signal for heartbeat and IO Jack Wang
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12  6:07 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Aleksei Marov, Gioh Kim

We need to track also the wr used for heatbeat. This is
a preparation for that, will be used in later patch.

The io_cnt in rtrs_clt is removed, use wr_cnt instead.

Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 7 ++++---
 drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 -
 drivers/infiniband/ulp/rtrs/rtrs-pri.h | 1 +
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 6 +++---
 drivers/infiniband/ulp/rtrs/rtrs-srv.h | 1 -
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index f2c40e50f25e..5cb00ea08919 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -478,7 +478,7 @@ static int rtrs_post_send_rdma(struct rtrs_clt_con *con,
 	 * From time to time we have to post signalled sends,
 	 * or send queue will fill up and only QP reset can help.
 	 */
-	flags = atomic_inc_return(&con->io_cnt) % sess->queue_depth ?
+	flags = atomic_inc_return(&con->c.wr_cnt) % sess->queue_depth ?
 			0 : IB_SEND_SIGNALED;
 
 	ib_dma_sync_single_for_device(sess->s.dev->ib_dev, req->iu->dma_addr,
@@ -1043,7 +1043,7 @@ static int rtrs_post_rdma_write_sg(struct rtrs_clt_con *con,
 	 * From time to time we have to post signalled sends,
 	 * or send queue will fill up and only QP reset can help.
 	 */
-	flags = atomic_inc_return(&con->io_cnt) % sess->queue_depth ?
+	flags = atomic_inc_return(&con->c.wr_cnt) % sess->queue_depth ?
 			0 : IB_SEND_SIGNALED;
 
 	ib_dma_sync_single_for_device(sess->s.dev->ib_dev, req->iu->dma_addr,
@@ -1601,7 +1601,8 @@ static int create_con(struct rtrs_clt_sess *sess, unsigned int cid)
 	con->cpu  = (cid ? cid - 1 : 0) % nr_cpu_ids;
 	con->c.cid = cid;
 	con->c.sess = &sess->s;
-	atomic_set(&con->io_cnt, 0);
+	/* Align with srv, init as 1 */
+	atomic_set(&con->c.wr_cnt, 1);
 	mutex_init(&con->con_mutex);
 
 	sess->s.con[cid] = &con->c;
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.h b/drivers/infiniband/ulp/rtrs/rtrs-clt.h
index e276a2dfcf7c..3c3ff094588c 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.h
@@ -74,7 +74,6 @@ struct rtrs_clt_con {
 	u32			queue_num;
 	unsigned int		cpu;
 	struct mutex		con_mutex;
-	atomic_t		io_cnt;
 	int			cm_err;
 };
 
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-pri.h b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
index 36f184a3b676..a44a4fb1b515 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-pri.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
@@ -96,6 +96,7 @@ struct rtrs_con {
 	struct rdma_cm_id	*cm_id;
 	unsigned int		cid;
 	int                     nr_cqe;
+	atomic_t		wr_cnt;
 };
 
 struct rtrs_sess {
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 3df290086169..31b846ca0c5e 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -269,7 +269,7 @@ static int rdma_write_sg(struct rtrs_srv_op *id)
 	 * From time to time we have to post signaled sends,
 	 * or send queue will fill up and only QP reset can help.
 	 */
-	flags = (atomic_inc_return(&id->con->wr_cnt) % srv->queue_depth) ?
+	flags = (atomic_inc_return(&id->con->c.wr_cnt) % srv->queue_depth) ?
 		0 : IB_SEND_SIGNALED;
 
 	if (need_inval) {
@@ -396,7 +396,7 @@ static int send_io_resp_imm(struct rtrs_srv_con *con, struct rtrs_srv_op *id,
 	 * From time to time we have to post signalled sends,
 	 * or send queue will fill up and only QP reset can help.
 	 */
-	flags = (atomic_inc_return(&con->wr_cnt) % srv->queue_depth) ?
+	flags = (atomic_inc_return(&con->c.wr_cnt) % srv->queue_depth) ?
 		0 : IB_SEND_SIGNALED;
 	imm = rtrs_to_io_rsp_imm(id->msg_id, errno, need_inval);
 	imm_wr.wr.next = NULL;
@@ -1648,7 +1648,7 @@ static int create_con(struct rtrs_srv_sess *sess,
 	con->c.cm_id = cm_id;
 	con->c.sess = &sess->s;
 	con->c.cid = cid;
-	atomic_set(&con->wr_cnt, 1);
+	atomic_set(&con->c.wr_cnt, 1);
 	wr_limit = sess->s.dev->ib_dev->attrs.max_qp_wr;
 
 	if (con->c.cid == 0) {
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.h b/drivers/infiniband/ulp/rtrs/rtrs-srv.h
index f8da2e3f0bda..6785c3b6363e 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.h
@@ -42,7 +42,6 @@ struct rtrs_srv_stats {
 
 struct rtrs_srv_con {
 	struct rtrs_con		c;
-	atomic_t		wr_cnt;
 	atomic_t		sq_wr_avail;
 	struct list_head	rsp_wr_wait_list;
 	spinlock_t		rsp_wr_wait_lock;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv2 for-rc 3/6] RDMA/rtrs: Enable the same selective signal for heartbeat and IO
  2021-07-12  6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
  2021-07-12  6:07 ` [PATCHv2 for-rc 1/6] RDMA/rtrs: Add error messages for failed operations Jack Wang
  2021-07-12  6:07 ` [PATCHv2 for-rc 2/6] RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con Jack Wang
@ 2021-07-12  6:07 ` Jack Wang
  2021-07-12  6:07 ` [PATCHv2 for-rc 4/6] RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static Jack Wang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12  6:07 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Aleksei Marov, Gioh Kim

On idle session, because we do not do signal for heartbeat, it will
overflow the send queue after sometime.

To avoid that, we need to enable the signal for heartbeat. To do that,
add a new member signal_interval in rtrs_path, which will set min of
queue_depth and SERVICE_CON_QUEUE_DEPTH, and track it for both heartbeat
and IO, so the sq queue full accounting is correct.

Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat")
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c |  7 +++++--
 drivers/infiniband/ulp/rtrs/rtrs-pri.h |  1 +
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 11 ++++++-----
 drivers/infiniband/ulp/rtrs/rtrs.c     |  7 ++++++-
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 5cb00ea08919..f023676e05e4 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -478,7 +478,7 @@ static int rtrs_post_send_rdma(struct rtrs_clt_con *con,
 	 * From time to time we have to post signalled sends,
 	 * or send queue will fill up and only QP reset can help.
 	 */
-	flags = atomic_inc_return(&con->c.wr_cnt) % sess->queue_depth ?
+	flags = atomic_inc_return(&con->c.wr_cnt) % sess->s.signal_interval ?
 			0 : IB_SEND_SIGNALED;
 
 	ib_dma_sync_single_for_device(sess->s.dev->ib_dev, req->iu->dma_addr,
@@ -680,6 +680,7 @@ static void rtrs_clt_rdma_done(struct ib_cq *cq, struct ib_wc *wc)
 	case IB_WC_RDMA_WRITE:
 		/*
 		 * post_send() RDMA write completions of IO reqs (read/write)
+		 * and hb.
 		 */
 		break;
 
@@ -1043,7 +1044,7 @@ static int rtrs_post_rdma_write_sg(struct rtrs_clt_con *con,
 	 * From time to time we have to post signalled sends,
 	 * or send queue will fill up and only QP reset can help.
 	 */
-	flags = atomic_inc_return(&con->c.wr_cnt) % sess->queue_depth ?
+	flags = atomic_inc_return(&con->c.wr_cnt) % sess->s.signal_interval ?
 			0 : IB_SEND_SIGNALED;
 
 	ib_dma_sync_single_for_device(sess->s.dev->ib_dev, req->iu->dma_addr,
@@ -1849,6 +1850,8 @@ static int rtrs_rdma_conn_established(struct rtrs_clt_con *con,
 				return -ENOMEM;
 		}
 		sess->queue_depth = queue_depth;
+		sess->s.signal_interval = min_not_zero(queue_depth,
+						(unsigned short) SERVICE_CON_QUEUE_DEPTH);
 		sess->max_hdr_size = le32_to_cpu(msg->max_hdr_size);
 		sess->max_io_size = le32_to_cpu(msg->max_io_size);
 		sess->flags = le32_to_cpu(msg->flags);
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-pri.h b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
index a44a4fb1b515..b88a4944cb30 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-pri.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
@@ -109,6 +109,7 @@ struct rtrs_sess {
 	unsigned int		con_num;
 	unsigned int		irq_con_num;
 	unsigned int		recon_cnt;
+	unsigned int		signal_interval;
 	struct rtrs_ib_dev	*dev;
 	int			dev_ref;
 	struct ib_cqe		*hb_cqe;
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 31b846ca0c5e..44ed15f38896 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -201,7 +201,6 @@ static int rdma_write_sg(struct rtrs_srv_op *id)
 	struct rtrs_srv_sess *sess = to_srv_sess(s);
 	dma_addr_t dma_addr = sess->dma_addr[id->msg_id];
 	struct rtrs_srv_mr *srv_mr;
-	struct rtrs_srv *srv = sess->srv;
 	struct ib_send_wr inv_wr;
 	struct ib_rdma_wr imm_wr;
 	struct ib_rdma_wr *wr = NULL;
@@ -269,7 +268,7 @@ static int rdma_write_sg(struct rtrs_srv_op *id)
 	 * From time to time we have to post signaled sends,
 	 * or send queue will fill up and only QP reset can help.
 	 */
-	flags = (atomic_inc_return(&id->con->c.wr_cnt) % srv->queue_depth) ?
+	flags = (atomic_inc_return(&id->con->c.wr_cnt) % s->signal_interval) ?
 		0 : IB_SEND_SIGNALED;
 
 	if (need_inval) {
@@ -347,7 +346,6 @@ static int send_io_resp_imm(struct rtrs_srv_con *con, struct rtrs_srv_op *id,
 	struct ib_send_wr inv_wr, *wr = NULL;
 	struct ib_rdma_wr imm_wr;
 	struct ib_reg_wr rwr;
-	struct rtrs_srv *srv = sess->srv;
 	struct rtrs_srv_mr *srv_mr;
 	bool need_inval = false;
 	enum ib_send_flags flags;
@@ -396,7 +394,7 @@ static int send_io_resp_imm(struct rtrs_srv_con *con, struct rtrs_srv_op *id,
 	 * From time to time we have to post signalled sends,
 	 * or send queue will fill up and only QP reset can help.
 	 */
-	flags = (atomic_inc_return(&con->c.wr_cnt) % srv->queue_depth) ?
+	flags = (atomic_inc_return(&con->c.wr_cnt) % s->signal_interval) ?
 		0 : IB_SEND_SIGNALED;
 	imm = rtrs_to_io_rsp_imm(id->msg_id, errno, need_inval);
 	imm_wr.wr.next = NULL;
@@ -1268,8 +1266,9 @@ static void rtrs_srv_rdma_done(struct ib_cq *cq, struct ib_wc *wc)
 	case IB_WC_SEND:
 		/*
 		 * post_send() RDMA write completions of IO reqs (read/write)
+		 * and hb.
 		 */
-		atomic_add(srv->queue_depth, &con->sq_wr_avail);
+		atomic_add(s->signal_interval, &con->sq_wr_avail);
 
 		if (unlikely(!list_empty_careful(&con->rsp_wr_wait_list)))
 			rtrs_rdma_process_wr_wait_list(con);
@@ -1659,6 +1658,8 @@ static int create_con(struct rtrs_srv_sess *sess,
 		max_send_wr = min_t(int, wr_limit,
 				    SERVICE_CON_QUEUE_DEPTH * 2 + 2);
 		max_recv_wr = max_send_wr;
+		s->signal_interval = min_not_zero(srv->queue_depth,
+						  (size_t)SERVICE_CON_QUEUE_DEPTH);
 	} else {
 		/* when always_invlaidate enalbed, we need linv+rinv+mr+imm */
 		if (always_invalidate)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index 870b21f551a4..7f2dfb9d11fc 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -187,10 +187,15 @@ int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con, struct ib_cqe *cqe,
 				    struct ib_send_wr *head)
 {
 	struct ib_rdma_wr wr;
+	struct rtrs_sess *sess = con->sess;
+	enum ib_send_flags sflags;
+
+	sflags = (atomic_inc_return(&con->wr_cnt) % sess->signal_interval) ?
+		0 : IB_SEND_SIGNALED;
 
 	wr = (struct ib_rdma_wr) {
 		.wr.wr_cqe	= cqe,
-		.wr.send_flags	= flags,
+		.wr.send_flags	= sflags,
 		.wr.opcode	= IB_WR_RDMA_WRITE_WITH_IMM,
 		.wr.ex.imm_data	= cpu_to_be32(imm_data),
 	};
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv2 for-rc 4/6] RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static
  2021-07-12  6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
                   ` (2 preceding siblings ...)
  2021-07-12  6:07 ` [PATCHv2 for-rc 3/6] RDMA/rtrs: Enable the same selective signal for heartbeat and IO Jack Wang
@ 2021-07-12  6:07 ` Jack Wang
  2021-07-12  6:07 ` [PATCHv2 for-rc 5/6] RDMA/rtrs: Remove unused flags parameter Jack Wang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12  6:07 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Aleksei Marov, Gioh Kim

It's only used in rtrs.c, so no need to export.

Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-pri.h | 3 ---
 drivers/infiniband/ulp/rtrs/rtrs.c     | 9 +++++----
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-pri.h b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
index b88a4944cb30..76581ebaed1d 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-pri.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
@@ -311,9 +311,6 @@ int rtrs_iu_post_rdma_write_imm(struct rtrs_con *con, struct rtrs_iu *iu,
 				struct ib_send_wr *tail);
 
 int rtrs_post_recv_empty(struct rtrs_con *con, struct ib_cqe *cqe);
-int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con, struct ib_cqe *cqe,
-				   u32 imm_data, enum ib_send_flags flags,
-				   struct ib_send_wr *head);
 
 int rtrs_cq_qp_create(struct rtrs_sess *sess, struct rtrs_con *con,
 		      u32 max_send_sge, int cq_vector, int nr_cqe,
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index 7f2dfb9d11fc..528d6a57c9b6 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -182,9 +182,11 @@ int rtrs_iu_post_rdma_write_imm(struct rtrs_con *con, struct rtrs_iu *iu,
 }
 EXPORT_SYMBOL_GPL(rtrs_iu_post_rdma_write_imm);
 
-int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con, struct ib_cqe *cqe,
-				    u32 imm_data, enum ib_send_flags flags,
-				    struct ib_send_wr *head)
+static int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con,
+					  struct ib_cqe *cqe,
+					  u32 imm_data,
+					  enum ib_send_flags flags,
+					  struct ib_send_wr *head)
 {
 	struct ib_rdma_wr wr;
 	struct rtrs_sess *sess = con->sess;
@@ -202,7 +204,6 @@ int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con, struct ib_cqe *cqe,
 
 	return rtrs_post_send(con->qp, head, &wr.wr, NULL);
 }
-EXPORT_SYMBOL_GPL(rtrs_post_rdma_write_imm_empty);
 
 static void qp_event_handler(struct ib_event *ev, void *ctx)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv2 for-rc 5/6] RDMA/rtrs: Remove unused flags parameter
  2021-07-12  6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
                   ` (3 preceding siblings ...)
  2021-07-12  6:07 ` [PATCHv2 for-rc 4/6] RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static Jack Wang
@ 2021-07-12  6:07 ` Jack Wang
  2021-07-12  6:07 ` [PATCHv2 for-rc 6/6] RDMA/rtrs: Move sq_wr_avail to rtrs_con Jack Wang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12  6:07 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Aleksei Marov, Gioh Kim

flags is not used, so remove it from rtrs_post_rdma_write_imm_empty.

Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index 528d6a57c9b6..b56dc5b82db0 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -185,7 +185,6 @@ EXPORT_SYMBOL_GPL(rtrs_iu_post_rdma_write_imm);
 static int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con,
 					  struct ib_cqe *cqe,
 					  u32 imm_data,
-					  enum ib_send_flags flags,
 					  struct ib_send_wr *head)
 {
 	struct ib_rdma_wr wr;
@@ -320,7 +319,7 @@ void rtrs_send_hb_ack(struct rtrs_sess *sess)
 
 	imm = rtrs_to_imm(RTRS_HB_ACK_IMM, 0);
 	err = rtrs_post_rdma_write_imm_empty(usr_con, sess->hb_cqe, imm,
-					     0, NULL);
+					     NULL);
 	if (err) {
 		rtrs_err(sess, "send HB ACK failed, errno: %d\n", err);
 		sess->hb_err_handler(usr_con);
@@ -354,7 +353,7 @@ static void hb_work(struct work_struct *work)
 
 	imm = rtrs_to_imm(RTRS_HB_MSG_IMM, 0);
 	err = rtrs_post_rdma_write_imm_empty(usr_con, sess->hb_cqe, imm,
-					     0, NULL);
+					     NULL);
 	if (err) {
 		rtrs_err(sess, "HB send failed, errno: %d\n", err);
 		sess->hb_err_handler(usr_con);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv2 for-rc 6/6] RDMA/rtrs: Move sq_wr_avail to rtrs_con
  2021-07-12  6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
                   ` (4 preceding siblings ...)
  2021-07-12  6:07 ` [PATCHv2 for-rc 5/6] RDMA/rtrs: Remove unused flags parameter Jack Wang
@ 2021-07-12  6:07 ` Jack Wang
  2021-07-12 17:35 ` [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jason Gunthorpe
  2021-07-15 17:41 ` Jason Gunthorpe
  7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12  6:07 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Aleksei Marov, Gioh Kim

In order to account HB for sq_wr_avail properly,
move sq_wr_avail from rtrs_srv_con to rtrs_con.

Although rtrs-clt do not care sq_wr_avail, but still init it
to max_send_wr.

Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat")
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 1 +
 drivers/infiniband/ulp/rtrs/rtrs-pri.h | 1 +
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 8 ++++----
 drivers/infiniband/ulp/rtrs/rtrs-srv.h | 1 -
 drivers/infiniband/ulp/rtrs/rtrs.c     | 1 +
 5 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index f023676e05e4..ece3205531b8 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -1680,6 +1680,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con)
 			      sess->queue_depth * 3 + 1);
 		max_send_sge = 2;
 	}
+	atomic_set(&con->c.sq_wr_avail, max_send_wr);
 	cq_num = max_send_wr + max_recv_wr;
 	/* alloc iu to recv new rkey reply when server reports flags set */
 	if (sess->flags & RTRS_MSG_NEW_RKEY_F || con->c.cid == 0) {
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-pri.h b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
index 76581ebaed1d..d12ddfa50747 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-pri.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
@@ -97,6 +97,7 @@ struct rtrs_con {
 	unsigned int		cid;
 	int                     nr_cqe;
 	atomic_t		wr_cnt;
+	atomic_t		sq_wr_avail;
 };
 
 struct rtrs_sess {
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 44ed15f38896..cd9a4ccf4c28 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -507,11 +507,11 @@ bool rtrs_srv_resp_rdma(struct rtrs_srv_op *id, int status)
 		ib_update_fast_reg_key(mr->mr, ib_inc_rkey(mr->mr->rkey));
 	}
 	if (unlikely(atomic_sub_return(1,
-				       &con->sq_wr_avail) < 0)) {
+				       &con->c.sq_wr_avail) < 0)) {
 		rtrs_err(s, "IB send queue full: sess=%s cid=%d\n",
 			 kobject_name(&sess->kobj),
 			 con->c.cid);
-		atomic_add(1, &con->sq_wr_avail);
+		atomic_add(1, &con->c.sq_wr_avail);
 		spin_lock(&con->rsp_wr_wait_lock);
 		list_add_tail(&id->wait_list, &con->rsp_wr_wait_list);
 		spin_unlock(&con->rsp_wr_wait_lock);
@@ -1268,7 +1268,7 @@ static void rtrs_srv_rdma_done(struct ib_cq *cq, struct ib_wc *wc)
 		 * post_send() RDMA write completions of IO reqs (read/write)
 		 * and hb.
 		 */
-		atomic_add(s->signal_interval, &con->sq_wr_avail);
+		atomic_add(s->signal_interval, &con->c.sq_wr_avail);
 
 		if (unlikely(!list_empty_careful(&con->rsp_wr_wait_list)))
 			rtrs_rdma_process_wr_wait_list(con);
@@ -1680,7 +1680,7 @@ static int create_con(struct rtrs_srv_sess *sess,
 		 */
 	}
 	cq_num = max_send_wr + max_recv_wr;
-	atomic_set(&con->sq_wr_avail, max_send_wr);
+	atomic_set(&con->c.sq_wr_avail, max_send_wr);
 	cq_vector = rtrs_srv_get_next_cq_vector(sess);
 
 	/* TODO: SOFTIRQ can be faster, but be careful with softirq context */
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.h b/drivers/infiniband/ulp/rtrs/rtrs-srv.h
index 6785c3b6363e..e81774f5acd3 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.h
@@ -42,7 +42,6 @@ struct rtrs_srv_stats {
 
 struct rtrs_srv_con {
 	struct rtrs_con		c;
-	atomic_t		sq_wr_avail;
 	struct list_head	rsp_wr_wait_list;
 	spinlock_t		rsp_wr_wait_lock;
 };
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index b56dc5b82db0..ca542e477d38 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -191,6 +191,7 @@ static int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con,
 	struct rtrs_sess *sess = con->sess;
 	enum ib_send_flags sflags;
 
+	atomic_dec_if_positive(&con->sq_wr_avail);
 	sflags = (atomic_inc_return(&con->wr_cnt) % sess->signal_interval) ?
 		0 : IB_SEND_SIGNALED;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat
  2021-07-12  6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
                   ` (5 preceding siblings ...)
  2021-07-12  6:07 ` [PATCHv2 for-rc 6/6] RDMA/rtrs: Move sq_wr_avail to rtrs_con Jack Wang
@ 2021-07-12 17:35 ` Jason Gunthorpe
  2021-07-12 18:58   ` Jinpu Wang
  2021-07-15 17:41 ` Jason Gunthorpe
  7 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2021-07-12 17:35 UTC (permalink / raw)
  To: Jack Wang; +Cc: linux-rdma, bvanassche, leon, dledford, haris.iqbal

On Mon, Jul 12, 2021 at 08:07:44AM +0200, Jack Wang wrote:
> Hi Jason, hi Doug,
> 
> Please consider to include following changes to upstream.
> 
> This patchset fix a regression since b38041d50add ("RDMA/rtrs: Do not signal for heatbeat").
> 
> In commit b38041d50add, the signal flag is droped to fix the send queue full
> logic, but introduced a worse bug the send queue overflow on both clt and srv
> by heartbeat, sorry.
> 
> The patchset is orgnized as:
> - patch1 debug patch.
> - patch2 preparation.
> - patch3 signal both IO and heartbeat.
> - patch4 cleanup.
> - patch5 cleanup
> - patch6 move sq_wr_avail to account send queue full correctly.
> 
> The patches are created base v5.14-rc1.
> 
> Since v1:
> * rebased to latest v5.14-rc1, target rc instread of for-next.
> 
> v1: https://lore.kernel.org/linux-rdma/20210629065321.12600-1-jinpu.wang@ionos.com/T/#t
> 
> Jack Wang (6):
>   RDMA/rtrs: Add error messages for failed operations.
>   RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con
>   RDMA/rtrs: Enable the same selective signal for heartbeat and IO
>   RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static
>   RDMA/rtrs: Remove unused flags parameter
>   RDMA/rtrs: Move sq_wr_avail to rtrs_con

This is not really structured well enough for a -rc patch. There
should be no unncessary changes and each patch should try to be self
contained. Avoid "cleanup". Carefully describe what user visible
defect each patch is fixing.

If you really want it to be in -rc then it needs reorganizing,
otherwise I can put it in -next

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat
  2021-07-12 17:35 ` [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jason Gunthorpe
@ 2021-07-12 18:58   ` Jinpu Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Jinpu Wang @ 2021-07-12 18:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: RDMA mailing list, Bart Van Assche, Leon Romanovsky,
	Doug Ledford, Haris Iqbal

On Mon, Jul 12, 2021 at 7:35 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Mon, Jul 12, 2021 at 08:07:44AM +0200, Jack Wang wrote:
> > Hi Jason, hi Doug,
> >
> > Please consider to include following changes to upstream.
> >
> > This patchset fix a regression since b38041d50add ("RDMA/rtrs: Do not signal for heatbeat").
> >
> > In commit b38041d50add, the signal flag is droped to fix the send queue full
> > logic, but introduced a worse bug the send queue overflow on both clt and srv
> > by heartbeat, sorry.
> >
> > The patchset is orgnized as:
> > - patch1 debug patch.
> > - patch2 preparation.
> > - patch3 signal both IO and heartbeat.
> > - patch4 cleanup.
> > - patch5 cleanup
> > - patch6 move sq_wr_avail to account send queue full correctly.
> >
> > The patches are created base v5.14-rc1.
> >
> > Since v1:
> > * rebased to latest v5.14-rc1, target rc instread of for-next.
> >
> > v1: https://lore.kernel.org/linux-rdma/20210629065321.12600-1-jinpu.wang@ionos.com/T/#t
> >
> > Jack Wang (6):
> >   RDMA/rtrs: Add error messages for failed operations.
> >   RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con
> >   RDMA/rtrs: Enable the same selective signal for heartbeat and IO
> >   RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static
> >   RDMA/rtrs: Remove unused flags parameter
> >   RDMA/rtrs: Move sq_wr_avail to rtrs_con
>
> This is not really structured well enough for a -rc patch. There
> should be no unncessary changes and each patch should try to be self
> contained. Avoid "cleanup". Carefully describe what user visible
> defect each patch is fixing.
>
> If you really want it to be in -rc then it needs reorganizing,
> otherwise I can put it in -next
>
> Jason
Hi Jason,

Thanks for your suggestion, I think it would be ok to put them in for-next.

Regards!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat
  2021-07-12  6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
                   ` (6 preceding siblings ...)
  2021-07-12 17:35 ` [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jason Gunthorpe
@ 2021-07-15 17:41 ` Jason Gunthorpe
  7 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2021-07-15 17:41 UTC (permalink / raw)
  To: Jack Wang; +Cc: linux-rdma, bvanassche, leon, dledford, haris.iqbal

On Mon, Jul 12, 2021 at 08:07:44AM +0200, Jack Wang wrote:
> Hi Jason, hi Doug,
> 
> Please consider to include following changes to upstream.
> 
> This patchset fix a regression since b38041d50add ("RDMA/rtrs: Do not signal for heatbeat").
> 
> In commit b38041d50add, the signal flag is droped to fix the send queue full
> logic, but introduced a worse bug the send queue overflow on both clt and srv
> by heartbeat, sorry.
> 
> The patchset is orgnized as:
> - patch1 debug patch.
> - patch2 preparation.
> - patch3 signal both IO and heartbeat.
> - patch4 cleanup.
> - patch5 cleanup
> - patch6 move sq_wr_avail to account send queue full correctly.
> 
> The patches are created base v5.14-rc1.
> 
> Since v1:
> * rebased to latest v5.14-rc1, target rc instread of for-next.
> 
> v1: https://lore.kernel.org/linux-rdma/20210629065321.12600-1-jinpu.wang@ionos.com/T/#t
> 
> Jack Wang (6):
>   RDMA/rtrs: Add error messages for failed operations.
>   RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con
>   RDMA/rtrs: Enable the same selective signal for heartbeat and IO
>   RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static
>   RDMA/rtrs: Remove unused flags parameter
>   RDMA/rtrs: Move sq_wr_avail to rtrs_con

Applied to for-next, thanks

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-07-15 17:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-12  6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
2021-07-12  6:07 ` [PATCHv2 for-rc 1/6] RDMA/rtrs: Add error messages for failed operations Jack Wang
2021-07-12  6:07 ` [PATCHv2 for-rc 2/6] RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con Jack Wang
2021-07-12  6:07 ` [PATCHv2 for-rc 3/6] RDMA/rtrs: Enable the same selective signal for heartbeat and IO Jack Wang
2021-07-12  6:07 ` [PATCHv2 for-rc 4/6] RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static Jack Wang
2021-07-12  6:07 ` [PATCHv2 for-rc 5/6] RDMA/rtrs: Remove unused flags parameter Jack Wang
2021-07-12  6:07 ` [PATCHv2 for-rc 6/6] RDMA/rtrs: Move sq_wr_avail to rtrs_con Jack Wang
2021-07-12 17:35 ` [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jason Gunthorpe
2021-07-12 18:58   ` Jinpu Wang
2021-07-15 17:41 ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.