* [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat
@ 2021-07-12 6:07 Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 1/6] RDMA/rtrs: Add error messages for failed operations Jack Wang
` (7 more replies)
0 siblings, 8 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12 6:07 UTC (permalink / raw)
To: linux-rdma; +Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang
Hi Jason, hi Doug,
Please consider to include following changes to upstream.
This patchset fix a regression since b38041d50add ("RDMA/rtrs: Do not signal for heatbeat").
In commit b38041d50add, the signal flag is droped to fix the send queue full
logic, but introduced a worse bug the send queue overflow on both clt and srv
by heartbeat, sorry.
The patchset is orgnized as:
- patch1 debug patch.
- patch2 preparation.
- patch3 signal both IO and heartbeat.
- patch4 cleanup.
- patch5 cleanup
- patch6 move sq_wr_avail to account send queue full correctly.
The patches are created base v5.14-rc1.
Since v1:
* rebased to latest v5.14-rc1, target rc instread of for-next.
v1: https://lore.kernel.org/linux-rdma/20210629065321.12600-1-jinpu.wang@ionos.com/T/#t
Jack Wang (6):
RDMA/rtrs: Add error messages for failed operations.
RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con
RDMA/rtrs: Enable the same selective signal for heartbeat and IO
RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static
RDMA/rtrs: Remove unused flags parameter
RDMA/rtrs: Move sq_wr_avail to rtrs_con
drivers/infiniband/ulp/rtrs/rtrs-clt.c | 11 ++++++++---
drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 -
drivers/infiniband/ulp/rtrs/rtrs-pri.h | 6 +++---
drivers/infiniband/ulp/rtrs/rtrs-srv.c | 19 ++++++++++---------
drivers/infiniband/ulp/rtrs/rtrs-srv.h | 2 --
drivers/infiniband/ulp/rtrs/rtrs.c | 23 ++++++++++++++++-------
6 files changed, 37 insertions(+), 25 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCHv2 for-rc 1/6] RDMA/rtrs: Add error messages for failed operations.
2021-07-12 6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
@ 2021-07-12 6:07 ` Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 2/6] RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con Jack Wang
` (6 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12 6:07 UTC (permalink / raw)
To: linux-rdma
Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
Aleksei Marov, Gioh Kim
It could help debugging in case of error happens.
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
drivers/infiniband/ulp/rtrs/rtrs.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index 61919ebd92b2..870b21f551a4 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -316,6 +316,7 @@ void rtrs_send_hb_ack(struct rtrs_sess *sess)
err = rtrs_post_rdma_write_imm_empty(usr_con, sess->hb_cqe, imm,
0, NULL);
if (err) {
+ rtrs_err(sess, "send HB ACK failed, errno: %d\n", err);
sess->hb_err_handler(usr_con);
return;
}
@@ -333,6 +334,7 @@ static void hb_work(struct work_struct *work)
usr_con = sess->con[0];
if (sess->hb_missed_cnt > sess->hb_missed_max) {
+ rtrs_err(sess, "HB missed max reached.\n");
sess->hb_err_handler(usr_con);
return;
}
@@ -348,6 +350,7 @@ static void hb_work(struct work_struct *work)
err = rtrs_post_rdma_write_imm_empty(usr_con, sess->hb_cqe, imm,
0, NULL);
if (err) {
+ rtrs_err(sess, "HB send failed, errno: %d\n", err);
sess->hb_err_handler(usr_con);
return;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCHv2 for-rc 2/6] RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con
2021-07-12 6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 1/6] RDMA/rtrs: Add error messages for failed operations Jack Wang
@ 2021-07-12 6:07 ` Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 3/6] RDMA/rtrs: Enable the same selective signal for heartbeat and IO Jack Wang
` (5 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12 6:07 UTC (permalink / raw)
To: linux-rdma
Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
Aleksei Marov, Gioh Kim
We need to track also the wr used for heatbeat. This is
a preparation for that, will be used in later patch.
The io_cnt in rtrs_clt is removed, use wr_cnt instead.
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
drivers/infiniband/ulp/rtrs/rtrs-clt.c | 7 ++++---
drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 -
drivers/infiniband/ulp/rtrs/rtrs-pri.h | 1 +
drivers/infiniband/ulp/rtrs/rtrs-srv.c | 6 +++---
drivers/infiniband/ulp/rtrs/rtrs-srv.h | 1 -
5 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index f2c40e50f25e..5cb00ea08919 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -478,7 +478,7 @@ static int rtrs_post_send_rdma(struct rtrs_clt_con *con,
* From time to time we have to post signalled sends,
* or send queue will fill up and only QP reset can help.
*/
- flags = atomic_inc_return(&con->io_cnt) % sess->queue_depth ?
+ flags = atomic_inc_return(&con->c.wr_cnt) % sess->queue_depth ?
0 : IB_SEND_SIGNALED;
ib_dma_sync_single_for_device(sess->s.dev->ib_dev, req->iu->dma_addr,
@@ -1043,7 +1043,7 @@ static int rtrs_post_rdma_write_sg(struct rtrs_clt_con *con,
* From time to time we have to post signalled sends,
* or send queue will fill up and only QP reset can help.
*/
- flags = atomic_inc_return(&con->io_cnt) % sess->queue_depth ?
+ flags = atomic_inc_return(&con->c.wr_cnt) % sess->queue_depth ?
0 : IB_SEND_SIGNALED;
ib_dma_sync_single_for_device(sess->s.dev->ib_dev, req->iu->dma_addr,
@@ -1601,7 +1601,8 @@ static int create_con(struct rtrs_clt_sess *sess, unsigned int cid)
con->cpu = (cid ? cid - 1 : 0) % nr_cpu_ids;
con->c.cid = cid;
con->c.sess = &sess->s;
- atomic_set(&con->io_cnt, 0);
+ /* Align with srv, init as 1 */
+ atomic_set(&con->c.wr_cnt, 1);
mutex_init(&con->con_mutex);
sess->s.con[cid] = &con->c;
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.h b/drivers/infiniband/ulp/rtrs/rtrs-clt.h
index e276a2dfcf7c..3c3ff094588c 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.h
@@ -74,7 +74,6 @@ struct rtrs_clt_con {
u32 queue_num;
unsigned int cpu;
struct mutex con_mutex;
- atomic_t io_cnt;
int cm_err;
};
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-pri.h b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
index 36f184a3b676..a44a4fb1b515 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-pri.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
@@ -96,6 +96,7 @@ struct rtrs_con {
struct rdma_cm_id *cm_id;
unsigned int cid;
int nr_cqe;
+ atomic_t wr_cnt;
};
struct rtrs_sess {
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 3df290086169..31b846ca0c5e 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -269,7 +269,7 @@ static int rdma_write_sg(struct rtrs_srv_op *id)
* From time to time we have to post signaled sends,
* or send queue will fill up and only QP reset can help.
*/
- flags = (atomic_inc_return(&id->con->wr_cnt) % srv->queue_depth) ?
+ flags = (atomic_inc_return(&id->con->c.wr_cnt) % srv->queue_depth) ?
0 : IB_SEND_SIGNALED;
if (need_inval) {
@@ -396,7 +396,7 @@ static int send_io_resp_imm(struct rtrs_srv_con *con, struct rtrs_srv_op *id,
* From time to time we have to post signalled sends,
* or send queue will fill up and only QP reset can help.
*/
- flags = (atomic_inc_return(&con->wr_cnt) % srv->queue_depth) ?
+ flags = (atomic_inc_return(&con->c.wr_cnt) % srv->queue_depth) ?
0 : IB_SEND_SIGNALED;
imm = rtrs_to_io_rsp_imm(id->msg_id, errno, need_inval);
imm_wr.wr.next = NULL;
@@ -1648,7 +1648,7 @@ static int create_con(struct rtrs_srv_sess *sess,
con->c.cm_id = cm_id;
con->c.sess = &sess->s;
con->c.cid = cid;
- atomic_set(&con->wr_cnt, 1);
+ atomic_set(&con->c.wr_cnt, 1);
wr_limit = sess->s.dev->ib_dev->attrs.max_qp_wr;
if (con->c.cid == 0) {
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.h b/drivers/infiniband/ulp/rtrs/rtrs-srv.h
index f8da2e3f0bda..6785c3b6363e 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.h
@@ -42,7 +42,6 @@ struct rtrs_srv_stats {
struct rtrs_srv_con {
struct rtrs_con c;
- atomic_t wr_cnt;
atomic_t sq_wr_avail;
struct list_head rsp_wr_wait_list;
spinlock_t rsp_wr_wait_lock;
--
2.25.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCHv2 for-rc 3/6] RDMA/rtrs: Enable the same selective signal for heartbeat and IO
2021-07-12 6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 1/6] RDMA/rtrs: Add error messages for failed operations Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 2/6] RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con Jack Wang
@ 2021-07-12 6:07 ` Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 4/6] RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static Jack Wang
` (4 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12 6:07 UTC (permalink / raw)
To: linux-rdma
Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
Aleksei Marov, Gioh Kim
On idle session, because we do not do signal for heartbeat, it will
overflow the send queue after sometime.
To avoid that, we need to enable the signal for heartbeat. To do that,
add a new member signal_interval in rtrs_path, which will set min of
queue_depth and SERVICE_CON_QUEUE_DEPTH, and track it for both heartbeat
and IO, so the sq queue full accounting is correct.
Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat")
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
drivers/infiniband/ulp/rtrs/rtrs-clt.c | 7 +++++--
drivers/infiniband/ulp/rtrs/rtrs-pri.h | 1 +
drivers/infiniband/ulp/rtrs/rtrs-srv.c | 11 ++++++-----
drivers/infiniband/ulp/rtrs/rtrs.c | 7 ++++++-
4 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 5cb00ea08919..f023676e05e4 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -478,7 +478,7 @@ static int rtrs_post_send_rdma(struct rtrs_clt_con *con,
* From time to time we have to post signalled sends,
* or send queue will fill up and only QP reset can help.
*/
- flags = atomic_inc_return(&con->c.wr_cnt) % sess->queue_depth ?
+ flags = atomic_inc_return(&con->c.wr_cnt) % sess->s.signal_interval ?
0 : IB_SEND_SIGNALED;
ib_dma_sync_single_for_device(sess->s.dev->ib_dev, req->iu->dma_addr,
@@ -680,6 +680,7 @@ static void rtrs_clt_rdma_done(struct ib_cq *cq, struct ib_wc *wc)
case IB_WC_RDMA_WRITE:
/*
* post_send() RDMA write completions of IO reqs (read/write)
+ * and hb.
*/
break;
@@ -1043,7 +1044,7 @@ static int rtrs_post_rdma_write_sg(struct rtrs_clt_con *con,
* From time to time we have to post signalled sends,
* or send queue will fill up and only QP reset can help.
*/
- flags = atomic_inc_return(&con->c.wr_cnt) % sess->queue_depth ?
+ flags = atomic_inc_return(&con->c.wr_cnt) % sess->s.signal_interval ?
0 : IB_SEND_SIGNALED;
ib_dma_sync_single_for_device(sess->s.dev->ib_dev, req->iu->dma_addr,
@@ -1849,6 +1850,8 @@ static int rtrs_rdma_conn_established(struct rtrs_clt_con *con,
return -ENOMEM;
}
sess->queue_depth = queue_depth;
+ sess->s.signal_interval = min_not_zero(queue_depth,
+ (unsigned short) SERVICE_CON_QUEUE_DEPTH);
sess->max_hdr_size = le32_to_cpu(msg->max_hdr_size);
sess->max_io_size = le32_to_cpu(msg->max_io_size);
sess->flags = le32_to_cpu(msg->flags);
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-pri.h b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
index a44a4fb1b515..b88a4944cb30 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-pri.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
@@ -109,6 +109,7 @@ struct rtrs_sess {
unsigned int con_num;
unsigned int irq_con_num;
unsigned int recon_cnt;
+ unsigned int signal_interval;
struct rtrs_ib_dev *dev;
int dev_ref;
struct ib_cqe *hb_cqe;
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 31b846ca0c5e..44ed15f38896 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -201,7 +201,6 @@ static int rdma_write_sg(struct rtrs_srv_op *id)
struct rtrs_srv_sess *sess = to_srv_sess(s);
dma_addr_t dma_addr = sess->dma_addr[id->msg_id];
struct rtrs_srv_mr *srv_mr;
- struct rtrs_srv *srv = sess->srv;
struct ib_send_wr inv_wr;
struct ib_rdma_wr imm_wr;
struct ib_rdma_wr *wr = NULL;
@@ -269,7 +268,7 @@ static int rdma_write_sg(struct rtrs_srv_op *id)
* From time to time we have to post signaled sends,
* or send queue will fill up and only QP reset can help.
*/
- flags = (atomic_inc_return(&id->con->c.wr_cnt) % srv->queue_depth) ?
+ flags = (atomic_inc_return(&id->con->c.wr_cnt) % s->signal_interval) ?
0 : IB_SEND_SIGNALED;
if (need_inval) {
@@ -347,7 +346,6 @@ static int send_io_resp_imm(struct rtrs_srv_con *con, struct rtrs_srv_op *id,
struct ib_send_wr inv_wr, *wr = NULL;
struct ib_rdma_wr imm_wr;
struct ib_reg_wr rwr;
- struct rtrs_srv *srv = sess->srv;
struct rtrs_srv_mr *srv_mr;
bool need_inval = false;
enum ib_send_flags flags;
@@ -396,7 +394,7 @@ static int send_io_resp_imm(struct rtrs_srv_con *con, struct rtrs_srv_op *id,
* From time to time we have to post signalled sends,
* or send queue will fill up and only QP reset can help.
*/
- flags = (atomic_inc_return(&con->c.wr_cnt) % srv->queue_depth) ?
+ flags = (atomic_inc_return(&con->c.wr_cnt) % s->signal_interval) ?
0 : IB_SEND_SIGNALED;
imm = rtrs_to_io_rsp_imm(id->msg_id, errno, need_inval);
imm_wr.wr.next = NULL;
@@ -1268,8 +1266,9 @@ static void rtrs_srv_rdma_done(struct ib_cq *cq, struct ib_wc *wc)
case IB_WC_SEND:
/*
* post_send() RDMA write completions of IO reqs (read/write)
+ * and hb.
*/
- atomic_add(srv->queue_depth, &con->sq_wr_avail);
+ atomic_add(s->signal_interval, &con->sq_wr_avail);
if (unlikely(!list_empty_careful(&con->rsp_wr_wait_list)))
rtrs_rdma_process_wr_wait_list(con);
@@ -1659,6 +1658,8 @@ static int create_con(struct rtrs_srv_sess *sess,
max_send_wr = min_t(int, wr_limit,
SERVICE_CON_QUEUE_DEPTH * 2 + 2);
max_recv_wr = max_send_wr;
+ s->signal_interval = min_not_zero(srv->queue_depth,
+ (size_t)SERVICE_CON_QUEUE_DEPTH);
} else {
/* when always_invlaidate enalbed, we need linv+rinv+mr+imm */
if (always_invalidate)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index 870b21f551a4..7f2dfb9d11fc 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -187,10 +187,15 @@ int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con, struct ib_cqe *cqe,
struct ib_send_wr *head)
{
struct ib_rdma_wr wr;
+ struct rtrs_sess *sess = con->sess;
+ enum ib_send_flags sflags;
+
+ sflags = (atomic_inc_return(&con->wr_cnt) % sess->signal_interval) ?
+ 0 : IB_SEND_SIGNALED;
wr = (struct ib_rdma_wr) {
.wr.wr_cqe = cqe,
- .wr.send_flags = flags,
+ .wr.send_flags = sflags,
.wr.opcode = IB_WR_RDMA_WRITE_WITH_IMM,
.wr.ex.imm_data = cpu_to_be32(imm_data),
};
--
2.25.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCHv2 for-rc 4/6] RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static
2021-07-12 6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
` (2 preceding siblings ...)
2021-07-12 6:07 ` [PATCHv2 for-rc 3/6] RDMA/rtrs: Enable the same selective signal for heartbeat and IO Jack Wang
@ 2021-07-12 6:07 ` Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 5/6] RDMA/rtrs: Remove unused flags parameter Jack Wang
` (3 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12 6:07 UTC (permalink / raw)
To: linux-rdma
Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
Aleksei Marov, Gioh Kim
It's only used in rtrs.c, so no need to export.
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
drivers/infiniband/ulp/rtrs/rtrs-pri.h | 3 ---
drivers/infiniband/ulp/rtrs/rtrs.c | 9 +++++----
2 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-pri.h b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
index b88a4944cb30..76581ebaed1d 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-pri.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
@@ -311,9 +311,6 @@ int rtrs_iu_post_rdma_write_imm(struct rtrs_con *con, struct rtrs_iu *iu,
struct ib_send_wr *tail);
int rtrs_post_recv_empty(struct rtrs_con *con, struct ib_cqe *cqe);
-int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con, struct ib_cqe *cqe,
- u32 imm_data, enum ib_send_flags flags,
- struct ib_send_wr *head);
int rtrs_cq_qp_create(struct rtrs_sess *sess, struct rtrs_con *con,
u32 max_send_sge, int cq_vector, int nr_cqe,
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index 7f2dfb9d11fc..528d6a57c9b6 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -182,9 +182,11 @@ int rtrs_iu_post_rdma_write_imm(struct rtrs_con *con, struct rtrs_iu *iu,
}
EXPORT_SYMBOL_GPL(rtrs_iu_post_rdma_write_imm);
-int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con, struct ib_cqe *cqe,
- u32 imm_data, enum ib_send_flags flags,
- struct ib_send_wr *head)
+static int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con,
+ struct ib_cqe *cqe,
+ u32 imm_data,
+ enum ib_send_flags flags,
+ struct ib_send_wr *head)
{
struct ib_rdma_wr wr;
struct rtrs_sess *sess = con->sess;
@@ -202,7 +204,6 @@ int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con, struct ib_cqe *cqe,
return rtrs_post_send(con->qp, head, &wr.wr, NULL);
}
-EXPORT_SYMBOL_GPL(rtrs_post_rdma_write_imm_empty);
static void qp_event_handler(struct ib_event *ev, void *ctx)
{
--
2.25.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCHv2 for-rc 5/6] RDMA/rtrs: Remove unused flags parameter
2021-07-12 6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
` (3 preceding siblings ...)
2021-07-12 6:07 ` [PATCHv2 for-rc 4/6] RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static Jack Wang
@ 2021-07-12 6:07 ` Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 6/6] RDMA/rtrs: Move sq_wr_avail to rtrs_con Jack Wang
` (2 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12 6:07 UTC (permalink / raw)
To: linux-rdma
Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
Aleksei Marov, Gioh Kim
flags is not used, so remove it from rtrs_post_rdma_write_imm_empty.
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
drivers/infiniband/ulp/rtrs/rtrs.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index 528d6a57c9b6..b56dc5b82db0 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -185,7 +185,6 @@ EXPORT_SYMBOL_GPL(rtrs_iu_post_rdma_write_imm);
static int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con,
struct ib_cqe *cqe,
u32 imm_data,
- enum ib_send_flags flags,
struct ib_send_wr *head)
{
struct ib_rdma_wr wr;
@@ -320,7 +319,7 @@ void rtrs_send_hb_ack(struct rtrs_sess *sess)
imm = rtrs_to_imm(RTRS_HB_ACK_IMM, 0);
err = rtrs_post_rdma_write_imm_empty(usr_con, sess->hb_cqe, imm,
- 0, NULL);
+ NULL);
if (err) {
rtrs_err(sess, "send HB ACK failed, errno: %d\n", err);
sess->hb_err_handler(usr_con);
@@ -354,7 +353,7 @@ static void hb_work(struct work_struct *work)
imm = rtrs_to_imm(RTRS_HB_MSG_IMM, 0);
err = rtrs_post_rdma_write_imm_empty(usr_con, sess->hb_cqe, imm,
- 0, NULL);
+ NULL);
if (err) {
rtrs_err(sess, "HB send failed, errno: %d\n", err);
sess->hb_err_handler(usr_con);
--
2.25.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCHv2 for-rc 6/6] RDMA/rtrs: Move sq_wr_avail to rtrs_con
2021-07-12 6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
` (4 preceding siblings ...)
2021-07-12 6:07 ` [PATCHv2 for-rc 5/6] RDMA/rtrs: Remove unused flags parameter Jack Wang
@ 2021-07-12 6:07 ` Jack Wang
2021-07-12 17:35 ` [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jason Gunthorpe
2021-07-15 17:41 ` Jason Gunthorpe
7 siblings, 0 replies; 10+ messages in thread
From: Jack Wang @ 2021-07-12 6:07 UTC (permalink / raw)
To: linux-rdma
Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
Aleksei Marov, Gioh Kim
In order to account HB for sq_wr_avail properly,
move sq_wr_avail from rtrs_srv_con to rtrs_con.
Although rtrs-clt do not care sq_wr_avail, but still init it
to max_send_wr.
Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat")
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
---
drivers/infiniband/ulp/rtrs/rtrs-clt.c | 1 +
drivers/infiniband/ulp/rtrs/rtrs-pri.h | 1 +
drivers/infiniband/ulp/rtrs/rtrs-srv.c | 8 ++++----
drivers/infiniband/ulp/rtrs/rtrs-srv.h | 1 -
drivers/infiniband/ulp/rtrs/rtrs.c | 1 +
5 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index f023676e05e4..ece3205531b8 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -1680,6 +1680,7 @@ static int create_con_cq_qp(struct rtrs_clt_con *con)
sess->queue_depth * 3 + 1);
max_send_sge = 2;
}
+ atomic_set(&con->c.sq_wr_avail, max_send_wr);
cq_num = max_send_wr + max_recv_wr;
/* alloc iu to recv new rkey reply when server reports flags set */
if (sess->flags & RTRS_MSG_NEW_RKEY_F || con->c.cid == 0) {
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-pri.h b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
index 76581ebaed1d..d12ddfa50747 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-pri.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
@@ -97,6 +97,7 @@ struct rtrs_con {
unsigned int cid;
int nr_cqe;
atomic_t wr_cnt;
+ atomic_t sq_wr_avail;
};
struct rtrs_sess {
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 44ed15f38896..cd9a4ccf4c28 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -507,11 +507,11 @@ bool rtrs_srv_resp_rdma(struct rtrs_srv_op *id, int status)
ib_update_fast_reg_key(mr->mr, ib_inc_rkey(mr->mr->rkey));
}
if (unlikely(atomic_sub_return(1,
- &con->sq_wr_avail) < 0)) {
+ &con->c.sq_wr_avail) < 0)) {
rtrs_err(s, "IB send queue full: sess=%s cid=%d\n",
kobject_name(&sess->kobj),
con->c.cid);
- atomic_add(1, &con->sq_wr_avail);
+ atomic_add(1, &con->c.sq_wr_avail);
spin_lock(&con->rsp_wr_wait_lock);
list_add_tail(&id->wait_list, &con->rsp_wr_wait_list);
spin_unlock(&con->rsp_wr_wait_lock);
@@ -1268,7 +1268,7 @@ static void rtrs_srv_rdma_done(struct ib_cq *cq, struct ib_wc *wc)
* post_send() RDMA write completions of IO reqs (read/write)
* and hb.
*/
- atomic_add(s->signal_interval, &con->sq_wr_avail);
+ atomic_add(s->signal_interval, &con->c.sq_wr_avail);
if (unlikely(!list_empty_careful(&con->rsp_wr_wait_list)))
rtrs_rdma_process_wr_wait_list(con);
@@ -1680,7 +1680,7 @@ static int create_con(struct rtrs_srv_sess *sess,
*/
}
cq_num = max_send_wr + max_recv_wr;
- atomic_set(&con->sq_wr_avail, max_send_wr);
+ atomic_set(&con->c.sq_wr_avail, max_send_wr);
cq_vector = rtrs_srv_get_next_cq_vector(sess);
/* TODO: SOFTIRQ can be faster, but be careful with softirq context */
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.h b/drivers/infiniband/ulp/rtrs/rtrs-srv.h
index 6785c3b6363e..e81774f5acd3 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.h
@@ -42,7 +42,6 @@ struct rtrs_srv_stats {
struct rtrs_srv_con {
struct rtrs_con c;
- atomic_t sq_wr_avail;
struct list_head rsp_wr_wait_list;
spinlock_t rsp_wr_wait_lock;
};
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index b56dc5b82db0..ca542e477d38 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -191,6 +191,7 @@ static int rtrs_post_rdma_write_imm_empty(struct rtrs_con *con,
struct rtrs_sess *sess = con->sess;
enum ib_send_flags sflags;
+ atomic_dec_if_positive(&con->sq_wr_avail);
sflags = (atomic_inc_return(&con->wr_cnt) % sess->signal_interval) ?
0 : IB_SEND_SIGNALED;
--
2.25.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat
2021-07-12 6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
` (5 preceding siblings ...)
2021-07-12 6:07 ` [PATCHv2 for-rc 6/6] RDMA/rtrs: Move sq_wr_avail to rtrs_con Jack Wang
@ 2021-07-12 17:35 ` Jason Gunthorpe
2021-07-12 18:58 ` Jinpu Wang
2021-07-15 17:41 ` Jason Gunthorpe
7 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2021-07-12 17:35 UTC (permalink / raw)
To: Jack Wang; +Cc: linux-rdma, bvanassche, leon, dledford, haris.iqbal
On Mon, Jul 12, 2021 at 08:07:44AM +0200, Jack Wang wrote:
> Hi Jason, hi Doug,
>
> Please consider to include following changes to upstream.
>
> This patchset fix a regression since b38041d50add ("RDMA/rtrs: Do not signal for heatbeat").
>
> In commit b38041d50add, the signal flag is droped to fix the send queue full
> logic, but introduced a worse bug the send queue overflow on both clt and srv
> by heartbeat, sorry.
>
> The patchset is orgnized as:
> - patch1 debug patch.
> - patch2 preparation.
> - patch3 signal both IO and heartbeat.
> - patch4 cleanup.
> - patch5 cleanup
> - patch6 move sq_wr_avail to account send queue full correctly.
>
> The patches are created base v5.14-rc1.
>
> Since v1:
> * rebased to latest v5.14-rc1, target rc instread of for-next.
>
> v1: https://lore.kernel.org/linux-rdma/20210629065321.12600-1-jinpu.wang@ionos.com/T/#t
>
> Jack Wang (6):
> RDMA/rtrs: Add error messages for failed operations.
> RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con
> RDMA/rtrs: Enable the same selective signal for heartbeat and IO
> RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static
> RDMA/rtrs: Remove unused flags parameter
> RDMA/rtrs: Move sq_wr_avail to rtrs_con
This is not really structured well enough for a -rc patch. There
should be no unncessary changes and each patch should try to be self
contained. Avoid "cleanup". Carefully describe what user visible
defect each patch is fixing.
If you really want it to be in -rc then it needs reorganizing,
otherwise I can put it in -next
Jason
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat
2021-07-12 17:35 ` [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jason Gunthorpe
@ 2021-07-12 18:58 ` Jinpu Wang
0 siblings, 0 replies; 10+ messages in thread
From: Jinpu Wang @ 2021-07-12 18:58 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: RDMA mailing list, Bart Van Assche, Leon Romanovsky,
Doug Ledford, Haris Iqbal
On Mon, Jul 12, 2021 at 7:35 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Mon, Jul 12, 2021 at 08:07:44AM +0200, Jack Wang wrote:
> > Hi Jason, hi Doug,
> >
> > Please consider to include following changes to upstream.
> >
> > This patchset fix a regression since b38041d50add ("RDMA/rtrs: Do not signal for heatbeat").
> >
> > In commit b38041d50add, the signal flag is droped to fix the send queue full
> > logic, but introduced a worse bug the send queue overflow on both clt and srv
> > by heartbeat, sorry.
> >
> > The patchset is orgnized as:
> > - patch1 debug patch.
> > - patch2 preparation.
> > - patch3 signal both IO and heartbeat.
> > - patch4 cleanup.
> > - patch5 cleanup
> > - patch6 move sq_wr_avail to account send queue full correctly.
> >
> > The patches are created base v5.14-rc1.
> >
> > Since v1:
> > * rebased to latest v5.14-rc1, target rc instread of for-next.
> >
> > v1: https://lore.kernel.org/linux-rdma/20210629065321.12600-1-jinpu.wang@ionos.com/T/#t
> >
> > Jack Wang (6):
> > RDMA/rtrs: Add error messages for failed operations.
> > RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con
> > RDMA/rtrs: Enable the same selective signal for heartbeat and IO
> > RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static
> > RDMA/rtrs: Remove unused flags parameter
> > RDMA/rtrs: Move sq_wr_avail to rtrs_con
>
> This is not really structured well enough for a -rc patch. There
> should be no unncessary changes and each patch should try to be self
> contained. Avoid "cleanup". Carefully describe what user visible
> defect each patch is fixing.
>
> If you really want it to be in -rc then it needs reorganizing,
> otherwise I can put it in -next
>
> Jason
Hi Jason,
Thanks for your suggestion, I think it would be ok to put them in for-next.
Regards!
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat
2021-07-12 6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
` (6 preceding siblings ...)
2021-07-12 17:35 ` [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jason Gunthorpe
@ 2021-07-15 17:41 ` Jason Gunthorpe
7 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2021-07-15 17:41 UTC (permalink / raw)
To: Jack Wang; +Cc: linux-rdma, bvanassche, leon, dledford, haris.iqbal
On Mon, Jul 12, 2021 at 08:07:44AM +0200, Jack Wang wrote:
> Hi Jason, hi Doug,
>
> Please consider to include following changes to upstream.
>
> This patchset fix a regression since b38041d50add ("RDMA/rtrs: Do not signal for heatbeat").
>
> In commit b38041d50add, the signal flag is droped to fix the send queue full
> logic, but introduced a worse bug the send queue overflow on both clt and srv
> by heartbeat, sorry.
>
> The patchset is orgnized as:
> - patch1 debug patch.
> - patch2 preparation.
> - patch3 signal both IO and heartbeat.
> - patch4 cleanup.
> - patch5 cleanup
> - patch6 move sq_wr_avail to account send queue full correctly.
>
> The patches are created base v5.14-rc1.
>
> Since v1:
> * rebased to latest v5.14-rc1, target rc instread of for-next.
>
> v1: https://lore.kernel.org/linux-rdma/20210629065321.12600-1-jinpu.wang@ionos.com/T/#t
>
> Jack Wang (6):
> RDMA/rtrs: Add error messages for failed operations.
> RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con
> RDMA/rtrs: Enable the same selective signal for heartbeat and IO
> RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static
> RDMA/rtrs: Remove unused flags parameter
> RDMA/rtrs: Move sq_wr_avail to rtrs_con
Applied to for-next, thanks
Jason
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-07-15 17:41 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-12 6:07 [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 1/6] RDMA/rtrs: Add error messages for failed operations Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 2/6] RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 3/6] RDMA/rtrs: Enable the same selective signal for heartbeat and IO Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 4/6] RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 5/6] RDMA/rtrs: Remove unused flags parameter Jack Wang
2021-07-12 6:07 ` [PATCHv2 for-rc 6/6] RDMA/rtrs: Move sq_wr_avail to rtrs_con Jack Wang
2021-07-12 17:35 ` [PATCHv2 for-rc 0/6] Bugfixes for send queue overflow by heartbeat Jason Gunthorpe
2021-07-12 18:58 ` Jinpu Wang
2021-07-15 17:41 ` Jason Gunthorpe
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.