All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-next 00/22] Misc update for rtrs
@ 2021-03-25 15:32 Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 01/22] MAINTAINERS: Change maintainer for rtrs module Gioh Kim
                   ` (22 more replies)
  0 siblings, 23 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang, Gioh Kim

Hi Jason, hi Doug,

Please consider to include following changes to the next merge window.

It contains a few bugfix and cleanup:
- Change maintainer
- Change domain address of maintainers' email: from cloud.ionos.com to ionos.com
- Add some fault-injection points and document update
- New policy for path finding: min-latency and document update
- Code refactoring to remove unused code and better error message 

Danil Kipnis (1):
  MAINTAINERS: Change maintainer for rtrs module

Gioh Kim (12):
  RDMA/rtrs: Enable the fault-injection
  RDMA/rtrs-clt: Inject a fault at request processing
  RDMA/rtrs-srv: Inject a fault at heart-beat sending
  docs: fault-injection: Add fault-injection manual of RTRS
  RDMA/rtrs: New function converting rtrs_addr to string
  RDMA/rtrs-clt: Print more info when an error happens
  RDMA/rtrs-srv: More debugging info when fail to send reply
  RDMA/rtrs-srv: Report temporary sessname for error message
  RDMA/rtrs-clt: Simplify error message
  RDMA/rtrs-clt: Add a minimum latency multipath policy
  RDMA/rtrs-clt: new sysfs attribute to print the latency of each path
  Documentation/ABI/rtrs-clt: Add descriptions for min-latency policy

Guoqing Jiang (5):
  RDMA/rtrs-clt: Break if one sess is connected in rtrs_clt_is_connected
  RDMA/rtrs-clt: Remove redundant code from rtrs_clt_read_req
  RDMA/rtrs: Kill the put label in
    rtrs_srv_create_once_sysfs_root_folders
  RDMA/rtrs: Remove sessname and sess_kobj from rtrs_attrs
  RDMA/rtrs: Cleanup the code in rtrs_srv_rdma_cm_handler

Jack Wang (2):
  RDMA/rtrs: cleanup unused variable
  RDMA/rtrs-clt: Cap max_io_size

Md Haris Iqbal (2):
  RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt
    session files
  RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its
    stats

 .../ABI/testing/sysfs-class-rtrs-client       |  12 ++
 .../fault-injection/rtrs-fault-injection.rst  |  83 ++++++++++++++
 MAINTAINERS                                   |   4 +-
 drivers/infiniband/ulp/rtrs/Makefile          |   2 +
 drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c  |  91 +++++++++++++--
 drivers/infiniband/ulp/rtrs/rtrs-clt.c        | 108 +++++++++++++++---
 drivers/infiniband/ulp/rtrs/rtrs-clt.h        |  14 +++
 drivers/infiniband/ulp/rtrs/rtrs-fault.c      |  52 +++++++++
 drivers/infiniband/ulp/rtrs/rtrs-fault.h      |  28 +++++
 drivers/infiniband/ulp/rtrs/rtrs-pri.h        |   2 +
 drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c  |  64 +++++++++--
 drivers/infiniband/ulp/rtrs/rtrs-srv.c        |  31 +++--
 drivers/infiniband/ulp/rtrs/rtrs-srv.h        |  13 +++
 drivers/infiniband/ulp/rtrs/rtrs.c            |  27 +++++
 drivers/infiniband/ulp/rtrs/rtrs.h            |   3 +-
 15 files changed, 486 insertions(+), 48 deletions(-)
 create mode 100644 Documentation/fault-injection/rtrs-fault-injection.rst
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-fault.c
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-fault.h

-- 
2.25.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH for-next 01/22] MAINTAINERS: Change maintainer for rtrs module
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 02/22] RDMA/rtrs: Enable the fault-injection Gioh Kim
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Danil Kipnis, Md Haris Iqbal, Jack Wang

From: Danil Kipnis <danil.kipnis@cloud.ionos.com>

Danil will step down, Haris will take over.
Also update to email address to ionos.com, cloud.ionos.com
will still work for sometime.

Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Acked-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9e876927c60d..716221c9d689 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15541,8 +15541,8 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git rtl8xxxu-deve
 F:	drivers/net/wireless/realtek/rtl8xxxu/
 
 RTRS TRANSPORT DRIVERS
-M:	Danil Kipnis <danil.kipnis@cloud.ionos.com>
-M:	Jack Wang <jinpu.wang@cloud.ionos.com>
+M:	Md. Haris Iqbal <haris.iqbal@ionos.com>
+M:	Jack Wang <jinpu.wang@ionos.com>
 L:	linux-rdma@vger.kernel.org
 S:	Maintained
 F:	drivers/infiniband/ulp/rtrs/
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 02/22] RDMA/rtrs: Enable the fault-injection
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 01/22] MAINTAINERS: Change maintainer for rtrs module Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 03/22] RDMA/rtrs-clt: Inject a fault at request processing Gioh Kim
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Gioh Kim, Jack Wang

From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

This patch introduces some functions to enable the fault-injection
for RTRS.
* rtrs_fault_inject_init/final initialize the fault-injection
and create a debugfs directory.
* rtrs_fault_inject_add creates a debugfs entry to enable
the fault-injection point.

Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
---
 drivers/infiniband/ulp/rtrs/Makefile     |  2 +
 drivers/infiniband/ulp/rtrs/rtrs-fault.c | 52 ++++++++++++++++++++++++
 drivers/infiniband/ulp/rtrs/rtrs-fault.h | 28 +++++++++++++
 3 files changed, 82 insertions(+)
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-fault.c
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-fault.h

diff --git a/drivers/infiniband/ulp/rtrs/Makefile b/drivers/infiniband/ulp/rtrs/Makefile
index 3898509be270..3490f66bb7f2 100644
--- a/drivers/infiniband/ulp/rtrs/Makefile
+++ b/drivers/infiniband/ulp/rtrs/Makefile
@@ -3,10 +3,12 @@
 rtrs-client-y := rtrs-clt.o \
 		  rtrs-clt-stats.o \
 		  rtrs-clt-sysfs.o
+rtrs-client-$(CONFIG_FAULT_INJECTION_DEBUG_FS) += rtrs-fault.o
 
 rtrs-server-y := rtrs-srv.o \
 		  rtrs-srv-stats.o \
 		  rtrs-srv-sysfs.o
+rtrs-server-$(CONFIG_FAULT_INJECTION_DEBUG_FS) += rtrs-fault.o
 
 rtrs-core-y := rtrs.o
 
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-fault.c b/drivers/infiniband/ulp/rtrs/rtrs-fault.c
new file mode 100644
index 000000000000..af475c814c29
--- /dev/null
+++ b/drivers/infiniband/ulp/rtrs/rtrs-fault.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * RDMA Transport Layer
+ *
+ * Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
+ * Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
+ */
+
+#include "rtrs-fault.h"
+
+static DECLARE_FAULT_ATTR(fail_default_attr);
+
+void rtrs_fault_inject_init(struct rtrs_fault_inject *fj,
+			    const char *dir_name,
+			    u32 err_status)
+{
+	struct dentry *dir, *parent;
+	struct fault_attr *attr = &fj->attr;
+
+	/* create debugfs directory and attribute */
+	parent = debugfs_create_dir(dir_name, NULL);
+	if (!parent) {
+		pr_warn("%s: failed to create debugfs directory\n", dir_name);
+		return;
+	}
+
+	*attr = fail_default_attr;
+	dir = fault_create_debugfs_attr("fault_inject", parent, attr);
+	if (IS_ERR(dir)) {
+		pr_warn("%s: failed to create debugfs attr\n", dir_name);
+		debugfs_remove_recursive(parent);
+		return;
+	}
+	fj->parent = parent;
+	fj->dir = dir;
+
+	/* create debugfs for status code */
+	fj->status = err_status;
+	debugfs_create_u32("status", 0600, dir,	&fj->status);
+}
+
+void rtrs_fault_inject_final(struct rtrs_fault_inject *fj)
+{
+	/* remove debugfs directories */
+	debugfs_remove_recursive(fj->parent);
+}
+
+void rtrs_fault_inject_add(struct dentry *dir, const char *fname, bool *value)
+{
+	debugfs_create_bool(fname, 0600, dir, value);
+}
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-fault.h b/drivers/infiniband/ulp/rtrs/rtrs-fault.h
new file mode 100644
index 000000000000..8c1acffb2b16
--- /dev/null
+++ b/drivers/infiniband/ulp/rtrs/rtrs-fault.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * RDMA Transport Layer
+ *
+ * Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
+ * Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
+ */
+
+#ifndef RTRS_FAULT_H
+#define RTRS_FAULT_H
+
+#include <linux/fault-inject.h>
+
+struct rtrs_fault_inject {
+#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
+	struct fault_attr attr;
+	struct dentry *parent;
+	struct dentry *dir;
+	u32 status;
+#endif
+};
+
+void rtrs_fault_inject_init(struct rtrs_fault_inject *fj,
+			    const char *dev_name, u32 err_status);
+void rtrs_fault_inject_add(struct dentry *dir, const char *fname, bool *value);
+void rtrs_fault_inject_final(struct rtrs_fault_inject *fj);
+#endif /* RTRS_FAULT_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 03/22] RDMA/rtrs-clt: Inject a fault at request processing
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 01/22] MAINTAINERS: Change maintainer for rtrs module Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 02/22] RDMA/rtrs: Enable the fault-injection Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 04/22] RDMA/rtrs-srv: Inject a fault at heart-beat sending Gioh Kim
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Gioh Kim, Jack Wang

From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

If the fault-injection is enabled, it does not sent a request to the
server and returns error.

Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c | 44 ++++++++++++++++++++
 drivers/infiniband/ulp/rtrs/rtrs-clt.c       |  7 ++++
 drivers/infiniband/ulp/rtrs/rtrs-clt.h       | 13 ++++++
 3 files changed, 64 insertions(+)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c b/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
index b6a0abf40589..d168bd08037a 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
@@ -480,3 +480,47 @@ void rtrs_clt_destroy_sysfs_root(struct rtrs_clt *clt)
 		kobject_put(clt->kobj_paths);
 	}
 }
+
+#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
+void rtrs_clt_fault_inject_init(struct rtrs_clt_fault_inject *fault_inject,
+				struct rtrs_clt_sess *sess)
+{
+	char str[NAME_MAX];
+	int cnt;
+
+	cnt = sockaddr_to_str((struct sockaddr *)&sess->s.src_addr,
+			      str, sizeof(str));
+	cnt += scnprintf(str + cnt, sizeof(str) - cnt, "@");
+	sockaddr_to_str((struct sockaddr *)&sess->s.dst_addr,
+			str + cnt, sizeof(str) - cnt);
+
+	rtrs_fault_inject_init(&fault_inject->fj, str, -EBUSY);
+	/* injection points */
+	rtrs_fault_inject_add(fault_inject->fj.dir,
+			      "fail-request", &fault_inject->fail_request);
+}
+
+void rtrs_clt_fault_inject_final(struct rtrs_clt_fault_inject *fault_inject)
+{
+	rtrs_fault_inject_final(&fault_inject->fj);
+}
+
+int rtrs_clt_should_fail_request(struct rtrs_clt_fault_inject *fault_inject)
+{
+	if (fault_inject->fail_request && should_fail(&fault_inject->fj.attr, 1))
+		return fault_inject->fj.status;
+	return 0;
+}
+#else
+void rtrs_clt_fault_inject_init(struct rtrs_clt_fault_inject *fault_inject,
+				struct rtrs_clt_sess *sess)
+{
+}
+void rtrs_clt_fault_inject_final(struct rtrs_clt_fault_inject *fault_inject)
+{
+}
+int rtrs_clt_should_fail_request(struct rtrs_clt_fault_inject *fault_inject)
+{
+	return 0;
+}
+#endif
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index f95955fc2992..4f7690137e42 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -1472,6 +1472,7 @@ static struct rtrs_clt_sess *alloc_sess(struct rtrs_clt *clt,
 
 void free_sess(struct rtrs_clt_sess *sess)
 {
+	rtrs_clt_fault_inject_final(&sess->fault_inject);
 	free_percpu(sess->mp_skip_entry);
 	mutex_destroy(&sess->init_mutex);
 	kfree(sess->s.con);
@@ -2689,6 +2690,8 @@ struct rtrs_clt *rtrs_clt_open(struct rtrs_clt_ops *ops,
 			free_sess(sess);
 			goto close_all_sess;
 		}
+
+		rtrs_clt_fault_inject_init(&sess->fault_inject, sess);
 	}
 	err = alloc_permits(clt);
 	if (err)
@@ -2861,6 +2864,10 @@ int rtrs_clt_request(int dir, struct rtrs_clt_req_ops *ops,
 		if (unlikely(READ_ONCE(sess->state) != RTRS_CLT_CONNECTED))
 			continue;
 
+		err = rtrs_clt_should_fail_request(&sess->fault_inject);
+		if (unlikely(err))
+			continue;
+
 		if (unlikely(usr_len + hdr_len > sess->max_hdr_size)) {
 			rtrs_wrn_rl(sess->clt,
 				     "%s request failed, user message size is %zu and header length %zu, but max size is %u\n",
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.h b/drivers/infiniband/ulp/rtrs/rtrs-clt.h
index 692bc83e1f09..59ea2ec44fe5 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.h
@@ -12,6 +12,7 @@
 
 #include <linux/device.h>
 #include "rtrs-pri.h"
+#include "rtrs-fault.h"
 
 /**
  * enum rtrs_clt_state - Client states.
@@ -122,6 +123,13 @@ struct rtrs_rbuf {
 	u32 rkey;
 };
 
+struct rtrs_clt_fault_inject {
+#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
+	struct rtrs_fault_inject fj;
+	bool fail_request;
+#endif
+};
+
 struct rtrs_clt_sess {
 	struct rtrs_sess	s;
 	struct rtrs_clt	*clt;
@@ -150,6 +158,7 @@ struct rtrs_clt_sess {
 	char                    hca_name[IB_DEVICE_NAME_MAX];
 	struct list_head __percpu
 				*mp_skip_entry;
+	struct rtrs_clt_fault_inject	fault_inject;
 };
 
 struct rtrs_clt {
@@ -250,4 +259,8 @@ int rtrs_clt_create_sess_files(struct rtrs_clt_sess *sess);
 void rtrs_clt_destroy_sess_files(struct rtrs_clt_sess *sess,
 				  const struct attribute *sysfs_self);
 
+void rtrs_clt_fault_inject_init(struct rtrs_clt_fault_inject *fault_inject,
+				struct rtrs_clt_sess *sess);
+void rtrs_clt_fault_inject_final(struct rtrs_clt_fault_inject *fault_inject);
+int rtrs_clt_should_fail_request(struct rtrs_clt_fault_inject *fault_inject);
 #endif /* RTRS_CLT_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 04/22] RDMA/rtrs-srv: Inject a fault at heart-beat sending
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (2 preceding siblings ...)
  2021-03-25 15:32 ` [PATCH for-next 03/22] RDMA/rtrs-clt: Inject a fault at request processing Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 05/22] docs: fault-injection: Add fault-injection manual of RTRS Gioh Kim
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Gioh Kim, Jack Wang

From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

If the fault-injection is enabled, it does not send a heart-beat
and generates the error on the client side.

Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c | 44 ++++++++++++++++++++
 drivers/infiniband/ulp/rtrs/rtrs-srv.c       |  5 +++
 drivers/infiniband/ulp/rtrs/rtrs-srv.h       | 13 ++++++
 3 files changed, 62 insertions(+)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c b/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c
index 126a96e75c62..7ac3fe884409 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c
@@ -313,3 +313,47 @@ void rtrs_srv_destroy_sess_files(struct rtrs_srv_sess *sess)
 		rtrs_srv_destroy_once_sysfs_root_folders(sess);
 	}
 }
+
+#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
+void rtrs_srv_fault_inject_init(struct rtrs_srv_fault_inject *fault_inject,
+				struct rtrs_srv_sess *sess)
+{
+	char str[NAME_MAX];
+	int cnt;
+
+	cnt = sockaddr_to_str((struct sockaddr *)&sess->s.src_addr,
+			      str, sizeof(str));
+	cnt += scnprintf(str + cnt, sizeof(str) - cnt, "@");
+	sockaddr_to_str((struct sockaddr *)&sess->s.dst_addr,
+			str + cnt, sizeof(str) - cnt);
+
+	rtrs_fault_inject_init(&fault_inject->fj, str, -EBUSY);
+	/* injection points */
+	rtrs_fault_inject_add(fault_inject->fj.dir,
+			      "fail-hb-ack", &fault_inject->fail_hb_ack);
+}
+
+void rtrs_srv_fault_inject_final(struct rtrs_srv_fault_inject *fault_inject)
+{
+	rtrs_fault_inject_final(&fault_inject->fj);
+}
+
+int rtrs_should_fail_hb_ack(struct rtrs_srv_fault_inject *fault_inject)
+{
+	if (fault_inject->fail_hb_ack && should_fail(&fault_inject->fj.attr, 1))
+		return fault_inject->fj.status;
+	return 0;
+}
+#else
+void rtrs_srv_fault_inject_init(struct rtrs_srv_fault_inject *fault_inject,
+				struct rtrs_srv_sess *sess_name)
+{
+}
+void rtrs_srv_fault_inject_final(struct rtrs_srv_fault_inject *fault_inject)
+{
+}
+int rtrs_should_fail_hb_ack(struct rtrs_srv_fault_inject *fault_inject)
+{
+	return 0;
+}
+#endif
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 37ba121564a2..e9998e40f2f7 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -1232,6 +1232,8 @@ static void rtrs_srv_rdma_done(struct ib_cq *cq, struct ib_wc *wc)
 			}
 		} else if (imm_type == RTRS_HB_MSG_IMM) {
 			WARN_ON(con->c.cid);
+			if (unlikely(rtrs_should_fail_hb_ack(&sess->fault_inject)))
+				break;
 			rtrs_send_hb_ack(&sess->s);
 		} else if (imm_type == RTRS_HB_ACK_IMM) {
 			WARN_ON(con->c.cid);
@@ -1489,6 +1491,7 @@ static void rtrs_srv_close_work(struct work_struct *work)
 
 	sess = container_of(work, typeof(*sess), close_work);
 
+	rtrs_srv_fault_inject_final(&sess->fault_inject);
 	rtrs_srv_destroy_sess_files(sess);
 	rtrs_srv_stop_hb(sess);
 
@@ -1739,6 +1742,8 @@ static struct rtrs_srv_sess *__alloc_sess(struct rtrs_srv *srv,
 
 	__add_path_to_srv(srv, sess);
 
+	rtrs_srv_fault_inject_init(&sess->fault_inject, sess);
+
 	return sess;
 
 err_unmap_bufs:
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.h b/drivers/infiniband/ulp/rtrs/rtrs-srv.h
index 9543ae19996c..001889e148ac 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.h
@@ -13,6 +13,7 @@
 #include <linux/device.h>
 #include <linux/refcount.h>
 #include "rtrs-pri.h"
+#include "rtrs-fault.h"
 
 /*
  * enum rtrs_srv_state - Server states.
@@ -73,6 +74,13 @@ struct rtrs_srv_mr {
 	struct rtrs_iu	*iu;		/* send buffer for new rkey msg */
 };
 
+struct rtrs_srv_fault_inject {
+#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
+	struct rtrs_fault_inject fj;
+	bool fail_hb_ack;
+#endif
+};
+
 struct rtrs_srv_sess {
 	struct rtrs_sess	s;
 	struct rtrs_srv	*srv;
@@ -90,6 +98,7 @@ struct rtrs_srv_sess {
 	unsigned int		mem_bits;
 	struct kobject		kobj;
 	struct rtrs_srv_stats	*stats;
+	struct rtrs_srv_fault_inject	fault_inject;
 };
 
 struct rtrs_srv {
@@ -152,4 +161,8 @@ ssize_t rtrs_srv_reset_all_help(struct rtrs_srv_stats *stats,
 int rtrs_srv_create_sess_files(struct rtrs_srv_sess *sess);
 void rtrs_srv_destroy_sess_files(struct rtrs_srv_sess *sess);
 
+void rtrs_srv_fault_inject_init(struct rtrs_srv_fault_inject *fault_inject,
+				struct rtrs_srv_sess *sess);
+void rtrs_srv_fault_inject_final(struct rtrs_srv_fault_inject *fault_inject);
+int rtrs_should_fail_hb_ack(struct rtrs_srv_fault_inject *fault_inject);
 #endif /* RTRS_SRV_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 05/22] docs: fault-injection: Add fault-injection manual of RTRS
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (3 preceding siblings ...)
  2021-03-25 15:32 ` [PATCH for-next 04/22] RDMA/rtrs-srv: Inject a fault at heart-beat sending Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-04-01 18:37   ` Jason Gunthorpe
  2021-03-25 15:32 ` [PATCH for-next 06/22] RDMA/rtrs-clt: Break if one sess is connected in rtrs_clt_is_connected Gioh Kim
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang, Gioh Kim

From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

It describes how to use the fault-injection of RTRS.

Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
---
 .../fault-injection/rtrs-fault-injection.rst  | 83 +++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 Documentation/fault-injection/rtrs-fault-injection.rst

diff --git a/Documentation/fault-injection/rtrs-fault-injection.rst b/Documentation/fault-injection/rtrs-fault-injection.rst
new file mode 100644
index 000000000000..775a223fef09
--- /dev/null
+++ b/Documentation/fault-injection/rtrs-fault-injection.rst
@@ -0,0 +1,83 @@
+RTRS (RDMA Transport) Fault Injection
+=====================================
+This document introduces how to enable and use the error injection of RTRS
+via debugfs in the /sys/kernel/debug directory. When enabled, users can
+enable specific error injection point and change the default status code
+via the debugfs.
+
+Following examples show how to inject an error into the RTRS.
+
+First, enable CONFIG_FAULT_INJECTION_DEBUG_FS kernel config,
+recompile the kernel. After booting up the kernel, map a target device.
+
+After mapping, /sys/kernel/debug/<session-name> directory is created
+on both of the client and the server.
+
+Example 1: Inject an error into request processing of rtrs-client
+-----------------------------------------------------------------
+
+Generate an error on one session of rtrs-client::
+   
+  echo 100 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/probability 
+  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/times 
+  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/fail-request 
+  dd if=/dev/rnbd0 of=./dd bs=1k count=10
+
+Expected Result::
+
+  dd succeeds but generates an IO error
+
+Message from dmesg::
+
+  FAULT_INJECTION: forcing a failure.
+  name fault_inject, interval 1, probability 100, space 0, times 1
+  CPU: 0 PID: 799 Comm: dd Tainted: G           O      5.4.77-pserver+ #169
+  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
+  Call Trace:
+    dump_stack+0x97/0xe0
+    should_fail.cold+0x5/0x11
+    rtrs_clt_should_fail_request+0x2f/0x50 [rtrs_client]
+    rtrs_clt_request+0x223/0x540 [rtrs_client]
+    rnbd_queue_rq+0x347/0x800 [rnbd_client]
+    __blk_mq_try_issue_directly+0x268/0x380
+    blk_mq_request_issue_directly+0x9a/0xe0
+    blk_mq_try_issue_list_directly+0xa3/0x170
+    blk_mq_sched_insert_requests+0x1de/0x340
+    blk_mq_flush_plug_list+0x488/0x620
+    blk_flush_plug_list+0x20f/0x250
+    blk_finish_plug+0x3c/0x54
+    read_pages+0x104/0x2b0
+    __do_page_cache_readahead+0x28b/0x2b0
+    ondemand_readahead+0x2cc/0x610
+    generic_file_read_iter+0xde0/0x11f0
+    new_sync_read+0x246/0x360
+    vfs_read+0xc1/0x1b0
+    ksys_read+0xc3/0x160
+    do_syscall_64+0x68/0x260
+    entry_SYSCALL_64_after_hwframe+0x49/0xbe
+  RIP: 0033:0x7f7ff4296461
+  Code: fe ff ff 50 48 8d 3d fe d0 09 00 e8 e9 03 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 99 62 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
+  RSP: 002b:00007fffdceca5b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
+  RAX: ffffffffffffffda RBX: 000055c5eab6e3e0 RCX: 00007f7ff4296461
+  RDX: 0000000000000400 RSI: 000055c5ead27000 RDI: 0000000000000000
+  RBP: 0000000000000400 R08: 0000000000000003 R09: 00007f7ff4368260
+  R10: ffffffffffffff3b R11: 0000000000000246 R12: 000055c5ead27000
+  R13: 0000000000000000 R14: 0000000000000000 R15: 000055c5ead27000
+
+Example 2: rtrs-server does not send ACK to the heart-beat of rtrs-client
+-------------------------------------------------------------------------
+
+::
+
+  echo 100 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/probability 
+  echo 5 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/times 
+  echo 1 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/fail-hb-ack
+   
+Expected Result::
+
+  If rtrs-server does not send ACK more than 5 times, rtrs-client tries reconnection.
+
+Check how many times rtrs-client did reconnection::
+
+  cat /sys/devices/virtual/rtrs-client/bla/paths/ip\:192.168.122.142@ip\:192.168.122.130/stats/reconnects 
+  1 0
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 06/22] RDMA/rtrs-clt: Break if one sess is connected in rtrs_clt_is_connected
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (4 preceding siblings ...)
  2021-03-25 15:32 ` [PATCH for-next 05/22] docs: fault-injection: Add fault-injection manual of RTRS Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-04-01 18:38   ` Jason Gunthorpe
  2021-03-25 15:32 ` [PATCH for-next 07/22] RDMA/rtrs-clt: Remove redundant code from rtrs_clt_read_req Gioh Kim
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Guoqing Jiang, Guoqing Jiang, Danil Kipnis, Gioh Kim

From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>

No need to continue the loop if one sess is connected.

Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 .../fault-injection/rtrs-fault-injection.rst         | 12 ++++++------
 drivers/infiniband/ulp/rtrs/rtrs-clt.c               |  5 ++++-
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/Documentation/fault-injection/rtrs-fault-injection.rst b/Documentation/fault-injection/rtrs-fault-injection.rst
index 775a223fef09..65ffe26ece67 100644
--- a/Documentation/fault-injection/rtrs-fault-injection.rst
+++ b/Documentation/fault-injection/rtrs-fault-injection.rst
@@ -17,10 +17,10 @@ Example 1: Inject an error into request processing of rtrs-client
 -----------------------------------------------------------------
 
 Generate an error on one session of rtrs-client::
-   
-  echo 100 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/probability 
-  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/times 
-  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/fail-request 
+
+  echo 100 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/probability
+  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/times
+  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/fail-request
   dd if=/dev/rnbd0 of=./dd bs=1k count=10
 
 Expected Result::
@@ -72,12 +72,12 @@ Example 2: rtrs-server does not send ACK to the heart-beat of rtrs-client
   echo 100 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/probability 
   echo 5 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/times 
   echo 1 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/fail-hb-ack
-   
+
 Expected Result::
 
   If rtrs-server does not send ACK more than 5 times, rtrs-client tries reconnection.
 
 Check how many times rtrs-client did reconnection::
 
-  cat /sys/devices/virtual/rtrs-client/bla/paths/ip\:192.168.122.142@ip\:192.168.122.130/stats/reconnects 
+  cat /sys/devices/virtual/rtrs-client/bla/paths/ip\:192.168.122.142@ip\:192.168.122.130/stats/reconnects
   1 0
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 4f7690137e42..bfb5be5481e7 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -51,7 +51,10 @@ static inline bool rtrs_clt_is_connected(const struct rtrs_clt *clt)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(sess, &clt->paths_list, s.entry)
-		connected |= READ_ONCE(sess->state) == RTRS_CLT_CONNECTED;
+		if (READ_ONCE(sess->state) == RTRS_CLT_CONNECTED) {
+			connected = true;
+			break;
+		}
 	rcu_read_unlock();
 
 	return connected;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 07/22] RDMA/rtrs-clt: Remove redundant code from rtrs_clt_read_req
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (5 preceding siblings ...)
  2021-03-25 15:32 ` [PATCH for-next 06/22] RDMA/rtrs-clt: Break if one sess is connected in rtrs_clt_is_connected Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 08/22] RDMA/rtrs: Kill the put label in rtrs_srv_create_once_sysfs_root_folders Gioh Kim
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Guoqing Jiang, Guoqing Jiang, Danil Kipnis, Gioh Kim

From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>

There is no need to dereference 's' from 'sess', since we have
"sess = to_clt_sess(s)" before.

And we can deference 'dev' from 's' earlier.

Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index bfb5be5481e7..dd841121dc2c 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -1055,7 +1055,7 @@ static int rtrs_clt_read_req(struct rtrs_clt_io_req *req)
 	struct rtrs_sess *s = con->c.sess;
 	struct rtrs_clt_sess *sess = to_clt_sess(s);
 	struct rtrs_msg_rdma_read *msg;
-	struct rtrs_ib_dev *dev;
+	struct rtrs_ib_dev *dev = sess->s.dev;
 
 	struct ib_reg_wr rwr;
 	struct ib_send_wr *wr = NULL;
@@ -1065,9 +1065,6 @@ static int rtrs_clt_read_req(struct rtrs_clt_io_req *req)
 
 	const size_t tsize = sizeof(*msg) + req->data_len + req->usr_len;
 
-	s = &sess->s;
-	dev = sess->s.dev;
-
 	if (unlikely(tsize > sess->chunk_size)) {
 		rtrs_wrn(s,
 			  "Read request failed, message size is %zu, bigger than CHUNK_SIZE %d\n",
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 08/22] RDMA/rtrs: Kill the put label in rtrs_srv_create_once_sysfs_root_folders
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (6 preceding siblings ...)
  2021-03-25 15:32 ` [PATCH for-next 07/22] RDMA/rtrs-clt: Remove redundant code from rtrs_clt_read_req Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 09/22] RDMA/rtrs: Remove sessname and sess_kobj from rtrs_attrs Gioh Kim
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Guoqing Jiang, Guoqing Jiang, Danil Kipnis, Gioh Kim

From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>

We can remove the label after move put_device to the right place.

Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c b/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c
index 7ac3fe884409..309693434870 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c
@@ -176,7 +176,8 @@ static int rtrs_srv_create_once_sysfs_root_folders(struct rtrs_srv_sess *sess)
 	err = device_add(&srv->dev);
 	if (err) {
 		pr_err("device_add(): %d\n", err);
-		goto put;
+		put_device(&srv->dev);
+		goto unlock;
 	}
 	srv->kobj_paths = kobject_create_and_add("paths", &srv->dev.kobj);
 	if (!srv->kobj_paths) {
@@ -188,10 +189,6 @@ static int rtrs_srv_create_once_sysfs_root_folders(struct rtrs_srv_sess *sess)
 	}
 	dev_set_uevent_suppress(&srv->dev, false);
 	kobject_uevent(&srv->dev.kobj, KOBJ_ADD);
-	goto unlock;
-
-put:
-	put_device(&srv->dev);
 unlock:
 	mutex_unlock(&srv->paths_mutex);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 09/22] RDMA/rtrs: Remove sessname and sess_kobj from rtrs_attrs
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (7 preceding siblings ...)
  2021-03-25 15:32 ` [PATCH for-next 08/22] RDMA/rtrs: Kill the put label in rtrs_srv_create_once_sysfs_root_folders Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 10/22] RDMA/rtrs: Cleanup the code in rtrs_srv_rdma_cm_handler Gioh Kim
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Guoqing Jiang, Guoqing Jiang, Danil Kipnis, Gioh Kim

From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>

The two members are not used in the code, so remove them.

Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 2 --
 drivers/infiniband/ulp/rtrs/rtrs.h     | 2 --
 2 files changed, 4 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index dd841121dc2c..124197e3162f 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -2912,8 +2912,6 @@ int rtrs_clt_query(struct rtrs_clt *clt, struct rtrs_attrs *attr)
 
 	attr->queue_depth      = clt->queue_depth;
 	attr->max_io_size      = clt->max_io_size;
-	attr->sess_kobj	       = &clt->dev.kobj;
-	strlcpy(attr->sessname, clt->sessname, sizeof(attr->sessname));
 
 	return 0;
 }
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.h b/drivers/infiniband/ulp/rtrs/rtrs.h
index 8738e90e715a..a7e9ae579686 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs.h
@@ -110,8 +110,6 @@ int rtrs_clt_request(int dir, struct rtrs_clt_req_ops *ops,
 struct rtrs_attrs {
 	u32		queue_depth;
 	u32		max_io_size;
-	u8		sessname[NAME_MAX];
-	struct kobject	*sess_kobj;
 };
 
 int rtrs_clt_query(struct rtrs_clt *sess, struct rtrs_attrs *attr);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 10/22] RDMA/rtrs: Cleanup the code in rtrs_srv_rdma_cm_handler
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (8 preceding siblings ...)
  2021-03-25 15:32 ` [PATCH for-next 09/22] RDMA/rtrs: Remove sessname and sess_kobj from rtrs_attrs Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 11/22] RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session files Gioh Kim
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Guoqing Jiang, Guoqing Jiang, Danil Kipnis, Gioh Kim

From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>

Let these cases share the same path since all of them need to close
session.

Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index e9998e40f2f7..c0a65e1e6fda 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -1913,13 +1913,10 @@ static int rtrs_srv_rdma_cm_handler(struct rdma_cm_id *cm_id,
 	case RDMA_CM_EVENT_UNREACHABLE:
 		rtrs_err(s, "CM error (CM event: %s, err: %d)\n",
 			  rdma_event_msg(ev->event), ev->status);
-		close_sess(sess);
-		break;
+		fallthrough;
 	case RDMA_CM_EVENT_DISCONNECTED:
 	case RDMA_CM_EVENT_ADDR_CHANGE:
 	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
-		close_sess(sess);
-		break;
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
 		close_sess(sess);
 		break;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 11/22] RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session files
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (9 preceding siblings ...)
  2021-03-25 15:32 ` [PATCH for-next 10/22] RDMA/rtrs: Cleanup the code in rtrs_srv_rdma_cm_handler Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-03-25 15:32 ` [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats Gioh Kim
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Md Haris Iqbal, Gioh Kim

From: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>

KASAN detected the following BUG:
[  821.309371] ==================================================================
[  821.309842] BUG: KASAN: use-after-free in rtrs_clt_update_wc_stats+0x41/0x100 [rtrs_client]
[  821.310114] Read of size 8 at addr ffff88bf2fb4adc0 by task swapper/0/0

[  821.310503] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      5.4.84-pserver #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10
[  821.310511] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00       09/04/2012
[  821.310518] Call Trace:
[  821.310526]  <IRQ>
[  821.310541]  dump_stack+0x96/0xe0
[  821.310560]  print_address_description.constprop.4+0x1f/0x300
[  821.310571]  ? irq_work_claim+0x2e/0x50
[  821.310589]  __kasan_report.cold.8+0x78/0x92
[  821.310615]  ? rtrs_clt_update_wc_stats+0x41/0x100 [rtrs_client]
[  821.310639]  kasan_report+0x10/0x20
[  821.310656]  rtrs_clt_update_wc_stats+0x41/0x100 [rtrs_client]
[  821.310681]  rtrs_clt_rdma_done+0xb1/0x760 [rtrs_client]
[  821.310698]  ? lockdep_hardirqs_on+0x1a8/0x290
[  821.310725]  ? process_io_rsp+0xb0/0xb0 [rtrs_client]
[  821.310779]  ? mlx4_ib_destroy_cq+0x100/0x100 [mlx4_ib]
[  821.310802]  ? add_interrupt_randomness+0x1a2/0x340
[  821.310863]  __ib_process_cq+0x97/0x100 [ib_core]
[  821.310924]  ib_poll_handler+0x41/0xb0 [ib_core]
[  821.310945]  irq_poll_softirq+0xe0/0x260
[  821.310974]  __do_softirq+0x127/0x672
[  821.311016]  irq_exit+0xd1/0xe0
[  821.311027]  do_IRQ+0xa3/0x1d0
[  821.311046]  common_interrupt+0xf/0xf
[  821.311055]  </IRQ>
[  821.311065] RIP: 0010:cpuidle_enter_state+0xea/0x780
[  821.311075] Code: 31 ff e8 99 48 47 ff 80 7c 24 08 00 74 12 9c 58 f6 c4 02 0f 85 53 05 00 00 31 ff e8 b0 6f 53 ff e8 ab 4f 5e ff fb 8b 44 24 04 <85> c0 0f 89 f3 01 00 00 48 8d 7b 14 e8 65 1e 77 ff c7 43 14 00 00
[  821.311082] RSP: 0018:ffffffffab007d58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffca
[  821.311093] RAX: 0000000000000002 RBX: ffff88b803d69800 RCX: ffffffffa91a8298
[  821.311101] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffffffffab021414
[  821.311109] RBP: ffffffffab6329e0 R08: 0000000000000002 R09: 0000000000000000
[  821.311116] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[  821.311124] R13: 000000bf39d82466 R14: ffffffffab632aa0 R15: ffffffffab632ae0
[  821.311157]  ? lockdep_hardirqs_on+0x1a8/0x290
[  821.311183]  ? cpuidle_enter_state+0xe5/0x780
[  821.311212]  cpuidle_enter+0x3c/0x60
[  821.311233]  do_idle+0x2fb/0x390
[  821.311250]  ? arch_cpu_idle_exit+0x40/0x40
[  821.311272]  ? schedule+0x94/0x120
[  821.311298]  cpu_startup_entry+0x19/0x1b
[  821.311313]  start_kernel+0x5da/0x61b
[  821.311330]  ? thread_stack_cache_init+0x6/0x6
[  821.311342]  ? load_ucode_amd_bsp+0x6f/0xc4
[  821.311358]  ? init_amd_microcode+0xa6/0xa6
[  821.311380]  ? x86_family+0x5/0x20
[  821.311392]  ? load_ucode_bsp+0x182/0x1fd
[  821.311421]  secondary_startup_64+0xa4/0xb0

[  821.311652] Allocated by task 5730:
[  821.313411]  save_stack+0x19/0x80
[  821.313420]  __kasan_kmalloc.constprop.9+0xc1/0xd0
[  821.313429]  kmem_cache_alloc_trace+0x15b/0x350
[  821.313443]  alloc_sess+0xf4/0x570 [rtrs_client]
[  821.313456]  rtrs_clt_open+0x3b4/0x780 [rtrs_client]
[  821.313469]  find_and_get_or_create_sess+0x649/0x9d0 [rnbd_client]
[  821.313481]  rnbd_clt_map_device+0xd7/0xf50 [rnbd_client]
[  821.313493]  rnbd_clt_map_device_store+0x4ee/0x970 [rnbd_client]
[  821.313503]  kernfs_fop_write+0x141/0x240
[  821.313512]  vfs_write+0xf3/0x280
[  821.313521]  ksys_write+0xba/0x150
[  821.313530]  do_syscall_64+0x68/0x270
[  821.313539]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  821.313708] Freed by task 5822:
[  821.313918]  save_stack+0x19/0x80
[  821.313928]  __kasan_slab_free+0x125/0x170
[  821.313936]  kfree+0xe7/0x3f0
[  821.313945]  kobject_put+0xd3/0x240
[  821.313959]  rtrs_clt_destroy_sess_files+0x3f/0x60 [rtrs_client]
[  821.313972]  rtrs_clt_close+0x3c/0x80 [rtrs_client]
[  821.313984]  close_rtrs+0x45/0x80 [rnbd_client]
[  821.313996]  rnbd_client_exit+0x10f/0x2bd [rnbd_client]
[  821.314006]  __x64_sys_delete_module+0x27b/0x340
[  821.314014]  do_syscall_64+0x68/0x270
[  821.314024]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  821.314197] The buggy address belongs to the object at ffff88bf2fb4ad80
                which belongs to the cache kmalloc-96 of size 96
[  821.314514] The buggy address is located 64 bytes inside of
                96-byte region [ffff88bf2fb4ad80, ffff88bf2fb4ade0)
[  821.314820] The buggy address belongs to the page:
[  821.315023] page:ffffea00fcbed280 refcount:1 mapcount:0 mapping:ffff8887c6016e00 index:0xffff88bf2fb4a800
[  821.315032] flags: 0x1effff8000000200(slab)
[  821.315044] raw: 1effff8000000200 ffffea00bf41b640 0000000300000003 ffff8887c6016e00
[  821.315054] raw: ffff88bf2fb4a800 000000008020001d 00000001ffffffff 0000000000000000
[  821.315061] page dumped because: kasan: bad access detected

[  821.315232] Memory state around the buggy address:
[  821.315434]  ffff88bf2fb4ac80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  821.315694]  ffff88bf2fb4ad00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  821.315950] >ffff88bf2fb4ad80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  821.316205]                                            ^
[  821.316414]  ffff88bf2fb4ae00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  821.316671]  ffff88bf2fb4ae80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[  821.316929] ==================================================================

When rtrs_clt_close is triggered, it iterates over all the present
rtrs_clt_sess and triggers close on them. However, the call to
rtrs_clt_destroy_sess_files is done before the rtrs_clt_close_conns. This
is incorrect since during the initialization phase we allocate
rtrs_clt_sess first, and then we go ahead and create rtrs_clt_con for it.

If we free the rtrs_clt_sess structure before closing the rtrs_clt_con, it
may so happen that an inflight IO completion would trigger the function
rtrs_clt_rdma_done, which would lead to the above UAF case.

Hence close the rtrs_clt_con connections first, and then trigger the
destruction of session files.

Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 124197e3162f..42f49208b8f7 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -2726,8 +2726,8 @@ void rtrs_clt_close(struct rtrs_clt *clt)
 
 	/* Now it is safe to iterate over all paths without locks */
 	list_for_each_entry_safe(sess, tmp, &clt->paths_list, s.entry) {
-		rtrs_clt_destroy_sess_files(sess, NULL);
 		rtrs_clt_close_conns(sess, true);
+		rtrs_clt_destroy_sess_files(sess, NULL);
 		kobject_put(&sess->kobj);
 	}
 	free_clt(clt);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (10 preceding siblings ...)
  2021-03-25 15:32 ` [PATCH for-next 11/22] RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session files Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-04-01 18:44   ` Jason Gunthorpe
  2021-03-25 15:32 ` [PATCH for-next 13/22] RDMA/rtrs: New function converting rtrs_addr to string Gioh Kim
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Md Haris Iqbal, Gioh Kim

From: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>

KASAN detected the following BUG:
[  230.436512] ==================================================================
[  230.437182] BUG: KASAN: use-after-free in get_next_path_min_inflight+0x95/0x150 [rtrs_client]
[  230.437632] Read of size 4 at addr ffff88a796b4bb50 by task fio/4130

[  230.438069] CPU: 32 PID: 4130 Comm: fio Tainted: G           O      5.4.84-pserver #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10
[  230.438079] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00       09/04/2012
[  230.438088] Call Trace:
[  230.438111]  dump_stack+0x96/0xe0
[  230.438136]  print_address_description.constprop.4+0x1f/0x300
[  230.438150]  ? irq_work_claim+0x2e/0x50
[  230.438172]  __kasan_report.cold.8+0x78/0x92
[  230.438203]  ? get_next_path_min_inflight+0x95/0x150 [rtrs_client]
[  230.438234]  kasan_report+0x10/0x20
[  230.438249]  check_memory_region+0x144/0x1c0
[  230.438274]  get_next_path_min_inflight+0x95/0x150 [rtrs_client]
[  230.438312]  rtrs_clt_request+0x1fe/0x700 [rtrs_client]
[  230.438364]  ? rtrs_clt_close_work+0x40/0x40 [rtrs_client]
[  230.438395]  ? rtrs_clt_change_state_get_old+0x70/0x70 [rtrs_client]
[  230.438417]  ? blk_mq_start_request+0x1a4/0x2c0
[  230.438430]  ? blk_rq_map_sg+0x3d5/0xaa0
[  230.438468]  ? round_jiffies_up+0x60/0x90
[  230.438511]  rnbd_queue_rq+0x3e2/0x870 [rnbd_client]
[  230.438567]  ? rnbd_softirq_done_fn+0x90/0x90 [rnbd_client]
[  230.438587]  ? rnbd_get_permit+0x50/0x50 [rnbd_client]
[  230.438601]  ? __lock_acquire+0x68e/0x23a0
[  230.438635]  ? blk_mq_get_driver_tag+0xbe/0x250
[  230.438652]  ? blk_mq_dequeue_from_ctx+0x4d0/0x4d0
[  230.438663]  ? lock_acquire+0xf3/0x210
[  230.438726]  __blk_mq_try_issue_directly+0x272/0x390
[  230.438752]  ? blk_mq_get_driver_tag+0x250/0x250
[  230.438785]  ? rcu_is_watching+0x34/0x50
[  230.438816]  blk_mq_request_issue_directly+0xa8/0xf0
[  230.438833]  ? blk_mq_flush_plug_list+0x690/0x690
[  230.438859]  ? lock_downgrade+0x390/0x390
[  230.438892]  ? lock_acquire+0xf3/0x210
[  230.438920]  blk_mq_try_issue_list_directly+0xa1/0x160
[  230.438952]  blk_mq_sched_insert_requests+0x23c/0x390
[  230.438992]  blk_mq_flush_plug_list+0x361/0x690
[  230.439037]  ? blk_mq_insert_requests+0x300/0x300
[  230.439058]  ? current_time+0x8c/0xe0
[  230.439074]  ? timestamp_truncate+0x180/0x180
[  230.439101]  ? file_remove_privs+0xb4/0x1f0
[  230.439139]  blk_flush_plug_list+0x1d1/0x210
[  230.439167]  ? blk_insert_cloned_request+0x1e0/0x1e0
[  230.439220]  blk_finish_plug+0x3c/0x54
[  230.439243]  blkdev_write_iter+0x173/0x260
[  230.439272]  ? bd_finish_claiming+0xe0/0xe0
[  230.439298]  ? 0xffffffff9a000000
[  230.439330]  ? rw_verify_area+0xd9/0x130
[  230.439359]  aio_write+0x1d3/0x300
[  230.439387]  ? aio_read+0x260/0x260
[  230.439477]  ? lock_downgrade+0x390/0x390
[  230.439497]  ? lock_acquire+0xf3/0x210
[  230.439512]  ? __might_fault+0x7d/0xe0
[  230.439570]  io_submit_one+0xccc/0x1920
[  230.439633]  ? aio_poll_complete_work+0x850/0x850
[  230.439735]  ? __x64_sys_io_submit+0x118/0x380
[  230.439748]  __x64_sys_io_submit+0x118/0x380
[  230.439777]  ? __ia32_compat_sys_io_submit+0x360/0x360
[  230.439793]  ? __x64_sys_io_getevents+0xd7/0x150
[  230.439807]  ? mark_held_locks+0x29/0xa0
[  230.439827]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
[  230.439840]  ? trace_hardirqs_off_caller+0x15/0x110
[  230.439857]  ? mark_held_locks+0x29/0xa0
[  230.439893]  ? do_syscall_64+0x68/0x270
[  230.439903]  do_syscall_64+0x68/0x270
[  230.439924]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  230.439936] RIP: 0033:0x7f8f10233f59
[  230.439948] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 07 6f 0c 00 f7 d8 64 89 01 48
[  230.439958] RSP: 002b:00007fff1df1d238 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[  230.439970] RAX: ffffffffffffffda RBX: 00007f8f104ec360 RCX: 00007f8f10233f59
[  230.439980] RDX: 0000555c4c1f9440 RSI: 0000000000000001 RDI: 00007f8f04730000
[  230.439989] RBP: 00007f8f04730000 R08: 0000555c4c1c97f0 R09: 00000000000001e0
[  230.439998] R10: 0000555c4c1f9670 R11: 0000000000000246 R12: 0000000000000001
[  230.440007] R13: 0000000000000000 R14: 0000555c4c1f9440 R15: 00007f8ee30b5210

[  230.440257] Allocated by task 3440:
[  230.440471]  save_stack+0x19/0x80
[  230.440482]  __kasan_kmalloc.constprop.9+0xc1/0xd0
[  230.440492]  kmem_cache_alloc_trace+0x15b/0x350
[  230.440508]  alloc_sess+0xf4/0x570 [rtrs_client]
[  230.440524]  rtrs_clt_open+0x3b4/0x780 [rtrs_client]
[  230.440538]  find_and_get_or_create_sess+0x649/0x9d0 [rnbd_client]
[  230.440551]  rnbd_clt_map_device+0xd7/0xf50 [rnbd_client]
[  230.440565]  rnbd_clt_map_device_store+0x4ee/0x970 [rnbd_client]
[  230.440577]  kernfs_fop_write+0x141/0x240
[  230.440587]  vfs_write+0xf3/0x280
[  230.440598]  ksys_write+0xba/0x150
[  230.440608]  do_syscall_64+0x68/0x270
[  230.440619]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  230.440806] Freed by task 4148:
[  230.441013]  save_stack+0x19/0x80
[  230.441024]  __kasan_slab_free+0x125/0x170
[  230.441034]  kfree+0xe7/0x3f0
[  230.441045]  kobject_put+0xd3/0x240
[  230.441061]  rtrs_clt_destroy_sess_files+0x3f/0x60 [rtrs_client]
[  230.441076]  rtrs_clt_remove_path_from_sysfs+0x95/0xe0 [rtrs_client]
[  230.441092]  rtrs_clt_remove_path_store+0x3e/0xa0 [rtrs_client]
[  230.441103]  kernfs_fop_write+0x141/0x240
[  230.441113]  vfs_write+0xf3/0x280
[  230.441123]  ksys_write+0xba/0x150
[  230.441133]  do_syscall_64+0x68/0x270
[  230.441145]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  230.441333] The buggy address belongs to the object at ffff88a796b4bb00
                which belongs to the cache kmalloc-96 of size 96
[  230.441705] The buggy address is located 80 bytes inside of
                96-byte region [ffff88a796b4bb00, ffff88a796b4bb60)
[  230.442063] The buggy address belongs to the page:
[  230.442294] page:ffffea009e5ad2c0 refcount:1 mapcount:0 mapping:ffff8887c6016e00 index:0x0
[  230.442305] flags: 0x12ffff8000000200(slab)
[  230.442320] raw: 12ffff8000000200 dead000000000100 dead000000000122 ffff8887c6016e00
[  230.442332] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[  230.442340] page dumped because: kasan: bad access detected

[  230.442525] Memory state around the buggy address:
[  230.442756]  ffff88a796b4ba00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  230.443059]  ffff88a796b4ba80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  230.443359] >ffff88a796b4bb00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  230.443681]                                                  ^
[  230.443935]  ffff88a796b4bb80: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
[  230.444233]  ffff88a796b4bc00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  230.444529] ==================================================================

When get_next_path_min_inflight is called to select the next path, it
iterates over the list of available rtrs_clt_sess (paths). It then reads
the number of inflight IOs for that path to select one which has the least

But it may so happen that that rtrs_clt_sess (path) is no longer in the
connected state, and like in the above BUG its resources have also been
freed.

So, check the state of the rtrs_clt_sess (path) before going ahead to read
its inflight stats.

Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 42f49208b8f7..1519191d7154 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -808,6 +808,9 @@ static struct rtrs_clt_sess *get_next_path_min_inflight(struct path_it *it)
 	int inflight;
 
 	list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) {
+		if (unlikely(READ_ONCE(sess->state) != RTRS_CLT_CONNECTED))
+			continue;
+
 		if (unlikely(!list_empty(raw_cpu_ptr(sess->mp_skip_entry))))
 			continue;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 13/22] RDMA/rtrs: New function converting rtrs_addr to string
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (11 preceding siblings ...)
  2021-03-25 15:32 ` [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats Gioh Kim
@ 2021-03-25 15:32 ` Gioh Kim
  2021-03-25 15:33 ` [PATCH for-next 14/22] RDMA/rtrs-clt: Print more info when an error happens Gioh Kim
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:32 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Gioh Kim, Gioh Kim

From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

There are common code converting addresses of source
machine and destination machine to one string.
We already have a struct rtrs_addr to store two addresses.
This patch introduces a new function that converts
two addresses into one string with struct rtrs_addr.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c | 13 +++++------
 drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c | 13 +++++------
 drivers/infiniband/ulp/rtrs/rtrs.c           | 24 ++++++++++++++++++++
 drivers/infiniband/ulp/rtrs/rtrs.h           |  1 +
 4 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c b/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
index d168bd08037a..c502dcbae9bb 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
@@ -396,14 +396,13 @@ int rtrs_clt_create_sess_files(struct rtrs_clt_sess *sess)
 {
 	struct rtrs_clt *clt = sess->clt;
 	char str[NAME_MAX];
-	int err, cnt;
-
-	cnt = sockaddr_to_str((struct sockaddr *)&sess->s.src_addr,
-			      str, sizeof(str));
-	cnt += scnprintf(str + cnt, sizeof(str) - cnt, "@");
-	sockaddr_to_str((struct sockaddr *)&sess->s.dst_addr,
-			str + cnt, sizeof(str) - cnt);
+	int err;
+	struct rtrs_addr path = {
+		.src = &sess->s.src_addr,
+		.dst = &sess->s.dst_addr,
+	};
 
+	rtrs_addr_to_str(&path, str, sizeof(str));
 	err = kobject_init_and_add(&sess->kobj, &ktype_sess, clt->kobj_paths,
 				   "%s", str);
 	if (err) {
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c b/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c
index 309693434870..57af9e7c3588 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c
@@ -259,14 +259,13 @@ int rtrs_srv_create_sess_files(struct rtrs_srv_sess *sess)
 	struct rtrs_srv *srv = sess->srv;
 	struct rtrs_sess *s = &sess->s;
 	char str[NAME_MAX];
-	int err, cnt;
-
-	cnt = sockaddr_to_str((struct sockaddr *)&sess->s.dst_addr,
-			      str, sizeof(str));
-	cnt += scnprintf(str + cnt, sizeof(str) - cnt, "@");
-	sockaddr_to_str((struct sockaddr *)&sess->s.src_addr,
-			str + cnt, sizeof(str) - cnt);
+	int err;
+	struct rtrs_addr path = {
+		.src = &sess->s.dst_addr,
+		.dst = &sess->s.src_addr,
+	};
 
+	rtrs_addr_to_str(&path, str, sizeof(str));
 	err = rtrs_srv_create_once_sysfs_root_folders(sess);
 	if (err)
 		return err;
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index 5822cb088eef..1dd772d84e71 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -463,6 +463,30 @@ int sockaddr_to_str(const struct sockaddr *addr, char *buf, size_t len)
 }
 EXPORT_SYMBOL(sockaddr_to_str);
 
+/**
+ * rtrs_addr_to_str() - convert rtrs_addr to a string "src@dst"
+ * @addr:	the rtrs_addr structure to be converted
+ * @str:	string containing source and destination addr of a path
+ *		separated by '@' I.e. "ip:1.1.1.1@ip:1.1.1.2"
+ *		"ip:1.1.1.1@ip:1.1.1.2".
+ * @len:	string length
+ *
+ * The return value is the number of characters written into buf not
+ * including the trailing '\0'.
+ */
+int rtrs_addr_to_str(const struct rtrs_addr *addr, char *buf, size_t len)
+{
+	int cnt;
+
+	cnt = sockaddr_to_str((struct sockaddr *)addr->src,
+			      buf, len);
+	cnt += scnprintf(buf + cnt, len - cnt, "@");
+	sockaddr_to_str((struct sockaddr *)addr->dst,
+			buf + cnt, len - cnt);
+	return cnt;
+}
+EXPORT_SYMBOL(rtrs_addr_to_str);
+
 /**
  * rtrs_addr_to_sockaddr() - convert path string "src,dst" or "src@dst"
  * to sockaddreses
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.h b/drivers/infiniband/ulp/rtrs/rtrs.h
index a7e9ae579686..966c1e5ad68e 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs.h
@@ -184,4 +184,5 @@ int rtrs_addr_to_sockaddr(const char *str, size_t len, u16 port,
 			  struct rtrs_addr *addr);
 
 int sockaddr_to_str(const struct sockaddr *addr, char *buf, size_t len);
+int rtrs_addr_to_str(const struct rtrs_addr *addr, char *buf, size_t len);
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 14/22] RDMA/rtrs-clt: Print more info when an error happens
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (12 preceding siblings ...)
  2021-03-25 15:32 ` [PATCH for-next 13/22] RDMA/rtrs: New function converting rtrs_addr to string Gioh Kim
@ 2021-03-25 15:33 ` Gioh Kim
  2021-04-01 18:46   ` Jason Gunthorpe
  2021-03-25 15:33 ` [PATCH for-next 15/22] RDMA/rtrs-srv: More debugging info when fail to send reply Gioh Kim
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:33 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Gioh Kim, Gioh Kim

From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

Client prints only error value and it is not enough for debugging.

1. When client receives an error from server:
the client does not only print the error value but also
more information of server connection.

2. When client failes to send IO:
the client gets an error from RDMA layer. It also
print more information of server connection.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 31 ++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 1519191d7154..a41864ec853d 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -440,6 +440,11 @@ static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
 	req->in_use = false;
 	req->con = NULL;
 
+	if (unlikely(errno)) {
+		rtrs_err_rl(con->c.sess, "IO request failed: error=%d path=%s [%s:%u]\n",
+			    errno, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
+	}
+
 	if (notify)
 		req->conf(req->priv, errno);
 }
@@ -1026,7 +1031,8 @@ static int rtrs_clt_write_req(struct rtrs_clt_io_req *req)
 				       req->usr_len + sizeof(*msg),
 				       imm);
 	if (unlikely(ret)) {
-		rtrs_err(s, "Write request failed: %d\n", ret);
+		rtrs_err_rl(s, "Write request failed: error=%d path=%s [%s:%u]\n",
+			    ret, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
 		if (sess->clt->mp_policy == MP_POLICY_MIN_INFLIGHT)
 			atomic_dec(&sess->stats->inflight);
 		if (req->sg_cnt)
@@ -1144,7 +1150,8 @@ static int rtrs_clt_read_req(struct rtrs_clt_io_req *req)
 	ret = rtrs_post_send_rdma(req->con, req, &sess->rbufs[buf_id],
 				   req->data_len, imm, wr);
 	if (unlikely(ret)) {
-		rtrs_err(s, "Read request failed: %d\n", ret);
+		rtrs_err_rl(s, "Read request failed: error=%d path=%s [%s:%u]\n",
+			    ret, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
 		if (sess->clt->mp_policy == MP_POLICY_MIN_INFLIGHT)
 			atomic_dec(&sess->stats->inflight);
 		req->need_inv = false;
@@ -2465,12 +2472,28 @@ static int init_sess(struct rtrs_clt_sess *sess)
 	mutex_lock(&sess->init_mutex);
 	err = init_conns(sess);
 	if (err) {
-		rtrs_err(sess->clt, "init_conns(), err: %d\n", err);
+		char str[NAME_MAX];
+		int err;
+		struct rtrs_addr path = {
+			.src = &sess->s.src_addr,
+			.dst = &sess->s.dst_addr,
+		};
+		rtrs_addr_to_str(&path, str, sizeof(str));
+		rtrs_err(sess->clt, "init_conns() failed: err=%d path=%s [%s:%u]\n",
+			 err, str, sess->hca_name, sess->hca_port);
 		goto out;
 	}
 	err = rtrs_send_sess_info(sess);
 	if (err) {
-		rtrs_err(sess->clt, "rtrs_send_sess_info(), err: %d\n", err);
+		char str[NAME_MAX];
+		int err;
+		struct rtrs_addr path = {
+			.src = &sess->s.src_addr,
+			.dst = &sess->s.dst_addr,
+		};
+		rtrs_addr_to_str(&path, str, sizeof(str));
+		rtrs_err(sess->clt, "rtrs_send_sess_info() failed: err=%d path=%s [%s:%u]\n",
+			 err, str, sess->hca_name, sess->hca_port);
 		goto out;
 	}
 	rtrs_clt_sess_up(sess);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 15/22] RDMA/rtrs-srv: More debugging info when fail to send reply
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (13 preceding siblings ...)
  2021-03-25 15:33 ` [PATCH for-next 14/22] RDMA/rtrs-clt: Print more info when an error happens Gioh Kim
@ 2021-03-25 15:33 ` Gioh Kim
  2021-03-25 15:33 ` [PATCH for-next 16/22] RDMA/rtrs-srv: Report temporary sessname for error message Gioh Kim
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:33 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Gioh Kim, Gioh Kim

From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

It does not help to debug if it only print error message
without any debugging information which session and connection
the error happened.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index c0a65e1e6fda..c1428cf602bc 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -518,8 +518,9 @@ bool rtrs_srv_resp_rdma(struct rtrs_srv_op *id, int status)
 
 	if (unlikely(sess->state != RTRS_SRV_CONNECTED)) {
 		rtrs_err_rl(s,
-			     "Sending I/O response failed,  session is disconnected, sess state %s\n",
-			     rtrs_srv_state_str(sess->state));
+			    "Sending I/O response failed,  session %s is disconnected, sess state %s\n",
+			    kobject_name(&sess->kobj),
+			    rtrs_srv_state_str(sess->state));
 		goto out;
 	}
 	if (always_invalidate) {
@@ -529,7 +530,9 @@ bool rtrs_srv_resp_rdma(struct rtrs_srv_op *id, int status)
 	}
 	if (unlikely(atomic_sub_return(1,
 				       &con->sq_wr_avail) < 0)) {
-		pr_err("IB send queue full\n");
+		rtrs_err(s, "IB send queue full: sess=%s cid=%d\n",
+			 kobject_name(&sess->kobj),
+			 con->c.cid);
 		atomic_add(1, &con->sq_wr_avail);
 		spin_lock(&con->rsp_wr_wait_lock);
 		list_add_tail(&id->wait_list, &con->rsp_wr_wait_list);
@@ -543,7 +546,8 @@ bool rtrs_srv_resp_rdma(struct rtrs_srv_op *id, int status)
 		err = rdma_write_sg(id);
 
 	if (unlikely(err)) {
-		rtrs_err_rl(s, "IO response failed: %d\n", err);
+		rtrs_err_rl(s, "IO response failed: %d: sess=%s\n", err,
+			    kobject_name(&sess->kobj));
 		close_sess(sess);
 	}
 out:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 16/22] RDMA/rtrs-srv: Report temporary sessname for error message
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (14 preceding siblings ...)
  2021-03-25 15:33 ` [PATCH for-next 15/22] RDMA/rtrs-srv: More debugging info when fail to send reply Gioh Kim
@ 2021-03-25 15:33 ` Gioh Kim
  2021-03-25 15:33 ` [PATCH for-next 17/22] RDMA/rtrs: cleanup unused variable Gioh Kim
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:33 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Gioh Kim, Gioh Kim

From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

Before receiving the session name, the error message cannot
include any information which connection generates the error.
This patch stores the addresses of source and target in the
sessname field to show which generates the error. And that field
will be over-written when receiving the session name from client.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index c1428cf602bc..8ec9c30b9887 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -1690,6 +1690,9 @@ static struct rtrs_srv_sess *__alloc_sess(struct rtrs_srv *srv,
 {
 	struct rtrs_srv_sess *sess;
 	int err = -ENOMEM;
+	char str[NAME_MAX];
+	struct rtrs_sess *s;
+	struct rtrs_addr path;
 
 	if (srv->paths_num >= MAX_PATHS_NUM) {
 		err = -ECONNRESET;
@@ -1724,6 +1727,14 @@ static struct rtrs_srv_sess *__alloc_sess(struct rtrs_srv *srv,
 	sess->cur_cq_vector = -1;
 	sess->s.dst_addr = cm_id->route.addr.dst_addr;
 	sess->s.src_addr = cm_id->route.addr.src_addr;
+
+	/* temporary until receiving session-name from client */
+	s = &sess->s;
+	path.src = &sess->s.src_addr;
+	path.dst = &sess->s.dst_addr;
+	rtrs_addr_to_str(&path, str, sizeof(str));
+	strlcpy(sess->s.sessname, str, sizeof(sess->s.sessname));
+
 	sess->s.con_num = con_num;
 	sess->s.recon_cnt = recon_cnt;
 	uuid_copy(&sess->s.uuid, uuid);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 17/22] RDMA/rtrs: cleanup unused variable
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (15 preceding siblings ...)
  2021-03-25 15:33 ` [PATCH for-next 16/22] RDMA/rtrs-srv: Report temporary sessname for error message Gioh Kim
@ 2021-03-25 15:33 ` Gioh Kim
  2021-04-01 18:50   ` Jason Gunthorpe
  2021-03-25 15:33 ` [PATCH for-next 18/22] RDMA/rtrs-clt: Simplify error message Gioh Kim
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:33 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Jack Wang, Gioh Kim

From: Jack Wang <jinpu.wang@cloud.ionos.com>

Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 2 --
 drivers/infiniband/ulp/rtrs/rtrs.c     | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 8ec9c30b9887..e11e91626b41 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -1691,7 +1691,6 @@ static struct rtrs_srv_sess *__alloc_sess(struct rtrs_srv *srv,
 	struct rtrs_srv_sess *sess;
 	int err = -ENOMEM;
 	char str[NAME_MAX];
-	struct rtrs_sess *s;
 	struct rtrs_addr path;
 
 	if (srv->paths_num >= MAX_PATHS_NUM) {
@@ -1729,7 +1728,6 @@ static struct rtrs_srv_sess *__alloc_sess(struct rtrs_srv *srv,
 	sess->s.src_addr = cm_id->route.addr.src_addr;
 
 	/* temporary until receiving session-name from client */
-	s = &sess->s;
 	path.src = &sess->s.src_addr;
 	path.dst = &sess->s.dst_addr;
 	rtrs_addr_to_str(&path, str, sizeof(str));
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index 1dd772d84e71..bc08b7f6e5e2 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -466,7 +466,7 @@ EXPORT_SYMBOL(sockaddr_to_str);
 /**
  * rtrs_addr_to_str() - convert rtrs_addr to a string "src@dst"
  * @addr:	the rtrs_addr structure to be converted
- * @str:	string containing source and destination addr of a path
+ * @buf:	string containing source and destination addr of a path
  *		separated by '@' I.e. "ip:1.1.1.1@ip:1.1.1.2"
  *		"ip:1.1.1.1@ip:1.1.1.2".
  * @len:	string length
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 18/22] RDMA/rtrs-clt: Simplify error message
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (16 preceding siblings ...)
  2021-03-25 15:33 ` [PATCH for-next 17/22] RDMA/rtrs: cleanup unused variable Gioh Kim
@ 2021-03-25 15:33 ` Gioh Kim
  2021-03-25 15:33 ` [PATCH for-next 19/22] RDMA/rtrs-clt: Cap max_io_size Gioh Kim
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:33 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Gioh Kim, Gioh Kim

From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

Two error messages are only different message but have common
code to generate the path string.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index a41864ec853d..eb830c66fa44 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -2468,30 +2468,22 @@ static int rtrs_send_sess_info(struct rtrs_clt_sess *sess)
 static int init_sess(struct rtrs_clt_sess *sess)
 {
 	int err;
+	char str[NAME_MAX];
+	struct rtrs_addr path = {
+		.src = &sess->s.src_addr,
+		.dst = &sess->s.dst_addr,
+	};
+	rtrs_addr_to_str(&path, str, sizeof(str));
 
 	mutex_lock(&sess->init_mutex);
 	err = init_conns(sess);
 	if (err) {
-		char str[NAME_MAX];
-		int err;
-		struct rtrs_addr path = {
-			.src = &sess->s.src_addr,
-			.dst = &sess->s.dst_addr,
-		};
-		rtrs_addr_to_str(&path, str, sizeof(str));
 		rtrs_err(sess->clt, "init_conns() failed: err=%d path=%s [%s:%u]\n",
 			 err, str, sess->hca_name, sess->hca_port);
 		goto out;
 	}
 	err = rtrs_send_sess_info(sess);
 	if (err) {
-		char str[NAME_MAX];
-		int err;
-		struct rtrs_addr path = {
-			.src = &sess->s.src_addr,
-			.dst = &sess->s.dst_addr,
-		};
-		rtrs_addr_to_str(&path, str, sizeof(str));
 		rtrs_err(sess->clt, "rtrs_send_sess_info() failed: err=%d path=%s [%s:%u]\n",
 			 err, str, sess->hca_name, sess->hca_port);
 		goto out;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 19/22] RDMA/rtrs-clt: Cap max_io_size
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (17 preceding siblings ...)
  2021-03-25 15:33 ` [PATCH for-next 18/22] RDMA/rtrs-clt: Simplify error message Gioh Kim
@ 2021-03-25 15:33 ` Gioh Kim
  2021-03-25 15:33 ` [PATCH for-next 20/22] RDMA/rtrs-clt: Add a minimum latency multipath policy Gioh Kim
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:33 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Jack Wang, Gioh Kim

From: Jack Wang <jinpu.wang@cloud.ionos.com>

Max io size is limited by both remote buffer size and
the max fr pages per mr.

Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index eb830c66fa44..7e72f0911cf2 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -2929,7 +2929,9 @@ int rtrs_clt_query(struct rtrs_clt *clt, struct rtrs_attrs *attr)
 		return -ECOMM;
 
 	attr->queue_depth      = clt->queue_depth;
-	attr->max_io_size      = clt->max_io_size;
+	/* Cap max_io_size to min of remote buffer size and the fr pages */
+	attr->max_io_size = min_t(int, clt->max_io_size,
+				  clt->max_segments * SZ_4K);
 
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 20/22] RDMA/rtrs-clt: Add a minimum latency multipath policy
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (18 preceding siblings ...)
  2021-03-25 15:33 ` [PATCH for-next 19/22] RDMA/rtrs-clt: Cap max_io_size Gioh Kim
@ 2021-03-25 15:33 ` Gioh Kim
  2021-03-25 15:33 ` [PATCH for-next 21/22] RDMA/rtrs-clt: new sysfs attribute to print the latency of each path Gioh Kim
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:33 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Gioh Kim, Gioh Kim

From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

This patch adds new multipath policy: min-latency.
Client checks the latency of each path when it sends the heart-beat.
And it sends IO to the path with the minimum latency.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c | 18 +++++--
 drivers/infiniband/ulp/rtrs/rtrs-clt.c       | 57 +++++++++++++++++++-
 drivers/infiniband/ulp/rtrs/rtrs-clt.h       |  1 +
 drivers/infiniband/ulp/rtrs/rtrs-pri.h       |  2 +
 drivers/infiniband/ulp/rtrs/rtrs.c           |  3 ++
 5 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c b/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
index c502dcbae9bb..bc46b7a99ba0 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
@@ -101,6 +101,8 @@ static ssize_t mpath_policy_show(struct device *dev,
 	case MP_POLICY_MIN_INFLIGHT:
 		return sysfs_emit(page, "min-inflight (MI: %d)\n",
 				  clt->mp_policy);
+	case MP_POLICY_MIN_LATENCY:
+		return sprintf(page, "min-latency (ML: %d)\n", clt->mp_policy);
 	default:
 		return sysfs_emit(page, "Unknown (%d)\n", clt->mp_policy);
 	}
@@ -114,22 +116,32 @@ static ssize_t mpath_policy_store(struct device *dev,
 	struct rtrs_clt *clt;
 	int value;
 	int ret;
+	size_t len = 0;
 
 	clt = container_of(dev, struct rtrs_clt, dev);
 
 	ret = kstrtoint(buf, 10, &value);
 	if (!ret && (value == MP_POLICY_RR ||
-		     value == MP_POLICY_MIN_INFLIGHT)) {
+		     value == MP_POLICY_MIN_INFLIGHT ||
+		     value == MP_POLICY_MIN_LATENCY)) {
 		clt->mp_policy = value;
 		return count;
 	}
 
+	/* distinguish "mi" and "min-latency" with length */
+	len = strnlen(buf, NAME_MAX);
+	if (buf[len - 1] == '\n')
+		len--;
+
 	if (!strncasecmp(buf, "round-robin", 11) ||
-	    !strncasecmp(buf, "rr", 2))
+	    (len == 2 && !strncasecmp(buf, "rr", 2)))
 		clt->mp_policy = MP_POLICY_RR;
 	else if (!strncasecmp(buf, "min-inflight", 12) ||
-		 !strncasecmp(buf, "mi", 2))
+		 (len == 2 && !strncasecmp(buf, "mi", 2)))
 		clt->mp_policy = MP_POLICY_MIN_INFLIGHT;
+	else if (!strncasecmp(buf, "min-latency", 11) ||
+		 (len == 2 && !strncasecmp(buf, "ml", 2)))
+		clt->mp_policy = MP_POLICY_MIN_LATENCY;
 	else
 		return -EINVAL;
 
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 7e72f0911cf2..3e7b78d71362 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -636,6 +636,8 @@ static void rtrs_clt_rdma_done(struct ib_cq *cq, struct ib_wc *wc)
 		} else if (imm_type == RTRS_HB_ACK_IMM) {
 			WARN_ON(con->c.cid);
 			sess->s.hb_missed_cnt = 0;
+			sess->s.hb_cur_latency =
+				ktime_sub(ktime_get(), sess->s.hb_last_sent);
 			if (sess->flags & RTRS_MSG_NEW_RKEY_F)
 				return  rtrs_clt_recv_done(con, wc);
 		} else {
@@ -837,6 +839,57 @@ static struct rtrs_clt_sess *get_next_path_min_inflight(struct path_it *it)
 	return min_path;
 }
 
+/**
+ * get_next_path_min_latency() - Returns path with minimal latency.
+ * @it:	the path pointer
+ *
+ * Return: a path with the lowest latency or NULL if all paths are tried
+ *
+ * Locks:
+ *    rcu_read_lock() must be hold.
+ *
+ * Related to @MP_POLICY_MIN_LATENCY
+ *
+ * This DOES skip an already-tried path.
+ * There is a skip-list to skip a path if the path has tried but failed.
+ * It will try the minimum latency path and then the second minimum latency
+ * path and so on. Finally it will return NULL if all paths are tried.
+ * Therefore the caller MUST check the returned
+ * path is NULL and trigger the IO error.
+ */
+static struct rtrs_clt_sess *get_next_path_min_latency(struct path_it *it)
+{
+	struct rtrs_clt_sess *min_path = NULL;
+	struct rtrs_clt *clt = it->clt;
+	struct rtrs_clt_sess *sess;
+	ktime_t min_latency = INT_MAX;
+	ktime_t latency;
+
+	list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) {
+		if (unlikely(READ_ONCE(sess->state) != RTRS_CLT_CONNECTED))
+			continue;
+
+		if (unlikely(!list_empty(raw_cpu_ptr(sess->mp_skip_entry))))
+			continue;
+
+		latency = sess->s.hb_cur_latency;
+
+		if (latency < min_latency) {
+			min_latency = latency;
+			min_path = sess;
+		}
+	}
+
+	/*
+	 * add the path to the skip list, so that next time we can get
+	 * a different one
+	 */
+	if (min_path)
+		list_add(raw_cpu_ptr(min_path->mp_skip_entry), &it->skip_list);
+
+	return min_path;
+}
+
 static inline void path_it_init(struct path_it *it, struct rtrs_clt *clt)
 {
 	INIT_LIST_HEAD(&it->skip_list);
@@ -845,8 +898,10 @@ static inline void path_it_init(struct path_it *it, struct rtrs_clt *clt)
 
 	if (clt->mp_policy == MP_POLICY_RR)
 		it->next_path = get_next_path_rr;
-	else
+	else if (clt->mp_policy == MP_POLICY_MIN_INFLIGHT)
 		it->next_path = get_next_path_min_inflight;
+	else
+		it->next_path = get_next_path_min_latency;
 }
 
 static inline void path_it_deinit(struct path_it *it)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.h b/drivers/infiniband/ulp/rtrs/rtrs-clt.h
index 59ea2ec44fe5..5c0cea8dd83e 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.h
@@ -30,6 +30,7 @@ enum rtrs_clt_state {
 enum rtrs_mp_policy {
 	MP_POLICY_RR,
 	MP_POLICY_MIN_INFLIGHT,
+	MP_POLICY_MIN_LATENCY,
 };
 
 /* see Documentation/ABI/testing/sysfs-class-rtrs-client for details */
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-pri.h b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
index 1b31bda9ca78..bcad5e2168c5 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-pri.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
@@ -112,6 +112,8 @@ struct rtrs_sess {
 	unsigned int		hb_interval_ms;
 	unsigned int		hb_missed_cnt;
 	unsigned int		hb_missed_max;
+	ktime_t			hb_last_sent;
+	ktime_t			hb_cur_latency;
 };
 
 /* rtrs information unit */
diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
index bc08b7f6e5e2..a7847282a2eb 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs.c
@@ -337,6 +337,9 @@ static void hb_work(struct work_struct *work)
 		schedule_hb(sess);
 		return;
 	}
+
+	sess->hb_last_sent = ktime_get();
+
 	imm = rtrs_to_imm(RTRS_HB_MSG_IMM, 0);
 	err = rtrs_post_rdma_write_imm_empty(usr_con, sess->hb_cqe, imm,
 					     0, NULL);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 21/22] RDMA/rtrs-clt: new sysfs attribute to print the latency of each path
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (19 preceding siblings ...)
  2021-03-25 15:33 ` [PATCH for-next 20/22] RDMA/rtrs-clt: Add a minimum latency multipath policy Gioh Kim
@ 2021-03-25 15:33 ` Gioh Kim
  2021-03-25 15:33 ` [PATCH for-next 22/22] Documentation/ABI/rtrs-clt: Add descriptions for min-latency policy Gioh Kim
  2021-04-01 19:04 ` [PATCH for-next 00/22] Misc update for rtrs Jason Gunthorpe
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:33 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang,
	Gioh Kim, Gioh Kim

From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

It shows the latest latency that the client checked when sending
the heart-beat.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c b/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
index bc46b7a99ba0..fc6de514b328 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
@@ -354,6 +354,21 @@ static ssize_t rtrs_clt_hca_name_show(struct kobject *kobj,
 static struct kobj_attribute rtrs_clt_hca_name_attr =
 	__ATTR(hca_name, 0444, rtrs_clt_hca_name_show, NULL);
 
+static ssize_t rtrs_clt_cur_latency_show(struct kobject *kobj,
+				    struct kobj_attribute *attr,
+				    char *page)
+{
+	struct rtrs_clt_sess *sess;
+
+	sess = container_of(kobj, struct rtrs_clt_sess, kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%lld ns\n",
+			 ktime_to_ns(sess->s.hb_cur_latency));
+}
+
+static struct kobj_attribute rtrs_clt_cur_latency_attr =
+	__ATTR(cur_latency, 0444, rtrs_clt_cur_latency_show, NULL);
+
 static ssize_t rtrs_clt_src_addr_show(struct kobject *kobj,
 				       struct kobj_attribute *attr,
 				       char *page)
@@ -397,6 +412,7 @@ static struct attribute *rtrs_clt_sess_attrs[] = {
 	&rtrs_clt_reconnect_attr.attr,
 	&rtrs_clt_disconnect_attr.attr,
 	&rtrs_clt_remove_path_attr.attr,
+	&rtrs_clt_cur_latency_attr.attr,
 	NULL,
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH for-next 22/22] Documentation/ABI/rtrs-clt: Add descriptions for min-latency policy
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (20 preceding siblings ...)
  2021-03-25 15:33 ` [PATCH for-next 21/22] RDMA/rtrs-clt: new sysfs attribute to print the latency of each path Gioh Kim
@ 2021-03-25 15:33 ` Gioh Kim
  2021-04-01 19:04 ` [PATCH for-next 00/22] Misc update for rtrs Jason Gunthorpe
  22 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-03-25 15:33 UTC (permalink / raw)
  To: linux-rdma
  Cc: bvanassche, leon, dledford, jgg, haris.iqbal, jinpu.wang, Gioh Kim

describe new multipath policy min-latency of the RTRS client.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 Documentation/ABI/testing/sysfs-class-rtrs-client | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-rtrs-client b/Documentation/ABI/testing/sysfs-class-rtrs-client
index 0f7165aab251..49a4157c7bf1 100644
--- a/Documentation/ABI/testing/sysfs-class-rtrs-client
+++ b/Documentation/ABI/testing/sysfs-class-rtrs-client
@@ -34,6 +34,9 @@ Description:	Multipath policy specifies which path should be selected on each IO
 		min-inflight (1):
 		    select path with minimum inflights.
 
+		min-latency (2):
+		    select path with minimum latency.
+
 What:		/sys/class/rtrs-client/<session-name>/paths/
 Date:		Feb 2020
 KernelVersion:	5.7
@@ -95,6 +98,15 @@ KernelVersion:	5.7
 Contact:	Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
 Description:	RO, Contains the destination address of the path
 
+What:		/sys/class/rtrs-client/<session-name>/paths/<src@dst>/cur_latency
+Date:		Feb 2020
+KernelVersion:	5.7
+Contact:	Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
+Description:	RO, Contains the latency time calculated by the heart-beat messages.
+		Whenever the client sends heart-beat message, it checks the time gap
+		between sending the heart-beat message and receiving the ACK.
+		This value can be changed regularly.
+
 What:		/sys/class/rtrs-client/<session-name>/paths/<src@dst>/stats/reset_all
 Date:		Feb 2020
 KernelVersion:	5.7
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 05/22] docs: fault-injection: Add fault-injection manual of RTRS
  2021-03-25 15:32 ` [PATCH for-next 05/22] docs: fault-injection: Add fault-injection manual of RTRS Gioh Kim
@ 2021-04-01 18:37   ` Jason Gunthorpe
  2021-04-01 19:06     ` Gioh Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2021-04-01 18:37 UTC (permalink / raw)
  To: Gioh Kim
  Cc: linux-rdma, bvanassche, leon, dledford, haris.iqbal, jinpu.wang,
	Gioh Kim

On Thu, Mar 25, 2021 at 04:32:51PM +0100, Gioh Kim wrote:
> From: Gioh Kim <gi-oh.kim@cloud.ionos.com>
> 
> It describes how to use the fault-injection of RTRS.
> 
> Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
> ---
>  .../fault-injection/rtrs-fault-injection.rst  | 83 +++++++++++++++++++
>  1 file changed, 83 insertions(+)
>  create mode 100644 Documentation/fault-injection/rtrs-fault-injection.rst

You need to break this giant series up and CC the relavent reviewers.

In this case the series should stop here and CC people interested in
docs and fault injection in general. Look at the maintainer file and
git history.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 06/22] RDMA/rtrs-clt: Break if one sess is connected in rtrs_clt_is_connected
  2021-03-25 15:32 ` [PATCH for-next 06/22] RDMA/rtrs-clt: Break if one sess is connected in rtrs_clt_is_connected Gioh Kim
@ 2021-04-01 18:38   ` Jason Gunthorpe
  2021-04-06 10:23     ` Gioh Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2021-04-01 18:38 UTC (permalink / raw)
  To: Gioh Kim
  Cc: linux-rdma, bvanassche, leon, dledford, haris.iqbal, jinpu.wang,
	Guoqing Jiang, Guoqing Jiang, Danil Kipnis

On Thu, Mar 25, 2021 at 04:32:52PM +0100, Gioh Kim wrote:
> From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> 
> No need to continue the loop if one sess is connected.
> 
> Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
> Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
> Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
>  .../fault-injection/rtrs-fault-injection.rst         | 12 ++++++------
>  drivers/infiniband/ulp/rtrs/rtrs-clt.c               |  5 ++++-
>  2 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/fault-injection/rtrs-fault-injection.rst b/Documentation/fault-injection/rtrs-fault-injection.rst
> index 775a223fef09..65ffe26ece67 100644
> +++ b/Documentation/fault-injection/rtrs-fault-injection.rst
> @@ -17,10 +17,10 @@ Example 1: Inject an error into request processing of rtrs-client
>  
>  Generate an error on one session of rtrs-client::
> -   
> -  echo 100 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/probability 
> -  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/times 
> -  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/fail-request 
> +
> +  echo 100 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/probability
> +  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/times
> +  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/fail-request
>    dd if=/dev/rnbd0 of=./dd bs=1k count=10
>  
>  Expected Result::
> @@ -72,12 +72,12 @@ Example 2: rtrs-server does not send ACK to the heart-beat of rtrs-client
>    echo 100 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/probability 
>    echo 5 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/times 
>    echo 1 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/fail-hb-ack
> -   
> +
>  Expected Result::
>  
>    If rtrs-server does not send ACK more than 5 times, rtrs-client tries reconnection.
>  
>  Check how many times rtrs-client did reconnection::
>  
> -  cat /sys/devices/virtual/rtrs-client/bla/paths/ip\:192.168.122.142@ip\:192.168.122.130/stats/reconnects 
> +  cat /sys/devices/virtual/rtrs-client/bla/paths/ip\:192.168.122.142@ip\:192.168.122.130/stats/reconnects
>    1 0

Why is all of this in this patch?

> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> index 4f7690137e42..bfb5be5481e7 100644
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> @@ -51,7 +51,10 @@ static inline bool rtrs_clt_is_connected(const struct rtrs_clt *clt)
>  
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(sess, &clt->paths_list, s.entry)
> -		connected |= READ_ONCE(sess->state) == RTRS_CLT_CONNECTED;
> +		if (READ_ONCE(sess->state) == RTRS_CLT_CONNECTED) {
> +			connected = true;
> +			break;
> +		}
>  	rcu_read_unlock();

I assume this is the intended change? rebase error?

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
  2021-03-25 15:32 ` [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats Gioh Kim
@ 2021-04-01 18:44   ` Jason Gunthorpe
  2021-04-06  8:55     ` Gioh Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2021-04-01 18:44 UTC (permalink / raw)
  To: Gioh Kim
  Cc: linux-rdma, bvanassche, leon, dledford, haris.iqbal, jinpu.wang,
	Md Haris Iqbal

On Thu, Mar 25, 2021 at 04:32:58PM +0100, Gioh Kim wrote:
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> index 42f49208b8f7..1519191d7154 100644
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> @@ -808,6 +808,9 @@ static struct rtrs_clt_sess *get_next_path_min_inflight(struct path_it *it)
>  	int inflight;
>  
>  	list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) {
> +		if (unlikely(READ_ONCE(sess->state) != RTRS_CLT_CONNECTED))
> +			continue;

There is no way this could be right, a READ_ONCE can't guarentee that
a following load is not going to happen without races.

You need locking.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 14/22] RDMA/rtrs-clt: Print more info when an error happens
  2021-03-25 15:33 ` [PATCH for-next 14/22] RDMA/rtrs-clt: Print more info when an error happens Gioh Kim
@ 2021-04-01 18:46   ` Jason Gunthorpe
  2021-04-01 19:09     ` Gioh Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2021-04-01 18:46 UTC (permalink / raw)
  To: Gioh Kim
  Cc: linux-rdma, bvanassche, leon, dledford, haris.iqbal, jinpu.wang,
	Gioh Kim

On Thu, Mar 25, 2021 at 04:33:00PM +0100, Gioh Kim wrote:
> From: Gioh Kim <gi-oh.kim@cloud.ionos.com>
> 
> Client prints only error value and it is not enough for debugging.
> 
> 1. When client receives an error from server:
> the client does not only print the error value but also
> more information of server connection.
> 
> 2. When client failes to send IO:
> the client gets an error from RDMA layer. It also
> print more information of server connection.
> 
> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
> Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
>  drivers/infiniband/ulp/rtrs/rtrs-clt.c | 31 ++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> index 1519191d7154..a41864ec853d 100644
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> @@ -440,6 +440,11 @@ static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
>  	req->in_use = false;
>  	req->con = NULL;
>  
> +	if (unlikely(errno)) {
> +		rtrs_err_rl(con->c.sess, "IO request failed: error=%d path=%s [%s:%u]\n",
> +			    errno, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
> +	}
> +
>  	if (notify)
>  		req->conf(req->priv, errno);
>  }
> @@ -1026,7 +1031,8 @@ static int rtrs_clt_write_req(struct rtrs_clt_io_req *req)
>  				       req->usr_len + sizeof(*msg),
>  				       imm);
>  	if (unlikely(ret)) {
> -		rtrs_err(s, "Write request failed: %d\n", ret);
> +		rtrs_err_rl(s, "Write request failed: error=%d path=%s [%s:%u]\n",
> +			    ret, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
>  		if (sess->clt->mp_policy == MP_POLICY_MIN_INFLIGHT)
>  			atomic_dec(&sess->stats->inflight);
>  		if (req->sg_cnt)
> @@ -1144,7 +1150,8 @@ static int rtrs_clt_read_req(struct rtrs_clt_io_req *req)
>  	ret = rtrs_post_send_rdma(req->con, req, &sess->rbufs[buf_id],
>  				   req->data_len, imm, wr);
>  	if (unlikely(ret)) {
> -		rtrs_err(s, "Read request failed: %d\n", ret);
> +		rtrs_err_rl(s, "Read request failed: error=%d path=%s [%s:%u]\n",
> +			    ret, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
>  		if (sess->clt->mp_policy == MP_POLICY_MIN_INFLIGHT)
>  			atomic_dec(&sess->stats->inflight);
>  		req->need_inv = false;
> @@ -2465,12 +2472,28 @@ static int init_sess(struct rtrs_clt_sess *sess)
>  	mutex_lock(&sess->init_mutex);
>  	err = init_conns(sess);
>  	if (err) {
> -		rtrs_err(sess->clt, "init_conns(), err: %d\n", err);
> +		char str[NAME_MAX];
> +		int err;
> +		struct rtrs_addr path = {
> +			.src = &sess->s.src_addr,
> +			.dst = &sess->s.dst_addr,
> +		};
> +		rtrs_addr_to_str(&path, str, sizeof(str));

Coding style is to have new lines after variable blocks.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 17/22] RDMA/rtrs: cleanup unused variable
  2021-03-25 15:33 ` [PATCH for-next 17/22] RDMA/rtrs: cleanup unused variable Gioh Kim
@ 2021-04-01 18:50   ` Jason Gunthorpe
  0 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2021-04-01 18:50 UTC (permalink / raw)
  To: Gioh Kim
  Cc: linux-rdma, bvanassche, leon, dledford, haris.iqbal, jinpu.wang,
	Jack Wang

On Thu, Mar 25, 2021 at 04:33:03PM +0100, Gioh Kim wrote:
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs.c b/drivers/infiniband/ulp/rtrs/rtrs.c
> index 1dd772d84e71..bc08b7f6e5e2 100644
> +++ b/drivers/infiniband/ulp/rtrs/rtrs.c
> @@ -466,7 +466,7 @@ EXPORT_SYMBOL(sockaddr_to_str);
>  /**
>   * rtrs_addr_to_str() - convert rtrs_addr to a string "src@dst"
>   * @addr:	the rtrs_addr structure to be converted
> - * @str:	string containing source and destination addr of a path
> + * @buf:	string containing source and destination addr of a path
>   *		separated by '@' I.e. "ip:1.1.1.1@ip:1.1.1.2"
>   *		"ip:1.1.1.1@ip:1.1.1.2".
>   * @len:	string length

This is a rebasing error. Please read your own patches before you send
them, I fixed it

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 00/22] Misc update for rtrs
  2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
                   ` (21 preceding siblings ...)
  2021-03-25 15:33 ` [PATCH for-next 22/22] Documentation/ABI/rtrs-clt: Add descriptions for min-latency policy Gioh Kim
@ 2021-04-01 19:04 ` Jason Gunthorpe
  2021-04-06  9:04   ` Gioh Kim
  22 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2021-04-01 19:04 UTC (permalink / raw)
  To: Gioh Kim; +Cc: linux-rdma, bvanassche, leon, dledford, haris.iqbal, jinpu.wang

On Thu, Mar 25, 2021 at 04:32:46PM +0100, Gioh Kim wrote:
> Hi Jason, hi Doug,
> 
> Please consider to include following changes to the next merge window.
> 
> It contains a few bugfix and cleanup:
> - Change maintainer
> - Change domain address of maintainers' email: from cloud.ionos.com to ionos.com
> - Add some fault-injection points and document update
> - New policy for path finding: min-latency and document update
> - Code refactoring to remove unused code and better error message 

>   RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt
>     session files

This one is for RC, and you need to add Ffixes lines when you fix
things, I put it there.

>   MAINTAINERS: Change maintainer for rtrs module

>   RDMA/rtrs: Enable the fault-injection
>   RDMA/rtrs-clt: Inject a fault at request processing
>   RDMA/rtrs-srv: Inject a fault at heart-beat sending
>   docs: fault-injection: Add fault-injection manual of RTRS

These should be a series

>   RDMA/rtrs-clt: Remove redundant code from rtrs_clt_read_req
>   RDMA/rtrs: Kill the put label in
>     rtrs_srv_create_once_sysfs_root_folders
>   RDMA/rtrs: Remove sessname and sess_kobj from rtrs_attrs
>   RDMA/rtrs: Cleanup the code in rtrs_srv_rdma_cm_handler
>   RDMA/rtrs: New function converting rtrs_addr to string
>   RDMA/rtrs-srv: Report temporary sessname for error message
>   RDMA/rtrs: cleanup unused variable
>   RDMA/rtrs-clt: Cap max_io_size

I applied these trivial patches to for-next. Please pay attention to commit
messages, these are becoming hard to understand.

>   RDMA/rtrs-srv: More debugging info when fail to send reply
>   RDMA/rtrs-clt: Print more info when an error happens
>   RDMA/rtrs-clt: Simplify error message

This is a reasonble series about improving debugging

>   RDMA/rtrs-clt: Add a minimum latency multipath policy
>   RDMA/rtrs-clt: new sysfs attribute to print the latency of each path
>   Documentation/ABI/rtrs-clt: Add descriptions for min-latency policy

This is a series

Everything I didn't apply needs to be rebased/resent/fixed.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 05/22] docs: fault-injection: Add fault-injection manual of RTRS
  2021-04-01 18:37   ` Jason Gunthorpe
@ 2021-04-01 19:06     ` Gioh Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-04-01 19:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, bvanassche, leon, Doug Ledford, Haris Iqbal,
	Jinpu Wang, Gioh Kim

On Thu, Apr 1, 2021 at 8:37 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Mar 25, 2021 at 04:32:51PM +0100, Gioh Kim wrote:
> > From: Gioh Kim <gi-oh.kim@cloud.ionos.com>
> >
> > It describes how to use the fault-injection of RTRS.
> >
> > Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
> > ---
> >  .../fault-injection/rtrs-fault-injection.rst  | 83 +++++++++++++++++++
> >  1 file changed, 83 insertions(+)
> >  create mode 100644 Documentation/fault-injection/rtrs-fault-injection.rst
>
> You need to break this giant series up and CC the relavent reviewers.
>
> In this case the series should stop here and CC people interested in
> docs and fault injection in general. Look at the maintainer file and
> git history.
>
> Jason

Hi Jason,

Ok, I will send patch series v3 excluding patches for the fault injection.

The patch series for fault injection code and documents will include a CC list
of people relevant to the fault injection feature.

Thank you for the review.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 14/22] RDMA/rtrs-clt: Print more info when an error happens
  2021-04-01 18:46   ` Jason Gunthorpe
@ 2021-04-01 19:09     ` Gioh Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-04-01 19:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, bvanassche, leon, Doug Ledford, Haris Iqbal,
	Jinpu Wang, Gioh Kim

On Thu, Apr 1, 2021 at 8:46 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Mar 25, 2021 at 04:33:00PM +0100, Gioh Kim wrote:
> > From: Gioh Kim <gi-oh.kim@cloud.ionos.com>
> >
> > Client prints only error value and it is not enough for debugging.
> >
> > 1. When client receives an error from server:
> > the client does not only print the error value but also
> > more information of server connection.
> >
> > 2. When client failes to send IO:
> > the client gets an error from RDMA layer. It also
> > print more information of server connection.
> >
> > Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
> > Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
> >  drivers/infiniband/ulp/rtrs/rtrs-clt.c | 31 ++++++++++++++++++++++----
> >  1 file changed, 27 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > index 1519191d7154..a41864ec853d 100644
> > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > @@ -440,6 +440,11 @@ static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
> >       req->in_use = false;
> >       req->con = NULL;
> >
> > +     if (unlikely(errno)) {
> > +             rtrs_err_rl(con->c.sess, "IO request failed: error=%d path=%s [%s:%u]\n",
> > +                         errno, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
> > +     }
> > +
> >       if (notify)
> >               req->conf(req->priv, errno);
> >  }
> > @@ -1026,7 +1031,8 @@ static int rtrs_clt_write_req(struct rtrs_clt_io_req *req)
> >                                      req->usr_len + sizeof(*msg),
> >                                      imm);
> >       if (unlikely(ret)) {
> > -             rtrs_err(s, "Write request failed: %d\n", ret);
> > +             rtrs_err_rl(s, "Write request failed: error=%d path=%s [%s:%u]\n",
> > +                         ret, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
> >               if (sess->clt->mp_policy == MP_POLICY_MIN_INFLIGHT)
> >                       atomic_dec(&sess->stats->inflight);
> >               if (req->sg_cnt)
> > @@ -1144,7 +1150,8 @@ static int rtrs_clt_read_req(struct rtrs_clt_io_req *req)
> >       ret = rtrs_post_send_rdma(req->con, req, &sess->rbufs[buf_id],
> >                                  req->data_len, imm, wr);
> >       if (unlikely(ret)) {
> > -             rtrs_err(s, "Read request failed: %d\n", ret);
> > +             rtrs_err_rl(s, "Read request failed: error=%d path=%s [%s:%u]\n",
> > +                         ret, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
> >               if (sess->clt->mp_policy == MP_POLICY_MIN_INFLIGHT)
> >                       atomic_dec(&sess->stats->inflight);
> >               req->need_inv = false;
> > @@ -2465,12 +2472,28 @@ static int init_sess(struct rtrs_clt_sess *sess)
> >       mutex_lock(&sess->init_mutex);
> >       err = init_conns(sess);
> >       if (err) {
> > -             rtrs_err(sess->clt, "init_conns(), err: %d\n", err);
> > +             char str[NAME_MAX];
> > +             int err;
> > +             struct rtrs_addr path = {
> > +                     .src = &sess->s.src_addr,
> > +                     .dst = &sess->s.dst_addr,
> > +             };
> > +             rtrs_addr_to_str(&path, str, sizeof(str));
>
> Coding style is to have new lines after variable blocks.
>
> Jason

Hi Jason,

Ok, I will fix it in v3.
Thank you for the review.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
  2021-04-01 18:44   ` Jason Gunthorpe
@ 2021-04-06  8:55     ` Gioh Kim
  2021-04-08 12:04       ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Gioh Kim @ 2021-04-06  8:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, bvanassche, leon, Doug Ledford, Haris Iqbal,
	Jinpu Wang, Md Haris Iqbal

On Thu, Apr 1, 2021 at 8:44 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Mar 25, 2021 at 04:32:58PM +0100, Gioh Kim wrote:
> > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > index 42f49208b8f7..1519191d7154 100644
> > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > @@ -808,6 +808,9 @@ static struct rtrs_clt_sess *get_next_path_min_inflight(struct path_it *it)
> >       int inflight;
> >
> >       list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) {
> > +             if (unlikely(READ_ONCE(sess->state) != RTRS_CLT_CONNECTED))
> > +                     continue;
>
> There is no way this could be right, a READ_ONCE can't guarentee that
> a following load is not going to happen without races.
>
> You need locking.

Hi Jason,

rtrs_clt_request() calls rcu_read_lock() before calling
get_next_path_min_inflight().
And rtrs_clt_change_state_from_to(), that changes the sess->state,
calls spin_lock_irq() before changing it.
I think that is enough, isn't it?


>
> Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 00/22] Misc update for rtrs
  2021-04-01 19:04 ` [PATCH for-next 00/22] Misc update for rtrs Jason Gunthorpe
@ 2021-04-06  9:04   ` Gioh Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-04-06  9:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, bvanassche, leon, Doug Ledford, Haris Iqbal, Jinpu Wang

On Thu, Apr 1, 2021 at 9:04 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Mar 25, 2021 at 04:32:46PM +0100, Gioh Kim wrote:
> > Hi Jason, hi Doug,
> >
> > Please consider to include following changes to the next merge window.
> >
> > It contains a few bugfix and cleanup:
> > - Change maintainer
> > - Change domain address of maintainers' email: from cloud.ionos.com to ionos.com
> > - Add some fault-injection points and document update
> > - New policy for path finding: min-latency and document update
> > - Code refactoring to remove unused code and better error message
>
> >   RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt
> >     session files
>
> This one is for RC, and you need to add Ffixes lines when you fix
> things, I put it there.

Got it. Thank you.

>
> >   MAINTAINERS: Change maintainer for rtrs module
>
> >   RDMA/rtrs: Enable the fault-injection
> >   RDMA/rtrs-clt: Inject a fault at request processing
> >   RDMA/rtrs-srv: Inject a fault at heart-beat sending
> >   docs: fault-injection: Add fault-injection manual of RTRS
>
> These should be a series
Ok, I will split it and send another patch set.


>
> >   RDMA/rtrs-clt: Remove redundant code from rtrs_clt_read_req
> >   RDMA/rtrs: Kill the put label in
> >     rtrs_srv_create_once_sysfs_root_folders
> >   RDMA/rtrs: Remove sessname and sess_kobj from rtrs_attrs
> >   RDMA/rtrs: Cleanup the code in rtrs_srv_rdma_cm_handler
> >   RDMA/rtrs: New function converting rtrs_addr to string
> >   RDMA/rtrs-srv: Report temporary sessname for error message
> >   RDMA/rtrs: cleanup unused variable
> >   RDMA/rtrs-clt: Cap max_io_size
>
> I applied these trivial patches to for-next. Please pay attention to commit
> messages, these are becoming hard to understand.

Got it. Thank you.

>
> >   RDMA/rtrs-srv: More debugging info when fail to send reply
> >   RDMA/rtrs-clt: Print more info when an error happens
> >   RDMA/rtrs-clt: Simplify error message
>
> This is a reasonble series about improving debugging

Ok, I will send a separate patch set.

>
> >   RDMA/rtrs-clt: Add a minimum latency multipath policy
> >   RDMA/rtrs-clt: new sysfs attribute to print the latency of each path
> >   Documentation/ABI/rtrs-clt: Add descriptions for min-latency policy
>
> This is a series

This also will be sent as a separate set.

In summary, I will send 3 patch-sets: fault-injection, improbe
debugging, min-latency policy.
Thank you.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 06/22] RDMA/rtrs-clt: Break if one sess is connected in rtrs_clt_is_connected
  2021-04-01 18:38   ` Jason Gunthorpe
@ 2021-04-06 10:23     ` Gioh Kim
  2021-04-06 12:51       ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Gioh Kim @ 2021-04-06 10:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, bvanassche, leon, Doug Ledford, Haris Iqbal,
	Jinpu Wang, Guoqing Jiang, Guoqing Jiang, Danil Kipnis

[-- Attachment #1: Type: text/plain, Size: 4818 bytes --]

On Thu, Apr 1, 2021 at 8:38 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Mar 25, 2021 at 04:32:52PM +0100, Gioh Kim wrote:
> > From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> >
> > No need to continue the loop if one sess is connected.
> >
> > Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
> > Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
> > Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
> > Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
> >  .../fault-injection/rtrs-fault-injection.rst         | 12 ++++++------
> >  drivers/infiniband/ulp/rtrs/rtrs-clt.c               |  5 ++++-
> >  2 files changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/fault-injection/rtrs-fault-injection.rst b/Documentation/fault-injection/rtrs-fault-injection.rst
> > index 775a223fef09..65ffe26ece67 100644
> > +++ b/Documentation/fault-injection/rtrs-fault-injection.rst
> > @@ -17,10 +17,10 @@ Example 1: Inject an error into request processing of rtrs-client
> >
> >  Generate an error on one session of rtrs-client::
> > -
> > -  echo 100 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/probability
> > -  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/times
> > -  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/fail-request
> > +
> > +  echo 100 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/probability
> > +  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/times
> > +  echo 1 > /sys/kernel/debug/ip\:192.168.123.144@ip\:192.168.123.190/fault_inject/fail-request
> >    dd if=/dev/rnbd0 of=./dd bs=1k count=10
> >
> >  Expected Result::
> > @@ -72,12 +72,12 @@ Example 2: rtrs-server does not send ACK to the heart-beat of rtrs-client
> >    echo 100 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/probability
> >    echo 5 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/times
> >    echo 1 > /sys/kernel/debug/ip\:192.168.123.190@ip\:192.168.123.144/fault_inject/fail-hb-ack
> > -
> > +
> >  Expected Result::
> >
> >    If rtrs-server does not send ACK more than 5 times, rtrs-client tries reconnection.
> >
> >  Check how many times rtrs-client did reconnection::
> >
> > -  cat /sys/devices/virtual/rtrs-client/bla/paths/ip\:192.168.122.142@ip\:192.168.122.130/stats/reconnects
> > +  cat /sys/devices/virtual/rtrs-client/bla/paths/ip\:192.168.122.142@ip\:192.168.122.130/stats/reconnects
> >    1 0
>
> Why is all of this in this patch?

I am terribly sorry. That is a rebase error.

>
> > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > index 4f7690137e42..bfb5be5481e7 100644
> > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > @@ -51,7 +51,10 @@ static inline bool rtrs_clt_is_connected(const struct rtrs_clt *clt)
> >
> >       rcu_read_lock();
> >       list_for_each_entry_rcu(sess, &clt->paths_list, s.entry)
> > -             connected |= READ_ONCE(sess->state) == RTRS_CLT_CONNECTED;
> > +             if (READ_ONCE(sess->state) == RTRS_CLT_CONNECTED) {
> > +                     connected = true;
> > +                     break;
> > +             }
> >       rcu_read_unlock();
>
> I assume this is the intended change? rebase error?

It is a rebase error.
Just in case, I copied the corrected patch below and also attached it.
If you want me to send the patch again, please inform me.

------------------------------------------------- 8<
----------------------------------------------------------

From 0362520ca0f974ff787a13b3fc36ff4dbef08aad Mon Sep 17 00:00:00 2001
From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Date: Sun, 6 Dec 2020 04:20:58 +0100
Subject: [PATCH] RDMA/rtrs-clt: Break if one sess is connected in
 rtrs_clt_is_connected

No need to continue the loop if one sess is connected.

Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 5062328ac577..1b75d2e4860e 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -51,7 +51,10 @@ static inline bool rtrs_clt_is_connected(const
struct rtrs_clt *clt)

  rcu_read_lock();
  list_for_each_entry_rcu(sess, &clt->paths_list, s.entry)
- connected |= READ_ONCE(sess->state) == RTRS_CLT_CONNECTED;
+ if (READ_ONCE(sess->state) == RTRS_CLT_CONNECTED) {
+ connected = true;
+ break;
+ }
  rcu_read_unlock();

  return connected;
-- 
2.25.1

[-- Attachment #2: 0001-RDMA-rtrs-clt-Break-if-one-sess-is-connected-in-rtrs.patch --]
[-- Type: text/x-patch, Size: 1210 bytes --]

From 0362520ca0f974ff787a13b3fc36ff4dbef08aad Mon Sep 17 00:00:00 2001
From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Date: Sun, 6 Dec 2020 04:20:58 +0100
Subject: [PATCH] RDMA/rtrs-clt: Break if one sess is connected in
 rtrs_clt_is_connected

No need to continue the loop if one sess is connected.

Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 5062328ac577..1b75d2e4860e 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -51,7 +51,10 @@ static inline bool rtrs_clt_is_connected(const struct rtrs_clt *clt)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(sess, &clt->paths_list, s.entry)
-		connected |= READ_ONCE(sess->state) == RTRS_CLT_CONNECTED;
+		if (READ_ONCE(sess->state) == RTRS_CLT_CONNECTED) {
+			connected = true;
+			break;
+		}
 	rcu_read_unlock();
 
 	return connected;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 06/22] RDMA/rtrs-clt: Break if one sess is connected in rtrs_clt_is_connected
  2021-04-06 10:23     ` Gioh Kim
@ 2021-04-06 12:51       ` Jason Gunthorpe
  2021-04-06 12:53         ` Gioh Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2021-04-06 12:51 UTC (permalink / raw)
  To: Gioh Kim
  Cc: linux-rdma, bvanassche, leon, Doug Ledford, Haris Iqbal,
	Jinpu Wang, Guoqing Jiang, Guoqing Jiang, Danil Kipnis

On Tue, Apr 06, 2021 at 12:23:49PM +0200, Gioh Kim wrote:

> It is a rebase error.
> Just in case, I copied the corrected patch below and also attached it.
> If you want me to send the patch again, please inform me.

Follow along on https://patchwork.kernel.org/project/linux-rdma/list/

If your patch isn't there you need to fix something and resend it

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 06/22] RDMA/rtrs-clt: Break if one sess is connected in rtrs_clt_is_connected
  2021-04-06 12:51       ` Jason Gunthorpe
@ 2021-04-06 12:53         ` Gioh Kim
  2021-04-06 12:59           ` Gioh Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Gioh Kim @ 2021-04-06 12:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, bvanassche, leon, Doug Ledford, Haris Iqbal,
	Jinpu Wang, Guoqing Jiang, Guoqing Jiang, Danil Kipnis

On Tue, Apr 6, 2021 at 2:51 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Apr 06, 2021 at 12:23:49PM +0200, Gioh Kim wrote:
>
> > It is a rebase error.
> > Just in case, I copied the corrected patch below and also attached it.
> > If you want me to send the patch again, please inform me.
>
> Follow along on https://patchwork.kernel.org/project/linux-rdma/list/
>
> If your patch isn't there you need to fix something and resend it
>
> Jason

Ok, I will resend that patch.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 06/22] RDMA/rtrs-clt: Break if one sess is connected in rtrs_clt_is_connected
  2021-04-06 12:53         ` Gioh Kim
@ 2021-04-06 12:59           ` Gioh Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-04-06 12:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, bvanassche, leon, Doug Ledford, Haris Iqbal,
	Jinpu Wang, Guoqing Jiang, Guoqing Jiang, Danil Kipnis

On Tue, Apr 6, 2021 at 2:53 PM Gioh Kim <gi-oh.kim@ionos.com> wrote:
>
> On Tue, Apr 6, 2021 at 2:51 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Tue, Apr 06, 2021 at 12:23:49PM +0200, Gioh Kim wrote:
> >
> > > It is a rebase error.
> > > Just in case, I copied the corrected patch below and also attached it.
> > > If you want me to send the patch again, please inform me.
> >
> > Follow along on https://patchwork.kernel.org/project/linux-rdma/list/
> >
> > If your patch isn't there you need to fix something and resend it
> >
> > Jason
>
> Ok, I will resend that patch.

Thank you. I just sent it.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
  2021-04-06  8:55     ` Gioh Kim
@ 2021-04-08 12:04       ` Jason Gunthorpe
  2021-04-08 12:08         ` Gioh Kim
  2021-04-08 13:45         ` Jinpu Wang
  0 siblings, 2 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2021-04-08 12:04 UTC (permalink / raw)
  To: Gioh Kim
  Cc: linux-rdma, bvanassche, leon, Doug Ledford, Haris Iqbal,
	Jinpu Wang, Md Haris Iqbal

On Tue, Apr 06, 2021 at 10:55:59AM +0200, Gioh Kim wrote:
> On Thu, Apr 1, 2021 at 8:44 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Thu, Mar 25, 2021 at 04:32:58PM +0100, Gioh Kim wrote:
> > > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > index 42f49208b8f7..1519191d7154 100644
> > > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > @@ -808,6 +808,9 @@ static struct rtrs_clt_sess *get_next_path_min_inflight(struct path_it *it)
> > >       int inflight;
> > >
> > >       list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) {
> > > +             if (unlikely(READ_ONCE(sess->state) != RTRS_CLT_CONNECTED))
> > > +                     continue;
> >
> > There is no way this could be right, a READ_ONCE can't guarentee that
> > a following load is not going to happen without races.
> >
> > You need locking.
> 
> Hi Jason,
> 
> rtrs_clt_request() calls rcu_read_lock() before calling
> get_next_path_min_inflight().
> And rtrs_clt_change_state_from_to(), that changes the sess->state,
> calls spin_lock_irq() before changing it.
> I think that is enough, isn't it?

Why would that be enough?

Under RCU this check is racy and effetively does nothing.

This is an OK usage of RCU:

	list_del_rcu(&sess->s.entry);

	/* Make sure everybody observes path removal. */
	synchronize_rcu();

And you could say that observing the sess in the list is required, but
checking state is pointless.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
  2021-04-08 12:04       ` Jason Gunthorpe
@ 2021-04-08 12:08         ` Gioh Kim
  2021-04-08 13:45         ` Jinpu Wang
  1 sibling, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-04-08 12:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, Bart Van Assche, Leon Romanovsky, Doug Ledford,
	Haris Iqbal, Jinpu Wang, Md Haris Iqbal

On Thu, Apr 8, 2021 at 2:04 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Apr 06, 2021 at 10:55:59AM +0200, Gioh Kim wrote:
> > On Thu, Apr 1, 2021 at 8:44 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Thu, Mar 25, 2021 at 04:32:58PM +0100, Gioh Kim wrote:
> > > > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > > index 42f49208b8f7..1519191d7154 100644
> > > > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > > @@ -808,6 +808,9 @@ static struct rtrs_clt_sess *get_next_path_min_inflight(struct path_it *it)
> > > >       int inflight;
> > > >
> > > >       list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) {
> > > > +             if (unlikely(READ_ONCE(sess->state) != RTRS_CLT_CONNECTED))
> > > > +                     continue;
> > >
> > > There is no way this could be right, a READ_ONCE can't guarentee that
> > > a following load is not going to happen without races.
> > >
> > > You need locking.
> >
> > Hi Jason,
> >
> > rtrs_clt_request() calls rcu_read_lock() before calling
> > get_next_path_min_inflight().
> > And rtrs_clt_change_state_from_to(), that changes the sess->state,
> > calls spin_lock_irq() before changing it.
> > I think that is enough, isn't it?
>
> Why would that be enough?
>
> Under RCU this check is racy and effetively does nothing.

Thank you for your review.
I will have a discussion with my colleagues and let you know the result.


>
> This is an OK usage of RCU:
>
>         list_del_rcu(&sess->s.entry);
>
>         /* Make sure everybody observes path removal. */
>         synchronize_rcu();
>
> And you could say that observing the sess in the list is required, but
> checking state is pointless.
>
> Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
  2021-04-08 12:04       ` Jason Gunthorpe
  2021-04-08 12:08         ` Gioh Kim
@ 2021-04-08 13:45         ` Jinpu Wang
  2021-04-08 13:50           ` Jason Gunthorpe
  1 sibling, 1 reply; 44+ messages in thread
From: Jinpu Wang @ 2021-04-08 13:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Gioh Kim, linux-rdma, Bart Van Assche, Leon Romanovsky,
	Doug Ledford, Haris Iqbal, Md Haris Iqbal

On Thu, Apr 8, 2021 at 2:04 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Apr 06, 2021 at 10:55:59AM +0200, Gioh Kim wrote:
> > On Thu, Apr 1, 2021 at 8:44 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Thu, Mar 25, 2021 at 04:32:58PM +0100, Gioh Kim wrote:
> > > > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > > index 42f49208b8f7..1519191d7154 100644
> > > > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > > @@ -808,6 +808,9 @@ static struct rtrs_clt_sess *get_next_path_min_inflight(struct path_it *it)
> > > >       int inflight;
> > > >
> > > >       list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) {
> > > > +             if (unlikely(READ_ONCE(sess->state) != RTRS_CLT_CONNECTED))
> > > > +                     continue;
> > >
> > > There is no way this could be right, a READ_ONCE can't guarentee that
> > > a following load is not going to happen without races.
> > >
> > > You need locking.
> >
> > Hi Jason,
> >
> > rtrs_clt_request() calls rcu_read_lock() before calling
> > get_next_path_min_inflight().
> > And rtrs_clt_change_state_from_to(), that changes the sess->state,
> > calls spin_lock_irq() before changing it.
> > I think that is enough, isn't it?
>
> Why would that be enough?
>
> Under RCU this check is racy and effetively does nothing.
>
> This is an OK usage of RCU:
>
>         list_del_rcu(&sess->s.entry);
>
>         /* Make sure everybody observes path removal. */
>         synchronize_rcu();
>
> And you could say that observing the sess in the list is required, but
> checking state is pointless.
>
> Jason
Hi Jason

Sending IO via disconnected session is not a problem, it will just get
an error. It's only about if in the meantime user delete the path
while IO running, and session is freed.

There is session state machine it will go CONNECTED to CLOSED will be
a few steps, disconnect, drain QP/etc,  the cpu will get the latest
state sooner or later.
We don't really care if there's a few cycles sess->state is no up to date.

Regards!

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
  2021-04-08 13:45         ` Jinpu Wang
@ 2021-04-08 13:50           ` Jason Gunthorpe
  2021-04-08 14:44             ` Gioh Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2021-04-08 13:50 UTC (permalink / raw)
  To: Jinpu Wang
  Cc: Gioh Kim, linux-rdma, Bart Van Assche, Leon Romanovsky,
	Doug Ledford, Haris Iqbal, Md Haris Iqbal

On Thu, Apr 08, 2021 at 03:45:45PM +0200, Jinpu Wang wrote:
> On Thu, Apr 8, 2021 at 2:04 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Tue, Apr 06, 2021 at 10:55:59AM +0200, Gioh Kim wrote:
> > > On Thu, Apr 1, 2021 at 8:44 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > > >
> > > > On Thu, Mar 25, 2021 at 04:32:58PM +0100, Gioh Kim wrote:
> > > > > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > > > index 42f49208b8f7..1519191d7154 100644
> > > > > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > > > @@ -808,6 +808,9 @@ static struct rtrs_clt_sess *get_next_path_min_inflight(struct path_it *it)
> > > > >       int inflight;
> > > > >
> > > > >       list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) {
> > > > > +             if (unlikely(READ_ONCE(sess->state) != RTRS_CLT_CONNECTED))
> > > > > +                     continue;
> > > >
> > > > There is no way this could be right, a READ_ONCE can't guarentee that
> > > > a following load is not going to happen without races.
> > > >
> > > > You need locking.
> > >
> > > Hi Jason,
> > >
> > > rtrs_clt_request() calls rcu_read_lock() before calling
> > > get_next_path_min_inflight().
> > > And rtrs_clt_change_state_from_to(), that changes the sess->state,
> > > calls spin_lock_irq() before changing it.
> > > I think that is enough, isn't it?
> >
> > Why would that be enough?
> >
> > Under RCU this check is racy and effetively does nothing.
> >
> > This is an OK usage of RCU:
> >
> >         list_del_rcu(&sess->s.entry);
> >
> >         /* Make sure everybody observes path removal. */
> >         synchronize_rcu();
> >
> > And you could say that observing the sess in the list is required, but
> > checking state is pointless.
> >
> > Jason
> Hi Jason
> 
> Sending IO via disconnected session is not a problem, it will just get
> an error. It's only about if in the meantime user delete the path
> while IO running, and session is freed.

But the session can toggle to !connected immediately after the test
above as well, so I still don't see what this is accomplishing.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
  2021-04-08 13:50           ` Jason Gunthorpe
@ 2021-04-08 14:44             ` Gioh Kim
  2021-04-08 14:51               ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Gioh Kim @ 2021-04-08 14:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jinpu Wang, linux-rdma, Bart Van Assche, Leon Romanovsky,
	Doug Ledford, Haris Iqbal, Md Haris Iqbal

On Thu, Apr 8, 2021 at 3:50 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Apr 08, 2021 at 03:45:45PM +0200, Jinpu Wang wrote:
> > On Thu, Apr 8, 2021 at 2:04 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Tue, Apr 06, 2021 at 10:55:59AM +0200, Gioh Kim wrote:
> > > > On Thu, Apr 1, 2021 at 8:44 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > > > >
> > > > > On Thu, Mar 25, 2021 at 04:32:58PM +0100, Gioh Kim wrote:
> > > > > > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > > > > index 42f49208b8f7..1519191d7154 100644
> > > > > > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > > > > @@ -808,6 +808,9 @@ static struct rtrs_clt_sess *get_next_path_min_inflight(struct path_it *it)
> > > > > >       int inflight;
> > > > > >
> > > > > >       list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) {
> > > > > > +             if (unlikely(READ_ONCE(sess->state) != RTRS_CLT_CONNECTED))
> > > > > > +                     continue;
> > > > >
> > > > > There is no way this could be right, a READ_ONCE can't guarentee that
> > > > > a following load is not going to happen without races.
> > > > >
> > > > > You need locking.
> > > >
> > > > Hi Jason,
> > > >
> > > > rtrs_clt_request() calls rcu_read_lock() before calling
> > > > get_next_path_min_inflight().
> > > > And rtrs_clt_change_state_from_to(), that changes the sess->state,
> > > > calls spin_lock_irq() before changing it.
> > > > I think that is enough, isn't it?
> > >
> > > Why would that be enough?
> > >
> > > Under RCU this check is racy and effetively does nothing.
> > >
> > > This is an OK usage of RCU:
> > >
> > >         list_del_rcu(&sess->s.entry);
> > >
> > >         /* Make sure everybody observes path removal. */
> > >         synchronize_rcu();
> > >
> > > And you could say that observing the sess in the list is required, but
> > > checking state is pointless.
> > >
> > > Jason
> > Hi Jason
> >
> > Sending IO via disconnected session is not a problem, it will just get
> > an error. It's only about if in the meantime user delete the path
> > while IO running, and session is freed.
>
> But the session can toggle to !connected immediately after the test
> above as well, so I still don't see what this is accomplishing.
>
> Jason

 The original problem was
- get_next_path_min_inflight() checks sess->state == CONNECTED
- rtrs_clt_remove_path_from_sysfs() set sess->state = RTRS_CLT_DEAD
- rtrs_clt_remove_path_from_sysfs() ->  rtrs_clt_destroy_sess_files()
-> kobject_put(sess->stats->kobj_stats) -> free sess->stats
- get_next_path_min_inflight() read sess->stats->inflight

So the patch adds checking sess->state in get_next_path_min_inflight().

I think the same problem happens after rtrs_clt_request().
- rtrs_clt_request() checks sess->state == CONNECTED
- rtrs_clt_remove_path_from_sysfs() set sess->state = RTRS_CLT_DEAD
- rtrs_clt_remove_path_from_sysfs() ->  rtrs_clt_destroy_sess_files()
-> kobject_put(sess->stats->kobj_stats) -> free sess->stats
- rtrs_clt_request()->rtrs_clt_read_req() reads sess->stats->inflight

I think it might be a solution to change the
rtrs_clt_remove_path_from_sysfs as below.
It changes the order: first remove the session from list and then
destroy sess->stat memory.

@@ -2900,8 +2900,8 @@ int rtrs_clt_remove_path_from_sysfs(struct
rtrs_clt_sess *sess,
        } while (!changed && old_state != RTRS_CLT_DEAD);

        if (likely(changed)) {
-               rtrs_clt_destroy_sess_files(sess, sysfs_self);
                rtrs_clt_remove_path_from_arr(sess);
+               rtrs_clt_destroy_sess_files(sess, sysfs_self);
                kobject_put(&sess->kobj);
        }

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
  2021-04-08 14:44             ` Gioh Kim
@ 2021-04-08 14:51               ` Jason Gunthorpe
  2021-04-12  8:41                 ` Gioh Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2021-04-08 14:51 UTC (permalink / raw)
  To: Gioh Kim
  Cc: Jinpu Wang, linux-rdma, Bart Van Assche, Leon Romanovsky,
	Doug Ledford, Haris Iqbal, Md Haris Iqbal

On Thu, Apr 08, 2021 at 04:44:02PM +0200, Gioh Kim wrote:

> I think it might be a solution to change the
> rtrs_clt_remove_path_from_sysfs as below.  It changes the order:
> first remove the session from list and then destroy sess->stat
> memory.

Freeing memory used under a RCU lock before doing a RCU
synchronization is an error, so at least this could make sense to me

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
  2021-04-08 14:51               ` Jason Gunthorpe
@ 2021-04-12  8:41                 ` Gioh Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Gioh Kim @ 2021-04-12  8:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jinpu Wang, linux-rdma, Bart Van Assche, Leon Romanovsky,
	Doug Ledford, Haris Iqbal, Md Haris Iqbal

On Thu, Apr 8, 2021 at 4:51 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Apr 08, 2021 at 04:44:02PM +0200, Gioh Kim wrote:
>
> > I think it might be a solution to change the
> > rtrs_clt_remove_path_from_sysfs as below.  It changes the order:
> > first remove the session from list and then destroy sess->stat
> > memory.
>
> Freeing memory used under a RCU lock before doing a RCU
> synchronization is an error, so at least this could make sense to me
>
> Jason

Hi Jason,

I just sent a patch "RDMA/rtrs-clt: destroy sysfs after removing
session from active list".
Thank you for the review.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2021-04-12  8:42 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-25 15:32 [PATCH for-next 00/22] Misc update for rtrs Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 01/22] MAINTAINERS: Change maintainer for rtrs module Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 02/22] RDMA/rtrs: Enable the fault-injection Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 03/22] RDMA/rtrs-clt: Inject a fault at request processing Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 04/22] RDMA/rtrs-srv: Inject a fault at heart-beat sending Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 05/22] docs: fault-injection: Add fault-injection manual of RTRS Gioh Kim
2021-04-01 18:37   ` Jason Gunthorpe
2021-04-01 19:06     ` Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 06/22] RDMA/rtrs-clt: Break if one sess is connected in rtrs_clt_is_connected Gioh Kim
2021-04-01 18:38   ` Jason Gunthorpe
2021-04-06 10:23     ` Gioh Kim
2021-04-06 12:51       ` Jason Gunthorpe
2021-04-06 12:53         ` Gioh Kim
2021-04-06 12:59           ` Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 07/22] RDMA/rtrs-clt: Remove redundant code from rtrs_clt_read_req Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 08/22] RDMA/rtrs: Kill the put label in rtrs_srv_create_once_sysfs_root_folders Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 09/22] RDMA/rtrs: Remove sessname and sess_kobj from rtrs_attrs Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 10/22] RDMA/rtrs: Cleanup the code in rtrs_srv_rdma_cm_handler Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 11/22] RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session files Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 12/22] RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats Gioh Kim
2021-04-01 18:44   ` Jason Gunthorpe
2021-04-06  8:55     ` Gioh Kim
2021-04-08 12:04       ` Jason Gunthorpe
2021-04-08 12:08         ` Gioh Kim
2021-04-08 13:45         ` Jinpu Wang
2021-04-08 13:50           ` Jason Gunthorpe
2021-04-08 14:44             ` Gioh Kim
2021-04-08 14:51               ` Jason Gunthorpe
2021-04-12  8:41                 ` Gioh Kim
2021-03-25 15:32 ` [PATCH for-next 13/22] RDMA/rtrs: New function converting rtrs_addr to string Gioh Kim
2021-03-25 15:33 ` [PATCH for-next 14/22] RDMA/rtrs-clt: Print more info when an error happens Gioh Kim
2021-04-01 18:46   ` Jason Gunthorpe
2021-04-01 19:09     ` Gioh Kim
2021-03-25 15:33 ` [PATCH for-next 15/22] RDMA/rtrs-srv: More debugging info when fail to send reply Gioh Kim
2021-03-25 15:33 ` [PATCH for-next 16/22] RDMA/rtrs-srv: Report temporary sessname for error message Gioh Kim
2021-03-25 15:33 ` [PATCH for-next 17/22] RDMA/rtrs: cleanup unused variable Gioh Kim
2021-04-01 18:50   ` Jason Gunthorpe
2021-03-25 15:33 ` [PATCH for-next 18/22] RDMA/rtrs-clt: Simplify error message Gioh Kim
2021-03-25 15:33 ` [PATCH for-next 19/22] RDMA/rtrs-clt: Cap max_io_size Gioh Kim
2021-03-25 15:33 ` [PATCH for-next 20/22] RDMA/rtrs-clt: Add a minimum latency multipath policy Gioh Kim
2021-03-25 15:33 ` [PATCH for-next 21/22] RDMA/rtrs-clt: new sysfs attribute to print the latency of each path Gioh Kim
2021-03-25 15:33 ` [PATCH for-next 22/22] Documentation/ABI/rtrs-clt: Add descriptions for min-latency policy Gioh Kim
2021-04-01 19:04 ` [PATCH for-next 00/22] Misc update for rtrs Jason Gunthorpe
2021-04-06  9:04   ` Gioh Kim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.