* [PATCH 0/3] Number of fixes for rtrs
@ 2020-07-24 11:15 Md Haris Iqbal
  2020-07-24 11:15 ` [PATCH 1/3] RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting Md Haris Iqbal
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Md Haris Iqbal @ 2020-07-24 11:15 UTC (permalink / raw)
  To: danil.kipnis, jinpu.wang, linux-rdma, dledford, jgg, leon, bvanassche
  Cc: Md Haris Iqbal

This patch series fixes a number of issues discovered while testing:

1) RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
2) RDMA/rtrs-srv: only call put_device when it's in sysfs
3) RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq

Regards
Md Haris Iqbal


Jack Wang (2):
  RDMA/rtrs-srv: only call put_device when it's in sysfs
  RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq

Danil Kipnis (1):
  RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting


 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 16 +++++++++++++---
 drivers/infiniband/ulp/rtrs/rtrs-srv.c |  7 +++++--
 2 files changed, 18 insertions(+), 5 deletions(-)

-- 
2.25.1


* [PATCH 1/3] RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
  2020-07-24 11:15 [PATCH 0/3] Number of fixes for rtrs Md Haris Iqbal
@ 2020-07-24 11:15 ` Md Haris Iqbal
  2020-07-24 11:15 ` [PATCH 2/3] RDMA/rtrs-srv: only call put_device when it's in sysfs Md Haris Iqbal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Md Haris Iqbal @ 2020-07-24 11:15 UTC (permalink / raw)
  To: danil.kipnis, jinpu.wang, linux-rdma, dledford, jgg, leon, bvanassche
  Cc: Md Haris Iqbal

From: Danil Kipnis <danil.kipnis@cloud.ionos.com>

To keep all clients from starting to reconnect at the same time,
schedule the reconnect dwork an additional [0,8] seconds late.

Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality")
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 564388a85603..5b31d3b03737 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -12,6 +12,7 @@
 
 #include <linux/module.h>
 #include <linux/rculist.h>
+#include <linux/random.h>
 
 #include "rtrs-clt.h"
 #include "rtrs-log.h"
@@ -23,6 +24,12 @@
  * leads to "false positives" failed reconnect attempts
  */
 #define RTRS_RECONNECT_BACKOFF 1000
+/*
+ * Wait for additional random time between 0 and 8 seconds
+ * before starting to reconnect to avoid clients reconnecting
+ * all at once in case of a major network outage
+ */
+#define RTRS_RECONNECT_SEED 8
 
 MODULE_DESCRIPTION("RDMA Transport Client");
 MODULE_LICENSE("GPL");
@@ -306,7 +313,8 @@ static void rtrs_rdma_error_recovery(struct rtrs_clt_con *con)
 		 */
 		delay_ms = clt->reconnect_delay_sec * 1000;
 		queue_delayed_work(rtrs_wq, &sess->reconnect_dwork,
-				   msecs_to_jiffies(delay_ms));
+				   msecs_to_jiffies(delay_ms +
+						    prandom_u32() % RTRS_RECONNECT_SEED));
 	} else {
 		/*
 		 * Error can happen just on establishing new connection,
@@ -2503,7 +2511,9 @@ static void rtrs_clt_reconnect_work(struct work_struct *work)
 		sess->stats->reconnects.fail_cnt++;
 		delay_ms = clt->reconnect_delay_sec * 1000;
 		queue_delayed_work(rtrs_wq, &sess->reconnect_dwork,
-				   msecs_to_jiffies(delay_ms));
+				   msecs_to_jiffies(delay_ms +
+						    prandom_u32() %
+						    RTRS_RECONNECT_SEED));
 	}
 }
 
-- 
2.25.1
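
Note on units: in the hunks above the random term is added to delay_ms,
which is then passed to msecs_to_jiffies(), so as posted the extra delay
appears to be 0..7 milliseconds rather than 0..8 seconds. The user-space
sketch below compares the posted arithmetic with one possible variant
that spreads the jitter over the whole window. It is an illustration
only, not the driver code: RECONNECT_JITTER_MAX_SEC and both function
names are hypothetical, and the rnd parameter stands in for the value
returned by prandom_u32() so that the result is deterministic.

```c
#include <assert.h>
#include <stdint.h>

#define MSEC_PER_SEC             1000u
#define RECONNECT_JITTER_MAX_SEC 8u	/* hypothetical name for the 8 s window */

/* Convert the configured delay to ms; guard against 32-bit overflow. */
static uint32_t base_delay_ms(uint32_t reconnect_delay_sec)
{
	assert(reconnect_delay_sec <=
	       UINT32_MAX / MSEC_PER_SEC - RECONNECT_JITTER_MAX_SEC);
	return reconnect_delay_sec * MSEC_PER_SEC;
}

/* Arithmetic as posted: jitter is rnd % 8, added to a millisecond value. */
static uint32_t delay_ms_as_posted(uint32_t reconnect_delay_sec, uint32_t rnd)
{
	return base_delay_ms(reconnect_delay_sec) +
	       rnd % RECONNECT_JITTER_MAX_SEC;
}

/* Variant: jitter is drawn in ms from the whole [0, 8000) window. */
static uint32_t delay_ms_sec_jitter(uint32_t reconnect_delay_sec, uint32_t rnd)
{
	return base_delay_ms(reconnect_delay_sec) +
	       rnd % (RECONNECT_JITTER_MAX_SEC * MSEC_PER_SEC);
}
```

With a 30 second base delay and rnd = 7999, the first function returns
30007 ms and the second 37999 ms. Drawing the jitter directly in
milliseconds also gives finer granularity than multiplying a whole
number of seconds by 1000.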


* [PATCH 2/3] RDMA/rtrs-srv: only call put_device when it's in sysfs
  2020-07-24 11:15 [PATCH 0/3] Number of fixes for rtrs Md Haris Iqbal
  2020-07-24 11:15 ` [PATCH 1/3] RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting Md Haris Iqbal
@ 2020-07-24 11:15 ` Md Haris Iqbal
  2020-07-24 12:28   ` Jason Gunthorpe
  2020-07-24 11:15 ` [PATCH 3/3] RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq Md Haris Iqbal
  2020-07-29 17:28 ` [PATCH 0/3] Number of fixes for rtrs Jason Gunthorpe
  3 siblings, 1 reply; 6+ messages in thread
From: Md Haris Iqbal @ 2020-07-24 11:15 UTC (permalink / raw)
  To: danil.kipnis, jinpu.wang, linux-rdma, dledford, jgg, leon, bvanassche
  Cc: Md Haris Iqbal

From: Jack Wang <jinpu.wang@cloud.ionos.com>

In some error cases free_srv() is called before the device kobject is
initialized. In such cases we should not call put_device(); otherwise
a warning is generated, e.g.:

kobject: '(null)' (000000009f5445ed): is not initialized, yet kobject_put() is being called.

So add a check before calling put_device().

Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 0d9241f5d9e6..8a55bc559466 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -1373,7 +1373,10 @@ static void free_srv(struct rtrs_srv *srv)
 	mutex_destroy(&srv->paths_mutex);
 	mutex_destroy(&srv->paths_ev_mutex);
 	/* last put to release the srv structure */
-	put_device(&srv->dev);
+	if(srv->dev.kobj.state_in_sysfs)
+		put_device(&srv->dev);
+	else
+		kfree(srv);
 }
 
 static inline struct rtrs_srv *__find_srv_and_get(struct rtrs_srv_ctx *ctx,
-- 
2.25.1


* [PATCH 3/3] RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
  2020-07-24 11:15 [PATCH 0/3] Number of fixes for rtrs Md Haris Iqbal
  2020-07-24 11:15 ` [PATCH 1/3] RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting Md Haris Iqbal
  2020-07-24 11:15 ` [PATCH 2/3] RDMA/rtrs-srv: only call put_device when it's in sysfs Md Haris Iqbal
@ 2020-07-24 11:15 ` Md Haris Iqbal
  2020-07-29 17:28 ` [PATCH 0/3] Number of fixes for rtrs Jason Gunthorpe
  3 siblings, 0 replies; 6+ messages in thread
From: Md Haris Iqbal @ 2020-07-24 11:15 UTC (permalink / raw)
  To: danil.kipnis, jinpu.wang, linux-rdma, dledford, jgg, leon, bvanassche
  Cc: Md Haris Iqbal

From: Jack Wang <jinpu.wang@cloud.ionos.com>

We trigger a warning from time to time when running regression
tests, e.g.:

rnbd_client L685: </dev/nullb0@bla> Device disconnected.
rnbd_client L1756: Unloading module
------------[ cut here ]-----------
workqueue: WQ_MEM_RECLAIM rtrs_client_wq:rtrs_clt_reconnect_work [rtrs_client] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
WARNING: CPU: 2 PID: 18824 at kernel/workqueue.c:2517 check_flush_dependency+0xad/0x130

The root cause is that the workqueue core expects that a
!WQ_MEM_RECLAIM workqueue is never flushed from a WQ_MEM_RECLAIM
workqueue.

In the case above, the ib_addr workqueue is allocated without
WQ_MEM_RECLAIM, but rtrs_wq has WQ_MEM_RECLAIM set.

To avoid the warning, remove the WQ_MEM_RECLAIM flag from rtrs_wq.

Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality")
Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 2 +-
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 5b31d3b03737..776e89231c52 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -2982,7 +2982,7 @@ static int __init rtrs_client_init(void)
 		pr_err("Failed to create rtrs-client dev class\n");
 		return PTR_ERR(rtrs_clt_dev_class);
 	}
-	rtrs_wq = alloc_workqueue("rtrs_client_wq", WQ_MEM_RECLAIM, 0);
+	rtrs_wq = alloc_workqueue("rtrs_client_wq", 0, 0);
 	if (!rtrs_wq) {
 		class_destroy(rtrs_clt_dev_class);
 		return -ENOMEM;
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 8a55bc559466..454bb6c343bb 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -2153,7 +2153,7 @@ static int __init rtrs_server_init(void)
 		err = PTR_ERR(rtrs_dev_class);
 		goto out_chunk_pool;
 	}
-	rtrs_wq = alloc_workqueue("rtrs_server_wq", WQ_MEM_RECLAIM, 0);
+	rtrs_wq = alloc_workqueue("rtrs_server_wq", 0, 0);
 	if (!rtrs_wq) {
 		err = -ENOMEM;
 		goto out_dev_class;
-- 
2.25.1
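
The rule described in the commit message can be modelled in a few lines
of user-space C. This is a simplified sketch, not the kernel's
check_flush_dependency(): the real check in kernel/workqueue.c looks at
more state than a single flag, and struct wq_model, WQ_MEM_RECLAIM_MODEL
and flush_would_warn() are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WQ_MEM_RECLAIM_MODEL (1u << 0)	/* stand-in for WQ_MEM_RECLAIM */

struct wq_model {
	const char *name;
	unsigned int flags;
};

/*
 * Return true when work running on 'flusher' flushing 'target' would
 * trip the warning: the flusher promises forward progress under memory
 * pressure, but the workqueue it waits on makes no such promise.
 */
static bool flush_would_warn(const struct wq_model *flusher,
			     const struct wq_model *target)
{
	assert(flusher != NULL && target != NULL);

	if (target->flags & WQ_MEM_RECLAIM_MODEL)
		return false;	/* target has a rescuer, nothing to warn about */

	return (flusher->flags & WQ_MEM_RECLAIM_MODEL) != 0;
}
```

In this model, a flusher with the flag set and a target named ib_addr
without it yields true, which corresponds to the splat quoted in the
commit message. Clearing the flag on the flusher, as the patch does for
rtrs_wq, yields false.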


* Re: [PATCH 2/3] RDMA/rtrs-srv: only call put_device when it's in sysfs
  2020-07-24 11:15 ` [PATCH 2/3] RDMA/rtrs-srv: only call put_device when it's in sysfs Md Haris Iqbal
@ 2020-07-24 12:28   ` Jason Gunthorpe
  0 siblings, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2020-07-24 12:28 UTC (permalink / raw)
  To: Md Haris Iqbal
  Cc: danil.kipnis, jinpu.wang, linux-rdma, dledford, leon, bvanassche

On Fri, Jul 24, 2020 at 04:45:07PM +0530, Md Haris Iqbal wrote:
> From: Jack Wang <jinpu.wang@cloud.ionos.com>
> 
> In some error cases free_srv() is called before the device kobject is
> initialized. In such cases we should not call put_device(); otherwise
> a warning is generated, e.g.:
> 
> kobject: '(null)' (000000009f5445ed): is not initialized, yet kobject_put() is being called.
> 
> So add a check before calling put_device().
> 
> Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality")
> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
>  drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> index 0d9241f5d9e6..8a55bc559466 100644
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> @@ -1373,7 +1373,10 @@ static void free_srv(struct rtrs_srv *srv)
>  	mutex_destroy(&srv->paths_mutex);
>  	mutex_destroy(&srv->paths_ev_mutex);
>  	/* last put to release the srv structure */
> -	put_device(&srv->dev);
> +	if(srv->dev.kobj.state_in_sysfs)
> +		put_device(&srv->dev);
> +	else
> +		kfree(srv);
>  }

Not like this, call device_initialize() sooner.

Jason
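
The suggestion above can be illustrated with a small user-space model.
In the kernel, device_initialize() sets up the embedded kobject and its
reference count, after which the structure has to be released with
put_device() rather than kfree(). If initialization happens right after
allocation, every error path can use the same put and no state check is
needed. The sketch below mirrors that shape only; struct obj, obj_init(),
obj_put() and obj_create() are hypothetical names, not rtrs or driver
core functions, and this is not the fix that was eventually applied.

```c
#include <assert.h>
#include <stdlib.h>

struct obj {
	int refcount;			/* models the kobject reference count */
	int initialized;		/* set once by obj_init() */
	void (*release)(struct obj *o);	/* called when the last ref is dropped */
};

static int released_count;		/* lets a caller observe releases */

static void obj_release(struct obj *o)
{
	released_count++;
	free(o);
}

/* Models device_initialize(): after this, only obj_put() may free 'o'. */
static void obj_init(struct obj *o)
{
	o->refcount = 1;
	o->initialized = 1;
	o->release = obj_release;
}

/* Models put_device(): dropping the last reference runs release(). */
static void obj_put(struct obj *o)
{
	assert(o->initialized);
	if (--o->refcount == 0)
		o->release(o);
}

/* Initialize immediately after allocation, before anything can fail. */
static struct obj *obj_create(int fail_setup)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	obj_init(o);
	if (fail_setup) {
		obj_put(o);	/* same teardown as the success path */
		return NULL;
	}
	return o;
}
```

The design point is that the teardown function never has to ask whether
the object was initialized, because there is no window in which it
exists uninitialized.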

* Re: [PATCH 0/3] Number of fixes for rtrs
  2020-07-24 11:15 [PATCH 0/3] Number of fixes for rtrs Md Haris Iqbal
                   ` (2 preceding siblings ...)
  2020-07-24 11:15 ` [PATCH 3/3] RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq Md Haris Iqbal
@ 2020-07-29 17:28 ` Jason Gunthorpe
  3 siblings, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2020-07-29 17:28 UTC (permalink / raw)
  To: Md Haris Iqbal
  Cc: danil.kipnis, jinpu.wang, linux-rdma, dledford, leon, bvanassche

On Fri, Jul 24, 2020 at 04:45:05PM +0530, Md Haris Iqbal wrote:
> This patch series fixes a number of issues discovered while testing:
> 
> 1) RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
> 2) RDMA/rtrs-srv: only call put_device when it's in sysfs
> 3) RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
> 
> Regards
> Md Haris Iqbal
> 
> 
> Jack Wang (2):
>   RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
> 
> Danil Kipnis (1):
>   RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting

Applied to for-next, thanks

>   RDMA/rtrs-srv: only call put_device when it's in sysfs

Needs more work

Thanks,
Jason
