* [PATCH 0/3] Number of fixes for rtrs
From: Md Haris Iqbal @ 2020-07-24 11:15 UTC
To: danil.kipnis, jinpu.wang, linux-rdma, dledford, jgg, leon, bvanassche
Cc: Md Haris Iqbal
This patch series fixes a number of issues discovered while testing:
1) RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
2) RDMA/rtrs-srv: only call put_device when it's in sysfs
3) RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
Regards
Md Haris Iqbal
Jack Wang (2):
RDMA/rtrs-srv: only call put_device when it's in sysfs
RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
Danil Kipnis (1):
RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
drivers/infiniband/ulp/rtrs/rtrs-clt.c | 16 +++++++++++++---
drivers/infiniband/ulp/rtrs/rtrs-srv.c | 7 +++++--
2 files changed, 18 insertions(+), 5 deletions(-)
--
2.25.1
* [PATCH 1/3] RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
From: Md Haris Iqbal @ 2020-07-24 11:15 UTC
To: danil.kipnis, jinpu.wang, linux-rdma, dledford, jgg, leon, bvanassche
Cc: Md Haris Iqbal
From: Danil Kipnis <danil.kipnis@cloud.ionos.com>
To keep all clients from starting to reconnect at the same time,
schedule the reconnect dwork an additional random 0 to 8 seconds late.
Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality")
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
---
drivers/infiniband/ulp/rtrs/rtrs-clt.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 564388a85603..5b31d3b03737 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -12,6 +12,7 @@
#include <linux/module.h>
#include <linux/rculist.h>
+#include <linux/random.h>
#include "rtrs-clt.h"
#include "rtrs-log.h"
@@ -23,6 +24,12 @@
* leads to "false positives" failed reconnect attempts
*/
#define RTRS_RECONNECT_BACKOFF 1000
+/*
+ * Wait for additional random time between 0 and 8 seconds
+ * before starting to reconnect to avoid clients reconnecting
+ * all at once in case of a major network outage
+ */
+#define RTRS_RECONNECT_SEED 8
MODULE_DESCRIPTION("RDMA Transport Client");
MODULE_LICENSE("GPL");
@@ -306,7 +313,8 @@ static void rtrs_rdma_error_recovery(struct rtrs_clt_con *con)
*/
delay_ms = clt->reconnect_delay_sec * 1000;
queue_delayed_work(rtrs_wq, &sess->reconnect_dwork,
- msecs_to_jiffies(delay_ms));
+ msecs_to_jiffies(delay_ms +
+ prandom_u32() % RTRS_RECONNECT_SEED));
} else {
/*
* Error can happen just on establishing new connection,
@@ -2503,7 +2511,9 @@ static void rtrs_clt_reconnect_work(struct work_struct *work)
sess->stats->reconnects.fail_cnt++;
delay_ms = clt->reconnect_delay_sec * 1000;
queue_delayed_work(rtrs_wq, &sess->reconnect_dwork,
- msecs_to_jiffies(delay_ms));
+ msecs_to_jiffies(delay_ms +
+ prandom_u32() %
+ RTRS_RECONNECT_SEED));
}
}
--
2.25.1
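A note on units, offered as a minimal sketch rather than as part of the patch: in the hunks above the random term is added to delay_ms before msecs_to_jiffies() is applied, so as written `prandom_u32() % RTRS_RECONNECT_SEED` contributes 0 to 7 milliseconds, not 0 to 8 seconds. The userspace C model below contrasts that with two possible second-scale variants. The function names and JITTER_MAX_SEC are hypothetical, and the caller passes in the random value where the kernel code would call prandom_u32().

```c
#include <stdint.h>

#define JITTER_MAX_SEC 8 /* hypothetical; mirrors RTRS_RECONNECT_SEED */

/* Delay as computed in the posted hunks: the random term is in ms. */
static uint32_t delay_ms_as_posted(uint32_t reconnect_delay_sec, uint32_t rnd)
{
	return reconnect_delay_sec * 1000 + rnd % JITTER_MAX_SEC;
}

/* Variant that adds a whole number of seconds (0..7 s extra). */
static uint32_t delay_ms_jitter_sec(uint32_t reconnect_delay_sec, uint32_t rnd)
{
	return reconnect_delay_sec * 1000 + (rnd % JITTER_MAX_SEC) * 1000;
}

/* Variant with millisecond granularity over [0, 8000) ms extra. */
static uint32_t delay_ms_jitter_ms(uint32_t reconnect_delay_sec, uint32_t rnd)
{
	return reconnect_delay_sec * 1000 + rnd % (JITTER_MAX_SEC * 1000);
}
```

With a base delay of one second, the first function can never return more than 1007 ms, while the other two spread reconnect attempts across several seconds, which is what the commit message describes.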
* [PATCH 2/3] RDMA/rtrs-srv: only call put_device when it's in sysfs
From: Md Haris Iqbal @ 2020-07-24 11:15 UTC
To: danil.kipnis, jinpu.wang, linux-rdma, dledford, jgg, leon, bvanassche
Cc: Md Haris Iqbal
From: Jack Wang <jinpu.wang@cloud.ionos.com>
In some error cases free_srv is called before the device kobject is
initialized. In such cases we shouldn't call put_device, otherwise
a warning is generated, e.g.:
kobject: '(null)' (000000009f5445ed): is not initialized, yet kobject_put() is being called.
So add a check before calling put_device.
Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
---
drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 0d9241f5d9e6..8a55bc559466 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -1373,7 +1373,10 @@ static void free_srv(struct rtrs_srv *srv)
mutex_destroy(&srv->paths_mutex);
mutex_destroy(&srv->paths_ev_mutex);
/* last put to release the srv structure */
- put_device(&srv->dev);
+ if(srv->dev.kobj.state_in_sysfs)
+ put_device(&srv->dev);
+ else
+ kfree(srv);
}
static inline struct rtrs_srv *__find_srv_and_get(struct rtrs_srv_ctx *ctx,
--
2.25.1
* [PATCH 3/3] RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
From: Md Haris Iqbal @ 2020-07-24 11:15 UTC
To: danil.kipnis, jinpu.wang, linux-rdma, dledford, jgg, leon, bvanassche
Cc: Md Haris Iqbal
From: Jack Wang <jinpu.wang@cloud.ionos.com>
We trigger a warning from time to time when running regression
tests, e.g.:
rnbd_client L685: </dev/nullb0@bla> Device disconnected.
rnbd_client L1756: Unloading module
------------[ cut here ]-----------
workqueue: WQ_MEM_RECLAIM rtrs_client_wq:rtrs_clt_reconnect_work [rtrs_client] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
WARNING: CPU: 2 PID: 18824 at kernel/workqueue.c:2517 check_flush_dependency+0xad/0x130
The root cause is that the workqueue core expects that a
WQ_MEM_RECLAIM workqueue never flushes a !WQ_MEM_RECLAIM workqueue.
In the above case the ib_addr workqueue is allocated without
WQ_MEM_RECLAIM, but rtrs_wq is allocated with it.
To avoid the warning, remove the WQ_MEM_RECLAIM flag from rtrs_wq.
Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality")
Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
---
drivers/infiniband/ulp/rtrs/rtrs-clt.c | 2 +-
drivers/infiniband/ulp/rtrs/rtrs-srv.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 5b31d3b03737..776e89231c52 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -2982,7 +2982,7 @@ static int __init rtrs_client_init(void)
pr_err("Failed to create rtrs-client dev class\n");
return PTR_ERR(rtrs_clt_dev_class);
}
- rtrs_wq = alloc_workqueue("rtrs_client_wq", WQ_MEM_RECLAIM, 0);
+ rtrs_wq = alloc_workqueue("rtrs_client_wq", 0, 0);
if (!rtrs_wq) {
class_destroy(rtrs_clt_dev_class);
return -ENOMEM;
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 8a55bc559466..454bb6c343bb 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -2153,7 +2153,7 @@ static int __init rtrs_server_init(void)
err = PTR_ERR(rtrs_dev_class);
goto out_chunk_pool;
}
- rtrs_wq = alloc_workqueue("rtrs_server_wq", WQ_MEM_RECLAIM, 0);
+ rtrs_wq = alloc_workqueue("rtrs_server_wq", 0, 0);
if (!rtrs_wq) {
err = -ENOMEM;
goto out_dev_class;
--
2.25.1
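The rule behind the warning can be modelled in a few lines. The sketch below is a simplified userspace illustration of the dependency described in the commit message, not the kernel's check_flush_dependency() itself. The struct, the flag value and the function name are hypothetical.

```c
#include <stdbool.h>

#define MODEL_WQ_MEM_RECLAIM 0x1u /* hypothetical flag value for this model */

struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true when a work item running on @flusher would trigger the
 * warning by flushing @target: the flusher is marked as needed for
 * memory reclaim, but the target it waits on is not.
 */
static bool flush_would_warn(const struct model_wq *flusher,
			     const struct model_wq *target)
{
	return (flusher->flags & MODEL_WQ_MEM_RECLAIM) &&
	       !(target->flags & MODEL_WQ_MEM_RECLAIM);
}
```

In this model, dropping the flag from the flusher (as the patch does for rtrs_wq) makes the check pass when the target is ib_addr.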
* Re: [PATCH 2/3] RDMA/rtrs-srv: only call put_device when it's in sysfs
From: Jason Gunthorpe @ 2020-07-24 12:28 UTC
To: Md Haris Iqbal
Cc: danil.kipnis, jinpu.wang, linux-rdma, dledford, leon, bvanassche
On Fri, Jul 24, 2020 at 04:45:07PM +0530, Md Haris Iqbal wrote:
> From: Jack Wang <jinpu.wang@cloud.ionos.com>
>
> There are error case we will call free_srv before device kobject
> initialized, in such case we shouldn't call put_device, otherwise
> a Warning will be generated, eg:
>
> kobject: '(null)' (000000009f5445ed): is not initialized, yet kobject_put() is being called.
>
> So add check before call into put_device.
>
> Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality")
> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> index 0d9241f5d9e6..8a55bc559466 100644
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> @@ -1373,7 +1373,10 @@ static void free_srv(struct rtrs_srv *srv)
> mutex_destroy(&srv->paths_mutex);
> mutex_destroy(&srv->paths_ev_mutex);
> /* last put to release the srv structure */
> - put_device(&srv->dev);
> + if(srv->dev.kobj.state_in_sysfs)
> + put_device(&srv->dev);
> + else
> + kfree(srv);
> }
Not like this, call device_initialize() sooner.
Jason
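One way to read this suggestion, as a userspace analogy only: if the reference count and release callback are set up immediately after allocation, every later error path can drop the object with a single put, and no path needs a separate bare free. The names below (struct obj, obj_alloc, obj_put, released) are hypothetical and do not model the real struct device API.

```c
#include <stdlib.h>

struct obj {
	int refcount;
	void (*release)(struct obj *o);
};

static int released; /* counts release calls, for demonstration only */

static void obj_release(struct obj *o)
{
	released++;
	free(o);
}

/*
 * Allocate and set up the refcount right away (the analogue of calling
 * device_initialize() directly after allocating the structure).
 */
static struct obj *obj_alloc(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	o->refcount = 1;
	o->release = obj_release;
	return o;
}

/* Drop one reference; the last put frees the object via release(). */
static void obj_put(struct obj *o)
{
	if (--o->refcount == 0)
		o->release(o);
}
```

Because the object is fully reference-counted from the moment it exists, a cleanup function never has to ask whether the object was registered before deciding how to free it.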
* Re: [PATCH 0/3] Number of fixes for rtrs
From: Jason Gunthorpe @ 2020-07-29 17:28 UTC
To: Md Haris Iqbal
Cc: danil.kipnis, jinpu.wang, linux-rdma, dledford, leon, bvanassche
On Fri, Jul 24, 2020 at 04:45:05PM +0530, Md Haris Iqbal wrote:
> This patch series fixes a number of issues discovered while testing
>
> 1) RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
> 2) RDMA/rtrs-srv: only call put_device when it's in sysfs
> 3) RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
>
> Regards
> Md Haris Iqbal
>
>
> Jack Wang (2):
> RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
>
> Danil Kipnis (1):
> RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
Applied to for-next, thanks
> RDMA/rtrs-srv: only call put_device when it's in sysfs
Needs more work
Thanks,
Jason