* [PATCH] nvmet: fix false keep-alive timeout when a controller is torn down
@ 2021-05-25 15:49 Sagi Grimberg
  2021-05-25 17:10 ` Hannes Reinecke
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Sagi Grimberg @ 2021-05-25 15:49 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch, Hannes Reinecke

Controller teardown flow may take some time in case it has many I/O
queues, and the host may not send us keep-alive during this period.
Hence reset the traffic based keep-alive timer so we don't trigger
a controller teardown as a result of a keep-alive expiration.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/target/core.c  | 16 ++++++++++++----
 drivers/nvme/target/nvmet.h |  2 +-
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 1853db38b682..e991b4671aeb 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -388,10 +388,10 @@ static void nvmet_keep_alive_timer(struct work_struct *work)
 {
 	struct nvmet_ctrl *ctrl = container_of(to_delayed_work(work),
 			struct nvmet_ctrl, ka_work);
-	bool cmd_seen = ctrl->cmd_seen;
+	bool reset_tbkas = ctrl->reset_tbkas;
 
-	ctrl->cmd_seen = false;
-	if (cmd_seen) {
+	ctrl->reset_tbkas = false;
+	if (reset_tbkas) {
 		pr_debug("ctrl %d reschedule traffic based keep-alive timer\n",
 			ctrl->cntlid);
 		schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
@@ -804,6 +804,14 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
 	percpu_ref_exit(&sq->ref);
 
 	if (ctrl) {
+		/*
+		 * teardown flow may take some time, and the host
+		 * may not send us keep-alive during this period,
+		 * hence reset the traffic based keep-alive timer
+		 * so we don't trigger a controller teardown as
+		 * a result of a keep-alive expiration.
+		 */
+		ctrl->reset_tbkas = true;
 		nvmet_ctrl_put(ctrl);
 		sq->ctrl = NULL; /* allows reusing the queue later */
 	}
@@ -952,7 +960,7 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 	}
 
 	if (sq->ctrl)
-		sq->ctrl->cmd_seen = true;
+		sq->ctrl->reset_tbkas = true;
 
 	return true;
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index d69a409515d6..53aea9a8056e 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -167,7 +167,7 @@ struct nvmet_ctrl {
 	struct nvmet_subsys	*subsys;
 	struct nvmet_sq		**sqs;
 
-	bool			cmd_seen;
+	bool			reset_tbkas;
 
 	struct mutex		lock;
 	u64			cap;
-- 
2.27.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
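
The flag handoff in the patch above can be modelled in userspace to see
why setting reset_tbkas from the queue-destroy path defers the timeout.
This is only a minimal sketch, not the kernel code: struct model_ctrl and
the model_* functions are hypothetical names, and the expiry branch (which
the diff context does not show) is assumed to do nothing more than record
that a controller teardown would be triggered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the relevant part of struct nvmet_ctrl. */
struct model_ctrl {
	bool reset_tbkas;          /* mirrors ctrl->reset_tbkas from the patch */
	bool teardown_triggered;   /* assumption: stands in for the expiry path */
	unsigned int reschedules;  /* how many times the timer was re-armed */
};

/*
 * Models the two places that set the flag after the patch:
 * a command arriving (nvmet_req_init) and a queue being
 * destroyed during teardown (nvmet_sq_destroy).
 */
static void model_note_activity(struct model_ctrl *ctrl)
{
	assert(ctrl != NULL);
	ctrl->reset_tbkas = true;
}

/*
 * Models one firing of the keep-alive work item. Returns true if the
 * timer was re-armed, false if the expiration was treated as real.
 */
static bool model_keep_alive_fired(struct model_ctrl *ctrl)
{
	bool reset_tbkas;

	assert(ctrl != NULL);
	reset_tbkas = ctrl->reset_tbkas;

	/* Consume the flag so the next period starts clean. */
	ctrl->reset_tbkas = false;
	if (reset_tbkas) {
		ctrl->reschedules++;
		return true;
	}

	ctrl->teardown_triggered = true;
	return false;
}
```

In this model, calling model_note_activity() at any point before the timer
fires buys exactly one more keep-alive period; two consecutive firings
with no activity in between are what mark the controller for teardown.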


* Re: [PATCH] nvmet: fix false keep-alive timeout when a controller is torn down
  2021-05-25 15:49 [PATCH] nvmet: fix false keep-alive timeout when a controller is torn down Sagi Grimberg
@ 2021-05-25 17:10 ` Hannes Reinecke
  2021-05-26  1:55 ` Chaitanya Kulkarni
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Hannes Reinecke @ 2021-05-25 17:10 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, Christoph Hellwig, Keith Busch

On 5/25/21 5:49 PM, Sagi Grimberg wrote:
> Controller teardown flow may take some time in case it has many I/O
> queues, and the host may not send us keep-alive during this period.
> Hence reset the traffic based keep-alive timer so we don't trigger
> a controller teardown as a result of a keep-alive expiration.
> 
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>   drivers/nvme/target/core.c  | 16 ++++++++++++----
>   drivers/nvme/target/nvmet.h |  2 +-
>   2 files changed, 13 insertions(+), 5 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH] nvmet: fix false keep-alive timeout when a controller is torn down
  2021-05-25 15:49 [PATCH] nvmet: fix false keep-alive timeout when a controller is torn down Sagi Grimberg
  2021-05-25 17:10 ` Hannes Reinecke
@ 2021-05-26  1:55 ` Chaitanya Kulkarni
  2021-05-26  1:57 ` Chaitanya Kulkarni
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Chaitanya Kulkarni @ 2021-05-26  1:55 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, Christoph Hellwig, Keith Busch,
	Hannes Reinecke

On 5/25/21 09:38, Sagi Grimberg wrote:
> @@ -804,6 +804,14 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
>  	percpu_ref_exit(&sq->ref);
>  
>  	if (ctrl) {
> +		/*
> +		 * teardown flow may take some time, and the host
> +		 * may not send us keep-alive during this period,
> +		 * hence reset the traffic based keep-alive timer
> +		 * so we don't trigger a controller teardown as
> +		 * a result of a keep-alive expiration.
> +		 */
> +		ctrl->reset_tbkas = true;
>  		nvmet_ctrl_put(ctrl);
>  		sq->ctrl = NULL; /* allows reusing the queue later */
>  	}

The above comment could be:

+               /*
+                * Teardown flow may take some time, and the host may not send
+                * us keep-alive during this period, hence reset the traffic
+                * based keep-alive timer so we don't trigger a controller
+                * teardown as a result of a keep-alive expiration.
+                */




* Re: [PATCH] nvmet: fix false keep-alive timeout when a controller is torn down
  2021-05-25 15:49 [PATCH] nvmet: fix false keep-alive timeout when a controller is torn down Sagi Grimberg
  2021-05-25 17:10 ` Hannes Reinecke
  2021-05-26  1:55 ` Chaitanya Kulkarni
@ 2021-05-26  1:57 ` Chaitanya Kulkarni
  2021-05-26  5:52 ` Yi Zhang
  2021-05-26 14:17 ` Christoph Hellwig
  4 siblings, 0 replies; 6+ messages in thread
From: Chaitanya Kulkarni @ 2021-05-26  1:57 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, Christoph Hellwig, Keith Busch,
	Hannes Reinecke

On 5/25/21 09:38, Sagi Grimberg wrote:
> Controller teardown flow may take some time in case it has many I/O
> queues, and the host may not send us keep-alive during this period.
> Hence reset the traffic based keep-alive timer so we don't trigger
> a controller teardown as a result of a keep-alive expiration.
>
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

With the comment nit fixed, this looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>




* Re: [PATCH] nvmet: fix false keep-alive timeout when a controller is torn down
  2021-05-25 15:49 [PATCH] nvmet: fix false keep-alive timeout when a controller is torn down Sagi Grimberg
                   ` (2 preceding siblings ...)
  2021-05-26  1:57 ` Chaitanya Kulkarni
@ 2021-05-26  5:52 ` Yi Zhang
  2021-05-26 14:17 ` Christoph Hellwig
  4 siblings, 0 replies; 6+ messages in thread
From: Yi Zhang @ 2021-05-26  5:52 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Hannes Reinecke

Verified that this patch fixes the issue on the latest linux-block/for-next.

Tested-by: Yi Zhang <yi.zhang@redhat.com>

On Wed, May 26, 2021 at 12:24 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>
> Controller teardown flow may take some time in case it has many I/O
> queues, and the host may not send us keep-alive during this period.
> Hence reset the traffic based keep-alive timer so we don't trigger
> a controller teardown as a result of a keep-alive expiration.
>
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/nvme/target/core.c  | 16 ++++++++++++----
>  drivers/nvme/target/nvmet.h |  2 +-
>  2 files changed, 13 insertions(+), 5 deletions(-)
>


-- 
Best Regards,
  Yi Zhang



* Re: [PATCH] nvmet: fix false keep-alive timeout when a controller is torn down
  2021-05-25 15:49 [PATCH] nvmet: fix false keep-alive timeout when a controller is torn down Sagi Grimberg
                   ` (3 preceding siblings ...)
  2021-05-26  5:52 ` Yi Zhang
@ 2021-05-26 14:17 ` Christoph Hellwig
  4 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2021-05-26 14:17 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Hannes Reinecke

Thanks,

applied to nvme-5.13 with the updated comment.

