All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] nvme: make NVMe freeze API reliably
@ 2022-08-21  8:47 Ming Lei
  2022-08-21  8:47 ` [PATCH 1/2] " Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Ming Lei @ 2022-08-21  8:47 UTC (permalink / raw)
  To: Christoph Hellwig, linux-nvme
  Cc: Yi Zhang, Sagi Grimberg, Keith Busch, Ming Lei

Hello,

The 1st patch is from Keith, which makes NVMe freeze APIs more reliably.

The 2nd patch removes nvme_wait_freeze() from nvme-pci's nvme_reset_work().

Yi, please test and see if the issue you reported in [1] can be fixed.

[1] https://lore.kernel.org/linux-block/CAHj4cs--KPTAGP=jj+7KMe=arDv=HeGeOgs1T8vbusyk=EjXow@mail.gmail.com/#r

Keith Busch (1):
  nvme: make NVMe freeze API reliably

Ming Lei (1):
  nvme-pci: don't wait freeze during resetting

 drivers/nvme/host/core.c | 16 +++++++++++++---
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pci.c  |  1 -
 3 files changed, 14 insertions(+), 4 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-08-21  8:47 [PATCH 0/2] nvme: make NVMe freeze API reliably Ming Lei
@ 2022-08-21  8:47 ` Ming Lei
  2022-08-24 11:15   ` Hannes Reinecke
                     ` (2 more replies)
  2022-08-21  8:47 ` [PATCH 2/2] nvme-pci: don't wait freeze during resetting Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 24+ messages in thread
From: Ming Lei @ 2022-08-21  8:47 UTC (permalink / raw)
  To: Christoph Hellwig, linux-nvme
  Cc: Yi Zhang, Sagi Grimberg, Keith Busch, Ming Lei

From: Keith Busch <kbusch@kernel.org>

In some corner cases[1], freeze wait and unfreeze API may be called on
unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
freeze APIs more reliably, then this kind of issues can be avoided.
And similar approach has been applied on stopping/quiescing nvme queues.

[1] https://lore.kernel.org/linux-nvme/20220801125753.1434024-1-ming.lei@redhat.com/

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/linux-block/CAHj4cs--KPTAGP=jj+7KMe=arDv=HeGeOgs1T8vbusyk=EjXow@mail.gmail.com/#r

Add comment log.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 16 +++++++++++++---
 drivers/nvme/host/nvme.h |  1 +
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index af367b22871b..fe4ae3616ed1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5028,8 +5028,11 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (!test_and_clear_bit(NVME_NS_FREEZE, &ns->flags))
+			continue;
 		blk_mq_unfreeze_queue(ns->queue);
+	}
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_unfreeze);
@@ -5040,6 +5043,8 @@ int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
 
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (!test_bit(NVME_NS_FREEZE, &ns->flags))
+			continue;
 		timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
 		if (timeout <= 0)
 			break;
@@ -5054,8 +5059,11 @@ void nvme_wait_freeze(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (!test_bit(NVME_NS_FREEZE, &ns->flags))
+			continue;
 		blk_mq_freeze_queue_wait(ns->queue);
+	}
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_wait_freeze);
@@ -5065,8 +5073,10 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		set_bit(NVME_NS_FREEZE, &ns->flags);
 		blk_freeze_queue_start(ns->queue);
+	}
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_start_freeze);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1bdf714dcd9e..702ba19b5647 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -484,6 +484,7 @@ struct nvme_ns {
 #define NVME_NS_FORCE_RO	3
 #define NVME_NS_READY		4
 #define NVME_NS_STOPPED		5
+#define NVME_NS_FREEZE		6
 
 	struct cdev		cdev;
 	struct device		cdev_device;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/2] nvme-pci: don't wait freeze during resetting
  2022-08-21  8:47 [PATCH 0/2] nvme: make NVMe freeze API reliably Ming Lei
  2022-08-21  8:47 ` [PATCH 1/2] " Ming Lei
@ 2022-08-21  8:47 ` Ming Lei
  2022-08-25 10:05   ` Chao Leng
                     ` (2 more replies)
  2022-08-24 11:57 ` [PATCH 0/2] nvme: make NVMe freeze API reliably Yi Zhang
  2022-09-19 14:52 ` Christoph Hellwig
  3 siblings, 3 replies; 24+ messages in thread
From: Ming Lei @ 2022-08-21  8:47 UTC (permalink / raw)
  To: Christoph Hellwig, linux-nvme
  Cc: Yi Zhang, Sagi Grimberg, Keith Busch, Ming Lei

First it isn't necessary to call nvme_wait_freeze in nvme_reset_work().
If tagset isn't allocated, there can't be any inflight IOs; otherwise
blk_mq_update_nr_hw_queues can freeze & wait queues.

Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
while requests are in progress"), it is fine to unfreeze queue without
draining inflight IOs.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/pci.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3a1c37f32f30..91b2903fcc24 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
 		nvme_free_tagset(dev);
 	} else {
 		nvme_start_queues(&dev->ctrl);
-		nvme_wait_freeze(&dev->ctrl);
 		if (!dev->ctrl.tagset)
 			nvme_pci_alloc_tag_set(dev);
 		else
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-08-21  8:47 ` [PATCH 1/2] " Ming Lei
@ 2022-08-24 11:15   ` Hannes Reinecke
  2022-08-24 14:07     ` Keith Busch
  2022-08-25 10:02   ` Chao Leng
  2022-08-28 14:37   ` Sagi Grimberg
  2 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2022-08-24 11:15 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig, linux-nvme
  Cc: Yi Zhang, Sagi Grimberg, Keith Busch

On 8/21/22 10:47, Ming Lei wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> In some corner cases[1], freeze wait and unfreeze API may be called on
> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
> freeze APIs more reliably, then this kind of issues can be avoided.
> And similar approach has been applied on stopping/quiescing nvme queues.
> 
> [1] https://lore.kernel.org/linux-nvme/20220801125753.1434024-1-ming.lei@redhat.com/
> 
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Link: https://lore.kernel.org/linux-block/CAHj4cs--KPTAGP=jj+7KMe=arDv=HeGeOgs1T8vbusyk=EjXow@mail.gmail.com/#r
> 
> Add comment log.
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   drivers/nvme/host/core.c | 16 +++++++++++++---
>   drivers/nvme/host/nvme.h |  1 +
>   2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index af367b22871b..fe4ae3616ed1 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -5028,8 +5028,11 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
>   	struct nvme_ns *ns;
>   
>   	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (!test_and_clear_bit(NVME_NS_FREEZE, &ns->flags))
> +			continue;
>   		blk_mq_unfreeze_queue(ns->queue);
> +	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
>   EXPORT_SYMBOL_GPL(nvme_unfreeze);
> @@ -5040,6 +5043,8 @@ int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
>   
>   	down_read(&ctrl->namespaces_rwsem);
>   	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (!test_bit(NVME_NS_FREEZE, &ns->flags))
> +			continue;
>   		timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
>   		if (timeout <= 0)
>   			break;
> @@ -5054,8 +5059,11 @@ void nvme_wait_freeze(struct nvme_ctrl *ctrl)
>   	struct nvme_ns *ns;
>   
>   	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (!test_bit(NVME_NS_FREEZE, &ns->flags))
> +			continue;
>   		blk_mq_freeze_queue_wait(ns->queue);
> +	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
>   EXPORT_SYMBOL_GPL(nvme_wait_freeze);
> @@ -5065,8 +5073,10 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl)
>   	struct nvme_ns *ns;
>   
>   	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	list_for_each_entry(ns, &ctrl->namespaces, list) { > +		set_bit(NVME_NS_FREEZE, &ns->flags);

Why not test_and_set_bit()?
Which would have the nice effect of adding a memory barrier ...

>   		blk_freeze_queue_start(ns->queue);
> +	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
>   EXPORT_SYMBOL_GPL(nvme_start_freeze);

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] nvme: make NVMe freeze API reliably
  2022-08-21  8:47 [PATCH 0/2] nvme: make NVMe freeze API reliably Ming Lei
  2022-08-21  8:47 ` [PATCH 1/2] " Ming Lei
  2022-08-21  8:47 ` [PATCH 2/2] nvme-pci: don't wait freeze during resetting Ming Lei
@ 2022-08-24 11:57 ` Yi Zhang
  2022-09-19 14:52 ` Christoph Hellwig
  3 siblings, 0 replies; 24+ messages in thread
From: Yi Zhang @ 2022-08-24 11:57 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, open list:NVM EXPRESS DRIVER, Sagi Grimberg,
	Keith Busch

On Sun, Aug 21, 2022 at 4:48 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> Hello,
>
> The 1st patch is from Keith, which makes NVMe freeze APIs more reliably.
>
> The 2nd patch removes nvme_wait_freeze() from nvme-pci's nvme_reset_work().
>
> Yi, please test and see if the issue you reported in [1] can be fixed.

Hi Ming
The issue cannot be reproduced now after applying the two patch, feel
free to add
Tested-by: Yi Zhang <yi.zhang@redhat.com>


>
> [1] https://lore.kernel.org/linux-block/CAHj4cs--KPTAGP=jj+7KMe=arDv=HeGeOgs1T8vbusyk=EjXow@mail.gmail.com/#r
>
> Keith Busch (1):
>   nvme: make NVMe freeze API reliably
>
> Ming Lei (1):
>   nvme-pci: don't wait freeze during resetting
>
>  drivers/nvme/host/core.c | 16 +++++++++++++---
>  drivers/nvme/host/nvme.h |  1 +
>  drivers/nvme/host/pci.c  |  1 -
>  3 files changed, 14 insertions(+), 4 deletions(-)
>
> --
> 2.31.1
>


-- 
Best Regards,
  Yi Zhang



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-08-24 11:15   ` Hannes Reinecke
@ 2022-08-24 14:07     ` Keith Busch
  0 siblings, 0 replies; 24+ messages in thread
From: Keith Busch @ 2022-08-24 14:07 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Ming Lei, Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg

On Wed, Aug 24, 2022 at 01:15:28PM +0200, Hannes Reinecke wrote:
> On 8/21/22 10:47, Ming Lei wrote:
> > @@ -5065,8 +5073,10 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl)
> >   	struct nvme_ns *ns;
> >   	down_read(&ctrl->namespaces_rwsem);
> > -	list_for_each_entry(ns, &ctrl->namespaces, list)
> > +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> > +		set_bit(NVME_NS_FREEZE, &ns->flags);
> 
> Why not test_and_set_bit()?
> Which would have the nice effect of adding a memory barrier ...

Sure, I think you can even make a new WARN if the test fails. It should never
fail in this path, otherwise we could mess up the freeze depth.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-08-21  8:47 ` [PATCH 1/2] " Ming Lei
  2022-08-24 11:15   ` Hannes Reinecke
@ 2022-08-25 10:02   ` Chao Leng
  2022-09-06  4:49     ` Christoph Hellwig
  2022-09-06  8:45     ` Ming Lei
  2022-08-28 14:37   ` Sagi Grimberg
  2 siblings, 2 replies; 24+ messages in thread
From: Chao Leng @ 2022-08-25 10:02 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig, linux-nvme
  Cc: Yi Zhang, Sagi Grimberg, Keith Busch



On 2022/8/21 16:47, Ming Lei wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> In some corner cases[1], freeze wait and unfreeze API may be called on
> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
> freeze APIs more reliably, then this kind of issues can be avoided.
> And similar approach has been applied on stopping/quiescing nvme queues.
This leads to another problem: the process that needs to be
in the frozen state is not actually frozen.
It's not safe.
There is the same risk on stopping/quiescing nvme queues.
> 
> [1] https://lore.kernel.org/linux-nvme/20220801125753.1434024-1-ming.lei@redhat.com/
> 
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Link: https://lore.kernel.org/linux-block/CAHj4cs--KPTAGP=jj+7KMe=arDv=HeGeOgs1T8vbusyk=EjXow@mail.gmail.com/#r
> 
> Add comment log.
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   drivers/nvme/host/core.c | 16 +++++++++++++---
>   drivers/nvme/host/nvme.h |  1 +
>   2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index af367b22871b..fe4ae3616ed1 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -5028,8 +5028,11 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
>   	struct nvme_ns *ns;
>   
>   	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (!test_and_clear_bit(NVME_NS_FREEZE, &ns->flags))
> +			continue;
>   		blk_mq_unfreeze_queue(ns->queue);
> +	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
>   EXPORT_SYMBOL_GPL(nvme_unfreeze);
> @@ -5040,6 +5043,8 @@ int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
>   
>   	down_read(&ctrl->namespaces_rwsem);
>   	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (!test_bit(NVME_NS_FREEZE, &ns->flags))
> +			continue;
>   		timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
>   		if (timeout <= 0)
>   			break;
> @@ -5054,8 +5059,11 @@ void nvme_wait_freeze(struct nvme_ctrl *ctrl)
>   	struct nvme_ns *ns;
>   
>   	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (!test_bit(NVME_NS_FREEZE, &ns->flags))
> +			continue;
>   		blk_mq_freeze_queue_wait(ns->queue);
> +	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
>   EXPORT_SYMBOL_GPL(nvme_wait_freeze);
> @@ -5065,8 +5073,10 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl)
>   	struct nvme_ns *ns;
>   
>   	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		set_bit(NVME_NS_FREEZE, &ns->flags);
>   		blk_freeze_queue_start(ns->queue);
> +	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
>   EXPORT_SYMBOL_GPL(nvme_start_freeze);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 1bdf714dcd9e..702ba19b5647 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -484,6 +484,7 @@ struct nvme_ns {
>   #define NVME_NS_FORCE_RO	3
>   #define NVME_NS_READY		4
>   #define NVME_NS_STOPPED		5
> +#define NVME_NS_FREEZE		6
>   
>   	struct cdev		cdev;
>   	struct device		cdev_device;
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] nvme-pci: don't wait freeze during resetting
  2022-08-21  8:47 ` [PATCH 2/2] nvme-pci: don't wait freeze during resetting Ming Lei
@ 2022-08-25 10:05   ` Chao Leng
  2022-08-25 11:34     ` Ming Lei
  2022-08-28 14:37   ` Sagi Grimberg
  2022-09-06  4:49   ` Christoph Hellwig
  2 siblings, 1 reply; 24+ messages in thread
From: Chao Leng @ 2022-08-25 10:05 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig, linux-nvme
  Cc: Yi Zhang, Sagi Grimberg, Keith Busch



On 2022/8/21 16:47, Ming Lei wrote:
> First it isn't necessary to call nvme_wait_freeze in nvme_reset_work().
> If tagset isn't allocated, there can't be any inflight IOs; otherwise
> blk_mq_update_nr_hw_queues can freeze & wait queues.
> 
> Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
> while requests are in progress"), it is fine to unfreeze queue without
> draining inflight IOs.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   drivers/nvme/host/pci.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 3a1c37f32f30..91b2903fcc24 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
>   		nvme_free_tagset(dev);
>   	} else {
>   		nvme_start_queues(&dev->ctrl);
> -		nvme_wait_freeze(&dev->ctrl);
It is not safe.
nvme_dev_add may call blk_mq_update_nr_hw_queues, blk_mq_update_nr_hw_queues
require the state is frozen.
>   		if (!dev->ctrl.tagset)
>   			nvme_pci_alloc_tag_set(dev);
>   		else
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] nvme-pci: don't wait freeze during resetting
  2022-08-25 10:05   ` Chao Leng
@ 2022-08-25 11:34     ` Ming Lei
  2022-08-25 13:42       ` Keith Busch
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2022-08-25 11:34 UTC (permalink / raw)
  To: Chao Leng
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg, Keith Busch

On Thu, Aug 25, 2022 at 06:05:30PM +0800, Chao Leng wrote:
> 
> 
> On 2022/8/21 16:47, Ming Lei wrote:
> > First it isn't necessary to call nvme_wait_freeze in nvme_reset_work().
> > If tagset isn't allocated, there can't be any inflight IOs; otherwise
> > blk_mq_update_nr_hw_queues can freeze & wait queues.
> > 
> > Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
> > while requests are in progress"), it is fine to unfreeze queue without
> > draining inflight IOs.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >   drivers/nvme/host/pci.c | 1 -
> >   1 file changed, 1 deletion(-)
> > 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 3a1c37f32f30..91b2903fcc24 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
> >   		nvme_free_tagset(dev);
> >   	} else {
> >   		nvme_start_queues(&dev->ctrl);
> > -		nvme_wait_freeze(&dev->ctrl);
> It is not safe.
> nvme_dev_add may call blk_mq_update_nr_hw_queues, blk_mq_update_nr_hw_queues
> require the state is frozen.

blk_mq_update_nr_hw_queues() calls blk_freeze_queue() for every queue in
this tagset, so it needn't nvme's freeze wait.


Thanks,
Ming



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] nvme-pci: don't wait freeze during resetting
  2022-08-25 11:34     ` Ming Lei
@ 2022-08-25 13:42       ` Keith Busch
  2022-08-25 14:15         ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2022-08-25 13:42 UTC (permalink / raw)
  To: Ming Lei
  Cc: Chao Leng, Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg

On Thu, Aug 25, 2022 at 07:34:30PM +0800, Ming Lei wrote:
> On Thu, Aug 25, 2022 at 06:05:30PM +0800, Chao Leng wrote:
> > 
> > 
> > On 2022/8/21 16:47, Ming Lei wrote:
> > > First it isn't necessary to call nvme_wait_freeze in nvme_reset_work().
> > > If tagset isn't allocated, there can't be any inflight IOs; otherwise
> > > blk_mq_update_nr_hw_queues can freeze & wait queues.
> > > 
> > > Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
> > > while requests are in progress"), it is fine to unfreeze queue without
> > > draining inflight IOs.
> > > 
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > >   drivers/nvme/host/pci.c | 1 -
> > >   1 file changed, 1 deletion(-)
> > > 
> > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > index 3a1c37f32f30..91b2903fcc24 100644
> > > --- a/drivers/nvme/host/pci.c
> > > +++ b/drivers/nvme/host/pci.c
> > > @@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
> > >   		nvme_free_tagset(dev);
> > >   	} else {
> > >   		nvme_start_queues(&dev->ctrl);
> > > -		nvme_wait_freeze(&dev->ctrl);
> > It is not safe.
> > nvme_dev_add may call blk_mq_update_nr_hw_queues, blk_mq_update_nr_hw_queues
> > require the state is frozen.
> 
> blk_mq_update_nr_hw_queues() calls blk_freeze_queue() for every queue in
> this tagset, so it needn't nvme's freeze wait.

Right, the blk-mq realloc should be fine, and it's really a very uncommon event
that nvme needs to realloc the tag set.

I'm trying to remember why I added the wait freeze here in the first place, and
I don't recall a good reason for it. I think I had a controller that always
timed out IO, and the resets would just continue indefinitely. Waiting for
freeze within a reset causes the next IO timeout to trigger a driver detach.
That doesn't sound like a good idea, though, since it tears down the admin
queue too.

Anyway, this looks good to me.

Reviewed-by: Keith Busch <kbusch@kernel.org>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] nvme-pci: don't wait freeze during resetting
  2022-08-25 13:42       ` Keith Busch
@ 2022-08-25 14:15         ` Ming Lei
  2022-08-26  1:27           ` Chao Leng
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2022-08-25 14:15 UTC (permalink / raw)
  To: Keith Busch
  Cc: Chao Leng, Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg

On Thu, Aug 25, 2022 at 07:42:44AM -0600, Keith Busch wrote:
> On Thu, Aug 25, 2022 at 07:34:30PM +0800, Ming Lei wrote:
> > On Thu, Aug 25, 2022 at 06:05:30PM +0800, Chao Leng wrote:
> > > 
> > > 
> > > On 2022/8/21 16:47, Ming Lei wrote:
> > > > First it isn't necessary to call nvme_wait_freeze in nvme_reset_work().
> > > > If tagset isn't allocated, there can't be any inflight IOs; otherwise
> > > > blk_mq_update_nr_hw_queues can freeze & wait queues.
> > > > 
> > > > Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
> > > > while requests are in progress"), it is fine to unfreeze queue without
> > > > draining inflight IOs.
> > > > 
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >   drivers/nvme/host/pci.c | 1 -
> > > >   1 file changed, 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > > index 3a1c37f32f30..91b2903fcc24 100644
> > > > --- a/drivers/nvme/host/pci.c
> > > > +++ b/drivers/nvme/host/pci.c
> > > > @@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
> > > >   		nvme_free_tagset(dev);
> > > >   	} else {
> > > >   		nvme_start_queues(&dev->ctrl);
> > > > -		nvme_wait_freeze(&dev->ctrl);
> > > It is not safe.
> > > nvme_dev_add may call blk_mq_update_nr_hw_queues, blk_mq_update_nr_hw_queues
> > > require the state is frozen.
> > 
> > blk_mq_update_nr_hw_queues() calls blk_freeze_queue() for every queue in
> > this tagset, so it needn't nvme's freeze wait.
> 
> Right, the blk-mq realloc should be fine, and it's really a very uncommon event
> that nvme needs to realloc the tag set.
> 
> I'm trying to remember why I added the wait freeze here in the first place, and
> I don't recall a good reason for it. I think I had a controller that always
> timed out IO, and the resets would just continue indefinitely. Waiting for
> freeze within a reset causes the next IO timeout to trigger a driver detach.
> That doesn't sound like a good idea, though, since it tears down the admin
> queue too.

The wait becomes not necessary since commit bdd6316094e0 ("block: Allow unfreezing
of a queue while requests are in progress").


Thanks,
Ming



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] nvme-pci: don't wait freeze during resetting
  2022-08-25 14:15         ` Ming Lei
@ 2022-08-26  1:27           ` Chao Leng
  0 siblings, 0 replies; 24+ messages in thread
From: Chao Leng @ 2022-08-26  1:27 UTC (permalink / raw)
  To: Ming Lei, Keith Busch
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg



On 2022/8/25 22:15, Ming Lei wrote:
> On Thu, Aug 25, 2022 at 07:42:44AM -0600, Keith Busch wrote:
>> On Thu, Aug 25, 2022 at 07:34:30PM +0800, Ming Lei wrote:
>>> On Thu, Aug 25, 2022 at 06:05:30PM +0800, Chao Leng wrote:
>>>>
>>>>
>>>> On 2022/8/21 16:47, Ming Lei wrote:
>>>>> First it isn't necessary to call nvme_wait_freeze in nvme_reset_work().
>>>>> If tagset isn't allocated, there can't be any inflight IOs; otherwise
>>>>> blk_mq_update_nr_hw_queues can freeze & wait queues.
>>>>>
>>>>> Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
>>>>> while requests are in progress"), it is fine to unfreeze queue without
>>>>> draining inflight IOs.
>>>>>
>>>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>>>> ---
>>>>>    drivers/nvme/host/pci.c | 1 -
>>>>>    1 file changed, 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>>>>> index 3a1c37f32f30..91b2903fcc24 100644
>>>>> --- a/drivers/nvme/host/pci.c
>>>>> +++ b/drivers/nvme/host/pci.c
>>>>> @@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
>>>>>    		nvme_free_tagset(dev);
>>>>>    	} else {
>>>>>    		nvme_start_queues(&dev->ctrl);
>>>>> -		nvme_wait_freeze(&dev->ctrl);
>>>> It is not safe.
>>>> nvme_dev_add may call blk_mq_update_nr_hw_queues, blk_mq_update_nr_hw_queues
>>>> require the state is frozen.
>>>
>>> blk_mq_update_nr_hw_queues() calls blk_freeze_queue() for every queue in
>>> this tagset, so it needn't nvme's freeze wait.
>>
>> Right, the blk-mq realloc should be fine, and it's really a very uncommon event
>> that nvme needs to realloc the tag set.
>>
>> I'm trying to remember why I added the wait freeze here in the first place, and
>> I don't recall a good reason for it. I think I had a controller that always
>> timed out IO, and the resets would just continue indefinitely. Waiting for
>> freeze within a reset causes the next IO timeout to trigger a driver detach.
>> That doesn't sound like a good idea, though, since it tears down the admin
>> queue too.
> 
> The wait becomes not necessary since commit bdd6316094e0 ("block: Allow unfreezing
> of a queue while requests are in progress").
Yes, I didn't notice this before. The wait freeze in nvme is redundant.

Reviewed-by: Chao Leng <lengchao@huawei.com>
> 
> 
> Thanks,
> Ming
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-08-21  8:47 ` [PATCH 1/2] " Ming Lei
  2022-08-24 11:15   ` Hannes Reinecke
  2022-08-25 10:02   ` Chao Leng
@ 2022-08-28 14:37   ` Sagi Grimberg
  2 siblings, 0 replies; 24+ messages in thread
From: Sagi Grimberg @ 2022-08-28 14:37 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig, linux-nvme; +Cc: Yi Zhang, Keith Busch

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] nvme-pci: don't wait freeze during resetting
  2022-08-21  8:47 ` [PATCH 2/2] nvme-pci: don't wait freeze during resetting Ming Lei
  2022-08-25 10:05   ` Chao Leng
@ 2022-08-28 14:37   ` Sagi Grimberg
  2022-09-06  4:49   ` Christoph Hellwig
  2 siblings, 0 replies; 24+ messages in thread
From: Sagi Grimberg @ 2022-08-28 14:37 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig, linux-nvme; +Cc: Yi Zhang, Keith Busch

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] nvme-pci: don't wait freeze during resetting
  2022-08-21  8:47 ` [PATCH 2/2] nvme-pci: don't wait freeze during resetting Ming Lei
  2022-08-25 10:05   ` Chao Leng
  2022-08-28 14:37   ` Sagi Grimberg
@ 2022-09-06  4:49   ` Christoph Hellwig
  2 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2022-09-06  4:49 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg, Keith Busch

On Sun, Aug 21, 2022 at 04:47:54PM +0800, Ming Lei wrote:
> First it isn't necessary to call nvme_wait_freeze in nvme_reset_work().
> If tagset isn't allocated, there can't be any inflight IOs; otherwise
> blk_mq_update_nr_hw_queues can freeze & wait queues.
> 
> Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
> while requests are in progress"), it is fine to unfreeze queue without
> draining inflight IOs.

The apple, RDMA, and TCP drivers seem to follow the same wait_free
pattern, so I think we'll need patches for them as well.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-08-25 10:02   ` Chao Leng
@ 2022-09-06  4:49     ` Christoph Hellwig
  2022-09-06  8:45     ` Ming Lei
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2022-09-06  4:49 UTC (permalink / raw)
  To: Chao Leng
  Cc: Ming Lei, Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg,
	Keith Busch

On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
>
>
> On 2022/8/21 16:47, Ming Lei wrote:
>> From: Keith Busch <kbusch@kernel.org>
>>
>> In some corner cases[1], freeze wait and unfreeze API may be called on
>> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
>> freeze APIs more reliably, then this kind of issues can be avoided.
>> And similar approach has been applied on stopping/quiescing nvme queues.
> This leads to another problem: the process that needs to be
> in the frozen state is not actually frozen.

Where do frozen processes come into play here?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-08-25 10:02   ` Chao Leng
  2022-09-06  4:49     ` Christoph Hellwig
@ 2022-09-06  8:45     ` Ming Lei
  2022-09-06  9:32       ` Chao Leng
  1 sibling, 1 reply; 24+ messages in thread
From: Ming Lei @ 2022-09-06  8:45 UTC (permalink / raw)
  To: Chao Leng
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg, Keith Busch

On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
> 
> 
> On 2022/8/21 16:47, Ming Lei wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > In some corner cases[1], freeze wait and unfreeze API may be called on
> > unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
> > freeze APIs more reliably, then this kind of issues can be avoided.
> > And similar approach has been applied on stopping/quiescing nvme queues.
> This leads to another problem: the process that needs to be
> in the frozen state is not actually frozen.
> It's not safe.

The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
can be done only the flag is set. Not sure how it isn't safe.

Meantime calling blk_mq_freeze_queue_wait() on queue not being started
to freeze is usually a bug, and I think WARN_ON_ONCE() can be added
in nvme_wait_freeze().


Thanks,
Ming



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-09-06  8:45     ` Ming Lei
@ 2022-09-06  9:32       ` Chao Leng
  2022-09-07  0:33         ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Chao Leng @ 2022-09-06  9:32 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg, Keith Busch



On 2022/9/6 16:45, Ming Lei wrote:
> On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
>>
>>
>> On 2022/8/21 16:47, Ming Lei wrote:
>>> From: Keith Busch <kbusch@kernel.org>
>>>
>>> In some corner cases[1], freeze wait and unfreeze API may be called on
>>> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
>>> freeze APIs more reliably, then this kind of issues can be avoided.
>>> And similar approach has been applied on stopping/quiescing nvme queues.
>> This leads to another problem: the process that needs to be
>> in the frozen state is not actually frozen.
>> It's not safe.
> 
> The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
> can be done only the flag is set. Not sure how it isn't safe.
I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
If just set_bit in nvme_start_freeze, it will cause another problem in
below scenario.
A: start freeze and set the bit;B:start freeze and set the bit;
and then
A:test and clear the bit, and unfreeze;B: test and skip;
The queue will be frozen for ever.

In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
No matter how to use NVME_NS_FREEZE , it may cause problems.
The freeze mechanism is perfect, and no additional protection mechanism is required.
> 
> Meantime calling blk_mq_freeze_queue_wait() on queue not being started
> to freeze is usually a bug, and I think WARN_ON_ONCE() can be added
> in nvme_wait_freeze().
> 
> 
> Thanks,
> Ming
> 
> .
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-09-06  9:32       ` Chao Leng
@ 2022-09-07  0:33         ` Ming Lei
  2022-09-07  1:18           ` Chao Leng
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2022-09-07  0:33 UTC (permalink / raw)
  To: Chao Leng
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg,
	Keith Busch, ming.lei

On Tue, Sep 06, 2022 at 05:32:01PM +0800, Chao Leng wrote:
> 
> 
> On 2022/9/6 16:45, Ming Lei wrote:
> > On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
> > > 
> > > 
> > > On 2022/8/21 16:47, Ming Lei wrote:
> > > > From: Keith Busch <kbusch@kernel.org>
> > > > 
> > > > In some corner cases[1], freeze wait and unfreeze API may be called on
> > > > unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
> > > > freeze APIs more reliably, then this kind of issues can be avoided.
> > > > And similar approach has been applied on stopping/quiescing nvme queues.
> > > This leads to another problem: the process that needs to be
> > > in the frozen state is not actually frozen.
> > > It's not safe.
> > 
> > The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
> > can be done only the flag is set. Not sure how it isn't safe.
> I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
> If just set_bit in nvme_start_freeze, it will cause another problem in
> below scenario.
> A: start freeze and set the bit;B:start freeze and set the bit;
> and then
> A:test and clear the bit, and unfreeze;B: test and skip;
> The queue will be frozen for ever.

One simple approach is to replace down_read(->namespaces_rwsem) with
down_write(->namespaces_rwsem) in nvme_start_freeze() and
nvme_unfreeze().

> 
> In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
> No matter how to use NVME_NS_FREEZE , it may cause problems.
> The freeze mechanism is perfect, and no additional protection mechanism is required.

block layer requires queue freeze and unfreeze APIs to be called in
pair strictly, that is why I add the 1st patch.



Thanks,
Ming



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-09-07  0:33         ` Ming Lei
@ 2022-09-07  1:18           ` Chao Leng
  2022-09-07  2:06             ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Chao Leng @ 2022-09-07  1:18 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg, Keith Busch



On 2022/9/7 8:33, Ming Lei wrote:
> On Tue, Sep 06, 2022 at 05:32:01PM +0800, Chao Leng wrote:
>>
>>
>> On 2022/9/6 16:45, Ming Lei wrote:
>>> On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
>>>>
>>>>
>>>> On 2022/8/21 16:47, Ming Lei wrote:
>>>>> From: Keith Busch <kbusch@kernel.org>
>>>>>
>>>>> In some corner cases[1], freeze wait and unfreeze API may be called on
>>>>> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
>>>>> freeze APIs more reliably, then this kind of issues can be avoided.
>>>>> And similar approach has been applied on stopping/quiescing nvme queues.
>>>> This leads to another problem: the process that needs to be
>>>> in the frozen state is not actually frozen.
>>>> It's not safe.
>>>
>>> The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
>>> can be done only the flag is set. Not sure how it isn't safe.
>> I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
>> If just set_bit in nvme_start_freeze, it will cause another problem in
>> below scenario.
>> A: start freeze and set the bit;B:start freeze and set the bit;
>> and then
>> A:test and clear the bit, and unfreeze;B: test and skip;
>> The queue will be frozen for ever.
> 
> One simple approach is to replace down_read(->namespaces_rwsem) with
> down_write(->namespaces_rwsem) in nvme_start_freeze() and
> nvme_unfreeze().
> 
>>
>> In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
>> No matter how to use NVME_NS_FREEZE , it may cause problems.
>> The freeze mechanism is perfect, and no additional protection mechanism is required.
> 
> block layer requires queue freeze and unfreeze APIs to be called in
> pair strictly, that is why I add the 1st patch.
 From your bug analysis, the reason is that nvme_wait_freeze is called without nvme_start_freeze.
patch 2/2 is already delete the nvme_wait_freeze.
If there is another bug of unmatched freeze and unfreeze,
can you describe the analysis of unmatched freeze and unfreeze?
The current patch 1/2 will introduce the unmatched freeze and unfreeze rather than solved it.
Maybe another patch is needed to fix the bug.
> 
> 
> 
> Thanks,
> Ming
> 
> .
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-09-07  1:18           ` Chao Leng
@ 2022-09-07  2:06             ` Ming Lei
  2022-09-07  5:58               ` Chao Leng
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2022-09-07  2:06 UTC (permalink / raw)
  To: Chao Leng
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg, Keith Busch

On Wed, Sep 07, 2022 at 09:18:52AM +0800, Chao Leng wrote:
> 
> 
> On 2022/9/7 8:33, Ming Lei wrote:
> > On Tue, Sep 06, 2022 at 05:32:01PM +0800, Chao Leng wrote:
> > > 
> > > 
> > > On 2022/9/6 16:45, Ming Lei wrote:
> > > > On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
> > > > > 
> > > > > 
> > > > > On 2022/8/21 16:47, Ming Lei wrote:
> > > > > > From: Keith Busch <kbusch@kernel.org>
> > > > > > 
> > > > > > In some corner cases[1], freeze wait and unfreeze API may be called on
> > > > > > unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
> > > > > > freeze APIs more reliably, then this kind of issues can be avoided.
> > > > > > And similar approach has been applied on stopping/quiescing nvme queues.
> > > > > This leads to another problem: the process that needs to be
> > > > > in the frozen state is not actually frozen.
> > > > > It's not safe.
> > > > 
> > > > The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
> > > > can be done only the flag is set. Not sure how it isn't safe.
> > > I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
> > > If just set_bit in nvme_start_freeze, it will cause another problem in
> > > below scenario.
> > > A: start freeze and set the bit;B:start freeze and set the bit;
> > > and then
> > > A:test and clear the bit, and unfreeze;B: test and skip;
> > > The queue will be frozen for ever.
> > 
> > One simple approach is to replace down_read(->namespaces_rwsem) with
> > down_write(->namespaces_rwsem) in nvme_start_freeze() and
> > nvme_unfreeze().
> > 
> > > 
> > > In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
> > > No matter how to use NVME_NS_FREEZE , it may cause problems.
> > > The freeze mechanism is perfect, and no additional protection mechanism is required.
> > 
> > block layer requires queue freeze and unfreeze APIs to be called in
> > pair strictly, that is why I add the 1st patch.
> From your bug analysis, the reason is that nvme_wait_freeze is called without nvme_start_freeze.
> patch 2/2 is already delete the nvme_wait_freeze.
> If there is another bug of unmatched freeze and unfreeze,
> can you describe the analysis of unmatched freeze and unfreeze?
> The current patch 1/2 will introduce the unmatched freeze and unfreeze rather than solved it.
> Maybe another patch is needed to fix the bug.

Please look at nvme_reset_work():
		...
		if (dev->online_queues < 2) {
			...
        } else {
                nvme_start_queues(&dev->ctrl);
				nvme_wait_freeze(&dev->ctrl);
                if (!dev->ctrl.tagset)
                        nvme_pci_alloc_tag_set(dev);
                else
                        nvme_pci_update_nr_queues(dev);
                nvme_dbbuf_set(dev);
                nvme_unfreeze(&dev->ctrl);
		}
		...

nvme_wait_freeze() is called on unfrozen queue, so io hang is caused.

If nvme_unfreeze() is called on unfrozen queue, WARN_ON_ONCE() in
__blk_mq_unfreeze_queue() will be triggered.



Thanks,
Ming



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
  2022-09-07  2:06             ` Ming Lei
@ 2022-09-07  5:58               ` Chao Leng
  0 siblings, 0 replies; 24+ messages in thread
From: Chao Leng @ 2022-09-07  5:58 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg, Keith Busch



On 2022/9/7 10:06, Ming Lei wrote:
> On Wed, Sep 07, 2022 at 09:18:52AM +0800, Chao Leng wrote:
>>
>>
>> On 2022/9/7 8:33, Ming Lei wrote:
>>> On Tue, Sep 06, 2022 at 05:32:01PM +0800, Chao Leng wrote:
>>>>
>>>>
>>>> On 2022/9/6 16:45, Ming Lei wrote:
>>>>> On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
>>>>>>
>>>>>>
>>>>>> On 2022/8/21 16:47, Ming Lei wrote:
>>>>>>> From: Keith Busch <kbusch@kernel.org>
>>>>>>>
>>>>>>> In some corner cases[1], freeze wait and unfreeze API may be called on
>>>>>>> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
>>>>>>> freeze APIs more reliably, then this kind of issues can be avoided.
>>>>>>> And similar approach has been applied on stopping/quiescing nvme queues.
>>>>>> This leads to another problem: the process that needs to be
>>>>>> in the frozen state is not actually frozen.
>>>>>> It's not safe.
>>>>>
>>>>> The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
>>>>> can be done only the flag is set. Not sure how it isn't safe.
>>>> I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
>>>> If just set_bit in nvme_start_freeze, it will cause another problem in
>>>> below scenario.
>>>> A: start freeze and set the bit;B:start freeze and set the bit;
>>>> and then
>>>> A:test and clear the bit, and unfreeze;B: test and skip;
>>>> The queue will be frozen for ever.
>>>
>>> One simple approach is to replace down_read(->namespaces_rwsem) with
>>> down_write(->namespaces_rwsem) in nvme_start_freeze() and
>>> nvme_unfreeze().
>>>
>>>>
>>>> In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
>>>> No matter how to use NVME_NS_FREEZE , it may cause problems.
>>>> The freeze mechanism is perfect, and no additional protection mechanism is required.
>>>
>>> block layer requires queue freeze and unfreeze APIs to be called in
>>> pair strictly, that is why I add the 1st patch.
>>  From your bug analysis, the reason is that nvme_wait_freeze is called without nvme_start_freeze.
>> patch 2/2 is already delete the nvme_wait_freeze.
>> If there is another bug of unmatched freeze and unfreeze,
>> can you describe the analysis of unmatched freeze and unfreeze?
>> The current patch 1/2 will introduce the unmatched freeze and unfreeze rather than solved it.
>> Maybe another patch is needed to fix the bug.
> 
> Please look at nvme_reset_work():
> 		...
> 		if (dev->online_queues < 2) {
> 			...
>          } else {
>                  nvme_start_queues(&dev->ctrl);
> 				nvme_wait_freeze(&dev->ctrl);
>                  if (!dev->ctrl.tagset)
>                          nvme_pci_alloc_tag_set(dev);
>                  else
>                          nvme_pci_update_nr_queues(dev);
>                  nvme_dbbuf_set(dev);
>                  nvme_unfreeze(&dev->ctrl);
> 		}
> 		...
> 
> nvme_wait_freeze() is called on unfrozen queue, so io hang is caused.
> 
> If nvme_unfreeze() is called on unfrozen queue, WARN_ON_ONCE() in
> __blk_mq_unfreeze_queue() will be triggered.
The bug is specific to the PCI transport layer.
It might be a better option to fix this bug by adding flag for normal connection
and abnormal recovery like other transport layers, and then perform different processing.
Different processes are processed differently to avoid unexpected nvme_unfreeze.
> 
> 
> 
> Thanks,
> Ming
> 
> .
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] nvme: make NVMe freeze API reliably
  2022-08-21  8:47 [PATCH 0/2] nvme: make NVMe freeze API reliably Ming Lei
                   ` (2 preceding siblings ...)
  2022-08-24 11:57 ` [PATCH 0/2] nvme: make NVMe freeze API reliably Yi Zhang
@ 2022-09-19 14:52 ` Christoph Hellwig
  2022-09-20  0:51   ` Ming Lei
  3 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2022-09-19 14:52 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg, Keith Busch

Just trying to get back into this:  I think we're all in agreement
that the approach is fine now, aren't we?

Ming, can you respin the series to cover all transport and not just
PCIe, please?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] nvme: make NVMe freeze API reliably
  2022-09-19 14:52 ` Christoph Hellwig
@ 2022-09-20  0:51   ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2022-09-20  0:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Yi Zhang, Sagi Grimberg, Keith Busch

On Mon, Sep 19, 2022 at 04:52:15PM +0200, Christoph Hellwig wrote:
> Just trying to get back into this:  I think we're all in agreement
> that the approach is fine now, aren't we?

Accurately the approach in patch 2 is fine, but there is still
problem in patch 1.

> 
> Ming, can you respin the series to cover all transport and not just
> PCIe, please?

OK.


Thanks,
Ming



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-09-20  0:52 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-21  8:47 [PATCH 0/2] nvme: make NVMe freeze API reliably Ming Lei
2022-08-21  8:47 ` [PATCH 1/2] " Ming Lei
2022-08-24 11:15   ` Hannes Reinecke
2022-08-24 14:07     ` Keith Busch
2022-08-25 10:02   ` Chao Leng
2022-09-06  4:49     ` Christoph Hellwig
2022-09-06  8:45     ` Ming Lei
2022-09-06  9:32       ` Chao Leng
2022-09-07  0:33         ` Ming Lei
2022-09-07  1:18           ` Chao Leng
2022-09-07  2:06             ` Ming Lei
2022-09-07  5:58               ` Chao Leng
2022-08-28 14:37   ` Sagi Grimberg
2022-08-21  8:47 ` [PATCH 2/2] nvme-pci: don't wait freeze during resetting Ming Lei
2022-08-25 10:05   ` Chao Leng
2022-08-25 11:34     ` Ming Lei
2022-08-25 13:42       ` Keith Busch
2022-08-25 14:15         ` Ming Lei
2022-08-26  1:27           ` Chao Leng
2022-08-28 14:37   ` Sagi Grimberg
2022-09-06  4:49   ` Christoph Hellwig
2022-08-24 11:57 ` [PATCH 0/2] nvme: make NVMe freeze API reliably Yi Zhang
2022-09-19 14:52 ` Christoph Hellwig
2022-09-20  0:51   ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.