From: hch@lst.de (Christoph Hellwig)
Date: Fri, 17 Feb 2017 16:27:13 +0100
Subject: [PATCH 5/5] nvme/pci: Complete all stuck requests
In-Reply-To: <1486768553-13738-6-git-send-email-keith.busch@intel.com>
References: <1486768553-13738-1-git-send-email-keith.busch@intel.com>
 <1486768553-13738-6-git-send-email-keith.busch@intel.com>
Message-ID: <20170217152713.GA27158@lst.de>

>  	u32 csts = -1;
> +	bool drain_queue = pci_is_enabled(to_pci_dev(dev->dev));
>
>  	del_timer_sync(&dev->watchdog_timer);
>  	cancel_work_sync(&dev->reset_work);
>
>  	mutex_lock(&dev->shutdown_lock);
> -	if (pci_is_enabled(to_pci_dev(dev->dev))) {
> +	if (drain_queue) {
> +		if (shutdown)
> +			nvme_start_freeze(&dev->ctrl);

So if the device is enabled and we are going to shut it down, we're
going to freeze all I/O queues here.  Question 1: why skip the freeze
if we are not shutting down?

>  		nvme_stop_queues(&dev->ctrl);

Especially as we're now going to wait for all I/O to finish here in
all shutdown cases.

>  		csts = readl(dev->bar + NVME_REG_CSTS);
>  	}
> @@ -1701,6 +1704,25 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>
>  	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
>  	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);

And kill all busy requests down here.

> +
> +	/*
> +	 * If shutting down, the driver will not be starting up queues again,
> +	 * so must drain all entered requests to their demise to avoid
> +	 * deadlocking blk-mq hot-cpu notifier.
> +	 */
> +	if (drain_queue && shutdown) {
> +		nvme_start_queues(&dev->ctrl);
> +		/*
> +		 * Waiting for frozen increases the freeze depth. Since we
> +		 * already start the freeze earlier in this function to stop
> +		 * incoming requests, we have to unfreeze after froze to get
> +		 * the depth back to the desired.
> +		 */
> +		nvme_wait_freeze(&dev->ctrl);
> +		nvme_unfreeze(&dev->ctrl);
> +		nvme_stop_queues(&dev->ctrl);

And all this (just like the start_freeze + quiesce sequence above)
really sounds like something we'd need to move to the core (rough
sketch at the end of this mail).

> +	/*
> +	 * If we are resuming from suspend, the queue was set to freeze
> +	 * to prevent blk-mq's hot CPU notifier from getting stuck on
> +	 * requests that entered the queue that NVMe had quiesced. Now
> +	 * that we are resuming and have notified blk-mq of the new h/w
> +	 * context queue count, it is safe to unfreeze the queues.
> +	 */
> +	if (was_suspend)
> +		nvme_unfreeze(&dev->ctrl);

And this change I don't understand at all.  It doesn't seem to pair up
with anything else in the patch.
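
To make the "move this to the core" suggestion above a bit more
concrete, here is a minimal sketch of what such shared helpers could
look like in drivers/nvme/host/core.c, assuming the nvme_start_freeze /
nvme_wait_freeze / nvme_unfreeze API added earlier in this series.  The
helper names are made up for illustration only, this is not something
that exists today:

/*
 * Sketch only: bundle the freeze + quiesce sequence a transport driver
 * runs on the way into a controller shutdown, so the freeze-depth
 * accounting lives in one place.
 */
void nvme_freeze_and_quiesce_queues(struct nvme_ctrl *ctrl)
{
	/* block new submissions at the blk-mq entry point */
	nvme_start_freeze(ctrl);
	/* quiesce the hw queues so nothing new reaches the driver */
	nvme_stop_queues(ctrl);
}

/*
 * Sketch only: once all outstanding requests have been cancelled,
 * drain everything that already entered the queues, then quiesce
 * again before the controller is torn down.
 */
void nvme_drain_and_quiesce_queues(struct nvme_ctrl *ctrl)
{
	/* let the cancelled/requeued requests flow to completion */
	nvme_start_queues(ctrl);
	/* wait for every namespace queue to actually become frozen */
	nvme_wait_freeze(ctrl);
	/* balance the freeze reference taken when freezing started */
	nvme_unfreeze(ctrl);
	/* quiesce again now that the queues are drained */
	nvme_stop_queues(ctrl);
}

The PCIe driver could then call the first helper where it currently
open-codes nvme_start_freeze + nvme_stop_queues and the second one in
the drain_queue && shutdown branch, and other transports could reuse
the same sequence instead of duplicating it.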