From mboxrd@z Thu Jan  1 00:00:00 1970
From: hch@infradead.org (Christoph Hellwig)
Date: Wed, 25 Nov 2015 00:55:37 -0800
Subject: [PATCH] NVMe: Shutdown fixes
In-Reply-To: <1448407814-12544-1-git-send-email-keith.busch@intel.com>
References: <1448407814-12544-1-git-send-email-keith.busch@intel.com>
Message-ID: <20151125085537.GA19066@infradead.org>

> +static inline void nvme_end_req(struct request *req, int error)
> +{
> +	/*
> +	 * The request may already be marked "complete" by blk-mq if
> +	 * completion occurs during a timeout. We set req->errors
> +	 * before telling blk-mq in case this is the target request
> +	 * of a timeout handler. This is safe since this driver never
> +	 * completes the same command twice.
> +	 */
> +	req->errors = error;
> +	blk_mq_complete_request(req, req->errors);


I don't think overwriting req->errors on an already completed request
is a good idea - it's what we used to earlier, but I specificly changed
the blk_mq_complete_request so that we don't do that.  Can you explain
when you want to overwrite the value exactly and why?

>  	/*
> -	 * Just return an error to reset_work if we're already called from
> -	 * probe context.  We don't worry about request / tag reuse as
> -	 * reset_work will immeditalely unwind and remove the controller
> -	 * once we fail a command.
> +	 * Shutdown immediately if controller times out while starting. The
> +	 * reset work will see the pci device disabled when it gets the forced
> +	 * cancellation error. All outstanding requests are completed on
> +	 * shutdown, so we return BLK_EH_HANDLED.
>  	 */
>  	if (test_bit(NVME_CTRL_RESETTING, &dev->flags)) {
> -		req->errors = NVME_SC_ABORT_REQ | NVME_SC_DNR;
> +		dev_warn(dev->dev,
> +				"I/O %d QID %d timeout, disable controller\n",
> +				 req->tag, nvmeq->qid);
> +		nvme_dev_shutdown(dev);
>  		return BLK_EH_HANDLED;

So here you return BLK_EH_HANDLED without setting req->errors, which
sounds like it might be related to the above issue.

>  	/*
> +	 * Shutdown controller if the command was already aborted once
> +	 * before and still hasn't been returned to the driver, or if this
> +	 * is * the admin queue, and then schedule restart.

little formatting error with the '* ' after the is above.

>  	 */
>  	if (!nvmeq->qid || iod->aborted) {
> +		dev_warn(dev->dev, "I/O %d QID %d timeout, reset controller\n",
>  				 req->tag, nvmeq->qid);
> +		nvme_dev_shutdown(dev);
> +		queue_work(nvme_workq, &dev->reset_work);

So now we will get a call to nvme_dev_shutdown for every I/O that times
out.  I'll see how my controller handles the discard of death with that.

>  
>  		/*
> +		 * Mark the request as handled, since the inline shutdown
> +		 * forces all outstanding requests to complete.
>  		 */
> -		return BLK_EH_QUIESCED;
> +		return BLK_EH_HANDLED;

Again we're not setting req->errors here.

> -	if (status > 0) {
> +	if (status) {
>  		dev_err(dev->dev, "Could not set queue count (%d)\n", status);
>  		return 0;
>  	}
> @@ -2023,6 +2038,7 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
>  
>  	nvme_dev_list_remove(dev);
>  
> +	mutex_lock(&dev->shutdown_lock);

Shoudn't we also have a state check that skips all the work here if
the controller has already been shut down?

> +
> +	/*
> +	 * The pci device will be disabled if a timeout occurs while setting up
> +	 * IO queues, but return 0 for "result", so we have to check both.
> +	 */
> +	if (result || !pci_is_enabled(to_pci_dev(dev->dev)))
>  		goto free_tags;

I don't think we should return zero here in that case - that seems to
be a bug in setting up req->errrors in the timeout handler.