Date: Fri, 19 Jan 2018 01:42:18 -0700
From: Keith Busch <keith.busch@intel.com>
To: "jianchao.wang" <jianchao.w.wang@oracle.com>
Cc: axboe@fb.com, hch@lst.de, sagi@grimberg.me, maxg@mellanox.com,
	james.smart@broadcom.com, linux-nvme@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V5 0/2] nvme-pci: fix the timeout case when reset is ongoing
Message-ID: <20180119084218.GF12043@localhost.localdomain>
References: <1516270202-8051-1-git-send-email-jianchao.w.wang@oracle.com>
	<20180119080130.GE12043@localhost.localdomain>
	<0639aa2f-d153-5aac-ce08-df0d4b45f9a0@oracle.com>
In-Reply-To: <0639aa2f-d153-5aac-ce08-df0d4b45f9a0@oracle.com>

On Fri, Jan 19, 2018 at 04:14:02PM +0800, jianchao.wang wrote:
> On 01/19/2018 04:01 PM, Keith Busch wrote:
> > The nvme_dev_disable routine makes forward progress without depending on
> > timeout handling to complete expired commands. Once controller disabling
> > completes, there can't possibly be any started requests that can expire.
> > So we don't need nvme_timeout to do anything for requests above the
> > boundary.
>
> Yes, once controller disabling completes, any started requests will be
> handled and cannot expire.
> But before that _boundary_, an nvme_timeout context could be running in
> parallel with nvme_dev_disable.
> If the timeout path has grabbed a request, nvme_dev_disable cannot get it
> and cancel it, so even after nvme_dev_disable completes there could still
> be a request held in an nvme_timeout context.
>
> The worst case is:
>
>     nvme_timeout                            nvme_reset_work
>       if (ctrl->state == RESETTING)           nvme_dev_disable
>         nvme_dev_disable                      initializing procedure
>
> i.e. the nvme_dev_disable in the timeout path runs in parallel with the
> re-init procedure in nvme_reset_work.

Okay, I see what you're saying. That's a pretty obscure case, as normally
we enter nvme_reset_work with the queues already disabled, but there are
a few cases where we need nvme_reset_work to handle that.

But if we are in that case, I think we really just want to sync the
queues. What do you think of this?
---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fde6fd2e7eef..516383193416 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3520,6 +3520,17 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
 
+void nvme_sync_queues(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_sync_queue(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_sync_queues);
+
 void nvme_start_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 8e7fc1b041b7..45b1b8ceddb6 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -374,6 +374,7 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 
 void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
+void nvme_sync_queues(struct nvme_ctrl *ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl);
 void nvme_unfreeze(struct nvme_ctrl *ctrl);
 void nvme_wait_freeze(struct nvme_ctrl *ctrl);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a2ffb557b616..1fe00be22ad1 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2289,8 +2289,10 @@ static void nvme_reset_work(struct work_struct *work)
 	 * If we're called to reset a live controller first shut it down before
 	 * moving on.
 	 */
-	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
+	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE) {
 		nvme_dev_disable(dev, false);
+		nvme_sync_queues(&dev->ctrl);
+	}
 
 	result = nvme_pci_enable(dev);
 	if (result)
--
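
To spell out the ordering the pci.c hunk is after, here is a minimal sketch
of nvme_reset_work() with the change applied. It only restates the reasoning
in this thread: nvme_reset_sketch() is a hypothetical name used for
illustration, the real function does much more, and the comments describe
the intent of the proposed nvme_sync_queues() rather than the internals of
blk_sync_queue().

static void nvme_reset_sketch(struct nvme_dev *dev)
{
	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE) {
		/*
		 * Tear down the live controller and reclaim every started
		 * request.  An nvme_timeout() that fired just before this
		 * point may still be running, and may itself be calling
		 * nvme_dev_disable() in parallel.
		 */
		nvme_dev_disable(dev, false);

		/*
		 * Proposed helper: blk_sync_queue() on each namespace
		 * queue, so that any in-flight timeout handling has
		 * finished before the controller is touched again.  This
		 * closes the race shown in the "worst case" diagram above.
		 */
		nvme_sync_queues(&dev->ctrl);
	}

	/* Re-initialization now cannot race with a timeout handler. */
	if (nvme_pci_enable(dev))
		return;

	/* ... remainder of the reset/re-init work ... */
}

The added call is purely about ordering: nvme_dev_disable() stops the
controller, and syncing the queues afterwards guarantees that a timeout
handler which slipped in before the disable has returned before
re-initialization starts.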