From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754760AbeASH6W (ORCPT );
	Fri, 19 Jan 2018 02:58:22 -0500
Received: from mga05.intel.com ([192.55.52.43]:21206 "EHLO mga05.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750948AbeASH6N (ORCPT );
	Fri, 19 Jan 2018 02:58:13 -0500
X-Amp-Result: UNKNOWN
X-Amp-Original-Verdict: FILE UNKNOWN
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.46,381,1511856000"; d="scan'208";a="10963715"
Date: Fri, 19 Jan 2018 01:01:30 -0700
From: Keith Busch 
To: Jianchao Wang 
Cc: axboe@fb.com, hch@lst.de, sagi@grimberg.me, maxg@mellanox.com,
	james.smart@broadcom.com, linux-nvme@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V5 0/2] nvme-pci: fix the timeout case when reset is ongoing
Message-ID: <20180119080130.GE12043@localhost.localdomain>
References: <1516270202-8051-1-git-send-email-jianchao.w.wang@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1516270202-8051-1-git-send-email-jianchao.w.wang@oracle.com>
User-Agent: Mutt/1.9.1 (2017-09-22)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 18, 2018 at 06:10:00PM +0800, Jianchao Wang wrote:
> Hello
> 
> Please consider the following scenario.
> nvme_reset_ctrl
>     -> set state to RESETTING
>     -> queue reset_work
>     (scheduling)
> nvme_reset_work
>     -> nvme_dev_disable
>         -> quiesce queues
>         -> nvme_cancel_request
>            on outstanding requests
> -------------------------------_boundary_
>     -> nvme initializing (issue request on adminq)
> 
> Before the _boundary_, we not only quiesce the queues, but also cancel
> all the outstanding requests.
> 
> A request could expire when the ctrl state is RESETTING.
>  - If the timeout occurs before the _boundary_, the expired requests
>    are from the previous work.
>  - Otherwise, the expired requests are from the controller initializing
>    procedure, such as sending cq/sq create commands to adminq to set up
>    io queues.
> In the current implementation, nvme_timeout cannot identify the _boundary_,
> so it only handles the second case above.

Bear with me a moment, as I'm only just now getting a real chance to look
at this, and I'm not quite sure I follow what problem this is solving.

The nvme_dev_disable routine makes forward progress without depending on
timeout handling to complete expired commands. Once controller disabling
completes, there can't possibly be any started requests that can expire.
So we don't need nvme_timeout to do anything for requests above the
boundary.