From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753714AbeASF6d (ORCPT <rfc822;w@1wt.eu>);
        Fri, 19 Jan 2018 00:58:33 -0500
Received: from aserp2120.oracle.com ([141.146.126.78]:60008 "EHLO
        aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752100AbeASF6I (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 19 Jan 2018 00:58:08 -0500
Subject: Re: [PATCH V5 2/2] nvme-pci: fixup the timeout case when reset is
 ongoing
To: Keith Busch <keith.busch@intel.com>
Cc: axboe@fb.com, hch@lst.de, sagi@grimberg.me, maxg@mellanox.com,
        james.smart@broadcom.com, linux-nvme@lists.infradead.org,
        linux-kernel@vger.kernel.org
References: <1516270202-8051-1-git-send-email-jianchao.w.wang@oracle.com>
 <1516270202-8051-3-git-send-email-jianchao.w.wang@oracle.com>
 <20180119045944.GC12043@localhost.localdomain>
From: "jianchao.wang" <jianchao.w.wang@oracle.com>
Message-ID: <0b74b36d-ecb5-e9e2-2900-6dc9c9699658@oracle.com>
Date: Fri, 19 Jan 2018 13:55:29 +0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <20180119045944.GC12043@localhost.localdomain>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8778 signatures=668654
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0
 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999
 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.0.1-1711220000 definitions=main-1801190074
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Keith

Thanks for your kind response and guidance.

On 01/19/2018 12:59 PM, Keith Busch wrote:
> On Thu, Jan 18, 2018 at 06:10:02PM +0800, Jianchao Wang wrote:
>> +	 * - When the ctrl.state is NVME_CTRL_RESETTING, the expired
>> +	 *   request should come from the previous work and we handle
>> +	 *   it as nvme_cancel_request.
>> +	 * - When the ctrl.state is NVME_CTRL_RECONNECTING, the expired
>> +	 *   request should come from the initializing procedure such as
>> +	 *   setup io queues, because all the previous outstanding
>> +	 *   requests should have been cancelled.
>>  	 */
>> -	if (dev->ctrl.state == NVME_CTRL_RESETTING) {
>> -		dev_warn(dev->ctrl.device,
>> -			 "I/O %d QID %d timeout, disable controller\n",
>> -			 req->tag, nvmeq->qid);
>> -		nvme_dev_disable(dev, false);
>> +	switch (dev->ctrl.state) {
>> +	case NVME_CTRL_RESETTING:
>> +		nvme_req(req)->status = NVME_SC_ABORT_REQ;
>> +		return BLK_EH_HANDLED;
>> +	case NVME_CTRL_RECONNECTING:
>> +		WARN_ON_ONCE(nvmeq->qid);
>>  		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
>>  		return BLK_EH_HANDLED;
>> +	default:
>> +		break;
>>  	}
> 
> The driver may be giving up on the command here, but that doesn't mean
> the controller has. We can't just end the request like this because that
> will release the memory the controller still owns. We must wait until
> after nvme_dev_disable clears bus master because we can't say for sure
> the controller isn't going to write to that address right after we end
> the request.
> 
Yes, but the controller is about to be reset or shut down at that point.
Even if the controller accesses a bad address and goes wrong, everything will
be OK after the reset or shutdown. :)
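
For anyone following the thread, the dispatch in the quoted hunk can be
modeled in userspace roughly as below. This is only a sketch: the enum,
struct and function names are hypothetical stand-ins rather than the kernel
definitions, and it does not model DMA or the memory-ownership concern
raised above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's controller states and
 * block-layer timeout return codes; not the real definitions. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_RECONNECTING };
enum eh_result { EH_CONTINUE, EH_HANDLED };

struct model_req {
	int qid;        /* 0 models the admin queue */
	bool aborted;   /* models status = NVME_SC_ABORT_REQ */
	bool cancelled; /* models flags |= NVME_REQ_CANCELLED */
};

/* Mirrors the switch in the quoted hunk. EH_CONTINUE is an assumption of
 * this model meaning "fall through to the rest of the timeout handler". */
static enum eh_result model_timeout(enum ctrl_state state,
				    struct model_req *req)
{
	switch (state) {
	case CTRL_RESETTING:
		/* Expired request from the previous work: end it as aborted. */
		req->aborted = true;
		return EH_HANDLED;
	case CTRL_RECONNECTING:
		/* Only admin commands are expected here (WARN_ON_ONCE). */
		if (req->qid != 0)
			fprintf(stderr, "unexpected timeout on qid %d\n",
				req->qid);
		req->cancelled = true;
		return EH_HANDLED;
	default:
		return EH_CONTINUE;
	}
}
```

Note that in this model the RESETTING case ends the request without first
disabling the device, which is the ordering being questioned above.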

Thanks
Jianchao
