Subject: Re: [PATCH V5 2/2] nvme-pci: fixup the timeout case when reset is ongoing
From: "jianchao.wang" <jianchao.w.wang@oracle.com>
To: Keith Busch
Cc: axboe@fb.com, hch@lst.de, sagi@grimberg.me, maxg@mellanox.com, james.smart@broadcom.com, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Date: Fri, 19 Jan 2018 13:55:29 +0800
Message-ID: <0b74b36d-ecb5-e9e2-2900-6dc9c9699658@oracle.com>
In-Reply-To: <20180119045944.GC12043@localhost.localdomain>
References: <1516270202-8051-1-git-send-email-jianchao.w.wang@oracle.com> <1516270202-8051-3-git-send-email-jianchao.w.wang@oracle.com> <20180119045944.GC12043@localhost.localdomain>

Hi Keith,

Thanks for your kind response and guidance.

On 01/19/2018 12:59 PM, Keith Busch wrote:
> On Thu, Jan 18, 2018 at 06:10:02PM +0800, Jianchao Wang wrote:
>> +         * - When the ctrl.state is NVME_CTRL_RESETTING, the expired
>> +         * request should come from the previous work and we handle
>> +         * it as nvme_cancel_request.
>> +         * - When the ctrl.state is NVME_CTRL_RECONNECTING, the expired
>> +         * request should come from the initializing procedure such as
>> +         * setup io queues, because all the previous outstanding
>> +         * requests should have been cancelled.
>>           */
>> -        if (dev->ctrl.state == NVME_CTRL_RESETTING) {
>> -                dev_warn(dev->ctrl.device,
>> -                         "I/O %d QID %d timeout, disable controller\n",
>> -                         req->tag, nvmeq->qid);
>> -                nvme_dev_disable(dev, false);
>> +        switch (dev->ctrl.state) {
>> +        case NVME_CTRL_RESETTING:
>> +                nvme_req(req)->status = NVME_SC_ABORT_REQ;
>> +                return BLK_EH_HANDLED;
>> +        case NVME_CTRL_RECONNECTING:
>> +                WARN_ON_ONCE(nvmeq->qid);
>>                  nvme_req(req)->flags |= NVME_REQ_CANCELLED;
>>                  return BLK_EH_HANDLED;
>> +        default:
>> +                break;
>>          }
>
> The driver may be giving up on the command here, but that doesn't mean
> the controller has. We can't just end the request like this because that
> will release the memory the controller still owns. We must wait until
> after nvme_dev_disable clears bus master because we can't say for sure
> the controller isn't going to write to that address right after we end
> the request.
>

Yes, but the controller is about to be reset or shut down at that point.
Even if the controller accesses a bad address and goes wrong, everything
will be fine after the reset or shutdown. :)

Thanks
Jianchao