From: "jianchao.wang" <jianchao.w.wang@oracle.com>
To: Ming Lei
Cc: Christoph Hellwig, Keith Busch, stable@vger.kernel.org, Sagi Grimberg, linux-nvme@lists.infradead.org, Xiao Liang
Subject: Re: [PATCH] nvme: don't retry request marked as NVME_REQ_CANCELLED
Date: Sat, 27 Jan 2018 22:29:30 +0800
In-Reply-To: <20180127133127.GA19560@ming.t460p>
References: <20180125081023.13303-1-ming.lei@redhat.com> <20180125101503.GA13375@ming.t460p> <6749d736-ffdd-7f8d-c50c-58453b054ef8@oracle.com> <20180127133127.GA19560@ming.t460p>

Hi Ming

Thanks for your detailed response.
That's really appreciated.

On 01/27/2018 09:31 PM, Ming Lei wrote:
>> But nvme_dev_disable may run with nvme_timeout in parallel or race with it.
> But that doesn't mean it is a race, blk_mq_complete_request() can avoid race
> between timeout and other completions, such as cancel.
>
Yes, I know blk_mq_complete_request() prevents a request from being accessed by
the timeout path and another completion path concurrently. :)
What I worry about is that the timeout path could hold the expired request, so
when nvme_dev_disable() returns, we cannot be sure that all the previously
outstanding requests have been handled. That's really bad.

>> The best way to fix this is to ensure the timeout path has been completed before cancel the
>> previously outstanding requests. (Just ignore the case where the nvme_timeout will invoke nvme_dev_disable,
>> it has to be fixed by other way.)
> Maybe your approach looks a bit clean and simplify the implementation, but seems
> it isn't necessary.
>
> So could you explain a bit what the exact issue you are worrying about? deadlock?
> or others?
There is indeed a potential issue, but only in a very narrow window.
Please refer to https://lkml.org/lkml/2018/1/19/68

As you said, the approach is a bit cleaner and simplifies the implementation.
That's what I really want: to break the complicated relationship between
nvme_timeout and nvme_dev_disable.

Thanks
Jianchao
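
To make the concern above concrete, here is a minimal toy model in Python. It is not kernel code and does not reproduce the real blk-mq or nvme implementation: ToyRequest, State, timeout_begin(), timeout_end() and cancel_all() are hypothetical names invented for this sketch. It only assumes that each request can be claimed by exactly one path at a time, and that the timeout path may still be holding an expired request when the cancel pass runs.

```python
import threading
from enum import Enum


class State(Enum):
    IN_FLIGHT = 1   # issued, not yet completed
    HELD = 2        # currently held by the (toy) timeout path
    COMPLETED = 3   # completed, here only by the (toy) cancel path


class ToyRequest:
    """Hypothetical stand-in for a request; not a kernel structure."""

    def __init__(self):
        self.state = State.IN_FLIGHT
        self._lock = threading.Lock()

    def transition(self, expected, new):
        """Atomically move expected -> new; False if another path owns it."""
        with self._lock:
            if self.state is not expected:
                return False
            self.state = new
            return True


def timeout_begin(rq):
    """Toy timeout path takes hold of an expired request."""
    return rq.transition(State.IN_FLIGHT, State.HELD)


def timeout_end(rq):
    """Toy timeout path lets go of the request again."""
    return rq.transition(State.HELD, State.IN_FLIGHT)


def cancel_all(requests):
    """Toy disable path: complete every request it manages to claim.

    Returns how many requests are still not completed when it returns.
    """
    for rq in requests:
        rq.transition(State.IN_FLIGHT, State.COMPLETED)
    return sum(1 for rq in requests if rq.state is not State.COMPLETED)


def disable_while_timeout_holds(n=4):
    """The cancel pass runs while the timeout path still holds request 0."""
    requests = [ToyRequest() for _ in range(n)]
    timeout_begin(requests[0])
    left = cancel_all(requests)   # request 0 is skipped: it is held
    timeout_end(requests[0])      # released only after cancel returned
    return left


def disable_after_timeout_drained(n=4):
    """The timeout path finishes first, then the cancel pass runs."""
    requests = [ToyRequest() for _ in range(n)]
    timeout_begin(requests[0])
    timeout_end(requests[0])
    return cancel_all(requests)


if __name__ == "__main__":
    print("left over, cancel during timeout:", disable_while_timeout_holds())
    print("left over, cancel after timeout: ", disable_after_timeout_drained())
```

In this toy, the per-request arbitration works as intended in both scenarios: no request is ever handled by two paths at once. The difference is only in what the cancel pass can promise on return. When it overlaps with the timeout path, one request is left unhandled; when the timeout path has drained first, none is.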