From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Fri, 11 May 2018 05:50:07 +0800
From: Ming Lei <ming.lei@redhat.com>
To: Keith Busch
Cc: Jens Axboe, linux-block, Sagi Grimberg, Ming Lei, linux-nvme,
 Keith Busch, Jianchao Wang, Christoph Hellwig
Subject: Re: [PATCH 1/2] nvme: pci: simplify timeout handling
Message-ID: <20180510215006.GD3515@ming.t460p>
References: <20180426123956.26039-2-ming.lei@redhat.com>
 <20180427175157.GB5073@localhost.localdomain>
 <20180428035015.GB5657@ming.t460p>
 <20180508153038.GA30842@localhost.localdomain>
 <20180510210548.GB4787@localhost.localdomain>
 <20180510211829.GC4787@localhost.localdomain>
 <20180510212444.GC3515@ming.t460p>
 <20180510214441.GD4787@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20180510214441.GD4787@localhost.localdomain>
List-ID:

On Thu, May 10, 2018 at 03:44:41PM -0600, Keith Busch wrote:
> On Fri, May 11, 2018 at 05:24:46AM +0800, Ming Lei wrote:
> > Could you share the link?
>
> The diff was in this reply here:
>
> http://lists.infradead.org/pipermail/linux-nvme/2018-April/017019.html
>
> > Firstly, the previous nvme_sync_queues() won't work reliably, so this
> > patch introduces blk_unquiesce_timeout() and blk_quiesce_timeout() for
> > this purpose.
> >
> > Secondly, I remembered that you only call nvme_sync_queues() at the
> > entry of nvme_reset_work(), but a timeout (either admin or normal IO)
> > can happen again during resetting; that is another race addressed by
> > this patchset, and it can't be covered by your proposal.
>
> I sync the queues at the beginning because it ensures there is not
> a single in flight request for the entire controller (all namespaces
> plus admin queue) before transitioning to the connecting state.

But that can't avoid the race I mentioned, since nvme_dev_disable() can
happen again during resetting.

> If a command times out during connecting state, we go to the dead state.

That is too risky, since IO issued during resetting isn't much different
from IO submitted via other IO paths. For example, in the case of
blktests block/011, the controller shouldn't have been put into the dead
state, and this patchset (V4 & V5) covers that case well.

Thanks,
Ming
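To make the race above concrete, here is a minimal userspace model in
plain C with pthreads. It is only a sketch: reset_work, timeout_handler
and dev_disable are illustrative stand-ins for the nvme driver paths,
not the actual kernel code. Syncing queues at the entry of the reset
work says nothing about requests issued after the reset has begun, so a
timeout (and a second disable) can still land mid-reset:

/*
 * Userspace model of the race: the reset worker syncs queues once at
 * entry, then a request times out while the reset is still running and
 * the timeout path calls the disable routine a second time.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int resetting;

static void dev_disable(const char *who)
{
	/* Models nvme_dev_disable(): tears down queues and interrupts. */
	pthread_mutex_lock(&lock);
	printf("%s: disabling controller (resetting=%d)\n", who, resetting);
	pthread_mutex_unlock(&lock);
}

static void *timeout_handler(void *arg)
{
	(void)arg;
	usleep(1000);		/* a request times out mid-reset... */
	dev_disable("timeout");	/* ...and disable races with the reset */
	return NULL;
}

static void *reset_work(void *arg)
{
	(void)arg;
	/* Models nvme_reset_work(): queues are synced once at entry... */
	printf("reset: queues synced at entry, nothing in flight\n");
	resetting = 1;
	/* ...but new IO, and thus new timeouts, can occur below. */
	usleep(5000);		/* re-enabling queues, issuing admin IO */
	resetting = 0;
	printf("reset: done\n");
	return NULL;
}

int main(void)
{
	pthread_t r, t;

	pthread_create(&r, NULL, reset_work, NULL);
	pthread_create(&t, NULL, timeout_handler, NULL);
	pthread_join(r, NULL);
	pthread_join(t, NULL);
	return 0;
}

The second dev_disable() fires while reset_work() is still bringing the
controller back up, which is exactly the window the patchset is trying
to close.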
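For background on the helpers named in the quoted text, here is a
sketch of the idea behind blk_unquiesce_timeout()/blk_quiesce_timeout()
as described in this thread. The flag and function signatures below are
hypothetical userspace stand-ins, not the proposed block-layer API
(which operates on a struct request_queue): while timeouts are
quiesced, the timeout handler refuses to act, so the reset path cannot
race with timeout-driven teardown.

/*
 * Userspace sketch of timeout quiescing: the reset path freezes
 * timeout handling for the duration of the reset, then re-enables it.
 */
#include <stdbool.h>
#include <stdio.h>

static bool timeout_quiesced;

static void quiesce_timeout(void)   { timeout_quiesced = true; }
static void unquiesce_timeout(void) { timeout_quiesced = false; }

static void timeout_handler(void)
{
	if (timeout_quiesced) {
		printf("timeout: quiesced, deferring expiry handling\n");
		return;
	}
	printf("timeout: handling expired request\n");
}

int main(void)
{
	quiesce_timeout();	/* reset path freezes timeout handling */
	timeout_handler();	/* would otherwise fire mid-reset */
	unquiesce_timeout();	/* reset done; timeouts may run again */
	timeout_handler();
	return 0;
}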