From mboxrd@z Thu Jan 1 00:00:00 1970 From: maxg@mellanox.com (Max Gurtovoy) Date: Tue, 21 May 2019 13:21:39 +0300 Subject: [PATCH 5.0 66/95] nvme: cancel request synchronously In-Reply-To: <20190521094535.GA28632@ming.t460p> References: <20190509181309.180685671@linuxfoundation.org> <20190509181314.082604502@linuxfoundation.org> <20190521094535.GA28632@ming.t460p> Message-ID: <7a4863ba-ece9-e3fa-8396-21736d54e1fe@mellanox.com> On 5/21/2019 12:45 PM, Ming Lei wrote: > On Tue, May 21, 2019@11:36:26AM +0300, Max Gurtovoy wrote: >> On 5/9/2019 9:42 PM, Greg Kroah-Hartman wrote: >>> [ Upstream commit eb3afb75b57c28599af0dfa03a99579d410749e9 ] >>> >>> nvme_cancel_request() is used in error handler, and it is always >>> reliable to cancel request synchronously, and avoids possible race >>> in which request may be completed after real hw queue is destroyed. >> Ming, >> >> If the completion is async in the block layer, can't a "good" request (not a >> canceled one..) complete after real HW queue is destroyed ? > In theory, it can't. > > 1) in case of error recovery > > It is driver's responsibility to sync normal completion and handling > error. NVMe PCI calls nvme_dev_disable() to shutdown controller, and > there won't be good request any more after nvme_dev_disable() returns. > I am not very familiar with NVMe RDMA code, but nvme_rdma_stop_io_queues() > is supposed to do that for avoiding race with normal completion. Otherwise, > it isn't enough by simply canceling in-flight requests. Indeed nvme_rdma_stop_io_queues will guaranty that we won't get anything from the wire/HCA anymore. But what happens to IO's that were completed before "nvme_rdma_stop_io_queues" in async way: 1. nvme_end_request --> blk_mq_complete_request (async) 2. error recovery starts (queues are stopped) 3. block layer calls ops->complete(rq) on rq from step #1 if the blk_mq_quiesce_queue + blk_mq_unquiesce_queue don't sync the requests from #1, i think it might be problematic.. > > 2) in case of device removal > blk_cleanup_queue() drains all in-queue requests, so there can't be > such issue. > > > Thanks, > Ming