From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Subject: Re: [PATCH 1/2] nvme: pci: simplify timeout handling
To: Ming Lei
Cc: Jens Axboe, linux-block@vger.kernel.org, Sagi Grimberg,
	linux-nvme@lists.infradead.org, Keith Busch, Christoph Hellwig
References: <20180426123956.26039-1-ming.lei@redhat.com>
	<20180426123956.26039-2-ming.lei@redhat.com>
	<20180426155722.GA3597@ming.t460p>
From: "jianchao.wang"
Message-ID: <325688af-3ae2-49db-3a59-ef3903adcdf6@oracle.com>
Date: Fri, 27 Apr 2018 09:37:06 +0800
MIME-Version: 1.0
In-Reply-To: <20180426155722.GA3597@ming.t460p>
Content-Type: text/plain; charset=utf-8
List-ID:

On 04/26/2018 11:57 PM, Ming Lei wrote:
> Hi Jianchao,
>
> On Thu, Apr 26, 2018 at 11:07:56PM +0800, jianchao.wang wrote:
>> Hi Ming
>>
>> Thanks for your wonderful solution. :)
>>
>> On 04/26/2018 08:39 PM, Ming Lei wrote:
>>> +/*
>>> + * This one is called after queues are quiesced, and no in-flight timeout
>>> + * and nvme interrupt handling.
>>> + */
>>> +static void nvme_pci_cancel_request(struct request *req, void *data,
>>> +		bool reserved)
>>> +{
>>> +	/* make sure timed-out requests are covered too */
>>> +	if (req->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) {
>>> +		req->aborted_gstate = 0;
>>> +		req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED;
>>> +	}
>>> +
>>> +	nvme_cancel_request(req, data, reserved);
>>> +}
>>> +
>>>  static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>>>  {
>>>  	int i;
>>> @@ -2223,10 +2316,17 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>>>  	for (i = dev->ctrl.queue_count - 1; i >= 0; i--)
>>>  		nvme_suspend_queue(&dev->queues[i]);
>>>
>>> +	/*
>>> +	 * safe to sync timeout after queues are quiesced, then all
>>> +	 * requests (including the timed-out ones) will be canceled.
>>> +	 */
>>> +	nvme_sync_queues(&dev->ctrl);
>>> +	blk_sync_queue(dev->ctrl.admin_q);
>>> +
>> Looks like blk_sync_queue cannot drain all the timeout work.
>>
>> blk_sync_queue
>>   -> del_timer_sync
>>                        blk_mq_timeout_work
>>                          -> mod_timer
>>   -> cancel_work_sync
>>
>> The timeout work may come back again, so we may need to force all the
>> in-flight requests to be timed out with blk_abort_request().
>>
>
> blk_abort_request() seems over-kill, we could avoid this race simply by
> returning EH_NOT_HANDLED if the controller is in-recovery.

Returning EH_NOT_HANDLED may not be enough. Please consider the
following scenario:

nvme_error_handler
  -> nvme_dev_disable
    -> blk_sync_queue
                         // timeout comes again due to the
                         // scenario above
                         blk_mq_timeout_work
                           -> blk_mq_check_expired
                             -> set aborted_gstate
    -> nvme_pci_cancel_request
      -> RQF_MQ_TIMEOUT_EXPIRED has not been set
      -> nvme_cancel_request
        -> blk_mq_complete_request
          -> do nothing
                           -> blk_mq_terminate_expired
                             -> blk_mq_rq_timed_out
                               -> set RQF_MQ_TIMEOUT_EXPIRED
                               -> .timeout returns BLK_EH_NOT_HANDLED

Then the timed-out request is leaked.
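
To make it concrete, nvme_timeout() would have to complete the request
itself instead of returning BLK_EH_NOT_HANDLED. An untested sketch only:
the NVME_CTRL_RESETTING check and completing via NVME_REQ_CANCELLED /
BLK_EH_HANDLED are my illustration, not something this patchset does:

static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
{
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	struct nvme_dev *dev = iod->nvmeq->dev;

	/*
	 * Recovery is already in progress, so nvme_pci_cancel_request()
	 * may have run past this request already; returning
	 * BLK_EH_NOT_HANDLED would leak it.  Mark it cancelled and
	 * return BLK_EH_HANDLED so blk-mq completes it for us.
	 */
	if (dev->ctrl.state == NVME_CTRL_RESETTING) {
		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
		return BLK_EH_HANDLED;
	}

	/*
	 * Placeholder: the real function goes on to abort the command
	 * and reset the controller here.
	 */
	return BLK_EH_RESET_TIMER;
}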
>
>>>  	nvme_pci_disable(dev);
>>
>> The interrupt will not come, but one may still be running.
>> A synchronize_sched() here?
>
> We may cover this case by moving nvme_suspend_queue() before
> nvme_stop_queues().
>
> Both are very good catches, thanks!
>
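
For the interrupt race, the barrier would sit roughly here in
nvme_dev_disable(). Again an untested sketch: the exact placement and
the busy-iter calls below are my reading of how the patched function
would look, not a posted diff:

	for (i = dev->ctrl.queue_count - 1; i >= 0; i--)
		nvme_suspend_queue(&dev->queues[i]);

	nvme_pci_disable(dev);

	/*
	 * nvme_pci_disable() prevents new interrupts, but an instance
	 * of nvme_irq() may still be running on another CPU.  Interrupt
	 * handlers run with preemption disabled, so synchronize_sched()
	 * waits for any in-flight handler to finish before the requests
	 * are cancelled below.
	 */
	synchronize_sched();

	blk_mq_tagset_busy_iter(&dev->tagset,
			nvme_pci_cancel_request, &dev->ctrl);
	blk_mq_tagset_busy_iter(&dev->admin_tagset,
			nvme_pci_cancel_request, &dev->ctrl);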