From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754745AbcBBMlo (ORCPT ); Tue, 2 Feb 2016 07:41:44 -0500
Received: from mail-wm0-f44.google.com ([74.125.82.44]:33668 "EHLO
	mail-wm0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754536AbcBBMln (ORCPT ); Tue, 2 Feb 2016 07:41:43 -0500
Subject: Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
To: Wenbo Wang, Jens Axboe, Wenbo Wang, "keith.busch@intel.com"
References: <1454341324-21273-1-git-send-email-mail_weber_wang@163.com>
	<56AF8DB5.70206@fb.com>
Cc: "Wenwei.Tao", "linux-kernel@vger.kernel.org",
	"linux-nvme@lists.infradead.org"
From: Sagi Grimberg <sagig@dev.mellanox.co.il>
Message-ID: <56B0A401.30306@dev.mellanox.co.il>
Date: Tue, 2 Feb 2016 14:41:37 +0200
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101
	Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To: 
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

> Jens,
>
> I did the following test to validate the issue.
>
> 1. Modify the code as below to increase the chance of races.
>        Add 10s delay after nvme_dev_unmap() in nvme_dev_disable()
>        Add 10s delay before __nvme_submit_cmd()
> 2. Run dd and, at the same time, echo 1 to reset_controller to trigger
>    a device reset. Finally the kernel crashes due to accessing the
>    unmapped door bell register.
>
> Following is the execution order of the two code paths:
> __blk_mq_run_hw_queue
>     Test BLK_MQ_S_STOPPED
> nvme_dev_disable()
>     nvme_stop_queues()      <-- set BLK_MQ_S_STOPPED
>     nvme_dev_unmap(dev)     <-- unmap door bell
> nvme_queue_rq()
>     Touch door bell         <-- panic here

First of all, I think we need to cancel all in-flight requests before
nvme_dev_unmap. With my patches that move I/O termination to the nvme
core ([PATCH v1 0/3] Move active IO termination to the core), the
change needed is:

--
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e921165..2288bdb 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1890,10 +1890,11 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
 		nvme_shutdown_ctrl(&dev->ctrl);
 		nvme_disable_queue(dev, 0);
 	}
-	nvme_dev_unmap(dev);
 
 	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_io, dev);
 	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_io, dev);
+
+	nvme_dev_unmap(dev);
 }
 
 static int nvme_setup_prp_pools(struct nvme_dev *dev)
--

But we still need a way to wait for all active queue_rq calls to
finish. It seems like we need to maintain a request_fn_active
equivalent for blk-mq and provide an API
(blk_mq_wait_for_active_requests?) that waits for it to drop to zero.

Thoughts?
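
A minimal sketch of what such an API could look like, purely as an
illustration: none of the fields or helpers below exist in the block
layer, and every name (blk_mq_active_count, blk_mq_queue_rq_enter/exit,
blk_mq_wait_for_active_requests) is made up for this sketch. The idea
is the blk-mq analogue of request_fn_active: count entries into
->queue_rq in the dispatch path, and let the teardown path wait for the
count to drop to zero before unmapping the doorbells.

--
/*
 * Hypothetical sketch only -- these fields and helpers do not exist in
 * the block layer; all names are illustrative.  The counter would live
 * in struct request_queue (or per hw queue) and plays the role that
 * request_fn_active plays for the legacy request path.
 */
#include <linux/atomic.h>
#include <linux/wait.h>

struct blk_mq_active_count {
	atomic_t		active;	/* queue_rq invocations in flight */
	wait_queue_head_t	wait;	/* woken when active drops to 0  */
};

static inline void blk_mq_active_count_init(struct blk_mq_active_count *ac)
{
	atomic_set(&ac->active, 0);
	init_waitqueue_head(&ac->wait);
}

/* Dispatch path: bracket the driver's ->queue_rq() call. */
static inline void blk_mq_queue_rq_enter(struct blk_mq_active_count *ac)
{
	atomic_inc(&ac->active);
}

static inline void blk_mq_queue_rq_exit(struct blk_mq_active_count *ac)
{
	if (atomic_dec_and_test(&ac->active))
		wake_up_all(&ac->wait);
}

/*
 * Teardown path: with the queues already stopped (BLK_MQ_S_STOPPED set,
 * so no new dispatches are started), wait for every queue_rq call that
 * raced with the stop to return before the caller unmaps the doorbell
 * BAR.
 */
static void blk_mq_wait_for_active_requests(struct blk_mq_active_count *ac)
{
	wait_event(ac->wait, atomic_read(&ac->active) == 0);
}
--

In nvme_dev_shutdown() the wait would presumably sit after
nvme_stop_queues() and the busy_iter cancellations but before
nvme_dev_unmap(), closing the window shown in the execution-order
trace above.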