From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755437AbcBCOle (ORCPT ); Wed, 3 Feb 2016 09:41:34 -0500
Received: from mga11.intel.com ([192.55.52.93]:19323 "EHLO mga11.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751952AbcBCOlc (ORCPT ); Wed, 3 Feb 2016 09:41:32 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.22,391,1449561600"; d="scan'208";a="895555754"
Date: Wed, 3 Feb 2016 14:41:24 +0000
From: Keith Busch
To: Wenbo Wang
Cc: Jens Axboe , Wenbo Wang ,
	"linux-kernel@vger.kernel.org" ,
	"linux-nvme@lists.infradead.org" ,
	"Wenwei.Tao"
Subject: Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
Message-ID: <20160203144123.GB23910@localhost.localdomain>
References: <1454341324-21273-1-git-send-email-mail_weber_wang@163.com>
	<56AF8DB5.70206@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 02, 2016 at 07:15:57AM +0000, Wenbo Wang wrote:
> I did the following test to validate the issue.
>
> 1. Modify code as below to increase the chance of races.
>	Add 10s delay after nvme_dev_unmap() in nvme_dev_disable()
>	Add 10s delay before __nvme_submit_cmd()
> 2. Run dd and at the same time, echo 1 to reset_controller to trigger device reset. Finally kernel crashes due to accessing unmapped door bell register.
>
> Following is the execution order of the two code paths:
> __blk_mq_run_hw_queue
>   Test BLK_MQ_S_STOPPED
>				nvme_dev_disable()
>				     nvme_stop_queues()  <-- set BLK_MQ_S_STOPPED
>				     nvme_dev_unmap(dev) <-- unmap door bell
> nvme_queue_rq()
>   Touch door bell	<-- panic here

Does the following force the first to complete before the unmap?
---
@@ -1415,10 +1421,21 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)

 		blk_mq_cancel_requeue_work(ns->queue);
 		blk_mq_stop_hw_queues(ns->queue);
+		blk_sync_queue(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
--
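
The ordering quoted in this thread can be replayed in a small single-threaded
userspace model. This is only a sketch, not kernel code: struct model,
dispatcher_check(), dispatcher_ring(), stop_queues(), unmap_doorbell() and
replay_reset_race() are hypothetical names invented for illustration. The
"sync" step simply assumes that synchronizing lets a dispatch that already
passed the stopped check finish before the unmap, which is the behaviour the
question about blk_sync_queue() is asking about.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the reset race; not kernel code. */
struct model {
	bool stopped;          /* stands in for BLK_MQ_S_STOPPED */
	bool mapped;           /* doorbell register still mapped? */
	bool dispatch_pending; /* dispatcher passed the stopped check */
	int doorbell_writes;   /* writes that hit a mapped doorbell */
	int bad_accesses;      /* writes that hit an unmapped doorbell */
};

/* First half of the dispatch path: test the stopped flag. */
static void dispatcher_check(struct model *m)
{
	if (!m->stopped)
		m->dispatch_pending = true;
}

/* Second half of the dispatch path: touch the doorbell. */
static void dispatcher_ring(struct model *m)
{
	if (!m->dispatch_pending)
		return;
	m->dispatch_pending = false;
	if (m->mapped)
		m->doorbell_writes++;
	else
		m->bad_accesses++; /* the crash in the real driver */
}

/* Reset path, step 1: stop the queue, optionally synchronize. */
static void stop_queues(struct model *m, bool sync)
{
	m->stopped = true;
	/* Assumption: synchronizing means any dispatch already in
	 * flight runs to completion before we return. */
	if (sync)
		dispatcher_ring(m);
}

/* Reset path, step 2: unmap the doorbell. */
static void unmap_doorbell(struct model *m)
{
	m->mapped = false;
}

/* Replay the interleaving from the quoted report and return the
 * number of accesses made to the unmapped doorbell. */
int replay_reset_race(bool sync)
{
	struct model m = { .stopped = false, .mapped = true };

	dispatcher_check(&m);  /* Test BLK_MQ_S_STOPPED */
	stop_queues(&m, sync); /* set BLK_MQ_S_STOPPED */
	unmap_doorbell(&m);    /* unmap door bell */
	dispatcher_ring(&m);   /* Touch door bell */
	return m.bad_accesses;
}
```

Under these assumptions, replay_reset_race(false) counts one access to the
unmapped doorbell and replay_reset_race(true) counts none. The model says
nothing about whether the real blk_sync_queue() covers every dispatch path.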