From: "jianchao.wang" <jianchao.w.wang@oracle.com> To: Ming Lei <ming.lei@redhat.com> Cc: Sagi Grimberg <sagi@grimberg.me>, Xiao Liang <xiliang@redhat.com>, linux-nvme@lists.infradead.org, Keith Busch <keith.busch@intel.com>, stable@vger.kernel.org, Christoph Hellwig <hch@lst.de> Subject: Re: [PATCH] nvme: don't retry request marked as NVME_REQ_CANCELLED Date: Sun, 28 Jan 2018 17:01:18 +0800 [thread overview] Message-ID: <ead2bb8f-3628-ab15-f2f1-d84ffb1a3ed4@oracle.com> (raw) In-Reply-To: <20180127154356.GA20712@ming.t460p> Hi ming Thanks for your time to look at this. That's really appreciated. On 01/27/2018 11:44 PM, Ming Lei wrote: >>>> The best way to fix this is to ensure the timeout path has been completed before cancel the >>>> previously outstanding requests. (Just ignore the case where the nvme_timeout will invoke nvme_dev_disable, >>>> it has to be fixed by other way.) >>> Maybe your approach looks a bit clean and simplify the implementation, but seems >>> it isn't necessary. >>> >>> So could you explain a bit what the exact issue you are worrying about? deadlock? >>> or others? >> There is indeed potential issue. But it is in very narrow window. >> Please refer to https://urldefense.proofpoint.com/v2/url?u=https-3A__lkml.org_lkml_2018_1_19_68&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=vcQM1keq4TzxrKgWxIBOPADXTw7IRtGtWTBlHVIvPn8&s=83k_LmLl0TFLYwkVn4HjZbj5RgIMlLv570rnB5jNRC0&e= > OK, follows description from the above link: > >> Yes, once controller disabling completes, any started requests will be >> handled and cannot expire. But before the _boundary_, there could be a >> nvme_timeout context runs with nvme_dev_disable in parallel. If a timeout >> path grabs a request, then nvme_dev_disable cannot get and cancel it. >> >> So even though the nvme_dev_disable completes, there still could be a request in nvme_timeout context. >> >> The worst case is : >> nvme_timeout nvme_reset_work >> if (ctrl->state == RESETTING ) nvme_dev_disable >> nvme_dev_disable initializing procedure >> >> the nvme_dev_disable run with reinit procedure in nvme_reset_work in parallel. > OK, that is the issue, the nvme_dev_disable() in nvme_timeout() may > disable queues again, and cause hang in nvme_reset_work(). Yes. :) > > Looks Keith's suggestion of introducing nvme_sync_queues() should be > enough to kill the race, but seems nvme_sync_queues() should have been > called at the entry of nvme_reset_work() unconditionally. Thanks for your suggestion. Jianchao
WARNING: multiple messages have this Message-ID (diff)
From: jianchao.w.wang@oracle.com (jianchao.wang) Subject: [PATCH] nvme: don't retry request marked as NVME_REQ_CANCELLED Date: Sun, 28 Jan 2018 17:01:18 +0800 [thread overview] Message-ID: <ead2bb8f-3628-ab15-f2f1-d84ffb1a3ed4@oracle.com> (raw) In-Reply-To: <20180127154356.GA20712@ming.t460p> Hi ming Thanks for your time to look at this. That's really appreciated. On 01/27/2018 11:44 PM, Ming Lei wrote: >>>> The best way to fix this is to ensure the timeout path has been completed before cancel the >>>> previously outstanding requests. (Just ignore the case where the nvme_timeout will invoke nvme_dev_disable, >>>> it has to be fixed by other way.) >>> Maybe your approach looks a bit clean and simplify the implementation, but seems >>> it isn't necessary. >>> >>> So could you explain a bit what the exact issue you are worrying about? deadlock? >>> or others? >> There is indeed potential issue. But it is in very narrow window. >> Please refer to https://urldefense.proofpoint.com/v2/url?u=https-3A__lkml.org_lkml_2018_1_19_68&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=vcQM1keq4TzxrKgWxIBOPADXTw7IRtGtWTBlHVIvPn8&s=83k_LmLl0TFLYwkVn4HjZbj5RgIMlLv570rnB5jNRC0&e= > OK, follows description from the above link: > >> Yes, once controller disabling completes, any started requests will be >> handled and cannot expire. But before the _boundary_, there could be a >> nvme_timeout context runs with nvme_dev_disable in parallel. If a timeout >> path grabs a request, then nvme_dev_disable cannot get and cancel it. >> >> So even though the nvme_dev_disable completes, there still could be a request in nvme_timeout context. >> >> The worst case is : >> nvme_timeout nvme_reset_work >> if (ctrl->state == RESETTING ) nvme_dev_disable >> nvme_dev_disable initializing procedure >> >> the nvme_dev_disable run with reinit procedure in nvme_reset_work in parallel. > OK, that is the issue, the nvme_dev_disable() in nvme_timeout() may > disable queues again, and cause hang in nvme_reset_work(). Yes. :) > > Looks Keith's suggestion of introducing nvme_sync_queues() should be > enough to kill the race, but seems nvme_sync_queues() should have been > called at the entry of nvme_reset_work() unconditionally. Thanks for your suggestion. Jianchao
next prev parent reply other threads:[~2018-01-28 9:02 UTC|newest] Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-01-25 8:10 [PATCH] nvme: don't retry request marked as NVME_REQ_CANCELLED Ming Lei 2018-01-25 8:10 ` Ming Lei 2018-01-25 8:52 ` jianchao.wang 2018-01-25 8:52 ` jianchao.wang 2018-01-25 10:15 ` Ming Lei 2018-01-25 10:15 ` Ming Lei 2018-01-27 12:33 ` jianchao.wang 2018-01-27 12:33 ` jianchao.wang 2018-01-27 13:31 ` Ming Lei 2018-01-27 13:31 ` Ming Lei 2018-01-27 14:29 ` jianchao.wang 2018-01-27 14:29 ` jianchao.wang 2018-01-27 15:44 ` Ming Lei 2018-01-27 15:44 ` Ming Lei 2018-01-28 9:01 ` jianchao.wang [this message] 2018-01-28 9:01 ` jianchao.wang
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=ead2bb8f-3628-ab15-f2f1-d84ffb1a3ed4@oracle.com \ --to=jianchao.w.wang@oracle.com \ --cc=hch@lst.de \ --cc=keith.busch@intel.com \ --cc=linux-nvme@lists.infradead.org \ --cc=ming.lei@redhat.com \ --cc=sagi@grimberg.me \ --cc=stable@vger.kernel.org \ --cc=xiliang@redhat.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.