From: Hannes Reinecke <hare@suse.de>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>, Sagi Grimberg <sagi@grimberg.me>,
	hch@lst.de, axboe@kernel.dk, Martin Wilck <mwilck@suse.com>,
	lijie <lijie34@huawei.com>, xose.vazquez@gmail.com,
	linux-nvme@lists.infradead.org, chengjike.cheng@huawei.com,
	shenhong09@huawei.com, dm-devel@redhat.com,
	wangzhoumengjian@huawei.com, christophe.varoqui@opensvc.com,
	bmarzins@redhat.com, sschremm@netapp.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: multipath-tools: add ANA support for NVMe device
Date: Wed, 14 Nov 2018 19:51:28 +0100	[thread overview]
Message-ID: <87c931e5-4ac9-1795-8d40-cc5541d3ebcf@suse.de> (raw)
In-Reply-To: <20181114174746.GA18526@redhat.com>

On 11/14/18 6:47 PM, Mike Snitzer wrote:
> On Wed, Nov 14 2018 at  2:49am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
>
>> On 11/14/18 6:38 AM, Mike Snitzer wrote:
>>> On Tue, Nov 13 2018 at  1:00pm -0500,
>>> Mike Snitzer <snitzer@redhat.com> wrote:
>>>
>>>> [1]: http://lists.infradead.org/pipermail/linux-nvme/2018-November/020765.html
>>>> [2]: https://www.redhat.com/archives/dm-devel/2018-November/msg00072.html
>>> ...
>>>
>>> I knew there had to be a pretty tight coupling between the NVMe driver's
>>> native multipathing and ANA support... and that the simplicity of
>>> Hannes' patch [1] was too good to be true.
>>>
>>> The real justification for not making Hannes' change is it'd effectively
>>> be useless without first splitting out the ANA handling done during NVMe
>>> request completion (NVME_SC_ANA_* cases in nvme_failover_req) that
>>> triggers re-reading the ANA log page accordingly.
>>>
>>> So without the ability to drive the ANA workqueue to trigger
>>> nvme_read_ana_log() from the nvme driver's completion path -- even if
>>> nvme_core.multipath=N -- it really doesn't buy multipath-tools anything
>>> to have the NVMe driver export the ana state via sysfs, because that ANA
>>> state will never get updated.
>>>
>> Hmm. Indeed, I was more focussed on having the sysfs attributes
>> displayed, so yes, indeed it needs some more work.
> ...
>>> Not holding my breath BUT:
>>> if decoupling the reading of ANA state from native NVMe multipathing
>>> specific work during nvme request completion were an acceptable
>>> advancement I'd gladly do the work.
>>>
>> I'd be happy to work on that, given that we'll have to have 'real'
>> ANA support for device-mapper anyway for SLE12 SP4 etc.
>
> I had a close enough look yesterday that I figured I'd just implement
> what I reasoned through as one way forward, compile tested only (patch
> relative to Jens' for-4.21/block):
>
>  drivers/nvme/host/core.c      | 14 +++++++---
>  drivers/nvme/host/multipath.c | 65 ++++++++++++++++++++++++++-----------------
>  drivers/nvme/host/nvme.h      |  4 +++
>  3 files changed, 54 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index f172d63db2b5..05313ab5d91e 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -252,10 +252,16 @@ void nvme_complete_rq(struct request *req)
>  	trace_nvme_complete_rq(req);
>
>  	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
> -		if ((req->cmd_flags & REQ_NVME_MPATH) &&
> -		    blk_path_error(status)) {
> -			nvme_failover_req(req);
> -			return;
> +		if (blk_path_error(status)) {
> +			struct nvme_ns *ns = req->q->queuedata;
> +			u16 nvme_status = nvme_req(req)->status;
> +
> +			if (req->cmd_flags & REQ_NVME_MPATH) {
> +				nvme_failover_req(req);
> +				nvme_update_ana(ns, nvme_status);
> +				return;
> +			}
> +			nvme_update_ana(ns, nvme_status);
>  		}
>
>  		if (!blk_queue_dying(req->q)) {
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 5e3cc8c59a39..f7fbc161dc8c 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -22,7 +22,7 @@ MODULE_PARM_DESC(multipath,
>
>  inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
>  {
> -	return multipath && ctrl->subsys && (ctrl->subsys->cmic & (1 << 3));
> +	return ctrl->subsys && (ctrl->subsys->cmic & (1 << 3));
>  }
>
>  /*
> @@ -47,6 +47,17 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
>  	}
>  }
>
> +static bool nvme_ana_error(u16 status)
> +{
> +	switch (status & 0x7ff) {
> +	case NVME_SC_ANA_TRANSITION:
> +	case NVME_SC_ANA_INACCESSIBLE:
> +	case NVME_SC_ANA_PERSISTENT_LOSS:
> +		return true;
> +	}
> +	return false;
> +}
> +
>  void nvme_failover_req(struct request *req)
>  {
>  	struct nvme_ns *ns = req->q->queuedata;
> @@ -58,10 +69,7 @@ void nvme_failover_req(struct request *req)
>  	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
>  	blk_mq_end_request(req, 0);
>
> -	switch (status & 0x7ff) {
> -	case NVME_SC_ANA_TRANSITION:
> -	case NVME_SC_ANA_INACCESSIBLE:
> -	case NVME_SC_ANA_PERSISTENT_LOSS:
> +	if (nvme_ana_error(status)) {
>  		/*
>  		 * If we got back an ANA error we know the controller is alive,
>  		 * but not ready to serve this namespaces.  The spec suggests
> @@ -69,31 +77,38 @@ void nvme_failover_req(struct request *req)
>  		 * that the admin and I/O queues are not serialized that is
>  		 * fundamentally racy.  So instead just clear the current path,
>  		 * mark the the path as pending and kick of a re-read of the ANA
> -		 * log page ASAP.
> +		 * log page ASAP (see nvme_update_ana() below).
>  		 */
>  		nvme_mpath_clear_current_path(ns);
> -		if (ns->ctrl->ana_log_buf) {
> -			set_bit(NVME_NS_ANA_PENDING, &ns->flags);
> -			queue_work(nvme_wq, &ns->ctrl->ana_work);
> +	} else {
> +		switch (status & 0x7ff) {
> +		case NVME_SC_HOST_PATH_ERROR:
> +			/*
> +			 * Temporary transport disruption in talking to the
> +			 * controller.  Try to send on a new path.
> +			 */
> +			nvme_mpath_clear_current_path(ns);
> +			break;
> +		default:
> +			/*
> +			 * Reset the controller for any non-ANA error as we
> +			 * don't know what caused the error.
> +			 */
> +			nvme_reset_ctrl(ns->ctrl);
> +			break;
>  		}
> -		break;
> -	case NVME_SC_HOST_PATH_ERROR:
> -		/*
> -		 * Temporary transport disruption in talking to the controller.
> -		 * Try to send on a new path.
> -		 */
> -		nvme_mpath_clear_current_path(ns);
> -		break;
> -	default:
> -		/*
> -		 * Reset the controller for any non-ANA error as we don't know
> -		 * what caused the error.
> -		 */
> -		nvme_reset_ctrl(ns->ctrl);
> -		break;
>  	}

Maybe move nvme_mpath_clear_current_path() out of the conditional; it
wouldn't matter if we cleared the path before resetting the controller,
and it would simplify the code even further.

> +}
>
> -	kblockd_schedule_work(&ns->head->requeue_work);
> +void nvme_update_ana(struct nvme_ns *ns, u16 status)
> +{
> +	if (nvme_ana_error(status) && ns->ctrl->ana_log_buf) {
> +		set_bit(NVME_NS_ANA_PENDING, &ns->flags);
> +		queue_work(nvme_wq, &ns->ctrl->ana_work);
> +	}
> +
> +	if (multipath)
> +		kblockd_schedule_work(&ns->head->requeue_work);
>  }

Maybe check 'ns->head->disk' here instead of 'multipath'; we only need to
schedule the requeue work if we actually have a multipath disk.

Remaining bits are okay.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)