All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sagi Grimberg <sagi@grimberg.me>
To: Anton Eidelman <anton.eidelman@gmail.com>,
	linux-nvme@lists.infradead.org, hch@lst.de, kbusch@kernel.org,
	axboe@fb.com
Cc: ushankar@purestorage.com, Anton Eidelman <anton@lightbitslabs.com>
Subject: Re: [PATCH v2 1/1] nvme/mpath: fix hang when disk goes live over reconnect
Date: Wed, 23 Mar 2022 11:23:44 +0200	[thread overview]
Message-ID: <da6814ad-0746-c8a6-77ed-e72cb9683034@grimberg.me> (raw)
In-Reply-To: <20220323035551.1524128-2-anton@lightbitslabs.com>



On 3/23/22 05:55, Anton Eidelman wrote:
> nvme_mpath_init_identify() invoked from nvme_init_identify()
> fetches a fresh ANA log from the ctrl.
> This is essential to have an up to date path states
> for both existing namespaces and for those scan_work
> may discover once the ctrl is up.
> 
> This happens in the following cases:
> 1) A new ctrl is being connected.
> 2) An existing ctrl is successfully reconnected.
> 3) An existing ctrl is being reset.
> 
> While in (1) ctrl->namespaces is empty, (2 & 3) may have namespaces,
> and nvme_read_ana_log() may call nvme_update_ns_ana_state().
> 
> This result in a hang when the ANA state of an existing namespace
> changes and makes the disk live: nvme_mpath_set_live() issues
> IO to the namespace through the ctrl,
> which does NOT have IO queues yet.

Given that setting the path as live triggers I/O, it makes perfect
sense to do it only after we have I/O queues that can accept this
I/O.

> 
> See sample hang below.
> 
> Solution:
> - nvme_update_ns_ana_state() to call set_live only if ctrl is live
> - nvme_read_ana_log() call from nvme_mpath_init_identify()
>    therefore only fetches and parses the ANA log;
>    any erros in this process will fail the ctrl setup as appropriate;
> - a separate function nvme_mpath_update()
>    is called in nvme_start_ctrl();
>    this parses the ANA log without fetching it.
>    At this point the ctrl is live,
>    therefore, disks can be set live normally.
> 
> Sample failure:
>      nvme nvme0: starting error recovery
>      nvme nvme0: Reconnecting in 10 seconds...
>      block nvme0n6: no usable path - requeuing I/O
>      INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
>            Tainted: G            E     5.14.5-1.el7.elrepo.x86_64 #1
>      Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
>      Call Trace:
>       __schedule+0x2a2/0x7e0
>       schedule+0x4e/0xb0
>       io_schedule+0x16/0x40
>       wait_on_page_bit_common+0x15c/0x3e0
>       do_read_cache_page+0x1e0/0x410
>       read_cache_page+0x12/0x20
>       read_part_sector+0x46/0x100
>       read_lba+0x121/0x240
>       efi_partition+0x1d2/0x6a0
>       bdev_disk_changed.part.0+0x1df/0x430
>       bdev_disk_changed+0x18/0x20
>       blkdev_get_whole+0x77/0xe0
>       blkdev_get_by_dev+0xd2/0x3a0
>       __device_add_disk+0x1ed/0x310
>       device_add_disk+0x13/0x20
>       nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
>       nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
>       nvme_update_ana_state+0xca/0xe0 [nvme_core]
>       nvme_parse_ana_log+0xac/0x170 [nvme_core]
>       nvme_read_ana_log+0x7d/0xe0 [nvme_core]
>       nvme_mpath_init_identify+0x105/0x150 [nvme_core]
>       nvme_init_identify+0x2df/0x4d0 [nvme_core]
>       nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
>       nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
>       nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
>       process_one_work+0x1bd/0x360
>       worker_thread+0x50/0x3d0
> 
> Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
> ---
>   drivers/nvme/host/core.c      |  1 +
>   drivers/nvme/host/multipath.c | 12 +++++++++++-
>   drivers/nvme/host/nvme.h      |  4 ++++
>   3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3c0461129bca..d4495fa97657 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4509,6 +4509,7 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
>   	if (ctrl->queue_count > 1) {
>   		nvme_queue_scan(ctrl);
>   		nvme_start_queues(ctrl);
> +		nvme_mpath_update(ctrl);
>   	}
>   
>   	nvme_change_uevent(ctrl, "NVME_EVENT=connected");
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index c97d7f843977..a22d455f48a9 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -613,7 +613,8 @@ static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
>   	ns->ana_state = desc->state;
>   	clear_bit(NVME_NS_ANA_PENDING, &ns->flags);
>   

I think we need a comment here explaining this. something like:

	/*
	 * nvme_mpath_set_live() will trigger I/O to the mpath
	 * device node and in turn to this path device, however we
	 * cannot accept this I/O if the ctrl is not live, this may
	 * deadlock if called from the nvme_mpath_init_identify and
	 * the controller will never complete initialization, preventing
	 * I/O from completing. For this we will reprocess the ANA log
	 * page again once the controller will be fully initialized.
	 */

> -	if (nvme_state_is_live(ns->ana_state))
> +	if (nvme_state_is_live(ns->ana_state) &&
> +		(ns->ctrl->state == NVME_CTRL_LIVE))

No need for the brackets I think, and

Alignment should be:
	if (nvme_state_is_live(ns->ana_state) &&
	    ns->ctrl->state == NVME_CTRL_LIVE)

>   		nvme_mpath_set_live(ns);
>   }
>   
> @@ -700,6 +701,15 @@ static void nvme_ana_work(struct work_struct *work)
>   	nvme_read_ana_log(ctrl);
>   }
>   
> +void nvme_mpath_update(struct nvme_ctrl *ctrl)
> +{
> +	u32 nr_change_groups = 0;
> +
> +	mutex_lock(&ctrl->ana_lock);
> +	nvme_parse_ana_log(ctrl, &nr_change_groups, nvme_update_ana_state);
> +	mutex_unlock(&ctrl->ana_lock);
> +}
> +
>   static void nvme_anatt_timeout(struct timer_list *t)
>   {
>   	struct nvme_ctrl *ctrl = from_timer(ctrl, t, anatt_timer);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 1ea908d43e17..76f7a5f37379 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -781,6 +781,7 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id);
>   void nvme_mpath_remove_disk(struct nvme_ns_head *head);
>   int nvme_mpath_init_identify(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id);
>   void nvme_mpath_init_ctrl(struct nvme_ctrl *ctrl);
> +void nvme_mpath_update(struct nvme_ctrl *ctrl);
>   void nvme_mpath_uninit(struct nvme_ctrl *ctrl);
>   void nvme_mpath_stop(struct nvme_ctrl *ctrl);
>   bool nvme_mpath_clear_current_path(struct nvme_ns *ns);
> @@ -852,6 +853,9 @@ static inline int nvme_mpath_init_identify(struct nvme_ctrl *ctrl,
>   "Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.\n");
>   	return 0;
>   }
> +void nvme_mpath_update(struct nvme_ctrl *ctrl)
> +{
> +}
>   static inline void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
>   {
>   }


  reply	other threads:[~2022-03-23  9:23 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-18 21:57 [PATCH] nvme/mpath: fix hang when disk goes live over reconnect Anton Eidelman
2021-09-19 10:19 ` Sagi Grimberg
2021-09-19 23:05   ` Anton Eidelman
2021-09-20  7:55     ` Sagi Grimberg
2021-09-20  6:35 ` Christoph Hellwig
2021-09-20 14:53   ` Anton Eidelman
2021-09-21  7:15     ` Christoph Hellwig
2021-10-04 16:46       ` Anton Eidelman
2021-10-04 16:57         ` Christoph Hellwig
2021-10-05 12:38           ` Sagi Grimberg
     [not found]             ` <368499da-c117-e8b7-7b1b-46894e1e0b48@grimberg.me>
2021-10-19 15:13               ` Anton Eidelman
2022-03-23  3:55 ` [PATCH v2 0/1] " Anton Eidelman
2022-03-23  3:55   ` [PATCH v2 1/1] " Anton Eidelman
2022-03-23  9:23     ` Sagi Grimberg [this message]
2022-03-23 14:45     ` [PATCH v3 0/1] " Anton Eidelman
2022-03-23 14:45       ` [PATCH v3 1/1] " Anton Eidelman
2022-03-23 14:55         ` Sagi Grimberg
2022-03-23 15:22         ` Christoph Hellwig
2022-03-24 19:05           ` [PATCH v4 0/1] " Anton Eidelman
2022-03-24 19:05             ` [PATCH v4 1/1] " Anton Eidelman
2022-03-24 21:06               ` Sagi Grimberg
2022-03-25  6:36               ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=da6814ad-0746-c8a6-77ed-e72cb9683034@grimberg.me \
    --to=sagi@grimberg.me \
    --cc=anton.eidelman@gmail.com \
    --cc=anton@lightbitslabs.com \
    --cc=axboe@fb.com \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=ushankar@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.