From: Hannes Reinecke <hare@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>, Keith Busch <kbusch@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Keith Busch <keith.busch@wdc.com>,
	linux-nvme@lists.infradead.org,
	Daniel Wagner <daniel.wagner@suse.de>
Subject: Re: [PATCHv3] nvme-mpath: delete disk after last connection
Date: Thu, 6 May 2021 08:13:51 +0200	[thread overview]
Message-ID: <2260f3ca-2e27-b2ad-d1c0-6747897e9557@suse.de> (raw)
In-Reply-To: <8a274c79-6db8-a21b-e60e-4e73a9d139b5@grimberg.me>

On 5/5/21 10:40 PM, Sagi Grimberg wrote:
> 
>>>>>> As stated in the v3 review this is an incompatible change.  We'll 
>>>>>> need
>>>>>> the queue_if_no_path attribute first, and default it to on to keep
>>>>>> compatibility.
>>>>>>
>>>>>
>>>>> That is what I tried the last time, but the direction I got was to 
>>>>> treat
>>>>> both, NVMe-PCI and NVMe-oF identically:
>>>>> (https://lore.kernel.org/linux-nvme/34e5c178-8bc4-68d3-8374-fbc1b451b6e8@grimberg.me/) 
>>>>>
>>>>
>>>> Yes, I'm not sure I understand your comment Christoph. This 
>>>> addresses an
>>>> issue with mdraid where hot unplug+replug does not restore the 
>>>> device to
>>>> the raid group (pci and fabrics alike), where before multipath this 
>>>> used
>>>> to work.
>>>>
>>>> queue_if_no_path is a dm-multipath feature so I'm not entirely clear
>>>> what is the concern? mdraid on nvme (pci/fabrics) used to work a 
>>>> certain
>>>> way, with the introduction of nvme-mpath the behavior was broken (as 
>>>> far
>>>> as I understand from Hannes).
>>>>
>>>> My thinking is that if we want queue_if_no_path functionality in nvme
>>>> mpath we should have it explicitly stated properly such that people
>>>> that actually need it will use it and have mdraid function correctly
>>>> again. Also, queue_if_no_path applies really if all the paths are
>>>> gone in the sense they are completely removed, and doesn't apply
>>>> to controller reset/reconnect.
>>>>
>>>> I agree we should probably have queue_if_no_path attribute on the
>>>> mpath device, but it doesn't sound right to default it to true given
>>>> that it breaks mdraid stacking on top of it..
>>>
>>> If you want "queue_if_no_path" behavior, can't you just set really high
>>> reconnect_delay and ctrl_loss_tmo values? That prevents the path from
>>> being deleted while it is unreachable, then restart IO on the existing
>>> path once connection is re-established.
>>>
>> Precisely my thinking.
>> We _could_ add a queue_if_no_path attribute, but we can also achieve the
>> same behaviour by setting the ctrl_loss_tmo value to infinity.
>> Provided we can change it on the fly, though; if not, that's easily
>> fixed.
>>
>> In fact, that's what we recommend to our customers to avoid the bug
>> fixed by this patch.
> 
> You can change ctrl_loss_tmo on the fly. How does that address the
> issue? Is the original issue that ctrl_loss_tmo expires for fabrics, or
> a pci unplug (to which ctrl_loss_tmo does not apply)?

Yes. It becomes particularly noticeable on TCP fabrics, where the link 
can go down for an extended time.
The system will try to reconnect until ctrl_loss_tmo expires; if the 
link gets reestablished after that time, your system is hosed.
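
As a sketch of the workaround discussed above (setting ctrl_loss_tmo to
infinity so the path is never torn down): the transport address, NQN,
and controller name below are placeholders, and the sysfs attribute
should be verified against your kernel version, but the general shape
would be:

```shell
# Connect with an infinite controller loss timeout (-1 = retry forever),
# so the path survives an arbitrarily long link outage:
nvme connect -t tcp -a 192.168.1.10 -s 4420 \
    -n nqn.2021-05.io.example:subsys1 --ctrl-loss-tmo=-1

# Or adjust a live controller on the fly via sysfs (attribute present
# on recent kernels; value in seconds, -1 for infinite):
echo -1 > /sys/class/nvme/nvme0/ctrl_loss_tmo
```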

With this patch I/O is still killed, but at least you can then 
re-establish the connection by simply calling

nvme connect

and the nvme device will be reconnected, so that you can call

mdadm --re-add

to resync the device.
With the current implementation you are out of luck: I/O remains pending 
on the disconnected original nvme device, and you have no way to flush 
it. Consequently you cannot detach it from the MD array, and, again, 
your system is hosed.
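
The recovery sequence above, spelled out (device names, transport
parameters, and the NQN are placeholders; they assume a TCP fabric and
a RAID array at /dev/md0):

```shell
# Re-establish the connection after ctrl_loss_tmo expired and the
# path was deleted:
nvme connect -t tcp -a 192.168.1.10 -s 4420 \
    -n nqn.2021-05.io.example:subsys1

# The namespace reappears (e.g. as /dev/nvme0n1); re-add it to the
# MD array so it resyncs instead of staying failed:
mdadm /dev/md0 --re-add /dev/nvme0n1

# Watch the resync progress:
cat /proc/mdstat
```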

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


Thread overview: 21+ messages
2021-05-01 12:04 [PATCHv3] nvme-mpath: delete disk after last connection Hannes Reinecke
2021-05-04  8:54 ` Christoph Hellwig
2021-05-04 13:40   ` Hannes Reinecke
2021-05-04 19:54     ` Sagi Grimberg
2021-05-05 15:26       ` Keith Busch
2021-05-05 16:15         ` Hannes Reinecke
2021-05-05 20:40           ` Sagi Grimberg
2021-05-06  2:50             ` Keith Busch
2021-05-06  6:13             ` Hannes Reinecke [this message]
2021-05-06  7:43       ` Christoph Hellwig
2021-05-06  8:42         ` Hannes Reinecke
2021-05-06  9:47           ` Sagi Grimberg
2021-05-06 12:08             ` Christoph Hellwig
2021-05-06 15:54               ` Hannes Reinecke
2021-05-07  6:46                 ` Christoph Hellwig
2021-05-07 17:02                   ` Hannes Reinecke
2021-05-07 17:20                     ` Sagi Grimberg
2021-05-10  6:23                     ` Christoph Hellwig
2021-05-10 13:01                       ` Hannes Reinecke
2021-05-10 13:57                         ` Hannes Reinecke
2021-05-10 14:48                       ` Hannes Reinecke
