From: Mike Snitzer <snitzer@redhat.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Keith Busch <keith.busch@intel.com>,
	Hannes Reinecke <hare@suse.de>,
	Laurence Oberman <loberman@redhat.com>,
	Ewan Milne <emilne@redhat.com>,
	James Smart <james.smart@broadcom.com>,
	Linux Kernel Mailinglist <linux-kernel@vger.kernel.org>,
	Linux NVMe Mailinglist <linux-nvme@lists.infradead.org>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	Martin George <marting@netapp.com>,
	John Meneghini <John.Meneghini@netapp.com>
Subject: Re: [PATCH 0/3] Provide more fine grained control over multipathing
Date: Thu, 31 May 2018 08:37:39 -0400	[thread overview]
Message-ID: <20180531123738.GA10552@redhat.com> (raw)
In-Reply-To: <f084e3bc-77af-4268-c882-7b0737e45f3b@grimberg.me>

On Thu, May 31 2018 at  4:37am -0400,
Sagi Grimberg <sagi@grimberg.me> wrote:

> 
> >Wouldn't expect you guys to nurture this 'mpath_personality' knob.  So
> >when features like "dispersed namespaces" land, a negative check would
> >need to be added in the code to prevent switching from "native".
> >
> >And once something like "dispersed namespaces" lands we'd then have to
> >see about a more sophisticated switch that operates at a different
> >granularity.  Could also be that switching one subsystem that is part of
> >"dispersed namespaces" would then cascade to all other associated
> >subsystems?  Not that dissimilar from the 3rd patch in this series that
> >allows a 'device' switch to be done in terms of the subsystem.
> 
> Which I think is broken by allowing to change this personality on the
> fly.

I saw your reply to the 1/3 patch.  I do agree it is broken for not
checking whether any handles are active.  But that is easily fixed, no?

Or are you suggesting some other aspect of "broken"?
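
To make the intent concrete, here is roughly the userspace behavior I'd
expect once an "active handles" check is in place.  The attribute name,
location and accepted values below just follow this patchset's proposal
(so treat them as assumptions), and the write error is the proposed fix,
not what the current patches do:

# cat /sys/module/nvme_core/parameters/multipath
Y
# cat /sys/class/nvme-subsystem/nvme-subsys1/mpath_personality
native
# echo dm-multipath > /sys/class/nvme-subsystem/nvme-subsys1/mpath_personality
-bash: echo: write error: Device or resource busy

That is: the global knob stays as-is, the per-subsystem switch succeeds
when nothing has the subsystem's namespaces open, and it fails with
EBUSY otherwise.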

> >Anyway, I don't know the end from the beginning on something you just
> >told me about ;)  But we're all in this together.  And we can take it as it
> >comes.
> 
> I agree but this will be exposed to user-space and we will need to live
> with it for a long long time...

OK, well dm-multipath has been around for a long long time.  We cannot
simply wish it away, regardless of whatever architectural grievances
are levied against it.

There are far more customer and vendor products that have been developed
to understand and consume dm-multipath and multipath-tools interfaces
than native NVMe multipath.

> >>Don't get me wrong, I do support your cause, and I think nvme should try
> >>to help, I just think that subsystem granularity is not the correct
> >>approach going forward.
> >
> >I understand there will be limits to this 'mpath_personality' knob's
> >utility and it'll need to evolve over time.  But the burden of making
> >more advanced NVMe multipath features accessible outside of native NVMe
> >isn't intended to be on any of the NVMe maintainers (other than maybe
> >remembering to disallow the switch where it makes sense in the future).
> 
> I would expect that any "advanced multipath features" would be properly
> brought up with the NVMe TWG as a ratified standard and find its way
> to nvme. So I don't think this, in particular, is a valid argument.

You're misreading me again.  I'm also saying: stop worrying.  Any future
native NVMe multipath features that come about don't necessarily get
immediate dm-multipath parity; native NVMe multipath would need
appropriate negative checks.
 
> >>As I said, I've been off the grid, can you remind me why the global knob is
> >>not sufficient?
> >
> >Because once nvme_core.multipath=N is set, native NVMe multipath is then
> >not accessible from the same host.  The goal of this patchset is to give
> >users choice, not to limit them to _only_ using dm-multipath if they
> >just have some legacy needs.
> >
> >Tough to be convincing with hypotheticals, but I could imagine a very
> >obvious use case for native NVMe multipathing being PCI-based embedded NVMe
> >"fabrics" (especially if/when the NUMA-based path selector lands).  But
> >the same host with PCI NVMe could be connected to an FC network that has
> >historically always been managed via dm-multipath... but say that
> >FC-based infrastructure gets updated to use NVMe (to leverage a wider
> >NVMe investment, whatever?) -- maybe admins would still prefer to
> >use dm-multipath for the NVMe over FC.
> 
> You are referring to an array exposing media via nvmf and scsi
> simultaneously? I'm not sure that there is a clean definition of
> how that is supposed to work (ANA/ALUA, reservations, etc.)

No, I'm referring to completely disjoint arrays that are homed to the
same host.

> >>This might sound stupid to you, but can't users who desperately need to
> >>keep using dm-multipath (for its mature toolset or what-not) just
> >>stack it on the multipath nvme device? (I might be completely off on
> >>this so feel free to correct my ignorance).
> >
> >We could certainly pursue adding multipath-tools support for native NVMe
> >multipathing.  Not opposed to it (even if just reporting topology and
> >state).  But given the extensive lengths NVMe multipath goes to hide
> >devices, we'd need some way to pierce through the opaque nvme device
> >that native NVMe multipath exposes.  But that really is a tangent
> >relative to this patchset, since that kind of visibility would also
> >benefit the nvme cli... otherwise how are users even able to trust
> >but verify that native NVMe multipathing did what they expected it to?
> 
> Can you explain what is missing for multipath-tools to resolve topology?

I've not pored over these nvme interfaces (below I just learned
nvme-cli has since grown the capability).  So I'm not informed enough
to know whether nvme-cli has grown other new capabilities.

In any case, training multipath-tools to understand native NVMe
multipath topology doesn't replace the actual dm-multipath interfaces and
associated information.

Per-device statistics are something that users want to be able to see,
as is per-device up/down state, etc.
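
dm-multipath and the existing tooling already expose that, e.g. (just a
sketch, map and device names are placeholders and output is omitted
since it varies by setup and multipath-tools version):

# multipath -ll mpatha
# dmsetup status mpatha
# iostat -x /dev/sdc /dev/sdd

multipath -ll and dmsetup status report per-path up/down state and fail
counts for the map, and per-path I/O statistics come from the regular
block-layer counters of the underlying path devices.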

> nvme list-subsys is doing just that, isn't it? It lists subsys-ctrl
> topology, but that is sort of the important information, as controllers
> are the real paths.

I had nvme-cli version 1.4, which doesn't have nvme list-subsys.
That means I need to uninstall the distro-provided
nvme-cli-1.4-3.el7.x86_64, find the relevant upstream, and build from
source...

Yes, this looks like the basic topology info I was hoping for:

# nvme list-subsys
nvme-subsys0 - NQN=nqn.2014.08.org.nvmexpress:80868086PHMB7361004R280CGN  INTEL SSDPED1D280GA
\
 +- nvme0 pcie 0000:5e:00.0
nvme-subsys1 - NQN=mptestnqn
\
 +- nvme1 fc traddr=nn-0x200140111111dbcc:pn-0x100140111111dbcc host_traddr=nn-0x200140111111dac8:pn-0x100140111111dac8
 +- nvme2 fc traddr=nn-0x200140111111dbcd:pn-0x100140111111dbcd host_traddr=nn-0x200140111111dac9:pn-0x100140111111dac9
 +- nvme3 fc traddr=nn-0x200140111111dbce:pn-0x100140111111dbce host_traddr=nn-0x200140111111daca:pn-0x100140111111daca
 +- nvme4 fc traddr=nn-0x200140111111dbcf:pn-0x100140111111dbcf host_traddr=nn-0x200140111111dacb:pn-0x100140111111dacb
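
And for per-controller (i.e. per-path) detail beyond the topology, the
controller sysfs attributes can be read directly on recent kernels,
e.g. (using a controller from the example above; exact attributes
depend on kernel version):

# cat /sys/class/nvme/nvme1/transport
fc
# cat /sys/class/nvme/nvme1/state
live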
