* [LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign
@ 2017-01-11  9:44 Hannes Reinecke
  2017-01-11 22:23 ` Mike Snitzer
  0 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2017-01-11  9:44 UTC (permalink / raw)
  To: lsf-pc
  Cc: device-mapper development, linux-block,
	linux-scsi@vger.kernel.org, Mike Snitzer

Hi all,

I'd like to attend LSF/MM this year, and would like to discuss a
redesign of the multipath handling.

With recent kernels we've got quite a lot of the functionality required for
multipathing already implemented, making some design decisions of the
original multipath-tools implementation quite pointless.

I'm working on a proof-of-concept implementation which just uses a
simple configfs interface and doesn't require a daemon at all.

At LSF/MM I'd like to discuss how to move forward here, and whether we'd
like to stay with the current device-mapper integration or move away
from that towards a stand-alone implementation.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign
  2017-01-11  9:44 [LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign Hannes Reinecke
@ 2017-01-11 22:23 ` Mike Snitzer
  2017-01-12  8:27     ` Hannes Reinecke
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Snitzer @ 2017-01-11 22:23 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: lsf-pc, device-mapper development, linux-block,
	linux-scsi@vger.kernel.org

On Wed, Jan 11 2017 at  4:44am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> Hi all,
> 
> I'd like to attend LSF/MM this year, and would like to discuss a
> redesign of the multipath handling.
> 
> With recent kernels we've got quite some functionality required for
> multipathing already implemented, making some design decisions of the
> original multipath-tools implementation quite pointless.
> 
> I'm working on a proof-of-concept implementation which just uses a
> simple configfs interface and doesn't require a daemon altogether.
> 
> At LSF/MM I'd like to discuss how to move forward here, and whether we'd
> like to stay with the current device-mapper integration or move away
> from that towards a stand-alone implementation.

I'd really like an open exchange of the problems you're having with the
current multipath-tools and DM multipath _before LSF_.  Last LSF only
scratched the surface on people having disdain for the complexity that is
the multipath-tools userspace.  But considering how much of the
multipath-tools you've written I find it fairly comical that you're the
person advocating switching away from it.

But if less userspace involvement is needed then fix userspace.  Fail to
see how configfs is any different than the established DM ioctl interface.

As I just said in another email DM multipath could benefit from
factoring out the SCSI-specific bits so that they are nicely optimized
away if using new transports (e.g. NVMe-oF).

It could be that lessons can be learned from your approach, but I'd prefer we
provably exhaust the utility of the current DM multipath kernel
implementation.  DM multipath is one of the most actively maintained and
updated DM targets (aside from thinp and cache).  As you know DM
multipath has grown blk-mq support which yielded serious performance
improvement.  You also noted (in an earlier email) that I reintroduced
bio-based DM multipath.  On a data path level we have all possible block
core interfaces plumbed.  And yes, they all involve cloning due to the
underlying Device Mapper core.  Open to any ideas on optimization.  If
DM is imposing some inherent performance limitation then please report
it accordingly.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign
  2017-01-11 22:23 ` Mike Snitzer
@ 2017-01-12  8:27     ` Hannes Reinecke
  0 siblings, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2017-01-12  8:27 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: lsf-pc, device-mapper development, linux-block,
	linux-scsi@vger.kernel.org

On 01/11/2017 11:23 PM, Mike Snitzer wrote:
> On Wed, Jan 11 2017 at  4:44am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
>
>> Hi all,
>>
>> I'd like to attend LSF/MM this year, and would like to discuss a
>> redesign of the multipath handling.
>>
>> With recent kernels we've got quite some functionality required for
>> multipathing already implemented, making some design decisions of the
>> original multipath-tools implementation quite pointless.
>>
>> I'm working on a proof-of-concept implementation which just uses a
>> simple configfs interface and doesn't require a daemon altogether.
>>
>> At LSF/MM I'd like to discuss how to move forward here, and whether we'd
>> like to stay with the current device-mapper integration or move away
>> from that towards a stand-alone implementation.
>
> I'd really like open exchange of the problems you're having with the
> current multipath-tools and DM multipath _before LSF_.  Last LSF only
> scratched the surface on people having disdain for the complexity that is
> the multipath-tools userspace.  But considering how much of the
> multipath-tools you've written I find it fairly comical that you're the
> person advocating switching away from it.
>
Yeah, I know.

But I've stared long and hard at the code, and found some issues really 
hard to overcome. Even more so as most things it does are really pointless.

multipathd _insists_ on redoing the _entire_ device layout for basically 
any operation (except for path checking).
As the data structures allow only for a single setup, it uses a lock per
multipath device to protect against concurrent changes.
When lots of uevents are to be processed, this lock is heavily contended,
leading to a slow-down of uevent processing.
(cf. the patch series from Tang Junhui and my earlier patchset for
lock pushdown)

I've tried to move that lock down even further with distinct locks for 
device paths and multipath devices, but ultimately failed as it would 
amount to essentially a rewrite of the core engine.
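A minimal sketch of the two-level locking idea described above, assuming a hypothetical data model: struct path, struct mpath_map, path_set_state() and map_count_up() are illustrative names, not multipathd's actual structures. Per-path state changes take only a per-path lock, while map-wide operations take the map lock first and then any path locks, always in that order to avoid deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical data model; not multipathd's real structures. */
enum path_state { PATH_DOWN = 0, PATH_UP = 1 };

struct path {
    pthread_mutex_t lock;          /* protects 'state' only */
    enum path_state state;
};

#define MAX_PATHS 8

struct mpath_map {
    pthread_mutex_t lock;          /* protects 'paths' and 'nr_paths' */
    struct path *paths[MAX_PATHS];
    size_t nr_paths;
};

void path_init(struct path *p)
{
    pthread_mutex_init(&p->lock, NULL);
    p->state = PATH_DOWN;
}

void map_init(struct mpath_map *m)
{
    pthread_mutex_init(&m->lock, NULL);
    m->nr_paths = 0;
}

/* Per-path event (checker result, uevent): only the path lock is taken. */
void path_set_state(struct path *p, enum path_state st)
{
    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    p->state = st;
    pthread_mutex_unlock(&p->lock);
}

/* Map-wide change: only the map lock is needed to modify membership. */
bool map_add_path(struct mpath_map *m, struct path *p)
{
    bool ok = false;

    pthread_mutex_lock(&m->lock);
    if (m->nr_paths < MAX_PATHS) {
        m->paths[m->nr_paths++] = p;
        ok = true;
    }
    pthread_mutex_unlock(&m->lock);
    return ok;
}

/* Map-wide read of path state: lock order is map lock, then path lock. */
size_t map_count_up(struct mpath_map *m)
{
    size_t up = 0;

    pthread_mutex_lock(&m->lock);
    for (size_t i = 0; i < m->nr_paths; i++) {
        pthread_mutex_lock(&m->paths[i]->lock);
        if (m->paths[i]->state == PATH_UP)
            up++;
        pthread_mutex_unlock(&m->paths[i]->lock);
    }
    pthread_mutex_unlock(&m->lock);
    return up;
}
```

With this split, a burst of path-state events for different paths of one map no longer serializes on a single map-wide lock; only operations that change map membership or need a consistent view of all paths do.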

> But if less userspace involvement is needed then fix userspace.  Fail to
> see how configfs is any different than the established DM ioctl interface.
>
> As I just said in another email DM multipath could benefit from
> factoring out the SCSI-specific bits so that they are nicely optimized
> away if using new transports (e.g. NVMEoF).
>
> Could be lessons can be learned from your approach but I'd prefer we
> provably exhaust the utility of the current DM multipath kernel
> implementation.  DM multipath is one of the most actively maintained and
> updated DM targets (aside from thinp and cache).  As you know DM
> multipath has grown blk-mq support which yielded serious performance
> improvement.  You also noted (in an earlier email) that I reintroduced
> bio-based DM multipath.  On a data path level we have all possible block
> core interfaces plumbed.  And yes, they all involve cloning due to the
> underlying Device Mapper core.  Open to any ideas on optimization.  If
> DM is imposing some inherent performance limitation then please report
> it accordingly.
>
Ah. And I thought you disliked request-based multipathing ...

It's not _actually_ the DM interface which I'm objecting to, it's more 
the user-space implementation.
The daemon is built around some design decisions which are simply not
applicable anymore:
- we now _do_ have reliable device identifications, so the 'path_id'
functionality is pointless.
- The 'alua' device handler also provides you with reliable priority
information, so it should be possible to do away with the 'prio'
setting, too.
- And for (most) SCSI devices the 'state' setting provides a reliable
indicator of whether the device is usable.
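As a sketch of the last point, assuming the SCSI device state is read from sysfs at /sys/block/<dev>/device/state: the state strings below are, to our knowledge, those reported by the kernel's SCSI midlayer, but the mapping to usable / transient / failed is an illustrative assumption, and classify_sdev_state() and read_sdev_state() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum path_usability { PATH_USABLE, PATH_TRANSIENT, PATH_FAILED };

/* Classify a SCSI device state string (assumed mapping, see above). */
enum path_usability classify_sdev_state(const char *state)
{
    assert(state != NULL);
    if (strcmp(state, "running") == 0)
        return PATH_USABLE;
    /* Temporarily stopped; may come back without intervention. */
    if (strcmp(state, "blocked") == 0 || strcmp(state, "quiesce") == 0 ||
        strcmp(state, "created-blocked") == 0)
        return PATH_TRANSIENT;
    /* offline, transport-offline, cancel, deleted, created, or unknown. */
    return PATH_FAILED;
}

/* Read /sys/block/<blockdev>/device/state; PATH_FAILED if unreadable. */
enum path_usability read_sdev_state(const char *blockdev)
{
    char path[256], buf[32];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/%s/device/state", blockdev);
    f = fopen(path, "r");
    if (!f)
        return PATH_FAILED;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return PATH_FAILED;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip trailing newline */
    return classify_sdev_state(buf);
}
```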

Hence I've implemented a notifier chain (hooked onto 'struct gendisk') 
which provides events for path up/path down etc.
With that it's possible to automatically fail and reinstate paths.
However, what's missing is an automatic pathgroup switch once all paths 
in a group are down.
In the current implementation the device-mapper target doesn't have any 
inkling about path priorities; it just sees path groups as such.
As it stands, it should be reasonably trivial to switch to the next available
pathgroup, but fallback will become ... interesting.
So we would need to update the interface here to allow for path group 
priorities and also for transmitting the fallback information.
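A userspace sketch of the behaviour described above, under stated assumptions: a notifier chain delivers path up/down events, and the multipath device tracks per-group priorities so it can fail over when the current group runs out of paths and, if failback is enabled, return to a higher-priority group once it recovers. The event names, structures and functions are hypothetical; this is neither the gendisk notifier from the proof-of-concept nor the dm-mpath interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical path events, loosely modelled on path up/down notifications. */
enum path_event { EV_PATH_DOWN, EV_PATH_UP };

struct path_group {
    int priority;      /* higher is preferred */
    int nr_active;     /* number of usable paths in this group */
};

#define MAX_PG 4
struct mpath {
    struct path_group pg[MAX_PG];
    int nr_pg;
    int current_pg;    /* group receiving I/O, -1 if none */
    int failback;      /* 1: return to a better group once it recovers */
};

/* Minimal notifier chain: a fixed array of callbacks. */
typedef void (*path_notifier_fn)(struct mpath *m, int pg, enum path_event ev);
#define MAX_NOTIFIERS 4
static path_notifier_fn notifiers[MAX_NOTIFIERS];
static int nr_notifiers;

int register_path_notifier(path_notifier_fn fn)
{
    if (nr_notifiers >= MAX_NOTIFIERS)
        return -1;
    notifiers[nr_notifiers++] = fn;
    return 0;
}

void call_path_notifiers(struct mpath *m, int pg, enum path_event ev)
{
    for (int i = 0; i < nr_notifiers; i++)
        notifiers[i](m, pg, ev);
}

/* Highest-priority group with at least one active path, or -1. */
static int best_group(const struct mpath *m)
{
    int best = -1;

    for (int i = 0; i < m->nr_pg; i++)
        if (m->pg[i].nr_active > 0 &&
            (best < 0 || m->pg[i].priority > m->pg[best].priority))
            best = i;
    return best;
}

/* Notifier callback: account for the event, then switch groups if needed. */
void pg_switch_notifier(struct mpath *m, int pg, enum path_event ev)
{
    if (ev == EV_PATH_UP)
        m->pg[pg].nr_active++;
    else if (m->pg[pg].nr_active > 0)
        m->pg[pg].nr_active--;

    int best = best_group(m);
    if (m->current_pg < 0 || m->pg[m->current_pg].nr_active == 0)
        m->current_pg = best;          /* failover: current group is dead */
    else if (m->failback && best >= 0 &&
             m->pg[best].priority > m->pg[m->current_pg].priority)
        m->current_pg = best;          /* failback to the preferred group */
}
```

The failover step only needs "some group with active paths"; it is the failback step that needs the per-group priority and the failback policy, which is the extra information the device-mapper interface would have to carry.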

Nothing insurmountable, agreed.
But once we do this most of the current functionality of the 
multipath-tools daemon will become obsolete.

Plus I wasn't quite sure about the direction device-mapper itself would
be going, so I decided to implement a stand-alone version as a testbed.
I'm not trying to push that at all costs; I'm perfectly happy with
updating device-mapper.
As long as no-one insists that we have to use the bio-based interface ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dm-devel] [LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign
  2017-01-12  8:27     ` Hannes Reinecke
@ 2017-01-12 17:29       ` Benjamin Marzinski
  -1 siblings, 0 replies; 13+ messages in thread
From: Benjamin Marzinski @ 2017-01-12 17:29 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Mike Snitzer, linux-block, device-mapper development, lsf-pc,
	linux-scsi@vger.kernel.org

On Thu, Jan 12, 2017 at 09:27:40AM +0100, Hannes Reinecke wrote:
> On 01/11/2017 11:23 PM, Mike Snitzer wrote:
> >On Wed, Jan 11 2017 at  4:44am -0500,
> >Hannes Reinecke <hare@suse.de> wrote:
> >
> >>Hi all,
> >>
> >>I'd like to attend LSF/MM this year, and would like to discuss a
> >>redesign of the multipath handling.
> >>
> >>With recent kernels we've got quite some functionality required for
> >>multipathing already implemented, making some design decisions of the
> >>original multipath-tools implementation quite pointless.
> >>
> >>I'm working on a proof-of-concept implementation which just uses a
> >>simple configfs interface and doesn't require a daemon altogether.
> >>
> >>At LSF/MM I'd like to discuss how to move forward here, and whether we'd
> >>like to stay with the current device-mapper integration or move away
> >>from that towards a stand-alone implementation.
> >
> >I'd really like open exchange of the problems you're having with the
> >current multipath-tools and DM multipath _before LSF_.  Last LSF only
> >scratched the surface on people having disdain for the complexity that is
> >the multipath-tools userspace.  But considering how much of the
> >multipath-tools you've written I find it fairly comical that you're the
> >person advocating switching away from it.
> >
> Yeah, I know.
> 
> But I've stared long and hard at the code, and found some issues really hard
> to overcome. Even more so as most things it does are really pointless.
> 
> multipathd _insists_ on redoing the _entire_ device layout for basically any
> operation (except for path checking).
> As the data structures allow only for a single setup it uses a lock per
> multipath device to protect against concurrent changes.
> When lots of uevents are to be processed this lock is heavily contended,
> leading to a slow-down of uevent processing.
> (cf. the patch series from Tang Junhui and my earlier patchset for
> lock pushdown)
> 
> I've tried to move that lock down even further with distinct locks for
> device paths and multipath devices, but ultimately failed as it would amount
> to essentially a rewrite of the core engine.

The multipath user-space tools locking IS horrible and touches
everything.  I could never see a way around it that didn't involve
a ground-up redesign.
 
> >But if less userspace involvement is needed then fix userspace.  Fail to
> >see how configfs is any different than the established DM ioctl interface.
> >
> >As I just said in another email DM multipath could benefit from
> >factoring out the SCSI-specific bits so that they are nicely optimized
> >away if using new transports (e.g. NVMEoF).
> >
> >Could be lessons can be learned from your approach but I'd prefer we
> >provably exhaust the utility of the current DM multipath kernel
> >implementation.  DM multipath is one of the most actively maintained and
> >updated DM targets (aside from thinp and cache).  As you know DM
> >multipath has grown blk-mq support which yielded serious performance
> >improvement.  You also noted (in an earlier email) that I reintroduced
> >bio-based DM multipath.  On a data path level we have all possible block
> >core interfaces plumbed.  And yes, they all involve cloning due to the
> >underlying Device Mapper core.  Open to any ideas on optimization.  If
> >DM is imposing some inherent performance limitation then please report
> >it accordingly.
> >
> Ah. And I thought you disliked request-based multipathing ...
> 
> It's not _actually_ the DM interface which I'm objecting to, it's more the
> user-space implementation.
> The daemon is built around some design decisions which are simply not
> applicable anymore:
> - we now _do_ have reliable device identifications, so the 'path_id'
> functionality is pointless.

This could be largely fixed in the existing code. The route that the
latest patches from Tang Junhui are taking still grabs the wwid if we got
it from the uevent, but that isn't necessary, as long as we're careful.
Currently rbd devices don't get their wwid from the uevent, but all other
devices do. It would probably be possible to write an rbd device udev
rule to set a variable so that they can work through udev environment
variables too.
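A minimal sketch of taking the wwid from the uevent environment instead of querying the device again, assuming the environment arrives as a NULL-terminated array of "KEY=VALUE" strings and that the property to read is configurable. ID_SERIAL is used in the example as an assumed default; for rbd, a udev rule would first have to set some such property. uevent_get_env() and wwid_from_uevent() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up KEY in a NULL-terminated array of "KEY=VALUE" strings.
 * Returns a pointer to VALUE, or NULL if KEY is absent. */
const char *uevent_get_env(char *const envp[], const char *key)
{
    size_t len;

    assert(envp != NULL && key != NULL);
    len = strlen(key);
    for (size_t i = 0; envp[i] != NULL; i++)
        if (strncmp(envp[i], key, len) == 0 && envp[i][len] == '=')
            return envp[i] + len + 1;
    return NULL;
}

/* Use the wwid carried in the uevent if the configured property is set
 * and non-empty; NULL tells the caller to fall back to the device. */
const char *wwid_from_uevent(char *const envp[], const char *uid_attribute)
{
    const char *val = uevent_get_env(envp, uid_attribute);

    return (val != NULL && *val != '\0') ? val : NULL;
}
```

A NULL return marks the case where existing code would still have to fetch the wwid from the device itself, which is the extra work that could be avoided for devices whose uevents carry it.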

> - The 'alua' device handler also provides you with reliable priority
> information, so it should be possible to do away with the 'prio' setting,
> too.

But this isn't true for all devices. Also, like I mentioned last year
when this got brought up, no matter how we group the paths, there end up
being users that have good reasons why they want them grouped
differently in their case.  The path priority/grouping seems like one
place where evidence has shown that we should give users the tools to
make policy decisions, instead of making them ourselves.
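To illustrate keeping path grouping as a user policy decision, here is a sketch that assigns paths to groups according to an administrator-chosen policy. The policy names mirror the failover, multibus and group_by_prio values of multipath.conf's path_grouping_policy, but group_paths() is a simplified, hypothetical helper and not the multipath-tools implementation.

```c
#include <assert.h>
#include <stddef.h>

enum grouping_policy { GP_FAILOVER, GP_MULTIBUS, GP_GROUP_BY_PRIO };

/* Assign each path a group index according to the chosen policy.
 * prio[i] is path i's priority; group[i] receives its group index.
 * Returns the number of groups created. */
int group_paths(enum grouping_policy pol, const int *prio, int *group, int n)
{
    int nr_groups = 0;

    assert(n == 0 || (prio != NULL && group != NULL));
    switch (pol) {
    case GP_FAILOVER:            /* one path per group */
        for (int i = 0; i < n; i++)
            group[i] = nr_groups++;
        break;
    case GP_MULTIBUS:            /* all paths in a single group */
        for (int i = 0; i < n; i++)
            group[i] = 0;
        nr_groups = n > 0 ? 1 : 0;
        break;
    case GP_GROUP_BY_PRIO:       /* paths with equal priority share a group */
        for (int i = 0; i < n; i++) {
            group[i] = -1;
            for (int j = 0; j < i; j++)
                if (prio[j] == prio[i]) {
                    group[i] = group[j];
                    break;
                }
            if (group[i] < 0)
                group[i] = nr_groups++;
        }
        break;
    }
    return nr_groups;
}
```

The same set of paths and priorities yields different layouts depending only on the policy argument, which is the knob the paragraph above argues should stay in the administrator's hands.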

> - And for (most) SCSI devices the 'state' setting provides a reliable
> indicator if the device is useable.

This is also not true for all devices.

So, are you planning on creating a multipath implementation that only
handles some devices? Obviously, the current userspace tools are still
around to handle setups that this wouldn't.

While I've daydreamed of rewriting the multipath tools multiple times,
and have nothing against you doing it in principle, I would be happier
knowing that it won't simply mean that there are two sets of tools that
both need to be supported to deal with all customer configurations.

-Ben 

> 
> Hence I've implemented a notifier chain (hooked onto 'struct gendisk') which
> provides events for path up/path down etc.
> With that it's possible to automatically fail and reinstate paths.
> However, what's missing is an automatic pathgroup switch once all paths in a
> group are down.
> In the current implementation the device-mapper target doesn't have any
> inkling about path priorities; it just sees path groups as such.
> As it stands, it should be reasonably trivial to switch to the next available
> pathgroup, but fallback will become ... interesting.
> So we would need to update the interface here to allow for path group
> priorities and also for transmitting the fallback information.
> 
> Nothing insurmountable, agreed.
> But once we do this most of the current functionality of the multipath-tools
> daemon will become obsolete.
> 
> Plus I wasn't quite sure about the direction device-mapper itself will be
> going, so I decided to implement a stand-alone version as a testbed.
> I'm not trying to push that at all costs; I'm perfectly happy with updating
> device-mapper.
> As long as no-one insists we're having to use the bio-based interface ...
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dm-devel] [LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign
  2017-01-12 17:29       ` Benjamin Marzinski
@ 2017-01-13 15:56         ` Hannes Reinecke
  -1 siblings, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2017-01-13 15:56 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: Mike Snitzer, linux-block, device-mapper development, lsf-pc,
	linux-scsi@vger.kernel.org

On 01/12/2017 06:29 PM, Benjamin Marzinski wrote:
> On Thu, Jan 12, 2017 at 09:27:40AM +0100, Hannes Reinecke wrote:
>> On 01/11/2017 11:23 PM, Mike Snitzer wrote:
>>> On Wed, Jan 11 2017 at  4:44am -0500,
>>> Hannes Reinecke <hare@suse.de> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to attend LSF/MM this year, and would like to discuss a
>>>> redesign of the multipath handling.
>>>>
>>>> With recent kernels we've got quite some functionality required for
>>>> multipathing already implemented, making some design decisions of the
>>>> original multipath-tools implementation quite pointless.
>>>>
>>>> I'm working on a proof-of-concept implementation which just uses a
>>>> simple configfs interface and doesn't require a daemon altogether.
>>>>
>>>> At LSF/MM I'd like to discuss how to move forward here, and whether we'd
>>>> like to stay with the current device-mapper integration or move away
>>>> from that towards a stand-alone implementation.
>>>
>>> I'd really like open exchange of the problems you're having with the
>>> current multipath-tools and DM multipath _before LSF_.  Last LSF only
>>> scratched the surface on people having disdain for the complexity that is
>>> the multipath-tools userspace.  But considering how much of the
>>> multipath-tools you've written I find it fairly comical that you're the
>>> person advocating switching away from it.
>>>
>> Yeah, I know.
>>
>> But I've stared long and hard at the code, and found some issues really hard
>> to overcome. Even more so as most things it does are really pointless.
>>
>> multipathd _insists_ on redoing the _entire_ device layout for basically any
>> operation (except for path checking).
>> As the data structures allow only for a single setup it uses a lock per
>> multipath device to protect against concurrent changes.
>> When lots of uevents are to be processed this lock is heavily contended,
>> leading to a slow-down of uevent processing.
>> (cf. the patch series from Tang Junhui and my earlier patchset for
>> lock pushdown)
>>
>> I've tried to move that lock down even further with distinct locks for
>> device paths and multipath devices, but ultimately failed as it would amount
>> to essentially a rewrite of the core engine.
> 
> The multipath user-space tools locking IS horrible and touches
> everything.  I could never see a way around it that didn't involve
> a ground-up redesign.
>  
:-)
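The lock-granularity problem described in this exchange can be modelled in a few lines. What follows is a hypothetical Python sketch, not multipath-tools code: CoarseState funnels every uevent through a single lock guarding all path and map state, while SplitState keeps one lock for the path table plus one lock per map, so uevents for different maps no longer contend. Both class names and their layout are illustrative assumptions.

```python
import threading
from collections import defaultdict
from typing import Dict, Set


class CoarseState:
    """Hypothetical model: one lock guards all path and map state."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.paths: Dict[str, str] = {}                    # path name -> WWID
        self.maps: Dict[str, Set[str]] = defaultdict(set)  # WWID -> member paths

    def add_path(self, path: str, wwid: str) -> None:
        # Every uevent serializes here, whichever map it belongs to.
        with self.lock:
            self.paths[path] = wwid
            self.maps[wwid].add(path)


class SplitState:
    """Hypothetical model: one lock for the path table, one lock per map."""

    def __init__(self) -> None:
        self.paths_lock = threading.Lock()
        self.paths: Dict[str, str] = {}
        self.maps_lock = threading.Lock()  # only guards creating map entries
        self.maps = {}                     # WWID -> (per-map lock, member paths)

    def _map_entry(self, wwid: str):
        with self.maps_lock:
            if wwid not in self.maps:
                self.maps[wwid] = (threading.Lock(), set())
            return self.maps[wwid]

    def add_path(self, path: str, wwid: str) -> None:
        with self.paths_lock:
            self.paths[path] = wwid
        map_lock, members = self._map_entry(wwid)
        # Only uevents for the same map contend on this lock.
        with map_lock:
            members.add(path)
```

Even in this toy the cost is visible: the path table and map membership are no longer updated under one lock, so every reader must tolerate them briefly disagreeing. Retrofitting that tolerance onto code written around a single lock is one plausible reason the pushdown amounts to a rewrite.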

>>> But if less userspace involvement is needed then fix userspace.  Fail to
>>> see how configfs is any different than the established DM ioctl interface.
>>>
>>> As I just said in another email DM multipath could benefit from
>>> factoring out the SCSI-specific bits so that they are nicely optimized
>>> away if using new transports (e.g. NVMEoF).
>>>
>>> It could be that lessons can be learned from your approach, but I'd prefer we
>>> provably exhaust the utility of the current DM multipath kernel
>>> implementation.  DM multipath is one of the most actively maintained and
>>> updated DM targets (aside from thinp and cache).  As you know DM
>>> multipath has grown blk-mq support which yielded serious performance
>>> improvement.  You also noted (in an earlier email) that I reintroduced
>>> bio-based DM multipath.  On a data path level we have all possible block
>>> core interfaces plumbed.  And yes, they all involve cloning due to the
>>> underlying Device Mapper core.  Open to any ideas on optimization.  If
>>> DM is imposing some inherent performance limitation then please report
>>> it accordingly.
>>>
>> Ah. And I thought you disliked request-based multipathing ...
>>
>> It's not _actually_ the DM interface which I'm objecting to, it's more the
>> user-space implementation.
>> The daemon is built around some design decisions which are simply not
>> applicable anymore:
>> - we now _do_ have reliable device identifications, so the 'path_id'
>> functionality is pointless.
> 
> This could be largely fixed in the existing code. The route that the
> latest patches from Tang Junhui are taking still grabs the wwid even if we
> got it from the uevent, but that isn't necessary, as long as we're careful.
> Currently rbd devices don't get their wwid from the uevent but all other
> devices do. It would probably be possible to write an rbd device udev
> rule to set a variable so that they can work through udev environment
> variables too.
> 
But this is still only working around the problem.
We should only need to touch the device-mapper tables when setting up
devices or during reconfiguration.
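As a hedged illustration of both points, here is a hypothetical Python sketch in which the WWID is taken straight from the uevent environment and the device-mapper table is reloaded only on setup, teardown or a change of identity. The uid_attribute parameter mirrors the multipath-tools setting of that name, with ID_SERIAL assumed as the usual value for SCSI disks; RBD_WWID in the test stands in for whatever variable an rbd udev rule might set and is purely hypothetical, as are the function names and the reload policy.

```python
from typing import Mapping, Optional

# Assumed default: multipath-tools calls this setting uid_attribute, and
# ID_SERIAL is the usual value for SCSI disks.
DEFAULT_UID_ATTRIBUTE = "ID_SERIAL"


def wwid_from_uevent(env: Mapping[str, str],
                     uid_attribute: str = DEFAULT_UID_ATTRIBUTE) -> Optional[str]:
    """Return the WWID carried in the uevent environment, or None if absent.

    None means the caller would have to fall back to querying the device.
    """
    value = env.get(uid_attribute, "").strip()
    return value or None


def needs_table_reload(action: str, old_wwid: Optional[str],
                       new_wwid: Optional[str]) -> bool:
    """Hypothetical policy: touch the DM table only for setup, teardown or
    a change of identity, never for an uneventful 'change' uevent."""
    if action in ("add", "remove"):
        return True
    return action == "change" and old_wwid != new_wwid
```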

>> - The 'alua' device handler also provides you with reliable priority
>> information, so it should be possible to do away with the 'prio' setting,
>> too.
> 
> But this isn't true for all devices. Also, like I mentioned last year
> when this got brought up, no matter how we group the paths, there end up
> being users that have good reasons why they want them grouped
> differently in their case.  The path priority/grouping seems like one
> place where evidence has shown that we should give users the tools to
> make policy decisions, instead of making them ourselves.
> 
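The tension between kernel-supplied ALUA priorities and administrator policy can be sketched as follows. This hypothetical Python model maps ALUA access-state strings (such as those the kernel reports in a SCSI device's access_state sysfs attribute) to priorities loosely modelled on the multipath-tools 'alua' prioritizer, groups paths by priority, and accepts an optional override hook standing in for the user-supplied grouping policy argued for here. The numbers, names and hook are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional

# Illustrative priorities, loosely modelled on the multipath-tools 'alua'
# prioritizer; any state not listed gets priority 0.
ALUA_PRIO = {
    "active/optimized": 50,
    "active/non-optimized": 10,
    "standby": 1,
}


def alua_prio(access_state: str) -> int:
    """Map an ALUA access state string to a path priority."""
    return ALUA_PRIO.get(access_state.strip(), 0)


def group_paths(states: Dict[str, str],
                override: Optional[Callable[[str, str], int]] = None) -> List[List[str]]:
    """Group paths by priority, highest priority group first.

    'override' is a hypothetical hook that lets the administrator replace the
    ALUA-derived priority with their own policy.
    """
    groups: Dict[int, List[str]] = {}
    for path, state in sorted(states.items()):
        prio = override(path, state) if override else alua_prio(state)
        groups.setdefault(prio, []).append(path)
    return [groups[p] for p in sorted(groups, reverse=True)]
```

With no override the kernel's view decides the grouping; passing a function that returns the same priority for every path collapses everything into one group, which is exactly the kind of site-specific choice the kernel cannot make on its own.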
>> - And for (most) SCSI devices the 'state' setting provides a reliable
>> indicator if the device is useable.
> 
> This is also not true for all devices.
> 
So? The 'state' attribute reflects the internal SCSI device state.
If _that_ doesn't work reliably you end up with I/O errors.
These will eventually result in the 'state' attribute being
synchronized with the actual device state (or being set to 'offline').
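A path check built only on that attribute could look like the following hypothetical Python sketch. It reads <block dir>/device/state (for example /sys/block/sda/device/state) and conservatively treats anything other than 'running' as unusable; a real checker might handle transient states such as 'blocked' differently. The function names are illustrative.

```python
from pathlib import Path


def scsi_state_usable(state: str) -> bool:
    """Treat only 'running' as usable; 'offline', 'blocked', etc. are not."""
    return state.strip() == "running"


def path_usable(sysfs_block_dir: str) -> bool:
    """Read <sysfs_block_dir>/device/state, e.g. /sys/block/sda/device/state."""
    state_file = Path(sysfs_block_dir) / "device" / "state"
    try:
        return scsi_state_usable(state_file.read_text())
    except OSError:
        # The device vanished or has no SCSI state: do not use the path.
        return False
```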

> So, are you planning on creating a multipath implementation that only
> handles some devices? Obviously, the current userspace tools are still
> around to handle setups that this wouldn't.
> 
No, certainly not.
ATM my implementation is merely a testbed, as new
features/functionalities can be more easily implemented there.
I don't see any issues with porting this to device-mapper as such.

> While I've daydreamed of rewriting the multipath tools multiple times,
> and have nothing against you doing it in concept, I would be happier
> knowing that it won't simply mean that there are two sets of tools, that
> both need to be supported to deal with all customer configurations.
> 
Sure. I feel the pain of supporting multipath-tools all too strongly.
Having two tools for the same thing is always a pain, and I would like
to avoid this if at all possible.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign
  2017-01-13 15:56         ` Hannes Reinecke
@ 2017-01-13 16:07           ` Mike Snitzer
  -1 siblings, 0 replies; 13+ messages in thread
From: Mike Snitzer @ 2017-01-13 16:07 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Benjamin Marzinski, linux-block, device-mapper development,
	lsf-pc, linux-scsi@vger.kernel.org

On Fri, Jan 13 2017 at 10:56am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 01/12/2017 06:29 PM, Benjamin Marzinski wrote:
> > On Thu, Jan 12, 2017 at 09:27:40AM +0100, Hannes Reinecke wrote:
> >> On 01/11/2017 11:23 PM, Mike Snitzer wrote:
> >>> On Wed, Jan 11 2017 at  4:44am -0500,
> >>> Hannes Reinecke <hare@suse.de> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I'd like to attend LSF/MM this year, and would like to discuss a
> >>>> redesign of the multipath handling.
> >>>>
> >>>> With recent kernels we've got quite some functionality required for
> >>>> multipathing already implemented, making some design decisions of the
> >>>> original multipath-tools implementation quite pointless.
> >>>>
> >>>> I'm working on a proof-of-concept implementation which just uses a
> >>>> simple configfs interface and doesn't require a daemon altogether.
> >>>>
> >>>> At LSF/MM I'd like to discuss how to move forward here, and whether we'd
> >>>> like to stay with the current device-mapper integration or move away
> >>>> from that towards a stand-alone implementation.
> >>>
> >>> I'd really like open exchange of the problems you're having with the
> >>> current multipath-tools and DM multipath _before LSF_.  Last LSF only
> >>> scratched the surface on people having disdain for the complexity that is
> >>> the multipath-tools userspace.  But considering how much of the
> >>> multipath-tools you've written I find it fairly comical that you're the
> >>> person advocating switching away from it.
> >>>
> >> Yeah, I know.
> >>
> >> But I've stared long and hard at the code, and found some issues really hard
> >> to overcome. Even more so as most things it does are really pointless.
> >>
> >> multipathd _insists_ on redoing the _entire_ device layout for basically any
> >> operation (except for path checking).
> >> As the data structures allow only for a single setup it uses a lock per
> >> multipath device to protect against concurrent changes.
> >> When lots of uevents are to be processed this lock is heavily contended,
> >> leading to a slow-down of uevent processing.
> >> (cf. the patch series from Tang Junhui and my earlier patchset for
> >> lock pushdown)
> >>
> >> I've tried to move that lock down even further with distinct locks for
> >> device paths and multipath devices, but ultimately failed as it would amount
> >> to essentially a rewrite of the core engine.
> > 
> > The multipath user-space tools locking IS horrible and touches
> > everything.  I could never see a way around it that didn't involve
> > a ground-up redesign.
> >  
> :-)
> 
> >>> But if less userspace involvement is needed then fix userspace.  Fail to
> >>> see how configfs is any different than the established DM ioctl interface.
> >>>
> >>> As I just said in another email DM multipath could benefit from
> >>> factoring out the SCSI-specific bits so that they are nicely optimized
> >>> away if using new transports (e.g. NVMEoF).
> >>>
> >>> It could be that lessons can be learned from your approach, but I'd prefer we
> >>> provably exhaust the utility of the current DM multipath kernel
> >>> implementation.  DM multipath is one of the most actively maintained and
> >>> updated DM targets (aside from thinp and cache).  As you know DM
> >>> multipath has grown blk-mq support which yielded serious performance
> >>> improvement.  You also noted (in an earlier email) that I reintroduced
> >>> bio-based DM multipath.  On a data path level we have all possible block
> >>> core interfaces plumbed.  And yes, they all involve cloning due to the
> >>> underlying Device Mapper core.  Open to any ideas on optimization.  If
> >>> DM is imposing some inherent performance limitation then please report
> >>> it accordingly.
> >>>
> >> Ah. And I thought you disliked request-based multipathing ...
> >>
> >> It's not _actually_ the DM interface which I'm objecting to, it's more the
> >> user-space implementation.
> >> The daemon is built around some design decisions which are simply not
> >> applicable anymore:
> >> - we now _do_ have reliable device identifications, so the 'path_id'
> >> functionality is pointless.
> > 
> > This could be largely fixed in the existing code. The route that the
> > latest patches from Tang Junhui are taking still grabs the wwid even if we
> > got it from the uevent, but that isn't necessary, as long as we're careful.
> > Currently rbd devices don't get their wwid from the uevent but all other
> > devices do. It would probably be possible to write an rbd device udev
> > rule to set a variable so that they can work through udev environment
> > variables too.
> > 
> But this is still only working around the problem.
> We should only need to touch the device-mapper tables when setting up
> devices or during reconfiguration.
> 
> >> - The 'alua' device handler also provides you with reliable priority
> >> information, so it should be possible to do away with the 'prio' setting,
> >> too.
> > 
> > But this isn't true for all devices. Also, like I mentioned last year
> > when this got brought up, no matter how we group the paths, there end up
> > being users that have good reasons why they want them grouped
> > differently in their case.  The path priority/grouping seems like one
> > place where evidence has shown that we should give users the tools to
> > make policy decisions, instead of making them ourselves.
> > 
> >> - And for (most) SCSI devices the 'state' setting provides a reliable
> >> indicator if the device is useable.
> > 
> > This is also not true for all devices.
> > 
> So? The 'state' attribute reflects the internal SCSI device state.
> If _that_ doesn't work reliably you end up with I/O errors.
> These will eventually result in the 'state' attribute being
> synchronized with the actual device state (or being set to 'offline').
> 
> > So, are you planning on creating a multipath implementation that only
> > handles some devices? Obviously, the current userspace tools are still
> > around to handle setups that this wouldn't.
> > 
> No, certainly not.
> ATM my implementation is merely a testbed, as new
> features/functionalities can be more easily implemented there.
> I don't see any issues with porting this to device-mapper as such.
> 
> > While I've daydreamed of rewriting the multipath tools multiple times,
> > and have nothing against you doing it in concept, I would be happier
> > knowing that it won't simply mean that there are two sets of tools, that
> > both need to be supported to deal with all customer configurations.
> > 
> Sure. I feel the pain of supporting multipath-tools all too strongly.
> Having two tools for the same thing is always a pain, and I would like
> to avoid this if at all possible.

I welcome your work.  Should help us focus on what fat needs to be
trimmed from both multipath-tools and kernel.

Might be a good time to branch multipath-tools and get very aggressive
with trimming stuff that is outdated.

Things like the event work, using a select interface, that Andy Grover is
working on (and that Mikulas is taking a stab at finishing/optimizing)
might help... but your approach described in this thread may prove
better.
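To illustrate the difference between one waiter thread per map and a single select-style loop, here is a hypothetical Python sketch using the standard selectors module. Pipes stand in for whatever pollable file descriptor a device-mapper event interface would expose; the actual interface Andy Grover and Mikulas are working on is not shown, and the function name is illustrative.

```python
import selectors
from typing import Dict, List


def wait_for_events(fd_names: Dict[int, str], timeout: float = 1.0) -> List[str]:
    """Hypothetical single event loop: one selector watches the event fd of
    every map, instead of one blocking waiter thread per multipath device.

    Returns the sorted names of the maps whose fd became readable.
    """
    sel = selectors.DefaultSelector()
    try:
        for fd, name in fd_names.items():
            sel.register(fd, selectors.EVENT_READ, data=name)
        ready = sel.select(timeout)
        return sorted(key.data for key, _events in ready)
    finally:
        sel.close()
```

A real daemon would loop around this call and consume each map's event before waiting again; the design point is that the number of threads no longer grows with the number of maps.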

Point is, everything should be on the table for revitalizing multipath
userspace (and kernel) to meet new requirements (e.g. NVMEoF, etc).

And yes, I'd prefer to ultimately see these advances land in terms of DM
multipath advances but we'll take it as it comes.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign
  2017-01-13 16:07           ` Mike Snitzer
@ 2017-01-13 17:41             ` Hannes Reinecke
  -1 siblings, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2017-01-13 17:41 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Benjamin Marzinski, linux-block, device-mapper development,
	lsf-pc, linux-scsi@vger.kernel.org

On 01/13/2017 05:07 PM, Mike Snitzer wrote:
> On Fri, Jan 13 2017 at 10:56am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
> 
>> On 01/12/2017 06:29 PM, Benjamin Marzinski wrote:
[ .. ]
>>> While I've daydreamed of rewriting the multipath tools multiple times,
>>> and have nothing against you doing it in concept, I would be happier
>>> knowing that it won't simply mean that there are two sets of tools, that
>>> both need to be supported to deal with all customer configurations.
>>>
>> Sure. I feel the pain of supporting multipath-tools all too strongly.
>> Having two tools for the same thing is always a pain, and I would like
>> to avoid this if at all possible.
> 
> I welcome your work.  Should help us focus on what fat needs to be
> trimmed from both multipath-tools and kernel.
> 
> Might be a good time to branch multipath-tools and get very aggressive
> with trimming stuff that is outdated.
> 
> Things like the event work, using a select interface, that Andy Grover is
> working on (and that Mikulas is taking a stab at finishing/optimizing)
> might help... but your approach described in this thread may prove
> better.
> 
> Point is, everything should be on the table for revitalizing multipath
> userspace (and kernel) to meet new requirements (e.g. NVMEoF, etc).
> 
> And yes, I'd prefer to ultimately see these advances land in terms of DM
> multipath advances but we'll take it as it comes.

I'm fully on board with that.
It would be good if Ben Marzinski were present, too;
he might have some insights which both of us lack
(like the mysterious dm-event interface into multipathd, whose
purpose we both struggle to figure out ...)

Looking forward to that discussion,
and I promise to have some results by then.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign
@ 2017-01-13 17:41             ` Hannes Reinecke
  0 siblings, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2017-01-13 17:41 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Benjamin Marzinski, linux-block, device-mapper development,
	lsf-pc, linux-scsi@vger.kernel.org

On 01/13/2017 05:07 PM, Mike Snitzer wrote:
> On Fri, Jan 13 2017 at 10:56am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
> 
>> On 01/12/2017 06:29 PM, Benjamin Marzinski wrote:
[ .. ]
>>> While I've daydreamed of rewriting the multipath tools multiple times,
>>> and having nothing aginst you doing it in concept, I would be happier
>>> knowing that it won't simply mean that there are two sets of tools, that
>>> both need to be supported to deal with all customer configurations.
>>>
>> Sure. I feel the pain of supporting multipath-tools all too strongly.
>> Having two tools for the same thing is always a pain, and I would like
>> to avoid this if at all possible.
> 
> I welcome your work.  Should help us focus on what fat needs to be
> trimmed from both multipath-tools and kernel.
> 
> Might be a good time to branch multipath-tools and get very aggressive
> with trimming stuff that is outdated.
> 
> Things like the event stuff, using select interface, that Andy Grover is
> working on (and Mikulas is taking a stab at finishing/optimizing) is
> something that might help... but your approach described in this thread
> may prove better.
> 
> Point is, everything should be on the table for revitalizing multipath
> userspace (and kernel) to meet new requirements (e.g. NVMEoF, etc).
> 
> And yes, I'd prefer to ultimately see these advances land in terms of DM
> multipath advances but we'll take it as it comes.

I'm fully on board with that.
And it would be good if Ben Marzinski would be present, too;
he might have some insights which both of us might lack
(like the ominous dm-event interface into multipathd where we both
struggle to figure out what it's for ...)

Looking forward to that discussion.
And promising to have some results by then.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


* Re: [LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign
  2017-01-13 17:41             ` Hannes Reinecke
  (?)
@ 2017-01-17  1:36             ` tang.junhui
  -1 siblings, 0 replies; 13+ messages in thread
From: tang.junhui @ 2017-01-17  1:36 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi@vger.kernel.org, Mike Snitzer, dm-devel-bounces,
	linux-block, device-mapper development, lsf-pc


[-- Attachment #1.1: Type: text/plain, Size: 3384 bytes --]

Hello Hannes:

As a multipath developer, I find that multipath is getting bigger
and harder to maintain, so I'm really looking forward to this
change and hope to be able to contribute to it as well.
I am very interested in any news about the multipath redesign and
hope to see results soon.

I can't yet picture what the new multipath will look like, but I'd
like to point out some weak spots in the current multipath that we
should avoid:
1) coarse-grained locking;
2) vectors;
3) the waiter thread;
4) high coupling;
5) too many configuration options.
I really hope we can build a clean, efficient and easy-to-use
multipath.

Thank you
Tang Junhui









end of thread, other threads:[~2017-01-17  1:36 UTC | newest]

Thread overview: 13+ messages
2017-01-11  9:44 [LSF/MM TOPIC][LSF/MM ATTEND] multipath redesign Hannes Reinecke
2017-01-11 22:23 ` Mike Snitzer
2017-01-12  8:27   ` Hannes Reinecke
2017-01-12  8:27     ` Hannes Reinecke
2017-01-12 17:29     ` [dm-devel] " Benjamin Marzinski
2017-01-12 17:29       ` Benjamin Marzinski
2017-01-13 15:56       ` Hannes Reinecke
2017-01-13 15:56         ` Hannes Reinecke
2017-01-13 16:07         ` Mike Snitzer
2017-01-13 16:07           ` Mike Snitzer
2017-01-13 17:41           ` Hannes Reinecke
2017-01-13 17:41             ` Hannes Reinecke
2017-01-17  1:36             ` tang.junhui
