All of lore.kernel.org
 help / color / mirror / Atom feed
* Disconcerting observation with multipathd's dmevent polling code
@ 2018-11-21 10:50 Martin Wilck
  2018-11-21 11:20 ` Hannes Reinecke
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Wilck @ 2018-11-21 10:50 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel mailing list

Hi Ben,

I recently encountered a strange and disturbing phenomenon with the
latest upstream code.

The system in question had two iSCSI connections on separate networks.
On one of them, I tested iSCSI disconnect and re-connects, and broken
storage configurations (different LUNs diffenent sizes and with the
same WWID). The other one was idle.

During one test, I observed that multipathd's internal representation
of the map representing the *idle* connection had vanished. The map was
still present on the device-mapper level, but multipathd seemed to have
"forgotten" it completely. The respective paths were tracked as
orphans.

Here is a log excerpt. multipathd is working on path additions for the
map I was testing with (149455...), when a DM event is received. Right
after ev_add_path() releases the lock, the idle map
(36001405...) is removed (not flushed, as the messages suggest).

Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg: add path (uevent)
# uev_add_path->pathinfo()
Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg: udev property SCSI_IDENT_LUN_T10 whitelisted
# ev_add_path()->adopt_paths()
Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sde: ownership set to 149455400000000006d706174683100000000000000000000
...
# ev_add_path->setup_map()
Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 149455400000000006d706174683100000000000000000000: failback = "manual" (setting: multipath internal)
# ev_add_path->domap()
Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 149455400000000006d706174683100000000000000000000: load table [0 2097152 multipath 0 0 3 1 service-time 0 1 1 8:64 ...
# ev_add_path->setup_multipath()->update-multipath_strings
Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 149455400000000006d706174683100000000000000000000: disassemble map [0 0 3 1 service-time 0 1 2 8:64 1 1 service-time ...
# (!!!!) devmap event for LIO map
Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 36001405d08f748dd4c5481f9ec22b888: devmap event #20
Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg [8:96]: path added to devmap 149455400000000006d706174683100000000000000000000
# ev_add_path() returns, vecs->lock is released
# uev_add_path() goes on to next path, lock currently not held
Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdh: add path (uevent)
# (!!!!) The LIO map is flushed (36001405d08f748dd4c5481f9ec22b888)
# this looks like remove_map() or remove_map_and_stop_waiter()
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdd: orphan path, map flushed
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sysfs prioritizer refcount 2
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdc: orphan path, map flushed

To me this looks like the map was removed by the DM event handler.
More detail can be found on 
https://bugzilla.opensuse.org/show_bug.cgi?id=1116272.
I've also tried to reproduce the problem, with no success so far
(actually I haven't observed any dmevents in those reproduction
attempts).

After encountering this problem, I've scrutinized your polling dmevent
handler code over and over last week, but I couldn't spot any possible
problems :-)

The concern that I have is with the design of the polling dmevent API,
in particular the fact that the default action is EVENT_REMOVE. If, for
whatever reason, one map is missing from the return value of the
DM_DEVICE_LIST ioctl, multipath removes the map immediately, and
there's nothing short of a "reconfigure" or "add map" CLI command that
would reinstate the map. IOW, we remove the map not on a kernel event
saying "this map has been removed", but on receiving a list where this
element happens to be missing. We handle this differently for paths,
where we wait for a "remove" uevent before we really delete the path
from our data structures. (Note that the messages file I saved from the
event above shows no sign of such an uevent ever beeing sent - as I
said, the dm map was still present after the above occured).

What do you think about this? Could you maybe inspect those logs I
took, to make sure I didn't get on a totally wrong track here?

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Disconcerting observation with multipathd's dmevent polling code
  2018-11-21 10:50 Disconcerting observation with multipathd's dmevent polling code Martin Wilck
@ 2018-11-21 11:20 ` Hannes Reinecke
  2018-11-21 13:33   ` Martin Wilck
  2018-11-27  9:19   ` Martin Wilck
  0 siblings, 2 replies; 8+ messages in thread
From: Hannes Reinecke @ 2018-11-21 11:20 UTC (permalink / raw)
  To: Martin Wilck, Benjamin Marzinski; +Cc: dm-devel mailing list

On 11/21/18 11:50 AM, Martin Wilck wrote:
> Hi Ben,
> 
> I recently encountered a strange and disturbing phenomenon with the
> latest upstream code.
> 
> The system in question had two iSCSI connections on separate networks.
> On one of them, I tested iSCSI disconnect and re-connects, and broken
> storage configurations (different LUNs diffenent sizes and with the
> same WWID). The other one was idle.
> 
> During one test, I observed that multipathd's internal representation
> of the map representing the *idle* connection had vanished. The map was
> still present on the device-mapper level, but multipathd seemed to have
> "forgotten" it completely. The respective paths were tracked as
> orphans.
> 
> Here is a log excerpt. multipathd is working on path additions for the
> map I was testing with (149455...), when a DM event is received. Right
> after ev_add_path() releases the lock, the idle map
> (36001405...) is removed (not flushed, as the messages suggest).
> 
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg: add path (uevent)
> # uev_add_path->pathinfo()
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg: udev property SCSI_IDENT_LUN_T10 whitelisted
> # ev_add_path()->adopt_paths()
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sde: ownership set to 149455400000000006d706174683100000000000000000000
> ...
> # ev_add_path->setup_map()
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 149455400000000006d706174683100000000000000000000: failback = "manual" (setting: multipath internal)
> # ev_add_path->domap()
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 149455400000000006d706174683100000000000000000000: load table [0 2097152 multipath 0 0 3 1 service-time 0 1 1 8:64 ...
> # ev_add_path->setup_multipath()->update-multipath_strings
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 149455400000000006d706174683100000000000000000000: disassemble map [0 0 3 1 service-time 0 1 2 8:64 1 1 service-time ...
> # (!!!!) devmap event for LIO map
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 36001405d08f748dd4c5481f9ec22b888: devmap event #20
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg [8:96]: path added to devmap 149455400000000006d706174683100000000000000000000
> # ev_add_path() returns, vecs->lock is released
> # uev_add_path() goes on to next path, lock currently not held
> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdh: add path (uevent)
> # (!!!!) The LIO map is flushed (36001405d08f748dd4c5481f9ec22b888)
> # this looks like remove_map() or remove_map_and_stop_waiter()
>> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdd: orphan path, map flushed
>> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sysfs prioritizer refcount 2
>> Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdc: orphan path, map flushed
> 
> To me this looks like the map was removed by the DM event handler.
> More detail can be found on
> https://bugzilla.opensuse.org/show_bug.cgi?id=1116272.
> I've also tried to reproduce the problem, with no success so far
> (actually I haven't observed any dmevents in those reproduction
> attempts).
> 
> After encountering this problem, I've scrutinized your polling dmevent
> handler code over and over last week, but I couldn't spot any possible
> problems :-)
> 
> The concern that I have is with the design of the polling dmevent API,
> in particular the fact that the default action is EVENT_REMOVE. If, for
> whatever reason, one map is missing from the return value of the
> DM_DEVICE_LIST ioctl, multipath removes the map immediately, and
> there's nothing short of a "reconfigure" or "add map" CLI command that
> would reinstate the map. IOW, we remove the map not on a kernel event
> saying "this map has been removed", but on receiving a list where this
> element happens to be missing. We handle this differently for paths,
> where we wait for a "remove" uevent before we really delete the path
> from our data structures. (Note that the messages file I saved from the
> event above shows no sign of such an uevent ever beeing sent - as I
> said, the dm map was still present after the above occured).
> 
> What do you think about this? Could you maybe inspect those logs I
> took, to make sure I didn't get on a totally wrong track here?
> 
There's an easy solution to it: drop the dmevent handling code in 
multipath-tools completely.
It's original design was to track _external_ map reloads, as originally 
the maps had been setup by 'multipath', and 'multipahtd' was just 
tracking state changes. As such multipathd needed to track those 
changes, and it had been using dmevents for this.
However, with the advent of udev and updates to the multipathd daemon 
external map reloads rarely happens, and even if they happen we still 
would be notified by uevents.
So we might as well drop the dmevent code and rely on udev completely.
Which is what we do nowadays anyway.

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Disconcerting observation with multipathd's dmevent polling code
  2018-11-21 11:20 ` Hannes Reinecke
@ 2018-11-21 13:33   ` Martin Wilck
  2018-11-27  9:19   ` Martin Wilck
  1 sibling, 0 replies; 8+ messages in thread
From: Martin Wilck @ 2018-11-21 13:33 UTC (permalink / raw)
  To: Hannes Reinecke, Benjamin Marzinski; +Cc: dm-devel mailing list

On Wed, 2018-11-21 at 12:20 +0100, Hannes Reinecke wrote:
> On 11/21/18 11:50 AM, Martin Wilck wrote:
> > 
> > The concern that I have is with the design of the polling dmevent
> > API,
> > in particular the fact that the default action is EVENT_REMOVE. If,
> > for
> > whatever reason, one map is missing from the return value of the
> > DM_DEVICE_LIST ioctl, multipath removes the map immediately, and
> > there's nothing short of a "reconfigure" or "add map" CLI command
> > that
> > would reinstate the map. IOW, we remove the map not on a kernel
> > event
> > saying "this map has been removed", but on receiving a list where
> > this
> > element happens to be missing. We handle this differently for
> > paths,
> > where we wait for a "remove" uevent before we really delete the
> > path
> > from our data structures. (Note that the messages file I saved from
> > the
> > event above shows no sign of such an uevent ever beeing sent - as I
> > said, the dm map was still present after the above occured).
> > 
> > What do you think about this? Could you maybe inspect those logs I
> > took, to make sure I didn't get on a totally wrong track here?
> > 
> There's an easy solution to it: drop the dmevent handling code in 
> multipath-tools completely.
> It's original design was to track _external_ map reloads, as
> originally 
> the maps had been setup by 'multipath', and 'multipahtd' was just 
> tracking state changes. As such multipathd needed to track those 
> changes, and it had been using dmevents for this.
> However, with the advent of udev and updates to the multipathd
> daemon 
> external map reloads rarely happens, and even if they happen we
> still 
> would be notified by uevents.
> So we might as well drop the dmevent code and rely on udev
> completely.
> Which is what we do nowadays anyway.

I recall Ben arguing that the dmevent API was faster, and more
reliable, than udev. It doesn't depend on udevd being healthy and
having enough workers available, which is a plus. The reliability seems
to be debatable in view of the API design, that was the point of my
mail.

Yet we depend so heavily on udev nowadays that multipathd is pretty
much lost without it anyway.

Let's see what Ben says.

Thanks,
Martin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Disconcerting observation with multipathd's dmevent polling code
  2018-11-21 11:20 ` Hannes Reinecke
  2018-11-21 13:33   ` Martin Wilck
@ 2018-11-27  9:19   ` Martin Wilck
  2018-11-27 20:13     ` Benjamin Marzinski
  1 sibling, 1 reply; 8+ messages in thread
From: Martin Wilck @ 2018-11-27  9:19 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel mailing list

Hi Ben,

On Wed, 2018-11-21 at 12:20 +0100, Hannes Reinecke wrote:
> On 11/21/18 11:50 AM, Martin Wilck wrote:
> > Hi Ben,
> > 
> > I recently encountered a strange and disturbing phenomenon with the
> > latest upstream code.
> > 
> > The system in question had two iSCSI connections on separate
> > networks.
> > On one of them, I tested iSCSI disconnect and re-connects, and
> > broken
> > storage configurations (different LUNs diffenent sizes and with the
> > same WWID). The other one was idle.
> > 
> > During one test, I observed that multipathd's internal
> > representation
> > of the map representing the *idle* connection had vanished. The map
> > was
> > still present on the device-mapper level, but multipathd seemed to
> > have
> > "forgotten" it completely. The respective paths were tracked as
> > orphans.
> > 
> > Here is a log excerpt. multipathd is working on path additions for
> > the
> > map I was testing with (149455...), when a DM event is received.
> > Right
> > after ev_add_path() releases the lock, the idle map
> > (36001405...) is removed (not flushed, as the messages suggest).
> > 
> > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg: add path
> > (uevent)
> > # uev_add_path->pathinfo()
> > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg: udev
> > property SCSI_IDENT_LUN_T10 whitelisted
> > # ev_add_path()->adopt_paths()
> > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sde: ownership
> > set to 149455400000000006d706174683100000000000000000000
> > ...
> > # ev_add_path->setup_map()
> > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]:
> > 149455400000000006d706174683100000000000000000000: failback =
> > "manual" (setting: multipath internal)
> > # ev_add_path->domap()
> > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]:
> > 149455400000000006d706174683100000000000000000000: load table [0
> > 2097152 multipath 0 0 3 1 service-time 0 1 1 8:64 ...
> > # ev_add_path->setup_multipath()->update-multipath_strings
> > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]:
> > 149455400000000006d706174683100000000000000000000: disassemble map
> > [0 0 3 1 service-time 0 1 2 8:64 1 1 service-time ...
> > # (!!!!) devmap event for LIO map
> > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]:
> > 36001405d08f748dd4c5481f9ec22b888: devmap event #20
> > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg [8:96]:
> > path added to devmap
> > 149455400000000006d706174683100000000000000000000
> > # ev_add_path() returns, vecs->lock is released
> > # uev_add_path() goes on to next path, lock currently not held
> > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdh: add path
> > (uevent)
> > # (!!!!) The LIO map is flushed (36001405d08f748dd4c5481f9ec22b888)
> > # this looks like remove_map() or remove_map_and_stop_waiter()
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdd: orphan
> > > path, map flushed
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sysfs
> > > prioritizer refcount 2
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdc: orphan
> > > path, map flushed
> > 
> > To me this looks like the map was removed by the DM event handler.
> > More detail can be found on
> > https://bugzilla.opensuse.org/show_bug.cgi?id=1116272.
> > I've also tried to reproduce the problem, with no success so far
> > (actually I haven't observed any dmevents in those reproduction
> > attempts).
> > 
> > After encountering this problem, I've scrutinized your polling
> > dmevent
> > handler code over and over last week, but I couldn't spot any
> > possible
> > problems :-)
> > 
> > The concern that I have is with the design of the polling dmevent
> > API,
> > in particular the fact that the default action is EVENT_REMOVE. If,
> > for
> > whatever reason, one map is missing from the return value of the
> > DM_DEVICE_LIST ioctl, multipath removes the map immediately, and
> > there's nothing short of a "reconfigure" or "add map" CLI command
> > that
> > would reinstate the map. IOW, we remove the map not on a kernel
> > event
> > saying "this map has been removed", but on receiving a list where
> > this
> > element happens to be missing. We handle this differently for
> > paths,
> > where we wait for a "remove" uevent before we really delete the
> > path
> > from our data structures. (Note that the messages file I saved from
> > the
> > event above shows no sign of such an uevent ever beeing sent - as I
> > said, the dm map was still present after the above occured).
> > 
> > What do you think about this? Could you maybe inspect those logs I
> > took, to make sure I didn't get on a totally wrong track here?
> > 
> There's an easy solution to it: drop the dmevent handling code in 
> multipath-tools completely.
> It's original design was to track _external_ map reloads, as
> originally 
> the maps had been setup by 'multipath', and 'multipahtd' was just 
> tracking state changes. As such multipathd needed to track those 
> changes, and it had been using dmevents for this.
> However, with the advent of udev and updates to the multipathd
> daemon 
> external map reloads rarely happens, and even if they happen we
> still 
> would be notified by uevents.
> So we might as well drop the dmevent code and rely on udev
> completely.
> Which is what we do nowadays anyway.

I like this idea, in particular for map removal.
Ben, what do you think?

Martin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Disconcerting observation with multipathd's dmevent polling code
  2018-11-27  9:19   ` Martin Wilck
@ 2018-11-27 20:13     ` Benjamin Marzinski
  2018-11-28  9:31       ` Martin Wilck
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Marzinski @ 2018-11-27 20:13 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel mailing list

On Tue, Nov 27, 2018 at 10:19:20AM +0100, Martin Wilck wrote:
> Hi Ben,
> 
> On Wed, 2018-11-21 at 12:20 +0100, Hannes Reinecke wrote:
> > On 11/21/18 11:50 AM, Martin Wilck wrote:
> > > Hi Ben,
> > > 
> > > I recently encountered a strange and disturbing phenomenon with the
> > > latest upstream code.
> > > 
> > > The system in question had two iSCSI connections on separate
> > > networks.
> > > On one of them, I tested iSCSI disconnect and re-connects, and
> > > broken
> > > storage configurations (different LUNs diffenent sizes and with the
> > > same WWID). The other one was idle.
> > > 
> > > During one test, I observed that multipathd's internal
> > > representation
> > > of the map representing the *idle* connection had vanished. The map
> > > was
> > > still present on the device-mapper level, but multipathd seemed to
> > > have
> > > "forgotten" it completely. The respective paths were tracked as
> > > orphans.
> > > 
> > > Here is a log excerpt. multipathd is working on path additions for
> > > the
> > > map I was testing with (149455...), when a DM event is received.
> > > Right
> > > after ev_add_path() releases the lock, the idle map
> > > (36001405...) is removed (not flushed, as the messages suggest).
> > > 
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg: add path
> > > (uevent)
> > > # uev_add_path->pathinfo()
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg: udev
> > > property SCSI_IDENT_LUN_T10 whitelisted
> > > # ev_add_path()->adopt_paths()
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sde: ownership
> > > set to 149455400000000006d706174683100000000000000000000
> > > ...
> > > # ev_add_path->setup_map()
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]:
> > > 149455400000000006d706174683100000000000000000000: failback =
> > > "manual" (setting: multipath internal)
> > > # ev_add_path->domap()
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]:
> > > 149455400000000006d706174683100000000000000000000: load table [0
> > > 2097152 multipath 0 0 3 1 service-time 0 1 1 8:64 ...
> > > # ev_add_path->setup_multipath()->update-multipath_strings
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]:
> > > 149455400000000006d706174683100000000000000000000: disassemble map
> > > [0 0 3 1 service-time 0 1 2 8:64 1 1 service-time ...
> > > # (!!!!) devmap event for LIO map
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]:
> > > 36001405d08f748dd4c5481f9ec22b888: devmap event #20
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg [8:96]:
> > > path added to devmap
> > > 149455400000000006d706174683100000000000000000000
> > > # ev_add_path() returns, vecs->lock is released
> > > # uev_add_path() goes on to next path, lock currently not held
> > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdh: add path
> > > (uevent)
> > > # (!!!!) The LIO map is flushed (36001405d08f748dd4c5481f9ec22b888)
> > > # this looks like remove_map() or remove_map_and_stop_waiter()
> > > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdd: orphan
> > > > path, map flushed
> > > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sysfs
> > > > prioritizer refcount 2
> > > > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdc: orphan
> > > > path, map flushed
> > > 
> > > To me this looks like the map was removed by the DM event handler.
> > > More detail can be found on
> > > https://bugzilla.opensuse.org/show_bug.cgi?id=1116272.
> > > I've also tried to reproduce the problem, with no success so far
> > > (actually I haven't observed any dmevents in those reproduction
> > > attempts).
> > > 
> > > After encountering this problem, I've scrutinized your polling
> > > dmevent
> > > handler code over and over last week, but I couldn't spot any
> > > possible
> > > problems :-)
> > > 
> > > The concern that I have is with the design of the polling dmevent
> > > API,
> > > in particular the fact that the default action is EVENT_REMOVE. If,
> > > for
> > > whatever reason, one map is missing from the return value of the
> > > DM_DEVICE_LIST ioctl, multipath removes the map immediately, and
> > > there's nothing short of a "reconfigure" or "add map" CLI command
> > > that
> > > would reinstate the map. IOW, we remove the map not on a kernel
> > > event
> > > saying "this map has been removed", but on receiving a list where
> > > this
> > > element happens to be missing. We handle this differently for
> > > paths,
> > > where we wait for a "remove" uevent before we really delete the
> > > path
> > > from our data structures. (Note that the messages file I saved from
> > > the
> > > event above shows no sign of such an uevent ever beeing sent - as I
> > > said, the dm map was still present after the above occured).
> > > 
> > > What do you think about this? Could you maybe inspect those logs I
> > > took, to make sure I didn't get on a totally wrong track here?
> > > 
> > There's an easy solution to it: drop the dmevent handling code in 
> > multipath-tools completely.
> > It's original design was to track _external_ map reloads, as
> > originally 
> > the maps had been setup by 'multipath', and 'multipahtd' was just 
> > tracking state changes. As such multipathd needed to track those 
> > changes, and it had been using dmevents for this.
> > However, with the advent of udev and updates to the multipathd
> > daemon 
> > external map reloads rarely happens, and even if they happen we
> > still 
> > would be notified by uevents.
> > So we might as well drop the dmevent code and rely on udev
> > completely.
> > Which is what we do nowadays anyway.
> 
> I like this idea, in particular for map removal.
> Ben, what do you think?

That was my original goal when I ended up with the dmevents code.  I
kept running into new corner cases.  The issue is that udev, by design,
does not guarantee that you will get all the uevents, and in fact, we do
see this happen occasionally, so it's not a theoretical problem.

Because of that, any system that relies on ueventis has to handle things
like

1. Someone removes a multipath device
2. mutipathd misses the remove uevent
3. Someone creates a differnt multipath device with the same alias
4. multipathd misses the add uevent.
5. multipathd notices that your multipath device has all the wrong
   paths, and switches them. Well, fortunately, disable_changed_wwids
   will catch this and simply disable the device

Someone decides to switch the names of two (or more) multipath devices:
1. Someone disables user_friendly_names, renaming the devices to
   their wwid names.
2. mutipathd misses the uevents.
3. The person switches the aliases in mutipath.conf and reloads
4. Now multipathd needs to figure out how to swap two (or more) device
   aliases

I kept trying to figure out methods around issues like these, and kept
thinking up new ones.  One observation I hit on was that it was really
important to not miss removing a device, because that was the first step
in having your multipath devices pointing to the wrong place.  I'm not
saying it is impossible, I'm just saying that I gave up on it. I would
happily review code that dealt with all of these issues.

However, before we start down that road (because I went down it once
already, and turned back), lets look at fixing what we have. I agree
that what happened here is that the dmevents code didn't see the device,
and thus removed it. There seem to be two possible ways for this to
happen

1. device-mapper told us that it successfully completed our command, but
   gave us incomplete information.  This would be a really big issue.
   Unlike udev, device mapper is supposed to guarantee that if it
   returns success, it has the information that you asked for. It's not
   just multipath that depends on this.  Many tools use libdevmapper.
   If there is a bug like this in it, that needs to get fixed, not
   worked around on a tool-by-tool basis. But, I don't think that this
   is what happened.

2. device-mapper told us that it failed to complete our command, and we
   didn't differentiate that from a successful completion with a certain
   result.  I think that this is more likely the cause, largely because
   I can see where we do this. In dm_get_events() once we get the list
   of names from device-mapper we call dm_is_mpath() on each name.  I'm
   not sure why I did this.  Probably, I was worried about searching
   through a large list of non-multipath names with a lock held, and
   thought that culling devices that obviously weren't correct would
   speed this up. At any rate, if the device-mapper command fails, we
   assume that the device isn't a multipath device, even though
   device-mapper never told us that.  This is clearly a bug, and it
   clearly would cause exactly the result you saw. That check should be
   removed, or at least we should differentiate between failure of the
   command and success of the command, where the device does not have a
   multipath table.

thoughts?

-Ben

> Martin
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Disconcerting observation with multipathd's dmevent polling code
  2018-11-27 20:13     ` Benjamin Marzinski
@ 2018-11-28  9:31       ` Martin Wilck
  2018-11-28 16:57         ` Benjamin Marzinski
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Wilck @ 2018-11-28  9:31 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel mailing list

On Tue, 2018-11-27 at 14:13 -0600, Benjamin Marzinski wrote:
> On Tue, Nov 27, 2018 at 10:19:20AM +0100, Martin Wilck wrote:
> > 
> > I like this idea, in particular for map removal.
> > Ben, what do you think?
> 
> That was my original goal when I ended up with the dmevents code.  I
> kept running into new corner cases.  The issue is that udev, by
> design,
> does not guarantee that you will get all the uevents, and in fact, we
> do
> see this happen occasionally, so it's not a theoretical problem.
> 
> Because of that, any system that relies on ueventis has to handle
> things
> like
> 
> 1. Someone removes a multipath device
> 2. mutipathd misses the remove uevent
> 3. Someone creates a differnt multipath device with the same alias
> 4. multipathd misses the add uevent.
> 5. multipathd notices that your multipath device has all the wrong
>    paths, and switches them. Well, fortunately, disable_changed_wwids
>    will catch this and simply disable the device

Maybe we should put more work in the "delegate dangerous commands to
multipathd" approach.

> Someone decides to switch the names of two (or more) multipath
> devices:
> 1. Someone disables user_friendly_names, renaming the devices to
>    their wwid names.
> 2. mutipathd misses the uevents.
> 3. The person switches the aliases in mutipath.conf and reloads
> 4. Now multipathd needs to figure out how to swap two (or more)
> device
>    aliases

IMO this works well with the current code (unless names are really
*swapped*, e.g. mpatha<->mpathb). But maybe I'm overlooking something.

> 
> I kept trying to figure out methods around issues like these, and
> kept
> thinking up new ones.  One observation I hit on was that it was
> really
> important to not miss removing a device, because that was the first
> step
> in having your multipath devices pointing to the wrong place.  I'm
> not
> saying it is impossible, I'm just saying that I gave up on it. I
> would
> happily review code that dealt with all of these issues.

If "remove" uevents is most important, what if we created an additional
monitor for "kernel" uevents in multipathd? Those should be just as
reliable as dmevents, and for removal, we don't need the udev rule
processing. Having ACTION==remove and the KERNEL name should be
sufficient. This would obviously not work for "change" or "add" events.

Also, I'd like to understand in what (test? production?) scenarios
you've observed lost uevents. Lack of reliablity of udev is a problem
not only for multipathd, and udev should be fixed. I can mainly think
of:

 1) udev rule processing gets stuck on slow IO or IO timeouts, causing
the worker to be killed,
 2) resource contention between udev workers, in particular on large
systems during "coldplug".

Do you have other failure scenarios in mind?

> 
> However, before we start down that road (because I went down it once
> already, and turned back), lets look at fixing what we have. I agree
> that what happened here is that the dmevents code didn't see the
> device,
> and thus removed it. There seem to be two possible ways for this to
> happen
> 
> 1. device-mapper told us that it successfully completed our command,
> but
>    gave us incomplete information.  This would be a really big issue.
>    Unlike udev, device mapper is supposed to guarantee that if it
>    returns success, it has the information that you asked for. It's
> not
>    just multipath that depends on this.  Many tools use libdevmapper.
>    If there is a bug like this in it, that needs to get fixed, not
>    worked around on a tool-by-tool basis. But, I don't think that
> this
>    is what happened.
> 
> 2. device-mapper told us that it failed to complete our command, and
> we
>    didn't differentiate that from a successful completion with a
> certain
>    result.  I think that this is more likely the cause, largely
> because
>    I can see where we do this. In dm_get_events() once we get the
> list
>    of names from device-mapper we call dm_is_mpath() on each
> name.  I'm
>    not sure why I did this.  Probably, I was worried about searching
>    through a large list of non-multipath names with a lock held, and
>    thought that culling devices that obviously weren't correct would
>    speed this up. At any rate, if the device-mapper command fails, we
>    assume that the device isn't a multipath device, even though
>    device-mapper never told us that.  This is clearly a bug, and it
>    clearly would cause exactly the result you saw. That check should
> be
>    removed, or at least we should differentiate between failure of
> the
>    command and success of the command, where the device does not have
> a
>    multipath table.

Good thought. I'd stared at your code in length, but that didn't occur
to me. I feel relieved that you found an explanation short of
DM_DEVICE_LIST being unreliable.

Martin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Disconcerting observation with multipathd's dmevent polling code
  2018-11-28  9:31       ` Martin Wilck
@ 2018-11-28 16:57         ` Benjamin Marzinski
  2018-11-28 19:35           ` Martin Wilck
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Marzinski @ 2018-11-28 16:57 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel mailing list

On Wed, Nov 28, 2018 at 10:31:02AM +0100, Martin Wilck wrote:
> On Tue, 2018-11-27 at 14:13 -0600, Benjamin Marzinski wrote:
> > On Tue, Nov 27, 2018 at 10:19:20AM +0100, Martin Wilck wrote:
> > > 
> > > I like this idea, in particular for map removal.
> > > Ben, what do you think?
> > 
> > That was my original goal when I ended up with the dmevents code.  I
> > kept running into new corner cases.  The issue is that udev, by
> > design,
> > does not guarantee that you will get all the uevents, and in fact, we
> > do
> > see this happen occasionally, so it's not a theoretical problem.
> > 
> > Because of that, any system that relies on ueventis has to handle
> > things
> > like
> > 
> > 1. Someone removes a multipath device
> > 2. mutipathd misses the remove uevent
> > 3. Someone creates a differnt multipath device with the same alias
> > 4. multipathd misses the add uevent.
> > 5. multipathd notices that your multipath device has all the wrong
> >    paths, and switches them. Well, fortunately, disable_changed_wwids
> >    will catch this and simply disable the device
> 
> Maybe we should put more work in the "delegate dangerous commands to
> multipathd" approach.

Yeah. If multipath was more of a front-end to multipathd, these
divergent view issues would be much less likely.
 
> > Someone decides to switch the names of two (or more) multipath
> > devices:
> > 1. Someone disables user_friendly_names, renaming the devices to
> >    their wwid names.
> > 2. mutipathd misses the uevents.
> > 3. The person switches the aliases in mutipath.conf and reloads
> > 4. Now multipathd needs to figure out how to swap two (or more)
> > device
> >    aliases
> 
> IMO this works well with the current code (unless names are really
> *swapped*, e.g. mpatha<->mpathb). But maybe I'm overlooking something.

It does work correctly in the current code, but that's because we never
miss remove events (renames are basically a remove followed by an add).
If we relied on uevents and missed them, when we checked the device in
checkerloop we would find that it suddenly is all wrong.

> > 
> > I kept trying to figure out methods around issues like these, and
> > kept
> > thinking up new ones.  One observation I hit on was that it was
> > really
> > important to not miss removing a device, because that was the first
> > step
> > in having your multipath devices pointing to the wrong place.  I'm
> > not
> > saying it is impossible, I'm just saying that I gave up on it. I
> > would
> > happily review code that dealt with all of these issues.
> 
> If "remove" uevents is most important, what if we created an additional
> monitor for "kernel" uevents in multipathd? Those should be just as
> reliable as dmevents, and for removal, we don't need the udev rule
> processing. Having ACTION==remove and the KERNEL name should be
> sufficient. This would obviously not work for "change" or "add" events.

That would certainly help, but again, I would contend that it's not as
reliable as devmapper. From the netlink man page

"However, reliable transmissions from kernel to user are impossible in
any case. The kernel can't send a netlink message if the socket buffer
is full: the message will be dropped and the kernel and the user-space
process will no longer have the same view of kernel state. It is up to
the application to detect when this happens (via the ENOBUFS error
returned by recvmsg(2)) and resynchronize."

In this case, we would need to rescan everything on ENOBUFS, but at
least we would get a notification.

dmevents simply has one value per device that tracks the current event
number.  Space for this value is allocated when the device is created,
and it doesn't take any more space when there is 5000 outstanding events
than when there is 1.

> Also, I'd like to understand in what (test? production?) scenarios
> you've observed lost uevents. Lack of reliablity of udev is a problem
> not only for multipathd, and udev should be fixed. I can mainly think
> of:
> 
>  1) udev rule processing gets stuck on slow IO or IO timeouts, causing
> the worker to be killed,
>  2) resource contention between udev workers, in particular on large
> systems during "coldplug".
> 
> Do you have other failure scenarios in mind?

I don't know if this ever being confirmed in the field, but udev and
won't store an unlimited number of events.  IIRC, after udev finished
processing and event, processes looking for that event may have a
limited time to receive that event before it is dropped, if memory is
tight.  To deal with this, uevent_listen needed to work quickly to make
sure it didn't lose events.  Or at least that was the case in the early
days of multipath's udev support. It might be better now.

But we've certainly seen workers killed from timeouts.

> > 
> > However, before we start down that road (because I went down it once
> > already, and turned back), lets look at fixing what we have. I agree
> > that what happened here is that the dmevents code didn't see the
> > device,
> > and thus removed it. There seem to be two possible ways for this to
> > happen
> > 
> > 1. device-mapper told us that it successfully completed our command,
> > but
> >    gave us incomplete information.  This would be a really big issue.
> >    Unlike udev, device mapper is supposed to guarantee that if it
> >    returns success, it has the information that you asked for. It's
> > not
> >    just multipath that depends on this.  Many tools use libdevmapper.
> >    If there is a bug like this in it, that needs to get fixed, not
> >    worked around on a tool-by-tool basis. But, I don't think that
> > this
> >    is what happened.
> > 
> > 2. device-mapper told us that it failed to complete our command, and
> > we
> >    didn't differentiate that from a successful completion with a
> > certain
> >    result.  I think that this is more likely the cause, largely
> > because
> >    I can see where we do this. In dm_get_events() once we get the
> > list
> >    of names from device-mapper we call dm_is_mpath() on each
> > name.  I'm
> >    not sure why I did this.  Probably, I was worried about searching
> >    through a large list of non-multipath names with a lock held, and
> >    thought that culling devices that obviously weren't correct would
> >    speed this up. At any rate, if the device-mapper command fails, we
> >    assume that the device isn't a multipath device, even though
> >    device-mapper never told us that.  This is clearly a bug, and it
> >    clearly would cause exactly the result you saw. That check should
> > be
> >    removed, or at least we should differentiate between failure of
> > the
> >    command and success of the command, where the device does not have
> > a
> >    multipath table.
> 
> Good thought. I'd stared at your code in length, but that didn't occur
> to me. I feel relieved that you found an explanation short of
> DM_DEVICE_LIST being unreliable.
> 
> Martin
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Disconcerting observation with multipathd's dmevent polling code
  2018-11-28 16:57         ` Benjamin Marzinski
@ 2018-11-28 19:35           ` Martin Wilck
  0 siblings, 0 replies; 8+ messages in thread
From: Martin Wilck @ 2018-11-28 19:35 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel mailing list

On Wed, 2018-11-28 at 10:57 -0600, Benjamin Marzinski wrote:
> On Wed, Nov 28, 2018 at 10:31:02AM +0100, Martin Wilck wrote:
> > 
> > If "remove" uevents is most important, what if we created an
> > additional
> > monitor for "kernel" uevents in multipathd? Those should be just as
> > reliable as dmevents, and for removal, we don't need the udev rule
> > processing. Having ACTION==remove and the KERNEL name should be
> > sufficient. This would obviously not work for "change" or "add"
> > events.
> 
> That would certainly help, but again, I would contend that it's not
> as
> reliable as devmapper. From the netlink man page
> 
> "However, reliable transmissions from kernel to user are impossible
> in
> any case. The kernel can't send a netlink message if the socket
> buffer
> is full: the message will be dropped and the kernel and the user-
> space
> process will no longer have the same view of kernel state. It is up
> to
> the application to detect when this happens (via the ENOBUFS error
> returned by recvmsg(2)) and resynchronize."
> 
> In this case, we would need to rescan everything on ENOBUFS, but at
> least we would get a notification.

By running the uevent listener with RT priority and using a suitably
large receive buffer, we should be able to reduce the likelihood of
this happening such that it doesn't really matter anymore. In real OOM
situations, multipathd is pretty much hosed anyway. multipathd hasn't
been written with resilience against memory pressure in mind.

But for the time being, I'm fine with your dm_is_mpath patch.

Thanks
Martin

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-11-28 19:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-21 10:50 Disconcerting observation with multipathd's dmevent polling code Martin Wilck
2018-11-21 11:20 ` Hannes Reinecke
2018-11-21 13:33   ` Martin Wilck
2018-11-27  9:19   ` Martin Wilck
2018-11-27 20:13     ` Benjamin Marzinski
2018-11-28  9:31       ` Martin Wilck
2018-11-28 16:57         ` Benjamin Marzinski
2018-11-28 19:35           ` Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.