* [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-28 13:42 ` mwilck 0 siblings, 0 replies; 32+ messages in thread From: mwilck @ 2022-01-28 13:42 UTC (permalink / raw) To: Zdenek Kabelac, David Teigland, Peter Rajnoha Cc: Franck Bui, lvm-devel, dm-devel, Martin Wilck From: Martin Wilck <mwilck@suse.com> If a dm device is suspended, we can't run blkid on it. But earlier rules (e.g. 11-dm-parts.rules) might have imported previously scanned properties from the udev db, in particular if the device had been correctly set up beforehand (DM_UDEV_PRIMARY_SOURCE_FLAG==1). Symlinks for existing ID_FS_xyz properties must be preserved in this case. Otherwise lower-priority devices (such as multipath components) might take over the symlink temporarily. Likewise, we should't stop watching a temporarily suspended, but previously correctly configured dm device. Signed-off-by: Martin Wilck <mwilck@suse.com> --- udev/13-dm-disk.rules.in | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/udev/13-dm-disk.rules.in b/udev/13-dm-disk.rules.in index 535581070..5cc08121e 100644 --- a/udev/13-dm-disk.rules.in +++ b/udev/13-dm-disk.rules.in @@ -17,10 +17,14 @@ ENV{DM_UDEV_DISABLE_DISK_RULES_FLAG}=="1", GOTO="dm_end" SYMLINK+="disk/by-id/dm-name-$env{DM_NAME}" ENV{DM_UUID}=="?*", SYMLINK+="disk/by-id/dm-uuid-$env{DM_UUID}" +ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_link" +ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_link" ENV{DM_SUSPENDED}=="1", GOTO="dm_end" ENV{DM_NOSCAN}=="1", GOTO="dm_watch" (BLKID_RULE) + +LABEL="dm_link" ENV{DM_UDEV_LOW_PRIORITY_FLAG}=="1", OPTIONS="link_priority=-100" ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}" ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}" -- 2.34.1 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-28 13:42 ` mwilck 0 siblings, 0 replies; 32+ messages in thread From: mwilck @ 2022-01-28 13:42 UTC (permalink / raw) To: lvm-devel From: Martin Wilck <mwilck@suse.com> If a dm device is suspended, we can't run blkid on it. But earlier rules (e.g. 11-dm-parts.rules) might have imported previously scanned properties from the udev db, in particular if the device had been correctly set up beforehand (DM_UDEV_PRIMARY_SOURCE_FLAG==1). Symlinks for existing ID_FS_xyz properties must be preserved in this case. Otherwise lower-priority devices (such as multipath components) might take over the symlink temporarily. Likewise, we should't stop watching a temporarily suspended, but previously correctly configured dm device. Signed-off-by: Martin Wilck <mwilck@suse.com> --- udev/13-dm-disk.rules.in | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/udev/13-dm-disk.rules.in b/udev/13-dm-disk.rules.in index 535581070..5cc08121e 100644 --- a/udev/13-dm-disk.rules.in +++ b/udev/13-dm-disk.rules.in @@ -17,10 +17,14 @@ ENV{DM_UDEV_DISABLE_DISK_RULES_FLAG}=="1", GOTO="dm_end" SYMLINK+="disk/by-id/dm-name-$env{DM_NAME}" ENV{DM_UUID}=="?*", SYMLINK+="disk/by-id/dm-uuid-$env{DM_UUID}" +ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_link" +ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_link" ENV{DM_SUSPENDED}=="1", GOTO="dm_end" ENV{DM_NOSCAN}=="1", GOTO="dm_watch" (BLKID_RULE) + +LABEL="dm_link" ENV{DM_UDEV_LOW_PRIORITY_FLAG}=="1", OPTIONS="link_priority=-100" ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}" ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}" -- 2.34.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-01-28 13:42 ` mwilck @ 2022-01-28 15:33 ` Zdenek Kabelac -1 siblings, 0 replies; 32+ messages in thread From: Zdenek Kabelac @ 2022-01-28 15:33 UTC (permalink / raw) To: mwilck, Zdenek Kabelac, David Teigland, Peter Rajnoha Cc: dm-devel, Franck Bui, lvm-devel Dne 28. 01. 22 v 14:42 mwilck@suse.com napsal(a): > From: Martin Wilck <mwilck@suse.com> > > If a dm device is suspended, we can't run blkid on it. But earlier > rules (e.g. 11-dm-parts.rules) might have imported previously scanned > properties from the udev db, in particular if the device had been correctly > set up beforehand (DM_UDEV_PRIMARY_SOURCE_FLAG==1). Symlinks for existing > ID_FS_xyz properties must be preserved in this case. Otherwise lower-priority > devices (such as multipath components) might take over the symlink > temporarily. > > Likewise, we should't stop watching a temporarily suspended, but previously > correctly configured dm device. I'm a bit confused here what is the purpose of this patch. blkid is supposed to scan 'every' disk it's told to scan - if device is suspend - blkid shall fait till it's resumed. Suspend operation itself is meant to be quick - and process suspending any device should be doing it rather 'quickly' (aka reload DM table) So now - how do you get 'suspended' devices that are blocking blkid ? lvm2 has implemented some sort of 'optional' hack to avoid scanning suspended devices - but this shouldn't be normally needed - unless your system is flawed with some set of suspended devices (maybe from some crashed lvm command). Regards Zdenek -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-28 15:33 ` Zdenek Kabelac 0 siblings, 0 replies; 32+ messages in thread From: Zdenek Kabelac @ 2022-01-28 15:33 UTC (permalink / raw) To: lvm-devel Dne 28. 01. 22 v 14:42 mwilck at suse.com napsal(a): > From: Martin Wilck <mwilck@suse.com> > > If a dm device is suspended, we can't run blkid on it. But earlier > rules (e.g. 11-dm-parts.rules) might have imported previously scanned > properties from the udev db, in particular if the device had been correctly > set up beforehand (DM_UDEV_PRIMARY_SOURCE_FLAG==1). Symlinks for existing > ID_FS_xyz properties must be preserved in this case. Otherwise lower-priority > devices (such as multipath components) might take over the symlink > temporarily. > > Likewise, we should't stop watching a temporarily suspended, but previously > correctly configured dm device. I'm a bit confused here what is the purpose of this patch. blkid is supposed to scan 'every' disk it's told to scan - if device is suspend - blkid shall fait till it's resumed. Suspend operation itself is meant to be quick - and process suspending any device should be doing it rather 'quickly' (aka reload DM table) So now - how do you get 'suspended' devices that are blocking blkid ? lvm2 has implemented some sort of 'optional' hack to avoid scanning suspended devices - but this shouldn't be normally needed - unless your system is flawed with some set of suspended devices (maybe from some crashed lvm command). Regards Zdenek ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-01-28 15:33 ` Zdenek Kabelac @ 2022-01-28 15:57 ` Martin Wilck -1 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-01-28 15:57 UTC (permalink / raw) To: Zdenek Kabelac, Zdenek Kabelac, David Teigland, Peter Rajnoha Cc: dm-devel, Heming Zhao, Franck Bui, lvm-devel On Fri, 2022-01-28 at 16:33 +0100, Zdenek Kabelac wrote: > Dne 28. 01. 22 v 14:42 mwilck@suse.com napsal(a): > > From: Martin Wilck <mwilck@suse.com> > > > > If a dm device is suspended, we can't run blkid on it. But earlier > > rules (e.g. 11-dm-parts.rules) might have imported previously > > scanned > > properties from the udev db, in particular if the device had been > > correctly > > set up beforehand (DM_UDEV_PRIMARY_SOURCE_FLAG==1). Symlinks for > > existing > > ID_FS_xyz properties must be preserved in this case. Otherwise > > lower-priority > > devices (such as multipath components) might take over the symlink > > temporarily. > > > > Likewise, we should't stop watching a temporarily suspended, but > > previously > > correctly configured dm device. > > > I'm a bit confused here what is the purpose of this patch. > > blkid is supposed to scan 'every' disk it's told to scan - if device > is > suspend - blkid shall fait till it's resumed. Here we're talking about a device that had been successfully scanned before (during initramfs processing). In my case it was a partition-on- multipath device (linear mapping on top of multipath mapping) hosting a btrfs file system with multiple subvolumes. The problem occurs when the coldplug "add" event is processed after switching to the real root, and if the device is in suspended state for whatever reason when that happens. If the SYMLINK+= directive for the /dev/disk/by-uuid link for the device is skipped in the udev rules, udev will notice and remove the symlink (which means in the case of multipath: assign it to a component SCSI device instead). systemd, however, thinks that the /dev/disk/by-uuid device is ready for processing and tries to mount it while the symlink wrongly points to the SCSI device. That fails (the SCSI device is mapped by multipath), and thus booting fails. See a log excerpt below. > Suspend operation itself is meant to be quick - and process > suspending any > device should be doing it rather 'quickly' (aka reload DM table) > > So now - how do you get 'suspended' devices that are blocking blkid ? It's a race condition. It probably happens while multipathd is reloading a map (*), suspending it during the table reload. The device will be resumed a few fractions of a second later (so yes, it's "quick"), but then it's too late - systemd will already have tried to mount it, and failed. When emergency mode is reached, all looks fine, because the device has been resumed and the correct symlink has been restored by udev while processing the associated CHANGE event. I can actually see that some of the subvolumes are mounted successfully and some are not. It all depends on the timing, which device mount(2) actually accesses when it follows the by-uuid symlink. > lvm2 has implemented some sort of 'optional' hack to avoid scanning > suspended > devices - but this shouldn't be normally needed - unless your system > is flawed > with some set of suspended devices (maybe from some crashed lvm > command). I'm not sure what "hack" you're talking about. 13-dm-disk.rules always skips calling "blkid" for suspended devices. And that's correct. The point is not to "forget" valid symlinks because scanning is skipped. Regards Martin (*) If a dm device is encountered in such a transient suspended state, it is very difficult to figure out why / by which process it was suspended, in particular during boot (tell me if you know a good trick to figure it out). But multipathd is a likely candidate. Sample boot log: > [ 127.532674] localhost systemd-udevd[1080]: dm-13: Updating old device symlink '/dev/disk/by-uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0', which is no longer belonging to this device. > [ 127.532784] localhost systemd-udevd[1080]: dm-13: Found 'b8:18' claiming '/run/udev/links/disk\x2fby-uuid\x2fe40d3005-ab2f-4845-bd83-be5fd09e62a0' > [ 127.533079] localhost systemd-udevd[1080]: sdb2: Device claims priority 0 for '/run/udev/links/disk\x2fby-uuid\x2fe40d3005-ab2f-4845-bd83-be5fd09e62a0' > [ 127.533150] localhost systemd-udevd[1080]: dm-13: Found 'b8:146' claiming '/run/udev/links/disk\x2fby-uuid\x2fe40d3005-ab2f-4845-bd83-be5fd09e62a0' > [ 127.533397] localhost systemd-udevd[1080]: dm-13: Found 'b8:82' claiming '/run/udev/links/disk\x2fby-uuid\x2fe40d3005-ab2f-4845-bd83-be5fd09e62a0' > [ 127.533678] localhost systemd-udevd[1080]: dm-13: Atomically replace '/dev/disk/by-uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0' > [ 127.535494] localhost systemd[1]: srv.mount: About to execute /usr/bin/mount /dev/disk/by-uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0 /srv -t btrfs -o subvol=/@/srv > [ 127.535845] localhost systemd[1]: srv.mount: Forked /usr/bin/mount as 1343 > [ 127.535992] localhost systemd[1]: srv.mount: Changed dead -> mounting > [ 127.536278] localhost systemd[1343]: srv.mount: Executing: /usr/bin/mount /dev/disk/by-uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0 /srv -t btrfs -o subvol=/@/srv > [ 127.657542] localhost mount[1343]: mount: /srv: /dev/sdb2 already mounted or mount point busy. > [ 127.888332] localhost systemd[1]: srv.mount: Failed to read oom_kill field of memory.events cgroup attribute: No such file or directory > [ 127.888532] localhost systemd[1]: srv.mount: Child 1343 belongs to srv.mount. > [ 127.888779] localhost systemd[1]: srv.mount: Mount process exited, code=exited, status=32/n/a > [ 127.888961] localhost systemd[1]: srv.mount: Failed with result 'exit-code'. > [ 127.889200] localhost systemd[1]: srv.mount: Changed mounting -> failed > [ 127.890046] localhost systemd[1]: srv.mount: Job 180 srv.mount/start finished, result=failed > [ 127.890283] localhost systemd[1]: Failed to mount /srv. > [ 127.918072] localhost systemd[1]: srv.mount: Unit entered failed state. Note the message "Updating old device symlink '/dev/disk/by- uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0', which is no longer belonging to this device"), which is where the trouble starts. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-28 15:57 ` Martin Wilck 0 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-01-28 15:57 UTC (permalink / raw) To: lvm-devel On Fri, 2022-01-28 at 16:33 +0100, Zdenek Kabelac wrote: > Dne 28. 01. 22 v 14:42 mwilck at suse.com?napsal(a): > > From: Martin Wilck <mwilck@suse.com> > > > > If a dm device is suspended, we can't run blkid on it. But earlier > > rules (e.g. 11-dm-parts.rules) might have imported previously > > scanned > > properties from the udev db, in particular if the device had been > > correctly > > set up beforehand (DM_UDEV_PRIMARY_SOURCE_FLAG==1). Symlinks for > > existing > > ID_FS_xyz properties must be preserved in this case. Otherwise > > lower-priority > > devices (such as multipath components) might take over the symlink > > temporarily. > > > > Likewise, we should't stop watching a temporarily suspended, but > > previously > > correctly configured dm device. > > > I'm a bit confused here what is the purpose of this patch. > > blkid is supposed to scan 'every' disk it's told to scan -? if device > is > suspend - blkid shall fait till it's resumed. Here we're talking about a device that had been successfully scanned before (during initramfs processing). In my case it was a partition-on- multipath device (linear mapping on top of multipath mapping) hosting a btrfs file system with multiple subvolumes. The problem occurs when the coldplug "add" event is processed after switching to the real root, and if the device is in suspended state for whatever reason when that happens. If the SYMLINK+= directive for the /dev/disk/by-uuid link for the device is skipped in the udev rules, udev will notice and remove the symlink (which means in the case of multipath: assign it to a component SCSI device instead). systemd, however, thinks that the /dev/disk/by-uuid device is ready for processing and tries to mount it while the symlink wrongly points to the SCSI device. That fails (the SCSI device is mapped by multipath), and thus booting fails. See a log excerpt below. > Suspend operation itself is meant to be quick - and process > suspending any > device should be doing it rather 'quickly'? (aka reload DM table) > > So now - how do you get 'suspended' devices that are blocking blkid ? It's a race condition. It probably happens while multipathd is reloading a map (*), suspending it during the table reload. The device will be resumed a few fractions of a second later (so yes, it's "quick"), but then it's too late - systemd will already have tried to mount it, and failed. When emergency mode is reached, all looks fine, because the device has been resumed and the correct symlink has been restored by udev while processing the associated CHANGE event. I can actually see that some of the subvolumes are mounted successfully and some are not. It all depends on the timing, which device mount(2) actually accesses when it follows the by-uuid symlink. > lvm2 has implemented some sort of 'optional' hack to avoid scanning > suspended > devices - but this shouldn't be normally needed - unless your system > is flawed > with some set of suspended devices (maybe from some crashed lvm > command). I'm not sure what "hack" you're talking about. 13-dm-disk.rules always skips calling "blkid" for suspended devices. And that's correct. The point is not to "forget" valid symlinks because scanning is skipped. Regards Martin (*) If a dm device is encountered in such a transient suspended state, it is very difficult to figure out why / by which process it was suspended, in particular during boot (tell me if you know a good trick to figure it out). But multipathd is a likely candidate. Sample boot log: > [ 127.532674] localhost systemd-udevd[1080]: dm-13: Updating old device symlink '/dev/disk/by-uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0', which is no longer belonging to this device. > [ 127.532784] localhost systemd-udevd[1080]: dm-13: Found 'b8:18' claiming '/run/udev/links/disk\x2fby-uuid\x2fe40d3005-ab2f-4845-bd83-be5fd09e62a0' > [ 127.533079] localhost systemd-udevd[1080]: sdb2: Device claims priority 0 for '/run/udev/links/disk\x2fby-uuid\x2fe40d3005-ab2f-4845-bd83-be5fd09e62a0' > [ 127.533150] localhost systemd-udevd[1080]: dm-13: Found 'b8:146' claiming '/run/udev/links/disk\x2fby-uuid\x2fe40d3005-ab2f-4845-bd83-be5fd09e62a0' > [ 127.533397] localhost systemd-udevd[1080]: dm-13: Found 'b8:82' claiming '/run/udev/links/disk\x2fby-uuid\x2fe40d3005-ab2f-4845-bd83-be5fd09e62a0' > [ 127.533678] localhost systemd-udevd[1080]: dm-13: Atomically replace '/dev/disk/by-uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0' > [ 127.535494] localhost systemd[1]: srv.mount: About to execute /usr/bin/mount /dev/disk/by-uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0 /srv -t btrfs -o subvol=/@/srv > [ 127.535845] localhost systemd[1]: srv.mount: Forked /usr/bin/mount as 1343 > [ 127.535992] localhost systemd[1]: srv.mount: Changed dead -> mounting > [ 127.536278] localhost systemd[1343]: srv.mount: Executing: /usr/bin/mount /dev/disk/by-uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0 /srv -t btrfs -o subvol=/@/srv > [ 127.657542] localhost mount[1343]: mount: /srv: /dev/sdb2 already mounted or mount point busy. > [ 127.888332] localhost systemd[1]: srv.mount: Failed to read oom_kill field of memory.events cgroup attribute: No such file or directory > [ 127.888532] localhost systemd[1]: srv.mount: Child 1343 belongs to srv.mount. > [ 127.888779] localhost systemd[1]: srv.mount: Mount process exited, code=exited, status=32/n/a > [ 127.888961] localhost systemd[1]: srv.mount: Failed with result 'exit-code'. > [ 127.889200] localhost systemd[1]: srv.mount: Changed mounting -> failed > [ 127.890046] localhost systemd[1]: srv.mount: Job 180 srv.mount/start finished, result=failed > [ 127.890283] localhost systemd[1]: Failed to mount /srv. > [ 127.918072] localhost systemd[1]: srv.mount: Unit entered failed state. Note the message "Updating old device symlink '/dev/disk/by- uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0', which is no longer belonging to this device"), which is where the trouble starts. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-01-28 15:57 ` Martin Wilck @ 2022-01-28 16:02 ` Martin Wilck -1 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-01-28 16:02 UTC (permalink / raw) To: Zdenek Kabelac, Zdenek Kabelac, David Teigland, Peter Rajnoha Cc: dm-devel, Heming Zhao, Franck Bui, lvm-devel On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: > > It's a race condition. It probably happens while multipathd is > reloading a map (*), suspending it during the table reload. The > device > will be resumed a few fractions of a second later (so yes, it's > "quick"), but then it's too late More precisely: The suspend state itself may actually not last longer than a few ms. But once the symlink is bent to point to the wrong device, it will remain so, until the CHANGE event following the device resume is successfully processed by udev, which may happen substantially later. And it's that longer time span which matters for systemd's mount attempt (or LVM device activation, for that matter). Martin -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-28 16:02 ` Martin Wilck 0 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-01-28 16:02 UTC (permalink / raw) To: lvm-devel On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: > > It's a race condition. It probably happens while multipathd is > reloading a map (*), suspending it during the table reload. The > device > will be resumed a few fractions of a second later (so yes, it's > "quick"), but then it's too late More precisely: The suspend state itself may actually not last longer than a few ms. But once the symlink is bent to point to the wrong device, it will remain so, until the CHANGE event following the device resume is successfully processed by udev, which may happen substantially later. And it's that longer time span which matters for systemd's mount attempt (or LVM device activation, for that matter). Martin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-01-28 16:02 ` Martin Wilck @ 2022-01-28 17:47 ` Zdenek Kabelac -1 siblings, 0 replies; 32+ messages in thread From: Zdenek Kabelac @ 2022-01-28 17:47 UTC (permalink / raw) To: Martin Wilck, Zdenek Kabelac, David Teigland, Peter Rajnoha Cc: dm-devel, Heming Zhao, Franck Bui, lvm-devel Dne 28. 01. 22 v 17:02 Martin Wilck napsal(a): > On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: >> >> It's a race condition. It probably happens while multipathd is >> reloading a map (*), suspending it during the table reload. The >> device >> will be resumed a few fractions of a second later (so yes, it's >> "quick"), but then it's too late > > More precisely: The suspend state itself may actually not last longer > than a few ms. But once the symlink is bent to point to the wrong > device, it will remain so, until the CHANGE event following the device > resume is successfully processed by udev, which may happen > substantially later. And it's that longer time span which matters for > systemd's mount attempt (or LVM device activation, for that matter). > > This looks like you are trying to mask-out different synchronization bug. Also it's worth to note - using symlinks is somewhat doomed on its own. So you only solve a very minor subcase where you manage to 'hit' your race just in a moment where you device is suspend and you actually 'scan' state of device. But what happen - if device would happen to be already resumed ? It looks like there is some race in udev rules processing - just somewhere else. I think Peter could more enlighten the lvm2 logic - but it seems there is possibly missing similar logic on multipath side in the moment when devices are created ? There should be no race when switching from ramdisk to rootfs. Regards Zdenek -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-28 17:47 ` Zdenek Kabelac 0 siblings, 0 replies; 32+ messages in thread From: Zdenek Kabelac @ 2022-01-28 17:47 UTC (permalink / raw) To: lvm-devel Dne 28. 01. 22 v 17:02 Martin Wilck napsal(a): > On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: >> >> It's a race condition. It probably happens while multipathd is >> reloading a map (*), suspending it during the table reload. The >> device >> will be resumed a few fractions of a second later (so yes, it's >> "quick"), but then it's too late > > More precisely: The suspend state itself may actually not last longer > than a few ms. But once the symlink is bent to point to the wrong > device, it will remain so, until the CHANGE event following the device > resume is successfully processed by udev, which may happen > substantially later. And it's that longer time span which matters for > systemd's mount attempt (or LVM device activation, for that matter). > > This looks like you are trying to mask-out different synchronization bug. Also it's worth to note - using symlinks is somewhat doomed on its own. So you only solve a very minor subcase where you manage to 'hit' your race just in a moment where you device is suspend and you actually 'scan' state of device. But what happen - if device would happen to be already resumed ? It looks like there is some race in udev rules processing - just somewhere else. I think Peter could more enlighten the lvm2 logic - but it seems there is possibly missing similar logic on multipath side in the moment when devices are created ? There should be no race when switching from ramdisk to rootfs. Regards Zdenek ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-01-28 17:47 ` Zdenek Kabelac @ 2022-01-28 18:46 ` Martin Wilck -1 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-01-28 18:46 UTC (permalink / raw) To: Zdenek Kabelac, Zdenek Kabelac, David Teigland, Peter Rajnoha Cc: dm-devel, Heming Zhao, Franck Bui, lvm-devel On Fri, 2022-01-28 at 18:47 +0100, Zdenek Kabelac wrote: > Dne 28. 01. 22 v 17:02 Martin Wilck napsal(a): > > On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: > > > > > > It's a race condition. It probably happens while multipathd is > > > reloading a map (*), suspending it during the table reload. The > > > device > > > will be resumed a few fractions of a second later (so yes, it's > > > "quick"), but then it's too late > > > > More precisely: The suspend state itself may actually not last > > longer > > than a few ms. But once the symlink is bent to point to the wrong > > device, it will remain so, until the CHANGE event following the > > device > > resume is successfully processed by udev, which may happen > > substantially later. And it's that longer time span which matters > > for > > systemd's mount attempt (or LVM device activation, for that > > matter). > > > > > > > This looks like you are trying to mask-out different synchronization > bug. Please explain. What bug would that be, in what subsystem? It takes time until a dm-triggered CHANGE event makes its way through the udev event queue and completes rules processing. That's perfectly normal. I say the bug is in this udev rule file. The set of properties and symlinks of a systemd-managed device must not change when new uevents for this device are processed, unless absolutely necessary. The current rule violates this principle: If a device that contains a file system is suspended, it continues to contain this file system, and the by-uuid symlink for the file system should be preserved. You can easily test this. Check the set of symlinks for a partition on a multipath device, suspend the device, run "udevadm trigger -c add" on the device (simulating coldplug), and check the set of symlinks again. You'll see that by-uuid or by-label links will be lost and will now point to one of the map's paths. These symlinks are crucial for mounting file systems. With my patch, the symlinks are preserved. > Also it's worth to note - using symlinks is somewhat doomed on its > own. That's how device identification and activation with udev and systemd works. It won't change any time soon. Explain on which grounds you're calling it "somewhat doomed". > So you only solve a very minor subcase where you manage to 'hit' your > race > just in a moment where you device is suspend and you actually 'scan' > state of > device. I wouldn't call it "minor" if a system fails to boot. The time window when this is possible is indeed small. But it's not zero, and that's why this issue occurs. > But what happen - if device would happen to be already resumed ? Nothing bad happens. The device is mounted or scanned just fine. > It looks like there is some race in udev rules processing - just > somewhere else. I've personally had a part in fixing multiple race conditions both multipathd and udev. Ben Marzinski and I recently worked on dropping the dependency of multipathd.service on udev-settle.service. That accellerates booting and is conceptually cleaner than before, but it reveals bugs in other parts of the system, like this one. > I think Peter could more enlighten the lvm2 logic - but it seems > there is > possibly missing similar logic on multipath side in the moment when > devices > are created ? As long as you don't explain what logic you mean, this is a hard question to answer. > There should be no race when switching from ramdisk to rootfs. I agree that there *should* not be any. But there is one, and this patch fixes it. I would appreciate if you said what's actually wrong my patch. So far you haven't. You just speculated about errors in other parts of the system. Are you saying my patch would cause regressions or errors? If yes, how would that come to pass, for which devices, and under which conditions? Regards, Martin -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-28 18:46 ` Martin Wilck 0 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-01-28 18:46 UTC (permalink / raw) To: lvm-devel On Fri, 2022-01-28 at 18:47 +0100, Zdenek Kabelac wrote: > Dne 28. 01. 22 v 17:02 Martin Wilck napsal(a): > > On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: > > > > > > It's a race condition. It probably happens while multipathd is > > > reloading a map (*), suspending it during the table reload. The > > > device > > > will be resumed a few fractions of a second later (so yes, it's > > > "quick"), but then it's too late > > > > More precisely: The suspend state itself may actually not last > > longer > > than a few ms. But once the symlink is bent to point to the wrong > > device, it will remain so, until the CHANGE event following the > > device > > resume is successfully processed by udev, which may happen > > substantially later. And it's that longer time span which matters > > for > > systemd's mount attempt (or LVM device activation, for that > > matter). > > > > > > > This looks like you are trying to mask-out different synchronization > bug. Please explain. What bug would that be, in what subsystem??It takes time until a dm-triggered CHANGE event makes its way through the udev event queue and completes rules processing. That's perfectly normal. I say the bug is in this udev rule file. The set of properties and symlinks of a systemd-managed device must not change when new uevents for this device are processed, unless absolutely necessary. The current rule violates this principle: If a device that contains a file system is suspended, it continues to contain this file system, and the by-uuid symlink for the file system should be preserved. You can easily test this. Check the set of symlinks for a partition on a multipath device, suspend the device, run "udevadm trigger -c add" on the device (simulating coldplug), and check the set of symlinks again. You'll see that by-uuid or by-label links will be lost and will now point to one of the map's paths. These symlinks are crucial for mounting file systems. With my patch, the symlinks are preserved. > Also it's worth to note - using symlinks is somewhat doomed on its > own. That's how device identification and activation with udev and systemd works. It won't change any time soon. Explain on which grounds you're calling it "somewhat doomed". > So you only solve a very minor subcase where you manage to 'hit' your > race > just in a moment where you device is suspend and you actually 'scan' > state of > device. I wouldn't call it "minor" if a system fails to boot. The time window when this is possible is indeed small. But it's not zero, and that's why this issue occurs. > But what happen - if device would happen to be already resumed ? Nothing bad happens. The device is mounted or scanned just fine. > It looks like there is some race in udev rules processing - just > somewhere else. I've personally had a part in fixing multiple race conditions both multipathd and udev.?Ben Marzinski and I recently worked on dropping the dependency of multipathd.service on udev-settle.service. That accellerates booting and is conceptually cleaner than before, but it reveals bugs in other parts of the system, like this one. > I think Peter could more enlighten the lvm2 logic - but it seems > there is > possibly missing similar logic on multipath side in the moment when > devices > are created ? As long as you don't explain what logic you mean, this is a hard question to answer. > There should be no race when switching from ramdisk to rootfs. I agree that there *should* not be any. But there is one, and this patch fixes it. I would appreciate if you said what's actually wrong my patch. So far you haven't. You just speculated about errors in other parts of the system. Are you saying my patch would cause regressions or errors? If yes, how would that come to pass, for which devices, and under which conditions? Regards, Martin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-01-28 18:46 ` Martin Wilck @ 2022-01-28 21:06 ` Zdenek Kabelac -1 siblings, 0 replies; 32+ messages in thread From: Zdenek Kabelac @ 2022-01-28 21:06 UTC (permalink / raw) To: Martin Wilck, Zdenek Kabelac, David Teigland, Peter Rajnoha Cc: dm-devel, Heming Zhao, Franck Bui, lvm-devel Dne 28. 01. 22 v 19:46 Martin Wilck napsal(a): > On Fri, 2022-01-28 at 18:47 +0100, Zdenek Kabelac wrote: >> Dne 28. 01. 22 v 17:02 Martin Wilck napsal(a): >>> On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: >>>> It's a race condition. It probably happens while multipathd is >>>> reloading a map (*), suspending it during the table reload. The >>>> device >>>> will be resumed a few fractions of a second later (so yes, it's >>>> "quick"), but then it's too late >>> More precisely: The suspend state itself may actually not last >>> longer >>> than a few ms. But once the symlink is bent to point to the wrong >>> device, it will remain so, until the CHANGE event following the >>> device >>> resume is successfully processed by udev, which may happen >>> substantially later. And it's that longer time span which matters >>> for >>> systemd's mount attempt (or LVM device activation, for that >>> matter). >>> >>> >> >> This looks like you are trying to mask-out different synchronization >> bug. > Please explain. What bug would that be, in what subsystem? It takes > time until a dm-triggered CHANGE event makes its way through the udev > event queue and completes rules processing. That's perfectly normal. Well it is at the point we need to know exactly which device in which state causes you problem. Then we need to see what is wrong in the whole process. All I'm saying here is - that the proposed patch is not fixing bug - but rather masking/minimizing window for error. > I say the bug is in this udev rule file. The set of properties and > symlinks of a systemd-managed device must not change when new uevents > for this device are processed, unless absolutely necessary. The current The question is - whether the event is meant to be there if there is no change, that should be reflected in the system. It really goes to #1 question about detailed state of DM tables and your observed blocked system. With lvm2 we are carefull about sending events. > rule violates this principle: If a device that contains a file system > is suspended, it continues to contain this file system, and the by-uuid > symlink for the file system should be preserved. > You can easily test this. Check the set of symlinks for a partition on > a multipath device, suspend the device, run "udevadm trigger -c add" on > the device (simulating coldplug), and check the set of symlinks again. Whoever reads suspended device has to wait for 'resume' - so the test case where you suspend device for long period of time and you observe other processed waiting for device to be resumed is wanted behavior - it's not valid to have suspended devices in the system. You have state 'before' suspend and state after 'resume'. There is no state for 'suspended' type of device (as the device could be resumed either with exiting table or a new table. So any simulation relaying on suspended device is basically testing invalid state of system. And I'm assuming your problem is something else - as I'd say you don't leak forgotten 'suspended' device in ramdisk ? > You'll see that by-uuid or by-label links will be lost and will now > point to one of the map's paths. These symlinks are crucial for > mounting file systems. With my patch, the symlinks are preserved. I'm still missing why 'UUID' should be lost ? Where they missing before suspend ? Are they lost after resume ? What is causing your UUID to be lost ? To connect this with lvm2 logic - when lvm2 creates or deletes device - it always waits for udev confirmation (ATM via cookie semaphore). So i.e. we can't create a situation where we would be triggering 'udev' disk creation event that would be 'racing' with udev disk removal processing - so there is close synchronization to avoid these kind of races where udev would be accessing our LV devices in some async parallel logic and possibly would have 'modifed' its symlink database with some obsolete info. >> Also it's worth to note - using symlinks is somewhat doomed on its >> own. > That's how device identification and activation with udev and systemd > works. It won't change any time soon. Explain on which grounds you're > calling it "somewhat doomed". You could find my discussion with Lennart - one of major design problem of symlinks is it can't handle duplicate devices with same IDs. So if you have multiple disks with same identification - symlinks will be randomly switching between them - so plain simple 'dd if=diskA of=diskB' ruins symlinks in your system. >> So you only solve a very minor subcase where you manage to 'hit' your >> race >> just in a moment where you device is suspend and you actually 'scan' >> state of >> device. > I wouldn't call it "minor" if a system fails to boot. The time window > when this is possible is indeed small. But it's not zero, and that's > why this issue occurs. I'm not saying 'non-booting' system is 'minor' problem - rather I'm saying your proposed fix is not the correct fixing for your existing problem. >> But what happen - if device would happen to be already resumed ? > Nothing bad happens. The device is mounted or scanned just fine. How could you get wrong UUID for already mounted (so likely also opened) device?? What is happening with your device that existing udev rule is generating incorrect symlink ? That really needs close log evidence about states of your system. >> It looks like there is some race in udev rules processing - just >> somewhere else. > I've personally had a part in fixing multiple race conditions both > multipathd and udev. Ben Marzinski and I recently worked on dropping > the dependency of multipathd.service on udev-settle.service. That > accellerates booting and is conceptually cleaner than before, but it > reveals bugs in other parts of the system, like this one. And we need to find the fix for the bug - not just mask it out. As technically udev is just one of many possible 'readers' of your device. So while you would mask bug for udev - other device users would still get misleading info about your device. > I would appreciate if you said what's actually wrong my patch. So far > you haven't. You just speculated about errors in other parts of the As said the automatic 'skip' of read of suspend device can't be the right thing. Suspend is parallel&invisible operation and as such any reader of device is meant to wait till device is resumed and finish it's reading. That's the core idea of whole DM usage - all users of DM devices use it like normal block device. So if you need to check state of DM device (and you are not a DM maintaining tool and blkid tool certainly is not such tool) then there is something wrong in design (aka we do not want to support 'skip' of udev rules if user is running random stream of 'dmsetup suspend & dmsetup resume' commands) Regards Zdenek -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-28 21:06 ` Zdenek Kabelac 0 siblings, 0 replies; 32+ messages in thread From: Zdenek Kabelac @ 2022-01-28 21:06 UTC (permalink / raw) To: lvm-devel Dne 28. 01. 22 v 19:46 Martin Wilck napsal(a): > On Fri, 2022-01-28 at 18:47 +0100, Zdenek Kabelac wrote: >> Dne 28. 01. 22 v 17:02 Martin Wilck napsal(a): >>> On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: >>>> It's a race condition. It probably happens while multipathd is >>>> reloading a map (*), suspending it during the table reload. The >>>> device >>>> will be resumed a few fractions of a second later (so yes, it's >>>> "quick"), but then it's too late >>> More precisely: The suspend state itself may actually not last >>> longer >>> than a few ms. But once the symlink is bent to point to the wrong >>> device, it will remain so, until the CHANGE event following the >>> device >>> resume is successfully processed by udev, which may happen >>> substantially later. And it's that longer time span which matters >>> for >>> systemd's mount attempt (or LVM device activation, for that >>> matter). >>> >>> >> >> This looks like you are trying to mask-out different synchronization >> bug. > Please explain. What bug would that be, in what subsystem??It takes > time until a dm-triggered CHANGE event makes its way through the udev > event queue and completes rules processing. That's perfectly normal. Well it is at the point we need to know exactly which device in which state causes you problem. Then we need to see what is wrong in the whole process. All I'm saying here is - that the proposed patch is not fixing bug - but rather masking/minimizing window for error. > I say the bug is in this udev rule file. The set of properties and > symlinks of a systemd-managed device must not change when new uevents > for this device are processed, unless absolutely necessary. The current The question is - whether the event is meant to be there if there is no change, that should be reflected in the system. It really goes to #1 question about detailed state of DM tables and your observed blocked system.? With lvm2 we are carefull about sending events. > rule violates this principle: If a device that contains a file system > is suspended, it continues to contain this file system, and the by-uuid > symlink for the file system should be preserved. > You can easily test this. Check the set of symlinks for a partition on > a multipath device, suspend the device, run "udevadm trigger -c add" on > the device (simulating coldplug), and check the set of symlinks again. Whoever reads suspended device has to wait for 'resume' - so the test case where you suspend device for long period of time and you observe other processed waiting for device to be resumed is wanted behavior - it's not valid to have? suspended devices in the system.? You have state 'before' suspend? and state after 'resume'. ? There is no state for 'suspended' type of device (as the device could be resumed either with exiting table or a new table. So any simulation relaying on suspended device is basically testing invalid state of system. And I'm assuming your problem is something else - as I'd say you don't leak forgotten 'suspended' device in ramdisk ? > You'll see that by-uuid or by-label links will be lost and will now > point to one of the map's paths. These symlinks are crucial for > mounting file systems. With my patch, the symlinks are preserved. I'm still missing why 'UUID' should be lost ? Where they missing before suspend ? Are they lost after resume ? What is causing your UUID to be lost ? To connect this with lvm2 logic -? when lvm2 creates or deletes device - it always waits for udev confirmation? (ATM via cookie semaphore). So i.e. we can't create a situation where we would be triggering? 'udev' disk creation event that would be 'racing' with? udev disk removal processing - so there is close synchronization to avoid these kind of races where udev would be accessing? our LV devices in some? async parallel logic and possibly would have 'modifed' its symlink database with some obsolete info. >> Also it's worth to note - using symlinks is somewhat doomed on its >> own. > That's how device identification and activation with udev and systemd > works. It won't change any time soon. Explain on which grounds you're > calling it "somewhat doomed". You could find my discussion with Lennart? - one of major design problem of symlinks is it can't handle duplicate devices with same IDs.? So if you have multiple disks with same identification - symlinks will be randomly switching between them -? so plain simple 'dd? if=diskA of=diskB' ruins symlinks in your system. >> So you only solve a very minor subcase where you manage to 'hit' your >> race >> just in a moment where you device is suspend and you actually 'scan' >> state of >> device. > I wouldn't call it "minor" if a system fails to boot. The time window > when this is possible is indeed small. But it's not zero, and that's > why this issue occurs. I'm not saying 'non-booting' system is 'minor' problem - rather I'm saying your proposed fix is not the correct fixing for your existing problem. >> But what happen - if device would happen to be already resumed ? > Nothing bad happens. The device is mounted or scanned just fine. How could you get wrong UUID for already mounted (so likely also opened) device?? What is happening with your device that existing udev rule is generating incorrect symlink ? That really needs close log evidence about states of your system. >> It looks like there is some race in udev rules processing - just >> somewhere else. > I've personally had a part in fixing multiple race conditions both > multipathd and udev.?Ben Marzinski and I recently worked on dropping > the dependency of multipathd.service on udev-settle.service. That > accellerates booting and is conceptually cleaner than before, but it > reveals bugs in other parts of the system, like this one. And we need to find the fix for the bug - not just mask it out. As technically udev is just one of many possible 'readers' of your device. So while you would mask bug for udev - other device users would still get misleading info about your device. > I would appreciate if you said what's actually wrong my patch. So far > you haven't. You just speculated about errors in other parts of the As said the automatic? 'skip'? of read of suspend device can't be the right thing. Suspend is parallel&invisible operation and as such? any reader of device is meant to wait till device is resumed and finish it's reading. That's the core idea of whole DM usage -? all users of DM devices use it like normal block device.? So if you need to check state of DM device? (and you are not a DM maintaining tool? and blkid tool certainly is not such tool)? then there is something wrong in design? (aka we do not want to support 'skip' of udev rules if user is running? random stream? of 'dmsetup suspend & dmsetup resume' commands) Regards Zdenek ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-01-28 21:06 ` Zdenek Kabelac @ 2022-01-28 23:21 ` Martin Wilck -1 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-01-28 23:21 UTC (permalink / raw) To: Zdenek Kabelac, Zdenek Kabelac, David Teigland, Peter Rajnoha Cc: dm-devel, Heming Zhao, Franck Bui, lvm-devel On Fri, 2022-01-28 at 22:06 +0100, Zdenek Kabelac wrote: > Dne 28. 01. 22 v 19:46 Martin Wilck napsal(a): > > On Fri, 2022-01-28 at 18:47 +0100, Zdenek Kabelac wrote: > > > Dne 28. 01. 22 v 17:02 Martin Wilck napsal(a): > > > > On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: > > > > > It's a race condition. It probably happens while multipathd > > > > > is > > > > > reloading a map (*), suspending it during the table reload. > > > > > The > > > > > device > > > > > will be resumed a few fractions of a second later (so yes, > > > > > it's > > > > > "quick"), but then it's too late > > > > More precisely: The suspend state itself may actually not last > > > > longer > > > > than a few ms. But once the symlink is bent to point to the > > > > wrong > > > > device, it will remain so, until the CHANGE event following the > > > > device > > > > resume is successfully processed by udev, which may happen > > > > substantially later. And it's that longer time span which > > > > matters > > > > for > > > > systemd's mount attempt (or LVM device activation, for that > > > > matter). > > > > > > > > > > > > > > This looks like you are trying to mask-out different > > > synchronization > > > bug. > > Please explain. What bug would that be, in what subsystem? It takes > > time until a dm-triggered CHANGE event makes its way through the > > udev > > event queue and completes rules processing. That's perfectly > > normal. > > > Well it is at the point we need to know exactly which device in which > state > causes you problem. Then we need to see what is wrong in the whole > process. This is how multipathd loads the multipath map before the error occurs: [ 127.352614] localhost multipathd[1059]: 3600a098000aad1e300000b4b5a275d45: reload [0 134217728 multipath 3 pg_init_retries 50 queue_if_no_path 1 alua 2 1 service-time 0 1 1 8:80 1 service-time 0 2 1 8:16 1 8:144 1] The mapping of the partition device is this (after boot): # dmsetup table /dev/dm-13 0 133953503 linear 254:10 264192 # dmsetup ls --tree # excerpt 3600a098000aad1e300000b4b5a275d45-part2 (254:13) └─3600a098000aad1e300000b4b5a275d45 (254:10) ├─ (8:144) ├─ (8:16) └─ (8:80) Here you can see the links that udev creates for the device when it's set up during initrd processing (note the by-uuid link): [ 40.611989] localhost systemd-udevd[555]: dm-13: Handling device node '/dev/dm-13', devnum=b254:13 [ 40.612073] localhost systemd-udevd[555]: dm-13: Creating symlink '/dev/mapper/3600a098000aad1e300000b4b5a275d45-part2' to '../dm-13' [ 40.612374] localhost systemd-udevd[555]: dm-13: Atomically replace '/dev/disk/by-id/wwn-0x600a098000aad1e300000b4b5a275d45-part2' [ 40.612476] localhost systemd-udevd[555]: dm-13: Creating symlink '/dev/disk/by-id/dm-uuid-part2-mpath-3600a098000aad1e300000b4b5a275d45' to '../../dm-13' [ 40.612827] localhost systemd-udevd[555]: dm-13: Atomically replace '/dev/disk/by-partuuid/1c2f70e0-fb91-49f5-8260-38eacaf7992b' [ 40.612980] localhost systemd-udevd[555]: dm-13: Creating symlink '/dev/disk/by-id/dm-name-3600a098000aad1e300000b4b5a275d45-part2' to '../../dm-13' [ 40.613222] localhost systemd-udevd[555]: dm-13: Atomically replace '/dev/disk/by-id/scsi-3600a098000aad1e300000b4b5a275d45-part2' [ 40.613502] localhost systemd-udevd[555]: dm-13: Atomically replace '/dev/disk/by-uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0' [ 40.613531] localhost systemd-udevd[555]: dm-13: Preserve already existing symlink '/dev/block/254:13' to '../dm-13' The by-uuid link is then removed after switching root as I showed in my previous post. The reason is that the respective SYMLINK line in 13-dm-disk.rules is skipped for suspended devices, and my patch fixes that. At the time this happens, the same partiton _is already mounted_ as btrfs root file system. This file system has multiple subvolumes for /tmp, /var, /usr/local, etc., which need to be mounted after switching root. The problem occurs when these mounts are attempted. Typically some succeed, and some don't, depending on the timing. > All I'm saying here is - that the proposed patch is not fixing bug - > but > rather masking/minimizing window for error. AFAICS my patch eliminates the window for this error entirely. > > I say the bug is in this udev rule file. The set of properties and > > symlinks of a systemd-managed device must not change when new > > uevents > > for this device are processed, unless absolutely necessary. The > > current > > > The question is - whether the event is meant to be there if there is > no > change, that should be reflected in the system. It really goes to #1 > question > about detailed state of DM tables and your observed blocked system. > With lvm2 > we are carefull about sending events. The coldplug "add" event always happens. It can't be avoided. It is necessary to make user-space aware of the device after switching root. And device-mapper rules go into great detail how they deal with it, as you are certainly aware. > > > rule violates this principle: If a device that contains a file > > system > > is suspended, it continues to contain this file system, and the by- > > uuid > > symlink for the file system should be preserved. > > You can easily test this. Check the set of symlinks for a partition > > on > > a multipath device, suspend the device, run "udevadm trigger -c > > add" on > > the device (simulating coldplug), and check the set of symlinks > > again. > > Whoever reads suspended device has to wait for 'resume' I suppose if mount(8) opened /dev/dm-13 directly, it would just hang while the device was suspended, and start IO afterwards. But this isn't how this works. mount(8) opens the /dev/disk/by-uuid symlink which now points to the underlying device (sdb2 - udev bent the symlink to point to sdb2 after it was dropped from dm-13). Opening sdb2 doesn't hang, it fails with -EBUSY because the device is locked by dm. Thus the mount fails. mount(8) has no way to detect why this is so. systemd might, and might attempt a retry, but it lacks the logic for doing that. > > - so the test case > where you suspend device for long period of time and you observe > other > processed waiting for device to be resumed is wanted behavior - it's > not valid > to have suspended devices in the system. Unfortunately dm devices are often suspended for a short period of time, when their table is reloaded. As I pointed out before, it's normal for this to happen during multipath startup, when paths are added or removed, priority groups switched, etc. > You have state 'before' suspend > and state after 'resume'. There is no state for 'suspended' type of > device > (as the device could be resumed either with exiting table or a new > table. So: the mapping could be entirely different when the device is resumed, and if systemd tried to mount it in the time span between the resume and the finish of the uevent that follows, it might mount a totally different file system. Is that what you mean? If yes, I suppose you're right. But isn't that possible today already? Couldn't I reload a mapping with totally different devices and offsets, while it is mounted? At least for multipath, that's extremely unlikely to happen, whereas quick reloads with a similar mapping that doesn't affect upper layers are very common. If you take the above seriously, you need to rewrite your code for communicating with udev and systemd. I guess systemd itself would need to change the way it handles dm devices, fundamentally. Basically you would need to tell systemd to consider a suspended device as unusable, and not make it available to the system again before it's resumed in a state that's verified to be compatible with the previous state. And we'd need deal with the special caveats for multipath, too. > So any simulation relaying on suspended device is basically testing > invalid > state of system. > > And I'm assuming your problem is something else - as I'd say you > don't leak > forgotten 'suspended' device in ramdisk ? No, everything is perfectly fine in the initrd. And afterwards, all is fine, too - except if, by very bad luck, the timing is such that a) the coldplug "add" event happens while the device is suspended (as I said, probably due to a reload operation) b) systemd tries to mount the file system during the short time span in which the symlinks are missing (i.e. between the resume and the final processing of the CHANGE event). In my testing, this happens only on one system, roughly once every 50 reboots. > > > You'll see that by-uuid or by-label links will be lost and will now > > point to one of the map's paths. These symlinks are crucial for > > mounting file systems. With my patch, the symlinks are preserved. > > > I'm still missing why 'UUID' should be lost ? The link is bound to udev's ID_PART_ENTRY_UUID property. This property is set by IMPORT{buidltin}="blkid", or using IMPORT{db}="ID_PART_ENTRY_UUID". The latter is done by 11-dm- mpath.rules and 11-dm-parts.rules, for example, exactly for the same reason - we can't risk loosing the symlink because blkid fails. This works nicely, unless DM_SUSPENDED or DM_NOSCAN is set, because in this case 13-dm-disk.rules won't call SYMLINK. > Where they missing before suspend ? No. > Are they lost after resume ? No, after resume udev reinstates them, but then it's too late. > What is causing your UUID to be lost ? The UUID is never lost. Only the symlink that reflects it. > > To connect this with lvm2 logic - when lvm2 creates or deletes > device - it > always waits for udev confirmation (ATM via cookie semaphore). So > i.e. we > can't create a situation where we would be triggering 'udev' disk > creation > event that would be 'racing' with udev disk removal processing - so > there is > close synchronization to avoid these kind of races where udev would > be > accessing our LV devices in some async parallel logic and possibly > would > have 'modifed' its symlink database with some obsolete info. multipathd does that, too. But it doesn't apply here. Because the device is live (system has mounted it in the initramfs already and it's actually mounted at the time all this happens, see above). Yet the coldplug event does arrive, we can't avoid it. (Well we could, at least partly, by delaying multipathd startup until after "udev settle" again, but I'm sure Ben agrees with me that we'd rather not do that). > > > > > Also it's worth to note - using symlinks is somewhat doomed on > > > its > > > own. > > That's how device identification and activation with udev and > > systemd > > works. It won't change any time soon. Explain on which grounds > > you're > > calling it "somewhat doomed". > > You could find my discussion with Lennart - one of major design > problem of > symlinks is it can't handle duplicate devices with same IDs. So if > you have > multiple disks with same identification - symlinks will be randomly > switching > between them - so plain simple 'dd if=diskA of=diskB' ruins > symlinks in your > system. udev has it's link priority concept for this reason. It may not be perfect but it works quite well, and the symlink handling has improved a lot lately. I am curious what your alternative concept is. Do you have a link? > [...] > > How could you get wrong UUID for already mounted (so likely also > opened) device?? Why do you say "wrong UUID" ? The UUID was correct all the time. I explained above what happened. Yes, the device was mounted (to other btrfs subvolumes) when the error occured. > What is happening with your device that existing udev rule is > generating > incorrect symlink ? > > That really needs close log evidence about states of your system. Believe me, I have scrutinized these logs very carefully. I've already told you the relevant parts of it. But as you asked for it: https://www.dropbox.com/sh/ks946a8wjos97ji/AAAVtOK3Z3XpbQ7DFzE1ykaYa?dl=0 > > > It looks like there is some race in udev rules processing - just > > > somewhere else. > > I've personally had a part in fixing multiple race conditions both > > multipathd and udev. Ben Marzinski and I recently worked on > > dropping > > the dependency of multipathd.service on udev-settle.service. That > > accellerates booting and is conceptually cleaner than before, but > > it > > reveals bugs in other parts of the system, like this one. > > And we need to find the fix for the bug - not just mask it out. > As technically udev is just one of many possible 'readers' of your > device. So > while you would mask bug for udev - other device users would still > get > misleading info about your device. The only thing that's misleading here is the fact that udev removes the device symlinks because the rules file don't execute the SYMLINK directives. All else is fine. "My device" is in a sane state. > > I would appreciate if you said what's actually wrong my patch. So > > far > > you haven't. You just speculated about errors in other parts of the > > > As said the automatic 'skip' of read of suspend device can't be the > right thing. But that is not what my patch does, as I already explained. The reading was skipped anyway, and this has been so since the file came into existence. Please look at the history of the file. All I am changing is to prevent the symlinks from being deleted prematurely. All I want to achieve here is fix a nasty multipath boot failure. I think we agree that the handling of dm devices by udev and systemd is not optimal. But my goal with this patch is not to save that generic problem. That's something you need to work out with Lennart and his co- workers. My estimate is that a proper solution would take a year or more of intensive work. > > Suspend is parallel&invisible operation and as such any reader of > device is > meant to wait till device is resumed and finish it's reading. That's > the core > idea of whole DM usage - all users of DM devices use it like normal > block > device. So if you need to check state of DM device (and you are not > a DM > maintaining tool and blkid tool certainly is not such tool) then > there is > something wrong in design (aka we do not want to support 'skip' of > udev rules > if user is running random stream of 'dmsetup suspend & dmsetup > resume' commands) What's "wrong in design" here is that udev can't afford to wait for a device to be resumed. If the device is suspended, blkid can't be called, period. In this case we are in jeopardy, by definition. We have to make some assumption. We can assume that the by-uuid link is still valid, like my patch does, and risk that the assumption is wrong. Or simply drop the symlink, as the current code does, and break systemd's assumptions. Discuss this with Lennart, please. It means that systemd needs to change it's device handling fundamentally. But really, this goes much further than what I'm currently interested in. I just want to make sure these systems boot reliably. If you can't take this for generic dm, we'll have to find a way to do it in the multipath rules. This is where where we run IMPORT{db} anyway to avoid failures. It will be far less elegant because it means code duplication (13-dm-disk.rules is where this logic belongs), but well, I guess this is the price we have to pay if we can't agree. Regards Martin -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-28 23:21 ` Martin Wilck 0 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-01-28 23:21 UTC (permalink / raw) To: lvm-devel On Fri, 2022-01-28 at 22:06 +0100, Zdenek Kabelac wrote: > Dne 28. 01. 22 v 19:46 Martin Wilck napsal(a): > > On Fri, 2022-01-28 at 18:47 +0100, Zdenek Kabelac wrote: > > > Dne 28. 01. 22 v 17:02 Martin Wilck napsal(a): > > > > On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: > > > > > It's a race condition. It probably happens while multipathd > > > > > is > > > > > reloading a map (*), suspending it during the table reload. > > > > > The > > > > > device > > > > > will be resumed a few fractions of a second later (so yes, > > > > > it's > > > > > "quick"), but then it's too late > > > > More precisely: The suspend state itself may actually not last > > > > longer > > > > than a few ms. But once the symlink is bent to point to the > > > > wrong > > > > device, it will remain so, until the CHANGE event following the > > > > device > > > > resume is successfully processed by udev, which may happen > > > > substantially later. And it's that longer time span which > > > > matters > > > > for > > > > systemd's mount attempt (or LVM device activation, for that > > > > matter). > > > > > > > > > > > > > > This looks like you are trying to mask-out different > > > synchronization > > > bug. > > Please explain. What bug would that be, in what subsystem??It takes > > time until a dm-triggered CHANGE event makes its way through the > > udev > > event queue and completes rules processing. That's perfectly > > normal. > > > Well it is at the point we need to know exactly which device in which > state > causes you problem. Then we need to see what is wrong in the whole > process. This is how multipathd loads the multipath map before the error occurs: [ 127.352614] localhost multipathd[1059]: 3600a098000aad1e300000b4b5a275d45: reload [0 134217728 multipath 3 pg_init_retries 50 queue_if_no_path 1 alua 2 1 service-time 0 1 1 8:80 1 service-time 0 2 1 8:16 1 8:144 1] The mapping of the partition device is this (after boot): # dmsetup table /dev/dm-13 0 133953503 linear 254:10 264192 # dmsetup ls --tree # excerpt 3600a098000aad1e300000b4b5a275d45-part2 (254:13) ??3600a098000aad1e300000b4b5a275d45 (254:10) ?? (8:144) ?? (8:16) ?? (8:80) Here you can see the links that udev creates for the device when it's set up during initrd processing (note the by-uuid link): [ 40.611989] localhost systemd-udevd[555]: dm-13: Handling device node '/dev/dm-13', devnum=b254:13 [ 40.612073] localhost systemd-udevd[555]: dm-13: Creating symlink '/dev/mapper/3600a098000aad1e300000b4b5a275d45-part2' to '../dm-13' [ 40.612374] localhost systemd-udevd[555]: dm-13: Atomically replace '/dev/disk/by-id/wwn-0x600a098000aad1e300000b4b5a275d45-part2' [ 40.612476] localhost systemd-udevd[555]: dm-13: Creating symlink '/dev/disk/by-id/dm-uuid-part2-mpath-3600a098000aad1e300000b4b5a275d45' to '../../dm-13' [ 40.612827] localhost systemd-udevd[555]: dm-13: Atomically replace '/dev/disk/by-partuuid/1c2f70e0-fb91-49f5-8260-38eacaf7992b' [ 40.612980] localhost systemd-udevd[555]: dm-13: Creating symlink '/dev/disk/by-id/dm-name-3600a098000aad1e300000b4b5a275d45-part2' to '../../dm-13' [ 40.613222] localhost systemd-udevd[555]: dm-13: Atomically replace '/dev/disk/by-id/scsi-3600a098000aad1e300000b4b5a275d45-part2' [ 40.613502] localhost systemd-udevd[555]: dm-13: Atomically replace '/dev/disk/by-uuid/e40d3005-ab2f-4845-bd83-be5fd09e62a0' [ 40.613531] localhost systemd-udevd[555]: dm-13: Preserve already existing symlink '/dev/block/254:13' to '../dm-13' The by-uuid link is then removed after switching root as I showed in my previous post. The reason is that the respective SYMLINK line in 13-dm-disk.rules is skipped for suspended devices, and my patch fixes that. At the time this happens, the same partiton _is already mounted_ as btrfs root file system. This file system has multiple subvolumes for /tmp, /var, /usr/local, etc., which need to be mounted after switching root. The problem occurs when these mounts are attempted. Typically some succeed, and some don't, depending on the timing. > All I'm saying here is - that the proposed patch is not fixing bug - > but > rather masking/minimizing window for error. AFAICS my patch eliminates the window for this error entirely. > > I say the bug is in this udev rule file. The set of properties and > > symlinks of a systemd-managed device must not change when new > > uevents > > for this device are processed, unless absolutely necessary. The > > current > > > The question is - whether the event is meant to be there if there is > no > change, that should be reflected in the system. It really goes to #1 > question > about detailed state of DM tables and your observed blocked system.? > With lvm2 > we are carefull about sending events. The coldplug "add" event always happens. It can't be avoided. It is necessary to make user-space aware of the device after switching root. And device-mapper rules go into great detail how they deal with it, as you are certainly aware. > > > rule violates this principle: If a device that contains a file > > system > > is suspended, it continues to contain this file system, and the by- > > uuid > > symlink for the file system should be preserved. > > You can easily test this. Check the set of symlinks for a partition > > on > > a multipath device, suspend the device, run "udevadm trigger -c > > add" on > > the device (simulating coldplug), and check the set of symlinks > > again. > > Whoever reads suspended device has to wait for 'resume' I suppose if mount(8) opened /dev/dm-13 directly, it would just hang while the device was suspended, and start IO afterwards. But this isn't how this works. mount(8) opens the /dev/disk/by-uuid symlink which now points to the underlying device (sdb2 - udev bent the symlink to point to sdb2 after it was dropped from dm-13). Opening sdb2 doesn't hang, it fails with -EBUSY because the device is locked by dm. Thus the mount fails. mount(8) has no way to detect why this is so. systemd might, and might attempt a retry, but it lacks the logic for doing that. > > - so the test case > where you suspend device for long period of time and you observe > other > processed waiting for device to be resumed is wanted behavior - it's > not valid > to have? suspended devices in the system.?? Unfortunately dm devices are often suspended for a short period of time, when their table is reloaded. As I pointed out before, it's normal for this to happen during multipath startup, when paths are added or removed, priority groups switched, etc. > You have state 'before' suspend? > and state after 'resume'. ? There is no state for 'suspended' type of > device > (as the device could be resumed either with exiting table or a new > table. So: the mapping could be entirely different when the device is resumed, and if systemd tried to mount it in the time span between the resume and the finish of the uevent that follows, it might mount a totally different file system. Is that what you mean? If yes, I suppose you're right. But isn't that possible today already? Couldn't I reload a mapping with totally different devices and offsets, while it is mounted? At least for multipath, that's extremely unlikely to happen, whereas quick reloads with a similar mapping that doesn't affect upper layers are very common. If you take the above seriously, you need to rewrite your code for communicating with udev and systemd. I guess systemd itself would need to change the way it handles dm devices, fundamentally. Basically you would need to tell systemd to consider a suspended device as unusable, and not make it available to the system again before it's resumed in a state that's verified to be compatible with the previous state. And we'd need deal with the special caveats for multipath, too. > So any simulation relaying on suspended device is basically testing > invalid > state of system. > > And I'm assuming your problem is something else - as I'd say you > don't leak > forgotten 'suspended' device in ramdisk ? No, everything is perfectly fine in the initrd. And afterwards, all is fine, too - except if, by very bad luck, the timing is such that? a) the coldplug "add" event happens?while the device is suspended (as I said, probably due to a reload operation) b) systemd tries to mount the file system during the short time span in which the symlinks are missing (i.e. between the resume and the final processing of the CHANGE event). In my testing, this happens only on one system, roughly once every 50 reboots. > > > You'll see that by-uuid or by-label links will be lost and will now > > point to one of the map's paths. These symlinks are crucial for > > mounting file systems. With my patch, the symlinks are preserved. > > > I'm still missing why 'UUID' should be lost ? The link is bound to udev's ID_PART_ENTRY_UUID property. This property is set by IMPORT{buidltin}="blkid", or using IMPORT{db}="ID_PART_ENTRY_UUID". The latter is done by 11-dm- mpath.rules and 11-dm-parts.rules, for example, exactly for the same reason - we can't risk loosing the symlink because blkid fails. This works nicely, unless DM_SUSPENDED or DM_NOSCAN is set, because in this case 13-dm-disk.rules won't call SYMLINK. > Where they missing before suspend ? No. > Are they lost after resume ? No, after resume udev reinstates them, but then it's too late. > What is causing your UUID to be lost ? The UUID is never lost. Only the symlink that reflects it. > > To connect this with lvm2 logic -? when lvm2 creates or deletes > device - it > always waits for udev confirmation? (ATM via cookie semaphore). So > i.e. we > can't create a situation where we would be triggering? 'udev' disk > creation > event that would be 'racing' with? udev disk removal processing - so > there is > close synchronization to avoid these kind of races where udev would > be > accessing? our LV devices in some? async parallel logic and possibly > would > have 'modifed' its symlink database with some obsolete info. multipathd does that, too. But it doesn't apply here. Because the device is live (system has mounted it in the initramfs already and it's actually mounted at the time all this happens, see above). Yet the coldplug event does arrive, we can't avoid it. (Well we could, at least partly, by delaying multipathd startup until after "udev settle" again, but I'm sure Ben agrees with me that we'd rather not do that). > > > > > Also it's worth to note - using symlinks is somewhat doomed on > > > its > > > own. > > That's how device identification and activation with udev and > > systemd > > works. It won't change any time soon. Explain on which grounds > > you're > > calling it "somewhat doomed". > > You could find my discussion with Lennart? - one of major design > problem of > symlinks is it can't handle duplicate devices with same IDs.? So if > you have > multiple disks with same identification - symlinks will be randomly > switching > between them -? so plain simple 'dd? if=diskA of=diskB' ruins > symlinks in your > system. udev has it's link priority concept for this reason. It may not be perfect but it works quite well, and the symlink handling has improved a lot lately. I am curious what your alternative concept is. Do you have a link? > [...] > > How could you get wrong UUID for already mounted (so likely also > opened) device?? Why do you say "wrong UUID" ? The UUID was correct all the time. I explained above what happened. Yes, the device was mounted (to other btrfs subvolumes) when the error occured. > What is happening with your device that existing udev rule is > generating > incorrect symlink ? > > That really needs close log evidence about states of your system. Believe me, I have scrutinized these logs very carefully. I've already told you the relevant parts of it.?But as you asked for it: https://www.dropbox.com/sh/ks946a8wjos97ji/AAAVtOK3Z3XpbQ7DFzE1ykaYa?dl=0 > > > It looks like there is some race in udev rules processing - just > > > somewhere else. > > I've personally had a part in fixing multiple race conditions both > > multipathd and udev.?Ben Marzinski and I recently worked on > > dropping > > the dependency of multipathd.service on udev-settle.service. That > > accellerates booting and is conceptually cleaner than before, but > > it > > reveals bugs in other parts of the system, like this one. > > And we need to find the fix for the bug - not just mask it out. > As technically udev is just one of many possible 'readers' of your > device. So > while you would mask bug for udev - other device users would still > get > misleading info about your device. The only thing that's misleading here is the fact that udev removes the device symlinks because the rules file don't execute the SYMLINK directives. All else is fine. "My device" is in a sane state. > > I would appreciate if you said what's actually wrong my patch. So > > far > > you haven't. You just speculated about errors in other parts of the > > > As said the automatic? 'skip'? of read of suspend device can't be the > right thing. But that is not what my patch does, as I already explained. The reading was skipped anyway, and this has been so since the file came into existence. Please look at the history of the file. All I am changing is to prevent the symlinks from being deleted prematurely. All I want to achieve here is fix a nasty multipath boot failure. I think we agree that the handling of dm devices by udev and systemd is not optimal. But my goal with this patch is not to save that generic problem. That's something you need to work out with Lennart and his co- workers. My estimate is that a proper solution would take a year or more of intensive work. > > Suspend is parallel&invisible operation and as such? any reader of > device is > meant to wait till device is resumed and finish it's reading. That's > the core > idea of whole DM usage -? all users of DM devices use it like normal > block > device.? So if you need to check state of DM device? (and you are not > a DM > maintaining tool? and blkid tool certainly is not such tool)? then > there is > something wrong in design? (aka we do not want to support 'skip' of > udev rules > if user is running? random stream? of 'dmsetup suspend & dmsetup > resume' commands) What's "wrong in design" here is that udev can't afford to wait for a device to be resumed. If the device is suspended, blkid can't be called, period. In this case we are in jeopardy, by definition. We have to make some assumption. We can assume that the by-uuid link is still valid, like my patch does, and risk that the assumption is wrong. Or simply drop the symlink, as the current code does, and break systemd's assumptions. Discuss this with Lennart, please. It means that systemd needs to change it's device handling fundamentally. But really, this goes much further than what I'm currently interested in. I just want to make sure these systems boot reliably. If you can't take this for generic dm, we'll have to find a way to do it in the multipath rules. This is where where we run IMPORT{db} anyway to avoid failures. It will be far less elegant because it means code duplication (13-dm-disk.rules is where this logic belongs), but well, I guess this is the price we have to pay if we can't agree. Regards Martin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-01-28 23:21 ` Martin Wilck @ 2022-01-29 20:05 ` Zdenek Kabelac -1 siblings, 0 replies; 32+ messages in thread From: Zdenek Kabelac @ 2022-01-29 20:05 UTC (permalink / raw) To: Martin Wilck, Zdenek Kabelac, David Teigland, Peter Rajnoha Cc: dm-devel, Heming Zhao, Franck Bui, lvm-devel Dne 29. 01. 22 v 0:21 Martin Wilck napsal(a): > On Fri, 2022-01-28 at 22:06 +0100, Zdenek Kabelac wrote: >> Dne 28. 01. 22 v 19:46 Martin Wilck napsal(a): >>> On Fri, 2022-01-28 at 18:47 +0100, Zdenek Kabelac wrote: >>>> Dne 28. 01. 22 v 17:02 Martin Wilck napsal(a): >>>>> On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: > Well it is at the point we need to know exactly which device in which >> state >> causes you problem. Then we need to see what is wrong in the whole >> process. > > This is how multipathd loads the multipath map before the error occurs: > > [ 127.352614] localhost multipathd[1059]: 3600a098000aad1e300000b4b5a275d45: reload [0 134217728 multipath 3 pg_init_retries 50 queue_if_no_path 1 alua 2 1 service-time 0 1 1 8:80 1 service-time 0 2 1 8:16 1 8:144 1] > >> All I'm saying here is - that the proposed patch is not fixing bug - >> but >> rather masking/minimizing window for error. > > AFAICS my patch eliminates the window for this error entirely. Ok now I see that there is already skip for DM_SUSPENDED and you patch likely only tries to preserve some existing state of links. I'll need to get in touch with Peter here. I guess the idea behind was to avoid read of device that will be resumed and will automatically get a new event - and suspened device itself cannot change - but that fact it's been loosing existing state was missed - I'm wondering why this was not seen as problem before. Regars Zdene -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-29 20:05 ` Zdenek Kabelac 0 siblings, 0 replies; 32+ messages in thread From: Zdenek Kabelac @ 2022-01-29 20:05 UTC (permalink / raw) To: lvm-devel Dne 29. 01. 22 v 0:21 Martin Wilck napsal(a): > On Fri, 2022-01-28 at 22:06 +0100, Zdenek Kabelac wrote: >> Dne 28. 01. 22 v 19:46 Martin Wilck napsal(a): >>> On Fri, 2022-01-28 at 18:47 +0100, Zdenek Kabelac wrote: >>>> Dne 28. 01. 22 v 17:02 Martin Wilck napsal(a): >>>>> On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote: > Well it is at the point we need to know exactly which device in which >> state >> causes you problem. Then we need to see what is wrong in the whole >> process. > > This is how multipathd loads the multipath map before the error occurs: > > [ 127.352614] localhost multipathd[1059]: 3600a098000aad1e300000b4b5a275d45: reload [0 134217728 multipath 3 pg_init_retries 50 queue_if_no_path 1 alua 2 1 service-time 0 1 1 8:80 1 service-time 0 2 1 8:16 1 8:144 1] > >> All I'm saying here is - that the proposed patch is not fixing bug - >> but >> rather masking/minimizing window for error. > > AFAICS my patch eliminates the window for this error entirely. Ok now I see that there is already skip for DM_SUSPENDED and you patch likely only tries to preserve some existing state of links. I'll need to get in touch with Peter here. I guess the idea behind was to avoid read of device that will be resumed and will automatically get a new event - and suspened device itself cannot change - but that fact it's been loosing existing state was missed - I'm wondering why this was not seen as problem before. Regars Zdene ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-01-29 20:05 ` Zdenek Kabelac @ 2022-01-29 20:46 ` Martin Wilck -1 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-01-29 20:46 UTC (permalink / raw) To: Zdenek Kabelac, Zdenek Kabelac, David Teigland, Peter Rajnoha Cc: dm-devel, Heming Zhao, Franck Bui, lvm-devel On Sat, 2022-01-29 at 21:05 +0100, Zdenek Kabelac wrote: > Dne 29. 01. 22 v 0:21 Martin Wilck napsal(a): > > > > > AFAICS my patch eliminates the window for this error entirely. > > > Ok now I see that there is already skip for DM_SUSPENDED > and you patch likely only tries to preserve some existing state of > links. > > I'll need to get in touch with Peter here. > > I guess the idea behind was to avoid read of device that will be > resumed and > will automatically get a new event - and suspened device itself > cannot change Thank you! > - but that fact it's been loosing existing state was missed - I'm > wondering > why this was not seen as problem before. One reason is that we're now starting multipathd earlier. This has a lot of advantages, but it reveals problems that were hidden behind the "After=systemd-udev-settle.service" dependency of mulltipathd before. Regards, Martin -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-29 20:46 ` Martin Wilck 0 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-01-29 20:46 UTC (permalink / raw) To: lvm-devel On Sat, 2022-01-29 at 21:05 +0100, Zdenek Kabelac wrote: > Dne 29. 01. 22 v 0:21 Martin Wilck napsal(a): > > > > > AFAICS my patch eliminates the window for this error entirely. > > > Ok now I see that there is already skip for DM_SUSPENDED > and you patch likely only tries to preserve some existing state of > links. > > I'll need to get in touch with Peter here. > > I guess the idea behind was to avoid read of device that will be > resumed and > will automatically get a new event - and suspened device itself > cannot change Thank you! > - but that fact it's been loosing existing state was missed - I'm > wondering > why this was not seen as problem before. One reason is that we're now starting multipathd earlier. This has a lot of advantages, but it reveals problems that were hidden behind the "After=systemd-udev-settle.service" dependency of mulltipathd before. Regards, Martin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-01-29 20:46 ` Martin Wilck @ 2022-01-31 13:33 ` Peter Rajnoha -1 siblings, 0 replies; 32+ messages in thread From: Peter Rajnoha @ 2022-01-31 13:33 UTC (permalink / raw) To: Martin Wilck Cc: Franck Bui, lvm-devel, dm-devel, David Teigland, Zdenek Kabelac, Zdenek Kabelac, Heming Zhao On Sat 29 Jan 2022 21:46, Martin Wilck wrote: > On Sat, 2022-01-29 at 21:05 +0100, Zdenek Kabelac wrote: > > Dne 29. 01. 22 v 0:21 Martin Wilck napsal(a): > > > > > > > > AFAICS my patch eliminates the window for this error entirely. > > > > > > Ok now I see that there is already skip for DM_SUSPENDED > > and you patch likely only tries to preserve some existing state of > > links. > > > > I'll need to get in touch with Peter here. > > > > I guess the idea behind was to avoid read of device that will be > > resumed and > > will automatically get a new event - and suspened device itself > > cannot change > > Thank you! > > > - but that fact it's been loosing existing state was missed - I'm > > wondering > > why this was not seen as problem before. > > One reason is that we're now starting multipathd earlier. This has a > lot of advantages, but it reveals problems that were hidden behind the > "After=systemd-udev-settle.service" dependency of mulltipathd before. Hi! (just discussed this with Zdenek too) The patch makes sense to me! We added all the DM_UDEV_PRIMARY_SOURCE_FLAG and related for exactly such cases where we need to take the existing values already scanned in previous event, main use-case being the trigger at boot. We just didn't cover the 13-dm-disk.rules with the same logic regarding the suspended state to keep the symlinks - I didn't think it would cause issues (because, usually, after suspend, we anticipate incoming resume where the device is scanned again). But yes, if temporarily losing the symlink causes issues, your patch solves that (Zdenek will push that upstream). (There's one more situation with checking the suspended state for the pvscan we have in 69-dm-lvm...rules, but that's lvm-specific, we'll patch that one...) Thanks! -- Peter -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-01-31 13:33 ` Peter Rajnoha 0 siblings, 0 replies; 32+ messages in thread From: Peter Rajnoha @ 2022-01-31 13:33 UTC (permalink / raw) To: lvm-devel On Sat 29 Jan 2022 21:46, Martin Wilck wrote: > On Sat, 2022-01-29 at 21:05 +0100, Zdenek Kabelac wrote: > > Dne 29. 01. 22 v 0:21 Martin Wilck napsal(a): > > > > > > > > AFAICS my patch eliminates the window for this error entirely. > > > > > > Ok now I see that there is already skip for DM_SUSPENDED > > and you patch likely only tries to preserve some existing state of > > links. > > > > I'll need to get in touch with Peter here. > > > > I guess the idea behind was to avoid read of device that will be > > resumed and > > will automatically get a new event - and suspened device itself > > cannot change > > Thank you! > > > - but that fact it's been loosing existing state was missed - I'm > > wondering > > why this was not seen as problem before. > > One reason is that we're now starting multipathd earlier. This has a > lot of advantages, but it reveals problems that were hidden behind the > "After=systemd-udev-settle.service" dependency of mulltipathd before. Hi! (just discussed this with Zdenek too) The patch makes sense to me! We added all the DM_UDEV_PRIMARY_SOURCE_FLAG and related for exactly such cases where we need to take the existing values already scanned in previous event, main use-case being the trigger at boot. We just didn't cover the 13-dm-disk.rules with the same logic regarding the suspended state to keep the symlinks - I didn't think it would cause issues (because, usually, after suspend, we anticipate incoming resume where the device is scanned again). But yes, if temporarily losing the symlink causes issues, your patch solves that (Zdenek will push that upstream). (There's one more situation with checking the suspended state for the pvscan we have in 69-dm-lvm...rules, but that's lvm-specific, we'll patch that one...) Thanks! -- Peter ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-01-31 13:33 ` Peter Rajnoha @ 2022-02-01 8:40 ` Martin Wilck -1 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-02-01 8:40 UTC (permalink / raw) To: Peter Rajnoha Cc: Franck Bui, lvm-devel, dm-devel, David Teigland, Zdenek Kabelac, Zdenek Kabelac, Heming Zhao On Mon, 2022-01-31 at 14:33 +0100, Peter Rajnoha wrote: > (just discussed this with Zdenek too) > > The patch makes sense to me! > > We added all the DM_UDEV_PRIMARY_SOURCE_FLAG and related for exactly > such cases where we need to take the existing values already scanned > in previous event, main use-case being the trigger at boot. We just > didn't cover the 13-dm-disk.rules with the same logic regarding the > suspended state to keep the symlinks - I didn't think it would cause > issues (because, usually, after suspend, we anticipate incoming > resume where the device is scanned again). > > But yes, if temporarily losing the symlink causes issues, your patch > solves that (Zdenek will push that upstream). Thank you very much! It occured to me that if we want to solve my use case with minimal risk, we could make the the case in which the symlinks are preserved conditional on ACTION=="add" (i.e. true coldplug events). Tell me if you'd prefer that, I can re-submit. Martin -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-02-01 8:40 ` Martin Wilck 0 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-02-01 8:40 UTC (permalink / raw) To: lvm-devel On Mon, 2022-01-31 at 14:33 +0100, Peter Rajnoha wrote: > (just discussed this with Zdenek too) > > The patch makes sense to me! > > We added all the DM_UDEV_PRIMARY_SOURCE_FLAG and related for exactly > such cases where we need to take the existing values already scanned > in previous event, main use-case being the trigger at boot. We just > didn't cover the 13-dm-disk.rules with the same logic regarding the > suspended state to keep the symlinks - I didn't think it would cause > issues (because, usually, after suspend, we anticipate incoming > resume where the device is scanned again). > > But yes, if temporarily losing the symlink causes issues, your patch > solves that (Zdenek will push that upstream). Thank you very much! It occured to me that if we want to solve my use case with minimal risk, we could make the the case in which the symlinks are preserved conditional on ACTION=="add" (i.e. true coldplug events). Tell me if you'd prefer that, I can re-submit. Martin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-02-01 8:40 ` Martin Wilck @ 2022-02-01 10:11 ` Zdenek Kabelac -1 siblings, 0 replies; 32+ messages in thread From: Zdenek Kabelac @ 2022-02-01 10:11 UTC (permalink / raw) To: Martin Wilck, Peter Rajnoha Cc: Franck Bui, lvm-devel, dm-devel, David Teigland, Zdenek Kabelac, Heming Zhao Dne 01. 02. 22 v 9:40 Martin Wilck napsal(a): > On Mon, 2022-01-31 at 14:33 +0100, Peter Rajnoha wrote: >> (just discussed this with Zdenek too) >> >> The patch makes sense to me! >> >> We added all the DM_UDEV_PRIMARY_SOURCE_FLAG and related for exactly >> such cases where we need to take the existing values already scanned >> in previous event, main use-case being the trigger at boot. We just >> didn't cover the 13-dm-disk.rules with the same logic regarding the >> suspended state to keep the symlinks - I didn't think it would cause >> issues (because, usually, after suspend, we anticipate incoming >> resume where the device is scanned again). >> >> But yes, if temporarily losing the symlink causes issues, your patch >> solves that (Zdenek will push that upstream). > > Thank you very much! It occured to me that if we want to solve my use > case with minimal risk, we could make the the case in which the > symlinks are preserved conditional on ACTION=="add" (i.e. true coldplug > events). Tell me if you'd prefer that, I can re-submit. The problem is handling of 'suspended' state in udev rules - as I've mentioned original no userland tool should mostly care about this. However since there are those things like udev 'trigger' and it would be kind of ugly if the 'left-over' suspended device would stop whole system operation it's most likely better ATM to have some kind of support for this. It's should be noted there is still 'race' risk of system freezing in the case the suspend happens just after sysfs test and before actual i.e. 'blkid' use. But let's assume the chance of having this to happen is pretty minimal, and usually tools doing suspend & resume will work correctly and the 'crash' will not likely happen in this particular moment - and suspended device just hanging there already for longer time. The missed issue to be solve is - that ALL rules have to rely on a single suspend check - otherwise we risk 'partial' processing which is the last thing we want to see (aka all or nothing). Your real problem was not the suspend on its own - which still may happen anytime (so i.e. just after the test whether device is suspended), but the bug was related to bad exit path cleaning udev db content in this case - which is clear bug. Next bug is that other rules have to be consistent and properly skip its processing if the 1st. rules decides its meant to be skipped. ATM your patch is already upstream and great thanks for pointing this out. Regards Zdenek -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-02-01 10:11 ` Zdenek Kabelac 0 siblings, 0 replies; 32+ messages in thread From: Zdenek Kabelac @ 2022-02-01 10:11 UTC (permalink / raw) To: lvm-devel Dne 01. 02. 22 v 9:40 Martin Wilck napsal(a): > On Mon, 2022-01-31 at 14:33 +0100, Peter Rajnoha wrote: >> (just discussed this with Zdenek too) >> >> The patch makes sense to me! >> >> We added all the DM_UDEV_PRIMARY_SOURCE_FLAG and related for exactly >> such cases where we need to take the existing values already scanned >> in previous event, main use-case being the trigger at boot. We just >> didn't cover the 13-dm-disk.rules with the same logic regarding the >> suspended state to keep the symlinks - I didn't think it would cause >> issues (because, usually, after suspend, we anticipate incoming >> resume where the device is scanned again). >> >> But yes, if temporarily losing the symlink causes issues, your patch >> solves that (Zdenek will push that upstream). > > Thank you very much! It occured to me that if we want to solve my use > case with minimal risk, we could make the the case in which the > symlinks are preserved conditional on ACTION=="add" (i.e. true coldplug > events). Tell me if you'd prefer that, I can re-submit. The problem is handling of 'suspended' state in udev rules - as I've mentioned original no userland tool should mostly care about this. However since there are those things like udev 'trigger' and it would be kind of ugly if the 'left-over' suspended device would stop whole system operation it's most likely better ATM to have some kind of support for this. It's should be noted there is still 'race' risk of system freezing in the case the suspend happens just after sysfs test and before actual i.e. 'blkid' use. But let's assume the chance of having this to happen is pretty minimal, and usually tools doing suspend & resume will work correctly and the 'crash' will not likely happen in this particular moment - and suspended device just hanging there already for longer time. The missed issue to be solve is - that ALL rules have to rely on a single suspend check - otherwise we risk 'partial' processing which is the last thing we want to see (aka all or nothing). Your real problem was not the suspend on its own - which still may happen anytime (so i.e. just after the test whether device is suspended), but the bug was related to bad exit path cleaning udev db content in this case - which is clear bug. Next bug is that other rules have to be consistent and properly skip its processing if the 1st. rules decides its meant to be skipped. ATM your patch is already upstream and great thanks for pointing this out. Regards Zdenek ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-02-01 10:11 ` Zdenek Kabelac @ 2022-02-01 10:55 ` Martin Wilck -1 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-02-01 10:55 UTC (permalink / raw) To: Zdenek Kabelac, Peter Rajnoha Cc: Franck Bui, lvm-devel, dm-devel, David Teigland, Zdenek Kabelac, Heming Zhao On Tue, 2022-02-01 at 11:11 +0100, Zdenek Kabelac wrote: > > > > > > But yes, if temporarily losing the symlink causes issues, your > > > patch > > > solves that (Zdenek will push that upstream). > > > > Thank you very much! It occured to me that if we want to solve my > > use > > case with minimal risk, we could make the the case in which the > > symlinks are preserved conditional on ACTION=="add" (i.e. true > > coldplug > > events). Tell me if you'd prefer that, I can re-submit. > > The problem is handling of 'suspended' state in udev rules - as I've > mentioned > original no userland tool should mostly care about this. > > However since there are those things like udev 'trigger' and it would > be kind > of ugly if the 'left-over' suspended device would stop whole system > operation > it's most likely better ATM to have some kind of support for this. > > It's should be noted there is still 'race' risk of system freezing in > the case > the suspend happens just after sysfs test and before actual i.e. > 'blkid' use. Right. Even if blkid was smart enough to check immediately before doing I/O on the device, there would still be a race window. It occured to me that it might be useful if IO on suspended DM devices failed with -EAGAIN when opened with a non-blocking file descriptor. But I haven't thought it through ... I suppose that would have other issues, it would be a breaking change anyway. The good thing is that most of the time, the devices are suspended only for a short period of time, so blkid will just hang for a few fractions of a second. The fact that udev skips calling blkid is only for the very rare remaining cases in which the suspend state persists for a long time. > The missed issue to be solve is - that ALL rules have to rely on a > single > suspend check - otherwise we risk 'partial' processing which is the > last > thing we want to see (aka all or nothing). Don't we have that already? Isn't that the check that sets the DM_SUSPENDED flag? > Your real problem was not the suspend on its own - which still may > happen > anytime (so i.e. just after the test whether device is suspended), > but the bug was related to bad exit path cleaning udev db content in > this case > - which is clear bug. Next bug is that other rules have to be > consistent and > properly skip its processing if the 1st. rules decides its meant to > be skipped. Part of the consistency problem is that we have a lot of related but not equivalent udev variables: DM_SUSPENDED DM_NOSCAN DM_UDEV_DISABLE_OTHER_RULES_FLAG DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG DM_UDEV_DISABLE_DISK_RULES_FLAG and then of course SYSTEMD_READY MPATH_DEVICE_READY For consumers of these variables (i.e. udev rules from other subsystems), it's not always obvious which one they should look at (just my €0.02). Regards, Martin -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-02-01 10:55 ` Martin Wilck 0 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-02-01 10:55 UTC (permalink / raw) To: lvm-devel On Tue, 2022-02-01 at 11:11 +0100, Zdenek Kabelac wrote: > > > > > > But yes, if temporarily losing the symlink causes issues, your > > > patch > > > solves that (Zdenek will push that upstream). > > > > Thank you very much! It occured to me that if we want to solve my > > use > > case with minimal risk, we could make the the case in which the > > symlinks are preserved conditional on ACTION=="add" (i.e. true > > coldplug > > events). Tell me if you'd prefer that, I can re-submit. > > The problem is handling of 'suspended' state in udev rules - as I've > mentioned > original no userland tool should mostly care about this. > > However since there are those things like udev 'trigger' and it would > be kind > of ugly if the 'left-over' suspended device would stop whole system > operation > it's most likely better ATM to have some kind of support for this. > > It's should be noted there is still 'race' risk of system freezing in > the case > the suspend happens just after sysfs test and before actual i.e. > 'blkid' use. Right. Even if blkid was smart enough to check immediately before doing I/O on the device, there would still be a race window. It occured to me that it might be useful if IO on suspended DM devices failed with -EAGAIN when opened with a non-blocking file descriptor. But I haven't thought it through ... I suppose that would have other issues, it would be a breaking change anyway. The good thing is that most of the time, the devices are suspended only for a short period of time, so blkid will just hang for a few fractions of a second. The fact that udev skips calling blkid is only for the very rare remaining cases in which the suspend state persists for a long time. > The missed issue to be solve is - that ALL rules have to rely on a > single > suspend check - otherwise we risk 'partial' processing? which is the > last > thing we want to see (aka all or nothing). Don't we have that already? Isn't that the check that sets the DM_SUSPENDED flag? > Your real problem was not the suspend on its own - which still may > happen > anytime (so i.e. just after the test whether device is suspended), > but the bug was related to bad exit path cleaning udev db content in > this case > - which is clear bug.? Next bug is that other rules have to be > consistent and > properly skip its processing if the 1st. rules decides its meant to > be skipped. Part of the consistency problem is that we have a lot of related but not equivalent udev variables: DM_SUSPENDED DM_NOSCAN DM_UDEV_DISABLE_OTHER_RULES_FLAG DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG DM_UDEV_DISABLE_DISK_RULES_FLAG and then of course SYSTEMD_READY MPATH_DEVICE_READY For consumers of these variables (i.e. udev rules from other subsystems), it's not always obvious which one they should look at (just my ?0.02). Regards, Martin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-02-01 8:40 ` Martin Wilck @ 2022-02-01 10:55 ` Peter Rajnoha -1 siblings, 0 replies; 32+ messages in thread From: Peter Rajnoha @ 2022-02-01 10:55 UTC (permalink / raw) To: Martin Wilck Cc: Franck Bui, lvm-devel, dm-devel, David Teigland, Zdenek Kabelac, Zdenek Kabelac, Heming Zhao On Tue 01 Feb 2022 09:40, Martin Wilck wrote: > On Mon, 2022-01-31 at 14:33 +0100, Peter Rajnoha wrote: > > (just discussed this with Zdenek too) > > > > The patch makes sense to me! > > > > We added all the DM_UDEV_PRIMARY_SOURCE_FLAG and related for exactly > > such cases where we need to take the existing values already scanned > > in previous event, main use-case being the trigger at boot. We just > > didn't cover the 13-dm-disk.rules with the same logic regarding the > > suspended state to keep the symlinks - I didn't think it would cause > > issues (because, usually, after suspend, we anticipate incoming > > resume where the device is scanned again). > > > > But yes, if temporarily losing the symlink causes issues, your patch > > solves that (Zdenek will push that upstream). > > Thank you very much! It occured to me that if we want to solve my use > case with minimal risk, we could make the the case in which the > symlinks are preserved conditional on ACTION=="add" (i.e. true coldplug > events). Tell me if you'd prefer that, I can re-submit. I'd keep it for both actions ("add" and "change") because: - we won't be creating special case where only "add" is processed in 13-dm-disk.rules, - there's also a chance that someone might call "udevadm trigger --action=change" which should also work. I've just been playing with this change a bit and noticed we forgot to "IMPORT{db}" the blkid values. I tried this with an fs on top of <dm_dev> so there should be ID_FS_UUID_END based on which the /dev/disk/by-uuid/<UUID> should be present: 1) "udevadm info --name=<dm_dev>" (ID_FS_UUID_ENC is there and included in DEVLINKS) 2) "dmsetup suspend <dm_dev>" 3) "echo add > /sys/block/<dm_dev>/uevent" (to simulate the trigger) 4) "udevadm info --name=<dm_dev>" (ID_FS_UUID_ENV should still be there and included in DEVLINKS) 5) "dmsetup resume <dm_dev>" (ID_FS_UUID_ENV + DEVLINKS still correct) Thing is, we only restore DM_* values in 10-dm.rules, but we need to do the same for blkid values. That would be a patch like this on top of yours: udev/13-dm-disk.rules.in | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/udev/13-dm-disk.rules.in b/udev/13-dm-disk.rules.in index 5cc08121e..9b1a0b562 100644 --- a/udev/13-dm-disk.rules.in +++ b/udev/13-dm-disk.rules.in @@ -17,12 +17,22 @@ ENV{DM_UDEV_DISABLE_DISK_RULES_FLAG}=="1", GOTO="dm_end" SYMLINK+="disk/by-id/dm-name-$env{DM_NAME}" ENV{DM_UUID}=="?*", SYMLINK+="disk/by-id/dm-uuid-$env{DM_UUID}" -ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_link" -ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_link" +ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_blkid_restore" +ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_blkid_restore" ENV{DM_SUSPENDED}=="1", GOTO="dm_end" ENV{DM_NOSCAN}=="1", GOTO="dm_watch" (BLKID_RULE) +GOTO="dm_link" + +LABEL="dm_blkid_restore" +IMPORT{db}="ID_FS_USAGE" +IMPORT{db}="ID_FS_UUID_ENC" +IMPORT{db}="ID_FS_LABEL_ENC" +IMPORT{db}="ID_PART_ENTRY_UUID" +IMPORT{db}="ID_PART_ENTRY_SCHEME" +IMPORT{db}="ID_PART_ENTRY_NAME" +IMPORT{db}="ID_PART_GPT_AUTO_ROOT" LABEL="dm_link" ENV{DM_UDEV_LOW_PRIORITY_FLAG}=="1", OPTIONS="link_priority=-100" -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-02-01 10:55 ` Peter Rajnoha 0 siblings, 0 replies; 32+ messages in thread From: Peter Rajnoha @ 2022-02-01 10:55 UTC (permalink / raw) To: lvm-devel On Tue 01 Feb 2022 09:40, Martin Wilck wrote: > On Mon, 2022-01-31 at 14:33 +0100, Peter Rajnoha wrote: > > (just discussed this with Zdenek too) > > > > The patch makes sense to me! > > > > We added all the DM_UDEV_PRIMARY_SOURCE_FLAG and related for exactly > > such cases where we need to take the existing values already scanned > > in previous event, main use-case being the trigger at boot. We just > > didn't cover the 13-dm-disk.rules with the same logic regarding the > > suspended state to keep the symlinks - I didn't think it would cause > > issues (because, usually, after suspend, we anticipate incoming > > resume where the device is scanned again). > > > > But yes, if temporarily losing the symlink causes issues, your patch > > solves that (Zdenek will push that upstream). > > Thank you very much! It occured to me that if we want to solve my use > case with minimal risk, we could make the the case in which the > symlinks are preserved conditional on ACTION=="add" (i.e. true coldplug > events). Tell me if you'd prefer that, I can re-submit. I'd keep it for both actions ("add" and "change") because: - we won't be creating special case where only "add" is processed in 13-dm-disk.rules, - there's also a chance that someone might call "udevadm trigger --action=change" which should also work. I've just been playing with this change a bit and noticed we forgot to "IMPORT{db}" the blkid values. I tried this with an fs on top of <dm_dev> so there should be ID_FS_UUID_END based on which the /dev/disk/by-uuid/<UUID> should be present: 1) "udevadm info --name=<dm_dev>" (ID_FS_UUID_ENC is there and included in DEVLINKS) 2) "dmsetup suspend <dm_dev>" 3) "echo add > /sys/block/<dm_dev>/uevent" (to simulate the trigger) 4) "udevadm info --name=<dm_dev>" (ID_FS_UUID_ENV should still be there and included in DEVLINKS) 5) "dmsetup resume <dm_dev>" (ID_FS_UUID_ENV + DEVLINKS still correct) Thing is, we only restore DM_* values in 10-dm.rules, but we need to do the same for blkid values. That would be a patch like this on top of yours: udev/13-dm-disk.rules.in | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/udev/13-dm-disk.rules.in b/udev/13-dm-disk.rules.in index 5cc08121e..9b1a0b562 100644 --- a/udev/13-dm-disk.rules.in +++ b/udev/13-dm-disk.rules.in @@ -17,12 +17,22 @@ ENV{DM_UDEV_DISABLE_DISK_RULES_FLAG}=="1", GOTO="dm_end" SYMLINK+="disk/by-id/dm-name-$env{DM_NAME}" ENV{DM_UUID}=="?*", SYMLINK+="disk/by-id/dm-uuid-$env{DM_UUID}" -ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_link" -ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_link" +ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_blkid_restore" +ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_blkid_restore" ENV{DM_SUSPENDED}=="1", GOTO="dm_end" ENV{DM_NOSCAN}=="1", GOTO="dm_watch" (BLKID_RULE) +GOTO="dm_link" + +LABEL="dm_blkid_restore" +IMPORT{db}="ID_FS_USAGE" +IMPORT{db}="ID_FS_UUID_ENC" +IMPORT{db}="ID_FS_LABEL_ENC" +IMPORT{db}="ID_PART_ENTRY_UUID" +IMPORT{db}="ID_PART_ENTRY_SCHEME" +IMPORT{db}="ID_PART_ENTRY_NAME" +IMPORT{db}="ID_PART_GPT_AUTO_ROOT" LABEL="dm_link" ENV{DM_UDEV_LOW_PRIORITY_FLAG}=="1", OPTIONS="link_priority=-100" ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state 2022-02-01 10:55 ` Peter Rajnoha @ 2022-02-01 11:11 ` Martin Wilck -1 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-02-01 11:11 UTC (permalink / raw) To: Peter Rajnoha Cc: Franck Bui, lvm-devel, dm-devel, David Teigland, Zdenek Kabelac, Zdenek Kabelac, Heming Zhao On Tue, 2022-02-01 at 11:55 +0100, Peter Rajnoha wrote: > > Thing is, we only restore DM_* values in 10-dm.rules, but we need to > do > the same for blkid values. That would be a patch like this on top of > yours: Right. For those devices I'm mainly concerned about, this is done in 11-dm-mpath.rules and 11-dm-parts.rules (both part of multipath-tools). https://github.com/opensvc/multipath-tools/blob/master/multipath/11-dm-mpath.rules https://github.com/opensvc/multipath-tools/blob/master/kpartx/dm-parts.rules If you want the same functionality for generic dm devices, you need these import statements in the generic code as well. I should have added the imports right away, I might have been causing less confusion. But I thought I'd cause less potential for regressions this way, as all the SYMLINK+=... rules first test whether the related properties are non-empty. > > udev/13-dm-disk.rules.in | 14 ++++++++++++-- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/udev/13-dm-disk.rules.in b/udev/13-dm-disk.rules.in > index 5cc08121e..9b1a0b562 100644 > --- a/udev/13-dm-disk.rules.in > +++ b/udev/13-dm-disk.rules.in > @@ -17,12 +17,22 @@ ENV{DM_UDEV_DISABLE_DISK_RULES_FLAG}=="1", > GOTO="dm_end" > SYMLINK+="disk/by-id/dm-name-$env{DM_NAME}" > ENV{DM_UUID}=="?*", SYMLINK+="disk/by-id/dm-uuid-$env{DM_UUID}" > > -ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", > GOTO="dm_link" > -ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", > GOTO="dm_link" > +ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", > GOTO="dm_blkid_restore" > +ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", > GOTO="dm_blkid_restore" > ENV{DM_SUSPENDED}=="1", GOTO="dm_end" > ENV{DM_NOSCAN}=="1", GOTO="dm_watch" > > (BLKID_RULE) > +GOTO="dm_link" > + > +LABEL="dm_blkid_restore" > +IMPORT{db}="ID_FS_USAGE" > +IMPORT{db}="ID_FS_UUID_ENC" > +IMPORT{db}="ID_FS_LABEL_ENC" > +IMPORT{db}="ID_PART_ENTRY_UUID" > +IMPORT{db}="ID_PART_ENTRY_SCHEME" > +IMPORT{db}="ID_PART_ENTRY_NAME" > +IMPORT{db}="ID_PART_GPT_AUTO_ROOT" This looks ok. The list of variables is a little different in the multipath rules files. Anyway, the only properties that matter are those for which we're going to create symlinks for. In general, the approach of doing this here is somewhat fragile - when udev's blkid changes, the list of determined / required device properties might change as well. It'd be optimal to move this logic into udev proper, into the generic rules that handle storage device naming (60-persistent-storage.rules). Then all we'd need to do in the dm rules would be to set a flag telling udev that blkid shouldn't be called because it might hang, and perhaps another one to tell that it's OK to import the properties from the db instead. Regards Martin -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] udev: create symlinks and watch even in suspended state @ 2022-02-01 11:11 ` Martin Wilck 0 siblings, 0 replies; 32+ messages in thread From: Martin Wilck @ 2022-02-01 11:11 UTC (permalink / raw) To: lvm-devel On Tue, 2022-02-01 at 11:55 +0100, Peter Rajnoha wrote: > > Thing is, we only restore DM_* values in 10-dm.rules, but we need to > do > the same for blkid values. That would be a patch like this on top of > yours: Right. For those devices I'm mainly concerned about, this is done in 11-dm-mpath.rules and 11-dm-parts.rules (both part of multipath-tools). https://github.com/opensvc/multipath-tools/blob/master/multipath/11-dm-mpath.rules https://github.com/opensvc/multipath-tools/blob/master/kpartx/dm-parts.rules If you want the same functionality for generic dm devices, you need these import statements in the generic code as well. I should have added the imports right away, I might have been causing less confusion. But I thought I'd cause less potential for regressions this way, as all the SYMLINK+=... rules first test whether the related properties are non-empty. > > ?udev/13-dm-disk.rules.in | 14 ++++++++++++-- > ?1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/udev/13-dm-disk.rules.in b/udev/13-dm-disk.rules.in > index 5cc08121e..9b1a0b562 100644 > --- a/udev/13-dm-disk.rules.in > +++ b/udev/13-dm-disk.rules.in > @@ -17,12 +17,22 @@ ENV{DM_UDEV_DISABLE_DISK_RULES_FLAG}=="1", > GOTO="dm_end" > ?SYMLINK+="disk/by-id/dm-name-$env{DM_NAME}" > ?ENV{DM_UUID}=="?*", SYMLINK+="disk/by-id/dm-uuid-$env{DM_UUID}" > ? > -ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", > GOTO="dm_link" > -ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", > GOTO="dm_link" > +ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", > GOTO="dm_blkid_restore" > +ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", > GOTO="dm_blkid_restore" > ?ENV{DM_SUSPENDED}=="1", GOTO="dm_end" > ?ENV{DM_NOSCAN}=="1", GOTO="dm_watch" > ? > ?(BLKID_RULE) > +GOTO="dm_link" > + > +LABEL="dm_blkid_restore" > +IMPORT{db}="ID_FS_USAGE" > +IMPORT{db}="ID_FS_UUID_ENC" > +IMPORT{db}="ID_FS_LABEL_ENC" > +IMPORT{db}="ID_PART_ENTRY_UUID" > +IMPORT{db}="ID_PART_ENTRY_SCHEME" > +IMPORT{db}="ID_PART_ENTRY_NAME" > +IMPORT{db}="ID_PART_GPT_AUTO_ROOT" This looks ok. The list of variables is a little different in the multipath rules files. Anyway, the only properties that matter are those for which we're going to create symlinks for. In general, the approach of doing this here is somewhat fragile - when udev's blkid changes, the list of determined / required device properties might change as well. It'd be optimal to move this logic into udev proper, into the generic rules that handle storage device naming (60-persistent-storage.rules). Then all we'd need to do in the dm rules would be to set a flag telling udev that blkid shouldn't be called because it might hang, and perhaps another one to tell that it's OK to import the properties from the db instead. Regards Martin ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2022-02-01 11:12 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-01-28 13:42 [dm-devel] [PATCH] udev: create symlinks and watch even in suspended state mwilck 2022-01-28 13:42 ` mwilck 2022-01-28 15:33 ` [dm-devel] " Zdenek Kabelac 2022-01-28 15:33 ` Zdenek Kabelac 2022-01-28 15:57 ` [dm-devel] " Martin Wilck 2022-01-28 15:57 ` Martin Wilck 2022-01-28 16:02 ` [dm-devel] " Martin Wilck 2022-01-28 16:02 ` Martin Wilck 2022-01-28 17:47 ` [dm-devel] " Zdenek Kabelac 2022-01-28 17:47 ` Zdenek Kabelac 2022-01-28 18:46 ` [dm-devel] " Martin Wilck 2022-01-28 18:46 ` Martin Wilck 2022-01-28 21:06 ` [dm-devel] " Zdenek Kabelac 2022-01-28 21:06 ` Zdenek Kabelac 2022-01-28 23:21 ` [dm-devel] " Martin Wilck 2022-01-28 23:21 ` Martin Wilck 2022-01-29 20:05 ` [dm-devel] " Zdenek Kabelac 2022-01-29 20:05 ` Zdenek Kabelac 2022-01-29 20:46 ` [dm-devel] " Martin Wilck 2022-01-29 20:46 ` Martin Wilck 2022-01-31 13:33 ` [dm-devel] " Peter Rajnoha 2022-01-31 13:33 ` Peter Rajnoha 2022-02-01 8:40 ` [dm-devel] " Martin Wilck 2022-02-01 8:40 ` Martin Wilck 2022-02-01 10:11 ` [dm-devel] " Zdenek Kabelac 2022-02-01 10:11 ` Zdenek Kabelac 2022-02-01 10:55 ` [dm-devel] " Martin Wilck 2022-02-01 10:55 ` Martin Wilck 2022-02-01 10:55 ` [dm-devel] " Peter Rajnoha 2022-02-01 10:55 ` Peter Rajnoha 2022-02-01 11:11 ` [dm-devel] " Martin Wilck 2022-02-01 11:11 ` Martin Wilck
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.