* Re: [systemd-devel] Errorneous detection of degraded array
  [not found] <96A26C8C6786C341B83BC4F2BC5419E4795DE9A6@SRF-EXCH1.corp.sunrisefutures.com>
@ 2017-01-27  7:12 ` Andrei Borzenkov
  2017-01-27  8:25   ` Martin Wilck
  ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Andrei Borzenkov @ 2017-01-27 7:12 UTC (permalink / raw)
To: Luke Pyzowski, 'systemd-devel@lists.freedesktop.org', linux-raid

26.01.2017 21:02, Luke Pyzowski writes:
> Hello,
> I have a large RAID6 device with 24 local drives on CentOS7.3. Randomly (around 50% of the time) systemd will unmount my RAID device thinking it is degraded after the mdadm-last-resort@.timer expires, however the device is working normally by all accounts, and I can immediately mount it manually upon boot completion. In the logs below /share is the RAID device. I can increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60 seconds, but this problem can randomly still occur.
>
> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
> systemd[1]: Starting Activate md array even though degraded...
> systemd[1]: Stopped target Local File Systems.
> systemd[1]: Stopping Local File Systems.
> systemd[1]: Unmounting /share...
> systemd[1]: Stopped (with error) /dev/md0.
> systemd[1]: Started Activate md array even though degraded.
> systemd[1]: Unmounted /share.
>
> When the system boots normally the following is in the logs:
> systemd[1]: Started Timer to wait for more drives before activating degraded array..
> systemd[1]: Starting Timer to wait for more drives before activating degraded array..
> ...
> systemd[1]: Stopped Timer to wait for more drives before activating degraded array..
> systemd[1]: Stopping Timer to wait for more drives before activating degraded array..
>
> The above occurs within the same second according to the timestamps, and the timer ends prior to mounting any local filesystems; it properly detects that the RAID is valid and everything continues normally. The other RAID device (a RAID1 of 2 disks containing swap and /) has never exhibited this failure.
>
> My question is: what are the conditions under which systemd detects the RAID6 as being degraded? It seems to be a race condition somewhere, but I am not sure what configuration should be modified, if any. If needed I can provide more verbose logs, just let me know if they might be useful.

It is not directly related to systemd. When a block device that is part of an MD array is detected by the kernel, a udev rule queries the array to see whether it is complete. If it is, the rule starts the array (subject to the general rules for which arrays are auto-started); if not, it (the udev rule) starts a timer to assemble the degraded array. See udev-md-raid-assembly.rules in the mdadm sources:

ACTION=="add|change", ENV{MD_STARTED}=="*unsafe*", ENV{MD_FOREIGN}=="no", ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"

So it looks like events for some array members either got lost or were delivered late.

Note that there was a discussion on the openSUSE list where arrays would not be auto-assembled on boot, even though triggering a device change *after* the initial boot would correctly run these rules. That situation was triggered by adding an extra disk to the system (i.e. boot with 3 disks worked; with 4 disks it did not). I could not find any hints even after enabling full udev and systemd debug logs. The logs are available if anyone wants to try it.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Errorneous detection of degraded array
  2017-01-27  7:12 ` [systemd-devel] Errorneous detection of degraded array Andrei Borzenkov
@ 2017-01-27  8:25   ` Martin Wilck
  2017-01-27 19:44   ` Luke Pyzowski
  2017-01-30  1:53   ` NeilBrown
  2 siblings, 0 replies; 12+ messages in thread
From: Martin Wilck @ 2017-01-27 8:25 UTC (permalink / raw)
To: Andrei Borzenkov, Luke Pyzowski, 'systemd-devel@lists.freedesktop.org', linux-raid

> 26.01.2017 21:02, Luke Pyzowski writes:
> > Hello,
> > I have a large RAID6 device with 24 local drives on CentOS7.3.
> > Randomly (around 50% of the time) systemd will unmount my RAID
> > device thinking it is degraded after the mdadm-last-resort@.timer
> > expires, however the device is working normally by all accounts,
> > and I can immediately mount it manually upon boot completion. In
> > the logs below /share is the RAID device. I can increase the timer
> > in /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60
> > seconds, but this problem can randomly still occur.

It seems to me that you rather need to decrease the timeout value, or (more reasonably) increase x-systemd.device-timeout for the /share mount point. Unfortunately your log excerpt contains no time stamps, but I suppose you're facing a race where the device times out before the "last resort" timer starts it (and before the last devices appear).

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread
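[Editor's note: an fstab entry implementing Martin's x-systemd.device-timeout suggestion might look like the following. This is only an illustration; the UUID is a placeholder for the reader's own filesystem UUID, and 120s is an arbitrary example value, not a recommendation from the thread.]

```text
# /etc/fstab -- illustrative entry only; <fs-uuid> is a placeholder.
# x-systemd.device-timeout controls how long systemd waits for the
# backing device before failing the mount unit.
UUID=<fs-uuid>  /share  ext4  defaults,x-systemd.device-timeout=120s  0  2
```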
* Re: Errorneous detection of degraded array
  2017-01-27  7:12 ` [systemd-devel] Errorneous detection of degraded array Andrei Borzenkov
  2017-01-27  8:25   ` Martin Wilck
@ 2017-01-27 19:44   ` Luke Pyzowski
  2017-01-28 17:34     ` [systemd-devel] " Andrei Borzenkov
  2017-01-30  1:53   ` NeilBrown
  2 siblings, 1 reply; 12+ messages in thread
From: Luke Pyzowski @ 2017-01-27 19:44 UTC (permalink / raw)
To: 'Andrei Borzenkov', 'systemd-devel@lists.freedesktop.org', linux-raid

I've modified a number of settings to try to resolve this, so far without success. I've created an explicit mount file for the RAID array, /etc/systemd/system/share.mount, and inside it I've experimented with TimeoutSec=. In /etc/systemd/system/mdadm-last-resort@.timer I've worked with OnActiveSec=. I've also tried (without an explicit mount file) adding x-systemd.device-timeout to /etc/fstab for the mount.

Here are a few more system logs showing perhaps more detail. I've edited them to show only relevant details; full pastebin of logs: http://pastebin.com/sL8nKt7j

These logs were generated with TimeoutSec=120 in /etc/systemd/system/share.mount (the description of the mount in the logs is "Mount /share RAID partition explicitly") and OnActiveSec=30 in /etc/systemd/system/mdadm-last-resort@.timer.

From blkid:
/dev/md0: UUID="2b9114be-3d5a-41d7-8d4b-e5047d223129" TYPE="ext4"

/dev/md0 is the /share partition. From /etc/mdadm.conf:
ARRAY /dev/md/0 metadata=1.2 UUID=97566d2f:ae7a169b:966f5840:3e8267f9 name=lnxnfs01:0

Boot begins at Jan 27 11:33:10. +4 seconds from boot:

Jan 27 11:33:14 lnxnfs01 systemd[1]: Found device /dev/disk/by-uuid/283669e9-f32c-498d-b848-c6f91738c959.
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdc operational as raid disk 2
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdx operational as raid disk 23
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdu operational as raid disk 20
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdt operational as raid disk 19
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdo operational as raid disk 14
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdn operational as raid disk 13
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdd operational as raid disk 3
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdv operational as raid disk 21
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sda operational as raid disk 0
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdf operational as raid disk 5
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdm operational as raid disk 12
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sde operational as raid disk 4
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdp operational as raid disk 15
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdi operational as raid disk 8
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdl operational as raid disk 11
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdk operational as raid disk 10
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sds operational as raid disk 18
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdb operational as raid disk 1
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdj operational as raid disk 9
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdg operational as raid disk 6
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdr operational as raid disk 17
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdh operational as raid disk 7
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdq operational as raid disk 16
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdw operational as raid disk 22
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: allocated 25534kB
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: raid level 6 active with 24 out of 24 devices, algorithm 2
Jan 27 11:33:14 lnxnfs01 kernel: RAID conf printout:
Jan 27 11:33:14 lnxnfs01 kernel:  --- level:6 rd:24 wd:24
Jan 27 11:33:14 lnxnfs01 kernel:  disk 0, o:1, dev:sda
Jan 27 11:33:14 lnxnfs01 kernel:  disk 1, o:1, dev:sdb
Jan 27 11:33:14 lnxnfs01 kernel:  disk 2, o:1, dev:sdc
Jan 27 11:33:14 lnxnfs01 kernel:  disk 3, o:1, dev:sdd
Jan 27 11:33:14 lnxnfs01 kernel:  disk 4, o:1, dev:sde
Jan 27 11:33:14 lnxnfs01 kernel:  disk 5, o:1, dev:sdf
Jan 27 11:33:14 lnxnfs01 kernel:  disk 6, o:1, dev:sdg
Jan 27 11:33:14 lnxnfs01 kernel:  disk 7, o:1, dev:sdh
Jan 27 11:33:14 lnxnfs01 kernel:  disk 8, o:1, dev:sdi
Jan 27 11:33:14 lnxnfs01 kernel:  disk 9, o:1, dev:sdj
Jan 27 11:33:14 lnxnfs01 kernel:  disk 10, o:1, dev:sdk
Jan 27 11:33:14 lnxnfs01 kernel:  disk 11, o:1, dev:sdl
Jan 27 11:33:14 lnxnfs01 kernel:  disk 12, o:1, dev:sdm
Jan 27 11:33:14 lnxnfs01 kernel:  disk 13, o:1, dev:sdn
Jan 27 11:33:14 lnxnfs01 kernel:  disk 14, o:1, dev:sdo
Jan 27 11:33:14 lnxnfs01 kernel:  disk 15, o:1, dev:sdp
Jan 27 11:33:14 lnxnfs01 kernel:  disk 16, o:1, dev:sdq
Jan 27 11:33:14 lnxnfs01 kernel:  disk 17, o:1, dev:sdr
Jan 27 11:33:14 lnxnfs01 kernel:  disk 18, o:1, dev:sds
Jan 27 11:33:14 lnxnfs01 kernel:  disk 19, o:1, dev:sdt
Jan 27 11:33:14 lnxnfs01 kernel:  disk 20, o:1, dev:sdu
Jan 27 11:33:14 lnxnfs01 kernel:  disk 21, o:1, dev:sdv
Jan 27 11:33:14 lnxnfs01 kernel:  disk 22, o:1, dev:sdw
Jan 27 11:33:14 lnxnfs01 kernel:  disk 23, o:1, dev:sdx
Jan 27 11:33:14 lnxnfs01 kernel: md0: detected capacity change from 0 to 45062020923392
Jan 27 11:33:14 lnxnfs01 systemd[1]: Found device /dev/disk/by-uuid/2b9114be-3d5a-41d7-8d4b-e5047d223129.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Started udev Wait for Complete Device Initialization.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Started Timer to wait for more drives before activating degraded array..
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting Timer to wait for more drives before activating degraded array..
Jan 27 11:33:14 lnxnfs01 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting system-lvm2\x2dpvscan.slice.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting LVM2 PV scan on device 9:127...
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting Activation of DM RAID sets...
Jan 27 11:33:14 lnxnfs01 systemd[1]: Mounting Mount /share RAID partition explicitly...
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting File System Check on /dev/disk/by-uuid/283669e9-f32c-498d-b848-c6f91738c959...
Jan 27 11:33:14 lnxnfs01 systemd[1]: Activating swap /dev/mapper/vg_root-swap...
Jan 27 11:33:14 lnxnfs01 lvm[1494]: 2 logical volume(s) in volume group "vg_root" now active
Jan 27 11:33:14 lnxnfs01 systemd[1]: Started LVM2 PV scan on device 9:127.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Activated swap /dev/mapper/vg_root-swap.
Jan 27 11:33:14 lnxnfs01 kernel: Adding 32763900k swap on /dev/mapper/vg_root-swap. Priority:-1 extents:1 across:32763900k FS
Jan 27 11:33:14 lnxnfs01 systemd-fsck[1499]: /dev/md126: clean, 345/64128 files, 47930/256240 blocks
Jan 27 11:33:14 lnxnfs01 systemd[1]: Reached target Swap.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting Swap.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Started File System Check on /dev/disk/by-uuid/283669e9-f32c-498d-b848-c6f91738c959.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Mounting /boot...
Jan 27 11:33:14 lnxnfs01 systemd[1]: Mounted /boot.
Jan 27 11:33:14 lnxnfs01 kernel: EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: (null)
Jan 27 11:33:21 lnxnfs01 systemd[1]: Started Activation of DM RAID sets.
Jan 27 11:33:21 lnxnfs01 systemd[1]: Reached target Encrypted Volumes.
Jan 27 11:33:21 lnxnfs01 systemd[1]: Starting Encrypted Volumes.
Jan 27 11:33:21 lnxnfs01 systemd[1]: Reached target Local File Systems.
Jan 27 11:33:21 lnxnfs01 systemd[1]: Starting Local File Systems.
...
Jan 27 11:33:21 lnxnfs01 kernel: EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
Jan 27 11:33:21 lnxnfs01 systemd[1]: Mounted Mount /share RAID partition explicitly.

... + 31 seconds from disk initialization, expiration of 30 second timer from mdadm-last-resort@.timer

Jan 27 11:33:45 lnxnfs01 systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopped target Local File Systems.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopping Local File Systems.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Unmounting Mount /share RAID partition explicitly...
Jan 27 11:33:45 lnxnfs01 systemd[1]: Starting Activate md array even though degraded...
Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopped (with error) /dev/md0.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Started Activate md array even though degraded.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Unmounted Mount /share RAID partition explicitly.

... + 121 seconds from disk initialization, expiration of 120 second timer from TimeoutSec=120 in /etc/systemd/system/share.mount

Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-md-0.device/stop timed out.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Timed out stoppping /dev/md/0.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-md-0.device/stop failed with result 'timeout'.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-disk-by\x2did-md\x2dname\x2dlnxnfs01:0.device/stop timed out.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Timed out stoppping /dev/disk/by-id/md-name-lnxnfs01:0.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-disk-by\x2did-md\x2dname\x2dlnxnfs01:0.device/stop failed with result 'timeout'.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job sys-devices-virtual-block-md0.device/stop timed out.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Timed out stoppping /sys/devices/virtual/block/md0.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job sys-devices-virtual-block-md0.device/stop failed with result 'timeout'.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-disk-by\x2did-md\x2duuid\x2d97566d2f:ae7a169b:966f5840:3e8267f9.device/stop timed out.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Timed out stoppping /dev/disk/by-id/md-uuid-97566d2f:ae7a169b:966f5840:3e8267f9.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-disk-by\x2did-md\x2duuid\x2d97566d2f:ae7a169b:966f5840:3e8267f9.device/stop failed with result 'timeout'.

In the logs above, the timer is started to wait for devices; it does not stop immediately, because it sees a degraded array. It continues for 30 seconds and then kicks out my mount, believing the RAID is problematic.

As an experiment, I've increased the values to OnActiveSec=300 in mdadm-last-resort@.timer and TimeoutSec=320. This gives me enough time to log in to the system. During that time I can view the RAID and everything appears proper, yet 300 seconds later the mdadm-last-resort@.timer expires with an error on /dev/md0?

Perhaps systemd is working normally, but then the question is why the RAID is not being assembled properly, which is what triggers /usr/lib/systemd/system/mdadm-last-resort@.service. From your suggestion, should I next move to full udev debug logs? Can I delay the assembly of the RAID by the kernel if this is a race condition, as it appears might be the case? Should that delay be early in the systemd startup process, or elsewhere?

Thanks again,
Luke Pyzowski

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [systemd-devel] Errorneous detection of degraded array
  2017-01-27 19:44 ` Luke Pyzowski
@ 2017-01-28 17:34   ` Andrei Borzenkov
  2017-01-30 22:41     ` Luke Pyzowski
  0 siblings, 1 reply; 12+ messages in thread
From: Andrei Borzenkov @ 2017-01-28 17:34 UTC (permalink / raw)
To: Luke Pyzowski, 'systemd-devel@lists.freedesktop.org', linux-raid

27.01.2017 22:44, Luke Pyzowski writes:
...
> Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: raid level 6 active with 24 out of 24 devices, algorithm 2
...
> Jan 27 11:33:14 lnxnfs01 kernel: md0: detected capacity change from 0 to 45062020923392
> Jan 27 11:33:14 lnxnfs01 systemd[1]: Found device /dev/disk/by-uuid/2b9114be-3d5a-41d7-8d4b-e5047d223129.
> Jan 27 11:33:14 lnxnfs01 systemd[1]: Started udev Wait for Complete Device Initialization.
> Jan 27 11:33:14 lnxnfs01 systemd[1]: Started Timer to wait for more drives before activating degraded array..
> Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting Timer to wait for more drives before activating degraded array..
...
>
> ... + 31 seconds from disk initialization, expiration of 30 second timer from mdadm-last-resort@.timer
>
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopped target Local File Systems.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopping Local File Systems.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Unmounting Mount /share RAID partition explicitly...
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Starting Activate md array even though degraded...
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopped (with error) /dev/md0.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Started Activate md array even though degraded.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Unmounted Mount /share RAID partition explicitly.
>

Here is my educated guess.
Both mdadm-last-resort@.timer and mdadm-last-resort@.service conflict with the MD device:

bor@bor-Latitude-E5450:~/src/systemd$ ls ../mdadm/systemd/
mdadm-grow-continue@.service  mdadm.shutdown     SUSE-mdadm_env.sh
mdadm-last-resort@.service    mdmonitor.service
mdadm-last-resort@.timer      mdmon@.service

bor@bor-Latitude-E5450:~/src/systemd$ cat ../mdadm/systemd/mdadm-last-resort@.timer
[Unit]
Description=Timer to wait for more drives before activating degraded array.
DefaultDependencies=no
Conflicts=sys-devices-virtual-block-%i.device

[Timer]
OnActiveSec=30

bor@bor-Latitude-E5450:~/src/systemd$ cat ../mdadm/systemd/mdadm-last-resort@.service
[Unit]
Description=Activate md array even though degraded
DefaultDependencies=no
Conflicts=sys-devices-virtual-block-%i.device

[Service]
Type=oneshot
ExecStart=BINDIR/mdadm --run /dev/%i

I presume the intention is to stop these units when the MD device is finally assembled as complete. This is indeed what happens on my (test) system:

Jan 28 14:18:04 linux-ffk5 kernel: md: bind<vda1>
Jan 28 14:18:04 linux-ffk5 kernel: md: bind<vdb1>
Jan 28 14:18:05 linux-ffk5 kernel: md/raid1:md0: active with 2 out of 2 mirrors
Jan 28 14:18:05 linux-ffk5 kernel: md0: detected capacity change from 0 to 5363466240
Jan 28 14:18:06 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Installed new job mdadm-last-resort@md0.timer/start as 287
Jan 28 14:18:06 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Enqueued job mdadm-last-resort@md0.timer/start as 287
Jan 28 14:18:06 linux-ffk5 systemd[1]: dev-ttyS9.device: Changed dead -> plugged
Jan 28 14:18:07 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Changed dead -> waiting
Jan 28 14:18:12 linux-ffk5 systemd[1]: sys-devices-virtual-block-md0.device: Changed dead -> plugged
Jan 28 14:18:12 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Trying to enqueue job mdadm-last-resort@md0.timer/stop/replace
Jan 28 14:18:12 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Installed new job mdadm-last-resort@md0.timer/stop as 292
Jan 28 14:18:12 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Enqueued job mdadm-last-resort@md0.timer/stop as 292
Jan 28 14:18:12 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Changed waiting -> dead
Jan 28 14:18:12 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Job mdadm-last-resort@md0.timer/stop finished, result=done
Jan 28 14:18:12 linux-ffk5 systemd[1]: Stopped Timer to wait for more drives before activating degraded array..
Jan 28 14:19:34 10 systemd[1692]: dev-vda1.device: Changed dead -> plugged
Jan 28 14:19:34 10 systemd[1692]: dev-vdb1.device: Changed dead -> plugged

On your system the timer apparently is not stopped when the md device appears, so when the last-resort service later runs, it attempts (due to the conflict) to stop the md device and, transitively, the mount on top of it.

Could you try a run with systemd.log_level=debug on the kernel command line and upload the journal again? We can only hope that it will not skew the timings too much; it may prove my hypothesis.

^ permalink raw reply	[flat|nested] 12+ messages in thread
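[Editor's note: Andrei's hypothesis is easiest to see in a toy model. The sketch below is not systemd's real job engine; the unit names, event names, and the cancellation rule are illustrative assumptions. It only demonstrates the claimed ordering effect: if the "array complete" event is processed before the last-resort timer is armed, nothing ever cancels the timer, and its later expiry tears down the device and, transitively, the mount.]

```python
# Toy model of the hypothesized race between the mdadm last-resort
# timer and the md device unit. Purely illustrative, not systemd code.

class Unit:
    def __init__(self, name):
        self.name = name
        self.active = False

def assemble(events):
    """Process boot events in order; return (device, mount, timer) states."""
    device = Unit("dev-md0.device")
    mount = Unit("share.mount")
    timer = Unit("mdadm-last-resort@md0.timer")
    for ev in events:
        if ev == "incomplete-member":
            # udev sees an incomplete array member and arms the timer.
            timer.active = True
        elif ev == "array-complete":
            device.active = True
            # Conflicts= cancels the timer only if it is already armed
            # when the device start job is processed.
            if timer.active:
                timer.active = False
        elif ev == "mount":
            if device.active:
                mount.active = True
        elif ev == "timer-expired":
            if timer.active:
                # Last-resort service runs; its Conflicts= with the
                # device stops the device and, transitively, the mount.
                device.active = False
                mount.active = False
    return device, mount, timer
```

With the "good" ordering the mount survives; with the late-udev-event ordering the timer survives instead and unmounts /share 30 seconds later, matching the logs in this thread.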
* RE: [systemd-devel] Errorneous detection of degraded array
  2017-01-28 17:34 ` [systemd-devel] " Andrei Borzenkov
@ 2017-01-30 22:41   ` Luke Pyzowski
  0 siblings, 0 replies; 12+ messages in thread
From: Luke Pyzowski @ 2017-01-30 22:41 UTC (permalink / raw)
To: 'Andrei Borzenkov', 'systemd-devel@lists.freedesktop.org', linux-raid

> Does
>     systemctl list-dependencies sys-devices-virtual-block-md0.device
> report anything interesting? I get
>
>     sys-devices-virtual-block-md0.device
>     ● └─mdmonitor.service

Nothing interesting, the same output as you have above.

> Could you try run with systemd.log_level=debug on kernel command line and upload journal again. We can only hope that it will not skew timings enough but it may prove my hypothesis.

I've uploaded the full debug logs to: https://gist.github.com/Kryai/8273322c8a61347e2300e476c70b4d05

In around 20 reboots, the error appeared only twice; with debug enabled it is certainly more rare, but it does still occur. So, per your correct guess, debug logging does affect the exhibition of the race condition.

Reminder of key things in the log:

# cat /etc/systemd/system/mdadm-last-resort@.timer
[Unit]
Description=Timer to wait for more drives before activating degraded array.
DefaultDependencies=no
Conflicts=sys-devices-virtual-block-%i.device

[Timer]
OnActiveSec=30

# cat /etc/systemd/system/share.mount
[Unit]
Description=Mount /share RAID partition explicitly
Before=nfs-server.service

[Mount]
What=/dev/disk/by-uuid/2b9114be-3d5a-41d7-8d4b-e5047d223129
Where=/share
Type=ext4
Options=defaults
TimeoutSec=120

[Install]
WantedBy=multi-user.target

Again, if any more information is needed please let me know and I'll provide it.

Many thanks,
Luke Pyzowski

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Errorneous detection of degraded array
  2017-01-27  7:12 ` [systemd-devel] Errorneous detection of degraded array Andrei Borzenkov
  2017-01-27  8:25   ` Martin Wilck
  2017-01-27 19:44   ` Luke Pyzowski
@ 2017-01-30  1:53   ` NeilBrown
  2017-01-30  3:40     ` Andrei Borzenkov
  2 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2017-01-30 1:53 UTC (permalink / raw)
To: Andrei Borzenkov, Luke Pyzowski, 'systemd-devel@lists.freedesktop.org', linux-raid

[-- Attachment #1.1: Type: text/plain, Size: 2045 bytes --]

On Fri, Jan 27 2017, Andrei Borzenkov wrote:

> 26.01.2017 21:02, Luke Pyzowski writes:
>> Hello,
>> I have a large RAID6 device with 24 local drives on CentOS7.3. Randomly (around 50% of the time) systemd will unmount my RAID device thinking it is degraded after the mdadm-last-resort@.timer expires, however the device is working normally by all accounts, and I can immediately mount it manually upon boot completion. In the logs below /share is the RAID device. I can increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60 seconds, but this problem can randomly still occur.
>>
>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>> systemd[1]: Starting Activate md array even though degraded...
>> systemd[1]: Stopped target Local File Systems.
>> systemd[1]: Stopping Local File Systems.
>> systemd[1]: Unmounting /share...
>> systemd[1]: Stopped (with error) /dev/md0.

This line perplexes me.

The last-resort.service (and .timer) files have a Conflicts= directive against sys-devices-virtual-block-md$DEV.device. Normally a Conflicts= directive means that if this service starts, that one is stopped, and if that one starts, this is stopped. However, .device units cannot be stopped:

$ systemctl show sys-devices-virtual-block-md0.device | grep Can
CanStart=no
CanStop=no
CanReload=no
CanIsolate=no

so presumably the attempt to stop the device fails, so the Conflicts= dependency cannot be met, so the last-resort service (or timer) doesn't get started. At least, that is what I see happening in my tests.

But your log doesn't mention sys-devices-virtual-block-md0; it mentions /dev/md0. How does systemd know about /dev/md0, or the connection it has with sys-devices-virtual-block-md0?

Does
    systemctl list-dependencies sys-devices-virtual-block-md0.device
report anything interesting? I get

    sys-devices-virtual-block-md0.device
    ● └─mdmonitor.service

NeilBrown

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 172 bytes --]

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Errorneous detection of degraded array
  2017-01-30  1:53 ` NeilBrown
@ 2017-01-30  3:40   ` Andrei Borzenkov
  2017-01-30  6:36     ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: Andrei Borzenkov @ 2017-01-30 3:40 UTC (permalink / raw)
To: NeilBrown, Luke Pyzowski, 'systemd-devel@lists.freedesktop.org', linux-raid

[-- Attachment #1.1.1: Type: text/plain, Size: 3159 bytes --]

30.01.2017 04:53, NeilBrown writes:
> On Fri, Jan 27 2017, Andrei Borzenkov wrote:
>
>> 26.01.2017 21:02, Luke Pyzowski writes:
>>> Hello,
>>> I have a large RAID6 device with 24 local drives on CentOS7.3. Randomly (around 50% of the time) systemd will unmount my RAID device thinking it is degraded after the mdadm-last-resort@.timer expires, however the device is working normally by all accounts, and I can immediately mount it manually upon boot completion. In the logs below /share is the RAID device. I can increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60 seconds, but this problem can randomly still occur.
>>>
>>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>>> systemd[1]: Starting Activate md array even though degraded...
>>> systemd[1]: Stopped target Local File Systems.
>>> systemd[1]: Stopping Local File Systems.
>>> systemd[1]: Unmounting /share...
>>> systemd[1]: Stopped (with error) /dev/md0.
>
> This line perplexes me.
>
> The last-resort.service (and .timer) files have a Conflicts= directive
> against sys-devices-virtual-block-md$DEV.device
> Normally a Conflicts= directive means that if this service starts, that
> one is stopped, and if that one starts, this is stopped.
> However .device units cannot be stopped:
>
> $ systemctl show sys-devices-virtual-block-md0.device | grep Can
> CanStart=no
> CanStop=no
> CanReload=no
> CanIsolate=no
>
> so presumably the attempt to stop the device fails, so the Conflicts=
> dependency cannot be met, so the last-resort service (or timer) doesn't
> get started.

As I explained in the other mail, it looks to me like the last-resort timer does get started, and then the last-resort service is started, which attempts to stop the device; because the mount point depends on the device, it also stops the mount point. So somehow we have bad timing where both the device and the timer start without canceling each other.

The fact that stopping the device itself fails is irrelevant here: dependencies are evaluated at the time the job is submitted, so if share.mount Requires dev-md0.device and you attempt to stop dev-md0.device, systemd still queues a job to stop share.mount.

> At least, that is what I see happening in my tests.
>

Yes, we have a race condition here; I cannot reproduce it either. That does not mean it does not exist :) Let's hope debug logging will show something more useful (it is entirely possible that with debug logging turned on this race does not happen).

> But your log doesn't mention sys-devices-virtual-block-md0, it
> mentions /dev/md0.
> How does systemd know about /dev/md0, or the connection it has with
> sys-devices-virtual-block-md0 ??
>

By virtue of the "Following" attribute: dev-md0.device is Following sys-devices-virtual-block-md0.device, so stopping the latter will also stop the former.

> Does
>     systemctl list-dependencies sys-devices-virtual-block-md0.device
>
> report anything interesting? I get
>
>     sys-devices-virtual-block-md0.device
>     ● └─mdmonitor.service
>

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 172 bytes --]

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Errorneous detection of degraded array 2017-01-30 3:40 ` Andrei Borzenkov @ 2017-01-30 6:36 ` NeilBrown 2017-01-30 7:29 ` Andrei Borzenkov 0 siblings, 1 reply; 12+ messages in thread From: NeilBrown @ 2017-01-30 6:36 UTC (permalink / raw) To: Andrei Borzenkov, Luke Pyzowski, 'systemd-devel@lists.freedesktop.org', linux-raid [-- Attachment #1.1: Type: text/plain, Size: 4630 bytes --] On Mon, Jan 30 2017, Andrei Borzenkov wrote: > 30.01.2017 04:53, NeilBrown пишет: >> On Fri, Jan 27 2017, Andrei Borzenkov wrote: >> >>> 26.01.2017 21:02, Luke Pyzowski пишет: >>>> Hello, >>>> I have a large RAID6 device with 24 local drives on CentOS7.3. Randomly (around 50% of the time) systemd will unmount my RAID device thinking it is degraded after the mdadm-last-resort@.timer expires, however the device is working normally by all accounts, and I can immediately mount it manually upon boot completion. In the logs below /share is the RAID device. I can increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60 seconds, but this problem can randomly still occur. >>>> >>>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice. >>>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice. >>>> systemd[1]: Starting Activate md array even though degraded... >>>> systemd[1]: Stopped target Local File Systems. >>>> systemd[1]: Stopping Local File Systems. >>>> systemd[1]: Unmounting /share... >>>> systemd[1]: Stopped (with error) /dev/md0. >> >> This line perplexes me. >> >> The last-resort.service (and .timer) files have a Conflict= directive >> against sys-devices-virtual-block-md$DEV.device >> Normally a Conflicts= directive means that if this service starts, that >> one is stopped, and if that one starts, this is stopped. 
>> However .device units cannot be stopped: >> >> $ systemctl show sys-devices-virtual-block-md0.device | grep Can >> CanStart=no >> CanStop=no >> CanReload=no >> CanIsolate=no >> >> so presumable the attempt to stop the device fails, so the Conflict= >> dependency cannot be met, so the last-resort service (or timer) doesn't >> get started. > > As I explained in other mail, to me it looks like last-resort timer does > get started, and then last-resort service is started which attempts to > stop device and because mount point depends on device it also stops > mount point. So somehow we have bad timing when both device and timer > start without canceling each other. > > The fact that stopping of device itself fails is irrelevant here - > dependencies are evaluated at the time job is submitted, so if > share.mount Requires dev-md0.device and you attempt to Stop > dev-md0.device, systemd still queues job to Stop share.mount. > >> At least, that is what I see happening in my tests. >> > > Yes, we have race condition here, I cannot reproduce this either. It > does not mean it does not exist :) Let's hope debug logging will show > something more useful (it is entirely possible that with debugging logs > turned on this race does not happen). > >> But your log doesn't mention sys-devices-virtual-block-md0, it >> mentions /dev/md0. >> How does systemd know about /dev/md0, or the connection it has with >> sys-devices-virtual-block-md0 ?? >> > > By virtue of "Following" attribute. dev-md0.device is Following > sys-devices-virtual-block-md0.device so stopping the latter will also > stop the former. Ahh.. I see why I never saw this now. Two reasons. 1/ My /etc/fstab has UUID=d1711227-c9fa-4883-a904-7cd7a3eb865c rather than /dev/md0 systemd doesn't manage to intuit a 'Following' dependency between the UUID and the mount point. 2/ I use partitions of md arrays: that UUID is actually /dev/md0p3. systemd doesn't intuit that md0p3.device is Following md0.device. 
So you only hit a problem if you have "/dev/md0" or similar in /etc/fstab.

The race is, I think, the one I mentioned. If the md device is started
before udev tells systemd to start the timer, the Conflicts dependencies
go the "wrong" way and stop the wrong thing.

It would be nice to be able to reliably stop the timer when the device
starts, without risking having the device get stopped when the timer
starts, but I don't think we can reliably do that.

Changing the
   Conflicts=sys-devices-virtual-block-%i.device
lines to
   ConditionPathExists=/sys/devices/virtual/block/%i
might make the problem go away, without any negative consequences.

The primary purpose of having the 'Conflicts' directives was so that
systemd wouldn't log
   Starting Activate md array even though degraded
after the array was successfully started. Hopefully it won't do that
when the Condition fails.

Thanks,
NeilBrown

>
>> Does
>>    systemctl list-dependencies sys-devices-virtual-block-md0.device
>>
>> report anything interesting? I get
>>
>> sys-devices-virtual-block-md0.device
>> ● └─mdmonitor.service
>>

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 172 bytes --]

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply [flat|nested] 12+ messages in thread
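[Editor's note: the two /etc/fstab styles Neil contrasts can be sketched as below. These are hypothetical entries - the UUID is the one from his mail, the filesystem type and options are placeholders.]

```
# Hits the race: systemd links /dev/md0 to sys-devices-virtual-block-md0.device
# via "Following", so a Conflicts-triggered stop of the device also unmounts /share.
/dev/md0                                   /share  xfs  defaults  0 0

# Avoids it: no "Following" dependency is intuited between the UUID
# (or a partition such as /dev/md0p3 behind it) and md0.device.
UUID=d1711227-c9fa-4883-a904-7cd7a3eb865c  /share  xfs  defaults  0 0
```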
* Re: Errorneous detection of degraded array
  2017-01-30  6:36               ` NeilBrown
@ 2017-01-30  7:29                 ` Andrei Borzenkov
  2017-01-30 22:19                   ` [systemd-devel] " NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: Andrei Borzenkov @ 2017-01-30  7:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, systemd-devel

On Mon, Jan 30, 2017 at 9:36 AM, NeilBrown <neilb@suse.com> wrote:
...
>>>>>
>>>>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>>>>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>>>>> systemd[1]: Starting Activate md array even though degraded...
>>>>> systemd[1]: Stopped target Local File Systems.
>>>>> systemd[1]: Stopping Local File Systems.
>>>>> systemd[1]: Unmounting /share...
>>>>> systemd[1]: Stopped (with error) /dev/md0.
>>>
...
>
> The race is, I think, the one I mentioned. If the md device is started
> before udev tells systemd to start the timer, the Conflicts dependencies
> go the "wrong" way and stop the wrong thing.
>

From the logs provided it is unclear whether it is the *timer* or the
*service*. If it is the timer - I do not understand why it is started
exactly 30 seconds after the device apparently appears. That would match
starting the service.

Yet another case where system logging is hopelessly unfriendly for
troubleshooting :(

> It would be nice to be able to reliably stop the timer when the device
> starts, without risking having the device get stopped when the timer
> starts, but I don't think we can reliably do that.
>

Well, let's wait until we can get some more information about what happens.

> Changing the
>    Conflicts=sys-devices-virtual-block-%i.device
> lines to
>    ConditionPathExists=/sys/devices/virtual/block/%i
> might make the problem go away, without any negative consequences.
>

Ugly, but yes, maybe this is the only way using current systemd.
> The primary purpose of having the 'Conflicts' directives was so that
> systemd wouldn't log
>    Starting Activate md array even though degraded
> after the array was successfully started.

This looks like a cosmetic problem. What will happen if the last-resort
service is started when the array is fully assembled? Will it do any harm?

> Hopefully it won't do that when the Condition fails.
>

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [systemd-devel] Errorneous detection of degraded array
  2017-01-30  7:29                 ` Andrei Borzenkov
@ 2017-01-30 22:19                   ` NeilBrown
  2017-01-31 20:17                     ` Andrei Borzenkov
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2017-01-30 22:19 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Luke Pyzowski, systemd-devel, linux-raid

[-- Attachment #1: Type: text/plain, Size: 3172 bytes --]

On Mon, Jan 30 2017, Andrei Borzenkov wrote:

> On Mon, Jan 30, 2017 at 9:36 AM, NeilBrown <neilb@suse.com> wrote:
> ...
>>>>>>
>>>>>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>>>>>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>>>>>> systemd[1]: Starting Activate md array even though degraded...
>>>>>> systemd[1]: Stopped target Local File Systems.
>>>>>> systemd[1]: Stopping Local File Systems.
>>>>>> systemd[1]: Unmounting /share...
>>>>>> systemd[1]: Stopped (with error) /dev/md0.
>>>>
> ...
>>
>> The race is, I think, the one I mentioned. If the md device is started
>> before udev tells systemd to start the timer, the Conflicts dependencies
>> go the "wrong" way and stop the wrong thing.
>>
>
> From the logs provided it is unclear whether it is the *timer* or the
> *service*. If it is the timer - I do not understand why it is started
> exactly 30 seconds after the device apparently appears. That would match
> starting the service.

My guess is that the timer is triggered immediately after the device is
started, but before it is mounted.
The Conflicts directive tries to stop the device, but it cannot stop the
device and there are no dependencies yet, so nothing happens.
After the timer fires (30 seconds later) the .service starts. It also
has a Conflicts directive, so systemd tries to stop the device again.
Now that it has been mounted, there is a dependency that can be
stopped, and the device gets unmounted.
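[Editor's note: the sequence described above involves the two units shipped in mdadm's systemd/ directory. A heavily abridged sketch follows - the OnActiveSec value matches the 30-second delay discussed in this thread, but the other details are reconstructed from memory and may differ from any given mdadm release.]

```
# mdadm-last-resort@.timer (abridged sketch)
[Unit]
Description=Timer to wait for more drives before activating degraded array.
DefaultDependencies=no
Conflicts=sys-devices-virtual-block-%i.device

[Timer]
OnActiveSec=30

# mdadm-last-resort@.service (abridged sketch)
[Unit]
Description=Activate md array even though degraded
DefaultDependencies=no
Conflicts=sys-devices-virtual-block-%i.device

[Service]
Type=oneshot
ExecStart=/sbin/mdadm --run /dev/%i
```

Both units carry the same Conflicts line, which is why the stop job can be queued twice: once (harmlessly) when the timer starts, and once (unmounting /share) when the service fires.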
>
> Yet another case where system logging is hopelessly unfriendly for
> troubleshooting :(
>
>> It would be nice to be able to reliably stop the timer when the device
>> starts, without risking having the device get stopped when the timer
>> starts, but I don't think we can reliably do that.
>>
>
> Well, let's wait until we can get some more information about what happens.
>
>> Changing the
>>    Conflicts=sys-devices-virtual-block-%i.device
>> lines to
>>    ConditionPathExists=/sys/devices/virtual/block/%i
>> might make the problem go away, without any negative consequences.
>>
>
> Ugly, but yes, maybe this is the only way using current systemd.
>
>> The primary purpose of having the 'Conflicts' directives was so that
>> systemd wouldn't log
>>    Starting Activate md array even though degraded
>> after the array was successfully started.
>
> This looks like a cosmetic problem. What will happen if the last-resort
> service is started when the array is fully assembled? Will it do any harm?

Yes, it could be seen as cosmetic, but cosmetic issues can be important
too. Confusing messages in logs can be harmful.

In all likely cases, running the last-resort service won't cause any
harm.
If, during the 30 seconds, the array is started, then deliberately
stopped, then partially assembled again, then when the last-resort
service finally starts it might do the wrong thing.
So it would be cleanest if the timer was killed as soon as the device
is started. But I don't think there is a practical concern.

I guess I could make a udev rule that fires when the array is started,
and that runs "systemctl stop mdadm-last-resort@md0.timer"

NeilBrown

>
>> Hopefully it won't do that when the Condition fails.
>>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply [flat|nested] 12+ messages in thread
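[Editor's note: a rule along the lines Neil suggests might look like the sketch below. This is hypothetical and is not the rule that was eventually shipped; the match conditions, in particular the TEST on md/sync_action, are assumptions.]

```
# Hypothetical udev rule sketch: once an md array goes active (its
# md/sync_action attribute appears in sysfs), stop the corresponding
# last-resort timer so it can no longer conflict with the device.
SUBSYSTEM=="block", KERNEL=="md*", ACTION=="change", TEST=="md/sync_action", \
  RUN+="/usr/bin/systemctl stop --no-block mdadm-last-resort@%k.timer"
```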
* Re: [systemd-devel] Errorneous detection of degraded array
  2017-01-30 22:19                   ` [systemd-devel] " NeilBrown
@ 2017-01-31 20:17                     ` Andrei Borzenkov
  2017-02-08  4:10                       ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: Andrei Borzenkov @ 2017-01-31 20:17 UTC (permalink / raw)
  To: NeilBrown; +Cc: Luke Pyzowski, systemd-devel, linux-raid

[-- Attachment #1.1: Type: text/plain, Size: 4222 bytes --]

31.01.2017 01:19, NeilBrown wrote:
> On Mon, Jan 30 2017, Andrei Borzenkov wrote:
>
>> On Mon, Jan 30, 2017 at 9:36 AM, NeilBrown <neilb@suse.com> wrote:
>> ...
>>>>>>>
>>>>>>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>>>>>>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>>>>>>> systemd[1]: Starting Activate md array even though degraded...
>>>>>>> systemd[1]: Stopped target Local File Systems.
>>>>>>> systemd[1]: Stopping Local File Systems.
>>>>>>> systemd[1]: Unmounting /share...
>>>>>>> systemd[1]: Stopped (with error) /dev/md0.
>>>>>
>> ...
>>>
>>> The race is, I think, the one I mentioned. If the md device is started
>>> before udev tells systemd to start the timer, the Conflicts dependencies
>>> go the "wrong" way and stop the wrong thing.
>>>
>>
>> From the logs provided it is unclear whether it is the *timer* or the
>> *service*. If it is the timer - I do not understand why it is started
>> exactly 30 seconds after the device apparently appears. That would match
>> starting the service.
>
> My guess is that the timer is triggered immediately after the device is
> started, but before it is mounted.
> The Conflicts directive tries to stop the device, but it cannot stop the
> device and there are no dependencies yet, so nothing happens.
> After the timer fires (30 seconds later) the .service starts. It also
> has a Conflicts directive, so systemd tries to stop the device again.
> Now that it has been mounted, there is a dependency that can be
> stopped, and the device gets unmounted.
>
>>
>> Yet another case where system logging is hopelessly unfriendly for
>> troubleshooting :(
>>
>>> It would be nice to be able to reliably stop the timer when the device
>>> starts, without risking having the device get stopped when the timer
>>> starts, but I don't think we can reliably do that.
>>>
>>
>> Well, let's wait until we can get some more information about what happens.
>>

Not much more, but we have at least confirmed that it was indeed the
last-resort service which was fired off by the last-resort timer.
Unfortunately there is no trace of the timer itself.

>>> Changing the
>>>    Conflicts=sys-devices-virtual-block-%i.device
>>> lines to
>>>    ConditionPathExists=/sys/devices/virtual/block/%i
>>> might make the problem go away, without any negative consequences.
>>>
>>
>> Ugly, but yes, maybe this is the only way using current systemd.
>>

This won't work. The sysfs node appears as soon as the very first array
member is found, while the array is still inactive; what we need is the
condition "array is active". The Conflicts line works because the array
is not announced to systemd (SYSTEMD_READY) until it is active, which in
turn is derived from the content of md/array_state.

>>> The primary purpose of having the 'Conflicts' directives was so that
>>> systemd wouldn't log
>>>    Starting Activate md array even though degraded
>>> after the array was successfully started.
>>

Yes, I understand that.

>> This looks like a cosmetic problem. What will happen if the last-resort
>> service is started when the array is fully assembled? Will it do any harm?
>
> Yes, it could be seen as cosmetic, but cosmetic issues can be important
> too. Confusing messages in logs can be harmful.
>
> In all likely cases, running the last-resort service won't cause any
> harm.
> If, during the 30 seconds, the array is started, then deliberately
> stopped, then partially assembled again, then when the last-resort
> service finally starts it might do the wrong thing.
> So it would be cleanest if the timer was killed as soon as the device
> is started. But I don't think there is a practical concern.
>
> I guess I could make a udev rule that fires when the array is started,
> and that runs "systemctl stop mdadm-last-resort@md0.timer"
>

Well ... what we really need is a unidirectional dependency. Actually
the way Conflicts is used *is* unidirectional anyway - nobody seriously
expects that starting foo.service will stop a currently running
shutdown.target. But that is the semantics we have currently.

But this will probably do to mitigate the issue until something more
generic can be implemented.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [systemd-devel] Errorneous detection of degraded array
  2017-01-31 20:17                     ` Andrei Borzenkov
@ 2017-02-08  4:10                       ` NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2017-02-08  4:10 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Luke Pyzowski, systemd-devel, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2681 bytes --]

On Tue, Jan 31 2017, Andrei Borzenkov wrote:
>
>>>> Changing the
>>>>    Conflicts=sys-devices-virtual-block-%i.device
>>>> lines to
>>>>    ConditionPathExists=/sys/devices/virtual/block/%i
>>>> might make the problem go away, without any negative consequences.
>>>>
>>>
>>> Ugly, but yes, maybe this is the only way using current systemd.
>>>
>
> This won't work. The sysfs node appears as soon as the very first array
> member is found, while the array is still inactive; what we need is the
> condition "array is active".

Of course, you are right.
A suitable "array is active" test is the existence of .../md/sync_action,
which appears when an array is activated (except for RAID0 and Linear,
which don't need last-resort support).

So this is what I propose to post upstream. Could you please confirm
that it works for you? It appears to work for me.

Thanks,
NeilBrown

From: NeilBrown <neilb@suse.com>
Date: Wed, 8 Feb 2017 15:01:05 +1100
Subject: [PATCH] systemd/mdadm-last-resort: use ConditionPathExists instead of Conflicts

Commit cec72c071bbe ("systemd/mdadm-last-resort: add Conflicts to .service file.")
added a 'Conflicts' directive to the mdadm-last-resort@.service file in
the hope that this would make sure the service didn't run after the device
was active, even if the timer managed to get started, which is possible in
race conditions.

This seemed to work in testing, but it isn't clear why, and it is known to
cause problems. If systemd happens to know that the mentioned device is a
dependency of a mount point, the Conflicts can unmount that mountpoint,
which is certainly not wanted.
So remove the "Conflicts" and instead use

  ConditionPathExists=!/sys/devices/virtual/block/%i/md/sync_action

The "sync_action" file exists for any array which requires last-resort
handling, and only appears when the array is activated. So it is safe to
rely on it to determine whether the last-resort handling is really needed.

Fixes: cec72c071bbe ("systemd/mdadm-last-resort: add Conflicts to .service file.")
Signed-off-by: NeilBrown <neilb@suse.com>
---
 systemd/mdadm-last-resort@.service | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/systemd/mdadm-last-resort@.service b/systemd/mdadm-last-resort@.service
index e93d72b2b45e..f9d4d12738a3 100644
--- a/systemd/mdadm-last-resort@.service
+++ b/systemd/mdadm-last-resort@.service
@@ -1,7 +1,7 @@
 [Unit]
 Description=Activate md array even though degraded
 DefaultDependencies=no
-Conflicts=sys-devices-virtual-block-%i.device
+ConditionPathExists=!/sys/devices/virtual/block/%i/md/sync_action
 
 [Service]
 Type=oneshot
-- 
2.11.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread
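[Editor's note: the sysfs layout the new condition relies on looks roughly like this on a running system - md0 is an example name, and the exact attribute set varies by kernel version and RAID level.]

```
/sys/devices/virtual/block/md0/
└── md/
    ├── array_state   # e.g. "inactive" while members are still arriving
    ├── level         # e.g. "raid6"
    └── sync_action   # absent for inactive arrays and for raid0/linear,
                      # so its presence is a usable "array is active" test
```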
end of thread, other threads:[~2017-02-08  4:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <96A26C8C6786C341B83BC4F2BC5419E4795DE9A6@SRF-EXCH1.corp.sunrisefutures.com>
2017-01-27  7:12 ` [systemd-devel] Errorneous detection of degraded array Andrei Borzenkov
2017-01-27  8:25   ` Martin Wilck
2017-01-27 19:44     ` Luke Pyzowski
2017-01-28 17:34       ` [systemd-devel] " Andrei Borzenkov
2017-01-30 22:41         ` Luke Pyzowski
2017-01-30  1:53   ` NeilBrown
2017-01-30  3:40     ` Andrei Borzenkov
2017-01-30  6:36       ` NeilBrown
2017-01-30  7:29         ` Andrei Borzenkov
2017-01-30 22:19           ` [systemd-devel] " NeilBrown
2017-01-31 20:17             ` Andrei Borzenkov
2017-02-08  4:10               ` NeilBrown