On Tue, 07 Oct 2014 09:05:43 +0200 Francis Moreau wrote:

> Hi Neil,
>
> On 09/30/2014 09:43 AM, Francis Moreau wrote:
> > Hi Neil,
> >
> > On 09/29/2014 11:56 PM, NeilBrown wrote:
> >> On Mon, 29 Sep 2014 10:45:17 +0200 Francis Moreau
> >> wrote:
> >>
> >>>> So what were pids 930 and 459?
> >>>> One was presumably the "mdadm -Ss" - probably 930.
> >>>> Is 459 the "mdadm --monitor" ?? That might be useful hint.
> >>>>
> >>>
> >>> yes.
> >>>
> >>> [456] is: /sbin/mdadm --monitor --scan --daemonise --syslog
> >>> --pid-file=/run/mdadm/mdadm.pid
> >>>
> >>> and [930] is 'mdamd -Ss'.
> >>
> >> Good. Please try the patch below.
> >>
> >
> > After applying your patch, this is what I'm getting in syslog:
> >
> > Sep 30 03:40:07 localhost kernel: md_open(): md125 opened by mdadm [970]
> > Sep 30 03:40:07 localhost kernel: md_release(): md125 released by mdadm
> > [970]
> > Sep 30 03:40:07 localhost kernel: md_open(): md125 opened by mdadm [972]
> > Sep 30 03:40:07 localhost kernel: md_open(): md125 opened by mdadm [970]
> > Sep 30 03:40:07 localhost kernel: md_release(): md125 released by mdadm
> > [972]
> > Sep 30 03:40:07 localhost kernel: md_open(): md125 opened by
> > systemd-udevd [971]
> > Sep 30 03:40:07 localhost systemd[1]: Cannot add dependency job for unit
> > mdmonitor-takeover.service, ignoring: Invalid argument
> > Sep 30 03:40:07 localhost systemd[1]: Started Software RAID monitoring
> > and management.
> > Sep 30 03:40:07 localhost kernel: md_release(): md125 released by
> > systemd-udevd [971]
> > Sep 30 03:40:08 localhost mdadm[466]: DeviceDisappeared event detected
> > on md device /dev/md125
> > Sep 30 03:40:08 localhost mdadm[466]: DeviceDisappeared event detected
> > on md device /dev/md126
> > Sep 30 03:40:08 localhost mdadm[466]: DeviceDisappeared event detected
> > on md device /dev/md127
> > Sep 30 03:40:08 localhost kernel: md125: detected capacity change from
> > 1863254016 to 0
> > Sep 30 03:40:08 localhost kernel: md: md125 stopped.
> > Sep 30 03:40:08 localhost kernel: md: unbind
> > Sep 30 03:40:08 localhost kernel: md: export_rdev(vdc3)
> > Sep 30 03:40:08 localhost kernel: md: unbind
> > Sep 30 03:40:08 localhost kernel: md: export_rdev(vdb3)
> > Sep 30 03:40:08 localhost kernel: md_release(): md125 released by mdadm
> > [970]
> > Sep 30 03:40:08 localhost kernel: md_open(): md127 opened by mdadm [466]
> > Sep 30 03:40:08 localhost kernel: md_release(): md127 released by mdadm
> > [466]
> > Sep 30 03:40:08 localhost kernel: md_open(): md126 opened by mdadm [466]
> > Sep 30 03:40:08 localhost kernel: md_release(): md126 released by mdadm
> > [466]
> > Sep 30 03:40:08 localhost kernel: md_open(): md126 opened by mdadm [970]
> > Sep 30 03:40:08 localhost kernel: md_release(): md126 released by mdadm
> > [970]
> > Sep 30 03:40:08 localhost kernel: md_open(): md126 opened by mdadm [970]
> > Sep 30 03:40:08 localhost kernel: md126: detected capacity change from
> > 67043328 to 0
> > Sep 30 03:40:08 localhost kernel: md: md126 stopped.
> > Sep 30 03:40:08 localhost kernel: md: unbind
> > Sep 30 03:40:08 localhost kernel: md: export_rdev(vdc1)
> > Sep 30 03:40:08 localhost kernel: md: unbind
> > Sep 30 03:40:08 localhost kernel: md: export_rdev(vdb1)
> > Sep 30 03:40:08 localhost kernel: md_open(): md127 opened by mdadm [466]
> > Sep 30 03:40:08 localhost kernel: md_release(): md127 released by mdadm
> > [466]
> > Sep 30 03:40:08 localhost kernel: md_release(): md126 released by mdadm
> > [970]
> > Sep 30 03:40:08 localhost kernel: md_open(): md127 opened by mdadm [970]
> > Sep 30 03:40:08 localhost kernel: md_release(): md127 released by mdadm
> > [970]
> > Sep 30 03:40:08 localhost kernel: md_open(): md127 opened by mdadm [970]
> > Sep 30 03:40:08 localhost kernel: md127: detected capacity change from
> > 214564864 to 0
> > Sep 30 03:40:08 localhost kernel: md: md127 stopped.
> > Sep 30 03:40:08 localhost kernel: md: unbind
> > Sep 30 03:40:08 localhost kernel: md: export_rdev(vdc2)
> > Sep 30 03:40:08 localhost kernel: md: unbind
> > Sep 30 03:40:08 localhost kernel: md: export_rdev(vdb2)
> > Sep 30 03:40:08 localhost kernel: md_release(): md127 released by mdadm
> > [970]
> >
> > The ghost device is no more present so your patch seems to have fixed my
> > issue. But I must admit I don't really understand what's going on :-/
> >
>
> Since those 'ghost' devices are expected from the MD implementation
> point of view, I'm wondering how am I supposed to detect them or maybe
> how an application is supposed to recognized online arrays.

If your application is looking in /proc/mdstat, then the "ghost" devices
will be either "inactive" or not present at all.

If your application is looking in /sys/block/md*, then the "ghost" devices
will have "clear" or "inactive" in /sys/block/mdXX/md/array_state.

If you use the new "CREATE names=yes" line in mdadm.conf (mdadm 3.3 or
later), and use kernel 3.17 or later, and use names rather than numbers to
identify your arrays (/dev/md/home, /dev/md_root), then the "ghost" problem
will be gone, and names in /proc/mdstat will be e.g. "md_home", or "md_root"
rather than "md4" or "md127".

>
> My application uses udev to detect et to get information about new
> devices. I don't think the information exported by udev is enough to
> figure this out. Also please note that since I rely on udev, I can't
> really read information on /sys since this information may be out of
> sync with the one returned by udev.

If udev reports that an array exists, then it really did exist when udev got
the message. By the time your program gets run by udev, it might not exist
any more. i.e. udev is always racy.

You should always treat any event from udev as a hint:
  "Something happened to this device in the recent past. Lots of other
   things might have happened since. The device might not exist any more,
   or it might have been replaced with a completely different device. So
   you might want to do something, or you might not, but whatever you do -
   be careful and don't blame me if things go wrong 'cause I'm just the
   messenger."

NeilBrown
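
As a rough illustration of the array_state check described above, here is a
minimal sketch. It is not part of mdadm or the kernel. The helper names
(read_array_state, is_online, online_arrays) and the sysfs_root parameter are
made up for this example, and Python is assumed only for illustration. It
treats "clear" and "inactive" as not online, and treats a missing
array_state file as "the device is gone".

```python
from pathlib import Path

# States that the reply above associates with "ghost" arrays.
GHOST_STATES = {"clear", "inactive"}


def read_array_state(name, sysfs_root="/sys/block"):
    """Return the contents of <sysfs_root>/<name>/md/array_state.

    Returns None if the file cannot be read, for example because the
    device disappeared between the udev event and this call.
    """
    path = Path(sysfs_root) / name / "md" / "array_state"
    try:
        return path.read_text().strip()
    except OSError:
        return None


def is_online(name, sysfs_root="/sys/block"):
    """True if the array exists right now and is not in a ghost state."""
    state = read_array_state(name, sysfs_root)
    return state is not None and state not in GHOST_STATES


def online_arrays(sysfs_root="/sys/block"):
    """List md devices under sysfs_root that currently look online."""
    root = Path(sysfs_root)
    # "md*" matches numbered names (md127) and named ones (md_home).
    return sorted(p.name for p in root.glob("md*") if is_online(p.name, sysfs_root))
```

A program started from a udev event could call is_online() on the reported
device name instead of trusting the event itself. The answer is still only a
snapshot: the array can change again right after the file is read, so the
caller has to cope with later operations on the device failing.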