* The dev node can't be released at once after stopping raid
       [not found] <51439640.15505639.1496113073965.JavaMail.zimbra@redhat.com>
@ 2017-06-01  3:47 ` Xiao Ni
  2017-06-01  4:43   ` Zhilong Liu
  0 siblings, 1 reply; 11+ messages in thread
From: Xiao Ni @ 2017-06-01  3:47 UTC (permalink / raw)
  To: linux-raid

Hi all

I tested with the latest stable Linux kernel and the latest mdadm.

After stopping a RAID device, the device node is not removed at once.
I ran a simple test with this script:

#!/bin/sh

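while true; do
    # Create a two-disk RAID1 array and stop it again after 5 seconds.
    mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1
    sleep 5
    mdadm -S /dev/md0
    # First check, immediately after the stop.
    ls /dev/md0
    sleep 1
    # Second check, one second later.
    ls /dev/md0
done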

mdadm: stopped /dev/md0
/dev/md0
ls: cannot access /dev/md0: No such file or directory

The first ls usually shows that /dev/md0 still exists right after the array
is stopped. I'm not sure whether this is a bug. Should mdadm do some extra
work to make sure the node is gone before "mdadm -S" returns?

Best Regards
Xiao


* Re: The dev node can't be released at once after stopping raid
  2017-06-01  3:47 ` The dev node can't be released at once after stopping raid Xiao Ni
@ 2017-06-01  4:43   ` Zhilong Liu
  2017-06-01  5:50     ` Xiao Ni
  0 siblings, 1 reply; 11+ messages in thread
From: Zhilong Liu @ 2017-06-01  4:43 UTC (permalink / raw)
  To: Xiao Ni, linux-raid



On 06/01/2017 11:47 AM, Xiao Ni wrote:
> Hi all
>
> I tried with the latest linux stable kernel and latest mdadm.
>
> After stopping a raid device, the dev node directory can't be released
> at once. I did a simple test, the script is:
>
> #!/bin/sh
>
> while [ 1 ]; do
> mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1
> sleep 5
> mdadm -S /dev/md0
> ls /dev/md0
> sleep 1
> ls /dev/md0
> done
>
> mdadm: stopped /dev/md0
> /dev/md0
> ls: cannot access /dev/md0: No such file or directory
>
> It usually detects dev node /dev/md0 isn't released after stopping raid.
> I'm not sure whether it's a bug or not. Do we need to do some job to
> make sure that the node should be released before command mdadm -S return?

It is waiting for the udev events to be processed. You can watch them with
"udevadm monitor".

For mdadm -S /dev/md0, Manage_stop() already checks errno and retries while
the device is busy.

Here is the relevant piece of code from Manage.c:
.. .. .. ..
done:

     /* As we have an O_EXCL open, any use of the device
      * which blocks STOP_ARRAY is probably a transient use,
      * so it is reasonable to retry for a while - 5 seconds.
      */
     count = 25; err = 0;
     while (count && fd >= 0 &&
            (err = ioctl(fd, STOP_ARRAY, NULL)) < 0 && errno == EBUSY) {
         usleep(200000);
         count --;
     }
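
The loop above is a bounded retry: 25 attempts spaced 200000 microseconds
(0.2 s) apart give the 5 seconds mentioned in the comment. A script that wants
the same pattern around another command could use something like the sketch
below. Note the assumptions: retry is a hypothetical helper name, it retries on
any failure rather than only on EBUSY, and it needs a sleep that accepts
fractional seconds (GNU coreutils and busybox do).

```shell
#!/bin/sh
# retry TRIES DELAY CMD [ARGS...]   (hypothetical helper, not part of mdadm)
# Run CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every try failed.
retry() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Same budget as the Manage.c loop: 25 tries x 0.2 s = 5 s.
# (my_command is a placeholder for whatever may be transiently busy.)
# retry 25 0.2 my_command arg
```

There is no need to wrap "mdadm -S" itself this way, since Manage_stop()
already retries internally.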

Best regards,
-Zhilong

* Re: The dev node can't be released at once after stopping raid
  2017-06-01  4:43   ` Zhilong Liu
@ 2017-06-01  5:50     ` Xiao Ni
  2017-08-31  3:55       ` Xiao Ni
  0 siblings, 1 reply; 11+ messages in thread
From: Xiao Ni @ 2017-06-01  5:50 UTC (permalink / raw)
  To: Zhilong Liu; +Cc: linux-raid



----- Original Message -----
> From: "Zhilong Liu" <zlliu@suse.com>
> To: "Xiao Ni" <xni@redhat.com>, linux-raid@vger.kernel.org
> Sent: Thursday, June 1, 2017 12:43:49 PM
> Subject: Re: The dev node can't be released at once after stopping raid
> 
> it's waiting for processing the udev events. we can monitor it via to "#
> udevadm monitor".
> 
> For mdadm -S /dev/md0, Manage_stop() has already did the errno checking,
> 
> cut piece of code from Manage.c
> .. .. .. ..
> done:
> 
>      /* As we have an O_EXCL open, any use of the device
>       * which blocks STOP_ARRAY is probably a transient use,
>       * so it is reasonable to retry for a while - 5 seconds.
>       */
>      count = 25; err = 0;
>      while (count && fd >= 0 &&
>             (err = ioctl(fd, STOP_ARRAY, NULL)) < 0 && errno == EBUSY) {
>          usleep(200000);
>          count --;
>      }

Hi Zhilong

Good suggestion. I tried it, and adding some code to the script to wait works.
Would it be better to check for the udev events in mdadm itself? We could check
after closing mdfd, when Manage_stop() returns, because this is mdadm's job, right?
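
Waiting in the script can be sketched roughly as below. wait_gone is a
hypothetical helper name; it only polls until the path disappears or a timeout
expires, and it assumes that udev eventually removes the node on its own.

```shell
#!/bin/sh
# wait_gone PATH [TIMEOUT_SECONDS]   (hypothetical helper)
# Poll once per second until PATH no longer exists.
# Returns 0 if PATH disappeared, 1 if it is still there after the timeout.
wait_gone() {
    path=$1
    left=${2:-5}
    while [ -e "$path" ] || [ -L "$path" ]; do
        [ "$left" -le 0 ] && return 1
        sleep 1
        left=$((left - 1))
    done
    return 0
}

# Intended use in the test loop (needs root and a real array):
#   mdadm -S /dev/md0 && wait_gone /dev/md0 5 && echo "node removed"
```

Polling the path avoids depending on when the kernel queues the remove event,
at the cost of up to one second of extra latency.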

Regards
Xiao

* Re: The dev node can't be released at once after stopping raid
  2017-06-01  5:50     ` Xiao Ni
@ 2017-08-31  3:55       ` Xiao Ni
  2017-08-31  4:36         ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: Xiao Ni @ 2017-08-31  3:55 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

Hi Neil

I searched the list archives and found many threads like this one, so sorry for
raising it again. But the situation I encountered looks different. There is a 1 second
window between stopping the RAID device and the deletion of the node /dev/md0. The
/dev/md0 node can be removed successfully after 1 second.

No process opens /dev/md0 after mdadm -S /dev/md0:

mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1 --assume-clean
dmesg:
[36416.860525] Opened by mdadm, pid is 3523
[36416.984160] md/raid1:md0: active with 2 out of 2 mirrors
[36416.984181] md0: detected capacity change from 0 to 523239424
[36416.984219] Released by mdadm, pid is 3523
[36416.984228] remove_and_add_spares
[36416.991588] Opened by mdadm, pid is 3541
[36416.997183] Released by mdadm, pid is 3541
[36417.001376] Opened by systemd-udevd, pid is 3525
[36417.007128] Released by systemd-udevd, pid is 3525

udev:
KERNEL[36419.830817] add      /devices/virtual/bdi/9:0 (bdi)
KERNEL[36419.831045] add      /devices/virtual/block/md0 (block)
UDEV  [36419.832911] add      /devices/virtual/bdi/9:0 (bdi)
UDEV  [36419.836380] add      /devices/virtual/block/md0 (block)
KERNEL[36419.877705] change   /devices/virtual/block/loop0 (block)
KERNEL[36419.878057] change   /devices/virtual/block/loop0 (block)
KERNEL[36419.926761] change   /devices/virtual/block/loop1 (block)
KERNEL[36419.927015] change   /devices/virtual/block/loop1 (block)
UDEV  [36419.953112] change   /devices/virtual/block/loop0 (block)
UDEV  [36419.953141] change   /devices/virtual/block/loop1 (block)
KERNEL[36419.954765] change   /devices/virtual/block/md0 (block)
UDEV  [36419.955973] change   /devices/virtual/block/loop0 (block)
UDEV  [36419.962799] change   /devices/virtual/block/loop1 (block)
UDEV  [36419.982934] change   /devices/virtual/block/md0 (block)

mdadm -S /dev/md0
dmesg:
[36493.068054] Opened by mdadm, pid is 3552
[36493.072051] Released by mdadm, pid is 3552
[36493.076123] Opened by mdadm, pid is 3552
[36493.080073] md0: detected capacity change from 523239424 to 0
[36493.080077] md: md0 stopped.
[36493.273011] Released by mdadm, pid is 3552
udev:
KERNEL[36496.300219] remove   /devices/virtual/bdi/9:0 (bdi)
KERNEL[36496.300335] remove   /devices/virtual/block/md0 (block)
UDEV  [36496.300736] remove   /devices/virtual/bdi/9:0 (bdi)
UDEV  [36496.301812] remove   /devices/virtual/block/md0 (block)

Only REMOVE events are generated by the command mdadm -S /dev/md0.

I created an LVM logical volume and removed it to check whether LVM has the same problem.

pvcreate /dev/md0 
vgcreate vg /dev/md0 
lvcreate -L 100M -n test vg
lvremove vg/test -y
ls /dev/mapper/vg-test
ls /dev/dm-3

The nodes /dev/mapper/vg-test and /dev/dm-3 are removed in time; there is no such
window. So it looks like a problem in md. Could you give me some suggestions?
What should I do next?

If it's not a bug, why is there a 1 second window?

Best Regards
Xiao


* Re: The dev node can't be released at once after stopping raid
  2017-08-31  3:55       ` Xiao Ni
@ 2017-08-31  4:36         ` NeilBrown
  2017-08-31  6:17           ` Xiao Ni
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2017-08-31  4:36 UTC (permalink / raw)
  To: Xiao Ni; +Cc: linux-raid


On Wed, Aug 30 2017, Xiao Ni wrote:

> Hi Neil
>
> I have searched in history emails and there have many topics like this. Sorry for talking
> about this again. But it looks like the situation I encountered is different. There is 1 second
> window between stop the raid device and delete the node /dev/md0. The /dev/md0 node can be
> removed successfully after 1 second. 

I think you are saying that /dev/md0 gets deleted 1 second after the
device is stopped.  I assume that is a delay in udev processing of
events.

When you say "can be"  I assume you mean "is being".
ie. if you say
   "The node can be removed after 1 second", it seems to imply that if
   you try to remove it earlier, the unlink() will fail.
If you say
  "The node is being removed after 1 seconds", that suggests that the
  removal happens automatically, but there is a delay between the device
  stopping and the removal happening.

>
> There is no process that open the /dev/md0 after mdadm -S /dev/md0: 
>
> mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1 --assume-clean
> dmesg:
> [36416.860525] Opened by mdadm, pid is 3523
> [36416.984160] md/raid1:md0: active with 2 out of 2 mirrors
> [36416.984181] md0: detected capacity change from 0 to 523239424
> [36416.984219] Released by mdadm, pid is 3523
> [36416.984228] remove_and_add_spares
> [36416.991588] Opened by mdadm, pid is 3541
> [36416.997183] Released by mdadm, pid is 3541
> [36417.001376] Opened by systemd-udevd, pid is 3525
> [36417.007128] Released by systemd-udevd, pid is 3525
>
> udev:
> KERNEL[36419.830817] add      /devices/virtual/bdi/9:0 (bdi)
> KERNEL[36419.831045] add      /devices/virtual/block/md0 (block)
> UDEV  [36419.832911] add      /devices/virtual/bdi/9:0 (bdi)
> UDEV  [36419.836380] add      /devices/virtual/block/md0 (block)
> KERNEL[36419.877705] change   /devices/virtual/block/loop0 (block)
> KERNEL[36419.878057] change   /devices/virtual/block/loop0 (block)
> KERNEL[36419.926761] change   /devices/virtual/block/loop1 (block)
> KERNEL[36419.927015] change   /devices/virtual/block/loop1 (block)
> UDEV  [36419.953112] change   /devices/virtual/block/loop0 (block)
> UDEV  [36419.953141] change   /devices/virtual/block/loop1 (block)
> KERNEL[36419.954765] change   /devices/virtual/block/md0 (block)
> UDEV  [36419.955973] change   /devices/virtual/block/loop0 (block)
> UDEV  [36419.962799] change   /devices/virtual/block/loop1 (block)
> UDEV  [36419.982934] change   /devices/virtual/block/md0 (block)
>
> mdadm -S /dev/md0
> dmesg:
> [36493.068054] Opened by mdadm, pid is 3552
> [36493.072051] Released by mdadm, pid is 3552
> [36493.076123] Opened by mdadm, pid is 3552
> [36493.080073] md0: detected capacity change from 523239424 to 0
> [36493.080077] md: md0 stopped.
> [36493.273011] Released by mdadm, pid is 3552
> udev:
> KERNEL[36496.300219] remove   /devices/virtual/bdi/9:0 (bdi)
> KERNEL[36496.300335] remove   /devices/virtual/block/md0 (block)
> UDEV  [36496.300736] remove   /devices/virtual/bdi/9:0 (bdi)
> UDEV  [36496.301812] remove   /devices/virtual/block/md0 (block)

I don't see any 1 second delay here.
I can see a 3 second delay between "Released by mdadm, pid = 3552" and
the UDEV remove event.  Is that what you are referring to?

>
> There are only REMOVE events during command mdadm -S /dev/md0.

The remove events seem to happen *after* "mdadm -S /dev/md0", or did
"mdadm -S /dev/md0" take 3 seconds to run?

>
> I tried to create a lvm and remove it to check whether lvm has this problem or not. 
>
> pvcreate /dev/md0 
> vgcreate vg /dev/md0 
> lvcreate -L 100M -n test vg
> lvremove vg/test -y
> ls /dev/mapper/vg-test
> ls /dev/dm-3
>
> The node /dev/mapper/vg-test and /dev/dm-3 can be removed in time. There is no time
> window. So it looks like it's a problem of md. Could you give some suggestions about
> this? What should I do next? 

Maybe lvremove explicitly unlinks the files in /dev, I don't know.

>
> If it's not a bug, why there is a 1 second window?

As I said, probably because udev is slow.
Why do you think this is a problem?  Why do you care about a 1 second
window?  If I don't know why this matters, I cannot help you.

NeilBrown


* Re: The dev node can't be released at once after stopping raid
  2017-08-31  4:36         ` NeilBrown
@ 2017-08-31  6:17           ` Xiao Ni
  2017-08-31  6:48             ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: Xiao Ni @ 2017-08-31  6:17 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid



----- Original Message -----
> From: "NeilBrown" <neilb@suse.com>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Thursday, August 31, 2017 12:36:08 PM
> Subject: Re: The dev node can't be released at once after stopping raid
> 
> On Wed, Aug 30 2017, Xiao Ni wrote:
> 
> > Hi Neil
> >
> > I have searched in history emails and there have many topics like this.
> > Sorry for talking
> > about this again. But it looks like the situation I encountered is
> > different. There is 1 second
> > window between stop the raid device and delete the node /dev/md0. The
> > /dev/md0 node can be
> > removed successfully after 1 second.
> 
> I think you are saying that /dev/md0 gets deleted 1 second after the
> device is stopped.  I assume that is a delay in udev processing of
> events.
> 
> When you say "can be"  I assume you mean "is being".
> ie. if you say
>    "The node can be removed after 1 second", it seems to imply that if
>    you try to remove it earlier, the unlink() will fail.
> If you say
>   "The node is being removed after 1 seconds", that suggests that the
>   removal happens automatically, but there is a delay between the device
>   stopping and the removal happening.

Yes, that is the situation. The node is being removed after 1 second.

> 
> >
> > There is no process that open the /dev/md0 after mdadm -S /dev/md0:
> >
> > mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1 --assume-clean
> > dmesg:
> > [36416.860525] Opened by mdadm, pid is 3523
> > [36416.984160] md/raid1:md0: active with 2 out of 2 mirrors
> > [36416.984181] md0: detected capacity change from 0 to 523239424
> > [36416.984219] Released by mdadm, pid is 3523
> > [36416.984228] remove_and_add_spares
> > [36416.991588] Opened by mdadm, pid is 3541
> > [36416.997183] Released by mdadm, pid is 3541
> > [36417.001376] Opened by systemd-udevd, pid is 3525
> > [36417.007128] Released by systemd-udevd, pid is 3525
> >
> > udev:
> > KERNEL[36419.830817] add      /devices/virtual/bdi/9:0 (bdi)
> > KERNEL[36419.831045] add      /devices/virtual/block/md0 (block)
> > UDEV  [36419.832911] add      /devices/virtual/bdi/9:0 (bdi)
> > UDEV  [36419.836380] add      /devices/virtual/block/md0 (block)
> > KERNEL[36419.877705] change   /devices/virtual/block/loop0 (block)
> > KERNEL[36419.878057] change   /devices/virtual/block/loop0 (block)
> > KERNEL[36419.926761] change   /devices/virtual/block/loop1 (block)
> > KERNEL[36419.927015] change   /devices/virtual/block/loop1 (block)
> > UDEV  [36419.953112] change   /devices/virtual/block/loop0 (block)
> > UDEV  [36419.953141] change   /devices/virtual/block/loop1 (block)
> > KERNEL[36419.954765] change   /devices/virtual/block/md0 (block)
> > UDEV  [36419.955973] change   /devices/virtual/block/loop0 (block)
> > UDEV  [36419.962799] change   /devices/virtual/block/loop1 (block)
> > UDEV  [36419.982934] change   /devices/virtual/block/md0 (block)
> >
> > mdadm -S /dev/md0
> > dmesg:
> > [36493.068054] Opened by mdadm, pid is 3552
> > [36493.072051] Released by mdadm, pid is 3552
> > [36493.076123] Opened by mdadm, pid is 3552
> > [36493.080073] md0: detected capacity change from 523239424 to 0
> > [36493.080077] md: md0 stopped.
> > [36493.273011] Released by mdadm, pid is 3552
> > udev:
> > KERNEL[36496.300219] remove   /devices/virtual/bdi/9:0 (bdi)
> > KERNEL[36496.300335] remove   /devices/virtual/block/md0 (block)
> > UDEV  [36496.300736] remove   /devices/virtual/bdi/9:0 (bdi)
> > UDEV  [36496.301812] remove   /devices/virtual/block/md0 (block)
> 
> I don't see any 1 second delay here.
> I can see a 3 second delay between "Released by mdadm, pid = 3552" and
> the UDEV remove event.  Is that what you are referring to?

Ah, how did you calculate 3 seconds? 36496 - 36493?

I ran the test again; the dmesg and udev output are:
dmesg
[ 2988.821730] Opened by mdadm, pid is 3174
[ 2988.825827] Released by mdadm, pid is 3174
[ 2988.830112] Opened by mdadm, pid is 3174
[ 2988.834200] md: md0 stopped.
[ 2988.834397] Released by mdadm, pid is 3174
udev
KERNEL[2989.150258] remove   /devices/virtual/bdi/9:0 (bdi)
KERNEL[2989.150334] remove   /devices/virtual/block/md0 (block)
UDEV  [2989.150491] remove   /devices/virtual/bdi/9:0 (bdi)
UDEV  [2989.151587] remove   /devices/virtual/block/md0 (block)
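
The gaps can be worked out directly from the timestamps. The sketch below
subtracts the last "Released by mdadm" dmesg line from the final UDEV remove
line for both runs, and does the same for the create-side "detected capacity
change" line and the KERNEL change event for md0. delay_between is a
hypothetical helper name, and the calculation assumes that dmesg and udevadm
monitor report times on the same clock, which may not hold exactly.

```shell
#!/bin/sh
# delay_between START END   (hypothetical helper)
# Print END - START in seconds with microsecond precision.
delay_between() {
    awk -v s="$1" -v e="$2" 'BEGIN { printf "%.6f\n", e - s }'
}

delay_between 36493.273011 36496.301812   # first run, stop:   prints 3.028801
delay_between 2988.834397 2989.151587     # second run, stop:  prints 0.317190
delay_between 36416.984181 36419.954765   # first run, create: prints 2.970584
```

The create-side lines in the first run show a gap of similar size to the
stop-side one, which suggests (but does not prove) that part of the 3 seconds
is an offset between the two clocks rather than real latency.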


The test script is:
[root@dell-per210-01 ~]# cat test.sh 
#!/bin/sh
mdadm -CR /dev/md0 -l1 -n2 /dev/loop0  /dev/loop1 --assume-clean
mdadm -S /dev/md0
ls /dev/md0
sleep 1
ls /dev/md0

The result is:
[root@dell-per210-01 ~]# sh test.sh 
...
mdadm: stopped /dev/md0
/dev/md0
ls: cannot access /dev/md0: No such file or directory

> 
> >
> > There are only REMOVE events during command mdadm -S /dev/md0.
> 
> The remove events seems to happen *after* "mdadm -S /dev/md0", or did
> "mdadm -S /dev/md0" take 3 seconds to run?
> 
> >
> > I tried to create a lvm and remove it to check whether lvm has this problem
> > or not.
> >
> > pvcreate /dev/md0
> > vgcreate vg /dev/md0
> > lvcreate -L 100M -n test vg
> > lvremove vg/test -y
> > ls /dev/mapper/vg-test
> > ls /dev/dm-3
> >
> > The node /dev/mapper/vg-test and /dev/dm-3 can be removed in time. There is
> > no time
> > window. So it looks like it's a problem of md. Could you give some
> > suggestions about
> > this? What should I do next?
> 
> Maybe lvremove explicitly unlinks the files in /dev, I don't know.

I did a test. mdadm already unlinks /run/mdadm/map.lock during mdadm -S. Can it
unlink the device node explicitly too? With this one-line change the problem goes away:

diff --git a/Manage.c b/Manage.c
index b82a729..04994b3 100644
--- a/Manage.c
+++ b/Manage.c
@@ -482,6 +482,7 @@ done:
        map_lock(&map);
        map_remove(&map, devnm);
        map_unlock(&map);
+       unlink(devname);
 out:
        sysfs_free(mdi);
 

> 
> >
> > If it's not a bug, why there is a 1 second window?
> 
> As I said, probably because udev is slow.
> Why do you think this is a problem?  Why do you care about 1 second
> window.  If I don't know how why this matters, I cannot help you.

There is a bug: https://bugzilla.redhat.com/show_bug.cgi?id=1444434
Another tool (blivet) stops the RAID device and finds that the device node
still exists. It then calls mdadm -S xxx again, which fails. So I asked myself
why /dev/mdxxx can't be removed immediately after mdadm -S.

In the thread "MD Remnants After --stop", you said the REMOVE events are
generated by "md_free() -> del_gendisk() ->  blk_unregister_queue()".
When mdadm -S returns, the REMOVE events should already have been generated,
right?

I have always had one question: who is responsible for removing the device
node under /dev? Is it the unlink() function?

> 
> NeilBrown
> 


* Re: The dev node can't be released at once after stopping raid
  2017-08-31  6:17           ` Xiao Ni
@ 2017-08-31  6:48             ` NeilBrown
  2017-08-31  7:16               ` Xiao Ni
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2017-08-31  6:48 UTC (permalink / raw)
  To: Xiao Ni; +Cc: linux-raid


On Thu, Aug 31 2017, Xiao Ni wrote:
>
> There is a bug https://bugzilla.redhat.com/show_bug.cgi?id=1444434. 
> Another tool(blivet) stops raid device and the device node still exists.
> Then it calls mdadm -S xxx again and it fails.

I would suggest that the tool is broken.  It should trust mdadm and not
double check that it actually worked.
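
A caller that follows this advice would decide success from mdadm's exit status
alone, without looking at /dev afterwards. A minimal sketch, where stop_array
and the MDADM variable are hypothetical names used only for illustration (this
is not taken from blivet):

```shell
#!/bin/sh
# Command used to stop arrays; overridable so the logic can be tried
# without touching a real array (hypothetical convention).
MDADM=${MDADM:-mdadm}

# stop_array DEVICE   (hypothetical helper)
# Stop the array and report success purely from the exit status.
# The device node is deliberately not inspected: udev removes it later.
stop_array() {
    dev=$1
    if "$MDADM" -S "$dev"; then
        return 0
    fi
    echo "failed to stop $dev" >&2
    return 1
}

# Intended use (needs root and a real array):
#   stop_array /dev/md0 && echo "array stopped"
```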

>                                                 So I ask myself why
> /dev/mdxxx can't be removed immediately after command mdadm -S. 

Because udev is asynchronous.  You cannot rely on things happening
instantly.  udev doesn't work that way.

mdadm has a function 'wait_for()' which waits for the device name to
appear when the array is started.  Possibly we could add something to
wait for udev to remove the device when the array is stopped, but I
really think it shouldn't be necessary.  Nothing should look at the name
after the device is stopped.

>
> In topic "MD Remnants After –stop", you said the REMOVE events are 
> generated by "md_free() -> del_gendisk() ->  blk_unregister_queue()".
> When mdadm -S return, the REMOVE events should be generated already,
> right?

Not necessarily. md_free is called from mddev_delayed_delete, which is
run on a work-queue, so might be delayed briefly.

>
> I always have a question. Who is responsible for removing the device
> node under /dev/ directory? The function unlink()?

udev is responsible for removing the device.  Obviously udev uses unlink()
to do this.

NeilBrown



* Re: The dev node can't be released at once after stopping raid
  2017-08-31  6:48             ` NeilBrown
@ 2017-08-31  7:16               ` Xiao Ni
  2017-08-31 23:39                 ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: Xiao Ni @ 2017-08-31  7:16 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid



----- Original Message -----
> From: "NeilBrown" <neilb@suse.com>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Thursday, August 31, 2017 2:48:02 PM
> Subject: Re: The dev node can't be released at once after stopping raid
> 
> On Thu, Aug 31 2017, Xiao Ni wrote:
> >
> > There is a bug https://bugzilla.redhat.com/show_bug.cgi?id=1444434.
> > Another tool(blivet) stops raid device and the device node still exists.
> > Then it calls mdadm -S xxx again and it fails.
> 
> I would suggest that the tool is broken.  It should trust mdadm and not
> double check that it actually worked.
> 
> >                                                 So I ask myself why
> > /dev/mdxxx can't be removed immediately after command mdadm -S.
> 
> Because udev is asynchronous.  You cannot rely on things happening
> instantly.  udev doesn't work that way.
> 
> mdadm has a function 'wait_for()' which waits for the device name to
> appear when the array is started.  Possibly we could add something to
> wait for udev to remove the device when the array is stopped, but I
> really think it shouldn't be necessary.  Nothing should look at the name
> after the device is stopped.

Hmm, I want to try it. There have been many threads about /dev/mdxx still
existing after the array is stopped. I want to put this topic to rest for good.

Can we remove /dev/mdxx directly? Something like this:
diff --git a/Manage.c b/Manage.c
index b82a729..04994b3 100644
--- a/Manage.c
+++ b/Manage.c
@@ -482,6 +482,7 @@ done:
        map_lock(&map);
        map_remove(&map, devnm);
        map_unlock(&map);
+       unlink(devname);
 out:
        sysfs_free(mdi);


Best Regards
Xiao

* Re: The dev node can't be released at once after stopping raid
  2017-08-31  7:16               ` Xiao Ni
@ 2017-08-31 23:39                 ` NeilBrown
  2017-09-01  0:30                   ` Xiao Ni
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2017-08-31 23:39 UTC (permalink / raw)
  To: Xiao Ni; +Cc: linux-raid


On Thu, Aug 31 2017, Xiao Ni wrote:

> ----- Original Message -----
>> From: "NeilBrown" <neilb@suse.com>
>> To: "Xiao Ni" <xni@redhat.com>
>> Cc: linux-raid@vger.kernel.org
>> Sent: Thursday, August 31, 2017 2:48:02 PM
>> Subject: Re: The dev node can't be released at once after stopping raid
>> 
>> On Thu, Aug 31 2017, Xiao Ni wrote:
>> >
>> > There is a bug https://bugzilla.redhat.com/show_bug.cgi?id=1444434.
>> > Another tool(blivet) stops raid device and the device node still exists.
>> > Then it calls mdadm -S xxx again and it fails.
>> 
>> I would suggest that the tool is broken.  It should trust mdadm and not
>> double check that it actually worked.
>> 
>> >                                                 So I ask myself why
>> > /dev/mdxxx can't be removed immediately after command mdadm -S.
>> 
>> Because udev is asynchronous.  You cannot rely on things happening
>> instantly.  udev doesn't work that way.
>> 
>> mdadm has a function 'wait_for()' which waits for the device name to
>> appear when the array is started.  Possibly we could add something to
>> wait for udev to remove the device when the array is stopped, but I
>> really think it shouldn't be necessary.  Nothing should look at the name
>> after the device is stopped.
>
> Hmm I want to try it. There are many topics about /dev/mdxx exists after
> stop array before. I want to stop this topic forever. 
>
> Can we remove /dev/mdxx directly? Something like this:
> diff --git a/Manage.c b/Manage.c
> index b82a729..04994b3 100644
> --- a/Manage.c
> +++ b/Manage.c
> @@ -482,6 +482,7 @@ done:
>         map_lock(&map);
>         map_remove(&map, devnm);
>         map_unlock(&map);
> +       unlink(devname);

udev will create the device and multiple links.  You are just removing
the device.  Someone might come along and complain about the links.

I don't like this change.
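
To see which names would be left behind, one can list the symlinks that resolve
to the node. The sketch below uses a hypothetical helper, links_to, and assumes
a readlink that supports -f (GNU coreutils and busybox do). Which links exist
depends on the udev rules installed on the system.

```shell
#!/bin/sh
# links_to DIR TARGET   (hypothetical helper)
# Print every symlink under DIR that resolves to TARGET.
links_to() {
    dir=$1
    target=$(readlink -f "$2") || return 1
    find "$dir" -type l 2>/dev/null | while IFS= read -r link; do
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
    return 0
}

# Intended use while the array is still running:
#   links_to /dev /dev/md0
```

With the usual md udev rules this typically prints entries under /dev/md/ and
/dev/disk/by-id/. After unlink(devname) alone, those entries would still exist
and would dangle until udev processed the remove event.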

NeilBrown

* Re: The dev node can't be released at once after stopping raid
  2017-08-31 23:39                 ` NeilBrown
@ 2017-09-01  0:30                   ` Xiao Ni
  2017-09-01  4:34                     ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: Xiao Ni @ 2017-09-01  0:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid



----- Original Message -----
> From: "NeilBrown" <neilb@suse.com>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Friday, September 1, 2017 7:39:21 AM
> Subject: Re: The dev node can't be released at once after stopping raid
> 
> On Thu, Aug 31 2017, Xiao Ni wrote:
> 
> > ----- Original Message -----
> >> From: "NeilBrown" <neilb@suse.com>
> >> To: "Xiao Ni" <xni@redhat.com>
> >> Cc: linux-raid@vger.kernel.org
> >> Sent: Thursday, August 31, 2017 2:48:02 PM
> >> Subject: Re: The dev node can't be released at once after stopping raid
> >> 
> >> On Thu, Aug 31 2017, Xiao Ni wrote:
> >> >
> >> > There is a bug https://bugzilla.redhat.com/show_bug.cgi?id=1444434.
> >> > Another tool (blivet) stops the raid device, but the device node still exists.
> >> > It then calls mdadm -S xxx again, and that fails.
> >> 
> >> I would suggest that the tool is broken.  It should trust mdadm and not
> >> double check that it actually worked.
> >> 
> >> >                                                 So I ask myself why
> >> > /dev/mdxxx can't be removed immediately after command mdadm -S.
> >> 
> >> Because udev is asynchronous.  You cannot rely on things happening
> >> instantly.  udev doesn't work that way.
> >> 
> >> mdadm has a function 'wait_for()' which waits for the device name to
> >> appear when the array is started.  Possibly we could add something to
> >> wait for udev to remove the device when the array is stopped, but I
> >> really think it shouldn't be necessary.  Nothing should look at the name
> >> after the device is stopped.
> >
> > Hmm, I want to try it. There have been many threads about /dev/mdxx still
> > existing after the array is stopped. I want to settle this topic for good.
> >
> > Can we remove /dev/mdxx directly? Something like this:
> > diff --git a/Manage.c b/Manage.c
> > index b82a729..04994b3 100644
> > --- a/Manage.c
> > +++ b/Manage.c
> > @@ -482,6 +482,7 @@ done:
> >         map_lock(&map);
> >         map_remove(&map, devnm);
> >         map_unlock(&map);
> > +       unlink(devname);
> 
> udev will create the device and multiple links.  You are just removing
> the device.  Someone might come along and complain about the links.

Sorry, could you explain "udev will create the device and multiple links" in more detail?
Do you mean that unlink() can cause udev to re-create the device and multiple links? Or
do you mean that mdadm should also unlink the other links, not only the device?

Regards
Xiao
> 
> I don't like this change.
> 
> NeilBrown
> 
> 
> >  out:
> >         sysfs_free(mdi);
> >
> >
> > Best Regards
> > Xiao
> >
> >> 
> >> >
> >> > In the thread "MD Remnants After --stop", you said the REMOVE events are
> >> > generated by "md_free() -> del_gendisk() ->  blk_unregister_queue()".
> >> > When mdadm -S returns, the REMOVE events should already have been
> >> > generated, right?
> >> 
> >> Not necessarily. md_free is called from mddev_delayed_delete, which is
> >> run on a work-queue, so might be delayed briefly.
> >> 
> >> >
> >> > I always have a question. Who is responsible for removing the device
> >> > node under /dev/ directory? The function unlink()?
> >> 
> >> udev is responsible for removing the device.  Obviously udev uses unlink()
> >> to do this.
> >> 
> >> NeilBrown
> >> 
> >> 
> 


* Re: The dev node can't be released at once after stopping raid
  2017-09-01  0:30                   ` Xiao Ni
@ 2017-09-01  4:34                     ` NeilBrown
  0 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2017-09-01  4:34 UTC (permalink / raw)
  To: Xiao Ni; +Cc: linux-raid

On Thu, Aug 31 2017, Xiao Ni wrote:

>> 
>> udev will create the device and multiple links.  You are just removing
>> the device.  Someone might come along and complain about the links.
>
> Sorry, could you explain "udev will create the device and multiple links" in more detail?
> Do you mean that unlink() can cause udev to re-create the device and multiple links? Or
> do you mean that mdadm should also unlink the other links, not only the device?

If you have an md array assembled, run
  udevadm info /dev/mdWHATEVER | grep DEVLINKS

e.g.

$ udevadm info /dev/md0 | grep DEVLINKS
E: DEVLINKS=/dev/disk/by-id/md-uuid-4812bff9:24c9ef72:14a75d6a:bbcc0774 /dev/md/0 /dev/disk/by-id/md-name-any:0

Note that there are multiple names listed.
When the array appears, udev creates the device (/dev/md0 in this case)
and creates all the links pointing to the device.
When the array disappears, udev removes the device and the links.
You want to get mdadm to remove the device, but not the links.
That is, at best, half the job.

NeilBrown
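
To make the point about multiple names concrete, the DEVLINKS value can be split into one link per line. This is a minimal sketch: list_devlinks is a hypothetical helper name, and the sample string is the output quoted above, so no real array is needed to try it.

```shell
#!/bin/sh
# Hypothetical helper: read `udevadm info <dev>` output on stdin and
# print each udev-managed symlink for the device on its own line.
list_devlinks() {
    sed -n 's/^E: DEVLINKS=//p' | tr ' ' '\n' | sed '/^$/d'
}

# Sample taken from the message above; on a live system one would run
#   udevadm info /dev/md0 | list_devlinks
sample='E: DEVLINKS=/dev/disk/by-id/md-uuid-4812bff9:24c9ef72:14a75d6a:bbcc0774 /dev/md/0 /dev/disk/by-id/md-name-any:0'
printf '%s\n' "$sample" | list_devlinks
```

Each path printed is a name that a bare unlink(devname) in mdadm would leave behind, which is why the removal is left to udev.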


end of thread, other threads:[~2017-09-01  4:34 UTC | newest]

Thread overview: 11+ messages
     [not found] <51439640.15505639.1496113073965.JavaMail.zimbra@redhat.com>
2017-06-01  3:47 ` The dev node can't be released at once after stopping raid Xiao Ni
2017-06-01  4:43   ` Zhilong Liu
2017-06-01  5:50     ` Xiao Ni
2017-08-31  3:55       ` Xiao Ni
2017-08-31  4:36         ` NeilBrown
2017-08-31  6:17           ` Xiao Ni
2017-08-31  6:48             ` NeilBrown
2017-08-31  7:16               ` Xiao Ni
2017-08-31 23:39                 ` NeilBrown
2017-09-01  0:30                   ` Xiao Ni
2017-09-01  4:34                     ` NeilBrown
