* MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
@ 2011-08-29 17:17 Alexander Lyakas
  2011-08-29 21:25 ` NeilBrown
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Lyakas @ 2011-08-29 17:17 UTC (permalink / raw)
  To: linux-raid

Greetings everybody,

I issue
mdadm --stop /dev/md0
and I want to reliably determine that the MD devnode (/dev/md0) is gone.
So I look for the udev 'remove' event for that devnode.
However, in some cases, even after I have seen the udev event, when I issue
mdadm --detail /dev/md0
I get:
mdadm: md device /dev/md0 does not appear to be active

According to Detail.c, this means that mdadm can successfully do
open("/dev/md0") and receive a valid fd.
But later, when it issues ioctl(fd, GET_ARRAY_INFO), it receives ENODEV
from the kernel.
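
For reference, a minimal standalone sketch of that probe (not mdadm's actual
Detail.c code; it assumes the kernel's uapi header <linux/raid/md_u.h> for
GET_ARRAY_INFO and mdu_array_info_t) looks roughly like this:

/* Rough sketch of the probe Detail.c effectively performs: open the
 * devnode, then ask the kernel for GET_ARRAY_INFO. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/major.h>        /* MD_MAJOR (9) */
#include <linux/raid/md_u.h>    /* GET_ARRAY_INFO, mdu_array_info_t */

int main(void)
{
    mdu_array_info_t info;
    int fd = open("/dev/md0", O_RDONLY);

    if (fd < 0) {                      /* e.g. ENOENT: devnode is gone */
        fprintf(stderr, "open: %s\n", strerror(errno));
        return 1;
    }
    if (ioctl(fd, GET_ARRAY_INFO, &info) < 0) {
        /* ENODEV here is the "does not appear to be active" case */
        fprintf(stderr, "GET_ARRAY_INFO: %s\n", strerror(errno));
        close(fd);
        return 1;
    }
    printf("active array: level %d, %d raid disks\n",
           info.level, info.raid_disks);
    close(fd);
    return 0;
}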

Can somebody suggest an explanation for this behavior? Is there a
reliable way to know when a MD devnode is gone?

Thanks,
  Alex.


* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-08-29 17:17 MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active' Alexander Lyakas
@ 2011-08-29 21:25 ` NeilBrown
  2011-08-30 15:18   ` Alexander Lyakas
  2011-09-13  8:49   ` Alexander Lyakas
  0 siblings, 2 replies; 15+ messages in thread
From: NeilBrown @ 2011-08-29 21:25 UTC (permalink / raw)
  To: Alexander Lyakas; +Cc: linux-raid

On Mon, 29 Aug 2011 20:17:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Greetings everybody,
> 
> I issue
> mdadm --stop /dev/md0
> and I want to reliably determine that the MD devnode (/dev/md0) is gone.
> So I look for the udev 'remove' event for that devnode.
> However, in some cases even after I see the udev event, I issue
> mdadm --detail /dev/md0
> and I get:
> mdadm: md device /dev/md0 does not appear to be active
> 
> According to Detail.c, this means that mdadm can successfully do
> open("/dev/md0") and receive a valid fd.
> But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
> from the kernel.
> 
> Can somebody suggest an explanation for this behavior? Is there a
> reliable way to know when a MD devnode is gone?

run "udevadm settle" after stopping /dev/md0  is most likely to work.

I suspect that udev removes the node *after* you see the 'remove' event.
Sometimes so soon after that you don't see the lag - sometimes a bit later.

NeilBrown

> 
> Thanks,
>   Alex.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-08-29 21:25 ` NeilBrown
@ 2011-08-30 15:18   ` Alexander Lyakas
  2011-08-31  0:54     ` NeilBrown
  2011-09-13  8:49   ` Alexander Lyakas
  1 sibling, 1 reply; 15+ messages in thread
From: Alexander Lyakas @ 2011-08-30 15:18 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Thanks, Neil.

Although, according to the udev documentation, "the udev events are sent
out after udev has finished its event processing, all rules have been
processed, and needed device nodes are created."

Also, looking at the udev-worker code in udevd, the
udev_monitor_send_device() call is made after all the rules have been
processed.

Nevertheless, I looked at udevadm_settle.c and implemented an equivalent
of it in my code, and it looks like the issue is resolved. Perhaps
there is something md-specific here?
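
A minimal polling sketch of such a settle-equivalent, using only the public
libudev queue API, could look like the following (udevadm settle itself also
watches the udev queue with inotify and a timeout, so treat this purely as an
illustration, not as what udevadm actually does):

/* Illustrative "settle" sketch: poll libudev until the event queue is
 * empty, with a crude timeout.  Link with -ludev. */
#include <stdio.h>
#include <unistd.h>
#include <libudev.h>

static int wait_for_udev_queue(unsigned int timeout_ms)
{
    struct udev *udev = udev_new();
    struct udev_queue *queue;
    unsigned int waited = 0;
    int empty = 0;

    if (!udev)
        return -1;
    queue = udev_queue_new(udev);
    if (!queue) {
        udev_unref(udev);
        return -1;
    }
    while (waited < timeout_ms) {
        if (udev_queue_get_queue_is_empty(queue)) {
            empty = 1;
            break;
        }
        usleep(10 * 1000);             /* 10 ms between polls */
        waited += 10;
    }
    udev_queue_unref(queue);
    udev_unref(udev);
    return empty ? 0 : -1;
}

int main(void)
{
    if (wait_for_udev_queue(5000) == 0)
        printf("udev queue is empty\n");
    else
        printf("timed out waiting for udev\n");
    return 0;
}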

Another thing, since you are reading this thread: I wanted to ask
whether you have any advice on the "RAID5: failing an active component
during spare rebuild - arrays hangs" thread I opened some time ago.
Since you did not answer, I assume there is nothing additional you
can advise, correct? I apologize if raising this off-topic question
here was inappropriate.

Thanks for the help,
  Alex.



On Tue, Aug 30, 2011 at 12:25 AM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 29 Aug 2011 20:17:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> wrote:
>
>> Greetings everybody,
>>
>> I issue
>> mdadm --stop /dev/md0
>> and I want to reliably determine that the MD devnode (/dev/md0) is gone.
>> So I look for the udev 'remove' event for that devnode.
>> However, in some cases even after I see the udev event, I issue
>> mdadm --detail /dev/md0
>> and I get:
>> mdadm: md device /dev/md0 does not appear to be active
>>
>> According to Detail.c, this means that mdadm can successfully do
>> open("/dev/md0") and receive a valid fd.
>> But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
>> from the kernel.
>>
>> Can somebody suggest an explanation for this behavior? Is there a
>> reliable way to know when a MD devnode is gone?
>
> run "udevadm settle" after stopping /dev/md0  is most likely to work.
>
> I suspect that udev removes the node *after* you see the 'remove' event.
> Sometimes so soon after that you don't see the lag - sometimes a bit later.
>
> NeilBrown
>
>>
>> Thanks,
>>   Alex.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-08-30 15:18   ` Alexander Lyakas
@ 2011-08-31  0:54     ` NeilBrown
  2011-09-01 21:18       ` Alexander Lyakas
  0 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2011-08-31  0:54 UTC (permalink / raw)
  To: Alexander Lyakas; +Cc: linux-raid

On Tue, 30 Aug 2011 18:18:11 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Thanks, Neil.
> 
> Although according to udev documentation: "the udev events are sent
> out after udev has finished its event processing, all rules have been
> processed, and needed device nodes are created."
> 
> Also looking at udev-worker code of udevd, the
> udev_monitor_send_device() call is done after all the rules have been
> processed.
> 
> Nevertheless, I looked at udevadm_settle.c and did some equivalent of
> that in my code, and it looks like the issue is resolved. Perhaps
> there is something md-specific here?

I cannot see how it would be md-specific.  mdadm doesn't create or remove
devices when udev is active - it leaves all that to udev.
If you are curious I suggest you ask the udev developers.

> 
> Another thing, since you are reading this thread, I wanted to ask
> whether you have any advice on the "RAID5: failing an active component
> during spare rebuild - arrays hangs" thread I opened some time ago.
> Since you were not answering, I assume there is nothing additional you
> can advise about, correct? I apologize if this off-topic was
> inappropriate.

It could mean that I had nothing extra to say, but it could also mean that I
got distracted, forgot, and never got back to it.  I live in a world of
distractions :-(

But a reminder never hurts - it shows that it is important to you, so that
makes it at least a little bit important to me.  I'll go back and have a look
and see if I have anything useful to add.

NeilBrown


> 
> Thanks for the help,
>   Alex.
> 
> 
> 
> On Tue, Aug 30, 2011 at 12:25 AM, NeilBrown <neilb@suse.de> wrote:
> > On Mon, 29 Aug 2011 20:17:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> >> Greetings everybody,
> >>
> >> I issue
> >> mdadm --stop /dev/md0
> >> and I want to reliably determine that the MD devnode (/dev/md0) is gone.
> >> So I look for the udev 'remove' event for that devnode.
> >> However, in some cases even after I see the udev event, I issue
> >> mdadm --detail /dev/md0
> >> and I get:
> >> mdadm: md device /dev/md0 does not appear to be active
> >>
> >> According to Detail.c, this means that mdadm can successfully do
> >> open("/dev/md0") and receive a valid fd.
> >> But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
> >> from the kernel.
> >>
> >> Can somebody suggest an explanation for this behavior? Is there a
> >> reliable way to know when a MD devnode is gone?
> >
> > run "udevadm settle" after stopping /dev/md0  is most likely to work.
> >
> > I suspect that udev removes the node *after* you see the 'remove' event.
> > Sometimes so soon after that you don't see the lag - sometimes a bit later.
> >
> > NeilBrown
> >
> >>
> >> Thanks,
> >>   Alex.
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-08-31  0:54     ` NeilBrown
@ 2011-09-01 21:18       ` Alexander Lyakas
  0 siblings, 0 replies; 15+ messages in thread
From: Alexander Lyakas @ 2011-09-01 21:18 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Thank you for looking at both issues, Neil.

Alex.


On Wed, Aug 31, 2011 at 3:54 AM, NeilBrown <neilb@suse.de> wrote:
> On Tue, 30 Aug 2011 18:18:11 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> wrote:
>
>> Thanks, Neil.
>>
>> Although according to udev documentation: "the udev events are sent
>> out after udev has finished its event processing, all rules have been
>> processed, and needed device nodes are created."
>>
>> Also looking at udev-worker code of udevd, the
>> udev_monitor_send_device() call is done after all the rules have been
>> processed.
>>
>> Nevertheless, I looked at udevadm_settle.c and did some equivalent of
>> that in my code, and it looks like the issue is resolved. Perhaps
>> there is something md-specific here?
>
> I cannot see how it would be md-specific.  mdadm doesn't create or remove
> devices when udev is active - it leaves all that to udev.
> If you are curious I suggest you ask the udev developers.
>
>>
>> Another thing, since you are reading this thread, I wanted to ask
>> whether you have any advice on the "RAID5: failing an active component
>> during spare rebuild - arrays hangs" thread I opened some time ago.
>> Since you were not answering, I assume there is nothing additional you
>> can advise about, correct? I apologize if this off-topic was
>> inappropriate.
>
> It could mean that I had nothing extra to say, but it could also mean that I
> got distracted, forgot, and never got back to it.  I live in a world of
> distractions :-(
>
> But a reminder never hurts - it shows that it is important to you, so that
> makes it at least a little bit important to me.  I'll go back and have a look
> and see if I have anything useful to add.
>
> NeilBrown
>
>
>>
>> Thanks for the help,
>>   Alex.
>>
>>
>>
>> On Tue, Aug 30, 2011 at 12:25 AM, NeilBrown <neilb@suse.de> wrote:
>> > On Mon, 29 Aug 2011 20:17:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
>> > wrote:
>> >
>> >> Greetings everybody,
>> >>
>> >> I issue
>> >> mdadm --stop /dev/md0
>> >> and I want to reliably determine that the MD devnode (/dev/md0) is gone.
>> >> So I look for the udev 'remove' event for that devnode.
>> >> However, in some cases even after I see the udev event, I issue
>> >> mdadm --detail /dev/md0
>> >> and I get:
>> >> mdadm: md device /dev/md0 does not appear to be active
>> >>
>> >> According to Detail.c, this means that mdadm can successfully do
>> >> open("/dev/md0") and receive a valid fd.
>> >> But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
>> >> from the kernel.
>> >>
>> >> Can somebody suggest an explanation for this behavior? Is there a
>> >> reliable way to know when a MD devnode is gone?
>> >
>> > run "udevadm settle" after stopping /dev/md0  is most likely to work.
>> >
>> > I suspect that udev removes the node *after* you see the 'remove' event.
>> > Sometimes so soon after that you don't see the lag - sometimes a bit later.
>> >
>> > NeilBrown
>> >
>> >>
>> >> Thanks,
>> >>   Alex.
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-08-29 21:25 ` NeilBrown
  2011-08-30 15:18   ` Alexander Lyakas
@ 2011-09-13  8:49   ` Alexander Lyakas
  2011-09-21  5:03     ` NeilBrown
  1 sibling, 1 reply; 15+ messages in thread
From: Alexander Lyakas @ 2011-09-13  8:49 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hello Neil,
I am sorry for opening this again, but I am now convinced that I don't
understand what's going on :)

Basically, I see that GET_ARRAY_INFO can also return ENODEV in the case
where the device exists in the kernel but "we are not initialised yet":
/* if we are not initialised yet, only ADD_NEW_DISK, STOP_ARRAY,
 * RUN_ARRAY, and GET_ and SET_BITMAP_FILE are allowed */
if ((!mddev->raid_disks && !mddev->external)
    && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY
    && cmd != RUN_ARRAY && cmd != SET_BITMAP_FILE
    && cmd != GET_BITMAP_FILE) {
	err = -ENODEV;
	goto abort_unlock;
}

I thought that ENODEV meant that the device does not exist in the kernel,
although I am not yet familiar enough with the kernel sources to verify
that.

Basically, I just wanted to know whether there is a reliable way to
determine whether the kernel MD device exists or not. (Obviously,
successfully opening the devnode from user space is not enough.)

Thanks,
  Alex.







On Tue, Aug 30, 2011 at 12:25 AM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 29 Aug 2011 20:17:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> wrote:
>
>> Greetings everybody,
>>
>> I issue
>> mdadm --stop /dev/md0
>> and I want to reliably determine that the MD devnode (/dev/md0) is gone.
>> So I look for the udev 'remove' event for that devnode.
>> However, in some cases even after I see the udev event, I issue
>> mdadm --detail /dev/md0
>> and I get:
>> mdadm: md device /dev/md0 does not appear to be active
>>
>> According to Detail.c, this means that mdadm can successfully do
>> open("/dev/md0") and receive a valid fd.
>> But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
>> from the kernel.
>>
>> Can somebody suggest an explanation for this behavior? Is there a
>> reliable way to know when a MD devnode is gone?
>
> run "udevadm settle" after stopping /dev/md0  is most likely to work.
>
> I suspect that udev removes the node *after* you see the 'remove' event.
> Sometimes so soon after that you don't see the lag - sometimes a bit later.
>
> NeilBrown
>
>>
>> Thanks,
>>   Alex.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-09-13  8:49   ` Alexander Lyakas
@ 2011-09-21  5:03     ` NeilBrown
  2011-09-23 19:24       ` Alexander Lyakas
  0 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2011-09-21  5:03 UTC (permalink / raw)
  To: Alexander Lyakas; +Cc: linux-raid


On Tue, 13 Sep 2011 11:49:12 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Hello Neil,
> I am sorry for opening this again, but I am convinced now that I don't
> understand what's going on:)
> 
> Basically, I see that GET_ARRAY_INFO can also return ENODEV in case
> the device in the kernel exists, but "we are not initialized yet":
> /* if we are not initialised yet, only ADD_NEW_DISK, STOP_ARRAY,
>  * RUN_ARRAY, and GET_ and SET_BITMAP_FILE are allowed */
> if ((!mddev->raid_disks && !mddev->external)
>     && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY
>     && cmd != RUN_ARRAY && cmd != SET_BITMAP_FILE
>     && cmd != GET_BITMAP_FILE) {
> 	err = -ENODEV;
> 	goto abort_unlock;
> 
> I thought that ENODEV means that the device in the kernel does not
> exist, although I am not this familiar with the kernel sources (yet)
> to verify that.
> 
> Basically, I just wanted to know whether there is a reliable way to
> determine whether the kernel MD device exists or no. (Obviously,
> success to open a devnode from user space is not enough).
> 
> Thanks,
>   Alex.

What exactly do you mean by "the kernel MD device exists" ??

When you open a device-special-file for an md device (major == 9), it
automatically creates an inactive array.  You can then fill in the details
and activate it, or explicitly deactivate it.  If you do that, it will
disappear.

Opening the devnode is enough to check that the device exists, because it
creates the device and then you know that it exists.
If you want to know if it already exists - whether inactive or not - look
in /proc/mdstat or /sys/block/md*.
If you want to know if it already exists and is active, look in /proc/mdstat,
or open the device and use GET_ARRAY_INFO, or look in /sys/block/md*
at the device size, or maybe at /sys/block/mdXX/md/raid_disks.

It depends on why you are asking.
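
For illustration, a small sketch of the sysfs variant of those checks
(assuming the usual /sys/block layout, and treating a zero raid_disks value
as "inactive", per the suggestion above; this is an illustration, not code
taken from mdadm):

/* Sketch: decide whether mdXX exists in the kernel, and whether it looks
 * active, by looking only at sysfs. */
#include <stdio.h>
#include <sys/stat.h>

/* returns: -1 = no such md device, 0 = exists but looks inactive,
 *           1 = exists and has raid_disks > 0 */
static int md_sysfs_state(const char *name)      /* e.g. "md0" */
{
    char path[128];
    struct stat st;
    FILE *f;
    int raid_disks = 0;

    snprintf(path, sizeof(path), "/sys/block/%s", name);
    if (stat(path, &st) < 0)
        return -1;                    /* device not known to the kernel */

    snprintf(path, sizeof(path), "/sys/block/%s/md/raid_disks", name);
    f = fopen(path, "r");
    if (!f)
        return 0;                     /* md dir missing: treat as inactive */
    if (fscanf(f, "%d", &raid_disks) != 1)
        raid_disks = 0;
    fclose(f);
    return raid_disks > 0 ? 1 : 0;
}

int main(void)
{
    printf("md0 state: %d\n", md_sysfs_state("md0"));
    return 0;
}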

NeilBrown



> 
> 
> 
> 
> 
> 
> 
> On Tue, Aug 30, 2011 at 12:25 AM, NeilBrown <neilb@suse.de> wrote:
> > On Mon, 29 Aug 2011 20:17:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> >> Greetings everybody,
> >>
> >> I issue
> >> mdadm --stop /dev/md0
> >> and I want to reliably determine that the MD devnode (/dev/md0) is gone.
> >> So I look for the udev 'remove' event for that devnode.
> >> However, in some cases even after I see the udev event, I issue
> >> mdadm --detail /dev/md0
> >> and I get:
> >> mdadm: md device /dev/md0 does not appear to be active
> >>
> >> According to Detail.c, this means that mdadm can successfully do
> >> open("/dev/md0") and receive a valid fd.
> >> But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
> >> from the kernel.
> >>
> >> Can somebody suggest an explanation for this behavior? Is there a
> >> reliable way to know when a MD devnode is gone?
> >
> > run "udevadm settle" after stopping /dev/md0  is most likely to work.
> >
> > I suspect that udev removes the node *after* you see the 'remove' event.
> > Sometimes so soon after that you don't see the lag - sometimes a bit later.
> >
> > NeilBrown
> >
> >>
> >> Thanks,
> >>   Alex.
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-09-21  5:03     ` NeilBrown
@ 2011-09-23 19:24       ` Alexander Lyakas
  2011-09-25 10:15         ` NeilBrown
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Lyakas @ 2011-09-23 19:24 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Thank you, Neil, for answering.
I'm not sure that I understand all of this, because my knowledge of
Linux user-kernel interaction is, unfortunately, not sufficient. In
the future, I hope to know more.
For example, I don't understand how opening "/dev/mdXX" can create
a device in the kernel if the devnode "/dev/mdXX" does not exist. In
that case, I actually fail to open it with ENOENT.

But what I did is actually similar to what you advised:
- if I fail to open the devnode with ENOENT, I know (?) that the
device does not exist
- otherwise, I do GET_ARRAY_INFO
- if it returns OK, then I go ahead and do GET_DISK_INFOs to get the
disk information
- otherwise, if it returns ENODEV, I close the fd and then read /proc/mdstat
- if the md is there, then I know it's an inactive array (and I have to
--stop it and reassemble, or do incremental assembly)
- if the md is not there, then I know that it really does not exist
(this is the case where the md device was deleted but the devnode has
not disappeared yet)

Does this sound right? It passes stress testing pretty well.
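
For illustration, a rough sketch of the GET_DISK_INFO step in that sequence,
assuming GET_ARRAY_INFO has already succeeded on the open fd and using the
uapi definitions from <linux/raid/md_u.h>. Note that mdadm itself probes a
wider, fixed range of slot numbers; walking only 0..nr_disks-1 here is a
simplification:

/* Enumerate disk slots of an active array after GET_ARRAY_INFO succeeded. */
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/major.h>        /* MD_MAJOR */
#include <linux/raid/md_u.h>    /* GET_DISK_INFO, mdu_disk_info_t */

static void dump_disks(int fd, int nr_disks)
{
    mdu_disk_info_t disk;
    int i;

    for (i = 0; i < nr_disks; i++) {
        disk.number = i;
        if (ioctl(fd, GET_DISK_INFO, &disk) < 0)
            continue;                       /* empty or unreadable slot */
        printf("slot %d: dev %d:%d raid_disk %d state 0x%x\n",
               disk.number, disk.major, disk.minor,
               disk.raid_disk, disk.state);
    }
}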

By the way, I understand that /proc/mdstat can only be 4K in size... so
if I have many arrays, I should probably switch to looking at
/sys/block.

Thanks,
  Alex.






On Wed, Sep 21, 2011 at 8:03 AM, NeilBrown <neilb@suse.de> wrote:
>
> On Tue, 13 Sep 2011 11:49:12 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> wrote:
>
> > Hello Neil,
> > I am sorry for opening this again, but I am convinced now that I don't
> > understand what's going on:)
> >
> > Basically, I see that GET_ARRAY_INFO can also return ENODEV in case
> > the device in the kernel exists, but "we are not initialized yet":
> > /* if we are not initialised yet, only ADD_NEW_DISK, STOP_ARRAY,
> >  * RUN_ARRAY, and GET_ and SET_BITMAP_FILE are allowed */
> > if ((!mddev->raid_disks && !mddev->external)
> >     && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY
> >     && cmd != RUN_ARRAY && cmd != SET_BITMAP_FILE
> >     && cmd != GET_BITMAP_FILE) {
> >       err = -ENODEV;
> >       goto abort_unlock;
> >
> > I thought that ENODEV means that the device in the kernel does not
> > exist, although I am not this familiar with the kernel sources (yet)
> > to verify that.
> >
> > Basically, I just wanted to know whether there is a reliable way to
> > determine whether the kernel MD device exists or no. (Obviously,
> > success to open a devnode from user space is not enough).
> >
> > Thanks,
> >   Alex.
>
> What exactly do you mean by "the kernel MD device exists" ??
>
> When you open a device-special-file for an md device (major == 9) it
> automatically creates an inactive array.  You can then fill in the details
> and activate it, or explicitly deactivate it.  If you do that it will
> disappear.
>
> Opening the devnode is enough to check that the device exists, because it
> creates the device and then you know that it exists.
> If you want to know if it already exists - whether inactive or not - look
> in /proc/mdstat or /sys/block/md*.
> If you want to know if it already exists and is active, look in /proc/mdstat,
> or open the device and use GET_ARRAY_INFO, or look in /sys/block/md*
> and look at the device size. or maybe /sys/block/mdXX/md/raid_disks.
>
> It depends on why you are asking.
>
> NeilBrown
>
>
>
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Aug 30, 2011 at 12:25 AM, NeilBrown <neilb@suse.de> wrote:
> > > On Mon, 29 Aug 2011 20:17:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> > > wrote:
> > >
> > >> Greetings everybody,
> > >>
> > >> I issue
> > >> mdadm --stop /dev/md0
> > >> and I want to reliably determine that the MD devnode (/dev/md0) is gone.
> > >> So I look for the udev 'remove' event for that devnode.
> > >> However, in some cases even after I see the udev event, I issue
> > >> mdadm --detail /dev/md0
> > >> and I get:
> > >> mdadm: md device /dev/md0 does not appear to be active
> > >>
> > >> According to Detail.c, this means that mdadm can successfully do
> > >> open("/dev/md0") and receive a valid fd.
> > >> But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
> > >> from the kernel.
> > >>
> > >> Can somebody suggest an explanation for this behavior? Is there a
> > >> reliable way to know when a MD devnode is gone?
> > >
> > > run "udevadm settle" after stopping /dev/md0  is most likely to work.
> > >
> > > I suspect that udev removes the node *after* you see the 'remove' event.
> > > Sometimes so soon after that you don't see the lag - sometimes a bit later.
> > >
> > > NeilBrown
> > >
> > >>
> > >> Thanks,
> > >>   Alex.
> > >> --
> > >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > >> the body of a message to majordomo@vger.kernel.org
> > >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-09-23 19:24       ` Alexander Lyakas
@ 2011-09-25 10:15         ` NeilBrown
  2011-10-11 13:11           ` Alexander Lyakas
  0 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2011-09-25 10:15 UTC (permalink / raw)
  To: Alexander Lyakas; +Cc: linux-raid


On Fri, 23 Sep 2011 22:24:08 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Thank you, Neil, for answering.
> I'm not sure that understand all of this, because my knowledge of
> Linux user-kernel interaction is, unfortunately, not sufficient. In
> the future, I hope to know more.
> For example, I don't understand, how opening a "/dev/mdXX" can create
> a device in the kernel, if the devnode "/dev/mdXX" does not exist. In
> that case, I actually fail to open it with ENOENT.

/dev/mdXX is a "device special file".  It is not the device itself.
You can think of it like a symbolic link.
The "real" name for the device is something like "block device with major 9
and minor X"  That thing can exist quite independently of whether
the /dev/mdXX thing exists.  Just like a file may or may not exist
independently of whether some sym-link to it exists.

When the device (block,9,XX) appears, udev is told and it should create
things in /dev.  When the device disappears, udev is told and it should
remove the /dev entry.  But there can be races, and other things might
sometimes add or remove /dev entries (though they shouldn't).  So the
existence of something in /dev isn't a guarantee that it really exists.
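
A small illustrative sketch of that distinction: stat() the devnode and look
only at the (major, minor) pair it names. This inspects the special file
itself and says nothing about whether the kernel object behind it still
exists:

#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

int main(void)
{
    struct stat st;

    if (stat("/dev/md0", &st) < 0) {
        perror("stat /dev/md0");
        return 1;
    }
    if (!S_ISBLK(st.st_mode)) {
        fprintf(stderr, "/dev/md0 is not a block special file\n");
        return 1;
    }
    /* MD block devices use major 9 */
    printf("/dev/md0 -> block device %u:%u\n",
           major(st.st_rdev), minor(st.st_rdev));
    return 0;
}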


> 
> But what I did is actually similar to what you advised:
> - if I fail to open the devnode with ENOENT, I know (?) that the
> device does not exist
> - otherwise, I do GET_ARRAY_INFO
> - if it returns ok, then I go ahead and do GET_DISK_INFOs to get the
> disks information
> - otherwise if it returns ENODEV, I close the fd and then I read /proc/mdstat
> - if the md is there, then I know it's inactive array (and I have to
> --stop it and reassemble or do incremental assembly)
> - if the md is not there, then I know that it really does not exist
> (this is the case when md deletion happened but the devnode did not
> disappear yet)
> 
> Does it sound right? It passes stress testing pretty well.

Yes, that sounds right.

> 
> By the way, I understand that /proc/mdstat can be only of 4K size...so
> if I have many arrays, I should probably switch to look at
> /sys/block....

Correct.

NeilBrown


> 
> Thanks,
>   Alex.
> 
> 
> 
> 
> 
> 
> On Wed, Sep 21, 2011 at 8:03 AM, NeilBrown <neilb@suse.de> wrote:
> >
> > On Tue, 13 Sep 2011 11:49:12 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> > > Hello Neil,
> > > I am sorry for opening this again, but I am convinced now that I don't
> > > understand what's going on:)
> > >
> > > Basically, I see that GET_ARRAY_INFO can also return ENODEV in case
> > > the device in the kernel exists, but "we are not initialized yet":
> > > /* if we are not initialised yet, only ADD_NEW_DISK, STOP_ARRAY,
> > >  * RUN_ARRAY, and GET_ and SET_BITMAP_FILE are allowed */
> > > if ((!mddev->raid_disks && !mddev->external)
> > >     && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY
> > >     && cmd != RUN_ARRAY && cmd != SET_BITMAP_FILE
> > >     && cmd != GET_BITMAP_FILE) {
> > >       err = -ENODEV;
> > >       goto abort_unlock;
> > >
> > > I thought that ENODEV means that the device in the kernel does not
> > > exist, although I am not this familiar with the kernel sources (yet)
> > > to verify that.
> > >
> > > Basically, I just wanted to know whether there is a reliable way to
> > > determine whether the kernel MD device exists or no. (Obviously,
> > > success to open a devnode from user space is not enough).
> > >
> > > Thanks,
> > >   Alex.
> >
> > What exactly do you mean by "the kernel MD device exists" ??
> >
> > When you open a device-special-file for an md device (major == 9) it
> > automatically creates an inactive array.  You can then fill in the details
> > and activate it, or explicitly deactivate it.  If you do that it will
> > disappear.
> >
> > Opening the devnode is enough to check that the device exists, because it
> > creates the device and then you know that it exists.
> > If you want to know if it already exists - whether inactive or not - look
> > in /proc/mdstat or /sys/block/md*.
> > If you want to know if it already exists and is active, look in /proc/mdstat,
> > or open the device and use GET_ARRAY_INFO, or look in /sys/block/md*
> > and look at the device size. or maybe /sys/block/mdXX/md/raid_disks.
> >
> > It depends on why you are asking.
> >
> > NeilBrown
> >
> >
> >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Aug 30, 2011 at 12:25 AM, NeilBrown <neilb@suse.de> wrote:
> > > > On Mon, 29 Aug 2011 20:17:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> > > > wrote:
> > > >
> > > >> Greetings everybody,
> > > >>
> > > >> I issue
> > > >> mdadm --stop /dev/md0
> > > >> and I want to reliably determine that the MD devnode (/dev/md0) is gone.
> > > >> So I look for the udev 'remove' event for that devnode.
> > > >> However, in some cases even after I see the udev event, I issue
> > > >> mdadm --detail /dev/md0
> > > >> and I get:
> > > >> mdadm: md device /dev/md0 does not appear to be active
> > > >>
> > > >> According to Detail.c, this means that mdadm can successfully do
> > > >> open("/dev/md0") and receive a valid fd.
> > > >> But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
> > > >> from the kernel.
> > > >>
> > > >> Can somebody suggest an explanation for this behavior? Is there a
> > > >> reliable way to know when a MD devnode is gone?
> > > >
> > > > run "udevadm settle" after stopping /dev/md0  is most likely to work.
> > > >
> > > > I suspect that udev removes the node *after* you see the 'remove' event.
> > > > Sometimes so soon after that you don't see the lag - sometimes a bit later.
> > > >
> > > > NeilBrown
> > > >
> > > >>
> > > >> Thanks,
> > > >>   Alex.
> > > >> --
> > > >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > > >> the body of a message to majordomo@vger.kernel.org
> > > >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > >
> > > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >




* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-09-25 10:15         ` NeilBrown
@ 2011-10-11 13:11           ` Alexander Lyakas
  2011-10-12  3:45             ` NeilBrown
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Lyakas @ 2011-10-11 13:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hello Neil,
can you please confirm for me something?
In case the array is FAILED (when your enough() function returns 0) -
for example, after a simultaneous failure of all drives - the only
option to try to recover such an array is to do:
mdadm --stop
and then attempt
mdadm --assemble

correct?

I did not see any other option to recover such an array. Incremental
assembly doesn't work in that case; it simply adds the drives back as
spares.

Thanks,
  Alex.

On Sun, Sep 25, 2011 at 12:15 PM, NeilBrown <neilb@suse.de> wrote:
> On Fri, 23 Sep 2011 22:24:08 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> wrote:
>
>> Thank you, Neil, for answering.
>> I'm not sure that understand all of this, because my knowledge of
>> Linux user-kernel interaction is, unfortunately, not sufficient. In
>> the future, I hope to know more.
>> For example, I don't understand, how opening a "/dev/mdXX" can create
>> a device in the kernel, if the devnode "/dev/mdXX" does not exist. In
>> that case, I actually fail to open it with ENOENT.
>
> /dev/mdXX is a "device special file".  It is not the device itself.
> You can think of it like a symbolic link.
> The "real" name for the device is something like "block device with major 9
> and minor X"  That thing can exist quite independently of whether
> the /dev/mdXX thing exists.  Just like a file may or may not exist
> independently of whether some sym-link to it exists.
>
> When the device (block,9,XX) appears, udev is told and it should create
> things in /dev.  when the device disappears, udev is told and it should
> remove the /dev entry.  But there can be races, and other things might
> sometimes add or remove /dev entries (though they shouldn't).  So the
> existence of something in /dev isn't a guarantee that it really exists.
>
>
>>
>> But what I did is actually similar to what you advised:
>> - if I fail to open the devnode with ENOENT, I know (?) that the
>> device does not exist
>> - otherwise, I do GET_ARRAY_INFO
>> - if it returns ok, then I go ahead and do GET_DISK_INFOs to get the
>> disks information
>> - otherwise if it returns ENODEV, I close the fd and then I read /proc/mdstat
>> - if the md is there, then I know it's inactive array (and I have to
>> --stop it and reassemble or do incremental assembly)
>> - if the md is not there, then I know that it really does not exist
>> (this is the case when md deletion happened but the devnode did not
>> disappear yet)
>>
>> Does it sound right? It passes stress testing pretty well.
>
> Yes, that sounds right.
>
>>
>> By the way, I understand that /proc/mdstat can be only of 4K size...so
>> if I have many arrays, I should probably switch to look at
>> /sys/block....
>
> Correct.
>
> NeilBrown
>
>
>>
>> Thanks,
>>   Alex.
>>
>>
>>
>>
>>
>>
>> On Wed, Sep 21, 2011 at 8:03 AM, NeilBrown <neilb@suse.de> wrote:
>> >
>> > On Tue, 13 Sep 2011 11:49:12 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
>> > wrote:
>> >
>> > > Hello Neil,
>> > > I am sorry for opening this again, but I am convinced now that I don't
>> > > understand what's going on:)
>> > >
>> > > Basically, I see that GET_ARRAY_INFO can also return ENODEV in case
>> > > the device in the kernel exists, but "we are not initialized yet":
>> > > /* if we are not initialised yet, only ADD_NEW_DISK, STOP_ARRAY,
>> > >  * RUN_ARRAY, and GET_ and SET_BITMAP_FILE are allowed */
>> > > if ((!mddev->raid_disks && !mddev->external)
>> > >     && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY
>> > >     && cmd != RUN_ARRAY && cmd != SET_BITMAP_FILE
>> > >     && cmd != GET_BITMAP_FILE) {
>> > >       err = -ENODEV;
>> > >       goto abort_unlock;
>> > >
>> > > I thought that ENODEV means that the device in the kernel does not
>> > > exist, although I am not this familiar with the kernel sources (yet)
>> > > to verify that.
>> > >
>> > > Basically, I just wanted to know whether there is a reliable way to
>> > > determine whether the kernel MD device exists or no. (Obviously,
>> > > success to open a devnode from user space is not enough).
>> > >
>> > > Thanks,
>> > >   Alex.
>> >
>> > What exactly do you mean by "the kernel MD device exists" ??
>> >
>> > When you open a device-special-file for an md device (major == 9) it
>> > automatically creates an inactive array.  You can then fill in the details
>> > and activate it, or explicitly deactivate it.  If you do that it will
>> > disappear.
>> >
>> > Opening the devnode is enough to check that the device exists, because it
>> > creates the device and then you know that it exists.
>> > If you want to know if it already exists - whether inactive or not - look
>> > in /proc/mdstat or /sys/block/md*.
>> > If you want to know if it already exists and is active, look in /proc/mdstat,
>> > or open the device and use GET_ARRAY_INFO, or look in /sys/block/md*
>> > and look at the device size. or maybe /sys/block/mdXX/md/raid_disks.
>> >
>> > It depends on why you are asking.
>> >
>> > NeilBrown
>> >
>> >
>> >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Tue, Aug 30, 2011 at 12:25 AM, NeilBrown <neilb@suse.de> wrote:
>> > > > On Mon, 29 Aug 2011 20:17:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
>> > > > wrote:
>> > > >
>> > > >> Greetings everybody,
>> > > >>
>> > > >> I issue
>> > > >> mdadm --stop /dev/md0
>> > > >> and I want to reliably determine that the MD devnode (/dev/md0) is gone.
>> > > >> So I look for the udev 'remove' event for that devnode.
>> > > >> However, in some cases even after I see the udev event, I issue
>> > > >> mdadm --detail /dev/md0
>> > > >> and I get:
>> > > >> mdadm: md device /dev/md0 does not appear to be active
>> > > >>
>> > > >> According to Detail.c, this means that mdadm can successfully do
>> > > >> open("/dev/md0") and receive a valid fd.
>> > > >> But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
>> > > >> from the kernel.
>> > > >>
>> > > >> Can somebody suggest an explanation for this behavior? Is there a
>> > > >> reliable way to know when a MD devnode is gone?
>> > > >
>> > > > run "udevadm settle" after stopping /dev/md0  is most likely to work.
>> > > >
>> > > > I suspect that udev removes the node *after* you see the 'remove' event.
>> > > > Sometimes so soon after that you don't see the lag - sometimes a bit later.
>> > > >
>> > > > NeilBrown
>> > > >
>> > > >>
>> > > >> Thanks,
>> > > >>   Alex.
>> > > >> --
>> > > >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> > > >> the body of a message to majordomo@vger.kernel.org
>> > > >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> > > >
>> > > >
>> > > --
>> > > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> > > the body of a message to majordomo@vger.kernel.org
>> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-10-11 13:11           ` Alexander Lyakas
@ 2011-10-12  3:45             ` NeilBrown
  2011-10-19 12:01               ` Alexander Lyakas
  0 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2011-10-12  3:45 UTC (permalink / raw)
  To: Alexander Lyakas; +Cc: linux-raid


On Tue, 11 Oct 2011 15:11:47 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Hello Neil,
> can you please confirm for me something?
> In case the array is FAILED (when your enough() function returns 0) -
> for example, after simultaneous failure of all drives - then the only
> option to try to recover such array is to do:
> mdadm --stop
> and then attempt
> mdadm --assemble
> 
> correct?

Yes, though you will probably want a --force as well.

> 
> I did not see any other option to recover such array Incremental
> assemble doesn't work in that case, it simply adds back the drives as
> spares.

In recent versions of mdadm it shouldn't add them as spares.  It should say
that it cannot add them and give up.

NeilBrown





* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-10-12  3:45             ` NeilBrown
@ 2011-10-19 12:01               ` Alexander Lyakas
  2011-10-19 23:56                 ` NeilBrown
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Lyakas @ 2011-10-19 12:01 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Thanks, Neil.
I experimented with the --force switch, and I saw that when using it
it is possible to start the array even when I am sure that the data
will be corrupted, for example when stale drives (ones which have
previously been replaced, etc.) are selected.
Can I have some indication that it is "relatively safe" to start the
array with --force?
For example, in the case of "dirty degraded", perhaps it might be
relatively safe.

What should I look at? The output of --examine? Or something else?

Thanks,
  Alex.


On Wed, Oct 12, 2011 at 5:45 AM, NeilBrown <neilb@suse.de> wrote:
> On Tue, 11 Oct 2011 15:11:47 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
> wrote:
>
>> Hello Neil,
>> can you please confirm for me something?
>> In case the array is FAILED (when your enough() function returns 0) -
>> for example, after simultaneous failure of all drives - then the only
>> option to try to recover such array is to do:
>> mdadm --stop
>> and then attempt
>> mdadm --assemble
>>
>> correct?
>
> Yes, though you will probably want a --force as well.
>
>>
>> I did not see any other option to recover such array Incremental
>> assemble doesn't work in that case, it simply adds back the drives as
>> spares.
>
> In recent version of mdadm it shouldn't add them as spare.  It should say
> that it cannot add it and give up.
>
> NeilBrown
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-10-19 12:01               ` Alexander Lyakas
@ 2011-10-19 23:56                 ` NeilBrown
  2011-10-23  9:03                   ` Alexander Lyakas
  0 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2011-10-19 23:56 UTC (permalink / raw)
  To: Alexander Lyakas; +Cc: linux-raid


On Wed, 19 Oct 2011 14:01:16 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Thanks, Neil.
> I experimented with --force switch, and I saw that when using this
> switch it is possible to start the array, even though I am sure that
> the data will be corrupted. Such as selecting stale drives (which have
> been replaced previously etc.)
> Can I have some indication that it is "relatively safe" to start the
> array with --force?
> For example, in the case of "dirty degraded", perhaps it might be
> relatively safe.
> 
> What should I look at? The output of --examine? Or something else?

Yes, look at the output of --examine.  Look particularly at the update time
and event counts, but also at the RAID level etc. and the role in the array
played by each device.

Then choose the set of devices that you think are most likely to have
current data and give them to "mdadm --assemble --force".

Obviously if one device hasn't been updated for months, that is probably a
bad choice, while if one device is only a few minutes behind the others, then
that is probably a good choice.

Normally there isn't much choice to be made, and the answer will be obvious.
But if you let devices fail and leave them lying around, or don't replace
them, then that can cause problems.

If you need to use --force, there might be some corruption.  Or there might
be none.  And there could be a lot.  But mdadm has no way of knowing.
Usually mdadm will do the best that is possible, but it cannot know how good
that is.

NeilBrown



> 
> Thanks,
>   Alex.
> 
> 
> On Wed, Oct 12, 2011 at 5:45 AM, NeilBrown <neilb@suse.de> wrote:
> > On Tue, 11 Oct 2011 15:11:47 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> >> Hello Neil,
> >> can you please confirm for me something?
> >> In case the array is FAILED (when your enough() function returns 0) -
> >> for example, after simultaneous failure of all drives - then the only
> >> option to try to recover such array is to do:
> >> mdadm --stop
> >> and then attempt
> >> mdadm --assemble
> >>
> >> correct?
> >
> > Yes, though you will probably want a --force as well.
> >
> >>
> >> I did not see any other option to recover such array Incremental
> >> assemble doesn't work in that case, it simply adds back the drives as
> >> spares.
> >
> > In recent version of mdadm it shouldn't add them as spare.  It should say
> > that it cannot add it and give up.
> >
> > NeilBrown
> >
> >
> >




* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-10-19 23:56                 ` NeilBrown
@ 2011-10-23  9:03                   ` Alexander Lyakas
  2011-10-23 22:55                     ` NeilBrown
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Lyakas @ 2011-10-23  9:03 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Thanks, Neil.
To end this long email thread: which is "more important", the update time
or the event count? Or perhaps they are updated simultaneously?

Thanks,
  Alex.



On Thu, Oct 20, 2011 at 1:56 AM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 19 Oct 2011 14:01:16 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
> wrote:
>
>> Thanks, Neil.
>> I experimented with --force switch, and I saw that when using this
>> switch it is possible to start the array, even though I am sure that
>> the data will be corrupted. Such as selecting stale drives (which have
>> been replaced previously etc.)
>> Can I have some indication that it is "relatively safe" to start the
>> array with --force?
>> For example, in the case of "dirty degraded", perhaps it might be
>> relatively safe.
>>
>> What should I look at? The output of --examine? Or something else?
>
> Yes, look at the output of examine.  Look particularly at update time and
> event counts, but also at RAID raid level etc and the role in the array
> played by each device.
>
> Then choose the set of devices that you should are most likely to have
> current data and given them to "mdadm --assemble --force".
>
> Obviously if one device hasn't been updated for months, that is probably a
> bad choice, while if one device is only a few minutes behind the others, then
> that is probably a good choice.
>
> Normally there isn't much choice to be made, and the answer will be obvious.
> But if you let devices fail and leave them lying around, or don't replace
> them, then that can cause problems.
>
> If you need to use --force  there might be some corruption.  Or there might
> be none.  And there could be a lot.  But mdadm has know way of knowing.
> Usually mdadm will do the best that is possible, but it cannot know how good
> that is.
>
> NeilBrown
>
>
>
>>
>> Thanks,
>>   Alex.
>>
>>
>> On Wed, Oct 12, 2011 at 5:45 AM, NeilBrown <neilb@suse.de> wrote:
>> > On Tue, 11 Oct 2011 15:11:47 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
>> > wrote:
>> >
>> >> Hello Neil,
>> >> can you please confirm for me something?
>> >> In case the array is FAILED (when your enough() function returns 0) -
>> >> for example, after simultaneous failure of all drives - then the only
>> >> option to try to recover such array is to do:
>> >> mdadm --stop
>> >> and then attempt
>> >> mdadm --assemble
>> >>
>> >> correct?
>> >
>> > Yes, though you will probably want a --force as well.
>> >
>> >>
>> >> I did not see any other option to recover such array Incremental
>> >> assemble doesn't work in that case, it simply adds back the drives as
>> >> spares.
>> >
>> > In recent version of mdadm it shouldn't add them as spare.  It should say
>> > that it cannot add it and give up.
>> >
>> > NeilBrown
>> >
>> >
>> >
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
  2011-10-23  9:03                   ` Alexander Lyakas
@ 2011-10-23 22:55                     ` NeilBrown
  0 siblings, 0 replies; 15+ messages in thread
From: NeilBrown @ 2011-10-23 22:55 UTC (permalink / raw)
  To: Alexander Lyakas; +Cc: linux-raid


On Sun, 23 Oct 2011 11:03:14 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Thanks, Neil.
> To end this long email thread: what is "more important": update time
> or event count? Or perhaps they are updated simultaneously?
> 

They are updated simultaneously.

There is no sense that one is more important than the other, but if you know
something about the recent history of the array, then combining that
knowledge with these details might provide you with more precise information
of some sort.

NeilBrown



Thread overview: 15 messages
2011-08-29 17:17 MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active' Alexander Lyakas
2011-08-29 21:25 ` NeilBrown
2011-08-30 15:18   ` Alexander Lyakas
2011-08-31  0:54     ` NeilBrown
2011-09-01 21:18       ` Alexander Lyakas
2011-09-13  8:49   ` Alexander Lyakas
2011-09-21  5:03     ` NeilBrown
2011-09-23 19:24       ` Alexander Lyakas
2011-09-25 10:15         ` NeilBrown
2011-10-11 13:11           ` Alexander Lyakas
2011-10-12  3:45             ` NeilBrown
2011-10-19 12:01               ` Alexander Lyakas
2011-10-19 23:56                 ` NeilBrown
2011-10-23  9:03                   ` Alexander Lyakas
2011-10-23 22:55                     ` NeilBrown
