All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] md/mdadm: Inform udev about device removal when stopping
@ 2016-02-16 14:44 Sebastian Parschauer
  2016-02-16 14:44 ` [PATCH 1/2] md: " Sebastian Parschauer
  2016-02-16 14:44 ` [PATCH 2/2] Manage: " Sebastian Parschauer
  0 siblings, 2 replies; 20+ messages in thread
From: Sebastian Parschauer @ 2016-02-16 14:44 UTC (permalink / raw)
  To: linux-raid
  Cc: Sebastian Parschauer, Shaohua Li, Jes Sorensen, Brassow Jonathan,
	Artur Paszkiewicz, NeilBrown, systemd-devel

When stopping an MD device, then its device node /dev/mdX may still
exist afterwards or it is recreated by udev. The next open() call
can lead to creation of an inoperable MD device. The reason for
this is that a change event (KOBJ_CHANGE) is announced to udev.
So announce a removal event (KOBJ_REMOVE) to udev instead.

Because of the support for kernels prior to 2.6.28, this change is
required in mdadm and the kernel. The udev event from mdadm
overrides the one from the kernel.

Sebastian Parschauer (2):
  md: Inform udev about device removal when stopping
  Manage: Inform udev about device removal when stopping

 drivers/md/md.c |    2 +-
 Manage.c        |    6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/2] md: Inform udev about device removal when stopping
  2016-02-16 14:44 [PATCH 0/2] md/mdadm: Inform udev about device removal when stopping Sebastian Parschauer
@ 2016-02-16 14:44 ` Sebastian Parschauer
  2016-02-16 20:05   ` Shaohua Li
  2016-02-16 14:44 ` [PATCH 2/2] Manage: " Sebastian Parschauer
  1 sibling, 1 reply; 20+ messages in thread
From: Sebastian Parschauer @ 2016-02-16 14:44 UTC (permalink / raw)
  To: linux-raid
  Cc: Sebastian Parschauer, Shaohua Li, Jes Sorensen, Brassow Jonathan,
	Artur Paszkiewicz, NeilBrown, systemd-devel

When stopping an MD device, then its device node /dev/mdX may still
exist afterwards or it is recreated by udev. The next open() call
can lead to creation of an inoperable MD device. The reason for
this is that a change event (KOBJ_CHANGE) is announced to udev.
So announce a removal event (KOBJ_REMOVE) to udev instead.

A change is likely also required in mdadm because of the support
for kernels prior to 2.6.28.

Signed-off-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 drivers/md/md.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e55e6cf..ccae070 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5671,7 +5671,7 @@ static int do_md_stop(struct mddev *mddev, int mode,
 		export_array(mddev);
 
 		md_clean(mddev);
-		kobject_uevent(&disk_to_dev(mddev->gendisk)->kobj, KOBJ_CHANGE);
+		kobject_uevent(&disk_to_dev(mddev->gendisk)->kobj, KOBJ_REMOVE);
 		if (mddev->hold_active == UNTIL_STOP)
 			mddev->hold_active = 0;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-16 14:44 [PATCH 0/2] md/mdadm: Inform udev about device removal when stopping Sebastian Parschauer
  2016-02-16 14:44 ` [PATCH 1/2] md: " Sebastian Parschauer
@ 2016-02-16 14:44 ` Sebastian Parschauer
  2016-02-16 17:41   ` Jes Sorensen
  1 sibling, 1 reply; 20+ messages in thread
From: Sebastian Parschauer @ 2016-02-16 14:44 UTC (permalink / raw)
  To: linux-raid
  Cc: Sebastian Parschauer, Shaohua Li, Jes Sorensen, Brassow Jonathan,
	Artur Paszkiewicz, NeilBrown, systemd-devel

When stopping an MD device, then its device node /dev/mdX may still
exist afterwards or it is recreated by udev. The next open() call
can lead to creation of an inoperable MD device. The reason for
this is that a change event (KOBJ_CHANGE) is announced to udev.
So announce a removal event (KOBJ_REMOVE) to udev instead.

This also overrides the change event sent by the kernel.

Signed-off-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 Manage.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Manage.c b/Manage.c
index 7e1b94b..bc89764 100644
--- a/Manage.c
+++ b/Manage.c
@@ -494,13 +494,13 @@ done:
 		goto out;
 	}
 	/* prior to 2.6.28, KOBJ_CHANGE was not sent when an md array
-	 * was stopped, so We'll do it here just to be sure.  Drop any
-	 * partitions as well...
+	 * was stopped, it should be KOBJ_REMOVE instead, so we set the
+	 * remove event here just to be sure. Drop any partitions as well...
 	 */
 	if (fd >= 0)
 		ioctl(fd, BLKRRPART, 0);
 	if (mdi)
-		sysfs_uevent(mdi, "change");
+		sysfs_uevent(mdi, "remove");
 
 	if (devnm[0] && use_udev()) {
 		struct map_ent *mp = map_by_devnm(&map, devnm);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-16 14:44 ` [PATCH 2/2] Manage: " Sebastian Parschauer
@ 2016-02-16 17:41   ` Jes Sorensen
  2016-02-16 18:03     ` Sebastian Parschauer
  0 siblings, 1 reply; 20+ messages in thread
From: Jes Sorensen @ 2016-02-16 17:41 UTC (permalink / raw)
  To: Sebastian Parschauer
  Cc: linux-raid, Shaohua Li, Brassow Jonathan, Artur Paszkiewicz,
	NeilBrown, systemd-devel

Sebastian Parschauer <sebastian.riemer@profitbricks.com> writes:
> When stopping an MD device, then its device node /dev/mdX may still
> exist afterwards or it is recreated by udev. The next open() call
> can lead to creation of an inoperable MD device. The reason for
> this is that a change event (KOBJ_CHANGE) is announced to udev.
> So announce a removal event (KOBJ_REMOVE) to udev instead.
>
> This also overrides the change event sent by the kernel.
>
> Signed-off-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
> ---
>  Manage.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Manage.c b/Manage.c
> index 7e1b94b..bc89764 100644
> --- a/Manage.c
> +++ b/Manage.c
> @@ -494,13 +494,13 @@ done:
>  		goto out;
>  	}
>  	/* prior to 2.6.28, KOBJ_CHANGE was not sent when an md array
> -	 * was stopped, so We'll do it here just to be sure.  Drop any
> -	 * partitions as well...
> +	 * was stopped, it should be KOBJ_REMOVE instead, so we set the
> +	 * remove event here just to be sure. Drop any partitions as well...
>  	 */
>  	if (fd >= 0)
>  		ioctl(fd, BLKRRPART, 0);
>  	if (mdi)
> -		sysfs_uevent(mdi, "change");
> +		sysfs_uevent(mdi, "remove");

I am a little concerned about this change. You assume the kernel and
mdadm will be updated in sync, which is unlikely to happen. I believe
you need to match the kernel version and send the corresponding event
currectly for this to work correctly?

Jes

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-16 17:41   ` Jes Sorensen
@ 2016-02-16 18:03     ` Sebastian Parschauer
  2016-02-16 18:40       ` Hannes Reinecke
  0 siblings, 1 reply; 20+ messages in thread
From: Sebastian Parschauer @ 2016-02-16 18:03 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Brassow Jonathan, NeilBrown, linux-raid, Artur Paszkiewicz,
	systemd-devel, Shaohua Li

On 16.02.2016 18:41, Jes Sorensen wrote:
> Sebastian Parschauer <sebastian.riemer@profitbricks.com> writes:
>> When stopping an MD device, then its device node /dev/mdX may still
>> exist afterwards or it is recreated by udev. The next open() call
>> can lead to creation of an inoperable MD device. The reason for
>> this is that a change event (KOBJ_CHANGE) is announced to udev.
>> So announce a removal event (KOBJ_REMOVE) to udev instead.
>>
>> This also overrides the change event sent by the kernel.
>>
>> Signed-off-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
>> ---
>>  Manage.c |    6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/Manage.c b/Manage.c
>> index 7e1b94b..bc89764 100644
>> --- a/Manage.c
>> +++ b/Manage.c
>> @@ -494,13 +494,13 @@ done:
>>  		goto out;
>>  	}
>>  	/* prior to 2.6.28, KOBJ_CHANGE was not sent when an md array
>> -	 * was stopped, so We'll do it here just to be sure.  Drop any
>> -	 * partitions as well...
>> +	 * was stopped, it should be KOBJ_REMOVE instead, so we set the
>> +	 * remove event here just to be sure. Drop any partitions as well...
>>  	 */
>>  	if (fd >= 0)
>>  		ioctl(fd, BLKRRPART, 0);
>>  	if (mdi)
>> -		sysfs_uevent(mdi, "change");
>> +		sysfs_uevent(mdi, "remove");
> 
> I am a little concerned about this change. You assume the kernel and
> mdadm will be updated in sync, which is unlikely to happen. I believe
> you need to match the kernel version and send the corresponding event
> currectly for this to work correctly?

The worst thing that can happen is that the kernel sends the change
event after the remove event. Then it is the current situation again.
From my tests mdadm does enough other stuff in between. Udev is able to
handle receiving two remove events from my testing. Multiple mdadm
instances can't run in parallel any ways. So userspace around it needs
some serialization for it any ways. So also stopping an MD device and
assembling a new one with the same minor number shouldn't race.

I still prefer this solution here. But if you decide to drop the udev
event sending in mdadm, then I'm also fine with that.

Cheers,
Sebastian
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-16 18:03     ` Sebastian Parschauer
@ 2016-02-16 18:40       ` Hannes Reinecke
  2016-02-16 18:52         ` Jes Sorensen
  0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2016-02-16 18:40 UTC (permalink / raw)
  To: Sebastian Parschauer, Jes Sorensen
  Cc: linux-raid, Shaohua Li, Brassow Jonathan, Artur Paszkiewicz,
	NeilBrown, systemd-devel

On 02/16/2016 07:03 PM, Sebastian Parschauer wrote:
> On 16.02.2016 18:41, Jes Sorensen wrote:
>> Sebastian Parschauer <sebastian.riemer@profitbricks.com> writes:
>>> When stopping an MD device, then its device node /dev/mdX may still
>>> exist afterwards or it is recreated by udev. The next open() call
>>> can lead to creation of an inoperable MD device. The reason for
>>> this is that a change event (KOBJ_CHANGE) is announced to udev.
>>> So announce a removal event (KOBJ_REMOVE) to udev instead.
>>>
>>> This also overrides the change event sent by the kernel.
>>>
>>> Signed-off-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
>>> ---
>>>   Manage.c |    6 +++---
>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/Manage.c b/Manage.c
>>> index 7e1b94b..bc89764 100644
>>> --- a/Manage.c
>>> +++ b/Manage.c
>>> @@ -494,13 +494,13 @@ done:
>>>   		goto out;
>>>   	}
>>>   	/* prior to 2.6.28, KOBJ_CHANGE was not sent when an md array
>>> -	 * was stopped, so We'll do it here just to be sure.  Drop any
>>> -	 * partitions as well...
>>> +	 * was stopped, it should be KOBJ_REMOVE instead, so we set the
>>> +	 * remove event here just to be sure. Drop any partitions as well...
>>>   	 */
>>>   	if (fd >= 0)
>>>   		ioctl(fd, BLKRRPART, 0);
>>>   	if (mdi)
>>> -		sysfs_uevent(mdi, "change");
>>> +		sysfs_uevent(mdi, "remove");
>>
>> I am a little concerned about this change. You assume the kernel and
>> mdadm will be updated in sync, which is unlikely to happen. I believe
>> you need to match the kernel version and send the corresponding event
>> currectly for this to work correctly?
>
> The worst thing that can happen is that the kernel sends the change
> event after the remove event. Then it is the current situation again.
>  From my tests mdadm does enough other stuff in between. Udev is able to
> handle receiving two remove events from my testing. Multiple mdadm
> instances can't run in parallel any ways. So userspace around it needs
> some serialization for it any ways. So also stopping an MD device and
> assembling a new one with the same minor number shouldn't race.
>
> I still prefer this solution here. But if you decide to drop the udev
> event sending in mdadm, then I'm also fine with that.
>
I strongly prefer removing the udev event generation altogether.
As this appears to be a carry-over from older kernels, it looks to me as 
being an incomplete conversion:
At one point udev introduced 'ONLINE' and 'OFFLINE' events, which were 
supposed to be used for this kind of scenario.
(ONLINE being a companion to 'ADD', and 'OFFLINE' being the companion to 
'DELETE'). However, later the 'ONLINE' got modified to 'CHANGE', and the 
'OFFLINE' got dropped completely.
Or that was the plan.
So it looks as if the conversion to 'CHANGE' got applied to the 
'OFFLINE' event, too.
Hence I strongly recommend to drop it completely, and let the kernel or 
the MD module decide if and when a uevent should be send.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-16 18:40       ` Hannes Reinecke
@ 2016-02-16 18:52         ` Jes Sorensen
  2016-02-16 20:46           ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Jes Sorensen @ 2016-02-16 18:52 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Sebastian Parschauer, linux-raid, Shaohua Li, Brassow Jonathan,
	Artur Paszkiewicz, NeilBrown, systemd-devel

Hannes Reinecke <hare@suse.de> writes:
> On 02/16/2016 07:03 PM, Sebastian Parschauer wrote:
>> On 16.02.2016 18:41, Jes Sorensen wrote:
>>> Sebastian Parschauer <sebastian.riemer@profitbricks.com> writes:
>>>> When stopping an MD device, then its device node /dev/mdX may still
>>>> exist afterwards or it is recreated by udev. The next open() call
>>>> can lead to creation of an inoperable MD device. The reason for
>>>> this is that a change event (KOBJ_CHANGE) is announced to udev.
>>>> So announce a removal event (KOBJ_REMOVE) to udev instead.
>>>>
>>>> This also overrides the change event sent by the kernel.
>>>>
>>>> Signed-off-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
>>>> ---
>>>>   Manage.c |    6 +++---
>>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/Manage.c b/Manage.c
>>>> index 7e1b94b..bc89764 100644
>>>> --- a/Manage.c
>>>> +++ b/Manage.c
>>>> @@ -494,13 +494,13 @@ done:
>>>>   		goto out;
>>>>   	}
>>>>   	/* prior to 2.6.28, KOBJ_CHANGE was not sent when an md array
>>>> -	 * was stopped, so We'll do it here just to be sure.  Drop any
>>>> -	 * partitions as well...
>>>> +	 * was stopped, it should be KOBJ_REMOVE instead, so we set the
>>>> +	 * remove event here just to be sure. Drop any partitions as well...
>>>>   	 */
>>>>   	if (fd >= 0)
>>>>   		ioctl(fd, BLKRRPART, 0);
>>>>   	if (mdi)
>>>> -		sysfs_uevent(mdi, "change");
>>>> +		sysfs_uevent(mdi, "remove");
>>>
>>> I am a little concerned about this change. You assume the kernel and
>>> mdadm will be updated in sync, which is unlikely to happen. I believe
>>> you need to match the kernel version and send the corresponding event
>>> currectly for this to work correctly?
>>
>> The worst thing that can happen is that the kernel sends the change
>> event after the remove event. Then it is the current situation again.
>>  From my tests mdadm does enough other stuff in between. Udev is able to
>> handle receiving two remove events from my testing. Multiple mdadm
>> instances can't run in parallel any ways. So userspace around it needs
>> some serialization for it any ways. So also stopping an MD device and
>> assembling a new one with the same minor number shouldn't race.
>>
>> I still prefer this solution here. But if you decide to drop the udev
>> event sending in mdadm, then I'm also fine with that.
>>
> I strongly prefer removing the udev event generation altogether.
> As this appears to be a carry-over from older kernels, it looks to me
> as being an incomplete conversion:
> At one point udev introduced 'ONLINE' and 'OFFLINE' events, which were
> supposed to be used for this kind of scenario.
> (ONLINE being a companion to 'ADD', and 'OFFLINE' being the companion
> to 'DELETE'). However, later the 'ONLINE' got modified to 'CHANGE',
> and the 'OFFLINE' got dropped completely.
> Or that was the plan.
> So it looks as if the conversion to 'CHANGE' got applied to the
> 'OFFLINE' event, too.
> Hence I strongly recommend to drop it completely, and let the kernel
> or the MD module decide if and when a uevent should be send.

I am totally fine with this, however we should make mdadm fail if run
against a pre-2.6.28 kernel then.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] md: Inform udev about device removal when stopping
  2016-02-16 14:44 ` [PATCH 1/2] md: " Sebastian Parschauer
@ 2016-02-16 20:05   ` Shaohua Li
  2016-02-16 20:43     ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2016-02-16 20:05 UTC (permalink / raw)
  To: linux-raid
  Cc: Brassow Jonathan, Jes Sorensen, NeilBrown, Artur Paszkiewicz,
	systemd-devel

On Tue, Feb 16, 2016 at 03:44:36PM +0100, Sebastian Parschauer wrote:
> When stopping an MD device, then its device node /dev/mdX may still
> exist afterwards or it is recreated by udev. The next open() call
> can lead to creation of an inoperable MD device. The reason for
> this is that a change event (KOBJ_CHANGE) is announced to udev.
> So announce a removal event (KOBJ_REMOVE) to udev instead.
> 
> A change is likely also required in mdadm because of the support
> for kernels prior to 2.6.28.

I didn't follow why we need the change. Shouldn't the KOBJ_REMOVE event be sent
automatically when gendisk is deleted?
mddev_put()->mddev_delayed_delete()->md_free()->del_gendisk().

Thanks,
Shaohua
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] md: Inform udev about device removal when stopping
  2016-02-16 20:05   ` Shaohua Li
@ 2016-02-16 20:43     ` NeilBrown
  2016-02-17 11:24       ` Sebastian Parschauer
  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2016-02-16 20:43 UTC (permalink / raw)
  To: Shaohua Li, linux-raid
  Cc: Jes Sorensen, Artur Paszkiewicz, Brassow Jonathan, systemd-devel


[-- Attachment #1.1: Type: text/plain, Size: 2312 bytes --]

On Wed, Feb 17 2016, Shaohua Li wrote:

> On Tue, Feb 16, 2016 at 03:44:36PM +0100, Sebastian Parschauer wrote:
>> When stopping an MD device, then its device node /dev/mdX may still
>> exist afterwards or it is recreated by udev. The next open() call
>> can lead to creation of an inoperable MD device. The reason for
>> this is that a change event (KOBJ_CHANGE) is announced to udev.
>> So announce a removal event (KOBJ_REMOVE) to udev instead.
>> 
>> A change is likely also required in mdadm because of the support
>> for kernels prior to 2.6.28.
>
> I didn't follow why we need the change. Shouldn't the KOBJ_REMOVE event be sent
> automatically when gendisk is deleted?
> mddev_put()->mddev_delayed_delete()->md_free()->del_gendisk().
>
> Thanks,
> Shaohua

For a bit of context: this KOBJ_CHANGE event was added in Oct 2008

Commit: 934d9c23b4c7 ("md: destroy partitions and notify udev when md array is stopped.")

At the time, md devices weren't getting removed at all.
Now they are (I figured out the locking), though they can still come
back.

There are still two stages.  The array is stopped, and then the block
device is destroyed.  It is theoretically possible to stop the array
without destroying the block device, though I don't think that happens
in practice.

So this KOBJ_CHANGE is, I think, technically correct (change from
"active" to "inactive")  but probably isn't needed any more - not to the
extent it was at the time.

There are some annoying races with caused by udev responding (belatedly)
to events by running programs that open s/dev/mdXX and so automatically
re-creates the md device.
The real problem here is not the event or the delays in udev.  It is the
fact that opening /dev/mdXX transparently creates a device.

The only way (I know of) to really avoid these races is to use named
arrays.
Put
   CREATE names=yes

in mdadm.conf.  Then md arrays will be created by writing a name to a
magic file in /sys.  The arrays have a minor number >=512 and are not
auto-re-created if the device node is re-opened before udev unlinks it.

So: the patch might be safe, and might solve a particular problem, but
it is really just a bandaid.  The best fix is "CREATE named=yes" (and
use named like "md_home", not "md4".

NeilBrown

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

[-- Attachment #2: Type: text/plain, Size: 172 bytes --]

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-16 18:52         ` Jes Sorensen
@ 2016-02-16 20:46           ` NeilBrown
  2016-02-16 22:02             ` Jes Sorensen
  2016-02-17  7:03             ` Hannes Reinecke
  0 siblings, 2 replies; 20+ messages in thread
From: NeilBrown @ 2016-02-16 20:46 UTC (permalink / raw)
  To: Jes Sorensen, Hannes Reinecke
  Cc: Sebastian Parschauer, linux-raid, Shaohua Li, Brassow Jonathan,
	Artur Paszkiewicz, systemd-devel

[-- Attachment #1: Type: text/plain, Size: 3578 bytes --]

On Wed, Feb 17 2016, Jes Sorensen wrote:

> Hannes Reinecke <hare@suse.de> writes:
>> On 02/16/2016 07:03 PM, Sebastian Parschauer wrote:
>>> On 16.02.2016 18:41, Jes Sorensen wrote:
>>>> Sebastian Parschauer <sebastian.riemer@profitbricks.com> writes:
>>>>> When stopping an MD device, then its device node /dev/mdX may still
>>>>> exist afterwards or it is recreated by udev. The next open() call
>>>>> can lead to creation of an inoperable MD device. The reason for
>>>>> this is that a change event (KOBJ_CHANGE) is announced to udev.
>>>>> So announce a removal event (KOBJ_REMOVE) to udev instead.
>>>>>
>>>>> This also overrides the change event sent by the kernel.
>>>>>
>>>>> Signed-off-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
>>>>> ---
>>>>>   Manage.c |    6 +++---
>>>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/Manage.c b/Manage.c
>>>>> index 7e1b94b..bc89764 100644
>>>>> --- a/Manage.c
>>>>> +++ b/Manage.c
>>>>> @@ -494,13 +494,13 @@ done:
>>>>>   		goto out;
>>>>>   	}
>>>>>   	/* prior to 2.6.28, KOBJ_CHANGE was not sent when an md array
>>>>> -	 * was stopped, so We'll do it here just to be sure.  Drop any
>>>>> -	 * partitions as well...
>>>>> +	 * was stopped, it should be KOBJ_REMOVE instead, so we set the
>>>>> +	 * remove event here just to be sure. Drop any partitions as well...
>>>>>   	 */
>>>>>   	if (fd >= 0)
>>>>>   		ioctl(fd, BLKRRPART, 0);
>>>>>   	if (mdi)
>>>>> -		sysfs_uevent(mdi, "change");
>>>>> +		sysfs_uevent(mdi, "remove");
>>>>
>>>> I am a little concerned about this change. You assume the kernel and
>>>> mdadm will be updated in sync, which is unlikely to happen. I believe
>>>> you need to match the kernel version and send the corresponding event
>>>> currectly for this to work correctly?
>>>
>>> The worst thing that can happen is that the kernel sends the change
>>> event after the remove event. Then it is the current situation again.
>>>  From my tests mdadm does enough other stuff in between. Udev is able to
>>> handle receiving two remove events from my testing. Multiple mdadm
>>> instances can't run in parallel any ways. So userspace around it needs
>>> some serialization for it any ways. So also stopping an MD device and
>>> assembling a new one with the same minor number shouldn't race.
>>>
>>> I still prefer this solution here. But if you decide to drop the udev
>>> event sending in mdadm, then I'm also fine with that.
>>>
>> I strongly prefer removing the udev event generation altogether.
>> As this appears to be a carry-over from older kernels, it looks to me
>> as being an incomplete conversion:
>> At one point udev introduced 'ONLINE' and 'OFFLINE' events, which were
>> supposed to be used for this kind of scenario.
>> (ONLINE being a companion to 'ADD', and 'OFFLINE' being the companion
>> to 'DELETE'). However, later the 'ONLINE' got modified to 'CHANGE',
>> and the 'OFFLINE' got dropped completely.
>> Or that was the plan.
>> So it looks as if the conversion to 'CHANGE' got applied to the
>> 'OFFLINE' event, too.
>> Hence I strongly recommend to drop it completely, and let the kernel
>> or the MD module decide if and when a uevent should be send.
>
> I am totally fine with this, however we should make mdadm fail if run
> against a pre-2.6.28 kernel then.
>
> Cheers,
> Jes

I would suggest protecting the

	if (fd >= 0)
		ioctl(fd, BLKRRPART, 0);
	if (mdi)
		sysfs_uevent(mdi, "change");

code with

   if (get_linux_version() < 2006028)

That should be completely safe - 2.6.28 and later do this (if needed).

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-16 20:46           ` NeilBrown
@ 2016-02-16 22:02             ` Jes Sorensen
  2016-02-17 10:31               ` Sebastian Parschauer
  2016-02-17  7:03             ` Hannes Reinecke
  1 sibling, 1 reply; 20+ messages in thread
From: Jes Sorensen @ 2016-02-16 22:02 UTC (permalink / raw)
  To: NeilBrown
  Cc: Hannes Reinecke, Sebastian Parschauer, linux-raid, Shaohua Li,
	Brassow Jonathan, Artur Paszkiewicz, systemd-devel

NeilBrown <neilb@suse.com> writes:
> On Wed, Feb 17 2016, Jes Sorensen wrote:
>
>> Hannes Reinecke <hare@suse.de> writes:
>>> On 02/16/2016 07:03 PM, Sebastian Parschauer wrote:
>>>> The worst thing that can happen is that the kernel sends the change
>>>> event after the remove event. Then it is the current situation again.
>>>>  From my tests mdadm does enough other stuff in between. Udev is able to
>>>> handle receiving two remove events from my testing. Multiple mdadm
>>>> instances can't run in parallel any ways. So userspace around it needs
>>>> some serialization for it any ways. So also stopping an MD device and
>>>> assembling a new one with the same minor number shouldn't race.
>>>>
>>>> I still prefer this solution here. But if you decide to drop the udev
>>>> event sending in mdadm, then I'm also fine with that.
>>>>
>>> I strongly prefer removing the udev event generation altogether.
>>> As this appears to be a carry-over from older kernels, it looks to me
>>> as being an incomplete conversion:
>>> At one point udev introduced 'ONLINE' and 'OFFLINE' events, which were
>>> supposed to be used for this kind of scenario.
>>> (ONLINE being a companion to 'ADD', and 'OFFLINE' being the companion
>>> to 'DELETE'). However, later the 'ONLINE' got modified to 'CHANGE',
>>> and the 'OFFLINE' got dropped completely.
>>> Or that was the plan.
>>> So it looks as if the conversion to 'CHANGE' got applied to the
>>> 'OFFLINE' event, too.
>>> Hence I strongly recommend to drop it completely, and let the kernel
>>> or the MD module decide if and when a uevent should be send.
>>
>> I am totally fine with this, however we should make mdadm fail if run
>> against a pre-2.6.28 kernel then.
>>
>> Cheers,
>> Jes
>
> I would suggest protecting the
>
> 	if (fd >= 0)
> 		ioctl(fd, BLKRRPART, 0);
> 	if (mdi)
> 		sysfs_uevent(mdi, "change");
>
> code with
>
>    if (get_linux_version() < 2006028)
>
> That should be completely safe - 2.6.28 and later do this (if needed).

Seems a better fix to me. I much prefer the duplicated events.

Sebastian, does this patch resolve the problem for you? If nobody
hollors, I will push this into mdadm.

Cheers,
Jes

commit 7856fa44b8f0bc217a6bbcb5f7c51b2f03717655
Author: Jes Sorensen <Jes.Sorensen@redhat.com>
Date:   Tue Feb 16 16:58:36 2016 -0500

    Manage.c: Only issue change events for kernels older then 2.6.28
    
    2.6.28+ kernels handle this themselves and issuing the event here can
    cause a race.
    
    Reported-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
    Suggested-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>

diff --git a/Manage.c b/Manage.c
index 7e1b94b..eae96e1 100644
--- a/Manage.c
+++ b/Manage.c
@@ -493,14 +493,17 @@ done:
 		rv = 1;
 		goto out;
 	}
-	/* prior to 2.6.28, KOBJ_CHANGE was not sent when an md array
-	 * was stopped, so We'll do it here just to be sure.  Drop any
-	 * partitions as well...
-	 */
-	if (fd >= 0)
-		ioctl(fd, BLKRRPART, 0);
-	if (mdi)
-		sysfs_uevent(mdi, "change");
+
+	if (get_linux_version() < 2006028) {
+		/* prior to 2.6.28, KOBJ_CHANGE was not sent when an md array
+		 * was stopped, so We'll do it here just to be sure.  Drop any
+		 * partitions as well...
+		 */
+		if (fd >= 0)
+			ioctl(fd, BLKRRPART, 0);
+		if (mdi)
+			sysfs_uevent(mdi, "change");
+	}
 
 	if (devnm[0] && use_udev()) {
 		struct map_ent *mp = map_by_devnm(&map, devnm);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-16 20:46           ` NeilBrown
  2016-02-16 22:02             ` Jes Sorensen
@ 2016-02-17  7:03             ` Hannes Reinecke
  2016-02-17 13:06               ` Jes Sorensen
  1 sibling, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2016-02-17  7:03 UTC (permalink / raw)
  To: NeilBrown, Jes Sorensen
  Cc: Sebastian Parschauer, linux-raid, Shaohua Li, Brassow Jonathan,
	Artur Paszkiewicz, systemd-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/16/2016 09:46 PM, NeilBrown wrote:
> On Wed, Feb 17 2016, Jes Sorensen wrote:
> 
>> Hannes Reinecke <hare@suse.de> writes:
>>> On 02/16/2016 07:03 PM, Sebastian Parschauer wrote:
>>>> On 16.02.2016 18:41, Jes Sorensen wrote:
>>>>> Sebastian Parschauer
>>>>> <sebastian.riemer@profitbricks.com> writes:
>>>>>> When stopping an MD device, then its device node
>>>>>> /dev/mdX may still exist afterwards or it is
>>>>>> recreated by udev. The next open() call can lead to
>>>>>> creation of an inoperable MD device. The reason for 
>>>>>> this is that a change event (KOBJ_CHANGE) is
>>>>>> announced to udev. So announce a removal event
>>>>>> (KOBJ_REMOVE) to udev instead.
>>>>>> 
>>>>>> This also overrides the change event sent by the
>>>>>> kernel.
>>>>>> 
>>>>>> Signed-off-by: Sebastian Parschauer
>>>>>> <sebastian.riemer@profitbricks.com> --- Manage.c |
>>>>>> 6 +++--- 1 file changed, 3 insertions(+), 3
>>>>>> deletions(-)
>>>>>> 
>>>>>> diff --git a/Manage.c b/Manage.c index
>>>>>> 7e1b94b..bc89764 100644 --- a/Manage.c +++
>>>>>> b/Manage.c @@ -494,13 +494,13 @@ done: goto out; } /*
>>>>>> prior to 2.6.28, KOBJ_CHANGE was not sent when an md
>>>>>> array -	 * was stopped, so We'll do it here just to
>>>>>> be sure.  Drop any -	 * partitions as well... +	 *
>>>>>> was stopped, it should be KOBJ_REMOVE instead, so we
>>>>>> set the +	 * remove event here just to be sure. Drop
>>>>>> any partitions as well... */ if (fd >= 0) ioctl(fd,
>>>>>> BLKRRPART, 0); if (mdi) -		sysfs_uevent(mdi,
>>>>>> "change"); +		sysfs_uevent(mdi, "remove");
>>>>> 
>>>>> I am a little concerned about this change. You assume
>>>>> the kernel and mdadm will be updated in sync, which is
>>>>> unlikely to happen. I believe you need to match the
>>>>> kernel version and send the corresponding event 
>>>>> currectly for this to work correctly?
>>>> 
>>>> The worst thing that can happen is that the kernel sends
>>>> the change event after the remove event. Then it is the
>>>> current situation again. From my tests mdadm does enough
>>>> other stuff in between. Udev is able to handle receiving
>>>> two remove events from my testing. Multiple mdadm 
>>>> instances can't run in parallel any ways. So userspace
>>>> around it needs some serialization for it any ways. So
>>>> also stopping an MD device and assembling a new one with
>>>> the same minor number shouldn't race.
>>>> 
>>>> I still prefer this solution here. But if you decide to
>>>> drop the udev event sending in mdadm, then I'm also fine
>>>> with that.
>>>> 
>>> I strongly prefer removing the udev event generation
>>> altogether. As this appears to be a carry-over from older
>>> kernels, it looks to me as being an incomplete conversion: 
>>> At one point udev introduced 'ONLINE' and 'OFFLINE' events,
>>> which were supposed to be used for this kind of scenario. 
>>> (ONLINE being a companion to 'ADD', and 'OFFLINE' being the
>>> companion to 'DELETE'). However, later the 'ONLINE' got
>>> modified to 'CHANGE', and the 'OFFLINE' got dropped
>>> completely. Or that was the plan. So it looks as if the
>>> conversion to 'CHANGE' got applied to the 'OFFLINE' event,
>>> too. Hence I strongly recommend to drop it completely, and
>>> let the kernel or the MD module decide if and when a uevent
>>> should be send.
>> 
>> I am totally fine with this, however we should make mdadm
>> fail if run against a pre-2.6.28 kernel then.
>> 
>> Cheers, Jes
> 
> I would suggest protecting the
> 
> if (fd >= 0) ioctl(fd, BLKRRPART, 0); if (mdi) 
> sysfs_uevent(mdi, "change");
> 
> code with
> 
> if (get_linux_version() < 2006028)
> 
> That should be completely safe - 2.6.28 and later do this (if
> needed).
> 
+1.

Yes, this is the best solution.

Cheers,

Hannes
- -- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJWxBs6AAoJEGz4yi9OyKjP8a0P/RqXuKmFFOcBXnST1f1ZWq24
gExfzeo8VAQicWi/CBFu+lumePOiypzKfP0NfHw4PDGPYuLQQq6OXRmiQCqhWz/p
EBRZ9a8NyUIjUpra2j66IiMGwh1KGYl9AeIZmvGTNVkNcOySKdJxLWkFnI6wLwvU
YmUez04UFxEleynt5c00ZYvioYnVchVDNEc4/8yTG6jAUg4+6Q7tTw8vvR3wko+K
vmDbUyUz+q8R5tyTjCKB/KWgMPnMUv+wYoZx+jWtpRyUO6a76U9T5if+ZKk4EUkb
NBHy76L6/YxvOhJqAuX9dMiPDADgHVzD5mnzTSzt9HF/ArXBXtEMcaMxhLPYpLTU
lY7FQqzQ5sGQKbo0nm3EHrLjIP1bWe3BKaniyFQG39wlzdhhFGCzJzHnl1KwSnu6
gy+/AuQSibvxUQhD/ZO4+AJjMq1sXLwRlwwPFr/pI8wrcIqFIUuZG0JjY9NY2UQ2
+povgSj4UnXpRS7BKjvmN/VyUIbnXzf8cpB8w2qwxI5nSwKgjhSNz+o3NeCDQqpw
u2E0MIciPDKXb2GnfPA2+Depm8VfcL9uaRNbHmnV9shIlRsQLB9/IzzVA5Cf5T3f
GA9pHLKEM2LHAWqPmUVIghzyTTj5CXsZrH2GdJVyNTc1bymDBKmQbbiO+IVIHg45
XnJ7L15O6ZTXEW1WMNYT
=okXv
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-16 22:02             ` Jes Sorensen
@ 2016-02-17 10:31               ` Sebastian Parschauer
  0 siblings, 0 replies; 20+ messages in thread
From: Sebastian Parschauer @ 2016-02-17 10:31 UTC (permalink / raw)
  To: Jes Sorensen, NeilBrown
  Cc: Hannes Reinecke, linux-raid, Shaohua Li, Brassow Jonathan,
	Artur Paszkiewicz, systemd-devel

[-- Attachment #1: Type: text/plain, Size: 4516 bytes --]

On 16.02.2016 23:02, Jes Sorensen wrote:
> NeilBrown <neilb@suse.com> writes:
>> On Wed, Feb 17 2016, Jes Sorensen wrote:
>>
>>> Hannes Reinecke <hare@suse.de> writes:
>>>> On 02/16/2016 07:03 PM, Sebastian Parschauer wrote:
>>>>> The worst thing that can happen is that the kernel sends the change
>>>>> event after the remove event. Then it is the current situation again.
>>>>>  From my tests mdadm does enough other stuff in between. Udev is able to
>>>>> handle receiving two remove events from my testing. Multiple mdadm
>>>>> instances can't run in parallel any ways. So userspace around it needs
>>>>> some serialization for it any ways. So also stopping an MD device and
>>>>> assembling a new one with the same minor number shouldn't race.
>>>>>
>>>>> I still prefer this solution here. But if you decide to drop the udev
>>>>> event sending in mdadm, then I'm also fine with that.
>>>>>
>>>> I strongly prefer removing the udev event generation altogether.
>>>> As this appears to be a carry-over from older kernels, it looks to me
>>>> as being an incomplete conversion:
>>>> At one point udev introduced 'ONLINE' and 'OFFLINE' events, which were
>>>> supposed to be used for this kind of scenario.
>>>> (ONLINE being a companion to 'ADD', and 'OFFLINE' being the companion
>>>> to 'DELETE'). However, later the 'ONLINE' got modified to 'CHANGE',
>>>> and the 'OFFLINE' got dropped completely.
>>>> Or that was the plan.
>>>> So it looks as if the conversion to 'CHANGE' got applied to the
>>>> 'OFFLINE' event, too.
>>>> Hence I strongly recommend to drop it completely, and let the kernel
>>>> or the MD module decide if and when a uevent should be send.
>>>
>>> I am totally fine with this, however we should make mdadm fail if run
>>> against a pre-2.6.28 kernel then.
>>>
>>> Cheers,
>>> Jes
>>
>> I would suggest protecting the
>>
>> 	if (fd >= 0)
>> 		ioctl(fd, BLKRRPART, 0);
>> 	if (mdi)
>> 		sysfs_uevent(mdi, "change");
>>
>> code with
>>
>>    if (get_linux_version() < 2006028)
>>
>> That should be completely safe - 2.6.28 and later do this (if needed).
> 
> Seems a better fix to me. I much prefer the duplicated events.
> 
> Sebastian, does this patch resolve the problem for you? If nobody
> hollors, I will push this into mdadm.

I've tested this on top of mdadm-3.4 and with vanilla kernel 4.5.0-rc3.
The device nodes are still there. I guess this is because of the change
event from the kernel. This confirms my former tests.

Run 1, 2: one of two device nodes gone
Run 3: both device nodes gone
Run 4: both device nodes are there (again)

It only works if the kernel sends the remove event instead of the change
event as well. So I'm only fine with this if the kernel change gets
accepted as well.

I create two MD devices (md0 and md1) on top of LVM. Then I check if the
creation worked, stop the devices and check the devices again. For a
clean state afterwards, I clean up everything and check devices again in
my script. I have 5s sleep between each action to get final states.

I've attached my test script.

> commit 7856fa44b8f0bc217a6bbcb5f7c51b2f03717655
> Author: Jes Sorensen <Jes.Sorensen@redhat.com>
> Date:   Tue Feb 16 16:58:36 2016 -0500
> 
>     Manage.c: Only issue change events for kernels older then 2.6.28

Shouldn't the prefix be "Manage:" or "Manage/stop:"? Here is a typo.
Should be "than" instead of "then" here.

>     
>     2.6.28+ kernels handle this themselves and issuing the event here can
>     cause a race.
>     
>     Reported-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
>     Suggested-by: NeilBrown <neilb@suse.com>
>     Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
> 
> diff --git a/Manage.c b/Manage.c
> index 7e1b94b..eae96e1 100644
> --- a/Manage.c
> +++ b/Manage.c
> @@ -493,14 +493,17 @@ done:
>  		rv = 1;
>  		goto out;
>  	}
> -	/* prior to 2.6.28, KOBJ_CHANGE was not sent when an md array
> -	 * was stopped, so We'll do it here just to be sure.  Drop any
> -	 * partitions as well...
> -	 */
> -	if (fd >= 0)
> -		ioctl(fd, BLKRRPART, 0);
> -	if (mdi)
> -		sysfs_uevent(mdi, "change");
> +
> +	if (get_linux_version() < 2006028) {
> +		/* prior to 2.6.28, KOBJ_CHANGE was not sent when an md array
> +		 * was stopped, so We'll do it here just to be sure.  Drop any
> +		 * partitions as well...
> +		 */
> +		if (fd >= 0)
> +			ioctl(fd, BLKRRPART, 0);
> +		if (mdi)
> +			sysfs_uevent(mdi, "change");
> +	}
>  
>  	if (devnm[0] && use_udev()) {
>  		struct map_ent *mp = map_by_devnm(&map, devnm);
> 


[-- Attachment #2: test_mdadm_stop.sh.txt --]
[-- Type: text/plain, Size: 836 bytes --]

#!/bin/bash

NUM_VOL=2

remove_devices() {
    for ((i=0;i<${NUM_VOL};i++)); do
        rm -fv /dev/md${i}
    done
}

check_devices() {
    ls -l /dev/disk/by-id/md*
    ls -d /dev/md*
    ls -d /sys/block/md*
    cat /proc/mdstat
}

stop_devices() {
    for ((i=0;i<${NUM_VOL};i++)); do
        mdadm --stop /dev/md${i} 2>&1
    done
}

create_devices() {
    for ((i=0;i<${NUM_VOL};i++)); do
        mdadm --zero-superblock /dev/dm-${i}
        mdadm -C /dev/md${i} -l 1 -n 2 -e 1.2 \
            --run /dev/dm-${i} missing
    done
}

####### MAIN #######

create_devices

sleep 5

echo "==== CREATE CHECK ===="
check_devices

sleep 5

stop_devices

sleep 5

echo "==== CHECK 1 ===="
check_devices

stop_devices
remove_devices

sleep 5

echo "==== CHECK 2 ===="
check_devices

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] md: Inform udev about device removal when stopping
  2016-02-16 20:43     ` NeilBrown
@ 2016-02-17 11:24       ` Sebastian Parschauer
  2016-02-17 22:57         ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Sebastian Parschauer @ 2016-02-17 11:24 UTC (permalink / raw)
  To: NeilBrown, Shaohua Li, linux-raid
  Cc: Jes Sorensen, Brassow Jonathan, Artur Paszkiewicz, systemd-devel

On 16.02.2016 21:43, NeilBrown wrote:
> On Wed, Feb 17 2016, Shaohua Li wrote:
> 
>> On Tue, Feb 16, 2016 at 03:44:36PM +0100, Sebastian Parschauer wrote:
>>> When stopping an MD device, then its device node /dev/mdX may still
>>> exist afterwards or it is recreated by udev. The next open() call
>>> can lead to creation of an inoperable MD device. The reason for
>>> this is that a change event (KOBJ_CHANGE) is announced to udev.
>>> So announce a removal event (KOBJ_REMOVE) to udev instead.
>>>
>>> A change is likely also required in mdadm because of the support
>>> for kernels prior to 2.6.28.
>>
>> I didn't follow why we need the change. Shouldn't the KOBJ_REMOVE event be sent
>> automatically when gendisk is deleted?
>> mddev_put()->mddev_delayed_delete()->md_free()->del_gendisk().
>>
>> Thanks,
>> Shaohua
> 
> For a bit of context: this KOBJ_CHANGE event was added in Oct 2008
> 
> Commit: 934d9c23b4c7 ("md: destroy partitions and notify udev when md array is stopped.")
> 
> At the time, md devices weren't getting removed at all.
> Now they are (I figured out the locking), though they can still come
> back.
> 
> There are still two stages.  The array is stopped, and then the block
> device is destroyed.  It is theoretically possible to stop the array
> without destroying the block device, though I don't think that happens
> in practice.
> 
> So this KOBJ_CHANGE is, I think, technically correct (change from
> "active" to "inactive")  but probably isn't needed any more - not to the
> extent it was at the time.
> 
> There are some annoying races with caused by udev responding (belatedly)
> to events by running programs that open s/dev/mdXX and so automatically
> re-creates the md device.
> The real problem here is not the event or the delays in udev.  It is the
> fact that opening /dev/mdXX transparently creates a device.
> 
> The only way (I know of) to really avoid these races is to use named
> arrays.
> Put
>    CREATE names=yes
> 
> in mdadm.conf.  Then md arrays will be created by writing a name to a
> magic file in /sys.  The arrays have a minor number >=512 and are not
> auto-re-created if the device node is re-opened before udev unlinks it.
> 
> So: the patch might be safe, and might solve a particular problem, but
> it is really just a bandaid.  The best fix is "CREATE named=yes" (and
> use named like "md_home", not "md4".

Older mdadm versions like 3.2.6 have really bad scaling issues as they
search the whole /dev directory with map_dev() for the correct device
and we've hit further issues with the symlinks in /dev/md/. This is why
we've decided to go for the /dev/mdX devices directly as then also the
minor number is clear.

I remember custom commits:
* dev_open: add parameter 'do_map_dev'
* mdopen: don't do 'map_dev' in 'create_mddev' if devname is /dev/mdX

I did a further test: If mdadm and the kernel don't send any uevent when
stopping, then it also works. Might be the best solution.

Cheers,
Sebastian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-17  7:03             ` Hannes Reinecke
@ 2016-02-17 13:06               ` Jes Sorensen
  2016-02-17 13:16                 ` Sebastian Parschauer
  0 siblings, 1 reply; 20+ messages in thread
From: Jes Sorensen @ 2016-02-17 13:06 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: NeilBrown, Sebastian Parschauer, linux-raid, Shaohua Li,
	Brassow Jonathan, Artur Paszkiewicz, systemd-devel

Hannes Reinecke <hare@suse.de> writes:
> On 02/16/2016 09:46 PM, NeilBrown wrote:
>> On Wed, Feb 17 2016, Jes Sorensen wrote:
>>> 
>>> I am totally fine with this, however we should make mdadm
>>> fail if run against a pre-2.6.28 kernel then.
>>> 
>>> Cheers, Jes
>> 
>> I would suggest protecting the
>> 
>> if (fd >= 0) ioctl(fd, BLKRRPART, 0); if (mdi) 
>> sysfs_uevent(mdi, "change");
>> 
>> code with
>> 
>> if (get_linux_version() < 2006028)
>> 
>> That should be completely safe - 2.6.28 and later do this (if
>> needed).
>> 
> +1.
>
> Yes, this is the best solution.

Sebastian indicates it only works if the kernel patch he submitted is
applied too - should we tweak the mdadm version check to match the next
upstream kernel, or stick with it as is here?

Jes


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-17 13:06               ` Jes Sorensen
@ 2016-02-17 13:16                 ` Sebastian Parschauer
  2016-02-17 17:33                   ` Jes Sorensen
  0 siblings, 1 reply; 20+ messages in thread
From: Sebastian Parschauer @ 2016-02-17 13:16 UTC (permalink / raw)
  To: Jes Sorensen, Hannes Reinecke
  Cc: NeilBrown, linux-raid, Shaohua Li, Brassow Jonathan,
	Artur Paszkiewicz, systemd-devel

On 17.02.2016 14:06, Jes Sorensen wrote:
> Hannes Reinecke <hare@suse.de> writes:
>> On 02/16/2016 09:46 PM, NeilBrown wrote:
>>> On Wed, Feb 17 2016, Jes Sorensen wrote:
>>>>
>>>> I am totally fine with this, however we should make mdadm
>>>> fail if run against a pre-2.6.28 kernel then.
>>>>
>>>> Cheers, Jes
>>>
>>> I would suggest protecting the
>>>
>>> if (fd >= 0) ioctl(fd, BLKRRPART, 0); if (mdi) 
>>> sysfs_uevent(mdi, "change");
>>>
>>> code with
>>>
>>> if (get_linux_version() < 2006028)
>>>
>>> That should be completely safe - 2.6.28 and later do this (if
>>> needed).
>>>
>> +1.
>>
>> Yes, this is the best solution.
> 
> Sebastian indicates it only works if the kernel patch he submitted is
> applied too - should we tweak the mdadm version check to match the next
> upstream kernel, or stick with it as is here?

Sorry, it also works if dropping the sending of the change event in the
kernel as well. This seems to be the preferred solution so far. So for
kernels still sending the change event, the problem is not fixed this
way. But your mdadm commit also doesn't make it worse.

Cheers,
Sebastian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-17 13:16                 ` Sebastian Parschauer
@ 2016-02-17 17:33                   ` Jes Sorensen
  0 siblings, 0 replies; 20+ messages in thread
From: Jes Sorensen @ 2016-02-17 17:33 UTC (permalink / raw)
  To: Sebastian Parschauer
  Cc: Hannes Reinecke, NeilBrown, linux-raid, Shaohua Li,
	Brassow Jonathan, Artur Paszkiewicz, systemd-devel

Sebastian Parschauer <sebastian.riemer@profitbricks.com> writes:
> On 17.02.2016 14:06, Jes Sorensen wrote:
>> Hannes Reinecke <hare@suse.de> writes:
>>> On 02/16/2016 09:46 PM, NeilBrown wrote:
>>>> On Wed, Feb 17 2016, Jes Sorensen wrote:
>>>>>
>>>>> I am totally fine with this, however we should make mdadm
>>>>> fail if run against a pre-2.6.28 kernel then.
>>>>>
>>>>> Cheers, Jes
>>>>
>>>> I would suggest protecting the
>>>>
>>>> if (fd >= 0) ioctl(fd, BLKRRPART, 0); if (mdi) 
>>>> sysfs_uevent(mdi, "change");
>>>>
>>>> code with
>>>>
>>>> if (get_linux_version() < 2006028)
>>>>
>>>> That should be completely safe - 2.6.28 and later do this (if
>>>> needed).
>>>>
>>> +1.
>>>
>>> Yes, this is the best solution.
>> 
>> Sebastian indicates it only works if the kernel patch he submitted is
>> applied too - should we tweak the mdadm version check to match the next
>> upstream kernel, or stick with it as is here?
>
> Sorry, it also works if dropping the sending of the change event in the
> kernel as well. This seems to be the preferred solution so far. So for
> kernels still sending the change event, the problem is not fixed this
> way. But your mdadm commit also doesn't make it worse.

Since there is pretty broad agreement on this approach, I have pushed
the fix out for mdadm.

Jes

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] md: Inform udev about device removal when stopping
  2016-02-17 11:24       ` Sebastian Parschauer
@ 2016-02-17 22:57         ` NeilBrown
  0 siblings, 0 replies; 20+ messages in thread
From: NeilBrown @ 2016-02-17 22:57 UTC (permalink / raw)
  To: Sebastian Parschauer, Shaohua Li, linux-raid
  Cc: Jes Sorensen, Brassow Jonathan, Artur Paszkiewicz, systemd-devel

[-- Attachment #1: Type: text/plain, Size: 3908 bytes --]

On Wed, Feb 17 2016, Sebastian Parschauer wrote:

> On 16.02.2016 21:43, NeilBrown wrote:
>> On Wed, Feb 17 2016, Shaohua Li wrote:
>> 
>>> On Tue, Feb 16, 2016 at 03:44:36PM +0100, Sebastian Parschauer wrote:
>>>> When stopping an MD device, then its device node /dev/mdX may still
>>>> exist afterwards or it is recreated by udev. The next open() call
>>>> can lead to creation of an inoperable MD device. The reason for
>>>> this is that a change event (KOBJ_CHANGE) is announced to udev.
>>>> So announce a removal event (KOBJ_REMOVE) to udev instead.
>>>>
>>>> A change is likely also required in mdadm because of the support
>>>> for kernels prior to 2.6.28.
>>>
>>> I didn't follow why we need the change. Shouldn't the KOBJ_REMOVE event be sent
>>> automatically when gendisk is deleted?
>>> mddev_put()->mddev_delayed_delete()->md_free()->del_gendisk().
>>>
>>> Thanks,
>>> Shaohua
>> 
>> For a bit of context: this KOBJ_CHANGE event was added in Oct 2008
>> 
>> Commit: 934d9c23b4c7 ("md: destroy partitions and notify udev when md array is stopped.")
>> 
>> At the time, md devices weren't getting removed at all.
>> Now they are (I figured out the locking), though they can still come
>> back.
>> 
>> There are still two stages.  The array is stopped, and then the block
>> device is destroyed.  It is theoretically possible to stop the array
>> without destroying the block device, though I don't think that happens
>> in practice.
>> 
>> So this KOBJ_CHANGE is, I think, technically correct (change from
>> "active" to "inactive")  but probably isn't needed any more - not to the
>> extent it was at the time.
>> 
>> There are some annoying races with caused by udev responding (belatedly)
>> to events by running programs that open s/dev/mdXX and so automatically
>> re-creates the md device.
>> The real problem here is not the event or the delays in udev.  It is the
>> fact that opening /dev/mdXX transparently creates a device.
>> 
>> The only way (I know of) to really avoid these races is to use named
>> arrays.
>> Put
>>    CREATE names=yes
>> 
>> in mdadm.conf.  Then md arrays will be created by writing a name to a
>> magic file in /sys.  The arrays have a minor number >=512 and are not
>> auto-re-created if the device node is re-opened before udev unlinks it.
>> 
>> So: the patch might be safe, and might solve a particular problem, but
>> it is really just a bandaid.  The best fix is "CREATE named=yes" (and
>> use named like "md_home", not "md4".
>
> Older mdadm versions like 3.2.6 have really bad scaling issues as they
> search the whole /dev directory with map_dev() for the correct device
> and we've hit further issues with the symlinks in /dev/md/. This is why
> we've decided to go for the /dev/mdX devices directly as then also the
> minor number is clear.

Why would anyone care about the minor number?

with 'name=yes', the entries in /dev are e.g. "md_foo" - no symlinks in
/dev (the exact same symlinked are in /dev/md).

If there are scaling issues, we should try to fix them.  Please report
details.

>
> I remember custom commits:
> * dev_open: add parameter 'do_map_dev'
> * mdopen: don't do 'map_dev' in 'create_mddev' if devname is /dev/mdX

Have you posted these?  Please do.

>
> I did a further test: If mdadm and the kernel don't send any uevent when
> stopping, then it also works. Might be the best solution.

I'm glad it works for your test cases, but that doesn't necessarily
means it is correct or sufficient.
I'm not exactly against removing the uevent, but I wouldn't be surprised
if that ends up causing a regression for someone who does things
differently to you.
And as I have said, I think there are other situation, maybe less
common, where udev can get bogged down and end up handling change events after
the array has been destroyed.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
  2016-02-16 15:47 Hannes Reinecke
@ 2016-02-16 16:58 ` Sebastian Parschauer
  0 siblings, 0 replies; 20+ messages in thread
From: Sebastian Parschauer @ 2016-02-16 16:58 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-raid

On 16.02.2016 16:47, Hannes Reinecke wrote:
> Hi Sebastian,
> 
> while it's true in general that a 'change' event should not be sent
> when a device is being removed or deleted, sending a 'remove' event
> from userspace is most definitely wrong, too.
> 'remove' events should be generated from the kernel whenever a
> device disappears from sysfs; it should never be generated from
> userspace (as the device will still be present in sysfs).

Hi Hannes,

thanks for your comment! Only few people actually stop md devices
without rebooting. As a hotfix for running systems it is definitely the
best solution for us at PB to do it in mdadm. Mdadm knows in this
situation that stopping worked.

But I'm also fine with dropping the support for kernels prior to 2.6.28
and the event sending in mdadm completely for the mainline. Sending the
kernel patch to linux-stable is kind of useless if mdadm still sends the
change event.
So your vote seems to be dropping the udev event sending in mdadm and
going for the kernel change.

I'll wait for other comments or votes for the preferred solution.

Cheers,
Sebastian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Manage: Inform udev about device removal when stopping
@ 2016-02-16 15:47 Hannes Reinecke
  2016-02-16 16:58 ` Sebastian Parschauer
  0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2016-02-16 15:47 UTC (permalink / raw)
  To: Sebastian Parschauer; +Cc: linux-raid

Hi Sebastian,

while it's true in general that a 'change' event should not be sent
when a device is being removed or deleted, sending a 'remove' event
from userspace is most definitely wrong, too.
'remove' events should be generated from the kernel whenever a
device disappears from sysfs; it should never be generated from
userspace (as the device will still be present in sysfs).

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-02-17 22:57 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-16 14:44 [PATCH 0/2] md/mdadm: Inform udev about device removal when stopping Sebastian Parschauer
2016-02-16 14:44 ` [PATCH 1/2] md: " Sebastian Parschauer
2016-02-16 20:05   ` Shaohua Li
2016-02-16 20:43     ` NeilBrown
2016-02-17 11:24       ` Sebastian Parschauer
2016-02-17 22:57         ` NeilBrown
2016-02-16 14:44 ` [PATCH 2/2] Manage: " Sebastian Parschauer
2016-02-16 17:41   ` Jes Sorensen
2016-02-16 18:03     ` Sebastian Parschauer
2016-02-16 18:40       ` Hannes Reinecke
2016-02-16 18:52         ` Jes Sorensen
2016-02-16 20:46           ` NeilBrown
2016-02-16 22:02             ` Jes Sorensen
2016-02-17 10:31               ` Sebastian Parschauer
2016-02-17  7:03             ` Hannes Reinecke
2016-02-17 13:06               ` Jes Sorensen
2016-02-17 13:16                 ` Sebastian Parschauer
2016-02-17 17:33                   ` Jes Sorensen
2016-02-16 15:47 Hannes Reinecke
2016-02-16 16:58 ` Sebastian Parschauer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.