All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE
@ 2022-07-14 22:37 Logan Gunthorpe
  2022-07-15  2:20 ` Guoqing Jiang
  2022-07-20 18:59 ` Wols Lists
  0 siblings, 2 replies; 9+ messages in thread
From: Logan Gunthorpe @ 2022-07-14 22:37 UTC (permalink / raw)
  To: linux-raid, Jes Sorensen
  Cc: Song Liu, Christoph Hellwig, Donald Buczek, Guoqing Jiang,
	Xiao Ni, Himanshu Madhani, Mariusz Tkaczyk, Coly Li, Bruce Dubbs,
	Stephen Bates, Martin Oliveira, David Sloan, Logan Gunthorpe

The mdadm command tries to open the md device for most modes, first
thing, no matter what. When running to create or assemble an array,
in most cases, the md device will not exist, the open call will fail
and everything will proceed correctly.

However, when running tests, a create or assembly command may be run
shortly after stopping an array and the old md device file may still
be around. Then, if create_on_open is set in the kernel, a new md
device will be created when mdadm does its initial open.

When mdadm gets around to creating the new device with the new_array
parameter it issues this error:

   mdadm: Fail to create md0 when using
   /sys/module/md_mod/parameters/new_array, fallback to creation via node

This is because an mddev was already created by the kernel with the
earlier open() call and thus the new one being created will fail with
EEXIST. The mdadm command will still successfully be created due to
falling back to the node creation method. However, the error message
itself will fail any test that's running it.

This issue is a race condition that is very rare, but a recent change
in the kernel caused this to happen more frequently: about 1 in 50
times.

To fix this, don't bother trying to open the md device for CREATE and
ASSEMBLE commands, as the file descriptor will never be used anyway
even if it is successfully openned.

Side note: it would be nice to disable create_on_open as well to help
solve this, but it seems the work for this was never finished. By default,
mdadm will create using the old node interface when a name is specified
unless the user specifically puts names=yes in a config file, which
doesn't seem to be very common yet.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 mdadm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mdadm.c b/mdadm.c
index 56722ed997a2..3b886b5c0639 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -1347,8 +1347,7 @@ int main(int argc, char *argv[])
 	 * an md device.  We check that here and open it.
 	 */
 
-	if (mode == MANAGE || mode == BUILD || mode == CREATE ||
-	    mode == GROW || (mode == ASSEMBLE && ! c.scan)) {
+	if (mode == MANAGE || mode == BUILD || mode == GROW) {
 		if (devs_found < 1) {
 			pr_err("an md device must be given in this mode\n");
 			exit(2);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE
  2022-07-14 22:37 [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE Logan Gunthorpe
@ 2022-07-15  2:20 ` Guoqing Jiang
  2022-07-19 11:27   ` Mariusz Tkaczyk
  2022-07-20 18:59 ` Wols Lists
  1 sibling, 1 reply; 9+ messages in thread
From: Guoqing Jiang @ 2022-07-15  2:20 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-raid, Jes Sorensen
  Cc: Song Liu, Christoph Hellwig, Donald Buczek, Xiao Ni,
	Himanshu Madhani, Mariusz Tkaczyk, Coly Li, Bruce Dubbs,
	Stephen Bates, Martin Oliveira, David Sloan



On 7/15/22 6:37 AM, Logan Gunthorpe wrote:
> The mdadm command tries to open the md device for most modes, first
> thing, no matter what. When running to create or assemble an array,
> in most cases, the md device will not exist, the open call will fail
> and everything will proceed correctly.

I guess the BUILD mode doesn't need to create md as well since commit
7f91af49ad09 ("Delay creation of array devices for assemble/build/create").

> However, when running tests, a create or assembly command may be run
> shortly after stopping an array and the old md device file may still
> be around. Then, if create_on_open is set in the kernel, a new md
> device will be created when mdadm does its initial open.
>
> When mdadm gets around to creating the new device with the new_array
> parameter it issues this error:
>
>     mdadm: Fail to create md0 when using
>     /sys/module/md_mod/parameters/new_array, fallback to creation via node
>
> This is because an mddev was already created by the kernel with the
> earlier open() call and thus the new one being created will fail with
> EEXIST. The mdadm command will still successfully be created due to
> falling back to the node creation method. However, the error message
> itself will fail any test that's running it.
>
> This issue is a race condition that is very rare, but a recent change
> in the kernel caused this to happen more frequently: about 1 in 50
> times.
>
> To fix this, don't bother trying to open the md device for CREATE and
> ASSEMBLE commands, as the file descriptor will never be used anyway
> even if it is successfully openned.
>
> Side note: it would be nice to disable create_on_open as well to help
> solve this, but it seems the work for this was never finished. By default,
> mdadm will create using the old node interface when a name is specified
> unless the user specifically puts names=yes in a config file, which
> doesn't seem to be vcreate_on_openery common yet.

Hmm, 'create_on_open' is default to true, not sure if there is any side 
effect
after change to false.

> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   mdadm.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/mdadm.c b/mdadm.c
> index 56722ed997a2..3b886b5c0639 100644
> --- a/mdadm.c
> +++ b/mdadm.c
> @@ -1347,8 +1347,7 @@ int main(int argc, char *argv[])
>   	 * an md device.  We check that here and open it.
>   	 */
>   
> -	if (mode == MANAGE || mode == BUILD || mode == CREATE ||
> -	    mode == GROW || (mode == ASSEMBLE && ! c.scan)) {
> +	if (mode == MANAGE || mode == BUILD || mode == GROW) {
>   		if (devs_found < 1) {
>   			pr_err("an md device must be given in this mode\n");
>   			exit(2);

I think it is a reasonable change.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE
  2022-07-15  2:20 ` Guoqing Jiang
@ 2022-07-19 11:27   ` Mariusz Tkaczyk
  2022-07-19 16:43     ` Logan Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Mariusz Tkaczyk @ 2022-07-19 11:27 UTC (permalink / raw)
  To: Guoqing Jiang
  Cc: Logan Gunthorpe, linux-raid, Jes Sorensen, Song Liu,
	Christoph Hellwig, Donald Buczek, Xiao Ni, Himanshu Madhani,
	Coly Li, Bruce Dubbs, Stephen Bates, Martin Oliveira,
	David Sloan

On Fri, 15 Jul 2022 10:20:26 +0800
Guoqing Jiang <guoqing.jiang@linux.dev> wrote:

> On 7/15/22 6:37 AM, Logan Gunthorpe wrote:
> > The mdadm command tries to open the md device for most modes, first
> > thing, no matter what. When running to create or assemble an array,
> > in most cases, the md device will not exist, the open call will fail
> > and everything will proceed correctly.  
> 
> I guess the BUILD mode doesn't need to create md as well since commit
> 7f91af49ad09 ("Delay creation of array devices for assemble/build/create").
> 
> > However, when running tests, a create or assembly command may be run
> > shortly after stopping an array and the old md device file may still
> > be around. Then, if create_on_open is set in the kernel, a new md
> > device will be created when mdadm does its initial open.
> >
> > When mdadm gets around to creating the new device with the new_array
> > parameter it issues this error:
> >
> >     mdadm: Fail to create md0 when using
> >     /sys/module/md_mod/parameters/new_array, fallback to creation via node
> >
> > This is because an mddev was already created by the kernel with the
> > earlier open() call and thus the new one being created will fail with
> > EEXIST. The mdadm command will still successfully be created due to
> > falling back to the node creation method. However, the error message
> > itself will fail any test that's running it.
> >
> > This issue is a race condition that is very rare, but a recent change
> > in the kernel caused this to happen more frequently: about 1 in 50
> > times.
> >
> > To fix this, don't bother trying to open the md device for CREATE and
> > ASSEMBLE commands, as the file descriptor will never be used anyway
> > even if it is successfully openned.
Hi All,

This is not a fix, it just reduces race probability because file descriptor
will be opened later. I normal environment issue is not likely to happen
because user doesn't create and stop arrays in loop. Issue here
should be resolved in tests, perhaps wait should be sufficient.

I tried to resolve it in the past by adding completion to md driver and force
mdadm --stop command to wait for sysfs clean up but I have never finished it.
IMO it is a better way, wait for device to be fully removed by MD during stop.
Thoughts?

One objection I have here is that error handling is changed, so it could be
harmful change for users. 
> >
> > Side note: it would be nice to disable create_on_open as well to help
> > solve this, but it seems the work for this was never finished. By default,
> > mdadm will create using the old node interface when a name is specified
> > unless the user specifically puts names=yes in a config file, which
> > doesn't seem to be vcreate_on_openery common yet.  
> 
> Hmm, 'create_on_open' is default to true, not sure if there is any side 
> effect
> after change to false.

I'm slowly working on this. The change is not simple, starting from terrible
create_mddev code and char *mddev and char *name rules , ending with hidden
references which we are not aware off (it is common to create temp node and
open mddevice in mdadm) and other references in systemd (because of that, udev
is avoiding to open device).
This also indicate that we should drop partition discover in kernel and use
udev in the future, especially for external metadata (where RO -> RW transition
happens during assembly). But this is a separate problem.

Thanks,
Mariusz
> 
> > Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> > ---
> >   mdadm.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/mdadm.c b/mdadm.c
> > index 56722ed997a2..3b886b5c0639 100644
> > --- a/mdadm.c
> > +++ b/mdadm.c
> > @@ -1347,8 +1347,7 @@ int main(int argc, char *argv[])
> >   	 * an md device.  We check that here and open it.
> >   	 */
> >   
> > -	if (mode == MANAGE || mode == BUILD || mode == CREATE ||
> > -	    mode == GROW || (mode == ASSEMBLE && ! c.scan)) {
> > +	if (mode == MANAGE || mode == BUILD || mode == GROW) {
> >   		if (devs_found < 1) {
> >   			pr_err("an md device must be given in this
> > mode\n"); exit(2);  
> 
> I think it is a reasonable change.
> 
> Thanks,
> Guoqing


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE
  2022-07-19 11:27   ` Mariusz Tkaczyk
@ 2022-07-19 16:43     ` Logan Gunthorpe
  2022-07-20  8:20       ` Mariusz Tkaczyk
  0 siblings, 1 reply; 9+ messages in thread
From: Logan Gunthorpe @ 2022-07-19 16:43 UTC (permalink / raw)
  To: Mariusz Tkaczyk, Guoqing Jiang
  Cc: linux-raid, Jes Sorensen, Song Liu, Christoph Hellwig,
	Donald Buczek, Xiao Ni, Himanshu Madhani, Coly Li, Bruce Dubbs,
	Stephen Bates, Martin Oliveira, David Sloan



On 2022-07-19 05:27, Mariusz Tkaczyk wrote:
> On Fri, 15 Jul 2022 10:20:26 +0800
> Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>> On 7/15/22 6:37 AM, Logan Gunthorpe wrote:
>>> To fix this, don't bother trying to open the md device for CREATE and
>>> ASSEMBLE commands, as the file descriptor will never be used anyway
>>> even if it is successfully openned.
> Hi All,
> 
> This is not a fix, it just reduces race probability because file descriptor
> will be opened later.

That's not correct. The later "open" call actually will use the new_array
parameter which will wait for the workqueue before creating a new array
device, so the old one is properly cleaned up and there is no longer
a race condition with this patch. If new_array doesn't exist and it falls back
to a regular open, then it will still do the right thing and open the device 
with create_on_open.

> I tried to resolve it in the past by adding completion to md driver and force
> mdadm --stop command to wait for sysfs clean up but I have never finished it.
> IMO it is a better way, wait for device to be fully removed by MD during stop.
> Thoughts?

I don't think that would work very well. Userspace would end up blocking
on --stop indefinitely if there are any references delaying cleanup to
the device. That's not very user friendly. Given that opening the md
device has side-effects, I think we should avoid opening when we
should know that we are about to try to create a new device.

> One objection I have here is that error handling is changed, so it could be
> harmful change for users. 

Hmm, yes seems like I was a bit sloppy here. However, it still seems possible
to fix this up by not pre-opening the device. The only use for the mdfd
in CREATE, ASSEMBLE and BUILD is to get the minor number if
ident.super_minor == -2 and check if an existing specified device is an md 
device (which may be redundant). We could replace this with a stat() call to
avoid opening the device. What about the patch at the end of this email?

>>>
>>> Side note: it would be nice to disable create_on_open as well to help
>>> solve this, but it seems the work for this was never finished. By default,
>>> mdadm will create using the old node interface when a name is specified
>>> unless the user specifically puts names=yes in a config file, which
>>> doesn't seem to be vcreate_on_openery common yet.  
>>
>> Hmm, 'create_on_open' is default to true, not sure if there is any side 
>> effect
>> after change to false.
> 
> I'm slowly working on this. The change is not simple, starting from terrible
> create_mddev code and char *mddev and char *name rules , ending with hidden
> references which we are not aware off (it is common to create temp node and
> open mddevice in mdadm) and other references in systemd (because of that, udev
> is avoiding to open device).
> This also indicate that we should drop partition discover in kernel and use
> udev in the future, especially for external metadata (where RO -> RW transition
> happens during assembly). But this is a separate problem.

I'm glad to hear someone is still working on that. Thanks!

Logan

--

diff --git a/mdadm.c b/mdadm.c
index 56722ed997a2..7b757c79d6bf 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -1349,6 +1349,8 @@ int main(int argc, char *argv[])
 
 	if (mode == MANAGE || mode == BUILD || mode == CREATE ||
 	    mode == GROW || (mode == ASSEMBLE && ! c.scan)) {
+		struct stat stb;
+
 		if (devs_found < 1) {
 			pr_err("an md device must be given in this mode\n");
 			exit(2);
@@ -1361,37 +1363,31 @@ int main(int argc, char *argv[])
 			mdfd = open_mddev(devlist->devname, 1);
 			if (mdfd < 0)
 				exit(1);
+
+			fstat(mdfd, &stb);
 		} else {
 			char *bname = basename(devlist->devname);
+			int ret;
 
 			if (strlen(bname) > MD_NAME_MAX) {
 				pr_err("Name %s is too long.\n", devlist->devname);
 				exit(1);
 			}
-			/* non-existent device is OK */
-			mdfd = open_mddev(devlist->devname, 0);
-		}
-		if (mdfd == -2) {
-			pr_err("device %s exists but is not an md array.\n", devlist->devname);
-			exit(1);
-		}
-		if ((int)ident.super_minor == -2) {
-			struct stat stb;
-			if (mdfd < 0) {
+
+			ret = stat(devlist->devname, &stb);
+			if (ident.super_minor == -2 && ret) {
 				pr_err("--super-minor=dev given, and listed device %s doesn't exist.\n",
					devlist->devname);
+				exit(1);
+			}
+
+			if (!ret && !md_array_valid_stat(&stb)) {
+				pr_err("device %s exists but is not an md array.\n", devlist->devname);
 				exit(1);
 			}
-			fstat(mdfd, &stb);
-			ident.super_minor = minor(stb.st_rdev);
-		}
-		if (mdfd >= 0 && mode != MANAGE && mode != GROW) {
-			/* We don't really want this open yet, we just might
-			 * have wanted to check some things
-			 */
-			close(mdfd);
-			mdfd = -1;
 		}
+		if (ident.super_minor == -2)
+			ident.super_minor = minor(stb.st_rdev);
 	}
 
 	if (s.raiddisks) {
diff --git a/mdadm.h b/mdadm.h
index 05ef881f4709..7462336b381c 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -1504,6 +1504,7 @@ extern int Restore_metadata(char *dev, char *dir, struct context *c,
 			    struct supertype *st, int only);
 
 int md_array_valid(int fd);
+int md_array_valid_stat(struct stat *st);
 int md_array_active(int fd);
 int md_array_is_active(struct mdinfo *info);
 int md_get_array_info(int fd, struct mdu_array_info_s *array);
diff --git a/util.c b/util.c
index cc94f96ef120..20acdcf6af22 100644
--- a/util.c
+++ b/util.c
@@ -250,6 +250,23 @@ int md_array_valid(int fd)
 	return !ret;
 }
 
+int md_array_valid_stat(struct stat *st)
+{
+	struct mdinfo *sra;
+	char *devnm;
+
+	devnm = stat2devnm(st);
+	if (!devnm)
+		return 0;
+
+	sra = sysfs_read(-1, devnm, GET_ARRAY_STATE);
+	if (!sra)
+		return 0;
+
+	free(sra);
+	return 1;
+}
+
 int md_array_active(int fd)
 {
 	struct mdinfo *sra;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE
  2022-07-19 16:43     ` Logan Gunthorpe
@ 2022-07-20  8:20       ` Mariusz Tkaczyk
  2022-08-23 13:49         ` Jes Sorensen
  0 siblings, 1 reply; 9+ messages in thread
From: Mariusz Tkaczyk @ 2022-07-20  8:20 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Guoqing Jiang, linux-raid, Jes Sorensen, Song Liu,
	Christoph Hellwig, Donald Buczek, Xiao Ni, Himanshu Madhani,
	Coly Li, Bruce Dubbs, Stephen Bates, Martin Oliveira,
	David Sloan

On Tue, 19 Jul 2022 10:43:06 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> On 2022-07-19 05:27, Mariusz Tkaczyk wrote:
> > On Fri, 15 Jul 2022 10:20:26 +0800
> > Guoqing Jiang <guoqing.jiang@linux.dev> wrote:  
> >> On 7/15/22 6:37 AM, Logan Gunthorpe wrote:  
> >>> To fix this, don't bother trying to open the md device for CREATE and
> >>> ASSEMBLE commands, as the file descriptor will never be used anyway
> >>> even if it is successfully openned.  
> > Hi All,
> > 
> > This is not a fix, it just reduces race probability because file descriptor
> > will be opened later.  
> 
> That's not correct. The later "open" call actually will use the new_array
> parameter which will wait for the workqueue before creating a new array
> device, so the old one is properly cleaned up and there is no longer
> a race condition with this patch. If new_array doesn't exist and it falls back
> to a regular open, then it will still do the right thing and open the device 
> with create_on_open.

Array is created by /sys/module/md/parameters/new_array if chosen_name has
certain form. For IMSM, when we are creating arrays using "/dev/md/name" or
"name" only create_on_open is used (if no "names=yes" in config). I
understand that it works with tests but I don't feel that it is complete yet.
Could you how it behaves when we use "whatever"? 

#mdadm -CR whatever -l0 -n2 /dev/nvme[01]n1

Please do not use --name= parameter.

I want to disable create_on_open and always use new_array in the future, without
fallback to create_on_open possibility. So I would like to have solution which
is not relying on it.
> 
> > I tried to resolve it in the past by adding completion to md driver and
> > force mdadm --stop command to wait for sysfs clean up but I have never
> > finished it. IMO it is a better way, wait for device to be fully removed by
> > MD during stop. Thoughts?  
> 
> I don't think that would work very well. Userspace would end up blocking
> on --stop indefinitely if there are any references delaying cleanup to
> the device. That's not very user friendly. Given that opening the md
> device has side-effects, I think we should avoid opening when we
> should know that we are about to try to create a new device.

Got it, thanks!

Hmm, so maybe the existing MD device should be marked as "in the middle of
removal" somehow to gives mdadm possibility to detect that. If we are using
node as name "/dev/mdX" then we will need to throw error, but when node needs
to be determined by find_free_devnm() then it will simply skip this one and
gives next one. But it will require changes in kernel probably.

> 
> > One objection I have here is that error handling is changed, so it could be
> > harmful change for users.   
> 
> Hmm, yes seems like I was a bit sloppy here. However, it still seems possible
> to fix this up by not pre-opening the device. The only use for the mdfd
> in CREATE, ASSEMBLE and BUILD is to get the minor number if
> ident.super_minor == -2 and check if an existing specified device is an md 
> device (which may be redundant). We could replace this with a stat() call to
> avoid opening the device. What about the patch at the end of this email?

LGTM, I put small comment. But as I said before, probably it don't fix all
creation cases.

Thanks,
Mariusz

> 
> >>>
> >>> Side note: it would be nice to disable create_on_open as well to help
> >>> solve this, but it seems the work for this was never finished. By default,
> >>> mdadm will create using the old node interface when a name is specified
> >>> unless the user specifically puts names=yes in a config file, which
> >>> doesn't seem to be vcreate_on_openery common yet.    
> >>
> >> Hmm, 'create_on_open' is default to true, not sure if there is any side 
> >> effect
> >> after change to false.  
> > 
> > I'm slowly working on this. The change is not simple, starting from terrible
> > create_mddev code and char *mddev and char *name rules , ending with hidden
> > references which we are not aware off (it is common to create temp node and
> > open mddevice in mdadm) and other references in systemd (because of that,
> > udev is avoiding to open device).
> > This also indicate that we should drop partition discover in kernel and use
> > udev in the future, especially for external metadata (where RO -> RW
> > transition happens during assembly). But this is a separate problem.  
> 
> I'm glad to hear someone is still working on that. Thanks!
> 
> Logan
> 
> --
> 
> diff --git a/mdadm.c b/mdadm.c
> index 56722ed997a2..7b757c79d6bf 100644
> --- a/mdadm.c
> +++ b/mdadm.c
> @@ -1349,6 +1349,8 @@ int main(int argc, char *argv[])
>  
>  	if (mode == MANAGE || mode == BUILD || mode == CREATE ||
>  	    mode == GROW || (mode == ASSEMBLE && ! c.scan)) {
> +		struct stat stb;
> +
>  		if (devs_found < 1) {
>  			pr_err("an md device must be given in this mode\n");
>  			exit(2);
> @@ -1361,37 +1363,31 @@ int main(int argc, char *argv[])
>  			mdfd = open_mddev(devlist->devname, 1);
>  			if (mdfd < 0)
>  				exit(1);
> +
> +			fstat(mdfd, &stb);
>  		} else {
>  			char *bname = basename(devlist->devname);
> +			int ret;
>  
>  			if (strlen(bname) > MD_NAME_MAX) {
>  				pr_err("Name %s is too long.\n",
> devlist->devname); exit(1);
>  			}
> -			/* non-existent device is OK */
> -			mdfd = open_mddev(devlist->devname, 0);
> -		}
> -		if (mdfd == -2) {
> -			pr_err("device %s exists but is not an md array.\n",
> devlist->devname);
> -			exit(1);
> -		}
> -		if ((int)ident.super_minor == -2) {
> -			struct stat stb;
> -			if (mdfd < 0) {
> +
> +			ret = stat(devlist->devname, &stb);
> +			if (ident.super_minor == -2 && ret) {
>  				pr_err("--super-minor=dev given, and listed
> device %s doesn't exist.\n", devlist->devname);
> +				exit(1);
> +			}
please check ret != 0.

> +
> +			if (!ret && !md_array_valid_stat(&stb)) {
> +				pr_err("device %s exists but is not an md
> array.\n", devlist->devname); exit(1);
>  			}
> -			fstat(mdfd, &stb);
> -			ident.super_minor = minor(stb.st_rdev);
> -		}
> -		if (mdfd >= 0 && mode != MANAGE && mode != GROW) {
> -			/* We don't really want this open yet, we just might
> -			 * have wanted to check some things
> -			 */
> -			close(mdfd);
> -			mdfd = -1;
>  		}
> +		if (ident.super_minor == -2)
> +			ident.super_minor = minor(stb.st_rdev);
>  	}
>  
>  	if (s.raiddisks) {
> diff --git a/mdadm.h b/mdadm.h
> index 05ef881f4709..7462336b381c 100644
> --- a/mdadm.h
> +++ b/mdadm.h
> @@ -1504,6 +1504,7 @@ extern int Restore_metadata(char *dev, char *dir,
> struct context *c, struct supertype *st, int only);
>  
>  int md_array_valid(int fd);
> +int md_array_valid_stat(struct stat *st);
>  int md_array_active(int fd);
>  int md_array_is_active(struct mdinfo *info);
>  int md_get_array_info(int fd, struct mdu_array_info_s *array);
> diff --git a/util.c b/util.c
> index cc94f96ef120..20acdcf6af22 100644
> --- a/util.c
> +++ b/util.c
> @@ -250,6 +250,23 @@ int md_array_valid(int fd)
>  	return !ret;
>  }
>  
> +int md_array_valid_stat(struct stat *st)
> +{
> +	struct mdinfo *sra;
> +	char *devnm;
> +
> +	devnm = stat2devnm(st);
> +	if (!devnm)
> +		return 0;
> +
> +	sra = sysfs_read(-1, devnm, GET_ARRAY_STATE);
> +	if (!sra)
> +		return 0;
> +
> +	free(sra);
> +	return 1;
> +}
> +
>  int md_array_active(int fd)
>  {
>  	struct mdinfo *sra;


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE
  2022-07-14 22:37 [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE Logan Gunthorpe
  2022-07-15  2:20 ` Guoqing Jiang
@ 2022-07-20 18:59 ` Wols Lists
  1 sibling, 0 replies; 9+ messages in thread
From: Wols Lists @ 2022-07-20 18:59 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-raid, Jes Sorensen

On 14/07/2022 23:37, Logan Gunthorpe wrote:
> Side note: it would be nice to disable create_on_open as well to help
> solve this, but it seems the work for this was never finished. By default,
> mdadm will create using the old node interface when a name is specified
> unless the user specifically puts names=yes in a config file, which
> doesn't seem to be very common yet.

Putting ANYTHING in a config file is not that common.

I suspect there are very many systems out there, like mine, that work 
fine with auto-assemble and no config. I gather there are quite a few 
people out there who insist on having a config file, but IME it's a 
waste of time building one.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE
  2022-07-20  8:20       ` Mariusz Tkaczyk
@ 2022-08-23 13:49         ` Jes Sorensen
  2022-08-23 14:07           ` Mariusz Tkaczyk
  0 siblings, 1 reply; 9+ messages in thread
From: Jes Sorensen @ 2022-08-23 13:49 UTC (permalink / raw)
  To: Mariusz Tkaczyk, Logan Gunthorpe
  Cc: Guoqing Jiang, linux-raid, Jes Sorensen, Song Liu,
	Christoph Hellwig, Donald Buczek, Xiao Ni, Himanshu Madhani,
	Coly Li, Bruce Dubbs, Stephen Bates, Martin Oliveira,
	David Sloan

On 7/20/22 04:20, Mariusz Tkaczyk wrote:
> On Tue, 19 Jul 2022 10:43:06 -0600
> Logan Gunthorpe <logang@deltatee.com> wrote:
> 
>> On 2022-07-19 05:27, Mariusz Tkaczyk wrote:
>>> On Fri, 15 Jul 2022 10:20:26 +0800
>>> Guoqing Jiang <guoqing.jiang@linux.dev> wrote:  
>>>> On 7/15/22 6:37 AM, Logan Gunthorpe wrote:  
>>>>> To fix this, don't bother trying to open the md device for CREATE and
>>>>> ASSEMBLE commands, as the file descriptor will never be used anyway
>>>>> even if it is successfully openned.  
>>> Hi All,
>>>
>>> This is not a fix, it just reduces race probability because file descriptor
>>> will be opened later.  
>>
>> That's not correct. The later "open" call actually will use the new_array
>> parameter which will wait for the workqueue before creating a new array
>> device, so the old one is properly cleaned up and there is no longer
>> a race condition with this patch. If new_array doesn't exist and it falls back
>> to a regular open, then it will still do the right thing and open the device 
>> with create_on_open.
> 
> Array is created by /sys/module/md/parameters/new_array if chosen_name has
> certain form. For IMSM, when we are creating arrays using "/dev/md/name" or
> "name" only create_on_open is used (if no "names=yes" in config). I
> understand that it works with tests but I don't feel that it is complete yet.
> Could you how it behaves when we use "whatever"? 
> 
> #mdadm -CR whatever -l0 -n2 /dev/nvme[01]n1
> 
> Please do not use --name= parameter.
> 
> I want to disable create_on_open and always use new_array in the future, without
> fallback to create_on_open possibility. So I would like to have solution which
> is not relying on it.
>>
>>> I tried to resolve it in the past by adding completion to md driver and
>>> force mdadm --stop command to wait for sysfs clean up but I have never
>>> finished it. IMO it is a better way, wait for device to be fully removed by
>>> MD during stop. Thoughts?  
>>
>> I don't think that would work very well. Userspace would end up blocking
>> on --stop indefinitely if there are any references delaying cleanup to
>> the device. That's not very user friendly. Given that opening the md
>> device has side-effects, I think we should avoid opening when we
>> should know that we are about to try to create a new device.
> 
> Got it, thanks!
> 
> Hmm, so maybe the existing MD device should be marked as "in the middle of
> removal" somehow to gives mdadm possibility to detect that. If we are using
> node as name "/dev/mdX" then we will need to throw error, but when node needs
> to be determined by find_free_devnm() then it will simply skip this one and
> gives next one. But it will require changes in kernel probably.
> 
>>
>>> One objection I have here is that error handling is changed, so it could be
>>> harmful change for users.   
>>
>> Hmm, yes seems like I was a bit sloppy here. However, it still seems possible
>> to fix this up by not pre-opening the device. The only use for the mdfd
>> in CREATE, ASSEMBLE and BUILD is to get the minor number if
>> ident.super_minor == -2 and check if an existing specified device is an md 
>> device (which may be redundant). We could replace this with a stat() call to
>> avoid opening the device. What about the patch at the end of this email?
> 
> LGTM, I put small comment. But as I said before, probably it don't fix all
> creation cases.

Hi Mariusz,

Just to recap on this, do you support applying this patch as is, or
should we wait for the longer term fix you were mentioning?

Thanks,
Jes


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE
  2022-08-23 13:49         ` Jes Sorensen
@ 2022-08-23 14:07           ` Mariusz Tkaczyk
  2022-08-23 14:10             ` Jes Sorensen
  0 siblings, 1 reply; 9+ messages in thread
From: Mariusz Tkaczyk @ 2022-08-23 14:07 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Logan Gunthorpe, Guoqing Jiang, linux-raid, Jes Sorensen,
	Song Liu, Christoph Hellwig, Donald Buczek, Xiao Ni,
	Himanshu Madhani, Coly Li, Bruce Dubbs, Stephen Bates,
	Martin Oliveira, David Sloan

On Tue, 23 Aug 2022 09:49:11 -0400
Jes Sorensen <jes@trained-monkey.org> wrote:

> On 7/20/22 04:20, Mariusz Tkaczyk wrote:
> > On Tue, 19 Jul 2022 10:43:06 -0600
> > Logan Gunthorpe <logang@deltatee.com> wrote:
> >   
> >> On 2022-07-19 05:27, Mariusz Tkaczyk wrote:  
> >>> On Fri, 15 Jul 2022 10:20:26 +0800
> >>> Guoqing Jiang <guoqing.jiang@linux.dev> wrote:    
> >>>> On 7/15/22 6:37 AM, Logan Gunthorpe wrote:    
> >>>>> To fix this, don't bother trying to open the md device for CREATE and
> >>>>> ASSEMBLE commands, as the file descriptor will never be used anyway
> >>>>> even if it is successfully openned.    
> >>> Hi All,
> >>>
> >>> This is not a fix, it just reduces race probability because file
> >>> descriptor will be opened later.    
> >>
> >> That's not correct. The later "open" call actually will use the new_array
> >> parameter which will wait for the workqueue before creating a new array
> >> device, so the old one is properly cleaned up and there is no longer
> >> a race condition with this patch. If new_array doesn't exist and it falls
> >> back to a regular open, then it will still do the right thing and open the
> >> device with create_on_open.  
> > 
> > Array is created by /sys/module/md/parameters/new_array if chosen_name has
> > certain form. For IMSM, when we are creating arrays using "/dev/md/name" or
> > "name" only create_on_open is used (if no "names=yes" in config). I
> > understand that it works with tests but I don't feel that it is complete
> > yet. Could you how it behaves when we use "whatever"? 
> > 
> > #mdadm -CR whatever -l0 -n2 /dev/nvme[01]n1
> > 
> > Please do not use --name= parameter.
> > 
> > I want to disable create_on_open and always use new_array in the future,
> > without fallback to create_on_open possibility. So I would like to have
> > solution which is not relying on it.  
> >>  
> >>> I tried to resolve it in the past by adding completion to md driver and
> >>> force mdadm --stop command to wait for sysfs clean up but I have never
> >>> finished it. IMO it is a better way, wait for device to be fully removed
> >>> by MD during stop. Thoughts?    
> >>
> >> I don't think that would work very well. Userspace would end up blocking
> >> on --stop indefinitely if there are any references delaying cleanup to
> >> the device. That's not very user friendly. Given that opening the md
> >> device has side-effects, I think we should avoid opening when we
> >> should know that we are about to try to create a new device.  
> > 
> > Got it, thanks!
> > 
> > Hmm, so maybe the existing MD device should be marked as "in the middle of
> > removal" somehow to gives mdadm possibility to detect that. If we are using
> > node as name "/dev/mdX" then we will need to throw error, but when node
> > needs to be determined by find_free_devnm() then it will simply skip this
> > one and gives next one. But it will require changes in kernel probably.
> >   
> >>  
> >>> One objection I have here is that error handling is changed, so it could
> >>> be harmful change for users.     
> >>
> >> Hmm, yes seems like I was a bit sloppy here. However, it still seems
> >> possible to fix this up by not pre-opening the device. The only use for
> >> the mdfd in CREATE, ASSEMBLE and BUILD is to get the minor number if
> >> ident.super_minor == -2 and check if an existing specified device is an md 
> >> device (which may be redundant). We could replace this with a stat() call
> >> to avoid opening the device. What about the patch at the end of this
> >> email?  
> > 
> > LGTM, I put small comment. But as I said before, probably it don't fix all
> > creation cases.  
> 
> Hi Mariusz,
> 
> Just to recap on this, do you support applying this patch as is, or
> should we wait for the longer term fix you were mentioning?
> 
Hi Jes,
This patch looks good. Please apply next one version:
https://lore.kernel.org/linux-raid/20220727215246.121365-3-logang@deltatee.com/

Thanks,
Mariusz

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE
  2022-08-23 14:07           ` Mariusz Tkaczyk
@ 2022-08-23 14:10             ` Jes Sorensen
  0 siblings, 0 replies; 9+ messages in thread
From: Jes Sorensen @ 2022-08-23 14:10 UTC (permalink / raw)
  To: Mariusz Tkaczyk
  Cc: Logan Gunthorpe, linux-raid, Jes Sorensen, Song Liu, Coly Li

On 8/23/22 10:07, Mariusz Tkaczyk wrote:
> On Tue, 23 Aug 2022 09:49:11 -0400
> Jes Sorensen <jes@trained-monkey.org> wrote:

>> Hi Mariusz,
>>
>> Just to recap on this, do you support applying this patch as is, or
>> should we wait for the longer term fix you were mentioning?
>>
> Hi Jes,
> This patch looks good. Please apply next one version:
> https://lore.kernel.org/linux-raid/20220727215246.121365-3-logang@deltatee.com/

Great, thanks for the quick response.

Applied!

Thanks,
Jes


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-08-23 17:16 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-14 22:37 [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE Logan Gunthorpe
2022-07-15  2:20 ` Guoqing Jiang
2022-07-19 11:27   ` Mariusz Tkaczyk
2022-07-19 16:43     ` Logan Gunthorpe
2022-07-20  8:20       ` Mariusz Tkaczyk
2022-08-23 13:49         ` Jes Sorensen
2022-08-23 14:07           ` Mariusz Tkaczyk
2022-08-23 14:10             ` Jes Sorensen
2022-07-20 18:59 ` Wols Lists

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.