* mdadm ignoring homehost?
@ 2009-03-24 16:57 Jon Nelson
  2009-04-01 15:15 ` Jon Nelson
  2009-04-01 22:47 ` Michal Soltys
  0 siblings, 2 replies; 59+ messages in thread
From: Jon Nelson @ 2009-03-24 16:57 UTC
  To: LinuxRaid

I have a raid1 comprised of a local physical device (/dev/sda) and a
network block device (/dev/nbd0).
When the machine hosting the network block device comes up, however,
it creates /dev/md127.
Why?

On the machine hosting the network block device, /dev/sdb is what
backs /dev/nbd0.
This is physical storage for /dev/nbd0:

frank:~ # mdadm --examine /dev/sdb
/dev/sdb:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420
           Name : turnip:11
  Creation Time : Mon Dec 15 07:06:13 2008
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 160086384 (76.34 GiB 81.96 GB)
     Array Size : 156247976 (74.50 GiB 80.00 GB)
  Used Dev Size : 156247976 (74.50 GiB 80.00 GB)
   Super Offset : 160086512 sectors
          State : clean
    Device UUID : 01524a75:c309869c:6da972c9:084115c6

Internal Bitmap : 2 sectors from superblock
      Flags : write-mostly
    Update Time : Tue Mar 24 11:41:41 2009
       Checksum : 643e99c0 - correct
         Events : 111338


    Array Slot : 2 (failed, failed, empty, 1)
   Array State : _u 2 failed
frank:~ #


As you can see, the "Name" attribute is "turnip:11". The hostname is
"frank". Why did frank bring up the device?
The only thing in frank's /etc/mdadm.conf is "HOMEHOST frank" which I
didn't think was necessary anyway.



-- 
Jon


* Re: mdadm ignoring homehost?
  2009-03-24 16:57 mdadm ignoring homehost? Jon Nelson
@ 2009-04-01 15:15 ` Jon Nelson
  2009-04-01 22:46   ` Neil Brown
  2009-04-01 22:47 ` Michal Soltys
  1 sibling, 1 reply; 59+ messages in thread
From: Jon Nelson @ 2009-04-01 15:15 UTC
  To: LinuxRaid

ping?

On Tue, Mar 24, 2009 at 11:57 AM, Jon Nelson
<jnelson-linux-raid@jamponi.net> wrote:
> ...



--
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: mdadm ignoring homehost?
  2009-04-01 15:15 ` Jon Nelson
@ 2009-04-01 22:46   ` Neil Brown
  2009-04-06 14:47     ` [Patch] " Doug Ledford
  0 siblings, 1 reply; 59+ messages in thread
From: Neil Brown @ 2009-04-01 22:46 UTC
  To: Jon Nelson; +Cc: LinuxRaid

On Wednesday April 1, jnelson-linux-raid@jamponi.net wrote:
> ping?

Oh yeah, that's right, I was going to reply to that - thanks for the
reminder. 

> 
> On Tue, Mar 24, 2009 at 11:57 AM, Jon Nelson
> <jnelson-linux-raid@jamponi.net> wrote:
> >
> > I have a raid1 comprised of a local physical device (/dev/sda) and a
> > network block device (/dev/nbd0).
> > When the machine hosting the network block device comes up, however,
> > it creates /dev/md127.
> > Why?

Because you cannot please all the people, all the time.

People seem to want their arrays to auto-assemble - you know, just
appear and do the right thing, read their mind probably, because
creating config files is too hard.
So I've endeavoured to make that happen.

The biggest problem with auto-assembly is what to do if two arrays
claim to have the same name (e.g. /dev/md0): which one wins?
The 'homehost' is (currently) used to resolve that.  An array only
gets to use the name it claims to have if it can show that it belongs
to "this" host.  If it doesn't, it still gets assembled, but with some
other, more generic name.
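The rule can be written as a small C predicate. This is a sketch, not mdadm's code: struct array_info and keeps_claimed_name are hypothetical, and the in_conf case assumes that a matching ARRAY line in mdadm.conf also lets an array keep its name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of what mdadm learns about one array. */
struct array_info {
    const char *name;     /* name part of the superblock name, e.g. "11" */
    const char *homehost; /* host part, e.g. "turnip"; NULL if none */
    bool in_conf;         /* true if an ARRAY line in mdadm.conf matches */
};

/* True if the array may use the name it claims; otherwise it is still
 * assembled, but under some other, more generic name. */
static bool keeps_claimed_name(const struct array_info *a, const char *this_host)
{
    if (a->in_conf)
        return true;
    return a->homehost != NULL && strcmp(a->homehost, this_host) == 0;
}
```

With Jon's array (superblock name "turnip:11") seen on frank and not listed in frank's mdadm.conf, the predicate is false, so the array still comes up, just under a generic name such as md127.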

But you want to actually stop some arrays from being assembled on a
particular host, in this case because the device is shared with
another host which "owns" the array.  mdadm cannot currently cope with
that.   But we have the source Luke!

Maybe I could make a three-way distinction with the 'homehost':
  - this host
  - no host specified
  - some other host

and only auto-assemble the first two. I suspect that would
inconvenience someone else though....
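The three-way distinction could look roughly like this C sketch. The enum and function names are hypothetical, and a NULL or empty string stands for "no host specified".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum host_match { HOST_THIS, HOST_NONE, HOST_OTHER };

/* recorded is the homehost stored in the superblock, or NULL/"" if none. */
static enum host_match classify_homehost(const char *recorded,
                                         const char *this_host)
{
    if (recorded == NULL || recorded[0] == '\0')
        return HOST_NONE;
    return strcmp(recorded, this_host) == 0 ? HOST_THIS : HOST_OTHER;
}

/* Auto-assemble only arrays for this host or for no particular host. */
static bool should_auto_assemble(const char *recorded, const char *this_host)
{
    return classify_homehost(recorded, this_host) != HOST_OTHER;
}
```

Under this scheme frank would leave "turnip:11" alone, while arrays with no recorded host would still be assembled everywhere.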

For now, my only suggestion is to provide a "DEVICE" line in your
mdadm.conf which lists all the devices that you do want assembled into
arrays, but excludes /dev/sdb.
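DEVICE lines in mdadm.conf accept shell-style wildcards (e.g. `DEVICE /dev/sda*`), so the idea is that /dev/sdb simply never matches. The C sketch below shows that kind of filtering with POSIX fnmatch(3); device_allowed and the pattern list are hypothetical, and mdadm's own matching may differ in detail.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* True if devname matches any of the n glob patterns, i.e. it is a
 * device that may be scanned for md superblocks. */
static bool device_allowed(const char *devname,
                           const char *const *patterns, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (fnmatch(patterns[i], devname, 0) == 0)
            return true;
    return false;
}
```

With a hypothetical list of { "/dev/sda*", "/dev/sdc*" } on frank, /dev/sda1 is scanned and /dev/sdb is skipped.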

NeilBrown




* Re: mdadm ignoring homehost?
  2009-03-24 16:57 mdadm ignoring homehost? Jon Nelson
  2009-04-01 15:15 ` Jon Nelson
@ 2009-04-01 22:47 ` Michal Soltys
  1 sibling, 0 replies; 59+ messages in thread
From: Michal Soltys @ 2009-04-01 22:47 UTC
  To: Jon Nelson; +Cc: LinuxRaid

Jon Nelson wrote:
> I have a raid1 comprised of a local physical device (/dev/sda) and a
> network block device (/dev/nbd0).
> When the machine hosting the network block device comes up, however,
> it creates /dev/md127.
> Why?
> 

Likely due to udev. Stock udev rules for md devices create nodes using
the names as seen by the kernel. When mdadm assembles an array with a
non-standard name, a suitable kernel name will be chosen, often md127
or md_d127 depending on the scheme (regular, or legacy partitionable).
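A minimal sketch of why the fallback name is so often md127, assuming the free-minor search starts at 127 and counts down. That assumption fits the remark above but is not taken from mdadm's source, and pick_free_minor is hypothetical.

```c
#include <stdbool.h>

/* Returns the highest md minor in 0..127 that is not in use, or -1 if all
 * are taken. in_use[m] is true when /dev/md<m> already exists; how that is
 * discovered is left out of this sketch. */
static int pick_free_minor(const bool in_use[128])
{
    for (int m = 127; m >= 0; m--)
        if (!in_use[m])
            return m;
    return -1;
}
```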

> On the machine hosting the network block device, /dev/sdb is what
> backs /dev/nbd0.
> This is physical storage for /dev/nbd0:
> 
> ...
> 
> As you can see, the "Name" attribute is "turnip:11". The hostname is
> "frank". Why did frank bring up the device?

Homehost is secondary to ARRAY: if you have an ARRAY line and all the
metadata matches (or the appropriate switches are given on the command
line), the array will be assembled regardless of the hostname.

> The only thing in frank's /etc/mdadm.conf is "HOMEHOST frank" which I
> didn't think was necessary anyway.
> 

If the only thing there is that HOMEHOST line, and you're trying to
assemble using just mdadm -As, then the assembly shouldn't succeed, afaik.


* Re: [Patch] mdadm ignoring homehost?
  2009-04-01 22:46   ` Neil Brown
@ 2009-04-06 14:47     ` Doug Ledford
  2009-04-06 19:33       ` Luca Berra
  2009-04-17  3:49       ` Neil Brown
  0 siblings, 2 replies; 59+ messages in thread
From: Doug Ledford @ 2009-04-06 14:47 UTC
  To: Neil Brown; +Cc: Jon Nelson, LinuxRaid



On Apr 1, 2009, at 6:46 PM, Neil Brown wrote:

> On Wednesday April 1, jnelson-linux-raid@jamponi.net wrote:
>> ping?
>
> Oh yeah, that's right, I was going to reply to that - thanks for the
> reminder.
>
>>
>> On Tue, Mar 24, 2009 at 11:57 AM, Jon Nelson
>> <jnelson-linux-raid@jamponi.net> wrote:
>>>
>>> I have a raid1 comprised of a local physical device (/dev/sda) and a
>>> network block device (/dev/nbd0).
>>> When the machine hosting the network block device comes up, however,
>>> it creates /dev/md127.
>>> Why?
>
> Because you cannot please all the people, all the time.

Very true.

>
> People seem to want their arrays to auto-assemble - you know, just
> appear and do the right thing, read their mind probably, because
> creating config files is too hard.
> So I've endeavoured to make that happen.
>
> The biggest problem with auto-assembly is what to do if two arrays
> claim to have the same name. (e.g. /dev/md0) - which one wins.
> The 'homehost' is (currently) used to resolve that.  An array only
> gets to use the name it claims to have if it can show that it belongs
> to "this" host.  If it doesn't, it still gets assembled, but with some
> other more generic name.

FWIW, I happen to disagree with this method.  And I'm currently  
testing out a new algorithm for this in Fedora 11 beta.

The logic behind this in mdadm-3.0devel3 is basically "if the array  
exists in mdadm.conf or if it has this homehost, assemble using normal  
name, else use a random name".  However, in the world of movable  
arrays (think one of those 5 disk SATA raid towers that just has a  
single eSATA port and a port replicator, which can easily be moved  
from machine to machine), this doesn't work so well.  The problem is  
that when you assemble an array with a random number, you confuse  
users.  They might find the array eventually, but it's certainly not  
as easy as if the array used the name they expected.  In an attempt to  
get mdadm to not possibly conflict with local array names, the  
homehost method of selecting which array name to use causes confusion  
all the time, instead of only confusing users when a conflict actually  
occurs.  This doesn't make sense to me, so I redid the tests in mdadm  
to change this (this is exacerbated by the fact that if your array  
does not define a homehost, it gets treated as though it has a  
different homehost, so common version 0.90 arrays will always get  
assembled as a random number if they aren't in the mdadm.conf file  
whether they are meant for this host or not).

So, my logic goes like this:

Does the array match an array in mdadm.conf via uuid?  If yes, use name
from mdadm.conf.  If no, does the array match an entry in mdadm.conf  
via the standard super-minor/name mapping?  If yes, and that array  
line contains a uuid that doesn't match this entry, then use a random  
name because this is likely a conflict.  If yes and that line does not  
contain a uuid entry, then this is likely a match, but a poor one.   
Use the name, but don't like it.  If no, then this array didn't match  
the mdadm.conf file at all and is likely a foreign array.  However, if  
there is no mdadm.conf file, or if there is a mdadm.conf file and  
nothing in it used our name, then foreign or not, it likely won't  
conflict on name, so go ahead and use the standard name for this device.
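That decision order can be sketched in C as below. struct conf_match, the enum, and decide_name are hypothetical names for illustration; the real change lives in the attached patch. Both "no match" cases (no mdadm.conf at all, or an mdadm.conf that never uses this name) end in the same outcome, so the sketch merges them.

```c
#include <stdbool.h>

enum name_choice { USE_CONF_NAME, USE_OWN_NAME, USE_RANDOM_NAME };

/* Hypothetical summary of how one array relates to mdadm.conf. */
struct conf_match {
    bool uuid_match;         /* some ARRAY line carries this array's uuid */
    bool name_match;         /* some ARRAY line uses this array's name */
    bool name_line_has_uuid; /* that name-matching line also lists a uuid */
};

static enum name_choice decide_name(const struct conf_match *m)
{
    if (m->uuid_match)
        return USE_CONF_NAME;       /* strong match: name from mdadm.conf */
    if (m->name_match) {
        if (m->name_line_has_uuid)
            return USE_RANDOM_NAME; /* that line describes another array */
        return USE_CONF_NAME;       /* weak match: use it, but don't like it */
    }
    /* Nothing in mdadm.conf claims this name, so it is unlikely to
     * conflict: keep the name stored in the superblock. */
    return USE_OWN_NAME;
}
```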

I had to modify the match loop to store both uuid and name matches  
separately in order to support this logic.  There were some other changes
that were necessary in order to make it work properly, and I had to  
change mdopen.c to automatically go from what we thought was a good  
name to a random name if a conflict on an array happens in order to  
avoid failed autoassembles.  However, I'm personally much happier with  
the results.  For example, I can define md0 in the mdadm.conf file,  
create two different md0 arrays, then attempt to autoassemble the one  
that isn't in mdadm.conf and it will automatically get a random name  
and when the one that is in mdadm.conf shows up it gets the right  
name.  I can also define two md0 arrays with neither of them in the
mdadm.conf file and it will assemble the first as md0 and the second
as md0_0 with a random minor (I think; it's been a week or so
since I did that testing).  Anyway, it works well, and it basically  
negates the need for homehost in my opinion.  And the fact that it  
only assembles an array with a random number when it truly needs to is  
something that will help to greatly reduce confusion of users, which  
is always a plus in my book.  I'll attach the patch for your review.   
I could have shortened the logic in the match tests to just what's  
needed to set things right, but I left the long version so people can  
see all the possible options and why a specific setting is chosen on  
any given option.  Oh, and the patch also loosens up the name matching  
somewhat so that if someone names their device /dev/md0, that matches  
super-minor 0, as do md0 and just plain 0.  The original match
setup, at least for devices not in the mdadm.conf file with a name in  
the array line, would only match the array name if it was numeric only  
(aka, homehost:0 or just 0).  I found that to be overly restrictive  
and contrary to what a lot of people would expect should be entered in  
the name field of the superblock.
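The loosened matching amounts to comparing the trailing numbers of the two names. The C sketch below illustrates the idea; trailing_number and same_minor are hypothetical helpers, and they are deliberately looser than compare_array_name in the patch (any trailing digits count, so e.g. "foo0" would also be treated as minor 0).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Number formed by the trailing digits of name, or -1 if there are none.
 * "/dev/md0", "md0" and "0" all yield 0. */
static long trailing_number(const char *name)
{
    const char *end = name + strlen(name);
    const char *p = end;
    while (p > name && isdigit((unsigned char)p[-1]))
        p--;
    if (p == end)
        return -1;
    return strtol(p, NULL, 10);
}

/* True if both names end in the same number (hostname prefixes such as
 * "turnip:" are ignored because only the trailing digits are read). */
static bool same_minor(const char *conf_name, const char *sb_name)
{
    long a = trailing_number(conf_name);
    return a >= 0 && a == trailing_number(sb_name);
}
```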

Since I'm sending this anyway, I'll send a couple other changes I made  
to our mdadm in separate mails.


[-- Attachment #1.2: mdadm-3.0-foreign.patch --]
[-- Type: application/octet-stream, Size: 9968 bytes --]

--- mdadm-3.0-devel3/Incremental.c.foreign	2009-03-20 17:49:20.000000000 -0400
+++ mdadm-3.0-devel3/Incremental.c	2009-03-20 21:19:50.000000000 -0400
@@ -29,12 +29,14 @@
  */
 
 #include	"mdadm.h"
+#include	<ctype.h>
 
 static int count_active(struct supertype *st, int mdfd, char **availp,
 			struct mdinfo *info);
 static void find_reject(int mdfd, struct supertype *st, struct mdinfo *sra,
 			int number, __u64 events, int verbose,
 			char *array_name);
+static int compare_array_name(char *conf_name, char *sb_name);
 
 int Incremental(char *devname, int verbose, int runstop,
 		struct supertype *st, char *homehost, int autof)
@@ -48,8 +50,10 @@ int Incremental(char *devname, int verbo
 	 * 2/ Find metadata, reject if none appropriate (check
 	 *       version/name from args)
 	 * 3/ Check if there is a match in mdadm.conf
-	 * 3a/ if not, check for homehost match.  If no match, assemble as
-	 *    a 'foreign' array.
+	 * 3a/ Evaluate the quality of match and whether or not we have a
+	 * 	 conf file at all, and make a decision about whether or not
+	 * 	 to allow this array to keep its preferred name based upon 
+	 * 	 that
 	 * 4/ Determine device number.
 	 * - If in mdadm.conf with std name, use that
 	 * - UUID in /dev/md/mdadm.map  use that
@@ -78,7 +82,7 @@ int Incremental(char *devname, int verbo
 	 */
 	struct stat stb;
 	struct mdinfo info;
-	struct mddev_ident_s *array_list, *match;
+	struct mddev_ident_s *array_list, *match, *match_uuid, *match_name;
 	char chosen_name[1024];
 	int rv;
 	struct map_ent *mp, *map = NULL;
@@ -148,26 +152,42 @@ int Incremental(char *devname, int verbo
 	st->ss->getinfo_super(st, &info);
 	/* 3/ Check if there is a match in mdadm.conf */
 
+	name_to_use = strchr(info.name, ':');
+	if (name_to_use)
+		name_to_use++;
+	else
+		name_to_use = info.name;
 	array_list = conf_get_ident(NULL);
 	match = NULL;
+	match_uuid = NULL;
+	match_name = NULL;
 	for (; array_list; array_list = array_list->next) {
+		/* Check for matching uuid, then drop through to check and see
+		 * if we also have a matching name, and to catch cases of
+		 * matching names without a corresponding uuid match */
 		if (array_list->uuid_set &&
 		    same_uuid(array_list->uuid, info.uuid, st->ss->swapuuid)
-		    == 0) {
-			if (verbose >= 2 && array_list->devname)
+		    != 0)
+			match_uuid = array_list;
+		else if (array_list->uuid_set && verbose >= 2 &&
+			 array_list->devname)
 				fprintf(stderr, Name
 					": UUID differs from %s.\n",
 					array_list->devname);
-			continue;
-		}
+		/* If we match name, save it off separately so we can tell if
+		 * we matched uuid, name, or both, and if both, if they were
+		 * the same entry */
 		if (array_list->name[0] &&
-		    strcasecmp(array_list->name, info.name) != 0) {
-			if (verbose >= 2 && array_list->devname)
+		    compare_array_name(array_list->name, info.name))
+			match_name = array_list;
+		else if (array_list->name[0] && verbose >= 2 &&
+			 array_list->devname)
 				fprintf(stderr, Name
 					": Name differs from %s.\n",
 					array_list->devname);
+		if ((!match_uuid || match == match_uuid) &&
+		    (!match_name || match == match_name))
 			continue;
-		}
 		if (array_list->devices &&
 		    !match_oneof(array_list->devices, devname)) {
 			if (verbose >= 2 && array_list->devname)
@@ -197,7 +217,13 @@ int Incremental(char *devname, int verbo
 		/* FIXME, should I check raid_disks and level too?? */
 
 		if (match) {
-			if (verbose >= 0) {
+			if (match_uuid != match_name) {
+				if (match_uuid->devname)
+					fprintf(stderr, Name ": more than one "
+						"match for %s, using the UUID "
+						"match\n", match_uuid->devname);
+				match = match_uuid;
+			} else if (verbose >= 0) {
 				if (match->devname && array_list->devname)
 					fprintf(stderr, Name
 		   ": we match both %s and %s - cannot decide which to use.\n",
@@ -205,23 +231,52 @@ int Incremental(char *devname, int verbo
 				else
 					fprintf(stderr, Name
 						": multiple lines in mdadm.conf match\n");
+				return 2;
 			}
-			return 2;
 		}
 		match = array_list;
 	}
 
-	/* 3a/ if not, check for homehost match.  If no match, continue
-	 * but don't trust the 'name' in the array. Thus a 'random' minor
-	 * number will be assigned, and the device name will be based
-	 * on that. */
-	if (match)
+	/* 3a/ Decide if we got a good match, two matches, no matches, or a
+	 * likely foreign match.  I dropped the homehost test entirely because
+	 * it didn't seem to add any value whatsoever above and beyond what
+	 * these tests can do. */
+	if (match && match_uuid == match_name) {
+		/* found in conf, both name and uuid match */
 		trustworthy = LOCAL;
-	else if (homehost == NULL ||
-		 st->ss->match_home(st, homehost) != 1)
-		trustworthy = FOREIGN;
-	else
+	} else if (match_uuid && match_name) {
+		/* found both a name and a uuid match, but not on the same
+		 * entry, so prefer the uuid match (done above) */
 		trustworthy = LOCAL;
+	} else if (!match_uuid && match_name) {
+		/* no uuid match, but name match */
+		if (match_name->uuid_set) {
+			/* oops, name that matched had a uuid, it just wasn't
+			 * right, assume there is a local device with both
+			 * a matching name and uuid, so this needs a random
+			 * name */
+			trustworthy = FOREIGN;
+			match = NULL;
+		} else
+			/* matched name, and the matching entry in conf file
+			 * didn't include a uuid, and this uuid never showed
+			 * up anywhere else in the conf file, so consider it
+			 * a soft match and allow it...although users should
+			 * *REALLY* include the uuid on array lines in the
+			 * conf file */
+			trustworthy = LOCAL;
+	} else { /* no match at all */
+		if (!conf_exists())
+			/* If we don't even have a conf file, this is foreign,
+			 * but also not likely to conflict with anything
+			 * local, so let it keep its preferred name */
+			trustworthy = LOCAL;
+		else
+			/* We have a conf file, this didn't match any uuids
+			 * or names, so also not likely to conflict, let it
+			 * keep its own name */
+			trustworthy = LOCAL;
+	}
 
 	/* There are three possible sources for 'autof':  command line,
 	 * ARRAY line in mdadm.conf, or CREATE line in mdadm.conf.
@@ -240,11 +295,6 @@ int Incremental(char *devname, int verbo
 		return Incremental_container(st, devname, verbose, runstop,
 					     autof, trustworthy);
 	}
-	name_to_use = strchr(info.name, ':');
-	if (name_to_use)
-		name_to_use++;
-	else
-		name_to_use = info.name;
 
 	if ((!name_to_use || name_to_use[0] == 0) &&
 	    info.array.level == LEVEL_CONTAINER &&
@@ -797,3 +847,45 @@ int Incremental_container(struct superty
 	map_unlock(&map);
 	return 0;
 }
+
+static int compare_array_name(char *conf_name, char *sb_name)
+{
+	char *cptr, *sptr;
+	int conf_num = -1;
+
+	/* usage of the name variable in the superblock comes in several
+	 * flavors:
+	 * A) full md pathname (/dev/md0)
+	 * B) just the md name (md0)
+	 * C) just the md number (0)
+	 * D) all of the above, but with hostname: prefixed to it
+	 *
+	 * Depending on which of those variants we have, we need to alter
+	 * how we attempt to match the array name in the mdadm.conf file
+	 * which is always a full pathname.  We don't match on hostname:
+	 * though, so eliminate it from the equation.
+	 */
+
+	if ((sptr = strchr(sb_name, ':')) == NULL)
+		sptr = sb_name;
+	else
+		sptr++;
+
+	/* Do we have a full pathname in the superblock name field? */
+	if (strchr(sptr, '/'))
+		return !strcasecmp(conf_name, sptr);
+	/* If not, is it just a number or an md device name? */
+	else if (isdigit(sptr[0])) {
+		cptr = conf_name + strlen(conf_name);
+		while (cptr > conf_name && isdigit(cptr[-1]))
+			cptr--;
+		if (cptr[0])
+			conf_num = strtoul(cptr, NULL, 10);
+		return conf_num == strtoul(sptr, NULL, 10);
+	} /* fall through else, it's a device name but not a full path */
+
+	cptr = strcasestr(conf_name, sptr);
+	if (cptr)
+		return !strcasecmp(cptr, sptr);
+	return 0;
+}
--- mdadm-3.0-devel3/mdadm.h.foreign	2009-03-10 01:39:41.000000000 -0400
+++ mdadm-3.0-devel3/mdadm.h	2009-03-20 17:49:20.000000000 -0400
@@ -785,6 +785,7 @@ extern mddev_dev_t conf_get_devs(void);
 extern int conf_test_dev(char *devname);
 extern struct createinfo *conf_get_create_info(void);
 extern void set_conffile(char *file);
+extern int conf_exists(void);
 extern char *conf_get_mailaddr(void);
 extern char *conf_get_mailfrom(void);
 extern char *conf_get_program(void);
--- mdadm-3.0-devel3/mdopen.c.foreign	2009-03-20 19:02:38.000000000 -0400
+++ mdadm-3.0-devel3/mdopen.c	2009-03-20 19:02:43.000000000 -0400
@@ -159,7 +159,6 @@ int create_mddev(char *dev, char *name, 
 	strcpy(chosen, "/dev/md/");
 	cname = chosen + strlen(chosen);
 
-
 	if (dev) {
 		
 		if (strncmp(dev, "/dev/md/", 8) == 0) {
@@ -240,12 +239,14 @@ int create_mddev(char *dev, char *name, 
 	if (num < 0 && trustworthy == LOCAL && name) {
 		/* if name is numeric, use that for num
 		 * if it is not already in use */
-		char *ep;
-		num = strtoul(name, &ep, 10);
-		if (ep == name || *ep)
-			num = -1;
-		else if (mddev_busy(use_mdp ? (-1-num) : num))
-			num = -1;
+		char *e = name + strlen(name);
+		while (e > name && isdigit(e[-1]))
+			e--;
+		if (e[0]) {
+			num = strtoul(e, NULL, 10);
+			if (mddev_busy(use_mdp ? (-1-num) : num))
+				num = -1;
+		}
 	}
 
 	if (num < 0) {
--- mdadm-3.0-devel3/config.c.foreign	2009-03-10 01:39:41.000000000 -0400
+++ mdadm-3.0-devel3/config.c	2009-03-20 17:49:20.000000000 -0400
@@ -637,7 +637,7 @@ void homehostline(char *line)
 	}
 }
 
-
+int exists = 0;
 int loaded = 0;
 
 static char *conffile = NULL;
@@ -683,6 +683,7 @@ void load_conffile(void)
 	if (f == NULL)
 		return;
 
+	exists = 1;
 	loaded = 1;
 	while ((line=conf_line(f))) {
 		switch(match_keyword(line)) {
@@ -718,6 +719,13 @@ void load_conffile(void)
 /*    printf("got file\n"); */
 }
 
+int conf_exists(void)
+{
+	if (!loaded)
+		load_conffile();
+	return exists;
+}
+
 char *conf_get_mailaddr(void)
 {
 	load_conffile();




--

Doug Ledford <dledford@redhat.com>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband







* Re: [Patch] mdadm ignoring homehost?
  2009-04-06 14:47     ` [Patch] " Doug Ledford
@ 2009-04-06 19:33       ` Luca Berra
  2009-04-17  3:49       ` Neil Brown
  1 sibling, 0 replies; 59+ messages in thread
From: Luca Berra @ 2009-04-06 19:33 UTC
  To: LinuxRaid

On Mon, Apr 06, 2009 at 10:47:00AM -0400, Doug Ledford wrote:
> FWIW, I happen to disagree with this method.  And I'm currently testing out 
> a new algorithm for this in Fedora 11 beta.
>
> Does the array match an array mdadm.conf via uuid?  If yes, use name from 
> mdadm.conf.  If no, does the array match an entry in mdadm.conf via the 
> standard super-minor/name mapping?  If yes, and that array line contains a 
> uuid that doesn't match this entry, then use a random name because this is 
> likely a conflict.  If yes and that line does not contain a uuid entry, 
> then this is likely a match, but a poor one.  Use the name, but don't like 
> it.  If no, then this array didn't match the mdadm.conf file at all and is 
> likely a foreign array.  However, if there is no mdadm.conf file, or if 
> there is a mdadm.conf file and nothing in it used our name, then foreign or 
> not, it likely won't conflict on name, so go ahead and use the standard 
> name for this device.
I think the idea is sound.
I only took a glance at the implementation, but I cannot understand the
use of the "conf_exists()" function...

Regards,
L.


-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \


* Re: [Patch] mdadm ignoring homehost?
  2009-04-06 14:47     ` [Patch] " Doug Ledford
  2009-04-06 19:33       ` Luca Berra
@ 2009-04-17  3:49       ` Neil Brown
  2009-04-17  7:08         ` Gabor Gombas
  2009-04-17 18:17         ` Doug Ledford
  1 sibling, 2 replies; 59+ messages in thread
From: Neil Brown @ 2009-04-17  3:49 UTC
  To: Doug Ledford; +Cc: Jon Nelson, LinuxRaid

On Monday April 6, dledford@redhat.com wrote:
> On Apr 1, 2009, at 6:46 PM, Neil Brown wrote:
> 
> > On Wednesday April 1, jnelson-linux-raid@jamponi.net wrote:
> >> ping?
> >
> > Oh yeah, that's right, I was going to reply to that - thanks for the
> > reminder.
> >
> >>
> >> On Tue, Mar 24, 2009 at 11:57 AM, Jon Nelson
> >> <jnelson-linux-raid@jamponi.net> wrote:
> >>>
> >>> I have a raid1 comprised of a local physical device (/dev/sda) and a
> >>> network block device (/dev/nbd0).
> >>> When the machine hosting the network block device comes up, however,
> >>> it creates /dev/md127.
> >>> Why?
> >
> > Because you cannot please all the people, all the time.
> 
> Very true.

And I fear I'm going to be displeasing again :-(

> 
> >
> > People seem to want their arrays to auto-assemble - you know, just
> > appear and do the right thing, read their mind probably, because
> > creating config files is too hard.
> > So I've endeavoured to make that happen.
> >
> > The biggest problem with auto-assembly is what to do if two arrays
> > claim to have the same name. (e.g. /dev/md0) - which one wins.
> > The 'homehost' is (currently) used to resolve that.  An array only
> > gets to use the name it claims to have if it can show that it belongs
> > to "this" host.  If it doesn't, it still gets assembled, but with some
> > other more generic name.
> 
> FWIW, I happen to disagree with this method.  And I'm currently  
> testing out a new algorithm for this in Fedora 11 beta.

Thank you for explaining this in such detail.
There are aspects of it that I don't like, but I think there might be
pieces that I can take away from it too.

As you probably know, my preferred solution is to have all arrays
listed in /etc/mdadm.conf.  If it isn't in mdadm.conf, it doesn't get
assembled.   But I don't have a lot of company in this opinion.  Lots
of people want to have arrays assembled without them being in
mdadm.conf, and I'm trying to work with that.

Parts of what you are proposing seem to involve expecting people to
take a middle ground with some arrays listed in mdadm.conf and others
that aren't.  I'm not sure I'm happy with expecting people to do that
(though of course I'm happy to support it).
So the various parts of your algorithm which involve heuristics
based on the entries in mdadm.conf - or on the existence of mdadm.conf
itself - are parts that I don't feel comfortable with.

What is left?  Well, the observation that moving an external
multi-drive enclosure between hosts causes confusing naming is a valid
and useful observation.

Someone should be able to create an array on such a device called
'foo' and get '/dev/md/foo' created on any host.
The best thought I have come to so far is to support (and document)
something like
  --create --homehost=any
or
  --create --homehost=*

with the meaning that the array so created will get preferential
access to its recorded name (i.e. no "_0" suffix).

I also wonder if, when mdadm finds an array that is explicitly for
another host, we could use that host name rather than _0 to
disambiguate.  So
  --create /dev/md/foo --homehost=bob
when assembled on some other host is called
       /dev/md/foo_bob
that might at least make it more obvious what is happening.
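A C sketch of the naming these two ideas imply, assuming the recorded homehost can be read back as a string (NULL stands for metadata that records none). proposed_name is a hypothetical helper, not an mdadm function, and "any" is treated as the proposed wildcard.

```c
#include <stdio.h>
#include <string.h>

/* Builds the name under /dev/md/ for an array with claimed name `name`
 * and stored homehost `recorded_host`, as seen on `this_host`. */
static void proposed_name(const char *name, const char *recorded_host,
                          const char *this_host, char *out, size_t outlen)
{
    if (recorded_host != NULL &&
        (strcmp(recorded_host, this_host) == 0 ||
         strcmp(recorded_host, "any") == 0))
        snprintf(out, outlen, "%s", name);                   /* "foo" */
    else if (recorded_host != NULL)
        snprintf(out, outlen, "%s_%s", name, recorded_host); /* "foo_bob" */
    else
        snprintf(out, outlen, "%s_0", name);                 /* current suffix */
}
```

Used on host frank, an array created as foo with homehost bob would appear as /dev/md/foo_bob, while one created with homehost any would keep /dev/md/foo.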


Note that 0.90 metadata does contain homehost information to some
extent.  When homehost is set, the last few bytes of the uuid are set
from a hash of the homehost name.  That makes it possible to test if a
0.90 array was created for 'this' host, but not to find out what host
it was created for.  So the above expedient won't work for 0.90
arrays, but the rest of the homehost concept (including any possible
'homehost=any' option) does.
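To illustrate why a hash stored in the uuid can answer "is this ours?" but not "whose is it?", here is a C sketch. It uses 32-bit FNV-1a purely as a stand-in (the hash mdadm actually uses is not stated here) and assumes the last 4 of the 16 uuid bytes carry it; both choices, and the helper names, are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit FNV-1a, a stand-in hash for this sketch only. */
static uint32_t host_hash(const char *host)
{
    uint32_t h = 2166136261u;
    for (const unsigned char *p = (const unsigned char *)host; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

/* Overwrite the last 4 uuid bytes with the hash of host. */
static void stamp_homehost(uint8_t uuid[16], const char *host)
{
    uint32_t h = host_hash(host);
    for (int i = 0; i < 4; i++)
        uuid[12 + i] = (uint8_t)(h >> (8 * i));
}

/* Recompute the hash for a candidate host and compare. */
static bool created_for_host(const uint8_t uuid[16], const char *host)
{
    uint32_t h = host_hash(host);
    for (int i = 0; i < 4; i++)
        if (uuid[12 + i] != (uint8_t)(h >> (8 * i)))
            return false;
    return true;
}
```

Testing a candidate host is just recompute-and-compare; getting from the 4 stored bytes back to a host name would mean guessing names, which is the limitation described above.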

You note that arrays with no homehost are treated as foreign, which is
not always a good thing.  In 3.0, homehost is no longer optional.
If it is not explicitly set, it will default to `uname -n`.  So newly
created arrays will not suffer from this problem.  Arrays created with
mdadm 2.x do.  They can be 'upgraded' with
    --assemble --update=homehost
which is a suggestion that should be put in the man page.
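`uname -n` prints the node name returned by the uname(2) system call, so the new default can be sketched in C as below. effective_homehost is a hypothetical helper, not an mdadm function.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Copy the homehost to use into out: the explicit value if one was given,
 * otherwise the node name that `uname -n` would print.
 * Returns 0 on success, -1 if uname() fails. */
static int effective_homehost(const char *explicit_host,
                              char *out, size_t outlen)
{
    if (explicit_host != NULL && explicit_host[0] != '\0') {
        snprintf(out, outlen, "%s", explicit_host);
        return 0;
    }
    struct utsname u;
    if (uname(&u) < 0)
        return -1;
    snprintf(out, outlen, "%s", u.nodename);
    return 0;
}
```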

Your idea of allowing the names "/dev/md0" and "md0" to connect with the
minor number '0' in the same way that the name "0" does is a good
one.  I have implemented that.

I think I am leaning towards 'homehost=any' rather than 'homehost=*'
and will implement that. (No one would have a computer called 'any'
would they?).

Thanks again for your input.

NeilBrown




> 
> The logic behind this in mdadm-3.0devel3 is basically "if the array  
> exists in mdadm.conf or if it has this homehost, assemble using normal  
> name, else use a random name".  However, in the world of movable  
> arrays (think one of those 5 disk SATA raid towers that just has a  
> single eSATA port and a port replicator, which can easily be moved  
> from machine to machine), this doesn't work so well.  The problem is  
> that when you assemble an array with a random number, you confuse  
> users.  They might find the array eventually, but it's certainly not  
> as easy as if the array used the name they expected.  In an attempt to  
> get mdadm to not possibly conflict with local array names, the  
> homehost method of selecting which array name to use causes confusion  
> all the time, instead of only confusing users when a conflict actually  
> occurs.  This doesn't make sense to me, so I redid the tests in mdadm  
> to change this (this is exacerbated by the fact that if your array  
> does not define a homehost, it gets treated as though it has a  
> different homehost, so common version 0.90 arrays will always get  
> assembled as a random number if they aren't in the mdadm.conf file  
> whether they are meant for this host or not).
> 
> So, my logic goes like this:
> 
> Does the array match an array in mdadm.conf via uuid?  If yes, use name
> from mdadm.conf.  If no, does the array match an entry in mdadm.conf  
> via the standard super-minor/name mapping?  If yes, and that array  
> line contains a uuid that doesn't match this entry, then use a random  
> name because this is likely a conflict.  If yes and that line does not  
> contain a uuid entry, then this is likely a match, but a poor one.   
> Use the name, but don't like it.  If no, then this array didn't match  
> the mdadm.conf file at all and is likely a foreign array.  However, if  
> there is no mdadm.conf file, or if there is a mdadm.conf file and  
> nothing in it used our name, then foreign or not, it likely won't  
> conflict on name, so go ahead and use the standard name for this device.
> 
> I had to modify the match loop to store both uuid and name matches  
> separately in order to support this logic.  There were some other changes
> that were necessary in order to make it work properly, and I had to  
> change mdopen.c to automatically go from what we thought was a good  
> name to a random name if a conflict on an array happens in order to  
> avoid failed autoassembles.  However, I'm personally much happier with  
> the results.  For example, I can define md0 in the mdadm.conf file,  
> create two different md0 arrays, then attempt to autoassemble the one  
> that isn't in mdadm.conf and it will automatically get a random name  
> and when the one that is in mdadm.conf shows up it gets the right  
> name.  I can also define two md0 arrays with neither of them in the
> mdadm.conf file and it will assemble the first as md0 and the second  
> as name md0_0 with a random minor (I think, it's been a week or so  
> since I did that testing).  Anyway, it works well, and it basically  
> negates the need for homehost in my opinion.  And the fact that it  
> only assembles an array with a random number when it truly needs to is  
> something that will help to greatly reduce confusion of users, which  
> is always a plus in my book.  I'll attach the patch for your review.   
> I could have shortened the logic in the match tests to just what's  
> needed to set things right, but I left the long version so people can  
> see all the possible options and why a specific setting is chosen on  
> any given option.  Oh, and the patch also loosens up the name matching  
> somewhat so that if someone names their device /dev/md0, that matches  
> super-minor 0, as does md0 and just plain 0.  The original match  
> setup, at least for devices not in the mdadm.conf file with a name in  
> the array line, would only match the array name if it was numeric only  
> (aka, homehost:0 or just 0).  I found that to be overly restrictive  
> and contrary to what a lot of people would expect should be entered in  
> the name field of the superblock.
> 
> Since I'm sending this anyway, I'll send a couple other changes I made  
> to our mdadm in separate mails.
> 
> 
> 
> --
> 
> Doug Ledford <dledford@redhat.com>
> 
> GPG KeyID: CFBFF194
> http://people.redhat.com/dledford
> 
> InfiniBand Specific RPMS
> http://people.redhat.com/dledford/Infiniband
> 
> 
> 
> 

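The matching order Doug describes in the quoted message above can be modelled roughly as follows. This is a simplified Python sketch, not the attached patch: ConfEntry, choose_name and fallback are hypothetical names, a conf of None stands for "no mdadm.conf", and mdadm's random name is approximated by the first free name_N:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfEntry:
    """One ARRAY line from mdadm.conf (only the fields this sketch needs)."""
    name: str
    uuid: Optional[str] = None


def fallback(name: str, in_use: set) -> str:
    """Stand-in for mdadm's 'random name': the first free name_N."""
    i = 0
    while f"{name}_{i}" in in_use:
        i += 1
    return f"{name}_{i}"


def choose_name(uuid: str, name: str, conf: Optional[list], in_use: set) -> str:
    entries = conf or []  # None models "no mdadm.conf at all"
    # 1. A UUID match in mdadm.conf is authoritative.
    for e in entries:
        if e.uuid == uuid:
            return e.name
    # 2. A name match without a UUID match.
    for e in entries:
        if e.name == name:
            if e.uuid is not None:
                # The name is reserved for a different array: likely a conflict.
                return fallback(name, in_use | {name})
            # Weak match: the line has no UUID, so use the name anyway.
            chosen = name
            break
    else:
        # 3. Not in mdadm.conf: probably foreign, but nothing reserves
        #    the name, so use the standard name.
        chosen = name
    # Name already taken at open time: fall back instead of failing.
    return chosen if chosen not in in_use else fallback(name, in_use)
```

With md0 listed in mdadm.conf, a second array also calling itself md0 becomes md0_0, and the listed one still gets md0 when it appears; two unlisted md0 arrays become md0 and md0_0 in arrival order.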

* Re: [Patch] mdadm ignoring homehost?
  2009-04-17  3:49       ` Neil Brown
@ 2009-04-17  7:08         ` Gabor Gombas
  2009-04-20  5:23           ` Neil Brown
  2009-04-17 18:17         ` Doug Ledford
  1 sibling, 1 reply; 59+ messages in thread
From: Gabor Gombas @ 2009-04-17  7:08 UTC (permalink / raw)
  To: Neil Brown; +Cc: Doug Ledford, Jon Nelson, LinuxRaid

On Fri, Apr 17, 2009 at 01:49:41PM +1000, Neil Brown wrote:

> As you probably know, my preferred solution is to have all arrays
> listed in /etc/mdadm.conf.  If it isn't in mdadm.conf, it doesn't get
> assembled.   But I don't have a lot of company in this opinion.  Lots
> of people want to have arrays assembled without them being in
> mdadm.conf, and I'm trying to work with that.

IMHO the goal to have all arrays defined in mdadm.conf would be much
better to achieve if mdadm managed that configuration itself, not unlike
how LVM metadata is handled. Of course doing that right is not exactly
easy...

> Note that 0.90 metadata does contain homehost information to some
> extent.  When homehost is set, the last few bytes of the uuid are set
> from a hash of the homehost name.  That makes it possible to test if a
> 0.90 array was created for 'this' host, but not to find out what host
> it was created for.  So the above expedient won't work for 0.90
> arrays, but the rest of the homehost concept (including any possible
> 'homehost=any' option) does.

How about introducing /dev/md/by-uuid/... (or similar) and teaching
people that if they want to transparently carry their arrays from one
host to another, then they should always refer to them by UUID?

Mounting file systems by UUID instead of device path got accepted by
people who really care about moving things around, so doing the same for
RAID could also work.

Gabor

-- 
     ---------------------------------------------------------
     MTA SZTAKI Computer and Automation Research Institute
                Hungarian Academy of Sciences
     ---------------------------------------------------------


* Re: [Patch] mdadm ignoring homehost?
  2009-04-17  3:49       ` Neil Brown
  2009-04-17  7:08         ` Gabor Gombas
@ 2009-04-17 18:17         ` Doug Ledford
  2009-04-17 18:40           ` Piergiorgio Sartor
                             ` (3 more replies)
  1 sibling, 4 replies; 59+ messages in thread
From: Doug Ledford @ 2009-04-17 18:17 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jon Nelson, LinuxRaid


On Apr 16, 2009, at 11:49 PM, Neil Brown wrote:
> On Monday April 6, dledford@redhat.com wrote:
>> On Apr 1, 2009, at 6:46 PM, Neil Brown wrote:
>>
>>> On Wednesday April 1, jnelson-linux-raid@jamponi.net wrote:
>>>> ping?
>>>
>>> Oh yeah, that's right, I was going to reply to that - thanks for the
>>> reminder.
>>>
>>>>
>>>> On Tue, Mar 24, 2009 at 11:57 AM, Jon Nelson
>>>> <jnelson-linux-raid@jamponi.net> wrote:
>>>>>
>>>>> I have a raid1 comprised of a local physical device (/dev/sda)  
>>>>> and a
>>>>> network block device (/dev/nbd0).
>>>>> When the machine hosting the network block device comes up,  
>>>>> however,
>>>>> it creates /dev/md127.
>>>>> Why?
>>>
>>> Because you cannot please all the people, all the time.
>>
>> Very true.
>
> And I fear I'm going to be displeasing again :-(
>
>>
>>>
>>> People seem to want their arrays to auto-assemble - you know, just
>>> appear and do the right thing, read their mind probably, because
>>> creating config files is too hard.
>>> So I've endeavoured to make that happen.
>>>
>>> The biggest problem with auto-assembly is what to do if two arrays
>>> claim to have the same name. (e.g. /dev/md0) - which one wins.
>>> The 'homehost' is (currently) used to resolve that.  An array only
>>> gets to use the name it claims to have if it can show that it  
>>> belongs
>>> to "this" host.  If it doesn't, it still gets assembled, but with some
>>> other more generic name.
>>
>> FWIW, I happen to disagree with this method.  And I'm currently
>> testing out a new algorithm for this in Fedora 11 beta.
>
> Thank you for explaining this in such detail.
> There are aspects of it that I don't like, but I think there might be
> pieces that I can take away from it too.
>
> As you probably know, my preferred solution is to have all arrays
> listed in /etc/mdadm.conf.  If it isn't in mdadm.conf, it doesn't get
> assembled.   But I don't have a lot of company in this opinion.  Lots
> of people want to have arrays assembled without them being in
> mdadm.conf, and I'm trying to work with that.

This appears to be the difference between a server setup and a desktop  
setup.  Server admins want to list things and only have known actions  
happen.  Desktop people want things to "just work".  I've had several  
people tell me they thought the idea of mdadm.conf was completely out  
of date and it should just go away entirely.  Not saying I agree, just  
letting you know what I get.

> Parts of what you are proposing seem to involve expecting people to
> take a middle ground with some arrays listed in mdadm.conf and others
> that aren't.

I do this myself FWIW.  My / and /boot arrays are in mdadm.conf, but  
arrays that I plug in via USB, eSATA, etc. are not.

>  I'm not sure I'm happy with expecting people to do that
> (though of course I'm happy to support it).

I really don't expect them to per se.  More like it's the *safe* thing  
to do.  If you ever have a conflict in names, the one in the file  
wins.  If you ever have a conflict in names without one of them in the  
file, then it's whoever got there first.  In that sense, mdadm.conf is  
just a backup for me.  Well, that and mkinitrd doesn't do incremental  
assembly, so it's needed for boot in my case.  But that could be  
changed.

> So the various parts of your algorithm which involve heuristics
> based on the entries in mdadm.conf - or on the existence of mdadm.conf
> itself - are parts that I don't feel comfortable with.
>
> What is left?  Well, the observation that moving an external
> multi-drive enclosure between hosts causes confusing naming is a valid
> and useful observation.
>
> Someone should be able to create an array on such a device called
> 'foo' and get '/dev/md/foo' created on any host.
> The best thought I have come to so far is to support (and document)
> something like
>  --create --homehost=any
> or
>  --create --homehost=*
>
> with the meaning that the array so created will get preferential
> access to its recorded name (i.e. no "_0" suffix).
>
> I also wonder if, when mdadm finds an array that is explicitly for
> another host, we could use that host name rather than _0 to
> disambiguate.  So
>  --create /dev/md/foo --homehost=bob
> when assembled on some other host is called
>       /dev/md/foo_bob
> that might at least make it more obvious what is happening.

This is probably where you and I disagree.  I don't think you are  
disambiguating.  I think you are confounding the common case of no  
conflict.  If someone has a non-portable array, like /, they commonly  
use something like /dev/md0.  That, you will likely never get a  
conflict on.  On the other hand, if someone creates an array to be  
mobile, it will likely have a higher number (or it could be 0, but  
that implies they aren't using root raid arrays on their machines in  
all likelihood).  So, if you make a mobile array, just give it any old
number you can remember other than the usual base numbers used by
non-portable arrays, and voilà, no conflicts (note that this is also why I
was in favor of a completely numberless md setup, where the device
major:minor does not affect the name of the array at all, and you are free
to create something like /dev/md/root and there will be no device file
other than /dev/md/root, specifically no alias from /dev/md0 to
/dev/md/root... it's much easier to remember names than numbers, and much
easier to create a scheme that avoids conflicts 100% of the time).  As
it stands, though, the current code still won't treat an arbitrary name as
the official and canonical name of the array; it
insists on creating a /dev/md# device and then just symlinking the
name as though the /dev/md# device is canonical.  In one of your
previous emails you mentioned something about how bad design decisions
get entrenched and can never be rooted out; I would point to this ;-)

> Note that 0.90 metadata does contain homehost information to some
> extent.  When homehost is set, the last few bytes of the uuid are set
> from a hash of the homehost name.  That makes it possible to test if a
> 0.90 array was created for 'this' host, but not to find out what host
> it was created for.  So the above expedient won't work for 0.90
> arrays, but the rest of the homehost concept (including any possible
> 'homehost=any' option) does.
>
> You note that arrays with no homehost are treated as foreign, which is not
> always a good thing.  In 3.0, homehost is no longer optional.
> If it is not explicitly set, it will default to `uname -n`.  So newly
> created arrays will not suffer from this problem.  Arrays created with
> mdadm 2.x do.  They can be 'upgraded' with
>    --assemble --update=homehost
> which is a suggestion that should be put in the man page.

This is a bad idea, and just reinforces my thought that we shouldn't  
be paying attention to homehost.  Amongst the most important aspects  
are machines that are booted up, installed, raid arrays created during  
install, then shut down and moved, likely changing dhcp hostnames in  
the process.  Now all your homehosts belong to some hostname in some
IT guy's install network instead of in your final network.  At install
time, it's actually fairly common that the hostname is not yet set,  
especially at raid array creation time.

> Your idea of allowing the names "/dev/md0" and "md0" to connect with  
> the
> minor number '0' in the same way that the name "0" does is a good
> one.  I have implemented that.
>
> I think I am leaning towards 'homehost=any' rather than 'homehost=*'
> and will implement that. (No one would have a computer called 'any'
> would they?).
>
> Thanks again for your input.

No problem.

--

Doug Ledford <dledford@redhat.com>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband







* Re: [Patch] mdadm ignoring homehost?
  2009-04-17 18:17         ` Doug Ledford
@ 2009-04-17 18:40           ` Piergiorgio Sartor
  2009-04-18  7:54             ` Luca Berra
  2009-04-18 14:34             ` Andrew Burgess
  2009-04-18  8:12           ` Luca Berra
                             ` (2 subsequent siblings)
  3 siblings, 2 replies; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-17 18:40 UTC (permalink / raw)
  To: linux-raid

On Fri, Apr 17, 2009 at 02:17:47PM -0400, Doug Ledford wrote:
>> As you probably know, my preferred solution is to have all arrays
>> listed in /etc/mdadm.conf.  If it isn't in mdadm.conf, it doesn't get
>> assembled.   But I don't have a lot of company in this opinion.  Lots
>> of people want to have arrays assembled without them being in
>> mdadm.conf, and I'm trying to work with that.
>
> This appears to be the difference between a server setup and a desktop  
> setup.  Server admins want to list things and only have known actions  
> happen.  Desktop people want things to "just work".  I've had several  
> people tell me they thought the idea of mdadm.conf was completely out of 
> date and it should just go away entirely.  Not saying I agree, just  
> letting you know what I get.

My two cents on this.
One puzzling thing of mdadm.conf is how it is created.
In order to get it, either "mdadm --detail --scan" or
"mdadm --examine --scan" is required.
If I understand it correctly, the latter uses information
from the underlying disks directly.
Now, why should this command be run manually?
Can the "system" do it automatically on boot or on hotplug?

The problem is that when something is changed in the array,
the file needs to be recreated.
For example, this happened to me, after growing the array it
was no longer possible to restart it without rerunning
the "--examine --scan" thing.

It seems to me that the usage of mdadm.conf is a bit
fragile and looping.
To start the array I need mdadm.conf, to create it I
need the array... (maybe already started).

So, in conclusion, I would support Doug on this, and
try to have a more sensible method to get the device
name out of the array.

Thanks,

bye,

-- 

piergiorgio


* Re: [Patch] mdadm ignoring homehost?
  2009-04-17 18:40           ` Piergiorgio Sartor
@ 2009-04-18  7:54             ` Luca Berra
  2009-04-18  8:36               ` Piergiorgio Sartor
  2009-04-18 14:34             ` Andrew Burgess
  1 sibling, 1 reply; 59+ messages in thread
From: Luca Berra @ 2009-04-18  7:54 UTC (permalink / raw)
  To: linux-raid

On Fri, Apr 17, 2009 at 08:40:14PM +0200, Piergiorgio Sartor wrote:
>My two cents on this.
>One puzzling thing of mdadm.conf is how it is created.
>In order to get it, either "mdadm --detail --scan" or
>"mdadm --examine --scan" is required.
>If I understand it correctly, the latter uses information
>from the underlying disks directly.
exactly
>Now, why should this command be run manually?
because people might want control over what is happening.
otherwise we might as well be running Windows

>Can the "system" do it automatically on boot or on hotplug?
on hotplug udev already tries to assemble raid arrays, this is the topic
of the discussion, how to make it just work on your desktop/laptop pc,
and how to make it not break the carefully tuned setup on my server,
where i will probably never connect a usb/esata array.

>The problem is that when something is changed in the array,
>the file needs to be recreated.
no, unless you misconfigure mdadm.conf

>For example, this happened to me, after growing the array it
>was no longer possible to restart it without rerunning
>the "--examine --scan" thing.
probably because you left some component device name in the mdadm.conf
file, don't.

>It seems to me that the usage of mdadm.conf is a bit
>fragile and looping.
>To start the array I need mdadm.conf, to create it I
not really
>need the array... (maybe already started).
i don't see the problem

L.
-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \


* Re: [Patch] mdadm ignoring homehost?
  2009-04-17 18:17         ` Doug Ledford
  2009-04-17 18:40           ` Piergiorgio Sartor
@ 2009-04-18  8:12           ` Luca Berra
  2009-04-18  8:44             ` Piergiorgio Sartor
  2009-04-18 13:35             ` Doug Ledford
  2009-04-18 13:58           ` Bill Davidsen
  2009-04-20  7:23           ` Neil Brown
  3 siblings, 2 replies; 59+ messages in thread
From: Luca Berra @ 2009-04-18  8:12 UTC (permalink / raw)
  To: LinuxRaid

On Fri, Apr 17, 2009 at 02:17:47PM -0400, Doug Ledford wrote:
> This appears to be the difference between a server setup and a desktop 
> setup.  Server admins want to list things and only have known actions 
> happen.  Desktop people want things to "just work".  I've had several 
> people tell me they thought the idea of mdadm.conf was completely out of 
> date and it should just go away entirely.  Not saying I agree, just letting 
> you know what I get.
uhm, udev should be able to assemble an array without mdadm.conf, not
that i like it

>> Parts of what you are proposing seem to involve expecting people to
>> take a middle ground with some arrays listed in mdadm.conf and others
>> that aren't.
>
> I do this myself FWIW.  My / and /boot arrays are in mdadm.conf, but arrays 
> that I plug in via USB, eSATA, etc. are not.
>
>>  I'm not sure I'm happy with expecting people to do that
>> (though of course I'm happy to support it).
>
> I really don't expect them to per se.  More like it's the *safe* thing to 
> do.  If you ever have a conflict in names, the one in the file wins.  If 
> you ever have a conflict in names without one of them in the file, then 
> it's whoever got there first.  In that sense, mdadm.conf is just a backup 
> for me.  Well, that and mkinitrd doesn't do incremental assembly, so it's 
> needed for boot in my case.  But that could be changed.
yes, if we ensure it will mount the correct array :)

i was wondering about indicating our policy preference in mdadm.conf
ie

POLICY {dynamic|preferred|strict}

dynamic: assemble anything you find, naming policy first come first
served. this might be the only line in mdadm.conf

preferred (in need of a better name): arrays defined here have
precedence over dynamically found arrays

strict: if it ain't here, just ignore it
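One way to read the three POLICY values proposed here, as a hypothetical Python sketch. The semantics are taken from the proposal above; both function names are invented for illustration and are not part of mdadm:

```python
def should_assemble(policy: str, in_conf: bool) -> bool:
    """Is an array found on disk assembled at all under this POLICY?"""
    if policy == "strict":
        return in_conf  # if it ain't in mdadm.conf, ignore it
    if policy in ("dynamic", "preferred"):
        return True     # assemble anything that is found
    raise ValueError(f"unknown POLICY {policy!r}")


def keeps_name(policy: str, in_conf: bool, other_in_conf: bool,
               arrived_first: bool) -> bool:
    """When two arrays claim the same name, does this one keep it?"""
    if policy != "dynamic" and in_conf != other_in_conf:
        # 'preferred' (and 'strict'): the array listed in mdadm.conf wins.
        return in_conf
    # 'dynamic', or no tie-breaker from mdadm.conf: first come, first served.
    return arrived_first
```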

....
> This is a bad idea, and just reinforces my thought that we shouldn't be 
> paying attention to homehost.  Amongst the most important aspects are 
> machines that are booted up, installed, raid arrays created during install, 
> then shut down and moved, likely changing dhcp hostnames in the process.  
> Now all your homehosts belong to some hostname in some IT guy's install
> network instead of in your final network.  At install time, it's actually 
> fairly common that the hostname is not yet set, especially at raid array 
> creation time.
i never found much use for homehost, i would prefer to have a stricter
locking mechanism for shared storage (maybe integration of cman
locking???) and leave the desktop world with random names.
If you get the first case wrong you risk damaging data.
If you get the array name wrong on a desktop i expect the luser will
never even notice as long as a window pops up showing the filesystem
contents :P

L.


-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.


* Re: [Patch] mdadm ignoring homehost?
  2009-04-18  7:54             ` Luca Berra
@ 2009-04-18  8:36               ` Piergiorgio Sartor
  2009-04-18 10:19                 ` Luca Berra
  0 siblings, 1 reply; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-18  8:36 UTC (permalink / raw)
  To: linux-raid

On Sat, Apr 18, 2009 at 09:54:38AM +0200, Luca Berra wrote:
> because people might want control over what is happening.
> otherwise we might as well be running Windows

There is a difference between having to run it
*always* manually, or having the choice to do
one way (manually) or the other (automatically).

Control is not about doing things manually; control is about
having the possibility to do what is considered
most convenient.

So, the point is not having it automatic, but
having the choice between automatic or manual,
or both.

> on hotplug udev already tries to assemble raid arrays, this is the topic
> of the discussion, how to make it just work on your desktop/laptop pc,
> and how to make it not break the carefully tuned setup on my server,
> where i will probably never connect a usb/esata array.

Well, so we are on the same frequency!

>> The problem is that when something is changed in the array,
>> the file needs to be recreated.
> no, unless you misconfigure mdadm.conf

I did not! :-)

>> For example, this happened to me, after growing the array it
>> was no longer possible to restart it without rerunning
>> the "--examine --scan" thing.
> probably because you left some component device name in the mdadm.conf
> file, don't.

The cause is "num-devices", which is set by the
"--examine --scan" or "--detail --scan".
If I grow the array, this changes, and the config
file needs an update.
Of course, the "num-devices" could be removed, but
it is more manual work, which could lead to errors.

>> It seems to me that the usage of mdadm.conf is a bit
>> fragile and looping.
>> To start the array I need mdadm.conf, to create it I
> not really
>> need the array... (maybe already started).
> i don't see the problem

The problem is that the safest information for assembling
the array is (or should be) the UUID.
Since this is not something easy to remember, it is
necessary to find it out.
The information could be retrieved from the devices
composing it, of course.
I find it a bit "silly" to have to check the components
manually in order to configure the file needed to
start the array.

Of course, for a fully static environment, this is
not a big issue. Once done, it's done.
For a more "changing" situation, this could be
quite annoying and error prone.
I already had a "booting" issue because I forgot
to *update* the file.

In other words, it would be nice to have some more
automatic method to handle this configuration file.
This should *not* replace the manual operation, it
should just be an add on to it.

Of course, an "udev" solution would be also good,
so I'm not proposing anything different from what
is discussed here.
As I wrote, these are only my two cents or less.

bye,

-- 

piergiorgio


* Re: [Patch] mdadm ignoring homehost?
  2009-04-18  8:12           ` Luca Berra
@ 2009-04-18  8:44             ` Piergiorgio Sartor
  2009-04-18 13:35             ` Doug Ledford
  1 sibling, 0 replies; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-18  8:44 UTC (permalink / raw)
  To: LinuxRaid

On Sat, Apr 18, 2009 at 10:12:53AM +0200, Luca Berra wrote:
> uhm, udev should be able to assemble an array without mdadm.conf, not
> that i like it

Actually I was wondering about that, since it
seems not always working.
I tested an array over USB, and the udev thing
sometimes works, sometimes it does not.
But if I stop what is left of the array and
re-assemble it manually, no problem.
Could it be there is some race condition?

bye,

-- 

piergiorgio


* Re: [Patch] mdadm ignoring homehost?
  2009-04-18  8:36               ` Piergiorgio Sartor
@ 2009-04-18 10:19                 ` Luca Berra
  2009-04-18 13:06                   ` Piergiorgio Sartor
  0 siblings, 1 reply; 59+ messages in thread
From: Luca Berra @ 2009-04-18 10:19 UTC (permalink / raw)
  To: linux-raid

On Sat, Apr 18, 2009 at 10:36:09AM +0200, Piergiorgio Sartor wrote:
>The cause is "num-devices", which is set by the
>"--examine --scan" or "--detail --scan".

i believe the num-devices is redundant since this value is already
stored in the superblock
we could change the way mdadm outputs data to put _all_ redundant
information in subsequent lines and keep only the required info in the
'ARRAY' line,
so mdadm --examine --scan | grep ARRAY would be suitable for initial
configuration
or even print it only if a --verbose flag is added
so mdadm --examine --scan by itself would suit most needs

for the time being some awk magic can be used to parse 'mdadm --examine
--scan' and make it suitable for inclusion in mdadm.conf
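A Python stand-in for that awk magic. It assumes --examine --scan prints one ARRAY line per array, optionally followed by indented continuation lines such as devices=... (as with --verbose); the sample line in the test and the list of redundant keys are assumptions, not a specification of mdadm's output:

```python
# Keys describing the array's current shape, which the superblock already
# records; keeping them in mdadm.conf means editing it after every --grow.
REDUNDANT_KEYS = ("num-devices=", "devices=")


def conf_lines(scan_output: str) -> list:
    """Reduce scan output to ARRAY lines without the redundant keys."""
    out = []
    for line in scan_output.splitlines():
        # Keep only ARRAY lines; indented continuation lines are dropped.
        if not line.startswith("ARRAY"):
            continue
        words = [w for w in line.split() if not w.startswith(REDUNDANT_KEYS)]
        out.append(" ".join(words))
    return out
```

A small wrapper could feed it the output of mdadm --examine --scan on stdin and print the result for review before anything is appended to /etc/mdadm.conf.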

Regards,
L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.


* Re: [Patch] mdadm ignoring homehost?
  2009-04-18 10:19                 ` Luca Berra
@ 2009-04-18 13:06                   ` Piergiorgio Sartor
  2009-04-20  5:58                     ` Neil Brown
  0 siblings, 1 reply; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-18 13:06 UTC (permalink / raw)
  To: linux-raid

On Sat, Apr 18, 2009 at 12:19:54PM +0200, Luca Berra wrote:
> i believe the num-devices is redundant since this value is already
> stored in the superblock

I believe so too. As I mentioned, it would be possible
to edit the outcome of "--examine --scan", but not
really wanted.

> we could change the way mdadm outputs data to put _all_ redundant
> information in subsequent lines and keep only the required info in the
> 'ARRAY' line,
> so mdadm --examine --scan | grep ARRAY would be suitable for initial
> configuration
> or even print it only if a --verbose flag is added
> so mdadm --examine --scan by itself would suit most needs

This second option I would prefer in one way or the other.
I mean, either "mdadm --verbose --examine --scan": all info,
or "mdadm --quiet --examine --scan": minimal info.
One of the two would be OK, I guess (not necessarily both).

> for the time being some awk magic can be used to parse 'mdadm --examine
> --scan' and make it suitable for inclusion in mdadm.conf

This is a possibility. It would be also OK to have a
script, delivered together with mdadm, doing this.
I can script myself, but a "standard solution" might
be better.

One question somehow related to this thread.

I would like to have my "fixed" RAIDs as devices with a
specific name.
That is, something like /dev/md/root and /dev/md/lvm (for
/dev/md0 and /dev/md1).
In this context, I would also like to have /dev/md0 and
/dev/md1 free to be used by other RAID.
Of course, I've no problem in using mdadm.conf for this,
but it seems that only names like
/dev/mdX or /dev/md/X are possible.

Is this correct or there is some way to "personalize" the
created device name?

Thanks,

bye,

-- 

piergiorgio


* Re: [Patch] mdadm ignoring homehost?
  2009-04-18  8:12           ` Luca Berra
  2009-04-18  8:44             ` Piergiorgio Sartor
@ 2009-04-18 13:35             ` Doug Ledford
  2009-04-18 13:52               ` Piergiorgio Sartor
                                 ` (2 more replies)
  1 sibling, 3 replies; 59+ messages in thread
From: Doug Ledford @ 2009-04-18 13:35 UTC (permalink / raw)
  To: Luca Berra; +Cc: LinuxRaid


On Apr 18, 2009, at 4:12 AM, Luca Berra wrote:
> On Fri, Apr 17, 2009 at 02:17:47PM -0400, Doug Ledford wrote:
>> This appears to be the difference between a server setup and a  
>> desktop setup.  Server admins want to list things and only have  
>> known actions happen.  Desktop people want things to "just work".   
>> I've had several people tell me they thought the idea of mdadm.conf  
>> was completely out of date and it should just go away entirely.   
>> Not saying I agree, just letting you know what I get.
> uhm, udev should be able to assemble an array without mdadm.conf, not
> that i like it

It does, the question is whether or not it should honor the preferred  
minor when it assembles an array not in mdadm.conf.

>>> Parts of what you are proposing seem to involve expecting people to
>>> take a middle ground with some arrays listed in mdadm.conf and others
>>> that aren't.
>>
>> I do this myself FWIW.  My / and /boot arrays are in mdadm.conf,  
>> but arrays that I plug in via USB, eSATA, etc. are not.
>>
>>> I'm not sure I'm happy with expecting people to do that
>>> (though of course I'm happy to support it).
>>
>> I really don't expect them to per se.  More like it's the *safe*  
>> thing to do.  If you ever have a conflict in names, the one in the  
>> file wins.  If you ever have a conflict in names without one of  
>> them in the file, then it's whoever got there first.  In that  
>> sense, mdadm.conf is just a backup for me.  Well, that and mkinitrd  
>> doesn't do incremental assembly, so it's needed for boot in my  
>> case.  But that could be changed.
> yes, if we ensure it will mount the correct array :)

With my patch (which Neil didn't like), it does.

> i was wondering about indicating our preference to policy in  
> mdadm.conf
> ie
>
> POLICY {dynamic|preferred|strict}
>
> dynamic: assemble anything you find, naming policy first come first
> served. this might be the only line in mdadm.conf
>
> preferred (in need of a better name): arrays defined here have
> precedence over dynamically found arrays
>
> strict: if it ain't here, just ignore it
>
> ....
>> This is a bad idea, and just reinforces my thought that we  
>> shouldn't be paying attention to homehost.  Amongst the most  
>> important aspects are machines that are booted up, installed, raid  
>> arrays created during install, then shut down and moved, likely  
>> changing dhcp hostnames in the process.  Now all your homehosts  
>> belong to some hostname in some IT guys install network instead of  
>> in your final network.  At install time, it's actually fairly  
>> common that the hostname is not yet set, especially at raid array  
>> creation time.
> i never found much use for homehost, i would prefer to have a stricter
> locking mechanism for shared storage (maybe integration of cman
> locking???) and leave the desktop world with randomic names.
> If you get the first case wrong you risk damaging data.
> If you get the array name wrong on a desktop i expect the luser will
> never even notice as long a windows pops up showing the filesystem
> contents :P


I've been thinking about this, and this is the method I would suggest.

Add two new keywords to the mdadm.conf file:

ASSEMBLE
INCREMENTAL

Allow each of those keywords to have one of three values:
None - Don't attempt to assemble any arrays, regardless of whether  
or not they are listed in the mdadm.conf file
Known - Only assemble arrays with a matching array line
All - Attempt to assemble any array found

The combination of the two options and the three settings would allow  
you to control mdadm behavior for both array assembly modes  
independently.  That, combined with my previous patch, should allow  
arrays to assemble well, with known names, allow you to control auto  
assembly by udev, and in the event that your machine just exports  
volumes to other machines for their use, stop assembly entirely.
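A minimal Python sketch of how the proposed keywords could be evaluated, assuming arrays are matched to ARRAY lines by UUID. ASSEMBLE and INCREMENTAL are proposals here, not existing mdadm.conf keywords, and the function name, host policies and UUID strings below are hypothetical.

```python
from enum import Enum


class Policy(Enum):
    """Values proposed for the hypothetical ASSEMBLE / INCREMENTAL keywords."""
    NONE = "none"    # never assemble, even arrays listed in mdadm.conf
    KNOWN = "known"  # assemble only arrays with a matching ARRAY line
    ALL = "all"      # assemble any array found


def should_assemble(mode: str, policies: dict, uuid: str, known_uuids: set) -> bool:
    """Return True if an array may be assembled in the given mode.

    mode is "assemble" (e.g. mdadm -As at boot) or "incremental"
    (e.g. udev-driven mdadm --incremental); each mode has its own keyword.
    known_uuids holds the UUIDs taken from ARRAY lines in mdadm.conf.
    """
    policy = policies[mode]
    if policy is Policy.NONE:
        return False
    if policy is Policy.KNOWN:
        return uuid in known_uuids
    return True  # Policy.ALL


# Hypothetical hosts: one that only exports volumes, and a desktop.
export_box = {"assemble": Policy.NONE, "incremental": Policy.NONE}
desktop = {"assemble": Policy.KNOWN, "incremental": Policy.ALL}
known = {"uuid-of-root-array"}

print(should_assemble("assemble", desktop, "uuid-of-usb-array", known))      # False
print(should_assemble("incremental", desktop, "uuid-of-usb-array", known))   # True
print(should_assemble("assemble", export_box, "uuid-of-root-array", known))  # False
```

Keeping the two modes separate is what lets the hypothetical desktop accept hot-plugged arrays through udev while boot-time assembly stays limited to listed arrays.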

--

Doug Ledford <dledford@redhat.com>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband





[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 203 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Patch] mdadm ignoring homehost?
  2009-04-18 13:35             ` Doug Ledford
@ 2009-04-18 13:52               ` Piergiorgio Sartor
  2009-04-18 14:50                 ` Doug Ledford
  2009-04-18 14:48               ` Jon Nelson
  2009-04-20  6:08               ` Neil Brown
  2 siblings, 1 reply; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-18 13:52 UTC (permalink / raw)
  Cc: Luca Berra, LinuxRaid

On Sat, Apr 18, 2009 at 09:35:33AM -0400, Doug Ledford wrote:
> On Apr 18, 2009, at 4:12 AM, Luca Berra wrote:
>> uhm, udev should be able to assemble an array without mdadm.conf, not
>> that i like it
>
> It does, the question is whether or not it should honor the preferred  
> minor when it assembles an array not in mdadm.conf.

Does it?
Then I should file a bug against Fedora, because for
me it does not (always).

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Patch] mdadm ignoring homehost?
  2009-04-17 18:17         ` Doug Ledford
  2009-04-17 18:40           ` Piergiorgio Sartor
  2009-04-18  8:12           ` Luca Berra
@ 2009-04-18 13:58           ` Bill Davidsen
  2009-04-20  7:23           ` Neil Brown
  3 siblings, 0 replies; 59+ messages in thread
From: Bill Davidsen @ 2009-04-18 13:58 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Neil Brown, Jon Nelson, LinuxRaid

Doug Ledford wrote:
> On Apr 16, 2009, at 11:49 PM, Neil Brown wrote:
>> On Monday April 6, dledford@redhat.com wrote:
>>> On Apr 1, 2009, at 6:46 PM, Neil Brown wrote:
>>>
>>>> On Wednesday April 1, jnelson-linux-raid@jamponi.net wrote:
>>>>> ping?
>>>>
>>>> Oh yeah, that's right, I was going to reply to that - thanks for the
>>>> reminder.
>>>>
>>>>>
>>>>> On Tue, Mar 24, 2009 at 11:57 AM, Jon Nelson
>>>>> <jnelson-linux-raid@jamponi.net> wrote:
>>>>>>
>>>>>> I have a raid1 comprised of a local physical device (/dev/sda) and a
>>>>>> network block device (/dev/nbd0).
>>>>>> When the machine hosting the network block device comes up, however,
>>>>>> it creates /dev/md127.
>>>>>> Why?
>>>>
>>>> Because you cannot please all the people, all the time.
>>>
>>> Very true.
>>
>> And I fear I'm going to be displeasing again :-(
>>
>>>
>>>>
>>>> People seem to want their arrays to auto-assemble - you know, just
>>>> appear and do the right thing, read their mind probably, because
>>>> creating config files is too hard.
>>>> So I've endeavoured to make that happen.
>>>>
>>>> The biggest problem with auto-assembly is what to do if two arrays
>>>> claim to have the same name. (e.g. /dev/md0) - which one wins.
>>>> The 'homehost' is (currently) used to resolve that.  An array only
>>>> gets to use the name it claims to have if it can show that it belongs
>>>> to "this" host.  If it doesn't it still get assembled, but with some
>>>> other more generic name.
>>>
>>> FWIW, I happen to disagree with this method.  And I'm currently
>>> testing out a new algorithm for this in Fedora 11 beta.
>>
>> Thank you for explaining this in such detail.
>> There are aspects of it that I don't like, but I think there might be
>> pieces that I can take away from it too.
>>
>> As you probably know, my preferred solution is to have all arrays
>> listed in /etc/mdadm.conf.  If it isn't in mdadm.conf, it doesn't get
>> assembled.   But I don't have a lot of company in this opinion.  Lots
>> of people want to have arrays assembled without them being in
>> mdadm.conf, and I'm trying to work with that.
>
> This appears to be the difference between a server setup and a desktop 
> setup.  Server admins want to list things and only have known actions 
> happen.  Desktop people want things to "just work".  I've had several 
> people tell me they thought the idea of mdadm.conf was completely out 
> of date and it should just go away entirely.  Not saying I agree, just 
> letting you know what I get.
>
>> Parts of what you are proposing seem to involve expecting people to
>> take a middle ground with some arrays listed in mdadm.conf and other
>> that aren't.
>
> I do this myself FWIW.  My / and /boot arrays are in mdadm.conf, but 
> arrays that I plug in via USB, eSATA, etc. are not.

Similar here: I have arrays which should not be assembled without an
explicit request, for various reasons, including some which have
passwords on the filesystem, and some which are mutually exclusive (test
and production, 32/64 bit setups, etc).

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back."
    - Representative Earl Pomeroy,  Democrat of North Dakota
on the A.I.G. executives who were paid bonuses  after a federal bailout.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Patch] mdadm ignoring homehost?
  2009-04-17 18:40           ` Piergiorgio Sartor
  2009-04-18  7:54             ` Luca Berra
@ 2009-04-18 14:34             ` Andrew Burgess
  1 sibling, 0 replies; 59+ messages in thread
From: Andrew Burgess @ 2009-04-18 14:34 UTC (permalink / raw)
  To: linux raid mailing list

On Fri, 2009-04-17 at 20:40 +0200, Piergiorgio Sartor wrote:
> 
> The problem is that when something is changed in the array,
> the file needs to be recreated.
> For example, this happened to me: after growing the array it
> was not possible anymore to restart it without rerunning
> the "--examine --scan" thing.

That happened to me too, but it was because I had the number of devices
in the file and that became incorrect when I grew the array. So I
removed everything except the UUID and the name. mdadm can glean all
the other information from the devices themselves. The UUID won't
change unless you completely obliterate the array.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Patch] mdadm ignoring homehost?
  2009-04-18 13:35             ` Doug Ledford
  2009-04-18 13:52               ` Piergiorgio Sartor
@ 2009-04-18 14:48               ` Jon Nelson
  2009-04-20  6:08               ` Neil Brown
  2 siblings, 0 replies; 59+ messages in thread
From: Jon Nelson @ 2009-04-18 14:48 UTC (permalink / raw)
  Cc: LinuxRaid

On Sat, Apr 18, 2009 at 8:35 AM, Doug Ledford <dledford@redhat.com> wrote:
> I've been thinking about this, and this is the method I would suggest.
>
> Add two new keywords to the mdadm.conf file:
>
> ASSEMBLE
> INCREMENTAL
>
> Allow each of those keywords to have one of three set values:
> None - Don't attempt to assemble any arrays regardless of whether or not they are in the mdadm.conf file or not
> Known - Only assemble arrays with a matching array line
> All - Attempt to assemble any array found
>
> The combination of the two options and the three settings would allow you to control mdadm behavior for both array assembly modes independently.  That, combined with my previous patch, should allow arrays to assemble well, with known names, allow you to control auto assembly by udev, and in the event that your machine just exports volumes to other machines for their use, stop assembly entirely.
>
> --



Of the proposals thus far, I think I like this one the best.
More or less, it's what I thought homehost was supposed to do (if
homehost did not match, ignore!) but given the morass around homehost,
this seems like a very reasonable approach to solving this class of
issues. With the increasing prevalence of block devices which may
appear on many hosts (like AoE, etc...) it seems as though this issue
isn't going to go away easily.



--
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Patch] mdadm ignoring homehost?
  2009-04-18 13:52               ` Piergiorgio Sartor
@ 2009-04-18 14:50                 ` Doug Ledford
  0 siblings, 0 replies; 59+ messages in thread
From: Doug Ledford @ 2009-04-18 14:50 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid, Luca Berra

[-- Attachment #1: Type: text/plain, Size: 950 bytes --]

On Apr 18, 2009, at 9:52 AM, Piergiorgio Sartor wrote:
> On Sat, Apr 18, 2009 at 09:35:33AM -0400, Doug Ledford wrote:
>> On Apr 18, 2009, at 4:12 AM, Luca Berra wrote:
>>> uhm, udev should be able to assemble an array without mdadm.conf,  
>>> not
>>> that i like it
>>
>> It does, the question is whether or not it should honor the preferred
>> minor when it assembles an array not in mdadm.conf.
>
> Does it?

Starting with F11, yes.  Not with prior releases.

> Then I should open a bug to Fedora, because to
> me it does not (always).
>
> bye,
>
> -- 
>
> piergiorgio


--

Doug Ledford <dledford@redhat.com>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband





[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 203 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Patch] mdadm ignoring homehost?
  2009-04-17  7:08         ` Gabor Gombas
@ 2009-04-20  5:23           ` Neil Brown
  2009-04-21  6:34             ` Gabor Gombas
  0 siblings, 1 reply; 59+ messages in thread
From: Neil Brown @ 2009-04-20  5:23 UTC (permalink / raw)
  To: Gabor Gombas; +Cc: Doug Ledford, Jon Nelson, LinuxRaid

On Friday April 17, gombasg@sztaki.hu wrote:
> On Fri, Apr 17, 2009 at 01:49:41PM +1000, Neil Brown wrote:
> 
> > As you probably know, my preferred solution is to have all arrays
> > listed in /etc/mdadm.conf.  If it isn't in mdadm.conf, it doesn't get
> > assembled.   But I don't have a lot of company in this opinion.  Lots
> > of people want to have arrays assembled without them being in
> > mdadm.conf, and I'm trying to work with that.
> 
> IMHO the goal to have all arrays defined in mdadm.conf would be much
> better to achieve if mdadm managed that configuration itself, not unlike
> how LVM metadata is handled. Of course doing that right is not exactly
> easy...

How does LVM manage metadata???  I assume it stores the metadata on
the devices, which is what mdadm does.
But as devices can move between machines.....

> 
> > Note that 0.90 metadata does contain homehost information to some
> > extent.  When homehost is set, the last few bytes of the uuid is set
> > from a hash of the homehost name.  That makes it possible to test if a
> > 0.90 array was created for 'this' host, but not to find out what host
> > it was created for.  So the above expedient won't work for 0.90
> > arrays, but the rest of the homehost concept (including any possible
> > 'homehost=any' option) does.
> 
> How about introducing /dev/md/by-uuid/... (or similar) and teaching
> people that if they want to transparently carry their arrays from one
> host to another, then they should always refer to it by UUID?

This already exists, though it might be distro-dependent.
  /dev/disk/by-id/md-uuid-xxxxx

> 
> Mounting file systems by UUID instead of device path got accepted by
> people who really care about moving things around, so doing the same for
> RAID could also work.

That would be nice... 

NeilBrown


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Patch] mdadm ignoring homehost?
  2009-04-18 13:06                   ` Piergiorgio Sartor
@ 2009-04-20  5:58                     ` Neil Brown
  2009-04-20 12:29                       ` Doug Ledford
  2009-04-20 18:17                       ` Piergiorgio Sartor
  0 siblings, 2 replies; 59+ messages in thread
From: Neil Brown @ 2009-04-20  5:58 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Saturday April 18, piergiorgio.sartor@nexgo.de wrote:
> On Sat, Apr 18, 2009 at 12:19:54PM +0200, Luca Berra wrote:
> > i believe the num-devices is redundant since this value is already
> > stored in the superblock
> 
> I believe too. As I mentioned, it would be possible
> to edit the outcome of "--examine --scan", but not
> really wanted.
> 
> > we could change the way mdadm outputs data to put _all_ redundant
> > information in subsequent lines and keep the only required info in the
> > 'ARRAY' line,
> > so mdadm --examine --scan | grep ARRAY would be suitable for initial
> > configuration
> > or even print it only if a --verbose flag is added
> > so mdadm --examine --scan by itself would suit most need
> 
> This second option I would prefer in one way or the other.
> I mean, either "mdadm --verbose --examine --scan": all info,
> or "mdadm --quiet --examine --scan": minimal info.
> One of the two would be OK, I guess (not necessarly both).

mdadm --verbose --examine (or --detail) --scan
already provides extra info that is not included without --verbose,
namely the list of devices that currently comprise the array.
I have just made a modification to 3.0-devel so that level= and
devices= are not reported unless --verbose is given.
That just leaves metadata=, UUID= and possibly name=, container= and
member=, which should all be safe to have in mdadm.conf.

Thanks for the suggestion.
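mdadm releases without this change can still print level= and num-devices=, so existing ARRAY lines may need trimming by hand. A minimal Python sketch, assuming each ARRAY entry fits on one whitespace-separated line; the helper name trim_array_line is hypothetical and the sample line reuses the UUID from the start of this thread.

```python
# Keys listed above as safe to keep in mdadm.conf; geometry keys such as
# level= or num-devices= can go stale, e.g. after growing the array.
SAFE_KEYS = {"metadata", "UUID", "name", "container", "member"}


def trim_array_line(line: str) -> str:
    """Drop key=value pairs that are not in SAFE_KEYS from one ARRAY line."""
    words = line.split()
    if not words or words[0] != "ARRAY":
        return line  # not an ARRAY line: leave it untouched
    kept = [words[0]]
    for word in words[1:]:
        key, sep, _value = word.partition("=")
        # Bare words (the device name, e.g. /dev/md0) are kept as-is;
        # key=value pairs are kept only if the key is considered safe.
        if not sep or key in SAFE_KEYS:
            kept.append(word)
    return " ".join(kept)


line = "ARRAY /dev/md0 level=raid1 num-devices=2 UUID=cf24d099:9e174a79:2a2f6797:dcff1420"
print(trim_array_line(line))
```

For the sample line this prints `ARRAY /dev/md0 UUID=cf24d099:9e174a79:2a2f6797:dcff1420`, which still identifies the array after it is grown.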

> 
> > for the time being some akw magic can be used to parse 'mdadm --examine
> > --scan' and make it suitable for inclusion in mdadm.conf
> 
> This is a possibility. It would be also OK to have a
> script, delivered together with mdadm, doing this.
> I can script myself, but a "standard solution" might
> be better.
> 
> One question somehow related to this thread.
> 
> I would like to have my "fixed" RAIDs as devices with a
> specific name.
> That is, something like /dev/md/root and /dev/md/lvm (for
> /dev/md0 and /dev/md1).
> In this context, I would also like to have /dev/md0 and
> /dev/md1 free to be used by other RAID.
> Of course, I've no problem in using mdadm.conf for this,
> but it seems that it is only possible something like
> /dev/mdX or /dev/md/X.
> 
> Is this correct or there is some way to "personalize" the
> created device name?

Yes.  If you use 0.90 metadata (still the default ... I wonder if I
should change that for 3.0..) then you need to list the name in
mdadm.conf, but

  ARRAY /dev/md/foo UUID=whatever

should do what you want.

If you use 1.x metadata (e.g. 1.0), then this works nicely.

 mdadm --create /dev/md/foo --metadata 1.0 --level .....

This will store the name 'foo' in the metadata and when you assemble
the array, it will be called /dev/md/foo.
This will be a symlink to /dev/md125 or something like that, but you
don't need to care.
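With 1.x metadata the stored name also carries the host, as in the 'turnip:11' reported by --examine at the start of this thread. A minimal sketch of the homehost naming rule under that assumption: the array keeps its recorded name only if the recorded host matches this machine and the name is free, otherwise it gets a suffix. The helper name and the exact _N suffix format are illustrative assumptions, not mdadm's code.

```python
def choose_md_name(stored_name: str, homehost: str, taken: set) -> str:
    """Pick a /dev/md/ name for a 1.x array from its 'host:name' field.

    stored_name -- the superblock Name field, e.g. "turnip:11"
    homehost    -- this machine's homehost, e.g. "frank"
    taken       -- short names already in use under /dev/md/
    """
    host, sep, short = stored_name.partition(":")
    if not sep:  # no host recorded: treat the array as foreign
        host, short = "", stored_name
    if host == homehost and short not in taken:
        return "/dev/md/" + short
    n = 0
    while f"{short}_{n}" in taken:  # find the first free _N suffix
        n += 1
    return f"/dev/md/{short}_{n}"


print(choose_md_name("frank:foo", "frank", set()))     # /dev/md/foo
print(choose_md_name("turnip:11", "frank", set()))     # /dev/md/11_0
print(choose_md_name("turnip:11", "frank", {"11_0"}))  # /dev/md/11_1
```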

NeilBrown

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Patch] mdadm ignoring homehost?
  2009-04-18 13:35             ` Doug Ledford
  2009-04-18 13:52               ` Piergiorgio Sartor
  2009-04-18 14:48               ` Jon Nelson
@ 2009-04-20  6:08               ` Neil Brown
  2009-04-20 12:26                 ` Luca Berra
  2009-04-20 12:36                 ` Doug Ledford
  2 siblings, 2 replies; 59+ messages in thread
From: Neil Brown @ 2009-04-20  6:08 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Luca Berra, LinuxRaid

On Saturday April 18, dledford@redhat.com wrote:
> 
> I've been thinking about this, and this is the method I would suggest.
> 
> Add two new keywords to the mdadm.conf file:
> 
> ASSEMBLE
> INCREMENTAL
> 
> Allow each of those keywords to have one of three set values:
> None - Don't attempt to assemble any arrays regardless of whether or  
> not they are in the mdadm.conf file or not
> Known - Only assemble arrays with a matching array line
> All - Attempt to assemble any array found
> 
> The combination of the two options and the three settings would allow  
> you to control mdadm behavior for both array assembly modes  
> independently.  That, combined with my previous patch, should allow  
> arrays to assemble well, with known names, allow you to control auto  
> assembly by udev, and in the event that your machine just exports  
> volumes to other machines for their use, stop assembly entirely.

Why "None"??  Why would you use "None" rather than "Known" with an
empty list of arrays?

Why have two options: ASSEMBLE and INCREMENTAL ??
In what circumstances would you use different settings for these two
options?

I currently have two patches sitting in my scratch queue.  I am by no
means committed to them.

One allows you to have e.g.

  ARRAY ignore UUID=foo:bar:dead:beef

with the meaning that auto-assembly will ignore that array.  If you
run

  mdadm --assemble /dev/md/thing --uuid foo:bar:dead:beef

it will still assemble the array, but any auto-assembly will ignore
it.

The other allows you to say:

 AUTO -ddf -0.90 +all

which means don't auto-assemble any 'ddf' or '0.90' array, but do
auto-assemble anything else that is recognised.
You might want to use dmraid for ddf??

If you just have

  AUTO -all

then it won't auto-assemble anything, which is much like your
  ASSEMBLE  Known
  INCREMENTAL Known
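A minimal sketch of how the two patches could combine when auto-assembly considers one array. It assumes AUTO tokens are checked left to right with the first match deciding, that 'all' matches every metadata type, and that assembly is allowed when nothing matches; those details and the function name are assumptions, since neither patch is committed.

```python
def auto_assemble_ok(uuid: str, metadata: str,
                     ignored_uuids: set, auto_line: str) -> bool:
    """Decide whether auto-assembly may pick up an array.

    ignored_uuids -- UUIDs from 'ARRAY ignore UUID=...' lines
    auto_line     -- e.g. 'AUTO -ddf -0.90 +all'
    An explicit 'mdadm --assemble ... --uuid ...' would bypass this check.
    """
    if uuid in ignored_uuids:
        return False
    words = auto_line.split()
    if not words or words[0] != "AUTO":
        raise ValueError(f"not an AUTO line: {auto_line!r}")
    for token in words[1:]:
        sign, name = token[0], token[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"AUTO token needs + or -: {token!r}")
        if name == "all" or name == metadata:
            return sign == "+"  # the first matching token decides
    return True  # nothing matched: allow (assumed default)


policy = "AUTO -ddf -0.90 +all"
print(auto_assemble_ok("u1", "ddf", set(), policy))       # False
print(auto_assemble_ok("u2", "1.0", set(), policy))       # True
print(auto_assemble_ok("u3", "1.0", {"u3"}, policy))      # False
print(auto_assemble_ok("u4", "1.0", set(), "AUTO -all"))  # False
```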

NeilBrown

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Patch] mdadm ignoring homehost?
  2009-04-17 18:17         ` Doug Ledford
                             ` (2 preceding siblings ...)
  2009-04-18 13:58           ` Bill Davidsen
@ 2009-04-20  7:23           ` Neil Brown
  2009-04-20 13:15             ` Doug Ledford
  3 siblings, 1 reply; 59+ messages in thread
From: Neil Brown @ 2009-04-20  7:23 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Jon Nelson, LinuxRaid

On Friday April 17, dledford@redhat.com wrote:
> On Apr 16, 2009, at 11:49 PM, Neil Brown wrote:
> > On Monday April 6, dledford@redhat.com wrote:
> >> On Apr 1, 2009, at 6:46 PM, Neil Brown wrote:
> 
> This appears to be the difference between a server setup and a desktop  
> setup.  Server admins want to list things and only have known actions  
> happen.  Desktop people want things to "just work".  I've had several  
> people tell me they thought the idea of mdadm.conf was completely out  
> of date and it should just go away entirely.  Not saying I agree, just  
> letting you know what I get.

:-)

> >  I'm not sure I'm happy with expecting people to do that
> > (though of course I'm happy to support it).
> 
> I really don't expect them to per se.  More like it's the *safe* thing  
> to do.  If you ever have a conflict in names, the one in the file  
> wins.  If you ever have a conflict in names without one of them in the  
> file, then it's whoever got there first.  In that sense, mdadm.conf is  
> just a backup for me.  Well, that and mkinitrd doesn't do incremental  
> assembly, so it's needed for boot in my case.  But that could be  
> changed.

So the safe thing to do is to create mdadm.conf.  But we all know that
the convenient thing to do is not to create mdadm.conf.

Thus safe and convenient are separate.  This sounds like bad design.

I like it that not creating mdadm.conf is a little bit inconvenient in
that you are more likely to get names with _N suffixes.  It (I hope)
motivates people to become safe, either by making sure homehost works,
or by creating mdadm.conf.

The case that I want to avoid is this:
  You have two machines that each boot off their own md0.
  Late one night machine A dies.  So you get called in, while half
  asleep, to get the data back on line.
  You shut down B, pull the drives out of A, plug them into B and
  then boot B.
  You find that it made a root filesystem from the drives that were in A
  rather than in B.

This could be just inconvenient, or it could be a serious mess.

I don't want people to discover these potential naming conflicts while
trying to recover from a disaster.  I want them to discover them when
initially setting up their array.

To achieve that, I should probably make the _N suffix truly random
rather than simply arbitrary.  But I haven't done that.  Yet.

> 
> > So the various parts of your algorithm which involve heuristics
> > based on the entries in mdadm.conf - or on the existence of mdadm.conf
> > itself - are parts that I don't feel comfortable with.
> >
> > What is left?  Well, the observation that moving an external
> > multi-drive enclosure between hosts causes confusing naming is a valid
> > and useful observation.
> >
> > Someone should be able to create an array on such a device called
> > 'foo' and get '/dev/md/foo' created on any host.
> > The best thought I have come to so far is to support (and document)
> > something like
> >  --create --homehost=any
> > or
> >  --create --homehost=*
> >
> > with the meaning that the array so created will get preferential
> > access to it's recorded name (i.e. no "_0" suffix).
> >
> > I also wonder if, when mdadm finds an array that is explicitly for
> > another host, we could use that host name rather than _0 to
> > disambiguate.  So
> >  --create /dev/md/foo --homehost=bob
> > when assembled on some other host is called
> >       /dev/md/foo_bob
> > that might at least make it more obvious what is happening.
> 
> This is probably where you and I disagree.  I don't think you are  
> disambiguating.  I think you are confounding the common case of no  
> conflict.  If someone has a non-portable array, like /, they commonly  
> use something like /dev/md0.  That, you will likely never get a  
> conflict on.

Except in the above scenario, when you least want it to happen.

>               On the other hand, if someone creates an array to be  
> mobile, it will likely have a higher number (or it could be 0, but  
> that implies they aren't using root raid arrays on their machines in  
> all likelihood).  So, if you make a mobile array, just give it any old  
> number you can remember other than the normal base numbers used by non- 
> portable arrays, and viola, no conflicts (note that this is also why I  
> was in favor of a completely numberless md setup, where device  
> major:minor do not impact name of the array at all, and you are free  
> to create something like /dev/md/root and there will be no access file  
> other than /dev/md/root, specifically no alias from /dev/md0 to /dev/ 
> md/root...it's much easier to remember names than numbers, and much  
> easier to create a scheme that avoids conflicts 100% of the time).  As  
> it stands though, the current code still won't honor random names as  
> though that was the official and canonical name of the array, it  
> insists on creating a /dev/md# device and then just symlinking the  
> name as though the /dev/md# device is canonical.  In one of your  
> previous emails you mentioned something about how bad design decisions  
> get entrenched and can never be rooted out, I would point to this
> ;-)

I had forgotten about this...
The kernel supports this.  We just need to make sure it works with
udev and get mdadm to use it.

 echo md_foo > /sys/module/md_mod/parameters/new_array
 ls -l /dev/md_foo

no numbers at all.

Maybe we can start using this in 3.1.
But I'm not sure how this relates to the current problem of how to
choose a name based on the contents of the metadata.



You draw a distinction between mobile and non-mobile arrays.  Quite
possibly that is a useful distinction to pursue.

It is the non-mobile arrays that I am particularly concerned about.
If someone plugs in a mobile array I'm happy to give them whatever
name seems like a good idea - conflicts aren't such a problem.

But how can we tell the difference???

Well, we could look in /etc/fstab (unless the same people who think
/etc/mdadm.conf is old fashioned manage to get rid of /etc/fstab as
well).

How about this:
  A name is 'local' if:
    it is associated with the array via mdadm.conf or
    it is associated with this host via 'homehost' 
  A name is 'non-mobile' if:
    it is associated with some use in /etc/fstab

Then if a name is either 'local' or not 'non-mobile', then we feel free to
use it as it stands, otherwise we add a _N suffix.
I think this is fairly close to using 'my' rules for things listed in
/etc/fstab, and 'your' rules for everything else.
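A minimal sketch of that rule, assuming /etc/fstab device fields are compared as plain strings. Entries written as UUID=... or LABEL=... would never match an md name, which is one way this could prove fragile. The function names are hypothetical.

```python
def fstab_devices(fstab_text: str) -> set:
    """Collect the device field (first column) of each /etc/fstab entry."""
    devices = set()
    for line in fstab_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            devices.add(line.split()[0])
    return devices


def keep_name(name: str, conf_names: set, homehost_matches: bool,
              fstab: set) -> bool:
    """True: use the name as it stands; False: add a _N suffix."""
    local = name in conf_names or homehost_matches  # mdadm.conf or homehost
    non_mobile = name in fstab                      # something depends on it
    return local or not non_mobile


fstab = fstab_devices("/dev/md0  /  ext3  defaults  1 1\n"
                      "# /dev/md1 is swap\n"
                      "/dev/md1  swap  swap  defaults  0 0\n")
# Drives moved in from another box, claiming /dev/md0: gets a suffix.
print(keep_name("/dev/md0", set(), False, fstab))         # False
# A USB array nothing in fstab refers to keeps its name.
print(keep_name("/dev/md/backup", set(), False, fstab))   # True
# Our own root array, listed in mdadm.conf, keeps its name.
print(keep_name("/dev/md0", {"/dev/md0"}, False, fstab))  # True
```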

This is tempting, but feels like it might be a bit fragile.
Does anything other than /etc/fstab depend on device names to
find things that are stored on devices?

One fragility would appear when running "mdadm -As" in an initrd.
You might not have an /etc/fstab at all, so everything might get
assembled using the wrong set of rules.

Maybe there is a safe way to detect "in initrd" and impose the
conservative rules in that case.

> 
> > Note that 0.90 metadata does contain homehost information to some
> > extent.  When homehost is set, the last few bytes of the uuid is set
> > from a hash of the homehost name.  That makes it possible to test if a
> > 0.90 array was created for 'this' host, but not to find out what host
> > it was created for.  So the above expedient won't work for 0.90
> > arrays, but the rest of the homehost concept (including any possible
> > 'homehost=any' option) does.
> >
> > You note that arrays with no homehost are treated as foreign with not
> > always being a good thing.  In 3.0, homehost is no longer optional.
> > If it is not explicitly set, it will default to `uname -n`.  So newly
> > created arrays will not suffer from this problem.  Arrays created with
> > mdadm 2.x do.  They can be 'upgraded' with
> >    --assemble --update=homehost
> > which is a suggestion that should be put in the man page.
> 
> This is a bad idea, and just reinforces my thought that we shouldn't  
> be paying attention to homehost.  Amongst the most important aspects  
> are machines that are booted up, installed, raid arrays created during  
> install, then shut down and moved, likely changing dhcp hostnames in  
> the process.  Now all your homehosts belong to some hostname in some  
> IT guys install network instead of in your final network.  At install  
> time, it's actually fairly common that the hostname is not yet set,  
> especially at raid array creation time.

But it should be fairly straightforward for the IT guys to arrange
for an mdadm.conf to be created that records the UUID of the array.
If the UUID is in mdadm.conf, you don't need homehost.

> 
> > Your idea of allowing the names "/dev/md0" and "md0" to connect with  
> > the
> > minor number '0' in the same way that the name "0" does is a good
> > one.  I have implemented that.
> >
> > I think I am leaning towards 'homehost=any' rather than 'homehost=*'
> > and will implement that. (No one would have a computer called 'any'
> > would they?).
> >
> > Thanks again for your input.
> 
> No problem.


Maybe a summary is in order.
We have:

 A - arrays that clearly belong on 'this' machine.  Either they are
     unambiguously listed in mdadm.conf, or they contain homehost
     information that ties them to this computer.
 B - arrays that explicitly list another host in their metadata
 C - arrays that don't explicitly list a host.

and

 a - device names that are explicitly recorded, e.g. in /etc/fstab
 b - device names that are not explicitly used and so are only
     interesting to people.
  
We have:

 1 - boot time, when we want to be cautious about not assembling
    the wrong thing
 2 - normal run time when we have mounted all the really important
    filesystems and  we can be less cautious.

and we have:

 i  - cases when we want to explicitly not assemble certain arrays,
      such as SAN environments
 ii - cases when we want to assemble anything that appears

And various combinations that different people feel strongly about.
And the question is:  can we actually please all the people all the
time?

I think that if we can make a reliable and meaningful distinction
between 1 and 2, and between a and b,  and if we assemble only A in
case 1, and never assemble 'a' which is not 'A', and if we support
disabling of autoassembly for everything, or specific metadata types,
or specific arrays, in mdadm.conf - then we come pretty close.
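One literal reading of that rule as a sketch. The class letters and case numbers follow the summary above; the function name and boolean inputs are assumptions, and in particular it takes for granted that 'boot time' can somehow be detected, which is exactly the open question.

```python
def may_autoassemble(array_class: str, name_is_recorded: bool,
                     boot_time: bool, disabled_in_conf: bool) -> bool:
    """Decide auto-assembly for one array.

    array_class      -- "A" (belongs here), "B" (names another host),
                        "C" (names no host)
    name_is_recorded -- True for an 'a' name (used e.g. in /etc/fstab),
                        False for a 'b' name
    boot_time        -- True in case 1, False in case 2
    disabled_in_conf -- True if mdadm.conf disables auto-assembly for this
                        array or its metadata type (case i)
    """
    if disabled_in_conf:
        return False
    if boot_time:
        return array_class == "A"  # case 1: only arrays that belong here
    if name_is_recorded and array_class != "A":
        return False               # never assemble an 'a' that is not 'A'
    return True                    # case 2 with ii: assemble what appears


print(may_autoassemble("C", False, True, False))   # False: boot time, not A
print(may_autoassemble("C", False, False, False))  # True: run time, 'b' name
print(may_autoassemble("B", True, False, False))   # False: 'a' name, not A
```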

Does anyone have thoughts on the 1 vs 2 distinction?? or the a vs b
distinction. 

I'm not sure that the B vs C distinction is of any value, but I
thought I would mention it for completeness.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20  6:08               ` Neil Brown
@ 2009-04-20 12:26                 ` Luca Berra
  2009-04-20 12:36                 ` Doug Ledford
  1 sibling, 0 replies; 59+ messages in thread
From: Luca Berra @ 2009-04-20 12:26 UTC (permalink / raw)
  To: LinuxRaid

On Mon, Apr 20, 2009 at 04:08:00PM +1000, Neil Brown wrote:
>On Saturday April 18, dledford@redhat.com wrote:
>> 
>> I've been thinking about this, and this is the method I would suggest.
>> 
>> Add two new keywords to the mdadm.conf file:
>> 
>> ASSEMBLE
>> INCREMENTAL
>> 
>> Allow each of those keywords to have one of three set values:
>> None - Don't attempt to assemble any arrays regardless of whether or  
>> not they are in the mdadm.conf file or not
>> Known - Only assemble arrays with a matching array line
>> All - Attempt to assemble any array found
>> 
>> The combination of the two options and the three settings would allow  
>> you to control mdadm behavior for both array assembly modes  
>> independently.  That, combined with my previous patch, should allow  
>> arrays to assemble well, with known names, allow you to control auto  
>> assembly by udev, and in the event that your machine just exports  
>> volumes to other machines for their use, stop assembly entirely.
>
>Why "None"??  Why would you use "None" rather than "Known" with an
>empty list of arrays?
i believe ASSEMBLE is intended as AUTO ASSEMBLE
so we would have the description for the array and be able to assemble it
manually using: mdadm -A /dev/md/foo
i would like to be able to define this per array though

>Why have two options: ASSEMBLE and INCREMENTAL ??
>In what circumstance would you use different settings for these two
>options.
maybe i want to avoid incremental assemble of a big array, which would
result in undesired rebuilds.

>I currently have two patches sitting in my scratch queue.  I am by no
>means committed to them.
>
>One allows you to have e.g.
>
>  ARRAY ignore UUID=foo:bar:dead:beef
>
>with the meaning that auto-assembly will ignore that array.  If you
>run
>
>  mdadm --assemble /dev/md/thing --uuid foo:bar:dead:beef
>
>it will still assemble the array, but any auto-assembly will ignore
>it.
it is not clear to me what the difference is from having no line at all


>The other allows you to say:
>
> AUTO -ddf -0.90 +all
>
>which means don't auto-assemble any 'ddf' or '0.90' array, but do
>auto-assemble anything else that is recognised.
>You might want to use dmraid for ddf??
>
>If you just have
>
>  AUTO -all
>

one syntax or the other is ok for me, provided the ability to control
what mdadm is doing.
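A small Python sketch of how the two patches Neil describes could combine, assuming the `AUTO` words are checked left to right with the first match winning, and that an array matching no rule is still auto-assembled. Both of those points, and the function names, are assumptions for illustration, not mdadm's code.

```python
def parse_auto_line(line):
    """Turn 'AUTO -ddf -0.90 +all' into [(False, 'ddf'), (False, '0.90'), (True, 'all')]."""
    words = line.split()
    if not words or words[0] != "AUTO":
        raise ValueError(f"not an AUTO line: {line!r}")
    rules = []
    for word in words[1:]:
        if len(word) < 2 or word[0] not in "+-":
            raise ValueError(f"expected +name or -name, got {word!r}")
        rules.append((word[0] == "+", word[1:]))
    return rules


def auto_assembly_allowed(rules, metadata, uuid, ignored_uuids, explicit=False):
    """Apply 'ARRAY ignore' entries and AUTO rules to one candidate array."""
    if explicit:
        # e.g. 'mdadm --assemble /dev/md/thing --uuid ...' still works
        return True
    if uuid in ignored_uuids:
        # the array has an 'ARRAY ignore UUID=...' line
        return False
    for allow, name in rules:
        # assumption: rules are checked left to right and the first match wins
        if name in (metadata, "all"):
            return allow
    # assumption: with no matching rule, auto-assembly proceeds as before
    return True
```

With `AUTO -ddf -0.90 +all`, a 1.0 array is assembled while ddf and 0.90 arrays are left alone; `AUTO -all` stops everything except an explicit `--assemble`.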

L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20  5:58                     ` Neil Brown
@ 2009-04-20 12:29                       ` Doug Ledford
  2009-04-20 18:17                       ` Piergiorgio Sartor
  1 sibling, 0 replies; 59+ messages in thread
From: Doug Ledford @ 2009-04-20 12:29 UTC (permalink / raw)
  To: Neil Brown; +Cc: Piergiorgio Sartor, linux-raid

On Apr 20, 2009, at 1:58 AM, Neil Brown wrote:
> If you use 1.x metadata (e.g. 1.0), then this works nicely.
>
> mdadm --create /dev/md/foo --metadata 1.0 --level .....
>
> This will store the name 'foo' in the metadata and when you assemble
> the array, it will be called /dev/md/foo.
> This will be a symlink to /dev/md125 or something like that, but you
> don't need to care.


I would prefer to see /dev/md/foo as an actual device special file,  
not a symlink, and no /dev/md125 at all.  Additionally, /proc/mdstat  
output doesn't match /dev/md/foo, it matches /dev/md125, so if you  
need to figure out what raid device is /dev/md/foo so you can see its  
status in /proc/mdstat, then you have to dereference the /dev/md/foo  
symlink.  This just highlights the fact that we haven't gotten past  
numbers as our primary way of referring to md devices.  Kill the  
numbers, allow names to be a *sole* means of reference to an array.   
Otherwise, that lingering /dev/md125 just confuses the issue.
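The dereferencing step Doug complains about can be sketched in Python as follows. This assumes `/dev/md/foo` is a symlink to the numbered node and that `/proc/mdstat` lines begin with `mdNNN :`; a temporary directory stands in for `/dev`, and the helper names are hypothetical.

```python
import os
import tempfile


def kernel_name(md_path):
    """Follow e.g. /dev/md/foo to the node it points at and return 'md125'."""
    return os.path.basename(os.path.realpath(md_path))


def mdstat_entry(mdstat_text, md_path):
    """Return the /proc/mdstat line for the array behind md_path, or None."""
    name = kernel_name(md_path)
    for line in mdstat_text.splitlines():
        if line.startswith(name + " :"):
            return line
    return None


# Usage against a throwaway directory standing in for /dev:
with tempfile.TemporaryDirectory() as dev:
    open(os.path.join(dev, "md125"), "w").close()
    os.mkdir(os.path.join(dev, "md"))
    os.symlink(os.path.join(dev, "md125"), os.path.join(dev, "md", "foo"))
    sample = "Personalities : [raid1]\nmd125 : active raid1 sdb1[1] sda1[0]\n"
    print(mdstat_entry(sample, os.path.join(dev, "md", "foo")))
```

The extra hop through the numbered name is exactly what a names-only scheme would remove.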

--

Doug Ledford <dledford@redhat.com>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20  6:08               ` Neil Brown
  2009-04-20 12:26                 ` Luca Berra
@ 2009-04-20 12:36                 ` Doug Ledford
  1 sibling, 0 replies; 59+ messages in thread
From: Doug Ledford @ 2009-04-20 12:36 UTC (permalink / raw)
  To: Neil Brown; +Cc: Luca Berra, LinuxRaid

On Apr 20, 2009, at 2:08 AM, Neil Brown wrote:
> On Saturday April 18, dledford@redhat.com wrote:
>>
>> I've been thinking about this, and this is the method I would suggest.
>>
>> Add two new keywords to the mdadm.conf file:
>>
>> ASSEMBLE
>> INCREMENTAL
>>
>> Allow each of those keywords to have one of three set values:
>> None - Don't attempt to assemble any arrays regardless of whether or
>> not they are in the mdadm.conf file or not
>> Known - Only assemble arrays with a matching array line
>> All - Attempt to assemble any array found
>>
>> The combination of the two options and the three settings would allow
>> you to control mdadm behavior for both array assembly modes
>> independently.  That, combined with my previous patch, should allow
>> arrays to assemble well, with known names, allow you to control auto
>> assembly by udev, and in the event that your machine just exports
>> volumes to other machines for their use, stop assembly entirely.
>
> Why "None"??  Why would you use "None" rather than "Known" with an
> empty list of arrays?

For the same reason we have a write-mostly flag for raid1.  Maybe we  
have two machines that both want/need to know about a given array, but  
only one should access it at a time (clustered failover scenario).  On  
the non-primary node, you use None, on the primary you use Known, and  
bootup proceeds properly.  Then on failover, on the non-primary  
machine you already have the array in mdadm.conf and can bring it up  
safely and reliably in manual operation.  This requires that mdadm  
tell the difference between manual and automatic mode (aka, from a  
command line instead of a shell script), so you need a new option flag  
to override the assembly/incremental settings, but that's the only  
change necessary.

> Why have two options: ASSEMBLE and INCREMENTAL ??
> In what circumstance would you use different settings for these two
> options.

I can't speak to other distros, but at least Fedora still does  
assembly on bootup, and incremental after bootup (well, we switch to  
incremental part way through bootup, mainly once rc.sysinit has  
completed).  Maybe you have a machine that exports raid block devices  
via AOE, and these are always present at bootup, so you want ASSEMBLE
set to None.  Yet, you also plug in a roving USB disk array for online
backups, so you want that to come up via hot plug.  There are lots of  
reasons things might be done.  I just suggested a method that is  
flexible enough to satisfy even the most whacked out scenario.
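Doug's proposed ASSEMBLE/INCREMENTAL keywords, plus the manual override he mentions for failover, can be modelled roughly as below. These keywords were a proposal in this thread, so everything here is hypothetical: the `Policy` values mirror None/Known/All, the `manual` flag stands in for the new command-line option he suggests, and treating an unset keyword as All is an assumption.

```python
from enum import Enum


class Policy(Enum):
    NONE = "None"    # never auto-assemble, even arrays listed in mdadm.conf
    KNOWN = "Known"  # only arrays that have a matching ARRAY line
    ALL = "All"      # try to assemble any array found


def should_assemble(mode, policies, listed, manual=False):
    """Decide whether mdadm should start an array.

    mode      -- "ASSEMBLE" (mdadm -As at boot) or "INCREMENTAL" (udev hot plug)
    policies  -- dict mapping each mode to a Policy, as read from mdadm.conf
    listed    -- True if the array has an ARRAY line in mdadm.conf
    manual    -- True when an admin runs mdadm by hand with the (hypothetical)
                 override flag, e.g. when taking over on cluster failover
    """
    if manual:
        return True
    # assumption: an unset keyword keeps today's behaviour, i.e. All
    policy = policies.get(mode, Policy.ALL)
    if policy is Policy.NONE:
        return False
    if policy is Policy.KNOWN:
        return listed
    return True
```

The AoE exporter in Doug's example would use `{"ASSEMBLE": Policy.NONE, "INCREMENTAL": Policy.ALL}`, so boot-time assembly is skipped while a hot-plugged USB array still comes up.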

> I currently have two patches sitting in my scratch queue.  I am by no
> means committed to them.
>
> One allows you to have e.g.
>
>  ARRAY ignore UUID=foo:bar:dead:beef
>
> with the meaning that auto-assembly will ignore that array.  If you
> run
>
>  mdadm --assemble /dev/md/thing --uuid foo:bar:dead:beef
>
> it will still assemble the array, but any auto-assembly will ignore
> it.
>
> The other allows you to say:
>
> AUTO -ddf -0.90 +all
>
> which means don't auto-assemble any 'ddf' or '0.90' array, but do
> auto-assemble anything else that is recognised.
> You might want to use dmraid for ddf??
>
> If you just have
>
>  AUTO -all
>
> then it won't auto-assemble anything, which is much like your
>  ASSEMBLE  Known
>  INCREMENTAL Known
>
> NeilBrown


--

Doug Ledford <dledford@redhat.com>

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20  7:23           ` Neil Brown
@ 2009-04-20 13:15             ` Doug Ledford
  2009-04-21  6:54               ` Neil Brown
  2009-05-11  6:47               ` Neil Brown
  0 siblings, 2 replies; 59+ messages in thread
From: Doug Ledford @ 2009-04-20 13:15 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jon Nelson, LinuxRaid

On Apr 20, 2009, at 3:23 AM, Neil Brown wrote:
> On Friday April 17, dledford@redhat.com wrote:
>> On Apr 16, 2009, at 11:49 PM, Neil Brown wrote:
>>> On Monday April 6, dledford@redhat.com wrote:
>>>> On Apr 1, 2009, at 6:46 PM, Neil Brown wrote:
>>
>> This appears to be the difference between a server setup and a
>> desktop setup.  Server admins want to list things and only have known
>> actions happen.  Desktop people want things to "just work".  I've had
>> several people tell me they thought the idea of mdadm.conf was
>> completely out of date and it should just go away entirely.  Not
>> saying I agree, just letting you know what I get.
>
> :-)
>
>>> I'm not sure I'm happy with expecting people to do that
>>> (though of course I'm happy to support it).
>>
>> I really don't expect them to per se.  More like it's the *safe*
>> thing to do.  If you ever have a conflict in names, the one in the
>> file wins.  If you ever have a conflict in names without one of them
>> in the file, then it's whoever got there first.  In that sense,
>> mdadm.conf is just a backup for me.  Well, that and mkinitrd doesn't
>> do incremental assembly, so it's needed for boot in my case.  But
>> that could be changed.
>
> So the safe thing to do is to create mdadm.conf.  But we all know that
> the convenient thing to do is not to create mdadm.conf.
>
> Thus safe and convenient are separate.  This sounds like bad design.

That's life.  It's always more convenient to do the non-safe thing.   
The question isn't whether or not they are different, but what level  
of safety do you give up for convenience.

> I like it that not creating mdadm.conf is a little bit inconvenient in
> that you are more likely to get names with _N suffixes.  It (I hope)
> motivates people to become safe, either by making sure homehost works,
> or by creating mdadm.conf.
>
> The case that I want to avoid is this:
>  You have two machines that each boot off their own md0.
>  Late one night machine A dies.  So you get called in, while half
>  asleep, to get the data back on line.
>  You shut down B, pull the drive out of A and plug them into B and
>  then boot B.
>  You find that it made a root filesystem from the drives that were in A
>  rather than in B.
>
> This could be just inconvenient, or it could be a serious mess.

This is a total non-issue.  It can't happen (at least in Fedora).   
It's a 100% impossibility.  The reason is that if you have a / raid  
array, it's started by the initrd, and the initrd uses assemble and an  
mdadm.conf file (you wouldn't be able to boot otherwise, regardless of  
whether or not you've moved drives, the / raid array *must* be in  
mdadm.conf).  The same is also true of the other machine.  So, the  
only way for this to happen is if the admin inserted the drive into a  
location that was before the existing drive in the BIOS boot order, in  
which case *we* and mdadm can't do a damn thing about that.  Doing  
something inconvenient in an attempt to solve a problem that we  
*can't* solve does us no good whatsoever.

> I don't want people to discover these potential naming conflicts while
> trying to recover from a disaster.  I want them to discover them when
> initially setting up their array.

Realistically, the admin would need to notice the different host name  
in this case.  If it brought up the wrong root, then at a minimum, the  
host name should be off.  If you are using dhcp host name setup for  
your servers in this situation, then depending on whether or not the  
servers are identical, you might actually be just fine running off the  
other root as the original machine.  In any case though, we can't do  
anything about it.  We booted off the wrong drive, so we'll have the  
wrong mdadm.conf and that mdadm.conf will think all the remote arrays  
are local.

> To achieve that, I should probably make the _N suffix truly random
> rather than simply arbitrary.  But I haven't done that.  Yet.
>
>>
>>> So the various parts of your algorithm which involve heuristics
>>> based on the entries in mdadm.conf - or on the existence of
>>> mdadm.conf itself - are parts that I don't feel comfortable with.
>>>
>>> What is left?  Well, the observation that moving an external
>>> multi-drive enclosure between hosts causes confusing naming is a
>>> valid and useful observation.
>>>
>>> Someone should be able to create an array on such a device called
>>> 'foo' and get '/dev/md/foo' created on any host.
>>> The best thought I have come to so far is to support (and document)
>>> something like
>>> --create --homehost=any
>>> or
>>> --create --homehost=*
>>>
>>> with the meaning that the array so created will get preferential
>>> access to its recorded name (i.e. no "_0" suffix).
>>>
>>> I also wonder if, when mdadm finds an array that is explicitly for
>>> another host, we could use that host name rather than _0 to
>>> disambiguate.  So
>>> --create /dev/md/foo --homehost=bob
>>> when assembled on some other host is called
>>>      /dev/md/foo_bob
>>> that might at least make it more obvious what is happening.
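A rough Python sketch of the naming Neil proposes here: an array whose homehost is this machine or the literal `any` keeps its recorded name, an array recorded for another host gets that host as a suffix, and anything else falls back to an arbitrary `_N`. The function name, the `taken` set and the fallback order are assumptions for illustration only.

```python
def choose_name(recorded, array_homehost, this_host, taken):
    """Pick the /dev/md/ name for an array whose metadata records `recorded`.

    array_homehost is the host stored in the metadata (None if absent);
    taken is the set of names already in use on this machine.
    """
    mine = array_homehost in (this_host, "any")
    if mine and recorded not in taken:
        return recorded                      # e.g. 'foo'
    if array_homehost and not mine:
        by_host = f"{recorded}_{array_homehost}"
        if by_host not in taken:
            return by_host                   # e.g. 'foo_bob'
    n = 0                                    # fall back to a _N suffix
    while f"{recorded}_{n}" in taken:
        n += 1
    return f"{recorded}_{n}"
```

On host `frank`, an array created with `--homehost=bob` would come up as `foo_bob`, while one created with `--homehost=any` would come up as plain `foo`.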
>>
>> This is probably where you and I disagree.  I don't think you are
>> disambiguating.  I think you are confounding the common case of no
>> conflict.  If someone has a non-portable array, like /, they commonly
>> use something like /dev/md0.  That, you will likely never get a
>> conflict on.
>
> Except in the above false scenario, when you least want it to happen.
                       ^  There, corrected that for you.

>>              On the other hand, if someone creates an array to be
>> mobile, it will likely have a higher number (or it could be 0, but
>> that implies they aren't using root raid arrays on their machines in
>> all likelihood).  So, if you make a mobile array, just give it any
>> old number you can remember other than the normal base numbers used
>> by non-portable arrays, and voilà, no conflicts (note that this is
>> also why I was in favor of a completely numberless md setup, where
>> device major:minor do not impact the name of the array at all, and
>> you are free to create something like /dev/md/root and there will be
>> no access file other than /dev/md/root, specifically no alias from
>> /dev/md0 to /dev/md/root...it's much easier to remember names than
>> numbers, and much easier to create a scheme that avoids conflicts
>> 100% of the time).  As it stands though, the current code still
>> won't honor random names as though that was the official and
>> canonical name of the array; it insists on creating a /dev/md#
>> device and then just symlinking the name as though the /dev/md#
>> device is canonical.  In one of your previous emails you mentioned
>> something about how bad design decisions get entrenched and can
>> never be rooted out; I would point to this ;-)
>
> I had forgotten about this...
> The kernel supports this.  We just need to make sure it works with
> udev and get mdadm to use it.
>
> echo md_foo > /sys/module/md_mod/parameters/new_array
> ls -l /dev/md_foo
>
> no numbers at all.

Even in the output of /proc/mdstat?

> Maybe we can start using this in 3.1.
> But I'm not sure how this relates to the current problem of how to
> choose a name based on the contents of the metadata.
>
>
>
> You draw a distinction between mobile and non-mobile arrays.  Quite
> possibly that is a useful distinction to pursue.
>
> It is the non-mobile arrays that I am particularly concerned about.
> If someone plugs in a mobile array I'm happy to give them whatever
> name seems like a good idea - conflicts aren't such a problem.
>
> But how can we tell the difference???
>
> Well, we could look in /etc/fstab (unless the same people who think
> /etc/mdadm.conf is old fashioned manage to get rid of /etc/fstab as
> well).
>
> How about this:
>  A name is 'local' if:
>    it is associated with the array via mdadm.conf or
>    it is associated with this host via 'homehost'
>  A name is 'non-mobile' if:
>    it is associated with some use in /etc/fstab

At least in Fedora, this is a useless distinction.  For an array to be  
in fstab, it must first be in mdadm.conf.  We only allow non-fstab  
arrays to be autoassembled without also being in mdadm.conf.  And only  
then on hot plug that happens post-boot.

The Fedora mdadm bring up sequence goes like this:

1) In initrd, bring up any / raid arrays (supports stacked arrays and  
the like) using mdadm -As --run /dev/<device> (this way we support  
degraded array bring up as best possible)
2) In rc.sysinit, we start udev, but our udev incremental assembly  
rule checks if we are in rc.sysinit and skips incremental assembly as  
long as we are still there.  The start udev command will initiate the  
only add event we will get on the known devices at that time, so all  
those add events for devices already found by the system get ignored.
3) Later in rc.sysinit, if we have both /sbin/mdadm and
/etc/mdadm.conf, we run mdadm -As --run to bring up all listed arrays in
mdadm.conf (your patch for ignoring an array would be useful here, and
allow an even finer grained control than my ASSEMBLE/INCREMENTAL
settings, although I could see those two complementing each other in
that you could individually stop assembly of just select arrays, or
turn off all assembly, so I see value in both options)
4) Once we leave rc.sysinit, we have started all listed md raid  
arrays, *and* we have mounted the local filesystems in fstab.  Only  
now does udev incremental assembly start working.  And since we've  
already processed all the add events for devices present at boot, it  
only attempts to assemble things plugged in after this point in time.

I should note that this method of splitting autoassembly from udev  
autoassembly also allows me to start all the non-hotplug devices with  
the --run option, while in my udev rule I only start them when they  
are complete, never when degraded.  This way if you hot plug an  
incomplete array, we don't do anything automatically.  We limit the  
automatic, make it "just work" actions to things that are fully there,
but accept a degraded state on stuff we need to boot.
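The split Doug describes, with boot-time `mdadm -As --run` accepting degraded arrays and the post-boot udev rule starting only complete ones, can be modelled in a few lines of Python. This is a deliberately rough sketch: the `Array` record and both functions are hypothetical, and real mdadm also checks whether enough members exist for the RAID level before starting a degraded array.

```python
from dataclasses import dataclass


@dataclass
class Array:
    name: str
    listed: bool    # has an ARRAY line in /etc/mdadm.conf
    present: int    # member devices currently visible
    expected: int   # raid devices recorded in the metadata


def boot_assemble(arrays):
    """Rough model of 'mdadm -As --run' in rc.sysinit.

    Every listed array with at least one member present is started, even if
    degraded; unlisted or absent arrays are skipped without failing the boot.
    """
    return [a.name for a in arrays if a.listed and a.present > 0]


def udev_incremental(array, in_rc_sysinit):
    """Rough model of the udev rule: idle during rc.sysinit, then complete arrays only."""
    if in_rc_sysinit:
        return False
    return array.present == array.expected
```

A degraded root mirror listed in mdadm.conf is started at boot, while a half-plugged USB array is left alone until all of its members appear.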

Now, this might raise the question of "what if I put a hot plug array  
into my mdadm.conf, will that stop me from booting?"  The answer is  
no.  The mdadm -As --run command will start all available arrays  
listed in mdadm.conf.  If you list mobile arrays in there, but they  
aren't plugged in, then they will get happily ignored (and if they are  
there, they'll get brought up, which would be necessary since udev  
won't process them later).  In fact, if *none* of the arrays get  
started, the mdadm failure to start arrays will not stop the boot  
sequence.  It won't be until you get to attempting to mount an array  
that isn't running that rc.sysinit will kick you out to a fix  
filesystem prompt.  So, you are free to list arrays in mdadm.conf that  
you don't need for bootup, but you are required to list arrays you do  
need for bootup.

> Then if a name is either 'local' or not 'non-mobile', then we feel
> free to use it as it stands, otherwise we add a _N suffix.
> I think this is fairly close to using 'my' rules for things listed in
> /etc/fstab, and 'your' rules for everything else.
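Neil's 'local' / 'non-mobile' test can be sketched in Python as below. The helpers are hypothetical and only look for literal `/dev/md...` paths in the first fstab field, so entries written as `UUID=` or `LABEL=` are invisible to this check. That gap, and the missing fstab in an initrd, are the kind of fragility Neil raises next.

```python
def fstab_md_devices(fstab_text):
    """Collect /dev/md... device paths named in the first field of fstab."""
    devices = set()
    for line in fstab_text.splitlines():
        fields = line.split()
        # comment lines start with '#', so they never match '/dev/md'
        if fields and fields[0].startswith("/dev/md"):
            devices.add(fields[0])
    return devices


def use_name_as_is(device, conf_devices, array_homehost, this_host, fstab_text):
    """Return True to use the recorded name, False to add a _N suffix."""
    local = device in conf_devices or array_homehost == this_host
    non_mobile = device in fstab_md_devices(fstab_text)
    return local or not non_mobile
```

A foreign array that happens to be called `/dev/md/data` is pushed to a suffixed name only when `/dev/md/data` appears in fstab; a mobile array with an unused name keeps it.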
>
> This is tempting, but feels like it might be a bit fragile.
> Does anything other than /etc/fstab depend on device names to
> find things that are stored on devices?
>
> One fragility would appear when running "mdadm -As" in an initrd.
> You might not have an /etc/fstab at all, so everything might get
> assembled using the wrong set of rules.
>
> Maybe there is a safe way to detect "in initrd" and impose the
> conservative rules in that case.
>
>>
>>> Note that 0.90 metadata does contain homehost information to some
>>> extent.  When homehost is set, the last few bytes of the uuid are set
>>> from a hash of the homehost name.  That makes it possible to test if
>>> a 0.90 array was created for 'this' host, but not to find out what
>>> host it was created for.  So the above expedient won't work for 0.90
>>> arrays, but the rest of the homehost concept (including any possible
>>> 'homehost=any' option) does.
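To see why a 0.90 UUID can answer "was this made for me?" but not "who was it made for?", here is a Python sketch. It assumes, for illustration only, that the last 8 of the 16 UUID bytes are the first 8 bytes of a SHA-1 digest of the homehost name; mdadm's exact hash and byte layout may differ.

```python
import hashlib
import os


def host_tag(homehost):
    """Eight bytes derived from SHA-1 of the homehost name (assumed layout)."""
    return hashlib.sha1(homehost.encode()).digest()[:8]


def make_uuid(homehost):
    """16-byte UUID: 8 random bytes followed by the homehost tag."""
    return os.urandom(8) + host_tag(homehost)


def created_for(uuid, homehost):
    """Can answer 'was this made for homehost?' but cannot recover the name."""
    return uuid[8:] == host_tag(homehost)


def format_uuid(uuid):
    """Render as four colon-separated 32-bit hex words, like mdadm output."""
    return ":".join(uuid[i:i + 4].hex() for i in range(0, 16, 4))
```

Because the hash is one-way, `created_for(uuid, "frank")` is a cheap check, but turning the tag back into a host name would require guessing candidate names.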
>>>
>>> You note that arrays with no homehost are treated as foreign, which is
>>> not always a good thing.  In 3.0, homehost is no longer optional.
>>> If it is not explicitly set, it will default to `uname -n`.  So
>>> newly created arrays will not suffer from this problem.  Arrays
>>> created with mdadm 2.x do.  They can be 'upgraded' with
>>>   --assemble --update=homehost
>>> which is a suggestion that should be put in the man page.
>>
>> This is a bad idea, and just reinforces my thought that we shouldn't
>> be paying attention to homehost.  Amongst the most important aspects
>> are machines that are booted up, installed, raid arrays created
>> during install, then shut down and moved, likely changing dhcp
>> hostnames in the process.  Now all your homehosts belong to some
>> hostname in some IT guy's install network instead of in your final
>> network.  At install time, it's actually fairly common that the
>> hostname is not yet set, especially at raid array creation time.
>
> But it should be fairly straightforward for the IT guys to arrange
> that an mdadm.conf gets created which records the UUID of the array.
> If the UUID is in mdadm.conf, you don't need homehost.

OK, then see above for why, at least in Fedora, homehost has no value.

>>> Your idea of allowing the names "/dev/md0" and "md0" to connect with
>>> the minor number '0' in the same way that the name "0" does is a
>>> good one.  I have implemented that.
>>>
>>> I think I am leaning towards 'homehost=any' rather than 'homehost=*'
>>> and will implement that. (No one would have a computer called 'any'
>>> would they?).
>>>
>>> Thanks again for your input.
>>
>> No problem.
>
>
> Maybe a summary is in order.
> We have:
>
> A - arrays that clearly belong on 'this' machine.  Either they are
>     unambiguously listed in mdadm.conf, or they contain homehost
>     information that ties them to this computer.
> B - arrays that explicitly list another host in their metadata
> C - arrays that don't explicitly list a host.
>
> and
>
> a - device names that are explicitly recorded, e.g. in /etc/fstab
> b - device names that are not explicitly used and so are only
>     interesting to people.
>
> We have:
>
> 1 - boot time, when we want to be cautious about not assembling
>    the wrong thing
> 2 - normal run time when we have mounted all the really important
>    filesystems and  we can be less cautious.
>
> and we have:
>
> i  - cases when we want to explicitly not assemble certain arrays,
>      such as SAN environments
> ii - cases when we want to assemble anything that appears
>
> And various combinations that different people feel strongly about.
> And the question is:  can we actually please all the people all the
> time?
>
> I think that if we can make a reliable and meaningful distinction
> between 1 and 2, and between a and b,  and if we assemble only A in
> case 1, and never assemble 'a' which is not 'A', and if we support
> disabling of autoassembly for everything, or specific metadata types,
> or specific arrays, in mdadm.conf - then we come pretty close.
>
> Does anyone have thoughts on the 1 vs 2 distinction?? or the a vs b
> distinction.
>
> I'm not sure that the B vs C distinction is of any value, but I
> thought I would mention it for completeness.

IMO, with my name selection patch, with the option to list an array as
ignore, and with the option to turn either assembly or incremental
modes completely off as I suggested, done in combination with a boot
sequence like we now use in Fedora, you have this problem solved.  The
only niggling thing might be if mdadm -As --run without a device
specifier will pick up arrays via the DEVICE partitions that don't
have ARRAY lines, but even if it does, my name patch means that
any ARRAY lines will supersede any randomly found devices so the
expected name will go to the expected place, and not elsewhere (and I
did verify that the array that needs to have its normal name need not
be up and running for the interloper array to be kicked to another
name...it gets kicked on the fact that it doesn't match the array in
mdadm.conf, not on the presence of the matching array).  (Note: the
bootup sequence I listed above is our new sequence as of F11; older
versions of Fedora were similar, but didn't limit incremental assembly
to only outside of rc.sysinit, and so there were some bugs in actual
usage)
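The behaviour Doug verified, where an ARRAY line reserves its name even when the listed array is not running, can be sketched in Python as a lookup keyed on UUID. The function is hypothetical and the `_N` fallback is an assumption; the point is that the decision depends only on the UUID mismatch, not on whether the rightful array is present.

```python
def assign_name(wanted, uuid, conf_arrays, in_use):
    """Give an array its wanted name unless an ARRAY line reserves it for another UUID.

    conf_arrays maps a device name to the UUID on its ARRAY line in mdadm.conf;
    in_use is the set of names already assembled.
    """
    owner = conf_arrays.get(wanted)
    if owner in (None, uuid) and wanted not in in_use:
        return wanted
    n = 0
    while f"{wanted}_{n}" in in_use or f"{wanted}_{n}" in conf_arrays:
        n += 1
    return f"{wanted}_{n}"
```

An interloper also called `/dev/md/boot` is moved to `/dev/md/boot_0` even when nothing else is assembled yet.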


--

Doug Ledford <dledford@redhat.com>

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20  5:58                     ` Neil Brown
  2009-04-20 12:29                       ` Doug Ledford
@ 2009-04-20 18:17                       ` Piergiorgio Sartor
  2009-04-20 19:49                         ` Leslie Rhorer
  2009-04-20 21:13                         ` Luca Berra
  1 sibling, 2 replies; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-20 18:17 UTC (permalink / raw)
  To: linux-raid

On Mon, Apr 20, 2009 at 03:58:08PM +1000, Neil Brown wrote:
[...]
> If you use 1.x metadata (e.g. 1.0), then this works nicely.
> 
>  mdadm --create /dev/md/foo --metadata 1.0 --level .....
> 
> This will store the name 'foo' in the metadata and when you assemble
> the array, it will be called /dev/md/foo.
> This will be a symlink to /dev/md125 or something like that, but you
> don't need to care.

Does this really work with mdadm 2.6.7.1?

Because I have this from "--examine --scan"

ARRAY /dev/md/boot level=raid1 metadata=1.0 num-devices=2 UUID=edb4254d:4274fac1:dd6cad61:a8e3c347 name=boot

Of course, /dev/mdadm.conf has the same entry.

But:

$> mdadm -A --scan
mdadm: /dev/md/boot does not exist and is not a 'standard' name so it cannot be created

The array was *not* created as you suggested,
but it has metadata 1.0 and a name.
Which name is also returned by --examine --scan
as array device name.

Thanks,

bye,

-- 

piergiorgio

* RE: [Patch] mdadm ignoring homehost?
  2009-04-20 18:17                       ` Piergiorgio Sartor
@ 2009-04-20 19:49                         ` Leslie Rhorer
  2009-04-20 20:04                           ` Piergiorgio Sartor
  2009-04-20 21:18                           ` Luca Berra
  2009-04-20 21:13                         ` Luca Berra
  1 sibling, 2 replies; 59+ messages in thread
From: Leslie Rhorer @ 2009-04-20 19:49 UTC (permalink / raw)
  To: 'Linux RAID'

> >  mdadm --create /dev/md/foo --metadata 1.0 --level .....
> >
> > This will store the name 'foo' in the metadata and when you assemble
> > the array, it will be called /dev/md/foo.
> > This will be a symlink to /dev/md125 or something like that, but you
> > don't need to care.
> 
> Does this really work with mdadm 2.6.7.1?
> 
> Because I have this from "--examine --scan"
> 
> ARRAY /dev/md/boot level=raid1 metadata=1.0 num-devices=2
> UUID=edb4254d:4274fac1:dd6cad61:a8e3c347 name=boot

Um, this is slightly odd, but when I use --examine --scan on my array, it
returns nothing.  If I run it on an element of the array I get:

ARRAY /dev/md0 level=raid6 num-devices=10
UUID=5a53eeb8:4d87963b:4cb502e9:37bde716

Nothing about the metadata or the array name.

> Of course, /dev/mdadm.conf has the same entry.

Um, did you mean /etc/mdadm/mdadm.conf?  I don't have a /dev/mdadm.conf on
my system.


* Re: [Patch] mdadm ignoring homehost?
  2009-04-20 19:49                         ` Leslie Rhorer
@ 2009-04-20 20:04                           ` Piergiorgio Sartor
  2009-04-20 21:18                           ` Luca Berra
  1 sibling, 0 replies; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-20 20:04 UTC (permalink / raw)
  To: linux-raid

On Mon, Apr 20, 2009 at 02:49:48PM -0500, Leslie Rhorer wrote:
> Um, this is slightly odd, but when I use --examine --scan on my array, it
> returns nothing.  If I run it on an element of the array I get:
> 
> ARRAY /dev/md0 level=raid6 num-devices=10
> UUID=5a53eeb8:4d87963b:4cb502e9:37bde716
> 
> Nothing about the metadata or the array name.

That's interesting, maybe Neil should comment on this.
 
> > Of course, /dev/mdadm.conf has the same entry.
> 
> Um, did you mean /etc/mdadm/mdadm.conf?  I don't have a /dev/mdadm.conf on
> my system.

Yeah, I meant /etc/mdadm.conf...

bye,

-- 

piergiorgio

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20 18:17                       ` Piergiorgio Sartor
  2009-04-20 19:49                         ` Leslie Rhorer
@ 2009-04-20 21:13                         ` Luca Berra
  2009-04-20 21:24                           ` Piergiorgio Sartor
  2009-04-21 18:15                           ` Piergiorgio Sartor
  1 sibling, 2 replies; 59+ messages in thread
From: Luca Berra @ 2009-04-20 21:13 UTC (permalink / raw)
  To: linux-raid

On Mon, Apr 20, 2009 at 08:17:36PM +0200, Piergiorgio Sartor wrote:
>$> mdadm -A --scan
>mdadm: /dev/md/boot does not exist and is not a 'standard' name so it cannot be created
>
add --auto=md to the command line
or put it in the create line in mdadm.conf

L.

-- 
Luca Berra -- bluca@comedia.it

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20 19:49                         ` Leslie Rhorer
  2009-04-20 20:04                           ` Piergiorgio Sartor
@ 2009-04-20 21:18                           ` Luca Berra
  1 sibling, 0 replies; 59+ messages in thread
From: Luca Berra @ 2009-04-20 21:18 UTC (permalink / raw)
  To: 'Linux RAID'

On Mon, Apr 20, 2009 at 02:49:48PM -0500, Leslie Rhorer wrote:
>> >  mdadm --create /dev/md/foo --metadata 1.0 --level .....
>> >
>> > This will store the name 'foo' in the metadata and when you assemble
>> > the array, it will be called /dev/md/foo.
>> > This will be a symlink to /dev/md125 or something like that, but you
>> > don't need to care.
>> 
>> Does this really work with mdadm 2.6.7.1?
>> 
>> Because I have this from "--examine --scan"
>> 
>> ARRAY /dev/md/boot level=raid1 metadata=1.0 num-devices=2
>> UUID=edb4254d:4274fac1:dd6cad61:a8e3c347 name=boot
>
>Um, this is slightly odd, but when I use --examine --scan on my array, it
>returns nothing.  If I run it on an element of the array I get:
--examine should be run on a component device
i.e. mdadm --examine /dev/sda
using --scan will make it examine all devices matching the DEVICE line
in mdadm.conf (or all devices if mdadm.conf does not exist)

--detail should be run on an array
ie. mdadm --detail /dev/md0

>ARRAY /dev/md0 level=raid6 num-devices=10
>UUID=5a53eeb8:4d87963b:4cb502e9:37bde716
>
>Nothing about the metadata or the array name.
this is because you are using 0.90 metadata and never set a name on
the device...
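A quick way to see the difference Luca points out is to parse the two ARRAY lines quoted in this thread (joined back into single lines, since the mail client wrapped them). This hypothetical Python helper only splits `key=value` words; it does not validate them the way mdadm does.

```python
def parse_array_line(line):
    """Split an mdadm.conf ARRAY line into (device, {key: value})."""
    words = line.split()
    if not words or words[0] != "ARRAY":
        raise ValueError(f"not an ARRAY line: {line!r}")
    device = None
    attrs = {}
    for word in words[1:]:
        if "=" in word:
            key, value = word.split("=", 1)
            attrs[key.lower()] = value
        elif device is None:
            device = word
    return device, attrs


for text in (
    "ARRAY /dev/md0 level=raid6 num-devices=10 "
    "UUID=5a53eeb8:4d87963b:4cb502e9:37bde716",
    "ARRAY /dev/md/boot level=raid1 metadata=1.0 num-devices=2 "
    "UUID=edb4254d:4274fac1:dd6cad61:a8e3c347 name=boot",
):
    device, attrs = parse_array_line(text)
    print(device, attrs.get("metadata"), attrs.get("name"))
```

The 0.90 line yields no `metadata` or `name` key at all, while the 1.0 line carries both.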

L.

-- 
Luca Berra -- bluca@comedia.it

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20 21:13                         ` Luca Berra
@ 2009-04-20 21:24                           ` Piergiorgio Sartor
  2009-04-20 23:47                             ` Doug Ledford
  2009-04-21 18:15                           ` Piergiorgio Sartor
  1 sibling, 1 reply; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-20 21:24 UTC (permalink / raw)
  To: linux-raid

On Mon, Apr 20, 2009 at 11:13:32PM +0200, Luca Berra wrote:
> add --auto=md to the command line
> or put it in the create line in mdadm.conf

Thanks, that did the trick!

Interesting, it seems now I have 3 devices
for that raid.
One is /dev/md127, one is /dev/md/boot, with
same (major, minor) as md127, and there is a
third one, /dev/md_boot, which is a symbolic
link to /dev/md/boot.

/proc/mdstat reports /dev/md127

bye,

-- 

piergiorgio

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20 21:24                           ` Piergiorgio Sartor
@ 2009-04-20 23:47                             ` Doug Ledford
  2009-04-21  0:00                               ` Doug Ledford
  2009-04-21  6:29                               ` Luca Berra
  0 siblings, 2 replies; 59+ messages in thread
From: Doug Ledford @ 2009-04-20 23:47 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1289 bytes --]

On Apr 20, 2009, at 5:24 PM, Piergiorgio Sartor wrote:
> On Mon, Apr 20, 2009 at 11:13:32PM +0200, Luca Berra wrote:
>> add --auto=md to the command line
>> or put it in the create line in mdadm.conf
>
> Thanks, that did the trick!
>
> Interesting, it seems now I have 3 devices
> for that raid.
> One is /dev/md127, one is /dev/md/boot, with
> same (major, minor) as md127, and there is a
> third one, /dev/md_boot, which is a symbolic
> link to /dev/md/boot.
>
> /proc/mdstat reports /dev/md127

And so the confusion is perpetuated.  This is *not* accessing a device  
by name.  If I give mdadm a name for my device, I don't want it doing  
*anything* with numbers, creating numbered symlinks, or anything  
else.  In addition, if I tell mdadm to create something in /dev/md/  
(versus it decided all on its own to create something there), then I  
do *NOT* want it creating *anything* in /dev/ that I didn't ask for.   
That, again, adds to the confusion.  Of all of it though, the
/proc/mdstat is the worst part as it underscores that the kernel stack is
not able to think in terms of names instead of numbers.

--

Doug Ledford <dledford@redhat.com>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband





[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 203 bytes --]

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20 23:47                             ` Doug Ledford
@ 2009-04-21  0:00                               ` Doug Ledford
  2009-04-21  8:57                                 ` Michal Soltys
  2009-04-21  6:29                               ` Luca Berra
  1 sibling, 1 reply; 59+ messages in thread
From: Doug Ledford @ 2009-04-21  0:00 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Piergiorgio Sartor, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2346 bytes --]

On Apr 20, 2009, at 7:47 PM, Doug Ledford wrote:
> And so the confusion is perpetuated.  This is *not* accessing a  
> device by name.  If I give mdadm a name for my device, I don't want  
> it doing *anything* with numbers, creating numbered symlinks, or  
> anything else.  In addition, if I tell mdadm to create something in
> /dev/md/ (versus it decided all on its own to create something
> there), then I do *NOT* want it creating *anything* in /dev/ that I  
> didn't ask for.  That, again, adds to the confusion.  Of all of it  
> though, the /proc/mdstat is the worst part as it underscores that  
> the kernel stack is not able to think in terms of names instead of  
> numbers.


Actually, I want to expand on this thought for a little bit.  I'm  
obviously harping on all the symlinks and stuff that mdadm creates  
when you tell it what you want it to do.  I know these were added for  
back compatibility reasons.  However, the problem I have is that I  
work on mdadm for a living (well, sorta, I have 30+ other packages I  
also maintain, many of them orders of magnitude larger than mdadm, and  
I do kernel work, so my mdadm specific time is fairly small, but still  
it's paid time), and I've sat down before and tried to figure out "if  
I use name 'X' for my array, what device file gets created".  The net  
result of my attempts to do that was that I was never able to figure
out just by running mdadm what the proper syntax for the name variable  
is/was.  It always created so many symlinks to cover all possible  
cases that I never could get it to do what I wanted, and *just* what I  
wanted.  In short, mdadm as it currently stands errs on the side of  
caution and back compatibility, but it does so to such an extent that  
you can never get things wrong.  And if you can never get things  
wrong, you can never figure out how to get things *right*.  We either  
have to create every possible compatibility symlink forever, or  
*sometime* we have to turn that off and just let people figure out how  
to get this stuff right.  But right now, no one is making any progress  
because it's hidden by mdadm trying to cover our asses for us.

--

Doug Ledford <dledford@redhat.com>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband





[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 203 bytes --]

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20 23:47                             ` Doug Ledford
  2009-04-21  0:00                               ` Doug Ledford
@ 2009-04-21  6:29                               ` Luca Berra
  1 sibling, 0 replies; 59+ messages in thread
From: Luca Berra @ 2009-04-21  6:29 UTC (permalink / raw)
  To: linux-raid

On Mon, Apr 20, 2009 at 07:47:36PM -0400, Doug Ledford wrote:
> On Apr 20, 2009, at 5:24 PM, Piergiorgio Sartor wrote:
>> On Mon, Apr 20, 2009 at 11:13:32PM +0200, Luca Berra wrote:
>>> add --auto=md to the command line
>>> or put it in the create line in mdadm.conf
>>
>> Thanks, that did the trick!
>>
>> Interesting, it seems now I have 3 devices
>> for that raid.
>> One is /dev/md127, one is /dev/md/boot, with
>> same (major, minor) as md127, and there is a
>> third one, /dev/md_boot, which is a symbolic
>> link to /dev/md/boot.
>>
>> /proc/mdstat reports /dev/md127
>
> And so the confusion is perpetuated.  This is *not* accessing a device by 
> name.  If I give mdadm a name for my device, I don't want it doing 
> *anything* with numbers, creating numbered symlinks, or anything else.  In 
> addition, if I tell mdadm to create something in /dev/md/ (versus it 
> decided all on its own to create something there), then I do *NOT* want it 
> creating *anything* in /dev/ that I didn't ask for.  That, again, adds to 
> the confusion.  Of all of it though, the /proc/mdstat is the worst part as 
> it underscores that the kernel stack is not able to think in terms of names 
> instead of numbers.
>
And md is not even the worst case; think of device-mapper, where
dm-<minor> could be anything from multipath SAN storage to LVM, an
encrypted partition, or even a normal partition mapped via kpartx :P

I believe we should add an mdstat command to replace 'cat
/proc/mdstat', hiding the gory details.
Device-file creation could also be made configurable.
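No such mdstat command exists in mdadm; purely as a rough illustration of the idea, here is a hypothetical mdstat_summary shell function that condenses /proc/mdstat (or a saved copy) to one line per array. The set of level keywords it recognizes is an assumption based on the Personalities line shown elsewhere in this thread.

```sh
# mdstat_summary [FILE]
# Hypothetical sketch of the proposed "mdstat" command: condense
# /proc/mdstat (or a saved copy of it) to one line per array showing the
# kernel name, the state word and the RAID level ("-" if none is listed,
# e.g. for an inactive array).
mdstat_summary() {
    awk '
        # Array lines look like "md_foo : active raid1 sdf[0]".
        $1 ~ /^md/ && $2 == ":" {
            level = "-"
            for (i = 3; i <= NF; i++)
                if ($i ~ /^(raid[0-9]+|linear|multipath|faulty)$/) {
                    level = $i
                    break
                }
            print $1, $3, level
        }' "${1:-/proc/mdstat}"
}
```

Mapping kernel names back to the /dev/md/ names mdadm created is deliberately left out; that is exactly the part that would need mdadm's knowledge of the config.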

L.


-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20  5:23           ` Neil Brown
@ 2009-04-21  6:34             ` Gabor Gombas
  2009-04-21  7:06               ` Luca Berra
  0 siblings, 1 reply; 59+ messages in thread
From: Gabor Gombas @ 2009-04-21  6:34 UTC (permalink / raw)
  To: Neil Brown; +Cc: Doug Ledford, Jon Nelson, LinuxRaid

On Mon, Apr 20, 2009 at 03:23:52PM +1000, Neil Brown wrote:

> How does LVM manage metadata???  I assume it stored the metadata on
> the device.  Which is what mdadm does.

It stores a backup of the metadata under /etc after every change it
makes. It is as if "mdadm --create" had automatically added the array
to /etc/mdadm.conf etc.

> But as devices can move between machines.....

With LVM, moving a volume group means running
vgchange/vgexport/pvscan/vgimport/vgchange, so it's not like "it just
works". So if there are any name conflicts, you must explicitly run
vgrename before the new VG can be activated.

> > How about introducing /dev/md/by-uuid/... (or similar) and teaching
> > people that if they want to transparently carry their arrays from one
> > host to another, then they should always refer to it by UUID?
> 
> This already exists, though it might be distro-dependant.
>   /dev/disk/by-id/md-uuid-xxxxx

But that requires the MD device to already be present, so it does
not help solve the name clash issue.

What I propose is that "mdadm --assemble" should accept "UUID=..."
instead of the MD device name. Then mdadm could ignore the
name/homehost, and create the device node with a name
(/dev/md/by-uuid/...) that will never clash with arrays using
traditional names.

Gabor

-- 
     ---------------------------------------------------------
     MTA SZTAKI Computer and Automation Research Institute
                Hungarian Academy of Sciences
     ---------------------------------------------------------

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20 13:15             ` Doug Ledford
@ 2009-04-21  6:54               ` Neil Brown
  2009-05-11  6:47               ` Neil Brown
  1 sibling, 0 replies; 59+ messages in thread
From: Neil Brown @ 2009-04-21  6:54 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Jon Nelson, LinuxRaid

On Monday April 20, dledford@redhat.com wrote:
> On Apr 20, 2009, at 3:23 AM, Neil Brown wrote:
> > The kernel supports this.  We just need to make sure it works with
> > udev and get mdadm to use it.
> >
> > echo md_foo > /sys/module/md_mod/parameters/new_array
> > ls -l /dev/md_foo
> >
> > no numbers at all.
> 
> Even in the output of /proc/mdstat?
> 

Yep.

# ls -l /dev/md_foo
brw-rw---- 1 root disk 9, 513 2009-04-21 16:50 /dev/md_foo
# cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] 
md_foo : active raid1 sdf[0]
      156249920 blocks [1/1] [U]
# ls -ld /sys/block/md*
drwxr-xr-x 6 root root 0 2009-04-21 16:49 /sys/block/md127
drwxr-xr-x 6 root root 0 2009-04-21 16:50 /sys/block/md_foo
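A minimal sketch of wrapping the new_array write above in a small shell function, assuming the parameter lives at /sys/module/md_mod/parameters/new_array (module parameters normally sit under /sys/module, singular). The function name is hypothetical, and the requirement that names start with "md_" is an assumption about the kernel's check, enforced here only to give a clearer error message.

```sh
# md_new_named_array NAME [PARAM_FILE]
# Ask the md driver to allocate an array whose kernel name is NAME
# rather than md<minor>, by writing NAME to the new_array parameter.
# PARAM_FILE can be overridden for testing.
md_new_named_array() {
    _name=$1
    _param=${2:-/sys/module/md_mod/parameters/new_array}
    # Assumption: the kernel only accepts names of the form md_*.
    case $_name in
        md_?*) ;;
        *) echo "md_new_named_array: name must start with md_: $_name" >&2
           return 1 ;;
    esac
    if [ ! -w "$_param" ]; then
        echo "md_new_named_array: cannot write $_param (md_mod loaded? root?)" >&2
        return 1
    fi
    printf '%s\n' "$_name" > "$_param"
}
```

On a real system, md_new_named_array md_foo && ls -l /dev/md_foo should then show the node, assuming udev creates it as in the listing above.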
      
NeilBrown

* Re: [Patch] mdadm ignoring homehost?
  2009-04-21  6:34             ` Gabor Gombas
@ 2009-04-21  7:06               ` Luca Berra
  0 siblings, 0 replies; 59+ messages in thread
From: Luca Berra @ 2009-04-21  7:06 UTC (permalink / raw)
  To: LinuxRaid

On Tue, Apr 21, 2009 at 08:34:01AM +0200, Gabor Gombas wrote:
>On Mon, Apr 20, 2009 at 03:23:52PM +1000, Neil Brown wrote:
>
>> How does LVM manage metadata???  I assume it stored the metadata on
>> the device.  Which is what mdadm does.
>
>It stores a backup of the metadata under /etc after every changes it
>makes. It is like if "mdadm --create" have automatically added the array
>to /etc/mdadm.conf etc.
That could be interesting (automatically updating mdadm.conf on create,
maybe via an option).

>> But as devices can move between machines.....
>
>With LVM, moving a volume group means running
>vgchange/vgexport/pvscan/vgimport/vgchange, so it's not like "it just
>works". So if there are any name conflicts, you must explicitely run
>vgrename before the new VG can be activated.
Yes, but vgexport/vgimport is optional, and the target audience for
array mobility doesn't want to export the array before moving; they just
want to unplug it from host A and plug it into host B.

>> > How about introducing /dev/md/by-uuid/... (or similar) and teaching
>> > people that if they want to transparently carry their arrays from one
>> > host to another, then they should always refer to it by UUID?
>> 
>> This already exists, though it might be distro-dependant.
>>   /dev/disk/by-id/md-uuid-xxxxx
>
>But that requrires the MD device to already be present therefore does
>not help solving the name clash issue.
>
>What I propose is that "mdadm --assemble" should accept "UUID=..."
>instead of the MD device name. Then mdadm could ignore the
>name/homehost, and create the device node with a name
>(/dev/md/by-uuid/...) that will never clash with arrays using
>traditional names.

I don't know how much we gain by hiding the md<minor> device, at least
while the kernel is not able to work completely without minors.

See /proc/mdstat or /sys/block/md...

L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

* Re: [Patch] mdadm ignoring homehost?
  2009-04-21  0:00                               ` Doug Ledford
@ 2009-04-21  8:57                                 ` Michal Soltys
  0 siblings, 0 replies; 59+ messages in thread
From: Michal Soltys @ 2009-04-21  8:57 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Piergiorgio Sartor, linux-raid

Doug Ledford wrote:
> On Apr 20, 2009, at 7:47 PM, Doug Ledford wrote:
> 
> Actually, I want to expand on this thought for a little bit.  I'm 
> obviously harping on all the symlinks and stuff that mdadm creates when 
> you tell it what you want it to do.  I know these were added for back 
> compatibility reasons.  However, the problem I have is that I work on 
> mdadm for a living (well, sorta, I have 30+ other packages I also 
> maintain, many of them orders of magnitude larger than mdadm, and I do 
> kernel work, so my mdadm specific time is fairly small, but still it's 
> paid time), and I've sat down before and tried to figure out "if I use 
> name 'X' for my array, what device file gets created".  The net result 
> of my attempts to do that, were that I was never able to figure out just 
> by running mdadm what the proper syntax for the name variable is/was.  

mdadm will create the name you want it to. If it makes sense from the
kernel's perspective (i.e. it's mdN or md_dN), it will be the same as
what you see under /sys, assuming it's not already taken. If not, or if
it is taken, you will get the name defined in mdadm.conf under /dev or
/dev/md/, but from the /sys perspective it will be something like md127
or md_d127. If you create it under the /dev/md/ directory, mdadm will or
will not add a symlink from /dev, depending on the CREATE line in
mdadm.conf or the appropriate command-line option.

At the same time, udev will do its own thing. It doesn't care about
names defined in mdadm.conf or specified on the command line. The stock
udev rules will create /sys-like names in /dev, including leftovers of
inactive arrays, and produce all the symlinks they are told to: some in
/dev/disk/by*, some in /dev/md (which might conflict with mdadm's default
behaviour if one does mdadm -Es >>/etc/mdadm.conf and doesn't even
bother adjusting it).

By default, mdadm -Es will create devices in the /dev/md/ directory
with the md[_] prefix stripped (if the name is standard). In the case of
1.x superblocks, the name stored in the superblock will be used (by
default, the name set during --create, without any md prefix).
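To make the prefix-stripping rule described above concrete, here is a hypothetical md_default_node function that maps a name to the /dev/md/ path mdadm -Es would propose. It models this description only, not mdadm's actual code, and ignores homehost handling and name collisions.

```sh
# md_default_node NAME
# Model of the described default: standard kernel-style names lose their
# md / md_ prefix, anything else (e.g. a 1.x superblock name) is used as-is.
md_default_node() {
    case $1 in
        md_d[0-9]*) printf '/dev/md/%s\n' "${1#md_}" ;;  # md_d0 -> d0
        md[0-9]*)   printf '/dev/md/%s\n' "${1#md}" ;;   # md0   -> 0
        *)          printf '/dev/md/%s\n' "$1" ;;        # boot  -> boot
    esac
}
```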

In practice I've never used homehost, besides a few tests for the sake
of this thread. The behaviour seemed consistent with the above, though.

The above assumes versions 2.9.x. I haven't used the 3.x branch yet.

* Re: [Patch] mdadm ignoring homehost?
  2009-04-20 21:13                         ` Luca Berra
  2009-04-20 21:24                           ` Piergiorgio Sartor
@ 2009-04-21 18:15                           ` Piergiorgio Sartor
  2009-04-22 16:06                             ` Andrew Burgess
  1 sibling, 1 reply; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-21 18:15 UTC (permalink / raw)
  To: linux-raid

On Mon, Apr 20, 2009 at 11:13:32PM +0200, Luca Berra wrote:
> add --auto=md to the command line
> or put it in the create line in mdadm.conf

Actually, I must say this did not work quite right
after reboot...

This might be a Fedora 10 issue, so maybe Doug would like
to comment.

After reboot, someone, I guess udev, tries to automagically
start a RAID, so it assembles /dev/md_d127 with one of the
two components of /dev/md/boot (randomly, it seems).
Later, when /dev/md/boot is assembled, one drive is "busy",
because it belongs to /dev/md_d127, and the array is put
together degraded, i.e. with the other disk only.

This also happens after re-creating the initrd.

So, it seems there is still work to do...

Thanks,

bye,

-- 

piergiorgio

* Re: [Patch] mdadm ignoring homehost?
  2009-04-21 18:15                           ` Piergiorgio Sartor
@ 2009-04-22 16:06                             ` Andrew Burgess
  2009-04-23  1:20                               ` Doug Ledford
  0 siblings, 1 reply; 59+ messages in thread
From: Andrew Burgess @ 2009-04-22 16:06 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Tue, 2009-04-21 at 20:15 +0200, Piergiorgio Sartor wrote:

> This might be a Fedora 10 issue, so maybe Doug would like
> to comment.
> 
> After reboot, someone, I guess udev, tries to automagically
> start a RAID, so it assembles /dev/md_d127 with one of the
> two components of /dev/md/boot (randomly, it seems).
> Later, when /dev/md/boot is assembled, one drive is "busy",
> because it belongs to /dev/md_d127, and the array is put
> together degraded, i.e. with the other disk only.

Just a "me too". I also started seeing this after upgrading to Fedora
10. I had to create a startup script to stop md_d0 and reassemble
everything else.
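The script itself wasn't posted; a minimal sketch of that kind of workaround, assuming the stray array shows up as /dev/md_d0 and the wanted arrays are listed in mdadm.conf, could look like this. The function name is hypothetical, and MDADM can be overridden (e.g. MDADM=echo) to dry-run it.

```sh
# Hypothetical early-boot workaround: stop a stray partial array that
# udev started, then assemble everything listed in mdadm.conf.
MDADM=${MDADM:-/sbin/mdadm}

fix_stray_array() {
    _stray=${1:-/dev/md_d0}
    # Only try to stop it if the block device node actually exists.
    if [ -b "$_stray" ]; then
        "$MDADM" --stop "$_stray" || return 1
    fi
    "$MDADM" --assemble --scan
}
```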


* Re: [Patch] mdadm ignoring homehost?
  2009-04-22 16:06                             ` Andrew Burgess
@ 2009-04-23  1:20                               ` Doug Ledford
  2009-04-23  5:51                                 ` Luca Berra
  2009-04-24 19:15                                 ` Piergiorgio Sartor
  0 siblings, 2 replies; 59+ messages in thread
From: Doug Ledford @ 2009-04-23  1:20 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: Piergiorgio Sartor, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1656 bytes --]

On Apr 22, 2009, at 12:06 PM, Andrew Burgess wrote:
> On Tue, 2009-04-21 at 20:15 +0200, Piergiorgio Sartor wrote:
>
>> This might be a Fedora 10 issue, so maybe Doug would like
>> to comment.
>>
>> After reboot, someone, I guess udev, tries to automagically
>> start a RAID, so it assembles /dev/md_d127 with one of the
>> two components of /dev/md/boot (randomly, it seems).
>> Later, when /dev/md/boot is assembled, one drive is "busy",
>> because it belongs to /dev/md_d127, and the array is put
>> together degraded, i.e. with the other disk only.
>
> Just a "me too". I also started seeing this after upgrading to fedora
> 10. I had to create a startup script to stop md_d0 and reassemble
> everything else.


Yeah, I found the cause for this while working on F11.  The problem is  
a race condition between udev and a call to mdadm -As in the  
rc.sysinit.  For F11, I solved this by making udev not process devices  
using incremental mode if we are still in the rc.sysinit script.  You  
can change /etc/udev/rules.d/70-mdadm.rules (I think that's the right  
name, it might be slightly off) to read something like this:

# This file causes block devices with Linux RAID (mdadm) signatures to
# automatically cause mdadm to be run.
# See udev(8) for syntax

SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
	IMPORT{program}="/sbin/mdadm --examine --export $tempnode", \
	RUN+="/bin/bash -c '[ ! -f /dev/.in_sysinit ] && mdadm -I $env{DEVNAME}'"
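The rule only helps if rc.sysinit marks the window in which it runs mdadm -As. That side isn't shown here, so the following is a guess at its shape: create /dev/.in_sysinit, assemble, remove the flag. The function and variable names are hypothetical. In this sketch the flag covers only the assembly step; "still in the rc.sysinit script" suggests the real flag may span the whole script.

```sh
# Hypothetical rc.sysinit fragment: while the flag file exists, the udev
# rule above skips "mdadm -I", so it cannot race with "mdadm -As".
INSYSINIT_FLAG=${INSYSINIT_FLAG:-/dev/.in_sysinit}
MDADM=${MDADM:-/sbin/mdadm}

assemble_during_sysinit() {
    : > "$INSYSINIT_FLAG" || return 1   # tell udev to stay out
    "$MDADM" -As
    _rc=$?
    rm -f "$INSYSINIT_FLAG"             # let udev handle later hotplug
    return $_rc
}
```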



--

Doug Ledford <dledford@redhat.com>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband





[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 203 bytes --]

* Re: [Patch] mdadm ignoring homehost?
  2009-04-23  1:20                               ` Doug Ledford
@ 2009-04-23  5:51                                 ` Luca Berra
  2009-04-23  6:09                                   ` Luca Berra
  2009-04-23 11:05                                   ` Doug Ledford
  2009-04-24 19:15                                 ` Piergiorgio Sartor
  1 sibling, 2 replies; 59+ messages in thread
From: Luca Berra @ 2009-04-23  5:51 UTC (permalink / raw)
  To: linux-raid

On Wed, Apr 22, 2009 at 09:20:49PM -0400, Doug Ledford wrote:
> On Apr 22, 2009, at 12:06 PM, Andrew Burgess wrote:
>> On Tue, 2009-04-21 at 20:15 +0200, Piergiorgio Sartor wrote:
>>
>>> This might be a Fedora 10 issue, so maybe Doug would like
>>> to comment.
>>>
>>> After reboot, someone, I guess udev, tries to automagically
>>> start a RAID, so it assembles /dev/md_d127 with one of the
>>> two components of /dev/md/boot (randomly, it seems).
>>> Later, when /dev/md/boot is assembled, one drive is "busy",
>>> because it belongs to /dev/md_d127, and the array is put
>>> together degraded, i.e. with the other disk only.
>>
>> Just a "me too". I also started seeing this after upgrading to fedora
>> 10. I had to create a startup script to stop md_d0 and reassemble
>> everything else.
>
>
> Yeah, I found the cause for this while working on F11.  The problem is a 
> race condition between udev and a call to mdadm -As in the rc.sysinit.  For 
> F11, I solved this by making udev not process devices using incremental 
> mode if we are still in the rc.sysinit script.  You can change 
> /etc/udev/rules.d/70-mdadm.rules (I think that's the right name, it might 
> be slightly off) to read something like this:
>
> # This file causes block devices with Linux RAID (mdadm) signatures to
> # automatically cause mdadm to be run.
> # See udev(8) for syntax
>
> SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
> 	IMPORT{program}="/sbin/mdadm --examine --export $tempnode", \
> 	RUN+="/bin/bash -c '[ ! -f /dev/.in_sysinit ] && mdadm -I $env{DEVNAME}'"
>
>

I believe I saw this as well, but not at startup; it was when I manually
ran mdadm -As, so your hack to prevent udev from assembling
devices while in sysinit may not be a full solution.

my solution was "rm -f /etc/udev/rules.d/70-mdadm.rules",
works like a charm :P

probably the best solution is preventing concurrent mdadm rules with a
lock.
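The lock idea can be approximated from outside mdadm with flock(1) from util-linux, provided every mdadm invocation (the rc.sysinit one and the udev RUN one) goes through the same wrapper. The wrapper name and lock path are hypothetical. Serializing the runs does not by itself decide which run should claim a given member device.

```sh
# with_mdadm_lock CMD [ARGS...]
# Run CMD while holding an exclusive lock on a shared lock file, so an
# "mdadm -As" and a udev-triggered "mdadm -I" cannot interleave.
MDADM_LOCK=${MDADM_LOCK:-/var/run/mdadm.lock}

with_mdadm_lock() {
    # flock(1) creates the lock file if needed and blocks until it is free.
    flock -x "$MDADM_LOCK" "$@"
}
```

Usage would be, for example, with_mdadm_lock /sbin/mdadm -As in rc.sysinit and the same wrapper around mdadm -I in the udev rule.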

Regards,
L.




-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

* Re: [Patch] mdadm ignoring homehost?
  2009-04-23  5:51                                 ` Luca Berra
@ 2009-04-23  6:09                                   ` Luca Berra
  2009-04-23 11:05                                   ` Doug Ledford
  1 sibling, 0 replies; 59+ messages in thread
From: Luca Berra @ 2009-04-23  6:09 UTC (permalink / raw)
  To: linux-raid

On Thu, Apr 23, 2009 at 07:51:32AM +0200, Luca Berra wrote:
> On Wed, Apr 22, 2009 at 09:20:49PM -0400, Doug Ledford wrote:
>> On Apr 22, 2009, at 12:06 PM, Andrew Burgess wrote:
>>> On Tue, 2009-04-21 at 20:15 +0200, Piergiorgio Sartor wrote:
>>>
>>>> This might be a Fedora 10 issue, so maybe Doug would like
>>>> to comment.
>>>>
>>>> After reboot, someone, I guess udev, tries to automagically
>>>> start a RAID, so it assembles /dev/md_d127 with one of the
>>>> two components of /dev/md/boot (randomly, it seems).
>>>> Later, when /dev/md/boot is assembled, one drive is "busy",
>>>> because it belongs to /dev/md_d127, and the array is put
>>>> together degraded, i.e. with the other disk only.
>>>
>>> Just a "me too". I also started seeing this after upgrading to fedora
>>> 10. I had to create a startup script to stop md_d0 and reassemble
>>> everything else.
>>
>>
>> Yeah, I found the cause for this while working on F11.  The problem is a 
>> race condition between udev and a call to mdadm -As in the rc.sysinit.  
>> For F11, I solved this by making udev not process devices using 
>> incremental mode if we are still in the rc.sysinit script.  You can change 
>> /etc/udev/rules.d/70-mdadm.rules (I think that's the right name, it might 
>> be slightly off) to read something like this:
>>
>> # This file causes block devices with Linux RAID (mdadm) signatures to
>> # automatically cause mdadm to be run.
>> # See udev(8) for syntax
>>
>> SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
>> 	IMPORT{program}="/sbin/mdadm --examine --export $tempnode", \
>> 	RUN+="/bin/bash -c '[ ! -f /dev/.in_sysinit ] && mdadm -I $env{DEVNAME}'"
>>
>>
>
> i believe i saw this as well, but not at startup, it was when i manually
> run mdadm -As, so while your hack to prevent udev from assembling
> devices while in sysinit may not be a full solution.
>
> my solution was "rm -f /etc/udev/rules.d/70-mdadm.rules",
> works like a charm :P
>
> probably the best solution is preventing concurrent mdadm rules with a
> lock.
s/rules/runs

> Regards,
> L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

* Re: [Patch] mdadm ignoring homehost?
  2009-04-23  5:51                                 ` Luca Berra
  2009-04-23  6:09                                   ` Luca Berra
@ 2009-04-23 11:05                                   ` Doug Ledford
  2009-04-23 21:31                                     ` Luca Berra
  1 sibling, 1 reply; 59+ messages in thread
From: Doug Ledford @ 2009-04-23 11:05 UTC (permalink / raw)
  To: Luca Berra; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3390 bytes --]

On Apr 23, 2009, at 1:51 AM, Luca Berra wrote:
> On Wed, Apr 22, 2009 at 09:20:49PM -0400, Doug Ledford wrote:
>> On Apr 22, 2009, at 12:06 PM, Andrew Burgess wrote:
>>> On Tue, 2009-04-21 at 20:15 +0200, Piergiorgio Sartor wrote:
>>>
>>>> This might be a Fedora 10 issue, so maybe Doug would like
>>>> to comment.
>>>>
>>>> After reboot, someone, I guess udev, tries to automagically
>>>> start a RAID, so it assembles /dev/md_d127 with one of the
>>>> two components of /dev/md/boot (randomly, it seems).
>>>> Later, when /dev/md/boot is assembled, one drive is "busy",
>>>> because it belongs to /dev/md_d127, and the array is put
>>>> together degraded, i.e. with the other disk only.
>>>
>>> Just a "me too". I also started seeing this after upgrading to fedora
>>> 10. I had to create a startup script to stop md_d0 and reassemble
>>> everything else.
>>
>>
>> Yeah, I found the cause for this while working on F11.  The problem  
>> is a race condition between udev and a call to mdadm -As in the  
>> rc.sysinit.  For F11, I solved this by making udev not process  
>> devices using incremental mode if we are still in the rc.sysinit  
>> script.  You can change /etc/udev/rules.d/70-mdadm.rules (I think  
>> that's the right name, it might be slightly off) to read something  
>> like this:
>>
>> # This file causes block devices with Linux RAID (mdadm) signatures to
>> # automatically cause mdadm to be run.
>> # See udev(8) for syntax
>>
>> SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
>> 	IMPORT{program}="/sbin/mdadm --examine --export $tempnode", \
>> 	RUN+="/bin/bash -c '[ ! -f /dev/.in_sysinit ] && mdadm -I $env{DEVNAME}'"
>>
>>
>
> i believe i saw this as well, but not at startup, it was when i manually
> run mdadm -As, so while your hack to prevent udev from assembling
> devices while in sysinit may not be a full solution.

No, it is.  In your situation, the rules line must have read  
ACTION=="add|change".  The fact that the incremental assembly rule
would watch a change event means that when mdadm opens any device to  
scan for a superblock and then closes it, it would trigger the rule  
(yes, just opening and closing the device special file will trigger a  
change event), which would then race with mdadm using it for its own  
purposes.  The rule above does not watch change events, only add  
events.  Those only happen once when the device is added, not when  
mdadm scans the devices looking for superblocks.  Any time mdadm races  
with itself, one trying to assemble and one trying to do incremental  
assembly, you get split arrays with neither one started.
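The guard in the rule boils down to two conditions. As a sketch only, here is a hypothetical shell predicate mirroring them, which makes it easy to check which events would reach mdadm -I. It models the rule's logic, not udev itself.

```sh
# should_run_incremental ACTION [FLAG]
# Mirror of the rule's guard: run "mdadm -I" only for "add" events, and
# only when the rc.sysinit flag file is absent. "change" events, which
# can fire after mdadm merely opens and closes a device, are ignored.
should_run_incremental() {
    [ "$1" = add ] && [ ! -f "${2:-/dev/.in_sysinit}" ]
}
```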

>
> my solution was "rm -f /etc/udev/rules.d/70-mdadm.rules",
> works like a charm :P
>
> probably the best solution is preventing concurrent mdadm rules with a
> lock.
>
> Regards,
> L.


--

Doug Ledford <dledford@redhat.com>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband





[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 203 bytes --]

* Re: [Patch] mdadm ignoring homehost?
  2009-04-23 11:05                                   ` Doug Ledford
@ 2009-04-23 21:31                                     ` Luca Berra
  2009-04-24 16:46                                       ` Doug Ledford
  0 siblings, 1 reply; 59+ messages in thread
From: Luca Berra @ 2009-04-23 21:31 UTC (permalink / raw)
  To: linux-raid

On Thu, Apr 23, 2009 at 07:05:04AM -0400, Doug Ledford wrote:
>>> # This file causes block devices with Linux RAID (mdadm) signatures to
>>> # automatically cause mdadm to be run.
>>> # See udev(8) for syntax
>>>
>>> SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
>>> 	IMPORT{program}="/sbin/mdadm --examine --export $tempnode", \
>>> 	RUN+="/bin/bash -c '[ ! -f /dev/.in_sysinit ] && mdadm -I $env{DEVNAME}'"
>>>
>>>
>>
>> i believe i saw this as well, but not at startup, it was when i manually
>> run mdadm -As, so while your hack to prevent udev from assembling
>> devices while in sysinit may not be a full solution.
>
> No, it is.  In your situation, the rules line must have read 
> ACTION="add|change".  The fact that the incremental assembly rule would 
You are probably right about that; I tried with your ruleset and it
looks like the problem was due to the change ACTION.
Just out of curiosity, what is the use of the IMPORT statement? Is it
needed by some other rule?

>> my solution was "rm -f /etc/udev/rules.d/70-mdadm.rules",
>> works like a charm :P
>>
>> probably the best solution is preventing concurrent mdadm rules with a
>> lock.

Do you think the last suggestion, having mdadm protect against concurrent
runs of itself, would be of use?
I think the problem might still happen when stacking arrays,
i.e. a mirror of stripes:
running mdadm -As would activate the first striped md and generate an 'add'
event; then, while mdadm is assembling the second one, udev would trigger and
create a degraded mirror containing only the first one.

Regards,
L.



-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Patch] mdadm ignoring homehost?
  2009-04-23 21:31                                     ` Luca Berra
@ 2009-04-24 16:46                                       ` Doug Ledford
  0 siblings, 0 replies; 59+ messages in thread
From: Doug Ledford @ 2009-04-24 16:46 UTC (permalink / raw)
  To: Luca Berra; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 5280 bytes --]

On Apr 23, 2009, at 5:31 PM, Luca Berra wrote:
> On Thu, Apr 23, 2009 at 07:05:04AM -0400, Doug Ledford wrote:
>>>> # This file causes block devices with Linux RAID (mdadm) signatures to
>>>> # automatically cause mdadm to be run.
>>>> # See udev(8) for syntax
>>>>
>>>> SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
>>>> 	IMPORT{program}="/sbin/mdadm --examine --export $tempnode", \
>>>> 	RUN+="/bin/bash -c '[ ! -f /dev/.in_sysinit ] && mdadm -I $env{DEVNAME}'"
>>>>
>>>>
>>>
>>> i believe i saw this as well, but not at startup, it was when i manually
>>> run mdadm -As, so while your hack to prevent udev from assembling
>>> devices while in sysinit may not be a full solution.
>>
>> No, it is.  In your situation, the rules line must have read
>> ACTION="add|change".  The fact that the incremental assembly rule would
> you are probably right about that, i tried with your ruleset and it
> looks like the problem was due to the change ACTION
> just out of curiosity what is the use of the IMPORT statement, is it
> needed by some other rule?

The IMPORT statement just causes udev to add the output of the program
to its own list of environment variables.  Since vol_id doesn't pick
up all the information that mdadm might care about, we use mdadm to
supplement those environment variables.
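A rough way to picture what IMPORT{program} does: udev runs the program and turns each KEY=value line of its output into an environment variable that later rules can match on. The sketch below mimics that in plain shell. `import_env` is a hypothetical helper, not part of udev, and the MD_* keys in the usage lines are only an assumed sample of what `mdadm --examine --export` prints.

```shell
#!/bin/sh
# Hypothetical helper (not part of udev): mimic IMPORT{program} by
# exporting every KEY=value line read on stdin. Lines without '=' and
# keys that are not valid shell variable names are skipped.
# Feed it with a redirect or here-document, not a pipe, so that the
# variables end up in the current shell rather than in a subshell.
import_env() {
    while IFS= read -r line; do
        case $line in
            *=*) ;;
            *) continue ;;
        esac
        key=${line%%=*}   # text before the first '='
        val=${line#*=}    # everything after the first '='
        case $key in
            '' | [0-9]* | *[!A-Za-z0-9_]*) continue ;;
        esac
        export "$key=$val"
    done
}

# Usage with assumed sample output (key names are illustrative only):
import_env <<'EOF'
MD_LEVEL=raid1
MD_DEVICES=2
MD_NAME=turnip:11
EOF
echo "$MD_NAME on a $MD_DEVICES-device $MD_LEVEL"
```

Later rules can then test those variables, which is how mdadm-supplied information becomes available alongside what vol_id provides.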

>
>>> my solution was "rm -f /etc/udev/rules.d/70-mdadm.rules",
>>> works like a charm :P
>>>
>>> probably the best solution is preventing concurrent mdadm rules with a
>>> lock.
>
> do you think the last suggestion of having mdadm protect from itself
> would be of use?

Not really.  The problem is that assemble and incremental use two  
different methods of bringing an array online and you can't mix the  
two.  With assemble, it will open all the devices until it gets a  
complete set, then open up a control channel to the md stack, init the  
array, add all the devices in one go, then start the array.  It was  
the scanning of the devices for superblocks that was getting picked up  
by the change event portion of the rule and causing udev to try and  
add the device to an incremental array before mdadm had collected all  
the devices and added them to its assembly based array.  Now, since  
assembly mode does everything in one go, you could conceivably lock  
against other assembly runs, but in practice that isn't a problem  
because mdadm will attempt to get an exclusive open on the constituent  
devices before starting the array.

Incremental mode is different in that it will take a single device and
scan it for info.  If it is a constituent device for an array that
hasn't been seen yet (as per the md stack, which is true while
assembly mode is busy scanning drives), then it will create a
placeholder array to stick the drive into, but won't attempt to start
the array.  When assembly mode gets around to populating its
array, the incremental array already exists (although unstarted) and
so it picks another array.  mdadm does not expect assemble to be called
on an array that incremental mode has already partially assembled.
After mdadm puts the device into the incremental array, it exits.  So
incremental mode wouldn't actually be able to hold a lock through the
whole incremental process, because each new device spawns a new mdadm,
and we don't really know when that spawn will happen.
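To make the difference between the two modes concrete, here is a toy model in plain shell, with a temporary directory standing in for the md stack's state. Every name in it is hypothetical: `incremental_add` stands for one udev-spawned `mdadm -I` run and `assemble` for one `mdadm -A` run. It does not reflect mdadm's real code. Because each incremental run is a separate, short-lived process, the only shared state is what the "kernel" (here, the files) remembers.

```shell
#!/bin/sh
# Toy model, not mdadm's code: files under $MD_STATE stand in for the
# arrays the md stack knows about. Nothing here touches real devices.
MD_STATE=$(mktemp -d)

# One udev-spawned "mdadm -I" run: record the member, create an unstarted
# placeholder if this UUID is new, and mark the array active only once
# all expected members are present. It exits without holding any lock.
incremental_add() {  # usage: incremental_add UUID EXPECTED_DEVICES DEVICE
    uuid=$1 expected=$2 dev=$3
    echo "$dev" >> "$MD_STATE/$uuid.members"
    have=$(($(wc -l < "$MD_STATE/$uuid.members")))
    if [ "$have" -ge "$expected" ]; then
        echo active > "$MD_STATE/$uuid.state"
    else
        echo inactive > "$MD_STATE/$uuid.state"
    fi
}

# One "mdadm -A" run: members are gathered first, then the array is
# created and started in one go. If an array for this UUID already exists
# (e.g. an unstarted incremental placeholder), real mdadm would pick a
# different md device; this model simply reports the conflict.
assemble() {  # usage: assemble UUID DEVICE...
    uuid=$1
    shift
    if [ -e "$MD_STATE/$uuid.state" ]; then
        echo "conflict: $uuid already has a (possibly unstarted) array" >&2
        return 1
    fi
    printf '%s\n' "$@" > "$MD_STATE/$uuid.members"
    echo active > "$MD_STATE/$uuid.state"
}

state() { cat "$MD_STATE/$1.state"; }
```

For example, `incremental_add u1 2 /dev/sda1` leaves `state u1` at inactive, a following `assemble u1 /dev/sda1 /dev/sdb1` reports a conflict, and a second `incremental_add u1 2 /dev/sdb1` brings the array to active. The UUIDs and device names are made up.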

> I think it might still happen when stacking arrays
> i.e. mirror of stripes
> running mdadm -As would activate the first striped md and generate and 'add'
> event, then while it is assembling the second one udev will trigger and
> create a degraded mirror containing only the first one.

The udev rule is designed to handle exactly this type of situation.
If you manually assemble the first array, then udev will see that and
*start* to create the mirror on top, but because all devices
aren't there yet, it will only put the first into the placeholder
array and not attempt to start it.  Then, when you manually assemble the
second one, another add event for the second array happens, udev picks
it up, sees that it's for the same array it's already been working on,
and adds that device to the partially assembled array it created
before.  Now both constituent devices are there and mdadm will go
ahead and start the array.  So, it works like it should.  It's only a
problem when you try to mix incremental and assembly mode operation on
the *exact* same array.  Since udev only processes add events now,
in order to race with udev when manually starting a hot-plugged array,
you would likely have to be a very quick typist or be deliberately
trying to beat udev to the punch.


* Re: [Patch] mdadm ignoring homehost?
  2009-04-23  1:20                               ` Doug Ledford
  2009-04-23  5:51                                 ` Luca Berra
@ 2009-04-24 19:15                                 ` Piergiorgio Sartor
  2009-04-26 11:52                                   ` Doug Ledford
  1 sibling, 1 reply; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-24 19:15 UTC (permalink / raw)
  To: linux-raid

On Wed, Apr 22, 2009 at 09:20:49PM -0400, Doug Ledford wrote:
>
> # This file causes block devices with Linux RAID (mdadm) signatures to
> # automatically cause mdadm to be run.
> # See udev(8) for syntax
>
> SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
> 	IMPORT{program}="/sbin/mdadm --examine --export $tempnode", \
> 	RUN+="/bin/bash -c '[ ! -f /dev/.in_sysinit ] && mdadm -I $env{DEVNAME}'"

It seems the missing "change" is causing problems too.

In fact, the first time the array is connected (it's
hotpluggable), devices are "added", but the second
time the devices are already there (not removed by
unplug), so there is no "add" anymore and the array
is not assembled.
If the device files are deleted (rm /dev/md/...), then
it works again (sort of; not really, to be honest).

bye,

-- 

piergiorgio


* Re: [Patch] mdadm ignoring homehost?
  2009-04-24 19:15                                 ` Piergiorgio Sartor
@ 2009-04-26 11:52                                   ` Doug Ledford
  2009-04-26 12:14                                     ` Piergiorgio Sartor
  0 siblings, 1 reply; 59+ messages in thread
From: Doug Ledford @ 2009-04-26 11:52 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Linux RAID

[-- Attachment #1: Type: text/plain, Size: 1822 bytes --]

On Apr 24, 2009, at 3:15 PM, Piergiorgio Sartor wrote:
> On Wed, Apr 22, 2009 at 09:20:49PM -0400, Doug Ledford wrote:
>>
>> # This file causes block devices with Linux RAID (mdadm) signatures to
>> # automatically cause mdadm to be run.
>> # See udev(8) for syntax
>>
>> SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
>> 	IMPORT{program}="/sbin/mdadm --examine --export $tempnode", \
>> 	RUN+="/bin/bash -c '[ ! -f /dev/.in_sysinit ] && mdadm -I $env{DEVNAME}'"
>
> It seems the missing "change" is causing problems too.
>
> In fact, the first time the array is connected (it's
> hotpluggable), devices are "added", but the second
> time the devices are already there (not removed by
> unplug) and there is no "add" anymore, so the array
> is not assembled.

I'm guessing that you didn't completely stop all usage of the hotplug
devices before you removed them, as this works fine for me.  If the
devices aren't completely stopped before removal, then the stack can't
delete the devices.

> If the devices are deleted (rm /dev/md/...), then
> it works again (somehow, not really, to be honest).


Removing the /dev/md/ device files does nothing of value.  However, I
will note that I've seen udev take up to 30 or 45 seconds to process a
bunch of md raid removals at the same time (i.e., I did mdadm -S /dev/md/*
and it took udev that long to remove all the old device files).
So, make sure you have completely stopped the arrays before removing the
devices, then watch /dev/md/ and wait until udev has done its job, and
only replug the array after that has happened.  Things should then work
fine.
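The stop, wait, then replug sequence can be scripted. The sketch below polls a directory until it is empty or a timeout expires. `wait_for_empty` is a hypothetical helper, and the 60-second value in the usage comment is an arbitrary choice based on the 30 to 45 seconds mentioned above. On udev versions that ship the `udevadm` tool, `udevadm settle` (which waits for the udev event queue to empty) may be a simpler alternative.

```shell
#!/bin/sh
# Hypothetical helper: wait until DIR is empty, polling once a second.
# A missing DIR counts as empty. Returns 0 when empty, or 1 if
# TIMEOUT_SECONDS pass while entries remain.
wait_for_empty() {  # usage: wait_for_empty DIR TIMEOUT_SECONDS
    dir=$1 timeout=$2 waited=0
    while [ -n "$(ls -A "$dir" 2>/dev/null)" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Intended use (needs root and real arrays, so it is left commented out):
#   mdadm --stop --scan
#   wait_for_empty /dev/md 60 || echo "udev has not cleaned up /dev/md yet" >&2
#   ...only then unplug and replug the enclosure
```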


* Re: [Patch] mdadm ignoring homehost?
  2009-04-26 11:52                                   ` Doug Ledford
@ 2009-04-26 12:14                                     ` Piergiorgio Sartor
  2009-04-26 12:58                                       ` Piergiorgio Sartor
  2009-04-26 21:37                                       ` Michal Soltys
  0 siblings, 2 replies; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-26 12:14 UTC (permalink / raw)
  To: linux-raid

On Sun, Apr 26, 2009 at 07:52:15AM -0400, Doug Ledford wrote:
>
> I'm guessing that you didn't completely stop all usage of the hotplug  
> devices before you removed them as this works fine for me.  If the  
> devices aren't completely stopped before removal, then the stack can't  
> delete the devices.

Actually I did, and *some* devices were removed.

I'm using an mdadm.conf with names and "--auto=md",
that is "name=/dev/md/vol0" and so on.
On hot plug something happens and some devices are
created.
Then "mdadm --stop /dev/md/vol*" stops the RAIDs,
and then I unplug.
The /dev/md/vol* files are still there; the other devices are gone.

> Removing the /dev/md/ device files does nothing of value.  However, I  

From what I saw with udevmonitor, it seems that,
while those files exist, there is no add event.

> will note that I've seen udev take up to 30 or 45 seconds to process a
> bunch of md raid removals at the same time (aka, I did mdadm -S /dev/md/*
> and it took udev that long to remove all the old device files).
> So, make sure you have completely stopped arrays before removing the
> devices, then watch /dev/md/ to wait until udev does its job, then only
> replug the array after that has happened and things should work fine.

I'll try again, I'll let you know.

bye,

-- 

piergiorgio


* Re: [Patch] mdadm ignoring homehost?
  2009-04-26 12:14                                     ` Piergiorgio Sartor
@ 2009-04-26 12:58                                       ` Piergiorgio Sartor
  2009-04-26 18:06                                         ` Doug Ledford
  2009-04-26 21:37                                       ` Michal Soltys
  1 sibling, 1 reply; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-26 12:58 UTC (permalink / raw)
  To: linux-raid

On Sun, Apr 26, 2009 at 02:14:12PM +0200, Piergiorgio Sartor wrote:
> On Sun, Apr 26, 2009 at 07:52:15AM -0400, Doug Ledford wrote:
> >
> > I'm guessing that you didn't completely stop all usage of the hotplug  
> > devices before you removed them as this works fine for me.  If the  
> > devices aren't completely stopped before removal, then the stack can't  
> > delete the devices.
[...]

OK, here is some clearer information.
The /etc/udev/rules.d/70-mdadm.rules file is:

# This file causes block devices with Linux RAID (mdadm) signatures to
# automatically cause mdadm to be run.
# See udev(8) for syntax

SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid*", \
	RUN+="/sbin/mdadm -I --auto=yes $root/%k"

This is the same F10 standard, but without the "change"
option in the "ACTION".
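Why dropping "change" matters becomes clearer if you evaluate the rule's match keys by hand. The sketch below is a simplified stand-in for udev matching: shell globs plus '|'-separated alternatives, which is how the ACTION=="add|change" form used in this thread reads. `matches` and `rule_fires` are hypothetical helpers, and real udev matching covers more cases than this.

```shell
#!/bin/sh
# Hypothetical helper: succeed if VALUE matches PATTERN, where PATTERN may
# hold shell globs and '|'-separated alternatives (e.g. "add|change").
matches() {  # usage: matches VALUE PATTERN
    value=$1 rest=$2
    while :; do
        alt=${rest%%"|"*}              # first remaining alternative
        case $value in
            $alt) return 0 ;;          # unquoted on purpose: glob match
        esac
        [ "$alt" = "$rest" ] && return 1   # no alternatives left
        rest=${rest#*"|"}              # drop the alternative just tried
    done
}

# Would a rule of the form
#   ACTION=="<pattern>", ENV{ID_FS_TYPE}=="linux_raid*"
# run mdadm -I for this event?
rule_fires() {  # usage: rule_fires ACTION_PATTERN ACTION ID_FS_TYPE
    matches "$2" "$1" && matches "$3" 'linux_raid*'
}
```

For example, `rule_fires add change linux_raid_member` fails while `rule_fires 'add|change' change linux_raid_member` succeeds: with the add-only rule, a member device that only produces a "change" event never reaches mdadm -I.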

On hotplug, the arrays end up in a mess: not all of them are
added, and not always properly.
This is similar to what happens with "change" in place.
Already at this point, something is fishy.

The /dev/md contains:

vol00    vol00p4  vol01p3  vol02p2  vol03p1  vol04    vol04p4  vol05p3  vol06p2
vol00p1  vol01    vol01p4  vol02p3  vol03p2  vol04p1  vol05    vol05p4  vol06p3
vol00p2  vol01p1  vol02    vol02p4  vol03p3  vol04p2  vol05p1  vol06    vol06p4
vol00p3  vol01p2  vol02p1  vol03    vol03p4  vol04p3  vol05p2  vol06p1

Note that these arrays have no partitions and no
filesystem, since they are LVM PVs.
The vol0X are the names of the arrays.

I manually stop the arrays with "mdadm --stop --scan".
The files are still there after stopping the arrays,
even though there is no sign of the RAID in /proc/mdstat.
After unplugging, they are still there.

If I hot plug the device again, nothing happens; the arrays
are not auto-started by udev.
If I remove the /dev/md/vol* files, then it does something,
even if not correctly, as mentioned above.

If I try, from the command line:

mdadm -I --auto=yes /dev/sdd1

I get:

mdadm: failed to open /dev/md/vol00: File exists.

If I delete the /dev/md/vol* files and manually run
"-I" on all the proper devices, the array
is assembled properly.

mdadm -I --auto=yes /dev/sdd1
/dev/md_vol00p1: File exists
/dev/md_vol00p2: File exists
/dev/md_vol00p3: File exists
/dev/md_vol00p4: File exists
mdadm: /dev/sdd1 attached to /dev/md/vol00, not enough to start (1).

Note that the /dev/md/ was empty before the command
was given.

I just tried re-adding "change", and I get the same
result, so it seems that "add|change" and "add" alone
behave the same, but there are still two problems.
One is that the arrays are not assembled properly; the
other is that they're not assembled at all if the files
are there.

bye,

-- 

piergiorgio


* Re: [Patch] mdadm ignoring homehost?
  2009-04-26 12:58                                       ` Piergiorgio Sartor
@ 2009-04-26 18:06                                         ` Doug Ledford
  2009-04-26 19:08                                           ` Piergiorgio Sartor
  0 siblings, 1 reply; 59+ messages in thread
From: Doug Ledford @ 2009-04-26 18:06 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3739 bytes --]

On Apr 26, 2009, at 8:58 AM, Piergiorgio Sartor wrote:
> On Sun, Apr 26, 2009 at 02:14:12PM +0200, Piergiorgio Sartor wrote:
>> On Sun, Apr 26, 2009 at 07:52:15AM -0400, Doug Ledford wrote:
>>>
>>> I'm guessing that you didn't completely stop all usage of the hotplug
>>> devices before you removed them as this works fine for me.  If the
>>> devices aren't completely stopped before removal, then the stack can't
>>> delete the devices.
> [...]
>
> This is the same F10 standard, but without the "change"
> option in the "ACTION".

F10 still has some issues.  For things to work well, you need both the
64-md-raid.rules file from the latest udev package and also the
65-md-incremental.rules file from the F11 mdadm package.

> On hotplug, I get a mess in the arrays, not all and
> not always they are properly added.
> This is similar to what happen with "change" in place.
> Already at this point, something is fishy.
>
> The /dev/md contains:
>
> vol00    vol00p4  vol01p3  vol02p2  vol03p1  vol04    vol04p4  vol05p3  vol06p2
> vol00p1  vol01    vol01p4  vol02p3  vol03p2  vol04p1  vol05    vol05p4  vol06p3
> vol00p2  vol01p1  vol02    vol02p4  vol03p3  vol04p2  vol05p1  vol06    vol06p4
> vol00p3  vol01p2  vol02p1  vol03    vol03p4  vol04p3  vol05p2  vol06p1

The partitions are there because of the --auto=yes in the incremental  
command in the udev rules file.  For F11 and later, since we no longer  
specifically need partitionable arrays as all block devices are now  
partitionable, you don't get this unless partitions actually exist on  
the device.

> Note that these arrays have no partitions and no
> filesystem, since they are PV of LVM.
> The vol0X are the names of the arrays.
>
> I manually remove the arrays, with "mdadm --stop --scan".
> Now, the files are still there after removing the arrays,
> even if there is no sign of the RAID in /proc/mdstat.
> After un-plug, they are still there.

This is also because of the --auto=yes option in the incremental command,
combined with the older 64-md-raid.rules file from udev.  In the
latest version, udev creates all the files and mdadm creates none.  It
might also be caused by the md devices never really getting deleted at
the kernel level.  I'm not sure in which kernel version the code to
actually fully delete an md device on stop went in, but without that,
udev doesn't know to remove the old files.

> If I hot plug again the device, nothing happens, the arrays
> are not auto-started by udev.
> If I remove the /dev/md/vol* files, then it does something,
> even if not correctly, as mentioned above.
>
> If I tried, from command line:
>
> mdadm -I --auto=yes /dev/sdd1
>
> I get:
>
> mdadm: failed to open /dev/md/vol00: File exists.
>
> If I delete the /dev/md/vol* files, and I do manually
> the "-I" thing with all the proper devices, the array
> is assembled properly.
>
> mdadm -I --auto=yes /dev/sdd1
> /dev/md_vol00p1: File exists
> /dev/md_vol00p2: File exists
> /dev/md_vol00p3: File exists
> /dev/md_vol00p4: File exists
> mdadm: /dev/sdd1 attached to /dev/md/vol00, not enough to start (1).
>
> Note that the /dev/md/ was empty before the command
> was given.
>
> I tried, right now, to re-add "change", and I get the same
> result, so it seems the "add|change" or "add" alone are
> doing the same, but still there are two problems.
> One is that the arrays are not assembled properly, the
> other is that they're not assembled at all if the files
> are there.


You can update the two udev rules files and things should work fine  
after that.


* Re: [Patch] mdadm ignoring homehost?
  2009-04-26 18:06                                         ` Doug Ledford
@ 2009-04-26 19:08                                           ` Piergiorgio Sartor
  0 siblings, 0 replies; 59+ messages in thread
From: Piergiorgio Sartor @ 2009-04-26 19:08 UTC (permalink / raw)
  To: linux-raid

On Sun, Apr 26, 2009 at 02:06:31PM -0400, Doug Ledford wrote:
[...]
> You can update the two udev rules files and things should work fine  
> after that.

Thank you for the clarifications.

Unfortunately, it seems that the two rule sets are
not working on F10.
After adding the two files and removing the old one
(which I guess is replaced by 65-md-incremental.rules),
nothing happens on hot plug.
I removed all the old md files, but this did not help.

Maybe something else from F11 is required: udev-141,
mdadm-3.0, or the kernel.

Thanks again,

bye,

-- 

piergiorgio


* Re: [Patch] mdadm ignoring homehost?
  2009-04-26 12:14                                     ` Piergiorgio Sartor
  2009-04-26 12:58                                       ` Piergiorgio Sartor
@ 2009-04-26 21:37                                       ` Michal Soltys
  1 sibling, 0 replies; 59+ messages in thread
From: Michal Soltys @ 2009-04-26 21:37 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

Piergiorgio Sartor wrote:
> 
>> Removing the /dev/md/ device files does nothing of value.  However, I  
> 
> From what I saw with udevmonitor, it seems that,
> with those files, there is no add event.
> 

Few remarks:

An 'add' uevent will happen only if the created or assembled array uses a
kernel name not yet in use (in other words, one not yet present under
/sys/block/).

Actual assembly and removal cause only 'change' uevents.

Moreover, if you have partitions on the raid device, access to the
device (usually any mdadm udev rules will trigger it due to e.g. vol_id,
etc.) will trigger a sequence of additional partition 'add' uevents.

If you issue mdadm -S, 'change' is issued for the block device, but if
you have any md partitions on such an array, the 'remove' uevents for them
will happen when another array (possibly the same one) is created or
assembled using the same kernel name. Alternatively, you can, for example,
issue blockdev --rereadpt /dev/md... to trigger the partition 'remove'
uevents manually and immediately.

A 'remove' for the actual raid device will not happen. mdadm doesn't do it
(recalling my old discussion with Neil, it's due to some subtleties, and
coding it is just not worth the effort, at least not for 2.9.x). So
you're left with an inactive /dev node and /sys/block entry.

Still, if all your arrays are stopped and you, e.g., issue rmmod raid1 (if
we talk about raid1 arrays for the sake of the example), that would
cause 'remove' uevents, of course.

This is all nicely visible under udevd --debug; you might need to
simplify your rule files in a few places to control the clutter, though.
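Michal's rules for the md device itself can be captured in a few lines. This is only a model of the behavior he describes, with a temporary directory standing in for /sys/block/. The function names are hypothetical and nothing here talks to the kernel. It shows why a second assembly under the same kernel name produces 'change' rather than 'add', so anything keyed on an 'add' for the md device fires only the first time that kernel name is used.

```shell
#!/bin/sh
# Model only: $SYS_BLOCK stands in for /sys/block.
SYS_BLOCK=$(mktemp -d)

# Creating or assembling an array: 'add' only if the kernel name is new.
# Otherwise the existing (inactive) entry is reused and only 'change' is seen.
md_start_event() {  # usage: md_start_event KERNEL_NAME
    if [ -e "$SYS_BLOCK/$1" ]; then
        echo change
    else
        mkdir "$SYS_BLOCK/$1"
        echo add
    fi
}

# Stopping with mdadm -S: only 'change'. The sysfs entry (and the /dev
# node) stay behind, so no 'remove' is generated for the md device.
md_stop_event() {  # usage: md_stop_event KERNEL_NAME
    echo change
}

# Example sequence: prints add, change, change.
md_start_event md127
md_stop_event md127
md_start_event md127
```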

Michal




* Re: [Patch] mdadm ignoring homehost?
  2009-04-20 13:15             ` Doug Ledford
  2009-04-21  6:54               ` Neil Brown
@ 2009-05-11  6:47               ` Neil Brown
  1 sibling, 0 replies; 59+ messages in thread
From: Neil Brown @ 2009-05-11  6:47 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Jon Nelson, LinuxRaid

On Monday April 20, dledford@redhat.com wrote:
> 
> IMO, with my name selection patch, with the option to list an array as  
> ignore, and with the option to turn either assembly or incremental  
> modes completely off as I suggested, done in combination with a boot  
> sequence like we now use in Fedora, you have this problem solved.

I've thought a lot about all that you have said, and made some
changes, and hopefully have something that we can all be happy with.

As you may have noticed, I just released 3.0-rc1.  Maybe that is a
little optimistic as I haven't sought any feedback concerning my
recent changes, but I really want 3.0 out soon...

I have added the option of putting
  HOMEHOST <ignore>

in mdadm.conf which does - I think - essentially what you want for
Fedora.

It tells mdadm not to put much weight on the 'homehost', but to
instead look to mdadm.conf to be as authoritative as necessary.

So if a name is in use in mdadm.conf, then no other array will get
that name.  But if a name is not in use in mdadm.conf, then it is
available on a first-come, first-served basis.
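The naming policy described above can be modeled as follows. The `HOMEHOST <ignore>` line and the `ARRAY /dev/md/NAME UUID=...` line are mdadm.conf syntax. Everything else (the `claim_name` helper, the file of names already handed out, the sample UUIDs in the usage note) is a hypothetical illustration of "reserved if listed in mdadm.conf, otherwise first come, first served", not mdadm 3.0's actual code.

```shell
#!/bin/sh
# Toy model of name allocation with "HOMEHOST <ignore>" in mdadm.conf.
# usage: claim_name NAME UUID CONF TAKEN
#   CONF  : an mdadm.conf-style file
#   TAKEN : file listing names already handed out, one per line
claim_name() {
    name=$1 uuid=$2 conf=$3 taken=$4
    # UUID of the ARRAY line that reserves /dev/md/NAME, if there is one.
    owner=$(awk -v dev="/dev/md/$name" '
        $1 == "ARRAY" && $2 == dev {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^UUID=/) { sub(/^UUID=/, "", $i); print $i }
        }' "$conf")
    if [ -n "$owner" ] && [ "$owner" != "$uuid" ]; then
        return 1    # reserved in mdadm.conf for a different array
    fi
    if grep -qxF "$name" "$taken" 2>/dev/null; then
        return 1    # not reserved, but another array got there first
    fi
    echo "$name" >> "$taken"
}
```

With a config containing `HOMEHOST <ignore>` and `ARRAY /dev/md/home UUID=aaaa:bbbb:cccc:dddd`, only the array with that UUID can claim "home", while an unlisted name such as "scratch" goes to whichever array asks for it first.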

There are a bunch of other bug fixes in there too.

Please try it out and let me know if you find it usable.

Thanks,
NeilBrown


end of thread, other threads:[~2009-05-11  6:47 UTC | newest]

Thread overview: 59+ messages
2009-03-24 16:57 mdadm ignoring homehost? Jon Nelson
2009-04-01 15:15 ` Jon Nelson
2009-04-01 22:46   ` Neil Brown
2009-04-06 14:47     ` [Patch] " Doug Ledford
2009-04-06 19:33       ` Luca Berra
2009-04-17  3:49       ` Neil Brown
2009-04-17  7:08         ` Gabor Gombas
2009-04-20  5:23           ` Neil Brown
2009-04-21  6:34             ` Gabor Gombas
2009-04-21  7:06               ` Luca Berra
2009-04-17 18:17         ` Doug Ledford
2009-04-17 18:40           ` Piergiorgio Sartor
2009-04-18  7:54             ` Luca Berra
2009-04-18  8:36               ` Piergiorgio Sartor
2009-04-18 10:19                 ` Luca Berra
2009-04-18 13:06                   ` Piergiorgio Sartor
2009-04-20  5:58                     ` Neil Brown
2009-04-20 12:29                       ` Doug Ledford
2009-04-20 18:17                       ` Piergiorgio Sartor
2009-04-20 19:49                         ` Leslie Rhorer
2009-04-20 20:04                           ` Piergiorgio Sartor
2009-04-20 21:18                           ` Luca Berra
2009-04-20 21:13                         ` Luca Berra
2009-04-20 21:24                           ` Piergiorgio Sartor
2009-04-20 23:47                             ` Doug Ledford
2009-04-21  0:00                               ` Doug Ledford
2009-04-21  8:57                                 ` Michal Soltys
2009-04-21  6:29                               ` Luca Berra
2009-04-21 18:15                           ` Piergiorgio Sartor
2009-04-22 16:06                             ` Andrew Burgess
2009-04-23  1:20                               ` Doug Ledford
2009-04-23  5:51                                 ` Luca Berra
2009-04-23  6:09                                   ` Luca Berra
2009-04-23 11:05                                   ` Doug Ledford
2009-04-23 21:31                                     ` Luca Berra
2009-04-24 16:46                                       ` Doug Ledford
2009-04-24 19:15                                 ` Piergiorgio Sartor
2009-04-26 11:52                                   ` Doug Ledford
2009-04-26 12:14                                     ` Piergiorgio Sartor
2009-04-26 12:58                                       ` Piergiorgio Sartor
2009-04-26 18:06                                         ` Doug Ledford
2009-04-26 19:08                                           ` Piergiorgio Sartor
2009-04-26 21:37                                       ` Michal Soltys
2009-04-18 14:34             ` Andrew Burgess
2009-04-18  8:12           ` Luca Berra
2009-04-18  8:44             ` Piergiorgio Sartor
2009-04-18 13:35             ` Doug Ledford
2009-04-18 13:52               ` Piergiorgio Sartor
2009-04-18 14:50                 ` Doug Ledford
2009-04-18 14:48               ` Jon Nelson
2009-04-20  6:08               ` Neil Brown
2009-04-20 12:26                 ` Luca Berra
2009-04-20 12:36                 ` Doug Ledford
2009-04-18 13:58           ` Bill Davidsen
2009-04-20  7:23           ` Neil Brown
2009-04-20 13:15             ` Doug Ledford
2009-04-21  6:54               ` Neil Brown
2009-05-11  6:47               ` Neil Brown
2009-04-01 22:47 ` Michal Soltys
