* Spare drive won't spin down
@ 2010-05-06 17:49 Joe Bryant
  2010-05-07  5:07 ` Michael Evans
  2010-05-07  6:20 ` Neil Brown
  0 siblings, 2 replies; 14+ messages in thread
From: Joe Bryant @ 2010-05-06 17:49 UTC (permalink / raw)
  To: linux-raid

Hello -

I hope some kind soul is able to help. I have configured a RAID1 array - two mirrored drives, one spare. My understanding is that the spare disk should be allowed to spin down, but it won't.

When I searched for troubleshooting information, I found talk of work that had been done specifically to allow the spare to spin down - to do with decrementing the event counter instead of incrementing it. So, I'm surprised this doesn't seem to be working in my case.

If I poll /proc/diskstats, I see writes to the spare drive (sdb1) every few seconds.

If I remove the drive from the RAID array, the writes stop and it is able to spin down.

If I do "mdadm --examine /dev/sdb1", I can see the event counter is increasing every two or three seconds.
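A minimal sketch of the kind of polling I mean - the helper name is mine, purely illustrative; field 8 of each /proc/diskstats line is "writes completed" per Documentation/iostats.txt:

```python
def writes_completed(diskstats_text, device):
    """Pull the 'writes completed' counter for one device out of
    /proc/diskstats text (field 8 per Documentation/iostats.txt)."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[2] == device:
            return int(fields[7])
    return None

# Reading open('/proc/diskstats').read() every few seconds and diffing
# this value is enough to see the spare (sdb1) being written.
```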

The kernel is 2.6.32-21-generic - a new Ubuntu 10.04 install.

/proc/mdstat yields:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid1 sdb1[2](S) sda1[0](W) sdc1[1]
      29295552 blocks [2/2] [UU]
      
md1 : active raid1 sdb5[2](S) sda5[0](W) sdc5[1]
      1951680 blocks [2/2] [UU]
      
unused devices: <none>

Other facts that I don't think should be relevant but you may know better:

- You can see that in fact I have two RAID arrays - md0 is the 
troublesome one.

- md0 contains LVM, which contains ext4, which contains root filesystem.

- sdc is an SSD drive. sda and sdb are spinning disks.

- If I do "cat /sys/block/md0/md/safe_mode_delay" I get "0.210".

- I briefly had write-intent bitmaps set up for both md0 and md1, but I have since removed them.

This box is a fresh build and can be rebuilt if that's necessary to fix 
or diagnose the problem.

Thanks in advance for any help. Linux software RAID is just terrific - I report this minor issue largely because, as I said, it seems like this is something that is *supposed* to work.

--  
Joe Bryant



      


* Re: Spare drive won't spin down
  2010-05-06 17:49 Spare drive won't spin down Joe Bryant
@ 2010-05-07  5:07 ` Michael Evans
  2010-05-07  7:39   ` Joe Bryant
  2010-05-07  6:20 ` Neil Brown
  1 sibling, 1 reply; 14+ messages in thread
From: Michael Evans @ 2010-05-07  5:07 UTC (permalink / raw)
  To: Joe Bryant; +Cc: linux-raid

On Thu, May 6, 2010 at 10:49 AM, Joe Bryant <tenminjoe@yahoo.com> wrote:
> Hello -
>
> I hope some kind soul is able to help. I have configured a RAID1 array - two mirrored drives, one spare. My understanding is that the spare disk should  be allowed to spin down, but it won't.
>
> When I searched for troubleshooting information, I found talk of work that had been done specifically to allow the spare to spin down - to do with decrementing the event counter instead of incrementing it. So, I'm surprised this doesn't seem to be working in my case.
>
> If I poll /proc/diskstats, I see writes to the spare drive (sdb1) every few seconds.
>
> If I remove the drive from the RAID array, the writes stop and it is able to spin down.
>
> If I do "mdadm --examine /dev/sdb1", I can see the event counter is increasing every two or three seconds.
>
> The kernel is 2.6.32-21-generic - a new Ubuntu 10.04 install.
>
> /proc/mdstat yields:
>
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md0 : active raid1 sdb1[2](S) sda1[0](W) sdc1[1]
>      29295552 blocks [2/2] [UU]
>
> md1 : active raid1 sdb5[2](S) sda5[0](W) sdc5[1]
>      1951680 blocks [2/2] [UU]
>
> unused devices: <none>
>
> Other facts that I don't think should be relevant but you may know better:
>
> - You can see that in fact I have two RAID arrays - md0 is the
> troublesome one.
>
> - md0 contains LVM, which contains ext4, which contains root filesystem.
>
> - sdc is an SSD drive. sda and sdb are spinning disks.
>
> - If I do "cat /sys/block/md0/md/safe_mode_delay" I get "0.210".
>
> - I briefly had write-intent bitmaps set up for both md0 and md1, but I have since removed them.
>
> This box is a fresh build and can be rebuilt if that's necessary to fix
> or diagnose the problem.
>
> Thanks in advance for any help. Linux software RAID is just terrific - I report this minor issue largely because, as I said, it seems like this is something that is *supposed* to work.
>
> --
> Joe Bryant
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

So you want a near-line spare, or whatever the term is.

Not quite a 'hot' spare, which is what you have now (and which with RAID1
makes no sense, as an additional mirror improves performance,
distributes load, and keeps a ready backup for almost the same cost).
Nor a 'cold' spare, sitting on a shelf out of the box.

More of a 'warm' spare: in the box, ready to kick in as soon as needed,
but otherwise spun down and idle.

I'm unsure if the mdadm framework currently supports it.  If it does,
you should look into creating a 'spare group' and having the 'spare'
in that group (which should only change when it's pulled out and added
to a different group/array).  It would thus be able to idle the drive
but keep the non-moving parts responsive to the system.

Or at least that's how I'd guess it works.


* Re: Spare drive won't spin down
  2010-05-06 17:49 Spare drive won't spin down Joe Bryant
  2010-05-07  5:07 ` Michael Evans
@ 2010-05-07  6:20 ` Neil Brown
  2010-05-07  7:40   ` Joe Bryant
  1 sibling, 1 reply; 14+ messages in thread
From: Neil Brown @ 2010-05-07  6:20 UTC (permalink / raw)
  To: Joe Bryant; +Cc: linux-raid

On Thu, 6 May 2010 10:49:06 -0700 (PDT)
Joe Bryant <tenminjoe@yahoo.com> wrote:

> Hello -
> 
> I hope some kind soul is able to help. I have configured a RAID1 array - two mirrored drives, one spare. My understanding is that the spare disk should  be allowed to spin down, but it won't. 

Yes, it should.  But as you say, it doesn't.
More precisely, it seems to work if you create an array with 1.x metadata
(e.g. --metadata=1.0) but not if you use 0.90 (which was the default for mdadm
before 3.1).

It seems I broke it with commit 51d5668cb2e3f in 2.6.31.

I'll see about fixing that.

> 
> This box is a fresh build and can be rebuilt if that's necessary to fix 
> or diagnose the problem.

Recreate the array with "--metadata 1.0" or "--metadata 1.2" and it should
work better.

> 
> Thanks in advance for any help. Linux software RAID is just terrific - I report this minor issue largely because, as I said, it seems like this is something that is *supposed* to work.
> 

Yes, thanks for reporting it.

NeilBrown



* Re: Spare drive won't spin down
  2010-05-07  5:07 ` Michael Evans
@ 2010-05-07  7:39   ` Joe Bryant
  0 siblings, 0 replies; 14+ messages in thread
From: Joe Bryant @ 2010-05-07  7:39 UTC (permalink / raw)
  To: Michael Evans; +Cc: linux-raid

> More of a 'warm' spare; in the box, ready to kick in as soon as needed
> but otherwise spun down and idled.

Exactly so.

> If it does you should look in to creating a 'spare group' and having the 'spare'
> in that group (which should only change when it's pulled out and added
> to a different group/array).

Sounds good, but I don't see a way to do that. As far as I can tell, a "spares group" isn't a thing that exists in its own right - it's a property of an array, and arrays with a matching "spares group" property will share spare disks if necessary and possible. I'd be pleased to be wrong, let me know if I've misunderstood.
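For what it's worth, the only form of it I can find is the per-array property in mdadm.conf - the UUIDs below are placeholders; arrays sharing the same spare-group value lend each other spares when a member fails, which isn't the spin-down behaviour we're after:

```
# mdadm.conf sketch - illustrative UUIDs only
ARRAY /dev/md0 UUID=00000000:00000000:00000000:00000000 spare-group=shared
ARRAY /dev/md1 UUID=11111111:11111111:11111111:11111111 spare-group=shared
```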

Thanks -

-- 
Joe



      


* Re: Spare drive won't spin down
  2010-05-07  6:20 ` Neil Brown
@ 2010-05-07  7:40   ` Joe Bryant
  2010-05-07  9:47     ` Neil Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Joe Bryant @ 2010-05-07  7:40 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

> I'll see about fixing that.
>
> Recreate the array with "--metadata 1.0" or "--metadata 1.2" and it should work better.

That's great, thanks very much.

-- 
Joe



      


* Re: Spare drive won't spin down
  2010-05-07  7:40   ` Joe Bryant
@ 2010-05-07  9:47     ` Neil Brown
  2010-05-07 10:05       ` Joe Bryant
  2010-05-10 16:59       ` Bill Davidsen
  0 siblings, 2 replies; 14+ messages in thread
From: Neil Brown @ 2010-05-07  9:47 UTC (permalink / raw)
  To: Joe Bryant; +Cc: linux-raid

On Fri, 7 May 2010 00:40:40 -0700 (PDT)
Joe Bryant <tenminjoe@yahoo.com> wrote:

> > I'll see about fixing that.
> >
> > Recreate the array with "--metadata 1.0" or "--metadata 1.2" and it should work better.
> 
> That's great, thanks very much.
> 

It turns out it is a bit more subtle than that, though that approach may work
for you.
If you make an odd number of changes to the metadata, it will switch from
doing what you want, to not.
e.g. if /dev/foo is your spare device, then

  mdadm /dev/md0 -f /dev/foo
  mdadm /dev/md0 -r /dev/foo
  mdadm /dev/md0 -a /dev/foo

will switch between working and not working.  v0.90 metadata starts out not
working; v1.x starts out working.
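As a toy model of the behaviour described above (purely illustrative - the parity convention here is an assumption for the sketch, not the kernel's code): the spare is only left untouched when the event count has the "right" parity, and every metadata change bumps the count by one, so an odd number of changes flips the behaviour.

```python
def spare_skipped_when_clean(events):
    """Toy model of the pre-fix heuristic: a clean array only leaves its
    spares untouched (letting them sleep) when the event-count parity
    allows a later roll-back. The parity choice is illustrative."""
    return events % 2 == 1

def after_changes(events, n):
    """Each metadata change (fail, remove, add, ...) bumps the count."""
    return events + n
```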

Alternately the following patch will fix it properly.

Thanks,
NeilBrown


From 12aee54f4fee61cdf4e594e501834737fd6b9605 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Fri, 7 May 2010 19:44:26 +1000
Subject: [PATCH] md: restore ability of spare drives to spin down.

Some time ago we stopped the clean/active metadata updates
from being written to a 'spare' device in most cases so that
it could spin down and stay spun down.  Device failure/removal
etc. are still recorded on spares.

However commit 51d5668cb2e3fd1827a55 broke this 50% of the time,
depending on whether the event count is even or odd.
The change log entry said:

   This means that the alignment between 'odd/even' and
    'clean/dirty' might take a little longer to attain,

however the code makes no attempt to create that alignment, so it
could take arbitrarily long.

So when we find that clean/dirty is not aligned with odd/even,
force a second metadata-update immediately.  There are already cases
where a second metadata-update is needed immediately (e.g. when a
device fails during the metadata update).  We just piggy-back on that.

Reported-by: Joe Bryant <tenminjoe@yahoo.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 66ef15c..4996f27 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2217,12 +2217,18 @@ repeat:
 		if (!mddev->in_sync || mddev->recovery_cp != MaxSector) { /* not clean */
 			/* .. if the array isn't clean, an 'even' event must also go
 			 * to spares. */
-			if ((mddev->events&1)==0)
+			if ((mddev->events&1)==0) {
 				nospares = 0;
+				sync_req = 2; /* force a second update to get the
+					       * even/odd in sync */
+			}
 		} else {
 			/* otherwise an 'odd' event must go to spares */
-			if ((mddev->events&1))
+			if ((mddev->events&1)) {
 				nospares = 0;
+				sync_req = 2; /* force a second update to get the
+					       * even/odd in sync */
+			}
 		}
 	}
 


* Re: Spare drive won't spin down
  2010-05-07  9:47     ` Neil Brown
@ 2010-05-07 10:05       ` Joe Bryant
  2010-05-10 16:59       ` Bill Davidsen
  1 sibling, 0 replies; 14+ messages in thread
From: Joe Bryant @ 2010-05-07 10:05 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

>  mdadm /dev/md0 -f /dev/foo
>  mdadm /dev/md0 -r /dev/foo
>  mdadm /dev/md0 -a /dev/foo
>
> will switch between working and not working.

It worked! Thanks so much.

 Joe Bryant


      


* Re: Spare drive won't spin down
  2010-05-07  9:47     ` Neil Brown
  2010-05-07 10:05       ` Joe Bryant
@ 2010-05-10 16:59       ` Bill Davidsen
  2010-05-11 20:53         ` Neil Brown
  1 sibling, 1 reply; 14+ messages in thread
From: Bill Davidsen @ 2010-05-10 16:59 UTC (permalink / raw)
  To: Neil Brown; +Cc: Joe Bryant, linux-raid

Neil Brown wrote:
> On Fri, 7 May 2010 00:40:40 -0700 (PDT)
> Joe Bryant <tenminjoe@yahoo.com> wrote:
>
>   
>>> I'll see about fixing that.
>>>
>>> Recreate the array with "--metadata 1.0" or "--metadata 1.2" and it should work better.
>>>       
>> That's great, thanks very much.
>>
>>     
>
> It turns out it is a bit more subtle than that, though that approach may work
> for you.
> If you make an odd number of changes to the metadata, it will switch from
> doing what you want, to not.
> e.g. if /dev/foo is your spare device, then
>
>   mdadm /dev/md0 -f /dev/foo
>   mdadm /dev/md0 -r /dev/foo
>   mdadm /dev/md0 -a /dev/foo
>
> will switch between working and not working.  v0.90 metadata starts out not
> working.  v1.x start out working.
>   

So we can assume that the little dance steps above will make 1.x 
misbehave in the same way?

Could you explain (point to an explanation) why this whole odd/even 
thing exists?

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



* Re: Spare drive won't spin down
  2010-05-10 16:59       ` Bill Davidsen
@ 2010-05-11 20:53         ` Neil Brown
  2010-05-17 18:11           ` Doug Ledford
  2010-05-18  0:20           ` Neil Brown
  0 siblings, 2 replies; 14+ messages in thread
From: Neil Brown @ 2010-05-11 20:53 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Joe Bryant, linux-raid

On Mon, 10 May 2010 12:59:33 -0400
Bill Davidsen <davidsen@tmr.com> wrote:

> Neil Brown wrote:
> > On Fri, 7 May 2010 00:40:40 -0700 (PDT)
> > Joe Bryant <tenminjoe@yahoo.com> wrote:
> >
> >   
> >>> I'll see about fixing that.
> >>>
> >>> Recreate the array with "--metadata 1.0" or "--metadata 1.2" and it should work better.
> >>>       
> >> That's great, thanks very much.
> >>
> >>     
> >
> > It turns out it is a bit more subtle than that, though that approach may work
> > for you.
> > If you make an odd number of changes to the metadata, it will switch from
> > doing what you want, to not.
> > e.g. if /dev/foo is your spare device, then
> >
> >   mdadm /dev/md0 -f /dev/foo
> >   mdadm /dev/md0 -r /dev/foo
> >   mdadm /dev/md0 -a /dev/foo
> >
> > will switch between working and not working.  v0.90 metadata starts out not
> > working.  v1.x start out working.
> >   
> 
> So we can assume that the little dance steps above will make 1.x 
> misbehave in the same way?

Yes.

> 
> Could you explain (point to an explanation) why this whole odd/even 
> thing exists?
> 

Maybe ....

For backwards compatibility, the event counts in all the devices in an array
must not differ by more than 1.  And if the information in the superblocks is
different, then the event counts must be different to ensure that the current
information is used when the array is restarted.

Consequently, if the event counts are uniform across an array it is safe to
just mark the superblocks on active drives as 'dirty' leaving spare drives
alone.
To then mark the array as 'clean' again we would need to either update the
metadata on the spares (which we don't want to do) or decrease the event
count on the active devices.
However there are cases where decreasing the event count on active devices is
not safe.
If the array was dirty and we failed a device that would update the event
count everywhere but on the failed device.
When we then want to mark the array as 'clean' it is *not* safe to decrement
the event count as then the failed drive could look like it is still a valid
member of the array.

I had the idea that I could encode this extra information in the odd/even
status of the event count.  However it seems that now I explain it out loud
it doesn't actually make a lot of sense.

I should keep the "it is safe to decrement the event count" state in some
internal state variable and assume it is 'false' when an array is started.
That would be heaps cleaner and would actually do the right thing.

Theoretically, when the spares are one behind the active array and we need to
update them all, we should update the spares first, then the rest.  If we
don't and there is a crash at the wrong time, some spares could be 2 events
behind the most recent device.  However that is a fairly unlikely race to
lose, and the cost is only having a spare device fall out of the array, which
is quite easy to put back in, so I might not worry too much about it.
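The internal-state idea above, as a toy sketch (the names are mine, not the kernel's): remember explicitly whether the last update skipped the spares, instead of inferring it from event-count parity.

```python
class ToyMddev:
    """Sketch of tracking decrement-safety explicitly rather than
    inferring it from event-count parity."""
    def __init__(self, events=0):
        self.events = events
        self.can_decrease_events = False  # assume unsafe right after assembly

    def dirty(self):
        # clean -> dirty: bump the count, skip writing the spares,
        # and remember that rolling back is now safe
        self.events += 1
        self.can_decrease_events = True

    def clean(self):
        if self.can_decrease_events and self.events > 1:
            self.events -= 1   # spares never saw the dirty event
        else:
            self.events += 1   # must write all devices, spares included
        self.can_decrease_events = False
```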

So if you haven't seen a patch to fix this in a week or two, please remind me.

Thanks,
NeilBrown


* Re: Spare drive won't spin down
  2010-05-11 20:53         ` Neil Brown
@ 2010-05-17 18:11           ` Doug Ledford
  2010-05-18  0:23             ` Neil Brown
  2010-05-18  0:20           ` Neil Brown
  1 sibling, 1 reply; 14+ messages in thread
From: Doug Ledford @ 2010-05-17 18:11 UTC (permalink / raw)
  To: Neil Brown; +Cc: Bill Davidsen, Joe Bryant, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1338 bytes --]

On 05/11/2010 04:53 PM, Neil Brown wrote:
> Theoretically, when the spares are one behind the active array and we need to
> update them all, we should update the spares first, then the rest.  If we
> don't and there is a crash at the wrong time, some spares could be 2 events
> behind the most recent device.  However that is a fairly unlikely race to
> lose and the cost is only having a spare device fall out of the array, which
> is quite easy to put back in, so I might not worry too much about it.
> 
> So if you haven't seen a patch to fix this in a week or two, please remind me.

They're spares.  They don't need to have an in-sync generation count.
Change the semantics so that spares have a 0 generation count and only
when they've been activated does their count get brought into sync with
the rest of the array.  The only thing needed to make that work is to
not kick them on assembly because their generation count doesn't match,
but that should be trivial (and sane) to do based on the fact that if
they are a spare, they don't contain data, so the generation count
really is meaningless.
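The assembly-time rule this implies, as a hedged sketch (not the kernel code; uuid matching is assumed to have happened already):

```python
def accept_on_assembly(is_spare, dev_events, array_events):
    """Sketch of the proposed rule: only devices claiming in-sync data
    need a current event count; a spare is judged by uuid alone."""
    if is_spare:
        return True  # generation count is meaningless without data
    # active devices may lag the array by at most one event
    return dev_events + 1 >= array_events
```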

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
	      http://people.redhat.com/dledford

Infiniband specific RPMs available at
	      http://people.redhat.com/dledford/Infiniband


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: Spare drive won't spin down
  2010-05-11 20:53         ` Neil Brown
  2010-05-17 18:11           ` Doug Ledford
@ 2010-05-18  0:20           ` Neil Brown
  1 sibling, 0 replies; 14+ messages in thread
From: Neil Brown @ 2010-05-18  0:20 UTC (permalink / raw)
  To: Neil Brown; +Cc: Bill Davidsen, Joe Bryant, linux-raid

On Wed, 12 May 2010 06:53:18 +1000
Neil Brown <neilb@suse.de> wrote:

> Theoretically, when the spares are one behind the active array and we need to
> update them all, we should update the spares first, then the rest.  If we
> don't and there is a crash at the wrong time, some spares could be 2 events
> behind the most recent device.  However that is a fairly unlikely race to
> lose and the cost is only having a spare device fall out of the array, which
> is quite easy to put back in, so I might not worry too much about it.
> 
> So if you haven't seen a patch to fix this in a week or two, please remind me.
> 

This is the sort of thing I was thinking of.
Comments?

Thanks,
NeilBrown

From bf7399c0f1e95e8af30f93114f96fcc73cb0d7c6 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Tue, 18 May 2010 09:28:43 +1000
Subject: [PATCH] md: simplify updating of event count to sometimes avoid updating spares.

When updating the event count for a simple clean <-> dirty transition,
we try to avoid updating the spares so they can safely spin-down.
As the event_counts across an array must be +/- 1, this means
decrementing the event_count on a dirty->clean transition.
This is not always safe and we have to avoid the unsafe time.
We current do this with a misguided idea about it being safe or
not depending on whether the event_count is odd or even.  This
approach only works reliably in a few common instances, but easily
falls down.

So instead, simply keep internal state concerning whether it is safe
or not, and always assume it is not safe when an array is first
assembled.

Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index fec4abc..9ef21d9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2088,7 +2088,6 @@ static void sync_sbs(mddev_t * mddev, int nospares)
 		if (rdev->sb_events == mddev->events ||
 		    (nospares &&
 		     rdev->raid_disk < 0 &&
-		     (rdev->sb_events&1)==0 &&
 		     rdev->sb_events+1 == mddev->events)) {
 			/* Don't update this superblock */
 			rdev->sb_loaded = 2;
@@ -2141,28 +2140,14 @@ repeat:
 	 * and 'events' is odd, we can roll back to the previous clean state */
 	if (nospares
 	    && (mddev->in_sync && mddev->recovery_cp == MaxSector)
-	    && (mddev->events & 1)
-	    && mddev->events != 1)
+	    && mddev->can_decrease_events
+	    && mddev->events != 1) {
 		mddev->events--;
-	else {
+		mddev->can_decrease_events = 0;
+	} else {
 		/* otherwise we have to go forward and ... */
 		mddev->events ++;
-		if (!mddev->in_sync || mddev->recovery_cp != MaxSector) { /* not clean */
-			/* .. if the array isn't clean, an 'even' event must also go
-			 * to spares. */
-			if ((mddev->events&1)==0) {
-				nospares = 0;
-				sync_req = 2; /* force a second update to get the
-					       * even/odd in sync */
-			}
-		} else {
-			/* otherwise an 'odd' event must go to spares */
-			if ((mddev->events&1)) {
-				nospares = 0;
-				sync_req = 2; /* force a second update to get the
-					       * even/odd in sync */
-			}
-		}
+		mddev->can_decrease_events = nospares;
 	}
 
 	if (!mddev->events) {
@@ -4606,6 +4591,7 @@ static void md_clean(mddev_t *mddev)
 	mddev->layout = 0;
 	mddev->max_disks = 0;
 	mddev->events = 0;
+	mddev->can_decrease_events = 0;
 	mddev->delta_disks = 0;
 	mddev->new_level = LEVEL_NONE;
 	mddev->new_layout = 0;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index a536f54..7ab5ea1 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -150,6 +150,12 @@ struct mddev_s
 	int				external_size; /* size managed
 							* externally */
 	__u64				events;
+	/* If the last 'event' was simply a clean->dirty transition, and
+	 * we didn't write it to the spares, then it is safe and simple
+	 * to just decrement the event count on a dirty->clean transition.
+	 * So we record that possibility here.
+	 */
+	int				can_decrease_events;
 
 	char				uuid[16];
 


* Re: Spare drive won't spin down
  2010-05-17 18:11           ` Doug Ledford
@ 2010-05-18  0:23             ` Neil Brown
  2010-05-18  0:38               ` Michael Evans
  0 siblings, 1 reply; 14+ messages in thread
From: Neil Brown @ 2010-05-18  0:23 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Bill Davidsen, Joe Bryant, linux-raid

On Mon, 17 May 2010 14:11:42 -0400
Doug Ledford <dledford@redhat.com> wrote:

> On 05/11/2010 04:53 PM, Neil Brown wrote:
> > Theoretically, when the spares are one behind the active array and we need to
> > update them all, we should update the spares first, then the rest.  If we
> > don't and there is a crash at the wrong time, some spares could be 2 events
> > behind the most recent device.  However that is a fairly unlikely race to
> > lose and the cost is only having a spare device fall out of the array, which
> > is quite easy to put back in, so I might not worry too much about it.
> > 
> > So if you haven't seen a patch to fix this in a week or two, please remind me.
> 
> They're spares.  They don't need to have an in-sync generation count.
> Change the semantics so that spares have a 0 generation count and only
> when they've been activated does their count get brought into sync with
> the rest of the array.  The only thing needed to make that work is to
> not kick them on assembly because their generation count doesn't match,
> but that should be trivial (and sane) to do based on the fact that if
> they are a spare, they don't contain data, so the generation count
> really is meaningless.
> 

Yes.
However current kernels will kick them on assembly and so I cannot make
future kernels leave the event count at zero, else people who try a new
kernel then revert to an old might find there spares drop off ... and could
well not be happy about that.

But I can certainly relax the requirement that the event count on a spare be
current now.  Maybe in a few years I can then stop updating the event count
on spares completely....

NeilBrown

From 589af914c85ebb8205ba315a1e78e76ec7bd9e99 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Tue, 18 May 2010 10:17:09 +1000
Subject: [PATCH] md: don't insist on valid event count for spare devices.

Devices which know that they are spares do not really need to have
an event count that matches the rest of the array, so there are no
data-in-sync issues. It is enough that the uuid matches.
So remove the requirement that the event count is up-to-date.

We currently still write out an event count on spares, but this
allows us in a year or 3 to stop doing that completely.

Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9ef21d9..26b3d28 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1070,10 +1070,13 @@ static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev)
 				mddev->bitmap_info.default_offset;
 
 	} else if (mddev->pers == NULL) {
-		/* Insist on good event counter while assembling */
+		/* Insist on good event counter while assembling, except
+		 * for spares (which don't need an event count) */
 		++ev1;
-		if (ev1 < mddev->events) 
-			return -EINVAL;
+		if (sb->disks[rdev->desc_nr].state & (
+			    (1<<MD_DISK_SYNC) | (1 << MD_DISK_ACTIVE)))
+			if (ev1 < mddev->events) 
+				return -EINVAL;
 	} else if (mddev->bitmap) {
 		/* if adding to array with a bitmap, then we can accept an
 		 * older device ... but not too old.
@@ -1469,10 +1472,14 @@ static int super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev)
 		}
 
 	} else if (mddev->pers == NULL) {
-		/* Insist of good event counter while assembling */
+		/* Insist of good event counter while assembling, except for
+		 * spares (which don't need an event count) */
 		++ev1;
-		if (ev1 < mddev->events)
-			return -EINVAL;
+		if (rdev->desc_nr >= 0 &&
+		    rdev->desc_nr < le32_to_cpu(sb->max_dev) &&
+		    le16_to_cpu(sb->dev_roles[rdev->desc_nr]) < 0xfffe)
+			if (ev1 < mddev->events)
+				return -EINVAL;
 	} else if (mddev->bitmap) {
 		/* If adding to array with a bitmap, then we can accept an
 		 * older device, but not too old.


* Re: Spare drive won't spin down
  2010-05-18  0:23             ` Neil Brown
@ 2010-05-18  0:38               ` Michael Evans
  2010-05-18  0:50                 ` Neil Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Evans @ 2010-05-18  0:38 UTC (permalink / raw)
  To: Neil Brown; +Cc: Doug Ledford, Bill Davidsen, Joe Bryant, linux-raid

On Mon, May 17, 2010 at 5:23 PM, Neil Brown <neilb@suse.de> wrote:
> On Mon, 17 May 2010 14:11:42 -0400
> Doug Ledford <dledford@redhat.com> wrote:
>
>> On 05/11/2010 04:53 PM, Neil Brown wrote:
>> > Theoretically, when the spares are one behind the active array and we need to
>> > update them all, we should update the spares first, then the rest.  If we
>> > don't and there is a crash at the wrong time, some spares could be 2 events
>> > behind the most recent device.  However that is a fairly unlikely race to
>> > lose and the cost is only having a spare device fall out of the array, which
>> > is quite easy to put back in, so I might not worry too much about it.
>> >
>> > So if you haven't seen a patch to fix this in a week or two, please remind me.
>>
>> They're spares.  They don't need to have an in-sync generation count.
>> Change the semantics so that spares have a 0 generation count and only
>> when they've been activated does their count get brought into sync with
>> the rest of the array.  The only thing needed to make that work is to
>> not kick them on assembly because their generation count doesn't match,
>> but that should be trivial (and sane) to do based on the fact that if
>> they are a spare, they don't contain data, so the generation count
>> really is meaningless.
>>
>
> Yes.
> However current kernels will kick them on assembly and so I cannot make
> future kernels leave the event count at zero, else people who try a new
> kernel then revert to an old one might find their spares drop off ... and could
> well not be happy about that.
>
> But I can certainly relax the requirement that the event count on a spare be
> current now.  Maybe in a few years I can then stop updating the event count
> on spares completely....
>
> NeilBrown
>
> From 589af914c85ebb8205ba315a1e78e76ec7bd9e99 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Tue, 18 May 2010 10:17:09 +1000
> Subject: [PATCH] md: don't insist on valid event count for spare devices.
>
> Devices which know that they are spares do not really need to have
> an event count that matches the rest of the array, so there are no
> data-in-sync issues. It is enough that the uuid matches.
> So remove the requirement that the event count is up-to-date.
>
> We currently still write out an event count on spares, but this
> allows us in a year or 3 to stop doing that completely.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 9ef21d9..26b3d28 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -1070,10 +1070,13 @@ static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev)
>                                mddev->bitmap_info.default_offset;
>
>        } else if (mddev->pers == NULL) {
> -               /* Insist on good event counter while assembling */
> +               /* Insist on good event counter while assembling, except
> +                * for spares (which don't need an event count) */
>                ++ev1;
> -               if (ev1 < mddev->events)
> -                       return -EINVAL;
> +               if (sb->disks[rdev->desc_nr].state & (
> +                           (1<<MD_DISK_SYNC) | (1 << MD_DISK_ACTIVE)))
> +                       if (ev1 < mddev->events)
> +                               return -EINVAL;
>        } else if (mddev->bitmap) {
>                /* if adding to array with a bitmap, then we can accept an
>                 * older device ... but not too old.
> @@ -1469,10 +1472,14 @@ static int super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev)
>                }
>
>        } else if (mddev->pers == NULL) {
> -               /* Insist of good event counter while assembling */
> +               /* Insist of good event counter while assembling, except for
> +                * spares (which don't need an event count) */
>                ++ev1;
> -               if (ev1 < mddev->events)
> -                       return -EINVAL;
> +               if (rdev->desc_nr >= 0 &&
> +                   rdev->desc_nr < le32_to_cpu(sb->max_dev) &&
> +                   le16_to_cpu(sb->dev_roles[rdev->desc_nr]) < 0xfffe)
> +                       if (ev1 < mddev->events)
> +                               return -EINVAL;
>        } else if (mddev->bitmap) {
>                /* If adding to array with a bitmap, then we can accept an
>                 * older device, but not too old.

You should make it a sysfs-controllable option whether to keep the counter
updated on spare discs.  The default can be to keep it within one, as you
are currently patching it to do.  Then later you can change the default to
the higher-performance setting.  Better, the presence of that file can be
used to test whether the kernel supports it, and if it doesn't, initramfs
scripts can react accordingly.


* Re: Spare drive won't spin down
  2010-05-18  0:38               ` Michael Evans
@ 2010-05-18  0:50                 ` Neil Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2010-05-18  0:50 UTC (permalink / raw)
  To: Michael Evans; +Cc: Doug Ledford, Bill Davidsen, Joe Bryant, linux-raid

On Mon, 17 May 2010 17:38:04 -0700
Michael Evans <mjevans1983@gmail.com> wrote:

> You should make it a sysfs controllable option to keep the counter
> updated on spare discs or not.  The default can be to keep it within
> one as you currently are patching it to.  Then later you can change
> the default to the higher performance setting.  Better, the presence
> of that file can be used to test if the kernel supports it, and if it
> doesn't initramfs scripts can react accordingly.

There really isn't any need for user-space to know - this is an entirely
in-kernel thing.  mdadm already supports assembling arrays with widely
different event_counts.  It just gives them to the kernel and lets it make
the final decision.

And I think it is stretching things to call this a performance setting.  It
only changes behaviour when a device is added/removed/failed or when an
array is started or reshaped.  None of those happen often enough that an
extra spin-up is really going to be an issue.

It is more an issue of simplicity and elegance.  I don't think a sysfs option
is appropriate for that.

But thanks for the suggestion.

NeilBrown


