* Strange / inconsistent behavior with mdadm -I -R
@ 2013-03-14 20:03 Martin Wilck
  2013-03-14 20:11 ` [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled mwilck
  2013-03-17 23:35 ` Strange / inconsistent behavior with mdadm -I -R NeilBrown
  0 siblings, 2 replies; 11+ messages in thread
From: Martin Wilck @ 2013-03-14 20:03 UTC (permalink / raw)
  To: linux-raid, NeilBrown

Hello Neil,

for my DDF/RAID10 work, I have been trying to figure out how mdadm -I -R
is supposed to behave, and I have found some strange behavior I'd like to
clarify, lest I make a mistake in my DDF/RAID10 code.

My test case is incremental assembly of a clean array, running mdadm -I -R
by hand for each array device in turn.
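
Roughly, the test looks like this (just a sketch of what I run by hand,
not a copy-paste script; the device names are from my test setup):

  mdadm -S /dev/md0            # stop the clean, previously created array
  for d in /dev/tosh/rd[1-4]; do
      mdadm -v -I -R $d        # incremental assembly, one device at a time
      cat /proc/mdstat         # check the array state after each step
  done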

1) native md and containers behave differently for RAID 1

Both native and container RAID 1 are started in auto-read-only mode when
the 1st disk is added. When the 2nd disk is added, the native md
switches to "active" and starts a recovery, which finishes immediately.
Container arrays (tested: DDF), on the other hand, do not switch to
"active" until a write attempt is made on the array. The problem is in
the native case: after the switch to "active", no further disks can be
added ("can only add $DISK as a spare").

IMO the container behavior makes more sense and matches the man page
better than the native behavior. Do you agree? Would it be hard to fix that?

2) RAID1 skips recovery for clean arrays, RAID10 does not

Native RAID 10 behaves similarly to RAID1 as described above. As soon as
the array can be started, it is, in auto-read-only mode. When the next disk
is added after that, recovery starts, the array switches to "active", and
further disks can't be added the "simple way" any more.
There is one important difference, though: in the RAID 10 case, the recovery
doesn't finish immediately. Rather, md does a full recovery of the added
disk although it was clean. This is wrong; I have come up with a patch
for it, which I will send in a follow-up email.

I tested this behavior with kernels 2.6.32, 3.0, and 3.8, with the same
result, using mdadm from the git tree.

Regards
Martin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled
  2013-03-14 20:03 Strange / inconsistent behavior with mdadm -I -R Martin Wilck
@ 2013-03-14 20:11 ` mwilck
  2013-03-17 23:06   ` NeilBrown
  2013-03-17 23:35 ` Strange / inconsistent behavior with mdadm -I -R NeilBrown
  1 sibling, 1 reply; 11+ messages in thread
From: mwilck @ 2013-03-14 20:11 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: Martin Wilck

From: Martin Wilck <mwilck@arcor.de>

When an array is assembled incrementally with mdadm -I -R
and the array switches to "active" mode, md starts a recovery.

If the array was clean, the "fullsync" flag will be 0. Skip
the full recovery in this case, as RAID1 does (the code was
actually copied from the sync_request() method of RAID1).
---
 drivers/md/raid10.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 64d4824..e373d88 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2863,6 +2863,22 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 		if (init_resync(conf))
 			return 0;
 
+	/*
+	 * Allow skipping a full rebuild for incremental assembly
+	 * of a clean array, like RAID1 does.
+	 */
+	if (mddev->bitmap == NULL &&
+	    mddev->recovery_cp == MaxSector &&
+	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
+	    conf->fullsync == 0) {
+		*skipped = 1;
+		max_sector = mddev->dev_sectors;
+		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ||
+		    test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
+			max_sector = mddev->resync_max_sectors;
+		return max_sector - sector_nr;
+	}
+
  skipped:
 	max_sector = mddev->dev_sectors;
 	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ||
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled
  2013-03-14 20:11 ` [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled mwilck
@ 2013-03-17 23:06   ` NeilBrown
  0 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2013-03-17 23:06 UTC (permalink / raw)
  To: mwilck; +Cc: linux-raid


On Thu, 14 Mar 2013 21:11:31 +0100 mwilck@arcor.de wrote:

> From: Martin Wilck <mwilck@arcor.de>
> 
> When an array is assembled incrementally with mdadm -I -R
> and the array switches to "active" mode, md starts a recovery.
> 
> If the array was clean, the "fullsync" flag will be 0. Skip
> the full recovery in this case, as RAID1 does (the code was
> actually copied from the sync_request() method of RAID1).
> ---
>  drivers/md/raid10.c |   16 ++++++++++++++++
>  1 files changed, 16 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 64d4824..e373d88 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2863,6 +2863,22 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
>  		if (init_resync(conf))
>  			return 0;
>  
> +	/*
> +	 * Allow skipping a full rebuild for incremental assembly
> +	 * of a clean array, like RAID1 does.
> +	 */
> +	if (mddev->bitmap == NULL &&
> +	    mddev->recovery_cp == MaxSector &&
> +	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
> +	    conf->fullsync == 0) {
> +		*skipped = 1;
> +		max_sector = mddev->dev_sectors;
> +		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ||
> +		    test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
> +			max_sector = mddev->resync_max_sectors;
> +		return max_sector - sector_nr;
> +	}
> +
>   skipped:
>  	max_sector = mddev->dev_sectors;
>  	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ||


Thanks Martin - looks good.

However, protocol requires that you add the magic string "Signed-off-by: your
name and email address" to the patch.  This affirms that you have the right (from a
copyright perspective) to contribute that patch - not that I doubt it, but it
is always good to have it in writing.
See section 12 of "Documentation/SubmittingPatches".

If you could just send me that line (you don't need to send the whole patch
as well), I'll apply the patch.
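
I.e. just a line of this form, with your name and the address you post from:

  Signed-off-by: Martin Wilck <mwilck@arcor.de>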

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Strange / inconsistent behavior with mdadm -I -R
  2013-03-14 20:03 Strange / inconsistent behavior with mdadm -I -R Martin Wilck
  2013-03-14 20:11 ` [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled mwilck
@ 2013-03-17 23:35 ` NeilBrown
  2013-03-19 18:29   ` Martin Wilck
  2013-03-19 18:35   ` [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled mwilck
  1 sibling, 2 replies; 11+ messages in thread
From: NeilBrown @ 2013-03-17 23:35 UTC (permalink / raw)
  To: Martin Wilck; +Cc: linux-raid


On Thu, 14 Mar 2013 21:03:28 +0100 Martin Wilck <mwilck@arcor.de> wrote:

> Hello Neil,
> 
> for my DDF/RAID10 work, I have been trying to figure out how mdadm -I -R
> is supposed to behave, and I have found strangeness I'd like to clarify,
> lest I make a mistake in my DDF/RAID10 code.
> 
> My test case is incremental assembly of a clean array running mdadm -I
> -R by hand for each array device in turn.
> 
> 1) native md and containers behave differently for RAID 1
> 
> Both native and container RAID 1 are started in auto-read-only mode when
> the 1st disk is added. When the 2nd disk is added, the native md
> switches to "active" and starts a recovery which finishes immediately.
> Container arrays (tested: DDF), on the other hand, do not switch to
> "active" until a write attempt is made on the array. The problem is in
> the native case: after the switch to "active", no more disks can be
> added any more ("can only add $DISK as a spare").
> 
> IMO the container behavior makes more sense and matches the man page
> better than the native behavior. Do you agree? Would it be hard to fix that?

Without -R, the array should remain inactive until all expected devices have
been found.  Then it should switch:
  to 'active' if the array is known to this host - i.e. listed
       in /etc/mdadm.conf or has the right hostname in the metadata
  to 'read-auto' if the array is not known to this host.

With -R, it only stays inactive until we have the minimum devices needed for
a functioning array.  Then it will switch to 'read-auto' or, if the array is
known and all expected devices are present, it will switch to 'active'.
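
You can watch these transitions directly, e.g. via the array_state
attribute described in Documentation/md.txt, or with --detail:

  cat /sys/block/md0/md/array_state     # shows e.g. "read-auto" or "active"
  mdadm --detail /dev/md0 | grep -i state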

I think this is correct behaviour.  However I'm not quite sure from your
description whether you are saying that it doesn't behave like this, or if
you are saying that it does but it should behave differently.

Maybe if you could provide a sequence of "mdadm" commands that produces an
outcome different to what you would expect - that would reduce the chance
that I get confused.

> 
> 2) RAID1 skips recovery for clean arrays, RAID10 does not
> 
> Native RAID 10 behaves similar to RAID1 as described above. When the
> array can be started, it does so, in auto-read-mode. When the next disk
> is added after that, recovery starts, and the array switches to
> "active", and further disks can't be added the "simple way" any more.
> There's one important difference: in the RAID 10 case, the recovery
> doesn't finish immediately. Rather, md does a full recovery of the added
> disk although it was clean. This is wrong; I have come up with a patch
> for this which I will send in a follow-up email.
> 

I agree with your assessment here and with your patch, thanks.

NeilBrown




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Strange / inconsistent behavior with mdadm -I -R
  2013-03-17 23:35 ` Strange / inconsistent behavior with mdadm -I -R NeilBrown
@ 2013-03-19 18:29   ` Martin Wilck
  2013-03-21  1:22     ` NeilBrown
  2013-03-19 18:35   ` [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled mwilck
  1 sibling, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2013-03-19 18:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 03/18/2013 12:35 AM, NeilBrown wrote:

> With -R, it only stays inactive until we have the minimum devices needed for
> a functioning array.  Then it will switch to 'read-auto' or, if the array is
> known and all expected devices are present, it will switch to 'active'.

That's the point - I see the array switch to active *before* all
expected devices are present. That causes further additions to fail.

> Maybe if you could provide a sequence of "mdadm" commands that produces an
> outcome different to what you would expect - that would reduce the chance
> that I get confused.

Here is the sequence of mdadm commands to illustrate what I mean.
I have a RAID10 array and I add the devices using mdadm -I -R in the
sequence 1,3,2,4. After adding device 3, the array will be started in
auto-read-only mode, which is fine.

But then as soon as the next disk (/dev/tosh/rd2) is added, the array
switches to "active", although it has not been written to, nor have all
disks been added yet. Consequently, adding disk 4 fails.

I expected the array to remain "auto-read-only" until either all 4
devices are present, or it is opened for writing. This is how the
container case behaves (almost - it doesn't switch to active
automatically until it is written to).

# ./mdadm -C /dev/md0 -l 10 -n 4 /dev/tosh/rd[1-4] -pn2
mdadm: array /dev/md0 started.
(wait for initial build to finish)
# mdadm -S /dev/md0
mdadm: stopped /dev/md0
# ./mdadm -v -I /dev/tosh/rd1 -R
mdadm: /dev/tosh/rd1 attached to /dev/md/0, not enough to start (1).
# ./mdadm -v -I /dev/tosh/rd3 -R
mdadm: /dev/tosh/rd3 attached to /dev/md/0, which has been started.
# cat /proc/mdstat
Personalities : [raid1] [raid10]
md0 : active (auto-read-only) raid10 dm-6[2] dm-4[0]
      2094080 blocks super 1.2 512K chunks 2 near-copies [4/2] [U_U_]
# ./mdadm -v -I /dev/tosh/rd2 -R; cat /proc/mdstat
mdadm: /dev/tosh/rd2 attached to /dev/md/0 which is already active.
Personalities : [raid1] [raid10]
md0 : active raid10 dm-5[1] dm-6[2] dm-4[0]
      2094080 blocks super 1.2 512K chunks 2 near-copies [4/2] [U_U_]
      [>....................]  recovery =  0.0% (0/1047040) finish=1090.6min speed=0K/sec
(wait for recovery to finish)
# ./mdadm -v -I /dev/tosh/rd4 -R
mdadm: can only add /dev/tosh/rd4 to /dev/md/0 as a spare, and force-spare is not set.
mdadm: failed to add /dev/tosh/rd4 to existing array /dev/md/0: Invalid argument.

Thanks,
Martin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled
  2013-03-17 23:35 ` Strange / inconsistent behavior with mdadm -I -R NeilBrown
  2013-03-19 18:29   ` Martin Wilck
@ 2013-03-19 18:35   ` mwilck
  2013-03-19 23:46     ` NeilBrown
  1 sibling, 1 reply; 11+ messages in thread
From: mwilck @ 2013-03-19 18:35 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: Martin Wilck

When an array is assembled incrementally with mdadm -I -R
and the array switches to "active" mode, md starts a recovery.

If the array was clean, the "fullsync" flag will be 0. Skip
the full recovery in this case, as RAID1 does (the code was
actually copied from the sync_request() method of RAID1).

Signed-off-by: Martin Wilck <mwilck@arcor.de>

---
 drivers/md/raid10.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 64d4824..e373d88 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2863,6 +2863,22 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 		if (init_resync(conf))
 			return 0;
 
+	/*
+	 * Allow skipping a full rebuild for incremental assembly
+	 * of a clean array, like RAID1 does.
+	 */
+	if (mddev->bitmap == NULL &&
+	    mddev->recovery_cp == MaxSector &&
+	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
+	    conf->fullsync == 0) {
+		*skipped = 1;
+		max_sector = mddev->dev_sectors;
+		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ||
+		    test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
+			max_sector = mddev->resync_max_sectors;
+		return max_sector - sector_nr;
+	}
+
  skipped:
 	max_sector = mddev->dev_sectors;
 	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ||
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled
  2013-03-19 18:35   ` [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled mwilck
@ 2013-03-19 23:46     ` NeilBrown
  0 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2013-03-19 23:46 UTC (permalink / raw)
  To: mwilck; +Cc: linux-raid


On Tue, 19 Mar 2013 19:35:33 +0100 mwilck@arcor.de wrote:

> When an array is assembled incrementally with mdadm -I -R
> and the array switches to "active" mode, md starts a recovery.
> 
> If the array was clean, the "fullsync" flag will be 0. Skip
> the full recovery in this case, as RAID1 does (the code was
> actually copied from the sync_request() method of RAID1).
> 
> Signed-off-by: Martin Wilck <mwilck@arcor.de>
> 
> ---
>  drivers/md/raid10.c |   16 ++++++++++++++++
>  1 files changed, 16 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 64d4824..e373d88 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2863,6 +2863,22 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
>  		if (init_resync(conf))
>  			return 0;
>  
> +	/*
> +	 * Allow skipping a full rebuild for incremental assembly
> +	 * of a clean array, like RAID1 does.
> +	 */
> +	if (mddev->bitmap == NULL &&
> +	    mddev->recovery_cp == MaxSector &&
> +	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
> +	    conf->fullsync == 0) {
> +		*skipped = 1;
> +		max_sector = mddev->dev_sectors;
> +		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ||
> +		    test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
> +			max_sector = mddev->resync_max_sectors;
> +		return max_sector - sector_nr;
> +	}
> +
>   skipped:
>  	max_sector = mddev->dev_sectors;
>  	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ||

applied, thanks.

NeilBrown


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Strange / inconsistent behavior with mdadm -I -R
  2013-03-19 18:29   ` Martin Wilck
@ 2013-03-21  1:22     ` NeilBrown
  2013-03-27  5:53       ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2013-03-21  1:22 UTC (permalink / raw)
  To: Martin Wilck; +Cc: linux-raid


On Tue, 19 Mar 2013 19:29:44 +0100 Martin Wilck <mwilck@arcor.de> wrote:

> On 03/18/2013 12:35 AM, NeilBrown wrote:
> 
> > With -R, it only stays inactive until we have the minimum devices needed for
> > a functioning array.  Then it will switch to 'read-auto' or, if the array is
> > known and all expected devices are present, it will switch to 'active'.
> 
> That's the point - I see the array switch to active *before* all
> expected devices are present. That causes further additions to fail.
> 
> > Maybe if you could provide a sequence of "mdadm" commands that produces an
> > outcome different to what you would expect - that would reduce the chance
> > that I get confused.
> 
> Here is the sequence of mdadm commands to illustrate what I mean.
> I have a RAID10 array and I add the devices using mdadm -I -R in the
> sequence 1,3,2,4. After adding device 3, the array will be started in
> auto-read-only mode, which is fine.
> 
> But then as soon as the next disk (/dev/tosh/rd2) is added, the array
> switches to "active" although it is neither written to, nor all disks
> have been added yet. Consequently, adding disk 4 fails.
> 
> I expected the array to remain "auto-read-only" until either all 4
> devices are present, or it is opened for writing. This is how the
> container case is behaving (almost - it doesn't switch to active
> automatically until it's written to).
> 
> # ./mdadm -C /dev/md0 -l 10 -n 4 /dev/tosh/rd[1-4] -pn2
> mdadm: array /dev/md0 started.
> (wait for initial build to finish)
> # mdadm -S /dev/md0
> mdadm: stopped /dev/md0
> # ./mdadm -v -I /dev/tosh/rd1 -R
> mdadm: /dev/tosh/rd1 attached to /dev/md/0, not enough to start (1).
> # ./mdadm -v -I /dev/tosh/rd3 -R
> mdadm: /dev/tosh/rd3 attached to /dev/md/0, which has been started.
> # cat /proc/mdstat
> Personalities : [raid1] [raid10]
> md0 : active (auto-read-only) raid10 dm-6[2] dm-4[0]
>       2094080 blocks super 1.2 512K chunks 2 near-copies [4/2] [U_U_]
> # ./mdadm -v -I /dev/tosh/rd2 -R; cat /proc/mdstat
> mdadm: /dev/tosh/rd2 attached to /dev/md/0 which is already active.
> Personalities : [raid1] [raid10]
> md0 : active raid10 dm-5[1] dm-6[2] dm-4[0]
>       2094080 blocks super 1.2 512K chunks 2 near-copies [4/2] [U_U_]
>       [>....................]  recovery =  0.0% (0/1047040)
> finish=1090.6min speed=0K/sec
> (wait for recovery to finish)
> # ./mdadm -v -I /dev/tosh/rd4 -R
> mdadm: can only add /dev/tosh/rd4 to /dev/md/0 as a spare, and
> force-spare is not set.
> mdadm: failed to add /dev/tosh/rd4 to existing array /dev/md/0: Invalid
> argument.
> 
> Thanks,
> Martin

Thanks, that makes it all very clear.

The problem is that the ADD_NEW_DISK ioctl automatically converts the array
from read-auto to active.
There are two approaches I could take to addressing this:
1/ change ADD_NEW_DISK to not cause that conversion.  I think that would need
   to be conditional, as sometimes it really should be changed.
2/ change mdadm to not use ADD_NEW_DISK but instead add the disk by setting it
   up via sysfs, as in the sketch below.
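
To give an idea of what 2/ would look like, mdadm would do something roughly
like this instead of the ioctl (only a sketch - the attribute names are the
ones documented in Documentation/md.txt; the device number and the dev-XXX
directory name are made up for illustration):

  # tell md about the device without going through ADD_NEW_DISK
  echo 253:7 > /sys/block/md0/md/new_dev
  # then ask md to put it back into its old slot
  echo 3 > /sys/block/md0/md/dev-dm-7/slot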

I'm not sure which is best, and neither is completely straightforward.
So for now:  I'll get back to you.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Strange / inconsistent behavior with mdadm -I -R
  2013-03-21  1:22     ` NeilBrown
@ 2013-03-27  5:53       ` NeilBrown
  2013-03-27 20:29         ` Martin Wilck
  2013-04-16 19:09         ` Martin Wilck
  0 siblings, 2 replies; 11+ messages in thread
From: NeilBrown @ 2013-03-27  5:53 UTC (permalink / raw)
  To: Martin Wilck; +Cc: linux-raid


On Thu, 21 Mar 2013 12:22:05 +1100 NeilBrown <neilb@suse.de> wrote:

> On Tue, 19 Mar 2013 19:29:44 +0100 Martin Wilck <mwilck@arcor.de> wrote:
> 
> > On 03/18/2013 12:35 AM, NeilBrown wrote:
> > 
> > > With -R, it only stays inactive until we have the minimum devices needed for
> > > a functioning array.  Then it will switch to 'read-auto' or, if the array is
> > > known and all expected devices are present, it will switch to 'active'.
> > 
> > That's the point - I see the array switch to active *before* all
> > expected devices are present. That causes further additions to fail.
> > 
> > > Maybe if you could provide a sequence of "mdadm" commands that produces an
> > > outcome different to what you would expect - that would reduce the chance
> > > that I get confused.
> > 
> > Here is the sequence of mdadm commands to illustrate what I mean.
> > I have a RAID10 array and I add the devices using mdadm -I -R in the
> > sequence 1,3,2,4. After adding device 3, the array will be started in
> > auto-read-only mode, which is fine.
> > 
> > But then as soon as the next disk (/dev/tosh/rd2) is added, the array
> > switches to "active" although it is neither written to, nor all disks
> > have been added yet. Consequently, adding disk 4 fails.
> > 
> > I expected the array to remain "auto-read-only" until either all 4
> > devices are present, or it is opened for writing. This is how the
> > container case is behaving (almost - it doesn't switch to active
> > automatically until it's written to).
> > 
> > # ./mdadm -C /dev/md0 -l 10 -n 4 /dev/tosh/rd[1-4] -pn2
> > mdadm: array /dev/md0 started.
> > (wait for initial build to finish)
> > # mdadm -S /dev/md0
> > mdadm: stopped /dev/md0
> > # ./mdadm -v -I /dev/tosh/rd1 -R
> > mdadm: /dev/tosh/rd1 attached to /dev/md/0, not enough to start (1).
> > # ./mdadm -v -I /dev/tosh/rd3 -R
> > mdadm: /dev/tosh/rd3 attached to /dev/md/0, which has been started.
> > # cat /proc/mdstat
> > Personalities : [raid1] [raid10]
> > md0 : active (auto-read-only) raid10 dm-6[2] dm-4[0]
> >       2094080 blocks super 1.2 512K chunks 2 near-copies [4/2] [U_U_]
> > # ./mdadm -v -I /dev/tosh/rd2 -R; cat /proc/mdstat
> > mdadm: /dev/tosh/rd2 attached to /dev/md/0 which is already active.
> > Personalities : [raid1] [raid10]
> > md0 : active raid10 dm-5[1] dm-6[2] dm-4[0]
> >       2094080 blocks super 1.2 512K chunks 2 near-copies [4/2] [U_U_]
> >       [>....................]  recovery =  0.0% (0/1047040)
> > finish=1090.6min speed=0K/sec
> > (wait for recovery to finish)
> > # ./mdadm -v -I /dev/tosh/rd4 -R
> > mdadm: can only add /dev/tosh/rd4 to /dev/md/0 as a spare, and
> > force-spare is not set.
> > mdadm: failed to add /dev/tosh/rd4 to existing array /dev/md/0: Invalid
> > argument.
> > 
> > Thanks,
> > Martin
> 
> Thanks, that makes is all very clear.
> 
> The problem is that the ADD_NEW_DISK ioctl automatically converts from
> read-auto to active.
> There are two approaches I could take to addressing this.
> 1/ change ADD_NEW_DISK to not cause that conversion.  I think that would need
>    to be conditional as some times it really should be changed.
> 2/ change mdadm to not use ADD_NEW_DISK but instead add the disk my setting it
>    up via sysfs.
> 
> I'm not sure which is best and neither is completely straight forward.
> So for now:  I'll get back to you.

I spent a while experimenting and now have a patch which I am happy with
and which I think fixes your issue.  It is included below.

If you could test and confirm, I'd appreciate it.

I'll try to get to your DDF patches soon, but it might not be until next week.

NeilBrown


From ca16ef8db7f0392066735262d848825c00f67d77 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 27 Mar 2013 16:32:31 +1100
Subject: [PATCH] md: Allow devices to be re-added to a read-only array.

When assembling an array incrementally we might want to make
the array available when "enough" devices are present, but maybe
not "all" devices are present.
If the remaining devices appear before the array is actually used,
they should be added transparently.

We do this by using the "read-auto" mode where the array acts like
it is read-only until a write request arrives.

Currently, an add-device request switches a read-auto array to active.
This means that only one device can be added after the array is first
made read-auto.  This isn't a problem for RAID5, but is not ideal for
RAID6 or RAID10.
Also we don't really want to switch the array out of read-auto at all
when re-adding a device, as this doesn't really imply any change.

So:
 - remove the "md_update_sb()" call from add_new_disk().  This isn't
   really needed as just adding a disk doesn't require a metadata
   update.  Instead, just set MD_CHANGE_DEVS.  This will effect a
   metadata update soon enough, once the array is not read-only.

 - Allow the ADD_NEW_DISK ioctl to succeed without activating a
   read-auto array, providing the MD_DISK_SYNC flag is set.
   In this case, the device will be rejected if it cannot be added
   with the correct device number, or has an incorrect event count.

 - Teach remove_and_add_spares() to be careful about adding spares
   when the array is read-only (or read-mostly) - only add devices
   that are thought to be in-sync, and only do it if the array is
   in-sync itself.

 - In md_check_recovery, use remove_and_add_spares in the read-only
   case, rather than open coding just the 'remove' part of it.

Reported-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index aeceedf..a31cc0d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5810,7 +5810,7 @@ static int add_new_disk(struct mddev * mddev, mdu_disk_info_t *info)
 		else
 			sysfs_notify_dirent_safe(rdev->sysfs_state);
 
-		md_update_sb(mddev, 1);
+		set_bit(MD_CHANGE_DEVS, &mddev->flags);
 		if (mddev->degraded)
 			set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
@@ -6490,6 +6490,24 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
 		err = md_set_readonly(mddev, bdev);
 		goto done_unlock;
 
+	case ADD_NEW_DISK:
+		/* We can support ADD_NEW_DISK on read-only arrays
+		 * only if we are re-adding a preexisting device.
+		 * So require mddev->pers and MD_DISK_SYNC.
+		 */
+		if (mddev->pers) {
+			mdu_disk_info_t info;
+			if (copy_from_user(&info, argp, sizeof(info)))
+				err = -EFAULT;
+			else if (!(info.state & (1<<MD_DISK_SYNC)))
+				/* Need to clear read-only for this */
+				break;
+			else
+				err = add_new_disk(mddev, &info);
+			goto done_unlock;
+		}
+		break;
+
 	case BLKROSET:
 		if (get_user(ro, (int __user *)(arg))) {
 			err = -EFAULT;
@@ -7671,17 +7689,36 @@ static int remove_and_add_spares(struct mddev *mddev)
 		    !test_bit(In_sync, &rdev->flags) &&
 		    !test_bit(Faulty, &rdev->flags))
 			spares++;
-		if (rdev->raid_disk < 0
-		    && !test_bit(Faulty, &rdev->flags)) {
-			rdev->recovery_offset = 0;
-			if (mddev->pers->
-			    hot_add_disk(mddev, rdev) == 0) {
-				if (sysfs_link_rdev(mddev, rdev))
-					/* failure here is OK */;
-				spares++;
-				md_new_event(mddev);
-				set_bit(MD_CHANGE_DEVS, &mddev->flags);
-			}
+		if (rdev->raid_disk >= 0)
+			continue;
+		if (test_bit(Faulty, &rdev->flags))
+			continue;
+		if (mddev->ro &&
+		    rdev->saved_raid_disk < 0)
+			continue;
+
+		rdev->recovery_offset = 0;
+		if (rdev->saved_raid_disk >= 0 && mddev->in_sync) {
+			spin_lock_irq(&mddev->write_lock);
+			if (mddev->in_sync)
+				/* OK, this device, which is in_sync,
+				 * will definitely be noticed before
+				 * the next write, so recovery isn't
+				 * needed.
+				 */
+				rdev->recovery_offset = mddev->recovery_cp;
+			spin_unlock_irq(&mddev->write_lock);
+		}
+		if (mddev->ro && rdev->recovery_offset != MaxSector)
+			/* not safe to add this disk now */
+			continue;
+		if (mddev->pers->
+		    hot_add_disk(mddev, rdev) == 0) {
+			if (sysfs_link_rdev(mddev, rdev))
+				/* failure here is OK */;
+			spares++;
+			md_new_event(mddev);
+			set_bit(MD_CHANGE_DEVS, &mddev->flags);
 		}
 	}
 	if (removed)
@@ -7789,22 +7826,16 @@ void md_check_recovery(struct mddev *mddev)
 		int spares = 0;
 
 		if (mddev->ro) {
-			/* Only thing we do on a ro array is remove
-			 * failed devices.
+			/* On a read-only array we can:
+			 * - remove failed devices
+			 * - add already-in_sync devices if the array itself
+			 *   is in-sync.
+			 * As we only add devices that are already in-sync,
+			 * we can activate the spares immediately.
 			 */
-			struct md_rdev *rdev;
-			rdev_for_each(rdev, mddev)
-				if (rdev->raid_disk >= 0 &&
-				    !test_bit(Blocked, &rdev->flags) &&
-				    test_bit(Faulty, &rdev->flags) &&
-				    atomic_read(&rdev->nr_pending)==0) {
-					if (mddev->pers->hot_remove_disk(
-						    mddev, rdev) == 0) {
-						sysfs_unlink_rdev(mddev, rdev);
-						rdev->raid_disk = -1;
-					}
-				}
 			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+			remove_and_add_spares(mddev);
+			mddev->pers->spare_active(mddev);
 			goto unlock;
 		}
 


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Strange / inconsistent behavior with mdadm -I -R
  2013-03-27  5:53       ` NeilBrown
@ 2013-03-27 20:29         ` Martin Wilck
  2013-04-16 19:09         ` Martin Wilck
  1 sibling, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2013-03-27 20:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 03/27/2013 06:53 AM, NeilBrown wrote:

> I spend a while experimenting and such and I now have a patch which I am
> happy with and I think fixes your issue.  It is included below.
> 
> If you could test and confirm, I'd appreciate it.

It fixes my test case nicely, thanks. The behavior is now exactly the
same as in the container case, for both RAID10 and RAID1.

I'm not quite sure what else to test.

> I'll try to get to your DDF patches soon, but it might not be until next week.

Great, thanks, and happy Easter,
Martin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Strange / inconsistent behavior with mdadm -I -R
  2013-03-27  5:53       ` NeilBrown
  2013-03-27 20:29         ` Martin Wilck
@ 2013-04-16 19:09         ` Martin Wilck
  1 sibling, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2013-04-16 19:09 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hi Neil,

On 03/27/2013 06:53 AM, NeilBrown wrote:

> I'll try to get to your DDF patches soon, but it might not be until next week.

Had any time for this yet?

Regards
Martin

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-04-16 19:09 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-14 20:03 Strange / inconsistent behavior with mdadm -I -R Martin Wilck
2013-03-14 20:11 ` [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled mwilck
2013-03-17 23:06   ` NeilBrown
2013-03-17 23:35 ` Strange / inconsistent behavior with mdadm -I -R NeilBrown
2013-03-19 18:29   ` Martin Wilck
2013-03-21  1:22     ` NeilBrown
2013-03-27  5:53       ` NeilBrown
2013-03-27 20:29         ` Martin Wilck
2013-04-16 19:09         ` Martin Wilck
2013-03-19 18:35   ` [PATCH] RAID10: Allow skipping recovery when clean arrays are assembled mwilck
2013-03-19 23:46     ` NeilBrown
