All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] fixes for manually-added spares in raid5 (2nd)
@ 2011-01-14 12:59 Adam Kwolek
  2011-01-14 13:00 ` [PATCH 1/2] md/raid5: FIX: manually-added spare is not used Adam Kwolek
  2011-01-14 13:00 ` [PATCH 2/2] md/raid5: FIX: reshape on degraded devices has wrong configuration Adam Kwolek
  0 siblings, 2 replies; 14+ messages in thread
From: Adam Kwolek @ 2011-01-14 12:59 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

NOTE: this series is resent due to lack of check-patch in previous series.

After patch "md/raid6: handle manually-added spares in start_reshape." spares with set slot are almost used by md for raid5 reshape.
It works this way: before and after reshape process disks number in array is the same, new disks are not used during reshape process.
This is addressed by first patch.

Second problem I've found is for degraded arrays (i.e. takeovered raid0). Scenario is as follow:
1. Create raid0 array using 2 disks
2. add 2 spare disks to container
3. expand capacity to 4 disks in array

In result we will get raid0 build on 3 disks with one spare disk. this is due to fact that disk degradataion is not used for comparing disks slot number (disk used for
degraded slot is considered as present in array) This is addressed by second patch.

BR
Adam


---

Adam Kwolek (2):
      md/raid5: FIX: reshape on degraded devices has wrong configuration
      md/raid5: FIX: manually-added spare is not used


 drivers/md/raid5.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] md/raid5: FIX: manually-added spare is not used
  2011-01-14 12:59 [PATCH 0/2] fixes for manually-added spares in raid5 (2nd) Adam Kwolek
@ 2011-01-14 13:00 ` Adam Kwolek
  2011-01-16 23:11   ` NeilBrown
  2011-01-14 13:00 ` [PATCH 2/2] md/raid5: FIX: reshape on degraded devices has wrong configuration Adam Kwolek
  1 sibling, 1 reply; 14+ messages in thread
From: Adam Kwolek @ 2011-01-14 13:00 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

Manually added spares are not used due to fact that they not added to md configuration.
Counters are updated only.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 drivers/md/raid5.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a2087c7..59c4150 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5592,8 +5592,10 @@ static int raid5_start_reshape(mddev_t *mddev)
 		} else if (rdev->raid_disk >= conf->previous_raid_disks
 			   && !test_bit(Faulty, &rdev->flags)) {
 			/* This is a spare that was manually added */
-			set_bit(In_sync, &rdev->flags);
-			added_devices++;
+			if (raid5_add_disk(mddev, rdev) == 0) {
+				set_bit(In_sync, &rdev->flags);
+				added_devices++;
+			}
 		}
 
 	/* When a reshape changes the number of devices, ->degraded


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] md/raid5: FIX: reshape on degraded devices has wrong configuration
  2011-01-14 12:59 [PATCH 0/2] fixes for manually-added spares in raid5 (2nd) Adam Kwolek
  2011-01-14 13:00 ` [PATCH 1/2] md/raid5: FIX: manually-added spare is not used Adam Kwolek
@ 2011-01-14 13:00 ` Adam Kwolek
  2011-01-16 23:30   ` NeilBrown
  1 sibling, 1 reply; 14+ messages in thread
From: Adam Kwolek @ 2011-01-14 13:00 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

When we performing expansion on degraded array (takeovered raid0),
and we are adding i.e. 2 disks, one disk is added only.

this is due to fact that parity (missing) disk is moved to the end
and added disk index is smaller than starting disk number in array.

To resolve this situation array degradation should be used for adding
disk(s) on reshape start.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 drivers/md/raid5.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 59c4150..3b9da76 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5528,7 +5528,8 @@ static int raid5_start_reshape(mddev_t *mddev)
 		return -ENOSPC;
 
 	list_for_each_entry(rdev, &mddev->disks, same_set)
-		if ((rdev->raid_disk < 0 || rdev->raid_disk >= conf->raid_disks)
+		if ((rdev->raid_disk < 0 ||
+		     rdev->raid_disk >= (conf->raid_disks - mddev->degraded))
 		     && !test_bit(Faulty, &rdev->flags))
 			spares++;
 
@@ -5589,7 +5590,8 @@ static int raid5_start_reshape(mddev_t *mddev)
 					/* Failure here is OK */;
 			} else
 				break;
-		} else if (rdev->raid_disk >= conf->previous_raid_disks
+		} else if (rdev->raid_disk >= (conf->previous_raid_disks
+						- mddev->degraded)
 			   && !test_bit(Faulty, &rdev->flags)) {
 			/* This is a spare that was manually added */
 			if (raid5_add_disk(mddev, rdev) == 0) {


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not used
  2011-01-14 13:00 ` [PATCH 1/2] md/raid5: FIX: manually-added spare is not used Adam Kwolek
@ 2011-01-16 23:11   ` NeilBrown
  2011-01-16 23:28     ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2011-01-16 23:11 UTC (permalink / raw)
  To: Adam Kwolek
  Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

On Fri, 14 Jan 2011 14:00:00 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:

> Manually added spares are not used due to fact that they not added to md configuration.
> Counters are updated only.
> 
> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> ---
> 
>  drivers/md/raid5.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index a2087c7..59c4150 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5592,8 +5592,10 @@ static int raid5_start_reshape(mddev_t *mddev)
>  		} else if (rdev->raid_disk >= conf->previous_raid_disks
>  			   && !test_bit(Faulty, &rdev->flags)) {
>  			/* This is a spare that was manually added */
> -			set_bit(In_sync, &rdev->flags);
> -			added_devices++;
> +			if (raid5_add_disk(mddev, rdev) == 0) {
> +				set_bit(In_sync, &rdev->flags);
> +				added_devices++;
> +			}
>  		}
>  
>  	/* When a reshape changes the number of devices, ->degraded

This should not be needed.
When a device is manually added, the desired slot number is written to  
   ..../md/dev-XXX/slot

This calls slot_store (in md.c) which call mddev->pers->hot_add_disk which
for raid5 is raid5_add_disk.
So you shouldn't need to call raid5_add_disk again.

NeilBrown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not used
  2011-01-16 23:11   ` NeilBrown
@ 2011-01-16 23:28     ` NeilBrown
  2011-01-17  0:44       ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2011-01-16 23:28 UTC (permalink / raw)
  To: NeilBrown
  Cc: Adam Kwolek, linux-raid, dan.j.williams, ed.ciechanowski,
	wojciech.neubauer

On Mon, 17 Jan 2011 10:11:28 +1100 NeilBrown <neilb@suse.de> wrote:

> On Fri, 14 Jan 2011 14:00:00 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:
> 
> > Manually added spares are not used due to fact that they not added to md configuration.
> > Counters are updated only.
> > 
> > Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> > ---
> > 
> >  drivers/md/raid5.c |    6 ++++--
> >  1 files changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index a2087c7..59c4150 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -5592,8 +5592,10 @@ static int raid5_start_reshape(mddev_t *mddev)
> >  		} else if (rdev->raid_disk >= conf->previous_raid_disks
> >  			   && !test_bit(Faulty, &rdev->flags)) {
> >  			/* This is a spare that was manually added */
> > -			set_bit(In_sync, &rdev->flags);
> > -			added_devices++;
> > +			if (raid5_add_disk(mddev, rdev) == 0) {
> > +				set_bit(In_sync, &rdev->flags);
> > +				added_devices++;
> > +			}
> >  		}
> >  
> >  	/* When a reshape changes the number of devices, ->degraded
> 
> This should not be needed.
> When a device is manually added, the desired slot number is written to  
>    ..../md/dev-XXX/slot
> 
> This calls slot_store (in md.c) which call mddev->pers->hot_add_disk which
> for raid5 is raid5_add_disk.
> So you shouldn't need to call raid5_add_disk again.
> 

ahhh... I see.  raid5_add_disk doesn't do the right thing in that case.  It
actually indexes beyond the end of an array, which is bad.

We possibly do need the raid5_add_disk where you had put it.  I'll have a
think and see what is best.

Thanks,

NeilBrown


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] md/raid5: FIX: reshape on degraded devices has wrong configuration
  2011-01-14 13:00 ` [PATCH 2/2] md/raid5: FIX: reshape on degraded devices has wrong configuration Adam Kwolek
@ 2011-01-16 23:30   ` NeilBrown
  0 siblings, 0 replies; 14+ messages in thread
From: NeilBrown @ 2011-01-16 23:30 UTC (permalink / raw)
  To: Adam Kwolek
  Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

On Fri, 14 Jan 2011 14:00:08 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:

> When we performing expansion on degraded array (takeovered raid0),
> and we are adding i.e. 2 disks, one disk is added only.
> 
> this is due to fact that parity (missing) disk is moved to the end
> and added disk index is smaller than starting disk number in array.
> 
> To resolve this situation array degradation should be used for adding
> disk(s) on reshape start.
> 
> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> ---
> 
>  drivers/md/raid5.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 59c4150..3b9da76 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5528,7 +5528,8 @@ static int raid5_start_reshape(mddev_t *mddev)
>  		return -ENOSPC;
>  
>  	list_for_each_entry(rdev, &mddev->disks, same_set)
> -		if ((rdev->raid_disk < 0 || rdev->raid_disk >= conf->raid_disks)
> +		if ((rdev->raid_disk < 0 ||
> +		     rdev->raid_disk >= (conf->raid_disks - mddev->degraded))
>  		     && !test_bit(Faulty, &rdev->flags))
>  			spares++;
>  
> @@ -5589,7 +5590,8 @@ static int raid5_start_reshape(mddev_t *mddev)
>  					/* Failure here is OK */;
>  			} else
>  				break;
> -		} else if (rdev->raid_disk >= conf->previous_raid_disks
> +		} else if (rdev->raid_disk >= (conf->previous_raid_disks
> +						- mddev->degraded)
>  			   && !test_bit(Faulty, &rdev->flags)) {
>  			/* This is a spare that was manually added */
>  			if (raid5_add_disk(mddev, rdev) == 0) {

I see the problem you are trying to fix, but I think the fix is wrong.
The range check is correct, but the test for Faulty isn't quite right.
I think we need to check for In_sync and not Faulty, or something like that.
I work something out.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not used
  2011-01-16 23:28     ` NeilBrown
@ 2011-01-17  0:44       ` NeilBrown
  2011-01-17 14:13         ` Kwolek, Adam
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2011-01-17  0:44 UTC (permalink / raw)
  To: Adam Kwolek
  Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

On Mon, 17 Jan 2011 10:28:21 +1100 NeilBrown <neilb@suse.de> wrote:

> On Mon, 17 Jan 2011 10:11:28 +1100 NeilBrown <neilb@suse.de> wrote:
> 
> > On Fri, 14 Jan 2011 14:00:00 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:
> > 
> > > Manually added spares are not used due to fact that they not added to md configuration.
> > > Counters are updated only.
> > > 
> > > Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> > > ---
> > > 
> > >  drivers/md/raid5.c |    6 ++++--
> > >  1 files changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > index a2087c7..59c4150 100644
> > > --- a/drivers/md/raid5.c
> > > +++ b/drivers/md/raid5.c
> > > @@ -5592,8 +5592,10 @@ static int raid5_start_reshape(mddev_t *mddev)
> > >  		} else if (rdev->raid_disk >= conf->previous_raid_disks
> > >  			   && !test_bit(Faulty, &rdev->flags)) {
> > >  			/* This is a spare that was manually added */
> > > -			set_bit(In_sync, &rdev->flags);
> > > -			added_devices++;
> > > +			if (raid5_add_disk(mddev, rdev) == 0) {
> > > +				set_bit(In_sync, &rdev->flags);
> > > +				added_devices++;
> > > +			}
> > >  		}
> > >  
> > >  	/* When a reshape changes the number of devices, ->degraded
> > 
> > This should not be needed.
> > When a device is manually added, the desired slot number is written to  
> >    ..../md/dev-XXX/slot
> > 
> > This calls slot_store (in md.c) which call mddev->pers->hot_add_disk which
> > for raid5 is raid5_add_disk.
> > So you shouldn't need to call raid5_add_disk again.
> > 
> 
> ahhh... I see.  raid5_add_disk doesn't do the right thing in that case.  It
> actually indexes beyond the end of an array, which is bad.
> 
> We possibly do need the raid5_add_disk where you had put it.  I'll have a
> think and see what is best.

On third thoughts, I cannot see the problem you are seeing.
I even did some simple testing (manually writing to things in sysfs) and it
seems to include the new device properly.

There are some issues that I found which are address by the following patch,
but it isn't clear to me that any of them relate to what you are seeing.
Maybe if you could be more specific about what you see happening?

Thanks,
NeilBrown


diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3194a80..cd4cccd 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5585,6 +5585,8 @@ static int update_raid_disks(mddev_t *mddev, int raid_disks)
 	mddev->delta_disks = raid_disks - mddev->raid_disks;
 
 	rv = mddev->pers->check_reshape(mddev);
+	if (rv < 0)
+		mddev->delta_disks = 0;
 	return rv;
 }
 
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5044bab..2902454 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5527,8 +5527,8 @@ static int raid5_start_reshape(mddev_t *mddev)
 		return -ENOSPC;
 
 	list_for_each_entry(rdev, &mddev->disks, same_set)
-		if ((rdev->raid_disk < 0 || rdev->raid_disk >= conf->raid_disks)
-		     && !test_bit(Faulty, &rdev->flags))
+		if (!test_bit(In_sync, &rdev->flags)
+		    && !test_bit(Faulty, &rdev->flags))
 			spares++;
 
 	if (spares - mddev->degraded < mddev->delta_disks - conf->max_degraded)
@@ -5571,7 +5571,7 @@ static int raid5_start_reshape(mddev_t *mddev)
 	 * to correctly record the "partially reconstructed" state of
 	 * such devices during the reshape and confusion could result.
 	 */
-	if (mddev->delta_disks >= 0)
+	if (mddev->delta_disks >= 0) {
 	    list_for_each_entry(rdev, &mddev->disks, same_set)
 		if (rdev->raid_disk < 0 &&
 		    !test_bit(Faulty, &rdev->flags)) {
@@ -5586,8 +5586,7 @@ static int raid5_start_reshape(mddev_t *mddev)
 				if (sysfs_create_link(&mddev->kobj,
 						      &rdev->kobj, nm))
 					/* Failure here is OK */;
-			} else
-				break;
+			}
 		} else if (rdev->raid_disk >= conf->previous_raid_disks
 			   && !test_bit(Faulty, &rdev->flags)) {
 			/* This is a spare that was manually added */
@@ -5595,14 +5594,14 @@ static int raid5_start_reshape(mddev_t *mddev)
 			added_devices++;
 		}
 
-	/* When a reshape changes the number of devices, ->degraded
-	 * is measured against the larger of the pre and post number of
-	 * devices.*/
-	if (mddev->delta_disks > 0) {
-		spin_lock_irqsave(&conf->device_lock, flags);
-		mddev->degraded += (conf->raid_disks - conf->previous_raid_disks)
-			- added_devices;
-		spin_unlock_irqrestore(&conf->device_lock, flags);
+	    /* When a reshape changes the number of devices,
+	     * ->degraded is measured against the larger of the pre
+	     * and post number of devices.
+	     */
+	    spin_lock_irqsave(&conf->device_lock, flags);
+	    mddev->degraded += (conf->raid_disks - conf->previous_raid_disks)
+		    - added_devices;
+	    spin_unlock_irqrestore(&conf->device_lock, flags);
 	}
 	mddev->raid_disks = conf->raid_disks;
 	mddev->reshape_position = conf->reshape_progress;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* RE: [PATCH 1/2] md/raid5: FIX: manually-added spare is not used
  2011-01-17  0:44       ` NeilBrown
@ 2011-01-17 14:13         ` Kwolek, Adam
  2011-01-19 20:48           ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: Kwolek, Adam @ 2011-01-17 14:13 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed, Neubauer, Wojciech



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Monday, January 17, 2011 1:45 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not
> used
> 
> On Mon, 17 Jan 2011 10:28:21 +1100 NeilBrown <neilb@suse.de> wrote:
> 
> > On Mon, 17 Jan 2011 10:11:28 +1100 NeilBrown <neilb@suse.de> wrote:
> >
> > > On Fri, 14 Jan 2011 14:00:00 +0100 Adam Kwolek
> <adam.kwolek@intel.com> wrote:
> > >
> > > > Manually added spares are not used due to fact that they not
> added to md configuration.
> > > > Counters are updated only.
> > > >
> > > > Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> > > > ---
> > > >
> > > >  drivers/md/raid5.c |    6 ++++--
> > > >  1 files changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > > index a2087c7..59c4150 100644
> > > > --- a/drivers/md/raid5.c
> > > > +++ b/drivers/md/raid5.c
> > > > @@ -5592,8 +5592,10 @@ static int raid5_start_reshape(mddev_t
> *mddev)
> > > >  		} else if (rdev->raid_disk >= conf-
> >previous_raid_disks
> > > >  			   && !test_bit(Faulty, &rdev->flags)) {
> > > >  			/* This is a spare that was manually added */
> > > > -			set_bit(In_sync, &rdev->flags);
> > > > -			added_devices++;
> > > > +			if (raid5_add_disk(mddev, rdev) == 0) {
> > > > +				set_bit(In_sync, &rdev->flags);
> > > > +				added_devices++;
> > > > +			}
> > > >  		}
> > > >
> > > >  	/* When a reshape changes the number of devices, ->degraded
> > >
> > > This should not be needed.
> > > When a device is manually added, the desired slot number is written
> to
> > >    ..../md/dev-XXX/slot
> > >
> > > This calls slot_store (in md.c) which call mddev->pers-
> >hot_add_disk which
> > > for raid5 is raid5_add_disk.
> > > So you shouldn't need to call raid5_add_disk again.
> > >
> >
> > ahhh... I see.  raid5_add_disk doesn't do the right thing in that
> case.  It
> > actually indexes beyond the end of an array, which is bad.
> >
> > We possibly do need the raid5_add_disk where you had put it.  I'll
> have a
> > think and see what is best.
> 
> On third thoughts, I cannot see the problem you are seeing.
> I even did some simple testing (manually writing to things in sysfs)
> and it
> seems to include the new device properly.
> 
> There are some issues that I found which are address by the following
> patch,
> but it isn't clear to me that any of them relate to what you are
> seeing.
> Maybe if you could be more specific about what you see happening?
> 
> Thanks,
> NeilBrown


When I'm not using raid5_add_disk() in raid5_start_reshape() added disk LED light doesn't blinks
(but it should during reshape ;)),
md doesn't make any signs that something goes wrong (even size can be increased).

I've made some debug, and at second (during reshape start) raid5_add_disk() call rcu_assign_pointer() is called again.
This means that somehow previous assignment when slot is set was cleared.

Correct situation (all disks are used during reshape) I can archive when instead raid5_add_disk() call
I've add the following code:

     struct disk_info *p = conf->disks + rdev->raid_disk;
     rcu_assign_pointer(p->rdev, rdev);

and (conf->disks + rdev->raid_disk)->rdev pointer is present in configuration.
I've checked that if I do not do call to rcu_assign_pointer() pointer  (p->rdev) has NULL value.
In both cases call rcu_assign_pointer() sets p->rdev to the same value, so rdev doesn't change his location in memory.


BR
Adam








> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3194a80..cd4cccd 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5585,6 +5585,8 @@ static int update_raid_disks(mddev_t *mddev, int
> raid_disks)
>  	mddev->delta_disks = raid_disks - mddev->raid_disks;
> 
>  	rv = mddev->pers->check_reshape(mddev);
> +	if (rv < 0)
> +		mddev->delta_disks = 0;
>  	return rv;
>  }
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 5044bab..2902454 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5527,8 +5527,8 @@ static int raid5_start_reshape(mddev_t *mddev)
>  		return -ENOSPC;
> 
>  	list_for_each_entry(rdev, &mddev->disks, same_set)
> -		if ((rdev->raid_disk < 0 || rdev->raid_disk >= conf-
> >raid_disks)
> -		     && !test_bit(Faulty, &rdev->flags))
> +		if (!test_bit(In_sync, &rdev->flags)
> +		    && !test_bit(Faulty, &rdev->flags))
>  			spares++;
> 
>  	if (spares - mddev->degraded < mddev->delta_disks - conf-
> >max_degraded)
> @@ -5571,7 +5571,7 @@ static int raid5_start_reshape(mddev_t *mddev)
>  	 * to correctly record the "partially reconstructed" state of
>  	 * such devices during the reshape and confusion could result.
>  	 */
> -	if (mddev->delta_disks >= 0)
> +	if (mddev->delta_disks >= 0) {
>  	    list_for_each_entry(rdev, &mddev->disks, same_set)
>  		if (rdev->raid_disk < 0 &&
>  		    !test_bit(Faulty, &rdev->flags)) {
> @@ -5586,8 +5586,7 @@ static int raid5_start_reshape(mddev_t *mddev)
>  				if (sysfs_create_link(&mddev->kobj,
>  						      &rdev->kobj, nm))
>  					/* Failure here is OK */;
> -			} else
> -				break;
> +			}
>  		} else if (rdev->raid_disk >= conf->previous_raid_disks
>  			   && !test_bit(Faulty, &rdev->flags)) {
>  			/* This is a spare that was manually added */
> @@ -5595,14 +5594,14 @@ static int raid5_start_reshape(mddev_t *mddev)
>  			added_devices++;
>  		}
> 
> -	/* When a reshape changes the number of devices, ->degraded
> -	 * is measured against the larger of the pre and post number of
> -	 * devices.*/
> -	if (mddev->delta_disks > 0) {
> -		spin_lock_irqsave(&conf->device_lock, flags);
> -		mddev->degraded += (conf->raid_disks - conf-
> >previous_raid_disks)
> -			- added_devices;
> -		spin_unlock_irqrestore(&conf->device_lock, flags);
> +	    /* When a reshape changes the number of devices,
> +	     * ->degraded is measured against the larger of the pre
> +	     * and post number of devices.
> +	     */
> +	    spin_lock_irqsave(&conf->device_lock, flags);
> +	    mddev->degraded += (conf->raid_disks - conf-
> >previous_raid_disks)
> +		    - added_devices;
> +	    spin_unlock_irqrestore(&conf->device_lock, flags);
>  	}
>  	mddev->raid_disks = conf->raid_disks;
>  	mddev->reshape_position = conf->reshape_progress;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not used
  2011-01-17 14:13         ` Kwolek, Adam
@ 2011-01-19 20:48           ` NeilBrown
  2011-01-20  8:29             ` Kwolek, Adam
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2011-01-19 20:48 UTC (permalink / raw)
  To: Kwolek, Adam
  Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed, Neubauer, Wojciech

On Mon, 17 Jan 2011 14:13:34 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> 
> 
> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@suse.de]
> > Sent: Monday, January 17, 2011 1:45 AM
> > To: Kwolek, Adam
> > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > Neubauer, Wojciech
> > Subject: Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not
> > used
> > 
> > On Mon, 17 Jan 2011 10:28:21 +1100 NeilBrown <neilb@suse.de> wrote:
> > 
> > > On Mon, 17 Jan 2011 10:11:28 +1100 NeilBrown <neilb@suse.de> wrote:
> > >
> > > > On Fri, 14 Jan 2011 14:00:00 +0100 Adam Kwolek
> > <adam.kwolek@intel.com> wrote:
> > > >
> > > > > Manually added spares are not used due to fact that they not
> > added to md configuration.
> > > > > Counters are updated only.
> > > > >
> > > > > Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> > > > > ---
> > > > >
> > > > >  drivers/md/raid5.c |    6 ++++--
> > > > >  1 files changed, 4 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > > > index a2087c7..59c4150 100644
> > > > > --- a/drivers/md/raid5.c
> > > > > +++ b/drivers/md/raid5.c
> > > > > @@ -5592,8 +5592,10 @@ static int raid5_start_reshape(mddev_t
> > *mddev)
> > > > >  		} else if (rdev->raid_disk >= conf-
> > >previous_raid_disks
> > > > >  			   && !test_bit(Faulty, &rdev->flags)) {
> > > > >  			/* This is a spare that was manually added */
> > > > > -			set_bit(In_sync, &rdev->flags);
> > > > > -			added_devices++;
> > > > > +			if (raid5_add_disk(mddev, rdev) == 0) {
> > > > > +				set_bit(In_sync, &rdev->flags);
> > > > > +				added_devices++;
> > > > > +			}
> > > > >  		}
> > > > >
> > > > >  	/* When a reshape changes the number of devices, ->degraded
> > > >
> > > > This should not be needed.
> > > > When a device is manually added, the desired slot number is written
> > to
> > > >    ..../md/dev-XXX/slot
> > > >
> > > > This calls slot_store (in md.c) which call mddev->pers-
> > >hot_add_disk which
> > > > for raid5 is raid5_add_disk.
> > > > So you shouldn't need to call raid5_add_disk again.
> > > >
> > >
> > > ahhh... I see.  raid5_add_disk doesn't do the right thing in that
> > case.  It
> > > actually indexes beyond the end of an array, which is bad.
> > >
> > > We possibly do need the raid5_add_disk where you had put it.  I'll
> > have a
> > > think and see what is best.
> > 
> > On third thoughts, I cannot see the problem you are seeing.
> > I even did some simple testing (manually writing to things in sysfs)
> > and it
> > seems to include the new device properly.
> > 
> > There are some issues that I found which are address by the following
> > patch,
> > but it isn't clear to me that any of them relate to what you are
> > seeing.
> > Maybe if you could be more specific about what you see happening?
> > 
> > Thanks,
> > NeilBrown
> 
> 
> When I'm not using raid5_add_disk() in raid5_start_reshape() added disk LED light doesn't blinks
> (but it should during reshape ;)),
> md doesn't make any signs that something goes wrong (even size can be increased).
> 
> I've made some debug, and at second (during reshape start) raid5_add_disk() call rcu_assign_pointer() is called again.
> This means that somehow previous assignment when slot is set was cleared.
> 
> Correct situation (all disks are used during reshape) I can archive when instead raid5_add_disk() call
> I've add the following code:
> 
>      struct disk_info *p = conf->disks + rdev->raid_disk;
>      rcu_assign_pointer(p->rdev, rdev);
> 
> and (conf->disks + rdev->raid_disk)->rdev pointer is present in configuration.
> I've checked that if I do not do call to rcu_assign_pointer() pointer  (p->rdev) has NULL value.
> In both cases call rcu_assign_pointer() sets p->rdev to the same value, so rdev doesn't change his location in memory.
> 
> 
> BR
> Adam
> 

Could you put some debug printks in slot_store (in md.c) and make sure it is
being called, and that it calls raid5_add_disk, and see what raid5_add_disk
does in that case?
Thanks,

NeilBrown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 1/2] md/raid5: FIX: manually-added spare is not used
  2011-01-19 20:48           ` NeilBrown
@ 2011-01-20  8:29             ` Kwolek, Adam
  2011-01-20  9:30               ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: Kwolek, Adam @ 2011-01-20  8:29 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed, Neubauer, Wojciech



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Wednesday, January 19, 2011 9:49 PM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not
> used
> 
> On Mon, 17 Jan 2011 14:13:34 +0000 "Kwolek, Adam"
> <adam.kwolek@intel.com>
> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: NeilBrown [mailto:neilb@suse.de]
> > > Sent: Monday, January 17, 2011 1:45 AM
> > > To: Kwolek, Adam
> > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > > Neubauer, Wojciech
> > > Subject: Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not
> > > used
> > >
> > > On Mon, 17 Jan 2011 10:28:21 +1100 NeilBrown <neilb@suse.de> wrote:
> > >
> > > > On Mon, 17 Jan 2011 10:11:28 +1100 NeilBrown <neilb@suse.de>
> wrote:
> > > >
> > > > > On Fri, 14 Jan 2011 14:00:00 +0100 Adam Kwolek
> > > <adam.kwolek@intel.com> wrote:
> > > > >
> > > > > > Manually added spares are not used due to fact that they not
> > > added to md configuration.
> > > > > > Counters are updated only.
> > > > > >
> > > > > > Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> > > > > > ---
> > > > > >
> > > > > >  drivers/md/raid5.c |    6 ++++--
> > > > > >  1 files changed, 4 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > > > > index a2087c7..59c4150 100644
> > > > > > --- a/drivers/md/raid5.c
> > > > > > +++ b/drivers/md/raid5.c
> > > > > > @@ -5592,8 +5592,10 @@ static int raid5_start_reshape(mddev_t
> > > *mddev)
> > > > > >  		} else if (rdev->raid_disk >= conf-
> > > >previous_raid_disks
> > > > > >  			   && !test_bit(Faulty, &rdev->flags)) {
> > > > > >  			/* This is a spare that was manually added */
> > > > > > -			set_bit(In_sync, &rdev->flags);
> > > > > > -			added_devices++;
> > > > > > +			if (raid5_add_disk(mddev, rdev) == 0) {
> > > > > > +				set_bit(In_sync, &rdev->flags);
> > > > > > +				added_devices++;
> > > > > > +			}
> > > > > >  		}
> > > > > >
> > > > > >  	/* When a reshape changes the number of devices, ->degraded
> > > > >
> > > > > This should not be needed.
> > > > > When a device is manually added, the desired slot number is
> written
> > > to
> > > > >    ..../md/dev-XXX/slot
> > > > >
> > > > > This calls slot_store (in md.c) which call mddev->pers-
> > > >hot_add_disk which
> > > > > for raid5 is raid5_add_disk.
> > > > > So you shouldn't need to call raid5_add_disk again.
> > > > >
> > > >
> > > > ahhh... I see.  raid5_add_disk doesn't do the right thing in that
> > > case.  It
> > > > actually indexes beyond the end of an array, which is bad.
> > > >
> > > > We possibly do need the raid5_add_disk where you had put it.
> I'll
> > > have a
> > > > think and see what is best.
> > >
> > > On third thoughts, I cannot see the problem you are seeing.
> > > I even did some simple testing (manually writing to things in
> sysfs)
> > > and it
> > > seems to include the new device properly.
> > >
> > > There are some issues that I found which are address by the
> following
> > > patch,
> > > but it isn't clear to me that any of them relate to what you are
> > > seeing.
> > > Maybe if you could be more specific about what you see happening?
> > >
> > > Thanks,
> > > NeilBrown
> >
> >
> > When I'm not using raid5_add_disk() in raid5_start_reshape() added
> disk LED light doesn't blinks
> > (but it should during reshape ;)),
> > md doesn't make any signs that something goes wrong (even size can be
> increased).
> >
> > I've made some debug, and at second (during reshape start)
> raid5_add_disk() call rcu_assign_pointer() is called again.
> > This means that somehow previous assignment when slot is set was
> cleared.
> >
> > Correct situation (all disks are used during reshape) I can archive
> when instead raid5_add_disk() call
> > I've add the following code:
> >
> >      struct disk_info *p = conf->disks + rdev->raid_disk;
> >      rcu_assign_pointer(p->rdev, rdev);
> >
> > and (conf->disks + rdev->raid_disk)->rdev pointer is present in
> configuration.
> > I've checked that if I do not do call to rcu_assign_pointer() pointer
> (p->rdev) has NULL value.
> > In both cases call rcu_assign_pointer() sets p->rdev to the same
> value, so rdev doesn't change his location in memory.
> >
> >
> > BR
> > Adam
> >
> 
> Could you put some debug printks in slot_store (in md.c) and make sure
> it is
> being called, and that it calls raid5_add_disk, and see what
> raid5_add_disk
> does in that case?
> Thanks,
> 
> NeilBrown


I've did it before (and I've double checked now).
slot_store() calls raid5_add_disk() and inside it, rcu_assign_pointer() sets correct rdev pointer (I've checked, it is set during slot_store() call).
During raid5_start_reshape() this pointer has NULL value. When I set it again, disk is used properly. Second time rdev pointer I'm setting is the same as I've set during slot_store() call.
It seems that slot_store() works correctly. I've didn't find why rdev pointer is cleaned meanwhile. I have it in my plans after I've close mdadm OLCE/migration code (main parts at least ;)).

BR
Adam





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not used
  2011-01-20  8:29             ` Kwolek, Adam
@ 2011-01-20  9:30               ` NeilBrown
  2011-01-20  9:40                 ` Kwolek, Adam
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2011-01-20  9:30 UTC (permalink / raw)
  To: Kwolek, Adam
  Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed, Neubauer, Wojciech

On Thu, 20 Jan 2011 08:29:12 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> 
> 
> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@suse.de]
> > Sent: Wednesday, January 19, 2011 9:49 PM
> > To: Kwolek, Adam
> > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > Neubauer, Wojciech
> > Subject: Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not
> > used
> > 
> > On Mon, 17 Jan 2011 14:13:34 +0000 "Kwolek, Adam"
> > <adam.kwolek@intel.com>
> > wrote:
> > 
> > >
> > >
> > > > -----Original Message-----
> > > > From: NeilBrown [mailto:neilb@suse.de]
> > > > Sent: Monday, January 17, 2011 1:45 AM
> > > > To: Kwolek, Adam
> > > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > > > Neubauer, Wojciech
> > > > Subject: Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not
> > > > used
> > > >
> > > > On Mon, 17 Jan 2011 10:28:21 +1100 NeilBrown <neilb@suse.de> wrote:
> > > >
> > > > > On Mon, 17 Jan 2011 10:11:28 +1100 NeilBrown <neilb@suse.de>
> > wrote:
> > > > >
> > > > > > On Fri, 14 Jan 2011 14:00:00 +0100 Adam Kwolek
> > > > <adam.kwolek@intel.com> wrote:
> > > > > >
> > > > > > > Manually added spares are not used due to fact that they not
> > > > added to md configuration.
> > > > > > > Counters are updated only.
> > > > > > >
> > > > > > > Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> > > > > > > ---
> > > > > > >
> > > > > > >  drivers/md/raid5.c |    6 ++++--
> > > > > > >  1 files changed, 4 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > > > > > index a2087c7..59c4150 100644
> > > > > > > --- a/drivers/md/raid5.c
> > > > > > > +++ b/drivers/md/raid5.c
> > > > > > > @@ -5592,8 +5592,10 @@ static int raid5_start_reshape(mddev_t
> > > > *mddev)
> > > > > > >  		} else if (rdev->raid_disk >= conf-
> > > > >previous_raid_disks
> > > > > > >  			   && !test_bit(Faulty, &rdev->flags)) {
> > > > > > >  			/* This is a spare that was manually added */
> > > > > > > -			set_bit(In_sync, &rdev->flags);
> > > > > > > -			added_devices++;
> > > > > > > +			if (raid5_add_disk(mddev, rdev) == 0) {
> > > > > > > +				set_bit(In_sync, &rdev->flags);
> > > > > > > +				added_devices++;
> > > > > > > +			}
> > > > > > >  		}
> > > > > > >
> > > > > > >  	/* When a reshape changes the number of devices, ->degraded
> > > > > >
> > > > > > This should not be needed.
> > > > > > When a device is manually added, the desired slot number is
> > written
> > > > to
> > > > > >    ..../md/dev-XXX/slot
> > > > > >
> > > > > > This calls slot_store (in md.c) which call mddev->pers-
> > > > >hot_add_disk which
> > > > > > for raid5 is raid5_add_disk.
> > > > > > So you shouldn't need to call raid5_add_disk again.
> > > > > >
> > > > >
> > > > > ahhh... I see.  raid5_add_disk doesn't do the right thing in that
> > > > case.  It
> > > > > actually indexes beyond the end of an array, which is bad.
> > > > >
> > > > > We possibly do need the raid5_add_disk where you had put it.
> > I'll
> > > > have a
> > > > > think and see what is best.
> > > >
> > > > On third thoughts, I cannot see the problem you are seeing.
> > > > I even did some simple testing (manually writing to things in
> > sysfs)
> > > > and it
> > > > seems to include the new device properly.
> > > >
> > > > There are some issues that I found which are address by the
> > following
> > > > patch,
> > > > but it isn't clear to me that any of them relate to what you are
> > > > seeing.
> > > > Maybe if you could be more specific about what you see happening?
> > > >
> > > > Thanks,
> > > > NeilBrown
> > >
> > >
> > > When I'm not using raid5_add_disk() in raid5_start_reshape() added
> > disk LED light doesn't blinks
> > > (but it should during reshape ;)),
> > > md doesn't make any signs that something goes wrong (even size can be
> > increased).
> > >
> > > I've made some debug, and at second (during reshape start)
> > raid5_add_disk() call rcu_assign_pointer() is called again.
> > > This means that somehow previous assignment when slot is set was
> > cleared.
> > >
> > > Correct situation (all disks are used during reshape) I can archive
> > when instead raid5_add_disk() call
> > > I've add the following code:
> > >
> > >      struct disk_info *p = conf->disks + rdev->raid_disk;
> > >      rcu_assign_pointer(p->rdev, rdev);
> > >
> > > and (conf->disks + rdev->raid_disk)->rdev pointer is present in
> > configuration.
> > > I've checked that if I do not do call to rcu_assign_pointer() pointer
> > (p->rdev) has NULL value.
> > > In both cases call rcu_assign_pointer() sets p->rdev to the same
> > value, so rdev doesn't change his location in memory.
> > >
> > >
> > > BR
> > > Adam
> > >
> > 
> > Could you put some debug printks in slot_store (in md.c) and make sure
> > it is
> > being called, and that it calls raid5_add_disk, and see what
> > raid5_add_disk
> > does in that case?
> > Thanks,
> > 
> > NeilBrown
> 
> 
> I've did it before (and I've double checked now).
> slot_store() calls raid5_add_disk() and inside it, rcu_assign_pointer() sets correct rdev pointer (I've checked, it is set during slot_store() call).
> During raid5_start_reshape() this pointer has NULL value. When I set it again, disk is used properly. Second time rdev pointer I'm setting is the same as I've set during slot_store() call.
> It seems that slot_store() works correctly. I've didn't find why rdev pointer is cleaned meanwhile. I have it in my plans after I've close mdadm OLCE/migration code (main parts at least ;)).
> 

Thanks.
It is almost certainly getting removed by remove_add_add_spares calling
raid5_remove_disk.  One of those should stop the removal happening in that
case, but presumably isn't.
I'll try to figure out what "should" happen and get you a patch to try - not
sure when.

NeilBrown


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 1/2] md/raid5: FIX: manually-added spare is not used
  2011-01-20  9:30               ` NeilBrown
@ 2011-01-20  9:40                 ` Kwolek, Adam
  2011-01-20 10:15                   ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: Kwolek, Adam @ 2011-01-20  9:40 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed, Neubauer, Wojciech



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Thursday, January 20, 2011 10:31 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not
> used
> 
> On Thu, 20 Jan 2011 08:29:12 +0000 "Kwolek, Adam"
> <adam.kwolek@intel.com>
> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: NeilBrown [mailto:neilb@suse.de]
> > > Sent: Wednesday, January 19, 2011 9:49 PM
> > > To: Kwolek, Adam
> > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > > Neubauer, Wojciech
> > > Subject: Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not
> > > used
> > >
> > > On Mon, 17 Jan 2011 14:13:34 +0000 "Kwolek, Adam"
> > > <adam.kwolek@intel.com>
> > > wrote:
> > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: NeilBrown [mailto:neilb@suse.de]
> > > > > Sent: Monday, January 17, 2011 1:45 AM
> > > > > To: Kwolek, Adam
> > > > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski,
> Ed;
> > > > > Neubauer, Wojciech
> > > > > Subject: Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is
> not
> > > > > used
> > > > >
> > > > > On Mon, 17 Jan 2011 10:28:21 +1100 NeilBrown <neilb@suse.de>
> wrote:
> > > > >
> > > > > > On Mon, 17 Jan 2011 10:11:28 +1100 NeilBrown <neilb@suse.de>
> > > wrote:
> > > > > >
> > > > > > > On Fri, 14 Jan 2011 14:00:00 +0100 Adam Kwolek
> > > > > <adam.kwolek@intel.com> wrote:
> > > > > > >
> > > > > > > > Manually added spares are not used due to fact that they
> not
> > > > > added to md configuration.
> > > > > > > > Counters are updated only.
> > > > > > > >
> > > > > > > > Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> > > > > > > > ---
> > > > > > > >
> > > > > > > >  drivers/md/raid5.c |    6 ++++--
> > > > > > > >  1 files changed, 4 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > > > > > > index a2087c7..59c4150 100644
> > > > > > > > --- a/drivers/md/raid5.c
> > > > > > > > +++ b/drivers/md/raid5.c
> > > > > > > > @@ -5592,8 +5592,10 @@ static int
> raid5_start_reshape(mddev_t
> > > > > *mddev)
> > > > > > > >  		} else if (rdev->raid_disk >= conf-
> > > > > >previous_raid_disks
> > > > > > > >  			   && !test_bit(Faulty, &rdev->flags)) {
> > > > > > > >  			/* This is a spare that was manually
> added */
> > > > > > > > -			set_bit(In_sync, &rdev->flags);
> > > > > > > > -			added_devices++;
> > > > > > > > +			if (raid5_add_disk(mddev, rdev) == 0) {
> > > > > > > > +				set_bit(In_sync, &rdev->flags);
> > > > > > > > +				added_devices++;
> > > > > > > > +			}
> > > > > > > >  		}
> > > > > > > >
> > > > > > > >  	/* When a reshape changes the number of devices, -
> >degraded
> > > > > > >
> > > > > > > This should not be needed.
> > > > > > > When a device is manually added, the desired slot number is
> > > written
> > > > > to
> > > > > > >    ..../md/dev-XXX/slot
> > > > > > >
> > > > > > > This calls slot_store (in md.c) which call mddev->pers-
> > > > > >hot_add_disk which
> > > > > > > for raid5 is raid5_add_disk.
> > > > > > > So you shouldn't need to call raid5_add_disk again.
> > > > > > >
> > > > > >
> > > > > > ahhh... I see.  raid5_add_disk doesn't do the right thing in
> that
> > > > > case.  It
> > > > > > actually indexes beyond the end of an array, which is bad.
> > > > > >
> > > > > > We possibly do need the raid5_add_disk where you had put it.
> > > I'll
> > > > > have a
> > > > > > think and see what is best.
> > > > >
> > > > > On third thoughts, I cannot see the problem you are seeing.
> > > > > I even did some simple testing (manually writing to things in
> > > sysfs)
> > > > > and it
> > > > > seems to include the new device properly.
> > > > >
> > > > > There are some issues that I found which are address by the
> > > following
> > > > > patch,
> > > > > but it isn't clear to me that any of them relate to what you
> are
> > > > > seeing.
> > > > > Maybe if you could be more specific about what you see
> happening?
> > > > >
> > > > > Thanks,
> > > > > NeilBrown
> > > >
> > > >
> > > > When I'm not using raid5_add_disk() in raid5_start_reshape()
> added
> > > disk LED light doesn't blinks
> > > > (but it should during reshape ;)),
> > > > md doesn't make any signs that something goes wrong (even size
> can be
> > > increased).
> > > >
> > > > I've made some debug, and at second (during reshape start)
> > > raid5_add_disk() call rcu_assign_pointer() is called again.
> > > > This means that somehow previous assignment when slot is set was
> > > cleared.
> > > >
> > > > Correct situation (all disks are used during reshape) I can
> archive
> > > when instead raid5_add_disk() call
> > > > I've add the following code:
> > > >
> > > >      struct disk_info *p = conf->disks + rdev->raid_disk;
> > > >      rcu_assign_pointer(p->rdev, rdev);
> > > >
> > > > and (conf->disks + rdev->raid_disk)->rdev pointer is present in
> > > configuration.
> > > > I've checked that if I do not do call to rcu_assign_pointer()
> pointer
> > > (p->rdev) has NULL value.
> > > > In both cases call rcu_assign_pointer() sets p->rdev to the same
> > > value, so rdev doesn't change his location in memory.
> > > >
> > > >
> > > > BR
> > > > Adam
> > > >
> > >
> > > Could you put some debug printks in slot_store (in md.c) and make
> sure
> > > it is
> > > being called, and that it calls raid5_add_disk, and see what
> > > raid5_add_disk
> > > does in that case?
> > > Thanks,
> > >
> > > NeilBrown
> >
> >
> > I've did it before (and I've double checked now).
> > slot_store() calls raid5_add_disk() and inside it,
> rcu_assign_pointer() sets correct rdev pointer (I've checked, it is set
> during slot_store() call).
> > During raid5_start_reshape() this pointer has NULL value. When I set
> it again, disk is used properly. Second time rdev pointer I'm setting
> is the same as I've set during slot_store() call.
> > It seems that slot_store() works correctly. I've didn't find why rdev
> pointer is cleaned meanwhile. I have it in my plans after I've close
> mdadm OLCE/migration code (main parts at least ;)).
> >
> 
> Thanks.
> It is almost certainly getting removed by remove_add_add_spares calling
> raid5_remove_disk.  One of those should stop the removal happening in
> that
> case, but presumably isn't.
> I'll try to figure out what "should" happen and get you a patch to try
> - not
> sure when.
> 
> NeilBrown


Could you publish your current mdadm development code base? 
It will be easier to synchronize changes.
(If yes, please let me know development branch name to pull)

Thanks 
Adam


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] md/raid5: FIX: manually-added spare is not used
  2011-01-20  9:40                 ` Kwolek, Adam
@ 2011-01-20 10:15                   ` NeilBrown
  0 siblings, 0 replies; 14+ messages in thread
From: NeilBrown @ 2011-01-20 10:15 UTC (permalink / raw)
  To: Kwolek, Adam
  Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed, Neubauer, Wojciech

On Thu, 20 Jan 2011 09:40:34 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> 
> Could you publish your current mdadm development code base? 
> It will be easier to synchronize changes.
> (If yes, please let me know development branch name to pull)
> 
> Thanks 
> Adam


I've just now pushed it to devel-3.2

NeilBrown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] md/raid5: FIX: manually-added spare is not used
  2011-01-14 12:38 [PATCH 0/2] fixes for manually-added spares in raid5 Adam Kwolek
@ 2011-01-14 12:38 ` Adam Kwolek
  0 siblings, 0 replies; 14+ messages in thread
From: Adam Kwolek @ 2011-01-14 12:38 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

Manually added spares are not used due to fact that they not added to md configuration.
Counters are updated only.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 drivers/md/raid5.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a2087c7..b98b01c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5592,8 +5592,10 @@ static int raid5_start_reshape(mddev_t *mddev)
 		} else if (rdev->raid_disk >= conf->previous_raid_disks
 			   && !test_bit(Faulty, &rdev->flags)) {
 			/* This is a spare that was manually added */
-			set_bit(In_sync, &rdev->flags);
-			added_devices++;
+			if (raid5_add_disk(mddev, rdev) == 0) {
+			    set_bit(In_sync, &rdev->flags);
+			    added_devices++;
+			}
 		}
 
 	/* When a reshape changes the number of devices, ->degraded


^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-01-20 10:15 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-14 12:59 [PATCH 0/2] fixes for manually-added spares in raid5 (2nd) Adam Kwolek
2011-01-14 13:00 ` [PATCH 1/2] md/raid5: FIX: manually-added spare is not used Adam Kwolek
2011-01-16 23:11   ` NeilBrown
2011-01-16 23:28     ` NeilBrown
2011-01-17  0:44       ` NeilBrown
2011-01-17 14:13         ` Kwolek, Adam
2011-01-19 20:48           ` NeilBrown
2011-01-20  8:29             ` Kwolek, Adam
2011-01-20  9:30               ` NeilBrown
2011-01-20  9:40                 ` Kwolek, Adam
2011-01-20 10:15                   ` NeilBrown
2011-01-14 13:00 ` [PATCH 2/2] md/raid5: FIX: reshape on degraded devices has wrong configuration Adam Kwolek
2011-01-16 23:30   ` NeilBrown
  -- strict thread matches above, loose matches on Subject: below --
2011-01-14 12:38 [PATCH 0/2] fixes for manually-added spares in raid5 Adam Kwolek
2011-01-14 12:38 ` [PATCH 1/2] md/raid5: FIX: manually-added spare is not used Adam Kwolek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.