[PATCH] FIX: Cannot continue reshape if incremental assembly is used

* [PATCH] FIX: Cannot continue reshape if incremental assembly is used
@ 2011-09-01 13:18 Lukasz Dorau
  2011-09-06 21:34 ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Lukasz Dorau @ 2011-09-01 13:18 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, marcin.labun, ed.ciechanowski

Description of the bug:
An interrupted reshape cannot be continued using incremental assembly.
The array becomes inactive.

Cause of the bug:
The reshape tried to continue with an insufficient number of disks
added by incremental assembly (tested using capacity expansion).

Solution:
During a reshape, adding disks to the array should be blocked until
the minimum required number of disks is ready to be added.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
---
 Assemble.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index 25cfec1..da43162 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1531,6 +1531,45 @@ int assemble_container_content(struct supertype *st, int mdfd,
 
 	if (sra)
 		sysfs_free(sra);
+	if (content->reshape_active) {
+		int disks_counter = 0;
+		int required_disks;
+		required_disks = content->array.raid_disks;
+		/* check if disks are removed */
+		if (content->delta_disks < 0)
+			required_disks += content->delta_disks;
+		/* Count devices available for assembly.
+		 * During incremental assembly of a reshaping array,
+		 * add disks only once the required minimum number of disks
+		 * has been collected, to avoid assembly problems.
+		 */
+		for (dev = content->devs; dev; dev = dev->next) {
+			if (dev->disk.raid_disk >= 0)
+				disks_counter++;
+		}
+		/* allow for degradation */
+		switch (content->array.level) {
+		case 6:
+			required_disks--;	/* fall through */
+		case 4:
+		case 5:
+			required_disks--;
+		default:
+			break;
+		}
+		/* check whether the number of disks allows assembly
+		 */
+		if (disks_counter < required_disks) {
+			if (verbose >= 0)
+				fprintf(stderr, Name
+						": %s not assembled with %d devices "
+						"(disks required for assembly: %i).\n",
+						chosen_name, disks_counter,
+						required_disks);
+			return 1;
+		}
+		block_subarray(content);
+	}
 	old_raid_disks = content->array.raid_disks - content->delta_disks;
 	for (dev = content->devs; dev; dev = dev->next)
 		if (sysfs_add_disk(content, dev, 1) == 0) {
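
The minimum-disk arithmetic in the hunk above can be tried outside mdadm.
The sketch below is only an illustration: min_disks_for_reshape() is a
hypothetical helper, not an mdadm function, and its three parameters stand
in for content->array.level, content->array.raid_disks and
content->delta_disks. It mirrors the patch's arithmetic as posted and makes
no claim that this arithmetic is the right fix.

```c
#include <assert.h>

/* Hypothetical helper (not part of mdadm): the smallest number of member
 * disks the posted patch would accept before continuing a reshape. */
static int min_disks_for_reshape(int level, int raid_disks, int delta_disks)
{
	int required = raid_disks;

	assert(raid_disks > 0);

	/* a reshape that removes disks needs fewer members */
	if (delta_disks < 0)
		required += delta_disks;

	/* allow for degradation: two disks for RAID6, one for RAID4/5 */
	switch (level) {
	case 6:
		required -= 2;
		break;
	case 4:
	case 5:
		required -= 1;
		break;
	default:
		break;
	}
	return required;
}
```

For example, a RAID5 array being grown to 4 disks gives
min_disks_for_reshape(5, 4, 1) == 3, while a RAID0 array of 4 disks has no
redundancy and needs all 4.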

---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
z siedziba w Gdansku
ul. Slowackiego 173
80-298 Gdansk

Sad Rejonowy Gdansk Polnoc w Gdansku, 
VII Wydzial Gospodarczy Krajowego Rejestru Sadowego, 
numer KRS 101882

NIP 957-07-52-316
Kapital zakladowy 200.000 zl

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


* Re: [PATCH] FIX: Cannot continue reshape if incremental assembly is used
  2011-09-01 13:18 [PATCH] FIX: Cannot continue reshape if incremental assembly is used Lukasz Dorau
@ 2011-09-06 21:34 ` Dan Williams
  2011-09-07  2:37   ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2011-09-06 21:34 UTC (permalink / raw)
  To: Lukasz Dorau; +Cc: neilb, linux-raid, marcin.labun, ed.ciechanowski

On Thu, Sep 1, 2011 at 6:18 AM, Lukasz Dorau <lukasz.dorau@intel.com> wrote:
> Description of the bug:
> Interrupted reshape cannot be continued using incremental assembly.
> Array becomes inactive.
>
> Cause of the bug:
> Reshape tried to continue with insufficient number of disks
> added by incremental assembly (tested using capacity expansion).
>
> Solution:
> During reshape adding disks to array should be blocked until
> minimum required number of disks is ready to be added.

Can you provide a script test-case to reproduce the problem?

> Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
> ---
>  Assemble.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 39 insertions(+), 0 deletions(-)
>
> diff --git a/Assemble.c b/Assemble.c
> index 25cfec1..da43162 100644
> --- a/Assemble.c
> +++ b/Assemble.c
> @@ -1531,6 +1531,45 @@ int assemble_container_content(struct supertype *st, int mdfd,
>
>        if (sra)
>                sysfs_free(sra);
> +       if (content->reshape_active) {
> +               int disks_counter = 0;
> +               int required_disks;
> +               required_disks = content->array.raid_disks;
> +               /* check if disks are removed */
> +               if (content->delta_disks < 0)
> +                       required_disks += content->delta_disks;
> +               /* Count devices available for assemblation.
> +               *  In case of incremental assemblation during reshape
> +               *  allow to add disks only if required minimum number of disks
> +               *  is already collected to avoid assemblation problem.
> +               *  */
> +               for (dev = content->devs; dev; dev = dev->next) {
> +                       if (dev->disk.raid_disk >= 0)
> +                               disks_counter++;
> +               }
> +               /* allow for degradation */
> +               switch (content->array.level) {
> +               case 6:
> +                       required_disks--;
> +               case 4:
> +               case 5:
> +                       required_disks--;
> +               default:
> +                       break;
> +               }
> +               /* check now, if number of disks allows for assemblation
> +               *               */
> +               if (disks_counter < required_disks) {
> +                       if (verbose >= 0)
> +                               fprintf(stderr, Name
> +                                               ": %s not assembled with %d devices "
> +                                               "(required disks for assemblation: %i).\n",
> +                                               chosen_name, disks_counter,
> +                                               required_disks);
> +                       return 1;
> +               }
> +               block_subarray(content);
> +       }

Checking that the expected number of disks is available is something
the existing code already does, so I don't understand why we need
another open-coded check?
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] FIX: Cannot continue reshape if incremental assembly is used
  2011-09-06 21:34 ` Dan Williams
@ 2011-09-07  2:37   ` NeilBrown
  2011-09-08  8:26     ` Dorau, Lukasz
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2011-09-07  2:37 UTC (permalink / raw)
  To: Dan Williams; +Cc: Lukasz Dorau, linux-raid, marcin.labun, ed.ciechanowski

On Tue, 6 Sep 2011 14:34:42 -0700 Dan Williams <dan.j.williams@intel.com>
wrote:

> On Thu, Sep 1, 2011 at 6:18 AM, Lukasz Dorau <lukasz.dorau@intel.com> wrote:
> > Description of the bug:
> > Interrupted reshape cannot be continued using incremental assembly.
> > Array becomes inactive.
> >
> > Cause of the bug:
> > Reshape tried to continue with insufficient number of disks
> > added by incremental assembly (tested using capacity expansion).
> >
> > Solution:
> > During reshape adding disks to array should be blocked until
> > minimum required number of disks is ready to be added.
> 
> Can you provide a script test-case to reproduce the problem?

I can:

mdadm -C /dev/md/imsm -e imsm -n 4 /dev/sd[abcd]
mdadm -C /dev/md/r5 -n3 -l5 /dev/md/imsm -z 2000000
mdadm --wait /dev/md/r5
mdadm -G /dev/md/imsm -n4
sleep 10
mdadm -Ss
mdadm -I /dev/sda
mdadm -I /dev/sdb
mdadm -I /dev/sdc

array is started and reshape continues.

The problem is that container_content reports that array.working_disks is 3 rather than 4.
'working_disks' should be the number of disks in the array that were working last time
the array was assembled.
However the imsm code only counts devices that can currently be found.
I'm not familiar enough with the IMSM metadata to fix this.
However by looking at the metadata on just one device in an array it should be possible
to work out how many were working last time, and report that count.

NeilBrown



* RE: [PATCH] FIX: Cannot continue reshape if incremental assembly is used
  2011-09-07  2:37   ` NeilBrown
@ 2011-09-08  8:26     ` Dorau, Lukasz
  0 siblings, 0 replies; 8+ messages in thread
From: Dorau, Lukasz @ 2011-09-08  8:26 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, Labun, Marcin, Ciechanowski, Ed, Williams, Dan J

On Wed, Sep 07, 2011 4:38 AM Neil Brown <neilb@suse.de> wrote:
> 
> On Tue, 6 Sep 2011 14:34:42 -0700 Dan Williams <dan.j.williams@intel.com>
> wrote:
> 
> > On Thu, Sep 1, 2011 at 6:18 AM, Lukasz Dorau <lukasz.dorau@intel.com>
> wrote:
> > > Description of the bug:
> > > Interrupted reshape cannot be continued using incremental assembly.
> > > Array becomes inactive.
> > >
> > > Cause of the bug:
> > > Reshape tried to continue with insufficient number of disks
> > > added by incremental assembly (tested using capacity expansion).
> > >
> > > Solution:
> > > During reshape adding disks to array should be blocked until
> > > minimum required number of disks is ready to be added.
> >
> > Can you provide a script test-case to reproduce the problem?
> 

The patch was originally intended to fix the following issue:

export MDADM_EXPERIMENTAL=1
mdadm -C /dev/md/imsm0 -amd -e imsm -n 2 /dev/sda /dev/sdb -R
mdadm -C /dev/md/raid1-0 -amd -l1  --size 5G -n 2 /dev/sda /dev/sdb -R 
mdadm --wait /dev/md/raid1-0 
mdadm /dev/md/imsm0 --add /dev/sdc
mdadm /dev/md/imsm0 --add /dev/sdd
mdadm -G /dev/md/raid1-0 -l0 
mdadm -G /dev/md/imsm0 -n 4 
sleep 5
mdadm -Ss
mdadm -I /dev/sda
mdadm -I /dev/sdb
# At this point mdadm tries to start the array and continue the reshape,
# but there are not enough disks, so it fails
# (not enough operational devices - failed to run raid set)
# - see syslog below.
mdadm -I /dev/sdc
# array is not started and reshape does not continue
mdadm -I /dev/sdd
# array is not started and reshape does not continue

### syslog:
kernel: md: bind<sda>
kernel: md: bind<sdb>
kernel: md: bind<sda>
kernel: md: bind<sdb>
kernel: bio: create slab <bio-1> at 1
kernel: md/raid:md126: reshape will continue
kernel: md/raid:md126: device sdb operational as raid disk 0
kernel: md/raid:md126: device sda operational as raid disk 1
kernel: md/raid:md126: allocated 5334kB
kernel: md/raid:md126: not enough operational devices (3/5 failed)
kernel: md/raid:md126: failed to run raid set.
kernel: md: pers->run() failed ...
kernel: md: bind<sdc>
kernel: md: bind<sdd>
kernel: md: bind<sdd>
kernel: md: bind<sdc>
kernel: md: export_rdev(sda)
kernel: md: export_rdev(sdb)
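
The "3/5 failed" line in the log above can be read with a simplified model
of the run-time check. The sketch below is an assumption-laden illustration,
not the kernel's code: can_run() and max_degraded() are hypothetical names,
the array is assumed to be running under a single-parity personality
(level 4 or 5) during the migration, and the real check during a reshape
considers more than a single count.

```c
#include <assert.h>

/* Hypothetical model: how many missing members a level tolerates. */
static int max_degraded(int level)
{
	if (level == 6)
		return 2;
	if (level == 4 || level == 5)
		return 1;
	return 0;
}

/* Hypothetical model: the array can run only if the number of missing
 * members does not exceed what the level tolerates. */
static int can_run(int level, int raid_disks, int operational)
{
	int failed;

	assert(operational >= 0 && operational <= raid_disks);

	failed = raid_disks - operational;
	return failed <= max_degraded(level);
}
```

With the values from the log (5 raid disks, sda and sdb operational),
can_run(5, 5, 2) is 0 because 3 members are missing, whereas 4 operational
members would be enough under this model.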

Regards,
Lukasz 


* Re: [PATCH] FIX: Cannot continue reshape if incremental assembly is used
       [not found]       ` <CABE8wwuheLbPA8JCJ0pw_nNOsWBWowHmLZ+piUOHXYcoFRtuHA@mail.gmail.com>
@ 2011-09-19  6:40         ` NeilBrown
  0 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2011-09-19  6:40 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: Ciechanowski, Ed, Labun, Marcin, linux-raid, Dorau, Lukasz

On Wed, 7 Sep 2011 21:42:28 -0700 "Williams, Dan J"
<dan.j.williams@intel.com> wrote:

> On Sep 7, 2011 6:26 PM, "NeilBrown" <neilb@suse.de> wrote:
> >
> > On Wed, 7 Sep 2011 18:11:12 -0700 "Williams, Dan J"
> > <dan.j.williams@intel.com> wrote:
> >
> > > On Wed, Sep 7, 2011 at 6:21 AM, Dorau, Lukasz <lukasz.dorau@intel.com>
> wrote:
> > > Hmm, this might just be cribbed from the initial DDF implementation,
> > > should be straightforward to reuse the count we use for
> > > container_enough, but I'm not seeing where Incremental uses
> > > working_disks for external arrays...
> >
> > Assemble.c: assemble_container_content()
> > ....
> >        if (runstop > 0 ||
> >                 (working + preexist + expansion) >=
> >                        content->array.working_disks) {
> > ....
> >
> 
> ...so now i'd like to kill ->container_enough, because similar to the
> MD_SB_INVALID suggestion it's probably better to let ->container_content
> flag the true state rather than some sideband "don't call
> ->container_content yet" mechanism.

(catching up on some old mail).

I would be happy with that.

I don't exactly object to ->container_enough as it is conceivable that the
container knows something about the whole that you cannot deduce from the
member arrays.
But we definitely do want working_disks to be accurate so that we don't just
trust container_enough.

Thanks,
NeilBrown


* Re: [PATCH] FIX: Cannot continue reshape if incremental assembly is used
  2011-09-08  1:11   ` Williams, Dan J
@ 2011-09-08  1:26     ` NeilBrown
       [not found]       ` <CABE8wwuheLbPA8JCJ0pw_nNOsWBWowHmLZ+piUOHXYcoFRtuHA@mail.gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2011-09-08  1:26 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: Dorau, Lukasz, linux-raid, Labun, Marcin, Ciechanowski, Ed

On Wed, 7 Sep 2011 18:11:12 -0700 "Williams, Dan J"
<dan.j.williams@intel.com> wrote:

> On Wed, Sep 7, 2011 at 6:21 AM, Dorau, Lukasz <lukasz.dorau@intel.com> wrote:
> > On Wed, Sep 07, 2011 4:38 AM Neil Brown <neilb@suse.de> wrote:
> >>
> >> On Tue, 6 Sep 2011 14:34:42 -0700 Dan Williams <dan.j.williams@intel.com>
> >> wrote:
> >>
> >> > On Thu, Sep 1, 2011 at 6:18 AM, Lukasz Dorau <lukasz.dorau@intel.com>
> >> wrote:
> >> > > Description of the bug:
> >> > > Interrupted reshape cannot be continued using incremental assembly.
> >> > > Array becomes inactive.
> >> > >
> >> > > Cause of the bug:
> >> > > Reshape tried to continue with insufficient number of disks
> >> > > added by incremental assembly (tested using capacity expansion).
> >> > >
> >> > > Solution:
> >> > > During reshape adding disks to array should be blocked until
> >> > > minimum required number of disks is ready to be added.
> >> >
> >> > Can you provide a script test-case to reproduce the problem?
> >>
> >> I can:
> >>
> >> mdadm -C /dev/md/imsm -e imsm -n 4 /dev/sd[abcd]
> >> mdadm -C /dev/md/r5 -n3 -l5 /dev/md/imsm -z 2000000
> >> mdadm --wait /dev/md/r5
> >> mdadm -G /dev/md/imsm -n4
> >> sleep 10
> >> mdadm -Ss
> >> mdadm -I /dev/sda
> >> mdadm -I /dev/sdb
> >> mdadm -I /dev/sdc
> >>
> >> array is started and reshape continues.
> >>
> >> The problem is that container_content reports that array.working_disks is 3
> >> rather than 4.
> >> 'working_disks' should be the number of disks int the array that were working
> >> last time
> >> the array was assembled.
> 
> Hmm, this might just be cribbed from the initial DDF implementation,
> should be straightforward to reuse the count we use for
> container_enough, but I'm not seeing where Incremental uses
> working_disks for external arrays...

Assemble.c: assemble_container_content()
....
	if (runstop > 0 ||
		 (working + preexist + expansion) >=
			content->array.working_disks) {
....
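
The effect of a wrong working_disks on that condition can be shown in
isolation. The sketch below is only an illustration: should_start() is a
hypothetical wrapper around the expression quoted above, and for simplicity
every disk added so far is counted as "working" (how mdadm actually splits
disks between working, preexist and expansion is not modelled here).

```c
#include <assert.h>

/* Hypothetical wrapper (not an mdadm function) around the start condition
 * quoted from assemble_container_content(). */
static int should_start(int runstop, int working, int preexist,
			int expansion, int working_disks)
{
	assert(working >= 0 && preexist >= 0 && expansion >= 0);

	return runstop > 0 ||
	       (working + preexist + expansion) >= working_disks;
}
```

With three disks added and working_disks reported as 3,
should_start(0, 3, 0, 0, 3) is true and the array starts early. With an
accurate count of 4, should_start(0, 3, 0, 0, 4) is false until the fourth
disk arrives or --run is given (runstop > 0).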

> 
> >> However the imsm code only counts devices that can currently be found.
> >> I'm not familiar enough with the IMSM metadata to fix this.
> >> However by looking at the metadata on just one device in an array it should be
> >> possible
> >> to work out how many were working last time, and report that count.
> >>
> >
> > Neil, please consider the following script test-case (not 4 but 5 drives finally in the array):
> >
> > mdadm -C /dev/md/imsm -e imsm -n 5 /dev/sd[abcde]
> > mdadm -C /dev/md/r5 -n3 -l5 /dev/md/imsm -z 2000000
> > mdadm --wait /dev/md/r5
> > mdadm -G /dev/md/imsm -n5
> > sleep 10
> > mdadm -Ss
> > mdadm -I /dev/sda
> > mdadm -I /dev/sdb
> > mdadm -I /dev/sdc
> > # array is not started and reshape does not continue!
> > mdadm -I /dev/sdd
> >
> > and now array is started and reshape continues - the minimum required number of disks is added to array already.
> >
> > So the question is:  when mdadm should start the array using incremental assembly?:
> 
> As soon as all drives are present, or when the minimum number is
> present and --run is specified.
> 
> > 1) when minimum required number of disks is added and (degraded) array can be started or
> > 2) when all disks that were working last time the array was assembled are added.
> 
> This is what ->container_enough attempts to identify, and it looks
> like you are running into the fact that it does not take into account
> migration.  imsm_count_failed() is returning the wrong value, and it
> has the comment:
> 
>         /* FIXME add support for online capacity expansion and
>          * raid-level-migration
>          */
> The routine in getinfo_super_imsm should also be looking at map0,
> currently it is looking at map1 to determine the number of device
> members.
> 
> > If the second is true, there is another question: when to decide to give up waiting for non-present disks that can be (e.g.) removed meanwhile by user?
> 
> Not really mdadm's problem.  That's primarily up to the udev policy.

Yes.  The theory is that once "enough time" has passed you run "mdadm -IRs" to
pick up the pieces.  However I don't know where we should put that command.

NeilBrown



* Re: [PATCH] FIX: Cannot continue reshape if incremental assembly is used
  2011-09-07 13:21 ` Dorau, Lukasz
@ 2011-09-08  1:11   ` Williams, Dan J
  2011-09-08  1:26     ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Williams, Dan J @ 2011-09-08  1:11 UTC (permalink / raw)
  To: Dorau, Lukasz; +Cc: NeilBrown, linux-raid, Labun, Marcin, Ciechanowski, Ed

On Wed, Sep 7, 2011 at 6:21 AM, Dorau, Lukasz <lukasz.dorau@intel.com> wrote:
> On Wed, Sep 07, 2011 4:38 AM Neil Brown <neilb@suse.de> wrote:
>>
>> On Tue, 6 Sep 2011 14:34:42 -0700 Dan Williams <dan.j.williams@intel.com>
>> wrote:
>>
>> > On Thu, Sep 1, 2011 at 6:18 AM, Lukasz Dorau <lukasz.dorau@intel.com>
>> wrote:
>> > > Description of the bug:
>> > > Interrupted reshape cannot be continued using incremental assembly.
>> > > Array becomes inactive.
>> > >
>> > > Cause of the bug:
>> > > Reshape tried to continue with insufficient number of disks
>> > > added by incremental assembly (tested using capacity expansion).
>> > >
>> > > Solution:
>> > > During reshape adding disks to array should be blocked until
>> > > minimum required number of disks is ready to be added.
>> >
>> > Can you provide a script test-case to reproduce the problem?
>>
>> I can:
>>
>> mdadm -C /dev/md/imsm -e imsm -n 4 /dev/sd[abcd]
>> mdadm -C /dev/md/r5 -n3 -l5 /dev/md/imsm -z 2000000
>> mdadm --wait /dev/md/r5
>> mdadm -G /dev/md/imsm -n4
>> sleep 10
>> mdadm -Ss
>> mdadm -I /dev/sda
>> mdadm -I /dev/sdb
>> mdadm -I /dev/sdc
>>
>> array is started and reshape continues.
>>
>> The problem is that container_content reports that array.working_disks is 3
>> rather than 4.
>> 'working_disks' should be the number of disks int the array that were working
>> last time
>> the array was assembled.

Hmm, this might just be cribbed from the initial DDF implementation,
should be straightforward to reuse the count we use for
container_enough, but I'm not seeing where Incremental uses
working_disks for external arrays...

>> However the imsm code only counts devices that can currently be found.
>> I'm not familiar enough with the IMSM metadata to fix this.
>> However by looking at the metadata on just one device in an array it should be
>> possible
>> to work out how many were working last time, and report that count.
>>
>
> Neil, please consider the following script test-case (not 4 but 5 drives finally in the array):
>
> mdadm -C /dev/md/imsm -e imsm -n 5 /dev/sd[abcde]
> mdadm -C /dev/md/r5 -n3 -l5 /dev/md/imsm -z 2000000
> mdadm --wait /dev/md/r5
> mdadm -G /dev/md/imsm -n5
> sleep 10
> mdadm -Ss
> mdadm -I /dev/sda
> mdadm -I /dev/sdb
> mdadm -I /dev/sdc
> # array is not started and reshape does not continue!
> mdadm -I /dev/sdd
>
> and now array is started and reshape continues - the minimum required number of disks is added to array already.
>
> So the question is:  when mdadm should start the array using incremental assembly?:

As soon as all drives are present, or when the minimum number is
present and --run is specified.

> 1) when minimum required number of disks is added and (degraded) array can be started or
> 2) when all disks that were working last time the array was assembled are added.

This is what ->container_enough attempts to identify, and it looks
like you are running into the fact that it does not take into account
migration.  imsm_count_failed() is returning the wrong value, and it
has the comment:

        /* FIXME add support for online capacity expansion and
         * raid-level-migration
         */
The routine in getinfo_super_imsm should also be looking at map0,
currently it is looking at map1 to determine the number of device
members.

> If the second is true, there is another question: when to decide to give up waiting for non-present disks that can be (e.g.) removed meanwhile by user?

Not really mdadm's problem.  That's primarily up to the udev policy.

* RE: [PATCH] FIX: Cannot continue reshape if incremental assembly is used
@ 2011-09-07 13:21 ` Dorau, Lukasz
  2011-09-08  1:11   ` Williams, Dan J
  0 siblings, 1 reply; 8+ messages in thread
From: Dorau, Lukasz @ 2011-09-07 13:21 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, Labun, Marcin, Ciechanowski, Ed, Williams, Dan J

On Wed, Sep 07, 2011 4:38 AM Neil Brown <neilb@suse.de> wrote:
> 
> On Tue, 6 Sep 2011 14:34:42 -0700 Dan Williams <dan.j.williams@intel.com>
> wrote:
> 
> > On Thu, Sep 1, 2011 at 6:18 AM, Lukasz Dorau <lukasz.dorau@intel.com>
> wrote:
> > > Description of the bug:
> > > Interrupted reshape cannot be continued using incremental assembly.
> > > Array becomes inactive.
> > >
> > > Cause of the bug:
> > > Reshape tried to continue with insufficient number of disks
> > > added by incremental assembly (tested using capacity expansion).
> > >
> > > Solution:
> > > During reshape adding disks to array should be blocked until
> > > minimum required number of disks is ready to be added.
> >
> > Can you provide a script test-case to reproduce the problem?
> 
> I can:
> 
> mdadm -C /dev/md/imsm -e imsm -n 4 /dev/sd[abcd]
> mdadm -C /dev/md/r5 -n3 -l5 /dev/md/imsm -z 2000000
> mdadm --wait /dev/md/r5
> mdadm -G /dev/md/imsm -n4
> sleep 10
> mdadm -Ss
> mdadm -I /dev/sda
> mdadm -I /dev/sdb
> mdadm -I /dev/sdc
> 
> array is started and reshape continues.
> 
> The problem is that container_content reports that array.working_disks is 3
> rather than 4.
> 'working_disks' should be the number of disks int the array that were working
> last time
> the array was assembled.
> However the imsm code only counts devices that can currently be found.
> I'm not familiar enough with the IMSM metadata to fix this.
> However by looking at the metadata on just one device in an array it should be
> possible
> to work out how many were working last time, and report that count.
> 

Neil, please consider the following test script (the array ends up with 5 drives rather than 4):

mdadm -C /dev/md/imsm -e imsm -n 5 /dev/sd[abcde]
mdadm -C /dev/md/r5 -n3 -l5 /dev/md/imsm -z 2000000
mdadm --wait /dev/md/r5
mdadm -G /dev/md/imsm -n5
sleep 10
mdadm -Ss
mdadm -I /dev/sda
mdadm -I /dev/sdb
mdadm -I /dev/sdc
# array is not started and reshape does not continue!
mdadm -I /dev/sdd

Now the array is started and the reshape continues, because the minimum required number of disks has been added to the array.

So the question is: when should mdadm start the array during incremental assembly?
1) when the minimum required number of disks has been added and the (degraded) array can be started, or
2) when all disks that were working the last time the array was assembled have been added.

If the second is true, there is another question: when should mdadm give up waiting for disks that are not present and may (e.g.) have been removed by the user in the meantime?

What do you suggest?

Regards,
Lukasz 

