All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Imsm OROM compatibility/boot support fixes
@ 2012-04-05 15:28 Przemyslaw Czarnowski
  2012-04-05 15:29 ` [PATCH 1/4] imsm: monitor: do not finish migration if there are no failed disks Przemyslaw Czarnowski
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Przemyslaw Czarnowski @ 2012-04-05 15:28 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

This series of patches contain fixes regarding OROM and mdadm compatibility
connected with handling of recovery/degraded states and missing disks.
OROM behaves little differently when it marks disk as missing (see description
of patches for details)
Last one in the series does not allow to modify metadata (in particular does
not finish migration) on disks when array is set to read only by kernel
(eg. during restart of system, when root partition lies on raid device).

---

Przemyslaw Czarnowski (4):
      imsm: monitor: do not finish migration if there are no failed disks
      imsm: clean up missing disks if there are any left after migration
      imsm: avoid double change of serial number of missing device
      imsm: monitor: do not finish recovery, when raid goes to read-only


 monitor.c     |    1 +
 super-intel.c |   33 ++++++++++++++++++++++++---------
 2 files changed, 25 insertions(+), 9 deletions(-)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] imsm: monitor: do not finish migration if there are no failed disks
  2012-04-05 15:28 [PATCH 0/4] Imsm OROM compatibility/boot support fixes Przemyslaw Czarnowski
@ 2012-04-05 15:29 ` Przemyslaw Czarnowski
  2012-04-05 17:56   ` Williams, Dan J
  2012-04-05 15:30 ` [PATCH 2/4] imsm: clean up missing disks if there are any left after migration Przemyslaw Czarnowski
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Przemyslaw Czarnowski @ 2012-04-05 15:29 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Transition from "degraded" to "recovery" made in OROM is slightly different
than the same transision in mdadm. Missing disk is not removed from list of
raid devices, but just from map. Therefore mdadm should not end migration
basing on existence of list of missing disks but should rely on count of
failed disks.

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
---
 super-intel.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index dad4c4d..e1cd9b8 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -6953,6 +6953,12 @@ static void handle_missing(struct intel_super *super, struct imsm_dev *dev)
 	if (!super->missing)
 		return;
 
+	/* When orom adds replacement for missing disk it does
+	 * not remove entry of missing disk, but just updates map with
+	 * new added disk. So it is not enough just to test if there is
+	 * any missing disk, we have to look if there are any failed disks
+	 * in map to stop migration */
+
 	dprintf("imsm: mark missing\n");
 	/* end process for initialization and rebuild only
 	 */
@@ -6963,7 +6969,8 @@ static void handle_missing(struct intel_super *super, struct imsm_dev *dev)
 		failed = imsm_count_failed(super, dev, MAP_0);
 		map_state = imsm_check_degraded(super, dev, failed, MAP_0);
 
-		end_migration(dev, super, map_state);
+		if (failed)
+			end_migration(dev, super, map_state);
 	}
 	for (dl = super->missing; dl; dl = dl->next)
 		mark_missing(dev, &dl->disk, dl->index);


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] imsm: clean up missing disks if there are any left after migration
  2012-04-05 15:28 [PATCH 0/4] Imsm OROM compatibility/boot support fixes Przemyslaw Czarnowski
  2012-04-05 15:29 ` [PATCH 1/4] imsm: monitor: do not finish migration if there are no failed disks Przemyslaw Czarnowski
@ 2012-04-05 15:30 ` Przemyslaw Czarnowski
  2012-04-05 17:06   ` Williams, Dan J
  2012-04-05 15:30 ` [PATCH 3/4] imsm: avoid double change of serial number of missing device Przemyslaw Czarnowski
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Przemyslaw Czarnowski @ 2012-04-05 15:30 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

It is for compatibility with OROM way to set "recovery" state.

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
---
 super-intel.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index e1cd9b8..f843997 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -7211,6 +7211,8 @@ skip_mark_checkpoint:
 	return consistent;
 }
 
+static void imsm_delete(struct intel_super *super, struct dl **dlp, unsigned index);
+
 static void imsm_set_disk(struct active_array *a, int n, int state)
 {
 	int inst = a->info.container_member;
@@ -7260,6 +7262,11 @@ static void imsm_set_disk(struct active_array *a, int n, int state)
 			dprintf("while rebuilding");
 			end_migration(dev, super, map_state);
 			map = get_imsm_map(dev, MAP_0);
+			/* sweep list of missing disks, OROM might leave
+			 * some residuals from old members */
+			while (super->missing) {
+				imsm_delete(super, &super->missing, super->missing->index);
+			}
 			map->failed_disk_num = ~0;
 			super->updates_pending++;
 			a->last_checkpoint = 0;
@@ -7792,8 +7799,6 @@ static int remove_disk_super(struct intel_super *super, int major, int minor)
 	return 0;
 }
 
-static void imsm_delete(struct intel_super *super, struct dl **dlp, unsigned index);
-
 static int add_remove_disk_update(struct intel_super *super)
 {
 	int check_degraded = 0;


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] imsm: avoid double change of serial number of missing device
  2012-04-05 15:28 [PATCH 0/4] Imsm OROM compatibility/boot support fixes Przemyslaw Czarnowski
  2012-04-05 15:29 ` [PATCH 1/4] imsm: monitor: do not finish migration if there are no failed disks Przemyslaw Czarnowski
  2012-04-05 15:30 ` [PATCH 2/4] imsm: clean up missing disks if there are any left after migration Przemyslaw Czarnowski
@ 2012-04-05 15:30 ` Przemyslaw Czarnowski
  2012-04-05 17:41   ` Williams, Dan J
  2012-04-05 15:31 ` [PATCH 4/4] imsm: monitor: do not finish recovery, when raid goes to read-only Przemyslaw Czarnowski
  2012-04-09 23:22 ` [PATCH 0/4] Imsm OROM compatibility/boot support fixes NeilBrown
  4 siblings, 1 reply; 10+ messages in thread
From: Przemyslaw Czarnowski @ 2012-04-05 15:30 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

If degraded state is set in OROM, it do not set 'failed' flag for missing,
device, just changes serial number, sets scsiId to 0xffff and sets highest
bit in ord table in map. When mdadm replaces missing disk, the one left
there is marked as failed and missing again (changing already marked serial
number).

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
---
 super-intel.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index f843997..b361b72 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -6908,12 +6908,15 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
 	if (is_failed(disk) && (ord & IMSM_ORD_REBUILD))
 		return 0;
 
-	memcpy(buf, disk->serial, MAX_RAID_SERIAL_LEN);
-	buf[MAX_RAID_SERIAL_LEN] = '\000';
-	strcat(buf, ":0");
-	if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN)
-		shift = len - MAX_RAID_SERIAL_LEN + 1;
-	strncpy((char *)disk->serial, &buf[shift], MAX_RAID_SERIAL_LEN);
+	/* do not add ":0", OROM has already done it */
+	if (!(ord & IMSM_ORD_REBUILD)) {
+		memcpy(buf, disk->serial, MAX_RAID_SERIAL_LEN);
+		buf[MAX_RAID_SERIAL_LEN] = '\000';
+		strcat(buf, ":0");
+		if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN)
+			shift = len - MAX_RAID_SERIAL_LEN + 1;
+		strncpy((char *)disk->serial, &buf[shift], MAX_RAID_SERIAL_LEN);
+	}
 
 	disk->status |= FAILED_DISK;
 	set_imsm_ord_tbl_ent(map, slot, idx | IMSM_ORD_REBUILD);


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] imsm: monitor: do not finish recovery, when raid goes to read-only
  2012-04-05 15:28 [PATCH 0/4] Imsm OROM compatibility/boot support fixes Przemyslaw Czarnowski
                   ` (2 preceding siblings ...)
  2012-04-05 15:30 ` [PATCH 3/4] imsm: avoid double change of serial number of missing device Przemyslaw Czarnowski
@ 2012-04-05 15:31 ` Przemyslaw Czarnowski
  2012-04-05 17:37   ` Williams, Dan J
  2012-04-09 23:22 ` [PATCH 0/4] Imsm OROM compatibility/boot support fixes NeilBrown
  4 siblings, 1 reply; 10+ messages in thread
From: Przemyslaw Czarnowski @ 2012-04-05 15:31 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

If kernel forces raid to be read-only (eg. during reboot, and root
partition lies on array) monitor should not finish recovery and leave
metadata untouched.

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
---
 monitor.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/monitor.c b/monitor.c
index c987d10..6a736a0 100644
--- a/monitor.c
+++ b/monitor.c
@@ -304,6 +304,7 @@ static int read_and_act(struct active_array *a)
 	}
 
 	if (!deactivate &&
+	    a->curr_state != readonly &&
 	    a->curr_action == idle &&
 	    a->prev_action == recover) {
 		/* A recovery has finished.  Some disks may be in sync now,


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] imsm: clean up missing disks if there are any left after migration
  2012-04-05 15:30 ` [PATCH 2/4] imsm: clean up missing disks if there are any left after migration Przemyslaw Czarnowski
@ 2012-04-05 17:06   ` Williams, Dan J
  0 siblings, 0 replies; 10+ messages in thread
From: Williams, Dan J @ 2012-04-05 17:06 UTC (permalink / raw)
  To: Przemyslaw Czarnowski; +Cc: neilb, linux-raid, ed.ciechanowski

On Thu, Apr 5, 2012 at 8:30 AM, Przemyslaw Czarnowski
<przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:
> It is for compatibility with OROM way to set "recovery" state.
>
> Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
> ---
>  super-intel.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/super-intel.c b/super-intel.c
> index e1cd9b8..f843997 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -7211,6 +7211,8 @@ skip_mark_checkpoint:
>        return consistent;
>  }
>
> +static void imsm_delete(struct intel_super *super, struct dl **dlp, unsigned index);
> +
>  static void imsm_set_disk(struct active_array *a, int n, int state)
>  {
>        int inst = a->info.container_member;
> @@ -7260,6 +7262,11 @@ static void imsm_set_disk(struct active_array *a, int n, int state)
>                        dprintf("while rebuilding");
>                        end_migration(dev, super, map_state);
>                        map = get_imsm_map(dev, MAP_0);
> +                       /* sweep list of missing disks, OROM might leave
> +                        * some residuals from old members */
> +                       while (super->missing) {
> +                               imsm_delete(super, &super->missing, super->missing->index);
> +                       }

No, the expectation is that the list of disks is only mutated by the
monitor from imsm_process_update() where we can be sure that the
manager thread is not walking the disk list.  I.e. we get away from
needing locks by just making sure the monitor can't race with the
manager.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] imsm: monitor: do not finish recovery, when raid goes to read-only
  2012-04-05 15:31 ` [PATCH 4/4] imsm: monitor: do not finish recovery, when raid goes to read-only Przemyslaw Czarnowski
@ 2012-04-05 17:37   ` Williams, Dan J
  0 siblings, 0 replies; 10+ messages in thread
From: Williams, Dan J @ 2012-04-05 17:37 UTC (permalink / raw)
  To: Przemyslaw Czarnowski; +Cc: neilb, linux-raid, ed.ciechanowski

On Thu, Apr 5, 2012 at 8:31 AM, Przemyslaw Czarnowski
<przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:
> If kernel forces raid to be read-only (eg. during reboot, and root
> partition lies on array) monitor should not finish recovery and leave
> metadata untouched.

I'm not sure I understand the point of this one, can you clarify?
Even though the rootfs is readonly at initial boot it may still
require the root block device to be readwrite.  For example xfs needs
to do journal playback even when mounting ro.  So if the monitor is
running at all it means that the array was started read-write (or at
least read-auto) and should be fine to complete recovery.

--
Dan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] imsm: avoid double change of serial number of missing device
  2012-04-05 15:30 ` [PATCH 3/4] imsm: avoid double change of serial number of missing device Przemyslaw Czarnowski
@ 2012-04-05 17:41   ` Williams, Dan J
  0 siblings, 0 replies; 10+ messages in thread
From: Williams, Dan J @ 2012-04-05 17:41 UTC (permalink / raw)
  To: Przemyslaw Czarnowski; +Cc: neilb, linux-raid, ed.ciechanowski

On Thu, Apr 5, 2012 at 8:30 AM, Przemyslaw Czarnowski
<przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:
> If degraded state is set in OROM, it do not set 'failed' flag for missing,
> device, just changes serial number, sets scsiId to 0xffff and sets highest
> bit in ord table in map. When mdadm replaces missing disk, the one left
> there is marked as failed and missing again (changing already marked serial
> number).
>
> Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
> ---
>  super-intel.c |   15 +++++++++------
>  1 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/super-intel.c b/super-intel.c
> index f843997..b361b72 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -6908,12 +6908,15 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
>        if (is_failed(disk) && (ord & IMSM_ORD_REBUILD))
>                return 0;
>
> -       memcpy(buf, disk->serial, MAX_RAID_SERIAL_LEN);
> -       buf[MAX_RAID_SERIAL_LEN] = '\000';
> -       strcat(buf, ":0");
> -       if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN)
> -               shift = len - MAX_RAID_SERIAL_LEN + 1;
> -       strncpy((char *)disk->serial, &buf[shift], MAX_RAID_SERIAL_LEN);
> +       /* do not add ":0", OROM has already done it */
> +       if (!(ord & IMSM_ORD_REBUILD)) {

What about the case where we are rebuilding to this disk and then get
a failure, doesn't this prevent the serial number from getting
updated?

The metadata expectation, as far as I can tell, is in-metadata-serial
!= serial-retrieved-from-disk.  So if it gets mutated twice what harm
does that cause?

> +               memcpy(buf, disk->serial, MAX_RAID_SERIAL_LEN);
> +               buf[MAX_RAID_SERIAL_LEN] = '\000';
> +               strcat(buf, ":0");
> +               if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN)
> +                       shift = len - MAX_RAID_SERIAL_LEN + 1;
> +               strncpy((char *)disk->serial, &buf[shift], MAX_RAID_SERIAL_LEN);
> +       }
>
>        disk->status |= FAILED_DISK;
>        set_imsm_ord_tbl_ent(map, slot, idx | IMSM_ORD_REBUILD);
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] imsm: monitor: do not finish migration if there are no failed disks
  2012-04-05 15:29 ` [PATCH 1/4] imsm: monitor: do not finish migration if there are no failed disks Przemyslaw Czarnowski
@ 2012-04-05 17:56   ` Williams, Dan J
  0 siblings, 0 replies; 10+ messages in thread
From: Williams, Dan J @ 2012-04-05 17:56 UTC (permalink / raw)
  To: Przemyslaw Czarnowski; +Cc: neilb, linux-raid, ed.ciechanowski

On Thu, Apr 5, 2012 at 8:29 AM, Przemyslaw Czarnowski
<przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:
> Transition from "degraded" to "recovery" made in OROM is slightly different
> than the same transision in mdadm. Missing disk is not removed from list of
> raid devices, but just from map. Therefore mdadm should not end migration
> basing on existence of list of missing disks but should rely on count of
> failed disks.
>
> Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
> ---
>  super-intel.c |    9 ++++++++-
>  1 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/super-intel.c b/super-intel.c
> index dad4c4d..e1cd9b8 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -6953,6 +6953,12 @@ static void handle_missing(struct intel_super *super, struct imsm_dev *dev)
>        if (!super->missing)
>                return;
>
> +       /* When orom adds replacement for missing disk it does
> +        * not remove entry of missing disk, but just updates map with
> +        * new added disk. So it is not enough just to test if there is
> +        * any missing disk, we have to look if there are any failed disks
> +        * in map to stop migration */
> +
>        dprintf("imsm: mark missing\n");
>        /* end process for initialization and rebuild only
>         */
> @@ -6963,7 +6969,8 @@ static void handle_missing(struct intel_super *super, struct imsm_dev *dev)
>                failed = imsm_count_failed(super, dev, MAP_0);
>                map_state = imsm_check_degraded(super, dev, failed, MAP_0);
>
> -               end_migration(dev, super, map_state);
> +               if (failed)
> +                       end_migration(dev, super, map_state);

Doesn't this need to be something like "failed >= max_degraded" since
some recovery scenarios can continue in the presence of failed disks
like raid10, raid6.  Can you describe a bit more what the user visible
behavior is when the OROM leaves these stale entries?

This may be the right thing to do, but handle_missing is also called
in the case where we know a disk we had previously has been removed
and in that case we certainly want to restart recovery, right?
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/4] Imsm OROM compatibility/boot support fixes
  2012-04-05 15:28 [PATCH 0/4] Imsm OROM compatibility/boot support fixes Przemyslaw Czarnowski
                   ` (3 preceding siblings ...)
  2012-04-05 15:31 ` [PATCH 4/4] imsm: monitor: do not finish recovery, when raid goes to read-only Przemyslaw Czarnowski
@ 2012-04-09 23:22 ` NeilBrown
  4 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2012-04-09 23:22 UTC (permalink / raw)
  To: Przemyslaw Czarnowski; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

[-- Attachment #1: Type: text/plain, Size: 1380 bytes --]

On Thu, 05 Apr 2012 17:28:12 +0200 Przemyslaw Czarnowski
<przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:

> This series of patches contain fixes regarding OROM and mdadm compatibility
> connected with handling of recovery/degraded states and missing disks.
> OROM behaves little differently when it marks disk as missing (see description
> of patches for details)
> Last one in the series does not allow to modify metadata (in particular does
> not finish migration) on disks when array is set to read only by kernel
> (eg. during restart of system, when root partition lies on raid device).
> 
> ---
> 
> Przemyslaw Czarnowski (4):
>       imsm: monitor: do not finish migration if there are no failed disks
>       imsm: clean up missing disks if there are any left after migration
>       imsm: avoid double change of serial number of missing device
>       imsm: monitor: do not finish recovery, when raid goes to read-only
> 
> 
>  monitor.c     |    1 +
>  super-intel.c |   33 ++++++++++++++++++++++++---------
>  2 files changed, 25 insertions(+), 9 deletions(-)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Given Dan's comments, I won't apply these just now. 

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-04-09 23:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-05 15:28 [PATCH 0/4] Imsm OROM compatibility/boot support fixes Przemyslaw Czarnowski
2012-04-05 15:29 ` [PATCH 1/4] imsm: monitor: do not finish migration if there are no failed disks Przemyslaw Czarnowski
2012-04-05 17:56   ` Williams, Dan J
2012-04-05 15:30 ` [PATCH 2/4] imsm: clean up missing disks if there are any left after migration Przemyslaw Czarnowski
2012-04-05 17:06   ` Williams, Dan J
2012-04-05 15:30 ` [PATCH 3/4] imsm: avoid double change of serial number of missing device Przemyslaw Czarnowski
2012-04-05 17:41   ` Williams, Dan J
2012-04-05 15:31 ` [PATCH 4/4] imsm: monitor: do not finish recovery, when raid goes to read-only Przemyslaw Czarnowski
2012-04-05 17:37   ` Williams, Dan J
2012-04-09 23:22 ` [PATCH 0/4] Imsm OROM compatibility/boot support fixes NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.