* [PATCH 00/29] OLCE, migrations and raid10 takeover
@ 2010-12-09 15:18 Adam Kwolek
  2010-12-09 15:18 ` [PATCH 01/29] Add state_of_reshape for external metadata Adam Kwolek
                   ` (29 more replies)
  0 siblings, 30 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:18 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

This series for mdadm introduces the following features (after some rework):
- Online Capacity Expansion (OLCE): patches 0001 to 0015
- Migrations: patches 0016 to 0023
- Takeover: patches 0024 to 0028
- Add spare to raid0: patch 0029

Changes made since last post:
1. rebased to latest devel 3.2
2. functions for finding array minor rewritten/collapsed into one function (util.c)
3. used open_dev instead of open
4. Almost all added code fits in 80 columns


!!! Please note that searching for spares has to be merged with the auto-rebuild feature.
    Krzysztof is working on this and will post a fix very soon.
    
Next steps:
- fix for spares management
- add checkpointing (working without md fix for moving suspend_hi)


Online Capacity Expansion for raid0 and raid5 arrays implements the following algorithm for container reshape:
1.      mdadm: Freeze container
2.      mdadm: Perform takeover to raid5 for all raid0 arrays in container (imsm for raid0 <->raid5 takeover requires no metadata updates)
3.      mdadm: set raid_disks sysfs entry for all arrays in container
4.      mdadm: prepares and sends metadata update using reshape_super() vector for first array in container.
5.      mdadm: waits for array idle or reshape state
6.      managemon: prepare_update(): allocates memory for bigger device object
7.      monitor: process_update(): applies update, relinks memory for device objects. Sets reshape_delta_disks variable in active array to the requested number of new disks
8.      monitor: kicks managemon when reshape_delta_disks has a value other than RESHAPE_NOT_ACTIVE and RESHAPE_IN_PROGRESS
9.      managemon: adds devices to md (let md set slot number on reshape start)
10.     managemon: sets sync_max to 0
11.     managemon: starts reshape in md
12.     managemon: on success sends slot verification message to monitor to update slots
13.     managemon: on failure sends reshape cancellation message (sets idle state to md)
14.     managemon: sets reshape_delta_disks variable to RESHAPE_IN_PROGRESS value to avoid managemon procedures reentry.
15.     monitor:
           a. for set slot message verifies and corrects (if necessary) slot information in metadata
           b. for cancel message rolls back metadata information, sets reshape_delta_disks variable to RESHAPE_NOT_ACTIVE
16.     mdadm:  on idle array state exits and unfreezes array. End
17.     mdadm: on reshape array state continues with reshape (it also sends ping to monitor and managemon to be sure that metadata updates hit the disks)
18.     mdadm: verifies array state and checks whether slots are set correctly
19.     mdadm: calls child_grow() function
20.     mdadm: waits for reshape finish
21.     monitor: on reshape finish sets reshape_delta_disks variable to RESHAPE_NOT_ACTIVE
22.     mdadm: sets array size according to information in metadata
23.     mdadm: for raid0 array backward takeover to raid0 is executed.
24.     mdadm: checks if another array in the container requires reshape; if yes, continues from step 4
25.     mdadm: unfreezes array
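The size calculation in step 22 follows the rounding done in the imsm code: the array size in 512-byte sectors is rounded down to a whole megabyte via SECT_PER_MB_SHIFT. A minimal sketch of that arithmetic (the function name is illustrative, not from the patches):

```c
/* 1 MB = 2^20 bytes = 2^11 sectors of 512 bytes */
#define SECT_PER_MB_SHIFT 11

/* Total array size in sectors for `data_disks` data members,
 * rounded down to a whole megabyte, as in the imsm reshape code. */
unsigned long long imsm_array_blocks(unsigned long long blocks_per_member,
                                     int data_disks)
{
	unsigned long long blocks;

	blocks = blocks_per_member * (unsigned long long)data_disks;
	/* drop the low 11 bits to round down to a 1 MB boundary */
	return (blocks >> SECT_PER_MB_SHIFT) << SECT_PER_MB_SHIFT;
}
```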

Migration feature reuses code flow introduced for OLCE (Online Capacity Expansion) and uses the same grow/reshape flow in mdadm/mdmon.
Migration works in the following way:
1. mdadm: reshape_super() prepares metadata update and sends it to mdmon
2. mdadm: waits for reshape array state
3. monitor: receives metadata update and applies it
4. monitor: metadata update triggers managemon
5. managemon: updates array (md) configuration and starts reshape
6. mdadm: finds that reshape is started and continues it using checkpointing
7. mdadm: reshape is finished and manage_reshape() finalizes the array:
    - Sets array size as is given in metadata
    - Performs takeover to raid0 if necessary

In the current patches the placement of the manage_reshape() function call was changed (patch 0019).
It is moved to the end of array processing, so the external metadata reshape case can use the common code from Grow.c (we do not need to duplicate existing code), as it would do the same
things as the code for native metadata. The new manage_reshape() placement leaves only a few implementation-specific tasks and simplifies the code.

Migrations command line:
1. Execute migration raid0->raid5:
    mdadm --grow /dev/md/array_name --level=5 --layout=left-asymmetric

    This converts n-disks raid0 array to (n+1)-disks raid5 array.
    An additional disk is used from the spares pool for the raid5 array.

2. Execute migration raid5->raid0:
    mdadm --grow /dev/md/array_name --level=0

    This converts n-disks raid5 array to n-disks raid0 array.

3. Execute chunk size migration
    mdadm --grow /dev/md/array_name --chunk=N

    where N is the new chunk size value

Online Capacity Expansion command line:
1. Add spares to the container, e.g.: mdadm --add /dev/md/imsm_container_name /dev/sdX
   Spares are required for raid0 as well. Patch "[PATCH 16] Add spares to raid0 array using takeover" enables this.
2. Execute reshape, e.g.: mdadm --grow /dev/md/imsm_container_name --raid-devices=requested_raid_disks_number
   Grow is executed for all arrays in the container the command is executed on.

The feature is treated as experimental due to Windows compatibility concerns during the reshape process; the code is guarded by the MDADM_EXPERIMENTAL environment variable.
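The MDADM_EXPERIMENTAL guard mentioned above can be sketched as follows. This is a simplified model of the check, not the real helper (the actual experimental() function lives in mdadm's util.c and also prints a hint to stderr):

```c
#include <stdlib.h>
#include <string.h>

/* Simplified model of the MDADM_EXPERIMENTAL guard: experimental
 * reshape code paths proceed only when the environment variable
 * is set to "1". */
static int experimental_enabled(void)
{
	const char *e = getenv("MDADM_EXPERIMENTAL");

	return e != NULL && strcmp(e, "1") == 0;
}
```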


---

Adam Kwolek (29):
      Add spares to raid0 in mdadm
      IMSM compatibility for raid0 -> raid10 takeover
      FIX: Problem with removing array after takeover
      Takeover raid0 -> raid10 for external metadata
      Takeover raid10 -> raid0 for external metadata
      Add takeover support for external meta
      Migration: Chunk size migration
      FIX: mdstat doesn't read chunk size correctly
      Read chunk size and layout from mdstat
      Migration raid0->raid5
      Detect level change
      Migration: raid5->raid0
      Change manage_reshape() placement
      Enable reshape for subarrays
      FIX: core during getting map
      WORKAROUND: md reports idle state during reshape start
      Add reshape progress updating
      Finalize reshape after adding disks to array
      Control reshape in mdadm
      imsm: Fill delta_disks field in getinfo_super()
      imsm: Do not indicate resync during reshape
      imsm: Do not accept messages sent by mdadm
      imsm: Cancel metadata changes on reshape start failure
      imsm: Verify slots in meta against slot numbers set by md
      Process reshape initialization by managemon
      imsm: Block array state change during reshape
      imsm: Process reshape_update in mdmon
      imsm: Prepare reshape_update in mdadm
      Add state_of_reshape for external metadata


 Grow.c        |  201 +++-
 Manage.c      |   14 
 managemon.c   |  220 +++++
 mdadm.h       |   39 +
 mdmon.c       |   57 +
 mdmon.h       |   10 
 mdstat.c      |   14 
 monitor.c     |   38 +
 super-intel.c | 2668 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 sysfs.c       |  174 ++++
 util.c        |   34 +
 11 files changed, 3389 insertions(+), 80 deletions(-)

-- 
Adam

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 01/29] Add state_of_reshape for external metadata
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
@ 2010-12-09 15:18 ` Adam Kwolek
  2010-12-09 15:18 ` [PATCH 02/29] imsm: Prepare reshape_update in mdadm Adam Kwolek
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:18 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

During reshape we have to know what the current reshape state is:
reshape_not_active: reshape is not started; the array is in a state other than reshape
reshape_is_starting: reshape is about to start; the metadata is probably updated,
                     and the array in md can be in reshape state.
                     In this state mdmon should not allow array rebuilds,
                     as reconfiguration is in progress.
                     When everything goes fine, the next state should be reshape_in_progress;
                     in the error case reshape_cancel_request should be reached.
reshape_in_progress: md is in reshape state and reshape is in progress;
                     when reshape ends, state_of_reshape returns to reshape_not_active
reshape_cancel_request: a reshape cancel request is issued in the error case.
                        During this state the metadata rollback should occur.
                        From this state state_of_reshape should go to the reshape_not_active state

The reshape_delta_disks field should contain a valid value in the reshape_in_progress state
and tells how many disks are being added to the array.
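The transitions described above can be modeled compactly (a sketch for illustration, not the mdmon code; `ok` stands for the success or failure of the step driving each transition):

```c
/* reshape states as introduced by this patch */
enum state_of_reshape {
	reshape_not_active,
	reshape_is_starting,
	reshape_in_progress,
	reshape_cancel_request
};

/* One transition of the reshape state machine described above. */
enum state_of_reshape reshape_next(enum state_of_reshape s, int ok)
{
	switch (s) {
	case reshape_not_active:
		/* metadata update applied: reshape is about to start */
		return ok ? reshape_is_starting : reshape_not_active;
	case reshape_is_starting:
		/* managemon started reshape in md, or hit an error */
		return ok ? reshape_in_progress : reshape_cancel_request;
	case reshape_in_progress:
		/* reshape finished in md */
		return ok ? reshape_not_active : reshape_in_progress;
	case reshape_cancel_request:
		/* monitor rolled the metadata back */
		return reshape_not_active;
	}
	return reshape_not_active;
}
```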

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c |    2 ++
 mdmon.h     |    5 +++++
 monitor.c   |   12 +++++++++++-
 3 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/managemon.c b/managemon.c
index ebd9b73..945b173 100644
--- a/managemon.c
+++ b/managemon.c
@@ -521,6 +521,8 @@ static void manage_new(struct mdstat_ent *mdstat,
 
 	new->container = container;
 
+	new->reshape_state = reshape_not_active;
+
 	inst = to_subarray(mdstat, container->devname);
 
 	new->info.array = mdi->array;
diff --git a/mdmon.h b/mdmon.h
index 5c51566..6f8b439 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -23,6 +23,8 @@ enum array_state { clear, inactive, suspended, readonly, read_auto,
 
 enum sync_action { idle, reshape, resync, recover, check, repair, bad_action };
 
+enum state_of_reshape { reshape_not_active, reshape_is_starting,
+			reshape_in_progress, reshape_cancel_request };
 
 struct active_array {
 	struct mdinfo info;
@@ -45,6 +47,9 @@ struct active_array {
 	enum array_state prev_state, curr_state, next_state;
 	enum sync_action prev_action, curr_action, next_action;
 
+	enum state_of_reshape reshape_state;
+	int reshape_delta_disks;
+
 	int check_degraded; /* flag set by mon, read by manage */
 
 	int devnum;
diff --git a/monitor.c b/monitor.c
index f166bc8..5298fa1 100644
--- a/monitor.c
+++ b/monitor.c
@@ -399,8 +399,18 @@ static int read_and_act(struct active_array *a)
 		signal_manager();
 	}
 
-	if (deactivate)
+	if (deactivate) {
 		a->container = NULL;
+		/* break reshape also
+		 */
+		if (a->reshape_state !=  reshape_in_progress)
+			a->reshape_state = reshape_not_active;
+	}
+
+	/* signal manager when reshape is in reshape_is_starting state
+	 */
+	if (a->reshape_state == reshape_is_starting)
+		signal_manager();
 
 	return dirty;
 }



* [PATCH 02/29] imsm: Prepare reshape_update in mdadm
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
  2010-12-09 15:18 ` [PATCH 01/29] Add state_of_reshape for external metadata Adam Kwolek
@ 2010-12-09 15:18 ` Adam Kwolek
  2010-12-14  0:07   ` Neil Brown
  2010-12-09 15:19 ` [PATCH 03/29] imsm: Process reshape_update in mdmon Adam Kwolek
                   ` (27 subsequent siblings)
  29 siblings, 1 reply; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:18 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

During Online Capacity Expansion the metadata has to be updated to reflect
array changes and to allow future assembly of the array.
To do this, mdadm prepares and sends a reshape_update metadata update to mdmon.
The update is sent for one array in the container. It contains the updated device
and the spares that have to be turned into array members.
For spares we have 2 cases:
1. For the first array in the container:
   reshape_delta_disks: shows how many disks will be added to the array.
   Spares are sent in the update, and the spares_in_update variable in the metadata update tells mdmon to turn spares into array
   (in the IMSM meaning) members.
2. For the 2nd array in the container:
   reshape_delta_disks: shows how many disks will be added to the array, exactly as in the first case.
   The spares were already turned into array members (they are no longer spares), so for this volume we
   only reuse those disks.

This update will change the active array state to the reshape_is_starting state.
This works in the following way:
1. reshape_super() prepares the metadata update and sends it to mdmon
2. managemon in prepare_update() allocates the required memory for the bigger device object
3. monitor in process_update() updates (replaces) the device object with the information
   passed from mdadm (memory was allocated by managemon)
4. process_update() then:
   - sets the reshape_delta_disks variable to the reshape_delta_disks value from the update
   - sets the array into the reshape_is_starting state.
5. This signals managemon to add devices to md, start reshape for this array,
   and put the array into reshape_in_progress.
   Managemon can request the reshape_cancel_request state in the error case.
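The memory managemon allocates in step 2 is sized up front by mdadm; the arithmetic in imsm_create_metadata_update_for_reshape() can be modeled like this (the struct sizes below are placeholders, not the real imsm struct sizes):

```c
/* Placeholder sizes standing in for sizeof() of the real imsm structs. */
enum {
	UPDATE_HDR_SIZE = 64,	/* struct imsm_update_reshape (one dl inside) */
	IMSM_DEV_SIZE   = 52,	/* struct imsm_dev   */
	IMSM_MAP_SIZE   = 48,	/* struct imsm_map   */
	ORD_ENTRY_SIZE  = 4,	/* per-member ordinal (__u32) */
	DL_SIZE         = 96	/* struct dl         */
};

/* Mirrors the update-size arithmetic: each copied device carries two
 * maps with one ordinal per raid disk beyond the first, and spare
 * entries beyond the one embedded in the header are appended. */
int reshape_update_memory_size(int raid_disks, int num_raid_devs,
			       int delta_disks)
{
	int device_size = IMSM_DEV_SIZE + IMSM_MAP_SIZE
			  + 2 * (raid_disks - 1) * ORD_ENTRY_SIZE;
	int size = UPDATE_HDR_SIZE + device_size * num_raid_devs;

	if (delta_disks > 1)
		size += DL_SIZE * (delta_disks - 1);
	return size;
}
```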

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 mdadm.h       |    2 
 super-intel.c |  710 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 sysfs.c       |  174 ++++++++++++++
 util.c        |   34 +++
 4 files changed, 920 insertions(+), 0 deletions(-)

diff --git a/mdadm.h b/mdadm.h
index 175d228..ba3a9c5 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -494,6 +494,7 @@ extern int reshape_open_backup_file(char *backup,
 				    unsigned long long *offsets);
 extern unsigned long compute_backup_blocks(int nchunk, int ochunk,
 					   unsigned int ndata, unsigned int odata);
+extern struct mdinfo *sysfs_get_unused_spares(int container_fd, int fd);
 
 extern int save_stripes(int *source, unsigned long long *offsets,
 			int raid_disks, int chunk_size, int level, int layout,
@@ -1060,6 +1061,7 @@ extern int conf_name_is_free(char *name);
 extern int devname_matches(char *name, char *match);
 extern struct mddev_ident *conf_match(struct mdinfo *info, struct supertype *st);
 extern int experimental(void);
+extern struct mdstat_ent *find_array_by_subdev(char *subdev, int container);
 
 extern void free_line(char *line);
 extern int match_oneof(char *devices, char *devname);
diff --git a/super-intel.c b/super-intel.c
index 2943898..183b82c 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -285,6 +285,7 @@ enum imsm_update_type {
 	update_kill_array,
 	update_rename_array,
 	update_add_disk,
+	update_reshape,
 };
 
 struct imsm_update_activate_spare {
@@ -295,6 +296,43 @@ struct imsm_update_activate_spare {
 	struct imsm_update_activate_spare *next;
 };
 
+struct geo_params {
+	int dev_id;
+	char *dev_name;
+	long long size;
+	int level;
+	int layout;
+	int chunksize;
+	int raid_disks;
+};
+
+
+struct imsm_update_reshape {
+	enum imsm_update_type type;
+	int update_memory_size;
+	int reshape_delta_disks;
+	int disks_count;
+	int spares_in_update;
+	int devnum;
+	/* pointers to memory that will be allocated
+	 * by manager during prepare_update()
+	 */
+	struct intel_dev devs_mem;
+	/* status of update preparation
+	 */
+	int update_prepared;
+	/* anchor data prepared by mdadm */
+	int upd_devs_offset;
+	int device_size;
+	struct dl upd_disks[1];
+	/* here goes added spares
+	 */
+	/* and here goes imsm_devs pointed by upd_devs
+	 * devs are put here as row data every device_size bytes
+	 *
+	 */
+};
+
 struct disk_info {
 	__u8 serial[MAX_RAID_SERIAL_LEN];
 };
@@ -5271,6 +5309,9 @@ static void imsm_process_update(struct supertype *st,
 	mpb = super->anchor;
 
 	switch (type) {
+	case update_reshape: {
+		break;
+	}
 	case update_activate_spare: {
 		struct imsm_update_activate_spare *u = (void *) update->buf; 
 		struct imsm_dev *dev = get_imsm_dev(super, u->array);
@@ -5590,6 +5631,9 @@ static void imsm_prepare_update(struct supertype *st,
 	size_t len = 0;
 
 	switch (type) {
+	case update_reshape: {
+		break;
+	}
 	case update_create_array: {
 		struct imsm_update_create_array *u = (void *) update->buf;
 		struct intel_dev *dv;
@@ -5743,6 +5787,671 @@ static const char *imsm_get_disk_controller_domain(const char *path)
 		return NULL;
 }
 
+int imsm_find_array_minor_by_subdev(int subdev, int container, int *minor)
+{
+	char subdev_name[PATH_MAX];
+	struct mdstat_ent *mdstat;
+
+	sprintf(subdev_name, "%d", subdev);
+	mdstat = find_array_by_subdev(subdev_name, container);
+	if (mdstat) {
+		*minor = mdstat->devnum;
+		while (mdstat) {
+			struct mdstat_ent *ent;
+			ent = mdstat;
+			mdstat = mdstat->next;
+			ent->next = NULL;
+			free_mdstat(ent);
+		}
+		return 0;
+	}
+
+	return -1;
+}
+
+int imsm_reshape_is_allowed_on_container(struct supertype *st,
+					 struct geo_params *geo)
+{
+	int ret_val = 0;
+	struct mdinfo *info = NULL;
+	char buf[PATH_MAX];
+	int fd = -1;
+	int device_num = -1;
+	int devices_that_can_grow = 0;
+
+	dprintf("imsm: imsm_reshape_is_allowed_on_container(ENTER): "\
+		"st->devnum = (%i)\n",
+		st->devnum);
+
+	if (geo == NULL ||
+	    (geo->size != -1) || (geo->level != UnSet) ||
+	    (geo->layout != UnSet) || (geo->chunksize != 0)) {
+		dprintf("imsm: Container operation is allowed for "\
+			"raid disks number change only.\n");
+		return ret_val;
+	}
+
+	dprintf("imsm: open device (/dev/md%i)\n", st->devnum);
+	fd = open_dev(st->devnum);
+	if (fd < 0) {
+		dprintf("imsm: cannot open device\n");
+		return ret_val;
+	}
+
+	if (geo->raid_disks == UnSet) {
+		dprintf("imsm: for container operation raid disks "\
+			"change is required\n");
+		goto exit_imsm_reshape_is_allowed_on_container;
+	}
+
+	device_num = 0; /* start from first device (skip container info) */
+	while (device_num > -1) {
+		int result;
+		int minor;
+		unsigned long long array_blocks;
+		struct imsm_map *map = NULL;
+		struct imsm_dev *dev = NULL;
+		struct intel_super *super = NULL;
+		int used_disks;
+
+
+		dprintf("imsm: checking device_num: %i\n", device_num);
+		super = st->sb;
+		super->current_vol = device_num;
+		st->ss->load_super(st, fd, NULL);
+		if (st->sb == NULL) {
+			if (device_num == 0) {
+				/* for the first checked device this is error
+				   there should be at least one device to check
+				 */
+				dprintf("imsm: error: superblock is NULL "\
+					"during container operation\n");
+			} else {
+				dprintf("imsm: no more devices to check, "\
+					"number of found devices: %i\n",
+					devices_that_can_grow);
+				/* check if any device in container
+				 * can be grown
+				 */
+				if (devices_that_can_grow)
+					ret_val = 1;
+			}
+			break;
+		}
+		info = sysfs_read(fd,
+				  0,
+				  GET_LEVEL|GET_VERSION|GET_DEVS|GET_STATE);
+		if (info == NULL) {
+			dprintf("imsm: Cannot get device info.\n");
+			break;
+		}
+		super = st->sb;
+		super->current_vol = device_num;
+		st->ss->getinfo_super(st, info, NULL);
+		if ((info->name == NULL) ||
+		    (strlen(info->name) == 0)) {
+			/* no more to load
+			 */
+			dprintf("imsm: no more devices to check, number "\
+				"of found devices: %i\n",
+				devices_that_can_grow);
+			/* check if any device in container can be grown
+			 */
+			if (devices_that_can_grow)
+				ret_val = 1;
+			break;
+		}
+
+		if (geo->raid_disks < info->array.raid_disks) {
+			/* we work on container for Online Capacity Expansion
+			 * only so raid_disks has to grow
+			 */
+			dprintf("imsm: for container operation raid disks "\
+				"increase is required\n");
+			break;
+		}
+		/* check if size is set correctly;
+		 * wrong conditions could happen
+		 * when a previous reshape was interrupted
+		 */
+		super = st->sb;
+		dev = get_imsm_dev(super, device_num);
+		if (dev == NULL) {
+			dprintf("cannot get imsm device\n");
+			ret_val = 0;
+			break;
+		}
+		map = get_imsm_map(dev, 0);
+		if (map == NULL) {
+			dprintf("cannot get imsm device map\n");
+			ret_val = 0;
+			break;
+		}
+		used_disks = imsm_num_data_members(dev);
+		dprintf("read raid_disks = %i\n", used_disks);
+		dprintf("read requested disks = %i\n", geo->raid_disks);
+		array_blocks = map->blocks_per_member * used_disks;
+		/* round array size down to closest MB
+		 */
+		array_blocks = (array_blocks >> SECT_PER_MB_SHIFT)
+				<< SECT_PER_MB_SHIFT;
+		if (sysfs_set_num(info, NULL, "array_size", array_blocks/2) < 0)
+			dprintf("cannot set array size to %llu\n",
+				array_blocks/2);
+
+		if (geo->raid_disks > info->array.raid_disks)
+			devices_that_can_grow++;
+
+		if ((info->array.level != 0) &&
+		    (info->array.level != 5)) {
+			/* we cannot use this container other raid level
+			 */
+			dprintf("imsm: for container operation wrong"\
+				" raid level (%i) detected\n",
+				info->array.level);
+			break;
+		} else {
+			/* check for platform support
+			 *for this raid level configuration
+			 */
+			struct intel_super *super = st->sb;
+			if (!is_raid_level_supported(super->orom,
+						     info->array.level,
+						     geo->raid_disks)) {
+				dprintf("platform does not support raid%d with"\
+					" %d disk%s\n",
+					 info->array.level,
+					 geo->raid_disks,
+					 geo->raid_disks > 1 ? "s" : "");
+				break;
+			}
+		}
+
+		/* all raid5 and raid0 volumes in container
+		 * has to be ready for Online Capacity Expansion
+		 */
+		result = imsm_find_array_minor_by_subdev(device_num,
+							 st->container_dev,
+							 &minor);
+		if (result < 0) {
+			dprintf("imsm: cannot find array\n");
+			break;
+		}
+		sprintf(info->sys_name, "md%i", minor);
+		if (sysfs_get_str(info, NULL, "array_state", buf, 20) <= 0) {
+			dprintf("imsm: cannot read array state\n");
+			break;
+		}
+		if ((strncmp(buf, "clean", 5) != 0) &&
+		    (strncmp(buf, "clear", 5) != 0) &&
+		    (strncmp(buf, "active", 6) != 0)) {
+			int index = strlen(buf) - 1;
+
+			if (index < 0)
+				index = 0;
+			*(buf + index) = 0;
+			fprintf(stderr, "imsm: Error: Array %s is not in "\
+				"proper state (current state: %s). "\
+				"Cannot continue.\n",
+				info->sys_name,
+				buf);
+			break;
+		}
+		if (info->array.level > 0) {
+			if (sysfs_get_str(info,
+					  NULL,
+					  "sync_action",
+					  buf,
+					  20) <= 0) {
+				dprintf("imsm: for container operation "\
+					"no sync action\n");
+				break;
+			}
+			/* check if any reshape is not in progress
+			 */
+			if (strncmp(buf, "reshape", 7) == 0) {
+				dprintf("imsm: for container operation reshape"\
+					" is currently in progress\n");
+				break;
+			}
+		}
+		sysfs_free(info);
+		info = NULL;
+		device_num++;
+	}
+	sysfs_free(info);
+	info = NULL;
+
+exit_imsm_reshape_is_allowed_on_container:
+	if (fd >= 0)
+		close(fd);
+
+	dprintf("imsm: imsm_reshape_is_allowed_on_container(Exit) "\
+		"device_num = %i, ret_val = %i\n",
+		device_num,
+		ret_val);
+	if (ret_val)
+		dprintf("\tContainer operation allowed\n");
+	else
+		dprintf("\tError: %i\n", ret_val);
+
+	return ret_val;
+}
+
+struct mdinfo *get_spares_imsm(int devnum)
+{
+	int fd = -1;
+	struct mdinfo *info = NULL;
+	struct mdinfo *ret_val = NULL;
+	int cont_fd = -1;
+	struct supertype *st = NULL;
+	int find_result;
+	struct intel_super *super = NULL;
+
+	dprintf("imsm: get_spares_imsm for device: %i.\n", devnum);
+
+	cont_fd = open_dev(devnum);
+	if (cont_fd < 0) {
+		dprintf("imsm: ERROR: Cannot open container.\n");
+		goto abort;
+	}
+
+	/* get first volume */
+	st = super_by_fd(cont_fd, NULL);
+	if (st == NULL) {
+		dprintf("imsm: ERROR: Cannot load container information.\n");
+		goto abort;
+	}
+	find_result = imsm_find_array_minor_by_subdev(0, devnum, &devnum);
+	if (find_result < 0) {
+		dprintf("imsm: ERROR: Cannot find array.\n");
+		goto abort;
+	}
+	fd = open_dev(devnum);
+	if (fd < 0) {
+		dprintf("imsm: ERROR: Cannot open device.\n");
+		goto abort;
+	}
+	st->ss->load_super(st, cont_fd, NULL);
+	if (st->sb == NULL) {
+		dprintf("imsm: ERROR: Cannot load array information.\n");
+		goto abort;
+	}
+	info = sysfs_read(fd,
+			  0,
+			  GET_LEVEL | GET_VERSION | GET_DEVS | GET_STATE);
+	if (info == NULL) {
+		dprintf("imsm: Cannot get device info.\n");
+		goto abort;
+	}
+	super = st->sb;
+	super->current_vol = 0;
+	st->ss->getinfo_super(st, info, NULL);
+	ret_val = sysfs_get_unused_spares(cont_fd, fd);
+	if (ret_val == NULL) {
+		dprintf("imsm: ERROR: Cannot get spare devices.\n");
+		goto abort;
+	}
+	if (ret_val->array.spare_disks == 0) {
+		dprintf("imsm: ERROR: No available spares.\n");
+		free(ret_val);
+		ret_val = NULL;
+		goto abort;
+	}
+
+abort:
+	if (st)
+		st->ss->free_super(st);
+	sysfs_free(info);
+	if (fd > -1)
+		close(fd);
+	if (cont_fd > -1)
+		close(cont_fd);
+
+	return ret_val;
+}
+
+/******************************************************************************
+ * function: imsm_create_metadata_update_for_reshape
+ * Function creates update for whole IMSM container.
+ * Slot number for new devices are guesed only. Managemon will correct them
+ * Slot numbers for new devices are guessed only. Managemon will correct them
+ * when reshape is triggered and md sets slot numbers.
+ ******************************************************************************/
+struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(
+	struct supertype *st,
+	struct geo_params *geo)
+{
+	struct imsm_update_reshape *ret_val = NULL;
+	struct intel_super *super = st->sb;
+	int update_memory_size = 0;
+	struct imsm_update_reshape *u = NULL;
+	struct imsm_map *new_map = NULL;
+	struct mdinfo *spares = NULL;
+	int i;
+	unsigned long long array_blocks;
+	int used_disks;
+	int delta_disks = 0;
+	struct dl *new_disks;
+	int device_size;
+	void *upd_devs;
+
+	dprintf("imsm_update_metadata_for_reshape(enter) raid_disks = %i\n",
+		geo->raid_disks);
+
+	if ((geo->raid_disks < super->anchor->num_disks) ||
+	    (geo->raid_disks == UnSet))
+		geo->raid_disks = super->anchor->num_disks;
+	delta_disks = geo->raid_disks - super->anchor->num_disks;
+
+	/* size of all update data without anchor */
+	update_memory_size = sizeof(struct imsm_update_reshape);
+	/* add space for all devices,
+	 * then add maps space
+	 */
+	device_size = sizeof(struct imsm_dev);
+	device_size += sizeof(struct imsm_map);
+	device_size += 2 * (geo->raid_disks - 1) * sizeof(__u32);
+
+	update_memory_size += device_size * super->anchor->num_raid_devs;
+	if (delta_disks > 1) {
+		/* now add space for spare disks information
+		 */
+		update_memory_size += sizeof(struct dl) * (delta_disks - 1);
+	}
+
+	u = calloc(1, update_memory_size);
+	if (u == NULL) {
+		dprintf("error: "\
+			"cannot get memory for imsm_update_reshape update\n");
+		return ret_val;
+	}
+	u->reshape_delta_disks = delta_disks;
+	u->update_prepared = -1;
+	u->update_memory_size = update_memory_size;
+	u->type = update_reshape;
+	u->spares_in_update = 0;
+	u->upd_devs_offset =  sizeof(struct imsm_update_reshape) +
+				sizeof(struct dl) * (delta_disks - 1);
+	upd_devs = (struct imsm_dev *)((void *)u + u->upd_devs_offset);
+	u->device_size = device_size;
+
+	for (i = 0; i < super->anchor->num_raid_devs; i++) {
+		struct imsm_dev *old_dev = __get_imsm_dev(super->anchor, i);
+		int old_disk_number;
+		int devnum = -1;
+
+		u->devnum = -1;
+		if (old_dev == NULL)
+			break;
+
+		if (st->devnum == st->container_dev)
+			imsm_find_array_minor_by_subdev(i, st->devnum, &devnum);
+		else
+			devnum = st->devnum;
+		if (devnum == geo->dev_id) {
+			__u8 to_state;
+			struct imsm_map *new_map2;
+			int idx;
+
+			new_map = NULL;
+			imsm_copy_dev(upd_devs, old_dev);
+			new_map = get_imsm_map(upd_devs, 0);
+			old_disk_number = new_map->num_members;
+			new_map->num_members = geo->raid_disks;
+			u->reshape_delta_disks = new_map->num_members -
+						 old_disk_number;
+			/* start migration on new device
+			 * it puts second map there also
+			 */
+
+			to_state = imsm_check_degraded(super, old_dev, 0);
+			migrate(upd_devs, to_state, MIGR_GEN_MIGR);
+			/* second map length is equal to first map
+			* correct second map length to old value
+			*/
+			new_map2 = get_imsm_map(upd_devs, 1);
+			if (new_map2) {
+				if (new_map2->num_members != old_disk_number) {
+					new_map2->num_members = old_disk_number;
+					/* guess new disk indexes
+					*/
+					for (idx = new_map2->num_members;
+					     idx < new_map->num_members;
+					     idx++)
+						set_imsm_ord_tbl_ent(new_map,
+								     idx,
+								     idx);
+				}
+				u->devnum = geo->dev_id;
+				break;
+			}
+		}
+	}
+
+	if (delta_disks <= 0) {
+		dprintf("imsm: reshape without grow (disk add).\n");
+		/* finalize update */
+		goto calculate_size_only;
+	}
+
+	/* now get spare disks list
+	 */
+	spares = get_spares_imsm(st->container_dev);
+
+	if (spares == NULL) {
+		dprintf("imsm: ERROR: Cannot get spare devices.\n");
+		goto exit_imsm_create_metadata_update_for_reshape;
+	}
+	if ((spares->array.spare_disks == 0) ||
+	(u->reshape_delta_disks > spares->array.spare_disks)) {
+		dprintf("imsm: ERROR: No available spares.\n");
+		goto exit_imsm_create_metadata_update_for_reshape;
+	}
+	/* we have got spares
+	 * update disk list in imsm_disk list table in anchor
+	 */
+	dprintf("imsm: %i spares are available.\n\n",
+		spares->array.spare_disks);
+	new_disks = u->upd_disks;
+	for (i = 0; i < u->reshape_delta_disks; i++) {
+		struct mdinfo *dev = spares->devs;
+		__u32 id;
+		int fd;
+		char buf[PATH_MAX];
+		int rv;
+		unsigned long long size;
+
+		sprintf(buf, "%d:%d", dev->disk.major, dev->disk.minor);
+		dprintf("open spare disk %s (%s)\n", buf, dev->sys_name);
+		fd = dev_open(buf, O_RDWR);
+		if (fd < 0) {
+			dprintf("\topen failed\n");
+			goto exit_imsm_create_metadata_update_for_reshape;
+		}
+		if (sysfs_disk_to_scsi_id(fd, &id) == 0)
+			new_disks[i].disk.scsi_id = __cpu_to_le32(id);
+		else
+			new_disks[i].disk.scsi_id = __cpu_to_le32(0);
+		new_disks[i].disk.status = CONFIGURED_DISK;
+		rv = imsm_read_serial(fd, NULL, new_disks[i].disk.serial);
+		if (rv != 0) {
+			dprintf("\tcannot read disk serial\n");
+			close(fd);
+			goto exit_imsm_create_metadata_update_for_reshape;
+		}
+		dprintf("\tdisk serial: %s\n", new_disks[i].disk.serial);
+		get_dev_size(fd, NULL, &size);
+		size /= 512;
+		new_disks[i].disk.total_blocks = __cpu_to_le32(size);
+		new_disks[i].disk.owner_cfg_num =
+			super->anchor->disk->owner_cfg_num;
+
+		new_disks[i].major = dev->disk.major;
+		new_disks[i].minor = dev->disk.minor;
+		/* no relink in update
+		 * use table access
+		 */
+		new_disks[i].next = NULL;
+
+		close(fd);
+		spares->devs = dev->next;
+		u->spares_in_update++;
+
+		free(dev);
+		dprintf("\n");
+	}
+calculate_size_only:
+	/* calculate new size
+	 */
+	if (new_map != NULL) {
+
+		used_disks = imsm_num_data_members(upd_devs);
+		if (used_disks) {
+			array_blocks = new_map->blocks_per_member * used_disks;
+			/* round array size down to closest MB
+			 */
+			array_blocks = (array_blocks >> SECT_PER_MB_SHIFT)
+					<< SECT_PER_MB_SHIFT;
+			((struct imsm_dev *)(upd_devs))->size_low =
+				__cpu_to_le32((__u32)array_blocks);
+			((struct imsm_dev *)(upd_devs))->size_high =
+				__cpu_to_le32((__u32)(array_blocks >> 32));
+			/* finalize update */
+			ret_val = u;
+		}
+	}
+
+exit_imsm_create_metadata_update_for_reshape:
+	/* free spares
+	 */
+	if (spares) {
+		while (spares->devs) {
+			struct mdinfo *dev = spares->devs;
+			spares->devs = dev->next;
+			free(dev);
+		}
+		free(spares);
+	}
+
+	if (ret_val == NULL)
+		free(u);
+
+	return ret_val;
+}
+
+int get_volume_for_olce(struct supertype *st, int raid_disks)
+{
+	int ret_val = -1;
+	struct mdinfo *sra = NULL;
+	struct mdinfo info;
+	struct intel_super *super = st->sb;
+	int i;
+	int fd = -1;
+
+	dprintf("imsm: open device (/dev/md%i)\n", st->devnum);
+	fd = open_dev(st->devnum);
+	if (fd < 0) {
+		dprintf("imsm: cannot open device\n");
+		return ret_val;
+	}
+
+	super = st->sb;
+	if (super == NULL)
+		goto exit_get_volume_for_olce;
+
+	for (i = 0; i < super->anchor->num_raid_devs; i++) {
+		struct intel_super *super = NULL;
+
+		info.devs = NULL;
+		super = st->sb;
+		super->current_vol = i;
+		st->ss->getinfo_super(st, &info, NULL);
+
+		if (raid_disks > info.array.raid_disks) {
+			dprintf("Found device requested raid_disks = %i, "\
+				"array raid_disks = %i\n",
+				raid_disks, info.array.raid_disks);
+			ret_val = i;
+			break;
+		}
+	}
+
+exit_get_volume_for_olce:
+	sysfs_free(sra);
+	if (fd > -1)
+		close(fd);
+
+	return ret_val;
+}
+
+int imsm_reshape_super(struct supertype *st, long long size, int level,
+		       int layout, int chunksize, int raid_disks,
+		       char *backup, char *dev, int verbose)
+{
+	int ret_val = 1;
+	struct geo_params geo;
+
+	dprintf("imsm: reshape_super called.\n");
+
+	memset(&geo, 0, sizeof(struct geo_params));
+
+	geo.dev_name = dev;
+	geo.size = size;
+	geo.level = level;
+	geo.layout = layout;
+	geo.chunksize = chunksize;
+	geo.raid_disks = raid_disks;
+
+	dprintf("\tfor level      : %i\n", geo.level);
+	dprintf("\tfor raid_disks : %i\n", geo.raid_disks);
+
+	if (experimental() == 0)
+		return ret_val;
+
+	/* verify reshape conditions
+	 * on container level we can do almost everything */
+	if (st->container_dev == st->devnum) {
+		/* check for delta_disks > 0
+		 * and supported raid levels 0 and 5 only in container */
+		if (imsm_reshape_is_allowed_on_container(st, &geo)) {
+			struct imsm_update_reshape *u;
+			int array;
+
+			array = get_volume_for_olce(st, geo.raid_disks);
+			if (array >= 0) {
+				imsm_find_array_minor_by_subdev(array,
+								st->devnum,
+								&geo.dev_id);
+				if (geo.dev_id > 0) {
+					dprintf("imsm: Preparing metadata"\
+						" update for subarray: %i\n",
+						array);
+
+					st->update_tail = &st->updates;
+					u = imsm_create_metadata_update_for_reshape(st, &geo);
+
+					if (u) {
+						ret_val = 0;
+						append_metadata_update(st,
+							u,
+							u->update_memory_size);
+					} else
+						dprintf("imsm: Cannot prepare "\
+							"update\n");
+				} else
+					dprintf("imsm: Cannot find array "\
+						"in container\n");
+			}
+		} else
+			dprintf("imsm: Operation is not allowed "\
+				"on container\n");
+	} else
+		dprintf("imsm: not a container operation\n");
+
+	dprintf("imsm: reshape_super Exit code = %i\n", ret_val);
+	return ret_val;
+}
 
 struct superswitch super_imsm = {
 #ifndef	MDASSEMBLE
@@ -5779,6 +6488,7 @@ struct superswitch super_imsm = {
 	.container_content = container_content_imsm,
 	.default_geometry = default_geometry_imsm,
 	.get_disk_controller_domain = imsm_get_disk_controller_domain,
+	.reshape_super  = imsm_reshape_super,
 
 	.external	= 1,
 	.name = "imsm",
diff --git a/sysfs.c b/sysfs.c
index 7a0403d..ca29aab 100644
--- a/sysfs.c
+++ b/sysfs.c
@@ -801,6 +801,180 @@ int sysfs_unique_holder(int devnum, long rdev)
 		return found;
 }
 
+int sysfs_is_spare_device_belongs_to(int fd, char *devname)
+{
+	int ret_val = -1;
+	char fname[PATH_MAX];
+	char *base;
+	char *dbase;
+	struct mdinfo *sra;
+	DIR *dir = NULL;
+	struct dirent *de;
+
+	sra = malloc(sizeof(*sra));
+	if (sra == NULL)
+		goto abort;
+	memset(sra, 0, sizeof(*sra));
+	sysfs_init(sra, fd, -1);
+	if (sra->sys_name[0] == 0)
+		goto abort;
+
+	memset(fname, 0, PATH_MAX);
+	sprintf(fname, "/sys/block/%s/md/", sra->sys_name);
+	base = fname + strlen(fname);
+
+	/* Get all the devices as well */
+	*base = 0;
+	dir = opendir(fname);
+	if (!dir)
+		goto abort;
+	while ((de = readdir(dir)) != NULL) {
+		if (de->d_ino == 0 ||
+		    strncmp(de->d_name, "dev-", 4) != 0)
+			continue;
+		strcpy(base, de->d_name);
+		dbase = base + strlen(base);
+		*dbase = '\0';
+		dbase = strstr(fname, "/md/");
+		if (dbase && strcmp(devname, dbase) == 0) {
+			ret_val = 1;
+			goto abort;
+		}
+	}
+abort:
+	if (dir)
+		closedir(dir);
+	sysfs_free(sra);
+
+	return ret_val;
+}
+
+struct mdinfo *sysfs_get_unused_spares(int container_fd, int fd)
+{
+	char fname[PATH_MAX];
+	char buf[PATH_MAX];
+	char *base;
+	char *dbase;
+	struct mdinfo *ret_val;
+	struct mdinfo *dev;
+	DIR *dir = NULL;
+	struct dirent *de;
+	int is_in;
+	char *to_check;
+
+	ret_val = malloc(sizeof(*ret_val));
+	if (ret_val == NULL)
+		goto abort;
+	memset(ret_val, 0, sizeof(*ret_val));
+	sysfs_init(ret_val, container_fd, -1);
+	if (ret_val->sys_name[0] == 0)
+		goto abort;
+
+	sprintf(fname, "/sys/block/%s/md/", ret_val->sys_name);
+	base = fname + strlen(fname);
+
+	strcpy(base, "raid_disks");
+	if (load_sys(fname, buf))
+		goto abort;
+	ret_val->array.raid_disks = strtoul(buf, NULL, 0);
+
+	/* Get all the devices as well */
+	*base = 0;
+	dir = opendir(fname);
+	if (!dir)
+		goto abort;
+	ret_val->array.spare_disks = 0;
+	while ((de = readdir(dir)) != NULL) {
+		char *ep;
+		if (de->d_ino == 0 ||
+		    strncmp(de->d_name, "dev-", 4) != 0)
+			continue;
+		strcpy(base, de->d_name);
+		dbase = base + strlen(base);
+		*dbase = '\0';
+
+		to_check = strstr(fname, "/md/");
+		is_in = sysfs_is_spare_device_belongs_to(fd, to_check);
+		if (is_in == -1) {
+			char *p;
+			struct stat stb;
+			char stb_name[PATH_MAX];
+
+			dev = malloc(sizeof(*dev));
+			if (!dev)
+				goto abort;
+			strncpy(dev->text_version, fname, 50);
+
+			*dbase++ = '/';
+
+			dev->disk.raid_disk = strtoul(buf, &ep, 10);
+			dev->disk.raid_disk = -1;
+
+			strcpy(dbase, "block/dev");
+			if (load_sys(fname, buf)) {
+				free(dev);
+				continue;
+			}
+			/* check first if we are working on block device
+			 * if not, we cannot check it
+			 */
+			p = strchr(dev->text_version, '-');
+			if (p)
+				p++;
+			sprintf(stb_name, "/dev/%s", p);
+			if (stat(stb_name, &stb) < 0) {
+				dprintf(Name ": stat failed for %s: %s.\n",
+					stb_name, strerror(errno));
+				free(dev);
+				continue;
+			}
+			if (!S_ISBLK(stb.st_mode)) {
+				dprintf(Name\
+					": %s is not a block device."\
+					" Skip checking.\n",
+					stb_name);
+				goto skip;
+			}
+			dprintf(Name ": %s seems to be a block device\n",
+				stb_name);
+			sscanf(buf, "%d:%d",
+			       &dev->disk.major,
+			       &dev->disk.minor);
+			strcpy(dbase, "block/device/state");
+			if (load_sys(fname, buf) != 0) {
+				free(dev);
+				continue;
+			}
+			if (strncmp(buf, "offline", 7) == 0) {
+				free(dev);
+				continue;
+			}
+			if (strncmp(buf, "failed", 6) == 0) {
+				free(dev);
+				continue;
+			}
+
+skip:
+			/* add this disk to spares list */
+			dev->next = ret_val->devs;
+			ret_val->devs = dev;
+			ret_val->array.spare_disks++;
+			*(dbase-1) = '\0';
+			dprintf("sysfs: found spare: %s [%d:%d]\n",
+				fname, dev->disk.major, dev->disk.minor);
+		}
+	}
+	closedir(dir);
+	return ret_val;
+
+abort:
+	if (dir)
+		closedir(dir);
+	sysfs_free(ret_val);
+
+	return NULL;
+}
+
 int sysfs_freeze_array(struct mdinfo *sra)
 {
 	/* Try to freeze resync/rebuild on this array/container.
diff --git a/util.c b/util.c
index 4b41e2b..ebeaa16 100644
--- a/util.c
+++ b/util.c
@@ -1906,3 +1906,37 @@ int experimental(void)
 	}
 }
 
+struct mdstat_ent *find_array_by_subdev(char *subdev, int container)
+{
+	struct mdstat_ent *mdstat = mdstat_read(0, 0);
+	char full_name[PATH_MAX];
+	char *name;
+
+	sprintf(full_name, "/md%i/%s", container, subdev);
+	while (mdstat) {
+		struct mdstat_ent *ent;
+
+		if (mdstat->metadata_version &&
+		    strncmp(mdstat->metadata_version, "external:", 9) == 0) {
+			char *name_pos = strchr(mdstat->metadata_version, ':');
+			name = full_name;
+			if (name_pos) {
+				/* skip the first character of the name,
+				 * as it can have 2 values:
+				 * 1: '/' for a not frozen array
+				 * 2: '-' for a frozen array
+				 */
+				name_pos += 2;
+				name++;
+				if (strcmp(name_pos, name) == 0)
+					return mdstat;
+			}
+		}
+		ent = mdstat;
+		mdstat = mdstat->next;
+		ent->next = NULL;
+		free_mdstat(ent);
+	}
+	return NULL;
+}
+


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 03/29] imsm: Process reshape_update in mdmon
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
  2010-12-09 15:18 ` [PATCH 01/29] Add state_of_reshape for external metadata Adam Kwolek
  2010-12-09 15:18 ` [PATCH 02/29] imsm: Prepare reshape_update in mdadm Adam Kwolek
@ 2010-12-09 15:19 ` Adam Kwolek
  2010-12-09 15:19 ` [PATCH 04/29] imsm: Block array state change during reshape Adam Kwolek
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:19 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

For this update prepare_update() allocates the (bigger) memory needed to relink the imsm device
structures. It calculates the new, bigger anchor size.

process_update() applies the update to the imsm structures. If necessary, for the first array in the container
it turns spares into raid disks in the metadata.

The active_array receives the number of added devices (reshape_delta_disks),
and its state is set to reshape_is_starting (this triggers the managemon action).

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |  156 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 156 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 183b82c..48e26b1 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -5310,6 +5310,111 @@ static void imsm_process_update(struct supertype *st,
 
 	switch (type) {
 	case update_reshape: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+		struct dl *new_disk;
+		struct active_array *a;
+		int i;
+		__u32 new_mpb_size;
+		int new_disk_num;
+		struct intel_dev *current_dev;
+
+		dprintf("imsm: imsm_process_update() for update_reshape "\
+			"[u->update_prepared = %i]\n",
+			u->update_prepared);
+		if ((u->update_prepared == -1) ||
+		    (u->devnum < 0)) {
+			dprintf("imsm: Error: update_reshape not prepared\n");
+			goto update_reshape_exit;
+		}
+
+		if (u->spares_in_update) {
+			new_disk_num = mpb->num_disks + u->reshape_delta_disks;
+			new_mpb_size = disks_to_mpb_size(new_disk_num);
+			if (mpb->mpb_size < new_mpb_size)
+				mpb->mpb_size = new_mpb_size;
+
+			/* enable spares for use in the array
+			 */
+			for (i = 0; i < u->reshape_delta_disks; i++) {
+				char buf[PATH_MAX];
+
+				new_disk = super->disks;
+				while (new_disk) {
+					if ((new_disk->major ==
+					     u->upd_disks[i].major) &&
+					    (new_disk->minor ==
+					     u->upd_disks[i].minor))
+							break;
+					new_disk = new_disk->next;
+				}
+				if (new_disk == NULL) {
+					u->update_prepared = -1;
+					goto update_reshape_exit;
+				}
+				if (new_disk->index < 0) {
+					new_disk->index = i + mpb->num_disks;
+					/* slot to fill in autolayout */
+					new_disk->raiddisk = new_disk->index;
+					new_disk->disk.status |=
+						CONFIGURED_DISK;
+					new_disk->disk.status &= ~SPARE_DISK;
+				}
+				sprintf(buf,
+					"%d:%d",
+					new_disk->major,
+					new_disk->minor);
+				if (new_disk->fd < 0)
+					new_disk->fd = dev_open(buf, O_RDWR);
+				fd2devname(new_disk->fd , buf);
+				new_disk->devname = strdup(buf);
+			}
+		}
+
+		dprintf("imsm: process_update(): update_reshape: volume set"\
+			" mpb->num_raid_devs = %i\n", mpb->num_raid_devs);
+		/* manage changes in volumes
+		 */
+		/* check if array is in RESHAPE_NOT_ACTIVE reshape state
+		 */
+		for (a = st->arrays; a; a = a->next)
+			if (a->devnum == u->devnum)
+				break;
+		if ((a == NULL) || (a->reshape_state != reshape_not_active)) {
+			u->update_prepared = -1;
+			goto update_reshape_exit;
+		}
+		/* find current dev in intel_super
+		 */
+		dprintf("\t\tLooking for volume %s\n",
+			(char *)u->devs_mem.dev->volume);
+		current_dev = super->devlist;
+		while (current_dev) {
+			if (strcmp((char *)current_dev->dev->volume,
+				   (char *)u->devs_mem.dev->volume) == 0)
+				break;
+			current_dev = current_dev->next;
+		}
+		if (current_dev == NULL) {
+			u->update_prepared = -1;
+			goto update_reshape_exit;
+		}
+
+		dprintf("Found volume %s\n", (char *)current_dev->dev->volume);
+		/* replace current device with provided in update
+		 */
+		free(current_dev->dev);
+		current_dev->dev = u->devs_mem.dev;
+		u->devs_mem.dev = NULL;
+
+		/* set reshape_delta_disks
+		 */
+		a->reshape_delta_disks = u->reshape_delta_disks;
+		a->reshape_state = reshape_is_starting;
+
+		super->updates_pending++;
+update_reshape_exit:
+		if (u->devs_mem.dev)
+			free(u->devs_mem.dev);
 		break;
 	}
 	case update_activate_spare: {
@@ -5632,6 +5737,57 @@ static void imsm_prepare_update(struct supertype *st,
 
 	switch (type) {
 	case update_reshape: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+		struct dl *dl = NULL;
+		void *upd_devs;
+
+		u->update_prepared = -1;
+		u->devs_mem.dev = NULL;
+		dprintf("imsm: imsm_prepare_update() for update_reshape\n");
+		if (u->devnum < 0) {
+			dprintf("imsm: No passed device.\n");
+			break;
+		}
+		dprintf("imsm: reshape delta disks is = %i\n",
+			u->reshape_delta_disks);
+		if (u->reshape_delta_disks < 0)
+			break;
+		u->update_prepared = 1;
+		if (u->reshape_delta_disks == 0) {
+			/* for non growing reshape buffers sizes
+			 * are not affected but check some parameters
+			 */
+			break;
+		}
+		/* count HDDs
+		 */
+		u->disks_count = 0;
+		for (dl = super->disks; dl; dl = dl->next)
+			if (dl->index >= 0)
+				u->disks_count++;
+
+		/* set pointer in monitor address space
+		 */
+		upd_devs = (struct imsm_dev *)((void *)u + u->upd_devs_offset);
+		/* allocate memory for new volumes */
+		if (((struct imsm_dev *)(upd_devs))->vol.migr_type !=
+		    MIGR_GEN_MIGR) {
+			dprintf("imsm: Error. Device is not in "\
+				"migration state.\n");
+			u->update_prepared = -1;
+			break;
+		}
+		dprintf("passed device : %s\n",
+			((struct imsm_dev *)(upd_devs))->volume);
+		u->devs_mem.dev = calloc(1, u->device_size);
+		if (u->devs_mem.dev == NULL) {
+			u->update_prepared = -1;
+			break;
+		}
+		dprintf("METADATA Copy - using it.\n");
+		memcpy(u->devs_mem.dev, upd_devs, u->device_size);
+		len = disks_to_mpb_size(u->spares_in_update + mpb->num_disks);
+		dprintf("New anchor length is %llu\n", (unsigned long long)len);
 		break;
 	}
 	case update_create_array: {


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 04/29] imsm: Block array state change during reshape
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (2 preceding siblings ...)
  2010-12-09 15:19 ` [PATCH 03/29] imsm: Process reshape_update in mdmon Adam Kwolek
@ 2010-12-09 15:19 ` Adam Kwolek
  2010-12-09 15:19 ` [PATCH 05/29] Process reshape initialization by managemon Adam Kwolek
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:19 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

Array state changes are blocked while a reshape action is in progress
and its metadata changes are being applied.

'1' is returned to indicate that the array is clean.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 48e26b1..4cc02d7 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -4780,6 +4780,17 @@ static int imsm_set_array_state(struct active_array *a, int consistent)
 	__u8 map_state = imsm_check_degraded(super, dev, failed);
 	__u32 blocks_per_unit;
 
+	if (a->reshape_state != reshape_not_active) {
+		/* array state changes are blocked while a reshape is in
+		 * progress and metadata changes are being applied.
+		 *
+		 * '1' is returned to indicate that the array is clean
+		 */
+		dprintf("imsm: imsm_set_array_state() called "\
+			"during reshape.\n");
+		return 1;
+	}
+
 	/* before we activate this array handle any missing disks */
 	if (consistent == 2)
 		handle_missing(super, dev);
@@ -5144,6 +5155,10 @@ static struct mdinfo *imsm_activate_spare(struct active_array *a,
 
 	dprintf("imsm: activate spare: inst=%d failed=%d (%d) level=%d\n",
 		inst, failed, a->info.array.raid_disks, a->info.array.level);
+
+	if (a->reshape_state != reshape_not_active)
+		return NULL;
+
 	if (imsm_check_degraded(super, dev, failed) != IMSM_T_STATE_DEGRADED)
 		return NULL;
 


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 05/29] Process reshape initialization by managemon
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (3 preceding siblings ...)
  2010-12-09 15:19 ` [PATCH 04/29] imsm: Block array state change during reshape Adam Kwolek
@ 2010-12-09 15:19 ` Adam Kwolek
  2010-12-09 15:19 ` [PATCH 06/29] imsm: Verify slots in meta against slot numbers set by md Adam Kwolek
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:19 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

The monitor signals the request to managemon (using the reshape_delta_disks variable).
This causes a call to the reshape_array() vector, which prepares a second metadata update used to verify the slots of the added disks.
Slots are assigned by md when the reshape starts, so they are not yet known to user space.
The second update is sent after the reshape has started. While it is being processed, the metadata is checked against the slot numbers set by md and, in case of a mismatch, the metadata is updated.

The reshape starts in the delayed state because sync_max is set to 0. After this, reshape_delta_disks is set to the 'in progress' value to avoid reentry.
The reshape process is then continued in mdadm.

If the reshape cannot be started or any failure occurs, a 'cancel' message is prepared by reshape_array() and sent to the monitor to roll back the metadata changes.
mdadm is informed about the failure by the idle array state.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c |   86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mdadm.h     |   27 +++++++++++++++++++
 2 files changed, 113 insertions(+), 0 deletions(-)

diff --git a/managemon.c b/managemon.c
index 945b173..c48b114 100644
--- a/managemon.c
+++ b/managemon.c
@@ -398,6 +398,7 @@ static void manage_member(struct mdstat_ent *mdstat,
 	 */
 	char buf[64];
 	int frozen;
+	struct active_array *newa = NULL;
 
 	// FIXME
 	a->info.array.raid_disks = mdstat->raid_disks;
@@ -409,6 +410,91 @@ static void manage_member(struct mdstat_ent *mdstat,
 	else
 		frozen = 1; /* can't read metadata_version assume the worst */
 
+	if ((a->reshape_state != reshape_not_active) &&
+	    (a->reshape_state != reshape_in_progress)) {
+		dprintf("Reshape signals need to manage this member\n");
+		if (a->container->ss->reshape_array) {
+			struct metadata_update *updates = NULL;
+			struct mdinfo *newdev = NULL;
+			struct mdinfo *d;
+
+			newdev = a->container->ss->reshape_array(a,
+							reshape_in_progress,
+							&updates);
+			if (newdev) {
+				int status_ok = 1;
+				newa = duplicate_aa(a);
+				if (newa == NULL)
+					goto reshape_out;
+
+				for (d = newdev; d ; d = d->next) {
+					struct mdinfo *newd;
+
+					newd = malloc(sizeof(*newd));
+					if (!newd) {
+						status_ok = 0;
+						dprintf("Cannot allocate "\
+							"memory\n");
+						continue;
+					}
+					if (sysfs_add_disk(&newa->info,
+							   d,
+							   0) < 0) {
+						free(newd);
+						status_ok = 0;
+						dprintf("Cannot add disk "\
+							"to array.\n");
+						continue;
+					}
+					disk_init_and_add(newd, d, newa);
+				}
+				/* go with reshape
+				 */
+				if (status_ok)
+					if (sysfs_set_num(&newa->info,
+							  NULL,
+							  "sync_max",
+							  0) < 0)
+						status_ok = 0;
+				if (status_ok && sysfs_set_str(&newa->info,
+							      NULL,
+							      "sync_action",
+							      "reshape") == 0) {
+					/* reshape executed
+					 */
+					dprintf("Reshape was started\n");
+					replace_array(a->container, a, newa);
+					a = newa;
+				} else {
+					/* on problems cancel update
+					 */
+					free_aa(newa);
+					free_updates(&updates);
+					updates = NULL;
+					a->container->ss->reshape_array(a,
+							reshape_cancel_request,
+							&updates);
+					sysfs_set_str(&a->info,
+						      NULL,
+						      "sync_action",
+						      "idle");
+				}
+			}
+			dprintf("Send metadata update for reshape.\n");
+
+			queue_metadata_update(updates);
+			updates = NULL;
+			wakeup_monitor();
+reshape_out:
+			while (newdev) {
+				d = newdev->next;
+				free(newdev);
+				newdev = d;
+			}
+			free_updates(&updates);
+		}
+	}
+
 	if (a->check_degraded && !frozen) {
 		struct metadata_update *updates = NULL;
 		struct mdinfo *newdev = NULL;
diff --git a/mdadm.h b/mdadm.h
index ba3a9c5..e2f273f 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -520,6 +520,7 @@ extern char *map_dev(int major, int minor, int create);
 
 struct active_array;
 struct metadata_update;
+enum state_of_reshape;
 
 /* A superswitch provides entry point the a metadata handler.
  *
@@ -747,6 +748,32 @@ extern struct superswitch {
 	 */
 	const char *(*get_disk_controller_domain)(const char *path);
 
+	/* reshape_array() will:
+	 * 1. check if sync_max is set to 0
+	 * 2. prepare the list of devices that have to be added
+	 * 3. prepare a metadata update message to set disk slots
+	 *    after the reshape is started
+	 * request_type:
+	 * 1. RESHAPE_CANCEL_REQUEST
+	 *    In the error case it prepares a metadata roll back message.
+	 *    Such an error case message should be prepared when the
+	 *    passed request_type is set to RESHAPE_CANCEL_REQUEST.
+	 * 2. RESHAPE_IN_PROGRESS
+	 *    requests a transition to the RESHAPE_IN_PROGRESS state,
+	 *    so a proper update has to be prepared
+	 * The following values can appear in the active array structure:
+	 * 1. RESHAPE_NOT_ACTIVE
+	 * 2. RESHAPE_IN_PROGRESS
+	 * 3. any other value indicates the requested number of disks
+	 *    after the array change; this is visible only during reshape
+	 *    and metadata initialization.  After initialization
+	 *    RESHAPE_IN_PROGRESS has to be placed in reshape_delta_disks.
+	 *    When the reshape is finished it is replaced by
+	 *    RESHAPE_NOT_ACTIVE.
+	 */
+	struct mdinfo *(*reshape_array)(struct active_array *a,
+			     enum state_of_reshape request_type,
+			     struct metadata_update **updates);
+
 	int swapuuid; /* true if uuid is bigending rather than hostendian */
 	int external;
 	const char *name; /* canonical metadata name */


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 06/29] imsm: Verify slots in meta against slot numbers set by md
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (4 preceding siblings ...)
  2010-12-09 15:19 ` [PATCH 05/29] Process reshape initialization by managemon Adam Kwolek
@ 2010-12-09 15:19 ` Adam Kwolek
  2010-12-09 15:19 ` [PATCH 07/29] imsm: Cancel metadata changes on reshape start failure Adam Kwolek
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:19 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

To verify the slot numbers stored in the metadata against those chosen by md, the update_reshape_set_slots update is used.

Managemon calls the reshape_array() vector and prepares the slot verification metadata update there. It is sent once the reshape has been started successfully in md.
The monitor then verifies and updates the slots.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |  306 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 306 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 4cc02d7..b54cca7 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -286,6 +286,7 @@ enum imsm_update_type {
 	update_rename_array,
 	update_add_disk,
 	update_reshape,
+	update_reshape_set_slots,
 };
 
 struct imsm_update_activate_spare {
@@ -5289,6 +5290,7 @@ static int disks_overlap(struct intel_super *super, int idx, struct imsm_update_
 }
 
 static void imsm_delete(struct intel_super *super, struct dl **dlp, unsigned index);
+static int imsm_reshape_array_set_slots(struct active_array *a);
 
 static void imsm_process_update(struct supertype *st,
 			        struct metadata_update *update)
@@ -5432,6 +5434,26 @@ update_reshape_exit:
 			free(u->devs_mem.dev);
 		break;
 	}
+	case update_reshape_set_slots: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+		struct active_array *a;
+
+		dprintf("imsm: process_update() for update_reshape_set_slot "\
+			"for device %i\n",
+			u->devnum);
+		for (a = st->arrays; a; a = a->next)
+			if (a->devnum == u->devnum)
+				break;
+
+		if (a == NULL) {
+			dprintf(" - cannot locate requested array\n");
+			break;
+		}
+
+		if (imsm_reshape_array_set_slots(a) > -1)
+			super->updates_pending++;
+		break;
+	}
 	case update_activate_spare: {
 		struct imsm_update_activate_spare *u = (void *) update->buf; 
 		struct imsm_dev *dev = get_imsm_dev(super, u->array);
@@ -5805,6 +5827,9 @@ static void imsm_prepare_update(struct supertype *st,
 		dprintf("New anchor length is %llu\n", (unsigned long long)len);
 		break;
 	}
+	case update_reshape_set_slots: {
+		break;
+	}
 	case update_create_array: {
 		struct imsm_update_create_array *u = (void *) update->buf;
 		struct intel_dev *dv;
@@ -6624,6 +6649,286 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 	return ret_val;
 }
 
+/* imsm_reshape_array_manage_new_slots()
+ * returns: number of corrected slots for correct == 1
+ *          counted number of different slots for correct == 0
+ */
+static int imsm_reshape_array_manage_new_slots(struct intel_super *super,
+					int inst,
+					int devnum,
+					int correct)
+{
+
+	struct imsm_dev *dev = get_imsm_dev(super, inst);
+	struct imsm_map *map_1 = get_imsm_map(dev, 0);
+	struct imsm_map *map_2 = get_imsm_map(dev, 1);
+	struct dl *dl;
+	unsigned long long sysfs_slot;
+	char buf[PATH_MAX];
+	int fd;
+	struct mdinfo *sra = NULL;
+	int ret_val = 0;
+
+	if ((map_1 == NULL) || (map_2 == NULL)) {
+		dprintf("imsm_reshape_array_manage_new_slots() no maps "\
+			"(map_1 = %p, map_2 = %p)\n",
+			map_1,
+			map_2);
+		dprintf("\t\tdev->vol.migr_state = %i\n", dev->vol.migr_state);
+		dprintf("\t\tdev->volume = %s\n", dev->volume);
+		return -1;
+	}
+
+	/* verify/correct slot configuration of added disks
+	 */
+	dprintf("\n\nStart map verification for %i added devices "\
+		"on device no %i\n",
+		map_1->num_members - map_2->num_members, devnum);
+
+	fd = open_dev(devnum);
+	if (fd < 0) {
+		dprintf("imsm: ERROR: Cannot open device md%i.\n", devnum);
+		return -1;
+	}
+
+	sra = sysfs_read(fd, 0, GET_LEVEL|GET_VERSION|GET_DEVS|GET_STATE);
+	if (!sra) {
+		dprintf("imsm: ERROR: Device not found.\n");
+		close(fd);
+		return -1;
+	}
+
+	for (dl = super->disks; dl; dl = dl->next) {
+		int fd2;
+		int rv;
+
+		dprintf("\tLooking at device %s (index = %i).\n",
+			dl->devname,
+			dl->index);
+		if ((dl->devname == NULL) || (strlen(dl->devname) <= 5))
+			continue;
+		sprintf(buf, "/sys/block/%s/md/dev-%s/slot",
+			sra->sys_name, dl->devname+5);
+		fd2 = open(buf, O_RDONLY);
+		if (fd2 < 0)
+			continue;
+		rv = sysfs_fd_get_ll(fd2, &sysfs_slot);
+		close(fd2);
+		if (rv < 0)
+			continue;
+		dprintf("\t\tLooking at slot %llu in sysfs.\n", sysfs_slot);
+		if ((int)sysfs_slot != dl->index) {
+			dprintf("Slots don't match: sysfs->%i, imsm->%i\n",
+				(int)sysfs_slot,
+				dl->index);
+			ret_val++;
+			if (correct)
+				dl->index = sysfs_slot;
+		}
+	}
+	close(fd);
+	sysfs_free(sra);
+	dprintf("IMSM Map verification finished (found wrong slots : %i).\n",
+		ret_val);
+
+	return ret_val;
+}
+
+static int imsm_reshape_array_set_slots(struct active_array *a)
+{
+	struct intel_super *super = a->container->sb;
+	int inst = a->info.container_member;
+
+	return imsm_reshape_array_manage_new_slots(super, inst, a->devnum, 1);
+}
+
+struct mdinfo *imsm_grow_array(struct active_array *a)
+{
+	int disk_count = 0;
+	struct intel_super *super = a->container->sb;
+	int inst = a->info.container_member;
+	struct imsm_dev *dev = get_imsm_dev(super, inst);
+	struct imsm_map *map = get_imsm_map(dev, 0);
+	struct mdinfo *di;
+	struct dl *dl;
+	int i;
+	int prev_raid_disks = a->info.array.raid_disks;
+	int new_raid_disks = prev_raid_disks + a->reshape_delta_disks;
+	struct mdinfo *vol = NULL;
+	int fd;
+	struct mdinfo *rv = NULL;
+
+	dprintf("imsm: grow array: inst=%d raid disks=%d(%d) level=%d\n",
+		inst,
+		a->info.array.raid_disks,
+		new_raid_disks,
+		a->info.array.level);
+
+	/* get array sysfs entry
+	 */
+	fd = open_dev(a->devnum);
+	if (fd < 0)
+		return rv;
+	vol = sysfs_read(fd, 0, GET_LEVEL|GET_VERSION|GET_DEVS|GET_STATE);
+	if (vol == NULL) {
+		close(fd);
+		return rv;
+	}
+	/* Look for all disks beyond current configuration
+	 * To handle degradation after takeover
+	 * look also on last disk in configuration.
+	 */
+	for (i = prev_raid_disks; i < new_raid_disks; i++) {
+		/* OK, this device can be added.  Try to add.
+		 */
+		dl = imsm_add_spare(super, i, a, 0, rv);
+		if (!dl)
+			continue;
+
+		if (dl->index < 0)
+			dl->index = i;
+		/* found a usable disk with enough space */
+		di = malloc(sizeof(*di));
+		if (!di)
+			continue;
+
+		memset(di, 0, sizeof(*di));
+		/* dl->index will be -1 in the case we are activating a
+		 * pristine spare.  imsm_process_update() will create a
+		 * new index in this case.  Once a disk is found to be
+		 * failed in all member arrays it is kicked from the
+		 * metadata
+		 */
+		di->disk.number = dl->index;
+
+		/* (ab)use di->devs to store a pointer to the device
+		 * we chose
+		 */
+		di->devs = (struct mdinfo *) dl;
+
+		di->disk.raid_disk = -1;
+		di->disk.major = dl->major;
+		di->disk.minor = dl->minor;
+		di->disk.state = (1<<MD_DISK_SYNC) |
+				 (1<<MD_DISK_ACTIVE);
+		di->next_state = 0;
+
+		di->recovery_start = MaxSector;
+		di->data_offset = __le32_to_cpu(map->pba_of_lba0);
+		di->component_size = a->info.component_size;
+		di->container_member = inst;
+		super->random = random32();
+
+		di->next = rv;
+		rv = di;
+		disk_count++;
+		dprintf("%x:%x to be %d at %llu\n", dl->major, dl->minor,
+			i, di->data_offset);
+	}
+
+	dprintf("imsm: imsm_grow_array() configures %i raid disks\n",
+		disk_count);
+	close(fd);
+	sysfs_free(vol);
+	if (disk_count != a->reshape_delta_disks) {
+
+		dprintf("imsm: ERROR: but it should configure %i\n",
+			a->reshape_delta_disks);
+
+		while (rv) {
+			di = rv;
+			rv = rv->next;
+			free(di);
+		}
+	}
+
+	return rv;
+}
+
+struct mdinfo *imsm_reshape_array(struct active_array *a,
+				  enum state_of_reshape request_type,
+				  struct metadata_update **updates)
+{
+	struct imsm_update_reshape *u = NULL;
+	struct metadata_update *mu;
+	struct mdinfo *disk_list = NULL;
+
+	dprintf("imsm: imsm_reshape_array(reshape_delta_disks = %i)\t",
+		a->reshape_delta_disks);
+	if (request_type == reshape_cancel_request) {
+		dprintf("prepare cancel message.\n");
+		goto imsm_reshape_array_exit;
+	}
+	if (a->reshape_state == reshape_not_active) {
+		dprintf("has nothing to do.\n");
+		return disk_list;
+	}
+	if (a->reshape_delta_disks < 0) {
+		dprintf("doesn't support shrinking.\n");
+		a->reshape_state = reshape_not_active;
+		return disk_list;
+	}
+
+	if (a->reshape_delta_disks == 0) {
+		dprintf("array parameters have to be changed\n");
+		/* TBD */
+	}
+	if (a->reshape_delta_disks > 0) {
+		dprintf("grow is detected.\n");
+		disk_list = imsm_grow_array(a);
+	}
+
+	if (disk_list) {
+		dprintf("imsm: send update update_reshape_set_slots\n");
+
+		u = (struct imsm_update_reshape *)calloc(1,
+					sizeof(struct imsm_update_reshape));
+		if (u) {
+			u->type = update_reshape_set_slots;
+			a->reshape_state = reshape_in_progress;
+		}
+	} else
+		dprintf("error: cannot start reshape\n");
+
+imsm_reshape_array_exit:
+	if (u == NULL) {
+		dprintf("imsm: send update update_reshape_cancel\n");
+		a->reshape_state = reshape_not_active;
+		sysfs_set_str(&a->info, NULL, "sync_action", "idle");
+	}
+
+	if (u) {
+		/* post any prepared update
+		 */
+		u->devnum = a->devnum;
+
+		u->update_memory_size = sizeof(struct imsm_update_reshape);
+		u->reshape_delta_disks = a->reshape_delta_disks;
+		u->update_prepared = 1;
+
+		mu = malloc(sizeof(struct metadata_update));
+		if (mu) {
+			mu->buf = (void *)u;
+			mu->space = NULL;
+			mu->len = u->update_memory_size;
+			mu->next = *updates;
+			*updates = mu;
+		} else {
+			a->reshape_state = reshape_not_active;
+			free(u);
+			u = NULL;
+		}
+	}
+
+	if ((disk_list) && (u == NULL)) {
+		while (disk_list) {
+			struct mdinfo *di = disk_list;
+			disk_list = disk_list->next;
+			free(di);
+		}
+	}
+	return disk_list;
+}
+
 struct superswitch super_imsm = {
 #ifndef	MDASSEMBLE
 	.examine_super	= examine_super_imsm,
@@ -6660,6 +6965,7 @@ struct superswitch super_imsm = {
 	.default_geometry = default_geometry_imsm,
 	.get_disk_controller_domain = imsm_get_disk_controller_domain,
 	.reshape_super  = imsm_reshape_super,
+	.reshape_array	= imsm_reshape_array,
 
 	.external	= 1,
 	.name = "imsm",


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 07/29] imsm: Cancel metadata changes on reshape start failure
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (5 preceding siblings ...)
  2010-12-09 15:19 ` [PATCH 06/29] imsm: Verify slots in meta against slot numbers set by md Adam Kwolek
@ 2010-12-09 15:19 ` Adam Kwolek
  2010-12-09 15:19 ` [PATCH 08/29] imsm: Do not accept messages sent by mdadm Adam Kwolek
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:19 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

It can occur that managemon cannot start the reshape in md.
To cancel the metadata changes, the update_reshape_cancel message is used. It is prepared by the reshape_array() vector.
When the monitor receives this message, it rolls back the metadata changes made earlier while processing the update_reshape update.
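The size-rollback arithmetic used by the cancel update can be sketched in isolation. The helper name and simplified parameters below are illustrative, not the real imsm structures, but the rounding mirrors the hunk that recomputes dev->size_low/size_high:

```c
#include <assert.h>

/* Minimal sketch of the array-size rollback: recompute the size from the
 * per-member block count and the restored number of data disks, then round
 * down to the closest MB. SECT_PER_MB_SHIFT assumes 512-byte sectors
 * (2048 sectors = 1 MB), as in mdadm. */
#define SECT_PER_MB_SHIFT 11

static unsigned long long
rollback_array_blocks(unsigned long long blocks_per_member, int used_disks)
{
	unsigned long long array_blocks = blocks_per_member * used_disks;

	/* round array size down to closest MB, as the update does */
	array_blocks = (array_blocks >> SECT_PER_MB_SHIFT)
			<< SECT_PER_MB_SHIFT;
	return array_blocks;
}
```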

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |  134 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 134 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index b54cca7..924e57a 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -287,6 +287,7 @@ enum imsm_update_type {
 	update_add_disk,
 	update_reshape,
 	update_reshape_set_slots,
+	update_reshape_cancel,
 };
 
 struct imsm_update_activate_spare {
@@ -5454,6 +5455,104 @@ update_reshape_exit:
 			super->updates_pending++;
 		break;
 	}
+	case update_reshape_cancel: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+		struct active_array *a;
+		int inst;
+		int i;
+		struct imsm_dev *dev;
+		struct imsm_dev *devi;
+		struct imsm_map *map_1;
+		struct imsm_map *map_2;
+		int reshape_delta_disks;
+		struct dl *curr_disk;
+		int used_disks;
+		unsigned long long array_blocks;
+
+
+		dprintf("imsm: process_update() for update_reshape_cancel for "\
+			"device %i\n",
+			u->devnum);
+		for (a = st->arrays; a; a = a->next)
+			if (a->devnum == u->devnum)
+				break;
+		if (a == NULL)
+			break;
+
+		inst = a->info.container_member;
+		dev = get_imsm_dev(super, inst);
+		map_1 = get_imsm_map(dev, 0);
+		map_2 = get_imsm_map(dev, 1);
+		if (map_2 == NULL)
+			break;
+		reshape_delta_disks = map_1->num_members - map_2->num_members;
+		dprintf("\t\tRemove %i device(s) from configuration.\n",
+			reshape_delta_disks);
+
+		/* when the cancel is applied during reshape of a second
+		 * volume, disks are still needed for the first array
+		 * reshaped earlier, so find the smallest delta_disks to remove
+		 */
+		i = 0;
+		devi = get_imsm_dev(super, i);
+		while (devi) {
+			struct imsm_map *mapi = get_imsm_map(devi, 0);
+			int delta_disks;
+
+			delta_disks = map_1->num_members - mapi->num_members;
+			if ((i != inst) &&
+			    (delta_disks < reshape_delta_disks) &&
+			    (delta_disks >= 0))
+				reshape_delta_disks = delta_disks;
+			i++;
+			devi = get_imsm_dev(super, i);
+		}
+		/* remove disks
+		 */
+		if (reshape_delta_disks > 0) {
+			/* reverse device(s) back to spares
+			*/
+			curr_disk = super->disks;
+			while (curr_disk) {
+				dprintf("Looking at %i device to remove\n",
+					curr_disk->index);
+				if (curr_disk->index >= map_2->num_members) {
+					dprintf("\t\t\tREMOVE\n");
+					curr_disk->index = -1;
+					curr_disk->raiddisk = -1;
+					curr_disk->disk.status &=
+						~CONFIGURED_DISK;
+					curr_disk->disk.status |=
+						SPARE_DISK;
+				}
+				curr_disk = curr_disk->next;
+			}
+		}
+		/* roll back maps and migration
+		 */
+		memcpy(map_1, map_2, sizeof_imsm_map(map_2));
+		/* reconfigure map_2 and perform migration end
+		 */
+		map_2 = get_imsm_map(dev, 1);
+		memcpy(map_2, map_1, sizeof_imsm_map(map_1));
+		end_migration(dev, map_1->map_state);
+		/* array size rollback
+		 */
+		used_disks = imsm_num_data_members(dev);
+		if (used_disks) {
+			array_blocks = map_1->blocks_per_member * used_disks;
+			/* round array size down to closest MB
+			*/
+			array_blocks = (array_blocks >> SECT_PER_MB_SHIFT)
+					<< SECT_PER_MB_SHIFT;
+			dev->size_low = __cpu_to_le32((__u32)array_blocks);
+			dev->size_high = __cpu_to_le32((__u32)(array_blocks
+					>> 32));
+		}
+
+		super->updates_pending++;
+		break;
+	}
 	case update_activate_spare: {
 		struct imsm_update_activate_spare *u = (void *) update->buf; 
 		struct imsm_dev *dev = get_imsm_dev(super, u->array);
@@ -5830,6 +5929,9 @@ static void imsm_prepare_update(struct supertype *st,
 	case update_reshape_set_slots: {
 		break;
 	}
+	case update_reshape_cancel: {
+		break;
+	}
 	case update_create_array: {
 		struct imsm_update_create_array *u = (void *) update->buf;
 		struct intel_dev *dv;
@@ -6649,6 +6751,31 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 	return ret_val;
 }
 
+void imsm_grow_array_remove_devices_on_cancel(struct active_array *a)
+{
+	struct mdinfo *di = a->info.devs;
+	struct mdinfo *di_prev = NULL;
+
+	while (di) {
+		if (di->disk.raid_disk < 0) {
+			struct mdinfo *rmdev = di;
+			sysfs_set_str(&a->info, rmdev, "state", "faulty");
+			sysfs_set_str(&a->info, rmdev, "slot", "none");
+			sysfs_set_str(&a->info, rmdev, "state", "remove");
+
+			if (di_prev)
+				di_prev->next = di->next;
+			else
+				a->info.devs = di->next;
+			di = di->next;
+			free(rmdev);
+		} else {
+			di_prev = di;
+			di = di->next;
+		}
+	}
+}
+
 /* imsm_reshape_array_manage_new_slots()
  * returns: number of corrected slots for correct == 1
  *          counted number of different slots for correct == 0
@@ -6894,6 +7021,13 @@ imsm_reshape_array_exit:
 		dprintf("imsm: send update update_reshape_cancel\n");
 		a->reshape_state = reshape_not_active;
 		sysfs_set_str(&a->info, NULL, "sync_action", "idle");
+		imsm_grow_array_remove_devices_on_cancel(a);
+		u = (struct imsm_update_reshape *)calloc(1,
+					sizeof(struct imsm_update_reshape));
+		if (u) {
+			u->type = update_reshape_cancel;
+			a->reshape_state = reshape_not_active;
+		}
 	}
 
 	if (u) {



* [PATCH 08/29] imsm: Do not accept messages sent by mdadm
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (6 preceding siblings ...)
  2010-12-09 15:19 ` [PATCH 07/29] imsm: Cancel metadata changes on reshape start failure Adam Kwolek
@ 2010-12-09 15:19 ` Adam Kwolek
  2010-12-09 15:19 ` [PATCH 09/29] imsm: Do not indicate resync during reshape Adam Kwolek
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:19 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

The update_reshape_cancel and update_reshape_set_slots messages are intended to be sent by managemon.
If such a message is issued by mdadm instead, prepare_update() is called for it in managemon.
In that case set update_prepared to '-1' to tell process_update() not to process the message.
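The guard can be sketched as follows; the struct and function names are simplified stand-ins for imsm_update_reshape, imsm_prepare_update() and process_update():

```c
#include <assert.h>

/* Updates that only managemon may originate carry an update_prepared flag:
 * the prepare step stamps it to -1 when the update arrives from mdadm,
 * and the process step drops such updates. */
struct fake_update {
	int type;
	int update_prepared;
};

/* managemon side: mark an mdadm-originated update as not to be processed */
static void prepare_update_stub(struct fake_update *u)
{
	u->update_prepared = -1;
}

/* monitor side: returns 1 when the update is accepted, 0 when dropped */
static int process_update_stub(const struct fake_update *u)
{
	if (u->update_prepared == -1)
		return 0; /* do not accept this update type sent by mdadm */
	return 1;
}
```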

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 924e57a..7e61f52 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -5451,6 +5451,13 @@ update_reshape_exit:
 			break;
 		}
 
+		/* do not accept this update type sent by mdadm
+		 */
+		if (u->update_prepared == -1) {
+			dprintf("imsm: message is not accepted\n");
+			break;
+		}
+
 		if (imsm_reshape_array_set_slots(a) > -1)
 			super->updates_pending++;
 		break;
@@ -5479,6 +5486,13 @@ update_reshape_exit:
 		if (a == NULL)
 			break;
 
+		/* do not accept this update type sent by mdadm
+		 */
+		if (u->update_prepared == -1) {
+			dprintf("imsm: message is not accepted\n");
+			break;
+		}
+
 		inst = a->info.container_member;
 		dev = get_imsm_dev(super, inst);
 		map_1 = get_imsm_map(dev, 0);
@@ -5927,9 +5941,19 @@ static void imsm_prepare_update(struct supertype *st,
 		break;
 	}
 	case update_reshape_set_slots: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+
+		/* do not accept this update type sent by mdadm
+		 */
+		u->update_prepared = -1;
 		break;
 	}
 	case update_reshape_cancel: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+
+		/* do not accept this update type sent by mdadm
+		 */
+		u->update_prepared = -1;
 		break;
 	}
 	case update_create_array: {



* [PATCH 09/29] imsm: Do not indicate resync during reshape
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (7 preceding siblings ...)
  2010-12-09 15:19 ` [PATCH 08/29] imsm: Do not accept messages sent by mdadm Adam Kwolek
@ 2010-12-09 15:19 ` Adam Kwolek
  2010-12-09 15:20 ` [PATCH 10/29] imsm: Fill delta_disks field in getinfo_super() Adam Kwolek
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:19 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

Once a reshape is started, a resync is not allowed in parallel; it would break the reshape.
If the array is in the General Migration state, do not indicate resync and allow the reshape to continue.
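A minimal sketch of the resulting check, with stand-in names for the imsm migration types and a boolean in place of the migration-map lookup:

```c
#include <assert.h>

/* Illustrative stand-ins for the imsm migration types */
enum migr_type_stub {
	MIGR_INIT_S,
	MIGR_REPAIR_S,
	MIGR_GEN_MIGR_S,
	MIGR_STATE_CHANGE_S
};

static int is_resyncing_stub(enum migr_type_stub t, int migr_map_normal)
{
	if (t == MIGR_INIT_S || t == MIGR_REPAIR_S)
		return 1;	/* initialization / repair count as resync */
	if (t == MIGR_GEN_MIGR_S)
		return 0;	/* reshape in progress: never report resync */
	return migr_map_normal;	/* otherwise depends on the migration map */
}
```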

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 7e61f52..19859ef 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -4702,9 +4702,13 @@ static int is_resyncing(struct imsm_dev *dev)
 	    migr_type(dev) == MIGR_REPAIR)
 		return 1;
 
+	if (migr_type(dev) == MIGR_GEN_MIGR)
+		return 0;
+
 	migr_map = get_imsm_map(dev, 1);
 
-	if (migr_map->map_state == IMSM_T_STATE_NORMAL)
+	if ((migr_map->map_state == IMSM_T_STATE_NORMAL) &&
+	    (dev->vol.migr_type != MIGR_GEN_MIGR))
 		return 1;
 	else
 		return 0;



* [PATCH 10/29] imsm: Fill delta_disks field in getinfo_super()
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (8 preceding siblings ...)
  2010-12-09 15:19 ` [PATCH 09/29] imsm: Do not indicate resync during reshape Adam Kwolek
@ 2010-12-09 15:20 ` Adam Kwolek
  2010-12-09 15:20 ` [PATCH 11/29] Control reshape in mdadm Adam Kwolek
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

The delta_disks field is not always filled in during the getinfo_super() call. Fill it from the current and previous maps when a reshape is active.
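The computation can be sketched with a stand-in for imsm_map: a reshape is active exactly when a previous (second) map exists, and delta_disks is the difference in member counts between the two maps.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for imsm_map */
struct map_stub {
	int num_members;
};

static int delta_disks_stub(const struct map_stub *map,
			    const struct map_stub *prev_map)
{
	if (prev_map == NULL)
		return 0;	/* no previous map: no reshape, delta is 0 */
	return map->num_members - prev_map->num_members;
}
```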

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 19859ef..07851ce 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -1511,6 +1511,7 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 	struct intel_super *super = st->sb;
 	struct imsm_dev *dev = get_imsm_dev(super, super->current_vol);
 	struct imsm_map *map = get_imsm_map(dev, 0);
+	struct imsm_map *prev_map = get_imsm_map(dev, 1);
 	struct dl *dl;
 	char *devname;
 	int map_disks = info->array.raid_disks;
@@ -1542,7 +1543,11 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 	info->component_size	  = __le32_to_cpu(map->blocks_per_member);
 	memset(info->uuid, 0, sizeof(info->uuid));
 	info->recovery_start = MaxSector;
-	info->reshape_active = 0;
+	info->reshape_active = (prev_map != NULL);
+	if (info->reshape_active)
+		info->delta_disks = map->num_members - prev_map->num_members;
+	else
+		info->delta_disks = 0;
 
 	if (map->map_state == IMSM_T_STATE_UNINITIALIZED || dev->vol.dirty) {
 		info->resync_start = 0;
@@ -1599,7 +1604,7 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 			}
 		}
 	}
-}				
+}
 
 /* check the config file to see if we can return a real uuid for this spare */
 static void fixup_container_spare_uuid(struct mdinfo *inf)



* [PATCH 11/29] Control reshape in mdadm
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (9 preceding siblings ...)
  2010-12-09 15:20 ` [PATCH 10/29] imsm: Fill delta_disks field in getinfo_super() Adam Kwolek
@ 2010-12-09 15:20 ` Adam Kwolek
  2010-12-09 15:20 ` [PATCH 12/29] Finalize reshape after adding disks to array Adam Kwolek
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

When managemon starts a reshape while sync_max is set to 0, mdadm is already waiting for it in manage_reshape().
When the array reaches the reshape state, the manage_reshape() handler checks whether all metadata updates are in place.
If not, mdadm has to wait until the updates hit the array.
It then starts the reshape using the common child_grow() code and waits until the reshape has finished.
When that happens, it sets the size to the value specified in the metadata and performs a backward takeover to raid0 if necessary.

If manage_reshape() finds the array in the idle state (instead of the reshape state), it is treated as an error condition and the process is terminated.
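The sync_action check that decides between these outcomes can be sketched as below; the helper and return values are illustrative, not the actual mdadm API:

```c
#include <assert.h>
#include <string.h>

/* Outcome of inspecting an array's sync_action while waiting to reshape:
 * "reshape" means go, "frozen" means updates are not applied yet so keep
 * polling, "idle" (or anything unexpected) aborts the whole process. */
enum reshape_check {
	RESHAPE_GO,
	RESHAPE_WAIT,
	RESHAPE_ERROR
};

static enum reshape_check check_sync_action(const char *action)
{
	if (strcmp(action, "reshape") == 0)
		return RESHAPE_GO;	/* array reached reshape state */
	if (strcmp(action, "frozen") == 0)
		return RESHAPE_WAIT;	/* keep polling for the updates */
	return RESHAPE_ERROR;		/* idle or unexpected state */
}
```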

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 Grow.c        |   16 +
 mdadm.h       |    6 +
 mdmon.c       |   57 +++++
 super-intel.c |  631 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 700 insertions(+), 10 deletions(-)

diff --git a/Grow.c b/Grow.c
index cf09a60..dce7912 100644
--- a/Grow.c
+++ b/Grow.c
@@ -453,10 +453,6 @@ static __u32 bsb_csum(char *buf, int len)
 	return __cpu_to_le32(csum);
 }
 
-static int child_grow(int afd, struct mdinfo *sra, unsigned long blocks,
-		      int *fds, unsigned long long *offsets,
-		      int disks, int chunk, int level, int layout, int data,
-		      int dests, int *destfd, unsigned long long *destoffsets);
 static int child_shrink(int afd, struct mdinfo *sra, unsigned long blocks,
 			int *fds, unsigned long long *offsets,
 			int disks, int chunk, int level, int layout, int data,
@@ -487,7 +483,7 @@ static int freeze_container(struct supertype *st)
 	return 1;
 }
 
-static void unfreeze_container(struct supertype *st)
+void unfreeze_container(struct supertype *st)
 {
 	int container_dev = (st->container_dev != NoMdDev
 			     ? st->container_dev : st->devnum);
@@ -543,7 +539,7 @@ static void unfreeze(struct supertype *st, int frozen)
 	}
 }
 
-static void wait_reshape(struct mdinfo *sra)
+void wait_reshape(struct mdinfo *sra)
 {
 	int fd = sysfs_get_fd(sra, NULL, "sync_action");
 	char action[20];
@@ -2199,10 +2195,10 @@ static void validate(int afd, int bfd, unsigned long long offset)
 	}
 }
 
-static int child_grow(int afd, struct mdinfo *sra, unsigned long stripes,
-		      int *fds, unsigned long long *offsets,
-		      int disks, int chunk, int level, int layout, int data,
-		      int dests, int *destfd, unsigned long long *destoffsets)
+int child_grow(int afd, struct mdinfo *sra, unsigned long stripes,
+	       int *fds, unsigned long long *offsets,
+	       int disks, int chunk, int level, int layout, int data,
+	       int dests, int *destfd, unsigned long long *destoffsets)
 {
 	char *buf;
 	int degraded = 0;
diff --git a/mdadm.h b/mdadm.h
index e2f273f..a596040 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -474,6 +474,7 @@ extern int sysfs_add_disk(struct mdinfo *sra, struct mdinfo *sd, int resume);
 extern int sysfs_disk_to_scsi_id(int fd, __u32 *id);
 extern int sysfs_unique_holder(int devnum, long rdev);
 extern int sysfs_freeze_array(struct mdinfo *sra);
+extern void wait_reshape(struct mdinfo *sra);
 extern int load_sys(char *path, char *buf);
 extern int reshape_prepare_fdlist(char *devname,
 				  struct mdinfo *sra,
@@ -495,6 +496,11 @@ extern int reshape_open_backup_file(char *backup,
 extern unsigned long compute_backup_blocks(int nchunk, int ochunk,
 					   unsigned int ndata, unsigned int odata);
 extern struct mdinfo *sysfs_get_unused_spares(int container_fd, int fd);
+extern int child_grow(int afd, struct mdinfo *sra, unsigned long stripes,
+		      int *fds, unsigned long long *offsets,
+		      int disks, int chunk, int level, int layout, int data,
+		      int dests, int *destfd, unsigned long long *destoffsets);
+extern void unfreeze_container(struct supertype *st);
 
 extern int save_stripes(int *source, unsigned long long *offsets,
 			int raid_disks, int chunk_size, int level, int layout,
diff --git a/mdmon.c b/mdmon.c
index f56e57f..ebadff7 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -517,3 +517,60 @@ static int mdmon(char *devname, int devnum, int must_fork, int takeover)
 
 	exit(0);
 }
+
+/* Below are some dummy functions
+ * needed for compilation but not used by mdmon
+ */
+
+void unfreeze_container(struct supertype *st)
+{
+}
+
+void wait_reshape(struct mdinfo *sra)
+{
+}
+
+unsigned long compute_backup_blocks(int nchunk, int ochunk,
+				    unsigned int ndata, unsigned int odata)
+{
+	return 0;
+}
+
+
+int reshape_prepare_fdlist(char *devname,
+			   struct mdinfo *sra,
+			   int raid_disks,
+			   int nrdisks,
+			   unsigned long blocks,
+			   char *backup_file,
+			   int *fdlist,
+			   unsigned long long *offsets)
+{
+	return 0;
+}
+
+int reshape_open_backup_file(char *backup_file,
+			     int fd,
+			     char *devname,
+			     long blocks,
+			     int *fdlist,
+			     unsigned long long *offsets)
+{
+	return -1;
+}
+
+int child_grow(int afd, struct mdinfo *sra,
+	       unsigned long stripes, int *fds, unsigned long long *offsets,
+	       int disks, int chunk, int level, int layout, int data,
+	       int dests, int *destfd, unsigned long long *destoffsets)
+{
+	return 1;
+}
+
+void reshape_free_fdlist(int *fdlist,
+			 unsigned long long *offsets,
+			 int size)
+{
+	;
+}
+
diff --git a/super-intel.c b/super-intel.c
index 07851ce..d747322 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -26,6 +26,7 @@
 #include <scsi/sg.h>
 #include <ctype.h>
 #include <dirent.h>
+#include <sys/mman.h>
 
 /* MPB == Metadata Parameter Block */
 #define MPB_SIGNATURE "Intel Raid ISM Cfg Sig. "
@@ -6777,10 +6778,13 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 		} else
 			dprintf("imsm: Operation is not allowed "\
 				"on container\n");
+		if (ret_val)
+			unfreeze_container(st);
 	} else
 		dprintf("imsm: not a container operation\n");
 
 	dprintf("imsm: reshape_super Exit code = %i\n", ret_val);
+
 	return ret_val;
 }
 
@@ -6901,6 +6905,13 @@ static int imsm_reshape_array_set_slots(struct active_array *a)
 	return imsm_reshape_array_manage_new_slots(super, inst, a->devnum, 1);
 }
 
+int imsm_reshape_array_count_slots_mismatches(struct intel_super *super,
+					      int inst,
+					      int devnum)
+{
+	return imsm_reshape_array_manage_new_slots(super, inst, devnum, 0);
+}
+
 struct mdinfo *imsm_grow_array(struct active_array *a)
 {
 	int disk_count = 0;
@@ -7096,6 +7107,625 @@ imsm_reshape_array_exit:
 	return disk_list;
 }
 
+int imsm_grow_manage_size(struct supertype *st,
+			  struct mdinfo *sra,
+			  int current_vol)
+{
+	int ret_val = 0;
+	struct mdinfo *info = NULL;
+	unsigned long long size;
+	int container_fd;
+	unsigned long long current_size = 0;
+
+	/* finalize current volume reshape
+	 * for external meta size has to be managed by mdadm
+	 * read size set in meta and put it to md when
+	 * reshape is finished.
+	 */
+
+	if (sra == NULL) {
+		dprintf("Error: imsm_grow_manage_size(): sra == NULL\n");
+		goto exit_grow_manage_size_ext_meta;
+	}
+	wait_reshape(sra);
+
+	/* reshape has finished, update md size
+	 * get per-device size and multiply by data disks
+	 */
+	container_fd = open_dev(st->container_dev);
+	if (container_fd < 0) {
+		dprintf("Error: imsm_grow_manage_size(): container_fd == 0\n");
+		goto exit_grow_manage_size_ext_meta;
+	}
+	st->ss->load_super(st, container_fd, NULL);
+	info = sysfs_read(container_fd,
+			  0,
+			  GET_LEVEL | GET_VERSION | GET_DEVS | GET_STATE);
+	close(container_fd);
+	if (info == NULL) {
+		dprintf("imsm: Cannot get device info.\n");
+		goto exit_grow_manage_size_ext_meta;
+	}
+	if (current_vol > -1) {
+		struct intel_super *super;
+
+		super = st->sb;
+		super->current_vol = current_vol;
+	}
+	st->ss->getinfo_super(st, info, NULL);
+	size = info->custom_array_size/2;
+	sysfs_get_ll(sra, NULL, "array_size", &current_size);
+	dprintf("imsm_grow_manage_size(): current size is %llu, "\
+		"set size to %llu\n",
+		current_size, size);
+	sysfs_set_num(sra, NULL, "array_size", size);
+
+	ret_val = 1;
+
+exit_grow_manage_size_ext_meta:
+	sysfs_free(info);
+	return ret_val;
+}
+
+int imsm_child_grow(struct supertype *st,
+		    char *devname,
+		    int fd_in,
+		    struct mdinfo *sra,
+		    int current_vol,
+		    char *backup)
+{
+	int ret_val = 0;
+	int nrdisks;
+	int *fdlist;
+	unsigned long long *offsets;
+	unsigned int ndata, odata;
+	int ndisks, odisks;
+	unsigned long blocks, stripes;
+	int d;
+	struct mdinfo *sd;
+	int validate_fd;
+
+	nrdisks = ndisks = odisks = sra->array.raid_disks;
+	odisks -= sra->delta_disks;
+	odata = odisks-1;
+	ndata = ndisks-1;
+	fdlist = malloc((1+nrdisks) * sizeof(int));
+	offsets = malloc((1+nrdisks) * sizeof(offsets[0]));
+	if (!fdlist || !offsets) {
+		fprintf(stderr, Name ": malloc failed: grow aborted\n");
+		ret_val = 1;
+		if (fdlist)
+			free(fdlist);
+		if (offsets)
+			free(offsets);
+		return ret_val;
+	}
+	blocks = compute_backup_blocks(sra->array.chunk_size,
+				       sra->array.chunk_size,
+				       ndata, odata);
+
+	/* set MD_DISK_SYNC flag to open all devices that have to be backed up
+	 */
+	for (sd = sra->devs; sd; sd = sd->next) {
+		if ((sd->disk.raid_disk > -1) &&
+		    ((unsigned int)sd->disk.raid_disk < odata)) {
+			sd->disk.state |= (1<<MD_DISK_SYNC);
+			sd->disk.state &= ~(1<<MD_DISK_FAULTY);
+		} else {
+			sd->disk.state |= (1<<MD_DISK_FAULTY);
+			sd->disk.state &= ~(1<<MD_DISK_SYNC);
+		}
+	}
+#ifdef DEBUG
+	dprintf("FD list disk inspection:\n");
+	for (sd = sra->devs; sd; sd = sd->next) {
+		char *dn = map_dev(sd->disk.major,
+				   sd->disk.minor, 1);
+		dprintf("Disk %s", dn);
+		dprintf("\tstate = %i\n", sd->disk.state);
+	}
+#endif
+	d = reshape_prepare_fdlist(devname, sra, odisks,
+				    nrdisks, blocks, NULL,
+				    fdlist, offsets);
+	if (d < 0) {
+		fprintf(stderr, Name ": cannot prepare device list\n");
+		free(fdlist);
+		free(offsets);
+		ret_val = 1;
+		return ret_val;
+	}
+
+	if (reshape_open_backup_file(backup, fd_in, "imsm",
+				     (signed)blocks,
+				     fdlist, offsets) == 0) {
+		free(fdlist);
+		free(offsets);
+		ret_val = 1;
+		return ret_val;
+	}
+	d++;
+
+	mlockall(MCL_FUTURE);
+	if (ret_val == 0) {
+		if (check_env("MDADM_GROW_VERIFY"))
+			validate_fd = fd_in;
+		else
+			validate_fd = -1;
+
+		sra->array.raid_disks = odisks;
+		sra->new_level = sra->array.level;
+		sra->new_layout = sra->array.layout;
+		sra->new_chunk = sra->array.chunk_size;
+
+		stripes = blocks / (sra->array.chunk_size/512) / odata;
+		child_grow(validate_fd, sra, stripes,
+			fdlist, offsets,
+			odisks, sra->array.chunk_size,
+			sra->array.level, sra->array.layout, odata,
+			d - odisks, fdlist + odisks, offsets + odisks);
+		imsm_grow_manage_size(st, sra, current_vol);
+	}
+	reshape_free_fdlist(fdlist, offsets, d);
+
+	if (backup)
+		unlink(backup);
+
+	return ret_val;
+}
+
+void return_to_raid0(struct mdinfo *sra)
+{
+	if (sra->array.level == 4) {
+		dprintf("Execute backward takeover to raid0\n");
+		sysfs_set_str(sra, NULL, "level", "raid0");
+	}
+}
+
+int imsm_check_reshape_conditions(int fd,
+				  struct supertype *st,
+				  int current_array)
+{
+	char buf[PATH_MAX];
+	struct mdinfo *info = NULL;
+	int arrays_in_reshape_state = 0;
+	int wait_counter = 0;
+	int i;
+	int ret_val = 0;
+	struct intel_super *super = st->sb;
+	struct imsm_super *mpb = super->anchor;
+	int wrong_slots_counter;
+
+	/* wait until all arrays are in the reshape state
+	 * or an error occurs (idle state detected)
+	 */
+	while ((arrays_in_reshape_state == 0) &&
+	       (ret_val == 0)) {
+		arrays_in_reshape_state = 0;
+		int temp_array;
+
+		if (wait_counter)
+			sleep(1);
+
+		for (i = 0; i < mpb->num_raid_devs; i++) {
+			int sync_max;
+			int len;
+
+			/* check array state in md
+			 */
+			st->ss->load_super(st, fd, NULL);
+			if (st->sb == NULL) {
+				dprintf("cannot get sb\n");
+				ret_val = 1;
+				break;
+			}
+			info = sysfs_read(fd,
+					  0,
+					  GET_LEVEL | GET_VERSION |
+					  GET_DEVS | GET_STATE);
+			if (info == NULL) {
+				dprintf("imsm: Cannot get device info.\n");
+				break;
+			}
+			super = st->sb;
+			super->current_vol = i;
+			st->ss->getinfo_super(st, info, NULL);
+
+			imsm_find_array_minor_by_subdev(i,
+							st->container_dev,
+							&temp_array);
+			if (temp_array != current_array) {
+				if (temp_array < 0) {
+					ret_val = -1;
+					break;
+				}
+				sysfs_free(info);
+				info = NULL;
+				continue;
+			}
+			sprintf(info->sys_name, "md%i", current_array);
+			if (sysfs_get_str(info,
+					  NULL,
+					  "raid_disks",
+					  buf,
+					  sizeof(buf)) < 0) {
+				dprintf("cannot get raid_disks\n");
+				ret_val = 1;
+				break;
+			}
+			/* sync_max should always be set to 0
+			 */
+			if (sysfs_get_str(info,
+					  NULL,
+					  "sync_max",
+					  buf,
+					  sizeof(buf)) < 0) {
+				dprintf("cannot get sync_max\n");
+				ret_val = 1;
+				break;
+			}
+			len = strlen(buf)-1;
+			if (len < 0)
+				len = 0;
+			*(buf+len) = 0;
+			sync_max = atoi(buf);
+			if (sync_max != 0) {
+				dprintf("sync_max has wrong value (%s)\n", buf);
+				sysfs_free(info);
+				info = NULL;
+				continue;
+			}
+			if (sysfs_get_str(info,
+					  NULL,
+					  "sync_action",
+					  buf,
+					  sizeof(buf)) < 0) {
+				dprintf("cannot get sync_action\n");
+				ret_val = 1;
+				break;
+			}
+			len = strlen(buf)-1;
+			if (len < 0)
+				len = 0;
+			*(buf+len) = 0;
+			if (strncmp(buf, "idle", 7) == 0) {
+				dprintf("imsm: Error found array in idle state"\
+					" during reshape initialization\n");
+				ret_val = 1;
+				break;
+			}
+			if (strncmp(buf, "reshape", 7) == 0) {
+				arrays_in_reshape_state++;
+			} else {
+				if (strncmp(buf, "frozen", 6) != 0) {
+					*(buf+strlen(buf)) = 0;
+					dprintf("imsm: Error unexpected array "\
+						"state (%s) during reshape "\
+						"initialization\n",
+						buf);
+					ret_val = 1;
+					break;
+				}
+			}
+			/* this device looks ok, so
+			 * check if slots are set correctly
+			 */
+			super = st->sb;
+			wrong_slots_counter =
+				imsm_reshape_array_count_slots_mismatches(super,
+							i,
+							atoi(info->sys_name+2));
+			sysfs_free(info);
+			info = NULL;
+			if (wrong_slots_counter != 0) {
+				dprintf("Slots for correction %i.\n",
+					wrong_slots_counter);
+				ret_val = 1;
+				goto exit_imsm_check_reshape_conditions;
+			}
+		}
+		sysfs_free(info);
+		info = NULL;
+		wait_counter++;
+		if (wait_counter > 60) {
+			dprintf("exit on timeout, "\
+				"container is not prepared to reshape\n");
+			ret_val = 1;
+		}
+	}
+
+exit_imsm_check_reshape_conditions:
+	sysfs_free(info);
+	info = NULL;
+
+	return ret_val;
+}
+
+int imsm_manage_container_reshape(struct supertype *st, char *backup)
+{
+	int ret_val = 1;
+	char buf[PATH_MAX];
+	struct intel_super *super = st->sb;
+	struct imsm_super *mpb = super->anchor;
+	int fd;
+	struct mdinfo *info = NULL;
+	struct mdinfo info2;
+	int delta_disks;
+	struct geo_params geo;
+#ifdef DEBUG
+	int i;
+#endif
+
+	memset(&geo, 0, sizeof(struct geo_params));
+	/* verify reshape conditions
+	 * for a single volume reshape just exit and reuse Grow_reshape() code
+	 */
+	if (st->container_dev != st->devnum) {
+		dprintf("imsm: imsm_manage_container_reshape() detects volume "\
+			"reshape (devnum = %i), exit.\n",
+			st->devnum);
+		return ret_val;
+	}
+
+	if (backup == NULL) {
+		fprintf(stderr, Name ": Cannot grow - need backup-file\n");
+		return ret_val;
+	}
+
+	geo.dev_name = devnum2devname(st->devnum);
+	if (geo.dev_name == NULL) {
+		dprintf("Error: imsm_manage_reshape(): can't get devname.\n");
+		return ret_val;
+	}
+
+	fd = open_dev(st->devnum);
+	if (fd < 0) {
+		dprintf("imsm: cannot open device\n");
+		goto imsm_manage_container_reshape_exit;
+	}
+
+	/* send pings to roll managemon and monitor
+	 */
+	ping_manager(geo.dev_name);
+	ping_monitor(geo.dev_name);
+
+#ifdef DEBUG
+	/* device list for reshape
+	 */
+	dprintf("Arrays to run reshape (no: %i)\n", mpb->num_raid_devs);
+	for (i = 0; i < mpb->num_raid_devs; i++) {
+		struct imsm_dev *dev = get_imsm_dev(super, i);
+		dprintf("\tDevice: %s\n", dev->volume);
+	}
+#endif
+
+	info2.devs = NULL;
+	super = st->sb;
+	super->current_vol = 0;
+	st->ss->getinfo_super(st, &info2, NULL);
+	geo.dev_id = -1;
+	imsm_find_array_minor_by_subdev(super->current_vol,
+					st->container_dev,
+					&geo.dev_id);
+	if (geo.dev_id < 0) {
+		dprintf("imsm: Error: cannot get first array.\n");
+		goto imsm_manage_container_reshape_exit;
+	}
+	if (imsm_check_reshape_conditions(fd, st, geo.dev_id)) {
+		dprintf("imsm: Error: wrong reshape conditions.\n");
+		goto imsm_manage_container_reshape_exit;
+	}
+	geo.raid_disks = info2.array.raid_disks;
+	dprintf("Container is ready for reshape ...\n");
+	switch (fork()) {
+	case 0:
+		fprintf(stderr, Name ": Child forked to monitor reshape\n");
+		while (geo.dev_id > -1) {
+			int fd2 = -1;
+			int i;
+			int temp_array = -1;
+			int array;
+
+			for (i = 0; i < mpb->num_raid_devs; i++) {
+				struct intel_super *super;
+
+				st->ss->load_super(st, fd, NULL);
+				if (st->sb == NULL) {
+					dprintf("cannot get sb\n");
+					ret_val = 1;
+					goto imsm_manage_container_reshape_exit;
+				}
+				info2.devs = NULL;
+				super = st->sb;
+				super->current_vol = i;
+				st->ss->getinfo_super(st, &info2, NULL);
+				imsm_find_array_minor_by_subdev(
+					super->current_vol,
+					st->container_dev,
+					&temp_array);
+				if (temp_array == geo.dev_id) {
+					dprintf("Checking slots for device "\
+						"md%i\n",
+						geo.dev_id);
+					break;
+				}
+			}
+			dprintf("Prepare to reshape for device md%i\n",
+				geo.dev_id);
+			fd2 = open_dev(geo.dev_id);
+			if (fd2 < 0) {
+				dprintf("Cannot open array.\n");
+				ret_val = 1;
+				goto imsm_manage_container_reshape_exit;
+			}
+			info = sysfs_read(fd2,
+					  0,
+					  GET_VERSION | GET_LEVEL | GET_DEVS |
+					  GET_STATE | GET_COMPONENT |
+					  GET_OFFSET | GET_CACHE | GET_CHUNK |
+					  GET_DISKS | GET_DEGRADED | GET_SIZE |
+					  GET_LAYOUT);
+			if (info == NULL) {
+				dprintf("Cannot read sysfs.\n");
+				close(fd2);
+				ret_val = 1;
+				goto imsm_manage_container_reshape_exit;
+			}
+			delta_disks = info->delta_disks;
+			super = st->sb;
+
+			if (sysfs_get_str(info,
+					  NULL,
+					  "sync_completed",
+					  buf,
+					  sizeof(buf)) >= 0) {
+				/* check if in the previous pass we reshaped
+				 * any array; if not, we have to omit the
+				 * sync_completed condition and try
+				 * to reshape the arrays
+				 */
+				if ((*buf == '0') ||
+				     /* or this array was already reshaped */
+				     (strncmp(buf, "none", 4) == 0)) {
+					dprintf("Skip this array, "\
+						"sync_completed is %s\n",
+						buf);
+					geo.dev_id = -1;
+					sysfs_free(info);
+					info = NULL;
+					close(fd2);
+					continue;
+				}
+			} else {
+				dprintf("Cannot read sync_completed.\n");
+				dprintf("Array level is: %i\n",
+					info->array.level);
+				ret_val = 1;
+				close(fd2);
+				goto imsm_manage_container_reshape_exit;
+			}
+			snprintf(buf, PATH_MAX, "/dev/md/%s", info2.name);
+			info->delta_disks = info2.delta_disks;
+
+			delta_disks = info->array.raid_disks - geo.raid_disks;
+			geo.raid_disks = info->array.raid_disks;
+			if (info->array.level == 4) {
+				geo.raid_disks--;
+				delta_disks--;
+			}
+
+			super = st->sb;
+			super->current_vol = i;
+			ret_val = imsm_child_grow(st, buf,
+						  fd2,
+						  info,
+						  i,
+						  backup);
+			return_to_raid0(info);
+			sysfs_free(info);
+			info = NULL;
+			close(fd2);
+			i++;
+			if (ret_val) {
+				dprintf("Reshape is broken (cannot reshape)\n");
+				ret_val = 1;
+				goto imsm_manage_container_reshape_exit;
+			}
+			geo.dev_id = -1;
+			array = get_volume_for_olce(st, geo.raid_disks);
+			if (array >= 0) {
+				struct imsm_update_reshape *u;
+				dprintf("imsm: next volume to reshape is "\
+					"subarray: %i\n",
+					array);
+				imsm_find_array_minor_by_subdev(array,
+								st->devnum,
+								&geo.dev_id);
+				if (geo.dev_id > -1) {
+					/* send next array update
+					 */
+					dprintf("imsm: Preparing metadata "\
+						"update for subarray: %i "\
+						"(md%i)\n", array, geo.dev_id);
+					st->update_tail = &st->updates;
+					u = imsm_create_metadata_update_for_reshape(st, &geo);
+					if (u) {
+						u->reshape_delta_disks =
+							delta_disks;
+						append_metadata_update(st,
+							u,
+							u->update_memory_size);
+						flush_metadata_updates(st);
+						/* send pings to roll managemon
+						 * and monitor
+						 */
+						ping_manager(geo.dev_name);
+						ping_monitor(geo.dev_name);
+
+						if (imsm_check_reshape_conditions(
+							fd,
+							st,
+							geo.dev_id)) {
+							ret_val = 1;
+							geo.dev_id = -1;
+						}
+					} else
+						geo.dev_id = -1;
+				}
+			}
+		}
+		unfreeze_container(st);
+		close(fd);
+		break;
+	case -1:
+		fprintf(stderr,
+			Name ": Cannot run child to monitor reshape: %s\n",
+			strerror(errno));
+		ret_val = 1;
+		break;
+	default:
+		/* The child will take care of unfreezing the array */
+		break;
+	}
+
+imsm_manage_container_reshape_exit:
+	sysfs_free(info);
+	if (fd > -1)
+		close(fd);
+	if (geo.dev_name)
+		free(geo.dev_name);
+
+	return ret_val;
+}
+
+int imsm_manage_reshape(struct supertype *st, char *backup)
+{
+	int ret_val = 0;
+
+	dprintf("imsm: manage_reshape() called\n");
+
+	if (experimental() == 0)
+		return ret_val;
+
+	/* verify reshape conditions:
+	 * for a single volume reshape just exit and reuse Grow_reshape() code
+	 */
+	if (st->container_dev != st->devnum) {
+		dprintf("imsm: manage_reshape() current volume devnum: %i\n",
+			st->devnum);
+
+		return ret_val;
+	}
+	ret_val = imsm_manage_container_reshape(st, backup);
+	/* unfreeze on error and success
+	 * for any result this is end of work
+	 */
+	unfreeze_container(st);
+
+	return ret_val;
+}
+
+
 struct superswitch super_imsm = {
 #ifndef	MDASSEMBLE
 	.examine_super	= examine_super_imsm,
@@ -7132,6 +7762,7 @@ struct superswitch super_imsm = {
 	.default_geometry = default_geometry_imsm,
 	.get_disk_controller_domain = imsm_get_disk_controller_domain,
 	.reshape_super  = imsm_reshape_super,
+	.manage_reshape = imsm_manage_reshape,
 	.reshape_array	= imsm_reshape_array,
 
 	.external	= 1,


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 12/29] Finalize reshape after adding disks to array
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (10 preceding siblings ...)
  2010-12-09 15:20 ` [PATCH 11/29] Control reshape in mdadm Adam Kwolek
@ 2010-12-09 15:20 ` Adam Kwolek
  2010-12-09 15:20 ` [PATCH 13/29] Add reshape progress updating Adam Kwolek
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

When a reshape is finished, the monitor has to finalize the reshape in the
metadata. To do this, set_array_state() should be called.
This finishes the migration and stores the metadata on disks.

reshape_delta_disks is set to the not-active value.
This finishes the reshape flow in mdmon.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 monitor.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/monitor.c b/monitor.c
index 5298fa1..6191886 100644
--- a/monitor.c
+++ b/monitor.c
@@ -305,6 +305,21 @@ static int read_and_act(struct active_array *a)
 		}
 	}
 
+	/* finalize reshape detection
+	 */
+	if ((a->curr_action != reshape) &&
+	    (a->prev_action == reshape)) {
+		/* set reshape_not_active
+		 * to allow for future rebuilds
+		 */
+		a->reshape_state = reshape_not_active;
+		/* A reshape has finished.
+		 * Some disks may be in sync now.
+		 */
+		a->container->ss->set_array_state(a, a->curr_state <= clean);
+		check_degraded = 1;
+	}
+
 	/* Check for failures and if found:
 	 * 1/ Record the failure in the metadata and unblock the device.
 	 *    FIXME update the kernel to stop notifying on failed drives when



* [PATCH 13/29] Add reshape progress updating
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (11 preceding siblings ...)
  2010-12-09 15:20 ` [PATCH 12/29] Finalize reshape after adding disks to array Adam Kwolek
@ 2010-12-09 15:20 ` Adam Kwolek
  2010-12-09 15:20 ` [PATCH 14/29] WORKAROUND: md reports idle state during reshape start Adam Kwolek
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

Reshape progress is not updated in mdmon.
This patch adds the reshape progress updating feature.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c |   11 +++++++++++
 mdmon.h     |    1 +
 monitor.c   |    4 ++++
 3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/managemon.c b/managemon.c
index c48b114..dda9d95 100644
--- a/managemon.c
+++ b/managemon.c
@@ -417,6 +417,7 @@ static void manage_member(struct mdstat_ent *mdstat,
 			struct metadata_update *updates = NULL;
 			struct mdinfo *newdev = NULL;
 			struct mdinfo *d;
+			int delta_disks = a->reshape_delta_disks;
 
 			newdev = newa->container->ss->reshape_array(newa,
 							reshape_in_progress,
@@ -463,6 +464,15 @@ static void manage_member(struct mdstat_ent *mdstat,
 					/* reshape executed
 					 */
 					dprintf("Reshape was started\n");
+					newa->new_data_disks =
+						newa->info.array.raid_disks +
+						delta_disks;
+					if (a->info.array.level == 4)
+						newa->new_data_disks--;
+					if (a->info.array.level == 5)
+						newa->new_data_disks--;
+					if (a->info.array.level == 6)
+						newa->new_data_disks--;
 					replace_array(a->container, a, newa);
 					a = newa;
 				} else {
@@ -608,6 +618,7 @@ static void manage_new(struct mdstat_ent *mdstat,
 	new->container = container;
 
 	new->reshape_state = reshape_not_active;
+	new->new_data_disks = 0;
 
 	inst = to_subarray(mdstat, container->devname);
 
diff --git a/mdmon.h b/mdmon.h
index 6f8b439..eff4988 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -49,6 +49,7 @@ struct active_array {
 
 	enum state_of_reshape reshape_state;
 	int reshape_delta_disks;
+	int new_data_disks;
 
 	int check_degraded; /* flag set by mon, read by manage */
 
diff --git a/monitor.c b/monitor.c
index 6191886..b5cd3c6 100644
--- a/monitor.c
+++ b/monitor.c
@@ -305,6 +305,10 @@ static int read_and_act(struct active_array *a)
 		}
 	}
 
+	if (a->curr_action == reshape)
+		a->info.reshape_progress = a->info.resync_start *
+					   a->new_data_disks;
+
 	/* finalize reshape detection
 	 */
 	if ((a->curr_action != reshape) &&



* [PATCH 14/29] WORKAROUND: md reports idle state during reshape start
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (12 preceding siblings ...)
  2010-12-09 15:20 ` [PATCH 13/29] Add reshape progress updating Adam Kwolek
@ 2010-12-09 15:20 ` Adam Kwolek
  2010-12-09 15:20 ` [PATCH 15/29] FIX: core during getting map Adam Kwolek
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

md reports a reshape->idle->reshape state transition on reshape start, so
reshape finalization is wrongly indicated.
Finalize the reshape only when some progress has been made.
Once the reshape has really started, an idle state causes reshape
finalization as usual.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 monitor.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/monitor.c b/monitor.c
index b5cd3c6..bbfe3d2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -312,7 +312,8 @@ static int read_and_act(struct active_array *a)
 	/* finalize reshape detection
 	 */
 	if ((a->curr_action != reshape) &&
-	    (a->prev_action == reshape)) {
+	    (a->prev_action == reshape) &&
+	    (a->info.reshape_progress > 2)) {
 		/* set reshape_not_active
 		 * to allow for future rebuilds
 		 */



* [PATCH 15/29] FIX: core during getting map
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (13 preceding siblings ...)
  2010-12-09 15:20 ` [PATCH 14/29] WORKAROUND: md reports idle state during reshape start Adam Kwolek
@ 2010-12-09 15:20 ` Adam Kwolek
  2010-12-09 15:20 ` [PATCH 16/29] Enable reshape for subarrays Adam Kwolek
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

It can occur that, while walking the container, end conditions are based
on the "map" function's return value, so the function has to be protected
against wrong input data.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index d747322..1ac6bc2 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -435,8 +435,12 @@ static size_t sizeof_imsm_map(struct imsm_map *map)
 
 struct imsm_map *get_imsm_map(struct imsm_dev *dev, int second_map)
 {
-	struct imsm_map *map = &dev->vol.map[0];
+	struct imsm_map *map;
+
+	if (dev == NULL)
+		return NULL;
 
+	map = &dev->vol.map[0];
 	if (second_map && !dev->vol.migr_state)
 		return NULL;
 	else if (second_map) {
@@ -1517,6 +1521,9 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 	char *devname;
 	int map_disks = info->array.raid_disks;
 
+	if (map == NULL)
+		return;
+
 	for (dl = super->disks; dl; dl = dl->next)
 		if (dl->raiddisk == info->disk.raid_disk)
 			break;



* [PATCH 16/29] Enable reshape for subarrays
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (14 preceding siblings ...)
  2010-12-09 15:20 ` [PATCH 15/29] FIX: core during getting map Adam Kwolek
@ 2010-12-09 15:20 ` Adam Kwolek
  2010-12-09 15:21 ` [PATCH 17/29] Change manage_reshape() placement Adam Kwolek
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

Reshape for subarrays is blocked in Grow.c due to a missing implementation.
This patch enables subarray processing.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 Grow.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/Grow.c b/Grow.c
index dce7912..101eb8f 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1518,7 +1518,9 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 		 * layout/chunksize/raid_disks can be changed
 		 * though the kernel may not support it all.
 		 */
-		if (subarray) {
+		if ((subarray) &&
+		    !(st->ss->external &&
+		      st->ss->reshape_super && st->ss->manage_reshape)) {
 			fprintf(stderr, Name ": Cannot reshape subarrays yet\n");
 			break;
 		}



* [PATCH 17/29] Change manage_reshape() placement
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (15 preceding siblings ...)
  2010-12-09 15:20 ` [PATCH 16/29] Enable reshape for subarrays Adam Kwolek
@ 2010-12-09 15:21 ` Adam Kwolek
  2010-12-09 15:21 ` [PATCH 18/29] Migration: raid5->raid0 Adam Kwolek
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

After the reshape_super() call, manage_reshape() should do the same things
as Grow_reshape() does for the native metadata case (when executing on an
array). The difference is only at reshape finish, when md completes its work.
For external metadata the size is managed externally from md's point of view,
so a metadata-specific action is required there.
This motivates moving the manage_reshape() call so that only the necessary
actions are added to the common flow, without duplicating existing code.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 Grow.c |   23 +++++++++++++----------
 1 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/Grow.c b/Grow.c
index 101eb8f..9f355a1 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1783,14 +1783,6 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 			break;
 		}
 
-		if (st->ss->external) {
-			/* metadata handler takes it from here */
-			ping_manager(container);
-			st->ss->manage_reshape(st, backup_file);
-			frozen = 0;
-			break;
-		}
-
 		/* set up the backup-super-block.  This requires the
 		 * uuid from the array.
 		 */
@@ -1854,6 +1846,15 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 						       d - odisks, fdlist+odisks, offsets+odisks);
 			if (backup_file && done)
 				unlink(backup_file);
+
+			/* manage/finalize reshape in metadata specific way
+			 */
+			close(fd);
+			if (st->ss->external && st->ss->manage_reshape) {
+				st->ss->manage_reshape(st, backup_file);
+				break;
+			}
+
 			if (level != UnSet && level != array.level) {
 				/* We need to wait for the reshape to finish
 				 * (which will have happened unless odata < ndata)
@@ -1864,8 +1865,10 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 				if (c == NULL)
 					exit(0);/* not possible */
 
-				if (odata < ndata)
-					wait_reshape(sra);
+				/* the child process always waits
+				 * for the reshape to finish before unfreezing
+				 */
+				wait_reshape(sra);
 				err = sysfs_set_str(sra, NULL, "level", c);
 				if (err)
 					fprintf(stderr, Name ": %s: could not set level to %s\n",



* [PATCH 18/29] Migration: raid5->raid0
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (16 preceding siblings ...)
  2010-12-09 15:21 ` [PATCH 17/29] Change manage_reshape() placement Adam Kwolek
@ 2010-12-09 15:21 ` Adam Kwolek
  2010-12-09 15:21 ` [PATCH 19/29] Detect level change Adam Kwolek
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

Add an implementation for migration from raid5 to raid0 in one step.
For this migration case (and others in the external metadata case)
the flow used for Expansion is reused. The array parameters are then
updated in managemon based on the metadata update that was sent. To make
this work, updating the md parameters in Grow.c has to be disabled for
the external metadata case.

In Grow.c, instead of starting the reshape for the external metadata
case, the wait_reshape_start_ext() function is introduced.
It waits for the reshape start initiated by managemon after setting the
array parameters as in the Expansion case.

The subarray_set_num_man() function was added to managemon.
It is similar to the function that exists in Grow.c, except for 2 things:
1. it uses a different way to "ping" the monitor
2. it tries to set raid_disks more than 2 times, as we can be more sure
   that the monitor works while processing in the managemon context

For imsm, a flow of raid level parameters from mdadm (via metadata
update) to managemon was added.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 Grow.c        |   82 +++++++++++------
 managemon.c   |  124 +++++++++++++++++++++++---
 mdadm.h       |    2 
 mdmon.h       |    3 +
 super-intel.c |  277 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 5 files changed, 435 insertions(+), 53 deletions(-)

diff --git a/Grow.c b/Grow.c
index 9f355a1..81373ef 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1752,28 +1752,53 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 				break;
 			}
 		} else {
-			/* set them all just in case some old 'new_*' value
-			 * persists from some earlier problem
+			/* set parameters here only if managemon
+			 * is not responsible for this
 			 */
-			int err = err; /* only used if rv==1, and always set if
-					* rv==1, so initialisation not needed,
-					* despite gcc warning
-					*/
-			if (sysfs_set_num(sra, NULL, "chunk_size", nchunk) < 0)
-				rv = 1, err = errno;
-			if (!rv && sysfs_set_num(sra, NULL, "layout", nlayout) < 0)
-				rv = 1, err = errno;
-			if (!rv && sysfs_set_num(sra, NULL, "raid_disks", ndisks) < 0)
-				rv = 1, err = errno;
-			if (rv) {
-				fprintf(stderr, Name ": Cannot set device shape for %s\n",
-					devname);
-				if (get_linux_version() < 2006030)
-					fprintf(stderr, Name ": linux 2.6.30 or later required\n");
-				if (err == EBUSY && 
-				    (array.state & (1<<MD_SB_BITMAP_PRESENT)))
-					fprintf(stderr, "       Bitmap must be removed before shape can be changed\n");
-				break;
+			if ((st->ss->external == 0) ||
+			    (st->ss->reshape_super == NULL)) {
+				/* set them all just in case some old 'new_*'
+				 * value persists from some earlier problem
+				 */
+				int err = err; /* only used if rv==1, and always
+						* set if rv==1,
+						* so initialisation not needed,
+						* despite gcc warning
+						*/
+				if (sysfs_set_num(sra,
+						  NULL,
+						  "chunk_size",
+						  nchunk) < 0)
+					rv = 1, err = errno;
+				if (!rv && sysfs_set_num(sra,
+							 NULL,
+							 "layout",
+							 nlayout) < 0)
+					rv = 1, err = errno;
+				if (!rv && sysfs_set_num(sra,
+							 NULL,
+							 "raid_disks",
+							 ndisks) < 0)
+					rv = 1, err = errno;
+				if (rv) {
+					fprintf(stderr,
+						Name ": Cannot set device "\
+						"shape for %s\n",
+						devname);
+					if (get_linux_version() < 2006030)
+						fprintf(stderr,
+							Name\
+							": linux 2.6.30 or "\
+							"later required\n");
+					if (err == EBUSY &&
+					    (array.state &
+					     (1<<MD_SB_BITMAP_PRESENT)))
+						fprintf(stderr,
+							"       Bitmap must be"\
+							" removed before shape"\
+							" can be changed\n");
+					break;
+				}
 			}
 		}
 
@@ -2200,8 +2225,8 @@ static void validate(int afd, int bfd, unsigned long long offset)
 	}
 }
 
-int child_grow(int afd, struct mdinfo *sra, unsigned long stripes,
-	       int *fds, unsigned long long *offsets,
+int child_grow(int afd, struct mdinfo *sra,
+	       unsigned long stripes, int *fds, unsigned long long *offsets,
 	       int disks, int chunk, int level, int layout, int data,
 	       int dests, int *destfd, unsigned long long *destoffsets)
 {
@@ -2264,11 +2289,12 @@ static int child_shrink(int afd, struct mdinfo *sra, unsigned long stripes,
 	return 1;
 }
 
-static int child_same_size(int afd, struct mdinfo *sra, unsigned long stripes,
-			   int *fds, unsigned long long *offsets,
-			   unsigned long long start,
-			   int disks, int chunk, int level, int layout, int data,
-			   int dests, int *destfd, unsigned long long *destoffsets)
+int child_same_size(int afd,
+		    struct mdinfo *sra, unsigned long stripes,
+		    int *fds, unsigned long long *offsets,
+		    unsigned long long start,
+		    int disks, int chunk, int level, int layout, int data,
+		    int dests, int *destfd, unsigned long long *destoffsets)
 {
 	unsigned long long size;
 	unsigned long tailstripes = stripes;
diff --git a/managemon.c b/managemon.c
index dda9d95..0c84e6d 100644
--- a/managemon.c
+++ b/managemon.c
@@ -380,6 +380,46 @@ static int disk_init_and_add(struct mdinfo *disk, struct mdinfo *clone,
 	return 0;
 }
 
+int subarray_set_num_man(char *container, struct mdinfo *sra, char *name, int n)
+{
+	/* when dealing with external metadata subarrays we need to be
+	 * prepared to handle EAGAIN.  The kernel may need to wait for
+	 * mdmon to mark the array active so the kernel can handle
+	 * allocations/writeback when preparing the reshape action
+	 * (md_allow_write()).  We temporarily disable safe_mode_delay
+	 * to close a race with the array_state going clean before the
+	 * next write to raid_disks / stripe_cache_size
+	 */
+	char safe[50];
+	int rc;
+#define MANAGEMON_COUNTER	20
+	int counter = MANAGEMON_COUNTER;
+
+	/* only 'raid_disks' and 'stripe_cache_size' trigger md_allow_write */
+	if (strcmp(name, "raid_disks") != 0 &&
+	    strcmp(name, "stripe_cache_size") != 0)
+		return sysfs_set_num(sra, NULL, name, n);
+
+	rc = sysfs_get_str(sra, NULL, "safe_mode_delay", safe, sizeof(safe));
+	if (rc <= 0)
+		return -1;
+	sysfs_set_num(sra, NULL, "safe_mode_delay", 0);
+	rc = sysfs_set_num(sra, NULL, name, n);
+	while ((rc < 0) && counter) {
+		counter--;
+		dprintf("managemon: Try to set %s to value %i (%i time(s)).\n",
+			name,
+			n,
+			MANAGEMON_COUNTER - counter);
+		wakeup_monitor();
+		usleep(250000);
+		rc = sysfs_set_num(sra, NULL, name, n);
+	}
+	sysfs_set_str(sra, NULL, "safe_mode_delay", safe);
+	return rc;
+}
+
+
 static void manage_member(struct mdstat_ent *mdstat,
 			  struct active_array *a)
 {
@@ -410,6 +450,8 @@ static void manage_member(struct mdstat_ent *mdstat,
 	else
 		frozen = 1; /* can't read metadata_version assume the worst */
 
+
+
 	if ((a->reshape_state != reshape_not_active) &&
 	    (a->reshape_state != reshape_in_progress)) {
 		dprintf("Reshape signals need to manage this member\n");
@@ -418,19 +460,19 @@ static void manage_member(struct mdstat_ent *mdstat,
 			struct mdinfo *newdev = NULL;
 			struct mdinfo *d;
 			int delta_disks = a->reshape_delta_disks;
+			int status_ok = 1;
 
+			newa = duplicate_aa(a);
+			if (newa == NULL) {
+				a->reshape_state = reshape_not_active;
+				goto reshape_out;
+			}
 			newdev = newa->container->ss->reshape_array(newa,
 							reshape_in_progress,
 							&updates);
 			if (newdev) {
-				int status_ok = 1;
-				newa = duplicate_aa(a);
-				if (newa == NULL)
-					goto reshape_out;
-
 				for (d = newdev; d ; d = d->next) {
 					struct mdinfo *newd;
-
 					newd = malloc(sizeof(*newd));
 					if (!newd) {
 						status_ok = 0;
@@ -449,7 +491,9 @@ static void manage_member(struct mdstat_ent *mdstat,
 					}
 					disk_init_and_add(newd, d, newa);
 				}
-				/* go with reshape
+			}
+			if (newa->reshape_state == reshape_in_progress) {
+				/* set reshape parameters
 				 */
 				if (status_ok)
 					if (sysfs_set_num(&newa->info,
@@ -457,6 +501,44 @@ static void manage_member(struct mdstat_ent *mdstat,
 							  "sync_max",
 							  0) < 0)
 						status_ok = 0;
+				if (status_ok && newa->reshape_raid_disks) {
+					dprintf("managemon: set raid_disks "\
+						"to %i\n",
+						newa->reshape_raid_disks);
+					if (subarray_set_num_man(
+						a->container->devname,
+						&newa->info,
+						"raid_disks",
+						newa->reshape_raid_disks))
+						status_ok = 0;
+				}
+				if (status_ok && newa->reshape_level > -1) {
+					char *c = map_num(pers,
+							  newa->reshape_level);
+					if (c == NULL)
+						status_ok = 0;
+					else {
+						dprintf("managemon: set level "\
+							"to %s\n",
+						c);
+						if (sysfs_set_str(&newa->info,
+								  NULL,
+								  "level",
+								  c) < 0)
+							status_ok = 0;
+					}
+				}
+				if (status_ok && newa->reshape_layout >= 0) {
+					dprintf("managemon: set layout to %i\n",
+						newa->reshape_layout);
+					if (sysfs_set_num(&newa->info,
+						NULL,
+						"layout",
+						newa->reshape_layout) < 0)
+						status_ok = 0;
+				}
+				/* go with reshape
+				 */
 				if (status_ok && sysfs_set_str(&newa->info,
 							      NULL,
 							      "sync_action",
@@ -464,9 +546,13 @@ static void manage_member(struct mdstat_ent *mdstat,
 					/* reshape executed
 					 */
 					dprintf("Reshape was started\n");
-					newa->new_data_disks =
-						newa->info.array.raid_disks +
-						delta_disks;
+					if (newa->reshape_raid_disks > 0)
+						newa->new_data_disks =
+						       newa->reshape_raid_disks;
+					else
+						newa->new_data_disks =
+						   newa->info.array.raid_disks +
+						   delta_disks;
 					if (a->info.array.level == 4)
 						newa->new_data_disks--;
 					if (a->info.array.level == 5)
@@ -475,10 +561,10 @@ static void manage_member(struct mdstat_ent *mdstat,
 						newa->new_data_disks--;
 					replace_array(a->container, a, newa);
 					a = newa;
+					newa = NULL;
 				} else {
 					/* on problems cancel update
 					 */
-					free_aa(newa);
 					free_updates(&updates);
 					updates = NULL;
 					a->container->ss->reshape_array(a,
@@ -488,20 +574,34 @@ static void manage_member(struct mdstat_ent *mdstat,
 						      NULL,
 						      "sync_action",
 						      "idle");
+					a->reshape_state = reshape_not_active;
 				}
 			}
+reshape_out:
+			if (a->reshape_state == reshape_not_active) {
+				dprintf("Cancel reshape.\n");
+				a->container->ss->reshape_array(a,
+							reshape_cancel_request,
+							&updates);
+				sysfs_set_str(&a->info,
+					      NULL,
+					      "sync_action",
+					      "idle");
+			}
 			dprintf("Send metadata update for reshape.\n");
 
 			queue_metadata_update(updates);
 			updates = NULL;
 			wakeup_monitor();
-reshape_out:
+
 			while (newdev) {
 				d = newdev->next;
 				free(newdev);
 				newdev = d;
 			}
 			free_updates(&updates);
+			if (newa)
+				free_aa(newa);
 		}
 	}
 
diff --git a/mdadm.h b/mdadm.h
index a596040..c6dfa3d 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -455,6 +455,8 @@ extern int sysfs_attr_match(const char *attr, const char *str);
 extern int sysfs_match_word(const char *word, char **list);
 extern int sysfs_set_str(struct mdinfo *sra, struct mdinfo *dev,
 			 char *name, char *val);
+extern int sysfs_get_ll(struct mdinfo *sra, struct mdinfo *dev,
+			char *name, unsigned long long *val);
 extern int sysfs_set_num(struct mdinfo *sra, struct mdinfo *dev,
 			 char *name, unsigned long long val);
 extern int sysfs_uevent(struct mdinfo *sra, char *event);
diff --git a/mdmon.h b/mdmon.h
index eff4988..a35752c 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -50,6 +50,9 @@ struct active_array {
 	enum state_of_reshape reshape_state;
 	int reshape_delta_disks;
 	int new_data_disks;
+	int reshape_raid_disks;
+	int reshape_level;
+	int reshape_layout;
 
 	int check_degraded; /* flag set by mon, read by manage */
 
diff --git a/super-intel.c b/super-intel.c
index 1ac6bc2..0d4bb07 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -314,6 +314,9 @@ struct imsm_update_reshape {
 	enum imsm_update_type type;
 	int update_memory_size;
 	int reshape_delta_disks;
+	int reshape_raid_disks;
+	int reshape_level;
+	int reshape_layout;
 	int disks_count;
 	int spares_in_update;
 	int devnum;
@@ -5352,6 +5355,7 @@ static void imsm_process_update(struct supertype *st,
 		__u32 new_mpb_size;
 		int new_disk_num;
 		struct intel_dev *current_dev;
+		struct imsm_dev *new_dev;
 
 		dprintf("imsm: imsm_process_update() for update_reshape "\
 			"[u->update_prepared  = %i]\n",
@@ -5420,12 +5424,13 @@ static void imsm_process_update(struct supertype *st,
 		}
 		/* find current dev in intel_super
 		 */
+		new_dev = (struct imsm_dev *)((void *)u + u->upd_devs_offset);
 		dprintf("\t\tLooking  for volume %s\n",
 			(char *)u->devs_mem.dev->volume);
 		current_dev = super->devlist;
 		while (current_dev) {
 			if (strcmp((char *)current_dev->dev->volume,
-				   (char *)u->devs_mem.dev->volume) == 0)
+				   (char *)new_dev->volume) == 0)
 				break;
 			current_dev = current_dev->next;
 		}
@@ -5444,7 +5449,14 @@ static void imsm_process_update(struct supertype *st,
 		/* set reshape_delta_disks
 		 */
 		a->reshape_delta_disks = u->reshape_delta_disks;
+		a->reshape_raid_disks = u->reshape_raid_disks;
 		a->reshape_state = reshape_is_starting;
+		a->reshape_level = u->reshape_level;
+		a->reshape_layout = u->reshape_layout;
+		if (a->reshape_level == 0) {
+			a->reshape_level = 5;
+			a->reshape_layout = 5;
+		}
 
 		super->updates_pending++;
 update_reshape_exit:
@@ -5920,12 +5932,7 @@ static void imsm_prepare_update(struct supertype *st,
 		if (u->reshape_delta_disks < 0)
 			break;
 		u->update_prepared = 1;
-		if (u->reshape_delta_disks == 0) {
-			/* for non growing reshape buffers sizes
-			 * are not affected but check some parameters
-			 */
-			break;
-		}
+
 		/* count HDDs
 		 */
 		u->disks_count = 0;
@@ -6450,6 +6457,126 @@ abort:
 	return ret_val;
 }
 
+/*****************************************************************************
+ * Function: update_geometry
+ * Description: Prepares imsm volume map update in case of volume reshape
+ * Returns: 0 on success, -1 if fail
+ * ***************************************************************************/
+int update_geometry(struct supertype *st,
+		    struct geo_params *geo)
+{
+	int fd = -1, ret_val = -1;
+	struct mdinfo *sra = NULL;
+	char supported = 1;
+
+	fd = open_dev(geo->dev_id);
+	if (fd < 0) {
+		dprintf("imsm: cannot open device\n");
+		return -1;
+	}
+
+	sra = sysfs_read(fd,
+			 0,
+			 GET_DISKS | GET_LAYOUT | GET_CHUNK |
+			 GET_SIZE | GET_LEVEL | GET_DEVS);
+	if (!sra) {
+		dprintf("imsm: Cannot get mdinfo!\n");
+		goto update_geometry_exit;
+	}
+
+	if (sra->devs == NULL) {
+		dprintf("imsm: Cannot load device information.\n");
+		goto update_geometry_exit;
+	}
+	/* is size change possible??? */
+	if (((unsigned long long)geo->size != sra->devs->component_size) &&
+					      (geo->size != UnSet) &&
+					      (geo->size > 0)) {
+		geo->size = sra->devs->component_size;
+		dprintf("imsm: Changing the array size is not supported!\n");
+		goto update_geometry_exit;
+	}
+
+	if ((geo->level != sra->array.level) &&
+	    (geo->level >= 0) &&
+	    (geo->level != UnSet)) {
+		switch (sra->array.level) {
+		case 0:
+			if (geo->level != 5)
+				supported = 0;
+			break;
+		case 5:
+			if (geo->level != 0)
+				supported = 0;
+			break;
+		case 1:
+			if ((geo->level != 5) && (geo->level != 0))
+				supported = 0;
+			break;
+		case 10:
+			if (geo->level != 5)
+				supported = 0;
+			break;
+		default:
+			supported = 0;
+			break;
+		}
+		if (!supported) {
+			dprintf("imsm: Error. Level Migration from %d to %d "\
+				"not supported!\n",
+				sra->array.level,
+				geo->level);
+			goto update_geometry_exit;
+		}
+	} else {
+		geo->level = sra->array.level;
+	}
+
+	if ((geo->layout != sra->array.layout) &&
+	    ((geo->layout != UnSet) && (geo->layout != -1))) {
+		if ((sra->array.layout == 0) &&
+		    (sra->array.level == 5) &&
+		    (geo->layout == 5)) {
+			/* reshape 5 -> 4 */
+			geo->raid_disks++;
+		} else if ((sra->array.layout == 5) &&
+			   (sra->array.level == 5) &&
+			   (geo->layout == 0)) {
+			/* reshape 4 -> 5 */
+			geo->layout = 0;
+			geo->level = 5;
+		} else {
+			dprintf("imsm: Error. Layout Migration from %d to %d "\
+				"not supported!\n",
+				sra->array.layout,
+				geo->layout);
+			ret_val = -1;
+			goto update_geometry_exit;
+		}
+	}
+
+	if ((geo->chunksize == 0) || (geo->chunksize == UnSet))
+		geo->chunksize = sra->array.chunk_size;
+
+	if (!validate_geometry_imsm(st,
+				    geo->level,
+				    geo->layout,
+				    geo->raid_disks,
+				    geo->chunksize,
+				    geo->size,
+				    0, 0, 1))
+		goto update_geometry_exit;
+
+	ret_val = 0;
+
+update_geometry_exit:
+	sysfs_free(sra);
+	if (fd > -1)
+		close(fd);
+
+	return ret_val;
+}
+
 /******************************************************************************
  * function: imsm_create_metadata_update_for_reshape
  * Function creates update for whole IMSM container.
@@ -6507,6 +6634,9 @@ struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(
 	}
 	u->reshape_delta_disks = delta_disks;
 	u->update_prepared = -1;
+	u->reshape_raid_disks = 0;
+	u->reshape_level = -1;
+	u->reshape_layout = -1;
 	u->update_memory_size = update_memory_size;
 	u->type = update_reshape;
 	u->spares_in_update = 0;
@@ -6563,6 +6693,26 @@ struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(
 								     idx);
 				}
 				u->devnum = geo->dev_id;
+				/* case for reshape without grow */
+				if (u->reshape_delta_disks == 0) {
+					dprintf("imsm: reshape prepare "\
+						"metadata for volume= %d, "\
+						"index= %d\n",
+						geo->dev_id,
+						i);
+					if (update_geometry(st, geo) == -1) {
+						dprintf("imsm: ERROR: Cannot "\
+							"prepare update for "\
+							"volume map!\n");
+						ret_val = NULL;
+						goto exit_imsm_create_metadata_update_for_reshape;
+					} else {
+						new_map->raid_level =
+							geo->level;
+						new_map->blocks_per_strip =
+							geo->chunksize / 512;
+					}
+				}
 				break;
 			}
 		}
@@ -6729,6 +6879,10 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 		       char *backup, char *dev, int verbouse)
 {
 	int ret_val = 1;
+	struct mdinfo *sra = NULL;
+	int fd = -1;
+	char buf[PATH_MAX];
+	int delta_disks = -1;
 	struct geo_params geo;
 
 	dprintf("imsm: reshape_super called.\n");
@@ -6787,9 +6941,68 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 				"on container\n");
 		if (ret_val)
 			unfreeze_container(st);
+		goto imsm_reshape_super_exit;
 	} else
 		dprintf("imsm: not a container operation\n");
 
+	fd = open_dev(st->devnum);
+	if (fd < 0) {
+		dprintf("imsm: cannot open device: %s\n", buf);
+		goto imsm_reshape_super_exit;
+	}
+
+	sra = sysfs_read(fd, 0,  GET_VERSION | GET_LEVEL | GET_LAYOUT |
+			 GET_DISKS | GET_DEVS | GET_CHUNK | GET_SIZE);
+	if (sra == NULL) {
+		fprintf(stderr, Name ": Cannot read sysfs info (imsm)\n");
+		goto imsm_reshape_super_exit;
+	}
+
+	geo.dev_id = -1;
+
+	/* continue volume check - proceed if delta_disk is zero only
+	 */
+	if (geo.raid_disks > 0 && geo.raid_disks != UnSet)
+		delta_disks = geo.raid_disks - sra->array.raid_disks;
+	else
+		delta_disks = 0;
+	dprintf("imsm: imsm_reshape_super() for array, delta disks = %i\n",
+		delta_disks);
+	if (delta_disks == 0) {
+		struct imsm_update_reshape *u;
+		st->update_tail = &st->updates;
+		dprintf("imsm: imsm_reshape_super(): raid_disks not changed "\
+			"for volume reshape. Reshape allowed.\n");
+
+		geo.dev_id = st->devnum;
+		u = imsm_create_metadata_update_for_reshape(st, &geo);
+		if (u) {
+			if (geo.raid_disks > raid_disks)
+				u->reshape_raid_disks = geo.raid_disks;
+			u->reshape_level = geo.level;
+			u->reshape_layout = geo.layout;
+			ret_val = 0;
+			append_metadata_update(st, u, u->update_memory_size);
+		}
+		goto imsm_reshape_super_exit;
+	} else {
+		char *devname = devnum2devname(st->devnum);
+		char *devtoprint = devname;
+
+		if (devtoprint == NULL)
+			devtoprint = "Device";
+		fprintf(stderr, Name
+			": %s cannot be reshaped. Command has to be executed on container.\n",
+			devtoprint);
+		if (devname)
+			free(devname);
+	}
+
+imsm_reshape_super_exit:
+	sysfs_free(sra);
+	if (fd >= 0)
+		close(fd);
+
 	dprintf("imsm: reshape_super Exit code = %i\n", ret_val);
 
 	return ret_val;
@@ -7048,7 +7261,8 @@ struct mdinfo *imsm_reshape_array(struct active_array *a,
 
 	if (a->reshape_delta_disks == 0) {
 		dprintf("array parameters has to be changed\n");
-		/* TBD */
+		a->reshape_state = reshape_in_progress;
+		return disk_list;
 	}
 	if (a->reshape_delta_disks > 0) {
 		dprintf("grow is detected.\n");
@@ -7075,17 +7289,14 @@ imsm_reshape_array_exit:
 		imsm_grow_array_remove_devices_on_cancel(a);
 		u = (struct imsm_update_reshape *)calloc(1,
 					sizeof(struct imsm_update_reshape));
-		if (u) {
+		if (u)
 			u->type = update_reshape_cancel;
-			a->reshape_state = reshape_not_active;
-		}
 	}
 
 	if (u) {
 		/* post any prepared update
 		 */
 		u->devnum = a->devnum;
-
 		u->update_memory_size = sizeof(struct imsm_update_reshape);
 		u->reshape_delta_disks = a->reshape_delta_disks;
 		u->update_prepared = 1;
@@ -7283,7 +7494,8 @@ int imsm_child_grow(struct supertype *st,
 
 void return_to_raid0(struct mdinfo *sra)
 {
-	if (sra->array.level == 4) {
+	if ((sra->array.level == 4) ||
+	    (sra->array.level == 0)) {
 		dprintf("Execute backward takeover to raid0\n");
 		sysfs_set_str(sra, NULL, "level", "raid0");
 	}
@@ -7718,9 +7930,47 @@ int imsm_manage_reshape(struct supertype *st, char *backup)
 	 * for single vlolume reshape exit only and reuse Grow_reshape() code
 	 */
 	if (st->container_dev != st->devnum) {
+		int fd;
 		dprintf("imsm: manage_reshape() current volume devnum: %i\n",
 			st->devnum);
 
+		fd = open_dev(st->devnum);
+		if (fd > -1) {
+			struct mdinfo *info;
+			struct mdinfo sra;
+			char *cont_name;
+
+			sra.devs = NULL;
+			st->ss->getinfo_super(st, &sra, NULL);
+			/* wait for the reshape to finish and manage
+			 * the array size based on metadata information
+			 */
+			cont_name = devnum2devname(st->devnum);
+			if (cont_name) {
+				ping_manager(cont_name);
+				ping_monitor(cont_name);
+				free(cont_name);
+			}
+			imsm_grow_manage_size(st, &sra, -1);
+
+			/* for level == 4: execute takeover to raid0 */
+			info = sysfs_read(fd,
+					  0,
+					  GET_VERSION | GET_LEVEL |
+					  GET_DEVS | GET_LAYOUT);
+			if (info) {
+			/* currently md doesn't support direct
+			 * translation from raid5 to raid4;
+			 * it has to be done via raid5 layout 5
+			 */
+				if ((info->array.level == 5) &&
+				    (info->array.layout == 5))
+					info->array.level = 4;
+				return_to_raid0(info);
+				sysfs_free(info);
+			}
+			close(fd);
+		}
 		return ret_val;
 	}
 	ret_val = imsm_manage_container_reshape(st, backup);
@@ -7786,3 +8036,4 @@ struct superswitch super_imsm = {
 	.prepare_update = imsm_prepare_update,
 #endif /* MDASSEMBLE */
 };
+


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 19/29] Detect level change
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (17 preceding siblings ...)
  2010-12-09 15:21 ` [PATCH 18/29] Migration: raid5->raid0 Adam Kwolek
@ 2010-12-09 15:21 ` Adam Kwolek
  2010-12-09 15:21 ` [PATCH 20/29] Migration raid0->raid5 Adam Kwolek
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

For level migration support, mdmon has to be able to react to level changes:
it must be able to change the configuration of an active array, and, when an
array's level changes to raid0, to finish monitoring that array.
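The check this enables can be sketched stand-alone (the helper names below are illustrative; the real code uses `map_name(pers, mdstat->level)` and `duplicate_aa()`/`replace_array()`): the active_array object is only rebuilt when mdstat reports a valid personality name whose numeric level differs from the cached one.

```c
#include <string.h>

/* Illustrative stand-in for mdmon's pers[] name/number map. Returns -1
 * for unknown names so callers can skip the update, mirroring the
 * "level >= 0" guard added to manage_member(). */
static int level_from_name(const char *name)
{
	static const struct { const char *name; int level; } pers[] = {
		{ "raid0", 0 }, { "raid1", 1 }, { "raid4", 4 },
		{ "raid5", 5 }, { "raid6", 6 }, { "raid10", 10 },
	};
	unsigned int i;

	for (i = 0; name && i < sizeof(pers) / sizeof(pers[0]); i++)
		if (strcmp(pers[i].name, name) == 0)
			return pers[i].level;
	return -1;
}

/* Duplicate and replace the array object only when mdstat reports a
 * valid level different from the cached one. */
static int level_change_needed(int cached_level, const char *mdstat_level)
{
	int level = level_from_name(mdstat_level);

	return level >= 0 && level != cached_level;
}
```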

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c |   12 +++++++++++-
 monitor.c   |    2 +-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/managemon.c b/managemon.c
index 0c84e6d..1774299 100644
--- a/managemon.c
+++ b/managemon.c
@@ -450,7 +450,17 @@ static void manage_member(struct mdstat_ent *mdstat,
 	else
 		frozen = 1; /* can't read metadata_version assume the worst */
 
-
+	if (mdstat->level) {
+		int level = map_name(pers, mdstat->level);
+		if (a->info.array.level != level && level >= 0) {
+			newa = duplicate_aa(a);
+			if (newa) {
+				newa->info.array.level = level;
+				replace_array(a->container, a, newa);
+				a = newa;
+			}
+		}
+	}
 
 	if ((a->reshape_state != reshape_not_active) &&
 	    (a->reshape_state != reshape_in_progress)) {
diff --git a/monitor.c b/monitor.c
index bbfe3d2..50dada4 100644
--- a/monitor.c
+++ b/monitor.c
@@ -513,7 +513,7 @@ static int wait_and_act(struct supertype *container, int nowait)
 		/* once an array has been deactivated we want to
 		 * ask the manager to discard it.
 		 */
-		if (!a->container) {
+		if (!a->container || (a->info.array.level == 0)) {
 			if (discard_this) {
 				ap = &(*ap)->next;
 				continue;



* [PATCH 20/29] Migration raid0->raid5
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (18 preceding siblings ...)
  2010-12-09 15:21 ` [PATCH 19/29] Detect level change Adam Kwolek
@ 2010-12-09 15:21 ` Adam Kwolek
  2010-12-09 15:21 ` [PATCH 21/29] Read chunk size and layout from mdstat Adam Kwolek
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

Add an implementation for migration from raid0 to raid5 in one step.
For imsm, a raid level parameter flow from mdadm (via metadata update) to managemon was added.

Takeover is blocked for this migration case (only update_reshape is used).
For migration on a container (OLCE), reinitialize the variables that are
changed by the single-array reshape case.
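The condition this patch extends in imsm_create_metadata_update_for_reshape() can be condensed as follows (a hypothetical helper, not part of mdadm): a per-volume geometry update is prepared either for a reshape that does not change the disk count, or when the requested level differs from the level in the current map.

```c
/* Condensed form of the test added to the update-preparation path:
 * delta_disks == 0 covers "reshape without grow"; a set, differing
 * level covers "grow is a level-change effect" (raid0 -> raid5). */
static int needs_geometry_update(int delta_disks, int cur_level,
				 int new_level, int level_unset)
{
	return delta_disks == 0 ||
	       (!level_unset && new_level != cur_level);
}
```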

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |   35 ++++++++++++++++++++++++++++++-----
 1 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 0d4bb07..564869e 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -5002,7 +5002,7 @@ static void imsm_sync_metadata(struct supertype *container)
 {
 	struct intel_super *super = container->sb;
 
-	if (!super->updates_pending)
+	if (!super || !super->updates_pending)
 		return;
 
 	write_super_imsm(container, 0);
@@ -6605,6 +6605,14 @@ struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(
 	dprintf("imsm_update_metadata_for_reshape(enter) raid_disks = %i\n",
 		geo->raid_disks);
 
+	if (super == NULL || super->anchor == NULL) {
+		dprintf("Error: imsm_create_metadata_update_for_reshape(): "\
+			"null pointers on input\n");
+		dprintf("\t\t super = %p\n", super);
+		if (super)
+			dprintf("\t\t super->anchor = %p\n", super->anchor);
+		return ret_val;
+	}
 	if ((geo->raid_disks < super->anchor->num_disks) ||
 	    (geo->raid_disks == UnSet))
 		geo->raid_disks = super->anchor->num_disks;
@@ -6693,8 +6701,12 @@ struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(
 								     idx);
 				}
 				u->devnum = geo->dev_id;
-				/* case for reshape without grow */
-				if (u->reshape_delta_disks == 0) {
+				/* case for reshape without grow
+				 * or grow is level change effect
+				 */
+				if ((u->reshape_delta_disks == 0) ||
+				    ((new_map->raid_level != geo->level) &&
+				     (geo->level != UnSet))) {
 					dprintf("imsm: reshape prepare "\
 						"metadata for volume= %d, "\
 						"index= %d\n",
@@ -6881,6 +6893,7 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 	int ret_val = 1;
 	struct mdinfo *sra = NULL;
 	int fd = -1;
+	int fdc = -1;
 	char buf[PATH_MAX];
 	int delta_disks = -1;
 	struct geo_params geo;
@@ -6945,9 +6958,15 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 	} else
 		dprintf("imsm: not a container operation\n");
 
+	fdc = open_dev(st->container_dev);
+	if (fdc < 0) {
+		dprintf("imsm: cannot open container: %s\n", buf);
+		goto imsm_reshape_super_exit;
+	}
+
 	fd = open_dev(st->devnum);
 	if (fd < 0) {
-		dprintf("imsm: cannot open device: %s\n", buf);
+		dprintf("imsm: cannot open device: %s\n", geo.dev_name);
 		goto imsm_reshape_super_exit;
 	}
 
@@ -6964,8 +6983,10 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 	 */
 	if (geo.raid_disks > 0 && geo.raid_disks != UnSet)
 		delta_disks = geo.raid_disks - sra->array.raid_disks;
-	else
+	else {
 		delta_disks = 0;
+		geo.raid_disks = sra->array.raid_disks;
+	}
 	dprintf("imsm: imsm_reshape_super() for array, delta disks = %i\n",
 		delta_disks);
 	if (delta_disks == 0) {
@@ -7002,6 +7023,8 @@ imsm_reshape_super_exit:
 	sysfs_free(sra);
 	if (fd >= 0)
 		close(fd);
+	if (fdc >= 0)
+		close(fdc);
 
 	dprintf("imsm: reshape_super Exit code = %i\n", ret_val);
 
@@ -7867,6 +7890,8 @@ int imsm_manage_container_reshape(struct supertype *st, char *backup)
 						"update for subarray: %i "\
 						"(md%i)\n", array, geo.dev_id);
 					st->update_tail = &st->updates;
+					geo.size = UnSet;
+					geo.level = UnSet;
 					u = imsm_create_metadata_update_for_reshape(st, &geo);
 					if (u) {
 						u->reshape_delta_disks =



* [PATCH 21/29] Read chunk size and layout from mdstat
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (19 preceding siblings ...)
  2010-12-09 15:21 ` [PATCH 20/29] Migration raid0->raid5 Adam Kwolek
@ 2010-12-09 15:21 ` Adam Kwolek
  2010-12-09 15:21 ` [PATCH 22/29] FIX: mdstat doesn't read chunk size correctly Adam Kwolek
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

Support reading the layout and chunk size from mdstat.
This is needed for external-metadata reshapes that change the layout or chunk size.

This patch also removes changing the chunk size as a direct result of mdadm action:
the chunk size in mdmon has to change only when it has really changed in md.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c |    1 +
 mdadm.h     |    2 ++
 mdstat.c    |   11 +++++++++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/managemon.c b/managemon.c
index 1774299..172a355 100644
--- a/managemon.c
+++ b/managemon.c
@@ -444,6 +444,7 @@ static void manage_member(struct mdstat_ent *mdstat,
 	a->info.array.raid_disks = mdstat->raid_disks;
 	// MORE
 
+	a->info.array.chunk_size = mdstat->chunk_size;
 	/* honor 'frozen' */
 	if (sysfs_get_str(&a->info, NULL, "metadata_version", buf, sizeof(buf)) > 0)
 		frozen = buf[9] == '-';
diff --git a/mdadm.h b/mdadm.h
index c6dfa3d..ceffb81 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -387,6 +387,8 @@ struct mdstat_ent {
 	int		resync; /* 3 if check, 2 if reshape, 1 if resync, 0 if recovery */
 	int		devcnt;
 	int		raid_disks;
+	int		layout;
+	int		chunk_size;
 	char *		metadata_version;
 	struct dev_member {
 		char			*name;
diff --git a/mdstat.c b/mdstat.c
index c5a07b5..47d81d4 100644
--- a/mdstat.c
+++ b/mdstat.c
@@ -146,7 +146,7 @@ struct mdstat_ent *mdstat_read(int hold, int start)
 	end = &all;
 	for (; (line = conf_line(f)) ; free_line(line)) {
 		struct mdstat_ent *ent;
-		char *w;
+		char *w, *prev = NULL;
 		int devnum;
 		int in_devs = 0;
 		char *ep;
@@ -191,7 +191,7 @@ struct mdstat_ent *mdstat_read(int hold, int start)
 		ent->dev = strdup(line);
 		ent->devnum = devnum;
 
-		for (w=dl_next(line); w!= line ; w=dl_next(w)) {
+		for (w = dl_next(line); w != line ; prev = w, w = dl_next(w)) {
 			int l = strlen(w);
 			char *eq;
 			if (strcmp(w, "active")==0)
@@ -266,6 +266,13 @@ struct mdstat_ent *mdstat_read(int hold, int start)
 				   w[0] <= '9' &&
 				   w[l-1] == '%') {
 				ent->percent = atoi(w);
+			} else if (strcmp(w, "algorithm") == 0 &&
+				   dl_next(w) != line) {
+				w = dl_next(w);
+				ent->layout = atoi(w);
+			} else if (strncmp(w, "chunk", 5) == 0 &&
+				   prev != NULL) {
+				ent->chunk_size = atoi(prev) * 1024;
 			}
 		}
 		if (insert_here && (*insert_here)) {



* [PATCH 22/29] FIX: mdstat doesn't read chunk size correctly
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (20 preceding siblings ...)
  2010-12-09 15:21 ` [PATCH 21/29] Read chunk size and layout from mdstat Adam Kwolek
@ 2010-12-09 15:21 ` Adam Kwolek
  2010-12-09 15:21 ` [PATCH 23/29] Migration: Chunk size migration Adam Kwolek
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

The resync/reshape/check detection in mdstat parsing recognized tokens by
their first letter only, so the "chunk" token was misrecognized. This patch
corrects that behavior.

It would probably be better to match the full "check" string instead of the
current first-letter condition.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 mdstat.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mdstat.c b/mdstat.c
index 47d81d4..6307ddb 100644
--- a/mdstat.c
+++ b/mdstat.c
@@ -247,7 +247,8 @@ struct mdstat_ent *mdstat_read(int hold, int start)
 				else
 					ent->resync = 0;
 			} else if (ent->percent == -1 &&
-				   (w[0] == 'r' || w[0] == 'c')) {
+				   (w[0] == 'r' || w[0] == 'c') &&
+				   strncmp(w, "chunk", 5) != 0) {
 				if (strncmp(w, "resync", 4)==0)
 					ent->resync = 1;
 				if (strncmp(w, "reshape", 7)==0)



* [PATCH 23/29] Migration: Chunk size migration
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (21 preceding siblings ...)
  2010-12-09 15:21 ` [PATCH 22/29] FIX: mdstat doesn't read chunk size correctly Adam Kwolek
@ 2010-12-09 15:21 ` Adam Kwolek
  2010-12-09 15:21 ` [PATCH 24/29] Add takeover support for external meta Adam Kwolek
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

Add an implementation of chunk size migration for external metadata.
The update works by changing the array parameters in managemon; the reshape is started by managemon as well.
mdadm waits for the array to reach the reshape state instead of starting the reshape process itself.
For imsm, a chunk size parameter flow from mdadm (via metadata update) to managemon was added.
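Before managemon writes the new value to the array's "chunk_size" sysfs attribute, a sanity check along these lines is reasonable (a sketch only: the power-of-two and 4KiB-minimum limits are assumptions for illustration, not taken from the patch, which treats `reshape_chunk_size <= 0` as "no change requested").

```c
/* Return 1 if the chunk size carried in the reshape update should be
 * applied: positive, at least 4KiB, and a power of two (assumed imsm
 * constraints). Non-positive values mean "no change requested". */
static int chunk_size_change_ok(int reshape_chunk_size)
{
	if (reshape_chunk_size <= 0)
		return 0; /* nothing to apply */
	return reshape_chunk_size >= 4096 &&
	       (reshape_chunk_size & (reshape_chunk_size - 1)) == 0;
}
```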

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c   |   10 ++++++++++
 mdmon.h       |    1 +
 super-intel.c |    4 ++++
 3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/managemon.c b/managemon.c
index 172a355..7ff49ab 100644
--- a/managemon.c
+++ b/managemon.c
@@ -539,6 +539,16 @@ static void manage_member(struct mdstat_ent *mdstat,
 							status_ok = 0;
 					}
 				}
+				if (status_ok && newa->reshape_chunk_size > 0) {
+					dprintf("managemon: set chunk_size "\
+						"to %i\n",
+						newa->reshape_chunk_size);
+					if (sysfs_set_num(&newa->info,
+						NULL,
+						"chunk_size",
+						newa->reshape_chunk_size) < 0)
+						status_ok = 0;
+				}
 				if (status_ok && newa->reshape_layout >= 0) {
 					dprintf("managemon: set layout to %i\n",
 						newa->reshape_layout);
diff --git a/mdmon.h b/mdmon.h
index a35752c..c463003 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -53,6 +53,7 @@ struct active_array {
 	int reshape_raid_disks;
 	int reshape_level;
 	int reshape_layout;
+	int reshape_chunk_size;
 
 	int check_degraded; /* flag set by mon, read by manage */
 
diff --git a/super-intel.c b/super-intel.c
index 564869e..9f0bb2c 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -317,6 +317,7 @@ struct imsm_update_reshape {
 	int reshape_raid_disks;
 	int reshape_level;
 	int reshape_layout;
+	int reshape_chunk_size;
 	int disks_count;
 	int spares_in_update;
 	int devnum;
@@ -5457,6 +5458,7 @@ static void imsm_process_update(struct supertype *st,
 			a->reshape_level = 5;
 			a->reshape_layout = 5;
 		}
+		a->reshape_chunk_size = u->reshape_chunk_size;
 
 		super->updates_pending++;
 update_reshape_exit:
@@ -6645,6 +6647,7 @@ struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(
 	u->reshape_raid_disks = 0;
 	u->reshape_level = -1;
 	u->reshape_layout = -1;
+	u->reshape_chunk_size = -1;
 	u->update_memory_size = update_memory_size;
 	u->type = update_reshape;
 	u->spares_in_update = 0;
@@ -7002,6 +7005,7 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 				u->reshape_raid_disks = geo.raid_disks;
 			u->reshape_level = geo.level;
 			u->reshape_layout = geo.layout;
+			u->reshape_chunk_size = geo.chunksize;
 			ret_val = 0;
 			append_metadata_update(st, u, u->update_memory_size);
 		}



* [PATCH 24/29] Add takeover support for external meta
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (22 preceding siblings ...)
  2010-12-09 15:21 ` [PATCH 23/29] Migration: Chunk size migration Adam Kwolek
@ 2010-12-09 15:21 ` Adam Kwolek
  2010-12-09 15:22 ` [PATCH 25/29] Takeover raid10 -> raid0 for external metadata Adam Kwolek
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

This patch introduces 0->10 and 10->0 takeover operations for external
metadata. It defines all necessary functions, interfaces and structures.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
---

 Grow.c        |   77 +++++++++++++++++++++++++++++++++++++++-------
 super-intel.c |   96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 160 insertions(+), 13 deletions(-)

diff --git a/Grow.c b/Grow.c
index 81373ef..833b0bc 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1012,7 +1012,7 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 	 *
 	 */
 	struct mdu_array_info_s array, orig;
-	char *c;
+	char *c = NULL;
 	int rv = 0;
 	struct supertype *st;
 	char *subarray = NULL;
@@ -1305,17 +1305,69 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 				rv = 1;/* not possible */
 				goto release;
 			}
-			err = sysfs_set_str(sra, NULL, "level", c);
-			if (err) {
-				err = errno;
-				fprintf(stderr, Name ": %s: could not set level to %s\n",
-					devname, c);
-				if (err == EBUSY && 
-				    (array.state & (1<<MD_SB_BITMAP_PRESENT)))
-					fprintf(stderr, "       Bitmap must be removed before level can be changed\n");
+			if (level > 0) {
+				err = sysfs_set_str(sra, NULL, "level", c);
+				if (err) {
+					err = errno;
+					fprintf(stderr,
+						Name ": %s: could not set "\
+							"level to %s\n",
+						devname, c);
+					if (err == EBUSY &&
+					    (array.state &
+					     (1<<MD_SB_BITMAP_PRESENT)))
+						fprintf(stderr,
+							"       Bitmap must be "\
+							"removed before level "\
+							"can be changed\n");
+					rv = 1;
+					goto release;
+				}
+			}
+
+			if (st && reshape_super(st,
+						-1,
+						level,
+						UnSet,
+						0,
+						0,
+						NULL,
+						devname,
+						!quiet)) {
 				rv = 1;
 				goto release;
 			}
+			/* before sending update make sure that
+			 * for external metadata and after changing raid level
+			 * mdmon is running
+			 */
+			if (st->ss->external &&
+			    !mdmon_running(st->container_dev) &&
+			    level > 0) {
+				start_mdmon(st->container_dev);
+				if (container)
+					ping_monitor(container);
+			}
+			sync_metadata(st);
+			if (level == 0) {
+				err = sysfs_set_str(sra, NULL, "level", c);
+				if (err) {
+					err = errno;
+					fprintf(stderr, Name ": %s: could not "\
+						"set level to %s\n",
+						devname, c);
+					if (err == EBUSY &&
+					    (array.state &
+					     (1<<MD_SB_BITMAP_PRESENT)))
+						fprintf(stderr,
+							"       Bitmap must "\
+							"be removed before "\
+							"level can be "\
+							"changed\n");
+					rv = 1;
+				}
+				goto release;
+			}
 			orig = array;
 			orig_level = orig.level;
 			ioctl(fd, GET_ARRAY_INFO, &array);
@@ -1327,6 +1379,7 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 				fprintf(stderr, Name " level of %s changed to %s\n",
 					devname, c);
 			changed = 1;
+
 		}
 	}
 
@@ -1381,8 +1434,7 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 			/* Looks like this level change doesn't need
 			 * a reshape after all.
 			 */
-			c = map_num(pers, level);
-			if (c) {
+			if ((c) && (level == 0)) {
 				rv = sysfs_set_str(sra, NULL, "level", c);
 				if (rv) {
 					int err = errno;
@@ -1401,7 +1453,8 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 		if (st->ss->external && !mdmon_running(st->container_dev) &&
 		    level > 0) {
 			start_mdmon(st->container_dev);
-			ping_monitor(container);
+			if (container)
+				ping_monitor(container);
 		}
 		goto release;
 	}
diff --git a/super-intel.c b/super-intel.c
index 9f0bb2c..589d40c 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -289,6 +289,7 @@ enum imsm_update_type {
 	update_reshape,
 	update_reshape_set_slots,
 	update_reshape_cancel,
+	update_level,
 };
 
 struct imsm_update_activate_spare {
@@ -365,6 +366,31 @@ struct imsm_update_add_disk {
 	enum imsm_update_type type;
 };
 
+struct imsm_disk_changes {
+	int major;
+	int minor;
+	int index;
+};
+
+struct imsm_update_level {
+	enum imsm_update_type type;
+	struct dl *disk_list;
+	int delta_disks;
+	int container_member;
+	int disk_qan;
+	int changes_offset;
+	int rm_qan;
+	int add_qan;
+	struct imsm_dev dev;
+	/* the table with disk changes follows here
+	 */
+	/* imsm_disk_changes entries, pointed to by changes_offset,
+	 * are stored here as raw data,
+	 * one per sizeof(struct imsm_disk_changes)
+	 *
+	 */
+};
+
 static struct supertype *match_metadata_desc_imsm(char *arg)
 {
 	struct supertype *st;
@@ -5598,6 +5624,9 @@ update_reshape_exit:
 		super->updates_pending++;
 		break;
 	}
+	case update_level: {
+		break;
+	}
 	case update_activate_spare: {
 		struct imsm_update_activate_spare *u = (void *) update->buf; 
 		struct imsm_dev *dev = get_imsm_dev(super, u->array);
@@ -5982,6 +6011,9 @@ static void imsm_prepare_update(struct supertype *st,
 		u->update_prepared = -1;
 		break;
 	}
+	case update_level: {
+		break;
+	}
 	case update_create_array: {
 		struct imsm_update_create_array *u = (void *) update->buf;
 		struct intel_dev *dv;
@@ -6157,6 +6189,13 @@ int imsm_find_array_minor_by_subdev(int subdev, int container, int *minor)
 	return -1;
 }
 
+static int update_level_imsm(struct supertype *st, struct mdinfo *info,
+			     struct geo_params *geo, int verbose,
+			     int uuid_set, char *homehost)
+{
+	return 0;
+}
+
 int imsm_reshape_is_allowed_on_container(struct supertype *st,
 					 struct geo_params *geo)
 {
@@ -6895,6 +6934,7 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 {
 	int ret_val = 1;
 	struct mdinfo *sra = NULL;
+	struct mdinfo *srac = NULL;
 	int fd = -1;
 	int fdc = -1;
 	char buf[PATH_MAX];
@@ -6957,6 +6997,7 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 				"on container\n");
 		if (ret_val)
 			unfreeze_container(st);
+
 		goto imsm_reshape_super_exit;
 	} else
 		dprintf("imsm: not a container operation\n");
@@ -6966,6 +7007,12 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 		dprintf("imsm: cannot open container: %s\n", buf);
 		goto imsm_reshape_super_exit;
 	}
+	srac = sysfs_read(fdc, 0,  GET_VERSION | GET_LEVEL | GET_LAYOUT |
+			  GET_DISKS | GET_DEVS | GET_CHUNK | GET_SIZE);
+	if (srac == NULL) {
+		fprintf(stderr, Name ": Cannot read sysfs info (imsm)\n");
+		goto imsm_reshape_super_exit;
+	}
 
 	fd = open_dev(st->devnum);
 	if (fd < 0) {
@@ -6982,7 +7029,53 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 
 	geo.dev_id = -1;
 
-	/* continue volume check - proceed if delta_disk is zero only
+	/* we have a volume, so takeover can only be done for a single volume
+	 */
+	if ((geo.size == -1) &&
+	    (geo.layout == UnSet) &&
+	    (geo.raid_disks == 0) &&
+	    (geo.level != UnSet)) {
+		/* ok - this is takeover */
+		struct intel_super *super;
+
+		/* takeover raid0<->raid5 doesn't need meta update
+		 * this can be handled by migrations if necessary
+		 */
+		if ((geo.level == 5) && (sra->array.level == 5)) {
+			ret_val = 0;
+			goto imsm_reshape_super_exit;
+		}
+		st->ss->load_super(st, fdc, NULL);
+		super = st->sb;
+		if (!super) {
+			fprintf(stderr, Name ": Super pointer is NULL.\n");
+			goto imsm_reshape_super_exit;
+		}
+		if (super->anchor->num_raid_devs > 1) {
+			fprintf(stderr, Name ": Cannot perform raid10 takeover "
+				"on multiarray container for imsm.\n");
+			goto imsm_reshape_super_exit;
+		}
+		super->current_vol = 0;
+		st->ss->getinfo_super(st, sra, NULL);
+		if (imsm_find_array_minor_by_subdev(super->current_vol,
+						    st->container_dev,
+						    &geo.dev_id) < 0)
+			goto imsm_reshape_super_exit;
+
+		/* send metadata update for
+		 * raid10 -> raid0 or raid0 -> raid10 takeover */
+		if (((geo.level == 0) && (sra->array.level == 10)) ||
+		   ((geo.level == 10) && (sra->array.level == 0))) {
+			st->update_tail = &st->updates;
+			if (update_level_imsm(st, sra, &geo, 0, 0, NULL) == 0)
+				ret_val = 0;
+			goto imsm_reshape_super_exit;
+		}
+	}
+
+	/* this is not takeover
+	 * continue volume check - proceed if delta_disk is zero only
 	 */
 	if (geo.raid_disks > 0 && geo.raid_disks != UnSet)
 		delta_disks = geo.raid_disks - sra->array.raid_disks;
@@ -7025,6 +7118,7 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 
 imsm_reshape_super_exit:
 	sysfs_free(sra);
+	sysfs_free(srac);
 	if (fd >= 0)
 		close(fd);
 	if (fdc >= 0)



* [PATCH 25/29] Takeover raid10 -> raid0 for external metadata
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (23 preceding siblings ...)
  2010-12-09 15:21 ` [PATCH 24/29] Add takeover support for external meta Adam Kwolek
@ 2010-12-09 15:22 ` Adam Kwolek
  2010-12-09 15:22 ` [PATCH 26/29] Takeover raid0 -> raid10 " Adam Kwolek
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

The patch introduces takeover from level 10 to level 0 for imsm
metadata. It contains the procedures for preparing
and applying the metadata update during a 10 -> 0 takeover.
When performing a 10->0 takeover, mdmon has to update the external
metadata (due to disk slot and level changes).
To achieve that mdadm, after changing the level in md, calls
reshape_super(), which prepares the "update_level" metadata update.
reshape_super() allocates a new imsm_dev with updated disk slot
numbers to be processed by mdmon in process_update().
process_update() discovers missing disks and adds them to the imsm
metadata.
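The order-table maintenance described above can be sketched as a small
standalone helper: removing a disk drops its entry and shifts every higher
index down so the table stays dense. The name and signature are
illustrative, not mdadm's actual API:

```c
/* Hypothetical sketch of the disk_ord_tbl compaction done during a
 * 10->0 takeover: remove the entry for the disk with index 'idx' and
 * decrement every higher index.  Returns the new table length. */
static int ord_tbl_remove(int *tab, int n, int idx)
{
	int i, out = 0;

	for (i = 0; i < n; i++) {
		if (tab[i] == idx)
			continue;	/* drop the removed disk's entry */
		tab[out++] = tab[i] > idx ? tab[i] - 1 : tab[i];
	}
	return out;
}
```

For example, removing index 1 from the 4-member table {0, 1, 2, 3} yields
the dense 3-member table {0, 1, 2}.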

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
---

 super-intel.c |  193 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 193 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 589d40c..14c009b 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -5625,6 +5625,92 @@ update_reshape_exit:
 		break;
 	}
 	case update_level: {
+		struct imsm_update_level *u = (void *)update->buf;
+		struct imsm_dev *dev_new, *dev = NULL;
+		struct imsm_map *map;
+		struct dl *d;
+		int i, j;
+		int start_disk;
+
+		dev_new = &u->dev;
+		for (i = 0; i < mpb->num_raid_devs; i++) {
+			dev = get_imsm_dev(super, i);
+			if (strcmp((char *)dev_new->volume,
+				   (char *)dev->volume) == 0)
+				break;
+		}
+		if (i == super->anchor->num_raid_devs)
+			return;
+
+		if (dev == NULL)
+			return;
+
+		struct imsm_disk_changes *changes = (struct imsm_disk_changes *)
+						((void *)u + u->changes_offset);
+		map = get_imsm_map(dev_new, 0);
+		int *tab = (int *)&map->disk_ord_tbl;
+
+		/* iterate through devices to mark unused disks as spare
+		 * and update order table
+		 */
+		for (i = 0; i < u->rm_qan; i++) {
+			struct dl *dm = NULL;
+			for (dm = super->disks; dm; dm = dm->next) {
+				if ((dm->major != changes[i].major) ||
+				    (dm->minor != changes[i].minor))
+					continue;
+				for (j = 0; j < u->disk_qan; j++)
+					if ((tab[j] > dm->index) &&
+					    (dm->index >= 0))
+						tab[j]--;
+				struct dl *du;
+				for (du = super->disks; du; du = du->next)
+					if ((du->index > dm->index) &&
+					    (du->index > 0))
+						du->index--;
+				dm->disk.status = SPARE_DISK;
+				dm->index = -1;
+			}
+		}
+
+		if (u->rm_qan) {
+			/* Remove unused entries in disk_ord_tbl */
+			for (i = 0; i < u->disk_qan; i++) {
+				if (tab[i] < 0)
+					for (j = i; j < (u->disk_qan - 1); j++)
+						tab[j] = tab[j+1];
+			}
+		}
+
+		imsm_copy_dev(dev, dev_new);
+		map = get_imsm_map(dev, 0);
+		start_disk = mpb->num_disks;
+
+		/* clear missing disks list */
+		while (super->missing) {
+			d = super->missing;
+			super->missing = d->next;
+			__free_imsm_disk(d);
+		}
+		if (u->rm_qan)
+			find_missing(super);
+
+	/* clear new disk entries if number of disks increased */
+		d = super->missing;
+		for (i = start_disk; i < map->num_members; i++) {
+			if (!d)
+				break;
+			memset(&d->disk, 0, sizeof(d->disk));
+			strcpy((char *)d->disk.serial, "MISSING");
+			d->disk.total_blocks = map->blocks_per_member;
+			/* Set slot for missing disk */
+			set_imsm_ord_tbl_ent(map, i, d->index |
+					     IMSM_ORD_REBUILD);
+			d->raiddisk = i;
+			d = d->next;
+		}
+
+		super->updates_pending++;
 		break;
 	}
 	case update_activate_spare: {
@@ -6193,6 +6279,113 @@ static int update_level_imsm(struct supertype *st, struct mdinfo *info,
 			     struct geo_params *geo, int verbose,
 			     int uuid_set, char *homehost)
 {
+	struct intel_super *super = st->sb;
+	struct imsm_update_level *u;
+	struct imsm_dev *dev_new, *dev = NULL;
+	struct imsm_map *map_new, *map;
+	struct mdinfo *newdi;
+	struct dl *dl;
+	int *tmp_ord_tbl;
+	int i, slot, idx;
+	int len;
+
+	/* update level is used only for 0->10 and 10->0 transitions */
+	if ((info->array.level != 10 && (geo->level != 0)) &&
+		((info->array.level != 0) && (geo->level != 10)))
+		return 1;
+
+	dev = __get_imsm_dev(super->anchor, 0);
+
+	map = get_imsm_map(dev, 0);
+	geo->raid_disks = (geo->level == 10) ? 4 : map->num_members;
+
+	if (!is_raid_level_supported(super->orom,
+				     geo->level,
+				     geo->raid_disks))
+		return 1;
+
+	len = sizeof(struct imsm_update_level) +
+		((geo->raid_disks - 1) * sizeof(__u32)) +
+		(geo->raid_disks * sizeof(struct imsm_disk_changes));
+
+	u = malloc(len);
+	if (u == NULL)
+		return 1;
+
+	u->changes_offset = sizeof(struct imsm_update_level) +
+			    ((geo->raid_disks - 1) * sizeof(__u32));
+	struct imsm_disk_changes *change = (struct imsm_disk_changes *)
+					((void *)u + u->changes_offset);
+	u->rm_qan = 0;
+	u->disk_list = NULL;
+	u->disk_qan = geo->raid_disks;
+
+	dev_new = &u->dev;
+	imsm_copy_dev(dev_new, dev);
+	map_new = get_imsm_map(dev_new, 0);
+
+	tmp_ord_tbl = malloc(sizeof(int) * geo->raid_disks);
+	if (tmp_ord_tbl == NULL) {
+		free(u);
+		return 1;
+	}
+
+	for (i = 0; i < geo->raid_disks; i++) {
+		tmp_ord_tbl[i] = -1;
+		change[i].major = -1;
+		change[i].minor = -1;
+	}
+
+	/* 10->0 transition:
+	 * - mark unused disks
+	 * - update indexes in order table
+	 */
+	if (geo->level == 0) {
+		/* iterate through devices to detect slot changes */
+		i = 0;
+		for (dl = super->disks; dl; dl = dl->next) {
+			idx = -1;
+			for (newdi = info->devs; newdi; newdi = newdi->next) {
+				if ((dl->major != newdi->disk.major) ||
+					    (dl->minor != newdi->disk.minor) ||
+					    (newdi->disk.raid_disk < 0))
+					continue;
+				slot = get_imsm_disk_slot(map, dl->index);
+				idx = get_imsm_ord_tbl_ent(dev_new, slot);
+				tmp_ord_tbl[newdi->disk.raid_disk] = idx;
+				break;
+			}
+			/* if slot not found, mark disk as not used */
+			if ((idx == -1) && (!(dl->disk.status & SPARE_DISK))) {
+				change[i].major = dl->major;
+				change[i].minor = dl->minor;
+				u->rm_qan++;
+				i++;
+			}
+		}
+		for (i = 0; i < geo->raid_disks; i++)
+			set_imsm_ord_tbl_ent(map_new, i, tmp_ord_tbl[i]);
+	}
+
+	map_new->num_members = (geo->level == 10) ?
+				geo->raid_disks :
+				(info->array.raid_disks - u->rm_qan);
+	map_new->map_state = IMSM_T_STATE_NORMAL;
+	map_new->failed_disk_num = 0;
+
+	if (geo->level == 10) {
+		map_new->num_domains = map_new->num_members / 2;
+		map_new->raid_level = 1;
+	} else {
+		map_new->num_domains = 1;
+		map_new->raid_level = geo->level;
+	}
+
+	u->type = update_level;
+	u->delta_disks = 0;
+	u->container_member = info->container_member;
+	append_metadata_update(st, u, len);
+	free(tmp_ord_tbl);
 	return 0;
 }
 



* [PATCH 26/29] Takeover raid0 -> raid10 for external metadata
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (24 preceding siblings ...)
  2010-12-09 15:22 ` [PATCH 25/29] Takeover raid10 -> raid0 for external metadata Adam Kwolek
@ 2010-12-09 15:22 ` Adam Kwolek
  2010-12-09 15:22 ` [PATCH 27/29] FIX: Problem with removing array after takeover Adam Kwolek
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

The patch introduces takeover from level 0 to level 10 for imsm
metadata. It contains the procedures for preparing
and applying the metadata update during a 0 -> 10 takeover.

When performing a 0->10 takeover, mdmon has to update the external
metadata (due to disk slot and level changes).
To achieve that mdadm, after changing the level in md, calls
reshape_super(), which prepares the "update_level" metadata update.
reshape_super() allocates a new imsm_dev with updated disk slot
numbers to be processed by mdmon in process_update().
process_update() discovers missing disks and adds them to the imsm
metadata.
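The slot scan that decides where dummy members are needed (the add_qan
logic) can be sketched as follows. Names and the flat-array representation
are illustrative; the real code walks the info->devs list:

```c
/* Hypothetical sketch of the 0->10 slot scan: walk the target slots and
 * record those not occupied by an existing member; each such slot later
 * receives a "dummy" MISSING disk marked for rebuild. */
static int find_missing_slots(const int *occupied, int nocc,
			      int raid_disks, int *missing)
{
	int n = 0;

	for (int slot = 0; slot < raid_disks; slot++) {
		int found = 0;
		for (int i = 0; i < nocc; i++)
			if (occupied[i] == slot) {
				found = 1;
				break;
			}
		if (!found)
			missing[n++] = slot;	/* needs a dummy member */
	}
	return n;
}
```

For a 2-drive raid0 occupying slots {0, 2} of a 4-slot raid10, slots 1 and
3 come back as the ones needing dummy disks.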

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
---

 Grow.c        |    2 ++
 super-intel.c |   73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/Grow.c b/Grow.c
index 833b0bc..3d74db1 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1368,6 +1368,8 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 				}
 				goto release;
 			}
+			if (level == 10)
+				goto release;
 			orig = array;
 			orig_level = orig.level;
 			ioctl(fd, GET_ARRAY_INFO, &array);
diff --git a/super-intel.c b/super-intel.c
index 14c009b..0488c1c 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -3624,6 +3624,8 @@ static int write_super_imsm(struct supertype *st, int doclose)
 	for (d = super->disks; d ; d = d->next) {
 		if (d->index < 0)
 			continue;
+		if (d->fd < 0)
+			continue;
 		if (store_imsm_mpb(d->fd, mpb))
 			fprintf(stderr, "%s: failed for device %d:%d %s\n",
 				__func__, d->major, d->minor, strerror(errno));
@@ -5682,6 +5684,25 @@ update_reshape_exit:
 			}
 		}
 
+		if (u->add_qan)
+			for (i = 0; i < u->disk_qan; i++)
+				tab[i] = i;
+
+		struct dl *dc;
+		for (i = 0; i < u->add_qan; i++) {
+			/* update indexes in current list */
+			for (dc = super->disks; dc; dc = dc->next) {
+				if (dc->index >= changes[i].index)
+					dc->index++;
+			}
+			/* mark dummy disks for rebuild */
+			tab[changes[i].index] |= IMSM_ORD_REBUILD;
+		}
+		/* append dummy disk list at the end of current list */
+		for (dc = super->disks; dc->next; dc = dc->next)
+			; /* nothing to do, just go to the end of list */
+		dc->next = u->disk_list;
+
 		imsm_copy_dev(dev, dev_new);
 		map = get_imsm_map(dev, 0);
 		start_disk = mpb->num_disks;
@@ -6098,6 +6119,33 @@ static void imsm_prepare_update(struct supertype *st,
 		break;
 	}
 	case update_level: {
+		struct imsm_update_level *u = (void *) update->buf;
+		int i;
+		struct imsm_disk_changes *changes = (struct imsm_disk_changes *)
+						((void *)u + u->changes_offset);
+
+		dprintf("prepare_update(): update level\n");
+
+		for (i = 0; i < u->add_qan; i++) {
+			struct dl *d = calloc(1, sizeof(struct dl));
+			if (!d)
+				break;
+			memcpy(d, super->disks, sizeof(struct dl));
+
+			d->disk.status = FAILED_DISK;
+			strcpy((char *)d->disk.serial, "dummy");
+			strcpy((char *)d->serial, "dummy");
+			d->disk.scsi_id = 0;
+			d->fd = -1;
+			d->minor = 0;
+			d->major = 0;
+			d->index = changes[i].index;
+			d->next = u->disk_list;
+			u->disk_list = d;
+		}
+		len = disks_to_mpb_size(u->add_qan +
+					mpb->num_disks -
+					u->rm_qan);
 		break;
 	}
 	case update_create_array: {
@@ -6317,6 +6365,7 @@ static int update_level_imsm(struct supertype *st, struct mdinfo *info,
 	struct imsm_disk_changes *change = (struct imsm_disk_changes *)
 					((void *)u + u->changes_offset);
 	u->rm_qan = 0;
+	u->add_qan = 0;
 	u->disk_list = NULL;
 	u->disk_qan = geo->raid_disks;
 
@@ -6334,6 +6383,7 @@ static int update_level_imsm(struct supertype *st, struct mdinfo *info,
 		tmp_ord_tbl[i] = -1;
 		change[i].major = -1;
 		change[i].minor = -1;
+		change[i].index = -1;
 	}
 
 	/* 10->0 transition:
@@ -6367,9 +6417,28 @@ static int update_level_imsm(struct supertype *st, struct mdinfo *info,
 			set_imsm_ord_tbl_ent(map_new, i, tmp_ord_tbl[i]);
 	}
 
+	/* 0->10 transition:
+	 * - add dummy disks to metadata
+	 * - store slots for dummy disks in update buffer
+	 */
+	if (geo->level == 10) {
+		u->add_qan = 0;
+		for (i = 0; i < geo->raid_disks; i++) {
+			int found = 0;
+			for (newdi = info->devs; newdi; newdi = newdi->next) {
+				if (newdi->disk.raid_disk == i) {
+					found = 1;
+					break;
+				}
+			}
+		if (!found)
+			change[u->add_qan++].index = i;
+		}
+	}
+
 	map_new->num_members = (geo->level == 10) ?
-				geo->raid_disks :
-				(info->array.raid_disks - u->rm_qan);
+		geo->raid_disks :
+		(info->array.raid_disks - u->rm_qan);
 	map_new->map_state = IMSM_T_STATE_NORMAL;
 	map_new->failed_disk_num = 0;
 



* [PATCH 27/29] FIX: Problem with removing array after takeover
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (25 preceding siblings ...)
  2010-12-09 15:22 ` [PATCH 26/29] Takeover raid0 -> raid10 " Adam Kwolek
@ 2010-12-09 15:22 ` Adam Kwolek
  2010-12-09 15:22 ` [PATCH 28/29] IMSM compatibility for raid0 -> raid10 takeover Adam Kwolek
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

When array parameters are changed, the old array 'A' is to be removed
and a new array 'B' is to be serviced. If array 'B' is a raid0 array (created by takeover),
array 'A' is never deleted and mdmon does not exit.
Scenario:
1. managemon creates array 'B' and inserts it at the head of the active arrays list
2. managemon sets the field B->replaces = A

3. monitor: finds that array 'B' is a raid0 array and removes it from the list;
   the information that array 'A' should also be removed is lost
   and array 'A' stays on the list forever

To resolve this, delay removing array 'B' until array 'A' has been removed.
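The fixed discard rule can be modeled with a minimal, hypothetical struct
(field names mirror mdmon's active_array, but the struct here is an
illustration, not the real definition):

```c
#include <stddef.h>

/* Sketch of the fix: an active array may be handed to the manager for
 * discarding only once it is deactivated AND the array it replaces has
 * already left the list (replaces == NULL). */
struct active_array {
	struct active_array *next;
	struct active_array *replaces;	/* older array this one supersedes */
	int deactivated;
};

static int can_discard(const struct active_array *a)
{
	return a->deactivated && a->replaces == NULL;
}
```

With this rule, array 'B' (a takeover raid0 with B->replaces = A) stays on
the list until 'A' is gone, so the removal of 'A' is never lost.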

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 monitor.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/monitor.c b/monitor.c
index 50dada4..0961636 100644
--- a/monitor.c
+++ b/monitor.c
@@ -510,10 +510,14 @@ static int wait_and_act(struct supertype *container, int nowait)
 
 	for (ap = aap ; *ap ;) {
 		a = *ap;
+
 		/* once an array has been deactivated we want to
 		 * ask the manager to discard it.
+		 * but to do this we have to wait until replaced
+		 * array is removed
 		 */
-		if (!a->container || (a->info.array.level == 0)) {
+		if ((!a->container || (a->info.array.level == 0)) &&
+		     !a->replaces) {
 			if (discard_this) {
 				ap = &(*ap)->next;
 				continue;



* [PATCH 28/29] IMSM compatibility for raid0 -> raid10 takeover
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (26 preceding siblings ...)
  2010-12-09 15:22 ` [PATCH 27/29] FIX: Problem with removing array after takeover Adam Kwolek
@ 2010-12-09 15:22 ` Adam Kwolek
  2010-12-09 15:22 ` [PATCH 29/29] Add spares to raid0 in mdadm Adam Kwolek
  2010-12-16 11:20 ` [PATCH 00/29] OLCE, migrations and raid10 takeover Neil Brown
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

imsm does not support raid10 arrays larger than 4 drives,
so takeover to raid10 is allowed only from a 2-drive raid0.
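The rule reduces to a one-line check. A sketch, assuming the 4-drive
raid10 cap stated above; the helper name is illustrative:

```c
/* imsm caps raid10 at 4 drives, and the takeover gives each raid0 member
 * a mirror, so only a 2-drive raid0 can legally become a 4-drive raid10. */
static int raid0_to_raid10_allowed(int raid0_disks)
{
	return raid0_disks == 2;
}
```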

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
---

 Grow.c        |    1 +
 super-intel.c |   24 +++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/Grow.c b/Grow.c
index 3d74db1..a01051f 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1305,6 +1305,7 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 				rv = 1;/* not possible */
 				goto release;
 			}
+
 			if (level > 0) {
 				err = sysfs_set_str(sra, NULL, "level", c);
 				if (err) {
diff --git a/super-intel.c b/super-intel.c
index 0488c1c..395e7b9 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -6342,8 +6342,30 @@ static int update_level_imsm(struct supertype *st, struct mdinfo *info,
 		((info->array.level != 0) && (geo->level != 10)))
 		return 1;
 
-	dev = __get_imsm_dev(super->anchor, 0);
+	/* imsm supports raid0 to raid10 takeover only for 2-drive raid0 */
+	if ((geo->level == 10) &&
+	    (info->array.level == 0) &&
+	    (info->array.raid_disks > 2)) {
+		fprintf(stderr, "imsm: Cannot set level to %i for array %s "\
+			"(number of drives > 2)\n",
+			geo->level,
+			geo->dev_name);
+		/* return to raid0 */
+		char fname[PATH_MAX];
+		int fd;
+		sprintf(fname, "/sys/block/md%i/md/level", geo->dev_id);
+		fd = open(fname, O_WRONLY);
+		if (fd < 0)
+			return 1;
+		if (write(fd, "raid0", 5) != 5)
+			dprintf(Name ": failed to write raid0 to '%s' (%s)\n",
+				fname,
+				strerror(errno));
+		close(fd);
+		return 1;
+	}
 
+	dev = __get_imsm_dev(super->anchor, 0);
 	map = get_imsm_map(dev, 0);
 	geo->raid_disks = (geo->level == 10) ? 4 : map->num_members;
 



* [PATCH 29/29] Add spares to raid0 in mdadm
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (27 preceding siblings ...)
  2010-12-09 15:22 ` [PATCH 28/29] IMSM compatibility for raid0 -> raid10 takeover Adam Kwolek
@ 2010-12-09 15:22 ` Adam Kwolek
  2010-12-16 11:20 ` [PATCH 00/29] OLCE, migrations and raid10 takeover Neil Brown
  29 siblings, 0 replies; 41+ messages in thread
From: Adam Kwolek @ 2010-12-09 15:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

When a user wants to add spares to a container holding only raid0 arrays,
the metadata cannot be updated because no mdmon is running.
To allow this, mdadm writes the metadata update directly in that case.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
---

 Manage.c      |   14 +++-----------
 super-intel.c |    8 ++++++++
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/Manage.c b/Manage.c
index a203ec9..f8dcaaf 100644
--- a/Manage.c
+++ b/Manage.c
@@ -825,9 +825,8 @@ int Manage_subdevs(char *devname, int fd,
 			if (dv->writemostly == 1)
 				disc.state |= (1 << MD_DISK_WRITEMOSTLY);
 			if (tst->ss->external) {
-				/* add a disk to an external metadata container
-				 * only if mdmon is around to see it
-				 */
+				/* add a disk
+				 * to an external metadata container */
 				struct mdinfo new_mdi;
 				struct mdinfo *sra;
 				int container_fd;
@@ -841,13 +840,6 @@ int Manage_subdevs(char *devname, int fd,
 					return 1;
 				}
 
-				if (!mdmon_running(devnum)) {
-					fprintf(stderr, Name ": add failed for %s: mdmon not running\n",
-						dv->devname);
-					close(container_fd);
-					return 1;
-				}
-
 				sra = sysfs_read(container_fd, -1, 0);
 				if (!sra) {
 					fprintf(stderr, Name ": add failed for %s: sysfs_read failed\n",
@@ -865,9 +857,9 @@ int Manage_subdevs(char *devname, int fd,
 					fprintf(stderr, Name ": add new device to external metadata"
 						" failed for %s\n", dv->devname);
 					close(container_fd);
+					sysfs_free(sra);
 					return 1;
 				}
-				ping_monitor(devnum2devname(devnum));
 				sysfs_free(sra);
 				close(container_fd);
 			} else if (ioctl(fd, ADD_NEW_DISK, &disc)) {
diff --git a/super-intel.c b/super-intel.c
index 395e7b9..6697257 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -3448,6 +3448,8 @@ static int add_to_super_imsm_volume(struct supertype *st, mdu_disk_info_t *dk,
 	return 0;
 }
 
+static int write_super_imsm(struct supertype *st, int doclose);
+
 static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
 			      int fd, char *devname)
 {
@@ -3511,6 +3513,12 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
 		super->disks = dd;
 	}
 
+	if (!mdmon_running(st->container_dev)) {
+		dprintf("imsm: mdmon is not active - write metadata via mdadm\n");
+		super->updates_pending++;
+		write_super_imsm(st, 0);
+	}
+
 	return 0;
 }
 



* Re: [PATCH 02/29] imsm: Prepare reshape_update in mdadm
  2010-12-09 15:18 ` [PATCH 02/29] imsm: Prepare reshape_update in mdadm Adam Kwolek
@ 2010-12-14  0:07   ` Neil Brown
  2010-12-14  7:54     ` Kwolek, Adam
  0 siblings, 1 reply; 41+ messages in thread
From: Neil Brown @ 2010-12-14  0:07 UTC (permalink / raw)
  To: Adam Kwolek
  Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

On Thu, 09 Dec 2010 16:18:56 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:

> During Online Capacity Expansion metadata has to be updated to show
> array changes and allow for future assembly of array.
> To do this mdadm prepares and sends reshape_update metadata update to mdmon.
> Update is sent for one array in container. It contains updated device
> and spares that have to be turned in to array members.
> For spares we have 2 cases:
> 1. For first array in container:
>    reshape_delta_disks: shows how many disks will be added to array
>    Spares are sent in update so variable spares_in_update in metadata update tells that mdmon has to turn spares in to array
>    (IMSM's array meaning) members.
> 2. For 2nd array in container:
>    reshape_delta_disks: shows how many disks will be added to array -exactly as in first case
>    Spares were turned in to array members (they are not a spares) so we have for this volume
>    reuse those disks only.
> 
> This update will change active array state to reshape_is_starting state.
> This works in the following way:
> 1. reshape_super() prepares metadata update and send it to mdmon
> 2. managemon in prepare_update() allocates required memory for bigger device object
> 3. monitor in process_update() updates (replaces) device object with information
>    passed from mdadm (memory was allocated by managemon)
> 4. process_update() function performs:
>    - sets reshape_delta_disks variable to reshape_delta_disks value from update
>    - sets array in to reshape_is_starting state.
> 5. This signals managemon to add devices to md and start reshape for this array
>    and put array in to reshape_in_progress.
>    Managemon can request reshape_cancel_request state in error case.
> 
> Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>

I have applied this.
I haven't reviewed the super-intel.c bits very closely.

I have improved find_array_by_subdev, renamed it to mdstat_by_subdev and
moved it to mdstat.c where it belongs.
I have also totally re-written sysfs_get_unused_spares which was:
  - undocumented
  - overly complex
  - wrong in a few places. e.g.


> +
> +struct mdinfo *sysfs_get_unused_spares(int container_fd, int fd)
> +{
> +	char fname[PATH_MAX];
> +	char buf[PATH_MAX];
> +	char *base;
> +	char *dbase;
> +	struct mdinfo *ret_val;
> +	struct mdinfo *dev;
> +	DIR *dir = NULL;
> +	struct dirent *de;
> +	int is_in;
> +	char *to_check;
> +
> +	ret_val = malloc(sizeof(*ret_val));
> +	if (ret_val == NULL)
> +		goto abort;
> +	memset(ret_val, 0, sizeof(*ret_val));
> +	sysfs_init(ret_val, container_fd, -1);
> +	if (ret_val->sys_name[0] == 0)
> +		goto abort;
> +
> +	sprintf(fname, "/sys/block/%s/md/", ret_val->sys_name);
> +	base = fname + strlen(fname);
> +
> +	strcpy(base, "raid_disks");
> +	if (load_sys(fname, buf))
> +		goto abort;
> +	ret_val->array.raid_disks = strtoul(buf, NULL, 0);
> +
> +	/* Get all the devices as well */
> +	*base = 0;
> +	dir = opendir(fname);
> +	if (!dir)
> +		goto abort;
> +	ret_val->array.spare_disks = 0;
> +	while ((de = readdir(dir)) != NULL) {
> +		char *ep;
> +		if (de->d_ino == 0 ||
> +		    strncmp(de->d_name, "dev-", 4) != 0)
> +			continue;
> +		strcpy(base, de->d_name);
> +		dbase = base + strlen(base);
> +		*dbase = '\0';
> +
> +		to_check = strstr(fname, "/md/");
> +		is_in = sysfs_is_spare_device_belongs_to(fd, to_check);
> +		if (is_in == -1) {
> +			char *p;
> +			struct stat stb;
> +			char stb_name[PATH_MAX];
> +
> +			dev = malloc(sizeof(*dev));
> +			if (!dev)
> +				goto abort;
> +			strncpy(dev->text_version, fname, 50);
> +
> +			*dbase++ = '/';
> +
> +			dev->disk.raid_disk = strtoul(buf, &ep, 10);
> +			dev->disk.raid_disk = -1;
> +
> +			strcpy(dbase, "block/dev");
> +			if (load_sys(fname, buf)) {
> +				free(dev);
> +				continue;
> +			}
> +			/* check first if we are working on block device
> +			 * if not, we cannot check it
> +			 */
> +			p = strchr(dev->text_version, '-');
> +			if (p)
> +				p++;
> +			sprintf(stb_name, "/dev/%s", p);
> +			if (stat(stb_name, &stb) < 0) {

Taking a string out of /sys and looking it up in /dev is wrong.  Names
in /dev tend to follow /sys by convention, but there is no guarantee.


> +				dprintf(Name ": stat failed for %s: %s.\n",
> +					stb_name, strerror(errno));
> +				free(dev);
> +				continue;
> +			}
> +			if (!S_ISBLK(stb.st_mode)) {

And everything that is a member of an array will be a block device, so
checking that it is a block device is pointless.


> +				dprintf(Name\
> +					": %s is not a block device."\
> +					" Skip checking.\n",
> +					stb_name);
> +				goto skip;
> +			}
> +			dprintf(Name": %s seams to a be block device\n",
> +				stb_name);
> +			sscanf(buf, "%d:%d",
> +			       &dev->disk.major,
> +			       &dev->disk.minor);
> +			strcpy(dbase, "block/device/state");
> +			if (load_sys(fname, buf) != 0) {
> +				free(dev);
> +				continue;
> +			}
> +			if (strncmp(buf, "offline", 7) == 0) {
> +				free(dev);
> +				continue;
> +			}
> +			if (strncmp(buf, "failed", 6) == 0) {
> +				free(dev);
> +				continue;
> +			}

The string you want to check here is "faulty" not "failed".

NeilBrown




* RE: [PATCH 02/29] imsm: Prepare reshape_update in mdadm
  2010-12-14  0:07   ` Neil Brown
@ 2010-12-14  7:54     ` Kwolek, Adam
  0 siblings, 0 replies; 41+ messages in thread
From: Kwolek, Adam @ 2010-12-14  7:54 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed, Neubauer, Wojciech

As I wrote in the cover email, spares management is being changed by Krzysztof.
Yesterday he posted a patch, and the functions you pointed out as incorrect
or buggy are removed from the code entirely.
That is why I have not updated those functions so far.

BR
Adam


* Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
  2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (28 preceding siblings ...)
  2010-12-09 15:22 ` [PATCH 29/29] Add spares to raid0 in mdadm Adam Kwolek
@ 2010-12-16 11:20 ` Neil Brown
  2010-12-20  8:27   ` Wojcik, Krzysztof
  29 siblings, 1 reply; 41+ messages in thread
From: Neil Brown @ 2010-12-16 11:20 UTC (permalink / raw)
  To: Adam Kwolek
  Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

On Thu, 09 Dec 2010 16:18:38 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:

> This series is for mdadm and introduces these features (after some rework):
> - Online Capacity Expansion (OLCE): patches 0001 to 0015

I've been making slow progress through these.  I'm up to about '0011'.
A number of the patches needed very substantial re-work to fit the model
that I posted earlier, to remove unnecessary complexity, and to meet the
requirements of mdmon (where e.g. the monitor is not allowed to allocate
memory).

As it is all now fresh in my mind again I took the opportunity to write a
document describing some of the design philosophy of mdadm, and also updated
the external-reshape-design.txt document.

Both of these can be found in the devel-3.2 branch of my git tree, but I'll
include them here as well.

Hopefully I'll make some more progress tomorrow.

NeilBrown

 
mdmon-design.txt
================


When managing a RAID1 array which uses metadata other than the
"native" metadata understood by the kernel, mdadm makes use of a
partner program named 'mdmon' to manage some aspects of updating
that metadata and synchronising the metadata with the array state.

This document provides some details on how mdmon works.

Containers
----------

As background: mdadm makes a distinction between an 'array' and a
'container'.  Other sources sometimes use the term 'volume' or
'device' for an 'array', and may use the term 'array' for a
'container'.

For our purposes:
 - a 'container' is a collection of devices which are described by a
   single set of metadata.  The metadata may be stored equally
   on all devices, or different devices may have quite different
   subsets of the total metadata.  But there is conceptually one set
   of metadata that unifies the devices.

 - an 'array' is a set of data blocks from various devices which
   together are used to present the abstraction of a single linear
   sequence of blocks, which may provide data redundancy or enhanced
   performance.

So a container has some metadata and provides a number of arrays which
are described by that metadata.

Sometimes this model doesn't work perfectly.  For example, global
spares may have their own metadata which is quite different from the
metadata from any device that participates in one or more arrays.
Such a global spare might still need to belong to some container so
that it is available to be used should a failure arise.  In that case
we consider the 'metadata' to be the union of the metadata on the
active devices which describes the arrays, and the metadata on the
global spares which only describes the spares.  In this case different
devices in the one container will have quite different metadata.


Purpose
-------

The main purpose of mdmon is to update the metadata in response to
changes to the array which need to be reflected in the metadata before
future writes to the array can safely be performed.
These include:
 - transitions from 'clean' to 'dirty'.
 - recording which devices have failed.
 - recording the progress of a 'reshape'.

This requires mdmon to be running at any time that the array is
writable (a read-only array does not require mdmon to be running).

Because mdmon must be able to process these metadata updates at any
time, it must (when running) have exclusive write access to the
metadata.  Any other changes (e.g. reconfiguration of the array) must
go through mdmon.

A secondary role for mdmon is to activate spares when a device fails.
This role is much less time-critical than the other metadata updates,
so it could be performed by a separate process, possibly
"mdadm --monitor" which has a related role of moving devices between
arrays.  A main reason for including this functionality in mdmon is
that in the native-metadata case this function is handled in the
kernel, and mdmon's reason for existence is to provide functionality
which is otherwise handled by the kernel.


Design overview
---------------

mdmon is structured as two threads with a common address space and
common data structures.  These threads are known as the 'monitor' and
the 'manager'.

The 'monitor' has the primary role of monitoring the array for
important state changes and updating the metadata accordingly.  As
writes to the array can be blocked until 'monitor' completes and
acknowledges the update, it must be very careful not to block itself.
In particular it must not block waiting for any write to complete else
it could deadlock.  This means that it must not allocate memory, as
doing so can require dirty memory to be written out, and if the
system chooses to write to the array that mdmon is monitoring, the
memory allocation could deadlock.

So 'monitor' must never allocate memory and must limit the number of
other system calls it performs. It may:
 - use select (or poll) to wait for activity on a file descriptor
 - read from a sysfs file descriptor
 - write to a sysfs file descriptor
 - write the metadata out to the block devices using O_DIRECT
 - send a signal (kill) to the manager thread

It must not e.g. open files or do anything similar that might allocate
resources.
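The constraint can be illustrated with a small sketch.  This is not
mdmon's actual code; it only shows the wait-then-read pattern on a
descriptor that is assumed to be already open (opening is the
manager's job), with no allocation anywhere in the path:

```c
#include <sys/select.h>
#include <unistd.h>

/* Sketch of the monitor's wait-then-read pattern.  The descriptor is
 * assumed to have been opened by the manager thread; nothing here
 * allocates memory -- the caller supplies the buffer. */
ssize_t wait_and_read(int fd, char *buf, size_t len)
{
	fd_set rfds;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	/* block until there is activity on the descriptor */
	if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
		return -1;
	return read(fd, buf, len);
}
```

In the real monitor the same select() covers several sysfs
descriptors plus the pipe the manager uses to kick it.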

The 'manager' thread does everything else that is needed.  If any
files are to be opened (e.g. because a device has been added to the
array), the manager does that.  If any memory needs to be allocated
(e.g. to hold data about a new array as can happen when one set of
metadata describes several arrays), the manager performs that
allocation.

The 'manager' is also responsible for communicating with mdadm and
assigning spares to replace failed devices.


Handling metadata updates
-------------------------

There are a number of cases in which mdadm needs to update the
metadata which mdmon is managing.  These include:
 - creating a new array in an active container
 - adding a device to a container
 - reconfiguring an array
etc.

To complete these updates, mdadm must send a message to mdmon which
will merge the update into the metadata as it is at that moment.

To achieve this, mdmon creates a Unix Domain Socket which the manager
thread listens on.  mdadm sends a message over this socket.  The
manager thread examines the message to see if it will require
allocating any memory and allocates it.  This is done in the
'prepare_update' metadata method.

The update message is then queued for handling by the monitor thread
which it will do when convenient.  The monitor thread calls
->process_update which should atomically make the required changes to
the metadata, making use of the pre-allocated memory as required.  Any
memory that is no longer needed can be placed back in the request and
the manager thread will free it.

The exact format of a metadata update is up to the implementer of the
metadata handlers.  It will simply describe a change that needs to be
made.  It will sometimes contain fragments of the metadata to be
copied into place.  However the ->process_update routine must make
sure not to over-write any field that the monitor thread might have
updated, such as a 'device failed' or 'array is dirty' state.

When the monitor thread has completed the update and written it to the
devices, an acknowledgement message is sent back over the socket so
that mdadm knows it is complete.
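As a toy model of this division of labour (the structures and field
names here are hypothetical, not mdmon's real ones): the manager
allocates in prepare_update(), and the monitor only relinks pointers
in process_update(), handing old memory back for the manager to free:

```c
#include <stdlib.h>
#include <string.h>

/* Toy model of the prepare_update()/process_update() split.  The
 * structures are hypothetical; the point is that only the manager
 * allocates, and the monitor merely relinks pre-allocated memory. */
struct metadata_update {
	void *space;		/* pre-allocated by the manager */
	int new_disks;
};

struct array_state {
	int *disk_ids;
	int nr_disks;
};

/* manager thread: may allocate */
void prepare_update(struct metadata_update *u)
{
	u->space = malloc(u->new_disks * sizeof(int));
}

/* monitor thread: no allocation -- relink, and return the old
 * memory via the update so the manager can free it */
void process_update(struct array_state *a, struct metadata_update *u)
{
	int *old = a->disk_ids;

	memcpy(u->space, old, a->nr_disks * sizeof(int));
	a->disk_ids = u->space;
	a->nr_disks = u->new_disks;
	u->space = old;
}
```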


=================================================================================

External Reshape

1 Problem statement

External (third-party metadata) reshape differs from native-metadata
reshape in three key ways:

1.1 Format specific constraints

In the native case reshape is limited by what is implemented in the
generic reshape routine (Grow_reshape()) and what is supported by the
kernel.  There are exceptional cases where Grow_reshape() may block
operations when it knows that the kernel implementation is broken, but
otherwise the kernel is relied upon to be the final arbiter of what
reshape operations are supported.

In the external case the kernel, and the generic checks in
Grow_reshape(), become the super-set of what reshapes are possible.  The
metadata format may not support, or may have yet to implement, a given
reshape type.  The implication for Grow_reshape() is that it must query
the metadata handler and effect changes in the metadata before the new
geometry is posted to the kernel.  The ->reshape_super method allows
Grow_reshape() to validate the requested operation and post the metadata
update.
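The required ordering can be sketched as below.  Apart from the
->reshape_super name, the types and functions are illustrative; the
point is only that the metadata change is validated and flushed before
the kernel sees the new geometry:

```c
/* Illustrative skeleton of the ordering Grow_reshape() must follow for
 * external metadata: validate with ->reshape_super(), flush the queued
 * metadata update, and only then post the new geometry to the kernel.
 * Apart from the ->reshape_super name, these types are hypothetical. */
struct geometry {
	int raid_disks;
};

struct format_ops {
	int (*reshape_super)(struct geometry *geo);	/* 0 = allowed */
	void (*flush_update)(void);
};

int external_reshape(struct format_ops *st, struct geometry *geo,
		     int *kernel_raid_disks)
{
	if (st->reshape_super(geo) < 0)
		return -1;		/* format rejected the reshape */
	st->flush_update();		/* metadata update goes out first */
	*kernel_raid_disks = geo->raid_disks;	/* then tell the kernel */
	return 0;
}
```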

1.2 Scope of reshape

Native metadata reshape is always performed at the array scope (no
metadata relationship with sibling arrays on the same disks).  External
reshape, depending on the format, may not allow the number of member
disks to be changed in a subarray unless the change is simultaneously
applied to all subarrays in the container.  For example the imsm format
requires all member disks to be a member of all subarrays, so a 4-disk
raid5 in a container that also houses a 4-disk raid10 array could not be
reshaped to 5 disks as the imsm format does not support a 5-disk raid10
representation.  This requires the ->reshape_super method to check the
contents of the array and ask the user to run the reshape at container
scope (if all subarrays are agreeable to the change), or report an
error in the case where one subarray cannot support the change.
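A container-scope check in this spirit might look like the sketch
below.  The structures are hypothetical, and the only imsm rule
modelled is the one just described, that raid10 has no representation
other than 4 disks:

```c
/* Hypothetical container-scope check in the spirit of ->reshape_super:
 * a disk-count change is only allowed if every subarray in the
 * container can follow it.  The only rule modelled is that imsm has
 * no raid10 representation other than 4 disks. */
struct subarray {
	int level;
};

int container_can_grow(const struct subarray *sa, int n, int new_disks)
{
	int i;

	for (i = 0; i < n; i++)
		if (sa[i].level == 10 && new_disks != 4)
			return 0;	/* this subarray blocks the change */
	return 1;
}
```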

1.3 Monitoring / checkpointing

Reshape, unlike rebuild/resync, requires strict checkpointing to survive
interrupted reshape operations.  For example when expanding a raid5
array the first few stripes of the array will be overwritten in a
destructive manner.  When restarting the reshape process we need to know
the exact location of the last successfully written stripe, and we need
to restore the data in any partially overwritten stripe.  Native
metadata stores this backup data in the unused portion of spares that
are being promoted to array members, or in an external backup file
(located on a non-involved block device).

The kernel is in charge of recording checkpoints of reshape progress,
but mdadm is delegated the task of managing the backup space which
involves:
1/ Identifying what data will be overwritten in the next unit of reshape
   operation
2/ Suspending access to that region so that a snapshot of the data can
   be transferred to the backup space.
3/ Allowing the kernel to reshape the saved region and setting the
   boundary for the next backup.

In the external reshape case we want to preserve this mdadm
'reshape-manager' arrangement, but have a third actor, mdmon, to
consider.  It is tempting to give the role of managing reshape to mdmon,
but that is counter to its role as a monitor, and conflicts with the
existing capabilities and role of mdadm to manage the progress of
reshape.  For clarity the external reshape implementation maintains the
role of mdmon as a (mostly) passive recorder of raid events, and mdadm
treats it as it would the kernel in the native reshape case (modulo
needing to send explicit metadata update messages and checking that
mdmon took the expected action).

External reshape can use the generic md backup file as a fallback, but in the
optimal/firmware-compatible case the reshape-manager will use the metadata
specific areas for managing reshape.  The implementation also needs to spawn a
reshape-manager per subarray when the reshape is being carried out at the
container level.  For these two reasons the ->manage_reshape() method is
introduced.  This method, in addition to the base tasks mentioned above:
1/ Processes each subarray one at a time in series - where appropriate.
2/ Uses either generic routines in Grow.c for md-style backup file
   support, or uses the metadata-format specific location for storing
   recovery data.
This aims to avoid a "midlayer mistake"[1] and lets the metadata handler
optionally take advantage of generic infrastructure in Grow.c.

2 Details for specific reshape requests

There are quite a few moving pieces spread out across md, mdadm, and mdmon for
the support of external reshape, and there are several different types of
reshape that need to be comprehended by the implementation.  A rundown of
these details follows.

2.0 General provisions:

Obtain an exclusive open on the container to make sure we are not
running concurrently with a Create() event.

2.1 Freezing sync_action

   Before making any attempt at a reshape we 'freeze' every array in
   the container to ensure no spare assignment or recovery happens.
   This involves writing 'frozen' to sync_action and changing the '/'
   after 'external:' in metadata_version to a '-'. mdmon knows that
   this means not to perform any management.

   Before doing this we check that all sync_actions are 'idle', which
   is racy but still useful.
   Afterwards we check that all member arrays have no spares
   or partial spares (recovery_start != 'none') which would indicate a
   race.  If they do, we unfreeze again.

   Once this completes we know all the arrays are stable.  They may
   still have failed devices as devices can fail at any time.  However
   we treat those like failures that happen during the reshape.
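The metadata_version half of the freeze is a one-character edit.  A
string-level sketch (the real code of course reads and rewrites the
sysfs attribute rather than a plain buffer):

```c
#include <string.h>

/* Sketch of the freeze marker: flip the '/' after "external:" in
 * metadata_version to '-'.  Only the string transform is shown; the
 * real code reads and rewrites the sysfs attribute. */
int freeze_version(char *ver)
{
	const char *pfx = "external:";
	size_t n = strlen(pfx);

	if (strncmp(ver, pfx, n) != 0 || ver[n] != '/')
		return -1;	/* not an unfrozen external array */
	ver[n] = '-';
	return 0;
}
```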

2.2 Reshape size

   1/ mdadm::Grow_reshape(): checks if mdmon is running and optionally
      initializes st->update_tail
   2/ mdadm::Grow_reshape() calls ->reshape_super() to check that the size change
      is allowed (being performed at subarray scope / enough room) and prepares a
      metadata update
   3/ mdadm::Grow_reshape(): flushes the metadata update (via
      flush_metadata_update(), or ->sync_metadata())
   4/ mdadm::Grow_reshape(): post the new size to the kernel


2.3 Reshape level (simple-takeover)

"simple-takeover" implies the level change can be satisfied without touching
sync_action

    1/ mdadm::Grow_reshape(): checks if mdmon is running and optionally
       initializes st->update_tail
    2/ mdadm::Grow_reshape() calls ->reshape_super() to check that the level change
       is allowed (being performed at subarray scope) and prepares a
       metadata update
       2a/ raid10 --> raid0: degrade all mirror legs prior to calling
           ->reshape_super
    3/ mdadm::Grow_reshape(): flushes the metadata update (via
       flush_metadata_update(), or ->sync_metadata())
    4/ mdadm::Grow_reshape(): post the new level to the kernel

2.4 Reshape chunk, layout

2.5 Reshape raid disks (grow)

    1/ mdadm::Grow_reshape(): unconditionally initializes st->update_tail
       because only redundant raid levels can modify the number of raid disks
    2/ mdadm::Grow_reshape(): calls ->reshape_super() to check that the level
       change is allowed (being performed at proper scope / permissible
       geometry / proper spares available in the container), chooses
       the spares to use, and prepares a metadata update.
    3/ mdadm::Grow_reshape(): Converts each subarray in the container to the
       raid level that can perform the reshape and starts mdmon.
    4/ mdadm::Grow_reshape(): Pushes the update to mdmon.
    5/ mdadm::Grow_reshape(): uses container_content to find details of
       the spares and passes them to the kernel.
    6/ mdadm::Grow_reshape(): gives raid_disks update to the kernel,
       sets sync_max, sync_min, suspend_lo, suspend_hi all to zero,
       and starts the reshape by writing 'reshape' to sync_action.
    7/ mdmon::monitor notices the sync_action change and tells
       managemon to check for new devices.  managemon notices the new
       devices, opens relevant sysfs file, and passes them all to
       monitor.
    8/ mdadm::Grow_reshape() calls ->manage_reshape to oversee the
       rest of the reshape.
       
    9/ mdadm::<format>->manage_reshape(): saves data that will be overwritten by
       the kernel to either the backup file or the metadata specific location,
       advances sync_max, waits for reshape, pings mdmon, and repeats.
       Meanwhile mdmon::read_and_act(): records checkpoints.
       Specifically:

       9a/ if the 'next' stripe to be reshaped will over-write
           itself during reshape then:
	9a.1/ increase suspend_hi to cover a suitable number of
           stripes.
	9a.2/ backup those stripes safely.
	9a.3/ advance sync_max to allow those stripes to be backed up
	9a.4/ when sync_completed indicates that those stripes have
           been reshaped, manage_reshape must ping_manager
	9a.5/ when mdmon notices that sync_completed has been updated,
           it records the new checkpoint in the metadata
	9a.6/ after the ping_manager, manage_reshape will increase
           suspend_lo to allow access to those stripes again

       9b/ if the 'next' stripe to be reshaped will over-write unused
           space during reshape then we apply same process as above,
	   except that there is no need to back anything up.
	   Note that we *do* need to keep suspend_hi progressing as
	   it is not safe to write to the area-under-reshape.  For
	   kernel-managed-metadata this protection is provided by
	   ->reshape_safe, but that does not protect us in the case
	   of user-space-managed-metadata.
	   
   10/ mdadm::<format>->manage_reshape(): Once the reshape completes, changes
       the raid level back to the nominal raid level (if necessary).

       FIXME: native metadata does not have the capability to record the original
       raid level in reshape-restart case because the kernel always records current
       raid level to the metadata, whereas external metadata can masquerade at an
       alternate level based on the reshape state.
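One backup cycle from step 9a can be modelled on plain integers, as
below; the structure is hypothetical, the real code writes the
corresponding sysfs files, actually copies the data, and waits for the
kernel rather than assuming instant completion:

```c
/* Toy model of one manage_reshape() backup cycle from 9a, on plain
 * integers instead of sysfs files; the backup copy and the mdmon ping
 * are represented only by comments. */
struct reshape_pos {
	unsigned long suspend_lo, suspend_hi, sync_max, sync_completed;
};

void backup_step(struct reshape_pos *p, unsigned long unit)
{
	p->suspend_hi += unit;		/* 9a.1: suspend the next stripes */
	/* 9a.2: back the suspended region up (not modelled) */
	p->sync_max = p->suspend_hi;	/* 9a.3: let the kernel reshape it */
	p->sync_completed = p->sync_max; /* stand-in for waiting on the kernel */
	/* 9a.4/9a.5: ping mdmon so it records the checkpoint (not modelled) */
	p->suspend_lo = p->suspend_hi;	/* 9a.6: re-allow access */
}
```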

2.6 Reshape raid disks (shrink)

3 TODO

...

[1]: Linux kernel design patterns - part 3, Neil Brown http://lwn.net/Articles/336262/


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH 00/29] OLCE, migrations and raid10 takeover
  2010-12-16 11:20 ` [PATCH 00/29] OLCE, migrations and raid10 takeover Neil Brown
@ 2010-12-20  8:27   ` Wojcik, Krzysztof
  2010-12-20 21:59     ` Neil Brown
  2010-12-21 12:37     ` Neil Brown
  0 siblings, 2 replies; 41+ messages in thread
From: Wojcik, Krzysztof @ 2010-12-20  8:27 UTC (permalink / raw)
  To: Neil Brown, Kwolek, Adam
  Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed, Neubauer, Wojciech

Neil,

How can we help you speed up your work?

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Neil Brown
> Sent: Thursday, December 16, 2010 12:21 PM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
>
> On Thu, 09 Dec 2010 16:18:38 +0100 Adam Kwolek <adam.kwolek@intel.com>
> wrote:
>
> > This series is for mdadm and introduces these features (after some rework):
> > - Online Capacity Expansion (OLCE): patches 0001 to 0015
>
> I've been making slow progress through these.  I'm up to about '0011'.
> A number of the patches needed very substantial re-work to fit the
> model that
> I posted earlier and to remove unnecessary complexity and to fit the
> requirements of mdmon (where e.g. the monitor is not allowed to
> allocate
> memory).
>
> As it is all now fresh in my mind again I took the opportunity to write
> a
> document describing some of the design philosophy of mdadm, and also
> updated
> the external-reshape-design.txt document.
>
> Both of these can be found in the devel-3.2 branch of my git tree, but
> I'll
> include them here as well.
>
> Hopefully I'll make some more progress tomorrow.
>
> NeilBrown
>
>
> mdmon-design.txt
> ================
>
>
> When managing a RAID1 array which uses metadata other than the
> "native" metadata understood by the kernel, mdadm makes use of a
> partner program named 'mdmon' to manage some aspects of updating
> that metadata and synchronising the metadata with the array state.
>
> This document provides some details on how mdmon works.
>
> Containers
> ----------
>
> As background: mdadm makes a distinction between an 'array' and a
> 'container'.  Other sources sometimes use the term 'volume' or
> 'device' for an 'array', and may use the term 'array' for a
> 'container'.
>
> For our purposes:
>  - a 'container' is a collection of devices which are described by a
>    single set of metadata.  The metadata may be stored equally
>    on all devices, or different devices may have quite different
>    subsets of the total metadata.  But there is conceptually one set
>    of metadata that unifies the devices.
>
>  - an 'array' is a set of datablock from various devices which
>    together are used to present the abstraction of a single linear
>    sequence of block, which may provide data redundancy or enhanced
>    performance.
>
> So a container has some metadata and provides a number of arrays which
> are described by that metadata.
>
> Sometimes this model doesn't work perfectly.  For example, global
> spares may have their own metadata which is quite different from the
> metadata from any device that participates in one or more arrays.
> Such a global spare might still need to belong to some container so
> that it is available to be used should a failure arise.  In that case
> we consider the 'metadata' to be the union of the metadata on the
> active devices which describes the arrays, and the metadata on the
> global spares which only describes the spares.  In this case different
> devices in the one container will have quite different metadata.
>
>
> Purpose
> -------
>
> The main purpose of mdmon is to update the metadata in response to
> changes to the array which need to be reflected in the metadata before
> futures writes to the array can safely be performed.
> These include:
>  - transitions from 'clean' to 'dirty'.
>  - recording the devices have failed.
>  - recording the progress of a 'reshape'
>
> This requires mdmon to be running at any time that the array is
> writable (a read-only array does not require mdmon to be running).
>
> Because mdmon must be able to process these metadata updates at any
> time, it must (when running) have exclusive write access to the
> metadata.  Any other changes (e.g. reconfiguration of the array) must
> go through mdmon.
>
> A secondary role for mdmon is to activate spares when a device fails.
> This role is much less time-critical than the other metadata updates,
> so it could be performed by a separate process, possibly
> "mdadm --monitor" which has a related role of moving devices between
> arrays.  A main reason for including this functionality in mdmon is
> that in the native-metadata case this function is handled in the
> kernel, and mdmon's reason for existence to provide functionality
> which is otherwise handled by the kernel.
>
>
> Design overview
> ---------------
>
> mdmon is structured as two threads with a common address space and
> common data structures.  These threads are know as the 'monitor' and
> the 'manager'.
>
> The 'monitor' has the primary role of monitoring the array for
> important state changes and updating the metadata accordingly.  As
> writes to the array can be blocked until 'monitor' completes and
> acknowledges the update, it much be very careful not to block itself.
> In particular it must not block waiting for any write to complete else
> it could deadlock.  This means that it must not allocate memory as
> doing this can require dirty memory to be written out and if the
> system choose to write to the array that mdmon is monitoring, the
> memory allocation could deadlock.
>
> So 'monitor' must never allocate memory and must limit the number of
> other system call it performs. It may:
>  - use select (or poll) to wait for activity on a file descriptor
>  - read from a sysfs file descriptor
>  - write to a sysfs file descriptor
>  - write the metadata out to the block devices using O_DIRECT
>  - send a signal (kill) to the manager thread
>
> It must not e.g. open files or do anything similar that might allocate
> resources.
>
> The 'manager' thread does everything else that is needed.  If any
> files are to be opened (e.g. because a device has been added to the
> array), the manager does that.  If any memory needs to be allocated
> (e.g. to hold data about a new array as can happen when one set of
> metadata describes several arrays), the manager performs that
> allocation.
>
> The 'manager' is also responsible for communicating with mdadm and
> assigning spares to replace failed devices.
>
>
> Handling metadata updates
> -------------------------
>
> There are a number of cases in which mdadm needs to update the
> metdata which mdmon is managing.  These include:
>  - creating a new array in an active container
>  - adding a device to a container
>  - reconfiguring an array
> etc.
>
> To complete these updates, mdadm must send a message to mdmon which
> will merge the update into the metadata as it is at that moment.
>
> To achieve this, mdmon creates a Unix Domain Socket which the manager
> thread listens on.  mdadm sends a message over this socket.  The
> manager thread examines the message to see if it will require
> allocating any memory and allocates it.  This is done in the
> 'prepare_update' metadata method.
>
> The update message is then queued for handling by the monitor thread
> which it will do when convenient.  The monitor thread calls
> ->process_update which should atomically make the required changes to
> the metadata, making use of the pre-allocate memory as required.  Any
> memory the is no-longer needed can be placed back in the request and
> the manager thread will free it.
>
> The exact format of a metadata update is up to the implementer of the
> metadata handlers.  It will simply describe a change that needs to be
> made.  It will sometimes contain fragments of the metadata to be
> copied in to place.  However the ->process_update routine must make
> sure not to over-write any field that the monitor thread might have
> updated, such as a 'device failed' or 'array is dirty' state.
>
> When the monitor thread has completed the update and written it to the
> devices, an acknowledgement message is sent back over the socket so
> that mdadm knows it is complete.
>
>
> =======================================================================
> ==========
>
> External Reshape
>
> 1 Problem statement
>
> External (third-party metadata) reshape differs from native-metadata
> reshape in three key ways:
>
> 1.1 Format specific constraints
>
> In the native case reshape is limited by what is implemented in the
> generic reshape routine (Grow_reshape()) and what is supported by the
> kernel.  There are exceptional cases where Grow_reshape() may block
> operations when it knows that the kernel implementation is broken, but
> otherwise the kernel is relied upon to be the final arbiter of what
> reshape operations are supported.
>
> In the external case the kernel, and the generic checks in
> Grow_reshape(), become the super-set of what reshapes are possible.
> The
> metadata format may not support, or have yet to implement a given
> reshape type.  The implication for Grow_reshape() is that it must query
> the metadata handler and effect changes in the metadata before the new
> geometry is posted to the kernel.  The ->reshape_super method allows
> Grow_reshape() to validate the requested operation and post the
> metadata
> update.
>
> 1.2 Scope of reshape
>
> Native metadata reshape is always performed at the array scope (no
> metadata relationship with sibling arrays on the same disks).  External
> reshape, depending on the format, may not allow the number of member
> disks to be changed in a subarray unless the change is simultaneously
> applied to all subarrays in the container.  For example the imsm format
> requires all member disks to be a member of all subarrays, so a 4-disk
> raid5 in a container that also houses a 4-disk raid10 array could not
> be
> reshaped to 5 disks as the imsm format does not support a 5-disk raid10
> representation.  This requires the ->reshape_super method to check the
> contents of the array and ask the user to run the reshape at container
> scope (if all subarrays are agreeable to the change), or report an
> error in the case where one subarray cannot support the change.
>
> 1.3 Monitoring / checkpointing
>
> Reshape, unlike rebuild/resync, requires strict checkpointing to
> survive
> interrupted reshape operations.  For example when expanding a raid5
> array the first few stripes of the array will be overwritten in a
> destructive manner.  When restarting the reshape process we need to
> know
> the exact location of the last successfully written stripe, and we need
> to restore the data in any partially overwritten stripe.  Native
> metadata stores this backup data in the unused portion of spares that
> are being promoted to array members, or in an external backup file
> (located on a non-involved block device).
>
> The kernel is in charge of recording checkpoints of reshape progress,
> but mdadm is delegated the task of managing the backup space which
> involves:
> 1/ Identifying what data will be overwritten in the next unit of
> reshape
>    operation
> 2/ Suspending access to that region so that a snapshot of the data can
>    be transferred to the backup space.
> 3/ Allowing the kernel to reshape the saved region and setting the
>    boundary for the next backup.
>
> In the external reshape case we want to preserve this mdadm
> 'reshape-manager' arrangement, but have a third actor, mdmon, to
> consider.  It is tempting to give the role of managing reshape to
> mdmon,
> but that is counter to its role as a monitor, and conflicts with the
> existing capabilities and role of mdadm to manage the progress of
> reshape.  For clarity the external reshape implementation maintains the
> role of mdmon as a (mostly) passive recorder of raid events, and mdadm
> treats it as it would the kernel in the native reshape case (modulo
> needing to send explicit metadata update messages and checking that
> mdmon took the expected action).
>
> External reshape can use the generic md backup file as a fallback, but
> in the
> optimal/firmware-compatible case the reshape-manager will use the
> metadata
> specific areas for managing reshape.  The implementation also needs to
> spawn a
> reshape-manager per subarray when the reshape is being carried out at
> the
> container level.  For these two reasons the ->manage_reshape() method
> is
> introduced.  This method in addition to base tasks mentioned above:
> 1/ Processed each subarray one at a time in series - where appropriate.
> 2/ Uses either generic routines in Grow.c for md-style backup file
>    support, or uses the metadata-format specific location for storing
>    recovery data.
> This aims to avoid a "midlayer mistake"[1] and lets the metadata
> handler
> optionally take advantage of generic infrastructure in Grow.c
>
> 2 Details for specific reshape requests
>
> There are quite a few moving pieces spread out across md, mdadm, and
> mdmon for
> the support of external reshape, and there are several different types
> of
> reshape that need to be comprehended by the implementation.  A rundown
> of
> these details follows.
>
> 2.0 General provisions:
>
> Obtain an exclusive open on the container to make sure we are not
> running concurrently with a Create() event.
>
> 2.1 Freezing sync_action
>
>    Before making any attempt at a reshape we 'freeze' every array in
>    the container to ensure no spare assignment or recovery happens.
>    This involves writing 'frozen' to sync_action and changing the '/'
>    after 'external:' in metadata_version to a '-'. mdmon knows that
>    this means not to perform any management.
>
>    Before doing this we check that all sync_actions are 'idle', which
>    is racy but still useful.
>    Afterwards we check that all member arrays have no spares
>    or partial spares (recovery_start != 'none') which would indicate a
>    race.  If they do, we unfreeze again.
>
>    Once this completes we know all the arrays are stable.  They may
>    still have failed devices as devices can fail at any time.  However
>    we treat those like failures that happen during the reshape.
>
> 2.2 Reshape size
>
>    1/ mdadm::Grow_reshape(): checks if mdmon is running and optionally
>       initializes st->update_tail
>    2/ mdadm::Grow_reshape(): calls ->reshape_super() to check that the
>       size change is allowed (being performed at subarray scope /
>       enough room) and prepares a metadata update
>    3/ mdadm::Grow_reshape(): flushes the metadata update (via
>       flush_metadata_update(), or ->sync_metadata())
>    4/ mdadm::Grow_reshape(): posts the new size to the kernel
>
>
> 2.3 Reshape level (simple-takeover)
>
> "simple-takeover" implies the level change can be satisfied without
> touching sync_action.
>
>     1/ mdadm::Grow_reshape(): checks if mdmon is running and optionally
>        initializes st->update_tail
>     2/ mdadm::Grow_reshape(): calls ->reshape_super() to check that the
>        level change is allowed (being performed at subarray scope) and
>        prepares a metadata update
>        2a/ raid10 --> raid0: degrade all mirror legs prior to calling
>            ->reshape_super
>     3/ mdadm::Grow_reshape(): flushes the metadata update (via
>        flush_metadata_update(), or ->sync_metadata())
>     4/ mdadm::Grow_reshape(): posts the new level to the kernel
>
> 2.4 Reshape chunk, layout
>
> 2.5 Reshape raid disks (grow)
>
>     1/ mdadm::Grow_reshape(): unconditionally initializes
>        st->update_tail because only redundant raid levels can modify
>        the number of raid disks
>     2/ mdadm::Grow_reshape(): calls ->reshape_super() to check that the
>        level change is allowed (being performed at proper scope /
>        permissible geometry / proper spares available in the
>        container), chooses the spares to use, and prepares a metadata
>        update.
>     3/ mdadm::Grow_reshape(): converts each subarray in the container
>        to the raid level that can perform the reshape and starts mdmon.
>     4/ mdadm::Grow_reshape(): pushes the update to mdmon.
>     5/ mdadm::Grow_reshape(): uses container_content to find details of
>        the spares and passes them to the kernel.
>     6/ mdadm::Grow_reshape(): gives the raid_disks update to the
>        kernel, sets sync_max, sync_min, suspend_lo, suspend_hi all to
>        zero, and starts the reshape by writing 'reshape' to
>        sync_action.
>     7/ mdmon::monitor notices the sync_action change and tells
>        managemon to check for new devices.  managemon notices the new
>        devices, opens the relevant sysfs files, and passes them all to
>        monitor.
>     8/ mdadm::Grow_reshape() calls ->manage_reshape to oversee the
>        rest of the reshape.
>
>     9/ mdadm::<format>->manage_reshape(): saves data that will be
>        overwritten by the kernel to either the backup file or the
>        metadata-specific location, advances sync_max, waits for the
>        reshape, pings mdmon, and repeats.
>        Meanwhile mdmon::read_and_act() records checkpoints.
>        Specifically:
>
>        9a/ if the 'next' stripe to be reshaped will over-write
>            itself during reshape then:
>        9a.1/ increase suspend_hi to cover a suitable number of
>              stripes.
>        9a.2/ backup those stripes safely.
>        9a.3/ advance sync_max to allow those stripes to be reshaped
>        9a.4/ when sync_completed indicates that those stripes have
>              been reshaped, manage_reshape must call ping_manager
>        9a.5/ when mdmon notices that sync_completed has been updated,
>              it records the new checkpoint in the metadata
>        9a.6/ after the ping_manager, manage_reshape will increase
>              suspend_lo to allow access to those stripes again
>
>        9b/ if the 'next' stripe to be reshaped will over-write unused
>            space during reshape then we apply the same process as
>            above, except that there is no need to back anything up.
>            Note that we *do* need to keep suspend_hi progressing as
>            it is not safe to write to the area-under-reshape.  For
>            kernel-managed metadata this protection is provided by
>            ->reshape_safe, but that does not protect us in the case
>            of user-space-managed metadata.
>
>    10/ mdadm::<format>->manage_reshape(): once the reshape completes,
>        changes the raid level back to the nominal raid level (if
>        necessary)
>
>        FIXME: native metadata does not have the capability to record
>        the original raid level in the reshape-restart case because the
>        kernel always records the current raid level to the metadata,
>        whereas external metadata can masquerade at an alternate level
>        based on the reshape state.
>
> 2.6 Reshape raid disks (shrink)
>
> 3 TODO
>
> ...
>
> [1]: Linux kernel design patterns - part 3, Neil Brown
> http://lwn.net/Articles/336262/
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
  2010-12-20  8:27   ` Wojcik, Krzysztof
@ 2010-12-20 21:59     ` Neil Brown
  2010-12-21 12:37     ` Neil Brown
  1 sibling, 0 replies; 41+ messages in thread
From: Neil Brown @ 2010-12-20 21:59 UTC (permalink / raw)
  To: Wojcik, Krzysztof
  Cc: Kwolek, Adam, linux-raid, Williams, Dan J, Ciechanowski, Ed,
	Neubauer, Wojciech

On Mon, 20 Dec 2010 08:27:38 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> Neil,
> 
> How can we help you to speed up your work?
> 

Thanks for the offer, but you probably cannot at the moment.

You (collectively) have already done a great job of exploring the problem
space and hitting lots of different issues and attempted solutions.
That allows me to get a good understanding of what is required and where
the difficulties and important details lie.
I can see that the solutions you have come up with aren't ideal (aren't
really easy to maintain and are subtly wrong in some cases), largely because
the infrastructure that you were working in was too restrictive.

Now that I have that big, detailed picture, I can re-write the
infrastructure to (hopefully) make it easy to implement the same
functionality in a much clearer and more maintainable way.  And I'm
getting close.

Once that is done, you can certainly help by modifying the IMSM specific code
to fit the new model (which I will try to document a bit better).


Thanks,
NeilBrown


* Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
  2010-12-20  8:27   ` Wojcik, Krzysztof
  2010-12-20 21:59     ` Neil Brown
@ 2010-12-21 12:37     ` Neil Brown
  2010-12-27 14:56       ` Kwolek, Adam
  1 sibling, 1 reply; 41+ messages in thread
From: Neil Brown @ 2010-12-21 12:37 UTC (permalink / raw)
  To: Wojcik, Krzysztof
  Cc: Kwolek, Adam, linux-raid, Williams, Dan J, Ciechanowski, Ed,
	Neubauer, Wojciech

On Mon, 20 Dec 2010 08:27:38 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> Neil,
> 
> How can we help you to speed up your work?
> 

An update:

 I've got the code to a state where the shape of what I'm trying to do
 should be clear, though the code to do it isn't quite complete yet.
 The last commit in my devel-3.2 tree contains some incomplete code, some
 enhanced documentation for how it is supposed to work, and a to-do list
 to remind me what I should be doing next.

 If any of your people have time to look at it and possibly fill in some of
 the gaps, that would be great.  I am on leave for the next 2 weeks but I'll
 be reading email and will try to respond to any questions or small patches
 that arise (I won't be reviewing any long patch series though).

 Probably the most useful next step would be to fit the migration checkpoint
 patches into the new approach - that should be fairly straightforward.
 Then probably look at reshape_super and make sure it is doing the right
 thing according to the new rules.
 manage_reshape will probably need to wait until the new code has stabilised
 a bit.

Thanks,

NeilBrown



* RE: [PATCH 00/29] OLCE, migrations and raid10 takeover
  2010-12-21 12:37     ` Neil Brown
@ 2010-12-27 14:56       ` Kwolek, Adam
  2010-12-28  2:24         ` Neil Brown
  0 siblings, 1 reply; 41+ messages in thread
From: Kwolek, Adam @ 2010-12-27 14:56 UTC (permalink / raw)
  To: Neil Brown, Wojcik, Krzysztof
  Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed, Neubauer, Wojciech

Hi,

Please find below my few comments for current grow/reshape code on devel 3.2.

In super-intel.c, the update_reshape_container_disk update is defined.
It is intended to be sent for the container only and operates on all devices (arrays) in the container in one shot.
This is incompatible with IMSM.  IMSM updates the disk list in the metadata, together with the first device's information, during the first array's reshape.
During the second array's reshape, only that device is updated (using the disk information recorded during the first reshape).
As shown above, a container operation is in fact a set of single-array operations.  This is due to the single migration area (per container)
where the backup is stored.  It is tied to the currently reshaped device, so only a single array can be in the reshape state at a time.

This leads to the conclusion that the update should be issued per array and should carry a device id (dev minor or subdev).

Given the above, the current implementation of update_reshape_container_disk is not compatible with IMSM.

This is driven by the current reshape_container() implementation in Grow.c.
The flow goes as follows:
1. reshape_container()
  - metadata update for the container (+ sync_metadata)
  - calls reshape_array() in a loop for each array in the container

2. reshape_array()
   - adds disks for each device in the container - but it should add disks only to the particular array it is called for (the loop is already in reshape_container())
   - reshape_array() currently doesn't update the metadata; if so, the sync_metadata() at line 1891 seems unnecessary (there is no reshape_super() call)
   - reshape_array() currently assumes that the metadata is ready for all devices in the container - this is not IMSM compatible (as described above)

How mdadm should work to be IMSM compatible:
1. a container operation is a set of array operations (not an atomic/single container update)
2. the same disk set should be used for every array (this is covered in the current devel 3.2 code, but not in an IMSM-compatible way)
3. there is no metadata update after the reshape_super() call for the container
4. the first array update should update the disk list in the metadata
5. the second array update should reuse (and verify) the disk list already present in the metadata
6. reshapes cannot be run in parallel

Main code changes I'm proposing:
Problem 1.
	Main code in Grow.c cannot assume when whole metadata /all arrays in container/ will be updated for particular external meta
	during reshape_super() in reshape_container(). It can issue no update to container/arrays/ due to compatibility reason.
Resolution 1.
	reshape_super() has to be called for container and for every array in reshape_array() (i.e.) before new disks will be added to array (and after monitor is running).
	We can distinguish calls by comparing (as currently), st->container_dev and st->devnum.
	This causes that in reshape_array(), spares cannot be added to all arrays in container at one time (based on metadata information), 
	but for current array only (all arrays will be covered by loop in reshape_container()).
	This means also, that disks cannot be added based on metadata information without adding additional reshape_suped() call in reshape_array(),
	as it cannot be assumed that all arrays in container are updated at this moment without this call.
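The container-vs-array dispatch proposed in Resolution 1 can be sketched in a few lines; the cut-down struct and helper below are purely illustrative (mdadm's real struct supertype has many more members), but the comparison itself follows the st->container_dev / st->devnum convention described above:

```c
/* Illustrative sketch only: a cut-down supertype with just the two
 * fields the proposal compares. */
struct supertype_stub {
	int container_dev;	/* devnum of the container */
	int devnum;		/* devnum this call targets */
};

enum reshape_scope { SCOPE_CONTAINER, SCOPE_ARRAY };

/* reshape_super() would branch on this: a container-scope call checks
 * preconditions across all arrays; an array-scope call prepares the
 * per-array metadata update. */
static enum reshape_scope reshape_call_scope(const struct supertype_stub *st)
{
	return st->container_dev == st->devnum ? SCOPE_CONTAINER : SCOPE_ARRAY;
}
```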

Problem 2.
	As reshape_container() performs the action for all arrays in the container, we should probably allow external metadata to reuse code from Grow.c.
Resolution 2.
	In my implementation I did this by moving the manage_reshape() call to the end of reshape processing.  In this function I manage the array size and the backward takeover to raid0 (optionally).
	I think this would be correct for IMSM, but for flexibility the manage_reshape() call should not be changed.  Instead an additional function can be added,
	e.g. finalize_reshape(), for the purposes pointed out above (after wait_reshape() and before the level-change action, Grow.c:2045).
	The action after manage_reshape() should depend on whether manage_reshape() exists and, if it does, on the return code from that call.
	This lets metadata-specific code decide whether (and for which cases) to reuse as much of the main reshape code as possible.

Problem 3.
	After the manage_reshape() call we return to the main reshape_container() code.  I assume that manage_reshape() should execute a fork() at some point and return,
	as the code in Grow.c does.  This can lead to parallel reshapes running with e.g. a single backup file.
Resolution 3.
	The resolution will depend on the particular metadata implementation, but as a single backup_file is used, I assume the intention is that reshapes run sequentially.
	This can be handled in reshape_super() by waiting for all arrays to be in the idle state.  This entry condition should be checked in the reshape_super() called from reshape_container()
	without waiting for a state change, and in the reshape_super() called from reshape_array() with waiting for the idle state.
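The "all arrays idle" entry condition in Resolution 3 boils down to a scan over the member arrays' sync_action states; a self-contained sketch, with the sysfs read stubbed out and all helper names being assumptions for illustration:

```c
#include <string.h>

/* Stand-in for reading /sys/block/mdX/md/sync_action; real code would
 * do a sysfs read.  Here it returns a canned state so the sketch is
 * self-contained. */
static const char *sync_action_of(int devnum, const char **states, int n)
{
	return (devnum >= 0 && devnum < n) ? states[devnum] : "idle";
}

/* Returns 1 when every array in the container reports "idle", 0 when
 * at least one is still reshaping/recovering.  The caller decides
 * whether to fail immediately (container context) or poll until idle
 * (array context), as proposed above. */
static int all_arrays_idle(const char **states, int n)
{
	for (int i = 0; i < n; i++)
		if (strcmp(sync_action_of(i, states, n), "idle") != 0)
			return 0;
	return 1;
}
```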


Considering all of the above, I think container reshape should work as follows (using the current devel 3.2 code):

reshape_container()
{
  a) reshape_super(container context)
	- check the reshape condition and exit with an error when any array is not ready for reshape.
	  Checking the condition here ensures we can safely wait for the required condition during the reshape_super() call in array context later.
	- leaving this call here also makes it possible to implement a container update, if the metadata requires it
  b) fork() if the reshape_super() call succeeds (+).
	For a multi-array container this is required to return to the console while the arrays are worked on sequentially.
	When we want to proceed to the second array, we have to wait until the first reshape is finished.
	This would block the console.
  c) call reshape_array() in a loop for all devices
	1. analyse_change()
	2. change level and start the monitor (if required)
	3. reshape_super(array context) (+)
		- check the reshape condition; if any array in the container is in the reshape state, this time wait for the reshape to end.
		- send the metadata update and allow reshape processing
	4. load the metadata (container_content()) and, based on this information, add disks to md for the single array currently being processed
	5. set the other array parameters in md (including raid_disks)
	6. start_reshape()
	7. manage_reshape() (+) -> if the metadata requires it, take ownership of the process, or ...
	8. control the reshape from Grow.c code using fork()
	   (fork usage could possibly be optional for a container, as it has forked already)
		- child_XXX()
		- wait for the reshape
		- finalize_reshape() (for external metadata) (+) - we can handle the size change in the common-code-usage case
		- backward takeover
	9. return to reshape_container() and process the next array (c.), if any
}
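Step b) above (fork so the console returns while the arrays are reshaped in sequence) is the standard detach-and-continue pattern; a hedged sketch, with the per-array work reduced to a stub and the function name being an assumption:

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Stub standing in for one reshape_array() pass; the real loop would
 * run the c.1-c.9 steps for each array in the container. */
static void reshape_one_array(int idx)
{
	fprintf(stderr, "reshaping array %d\n", idx);
}

/* Parent returns immediately, so the console is free; the child works
 * through the arrays one at a time, matching the sequential-reshape
 * requirement. */
static int reshape_container_forked(int narrays)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;		/* fork failed */
	if (pid > 0)
		return 0;		/* parent: back to the console */
	for (int i = 0; i < narrays; i++)
		reshape_one_array(i);
	_exit(0);			/* child never returns to caller */
}
```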

Things added to your current design (marked above with (+)):
	1. additional fork()
	2. additional reshape_super() in reshape_array()
	3. a change in the call to manage_reshape() to allow external metadata processing to continue in the main code
		Depends on:
			- whether manage_reshape() exists
			- manage_reshape()'s return code
	4. added finalize_reshape()


The above flow makes it possible to:
1. run reshapes in sequence, or
2. run reshapes in parallel, or
3. reuse Grow.c code (we can use manage_reshape, finalize_reshape, the code in Grow.c, or any reasonable combination)

	and all of this is up to the metadata-specific code (reshape_super()/manage_reshape()/finalize_reshape()).

Please let me know what you think about this.  If you like it, I'll start work on the code changes.

BR
Adam





* Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
  2010-12-27 14:56       ` Kwolek, Adam
@ 2010-12-28  2:24         ` Neil Brown
  2010-12-28  8:49           ` Kwolek, Adam
  0 siblings, 1 reply; 41+ messages in thread
From: Neil Brown @ 2010-12-28  2:24 UTC (permalink / raw)
  To: Kwolek, Adam
  Cc: Wojcik, Krzysztof, linux-raid, Williams, Dan J, Ciechanowski, Ed,
	Neubauer, Wojciech

On Mon, 27 Dec 2010 14:56:05 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> Hi,
> 
> Please find below my few comments for current grow/reshape code on devel 3.2.
> 
> [...]

Hi Adam,
 thanks for your comments and the obvious effort you have put into
 understanding the new structure.

 I agree that my code is making some assumptions that don't fit quite right
 with IMSM.  However I think they can be resolved with relatively small
 changes.

 I don't think we should support parallel reshaping at all.  md doesn't allow
 multiple arrays that share a device to be reshaped at the same time anyway,
 and running parallel reshapes would result in horrible performance.  So we
 should only support multiple reshapes in series.  Therefore only a single
 'fork' call is needed.  The forked child then handles all of the reshaping.

 I only want one call to ->reshape_super.  It essentially tells the metadata
 handler that the user has requested a reshape, and as the user only made one
 request, only one call to ->reshape_super should be made.

 I had assumed that the one call would update the metadata for all of the
 arrays, but you say that is incorrect for IMSM.  In that case it should just
 update the metadata for the first array.  When that reshape completes, the
 metadata handler will be told (in ->set_array_state) and it should at that
 point update the metadata to reflect that the second array is undergoing
 reshape.

 This requires slightly different behaviour in reshape_container.  Instead of
 calling ->container_content just once and then reshaping each array it should
 repeatedly:
    - load the metadata
    - call ->container_content, find the first array which is ready for
      reshape
    - process that reshape so that it completes
    - ping_monitor to ensure that the metadata is updated
 That should be quite general and compatible with IMSM.
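The repeated loop described above can be modelled in a few lines; all helpers here are hypothetical stand-ins, and the 'ready' flag simply models whatever ->container_content would report after each metadata reload:

```c
/* Self-contained model of the proposed loop: reload metadata, pick the
 * first array ready for reshape, process it to completion, ping the
 * monitor so the metadata is updated, repeat until nothing is ready. */
struct array_state {
	int ready;	/* ->container_content says this array can reshape */
	int reshaped;	/* set once its reshape has completed */
};

static int reshape_container_loop(struct array_state *arrays, int n)
{
	int done = 0;

	for (;;) {
		/* "load the metadata / call ->container_content": find
		 * the first array which is ready for reshape. */
		int next = -1;
		for (int i = 0; i < n; i++) {
			if (arrays[i].ready && !arrays[i].reshaped) {
				next = i;
				break;
			}
		}
		if (next < 0)
			break;		/* no array ready: loop ends */

		/* "process that reshape so that it completes" (stubbed) */
		arrays[next].reshaped = 1;
		done++;

		/* "ping_monitor to ensure that the metadata is updated":
		 * in this model, completing one reshape makes the next
		 * array ready, mirroring IMSM's per-array metadata
		 * update at that point. */
		if (next + 1 < n)
			arrays[next + 1].ready = 1;
	}
	return done;
}
```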

 I am not entirely sure what you wanted finalize_reshape to do.  If it was to
 update the metadata to show that the reshape was complete, then that should
 be done in ->set_array_state.  If it was to e.g. convert a RAID4 back to
 RAID0, that is already done towards the end of reshape_array.

Thanks,
NeilBrown



* RE: [PATCH 00/29] OLCE, migrations and raid10 takeover
  2010-12-28  2:24         ` Neil Brown
@ 2010-12-28  8:49           ` Kwolek, Adam
  2010-12-29 10:34             ` Neil Brown
  0 siblings, 1 reply; 41+ messages in thread
From: Kwolek, Adam @ 2010-12-28  8:49 UTC (permalink / raw)
  To: Neil Brown
  Cc: Wojcik, Krzysztof, linux-raid, Williams, Dan J, Ciechanowski, Ed,
	Neubauer, Wojciech



> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Tuesday, December 28, 2010 3:24 AM
> To: Kwolek, Adam
> Cc: Wojcik, Krzysztof; linux-raid@vger.kernel.org; Williams, Dan J;
> Ciechanowski, Ed; Neubauer, Wojciech
> Subject: Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
> 
> On Mon, 27 Dec 2010 14:56:05 +0000 "Kwolek, Adam"
> <adam.kwolek@intel.com>
> wrote:
> 
> > Hi,
> >
> > Please find below my few comments for current grow/reshape code on
> devel 3.2.
> >
> > In super-intel.c, update_reshape_container_disk update is defined.
> > It is intended to send for container only and operates on all devices
> /arrays/ in container in one shot.
> > This is IMSM incompatible way. IMSM updates disks list in metadata
> and first device information during first array reshape.
> > During second array reshape, device is updated only (using disk
> information updated during first reshape)
> > As it I am showing above. Container operation is in fact set of
> single array operations. This is due to single migration area (per
> container),
> > where backup is stored. It is related to currently reshaped device
> and we can have single array in reshape state only.
> >
> > This drives to conclusion that update should be issued per array and
> should have device id (dev minor or subdev).
> >
> > Considering above current implementation of
> update_reshape_container_disk is not compatible with imsm.
> >
> > This is driven by current reshape_container() implementation in
> Grow.c.
> > The flow goes as follow:
> > 1. reshape_container()
> >   - metadata update for container (+ sync_metadata)
> >   - call in loop reshape_array() for each array in container
> >
> > 2. reshape_array()
> >    - adds disks for each device in container - but it should add
> disks to this particular array it is called for (loop is on
> reshape_container() already)
> >    - reshape_array() - currently doesn't update metadata, if so - in
> line 1891 sync_metadata() seems to be not necessary /no reshape_super()
> call/
> >    - currently reshape_array() assumes that metadata is ready for all
> devices in container - this is not imsm compatible /as described above/
> >
> > How mdadm should work to be imsm compatible:
> > 1. container operation is set of array operations (not atomic/single
> container update)
> > 2. for every array the same disk set should be used (it is covered in
> current devel 3.2 code but not in imsm compatible way)
> > 3. there is not metadata update after reshape_super() call for
> container.
> > 4. first array update should update disk list in metadata
> > 6. second array update should reuse (+verify) used disk list present
> in metadata already
> > 7. reshape cannot be run in parallel
> >
> > Main code changes I'm proposing?
> > Problem 1.
> > 	Main code in Grow.c cannot assume when whole metadata /all arrays
> in container/ will be updated for particular external meta
> > 	during reshape_super() in reshape_container(). It can issue no
> update to container/arrays/ due to compatibility reason.
> > Resolution 1.
> > 	reshape_super() has to be called for container and for every
> array in reshape_array() (i.e.) before new disks will be added to array
> (and after monitor is running).
> > 	We can distinguish calls by comparing (as currently), st-
> >container_dev and st->devnum.
> > 	This causes that in reshape_array(), spares cannot be added to
> all arrays in container at one time (based on metadata information),
> > 	but for current array only (all arrays will be covered by loop in
> reshape_container()).
> > 	This means also, that disks cannot be added based on metadata
> information without adding additional reshape_suped() call in
> reshape_array(),
> > 	as it cannot be assumed that all arrays in container are updated
> at this moment without this call.
> >
> > Problem 2.
> > 	As reshape_container() performs the action for all arrays in the
> container, we should probably allow the code in Grow.c to be reused by
> external metadata handlers.
> > Resolution 2.
> > 	In my implementation I did it by moving the manage_reshape() call
> to the end of reshape processing. In this function I manage the array
> size and the backward takeover to raid0 /optionally/.
> >   	I think that this would be correct for imsm, but for
> flexibility the manage_reshape() call should not be changed. In that
> situation an additional function can be added:
> > 	i.e. finalize_reshape(), for the purposes I've pointed out above
> (after wait_reshape() and before the level change action, Grow.c:2045).
> > 	The action after manage_reshape() should depend on whether
> manage_reshape() exists and, if yes, on the return code from this call.
> > 	This lets the metadata-specific code decide how much of the main
> reshape code to reuse, if any /or only for some cases/.
> >
> > Problem 3.
> > 	After the manage_reshape() call we return to the main
> reshape_container() code. I assume that manage_reshape() should at some
> point execute fork() and return,
> > 	as the code in Grow.c does. This could lead to parallel reshapes
> running against e.g. a single backup file.
> > Resolution 3.
> > 	The resolution will depend on the particular metadata
> implementation, but as a single backup_file is used, I assume the
> intention is to run reshapes sequentially.
> > 	This can be handled in reshape_super() by waiting for all arrays
> to be in the idle state. This entry condition should be checked in
> reshape_super() called in reshape_container()
> > 	without waiting for a state change, and in reshape_super() called
> in reshape_array() with waiting for the idle state.
> >
> >
> > Considering all of the above, I think that container reshape should
> work as follows (using the current devel 3.2 code):
> >
> > reshape_container()
> > {
> >   a) reshape_super(container context)
> > 	- check the reshape condition and exit with an error when any
> array is not ready for reshape
> > 	  Checking the condition here ensures that we can safely wait for
> the required condition during the reshape_super() call in array context
> later
> > 	- leaving this call here makes it possible to implement a container
> update too, if the metadata requires this
> >   b) fork() if the reshape_super() call succeeds (+).
> > 	For a multi-array container this is required to return control to
> the console while the arrays are handled sequentially.
> > 	When we want to proceed to the second array, we have to wait until
> the first reshape is finished.
> > 	Without the fork this would block the console.
> >   c) call in loop for all devices reshape_array()
> > 	1. analyse_change()
> > 	2. change level and start monitor (if required)
> > 	3. reshape_super(array context) (+)
> > 		- check the reshape condition; if any array in the container
> is in reshape state, this time wait for the reshape to end.
> > 		- send metadata update and allow for reshape processing
> > 	4. load metadata (container_content()) and based on this
> information add disks to md for single array (currently processed)
> > 	5. set other array parameters to md (including raid_disks)
> > 	6. start_reshape()
> > 	7. manage_reshape() (+) -> if the metadata requires it, take
> ownership of the process, or ...
> > 	8. control reshape by Grow.c code using fork()
> > 	   (possibly fork() usage can be optional for a container, as it
> is forked already)
> > 		- child_XXX()
> > 		- wait reshape
> > 		- finalize_reshape() (for external metadata) (+) - we can
> handle the size change when the common code is used
> > 		- backward takeover
> > 	9. return to reshape_container() and process next array (c.) if
> any
> > }
> >
> > Things added to your current design (marked above with (+)):
> > 	1. additional fork()
> > 	2. additional reshape_super() in reshape_array()
> > 	3. change in call to manage_reshape() to allow processing in main
> code for external metadata
> > 		Depends on:
> > 			- if manage_reshape() exists
> > 			- manage_reshape() return code
> > 	4. added finalize_reshape()
> >
> >
> > The above flow gives us the possibility to:
> > 1. run reshapes in sequence or
> > 2. run reshapes in parallel
> > 3. reuse Grow.c code (we can use manage_reshape, finalize_reshape,
> code in Grow.c or any reasonable combination)
> >
> > 	and all this is up to metadata specific code
> (reshape_super()/manage_reshape()/finalize_reshape()).
> >
> > Please let me know what you think about this. If you like it, I'll
> start work on code changes.
> >
> > BR
> > Adam
> >
> 
> 
> Hi Adam,
>  thanks for your comments and the obvious effort you have put into
>  understanding the new structure.
> 
>  I agree that my code is making some assumptions that don't fit quite
> right
>  with IMSM.  However I think they can be resolved with relatively small
>  changes.
> 
>  I don't think we should support parallel reshaping at all.  md doesn't
> allow
>  multiple arrays that share a device to be reshaped at the same time
> anyway,
>  and running parallel reshapes would result in horrible performance.
> So we
>  should only support multiple reshapes in series.  Therefore only a
> single
>  'fork' call is needed.  The forked child then handles all of the
> reshaping.
> 
>  I only want one call to ->reshape_super.  It essentially tells the
> metadata
>  handler that the user has requested a reshape, and as the user only
> made one
>  request, only one call to ->reshape_super should be made.
> 
>  I had assumed that the one call would update the metadata for all of
> the
>  arrays, but you say that is incorrect for IMSM. In that case it should
> just
>  update the metadata for the first array.  When that reshape completes,
> the
>  metadata will be told (in ->set_array_state) and it should at that
> point
>  update the metadata to reflect that the second array is undergoing
> reshape.
> 
>  This requires slightly different behaviour in reshape_container.
> Instead of
>  calling ->container_content just once and then reshaping each array it
> should
>  repeatedly:
>     - load the metadata
>     - call ->container_content, find the first array which is ready for
>       reshape
>     - process that reshape so that it completes
>     - ping_monitor to ensure that the metadata is updated
>  That should be quite general and compatible with IMSM.
> 
>  I am not entirely sure what you wanted finalize_reshape to do.  If it
> was to
>  update the metadata to show that the reshape was complete, then that
> should
>  be done in ->set_array_state.  If it was to e.g. convert a RAID4 back
> to
>  RAID0, that is already done towards the end of reshape_array.
> 
> Thanks,
> NeilBrown

Hi,

Your approach looks OK, and should lead us to an imsm-compatible implementation.

The only problem I can see is that set_array_state is called for a particular array, and I would avoid modifying another array's metadata information there.
This means that a sequence of set_array_state calls is required to achieve the final/required metadata state.

So the sequence can be:
	1. mdadm: ping_monitor
	2. monitor (in loop for active_array): set_array_state(a1: finalize current reshape), set_array_state(a2: start new reshape in metadata, when previous is finished)

If mdmon does not visit the arrays in that order, we will have this situation:
	1. mdadm: ping_monitor
	2. monitor (in loop for active_array): set_array_state(a2: nothing to do, a1 is under reshape), set_array_state(a1: finalize current reshape)

This leads me to two pings sent by mdadm:
	1. mdadm: ping_monitor
	2. monitor (in loop for active_array): set_array_state(a2: nothing to do, a1 is under reshape), set_array_state(a1: finalize current reshape)
	3. mdadm: ping_monitor
	4. monitor (in loop for active_array): set_array_state(a2: start new reshape in metadata, when previous is finished), set_array_state(a1: nothing to do)

If there are more arrays in the container (not the imsm case), it is important that the first ping finalizes the current reshape and the second ping starts the new one.

Regarding finalize_reshape(): if the metadata-specific implementation is similar or identical to the current reshape_array() implementation, there is no need to duplicate it in manage_reshape(). We can proceed in the main code, but at the end the array size has to be updated for external metadata (md has been set to external size management). This is the main purpose of finalize_reshape(): the metadata-specific code can read the new size from the metadata and set it in md.

If you state that reshape control for external metadata always has to be put into manage_reshape(), we can forget about finalize_reshape().

BR
Adam



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
  2010-12-28  8:49           ` Kwolek, Adam
@ 2010-12-29 10:34             ` Neil Brown
  2010-12-29 12:29               ` Kwolek, Adam
  0 siblings, 1 reply; 41+ messages in thread
From: Neil Brown @ 2010-12-29 10:34 UTC (permalink / raw)
  To: Kwolek, Adam
  Cc: Wojcik, Krzysztof, linux-raid, Williams, Dan J, Ciechanowski, Ed,
	Neubauer, Wojciech

On Tue, 28 Dec 2010 08:49:52 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> 
> 
> > -----Original Message-----
> > From: Neil Brown [mailto:neilb@suse.de]
> > Sent: Tuesday, December 28, 2010 3:24 AM
> > To: Kwolek, Adam
> > Cc: Wojcik, Krzysztof; linux-raid@vger.kernel.org; Williams, Dan J;
> > Ciechanowski, Ed; Neubauer, Wojciech
> > Subject: Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
> > 
> > On Mon, 27 Dec 2010 14:56:05 +0000 "Kwolek, Adam"
> > <adam.kwolek@intel.com>
> > wrote:
> > 
> > > Hi,
> > >
> > > Please find below my few comments for current grow/reshape code on
> > devel 3.2.
> > >
> > > In super-intel.c, the update_reshape_container_disk update is defined.
> > > It is intended to be sent for the container only and operates on all
> > devices /arrays/ in the container in one shot.
> > > This is an IMSM-incompatible way. IMSM updates the disk list in the
> > metadata and the first device's information during the first array's
> > reshape.
> > > During the second array's reshape, only the device is updated (using
> > the disk information updated during the first reshape),
> > > as I showed above. A container operation is in fact a set of
> > single-array operations. This is due to the single migration area (per
> > container)
> > > where the backup is stored. It is tied to the currently reshaped
> > device, and we can have only a single array in the reshape state.
> > >
> > > This leads to the conclusion that the update should be issued per
> > array and should carry a device id (dev minor or subdev).
> > >
> > > Considering the above, the current implementation of
> > update_reshape_container_disk is not compatible with imsm.
> > >
> > > This is driven by current reshape_container() implementation in
> > Grow.c.
> > > The flow goes as follows:
> > > 1. reshape_container()
> > >   - metadata update for container (+ sync_metadata)
> > >   - call in loop reshape_array() for each array in container
> > >
> > > 2. reshape_array()
> > >    - adds disks for each device in container - but it should add
> > disks to this particular array it is called for (loop is on
> > reshape_container() already)
> > [...]
> 
> Hi,
> 
> Your approach looks OK, and should lead us to an imsm-compatible implementation.
> 
> The only problem I can see is that set_array_state is called for a particular array, and I would avoid modifying another array's metadata information there.

I can see why you might want to avoid updating one array from a call to
update a different array, but I think it is the correct thing to do in this
case.

What we are really doing here is changing the whole container (both arrays).
We are making them both bigger in number of devices.
The rules for updating the metadata require that this happens in multiple
steps and at different times.  One of the steps is that when the first array
finishes a reshape, the next array can start.  So in order to impose the
container-wide change, we need to make a change to the second array when
finalising the reshape of the first.
I'm not sure if I've said that as clearly as I would like to, but the point
remains:  I really think preparing the second array for reshape by a
set_array_state call on the first array is the right thing to do.


> This means that a sequence of set_array_state calls is required to achieve the final/required metadata state.
> 
> so sequence can be:
> 	1. mdadm: ping_monitor
> 	2. monitor (in loop for active_array): set_array_state(a1: finalize current reshape), set_array_state(a2: start new reshape in metadata, when previous is finished)

So this will be a single set_array_state call.  IMSM *knows* that when the
first array has been reshaped to more devices, the second array must also be
reshaped, so it will make that change transparently.

> 
> If mdmon does not visit the arrays in that order, we will have this situation:
> 	1. mdadm: ping_monitor
> 	2. monitor (in loop for active_array): set_array_state(a2: nothing to do, a1 is under reshape), set_array_state(a1: finalize current reshape)
> 
> This leads me to 2 pings sent by mdadm:
> 	1. mdadm: ping_monitor
> 	2. monitor (in loop for active_array): set_array_state(a2: nothing to do, a1 is under reshape), set_array_state(a1: finalize current reshape)
> 	3. mdadm: ping_monitor
> 	4. monitor (in loop for active_array): set_array_state(a2: start new reshape in metadata, when previous is finished), set_array_state(a1: nothing to do)
> 
> If there are more arrays in the container (not the imsm case), it is important that the first ping finalizes the current reshape and the second ping starts the new one.

Only one ping should be needed.

> 
> Regarding finalize_reshape(): if the metadata-specific implementation is similar or identical to the current reshape_array() implementation, there is no need to duplicate it in manage_reshape(). We can proceed in the main code, but at the end the array size has to be updated for external metadata (md has been set to external size management). This is the main purpose of finalize_reshape(): the metadata-specific code can read the new size from the metadata and set it in md.

I hadn't thought about changing the array size.  Thanks for remembering it.

I think the common code should do this.  When the reshape completes mdadm
should re-read the metadata and use container_content to find out the
expected size of the array, and then tell md what this size is.

So reshape_array will ping_monitor and re-read the metadata.  That means that
reshape_container doesn't need to do either of these.  After calling
reshape_array on one array, it will call container_content(st, NULL) and look
to see if any other arrays need to be reshaped.

> 
> If you state that reshape control for external metadata always has to be put into manage_reshape(), we can forget about finalize_reshape().
> 

I'm not exactly sure I know what you mean by that first part, but I think the
answer is yes:  I do want any external metadata handler to have a
manage_reshape, and I don't want to have a finalize_reshape at all.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH 00/29] OLCE, migrations and raid10 takeover
  2010-12-29 10:34             ` Neil Brown
@ 2010-12-29 12:29               ` Kwolek, Adam
  0 siblings, 0 replies; 41+ messages in thread
From: Kwolek, Adam @ 2010-12-29 12:29 UTC (permalink / raw)
  To: Neil Brown
  Cc: Wojcik, Krzysztof, linux-raid, Williams, Dan J, Ciechanowski, Ed,
	Neubauer, Wojciech


> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Wednesday, December 29, 2010 11:34 AM
> To: Kwolek, Adam
> Cc: Wojcik, Krzysztof; linux-raid@vger.kernel.org; Williams, Dan J;
> Ciechanowski, Ed; Neubauer, Wojciech
> Subject: Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
>
> On Tue, 28 Dec 2010 08:49:52 +0000 "Kwolek, Adam"
> <adam.kwolek@intel.com>
> wrote:
>
> >
> >
> > > -----Original Message-----
> > > From: Neil Brown [mailto:neilb@suse.de]
> > > Sent: Tuesday, December 28, 2010 3:24 AM
> > > To: Kwolek, Adam
> > > Cc: Wojcik, Krzysztof; linux-raid@vger.kernel.org; Williams, Dan J;
> > > Ciechanowski, Ed; Neubauer, Wojciech
> > > Subject: Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
> > >
> > > On Mon, 27 Dec 2010 14:56:05 +0000 "Kwolek, Adam"
> > > <adam.kwolek@intel.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Please find below my few comments for current grow/reshape code
> on
> > > devel 3.2.
> > > >
> > > > In super-intel.c, update_reshape_container_disk update is
> defined.
> > > > It is intended to send for container only and operates on all
> devices
> > > /arrays/ in container in one shot.
> > > > This is IMSM incompatible way. IMSM updates disks list in
> metadata
> > > and first device information during first array reshape.
> > > > During second array reshape, device is updated only (using disk
> > > information updated during first reshape)
> > > > As it I am showing above. Container operation is in fact set of
> > > single array operations. This is due to single migration area (per
> > > container),
> > > > where backup is stored. It is related to currently reshaped
> device
> > > and we can have single array in reshape state only.
> > > >
> > > > This drives to conclusion that update should be issued per array
> and
> > > should have device id (dev minor or subdev).
> > > >
> > > > Considering above current implementation of
> > > update_reshape_container_disk is not compatible with imsm.
> > > >
> > > > This is driven by current reshape_container() implementation in
> > > Grow.c.
> > > > The flow goes as follow:
> > > > 1. reshape_container()
> > > >   - metadata update for container (+ sync_metadata)
> > > >   - call in loop reshape_array() for each array in container
> > > >
> > > > 2. reshape_array()
> > > >    - adds disks for each device in container - but it should add
> > > disks to this particular array it is called for (loop is on
> > > reshape_container() already)
> > > >    - reshape_array() - currently doesn't update metadata, if so -
> in
> > > line 1891 sync_metadata() seems to be not necessary /no
> reshape_super()
> > > call/
> > > >    - currently reshape_array() assumes that metadata is ready for
> all
> > > devices in container - this is not imsm compatible /as described
> above/
> > > >
> > > > How mdadm should work to be imsm compatible:
> > > > 1. container operation is set of array operations (not
> atomic/single
> > > container update)
> > > > 2. for every array the same disk set should be used (it is
> covered in
> > > current devel 3.2 code but not in imsm compatible way)
> > > > 3. there is not metadata update after reshape_super() call for
> > > container.
> > > > 4. first array update should update disk list in metadata
> > > > 6. second array update should reuse (+verify) used disk list
> present
> > > in metadata already
> > > > 7. reshape cannot be run in parallel
> > > >
> > > > Main code changes I'm proposing?
> > > > Problem 1.
> > > >         Main code in Grow.c cannot assume when whole metadata /all
> arrays
> > > in container/ will be updated for particular external meta
> > > >         during reshape_super() in reshape_container(). It can issue
> no
> > > update to container/arrays/ due to compatibility reason.
> > > > Resolution 1.
> > > >         reshape_super() has to be called for container and for
> every
> > > array in reshape_array() (i.e.) before new disks will be added to
> array
> > > (and after monitor is running).
> > > >         We can distinguish calls by comparing (as currently), st-
> > > >container_dev and st->devnum.
> > > >         This causes that in reshape_array(), spares cannot be added
> to
> > > all arrays in container at one time (based on metadata
> information),
> > > >         but for current array only (all arrays will be covered by
> loop in
> > > reshape_container()).
> > > >         This means also, that disks cannot be added based on
> metadata
> > > information without adding additional reshape_suped() call in
> > > reshape_array(),
> > > >         as it cannot be assumed that all arrays in container are
> updated
> > > at this moment without this call.
> > > >
> > > > Problem 2.
> > > >         As reshape_container() performs action for all arrays in
> > > container probably we should allow for code reuse from Grow.c by
> > > external meta.
> > > > Resolution 2.
> > > >         In my implementation, I've did it by moving
> manage_reshape() call
> > > to the end of reshape processing. In this function I've manage
> array
> > > size and backward takeover to raid0 /optionally/
> > > >         I think that this would be correct for imsm, but to have
> > > flexibility manage_reshape() call should not be changed. In such
> > > situation additional function can be added:
> > > >         i.e. finalize_reshape() for purposes I've pointed above
> (after
> > > wait_reshape() and before level change action, Grow.c:2045).
> > > >         Action after manage_reshape() should be based on
> manage_reshape()
> > > on condition if it exists and is yes, on return code form this
> call.
> > > >         This allows metadata specific code decide to reuse or not
> /or for
> > > some cases/ as much as it is possible of main reshape code.
> > > >
> > > > Problem 3.
> > > >         After manage_reshape() call we are returning to main
> > > reshape_container() code, I assume that manage_reshape() should
> execute
> > > at some point fork() and return,
> > > >         as code in Grow.c does. This can cause running parallel
> reshapes
> > > using i.e. single backup file.
> > > > Resolution 3.
> > > >         Resolution will depend on particular metadata
> implementation, but
> > > as it is single backup_file used, I can assume that intention is
> > > sequential reshapes run.
> > > >         This can be handled in reshape_super() by waiting for all
> arrays
> > > to be in idle state. This entry condition should be checked in
> > > reshape_super() called in reshape_container()
> > > >         without waiting for state change, and in reshape_super()
> called
> > > in reshape_array() with waiting for idle state.
> > > >
> > > >
> > > > Considering all above I think that container reshape should work
> as
> > > follow (using current devel 3.2 code):
> > > >
> > > > reshape_container()
> > > > {
> > > >   a) reshape_super(container context)
> > > >         - check the reshape condition and exit with an error when
> > > >           any array is not ready for reshape.
> > > >           Checking the condition here ensures that we can safely
> > > >           wait for the required condition during the
> > > >           reshape_super() call in array context later.
> > > >         - leaving this call here also makes it possible to
> > > >           implement a container update if the metadata requires it
> > > >   b) fork() if the reshape_super() call succeeds (+).
> > > >         For a multi-array container this is required to return to
> > > >         the console while the arrays are processed sequentially.
> > > >         When we want to proceed to the second array, we have to
> > > >         wait until the first reshape is finished.
> > > >         This would block the console.
> > > >   c) call reshape_array() in a loop for all devices
> > > >         1. analyse_change()
> > > >         2. change level and start monitor (if required)
> > > >         3. reshape_super(array context) (+)
> > > >                 - check the reshape condition; if any array in the
> > > >                   container is in reshape state, this time wait
> > > >                   for the reshape to end.
> > > >                 - send the metadata update and allow reshape
> > > >                   processing
> > > >         4. load the metadata (container_content()) and, based on
> > > >            this information, add disks to md for the single array
> > > >            currently being processed
> > > >         5. set the other array parameters in md (including
> > > >            raid_disks)
> > > >         6. start_reshape()
> > > >         7. manage_reshape() (+) -> if the metadata requires it,
> > > >            take ownership of the process, or ...
> > > >         8. control the reshape with the Grow.c code using fork()
> > > >            (fork usage could be optional for a container, as it
> > > >            is forked already)
> > > >                 - child_XXX()
> > > >                 - wait for the reshape
> > > >                 - finalize_reshape() (for external metadata) (+) -
> > > >                   we can handle the size change in the common-code
> > > >                   case
> > > >         9. return to reshape_container() and process the next
> > > >            array (c.), if any
> > > > }
> > > >
> > > > Things added to your current design (marked above with (+)):
> > > >         1. an additional fork()
> > > >         2. an additional reshape_super() in reshape_array()
> > > >         3. a change in the call to manage_reshape() to allow
> > > >            processing in the main code for external metadata.
> > > >                 Depends on:
> > > >                         - whether manage_reshape() exists
> > > >                         - the manage_reshape() return code
> > > >         4. an added finalize_reshape()
> > > >
> > > >
> > > > The above flow makes it possible to:
> > > > 1. run reshapes in sequence, or
> > > > 2. run reshapes in parallel,
> > > > 3. reuse the Grow.c code (we can use manage_reshape,
> > > >    finalize_reshape, the code in Grow.c, or any reasonable
> > > >    combination)
> > > >
> > > >         and all of this is up to the metadata-specific code
> > > > (reshape_super()/manage_reshape()/finalize_reshape()).
> > > >
> > > > Please let me know what you think about this.  If you like it,
> > > > I'll start working on the code changes.
> > > >
> > > > BR
> > > > Adam
> > > >
> > >
> > >
> > > Hi Adam,
> > >  thanks for your comments and the obvious effort you have put into
> > >  understanding the new structure.
> > >
> > >  I agree that my code is making some assumptions that don't fit
> > >  quite right with IMSM.  However I think they can be resolved with
> > >  relatively small changes.
> > >
> > >  I don't think we should support parallel reshaping at all.  md
> > >  doesn't allow multiple arrays that share a device to be reshaped
> > >  at the same time anyway, and running parallel reshapes would
> > >  result in horrible performance.  So we should only support
> > >  multiple reshapes in series.  Therefore only a single 'fork' call
> > >  is needed.  The forked child then handles all of the reshaping.
> > >
> > >  I only want one call to ->reshape_super.  It essentially tells
> > >  the metadata handler that the user has requested a reshape, and
> > >  as the user only made one request, only one call to
> > >  ->reshape_super should be made.
> > >
> > >  I had assumed that the one call would update the metadata for all
> > >  of the arrays, but you say that is incorrect for IMSM.  In that
> > >  case it should just update the metadata for the first array.
> > >  When that reshape completes, the metadata will be told (in
> > >  ->set_array_state) and it should at that point update the
> > >  metadata to reflect that the second array is undergoing reshape.
> > >
> > >  This requires slightly different behaviour in reshape_container.
> > >  Instead of calling ->container_content just once and then
> > >  reshaping each array it should repeatedly:
> > >     - load the metadata
> > >     - call ->container_content, find the first array which is
> > >       ready for reshape
> > >     - process that reshape so that it completes
> > >     - ping_monitor to ensure that the metadata is updated
> > >  That should be quite general and compatible with IMSM.
> > >
> > >  I am not entirely sure what you wanted finalize_reshape to do.
> > >  If it was to update the metadata to show that the reshape was
> > >  complete, then that should be done in ->set_array_state.  If it
> > >  was to e.g. convert a RAID4 back to RAID0, that is already done
> > >  towards the end of reshape_array.
> > >
> > > Thanks,
> > > NeilBrown
> >
> > Hi,
> >
> > Your approach looks ok, and should lead us to an imsm-compatible
> > implementation.
> >
> > The only problem I can see is that set_array_state is called for a
> > particular array, and I would avoid modifying another array's
> > metadata information there.
>
> I can see why you might want to avoid updating one array from a call
> to update a different array, but I think it is the correct thing to
> do in this case.
>
> What we are really doing here is changing the whole container (both
> arrays).
> We are making them both bigger in number of devices.
> The rules for updating the metadata require that this happens in
> multiple steps and at different times.  One of the steps is that when
> the first array finishes a reshape, the next array can start.  So in
> order to impose the container-wide change, we need to make a change
> to the second array when finalising the reshape of the first.
> I'm not sure if I've said that as clearly as I would like to, but the
> point remains:  I really think preparing the second array for reshape
> by a set_array_state call on the first array is the right thing to do.
>
>
> > This means that a sequence of set_array_state calls is required to
> > achieve the final/required metadata state.
> >
> > so the sequence can be:
> >     1. mdadm: ping_monitor
> >     2. monitor (in loop for active_array): set_array_state(a1:
> >        finalize current reshape), set_array_state(a2: start new
> >        reshape in metadata, when previous is finished)
>
> So this will be a single set_array_state call.  IMSM *knows* that
> when the first array has been reshaped to more devices, the second
> array must also be reshaped, so it will make that change
> transparently.
>
> >
> > If mdadm does not follow the arrays' order in mdmon, we will have
> > this situation:
> >     1. mdadm: ping_monitor
> >     2. monitor (in loop for active_array): set_array_state(a2:
> >        nothing to do, a1 is under reshape), set_array_state(a1:
> >        finalize current reshape)
> >
> > This leads me to 2 pings sent by mdadm:
> >     1. mdadm: ping_monitor
> >     2. monitor (in loop for active_array): set_array_state(a2:
> >        nothing to do, a1 is under reshape), set_array_state(a1:
> >        finalize current reshape)
> >     3. mdadm: ping_monitor
> >     4. monitor (in loop for active_array): set_array_state(a2:
> >        start new reshape in metadata, when previous is finished),
> >        set_array_state(a1: nothing to do)
> >
> > If there are more arrays in the container (not the imsm case), it
> > is important that the first ping finalizes the current reshape and
> > the second ping starts a new one.
>
> Only one ping should be needed.
>
> >
> > Regarding finalize_reshape(): if the metadata-specific
> > implementation is similar to or the same as the current
> > reshape_array() implementation, there is no need to duplicate it in
> > manage_reshape().  We can proceed in the main code, but at the end
> > the array size has to be updated for external metadata (md has been
> > set to external size management).  This is the main purpose of
> > finalize_reshape(), as the metadata-specific code can read the new
> > size from the metadata and set it in md.
>
> I hadn't thought about changing the array size.  Thanks for
> remembering it.
>
> I think the common code should do this.  When the reshape completes,
> mdadm should re-read the metadata and use container_content to find
> out the expected size of the array, and then tell md what this size
> is.
>
> So reshape_array will ping_monitor and re-read the metadata.  That
> means that reshape_container doesn't need to do either of these.
> After calling reshape_array on one array, it will call
> container_content(st, NULL) and look to see if any other arrays need
> to be reshaped.
>
> >
> > If you state that reshape control for external metadata always has
> > to be put into manage_reshape(), we can forget about
> > finalize_reshape().
> >
>
> I'm not exactly sure I know what you mean by that first part, but I
> think the answer is yes:  I do want any external metadata handler to
> have a manage_reshape, and I don't want to have a finalize_reshape at
> all.
>
> Thanks,
> NeilBrown


Thank you for the information; I think I know everything I need at this
moment.  In a few days you'll get new patches for OLCE.

BR
Adam



Thread overview: 41+ messages
2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
2010-12-09 15:18 ` [PATCH 01/29] Add state_of_reshape for external metadata Adam Kwolek
2010-12-09 15:18 ` [PATCH 02/29] imsm: Prepare reshape_update in mdadm Adam Kwolek
2010-12-14  0:07   ` Neil Brown
2010-12-14  7:54     ` Kwolek, Adam
2010-12-09 15:19 ` [PATCH 03/29] imsm: Process reshape_update in mdmon Adam Kwolek
2010-12-09 15:19 ` [PATCH 04/29] imsm: Block array state change during reshape Adam Kwolek
2010-12-09 15:19 ` [PATCH 05/29] Process reshape initialization by managemon Adam Kwolek
2010-12-09 15:19 ` [PATCH 06/29] imsm: Verify slots in meta against slot numbers set by md Adam Kwolek
2010-12-09 15:19 ` [PATCH 07/29] imsm: Cancel metadata changes on reshape start failure Adam Kwolek
2010-12-09 15:19 ` [PATCH 08/29] imsm: Do not accept messages sent by mdadm Adam Kwolek
2010-12-09 15:19 ` [PATCH 09/29] imsm: Do not indicate resync during reshape Adam Kwolek
2010-12-09 15:20 ` [PATCH 10/29] imsm: Fill delta_disks field in getinfo_super() Adam Kwolek
2010-12-09 15:20 ` [PATCH 11/29] Control reshape in mdadm Adam Kwolek
2010-12-09 15:20 ` [PATCH 12/29] Finalize reshape after adding disks to array Adam Kwolek
2010-12-09 15:20 ` [PATCH 13/29] Add reshape progress updating Adam Kwolek
2010-12-09 15:20 ` [PATCH 14/29] WORKAROUND: md reports idle state during reshape start Adam Kwolek
2010-12-09 15:20 ` [PATCH 15/29] FIX: core during getting map Adam Kwolek
2010-12-09 15:20 ` [PATCH 16/29] Enable reshape for subarrays Adam Kwolek
2010-12-09 15:21 ` [PATCH 17/29] Change manage_reshape() placement Adam Kwolek
2010-12-09 15:21 ` [PATCH 18/29] Migration: raid5->raid0 Adam Kwolek
2010-12-09 15:21 ` [PATCH 19/29] Detect level change Adam Kwolek
2010-12-09 15:21 ` [PATCH 20/29] Migration raid0->raid5 Adam Kwolek
2010-12-09 15:21 ` [PATCH 21/29] Read chunk size and layout from mdstat Adam Kwolek
2010-12-09 15:21 ` [PATCH 22/29] FIX: mdstat doesn't read chunk size correctly Adam Kwolek
2010-12-09 15:21 ` [PATCH 23/29] Migration: Chunk size migration Adam Kwolek
2010-12-09 15:21 ` [PATCH 24/29] Add takeover support for external meta Adam Kwolek
2010-12-09 15:22 ` [PATCH 25/29] Takeover raid10 -> raid0 for external metadata Adam Kwolek
2010-12-09 15:22 ` [PATCH 26/29] Takeover raid0 -> raid10 " Adam Kwolek
2010-12-09 15:22 ` [PATCH 27/29] FIX: Problem with removing array after takeover Adam Kwolek
2010-12-09 15:22 ` [PATCH 28/29] IMSM compatibility for raid0 -> raid10 takeover Adam Kwolek
2010-12-09 15:22 ` [PATCH 29/29] Add spares to raid0 in mdadm Adam Kwolek
2010-12-16 11:20 ` [PATCH 00/29] OLCE, migrations and raid10 takeover Neil Brown
2010-12-20  8:27   ` Wojcik, Krzysztof
2010-12-20 21:59     ` Neil Brown
2010-12-21 12:37     ` Neil Brown
2010-12-27 14:56       ` Kwolek, Adam
2010-12-28  2:24         ` Neil Brown
2010-12-28  8:49           ` Kwolek, Adam
2010-12-29 10:34             ` Neil Brown
2010-12-29 12:29               ` Kwolek, Adam
