* [PATCH 00/27] OLCE, migrations and raid10 takeover
@ 2010-12-06 13:20 Adam Kwolek
  2010-12-06 13:20 ` [PATCH 01/27] FIX: wait_backup() sometimes hangs Adam Kwolek
                   ` (27 more replies)
  0 siblings, 28 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

This series for mdadm introduces the following features (after some rework):
- Online Capacity Expansion (OLCE): patches 0002 to 0016
- Migrations: patches 0017 to 0023
    1. raid0 to raid5: patches 0017, 0018
    2. raid5 to raid0: patches 0019, 0020
    3. chunk size migration: patches 0020, 0021
- Takeover: patches 0024 to 0027

Next steps:
- Adding spares to raid0 for IMSM will be rewritten by Krzysztof Wojcik in the next few days.
- I will correct checkpointing so that it works without the md fix for moving suspend_hi.


Online Capacity Expansion for raid0 and raid5 arrays implements the following algorithm for container reshape:
1.      mdadm: freezes the container
2.      mdadm: performs takeover to raid5 for all raid0 arrays in the container (for IMSM, raid0 <-> raid5 takeover requires no metadata updates)
3.      mdadm: sets the raid_disks sysfs entry for all arrays in the container
4.      mdadm: prepares and sends a metadata update using the reshape_super() vector for the first array in the container
5.      mdadm: waits for the array to reach the idle or reshape state
6.      managemon: prepare_update(): allocates memory for the bigger device object
7.      monitor: process_update(): applies the update and relinks memory for device objects. Sets the reshape_delta_disks variable in the active array to the requested number of new disks
8.      monitor: kicks managemon when reshape_delta_disks holds a value other than RESHAPE_NOT_ACTIVE and RESHAPE_IN_PROGRESS
9.      managemon: adds devices to md (lets md set slot numbers on reshape start)
10.     managemon: sets sync_max to 0
11.     managemon: starts reshape in md
12.     managemon: on success, sends a slot verification message to monitor to update slots
13.     managemon: on failure, sends a reshape cancellation message (sets md to the idle state)
14.     managemon: sets the reshape_delta_disks variable to RESHAPE_IN_PROGRESS to avoid re-entering managemon procedures
15.     monitor:
           a. for the set slot message, verifies and corrects (if necessary) slot information in metadata
           b. for the cancel message, rolls back metadata information and sets the reshape_delta_disks variable to RESHAPE_NOT_ACTIVE
16.     mdadm: on idle array state, exits and unfreezes the array. End
17.     mdadm: on reshape array state, continues with the reshape (it also pings monitor and managemon to make sure that metadata updates hit the disks)
18.     mdadm: verifies the array state, i.e. checks that slots are set correctly
19.     mdadm: calls the child_grow() function
20.     mdadm: waits for the reshape to finish
21.     monitor: on reshape finish, sets the reshape_delta_disks variable to RESHAPE_NOT_ACTIVE
22.     mdadm: sets the array size according to information in metadata
23.     mdadm: for a raid0 array, backward takeover to raid0 is executed
24.     mdadm: checks whether another array in the container requires reshape; if yes, starts again from step 4
25.     mdadm: unfreezes array
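
To make the ordering of the container-level steps easier to follow, here is a minimal sketch that only models *when* each mdadm-side step happens. Every md/mdmon interaction is replaced by a log entry; `struct sketch_array`, `sketch_olce_container()` and the step labels are hypothetical names for illustration, not mdadm code, and steps 5 to 21 are collapsed into a single "grow" entry.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of one volume in an IMSM container. */
struct sketch_array {
	const char *name;
	int level;       /* 0 or 5 */
	int raid_disks;
};

/* Trace of performed steps, so the ordering can be inspected. */
static char olce_trace[1024];

static void log_step(const char *what, const char *name)
{
	size_t len = strlen(olce_trace);

	snprintf(olce_trace + len, sizeof(olce_trace) - len, "%s%s%s ",
		 what, name ? ":" : "", name ? name : "");
}

/* Ordering of steps 1-25; no real md or mdmon calls are made. */
static void sketch_olce_container(struct sketch_array *arr, int n,
				  int new_raid_disks)
{
	int was_raid0[16];
	int i;

	assert(n <= 16);
	log_step("freeze", NULL);                          /* step 1 */
	for (i = 0; i < n; i++) {                          /* step 2 */
		was_raid0[i] = (arr[i].level == 0);
		if (was_raid0[i]) {
			arr[i].level = 5;
			log_step("takeover-to-raid5", arr[i].name);
		}
	}
	log_step("set-raid_disks", NULL);                  /* step 3 */
	for (i = 0; i < n; i++) {                          /* steps 4-24, one array at a time */
		log_step("reshape_super", arr[i].name);    /* step 4 */
		log_step("grow", arr[i].name);             /* steps 5-21 */
		arr[i].raid_disks = new_raid_disks;
		log_step("set-array-size", arr[i].name);   /* step 22 */
		if (was_raid0[i]) {                        /* step 23 */
			arr[i].level = 0;
			log_step("takeover-to-raid0", arr[i].name);
		}
	}
	log_step("unfreeze", NULL);                        /* step 25 */
}
```

The point the sketch tries to capture is that the raid0 -> raid5 takeover is done for all volumes up front, while the backward takeover happens per volume right after that volume's reshape has finished.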

The migration feature reuses the code flow introduced for OLCE (Online Capacity Expansion) and the same grow/reshape flow in mdadm/mdmon.
Migration works in the following way:
1. mdadm: reshape_super() prepares metadata update and sends it to mdmon
2. mdadm: waits for the array to enter the reshape state
3. monitor: receives the metadata update and applies it.
4. monitor: the metadata update triggers managemon.
5. managemon: updates the array (md) configuration and starts the reshape
6. mdadm: finds that the reshape has started and continues it using checkpointing
7. mdadm: when the reshape is finished, manage_reshape() finalizes the array:
    - Sets the array size as given in metadata
    - Performs takeover to raid0 if necessary
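
The array size that finalization applies is derived from the per-member size and the number of data-bearing members, rounded down to a whole MiB (compare the array_blocks computation in patch 03). The sketch below assumes 512-byte sectors, so 1 MiB is 2048 sectors (a shift of 11); the helper names are hypothetical, and the shift is written out here rather than taken from super-intel.c. It also shows why an n-disk raid0 and an (n+1)-disk raid5 built from the same member size have the same capacity.

```c
#include <assert.h>

/* Assumption: 512-byte sectors, so 1 MiB = 2048 sectors = 1 << 11. */
#define SKETCH_SECT_PER_MB_SHIFT 11

/* Hypothetical helper: raid5 spends one member's worth of space on parity. */
static int sketch_data_members(int level, int raid_disks)
{
	return level == 5 ? raid_disks - 1 : raid_disks;
}

/* Array size in sectors, rounded down to the closest MiB. */
static unsigned long long sketch_array_sectors(unsigned long long blocks_per_member,
					       int level, int raid_disks)
{
	unsigned long long blocks =
		blocks_per_member * sketch_data_members(level, raid_disks);

	return (blocks >> SKETCH_SECT_PER_MB_SHIFT) << SKETCH_SECT_PER_MB_SHIFT;
}

/* md's array_size sysfs attribute is in KiB, hence the division by two
 * (the patch writes array_blocks/2). */
static unsigned long long sketch_array_size_kib(unsigned long long sectors)
{
	return sectors / 2;
}
```

For example, 4196 sectors per member on a 4-disk raid5 gives 3 * 4196 = 12588 sectors, which rounds down to 12288 sectors (6 MiB), i.e. 6144 KiB.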

In the current patches, the placement of the manage_reshape() call has changed (patch 0019).
It is moved to the end of array processing so that the external-metadata reshape case uses the common code from Grow.c, which does the same
things as for native metadata, so existing code does not have to be duplicated. With the new placement, manage_reshape() has only a few things left to do, which simplifies the code.

Migrations command line:
1. Execute migration raid0->raid5:
    mdadm --grow /dev/md/array_name --level 5 --layout=left-asymmetric

    This converts an n-disk raid0 array into an (n+1)-disk raid5 array.
    The additional disk is taken from the spares pool for the raid5 array.

2. Execute migration raid5->raid0:
    mdadm --grow /dev/md/array_name --level 0

    This converts an n-disk raid5 array into an n-disk raid0 array.

3. Execute chunk size migration:
    mdadm --grow /dev/md/array_name --chunk N

    where N is the new chunk size value.

Online Capacity Expansion command line:
1. Add spares to the container, e.g.: mdadm --add /dev/md/imsm_container_name /dev/sdX
   Spares are also required for raid0. Patch "[PATCH 16] Add spares to raid0 array using takeover" enables this.
2. Execute the reshape, e.g.: mdadm --grow /dev/md/imsm_container_name --raid-devices=requested_raid_disks_number
   Grow is executed for all arrays in the container the command is run on.

The feature is treated as experimental because of Windows compatibility concerns during the reshape process; the code is guarded by the MDADM_EXPERIMENTAL environment variable.


---

Adam Kwolek (27):
      FIX: Problem with removing array after takeover
      Takeover raid0 -> raid10 for external metadata
      Takeover raid10 -> raid0 for external metadata
      Add takeover support for external meta
      Migration: Chunk size migration
      Read chunk size and layout from mdstat
      Migration raid0->raid5
      Detect level change
      Migration: raid5->raid0
      Change manage_reshape() placement
      Enable reshape for subarrays
      FIX: core during getting map
      WORKAROUND: md reports idle state during reshape start
      Add reshape progress updating
      Finalize reshape after adding disks to array
      Control reshape in mdadm
      imsm: Fill delta_disks field in getinfo_super()
      imsm: Do not indicate resync during reshape
      imsm: Do not accept messages sent by mdadm
      imsm: Cancel metadata changes on reshape start failure
      imsm: Verify slots in meta against slot numbers set by md
      Process reshape initialization by managemon
      imsm: Block array state change during reshape
      imsm: Process reshape_update in mdmon
      imsm: Prepare reshape_update in mdadm
      Add state_of_reshape for external metadata
      FIX: wait_backup() sometimes hangs


 Grow.c        |  161 ++--
 managemon.c   |  179 ++++
 mdadm.h       |   40 +
 mdmon.c       |   65 ++
 mdmon.h       |    9 
 mdstat.c      |   11 
 monitor.c     |   37 +
 super-intel.c | 2468 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 sysfs.c       |  169 ++++
 util.c        |  147 +++
 10 files changed, 3218 insertions(+), 68 deletions(-)

-- 
Adam

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 01/27] FIX: wait_backup() sometimes hangs
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
@ 2010-12-06 13:20 ` Adam Kwolek
  2010-12-06 13:21 ` [PATCH 02/27] Add state_of_reshape for external metadata Adam Kwolek
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

mdadm cannot be compiled (just a reminder).

Sometimes wait_backup() misses the transition from the reshape state to the idle state, and mdadm appears to hang.
Add a 1-second timeout to the select() wait.
This allows wait_backup() to exit when the reshape has ended.
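
The underlying pattern is a select() call with a bounded timeout inside a loop that re-checks progress, so a missed wake-up costs at most one second instead of blocking forever. The sketch below is a generic illustration, not the Grow.c code: `sketch_wait_readable()` is a hypothetical helper, and a pipe stands in for the file descriptor mdadm actually watches. Like the patch, it rebuilds the timeval on every call, because on Linux select() updates the timeout with the time not slept.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait until fd is readable or 'seconds' elapse.
 * Returns >0 if fd is ready, 0 on timeout, -1 on error.
 */
static int sketch_wait_readable(int fd, long seconds)
{
	fd_set rfds;
	struct timeval t;

	/* Re-initialise both on every call: select() may modify the
	 * timeout and clears descriptors that are not ready. */
	t.tv_sec = seconds;
	t.tv_usec = 0;
	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	return select(fd + 1, &rfds, NULL, NULL, &t);
}
```

A caller would loop on this helper and, whatever the return value, re-read the progress counter (as wait_backup() does with `completed`) before deciding whether to wait again.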

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 Grow.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/Grow.c b/Grow.c
index 99807b4..ecaeb39 100644
--- a/Grow.c
+++ b/Grow.c
@@ -2065,6 +2065,10 @@ static int wait_backup(struct mdinfo *sra,
 	}
 	while (completed < offset + blocks) {
 		char action[20];
+		struct timeval t;
+
+		t.tv_sec = 1;
+		t.tv_usec = 0;
 		fd_set rfds;
 		FD_ZERO(&rfds);
 		FD_SET(fd, &rfds);



* [PATCH 02/27] Add state_of_reshape for external metadata
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
  2010-12-06 13:20 ` [PATCH 01/27] FIX: wait_backup() sometimes hangs Adam Kwolek
@ 2010-12-06 13:21 ` Adam Kwolek
  2010-12-06 13:21 ` [PATCH 03/27] imsm: Prepare reshape_update in mdadm Adam Kwolek
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

During reshape we need to know the current reshape state:
reshape_not_active: reshape has not started; the array is in a state other than reshape.
reshape_is_starting: reshape is about to start; metadata has probably been updated,
                     and the array in md can already be in the reshape state.
                     In this state mdmon should not allow array rebuilds,
                     as reconfiguration is in progress.
                     When everything goes fine, the next state should be reshape_in_progress.
                     In case of error, reshape_cancel_request should be reached.
reshape_in_progress: md is in the reshape state and the reshape is in progress.
                     When the reshape ends, state_of_reshape returns to reshape_not_active.
reshape_cancel_request: a reshape cancel request has been issued because of an error.
                        In this state, the metadata rollback should occur.
                        From this state, state_of_reshape should go to reshape_not_active.

The reshape_delta_disks field must contain a valid value in the reshape_in_progress state;
it tells how many disks are being added to the array.
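
The states described above form a small state machine. The following sketch encodes only the transitions named in the description, reusing the enumerators this patch adds to mdmon.h. `sketch_reshape_transition_ok()` is a hypothetical helper, not part of mdmon.

```c
#include <assert.h>

/* Same enumerators as the enum added to mdmon.h by this patch. */
enum state_of_reshape { reshape_not_active, reshape_is_starting,
			reshape_in_progress, reshape_cancel_request };

/* Returns 1 if the description permits moving from 'from' to 'to'. */
static int sketch_reshape_transition_ok(enum state_of_reshape from,
					enum state_of_reshape to)
{
	switch (from) {
	case reshape_not_active:
		return to == reshape_is_starting;
	case reshape_is_starting:
		/* success path, or error path via a cancel request */
		return to == reshape_in_progress || to == reshape_cancel_request;
	case reshape_in_progress:      /* reshape finished */
	case reshape_cancel_request:   /* metadata rolled back */
		return to == reshape_not_active;
	}
	return 0;
}
```

Note that the monitor.c hunk below also forces any state other than reshape_in_progress back to reshape_not_active when an array is deactivated. That shortcut is not part of the sketch.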

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c |    2 ++
 mdmon.h     |    4 ++++
 monitor.c   |   12 +++++++++++-
 3 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/managemon.c b/managemon.c
index ebd9b73..945b173 100644
--- a/managemon.c
+++ b/managemon.c
@@ -521,6 +521,8 @@ static void manage_new(struct mdstat_ent *mdstat,
 
 	new->container = container;
 
+	new->reshape_state = reshape_not_active;
+
 	inst = to_subarray(mdstat, container->devname);
 
 	new->info.array = mdi->array;
diff --git a/mdmon.h b/mdmon.h
index 5c51566..b869544 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -23,6 +23,7 @@ enum array_state { clear, inactive, suspended, readonly, read_auto,
 
 enum sync_action { idle, reshape, resync, recover, check, repair, bad_action };
 
+enum state_of_reshape { reshape_not_active, reshape_is_starting, reshape_in_progress, reshape_cancel_request };
 
 struct active_array {
 	struct mdinfo info;
@@ -45,6 +46,9 @@ struct active_array {
 	enum array_state prev_state, curr_state, next_state;
 	enum sync_action prev_action, curr_action, next_action;
 
+	enum state_of_reshape reshape_state;
+	int reshape_delta_disks;
+
 	int check_degraded; /* flag set by mon, read by manage */
 
 	int devnum;
diff --git a/monitor.c b/monitor.c
index 59b4181..986fdb0 100644
--- a/monitor.c
+++ b/monitor.c
@@ -399,8 +399,18 @@ static int read_and_act(struct active_array *a)
 		signal_manager();
 	}
 
-	if (deactivate)
+	if (deactivate) {
 		a->container = NULL;
+		/* break reshape also
+		 */
+		if (a->reshape_state !=  reshape_in_progress)
+			a->reshape_state = reshape_not_active;
+	}
+
+	/* signal manager when reshape is in reshape_is_starting state
+	 */
+	if (a->reshape_state == reshape_is_starting)
+		signal_manager();
 
 	return dirty;
 }



* [PATCH 03/27] imsm: Prepare reshape_update in mdadm
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
  2010-12-06 13:20 ` [PATCH 01/27] FIX: wait_backup() sometimes hangs Adam Kwolek
  2010-12-06 13:21 ` [PATCH 02/27] Add state_of_reshape for external metadata Adam Kwolek
@ 2010-12-06 13:21 ` Adam Kwolek
  2010-12-08  3:10   ` Neil Brown
  2010-12-06 13:21 ` [PATCH 04/27] imsm: Process reshape_update in mdmon Adam Kwolek
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

During Online Capacity Expansion, metadata has to be updated to reflect the array changes
and to allow future assembly of the array.
To do this, mdadm prepares a reshape_update metadata update and sends it to mdmon.
The update is sent for one array in the container. It contains the updated device
and the spares that have to be turned into array members.
For spares there are two cases:
1. For the first array in the container:
   reshape_delta_disks: shows how many disks will be added to the array.
   The spares are sent in the update, so the spares_in_update field of the metadata update tells mdmon that it has to turn the spares into
   array members (in the IMSM sense of "array").
2. For the second array in the container:
   reshape_delta_disks: shows how many disks will be added to the array, exactly as in the first case.
   The spares were already turned into array members (they are no longer spares), so for this volume
   we only have to reuse those disks.

This update changes the active array's state to reshape_is_starting.
This works in the following way:
1. reshape_super() prepares the metadata update and sends it to mdmon.
2. managemon, in prepare_update(), allocates the memory required for the bigger device object.
3. monitor, in process_update(), updates (replaces) the device object with the information
   passed from mdadm (the memory was allocated by managemon).
4. process_update() performs the following:
   - sets the reshape_delta_disks variable to the reshape_delta_disks value from the update
   - puts the array into the reshape_is_starting state.
5. This signals managemon to add the devices to md, start the reshape for this array
   and put the array into reshape_in_progress.
   Managemon can request the reshape_cancel_request state in case of error.
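
Steps 2 and 3 split the work so that allocation happens on the managemon side and the monitor only fills in and relinks memory it is handed. Presumably this keeps the monitor thread from ever having to allocate. A minimal sketch of that pattern follows. `struct sketch_dev`, `struct sketch_update` and both functions are hypothetical stand-ins, not the imsm structures, and handing the old object back through the update for later freeing is a design choice of the sketch only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the real mdmon structures. */
struct sketch_dev {
	int num_members;
};

struct sketch_update {
	int new_members;
	struct sketch_dev *prealloc;   /* filled by the manager side */
};

/* Manager side (allowed to allocate): reserve the bigger object. */
static int sketch_prepare_update(struct sketch_update *u)
{
	u->prealloc = calloc(1, sizeof(*u->prealloc));
	return u->prealloc != NULL;
}

/* Monitor side (no allocation): fill in the preallocated object, swap it
 * in, and leave the old object in the update for the manager to free. */
static void sketch_process_update(struct sketch_dev **live,
				  struct sketch_update *u)
{
	struct sketch_dev *old = *live;

	u->prealloc->num_members = u->new_members;
	*live = u->prealloc;
	u->prealloc = old;
}
```

If sketch_prepare_update() fails, the update simply is not applied, which is cheaper than discovering an allocation failure on the monitor side halfway through the swap.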

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 mdadm.h       |    3 
 mdmon.c       |   13 +
 super-intel.c |  669 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 sysfs.c       |  169 ++++++++++++++
 util.c        |  147 +++++++++++++
 5 files changed, 1001 insertions(+), 0 deletions(-)

diff --git a/mdadm.h b/mdadm.h
index 175d228..423f62c 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -494,6 +494,7 @@ extern int reshape_open_backup_file(char *backup,
 				    unsigned long long *offsets);
 extern unsigned long compute_backup_blocks(int nchunk, int ochunk,
 					   unsigned int ndata, unsigned int odata);
+extern struct mdinfo *sysfs_get_unused_spares(int container_fd, int fd);
 
 extern int save_stripes(int *source, unsigned long long *offsets,
 			int raid_disks, int chunk_size, int level, int layout,
@@ -1060,6 +1061,8 @@ extern int conf_name_is_free(char *name);
 extern int devname_matches(char *name, char *match);
 extern struct mddev_ident *conf_match(struct mdinfo *info, struct supertype *st);
 extern int experimental(void);
+extern int find_array_minor(char *text_version, int external, int container, int *minor);
+extern int find_array_minor2(char *text_version, int external, int container, int *minor);
 
 extern void free_line(char *line);
 extern int match_oneof(char *devices, char *devname);
diff --git a/mdmon.c b/mdmon.c
index f56e57f..413ee29 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -517,3 +517,16 @@ static int mdmon(char *devname, int devnum, int must_fork, int takeover)
 
 	exit(0);
 }
+
+/* Below there are some dummy functions needed for compilation but not used by mdmon
+ */
+
+void map_read(struct map_ent **melp)
+{
+	*melp = NULL;
+}
+
+void map_free(struct map_ent *map)
+{
+}
+
diff --git a/super-intel.c b/super-intel.c
index 2943898..ae2f567 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -285,6 +285,7 @@ enum imsm_update_type {
 	update_kill_array,
 	update_rename_array,
 	update_add_disk,
+	update_reshape,
 };
 
 struct imsm_update_activate_spare {
@@ -295,6 +296,43 @@ struct imsm_update_activate_spare {
 	struct imsm_update_activate_spare *next;
 };
 
+struct geo_params {
+	int dev_id;
+	char *dev_name;
+	long long size;
+	int level;
+	int layout;
+	int chunksize;
+	int raid_disks;
+};
+
+
+struct imsm_update_reshape {
+	enum imsm_update_type type;
+	int update_memory_size;
+	int reshape_delta_disks;
+	int disks_count;
+	int spares_in_update;
+	int devnum;
+	/* pointers to memory that will be allocated
+	 * by manager during prepare_update()
+	 */
+	struct intel_dev devs_mem;
+	/* status of update preparation
+	 */
+	int update_prepared;
+	/* anchor data prepared by mdadm */
+	int upd_devs_offset;
+	int device_size;
+	struct dl upd_disks[1];
+	/* here goes added spares
+	 */
+	/* and here goes imsm_devs pointed by upd_devs
+	 * devs are put here as row data every device_size bytes
+	 *
+	 */
+};
+
 struct disk_info {
 	__u8 serial[MAX_RAID_SERIAL_LEN];
 };
@@ -5236,6 +5274,7 @@ static int disks_overlap(struct intel_super *super, int idx, struct imsm_update_
 }
 
 static void imsm_delete(struct intel_super *super, struct dl **dlp, unsigned index);
+int imsm_get_new_device_name(struct dl *dl);
 
 static void imsm_process_update(struct supertype *st,
 			        struct metadata_update *update)
@@ -5271,6 +5310,9 @@ static void imsm_process_update(struct supertype *st,
 	mpb = super->anchor;
 
 	switch (type) {
+	case update_reshape: {
+		break;
+	}
 	case update_activate_spare: {
 		struct imsm_update_activate_spare *u = (void *) update->buf; 
 		struct imsm_dev *dev = get_imsm_dev(super, u->array);
@@ -5590,6 +5632,9 @@ static void imsm_prepare_update(struct supertype *st,
 	size_t len = 0;
 
 	switch (type) {
+	case update_reshape: {
+		break;
+	}
 	case update_create_array: {
 		struct imsm_update_create_array *u = (void *) update->buf;
 		struct intel_dev *dv;
@@ -5743,6 +5788,629 @@ static const char *imsm_get_disk_controller_domain(const char *path)
 		return NULL;
 }
 
+int imsm_reshape_is_allowed_on_container(struct supertype *st,
+					 struct geo_params *geo)
+{
+	int ret_val = 0;
+	struct mdinfo *info = NULL;
+	char buf[PATH_MAX];
+	int fd = -1;
+	int device_num = -1;
+	int devices_that_can_grow = 0;
+
+	dprintf("imsm: imsm_reshape_is_allowed_on_container(ENTER): st->devnum = (%i)\n", st->devnum);
+
+	if (geo == NULL ||
+	    (geo->size != -1) || (geo->level != UnSet) ||
+	    (geo->layout != UnSet) || (geo->chunksize != 0)) {
+		dprintf("imsm: Container operation is allowed for raid disks number change only.\n");
+		return ret_val;
+	}
+
+	snprintf(buf, PATH_MAX, "/dev/md%i", st->devnum);
+	dprintf("imsm: open device (%s)\n", buf);
+	fd = open(buf , O_RDONLY | O_DIRECT);
+	if (fd < 0) {
+		dprintf("imsm: cannot open device\n");
+		return ret_val;
+	}
+
+	if (geo->raid_disks == UnSet) {
+		dprintf("imsm: for container operation raid disks change is required\n");
+		goto exit_imsm_reshape_is_allowed_on_container;
+	}
+
+	device_num = 0; /* start from first device (skip container info) */
+	while (device_num > -1) {
+		int result;
+		int minor;
+		unsigned long long array_blocks;
+		struct imsm_map *map = NULL;
+		struct imsm_dev *dev = NULL;
+		struct intel_super *super = NULL;
+		int used_disks;
+
+
+		dprintf("imsm: checking device_num: %i\n", device_num);
+		super = st->sb;
+		super->current_vol = device_num;
+		st->ss->load_super(st, fd, NULL);
+		if (st->sb == NULL) {
+			if (device_num == 0) {
+				/* for the first checked device this is error
+				   there should be at least one device to check
+				 */
+				dprintf("imsm: error: superblock is NULL during container operation\n");
+			} else {
+				dprintf("imsm: no more devices to check, number of forund devices: %i\n",
+					devices_that_can_grow);
+				/* check if any device in container can be groved
+				 */
+				if (devices_that_can_grow)
+					ret_val = 1;
+			}
+			break;
+		}
+		info = sysfs_read(fd, 0, GET_LEVEL|GET_VERSION|GET_DEVS|GET_STATE);
+		if (info == NULL) {
+			dprintf("imsm: Cannot get device info.\n");
+			break;
+		}
+		super = st->sb;
+		super->current_vol = device_num;
+		st->ss->getinfo_super(st, info, NULL);
+		if ((info->name == NULL) ||
+		    (strlen(info->name) == 0)) {
+			/* no more to load
+			 */
+			dprintf("imsm: no more devices to check, number of forund devices: %i\n",
+				devices_that_can_grow);
+			/* check if any device in container can be groved
+				 */
+			if (devices_that_can_grow)
+				ret_val = 1;
+			break;
+		}
+
+		if (geo->raid_disks < info->array.raid_disks) {
+			/* we work on container for Online Capacity Expansion
+			 * only so raid_disks has to grow
+			 */
+			dprintf("imsm: for container operation raid disks increase is required\n");
+			break;
+		}
+		/* check if size is set corectly
+		 * wrong conditions could happend when previous reshape wes interrupted
+		 */
+		super = st->sb;
+		dev = get_imsm_dev(super, device_num);
+		if (dev == NULL) {
+			dprintf("cannot get imsm device\n");
+			ret_val = 0;
+			break;
+		}
+		map = get_imsm_map(dev, 0);
+		if (dev == NULL) {
+			dprintf("cannot get imsm device map\n");
+			ret_val = 0;
+			break;
+		}
+		used_disks = imsm_num_data_members(dev);
+		dprintf("read raid_disks = %i\n", used_disks);
+		dprintf("read requested disks = %i\n", geo->raid_disks);
+		array_blocks = map->blocks_per_member * used_disks;
+		/* round array size down to closest MB
+		 */
+		array_blocks = (array_blocks >> SECT_PER_MB_SHIFT) << SECT_PER_MB_SHIFT;
+		if (sysfs_set_num(info, NULL, "array_size", array_blocks/2) < 0)
+			dprintf("cannot set array size to %llu\n", array_blocks/2);
+
+		if (geo->raid_disks > info->array.raid_disks)
+			devices_that_can_grow++;
+
+		if ((info->array.level != 0) &&
+		    (info->array.level != 5)) {
+			/* we cannot use this container other raid level
+			 */
+			dprintf("imsm: for container operation wrong raid level (%i) detected\n", info->array.level);
+			break;
+		} else {
+			/* check for platform support for this raid level configuration
+			 */
+			struct intel_super *super = st->sb;
+			if (!is_raid_level_supported(super->orom,  info->array.level, geo->raid_disks)) {
+				dprintf("platform does not support raid%d with %d disk%s\n",
+					 info->array.level, geo->raid_disks, geo->raid_disks > 1 ? "s" : "");
+				break;
+			}
+		}
+
+		/* all raid5 and raid0 volumes in container
+		 * has to be ready for Online Capacity Expansion
+		 */
+		result = find_array_minor2(info->text_version, st->ss->external, st->devnum, &minor);
+		if (result < 0) {
+			dprintf("imsm: cannot find array\n");
+			break;
+		}
+		sprintf(info->sys_name, "md%i", minor);
+		if (sysfs_get_str(info, NULL, "array_state", buf, 20) <= 0) {
+			dprintf("imsm: cannot read array state\n");
+			break;
+		}
+		if ((strncmp(buf, "clean", 5) != 0) &&
+		    (strncmp(buf, "clear", 5) != 0) &&
+		    (strncmp(buf, "active", 6) != 0)) {
+			int index = strlen(buf) - 1;
+
+			if (index < 0)
+				index = 0;
+			*(buf + index) = 0;
+			fprintf(stderr, "imsm: Error: Array %s is not in proper state (current state: %s). Cannot continue.\n", info->sys_name, buf);
+			break;
+		}
+		if (info->array.level > 0) {
+			if (sysfs_get_str(info, NULL, "sync_action", buf, 20) <= 0) {
+				dprintf("imsm: for container operation no sync action\n");
+				break;
+			}
+			/* check if any reshape is not in progress
+			 */
+			if (strncmp(buf, "reshape", 7) == 0) {
+				dprintf("imsm: for container operation reshape is currently in progress\n");
+				break;
+			}
+		}
+		sysfs_free(info);
+		info = NULL;
+		device_num++;
+	}
+	sysfs_free(info);
+	info = NULL;
+
+exit_imsm_reshape_is_allowed_on_container:
+	if (fd >= 0)
+		close(fd);
+
+	dprintf("imsm: imsm_reshape_is_allowed_on_container(Exit) device_num = %i, ret_val = %i\n", device_num, ret_val);
+	if (ret_val)
+		dprintf("\tContainer operation allowed\n");
+	else
+		dprintf("\tError: %i\n", ret_val);
+
+	return ret_val;
+}
+
+struct mdinfo *get_spares_imsm(int devnum)
+{
+	int fd = -1;
+	char buf[PATH_MAX];
+	struct mdinfo *info = NULL;
+	struct mdinfo *ret_val = NULL;
+	int cont_id = -1;
+	struct supertype *st = NULL;
+	int find_result;
+	struct intel_super *super = NULL;
+
+	dprintf("imsm: get_spares_imsm for device: %i.\n", devnum);
+
+	sprintf(buf, "/dev/md%i", devnum);
+	dprintf("try to read container %s\n", buf);
+
+	cont_id = open(buf, O_RDONLY);
+	if (cont_id < 0) {
+		dprintf("imsm: ERROR: Cannot open container.\n");
+		goto abort;
+	}
+
+	/* get first volume */
+	st = super_by_fd(cont_id, NULL);
+	if (st == NULL) {
+		dprintf("imsm: ERROR: Cannot load container information.\n");
+		goto abort;
+	}
+	sprintf(buf, "/md%i/0", devnum);
+	find_result = find_array_minor2(buf, 1, devnum, &devnum);
+	if (find_result < 0) {
+		dprintf("imsm: ERROR: Cannot find array.\n");
+		goto abort;
+	}
+	sprintf(buf, "/dev/md%i", devnum);
+	fd = open(buf, O_RDONLY);
+	if (fd < 0) {
+		dprintf("imsm: ERROR: Cannot open device.\n");
+		goto abort;
+	}
+	st->ss->load_super(st, cont_id, NULL);
+	if (st->sb == NULL) {
+		dprintf("imsm: ERROR: Cannot load array information.\n");
+		goto abort;
+	}
+	info = sysfs_read(fd, 0, GET_LEVEL | GET_VERSION | GET_DEVS | GET_STATE);
+	if (info == NULL) {
+		dprintf("imsm: Cannot get device info.\n");
+		goto abort;
+	}
+	super = st->sb;
+	super->current_vol = 0;
+	st->ss->getinfo_super(st, info, NULL);
+	sprintf(buf, "/dev/md/%s", info->name);
+	ret_val = sysfs_get_unused_spares(cont_id, fd);
+	if (ret_val == NULL) {
+		dprintf("imsm: ERROR: Cannot get spare devices.\n");
+		goto abort;
+	}
+	if (ret_val->array.spare_disks == 0) {
+		dprintf("imsm: ERROR: No available spares.\n");
+		free(ret_val);
+		ret_val = NULL;
+		goto abort;
+	}
+
+abort:
+	if (st)
+		st->ss->free_super(st);
+	sysfs_free(info);
+	if (fd > -1)
+		close(fd);
+	if (cont_id > -1)
+		close(cont_id);
+
+	return ret_val;
+}
+
+/******************************************************************************
+ * function: imsm_create_metadata_update_for_reshape
+ * Function creates update for whole IMSM container.
+ * Slot number for new devices are guesed only. Managemon will correct them
+ * when reshape will be triggered and md sets slot numbers.
+ * Slot numbers in metadata will be updated with stage_2 update
+ ******************************************************************************/
+struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(struct supertype *st, struct geo_params *geo)
+{
+	struct imsm_update_reshape *ret_val = NULL;
+	struct intel_super *super = st->sb;
+	int update_memory_size = 0;
+	struct imsm_update_reshape *u = NULL;
+	struct imsm_map *new_map = NULL;
+	struct mdinfo *spares = NULL;
+	int i;
+	unsigned long long array_blocks;
+	int used_disks;
+	int delta_disks = 0;
+	struct dl *new_disks;
+	int device_size;
+	void *upd_devs;
+
+	dprintf("imsm imsm_update_metadata_for_reshape(enter) raid_disks = %i\n", geo->raid_disks);
+
+	if ((geo->raid_disks < super->anchor->num_disks) ||
+	    (geo->raid_disks == UnSet))
+		geo->raid_disks = super->anchor->num_disks;
+	delta_disks = geo->raid_disks - super->anchor->num_disks;
+
+	/* size of all update data without anchor */
+	update_memory_size = sizeof(struct imsm_update_reshape);
+	/* add space for all devices,
+	 * then add maps space
+	 */
+	device_size = sizeof(struct imsm_dev);
+	device_size += sizeof(struct imsm_map);
+	device_size += 2 * (geo->raid_disks - 1) * sizeof(__u32);
+
+	update_memory_size += device_size * super->anchor->num_raid_devs;
+	if (delta_disks > 1) {
+		/* now add space for spare disks information
+		 */
+		update_memory_size += sizeof(struct dl) * (delta_disks - 1);
+	}
+
+	u = calloc(1, update_memory_size);
+	if (u == NULL) {
+		dprintf("error: cannot get memory for imsm_update_reshape update\n");
+		return ret_val;
+	}
+	u->reshape_delta_disks = delta_disks;
+	u->update_prepared = -1;
+	u->update_memory_size = update_memory_size;
+	u->type = update_reshape;
+	u->spares_in_update = 0;
+	u->upd_devs_offset =  sizeof(struct imsm_update_reshape) + sizeof(struct dl) * (delta_disks - 1);
+	upd_devs = (struct imsm_dev *)((void *)u + u->upd_devs_offset);
+	u->device_size = device_size;
+
+	for (i = 0; i < super->anchor->num_raid_devs; i++) {
+		struct imsm_dev *old_dev = __get_imsm_dev(super->anchor, i);
+		int old_disk_number;
+		int devnum = -1;
+
+		u->devnum = -1;
+		if (old_dev == NULL)
+			break;
+
+		find_array_minor((char *)old_dev->volume, 1, st->devnum, &devnum);
+		if (devnum == geo->dev_id) {
+			__u8 to_state;
+			struct imsm_map *new_map2;
+			int idx;
+
+			new_map = NULL;
+			imsm_copy_dev(upd_devs, old_dev);
+			new_map = get_imsm_map(upd_devs, 0);
+			old_disk_number = new_map->num_members;
+			new_map->num_members = geo->raid_disks;
+			u->reshape_delta_disks = new_map->num_members - old_disk_number;
+			/* start migration on new device
+			 * it puts second map there also
+			 */
+
+			to_state = imsm_check_degraded(super, old_dev, 0);
+			migrate(upd_devs, to_state, MIGR_GEN_MIGR);
+			/* second map length is equal to first map
+			* correct second map length to old value
+			*/
+			new_map2 = get_imsm_map(upd_devs, 1);
+			if (new_map2) {
+				if (new_map2->num_members != old_disk_number) {
+					new_map2->num_members = old_disk_number;
+					/* guess new disk indexes
+					*/
+					for (idx = new_map2->num_members; idx < new_map->num_members; idx++)
+						set_imsm_ord_tbl_ent(new_map, idx, idx);
+				}
+				u->devnum = geo->dev_id;
+				break;
+			}
+		}
+	}
+
+	if (delta_disks <= 0) {
+		dprintf("imsm: reshape without grow (disk add).\n");
+		/* finalize update */
+		goto calculate_size_only;
+	}
+
+	/* now get spare disks list
+	 */
+	spares = get_spares_imsm(st->container_dev);
+
+	if (spares == NULL) {
+		dprintf("imsm: ERROR: Cannot get spare devices.\n");
+		goto exit_imsm_create_metadata_update_for_reshape;
+	}
+	if ((spares->array.spare_disks == 0) ||
+	(u->reshape_delta_disks > spares->array.spare_disks)) {
+		dprintf("imsm: ERROR: No available spares.\n");
+		goto exit_imsm_create_metadata_update_for_reshape;
+	}
+	/* we have got spares
+	 * update disk list in imsm_disk list table in anchor
+	 */
+	dprintf("imsm: %i spares are available.\n\n", spares->array.spare_disks);
+	new_disks = u->upd_disks;
+	for (i = 0; i < u->reshape_delta_disks; i++) {
+		struct mdinfo *dev = spares->devs;
+		__u32 id;
+		int fd;
+		char buf[PATH_MAX];
+		int rv;
+		unsigned long long size;
+
+		sprintf(buf, "%d:%d", dev->disk.major, dev->disk.minor);
+		dprintf("open spare disk %s (%s)\n", buf, dev->sys_name);
+		fd = dev_open(buf, O_RDWR);
+		if (fd < 0) {
+			dprintf("\topen failed\n");
+			goto exit_imsm_create_metadata_update_for_reshape;
+		}
+		if (sysfs_disk_to_scsi_id(fd, &id) == 0)
+			new_disks[i].disk.scsi_id = __cpu_to_le32(id);
+		else
+			new_disks[i].disk.scsi_id = __cpu_to_le32(0);
+		new_disks[i].disk.status = CONFIGURED_DISK;
+		rv = imsm_read_serial(fd, NULL, new_disks[i].disk.serial);
+		if (rv != 0) {
+			dprintf("\tcannot read disk serial\n");
+			close(fd);
+			goto exit_imsm_create_metadata_update_for_reshape;
+		}
+		dprintf("\tdisk serial: %s\n", new_disks[i].disk.serial);
+		get_dev_size(fd, NULL, &size);
+		size /= 512;
+		new_disks[i].disk.total_blocks = __cpu_to_le32(size);
+		new_disks[i].disk.owner_cfg_num = super->anchor->disk->owner_cfg_num;
+
+		new_disks[i].major = dev->disk.major;
+		new_disks[i].minor = dev->disk.minor;
+		/* no relink in update
+		 * use table access
+		 */
+		new_disks[i].next = NULL;
+
+		close(fd);
+		spares->devs = dev->next;
+		u->spares_in_update++;
+
+		free(dev);
+		dprintf("\n");
+	}
+calculate_size_only:
+	/* calculate new size
+	 */
+	if (new_map != NULL) {
+
+		used_disks = imsm_num_data_members(upd_devs);
+		if (used_disks) {
+			array_blocks = new_map->blocks_per_member * used_disks;
+			/* round array size down to closest MB
+			 */
+			array_blocks = (array_blocks >> SECT_PER_MB_SHIFT) << SECT_PER_MB_SHIFT;
+			((struct imsm_dev *)(upd_devs))->size_low = __cpu_to_le32((__u32)array_blocks);
+			((struct imsm_dev *)(upd_devs))->size_high = __cpu_to_le32((__u32)(array_blocks >> 32));
+			/* finalize update */
+			ret_val = u;
+		}
+	}
+
+exit_imsm_create_metadata_update_for_reshape:
+	/* free spares
+	 */
+	if (spares) {
+		while (spares->devs) {
+			struct mdinfo *dev = spares->devs;
+			spares->devs = dev->next;
+			free(dev);
+		}
+		free(spares);
+	}
+
+	if (ret_val == NULL)
+		free(u);
+
+	return ret_val;
+}
+
+char *get_volume_for_olce(struct supertype *st, int raid_disks)
+{
+	char *ret_val = NULL;
+	struct mdinfo *sra = NULL;
+	struct mdinfo info;
+	char *ret_buf;
+	struct intel_super *super = st->sb;
+	int i;
+	int fd = -1;
+	char buf[PATH_MAX];
+
+	snprintf(buf, PATH_MAX, "/dev/md%i", st->devnum);
+	dprintf("imsm: open device (%s)\n", buf);
+	fd = open(buf, O_RDONLY | O_DIRECT);
+	if (fd < 0) {
+		dprintf("imsm: cannot open device\n");
+		return ret_val;
+	}
+
+	ret_buf = malloc(PATH_MAX);
+	if (ret_buf == NULL)
+		goto exit_get_volume_for_olce;
+
+	for (i = 0; i < super->anchor->num_raid_devs; i++) {
+		/* reload the metadata and select volume i; track the
+		 * freshly loaded superblock in the outer 'super' instead
+		 * of shadowing it, so the loop condition reads current data
+		 */
+		st->ss->load_super(st, fd, NULL);
+		if (st->sb == NULL)
+			goto exit_get_volume_for_olce;
+		info.devs = NULL;
+		super = st->sb;
+		super->current_vol = i;
+		st->ss->getinfo_super(st, &info, NULL);
+
+		if (raid_disks > info.array.raid_disks) {
+			snprintf(ret_buf, PATH_MAX,
+				 "%s",  info.name);
+			dprintf("Found device for OLCE requested raid_disks = %i, array raid_disks = %i\n",
+				raid_disks, info.array.raid_disks);
+			ret_val = ret_buf;
+			break;
+		}
+	}
+
+exit_get_volume_for_olce:
+	if ((ret_val == NULL) && ret_buf)
+		free(ret_buf);
+	sysfs_free(sra);
+	if (fd > -1)
+		close(fd);
+
+	return ret_val;
+}
+
+int imsm_reshape_super(struct supertype *st, long long size, int level,
+		       int layout, int chunksize, int raid_disks,
+		       char *backup, char *dev, int verbose)
+{
+	int ret_val = 1;
+	struct geo_params geo;
+
+	dprintf("imsm: reshape_super called.\n");
+
+	memset(&geo, 0, sizeof(struct geo_params));
+
+	geo.dev_name = dev;
+	geo.size = size;
+	geo.level = level;
+	geo.layout = layout;
+	geo.chunksize = chunksize;
+	geo.raid_disks = raid_disks;
+
+	dprintf("\tfor level      : %i\n", geo.level);
+	dprintf("\tfor raid_disks : %i\n", geo.raid_disks);
+
+	if (experimental() == 0)
+		return ret_val;
+
+	/* verify reshape conditions
+	 * on container level we can do almost everything */
+	if (st->container_dev == st->devnum) {
+		/* check for delta_disks > 0 and supported raid levels 0 and 5 only in container */
+		if (imsm_reshape_is_allowed_on_container(st, &geo)) {
+			struct imsm_update_reshape *u;
+			char *array;
+
+			array = get_volume_for_olce(st, geo.raid_disks);
+			if (array) {
+				find_array_minor(array, 1, st->devnum, &geo.dev_id);
+				if (geo.dev_id > 0) {
+					dprintf("imsm: Preparing metadata update for: %s\n", array);
+
+					st->update_tail = &st->updates;
+					u = imsm_create_metadata_update_for_reshape(st, &geo);
+
+					if (u) {
+						ret_val = 0;
+						append_metadata_update(st, u, u->update_memory_size);
+					} else
+						dprintf("imsm: Cannot prepare update\n");
+				} else
+					dprintf("imsm: Cannot find array in container\n");
+				free(array);
+			}
+		} else
+			dprintf("imsm: Operation is not allowed on container\n");
+	} else
+		dprintf("imsm: not a container operation\n");
+
+	dprintf("imsm: reshape_super Exit code = %i\n", ret_val);
+	return ret_val;
+}
+
+int imsm_get_new_device_name(struct dl *dl)
+{
+	int rv;
+	char dv[PATH_MAX];
+	char nm[PATH_MAX];
+	char *dname;
+
+	if (dl->devname != NULL)
+		return 0;
+
+	sprintf(dv, "/sys/dev/block/%d:%d", dl->major, dl->minor);
+	memset(nm, 0, sizeof(nm));
+	/* leave room for the terminating NUL added below */
+	rv = readlink(dv, nm, sizeof(nm) - 1);
+	if (rv > 0) {
+		nm[rv] = '\0';
+		dname = strrchr(nm, '/');
+		if (dname) {
+			char buf[PATH_MAX];
+
+			dname++;
+			sprintf(buf, "/dev/%s", dname);
+			dl->devname = strdup(buf);
+		}
+	}
+
+	return rv;
+}
 
 struct superswitch super_imsm = {
 #ifndef	MDASSEMBLE
@@ -5779,6 +6447,7 @@ struct superswitch super_imsm = {
 	.container_content = container_content_imsm,
 	.default_geometry = default_geometry_imsm,
 	.get_disk_controller_domain = imsm_get_disk_controller_domain,
+	.reshape_super = imsm_reshape_super,
 
 	.external	= 1,
 	.name = "imsm",
diff --git a/sysfs.c b/sysfs.c
index 7a0403d..366d2db 100644
--- a/sysfs.c
+++ b/sysfs.c
@@ -801,6 +801,175 @@ int sysfs_unique_holder(int devnum, long rdev)
 		return found;
 }
 
+int sysfs_is_spare_device_belongs_to(int fd, char *devname)
+{
+	int ret_val = -1;
+	char fname[PATH_MAX];
+	char *base;
+	char *dbase;
+	struct mdinfo *sra;
+	DIR *dir = NULL;
+	struct dirent *de;
+
+	sra = malloc(sizeof(*sra));
+	if (sra == NULL)
+		goto abort;
+	memset(sra, 0, sizeof(*sra));
+	sysfs_init(sra, fd, -1);
+	if (sra->sys_name[0] == 0)
+		goto abort;
+
+	memset(fname, 0, PATH_MAX);
+	sprintf(fname, "/sys/block/%s/md/", sra->sys_name);
+	base = fname + strlen(fname);
+
+	/* Get all the devices as well */
+	*base = 0;
+	dir = opendir(fname);
+	if (!dir)
+		goto abort;
+	while ((de = readdir(dir)) != NULL) {
+		if (de->d_ino == 0 ||
+		    strncmp(de->d_name, "dev-", 4) != 0)
+			continue;
+		strcpy(base, de->d_name);
+		dbase = base + strlen(base);
+		*dbase = '\0';
+		dbase = strstr(fname, "/md/");
+		if (dbase && strcmp(devname, dbase) == 0) {
+			ret_val = 1;
+			goto abort;
+		}
+	}
+abort:
+	if (dir)
+		closedir(dir);
+	sysfs_free(sra);
+
+	return ret_val;
+}
+
+struct mdinfo *sysfs_get_unused_spares(int container_fd, int fd)
+{
+	char fname[PATH_MAX];
+	char buf[PATH_MAX];
+	char *base;
+	char *dbase;
+	struct mdinfo *ret_val;
+	struct mdinfo *dev;
+	DIR *dir = NULL;
+	struct dirent *de;
+	int is_in;
+	char *to_check;
+
+	ret_val = malloc(sizeof(*ret_val));
+	if (ret_val == NULL)
+		goto abort;
+	memset(ret_val, 0, sizeof(*ret_val));
+	sysfs_init(ret_val, container_fd, -1);
+	if (ret_val->sys_name[0] == 0)
+		goto abort;
+
+	sprintf(fname, "/sys/block/%s/md/", ret_val->sys_name);
+	base = fname + strlen(fname);
+
+	strcpy(base, "raid_disks");
+	if (load_sys(fname, buf))
+		goto abort;
+	ret_val->array.raid_disks = strtoul(buf, NULL, 0);
+
+	/* Get all the devices as well */
+	*base = 0;
+	dir = opendir(fname);
+	if (!dir)
+		goto abort;
+	ret_val->array.spare_disks = 0;
+	while ((de = readdir(dir)) != NULL) {
+		if (de->d_ino == 0 ||
+		    strncmp(de->d_name, "dev-", 4) != 0)
+			continue;
+		strcpy(base, de->d_name);
+		dbase = base + strlen(base);
+		*dbase = '\0';
+
+		/* skip devices that already belong to the array */
+		to_check = strstr(fname, "/md/");
+		is_in = sysfs_is_spare_device_belongs_to(fd, to_check);
+		if (is_in == -1) {
+			char *p;
+			struct stat stb;
+			char stb_name[PATH_MAX];
+
+			dev = calloc(1, sizeof(*dev));
+			if (!dev)
+				goto abort;
+			/* text_version is a fixed-size buffer; keep it terminated */
+			strncpy(dev->text_version, fname,
+				sizeof(dev->text_version) - 1);
+
+			*dbase++ = '/';
+
+			/* a spare has no raid slot yet */
+			dev->disk.raid_disk = -1;
+
+			strcpy(dbase, "block/dev");
+			if (load_sys(fname, buf)) {
+				free(dev);
+				continue;
+			}
+			/* first check that we are working on a block device;
+			 * if not, we cannot check its state
+			 */
+			p = strchr(dev->text_version, '-');
+			if (p == NULL) {
+				free(dev);
+				continue;
+			}
+			p++;
+			sprintf(stb_name, "/dev/%s", p);
+			if (stat(stb_name, &stb) < 0) {
+				dprintf(Name ": stat failed for %s: %s.\n",
+					stb_name, strerror(errno));
+				free(dev);
+				continue;
+			}
+			if (!S_ISBLK(stb.st_mode)) {
+				dprintf(Name ": %s is not a block device. Skip checking.\n",
+					stb_name);
+				goto skip;
+			}
+			dprintf(Name ": %s seems to be a block device\n", stb_name);
+			sscanf(buf, "%d:%d", &dev->disk.major, &dev->disk.minor);
+			strcpy(dbase, "block/device/state");
+			if (load_sys(fname, buf) != 0) {
+				free(dev);
+				continue;
+			}
+			if (strncmp(buf, "offline", 7) == 0) {
+				free(dev);
+				continue;
+			}
+			if (strncmp(buf, "failed", 6) == 0) {
+				free(dev);
+				continue;
+			}
+
+skip:
+			/* add this disk to spares list */
+			dev->next = ret_val->devs;
+			ret_val->devs = dev;
+			ret_val->array.spare_disks++;
+			*(dbase-1) = '\0';
+			dprintf("sysfs: found spare: %s [%d:%d]\n",
+				fname, dev->disk.major, dev->disk.minor);
+		}
+	}
+	closedir(dir);
+	return ret_val;
+
+abort:
+	if (dir)
+		closedir(dir);
+	sysfs_free(ret_val);
+
+	return NULL;
+}
+
 int sysfs_freeze_array(struct mdinfo *sra)
 {
 	/* Try to freeze resync/rebuild on this array/container.
diff --git a/util.c b/util.c
index 4b41e2b..7eac1e3 100644
--- a/util.c
+++ b/util.c
@@ -1906,3 +1906,150 @@ int experimental(void)
 	}
 }
 
+int path2devnum(char *pth)
+{
+	char *ep;
+	int fd = -1;
+	char *dev_pth = NULL;
+	char *dev_str;
+	int dev_num = -1;
+
+	fd = open(pth, O_RDONLY);
+	if (fd < 0)
+		return dev_num;
+	close(fd);
+	dev_pth = canonicalize_file_name(pth);
+	if (dev_pth == NULL)
+		return dev_num;
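+	dev_str = strrchr(dev_pth, '/');
+	if (dev_str) {
+		/* skip to the first digit of the last path component */
+		while (*dev_str && !isdigit((unsigned char)*dev_str))
+			dev_str++;
+		if (*dev_str) {
+			dev_num = strtoul(dev_str, &ep, 10);
+			if (*ep != '\0')
+				dev_num = -1;
+		}
+	}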
+
+	if (dev_pth)
+		free(dev_pth);
+
+	return dev_num;
+}
+
+extern void map_read(struct map_ent **map);
+extern void map_free(struct map_ent *map);
+int find_array_minor(char *text_version, int external, int container, int *minor)
+{
+	int i;
+	char path[PATH_MAX];
+	struct stat s;
+
+	if (minor == NULL)
+		return -2;
+
+	snprintf(path, PATH_MAX, "/dev/md/%s", text_version);
+	i = path2devnum(path);
+	if (i > -1) {
+		*minor = i;
+		return 0;
+	}
+
+	i = path2devnum(text_version);
+	if (i > -1) {
+		*minor = i;
+		return 0;
+	}
+
+	if (container > 0) {
+		struct map_ent *map = NULL;
+		struct map_ent *m;
+		char cont[PATH_MAX];
+
+		snprintf(cont, PATH_MAX, "/md%i/", container);
+		map_read(&map);
+		for (m = map; m; m = m->next) {
+			int index;
+			unsigned int len = 0;
+			char buf[PATH_MAX];
+
+			/* the array has to belong to the proper container
+			 */
+			if (strncmp(cont, m->metadata, strlen(cont)) != 0)
+				continue;
+			/* the beginning of the array name in the map has to be
+			 * the same as the array name in metadata
+			 */
+			if (strncmp(m->path, path, strlen(path)) != 0)
+				continue;
+			/* the array name has to be followed by a '_' char
+			 */
+			len = strlen(path);
+			if (*(m->path + len) != '_')
+				continue;
+			/* then we have to have a valid index
+			 */
+			len++;
+			if (*(m->path + len) == '\0')
+				continue;
+			/* the index has to be the last part of the array name
+			 */
+			index = atoi(m->path + strlen(path) + 1);
+			snprintf(buf, PATH_MAX, "%i", index);
+			len += strlen(buf);
+			if (len != strlen(m->path))
+				continue;
+			dprintf("Found %s device based on mdadm maps\n", m->path);
+			*minor = m->devnum;
+			map_free(map);
+			return 0;
+		}
+		map_free(map);
+	}
+
+	for (i = 127; i >= 0; i--) {
+		char buf[PATH_MAX];
+
+		snprintf(path, PATH_MAX, "/sys/block/md%d/md/", i);
+		if (stat(path, &s) != -1) {
+			strcat(path, "metadata_version");
+			if (load_sys(path, buf))
+				continue;
+			if (external) {
+				char *version = strchr(buf, ':');
+				if (version == NULL ||
+				    strcmp(version + 1, text_version))
+					continue;
+			} else {
+				if (strcmp(buf, text_version))
+					continue;
+			}
+			*minor = i;
+			return 0;
+		}
+	}
+
+	return -1;
+}
+
+/* find_array_minor2() also looks for frozen devices
+ */
+int find_array_minor2(char *text_version, int external, int container, int *minor)
+{
+	int result;
+
+	result = find_array_minor(text_version, external, container, minor);
+	if (result < 0) {
+		/* try to find a frozen array as well: a frozen array has
+		 * '-' instead of the leading '/' in its metadata version
+		 */
+		char buf[PATH_MAX];
+
+		snprintf(buf, sizeof(buf), "%s", text_version);
+		*buf = '-';
+		result = find_array_minor(buf, external, container, minor);
+	}
+	return result;
+}
+

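The size calculation at the end of imsm_create_metadata_update_for_reshape() multiplies blocks_per_member by the number of data disks, rounds the result down to a whole MiB and stores it as two 32-bit halves (size_low/size_high). The sketch below assumes SECT_PER_MB_SHIFT is 11 (2048 sectors of 512 bytes per MiB) and leaves out the __cpu_to_le32() conversion; the function names are hypothetical.

```c
#include <stdint.h>

#define SECT_PER_MB_SHIFT 11  /* assumed: 2048 512-byte sectors per MiB */

/* New volume size in sectors, rounded down to a whole MiB.
 * The multiplication is done in 64 bits so volumes above 2 TiB
 * (2^32 sectors) do not overflow. */
uint64_t new_array_blocks(uint32_t blocks_per_member, int data_disks)
{
	uint64_t blocks = (uint64_t)blocks_per_member * data_disks;

	return (blocks >> SECT_PER_MB_SHIFT) << SECT_PER_MB_SHIFT;
}

/* Split a 64-bit sector count into the two 32-bit metadata fields. */
void split_size(uint64_t blocks, uint32_t *low, uint32_t *high)
{
	*low = (uint32_t)blocks;
	*high = (uint32_t)(blocks >> 32);
}
```

For example, 1000000 blocks per member on 3 data disks gives 3000000 sectors, which rounds down to 1464 MiB, i.e. 2998272 sectors.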

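The map lookup in find_array_minor() accepts an entry only if its path is the expected /dev/md/<name>, followed by '_' and a decimal index that ends the string. A standalone sketch of that rule (matches_indexed_name() is a hypothetical helper, not part of mdadm):

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if candidate is exactly "<prefix>_<digits>", else 0. */
int matches_indexed_name(const char *candidate, const char *prefix)
{
	size_t len = strlen(prefix);
	const char *p;

	/* the candidate has to start with the expected name */
	if (strncmp(candidate, prefix, len) != 0)
		return 0;
	p = candidate + len;
	/* the name has to be followed by '_' */
	if (*p != '_')
		return 0;
	p++;
	/* '_' without an index is rejected */
	if (*p == '\0')
		return 0;
	/* the index has to be the last part of the name */
	for (; *p; p++)
		if (!isdigit((unsigned char)*p))
			return 0;
	return 1;
}
```

Unlike the atoi()/snprintf() round trip in the patch, this version also accepts an index with leading zeros such as "_01".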
^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 04/27] imsm: Process reshape_update in mdmon
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (2 preceding siblings ...)
  2010-12-06 13:21 ` [PATCH 03/27] imsm: Prepare reshape_update in mdadm Adam Kwolek
@ 2010-12-06 13:21 ` Adam Kwolek
  2010-12-06 13:21 ` [PATCH 05/27] imsm: Block array state change during reshape Adam Kwolek
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

For this update, prepare_update() allocates the memory needed to relink the (bigger)
imsm device structures and calculates the new (bigger) anchor size.

process_update() applies the update to the imsm structures. If necessary, for the first
array in the container it turns spares into raid disks in the metadata.

The active_array receives the number of added devices (reshape_delta_disks), and its
reshape_state is set to reshape_is_starting (this triggers the managemon action).

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |  140 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 140 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index ae2f567..402cc30 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -5311,6 +5311,99 @@ static void imsm_process_update(struct supertype *st,
 
 	switch (type) {
 	case update_reshape: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+		struct dl *new_disk;
+		struct active_array *a;
+		int i;
+		__u32 new_mpb_size;
+		int new_disk_num;
+		struct intel_dev *current_dev;
+
+		dprintf("imsm: imsm_process_update() for update_reshape [u->update_prepared = %i]\n", u->update_prepared);
+		if ((u->update_prepared == -1) ||
+		    (u->devnum < 0)) {
+			dprintf("imsm: Error: update_reshape not prepared\n");
+			goto update_reshape_exit;
+		}
+
+		if (u->spares_in_update) {
+			new_disk_num = mpb->num_disks + u->reshape_delta_disks;
+			new_mpb_size = disks_to_mpb_size(new_disk_num);
+			if (mpb->mpb_size < new_mpb_size)
+				mpb->mpb_size = new_mpb_size;
+
+			/* enable spares to use in array
+			*/
+			for (i = 0; i < u->reshape_delta_disks; i++) {
+				char buf[PATH_MAX];
+
+				new_disk = super->disks;
+				while (new_disk) {
+					if ((new_disk->major == u->upd_disks[i].major) &&
+					    (new_disk->minor == u->upd_disks[i].minor))
+						break;
+					new_disk = new_disk->next;
+				}
+				if (new_disk == NULL) {
+					u->update_prepared = -1;
+					goto update_reshape_exit;
+				}
+				if (new_disk->index < 0) {
+					new_disk->index = i + mpb->num_disks;
+					new_disk->raiddisk = new_disk->index; /* slot to fill in autolayout */
+					new_disk->disk.status |= CONFIGURED_DISK;
+					new_disk->disk.status &= ~SPARE_DISK;
+				}
+				sprintf(buf, "%d:%d", new_disk->major, new_disk->minor);
+				if (new_disk->fd < 0)
+					new_disk->fd = dev_open(buf, O_RDWR);
+				imsm_get_new_device_name(new_disk);
+			}
+		}
+
+		dprintf("imsm: process_update(): update_reshape: volume set mpb->num_raid_devs = %i\n", mpb->num_raid_devs);
+		/* manage changes in volumes
+		 */
+		/* check if array is in RESHAPE_NOT_ACTIVE reshape state
+		*/
+		for (a = st->arrays; a; a = a->next)
+			if (a->devnum == u->devnum)
+				break;
+		if ((a == NULL) || (a->reshape_state != reshape_not_active)) {
+			u->update_prepared = -1;
+			goto update_reshape_exit;
+		}
+		/* find current dev in intel_super
+		 */
+		dprintf("\t\tLooking for volume %s\n", (char *)u->devs_mem.dev->volume);
+		current_dev = super->devlist;
+		while (current_dev) {
+			if (strcmp((char *)current_dev->dev->volume,
+				   (char *)u->devs_mem.dev->volume) == 0)
+				break;
+			current_dev = current_dev->next;
+		}
+		if (current_dev == NULL) {
+			u->update_prepared = -1;
+			goto update_reshape_exit;
+		}
+
+		dprintf("Found volume %s\n", (char *)current_dev->dev->volume);
+		/* replace current device with provided in update
+		 */
+		free(current_dev->dev);
+		current_dev->dev = u->devs_mem.dev;
+		u->devs_mem.dev = NULL;
+
+		/* set reshape_delta_disks
+		 */
+		a->reshape_delta_disks = u->reshape_delta_disks;
+		a->reshape_state = reshape_is_starting;
+
+		super->updates_pending++;
+update_reshape_exit:
+		if (u->devs_mem.dev)
+			free(u->devs_mem.dev);
 		break;
 	}
 	case update_activate_spare: {
@@ -5633,6 +5726,53 @@ static void imsm_prepare_update(struct supertype *st,
 
 	switch (type) {
 	case update_reshape: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+		struct dl *dl = NULL;
+		void *upd_devs;
+
+		u->update_prepared = -1;
+		u->devs_mem.dev = NULL;
+		dprintf("imsm: imsm_prepare_update() for update_reshape\n");
+		if (u->devnum < 0) {
+			dprintf("imsm: No passed device.\n");
+			break;
+		}
+		dprintf("imsm: reshape delta disks is = %i\n", u->reshape_delta_disks);
+		if (u->reshape_delta_disks < 0)
+			break;
+		u->update_prepared = 1;
+		if (u->reshape_delta_disks == 0) {
+			/* for a non-growing reshape buffer sizes are not
+			 * affected, but some parameters should be checked here
+			 */
+			break;
+		}
+		/* count HDDs
+		 */
+		u->disks_count = 0;
+		for (dl = super->disks; dl; dl = dl->next)
+			if (dl->index >= 0)
+				u->disks_count++;
+
+		/* set pointer in monitor address space
+		*/
+		upd_devs = (struct imsm_dev *)((void *)u + u->upd_devs_offset);
+		/* allocate memory for new volumes */
+		if (((struct imsm_dev *)(upd_devs))->vol.migr_type != MIGR_GEN_MIGR) {
+			dprintf("imsm: Error. Device is not in migration state.\n");
+			u->update_prepared = -1;
+			break;
+		}
+		dprintf("passed device : %s\n", ((struct imsm_dev *)(upd_devs))->volume);
+		u->devs_mem.dev = calloc(1, u->device_size);
+		if (u->devs_mem.dev == NULL) {
+			u->update_prepared = -1;
+			break;
+		}
+		dprintf("METADATA Copy - using it.\n");
+		memcpy(u->devs_mem.dev, upd_devs, u->device_size);
+		len = disks_to_mpb_size(u->spares_in_update + mpb->num_disks);
+		dprintf("New anchor length is %llu\n", (unsigned long long)len);
 		break;
 	}
 	case update_create_array: {



* [PATCH 05/27] imsm: Block array state change during reshape
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (3 preceding siblings ...)
  2010-12-06 13:21 ` [PATCH 04/27] imsm: Process reshape_update in mdmon Adam Kwolek
@ 2010-12-06 13:21 ` Adam Kwolek
  2010-12-06 13:21 ` [PATCH 06/27] Process reshape initialization by managemon Adam Kwolek
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Array state changes are blocked while a reshape action is in progress, because
metadata changes are still being applied. Spare activation is blocked for the same reason.

'1' is returned to indicate that the array is clean.
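The two guards can be illustrated in isolation. This is a sketch with a hypothetical enum and a hypothetical array_sketch struct, not the real active_array: any state other than "not active" short-circuits the state change (reporting the array as clean) and prevents spare activation.

```c
/* Hypothetical subset of the reshape states used by this series. */
enum reshape_state_sketch {
	RS_NOT_ACTIVE,
	RS_IS_STARTING,
	RS_IN_PROGRESS,
};

/* Hypothetical stand-in for struct active_array. */
struct array_sketch {
	enum reshape_state_sketch reshape_state;
	int recorded_state;  /* what a normal transition would store */
};

/* Like the early return added to imsm_set_array_state(): while a
 * reshape is being set up or running, the metadata is left alone
 * and the array is reported clean (1). */
int set_array_state_sketch(struct array_sketch *a, int consistent)
{
	if (a->reshape_state != RS_NOT_ACTIVE)
		return 1;
	a->recorded_state = consistent;
	return consistent;
}

/* Like the check added to imsm_activate_spare(): no spare is
 * activated while a reshape is pending or running. */
int may_activate_spare_sketch(const struct array_sketch *a)
{
	return a->reshape_state == RS_NOT_ACTIVE;
}
```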

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 402cc30..c072753 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -4780,6 +4780,16 @@ static int imsm_set_array_state(struct active_array *a, int consistent)
 	__u8 map_state = imsm_check_degraded(super, dev, failed);
 	__u32 blocks_per_unit;
 
+	if (a->reshape_state != reshape_not_active) {
+		/* array state changes are blocked while a reshape action
+		 * is in progress, because metadata changes are being applied.
+		 *
+		 * '1' is returned to indicate that the array is clean
+		 */
+		dprintf("imsm: imsm_set_array_state() called during reshape.\n");
+		return 1;
+	}
+
 	/* before we activate this array handle any missing disks */
 	if (consistent == 2)
 		handle_missing(super, dev);
@@ -5144,6 +5154,10 @@ static struct mdinfo *imsm_activate_spare(struct active_array *a,
 
 	dprintf("imsm: activate spare: inst=%d failed=%d (%d) level=%d\n",
 		inst, failed, a->info.array.raid_disks, a->info.array.level);
+
+	if (a->reshape_state != reshape_not_active)
+		return NULL;
+
 	if (imsm_check_degraded(super, dev, failed) != IMSM_T_STATE_DEGRADED)
 		return NULL;
 



* [PATCH 06/27] Process reshape initialization by managemon
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (4 preceding siblings ...)
  2010-12-06 13:21 ` [PATCH 05/27] imsm: Block array state change during reshape Adam Kwolek
@ 2010-12-06 13:21 ` Adam Kwolek
  2010-12-06 13:21 ` [PATCH 07/27] imsm: Verify slots in meta against slot numbers set by md Adam Kwolek
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Monitor signals a request to managemon (using the reshape_delta_disks variable).
This causes a call to the reshape_array() vector, which prepares a second metadata update for slot verification of the added disks.
Slots are set by md when the reshape starts, so until then they are unknown to user space.
The second update is sent after the reshape is started. While this update is processed, the metadata is checked against the slot numbers set by md and, on a mismatch, the metadata is updated.

The reshape is started in a delayed state, because sync_max was set to 0. After this, reshape_delta_disks is set to the 'in progress' value to avoid reentry.
The reshape process is continued in mdadm.

If the reshape cannot be started, or any other failure occurs, a 'cancel' message is prepared by reshape_array() and sent to the monitor to roll back the metadata changes.
Mdadm is informed about the failure by the idle array state.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c |   69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mdadm.h     |   27 +++++++++++++++++++++++
 2 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/managemon.c b/managemon.c
index 945b173..a6e6880 100644
--- a/managemon.c
+++ b/managemon.c
@@ -398,6 +398,7 @@ static void manage_member(struct mdstat_ent *mdstat,
 	 */
 	char buf[64];
 	int frozen;
+	struct active_array *newa = NULL;
 
 	// FIXME
 	a->info.array.raid_disks = mdstat->raid_disks;
@@ -409,6 +410,74 @@ static void manage_member(struct mdstat_ent *mdstat,
 	else
 		frozen = 1; /* can't read metadata_version assume the worst */
 
+	if ((a->reshape_state != reshape_not_active) &&
+	    (a->reshape_state != reshape_in_progress)) {
+		dprintf("Reshape signals need to manage this member\n");
+		if (a->container->ss->reshape_array) {
+			struct metadata_update *updates = NULL;
+			struct mdinfo *newdev = NULL;
+			struct mdinfo *d;
+
+			newdev = a->container->ss->reshape_array(a, reshape_in_progress, &updates);
+			if (newdev) {
+				int status_ok = 1;
+				newa = duplicate_aa(a);
+				if (newa == NULL)
+					goto reshape_out;
+
+				for (d = newdev; d ; d = d->next) {
+					struct mdinfo *newd;
+
+					newd = malloc(sizeof(*newd));
+					if (!newd) {
+						status_ok = 0;
+						dprintf("Cannot allocate memory for new disk.\n");
+						continue;
+					}
+					if (sysfs_add_disk(&newa->info, d, 0) < 0) {
+						free(newd);
+						status_ok = 0;
+						dprintf("Cannot add disk to array.\n");
+						continue;
+					}
+					disk_init_and_add(newd, d, newa);
+				}
+				/* go with reshape
+				 */
+				if (status_ok)
+					if (sysfs_set_num(&newa->info, NULL, "sync_max", 0) < 0)
+						status_ok = 0;
+				if (status_ok && sysfs_set_str(&newa->info, NULL, "sync_action", "reshape") == 0) {
+					/* reshape executed
+					 */
+					dprintf("Reshape was started\n");
+					replace_array(a->container, a, newa);
+					a = newa;
+				} else {
+					/* on problems cancel update
+					 */
+					free_aa(newa);
+					free_updates(&updates);
+					updates = NULL;
+					a->container->ss->reshape_array(a, reshape_cancel_request, &updates);
+					sysfs_set_str(&a->info, NULL, "sync_action", "idle");
+				}
+			}
+			dprintf("Send metadata update for reshape.\n");
+
+			queue_metadata_update(updates);
+			updates = NULL;
+			wakeup_monitor();
+reshape_out:
+			while (newdev) {
+				d = newdev->next;
+				free(newdev);
+				newdev = d;
+			}
+			free_updates(&updates);
+		}
+	}
+
 	if (a->check_degraded && !frozen) {
 		struct metadata_update *updates = NULL;
 		struct mdinfo *newdev = NULL;
diff --git a/mdadm.h b/mdadm.h
index 423f62c..20f65cc 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -520,6 +520,7 @@ extern char *map_dev(int major, int minor, int create);
 
 struct active_array;
 struct metadata_update;
+enum state_of_reshape;
 
 /* A superswitch provides entry point the a metadata handler.
  *
@@ -747,6 +748,32 @@ extern struct superswitch {
 	 */
 	const char *(*get_disk_controller_domain)(const char *path);
 
+	/* reshape_array() will
+	 * 1. check if sync_max is set to 0
+	 * 2. prepare the list of devices that have to be added
+	 * 3. prepare a metadata update message to set disk slots
+	 *    after the reshape is started
+	 * request_type:
+	 * 1. RESHAPE_CANCEL_REQUEST
+	 *    In the error case it prepares a metadata rollback message.
+	 *    Such a message should be prepared when the passed
+	 *    request_type is set to RESHAPE_CANCEL_REQUEST.
+	 * 2. RESHAPE_IN_PROGRESS
+	 *    requests a transition to the RESHAPE_IN_PROGRESS state,
+	 *    so the proper update has to be prepared
+	 * The active array structure can hold these values:
+	 * 1. RESHAPE_NOT_ACTIVE
+	 * 2. RESHAPE_IN_PROGRESS
+	 * 3. any other value indicates the requested change in the
+	 *    number of disks; it is visible only during reshape and
+	 *    metadata initialization. After initialization
+	 *    RESHAPE_IN_PROGRESS has to be placed in reshape_delta_disks.
+	 *    When the reshape is finished it is replaced by
+	 *    RESHAPE_NOT_ACTIVE
+	 */
+	struct mdinfo *(*reshape_array)(struct active_array *a,
+			     enum state_of_reshape request_type,
+			     struct metadata_update **updates);
+
 	int swapuuid; /* true if uuid is bigending rather than hostendian */
 	int external;
 	const char *name; /* canonical metadata name */



* [PATCH 07/27] imsm: Verify slots in meta against slot numbers set by md
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (5 preceding siblings ...)
  2010-12-06 13:21 ` [PATCH 06/27] Process reshape initialization by managemon Adam Kwolek
@ 2010-12-06 13:21 ` Adam Kwolek
  2010-12-06 13:21 ` [PATCH 08/27] imsm: Cancel metadata changes on reshape start failure Adam Kwolek
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

To verify the slot numbers stored in metadata against those chosen by md, the update_reshape_set_slots update is used.

Managemon calls the reshape_array() vector and prepares the slot verification metadata update there. The update is sent once the reshape has been started successfully in md.
The monitor then verifies the slots and updates them where necessary.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |  302 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 302 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index c072753..82c17ba 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -286,6 +286,7 @@ enum imsm_update_type {
 	update_rename_array,
 	update_add_disk,
 	update_reshape,
+	update_reshape_set_slots,
 };
 
 struct imsm_update_activate_spare {
@@ -5288,6 +5289,7 @@ static int disks_overlap(struct intel_super *super, int idx, struct imsm_update_
 }
 
 static void imsm_delete(struct intel_super *super, struct dl **dlp, unsigned index);
+int imsm_reshape_array_set_slots(struct active_array *a);
 int imsm_get_new_device_name(struct dl *dl);
 
 static void imsm_process_update(struct supertype *st,
@@ -5420,6 +5422,25 @@ update_reshape_exit:
 			free(u->devs_mem.dev);
 		break;
 	}
+	case update_reshape_set_slots: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+		struct active_array *a;
+
+		dprintf("imsm: process_update() for update_reshape_set_slots for device %i\n", u->devnum);
+		for (a = st->arrays; a; a = a->next)
+			if (a->devnum == u->devnum) {
+				break;
+			}
+
+		if (a == NULL) {
+			dprintf(" - cannot locate requested array\n");
+			break;
+		}
+
+		if (imsm_reshape_array_set_slots(a) > -1)
+			super->updates_pending++;
+		break;
+	}
 	case update_activate_spare: {
 		struct imsm_update_activate_spare *u = (void *) update->buf; 
 		struct imsm_dev *dev = get_imsm_dev(super, u->array);
@@ -5789,6 +5810,9 @@ static void imsm_prepare_update(struct supertype *st,
 		dprintf("New anchor length is %llu\n", (unsigned long long)len);
 		break;
 	}
+	case update_reshape_set_slots: {
+		break;
+	}
 	case update_create_array: {
 		struct imsm_update_create_array *u = (void *) update->buf;
 		struct intel_dev *dv;
@@ -6566,6 +6590,283 @@ int imsm_get_new_device_name(struct dl *dl)
 	return rv;
 }
 
+int imsm_reshape_array_manage_new_slots(struct intel_super *super, int inst, int devnum, int correct);
+
+int imsm_reshape_array_set_slots(struct active_array *a)
+{
+	struct intel_super *super = a->container->sb;
+	int inst = a->info.container_member;
+
+	return imsm_reshape_array_manage_new_slots(super, inst, a->devnum, 1);
+}
+
+/* imsm_reshape_array_manage_new_slots()
+ * returns: number of corrected slots for correct == 1,
+ *          number of mismatched slots for correct == 0,
+ *          -1 on error
+ */
+int imsm_reshape_array_manage_new_slots(struct intel_super *super, int inst, int devnum, int correct)
+{
+
+	struct imsm_dev *dev = get_imsm_dev(super, inst);
+	struct imsm_map *map_1 = get_imsm_map(dev, 0);
+	struct imsm_map *map_2 = get_imsm_map(dev, 1);
+	struct dl *dl;
+	unsigned long long sysfs_slot;
+	char buf[PATH_MAX];
+	char *devname;
+	int fd;
+	struct mdinfo *sra = NULL;
+	int ret_val = 0;
+
+	if ((map_1 == NULL) || (map_2 == NULL)) {
+		dprintf("imsm_reshape_array_manage_new_slots(): no maps (map_1 = %p, map_2 = %p)\n", map_1, map_2);
+		dprintf("\t\tdev->vol.migr_state = %i\n", dev->vol.migr_state);
+		dprintf("\t\tdev->volume = %s\n", dev->volume);
+		return -1;
+	}
+
+	/* verify/correct slot configuration of added disks
+	 */
+	dprintf("\n\nStart map verification for %i added devices on device no %i\n",
+		map_1->num_members - map_2->num_members, devnum);
+	devname = devnum2devname(devnum);
+	if (devname == NULL) {
+		dprintf("imsm: ERROR: Cannot get device name.\n");
+		return -1;
+	}
+	sprintf(buf, "/dev/%s", devname);
+	free(devname);
+
+	fd = open(buf, O_RDONLY);
+	if (fd < 0) {
+		dprintf("imsm: ERROR: Cannot open device %s.\n", buf);
+		return -1;
+	}
+
+	sra = sysfs_read(fd, 0, GET_LEVEL|GET_VERSION|GET_DEVS|GET_STATE);
+	if (!sra) {
+		dprintf("imsm: ERROR: Device not found.\n");
+		close(fd);
+		return -1;
+	}
+
+	for (dl = super->disks; dl; dl = dl->next) {
+		int fd2;
+		int rv;
+
+		dprintf("\tLooking at device %s (index = %i).\n", dl->devname, dl->index);
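+		/* dl->devname is "/dev/<name>"; without it there is no
+		 * sysfs slot to compare against (and buf would still hold
+		 * the array path from above)
+		 */
+		if (dl->devname == NULL || strlen(dl->devname) <= 5)
+			continue;
+		sprintf(buf, "/sys/block/%s/md/dev-%s/slot",
+			sra->sys_name, dl->devname + 5);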
+		fd2 = open(buf, O_RDONLY);
+		if (fd2 < 0)
+			continue;
+		rv = sysfs_fd_get_ll(fd2, &sysfs_slot);
+		close(fd2);
+		if (rv < 0)
+			continue;
+		dprintf("\t\tLooking at slot %llu in sysfs.\n", sysfs_slot);
+		if ((int)sysfs_slot != dl->index) {
+			dprintf("Slots don't match: sysfs->%i, imsm->%i\n", (int)sysfs_slot, dl->index);
+			ret_val++;
+			if (correct)
+				dl->index = sysfs_slot;
+		}
+	}
+	close(fd);
+	sysfs_free(sra);
+	dprintf("IMSM Map verification finished (found wrong slots : %i).\n", ret_val);
+
+	return ret_val;
+}
+
+struct mdinfo *imsm_grow_array(struct active_array *a)
+{
+	int disk_count = 0;
+	struct intel_super *super = a->container->sb;
+	int inst = a->info.container_member;
+	struct imsm_dev *dev = get_imsm_dev(super, inst);
+	struct imsm_map *map = get_imsm_map(dev, 0);
+	struct mdinfo *di;
+	struct dl *dl;
+	int i;
+	int prev_raid_disks = a->info.array.raid_disks;
+	int new_raid_disks = prev_raid_disks + a->reshape_delta_disks;
+	struct mdinfo *vol = NULL;
+	char buf[PATH_MAX];
+	char *p;
+	int fd;
+	struct mdinfo *rv = NULL;
+
+	dprintf("imsm: grow array: inst=%d raid disks=%d(%d) level=%d\n",
+		inst, a->info.array.raid_disks, new_raid_disks, a->info.array.level);
+
+	/* get array sysfs entry
+	 */
+	p = devnum2devname(a->devnum);
+	if (p == NULL)
+		return rv;
+	sprintf(buf, "/dev/%s", p);
+	free(p);
+	fd = open(buf, O_RDONLY);
+	if (fd < 0)
+		return rv;
+	vol = sysfs_read(fd, 0, GET_LEVEL|GET_VERSION|GET_DEVS|GET_STATE);
+	if (vol == NULL) {
+		close(fd);
+		return rv;
+	}
+	/* Look for all disks beyond current configuration
+	 * To handle degradation after takeover
+	 * look also on last disk in configuration.
+	 */
+	for (i = prev_raid_disks; i < new_raid_disks; i++) {
+		/* OK, this device can be added.  Try to add.
+		 */
+		dl = imsm_add_spare(super, i, a, 0, rv);
+		if (!dl)
+			continue;
+
+		if (dl->index < 0)
+			dl->index = i;
+		/* found a usable disk with enough space */
+		di = malloc(sizeof(*di));
+		if (!di)
+			continue;
+
+		memset(di, 0, sizeof(*di));
+		/* dl->index will be -1 in the case we are activating a
+		 * pristine spare.  imsm_process_update() will create a
+		 * new index in this case.  Once a disk is found to be
+		 * failed in all member arrays it is kicked from the
+		 * metadata
+		 */
+		di->disk.number = dl->index;
+
+		/* (ab)use di->devs to store a pointer to the device
+		 * we chose
+		 */
+		di->devs = (struct mdinfo *) dl;
+
+		di->disk.raid_disk = -1;
+		di->disk.major = dl->major;
+		di->disk.minor = dl->minor;
+		di->disk.state = (1<<MD_DISK_SYNC) |
+				 (1<<MD_DISK_ACTIVE);
+		di->next_state = 0;
+
+		di->recovery_start = MaxSector;
+		di->data_offset = __le32_to_cpu(map->pba_of_lba0);
+		di->component_size = a->info.component_size;
+		di->container_member = inst;
+		super->random = random32();
+
+		di->next = rv;
+		rv = di;
+		disk_count++;
+		dprintf("%x:%x to be %d at %llu\n", dl->major, dl->minor,
+			i, di->data_offset);
+	}
+
+	dprintf("imsm: imsm_grow_array() configures %i raid disks\n", disk_count);
+	close(fd);
+	sysfs_free(vol);
+	if (disk_count != a->reshape_delta_disks) {
+
+		dprintf("imsm: ERROR: but it should configure %i\n",
+			a->reshape_delta_disks);
+
+		while (rv) {
+			di = rv;
+			rv = rv->next;
+			free(di);
+		}
+	}
+
+	return rv;
+}
+
+struct mdinfo *imsm_reshape_array(struct active_array *a, enum state_of_reshape request_type,
+			struct metadata_update **updates)
+{
+	struct imsm_update_reshape *u = NULL;
+	struct metadata_update *mu;
+	struct mdinfo *disk_list = NULL;
+
+	dprintf("imsm: imsm_reshape_array(reshape_delta_disks = %i)\t", a->reshape_delta_disks);
+	if (request_type == reshape_cancel_request) {
+		dprintf("prepare cancel message.\n");
+		goto imsm_reshape_array_exit;
+	}
+	if (a->reshape_state == reshape_not_active) {
+		dprintf("has nothing to do.\n");
+		return disk_list;
+	}
+	if (a->reshape_delta_disks < 0) {
+		dprintf("doesn't support shrinking.\n");
+		a->reshape_state = reshape_not_active;
+		return disk_list;
+	}
+
+	if (a->reshape_delta_disks == 0) {
+		dprintf("array parameters have to be changed\n");
+		/* TBD */
+	}
+	if (a->reshape_delta_disks > 0) {
+		dprintf("grow is detected.\n");
+		disk_list = imsm_grow_array(a);
+	}
+
+	if (disk_list) {
+		dprintf("imsm: send update update_reshape_set_slots\n");
+
+		u = (struct imsm_update_reshape *)calloc(1, sizeof(struct imsm_update_reshape));
+		if (u) {
+			u->type = update_reshape_set_slots;
+			a->reshape_state = reshape_in_progress;
+		}
+	} else
+		dprintf("error: cannot start reshape\n");
+
+imsm_reshape_array_exit:
+	if (u == NULL) {
+		dprintf("imsm: send update update_reshape_cancel\n");
+		a->reshape_state = reshape_not_active;
+		sysfs_set_str(&a->info, NULL, "sync_action", "idle");
+	}
+
+	if (u) {
+		/* post any prepared update
+		 */
+		u->devnum = a->devnum;
+
+		u->update_memory_size = sizeof(struct imsm_update_reshape);
+		u->reshape_delta_disks = a->reshape_delta_disks;
+		u->update_prepared = 1;
+
+		mu = malloc(sizeof(struct metadata_update));
+		if (mu) {
+			mu->buf = (void *)u;
+			mu->space = NULL;
+			mu->len = u->update_memory_size;
+			mu->next = *updates;
+			*updates = mu;
+		} else {
+			a->reshape_state = reshape_not_active;
+			free(u);
+			u = NULL;
+		}
+	}
+
+	if ((disk_list) && (u == NULL)) {
+		while (disk_list) {
+			struct mdinfo *di = disk_list;
+			disk_list = disk_list->next;
+			free(di);
+		}
+	}
+	return disk_list;
+}
+
 struct superswitch super_imsm = {
 #ifndef	MDASSEMBLE
 	.examine_super	= examine_super_imsm,
@@ -6602,6 +6903,7 @@ struct superswitch super_imsm = {
 	.default_geometry = default_geometry_imsm,
 	.get_disk_controller_domain = imsm_get_disk_controller_domain,
 	.reshape_super  = imsm_reshape_super,
+	.reshape_array	= imsm_reshape_array,
 
 	.external	= 1,
 	.name = "imsm",



* [PATCH 08/27] imsm: Cancel metadata changes on reshape start failure
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (6 preceding siblings ...)
  2010-12-06 13:21 ` [PATCH 07/27] imsm: Verify slots in meta against slot numbers set by md Adam Kwolek
@ 2010-12-06 13:21 ` Adam Kwolek
  2010-12-06 13:21 ` [PATCH 09/27] imsm: Do not accept messages sent by mdadm Adam Kwolek
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Managemon may fail to start the reshape in md.
In that case the metadata changes are cancelled with the update_reshape_cancel message, which is prepared by the reshape_array() vector after managemon has removed the devices it added to md.
When monitor receives this message, it rolls back the metadata changes it made earlier while processing the update_reshape update.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |  123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 123 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 82c17ba..e43b58a 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -287,6 +287,7 @@ enum imsm_update_type {
 	update_add_disk,
 	update_reshape,
 	update_reshape_set_slots,
+	update_reshape_cancel,
 };
 
 struct imsm_update_activate_spare {
@@ -5441,6 +5442,94 @@ update_reshape_exit:
 			super->updates_pending++;
 		break;
 	}
+	case update_reshape_cancel: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+		struct active_array *a;
+		int inst;
+		int i;
+		struct imsm_dev *dev;
+		struct imsm_dev *devi;
+		struct imsm_map *map_1;
+		struct imsm_map *map_2;
+		int reshape_delta_disks;
+		struct dl *curr_disk;
+		int used_disks;
+		unsigned long long array_blocks;
+
+
+		dprintf("imsm: process_update() for update_reshape_cancel for device %i\n", u->devnum);
+		for (a = st->arrays; a; a = a->next)
+			if (a->devnum == u->devnum) {
+				break;
+			}
+		if (a == NULL)
+			break;
+
+		inst = a->info.container_member;
+		dev = get_imsm_dev(super, inst);
+		map_1 = get_imsm_map(dev, 0);
+		map_2 = get_imsm_map(dev, 1);
+		if (map_2 == NULL)
+			break;
+		reshape_delta_disks = map_1->num_members - map_2->num_members;
+		dprintf("\t\tRemove %i device(s) from configuration.\n", reshape_delta_disks);
+
+		/* when cancel was applied during reshape of second volume, we need disks for first
+		 * array reshaped previously, find the smallest delta_disks to remove
+		 */
+		i = 0;
+		devi = get_imsm_dev(super, i);
+		while (devi) {
+			struct imsm_map *mapi = get_imsm_map(devi, 0);
+			int delta_disks = map_1->num_members - mapi->num_members;
+			if ((i != inst) &&
+			    (delta_disks < reshape_delta_disks) &&
+			    (delta_disks >= 0))
+				reshape_delta_disks = delta_disks;
+			i++;
+			devi = get_imsm_dev(super, i);
+		}
+		/* remove disks
+		 */
+		if (reshape_delta_disks > 0) {
+			/* reverse device(s) back to spares
+			*/
+			curr_disk = super->disks;
+			while (curr_disk) {
+				dprintf("Looking at %i device to remove\n", curr_disk->index);
+				if (curr_disk->index >= map_2->num_members) {
+					dprintf("\t\t\tREMOVE\n");
+					curr_disk->index = -1;
+					curr_disk->raiddisk = -1;
+					curr_disk->disk.status &= ~CONFIGURED_DISK;
+					curr_disk->disk.status |= SPARE_DISK;
+				}
+				curr_disk = curr_disk->next;
+			}
+		}
+		/* roll back maps and migration
+		 */
+		memcpy(map_1, map_2, sizeof_imsm_map(map_2));
+		/* reconfigure map_2 and perform migration end
+		 */
+		map_2 = get_imsm_map(dev, 1);
+		memcpy(map_2, map_1, sizeof_imsm_map(map_1));
+		end_migration(dev, map_1->map_state);
+		/* array size rollback
+		 */
+		used_disks = imsm_num_data_members(dev);
+		if (used_disks) {
+			array_blocks = map_1->blocks_per_member * used_disks;
+			/* round array size down to closest MB
+			*/
+			array_blocks = (array_blocks >> SECT_PER_MB_SHIFT) << SECT_PER_MB_SHIFT;
+			dev->size_low = __cpu_to_le32((__u32)array_blocks);
+			dev->size_high = __cpu_to_le32((__u32)(array_blocks >> 32));
+		}
+
+		super->updates_pending++;
+		break;
+	}
 	case update_activate_spare: {
 		struct imsm_update_activate_spare *u = (void *) update->buf; 
 		struct imsm_dev *dev = get_imsm_dev(super, u->array);
@@ -5813,6 +5902,9 @@ static void imsm_prepare_update(struct supertype *st,
 	case update_reshape_set_slots: {
 		break;
 	}
+	case update_reshape_cancel: {
+		break;
+	}
 	case update_create_array: {
 		struct imsm_update_create_array *u = (void *) update->buf;
 		struct intel_dev *dv;
@@ -6562,6 +6654,31 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 	return ret_val;
 }
 
+void imsm_grow_array_remove_devices_on_cancel(struct active_array *a)
+{
+	struct mdinfo *di = a->info.devs;
+	struct mdinfo *di_prev = NULL;
+
+	while (di) {
+		if (di->disk.raid_disk < 0) {
+			struct mdinfo *rmdev = di;
+			sysfs_set_str(&a->info, rmdev, "state", "faulty");
+			sysfs_set_str(&a->info, rmdev, "slot", "none");
+			sysfs_set_str(&a->info, rmdev, "state", "remove");
+
+			if (di_prev)
+				di_prev->next = di->next;
+			else
+				a->info.devs = di->next;
+			di = di->next;
+			free(rmdev);
+		} else {
+			di_prev = di;
+			di = di->next;
+		}
+	}
+}
+
 int imsm_get_new_device_name(struct dl *dl)
 {
 	int rv;
@@ -6832,6 +6949,12 @@ imsm_reshape_array_exit:
 		dprintf("imsm: send update update_reshape_cancel\n");
 		a->reshape_state = reshape_not_active;
 		sysfs_set_str(&a->info, NULL, "sync_action", "idle");
+		imsm_grow_array_remove_devices_on_cancel(a);
+		u = (struct imsm_update_reshape *)calloc(1, sizeof(struct imsm_update_reshape));
+		if (u) {
+			u->type = update_reshape_cancel;
+			a->reshape_state = reshape_not_active;
+		}
 	}
 
 	if (u) {



* [PATCH 09/27] imsm: Do not accept messages sent by mdadm
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (7 preceding siblings ...)
  2010-12-06 13:21 ` [PATCH 08/27] imsm: Cancel metadata changes on reshape start failure Adam Kwolek
@ 2010-12-06 13:21 ` Adam Kwolek
  2010-12-06 13:22 ` [PATCH 10/27] imsm: Do not indicate resync during reshape Adam Kwolek
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:21 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

The update_reshape_cancel and update_reshape_set_slots messages are intended to be sent by managemon only.
If mdadm issues one of them, managemon calls prepare_update() for it.
In that case prepare_update() sets update_prepared to -1, which tells process_update() not to apply the message.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index e43b58a..b78baa0 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -5438,6 +5438,13 @@ update_reshape_exit:
 			break;
 		}
 
+		/* do not accept this update type sent by mdadm
+		 */
+		if (u->update_prepared == -1) {
+			dprintf("imsm: message is sent by mdadm. cannot accept\n\n");
+			break;
+		}
+
 		if (imsm_reshape_array_set_slots(a) > -1)
 			super->updates_pending++;
 		break;
@@ -5465,6 +5472,13 @@ update_reshape_exit:
 		if (a == NULL)
 			break;
 
+		/* do not accept this update type sent by mdadm
+		 */
+		if (u->update_prepared == -1) {
+			dprintf("imsm: message is sent by mdadm. cannot accept\n\n");
+			break;
+		}
+
 		inst = a->info.container_member;
 		dev = get_imsm_dev(super, inst);
 		map_1 = get_imsm_map(dev, 0);
@@ -5900,9 +5914,19 @@ static void imsm_prepare_update(struct supertype *st,
 		break;
 	}
 	case update_reshape_set_slots: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+
+		/* do not accept this update type sent by mdadm
+		 */
+		u->update_prepared = -1;
 		break;
 	}
 	case update_reshape_cancel: {
+		struct imsm_update_reshape *u = (void *)update->buf;
+
+		/* do not accept this update type sent by mdadm
+		 */
+		u->update_prepared = -1;
 		break;
 	}
 	case update_create_array: {



* [PATCH 10/27] imsm: Do not indicate resync during reshape
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (8 preceding siblings ...)
  2010-12-06 13:21 ` [PATCH 09/27] imsm: Do not accept messages sent by mdadm Adam Kwolek
@ 2010-12-06 13:22 ` Adam Kwolek
  2010-12-06 13:22 ` [PATCH 11/27] imsm: Fill delta_disks field in getinfo_super() Adam Kwolek
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Once a reshape has started, a resync must not run in parallel with it, because that would break the reshape.
If the array is in the General Migration state, do not report a resync, so that the reshape can continue.
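The decision order in is_resyncing() after this change can be modelled in isolation. This is a hedged sketch. The enums are hypothetical stand-ins for the IMSM migration types and map states, not the on-disk constants. The checks that the real function performs before the MIGR_REPAIR test, which the hunk does not show, are left out.

```c
#include <assert.h>

/* Hypothetical stand-ins for the IMSM migration types and map states. */
enum sketch_migr { SK_MIGR_REBUILD, SK_MIGR_GEN_MIGR, SK_MIGR_REPAIR };
enum sketch_state { SK_STATE_NORMAL, SK_STATE_DEGRADED };

/* Returns 1 if the migration should be reported to md as a resync. */
static int sketch_is_resyncing(enum sketch_migr type,
			       enum sketch_state prev_map_state)
{
	assert(type <= SK_MIGR_REPAIR);
	if (type == SK_MIGR_REPAIR)
		return 1;            /* repair is always a resync */
	if (type == SK_MIGR_GEN_MIGR)
		return 0;            /* new: a reshape is never a resync */
	/* otherwise decided by the state of the pre-migration map */
	return prev_map_state == SK_STATE_NORMAL;
}
```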

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index b78baa0..62c4245 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -4702,9 +4702,13 @@ static int is_resyncing(struct imsm_dev *dev)
 	    migr_type(dev) == MIGR_REPAIR)
 		return 1;
 
+	if (migr_type(dev) == MIGR_GEN_MIGR)
+		return 0;
+
 	migr_map = get_imsm_map(dev, 1);
 
-	if (migr_map->map_state == IMSM_T_STATE_NORMAL)
+	if ((migr_map->map_state == IMSM_T_STATE_NORMAL) &&
+	    (dev->vol.migr_type != MIGR_GEN_MIGR))
 		return 1;
 	else
 		return 0;



* [PATCH 11/27] imsm: Fill delta_disks field in getinfo_super()
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (9 preceding siblings ...)
  2010-12-06 13:22 ` [PATCH 10/27] imsm: Do not indicate resync during reshape Adam Kwolek
@ 2010-12-06 13:22 ` Adam Kwolek
  2010-12-06 13:22 ` [PATCH 12/27] Control reshape in mdadm Adam Kwolek
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

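The delta_disks field is not always filled in by getinfo_super(). Set reshape_active when the volume has a second (migration) map, and compute delta_disks as the difference between the member counts of the current and previous maps.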
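The rule added to getinfo_super_imsm_volume() can be isolated as follows. This is a hedged sketch: struct sketch_map and struct sketch_info are hypothetical reductions of struct imsm_map and struct mdinfo that keep only the fields involved. It assumes, as the hunk does, that map[1] is present only while a migration is in progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reductions of struct imsm_map and struct mdinfo. */
struct sketch_map { int num_members; };
struct sketch_info { int reshape_active; int delta_disks; };

/* map: current layout (map[0]); prev_map: pre-migration layout (map[1]),
 * or NULL when no migration is in progress. */
static void sketch_fill_delta(struct sketch_info *info,
			      const struct sketch_map *map,
			      const struct sketch_map *prev_map)
{
	assert(info != NULL && map != NULL);
	info->reshape_active = (prev_map != NULL);
	info->delta_disks = info->reshape_active
		? map->num_members - prev_map->num_members
		: 0;
}
```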

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 62c4245..5c3bd7b 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -1511,6 +1511,7 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 	struct intel_super *super = st->sb;
 	struct imsm_dev *dev = get_imsm_dev(super, super->current_vol);
 	struct imsm_map *map = get_imsm_map(dev, 0);
+	struct imsm_map *prev_map = get_imsm_map(dev, 1);
 	struct dl *dl;
 	char *devname;
 	int map_disks = info->array.raid_disks;
@@ -1542,7 +1543,11 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 	info->component_size	  = __le32_to_cpu(map->blocks_per_member);
 	memset(info->uuid, 0, sizeof(info->uuid));
 	info->recovery_start = MaxSector;
-	info->reshape_active = 0;
+	info->reshape_active = (prev_map != NULL);
+	if (info->reshape_active)
+		info->delta_disks = map->num_members - prev_map->num_members;
+	else
+		info->delta_disks = 0;
 
 	if (map->map_state == IMSM_T_STATE_UNINITIALIZED || dev->vol.dirty) {
 		info->resync_start = 0;
@@ -1599,7 +1604,7 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 			}
 		}
 	}
-}				
+}
 
 /* check the config file to see if we can return a real uuid for this spare */
 static void fixup_container_spare_uuid(struct mdinfo *inf)



* [PATCH 12/27] Control reshape in mdadm
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (10 preceding siblings ...)
  2010-12-06 13:22 ` [PATCH 11/27] imsm: Fill delta_disks field in getinfo_super() Adam Kwolek
@ 2010-12-06 13:22 ` Adam Kwolek
  2010-12-06 13:22 ` [PATCH 13/27] Finalize reshape after adding disks to array Adam Kwolek
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

When managemon starts the reshape with sync_max set to 0, mdadm is already waiting for it in manage_reshape().
Once the array reaches the reshape state, the manage_reshape() handler checks whether all metadata updates are in place.
If they are not, mdadm has to wait until the updates reach the array.
It then drives the reshape using the common child_grow() code and waits until the reshape finishes.
When it does, mdadm sets the array size to the value stored in the metadata and, if necessary, performs a backward takeover to raid0.

If manage_reshape() finds the array idle (instead of in the reshape state), this is treated as an error and the process is terminated.
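The wait performed by imsm_check_reshape_conditions() can be modelled as a classification of successive sync_action readings. This is a hedged sketch. sketch_classify() and sketch_wait_for_reshape() are hypothetical, and the readings array stands in for the sysfs reads that the patch repeats once per second, up to about 60 times. The sync_max and slot-mismatch checks performed by the patch are not modelled.

```c
#include <assert.h>
#include <string.h>

enum sketch_wait { SKETCH_READY = 0, SKETCH_WAIT = 1, SKETCH_ERROR = -1 };

/* Classify one sync_action reading (sysfs values end with '\n'). */
static enum sketch_wait sketch_classify(const char *raw)
{
	char buf[32];
	size_t len;

	assert(raw != NULL);
	strncpy(buf, raw, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	len = strlen(buf);
	if (len > 0 && buf[len - 1] == '\n')
		buf[len - 1] = '\0';          /* drop the trailing newline */

	if (strcmp(buf, "reshape") == 0)
		return SKETCH_READY;          /* managemon started the reshape */
	if (strcmp(buf, "frozen") == 0)
		return SKETCH_WAIT;           /* not started yet, poll again */
	return SKETCH_ERROR;                  /* "idle" or any other state */
}

/* readings[i] is the value seen on poll i.  Returns 0 once md reports
 * "reshape", 1 on an error state or when the polls run out (timeout). */
static int sketch_wait_for_reshape(const char *const *readings, int n)
{
	for (int i = 0; i < n; i++) {
		switch (sketch_classify(readings[i])) {
		case SKETCH_READY:
			return 0;
		case SKETCH_ERROR:
			return 1;
		case SKETCH_WAIT:
			break;
		}
	}
	return 1;
}
```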

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 Grow.c        |   16 +-
 mdadm.h       |    6 +
 mdmon.c       |   52 +++++
 super-intel.c |  561 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 625 insertions(+), 10 deletions(-)

diff --git a/Grow.c b/Grow.c
index ecaeb39..b5442a5 100644
--- a/Grow.c
+++ b/Grow.c
@@ -453,10 +453,6 @@ static __u32 bsb_csum(char *buf, int len)
 	return __cpu_to_le32(csum);
 }
 
-static int child_grow(int afd, struct mdinfo *sra, unsigned long blocks,
-		      int *fds, unsigned long long *offsets,
-		      int disks, int chunk, int level, int layout, int data,
-		      int dests, int *destfd, unsigned long long *destoffsets);
 static int child_shrink(int afd, struct mdinfo *sra, unsigned long blocks,
 			int *fds, unsigned long long *offsets,
 			int disks, int chunk, int level, int layout, int data,
@@ -487,7 +483,7 @@ static int freeze_container(struct supertype *st)
 	return 1;
 }
 
-static void unfreeze_container(struct supertype *st)
+void unfreeze_container(struct supertype *st)
 {
 	int container_dev = (st->container_dev != NoMdDev
 			     ? st->container_dev : st->devnum);
@@ -543,7 +539,7 @@ static void unfreeze(struct supertype *st, int frozen)
 	}
 }
 
-static void wait_reshape(struct mdinfo *sra)
+void wait_reshape(struct mdinfo *sra)
 {
 	int fd = sysfs_get_fd(sra, NULL, "sync_action");
 	char action[20];
@@ -2203,10 +2199,10 @@ static void validate(int afd, int bfd, unsigned long long offset)
 	}
 }
 
-static int child_grow(int afd, struct mdinfo *sra, unsigned long stripes,
-		      int *fds, unsigned long long *offsets,
-		      int disks, int chunk, int level, int layout, int data,
-		      int dests, int *destfd, unsigned long long *destoffsets)
+int child_grow(int afd, struct mdinfo *sra, unsigned long stripes,
+	       int *fds, unsigned long long *offsets,
+	       int disks, int chunk, int level, int layout, int data,
+	       int dests, int *destfd, unsigned long long *destoffsets)
 {
 	char *buf;
 	int degraded = 0;
diff --git a/mdadm.h b/mdadm.h
index 20f65cc..4563d14 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -474,6 +474,7 @@ extern int sysfs_add_disk(struct mdinfo *sra, struct mdinfo *sd, int resume);
 extern int sysfs_disk_to_scsi_id(int fd, __u32 *id);
 extern int sysfs_unique_holder(int devnum, long rdev);
 extern int sysfs_freeze_array(struct mdinfo *sra);
+extern void wait_reshape(struct mdinfo *sra);
 extern int load_sys(char *path, char *buf);
 extern int reshape_prepare_fdlist(char *devname,
 				  struct mdinfo *sra,
@@ -495,6 +496,11 @@ extern int reshape_open_backup_file(char *backup,
 extern unsigned long compute_backup_blocks(int nchunk, int ochunk,
 					   unsigned int ndata, unsigned int odata);
 extern struct mdinfo *sysfs_get_unused_spares(int container_fd, int fd);
+extern int child_grow(int afd, struct mdinfo *sra, unsigned long stripes,
+		      int *fds, unsigned long long *offsets,
+		      int disks, int chunk, int level, int layout, int data,
+		      int dests, int *destfd, unsigned long long *destoffsets);
+extern void unfreeze_container(struct supertype *st);
 
 extern int save_stripes(int *source, unsigned long long *offsets,
 			int raid_disks, int chunk_size, int level, int layout,
diff --git a/mdmon.c b/mdmon.c
index 413ee29..271833f 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -530,3 +530,55 @@ void map_free(struct map_ent *map)
 {
 }
 
+void unfreeze_container(struct supertype *st)
+{
+}
+
+void wait_reshape(struct mdinfo *sra)
+{
+}
+
+unsigned long compute_backup_blocks(int nchunk, int ochunk,
+				    unsigned int ndata, unsigned int odata)
+{
+	return 0;
+}
+
+
+int reshape_prepare_fdlist(char *devname,
+			   struct mdinfo *sra,
+			   int raid_disks,
+			   int nrdisks,
+			   unsigned long blocks,
+			   char *backup_file,
+			   int *fdlist,
+			   unsigned long long *offsets)
+{
+	return 0;
+}
+
+int reshape_open_backup_file(char *backup_file,
+			     int fd,
+			     char *devname,
+			     long blocks,
+			     int *fdlist,
+			     unsigned long long *offsets)
+{
+	return -1;
+}
+
+int child_grow(int afd, struct mdinfo *sra,
+	       unsigned long stripes, int *fds, unsigned long long *offsets,
+	       int disks, int chunk, int level, int layout, int data,
+	       int dests, int *destfd, unsigned long long *destoffsets)
+{
+	return 1;
+}
+
+void reshape_free_fdlist(int *fdlist,
+			 unsigned long long *offsets,
+			 int size)
+{
+	;
+}
+
diff --git a/super-intel.c b/super-intel.c
index 5c3bd7b..5f96af2 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -26,6 +26,7 @@
 #include <scsi/sg.h>
 #include <ctype.h>
 #include <dirent.h>
+#include <sys/mman.h>
 
 /* MPB == Metadata Parameter Block */
 #define MPB_SIGNATURE "Intel Raid ISM Cfg Sig. "
@@ -6680,10 +6681,13 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 			}
 		} else
 			dprintf("imsm: Operation is not allowed on container\n");
+		if (ret_val)
+			unfreeze_container(st);
 	} else
 		dprintf("imsm: not a container operation\n");
 
 	dprintf("imsm: reshape_super Exit code = %i\n", ret_val);
+
 	return ret_val;
 }
 
@@ -6749,6 +6753,13 @@ int imsm_reshape_array_set_slots(struct active_array *a)
 
 	return imsm_reshape_array_manage_new_slots(super, inst, a->devnum, 1);
 }
+
+int imsm_reshape_array_count_slots_mismatches(struct intel_super *super, int inst, int devnum)
+{
+
+	return imsm_reshape_array_manage_new_slots(super, inst, devnum, 0);
+}
+
 /* imsm_reshape_array_manage_new_slots()
  * returns: number of corrected slots for correct == 1
  *          counted number of different slots for correct == 0
@@ -7023,6 +7034,555 @@ imsm_reshape_array_exit:
 	return disk_list;
 }
 
+int imsm_grow_manage_size(struct supertype *st, struct mdinfo *sra, int current_vol)
+{
+	int ret_val = 0;
+	struct mdinfo *info = NULL;
+	unsigned long long size;
+	int container_fd;
+	unsigned long long current_size = 0;
+
+	/* finalize current volume reshape
+	 * for external meta size has to be managed by mdadm
+	 * read size set in meta and put it to md when
+	 * reshape is finished.
+	 */
+
+	if (sra == NULL) {
+		dprintf("Error: imsm_grow_manage_size(): sra == NULL\n");
+		goto exit_grow_manage_size_ext_meta;
+	}
+	wait_reshape(sra);
+
+	/* reshape has finished, update md size
+	 * get per-device size and multiply by data disks
+	 */
+	container_fd = open_dev(st->container_dev);
+	if (container_fd < 0) {
+		dprintf("Error: imsm_grow_manage_size(): container_fd < 0\n");
+		goto exit_grow_manage_size_ext_meta;
+	}
+	st->ss->load_super(st, container_fd, NULL);
+	info = sysfs_read(container_fd, 0, GET_LEVEL|GET_VERSION|GET_DEVS|GET_STATE);
+	close(container_fd);
+	if (info == NULL) {
+		dprintf("imsm: Cannot get device info.\n");
+		goto exit_grow_manage_size_ext_meta;
+	}
+	if (current_vol > -1) {
+		struct intel_super *super;
+
+		super = st->sb;
+		super->current_vol = current_vol;
+	}
+	st->ss->getinfo_super(st, info, NULL);
+	size = info->custom_array_size/2;
+	sysfs_get_ll(sra, NULL, "array_size", &current_size);
+	dprintf("imsm_grow_manage_size(): current size is %llu, set size to %llu\n", current_size, size);
+	sysfs_set_num(sra, NULL, "array_size", size);
+
+	ret_val = 1;
+
+exit_grow_manage_size_ext_meta:
+	sysfs_free(info);
+	return ret_val;
+}
+
+int imsm_child_grow(struct supertype *st, char *devname, int fd_in, struct mdinfo *sra, int current_vol, char *backup)
+{
+	int ret_val = 0;
+	int nrdisks;
+	int *fdlist;
+	unsigned long long *offsets;
+	unsigned int ndata, odata;
+	int ndisks, odisks;
+	unsigned long blocks, stripes;
+	int d;
+	struct mdinfo *sd;
+	int validate_fd;
+
+	nrdisks = ndisks = odisks = sra->array.raid_disks;
+	odisks -= sra->delta_disks;
+	odata = odisks-1;
+	ndata = ndisks-1;
+	fdlist = malloc((1+nrdisks) * sizeof(int));
+	offsets = malloc((1+nrdisks) * sizeof(offsets[0]));
+	if (!fdlist || !offsets) {
+		fprintf(stderr, Name ": malloc failed: grow aborted\n");
+		ret_val = 1;
+		if (fdlist)
+			free(fdlist);
+		if (offsets)
+			free(offsets);
+		return ret_val;
+	}
+	blocks = compute_backup_blocks(sra->array.chunk_size,
+				       sra->array.chunk_size,
+				       ndata, odata);
+
+	/* set MD_DISK_SYNC flag to open all devices that have to be backed up
+	 */
+	for (sd = sra->devs; sd; sd = sd->next) {
+		if ((sd->disk.raid_disk > -1) &&
+		    ((unsigned int)sd->disk.raid_disk < odata)) {
+			sd->disk.state |= (1<<MD_DISK_SYNC);
+			sd->disk.state &= ~(1<<MD_DISK_FAULTY);
+		} else {
+			sd->disk.state |= (1<<MD_DISK_FAULTY);
+			sd->disk.state &= ~(1<<MD_DISK_SYNC);
+		}
+	}
+#ifdef DEBUG
+	dprintf("FD list disk inspection:\n");
+	for (sd = sra->devs; sd; sd = sd->next) {
+		char *dn = map_dev(sd->disk.major,
+				   sd->disk.minor, 1);
+		dprintf("Disk %s", dn);
+		dprintf("\tstate = %i\n", sd->disk.state);
+	}
+#endif
+	d = reshape_prepare_fdlist(devname, sra, odisks,
+				    nrdisks, blocks, NULL,
+				    fdlist, offsets);
+	if (d < 0) {
+		fprintf(stderr, Name ": cannot prepare device list\n");
+		free(fdlist);
+		free(offsets);
+		ret_val = 1;
+		return ret_val;
+	}
+
+	if (reshape_open_backup_file(backup, fd_in, "imsm",
+				     (signed)blocks,
+				     fdlist, offsets) == 0) {
+		free(fdlist);
+		free(offsets);
+		ret_val = 1;
+		return ret_val;
+	}
+	d++;
+
+	mlockall(MCL_FUTURE);
+	if (ret_val == 0) {
+		if (check_env("MDADM_GROW_VERIFY"))
+			validate_fd = fd_in;
+		else
+			validate_fd = -1;
+
+		sra->array.raid_disks = odisks;
+		sra->new_level = sra->array.level;
+		sra->new_layout = sra->array.layout;
+		sra->new_chunk = sra->array.chunk_size;
+
+		stripes = blocks / (sra->array.chunk_size/512) / odata;
+		child_grow(validate_fd, sra, stripes,
+			fdlist, offsets,
+			odisks, sra->array.chunk_size,
+			sra->array.level, sra->array.layout, odata,
+			d - odisks, fdlist + odisks, offsets + odisks);
+		imsm_grow_manage_size(st, sra, current_vol);
+	}
+	reshape_free_fdlist(fdlist, offsets, d);
+
+	if (backup)
+		unlink(backup);
+
+	return ret_val;
+}
+
+void return_to_raid0(struct mdinfo *sra)
+{
+	if (sra->array.level == 4) {
+		dprintf("Execute backward takeover to raid0\n");
+		sysfs_set_str(sra, NULL, "level", "raid0");
+	}
+}
+
+int imsm_check_reshape_conditions(int fd, struct supertype *st, int current_array)
+{
+	char buf[PATH_MAX];
+	struct mdinfo *info = NULL;
+	int arrays_in_reshape_state = 0;
+	int wait_counter = 0;
+	int i;
+	int ret_val = 0;
+	struct intel_super *super = st->sb;
+	struct imsm_super *mpb = super->anchor;
+	int wrong_slots_counter;
+
+	/* wait until all arrays are in reshape state
+	 * or an error occurs (idle state detected)
+	 */
+	while ((arrays_in_reshape_state == 0) &&
+	       (ret_val == 0)) {
+		arrays_in_reshape_state = 0;
+		int temp_array;
+
+		if (wait_counter)
+			sleep(1);
+
+		for (i = 0; i < mpb->num_raid_devs; i++) {
+			int sync_max;
+			int len;
+
+			/* check array state in md
+			 */
+			st->ss->load_super(st, fd, NULL);
+			if (st->sb == NULL) {
+				dprintf("cannot get sb\n");
+				ret_val = 1;
+				break;
+			}
+			info = sysfs_read(fd, 0, GET_LEVEL|GET_VERSION|GET_DEVS|GET_STATE);
+			if (info == NULL) {
+				dprintf("imsm: Cannot get device info.\n");
+				break;
+			}
+			super = st->sb;
+			super->current_vol = i;
+			st->ss->getinfo_super(st, info, NULL);
+
+			find_array_minor(info->name, 1, st->devnum, &temp_array);
+			if (temp_array != current_array) {
+				if (temp_array < 0) {
+					ret_val = -1;
+					break;
+				}
+				sysfs_free(info);
+				info = NULL;
+				continue;
+			}
+			sprintf(info->sys_name, "md%i", current_array);
+			if (sysfs_get_str(info, NULL, "raid_disks", buf, sizeof(buf)) < 0) {
+				dprintf("cannot get raid_disks\n");
+				ret_val = 1;
+				break;
+			}
+			/* sync_max should be always set to 0
+			 */
+			if (sysfs_get_str(info, NULL, "sync_max", buf, sizeof(buf)) < 0) {
+				dprintf("cannot get sync_max\n");
+				ret_val = 1;
+				break;
+			}
+			len = strlen(buf)-1;
+			if (len < 0)
+				len = 0;
+			*(buf+len) = 0;
+			sync_max = atoi(buf);
+			if (sync_max != 0) {
+				dprintf("sync_max has wrong value (%s)\n", buf);
+				sysfs_free(info);
+				info = NULL;
+				continue;
+			}
+			if (sysfs_get_str(info, NULL, "sync_action", buf, sizeof(buf)) < 0) {
+				dprintf("cannot get sync_action\n");
+				ret_val = 1;
+				break;
+			}
+			len = strlen(buf)-1;
+			if (len < 0)
+				len = 0;
+			*(buf+len) = 0;
+			if (strncmp(buf, "idle", 7) == 0) {
+				dprintf("imsm: Error found array in idle state during reshape initialization\n");
+				ret_val = 1;
+				break;
+			}
+			if (strncmp(buf, "reshape", 7) == 0) {
+				arrays_in_reshape_state++;
+			} else {
+				if (strncmp(buf, "frozen", 6) != 0) {
+					*(buf+strlen(buf)) = 0;
+					dprintf("imsm: Error unexpected array state (%s) during reshape initialization\n",
+						buf);
+					ret_val = 1;
+					break;
+				}
+			}
+			/* this device looks ok, so
+			 * check if slots are set correctly
+			 */
+			super = st->sb;
+			wrong_slots_counter = imsm_reshape_array_count_slots_mismatches(super, i, atoi(info->sys_name+2));
+			sysfs_free(info);
+			info = NULL;
+			if (wrong_slots_counter != 0) {
+				dprintf("Slots for correction %i.\n", wrong_slots_counter);
+				ret_val = 1;
+				goto exit_imsm_check_reshape_conditions;
+			}
+		}
+		sysfs_free(info);
+		info = NULL;
+		wait_counter++;
+		if (wait_counter > 60) {
+			dprintf("exit on timeout, container is not prepared to reshape\n");
+			ret_val = 1;
+		}
+	}
+
+exit_imsm_check_reshape_conditions:
+	sysfs_free(info);
+	info = NULL;
+
+	return ret_val;
+}
+
+int imsm_manage_container_reshape(struct supertype *st, char *backup)
+{
+	int ret_val = 1;
+	char buf[PATH_MAX];
+	struct intel_super *super = st->sb;
+	struct imsm_super *mpb = super->anchor;
+	int fd;
+	struct mdinfo *info = NULL;
+	struct mdinfo info2;
+	int delta_disks;
+	struct geo_params geo;
+#ifdef DEBUG
+	int i;
+#endif
+
+	memset(&geo, 0, sizeof(struct geo_params));
+	/* verify reshape conditions
+	 * for single volume reshape exit only and reuse Grow_reshape() code
+	 */
+	if (st->container_dev != st->devnum) {
+		dprintf("imsm: imsm_manage_container_reshape() detects volume reshape (devnum = %i), exit.\n", st->devnum);
+		return ret_val;
+	}
+
+	if (backup == NULL) {
+		fprintf(stderr, Name ": Cannot grow - need backup-file\n");
+		return ret_val;
+	}
+
+	geo.dev_name = devnum2devname(st->devnum);
+	if (geo.dev_name == NULL) {
+		dprintf("imsm: Error: imsm_manage_reshape(): cannot get device name.\n");
+		return ret_val;
+	}
+
+	snprintf(buf, PATH_MAX, "/dev/%s", geo.dev_name);
+	fd = open(buf , O_RDONLY | O_DIRECT);
+	if (fd < 0) {
+		dprintf("imsm: cannot open device\n");
+		goto imsm_manage_container_reshape_exit;
+	}
+
+	/* send pings to roll managemon and monitor
+	 */
+	ping_manager(geo.dev_name);
+	ping_monitor(geo.dev_name);
+
+#ifdef DEBUG
+	/* device list for reshape
+	 */
+	dprintf("Arrays to run reshape (no: %i)\n", mpb->num_raid_devs);
+	for (i = 0; i < mpb->num_raid_devs; i++) {
+		struct imsm_dev *dev = get_imsm_dev(super, i);
+		dprintf("\tDevice: %s\n", dev->volume);
+	}
+#endif
+
+	info2.devs = NULL;
+	super = st->sb;
+	super->current_vol = 0;
+	st->ss->getinfo_super(st, &info2, NULL);
+	geo.dev_id = -1;
+	find_array_minor(info2.name, 1, st->devnum, &geo.dev_id);
+	if (geo.dev_id < 0) {
+		dprintf("imsm. Error.Cannot get first array.\n");
+		goto imsm_manage_container_reshape_exit;
+	}
+	if (imsm_check_reshape_conditions(fd, st, geo.dev_id)) {
+		dprintf("imsm. Error. Wrong reshape conditions.\n");
+		goto imsm_manage_container_reshape_exit;
+	}
+	geo.raid_disks = info2.array.raid_disks;
+	dprintf("Container is ready for reshape ...\n");
+	switch (fork()) {
+	case 0:
+		fprintf(stderr, Name ": Child forked to run and monitor reshape\n");
+		while (geo.dev_id > -1) {
+			int fd2 = -1;
+			int i;
+			int temp_array = -1;
+			char *array;
+
+			for (i = 0; i < mpb->num_raid_devs; i++) {
+				struct intel_super *super;
+
+				st->ss->load_super(st, fd, NULL);
+				if (st->sb == NULL) {
+					dprintf("cannot get sb\n");
+					ret_val = 1;
+					goto imsm_manage_container_reshape_exit;
+				}
+				info2.devs = NULL;
+				super = st->sb;
+				super->current_vol = i;
+				st->ss->getinfo_super(st, &info2, NULL);
+				find_array_minor(info2.name, 1, st->devnum, &temp_array);
+				if (temp_array == geo.dev_id) {
+					dprintf("Checking slots for device md%i\n", geo.dev_id);
+					break;
+				}
+			}
+			snprintf(buf, PATH_MAX, "/dev/md%i", geo.dev_id);
+			dprintf("Prepare to reshape for device md%i\n", geo.dev_id);
+			fd2 = open(buf, O_RDWR | O_DIRECT);
+			if (fd2 < 0) {
+				dprintf("Reshape is broken (cannot open array)\n");
+				ret_val = 1;
+				goto imsm_manage_container_reshape_exit;
+			}
+			info = sysfs_read(fd2, 0, GET_VERSION | GET_LEVEL | GET_DEVS | GET_STATE |\
+						 GET_COMPONENT | GET_OFFSET | GET_CACHE |\
+						 GET_CHUNK | GET_DISKS | GET_DEGRADED |
+						 GET_SIZE | GET_LAYOUT);
+			if (info == NULL) {
+				dprintf("Reshape is broken (cannot read sysfs)\n");
+				close(fd2);
+				ret_val = 1;
+				goto imsm_manage_container_reshape_exit;
+			}
+			delta_disks = info->delta_disks;
+			super = st->sb;
+
+			if (sysfs_get_str(info, NULL, "sync_completed", buf, sizeof(buf)) >= 0) {
+				/* check whether any array was reshaped in the previous pass;
+				 * if not, we have to skip the sync_completed condition
+				 * and try to reshape the arrays
+				 */
+				if ((*buf == '0') ||
+				     /* or this array was already reshaped */
+				     (strncmp(buf, "none", 4) == 0)) {
+					dprintf("Skip this array, sync_completed is %s\n", buf);
+					geo.dev_id = -1;
+					sysfs_free(info);
+					info = NULL;
+					close(fd2);
+					continue;
+				}
+			} else {
+				dprintf("Reshape is broken (cannot read sync_complete)\n");
+				dprintf("Array level is: %i\n", info->array.level);
+				ret_val = 1;
+				close(fd2);
+				goto imsm_manage_container_reshape_exit;
+			}
+			snprintf(buf, PATH_MAX, "/dev/md/%s", info2.name);
+			info->delta_disks = info2.delta_disks;
+
+			delta_disks = info->array.raid_disks - geo.raid_disks;
+			geo.raid_disks = info->array.raid_disks;
+			if (info->array.level == 4) {
+				geo.raid_disks--;
+				delta_disks--;
+			}
+
+			super = st->sb;
+			super->current_vol = i;
+			ret_val = imsm_child_grow(st, buf,
+						  fd2,
+						  info,
+						  i,
+						  backup);
+			return_to_raid0(info);
+			sysfs_free(info);
+			info = NULL;
+			close(fd2);
+			i++;
+			if (ret_val) {
+				dprintf("Reshape is broken (cannot reshape)\n");
+				ret_val = 1;
+				goto imsm_manage_container_reshape_exit;
+			}
+			geo.dev_id = -1;
+			array = get_volume_for_olce(st, geo.raid_disks);
+			if (array) {
+				struct imsm_update_reshape *u;
+				dprintf("imsm: next volume to reshape is: %s\n", array);
+				find_array_minor(array, 1, st->devnum, &geo.dev_id);
+				if (geo.dev_id > -1) {
+					/* send next array update
+					 */
+					dprintf("imsm: Preparing metadata update for: %s (md%i)\n", array, geo.dev_id);
+					st->update_tail = &st->updates;
+					u = imsm_create_metadata_update_for_reshape(st, &geo);
+					if (u) {
+						u->reshape_delta_disks = delta_disks;
+						append_metadata_update(st, u, u->update_memory_size);
+						flush_metadata_updates(st);
+						/* send pings to roll managemon and monitor
+						 */
+						ping_manager(geo.dev_name);
+						ping_monitor(geo.dev_name);
+
+						if (imsm_check_reshape_conditions(fd, st, geo.dev_id)) {
+							dprintf("imsm. Error. Wrong reshape conditions.\n");
+							ret_val = 1;
+							geo.dev_id = -1;
+						}
+					} else
+						geo.dev_id = -1;
+				}
+				free(array);
+			}
+		}
+		unfreeze_container(st);
+		close(fd);
+		break;
+	case -1:
+		fprintf(stderr, Name ": Cannot run child to monitor reshape: %s\n",
+			strerror(errno));
+		ret_val = 1;
+		break;
+	default:
+		/* The child will take care of unfreezing the array */
+		break;
+	}
+
+imsm_manage_container_reshape_exit:
+	sysfs_free(info);
+	if (fd > -1)
+		close(fd);
+	if (geo.dev_name)
+		free(geo.dev_name);
+
+	return ret_val;
+}
+
+int imsm_manage_reshape(struct supertype *st, char *backup)
+{
+	int ret_val = 0;
+
+	dprintf("imsm: manage_reshape() called\n");
+
+	if (experimental() == 0)
+		return ret_val;
+
+	/* verify reshape conditions
+	 * for single volume reshape exit only and reuse Grow_reshape() code
+	 */
+	if (st->container_dev != st->devnum) {
+		dprintf("imsm: manage_reshape() current volume devnum: %i\n", st->devnum);
+
+		return ret_val;
+	}
+	ret_val = imsm_manage_container_reshape(st, backup);
+	/* unfreeze on error and success
+	 * for any result this is end of work
+	 */
+	unfreeze_container(st);
+
+	return ret_val;
+}
+
+
 struct superswitch super_imsm = {
 #ifndef	MDASSEMBLE
 	.examine_super	= examine_super_imsm,
@@ -7059,6 +7619,7 @@ struct superswitch super_imsm = {
 	.default_geometry = default_geometry_imsm,
 	.get_disk_controller_domain = imsm_get_disk_controller_domain,
 	.reshape_super  = imsm_reshape_super,
+	.manage_reshape = imsm_manage_reshape,
 	.reshape_array	= imsm_reshape_array,
 
 	.external	= 1,


^ permalink raw reply related	[flat|nested] 36+ messages in thread
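
The sync_action check in imsm_check_reshape_conditions() (patch 12 above) reads a sysfs string, trims it and sorts it into "error", "frozen" or "reshape". A minimal standalone sketch of that classification follows; the helper names and the enum are hypothetical, not mdadm code. Two deliberate differences from the patch: only a trailing newline is removed, rather than always dropping the last byte, and comparisons are exact instead of prefix matches via strncmp().

```c
#include <string.h>

/* Verdict for one member array, following the checks in patch 12. */
enum sync_verdict {
	SYNC_ERROR,   /* "idle" or an unexpected state: abort reshape setup */
	SYNC_FROZEN,  /* frozen: acceptable, reshape not running yet */
	SYNC_RESHAPE  /* already reshaping: the caller counts these */
};

/* Drop a single trailing newline in place (sysfs values end in '\n').
 * The last character is checked before it is removed. */
static void trim_newline(char *buf)
{
	size_t len = strlen(buf);

	if (len > 0 && buf[len - 1] == '\n')
		buf[len - 1] = '\0';
}

static enum sync_verdict classify_sync_action(char *buf)
{
	trim_newline(buf);
	if (strcmp(buf, "reshape") == 0)
		return SYNC_RESHAPE;
	if (strcmp(buf, "frozen") == 0)
		return SYNC_FROZEN;
	/* "idle" and anything else are treated as errors */
	return SYNC_ERROR;
}
```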

* [PATCH 13/27] Finalize reshape after adding disks to array
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (11 preceding siblings ...)
  2010-12-06 13:22 ` [PATCH 12/27] Control reshape in mdadm Adam Kwolek
@ 2010-12-06 13:22 ` Adam Kwolek
  2010-12-06 13:22 ` [PATCH 14/27] Add reshape progress updating Adam Kwolek
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

When a reshape finishes, the monitor has to finalize it in the metadata.
To do this, set_array_state() is called.
This completes the migration and stores the metadata on the disks.

reshape_state is set to reshape_not_active.
This ends the reshape flow in mdmon.
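
The finalize condition is a falling-edge test: the previous action was a reshape and the current one is not. A minimal sketch of that test, assuming hypothetical enums and a cut-down array struct in place of mdmon's struct active_array (the counter stands in for the set_array_state() call):

```c
/* Hypothetical stand-ins for mdmon's sync_action / reshape state enums. */
enum action_sketch { ACT_IDLE, ACT_RESHAPE, ACT_RECOVER };
enum reshape_sketch { RESHAPE_NOT_ACTIVE, RESHAPE_IN_PROGRESS };

struct array_sketch {
	enum action_sketch prev_action;
	enum action_sketch curr_action;
	enum reshape_sketch reshape_state;
	int finalized;  /* counts where set_array_state() would be called */
};

/* Finalize once, when a reshape stops being the current action. */
static void check_finalize(struct array_sketch *a)
{
	if (a->curr_action != ACT_RESHAPE && a->prev_action == ACT_RESHAPE) {
		a->reshape_state = RESHAPE_NOT_ACTIVE; /* allow future rebuilds */
		a->finalized++;
	}
}
```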

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 monitor.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/monitor.c b/monitor.c
index 986fdb0..1e22444 100644
--- a/monitor.c
+++ b/monitor.c
@@ -305,6 +305,21 @@ static int read_and_act(struct active_array *a)
 		}
 	}
 
+	/* finalize reshape detection
+	 */
+	if ((a->curr_action != reshape) &&
+	    (a->prev_action == reshape)) {
+		/* set reshape_not_active
+		 * to allow for future rebuilds
+		 */
+		a->reshape_state = reshape_not_active;
+		/* A reshape has finished.
+		 * Some disks may be in sync now.
+		 */
+		a->container->ss->set_array_state(a, a->curr_state <= clean);
+		check_degraded = 1;
+	}
+
 	/* Check for failures and if found:
 	 * 1/ Record the failure in the metadata and unblock the device.
 	 *    FIXME update the kernel to stop notifying on failed drives when


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 14/27] Add reshape progress updating
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (12 preceding siblings ...)
  2010-12-06 13:22 ` [PATCH 13/27] Finalize reshape after adding disks to array Adam Kwolek
@ 2010-12-06 13:22 ` Adam Kwolek
  2010-12-06 13:22 ` [PATCH 15/27] WORKAROUND: md reports idle state during reshape start Adam Kwolek
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Reshape progress is not updated in mdmon.
This patch makes mdmon update reshape_progress while a reshape is running.
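
The progress arithmetic in this patch is new_data_disks = raid_disks + delta_disks minus parity, and reshape_progress = resync_start * new_data_disks. A small sketch of that arithmetic, assuming resync_start is a per-device position, follows; the function names are hypothetical. RAID6 stores two parity blocks per stripe, so this sketch subtracts two for level 6, whereas the hunk above decrements new_data_disks once for every parity level.

```c
/* Parity disks per stripe for the levels considered here. */
static int parity_disks(int level)
{
	switch (level) {
	case 4:
	case 5:
		return 1;
	case 6:
		return 2;
	default: /* RAID0: no parity */
		return 0;
	}
}

/* Data disks after the reshape: old member count plus delta, minus parity. */
static int new_data_disks(int raid_disks, int delta_disks, int level)
{
	return raid_disks + delta_disks - parity_disks(level);
}

/* Scale a per-device position to an array-wide progress value. */
static unsigned long long reshape_progress(unsigned long long resync_start,
					   int data_disks)
{
	return resync_start * (unsigned long long)data_disks;
}
```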

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c |    9 +++++++++
 mdmon.h     |    1 +
 monitor.c   |    3 +++
 3 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/managemon.c b/managemon.c
index a6e6880..5d81623 100644
--- a/managemon.c
+++ b/managemon.c
@@ -417,6 +417,7 @@ static void manage_member(struct mdstat_ent *mdstat,
 			struct metadata_update *updates = NULL;
 			struct mdinfo *newdev = NULL;
 			struct mdinfo *d;
+			int delta_disks = a->reshape_delta_disks;
 
 			newdev = newa->container->ss->reshape_array(newa, reshape_in_progress, &updates);
 			if (newdev) {
@@ -451,6 +452,13 @@ static void manage_member(struct mdstat_ent *mdstat,
 					/* reshape executed
 					 */
 					dprintf("Reshape was started\n");
+					newa->new_data_disks = newa->info.array.raid_disks + delta_disks;
+					if (a->info.array.level == 4)
+						newa->new_data_disks--;
+					if (a->info.array.level == 5)
+						newa->new_data_disks--;
+					if (a->info.array.level == 6)
+						newa->new_data_disks--;
 					replace_array(a->container, a, newa);
 					a = newa;
 				} else {
@@ -591,6 +599,7 @@ static void manage_new(struct mdstat_ent *mdstat,
 	new->container = container;
 
 	new->reshape_state = reshape_not_active;
+	new->new_data_disks = 0;
 
 	inst = to_subarray(mdstat, container->devname);
 
diff --git a/mdmon.h b/mdmon.h
index b869544..b922af4 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -48,6 +48,7 @@ struct active_array {
 
 	enum state_of_reshape reshape_state;
 	int reshape_delta_disks;
+	int new_data_disks;
 
 	int check_degraded; /* flag set by mon, read by manage */
 
diff --git a/monitor.c b/monitor.c
index 1e22444..207004c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -305,6 +305,9 @@ static int read_and_act(struct active_array *a)
 		}
 	}
 
+	if (a->curr_action == reshape)
+		a->info.reshape_progress = a->info.resync_start * a->new_data_disks;
+
 	/* finalize reshape detection
 	 */
 	if ((a->curr_action != reshape) &&


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 15/27] WORKAROUND: md reports idle state during reshape start
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (13 preceding siblings ...)
  2010-12-06 13:22 ` [PATCH 14/27] Add reshape progress updating Adam Kwolek
@ 2010-12-06 13:22 ` Adam Kwolek
  2010-12-06 13:22 ` [PATCH 16/27] FIX: core during getting map Adam Kwolek
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

At reshape start, md reports a reshape->idle->reshape state transition, so the reshape is wrongly detected as finished.
Finalize the reshape only when some progress has been made.
Once the reshape has really started, the idle state triggers finalization as usual.
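
Combined with the progress update from patch 14, which refreshes reshape_progress only while the action is "reshape", the threshold filters out the spurious idle at start. A replay sketch under those assumptions follows; the enum and function names are hypothetical, and the threshold of 2 is taken from the hunk below.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for mdmon's sync_action values. */
enum action { ACT_IDLE, ACT_RESHAPE };

/* Finalize on reshape -> non-reshape only after real progress. */
static bool should_finalize(enum action prev, enum action curr,
			    unsigned long long reshape_progress)
{
	return prev == ACT_RESHAPE && curr != ACT_RESHAPE &&
	       reshape_progress > 2;
}

/* Replay observed actions and positions; count finalizations.
 * Progress is only refreshed while a reshape is running. */
static int count_finalizations(const enum action *acts,
			       const unsigned long long *pos, int n)
{
	unsigned long long progress = 0;
	int fired = 0;

	for (int i = 0; i < n; i++) {
		if (acts[i] == ACT_RESHAPE)
			progress = pos[i];
		if (i > 0 && should_finalize(acts[i - 1], acts[i], progress))
			fired++;
	}
	return fired;
}
```

For the start-up sequence reshape, idle, reshape, reshape, idle, only the final idle triggers finalization.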

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 monitor.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/monitor.c b/monitor.c
index 207004c..5837b34 100644
--- a/monitor.c
+++ b/monitor.c
@@ -311,7 +311,8 @@ static int read_and_act(struct active_array *a)
 	/* finalize reshape detection
 	 */
 	if ((a->curr_action != reshape) &&
-	    (a->prev_action == reshape)) {
+	    (a->prev_action == reshape) &&
+	    (a->info.reshape_progress > 2)) {
 		/* set reshape_not_active
 		 * to allow for future rebuilds
 		 */


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 16/27] FIX: core during getting map
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (14 preceding siblings ...)
  2010-12-06 13:22 ` [PATCH 15/27] WORKAROUND: md reports idle state during reshape start Adam Kwolek
@ 2010-12-06 13:22 ` Adam Kwolek
  2010-12-06 13:22 ` [PATCH 17/27] Enable reshape for subarrays Adam Kwolek
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

While walking the container, the end condition can depend on the value returned by get_imsm_map(),
so the function is protected against invalid input (a NULL dev), and getinfo_super_imsm_volume() returns early when no map is available.
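
The guard pattern here is simple: check the input pointer before taking the address of a member, and let callers treat NULL as "stop". A sketch with hypothetical, simplified structs follows. The real imsm maps are variable-sized and located with sizeof_imsm_map(), not a fixed two-element array.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct imsm_dev / imsm_map. */
struct map_sketch { int num_members; };
struct dev_sketch {
	int migr_state;
	struct map_sketch map[2];
};

/* First map, or the second one during migration; NULL otherwise.
 * Checking dev first turns a failed volume lookup into NULL
 * instead of a pointer computed from NULL. */
static struct map_sketch *get_map(struct dev_sketch *dev, int second_map)
{
	if (dev == NULL)
		return NULL;
	if (second_map && !dev->migr_state)
		return NULL;
	return &dev->map[second_map ? 1 : 0];
}

/* Caller side: bail out cleanly when no map is available. */
static int member_count(struct dev_sketch *dev)
{
	struct map_sketch *map = get_map(dev, 0);

	return map ? map->num_members : -1;
}
```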

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 5f96af2..aced38c 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -435,8 +435,12 @@ static size_t sizeof_imsm_map(struct imsm_map *map)
 
 struct imsm_map *get_imsm_map(struct imsm_dev *dev, int second_map)
 {
-	struct imsm_map *map = &dev->vol.map[0];
+	struct imsm_map *map;
+
+	if (dev == NULL)
+		return NULL;
 
+	map = &dev->vol.map[0];
 	if (second_map && !dev->vol.migr_state)
 		return NULL;
 	else if (second_map) {
@@ -1517,6 +1521,9 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 	char *devname;
 	int map_disks = info->array.raid_disks;
 
+	if (map == NULL)
+		return;
+
 	for (dl = super->disks; dl; dl = dl->next)
 		if (dl->raiddisk == info->disk.raid_disk)
 			break;


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 17/27] Enable reshape for subarrays
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (15 preceding siblings ...)
  2010-12-06 13:22 ` [PATCH 16/27] FIX: core during getting map Adam Kwolek
@ 2010-12-06 13:22 ` Adam Kwolek
  2010-12-06 13:23 ` [PATCH 18/27] Change manage_reshape() placement Adam Kwolek
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Reshape of subarrays is blocked in Grow.c because it was not implemented.
This patch allows subarray reshape when the metadata handler is external and provides both reshape_super() and manage_reshape().
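
The new condition in Grow_reshape() is a capability check on the metadata handler. A minimal sketch of that predicate, assuming a hypothetical cut-down handler struct (the real superswitch callbacks take different arguments):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of struct superswitch: only what the check uses. */
struct ss_sketch {
	int external;
	int (*reshape_super)(void *st);
	int (*manage_reshape)(void *st, char *backup);
};

/* Subarray reshape needs external metadata whose handler implements
 * both reshape_super() and manage_reshape(). */
static bool can_reshape_subarray(const struct ss_sketch *ss)
{
	return ss->external && ss->reshape_super != NULL &&
	       ss->manage_reshape != NULL;
}

/* Placeholder callbacks so the check can be exercised. */
static int dummy_reshape_super(void *st) { (void)st; return 0; }
static int dummy_manage(void *st, char *backup)
{
	(void)st;
	(void)backup;
	return 0;
}
```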

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 Grow.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/Grow.c b/Grow.c
index b5442a5..f03dcc9 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1518,7 +1518,9 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 		 * layout/chunksize/raid_disks can be changed
 		 * though the kernel may not support it all.
 		 */
-		if (subarray) {
+		if ((subarray) &&
+		    !(st->ss->external &&
+		      st->ss->reshape_super && st->ss->manage_reshape)) {
 			fprintf(stderr, Name ": Cannot reshape subarrays yet\n");
 			break;
 		}


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 18/27] Change manage_reshape() placement
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (16 preceding siblings ...)
  2010-12-06 13:22 ` [PATCH 17/27] Enable reshape for subarrays Adam Kwolek
@ 2010-12-06 13:23 ` Adam Kwolek
  2010-12-06 13:23 ` [PATCH 19/27] Migration: raid5->raid0 Adam Kwolek
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

After the reshape_super() call, manage_reshape() should do the same things
as Grow_reshape() does for native metadata (when executed on an array).
The only difference is at the end of the reshape, when md finishes its work.
For external metadata, the array size is managed outside md,
so a metadata-specific action is required there.
Therefore the manage_reshape() call is moved later in the flow, so that only
the necessary actions are added to the common path and existing code is not duplicated.
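
The reordering can be modelled as "common steps first, then a metadata-specific finish". The sketch below records which steps run. The step names and structs are hypothetical labels, and the native branch is simplified: in Grow.c the wait and level change only happen when the level actually changes.

```c
#include <stddef.h>

/* Hypothetical labels for the stages of the reordered tail. */
enum grow_step { STEP_START, STEP_BACKUP, STEP_MANAGE, STEP_WAIT, STEP_SET_LEVEL };

struct handler_sketch {
	int external;
	int (*manage_reshape)(char *backup);
};

struct trace {
	enum grow_step steps[8];
	int n;
};

static void record(struct trace *t, enum grow_step s)
{
	if (t->n < 8)
		t->steps[t->n++] = s;
}

static void grow_tail(const struct handler_sketch *h, struct trace *t,
		      char *backup)
{
	record(t, STEP_START);   /* shared with native metadata */
	record(t, STEP_BACKUP);  /* shared backup / critical-section handling */
	if (h->external && h->manage_reshape) {
		h->manage_reshape(backup);
		record(t, STEP_MANAGE);  /* metadata-specific finish */
		return;
	}
	record(t, STEP_WAIT);        /* native path: wait for md ... */
	record(t, STEP_SET_LEVEL);   /* ... then switch the level */
}

static int noop_manage(char *backup)
{
	(void)backup;
	return 0;
}
```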

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 Grow.c |   23 +++++++++++++----------
 1 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/Grow.c b/Grow.c
index f03dcc9..b51c4d0 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1783,14 +1783,6 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 			break;
 		}
 
-		if (st->ss->external) {
-			/* metadata handler takes it from here */
-			ping_manager(container);
-			st->ss->manage_reshape(st, backup_file);
-			frozen = 0;
-			break;
-		}
-
 		/* set up the backup-super-block.  This requires the
 		 * uuid from the array.
 		 */
@@ -1854,6 +1846,15 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 						       d - odisks, fdlist+odisks, offsets+odisks);
 			if (backup_file && done)
 				unlink(backup_file);
+
+			/* manage/finalize reshape in metadata specific way
+			 */
+			close(fd);
+			if (st->ss->external && st->ss->manage_reshape) {
+				st->ss->manage_reshape(st, backup_file);
+				break;
+			}
+
 			if (level != UnSet && level != array.level) {
 				/* We need to wait for the reshape to finish
 				 * (which will have happened unless odata < ndata)
@@ -1864,8 +1865,10 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 				if (c == NULL)
 					exit(0);/* not possible */
 
-				if (odata < ndata)
-					wait_reshape(sra);
+				/* the child process always has to wait for the reshape
+				 * to finish before it can unfreeze the array
+				 */
+				wait_reshape(sra);
 				err = sysfs_set_str(sra, NULL, "level", c);
 				if (err)
 					fprintf(stderr, Name ": %s: could not set level to %s\n",


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 19/27] Migration: raid5->raid0
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (17 preceding siblings ...)
  2010-12-06 13:23 ` [PATCH 18/27] Change manage_reshape() placement Adam Kwolek
@ 2010-12-06 13:23 ` Adam Kwolek
  2010-12-06 13:23 ` [PATCH 20/27] Detect level change Adam Kwolek
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Add an implementation for migration from raid5 to raid0 in one step.
For this migration (and for other external metadata cases) the flow
used for Expansion is reused. Array parameters are therefore updated
in managemon, based on the metadata update that was sent. To do this, setting
the md parameters in Grow.c has to be disabled for the external metadata case.

Instead of starting the reshape in Grow.c for external metadata,
the wait_reshape_start_ext() function is introduced.
It waits for the reshape to be started by managemon, after managemon
has set the array parameters, as in the Expansion case.
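
wait_reshape_start_ext() itself is not shown in this excerpt. The sketch below shows only the general polling idea it describes: re-read sync_action until it reports "reshape" or a retry budget runs out. The reader callback, attempt count and delay are hypothetical; the callback stands in for a sysfs read so the sketch runs on its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Hypothetical reader returning the current sync_action text. */
typedef const char *(*read_action_fn)(void *ctx);

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000,
			       .tv_nsec = (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Poll until managemon has started the reshape, or give up. */
static bool wait_for_reshape_start(read_action_fn read_action, void *ctx,
				   int attempts, long delay_ms)
{
	for (int i = 0; i < attempts; i++) {
		const char *action = read_action(ctx);

		if (action && strncmp(action, "reshape", 7) == 0)
			return true;
		sleep_ms(delay_ms);
	}
	return false;
}

/* Test double: reports "frozen" for three polls, then "reshape". */
static const char *fake_action(void *ctx)
{
	int *calls = ctx;

	return (*calls)++ < 3 ? "frozen\n" : "reshape\n";
}
```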

The subarray_set_num_man() function is added to managemon.
It is similar to the existing function in Grow.c, except for 2 things:
1. it uses a different way to "ping" the monitor (wakeup_monitor())
2. it retries setting raid_disks many more times (up to 20 retries, 250 ms apart),
   because in managemon context we can be more certain that the monitor is running
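
The retry loop in subarray_set_num_man() is a bounded retry with a "kick" between attempts. A sketch of that loop follows, with the sysfs write and wakeup_monitor() replaced by hypothetical callbacks and a fake device for testing. The safe_mode_delay save/restore is omitted, and a real kick would also wait, as the patch does with usleep(250000).

```c
#define RETRY_LIMIT 20  /* same budget as MANAGEMON_COUNTER */

/* Hypothetical setter: < 0 while the kernel is busy, >= 0 on success. */
typedef int (*set_fn)(void *ctx, int value);
typedef void (*kick_fn)(void *ctx);

static int set_with_retry(set_fn set, kick_fn kick, void *ctx, int value,
			  int *tries)
{
	int counter = RETRY_LIMIT;
	int rc = set(ctx, value);

	*tries = 1;
	while (rc < 0 && counter) {
		counter--;
		kick(ctx);            /* wakeup_monitor() in managemon */
		rc = set(ctx, value);
		(*tries)++;
	}
	return rc;
}

/* Test double: refuses busy_left writes, then accepts. */
struct fake_md {
	int busy_left;
	int kicks;
	int value;
};

static int fake_set(void *ctx, int value)
{
	struct fake_md *md = ctx;

	if (md->busy_left > 0) {
		md->busy_left--;
		return -1;
	}
	md->value = value;
	return 0;
}

static void fake_kick(void *ctx)
{
	struct fake_md *md = ctx;

	md->kicks++;
}
```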

For imsm, a path that passes the raid level parameters from mdadm
(via the metadata update) to managemon was added.
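
In imsm_process_update() below, a requested level of 0 is turned into level 5 with layout 5 before managemon applies it. In md, RAID5 layout 5 is ALGORITHM_PARITY_N (parity on the last device), the layout used for RAID0/RAID5 takeover. Presumably the reshape runs as RAID5 and the array returns to RAID0 afterwards. A sketch of just that mapping, with hypothetical names:

```c
/* md RAID5 layout 5: ALGORITHM_PARITY_N (parity on the last device). */
#define LAYOUT_PARITY_N 5

struct reshape_target {
	int level;
	int layout;
};

/* A RAID0 target is reshaped as RAID5 parity-last; other levels pass through. */
static struct reshape_target map_reshape_target(int level, int layout)
{
	struct reshape_target t = { level, layout };

	if (level == 0) {
		t.level = 5;
		t.layout = LAYOUT_PARITY_N;
	}
	return t;
}
```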

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 Grow.c        |   61 +++++++-------
 managemon.c   |  104 +++++++++++++++++++++---
 mdadm.h       |    2 
 mdmon.h       |    3 +
 super-intel.c |  251 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 5 files changed, 368 insertions(+), 53 deletions(-)

diff --git a/Grow.c b/Grow.c
index b51c4d0..465309c 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1752,28 +1752,32 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 				break;
 			}
 		} else {
-			/* set them all just in case some old 'new_*' value
-			 * persists from some earlier problem
+			/* set parameters here only if managemon is not responsible for this
 			 */
-			int err = err; /* only used if rv==1, and always set if
-					* rv==1, so initialisation not needed,
-					* despite gcc warning
-					*/
-			if (sysfs_set_num(sra, NULL, "chunk_size", nchunk) < 0)
-				rv = 1, err = errno;
-			if (!rv && sysfs_set_num(sra, NULL, "layout", nlayout) < 0)
-				rv = 1, err = errno;
-			if (!rv && sysfs_set_num(sra, NULL, "raid_disks", ndisks) < 0)
-				rv = 1, err = errno;
-			if (rv) {
-				fprintf(stderr, Name ": Cannot set device shape for %s\n",
-					devname);
-				if (get_linux_version() < 2006030)
-					fprintf(stderr, Name ": linux 2.6.30 or later required\n");
-				if (err == EBUSY && 
-				    (array.state & (1<<MD_SB_BITMAP_PRESENT)))
-					fprintf(stderr, "       Bitmap must be removed before shape can be changed\n");
-				break;
+			if ((st->ss->external == 0) || (st->ss->reshape_super == NULL)) {
+				/* set them all just in case some old 'new_*' value
+				 * persists from some earlier problem
+				 */
+				int err = err; /* only used if rv==1, and always set if
+						* rv==1, so initialisation not needed,
+						* despite gcc warning
+						*/
+				if (sysfs_set_num(sra, NULL, "chunk_size", nchunk) < 0)
+					rv = 1, err = errno;
+				if (!rv && sysfs_set_num(sra, NULL, "layout", nlayout) < 0)
+					rv = 1, err = errno;
+				if (!rv && sysfs_set_num(sra, NULL, "raid_disks", ndisks) < 0)
+					rv = 1, err = errno;
+				if (rv) {
+					fprintf(stderr, Name ": Cannot set device shape for %s\n",
+						devname);
+					if (get_linux_version() < 2006030)
+						fprintf(stderr, Name ": linux 2.6.30 or later required\n");
+					if (err == EBUSY &&
+					    (array.state & (1<<MD_SB_BITMAP_PRESENT)))
+						fprintf(stderr, "       Bitmap must be removed before shape can be changed\n");
+					break;
+				}
 			}
 		}
 
@@ -2204,8 +2208,8 @@ static void validate(int afd, int bfd, unsigned long long offset)
 	}
 }
 
-int child_grow(int afd, struct mdinfo *sra, unsigned long stripes,
-	       int *fds, unsigned long long *offsets,
+int child_grow(int afd, struct mdinfo *sra,
+	       unsigned long stripes, int *fds, unsigned long long *offsets,
 	       int disks, int chunk, int level, int layout, int data,
 	       int dests, int *destfd, unsigned long long *destoffsets)
 {
@@ -2268,11 +2272,12 @@ static int child_shrink(int afd, struct mdinfo *sra, unsigned long stripes,
 	return 1;
 }
 
-static int child_same_size(int afd, struct mdinfo *sra, unsigned long stripes,
-			   int *fds, unsigned long long *offsets,
-			   unsigned long long start,
-			   int disks, int chunk, int level, int layout, int data,
-			   int dests, int *destfd, unsigned long long *destoffsets)
+int child_same_size(int afd,
+		    struct mdinfo *sra, unsigned long stripes,
+		    int *fds, unsigned long long *offsets,
+		    unsigned long long start,
+		    int disks, int chunk, int level, int layout, int data,
+		    int dests, int *destfd, unsigned long long *destoffsets)
 {
 	unsigned long long size;
 	unsigned long tailstripes = stripes;
diff --git a/managemon.c b/managemon.c
index 5d81623..2739c39 100644
--- a/managemon.c
+++ b/managemon.c
@@ -380,6 +380,43 @@ static int disk_init_and_add(struct mdinfo *disk, struct mdinfo *clone,
 	return 0;
 }
 
+int subarray_set_num_man(char *container, struct mdinfo *sra, char *name, int n)
+{
+	/* when dealing with external metadata subarrays we need to be
+	 * prepared to handle EAGAIN.  The kernel may need to wait for
+	 * mdmon to mark the array active so the kernel can handle
+	 * allocations/writeback when preparing the reshape action
+	 * (md_allow_write()).  We temporarily disable safe_mode_delay
+	 * to close a race with the array_state going clean before the
+	 * next write to raid_disks / stripe_cache_size
+	 */
+	char safe[50];
+	int rc;
+#define MANAGEMON_COUNTER	20
+	int counter = MANAGEMON_COUNTER;
+
+	/* only 'raid_disks' and 'stripe_cache_size' trigger md_allow_write */
+	if (strcmp(name, "raid_disks") != 0 &&
+	    strcmp(name, "stripe_cache_size") != 0)
+		return sysfs_set_num(sra, NULL, name, n);
+
+	rc = sysfs_get_str(sra, NULL, "safe_mode_delay", safe, sizeof(safe));
+	if (rc <= 0)
+		return -1;
+	sysfs_set_num(sra, NULL, "safe_mode_delay", 0);
+	rc = sysfs_set_num(sra, NULL, name, n);
+	while ((rc < 0) && counter) {
+		counter--;
+		dprintf("managemon: Try to set %s to value %i (%i time(s)).\n", name, n, MANAGEMON_COUNTER - counter);
+		wakeup_monitor();
+		usleep(250000);
+		rc = sysfs_set_num(sra, NULL, name, n);
+	}
+	sysfs_set_str(sra, NULL, "safe_mode_delay", safe);
+	return rc;
+}
+
+
 static void manage_member(struct mdstat_ent *mdstat,
 			  struct active_array *a)
 {
@@ -410,6 +447,8 @@ static void manage_member(struct mdstat_ent *mdstat,
 	else
 		frozen = 1; /* can't read metadata_version assume the worst */
 
+
+
 	if ((a->reshape_state != reshape_not_active) &&
 	    (a->reshape_state != reshape_in_progress)) {
 		dprintf("Reshape signals need to manage this member\n");
@@ -418,17 +457,17 @@ static void manage_member(struct mdstat_ent *mdstat,
 			struct mdinfo *newdev = NULL;
 			struct mdinfo *d;
 			int delta_disks = a->reshape_delta_disks;
+			int status_ok = 1;
 
+			newa = duplicate_aa(a);
+			if (newa == NULL) {
+				a->reshape_state = reshape_not_active;
+				goto reshape_out;
+			}
 			newdev = newa->container->ss->reshape_array(newa, reshape_in_progress, &updates);
 			if (newdev) {
-				int status_ok = 1;
-				newa = duplicate_aa(a);
-				if (newa == NULL)
-					goto reshape_out;
-
 				for (d = newdev; d ; d = d->next) {
 					struct mdinfo *newd;
-
 					newd = malloc(sizeof(*newd));
 					if (!newd) {
 						status_ok = 0;
@@ -443,16 +482,49 @@ static void manage_member(struct mdstat_ent *mdstat,
 					}
 					disk_init_and_add(newd, d, newa);
 				}
-				/* go with reshape
+			}
+			if (newa->reshape_state == reshape_in_progress) {
+				/* set reshape parameters
 				 */
-				if (status_ok)
+				if (status_ok) {
+					dprintf("managemon: set sync_max to 0\n");
 					if (sysfs_set_num(&newa->info, NULL, "sync_max", 0) < 0)
 						status_ok = 0;
+				}
+
+				if (status_ok && newa->reshape_raid_disks) {
+					dprintf("managemon: set raid_disks to %i\n", newa->reshape_raid_disks);
+					if (subarray_set_num_man(a->container->devname, &newa->info, "raid_disks", newa->reshape_raid_disks))
+							status_ok = 0;
+				}
+
+				if (status_ok && newa->reshape_level > -1) {
+					char *c = map_num(pers, newa->reshape_level);
+					if (c == NULL)
+						status_ok = 0;
+					else {
+						dprintf("managemon: set level to %s\n", c);
+						if (sysfs_set_str(&newa->info, NULL, "level", c) < 0)
+							status_ok = 0;
+					}
+				}
+
+				if (status_ok && newa->reshape_layout >= 0) {
+					dprintf("managemon: set layout to %i\n", newa->reshape_layout);
+					if (sysfs_set_num(&newa->info, NULL, "layout", newa->reshape_layout) < 0)
+						status_ok = 0;
+				}
+
+				/* go with reshape
+				 */
 				if (status_ok && sysfs_set_str(&newa->info, NULL, "sync_action", "reshape") == 0) {
 					/* reshape executed
 					 */
 					dprintf("Reshape was started\n");
-					newa->new_data_disks = newa->info.array.raid_disks + delta_disks;
+					if (newa->reshape_raid_disks > 0)
+						newa->new_data_disks = newa->reshape_raid_disks;
+					else
+						newa->new_data_disks = newa->info.array.raid_disks + delta_disks;
 					if (a->info.array.level == 4)
 						newa->new_data_disks--;
 					if (a->info.array.level == 5)
@@ -461,28 +533,38 @@ static void manage_member(struct mdstat_ent *mdstat,
 						newa->new_data_disks--;
 					replace_array(a->container, a, newa);
 					a = newa;
+					newa = NULL;
 				} else {
 					/* on problems cancel update
 					 */
-					free_aa(newa);
 					free_updates(&updates);
 					updates = NULL;
+
 					a->container->ss->reshape_array(a, reshape_cancel_request, &updates);
 					sysfs_set_str(&a->info, NULL, "sync_action", "idle");
+					a->reshape_state = reshape_not_active;
 				}
 			}
+reshape_out:
+			if (a->reshape_state == reshape_not_active) {
+				dprintf("Cancel reshape.\n");
+				a->container->ss->reshape_array(a, reshape_cancel_request, &updates);
+				sysfs_set_str(&a->info, NULL, "sync_action", "idle");
+			}
 			dprintf("Send metadata update for reshape.\n");
 
 			queue_metadata_update(updates);
 			updates = NULL;
 			wakeup_monitor();
-reshape_out:
+
 			while (newdev) {
 				d = newdev->next;
 				free(newdev);
 				newdev = d;
 			}
 			free_updates(&updates);
+			if (newa)
+				free_aa(newa);
 		}
 	}
 
diff --git a/mdadm.h b/mdadm.h
index 4563d14..7aba440 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -455,6 +455,8 @@ extern int sysfs_attr_match(const char *attr, const char *str);
 extern int sysfs_match_word(const char *word, char **list);
 extern int sysfs_set_str(struct mdinfo *sra, struct mdinfo *dev,
 			 char *name, char *val);
+extern int sysfs_get_ll(struct mdinfo *sra, struct mdinfo *dev,
+			char *name, unsigned long long *val);
 extern int sysfs_set_num(struct mdinfo *sra, struct mdinfo *dev,
 			 char *name, unsigned long long val);
 extern int sysfs_uevent(struct mdinfo *sra, char *event);
diff --git a/mdmon.h b/mdmon.h
index b922af4..a31ad97 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -49,6 +49,9 @@ struct active_array {
 	enum state_of_reshape reshape_state;
 	int reshape_delta_disks;
 	int new_data_disks;
+	int reshape_raid_disks;
+	int reshape_level;
+	int reshape_layout;
 
 	int check_degraded; /* flag set by mon, read by manage */
 
diff --git a/super-intel.c b/super-intel.c
index aced38c..2c5331f 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -314,6 +314,9 @@ struct imsm_update_reshape {
 	enum imsm_update_type type;
 	int update_memory_size;
 	int reshape_delta_disks;
+	int reshape_raid_disks;
+	int reshape_level;
+	int reshape_layout;
 	int disks_count;
 	int spares_in_update;
 	int devnum;
@@ -5352,6 +5355,7 @@ static void imsm_process_update(struct supertype *st,
 		__u32 new_mpb_size;
 		int new_disk_num;
 		struct intel_dev *current_dev;
+		struct imsm_dev *new_dev;
 
 		dprintf("imsm: imsm_process_update() for update_reshape [u->update_prepared  = %i]\n", u->update_prepared);
 		if ((u->update_prepared == -1) ||
@@ -5409,11 +5413,12 @@ static void imsm_process_update(struct supertype *st,
 		}
 		/* find current dev in intel_super
 		 */
-		dprintf("\t\tLooking  for volume %s\n", (char *)u->devs_mem.dev->volume);
+		new_dev = (struct imsm_dev *)((void *)u + u->upd_devs_offset);
+		dprintf("\t\tLooking  for volume %s\n", (char *)new_dev->volume);
 		current_dev = super->devlist;
 		while (current_dev) {
 			if (strcmp((char *)current_dev->dev->volume,
-				   (char *)u->devs_mem.dev->volume) == 0)
+				   (char *)new_dev->volume) == 0)
 				break;
 			current_dev = current_dev->next;
 		}
@@ -5432,7 +5437,14 @@ static void imsm_process_update(struct supertype *st,
 		/* set reshape_delta_disks
 		 */
 		a->reshape_delta_disks = u->reshape_delta_disks;
+		a->reshape_raid_disks = u->reshape_raid_disks;
 		a->reshape_state = reshape_is_starting;
+		a->reshape_level = u->reshape_level;
+		a->reshape_layout = u->reshape_layout;
+		if (a->reshape_level == 0) {
+			a->reshape_level = 5;
+			a->reshape_layout = 5;
+		}
 
 		super->updates_pending++;
 update_reshape_exit:
@@ -5896,12 +5908,7 @@ static void imsm_prepare_update(struct supertype *st,
 		if (u->reshape_delta_disks < 0)
 			break;
 		u->update_prepared = 1;
-		if (u->reshape_delta_disks == 0) {
-			/* for non growing reshape buffers sizes are not affected
-			 * but check some parameters
-			 */
-			break;
-		}
+
 		/* count HDDs
 		 */
 		u->disks_count = 0;
@@ -6370,6 +6377,106 @@ abort:
 	return ret_val;
 }
 
+/*****************************************************************************
+ * Function: update_geometry
+ * Description: Prepares imsm volume map update in case of volume reshape
+ * Returns: 0 on success, -1 if fail
+ * ***************************************************************************/
+int update_geometry(struct supertype *st,
+		    struct geo_params *geo)
+{
+	int fd = -1, ret_val = -1;
+	struct mdinfo *sra = NULL;
+	char buf[PATH_MAX];
+	char supported = 1;
+
+	snprintf(buf, PATH_MAX, "/dev/md%i", geo->dev_id);
+	fd = open(buf , O_RDONLY | O_DIRECT);
+	if (fd < 0) {
+		dprintf("imsm: cannot open device\n");
+		return -1;
+	}
+
+	sra = sysfs_read(fd, 0, GET_DISKS | GET_LAYOUT | GET_CHUNK | GET_SIZE | GET_LEVEL | GET_DEVS);
+	if (!sra) {
+		dprintf("imsm: Cannot get mdinfo!\n");
+		goto update_geometry_exit;
+	}
+
+	if (sra->devs == NULL) {
+		dprintf("imsm: Cannot load device information.\n");
+		goto update_geometry_exit;
+	}
+	/* is size change possible??? */
+	if (((unsigned long long)geo->size != sra->devs->component_size) && (geo->size != UnSet) && (geo->size > 0)) {
+		geo->size = sra->devs->component_size;
+		dprintf("imsm: Change the array size not supported in imsm!\n");
+		goto update_geometry_exit;
+	}
+
+	if ((geo->level != sra->array.level) && (geo->level >= 0) && (geo->level != UnSet)) {
+		switch (sra->array.level) {
+		case 0:
+			if (geo->level != 5)
+				supported = 0;
+			break;
+		case 5:
+			if (geo->level != 0)
+				supported = 0;
+			break;
+		case 1:
+			if ((geo->level != 5) && (geo->level != 0))
+				supported = 0;
+			break;
+		case 10:
+			if (geo->level != 5)
+				supported = 0;
+			break;
+		default:
+			supported = 0;
+			break;
+		}
+		if (!supported) {
+			dprintf("imsm: Error. Level Migration from %d to %d not supported!\n", sra->array.level, geo->level);
+			goto update_geometry_exit;
+		}
+	} else {
+		geo->level = sra->array.level;
+	}
+
+	if ((geo->layout != sra->array.layout) && ((geo->layout != UnSet) && (geo->layout != -1))) {
+		if ((sra->array.layout == 0) && (sra->array.level == 5) && (geo->layout == 5)) {
+			/* reshape 5 -> 4 */
+			geo->raid_disks++;
+		} else if ((sra->array.layout == 5) && (sra->array.level == 5) && (geo->layout == 0)) {
+			/* reshape 4 -> 5 */
+			geo->layout = 0;
+			geo->level = 5;
+		} else {
+			dprintf("imsm: Error. Layout Migration from %d to %d not supported!\n", sra->array.layout, geo->layout);
+			ret_val = -1;
+			goto update_geometry_exit;
+		}
+	}
+
+	if ((geo->chunksize == 0) || (geo->chunksize == UnSet))
+	    geo->chunksize = sra->array.chunk_size;
+
+	if (!validate_geometry_imsm(st, geo->level, geo->layout, geo->raid_disks,
+					geo->chunksize,  geo->size,
+					0, 0, 1))
+		goto update_geometry_exit;
+
+	ret_val = 0;
+
+update_geometry_exit:
+	sysfs_free(sra);
+	if (fd > -1)
+		close(fd);
+
+	return ret_val;
+}
+
 /******************************************************************************
  * function: imsm_create_metadata_update_for_reshape
  * Function creates update for whole IMSM container.
@@ -6423,6 +6530,9 @@ struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(struct super
 	}
 	u->reshape_delta_disks = delta_disks;
 	u->update_prepared = -1;
+	u->reshape_raid_disks = 0;
+	u->reshape_level = -1;
+	u->reshape_layout = -1;
 	u->update_memory_size = update_memory_size;
 	u->type = update_reshape;
 	u->spares_in_update = 0;
@@ -6470,6 +6580,18 @@ struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(struct super
 						set_imsm_ord_tbl_ent(new_map, idx, idx);
 				}
 				u->devnum = geo->dev_id;
+				/* case for reshape without grow */
+				if (u->reshape_delta_disks == 0) {
+					dprintf("imsm: reshape prepate metadata for volume= %d, index= %d\n", geo->dev_id, i);
+					if (update_geometry(st, geo) == -1) {
+						dprintf("imsm: ERROR: Cannot prepare update for volume map!\n");
+						ret_val = NULL;
+						goto exit_imsm_create_metadata_update_for_reshape;
+					} else {
+						new_map->raid_level = geo->level;
+						new_map->blocks_per_strip = geo->chunksize / 512;
+					}
+				}
 				break;
 			}
 		}
@@ -6641,6 +6763,10 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 		       char *backup, char *dev, int verbouse)
 {
 	int ret_val = 1;
+	struct mdinfo *sra = NULL;
+	int fd = -1;
+	char buf[PATH_MAX];
+	int delta_disks = -1;
 	struct geo_params geo;
 
 	dprintf("imsm: reshape_super called.\n");
@@ -6690,9 +6816,69 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 			dprintf("imsm: Operation is not allowed on container\n");
 		if (ret_val)
 			unfreeze_container(st);
+		goto imsm_reshape_super_exit;
 	} else
 		dprintf("imsm: not a container operation\n");
 
+	snprintf(buf, PATH_MAX, "/dev/md%i", st->devnum);
+	fd = open(buf , O_RDONLY | O_DIRECT);
+	if (fd < 0) {
+		dprintf("imsm: cannot open device: %s\n", buf);
+		goto imsm_reshape_super_exit;
+	}
+
+	sra = sysfs_read(fd, 0,  GET_VERSION | GET_LEVEL | GET_LAYOUT |
+			 GET_DISKS | GET_DEVS | GET_CHUNK | GET_SIZE);
+	if (sra == NULL) {
+		fprintf(stderr, Name ": Cannot read sysfs info (imsm)\n");
+		goto imsm_reshape_super_exit;
+	}
+
+	geo.dev_id = -1;
+	find_array_minor(geo.dev_name, 1, st->devnum, &geo.dev_id);
+
+	/* continue volume check - proceed if delta_disk is zero only
+	 */
+	if (geo.raid_disks > 0 && geo.raid_disks != UnSet)
+		delta_disks = geo.raid_disks - sra->array.raid_disks;
+	else
+		delta_disks = 0;
+	dprintf("imsm: imsm_reshape_super() called on array when delta disks = %i\n", delta_disks);
+	if (delta_disks == 0) {
+		struct imsm_update_reshape *u;
+		st->update_tail = &st->updates;
+		dprintf("imsm: imsm_reshape_super(): raid_disks not changed for volume reshape. Reshape allowed.\n");
+
+		if (find_array_minor(geo.dev_name, 1, st->devnum, &geo.dev_id) > -1) {
+			u = imsm_create_metadata_update_for_reshape(st, &geo);
+			if (u) {
+				if (geo.raid_disks > raid_disks)
+					u->reshape_raid_disks = geo.raid_disks;
+				u->reshape_level = geo.level;
+				u->reshape_layout = geo.layout;
+				ret_val = 0;
+				append_metadata_update(st, u, u->update_memory_size);
+			}
+		}
+		goto imsm_reshape_super_exit;
+	} else {
+		char *devname = devnum2devname(st->devnum);
+		char *devtoprint = devname;
+
+		if (devtoprint == NULL)
+			devtoprint = "Device";
+		fprintf(stderr, Name
+			": %s cannot be reshaped. Command has to be executed on container.\n",
+			devtoprint);
+		if (devname)
+			free(devname);
+	}
+
+imsm_reshape_super_exit:
+	sysfs_free(sra);
+	if (fd >= 0)
+		close(fd);
+
 	dprintf("imsm: reshape_super Exit code = %i\n", ret_val);
 
 	return ret_val;
@@ -6977,7 +7163,8 @@ struct mdinfo *imsm_reshape_array(struct active_array *a, enum state_of_reshape
 
 	if (a->reshape_delta_disks == 0) {
 		dprintf("array parameters has to be changed\n");
-		/* TBD */
+		a->reshape_state = reshape_in_progress;
+		return disk_list;
 	}
 	if (a->reshape_delta_disks > 0) {
 		dprintf("grow is detected.\n");
@@ -7002,17 +7189,14 @@ imsm_reshape_array_exit:
 		sysfs_set_str(&a->info, NULL, "sync_action", "idle");
 		imsm_grow_array_remove_devices_on_cancel(a);
 		u = (struct imsm_update_reshape *)calloc(1, sizeof(struct imsm_update_reshape));
-		if (u) {
+		if (u)
 			u->type = update_reshape_cancel;
-			a->reshape_state = reshape_not_active;
-		}
 	}
 
 	if (u) {
 		/* post any prepared update
 		 */
 		u->devnum = a->devnum;
-
 		u->update_memory_size = sizeof(struct imsm_update_reshape);
 		u->reshape_delta_disks = a->reshape_delta_disks;
 		u->update_prepared = 1;
@@ -7199,7 +7383,8 @@ int imsm_child_grow(struct supertype *st, char *devname, int fd_in, struct mdinf
 
 void return_to_raid0(struct mdinfo *sra)
 {
-	if (sra->array.level == 4) {
+	if ((sra->array.level == 4) ||
+	    (sra->array.level == 0)) {
 		dprintf("Execute backward takeover to raid0\n");
 		sysfs_set_str(sra, NULL, "level", "raid0");
 	}
@@ -7576,8 +7761,45 @@ int imsm_manage_reshape(struct supertype *st, char *backup)
 	 * for single vlolume reshape exit only and reuse Grow_reshape() code
 	 */
 	if (st->container_dev != st->devnum) {
+		char buf[PATH_MAX];
+		int fd;
+
 		dprintf("imsm: manage_reshape() current volume devnum: %i\n", st->devnum);
 
+		snprintf(buf, PATH_MAX, "/dev/md%i", st->devnum);
+		fd = open(buf , O_RDWR | O_DIRECT);
+		if (fd > -1) {
+			struct mdinfo *info;
+			struct mdinfo sra;
+			char *cont_name;
+
+			sra.devs = NULL;
+			st->ss->getinfo_super(st, &sra, NULL);
+			/* wait for reshape finish
+			* and manage array size based on metadata information
+			*/
+			cont_name = devnum2devname(st->devnum);
+			if (cont_name) {
+				ping_manager(cont_name);
+				ping_monitor(cont_name);
+				free(cont_name);
+			}
+			imsm_grow_manage_size(st, &sra, -1);
+
+			/* for level == 4: execute takeover to raid0 */
+			info = sysfs_read(fd, 0, GET_VERSION | GET_LEVEL | GET_DEVS | GET_LAYOUT);
+			if (info) {
+				/* currently md doesn't support direct translation from raid5 to raid4;
+				 * it has to be done via raid5 with layout 5
+				 */
+				if ((info->array.level == 5) &&
+				    (info->array.layout == 5))
+					info->array.level = 4;
+				return_to_raid0(info);
+				sysfs_free(info);
+			}
+			close(fd);
+		}
 		return ret_val;
 	}
 	ret_val = imsm_manage_container_reshape(st, backup);
@@ -7643,3 +7865,4 @@ struct superswitch super_imsm = {
 	.prepare_update = imsm_prepare_update,
 #endif /* MDASSEMBLE */
 };
+

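The level-migration switch in update_geometry() above amounts to a table of allowed (current level, requested level) pairs. The sketch below restates it as a standalone predicate. The function name is hypothetical and not part of mdadm, and the raid1 branch assumes the intent is to accept either raid0 or raid5 as the target.

```c
/* Hypothetical helper mirroring the switch in update_geometry():
 * returns 1 if an imsm volume at level `from` may migrate to level `to`. */
static int imsm_level_migration_allowed(int from, int to)
{
	switch (from) {
	case 0:
		return to == 5;               /* raid0 -> raid5 */
	case 5:
		return to == 0;               /* raid5 -> raid0 */
	case 1:
		return to == 0 || to == 5;    /* assumed intent for raid1 */
	case 10:
		return to == 5;               /* raid10 -> raid5 */
	default:
		return 0;                     /* every other source level is refused */
	}
}
```

Writing the raid1 case as a single `==`/`||` expression makes it impossible to fall into the always-true `!= 5 || != 0` form.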

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 20/27] Detect level change
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (18 preceding siblings ...)
  2010-12-06 13:23 ` [PATCH 19/27] Migration: raid5->raid0 Adam Kwolek
@ 2010-12-06 13:23 ` Adam Kwolek
  2010-12-06 13:23 ` [PATCH 21/27] Migration raid0->raid5 Adam Kwolek
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Level migration support requires mdmon to react to level changes.
It must be able to change the configuration of an active array and,
when the array level changes to raid0, stop monitoring that array.
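A minimal sketch of the reaction described above, under simplifying assumptions. The patch maps the mdstat personality name with map_name(pers, ...), duplicates the active array with duplicate_aa() and swaps it in with replace_array(); monitor.c then discards arrays whose level is 0. Here the name table, the struct and the function names are all hypothetical, and the array is updated in place instead of being duplicated.

```c
#include <string.h>

/* Hypothetical name-to-level table in the spirit of mdadm's `pers`
 * mapping; only a few entries are listed for illustration. */
struct level_name {
	const char *name;
	int level;
};

static const struct level_name level_names[] = {
	{ "raid0", 0 }, { "raid1", 1 }, { "raid4", 4 },
	{ "raid5", 5 }, { "raid6", 6 }, { "raid10", 10 },
};

/* Returns the numeric level for an mdstat personality name, or -1. */
static int level_from_name(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(level_names) / sizeof(level_names[0]); i++)
		if (strcmp(level_names[i].name, name) == 0)
			return level_names[i].level;
	return -1;
}

/* Hypothetical, trimmed-down view of an active array. */
struct aa_sketch {
	int level;
	int monitored;   /* cleared once the array becomes raid0 */
};

/* Apply a level reported by mdstat: ignore unknown names, record a
 * real change, and stop monitoring when the array has become raid0.
 * Returns 1 if the level changed. */
static int apply_mdstat_level(struct aa_sketch *a, const char *mdstat_level)
{
	int level = level_from_name(mdstat_level);

	if (level < 0 || level == a->level)
		return 0;
	a->level = level;
	if (level == 0)
		a->monitored = 0;
	return 1;
}
```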

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c |   12 +++++++++++-
 monitor.c   |    2 +-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/managemon.c b/managemon.c
index 2739c39..0f02f74 100644
--- a/managemon.c
+++ b/managemon.c
@@ -447,7 +447,17 @@ static void manage_member(struct mdstat_ent *mdstat,
 	else
 		frozen = 1; /* can't read metadata_version assume the worst */
 
-
+	if (mdstat->level) {
+		int level = map_name(pers, mdstat->level);
+		if (a->info.array.level != level && level >= 0) {
+			newa = duplicate_aa(a);
+			if (newa) {
+				newa->info.array.level = level;
+				replace_array(a->container, a, newa);
+				a = newa;
+			}
+		}
+	}
 
 	if ((a->reshape_state != reshape_not_active) &&
 	    (a->reshape_state != reshape_in_progress)) {
diff --git a/monitor.c b/monitor.c
index 5837b34..6385555 100644
--- a/monitor.c
+++ b/monitor.c
@@ -512,7 +512,7 @@ static int wait_and_act(struct supertype *container, int nowait)
 		/* once an array has been deactivated we want to
 		 * ask the manager to discard it.
 		 */
-		if (!a->container) {
+		if (!a->container || (a->info.array.level == 0)) {
 			if (discard_this) {
 				ap = &(*ap)->next;
 				continue;


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 21/27] Migration raid0->raid5
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (19 preceding siblings ...)
  2010-12-06 13:23 ` [PATCH 20/27] Detect level change Adam Kwolek
@ 2010-12-06 13:23 ` Adam Kwolek
  2010-12-06 13:23 ` [PATCH 22/27] Read chunk size and layout from mdstat Adam Kwolek
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Add an implementation of one-step migration from raid0 to raid5.
For imsm, the raid level parameters now flow from mdadm (via a metadata update) to managemon.

Takeover is blocked for this migration case (only update_reshape is used).
For migration on a container (OLCE), variables that are changed
by the single-array reshape path are reinitialized.
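The parameter flow can be sketched as two steps: the update is created with "unchanged" defaults (as in imsm_create_metadata_update_for_reshape()), and imsm_process_update() copies the requested values into the active array, replacing a requested level 0 with level 5, layout 5. The struct and function names below are hypothetical stand-ins for the fields added to imsm_update_reshape and active_array.

```c
/* Hypothetical subsets of imsm_update_reshape and active_array. */
struct reshape_update_sketch {
	int reshape_raid_disks;  /* 0: unchanged */
	int reshape_level;       /* -1: unchanged */
	int reshape_layout;      /* -1: unchanged */
};

struct active_reshape_sketch {
	int reshape_raid_disks;
	int reshape_level;
	int reshape_layout;
};

/* Defaults set when the update is created. */
static void init_update(struct reshape_update_sketch *u)
{
	u->reshape_raid_disks = 0;
	u->reshape_level = -1;
	u->reshape_layout = -1;
}

/* Copy the requested geometry into the active array. Mirrors the remap
 * in imsm_process_update(): a requested level of 0 becomes level 5,
 * layout 5. */
static void apply_update(struct active_reshape_sketch *a,
			 const struct reshape_update_sketch *u)
{
	a->reshape_raid_disks = u->reshape_raid_disks;
	a->reshape_level = u->reshape_level;
	a->reshape_layout = u->reshape_layout;
	if (a->reshape_level == 0) {
		a->reshape_level = 5;
		a->reshape_layout = 5;
	}
}
```

Using -1 as the "unchanged" marker lets managemon skip a sysfs write simply by testing for a negative value.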

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |   50 ++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 2c5331f..dee23e2 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -5001,7 +5001,7 @@ static void imsm_sync_metadata(struct supertype *container)
 {
 	struct intel_super *super = container->sb;
 
-	if (!super->updates_pending)
+	if (!super || !super->updates_pending)
 		return;
 
 	write_super_imsm(container, 0);
@@ -6502,6 +6502,13 @@ struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(struct super
 
 	dprintf("imsm imsm_update_metadata_for_reshape(enter) raid_disks = %i\n", geo->raid_disks);
 
+	if (super == NULL || super->anchor == NULL) {
+		dprintf("Error: imsm_create_metadata_update_for_reshape(): null pointers on input\n");
+		dprintf("\t\t super = %p\n", super);
+		if (super)
+			dprintf("\t\t super->anchor = %p\n", super->anchor);
+		return ret_val;
+	}
 	if ((geo->raid_disks < super->anchor->num_disks) ||
 	    (geo->raid_disks == UnSet))
 		geo->raid_disks = super->anchor->num_disks;
@@ -6580,8 +6587,11 @@ struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(struct super
 						set_imsm_ord_tbl_ent(new_map, idx, idx);
 				}
 				u->devnum = geo->dev_id;
-				/* case for reshape without grow */
-				if (u->reshape_delta_disks == 0) {
+				/* case for reshape without grow
+				 * or grow is level change effect
+				 */
+				if ((u->reshape_delta_disks == 0) ||
+				    ((new_map->raid_level != geo->level) && (geo->level != UnSet))) {
 					dprintf("imsm: reshape prepate metadata for volume= %d, index= %d\n", geo->dev_id, i);
 					if (update_geometry(st, geo) == -1) {
 						dprintf("imsm: ERROR: Cannot prepare update for volume map!\n");
@@ -6765,6 +6775,7 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 	int ret_val = 1;
 	struct mdinfo *sra = NULL;
 	int fd = -1;
+	int fdc = -1;
 	char buf[PATH_MAX];
 	int delta_disks = -1;
 	struct geo_params geo;
@@ -6820,10 +6831,16 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 	} else
 		dprintf("imsm: not a container operation\n");
 
-	snprintf(buf, PATH_MAX, "/dev/md%i", st->devnum);
-	fd = open(buf , O_RDONLY | O_DIRECT);
+	snprintf(buf, PATH_MAX, "/dev/md%i", st->container_dev);
+	fdc = open(buf , O_RDONLY | O_DIRECT);
+	if (fdc < 0) {
+		dprintf("imsm: cannot open container: %s\n", buf);
+		goto imsm_reshape_super_exit;
+	}
+
+	fd = open(geo.dev_name , O_RDONLY | O_DIRECT);
 	if (fd < 0) {
-		dprintf("imsm: cannot open device: %s\n", buf);
+		dprintf("imsm: cannot open device: %s\n", geo.dev_name);
 		goto imsm_reshape_super_exit;
 	}
 
@@ -6841,8 +6858,10 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 	 */
 	if (geo.raid_disks > 0 && geo.raid_disks != UnSet)
 		delta_disks = geo.raid_disks - sra->array.raid_disks;
-	else
+	else {
 		delta_disks = 0;
+		geo.raid_disks = sra->array.raid_disks;
+	}
 	dprintf("imsm: imsm_reshape_super() called on array when delta disks = %i\n", delta_disks);
 	if (delta_disks == 0) {
 		struct imsm_update_reshape *u;
@@ -6850,7 +6869,16 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 		dprintf("imsm: imsm_reshape_super(): raid_disks not changed for volume reshape. Reshape allowed.\n");
 
 		if (find_array_minor(geo.dev_name, 1, st->devnum, &geo.dev_id) > -1) {
-			u = imsm_create_metadata_update_for_reshape(st, &geo);
+			struct supertype *st2 = NULL;
+			struct supertype *st_tmp = st;
+			if (st->sb == NULL) {
+				st2 = super_by_fd(fdc, NULL);
+				if (st2 &&
+				    st->ss->load_super(st2, fdc, NULL) == 0)
+					st_tmp = st2;
+			}
+
+			u = imsm_create_metadata_update_for_reshape(st_tmp, &geo);
 			if (u) {
 				if (geo.raid_disks > raid_disks)
 					u->reshape_raid_disks = geo.raid_disks;
@@ -6859,6 +6887,8 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 				ret_val = 0;
 				append_metadata_update(st, u, u->update_memory_size);
 			}
+			if (st2)
+				st->ss->free_super(st2);
 		}
 		goto imsm_reshape_super_exit;
 	} else {
@@ -6878,6 +6908,8 @@ imsm_reshape_super_exit:
 	sysfs_free(sra);
 	if (fd >= 0)
 		close(fd);
+	if (fdc >= 0)
+		close(fdc);
 
 	dprintf("imsm: reshape_super Exit code = %i\n", ret_val);
 
@@ -7704,6 +7736,8 @@ int imsm_manage_container_reshape(struct supertype *st, char *backup)
 					 */
 					dprintf("imsm: Preparing metadata update for: %s (md%i)\n", array, geo.dev_id);
 					st->update_tail = &st->updates;
+					geo.size = UnSet;
+					geo.level = UnSet;
 					u = imsm_create_metadata_update_for_reshape(st, &geo);
 					if (u) {
 						u->reshape_delta_disks = delta_disks;


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 22/27] Read chunk size and layout from mdstat
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (20 preceding siblings ...)
  2010-12-06 13:23 ` [PATCH 21/27] Migration raid0->raid5 Adam Kwolek
@ 2010-12-06 13:23 ` Adam Kwolek
  2010-12-06 13:23 ` [PATCH 23/27] Migration: Chunk size migration Adam Kwolek
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Support reading the layout and chunk size from mdstat.
This is needed for external-metadata reshapes that change the layout or chunk size.

This patch also stops mdmon from changing the chunk size as a result of an mdadm action.
The chunk size in mdmon should change only when md has actually changed it.
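The parsing rule added to mdstat_read() can be tried in isolation. The sketch below assumes a raid5 status line of the form `... level 5, 64k chunk, algorithm 2 [3/3] [UUU]` and splits it with strtok() instead of mdadm's conf_line()/dl_next() word list; the struct and function names are hypothetical. As in the patch, the layout is the word after `algorithm`, and the chunk size is taken from the word before any word starting with `chunk`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two values the patch adds to mdstat_ent. */
struct mdstat_geometry {
	int layout;      /* value after "algorithm", -1 if absent */
	int chunk_size;  /* bytes, 0 if no "<N>k chunk" pair was seen */
};

static struct mdstat_geometry parse_mdstat_geometry(const char *line)
{
	struct mdstat_geometry g = { -1, 0 };
	char buf[256];
	char *w, *prev = NULL;

	/* strtok() modifies its input, so work on a bounded copy */
	strncpy(buf, line, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (w = strtok(buf, " \t\n"); w != NULL;
	     prev = w, w = strtok(NULL, " \t\n")) {
		if (strcmp(w, "algorithm") == 0) {
			/* the layout number follows the keyword */
			char *next = strtok(NULL, " \t\n");

			if (next == NULL)
				break;
			g.layout = atoi(next);
			w = next;
		} else if (strncmp(w, "chunk", 5) == 0 && prev != NULL) {
			/* "64k" precedes "chunk,"; atoi() stops at the 'k' */
			g.chunk_size = atoi(prev) * 1024;
		}
	}
	return g;
}
```

Keeping the previous word is what handles `64k chunk`, where the number comes before the keyword; `algorithm 2` is the opposite case, so the parser reads one word ahead instead. The `strncmp(..., 5)` prefix match also accepts the trailing comma and the raid0 form `128k chunks`.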

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c |    1 +
 mdadm.h     |    2 ++
 mdstat.c    |   11 +++++++++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/managemon.c b/managemon.c
index 0f02f74..4874681 100644
--- a/managemon.c
+++ b/managemon.c
@@ -441,6 +441,7 @@ static void manage_member(struct mdstat_ent *mdstat,
 	a->info.array.raid_disks = mdstat->raid_disks;
 	// MORE
 
+	a->info.array.chunk_size = mdstat->chunk_size;
 	/* honor 'frozen' */
 	if (sysfs_get_str(&a->info, NULL, "metadata_version", buf, sizeof(buf)) > 0)
 		frozen = buf[9] == '-';
diff --git a/mdadm.h b/mdadm.h
index 7aba440..4aae4cf 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -387,6 +387,8 @@ struct mdstat_ent {
 	int		resync; /* 3 if check, 2 if reshape, 1 if resync, 0 if recovery */
 	int		devcnt;
 	int		raid_disks;
+	int		layout;
+	int		chunk_size;
 	char *		metadata_version;
 	struct dev_member {
 		char			*name;
diff --git a/mdstat.c b/mdstat.c
index fdce516..d1fb18f 100644
--- a/mdstat.c
+++ b/mdstat.c
@@ -146,7 +146,7 @@ struct mdstat_ent *mdstat_read(int hold, int start)
 	end = &all;
 	for (; (line = conf_line(f)) ; free_line(line)) {
 		struct mdstat_ent *ent;
-		char *w;
+		char *w, *prev = NULL;
 		int devnum;
 		int in_devs = 0;
 		char *ep;
@@ -191,7 +191,7 @@ struct mdstat_ent *mdstat_read(int hold, int start)
 		ent->dev = strdup(line);
 		ent->devnum = devnum;
 
-		for (w=dl_next(line); w!= line ; w=dl_next(w)) {
+		for (w = dl_next(line); w != line ; prev = w, w = dl_next(w)) {
 			int l = strlen(w);
 			char *eq;
 			if (strcmp(w, "active")==0)
@@ -266,6 +266,13 @@ struct mdstat_ent *mdstat_read(int hold, int start)
 				   w[0] <= '9' &&
 				   w[l-1] == '%') {
 				ent->percent = atoi(w);
+			} else if (strcmp(w, "algorithm") == 0 &&
+				   dl_next(w) != line) {
+				w = dl_next(w);
+				ent->layout = atoi(w);
+			} else if (strncmp(w, "chunk", 5) == 0 &&
+				   prev != NULL) {
+				ent->chunk_size = atoi(prev) * 1024;
 			}
 		}
 		if (insert_here && (*insert_here)) {


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 23/27] Migration: Chunk size migration
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (21 preceding siblings ...)
  2010-12-06 13:23 ` [PATCH 22/27] Read chunk size and layout from mdstat Adam Kwolek
@ 2010-12-06 13:23 ` Adam Kwolek
  2010-12-06 13:23 ` [PATCH 24/27] Add takeover support for external meta Adam Kwolek
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

Add chunk size migration for external metadata.
The update is applied through the array parameter update in managemon, which also starts the reshape.
mdadm waits for the array to enter the reshape state instead of starting the reshape itself.
For imsm, the chunk size parameter now flows from mdadm (via a metadata update) to managemon.
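In managemon the new parameter becomes one more conditional sysfs write, issued before the layout write. The sketch below only plans those writes so the skip rules can be checked without a live md device. The struct and function names are hypothetical; the thresholds match the patch (`reshape_chunk_size > 0`, `reshape_layout >= 0`, with -1 meaning "unchanged").

```c
/* Hypothetical stand-in for the reshape_* fields of struct active_array. */
struct reshape_request {
	int chunk_size;  /* bytes; <= 0 means "leave unchanged" */
	int layout;      /* < 0 means "leave unchanged" */
};

/* One planned write to /sys/block/mdX/md/<attr>. */
struct sysfs_write {
	const char *attr;
	long long val;
};

/* Collect the md sysfs attributes that would be written, in the same
 * order as the patch: chunk_size first, then layout. Returns the count. */
static int plan_reshape_writes(const struct reshape_request *r,
			       struct sysfs_write out[2])
{
	int n = 0;

	if (r->chunk_size > 0)
		out[n++] = (struct sysfs_write){ "chunk_size", r->chunk_size };
	if (r->layout >= 0)
		out[n++] = (struct sysfs_write){ "layout", r->layout };
	return n;
}
```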

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 managemon.c   |    6 ++++++
 mdmon.h       |    1 +
 super-intel.c |    4 ++++
 3 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/managemon.c b/managemon.c
index 4874681..6ac7178 100644
--- a/managemon.c
+++ b/managemon.c
@@ -520,6 +520,12 @@ static void manage_member(struct mdstat_ent *mdstat,
 					}
 				}
 
+				if (status_ok && newa->reshape_chunk_size > 0) {
+					dprintf("managemon: set chunk_size to %i\n", newa->reshape_chunk_size);
+					if (sysfs_set_num(&newa->info, NULL, "chunk_size", newa->reshape_chunk_size) < 0)
+						status_ok = 0;
+				}
+
 				if (status_ok && newa->reshape_layout >= 0) {
 					dprintf("managemon: set layout to %i\n", newa->reshape_layout);
 					if (sysfs_set_num(&newa->info, NULL, "layout", newa->reshape_layout) < 0)
diff --git a/mdmon.h b/mdmon.h
index a31ad97..c83bd99 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -52,6 +52,7 @@ struct active_array {
 	int reshape_raid_disks;
 	int reshape_level;
 	int reshape_layout;
+	int reshape_chunk_size;
 
 	int check_degraded; /* flag set by mon, read by manage */
 
diff --git a/super-intel.c b/super-intel.c
index dee23e2..5779129 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -317,6 +317,7 @@ struct imsm_update_reshape {
 	int reshape_raid_disks;
 	int reshape_level;
 	int reshape_layout;
+	int reshape_chunk_size;
 	int disks_count;
 	int spares_in_update;
 	int devnum;
@@ -5445,6 +5446,7 @@ static void imsm_process_update(struct supertype *st,
 			a->reshape_level = 5;
 			a->reshape_layout = 5;
 		}
+		a->reshape_chunk_size = u->reshape_chunk_size;
 
 		super->updates_pending++;
 update_reshape_exit:
@@ -6540,6 +6542,7 @@ struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(struct super
 	u->reshape_raid_disks = 0;
 	u->reshape_level = -1;
 	u->reshape_layout = -1;
+	u->reshape_chunk_size = -1;
 	u->update_memory_size = update_memory_size;
 	u->type = update_reshape;
 	u->spares_in_update = 0;
@@ -6884,6 +6887,7 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 					u->reshape_raid_disks = geo.raid_disks;
 				u->reshape_level = geo.level;
 				u->reshape_layout = geo.layout;
+				u->reshape_chunk_size = geo.chunksize;
 				ret_val = 0;
 				append_metadata_update(st, u, u->update_memory_size);
 			}


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 24/27] Add takeover support for external meta
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (22 preceding siblings ...)
  2010-12-06 13:23 ` [PATCH 23/27] Migration: Chunk size migration Adam Kwolek
@ 2010-12-06 13:23 ` Adam Kwolek
  2010-12-06 13:24 ` [PATCH 25/27] Takeover raid10 -> raid0 for external metadata Adam Kwolek
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

This patch introduces raid0->raid10 and raid10->raid0 takeover operations for external
metadata. It defines all the necessary functions, interfaces and structures.
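The checks imsm_reshape_super() applies before treating a request as a takeover can be summarised as a small decision function. This is a sketch only: the enum, the function name and SKETCH_UNSET (a stand-in for mdadm's UnSet marker) are hypothetical, and the real code reads the current level from sysfs and metadata instead of taking it as a parameter.

```c
#define SKETCH_UNSET 0xfffe  /* stand-in for mdadm's UnSet marker */

enum takeover_decision {
	NOT_TAKEOVER,         /* fall through to the ordinary volume checks */
	TAKEOVER_NO_UPDATE,   /* raid5 -> raid5: nothing to send */
	TAKEOVER_REJECTED,    /* container holds more than one volume */
	TAKEOVER_SEND_UPDATE, /* raid0 <-> raid10: send an update_level */
};

static enum takeover_decision classify_takeover(long long size, int layout,
						int raid_disks, int new_level,
						int cur_level, int dev_id,
						int volumes_in_container)
{
	/* only a pure level change on a known volume counts as a takeover */
	if (size != -1 || layout != SKETCH_UNSET || raid_disks != 0 ||
	    new_level == SKETCH_UNSET || dev_id < 0)
		return NOT_TAKEOVER;
	if (new_level == 5 && cur_level == 5)
		return TAKEOVER_NO_UPDATE;
	if (volumes_in_container > 1)
		return TAKEOVER_REJECTED;
	if ((new_level == 0 && cur_level == 10) ||
	    (new_level == 10 && cur_level == 0))
		return TAKEOVER_SEND_UPDATE;
	return NOT_TAKEOVER;
}
```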

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
---

 Grow.c        |   55 +++++++++++++++++++++++++++-------
 super-intel.c |   91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 133 insertions(+), 13 deletions(-)

diff --git a/Grow.c b/Grow.c
index 465309c..cf37288 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1012,7 +1012,7 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 	 *
 	 */
 	struct mdu_array_info_s array, orig;
-	char *c;
+	char *c = NULL;
 	int rv = 0;
 	struct supertype *st;
 	char *subarray = NULL;
@@ -1305,17 +1305,47 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 				rv = 1;/* not possible */
 				goto release;
 			}
-			err = sysfs_set_str(sra, NULL, "level", c);
-			if (err) {
-				err = errno;
-				fprintf(stderr, Name ": %s: could not set level to %s\n",
-					devname, c);
-				if (err == EBUSY && 
-				    (array.state & (1<<MD_SB_BITMAP_PRESENT)))
-					fprintf(stderr, "       Bitmap must be removed before level can be changed\n");
+			if (level > 0) {
+				err = sysfs_set_str(sra, NULL, "level", c);
+				if (err) {
+					err = errno;
+					fprintf(stderr, Name ": %s: could not set level to %s\n",
+						devname, c);
+					if (err == EBUSY &&
+					    (array.state & (1<<MD_SB_BITMAP_PRESENT)))
+						fprintf(stderr, "       Bitmap must be removed before level can be changed\n");
+					rv = 1;
+					goto release;
+				}
+			}
+
+			if (st && reshape_super(st, -1, level, UnSet, 0, 0, NULL, devname, !quiet)) {
 				rv = 1;
 				goto release;
 			}
+			/* before sending update make sure that for external metadata
+			 * and after changing raid level mdmon is running
+			 */
+			if (st->ss->external && !mdmon_running(st->container_dev) &&
+			    level > 0) {
+				start_mdmon(st->container_dev);
+				if (container)
+					ping_monitor(container);
+			}
+			sync_metadata(st);
+			if (level == 0) {
+				err = sysfs_set_str(sra, NULL, "level", c);
+				if (err) {
+					err = errno;
+					fprintf(stderr, Name ": %s: could not set level to %s\n",
+						devname, c);
+					if (err == EBUSY &&
+					    (array.state & (1<<MD_SB_BITMAP_PRESENT)))
+						fprintf(stderr, "       Bitmap must be removed before level can be changed\n");
+					rv = 1;
+				}
+				goto release;
+			}
 			orig = array;
 			orig_level = orig.level;
 			ioctl(fd, GET_ARRAY_INFO, &array);
@@ -1327,6 +1357,7 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 				fprintf(stderr, Name " level of %s changed to %s\n",
 					devname, c);
 			changed = 1;
+
 		}
 	}
 
@@ -1381,8 +1412,7 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 			/* Looks like this level change doesn't need
 			 * a reshape after all.
 			 */
-			c = map_num(pers, level);
-			if (c) {
+			if ((c) && (level == 0)) {
 				rv = sysfs_set_str(sra, NULL, "level", c);
 				if (rv) {
 					int err = errno;
@@ -1401,7 +1431,8 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 		if (st->ss->external && !mdmon_running(st->container_dev) &&
 		    level > 0) {
 			start_mdmon(st->container_dev);
-			ping_monitor(container);
+			if (container)
+				ping_monitor(container);
 		}
 		goto release;
 	}
diff --git a/super-intel.c b/super-intel.c
index 5779129..27196d7 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -289,6 +289,7 @@ enum imsm_update_type {
 	update_reshape,
 	update_reshape_set_slots,
 	update_reshape_cancel,
+	update_level,
 };
 
 struct imsm_update_activate_spare {
@@ -365,6 +366,30 @@ struct imsm_update_add_disk {
 	enum imsm_update_type type;
 };
 
+struct imsm_disk_changes {
+	int major;
+	int minor;
+	int index;
+};
+
+struct imsm_update_level {
+	enum imsm_update_type type;
+	struct dl *disk_list;
+	int delta_disks;
+	int container_member;
+	int disk_qan;
+	int changes_offset;
+	int rm_qan;
+	int add_qan;
+	struct imsm_dev dev;
+	/* here goes the table with disk changes
+	 */
+	/* and here goes imsm_disk_changes pointed by changes_offset
+	 * disk_changes are put here as row data every sizeof(struct imsm_disk_changes)
+	 *
+	 */
+};
+
 static struct supertype *match_metadata_desc_imsm(char *arg)
 {
 	struct supertype *st;
@@ -5575,6 +5600,9 @@ update_reshape_exit:
 		super->updates_pending++;
 		break;
 	}
+	case update_level: {
+		break;
+	}
 	case update_activate_spare: {
 		struct imsm_update_activate_spare *u = (void *) update->buf; 
 		struct imsm_dev *dev = get_imsm_dev(super, u->array);
@@ -5955,6 +5983,9 @@ static void imsm_prepare_update(struct supertype *st,
 		u->update_prepared = -1;
 		break;
 	}
+	case update_level: {
+		break;
+	}
 	case update_create_array: {
 		struct imsm_update_create_array *u = (void *) update->buf;
 		struct intel_dev *dv;
@@ -6108,6 +6139,13 @@ static const char *imsm_get_disk_controller_domain(const char *path)
 		return NULL;
 }
 
+static int update_level_imsm(struct supertype *st, struct mdinfo *info,
+			     struct geo_params *geo, int verbose,
+			     int uuid_set, char *homehost)
+{
+	return 0;
+}
+
 int imsm_reshape_is_allowed_on_container(struct supertype *st,
 					 struct geo_params *geo)
 {
@@ -6777,6 +6815,7 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 {
 	int ret_val = 1;
 	struct mdinfo *sra = NULL;
+	struct mdinfo *srac = NULL;
 	int fd = -1;
 	int fdc = -1;
 	char buf[PATH_MAX];
@@ -6830,6 +6869,7 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 			dprintf("imsm: Operation is not allowed on container\n");
 		if (ret_val)
 			unfreeze_container(st);
+
 		goto imsm_reshape_super_exit;
 	} else
 		dprintf("imsm: not a container operation\n");
@@ -6841,6 +6881,13 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 		goto imsm_reshape_super_exit;
 	}
 
+	srac = sysfs_read(fdc, 0,  GET_VERSION | GET_LEVEL | GET_LAYOUT |
+			  GET_DISKS | GET_DEVS | GET_CHUNK | GET_SIZE);
+	if (srac == NULL) {
+		fprintf(stderr, Name ": Cannot read sysfs info (imsm)\n");
+		goto imsm_reshape_super_exit;
+	}
+
 	fd = open(geo.dev_name , O_RDONLY | O_DIRECT);
 	if (fd < 0) {
 		dprintf("imsm: cannot open device: %s\n", geo.dev_name);
@@ -6857,7 +6904,48 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 	geo.dev_id = -1;
 	find_array_minor(geo.dev_name, 1, st->devnum, &geo.dev_id);
 
-	/* continue volume check - proceed if delta_disk is zero only
+	/* we have volume so takeover can be performed for single volume only
+	 */
+	if ((geo.size == -1) && (geo.layout == UnSet) && (geo.raid_disks == 0) && (geo.level != UnSet) &&
+	    (geo.dev_id > -1)) {
+		/* ok - this is takeover */
+		struct intel_super *super;
+
+		/* takeover raid0<->raid5 doesn't need meta update
+		 * this can be handled by migrations if necessary
+		 */
+		if ((geo.level == 5) && (sra->array.level == 5)) {
+			ret_val = 0;
+			goto imsm_reshape_super_exit;
+		}
+		st->ss->load_super(st, fdc, NULL);
+		super = st->sb;
+		if (!super) {
+			fprintf(stderr, Name ": Super pointer is NULL.\n");
+			goto imsm_reshape_super_exit;
+		}
+		if (super->anchor->num_raid_devs > 1) {
+			fprintf(stderr, Name ": Cannot perform raid10 takeover on multiarray container for imsm.\n");
+			goto imsm_reshape_super_exit;
+		}
+		super->current_vol = 0;
+		st->ss->getinfo_super(st, sra, NULL);
+
+		/* send metadata update for
+		 * raid10 -> raid0 or raid0 -> raid10 takeover */
+		if (((geo.level == 0) && (sra->array.level == 10)) ||
+		   ((geo.level == 10) && (sra->array.level == 0))) {
+			st->update_tail = &st->updates;
+			if (update_level_imsm(st, sra, &geo, 0, 0, NULL) == 0)
+				ret_val = 0;
+			else
+				ret_val = 1;
+			goto imsm_reshape_super_exit;
+		}
+	}
+
+	/* this is not takeover
+	 * continue volume check - proceed if delta_disk is zero only
 	 */
 	if (geo.raid_disks > 0 && geo.raid_disks != UnSet)
 		delta_disks = geo.raid_disks - sra->array.raid_disks;
@@ -6910,6 +6998,7 @@ int imsm_reshape_super(struct supertype *st, long long size, int level,
 
 imsm_reshape_super_exit:
 	sysfs_free(sra);
+	sysfs_free(srac);
 	if (fd >= 0)
 		close(fd);
 	if (fdc >= 0)



* [PATCH 25/27] Takeover raid10 -> raid0 for external metadata
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (23 preceding siblings ...)
  2010-12-06 13:23 ` [PATCH 24/27] Add takeover support for external meta Adam Kwolek
@ 2010-12-06 13:24 ` Adam Kwolek
  2010-12-06 13:24 ` [PATCH 26/27] Takeover raid0 -> raid10 " Adam Kwolek
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:24 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

The patch introduces takeover from level 10 to level 0 for imsm
metadata. This patch contains procedures connected with preparing
and applying the metadata update during a 10 -> 0 takeover.
When performing a 10->0 takeover, mdmon should update the external
metadata (due to disk slot and level changes).
To achieve that, mdadm, after changing the level in md, calls
reshape_super() to prepare the "update_level" metadata update.
reshape_super() allocates a new imsm_dev with updated disk slot
numbers to be processed by mdmon in process_update().
process_update() marks disks that are no longer used as spares,
updates the disk order table, and then discovers missing disks and
adds them to imsm metadata.
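A minimal sketch of the order-table bookkeeping described above, for illustration only. It assumes the order table is a plain int array in which -1 marks a slot the new raid0 layout does not use, and that the metadata indexes of the removed mirror disks are already known. The names ord_tbl_renumber() and ord_tbl_drop_removed() are hypothetical and are not part of super-intel.c; the real process_update() also renumbers the struct dl disk list and marks the removed disks as spares.

```c
#include <stddef.h>

/*
 * Hypothetical helper: new metadata index of a surviving disk once the
 * disks listed in 'removed' are dropped (every removed index below it
 * shifts it down by one).
 */
static int ord_tbl_renumber(int idx, const int *removed, size_t n_removed)
{
	size_t k;
	int shift = 0;

	for (k = 0; k < n_removed; k++)
		if (removed[k] < idx)
			shift++;
	return idx - shift;
}

/*
 * Hypothetical helper: compact an order table after a 10 -> 0 takeover.
 * tbl[] holds one metadata disk index per slot, -1 for unused slots.
 * Surviving entries are renumbered and moved to the front; the tail is
 * reset to -1.  Returns the new number of members.
 */
static size_t ord_tbl_drop_removed(int *tbl, size_t n,
				   const int *removed, size_t n_removed)
{
	size_t i, out = 0;

	for (i = 0; i < n; i++) {
		if (tbl[i] < 0)
			continue;	/* slot not used by the raid0 layout */
		tbl[out++] = ord_tbl_renumber(tbl[i], removed, n_removed);
	}
	for (i = out; i < n; i++)
		tbl[i] = -1;
	return out;
}
```

For a four-disk raid10 whose mirror disks 1 and 3 are dropped, an order table of {0, 2, -1, -1} becomes {0, 1, -1, -1} with two members.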

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
---

 super-intel.c |  193 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 193 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 27196d7..a39292a 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -5601,6 +5601,85 @@ update_reshape_exit:
 		break;
 	}
 	case update_level: {
+		struct imsm_update_level *u = (void *)update->buf;
+		struct imsm_dev *dev_new, *dev = NULL;
+		struct imsm_map *map;
+		struct dl *d;
+		int i, j;
+		int start_disk;
+
+		dev_new = &u->dev;
+		for (i = 0; i < mpb->num_raid_devs; i++) {
+			dev = get_imsm_dev(super, i);
+			if (strcmp((char *)dev_new->volume, (char *)dev->volume) == 0)
+				break;
+		}
+		if (i == super->anchor->num_raid_devs)
+			return;
+
+		if (dev == NULL)
+			return;
+
+		struct imsm_disk_changes *changes = (struct imsm_disk_changes *)((void *)u + u->changes_offset);
+		map = get_imsm_map(dev_new, 0);
+		int *tab = (int *)&map->disk_ord_tbl;
+
+		/* iterate through devices to mark unused disks as spare and update order table */
+		for (i = 0; i < u->rm_qan; i++) {
+			struct dl *dm = NULL;
+			for (dm = super->disks; dm; dm = dm->next) {
+				if ((dm->major != changes[i].major) ||
+				    (dm->minor != changes[i].minor))
+					continue;
+				for (j = 0; j < u->disk_qan; j++)
+					if ((tab[j] > dm->index) && (dm->index >= 0))
+						tab[j]--;
+				struct dl *du;
+				for (du = super->disks; du; du = du->next)
+					if ((du->index > dm->index) && (du->index > 0))
+						du->index--;
+				dm->disk.status = SPARE_DISK;
+				dm->index = -1;
+			}
+		}
+
+		if (u->rm_qan) {
+			/* Remove unused entrys in disk_ord_tbl */
+			for (i = 0; i < u->disk_qan; i++) {
+				if (tab[i] < 0)
+					for (j = i; j < (u->disk_qan - 1); j++)
+						tab[j] = tab[j+1];
+			}
+		}
+
+		imsm_copy_dev(dev, dev_new);
+		map = get_imsm_map(dev, 0);
+		start_disk = mpb->num_disks;
+
+		/* clear missing disks list */
+		while (super->missing) {
+			d = super->missing;
+			super->missing = d->next;
+			__free_imsm_disk(d);
+		}
+		if (u->rm_qan)
+			find_missing(super);
+
+		/* clear new disk entries if number of disks increased*/
+		d = super->missing;
+		for (i = start_disk; i < map->num_members; i++) {
+			if (!d)
+				break;
+			memset(&d->disk, 0, sizeof(d->disk));
+			strcpy((char *)d->disk.serial, "MISSING");
+			d->disk.total_blocks = map->blocks_per_member;
+			/* Set slot for missing disk */
+			set_imsm_ord_tbl_ent(map, i, d->index | IMSM_ORD_REBUILD);
+			d->raiddisk = i;
+			d = d->next;
+		}
+
+		super->updates_pending++;
 		break;
 	}
 	case update_activate_spare: {
@@ -6143,6 +6222,120 @@ static int update_level_imsm(struct supertype *st, struct mdinfo *info,
 			     struct geo_params *geo, int verbose,
 			     int uuid_set, char *homehost)
 {
+	struct intel_super *super = st->sb;
+	struct imsm_update_level *u;
+	struct imsm_dev *dev_new, *dev = NULL;
+	struct imsm_map *map_new, *map;
+	struct mdinfo *newdi;
+	struct dl *dl;
+	int *tmp_ord_tbl;
+	int i, slot, idx;
+	int len;
+
+	/* update level is used only for 0->10 and 10->0 transitions */
+	if ((info->array.level != 10 && (geo->level != 0)) &&
+		((info->array.level != 0) && (geo->level != 10)))
+		return 1;
+
+	/* find requested device */
+	for (i = 0; i < super->anchor->num_raid_devs; i++) {
+		dev = __get_imsm_dev(super->anchor, i);
+		int devnum = -1;
+
+		if (dev == NULL)
+			return 1;
+
+		find_array_minor((char *)dev->volume, 1, st->devnum, &devnum);
+		if (devnum == geo->dev_id)
+			break;
+	}
+
+	map = get_imsm_map(dev, 0);
+	geo->raid_disks = (geo->level == 10) ? 4 : map->num_members;
+
+	if (!is_raid_level_supported(super->orom,
+				     geo->level,
+				     geo->raid_disks))
+		return 1;
+
+	len = sizeof(struct imsm_update_level) +
+		((geo->raid_disks - 1) * sizeof(__u32)) +
+		(geo->raid_disks * sizeof(struct imsm_disk_changes));
+
+	u = malloc(len);
+	if (u == NULL)
+		return 1;
+
+	u->changes_offset = sizeof(struct imsm_update_level) + ((geo->raid_disks - 1) * sizeof(__u32));
+	struct imsm_disk_changes *change = (struct imsm_disk_changes *) ((void *)u + u->changes_offset);
+	u->rm_qan = 0;
+	u->disk_list = NULL;
+	u->disk_qan = geo->raid_disks;
+
+	dev_new = &u->dev;
+	imsm_copy_dev(dev_new, dev);
+	map_new = get_imsm_map(dev_new, 0);
+
+	tmp_ord_tbl = malloc(sizeof(int) * geo->raid_disks);
+	if (tmp_ord_tbl == NULL) {
+		free(u);
+		return 1;
+	}
+
+	for (i = 0; i < geo->raid_disks; i++) {
+		tmp_ord_tbl[i] = -1;
+		change[i].major = -1;
+		change[i].minor = -1;
+	}
+
+	/* 10->0 transition:
+	 * - mark unused disks
+	 * - update indexes in order table
+	 */
+	if (geo->level == 0) {
+	/* iterate through devices to detect slot changes */
+		i = 0;
+		for (dl = super->disks; dl; dl = dl->next) {
+			idx = -1;
+			for (newdi = info->devs; newdi; newdi = newdi->next) {
+				if ((dl->major != newdi->disk.major) ||
+					    (dl->minor != newdi->disk.minor) ||
+					    (newdi->disk.raid_disk < 0))
+					continue;
+				slot = get_imsm_disk_slot(map, dl->index);
+				idx = get_imsm_ord_tbl_ent(dev_new, slot);
+				tmp_ord_tbl[newdi->disk.raid_disk] = idx;
+				break;
+			}
+			/* if slot not found, mark disk as not used */
+			if ((idx == -1) && (!(dl->disk.status & SPARE_DISK))) {
+				change[i].major = dl->major;
+				change[i].minor = dl->minor;
+				u->rm_qan++;
+				i++;
+			}
+		}
+		for (i = 0; i < geo->raid_disks; i++)
+		set_imsm_ord_tbl_ent(map_new, i, tmp_ord_tbl[i]);
+	}
+
+	map_new->num_members = (geo->level == 10) ? geo->raid_disks : (info->array.raid_disks - u->rm_qan);
+	map_new->map_state = IMSM_T_STATE_NORMAL;
+	map_new->failed_disk_num = 0;
+
+	if (geo->level == 10) {
+		map_new->num_domains = map_new->num_members / 2;
+		map_new->raid_level = 1;
+	} else {
+		map_new->num_domains = 1;
+		map_new->raid_level = geo->level;
+	}
+
+	u->type = update_level;
+	u->delta_disks = 0;
+	u->container_member = info->container_member;
+	append_metadata_update(st, u, len);
+	free(tmp_ord_tbl);
 	return 0;
 }
 



* [PATCH 26/27] Takeover raid0 -> raid10 for external metadata
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (24 preceding siblings ...)
  2010-12-06 13:24 ` [PATCH 25/27] Takeover raid10 -> raid0 for external metadata Adam Kwolek
@ 2010-12-06 13:24 ` Adam Kwolek
  2010-12-06 13:24 ` [PATCH 27/27] FIX: Problem with removing array after takeover Adam Kwolek
  2010-12-07 10:18 ` [PATCH 00/27] OLCE, migrations and raid10 takeover Neil Brown
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:24 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

The patch introduces takeover from level 0 to level 10 for imsm
metadata. This patch contains procedures connected with preparing
and applying the metadata update during a 0 -> 10 takeover.

When performing a 0->10 takeover, mdmon should update the external
metadata (due to disk slot and level changes).
To achieve that, mdadm, after changing the level in md, calls
reshape_super() to prepare the "update_level" metadata update.
reshape_super() allocates a new imsm_dev with updated disk slot
numbers to be processed by mdmon in process_update().
For every slot that md left empty, prepare_update() allocates a
"dummy" disk entry, and process_update() marks that slot for rebuild.
process_update() discovers missing disks and adds them to imsm
metadata.
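A minimal sketch of how the empty slots could be found and flagged during a 0 -> 10 takeover, assuming md reports the raid_disk slot of every surviving member. It mirrors the slot search in update_level_imsm() and the identity order table built in process_update(), but ORD_REBUILD_FLAG and its value are placeholders standing in for IMSM_ORD_REBUILD, and raid10_build_ord_tbl() is a hypothetical name.

```c
#include <stddef.h>

/* Placeholder for IMSM_ORD_REBUILD; the value here is an assumption. */
#define ORD_REBUILD_FLAG (1 << 24)

/* Return 1 if some surviving member already occupies 'slot'. */
static int slot_is_present(const int *present, size_t n_present, int slot)
{
	size_t k;

	for (k = 0; k < n_present; k++)
		if (present[k] == slot)
			return 1;
	return 0;
}

/*
 * Hypothetical helper: fill tbl[0..raid_disks-1] with an identity order
 * table and flag every slot that md left empty (a "dummy" disk) for
 * rebuild.  The empty slot numbers are also stored in missing[].
 * Returns the number of dummy disks that need to be added.
 */
static int raid10_build_ord_tbl(const int *present, size_t n_present,
				int raid_disks, int *tbl, int *missing)
{
	int slot, n_missing = 0;

	for (slot = 0; slot < raid_disks; slot++) {
		tbl[slot] = slot;
		if (!slot_is_present(present, n_present, slot)) {
			tbl[slot] |= ORD_REBUILD_FLAG;
			missing[n_missing++] = slot;
		}
	}
	return n_missing;
}
```

For example, with surviving members in slots 0 and 2 of a four-disk raid10, slots 1 and 3 are reported as missing and flagged for rebuild.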

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
---

 Grow.c        |    2 ++
 super-intel.c |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/Grow.c b/Grow.c
index cf37288..24955b4 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1346,6 +1346,8 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
 				}
 				goto release;
 			}
+			if (level == 10)
+			    goto release;
 			orig = array;
 			orig_level = orig.level;
 			ioctl(fd, GET_ARRAY_INFO, &array);
diff --git a/super-intel.c b/super-intel.c
index a39292a..67209d2 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -3623,6 +3623,8 @@ static int write_super_imsm(struct supertype *st, int doclose)
 	for (d = super->disks; d ; d = d->next) {
 		if (d->index < 0)
 			continue;
+		if (d->fd < 0)
+		    continue;
 		if (store_imsm_mpb(d->fd, mpb))
 			fprintf(stderr, "%s: failed for device %d:%d %s\n",
 				__func__, d->major, d->minor, strerror(errno));
@@ -5652,6 +5654,25 @@ update_reshape_exit:
 			}
 		}
 
+		if (u->add_qan)
+		    for (i = 0; i < u->disk_qan; i++)
+			tab[i] = i;
+
+		struct dl *dc;
+		for (i = 0; i < u->add_qan; i++) {
+			/* update indexes in current list */
+			for (dc = super->disks; dc; dc = dc->next) {
+				if (dc->index >= changes[i].index)
+				    dc->index++;
+			}
+			/* mark dummy disks for rebuild */
+			tab[changes[i].index] |= IMSM_ORD_REBUILD;
+		}
+		/* append dummy disk list at the end of current list */
+		for (dc = super->disks; dc->next; dc = dc->next)
+			; /* nothing to do, just go to the end of list */
+		dc->next = u->disk_list;
+
 		imsm_copy_dev(dev, dev_new);
 		map = get_imsm_map(dev, 0);
 		start_disk = mpb->num_disks;
@@ -6063,6 +6084,30 @@ static void imsm_prepare_update(struct supertype *st,
 		break;
 	}
 	case update_level: {
+		struct imsm_update_level *u = (void *) update->buf;
+		int i;
+		struct imsm_disk_changes *changes = (struct imsm_disk_changes *)((void *)u + u->changes_offset);
+
+		dprintf("prepare_update(): update level\n");
+
+		for (i = 0; i < u->add_qan; i++) {
+			struct dl *d = calloc(1, sizeof(struct dl));
+			if (!d)
+			    break;
+			memcpy(d, super->disks, sizeof(struct dl));
+
+			d->disk.status = FAILED_DISK;
+			strcpy((char *)d->disk.serial, "dummy");
+			strcpy((char *)d->serial, "dummy");
+			d->disk.scsi_id = 0;
+			d->fd = -1;
+			d->minor = 0;
+			d->major = 0;
+			d->index = changes[i].index;
+			d->next = u->disk_list;
+			u->disk_list = d;
+		}
+		len = disks_to_mpb_size(u->add_qan + mpb->num_disks - u->rm_qan);
 		break;
 	}
 	case update_create_array: {
@@ -6269,6 +6314,7 @@ static int update_level_imsm(struct supertype *st, struct mdinfo *info,
 	u->changes_offset = sizeof(struct imsm_update_level) + ((geo->raid_disks - 1) * sizeof(__u32));
 	struct imsm_disk_changes *change = (struct imsm_disk_changes *) ((void *)u + u->changes_offset);
 	u->rm_qan = 0;
+	u->add_qan = 0;
 	u->disk_list = NULL;
 	u->disk_qan = geo->raid_disks;
 
@@ -6286,6 +6332,7 @@ static int update_level_imsm(struct supertype *st, struct mdinfo *info,
 		tmp_ord_tbl[i] = -1;
 		change[i].major = -1;
 		change[i].minor = -1;
+		change[i].index = -1;
 	}
 
 	/* 10->0 transition:
@@ -6319,6 +6366,25 @@ static int update_level_imsm(struct supertype *st, struct mdinfo *info,
 		set_imsm_ord_tbl_ent(map_new, i, tmp_ord_tbl[i]);
 	}
 
+	/* 0->10 transition:
+	 * - add dummy disks to metdata
+	 * - store slots for dummy disks in update buffer
+	 */
+	if (geo->level == 10) {
+		u->add_qan = 0;
+		for (i = 0; i < geo->raid_disks; i++) {
+			int found = 0;
+			for (newdi = info->devs; newdi; newdi = newdi->next) {
+				if (newdi->disk.raid_disk == i) {
+					found = 1;
+					break;
+				}
+			}
+		if (!found)
+		    change[u->add_qan++].index = i;
+		}
+	}
+
 	map_new->num_members = (geo->level == 10) ? geo->raid_disks : (info->array.raid_disks - u->rm_qan);
 	map_new->map_state = IMSM_T_STATE_NORMAL;
 	map_new->failed_disk_num = 0;



* [PATCH 27/27] FIX: Problem with removing array after takeover
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (25 preceding siblings ...)
  2010-12-06 13:24 ` [PATCH 26/27] Takeover raid0 -> raid10 " Adam Kwolek
@ 2010-12-06 13:24 ` Adam Kwolek
  2010-12-07 10:18 ` [PATCH 00/27] OLCE, migrations and raid10 takeover Neil Brown
  27 siblings, 0 replies; 36+ messages in thread
From: Adam Kwolek @ 2010-12-06 13:24 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

When array parameters are changed, the old array 'A' is going to be removed
and the new array 'B' is going to be serviced. If array 'B' is a raid0 array
(the result of a takeover), array 'A' is never deleted and mdmon does not exit.
Scenario:
1. managemon creates array 'B' and inserts it at the beginning of the active arrays list
2. managemon sets the field B->replaces = A
3. monitor finds that array 'B' is a raid0 array and removes it from the list;
   the information about removing array 'A' from the list is lost
   and array 'A' stays in the list forever

To resolve this situation, defer removing array 'B' until array 'A' has been removed.
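A minimal sketch of the deferred-discard rule, using a hypothetical, heavily simplified stand-in for mdmon's active array list (struct arr, ready_to_discard() and discard_ready() are illustrative names, not the monitor.c code). It shows only the guard added by this patch: an array that would otherwise be discarded is kept while its replaces pointer is still set.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for mdmon's struct active_array. */
struct arr {
	int has_container;	/* 0 once the array has been deactivated */
	int level;		/* raid level reported by md */
	struct arr *replaces;	/* array this one supersedes, if any */
	struct arr *next;
};

/* Same shape as the patched test: discardable only if nothing is replaced. */
static int ready_to_discard(const struct arr *a)
{
	return (!a->has_container || a->level == 0) && a->replaces == NULL;
}

/* Unlink every array that is ready; the caller owns the memory. */
static int discard_ready(struct arr **head)
{
	struct arr **ap = head;
	int n = 0;

	while (*ap) {
		if (ready_to_discard(*ap)) {
			*ap = (*ap)->next;	/* drop it from the list */
			n++;
		} else {
			ap = &(*ap)->next;
		}
	}
	return n;
}
```

While B->replaces still points at 'A', the raid0 array 'B' stays on the list; once 'A' is gone and the pointer is cleared, 'B' is discarded on the next pass.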

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 monitor.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/monitor.c b/monitor.c
index 6385555..ff2da9a 100644
--- a/monitor.c
+++ b/monitor.c
@@ -509,10 +509,14 @@ static int wait_and_act(struct supertype *container, int nowait)
 
 	for (ap = aap ; *ap ;) {
 		a = *ap;
+
 		/* once an array has been deactivated we want to
 		 * ask the manager to discard it.
+		 * but to do this we have to wait until replaced
+		 * array is removed
 		 */
-		if (!a->container || (a->info.array.level == 0)) {
+		if ((!a->container || (a->info.array.level == 0)) &&
+		     !a->replaces) {
 			if (discard_this) {
 				ap = &(*ap)->next;
 				continue;



* Re: [PATCH 00/27] OLCE, migrations and raid10 takeover
  2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
                   ` (26 preceding siblings ...)
  2010-12-06 13:24 ` [PATCH 27/27] FIX: Problem with removing array after takeover Adam Kwolek
@ 2010-12-07 10:18 ` Neil Brown
  27 siblings, 0 replies; 36+ messages in thread
From: Neil Brown @ 2010-12-07 10:18 UTC (permalink / raw)
  To: Adam Kwolek; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

On Mon, 06 Dec 2010 14:20:45 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:

> This series for mdadm introduces the following features (after some rework):
> - Online Capacity Expansion (OLCE): patches 0002 to 0016
> - Migrations: patches 0017 to 0023
>     1. raid0 to raid5: patches 0017, 0018
>     2. raid5 to raid0: patches 0019, 0020
>     3. chunk size migration: patches 0020, 0021
> - Takeover: patches 0024 to 0027
> 
> Next steps:
> - Adding spares to raid0 for IMSM will be rewritten by Krzysztof Wojcik in the next few days.
> - I'll correct checkpointing to work without the md fix for moving suspend_hi

Thanks Adam.

I had hoped to get to these today, but other things got in the way and I just
couldn't make the time.  Hopefully I'll look through and provide some
feedback (and apply a lot of them) tomorrow.

Thanks,
NeilBrown


> 
> 
> Online Capacity Expansion for raid0 and raid5 arrays implements the following algorithm for container reshape:
> 1.      mdadm: Freeze container
> 2.      mdadm: Perform takeover to raid5 for all raid0 arrays in container (for imsm, raid0 <-> raid5 takeover requires no metadata updates)
> 3.      mdadm: set raid_disks sysfs entry for all arrays in container
> 4.      mdadm: prepares and sends metadata update using reshape_super() vector for first array in container.
> 5.      mdadm: waits for array idle or reshape state
> 6.      managemon: prepare_update(): allocates memory for bigger device object
> 7.      monitor: process_update(): applies update, relinks memory for device objects. Sets reshape_delta_disks variable in active array to the requested number of new disks
> 8.      monitor: kicks managemon when reshape_delta_disks has a value other than RESHAPE_NOT_ACTIVE and RESHAPE_IN_PROGRESS
> 9.      managemon: adds devices to md (let md set slot number on reshape start)
> 10.     managemon: sets sync_max to 0
> 11.     managemon: starts reshape in md
> 12.     managemon: on success sends slot verification message to monitor to update slots
> 13.     managemon: on failure sends reshape cancelation message (sets idle state to md)
> 14.     managemon: sets reshape_delta_disks variable to RESHAPE_IN_PROGRESS value to avoid managemon procedures reentry.
> 15.     monitor:
>            a. for set slot message verifies and corrects (if necessary) slot information in metadata
>            b. for cancel message rolls back metadata information and sets reshape_delta_disks variable to RESHAPE_NOT_ACTIVE
> 16.     mdadm: on idle array state exits and unfreezes array. End
> 17.     mdadm: on reshape array state continues with reshape (it also sends a ping to monitor and managemon to be sure that metadata updates hit the disks)
> 18.     mdadm: verifies array state (whether slots are set correctly)
> 19.     mdadm: calls child_grow() function
> 20.     mdadm: waits for reshape finish
> 21.     monitor: on reshape finish sets reshape_delta_disks variable to RESHAPE_NOT_ACTIVE
> 22.     mdadm: sets array size according to information in metadata
> 23.     mdadm: for an array that was originally raid0, takeover back to raid0 is executed.
> 24.     mdadm: checks whether another array in container requires reshape; if yes, starts again from #4
> 25.     mdadm: unfreezes array
> 
> Migration feature reuses code flow introduced for OLCE (Online Capacity Expansion) and uses the same grow/reshape flow in mdadm/mdmon.
> Migration works in the following way:
> 1. mdadm: reshape_super() prepares metadata update and sends it to mdmon
> 2. mdadm: waits for reshape array state
> 3. monitor: receives metadata update and applies it.
> 4. monitor: metadata update triggers managemon.
> 5. managemon: updates array (md) configuration and starts reshape
> 6. mdadm: finds that reshape is started and continues it using checkpointing
> 7. mdadm: reshape is finished and manage_reshape() finalizes array:
>     - Sets array size as is given in metadata
>     - Performs takeover to raid0 if necessary
> 
> In the current patches, the placement of the manage_reshape() function call was changed (patch 0019).
> It is moved to the end of array processing so that the external metadata reshape case uses the common code from Grow.c (we do not need to duplicate existing code that would do the same
> things as the code for native metadata). With the new placement, manage_reshape() has only a few things left to do in the current implementation, which simplifies the code.
> 
> Migrations command line:
> 1. Execute migration raid0->raid5:
>     mdadm --grow /dev/md/array_name --level=5 --layout=left-asymmetric
> 
>     This converts an n-disk raid0 array to an (n+1)-disk raid5 array.
>     The additional disk for the raid5 array is taken from the spares pool.
> 
> 2. Execute migration raid5->raid0:
>     mdadm --grow /dev/md/array_name --level=0
> 
>     This converts an n-disk raid5 array to an n-disk raid0 array.
> 
> 3. Execute chunk size migration
>     mdadm --grow /dev/md/array_name --chunk=N
> 
>     where N is the new chunk size value
> 
> Online Capacity Expansion command line:
> 1. Add spares to the container, e.g. mdadm --add /dev/md/imsm_container_name /dev/sdX
>    Spares are also required for raid0. Patch "[PATCH 16] Add spares to raid0 array using takeover" enables this.
> 2. Execute the reshape, e.g. mdadm --grow /dev/md/imsm_container_name --raid-devices=requested_raid_disks_number
>    The grow is executed for all arrays in the container the command is run on.
> 
> The feature is treated as experimental due to Windows compatibility during the reshape process; the code is guarded by the MDADM_EXPERIMENTAL environment variable.
> 
> 
> ---
> 
> Adam Kwolek (27):
>       FIX: Problem with removing array after takeover
>       Takeover raid0 -> raid10 for external metadata
>       Takeover raid10 -> raid0 for external metadata
>       Add takeover support for external meta
>       Migration: Chunk size migration
>       Read chunk size and layout from mdstat
>       Migration raid0->raid5
>       Detect level change
>       Migration: raid5->raid0
>       Change manage_reshape() placement
>       Enable reshape for subarrays
>       FIX: core during getting map
>       WORKAROUND: md reports idle state during reshape start
>       Add reshape progress updating
>       Finalize reshape after adding disks to array
>       Control reshape in mdadm
>       imsm: Fill delta_disks field in getinfo_super()
>       imsm: Do not indicate resync during reshape
>       imsm: Do not accept messages sent by mdadm
>       imsm: Cancel metadata changes on reshape start failure
>       imsm: Verify slots in meta against slot numbers set by md
>       Process reshape initialization by managemon
>       imsm: Block array state change during reshape
>       imsm: Process reshape_update in mdmon
>       imsm: Prepare reshape_update in mdadm
>       Add state_of_reshape for external metadata
>       FIX: wait_backup() sometimes hangs
> 
> 
>  Grow.c        |  161 ++--
>  managemon.c   |  179 ++++
>  mdadm.h       |   40 +
>  mdmon.c       |   65 ++
>  mdmon.h       |    9 
>  mdstat.c      |   11 
>  monitor.c     |   37 +
>  super-intel.c | 2468 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  sysfs.c       |  169 ++++
>  util.c        |  147 +++
>  10 files changed, 3218 insertions(+), 68 deletions(-)
> 
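A minimal sketch of how the reshape_delta_disks field described in steps 7, 8, 14, 15b and 21 of the quoted OLCE algorithm could act as a small state machine. RESHAPE_NOT_ACTIVE and RESHAPE_IN_PROGRESS are the names used in the series, but the negative values chosen here are assumptions (any value that cannot collide with a real disk delta would do), and the struct and function names are hypothetical.

```c
/* Names from the series; these sentinel values are assumptions. */
#define RESHAPE_NOT_ACTIVE  (-1)
#define RESHAPE_IN_PROGRESS (-2)

struct reshape_state {
	int delta_disks;	/* reshape_delta_disks of the active array */
};

/* Step 7: monitor applies the update and records the requested delta. */
static void monitor_apply_update(struct reshape_state *s, int new_disks)
{
	s->delta_disks = new_disks;
}

/* Step 8: monitor kicks managemon only for a pending, unstarted request. */
static int monitor_should_kick(const struct reshape_state *s)
{
	return s->delta_disks != RESHAPE_NOT_ACTIVE &&
	       s->delta_disks != RESHAPE_IN_PROGRESS;
}

/* Step 14: managemon blocks re-entry once it has acted on the request. */
static void managemon_mark_started(struct reshape_state *s)
{
	s->delta_disks = RESHAPE_IN_PROGRESS;
}

/* Steps 15b and 21: cancellation or completion returns to idle. */
static void monitor_reshape_done(struct reshape_state *s)
{
	s->delta_disks = RESHAPE_NOT_ACTIVE;
}
```

Keeping both sentinels outside the range of valid deltas is what lets a single integer carry both the request and the "already handled" state.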



* Re: [PATCH 03/27] imsm: Prepare reshape_update in mdadm
  2010-12-06 13:21 ` [PATCH 03/27] imsm: Prepare reshape_update in mdadm Adam Kwolek
@ 2010-12-08  3:10   ` Neil Brown
  2010-12-08 14:18     ` Kwolek, Adam
  0 siblings, 1 reply; 36+ messages in thread
From: Neil Brown @ 2010-12-08  3:10 UTC (permalink / raw)
  To: Adam Kwolek; +Cc: linux-raid, dan.j.williams, ed.ciechanowski

On Mon, 06 Dec 2010 14:21:08 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:

>  
> +int path2devnum(char *pth)
> +{
> +	char *ep;
> +	int fd = -1;
> +	char *dev_pth = NULL;
> +	char *dev_str;
> +	int dev_num = -1;
> +
> +	fd = open(pth, O_RDONLY);
> +	if (fd < 0)
> +		return dev_num;
> +	close(fd);
> +	dev_pth = canonicalize_file_name(pth);
> +	if (dev_pth == NULL)
> +		return dev_num;
> +	dev_str = strrchr(dev_pth, '/');
> +	if (dev_str) {
> +		while (!isdigit(dev_str[0]))
> +			dev_str++;
> +		dev_num = strtoul(dev_str, &ep, 10);
> +		if (*ep != '\0')
> +			dev_num = -1;
> +	}
> +
> +	if (dev_pth)
> +		free(dev_pth);
> +
> +	return dev_num;
> +}


I have repeatedly asked you to explain and document these functions.  They
look way too complex for whatever it is that they might be trying to do.

I do not feel at all inclined to spend time reviewing your patches if you
don't respond to the review comments.

So I'm ignoring the rest of this series until you explain this and we come to
an agreement on what it does and how it should work.

BTW there are two places in this patch where you should be using open_dev,
and several where you have lines longer than 80 characters.

NeilBrown


> +
> +extern void map_read(struct map_ent **map);
> +extern void map_free(struct map_ent *map);
> +int find_array_minor(char *text_version, int external, int container, int *minor)
> +{
> +	int i;
> +	char path[PATH_MAX];
> +	struct stat s;
> +
> +	if (minor == NULL)
> +		return -2;
> +
> +	snprintf(path, PATH_MAX, "/dev/md/%s", text_version);
> +	i = path2devnum(path);
> +	if (i > -1) {
> +		*minor = i;
> +		return 0;
> +	}
> +
> +	i = path2devnum(text_version);
> +	if (i > -1) {
> +		*minor = i;
> +		return 0;
> +	}
> +
> +	if (container > 0) {
> +		struct map_ent *map = NULL;
> +		struct map_ent *m;
> +		char cont[PATH_MAX];
> +
> +		snprintf(cont, PATH_MAX, "/md%i/", container);
> +		map_read(&map);
> +		for (m = map; m; m = m->next) {
> +			int index;
> +			unsigned int len = 0;
> +			char buf[PATH_MAX];
> +
> +			/* array have belongs to proper container
> +			*/
> +			if (strncmp(cont, m->metadata, 6) != 0)
> +				continue;
> +			/* begin of array name in map have to be the same
> +			 * as array name in metadata
> +			 */
> +			if (strncmp(m->path, path, strlen(path)) != 0)
> +				continue;
> +			/* array name has to be followed by '_' char
> +			 */
> +			len = strlen(path);
> +			if (*(m->path + len) != '_')
> +				continue;
> +			/* then we have to have  valid index
> +			 */
> +			len++;
> +			if (strlen(m->path + len) <= 0)
> +			    continue;
> +			/* index has to be las position in array name
> +			 */
> +			index = atoi(m->path + strlen(path) + 1);
> +			snprintf(buf, PATH_MAX, "%i", index);
> +			len += strlen(buf);
> +			if (len != strlen(m->path))
> +				continue;
> +			dprintf("Found %s device based on mdadm maps\n", m->path);
> +			*minor = m->devnum;
> +			map_free(map);
> +			return 0;
> +		}
> +		map_free(map);
> +	}
> +
> +	for (i = 127; i >= 0; i--) {
> +		char buf[PATH_MAX];
> +
> +		snprintf(path, PATH_MAX, "/sys/block/md%d/md/", i);
> +		if (stat(path, &s) != -1) {
> +			strcat(path, "metadata_version");
> +			if (load_sys(path, buf))
> +				continue;
> +			if (external) {
> +				char *version = strchr(buf, ':');
> +				if (version && strcmp(version + 1,
> +						      text_version))
> +					continue;
> +			} else {
> +				if (strcmp(buf, text_version))
> +					continue;
> +			}
> +			*minor = i;
> +			return 0;
> +		}
> +	}
> +
> +	return -1;
> +}
> +
> +/* find_array_minor2 looks for frozen devices also
> + */
> +int find_array_minor2(char *text_version, int external, int container, int *minor)
> +{
> +	int result;
> +	char buf[PATH_MAX];
> +
> +	strcpy(buf, text_version);
> +	result = find_array_minor(text_version, external, container, minor);
> +	if (result < 0) {
> +		/* try to find frozen array also
> +		 */
> +		char buf[PATH_MAX];
> +
> +		strcpy(buf, text_version);
> +
> +		*buf = '-';
> +		result = find_array_minor(buf, external, container, minor);
> +	}
> +	return result;
> +}
> +
> 



* RE: [PATCH 03/27] imsm: Prepare reshape_update in mdadm
  2010-12-08  3:10   ` Neil Brown
@ 2010-12-08 14:18     ` Kwolek, Adam
  2010-12-08 22:05       ` Neil Brown
  0 siblings, 1 reply; 36+ messages in thread
From: Kwolek, Adam @ 2010-12-08 14:18 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed

I didn't describe those functions because, after your email, I realized that they have to be changed.
I had planned to do this before sending the patches, but unfortunately this "thread" was missed before sending ;).
Here is the current state.

Functions: 
	- path2devnum() (translates a path (or symbolic link) to an md device, as given by the user, into a device number)
	- find_array_minor() (returns the device minor for a given array name; it uses the maps to find names with suffixes (e.g. "_0") added during assembly)
	- find_array_minor2() (like find_array_minor(), but also works for frozen arrays)

These will be removed and replaced by:

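/*
 * find_array_minor_by_subdev() - find the md minor of subarray 'subdev'
 * in container md<container>.  Returns 0 and sets *minor on success,
 * -1 on failure (leaving *minor untouched).
 */
int find_array_minor_by_subdev(int subdev, int container, int *minor)
{
	char text_version[PATH_MAX];
	char path[PATH_MAX];
	int i;
	struct stat s;

	snprintf(text_version, PATH_MAX, "md%i/%i", container, subdev);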
	for (i = 127; i >= 0; i--) {
		char buf[PATH_MAX];

		snprintf(path, PATH_MAX, "/sys/block/md%d/md/", i);
		if (stat(path, &s) != -1) {
			char *version;

			strcat(path, "metadata_version");
			if (load_sys(path, buf))
				continue;
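			version = strchr(buf, ':');
			/* compare without the first character after ':'
			 * ('/' normally, '-' when the array is frozen);
			 * make sure that character exists before skipping it
			 */
			if (!version || version[1] == '\0' ||
			    strcmp(version + 2, text_version))
				continue;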
			*minor = i;
			return 0;
		}
	}

	return -1;
}

This is a small part of find_array_minor() with changed input parameters.

This function searches the given container for subdev and returns its minor number.
For example, for input parameters 0 and 127 it creates the search string "md127/0" and returns the minor of the matching device (e.g. 126).
On success 0 is returned; on failure -1 is returned and the passed minor variable remains untouched.
It searches all md devices from md127 down to md0.
I think it is simpler now (addressing the issue you raised), and mdadm no longer makes decisions based on the array name string.
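A minimal sketch of the string test at the heart of find_array_minor_by_subdev(), pulled out so it can be checked on its own. It assumes metadata_version reads as "external:/md127/0" for an active subarray and "external:-md127/0" for a frozen one, as the code comment above describes; metadata_version_matches() is a hypothetical name. Unlike a bare strcmp(version + 2, ...), it checks that a marker character follows ':' before skipping it and tolerates one trailing newline.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if 'version' (the contents of
 * /sys/block/mdX/md/metadata_version) names subarray 'subdev' of
 * container md<container>, whether active ('/') or frozen ('-').
 */
static int metadata_version_matches(const char *version, int container,
				    int subdev)
{
	char want[32];
	const char *p = strchr(version, ':');
	size_t len;

	/* need ':' followed by the '/' (active) or '-' (frozen) marker */
	if (!p || (p[1] != '/' && p[1] != '-'))
		return 0;
	p += 2;		/* skip ':' and the marker */

	snprintf(want, sizeof(want), "md%d/%d", container, subdev);
	len = strlen(want);

	/* exact match, allowing one trailing newline */
	if (strncmp(p, want, len) != 0)
		return 0;
	return p[len] == '\0' || (p[len] == '\n' && p[len + 1] == '\0');
}
```

Inside the loop over /sys/block/mdN/md/, a helper like this could replace the strchr()/strcmp() pair, so the search string would no longer need to be built by the caller.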

Please let me know what you think of this change.

I've replaced open() with open_dev() in this patch (and in a few subsequent ones: 0003, 0007, 0012, 0019, 0021).

If it is OK, please let me know whether you want a fresh series, or whether you'll apply the current one and I'll then send patches for the above changes.
In future patches, I'll also address the problem with lines longer than 80 characters.

Please also note that the implementation of searching for spare devices for reshape is not final.
It requires better integration with the auto-rebuild spare-search mechanism.
This task is in my immediate plans (I'm hoping to post fixes instead of the whole series again ;))


BR
Adam

> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Wednesday, December 08, 2010 4:10 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed
> Subject: Re: [PATCH 03/27] imsm: Prepare reshape_update in mdadm
> 
> On Mon, 06 Dec 2010 14:21:08 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:
> 
> >
> > +int path2devnum(char *pth)
> > +{
> > +	char *ep;
> > +	int fd = -1;
> > +	char *dev_pth = NULL;
> > +	char *dev_str;
> > +	int dev_num = -1;
> > +
> > +	fd = open(pth, O_RDONLY);
> > +	if (fd < 0)
> > +		return dev_num;
> > +	close(fd);
> > +	dev_pth = canonicalize_file_name(pth);
> > +	if (dev_pth == NULL)
> > +		return dev_num;
> > +	dev_str = strrchr(dev_pth, '/');
> > +	if (dev_str) {
> > +		while (!isdigit(dev_str[0]))
> > +			dev_str++;
> > +		dev_num = strtoul(dev_str, &ep, 10);
> > +		if (*ep != '\0')
> > +			dev_num = -1;
> > +	}
> > +
> > +	if (dev_pth)
> > +		free(dev_pth);
> > +
> > +	return dev_num;
> > +}
> 
> 
> I have repeatedly asked you to explain and document these functions.
> They look way too complex for whatever it is that they might be trying to do.
> 
> I do not feel at all inclined to spend time reviewing your patches if you
> don't respond to the review comments.
> 
> So I'm ignoring the rest of this series until you explain this and we come to
> an agreement on what it does and how it should work.
> 
> BTW there are two places in this patch where you should be using open_dev,
> and several where you have lines longer than 80 characters.
> 
> NeilBrown
> 
> 
> > +
> > +extern void map_read(struct map_ent **map);
> > +extern void map_free(struct map_ent *map);
> > +int find_array_minor(char *text_version, int external, int container, int *minor)
> > +{
> > +	int i;
> > +	char path[PATH_MAX];
> > +	struct stat s;
> > +
> > +	if (minor == NULL)
> > +		return -2;
> > +
> > +	snprintf(path, PATH_MAX, "/dev/md/%s", text_version);
> > +	i = path2devnum(path);
> > +	if (i > -1) {
> > +		*minor = i;
> > +		return 0;
> > +	}
> > +
> > +	i = path2devnum(text_version);
> > +	if (i > -1) {
> > +		*minor = i;
> > +		return 0;
> > +	}
> > +
> > +	if (container > 0) {
> > +		struct map_ent *map = NULL;
> > +		struct map_ent *m;
> > +		char cont[PATH_MAX];
> > +
> > +		snprintf(cont, PATH_MAX, "/md%i/", container);
> > +		map_read(&map);
> > +		for (m = map; m; m = m->next) {
> > +			int index;
> > +			unsigned int len = 0;
> > +			char buf[PATH_MAX];
> > +
> > +			/* array has to belong to the proper container
> > +			*/
> > +			if (strncmp(cont, m->metadata, 6) != 0)
> > +				continue;
> > +			/* beginning of array name in map has to be the same
> > +			 * as array name in metadata
> > +			 */
> > +			if (strncmp(m->path, path, strlen(path)) != 0)
> > +				continue;
> > +			/* array name has to be followed by '_' char
> > +			 */
> > +			len = strlen(path);
> > +			if (*(m->path + len) != '_')
> > +				continue;
> > +			/* then we have to have a valid index
> > +			 */
> > +			len++;
> > +			if (strlen(m->path + len) <= 0)
> > +			    continue;
> > +			/* index has to be the last position in array name
> > +			 */
> > +			index = atoi(m->path + strlen(path) + 1);
> > +			snprintf(buf, PATH_MAX, "%i", index);
> > +			len += strlen(buf);
> > +			if (len != strlen(m->path))
> > +				continue;
> > +			dprintf("Found %s device based on mdadm maps\n", m->path);
> > +			*minor = m->devnum;
> > +			map_free(map);
> > +			return 0;
> > +		}
> > +		map_free(map);
> > +	}
> > +
> > +	for (i = 127; i >= 0; i--) {
> > +		char buf[PATH_MAX];
> > +
> > +		snprintf(path, PATH_MAX, "/sys/block/md%d/md/", i);
> > +		if (stat(path, &s) != -1) {
> > +			strcat(path, "metadata_version");
> > +			if (load_sys(path, buf))
> > +				continue;
> > +			if (external) {
> > +				char *version = strchr(buf, ':');
> > +				if (version && strcmp(version + 1,
> > +						      text_version))
> > +					continue;
> > +			} else {
> > +				if (strcmp(buf, text_version))
> > +					continue;
> > +			}
> > +			*minor = i;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	return -1;
> > +}
> > +
> > +/* find_array_minor2 looks for frozen devices also
> > + */
> > +int find_array_minor2(char *text_version, int external, int container, int *minor)
> > +{
> > +	int result;
> > +	char buf[PATH_MAX];
> > +
> > +	strcpy(buf, text_version);
> > +	result = find_array_minor(text_version, external, container, minor);
> > +	if (result < 0) {
> > +		/* try to find frozen array also
> > +		 */
> > +		char buf[PATH_MAX];
> > +
> > +		strcpy(buf, text_version);
> > +
> > +		*buf = '-';
> > +		result = find_array_minor(buf, external, container, minor);
> > +	}
> > +	return result;
> > +}
> > +
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: [PATCH 03/27] imsm: Prepare reshape_update in mdadm
  2010-12-08 14:18     ` Kwolek, Adam
@ 2010-12-08 22:05       ` Neil Brown
  2010-12-09  8:42         ` Suspend_hi mamagment during reshape Kwolek, Adam
  0 siblings, 1 reply; 36+ messages in thread
From: Neil Brown @ 2010-12-08 22:05 UTC (permalink / raw)
  To: Kwolek, Adam; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed

On Wed, 8 Dec 2010 13:51:27 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> I didn't describe those functions because, after your email, I realized they have to be changed.
> I planned to do this before sending the patches, but unfortunately this "thread" was missed before sending ;).
> Here is the current state.
> 
> Functions: 
> 	- path2devnum() (translates a user-supplied path, or symbolic link, to an md device into a device number)
> 	- find_array_minor() (returns the device minor for a given array name; it uses the maps to find names with suffixes (e.g. "_0") added during assembly)
> 	- find_array_minor2() (like find_array_minor(), but also works for frozen arrays)
> 
> Will be removed and replaced by:
> 
> int find_array_minor_by_subdev(int subdev, int container, int *minor)
> {
> 	char text_version[PATH_MAX];
> 	char path[PATH_MAX];
> 	int i;
> 	struct stat s;
> 
> 	sprintf(text_version, "md%i/%i", container, subdev);
> 	for (i = 127; i >= 0; i--) {
> 		char buf[PATH_MAX];
> 
> 		snprintf(path, PATH_MAX, "/sys/block/md%d/md/", i);
> 		if (stat(path, &s) != -1) {
> 			char *version;
> 
> 			strcat(path, "metadata_version");
> 			if (load_sys(path, buf))
> 				continue;
> 			version = strchr(buf, ':');
> 			/* compare without first letter
> 			 * it could be marked as frozen with '-'
> 			 */
> 			if (!version || strcmp(version + 2, text_version))
> 				continue;
> 			*minor = i;
> 			return 0;
> 		}
> 	}
> 
> 	return -1;
> }
> 
> This is a small part of find_array_minor() with a changed input.
> 
> This function searches the given container for subdev and returns its minor number.
> For example, for input parameters 0 and 127 it builds the search string "md127/0" and returns the minor of the matching device (e.g. 126).
> On success 0 is returned; on failure -1 is returned and the minor variable passed in is left untouched.
> It searches all devices from md127 down to md0.
> I think it is simpler now (as you raised this issue), and mdadm no longer makes decisions based on the array name string.
> 
> Please let me know what you think of this change.

Yes much simpler and better.  And I understand easily what you are trying to
do which is also good!

There are two problems:
1/ it assumes that subdevs have a number.  I would prefer that generic code
  assumed that a subdev has a name - if that happens to always be a numeric
  name, that is up to the metadata handler.
2/ searching from 127 to 0 is wrong because md number can be much bigger than
  127 if you have enough devices.  It would be better to use readdir to get a
  list of the /sys/block/mdXX that actually exist.
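
Point 2 (enumerate the md devices that actually exist instead of counting down from 127) could look roughly like the sketch below. It is an illustration under assumptions, not mdadm code: find_subarray_devnum() and read_attr() are hypothetical names, the subarray is passed as a name string rather than a number (point 1), the metadata_version format "external:/md127/0" (or "external:-md127/0" while frozen) is taken from the function quoted above, and the sysfs root is a parameter, normally /sys/block, so the scan can be tried against a scratch directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Read the first line of an attribute file, with the newline stripped. */
static int read_attr(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Hypothetical helper: find the md device whose metadata_version names
 * <container>/<subarray>. Only "md<digits>" entries returned by readdir()
 * are examined, so md numbers above 127 are found as well.
 * Returns 0 and sets *devnum on success, -1 otherwise.
 */
int find_subarray_devnum(const char *sysroot, const char *container,
			 const char *subarray, int *devnum)
{
	char want[256];
	struct dirent *de;
	DIR *dir;

	assert(sysroot && container && subarray && devnum);
	snprintf(want, sizeof(want), "%s/%s", container, subarray);
	dir = opendir(sysroot);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		char path[4096], attr[4200], buf[256], *end, *sep;
		struct stat st;
		long n;

		if (strncmp(de->d_name, "md", 2) != 0 ||
		    !isdigit((unsigned char)de->d_name[2]))
			continue;
		n = strtol(de->d_name + 2, &end, 10);
		if (*end != '\0')
			continue;

		/* Only real md arrays have an md/ directory. */
		snprintf(path, sizeof(path), "%s/%s/md", sysroot, de->d_name);
		if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
			continue;
		snprintf(attr, sizeof(attr), "%s/metadata_version", path);
		if (read_attr(attr, buf, sizeof(buf)) != 0)
			continue;

		/* Skip "external:" plus the '/' or frozen '-' marker. */
		sep = strchr(buf, ':');
		if (!sep || sep[1] == '\0' || strcmp(sep + 2, want) != 0)
			continue;

		*devnum = (int)n;
		closedir(dir);
		return 0;
	}
	closedir(dir);
	return -1;
}
```

Because only entries that readdir() returns are checked, an array such as md300 is found too, which the 127..0 loop would miss.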

However I would rather not use sysfs at all, but mdstat, as that is easy to
read and search quickly.
Have a look at mdstat_by_component.  Write something similar which is given a
container and subarray name and returns the corresponding mdstat_ent.

That would be much cleaner and even simpler.
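
The mdstat route could be prototyped along these lines. This standalone sketch does not use mdadm's mdstat_ent or its existing parser: mdstat_find_subarray() is a hypothetical name, and the /proc/mdstat layout it expects (an "mdN : ..." header line followed by a line containing "super external:/md127/0", with '-' in place of '/' while frozen) is an assumption about how member arrays of an external-metadata container are reported.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan an mdstat-formatted stream and return the md
 * device number of the member array whose "super external:" tag names
 * <container>/<subarray>, or -1 if there is none.
 */
int mdstat_find_subarray(FILE *mdstat, const char *container,
			 const char *subarray)
{
	static const char key[] = "super external:";
	char line[1024];
	char want[256];
	int cur = -1;	/* md number from the most recent "mdN :" header */

	assert(mdstat && container && subarray);
	snprintf(want, sizeof(want), "%s/%s", container, subarray);

	while (fgets(line, sizeof(line), mdstat)) {
		char *tag;
		size_t len;
		int n;

		/* Header lines look like "md126 : active raid5 sdb[2] ..." */
		if (sscanf(line, "md%d :", &n) == 1) {
			cur = n;
			continue;
		}
		tag = strstr(line, key);
		if (!tag || cur < 0)
			continue;
		tag += sizeof(key) - 1;
		if (*tag == '\0')
			continue;
		tag++;				/* skip '/' or frozen '-' */
		len = strcspn(tag, " \t\n");	/* tag ends at whitespace */
		if (len == strlen(want) && strncmp(tag, want, len) == 0)
			return cur;
	}
	return -1;
}
```

Reading from a FILE * keeps the matching logic independent of /proc/mdstat itself, so it can be fed a captured copy; a real implementation would more likely return the matching mdstat_ent, as suggested above.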


> 
> I've replaced open() with open_dev() in this patch (and in a few of the following ones: 0003, 0007, 0012, 0019, 0021).
> 
> If it is OK, please let me know whether you want a fresh series, or whether you will apply the current one and I'll then send patches for the changes above.
> In future patches, I'll also address the problem of lines longer than 80 characters.

Please always send a series of patches against what you pull from my
devel-3.2 branch.  I'll fix up any conflicts against anything I might have
committed since you pulled.
It is probably best to send smallish sets of patches  - maybe about 10 at a
time.  If they are accepted, you can send the next batch.  If there are
issues, you can revise subsequent patches before you send them.

NeilBrown


> 
> Please also note that the implementation of searching for spare devices for reshape is not final.
> It requires better integration with the auto-rebuild spare-searching mechanism.
> This task is in my near-term plans (I'm hoping to post fixes instead of the whole series again ;))
> 
> 
> BR
> Adam
> 


* Suspend_hi mamagment during reshape
  2010-12-08 22:05       ` Neil Brown
@ 2010-12-09  8:42         ` Kwolek, Adam
  2010-12-09 10:28           ` Neil Brown
  0 siblings, 1 reply; 36+ messages in thread
From: Kwolek, Adam @ 2010-12-09  8:42 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed

Hi,

I've got a problem with suspend_hi management during check-pointing, which we discussed a while ago.

Currently, I've corrected check-pointing so that mdmon sets suspend_hi to the position that sync_max is set to in the current pass, to guard access.
This approach looks OK to me in general; the problem is when mdadm decides to set sync_max to max. mdmon cannot set suspend_hi to max, because that would block
the rest of the array for the user. This means that mdmon should move sync_max and suspend_hi in parallel, by some distance, through the rest of the array.
This would also give us additional opportunities to store checkpoints. I would like to know your opinion about such a solution.

The second problem is cleanup after reshape.
From user space, after reshape, I'm not able to set suspend_hi to 0. This is due to the checks in suspend_hi_store() (suspend_lo cannot be set to 0, and suspend_hi cannot be less than suspend_lo).
I think that part of Maciek's patch should be applied to md in raid5.c, so that the following code is placed at the end of raid5_finish_reshape():

if (mddev->external) {
	mddev->suspend_hi = 0;
	mddev->suspend_lo = 0;
	mddev->pers->quiesce(mddev, 1);
	mddev->pers->quiesce(mddev, 0);
}

The other option is to accept setting suspend_lo/hi to 0 when no reshape is in progress, but I think the first change is better.
What is your opinion?

BR
Adam



* Re: Suspend_hi mamagment during reshape
  2010-12-09  8:42         ` Suspend_hi mamagment during reshape Kwolek, Adam
@ 2010-12-09 10:28           ` Neil Brown
  2010-12-09 15:59             ` Kwolek, Adam
  0 siblings, 1 reply; 36+ messages in thread
From: Neil Brown @ 2010-12-09 10:28 UTC (permalink / raw)
  To: Kwolek, Adam; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed

On Thu, 9 Dec 2010 08:42:35 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> Hi,
> 
> I've got a problem with suspend_hi management during check-pointing, which we discussed a while ago.
> 
> Currently, I've corrected check-pointing so that mdmon sets suspend_hi to the position that sync_max is set to in the current pass, to guard access.
> This approach looks OK to me in general; the problem is when mdadm decides to set sync_max to max. mdmon cannot set suspend_hi to max, because that would block
> the rest of the array for the user. This means that mdmon should move sync_max and suspend_hi in parallel, by some distance, through the rest of the array.
> This would also give us additional opportunities to store checkpoints. I would like to know your opinion about such a solution.

suspend_hi should be manipulated by mdadm, not mdmon.

Here is my outline that I sent earlier.  Please base your implementation on
this, though feel free to comment if you find some part of it doesn't work.

This is from my email to you on 29 Nov 2010 
 subject: Re: [PATCH 00/53] External Metadata Reshape


1/ mdadm freezes the array so that no recovery or reshape can start.
2/ mdadm sets sync_max to 0 so even when the array is unfrozen, no data will
   be relocated.  It also sets suspend_lo and suspend_hi to zero.
3/ mdadm tells the kernel about the requested reshape, setting some or all of
   chunk_size, layout, level, raid_disks (and later, data_offset for each
   device).
4/ mdadm checks that mdmon has noticed the changes and has updated the
   metadata to show a reshape-in-progress (ping_monitor).
5/ mdadm unfreezes the array for mdmon (change the '-' in metadata_version
   back to '/') and calls ping_monitor
6/ mdmon assigns spares as appropriate and tells the kernel which slot to use
   for each.  This requires a kernel change.  The slot number will be stored
   in saved_raid_disk.  ping_monitor doesn't complete until the spares have
   been assigned.
7/ mdadm asks the kernel to start reshape (echo reshape > sync_action).
   This causes md_check_recovery to call remove_and_add_spares which will
   add the chosen spares to the required slots and will create the reshape
   thread.  That thread will not actually do anything yet as sync_max
   is still 0.

8/ Now we loop, performing backups, reshaping data, and updating the metadata.
   It proceeds in a 'double-buffered' process where we are backing up one
   section while the previous section is being reshaped.

 8a/ mdadm sets suspend_hi to a larger number.  This blocks until intervening
     IO is flushed.
 8b/ mdadm makes a backup copy of the data up to the new suspend_hi
 8c/ mdadm updates sync_max to match suspend_hi.
 8d/ kernel starts reshaping data and periodically signals progress through
     sync_completed
 8e/ mdmon notices sync_completed changing and updates the metadata to
     record how far the reshape has progressed. 
 8f/ mdadm notices sync_completed changing and when it passes the end of the
     oldest of the two sections being worked on it uses ping_monitor to
     ensure the metadata is up-to-date and then moves suspend_lo to the
     beginning of the next section, and then goes back to 8a.

9/ When sync_completed reaches the end of the array, mdmon will notice and
   update the metadata to show that the reshape has finished, and mdadm will
   set both suspend_lo and suspend_hi to beyond the end of the array, and all
   is done.
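
The ordering in step 8 is what the suspend_hi question hinges on, so a toy model may help. This is a hypothetical sketch, not mdadm code: suspend_lo, suspend_hi, sync_max and sync_completed are plain struct fields rather than sysfs writes, the kernel is assumed to reshape a fixed amount ("speed") per iteration, ping_monitor() and mdmon's metadata update (8e) are reduced to comments, and sizes are in arbitrary units. The asserts check what the outline relies on: nothing is reshaped past sync_max, sync_max never runs ahead of the backup, and suspend_lo only moves over data that is already reshaped.

```c
#include <assert.h>

struct reshape_model {
	unsigned long long suspend_lo, suspend_hi;
	unsigned long long sync_max, sync_completed;
	unsigned long long backed_up;	/* backup covers [0, backed_up) */
};

static unsigned long long min_ull(unsigned long long a, unsigned long long b)
{
	return a < b ? a : b;
}

/* Run the double-buffered loop; returns the number of iterations. */
static int run_reshape(struct reshape_model *m, unsigned long long size,
		       unsigned long long section, unsigned long long speed)
{
	int iters = 0;

	assert(section > 0 && speed > 0);
	/* Step 2: nothing may be relocated yet. */
	m->suspend_lo = m->suspend_hi = m->sync_max = 0;
	m->sync_completed = m->backed_up = 0;

	while (m->suspend_lo < size) {
		/* 8a-8c: keep at most two sections suspended. */
		if (m->suspend_hi < size &&
		    m->suspend_hi - m->suspend_lo < 2 * section) {
			m->suspend_hi = min_ull(m->suspend_hi + section, size); /* 8a */
			m->backed_up = m->suspend_hi;                          /* 8b */
			m->sync_max = m->suspend_hi;                           /* 8c */
		}
		/* 8d: the kernel reshapes, never past sync_max. */
		m->sync_completed = min_ull(m->sync_completed + speed,
					    m->sync_max);
		/* 8e: mdmon would record sync_completed in the metadata. */
		/* 8f: oldest section done -> ping_monitor, release it. */
		if (m->sync_completed >= min_ull(m->suspend_lo + section, size))
			m->suspend_lo = min_ull(m->suspend_lo + section, size);

		assert(m->suspend_lo <= m->sync_completed);
		assert(m->sync_completed <= m->sync_max);
		assert(m->sync_max <= m->backed_up);
		iters++;
	}
	/* Step 9: leave both limits at the end of the array. */
	m->suspend_lo = m->suspend_hi = size;
	return iters;
}
```

With size 8, sections of 2 and a speed of 1 per iteration, run_reshape() finishes in 8 iterations, with the backup of one section overlapping the reshape of the previous one, and leaves suspend_lo and suspend_hi at the end of the array as step 9 describes.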


> 
> The second problem is cleanup after reshape.
> From user space, after reshape, I'm not able to set suspend_hi to 0. This is due to the checks in suspend_hi_store() (suspend_lo cannot be set to 0, and suspend_hi cannot be less than suspend_lo).
> I think that part of Maciek's patch should be applied to md in raid5.c, so that the following code is placed at the end of raid5_finish_reshape():
> 
> if (mddev->external) {
> 	mddev->suspend_hi = 0;
> 	mddev->suspend_lo = 0;
> 	mddev->pers->quiesce(mddev, 1);
> 	mddev->pers->quiesce(mddev, 0);
> }
> 
> The other option is to accept setting suspend_lo/hi to 0 when no reshape is in progress, but I think the first change is better.
> What is your opinion?

Why do you want to set suspend_hi to zero after a reshape?
Just set both suspend_hi and suspend_lo to the size of the array (which is
where the above process would get them to) and leave them there.

NeilBrown


> 
> BR
> Adam
> 



* RE: Suspend_hi mamagment during reshape
  2010-12-09 10:28           ` Neil Brown
@ 2010-12-09 15:59             ` Kwolek, Adam
  2010-12-09 16:08               ` Kwolek, Adam
  0 siblings, 1 reply; 36+ messages in thread
From: Kwolek, Adam @ 2010-12-09 15:59 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed



> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Thursday, December 09, 2010 11:28 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed
> Subject: Re: Suspend_hi mamagment during reshape
> 
> On Thu, 9 Dec 2010 08:42:35 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
> wrote:
> 
> > Hi,
> >
> > I've got a problem with suspend_hi management during check-pointing, which we discussed a while ago.
> >
> > Currently, I've corrected check-pointing so that mdmon sets suspend_hi to the position that sync_max is set to in the current pass, to guard access.
> > This approach looks OK to me in general; the problem is when mdadm decides to set sync_max to max. mdmon cannot set suspend_hi to max, because that would block
> > the rest of the array for the user. This means that mdmon should move sync_max and suspend_hi in parallel, by some distance, through the rest of the array.
> > This would also give us additional opportunities to store checkpoints. I would like to know your opinion about such a solution.
> 
> suspend_hi should be manipulated by mdadm, not mdmon.
> 
> Here is my outline that I sent earlier.  Please base your implementation on
> this, though feel free to comment if you find some part of it doesn't work.
> 
> This is from my email to you on 29 Nov 2010
>  subject: Re: [PATCH 00/53] External Metadata Reshape
> 
> 
> 1/ mdadm freezes the array so that no recovery or reshape can start.
> 2/ mdadm sets sync_max to 0 so even when the array is unfrozen, no data will
>    be relocated.  It also sets suspend_lo and suspend_hi to zero.
> 3/ mdadm tells the kernel about the requested reshape, setting some or all of
>    chunk_size, layout, level, raid_disks (and later, data_offset for each
>    device).
> 4/ mdadm checks that mdmon has noticed the changes and has updated the
>    metadata to show a reshape-in-progress (ping_monitor).
> 5/ mdadm unfreezes the array for mdmon (change the '-' in metadata_version
>    back to '/') and calls ping_monitor
> 6/ mdmon assigns spares as appropriate and tells the kernel which slot to use
>    for each.  This requires a kernel change.  The slot number will be stored
>    in saved_raid_disk.  ping_monitor doesn't complete until the spares have
>    been assigned.
> 7/ mdadm asks the kernel to start reshape (echo reshape > sync_action).
>    This causes md_check_recovery to call remove_and_add_spares which will
>    add the chosen spares to the required slots and will create the reshape
>    thread.  That thread will not actually do anything yet as sync_max
>    is still 0.
> 
> 8/ Now we loop, performing backups, reshaping data, and updating the metadata.
>    It proceeds in a 'double-buffered' process where we are backing up one
>    section while the previous section is being reshaped.
> 
>  8a/ mdadm sets suspend_hi to a larger number.  This blocks until intervening
>      IO is flushed.
>  8b/ mdadm makes a backup copy of the data up to the new suspend_hi
>  8c/ mdadm updates sync_max to match suspend_hi.
>  8d/ kernel starts reshaping data and periodically signals progress through
>      sync_completed
>  8e/ mdmon notices sync_completed changing and updates the metadata to
>      record how far the reshape has progressed.
>  8f/ mdadm notices sync_completed changing and when it passes the end of the
>      oldest of the two sections being worked on it uses ping_monitor to
>      ensure the metadata is up-to-date and then moves suspend_lo to the
>      beginning of the next section, and then goes back to 8a.
> 
> 9/ When sync_completed reaches the end of the array, mdmon will notice and
>    update the metadata to show that the reshape has finished, and mdadm will
>    set both suspend_lo and suspend_hi to beyond the end of the array, and all
>    is done.


Yes, I've got it, but for the disk-add case (OLCE) mdadm participates in the process only at the beginning.
After a short time, once the critical section has been passed, it directs mdmon to continue the reshape up to the sync_max position.
At that point I think that mdmon should handle setting sync_max. If mdmon does what mdadm tells it, it should also set
suspend_hi to the end of the array (mdmon cannot monitor the movement of suspend_hi). suspend_hi can be set properly only together with
sync_max.
To summarize, the problem for me is agreeing that mdmon should handle moving the sync_max entry when mdadm directs it to set sync_max to max.
I want to avoid suspending a large area between suspend_lo and suspend_hi for a long time (the whole reshape).

... or should we decide that mdadm participates in the whole process (while working on the critical area and later)?
Is this your intention?

> 
> >
> > The second problem is cleanup after reshape.
> > From user space, after reshape, I'm not able to set suspend_hi to 0. This is due to the checks in suspend_hi_store() (suspend_lo cannot be set to 0, and suspend_hi cannot be less than suspend_lo).
> > I think that part of Maciek's patch should be applied to md in raid5.c, so that the following code is placed at the end of raid5_finish_reshape():
> >
> > if (mddev->external) {
> > 	mddev->suspend_hi = 0;
> > 	mddev->suspend_lo = 0;
> > 	mddev->pers->quiesce(mddev, 1);
> > 	mddev->pers->quiesce(mddev, 0);
> > }
> >
> > The other option is to accept setting suspend_lo/hi to 0 when no reshape is in progress, but I think the first change is better.
> > What is your opinion?
> 
> Why do you want to set suspend_hi to zero after a reshape?
> Just set both suspend_hi and suspend_lo to the size of the array (which is
> where the above process would get them to) and leave them there.
> 
> NeilBrown

I'll try to set those values as you described.

I wanted to set suspend_lo/hi to 0 to return those entries to their state before the reshape.
My reasoning is: if I cannot manage those keys after a reshape, how can I repeat the reshape process (e.g. with other grow parameters)?
I will need to manage them before I start the next operation. After a reshape, the array (IMHO) should be ready for any next action, and I think it is not ready now.
Am I right?

BR
Adam

> 
> >
> > BR
> > Adam
> >



* RE: Suspend_hi mamagment during reshape
  2010-12-09 15:59             ` Kwolek, Adam
@ 2010-12-09 16:08               ` Kwolek, Adam
  0 siblings, 0 replies; 36+ messages in thread
From: Kwolek, Adam @ 2010-12-09 16:08 UTC (permalink / raw)
  To: Kwolek, Adam, Neil Brown; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed



> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Kwolek, Adam
> Sent: Thursday, December 09, 2010 5:00 PM
> To: Neil Brown
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed
> Subject: RE: Suspend_hi mamagment during reshape
> 
> I'll try to set those values as you described.
> 
> I wanted to set suspend_lo/hi to 0 to return those entries to their state before the reshape.
> My reasoning is: if I cannot manage those keys after a reshape, how can I repeat the reshape process (e.g. with other grow parameters)?
> I will need to manage them before I start the next operation. After a reshape, the array (IMHO) should be ready for any next action, and I think it is not ready now.
> Am I right?

OK, it works :) After setting those values to the end of the array, they can be moved back to 0.


> 
> BR
> Adam
> 


Thread overview: 36+ messages
2010-12-06 13:20 [PATCH 00/27] OLCE, migrations and raid10 takeover Adam Kwolek
2010-12-06 13:20 ` [PATCH 01/27] FIX: wait_backup() sometimes hangs Adam Kwolek
2010-12-06 13:21 ` [PATCH 02/27] Add state_of_reshape for external metadata Adam Kwolek
2010-12-06 13:21 ` [PATCH 03/27] imsm: Prepare reshape_update in mdadm Adam Kwolek
2010-12-08  3:10   ` Neil Brown
2010-12-08 14:18     ` Kwolek, Adam
2010-12-08 22:05       ` Neil Brown
2010-12-09  8:42         ` Suspend_hi mamagment during reshape Kwolek, Adam
2010-12-09 10:28           ` Neil Brown
2010-12-09 15:59             ` Kwolek, Adam
2010-12-09 16:08               ` Kwolek, Adam
2010-12-06 13:21 ` [PATCH 04/27] imsm: Process reshape_update in mdmon Adam Kwolek
2010-12-06 13:21 ` [PATCH 05/27] imsm: Block array state change during reshape Adam Kwolek
2010-12-06 13:21 ` [PATCH 06/27] Process reshape initialization by managemon Adam Kwolek
2010-12-06 13:21 ` [PATCH 07/27] imsm: Verify slots in meta against slot numbers set by md Adam Kwolek
2010-12-06 13:21 ` [PATCH 08/27] imsm: Cancel metadata changes on reshape start failure Adam Kwolek
2010-12-06 13:21 ` [PATCH 09/27] imsm: Do not accept messages sent by mdadm Adam Kwolek
2010-12-06 13:22 ` [PATCH 10/27] imsm: Do not indicate resync during reshape Adam Kwolek
2010-12-06 13:22 ` [PATCH 11/27] imsm: Fill delta_disks field in getinfo_super() Adam Kwolek
2010-12-06 13:22 ` [PATCH 12/27] Control reshape in mdadm Adam Kwolek
2010-12-06 13:22 ` [PATCH 13/27] Finalize reshape after adding disks to array Adam Kwolek
2010-12-06 13:22 ` [PATCH 14/27] Add reshape progress updating Adam Kwolek
2010-12-06 13:22 ` [PATCH 15/27] WORKAROUND: md reports idle state during reshape start Adam Kwolek
2010-12-06 13:22 ` [PATCH 16/27] FIX: core during getting map Adam Kwolek
2010-12-06 13:22 ` [PATCH 17/27] Enable reshape for subarrays Adam Kwolek
2010-12-06 13:23 ` [PATCH 18/27] Change manage_reshape() placement Adam Kwolek
2010-12-06 13:23 ` [PATCH 19/27] Migration: raid5->raid0 Adam Kwolek
2010-12-06 13:23 ` [PATCH 20/27] Detect level change Adam Kwolek
2010-12-06 13:23 ` [PATCH 21/27] Migration raid0->raid5 Adam Kwolek
2010-12-06 13:23 ` [PATCH 22/27] Read chunk size and layout from mdstat Adam Kwolek
2010-12-06 13:23 ` [PATCH 23/27] Migration: Chunk size migration Adam Kwolek
2010-12-06 13:23 ` [PATCH 24/27] Add takeover support for external meta Adam Kwolek
2010-12-06 13:24 ` [PATCH 25/27] Takeover raid10 -> raid0 for external metadata Adam Kwolek
2010-12-06 13:24 ` [PATCH 26/27] Takeover raid0 -> raid10 " Adam Kwolek
2010-12-06 13:24 ` [PATCH 27/27] FIX: Problem with removing array after takeover Adam Kwolek
2010-12-07 10:18 ` [PATCH 00/27] OLCE, migrations and raid10 takeover Neil Brown
