* [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly
@ 2014-04-24  7:22 Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 01/11] sysfs: fix sysfs_set_array() to accept valid negative array levels Dan Williams
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

As mentioned in the kernel enabling series [1], these cache volumes augment the
existing imsm model in that each cache leg implies its own container.  In the
standard cache configuration two single-drive-raid0 volumes (from
separate containers) are associated into a cached volume.  The diagram
below attempts to make this clearer.

+-----------------------+              +----------------------+             
|          sda          | SSD          |         sdb          | HDD         
| +-------------------+ |              | +------------------+ |             
| |   /dev/md/imsm0   | | Container0   | |  /dev/md/imsm1   | | Container1  
| | +---------------+ | |              | | +--------------+ | |             
| | | /dev/md/vol0  | | | RAID Volume0 | | | /dev/md/vol1 | | | RAID Volume1
| | |  +---------+  | | |              | | | +----------+ | | |             
| | |  |SRT Cache|  | | |              | | | |SRT Target| | | |             
+-+-+--+----+----+--+-+-+              +-+-+-+----+-----+-+-+-+             
            |                                     |                         
            |                                     |                         
            |          HDD Cached by SSD          |                         
            |           +--------------+          |                         
            +-----------+ /dev/md/isrt +----------+                         
                        +--------------+                                    

In support of the standard mdadm volume discovery model, a uuid is
synthesized from the combination of the two container-family-numbers and
the immutable volume-ids.  Examine --brief is modified to aggregate
cache legs across containers.
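
That synthesis can be pictured with a small stand-in hash (illustrative
only: the real code feeds these fields to sha1 in super-intel.c; the
fnv1a mix and the field order here are assumptions):

```c
#include <stddef.h>
#include <stdint.h>

/* fnv1a stands in for the sha1 mdadm actually uses; the point is only
 * that the cached volume's uuid is a pure function of both containers'
 * family numbers plus the immutable per-volume ids, so both cache legs
 * resolve to the same uuid. */
static uint32_t fnv1a(const void *data, size_t len, uint32_t h)
{
	const uint8_t *p = data;

	while (len--)
		h = (h ^ *p++) * 16777619u;
	return h;
}

/* hypothetical field order: cache leg first, then target leg */
uint32_t isrt_uuid(uint32_t family0, uint16_t dev_id0,
		   uint32_t family1, uint16_t dev_id1)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	h = fnv1a(&family0, sizeof(family0), h);
	h = fnv1a(&dev_id0, sizeof(dev_id0), h);
	h = fnv1a(&family1, sizeof(family1), h);
	h = fnv1a(&dev_id1, sizeof(dev_id1), h);
	return h;
}
```

Because the id is derived only from on-disk immutable fields, -Eb can
report one uuid for the cached volume regardless of which container leg
is examined first.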

Create support is not included, but existing volumes can be
auto-assembled:

  mdadm -Ebs > conf
  mdadm -Asc conf

To facilitate testing, the patches are also available on github; note
that this branch will rebase according to review feedback.

  git://github.com/djbw/mdadm isrt

[1]: http://marc.info/?l=linux-raid&m=139832034826379&w=2

---

Dan Williams (11):
      sysfs: fix sysfs_set_array() to accept valid negative array levels
      make must_be_container() more selective
      Assemble: show the uuid in the verbose case
      Assemble: teardown partially assembled arrays
      Examine: support for coalescing "cache legs"
      imsm: immutable volume id
      imsm: cache metadata definitions
      imsm: read cache metadata
      imsm: examine cache configurations
      imsm: assemble cache volumes
      imsm: support cache enabled arrays


 Assemble.c    |   27 ++-
 Examine.c     |   92 +++++++++-
 Makefile      |    2 
 isrt-intel.h  |  270 ++++++++++++++++++++++++++++++
 maps.c        |    1 
 mdadm.h       |    4 
 super-intel.c |  516 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 sysfs.c       |   13 +
 util.c        |   18 +-
 9 files changed, 865 insertions(+), 78 deletions(-)
 create mode 100644 isrt-intel.h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC mdadm PATCH 01/11] sysfs: fix sysfs_set_array() to accept valid negative array levels
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 02/11] make must_be_container() more selective Dan Williams
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

From: Dan Williams <dan.j.williams@intel.com>

Assume this FIXME was meant to prevent loading a personality for
containers.  Fix it up to accept the negative values that correspond to
actual md kernel personalities.
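
For context, the negative level constants involved (values as defined in
mdadm.h of this era; a sketch of the check, not the patch itself):

```c
/* Negative md "levels": only LEVEL_CONTAINER lacks a kernel
 * personality; linear and multipath are real personalities that the
 * old "level < 0" test wrongly refused to write to sysfs. */
enum {
	LEVEL_LINEAR    = -1,
	LEVEL_MULTIPATH = -4,
	LEVEL_CONTAINER = -100,
};

/* mirror of the patched condition: skip only containers */
static int needs_personality(int level)
{
	return level > LEVEL_CONTAINER;
}
```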

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 sysfs.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/sysfs.c b/sysfs.c
index 9a1d856960e8..4cbd4e5d051b 100644
--- a/sysfs.c
+++ b/sysfs.c
@@ -628,8 +628,9 @@ int sysfs_set_array(struct mdinfo *info, int vers)
 			return 1;
 		}
 	}
-	if (info->array.level < 0)
-		return 0; /* FIXME */
+	/* containers have no personality, they're rather bland */
+	if (info->array.level <= LEVEL_CONTAINER)
+		return 0;
 	rv |= sysfs_set_str(info, NULL, "level",
 			    map_num(pers, info->array.level));
 	if (info->reshape_active && info->delta_disks != UnSet)



* [RFC mdadm PATCH 02/11] make must_be_container() more selective
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 01/11] sysfs: fix sysfs_set_array() to accept valid negative array levels Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 03/11] Assemble: show the uuid in the verbose case Dan Williams
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Cache configurations in mid-assembly may appear to be a "container" as
they are 0-sized and external.  Teach must_be_container() to look for
"external:<metadata name>" as the container identifier from
<sysfs>/md/metadata_version.
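
The intended test reduces to a string match on the sysfs contents; a
hypothetical sketch (the real code goes through sysfs_read() and
version_to_superswitch(), and the format list below is an assumption):

```c
#include <string.h>

/* A container's metadata_version reads "external:<format>" (e.g.
 * "external:imsm"), while a subarray member reads something like
 * "external:/md127/0", which matches no registered format name. */
static int looks_like_container(const char *text_version)
{
	static const char *formats[] = { "imsm", "ddf" };
	size_t i;

	if (strncmp(text_version, "external:", 9) != 0)
		return 0;
	for (i = 0; i < sizeof(formats) / sizeof(formats[0]); i++)
		if (strcmp(text_version + 9, formats[i]) == 0)
			return 1;
	return 0;
}
```

A 0-sized mid-assembly cache volume fails this test because its
text_version names a container path, not a metadata format.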

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 util.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/util.c b/util.c
index afb2bb110f24..93f9200fa4c7 100644
--- a/util.c
+++ b/util.c
@@ -1176,18 +1176,19 @@ int get_dev_size(int fd, char *dname, unsigned long long *sizep)
 }
 
 /* Return true if this can only be a container, not a member device.
- * i.e. is and md device and size is zero
+ * i.e. is an md device and the text_version matches an external
+ * metadata format
  */
 int must_be_container(int fd)
 {
-	unsigned long long size;
-	if (md_get_version(fd) < 0)
+	struct mdinfo *sra = sysfs_read(fd, NULL, GET_VERSION);
+	struct superswitch *ss;
+
+	if (!sra)
 		return 0;
-	if (get_dev_size(fd, NULL, &size) == 0)
-		return 1;
-	if (size == 0)
-		return 1;
-	return 0;
+	ss = version_to_superswitch(sra->text_version);
+	sysfs_free(sra);
+	return ss ? ss->external : 0;
 }
 
 /* Sets endofpart parameter to the last block used by the last GPT partition on the device.



* [RFC mdadm PATCH 03/11] Assemble: show the uuid in the verbose case
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 01/11] sysfs: fix sysfs_set_array() to accept valid negative array levels Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 02/11] make must_be_container() more selective Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 04/11] Assemble: teardown partially assembled arrays Dan Williams
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Make the verbose output more usable when the array name is not given.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Assemble.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index 05ace561fb50..a72d427f4773 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1288,9 +1288,21 @@ try_again:
 	 */
 	if (!st && ident->st)
 		st = ident->st;
-	if (c->verbose>0)
-		pr_err("looking for devices for %s\n",
-		       mddev ? mddev : "further assembly");
+
+	if (c->verbose > 0) {
+		char uuid[64], *id;
+
+		if (mddev)
+			id = mddev;
+		else if (ident->uuid_set) {
+			__fname_from_uuid(ident->uuid,
+					  st ? st->ss->swapuuid : 0,
+					  uuid, ':');
+			id = uuid + 5;
+		} else
+			id = "further assembly";
+		pr_err("looking for devices for %s\n", id);
+	}
 
 	content = &info;
 	if (st)



* [RFC mdadm PATCH 04/11] Assemble: teardown partially assembled arrays
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (2 preceding siblings ...)
  2014-04-24  7:22 ` [RFC mdadm PATCH 03/11] Assemble: show the uuid in the verbose case Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 05/11] Examine: support for coalescing "cache legs" Dan Williams
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

In the scenario of assembling composite md arrays, e.g. /dev/md2 with
/dev/md1 and /dev/md0 as components, we want /dev/md2 assembly to be
delayed until the components are available.  If we attempt to assemble
/dev/md2 when only /dev/md0 is available, /dev/md2 will not be fully
initialized, and /dev/md1 will be assembled to an equally defunct
/dev/md3.

So tear down the early /dev/md2 on the expectation that more devices
will arrive in a later assembly pass.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Assemble.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index a72d427f4773..9a2399ef6411 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1147,16 +1147,17 @@ static int start_array(int mdfd,
 		if (sparecnt)
 			fprintf(stderr, " and %d spare%s", sparecnt, sparecnt==1?"":"s");
 		if (!enough(content->array.level, content->array.raid_disks,
-			    content->array.layout, 1, avail))
+			    content->array.layout, 1, avail)) {
 			fprintf(stderr, " - not enough to start the array.\n");
-		else if (!enough(content->array.level,
+			ioctl(mdfd, STOP_ARRAY, NULL);
+		} else if (!enough(content->array.level,
 				 content->array.raid_disks,
 				 content->array.layout, clean,
-				 avail))
+				 avail)) {
 			fprintf(stderr, " - not enough to start the "
 				"array while not clean - consider "
 				"--force.\n");
-		else {
+		} else {
 			if (req_cnt == (unsigned)content->array.raid_disks)
 				fprintf(stderr, " - need all %d to start it", req_cnt);
 			else



* [RFC mdadm PATCH 05/11] Examine: support for coalescing "cache legs"
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (3 preceding siblings ...)
  2014-04-24  7:22 ` [RFC mdadm PATCH 04/11] Assemble: teardown partially assembled arrays Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 06/11] imsm: immutable volume id Dan Williams
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

'isrt' volumes introduce a new category of array.  They have
components that are subarrays from separate containers (likely thanks to
the constraint that all active members in an imsm container must be
members of all subarrays).  We want '-Eb' to identify the composite
volume uuid, but the default coalescing (by container) results in
duplicated output of the cache volume uuid.

Instead, introduce infrastructure to handle this directly.

1/ add ->cache_legs to struct mdinfo to indicate how many subarrays in a given
   container are components (legs) of a cache association.

2/ add ->cache_leg to struct supertype to indicate a cache leg to
   enumerate via ->getinfo_super()

3/ teach Examine to coalesce cache volumes across containers by uuid and
   dump their details via ->brief_examine_cache() extension to struct
   superswitch.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Examine.c |   92 +++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 mdadm.h   |    3 ++
 2 files changed, 83 insertions(+), 12 deletions(-)

diff --git a/Examine.c b/Examine.c
index 953b8eee2360..945af8454a5f 100644
--- a/Examine.c
+++ b/Examine.c
@@ -30,6 +30,73 @@
 #endif
 #include	"md_u.h"
 #include	"md_p.h"
+
+struct array {
+	struct supertype *st;
+	struct mdinfo info;
+	void *devs;
+	struct array *next;
+	int spares;
+	int cache_leg;
+};
+
+static struct array *add_cache_legs(struct array *caches, struct supertype *st,
+				    struct mdinfo *info, struct array *arrays)
+{
+	struct mdinfo cache_info;
+	struct array *ap;
+	int i;
+
+	for (i = 1; i <= info->cache_legs; i++) {
+		/* in the case where the cache leg is assembled its uuid
+		 * may appear in the arrays list, so we need to check
+		 * both the caches list and the arrays list for
+		 * duplicates
+		 */
+		struct array *lists[] = { caches, arrays };
+		int j;
+
+		st->cache_leg = i;
+		st->ss->getinfo_super(st, &cache_info, NULL);
+		st->cache_leg = 0;
+		for (j = 0; j < 2; j++) {
+			for (ap = lists[j]; ap; ap = ap->next) {
+				if (st->ss == ap->st->ss
+				    && same_uuid(ap->info.uuid, cache_info.uuid,
+						 st->ss->swapuuid))
+					break;
+			}
+			if (ap)
+				break;
+		}
+		if (!ap) {
+			ap = xcalloc(1, sizeof(*ap));
+			ap->devs = dl_head();
+			ap->next = caches;
+			ap->st = st;
+			ap->cache_leg = i;
+			caches = ap;
+			memcpy(&ap->info, &cache_info, sizeof(cache_info));
+		}
+	}
+
+	return caches;
+}
+
+static void free_arrays(struct array *arrays)
+{
+	struct array *ap;
+
+	while (arrays) {
+		ap = arrays;
+		arrays = ap->next;
+
+		ap->st->ss->free_super(ap->st);
+		free(ap);
+	}
+}
+
+
 int Examine(struct mddev_dev *devlist,
 	    struct context *c,
 	    struct supertype *forcest)
@@ -54,14 +121,7 @@ int Examine(struct mddev_dev *devlist,
 	int fd;
 	int rv = 0;
 	int err = 0;
-
-	struct array {
-		struct supertype *st;
-		struct mdinfo info;
-		void *devs;
-		struct array *next;
-		int spares;
-	} *arrays = NULL;
+	struct array *arrays = NULL, *caches = NULL;
 
 	for (; devlist ; devlist = devlist->next) {
 		struct supertype *st;
@@ -131,13 +191,14 @@ int Examine(struct mddev_dev *devlist,
 					break;
 			}
 			if (!ap) {
-				ap = xmalloc(sizeof(*ap));
+				ap = xcalloc(1, sizeof(*ap));
 				ap->devs = dl_head();
 				ap->next = arrays;
-				ap->spares = 0;
 				ap->st = st;
 				arrays = ap;
 				st->ss->getinfo_super(st, &ap->info, NULL);
+				caches = add_cache_legs(caches, st, &ap->info,
+							arrays);
 			} else
 				st->ss->getinfo_super(st, &ap->info, NULL);
 			if (!have_container &&
@@ -179,11 +240,18 @@ int Examine(struct mddev_dev *devlist,
 					printf("\n");
 				ap->st->ss->brief_examine_subarrays(ap->st, c->verbose);
 			}
-			ap->st->ss->free_super(ap->st);
-			/* FIXME free ap */
 			if (ap->spares || c->verbose > 0)
 				printf("\n");
 		}
+		/* list container caches after their parent containers
+		 * and subarrays
+		 */
+		for (ap = caches; ap; ap = ap->next)
+			if (ap->st->ss->brief_examine_cache)
+				ap->st->ss->brief_examine_cache(ap->st, ap->cache_leg);
+		free_arrays(arrays);
+		free_arrays(caches);
+
 	}
 	return rv;
 }
diff --git a/mdadm.h b/mdadm.h
index f6a614e19316..111f90f599af 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -233,6 +233,7 @@ struct mdinfo {
 	int container_enough; /* flag external handlers can set to
 			       * indicate that subarrays have not enough (-1),
 			       * enough to start (0), or all expected disks (1) */
+	int cache_legs; /* number of cross-container cache members in this 'array' */
 	char		sys_name[20];
 	struct mdinfo *devs;
 	struct mdinfo *next;
@@ -684,6 +685,7 @@ extern struct superswitch {
 	void (*examine_super)(struct supertype *st, char *homehost);
 	void (*brief_examine_super)(struct supertype *st, int verbose);
 	void (*brief_examine_subarrays)(struct supertype *st, int verbose);
+	void (*brief_examine_cache)(struct supertype *st, int cache);
 	void (*export_examine_super)(struct supertype *st);
 	int (*examine_badblocks)(struct supertype *st, int fd, char *devname);
 	int (*copy_metadata)(struct supertype *st, int from, int to);
@@ -1006,6 +1008,7 @@ struct supertype {
 				 Used when examining metadata to display content of disk
 				 when user has no hw/firmare compatible system.
 			      */
+	int cache_leg; /* hack to interrogate cache legs within containers */
 	struct metadata_update *updates;
 	struct metadata_update **update_tail;
 



* [RFC mdadm PATCH 06/11] imsm: immutable volume id
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (4 preceding siblings ...)
  2014-04-24  7:22 ` [RFC mdadm PATCH 05/11] Examine: support for coalescing "cache legs" Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 07/11] imsm: cache metadata definitions Dan Williams
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Support the new extensions to have a volume-id that is immutable and
unique for the life of the container.  Prior to this change, deleting and
recreating a volume would result in it having the same uuid as the
previous volume in that position.  Now, every time a volume is created, a
container generation count is incremented, allowing the volume-ids to
include container generation salt.
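
The allocation rule can be sketched as follows (simplified and
host-endian; the patch's new_dev_id() additionally gates this on whether
the container was born with create_events support, and stores the values
little-endian):

```c
#include <stdint.h>

/* Bump the per-container creation counter, wrap past 0 to 1 (zero
 * means "no dev_id support"), and skip any value already in use by an
 * existing volume so a wrapped counter never collides. */
static uint16_t next_dev_id(const uint16_t *used, int nused,
			    uint16_t counter)
{
	int i;

	do {
		counter++;
		if (counter == 0)
			counter = 1;
		for (i = 0; i < nused; i++)
			if (used[i] == counter)
				break;
	} while (i < nused);
	return counter;
}
```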

TODO update kill_subarray_imsm() and update_subarray_imsm() to allow
deletion and renaming (respectively) of arrays with an immutable id.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 super-intel.c |   77 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index f0a7ab5ccc7a..07e4c68982cd 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -188,8 +188,9 @@ struct imsm_dev {
 	__u16 cache_policy;
 	__u8  cng_state;
 	__u8  cng_sub_state;
-#define IMSM_DEV_FILLERS 10
-	__u32 filler[IMSM_DEV_FILLERS];
+	__u16 dev_id;
+	__u16 fill;
+	__u32 filler[9];
 	struct imsm_vol vol;
 } __attribute__ ((packed));
 
@@ -209,8 +210,9 @@ struct imsm_super {
 	__u32 orig_family_num;		/* 0x40 - 0x43 original family num */
 	__u32 pwr_cycle_count;		/* 0x44 - 0x47 simulated power cycle count for array */
 	__u32 bbm_log_size;		/* 0x48 - 0x4B - size of bad Block Mgmt Log in bytes */
-#define IMSM_FILLERS 35
-	__u32 filler[IMSM_FILLERS];	/* 0x4C - 0xD7 RAID_MPB_FILLERS */
+	__u16 create_events;		/* counter for generating unique ids */
+	__u16 fill1;
+	__u32 filler[34];
 	struct imsm_disk disk[1];	/* 0xD8 diskTbl[numDisks] */
 	/* here comes imsm_dev[num_raid_devs] */
 	/* here comes BBM logs */
@@ -1984,6 +1986,30 @@ static int match_home_imsm(struct supertype *st, char *homehost)
 	return -1;
 }
 
+static void volume_uuid_from_super(struct intel_super *super, struct sha1_ctx *ctx)
+{
+	struct imsm_dev *dev = NULL;
+
+	if (super->current_vol >= 0)
+		dev = get_imsm_dev(super, super->current_vol);
+
+	if (!dev)
+		return;
+
+	/* if the container is tracking creation events then dev_id is
+	 * valid and we can advertise an immutable uuid, otherwise use the
+	 * old volume-position/name method
+	 */
+	if (super->anchor->create_events) {
+		sha1_process_bytes(&dev->dev_id, sizeof(dev->dev_id), ctx);
+	} else {
+		__u32 vol = super->current_vol;
+
+		sha1_process_bytes(&vol, sizeof(vol), ctx);
+		sha1_process_bytes(dev->volume, MAX_RAID_SERIAL_LEN, ctx);
+	}
+}
+
 static void uuid_from_super_imsm(struct supertype *st, int uuid[4])
 {
 	/* The uuid returned here is used for:
@@ -2012,7 +2038,6 @@ static void uuid_from_super_imsm(struct supertype *st, int uuid[4])
 
 	char buf[20];
 	struct sha1_ctx ctx;
-	struct imsm_dev *dev = NULL;
 	__u32 family_num;
 
 	/* some mdadm versions failed to set ->orig_family_num, in which
@@ -2025,13 +2050,7 @@ static void uuid_from_super_imsm(struct supertype *st, int uuid[4])
 	sha1_init_ctx(&ctx);
 	sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
 	sha1_process_bytes(&family_num, sizeof(__u32), &ctx);
-	if (super->current_vol >= 0)
-		dev = get_imsm_dev(super, super->current_vol);
-	if (dev) {
-		__u32 vol = super->current_vol;
-		sha1_process_bytes(&vol, sizeof(vol), &ctx);
-		sha1_process_bytes(dev->volume, MAX_RAID_SERIAL_LEN, &ctx);
-	}
+	volume_uuid_from_super(super, &ctx);
 	sha1_finish_ctx(&ctx, buf);
 	memcpy(uuid, buf, 4*4);
 }
@@ -4562,6 +4581,39 @@ static int check_name(struct intel_super *super, char *name, int quiet)
 	return !reason;
 }
 
+static void new_dev_id(struct intel_super *super, struct imsm_dev *new_dev)
+{
+	struct imsm_super *mpb = super->anchor;
+	__u16 create_events, i;
+
+	/* only turn on create_events tracking for newly born
+	 * containers, lest we change the uuid of live volumes (see
+	 * volume_uuid_from_super())
+	 */
+	create_events = __le16_to_cpu(mpb->create_events);
+	if (super->current_vol > 0 && !create_events)
+		return;
+
+	/* catch the case of create_events wrapping to an existing id in
+	 * the mpb
+	 */
+	do {
+		create_events++;
+		/* wrap to 1, because zero means no dev_id support */
+		if (create_events == 0)
+			create_events = 1;
+		for (i = 0; i < mpb->num_raid_devs; i++) {
+			struct imsm_dev *dev = __get_imsm_dev(mpb, i);
+
+			if (dev->dev_id == __cpu_to_le16(create_events))
+				break;
+		}
+	} while (i < mpb->num_raid_devs);
+
+	new_dev->dev_id = __cpu_to_le16(create_events);
+	mpb->create_events = __cpu_to_le16(create_events);
+}
+
 static int init_super_imsm_volume(struct supertype *st, mdu_array_info_t *info,
 				  unsigned long long size, char *name,
 				  char *homehost, int *uuid,
@@ -4655,6 +4707,7 @@ static int init_super_imsm_volume(struct supertype *st, mdu_array_info_t *info,
 		return 0;
 	dv = xmalloc(sizeof(*dv));
 	dev = xcalloc(1, sizeof(*dev) + sizeof(__u32) * (info->raid_disks - 1));
+	new_dev_id(super, dev);
 	strncpy((char *) dev->volume, name, MAX_RAID_SERIAL_LEN);
 	array_blocks = calc_array_size(info->level, info->raid_disks,
 					       info->layout, info->chunk_size,



* [RFC mdadm PATCH 07/11] imsm: cache metadata definitions
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (5 preceding siblings ...)
  2014-04-24  7:22 ` [RFC mdadm PATCH 06/11] imsm: immutable volume id Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:23 ` [RFC mdadm PATCH 08/11] imsm: read cache metadata Dan Williams
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Makefile      |    2 
 isrt-intel.h  |  256 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 super-intel.c |   16 +++-
 3 files changed, 271 insertions(+), 3 deletions(-)
 create mode 100644 isrt-intel.h

diff --git a/Makefile b/Makefile
index b823d85f89e3..7d50df69a744 100644
--- a/Makefile
+++ b/Makefile
@@ -127,7 +127,7 @@ CHECK_OBJS = restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
 
 SRCS =  $(patsubst %.o,%.c,$(OBJS))
 
-INCL = mdadm.h part.h bitmap.h
+INCL = mdadm.h part.h bitmap.h isrt-intel.h platform-intel.h
 
 MON_OBJS = mdmon.o monitor.o managemon.o util.o maps.o mdstat.o sysfs.o \
 	policy.o lib.o \
diff --git a/isrt-intel.h b/isrt-intel.h
new file mode 100644
index 000000000000..50365de1a620
--- /dev/null
+++ b/isrt-intel.h
@@ -0,0 +1,256 @@
+/*
+ * mdadm - Intel(R) Smart Response Technology Support
+ *
+ * Copyright (C) 2011-2014 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#ifndef __ISRT_INTEL_H__
+#define __ISRT_INTEL_H__
+
+enum {
+	/* for a given cache device how many volumes can be associated */
+	MAX_NV_CACHE_VOLS = 1,
+	/* likely should be dynamically configurable when this driver is
+	 * made more generic
+	 */
+	ISRT_FRAME_SIZE = 8192,
+	VOL_CONFIG_RESERVED = 32,
+	MD_HEADER_RESERVED = 32,
+	MAX_RAID_SERIAL_LEN = 16,
+	NVC_SIG_LEN = 32,
+	ISRT_DEV_IDX = 0,
+	ISRT_TARGET_DEV_IDX = 1,
+
+	NV_CACHE_MODE_OFF          = 0,
+	NV_CACHE_MODE_OFF_TO_SAFE  = 1, /* powerfail recovery state */
+	NV_CACHE_MODE_OFF_TO_PERF  = 2, /* powerfail recovery state */
+	NV_CACHE_MODE_SAFE         = 3,
+	NV_CACHE_MODE_SAFE_TO_OFF  = 4,
+	NV_CACHE_MODE_PERF         = 5,
+	NV_CACHE_MODE_PERF_TO_OFF  = 6,
+	NV_CACHE_MODE_PERF_TO_SAFE = 7,
+	NV_CACHE_MODE_IS_FAILING   = 8,
+	NV_CACHE_MODE_HAS_FAILED   = 9,
+	NV_CACHE_MODE_DIS_PERF     = 10, /* caching on volume or nv cache disabled */
+	NV_CACHE_MODE_DIS_SAFE     = 11, /* volume or NV cache not associated */
+};
+
+struct segment_index_pair {
+	__u32 segment;
+	__u32 index;
+};
+
+#define NV_CACHE_CONFIG_SIG "Intel IMSM NV Cache Cfg. Sig.   "
+#define MAX_NVC_SIZE_GB            128UL      /* Max NvCache we can support is 128GB */
+#define NVC_FRAME_SIZE             8192UL
+#define NVC_FRAME_SIZE_IN_KB       (NVC_FRAME_SIZE / 1024UL)                  /* 8 */
+#define NVC_FRAMES_PER_GB          (1024UL * (1024UL / NVC_FRAME_SIZE_IN_KB))   /* 128k */
+#define MAX_NVC_FRAMES             (MAX_NVC_SIZE_GB * NVC_FRAMES_PER_GB)    /* 16m */
+#define SEGIDX_PAIRS_PER_NVC_FRAME (NVC_FRAME_SIZE / sizeof(struct segment_index_pair)) /* 1k */
+#define SEGHEAP_SEGS_PER_NVC_FRAME (NVC_FRAME_SIZE / sizeof(__u32)) /* 2k */
+#define FRAMES_PER_SEGHEAP_FRAME   (SEGIDX_PAIRS_PER_NVC_FRAME \
+				    * SEGHEAP_SEGS_PER_NVC_FRAME) /* 2m */
+#define MAX_SEGHEAP_NVC_FRAMES     (MAX_NVC_FRAMES/FRAMES_PER_SEGHEAP_FRAME)  /* 8 */
+#define MAX_SEGHEAP_TOC_ENTRIES    (MAX_SEGHEAP_NVC_FRAMES + 1)
+
+
+/* XXX: size of enum guarantees? */
+enum nvc_shutdown_state {
+	ShutdownStateNormal,
+	ShutdownStateS4CrashDmpStart,
+	ShutdownStateS4CrashDmpEnd,
+	ShutdownStateS4CrashDmpFailed
+};
+
+struct isrt_mpb {
+	/*
+	 * Metadata array (packed_md0_nba or packed_md1_nba) that is the base for
+	 * the Metadata Delta Log changes.  The current contents of the Metadata
+	 * Delta Log applied to this packed metadata base becomes the working
+	 * packed metadata upon recovery from a power failure.  The alternate
+	 * packed metadata array, indicated by (md_base_for_delta_log ^1) is
+	 * where the next complete write of packed metadata from DRAM will be
+	 * written. On a clean shutdown, packed metadata will also be written to
+	 * the alternate array.
+	 */
+	__u32 packed_md0_nba; /* Start of primary packed metadata array */
+	__u32 packed_md1_nba; /* Start of secondary packed metadata array */
+	__u32 md_base_for_delta_log; /* 0 or 1. Indicates which packed
+				       * metadata array is the delta log base */
+	__u32 packed_md_size; /* Size of packed metadata array in bytes */
+	__u32 aux_packed_md_nba; /* Start of array of extra metadata for driver use */
+	__u32 aux_packed_md_size; /* Size of array of extra metadata for driver use */
+	__u32 cache_frame0_nba; /* Start of actual cache frames */
+	__u32 seg_num_index_nba; /* Start of the Seg_num_index array */
+	__u32 seg_num_heap_nba; /* Start of the Seg_num_heap */
+	__u32 seg_num_heap_size; /* Size of the Seg_num Heap in bytes (always a */
+	/*
+	 * Multiple of NVM_PAGE_SIZE bytes. The Seg_nums in the tail of the last
+	 * page are all set to 0xFFFFFFFF
+	 */
+	__u32 seg_heap_toc[MAX_SEGHEAP_TOC_ENTRIES];
+	__u32 md_delta_log_nba; /* Start of the Metadata Delta Log region */
+	/*  The Delta Log is a circular buffer */
+	__u32 md_delta_log_max_size; /* Size of the Metadata Delta Log region in bytes */
+	__u32 orom_frames_to_sync_nba; /* Start of the orom_frames_to_sync record */
+	__u32 num_cache_frames; /* Total number of cache frames */
+	__u32 cache_frame_size; /* Size of each cache frame in bytes */
+	__u32 lba_alignment; /* Offset to add to host I/O request LBA before
+			       * shifting to form the segment number
+			       */
+	__u32 valid_frame_gen_num; /* Valid cache frame generation number */
+	/*
+	 * If the cache frame metadata contains a smaller generation number,
+	 * that frame's contents are considered invalid.
+	 */
+	__u32 packed_md_frame_gen_num; /* Packed metadata frame generation number */
+	/*
+	 * This is the frame generation number associated with all frames in the
+	 * packed metadata array. If this is < valid_frame_gen_num, then all
+	 * frames in packed metadata are considered invalid.
+	 */
+	__u32 curr_clean_batch_num; /* Initialized to 0, incremented whenever
+				      * the cache goes clean. If this value is
+				      * greater than the Nv_cache_metadata
+				      * dirty_batch_num in the atomic metadata
+				      * of the cache frame, the frame is
+				      * considered clean.
+				      */
+	__u32 total_used_sectors; /* Total number of NVM sectors of size
+				    * NVM_SECTOR_SIZE used by cache frames and
+				    * metadata.
+				    */
+	/* OROM I/O Log fields */
+	__u32 orom_log_nba; /* OROM I/O Log area for next boot */
+	__u32 orom_log_size; /* OROM I/O Log size in 512-byte blocks */
+
+	/* Hibernate/Crashdump Extent_log */
+	__u32 s4_crash_dmp_extent_log_nba; /* I/O Extent Log area created by the */
+					   /* hibernate/crashdump driver for OROM */
+	/* Driver shutdown state utilized by the OROM */
+	enum nvc_shutdown_state driver_shutdown_state;
+
+	__u32 validity_bits;
+	__u64 nvc_hdr_array_in_dram;
+
+	/* The following fields are used in managing the Metadata Delta Log. */
+
+	/*
+	 * Every delta record in the Metadata Delta Log  has a copy of the value
+	 * of this field at the time the record was written. This gen num is
+	 * incremented by 1 every time the log fills up, and allows powerfail
+	 * recovery to easily find the end of the log (it's the first record
+	 * whose gen num field is < curr_delta_log_gen_num.)
+	 */
+	__u32 curr_delta_log_gen_num;
+	/*
+	 * This is the Nba to the start of the current generation of delta
+	 * records in the log.  Since the log is circular, the current log
+	 * extends from md_delta_log_first up to and including
+	 * ((md_delta_log_first + max_records - 2) % max_records).  NOTE: when
+	 * reading the delta log, the actual end of the log is indicated by the
+	 * first record whose gen num field is < curr_delta_log_gen_num, so the
+	 * 'max_records-2' guarantees we'll have at least one delta record whose
+	 * gen num field will qualify to mark the end of the log.
+	 */
+	__u32 md_delta_log_first;
+	/*
+	 * How many free frames are used in the Metadata Delta Log. After every
+	 * write of a delta log record that contains at least one
+	 * Md_delta_log_entry, there must always be exactly
+	 */
+
+	__u32 md_delta_log_num_free_frames;
+	__u32 num_dirty_frames; /* Number of dirty frames in cache when this
+				  * isrt_mpb was written.
+				  */
+	__u32 num_dirty_frames_at_mode_trans; /* Number of dirty frames from
+						* the start of the most recent
+						* transition out of Performance
+						* mode (Perf_to_safe/Perf_to_off)
+						*/
+} __attribute__((packed));
+
+
+struct nv_cache_vol_config_md {
+	__u32 acc_vol_orig_family_num; /* Unique Volume Id of the accelerated
+					 * volume caching to the NVC Volume
+					 */
+	__u16 acc_vol_dev_id; /* (original family + dev_id) if there is no
+				* volume associated with Nv_cache, both of these
+				* fields are 0.
+				*/
+	__u16 nv_cache_mode; /* NV Cache mode of this volume */
+	/*
+	 * The serial_no of the accelerated volume associated with Nv_cache.  If
+	 * there is no volume associated with Nv_cache, acc_vol_name[0] = 0
+	 */
+	char acc_vol_name[MAX_RAID_SERIAL_LEN];
+	__u32 flags;
+	__u32 power_cycle_count; /* Power Cycle Count of the underlying disk or
+				   * volume from the last device enumeration.
+				   */
+	/* Used to determine separation case. */
+	__u32  expansion_space[VOL_CONFIG_RESERVED];
+} __attribute__((packed));
+
+struct nv_cache_config_md_header {
+	char signature[NVC_SIG_LEN]; /* "Intel IMSM NV Cache Cfg. Sig.   " */
+	__u16  version_number; /* NV_CACHE_CFG_MD_VERSION */
+	__u16  header_length; /* Length in bytes */
+	__u32  total_length; /* Length of the entire Config Metadata including
+			       * header and volume(s) in bytes
+			       */
+	/* Elements above here will never change even in new versions */
+	__u16  num_volumes; /* Number of volumes that have config metadata. In
+			      * 9.0 it's either 0 or 1
+			      */
+	__u32 expansion_space[MD_HEADER_RESERVED];
+	struct nv_cache_vol_config_md vol_config_md[MAX_NV_CACHE_VOLS]; /* Array of Volume */
+	/* Config Metadata entries. Contains "num_volumes" */
+	/* entries. In 9.0 'MAX_NV_CACHE_VOLS' = 1. */
+} __attribute__((packed));
+
+struct nv_cache_control_data {
+	struct nv_cache_config_md_header hdr;
+	struct isrt_mpb mpb;
+} __attribute__((packed));
+
+/* One or more sectors in NAND page are bad */
+#define NVC_PACKED_SECTORS_BAD (1 << 0)
+#define NVC_PACKED_DIRTY (1 << 1)
+#define NVC_PACKED_FRAME_TYPE_SHIFT (2)
+/* If set, frame is in clean area of LRU list */
+#define NVC_PACKED_IN_CLEAN_AREA (1 << 5)
+/*
+ * This frame was TRIMMed (OROM shouldn't expect the delta log rebuild to match
+ * the packed metadata stored on a clean shutdown).
+ */
+#define NVC_PACKED_TRIMMED (1 << 6)
+
+struct nv_cache_packed_md {
+	__u32 seg_num; /* Disk Segment currently assigned to frame */
+	__u16 per_sector_validity; /* Per sector validity */
+	__u8 flags;
+	union {
+		__u8 pad;
+		/* repurpose padding for driver state */
+		__u8 locked;
+	};
+} __attribute__((packed));
+
+#define SEGMENTS_PER_PAGE_SHIFT 6
+#define SEGMENTS_PER_PAGE (1 << SEGMENTS_PER_PAGE_SHIFT)
+#define SEGMENTS_PER_PAGE_MASK (SEGMENTS_PER_PAGE-1)
+#define FRAME_SHIFT 4
+#define SECTORS_PER_FRAME (1 << FRAME_SHIFT)
+#define FRAME_MASK (SECTORS_PER_FRAME-1)
+
+#endif /* __ISRT_INTEL_H__ */
diff --git a/super-intel.c b/super-intel.c
index 07e4c68982cd..acc46368322f 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -22,6 +22,7 @@
 #include "mdmon.h"
 #include "sha1.h"
 #include "platform-intel.h"
+#include "isrt-intel.h"
 #include <values.h>
 #include <scsi/sg.h>
 #include <ctype.h>
@@ -39,7 +40,6 @@
 #define MPB_VERSION_CNG "1.2.06"
 #define MPB_VERSION_ATTRIBS "1.3.00"
 #define MAX_SIGNATURE_LENGTH  32
-#define MAX_RAID_SERIAL_LEN   16
 
 /* supports RAID0 */
 #define MPB_ATTRIB_RAID0		__cpu_to_le32(0x00000001)
@@ -179,6 +179,8 @@ struct imsm_dev {
 #define DEV_CLONE_N_GO		__cpu_to_le32(0x400)
 #define DEV_CLONE_MAN_SYNC	__cpu_to_le32(0x800)
 #define DEV_CNG_MASTER_DISK_NUM	__cpu_to_le32(0x1000)
+/* Volume is being used as NvCache for an accelerated volume */
+#define DEV_NVC_VOLUME          __cpu_to_le32(0x4000)
 	__u32 status;	/* Persistent RaidDev status */
 	__u32 reserved_blocks; /* Reserved blocks at beginning of volume */
 	__u8  migr_priority;
@@ -189,8 +191,18 @@ struct imsm_dev {
 	__u8  cng_state;
 	__u8  cng_sub_state;
 	__u16 dev_id;
+	__u8 nv_cache_mode;
+#define DEV_NVC_CLEAN		(0)
+#define DEV_NVC_DIRTY		(1)
+#define DEV_NVC_HEALTH_GOOD     (0 << 1)
+#define DEV_NVC_HEALTH_FAILED	(1 << 1)
+#define DEV_NVC_HEALTH_READONLY	(2 << 1)
+#define DEV_NVC_HEALTH_BACKUP	(3 << 1)
+	__u8 nv_cache_flags;
+	__u32 nvc_orig_family_num; /* Unique Volume Id of the cache */
+	__u16 nvc_dev_id;	   /* volume associated with this volume */
 	__u16 fill;
-	__u32 filler[9];
+	__u32 filler[7];
 	struct imsm_vol vol;
 } __attribute__ ((packed));
 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC mdadm PATCH 08/11] imsm: read cache metadata
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (6 preceding siblings ...)
  2014-04-24  7:22 ` [RFC mdadm PATCH 07/11] imsm: cache metadata definitions Dan Williams
@ 2014-04-24  7:23 ` Dan Williams
  2014-04-24  7:23 ` [RFC mdadm PATCH 09/11] imsm: examine cache configurations Dan Williams
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Add support for identifying cache volumes and retrieving the associated
cache mpb, which is located at the start of any volume marked with the
DEV_NVC_VOLUME flag.
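As an aside (not part of the patch), the alignment arithmetic that
load_cache() relies on can be sketched standalone; the helper names below
are hypothetical.  Metadata reads are rounded up to whole 512-byte
sectors, and the on-disk anchor location comes from a sector number
shifted into a byte offset:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helpers mirroring the arithmetic in load_cache():
 * the pread() wants a sector-multiple size into a 512-byte aligned
 * buffer, and pba_of_lba0() yields a sector number, not bytes.
 */
static size_t anchor_read_size(size_t struct_size)
{
	/* round up to a whole number of 512-byte sectors */
	return (struct_size + 511) & ~(size_t)511;
}

static off_t anchor_byte_offset(uint64_t lba)
{
	/* sectors to bytes, as in "pba_of_lba0(map) << 9" */
	return (off_t)(lba << 9);
}
```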

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 super-intel.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index acc46368322f..f179d80b8209 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -345,6 +345,7 @@ static unsigned int mpb_sectors(struct imsm_super *mpb)
 struct intel_dev {
 	struct imsm_dev *dev;
 	struct intel_dev *next;
+	struct nv_cache_control_data *nvc;
 	unsigned index;
 };
 
@@ -3086,6 +3087,7 @@ static void free_devlist(struct intel_super *super)
 
 	while (super->devlist) {
 		dv = super->devlist->next;
+		free(super->devlist->nvc);
 		free(super->devlist->dev);
 		free(super->devlist);
 		super->devlist = dv;
@@ -3467,9 +3469,34 @@ static void end_migration(struct imsm_dev *dev, struct intel_super *super,
 }
 #endif
 
-static int parse_raid_devices(struct intel_super *super)
+static int load_cache(int fd, struct intel_dev *dv)
 {
-	int i;
+	struct imsm_dev *dev = dv->dev;
+	struct imsm_map *map = get_imsm_map(dev, MAP_X);
+	off_t offset = pba_of_lba0(map) << 9;
+	ssize_t size = (sizeof(*dv->nvc) + 511) & ~511;
+	int ret;
+
+	if (posix_memalign((void**) &dv->nvc, 512, size) != 0) {
+		pr_err("Failed to allocate cache anchor buffer"
+				" for %.16s\n", dev->volume);
+		return 1;
+	}
+
+	ret = pread(fd, dv->nvc, size, offset);
+	if (ret != size) {
+		pr_err("Failed to read cache metadata for %.16s: %s\n",
+			dev->volume, strerror(errno));
+		free(dv->nvc);
+		dv->nvc = NULL;
+	}
+
+	return ret != size;
+}
+
+static int load_raid_devices(int fd, struct intel_super *super)
+{
+	int i, err;
 	struct imsm_dev *dev_new;
 	size_t len, len_migr;
 	size_t max_len = 0;
@@ -3496,6 +3523,16 @@ static int parse_raid_devices(struct intel_super *super)
 		dv->index = i;
 		dv->next = super->devlist;
 		super->devlist = dv;
+
+		/* volumes that serve as caches have metadata at offset-0 from
+		 * the start of the volume
+		 */
+		if (dv->dev->status & DEV_NVC_VOLUME) {
+			err = load_cache(fd, dv);
+			if (err)
+				return err;
+		} else
+			dv->nvc = NULL;
 	}
 
 	/* ensure that super->buf is large enough when all raid devices
@@ -3718,8 +3755,7 @@ static void clear_hi(struct intel_super *super)
 	}
 }
 
-static int
-load_and_parse_mpb(int fd, struct intel_super *super, char *devname, int keep_fd)
+static int load_mpb(int fd, struct intel_super *super, char *devname, int keep_fd)
 {
 	int err;
 
@@ -3729,7 +3765,10 @@ load_and_parse_mpb(int fd, struct intel_super *super, char *devname, int keep_fd
 	err = load_imsm_disk(fd, super, devname, keep_fd);
 	if (err)
 		return err;
-	err = parse_raid_devices(super);
+	err = load_raid_devices(fd, super);
+	if (err)
+		return err;
+
 	clear_hi(super);
 	return err;
 }
@@ -4384,13 +4423,13 @@ static int get_super_block(struct intel_super **super_list, char *devnm, char *d
 	}
 
 	find_intel_hba_capability(dfd, s, devname);
-	err = load_and_parse_mpb(dfd, s, NULL, keep_fd);
+	err = load_mpb(dfd, s, NULL, keep_fd);
 
 	/* retry the load if we might have raced against mdmon */
 	if (err == 3 && devnm && mdmon_running(devnm))
 		for (retry = 0; retry < 3; retry++) {
 			usleep(3000);
-			err = load_and_parse_mpb(dfd, s, NULL, keep_fd);
+			err = load_mpb(dfd, s, NULL, keep_fd);
 			if (err != 3)
 				break;
 		}
@@ -4473,7 +4512,7 @@ static int load_super_imsm(struct supertype *st, int fd, char *devname)
 		free_imsm(super);
 		return 2;
 	}
-	rv = load_and_parse_mpb(fd, super, devname, 0);
+	rv = load_mpb(fd, super, devname, 0);
 
 	if (rv) {
 		if (devname)



* [RFC mdadm PATCH 09/11] imsm: examine cache configurations
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (7 preceding siblings ...)
  2014-04-24  7:23 ` [RFC mdadm PATCH 08/11] imsm: read cache metadata Dan Williams
@ 2014-04-24  7:23 ` Dan Williams
  2014-04-24  7:23 ` [RFC mdadm PATCH 10/11] imsm: assemble cache volumes Dan Williams
  2014-04-24  7:23 ` [RFC mdadm PATCH 11/11] imsm: support cache enabled arrays Dan Williams
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Allow -E to show the cache associations of volumes.  The UUIDs are
calculated from the stored "orig_family_num" and "dev_id" in the cache
metadata.

For -Eb, a UUID is synthesized from the <cache>:<cache-target> tuple for
the purpose of identifying the complete volume.

The proposed assembly hierarchy is:

1/ (2) containers (one for the cache "array", one for the cache-target
   "array")

2/ (2) subarrays (one for the cache "volume", one for the cache-target
   "volume")

3/ (1) stacked array with the subarray from 2/ as component members

...where "array" and "volume" are the imsm terminology for an mdadm
container and subarray, respectively.
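The order-sensitive UUID derivation can be illustrated with a toy hash
(the patch itself uses mdadm's sha1.c over the mpb signature plus the
family-number/dev-id pairs; fnv1a() below is only a stand-in, and all
names here are hypothetical): both legs feed the identifiers in the fixed
<cache>, <cache-target> order, so either leg alone computes the same
composite-volume UUID.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the SHA1-based cache_volume_uuid() derivation.
 * fnv1a() is illustrative only; the real code hashes with sha1.c.
 */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

static uint64_t toy_volume_uuid(uint32_t cache_fam, uint16_t cache_id,
				uint32_t tgt_fam, uint16_t tgt_id)
{
	uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */

	/* always hash the cache identifiers first, then the target's */
	h = fnv1a(h, &cache_fam, sizeof(cache_fam));
	h = fnv1a(h, &cache_id, sizeof(cache_id));
	h = fnv1a(h, &tgt_fam, sizeof(tgt_fam));
	h = fnv1a(h, &tgt_id, sizeof(tgt_id));
	return h;
}
```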

TODO: what to do about the name of the composite volume?  Leave it
dynamically assigned for now; we could have it take over the cache-target
name, but that name is not available when examining the cache device...

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 isrt-intel.h  |   12 ++++
 super-intel.c |  184 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 172 insertions(+), 24 deletions(-)

diff --git a/isrt-intel.h b/isrt-intel.h
index 50365de1a620..6d7e92f4da37 100644
--- a/isrt-intel.h
+++ b/isrt-intel.h
@@ -43,6 +43,18 @@ enum {
 	NV_CACHE_MODE_DIS_SAFE     = 11, /* volume or NV cache not associated */
 };
 
+static inline int nvc_enabled(__u8 mode)
+{
+	switch (mode) {
+	case NV_CACHE_MODE_OFF:
+	case NV_CACHE_MODE_DIS_PERF:
+	case NV_CACHE_MODE_DIS_SAFE:
+		return 0;
+	default:
+		return 1;
+	}
+}
+
 struct segment_index_pair {
 	__u32 segment;
 	__u32 index;
diff --git a/super-intel.c b/super-intel.c
index f179d80b8209..7a7a48e9e6d7 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -770,18 +770,40 @@ static struct imsm_dev *__get_imsm_dev(struct imsm_super *mpb, __u8 index)
 	return NULL;
 }
 
-static struct imsm_dev *get_imsm_dev(struct intel_super *super, __u8 index)
+static struct intel_dev *get_intel_dev(struct intel_super *super, __u8 index)
 {
 	struct intel_dev *dv;
 
-	if (index >= super->anchor->num_raid_devs)
-		return NULL;
 	for (dv = super->devlist; dv; dv = dv->next)
 		if (dv->index == index)
-			return dv->dev;
+			return dv;
+	return NULL;
+}
+
+static int is_isrt_leg(struct intel_dev *dv)
+{
+	return dv->nvc || nvc_enabled(dv->dev->nv_cache_mode);
+}
+
+static struct intel_dev *get_isrt_leg(struct intel_super *super, int leg)
+{
+	struct intel_dev *dv;
+
+	for (dv = super->devlist; dv; dv = dv->next)
+		if (!is_isrt_leg(dv))
+			continue;
+		else if (--leg == 0)
+			return dv;
 	return NULL;
 }
 
+static struct imsm_dev *get_imsm_dev(struct intel_super *super, __u8 index)
+{
+	struct intel_dev *dv = get_intel_dev(super, index);
+
+	return dv ? dv->dev : NULL;
+}
+
 /*
  * for second_map:
  *  == MAP_0 get first map
@@ -1122,20 +1144,112 @@ static int is_gen_migration(struct imsm_dev *dev);
 static __u64 blocks_per_migr_unit(struct intel_super *super,
 				  struct imsm_dev *dev);
 
-static void print_imsm_dev(struct intel_super *super,
-			   struct imsm_dev *dev,
-			   char *uuid,
-			   int disk_idx)
+/* generate the <cache> + <cache_target> (in that order) UUID */
+static int cache_volume_uuid(struct intel_super *super, struct intel_dev *dv, int uuid[4])
+{
+	char buf[20];
+	struct sha1_ctx ctx;
+	struct imsm_dev *dev = dv->dev;
+
+	if (!is_isrt_leg(dv))
+		return 1;
+
+	sha1_init_ctx(&ctx);
+	sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
+	if (dv->nvc) {
+		struct nv_cache_vol_config_md *cfg = &dv->nvc->hdr.vol_config_md[0];
+
+		/* self id + cache target id */
+		sha1_process_bytes(&super->anchor->orig_family_num, sizeof(__u32), &ctx);
+		sha1_process_bytes(&dev->dev_id, sizeof(dv->dev->dev_id), &ctx);
+		sha1_process_bytes(&cfg->acc_vol_orig_family_num, sizeof(__u32), &ctx);
+		sha1_process_bytes(&cfg->acc_vol_dev_id, sizeof(cfg->acc_vol_dev_id), &ctx);
+	} else if (nvc_enabled(dev->nv_cache_mode)) {
+		/* cache id + self id */
+		sha1_process_bytes(&dev->nvc_orig_family_num, sizeof(__u32), &ctx);
+		sha1_process_bytes(&dev->nvc_dev_id, sizeof(dev->nvc_dev_id), &ctx);
+		sha1_process_bytes(&super->anchor->orig_family_num, sizeof(__u32), &ctx);
+		sha1_process_bytes(&dev->dev_id, sizeof(dev->dev_id), &ctx);
+	}
+	sha1_finish_ctx(&ctx, buf);
+	memcpy(uuid, buf, 4*4);
+	return 0;
+}
+
+static void cache_target_uuid(struct intel_super *super, struct intel_dev *dv, int uuid[4])
+{
+	char buf[20];
+	struct sha1_ctx ctx;
+	struct nv_cache_vol_config_md *cfg = &dv->nvc->hdr.vol_config_md[0];
+
+	sha1_init_ctx(&ctx);
+	sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
+	sha1_process_bytes(&cfg->acc_vol_orig_family_num, sizeof(__u32), &ctx);
+	sha1_process_bytes(&cfg->acc_vol_dev_id, sizeof(cfg->acc_vol_dev_id), &ctx);
+	sha1_finish_ctx(&ctx, buf);
+	memcpy(uuid, buf, 4*4);
+}
+
+static void cache_uuid(struct intel_super *super, struct imsm_dev *dev, int uuid[4])
+{
+	char buf[20];
+	struct sha1_ctx ctx;
+
+	sha1_init_ctx(&ctx);
+	sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
+	sha1_process_bytes(&dev->nvc_orig_family_num, sizeof(__u32), &ctx);
+	sha1_process_bytes(&dev->nvc_dev_id, sizeof(dev->nvc_dev_id), &ctx);
+	sha1_finish_ctx(&ctx, buf);
+	memcpy(uuid, buf, 4*4);
+}
+
+static void examine_cache(struct intel_super *super, struct intel_dev *dv)
+{
+	int uuid[4];
+	char uuid_str[64];
+	char *cache_role = NULL;
+	struct imsm_dev *dev = dv->dev;
+
+	if (dv->nvc) {
+		cache_role = "cache";
+		cache_target_uuid(super, dv, uuid);
+	}
+	if (nvc_enabled(dev->nv_cache_mode)) {
+		if (cache_role)
+			cache_role = NULL; /* can't have it both ways */
+		else {
+			cache_role = "cache-target";
+			cache_uuid(super, dev, uuid);
+		}
+	}
+	__fname_from_uuid(uuid, 0, uuid_str, ':');
+
+	if (!cache_role)
+		return;
+
+	printf("          Magic : Intel (R) Smart Response Technology\n");
+	printf("     Cache role : %s\n", cache_role);
+	printf("     Cache peer : %s\n", uuid_str + 5);
+	cache_volume_uuid(super, dv, uuid);
+	__fname_from_uuid(uuid, 0, uuid_str, ':');
+	printf("   Cache volume : %s\n", uuid_str + 5);
+}
+
+static void print_imsm_dev(struct intel_super *super, struct intel_dev *dv,
+			   struct mdinfo *info, int disk_idx)
 {
 	__u64 sz;
+	__u32 ord;
 	int slot, i;
+	char uuid_str[64];
+	struct imsm_dev *dev = dv->dev;
 	struct imsm_map *map = get_imsm_map(dev, MAP_0);
 	struct imsm_map *map2 = get_imsm_map(dev, MAP_1);
-	__u32 ord;
 
 	printf("\n");
 	printf("[%.16s]:\n", dev->volume);
-	printf("           UUID : %s\n", uuid);
+	__fname_from_uuid(info->uuid, 0, uuid_str, ':');
+	printf("           UUID : %s\n", uuid_str + 5);
 	printf("     RAID Level : %d", get_imsm_raid_level(map));
 	if (map2)
 		printf(" <-- %d", get_imsm_raid_level(map2));
@@ -1224,6 +1338,11 @@ static void print_imsm_dev(struct intel_super *super,
 	}
 	printf("\n");
 	printf("    Dirty State : %s\n", dev->vol.dirty ? "dirty" : "clean");
+
+	if (is_isrt_leg(dv)) {
+		printf("\n");
+		examine_cache(super, dv);
+	}
 }
 
 static void print_imsm_disk(struct imsm_disk *disk, int index, __u32 reserved)
@@ -1443,13 +1562,12 @@ static void examine_super_imsm(struct supertype *st, char *homehost)
 		       (unsigned long long) __le64_to_cpu(log->first_spare_lba));
 	}
 	for (i = 0; i < mpb->num_raid_devs; i++) {
+		struct intel_dev *dv = get_intel_dev(super, i);
 		struct mdinfo info;
-		struct imsm_dev *dev = __get_imsm_dev(mpb, i);
 
 		super->current_vol = i;
 		getinfo_super_imsm(st, &info, NULL);
-		fname_from_uuid(st, &info, nbuf, ':');
-		print_imsm_dev(super, dev, nbuf + 5, super->disks->index);
+		print_imsm_dev(super, dv, &info, super->disks->index);
 	}
 	for (i = 0; i < mpb->num_disks; i++) {
 		if (i == super->disks->index)
@@ -1466,7 +1584,6 @@ static void examine_super_imsm(struct supertype *st, char *homehost)
 
 static void brief_examine_super_imsm(struct supertype *st, int verbose)
 {
-	/* We just write a generic IMSM ARRAY entry */
 	struct mdinfo info;
 	char nbuf[64];
 	struct intel_super *super = st->sb;
@@ -1481,14 +1598,28 @@ static void brief_examine_super_imsm(struct supertype *st, int verbose)
 	printf("ARRAY metadata=imsm UUID=%s\n", nbuf + 5);
 }
 
+static void brief_examine_cache_imsm(struct supertype *st, int cache_leg)
+{
+	int uuid[4];
+	char nbuf[64];
+	struct intel_super *super = st->sb;
+	struct intel_dev *dv = get_isrt_leg(super, cache_leg);
+
+	if (!dv)
+		return;
+
+	cache_volume_uuid(super, dv, uuid);
+	__fname_from_uuid(uuid, 0, nbuf, ':');
+	printf("ARRAY UUID=%s\n", nbuf + 5);
+}
+
 static void brief_examine_subarrays_imsm(struct supertype *st, int verbose)
 {
-	/* We just write a generic IMSM ARRAY entry */
-	struct mdinfo info;
+	int i;
 	char nbuf[64];
 	char nbuf1[64];
+	struct mdinfo info;
 	struct intel_super *super = st->sb;
-	int i;
 
 	if (!super->anchor->num_raid_devs)
 		return;
@@ -1496,13 +1627,13 @@ static void brief_examine_subarrays_imsm(struct supertype *st, int verbose)
 	getinfo_super_imsm(st, &info, NULL);
 	fname_from_uuid(st, &info, nbuf, ':');
 	for (i = 0; i < super->anchor->num_raid_devs; i++) {
-		struct imsm_dev *dev = get_imsm_dev(super, i);
+		struct intel_dev *dv = get_intel_dev(super, i);
 
 		super->current_vol = i;
 		getinfo_super_imsm(st, &info, NULL);
 		fname_from_uuid(st, &info, nbuf1, ':');
 		printf("ARRAY /dev/md/%.16s container=%s member=%d UUID=%s\n",
-		       dev->volume, nbuf + 5, i, nbuf1 + 5);
+		       dv->dev->volume, nbuf + 5, i, nbuf1 + 5);
 	}
 }
 
@@ -2827,12 +2958,12 @@ static struct imsm_disk *get_imsm_missing(struct intel_super *super, __u8 index)
 
 static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *map)
 {
-	struct intel_super *super = st->sb;
-	struct imsm_disk *disk;
-	int map_disks = info->array.raid_disks;
-	int max_enough = -1;
 	int i;
+	struct imsm_disk *disk;
 	struct imsm_super *mpb;
+	struct intel_super *super = st->sb;
+	int max_enough = -1, cache_legs = 0;
+	int map_disks = info->array.raid_disks;
 
 	if (super->current_vol >= 0) {
 		getinfo_super_imsm_volume(st, info, map);
@@ -2869,7 +3000,8 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
 	mpb = super->anchor;
 
 	for (i = 0; i < mpb->num_raid_devs; i++) {
-		struct imsm_dev *dev = get_imsm_dev(super, i);
+		struct intel_dev *dv = get_intel_dev(super, i);
+		struct imsm_dev *dev = dv->dev;
 		int failed, enough, j, missing = 0;
 		struct imsm_map *map;
 		__u8 state;
@@ -2877,6 +3009,8 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
 		failed = imsm_count_failed(super, dev, MAP_0);
 		state = imsm_check_degraded(super, dev, failed, MAP_0);
 		map = get_imsm_map(dev, MAP_0);
+		if (is_isrt_leg(dv))
+			cache_legs++;
 
 		/* any newly missing disks?
 		 * (catches single-degraded vs double-degraded)
@@ -2917,6 +3051,7 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
 	}
 	dprintf("%s: enough: %d\n", __func__, max_enough);
 	info->container_enough = max_enough;
+	info->cache_legs = cache_legs;
 
 	if (super->disks) {
 		__u32 reserved = imsm_reserved_sectors(super, super->disks);
@@ -10578,6 +10713,7 @@ struct superswitch super_imsm = {
 	.examine_super	= examine_super_imsm,
 	.brief_examine_super = brief_examine_super_imsm,
 	.brief_examine_subarrays = brief_examine_subarrays_imsm,
+	.brief_examine_cache = brief_examine_cache_imsm,
 	.export_examine_super = export_examine_super_imsm,
 	.detail_super	= detail_super_imsm,
 	.brief_detail_super = brief_detail_super_imsm,



* [RFC mdadm PATCH 10/11] imsm: assemble cache volumes
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (8 preceding siblings ...)
  2014-04-24  7:23 ` [RFC mdadm PATCH 09/11] imsm: examine cache configurations Dan Williams
@ 2014-04-24  7:23 ` Dan Williams
  2014-04-24  7:23 ` [RFC mdadm PATCH 11/11] imsm: support cache enabled arrays Dan Williams
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Teach load_super to examine the passed-in fd and determine whether it is a
cache or cache-target md device.

Generate info to allow the two halves of the cache to be assembled.

XXX: what are the rules we need for compare_super to determine stale
cache associations?

Create a LEVEL_ISRT md device.
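For illustration, the version-string parsing that load_super_cache()
performs on sra->text_version can be sketched standalone (the function
name and return convention below are hypothetical): a member of an
external-metadata container reports a version like "-md127/1", which
splits in place into the parent container name and the volume index.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the text_version split in load_super_cache().
 * Modifies the string in place, as the patch does.  Returns 0 on
 * success, -1 if the string is not of the form "<c>NAME/INDEX".
 */
static int parse_subarray_version(char *text_version,
				  char **devnm, int *index)
{
	char *ep = strchr(text_version + 1, '/');

	if (!ep)
		return -1;
	*ep = '\0';			/* terminate the container name */
	*devnm = text_version + 1;	/* skip the leading '-' or '/' */
	*index = (int)strtoul(ep + 1, &ep, 10);
	return *ep == '\0' ? 0 : -1;	/* index must be all digits */
}
```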

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 isrt-intel.h  |    2 +
 maps.c        |    1 
 mdadm.h       |    1 
 super-intel.c |  191 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 sysfs.c       |    8 ++
 util.c        |    1 
 6 files changed, 198 insertions(+), 6 deletions(-)

diff --git a/isrt-intel.h b/isrt-intel.h
index 6d7e92f4da37..ea106c5ac02c 100644
--- a/isrt-intel.h
+++ b/isrt-intel.h
@@ -28,6 +28,8 @@ enum {
 	NVC_SIG_LEN = 32,
 	ISRT_DEV_IDX = 0,
 	ISRT_TARGET_DEV_IDX = 1,
+	ISRT_ROLE_CACHE = 0,
+	ISRT_ROLE_TARGET = 1,
 
 	NV_CACHE_MODE_OFF          = 0,
 	NV_CACHE_MODE_OFF_TO_SAFE  = 1, /* powerfail recovery state */
diff --git a/maps.c b/maps.c
index 64f1df2c42c3..28c010fdf9bf 100644
--- a/maps.c
+++ b/maps.c
@@ -93,6 +93,7 @@ mapping_t pers[] = {
 	{ "10", 10},
 	{ "faulty", LEVEL_FAULTY},
 	{ "container", LEVEL_CONTAINER},
+	{ "isrt", LEVEL_ISRT },
 	{ NULL, 0}
 };
 
diff --git a/mdadm.h b/mdadm.h
index 111f90f599af..e613d3866d8b 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -1457,6 +1457,7 @@ char *xstrdup(const char *str);
 #define	LEVEL_MULTIPATH		(-4)
 #define	LEVEL_LINEAR		(-1)
 #define	LEVEL_FAULTY		(-5)
+#define	LEVEL_ISRT		(-12)
 
 /* kernel module doesn't know about these */
 #define LEVEL_CONTAINER		(-100)
diff --git a/super-intel.c b/super-intel.c
index 7a7a48e9e6d7..e69d2a044e92 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -379,6 +379,8 @@ struct intel_super {
 	int updates_pending; /* count of pending updates for mdmon */
 	int current_vol; /* index of raid device undergoing creation */
 	unsigned long long create_offset; /* common start for 'current_vol' */
+	int load_cache; /* flag to indicate we are operating on the cache metadata */
+	int cache_dev; /* subarray/volume index of the cache volume */
 	__u32 random; /* random data for seeding new family numbers */
 	struct intel_dev *devlist;
 	struct dl {
@@ -1246,6 +1248,9 @@ static void print_imsm_dev(struct intel_super *super, struct intel_dev *dv,
 	struct imsm_map *map = get_imsm_map(dev, MAP_0);
 	struct imsm_map *map2 = get_imsm_map(dev, MAP_1);
 
+	if (super->load_cache)
+		examine_cache(super, dv);
+
 	printf("\n");
 	printf("[%.16s]:\n", dev->volume);
 	__fname_from_uuid(info->uuid, 0, uuid_str, ':');
@@ -1339,7 +1344,7 @@ static void print_imsm_dev(struct intel_super *super, struct intel_dev *dv,
 	printf("\n");
 	printf("    Dirty State : %s\n", dev->vol.dirty ? "dirty" : "clean");
 
-	if (is_isrt_leg(dv)) {
+	if (!super->load_cache) {
 		printf("\n");
 		examine_cache(super, dv);
 	}
@@ -1514,6 +1519,8 @@ static int imsm_check_attributes(__u32 attributes)
 
 #ifndef MDASSEMBLE
 static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *map);
+static void getinfo_super_imsm_cache(struct intel_super *super, struct intel_dev *dv,
+				     struct mdinfo *info, char *map);
 
 static void examine_super_imsm(struct supertype *st, char *homehost)
 {
@@ -1527,6 +1534,18 @@ static void examine_super_imsm(struct supertype *st, char *homehost)
 	__u32 reserved = imsm_reserved_sectors(super, super->disks);
 	struct dl *dl;
 
+	if (super->load_cache) {
+		struct intel_dev *dv = get_intel_dev(super, super->cache_dev);
+		struct mdinfo info;
+
+		super->load_cache = 0;
+		super->current_vol = super->cache_dev;
+		getinfo_super_imsm(st, &info, NULL);
+		super->load_cache = 1;
+		print_imsm_dev(super, dv, &info, super->disks->index);
+		return;
+	}
+
 	snprintf(str, MPB_SIG_LEN, "%s", mpb->sig);
 	printf("          Magic : %s\n", str);
 	snprintf(str, strlen(MPB_VERSION_RAID0), "%s", get_imsm_version(mpb));
@@ -1595,21 +1614,24 @@ static void brief_examine_super_imsm(struct supertype *st, int verbose)
 
 	getinfo_super_imsm(st, &info, NULL);
 	fname_from_uuid(st, &info, nbuf, ':');
-	printf("ARRAY metadata=imsm UUID=%s\n", nbuf + 5);
+	if (super->load_cache)
+		printf("ARRAY UUID=%s\n", nbuf + 5);
+	else
+		printf("ARRAY metadata=imsm UUID=%s\n", nbuf + 5);
 }
 
 static void brief_examine_cache_imsm(struct supertype *st, int cache_leg)
 {
-	int uuid[4];
 	char nbuf[64];
+	struct mdinfo info;
 	struct intel_super *super = st->sb;
 	struct intel_dev *dv = get_isrt_leg(super, cache_leg);
 
 	if (!dv)
 		return;
 
-	cache_volume_uuid(super, dv, uuid);
-	__fname_from_uuid(uuid, 0, nbuf, ':');
+	getinfo_super_imsm_cache(super, dv, &info, NULL);
+	fname_from_uuid(st, &info, nbuf, ':');
 	printf("ARRAY UUID=%s\n", nbuf + 5);
 }
 
@@ -1621,6 +1643,10 @@ static void brief_examine_subarrays_imsm(struct supertype *st, int verbose)
 	struct mdinfo info;
 	struct intel_super *super = st->sb;
 
+	/* don't re-report container metadata info */
+	if (super->load_cache)
+		return;
+
 	if (!super->anchor->num_raid_devs)
 		return;
 
@@ -2956,6 +2982,71 @@ static struct imsm_disk *get_imsm_missing(struct intel_super *super, __u8 index)
 	return NULL;
 }
 
+static void getinfo_super_imsm_cache(struct intel_super *super, struct intel_dev *dv,
+				     struct mdinfo *info, char *dmap)
+{
+	__u16 nv_cache_mode;
+	int role_failed = 0, role;
+	struct imsm_dev *dev = dv->dev;
+	struct imsm_map *map = get_imsm_map(dev, MAP_X);
+
+	memset(info, 0, sizeof(*info));
+
+	role = dv->nvc ? ISRT_ROLE_CACHE : ISRT_ROLE_TARGET;
+	if (role == ISRT_ROLE_CACHE) {
+		struct nv_cache_vol_config_md *cfg = &dv->nvc->hdr.vol_config_md[0];
+
+		nv_cache_mode = cfg->nv_cache_mode;
+		info->events = 0;
+	} else {
+		nv_cache_mode = dev->nv_cache_mode;
+		info->events = 1; /* make Assemble choose the cache target */
+	}
+
+	if (map->map_state == IMSM_T_STATE_FAILED ||
+	    nv_cache_mode == NV_CACHE_MODE_IS_FAILING ||
+	    nv_cache_mode == NV_CACHE_MODE_HAS_FAILED)
+		role_failed = 1;
+
+	info->array.raid_disks    = 2;
+	info->array.level         = LEVEL_ISRT;
+	info->array.layout        = 0;
+	info->array.md_minor      = -1;
+	info->array.ctime         = 0;
+	info->array.utime         = 0;
+	info->array.chunk_size    = 0;
+
+	info->disk.major = 0;
+	info->disk.minor = 0;
+	info->disk.raid_disk = role;
+	info->reshape_active = 0;
+	info->array.major_version = -1;
+	info->array.minor_version = -2;
+	strcpy(info->text_version, "isrt");
+	info->safe_mode_delay = 0;
+	info->disk.number = role;
+	info->name[0] = 0;
+	info->recovery_start = MaxSector;
+	info->data_offset = 0;
+	info->custom_array_size = __le32_to_cpu(dev->size_high);
+	info->custom_array_size <<= 32;
+	info->custom_array_size |= __le32_to_cpu(dev->size_low);
+	info->component_size = info->custom_array_size;
+
+	if (role_failed)
+		info->disk.state = (1 << MD_DISK_FAULTY);
+	else
+		info->disk.state = (1 << MD_DISK_ACTIVE) | (1 << MD_DISK_SYNC);
+	cache_volume_uuid(super, dv, info->uuid);
+
+	if (dmap) {
+		/* we can only report self-state */
+		dmap[!role] = 1;
+		dmap[role] = !role_failed;
+	}
+}
+
+
 static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *map)
 {
 	int i;
@@ -2965,6 +3056,20 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
 	int max_enough = -1, cache_legs = 0;
 	int map_disks = info->array.raid_disks;
 
+	if (super->load_cache || st->cache_leg) {
+		struct intel_dev *dv;
+
+		if (st->cache_leg) {
+			dv = get_isrt_leg(super, st->cache_leg);
+			if (!dv)
+				return;
+		} else
+			dv = get_intel_dev(super, super->cache_dev);
+
+		getinfo_super_imsm_cache(super, dv, info, map);
+		return;
+	}
+
 	if (super->current_vol >= 0) {
 		getinfo_super_imsm_volume(st, info, map);
 		return;
@@ -3266,6 +3371,27 @@ static int compare_super_imsm(struct supertype *st, struct supertype *tst)
 		}
 	}
 
+	/* cache configuration metadata lives on member arrays, as long
+	 * as they mutually agree on the volume-uuid then consider them a match
+	 * XXX: sufficient? we do have the failure checks in
+	 * getinfo_super_cache() to mitigate
+	 */
+	if (first->load_cache != sec->load_cache)
+		return 3;
+	else if (first->load_cache) {
+		struct intel_dev *first_dv, *sec_dv;
+		int first_uuid[4], sec_uuid[4];
+
+		first_dv = get_intel_dev(first, first->cache_dev);
+		sec_dv = get_intel_dev(sec, sec->cache_dev);
+		cache_volume_uuid(first, first_dv, first_uuid);
+		cache_volume_uuid(sec, sec_dv, sec_uuid);
+		if (memcmp(first_uuid, sec_uuid, sizeof(first_uuid)))
+			return 3;
+		else
+			return 0;
+	}
+
 	/* if an anchor does not have num_raid_devs set then it is a free
 	 * floating spare
 	 */
@@ -4621,6 +4747,52 @@ static int load_container_imsm(struct supertype *st, int fd, char *devname)
 {
 	return load_super_imsm_all(st, fd, &st->sb, devname, NULL, 1);
 }
+
+static int load_super_cache(struct supertype *st, int fd, char *devname)
+{
+	struct mdinfo *sra = sysfs_read(fd, 0, GET_VERSION);
+	char *subarray, *devnm, *ep;
+	int cfd, cache_dev, err = 1;
+	struct intel_super *super;
+	struct intel_dev *dv;
+
+	if (sra && sra->array.major_version == -1 &&
+	    is_subarray(sra->text_version))
+		/* pass */;
+	else
+		goto out;
+
+	/* modify sra->text_version in place */
+	ep = strchr(sra->text_version+1, '/');
+	*ep = '\0';
+	devnm = sra->text_version+1;
+	subarray = ep+1;
+
+	cfd = open_dev(devnm);
+	if (cfd < 0)
+		goto out;
+
+	err = load_container_imsm(st, cfd, devname);
+	close(cfd);
+	if (err)
+		goto out;
+
+	super = st->sb;
+	cache_dev = strtoul(subarray, &ep, 10);
+	/* validate this volume is a cache or cache-target */
+	if (*ep != '\0' || !(dv = get_intel_dev(super, cache_dev))
+	    || !is_isrt_leg(dv)) {
+		free_super_imsm(st);
+		err = 2;
+		goto out;
+	}
+
+	super->load_cache = 1;
+	super->cache_dev = cache_dev;
+ out:
+	sysfs_free(sra);
+	return err;
+}
 #endif
 
 static int load_super_imsm(struct supertype *st, int fd, char *devname)
@@ -4634,6 +4806,15 @@ static int load_super_imsm(struct supertype *st, int fd, char *devname)
 
 	free_super_imsm(st);
 
+#ifndef MDASSEMBLE
+	/* check if this is a component leg of a cache array and load
+	 * the cache metadata from the parent container
+	 */
+	rv = load_super_cache(st, fd, devname);
+	if (rv == 0)
+		return rv;
+#endif
+
 	super = alloc_super();
 	/* Load hba and capabilities if they exist.
 	 * But do not preclude loading metadata in case capabilities or hba are
diff --git a/sysfs.c b/sysfs.c
index 4cbd4e5d051b..898edde49392 100644
--- a/sysfs.c
+++ b/sysfs.c
@@ -638,7 +638,13 @@ int sysfs_set_array(struct mdinfo *info, int vers)
 	rv |= sysfs_set_num(info, NULL, "raid_disks", raid_disks);
 	rv |= sysfs_set_num(info, NULL, "chunk_size", info->array.chunk_size);
 	rv |= sysfs_set_num(info, NULL, "layout", info->array.layout);
-	rv |= sysfs_set_num(info, NULL, "component_size", info->component_size/2);
+	if (info->array.level == LEVEL_ISRT) {
+		/* FIXME: how do we support asymmetric component sizes for
+		 * external metadata?
+		 */
+		rv |= sysfs_set_num(info, NULL, "component_size", 0);
+	} else
+		rv |= sysfs_set_num(info, NULL, "component_size", info->component_size/2);
 	if (info->custom_array_size) {
 		int rc;
 
diff --git a/util.c b/util.c
index 93f9200fa4c7..c9c4dec0fac1 100644
--- a/util.c
+++ b/util.c
@@ -362,6 +362,7 @@ int enough(int level, int raid_disks, int layout, int clean, char *avail)
 
 	case LEVEL_MULTIPATH:
 		return avail_disks>= 1;
+	case LEVEL_ISRT:
 	case LEVEL_LINEAR:
 	case 0:
 		return avail_disks == raid_disks;



* [RFC mdadm PATCH 11/11] imsm: support cache enabled arrays
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (9 preceding siblings ...)
  2014-04-24  7:23 ` [RFC mdadm PATCH 10/11] imsm: assemble cache volumes Dan Williams
@ 2014-04-24  7:23 ` Dan Williams
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Turn on attribute support for caching.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 super-intel.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index e69d2a044e92..10c38b248ce6 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -81,7 +81,8 @@
 					MPB_ATTRIB_RAID1           | \
 					MPB_ATTRIB_RAID10          | \
 					MPB_ATTRIB_RAID5           | \
-					MPB_ATTRIB_EXP_STRIPE_SIZE)
+					MPB_ATTRIB_EXP_STRIPE_SIZE | \
+					MPB_ATTRIB_NVM)
 
 /* Define attributes that are unused but not harmful */
 #define MPB_ATTRIB_IGNORED		(MPB_ATTRIB_NEVER_USE)



Thread overview: 12+ messages
2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
2014-04-24  7:22 ` [RFC mdadm PATCH 01/11] sysfs: fix sysfs_set_array() to accept valid negative array levels Dan Williams
2014-04-24  7:22 ` [RFC mdadm PATCH 02/11] make must_be_container() more selective Dan Williams
2014-04-24  7:22 ` [RFC mdadm PATCH 03/11] Assemble: show the uuid in the verbose case Dan Williams
2014-04-24  7:22 ` [RFC mdadm PATCH 04/11] Assemble: teardown partially assembled arrays Dan Williams
2014-04-24  7:22 ` [RFC mdadm PATCH 05/11] Examine: support for coalescing "cache legs" Dan Williams
2014-04-24  7:22 ` [RFC mdadm PATCH 06/11] imsm: immutable volume id Dan Williams
2014-04-24  7:22 ` [RFC mdadm PATCH 07/11] imsm: cache metadata definitions Dan Williams
2014-04-24  7:23 ` [RFC mdadm PATCH 08/11] imsm: read cache metadata Dan Williams
2014-04-24  7:23 ` [RFC mdadm PATCH 09/11] imsm: examine cache configurations Dan Williams
2014-04-24  7:23 ` [RFC mdadm PATCH 10/11] imsm: assemble cache volumes Dan Williams
2014-04-24  7:23 ` [RFC mdadm PATCH 11/11] imsm: support cache enabled arrays Dan Williams
