* [PATCH V2 00/11] mdadm tool: add the support for cluster-md
@ 2015-05-08  8:50 gqjiang
  2015-05-08  8:50 ` [PATCH V2 01/11] Create n bitmaps for clustered mode gqjiang
                   ` (10 more replies)
  0 siblings, 11 replies; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

Compared to the previous version:
1. rearranged the sequence of patches
2. added some members to sb_le_to_cpu
3. handled some logic changes and comments from Neil

Basic background for Cluster MD: Cluster MD is a shared-device RAID for a
cluster. Currently the implementation is limited to RAID1, but with further
work (and some positive feedback) it could be extended to other RAID levels.

The kernel part of cluster-md was sent to the mailing list several months
ago by Goldwyn, and to make cluster-md work, the mdadm tool also needs
corresponding changes.

This patch set extends the mdadm tool to be aware of Cluster MD and to
handle the related cluster-md scenarios.

1. The first part (0001-0007) comes from Goldwyn and adds initial
support for cluster-md. Those changes make mdadm aware of nodes,
home-cluster and n bitmaps for clustered mode, and also let mdadm
confirm a disk which is added by another node.

2. The second part adds support for changing the cluster name and the
number of nodes in assemble mode. It extends write_bitmap to handle the
above cases, and also uses the extended write_bitmap for update uuid.
[PATCH V2 10/10] is compile-tested only.

BTW: this series is based on commit "72a457 IMSM: Count arrays per orom".

Some related links:
[1] http://marc.info/?l=linux-raid&m=141891941330336&w=2
[2] http://marc.info/?l=linux-raid&m=141935561418770&w=2

Guoqing Jiang (11):
  Create n bitmaps for clustered mode
  Add nodes option while creating md
  home-cluster while creating an array
  Show all bitmaps while examining bitmap
  Add a new clustered disk
  Convert a bitmap=none device to clustered
  Skip clustered devices in incremental
  mdadm: add the ability to change cluster name
  mdadm: change the num of cluster node
  Reuse calc_bitmap_size to reduce code size
  Reuse the write_bitmap for update uuid

 Assemble.c    |  14 ++++--
 Create.c      |   5 +-
 Grow.c        |  22 +++++++--
 Incremental.c |   5 ++
 Makefile      |   1 +
 Manage.c      |  33 +++++++++++--
 ReadMe.c      |   3 ++
 bitmap.c      |  94 ++++++++++++++++++++++---------------
 bitmap.h      |   7 ++-
 config.c      |  27 ++++++++++-
 md_p.h        |   7 +++
 md_u.h        |   1 +
 mdadm.8.in    |  28 ++++++++++-
 mdadm.c       |  69 ++++++++++++++++++++++++++-
 mdadm.h       |  20 +++++++-
 super0.c      |   4 +-
 super1.c      | 147 +++++++++++++++++++++++++++++++++++++++++++++++-----------
 util.c        |  60 ++++++++++++++++++++++++
 18 files changed, 461 insertions(+), 86 deletions(-)

-- 
1.7.12.4



* [PATCH V2 01/11] Create n bitmaps for clustered mode
  2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
@ 2015-05-08  8:50 ` gqjiang
  2015-05-15  5:45   ` NeilBrown
  2015-05-08  8:50 ` [PATCH V2 02/11] Add nodes option while creating md gqjiang
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

For a clustered MD, create bitmaps equal to number of nodes so
each node has an independent bitmap.

Only the first bitmap has its bits set, so that the first node
that assembles the device also performs the sync.

The bitmaps are aligned to 4k boundaries.

On-disk format:

0                    4k                     8k                    12k
-------------------------------------------------------------------
| idle                | md super            | bm super [0] + bits |
| bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
| bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
| bm bits [3, contd]  |                     |                     |
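
As a rough illustration of the layout above (a sketch, not code from this
patch), the size and offset of each node's bitmap can be derived with the
same arithmetic that write_bitmap1() uses below. The helper names are
hypothetical, the offsets are assumed to be relative to the start of
bm super[0], and 256 is the size of bitmap_super_t implied by its padding.

```c
#include <assert.h>
#include <stdint.h>

#define BITMAP_SB_BYTES 256u   /* bitmap_super_t is padded out to 256 bytes */
#define BITMAP_ALIGN    4096u  /* each node's bitmap starts on a 4k boundary */

/* Bytes taken by one node's bitmap (superblock + bits), rounded up to 4k.
 * sync_size is in 512-byte sectors, chunksize in bytes. */
static uint64_t node_bitmap_bytes(uint64_t sync_size, uint32_t chunksize)
{
	uint64_t bits, bytes;

	assert(chunksize >= 512);
	bits = sync_size / (chunksize >> 9);          /* one bit per chunk */
	bytes = ((bits + 7) >> 3) + BITMAP_SB_BYTES;  /* bits to bytes, plus super */
	return (bytes + BITMAP_ALIGN - 1) & ~(uint64_t)(BITMAP_ALIGN - 1);
}

/* Offset of node N's bitmap superblock, relative to bm super[0]
 * (assumption: the bitmaps are laid out back to back). */
static uint64_t node_bitmap_offset(unsigned int node, uint64_t sync_size,
				   uint32_t chunksize)
{
	return (uint64_t)node * node_bitmap_bytes(sync_size, chunksize);
}

/* Initial fill for the bit area: only bitmap[0] starts fully dirty. */
static unsigned char node_fill_byte(unsigned int node)
{
	return node ? 0x00 : 0xff;
}
```

With a sync size of 4194304 sectors (2 GiB) and 64 KiB chunks, each node's
region comes to 8k, which matches the spacing drawn in the diagram.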

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Create.c   |  3 ++-
 bitmap.c   |  2 ++
 bitmap.h   |  7 +++++--
 mdadm.8.in |  7 ++++++-
 mdadm.c    |  9 +++++++++
 super1.c   | 59 +++++++++++++++++++++++++++++++++++++++++------------------
 6 files changed, 65 insertions(+), 22 deletions(-)

diff --git a/Create.c b/Create.c
index ef28da0..69f5432 100644
--- a/Create.c
+++ b/Create.c
@@ -750,7 +750,8 @@ int Create(struct supertype *st, char *mddev,
 #endif
 	}
 
-	if (s->bitmap_file && strcmp(s->bitmap_file, "internal")==0) {
+	if (s->bitmap_file && (strcmp(s->bitmap_file, "internal")==0
+			|| strcmp(s->bitmap_file, "clustered")==0)) {
 		if ((vers%100) < 2) {
 			pr_err("internal bitmaps not supported by this kernel.\n");
 			goto abort_locked;
diff --git a/bitmap.c b/bitmap.c
index b1d54a6..920033a 100644
--- a/bitmap.c
+++ b/bitmap.c
@@ -32,6 +32,8 @@ inline void sb_le_to_cpu(bitmap_super_t *sb)
 	sb->daemon_sleep = __le32_to_cpu(sb->daemon_sleep);
 	sb->sync_size = __le64_to_cpu(sb->sync_size);
 	sb->write_behind = __le32_to_cpu(sb->write_behind);
+	sb->nodes = __le32_to_cpu(sb->nodes);
+	sb->sectors_reserved = __le32_to_cpu(sb->sectors_reserved);
 }
 
 inline void sb_cpu_to_le(bitmap_super_t *sb)
diff --git a/bitmap.h b/bitmap.h
index c8725a3..adbf0b4 100644
--- a/bitmap.h
+++ b/bitmap.h
@@ -154,8 +154,11 @@ typedef struct bitmap_super_s {
 	__u32 chunksize;    /* 52  the bitmap chunk size in bytes */
 	__u32 daemon_sleep; /* 56  seconds between disk flushes */
 	__u32 write_behind; /* 60  number of outstanding write-behind writes */
-
-	__u8  pad[256 - 64]; /* set to zero */
+	__u32 sectors_reserved; /* 64 number of 512-byte sectors that are
+				 * reserved for the bitmap. */
+	__u32 nodes;        /* 68 the maximum number of nodes in cluster. */
+	__u8 cluster_name[64]; /* 72 cluster name to which this md belongs */
+	__u8  pad[256 - 136]; /* set to zero */
 } bitmap_super_t;
 
 /* notes:
diff --git a/mdadm.8.in b/mdadm.8.in
index a630310..4aec0db 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -694,7 +694,12 @@ and so is replicated on all devices.  If the word
 .B "none"
 is given with
 .B \-\-grow
-mode, then any bitmap that is present is removed.
+mode, then any bitmap that is present is removed. If the word
+.B "clustered"
+is given, the array is created for a clustered environment. One bitmap
+is created for each node as defined by the
+.B \-\-nodes
+parameter, and the bitmaps are stored internally.
 
 To help catch typing errors, the filename must contain at least one
 slash ('/') if it is a real file (not 'internal' or 'none').
diff --git a/mdadm.c b/mdadm.c
index 3e8c49b..bd9382e 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -1097,6 +1097,15 @@ int main(int argc, char *argv[])
 				s.bitmap_file = optarg;
 				continue;
 			}
+			if (strcmp(optarg, "clustered")== 0) {
+				s.bitmap_file = optarg;
+				/* Set the default number of cluster nodes
+				 * to 4 if not already set by user
+				 */
+				if (c.nodes < 1)
+					c.nodes = 4;
+				continue;
+			}
 			/* probable typo */
 			pr_err("bitmap file must contain a '/', or be 'internal', or 'none'\n"
 				"       not '%s'\n", optarg);
diff --git a/super1.c b/super1.c
index f0508fe..57b1526 100644
--- a/super1.c
+++ b/super1.c
@@ -2144,6 +2144,10 @@ add_internal_bitmap1(struct supertype *st,
 	bms->daemon_sleep = __cpu_to_le32(delay);
 	bms->sync_size = __cpu_to_le64(size);
 	bms->write_behind = __cpu_to_le32(write_behind);
+	bms->nodes = __cpu_to_le32(st->nodes);
+	if (st->cluster_name)
+		strncpy((char *)bms->cluster_name,
+				st->cluster_name, strlen(st->cluster_name));
 
 	*chunkp = chunk;
 	return 1;
@@ -2177,6 +2181,7 @@ static int write_bitmap1(struct supertype *st, int fd)
 	void *buf;
 	int towrite, n;
 	struct align_fd afd;
+	unsigned int i;
 
 	init_afd(&afd, fd);
 
@@ -2185,27 +2190,45 @@ static int write_bitmap1(struct supertype *st, int fd)
 	if (posix_memalign(&buf, 4096, 4096))
 		return -ENOMEM;
 
-	memset(buf, 0xff, 4096);
-	memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
-
-	towrite = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
-	towrite = (towrite+7) >> 3; /* bits to bytes */
-	towrite += sizeof(bitmap_super_t);
-	towrite = ROUND_UP(towrite, 512);
-	while (towrite > 0) {
-		n = towrite;
-		if (n > 4096)
-			n = 4096;
-		n = awrite(&afd, buf, n);
-		if (n > 0)
-			towrite -= n;
+	/* We use bms->nodes as opposed to st->nodes to
+	 * be compatible with write-after-reads such as
+	 * the GROW operation.
+	 */
+	for (i = 0; i < __le32_to_cpu(bms->nodes); i++) {
+		/* Only the bitmap[0] should resync
+		 * whole device on initial assembly
+		 */
+		if (i)
+			memset(buf, 0x00, 4096);
 		else
+			memset(buf, 0xff, 4096);
+		memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
+
+		towrite = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
+		towrite = (towrite+7) >> 3; /* bits to bytes */
+		towrite += sizeof(bitmap_super_t);
+		/* we need the bitmaps to be at 4k boundary */
+		towrite = ROUND_UP(towrite, 4096);
+		while (towrite > 0) {
+			n = towrite;
+			if (n > 4096)
+				n = 4096;
+			n = awrite(&afd, buf, n);
+			if (n > 0)
+				towrite -= n;
+			else
+				break;
+			if (i)
+				memset(buf, 0x00, 4096);
+			else
+				memset(buf, 0xff, 4096);
+		}
+		fsync(fd);
+		if (towrite) {
+			rv = -2;
 			break;
-		memset(buf, 0xff, 4096);
+		}
 	}
-	fsync(fd);
-	if (towrite)
-		rv = -2;
 
 	free(buf);
 	return rv;
-- 
1.7.12.4



* [PATCH V2 02/11] Add nodes option while creating md
  2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
  2015-05-08  8:50 ` [PATCH V2 01/11] Create n bitmaps for clustered mode gqjiang
@ 2015-05-08  8:50 ` gqjiang
  2015-05-08  8:50 ` [PATCH V2 03/11] home-cluster while creating an array gqjiang
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

Specifies the maximum number of nodes in the cluster that may use
this device simultaneously. This is equivalent to the number of
bitmaps created in the internal superblock (patches to follow).

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Create.c   |  1 +
 ReadMe.c   |  1 +
 mdadm.8.in |  6 ++++++
 mdadm.c    | 25 ++++++++++++++++++++++++-
 mdadm.h    |  3 +++
 5 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Create.c b/Create.c
index 69f5432..e4577af 100644
--- a/Create.c
+++ b/Create.c
@@ -531,6 +531,7 @@ int Create(struct supertype *st, char *mddev,
 				st->ss->name);
 		warn = 1;
 	}
+	st->nodes = c->nodes;
 
 	if (warn) {
 		if (c->runstop!= 1) {
diff --git a/ReadMe.c b/ReadMe.c
index 87a4916..30c569d 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -140,6 +140,7 @@ struct option long_options[] = {
     {"homehost",  1, 0,  HomeHost},
     {"symlinks",  1, 0,  Symlinks},
     {"data-offset",1, 0, DataOffset},
+    {"nodes",1, 0, Nodes},
 
     /* For assemble */
     {"uuid",      1, 0, 'u'},
diff --git a/mdadm.8.in b/mdadm.8.in
index 4aec0db..9c1497e 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -971,6 +971,12 @@ However for RAID0, it is not possible to add spares.  So to increase
 the number of devices in a RAID0, it is necessary to set the new
 number of devices, and to add the new devices, in the same command.
 
+.TP
+.BR \-\-nodes
+Only works when the array is for a clustered environment. It specifies the
+maximum number of nodes in the cluster that will use this device
+simultaneously. If not specified, this defaults to 4.
+
 .SH For assemble:
 
 .TP
diff --git a/mdadm.c b/mdadm.c
index bd9382e..031383d 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -588,7 +588,14 @@ int main(int argc, char *argv[])
 			}
 			ident.raid_disks = s.raiddisks;
 			continue;
-
+		case O(CREATE, Nodes):
+			c.nodes = parse_num(optarg);
+			if (c.nodes <= 0) {
+				pr_err("invalid number for the number of "
+						"cluster nodes: %s\n", optarg);
+				exit(2);
+			}
+			continue;
 		case O(CREATE,'x'): /* number of spare (eXtra) disks */
 			if (s.sparedisks) {
 				pr_err("spare-devices set twice: %d and %s\n",
@@ -1386,6 +1393,22 @@ int main(int argc, char *argv[])
 	case CREATE:
 		if (c.delay == 0)
 			c.delay = DEFAULT_BITMAP_DELAY;
+
+		if (strcmp(s.bitmap_file, "clustered") == 0) {
+			if (s.level != 1) {
+			    pr_err("--bitmap=clustered is currently supported with RAID mirror only\n");
+			    rv = 1;
+			    break;
+			}
+		} else {
+			if (c.nodes) {
+				pr_err("--nodes argument is incompatible with --bitmap=%s.\n",
+					s.bitmap_file);
+				rv = 1;
+				break;
+			}
+		}
+
 		if (s.write_behind && !s.bitmap_file) {
 			pr_err("write-behind mode requires a bitmap.\n");
 			rv = 1;
diff --git a/mdadm.h b/mdadm.h
index 141f963..9d55801 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -344,6 +344,7 @@ enum special_options {
 	Dump,
 	Restore,
 	Action,
+	Nodes,
 };
 
 enum prefix_standard {
@@ -418,6 +419,7 @@ struct context {
 	char	*backup_file;
 	int	invalid_backup;
 	char	*action;
+	int	nodes;
 };
 
 struct shape {
@@ -1029,6 +1031,7 @@ struct supertype {
 			 */
 	int devcnt;
 	int retry_soon;
+	int nodes;
 
 	struct mdinfo *devs;
 
-- 
1.7.12.4



* [PATCH V2 03/11] home-cluster while creating an array
  2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
  2015-05-08  8:50 ` [PATCH V2 01/11] Create n bitmaps for clustered mode gqjiang
  2015-05-08  8:50 ` [PATCH V2 02/11] Add nodes option while creating md gqjiang
@ 2015-05-08  8:50 ` gqjiang
  2015-05-08  8:50 ` [PATCH V2 04/11] Show all bitmaps while examining bitmap gqjiang
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

The home-cluster is stored in the bitmap super block of the
array. The device can be assembled only on a cluster whose
name matches the one recorded in the bitmap.

If home-cluster is not specified, it is auto-detected by
dlopen()ing the corosync cmap library.
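
For illustration only (a sketch, not the code in this patch): the name is
kept in the 64-byte cluster_name field of the bitmap super block, and
--home-cluster accepts names of up to 64 characters, so a full-length name
leaves no room for a terminating NUL. The hypothetical helpers below show
one way to store and compare such a name without relying on NUL
termination.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CLUSTER_NAME_LEN 64	/* size of cluster_name[] in bitmap_super_t */

/* Copy name into the fixed-size field, zero-padding the remainder.
 * Returns 0 on success, -1 if the name does not fit. */
static int set_cluster_name(uint8_t field[CLUSTER_NAME_LEN], const char *name)
{
	size_t len;

	assert(field && name);
	len = strlen(name);
	if (len > CLUSTER_NAME_LEN)
		return -1;
	memset(field, 0, CLUSTER_NAME_LEN);
	memcpy(field, name, len);	/* no NUL is written when len == 64 */
	return 0;
}

/* Return 1 if the stored field holds exactly name, 0 otherwise. */
static int cluster_name_matches(const uint8_t field[CLUSTER_NAME_LEN],
				const char *name)
{
	size_t len;

	assert(field && name);
	len = strlen(name);
	if (len > CLUSTER_NAME_LEN)
		return 0;
	if (memcmp(field, name, len) != 0)
		return 0;
	/* the stored name must end where the candidate ends */
	return len == CLUSTER_NAME_LEN || field[len] == 0;
}
```

Zeroing the whole field first keeps a shorter name from leaving stale bytes
behind when an existing name is overwritten.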

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Create.c   |  1 +
 Makefile   |  1 +
 ReadMe.c   |  1 +
 config.c   | 27 ++++++++++++++++++++++++++-
 mdadm.8.in |  6 ++++++
 mdadm.c    | 25 +++++++++++++++++++++++++
 mdadm.h    |  5 +++++
 util.c     | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/Create.c b/Create.c
index e4577af..9663dc4 100644
--- a/Create.c
+++ b/Create.c
@@ -532,6 +532,7 @@ int Create(struct supertype *st, char *mddev,
 		warn = 1;
 	}
 	st->nodes = c->nodes;
+	st->cluster_name = c->homecluster;
 
 	if (warn) {
 		if (c->runstop!= 1) {
diff --git a/Makefile b/Makefile
index a7d8c5c..431f08b 100644
--- a/Makefile
+++ b/Makefile
@@ -101,6 +101,7 @@ endif
 # If you want a static binary, you might uncomment these
 # LDFLAGS = -static
 # STRIP = -s
+LDLIBS=-ldl
 
 INSTALL = /usr/bin/install
 DESTDIR =
diff --git a/ReadMe.c b/ReadMe.c
index 30c569d..c6286ae 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -141,6 +141,7 @@ struct option long_options[] = {
     {"symlinks",  1, 0,  Symlinks},
     {"data-offset",1, 0, DataOffset},
     {"nodes",1, 0, Nodes},
+    {"home-cluster",1, 0, ClusterName},
 
     /* For assemble */
     {"uuid",      1, 0, 'u'},
diff --git a/config.c b/config.c
index 7342c42..21b6afd 100644
--- a/config.c
+++ b/config.c
@@ -77,7 +77,7 @@ char DefaultAltConfFile[] = CONFFILE2;
 char DefaultAltConfDir[] = CONFFILE2 ".d";
 
 enum linetype { Devices, Array, Mailaddr, Mailfrom, Program, CreateDev,
-		Homehost, AutoMode, Policy, PartPolicy, LTEnd };
+		Homehost, HomeCluster, AutoMode, Policy, PartPolicy, LTEnd };
 char *keywords[] = {
 	[Devices]  = "devices",
 	[Array]    = "array",
@@ -86,6 +86,7 @@ char *keywords[] = {
 	[Program]  = "program",
 	[CreateDev]= "create",
 	[Homehost] = "homehost",
+	[HomeCluster] = "homecluster",
 	[AutoMode] = "auto",
 	[Policy]   = "policy",
 	[PartPolicy]="part-policy",
@@ -562,6 +563,21 @@ void homehostline(char *line)
 	}
 }
 
+static char *home_cluster = NULL;
+void homeclusterline(char *line)
+{
+	char *w;
+
+	for (w=dl_next(line); w != line ; w=dl_next(w)) {
+		if (home_cluster == NULL) {
+			if (strcasecmp(w, "<none>")==0)
+				home_cluster = xstrdup("");
+			else
+				home_cluster = xstrdup(w);
+		}
+	}
+}
+
 char auto_yes[] = "yes";
 char auto_no[] = "no";
 char auto_homehost[] = "homehost";
@@ -724,6 +740,9 @@ void conf_file(FILE *f)
 		case Homehost:
 			homehostline(line);
 			break;
+		case HomeCluster:
+			homeclusterline(line);
+			break;
 		case AutoMode:
 			autoline(line);
 			break;
@@ -884,6 +903,12 @@ char *conf_get_homehost(int *require_homehostp)
 	return home_host;
 }
 
+char *conf_get_homecluster(void)
+{
+	load_conffile();
+	return home_cluster;
+}
+
 struct createinfo *conf_get_create_info(void)
 {
 	load_conffile();
diff --git a/mdadm.8.in b/mdadm.8.in
index 9c1497e..56d9bcc 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -415,6 +415,12 @@ This functionality is currently only provided by
 and
 .BR \-\-monitor .
 
+.TP
+.B \-\-home\-cluster=
+specifies the cluster name for the md device. The md device can be assembled
+only on the cluster which matches the name specified. If this option is not
+provided, mdadm tries to detect the cluster name automatically.
+
 .SH For create, build, or grow:
 
 .TP
diff --git a/mdadm.c b/mdadm.c
index 031383d..4f9bd9b 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -596,6 +596,13 @@ int main(int argc, char *argv[])
 				exit(2);
 			}
 			continue;
+		case O(CREATE, ClusterName):
+			c.homecluster = optarg;
+			if (strlen(c.homecluster) > 64) {
+				pr_err("Cluster name too big.\n");
+				exit(ERANGE);
+			}
+			continue;
 		case O(CREATE,'x'): /* number of spare (eXtra) disks */
 			if (s.sparedisks) {
 				pr_err("spare-devices set twice: %d and %s\n",
@@ -1276,6 +1283,18 @@ int main(int argc, char *argv[])
 		c.require_homehost = 0;
 	}
 
+	if (c.homecluster == NULL && (c.nodes > 0)) {
+		c.homecluster = conf_get_homecluster();
+		if (c.homecluster == NULL)
+			rv = get_cluster_name(&c.homecluster);
+		if (rv == 0) {
+			c.homehost = xstrdup(c.homecluster);
+			/* Add a : to differentiate between a host
+			 * and a cluster */
+			strcat(c.homehost, ":");
+		}
+	}
+
 	if (c.backup_file && data_offset != INVALID_SECTORS) {
 		pr_err("--backup-file and --data-offset are incompatible\n");
 		exit(2);
@@ -1407,6 +1426,12 @@ int main(int argc, char *argv[])
 				rv = 1;
 				break;
 			}
+			if (c.homecluster) {
+				pr_err("--home-cluster argument is incompatible with --bitmap=%s.\n",
+					s.bitmap_file);
+				rv = 1;
+				break;
+			}
 		}
 
 		if (s.write_behind && !s.bitmap_file) {
diff --git a/mdadm.h b/mdadm.h
index 9d55801..f56d9d6 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -345,6 +345,7 @@ enum special_options {
 	Restore,
 	Action,
 	Nodes,
+	ClusterName,
 };
 
 enum prefix_standard {
@@ -420,6 +421,7 @@ struct context {
 	int	invalid_backup;
 	char	*action;
 	int	nodes;
+	char	*homecluster;
 };
 
 struct shape {
@@ -1032,6 +1034,7 @@ struct supertype {
 	int devcnt;
 	int retry_soon;
 	int nodes;
+	char *cluster_name;
 
 	struct mdinfo *devs;
 
@@ -1308,6 +1311,7 @@ extern char *conf_get_mailaddr(void);
 extern char *conf_get_mailfrom(void);
 extern char *conf_get_program(void);
 extern char *conf_get_homehost(int *require_homehostp);
+extern char *conf_get_homecluster(void);
 extern char *conf_line(FILE *file);
 extern char *conf_word(FILE *file, int allow_key);
 extern void print_quoted(char *str);
@@ -1416,6 +1420,7 @@ extern char *stat2devnm(struct stat *st);
 extern char *fd2devnm(int fd);
 
 extern int in_initrd(void);
+extern int get_cluster_name(char **name);
 
 #define _ROUND_UP(val, base)	(((val) + (base) - 1) & ~(base - 1))
 #define ROUND_UP(val, base)	_ROUND_UP(val, (typeof(val))(base))
diff --git a/util.c b/util.c
index cc98d3b..ed9a745 100644
--- a/util.c
+++ b/util.c
@@ -34,6 +34,8 @@
 #include	<ctype.h>
 #include	<dirent.h>
 #include	<signal.h>
+#include	<dlfcn.h>
+#include	<corosync/cmap.h>
 
 /*
  * following taken from linux/blkpg.h because they aren't
@@ -1976,3 +1978,51 @@ void reopen_mddev(int mdfd)
 	if (fd >= 0 && fd != mdfd)
 		dup2(fd, mdfd);
 }
+
+int get_cluster_name(char **cluster_name)
+{
+        void *lib_handle = NULL;
+        int rv = -1;
+
+        cmap_handle_t handle;
+        static int (*initialize)(cmap_handle_t *handle);
+        static int (*get_string)(cmap_handle_t handle,
+                        const char *string,
+                        char **name);
+        static int (*finalize)(cmap_handle_t handle);
+
+
+        lib_handle = dlopen("libcmap.so.4", RTLD_NOW | RTLD_LOCAL);
+        if (!lib_handle)
+                return rv;
+
+        initialize = dlsym(lib_handle, "cmap_initialize");
+        if (!initialize)
+                goto out;
+
+        get_string = dlsym(lib_handle, "cmap_get_string");
+        if (!get_string)
+                goto out;
+
+        finalize = dlsym(lib_handle, "cmap_finalize");
+        if (!finalize)
+                goto out;
+
+        rv = initialize(&handle);
+        if (rv != CS_OK)
+                goto out;
+
+        rv = get_string(handle, "totem.cluster_name", cluster_name);
+        if (rv != CS_OK) {
+                free(*cluster_name);
+                rv = -1;
+                goto name_err;
+        }
+
+        rv = 0;
+name_err:
+        finalize(handle);
+out:
+        dlclose(lib_handle);
+        return rv;
+}
-- 
1.7.12.4



* [PATCH V2 04/11] Show all bitmaps while examining bitmap
  2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
                   ` (2 preceding siblings ...)
  2015-05-08  8:50 ` [PATCH V2 03/11] home-cluster while creating an array gqjiang
@ 2015-05-08  8:50 ` gqjiang
  2015-05-08  8:50 ` [PATCH V2 05/11] Add a new clustered disk gqjiang
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

This adds the capability of examining the bitmaps corresponding to all
nodes/slots on the device.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 bitmap.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 54 insertions(+), 18 deletions(-)

diff --git a/bitmap.c b/bitmap.c
index 920033a..bccc67c 100644
--- a/bitmap.c
+++ b/bitmap.c
@@ -260,7 +260,7 @@ int ExamineBitmap(char *filename, int brief, struct supertype *st)
 	int rv = 1;
 	char buf[64];
 	int swap;
-	int fd;
+	int fd, i;
 	__u32 uuid32[4];
 
 	fd = bitmap_file_open(filename, &st);
@@ -317,23 +317,59 @@ int ExamineBitmap(char *filename, int brief, struct supertype *st)
 		       uuid32[2],
 		       uuid32[3]);
 
-	printf("          Events : %llu\n", (unsigned long long)sb->events);
-	printf("  Events Cleared : %llu\n", (unsigned long long)sb->events_cleared);
-	printf("           State : %s\n", bitmap_state(sb->state));
-	printf("       Chunksize : %s\n", human_chunksize(sb->chunksize));
-	printf("          Daemon : %ds flush period\n", sb->daemon_sleep);
-	if (sb->write_behind)
-		sprintf(buf, "Allow write behind, max %d", sb->write_behind);
-	else
-		sprintf(buf, "Normal");
-	printf("      Write Mode : %s\n", buf);
-	printf("       Sync Size : %llu%s\n", (unsigned long long)sb->sync_size/2,
-					human_size(sb->sync_size * 512));
-	if (brief)
-		goto free_info;
-	printf("          Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",
-			info->total_bits, info->dirty_bits,
-			100.0 * info->dirty_bits / (info->total_bits?:1));
+	if (sb->nodes == 0) {
+		printf("          Events : %llu\n", (unsigned long long)sb->events);
+		printf("  Events Cleared : %llu\n", (unsigned long long)sb->events_cleared);
+		printf("           State : %s\n", bitmap_state(sb->state));
+		printf("       Chunksize : %s\n", human_chunksize(sb->chunksize));
+		printf("          Daemon : %ds flush period\n", sb->daemon_sleep);
+		if (sb->write_behind)
+			sprintf(buf, "Allow write behind, max %d", sb->write_behind);
+		else
+			sprintf(buf, "Normal");
+		printf("      Write Mode : %s\n", buf);
+		printf("       Sync Size : %llu%s\n", (unsigned long long)sb->sync_size/2,
+						human_size(sb->sync_size * 512));
+		if (brief)
+			goto free_info;
+		printf("          Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",
+				info->total_bits, info->dirty_bits,
+				100.0 * info->dirty_bits / (info->total_bits?:1));
+	} else {
+		printf("       Chunksize : %s\n", human_chunksize(sb->chunksize));
+		printf("          Daemon : %ds flush period\n", sb->daemon_sleep);
+		if (sb->write_behind)
+			sprintf(buf, "Allow write behind, max %d", sb->write_behind);
+		else
+			sprintf(buf, "Normal");
+		printf("      Write Mode : %s\n", buf);
+		printf("       Sync Size : %llu%s\n", (unsigned long long)sb->sync_size/2,
+						human_size(sb->sync_size * 512));
+		printf("   Cluster nodes : %d\n", sb->nodes);
+		printf("    Cluster name : %s\n", sb->cluster_name);
+		i = 0;
+		do {
+			if (i) {
+				free(info);
+				info = bitmap_fd_read(fd, brief);
+				sb = &info->sb;
+			}
+			if (sb->magic != BITMAP_MAGIC)
+				pr_err("invalid bitmap magic 0x%x, the bitmap file appears to be corrupted\n", sb->magic);
+
+			printf("       Node Slot : %d\n", i);
+			printf("          Events : %llu\n", (unsigned long long)sb->events);
+			printf("  Events Cleared : %llu\n", (unsigned long long)sb->events_cleared);
+			printf("           State : %s\n", bitmap_state(sb->state));
+			if (brief)
+				continue;
+			printf("          Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",
+					info->total_bits, info->dirty_bits,
+					100.0 * info->dirty_bits / (info->total_bits?:1));
+
+		} while (++i < (int)sb->nodes);
+	}
+
 free_info:
 	free(info);
 	return rv;
-- 
1.7.12.4



* [PATCH V2 05/11] Add a new clustered disk
  2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
                   ` (3 preceding siblings ...)
  2015-05-08  8:50 ` [PATCH V2 04/11] Show all bitmaps while examining bitmap gqjiang
@ 2015-05-08  8:50 ` gqjiang
  2015-05-08  8:50 ` [PATCH V2 06/11] Convert a bitmap=none device to clustered gqjiang
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

A clustered disk is added by the traditional --add sequence.
However, other nodes need to acknowledge that they can "see"
the device. This is done by --cluster-confirm:

--cluster-confirm SLOTNUM:/dev/whatever (if disk is found)
or
--cluster-confirm SLOTNUM:missing (if disk is not found)

The node initiating the --add has the disk state tagged with
MD_DISK_CLUSTER_ADD, and the node confirming tags the disk with
MD_DISK_CANDIDATE.
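
To make the SLOTNUM:DEVICE argument format concrete, here is a minimal
sketch of splitting such an argument. It is not the
parse_cluster_confirm_arg() added to util.c by this patch; the function
name and its error handling are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Split "SLOTNUM:DEVICE" (DEVICE may be "missing") into its two parts.
 * On success returns 0 and sets *slot and *dev (dev points into arg).
 * Returns -1 if the slot number or the device part is absent or invalid. */
static int split_confirm_arg(const char *arg, int *slot, const char **dev)
{
	char *end;
	long val;

	assert(arg && slot && dev);
	errno = 0;
	val = strtol(arg, &end, 10);
	if (end == arg || *end != ':')	/* no digits, or no ':' separator */
		return -1;
	if (errno != 0 || val < 0 || val > INT_MAX)
		return -1;
	if (end[1] == '\0')		/* nothing after the ':' */
		return -1;
	*slot = (int)val;
	*dev = end + 1;
	return 0;
}
```

For example, "2:/dev/sdb" would yield slot 2 and device "/dev/sdb", while a
bare "/dev/sdb" is rejected because it carries no slot number.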

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Manage.c   | 33 +++++++++++++++++++++++++++++----
 ReadMe.c   |  1 +
 md_p.h     |  7 +++++++
 md_u.h     |  1 +
 mdadm.8.in |  9 +++++++++
 mdadm.c    |  4 ++++
 mdadm.h    |  2 ++
 util.c     | 10 ++++++++++
 8 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/Manage.c b/Manage.c
index d3cfb55..4c3d451 100644
--- a/Manage.c
+++ b/Manage.c
@@ -690,7 +690,8 @@ skip_re_add:
 int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 	       struct supertype *tst, mdu_array_info_t *array,
 	       int force, int verbose, char *devname,
-	       char *update, unsigned long rdev, unsigned long long array_size)
+	       char *update, unsigned long rdev, unsigned long long array_size,
+	       int raid_slot)
 {
 	unsigned long long ldsize;
 	struct supertype *dev_st = NULL;
@@ -879,7 +880,10 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 	}
 	disc.major = major(rdev);
 	disc.minor = minor(rdev);
-	disc.number =j;
+	if (raid_slot < 0)
+		disc.number = j;
+	else
+		disc.number = raid_slot;
 	disc.state = 0;
 	if (array->not_persistent==0) {
 		int dfd;
@@ -920,6 +924,14 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 			}
 		free(used);
 	}
+
+	if (array->state & (1 << MD_SB_CLUSTERED)) {
+		if (dv->disposition == 'c')
+			disc.state |= (1 << MD_DISK_CANDIDATE);
+		else
+			disc.state |= (1 << MD_DISK_CLUSTER_ADD);
+	}
+
 	if (dv->writemostly == 1)
 		disc.state |= (1 << MD_DISK_WRITEMOSTLY);
 	if (tst->ss->external) {
@@ -1239,6 +1251,7 @@ int Manage_subdevs(char *devname, int fd,
 	 *        variant on 'A'
 	 *  'F' - Another variant of 'A', where the device was faulty
 	 *        so must be removed from the array first.
+	 *  'c' - confirm the device as found (for clustered environments)
 	 *
 	 * For 'f' and 'r', the device can also be a kernel-internal
 	 * name such as 'sdb'.
@@ -1254,6 +1267,7 @@ int Manage_subdevs(char *devname, int fd,
 	struct mdinfo info;
 	int frozen = 0;
 	int busy = 0;
+	int raid_slot = -1;
 
 	if (ioctl(fd, GET_ARRAY_INFO, &array)) {
 		pr_err("Cannot get array info for %s\n",
@@ -1282,6 +1296,11 @@ int Manage_subdevs(char *devname, int fd,
 		int rv;
 		int mj,mn;
 
+		raid_slot = -1;
+		if (dv->disposition == 'c')
+			parse_cluster_confirm_arg(dv->devname, &dv->devname,
+					&raid_slot);
+
 		if (strcmp(dv->devname, "failed") == 0 ||
 		    strcmp(dv->devname, "faulty") == 0) {
 			if (dv->disposition != 'A'
@@ -1307,6 +1326,11 @@ int Manage_subdevs(char *devname, int fd,
 		if (strcmp(dv->devname, "missing") == 0) {
 			struct mddev_dev *add_devlist = NULL;
 			struct mddev_dev **dp;
+			if (dv->disposition == 'c') {
+				rv = ioctl(fd, CLUSTERED_DISK_NACK, NULL);
+				break;
+			}
+
 			if (dv->disposition != 'A') {
 				pr_err("'missing' only meaningful with --re-add\n");
 				goto abort;
@@ -1399,7 +1423,7 @@ int Manage_subdevs(char *devname, int fd,
 			else {
 				int open_err = errno;
 				if (stat(dv->devname, &stb) != 0) {
-					pr_err("Cannot find %s: %s\n",
+					pr_err("%s: %d Cannot find %s: %s\n", __func__, __LINE__,
 					       dv->devname, strerror(errno));
 					goto abort;
 				}
@@ -1437,6 +1461,7 @@ int Manage_subdevs(char *devname, int fd,
 		case 'A':
 		case 'M': /* --re-add missing */
 		case 'F': /* --re-add faulty  */
+		case 'c': /* --cluster-confirm */
 			/* add the device */
 			if (subarray) {
 				pr_err("Cannot add disks to a \'member\' array, perform this operation on the parent container\n");
@@ -1470,7 +1495,7 @@ int Manage_subdevs(char *devname, int fd,
 			}
 			rv = Manage_add(fd, tfd, dv, tst, &array,
 					force, verbose, devname, update,
-					rdev, array_size);
+					rdev, array_size, raid_slot);
 			close(tfd);
 			tfd = -1;
 			if (rv < 0)
diff --git a/ReadMe.c b/ReadMe.c
index c6286ae..c854cd5 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -169,6 +169,7 @@ struct option long_options[] = {
     {"wait",	  0, 0,  WaitOpt},
     {"wait-clean", 0, 0, Waitclean },
     {"action",    1, 0, Action },
+    {"cluster-confirm", 0, 0, ClusterConfirm},
 
     /* For Detail/Examine */
     {"brief",	  0, 0, Brief},
diff --git a/md_p.h b/md_p.h
index c4846ba..e59504f 100644
--- a/md_p.h
+++ b/md_p.h
@@ -78,6 +78,12 @@
 #define MD_DISK_ACTIVE		1 /* disk is running but may not be in sync */
 #define MD_DISK_SYNC		2 /* disk is in sync with the raid set */
 #define MD_DISK_REMOVED		3 /* disk is in sync with the raid set */
+#define MD_DISK_CLUSTER_ADD     4 /* Initiate a disk add across the cluster
+				   * For clustered environments only.
+				   */
+#define MD_DISK_CANDIDATE	5 /* disk is added as spare (local) until confirmed
+				   * For clustered environments only.
+				   */
 
 #define	MD_DISK_WRITEMOSTLY	9 /* disk is "write-mostly" is RAID1 config.
 				   * read requests will only be sent here in
@@ -106,6 +112,7 @@ typedef struct mdp_device_descriptor_s {
 #define MD_SB_BLOCK_CONTAINER_RESHAPE 3 /* block container wide reshapes */
 #define MD_SB_BLOCK_VOLUME	4 /* block activation of array, other arrays
 				   * in container can be activated */
+#define MD_SB_CLUSTERED		5 /* MD is clustered  */
 #define	MD_SB_BITMAP_PRESENT	8 /* bitmap may be present nearby */
 
 typedef struct mdp_superblock_s {
diff --git a/md_u.h b/md_u.h
index be9868a..76068d6 100644
--- a/md_u.h
+++ b/md_u.h
@@ -44,6 +44,7 @@
 #define STOP_ARRAY		_IO (MD_MAJOR, 0x32)
 #define STOP_ARRAY_RO		_IO (MD_MAJOR, 0x33)
 #define RESTART_ARRAY_RW	_IO (MD_MAJOR, 0x34)
+#define CLUSTERED_DISK_NACK	_IO (MD_MAJOR, 0x35)
 
 typedef struct mdu_version_s {
 	int major;
diff --git a/mdadm.8.in b/mdadm.8.in
index 56d9bcc..57873ec 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -1406,6 +1406,15 @@ will avoid reading from these devices if possible.
 .BR \-\-readwrite
 Subsequent devices that are added or re\-added will have the 'write-mostly'
 flag cleared.
+.TP
+.BR \-\-cluster\-confirm
+Confirm the existence of the device. This is issued in response to an \-\-add
+request by a node in a cluster. When a node adds a device it sends a message
+to all nodes in the cluster to look for a device with a UUID. This translates
+to a udev notification with the UUID of the device to be added and the slot
+number. The receiving node must acknowledge this message
+with \-\-cluster\-confirm. Valid arguments are <slot>:<devicename> in case
+the device is found or <slot>:missing in case the device is not found.
 
 .P
 Each of these options requires that the first device listed is the array
diff --git a/mdadm.c b/mdadm.c
index 4f9bd9b..5ccbecc 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -196,6 +196,7 @@ int main(int argc, char *argv[])
 		case 'f':
 		case Fail:
 		case ReAdd: /* re-add */
+		case ClusterConfirm:
 			if (!mode) {
 				newmode = MANAGE;
 				shortopt = short_bitmap_options;
@@ -933,6 +934,9 @@ int main(int argc, char *argv[])
 					   * remove the device */
 			devmode = 'f';
 			continue;
+		case O(MANAGE, ClusterConfirm):
+			devmode = 'c';
+			continue;
 		case O(MANAGE,Replace):
 			/* Mark these devices for replacement */
 			devmode = 'R';
diff --git a/mdadm.h b/mdadm.h
index f56d9d6..00c726e 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -346,6 +346,7 @@ enum special_options {
 	Action,
 	Nodes,
 	ClusterName,
+	ClusterConfirm,
 };
 
 enum prefix_standard {
@@ -1281,6 +1282,7 @@ extern int parse_uuid(char *str, int uuid[4]);
 extern int parse_layout_10(char *layout);
 extern int parse_layout_faulty(char *layout);
 extern long parse_num(char *num);
+extern int parse_cluster_confirm_arg(char *inp, char **devname, int *slot);
 extern int check_ext2(int fd, char *name);
 extern int check_reiser(int fd, char *name);
 extern int check_raid(int fd, char *name);
diff --git a/util.c b/util.c
index ed9a745..8d27564 100644
--- a/util.c
+++ b/util.c
@@ -273,6 +273,16 @@ long parse_num(char *num)
 }
 #endif
 
+int parse_cluster_confirm_arg(char *input, char **devname, int *slot)
+{
+	char *dev;
+	*slot = strtoul(input, &dev, 10);
+	if (dev == input || dev[0] != ':')
+		return -1;
+	*devname = dev+1;
+	return 0;
+}
+
 void remove_partitions(int fd)
 {
 	/* remove partitions from this block devices.
-- 
1.7.12.4
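
The man page hunk above describes the --cluster-confirm argument as
<slot>:<devicename> or <slot>:missing. A minimal sketch of a stricter
parser for that format follows. The name split_slot_arg is hypothetical
and is not part of the patch; unlike parse_cluster_confirm_arg it writes
its outputs only on success and rejects an empty device name.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the patch): split "<slot>:<name>"
 * into a slot number and a pointer to the name inside `input`.
 * Returns 0 on success, -1 on malformed input. */
static int split_slot_arg(const char *input, const char **name, int *slot)
{
	char *end;
	unsigned long val;

	assert(input && name && slot);

	/* strtoul() would silently accept a leading '-' or whitespace */
	if (input[0] < '0' || input[0] > '9')
		return -1;

	errno = 0;
	val = strtoul(input, &end, 10);
	if (errno == ERANGE || val > INT_MAX)
		return -1;
	if (end[0] != ':' || end[1] == '\0')
		return -1;	/* no separator, or empty name */

	*slot = (int)val;
	*name = end + 1;
	return 0;
}
```

A caller would pass the raw argument, for example "2:/dev/sdc", and
check the return value before using the slot or the name.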



* [PATCH V2 06/11] Convert a bitmap=none device to clustered
  2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
                   ` (4 preceding siblings ...)
  2015-05-08  8:50 ` [PATCH V2 05/11] Add a new clustered disk gqjiang
@ 2015-05-08  8:50 ` gqjiang
  2015-05-08  8:50 ` [PATCH V2 07/11] Skip clustered devices in incremental gqjiang
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

This adds the ability to convert a regular md without bitmap
(--bitmap=none) to a clustered device (--bitmap=clustered).

To convert a device with --bitmap=internal or --bitmap=external,
you have to convert to --bitmap=none and then re-execute the
command with --bitmap=clustered.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Grow.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/Grow.c b/Grow.c
index 9a573fd..1122cec 100644
--- a/Grow.c
+++ b/Grow.c
@@ -330,9 +330,16 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
 			}
 			return 0;
 		}
-		pr_err("Internal bitmap already present on %s\n",
-			devname);
-		return 1;
+		if ((strcmp(s->bitmap_file, "clustered")==0) && (array.state & (1<<MD_SB_CLUSTERED))) {
+			pr_err("Clustered bitmap already present on %s\n",
+					devname);
+			return 1;
+		}
+		if ((strcmp(s->bitmap_file, "internal")==0) && (!(array.state & (1<<MD_SB_CLUSTERED)))) {
+			pr_err("Internal bitmap already present on %s\n",
+					devname);
+			return 1;
+		}
 	}
 
 	if (strcmp(s->bitmap_file, "none") == 0) {
@@ -375,7 +382,8 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
 		free(st);
 		return 1;
 	}
-	if (strcmp(s->bitmap_file, "internal") == 0) {
+	if ((strcmp(s->bitmap_file, "internal") == 0) ||
+		(strcmp(s->bitmap_file, "clustered") == 0)) {
 		int rv;
 		int d;
 		int offset_setable = 0;
@@ -384,6 +392,8 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
 			pr_err("Internal bitmaps not supported with %s metadata\n", st->ss->name);
 			return 1;
 		}
+		st->nodes = c->nodes;
+		st->cluster_name = c->homecluster;
 		mdi = sysfs_read(fd, NULL, GET_BITMAP_LOCATION);
 		if (mdi)
 			offset_setable = 1;
@@ -426,6 +436,8 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
 			rv = sysfs_set_num_signed(mdi, NULL, "bitmap/location",
 						  mdi->bitmap_offset);
 		} else {
+			if (strcmp(s->bitmap_file, "clustered") == 0)
+				array.state |= (1<<MD_SB_CLUSTERED);
 			array.state |= (1<<MD_SB_BITMAP_PRESENT);
 			rv = ioctl(fd, SET_ARRAY_INFO, &array);
 		}
-- 
1.7.12.4



* [PATCH V2 07/11] Skip clustered devices in incremental
  2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
                   ` (5 preceding siblings ...)
  2015-05-08  8:50 ` [PATCH V2 06/11] Convert a bitmap=none device to clustered gqjiang
@ 2015-05-08  8:50 ` gqjiang
  2015-05-08  8:50 ` [PATCH V2 08/11] mdadm: add the ability to change cluster name gqjiang
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

We want clustered devices to be started exclusively by a cluster
resource agent, so do not start them through incremental assembly.

This also keeps a clustered md from starting during boot in inactive mode.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Incremental.c | 5 +++++
 super1.c      | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/Incremental.c b/Incremental.c
index 0c9a9a4..5450a5c 100644
--- a/Incremental.c
+++ b/Incremental.c
@@ -232,6 +232,11 @@ int Incremental(struct mddev_dev *devlist, struct context *c,
 				devname);
 		goto out;
 	}
+	/* Skip the clustered ones. This should be started by
+	 * clustering resource agents
+	 */
+	if (info.array.state & (1 << MD_SB_CLUSTERED))
+		goto out;
 
 	/* 3a/ if not, check for homehost match.  If no match, continue
 	 * but don't trust the 'name' in the array. Thus a 'random' minor
diff --git a/super1.c b/super1.c
index 57b1526..246d34e 100644
--- a/super1.c
+++ b/super1.c
@@ -891,6 +891,8 @@ static void getinfo_super1(struct supertype *st, struct mdinfo *info, char *map)
 	info->array.state =
 		(__le64_to_cpu(sb->resync_offset) == MaxSector)
 		? 1 : 0;
+	if (__le32_to_cpu(bsb->nodes) > 1)
+		info->array.state |= (1 << MD_SB_CLUSTERED);
 
 	info->data_offset = __le64_to_cpu(sb->data_offset);
 	info->component_size = __le64_to_cpu(sb->size);
-- 
1.7.12.4



* [PATCH V2 08/11] mdadm: add the ability to change cluster name
  2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
                   ` (6 preceding siblings ...)
  2015-05-08  8:50 ` [PATCH V2 07/11] Skip clustered devices in incremental gqjiang
@ 2015-05-08  8:50 ` gqjiang
  2015-05-08  8:50 ` [PATCH V2 09/11] mdadm: change the num of cluster node gqjiang
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

To support changing the cluster name, this commit does the following:

1. extends the original write_bitmap function for the new scenario.
2. adds a case in write_bitmap1 to handle modification of the
   cluster's name.
3. makes update_super1 able to change the name in mdp_superblock_1.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Assemble.c |  5 +++++
 Grow.c     |  2 +-
 mdadm.c    |  3 +++
 mdadm.h    |  7 ++++++-
 super0.c   |  4 ++--
 super1.c   | 43 ++++++++++++++++++++++++++++++++++++++++---
 6 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index 25a103d..e1b846c 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -644,6 +644,11 @@ static int load_devices(struct devs *devices, char *devmap,
 				*stp = st;
 				return -1;
 			}
+			if (strcmp(c->update, "home-cluster") == 0) {
+				err = tst->ss->update_super(tst, content, c->update,
+							    devname, 0, 0, c->homecluster);
+				tst->ss->write_bitmap(tst, dfd, NameUpdate);
+			}
 			if (strcmp(c->update, "uuid")==0 &&
 			    !ident->uuid_set) {
 				ident->uuid_set = 1;
diff --git a/Grow.c b/Grow.c
index 1122cec..bf44e66 100644
--- a/Grow.c
+++ b/Grow.c
@@ -420,7 +420,7 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
 						    bitmapsize, offset_setable,
 						    major)
 						)
-						st->ss->write_bitmap(st, fd2);
+						st->ss->write_bitmap(st, fd2, NoUpdate);
 					else {
 						pr_err("failed to create internal bitmap - chunksize problem.\n");
 						close(fd2);
diff --git a/mdadm.c b/mdadm.c
index 5ccbecc..f597960 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -598,6 +598,7 @@ int main(int argc, char *argv[])
 			}
 			continue;
 		case O(CREATE, ClusterName):
+		case O(ASSEMBLE, ClusterName):
 			c.homecluster = optarg;
 			if (strlen(c.homecluster) > 64) {
 				pr_err("Cluster name too big.\n");
@@ -741,6 +742,8 @@ int main(int argc, char *argv[])
 				continue;
 			if (strcmp(c.update, "homehost")==0)
 				continue;
+			if (strcmp(c.update, "home-cluster")==0)
+				continue;
 			if (strcmp(c.update, "devicesize")==0)
 				continue;
 			if (strcmp(c.update, "no-bitmap")==0)
diff --git a/mdadm.h b/mdadm.h
index 00c726e..d8b0749 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -354,6 +354,11 @@ enum prefix_standard {
 	IEC
 };
 
+enum bitmap_update {
+    NoUpdate,
+    NameUpdate,
+};
+
 /* structures read from config file */
 /* List of mddevice names and identifiers
  * Identifiers can be:
@@ -850,7 +855,7 @@ extern struct superswitch {
 	/* if add_internal_bitmap succeeded for existing array, this
 	 * writes it out.
 	 */
-	int (*write_bitmap)(struct supertype *st, int fd);
+	int (*write_bitmap)(struct supertype *st, int fd, enum bitmap_update update);
 	/* Free the superblock and any other allocated data */
 	void (*free_super)(struct supertype *st);
 
diff --git a/super0.c b/super0.c
index deb5999..6ad9d39 100644
--- a/super0.c
+++ b/super0.c
@@ -900,7 +900,7 @@ static int write_init_super0(struct supertype *st)
 		rv = store_super0(st, di->fd);
 
 		if (rv == 0 && (sb->state & (1<<MD_SB_BITMAP_PRESENT)))
-			rv = st->ss->write_bitmap(st, di->fd);
+			rv = st->ss->write_bitmap(st, di->fd, NoUpdate);
 
 		if (rv)
 			pr_err("failed to write superblock to %s\n",
@@ -1175,7 +1175,7 @@ static void locate_bitmap0(struct supertype *st, int fd)
 	lseek64(fd, offset, 0);
 }
 
-static int write_bitmap0(struct supertype *st, int fd)
+static int write_bitmap0(struct supertype *st, int fd, enum bitmap_update update)
 {
 	unsigned long long dsize;
 	unsigned long long offset;
diff --git a/super1.c b/super1.c
index 246d34e..f3a3528 100644
--- a/super1.c
+++ b/super1.c
@@ -1073,7 +1073,23 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
 		info->name[32] = 0;
 	}
 
-	if (strcmp(update, "force-one")==0) {
+	if (strcmp(update, "home-cluster") == 0 &&
+	    homehost) {
+		/* Note that 'home-cluster' is to change the name of cluster,
+		 * it is another "name" update.
+		 */
+		char *new_name = xmalloc(sizeof(sb->set_name));
+		if (strrchr(sb->set_name, ':')) {
+			strcpy(new_name, strchr(sb->set_name, ':'));
+		}
+
+		memset(sb->set_name, 0, sizeof(sb->set_name));
+		strcpy(sb->set_name, homehost);
+		if (new_name)
+			strcat(sb->set_name, new_name);
+
+		free(new_name);
+	} else if (strcmp(update, "force-one")==0) {
 		/* Not enough devices for a working array,
 		 * so bring this one up-to-date
 		 */
@@ -1691,7 +1707,7 @@ static int write_init_super1(struct supertype *st)
 		sb->sb_csum = calc_sb_1_csum(sb);
 		rv = store_super1(st, di->fd);
 		if (rv == 0 && (__le32_to_cpu(sb->feature_map) & 1))
-			rv = st->ss->write_bitmap(st, di->fd);
+			rv = st->ss->write_bitmap(st, di->fd, NoUpdate);
 		close(di->fd);
 		di->fd = -1;
 		if (rv)
@@ -2175,7 +2191,7 @@ static void locate_bitmap1(struct supertype *st, int fd)
 	lseek64(fd, offset<<9, 0);
 }
 
-static int write_bitmap1(struct supertype *st, int fd)
+static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update)
 {
 	struct mdp_superblock_1 *sb = st->sb;
 	bitmap_super_t *bms = (bitmap_super_t*)(((char*)sb)+MAX_SB_SIZE);
@@ -2184,6 +2200,27 @@ static int write_bitmap1(struct supertype *st, int fd)
 	int towrite, n;
 	struct align_fd afd;
 	unsigned int i;
+	char *new_name;
+
+	switch (update) {
+	case NameUpdate:
+	    new_name = xmalloc(sizeof(sb->set_name));
+
+	    strncpy(new_name, sb->set_name, sizeof(sb->set_name));
+	    memset((char *)bms->cluster_name, 0, sizeof(bms->cluster_name));
+
+	    if (strtok(new_name, ":"))
+		strncpy((char *)bms->cluster_name, new_name, strlen(sb->set_name));
+	    else
+		/* In case the original set_name doesn't look like aaa:md* */
+		strncpy((char *)bms->cluster_name, sb->set_name, strlen(sb->set_name));
+
+	    free(new_name);
+	    break;
+	case NoUpdate:
+	default:
+	    break;
+	}
 
 	init_afd(&afd, fd);
 
-- 
1.7.12.4



* [PATCH V2 09/11] mdadm: change the num of cluster node
  2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
                   ` (7 preceding siblings ...)
  2015-05-08  8:50 ` [PATCH V2 08/11] mdadm: add the ability to change cluster name gqjiang
@ 2015-05-08  8:50 ` gqjiang
  2015-05-08  8:50 ` [PATCH V2 10/11] Reuse calc_bitmap_size to reduce code size gqjiang
  2015-05-08  8:50 ` [PATCH V2 11/11] Reuse the write_bitmap for update uuid gqjiang
  10 siblings, 0 replies; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

This extends the nodes option to assemble mode, so that the number of
cluster nodes can be changed by the user.

Before changing it, it is necessary to ensure there is enough space
for those nodes, so calc_bitmap_size is introduced to calculate
the bitmap size of each node.
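
The space check can be worked through in isolation. The sketch below
restates the arithmetic with plain integers; the function names and the
BITMAP_SUPER_SIZE macro are hypothetical, and 256 is the padded size of
bitmap_super_t. For example, with a sync_size of 2^31 sectors, a 4 MiB
chunk and 4 KiB alignment, each node needs 36864 bytes; assuming a
super_offset of 8 and a data_offset of 2048 sectors, at most 28 nodes
fit.

```c
#include <assert.h>
#include <stdint.h>

#define BITMAP_SUPER_SIZE 256ULL	/* bitmap_super_t is padded to 256 bytes */

/* Bytes needed for one node's bitmap (superblock + bits), rounded up
 * to `boundary`. sync_size is in 512-byte sectors, chunksize in bytes. */
static uint64_t bitmap_bytes(uint64_t sync_size, uint32_t chunksize,
			     uint64_t boundary)
{
	uint64_t bits, bytes;

	assert(chunksize >= 512 && boundary > 0);

	bits = sync_size / (chunksize >> 9);	/* one bit per chunk */
	bytes = (bits + 7) >> 3;
	bytes += BITMAP_SUPER_SIZE;
	return (bytes + boundary - 1) / boundary * boundary;
}

/* Largest node count that fits between the superblock and the data,
 * keeping 4 KiB for the md superblock itself. Offsets are in sectors. */
static uint64_t max_nodes(uint64_t data_offset, uint64_t super_offset,
			  uint64_t per_node_bytes)
{
	uint64_t space;

	assert(data_offset >= super_offset && per_node_bytes > 0);

	space = 512 * (data_offset - super_offset);
	if (space < 4096)
		return 0;
	return (space - 4096) / per_node_bytes;
}
```

A requested node count larger than the value returned by max_nodes is
the case the patch rejects with -ENOMEM.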

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Assemble.c |  4 ++++
 ReadMe.c   |  2 +-
 mdadm.c    |  3 +++
 mdadm.h    |  1 +
 super1.c   | 37 +++++++++++++++++++++++++++++++++++++
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/Assemble.c b/Assemble.c
index e1b846c..9ff546b 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -626,6 +626,10 @@ static int load_devices(struct devs *devices, char *devmap,
 
 			if (strcmp(c->update, "byteorder") == 0)
 				err = 0;
+			else if (strcmp(c->update, "nodes") == 0) {
+				tst->nodes = c->nodes;
+				err = tst->ss->write_bitmap(tst, dfd, NodeNumUpdate);
+			}
 			else
 				err = tst->ss->update_super(tst, content, c->update,
 							    devname, c->verbose,
diff --git a/ReadMe.c b/ReadMe.c
index c854cd5..d1830e1 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -140,7 +140,7 @@ struct option long_options[] = {
     {"homehost",  1, 0,  HomeHost},
     {"symlinks",  1, 0,  Symlinks},
     {"data-offset",1, 0, DataOffset},
-    {"nodes",1, 0, Nodes},
+    {"nodes",1, 0, Nodes}, /* also for --assemble */
     {"home-cluster",1, 0, ClusterName},
 
     /* For assemble */
diff --git a/mdadm.c b/mdadm.c
index f597960..a423592 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -589,6 +589,7 @@ int main(int argc, char *argv[])
 			}
 			ident.raid_disks = s.raiddisks;
 			continue;
+		case O(ASSEMBLE, Nodes):
 		case O(CREATE, Nodes):
 			c.nodes = parse_num(optarg);
 			if (c.nodes <= 0) {
@@ -744,6 +745,8 @@ int main(int argc, char *argv[])
 				continue;
 			if (strcmp(c.update, "home-cluster")==0)
 				continue;
+			if (strcmp(c.update, "nodes")==0)
+				continue;
 			if (strcmp(c.update, "devicesize")==0)
 				continue;
 			if (strcmp(c.update, "no-bitmap")==0)
diff --git a/mdadm.h b/mdadm.h
index d8b0749..97892e6 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -357,6 +357,7 @@ enum prefix_standard {
 enum bitmap_update {
     NoUpdate,
     NameUpdate,
+    NodeNumUpdate,
 };
 
 /* structures read from config file */
diff --git a/super1.c b/super1.c
index f3a3528..50675be 100644
--- a/super1.c
+++ b/super1.c
@@ -134,6 +134,20 @@ struct misc_dev_info {
 					|MD_FEATURE_NEW_OFFSET		\
 					)
 
+/* return how many bytes are needed for bitmap, for cluster-md each node
+ * should have its own bitmap */
+static unsigned int calc_bitmap_size(bitmap_super_t *bms, unsigned int boundary)
+{
+	unsigned long long bits, bytes;
+
+	bits = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
+	bytes = (bits+7) >> 3;
+	bytes += sizeof(bitmap_super_t);
+	bytes = ROUND_UP(bytes, boundary);
+
+	return bytes;
+}
+
 static unsigned int calc_sb_1_csum(struct mdp_superblock_1 * sb)
 {
 	unsigned int disk_csum, csum;
@@ -2201,6 +2215,7 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
 	struct align_fd afd;
 	unsigned int i;
 	char *new_name;
+	unsigned long long total_bm_space, bm_space_per_node;
 
 	switch (update) {
 	case NameUpdate:
@@ -2217,6 +2232,28 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
 
 	    free(new_name);
 	    break;
+	case NodeNumUpdate:
+	    /* cluster md only supports superblock 1.2 now */
+	    if (st->minor_version != 2) {
+		pr_err("Warning: cluster md only works with superblock 1.2\n");
+		return -EINVAL;
+	    }
+
+	    /* Each node has an independent bitmap, so check whether there is
+	     * enough space: first get how many bytes each node's bitmap needs */
+	    bm_space_per_node = calc_bitmap_size(bms, 4096);
+
+	    total_bm_space = 512 * (__le64_to_cpu(sb->data_offset) - __le64_to_cpu(sb->super_offset));
+	    total_bm_space = total_bm_space - 4096; /* leave another 4k for superblock */
+
+	    if (bm_space_per_node * st->nodes > total_bm_space) {
+		pr_err("Warning: The max num of nodes can't exceed %llu\n",
+			total_bm_space / bm_space_per_node);
+		return -ENOMEM;
+	    }
+
+	    bms->nodes = __cpu_to_le32(st->nodes);
+	    break;
 	case NoUpdate:
 	default:
 	    break;
-- 
1.7.12.4



* [PATCH V2 10/11] Reuse calc_bitmap_size to reduce code size
  2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
                   ` (8 preceding siblings ...)
  2015-05-08  8:50 ` [PATCH V2 09/11] mdadm: change the num of cluster node gqjiang
@ 2015-05-08  8:50 ` gqjiang
  2015-05-08  8:50 ` [PATCH V2 11/11] Reuse the write_bitmap for update uuid gqjiang
  10 siblings, 0 replies; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

We can use the newly added calc_bitmap_size function to
remove some redundant lines.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 super1.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/super1.c b/super1.c
index 50675be..81e8fca 100644
--- a/super1.c
+++ b/super1.c
@@ -695,12 +695,8 @@ static int copy_metadata1(struct supertype *st, int from, int to)
 				/* have the header, can calculate
 				 * correct bitmap bytes */
 				bitmap_super_t *bms;
-				int bits;
 				bms = (void*)buf;
-				bits = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
-				bytes = (bits+7) >> 3;
-				bytes += sizeof(bitmap_super_t);
-				bytes = ROUND_UP(bytes, 512);
+				bytes = calc_bitmap_size(bms, 512);
 				if (n > bytes)
 					n =  bytes;
 			}
@@ -2280,11 +2276,7 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
 			memset(buf, 0xff, 4096);
 		memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
 
-		towrite = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
-		towrite = (towrite+7) >> 3; /* bits to bytes */
-		towrite += sizeof(bitmap_super_t);
-		/* we need the bitmaps to be at 4k boundary */
-		towrite = ROUND_UP(towrite, 4096);
+		towrite = calc_bitmap_size(bms, 4096);
 		while (towrite > 0) {
 			n = towrite;
 			if (n > 4096)
-- 
1.7.12.4



* [PATCH V2 11/11] Reuse the write_bitmap for update uuid
  2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
                   ` (9 preceding siblings ...)
  2015-05-08  8:50 ` [PATCH V2 10/11] Reuse calc_bitmap_size to reduce code size gqjiang
@ 2015-05-08  8:50 ` gqjiang
  10 siblings, 0 replies; 14+ messages in thread
From: gqjiang @ 2015-05-08  8:50 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

From: Guoqing Jiang <gqjiang@suse.com>

Since write_bitmap has been extended to handle different situations,
it can also support changing the uuid of the bitmap, so
bitmap_update_uuid is removed accordingly.

Q: is the write_bitmap0 also impacted?
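
One point that may be worth checking for the uuid case: a UUID is 16
bytes of binary data and may contain zero bytes, so a string copy can
truncate it. A minimal sketch of a byte-wise copy follows; the helper
name and the UUID_LEN macro are hypothetical and this is not the code
in the patch.

```c
#include <assert.h>
#include <string.h>

#define UUID_LEN 16	/* size of the uuid field in bitmap_super_t */

/* Hypothetical helper: copy a binary UUID. memcpy() copies all 16
 * bytes; strncpy() would stop at the first zero byte and pad the
 * remainder with zeros. */
static void set_bitmap_uuid(unsigned char dst[UUID_LEN],
			    const unsigned char src[UUID_LEN])
{
	assert(dst && src);
	memcpy(dst, src, UUID_LEN);
}
```

With memcpy no prior memset of the destination is needed, since every
byte of the field is overwritten.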

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Assemble.c |  5 ++---
 bitmap.c   | 20 --------------------
 mdadm.h    |  2 +-
 super1.c   |  4 ++++
 4 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index 9ff546b..fabca5c 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -664,9 +664,8 @@ static int load_devices(struct devs *devices, char *devmap,
 
 			if (strcmp(c->update, "uuid")==0 &&
 			    ident->bitmap_fd >= 0 && !bitmap_done) {
-				if (bitmap_update_uuid(ident->bitmap_fd,
-						       content->uuid,
-						       tst->ss->swapuuid) != 0)
+				copy_uuid(tst->devs->uuid, content->uuid, tst->ss->swapuuid);
+				if (tst->ss->write_bitmap(tst, dfd, UUIDUpdate))
 					pr_err("Could not update uuid on external bitmap.\n");
 				else
 					bitmap_done = 1;
diff --git a/bitmap.c b/bitmap.c
index bccc67c..7df296e 100644
--- a/bitmap.c
+++ b/bitmap.c
@@ -462,23 +462,3 @@ out:
 		unlink(filename); /* possibly corrupted, better get rid of it */
 	return rv;
 }
-
-int bitmap_update_uuid(int fd, int *uuid, int swap)
-{
-	struct bitmap_super_s bm;
-	if (lseek(fd, 0, 0) != 0)
-		return 1;
-	if (read(fd, &bm, sizeof(bm)) != sizeof(bm))
-		return 1;
-	if (bm.magic != __cpu_to_le32(BITMAP_MAGIC))
-		return 1;
-	copy_uuid(bm.uuid, uuid, swap);
-	if (lseek(fd, 0, 0) != 0)
-		return 2;
-	if (write(fd, &bm, sizeof(bm)) != sizeof(bm)) {
-		lseek(fd, 0, 0);
-		return 2;
-	}
-	lseek(fd, 0, 0);
-	return 0;
-}
diff --git a/mdadm.h b/mdadm.h
index 97892e6..7b9bb28 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -358,6 +358,7 @@ enum bitmap_update {
     NoUpdate,
     NameUpdate,
     NodeNumUpdate,
+    UUIDUpdate,
 };
 
 /* structures read from config file */
@@ -1273,7 +1274,6 @@ extern int CreateBitmap(char *filename, int force, char uuid[16],
 			int major);
 extern int ExamineBitmap(char *filename, int brief, struct supertype *st);
 extern int Write_rules(char *rule_name);
-extern int bitmap_update_uuid(int fd, int *uuid, int swap);
 extern unsigned long bitmap_sectors(struct bitmap_super_s *bsb);
 extern int Dump_metadata(char *dev, char *dir, struct context *c,
 			 struct supertype *st);
diff --git a/super1.c b/super1.c
index 81e8fca..b2e960d 100644
--- a/super1.c
+++ b/super1.c
@@ -2250,6 +2250,10 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
 
 	    bms->nodes = __cpu_to_le32(st->nodes);
 	    break;
+	case UUIDUpdate:
+	    memset((char *)bms->uuid, 0, sizeof(bms->uuid));
+	    strncpy((char *)bms->uuid, (char *)st->devs->uuid, sizeof(bms->uuid));
+	    break;
 	case NoUpdate:
 	default:
 	    break;
-- 
1.7.12.4



* Re: [PATCH V2 01/11] Create n bitmaps for clustered mode
  2015-05-08  8:50 ` [PATCH V2 01/11] Create n bitmaps for clustered mode gqjiang
@ 2015-05-15  5:45   ` NeilBrown
  2015-05-18  2:21     ` Guoqing Jiang
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2015-05-15  5:45 UTC (permalink / raw)
  To: gqjiang; +Cc: linux-raid, rgoldwyn


On Fri,  8 May 2015 16:50:19 +0800 gqjiang@suse.com wrote:

> From: Guoqing Jiang <gqjiang@suse.com>
> 
> For a clustered MD, create bitmaps equal to number of nodes so
> each node has an independent bitmap.
> 
> Only the first bitmap has the bits set so that the first node
> that assembles the device also performs the sync.
> 
> The bitmaps are aligned to 4k boundaries.
> 
> On-disk format:
> 
> 0                    4k                     8k                    12k
> -------------------------------------------------------------------
> | idle                | md super            | bm super [0] + bits |
> | bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
> | bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
> | bm bits [3, contd]  |                     |                     |
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  Create.c   |  3 ++-
>  bitmap.c   |  2 ++
>  bitmap.h   |  7 +++++--
>  mdadm.8.in |  7 ++++++-
>  mdadm.c    |  9 +++++++++
>  super1.c   | 59 +++++++++++++++++++++++++++++++++++++++++------------------
>  6 files changed, 65 insertions(+), 22 deletions(-)
> 
> diff --git a/Create.c b/Create.c
> index ef28da0..69f5432 100644
> --- a/Create.c
> +++ b/Create.c
> @@ -750,7 +750,8 @@ int Create(struct supertype *st, char *mddev,
>  #endif
>  	}
>  
> -	if (s->bitmap_file && strcmp(s->bitmap_file, "internal")==0) {
> +	if (s->bitmap_file && (strcmp(s->bitmap_file, "internal")==0
> +			|| strcmp(s->bitmap_file, "clustered")==0)) {
>  		if ((vers%100) < 2) {
>  			pr_err("internal bitmaps not supported by this kernel.\n");
>  			goto abort_locked;
> diff --git a/bitmap.c b/bitmap.c
> index b1d54a6..920033a 100644
> --- a/bitmap.c
> +++ b/bitmap.c
> @@ -32,6 +32,8 @@ inline void sb_le_to_cpu(bitmap_super_t *sb)
>  	sb->daemon_sleep = __le32_to_cpu(sb->daemon_sleep);
>  	sb->sync_size = __le64_to_cpu(sb->sync_size);
>  	sb->write_behind = __le32_to_cpu(sb->write_behind);
> +	sb->nodes = __le32_to_cpu(sb->nodes);
> +	sb->sectors_reserved = __le32_to_cpu(sb->sectors_reserved);
>  }
>  
>  inline void sb_cpu_to_le(bitmap_super_t *sb)
> diff --git a/bitmap.h b/bitmap.h
> index c8725a3..adbf0b4 100644
> --- a/bitmap.h
> +++ b/bitmap.h
> @@ -154,8 +154,11 @@ typedef struct bitmap_super_s {
>  	__u32 chunksize;    /* 52  the bitmap chunk size in bytes */
>  	__u32 daemon_sleep; /* 56  seconds between disk flushes */
>  	__u32 write_behind; /* 60  number of outstanding write-behind writes */
> -
> -	__u8  pad[256 - 64]; /* set to zero */
> +	__u32 sectors_reserved; /* 64 number of 512-byte sectors that are
> +				 * reserved for the bitmap. */
> +	__u32 nodes;        /* 68 the maximum number of nodes in cluster. */
> +	__u8 cluster_name[64]; /* 72 cluster name to which this md belongs */
> +	__u8  pad[256 - 136]; /* set to zero */
>  } bitmap_super_t;
>  
>  /* notes:
> diff --git a/mdadm.8.in b/mdadm.8.in
> index a630310..4aec0db 100644
> --- a/mdadm.8.in
> +++ b/mdadm.8.in
> @@ -694,7 +694,12 @@ and so is replicated on all devices.  If the word
>  .B "none"
>  is given with
>  .B \-\-grow
> -mode, then any bitmap that is present is removed.
> +mode, then any bitmap that is present is removed. If the word
> +.B "clustered"
> +is given, the array is created for a clustered environment. One bitmap
> +is created for each node as defined by the
> +.B \-\-nodes
> +parameter and are stored internally.
>  
>  To help catch typing errors, the filename must contain at least one
>  slash ('/') if it is a real file (not 'internal' or 'none').
> diff --git a/mdadm.c b/mdadm.c
> index 3e8c49b..bd9382e 100644
> --- a/mdadm.c
> +++ b/mdadm.c
> @@ -1097,6 +1097,15 @@ int main(int argc, char *argv[])
>  				s.bitmap_file = optarg;
>  				continue;
>  			}
> +			if (strcmp(optarg, "clustered")== 0) {
> +				s.bitmap_file = optarg;
> +				/* Set the default number of cluster nodes
> +				 * to 4 if not already set by user
> +				 */
> +				if (c.nodes < 1)
> +					c.nodes = 4;
> +				continue;
> +			}


Please make sure that mdadm compiles after each patch is applied.
You don't define that 'nodes' field until a later patch.

NeilBrown


>  			/* probable typo */
>  			pr_err("bitmap file must contain a '/', or be 'internal', or 'none'\n"
>  				"       not '%s'\n", optarg);
> diff --git a/super1.c b/super1.c
> index f0508fe..57b1526 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -2144,6 +2144,10 @@ add_internal_bitmap1(struct supertype *st,
>  	bms->daemon_sleep = __cpu_to_le32(delay);
>  	bms->sync_size = __cpu_to_le64(size);
>  	bms->write_behind = __cpu_to_le32(write_behind);
> +	bms->nodes = __cpu_to_le32(st->nodes);
> +	if (st->cluster_name)
> +		strncpy((char *)bms->cluster_name,
> +				st->cluster_name, strlen(st->cluster_name));
>  
>  	*chunkp = chunk;
>  	return 1;
> @@ -2177,6 +2181,7 @@ static int write_bitmap1(struct supertype *st, int fd)
>  	void *buf;
>  	int towrite, n;
>  	struct align_fd afd;
> +	unsigned int i;
>  
>  	init_afd(&afd, fd);
>  
> @@ -2185,27 +2190,45 @@ static int write_bitmap1(struct supertype *st, int fd)
>  	if (posix_memalign(&buf, 4096, 4096))
>  		return -ENOMEM;
>  
> -	memset(buf, 0xff, 4096);
> -	memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
> -
> -	towrite = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
> -	towrite = (towrite+7) >> 3; /* bits to bytes */
> -	towrite += sizeof(bitmap_super_t);
> -	towrite = ROUND_UP(towrite, 512);
> -	while (towrite > 0) {
> -		n = towrite;
> -		if (n > 4096)
> -			n = 4096;
> -		n = awrite(&afd, buf, n);
> -		if (n > 0)
> -			towrite -= n;
> +	/* We use bms->nodes as opposed to st->nodes to
> +	 * be compatible with write-after-reads such as
> +	 * the GROW operation.
> +	 */
> +	for (i = 0; i < __le32_to_cpu(bms->nodes); i++) {
> +		/* Only the bitmap[0] should resync
> +		 * whole device on initial assembly
> +		 */
> +		if (i)
> +			memset(buf, 0x00, 4096);
>  		else
> +			memset(buf, 0xff, 4096);
> +		memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
> +
> +		towrite = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
> +		towrite = (towrite+7) >> 3; /* bits to bytes */
> +		towrite += sizeof(bitmap_super_t);
> +		/* we need the bitmaps to be at 4k boundary */
> +		towrite = ROUND_UP(towrite, 4096);
> +		while (towrite > 0) {
> +			n = towrite;
> +			if (n > 4096)
> +				n = 4096;
> +			n = awrite(&afd, buf, n);
> +			if (n > 0)
> +				towrite -= n;
> +			else
> +				break;
> +			if (i)
> +				memset(buf, 0x00, 4096);
> +			else
> +				memset(buf, 0xff, 4096);
> +		}
> +		fsync(fd);
> +		if (towrite) {
> +			rv = -2;
>  			break;
> -		memset(buf, 0xff, 4096);
> +		}
>  	}
> -	fsync(fd);
> -	if (towrite)
> -		rv = -2;
>  
>  	free(buf);
>  	return rv;
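
A minimal sketch of the per-node sizing done in the write_bitmap1 hunk above: each node gets one slot holding the bitmap superblock plus one bit per chunk, padded to a 4 KiB boundary, and the slots are written back to back. The helper names and the 256-byte superblock size are assumptions made for illustration; neither comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of base (same idea as ROUND_UP in the patch). */
#define SKETCH_ROUND_UP(x, base) ((((x) + (base) - 1) / (base)) * (base))

/* ASSUMPTION: on-disk size of bitmap_super_t; the thread does not state it. */
#define ASSUMED_BITMAP_SB_BYTES 256u

/*
 * Bytes taken by one node's bitmap slot.
 * sync_size_sectors: device size to track, in 512-byte sectors.
 * chunksize_bytes:   bitmap chunk size in bytes (must be at least 512).
 */
static uint64_t bitmap_slot_bytes(uint64_t sync_size_sectors,
				  uint32_t chunksize_bytes)
{
	uint64_t chunk_sectors = chunksize_bytes >> 9;

	assert(chunk_sectors != 0);	/* guard the division below */

	uint64_t bits = sync_size_sectors / chunk_sectors;	/* one bit per chunk */
	uint64_t bytes = (bits + 7) >> 3;			/* bits to bytes */

	bytes += ASSUMED_BITMAP_SB_BYTES;
	/* the patch pads each bitmap to a 4k boundary */
	return SKETCH_ROUND_UP(bytes, 4096u);
}

/* Total bytes written when one slot per node is laid out consecutively. */
static uint64_t clustered_bitmap_bytes(uint32_t nodes,
				       uint64_t sync_size_sectors,
				       uint32_t chunksize_bytes)
{
	return (uint64_t)nodes * bitmap_slot_bytes(sync_size_sectors,
						   chunksize_bytes);
}
```

Under these assumptions, a 1 GiB device (2097152 sectors) with 64 KiB chunks needs 4096 bytes per slot, so the default of 4 nodes from the mdadm.c hunk would occupy 16384 bytes in total.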




* Re: [PATCH V2 01/11] Create n bitmaps for clustered mode
  2015-05-15  5:45   ` NeilBrown
@ 2015-05-18  2:21     ` Guoqing Jiang
  0 siblings, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2015-05-18  2:21 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, rgoldwyn

NeilBrown wrote:
>> diff --git a/mdadm.c b/mdadm.c
>> index 3e8c49b..bd9382e 100644
>> --- a/mdadm.c
>> +++ b/mdadm.c
>> @@ -1097,6 +1097,15 @@ int main(int argc, char *argv[])
>>  				s.bitmap_file = optarg;
>>  				continue;
>>  			}
>> +			if (strcmp(optarg, "clustered")== 0) {
>> +				s.bitmap_file = optarg;
>> +				/* Set the default number of cluster nodes
>> +				 * to 4 if not already set by user
>> +				 */
>> +				if (c.nodes < 1)
>> +					c.nodes = 4;
>> +				continue;
>> +			}
>>     
>
>
> Please make sure that mdadm compiles after each patch is applied.
> You don't define that 'nodes' field until a later patch.
>
>   
Oops, I will fix it in the next version.

Thanks,
Guoqing
> NeilBrown
>
>   



Thread overview: 14+ messages
2015-05-08  8:50 [PATCH V2 00/11] mdadm tool: add the support for cluster-md gqjiang
2015-05-08  8:50 ` [PATCH V2 01/11] Create n bitmaps for clustered mode gqjiang
2015-05-15  5:45   ` NeilBrown
2015-05-18  2:21     ` Guoqing Jiang
2015-05-08  8:50 ` [PATCH V2 02/11] Add nodes option while creating md gqjiang
2015-05-08  8:50 ` [PATCH V2 03/11] home-cluster while creating an array gqjiang
2015-05-08  8:50 ` [PATCH V2 04/11] Show all bitmaps while examining bitmap gqjiang
2015-05-08  8:50 ` [PATCH V2 05/11] Add a new clustered disk gqjiang
2015-05-08  8:50 ` [PATCH V2 06/11] Convert a bitmap=none device to clustered gqjiang
2015-05-08  8:50 ` [PATCH V2 07/11] Skip clustered devices in incremental gqjiang
2015-05-08  8:50 ` [PATCH V2 08/11] mdadm: add the ability to change cluster name gqjiang
2015-05-08  8:50 ` [PATCH V2 09/11] mdadm: change the num of cluster node gqjiang
2015-05-08  8:50 ` [PATCH V2 10/11] Reuse calc_bitmap_size to reduce code size gqjiang
2015-05-08  8:50 ` [PATCH V2 11/11] Reuse the write_bitmap for update uuid gqjiang
