All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V3 00/11] mdadm tool: add the support for cluster-md
@ 2015-05-20  3:20 Guoqing Jiang
  2015-05-20  3:20 ` [PATCH V3 01/11] Create n bitmaps for clustered mode Guoqing Jiang
                   ` (11 more replies)
  0 siblings, 12 replies; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

V3 changes:
1. re-orgnize some codes to ensure mdadm compiles after each patch is applied
2. change the code for super1.c for first patch since it has side effect for
non-cluster condition

V2 changes:
1. re-arrange the squence of patches
2. add some memembers into sb_le_to_cpu
3. handle some logic change and comments from Neil

Basic background for Cluster MD: Cluster MD is a shared-device RAID for a
cluster, currently, the implementation is limited to RAID1 but with further
work (and some positive feedback), it could be extend to other RAID levels.

The kernel part code of cluster-md has been sent to maillist several month
ago by Goldywyn, and to make cluster-md works, the mdadm tools also need to
do some changes accordingly.

This patch set extends mdadm tool to aware cluster MD scenario, and handle
related cluster-md scenario.

1. the first part (0001-0007) comes from Goldwyn, which add initial
support for cluster-md, those changes included make mdadm awares nodes,
home-cluster and n bitmaps for clustered mode, also let mdadm can 
confirm disk which is added by another node.


2. the second part is for support change cluster-name and node nums under
assemble mode. Which extend write-bitmap to handle above cases, and also
use the extended write_bitmap for update uuid. [PATCH V2 10/10] is just compiled
test only.

BTW: this series is based on commit "72a457 IMSM: Count arrays per orom".

Some reltated links:
[1] http://marc.info/?l=linux-raid&m=141891941330336&w=2
[2] http://marc.info/?l=linux-raid&m=141935561418770&w=2

Guoqing Jiang (11):
  Create n bitmaps for clustered mode
  Add nodes option while creating md
  home-cluster while creating an array
  Show all bitmaps while examining bitmap
  Add a new clustered disk
  Convert a bitmap=none device to clustered
  Skip clustered devices in incremental
  mdadm: add the ability to change cluster name
  mdadm: change the num of cluster node
  Reuse calc_bitmap_size to reduce code size
  Reuse the write_bitmap for update uuid

 Assemble.c    |  14 ++++--
 Create.c      |   5 +-
 Grow.c        |  22 +++++++--
 Incremental.c |   5 ++
 Makefile      |   1 +
 Manage.c      |  33 +++++++++++--
 ReadMe.c      |   3 ++
 bitmap.c      |  94 ++++++++++++++++++++++---------------
 bitmap.h      |   7 ++-
 config.c      |  27 ++++++++++-
 md_p.h        |   7 +++
 md_u.h        |   1 +
 mdadm.8.in    |  28 +++++++++++-
 mdadm.c       |  69 +++++++++++++++++++++++++++-
 mdadm.h       |  20 +++++++-
 super0.c      |   4 +-
 super1.c      | 145 +++++++++++++++++++++++++++++++++++++++++++++++-----------
 util.c        |  60 ++++++++++++++++++++++++
 18 files changed, 458 insertions(+), 87 deletions(-)

-- 
1.7.12.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH V3 01/11] Create n bitmaps for clustered mode
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
@ 2015-05-20  3:20 ` Guoqing Jiang
  2015-05-20  3:20 ` [PATCH V3 02/11] Add nodes option while creating md Guoqing Jiang
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

For a clustered MD, create bitmaps equal to number of nodes so
each node has an independent bitmap.

Only the first bitmap is has the bits set so that the first node
that assembles the device also performs the sync.

The bitmaps are aligned to 4k boundaries.

On-disk format:

0                    4k                     8k                    12k
-------------------------------------------------------------------
| idle                | md super            | bm super [0] + bits |
| bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
| bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
| bm bits [3, contd]  |                     |                     |

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Create.c   |  3 ++-
 bitmap.c   |  2 ++
 bitmap.h   |  7 +++++--
 mdadm.8.in |  7 ++++++-
 super1.c   | 53 ++++++++++++++++++++++++++++++++++-------------------
 5 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/Create.c b/Create.c
index ef28da0..69f5432 100644
--- a/Create.c
+++ b/Create.c
@@ -750,7 +750,8 @@ int Create(struct supertype *st, char *mddev,
 #endif
 	}
 
-	if (s->bitmap_file && strcmp(s->bitmap_file, "internal")==0) {
+	if (s->bitmap_file && (strcmp(s->bitmap_file, "internal")==0
+			|| strcmp(s->bitmap_file, "clustered")==0)) {
 		if ((vers%100) < 2) {
 			pr_err("internal bitmaps not supported by this kernel.\n");
 			goto abort_locked;
diff --git a/bitmap.c b/bitmap.c
index b1d54a6..920033a 100644
--- a/bitmap.c
+++ b/bitmap.c
@@ -32,6 +32,8 @@ inline void sb_le_to_cpu(bitmap_super_t *sb)
 	sb->daemon_sleep = __le32_to_cpu(sb->daemon_sleep);
 	sb->sync_size = __le64_to_cpu(sb->sync_size);
 	sb->write_behind = __le32_to_cpu(sb->write_behind);
+	sb->nodes = __le32_to_cpu(sb->nodes);
+	sb->sectors_reserved = __le32_to_cpu(sb->sectors_reserved);
 }
 
 inline void sb_cpu_to_le(bitmap_super_t *sb)
diff --git a/bitmap.h b/bitmap.h
index c8725a3..adbf0b4 100644
--- a/bitmap.h
+++ b/bitmap.h
@@ -154,8 +154,11 @@ typedef struct bitmap_super_s {
 	__u32 chunksize;    /* 52  the bitmap chunk size in bytes */
 	__u32 daemon_sleep; /* 56  seconds between disk flushes */
 	__u32 write_behind; /* 60  number of outstanding write-behind writes */
-
-	__u8  pad[256 - 64]; /* set to zero */
+	__u32 sectors_reserved; /* 64 number of 512-byte sectors that are
+				 * reserved for the bitmap. */
+	__u32 nodes;        /* 68 the maximum number of nodes in cluster. */
+	__u8 cluster_name[64]; /* 72 cluster name to which this md belongs */
+	__u8  pad[256 - 136]; /* set to zero */
 } bitmap_super_t;
 
 /* notes:
diff --git a/mdadm.8.in b/mdadm.8.in
index a630310..4aec0db 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -694,7 +694,12 @@ and so is replicated on all devices.  If the word
 .B "none"
 is given with
 .B \-\-grow
-mode, then any bitmap that is present is removed.
+mode, then any bitmap that is present is removed. If the word
+.B "clustered"
+is given, the array is created for a clustered environment. One bitmap
+is created for each node as defined by the
+.B \-\-nodes
+parameter and are stored internally.
 
 To help catch typing errors, the filename must contain at least one
 slash ('/') if it is a real file (not 'internal' or 'none').
diff --git a/super1.c b/super1.c
index f0508fe..7928a3d 100644
--- a/super1.c
+++ b/super1.c
@@ -2177,6 +2177,7 @@ static int write_bitmap1(struct supertype *st, int fd)
 	void *buf;
 	int towrite, n;
 	struct align_fd afd;
+	unsigned int i = 0;
 
 	init_afd(&afd, fd);
 
@@ -2185,27 +2186,41 @@ static int write_bitmap1(struct supertype *st, int fd)
 	if (posix_memalign(&buf, 4096, 4096))
 		return -ENOMEM;
 
-	memset(buf, 0xff, 4096);
-	memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
-
-	towrite = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
-	towrite = (towrite+7) >> 3; /* bits to bytes */
-	towrite += sizeof(bitmap_super_t);
-	towrite = ROUND_UP(towrite, 512);
-	while (towrite > 0) {
-		n = towrite;
-		if (n > 4096)
-			n = 4096;
-		n = awrite(&afd, buf, n);
-		if (n > 0)
-			towrite -= n;
+	do {
+		/* Only the bitmap[0] should resync
+		 * whole device on initial assembly
+		 */
+		if (i)
+			memset(buf, 0x00, 4096);
 		else
+			memset(buf, 0xff, 4096);
+		memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
+
+		towrite = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
+		towrite = (towrite+7) >> 3; /* bits to bytes */
+		towrite += sizeof(bitmap_super_t);
+		/* we need the bitmaps to be at 4k boundary */
+		towrite = ROUND_UP(towrite, 4096);
+		while (towrite > 0) {
+			n = towrite;
+			if (n > 4096)
+				n = 4096;
+			n = awrite(&afd, buf, n);
+			if (n > 0)
+				towrite -= n;
+			else
+				break;
+			if (i)
+				memset(buf, 0x00, 4096);
+			else
+				memset(buf, 0xff, 4096);
+		}
+		fsync(fd);
+		if (towrite) {
+			rv = -2;
 			break;
-		memset(buf, 0xff, 4096);
-	}
-	fsync(fd);
-	if (towrite)
-		rv = -2;
+		}
+	} while (++i < __le32_to_cpu(bms->nodes));
 
 	free(buf);
 	return rv;
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 02/11] Add nodes option while creating md
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
  2015-05-20  3:20 ` [PATCH V3 01/11] Create n bitmaps for clustered mode Guoqing Jiang
@ 2015-05-20  3:20 ` Guoqing Jiang
  2015-05-25  4:13   ` NeilBrown
  2015-05-20  3:20 ` [PATCH V3 03/11] home-cluster while creating an array Guoqing Jiang
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

Specifies the maximum number of nodes in the cluster that may use
this device simultaneously. This is equivalent to the number of
bitmaps created in the internal superblock (patches to follow).

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Create.c   |  1 +
 ReadMe.c   |  1 +
 mdadm.8.in |  6 ++++++
 mdadm.c    | 34 +++++++++++++++++++++++++++++++++-
 mdadm.h    |  3 +++
 super1.c   |  1 +
 6 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/Create.c b/Create.c
index 69f5432..e4577af 100644
--- a/Create.c
+++ b/Create.c
@@ -531,6 +531,7 @@ int Create(struct supertype *st, char *mddev,
 				st->ss->name);
 		warn = 1;
 	}
+	st->nodes = c->nodes;
 
 	if (warn) {
 		if (c->runstop!= 1) {
diff --git a/ReadMe.c b/ReadMe.c
index 87a4916..30c569d 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -140,6 +140,7 @@ struct option long_options[] = {
     {"homehost",  1, 0,  HomeHost},
     {"symlinks",  1, 0,  Symlinks},
     {"data-offset",1, 0, DataOffset},
+    {"nodes",1, 0, Nodes},
 
     /* For assemble */
     {"uuid",      1, 0, 'u'},
diff --git a/mdadm.8.in b/mdadm.8.in
index 4aec0db..9c1497e 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -971,6 +971,12 @@ However for RAID0, it is not possible to add spares.  So to increase
 the number of devices in a RAID0, it is necessary to set the new
 number of devices, and to add the new devices, in the same command.
 
+.TP
+.BR \-\-nodes
+Only works when the array is for clustered environment. It specify the
+maximum number of nodes in the cluster that will use this device
+simultaneously. If not specified, this defaults to 4.
+
 .SH For assemble:
 
 .TP
diff --git a/mdadm.c b/mdadm.c
index 3e8c49b..15a43d2 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -588,7 +588,14 @@ int main(int argc, char *argv[])
 			}
 			ident.raid_disks = s.raiddisks;
 			continue;
-
+		case O(CREATE, Nodes):
+			c.nodes = parse_num(optarg);
+			if (c.nodes <= 0) {
+				pr_err("invalid number for the number of "
+						"cluster nodes: %s\n", optarg);
+				exit(2);
+			}
+			continue;
 		case O(CREATE,'x'): /* number of spare (eXtra) disks */
 			if (s.sparedisks) {
 				pr_err("spare-devices set twice: %d and %s\n",
@@ -1097,6 +1104,15 @@ int main(int argc, char *argv[])
 				s.bitmap_file = optarg;
 				continue;
 			}
+			if (strcmp(optarg, "clustered")== 0) {
+			    s.bitmap_file = optarg;
+			    /* Set the default number of cluster nodes
+			     * to 4 if not already set by user
+			     */
+			    if (c.nodes < 1)
+				    c.nodes = 4;
+			    continue;
+			}
 			/* probable typo */
 			pr_err("bitmap file must contain a '/', or be 'internal', or 'none'\n"
 				"       not '%s'\n", optarg);
@@ -1377,6 +1393,22 @@ int main(int argc, char *argv[])
 	case CREATE:
 		if (c.delay == 0)
 			c.delay = DEFAULT_BITMAP_DELAY;
+
+		if (strcmp(s.bitmap_file, "clustered") == 0) {
+			if (s.level != 1) {
+			    pr_err("--bitmap=clustered is currently supported with RAID mirror only\n");
+			    rv = 1;
+			    break;
+			}
+		} else {
+			if (c.nodes) {
+				pr_err("--nodes argument is incompatible with --bitmap=%s.\n",
+					s.bitmap_file);
+				rv = 1;
+				break;
+			}
+		}
+
 		if (s.write_behind && !s.bitmap_file) {
 			pr_err("write-behind mode requires a bitmap.\n");
 			rv = 1;
diff --git a/mdadm.h b/mdadm.h
index 141f963..9d55801 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -344,6 +344,7 @@ enum special_options {
 	Dump,
 	Restore,
 	Action,
+	Nodes,
 };
 
 enum prefix_standard {
@@ -418,6 +419,7 @@ struct context {
 	char	*backup_file;
 	int	invalid_backup;
 	char	*action;
+	int	nodes;
 };
 
 struct shape {
@@ -1029,6 +1031,7 @@ struct supertype {
 			 */
 	int devcnt;
 	int retry_soon;
+	int nodes;
 
 	struct mdinfo *devs;
 
diff --git a/super1.c b/super1.c
index 7928a3d..78d98a7 100644
--- a/super1.c
+++ b/super1.c
@@ -2144,6 +2144,7 @@ add_internal_bitmap1(struct supertype *st,
 	bms->daemon_sleep = __cpu_to_le32(delay);
 	bms->sync_size = __cpu_to_le64(size);
 	bms->write_behind = __cpu_to_le32(write_behind);
+	bms->nodes = __cpu_to_le32(st->nodes);
 
 	*chunkp = chunk;
 	return 1;
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 03/11] home-cluster while creating an array
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
  2015-05-20  3:20 ` [PATCH V3 01/11] Create n bitmaps for clustered mode Guoqing Jiang
  2015-05-20  3:20 ` [PATCH V3 02/11] Add nodes option while creating md Guoqing Jiang
@ 2015-05-20  3:20 ` Guoqing Jiang
  2015-05-25  4:19   ` NeilBrown
  2015-05-20  3:20 ` [PATCH V3 04/11] Show all bitmaps while examining bitmap Guoqing Jiang
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

The home-cluster is stored in the bitmap super block of the
array. The device can be assembled on a cluster with the
cluster name same as the one recorded in the bitmap.

If home-cluster is not specified, this is auto-detected using
dlopen corosync cmap library.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Create.c   |  1 +
 Makefile   |  1 +
 ReadMe.c   |  1 +
 config.c   | 27 ++++++++++++++++++++++++++-
 mdadm.8.in |  6 ++++++
 mdadm.c    | 25 +++++++++++++++++++++++++
 mdadm.h    |  5 +++++
 super1.c   |  3 +++
 util.c     | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/Create.c b/Create.c
index e4577af..9663dc4 100644
--- a/Create.c
+++ b/Create.c
@@ -532,6 +532,7 @@ int Create(struct supertype *st, char *mddev,
 		warn = 1;
 	}
 	st->nodes = c->nodes;
+	st->cluster_name = c->homecluster;
 
 	if (warn) {
 		if (c->runstop!= 1) {
diff --git a/Makefile b/Makefile
index a7d8c5c..431f08b 100644
--- a/Makefile
+++ b/Makefile
@@ -101,6 +101,7 @@ endif
 # If you want a static binary, you might uncomment these
 # LDFLAGS = -static
 # STRIP = -s
+LDLIBS=-ldl
 
 INSTALL = /usr/bin/install
 DESTDIR =
diff --git a/ReadMe.c b/ReadMe.c
index 30c569d..c6286ae 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -141,6 +141,7 @@ struct option long_options[] = {
     {"symlinks",  1, 0,  Symlinks},
     {"data-offset",1, 0, DataOffset},
     {"nodes",1, 0, Nodes},
+    {"home-cluster",1, 0, ClusterName},
 
     /* For assemble */
     {"uuid",      1, 0, 'u'},
diff --git a/config.c b/config.c
index 7342c42..21b6afd 100644
--- a/config.c
+++ b/config.c
@@ -77,7 +77,7 @@ char DefaultAltConfFile[] = CONFFILE2;
 char DefaultAltConfDir[] = CONFFILE2 ".d";
 
 enum linetype { Devices, Array, Mailaddr, Mailfrom, Program, CreateDev,
-		Homehost, AutoMode, Policy, PartPolicy, LTEnd };
+		Homehost, HomeCluster, AutoMode, Policy, PartPolicy, LTEnd };
 char *keywords[] = {
 	[Devices]  = "devices",
 	[Array]    = "array",
@@ -86,6 +86,7 @@ char *keywords[] = {
 	[Program]  = "program",
 	[CreateDev]= "create",
 	[Homehost] = "homehost",
+	[HomeCluster] = "homecluster",
 	[AutoMode] = "auto",
 	[Policy]   = "policy",
 	[PartPolicy]="part-policy",
@@ -562,6 +563,21 @@ void homehostline(char *line)
 	}
 }
 
+static char *home_cluster = NULL;
+void homeclusterline(char *line)
+{
+	char *w;
+
+	for (w=dl_next(line); w != line ; w=dl_next(w)) {
+		if (home_cluster == NULL) {
+			if (strcasecmp(w, "<none>")==0)
+				home_cluster = xstrdup("");
+			else
+				home_cluster = xstrdup(w);
+		}
+	}
+}
+
 char auto_yes[] = "yes";
 char auto_no[] = "no";
 char auto_homehost[] = "homehost";
@@ -724,6 +740,9 @@ void conf_file(FILE *f)
 		case Homehost:
 			homehostline(line);
 			break;
+		case HomeCluster:
+			homeclusterline(line);
+			break;
 		case AutoMode:
 			autoline(line);
 			break;
@@ -884,6 +903,12 @@ char *conf_get_homehost(int *require_homehostp)
 	return home_host;
 }
 
+char *conf_get_homecluster(void)
+{
+	load_conffile();
+	return home_cluster;
+}
+
 struct createinfo *conf_get_create_info(void)
 {
 	load_conffile();
diff --git a/mdadm.8.in b/mdadm.8.in
index 9c1497e..56d9bcc 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -415,6 +415,12 @@ This functionality is currently only provided by
 and
 .BR \-\-monitor .
 
+.TP
+.B \-\-home\-cluster=
+specifies the cluster name for the md device. The md device can be assembled
+only on the cluster which matches the name specified. If this option is not
+provided, mdadm tried to detect the cluster name automatically.
+
 .SH For create, build, or grow:
 
 .TP
diff --git a/mdadm.c b/mdadm.c
index 15a43d2..8f567d8 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -596,6 +596,13 @@ int main(int argc, char *argv[])
 				exit(2);
 			}
 			continue;
+		case O(CREATE, ClusterName):
+			c.homecluster = optarg;
+			if (strlen(c.homecluster) > 64) {
+				pr_err("Cluster name too big.\n");
+				exit(ERANGE);
+			}
+			continue;
 		case O(CREATE,'x'): /* number of spare (eXtra) disks */
 			if (s.sparedisks) {
 				pr_err("spare-devices set twice: %d and %s\n",
@@ -1276,6 +1283,18 @@ int main(int argc, char *argv[])
 		c.require_homehost = 0;
 	}
 
+	if (c.homecluster == NULL && (c.nodes > 0)) {
+		c.homecluster = conf_get_homecluster();
+		if (c.homecluster == NULL)
+			rv = get_cluster_name(&c.homecluster);
+		if (rv == 0) {
+			c.homehost = xstrdup(c.homecluster);
+			/* Add a : to differentiate between a host
+			 * and a cluster */
+			strcat(c.homehost, ":");
+		}
+	}
+
 	if (c.backup_file && data_offset != INVALID_SECTORS) {
 		pr_err("--backup-file and --data-offset are incompatible\n");
 		exit(2);
@@ -1407,6 +1426,12 @@ int main(int argc, char *argv[])
 				rv = 1;
 				break;
 			}
+			if (c.homecluster) {
+				pr_err("--home-cluster argument is incompatible with --bitmap=%s.\n",
+					s.bitmap_file);
+				rv = 1;
+				break;
+			}
 		}
 
 		if (s.write_behind && !s.bitmap_file) {
diff --git a/mdadm.h b/mdadm.h
index 9d55801..f56d9d6 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -345,6 +345,7 @@ enum special_options {
 	Restore,
 	Action,
 	Nodes,
+	ClusterName,
 };
 
 enum prefix_standard {
@@ -420,6 +421,7 @@ struct context {
 	int	invalid_backup;
 	char	*action;
 	int	nodes;
+	char	*homecluster;
 };
 
 struct shape {
@@ -1032,6 +1034,7 @@ struct supertype {
 	int devcnt;
 	int retry_soon;
 	int nodes;
+	char *cluster_name;
 
 	struct mdinfo *devs;
 
@@ -1308,6 +1311,7 @@ extern char *conf_get_mailaddr(void);
 extern char *conf_get_mailfrom(void);
 extern char *conf_get_program(void);
 extern char *conf_get_homehost(int *require_homehostp);
+extern char *conf_get_homecluster(void);
 extern char *conf_line(FILE *file);
 extern char *conf_word(FILE *file, int allow_key);
 extern void print_quoted(char *str);
@@ -1416,6 +1420,7 @@ extern char *stat2devnm(struct stat *st);
 extern char *fd2devnm(int fd);
 
 extern int in_initrd(void);
+extern int get_cluster_name(char **name);
 
 #define _ROUND_UP(val, base)	(((val) + (base) - 1) & ~(base - 1))
 #define ROUND_UP(val, base)	_ROUND_UP(val, (typeof(val))(base))
diff --git a/super1.c b/super1.c
index 78d98a7..60f470b 100644
--- a/super1.c
+++ b/super1.c
@@ -2145,6 +2145,9 @@ add_internal_bitmap1(struct supertype *st,
 	bms->sync_size = __cpu_to_le64(size);
 	bms->write_behind = __cpu_to_le32(write_behind);
 	bms->nodes = __cpu_to_le32(st->nodes);
+	if (st->cluster_name)
+	    strncpy((char *)bms->cluster_name,
+		    st->cluster_name, strlen(st->cluster_name));
 
 	*chunkp = chunk;
 	return 1;
diff --git a/util.c b/util.c
index cc98d3b..ed9a745 100644
--- a/util.c
+++ b/util.c
@@ -34,6 +34,8 @@
 #include	<ctype.h>
 #include	<dirent.h>
 #include	<signal.h>
+#include	<dlfcn.h>
+#include	<corosync/cmap.h>
 
 /*
  * following taken from linux/blkpg.h because they aren't
@@ -1976,3 +1978,51 @@ void reopen_mddev(int mdfd)
 	if (fd >= 0 && fd != mdfd)
 		dup2(fd, mdfd);
 }
+
+int get_cluster_name(char **cluster_name)
+{
+        void *lib_handle = NULL;
+        int rv = -1;
+
+        cmap_handle_t handle;
+        static int (*initialize)(cmap_handle_t *handle);
+        static int (*get_string)(cmap_handle_t handle,
+                        const char *string,
+                        char **name);
+        static int (*finalize)(cmap_handle_t handle);
+
+
+        lib_handle = dlopen("libcmap.so.4", RTLD_NOW | RTLD_LOCAL);
+        if (!lib_handle)
+                return rv;
+
+        initialize = dlsym(lib_handle, "cmap_initialize");
+        if (!initialize)
+                goto out;
+
+        get_string = dlsym(lib_handle, "cmap_get_string");
+        if (!get_string)
+                goto out;
+
+        finalize = dlsym(lib_handle, "cmap_finalize");
+        if (!finalize)
+                goto out;
+
+        rv = initialize(&handle);
+        if (rv != CS_OK)
+                goto out;
+
+        rv = get_string(handle, "totem.cluster_name", cluster_name);
+        if (rv != CS_OK) {
+                free(*cluster_name);
+                rv = -1;
+                goto name_err;
+        }
+
+        rv = 0;
+name_err:
+        finalize(handle);
+out:
+        dlclose(lib_handle);
+        return rv;
+}
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 04/11] Show all bitmaps while examining bitmap
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
                   ` (2 preceding siblings ...)
  2015-05-20  3:20 ` [PATCH V3 03/11] home-cluster while creating an array Guoqing Jiang
@ 2015-05-20  3:20 ` Guoqing Jiang
  2015-05-25  4:23   ` NeilBrown
  2015-05-20  3:20 ` [PATCH V3 05/11] Add a new clustered disk Guoqing Jiang
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

This adds capability of exmining bitmaps corresponding to all
nodes/slots on the device.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 bitmap.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 54 insertions(+), 18 deletions(-)

diff --git a/bitmap.c b/bitmap.c
index 920033a..bccc67c 100644
--- a/bitmap.c
+++ b/bitmap.c
@@ -260,7 +260,7 @@ int ExamineBitmap(char *filename, int brief, struct supertype *st)
 	int rv = 1;
 	char buf[64];
 	int swap;
-	int fd;
+	int fd, i;
 	__u32 uuid32[4];
 
 	fd = bitmap_file_open(filename, &st);
@@ -317,23 +317,59 @@ int ExamineBitmap(char *filename, int brief, struct supertype *st)
 		       uuid32[2],
 		       uuid32[3]);
 
-	printf("          Events : %llu\n", (unsigned long long)sb->events);
-	printf("  Events Cleared : %llu\n", (unsigned long long)sb->events_cleared);
-	printf("           State : %s\n", bitmap_state(sb->state));
-	printf("       Chunksize : %s\n", human_chunksize(sb->chunksize));
-	printf("          Daemon : %ds flush period\n", sb->daemon_sleep);
-	if (sb->write_behind)
-		sprintf(buf, "Allow write behind, max %d", sb->write_behind);
-	else
-		sprintf(buf, "Normal");
-	printf("      Write Mode : %s\n", buf);
-	printf("       Sync Size : %llu%s\n", (unsigned long long)sb->sync_size/2,
-					human_size(sb->sync_size * 512));
-	if (brief)
-		goto free_info;
-	printf("          Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",
-			info->total_bits, info->dirty_bits,
-			100.0 * info->dirty_bits / (info->total_bits?:1));
+	if (sb->nodes == 0) {
+		printf("          Events : %llu\n", (unsigned long long)sb->events);
+		printf("  Events Cleared : %llu\n", (unsigned long long)sb->events_cleared);
+		printf("           State : %s\n", bitmap_state(sb->state));
+		printf("       Chunksize : %s\n", human_chunksize(sb->chunksize));
+		printf("          Daemon : %ds flush period\n", sb->daemon_sleep);
+		if (sb->write_behind)
+			sprintf(buf, "Allow write behind, max %d", sb->write_behind);
+		else
+			sprintf(buf, "Normal");
+		printf("      Write Mode : %s\n", buf);
+		printf("       Sync Size : %llu%s\n", (unsigned long long)sb->sync_size/2,
+						human_size(sb->sync_size * 512));
+		if (brief)
+			goto free_info;
+		printf("          Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",
+				info->total_bits, info->dirty_bits,
+				100.0 * info->dirty_bits / (info->total_bits?:1));
+	} else {
+		printf("       Chunksize : %s\n", human_chunksize(sb->chunksize));
+		printf("          Daemon : %ds flush period\n", sb->daemon_sleep);
+		if (sb->write_behind)
+			sprintf(buf, "Allow write behind, max %d", sb->write_behind);
+		else
+			sprintf(buf, "Normal");
+		printf("      Write Mode : %s\n", buf);
+		printf("       Sync Size : %llu%s\n", (unsigned long long)sb->sync_size/2,
+						human_size(sb->sync_size * 512));
+		printf("   Cluster nodes : %d\n", sb->nodes);
+		printf("    Cluster name : %s\n", sb->cluster_name);
+		i = 0;
+		do {
+			if (i) {
+				free(info);
+				info = bitmap_fd_read(fd, brief);
+				sb = &info->sb;
+			}
+			if (sb->magic != BITMAP_MAGIC)
+				pr_err("invalid bitmap magic 0x%x, the bitmap file appears to be corrupted\n", sb->magic);
+
+			printf("       Node Slot : %d\n", i);
+			printf("          Events : %llu\n", (unsigned long long)sb->events);
+			printf("  Events Cleared : %llu\n", (unsigned long long)sb->events_cleared);
+			printf("           State : %s\n", bitmap_state(sb->state));
+			if (brief)
+				continue;
+			printf("          Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",
+					info->total_bits, info->dirty_bits,
+					100.0 * info->dirty_bits / (info->total_bits?:1));
+
+		} while (++i < (int)sb->nodes);
+	}
+
 free_info:
 	free(info);
 	return rv;
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 05/11] Add a new clustered disk
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
                   ` (3 preceding siblings ...)
  2015-05-20  3:20 ` [PATCH V3 04/11] Show all bitmaps while examining bitmap Guoqing Jiang
@ 2015-05-20  3:20 ` Guoqing Jiang
  2015-05-25  4:35   ` NeilBrown
  2015-05-20  3:20 ` [PATCH V3 06/11] Convert a bitmap=none device to clustered Guoqing Jiang
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

A clustered disk is added by the traditional --add sequence.
However, other nodes need to acknowledge that they can "see"
the device. This is done by --cluster-confirm:

--cluster-confirm SLOTNUM:/dev/whatever (if disk is found)
or
--cluster-confirm SLOTNUM:missing (if disk is not found)

The node initiating the --add, has the disk state tagged with
MD_DISK_CLUSTER_ADD and the one confirming tag the disk with
MD_DISK_CANDIDATE.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Manage.c   | 33 +++++++++++++++++++++++++++++----
 ReadMe.c   |  1 +
 md_p.h     |  7 +++++++
 md_u.h     |  1 +
 mdadm.8.in |  9 +++++++++
 mdadm.c    |  4 ++++
 mdadm.h    |  2 ++
 util.c     | 10 ++++++++++
 8 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/Manage.c b/Manage.c
index d3cfb55..4c3d451 100644
--- a/Manage.c
+++ b/Manage.c
@@ -690,7 +690,8 @@ skip_re_add:
 int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 	       struct supertype *tst, mdu_array_info_t *array,
 	       int force, int verbose, char *devname,
-	       char *update, unsigned long rdev, unsigned long long array_size)
+	       char *update, unsigned long rdev, unsigned long long array_size,
+	       int raid_slot)
 {
 	unsigned long long ldsize;
 	struct supertype *dev_st = NULL;
@@ -879,7 +880,10 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 	}
 	disc.major = major(rdev);
 	disc.minor = minor(rdev);
-	disc.number =j;
+	if (raid_slot < 0)
+		disc.number = j;
+	else
+		disc.number = raid_slot;
 	disc.state = 0;
 	if (array->not_persistent==0) {
 		int dfd;
@@ -920,6 +924,14 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 			}
 		free(used);
 	}
+
+	if (array->state & (1 << MD_SB_CLUSTERED)) {
+		if (dv->disposition == 'c')
+			disc.state |= (1 << MD_DISK_CANDIDATE);
+		else
+			disc.state |= (1 << MD_DISK_CLUSTER_ADD);
+	}
+
 	if (dv->writemostly == 1)
 		disc.state |= (1 << MD_DISK_WRITEMOSTLY);
 	if (tst->ss->external) {
@@ -1239,6 +1251,7 @@ int Manage_subdevs(char *devname, int fd,
 	 *        variant on 'A'
 	 *  'F' - Another variant of 'A', where the device was faulty
 	 *        so must be removed from the array first.
+	 *  'c' - confirm the device as found (for clustered environments)
 	 *
 	 * For 'f' and 'r', the device can also be a kernel-internal
 	 * name such as 'sdb'.
@@ -1254,6 +1267,7 @@ int Manage_subdevs(char *devname, int fd,
 	struct mdinfo info;
 	int frozen = 0;
 	int busy = 0;
+	int raid_slot = -1;
 
 	if (ioctl(fd, GET_ARRAY_INFO, &array)) {
 		pr_err("Cannot get array info for %s\n",
@@ -1282,6 +1296,11 @@ int Manage_subdevs(char *devname, int fd,
 		int rv;
 		int mj,mn;
 
+		raid_slot = -1;
+		if (dv->disposition == 'c')
+			parse_cluster_confirm_arg(dv->devname, &dv->devname,
+					&raid_slot);
+
 		if (strcmp(dv->devname, "failed") == 0 ||
 		    strcmp(dv->devname, "faulty") == 0) {
 			if (dv->disposition != 'A'
@@ -1307,6 +1326,11 @@ int Manage_subdevs(char *devname, int fd,
 		if (strcmp(dv->devname, "missing") == 0) {
 			struct mddev_dev *add_devlist = NULL;
 			struct mddev_dev **dp;
+			if (dv->disposition == 'c') {
+				rv = ioctl(fd, CLUSTERED_DISK_NACK, NULL);
+				break;
+			}
+
 			if (dv->disposition != 'A') {
 				pr_err("'missing' only meaningful with --re-add\n");
 				goto abort;
@@ -1399,7 +1423,7 @@ int Manage_subdevs(char *devname, int fd,
 			else {
 				int open_err = errno;
 				if (stat(dv->devname, &stb) != 0) {
-					pr_err("Cannot find %s: %s\n",
+					pr_err("%s: %d Cannot find %s: %s\n", __func__, __LINE__,
 					       dv->devname, strerror(errno));
 					goto abort;
 				}
@@ -1437,6 +1461,7 @@ int Manage_subdevs(char *devname, int fd,
 		case 'A':
 		case 'M': /* --re-add missing */
 		case 'F': /* --re-add faulty  */
+		case 'c': /* --cluster-confirm */
 			/* add the device */
 			if (subarray) {
 				pr_err("Cannot add disks to a \'member\' array, perform this operation on the parent container\n");
@@ -1470,7 +1495,7 @@ int Manage_subdevs(char *devname, int fd,
 			}
 			rv = Manage_add(fd, tfd, dv, tst, &array,
 					force, verbose, devname, update,
-					rdev, array_size);
+					rdev, array_size, raid_slot);
 			close(tfd);
 			tfd = -1;
 			if (rv < 0)
diff --git a/ReadMe.c b/ReadMe.c
index c6286ae..c854cd5 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -169,6 +169,7 @@ struct option long_options[] = {
     {"wait",	  0, 0,  WaitOpt},
     {"wait-clean", 0, 0, Waitclean },
     {"action",    1, 0, Action },
+    {"cluster-confirm", 0, 0, ClusterConfirm},
 
     /* For Detail/Examine */
     {"brief",	  0, 0, Brief},
diff --git a/md_p.h b/md_p.h
index c4846ba..e59504f 100644
--- a/md_p.h
+++ b/md_p.h
@@ -78,6 +78,12 @@
 #define MD_DISK_ACTIVE		1 /* disk is running but may not be in sync */
 #define MD_DISK_SYNC		2 /* disk is in sync with the raid set */
 #define MD_DISK_REMOVED		3 /* disk is in sync with the raid set */
+#define MD_DISK_CLUSTER_ADD     4 /* Initiate a disk add across the cluster
+				   * For clustered enviroments only.
+				   */
+#define MD_DISK_CANDIDATE	5 /* disk is added as spare (local) until confirmed
+				   * For clustered enviroments only.
+				   */
 
 #define	MD_DISK_WRITEMOSTLY	9 /* disk is "write-mostly" is RAID1 config.
 				   * read requests will only be sent here in
@@ -106,6 +112,7 @@ typedef struct mdp_device_descriptor_s {
 #define MD_SB_BLOCK_CONTAINER_RESHAPE 3 /* block container wide reshapes */
 #define MD_SB_BLOCK_VOLUME	4 /* block activation of array, other arrays
 				   * in container can be activated */
+#define MD_SB_CLUSTERED		5 /* MD is clustered  */
 #define	MD_SB_BITMAP_PRESENT	8 /* bitmap may be present nearby */
 
 typedef struct mdp_superblock_s {
diff --git a/md_u.h b/md_u.h
index be9868a..76068d6 100644
--- a/md_u.h
+++ b/md_u.h
@@ -44,6 +44,7 @@
 #define STOP_ARRAY		_IO (MD_MAJOR, 0x32)
 #define STOP_ARRAY_RO		_IO (MD_MAJOR, 0x33)
 #define RESTART_ARRAY_RW	_IO (MD_MAJOR, 0x34)
+#define CLUSTERED_DISK_NACK	_IO (MD_MAJOR, 0x35)
 
 typedef struct mdu_version_s {
 	int major;
diff --git a/mdadm.8.in b/mdadm.8.in
index 56d9bcc..57873ec 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -1406,6 +1406,15 @@ will avoid reading from these devices if possible.
 .BR \-\-readwrite
 Subsequent devices that are added or re\-added will have the 'write-mostly'
 flag cleared.
+.TP
+.BR \-\-cluster\-confirm
+Confirm the existence of the device. This is issued in response to an \-\-add
+request by a node in a cluster. When a node adds a device it sends a message
+to all nodes in the cluster to look for a device with a UUID. This translates
+to a udev notification with the UUID of the device to be added and the slot
+number. The receiving node must acknowledge this message
+with \-\-cluster\-confirm. Valid arguments are <slot>:<devicename> in case
+the device is found or <slot>:missing in case the device is not found.
 
 .P
 Each of these options requires that the first device listed is the array
diff --git a/mdadm.c b/mdadm.c
index 8f567d8..56fdeb7 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -196,6 +196,7 @@ int main(int argc, char *argv[])
 		case 'f':
 		case Fail:
 		case ReAdd: /* re-add */
+		case ClusterConfirm:
 			if (!mode) {
 				newmode = MANAGE;
 				shortopt = short_bitmap_options;
@@ -933,6 +934,9 @@ int main(int argc, char *argv[])
 					   * remove the device */
 			devmode = 'f';
 			continue;
+		case O(MANAGE, ClusterConfirm):
+			devmode = 'c';
+			continue;
 		case O(MANAGE,Replace):
 			/* Mark these devices for replacement */
 			devmode = 'R';
diff --git a/mdadm.h b/mdadm.h
index f56d9d6..00c726e 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -346,6 +346,7 @@ enum special_options {
 	Action,
 	Nodes,
 	ClusterName,
+	ClusterConfirm,
 };
 
 enum prefix_standard {
@@ -1281,6 +1282,7 @@ extern int parse_uuid(char *str, int uuid[4]);
 extern int parse_layout_10(char *layout);
 extern int parse_layout_faulty(char *layout);
 extern long parse_num(char *num);
+extern int parse_cluster_confirm_arg(char *inp, char **devname, int *slot);
 extern int check_ext2(int fd, char *name);
 extern int check_reiser(int fd, char *name);
 extern int check_raid(int fd, char *name);
diff --git a/util.c b/util.c
index ed9a745..8d27564 100644
--- a/util.c
+++ b/util.c
@@ -273,6 +273,16 @@ long parse_num(char *num)
 }
 #endif
 
+int parse_cluster_confirm_arg(char *input, char **devname, int *slot)
+{
+	char *dev;
+	*slot = strtoul(input, &dev, 10);
+	if (dev == input || dev[0] != ':')
+		return -1;
+	*devname = dev+1;
+	return 0;
+}
+
 void remove_partitions(int fd)
 {
 	/* remove partitions from this block devices.
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 06/11] Convert a bitmap=none device to clustered
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
                   ` (4 preceding siblings ...)
  2015-05-20  3:20 ` [PATCH V3 05/11] Add a new clustered disk Guoqing Jiang
@ 2015-05-20  3:20 ` Guoqing Jiang
  2015-05-25  4:40   ` NeilBrown
  2015-05-20  3:20 ` [PATCH V3 07/11] Skip clustered devices in incremental Guoqing Jiang
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

This adds the ability to convert a regular md without bitmap
(--bitmap=none) to a clustered device (--bitmap=clustered).

To convert a device with --bitmap=internal or --bitmap=external,
you have to convert to --bitmap=none and then re-execute the
command with --bitmap=clustered.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Grow.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/Grow.c b/Grow.c
index 9a573fd..1122cec 100644
--- a/Grow.c
+++ b/Grow.c
@@ -330,9 +330,16 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
 			}
 			return 0;
 		}
-		pr_err("Internal bitmap already present on %s\n",
-			devname);
-		return 1;
+		if ((strcmp(s->bitmap_file, "clustered")==0) && (array.state & (1<<MD_SB_CLUSTERED))) {
+			pr_err("Clustered bitmap already present on %s\n",
+					devname);
+			return 1;
+		}
+		if ((strcmp(s->bitmap_file, "internal")==0) && (!(array.state & (1<<MD_SB_CLUSTERED)))) {
+			pr_err("Internal bitmap already present on %s\n",
+					devname);
+			return 1;
+		}
 	}
 
 	if (strcmp(s->bitmap_file, "none") == 0) {
@@ -375,7 +382,8 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
 		free(st);
 		return 1;
 	}
-	if (strcmp(s->bitmap_file, "internal") == 0) {
+	if ((strcmp(s->bitmap_file, "internal") == 0) ||
+		(strcmp(s->bitmap_file, "clustered") == 0)) {
 		int rv;
 		int d;
 		int offset_setable = 0;
@@ -384,6 +392,8 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
 			pr_err("Internal bitmaps not supported with %s metadata\n", st->ss->name);
 			return 1;
 		}
+		st->nodes = c->nodes;
+		st->cluster_name = c->homecluster;
 		mdi = sysfs_read(fd, NULL, GET_BITMAP_LOCATION);
 		if (mdi)
 			offset_setable = 1;
@@ -426,6 +436,8 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
 			rv = sysfs_set_num_signed(mdi, NULL, "bitmap/location",
 						  mdi->bitmap_offset);
 		} else {
+			if (strcmp(s->bitmap_file, "clustered") == 0)
+				array.state |= (1<<MD_SB_CLUSTERED);
 			array.state |= (1<<MD_SB_BITMAP_PRESENT);
 			rv = ioctl(fd, SET_ARRAY_INFO, &array);
 		}
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 07/11] Skip clustered devices in incremental
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
                   ` (5 preceding siblings ...)
  2015-05-20  3:20 ` [PATCH V3 06/11] Convert a bitmap=none device to clustered Guoqing Jiang
@ 2015-05-20  3:20 ` Guoqing Jiang
  2015-05-20  3:20 ` [PATCH V3 08/11] mdadm: add the ability to change cluster name Guoqing Jiang
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

We want the clustered devices to be started exclusively by a cluster
resource-agent. So, avoid starting using the incremental option.

This also skips a clustered md from starting during boot in inactive mode.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Incremental.c | 5 +++++
 super1.c      | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/Incremental.c b/Incremental.c
index 0c9a9a4..5450a5c 100644
--- a/Incremental.c
+++ b/Incremental.c
@@ -232,6 +232,11 @@ int Incremental(struct mddev_dev *devlist, struct context *c,
 				devname);
 		goto out;
 	}
+	/* Skip the clustered ones. This should be started by
+	 * clustering resource agents
+	 */
+	if (info.array.state & (1 << MD_SB_CLUSTERED))
+		goto out;
 
 	/* 3a/ if not, check for homehost match.  If no match, continue
 	 * but don't trust the 'name' in the array. Thus a 'random' minor
diff --git a/super1.c b/super1.c
index 60f470b..fd728d2 100644
--- a/super1.c
+++ b/super1.c
@@ -891,6 +891,8 @@ static void getinfo_super1(struct supertype *st, struct mdinfo *info, char *map)
 	info->array.state =
 		(__le64_to_cpu(sb->resync_offset) == MaxSector)
 		? 1 : 0;
+	if (__le32_to_cpu(bsb->nodes) > 1)
+		info->array.state |= (1 << MD_SB_CLUSTERED);
 
 	info->data_offset = __le64_to_cpu(sb->data_offset);
 	info->component_size = __le64_to_cpu(sb->size);
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 08/11] mdadm: add the ability to change cluster name
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
                   ` (6 preceding siblings ...)
  2015-05-20  3:20 ` [PATCH V3 07/11] Skip clustered devices in incremental Guoqing Jiang
@ 2015-05-20  3:20 ` Guoqing Jiang
  2015-05-25  4:53   ` NeilBrown
  2015-05-20  3:20 ` [PATCH V3 09/11] mdadm: change the num of cluster node Guoqing Jiang
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

To support change the cluster name, the commit do the followings:

1. extend original write_bitmap function for new scenario.
2. add the scenarion to handle the modification of cluster's name
   in write_bitmap1.
3. make update_super1 can change the name in mdp_superblock_1.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Assemble.c |  5 +++++
 Grow.c     |  2 +-
 mdadm.c    |  3 +++
 mdadm.h    |  7 ++++++-
 super0.c   |  4 ++--
 super1.c   | 43 ++++++++++++++++++++++++++++++++++++++++---
 6 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index 25a103d..e1b846c 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -644,6 +644,11 @@ static int load_devices(struct devs *devices, char *devmap,
 				*stp = st;
 				return -1;
 			}
+			if (strcmp(c->update, "home-cluster") == 0) {
+				err = tst->ss->update_super(tst, content, c->update,
+							    devname, 0, 0, c->homecluster);
+				tst->ss->write_bitmap(tst, dfd, NameUpdate);
+			}
 			if (strcmp(c->update, "uuid")==0 &&
 			    !ident->uuid_set) {
 				ident->uuid_set = 1;
diff --git a/Grow.c b/Grow.c
index 1122cec..bf44e66 100644
--- a/Grow.c
+++ b/Grow.c
@@ -420,7 +420,7 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
 						    bitmapsize, offset_setable,
 						    major)
 						)
-						st->ss->write_bitmap(st, fd2);
+						st->ss->write_bitmap(st, fd2, NoUpdate);
 					else {
 						pr_err("failed to create internal bitmap - chunksize problem.\n");
 						close(fd2);
diff --git a/mdadm.c b/mdadm.c
index 56fdeb7..22f4fc7 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -598,6 +598,7 @@ int main(int argc, char *argv[])
 			}
 			continue;
 		case O(CREATE, ClusterName):
+		case O(ASSEMBLE, ClusterName):
 			c.homecluster = optarg;
 			if (strlen(c.homecluster) > 64) {
 				pr_err("Cluster name too big.\n");
@@ -741,6 +742,8 @@ int main(int argc, char *argv[])
 				continue;
 			if (strcmp(c.update, "homehost")==0)
 				continue;
+			if (strcmp(c.update, "home-cluster")==0)
+				continue;
 			if (strcmp(c.update, "devicesize")==0)
 				continue;
 			if (strcmp(c.update, "no-bitmap")==0)
diff --git a/mdadm.h b/mdadm.h
index 00c726e..d8b0749 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -354,6 +354,11 @@ enum prefix_standard {
 	IEC
 };
 
+enum bitmap_update {
+    NoUpdate,
+    NameUpdate,
+};
+
 /* structures read from config file */
 /* List of mddevice names and identifiers
  * Identifiers can be:
@@ -850,7 +855,7 @@ extern struct superswitch {
 	/* if add_internal_bitmap succeeded for existing array, this
 	 * writes it out.
 	 */
-	int (*write_bitmap)(struct supertype *st, int fd);
+	int (*write_bitmap)(struct supertype *st, int fd, enum bitmap_update update);
 	/* Free the superblock and any other allocated data */
 	void (*free_super)(struct supertype *st);
 
diff --git a/super0.c b/super0.c
index deb5999..6ad9d39 100644
--- a/super0.c
+++ b/super0.c
@@ -900,7 +900,7 @@ static int write_init_super0(struct supertype *st)
 		rv = store_super0(st, di->fd);
 
 		if (rv == 0 && (sb->state & (1<<MD_SB_BITMAP_PRESENT)))
-			rv = st->ss->write_bitmap(st, di->fd);
+			rv = st->ss->write_bitmap(st, di->fd, NoUpdate);
 
 		if (rv)
 			pr_err("failed to write superblock to %s\n",
@@ -1175,7 +1175,7 @@ static void locate_bitmap0(struct supertype *st, int fd)
 	lseek64(fd, offset, 0);
 }
 
-static int write_bitmap0(struct supertype *st, int fd)
+static int write_bitmap0(struct supertype *st, int fd, enum bitmap_update update)
 {
 	unsigned long long dsize;
 	unsigned long long offset;
diff --git a/super1.c b/super1.c
index fd728d2..07944d4 100644
--- a/super1.c
+++ b/super1.c
@@ -1073,7 +1073,23 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
 		info->name[32] = 0;
 	}
 
-	if (strcmp(update, "force-one")==0) {
+	if (strcmp(update, "home-cluster") == 0 &&
+	    homehost) {
+		/* Note that 'home-cluster' is to change the name of cluster,
+		 * it is another "name" update.
+		 */
+		char *new_name = xmalloc(sizeof(sb->set_name));
+		if (strrchr(sb->set_name, ':')) {
+			strcpy(new_name, strchr(sb->set_name, ':'));
+		}
+
+		memset(sb->set_name, 0, sizeof(sb->set_name));
+		strcpy(sb->set_name, homehost);
+		if (new_name)
+			strcat(sb->set_name, new_name);
+
+		free(new_name);
+	} else if (strcmp(update, "force-one")==0) {
 		/* Not enough devices for a working array,
 		 * so bring this one up-to-date
 		 */
@@ -1691,7 +1707,7 @@ static int write_init_super1(struct supertype *st)
 		sb->sb_csum = calc_sb_1_csum(sb);
 		rv = store_super1(st, di->fd);
 		if (rv == 0 && (__le32_to_cpu(sb->feature_map) & 1))
-			rv = st->ss->write_bitmap(st, di->fd);
+			rv = st->ss->write_bitmap(st, di->fd, NoUpdate);
 		close(di->fd);
 		di->fd = -1;
 		if (rv)
@@ -2175,7 +2191,7 @@ static void locate_bitmap1(struct supertype *st, int fd)
 	lseek64(fd, offset<<9, 0);
 }
 
-static int write_bitmap1(struct supertype *st, int fd)
+static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update)
 {
 	struct mdp_superblock_1 *sb = st->sb;
 	bitmap_super_t *bms = (bitmap_super_t*)(((char*)sb)+MAX_SB_SIZE);
@@ -2184,6 +2200,27 @@ static int write_bitmap1(struct supertype *st, int fd)
 	int towrite, n;
 	struct align_fd afd;
 	unsigned int i = 0;
+	char *new_name;
+
+	switch (update) {
+	case NameUpdate:
+	    new_name = xmalloc(sizeof(sb->set_name));
+
+	    strncpy(new_name, sb->set_name, sizeof(sb->set_name));
+	    memset((char *)bms->cluster_name, 0, sizeof(bms->cluster_name));
+
+	    if (strtok(new_name, ":"))
+		strncpy((char *)bms->cluster_name, new_name, strlen(sb->set_name));
+	    else
+		/* In case the original set_name doesn't like aaa:md* */
+		strncpy((char *)bms->cluster_name, sb->set_name, strlen(sb->set_name));
+
+	    free(new_name);
+	    break;
+	case NoUpdate:
+	default:
+	    break;
+	}
 
 	init_afd(&afd, fd);
 
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 09/11] mdadm: change the num of cluster node
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
                   ` (7 preceding siblings ...)
  2015-05-20  3:20 ` [PATCH V3 08/11] mdadm: add the ability to change cluster name Guoqing Jiang
@ 2015-05-20  3:20 ` Guoqing Jiang
  2015-05-25  4:56   ` NeilBrown
  2015-05-20  3:20 ` [PATCH V3 10/11] Reuse calc_bitmap_size to reduce code size Guoqing Jiang
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

This extends nodes option for assemble mode, make the num of
cluster node could be change by user.

Before that, it is necessary to ensure there are enough space
for those nodes, calc_bitmap_size is introduced to calculate
the bitmap size of each node.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Assemble.c |  4 ++++
 ReadMe.c   |  2 +-
 mdadm.c    |  3 +++
 mdadm.h    |  1 +
 super1.c   | 37 +++++++++++++++++++++++++++++++++++++
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/Assemble.c b/Assemble.c
index e1b846c..9ff546b 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -626,6 +626,10 @@ static int load_devices(struct devs *devices, char *devmap,
 
 			if (strcmp(c->update, "byteorder") == 0)
 				err = 0;
+			else if (strcmp(c->update, "nodes") == 0) {
+				tst->nodes = c->nodes;
+				err = tst->ss->write_bitmap(tst, dfd, NodeNumUpdate);
+			}
 			else
 				err = tst->ss->update_super(tst, content, c->update,
 							    devname, c->verbose,
diff --git a/ReadMe.c b/ReadMe.c
index c854cd5..d1830e1 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -140,7 +140,7 @@ struct option long_options[] = {
     {"homehost",  1, 0,  HomeHost},
     {"symlinks",  1, 0,  Symlinks},
     {"data-offset",1, 0, DataOffset},
-    {"nodes",1, 0, Nodes},
+    {"nodes",1, 0, Nodes}, /* also for --assemble */
     {"home-cluster",1, 0, ClusterName},
 
     /* For assemble */
diff --git a/mdadm.c b/mdadm.c
index 22f4fc7..87c572d 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -589,6 +589,7 @@ int main(int argc, char *argv[])
 			}
 			ident.raid_disks = s.raiddisks;
 			continue;
+		case O(ASSEMBLE, Nodes):
 		case O(CREATE, Nodes):
 			c.nodes = parse_num(optarg);
 			if (c.nodes <= 0) {
@@ -744,6 +745,8 @@ int main(int argc, char *argv[])
 				continue;
 			if (strcmp(c.update, "home-cluster")==0)
 				continue;
+			if (strcmp(c.update, "nodes")==0)
+				continue;
 			if (strcmp(c.update, "devicesize")==0)
 				continue;
 			if (strcmp(c.update, "no-bitmap")==0)
diff --git a/mdadm.h b/mdadm.h
index d8b0749..97892e6 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -357,6 +357,7 @@ enum prefix_standard {
 enum bitmap_update {
     NoUpdate,
     NameUpdate,
+    NodeNumUpdate,
 };
 
 /* structures read from config file */
diff --git a/super1.c b/super1.c
index 07944d4..fe12e81 100644
--- a/super1.c
+++ b/super1.c
@@ -134,6 +134,20 @@ struct misc_dev_info {
 					|MD_FEATURE_NEW_OFFSET		\
 					)
 
+/* return how many bytes are needed for bitmap, for cluster-md each node
+ * should have it's own bitmap */
+static unsigned int calc_bitmap_size(bitmap_super_t *bms, unsigned int boundary)
+{
+	unsigned long long bits, bytes;
+
+	bits = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
+	bytes = (bits+7) >> 3;
+	bytes += sizeof(bitmap_super_t);
+	bytes = ROUND_UP(bytes, boundary);
+
+	return bytes;
+}
+
 static unsigned int calc_sb_1_csum(struct mdp_superblock_1 * sb)
 {
 	unsigned int disk_csum, csum;
@@ -2201,6 +2215,7 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
 	struct align_fd afd;
 	unsigned int i = 0;
 	char *new_name;
+	unsigned long long total_bm_space, bm_space_per_node;
 
 	switch (update) {
 	case NameUpdate:
@@ -2217,6 +2232,28 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
 
 	    free(new_name);
 	    break;
+	case NodeNumUpdate:
+	    /* cluster md only supports superblock 1.2 now */
+	    if (st->minor_version != 2) {
+		pr_err("Warning: cluster md only works with superblock 1.2\n");
+		return -EINVAL;
+	    }
+
+	    /* Each node has an independent bitmap, it is necessary to calculate the
+	     * space is enough or not, first get how many bytes for the total bitmap */
+	    bm_space_per_node = calc_bitmap_size(bms, 4096);
+
+	    total_bm_space = 512 * (__le64_to_cpu(sb->data_offset) - __le64_to_cpu(sb->super_offset));
+	    total_bm_space = total_bm_space - 4096; /* leave another 4k for superblock */
+
+	    if (bm_space_per_node * st->nodes > total_bm_space) {
+		pr_err("Warning: The max num of nodes can't exceed %llu\n",
+			total_bm_space / bm_space_per_node);
+		return -ENOMEM;
+	    }
+
+	    bms->nodes = __cpu_to_le32(st->nodes);
+	    break;
 	case NoUpdate:
 	default:
 	    break;
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 10/11] Reuse calc_bitmap_size to reduce code size
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
                   ` (8 preceding siblings ...)
  2015-05-20  3:20 ` [PATCH V3 09/11] mdadm: change the num of cluster node Guoqing Jiang
@ 2015-05-20  3:20 ` Guoqing Jiang
  2015-05-20  3:20 ` [PATCH V3 11/11] Reuse the write_bitmap for update uuid Guoqing Jiang
  2015-05-25  5:03 ` [PATCH V3 00/11] mdadm tool: add the support for cluster-md NeilBrown
  11 siblings, 0 replies; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

We can use the new added calc_bitmap_size func to
remove some redundant lines.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 super1.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/super1.c b/super1.c
index fe12e81..b36b702 100644
--- a/super1.c
+++ b/super1.c
@@ -695,12 +695,8 @@ static int copy_metadata1(struct supertype *st, int from, int to)
 				/* have the header, can calculate
 				 * correct bitmap bytes */
 				bitmap_super_t *bms;
-				int bits;
 				bms = (void*)buf;
-				bits = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
-				bytes = (bits+7) >> 3;
-				bytes += sizeof(bitmap_super_t);
-				bytes = ROUND_UP(bytes, 512);
+				bytes = calc_bitmap_size(bms, 512);
 				if (n > bytes)
 					n =  bytes;
 			}
@@ -2276,11 +2272,7 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
 			memset(buf, 0xff, 4096);
 		memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
 
-		towrite = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
-		towrite = (towrite+7) >> 3; /* bits to bytes */
-		towrite += sizeof(bitmap_super_t);
-		/* we need the bitmaps to be at 4k boundary */
-		towrite = ROUND_UP(towrite, 4096);
+		towrite = calc_bitmap_size(bms, 4096);
 		while (towrite > 0) {
 			n = towrite;
 			if (n > 4096)
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 11/11] Reuse the write_bitmap for update uuid
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
                   ` (9 preceding siblings ...)
  2015-05-20  3:20 ` [PATCH V3 10/11] Reuse calc_bitmap_size to reduce code size Guoqing Jiang
@ 2015-05-20  3:20 ` Guoqing Jiang
  2015-05-25  4:59   ` NeilBrown
  2015-05-25  5:03 ` [PATCH V3 00/11] mdadm tool: add the support for cluster-md NeilBrown
  11 siblings, 1 reply; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-20  3:20 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, rgoldwyn

Since write_bitmap is extended for handle different sistuations,
then it also could possible to support change the uuid of bitmap,
and remove bitmap_update_uuid accordingly.

Q: is the write_bitmap0 also impacted?

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 Assemble.c |  5 ++---
 bitmap.c   | 20 --------------------
 mdadm.h    |  2 +-
 super1.c   |  4 ++++
 4 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index 9ff546b..fabca5c 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -664,9 +664,8 @@ static int load_devices(struct devs *devices, char *devmap,
 
 			if (strcmp(c->update, "uuid")==0 &&
 			    ident->bitmap_fd >= 0 && !bitmap_done) {
-				if (bitmap_update_uuid(ident->bitmap_fd,
-						       content->uuid,
-						       tst->ss->swapuuid) != 0)
+				copy_uuid(tst->devs->uuid, content->uuid, tst->ss->swapuuid);
+				if (tst->ss->write_bitmap(tst, dfd, UUIDUpdate))
 					pr_err("Could not update uuid on external bitmap.\n");
 				else
 					bitmap_done = 1;
diff --git a/bitmap.c b/bitmap.c
index bccc67c..7df296e 100644
--- a/bitmap.c
+++ b/bitmap.c
@@ -462,23 +462,3 @@ out:
 		unlink(filename); /* possibly corrupted, better get rid of it */
 	return rv;
 }
-
-int bitmap_update_uuid(int fd, int *uuid, int swap)
-{
-	struct bitmap_super_s bm;
-	if (lseek(fd, 0, 0) != 0)
-		return 1;
-	if (read(fd, &bm, sizeof(bm)) != sizeof(bm))
-		return 1;
-	if (bm.magic != __cpu_to_le32(BITMAP_MAGIC))
-		return 1;
-	copy_uuid(bm.uuid, uuid, swap);
-	if (lseek(fd, 0, 0) != 0)
-		return 2;
-	if (write(fd, &bm, sizeof(bm)) != sizeof(bm)) {
-		lseek(fd, 0, 0);
-		return 2;
-	}
-	lseek(fd, 0, 0);
-	return 0;
-}
diff --git a/mdadm.h b/mdadm.h
index 97892e6..7b9bb28 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -358,6 +358,7 @@ enum bitmap_update {
     NoUpdate,
     NameUpdate,
     NodeNumUpdate,
+    UUIDUpdate,
 };
 
 /* structures read from config file */
@@ -1273,7 +1274,6 @@ extern int CreateBitmap(char *filename, int force, char uuid[16],
 			int major);
 extern int ExamineBitmap(char *filename, int brief, struct supertype *st);
 extern int Write_rules(char *rule_name);
-extern int bitmap_update_uuid(int fd, int *uuid, int swap);
 extern unsigned long bitmap_sectors(struct bitmap_super_s *bsb);
 extern int Dump_metadata(char *dev, char *dir, struct context *c,
 			 struct supertype *st);
diff --git a/super1.c b/super1.c
index b36b702..75e1081 100644
--- a/super1.c
+++ b/super1.c
@@ -2250,6 +2250,10 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
 
 	    bms->nodes = __cpu_to_le32(st->nodes);
 	    break;
+	case UUIDUpdate:
+	    memset((char *)bms->uuid, 0, sizeof(bms->uuid));
+	    strncpy((char *)bms->uuid, (char *)st->devs->uuid, sizeof(bms->uuid));
+	    break;
 	case NoUpdate:
 	default:
 	    break;
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 02/11] Add nodes option while creating md
  2015-05-20  3:20 ` [PATCH V3 02/11] Add nodes option while creating md Guoqing Jiang
@ 2015-05-25  4:13   ` NeilBrown
  0 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2015-05-25  4:13 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, rgoldwyn

[-- Attachment #1: Type: text/plain, Size: 5036 bytes --]

On Wed, 20 May 2015 11:20:34 +0800 Guoqing Jiang <gqjiang@suse.com> wrote:

> Specifies the maximum number of nodes in the cluster that may use
> this device simultaneously. This is equivalent to the number of
> bitmaps created in the internal superblock (patches to follow).
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  Create.c   |  1 +
>  ReadMe.c   |  1 +
>  mdadm.8.in |  6 ++++++
>  mdadm.c    | 34 +++++++++++++++++++++++++++++++++-
>  mdadm.h    |  3 +++
>  super1.c   |  1 +
>  6 files changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/Create.c b/Create.c
> index 69f5432..e4577af 100644
> --- a/Create.c
> +++ b/Create.c
> @@ -531,6 +531,7 @@ int Create(struct supertype *st, char *mddev,
>  				st->ss->name);
>  		warn = 1;
>  	}
> +	st->nodes = c->nodes;
>  
>  	if (warn) {
>  		if (c->runstop!= 1) {
> diff --git a/ReadMe.c b/ReadMe.c
> index 87a4916..30c569d 100644
> --- a/ReadMe.c
> +++ b/ReadMe.c
> @@ -140,6 +140,7 @@ struct option long_options[] = {
>      {"homehost",  1, 0,  HomeHost},
>      {"symlinks",  1, 0,  Symlinks},
>      {"data-offset",1, 0, DataOffset},
> +    {"nodes",1, 0, Nodes},
>  
>      /* For assemble */
>      {"uuid",      1, 0, 'u'},
> diff --git a/mdadm.8.in b/mdadm.8.in
> index 4aec0db..9c1497e 100644
> --- a/mdadm.8.in
> +++ b/mdadm.8.in
> @@ -971,6 +971,12 @@ However for RAID0, it is not possible to add spares.  So to increase
>  the number of devices in a RAID0, it is necessary to set the new
>  number of devices, and to add the new devices, in the same command.
>  
> +.TP
> +.BR \-\-nodes
> +Only works when the array is for clustered environment. It specify the
> +maximum number of nodes in the cluster that will use this device
> +simultaneously. If not specified, this defaults to 4.
> +

"It specifies the maximum"...

>  .SH For assemble:
>  
>  .TP
> diff --git a/mdadm.c b/mdadm.c
> index 3e8c49b..15a43d2 100644
> --- a/mdadm.c
> +++ b/mdadm.c
> @@ -588,7 +588,14 @@ int main(int argc, char *argv[])
>  			}
>  			ident.raid_disks = s.raiddisks;
>  			continue;
> -
> +		case O(CREATE, Nodes):
> +			c.nodes = parse_num(optarg);
> +			if (c.nodes <= 0) {
> +				pr_err("invalid number for the number of "
> +						"cluster nodes: %s\n", optarg);

Please don't break strings on to multiple lines.
    pr_err("invalid number for the number of cluster node: %s\n",
           optarg);

it doesn't matter if it exceeds 80 columns for this case.


> +				exit(2);
> +			}
> +			continue;
>  		case O(CREATE,'x'): /* number of spare (eXtra) disks */
>  			if (s.sparedisks) {
>  				pr_err("spare-devices set twice: %d and %s\n",
> @@ -1097,6 +1104,15 @@ int main(int argc, char *argv[])
>  				s.bitmap_file = optarg;
>  				continue;
>  			}
> +			if (strcmp(optarg, "clustered")== 0) {
> +			    s.bitmap_file = optarg;
> +			    /* Set the default number of cluster nodes
> +			     * to 4 if not already set by user
> +			     */
> +			    if (c.nodes < 1)
> +				    c.nodes = 4;
> +			    continue;
> +			}
>  			/* probable typo */
>  			pr_err("bitmap file must contain a '/', or be 'internal', or 'none'\n"
>  				"       not '%s'\n", optarg);
> @@ -1377,6 +1393,22 @@ int main(int argc, char *argv[])
>  	case CREATE:
>  		if (c.delay == 0)
>  			c.delay = DEFAULT_BITMAP_DELAY;
> +
> +		if (strcmp(s.bitmap_file, "clustered") == 0) {
> +			if (s.level != 1) {
> +			    pr_err("--bitmap=clustered is currently supported with RAID mirror only\n");
> +			    rv = 1;
> +			    break;
> +			}
> +		} else {
> +			if (c.nodes) {
> +				pr_err("--nodes argument is incompatible with --bitmap=%s.\n",
> +					s.bitmap_file);
> +				rv = 1;
> +				break;
> +			}

That's a bit careless....

 ./mdadm -C /dev/md0 -l1 -n2 --nodes=5 /dev/loop[01]
 Segmentation fault


NeilBrown



> +		}
> +
>  		if (s.write_behind && !s.bitmap_file) {
>  			pr_err("write-behind mode requires a bitmap.\n");
>  			rv = 1;
> diff --git a/mdadm.h b/mdadm.h
> index 141f963..9d55801 100644
> --- a/mdadm.h
> +++ b/mdadm.h
> @@ -344,6 +344,7 @@ enum special_options {
>  	Dump,
>  	Restore,
>  	Action,
> +	Nodes,
>  };
>  
>  enum prefix_standard {
> @@ -418,6 +419,7 @@ struct context {
>  	char	*backup_file;
>  	int	invalid_backup;
>  	char	*action;
> +	int	nodes;
>  };
>  
>  struct shape {
> @@ -1029,6 +1031,7 @@ struct supertype {
>  			 */
>  	int devcnt;
>  	int retry_soon;
> +	int nodes;
>  
>  	struct mdinfo *devs;
>  
> diff --git a/super1.c b/super1.c
> index 7928a3d..78d98a7 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -2144,6 +2144,7 @@ add_internal_bitmap1(struct supertype *st,
>  	bms->daemon_sleep = __cpu_to_le32(delay);
>  	bms->sync_size = __cpu_to_le64(size);
>  	bms->write_behind = __cpu_to_le32(write_behind);
> +	bms->nodes = __cpu_to_le32(st->nodes);
>  
>  	*chunkp = chunk;
>  	return 1;


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 03/11] home-cluster while creating an array
  2015-05-20  3:20 ` [PATCH V3 03/11] home-cluster while creating an array Guoqing Jiang
@ 2015-05-25  4:19   ` NeilBrown
  0 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2015-05-25  4:19 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, rgoldwyn

[-- Attachment #1: Type: text/plain, Size: 9550 bytes --]

On Wed, 20 May 2015 11:20:35 +0800 Guoqing Jiang <gqjiang@suse.com> wrote:

> The home-cluster is stored in the bitmap super block of the
> array. The device can be assembled on a cluster with the
> cluster name same as the one recorded in the bitmap.
> 
> If home-cluster is not specified, this is auto-detected using
> dlopen corosync cmap library.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  Create.c   |  1 +
>  Makefile   |  1 +
>  ReadMe.c   |  1 +
>  config.c   | 27 ++++++++++++++++++++++++++-
>  mdadm.8.in |  6 ++++++
>  mdadm.c    | 25 +++++++++++++++++++++++++
>  mdadm.h    |  5 +++++
>  super1.c   |  3 +++
>  util.c     | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  9 files changed, 118 insertions(+), 1 deletion(-)
> 
> diff --git a/Create.c b/Create.c
> index e4577af..9663dc4 100644
> --- a/Create.c
> +++ b/Create.c
> @@ -532,6 +532,7 @@ int Create(struct supertype *st, char *mddev,
>  		warn = 1;
>  	}
>  	st->nodes = c->nodes;
> +	st->cluster_name = c->homecluster;
>  
>  	if (warn) {
>  		if (c->runstop!= 1) {
> diff --git a/Makefile b/Makefile
> index a7d8c5c..431f08b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -101,6 +101,7 @@ endif
>  # If you want a static binary, you might uncomment these
>  # LDFLAGS = -static
>  # STRIP = -s
> +LDLIBS=-ldl
>  
>  INSTALL = /usr/bin/install
>  DESTDIR =
> diff --git a/ReadMe.c b/ReadMe.c
> index 30c569d..c6286ae 100644
> --- a/ReadMe.c
> +++ b/ReadMe.c
> @@ -141,6 +141,7 @@ struct option long_options[] = {
>      {"symlinks",  1, 0,  Symlinks},
>      {"data-offset",1, 0, DataOffset},
>      {"nodes",1, 0, Nodes},
> +    {"home-cluster",1, 0, ClusterName},
>  
>      /* For assemble */
>      {"uuid",      1, 0, 'u'},
> diff --git a/config.c b/config.c
> index 7342c42..21b6afd 100644
> --- a/config.c
> +++ b/config.c
> @@ -77,7 +77,7 @@ char DefaultAltConfFile[] = CONFFILE2;
>  char DefaultAltConfDir[] = CONFFILE2 ".d";
>  
>  enum linetype { Devices, Array, Mailaddr, Mailfrom, Program, CreateDev,
> -		Homehost, AutoMode, Policy, PartPolicy, LTEnd };
> +		Homehost, HomeCluster, AutoMode, Policy, PartPolicy, LTEnd };
>  char *keywords[] = {
>  	[Devices]  = "devices",
>  	[Array]    = "array",
> @@ -86,6 +86,7 @@ char *keywords[] = {
>  	[Program]  = "program",
>  	[CreateDev]= "create",
>  	[Homehost] = "homehost",
> +	[HomeCluster] = "homecluster",
>  	[AutoMode] = "auto",
>  	[Policy]   = "policy",
>  	[PartPolicy]="part-policy",
> @@ -562,6 +563,21 @@ void homehostline(char *line)
>  	}
>  }
>  
> +static char *home_cluster = NULL;
> +void homeclusterline(char *line)
> +{
> +	char *w;
> +
> +	for (w=dl_next(line); w != line ; w=dl_next(w)) {
> +		if (home_cluster == NULL) {
> +			if (strcasecmp(w, "<none>")==0)
> +				home_cluster = xstrdup("");
> +			else
> +				home_cluster = xstrdup(w);
> +		}
> +	}
> +}
> +
>  char auto_yes[] = "yes";
>  char auto_no[] = "no";
>  char auto_homehost[] = "homehost";
> @@ -724,6 +740,9 @@ void conf_file(FILE *f)
>  		case Homehost:
>  			homehostline(line);
>  			break;
> +		case HomeCluster:
> +			homeclusterline(line);
> +			break;
>  		case AutoMode:
>  			autoline(line);
>  			break;
> @@ -884,6 +903,12 @@ char *conf_get_homehost(int *require_homehostp)
>  	return home_host;
>  }
>  
> +char *conf_get_homecluster(void)
> +{
> +	load_conffile();
> +	return home_cluster;
> +}
> +
>  struct createinfo *conf_get_create_info(void)
>  {
>  	load_conffile();
> diff --git a/mdadm.8.in b/mdadm.8.in
> index 9c1497e..56d9bcc 100644
> --- a/mdadm.8.in
> +++ b/mdadm.8.in
> @@ -415,6 +415,12 @@ This functionality is currently only provided by
>  and
>  .BR \-\-monitor .
>  
> +.TP
> +.B \-\-home\-cluster=
> +specifies the cluster name for the md device. The md device can be assembled
> +only on the cluster which matches the name specified. If this option is not
> +provided, mdadm tried to detect the cluster name automatically.

"... mdadm tries to detect...."   s, not d.


> +
>  .SH For create, build, or grow:
>  
>  .TP
> diff --git a/mdadm.c b/mdadm.c
> index 15a43d2..8f567d8 100644
> --- a/mdadm.c
> +++ b/mdadm.c
> @@ -596,6 +596,13 @@ int main(int argc, char *argv[])
>  				exit(2);
>  			}
>  			continue;
> +		case O(CREATE, ClusterName):
> +			c.homecluster = optarg;
> +			if (strlen(c.homecluster) > 64) {
> +				pr_err("Cluster name too big.\n");
> +				exit(ERANGE);
> +			}
> +			continue;
>  		case O(CREATE,'x'): /* number of spare (eXtra) disks */
>  			if (s.sparedisks) {
>  				pr_err("spare-devices set twice: %d and %s\n",
> @@ -1276,6 +1283,18 @@ int main(int argc, char *argv[])
>  		c.require_homehost = 0;
>  	}
>  
> +	if (c.homecluster == NULL && (c.nodes > 0)) {
> +		c.homecluster = conf_get_homecluster();
> +		if (c.homecluster == NULL)
> +			rv = get_cluster_name(&c.homecluster);
> +		if (rv == 0) {
> +			c.homehost = xstrdup(c.homecluster);
> +			/* Add a : to differentiate between a host
> +			 * and a cluster */
> +			strcat(c.homehost, ":");
> +		}

That ':' will be copied over the last byte allocated by 'strdup', and the
'\0' will be written beyond the memory allocated by strdup.  That could
corrupt something.


> +	}
> +
>  	if (c.backup_file && data_offset != INVALID_SECTORS) {
>  		pr_err("--backup-file and --data-offset are incompatible\n");
>  		exit(2);
> @@ -1407,6 +1426,12 @@ int main(int argc, char *argv[])
>  				rv = 1;
>  				break;
>  			}
> +			if (c.homecluster) {
> +				pr_err("--home-cluster argument is incompatible with --bitmap=%s.\n",
> +					s.bitmap_file);
> +				rv = 1;
> +				break;
> +			}

Better make sure s.bitmap_file is not NULL here too.


Thanks,
NeilBrown


>  		}
>  
>  		if (s.write_behind && !s.bitmap_file) {
> diff --git a/mdadm.h b/mdadm.h
> index 9d55801..f56d9d6 100644
> --- a/mdadm.h
> +++ b/mdadm.h
> @@ -345,6 +345,7 @@ enum special_options {
>  	Restore,
>  	Action,
>  	Nodes,
> +	ClusterName,
>  };
>  
>  enum prefix_standard {
> @@ -420,6 +421,7 @@ struct context {
>  	int	invalid_backup;
>  	char	*action;
>  	int	nodes;
> +	char	*homecluster;
>  };
>  
>  struct shape {
> @@ -1032,6 +1034,7 @@ struct supertype {
>  	int devcnt;
>  	int retry_soon;
>  	int nodes;
> +	char *cluster_name;
>  
>  	struct mdinfo *devs;
>  
> @@ -1308,6 +1311,7 @@ extern char *conf_get_mailaddr(void);
>  extern char *conf_get_mailfrom(void);
>  extern char *conf_get_program(void);
>  extern char *conf_get_homehost(int *require_homehostp);
> +extern char *conf_get_homecluster(void);
>  extern char *conf_line(FILE *file);
>  extern char *conf_word(FILE *file, int allow_key);
>  extern void print_quoted(char *str);
> @@ -1416,6 +1420,7 @@ extern char *stat2devnm(struct stat *st);
>  extern char *fd2devnm(int fd);
>  
>  extern int in_initrd(void);
> +extern int get_cluster_name(char **name);
>  
>  #define _ROUND_UP(val, base)	(((val) + (base) - 1) & ~(base - 1))
>  #define ROUND_UP(val, base)	_ROUND_UP(val, (typeof(val))(base))
> diff --git a/super1.c b/super1.c
> index 78d98a7..60f470b 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -2145,6 +2145,9 @@ add_internal_bitmap1(struct supertype *st,
>  	bms->sync_size = __cpu_to_le64(size);
>  	bms->write_behind = __cpu_to_le32(write_behind);
>  	bms->nodes = __cpu_to_le32(st->nodes);
> +	if (st->cluster_name)
> +	    strncpy((char *)bms->cluster_name,
> +		    st->cluster_name, strlen(st->cluster_name));
>  
>  	*chunkp = chunk;
>  	return 1;
> diff --git a/util.c b/util.c
> index cc98d3b..ed9a745 100644
> --- a/util.c
> +++ b/util.c
> @@ -34,6 +34,8 @@
>  #include	<ctype.h>
>  #include	<dirent.h>
>  #include	<signal.h>
> +#include	<dlfcn.h>
> +#include	<corosync/cmap.h>
>  
>  /*
>   * following taken from linux/blkpg.h because they aren't
> @@ -1976,3 +1978,51 @@ void reopen_mddev(int mdfd)
>  	if (fd >= 0 && fd != mdfd)
>  		dup2(fd, mdfd);
>  }
> +
> +int get_cluster_name(char **cluster_name)
> +{
> +        void *lib_handle = NULL;
> +        int rv = -1;
> +
> +        cmap_handle_t handle;
> +        static int (*initialize)(cmap_handle_t *handle);
> +        static int (*get_string)(cmap_handle_t handle,
> +                        const char *string,
> +                        char **name);
> +        static int (*finalize)(cmap_handle_t handle);
> +
> +
> +        lib_handle = dlopen("libcmap.so.4", RTLD_NOW | RTLD_LOCAL);
> +        if (!lib_handle)
> +                return rv;
> +
> +        initialize = dlsym(lib_handle, "cmap_initialize");
> +        if (!initialize)
> +                goto out;
> +
> +        get_string = dlsym(lib_handle, "cmap_get_string");
> +        if (!get_string)
> +                goto out;
> +
> +        finalize = dlsym(lib_handle, "cmap_finalize");
> +        if (!finalize)
> +                goto out;
> +
> +        rv = initialize(&handle);
> +        if (rv != CS_OK)
> +                goto out;
> +
> +        rv = get_string(handle, "totem.cluster_name", cluster_name);
> +        if (rv != CS_OK) {
> +                free(*cluster_name);
> +                rv = -1;
> +                goto name_err;
> +        }
> +
> +        rv = 0;
> +name_err:
> +        finalize(handle);
> +out:
> +        dlclose(lib_handle);
> +        return rv;
> +}


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 04/11] Show all bitmaps while examining bitmap
  2015-05-20  3:20 ` [PATCH V3 04/11] Show all bitmaps while examining bitmap Guoqing Jiang
@ 2015-05-25  4:23   ` NeilBrown
  0 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2015-05-25  4:23 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, rgoldwyn

[-- Attachment #1: Type: text/plain, Size: 4817 bytes --]

On Wed, 20 May 2015 11:20:36 +0800 Guoqing Jiang <gqjiang@suse.com> wrote:

> This adds capability of exmining bitmaps corresponding to all
> nodes/slots on the device.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  bitmap.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 54 insertions(+), 18 deletions(-)
> 
> diff --git a/bitmap.c b/bitmap.c
> index 920033a..bccc67c 100644
> --- a/bitmap.c
> +++ b/bitmap.c
> @@ -260,7 +260,7 @@ int ExamineBitmap(char *filename, int brief, struct supertype *st)
>  	int rv = 1;
>  	char buf[64];
>  	int swap;
> -	int fd;
> +	int fd, i;
>  	__u32 uuid32[4];
>  
>  	fd = bitmap_file_open(filename, &st);
> @@ -317,23 +317,59 @@ int ExamineBitmap(char *filename, int brief, struct supertype *st)
>  		       uuid32[2],
>  		       uuid32[3]);
>  
> -	printf("          Events : %llu\n", (unsigned long long)sb->events);
> -	printf("  Events Cleared : %llu\n", (unsigned long long)sb->events_cleared);
> -	printf("           State : %s\n", bitmap_state(sb->state));
> -	printf("       Chunksize : %s\n", human_chunksize(sb->chunksize));
> -	printf("          Daemon : %ds flush period\n", sb->daemon_sleep);
> -	if (sb->write_behind)
> -		sprintf(buf, "Allow write behind, max %d", sb->write_behind);
> -	else
> -		sprintf(buf, "Normal");
> -	printf("      Write Mode : %s\n", buf);
> -	printf("       Sync Size : %llu%s\n", (unsigned long long)sb->sync_size/2,
> -					human_size(sb->sync_size * 512));
> -	if (brief)
> -		goto free_info;
> -	printf("          Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",
> -			info->total_bits, info->dirty_bits,
> -			100.0 * info->dirty_bits / (info->total_bits?:1));
> +	if (sb->nodes == 0) {
> +		printf("          Events : %llu\n", (unsigned long long)sb->events);
> +		printf("  Events Cleared : %llu\n", (unsigned long long)sb->events_cleared);
> +		printf("           State : %s\n", bitmap_state(sb->state));
> +		printf("       Chunksize : %s\n", human_chunksize(sb->chunksize));
> +		printf("          Daemon : %ds flush period\n", sb->daemon_sleep);
> +		if (sb->write_behind)
> +			sprintf(buf, "Allow write behind, max %d", sb->write_behind);
> +		else
> +			sprintf(buf, "Normal");
> +		printf("      Write Mode : %s\n", buf);
> +		printf("       Sync Size : %llu%s\n", (unsigned long long)sb->sync_size/2,
> +						human_size(sb->sync_size * 512));
> +		if (brief)
> +			goto free_info;
> +		printf("          Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",
> +				info->total_bits, info->dirty_bits,
> +				100.0 * info->dirty_bits / (info->total_bits?:1));
> +	} else {
> +		printf("       Chunksize : %s\n", human_chunksize(sb->chunksize));
> +		printf("          Daemon : %ds flush period\n", sb->daemon_sleep);
> +		if (sb->write_behind)
> +			sprintf(buf, "Allow write behind, max %d", sb->write_behind);
> +		else
> +			sprintf(buf, "Normal");
> +		printf("      Write Mode : %s\n", buf);
> +		printf("       Sync Size : %llu%s\n", (unsigned long long)sb->sync_size/2,
> +						human_size(sb->sync_size * 512));

This is too much duplication.  Duplicate code leads to maintenance errors.

if (sb->nodes == 0)
    print Events/ Events cleared / state

print chunksuze, daemon flush period, write mode sync size

if (sb->nodes > 0) {
   print Events/ Events cleared etc for each bitmap 
}


> +		printf("   Cluster nodes : %d\n", sb->nodes);
> +		printf("    Cluster name : %s\n", sb->cluster_name);

sb->cluster_name is not nul terminated if it is exactly 64 bytes long.
You need "%64s" ... and you need to test that case.


> +		i = 0;
> +		do {

Please make this a regular
   for(i = 0; i < nodes; i++)
loop.  do/while should  only be used when some non-trivial computation is
needed before the condition can be tested.


> +			if (i) {
> +				free(info);
> +				info = bitmap_fd_read(fd, brief);
> +				sb = &info->sb;
> +			}
> +			if (sb->magic != BITMAP_MAGIC)
> +				pr_err("invalid bitmap magic 0x%x, the bitmap file appears to be corrupted\n", sb->magic);
> +
> +			printf("       Node Slot : %d\n", i);
> +			printf("          Events : %llu\n", (unsigned long long)sb->events);
> +			printf("  Events Cleared : %llu\n", (unsigned long long)sb->events_cleared);
> +			printf("           State : %s\n", bitmap_state(sb->state));
> +			if (brief)
> +				continue;
> +			printf("          Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",
> +					info->total_bits, info->dirty_bits,
> +					100.0 * info->dirty_bits / (info->total_bits?:1));
> +
> +		} while (++i < (int)sb->nodes);
> +	}
> +
>  free_info:
>  	free(info);
>  	return rv;

Thanks,
NeilBrown


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 05/11] Add a new clustered disk
  2015-05-20  3:20 ` [PATCH V3 05/11] Add a new clustered disk Guoqing Jiang
@ 2015-05-25  4:35   ` NeilBrown
  0 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2015-05-25  4:35 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, rgoldwyn

[-- Attachment #1: Type: text/plain, Size: 3569 bytes --]

On Wed, 20 May 2015 11:20:37 +0800 Guoqing Jiang <gqjiang@suse.com> wrote:

> A clustered disk is added by the traditional --add sequence.
> However, other nodes need to acknowledge that they can "see"
> the device. This is done by --cluster-confirm:
> 
> --cluster-confirm SLOTNUM:/dev/whatever (if disk is found)
> or
> --cluster-confirm SLOTNUM:missing (if disk is not found)
> 
> The node initiating the --add, has the disk state tagged with
> MD_DISK_CLUSTER_ADD and the one confirming tag the disk with
> MD_DISK_CANDIDATE.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  Manage.c   | 33 +++++++++++++++++++++++++++++----
>  ReadMe.c   |  1 +
>  md_p.h     |  7 +++++++
>  md_u.h     |  1 +
>  mdadm.8.in |  9 +++++++++
>  mdadm.c    |  4 ++++
>  mdadm.h    |  2 ++
>  util.c     | 10 ++++++++++
>  8 files changed, 63 insertions(+), 4 deletions(-)
> 
> diff --git a/Manage.c b/Manage.c
> index d3cfb55..4c3d451 100644
> --- a/Manage.c
> +++ b/Manage.c
> @@ -690,7 +690,8 @@ skip_re_add:
>  int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  	       struct supertype *tst, mdu_array_info_t *array,
>  	       int force, int verbose, char *devname,
> -	       char *update, unsigned long rdev, unsigned long long array_size)
> +	       char *update, unsigned long rdev, unsigned long long array_size,
> +	       int raid_slot)
>  {
>  	unsigned long long ldsize;
>  	struct supertype *dev_st = NULL;
> @@ -879,7 +880,10 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  	}
>  	disc.major = major(rdev);
>  	disc.minor = minor(rdev);
> -	disc.number =j;
> +	if (raid_slot < 0)
> +		disc.number = j;
> +	else
> +		disc.number = raid_slot;
>  	disc.state = 0;
>  	if (array->not_persistent==0) {
>  		int dfd;
> @@ -920,6 +924,14 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  			}
>  		free(used);
>  	}
> +
> +	if (array->state & (1 << MD_SB_CLUSTERED)) {
> +		if (dv->disposition == 'c')
> +			disc.state |= (1 << MD_DISK_CANDIDATE);
> +		else
> +			disc.state |= (1 << MD_DISK_CLUSTER_ADD);
> +	}
> +
>  	if (dv->writemostly == 1)
>  		disc.state |= (1 << MD_DISK_WRITEMOSTLY);
>  	if (tst->ss->external) {
> @@ -1239,6 +1251,7 @@ int Manage_subdevs(char *devname, int fd,
>  	 *        variant on 'A'
>  	 *  'F' - Another variant of 'A', where the device was faulty
>  	 *        so must be removed from the array first.
> +	 *  'c' - confirm the device as found (for clustered environments)
>  	 *
>  	 * For 'f' and 'r', the device can also be a kernel-internal
>  	 * name such as 'sdb'.
> @@ -1254,6 +1267,7 @@ int Manage_subdevs(char *devname, int fd,
>  	struct mdinfo info;
>  	int frozen = 0;
>  	int busy = 0;
> +	int raid_slot = -1;
>  
>  	if (ioctl(fd, GET_ARRAY_INFO, &array)) {
>  		pr_err("Cannot get array info for %s\n",
> @@ -1282,6 +1296,11 @@ int Manage_subdevs(char *devname, int fd,
>  		int rv;
>  		int mj,mn;
>  
> +		raid_slot = -1;
> +		if (dv->disposition == 'c')
> +			parse_cluster_confirm_arg(dv->devname, &dv->devname,
> +					&raid_slot);
> +

This function returns -1 if it couldn't successfully parse dv->devname, but
you aren't catching that error and reporting it.


NeilBrown


>  
> +int parse_cluster_confirm_arg(char *input, char **devname, int *slot)
> +{
> +	char *dev;
> +	*slot = strtoul(input, &dev, 10);
> +	if (dev == input || dev[0] != ':')
> +		return -1;
> +	*devname = dev+1;
> +	return 0;
> +}
> +

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 06/11] Convert a bitmap=none device to clustered
  2015-05-20  3:20 ` [PATCH V3 06/11] Convert a bitmap=none device to clustered Guoqing Jiang
@ 2015-05-25  4:40   ` NeilBrown
  0 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2015-05-25  4:40 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, rgoldwyn

[-- Attachment #1: Type: text/plain, Size: 2943 bytes --]

On Wed, 20 May 2015 11:20:38 +0800 Guoqing Jiang <gqjiang@suse.com> wrote:

> This adds the ability to convert a regular md without bitmap
> (--bitmap=none) to a clustered device (--bitmap=clustered).
> 
> To convert a device with --bitmap=internal or --bitmap=external,
> you have to convert to --bitmap=none and then re-execute the
> command with --bitmap=clustered.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  Grow.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/Grow.c b/Grow.c
> index 9a573fd..1122cec 100644
> --- a/Grow.c
> +++ b/Grow.c
> @@ -330,9 +330,16 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
>  			}
>  			return 0;
>  		}
> -		pr_err("Internal bitmap already present on %s\n",
> -			devname);
> -		return 1;
> +		if ((strcmp(s->bitmap_file, "clustered")==0) && (array.state & (1<<MD_SB_CLUSTERED))) {
> +			pr_err("Clustered bitmap already present on %s\n",
> +					devname);
> +			return 1;
> +		}
> +		if ((strcmp(s->bitmap_file, "internal")==0) && (!(array.state & (1<<MD_SB_CLUSTERED)))) {
> +			pr_err("Internal bitmap already present on %s\n",
> +					devname);
> +			return 1;
> +		}

You shouldn't be checking the value of s->bitmap_file here.  No matter what
value it has, there is some sort of bitmap present and you should report what
sort and return.


>  	}
>  
>  	if (strcmp(s->bitmap_file, "none") == 0) {
> @@ -375,7 +382,8 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
>  		free(st);
>  		return 1;
>  	}
> -	if (strcmp(s->bitmap_file, "internal") == 0) {
> +	if ((strcmp(s->bitmap_file, "internal") == 0) ||
> +		(strcmp(s->bitmap_file, "clustered") == 0)) {

Indentation is wrong.  It should be:

   if ((strcmp(......) == 0) ||
       (strcmp(......) == 0)) {

except that the extra parentheses are not needed. so

   if (strcmp(......) == 0 ||
       strcmp(......) == 0) {


Thanks,
NeilBrown

>  		int rv;
>  		int d;
>  		int offset_setable = 0;
> @@ -384,6 +392,8 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
>  			pr_err("Internal bitmaps not supported with %s metadata\n", st->ss->name);
>  			return 1;
>  		}
> +		st->nodes = c->nodes;
> +		st->cluster_name = c->homecluster;
>  		mdi = sysfs_read(fd, NULL, GET_BITMAP_LOCATION);
>  		if (mdi)
>  			offset_setable = 1;
> @@ -426,6 +436,8 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
>  			rv = sysfs_set_num_signed(mdi, NULL, "bitmap/location",
>  						  mdi->bitmap_offset);
>  		} else {
> +			if (strcmp(s->bitmap_file, "clustered") == 0)
> +				array.state |= (1<<MD_SB_CLUSTERED);
>  			array.state |= (1<<MD_SB_BITMAP_PRESENT);
>  			rv = ioctl(fd, SET_ARRAY_INFO, &array);
>  		}


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 08/11] mdadm: add the ability to change cluster name
  2015-05-20  3:20 ` [PATCH V3 08/11] mdadm: add the ability to change cluster name Guoqing Jiang
@ 2015-05-25  4:53   ` NeilBrown
  2015-05-26  8:38     ` Guoqing Jiang
  2015-06-01 16:26     ` Goldwyn Rodrigues
  0 siblings, 2 replies; 23+ messages in thread
From: NeilBrown @ 2015-05-25  4:53 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, rgoldwyn

[-- Attachment #1: Type: text/plain, Size: 7103 bytes --]

On Wed, 20 May 2015 11:20:40 +0800 Guoqing Jiang <gqjiang@suse.com> wrote:

> To support change the cluster name, the commit do the followings:
> 
> 1. extend original write_bitmap function for new scenario.
> 2. add the scenarion to handle the modification of cluster's name
>    in write_bitmap1.
> 3. make update_super1 can change the name in mdp_superblock_1.

You haven't documented --update=home-cluster in mdadm.8.in, or at

			fprintf(outf, "Valid --update options are:\n"

Also, I just realised that you are storing the cluster name in the array
name.  I don't think that is a clever idea.
The cluster name can be 64 chars.  The array name can only be 32.

I think leave homehost and homecluster completely out of the array name when
the array is clustered.

and
> +	    new_name = xmalloc(sizeof(sb->set_name));

is really unnecessary.  Just do "char new_name[32];".
But you are probably going to remove that code anyway.

NeilBrown



> 
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  Assemble.c |  5 +++++
>  Grow.c     |  2 +-
>  mdadm.c    |  3 +++
>  mdadm.h    |  7 ++++++-
>  super0.c   |  4 ++--
>  super1.c   | 43 ++++++++++++++++++++++++++++++++++++++++---
>  6 files changed, 57 insertions(+), 7 deletions(-)
> 
> diff --git a/Assemble.c b/Assemble.c
> index 25a103d..e1b846c 100644
> --- a/Assemble.c
> +++ b/Assemble.c
> @@ -644,6 +644,11 @@ static int load_devices(struct devs *devices, char *devmap,
>  				*stp = st;
>  				return -1;
>  			}
> +			if (strcmp(c->update, "home-cluster") == 0) {
> +				err = tst->ss->update_super(tst, content, c->update,
> +							    devname, 0, 0, c->homecluster);
> +				tst->ss->write_bitmap(tst, dfd, NameUpdate);
> +			}
>  			if (strcmp(c->update, "uuid")==0 &&
>  			    !ident->uuid_set) {
>  				ident->uuid_set = 1;
> diff --git a/Grow.c b/Grow.c
> index 1122cec..bf44e66 100644
> --- a/Grow.c
> +++ b/Grow.c
> @@ -420,7 +420,7 @@ int Grow_addbitmap(char *devname, int fd, struct context *c, struct shape *s)
>  						    bitmapsize, offset_setable,
>  						    major)
>  						)
> -						st->ss->write_bitmap(st, fd2);
> +						st->ss->write_bitmap(st, fd2, NoUpdate);
>  					else {
>  						pr_err("failed to create internal bitmap - chunksize problem.\n");
>  						close(fd2);
> diff --git a/mdadm.c b/mdadm.c
> index 56fdeb7..22f4fc7 100644
> --- a/mdadm.c
> +++ b/mdadm.c
> @@ -598,6 +598,7 @@ int main(int argc, char *argv[])
>  			}
>  			continue;
>  		case O(CREATE, ClusterName):
> +		case O(ASSEMBLE, ClusterName):
>  			c.homecluster = optarg;
>  			if (strlen(c.homecluster) > 64) {
>  				pr_err("Cluster name too big.\n");
> @@ -741,6 +742,8 @@ int main(int argc, char *argv[])
>  				continue;
>  			if (strcmp(c.update, "homehost")==0)
>  				continue;
> +			if (strcmp(c.update, "home-cluster")==0)
> +				continue;
>  			if (strcmp(c.update, "devicesize")==0)
>  				continue;
>  			if (strcmp(c.update, "no-bitmap")==0)
> diff --git a/mdadm.h b/mdadm.h
> index 00c726e..d8b0749 100644
> --- a/mdadm.h
> +++ b/mdadm.h
> @@ -354,6 +354,11 @@ enum prefix_standard {
>  	IEC
>  };
>  
> +enum bitmap_update {
> +    NoUpdate,
> +    NameUpdate,
> +};
> +
>  /* structures read from config file */
>  /* List of mddevice names and identifiers
>   * Identifiers can be:
> @@ -850,7 +855,7 @@ extern struct superswitch {
>  	/* if add_internal_bitmap succeeded for existing array, this
>  	 * writes it out.
>  	 */
> -	int (*write_bitmap)(struct supertype *st, int fd);
> +	int (*write_bitmap)(struct supertype *st, int fd, enum bitmap_update update);
>  	/* Free the superblock and any other allocated data */
>  	void (*free_super)(struct supertype *st);
>  
> diff --git a/super0.c b/super0.c
> index deb5999..6ad9d39 100644
> --- a/super0.c
> +++ b/super0.c
> @@ -900,7 +900,7 @@ static int write_init_super0(struct supertype *st)
>  		rv = store_super0(st, di->fd);
>  
>  		if (rv == 0 && (sb->state & (1<<MD_SB_BITMAP_PRESENT)))
> -			rv = st->ss->write_bitmap(st, di->fd);
> +			rv = st->ss->write_bitmap(st, di->fd, NoUpdate);
>  
>  		if (rv)
>  			pr_err("failed to write superblock to %s\n",
> @@ -1175,7 +1175,7 @@ static void locate_bitmap0(struct supertype *st, int fd)
>  	lseek64(fd, offset, 0);
>  }
>  
> -static int write_bitmap0(struct supertype *st, int fd)
> +static int write_bitmap0(struct supertype *st, int fd, enum bitmap_update update)
>  {
>  	unsigned long long dsize;
>  	unsigned long long offset;
> diff --git a/super1.c b/super1.c
> index fd728d2..07944d4 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -1073,7 +1073,23 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
>  		info->name[32] = 0;
>  	}
>  
> -	if (strcmp(update, "force-one")==0) {
> +	if (strcmp(update, "home-cluster") == 0 &&
> +	    homehost) {
> +		/* Note that 'home-cluster' is to change the name of cluster,
> +		 * it is another "name" update.
> +		 */
> +		char *new_name = xmalloc(sizeof(sb->set_name));
> +		if (strrchr(sb->set_name, ':')) {
> +			strcpy(new_name, strchr(sb->set_name, ':'));
> +		}
> +
> +		memset(sb->set_name, 0, sizeof(sb->set_name));
> +		strcpy(sb->set_name, homehost);
> +		if (new_name)
> +			strcat(sb->set_name, new_name);
> +
> +		free(new_name);
> +	} else if (strcmp(update, "force-one")==0) {
>  		/* Not enough devices for a working array,
>  		 * so bring this one up-to-date
>  		 */
> @@ -1691,7 +1707,7 @@ static int write_init_super1(struct supertype *st)
>  		sb->sb_csum = calc_sb_1_csum(sb);
>  		rv = store_super1(st, di->fd);
>  		if (rv == 0 && (__le32_to_cpu(sb->feature_map) & 1))
> -			rv = st->ss->write_bitmap(st, di->fd);
> +			rv = st->ss->write_bitmap(st, di->fd, NoUpdate);
>  		close(di->fd);
>  		di->fd = -1;
>  		if (rv)
> @@ -2175,7 +2191,7 @@ static void locate_bitmap1(struct supertype *st, int fd)
>  	lseek64(fd, offset<<9, 0);
>  }
>  
> -static int write_bitmap1(struct supertype *st, int fd)
> +static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update)
>  {
>  	struct mdp_superblock_1 *sb = st->sb;
>  	bitmap_super_t *bms = (bitmap_super_t*)(((char*)sb)+MAX_SB_SIZE);
> @@ -2184,6 +2200,27 @@ static int write_bitmap1(struct supertype *st, int fd)
>  	int towrite, n;
>  	struct align_fd afd;
>  	unsigned int i = 0;
> +	char *new_name;
> +
> +	switch (update) {
> +	case NameUpdate:
> +	    new_name = xmalloc(sizeof(sb->set_name));
> +
> +	    strncpy(new_name, sb->set_name, sizeof(sb->set_name));
> +	    memset((char *)bms->cluster_name, 0, sizeof(bms->cluster_name));
> +
> +	    if (strtok(new_name, ":"))
> +		strncpy((char *)bms->cluster_name, new_name, strlen(sb->set_name));
> +	    else
> +		/* In case the original set_name doesn't like aaa:md* */
> +		strncpy((char *)bms->cluster_name, sb->set_name, strlen(sb->set_name));
> +
> +	    free(new_name);
> +	    break;
> +	case NoUpdate:
> +	default:
> +	    break;
> +	}
>  
>  	init_afd(&afd, fd);
>  


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 09/11] mdadm: change the num of cluster node
  2015-05-20  3:20 ` [PATCH V3 09/11] mdadm: change the num of cluster node Guoqing Jiang
@ 2015-05-25  4:56   ` NeilBrown
  0 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2015-05-25  4:56 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, rgoldwyn

[-- Attachment #1: Type: text/plain, Size: 4936 bytes --]

On Wed, 20 May 2015 11:20:41 +0800 Guoqing Jiang <gqjiang@suse.com> wrote:

> This extends nodes option for assemble mode, make the num of
> cluster node could be change by user.
> 
> Before that, it is necessary to ensure there are enough space
> for those nodes, calc_bitmap_size is introduced to calculate
> the bitmap size of each node.
> 
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  Assemble.c |  4 ++++
>  ReadMe.c   |  2 +-
>  mdadm.c    |  3 +++
>  mdadm.h    |  1 +
>  super1.c   | 37 +++++++++++++++++++++++++++++++++++++
>  5 files changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/Assemble.c b/Assemble.c
> index e1b846c..9ff546b 100644
> --- a/Assemble.c
> +++ b/Assemble.c
> @@ -626,6 +626,10 @@ static int load_devices(struct devs *devices, char *devmap,
>  
>  			if (strcmp(c->update, "byteorder") == 0)
>  				err = 0;
> +			else if (strcmp(c->update, "nodes") == 0) {
> +				tst->nodes = c->nodes;
> +				err = tst->ss->write_bitmap(tst, dfd, NodeNumUpdate);
> +			}
>  			else
>  				err = tst->ss->update_super(tst, content, c->update,
>  							    devname, c->verbose,
> diff --git a/ReadMe.c b/ReadMe.c
> index c854cd5..d1830e1 100644
> --- a/ReadMe.c
> +++ b/ReadMe.c
> @@ -140,7 +140,7 @@ struct option long_options[] = {
>      {"homehost",  1, 0,  HomeHost},
>      {"symlinks",  1, 0,  Symlinks},
>      {"data-offset",1, 0, DataOffset},
> -    {"nodes",1, 0, Nodes},
> +    {"nodes",1, 0, Nodes}, /* also for --assemble */
>      {"home-cluster",1, 0, ClusterName},
>  
>      /* For assemble */
> diff --git a/mdadm.c b/mdadm.c
> index 22f4fc7..87c572d 100644
> --- a/mdadm.c
> +++ b/mdadm.c
> @@ -589,6 +589,7 @@ int main(int argc, char *argv[])
>  			}
>  			ident.raid_disks = s.raiddisks;
>  			continue;
> +		case O(ASSEMBLE, Nodes):
>  		case O(CREATE, Nodes):
>  			c.nodes = parse_num(optarg);
>  			if (c.nodes <= 0) {
> @@ -744,6 +745,8 @@ int main(int argc, char *argv[])
>  				continue;
>  			if (strcmp(c.update, "home-cluster")==0)
>  				continue;
> +			if (strcmp(c.update, "nodes")==0)
> +				continue;
>  			if (strcmp(c.update, "devicesize")==0)
>  				continue;
>  			if (strcmp(c.update, "no-bitmap")==0)
> diff --git a/mdadm.h b/mdadm.h
> index d8b0749..97892e6 100644
> --- a/mdadm.h
> +++ b/mdadm.h
> @@ -357,6 +357,7 @@ enum prefix_standard {
>  enum bitmap_update {
>      NoUpdate,
>      NameUpdate,
> +    NodeNumUpdate,
>  };
>  
>  /* structures read from config file */
> diff --git a/super1.c b/super1.c
> index 07944d4..fe12e81 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -134,6 +134,20 @@ struct misc_dev_info {
>  					|MD_FEATURE_NEW_OFFSET		\
>  					)
>  
> +/* return how many bytes are needed for bitmap, for cluster-md each node
> + * should have it's own bitmap */
> +static unsigned int calc_bitmap_size(bitmap_super_t *bms, unsigned int boundary)
> +{
> +	unsigned long long bits, bytes;
> +
> +	bits = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
> +	bytes = (bits+7) >> 3;
> +	bytes += sizeof(bitmap_super_t);
> +	bytes = ROUND_UP(bytes, boundary);
> +
> +	return bytes;
> +}
> +
>  static unsigned int calc_sb_1_csum(struct mdp_superblock_1 * sb)
>  {
>  	unsigned int disk_csum, csum;
> @@ -2201,6 +2215,7 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
>  	struct align_fd afd;
>  	unsigned int i = 0;
>  	char *new_name;
> +	unsigned long long total_bm_space, bm_space_per_node;
>  
>  	switch (update) {
>  	case NameUpdate:
> @@ -2217,6 +2232,28 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
>  
>  	    free(new_name);
>  	    break;
> +	case NodeNumUpdate:
> +	    /* cluster md only supports superblock 1.2 now */
> +	    if (st->minor_version != 2) {
> +		pr_err("Warning: cluster md only works with superblock 1.2\n");
> +		return -EINVAL;
> +	    }
> +
> +	    /* Each node has an independent bitmap, it is necessary to calculate the
> +	     * space is enough or not, first get how many bytes for the total bitmap */
> +	    bm_space_per_node = calc_bitmap_size(bms, 4096);
> +
> +	    total_bm_space = 512 * (__le64_to_cpu(sb->data_offset) - __le64_to_cpu(sb->super_offset));
> +	    total_bm_space = total_bm_space - 4096; /* leave another 4k for superblock */
> +
> +	    if (bm_space_per_node * st->nodes > total_bm_space) {
> +		pr_err("Warning: The max num of nodes can't exceed %llu\n",
> +			total_bm_space / bm_space_per_node);
> +		return -ENOMEM;
> +	    }
> +
> +	    bms->nodes = __cpu_to_le32(st->nodes);
> +	    break;
>  	case NoUpdate:
>  	default:
>  	    break;


Again, missing documentation for the new --update option.
And indents should be tabs, not "4 spaces".  That "switch(update)"
has wrong indents - I didn't notice before.

Thanks,
NeilBrown.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 11/11] Reuse the write_bitmap for update uuid
  2015-05-20  3:20 ` [PATCH V3 11/11] Reuse the write_bitmap for update uuid Guoqing Jiang
@ 2015-05-25  4:59   ` NeilBrown
  0 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2015-05-25  4:59 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, rgoldwyn

[-- Attachment #1: Type: text/plain, Size: 3280 bytes --]

On Wed, 20 May 2015 11:20:43 +0800 Guoqing Jiang <gqjiang@suse.com> wrote:

> Since write_bitmap is extended for handle different sistuations,
> then it also could possible to support change the uuid of bitmap,
> and remove bitmap_update_uuid accordingly.
> 
> Q: is the write_bitmap0 also impacted?

Yes.  You would need to add UUIDUpdate handling to write_bitmap0 too.

Thanks,
NeilBrown



> 
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  Assemble.c |  5 ++---
>  bitmap.c   | 20 --------------------
>  mdadm.h    |  2 +-
>  super1.c   |  4 ++++
>  4 files changed, 7 insertions(+), 24 deletions(-)
> 
> diff --git a/Assemble.c b/Assemble.c
> index 9ff546b..fabca5c 100644
> --- a/Assemble.c
> +++ b/Assemble.c
> @@ -664,9 +664,8 @@ static int load_devices(struct devs *devices, char *devmap,
>  
>  			if (strcmp(c->update, "uuid")==0 &&
>  			    ident->bitmap_fd >= 0 && !bitmap_done) {
> -				if (bitmap_update_uuid(ident->bitmap_fd,
> -						       content->uuid,
> -						       tst->ss->swapuuid) != 0)
> +				copy_uuid(tst->devs->uuid, content->uuid, tst->ss->swapuuid);
> +				if (tst->ss->write_bitmap(tst, dfd, UUIDUpdate))
>  					pr_err("Could not update uuid on external bitmap.\n");
>  				else
>  					bitmap_done = 1;
> diff --git a/bitmap.c b/bitmap.c
> index bccc67c..7df296e 100644
> --- a/bitmap.c
> +++ b/bitmap.c
> @@ -462,23 +462,3 @@ out:
>  		unlink(filename); /* possibly corrupted, better get rid of it */
>  	return rv;
>  }
> -
> -int bitmap_update_uuid(int fd, int *uuid, int swap)
> -{
> -	struct bitmap_super_s bm;
> -	if (lseek(fd, 0, 0) != 0)
> -		return 1;
> -	if (read(fd, &bm, sizeof(bm)) != sizeof(bm))
> -		return 1;
> -	if (bm.magic != __cpu_to_le32(BITMAP_MAGIC))
> -		return 1;
> -	copy_uuid(bm.uuid, uuid, swap);
> -	if (lseek(fd, 0, 0) != 0)
> -		return 2;
> -	if (write(fd, &bm, sizeof(bm)) != sizeof(bm)) {
> -		lseek(fd, 0, 0);
> -		return 2;
> -	}
> -	lseek(fd, 0, 0);
> -	return 0;
> -}
> diff --git a/mdadm.h b/mdadm.h
> index 97892e6..7b9bb28 100644
> --- a/mdadm.h
> +++ b/mdadm.h
> @@ -358,6 +358,7 @@ enum bitmap_update {
>      NoUpdate,
>      NameUpdate,
>      NodeNumUpdate,
> +    UUIDUpdate,
>  };
>  
>  /* structures read from config file */
> @@ -1273,7 +1274,6 @@ extern int CreateBitmap(char *filename, int force, char uuid[16],
>  			int major);
>  extern int ExamineBitmap(char *filename, int brief, struct supertype *st);
>  extern int Write_rules(char *rule_name);
> -extern int bitmap_update_uuid(int fd, int *uuid, int swap);
>  extern unsigned long bitmap_sectors(struct bitmap_super_s *bsb);
>  extern int Dump_metadata(char *dev, char *dir, struct context *c,
>  			 struct supertype *st);
> diff --git a/super1.c b/super1.c
> index b36b702..75e1081 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -2250,6 +2250,10 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
>  
>  	    bms->nodes = __cpu_to_le32(st->nodes);
>  	    break;
> +	case UUIDUpdate:
> +	    memset((char *)bms->uuid, 0, sizeof(bms->uuid));
> +	    strncpy((char *)bms->uuid, (char *)st->devs->uuid, sizeof(bms->uuid));
> +	    break;
>  	case NoUpdate:
>  	default:
>  	    break;


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 00/11] mdadm tool: add the support for cluster-md
  2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
                   ` (10 preceding siblings ...)
  2015-05-20  3:20 ` [PATCH V3 11/11] Reuse the write_bitmap for update uuid Guoqing Jiang
@ 2015-05-25  5:03 ` NeilBrown
  11 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2015-05-25  5:03 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, rgoldwyn

[-- Attachment #1: Type: text/plain, Size: 3501 bytes --]

On Wed, 20 May 2015 11:20:32 +0800 Guoqing Jiang <gqjiang@suse.com> wrote:

> V3 changes:
> 1. re-orgnize some codes to ensure mdadm compiles after each patch is applied
> 2. change the code for super1.c for first patch since it has side effect for
> non-cluster condition
> 
> V2 changes:
> 1. re-arrange the squence of patches
> 2. add some memembers into sb_le_to_cpu
> 3. handle some logic change and comments from Neil
> 
> Basic background for Cluster MD: Cluster MD is a shared-device RAID for a
> cluster, currently, the implementation is limited to RAID1 but with further
> work (and some positive feedback), it could be extend to other RAID levels.
> 
> The kernel part code of cluster-md has been sent to maillist several month
> ago by Goldywyn, and to make cluster-md works, the mdadm tools also need to
> do some changes accordingly.
> 
> This patch set extends mdadm tool to aware cluster MD scenario, and handle
> related cluster-md scenario.
> 
> 1. the first part (0001-0007) comes from Goldwyn, which add initial
> support for cluster-md, those changes included make mdadm awares nodes,
> home-cluster and n bitmaps for clustered mode, also let mdadm can 
> confirm disk which is added by another node.
> 
> 
> 2. the second part is for support change cluster-name and node nums under
> assemble mode. Which extend write-bitmap to handle above cases, and also
> use the extended write_bitmap for update uuid. [PATCH V2 10/10] is just compiled
> test only.
> 
> BTW: this series is based on commit "72a457 IMSM: Count arrays per orom".
> 
> Some reltated links:
> [1] http://marc.info/?l=linux-raid&m=141891941330336&w=2
> [2] http://marc.info/?l=linux-raid&m=141935561418770&w=2
> 
> Guoqing Jiang (11):
>   Create n bitmaps for clustered mode
>   Add nodes option while creating md
>   home-cluster while creating an array
>   Show all bitmaps while examining bitmap
>   Add a new clustered disk
>   Convert a bitmap=none device to clustered
>   Skip clustered devices in incremental
>   mdadm: add the ability to change cluster name
>   mdadm: change the num of cluster node
>   Reuse calc_bitmap_size to reduce code size
>   Reuse the write_bitmap for update uuid
> 
>  Assemble.c    |  14 ++++--
>  Create.c      |   5 +-
>  Grow.c        |  22 +++++++--
>  Incremental.c |   5 ++
>  Makefile      |   1 +
>  Manage.c      |  33 +++++++++++--
>  ReadMe.c      |   3 ++
>  bitmap.c      |  94 ++++++++++++++++++++++---------------
>  bitmap.h      |   7 ++-
>  config.c      |  27 ++++++++++-
>  md_p.h        |   7 +++
>  md_u.h        |   1 +
>  mdadm.8.in    |  28 +++++++++++-
>  mdadm.c       |  69 +++++++++++++++++++++++++++-
>  mdadm.h       |  20 +++++++-
>  super0.c      |   4 +-
>  super1.c      | 145 +++++++++++++++++++++++++++++++++++++++++++++++-----------
>  util.c        |  60 ++++++++++++++++++++++++
>  18 files changed, 458 insertions(+), 87 deletions(-)
> 


Thanks.  This looked like it is getting close.  Most of the things I have
commented on a fairly minor and should be easy to fix.
Hopefully the next time you post I have just add all the patched to my git
tree and we can take incremental patches from there.

I hope to make a 3.3.3 release of mdadm soonish, and then I'll target 3.4 to
primarily just add the clustering stuff.
If I get the next patchset before I've released 3.3.3, I'll just put it in a
separate branch and merge it later.

Thanks,
NeilBrown


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 08/11] mdadm: add the ability to change cluster name
  2015-05-25  4:53   ` NeilBrown
@ 2015-05-26  8:38     ` Guoqing Jiang
  2015-06-01 16:26     ` Goldwyn Rodrigues
  1 sibling, 0 replies; 23+ messages in thread
From: Guoqing Jiang @ 2015-05-26  8:38 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, rgoldwyn

NeilBrown wrote:
> On Wed, 20 May 2015 11:20:40 +0800 Guoqing Jiang <gqjiang@suse.com> wrote:
>
>   
>> To support change the cluster name, the commit do the followings:
>>
>> 1. extend original write_bitmap function for new scenario.
>> 2. add the scenarion to handle the modification of cluster's name
>>    in write_bitmap1.
>> 3. make update_super1 can change the name in mdp_superblock_1.
>>     
>
> You haven't documented --update=home-cluster in mdadm.8.in, or at
>
> 			fprintf(outf, "Valid --update options are:\n"
>   
Sorry, I will add it.
> Also, I just realised that you are storing the cluster name in the array
> name.  I don't think that is a clever idea.
> The cluster name can be 64 chars.  The array name can only be 32.
>
>   
Yes, you are correct. The array name is combined by homecluster and
homehost.
Generally, it doesn't have problem since people don't set the longer
name generally,
of course it is better to double check the length.

if  strlen( homecluster + homehost) < 32
    set the new name
else
    tell the user new cluster is too long, return
> I think leave homehost and homecluster completely out of the array name when
> the array is clustered.  
Could you please elaborate more about it? Does it mean add extra cluster
member in
superblock? Something like:

diff --git a/super1.c b/super1.c
index b949f5e..4ea1115 100644
--- a/super1.c
+++ b/super1.c
        char    set_name[32];   /* set and interpreted by user-space */
+       char    cluster_name[64];

If so, the kernel code also need related modification.

Or just let the bitmap stores the clustername, and the set_name doesn't
need to
be change,  then display the clustername from bitmap when run 'mdadm -E/D'.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 08/11] mdadm: add the ability to change cluster name
  2015-05-25  4:53   ` NeilBrown
  2015-05-26  8:38     ` Guoqing Jiang
@ 2015-06-01 16:26     ` Goldwyn Rodrigues
  1 sibling, 0 replies; 23+ messages in thread
From: Goldwyn Rodrigues @ 2015-06-01 16:26 UTC (permalink / raw)
  To: NeilBrown, Guoqing Jiang; +Cc: linux-raid

Hi Neil,

On 05/24/2015 11:53 PM, NeilBrown wrote:
> On Wed, 20 May 2015 11:20:40 +0800 Guoqing Jiang <gqjiang@suse.com> wrote:
>
>> To support change the cluster name, the commit do the followings:
>>
>> 1. extend original write_bitmap function for new scenario.
>> 2. add the scenarion to handle the modification of cluster's name
>>     in write_bitmap1.
>> 3. make update_super1 can change the name in mdp_superblock_1.
>
> You haven't documented --update=home-cluster in mdadm.8.in, or at
>
> 			fprintf(outf, "Valid --update options are:\n"
>
> Also, I just realised that you are storing the cluster name in the array
> name.  I don't think that is a clever idea.
> The cluster name can be 64 chars.  The array name can only be 32.
>
> I think leave homehost and homecluster completely out of the array name when
> the array is clustered.
>
> and
>> +	    new_name = xmalloc(sizeof(sb->set_name));
>
> is really unnecessary.  Just do "char new_name[32];".
> But you are probably going to remove that code anyway.
>

We had decided to keep the clustername in the array name, instead of the 
homehost, when we create a new array because the arrayname carries the 
hostname of the node which creates the device. However, we differentiate 
the hostname and the clustername by the number of colons.

For non-clustered-

arrayname:hostname

For clustered-

arrayname::clustername

Since, this is a limited size, we can truncate the name instead of 
erring. Some tricks with strncpy should work fine.

This is for information purposes is only and not used for anything 
(including joining the cluster)

However, leaving the homehost and homecluster out of the arrayname seems 
a better option than providing half-baked information.

Guoqing, you may have to change the "create" patch accordingly.

<snipped>

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-06-01 16:26 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-20  3:20 [PATCH V3 00/11] mdadm tool: add the support for cluster-md Guoqing Jiang
2015-05-20  3:20 ` [PATCH V3 01/11] Create n bitmaps for clustered mode Guoqing Jiang
2015-05-20  3:20 ` [PATCH V3 02/11] Add nodes option while creating md Guoqing Jiang
2015-05-25  4:13   ` NeilBrown
2015-05-20  3:20 ` [PATCH V3 03/11] home-cluster while creating an array Guoqing Jiang
2015-05-25  4:19   ` NeilBrown
2015-05-20  3:20 ` [PATCH V3 04/11] Show all bitmaps while examining bitmap Guoqing Jiang
2015-05-25  4:23   ` NeilBrown
2015-05-20  3:20 ` [PATCH V3 05/11] Add a new clustered disk Guoqing Jiang
2015-05-25  4:35   ` NeilBrown
2015-05-20  3:20 ` [PATCH V3 06/11] Convert a bitmap=none device to clustered Guoqing Jiang
2015-05-25  4:40   ` NeilBrown
2015-05-20  3:20 ` [PATCH V3 07/11] Skip clustered devices in incremental Guoqing Jiang
2015-05-20  3:20 ` [PATCH V3 08/11] mdadm: add the ability to change cluster name Guoqing Jiang
2015-05-25  4:53   ` NeilBrown
2015-05-26  8:38     ` Guoqing Jiang
2015-06-01 16:26     ` Goldwyn Rodrigues
2015-05-20  3:20 ` [PATCH V3 09/11] mdadm: change the num of cluster node Guoqing Jiang
2015-05-25  4:56   ` NeilBrown
2015-05-20  3:20 ` [PATCH V3 10/11] Reuse calc_bitmap_size to reduce code size Guoqing Jiang
2015-05-20  3:20 ` [PATCH V3 11/11] Reuse the write_bitmap for update uuid Guoqing Jiang
2015-05-25  4:59   ` NeilBrown
2015-05-25  5:03 ` [PATCH V3 00/11] mdadm tool: add the support for cluster-md NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.