* [Ocfs2-devel] Global Heartbeat - fs patches
@ 2010-09-14 22:50 Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 01/20] ocfs2/cluster: Add heartbeat mode configfs parameter Sunil Mushran
                   ` (20 more replies)
  0 siblings, 21 replies; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel


This is the next drop of the global heartbeat patches, rebased onto the
current mainline head. The patches are feature-complete.

Please refer to this wiki page to learn more about the feature.
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/NewGlobalHeartbeat

Please review.

Sunil


* [Ocfs2-devel] [PATCH 01/20] ocfs2/cluster: Add heartbeat mode configfs parameter
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-25  8:11   ` Wengang Wang
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 02/20] ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO Sunil Mushran
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

Add a heartbeat mode parameter to the configfs tree. It is used to set and
show the heartbeat mode. The user is free to toggle the mode between local
and global as long as there is no active heartbeat region.
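
As an illustration only, the name-to-mode lookup described above can be
sketched in userspace C. The names hb_mode, hb_mode_desc, eq_nocase and
hb_mode_parse are hypothetical stand-ins, not identifiers from the patch.
The patch compares only the first len characters of the written text; the
exact-length check below is a deliberate tightening in this sketch and is
not what the patch does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for the enum and table in the patch. */
enum hb_mode { HB_LOCAL = 0, HB_GLOBAL, HB_NUM_MODES };

static const char *const hb_mode_desc[HB_NUM_MODES] = { "local", "global" };

/* Case-insensitive comparison of the first n bytes of a and b. */
static int eq_nocase(const char *a, const char *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 0;
	return 1;
}

/*
 * Return the mode index named by the written text, or -1 if it names
 * no mode. Assumption of this sketch: the whole word must match, so
 * a prefix such as "glob" is rejected.
 */
static int hb_mode_parse(const char *page, size_t count)
{
	size_t len;
	int i;

	assert(page != NULL);
	if (count == 0)
		return -1;

	/* Text written with echo usually ends in a newline; ignore it. */
	len = (page[count - 1] == '\n') ? count - 1 : count;

	for (i = 0; i < HB_NUM_MODES; i++) {
		if (strlen(hb_mode_desc[i]) == len &&
		    eq_nocase(page, hb_mode_desc[i], len))
			return i;
	}
	return -1;
}
```

A caller would still need the check the patch makes under o2hb_live_lock:
the mode may only change while no heartbeat region is registered.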

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/heartbeat.c |   70 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 41d5f1f..57cc715 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -77,7 +77,19 @@ static struct o2hb_callback *hbcall_from_type(enum o2hb_callback_type type);
 
 #define O2HB_DEFAULT_BLOCK_BITS       9
 
+enum o2hb_heartbeat_modes {
+	O2HB_HEARTBEAT_LOCAL		= 0,
+	O2HB_HEARTBEAT_GLOBAL,
+	O2HB_HEARTBEAT_NUM_MODES,
+};
+
+char *o2hb_heartbeat_mode_desc[O2HB_HEARTBEAT_NUM_MODES] = {
+		"local",	/* O2HB_HEARTBEAT_LOCAL */
+		"global",	/* O2HB_HEARTBEAT_GLOBAL */
+};
+
 unsigned int o2hb_dead_threshold = O2HB_DEFAULT_DEAD_THRESHOLD;
+unsigned int o2hb_heartbeat_mode = O2HB_HEARTBEAT_LOCAL;
 
 /* Only sets a new threshold if there are no active regions.
  *
@@ -94,6 +106,22 @@ static void o2hb_dead_threshold_set(unsigned int threshold)
 	}
 }
 
+static int o2hb_global_hearbeat_mode_set(unsigned int hb_mode)
+{
+	int ret = -1;
+
+	if (hb_mode < O2HB_HEARTBEAT_NUM_MODES) {
+		spin_lock(&o2hb_live_lock);
+		if (list_empty(&o2hb_all_regions)) {
+			o2hb_heartbeat_mode = hb_mode;
+			ret = 0;
+		}
+		spin_unlock(&o2hb_live_lock);
+	}
+
+	return ret;
+}
+
 struct o2hb_node_event {
 	struct list_head        hn_item;
 	enum o2hb_callback_type hn_event_type;
@@ -1688,6 +1716,39 @@ static ssize_t o2hb_heartbeat_group_threshold_store(struct o2hb_heartbeat_group
 	return count;
 }
 
+static
+ssize_t o2hb_heartbeat_group_mode_show(struct o2hb_heartbeat_group *group,
+				       char *page)
+{
+	return sprintf(page, "%s\n",
+		       o2hb_heartbeat_mode_desc[o2hb_heartbeat_mode]);
+}
+
+static
+ssize_t o2hb_heartbeat_group_mode_store(struct o2hb_heartbeat_group *group,
+					const char *page, size_t count)
+{
+	unsigned int i;
+	int ret;
+	size_t len;
+
+	len = (page[count - 1] == '\n') ? count - 1 : count;
+
+	for (i = 0; i < O2HB_HEARTBEAT_NUM_MODES; ++i) {
+		if (strnicmp(page, o2hb_heartbeat_mode_desc[i], len))
+			continue;
+
+		ret = o2hb_global_hearbeat_mode_set(i);
+		if (!ret)
+			printk(KERN_NOTICE "ocfs2: Heartbeat mode set to %s\n",
+			       o2hb_heartbeat_mode_desc[i]);
+		return count;
+	}
+
+	return -EINVAL;
+
+}
+
 static struct o2hb_heartbeat_group_attribute o2hb_heartbeat_group_attr_threshold = {
 	.attr	= { .ca_owner = THIS_MODULE,
 		    .ca_name = "dead_threshold",
@@ -1696,8 +1757,17 @@ static struct o2hb_heartbeat_group_attribute o2hb_heartbeat_group_attr_threshold
 	.store	= o2hb_heartbeat_group_threshold_store,
 };
 
+static struct o2hb_heartbeat_group_attribute o2hb_heartbeat_group_attr_mode = {
+	.attr   = { .ca_owner = THIS_MODULE,
+		.ca_name = "mode",
+		.ca_mode = S_IRUGO | S_IWUSR },
+	.show   = o2hb_heartbeat_group_mode_show,
+	.store  = o2hb_heartbeat_group_mode_store,
+};
+
 static struct configfs_attribute *o2hb_heartbeat_group_attrs[] = {
 	&o2hb_heartbeat_group_attr_threshold.attr,
+	&o2hb_heartbeat_group_attr_mode.attr,
 	NULL,
 };
 
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 02/20] ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 01/20] ocfs2/cluster: Add heartbeat mode configfs parameter Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 03/20] ocfs2: Add support for heartbeat=global mount option Sunil Mushran
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

OCFS2_FEATURE_INCOMPAT_CLUSTERINFO allows us to use sb->s_cluster_info for
both userspace and o2cb cluster stacks. It also allows us to extend cluster
info to include stack flags.

This patch also adds stackflags to sb->s_cluster_info and introduces the
clusterinfo flag OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT, which denotes that
global heartbeat mode is enabled.

This incompat flag can be set/cleared using tunefs.ocfs2 --fs-features. The
clusterinfo flag is set/cleared using tunefs.ocfs2 --update-cluster-stack.
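
A rough userspace sketch of how the two flags combine, for illustration
only. struct sb_view and the shortened macro and function names are
hypothetical. 0x4000 and 0x01 are the values in this patch; 0x0080 for the
userspace-stack bit is an assumption, since that value is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_LABEL_LEN          4
#define INCOMPAT_USERSPACE_STACK 0x0080	/* assumed, not shown in the patch */
#define INCOMPAT_CLUSTERINFO     0x4000	/* from the patch */
#define O2CB_GLOBAL_HEARTBEAT    0x01	/* from the patch */

/* Hypothetical in-memory view of the fields the helpers look at. */
struct sb_view {
	uint32_t incompat;
	char     stack[STACK_LABEL_LEN];
	uint8_t  stackflags;
};

/* Cluster info is usable if either incompat bit is set. */
static int clusterinfo_valid(const struct sb_view *sb)
{
	assert(sb != NULL);
	return (sb->incompat &
		(INCOMPAT_USERSPACE_STACK | INCOMPAT_CLUSTERINFO)) != 0;
}

/* The classic stack is identified by the four-byte label "o2cb". */
static int o2cb_stack(const struct sb_view *sb)
{
	return clusterinfo_valid(sb) &&
	       memcmp(sb->stack, "o2cb", STACK_LABEL_LEN) == 0;
}

/* Global heartbeat requires the o2cb stack and the stack flag. */
static int o2cb_global_heartbeat(const struct sb_view *sb)
{
	return o2cb_stack(sb) &&
	       (sb->stackflags & O2CB_GLOBAL_HEARTBEAT) != 0;
}
```

The point of the layering is that the stack flag is only consulted once the
label has been validated, so a stale flag byte on a volume without either
incompat bit is ignored.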

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/ocfs2.h    |   31 +++++++++++++++++++++++++++++--
 fs/ocfs2/ocfs2_fs.h |   40 ++++++++++++++++++++++++++++++++++------
 fs/ocfs2/super.c    |    4 +++-
 3 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index c67003b..d5496a7 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -368,6 +368,8 @@ struct ocfs2_super
 	struct ocfs2_alloc_stats alloc_stats;
 	char dev_str[20];		/* "major,minor" of the device */
 
+	u8 osb_stackflags;
+
 	char osb_cluster_stack[OCFS2_STACK_LABEL_LEN + 1];
 	struct ocfs2_cluster_connection *cconn;
 	struct ocfs2_lock_res osb_super_lockres;
@@ -601,10 +603,35 @@ static inline int ocfs2_is_soft_readonly(struct ocfs2_super *osb)
 	return ret;
 }
 
-static inline int ocfs2_userspace_stack(struct ocfs2_super *osb)
+static inline int ocfs2_clusterinfo_valid(struct ocfs2_super *osb)
 {
 	return (osb->s_feature_incompat &
-		OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK);
+		(OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK |
+		 OCFS2_FEATURE_INCOMPAT_CLUSTERINFO));
+}
+
+static inline int ocfs2_userspace_stack(struct ocfs2_super *osb)
+{
+	if (ocfs2_clusterinfo_valid(osb) &&
+	    memcmp(osb->osb_cluster_stack, OCFS2_CLASSIC_CLUSTER_STACK,
+		   OCFS2_STACK_LABEL_LEN))
+		return 1;
+	return 0;
+}
+
+static inline int ocfs2_o2cb_stack(struct ocfs2_super *osb)
+{
+	if (ocfs2_clusterinfo_valid(osb) &&
+	    !memcmp(osb->osb_cluster_stack, OCFS2_CLASSIC_CLUSTER_STACK,
+		   OCFS2_STACK_LABEL_LEN))
+		return 1;
+	return 0;
+}
+
+static inline int ocfs2_cluster_o2cb_global_heartbeat(struct ocfs2_super *osb)
+{
+	return ocfs2_o2cb_stack(osb) &&
+		(osb->osb_stackflags & OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT);
 }
 
 static inline int ocfs2_mount_local(struct ocfs2_super *osb)
diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index 33f1c9a..abe048e 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -101,7 +101,8 @@
 					 | OCFS2_FEATURE_INCOMPAT_META_ECC \
 					 | OCFS2_FEATURE_INCOMPAT_INDEXED_DIRS \
 					 | OCFS2_FEATURE_INCOMPAT_REFCOUNT_TREE \
-					 | OCFS2_FEATURE_INCOMPAT_DISCONTIG_BG)
+					 | OCFS2_FEATURE_INCOMPAT_DISCONTIG_BG	\
+					 | OCFS2_FEATURE_INCOMPAT_CLUSTERINFO)
 #define OCFS2_FEATURE_RO_COMPAT_SUPP	(OCFS2_FEATURE_RO_COMPAT_UNWRITTEN \
 					 | OCFS2_FEATURE_RO_COMPAT_USRQUOTA \
 					 | OCFS2_FEATURE_RO_COMPAT_GRPQUOTA)
@@ -170,6 +171,13 @@
 #define OCFS2_FEATURE_INCOMPAT_DISCONTIG_BG	0x2000
 
 /*
+ * Incompat bit to indicate useable clusterinfo with stackflags for all
+ * cluster stacks (userspace and o2cb). If this bit is set,
+ * INCOMPAT_USERSPACE_STACK becomes superfluous and thus should not be set.
+ */
+#define OCFS2_FEATURE_INCOMPAT_CLUSTERINFO	0x4000
+
+/*
  * backup superblock flag is used to indicate that this volume
  * has backup superblocks.
  */
@@ -279,10 +287,13 @@
 #define OCFS2_VOL_UUID_LEN		16
 #define OCFS2_MAX_VOL_LABEL_LEN		64
 
-/* The alternate, userspace stack fields */
+/* The cluster stack fields */
 #define OCFS2_STACK_LABEL_LEN		4
 #define OCFS2_CLUSTER_NAME_LEN		16
 
+/* Classic (historically speaking) cluster stack */
+#define OCFS2_CLASSIC_CLUSTER_STACK	"o2cb"
+
 /* Journal limits (in bytes) */
 #define OCFS2_MIN_JOURNAL_SIZE		(4 * 1024 * 1024)
 
@@ -292,6 +303,11 @@
  */
 #define OCFS2_MIN_XATTR_INLINE_SIZE     256
 
+/*
+ * Cluster info flags (ocfs2_cluster_info.ci_stackflags)
+ */
+#define OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT	(0x01)
+
 struct ocfs2_system_inode_info {
 	char	*si_name;
 	int	si_iflags;
@@ -553,9 +569,21 @@ struct ocfs2_slot_map_extended {
  */
 };
 
+/*
+ * ci_stackflags is only valid if the incompat bit
+ * OCFS2_FEATURE_INCOMPAT_CLUSTERINFO is set.
+ */
 struct ocfs2_cluster_info {
 /*00*/	__u8   ci_stack[OCFS2_STACK_LABEL_LEN];
-	__le32 ci_reserved;
+	union {
+		__le32 ci_reserved;
+		struct {
+			__u8 ci_reserved1;
+			__u8 ci_reserved2;
+			__u8 ci_reserved3;
+			__u8 ci_stackflags;
+		};
+	};
 /*08*/	__u8   ci_cluster[OCFS2_CLUSTER_NAME_LEN];
 /*18*/
 };
@@ -592,9 +620,9 @@ struct ocfs2_super_block {
 					 * group header */
 /*50*/	__u8  s_label[OCFS2_MAX_VOL_LABEL_LEN];	/* Label for mounting, etc. */
 /*90*/	__u8  s_uuid[OCFS2_VOL_UUID_LEN];	/* 128-bit uuid */
-/*A0*/  struct ocfs2_cluster_info s_cluster_info; /* Selected userspace
-						     stack.  Only valid
-						     with INCOMPAT flag. */
+/*A0*/  struct ocfs2_cluster_info s_cluster_info; /* Only valid if either
+						     userspace or clusterinfo
+						     INCOMPAT flag set. */
 /*B8*/	__le16 s_xattr_inline_size;	/* extended attribute inline size
 					   for this fs*/
 	__le16 s_reserved0;
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index fa1be1b..7554317 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -2149,7 +2149,9 @@ static int ocfs2_initialize_super(struct super_block *sb,
 		goto bail;
 	}
 
-	if (ocfs2_userspace_stack(osb)) {
+	if (ocfs2_clusterinfo_valid(osb)) {
+		osb->osb_stackflags =
+			OCFS2_RAW_SB(di)->s_cluster_info.ci_stackflags;
 		memcpy(osb->osb_cluster_stack,
 		       OCFS2_RAW_SB(di)->s_cluster_info.ci_stack,
 		       OCFS2_STACK_LABEL_LEN);
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 03/20] ocfs2: Add support for heartbeat=global mount option
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 01/20] ocfs2/cluster: Add heartbeat mode configfs parameter Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 02/20] ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-25  8:39   ` Wengang Wang
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 04/20] ocfs2/dlm: Expose dlm_protocol in dlm_state Sunil Mushran
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

Add support for the heartbeat=global mount option. The mount is rejected
unless the heartbeat mode passed matches the one enabled on disk.
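
The two checks can be sketched in userspace C as follows. This is an
illustration under assumptions: popcount32 stands in for the kernel's
hweight32(), and the shortened macro and function names are hypothetical.
The bit positions are the ones used in this patch.

```c
#include <assert.h>
#include <stdint.h>

#define MOUNT_HB_LOCAL  (1u << 0)
#define MOUNT_HB_NONE   (1u << 12)
#define MOUNT_HB_GLOBAL (1u << 13)
#define MOUNT_HB_MASK   (MOUNT_HB_LOCAL | MOUNT_HB_NONE | MOUNT_HB_GLOBAL)

/* Portable population count, standing in for hweight32(). */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Exactly one of local, global and none must be selected. */
static int hb_opts_valid(uint32_t mount_opt)
{
	return popcount32(mount_opt & MOUNT_HB_MASK) == 1;
}

/* Does the requested mode agree with the mode recorded on disk? */
static int hb_mode_matches_disk(uint32_t mount_opt, int disk_is_global)
{
	assert(hb_opts_valid(mount_opt));
	if (mount_opt & MOUNT_HB_GLOBAL)
		return disk_is_global != 0;
	if (mount_opt & MOUNT_HB_LOCAL)
		return disk_is_global == 0;
	return 1;	/* heartbeat=none: nothing to compare */
}
```

Giving heartbeat=none its own bit, instead of clearing the local bit, is
what lets the parser tell "no heartbeat option passed" apart from
"heartbeat=none passed".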

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/ocfs2.h    |    4 ++-
 fs/ocfs2/ocfs2_fs.h |    1 +
 fs/ocfs2/super.c    |   55 ++++++++++++++++++++++++++++++++++++++-------------
 3 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index d5496a7..481387b 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -243,7 +243,7 @@ enum ocfs2_local_alloc_state
 
 enum ocfs2_mount_options
 {
-	OCFS2_MOUNT_HB_LOCAL   = 1 << 0, /* Heartbeat started in local mode */
+	OCFS2_MOUNT_HB_LOCAL = 1 << 0, /* Local heartbeat */
 	OCFS2_MOUNT_BARRIER = 1 << 1,	/* Use block barriers */
 	OCFS2_MOUNT_NOINTR  = 1 << 2,   /* Don't catch signals */
 	OCFS2_MOUNT_ERRORS_PANIC = 1 << 3, /* Panic on errors */
@@ -256,6 +256,8 @@ enum ocfs2_mount_options
 						   control lists */
 	OCFS2_MOUNT_USRQUOTA = 1 << 10, /* We support user quotas */
 	OCFS2_MOUNT_GRPQUOTA = 1 << 11, /* We support group quotas */
+	OCFS2_MOUNT_HB_NONE = 1 << 12, /* No heartbeat */
+	OCFS2_MOUNT_HB_GLOBAL = 1 << 13, /* Global heartbeat */
 };
 
 #define OCFS2_OSB_SOFT_RO			0x0001
diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index abe048e..4eeeccd 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -363,6 +363,7 @@ static struct ocfs2_system_inode_info ocfs2_system_inodes[NUM_SYSTEM_INODES] = {
 /* Parameter passed from mount.ocfs2 to module */
 #define OCFS2_HB_NONE			"heartbeat=none"
 #define OCFS2_HB_LOCAL			"heartbeat=local"
+#define OCFS2_HB_GLOBAL			"heartbeat=global"
 
 /*
  * OCFS2 directory file types.  Only the low 3 bits are used.  The
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 7554317..00d842c 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -162,6 +162,7 @@ enum {
 	Opt_nointr,
 	Opt_hb_none,
 	Opt_hb_local,
+	Opt_hb_global,
 	Opt_data_ordered,
 	Opt_data_writeback,
 	Opt_atime_quantum,
@@ -190,6 +191,7 @@ static const match_table_t tokens = {
 	{Opt_nointr, "nointr"},
 	{Opt_hb_none, OCFS2_HB_NONE},
 	{Opt_hb_local, OCFS2_HB_LOCAL},
+	{Opt_hb_global, OCFS2_HB_GLOBAL},
 	{Opt_data_ordered, "data=ordered"},
 	{Opt_data_writeback, "data=writeback"},
 	{Opt_atime_quantum, "atime_quantum=%u"},
@@ -608,6 +610,7 @@ static int ocfs2_remount(struct super_block *sb, int *flags, char *data)
 	int ret = 0;
 	struct mount_options parsed_options;
 	struct ocfs2_super *osb = OCFS2_SB(sb);
+	u32 tmp;
 
 	lock_kernel();
 
@@ -617,8 +620,9 @@ static int ocfs2_remount(struct super_block *sb, int *flags, char *data)
 		goto out;
 	}
 
-	if ((osb->s_mount_opt & OCFS2_MOUNT_HB_LOCAL) !=
-	    (parsed_options.mount_opt & OCFS2_MOUNT_HB_LOCAL)) {
+	tmp = OCFS2_MOUNT_HB_LOCAL | OCFS2_MOUNT_HB_GLOBAL |
+		OCFS2_MOUNT_HB_NONE;
+	if ((osb->s_mount_opt & tmp) != (parsed_options.mount_opt & tmp)) {
 		ret = -EINVAL;
 		mlog(ML_ERROR, "Cannot change heartbeat mode on remount\n");
 		goto out;
@@ -809,23 +813,29 @@ bail:
 
 static int ocfs2_verify_heartbeat(struct ocfs2_super *osb)
 {
-	if (ocfs2_mount_local(osb)) {
-		if (osb->s_mount_opt & OCFS2_MOUNT_HB_LOCAL) {
+	u32 hb_enabled = OCFS2_MOUNT_HB_LOCAL | OCFS2_MOUNT_HB_GLOBAL;
+
+	if (osb->s_mount_opt & hb_enabled) {
+		if (ocfs2_mount_local(osb)) {
 			mlog(ML_ERROR, "Cannot heartbeat on a locally "
 			     "mounted device.\n");
 			return -EINVAL;
 		}
-	}
-
-	if (ocfs2_userspace_stack(osb)) {
-		if (osb->s_mount_opt & OCFS2_MOUNT_HB_LOCAL) {
+		if (ocfs2_userspace_stack(osb)) {
 			mlog(ML_ERROR, "Userspace stack expected, but "
 			     "o2cb heartbeat arguments passed to mount\n");
 			return -EINVAL;
 		}
+		if (((osb->s_mount_opt & OCFS2_MOUNT_HB_GLOBAL) &&
+		     !ocfs2_cluster_o2cb_global_heartbeat(osb)) ||
+		    ((osb->s_mount_opt & OCFS2_MOUNT_HB_LOCAL) &&
+		     ocfs2_cluster_o2cb_global_heartbeat(osb))) {
+			mlog(ML_ERROR, "Mismatching o2cb heartbeat modes\n");
+			return -EINVAL;
+		}
 	}
 
-	if (!(osb->s_mount_opt & OCFS2_MOUNT_HB_LOCAL)) {
+	if (!(osb->s_mount_opt & hb_enabled)) {
 		if (!ocfs2_mount_local(osb) && !ocfs2_is_hard_readonly(osb) &&
 		    !ocfs2_userspace_stack(osb)) {
 			mlog(ML_ERROR, "Heartbeat has to be started to mount "
@@ -1291,6 +1301,7 @@ static int ocfs2_parse_options(struct super_block *sb,
 {
 	int status;
 	char *p;
+	u32 tmp;
 
 	mlog_entry("remount: %d, options: \"%s\"\n", is_remount,
 		   options ? options : "(none)");
@@ -1322,7 +1333,10 @@ static int ocfs2_parse_options(struct super_block *sb,
 			mopt->mount_opt |= OCFS2_MOUNT_HB_LOCAL;
 			break;
 		case Opt_hb_none:
-			mopt->mount_opt &= ~OCFS2_MOUNT_HB_LOCAL;
+			mopt->mount_opt |= OCFS2_MOUNT_HB_NONE;
+			break;
+		case Opt_hb_global:
+			mopt->mount_opt |= OCFS2_MOUNT_HB_GLOBAL;
 			break;
 		case Opt_barrier:
 			if (match_int(&args[0], &option)) {
@@ -1477,6 +1491,15 @@ static int ocfs2_parse_options(struct super_block *sb,
 		}
 	}
 
+	/* Ensure only one heartbeat mode */
+	tmp = mopt->mount_opt & (OCFS2_MOUNT_HB_LOCAL | OCFS2_MOUNT_HB_GLOBAL |
+				 OCFS2_MOUNT_HB_NONE);
+	if (hweight32(tmp) != 1) {
+		mlog(ML_ERROR, "Invalid heartbeat mount option: %s\n", options);
+		status = 0;
+		goto bail;
+	}
+
 	status = 1;
 
 bail:
@@ -1490,10 +1513,14 @@ static int ocfs2_show_options(struct seq_file *s, struct vfsmount *mnt)
 	unsigned long opts = osb->s_mount_opt;
 	unsigned int local_alloc_megs;
 
-	if (opts & OCFS2_MOUNT_HB_LOCAL)
-		seq_printf(s, ",_netdev,heartbeat=local");
-	else
-		seq_printf(s, ",heartbeat=none");
+	if (opts & (OCFS2_MOUNT_HB_LOCAL | OCFS2_MOUNT_HB_GLOBAL)) {
+		seq_printf(s, ",_netdev");
+		if (opts & OCFS2_MOUNT_HB_LOCAL)
+			seq_printf(s, ",%s", OCFS2_HB_LOCAL);
+		else
+			seq_printf(s, ",%s", OCFS2_HB_GLOBAL);
+	} else
+		seq_printf(s, ",%s", OCFS2_HB_NONE);
 
 	if (opts & OCFS2_MOUNT_NOINTR)
 		seq_printf(s, ",nointr");
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 04/20] ocfs2/dlm: Expose dlm_protocol in dlm_state
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (2 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 03/20] ocfs2: Add support for heartbeat=global mount option Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-25  8:42   ` Wengang Wang
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 05/20] ocfs2/cluster: Get all heartbeat regions Sunil Mushran
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

Add dlm_protocol to the list of info shown by the debugfs file, dlm_state.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/dlm/dlmdebug.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index 5efdd37..51164a6 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -775,7 +775,9 @@ static int debug_state_print(struct dlm_ctxt *dlm, struct debug_buffer *db)
 
 	/* Domain: xxxxxxxxxx  Key: 0xdfbac769 */
 	out += snprintf(db->buf + out, db->len - out,
-			"Domain: %s  Key: 0x%08x\n", dlm->name, dlm->key);
+			"Domain: %s  Key: 0x%08x  Protocol: %d.%d\n",
+			dlm->name, dlm->key, dlm->dlm_locking_proto.pv_major,
+			dlm->dlm_locking_proto.pv_minor);
 
 	/* Thread Pid: xxx  Node: xxx  State: xxxxx */
 	out += snprintf(db->buf + out, db->len - out,
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 05/20] ocfs2/cluster: Get all heartbeat regions
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (3 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 04/20] ocfs2/dlm: Expose dlm_protocol in dlm_state Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-23 21:57   ` Joel Becker
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 06/20] ocfs2/dlm: Add message DLM_QUERY_REGION Sunil Mushran
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

Export a function from o2hb that returns the list of heartbeat regions. Also
add an upper limit on the length of a heartbeat region name.
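
For illustration, the flat buffer of fixed-width name slots can be sketched
in userspace C. pack_regions and REGION_NAME_LEN are hypothetical names;
32 is the value of O2HB_MAX_REGION_NAME_LEN in this patch. The sketch
zero-pads each slot with strncpy, which is a choice made here and is not
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REGION_NAME_LEN 32	/* O2HB_MAX_REGION_NAME_LEN in the patch */

/*
 * Copy up to max_regions names into buf as fixed-width slots.
 * Returns the total number of names, which may exceed max_regions,
 * mirroring the return convention of o2hb_get_all_regions().
 * Returns -1 if a name does not fit in a slot.
 */
static int pack_regions(char *buf, size_t max_regions,
			const char *const *names, size_t nnames)
{
	size_t i;

	assert(buf != NULL && names != NULL);
	for (i = 0; i < nnames; i++) {
		if (strlen(names[i]) > REGION_NAME_LEN)
			return -1;
		if (i < max_regions) {
			/* strncpy zero-fills the remainder of the slot */
			strncpy(buf + i * REGION_NAME_LEN, names[i],
				REGION_NAME_LEN);
		}
	}
	return (int)nnames;
}
```

Because the count returned can be larger than the number of slots filled,
a caller has to clamp it to max_regions before walking the buffer.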

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/heartbeat.c |   34 ++++++++++++++++++++++++++++++++++
 fs/ocfs2/cluster/heartbeat.h |    4 ++++
 2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 57cc715..cec9d4c 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1623,6 +1623,9 @@ static struct config_item *o2hb_heartbeat_group_make_item(struct config_group *g
 	if (reg == NULL)
 		return ERR_PTR(-ENOMEM);
 
+	if (strlen(name) > O2HB_MAX_REGION_NAME_LEN)
+		return ERR_PTR(-ENAMETOOLONG);
+
 	config_item_init_type_name(&reg->hr_item, name, &o2hb_region_type);
 
 	spin_lock(&o2hb_live_lock);
@@ -2033,3 +2036,34 @@ void o2hb_stop_all_regions(void)
 	spin_unlock(&o2hb_live_lock);
 }
 EXPORT_SYMBOL_GPL(o2hb_stop_all_regions);
+
+int o2hb_get_all_regions(char *region_uuids, u8 max_regions)
+{
+	struct o2hb_region *reg;
+	int numregs = 0;
+	char *p;
+
+	spin_lock(&o2hb_live_lock);
+
+	p = region_uuids;
+	list_for_each_entry(reg, &o2hb_all_regions, hr_all_item) {
+		mlog(0, "Region: %s\n", config_item_name(&reg->hr_item));
+		if (numregs < max_regions) {
+			memcpy(p, config_item_name(&reg->hr_item),
+			       O2HB_MAX_REGION_NAME_LEN);
+			p += O2HB_MAX_REGION_NAME_LEN;
+		}
+		numregs++;
+	}
+
+	spin_unlock(&o2hb_live_lock);
+
+	return numregs;
+}
+EXPORT_SYMBOL_GPL(o2hb_get_all_regions);
+
+int o2hb_global_heartbeat_active(void)
+{
+	return (o2hb_heartbeat_mode == O2HB_HEARTBEAT_GLOBAL);
+}
+EXPORT_SYMBOL(o2hb_global_heartbeat_active);
diff --git a/fs/ocfs2/cluster/heartbeat.h b/fs/ocfs2/cluster/heartbeat.h
index 2f16492..00ad8e8 100644
--- a/fs/ocfs2/cluster/heartbeat.h
+++ b/fs/ocfs2/cluster/heartbeat.h
@@ -31,6 +31,8 @@
 
 #define O2HB_REGION_TIMEOUT_MS		2000
 
+#define O2HB_MAX_REGION_NAME_LEN	32
+
 /* number of changes to be seen as live */
 #define O2HB_LIVE_THRESHOLD	   2
 /* number of equal samples to be seen as dead */
@@ -81,5 +83,7 @@ int o2hb_check_node_heartbeating(u8 node_num);
 int o2hb_check_node_heartbeating_from_callback(u8 node_num);
 int o2hb_check_local_node_heartbeating(void);
 void o2hb_stop_all_regions(void);
+int o2hb_get_all_regions(char *region_uuids, u8 numregions);
+int o2hb_global_heartbeat_active(void);
 
 #endif /* O2CLUSTER_HEARTBEAT_H */
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 06/20] ocfs2/dlm: Add message DLM_QUERY_REGION
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (4 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 05/20] ocfs2/cluster: Get all heartbeat regions Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 07/20] ocfs2: Print message if user mounts without starting global heartbeat Sunil Mushran
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

Adds new dlm message DLM_QUERY_REGION that sends the names of all active
heartbeat regions. This message is only sent in the global heartbeat
mode. If the regions in the joining node do not fully match the ones in
the active nodes, the join domain request is rejected.
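
The "fully match" rule amounts to a set-equality test over two buffers of
fixed-width names, checked in both directions. A minimal userspace sketch,
with hypothetical names contains and regions_match:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REGION_NAME_LEN 32	/* O2HB_MAX_REGION_NAME_LEN in the patch */

/* Is the fixed-width name present among the first n slots of set? */
static int contains(const char *set, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (memcmp(set + i * REGION_NAME_LEN, name,
			   REGION_NAME_LEN) == 0)
			return 1;
	return 0;
}

/* Return 1 if both buffers hold the same set of regions, else 0. */
static int regions_match(const char *local, size_t nlocal,
			 const char *remote, size_t nremote)
{
	size_t i;

	assert(local != NULL && remote != NULL);

	/* every local region must be known to the joining node */
	for (i = 0; i < nlocal; i++)
		if (!contains(remote, nremote, local + i * REGION_NAME_LEN))
			return 0;

	/* every region of the joining node must be known locally */
	for (i = 0; i < nremote; i++)
		if (!contains(local, nlocal, remote + i * REGION_NAME_LEN))
			return 0;

	return 1;
}
```

Both directions are needed: a one-way check would accept a joining node
that heartbeats on a strict superset (or subset) of the local regions.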

The dlm_protocol is bumped up to 1.1.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/ocfs2_nodemanager.h |    6 +
 fs/ocfs2/dlm/dlmcommon.h             |   12 ++-
 fs/ocfs2/dlm/dlmdomain.c             |  220 +++++++++++++++++++++++++++++++++-
 3 files changed, 236 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/cluster/ocfs2_nodemanager.h b/fs/ocfs2/cluster/ocfs2_nodemanager.h
index 5b9854b..49b5943 100644
--- a/fs/ocfs2/cluster/ocfs2_nodemanager.h
+++ b/fs/ocfs2/cluster/ocfs2_nodemanager.h
@@ -36,4 +36,10 @@
 /* host name, group name, cluster name all 64 bytes */
 #define O2NM_MAX_NAME_LEN        64    // __NEW_UTS_LEN
 
+/*
+ * Maximum number of global heartbeat regions allowed.
+ * **CAUTION**  Changing this number will break dlm compatibility.
+ */
+#define O2NM_MAX_REGIONS	32
+
 #endif /* _OCFS2_NODEMANAGER_H */
diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index 4b6ae2c..808591a 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -445,7 +445,8 @@ enum {
 	DLM_LOCK_REQUEST_MSG,	 /* 515 */
 	DLM_RECO_DATA_DONE_MSG,	 /* 516 */
 	DLM_BEGIN_RECO_MSG,	 /* 517 */
-	DLM_FINALIZE_RECO_MSG	 /* 518 */
+	DLM_FINALIZE_RECO_MSG,	 /* 518 */
+	DLM_QUERY_REGION,	 /* 519 */
 };
 
 struct dlm_reco_node_data
@@ -727,6 +728,15 @@ struct dlm_cancel_join
 	u8 domain[O2NM_MAX_NAME_LEN];
 };
 
+struct dlm_query_region {
+	u8 qr_node;
+	u8 qr_numregions;
+	u8 qr_namelen;
+	u8 pad1;
+	u8 qr_domain[O2NM_MAX_NAME_LEN];
+	u8 qr_regions[O2HB_MAX_REGION_NAME_LEN * O2NM_MAX_REGIONS];
+};
+
 struct dlm_exit_domain
 {
 	u8 node_idx;
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 153abb5..221941f 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -128,10 +128,13 @@ static DECLARE_WAIT_QUEUE_HEAD(dlm_domain_events);
  * will have a negotiated version with the same major number and a minor
  * number equal or smaller.  The dlm_ctxt->dlm_locking_proto field should
  * be used to determine what a running domain is actually using.
+ *
+ * New in version 1.1:
+ *	- Message DLM_QUERY_REGION added to support global heartbeat
  */
 static const struct dlm_protocol_version dlm_protocol = {
 	.pv_major = 1,
-	.pv_minor = 0,
+	.pv_minor = 1,
 };
 
 #define DLM_DOMAIN_BACKOFF_MS 200
@@ -142,6 +145,8 @@ static int dlm_assert_joined_handler(struct o2net_msg *msg, u32 len, void *data,
 				     void **ret_data);
 static int dlm_cancel_join_handler(struct o2net_msg *msg, u32 len, void *data,
 				   void **ret_data);
+static int dlm_query_region_handler(struct o2net_msg *msg, u32 len,
+				    void *data, void **ret_data);
 static int dlm_exit_domain_handler(struct o2net_msg *msg, u32 len, void *data,
 				   void **ret_data);
 static int dlm_protocol_compare(struct dlm_protocol_version *existing,
@@ -920,6 +925,203 @@ static int dlm_assert_joined_handler(struct o2net_msg *msg, u32 len, void *data,
 	return 0;
 }
 
+static int dlm_match_regions(struct dlm_ctxt *dlm,
+			     struct dlm_query_region *qr)
+{
+	char *local = NULL, *remote = qr->qr_regions;
+	char *l, *r;
+	int localnr, i, j, foundit;
+	int status = 0;
+
+	if (!o2hb_global_heartbeat_active()) {
+		if (qr->qr_numregions) {
+			mlog(ML_ERROR, "Domain %s: Joining node %d has global "
+			     "heartbeat enabled but local node %d does not\n",
+			     qr->qr_domain, qr->qr_node, dlm->node_num);
+			status = -EINVAL;
+		}
+		goto bail;
+	}
+
+	if (o2hb_global_heartbeat_active() && !qr->qr_numregions) {
+		mlog(ML_ERROR, "Domain %s: Local node %d has global "
+		     "heartbeat enabled but joining node %d does not\n",
+		     qr->qr_domain, dlm->node_num, qr->qr_node);
+		status = -EINVAL;
+		goto bail;
+	}
+
+	r = remote;
+	for (i = 0; i < qr->qr_numregions; ++i) {
+		mlog(0, "Region %.*s\n", O2HB_MAX_REGION_NAME_LEN, r);
+		r += O2HB_MAX_REGION_NAME_LEN;
+	}
+
+	local = kmalloc(sizeof(qr->qr_regions), GFP_KERNEL);
+	if (!local) {
+		status = -ENOMEM;
+		goto bail;
+	}
+
+	localnr = o2hb_get_all_regions(local, O2NM_MAX_REGIONS);
+
+	/* compare local regions with remote */
+	l = local;
+	for (i = 0; i < localnr; ++i) {
+		foundit = 0;
+		r = remote;
+		for (j = 0; j < qr->qr_numregions; ++j) {
+			if (!memcmp(l, r, O2HB_MAX_REGION_NAME_LEN)) {
+				foundit = 1;
+				break;
+			}
+			r += O2HB_MAX_REGION_NAME_LEN;
+		}
+		if (!foundit) {
+			status = -EINVAL;
+			mlog(ML_ERROR, "Domain %s: Region '%.*s' registered "
+			     "in local node %d but not in joining node %d\n",
+			     qr->qr_domain, O2HB_MAX_REGION_NAME_LEN, l,
+			     dlm->node_num, qr->qr_node);
+			goto bail;
+		}
+		l += O2HB_MAX_REGION_NAME_LEN;
+	}
+
+	/* compare remote with local regions */
+	r = remote;
+	for (i = 0; i < qr->qr_numregions; ++i) {
+		foundit = 0;
+		l = local;
+		for (j = 0; j < localnr; ++j) {
+			if (!memcmp(r, l, O2HB_MAX_REGION_NAME_LEN)) {
+				foundit = 1;
+				break;
+			}
+			l += O2HB_MAX_REGION_NAME_LEN;
+		}
+		if (!foundit) {
+			status = -EINVAL;
+			mlog(ML_ERROR, "Domain %s: Region '%.*s' registered "
+			     "in joining node %d but not in local node %d\n",
+			     qr->qr_domain, O2HB_MAX_REGION_NAME_LEN, r,
+			     qr->qr_node, dlm->node_num);
+			goto bail;
+		}
+		r += O2HB_MAX_REGION_NAME_LEN;
+	}
+
+bail:
+	kfree(local);
+
+	return status;
+}
+
+static int dlm_send_regions(struct dlm_ctxt *dlm, unsigned long *node_map)
+{
+	struct dlm_query_region *qr = NULL;
+	int status, ret = 0, i;
+	char *p;
+
+	if (find_next_bit(node_map, O2NM_MAX_NODES, 0) >= O2NM_MAX_NODES)
+		goto bail;
+
+	qr = kzalloc(sizeof(struct dlm_query_region), GFP_KERNEL);
+	if (!qr) {
+		ret = -ENOMEM;
+		mlog_errno(ret);
+		goto bail;
+	}
+
+	qr->qr_node = dlm->node_num;
+	qr->qr_namelen = strlen(dlm->name);
+	memcpy(qr->qr_domain, dlm->name, qr->qr_namelen);
+	/* if local hb, the numregions will be zero */
+	if (o2hb_global_heartbeat_active())
+		qr->qr_numregions = o2hb_get_all_regions(qr->qr_regions,
+							 O2NM_MAX_REGIONS);
+
+	p = qr->qr_regions;
+	for (i = 0; i < qr->qr_numregions; ++i, p += O2HB_MAX_REGION_NAME_LEN)
+		mlog(0, "Region %.*s\n", O2HB_MAX_REGION_NAME_LEN, p);
+
+	i = -1;
+	while ((i = find_next_bit(node_map, O2NM_MAX_NODES,
+				  i + 1)) < O2NM_MAX_NODES) {
+		if (i == dlm->node_num)
+			continue;
+
+		mlog(ML_NOTICE, "Sending regions to node %d\n", i);
+
+		ret = o2net_send_message(DLM_QUERY_REGION, DLM_MOD_KEY, qr,
+					 sizeof(struct dlm_query_region),
+					 i, &status);
+		if (ret >= 0)
+			ret = status;
+		if (ret) {
+			mlog(ML_ERROR, "Region mismatch %d, node %d\n",
+			     ret, i);
+			break;
+		}
+	}
+
+bail:
+	kfree(qr);
+	return ret;
+}
+
+static int dlm_query_region_handler(struct o2net_msg *msg, u32 len,
+				    void *data, void **ret_data)
+{
+	struct dlm_query_region *qr;
+	struct dlm_ctxt *dlm = NULL;
+	int status = 0;
+	int locked = 0;
+
+	qr = (struct dlm_query_region *) msg->buf;
+
+	mlog(ML_NOTICE, "Node %u queries hb regions on domain %s\n",
+	     qr->qr_node, qr->qr_domain);
+
+	status = -EINVAL;
+
+	spin_lock(&dlm_domain_lock);
+	dlm = __dlm_lookup_domain_full(qr->qr_domain, qr->qr_namelen);
+	if (!dlm) {
+		mlog(ML_ERROR, "Node %d queried hb regions on domain %s "
+		     "before join domain\n", qr->qr_node, qr->qr_domain);
+		goto bail;
+	}
+
+	spin_lock(&dlm->spinlock);
+	locked = 1;
+	if (dlm->joining_node != qr->qr_node) {
+		mlog(ML_ERROR, "Node %d queried hb regions on domain %s "
+		     "but joining node is %d\n", qr->qr_node, qr->qr_domain,
+		     dlm->joining_node);
+		goto bail;
+	}
+
+	/* Support for global heartbeat was added in 1.1 */
+	if (dlm->dlm_locking_proto.pv_major == 1 &&
+	    dlm->dlm_locking_proto.pv_minor == 0) {
+		mlog(ML_ERROR, "Node %d queried hb regions on domain %s "
+		     "but active dlm protocol is %d.%d\n", qr->qr_node,
+		     qr->qr_domain, dlm->dlm_locking_proto.pv_major,
+		     dlm->dlm_locking_proto.pv_minor);
+		goto bail;
+	}
+
+	status = dlm_match_regions(dlm, qr);
+
+bail:
+	if (locked)
+		spin_unlock(&dlm->spinlock);
+	spin_unlock(&dlm_domain_lock);
+
+	return status;
+}
+
 static int dlm_cancel_join_handler(struct o2net_msg *msg, u32 len, void *data,
 				   void **ret_data)
 {
@@ -1240,6 +1442,15 @@ static int dlm_try_to_join_domain(struct dlm_ctxt *dlm)
 	set_bit(dlm->node_num, dlm->domain_map);
 	spin_unlock(&dlm->spinlock);
 
+	/* Support for global heartbeat was added in 1.1 */
+	if (dlm_protocol.pv_major > 1 || dlm_protocol.pv_minor > 0) {
+		status = dlm_send_regions(dlm, ctxt->yes_resp_map);
+		if (status) {
+			mlog_errno(status);
+			goto bail;
+		}
+	}
+
 	dlm_send_join_asserts(dlm, ctxt->yes_resp_map);
 
 	/* Joined state *must* be set before the joining node
@@ -1806,6 +2017,13 @@ static int dlm_register_net_handlers(void)
 					sizeof(struct dlm_cancel_join),
 					dlm_cancel_join_handler,
 					NULL, NULL, &dlm_join_handlers);
+	if (status)
+		goto bail;
+
+	status = o2net_register_handler(DLM_QUERY_REGION, DLM_MOD_KEY,
+					sizeof(struct dlm_query_region),
+					dlm_query_region_handler,
+					NULL, NULL, &dlm_join_handlers);
 
 bail:
 	if (status < 0)
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 07/20] ocfs2: Print message if user mounts without starting global heartbeat
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (5 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 06/20] ocfs2/dlm: Add message DLM_QUERY_REGION Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 08/20] ocfs2/dlm: Add message DLM_QUERY_NODEINFO Sunil Mushran
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

In global heartbeat mode, the heartbeat is started by the user. This patch
prints an error if the user attempts to mount a volume without starting the
heartbeat.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/stack_o2cb.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/stack_o2cb.c b/fs/ocfs2/stack_o2cb.c
index 0d3049f..19965b0 100644
--- a/fs/ocfs2/stack_o2cb.c
+++ b/fs/ocfs2/stack_o2cb.c
@@ -283,6 +283,8 @@ static int o2cb_cluster_connect(struct ocfs2_cluster_connection *conn)
 	/* for now we only have one cluster/node, make sure we see it
 	 * in the heartbeat universe */
 	if (!o2hb_check_local_node_heartbeating()) {
+		if (o2hb_global_heartbeat_active())
+			mlog(ML_ERROR, "Global heartbeat not started\n");
 		rc = -EINVAL;
 		goto out;
 	}
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 08/20] ocfs2/dlm: Add message DLM_QUERY_NODEINFO
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (6 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 07/20] ocfs2: Print message if user mounts without starting global heartbeat Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-23 22:18   ` Joel Becker
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 09/20] ocfs2/cluster: Print messages when adding/removing nodes and heartbeat regions Sunil Mushran
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

Add a new dlm message, DLM_QUERY_NODEINFO, that sends the attributes of all
registered nodes. The message is sent if the negotiated dlm protocol is
1.1 or higher. If the node information sent by the joining node does not
match that registered on an existing node, the join domain request is rejected.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
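Note for readers: the comparison described above can be sketched in plain
userspace C. This is only an illustration, not the kernel code in this patch.
The names node_info, find_node, match_nodes and MAX_NODES are hypothetical
stand-ins (MAX_NODES plays the role of O2NM_MAX_NODES). Both sides must agree
on which node numbers exist and on each node's IPv4 address and port.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 255	/* hypothetical stand-in for O2NM_MAX_NODES */

/* Hypothetical userspace mirror of the per-node attributes being compared. */
struct node_info {
	uint8_t  nodenum;
	uint16_t ipv4_port;	/* kept in network byte order */
	uint32_t ipv4_address;	/* kept in network byte order */
};

/* Linear search for a node number; returns NULL if it is not registered. */
static const struct node_info *find_node(const struct node_info *tbl,
					 size_t n, unsigned int num)
{
	for (size_t i = 0; i < n; ++i)
		if (tbl[i].nodenum == num)
			return &tbl[i];
	return NULL;
}

/* Returns 0 if both tables describe the same set of nodes, -1 otherwise. */
static int match_nodes(const struct node_info *local, size_t nlocal,
		       const struct node_info *remote, size_t nremote)
{
	for (unsigned int num = 0; num < MAX_NODES; ++num) {
		const struct node_info *l = find_node(local, nlocal, num);
		const struct node_info *r = find_node(remote, nremote, num);

		if (!l && !r)
			continue;	/* unknown to both sides: fine */
		if (!l || !r)
			return -1;	/* registered on one side only */
		if (l->ipv4_port != r->ipv4_port ||
		    l->ipv4_address != r->ipv4_address)
			return -1;	/* same number, different endpoint */
	}
	return 0;
}
```

The order of entries in either table does not matter, since the lookup is by
node number.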
 fs/ocfs2/dlm/dlmcommon.h |   17 ++++
 fs/ocfs2/dlm/dlmdomain.c |  182 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 198 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index 808591a..a7c590d 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -447,6 +447,7 @@ enum {
 	DLM_BEGIN_RECO_MSG,	 /* 517 */
 	DLM_FINALIZE_RECO_MSG,	 /* 518 */
 	DLM_QUERY_REGION,	 /* 519 */
+	DLM_QUERY_NODEINFO,	 /* 520 */
 };
 
 struct dlm_reco_node_data
@@ -737,6 +738,22 @@ struct dlm_query_region {
 	u8 qr_regions[O2HB_MAX_REGION_NAME_LEN * O2NM_MAX_REGIONS];
 };
 
+struct dlm_node_info {
+	u8 ni_nodenum;
+	u8 pad1;
+	u16 ni_ipv4_port;
+	u32 ni_ipv4_address;
+};
+
+struct dlm_query_nodeinfo {
+	u8 qn_nodenum;
+	u8 qn_numnodes;
+	u8 qn_namelen;
+	u8 pad1;
+	u8 qn_domain[O2NM_MAX_NAME_LEN];
+	struct dlm_node_info qn_nodes[O2NM_MAX_NODES];
+};
+
 struct dlm_exit_domain
 {
 	u8 node_idx;
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 221941f..f8dc76f 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -131,6 +131,7 @@ static DECLARE_WAIT_QUEUE_HEAD(dlm_domain_events);
  *
  * New in version 1.1:
  *	- Message DLM_QUERY_REGION added to support global heartbeat
+ *	- Message DLM_QUERY_NODEINFO added to allow online node removes
  */
 static const struct dlm_protocol_version dlm_protocol = {
 	.pv_major = 1,
@@ -1122,6 +1123,173 @@ bail:
 	return status;
 }
 
+static int dlm_match_nodes(struct dlm_ctxt *dlm, struct dlm_query_nodeinfo *qn)
+{
+	struct o2nm_node *local;
+	struct dlm_node_info *remote;
+	int i, j;
+	int status = 0;
+
+	for (j = 0; j < qn->qn_numnodes; ++j)
+		mlog(0, "Node %3d, %pI4:%u\n", qn->qn_nodes[j].ni_nodenum,
+		     &(qn->qn_nodes[j].ni_ipv4_address),
+		     ntohs(qn->qn_nodes[j].ni_ipv4_port));
+
+	for (i = 0; i < O2NM_MAX_NODES && !status; ++i) {
+		local = o2nm_get_node_by_num(i);
+		remote = NULL;
+		for (j = 0; j < qn->qn_numnodes; ++j) {
+			if (qn->qn_nodes[j].ni_nodenum == i) {
+				remote = &(qn->qn_nodes[j]);
+				break;
+			}
+		}
+
+		if (!local && !remote)
+			continue;
+
+		if ((local && !remote) || (!local && remote))
+			status = -EINVAL;
+
+		if (!status &&
+		    ((remote->ni_nodenum != local->nd_num) ||
+		     (remote->ni_ipv4_port != local->nd_ipv4_port) ||
+		     (remote->ni_ipv4_address != local->nd_ipv4_address)))
+			status = -EINVAL;
+
+		if (status) {
+			if (remote && !local)
+				mlog(ML_ERROR, "Domain %s: Node %d (%pI4:%u) "
+				     "registered in joining node %d but not in "
+				     "local node %d\n", qn->qn_domain,
+				     remote->ni_nodenum,
+				     &(remote->ni_ipv4_address),
+				     ntohs(remote->ni_ipv4_port),
+				     qn->qn_nodenum, dlm->node_num);
+			if (local && !remote)
+				mlog(ML_ERROR, "Domain %s: Node %d (%pI4:%u) "
+				     "registered in local node %d but not in "
+				     "joining node %d\n", qn->qn_domain,
+				     local->nd_num, &(local->nd_ipv4_address),
+				     ntohs(local->nd_ipv4_port),
+				     dlm->node_num, qn->qn_nodenum);
+			BUG_ON((!local && !remote));
+		}
+
+		if (local)
+			o2nm_node_put(local);
+	}
+
+	return status;
+}
+
+static int dlm_send_nodeinfo(struct dlm_ctxt *dlm, unsigned long *node_map)
+{
+	struct dlm_query_nodeinfo *qn = NULL;
+	struct o2nm_node *node;
+	int ret = 0, status, count, i;
+
+	if (find_next_bit(node_map, O2NM_MAX_NODES, 0) >= O2NM_MAX_NODES)
+		goto bail;
+
+	qn = kzalloc(sizeof(struct dlm_query_nodeinfo), GFP_KERNEL);
+	if (!qn) {
+		ret = -ENOMEM;
+		mlog_errno(ret);
+		goto bail;
+	}
+
+	for (i = 0, count = 0; i < O2NM_MAX_NODES; ++i) {
+		node = o2nm_get_node_by_num(i);
+		if (!node)
+			continue;
+		qn->qn_nodes[count].ni_nodenum = node->nd_num;
+		qn->qn_nodes[count].ni_ipv4_port = node->nd_ipv4_port;
+		qn->qn_nodes[count].ni_ipv4_address = node->nd_ipv4_address;
+		mlog(0, "Node %3d, %pI4:%u\n", node->nd_num,
+		     &(node->nd_ipv4_address), ntohs(node->nd_ipv4_port));
+		++count;
+		o2nm_node_put(node);
+	}
+
+	qn->qn_nodenum = dlm->node_num;
+	qn->qn_numnodes = count;
+	qn->qn_namelen = strlen(dlm->name);
+	memcpy(qn->qn_domain, dlm->name, qn->qn_namelen);
+
+	i = -1;
+	while ((i = find_next_bit(node_map, O2NM_MAX_NODES,
+				  i + 1)) < O2NM_MAX_NODES) {
+		if (i == dlm->node_num)
+			continue;
+
+		mlog(ML_NOTICE, "Sending nodeinfo to node %d\n", i);
+
+		ret = o2net_send_message(DLM_QUERY_NODEINFO, DLM_MOD_KEY,
+					 qn, sizeof(struct dlm_query_nodeinfo),
+					 i, &status);
+		if (ret >= 0)
+			ret = status;
+		if (ret) {
+			mlog(ML_ERROR, "node mismatch %d, node %d\n", ret, i);
+			break;
+		}
+	}
+
+bail:
+	kfree(qn);
+	return ret;
+}
+
+static int dlm_query_nodeinfo_handler(struct o2net_msg *msg, u32 len,
+				      void *data, void **ret_data)
+{
+	struct dlm_query_nodeinfo *qn;
+	struct dlm_ctxt *dlm = NULL;
+	int locked = 0, status = -EINVAL;
+
+	qn = (struct dlm_query_nodeinfo *) msg->buf;
+
+	mlog(ML_NOTICE, "Node %u queries nodes on domain %s\n",
+	     qn->qn_nodenum, qn->qn_domain);
+
+	spin_lock(&dlm_domain_lock);
+	dlm = __dlm_lookup_domain_full(qn->qn_domain, qn->qn_namelen);
+	if (!dlm) {
+		mlog(ML_ERROR, "Node %d queried nodes on domain %s before "
+		     "join domain\n", qn->qn_nodenum, qn->qn_domain);
+		goto bail;
+	}
+
+	spin_lock(&dlm->spinlock);
+	locked = 1;
+	if (dlm->joining_node != qn->qn_nodenum) {
+		mlog(ML_ERROR, "Node %d queried nodes on domain %s but "
+		     "joining node is %d\n", qn->qn_nodenum, qn->qn_domain,
+		     dlm->joining_node);
+		goto bail;
+	}
+
+	/* Support for node query was added in 1.1 */
+	if (dlm->dlm_locking_proto.pv_major == 1 &&
+	    dlm->dlm_locking_proto.pv_minor == 0) {
+		mlog(ML_ERROR, "Node %d queried nodes on domain %s "
+		     "but active dlm protocol is %d.%d\n", qn->qn_nodenum,
+		     qn->qn_domain, dlm->dlm_locking_proto.pv_major,
+		     dlm->dlm_locking_proto.pv_minor);
+		goto bail;
+	}
+
+	status = dlm_match_nodes(dlm, qn);
+
+bail:
+	if (locked)
+		spin_unlock(&dlm->spinlock);
+	spin_unlock(&dlm_domain_lock);
+
+	return status;
+}
+
 static int dlm_cancel_join_handler(struct o2net_msg *msg, u32 len, void *data,
 				   void **ret_data)
 {
@@ -1442,8 +1610,13 @@ static int dlm_try_to_join_domain(struct dlm_ctxt *dlm)
 	set_bit(dlm->node_num, dlm->domain_map);
 	spin_unlock(&dlm->spinlock);
 
-	/* Support for global heartbeat was added in 1.1 */
+	/* Support for global heartbeat and node info was added in 1.1 */
 	if (dlm_protocol.pv_major > 1 || dlm_protocol.pv_minor > 0) {
+		status = dlm_send_nodeinfo(dlm, ctxt->yes_resp_map);
+		if (status) {
+			mlog_errno(status);
+			goto bail;
+		}
 		status = dlm_send_regions(dlm, ctxt->yes_resp_map);
 		if (status) {
 			mlog_errno(status);
@@ -2025,6 +2198,13 @@ static int dlm_register_net_handlers(void)
 					dlm_query_region_handler,
 					NULL, NULL, &dlm_join_handlers);
 
+	if (status)
+		goto bail;
+
+	status = o2net_register_handler(DLM_QUERY_NODEINFO, DLM_MOD_KEY,
+					sizeof(struct dlm_query_nodeinfo),
+					dlm_query_nodeinfo_handler,
+					NULL, NULL, &dlm_join_handlers);
 bail:
 	if (status < 0)
 		dlm_unregister_net_handlers();
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 09/20] ocfs2/cluster: Print messages when adding/removing nodes and heartbeat regions
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (7 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 08/20] ocfs2/dlm: Add message DLM_QUERY_NODEINFO Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-23 22:25   ` Joel Becker
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 10/20] ocfs2/cluster: Check slots for unconfigured live nodes Sunil Mushran
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

Prints messages when the user adds or removes nodes and heartbeat regions.
The heartbeat region logging is only enabled in the global heartbeat mode. These
messages are useful when debugging cluster-related issues.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/heartbeat.c   |    9 ++++++++-
 fs/ocfs2/cluster/nodemanager.c |    4 ++++
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index cec9d4c..1d71856 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1476,6 +1476,10 @@ static ssize_t o2hb_region_dev_write(struct o2hb_region *reg,
 	else
 		ret = -EIO;
 
+	if (hb_task && o2hb_global_heartbeat_active())
+		printk(KERN_NOTICE "o2hb: Heartbeat started on region %s\n",
+		       config_item_name(&reg->hr_item));
+
 out:
 	if (filp)
 		fput(filp);
@@ -1659,6 +1663,9 @@ static void o2hb_heartbeat_group_drop_item(struct config_group *group,
 		wake_up(&o2hb_steady_queue);
 	}
 
+	if (o2hb_global_heartbeat_active())
+		printk(KERN_NOTICE "o2hb: Heartbeat stopped on region %s\n",
+		       config_item_name(&reg->hr_item));
 	config_item_put(item);
 }
 
@@ -1743,7 +1750,7 @@ ssize_t o2hb_heartbeat_group_mode_store(struct o2hb_heartbeat_group *group,
 
 		ret = o2hb_global_hearbeat_mode_set(i);
 		if (!ret)
-			printk(KERN_NOTICE "ocfs2: Heartbeat mode set to %s\n",
+			printk(KERN_NOTICE "o2hb: Heartbeat mode set to %s\n",
 			       o2hb_heartbeat_mode_desc[i]);
 		return count;
 	}
diff --git a/fs/ocfs2/cluster/nodemanager.c b/fs/ocfs2/cluster/nodemanager.c
index ed0c9f3..f488fbe 100644
--- a/fs/ocfs2/cluster/nodemanager.c
+++ b/fs/ocfs2/cluster/nodemanager.c
@@ -711,6 +711,8 @@ static struct config_item *o2nm_node_group_make_item(struct config_group *group,
 	config_item_init_type_name(&node->nd_item, name, &o2nm_node_type);
 	spin_lock_init(&node->nd_lock);
 
+	printk(KERN_NOTICE "o2nm: Registering node %s\n", name);
+
 	return &node->nd_item;
 }
 
@@ -744,6 +746,8 @@ static void o2nm_node_group_drop_item(struct config_group *group,
 	}
 	write_unlock(&cluster->cl_nodes_lock);
 
+	printk(KERN_NOTICE "o2nm: Unregistered node %s\n",
+	       config_item_name(&node->nd_item));
 	config_item_put(item);
 }
 
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 10/20] ocfs2/cluster: Check slots for unconfigured live nodes
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (8 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 09/20] ocfs2/cluster: Print messages when adding/removing nodes and heartbeat regions Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-23 22:31   ` Joel Becker
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 11/20] ocfs2/cluster: Reorganize o2hb debugfs init Sunil Mushran
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

o2hb currently checks slots for configured nodes only. This patch makes
it also check the slots of live nodes, to handle a race in which a node
is removed from the configuration but not from the live map.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
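Note for readers: the idea is that the set of slots to read becomes the union
of the configured-node map and the live-node map. A minimal userspace sketch,
assuming hypothetical helper names (set_node, test_node,
merge_live_into_configured) and a stand-in MAX_NODES; the kernel code uses
find_next_bit() and set_bit() instead of the word-wise OR shown here.

```c
#include <limits.h>
#include <stddef.h>

#define MAX_NODES	255	/* hypothetical stand-in for O2NM_MAX_NODES */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define WORDS		((MAX_NODES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Set bit n in a node bitmap. */
static void set_node(unsigned long *map, unsigned int n)
{
	map[n / BITS_PER_WORD] |= 1UL << (n % BITS_PER_WORD);
}

/* Return 1 if bit n is set in a node bitmap, 0 otherwise. */
static int test_node(const unsigned long *map, unsigned int n)
{
	return (int)((map[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1UL);
}

/*
 * After this call, 'configured' also covers every node that is still
 * marked live, so its slot keeps being read until it can be marked dead.
 */
static void merge_live_into_configured(unsigned long *configured,
				       const unsigned long *live)
{
	for (size_t w = 0; w < WORDS; ++w)
		configured[w] |= live[w];
}
```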
 fs/ocfs2/cluster/heartbeat.c |   38 +++++++++++++++++++++++++++++++-------
 1 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 1d71856..de798c7 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -593,14 +593,24 @@ static int o2hb_check_slot(struct o2hb_region *reg,
 	u64 cputime;
 	unsigned int dead_ms = o2hb_dead_threshold * O2HB_REGION_TIMEOUT_MS;
 	unsigned int slot_dead_ms;
+	int tmp;
 
 	memcpy(hb_block, slot->ds_raw_block, reg->hr_block_bytes);
 
-	/* Is this correct? Do we assume that the node doesn't exist
-	 * if we're not configured for him? */
+	/*
+	 * If a node is no longer configured but is still in the livemap, we
+	 * may need to clear that bit from the livemap.
+	 */
 	node = o2nm_get_node_by_num(slot->ds_node_num);
-	if (!node)
-		return 0;
+	if (!node) {
+		spin_lock(&o2hb_live_lock);
+		tmp = test_bit(slot->ds_node_num, o2hb_live_node_bitmap);
+		spin_unlock(&o2hb_live_lock);
+		if (!tmp)
+			return 0;
+		printk(KERN_NOTICE "o2hb: Live node %d is not registered\n",
+		       slot->ds_node_num);
+	}
 
 	if (!o2hb_verify_crc(reg, hb_block)) {
 		/* all paths from here will drop o2hb_live_lock for
@@ -717,8 +727,9 @@ fire_callbacks:
 		if (list_empty(&o2hb_live_slots[slot->ds_node_num])) {
 			clear_bit(slot->ds_node_num, o2hb_live_node_bitmap);
 
-			o2hb_queue_node_event(&event, O2HB_NODE_DOWN_CB, node,
-					      slot->ds_node_num);
+			if (node)
+				o2hb_queue_node_event(&event, O2HB_NODE_DOWN_CB,
+						      node, slot->ds_node_num);
 
 			changed = 1;
 		}
@@ -738,7 +749,8 @@ out:
 
 	o2hb_run_event_list(&event);
 
-	o2nm_node_put(node);
+	if (node)
+		o2nm_node_put(node);
 	return changed;
 }
 
@@ -765,6 +777,7 @@ static int o2hb_do_disk_heartbeat(struct o2hb_region *reg)
 {
 	int i, ret, highest_node, change = 0;
 	unsigned long configured_nodes[BITS_TO_LONGS(O2NM_MAX_NODES)];
+	unsigned long live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
 	struct o2hb_bio_wait_ctxt write_wc;
 
 	ret = o2nm_configured_node_map(configured_nodes,
@@ -774,6 +787,17 @@ static int o2hb_do_disk_heartbeat(struct o2hb_region *reg)
 		return ret;
 	}
 
+	/*
+	 * If a node is not configured but is in the livemap, we still need
+	 * to read the slot so as to be able to remove it from the livemap.
+	 */
+	o2hb_fill_node_map(live_node_bitmap, sizeof(live_node_bitmap));
+	i = -1;
+	while((i = find_next_bit(live_node_bitmap,
+				 O2NM_MAX_NODES, i + 1)) < O2NM_MAX_NODES) {
+		set_bit(i, configured_nodes);
+	}
+
 	highest_node = o2hb_highest_node(configured_nodes, O2NM_MAX_NODES);
 	if (highest_node >= O2NM_MAX_NODES) {
 		mlog(ML_NOTICE, "ocfs2_heartbeat: no configured nodes found!\n");
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 11/20] ocfs2/cluster: Reorganize o2hb debugfs init
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (9 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 10/20] ocfs2/cluster: Check slots for unconfigured live nodes Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 12/20] ocfs2/cluster: Maintain live node bitmap per heartbeat region Sunil Mushran
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

o2hb debugfs handling is reorganized to allow for easy expansion.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/heartbeat.c |  101 ++++++++++++++++++++++++++++++++----------
 1 files changed, 78 insertions(+), 23 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index de798c7..683478d 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -62,8 +62,19 @@ static unsigned long o2hb_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
 static LIST_HEAD(o2hb_node_events);
 static DECLARE_WAIT_QUEUE_HEAD(o2hb_steady_queue);
 
+#define O2HB_DB_TYPE_LIVENODES		0
+struct o2hb_debug_buf {
+	int db_type;
+	int db_size;
+	int db_len;
+	void *db_data;
+};
+
+static struct o2hb_debug_buf *o2hb_db_livenodes;
+
 #define O2HB_DEBUG_DIR			"o2hb"
 #define O2HB_DEBUG_LIVENODES		"livenodes"
+
 static struct dentry *o2hb_debug_dir;
 static struct dentry *o2hb_debug_livenodes;
 
@@ -969,21 +980,35 @@ static int o2hb_thread(void *data)
 #ifdef CONFIG_DEBUG_FS
 static int o2hb_debug_open(struct inode *inode, struct file *file)
 {
+	struct o2hb_debug_buf *db = inode->i_private;
 	unsigned long map[BITS_TO_LONGS(O2NM_MAX_NODES)];
 	char *buf = NULL;
 	int i = -1;
 	int out = 0;
 
+	/* max_nodes should be the largest bitmap we pass here */
+	BUG_ON(sizeof(map) < db->db_size);
+
 	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
 	if (!buf)
 		goto bail;
 
-	o2hb_fill_node_map(map, sizeof(map));
+	switch(db->db_type) {
+	case O2HB_DB_TYPE_LIVENODES:
+		spin_lock(&o2hb_live_lock);
+		memcpy(map, db->db_data, db->db_size);
+		spin_unlock(&o2hb_live_lock);
+		break;
+
+	default:
+		goto done;
+	}
 
-	while ((i = find_next_bit(map, O2NM_MAX_NODES, i + 1)) < O2NM_MAX_NODES)
+	while ((i = find_next_bit(map, db->db_len, i + 1)) < db->db_len)
 		out += snprintf(buf + out, PAGE_SIZE - out, "%d ", i);
 	out += snprintf(buf + out, PAGE_SIZE - out, "\n");
 
+done:
 	i_size_write(inode, out);
 
 	file->private_data = buf;
@@ -1030,10 +1055,56 @@ static const struct file_operations o2hb_debug_fops = {
 
 void o2hb_exit(void)
 {
-	if (o2hb_debug_livenodes)
-		debugfs_remove(o2hb_debug_livenodes);
-	if (o2hb_debug_dir)
-		debugfs_remove(o2hb_debug_dir);
+	kfree(o2hb_db_livenodes);
+	debugfs_remove(o2hb_debug_livenodes);
+	debugfs_remove(o2hb_debug_dir);
+}
+
+static struct dentry *o2hb_debug_create(const char *name, struct dentry *dir,
+					struct o2hb_debug_buf **db, int db_len,
+					int type, int size, int len, void *data)
+{
+	*db = kmalloc(db_len, GFP_KERNEL);
+	if (!*db)
+		return NULL;
+
+	(*db)->db_type = type;
+	(*db)->db_size = size;
+	(*db)->db_len = len;
+	(*db)->db_data = data;
+
+	return debugfs_create_file(name, S_IFREG|S_IRUSR, dir, *db,
+				   &o2hb_debug_fops);
+}
+
+static int o2hb_debug_init(void)
+{
+	int ret = -ENOMEM;
+
+	o2hb_debug_dir = debugfs_create_dir(O2HB_DEBUG_DIR, NULL);
+	if (!o2hb_debug_dir) {
+		mlog_errno(ret);
+		goto bail;
+	}
+
+	o2hb_debug_livenodes = o2hb_debug_create(O2HB_DEBUG_LIVENODES,
+						 o2hb_debug_dir,
+						 &o2hb_db_livenodes,
+						 sizeof(*o2hb_db_livenodes),
+						 O2HB_DB_TYPE_LIVENODES,
+						 sizeof(o2hb_live_node_bitmap),
+						 O2NM_MAX_NODES,
+						 o2hb_live_node_bitmap);
+	if (!o2hb_debug_livenodes) {
+		mlog_errno(ret);
+		goto bail;
+	}
+	ret = 0;
+bail:
+	if (ret)
+		o2hb_exit();
+
+	return ret;
 }
 
 int o2hb_init(void)
@@ -1050,23 +1121,7 @@ int o2hb_init(void)
 
 	memset(o2hb_live_node_bitmap, 0, sizeof(o2hb_live_node_bitmap));
 
-	o2hb_debug_dir = debugfs_create_dir(O2HB_DEBUG_DIR, NULL);
-	if (!o2hb_debug_dir) {
-		mlog_errno(-ENOMEM);
-		return -ENOMEM;
-	}
-
-	o2hb_debug_livenodes = debugfs_create_file(O2HB_DEBUG_LIVENODES,
-						   S_IFREG|S_IRUSR,
-						   o2hb_debug_dir, NULL,
-						   &o2hb_debug_fops);
-	if (!o2hb_debug_livenodes) {
-		mlog_errno(-ENOMEM);
-		debugfs_remove(o2hb_debug_dir);
-		return -ENOMEM;
-	}
-
-	return 0;
+	return o2hb_debug_init();
 }
 
 /* if we're already in a callback then we're already serialized by the sem */
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 12/20] ocfs2/cluster: Maintain live node bitmap per heartbeat region
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (10 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 11/20] ocfs2/cluster: Reorganize o2hb debugfs init Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 13/20] ocfs2/cluster: Track number of global heartbeat regions Sunil Mushran
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

Currently we track a global livenode bitmap that keeps track of all nodes
that are heartbeating in all regions.

This patch adds the ability to track the livenode bitmap per region.
We will use this facility in a later patch to allow us to withstand the loss of
a minority of regions.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
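Note for readers: a rough userspace sketch of the relationship between the
per-region maps and the global map. It assumes, as a simplification, that a
node is globally live while at least one region still sees it, so the global
view is the OR of the per-region maps. NUM_REGIONS, node_seen, node_lost and
global_live are hypothetical names, and one unsigned long per region limits
the sketch to node numbers below 32.

```c
#define NUM_REGIONS 3	/* hypothetical region count for the sketch */

/* One word per region; bit n set means node n heartbeats in that region. */
static unsigned long region_live[NUM_REGIONS];

/* Node started heartbeating in region r. */
static void node_seen(unsigned int r, unsigned int node)
{
	region_live[r] |= 1UL << node;
}

/* Node stopped heartbeating in region r. */
static void node_lost(unsigned int r, unsigned int node)
{
	region_live[r] &= ~(1UL << node);
}

/* A node counts as live while any region still sees it. */
static unsigned long global_live(void)
{
	unsigned long all = 0;

	for (unsigned int r = 0; r < NUM_REGIONS; ++r)
		all |= region_live[r];
	return all;
}
```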
 fs/ocfs2/cluster/heartbeat.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 683478d..29b5c70 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -174,6 +174,9 @@ struct o2hb_region {
 	struct block_device	*hr_bdev;
 	struct o2hb_disk_slot	*hr_slots;
 
+	/* live node map of this region */
+	unsigned long		hr_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
+
 	/* let the person setting up hb wait for it to return until it
 	 * has reached a 'steady' state.  This will be fixed when we have
 	 * a more complete api that doesn't lead to this sort of fragility. */
@@ -688,6 +691,8 @@ fire_callbacks:
 		mlog(ML_HEARTBEAT, "Node %d (id 0x%llx) joined my region\n",
 		     slot->ds_node_num, (long long)slot->ds_last_generation);
 
+		set_bit(slot->ds_node_num, reg->hr_live_node_bitmap);
+
 		/* first on the list generates a callback */
 		if (list_empty(&o2hb_live_slots[slot->ds_node_num])) {
 			set_bit(slot->ds_node_num, o2hb_live_node_bitmap);
@@ -733,6 +738,8 @@ fire_callbacks:
 		mlog(ML_HEARTBEAT, "Node %d left my region\n",
 		     slot->ds_node_num);
 
+		clear_bit(slot->ds_node_num, reg->hr_live_node_bitmap);
+
 		/* last off the live_slot generates a callback */
 		list_del_init(&slot->ds_live_item);
 		if (list_empty(&o2hb_live_slots[slot->ds_node_num])) {
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 13/20] ocfs2/cluster: Track number of global heartbeat regions
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (11 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 12/20] ocfs2/cluster: Maintain live node bitmap per heartbeat region Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-25  9:36   ` Wengang Wang
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 14/20] ocfs2/cluster: Track bitmap of live " Sunil Mushran
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

In global heartbeat mode, there is an upper limit on the number of active
regions. This patch tracks the number of active global heartbeat regions and
fails to start the heartbeat if that number exceeds the maximum.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
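Note for readers: the limit is enforced by handing each region the lowest
free bit in a fixed-size bitmap and refusing once no bit is free. A minimal
userspace sketch follows. MAX_REGIONS, region_alloc and region_free are
hypothetical names (MAX_REGIONS stands in for O2NM_MAX_REGIONS, whose value
is not assumed here), and the locking that the patch does under
o2hb_live_lock is left out.

```c
#define MAX_REGIONS 32	/* hypothetical limit; must fit in one unsigned long */

/* Bit n set means region number n is in use. */
static unsigned long region_bitmap;

/* Returns the lowest free region number, or -1 if the limit is reached. */
static int region_alloc(void)
{
	for (int n = 0; n < MAX_REGIONS; ++n) {
		if (!(region_bitmap & (1UL << n))) {
			region_bitmap |= 1UL << n;
			return n;
		}
	}
	return -1;
}

/* Releases a region number so that a later region can reuse it. */
static void region_free(int n)
{
	region_bitmap &= ~(1UL << n);
}
```

Because the lowest free bit is always chosen, a region number released by
region_free() is the first one handed out again.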
 fs/ocfs2/cluster/heartbeat.c |   24 ++++++++++++++++++++++--
 1 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 29b5c70..57c906b 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -62,6 +62,12 @@ static unsigned long o2hb_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
 static LIST_HEAD(o2hb_node_events);
 static DECLARE_WAIT_QUEUE_HEAD(o2hb_steady_queue);
 
+/*
+ * In global heartbeat, we maintain a series of region bitmaps.
+ * 	- o2hb_region_bitmap allows us to limit the region number to max region.
+ */
+static unsigned long o2hb_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
+
 #define O2HB_DB_TYPE_LIVENODES		0
 struct o2hb_debug_buf {
 	int db_type;
@@ -176,6 +182,7 @@ struct o2hb_region {
 
 	/* live node map of this region */
 	unsigned long		hr_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
+	unsigned int		hr_region_num;
 
 	/* let the person setting up hb wait for it to return until it
 	 * has reached a 'steady' state.  This will be fixed when we have
@@ -1127,6 +1134,7 @@ int o2hb_init(void)
 	INIT_LIST_HEAD(&o2hb_node_events);
 
 	memset(o2hb_live_node_bitmap, 0, sizeof(o2hb_live_node_bitmap));
+	memset(o2hb_region_bitmap, 0, sizeof(o2hb_region_bitmap));
 
 	return o2hb_debug_init();
 }
@@ -1716,12 +1724,22 @@ static struct config_item *o2hb_heartbeat_group_make_item(struct config_group *g
 	if (strlen(name) > O2HB_MAX_REGION_NAME_LEN)
 		return ERR_PTR(-ENAMETOOLONG);
 
-	config_item_init_type_name(&reg->hr_item, name, &o2hb_region_type);
-
 	spin_lock(&o2hb_live_lock);
+	reg->hr_region_num = 0;
+	if (o2hb_global_heartbeat_active()) {
+		reg->hr_region_num = find_first_zero_bit(o2hb_region_bitmap,
+							 O2NM_MAX_REGIONS);
+		if (reg->hr_region_num >= O2NM_MAX_REGIONS) {
+			spin_unlock(&o2hb_live_lock);
+			return ERR_PTR(-EFBIG);
+		}
+		set_bit(reg->hr_region_num, o2hb_region_bitmap);
+	}
 	list_add_tail(&reg->hr_all_item, &o2hb_all_regions);
 	spin_unlock(&o2hb_live_lock);
 
+	config_item_init_type_name(&reg->hr_item, name, &o2hb_region_type);
+
 	return &reg->hr_item;
 }
 
@@ -1733,6 +1751,8 @@ static void o2hb_heartbeat_group_drop_item(struct config_group *group,
 
 	/* stop the thread when the user removes the region dir */
 	spin_lock(&o2hb_live_lock);
+	if (o2hb_global_heartbeat_active())
+		clear_bit(reg->hr_region_num, o2hb_region_bitmap);
 	hb_task = reg->hr_task;
 	reg->hr_task = NULL;
 	spin_unlock(&o2hb_live_lock);
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 14/20] ocfs2/cluster: Track bitmap of live heartbeat regions
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (12 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 13/20] ocfs2/cluster: Track number of global heartbeat regions Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 15/20] ocfs2/cluster: Maintain bitmap of quorum regions Sunil Mushran
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

A heartbeat region becomes live (or active) after a fixed number of (steady)
iterations. This patch tracks such live regions in a bitmap.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/heartbeat.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 57c906b..9339c82 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -65,8 +65,10 @@ static DECLARE_WAIT_QUEUE_HEAD(o2hb_steady_queue);
 /*
  * In global heartbeat, we maintain a series of region bitmaps.
  * 	- o2hb_region_bitmap allows us to limit the region number to max region.
+ * 	- o2hb_live_region_bitmap tracks live regions (seen steady iterations).
  */
 static unsigned long o2hb_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
+static unsigned long o2hb_live_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
 
 #define O2HB_DB_TYPE_LIVENODES		0
 struct o2hb_debug_buf {
@@ -1135,6 +1137,7 @@ int o2hb_init(void)
 
 	memset(o2hb_live_node_bitmap, 0, sizeof(o2hb_live_node_bitmap));
 	memset(o2hb_region_bitmap, 0, sizeof(o2hb_region_bitmap));
+	memset(o2hb_live_region_bitmap, 0, sizeof(o2hb_live_region_bitmap));
 
 	return o2hb_debug_init();
 }
@@ -1563,6 +1566,8 @@ static ssize_t o2hb_region_dev_write(struct o2hb_region *reg,
 	/* Ok, we were woken.  Make sure it wasn't by drop_item() */
 	spin_lock(&o2hb_live_lock);
 	hb_task = reg->hr_task;
+	if (o2hb_global_heartbeat_active())
+		set_bit(reg->hr_region_num, o2hb_live_region_bitmap);
 	spin_unlock(&o2hb_live_lock);
 
 	if (hb_task)
@@ -1751,8 +1756,10 @@ static void o2hb_heartbeat_group_drop_item(struct config_group *group,
 
 	/* stop the thread when the user removes the region dir */
 	spin_lock(&o2hb_live_lock);
-	if (o2hb_global_heartbeat_active())
+	if (o2hb_global_heartbeat_active()) {
 		clear_bit(reg->hr_region_num, o2hb_region_bitmap);
+		clear_bit(reg->hr_region_num, o2hb_live_region_bitmap);
+	}
 	hb_task = reg->hr_task;
 	reg->hr_task = NULL;
 	spin_unlock(&o2hb_live_lock);
-- 
1.7.0.4


* [Ocfs2-devel] [PATCH 15/20] ocfs2/cluster: Maintain bitmap of quorum regions
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (13 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 14/20] ocfs2/cluster: Track bitmap of live " Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-23 22:34   ` Joel Becker
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 16/20] ocfs2/cluster: Maintain bitmap of failed regions Sunil Mushran
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

o2hb allows online adding of regions. However, a newly added region is not
used in quorum calculations unless it has been added on all nodes. This patch
tracks a bitmap of such quorum regions.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
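Note for readers: the promotion rule can be sketched in userspace C as
below. A region becomes a quorum region once its own live-node map equals
the cluster-wide live-node map and enough changed samples have been seen.
The names region, try_promote, MAP_WORDS and LIVE_THRESHOLD are hypothetical,
and LIVE_THRESHOLD is an arbitrary value for the sketch, not the kernel's
O2HB_LIVE_THRESHOLD.

```c
#include <string.h>

#define MAP_WORDS	4	/* hypothetical bitmap size, in words */
#define LIVE_THRESHOLD	2	/* hypothetical sample threshold */

struct region {
	unsigned long live_nodes[MAP_WORDS];	/* nodes seen in this region */
	int is_quorum;				/* set once, never cleared here */
};

/* Returns 1 if the region is (or has just become) a quorum region. */
static int try_promote(struct region *reg, const unsigned long *cluster_live,
		       unsigned int changed_samples)
{
	if (reg->is_quorum)
		return 1;

	/* Every live node must be heartbeating on this region. */
	if (memcmp(reg->live_nodes, cluster_live, sizeof(reg->live_nodes)) != 0)
		return 0;

	/* The slot must have been seen changing often enough. */
	if (changed_samples < LIVE_THRESHOLD)
		return 0;

	reg->is_quorum = 1;
	return 1;
}
```

Once promoted, the region in this sketch stays a quorum region even if the
maps later diverge, mirroring the early return at the top of the function.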
 fs/ocfs2/cluster/heartbeat.c |   35 +++++++++++++++++++++++++++++++++++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 9339c82..5e8e1ae 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -66,9 +66,12 @@ static DECLARE_WAIT_QUEUE_HEAD(o2hb_steady_queue);
  * In global heartbeat, we maintain a series of region bitmaps.
  * 	- o2hb_region_bitmap allows us to limit the region number to max region.
  * 	- o2hb_live_region_bitmap tracks live regions (seen steady iterations).
+ * 	- o2hb_quorum_region_bitmap tracks live regions that have seen all nodes
+ * 		heartbeat on it.
  */
 static unsigned long o2hb_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
 static unsigned long o2hb_live_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
+static unsigned long o2hb_quorum_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
 
 #define O2HB_DB_TYPE_LIVENODES		0
 struct o2hb_debug_buf {
@@ -605,6 +608,35 @@ static void o2hb_shutdown_slot(struct o2hb_disk_slot *slot)
 	o2nm_node_put(node);
 }
 
+static void o2hb_set_quorum_device(struct o2hb_region *reg,
+				   struct o2hb_disk_slot *slot)
+{
+	assert_spin_locked(&o2hb_live_lock);
+
+	if (!o2hb_global_heartbeat_active())
+		return;
+
+	if (test_bit(reg->hr_region_num, o2hb_quorum_region_bitmap))
+		return;
+
+	/*
+	 * A region can be added to the quorum only when it sees all
+	 * live nodes heartbeat on it. In other words, the region has been
+	 * added to all nodes.
+	 */
+	if (memcmp(reg->hr_live_node_bitmap, o2hb_live_node_bitmap,
+		   sizeof(o2hb_live_node_bitmap)))
+		return;
+
+	if (slot->ds_changed_samples < O2HB_LIVE_THRESHOLD)
+		return;
+
+	printk(KERN_NOTICE "o2hb: Region %s is now a quorum device\n",
+	       config_item_name(&reg->hr_item));
+
+	set_bit(reg->hr_region_num, o2hb_quorum_region_bitmap);
+}
+
 static int o2hb_check_slot(struct o2hb_region *reg,
 			   struct o2hb_disk_slot *slot)
 {
@@ -772,6 +804,8 @@ fire_callbacks:
 		slot->ds_equal_samples = 0;
 	}
 out:
+	o2hb_set_quorum_device(reg, slot);
+
 	spin_unlock(&o2hb_live_lock);
 
 	o2hb_run_event_list(&event);
@@ -1138,6 +1172,7 @@ int o2hb_init(void)
 	memset(o2hb_live_node_bitmap, 0, sizeof(o2hb_live_node_bitmap));
 	memset(o2hb_region_bitmap, 0, sizeof(o2hb_region_bitmap));
 	memset(o2hb_live_region_bitmap, 0, sizeof(o2hb_live_region_bitmap));
+	memset(o2hb_quorum_region_bitmap, 0, sizeof(o2hb_quorum_region_bitmap));
 
 	return o2hb_debug_init();
 }
-- 
1.7.0.4

* [Ocfs2-devel] [PATCH 16/20] ocfs2/cluster: Maintain bitmap of failed regions
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (14 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 15/20] ocfs2/cluster: Maintain bitmap of quorum regions Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-23 22:35   ` Joel Becker
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 17/20] ocfs2/cluster: Create debugfs files for live, quorum and failed region bitmaps Sunil Mushran
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

In global heartbeat mode, we track the bitmap of regions that have seen
heartbeat timeouts. We fence if the number of such regions is greater than
or equal to half the number of quorum regions.
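
The fencing rule can be sketched in userspace as follows. This is only an
illustration under stated assumptions: each bitmap fits in one unsigned long,
and the sketch_* helpers are hypothetical (the patch itself counts bits with
find_next_bit() over O2NM_MAX_REGIONS):

```c
#include <assert.h>
#include <stdbool.h>

/* Count set bits by clearing the lowest set bit until none remain. */
static int sketch_pop_count(unsigned long map)
{
	int pop = 0;

	while (map) {
		map &= map - 1;
		pop++;
	}
	return pop;
}

/*
 * Fence when failed regions >= half the quorum regions, written as
 * 2 * failed >= quorum to avoid integer division.
 */
static bool sketch_should_fence(unsigned long failed_map,
				unsigned long quorum_map)
{
	int failed = sketch_pop_count(failed_map);
	int quorum = sketch_pop_count(quorum_map);

	return (failed << 1) >= quorum;
}
```

For example, one failed region out of three quorum regions does not fence,
while one out of two does. Note that with zero quorum regions the comparison
is also true, which matches the fall-through in the check as posted.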

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/heartbeat.c |   41 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 5e8e1ae..6be817b 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -68,10 +68,12 @@ static DECLARE_WAIT_QUEUE_HEAD(o2hb_steady_queue);
  * 	- o2hb_live_region_bitmap tracks live regions (seen steady iterations).
  * 	- o2hb_quorum_region_bitmap tracks live regions that have seen all nodes
  * 		heartbeat on it.
+ * 	- o2hb_failed_region_bitmap tracks the regions that have seen io timeouts.
  */
 static unsigned long o2hb_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
 static unsigned long o2hb_live_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
 static unsigned long o2hb_quorum_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
+static unsigned long o2hb_failed_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
 
 #define O2HB_DB_TYPE_LIVENODES		0
 struct o2hb_debug_buf {
@@ -217,8 +219,19 @@ struct o2hb_bio_wait_ctxt {
 	int               wc_error;
 };
 
+static int o2hb_pop_count(void *map, int count)
+{
+	int i = -1, pop = 0;
+
+	while((i = find_next_bit(map, count, i + 1)) < count)
+		pop++;
+	return pop;
+}
+
 static void o2hb_write_timeout(struct work_struct *work)
 {
+	int failed, quorum;
+	unsigned long flags;
 	struct o2hb_region *reg =
 		container_of(work, struct o2hb_region,
 			     hr_write_timeout_work.work);
@@ -226,6 +239,28 @@ static void o2hb_write_timeout(struct work_struct *work)
 	mlog(ML_ERROR, "Heartbeat write timeout to device %s after %u "
 	     "milliseconds\n", reg->hr_dev_name,
 	     jiffies_to_msecs(jiffies - reg->hr_last_timeout_start));
+
+	if (o2hb_global_heartbeat_active()) {
+		spin_lock_irqsave(&o2hb_live_lock, flags);
+		if (test_bit(reg->hr_region_num, o2hb_quorum_region_bitmap))
+			set_bit(reg->hr_region_num, o2hb_failed_region_bitmap);
+		failed = o2hb_pop_count(&o2hb_failed_region_bitmap,
+					O2NM_MAX_REGIONS);
+		quorum = o2hb_pop_count(&o2hb_quorum_region_bitmap,
+					O2NM_MAX_REGIONS);
+		spin_unlock_irqrestore(&o2hb_live_lock, flags);
+
+		printk(KERN_NOTICE "Number of regions %d, failed regions %d\n",
+		       quorum, failed);
+
+		/*
+		 * Fence if the number of failed regions >= half the number
+		 * of  quorum regions
+		 */
+		if ((failed << 1) < quorum)
+			return;
+	}
+
 	o2quo_disk_timeout();
 }
 
@@ -234,6 +269,11 @@ static void o2hb_arm_write_timeout(struct o2hb_region *reg)
 	mlog(ML_HEARTBEAT, "Queue write timeout for %u ms\n",
 	     O2HB_MAX_WRITE_TIMEOUT_MS);
 
+	if (o2hb_global_heartbeat_active()) {
+		spin_lock(&o2hb_live_lock);
+		clear_bit(reg->hr_region_num, o2hb_failed_region_bitmap);
+		spin_unlock(&o2hb_live_lock);
+	}
 	cancel_delayed_work(&reg->hr_write_timeout_work);
 	reg->hr_last_timeout_start = jiffies;
 	schedule_delayed_work(&reg->hr_write_timeout_work,
@@ -1173,6 +1213,7 @@ int o2hb_init(void)
 	memset(o2hb_region_bitmap, 0, sizeof(o2hb_region_bitmap));
 	memset(o2hb_live_region_bitmap, 0, sizeof(o2hb_live_region_bitmap));
 	memset(o2hb_quorum_region_bitmap, 0, sizeof(o2hb_quorum_region_bitmap));
+	memset(o2hb_failed_region_bitmap, 0, sizeof(o2hb_failed_region_bitmap));
 
 	return o2hb_debug_init();
 }
-- 
1.7.0.4

* [Ocfs2-devel] [PATCH 17/20] ocfs2/cluster: Create debugfs files for live, quorum and failed region bitmaps
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (15 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 16/20] ocfs2/cluster: Maintain bitmap of failed regions Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 18/20] ocfs2/cluster: Create debugfs dir/files for each region Sunil Mushran
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

This patch prints the bitmaps of live, quorum and failed regions. This
information will be useful in debugging cluster issues.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/heartbeat.c |   63 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 63 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 6be817b..6885a22 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -76,6 +76,9 @@ static unsigned long o2hb_quorum_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
 static unsigned long o2hb_failed_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
 
 #define O2HB_DB_TYPE_LIVENODES		0
+#define O2HB_DB_TYPE_LIVEREGIONS	1
+#define O2HB_DB_TYPE_QUORUMREGIONS	2
+#define O2HB_DB_TYPE_FAILEDREGIONS	3
 struct o2hb_debug_buf {
 	int db_type;
 	int db_size;
@@ -84,12 +87,21 @@ struct o2hb_debug_buf {
 };
 
 static struct o2hb_debug_buf *o2hb_db_livenodes;
+static struct o2hb_debug_buf *o2hb_db_liveregions;
+static struct o2hb_debug_buf *o2hb_db_quorumregions;
+static struct o2hb_debug_buf *o2hb_db_failedregions;
 
 #define O2HB_DEBUG_DIR			"o2hb"
 #define O2HB_DEBUG_LIVENODES		"livenodes"
+#define O2HB_DEBUG_LIVEREGIONS		"live_regions"
+#define O2HB_DEBUG_QUORUMREGIONS	"quorum_regions"
+#define O2HB_DEBUG_FAILEDREGIONS	"failed_regions"
 
 static struct dentry *o2hb_debug_dir;
 static struct dentry *o2hb_debug_livenodes;
+static struct dentry *o2hb_debug_liveregions;
+static struct dentry *o2hb_debug_quorumregions;
+static struct dentry *o2hb_debug_failedregions;
 
 static LIST_HEAD(o2hb_all_regions);
 
@@ -1085,6 +1097,9 @@ static int o2hb_debug_open(struct inode *inode, struct file *file)
 
 	switch(db->db_type) {
 	case O2HB_DB_TYPE_LIVENODES:
+	case O2HB_DB_TYPE_LIVEREGIONS:
+	case O2HB_DB_TYPE_QUORUMREGIONS:
+	case O2HB_DB_TYPE_FAILEDREGIONS:
 		spin_lock(&o2hb_live_lock);
 		memcpy(map, db->db_data, db->db_size);
 		spin_unlock(&o2hb_live_lock);
@@ -1146,6 +1161,12 @@ static const struct file_operations o2hb_debug_fops = {
 void o2hb_exit(void)
 {
 	kfree(o2hb_db_livenodes);
+	kfree(o2hb_db_liveregions);
+	kfree(o2hb_db_quorumregions);
+	kfree(o2hb_db_failedregions);
+	debugfs_remove(o2hb_debug_failedregions);
+	debugfs_remove(o2hb_debug_quorumregions);
+	debugfs_remove(o2hb_debug_liveregions);
 	debugfs_remove(o2hb_debug_livenodes);
 	debugfs_remove(o2hb_debug_dir);
 }
@@ -1189,6 +1210,48 @@ static int o2hb_debug_init(void)
 		mlog_errno(ret);
 		goto bail;
 	}
+
+	o2hb_debug_liveregions = o2hb_debug_create(O2HB_DEBUG_LIVEREGIONS,
+						   o2hb_debug_dir,
+						   &o2hb_db_liveregions,
+						   sizeof(*o2hb_db_liveregions),
+						   O2HB_DB_TYPE_LIVEREGIONS,
+						   sizeof(o2hb_live_region_bitmap),
+						   O2NM_MAX_REGIONS,
+						   o2hb_live_region_bitmap);
+	if (!o2hb_debug_liveregions) {
+		mlog_errno(ret);
+		goto bail;
+	}
+
+	o2hb_debug_quorumregions =
+			o2hb_debug_create(O2HB_DEBUG_QUORUMREGIONS,
+					  o2hb_debug_dir,
+					  &o2hb_db_quorumregions,
+					  sizeof(*o2hb_db_quorumregions),
+					  O2HB_DB_TYPE_QUORUMREGIONS,
+					  sizeof(o2hb_quorum_region_bitmap),
+					  O2NM_MAX_REGIONS,
+					  o2hb_quorum_region_bitmap);
+	if (!o2hb_debug_quorumregions) {
+		mlog_errno(ret);
+		goto bail;
+	}
+
+	o2hb_debug_failedregions =
+			o2hb_debug_create(O2HB_DEBUG_FAILEDREGIONS,
+					  o2hb_debug_dir,
+					  &o2hb_db_failedregions,
+					  sizeof(*o2hb_db_failedregions),
+					  O2HB_DB_TYPE_FAILEDREGIONS,
+					  sizeof(o2hb_failed_region_bitmap),
+					  O2NM_MAX_REGIONS,
+					  o2hb_failed_region_bitmap);
+	if (!o2hb_debug_failedregions) {
+		mlog_errno(ret);
+		goto bail;
+	}
+
 	ret = 0;
 bail:
 	if (ret)
-- 
1.7.0.4

* [Ocfs2-devel] [PATCH 18/20] ocfs2/cluster: Create debugfs dir/files for each region
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (16 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 17/20] ocfs2/cluster: Create debugfs files for live, quorum and failed region bitmaps Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 19/20] ocfs2/cluster: Add printks to show heartbeat up/down events Sunil Mushran
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

This patch creates debugfs directory for each o2hb region and creates
files to expose the region number and the per region live node bitmap.
This information will be useful in debugging cluster issues.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/heartbeat.c |   77 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 6885a22..ad5fe57 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -79,6 +79,8 @@ static unsigned long o2hb_failed_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
 #define O2HB_DB_TYPE_LIVEREGIONS	1
 #define O2HB_DB_TYPE_QUORUMREGIONS	2
 #define O2HB_DB_TYPE_FAILEDREGIONS	3
+#define O2HB_DB_TYPE_REGION_LIVENODES	4
+#define O2HB_DB_TYPE_REGION_NUMBER	5
 struct o2hb_debug_buf {
 	int db_type;
 	int db_size;
@@ -96,6 +98,7 @@ static struct o2hb_debug_buf *o2hb_db_failedregions;
 #define O2HB_DEBUG_LIVEREGIONS		"live_regions"
 #define O2HB_DEBUG_QUORUMREGIONS	"quorum_regions"
 #define O2HB_DEBUG_FAILEDREGIONS	"failed_regions"
+#define O2HB_DEBUG_REGION_NUMBER	"num"
 
 static struct dentry *o2hb_debug_dir;
 static struct dentry *o2hb_debug_livenodes;
@@ -203,6 +206,12 @@ struct o2hb_region {
 	unsigned long		hr_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
 	unsigned int		hr_region_num;
 
+	struct dentry		*hr_debug_dir;
+	struct dentry		*hr_debug_livenodes;
+	struct dentry		*hr_debug_regnum;
+	struct o2hb_debug_buf	*hr_db_livenodes;
+	struct o2hb_debug_buf	*hr_db_regnum;
+
 	/* let the person setting up hb wait for it to return until it
 	 * has reached a 'steady' state.  This will be fixed when we have
 	 * a more complete api that doesn't lead to this sort of fragility. */
@@ -1083,6 +1092,7 @@ static int o2hb_thread(void *data)
 static int o2hb_debug_open(struct inode *inode, struct file *file)
 {
 	struct o2hb_debug_buf *db = inode->i_private;
+	struct o2hb_region *reg;
 	unsigned long map[BITS_TO_LONGS(O2NM_MAX_NODES)];
 	char *buf = NULL;
 	int i = -1;
@@ -1105,6 +1115,19 @@ static int o2hb_debug_open(struct inode *inode, struct file *file)
 		spin_unlock(&o2hb_live_lock);
 		break;
 
+	case O2HB_DB_TYPE_REGION_LIVENODES:
+		spin_lock(&o2hb_live_lock);
+		reg = (struct o2hb_region *)db->db_data;
+		memcpy(map, reg->hr_live_node_bitmap, db->db_size);
+		spin_unlock(&o2hb_live_lock);
+		break;
+
+	case O2HB_DB_TYPE_REGION_NUMBER:
+		reg = (struct o2hb_region *)db->db_data;
+		out += snprintf(buf + out, PAGE_SIZE - out, "%d\n",
+				reg->hr_region_num);
+		goto done;
+
 	default:
 		goto done;
 	}
@@ -1342,6 +1365,12 @@ static void o2hb_region_release(struct config_item *item)
 	if (reg->hr_slots)
 		kfree(reg->hr_slots);
 
+	kfree(reg->hr_db_regnum);
+	kfree(reg->hr_db_livenodes);
+	debugfs_remove(reg->hr_debug_livenodes);
+	debugfs_remove(reg->hr_debug_regnum);
+	debugfs_remove(reg->hr_debug_dir);
+
 	spin_lock(&o2hb_live_lock);
 	list_del(&reg->hr_all_item);
 	spin_unlock(&o2hb_live_lock);
@@ -1856,10 +1885,52 @@ static struct o2hb_heartbeat_group *to_o2hb_heartbeat_group(struct config_group
 		: NULL;
 }
 
+static int o2hb_debug_region_init(struct o2hb_region *reg, struct dentry *dir)
+{
+	int ret = -ENOMEM;
+
+	reg->hr_debug_dir =
+		debugfs_create_dir(config_item_name(&reg->hr_item), dir);
+	if (!reg->hr_debug_dir) {
+		mlog_errno(ret);
+		goto bail;
+	}
+
+	reg->hr_debug_livenodes =
+			o2hb_debug_create(O2HB_DEBUG_LIVENODES,
+					  reg->hr_debug_dir,
+					  &(reg->hr_db_livenodes),
+					  sizeof(*(reg->hr_db_livenodes)),
+					  O2HB_DB_TYPE_REGION_LIVENODES,
+					  sizeof(reg->hr_live_node_bitmap),
+					  O2NM_MAX_NODES, reg);
+	if (!reg->hr_debug_livenodes) {
+		mlog_errno(ret);
+		goto bail;
+	}
+
+	reg->hr_debug_regnum =
+			o2hb_debug_create(O2HB_DEBUG_REGION_NUMBER,
+					  reg->hr_debug_dir,
+					  &(reg->hr_db_regnum),
+					  sizeof(*(reg->hr_db_regnum)),
+					  O2HB_DB_TYPE_REGION_NUMBER,
+					  0, O2NM_MAX_NODES, reg);
+	if (!reg->hr_debug_regnum) {
+		mlog_errno(ret);
+		goto bail;
+	}
+
+	ret = 0;
+bail:
+	return ret;
+}
+
 static struct config_item *o2hb_heartbeat_group_make_item(struct config_group *group,
 							  const char *name)
 {
 	struct o2hb_region *reg = NULL;
+	int ret;
 
 	reg = kzalloc(sizeof(struct o2hb_region), GFP_KERNEL);
 	if (reg == NULL)
@@ -1884,6 +1955,12 @@ static struct config_item *o2hb_heartbeat_group_make_item(struct config_group *g
 
 	config_item_init_type_name(&reg->hr_item, name, &o2hb_region_type);
 
+	ret = o2hb_debug_region_init(reg, o2hb_debug_dir);
+	if (ret) {
+		config_item_put(&reg->hr_item);
+		return ERR_PTR(ret);
+	}
+
 	return &reg->hr_item;
 }
 
-- 
1.7.0.4

* [Ocfs2-devel] [PATCH 19/20] ocfs2/cluster: Add printks to show heartbeat up/down events
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (17 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 18/20] ocfs2/cluster: Create debugfs dir/files for each region Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-23 22:36   ` Joel Becker
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 20/20] ocfs2/cluster: Show per region heartbeat elapsed time Sunil Mushran
  2010-09-23 22:37 ` [Ocfs2-devel] Global Heartbeat - fs patches Joel Becker
  20 siblings, 1 reply; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

This patch adds printks to show o2hb up and down events. This information
will be useful in debugging cluster issues.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/heartbeat.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index ad5fe57..ce6b166 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -601,7 +601,7 @@ static void o2hb_run_event_list(struct o2hb_node_event *queued_event)
 		list_del_init(&event->hn_item);
 		spin_unlock(&o2hb_live_lock);
 
-		mlog(ML_HEARTBEAT, "Node %s event for %d\n",
+		printk(KERN_NOTICE "o2hb: Node %s event for %d\n",
 		     event->hn_event_type == O2HB_NODE_UP_CB ? "UP" : "DOWN",
 		     event->hn_node_num);
 
@@ -797,6 +797,8 @@ fire_callbacks:
 
 		/* first on the list generates a callback */
 		if (list_empty(&o2hb_live_slots[slot->ds_node_num])) {
+			printk(KERN_NOTICE "o2hb: Add node %d to live nodes "
+			       "bitmap\n", slot->ds_node_num);
 			set_bit(slot->ds_node_num, o2hb_live_node_bitmap);
 
 			o2hb_queue_node_event(&event, O2HB_NODE_UP_CB, node,
@@ -845,6 +847,8 @@ fire_callbacks:
 		/* last off the live_slot generates a callback */
 		list_del_init(&slot->ds_live_item);
 		if (list_empty(&o2hb_live_slots[slot->ds_node_num])) {
+			printk(KERN_NOTICE "o2hb: Remove node %d from live "
+			       "nodes bitmap\n", slot->ds_node_num);
 			clear_bit(slot->ds_node_num, o2hb_live_node_bitmap);
 
 			if (node)
-- 
1.7.0.4

* [Ocfs2-devel] [PATCH 20/20] ocfs2/cluster: Show per region heartbeat elapsed time
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (18 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 19/20] ocfs2/cluster: Add printks to show heartbeat up/down events Sunil Mushran
@ 2010-09-14 22:50 ` Sunil Mushran
  2010-09-23 22:37 ` [Ocfs2-devel] Global Heartbeat - fs patches Joel Becker
  20 siblings, 0 replies; 35+ messages in thread
From: Sunil Mushran @ 2010-09-14 22:50 UTC (permalink / raw)
  To: ocfs2-devel

This patch adds a per region debugfs file that shows the elapsed time
since the time the o2hb timer was last armed.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/cluster/heartbeat.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index ce6b166..37043e8 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -81,6 +81,7 @@ static unsigned long o2hb_failed_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
 #define O2HB_DB_TYPE_FAILEDREGIONS	3
 #define O2HB_DB_TYPE_REGION_LIVENODES	4
 #define O2HB_DB_TYPE_REGION_NUMBER	5
+#define O2HB_DB_TYPE_REGION_ELAPSED_TIME	6
 struct o2hb_debug_buf {
 	int db_type;
 	int db_size;
@@ -99,6 +100,7 @@ static struct o2hb_debug_buf *o2hb_db_failedregions;
 #define O2HB_DEBUG_QUORUMREGIONS	"quorum_regions"
 #define O2HB_DEBUG_FAILEDREGIONS	"failed_regions"
 #define O2HB_DEBUG_REGION_NUMBER	"num"
+#define O2HB_DEBUG_REGION_ELAPSED_TIME	"elapsed_time_in_ms"
 
 static struct dentry *o2hb_debug_dir;
 static struct dentry *o2hb_debug_livenodes;
@@ -209,8 +211,10 @@ struct o2hb_region {
 	struct dentry		*hr_debug_dir;
 	struct dentry		*hr_debug_livenodes;
 	struct dentry		*hr_debug_regnum;
+	struct dentry		*hr_debug_elapsed_time;
 	struct o2hb_debug_buf	*hr_db_livenodes;
 	struct o2hb_debug_buf	*hr_db_regnum;
+	struct o2hb_debug_buf	*hr_db_elapsed_time;
 
 	/* let the person setting up hb wait for it to return until it
 	 * has reached a 'steady' state.  This will be fixed when we have
@@ -1132,6 +1136,13 @@ static int o2hb_debug_open(struct inode *inode, struct file *file)
 				reg->hr_region_num);
 		goto done;
 
+	case O2HB_DB_TYPE_REGION_ELAPSED_TIME:
+		reg = (struct o2hb_region *)db->db_data;
+		out += snprintf(buf + out, PAGE_SIZE - out, "%u\n",
+				jiffies_to_msecs(jiffies -
+						 reg->hr_last_timeout_start));
+		goto done;
+
 	default:
 		goto done;
 	}
@@ -1925,6 +1936,18 @@ static int o2hb_debug_region_init(struct o2hb_region *reg, struct dentry *dir)
 		goto bail;
 	}
 
+	reg->hr_debug_elapsed_time =
+			o2hb_debug_create(O2HB_DEBUG_REGION_ELAPSED_TIME,
+					  reg->hr_debug_dir,
+					  &(reg->hr_db_elapsed_time),
+					  sizeof(*(reg->hr_db_elapsed_time)),
+					  O2HB_DB_TYPE_REGION_ELAPSED_TIME,
+					  0, 0, reg);
+	if (!reg->hr_debug_elapsed_time) {
+		mlog_errno(ret);
+		goto bail;
+	}
+
 	ret = 0;
 bail:
 	return ret;
-- 
1.7.0.4

* [Ocfs2-devel] [PATCH 05/20] ocfs2/cluster: Get all heartbeat regions
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 05/20] ocfs2/cluster: Get all heartbeat regions Sunil Mushran
@ 2010-09-23 21:57   ` Joel Becker
  0 siblings, 0 replies; 35+ messages in thread
From: Joel Becker @ 2010-09-23 21:57 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Sep 14, 2010 at 03:50:41PM -0700, Sunil Mushran wrote:
> +int o2hb_get_all_regions(char *region_uuids, u8 max_regions)
> +{
> +	struct o2hb_region *reg;
> +	int numregs = 0;
> +	char *p;
> +
> +	spin_lock(&o2hb_live_lock);
> +
> +	p = region_uuids;
> +	list_for_each_entry(reg, &o2hb_all_regions, hr_all_item) {
> +		mlog(0, "Region: %s\n", config_item_name(&reg->hr_item));
> +		if (numregs < max_regions) {
> +			memcpy(p, config_item_name(&reg->hr_item),
> +			       O2HB_MAX_REGION_NAME_LEN);
> +			p += O2HB_MAX_REGION_NAME_LEN;
> +		}
> +		numregs++;
> +	}

	The way I read this, region_uuids is a single array of length
max_regions*MAX_REGION_NAME_LEN.  That's pretty ugly, no?  I get that
you don't want to allocate strings, but there is no reason that you
can't require someone to pass in char**:

int o2hb_get_all_regions(char **region_uuids, u8 max_regions)
	...
		strncpy(region_uuids[numregs], name, MAX_LEN)
		numregs++

Sure, the memory layout of "region_uuids[max_regions][MAX_LEN]" is the same
as "region_uuids[max_regions * MAX_LEN]", but walking it looks better.
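
A rough userspace sketch of the indexed form suggested above, assuming the
caller supplies max_regions pointers that each reference a buffer of
SKETCH_MAX_NAME_LEN bytes. The names argument stands in for the region list
walk, and every sketch_* identifier is hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for O2HB_MAX_REGION_NAME_LEN. */
#define SKETCH_MAX_NAME_LEN	32

/*
 * Copy up to max_regions names into caller-provided buffers and return
 * the total number of regions seen, even if some did not fit.
 */
static int sketch_get_all_regions(const char *const *names, int num_names,
				  char **region_uuids, int max_regions)
{
	int numregs = 0;
	int i;

	for (i = 0; i < num_names; i++) {
		if (numregs < max_regions)
			/* Fixed-width field: zero padded, and not
			 * NUL-terminated when the name fills it. */
			strncpy(region_uuids[numregs], names[i],
				SKETCH_MAX_NAME_LEN);
		numregs++;
	}
	return numregs;
}
```

As in the posted function, the return value can exceed max_regions, so the
caller can tell that the output was truncated.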

Joel

-- 

"But all my words come back to me
 In shades of mediocrity.
 Like emptiness in harmony
 I need someone to comfort me."

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

* [Ocfs2-devel] [PATCH 08/20] ocfs2/dlm: Add message DLM_QUERY_NODEINFO
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 08/20] ocfs2/dlm: Add message DLM_QUERY_NODEINFO Sunil Mushran
@ 2010-09-23 22:18   ` Joel Becker
  0 siblings, 0 replies; 35+ messages in thread
From: Joel Becker @ 2010-09-23 22:18 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Sep 14, 2010 at 03:50:44PM -0700, Sunil Mushran wrote:
> Adds new dlm message DLM_QUERY_NODEINFO that sends the attributes of all
> registered nodes. This message is sent if the negotiated dlm protocol is
> 1.1 or higher. If the information of the joining node does not match
> that of any existing nodes, the join domain request is rejected.
> 
> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

	I don't think you should be bumping the actual protocol before
all 1.1 messages are in the code.  Otherwise, someone bisecting between
the QUERY_REGION and QUERY_NODEINFO patches might have real problems.

Joel

-- 

Life's Little Instruction Book #198

	"Feed a stranger's expired parking meter."

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

* [Ocfs2-devel] [PATCH 09/20] ocfs2/cluster: Print messages when adding/removing nodes and heartbeat regions
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 09/20] ocfs2/cluster: Print messages when adding/removing nodes and heartbeat regions Sunil Mushran
@ 2010-09-23 22:25   ` Joel Becker
  0 siblings, 0 replies; 35+ messages in thread
From: Joel Becker @ 2010-09-23 22:25 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Sep 14, 2010 at 03:50:45PM -0700, Sunil Mushran wrote:
> Prints messages when the user adds or removes nodes and heartbeat regions.
> The heartbeat region logging is only enabled in the global heartbeat mode. These
> messages are useful when debugging cluster related issues.
> 
> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

	I gotta say NAK.  Maybe Mark disagrees, but it certainly sounds
like we're spamming the log for things that are normally not terribly
interesting.
	If you're worried about tracking registration of nodes, have
the o2cb tool syslog it.  That will get logged, but won't pollute dmesg.

Joel

-- 

"Baby, even the losers
 Get lucky sometimes.
 Even the losers
 Keep a little bit of pride."

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

* [Ocfs2-devel] [PATCH 10/20] ocfs2/cluster: Check slots for unconfigured live nodes
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 10/20] ocfs2/cluster: Check slots for unconfigured live nodes Sunil Mushran
@ 2010-09-23 22:31   ` Joel Becker
  0 siblings, 0 replies; 35+ messages in thread
From: Joel Becker @ 2010-09-23 22:31 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Sep 14, 2010 at 03:50:46PM -0700, Sunil Mushran wrote:
> +	if (!node) {
> +		spin_lock(&o2hb_live_lock);
> +		tmp = test_bit(slot->ds_node_num, o2hb_live_node_bitmap);
> +		spin_unlock(&o2hb_live_lock);
> +		if (!tmp)
> +			return 0;
> +		printk(KERN_NOTICE "o2hb: Live node %d is not registered\n",
> +		       slot->ds_node_num);

	This notice is ill-placed, I think.  I do believe a NOTICE is
warranted when the user has removed a node from configfs but that node
is still heartbeating.  Maybe we only print the NOTICE when it goes down
or up?  Or perhaps we print this notice here, but we only do so once.
I think printing this notice every 30 seconds forever would be awful.
	I would also change the text.  Something like "o2hb: Node number
%d is still heartbeating, but its configuration has been removed."
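
One way the "only do so once" idea might look, as a userspace sketch: keep a
per-node flag, print on the first sighting, and clear the flag when the node
goes down so a later reappearance is reported again. The sketch_* names, the
array, and SKETCH_MAX_NODES are all assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for O2NM_MAX_NODES; the value is an assumption. */
#define SKETCH_MAX_NODES	255

/* One flag per node: has the notice already been printed? */
static bool sketch_warned[SKETCH_MAX_NODES];

/* Print the notice at most once per node; return true if it printed. */
static bool sketch_notice_unregistered(unsigned int node_num)
{
	if (node_num >= SKETCH_MAX_NODES || sketch_warned[node_num])
		return false;

	sketch_warned[node_num] = true;
	printf("o2hb: Node number %u is still heartbeating, but its "
	       "configuration has been removed.\n", node_num);
	return true;
}

/* Re-arm the notice when the node stops heartbeating. */
static void sketch_node_went_down(unsigned int node_num)
{
	if (node_num < SKETCH_MAX_NODES)
		sketch_warned[node_num] = false;
}
```

In the kernel the flag would need to be updated under whatever lock already
protects the live node bitmap; the sketch ignores locking entirely.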

Joel

-- 

"There is no sincerer love than the love of food."  
         - George Bernard Shaw 

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

* [Ocfs2-devel] [PATCH 15/20] ocfs2/cluster: Maintain bitmap of quorum regions
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 15/20] ocfs2/cluster: Maintain bitmap of quorum regions Sunil Mushran
@ 2010-09-23 22:34   ` Joel Becker
  0 siblings, 0 replies; 35+ messages in thread
From: Joel Becker @ 2010-09-23 22:34 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Sep 14, 2010 at 03:50:51PM -0700, Sunil Mushran wrote:
> +	printk(KERN_NOTICE "o2hb: Region %s is now a quorum device\n",
> +	       config_item_name(&reg->hr_item));

	I'm thinking that this is a valid message.

Joel
-- 

Life's Little Instruction Book #510

	"Count your blessings."

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

* [Ocfs2-devel] [PATCH 16/20] ocfs2/cluster: Maintain bitmap of failed regions
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 16/20] ocfs2/cluster: Maintain bitmap of failed regions Sunil Mushran
@ 2010-09-23 22:35   ` Joel Becker
  0 siblings, 0 replies; 35+ messages in thread
From: Joel Becker @ 2010-09-23 22:35 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Sep 14, 2010 at 03:50:52PM -0700, Sunil Mushran wrote:
> +	if (o2hb_global_heartbeat_active()) {
> +		spin_lock_irqsave(&o2hb_live_lock, flags);
> +		if (test_bit(reg->hr_region_num, o2hb_quorum_region_bitmap))
> +			set_bit(reg->hr_region_num, o2hb_failed_region_bitmap);
> +		failed = o2hb_pop_count(&o2hb_failed_region_bitmap,
> +					O2NM_MAX_REGIONS);
> +		quorum = o2hb_pop_count(&o2hb_quorum_region_bitmap,
> +					O2NM_MAX_REGIONS);
> +		spin_unlock_irqrestore(&o2hb_live_lock, flags);
> +
> +		printk(KERN_NOTICE "Number of regions %d, failed regions %d\n",
> +		       quorum, failed);

	This is not worth printing.  Quorum fired, so we know what
happened.

Joel

-- 

"The whole problem with the world is that fools and fanatics are always
 so certain of themselves, and wiser people so full of doubts."
	- Bertrand Russell

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

* [Ocfs2-devel] [PATCH 19/20] ocfs2/cluster: Add printks to show heartbeat up/down events
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 19/20] ocfs2/cluster: Add printks to show heartbeat up/down events Sunil Mushran
@ 2010-09-23 22:36   ` Joel Becker
  0 siblings, 0 replies; 35+ messages in thread
From: Joel Becker @ 2010-09-23 22:36 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Sep 14, 2010 at 03:50:55PM -0700, Sunil Mushran wrote:
> This patch adds printks to show o2hb up and down events. This information
> will be useful in debugging cluster issues.
> 
> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

	I'm just not feeling all this dmesg spam.

Joel

-- 

"Hell is oneself, hell is alone, the other figures in it, merely projections."
        - T. S. Eliot

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

* [Ocfs2-devel] Global Heartbeat - fs patches
  2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
                   ` (19 preceding siblings ...)
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 20/20] ocfs2/cluster: Show per region heartbeat elapsed time Sunil Mushran
@ 2010-09-23 22:37 ` Joel Becker
  20 siblings, 0 replies; 35+ messages in thread
From: Joel Becker @ 2010-09-23 22:37 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Sep 14, 2010 at 03:50:36PM -0700, Sunil Mushran wrote:
> 
> So this is the next drop of the global heartbeat patches that have been
> rebased with current mainline head. The patches are feature-wise complete.
> 
> Please refer to this wiki to learn more on this feature.
> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/NewGlobalHeartbeat
> 
> Please review.

	My review is done.  It's all pretty clean stuff; well done,
Sunil.  Most of my review is cosmetic.
	The one big thing that needs to change is that the default
dlm_protocol cannot be bumped until all the functionality is there.
After the messages are added.  After the quorum calculation is updated.
Just make it the last patch of the series.

Joel

-- 

 print STDOUT q
 Just another Perl hacker,
 unless $spring
	- Larry Wall

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


* [Ocfs2-devel] [PATCH 01/20] ocfs2/cluster: Add heartbeat mode configfs parameter
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 01/20] ocfs2/cluster: Add heartbeat mode configfs parameter Sunil Mushran
@ 2010-09-25  8:11   ` Wengang Wang
  0 siblings, 0 replies; 35+ messages in thread
From: Wengang Wang @ 2010-09-25  8:11 UTC (permalink / raw)
  To: ocfs2-devel

On 10-09-14 15:50, Sunil Mushran wrote:
> Add heartbeat mode parameter to the configfs tree. This will be used
> to set/show the heartbeat mode. The user is free to toggle the mode
> between local and global as long as there is no active heartbeat region.
> 
> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
> ---
>  fs/ocfs2/cluster/heartbeat.c |   70 ++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 70 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index 41d5f1f..57cc715 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -77,7 +77,19 @@ static struct o2hb_callback *hbcall_from_type(enum o2hb_callback_type type);
>  
> +static
> +ssize_t o2hb_heartbeat_group_mode_store(struct o2hb_heartbeat_group *group,
> +					const char *page, size_t count)
> +{
> +	unsigned int i;
> +	int ret;
> +	size_t len;
> +
> +	len = (page[count - 1] == '\n') ? count - 1 : count;

How about adding
 
+	if (!len)
+		return -EINVAL;

If len is 0 (though userspace should take care of this), strnicmp() reports
a match, and the mode is unexpectedly set to O2HB_HEARTBEAT_LOCAL.
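
To illustrate the zero-length case, here is a minimal userspace sketch. It
assumes strncasecmp() behaves like the kernel's strnicmp() when the length
is 0; mode_desc, match_mode() and reject_empty are hypothetical names used
only for this illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* strncasecmp(): userspace stand-in for strnicmp() */

/* Hypothetical stand-in for o2hb_heartbeat_mode_desc[] */
static const char *const mode_desc[] = { "local", "global" };
#define NUM_MODES (sizeof(mode_desc) / sizeof(mode_desc[0]))

/*
 * Mirrors the matching loop in the patch. Returns the index of the
 * matched mode, or -1 if nothing matched. With reject_empty set, the
 * proposed "if (!len)" check is applied before the loop.
 */
static int match_mode(const char *page, size_t count, int reject_empty)
{
	size_t len, i;

	if (count == 0)		/* guard added for this sketch only */
		return -1;

	len = (page[count - 1] == '\n') ? count - 1 : count;
	if (reject_empty && !len)
		return -1;

	for (i = 0; i < NUM_MODES; ++i) {
		/* Comparing zero bytes always yields 0, i.e. "match" */
		if (!strncasecmp(page, mode_desc[i], len))
			return (int)i;
	}
	return -1;
}
```

Calling match_mode("\n", 1, 0) selects the first table entry because no
bytes are compared; with reject_empty set, the same input is refused.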

regards,
wengang.
> +
> +	for (i = 0; i < O2HB_HEARTBEAT_NUM_MODES; ++i) {
> +		if (strnicmp(page, o2hb_heartbeat_mode_desc[i], len))
> +			continue;
> +
> +		ret = o2hb_global_hearbeat_mode_set(i);
> +		if (!ret)
> +			printk(KERN_NOTICE "ocfs2: Heartbeat mode set to %s\n",
> +			       o2hb_heartbeat_mode_desc[i]);
> +		return count;
> +	}
> +
> +	return -EINVAL;
> +
> +}


* [Ocfs2-devel] [PATCH 03/20] ocfs2: Add support for heartbeat=global mount option
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 03/20] ocfs2: Add support for heartbeat=global mount option Sunil Mushran
@ 2010-09-25  8:39   ` Wengang Wang
  0 siblings, 0 replies; 35+ messages in thread
From: Wengang Wang @ 2010-09-25  8:39 UTC (permalink / raw)
  To: ocfs2-devel

On 10-09-14 15:50, Sunil Mushran wrote:
> Adds support for heartbeat=global mount option. It ensures that the heartbeat
> mode passed matches the one enabled on disk.
> 
> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
> ---
>  fs/ocfs2/ocfs2.h    |    4 ++-
>  fs/ocfs2/ocfs2_fs.h |    1 +
>  fs/ocfs2/super.c    |   55 ++++++++++++++++++++++++++++++++++++++-------------
>  3 files changed, 45 insertions(+), 15 deletions(-)
> 
> @@ -1291,6 +1301,7 @@ static int ocfs2_parse_options(struct super_block *sb,
>  {
>  	int status;
>  	char *p;
> +	u32 tmp;
>  
>  	mlog_entry("remount: %d, options: \"%s\"\n", is_remount,
>  		   options ? options : "(none)");
> @@ -1322,7 +1333,10 @@ static int ocfs2_parse_options(struct super_block *sb,
>  			mopt->mount_opt |= OCFS2_MOUNT_HB_LOCAL;
>  			break;
>  		case Opt_hb_none:
> -			mopt->mount_opt &= ~OCFS2_MOUNT_HB_LOCAL;
> +			mopt->mount_opt |= OCFS2_MOUNT_HB_NONE;
> +			break;
> +		case Opt_hb_global:
> +			mopt->mount_opt |= OCFS2_MOUNT_HB_GLOBAL;
>  			break;
>  		case Opt_barrier:
>  			if (match_int(&args[0], &option)) {
> @@ -1477,6 +1491,15 @@ static int ocfs2_parse_options(struct super_block *sb,
>  		}
>  	}
>  
> +	/* Ensure only one heartbeat mode */
> +	tmp = mopt->mount_opt & (OCFS2_MOUNT_HB_LOCAL | OCFS2_MOUNT_HB_GLOBAL |
> +				 OCFS2_MOUNT_HB_NONE);
> +	if (hweight32(tmp) != 1) {
> +		mlog(ML_ERROR, "Invalid heartbeat mount option: %s\n", options);

By this point, "options" no longer holds the original option string.
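
A minimal userspace sketch of why the pointer is no longer usable, assuming
the option string is tokenized with a strsep() loop as this remark implies.
count_options() is a hypothetical helper written for this illustration; it
is not taken from fs/ocfs2/super.c.

```c
#define _DEFAULT_SOURCE	/* expose strsep() in glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walks a comma-separated option string the way a strsep() loop does.
 * Returns the number of non-empty tokens. strsep() overwrites each
 * delimiter with '\0' and sets *cursor to NULL once the string is
 * exhausted, so the caller's pointer cannot be printed afterwards.
 */
static int count_options(char **cursor)
{
	char *p;
	int n = 0;

	while ((p = strsep(cursor, ",")) != NULL) {
		if (*p)
			n++;
	}
	return n;
}
```

After count_options(&options) returns, options is NULL and the underlying
buffer ends at the first former comma. One possible fix would be to log the
heartbeat bits in mount_opt, or a copy of the string saved before parsing.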

regards,
wengang.


* [Ocfs2-devel] [PATCH 04/20] ocfs2/dlm: Expose dlm_protocol in dlm_state
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 04/20] ocfs2/dlm: Expose dlm_protocol in dlm_state Sunil Mushran
@ 2010-09-25  8:42   ` Wengang Wang
  0 siblings, 0 replies; 35+ messages in thread
From: Wengang Wang @ 2010-09-25  8:42 UTC (permalink / raw)
  To: ocfs2-devel

On 10-09-14 15:50, Sunil Mushran wrote:
> Add dlm_protocol to the list of info shown by the debugfs file, dlm_state.
> 
> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
> ---
>  fs/ocfs2/dlm/dlmdebug.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
> index 5efdd37..51164a6 100644
> --- a/fs/ocfs2/dlm/dlmdebug.c
> +++ b/fs/ocfs2/dlm/dlmdebug.c
> @@ -775,7 +775,9 @@ static int debug_state_print(struct dlm_ctxt *dlm, struct debug_buffer *db)
>  
>  	/* Domain: xxxxxxxxxx  Key: 0xdfbac769 */
>  	out += snprintf(db->buf + out, db->len - out,
> -			"Domain: %s  Key: 0x%08x\n", dlm->name, dlm->key);
> +			"Domain: %s  Key: 0x%08x  Protocol: %d.%d\n",

%d seems fine for small numbers, but wouldn't %u be better?

regards,
wengang.
> +			dlm->name, dlm->key, dlm->dlm_locking_proto.pv_major,
> +			dlm->dlm_locking_proto.pv_minor);
>  
>  	/* Thread Pid: xxx  Node: xxx  State: xxxxx */
>  	out += snprintf(db->buf + out, db->len - out,


* [Ocfs2-devel] [PATCH 13/20] ocfs2/cluster: Track number of global heartbeat regions
  2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 13/20] ocfs2/cluster: Track number of global heartbeat regions Sunil Mushran
@ 2010-09-25  9:36   ` Wengang Wang
  2010-09-25  9:44     ` Joel Becker
  0 siblings, 1 reply; 35+ messages in thread
From: Wengang Wang @ 2010-09-25  9:36 UTC (permalink / raw)
  To: ocfs2-devel

On 10-09-14 15:50, Sunil Mushran wrote:
> In global heartbeat mode, we have an upper limit for the number of active regions.
> This patch adds the facility to track the number of active global heartbeat
> regions and fails to start heartbeat if the number exceeds the maximum.
> 
> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
> ---
>  fs/ocfs2/cluster/heartbeat.c |   24 ++++++++++++++++++++++--
>  1 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index 29b5c70..57c906b 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -62,6 +62,12 @@ static unsigned long o2hb_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
>  static LIST_HEAD(o2hb_node_events);
>  static DECLARE_WAIT_QUEUE_HEAD(o2hb_steady_queue);
>  
> +/*
> + * In global heartbeat, we maintain a series of region bitmaps.
> + * 	- o2hb_region_bitmap allows us to limit the region number to max region.
> + */
> +static unsigned long o2hb_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
> +
>  #define O2HB_DB_TYPE_LIVENODES		0
>  struct o2hb_debug_buf {
>  	int db_type;
> @@ -176,6 +182,7 @@ struct o2hb_region {
>  
>  	/* live node map of this region */
>  	unsigned long		hr_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
> +	unsigned int		hr_region_num;

I don't clearly remember the value of O2NM_MAX_REGIONS; is it 32? Would
u8 be better than "unsigned int"?

regards,
wengang.


* [Ocfs2-devel] [PATCH 13/20] ocfs2/cluster: Track number of global heartbeat regions
  2010-09-25  9:36   ` Wengang Wang
@ 2010-09-25  9:44     ` Joel Becker
  2010-09-25 10:09       ` Wengang Wang
  0 siblings, 1 reply; 35+ messages in thread
From: Joel Becker @ 2010-09-25  9:44 UTC (permalink / raw)
  To: ocfs2-devel

On Sat, Sep 25, 2010 at 05:36:38PM +0800, Wengang Wang wrote:
> On 10-09-14 15:50, Sunil Mushran wrote:
> > In global heartbeat mode, we have an upper limit for the number of active regions.
> > This patch adds the facility to track the number of active global heartbeat
> > regions and fails to start heartbeat if the number exceeds the maximum.
> > 
> > Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
> > ---
> >  fs/ocfs2/cluster/heartbeat.c |   24 ++++++++++++++++++++++--
> >  1 files changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> > index 29b5c70..57c906b 100644
> > --- a/fs/ocfs2/cluster/heartbeat.c
> > +++ b/fs/ocfs2/cluster/heartbeat.c
> > @@ -62,6 +62,12 @@ static unsigned long o2hb_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
> >  static LIST_HEAD(o2hb_node_events);
> >  static DECLARE_WAIT_QUEUE_HEAD(o2hb_steady_queue);
> >  
> > +/*
> > + * In global heartbeat, we maintain a series of region bitmaps.
> > + * 	- o2hb_region_bitmap allows us to limit the region number to max region.
> > + */
> > +static unsigned long o2hb_region_bitmap[BITS_TO_LONGS(O2NM_MAX_REGIONS)];
> > +
> >  #define O2HB_DB_TYPE_LIVENODES		0
> >  struct o2hb_debug_buf {
> >  	int db_type;
> > @@ -176,6 +182,7 @@ struct o2hb_region {
> >  
> >  	/* live node map of this region */
> >  	unsigned long		hr_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
> > +	unsigned int		hr_region_num;
> 
> I don't clearly remember the value of O2NM_MAX_REGIONS; is it 32? Would
> u8 be better than "unsigned int"?

	It's not an on-disk structure.  There's no need to enforce the
size.
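
	As a side illustration only (not part of the patch): a field that
follows an unsigned long array is usually padded out to the array's
alignment, so a narrower type would likely save nothing in memory either.
The sketch below uses hypothetical struct names and a placeholder
MAX_NODES; the real struct o2hb_region has more fields, so the actual
layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES	255	/* placeholder value for this sketch */
#define BITS_PER_LONG	(8 * sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical cut-down region with an unsigned int region number */
struct region_uint {
	unsigned long	live_node_bitmap[BITS_TO_LONGS(MAX_NODES)];
	unsigned int	region_num;
};

/* Same layout, but with a one-byte region number */
struct region_u8 {
	unsigned long	live_node_bitmap[BITS_TO_LONGS(MAX_NODES)];
	uint8_t		region_num;
};

/* Bytes saved by the narrower field; 0 when padding absorbs it */
static size_t bytes_saved_by_u8(void)
{
	return sizeof(struct region_uint) - sizeof(struct region_u8);
}
```

On common ABIs, where unsigned long is aligned at least as strictly as
unsigned int is wide, both structs have the same size.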

Joel

-- 

Life's Little Instruction Book #173

	"Be kinder than necessary."

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


* [Ocfs2-devel] [PATCH 13/20] ocfs2/cluster: Track number of global heartbeat regions
  2010-09-25  9:44     ` Joel Becker
@ 2010-09-25 10:09       ` Wengang Wang
  0 siblings, 0 replies; 35+ messages in thread
From: Wengang Wang @ 2010-09-25 10:09 UTC (permalink / raw)
  To: ocfs2-devel

> > >  	/* live node map of this region */
> > >  	unsigned long		hr_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
> > > +	unsigned int		hr_region_num;
> > 
> > I don't clearly remember the value of O2NM_MAX_REGIONS; is it 32? Would
> > u8 be better than "unsigned int"?
> 
> 	It's not an on-disk structure.  There's no need to enforce the
> size.

I got it. Thanks!

regards,
wengang.


end of thread, other threads:[~2010-09-25 10:09 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
2010-09-14 22:50 [Ocfs2-devel] Global Heartbeat - fs patches Sunil Mushran
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 01/20] ocfs2/cluster: Add heartbeat mode configfs parameter Sunil Mushran
2010-09-25  8:11   ` Wengang Wang
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 02/20] ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO Sunil Mushran
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 03/20] ocfs2: Add support for heartbeat=global mount option Sunil Mushran
2010-09-25  8:39   ` Wengang Wang
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 04/20] ocfs2/dlm: Expose dlm_protocol in dlm_state Sunil Mushran
2010-09-25  8:42   ` Wengang Wang
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 05/20] ocfs2/cluster: Get all heartbeat regions Sunil Mushran
2010-09-23 21:57   ` Joel Becker
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 06/20] ocfs2/dlm: Add message DLM_QUERY_REGION Sunil Mushran
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 07/20] ocfs2: Print message if user mounts without starting global heartbeat Sunil Mushran
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 08/20] ocfs2/dlm: Add message DLM_QUERY_NODEINFO Sunil Mushran
2010-09-23 22:18   ` Joel Becker
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 09/20] ocfs2/cluster: Print messages when adding/removing nodes and heartbeat regions Sunil Mushran
2010-09-23 22:25   ` Joel Becker
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 10/20] ocfs2/cluster: Check slots for unconfigured live nodes Sunil Mushran
2010-09-23 22:31   ` Joel Becker
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 11/20] ocfs2/cluster: Reorganize o2hb debugfs init Sunil Mushran
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 12/20] ocfs2/cluster: Maintain live node bitmap per heartbeat region Sunil Mushran
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 13/20] ocfs2/cluster: Track number of global heartbeat regions Sunil Mushran
2010-09-25  9:36   ` Wengang Wang
2010-09-25  9:44     ` Joel Becker
2010-09-25 10:09       ` Wengang Wang
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 14/20] ocfs2/cluster: Track bitmap of live " Sunil Mushran
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 15/20] ocfs2/cluster: Maintain bitmap of quorum regions Sunil Mushran
2010-09-23 22:34   ` Joel Becker
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 16/20] ocfs2/cluster: Maintain bitmap of failed regions Sunil Mushran
2010-09-23 22:35   ` Joel Becker
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 17/20] ocfs2/cluster: Create debugfs files for live, quorum and failed region bitmaps Sunil Mushran
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 18/20] ocfs2/cluster: Create debugfs dir/files for each region Sunil Mushran
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 19/20] ocfs2/cluster: Add printks to show heartbeat up/down events Sunil Mushran
2010-09-23 22:36   ` Joel Becker
2010-09-14 22:50 ` [Ocfs2-devel] [PATCH 20/20] ocfs2/cluster: Show per region heartbeat elapsed time Sunil Mushran
2010-09-23 22:37 ` [Ocfs2-devel] Global Heartbeat - fs patches Joel Becker
