* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-02-15 22:49 lhh
From: lhh @ 2007-02-15 22:49 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL5
Changes by:	lhh at sourceware.org	2007-02-15 22:49:33

Added files:
	rgmanager/src/resources: lvm.sh 

Log message:
	Add LVM failover agent; by Jon Brassow

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL5&r1=NONE&r2=1.1.6.1




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2008-02-06 17:43 jbrassow
From: jbrassow @ 2008-02-06 17:43 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	jbrassow at sourceware.org	2008-02-06 17:43:33

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	- Bug 431705: HA LVM should prevent users from running an invalid setup (2)
	- Better checking for improper setup
	-- this time, for the presence of the fail-over VG in volume_list

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&r1=1.13&r2=1.14

--- cluster/rgmanager/src/resources/lvm.sh	2008/02/06 16:40:27	1.13
+++ cluster/rgmanager/src/resources/lvm.sh	2008/02/06 17:43:33	1.14
@@ -84,7 +84,8 @@
 	##
 	# Fixme: we might be able to perform a better check...
 	if [ "$(find /boot/*.img -newer /etc/lvm/lvm.conf)" == "" ]; then
-		ocf_log err "HA LVM requires the initrd image to be newer than lvm.conf"
+		ocf_log err "HA LVM:  Improper setup detected"
+		ocf_log err "- initrd image needs to be newer than lvm.conf"
 		return $OCF_ERR_GENERIC
 	fi
 

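A note on satisfying the check above: after editing /etc/lvm/lvm.conf, the
initrd must be rebuilt so the embedded copy matches. A rough sketch for a
RHEL-5-era system, assuming the stock mkinitrd tool and the standard
/boot/initrd-<version>.img naming:

  # Rebuild the initrd so it picks up the edited lvm.conf:
  mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)

  # The agent's test should now pass: find prints the image name
  # (non-empty output) because the image is newer than lvm.conf.
  find /boot/*.img -newer /etc/lvm/lvm.conf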



* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2008-02-06 16:40 jbrassow
From: jbrassow @ 2008-02-06 16:40 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	jbrassow at sourceware.org	2008-02-06 16:40:27

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	- Better checking for improper setup
	-- this time, for the presence of the fail-over VG in volume_list

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&r1=1.12&r2=1.13

--- cluster/rgmanager/src/resources/lvm.sh	2008/01/03 21:02:53	1.12
+++ cluster/rgmanager/src/resources/lvm.sh	2008/02/06 16:40:27	1.13
@@ -56,11 +56,24 @@
 {
 	##
 	# Machine's cluster node name must be present as
-	# a tag in lvm.conf:activation/volume_list
+	# a tag in lvm.conf:activation/volume_list and the volume group
+	# to be failed over must NOT be there.
 	##
-	if ! lvm dumpconfig activation/volume_list >& /dev/null ||
-	   ! lvm dumpconfig activation/volume_list | grep $(local_node_name); then
-		ocf_log err "lvm.conf improperly configured for HA LVM."
+	if ! lvm dumpconfig activation/volume_list >& /dev/null; then
+		ocf_log err "HA LVM:  Improper setup detected"
+		ocf_log err "- \"volume_list\" not specified in lvm.conf."
+		return $OCF_ERR_GENERIC
+	fi
+
+	if ! lvm dumpconfig activation/volume_list | grep $(local_node_name); then
+		ocf_log err "HA LVM:  Improper setup detected"
+		ocf_log err "- @$(local_node_name) missing from \"volume_list\" in lvm.conf"
+		return $OCF_ERR_GENERIC
+	fi
+
+	if lvm dumpconfig activation/volume_list | grep $OCF_RESKEY_vg_name; then
+		ocf_log err "HA LVM:  Improper setup detected"
+		ocf_log err "- $OCF_RESKEY_vg_name found in \"volume_list\" in lvm.conf"
 		return $OCF_ERR_GENERIC
 	fi
 


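Taken together, the three checks above require the local node's tag in
volume_list and forbid the fail-over VG there. A hypothetical
/etc/lvm/lvm.conf fragment that would pass all three (host and VG names
are illustrative only):

  activation {
      # The node's tag must be present (matched via local_node_name)...
      volume_list = [ "rootvg", "@node-01.example.com" ]
      # ...and the fail-over VG (say "havg") must NOT appear here;
      # the agent activates it by tag at service start instead.
  }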


* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2008-01-03 20:56 jbrassow
From: jbrassow @ 2008-01-03 20:56 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	jbrassow at sourceware.org	2008-01-03 20:56:49

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	s/validate/verify/

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&r1=1.10&r2=1.11

--- cluster/rgmanager/src/resources/lvm.sh	2008/01/03 20:35:39	1.10
+++ cluster/rgmanager/src/resources/lvm.sh	2008/01/03 20:56:49	1.11
@@ -146,7 +146,7 @@
 	rv=0
 	;;
 
-validate-all)
+verify-all)
 	##
 	# We can safely ignore clustered volume groups (VGs handled by CLVM)
 	##
@@ -163,7 +163,7 @@
 	rv=0
 	;;
 *)
-	echo "usage: $0 {start|status|monitor|stop|restart|meta-data|validate-all}"
+	echo "usage: $0 {start|status|monitor|stop|restart|meta-data|verify-all}"
 	exit $OCF_ERR_UNIMPLEMENTED
 	;;
 esac


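After this rename, the agent answers to verify-all instead of
validate-all. A hypothetical direct invocation for testing, assuming the
OCF_RESKEY_* environment variables the script reads (resource names are
placeholders):

  # Run the renamed verification action by hand:
  OCF_RESKEY_vg_name=havg OCF_RESKEY_lv_name=halv ./lvm.sh verify-all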


* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-07-02 21:59 jbrassow
From: jbrassow @ 2007-07-02 21:59 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL4
Changes by:	jbrassow at sourceware.org	2007-07-02 21:59:34

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	<Previous check-in>
	Require vg_name to be unique.  Allowing multiple LVs from the same VG
	on different machines can lead to races when updating metadata during
	device failures.
	</Previous check-in>
	
	We can do better.  This patch puts the validation in lvm.sh so that
	it can print out an understandable error message.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.1.2.6&r2=1.1.2.7

--- cluster/rgmanager/src/resources/lvm.sh	2007/05/29 14:35:19	1.1.2.6
+++ cluster/rgmanager/src/resources/lvm.sh	2007/07/02 21:59:34	1.1.2.7
@@ -71,7 +71,7 @@
 	    <content type="string"/>
         </parameter>
 
-        <parameter name="vg_name" required="1" unique="1">
+        <parameter name="vg_name" required="1">
             <longdesc lang="en">
                 If you can see this, your GUI is broken.
             </longdesc>
@@ -465,6 +465,17 @@
 		exit 0
 	fi
 
+	if ! lvs $OCF_RESKEY_vg_name >& /dev/null; then
+		lv_count=0
+	else
+		lv_count=`lvs --noheadings -o name $OCF_RESKEY_vg_name | grep -v _mlog | grep -v _mimage | grep -v nconsistent | wc -l`
+	fi
+	if [ $lv_count -gt 1 ]; then
+		ocf_log err "HA LVM requires Only one logical volume per volume group."
+		ocf_log err "There are currently $lv_count logical volumes in $OCF_RESKEY_vg_name"
+		ocf_log err "Failing HA LVM start of $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
+		exit $OCF_ERR_GENERIC
+	fi
 	ha_lvm_proper_setup_check || exit 1
 		
 	if [ -z $OCF_RESKEY_lv_name ]; then


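The new start-time guard counts the logical volumes in the VG, ignoring
mirror sub-LVs (_mlog/_mimage) and lines flagged inconsistent, and
refuses to start when there is more than one. A standalone equivalent of
the same pipeline, with "havg" as a placeholder VG name:

  # Count the user-visible LVs the same way the agent does (the agent
  # treats a VG that lvs cannot see as a count of zero):
  lv_count=$(lvs --noheadings -o name havg |
             grep -v _mlog | grep -v _mimage | grep -v nconsistent |
             wc -l)
  [ "$lv_count" -gt 1 ] && echo "havg has $lv_count LVs; HA LVM allows only one"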


* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-07-02 21:59 jbrassow
From: jbrassow @ 2007-07-02 21:59 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL5
Changes by:	jbrassow at sourceware.org	2007-07-02 21:59:04

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	<Previous check-in>
	Require vg_name to be unique.  Allowing multiple LVs from the same VG
	on different machines can lead to races when updating metadata during
	device failures.
	</Previous check-in>
	
	We can do better.  This patch puts the validation in lvm.sh so that
	it can print out an understandable error message.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL5&r1=1.1.6.5&r2=1.1.6.6

--- cluster/rgmanager/src/resources/lvm.sh	2007/05/29 14:37:00	1.1.6.5
+++ cluster/rgmanager/src/resources/lvm.sh	2007/07/02 21:59:04	1.1.6.6
@@ -71,7 +71,7 @@
 	    <content type="string"/>
         </parameter>
 
-        <parameter name="vg_name" required="1" unique="1">
+        <parameter name="vg_name" required="1">
             <longdesc lang="en">
                 If you can see this, your GUI is broken.
             </longdesc>
@@ -465,6 +465,17 @@
 		exit 0
 	fi
 
+	if ! lvs $OCF_RESKEY_vg_name >& /dev/null; then
+		lv_count=0
+	else
+		lv_count=`lvs --noheadings -o name $OCF_RESKEY_vg_name | grep -v _mlog | grep -v _mimage | grep -v nconsistent | wc -l`
+	fi
+	if [ $lv_count -gt 1 ]; then
+		ocf_log err "HA LVM requires Only one logical volume per volume group."
+		ocf_log err "There are currently $lv_count logical volumes in $OCF_RESKEY_vg_name"
+		ocf_log err "Failing HA LVM start of $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
+		exit $OCF_ERR_GENERIC
+	fi
 	ha_lvm_proper_setup_check || exit 1
 		
 	if [ -z $OCF_RESKEY_lv_name ]; then




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-07-02 21:58 jbrassow
From: jbrassow @ 2007-07-02 21:58 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	jbrassow at sourceware.org	2007-07-02 21:58:34

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	<Previous check-in>
	Require vg_name to be unique.  Allowing multiple LVs from the same VG
	on different machines can lead to races when updating metadata during
	device failures.
	</Previous check-in>
	
	We can do better.  This patch puts the validation in lvm.sh so that
	it can print out an understandable error message.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&r1=1.8&r2=1.9

--- cluster/rgmanager/src/resources/lvm.sh	2007/05/29 14:33:52	1.8
+++ cluster/rgmanager/src/resources/lvm.sh	2007/07/02 21:58:34	1.9
@@ -71,7 +71,7 @@
 	    <content type="string"/>
         </parameter>
 
-        <parameter name="vg_name" required="1" unique="1">
+        <parameter name="vg_name" required="1">
             <longdesc lang="en">
                 If you can see this, your GUI is broken.
             </longdesc>
@@ -465,6 +465,17 @@
 		exit 0
 	fi
 
+	if ! lvs $OCF_RESKEY_vg_name >& /dev/null; then
+		lv_count=0
+	else
+		lv_count=`lvs --noheadings -o name $OCF_RESKEY_vg_name | grep -v _mlog | grep -v _mimage | grep -v nconsistent | wc -l`
+	fi
+	if [ $lv_count -gt 1 ]; then
+		ocf_log err "HA LVM requires Only one logical volume per volume group."
+		ocf_log err "There are currently $lv_count logical volumes in $OCF_RESKEY_vg_name"
+		ocf_log err "Failing HA LVM start of $OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
+		exit $OCF_ERR_GENERIC
+	fi
 	ha_lvm_proper_setup_check || exit 1
 		
 	if [ -z $OCF_RESKEY_lv_name ]; then




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-05-29 14:37 jbrassow
From: jbrassow @ 2007-05-29 14:37 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL5
Changes by:	jbrassow at sourceware.org	2007-05-29 14:37:00

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	Bug 241673
	
	Require vg_name to be unique.  Allowing multiple LVs from the same VG
	on different machines can lead to races when updating metadata during
	device failures.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL5&r1=1.1.6.4&r2=1.1.6.5

--- cluster/rgmanager/src/resources/lvm.sh	2007/05/09 20:51:30	1.1.6.4
+++ cluster/rgmanager/src/resources/lvm.sh	2007/05/29 14:37:00	1.1.6.5
@@ -71,7 +71,7 @@
 	    <content type="string"/>
         </parameter>
 
-        <parameter name="vg_name" required="1">
+        <parameter name="vg_name" required="1" unique="1">
             <longdesc lang="en">
                 If you can see this, your GUI is broken.
             </longdesc>


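In rgmanager resource metadata, unique="1" declares that no two lvm
resources may carry the same vg_name. A hypothetical cluster.conf
fragment of the kind this change is meant to reject (all names are
illustrative):

  <resources>
      <lvm name="lvm_a" vg_name="havg" lv_name="lv1"/>
      <lvm name="lvm_b" vg_name="havg" lv_name="lv2"/> <!-- same VG: now rejected -->
  </resources>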


* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-05-29 14:35 jbrassow
From: jbrassow @ 2007-05-29 14:35 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL4
Changes by:	jbrassow at sourceware.org	2007-05-29 14:35:19

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	Bug 240874
	
	Require vg_name to be unique.  Allowing multiple LVs from the same VG
	on different machines can lead to races when updating metadata during
	device failures.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.1.2.5&r2=1.1.2.6

--- cluster/rgmanager/src/resources/lvm.sh	2007/05/09 20:50:51	1.1.2.5
+++ cluster/rgmanager/src/resources/lvm.sh	2007/05/29 14:35:19	1.1.2.6
@@ -71,7 +71,7 @@
 	    <content type="string"/>
         </parameter>
 
-        <parameter name="vg_name" required="1">
+        <parameter name="vg_name" required="1" unique="1">
             <longdesc lang="en">
                 If you can see this, your GUI is broken.
             </longdesc>




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-05-29 14:33 jbrassow
From: jbrassow @ 2007-05-29 14:33 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	jbrassow at sourceware.org	2007-05-29 14:33:52

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	Require vg_name to be unique.  Allowing multiple LVs from the same VG
	on different machines can lead to races when updating metadata during
	device failures.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&r1=1.7&r2=1.8

--- cluster/rgmanager/src/resources/lvm.sh	2007/05/09 20:48:35	1.7
+++ cluster/rgmanager/src/resources/lvm.sh	2007/05/29 14:33:52	1.8
@@ -71,7 +71,7 @@
 	    <content type="string"/>
         </parameter>
 
-        <parameter name="vg_name" required="1">
+        <parameter name="vg_name" required="1" unique="1">
             <longdesc lang="en">
                 If you can see this, your GUI is broken.
             </longdesc>




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-05-09 20:51 jbrassow
From: jbrassow @ 2007-05-09 20:51 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL5
Changes by:	jbrassow at sourceware.org	2007-05-09 20:51:31

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	If misconfigured, HA LVM + mirroring can cause data corruption.  We should
	attempt to catch configuration errors before allowing LVM resources to start.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL5&r1=1.1.6.3&r2=1.1.6.4

--- cluster/rgmanager/src/resources/lvm.sh	2007/05/09 18:04:19	1.1.6.3
+++ cluster/rgmanager/src/resources/lvm.sh	2007/05/09 20:51:30	1.1.6.4
@@ -432,6 +432,32 @@
 	return $OCF_SUCCESS
 }
 
+ha_lvm_proper_setup_check()
+{
+	# First, let's check that they have setup their lvm.conf correctly
+	if ! lvm dumpconfig activation/volume_list >& /dev/null ||
+	   ! lvm dumpconfig activation/volume_list | grep $(local_node_name); then
+		ocf_log err "lvm.conf improperly configured for HA LVM."
+		return $OCF_ERR_GENERIC
+	fi
+
+	# Next, we need to ensure that their initrd has been updated
+	if [ -e /boot/initrd-`uname -r`.img ]; then
+		if [ "$(find /boot/initrd-`uname -r`.img -newer /etc/lvm/lvm.conf)" == "" ]; then
+			ocf_log err "HA LVM requires the initrd image to be newer than lvm.conf"
+			return $OCF_ERR_GENERIC
+		fi
+	else
+		# Best guess...
+		if [ "$(find /boot/*.img -newer /etc/lvm/lvm.conf)" == "" ]; then
+			ocf_log err "HA LVM requires the initrd image to be newer than lvm.conf"
+			return $OCF_ERR_GENERIC
+		fi
+	fi
+
+	return $OCF_SUCCESS
+}
+
 case $1 in
 start)
 	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
@@ -439,6 +465,8 @@
 		exit 0
 	fi
 
+	ha_lvm_proper_setup_check || exit 1
+		
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate start || exit 1
 	else
@@ -462,6 +490,10 @@
 		exit 0
 	fi
 
+	if ! ha_lvm_proper_setup_check; then
+		ocf_log err "WARNING: An improper setup can cause data corruption!"
+	fi
+
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate stop || exit 1
 	else


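The same three conditions can be probed by hand before enabling the
resource. A rough manual walkthrough of what ha_lvm_proper_setup_check()
verifies, using hostname where the agent calls its local_node_name
helper (an assumption; the cluster node name need not match the
hostname):

  # 1. volume_list must be defined in lvm.conf at all:
  lvm dumpconfig activation/volume_list

  # 2. ...and must contain this node's tag:
  lvm dumpconfig activation/volume_list | grep "$(hostname)"

  # 3. The initrd must be newer than lvm.conf (non-empty output = OK):
  find /boot/initrd-$(uname -r).img -newer /etc/lvm/lvm.conf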


* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-05-09 20:50 jbrassow
From: jbrassow @ 2007-05-09 20:50 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL4
Changes by:	jbrassow at sourceware.org	2007-05-09 20:50:51

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	If misconfigured, HA LVM + mirroring can cause data corruption.  We should
	attempt to catch configuration errors before allowing LVM resources to start.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.1.2.4&r2=1.1.2.5

--- cluster/rgmanager/src/resources/lvm.sh	2007/05/09 18:03:28	1.1.2.4
+++ cluster/rgmanager/src/resources/lvm.sh	2007/05/09 20:50:51	1.1.2.5
@@ -432,6 +432,32 @@
 	return $OCF_SUCCESS
 }
 
+ha_lvm_proper_setup_check()
+{
+	# First, let's check that they have setup their lvm.conf correctly
+	if ! lvm dumpconfig activation/volume_list >& /dev/null ||
+	   ! lvm dumpconfig activation/volume_list | grep $(local_node_name); then
+		ocf_log err "lvm.conf improperly configured for HA LVM."
+		return $OCF_ERR_GENERIC
+	fi
+
+	# Next, we need to ensure that their initrd has been updated
+	if [ -e /boot/initrd-`uname -r`.img ]; then
+		if [ "$(find /boot/initrd-`uname -r`.img -newer /etc/lvm/lvm.conf)" == "" ]; then
+			ocf_log err "HA LVM requires the initrd image to be newer than lvm.conf"
+			return $OCF_ERR_GENERIC
+		fi
+	else
+		# Best guess...
+		if [ "$(find /boot/*.img -newer /etc/lvm/lvm.conf)" == "" ]; then
+			ocf_log err "HA LVM requires the initrd image to be newer than lvm.conf"
+			return $OCF_ERR_GENERIC
+		fi
+	fi
+
+	return $OCF_SUCCESS
+}
+
 case $1 in
 start)
 	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
@@ -439,6 +465,8 @@
 		exit 0
 	fi
 
+	ha_lvm_proper_setup_check || exit 1
+		
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate start || exit 1
 	else
@@ -462,6 +490,10 @@
 		exit 0
 	fi
 
+	if ! ha_lvm_proper_setup_check; then
+		ocf_log err "WARNING: An improper setup can cause data corruption!"
+	fi
+
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate stop || exit 1
 	else




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-05-09 20:48 jbrassow
From: jbrassow @ 2007-05-09 20:48 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	jbrassow at sourceware.org	2007-05-09 20:48:35

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	If misconfigured, HA LVM + mirroring can cause data corruption.  We should
	attempt to catch configuration errors before allowing LVM resources to start.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&r1=1.6&r2=1.7

--- cluster/rgmanager/src/resources/lvm.sh	2007/05/09 18:00:44	1.6
+++ cluster/rgmanager/src/resources/lvm.sh	2007/05/09 20:48:35	1.7
@@ -432,6 +432,32 @@
 	return $OCF_SUCCESS
 }
 
+ha_lvm_proper_setup_check()
+{
+	# First, let's check that they have setup their lvm.conf correctly
+	if ! lvm dumpconfig activation/volume_list >& /dev/null ||
+	   ! lvm dumpconfig activation/volume_list | grep $(local_node_name); then
+		ocf_log err "lvm.conf improperly configured for HA LVM."
+		return $OCF_ERR_GENERIC
+	fi
+
+	# Next, we need to ensure that their initrd has been updated
+	if [ -e /boot/initrd-`uname -r`.img ]; then
+		if [ "$(find /boot/initrd-`uname -r`.img -newer /etc/lvm/lvm.conf)" == "" ]; then
+			ocf_log err "HA LVM requires the initrd image to be newer than lvm.conf"
+			return $OCF_ERR_GENERIC
+		fi
+	else
+		# Best guess...
+		if [ "$(find /boot/*.img -newer /etc/lvm/lvm.conf)" == "" ]; then
+			ocf_log err "HA LVM requires the initrd image to be newer than lvm.conf"
+			return $OCF_ERR_GENERIC
+		fi
+	fi
+
+	return $OCF_SUCCESS
+}
+
 case $1 in
 start)
 	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
@@ -439,6 +465,8 @@
 		exit 0
 	fi
 
+	ha_lvm_proper_setup_check || exit 1
+		
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate start || exit 1
 	else
@@ -462,6 +490,10 @@
 		exit 0
 	fi
 
+	if ! ha_lvm_proper_setup_check; then
+		ocf_log err "WARNING: An improper setup can cause data corruption!"
+	fi
+
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate stop || exit 1
 	else




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-05-09 18:04 jbrassow
From: jbrassow @ 2007-05-09 18:04 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL5
Changes by:	jbrassow at sourceware.org	2007-05-09 18:04:19

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	People seem to think that they have to set up LVM in rgmanager even though they
	are using CLVM.  This causes the two to collide during use.
	
	The HA LVM resource script should detect if a volume is clustered and ignore it.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL5&r1=1.1.6.2&r2=1.1.6.3

--- cluster/rgmanager/src/resources/lvm.sh	2007/04/18 19:14:21	1.1.6.2
+++ cluster/rgmanager/src/resources/lvm.sh	2007/05/09 18:04:19	1.1.6.3
@@ -236,7 +236,12 @@
 	# Check if device is active
 	#
 	if [[ ! $(lvs -o attr --noheadings $lv_path) =~ ....a. ]]; then
-	    return $OCF_ERR_GENERIC
+		return $OCF_ERR_GENERIC
+	fi
+
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		return $OCF_SUCCESS
 	fi
 
 	#
@@ -429,6 +434,11 @@
 
 case $1 in
 start)
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		exit 0
+	fi
+
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate start || exit 1
 	else
@@ -447,6 +457,11 @@
 	;;
 		    
 stop)
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		exit 0
+	fi
+
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate stop || exit 1
 	else
@@ -467,6 +482,11 @@
 	;;
 
 verify-all)
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		exit 0
+	fi
+
 	verify_all
 	rv=$?
 	;;


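The clustered-VG test keys off the vgs attribute string, whose sixth
character is 'c' for a clustered (CLVM-managed) volume group, e.g.
"wz--nc". A quick manual check with "myvg" as a placeholder:

  # Print the VG attributes; a trailing 'c' marks a clustered VG:
  vgs -o attr --noheadings myvg

  # The same test the agent uses above:
  if [[ $(vgs -o attr --noheadings myvg) =~ .....c ]]; then
      echo "myvg is managed by CLVM; the HA LVM agent ignores it"
  fi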


* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-05-09 18:03 jbrassow
From: jbrassow @ 2007-05-09 18:03 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL4
Changes by:	jbrassow at sourceware.org	2007-05-09 18:03:28

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	People seem to think that they have to set up LVM in rgmanager even though they
	are using CLVM.  This causes the two to collide during use.
	
	The HA LVM resource script should detect if a volume is clustered and ignore it.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.1.2.3&r2=1.1.2.4

--- cluster/rgmanager/src/resources/lvm.sh	2007/04/18 17:02:05	1.1.2.3
+++ cluster/rgmanager/src/resources/lvm.sh	2007/05/09 18:03:28	1.1.2.4
@@ -236,7 +236,12 @@
 	# Check if device is active
 	#
 	if [[ ! $(lvs -o attr --noheadings $lv_path) =~ ....a. ]]; then
-	    return $OCF_ERR_GENERIC
+		return $OCF_ERR_GENERIC
+	fi
+
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		return $OCF_SUCCESS
 	fi
 
 	#
@@ -429,6 +434,11 @@
 
 case $1 in
 start)
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		exit 0
+	fi
+
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate start || exit 1
 	else
@@ -447,6 +457,11 @@
 	;;
 		    
 stop)
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		exit 0
+	fi
+
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate stop || exit 1
 	else
@@ -467,6 +482,11 @@
 	;;
 
 verify-all)
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		exit 0
+	fi
+
 	verify_all
 	rv=$?
 	;;




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-05-09 18:00 jbrassow
From: jbrassow @ 2007-05-09 18:00 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	jbrassow at sourceware.org	2007-05-09 18:00:45

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	People seem to think that they have to set up LVM in rgmanager even though they
	are using CLVM.  This causes the two to collide during use.
	
	The HA LVM resource script should detect if a volume is clustered and ignore it.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&r1=1.5&r2=1.6

--- cluster/rgmanager/src/resources/lvm.sh	2007/04/18 18:14:56	1.5
+++ cluster/rgmanager/src/resources/lvm.sh	2007/05/09 18:00:44	1.6
@@ -236,7 +236,12 @@
 	# Check if device is active
 	#
 	if [[ ! $(lvs -o attr --noheadings $lv_path) =~ ....a. ]]; then
-	    return $OCF_ERR_GENERIC
+		return $OCF_ERR_GENERIC
+	fi
+
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		return $OCF_SUCCESS
 	fi
 
 	#
@@ -429,6 +434,11 @@
 
 case $1 in
 start)
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		exit 0
+	fi
+
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate start || exit 1
 	else
@@ -447,6 +457,11 @@
 	;;
 		    
 stop)
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		exit 0
+	fi
+
 	if [ -z $OCF_RESKEY_lv_name ]; then
 		vg_activate stop || exit 1
 	else
@@ -467,6 +482,11 @@
 	;;
 
 validate-all)
+	if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
+		ocf_log notice "$OCF_RESKEY_vg_name is a cluster volume.  Ignoring..."
+		exit 0
+	fi
+
 	verify_all
 	rv=$?
 	;;




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-04-18 19:14 jbrassow
From: jbrassow @ 2007-04-18 19:14 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL5
Changes by:	jbrassow at sourceware.org	2007-04-18 20:14:22

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	Bug 236580: [HA LVM]: Bringing site back on-line after failure causes pr...
	
	Setup:
	- 2 interconnected sites
	- each site has a disk and a machine
	- LVM mirroring is used to mirror the disks from the sites
	
	When one site fails, the LVM happily moves over to the second site -
	removing the failed disk from the VG that was part of the failed
	site.  However, when the failed site is restored and the service
	attempts to move back to the original machine, it fails because
	of the conflicts in LVM metadata on the disks.
	
	This fix allows the LV to be reactivated on the original node
	by filtering out the devices which have stale metadata (i.e.,
	the device that was removed during the failure).

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL5&r1=1.1.6.1&r2=1.1.6.2

--- cluster/rgmanager/src/resources/lvm.sh	2007/02/15 22:49:33	1.1.6.1
+++ cluster/rgmanager/src/resources/lvm.sh	2007/04/18 19:14:21	1.1.6.2
@@ -149,6 +149,78 @@
 	return $OCF_ERR_GENERIC
 }
 
+# lvm_exec_resilient
+#
+# Sometimes, devices can come back.  Their metadata will conflict
+# with the good devices that remain.  This function filters out those
+# failed devices when executing the given command
+#
+# Finishing with vgscan resets the cache/filter
+lvm_exec_resilient()
+{
+	declare command=$1
+	declare all_pvs
+
+	ocf_log notice "Making resilient : $command"
+
+	if [ -z $command ]; then
+		ocf_log err "lvm_exec_resilient: Arguments not supplied"
+		return $OCF_ERR_ARGS
+	fi
+
+	# pvs will print out only those devices that are valid
+	# If a device dies and comes back, it will not appear
+	# in pvs output (but you will get a Warning).
+	all_pvs=(`pvs --noheadings -o pv_name | grep -v Warning`)
+
+	# Now we use those valid devices in a filter which we set up.
+	# The device will then be activated because there are no
+	# metadata conflicts.
+        command=$command" --config devices{filter=[";
+	for i in ${all_pvs[*]}; do
+		command=$command'"a|'$i'|",'
+	done
+	command=$command"\"r|.*|\"]}"
+
+	ocf_log notice "Resilient command: $command"
+	if ! $command ; then
+		ocf_log err "lvm_exec_resilient failed"
+		vgscan
+		return $OCF_ERR_GENERIC
+	else
+		vgscan
+		return $OCF_SUCCESS
+	fi
+}
+
+# lv_activate_resilient
+#
+# Sometimes, devices can come back.  Their metadata will conflict
+# with the good devices that remain.  We must filter out those
+# failed devices when trying to reactivate
+lv_activate_resilient()
+{
+	declare action=$1
+	declare lv_path=$2
+	declare op="-ay"
+
+	if [ -z $action ] || [ -z $lv_path ]; then
+		ocf_log err "lv_activate_resilient: Arguments not supplied"
+		return $OCF_ERR_ARGS
+	fi
+
+	if [ $action != "start" ]; then
+	        op="-an"
+	fi
+
+	if ! lvm_exec_resilient "lvchange $op $lv_path" ; then
+		ocf_log err "lv_activate_resilient $action failed on $lv_path"
+		return $OCF_ERR_GENERIC
+	else
+		return $OCF_SUCCESS
+	fi
+}
+
 # lv_status
 #
 # Is the LV active?
@@ -163,7 +235,7 @@
 	#
 	# Check if device is active
 	#
-	if [[ ! $(lvs -o attr --noheadings vg/mirror) =~ ....a. ]]; then
+	if [[ ! $(lvs -o attr --noheadings $lv_path) =~ ....a. ]]; then
 	    return $OCF_ERR_GENERIC
 	fi
 
@@ -203,7 +275,7 @@
 		ocf_log err "WARNING: $my_name does not own $lv_path"
 		ocf_log err "WARNING: Attempting shutdown of $lv_path"
 
-		lvchange -an $lv_path
+		lv_activate_resilient "stop" $lv_path
 		return $OCF_ERR_GENERIC
 	fi
 
@@ -229,15 +301,14 @@
 			ocf_log err "Unable to add tag to $lv_path"
 			return $OCF_ERR_GENERIC
 		fi
-		lvchange -ay $lv_path
-		if [ $? -ne 0 ]; then
+
+		if ! lv_activate_resilient $action $lv_path; then
 			ocf_log err "Unable to activate $lv_path"
 			return $OCF_ERR_GENERIC
 		fi
 	else
 		ocf_log notice "Deactivating $lv_path"
-		lvchange -an $lv_path
-		if [ $? -ne 0 ]; then
+		if ! lv_activate_resilient $action $lv_path; then
 			ocf_log err "Unable to deactivate $lv_path"
 			return $OCF_ERR_GENERIC
 		fi


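The filter that lvm_exec_resilient builds is easiest to read expanded.
For two currently-valid PVs, /dev/sdb1 and /dev/sdc1 (hypothetical
device names), the assembled command comes out roughly as below; the
trailing vgscan then drops the temporary filter from LVM's cache:

  # Accept only the PVs that pvs still reports, reject everything else,
  # so a returning device with stale metadata cannot conflict:
  lvchange -ay havg/halv \
      --config 'devices{filter=["a|/dev/sdb1|","a|/dev/sdc1|","r|.*|"]}'
  vgscan    # reset LVM's device cache/filter afterwards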


* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-04-18 18:14 jbrassow
From: jbrassow @ 2007-04-18 18:14 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	jbrassow at sourceware.org	2007-04-18 19:14:56

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	Bug 236580: [HA LVM]: Bringing site back on-line after failure causes pr...
	
	Setup:
	- 2 interconnected sites
	- each site has a disk and a machine
	- LVM mirroring is used to mirror the disks from the sites
	
	When one site fails, the LVM happily moves over to the second site -
	removing the failed disk from the VG that was part of the failed
	site.  However, when the failed site is restored and the service
	attempts to move back to the original machine, it fails because
	of the conflicts in LVM metadata on the disks.
	
	This fix allows the LV to be reactivated on the original node
	by filtering out the devices which have stale metadata (i.e.,
	the device that was removed during the failure).

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&r1=1.4&r2=1.5

--- cluster/rgmanager/src/resources/lvm.sh	2007/04/05 15:08:20	1.4
+++ cluster/rgmanager/src/resources/lvm.sh	2007/04/18 18:14:56	1.5
@@ -149,6 +149,78 @@
 	return $OCF_ERR_GENERIC
 }
 
+# lvm_exec_resilient
+#
+# Sometimes, devices can come back.  Their metadata will conflict
+# with the good devices that remain.  This function filters out those
+# failed devices when executing the given command
+#
+# Finishing with vgscan resets the cache/filter
+lvm_exec_resilient()
+{
+	declare command=$1
+	declare all_pvs
+
+	ocf_log notice "Making resilient : $command"
+
+	if [ -z $command ]; then
+		ocf_log err "lvm_exec_resilient: Arguments not supplied"
+		return $OCF_ERR_ARGS
+	fi
+
+	# pvs will print out only those devices that are valid
+	# If a device dies and comes back, it will not appear
+	# in pvs output (but you will get a Warning).
+	all_pvs=(`pvs --noheadings -o pv_name | grep -v Warning`)
+
+	# Now we use those valid devices in a filter which we set up.
+	# The device will then be activated because there are no
+	# metadata conflicts.
+        command=$command" --config devices{filter=[";
+	for i in ${all_pvs[*]}; do
+		command=$command'"a|'$i'|",'
+	done
+	command=$command"\"r|.*|\"]}"
+
+	ocf_log notice "Resilient command: $command"
+	if ! $command ; then
+		ocf_log err "lvm_exec_resilient failed"
+		vgscan
+		return $OCF_ERR_GENERIC
+	else
+		vgscan
+		return $OCF_SUCCESS
+	fi
+}
+
+# lv_activate_resilient
+#
+# Sometimes, devices can come back.  Their metadata will conflict
+# with the good devices that remain.  We must filter out those
+# failed devices when trying to reactivate
+lv_activate_resilient()
+{
+	declare action=$1
+	declare lv_path=$2
+	declare op="-ay"
+
+	if [ -z $action ] || [ -z $lv_path ]; then
+		ocf_log err "lv_activate_resilient: Arguments not supplied"
+		return $OCF_ERR_ARGS
+	fi
+
+	if [ $action != "start" ]; then
+	        op="-an"
+	fi
+
+	if ! lvm_exec_resilient "lvchange $op $lv_path" ; then
+		ocf_log err "lv_activate_resilient $action failed on $lv_path"
+		return $OCF_ERR_GENERIC
+	else
+		return $OCF_SUCCESS
+	fi
+}
+
 # lv_status
 #
 # Is the LV active?
@@ -203,7 +275,7 @@
 		ocf_log err "WARNING: $my_name does not own $lv_path"
 		ocf_log err "WARNING: Attempting shutdown of $lv_path"
 
-		lvchange -an $lv_path
+		lv_activate_resilient "stop" $lv_path
 		return $OCF_ERR_GENERIC
 	fi
 
@@ -229,15 +301,14 @@
 			ocf_log err "Unable to add tag to $lv_path"
 			return $OCF_ERR_GENERIC
 		fi
-		lvchange -ay $lv_path
-		if [ $? -ne 0 ]; then
+
+		if ! lv_activate_resilient $action $lv_path; then
 			ocf_log err "Unable to activate $lv_path"
 			return $OCF_ERR_GENERIC
 		fi
 	else
 		ocf_log notice "Deactivating $lv_path"
-		lvchange -an $lv_path
-		if [ $? -ne 0 ]; then
+		if ! lv_activate_resilient $action $lv_path; then
 			ocf_log err "Unable to deactivate $lv_path"
 			return $OCF_ERR_GENERIC
 		fi




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-04-18 18:09 jbrassow
From: jbrassow @ 2007-04-18 18:09 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL45
Changes by:	jbrassow at sourceware.org	2007-04-18 19:09:12

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	Bug 236580: [HA LVM]: Bringing site back on-line after failure causes pr...
	
	Setup:
	- 2 interconnected sites
	- each site has a disk and a machine
	- LVM mirroring is used to mirror the disks from the sites
	
	When one site fails, the LVM happily moves over to the second site -
	removing the failed disk from the VG that was part of the failed
	site.  However, when the failed site is restored and the service
	attempts to move back to the original machine, it fails because
	of the conflicts in LVM metadata on the disks.
	
	This fix allows the LV to be reactivated on the original node
	by filtering out the devices which have stale metadata (i.e.,
	the device that was removed during the failure).

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL45&r1=1.1.2.2&r2=1.1.2.2.2.1

--- cluster/rgmanager/src/resources/lvm.sh	2007/03/08 19:37:42	1.1.2.2
+++ cluster/rgmanager/src/resources/lvm.sh	2007/04/18 18:09:12	1.1.2.2.2.1
@@ -149,6 +149,78 @@
 	return $OCF_ERR_GENERIC
 }
 
+# lvm_exec_resilient
+#
+# Sometimes, devices can come back.  Their metadata will conflict
+# with the good devices that remain.  This function filters out those
+# failed devices when executing the given command
+#
+# Finishing with vgscan resets the cache/filter
+lvm_exec_resilient()
+{
+	declare command=$1
+	declare all_pvs
+
+	ocf_log notice "Making resilient : $command"
+
+	if [ -z $command ]; then
+		ocf_log err "lvm_exec_resilient: Arguments not supplied"
+		return $OCF_ERR_ARGS
+	fi
+
+	# pvs will print out only those devices that are valid
+	# If a device dies and comes back, it will not appear
+	# in pvs output (but you will get a Warning).
+	all_pvs=(`pvs --noheadings -o pv_name | grep -v Warning`)
+
+	# Now we use those valid devices in a filter which we set up.
+	# The device will then be activated because there are no
+	# metadata conflicts.
+        command=$command" --config devices{filter=[";
+	for i in ${all_pvs[*]}; do
+		command=$command'"a|'$i'|",'
+	done
+	command=$command"\"r|.*|\"]}"
+
+	ocf_log notice "Resilient command: $command"
+	if ! $command ; then
+		ocf_log err "lvm_exec_resilient failed"
+		vgscan
+		return $OCF_ERR_GENERIC
+	else
+		vgscan
+		return $OCF_SUCCESS
+	fi
+}
+
+# lv_activate_resilient
+#
+# Sometimes, devices can come back.  Their metadata will conflict
+# with the good devices that remain.  We must filter out those
+# failed devices when trying to reactivate
+lv_activate_resilient()
+{
+	declare action=$1
+	declare lv_path=$2
+	declare op="-ay"
+
+	if [ -z $action ] || [ -z $lv_path ]; then
+		ocf_log err "lv_activate_resilient: Arguments not supplied"
+		return $OCF_ERR_ARGS
+	fi
+
+	if [ $action != "start" ]; then
+	        op="-an"
+	fi
+
+	if ! lvm_exec_resilient "lvchange $op $lv_path" ; then
+		ocf_log err "lv_activate_resilient $action failed on $lv_path"
+		return $OCF_ERR_GENERIC
+	else
+		return $OCF_SUCCESS
+	fi
+}
+
 # lv_status
 #
 # Is the LV active?
@@ -203,7 +275,7 @@
 		ocf_log err "WARNING: $my_name does not own $lv_path"
 		ocf_log err "WARNING: Attempting shutdown of $lv_path"
 
-		lvchange -an $lv_path
+		lv_activate_resilient "stop" $lv_path
 		return $OCF_ERR_GENERIC
 	fi
 
@@ -229,15 +301,14 @@
 			ocf_log err "Unable to add tag to $lv_path"
 			return $OCF_ERR_GENERIC
 		fi
-		lvchange -ay $lv_path
-		if [ $? -ne 0 ]; then
+
+		if ! lv_activate_resilient $action $lv_path; then
 			ocf_log err "Unable to activate $lv_path"
 			return $OCF_ERR_GENERIC
 		fi
 	else
 		ocf_log notice "Deactivating $lv_path"
-		lvchange -an $lv_path
-		if [ $? -ne 0 ]; then
+		if ! lv_activate_resilient $action $lv_path; then
 			ocf_log err "Unable to deactivate $lv_path"
 			return $OCF_ERR_GENERIC
 		fi




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-04-18 17:02 jbrassow
From: jbrassow @ 2007-04-18 17:02 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL4
Changes by:	jbrassow at sourceware.org	2007-04-18 18:02:06

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	Bug 236580: [HA LVM]: Bringing site back on-line after failure causes pr...
	
	Setup:
	- 2 interconnected sites
	- each site has a disk and a machine
	- LVM mirroring is used to mirror the disks from the sites
	
	When one site fails, the LVM happily moves over to the second site -
	removing the failed disk from the VG that was part of the failed
	site.  However, when the failed site is restored and the service
	attempts to move back to the original machine, it fails because
	of the conflicts in LVM metadata on the disks.
	
	This fix allows the LV to be reactivated on the original node
	by filtering out the devices which have stale metadata (i.e.,
	the device that was removed during the failure).

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.1.2.2&r2=1.1.2.3

--- cluster/rgmanager/src/resources/lvm.sh	2007/03/08 19:37:42	1.1.2.2
+++ cluster/rgmanager/src/resources/lvm.sh	2007/04/18 17:02:05	1.1.2.3
@@ -149,6 +149,78 @@
 	return $OCF_ERR_GENERIC
 }
 
+# lvm_exec_resilient
+#
+# Sometimes, devices can come back.  Their metadata will conflict
+# with the good devices that remain.  This function filters out those
+# failed devices when executing the given command
+#
+# Finishing with vgscan resets the cache/filter
+lvm_exec_resilient()
+{
+	declare command=$1
+	declare all_pvs
+
+	ocf_log notice "Making resilient : $command"
+
+	if [ -z $command ]; then
+		ocf_log err "lvm_exec_resilient: Arguments not supplied"
+		return $OCF_ERR_ARGS
+	fi
+
+	# pvs will print out only those devices that are valid
+	# If a device dies and comes back, it will not appear
+	# in pvs output (but you will get a Warning).
+	all_pvs=(`pvs --noheadings -o pv_name | grep -v Warning`)
+
+	# Now we use those valid devices in a filter which we set up.
+	# The device will then be activated because there are no
+	# metadata conflicts.
+        command=$command" --config devices{filter=[";
+	for i in ${all_pvs[*]}; do
+		command=$command'"a|'$i'|",'
+	done
+	command=$command"\"r|.*|\"]}"
+
+	ocf_log notice "Resilient command: $command"
+	if ! $command ; then
+		ocf_log err "lvm_exec_resilient failed"
+		vgscan
+		return $OCF_ERR_GENERIC
+	else
+		vgscan
+		return $OCF_SUCCESS
+	fi
+}
+
+# lv_activate_resilient
+#
+# Sometimes, devices can come back.  Their metadata will conflict
+# with the good devices that remain.  We must filter out those
+# failed devices when trying to reactivate
+lv_activate_resilient()
+{
+	declare action=$1
+	declare lv_path=$2
+	declare op="-ay"
+
+	if [ -z $action ] || [ -z $lv_path ]; then
+		ocf_log err "lv_activate_resilient: Arguments not supplied"
+		return $OCF_ERR_ARGS
+	fi
+
+	if [ $action != "start" ]; then
+	        op="-an"
+	fi
+
+	if ! lvm_exec_resilient "lvchange $op $lv_path" ; then
+		ocf_log err "lv_activate_resilient $action failed on $lv_path"
+		return $OCF_ERR_GENERIC
+	else
+		return $OCF_SUCCESS
+	fi
+}
+
 # lv_status
 #
 # Is the LV active?
@@ -203,7 +275,7 @@
 		ocf_log err "WARNING: $my_name does not own $lv_path"
 		ocf_log err "WARNING: Attempting shutdown of $lv_path"
 
-		lvchange -an $lv_path
+		lv_activate_resilient "stop" $lv_path
 		return $OCF_ERR_GENERIC
 	fi
 
@@ -229,15 +301,14 @@
 			ocf_log err "Unable to add tag to $lv_path"
 			return $OCF_ERR_GENERIC
 		fi
-		lvchange -ay $lv_path
-		if [ $? -ne 0 ]; then
+
+		if ! lv_activate_resilient $action $lv_path; then
 			ocf_log err "Unable to activate $lv_path"
 			return $OCF_ERR_GENERIC
 		fi
 	else
 		ocf_log notice "Deactivating $lv_path"
-		lvchange -an $lv_path
-		if [ $? -ne 0 ]; then
+		if ! lv_activate_resilient $action $lv_path; then
 			ocf_log err "Unable to deactivate $lv_path"
 			return $OCF_ERR_GENERIC
 		fi




* [Cluster-devel] cluster/rgmanager/src/resources lvm.sh
@ 2007-03-08 19:37 jbrassow
From: jbrassow @ 2007-03-08 19:37 UTC
  To: cluster-devel.redhat.com

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL4
Changes by:	jbrassow at sourceware.org	2007-03-08 19:37:42

Modified files:
	rgmanager/src/resources: lvm.sh 

Log message:
	Bug 231408
	
	Fix accidental hard-coded value.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/rgmanager/src/resources/lvm.sh.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.1.2.1&r2=1.1.2.2

--- cluster/rgmanager/src/resources/lvm.sh	2007/02/15 22:46:00	1.1.2.1
+++ cluster/rgmanager/src/resources/lvm.sh	2007/03/08 19:37:42	1.1.2.2
@@ -163,7 +163,7 @@
 	#
 	# Check if device is active
 	#
-	if [[ ! $(lvs -o attr --noheadings vg/mirror) =~ ....a. ]]; then
+	if [[ ! $(lvs -o attr --noheadings $lv_path) =~ ....a. ]]; then
 	    return $OCF_ERR_GENERIC
 	fi
 


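For reference, the fifth character of the lvs attribute string is 'a'
when the LV is active (e.g. "-wi-a-"), which is what the corrected line
now tests against the real $lv_path instead of the hard-coded
"vg/mirror". A manual equivalent with placeholder names:

  # Fifth attr character 'a' => the LV is currently active:
  lvs -o attr --noheadings myvg/mylv     # e.g. "-wi-a-"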

