* [Cluster-devel] fenced: don't ignore victim_done messages for reduced victims
@ 2011-02-22 22:01 David Teigland
From: David Teigland @ 2011-02-22 22:01 UTC
  To: cluster-devel.redhat.com


Needs ACK for RHEL6.


When a victim is "reduced" (i.e. fenced skips fencing it because it
rejoins the cluster cleanly before fenced fences it), it is immediately
removed from the list of victims, before the "victim_done" message is
sent for it.  The victim_done message updates the time of the last
successful fencing operation for a failed node.

The code that processes received victim_done messages was ignoring the
message for the reduced victim because the node couldn't be found in
the victims list.  This caused the latest fencing information to not be
recorded for the node, causing dlm_controld to wait indefinitely for
fencing to complete for the reduced victim.

The fix is to simply record the information from a victim_done message
even if the node is not in the victims list.

bz 678704

Signed-off-by: David Teigland <teigland@redhat.com>
---
 fence/fenced/cpg.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fence/fenced/cpg.c b/fence/fenced/cpg.c
index a8629b9..99e16a0 100644
--- a/fence/fenced/cpg.c
+++ b/fence/fenced/cpg.c
@@ -652,9 +652,9 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
 
 	node = get_node_victim(fd, id->nodeid);
 	if (!node) {
+		/* see comment below about no node */
 		log_debug("receive_victim_done %d:%u no victim nodeid %d",
 			  hd->nodeid, seq, id->nodeid);
-		return;
 	}
 
 	log_debug("receive_victim_done %d:%u remove victim %d time %llu how %d",
@@ -670,9 +670,11 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
 	if (hd->nodeid == our_nodeid) {
 		/* sanity check, I don't think this should happen;
 		   see comment in fence_victims() */
-		if (!node->local_victim_done)
-			log_error("expect local_victim_done");
-		node->local_victim_done = 0;
+		if (node) {
+			if (!node->local_victim_done)
+				log_error("expect local_victim_done");
+			node->local_victim_done = 0;
+		}
 	} else {
 		/* save details of fencing operation from master, which
 		   master saves at the time it completes it */
@@ -680,8 +682,12 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
 				   id->fence_how, id->fence_time);
 	}
 
-	list_del(&node->list);
-	free(node);
+	/* we can have no node when reduce_victims() removes it, bz 678704 */
+
+	if (node) {
+		list_del(&node->list);
+		free(node);
+	}
 }
 
 /* we know that the quorum value here is consistent with the cpg events
-- 
1.7.1.1
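
The control flow this patch aims for can be sketched as a small standalone
C model. This is an illustration only, not the fenced source: struct domain,
struct victim, add_victim(), remove_victim(), handle_victim_done() and
MAX_NODES are all hypothetical names, and the model records the fence time
for every message, whereas the real receive_victim_done() only saves details
for messages that come from another node. The point it shows is that the
fence time is recorded whether or not the node is still on the victims list,
and that list cleanup is skipped when the node is absent.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_NODES 16  /* hypothetical bound, for this sketch only */

/* Hypothetical stand-in for an entry on the victims list. */
struct victim {
	int nodeid;
	struct victim *next;
};

/* Hypothetical stand-in for the fence domain state. */
struct domain {
	struct victim *victims;              /* nodes still waiting to be fenced */
	uint64_t last_fence_time[MAX_NODES]; /* 0 means nothing recorded yet */
};

/* Push a node onto the victims list. */
static void add_victim(struct domain *d, int nodeid)
{
	struct victim *v = malloc(sizeof(*v));

	assert(v != NULL);
	v->nodeid = nodeid;
	v->next = d->victims;
	d->victims = v;
}

/* Unlink and free a victim; returns 1 if it was on the list, 0 if not.
   Calling this before the message arrives models a "reduced" victim. */
static int remove_victim(struct domain *d, int nodeid)
{
	struct victim **pp;

	for (pp = &d->victims; *pp; pp = &(*pp)->next) {
		if ((*pp)->nodeid == nodeid) {
			struct victim *v = *pp;

			*pp = v->next;
			free(v);
			return 1;
		}
	}
	return 0;
}

/* Model of the patched handler: always record the fence time, and only
   touch the list when the node is still on it.  Returns 1 if a list
   entry was removed, 0 if the node had already been taken off. */
static int handle_victim_done(struct domain *d, int nodeid,
			      uint64_t fence_time)
{
	assert(nodeid >= 0 && nodeid < MAX_NODES);

	d->last_fence_time[nodeid] = fence_time;
	return remove_victim(d, nodeid);
}
```

In this model, returning early when the node is missing (the pre-patch
behaviour) would leave last_fence_time[] at 0 for a reduced victim, which is
the analogue of dlm_controld never seeing fencing complete.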




* [Cluster-devel] fenced: don't ignore victim_done messages for reduced victims
  2011-02-22 22:01 [Cluster-devel] fenced: don't ignore victim_done messages for reduced victims David Teigland
@ 2011-02-22 22:26 ` Ryan O'Hara
From: Ryan O'Hara @ 2011-02-22 22:26 UTC
  To: cluster-devel.redhat.com


Looks correct to me. ACK.

On Tue, Feb 22, 2011 at 05:01:27PM -0500, David Teigland wrote:
> 
> Needs ACK for RHEL6.
> 
> 
> When a victim is "reduced" (i.e. fenced skips fencing it because it
> rejoins the cluster cleanly before fenced fences it), it is immediately
> removed from the list of victims, before the "victim_done" message is
> sent for it.  The victim_done message updates the time of the last
> successful fencing operation for a failed node.
> 
> The code that processes received victim_done messages was ignoring the
> message for the reduced victim because the node couldn't be found in
> the victims list.  This caused the latest fencing information to not be
> recorded for the node, causing dlm_controld to wait indefinitely for
> fencing to complete for the reduced victim.
> 
> The fix is to simply record the information from a victim_done message
> even if the node is not in the victims list.
> 
> bz 678704
> 
> Signed-off-by: David Teigland <teigland@redhat.com>
> ---
>  fence/fenced/cpg.c |   18 ++++++++++++------
>  1 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/fence/fenced/cpg.c b/fence/fenced/cpg.c
> index a8629b9..99e16a0 100644
> --- a/fence/fenced/cpg.c
> +++ b/fence/fenced/cpg.c
> @@ -652,9 +652,9 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>  
>  	node = get_node_victim(fd, id->nodeid);
>  	if (!node) {
> +		/* see comment below about no node */
>  		log_debug("receive_victim_done %d:%u no victim nodeid %d",
>  			  hd->nodeid, seq, id->nodeid);
> -		return;
>  	}
>  
>  	log_debug("receive_victim_done %d:%u remove victim %d time %llu how %d",
> @@ -670,9 +670,11 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>  	if (hd->nodeid == our_nodeid) {
>  		/* sanity check, I don't think this should happen;
>  		   see comment in fence_victims() */
> -		if (!node->local_victim_done)
> -			log_error("expect local_victim_done");
> -		node->local_victim_done = 0;
> +		if (node) {
> +			if (!node->local_victim_done)
> +				log_error("expect local_victim_done");
> +			node->local_victim_done = 0;
> +		}
>  	} else {
>  		/* save details of fencing operation from master, which
>  		   master saves at the time it completes it */
> @@ -680,8 +682,12 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>  				   id->fence_how, id->fence_time);
>  	}
>  
> -	list_del(&node->list);
> -	free(node);
> +	/* we can have no node when reduce_victims() removes it, bz 678704 */
> +
> +	if (node) {
> +		list_del(&node->list);
> +		free(node);
> +	}
>  }
>  
>  /* we know that the quorum value here is consistent with the cpg events
> -- 
> 1.7.1.1



