* [PATCH] Better clean-up of pvmove failures in a cluster
From: Jonathan Brassow @ 2014-05-05 21:38 UTC (permalink / raw)
  To: lvm-devel

Hi,

I don't see any reason why the following patch would be bad, but I'd
like to hear from others first.  In essence, I'm just trying to improve
the error code path.  This seems to improve things for pvmove when a
node in the cluster is not running 'cmirrord' as it should.  However, it
doesn't clean everything up when attempting to create a full cluster
mirror LV.  So there might still be some work left after this.

 brassow

activation:  Remove empty DM devices when table fails to load.

As part of better error handling, remove DM devices that were
successfully created but then failed to load a table.  This can happen
when pvmove'ing in a cluster while the cluster mirror daemon
(cmirrord) is not running on a remote node; the mapping table then
fails to load on that node.  In this case, any revert would work on
the nodes running cmirrord, because the DM devices on those nodes did
succeed in loading.  However, because no table could be loaded on the
non-cmirrord nodes, there is no table present that points to what
needs to be reverted.  This leaves an empty DM device on the system
without any corresponding LVM representation.

Index: lvm2/libdm/libdm-deptree.c
===================================================================
--- lvm2.orig/libdm/libdm-deptree.c
+++ lvm2/libdm/libdm-deptree.c
@@ -1929,6 +1929,26 @@ out:
 	return r;
 }
 
+/*
+ * _remove_node
+ *
+ * This function is only used to remove a DM device that has failed
+ * to load any table.
+ */
+static int _remove_node(struct dm_tree_node *dnode)
+{
+	if (!dnode->info.exists)
+		return 1;
+
+	if (dnode->info.live_table || dnode->info.inactive_table) {
+		log_error(INTERNAL_ERROR
+			  "_remove_node called on device with loaded table(s).");
+		return 0;
+	}
+
+	return _deactivate_node(dnode->name, dnode->info.major,
+				dnode->info.minor, NULL, 0, 0);
+}
 
 static int _build_dev_string(char *devbuf, size_t bufsize, struct dm_tree_node *node)
 {
@@ -2662,7 +2682,7 @@ int dm_tree_preload_children(struct dm_t
 			     const char *uuid_prefix,
 			     size_t uuid_prefix_len)
 {
-	int r = 1;
+	int r = 1, node_created = 0;
 	void *handle = NULL;
 	struct dm_tree_node *child;
 	struct dm_info newinfo;
@@ -2684,13 +2704,16 @@ int dm_tree_preload_children(struct dm_t
 				return_0;
 
 		/* FIXME Cope if name exists with no uuid? */
-		if (!child->info.exists && !_create_node(child))
+		if (!child->info.exists && !(node_created = _create_node(child)))
 			return_0;
 
 		if (!child->info.inactive_table &&
 		    child->props.segment_count &&
-		    !_load_node(child))
+		    !_load_node(child)) {
+			if (node_created && !_remove_node(child))
+				return_0;
 			return_0;
+		}
 
 		/* Propagate device size change change */
 		if (child->props.size_changed)


* [PATCH] Better clean-up of pvmove failures in a cluster
From: Zdenek Kabelac @ 2014-05-06  7:54 UTC (permalink / raw)
  To: lvm-devel

On 5.5.2014 23:38, Jonathan Brassow wrote:
> Hi,
>
> I don't see any reason why the following patch would be bad, but I'd
> like to hear from others first.  In essence, I'm just trying to improve
> the error code path.  This seems to improve things for pvmove when a
> node in the cluster is not running 'cmirrord' as it should.  However, it
> doesn't clean everything up when attempting to create a full cluster
> mirror LV.  So there might still be some work left after this.

I've already tried to add some initial recovery support for the error path.

If you take a look at the upstream libdm-deptree code, there is already
a dnode->activated list for the recovery path, which tracks just the
newly introduced devices.

The problem is however much more complicated: it should work as a
transaction, since multiple devices may get changed, and all changes
should be reverted properly.

>
>   brassow
>
> activation:  Remove empty DM devices when table fails to load.

There is a postprocessing routine which should clean up any leftover
empty entries in the table.

I'm currently working on more generic solution.

Zdenek

