* clustered snapshots
@ 2009-09-28 23:16 Mikulas Patocka
  2009-09-29 21:39 ` Mikulas Patocka
  0 siblings, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2009-09-28 23:16 UTC (permalink / raw)
  To: Jonathan Brassow; +Cc: Mike Snitzer, dm-devel, Alasdair G Kergon

Hi

I uploaded my test clustered snapshots to 
http://people.redhat.com/mpatocka/patches/kernel/clustered-snapshots-preview/

My patches take a different approach from Jon's patches. My patches 
basically replace down_write(&s->lock) and up_write(&s->lock) with 
clusterized locking.

If there are pending exceptions, the cluster lock must be held while the 
local lock is unlocked. The cluster lock is dropped when all pending 
exceptions are reallocated and the local lock is dropped.
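
In pseudo-C, the pattern looks roughly like this (a sketch only --- the 
cluster-lock helpers are made-up names, not the real interface of the 
locking module):

	/*
	 * Sketch of the locking order described above: the cluster lock
	 * is taken before the local lock may be dropped while a pending
	 * exception is outstanding, and released only once all pending
	 * exceptions are reallocated.
	 */
	static void handle_write(struct dm_snapshot *s, chunk_t chunk)
	{
		down_write(&s->lock);		/* local lock, as before */

		if (!lookup_exception(&s->complete, chunk)) {
			/*
			 * A pending exception will exist, so take the
			 * cluster lock now; otherwise another node could
			 * reallocate this chunk while the local lock is
			 * temporarily dropped.
			 */
			dm_cluster_lock(s);
			create_pending_exception(s, chunk);
		}

		up_write(&s->lock);		/* local lock dropped */

		/*
		 * Later (in the copy callback), once no pending
		 * exceptions remain, the cluster lock is dropped too.
		 */
		if (!has_pending_exceptions(s))
			dm_cluster_unlock(s);
	}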

The patches are based on my & Mike's merging.

These patches are less invasive than Jon's, they are small, they don't 
change as much logic and, most importantly, they leave merging as it is.

Note that it was never tried in a cluster because I don't have a cluster!! 
So there may be a silly bug that makes it not work at all. The purpose of 
the patches is to show a different, simpler approach to clustering. You 
should test it and debug it.

TODO:
- implement lock caching (Jon's task for his dm-lock module)
- once we implement it, we can implement selective re-read --- i.e. don't 
reread the exceptions if the cluster lock was not taken by any other node
- in a few cases we could optimize it to use readlock or only local lock
- implement cluster merging

Mikulas


* Re: clustered snapshots
  2009-09-28 23:16 clustered snapshots Mikulas Patocka
@ 2009-09-29 21:39 ` Mikulas Patocka
  2009-09-30 16:17   ` Jonathan Brassow
  2009-10-02 17:26   ` Mike Snitzer
  0 siblings, 2 replies; 13+ messages in thread
From: Mikulas Patocka @ 2009-09-29 21:39 UTC (permalink / raw)
  To: Jonathan Brassow; +Cc: Mike Snitzer, dm-devel, Alasdair G Kergon

Hi

I uploaded a new version there. It has selective re-read and it has 
optimized locking --- it does no remote communication if the chunk already 
exists, so it should be as efficient as Jon's approach.
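
The idea of the optimization, as a simplified sketch (placeholder helper 
names again, not the real interface):

	/*
	 * Optimized path: a chunk that already has a completed exception
	 * is remapped under the local lock only; the cluster lock (and
	 * the selective re-read of exceptions) happen only on a miss.
	 */
	static void map_chunk(struct dm_snapshot *s, chunk_t chunk)
	{
		down_write(&s->lock);
		if (lookup_exception(&s->complete, chunk)) {
			remap_to_cow(s, chunk);	/* no remote communication */
			up_write(&s->lock);
			return;
		}

		dm_cluster_lock(s);		/* remote call, only on a miss */
		reread_exceptions_if_needed(s);	/* selective re-read */
		allocate_exception(s, chunk);
		up_write(&s->lock);
	}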

Jon, please look at it and review the patches. If you find that this 
approach can't work at all, we can drop it and go with your approach. If 
it is OK, we can consider adopting it; it looks simpler than making a 
clustered-exception-store.

Mikulas


On Mon, 28 Sep 2009, Mikulas Patocka wrote:

> Hi
> 
> I uploaded my test clustered snapshots to 
> http://people.redhat.com/mpatocka/patches/kernel/clustered-snapshots-preview/
> 
> My patches take a different approach from Jon's patches. My patches 
> basically replace down_write(&s->lock) and up_write(&s->lock) with 
> clusterized locking.
> 
> If there are pending exceptions, the cluster lock must be held while the 
> local lock is unlocked. The cluster lock is dropped when all pending 
> exceptions are reallocated and the local lock is dropped.
> 
> The patches are based on my & Mike's merging.
> 
> These patches are less invasive than Jon's, they are small, they don't 
> change as much logic and, most importantly, they leave merging as it is.
> 
> Note that it was never tried in a cluster because I don't have a cluster!! 
> So there may be a silly bug that makes it not work at all. The purpose of 
> the patches is to show a different, simpler approach to clustering. You 
> should test it and debug it.
> 
> TODO:
> - implement lock caching (Jon's task for his dm-lock module)
> - once we implement it, we can implement selective re-read --- i.e. don't 
> reread the exceptions if the cluster lock was not taken by any other node
> - in a few cases we could optimize it to use readlock or only local lock
> - implement cluster merging
> 
> Mikulas
> 


* Re: clustered snapshots
  2009-09-29 21:39 ` Mikulas Patocka
@ 2009-09-30 16:17   ` Jonathan Brassow
  2009-10-01  5:43     ` Mikulas Patocka
  2009-10-02 17:26   ` Mike Snitzer
  1 sibling, 1 reply; 13+ messages in thread
From: Jonathan Brassow @ 2009-09-30 16:17 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Mike Snitzer, dm-devel, Alasdair G Kergon

I've just looked at the last group of patches briefly.  So far, I like
what I see.  I especially like the concept of locking locally first,
and then switching to the cluster lock.  If I add cluster lock caching,
this will further speed things up.

I'd prefer to have the UUID that is passed in on the CTR line be more  
general - leaving the possibility for future features/flags, rather  
than just plopping an extra argument on the end.  I also need to look  
over the 'reread_exceptions' stuff, as this seems a little light...  I  
assume you didn't just take my patch to allow rereads because by  
adding an additional function to the API, you simplify some of the  
logic?

Also, any progress on adding the cluster locking to your shared
snapshots?

  brassow


On Sep 29, 2009, at 4:39 PM, Mikulas Patocka wrote:

> Hi
>
> I uploaded a new version there. It has selective re-read and it has
> optimized locking --- it does no remote communication if the chunk
> already exists, so it should be as efficient as Jon's approach.
>
> Jon, please look at it and review the patches. If you find that this
> approach can't work at all, we can drop it and go with your approach.
> If it is OK, we can consider adopting it; it looks simpler than making
> a clustered-exception-store.
>
> Mikulas
>
>
> On Mon, 28 Sep 2009, Mikulas Patocka wrote:
>
>> Hi
>>
>> I uploaded my test clustered snapshots to
>> http://people.redhat.com/mpatocka/patches/kernel/clustered-snapshots-preview/
>>
>> My patches take a different approach from Jon's patches. My patches
>> basically replace down_write(&s->lock) and up_write(&s->lock) with
>> clusterized locking.
>>
>> If there are pending exceptions, the cluster lock must be held while
>> the local lock is unlocked. The cluster lock is dropped when all
>> pending exceptions are reallocated and the local lock is dropped.
>>
>> The patches are based on my & Mike's merging.
>>
>> These patches are less invasive than Jon's, they are small, they
>> don't change as much logic and, most importantly, they leave merging
>> as it is.
>>
>> Note that it was never tried in a cluster because I don't have a
>> cluster!! So there may be a silly bug that makes it not work at all.
>> The purpose of the patches is to show a different, simpler approach
>> to clustering. You should test it and debug it.
>>
>> TODO:
>> - implement lock caching (Jon's task for his dm-lock module)
>> - once we implement it, we can implement selective re-read --- i.e.
>> don't reread the exceptions if the cluster lock was not taken by any
>> other node
>> - in a few cases we could optimize it to use readlock or only local
>> lock
>> - implement cluster merging
>>
>> Mikulas


* Re: clustered snapshots
  2009-09-30 16:17   ` Jonathan Brassow
@ 2009-10-01  5:43     ` Mikulas Patocka
  0 siblings, 0 replies; 13+ messages in thread
From: Mikulas Patocka @ 2009-10-01  5:43 UTC (permalink / raw)
  To: Jonathan Brassow; +Cc: Mike Snitzer, dm-devel, Alasdair G Kergon

Hi

> I've just looked at the last group of patches briefly.  So far, I like what I
> see.  I especially like the concept of locking locally first, and then
> switching to the cluster lock.  If I add cluster lock caching, this will
> further speed things up.
> 
> I'd prefer to have the UUID that is passed in on the CTR line be more general
> - leaving the possibility for future features/flags, rather than just plopping
> an extra argument on the end.

OK, so propose some command line syntax.

> I also need to look over the
> 'reread_exceptions' stuff, as this seems a little light...  I assume you
> didn't just take my patch to allow rereads because by adding an additional
> function to the API, you simplify some of the logic?

I actually tested reread (in a no-cluster situation). The trick is that we 
must pass the requests to a thread, because if we invoke a request from a 
request routine, it gets queued and never finishes until the request 
routine itself finishes. The code to handle a dropped snapshot is already 
there, so I just reused it.

For the actual reread I reuse most of the existing code.
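
The deferral looks roughly like this (a sketch with placeholder names; 
the real code reuses the existing dropped-snapshot queueing):

	/*
	 * The reread cannot be issued from the map function itself,
	 * because I/O submitted there would sit in the queue until the
	 * map function returns; so the bio is queued and a kernel
	 * thread finishes it later.
	 */
	static int snapshot_map_sketch(struct dm_snapshot *s, struct bio *bio)
	{
		if (reread_needed(s)) {			/* placeholder check */
			queue_bio_for_worker(s, bio);	/* thread redoes the map */
			return DM_MAPIO_SUBMITTED;	/* bio completed later */
		}

		return do_map(s, bio);			/* normal path */
	}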

> Also, any progress on adding the cluster locking to your shared 
> snapshots?
> 
> brassow

I haven't started on it, but I'll do it the same way; it doesn't seem 
hard. I am more scared of the userspace support for clustering (do you 
have any?)

Mikulas


* Re: clustered snapshots
  2009-09-29 21:39 ` Mikulas Patocka
  2009-09-30 16:17   ` Jonathan Brassow
@ 2009-10-02 17:26   ` Mike Snitzer
  2009-10-02 17:55     ` Mikulas Patocka
  2009-10-04  3:48     ` Mike Snitzer
  1 sibling, 2 replies; 13+ messages in thread
From: Mike Snitzer @ 2009-10-02 17:26 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, Alasdair G Kergon

On Tue, Sep 29 2009 at  5:39pm -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> Hi
> 
> I uploaded a new version there. It has selective re-read and it has 
> optimized locking --- it does no remote communication if the chunk already 
> exists, so it should be as efficient as Jon's approach.
> 
> Jon, please look at it and review the patches. If you find that this 
> approach can't work at all, we can drop it and go with your approach. If 
> it is OK, we can consider adopting it; it looks simpler than making a 
> clustered-exception-store.
> 
> Mikulas
> 
> 
> On Mon, 28 Sep 2009, Mikulas Patocka wrote:
> 
> > Hi
> > 
> > I uploaded my test clustered snapshots to 
> > http://people.redhat.com/mpatocka/patches/kernel/clustered-snapshots-preview/

I have updated the following quilt tree and folded Mikulas' clustered
snapshots patches into the end:
http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel/2.6.31/

Changes include:
. check for {prepare,commit}_merge not {prepare,commit}_exception in
  snapshot_merge_process (dm-snapshot-merge-process.patch)
. removed kernel/dm-exception-store-return-zero-when-empty.patch
  - adjusted dm-exception-store-merge-accounting.patch to preserve
    established kernel<->userspace interface
  - adjusted a few other patches
. removed kernel/dm-snapshot-dont-insert-before-existing-chunk.patch
. moved proper cleanup of memory exceptions in merge_callback() from
  dm-snapshot-merge-use-larger-io-when-merging.patch to
  dm-snapshot-merge-interlock-writes.patch
  - this merge_callback() code walks exceptions in reverse but has not
    been tested on a larger merge to know whether the 'back-merge' of
    exceptions is properly being accounted for when cleaning up the
    in-core snapshot exceptions.
    - I think the fact that the exception cleanup is 2-stage (first
      disk, then all in-memory exceptions in reverse) may make
      back-merge "just work"...
. added agk's pending patches to the front of the series
  - this works on 2.6.31
  - I haven't rebased to 2.6.32-rc1 yet

I have done some light snapshot-merge testing to verify that this
updated snapshot-merge quilt tree builds and works as expected.  I'll be
testing more extensively shortly (particularly back-merge).

I have also updated the LVM2 snapshot-merge patches to work on 2.02.54-cvs:
http://people.redhat.com/msnitzer/patches/snapshot-merge/lvm2/LVM2-2.02.54/

One thing to note is dm-snapshot now depends on dm_cluster_locking; when I
tried to insmod dm-snapshot.ko I got:
dm_snapshot: Unknown symbol dm_cluster_lock_exit
dm_snapshot: Unknown symbol dm_cluster_lock_by_str
dm_snapshot: Unknown symbol dm_cluster_lock_init

I'm also seeing:
DLM (built Sep 11 2009 23:48:05) installed
dlm: no local IP address has been set
dlm: cannot start dlm lowcomms -107
device-mapper: dm-cluster-locking: Failed to create lockspace: dm-snap

Am I just missing the relevant LVM2 bits to push down proper args to the
snapshot_ctr()?  Seems to me that we could do better about defaulting to
_not_ triggering DLM/clustered calls...

Also, I have no idea if I somehow messed up the clustered-locking
patches when I refreshed them to work with my quilt tree.  I had a quick
review of the patch changes I made and the only one that needs a closer
look was dm-snapshot-clustered-locking-optimize.patch.

Mikulas please review/rebase your tree using my updated quilt tree and
let me know if you see any issues.  Maybe you have newer patches at this
point?

Thanks,
Mike


* Re: clustered snapshots
  2009-10-02 17:26   ` Mike Snitzer
@ 2009-10-02 17:55     ` Mikulas Patocka
  2009-10-04  3:48     ` Mike Snitzer
  1 sibling, 0 replies; 13+ messages in thread
From: Mikulas Patocka @ 2009-10-02 17:55 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Alasdair G Kergon

> I have done some light snapshot-merge testing to verify that this
> updated snapshot-merge quilt tree builds and works as expected.  I'll be
> testing more extensively shortly (particularly back-merge).
> 
> I have also updated the LVM2 snapshot-merge patches to work on 2.02.54-cvs:
> http://people.redhat.com/msnitzer/patches/snapshot-merge/lvm2/LVM2-2.02.54/
> 
> One thing to note is dm-snapshot now depends on dm_cluster_locking; when I
> tried to insmod dm-snapshot.ko I got:
> dm_snapshot: Unknown symbol dm_cluster_lock_exit
> dm_snapshot: Unknown symbol dm_cluster_lock_by_str
> dm_snapshot: Unknown symbol dm_cluster_lock_init

That is expected and it shouldn't cause problems (modprobe handles 
dependencies; plain insmod doesn't). If you compile the kernel without 
dm_cluster_locking, dm-snapshot will not use it and clustering will be 
disabled.

> I'm also seeing:
> DLM (built Sep 11 2009 23:48:05) installed
> dlm: no local IP address has been set
> dlm: cannot start dlm lowcomms -107
> device-mapper: dm-cluster-locking: Failed to create lockspace: dm-snap
> 
> Am I just missing the relevant LVM2 bits to push down proper args to the
> snapshot_ctr()?  Seems to me that we could do better about defaulting to
> _not_ triggering DLM/clustered calls...

That just means that you can't use clustering because you didn't configure 
DLM; otherwise it is harmless. I will redo the patch so that it tries to 
initialize the DLM only if cluster arguments are supplied --- so that the 
message won't scare people.
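
Something like this in the constructor (a sketch; the helper and field 
names are placeholders, not the real dm-cluster-locking interface):

	/*
	 * Only touch the DLM when the table line actually asked for
	 * clustering; a purely local snapshot never initializes it.
	 */
	static int maybe_init_cluster_lock(struct dm_snapshot *s)
	{
		if (!s->cluster_uuid)	/* no cluster argument supplied */
			return 0;	/* local snapshot: no DLM at all */

		return init_cluster_lockspace(s->cluster_uuid);
	}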

> Also, I have no idea if I somehow messed up the clustered-locking
> patches when I refreshed them to work with my quilt tree.  I had a quick
> review of the patch changes I made and the only one that needs a closer
> look was dm-snapshot-clustered-locking-optimize.patch.

You didn't mess it up; I get the same message too. But it is harmless.

> Mikulas please review/rebase your tree using my updated quilt tree and
> let me know if you see any issues.  Maybe you have newer patches at this
> point?
> 
> Thanks,
> Mike

Mikulas


* Re: clustered snapshots
  2009-10-02 17:26   ` Mike Snitzer
  2009-10-02 17:55     ` Mikulas Patocka
@ 2009-10-04  3:48     ` Mike Snitzer
  2009-10-05  2:25       ` Mikulas Patocka
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Snitzer @ 2009-10-04  3:48 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, Alasdair G Kergon

On Fri, Oct 02 2009 at  1:26pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> I have updated the following quilt tree and folded Mikulas' clustered
> snapshots patches into the end:
> http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel/2.6.31/
...
> I have done some light snapshot-merge testing to verify that this
> updated snapshot-merge quilt tree builds and works as expected.  I'll be
> testing more extensively shortly (particularly back-merge).

Mikulas,

I had the hard drive die in the server I was using for snapshot-merge
testing so my "more extensive testing" was delayed until today.

In testing it was quite clear that back-merge was not working at all.  I
had to get creative with how to make back-merging work with the
older-style snapshot-merge (which doesn't have Jon's work to fold the
'complete' hashtable into the store; that really helped snapshot-merge
support back-merges).

I have updated my people page's quilt tree to include the following
patch; I've now tested snapshot-merge to _really_ work with both just
the snapshot-merge patches and also with your clustered snapshot patches
on top.

Also, I'll be folding these changes into the appropriate snapshot-merge
patches; the dm-snapshot-merge-support-insert-before-existing-chunk.patch
at the end of the snapshot-merge patches is just a means to get all the
working snapshot-merge bits in place.

All things considered I think the following is about as clean as we can
hope for but I welcome your (and others') review.  


Subject: [PATCH] Add clear_exception callback to the dm_exception_store_type's ->commit_merge

This callback is used to clear in-core exceptions during the exception
store's commit_merge.  One _must_ clear the in-core exceptions prior to
the on-disk exceptions.  But this clearing of exceptions must be done in
a fine-grained, interleaved way.  One cannot clear all in-core exceptions
and then clear all on-disk exceptions and arrive at a solution that is
actually stable and works.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-exception-store.h |    5 ++
 drivers/md/dm-snap-persistent.c |   30 ++++++++++-------
 drivers/md/dm-snap.c            |   69 +++++++++++++++++++++++++---------------
 3 files changed, 66 insertions(+), 38 deletions(-)

Index: linux-2.6/drivers/md/dm-exception-store.h
===================================================================
--- linux-2.6.orig/drivers/md/dm-exception-store.h
+++ linux-2.6/drivers/md/dm-exception-store.h
@@ -91,8 +91,11 @@ struct dm_exception_store_type {
 	/*
 	 * Clear the last n exceptions.
 	 * n must be <= the value returned by prepare_merge.
+	 * callback is used to clear in-core exceptions.
 	 */
-	int (*commit_merge) (struct dm_exception_store *store, int n);
+	int (*commit_merge) (struct dm_exception_store *store, int n,
+			     int (*callback) (void *, chunk_t old, chunk_t new),
+			     void *callback_context);
 
 	/*
 	 * The snapshot is invalid, note this in the metadata.
Index: linux-2.6/drivers/md/dm-snap-persistent.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-snap-persistent.c
+++ linux-2.6/drivers/md/dm-snap-persistent.c
@@ -410,15 +410,6 @@ static void write_exception(struct pstor
 	e->new_chunk = cpu_to_le64(de->new_chunk);
 }
 
-static void clear_exception(struct pstore *ps, uint32_t index)
-{
-	struct disk_exception *e = get_exception(ps, index);
-
-	/* clear it */
-	e->old_chunk = 0;
-	e->new_chunk = 0;
-}
-
 /*
  * Registers the exceptions that are present in the current area.
  * 'full' is filled in to indicate if the area has been
@@ -726,15 +717,30 @@ static int persistent_prepare_merge(stru
 	return i;
 }
 
-static int persistent_commit_merge(struct dm_exception_store *store, int n)
+static int persistent_commit_merge(struct dm_exception_store *store, int n,
+				   int (*callback) (void *,
+						    chunk_t old, chunk_t new),
+				   void *callback_context)
 {
 	int r, i;
 	struct pstore *ps = get_info(store);
 
 	BUG_ON(n > ps->current_committed);
+	BUG_ON(!callback);
 
-	for (i = 0; i < n; i++)
-		clear_exception(ps, ps->current_committed - 1 - i);
+	for (i = 0; i < n; i++) {
+		struct disk_exception *de =
+			get_exception(ps, ps->current_committed - 1 - i);
+
+		/* clear in-core exception */
+		r = callback(callback_context, de->old_chunk, de->new_chunk);
+		if (r < 0)
+			return r;
+
+		/* clear disk exception */
+		de->old_chunk = 0;
+		de->new_chunk = 0;
+	}
 
 	r = area_io(ps, WRITE);
 	if (r < 0)
Index: linux-2.6/drivers/md/dm-snap.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-snap.c
+++ linux-2.6/drivers/md/dm-snap.c
@@ -764,11 +764,50 @@ static inline void release_write_interlo
 		error_bios(b);
 }
 
+static int clear_merged_exception(struct dm_snap_exception *e,
+				  chunk_t old_chunk, chunk_t new_chunk)
+{
+	if (dm_consecutive_chunk_count(e)) {
+		if ((old_chunk == e->old_chunk) &&
+		    (new_chunk == dm_chunk_number(e->new_chunk))) {
+			e->old_chunk++;
+			e->new_chunk++;
+		} else if (old_chunk != e->old_chunk +
+			   dm_consecutive_chunk_count(e) &&
+			   new_chunk != dm_chunk_number(e->new_chunk) +
+			   dm_consecutive_chunk_count(e)) {
+			DMERR("merge from the middle of a chunk range");
+			return -1;
+		}
+		dm_consecutive_chunk_count_dec(e);
+	} else {
+		remove_exception(e);
+		free_exception(e);
+	}
+
+	return 0;
+}
+
+static int merge_clear_exception_callback(void *context,
+					  chunk_t old_chunk, chunk_t new_chunk)
+{
+	struct dm_snap_exception *e;
+	struct exception_table *complete_et = context;
+
+	e = lookup_exception(complete_et, old_chunk);
+	if (!e) {
+		DMERR("exception for block %llu is on disk but not in memory",
+		      (unsigned long long)old_chunk);
+		return -1;
+	}
+
+	return clear_merged_exception(e, old_chunk, new_chunk);
+}
+
 static void merge_callback(int read_err, unsigned long write_err, void *context)
 {
-	int r, i;
+	int r;
 	struct dm_snapshot *s = context;
-	struct dm_snap_exception *e;
 
 	if (read_err || write_err) {
 		if (read_err)
@@ -778,35 +817,15 @@ static void merge_callback(int read_err,
 		goto shut;
 	}
 
-	r = s->store->type->commit_merge(s->store, s->merge_write_interlock_n);
+	r = s->store->type->commit_merge(s->store, s->merge_write_interlock_n,
+					 merge_clear_exception_callback,
+					 &s->complete);
 	if (r < 0) {
 		DMERR("Write error in exception store, shutting down merge");
 		goto shut;
 	}
 
 	down_write(&s->lock);
-	/*
-	 * Must process chunks (and associated exceptions) in reverse
-	 * so that dm_consecutive_chunk_count_dec() accounting works
-	 */
-	for (i = s->merge_write_interlock_n - 1; i >= 0; i--) {
-		e = lookup_exception(&s->complete,
-				     s->merge_write_interlock + i);
-		if (!e) {
-			DMERR("exception for block %llu is on "
-			      "disk but not in memory",
-			      (unsigned long long)
-			      s->merge_write_interlock + i);
-			up_write(&s->lock);
-			goto shut;
-		}
-		if (dm_consecutive_chunk_count(e)) {
-			dm_consecutive_chunk_count_dec(e);
-		} else {
-			remove_exception(e);
-			free_exception(e);
-		}
-	}
 	release_write_interlock(s, 0);
 
 	snapshot_merge_process(s);


* Re: clustered snapshots
  2009-10-04  3:48     ` Mike Snitzer
@ 2009-10-05  2:25       ` Mikulas Patocka
  2009-10-05  4:33         ` Mike Snitzer
  0 siblings, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2009-10-05  2:25 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Alasdair G Kergon

Hi

I don't understand the purpose of this patch. I think it's useless.

- during merge, nothing except the merge process reads the on-disk 
exceptions. So it doesn't matter when you clean them. You must clean them 
before dropping the interlock, but the order is not important.

- during merge, the interlock is held for the range to be merged. So there 
shouldn't be any concurrent writes. Concurrent reads are dispatched to the 
snapshot area (as long as the exception remains in the table).

- bug in this patch: merge_clear_exception_callback modifies the hash 
table outside a lock

I think this patch just hides some other bug that you have made. Do you set and 
test the interlock correctly? If yes, it will block any I/Os to the chunks 
being merged.

Mikulas

On Sat, 3 Oct 2009, Mike Snitzer wrote:

> Mikulas,
> 
> I had the hard drive die in the server I was using for snapshot-merge
> testing so my "more extensive testing" was delayed until today.
> 
> In testing it was quite clear that back-merge was not working at all.  I
> had to get creative with how to make back-merging work with the
> older-style snapshot-merge (which doesn't have Jon's work to fold the
> 'complete' hashtable into the store; that really helped snapshot-merge
> support back-merges).
> 
> I have updated my people page's quilt tree to include the following
> patch; I've now tested snapshot-merge to _really_ work with both just
> the snapshot-merge patches and also with your clustered snapshot patches
> on top.
> 
> Also, I'll be folding these changes into the appropriate snapshot-merge
> patches; the dm-snapshot-merge-support-insert-before-existing-chunk.patch
> at the end of the snapshot-merge patches is just a means to get all the
> working snapshot-merge bits in place.
> 
> All things considered I think the following is about as clean as we can
> hope for but I welcome your (and others') review.  
> 
> 
> Subject: [PATCH] Add clear_exception callback to the dm_exception_store_type's ->commit_merge
> 
> This callback is used to clear in-core exceptions during the exception
> store's commit_merge.  One _must_ clear the in-core exceptions prior to
> the on-disk exceptions.  But this clearing of exceptions must be done in
> a fine-grained, interleaved way.  One cannot clear all in-core exceptions
> and then clear all on-disk exceptions and arrive at a solution that is
> actually stable and works.
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> <snip - patch quoted in full above>


* Re: clustered snapshots
  2009-10-05  2:25       ` Mikulas Patocka
@ 2009-10-05  4:33         ` Mike Snitzer
  2009-10-05 13:38           ` Mike Snitzer
  2009-10-05 15:57           ` Mikulas Patocka
  0 siblings, 2 replies; 13+ messages in thread
From: Mike Snitzer @ 2009-10-05  4:33 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, Alasdair G Kergon

On Sun, Oct 04 2009 at 10:25pm -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> Hi
> 
> I don't understand the purpose of this patch. I think it's useless.

You don't understand it, so you think it's useless... nice to know where
I'm starting.

> - during merge, nothing except the merge process reads the on-disk 
> exceptions. So it doesn't matter when you clean them. You must clean them 
> before dropping the interlock, but the order is not important.

Sure, I thought the same but quickly realized that in practice order
does matter if you'd like to avoid failing the in-core exception cleanup
(IFF we allow chunks to be inserted before existing chunks; aka
"back-merges").

Your original approach to cleaning all on-disk exceptions then all
in-core has no hope of working for "back-merges".  Your original
snapshot-merge completely eliminated insertions before existing chunks.

I also tried to clean all in-core exceptions and then clean all on-disk
exceptions.  I chose this order because when cleaning in-core exceptions
you need to _know_ the {old,new}_chunk for the on-disk
exception.  Otherwise you can't properly determine whether the in-core
exception that has consecutive chunks needs e->{old,new}_chunk++ before
dm_consecutive_chunk_count_dec(e).  (For example, for an exception
covering old chunks 10-13, merging chunk 10 needs the increment first,
while merging chunk 13 needs only the decrement.)

Even though it was awkward I tried it... code was ugly as hell and
simply didn't work.  Judging the in-core exception's cleanup based on
some exception store's location of the associated on-disk exception
required taking the approach I did with adding a callback to
commit_merge().  The exception stores should not know anything of the
in-core exceptions in the common snapshot code.

Lesson I learned is that the in-core dm_snap_exception's consecutive
chunk accounting is _really_ subtle.  And while I agree there should be
no relation between the proper cleanup of on-disk and in-core
exceptions; doing the cleanup of each in lock-step appears to be the
only way forward (that I found).

> - during merge, the interlock is held for the range to be merged. So there 
> shouldn't be any concurrent writes. Concurrent reads are dispatched to the 
> snapshot area (as long as the exception remains in the table).
> 
> - bug in this patch: merge_clear_exception_callback modifies the hash 
> table outside a lock

Yes I noticed that after having sent the patch.  Pretty trivial to fix
and in practice I fail to see how not taking the lock really matters
considering there was no additional concurrent IO in my merge tests.

BTW, taking the lock before the ->commit_merge only works, without
lockdep issues, if you use nested locking, which you introduced with the
first patch in your clustered locking series.

> I think this patch just hides some other bug that you have made. Do you set and 
> test the interlock correctly? If yes, it will block any I/Os to the chunks 
> being merged.

What bug have I made?  Pretty bold assertion only to back it up with
hand-waving and bravado.  I've worked hard to preserve your original work
as closely as possible; yet add the ability to support insertion of
chunks before existing chunks.

Fact of the matter is that snapshot-merge on top of Jon's patches (which
reworked the exception and exception table code, and in the end moved the
'complete' hash table internal to the exception store) gave us a
_working_ model for how to clean up in-core exceptions that had chunks
that were inserted before others.  I think that how we arrived at that
working model was mostly luck:

1) Jon's patches first enabled dm-snap-persistent's clear_exception() to
clean the in-core exception and associated on-disk exception
symmetrically.

2) Jon, you and I discussed how supporting inserts before existing
chunks _should_ work so long as they are accounted for at the time we 
dm_consecutive_chunk_count_dec(e); Jon and I got that working built on
his patches.  It was really straightforward because of #1.

Fast forward to now, where you don't like Jon's approach to cluster
locking/snapshots, and I'm trying to make the "old-style" snapshot-merge
support back-merges.  It took way too much effort to arrive at the patch
you think is "useless".  That patch is actually quite clean, and it also
now fails gracefully rather than using BUG.

But in general I do think that how I got to this patch is anything but
transparent and obvious.  It was after careful study and testing of
Jon's working model that I was able to reproduce the ability to reliably
cleanup the in-core exceptions (when insertion of chunks before existing
is allowed).

Again, your original snapshot-merge completely eliminated insertions
before existing chunks.  I made it work on top of your original
snapshot-merge, but it took careful porting of concepts that we got for
free from Jon's patches (namely symmetric cleanup of in-core and on-disk
exceptions).  I still need to go over it to better understand the "_why_
does it work?".

I encourage you to do the same.  Or come up with an alternative approach
to supporting cleanup of "back-merged" chunks; one that works in practice
and not in theory.

Mike


> On Sat, 3 Oct 2009, Mike Snitzer wrote:
> 
> > Mikulas,
> > 
> > I had the hard drive die in the server I was using for snapshot-merge
> > testing so my "more extensive testing" was delayed until today.
> > 
> > In testing it was quite clear that back-merge was not working at all.  I
> > had to get creative with how to make back-merging work with the
> > older-style snapshot-merge (which doesn't have Jon's work to fold the
> > 'complete' hashtable into the store; that really helped snapshot-merge
> > support back-merges).
> > 
> > I have updated my people page's quilt tree to include the following
> > patch; I've now tested snapshot-merge to _really_ work with both just
> > the snapshot-merge patches and also with your clustered snapshot patches
> > on top.
> > 
> > Also, I'll be folding these changes into the appropriate snapshot-merge
> > patches; the dm-snapshot-merge-support-insert-before-existing-chunk.patch
> > at the end of the snapshot-merge patches is just a means to get all the
> > working snapshot-merge bits in place.
> > 
> > All things considered I think the following is about as clean as we can
> > hope for but I welcome your (and others') review.  
> > 
> > 
> > Subject: [PATCH] Add clear_exception callback to the dm_exception_store_type's ->commit_merge
> > 
> > This callback is used to clear in-core exceptions during the exception
> > store's commit_merge.  One _must_ clear the in-core exceptions prior to
> > the on-disk exceptions.  But this clearing of exceptions must be done in
> > a fine-grained, interleaved way.  One cannot clear all in-core exceptions
> > and then clear all on-disk exceptions and arrive at a solution that is
> > actually stable and works.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > <snip - patch quoted in full above>

* Re: clustered snapshots
  2009-10-05  4:33         ` Mike Snitzer
@ 2009-10-05 13:38           ` Mike Snitzer
  2009-10-05 15:57           ` Mikulas Patocka
  1 sibling, 0 replies; 13+ messages in thread
From: Mike Snitzer @ 2009-10-05 13:38 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, Alasdair G Kergon

On Mon, Oct 05 2009 at 12:33am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Sun, Oct 04 2009 at 10:25pm -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > Hi
> > 
> > I don't understand the purpose of this patch. I think it's useless.
> 
> You don't understand it, so you think it's useless... nice to know where
> I'm starting.
> 
> > - during merge, nothing except the merge process reads the on-disk 
> > exceptions. So it doesn't matter when you clean them. You must clean them 
> > before dropping the interlock, but the order is not important.
> 
> Sure, I thought the same but quickly realized that in practice order
> does matter if you'd like to avoid failing the in-core exception cleanup
> (IFF we allow chunks to be inserted before existing chunks; aka
> "back-merges").
> 
> Your original approach to cleaning all on-disk exceptions then all
> in-core has no hope of working for "back-merges".  Your original
> snapshot-merge completely eliminated insertions before existing chunks.

<snip - my long-winded defensiveness>

> Lesson I learned is that the in-core dm_snap_exception's consecutive
> chunk accounting is _really_ subtle.  And while I agree there should be
> no relation between the proper cleanup of on-disk and in-core
> exceptions; doing the cleanup of each in lock-step appears to be the
> only way forward (that I found).

If I just judge the location of the chunk within the in-core exception,
support for back-merges works just as well.  Here is a minimalist patch
that accomplishes the same:

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index bcfbe85..b271f56 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -790,17 +790,25 @@ static void merge_callback(int read_err, unsigned long write_err, void *context)
         * so that dm_consecutive_chunk_count_dec() accounting works
         */
        for (i = s->merge_write_interlock_n - 1; i >= 0; i--) {
-               e = lookup_exception(&s->complete,
-                                    s->merge_write_interlock + i);
+               chunk_t old_chunk = s->merge_write_interlock + i;
+               e = lookup_exception(&s->complete, old_chunk);
                if (!e) {
                        DMERR("exception for block %llu is on "
                              "disk but not in memory",
-                             (unsigned long long)
-                             s->merge_write_interlock + i);
+                             (unsigned long long)old_chunk);
                        up_write(&s->lock);
                        goto shut;
                }
                if (dm_consecutive_chunk_count(e)) {
+                       if (old_chunk == e->old_chunk) {
+                               e->old_chunk++;
+                               e->new_chunk++;
+                       } else if (old_chunk != e->old_chunk +
+                                  dm_consecutive_chunk_count(e)) {
+                               DMERR("merge from the middle of a chunk range");
+                               up_write(&s->lock);
+                               goto shut;
+                       }
                        dm_consecutive_chunk_count_dec(e);
                } else {
                        remove_exception(e);


* Re: clustered snapshots
  2009-10-05  4:33         ` Mike Snitzer
  2009-10-05 13:38           ` Mike Snitzer
@ 2009-10-05 15:57           ` Mikulas Patocka
  2009-10-05 16:24             ` Mike Snitzer
  1 sibling, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2009-10-05 15:57 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Alasdair G Kergon



On Mon, 5 Oct 2009, Mike Snitzer wrote:

> On Sun, Oct 04 2009 at 10:25pm -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > Hi
> > 
> > I don't understand the purpose of this patch. I think it's useless.
> 
> You don't understand it, so you think it's useless... nice to know where
> I'm starting.
> 
> > - during merge, nothing except the merge process reads the on-disk 
> > exceptions. So it doesn't matter when you clean them. You must clean them 
> > before dropping the interlock, but the order is not important.
> 
> Sure, I thought the same but quickly realized that in practice order
> does matter if you'd like to avoid failing the in-core exception cleanup
> (IFF we allow chunks to be inserted before existing chunks; aka
> "back-merges").
> 
> Your original approach to cleaning all on-disk exceptions then all
> in-core has no hope of working for "back-merges".  Your original
> snapshot-merge completely eliminated insertions before existing chunks.
> 
> I also tried to clean all in-core exceptions and then clean all on-disk
> exceptions.  I chose this order because when cleaning in-core exceptions
> you need to _know_ the {old,new}_chunk for the on-disk
> exception.  Otherwise you can't properly determine whether the in-core
> exception that has consecutive chunks needs e->{old,new}_chunk++ before
> dm_consecutive_chunk_count_dec(e).
> 
> Even though it was awkward I tried it... code was ugly as hell and
> simply didn't work.  Judging the in-core exception's cleanup based on
> some exception store's location of the associated on-disk exception
> required taking the approach I did with adding a callback to
> commit_merge().  The exception stores should not know anything of the
> in-core exceptions in the common snapshot code.

Well, you can't "try" to fix bugs without understanding them. This effort 
may hide the bug.

Unrelated changes may hide the bug --- not fix it but make it appear under 
different conditions. So, if there's a bug, I take a snapshot of my code 
(so that I can always revert to the "buggy" code) and then try to find it. 
If the code works after some change but I don't know why, it cannot be 
considered a fix for the bug.

> Lesson I learned is that the in-core dm_snap_exception's consecutive
> chunk accounting is _really_ subtle.  And while I agree there should be
> no relation between the proper cleanup of on-disk and in-core
> exceptions; doing the cleanup of each in lock-step appears to be the
> only way forward (that I found).

Looking closer at it, all that the patch changes is that before the patch, 
exceptions were cleaned last-to-first and after the patch they are cleaned 
first-to-last. This is likely the reason the patch works around a real 
bug somewhere else --- not the callback.

> > > I think this patch just hides some other bug that you have made. Do you set and 
> > test the interlock correctly? If yes, it will block any I/Os to the chunks 
> > being merged.
> 
> What bug have I made?  Pretty bold assertion only to back it up with
> hand-waving and bravado.  I've worked hard to preserve your original work
> as closely as possible; yet add the ability to support insertion of
> chunks before existing chunks.

If it worked before without this patch and stopped working after you've 
made some changes, then you have likely introduced a bug. So find it.

Mikulas


* Re: clustered snapshots
  2009-10-05 15:57           ` Mikulas Patocka
@ 2009-10-05 16:24             ` Mike Snitzer
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Snitzer @ 2009-10-05 16:24 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, Alasdair G Kergon

On Mon, Oct 05 2009 at 11:57am -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Mon, 5 Oct 2009, Mike Snitzer wrote:
...
> > Lesson I learned is that the in-core dm_snap_exception's consecutive
> > chunk accounting is _really_ subtle.  And while I agree there should be
> > no relation between the proper cleanup of on-disk and in-core
> > exceptions; doing the cleanup of each in lock-step appears to be the
> > only way forward (that I found).
> 
> Looking closer at it, all that the patch changes is that before the patch, 
> exceptions were cleaned last-to-first and after the patch they are cleaned 
> first-to-last. This is likely the reason the patch works around a real 
> bug somewhere else --- not the callback.

Callback aside, the code didn't change the processing order.  You're
reading the code wrong.  persistent_commit_merge() processes the on-disk
exceptions in reverse with the following loop:

	   for (i = 0; i < n; i++)
	       clear_exception(ps, ps->current_committed - 1 - i);

The in-core exception cleanup in merge_callback() also cleans up in
reverse:

	  /*
	   * Must process chunks (and associated exceptions) in reverse
	   * so that dm_consecutive_chunk_count_dec() accounting works
	   */
	   for (i = s->merge_write_interlock_n - 1; i >= 0; i--) {
	   ...

So the fact that I added a callback from persistent_commit_merge()
did _not_ change the last-to-first exception cleanup ordering (for
either on-disk or in-core).

Regardless, I'm not using the callback; I have simpler code to handle
back-merges within the merge_callback() for-loop.  I emailed that patch
in my previous reply to this thread...

> > > I think this patch just hides some other bug that you have made. Do you set and 
> > > test the interlock correctly? If yes, it will block any I/Os to the chunks 
> > > being merged.
> > 
> > What bug have I made?  Pretty bold assertion only to back it up with
> > hand-waving and bravado.  I've worked hard to preserve your original work
> > as closely as possible; yet add the ability to support insertion of
> > chunks before existing chunks.
> 
> If it worked before without this patch and stopped working after you've 
> made some changes, then you have likely introduced a bug. So find it.

I'm not going to go round and round on this.  Back-merges never worked
with your snapshot-merge because you never took them into account (you
disabled them!).

In the interest of moving forward, please test that the following code
works on your sparc64 system:
http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel/2.6.31/

You'll also need to update LVM2 2.02.54-cvs to use these patches:
http://people.redhat.com/msnitzer/patches/snapshot-merge/lvm2/LVM2-2.02.54/

Mike


* clustered snapshots
@ 2011-04-26  9:18 Christian Motschke
  0 siblings, 0 replies; 13+ messages in thread
From: Christian Motschke @ 2011-04-26  9:18 UTC (permalink / raw)
  To: lvm-devel

Hi,

I tried the unstable lvm2 packages in Debian with support for clustered
snapshots (2.02.84; I looked at the sources, the patch is included). It
is unusable (at least for me ;-).
I have corosync and clvm with openais support running. Locking type is
set to 3 in lvm.conf. I have only one server with clvm in this test
setup; I hope this is not the problem.

What I did:
gcc:~# vgchange -cy vgtest
  Volume group "vgtest" successfully changed
gcc:~# vgdisplay -C
  VG     #PV #LV #SN Attr   VSize VFree
  vgtest   1   1   0 wz--nc 4.00g 3.00g
gcc:~# lvdisplay -C
  LV      VG     Attr   LSize Origin Snap%  Move Log Copy%  Convert
  test-lv vgtest -wi-a- 1.00g                                      
gcc:~# 

gcc:~# lvchange -aey vgtest/test-lv
gcc:~# lvcreate -s -n test-lv-snap -L 100m vgtest/test-lv
  Logical volume "test-lv-snap" created
gcc:~# lvdisplay -C
  LV           VG     Attr   LSize   Origin  Snap%  Move Log Copy%  Convert
  test-lv      vgtest owi-a-   1.00g                                       
  test-lv-snap vgtest swi-a- 100.00m test-lv 100.00                        

So, after I created the snapshot, it is immediately disabled and 100% full. What am I doing wrong?

With kind regards.
Christian Motschke






^ permalink raw reply	[flat|nested] 13+ messages in thread
