* rebased snapshot-merge patches
@ 2009-08-31 22:08 ` Mike Snitzer
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2009-08-31 22:08 UTC (permalink / raw)
  To: dm-devel, lvm-devel

I've updated both the DM and LVM2 snapshot-merge patches to the latest
upstream Linux and LVM2.

All patches (both Linux and LVM2) have been tested to pass the LVM2
testsuite.  There is more work to be done on these patches but I wanted
to make others aware of them sooner rather than later.

The basis for the DM snapshot-merge patches is Linux 2.6.31-rc8 + the
remaining DM fixes that Alasdair should be pushing to Linus this week:
http://people.redhat.com/msnitzer/patches/dm-2.6.31-rc-fixes/

The DM snapshot-merge patches are here:
http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel/2.6.31/

The LVM2 snapshot-merge patches are here:
http://people.redhat.com/msnitzer/patches/snapshot-merge/lvm2/LVM2-2.02.52-cvs/

I threw some real work at snapshot-merge by taking a snap _before_
formatting a 100G LV with ext3, then merging all the exceptions.  One
observation is that the merge process is _quite_ slow compared to how
long it took to format the LV (with the associated snapshot exception
copy-out).  I'll need to look at this further shortly... it's likely a
function of using minimal system resources during the merge via kcopyd,
whereas the ext3 format puts heavy pressure on the system's page
cache to queue work for mkfs's immediate writeback.

# lvcreate -n testlv -L 100G test
# lvcreate -n testlv_snap -s -L 20G test/testlv
# lvs
  LV          VG   Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  testlv      test owi-a- 100.00G
  testlv_snap test swi-a-  20.00G testlv   0.00
# mkfs.ext3 /dev/test/testlv
# lvs
  LV          VG   Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  testlv      test owi-a- 100.00G
  testlv_snap test swi-a-  20.00G testlv   8.52
# lvconvert -M test/testlv_snap
  Merging of volume testlv_snap started.
  testlv: Merged: 8.5%
  testlv: Merged: 8.4%
  testlv: Merged: 8.4%
  testlv: Merged: 8.3%
  ...
  testlv: Merged: 7.2%
  ...
  testlv: Merged: 0.0%
  testlv: Merged: 0.0%
  testlv: Merged: 0.0%
  Merge into logical volume testlv finished.
  Logical volume "snapshot1" successfully removed

--
lvm-devel mailing list
lvm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/lvm-devel

* Re: [dm-devel] rebased snapshot-merge patches
  2009-08-31 22:08 ` Mike Snitzer
@ 2009-09-08 14:17   ` Mikulas Patocka
  -1 siblings, 0 replies; 10+ messages in thread
From: Mikulas Patocka @ 2009-09-08 14:17 UTC (permalink / raw)
  To: device-mapper development; +Cc: Mike Snitzer, lvm-devel

Hi

> The DM snapshot-merge patches are here:
> http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel/2.6.31/
> 
> The LVM2 snapshot-merge patches are here:
> http://people.redhat.com/msnitzer/patches/snapshot-merge/lvm2/LVM2-2.02.52-cvs/
> 
> I threw some real work at snapshot-merge by taking a snap _before_
> formatting a 100G LV with ext3.  Then merged all the exceptions.  One
> observation is that the merge process is _quite_ slow in comparison to
> how long it took to format the LV (with associated snapshot exception
> copy-out).  Will need to look at this further shortly... it's likely a
> function of using minimal system resources during the merge via kcopyd;
> whereas the ext3 format puts excessive pressure on the system's page
> cache to queue work for mkfs's immediate writeback.

I thought about this, see the comment:
/* TODO: use larger I/O size once we verify that kcopyd handles it */

There was a bug where kcopyd didn't handle large I/O, but it has already
been fixed, so it is possible to extend this.

s->store->type->prepare_merge returns the number of chunks that can be
linearly copied, working backward from the returned chunk numbers.  (The
caller is allowed to copy fewer, and it passes the number of chunks
actually copied to s->store->type->commit_merge.)

I.e. if the returned chunk numbers are old_chunk == 10 and new_chunk == 20
and the returned value is 3, then chunk 20 can be copied to 10, chunk 19
to 9, and chunk 18 to 8.

There is a variable, s->merge_write_interlock_n, that is now always one
but can hold a larger number --- the number of chunks being copied.

So it can be trivially extended to copy more chunks at once.


On the other hand, if the snapshot doesn't contain consecutive chunks (it 
was created by random writes, not by one big write), larger I/Os can't be 
issued and merging will be slow by design.  It could be improved by 
spawning several concurrent kcopyd jobs, but I wouldn't do that because it 
would complicate the code too much and hurt interactive performance.  (In 
a desktop or server environment, the user typically cares more about 
interactive latency than about copy throughput.)

Mikulas

* Re: rebased snapshot-merge patches
  2009-09-08 14:17   ` Mikulas Patocka
@ 2009-09-11  6:51     ` Mike Snitzer
  -1 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2009-09-11  6:51 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: device-mapper development, lvm-devel

On Tue, Sep 08 2009 at 10:17am -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> Hi
> 
> > The DM snapshot-merge patches are here:
> > http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel/2.6.31/
> > 
> > The LVM2 snapshot-merge patches are here:
> > http://people.redhat.com/msnitzer/patches/snapshot-merge/lvm2/LVM2-2.02.52-cvs/
> > 
> > I threw some real work at snapshot-merge by taking a snap _before_
> > formatting a 100G LV with ext3.  Then merged all the exceptions.  One
> > observation is that the merge process is _quite_ slow in comparison to
> > how long it took to format the LV (with associated snapshot exception
> > copy-out).  Will need to look at this further shortly... it's likely a
> > function of using minimal system resources during the merge via kcopyd;
> > whereas the ext3 format puts excessive pressure on the system's page
> > cache to queue work for mkfs's immediate writeback.
> 
> I thought about this, see the comment:
> /* TODO: use larger I/O size once we verify that kcopyd handles it */
> 
> There was some bug that kcopyd didn't handle large I/O but it is already 
> fixed, so it is possible to extend it.
> 
> s->store->type->prepare_merge returns the number of chunks that can be 
> linearly copied starting from the returned chunk numbers backward. (but 
> the caller is allowed to copy less, and the caller puts the number of 
> copied chunks to s->store->type->commit_merge)
> 
> I.e. if returned chunk numbers are old_chunk == 10 and new_chunk == 20 and 
> returned value is 3, then chunk 20 can be copied to 10, chunk 19 to 9 and 
> 18 to 8.
> 
> There is a variable, s->merge_write_interlock_n, that is now always one, 
> but can hold larger number --- the number of chunks that is being copied.
> 
> So it can be trivially extended to copy more chunks at once.
> 
> 
> On the other hand, if the snapshot doesn't contain consecutive chunks (it 
> was created as a result of random writes, not as a result of one big 
> write), larger I/O can't be done and its merging will be slow by design. 
> It could be improved by spawning several concurrent kcopyd jobs, but I 
> wouldn't do it because it would complicate code too much and it would 
> damage interactive performance. (in a desktop or server environment, the 
> user typically cares more about interactive latency than about copy 
> throughput).

Mikulas,

I ran with your suggestion and it works quite well (results below).  It
proved a bit more involved to get working because I needed to:
1) Fix the fact that I was _always_ seeing a return from
   persistent_prepare_merge() that was equal to ps->current_committed
2) Fix areas that needed to properly use s->merge_write_interlock_n
   - the assumption of only one chunk per dm_kcopyd_copy() was somewhat
     widespread

The changes are detailed at the end of this mail.  I still have a "TODO"
that you'll see in the patch that speaks to needing to have
snapshot_merge_process() check all origin chunks via __chunk_is_tracked()
rather than just the one check of the first chunk.  Your feedback here
would be much appreciated.  I'm not completely clear on the intent of
the __chunk_is_tracked() check there and what it now needs to do.

Here are performance results from some mkfs-based testing:

# lvcreate -n testlv -L 32G test
# lvcreate -n testlv_snap -s -L 7G test/testlv

# time mkfs.ext3 /dev/test/testlv
...
real    1m7.827s
user    0m0.116s
sys     0m11.017s

# lvs
  LV          VG   Attr   LSize  Origin Snap%  Move Log Copy%  Convert
  testlv      test owi-a- 32.00G
  testlv_snap test swi-a-  7.00G testlv   9.05

before:
-------
# time lvconvert -M test/testlv_snap
  Merging of volume testlv_snap started.
  ...
  Merge into logical volume testlv finished.
  Logical volume "snapshot1" successfully removed

real    22m33.100s
user    0m0.045s
sys     0m0.711s


after:
------
# time lvconvert -M test/testlv_snap
  Merging of volume testlv_snap started.
  testlv: Merged: 6.4%
  testlv: Merged: 3.5%
  testlv: Merged: 0.9%
  testlv: Merged: 0.0%
  Merge into logical volume testlv finished.
  Logical volume "snapshot1" successfully removed

real    1m0.881s [NOTE: we could be ~15 sec faster than reported]
user    0m0.015s
sys     0m0.560s


So we're now seeing _very_ respectable snapshot-merge performance; but
please review the following patch in case I'm somehow cheating, thanks!

Mike

---
 drivers/md/dm-snap-persistent.c |    6 ++--
 drivers/md/dm-snap.c            |   71 +++++++++++++++++++++++---------------
 2 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/drivers/md/dm-snap-persistent.c b/drivers/md/dm-snap-persistent.c
index 23c3a48..27405a0 100644
--- a/drivers/md/dm-snap-persistent.c
+++ b/drivers/md/dm-snap-persistent.c
@@ -723,9 +723,9 @@ static int persistent_prepare_merge(struct dm_exception_store *store,
 
 	for (i = 1; i < ps->current_committed; i++) {
 		read_exception(ps, ps->current_committed - 1 - i, &de);
-		if (de.old_chunk == *old_chunk - i &&
-		    de.new_chunk == *new_chunk - i)
-			continue;
+		if (de.old_chunk != *old_chunk - i ||
+		    de.new_chunk != *new_chunk - i)
+			break;
 	}
 
 	return i;
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index c7252b2..98bcbb6 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -653,12 +653,13 @@ static void merge_callback(int read_err, unsigned long write_err,
 
 static void snapshot_merge_process(struct dm_snapshot *s)
 {
-	int r;
+	int r, linear_chunks;
 	chunk_t old_chunk, new_chunk;
 	struct origin *o;
 	chunk_t min_chunksize;
 	int must_wait;
 	struct dm_io_region src, dest;
+	sector_t io_size;
 
 	BUG_ON(!s->merge_running);
 	if (s->merge_shutdown)
@@ -674,34 +675,37 @@ static void snapshot_merge_process(struct dm_snapshot *s)
 		DMERR("target store does not support merging");
 		goto shut;
 	}
-	r = s->store->type->prepare_merge(s->store, &old_chunk, &new_chunk);
-	if (r <= 0) {
-		if (r < 0)
+	linear_chunks = s->store->type->prepare_merge(s->store,
+						      &old_chunk, &new_chunk);
+	if (linear_chunks <= 0) {
+		if (linear_chunks < 0)
 			DMERR("Read error in exception store, "
 			      "shutting down merge");
 		goto shut;
 	}
 
-	/* TODO: use larger I/O size once we verify that kcopyd handles it */
-
+	/*
+	 * Use one (potentially large) I/O to copy all 'linear_chunks'
+	 * from the exception store to the origin
+	 */
+	io_size = linear_chunks * s->store->chunk_size;
 	dest.bdev = s->origin->bdev;
-	dest.sector = chunk_to_sector(s->store, old_chunk);
-	dest.count = min((sector_t)s->store->chunk_size,
-			 get_dev_size(dest.bdev) - dest.sector);
+	dest.sector = chunk_to_sector(s->store, old_chunk + 1 - linear_chunks);
+	dest.count = min(io_size, get_dev_size(dest.bdev) - dest.sector);
 
 	src.bdev = s->cow->bdev;
-	src.sector = chunk_to_sector(s->store, new_chunk);
+	src.sector = chunk_to_sector(s->store, new_chunk + 1 - linear_chunks);
 	src.count = dest.count;
 
 test_again:
-	/* Reallocate other snapshots */
+	/* Reallocate other snapshots; must account for all 'linear_chunks' */
 	down_read(&_origins_lock);
 	o = __lookup_origin(s->origin->bdev);
 	must_wait = 0;
 	min_chunksize = minimum_chunk_size(o);
 	if (min_chunksize) {
 		chunk_t n;
-		for (n = 0; n < s->store->chunk_size; n += min_chunksize) {
+		for (n = 0; n < io_size; n += min_chunksize) {
 			r = __origin_write(&o->snapshots, dest.sector + n,
 					   NULL);
 			if (r == DM_MAPIO_SUBMITTED)
@@ -715,10 +719,12 @@ test_again:
 	}
 
 	down_write(&s->lock);
-	s->merge_write_interlock = old_chunk;
-	s->merge_write_interlock_n = 1;
+	s->merge_write_interlock = old_chunk + 1 - linear_chunks;
+	s->merge_write_interlock_n = linear_chunks;
 	up_write(&s->lock);
 
+	/* TODO: rather than only checking if the first chunk has a conflicting
+	   read; do all 'linear_chunks' need to be checked? */
 	while (__chunk_is_tracked(s, old_chunk))
 		msleep(1);
 
@@ -745,7 +751,7 @@ static inline void release_write_interlock(struct dm_snapshot *s, int err)
 
 static void merge_callback(int read_err, unsigned long write_err, void *context)
 {
-	int r;
+	int r, i;
 	struct dm_snapshot *s = context;
 	struct dm_snap_exception *e;
 
@@ -757,25 +763,34 @@ static void merge_callback(int read_err, unsigned long write_err, void *context)
 		goto shut;
 	}
 
-	r = s->store->type->commit_merge(s->store, 1);
+	r = s->store->type->commit_merge(s->store, s->merge_write_interlock_n);
 	if (r < 0) {
 		DMERR("Write error in exception store, shutting down merge");
 		goto shut;
 	}
 
 	down_write(&s->lock);
-	e = lookup_exception(&s->complete, s->merge_write_interlock);
-	if (!e) {
-		DMERR("exception for block %llu is on disk but not in memory",
-		      (unsigned long long)s->merge_write_interlock);
-		up_write(&s->lock);
-		goto shut;
-	}
-	if (dm_consecutive_chunk_count(e)) {
-		dm_consecutive_chunk_count_dec(e);
-	} else {
-		remove_exception(e);
-		free_exception(e);
+	/*
+	 * Must process chunks (and associated exceptions) in reverse
+	 * so that dm_consecutive_chunk_count_dec() accounting works
+	 */
+	for (i = s->merge_write_interlock_n - 1; i >= 0; i--) {
+		e = lookup_exception(&s->complete,
+				     s->merge_write_interlock + i);
+		if (!e) {
+			DMERR("exception for block %llu is on "
+			      "disk but not in memory",
+			      (unsigned long long)
+			      s->merge_write_interlock + i);
+			up_write(&s->lock);
+			goto shut;
+		}
+		if (dm_consecutive_chunk_count(e)) {
+			dm_consecutive_chunk_count_dec(e);
+		} else {
+			remove_exception(e);
+			free_exception(e);
+		}
 	}
 	release_write_interlock(s, 0);
 

* Re: rebased snapshot-merge patches
  2009-09-11  6:51     ` Mike Snitzer
@ 2009-09-12  6:56       ` Mike Snitzer
  -1 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2009-09-12  6:56 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: device-mapper development, lvm-devel

On Fri, Sep 11 2009 at  2:51am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> Mikulas,
> 
> I ran with your suggestion and it works quite well (results below).
...
> The changes are detailed at the end of this mail.  I still have a "TODO"
> that you'll see in the patch that speaks to needing to have
> snapshot_merge_process() check all origin chunks via __chunk_is_tracked()
> rather than just the one check of the first chunk.  Your feedback here
> would be much appreciated.  I'm not completely clear on the intent of
> the __chunk_is_tracked() check there and what it now needs to do.

I believe I've sorted this out.  The TODO should've read something like:
snapshot_merge_process() should delay the merging of _all_ chunks
that have in-progress writes, not just the first chunk in the region
that is to be merged.

The revised patch (relative to Jon's unified snapshot tree) is here:
http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel_unified/2.6.31/dm-snapshot-merge-use-larger-io-when-merging.patch

Pretty sure it _should_ work; but can't test until we get this unified
snapshot patchset working properly.

* Re: rebased snapshot-merge patches
  2009-09-12  6:56       ` Mike Snitzer
@ 2009-09-12  7:54         ` Mikulas Patocka
  -1 siblings, 0 replies; 10+ messages in thread
From: Mikulas Patocka @ 2009-09-12  7:54 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development, lvm-devel

On Sat, 12 Sep 2009, Mike Snitzer wrote:

> On Fri, Sep 11 2009 at  2:51am -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > Mikulas,
> > 
> > I ran with your suggestion and it works quite well (results below).
> ...
> > The changes are detailed at the end of this mail.  I still have a "TODO"
> > that you'll see in the patch that speaks to needing to have
> > snapshot_merge_process() check all origin chunks via __chunk_is_tracked()
> > rather than just the one check of the first chunk.  Your feedback here
> > would be much appreciated.  I'm not completely clear on the intent of
> > the __chunk_is_tracked() check there and what it now needs to do.
> 
> I believe I've sorted this out.  The TODO should've read something like:
> snapshot_merge_process() should delay the merging of _all_ chunks
> that have in-progress writes; not just the first chunk in the region
> that is to be merged.
> 
> The revised patch (relative to Jon's unified snapshot tree) is here:
> http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel_unified/2.6.31/dm-snapshot-merge-use-larger-io-when-merging.patch
> 
> Pretty sure it _should_ work; but can't test until we get this unified
> snapshot patchset working properly.

Yes, that's right. The purpose of the tracking while merging is to exclude 
pending writes to the region that is being merged --- so we must wait for 
writes on all chunks. The patch seems fine.

Mikulas


end of thread, other threads:[~2009-09-12  7:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-31 22:08 rebased snapshot-merge patches Mike Snitzer
2009-08-31 22:08 ` Mike Snitzer
2009-09-08 14:17 ` [dm-devel] " Mikulas Patocka
2009-09-08 14:17   ` Mikulas Patocka
2009-09-11  6:51   ` Mike Snitzer
2009-09-11  6:51     ` Mike Snitzer
2009-09-12  6:56     ` Mike Snitzer
2009-09-12  6:56       ` Mike Snitzer
2009-09-12  7:54       ` Mikulas Patocka
2009-09-12  7:54         ` Mikulas Patocka
