All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Eric Wheeler <ewheeler@ewheeler.net>
Cc: axboe@kernel.dk, dm-devel@redhat.com, ejt@redhat.com,
	LVM2 development <lvm-devel@redhat.com>
Subject: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks
Date: Thu, 4 Dec 2014 10:33:59 -0500	[thread overview]
Message-ID: <20141204153358.GA19315@redhat.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1412032318290.29609@ware.dreamhost.com>

Hi,

In the future please send DM changes to dm-devel@redhat.com

Comments inlined below, and I've provided a revised patch at the end.

On Thu, Dec 04 2014 at  2:25am -0500,
Eric Wheeler <ewheeler@ewheeler.net> wrote:

> This patch skips all-zero writes to unallocated blocks of dm-thinp volumes.
> 
> Unallocated zero-writes are 70x faster and never allocate space in this test:
>         # dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024
>         1073741824 bytes (1.1 GB) copied, 0.794343 s, 1.4 GB/s
> 
> Without the patch, zero-writes allocate space and hit the disk:
>         # dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024
>         1073741824 bytes (1.1 GB) copied, 53.8064 s, 20.0 MB/s
> 
> For the test below, notice the allocation difference for thin volumes
> test1 and test2 (after dd if=test1 of=test2), even though they have the
> same md5sum:
>   LV    VG   Attr       LSize Pool  Origin Data%
>   test1 test Vwi-a-tz-- 4.00g thinp         22.04
>   test2 test Vwi-a-tz-- 4.00g thinp         18.33
> 
> An additional 3.71% of space was saved by the patch, and so were
> the ~150MB of (possibly random) IOs that would have hit disk, not to
> mention reads that now bypass the disk since they are unallocated.
> We also save the metadata overhead of ~2400 allocations when calling
> provision_block().
> 
>         # lvcreate -T test/thinp -L 5G
>         # lvcreate -T test/thinp -V 4G -n test1
>         # lvcreate -T test/thinp -V 4G -n test2
> 
> Simple ext4+kernel tree extract test:
> 
> First prepare two dm-thinp volumes test1 and test2 of equal size.  First
> mkfs.ext4 /dev/test/test1 without the patch and then mount and extract
> 3.17.4's source tree onto the test1 filesystem, and unmount
> 
> Next, install patched dm_thin_pool.ko, then dd test1 over test2 and
> verify checksums:
>         # dd if=/dev/test/test1  of=/dev/test/test2 bs=1M
>         # md5sum /dev/test/test?
>         b210f032a6465178103317f3c40ab59f  /dev/test/test1
>         b210f032a6465178103317f3c40ab59f  /dev/test/test2
> Yes, they match!
> 
> 
> Signed-off-by: Eric Wheeler <lvm-dev@lists.ewheeler.net>
> ---
>  Resending the patch as it was malformed on the first try.

The resend was also malformed.. but don't worry about resending for this
patch.

> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index fc9c848..71dd545 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -1230,6 +1230,42 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
>  	}
>  }

A helper like this really belongs in block/bio.c:

> +/* return true if bio data contains all 0x00's */
> +bool bio_all_zeros(struct bio *bio) +{
> +	unsigned long flags;
> +	struct bio_vec bv;
> +	struct bvec_iter iter;
> +
> +	char *data;
> +	uint64_t *p;
> +	int i, count;
> + +	bool allzeros = true;
> +
> +	bio_for_each_segment(bv, bio, iter) {
> +		data = bvec_kmap_irq(&bv, &flags);
> +
> +		p = (uint64_t*)data;
> +		count = bv.bv_len / sizeof(uint64_t);

Addressing a bio's contents in terms of uint64_t has the potential to
access beyond bv.bv_len (byte addressing vs 64bit addressing).  I can
see you were just looking to be more efficient about checking the bios'
contents but I'm not convinced it would always be safe.

I'm open to something more efficient than what I implemented below, but
it is the most straight-forward code I thought of.

> +
> +		for (i = 0; i < count; i++) {
> +			if (*p)	{
> +				allzeros = false;
> +				break;
> +			}
> +			p++;
> +		}
> +
> +		bvec_kunmap_irq(data, &flags);
> +
> +		if (likely(!allzeros))
> +				break;
> +	}
> +
> +	return allzeros;
> +}
> +
>  static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block,
>  			    struct dm_bio_prison_cell *cell)
>  {
> @@ -1258,6 +1294,15 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
>  		return;
>  	}
> 
> +	/*
> +	 * Skip writes of all zeroes
> +	 */
> +	if (bio_data_dir(bio) == WRITE && unlikely( bio_all_zeros(bio) )) {
> +		cell_defer_no_holder(tc, cell);
> +		bio_endio(bio, 0);
> +		return;
> +	}
> +

No need to check for bio_data_dir(bio) == WRITE (at this point in
provision_block() we already know it is a WRITE).

Here is a revised patch that is more like I'd expect to land upstream.
Jens are you OK with us adding bio_is_zero_filled to block/bio.c?  If so
should I split it out as a separate patch for you to pick up or just
carry it as part of the patch that lands in linux-dm.git?


From: Mike Snitzer <snitzer@redhat.com>
Date: Thu, 4 Dec 2014 10:18:32 -0500
Subject: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks

Introduce bio_is_zero_filled() and use it to optimize away writing all
zeroes to unprovisioned blocks.  Subsequent reads to the associated
unprovisioned blocks will be zero filled.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Eric Wheeler <ewheeler@ewheeler.net>
Cc: Jens Axboe <axboe@kernel.dk>
---
 block/bio.c          |   25 +++++++++++++++++++++++++
 drivers/md/dm-thin.c |   10 ++++++++++
 include/linux/bio.h  |    1 +
 3 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 3e6e198..7d07593 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -515,6 +515,31 @@ void zero_fill_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(zero_fill_bio);
 
+bool bio_is_zero_filled(struct bio *bio)
+{
+	unsigned i;
+	unsigned long flags;
+	struct bio_vec bv;
+	struct bvec_iter iter;
+
+	bio_for_each_segment(bv, bio, iter) {
+		char *data = bvec_kmap_irq(&bv, &flags);
+		char *p = data;
+
+		for (i = 0; i < bv.bv_len; i++) {
+			if (*p) {
+				bvec_kunmap_irq(data, &flags);
+				return false;
+			}
+			p++;
+		}
+		bvec_kunmap_irq(data, &flags);
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(bio_is_zero_filled);
+
 /**
  * bio_put - release a reference to a bio
  * @bio:   bio to release reference to
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 8735543..13aff8c 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1501,6 +1501,16 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
 		return;
 	}
 
+	/*
+	 * Optimize away writes of all zeroes, subsequent reads to
+	 * associated unprovisioned blocks will be zero filled.
+	 */
+	if (unlikely(bio_is_zero_filled(bio))) {
+		cell_defer_no_holder(tc, cell);
+		bio_endio(bio, 0);
+		return;
+	}
+
 	r = alloc_data_block(tc, &data_block);
 	switch (r) {
 	case 0:
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7347f48..602094b 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -465,6 +465,7 @@ extern struct bio *bio_copy_user_iov(struct request_queue *,
 				     int, int, gfp_t);
 extern int bio_uncopy_user(struct bio *);
 void zero_fill_bio(struct bio *bio);
+bool bio_is_zero_filled(struct bio *bio);
 extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *);
 extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int);
 extern unsigned int bvec_nr_vecs(unsigned short idx);
-- 
1.7.4.4

--
lvm-devel mailing list
lvm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/lvm-devel

WARNING: multiple messages have this Message-ID (diff)
From: Mike Snitzer <snitzer@redhat.com>
To: lvm-devel@redhat.com
Subject: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks
Date: Thu, 4 Dec 2014 10:33:59 -0500	[thread overview]
Message-ID: <20141204153358.GA19315@redhat.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1412032318290.29609@ware.dreamhost.com>

Hi,

In the future please send DM changes to dm-devel at redhat.com

Comments inlined below, and I've provided a revised patch at the end.

On Thu, Dec 04 2014 at  2:25am -0500,
Eric Wheeler <ewheeler@ewheeler.net> wrote:

> This patch skips all-zero writes to unallocated blocks of dm-thinp volumes.
> 
> Unallocated zero-writes are 70x faster and never allocate space in this test:
>         # dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024
>         1073741824 bytes (1.1 GB) copied, 0.794343 s, 1.4 GB/s
> 
> Without the patch, zero-writes allocate space and hit the disk:
>         # dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024
>         1073741824 bytes (1.1 GB) copied, 53.8064 s, 20.0 MB/s
> 
> For the test below, notice the allocation difference for thin volumes
> test1 and test2 (after dd if=test1 of=test2), even though they have the
> same md5sum:
>   LV    VG   Attr       LSize Pool  Origin Data%
>   test1 test Vwi-a-tz-- 4.00g thinp         22.04
>   test2 test Vwi-a-tz-- 4.00g thinp         18.33
> 
> An additional 3.71% of space was saved by the patch, and so were
> the ~150MB of (possibly random) IOs that would have hit disk, not to
> mention reads that now bypass the disk since they are unallocated.
> We also save the metadata overhead of ~2400 allocations when calling
> provision_block().
> 
>         # lvcreate -T test/thinp -L 5G
>         # lvcreate -T test/thinp -V 4G -n test1
>         # lvcreate -T test/thinp -V 4G -n test2
> 
> Simple ext4+kernel tree extract test:
> 
> First prepare two dm-thinp volumes test1 and test2 of equal size.  First
> mkfs.ext4 /dev/test/test1 without the patch and then mount and extract
> 3.17.4's source tree onto the test1 filesystem, and unmount
> 
> Next, install patched dm_thin_pool.ko, then dd test1 over test2 and
> verify checksums:
>         # dd if=/dev/test/test1  of=/dev/test/test2 bs=1M
>         # md5sum /dev/test/test?
>         b210f032a6465178103317f3c40ab59f  /dev/test/test1
>         b210f032a6465178103317f3c40ab59f  /dev/test/test2
> Yes, they match!
> 
> 
> Signed-off-by: Eric Wheeler <lvm-dev@lists.ewheeler.net>
> ---
>  Resending the patch as it was malformed on the first try.

The resend was also malformed.. but don't worry about resending for this
patch.

> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index fc9c848..71dd545 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -1230,6 +1230,42 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
>  	}
>  }

A helper like this really belongs in block/bio.c:

> +/* return true if bio data contains all 0x00's */
> +bool bio_all_zeros(struct bio *bio) +{
> +	unsigned long flags;
> +	struct bio_vec bv;
> +	struct bvec_iter iter;
> +
> +	char *data;
> +	uint64_t *p;
> +	int i, count;
> + +	bool allzeros = true;
> +
> +	bio_for_each_segment(bv, bio, iter) {
> +		data = bvec_kmap_irq(&bv, &flags);
> +
> +		p = (uint64_t*)data;
> +		count = bv.bv_len / sizeof(uint64_t);

Addressing a bio's contents in terms of uint64_t has the potential to
access beyond bv.bv_len (byte addressing vs 64bit addressing).  I can
see you were just looking to be more efficient about checking the bios'
contents but I'm not convinced it would always be safe.

I'm open to something more efficient than what I implemented below, but
it is the most straight-forward code I thought of.

> +
> +		for (i = 0; i < count; i++) {
> +			if (*p)	{
> +				allzeros = false;
> +				break;
> +			}
> +			p++;
> +		}
> +
> +		bvec_kunmap_irq(data, &flags);
> +
> +		if (likely(!allzeros))
> +				break;
> +	}
> +
> +	return allzeros;
> +}
> +
>  static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block,
>  			    struct dm_bio_prison_cell *cell)
>  {
> @@ -1258,6 +1294,15 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
>  		return;
>  	}
> 
> +	/*
> +	 * Skip writes of all zeroes
> +	 */
> +	if (bio_data_dir(bio) == WRITE && unlikely( bio_all_zeros(bio) )) {
> +		cell_defer_no_holder(tc, cell);
> +		bio_endio(bio, 0);
> +		return;
> +	}
> +

No need to check for bio_data_dir(bio) == WRITE (at this point in
provision_block() we already know it is a WRITE).

Here is a revised patch that is more like I'd expect to land upstream.
Jens are you OK with us adding bio_is_zero_filled to block/bio.c?  If so
should I split it out as a separate patch for you to pick up or just
carry it as part of the patch that lands in linux-dm.git?


From: Mike Snitzer <snitzer@redhat.com>
Date: Thu, 4 Dec 2014 10:18:32 -0500
Subject: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks

Introduce bio_is_zero_filled() and use it to optimize away writing all
zeroes to unprovisioned blocks.  Subsequent reads to the associated
unprovisioned blocks will be zero filled.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Eric Wheeler <ewheeler@ewheeler.net>
Cc: Jens Axboe <axboe@kernel.dk>
---
 block/bio.c          |   25 +++++++++++++++++++++++++
 drivers/md/dm-thin.c |   10 ++++++++++
 include/linux/bio.h  |    1 +
 3 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 3e6e198..7d07593 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -515,6 +515,31 @@ void zero_fill_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(zero_fill_bio);
 
+bool bio_is_zero_filled(struct bio *bio)
+{
+	unsigned i;
+	unsigned long flags;
+	struct bio_vec bv;
+	struct bvec_iter iter;
+
+	bio_for_each_segment(bv, bio, iter) {
+		char *data = bvec_kmap_irq(&bv, &flags);
+		char *p = data;
+
+		for (i = 0; i < bv.bv_len; i++) {
+			if (*p) {
+				bvec_kunmap_irq(data, &flags);
+				return false;
+			}
+			p++;
+		}
+		bvec_kunmap_irq(data, &flags);
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(bio_is_zero_filled);
+
 /**
  * bio_put - release a reference to a bio
  * @bio:   bio to release reference to
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 8735543..13aff8c 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1501,6 +1501,16 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
 		return;
 	}
 
+	/*
+	 * Optimize away writes of all zeroes, subsequent reads to
+	 * associated unprovisioned blocks will be zero filled.
+	 */
+	if (unlikely(bio_is_zero_filled(bio))) {
+		cell_defer_no_holder(tc, cell);
+		bio_endio(bio, 0);
+		return;
+	}
+
 	r = alloc_data_block(tc, &data_block);
 	switch (r) {
 	case 0:
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7347f48..602094b 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -465,6 +465,7 @@ extern struct bio *bio_copy_user_iov(struct request_queue *,
 				     int, int, gfp_t);
 extern int bio_uncopy_user(struct bio *);
 void zero_fill_bio(struct bio *bio);
+bool bio_is_zero_filled(struct bio *bio);
 extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *);
 extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int);
 extern unsigned int bvec_nr_vecs(unsigned short idx);
-- 
1.7.4.4



  reply	other threads:[~2014-12-04 15:33 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-30 18:38 dm-thinp feature request: skip allocation on writes of all zeroes Eric Wheeler
2014-09-30 19:30 ` Zdenek Kabelac
2014-09-30 20:13 ` Mike Snitzer
2014-09-30 22:38   ` Eric Wheeler
2014-12-04  7:05 ` [PATCH] dm-thinp: skip allocation on writes of all zeros to unallocated blocks Eric Wheeler
2014-12-04  7:25   ` Eric Wheeler
2014-12-04 15:33     ` Mike Snitzer [this message]
2014-12-04 15:33       ` [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks Mike Snitzer
2014-12-04 15:43       ` Mike Snitzer
2014-12-04 15:43         ` Mike Snitzer
2014-12-06 22:33         ` Eric Wheeler
2014-12-06 22:33           ` Eric Wheeler
2014-12-05 14:47       ` Mike Snitzer
2014-12-05 14:47         ` Mike Snitzer
2014-12-06 22:36         ` Eric Wheeler
2014-12-06 22:36           ` Eric Wheeler
2014-12-05 17:27       ` [PATCH] " Jens Axboe
2014-12-05 17:27         ` Jens Axboe
2014-12-05 18:33         ` Mike Snitzer
2014-12-05 18:33           ` Mike Snitzer
2014-12-06 22:40           ` Eric Wheeler
2014-12-06 22:40             ` Eric Wheeler
2014-12-07  1:41             ` [lvm-devel] " Jens Axboe
2014-12-07  1:41               ` Jens Axboe
2014-12-07  6:30               ` Eric Wheeler
2014-12-07  6:30                 ` Eric Wheeler
2014-12-07  6:45                 ` Eric Wheeler
2014-12-07  6:45                   ` Eric Wheeler
2014-12-08 16:57                 ` [lvm-devel] " Jens Axboe
2014-12-08 16:57                   ` Jens Axboe
2014-12-09  8:02                   ` Eric Wheeler
2014-12-09  8:02                     ` Eric Wheeler
2014-12-09 15:31                     ` [lvm-devel] " Jens Axboe
2014-12-09 15:31                       ` Jens Axboe
2014-12-09 15:41                       ` [lvm-devel] " Jens Axboe
2014-12-09 15:41                         ` Jens Axboe
2014-12-10  2:52                         ` [PATCH] " Eric Wheeler
2014-12-10  2:52                           ` Eric Wheeler
2015-01-26  2:53                       ` Eric Wheeler
2015-01-26  2:53                         ` Eric Wheeler
2015-02-15  0:31                         ` [lvm-devel] " Eric Wheeler
2015-02-15  0:31                           ` Eric Wheeler
2014-12-09 14:38                   ` Marian Csontos
2014-12-09 14:38                     ` Marian Csontos
2014-12-07  1:36           ` Jens Axboe
2014-12-07  1:36             ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141204153358.GA19315@redhat.com \
    --to=snitzer@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=ejt@redhat.com \
    --cc=ewheeler@ewheeler.net \
    --cc=lvm-devel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.