* dm-thinp feature request: skip allocation on writes of all zeroes
@ 2014-09-30 18:38 Eric Wheeler
2014-09-30 19:30 ` Zdenek Kabelac
` (2 more replies)
0 siblings, 3 replies; 46+ messages in thread
From: Eric Wheeler @ 2014-09-30 18:38 UTC (permalink / raw)
To: lvm-devel
Hello all,
We have been testing dm-thinp for the last couple of years, and it has
been stable for the most part.
I have always wondered, does LVM ignore writes with blocks of all zeroes
when the destination address is unallocated? From experience, it does not
appear that this is so (or I cannot find the config option). This would
reduce IOs, particularly when importing VM disk images from other
sources. Currently, we must import, allocate lots of zeroes, then fstrim
after the fact.
I have paged through the dm-thinp code, but I am not quite certain what
the best place to add this logic would be such that the IO is completed
and the allocation is not performed.
I would be happy to work on a patch, but I need a little bit of direction:
What would it take to add this as a feature option to the dmsetup table
string?
Where in the dm-thinp code would this best be addressed?
--
Eric Wheeler, President eWheeler, Inc. dba Global Linux Security
888-LINUX26 (888-546-8926) Fax: 503-716-3878 PO Box 25107
www.GlobalLinuxSecurity.pro Linux since 1996! Portland, OR 97298
^ permalink raw reply [flat|nested] 46+ messages in thread
* dm-thinp feature request: skip allocation on writes of all zeroes
2014-09-30 18:38 dm-thinp feature request: skip allocation on writes of all zeroes Eric Wheeler
@ 2014-09-30 19:30 ` Zdenek Kabelac
2014-09-30 20:13 ` Mike Snitzer
2014-12-04 7:05 ` [PATCH] dm-thinp: skip allocation on writes of all zeros to unallocated blocks Eric Wheeler
2 siblings, 0 replies; 46+ messages in thread
From: Zdenek Kabelac @ 2014-09-30 19:30 UTC (permalink / raw)
To: lvm-devel
On 30.9.2014 at 20:38, Eric Wheeler wrote:
> Hello all,
>
> We have been testing dm-thinp for the last couple years and it has been stable
> for the most part.
>
> I have always wondered, does LVM ignore writes with blocks of all zeroes when
> the destination address is unallocated? From experience, it does not appear
> that this is so (or I cannot find the config option). This would reduce IOs,
> particularly when importing VM disk images from other sources. Currently, we
> must import, allocate lots of zeroes, then fstrim after the fact.
If you use a thin pool with zeroing of provisioned chunks enabled,
then the thin-volume device always reads back as 'zeroed' in all parts,
so you don't need to zero anything in such a block device.
Now if you skip writing zeroes yourself (as e.g. 'cp -a' does when copying
files), you get your skipped zero writes.
Note - if your original source device were already an LV, it would then
probably be more efficient to either use that LV as an external origin, or
to implement support for merging an LV into a thin pool.
Regards
Zdenek
^ permalink raw reply [flat|nested] 46+ messages in thread
* dm-thinp feature request: skip allocation on writes of all zeroes
2014-09-30 18:38 dm-thinp feature request: skip allocation on writes of all zeroes Eric Wheeler
2014-09-30 19:30 ` Zdenek Kabelac
@ 2014-09-30 20:13 ` Mike Snitzer
2014-09-30 22:38 ` Eric Wheeler
2014-12-04 7:05 ` [PATCH] dm-thinp: skip allocation on writes of all zeros to unallocated blocks Eric Wheeler
2 siblings, 1 reply; 46+ messages in thread
From: Mike Snitzer @ 2014-09-30 20:13 UTC (permalink / raw)
To: lvm-devel
On Tue, Sep 30 2014 at 2:38pm -0400,
Eric Wheeler <lvm-dev@lists.ewheeler.net> wrote:
> Hello all,
>
> We have been testing dm-thinp for the last couple years and it has
> been stable for the most part.
>
> I have always wondered, does LVM ignore writes with blocks of all
> zeroes when the destination address is unallocated? From experience,
> it does not appear that this is so (or I cannot find the config
> option). This would reduce IOs, particularly when importing VM disk
> images from other sources. Currently, we must import, allocate lots
> of zeroes, then fstrim after the fact.
We don't inspect the data being written to infer WRITE SAME-like action.
And in your example fstrim wouldn't do anything for you. The zeroed
data in the image is still valid, isn't it? fstrim is only useful to
reclaim blocks that are no longer used.
> I have paged through the dm-thinp code, but I am not quite certain
> what the best place to add this logic would be such that the IO is
> completed and the allocation is not performed.
>
> I would be happy to work on a patch, but I need a little bit of direction:
>
> What would it take to add this as a feature option to the dmsetup
> table string?
>
> Where in the dm-thinp code would this best be addressed?
Honestly, you'd be better served to use deduplication. A dm-dedup
target has been in development (and I'm reviewing it), see:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=dm-dedup
It is very much in development, but it gives you an idea. You could
pair a dedup target with dm-thinp in a stacked configuration to achieve
your goals.
^ permalink raw reply [flat|nested] 46+ messages in thread
* dm-thinp feature request: skip allocation on writes of all zeroes
2014-09-30 20:13 ` Mike Snitzer
@ 2014-09-30 22:38 ` Eric Wheeler
0 siblings, 0 replies; 46+ messages in thread
From: Eric Wheeler @ 2014-09-30 22:38 UTC (permalink / raw)
To: lvm-devel
>> I have always wondered, does LVM ignore writes with blocks of all
>> zeroes when the destination address is unallocated? From experience,
>> it does not appear that this is so (or I cannot find the config
>> option). This would reduce IOs, particularly when importing VM disk
>> images from other sources. Currently, we must import, allocate lots
>> of zeroes, then fstrim after the fact.
>
> And in your example fstrim wouldn't do anything for you. [...]
> fstrim is only useful to reclaim blocks that are no longer used.
For this particular example, unused data is mostly zeroes and thus fstrim
frees those unused chunks when dm-thinp receives the discard. I realize
that fstrim, strictly speaking, is not the solution; it was only meant
to highlight my example.
> The zeroed data in the image is still valid, isn't it?
Yes, and it would continue to be valid because dm-thinp returns zeroes in
read requests to unallocated chunks. A quick check for all zeroes in the
bio before writing to an unallocated chunk would save a good amount of
overhead in our application, and probably others' as well. (Of course if
the chunk is allocated, the write must proceed as normal.)
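As a plain-C illustration of the kind of quick check described here (a userspace sketch over a flat buffer; the helper name is mine, and an in-kernel version would iterate over bio segments instead):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace analogue of the proposed check:
 * return true iff every byte of the buffer is zero. */
static bool buf_all_zeros(const char *buf, size_t len)
{
	/* If buf[0] == 0 and buf[0..len-2] equals buf[1..len-1],
	 * then by induction every byte is zero. */
	return len == 0 ||
	       (buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0);
}
```

A write path could call such a helper only on the unallocated-chunk branch, completing the bio without provisioning when it returns true.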
> We don't inspect the data being written to infer WRITE SAME-like action.
I would like to implement data inspection in this way and make it
available as an option to the dmsetup table string.
Can you recommend where I should look in the dm-thinp code to check the
bio for all zeroes and branch accordingly?
(You are correct, deduplication would be an even better solution, but for
the moment I believe that WRITE SAME-like action could be implemented with
a relatively small patch if I can learn where to insert the code.)
-Eric
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH] dm-thinp: skip allocation on writes of all zeros to unallocated blocks
2014-09-30 18:38 dm-thinp feature request: skip allocation on writes of all zeroes Eric Wheeler
2014-09-30 19:30 ` Zdenek Kabelac
2014-09-30 20:13 ` Mike Snitzer
@ 2014-12-04 7:05 ` Eric Wheeler
2014-12-04 7:25 ` Eric Wheeler
2 siblings, 1 reply; 46+ messages in thread
From: Eric Wheeler @ 2014-12-04 7:05 UTC (permalink / raw)
To: lvm-devel
This patch skips all-zero writes to unallocated blocks of dm-thinp
volumes.
Unallocated zero-writes are 70x faster and never allocate space in this test:
# dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 0.794343 s, 1.4 GB/s
Without the patch, zero-writes allocate space and hit the disk:
# dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 53.8064 s, 20.0 MB/s
For the test below, notice the allocation difference for thin volumes
test1 and test2 (after dd if=test1 of=test2), even though they have the
same md5sum:
LV VG Attr LSize Pool Origin Data%
test1 test Vwi-a-tz-- 4.00g thinp 22.04
test2 test Vwi-a-tz-- 4.00g thinp 18.33
An additional 3.71% of space was saved by the patch, and so were
the ~150MB of (possibly random) IOs that would have hit disk, not to
mention reads that now bypass the disk since they are unallocated.
We also save the metadata overhead of ~2400 allocations when calling
provision_block().
# lvcreate -T test/thinp -L 5G
# lvcreate -T test/thinp -V 4G -n test1
# lvcreate -T test/thinp -V 4G -n test2
Simple ext4+kernel tree extract test:
First prepare two dm-thinp volumes test1 and test2 of equal size. Next,
mkfs.ext4 /dev/test/test1 without the patch, then mount it, extract the
3.17.4 kernel source tree onto the test1 filesystem, and unmount.
Next, install patched dm_thin_pool.ko, then dd test1 over test2 and
verify checksums:
# dd if=/dev/test/test1 of=/dev/test/test2 bs=1M
# md5sum /dev/test/test?
b210f032a6465178103317f3c40ab59f /dev/test/test1
b210f032a6465178103317f3c40ab59f /dev/test/test2
Yes, they match!
Signed-off-by: Eric Wheeler <lvm-dev@lists.ewheeler.net>
---
drivers/md/dm-thin.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 45 insertions(+), 0 deletions(-)
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index fc9c848..71dd545 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1230,6 +1230,42 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
}
}
+/* return true if bio data contains all 0x00's */
+bool bio_all_zeros(struct bio *bio)
+{
+ unsigned long flags;
+ struct bio_vec bv;
+ struct bvec_iter iter;
+
+ char *data;
+ uint64_t *p;
+ int i, count;
+
+ bool allzeros = true;
+
+ bio_for_each_segment(bv, bio, iter) {
+ data = bvec_kmap_irq(&bv, &flags);
+
+ p = (uint64_t*)data;
+ count = bv.bv_len / sizeof(uint64_t);
+
+ for (i = 0; i < count; i++) {
+ if (*p) {
+ allzeros = false;
+ break;
+ }
+ p++;
+ }
+
+ bvec_kunmap_irq(data, &flags);
+
+ if (likely(!allzeros))
+ break;
+ }
+
+ return allzeros;
+}
+
static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block,
struct dm_bio_prison_cell *cell)
{
@@ -1258,6 +1294,15 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
return;
}
+ /*
+ * Skip writes of all zeroes
+ */
+ if (bio_data_dir(bio) == WRITE && unlikely( bio_all_zeros(bio) )) {
+ cell_defer_no_holder(tc, cell);
+ bio_endio(bio, 0);
+ return;
+ }
+
r = alloc_data_block(tc, &data_block);
switch (r) {
case 0:
--
1.7.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH] dm-thinp: skip allocation on writes of all zeros to unallocated blocks
2014-12-04 7:05 ` [PATCH] dm-thinp: skip allocation on writes of all zeros to unallocated blocks Eric Wheeler
@ 2014-12-04 7:25 ` Eric Wheeler
2014-12-04 15:33 ` Mike Snitzer
0 siblings, 1 reply; 46+ messages in thread
From: Eric Wheeler @ 2014-12-04 7:25 UTC (permalink / raw)
To: lvm-devel
This patch skips all-zero writes to unallocated blocks of dm-thinp volumes.
Unallocated zero-writes are 70x faster and never allocate space in this test:
# dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 0.794343 s, 1.4 GB/s
Without the patch, zero-writes allocate space and hit the disk:
# dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 53.8064 s, 20.0 MB/s
For the test below, notice the allocation difference for thin volumes
test1 and test2 (after dd if=test1 of=test2), even though they have the
same md5sum:
LV VG Attr LSize Pool Origin Data%
test1 test Vwi-a-tz-- 4.00g thinp 22.04
test2 test Vwi-a-tz-- 4.00g thinp 18.33
An additional 3.71% of space was saved by the patch, and so were
the ~150MB of (possibly random) IOs that would have hit disk, not to
mention reads that now bypass the disk since they are unallocated.
We also save the metadata overhead of ~2400 allocations when calling
provision_block().
# lvcreate -T test/thinp -L 5G
# lvcreate -T test/thinp -V 4G -n test1
# lvcreate -T test/thinp -V 4G -n test2
Simple ext4+kernel tree extract test:
First prepare two dm-thinp volumes test1 and test2 of equal size. Next,
mkfs.ext4 /dev/test/test1 without the patch, then mount it, extract the
3.17.4 kernel source tree onto the test1 filesystem, and unmount.
Next, install patched dm_thin_pool.ko, then dd test1 over test2 and
verify checksums:
# dd if=/dev/test/test1 of=/dev/test/test2 bs=1M
# md5sum /dev/test/test?
b210f032a6465178103317f3c40ab59f /dev/test/test1
b210f032a6465178103317f3c40ab59f /dev/test/test2
Yes, they match!
Signed-off-by: Eric Wheeler <lvm-dev@lists.ewheeler.net>
---
Resending the patch as it was malformed on the first try.
Eric Wheeler, President eWheeler, Inc. dba Global Linux Security
888-LINUX26 (888-546-8926) Fax: 503-716-3878 PO Box 25107
www.GlobalLinuxSecurity.pro Linux since 1996! Portland, OR 97298
drivers/md/dm-thin.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 45 insertions(+), 0 deletions(-)
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index fc9c848..71dd545 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1230,6 +1230,42 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
}
}
+/* return true if bio data contains all 0x00's */
+bool bio_all_zeros(struct bio *bio)
+{
+ unsigned long flags;
+ struct bio_vec bv;
+ struct bvec_iter iter;
+
+ char *data;
+ uint64_t *p;
+ int i, count;
+
+ bool allzeros = true;
+
+ bio_for_each_segment(bv, bio, iter) {
+ data = bvec_kmap_irq(&bv, &flags);
+
+ p = (uint64_t*)data;
+ count = bv.bv_len / sizeof(uint64_t);
+
+ for (i = 0; i < count; i++) {
+ if (*p) {
+ allzeros = false;
+ break;
+ }
+ p++;
+ }
+
+ bvec_kunmap_irq(data, &flags);
+
+ if (likely(!allzeros))
+ break;
+ }
+
+ return allzeros;
+}
+
static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block,
struct dm_bio_prison_cell *cell)
{
@@ -1258,6 +1294,15 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
return;
}
+ /*
+ * Skip writes of all zeroes
+ */
+ if (bio_data_dir(bio) == WRITE && unlikely( bio_all_zeros(bio) )) {
+ cell_defer_no_holder(tc, cell);
+ bio_endio(bio, 0);
+ return;
+ }
+
r = alloc_data_block(tc, &data_block);
switch (r) {
case 0:
--
1.7.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-04 7:25 ` Eric Wheeler
@ 2014-12-04 15:33 ` Mike Snitzer
0 siblings, 0 replies; 46+ messages in thread
From: Mike Snitzer @ 2014-12-04 15:33 UTC (permalink / raw)
To: Eric Wheeler; +Cc: axboe, dm-devel, ejt, LVM2 development
Hi,
In the future please send DM changes to dm-devel@redhat.com
Comments inlined below, and I've provided a revised patch at the end.
On Thu, Dec 04 2014 at 2:25am -0500,
Eric Wheeler <ewheeler@ewheeler.net> wrote:
> This patch skips all-zero writes to unallocated blocks of dm-thinp volumes.
>
> Unallocated zero-writes are 70x faster and never allocate space in this test:
> # dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024
> 1073741824 bytes (1.1 GB) copied, 0.794343 s, 1.4 GB/s
>
> Without the patch, zero-writes allocate space and hit the disk:
> # dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024
> 1073741824 bytes (1.1 GB) copied, 53.8064 s, 20.0 MB/s
>
> For the test below, notice the allocation difference for thin volumes
> test1 and test2 (after dd if=test1 of=test2), even though they have the
> same md5sum:
> LV VG Attr LSize Pool Origin Data%
> test1 test Vwi-a-tz-- 4.00g thinp 22.04
> test2 test Vwi-a-tz-- 4.00g thinp 18.33
>
> An additional 3.71% of space was saved by the patch, and so were
> the ~150MB of (possibly random) IOs that would have hit disk, not to
> mention reads that now bypass the disk since they are unallocated.
> We also save the metadata overhead of ~2400 allocations when calling
> provision_block().
>
> # lvcreate -T test/thinp -L 5G
> # lvcreate -T test/thinp -V 4G -n test1
> # lvcreate -T test/thinp -V 4G -n test2
>
> Simple ext4+kernel tree extract test:
>
> First prepare two dm-thinp volumes test1 and test2 of equal size. First
> mkfs.ext4 /dev/test/test1 without the patch and then mount and extract
> 3.17.4's source tree onto the test1 filesystem, and unmount
>
> Next, install patched dm_thin_pool.ko, then dd test1 over test2 and
> verify checksums:
> # dd if=/dev/test/test1 of=/dev/test/test2 bs=1M
> # md5sum /dev/test/test?
> b210f032a6465178103317f3c40ab59f /dev/test/test1
> b210f032a6465178103317f3c40ab59f /dev/test/test2
> Yes, they match!
>
>
> Signed-off-by: Eric Wheeler <lvm-dev@lists.ewheeler.net>
> ---
> Resending the patch as it was malformed on the first try.
The resend was also malformed, but don't worry about resending for this
patch.
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index fc9c848..71dd545 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -1230,6 +1230,42 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
> }
> }
A helper like this really belongs in block/bio.c:
> +/* return true if bio data contains all 0x00's */
> +bool bio_all_zeros(struct bio *bio)
> +{
> + unsigned long flags;
> + struct bio_vec bv;
> + struct bvec_iter iter;
> +
> + char *data;
> + uint64_t *p;
> + int i, count;
> +
> + bool allzeros = true;
> +
> + bio_for_each_segment(bv, bio, iter) {
> + data = bvec_kmap_irq(&bv, &flags);
> +
> + p = (uint64_t*)data;
> + count = bv.bv_len / sizeof(uint64_t);
Addressing a bio's contents in terms of uint64_t has the potential to
access beyond bv.bv_len (byte addressing vs 64bit addressing). I can
see you were just looking to be more efficient about checking the bio's
contents, but I'm not convinced it would always be safe.
I'm open to something more efficient than what I implemented below, but
it is the most straightforward code I thought of.
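One hedged sketch of a faster variant that stays safe (shown as plain userspace C over a flat buffer, not kernel code; the function name is illustrative): scan aligned 64-bit words for the bulk of the segment, but fall back to byte comparisons for the unaligned head and the partial tail, so no bytes are ever skipped:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Zero-check using 64-bit loads for the aligned middle of the buffer,
 * with byte-wise checks for the unaligned head and leftover tail,
 * avoiding the truncation problem of a words-only loop. */
static bool zeros_hybrid(const unsigned char *p, size_t len)
{
	/* head: advance byte-wise until p is 8-byte aligned */
	while (len && ((uintptr_t)p & (sizeof(uint64_t) - 1))) {
		if (*p)
			return false;
		p++;
		len--;
	}
	/* middle: full 64-bit words */
	while (len >= sizeof(uint64_t)) {
		if (*(const uint64_t *)p)
			return false;
		p += sizeof(uint64_t);
		len -= sizeof(uint64_t);
	}
	/* tail: leftover bytes a words-only loop would have missed */
	while (len--) {
		if (*p++)
			return false;
	}
	return true;
}
```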
> +
> + for (i = 0; i < count; i++) {
> + if (*p) {
> + allzeros = false;
> + break;
> + }
> + p++;
> + }
> +
> + bvec_kunmap_irq(data, &flags);
> +
> + if (likely(!allzeros))
> + break;
> + }
> +
> + return allzeros;
> +}
> +
> static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block,
> struct dm_bio_prison_cell *cell)
> {
> @@ -1258,6 +1294,15 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
> return;
> }
>
> + /*
> + * Skip writes of all zeroes
> + */
> + if (bio_data_dir(bio) == WRITE && unlikely( bio_all_zeros(bio) )) {
> + cell_defer_no_holder(tc, cell);
> + bio_endio(bio, 0);
> + return;
> + }
> +
No need to check for bio_data_dir(bio) == WRITE (at this point in
provision_block() we already know it is a WRITE).
Here is a revised patch that is more like I'd expect to land upstream.
Jens, are you OK with us adding bio_is_zero_filled to block/bio.c? If so,
should I split it out as a separate patch for you to pick up or just
carry it as part of the patch that lands in linux-dm.git?
From: Mike Snitzer <snitzer@redhat.com>
Date: Thu, 4 Dec 2014 10:18:32 -0500
Subject: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks
Introduce bio_is_zero_filled() and use it to optimize away writing all
zeroes to unprovisioned blocks. Subsequent reads to the associated
unprovisioned blocks will be zero filled.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Eric Wheeler <ewheeler@ewheeler.net>
Cc: Jens Axboe <axboe@kernel.dk>
---
block/bio.c | 25 +++++++++++++++++++++++++
drivers/md/dm-thin.c | 10 ++++++++++
include/linux/bio.h | 1 +
3 files changed, 36 insertions(+), 0 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 3e6e198..7d07593 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -515,6 +515,31 @@ void zero_fill_bio(struct bio *bio)
}
EXPORT_SYMBOL(zero_fill_bio);
+bool bio_is_zero_filled(struct bio *bio)
+{
+ unsigned i;
+ unsigned long flags;
+ struct bio_vec bv;
+ struct bvec_iter iter;
+
+ bio_for_each_segment(bv, bio, iter) {
+ char *data = bvec_kmap_irq(&bv, &flags);
+ char *p = data;
+
+ for (i = 0; i < bv.bv_len; i++) {
+ if (*p) {
+ bvec_kunmap_irq(data, &flags);
+ return false;
+ }
+ p++;
+ }
+ bvec_kunmap_irq(data, &flags);
+ }
+
+ return true;
+}
+EXPORT_SYMBOL(bio_is_zero_filled);
+
/**
* bio_put - release a reference to a bio
* @bio: bio to release reference to
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 8735543..13aff8c 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1501,6 +1501,16 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
return;
}
+ /*
+ * Optimize away writes of all zeroes, subsequent reads to
+ * associated unprovisioned blocks will be zero filled.
+ */
+ if (unlikely(bio_is_zero_filled(bio))) {
+ cell_defer_no_holder(tc, cell);
+ bio_endio(bio, 0);
+ return;
+ }
+
r = alloc_data_block(tc, &data_block);
switch (r) {
case 0:
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7347f48..602094b 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -465,6 +465,7 @@ extern struct bio *bio_copy_user_iov(struct request_queue *,
int, int, gfp_t);
extern int bio_uncopy_user(struct bio *);
void zero_fill_bio(struct bio *bio);
+bool bio_is_zero_filled(struct bio *bio);
extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *);
extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int);
extern unsigned int bvec_nr_vecs(unsigned short idx);
--
1.7.4.4
--
lvm-devel mailing list
lvm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/lvm-devel
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-04 15:33 ` Mike Snitzer
@ 2014-12-04 15:43 ` Mike Snitzer
0 siblings, 0 replies; 46+ messages in thread
From: Mike Snitzer @ 2014-12-04 15:43 UTC (permalink / raw)
To: Eric Wheeler; +Cc: axboe, dm-devel, ejt, LVM2 development
On Thu, Dec 04 2014 at 10:33am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:
> > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > index fc9c848..71dd545 100644
> > --- a/drivers/md/dm-thin.c
> > +++ b/drivers/md/dm-thin.c
> > @@ -1230,6 +1230,42 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
> > }
> > }
>
> A helper like this really belongs in block/bio.c:
>
> > +/* return true if bio data contains all 0x00's */
> > +bool bio_all_zeros(struct bio *bio)
> > +{
> > + unsigned long flags;
> > + struct bio_vec bv;
> > + struct bvec_iter iter;
> > +
> > + char *data;
> > + uint64_t *p;
> > + int i, count;
> > +
> > + bool allzeros = true;
> > +
> > + bio_for_each_segment(bv, bio, iter) {
> > + data = bvec_kmap_irq(&bv, &flags);
> > +
> > + p = (uint64_t*)data;
> > + count = bv.bv_len / sizeof(uint64_t);
>
> Addressing a bio's contents in terms of uint64_t has the potential to
> access beyond bv.bv_len (byte addressing vs 64bit addressing). I can
> see you were just looking to be more efficient about checking the bios'
> contents but I'm not convinced it would always be safe.
Actually, given your: count = bv.bv_len / sizeof(uint64_t);
My immediate concern was that it would truncate any partial contents of
the bio_vec beyond the last uint64_t boundary, leading to a possible
false positive return from the function (because some contents weren't
checked).
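A small demonstration of the false positive described here (a hypothetical userspace reproduction, not kernel code): with a 12-byte segment, count = 12 / 8 = 1, so a words-only loop inspects only the first 8 bytes and misses a nonzero byte at offset 9:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Words-only scan, mirroring the original patch's loop:
 * the integer division silently ignores the partial tail. */
static bool zeros_words_only(const char *data, size_t len)
{
	const uint64_t *p = (const uint64_t *)data;
	size_t count = len / sizeof(uint64_t);	/* 12 / 8 == 1 */
	size_t i;

	for (i = 0; i < count; i++)
		if (p[i])
			return false;
	return true;
}

/* Byte-wise scan, as in the revised patch: checks every byte. */
static bool zeros_bytes(const char *data, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (data[i])
			return false;
	return true;
}
```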
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-04 15:33 ` Mike Snitzer
@ 2014-12-05 14:47 ` Mike Snitzer
-1 siblings, 0 replies; 46+ messages in thread
From: Mike Snitzer @ 2014-12-05 14:47 UTC (permalink / raw)
To: Eric Wheeler; +Cc: axboe, dm-devel, ejt, LVM2 development
On Thu, Dec 04 2014 at 10:33am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:
> From: Mike Snitzer <snitzer@redhat.com>
> Date: Thu, 4 Dec 2014 10:18:32 -0500
> Subject: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks
>
> Introduce bio_is_zero_filled() and use it to optimize away writing all
> zeroes to unprovisioned blocks. Subsequent reads to the associated
> unprovisioned blocks will be zero filled.
>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> Cc: Eric Wheeler <ewheeler@ewheeler.net>
> Cc: Jens Axboe <axboe@kernel.dk>
In testing this patch it is now quite clear that this change seriously
impacts test coverage in the device-mapper-test-suite because dmts'
wipe_device() uses /dev/zero as the ifile for dd to write to the disk.
As such, with this patch all tests expecting to see provisioned blocks
as a side-effect of wipe_device now fail (e.g. DiscardQuickTests).
So this change won't go upstream until full test coverage can be
restored in dmts. I'll see what I can come up with but it is low
priority.
BTW, this also makes me wonder if this change will be extremely
unintuitive to existing or future users of DM thinp. So much so that it
might be best to require a new thin-pool feature flag to enable this
optimization.
Mike
* Re: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-04 15:33 ` Mike Snitzer
@ 2014-12-05 17:27 ` Jens Axboe
-1 siblings, 0 replies; 46+ messages in thread
From: Jens Axboe @ 2014-12-05 17:27 UTC (permalink / raw)
To: Mike Snitzer, Eric Wheeler; +Cc: dm-devel, ejt, LVM2 development
On 12/04/2014 08:33 AM, Mike Snitzer wrote:
>
> Here is a revised patch that is more like I'd expect to land upstream.
> Jens are you OK with us adding bio_is_zero_filled to block/bio.c? If so
> should I split it out as a separate patch for you to pick up or just
> carry it as part of the patch that lands in linux-dm.git?
Yeah I'm fine with adding that helper. I do wonder what the performance
impact is on this for dm. Have you tried a (worst case) test of writing
blocks that are zero filled, but with the last byte not being a zero?
--
Jens Axboe
* Re: dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-05 17:27 ` Jens Axboe
@ 2014-12-05 18:33 ` Mike Snitzer
-1 siblings, 0 replies; 46+ messages in thread
From: Mike Snitzer @ 2014-12-05 18:33 UTC (permalink / raw)
To: Jens Axboe; +Cc: Eric Wheeler, dm-devel, ejt, LVM2 development
On Fri, Dec 05 2014 at 12:27pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:
> On 12/04/2014 08:33 AM, Mike Snitzer wrote:
> >
> >Here is a revised patch that is more like I'd expect to land upstream.
> >Jens are you OK with us adding bio_is_zero_filled to block/bio.c? If so
> >should I split it out as a separate patch for you to pick up or just
> >carry it as part of the patch that lands in linux-dm.git?
>
> Yeah I'm fine with adding that helper.
Good to know, thanks.
> I do wonder what the performance impact is on this for dm. Have you
> tried a (worst case) test of writing blocks that are zero filled, but
> with the last byte not being a zero?
I haven't tested that yet, but I share your concern. Does fio offer the
ability to slap down a specific pattern like this? If not is there a
logical place to extend fio to enable this?
* Re: dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-04 15:43 ` Mike Snitzer
@ 2014-12-06 22:33 ` Eric Wheeler
-1 siblings, 0 replies; 46+ messages in thread
From: Eric Wheeler @ 2014-12-06 22:33 UTC (permalink / raw)
To: LVM2 development; +Cc: axboe, dm-devel, ejt
> Actually, given your: count = bv.bv_len / sizeof(uint64_t);
>
> My immediate concern was it would've truncated any partial contents of
> the bio_vec beyond the last uint64_t boundary. Leading to possible
> false positive return from the function (because some contents weren't
> checked).
Understood. I had always thought bio data was block-size aligned.
Out of curiosity, do you have examples of when this is not so?
* Re: dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-05 14:47 ` Mike Snitzer
@ 2014-12-06 22:36 ` Eric Wheeler
-1 siblings, 0 replies; 46+ messages in thread
From: Eric Wheeler @ 2014-12-06 22:36 UTC (permalink / raw)
To: LVM2 development; +Cc: axboe, dm-devel, ejt
On Fri, 5 Dec 2014, Mike Snitzer wrote:
> On Thu, Dec 04 2014 at 10:33am -0500,
> Mike Snitzer <snitzer@redhat.com> wrote:
> In testing this patch it is now quite clear that this change seriously
> impacts test coverage in the device-mapper-test-suite because dmts'
> wipe_device() uses /dev/zero as the ifile for dd to write to the disk.
> As such, with this patch all tests expecting to see provisioned blocks
> as a side-effect of wipe_device now fail (e.g. DiscardQuickTests).
>
> So this change won't go upstream until full test coverage can be
> restored in dmts. I'll see what I can come up with but it is low
> priority.
Can dmts use /dev/urandom?
> BTW, this also makes me wonder if this change will be extremely
> unintuitive to existing or future users of DM thinp. So much so that it
> might be best to require a new thin-pool feature flag to enable this
> optimization.
Good idea; some may wish to turn it off. IMO, leaving it on by default
would likely work for most use cases.
-Eric
> Mike
* Re: dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-05 18:33 ` Mike Snitzer
@ 2014-12-06 22:40 ` Eric Wheeler
-1 siblings, 0 replies; 46+ messages in thread
From: Eric Wheeler @ 2014-12-06 22:40 UTC (permalink / raw)
To: LVM2 development; +Cc: Jens Axboe, dm-devel, ejt
On Fri, 5 Dec 2014, Mike Snitzer wrote:
>> I do wonder what the performance impact is on this for dm. Have you
>> tried a (worst case) test of writing blocks that are zero filled, but
>> with the last byte not being a zero?
The additional overhead of worst-case should be (nearly) equal to the
simplest test case of dd if=/dev/zero of=/dev/thinp/vol. In my testing
that was 1.4GB/s within KVM on an Intel Xeon(R) CPU E3-1230 V2 @ 3.30GHz.
I could see where really fast storage that can sustain 1.4GB/s might notice
a performance regression, but for most use cases that is quite fast.
-Eric
> I haven't tested that yet, but I share your concern. Does fio offer the
> ability to slap down a specific pattern like this? If not is there a
> logical place to extend fio to enable this?
* Re: dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-05 18:33 ` Mike Snitzer
@ 2014-12-07 1:36 ` Jens Axboe
-1 siblings, 0 replies; 46+ messages in thread
From: Jens Axboe @ 2014-12-07 1:36 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Eric Wheeler, dm-devel, ejt, LVM2 development
On 12/05/2014 11:33 AM, Mike Snitzer wrote:
> On Fri, Dec 05 2014 at 12:27pm -0500,
> Jens Axboe <axboe@kernel.dk> wrote:
>
>> On 12/04/2014 08:33 AM, Mike Snitzer wrote:
>>>
>>> Here is a revised patch that is more like I'd expect to land upstream.
>>> Jens are you OK with us adding bio_is_zero_filled to block/bio.c? If so
>>> should I split it out as a separate patch for you to pick up or just
>>> carry it as part of the patch that lands in linux-dm.git?
>>
>> Yeah I'm fine with adding that helper.
>
> Good to know, thanks.
>
>> I do wonder what the performance impact is on this for dm. Have you
>> tried a (worst case) test of writing blocks that are zero filled, but
>> with the last byte not being a zero?
>
> I haven't tested that yet, but I share your concern. Does fio offer the
> ability to slap down a specific pattern like this? If not is there a
> logical place to extend fio to enable this?
You can do an explicit pattern for IO buffers, but it's not really
geared towards this, it's more for repeat patterns. But maybe something
could be added, let me think about the best way to do that...
--
Jens Axboe
* Re: [lvm-devel] dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-06 22:40 ` Eric Wheeler
@ 2014-12-07 1:41 ` Jens Axboe
-1 siblings, 0 replies; 46+ messages in thread
From: Jens Axboe @ 2014-12-07 1:41 UTC (permalink / raw)
To: Eric Wheeler, LVM2 development; +Cc: dm-devel, ejt
On 12/06/2014 03:40 PM, Eric Wheeler wrote:
> On Fri, 5 Dec 2014, Mike Snitzer wrote:
>>> I do wonder what the performance impact is on this for dm. Have you
>>> tried a (worst case) test of writing blocks that are zero filled, but
>>> with the last byte not being a zero?
>
> The additional overhead of worst-case should be (nearly) equal to the
> simplest test case of dd if=/dev/zero of=/dev/thinp/vol. In my testing
> that was 1.4GB/s within KVM on an Intel Xeon(R) CPU E3-1230 V2 @ 3.30GHz.
That seems way too slow for checking if it's zero or not... Memory
bandwidth should be way higher than that. The line above, was that what
you ran? How does it look with bs=4k or higher?
> I could see where really fast storage that can obtain 1.4gb/s might
> notice a performance regression---but for most use cases that is quite
> fast.
Depends on your view point, I think it's pretty slow. Plenty of devices
out there that are faster than that. Quick example, the laptop I am
typing this email from:
read : io=12233MB, bw=1432.7MB/s, iops=22922, runt= 8539msec
Writes are around 1GB/sec, enough that I bet the slowdown in checking
for zeroes will be noticeable. In the pci-e flash space, more than
double your current /dev/zero benchmark is common for current devices.
--
Jens Axboe
* Re: dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-07 1:41 ` Jens Axboe
@ 2014-12-07 6:30 ` Eric Wheeler
-1 siblings, 0 replies; 46+ messages in thread
From: Eric Wheeler @ 2014-12-07 6:30 UTC (permalink / raw)
To: Jens Axboe; +Cc: dm-devel, ejt, LVM2 development
On Sat, 6 Dec 2014, Jens Axboe wrote:
> On 12/06/2014 03:40 PM, Eric Wheeler wrote:
>> On Fri, 5 Dec 2014, Mike Snitzer wrote:
>>>> I do wonder what the performance impact is on this for dm. Have you
>>>> tried a (worst case) test of writing blocks that are zero filled, but
>>>> with the last byte not being a zero?
>>
>> The additional overhead of worst-case should be (nearly) equal to the
>> simplest test case of dd if=/dev/zero of=/dev/thinp/vol. In my testing
>> that was 1.4GB/s within KVM on an Intel Xeon(R) CPU E3-1230 V2 @ 3.30GHz.
>
> That seems way too slow for checking if it's zero or not... Memory bandwidth
> should be way higher than that. The line above, was that what you ran? How
> does it look with bs=4k or higher?
In userspace I can get ~12GB/s, so I think the algorithm is sound.
dd might not be the right tool for this.
> read : io=12233MB, bw=1432.7MB/s, iops=22922, runt= 8539msec
Can you suggest the right fio command line to test sequential writes of all
zeros? I tried --zero_buffers but couldn't get it to write zeros; writes
kept going to disk.
Also, attached is the patch that supports uintptr_t word-sized zero checks.
It steps byte-aligned at the beginning and end in case either end is not
word aligned.
I tried a few different algorithms:
  - Mike's trivial byte-by-byte zero check
  - memcmp(ZERO_PAGE, data, bv.bv_len) == 0
  - the fastest one, below:
-Eric
---
block/bio.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++
drivers/md/dm-thin.c | 10 +++++++
include/linux/bio.h | 1 +
3 files changed, 78 insertions(+), 0 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 8c2e55e..9100d35 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -511,6 +511,73 @@ void zero_fill_bio(struct bio *bio)
}
EXPORT_SYMBOL(zero_fill_bio);
+bool bio_is_zero_filled(struct bio *bio)
+{
+ unsigned i, count;
+ unsigned long flags;
+ struct bio_vec bv;
+ struct bvec_iter iter;
+ bio_for_each_segment(bv, bio, iter) {
+ char *data = bvec_kmap_irq(&bv, &flags);
+ char *p = data;
+ uintptr_t *parch;
+ int left = bv.bv_len;
+
+ if (unlikely( data == NULL ))
+ continue;
+
+
+ /* check unaligned bytes at the beginning of p */
+ if (unlikely( ( (uintptr_t)p & (sizeof(uintptr_t)-1) ) != 0 )) {
+ count = sizeof(uintptr_t) - ( (uintptr_t)p & (sizeof(uintptr_t)-1) );
+ for (i = 0; i < count; i++) {
+ if (*p) {
+ bvec_kunmap_irq(data, &flags);
+ return false;
+ }
+ p++;
+ }
+ left -= count;
+ }
+
+ /* we should be word aligned now */
+ BUG_ON(unlikely( ((uintptr_t)p & (sizeof(uintptr_t)-1) ) != 0 ));
+
+ /* now check in word-sized chunks */
+ parch = (uintptr_t*)p;
+ count = left >> ilog2(sizeof(uintptr_t)); /* count = left / sizeof(uintptr_t) */;
+ for (i = 0; i < count; i++) {
+ if (*parch) {
+ bvec_kunmap_irq(data, &flags);
+ return false;
+ }
+ parch++;
+ }
+ left -= count << ilog2(sizeof(uintptr_t)); /* left -= count*sizeof(uintptr_t) */
+
+ /* check remaining unaligned values at the end */
+ p = (char*)parch;
+ if (unlikely(left > 0))
+ {
+ for (i = 0; i < left; i++) {
+ if (*p) {
+ bvec_kunmap_irq(data, &flags);
+ return false;
+ }
+ p++;
+ }
+ left = 0;
+ }
+
+ bvec_kunmap_irq(data, &flags);
+ BUG_ON(unlikely( left > 0 ));
+ BUG_ON(unlikely( data+bv.bv_len != p ));
+ }
+
+ return true;
+}
+EXPORT_SYMBOL(bio_is_zero_filled);
+
/**
* bio_put - release a reference to a bio
* @bio: bio to release reference to
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index fc9c848..6a0c2c0 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1258,6 +1258,16 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
return;
}
+ /*
+ * Optimize away writes of all zeroes, subsequent reads to
+ * associated unprovisioned blocks will be zero filled.
+ */
+ if (unlikely(bio_is_zero_filled(bio))) {
+ cell_defer_no_holder(tc, cell);
+ bio_endio(bio, 0);
+ return;
+ }
+
r = alloc_data_block(tc, &data_block);
switch (r) {
case 0:
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5a64576..abb46f7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -419,6 +419,7 @@ extern struct bio *bio_copy_user_iov(struct request_queue *,
int, int, gfp_t);
extern int bio_uncopy_user(struct bio *);
void zero_fill_bio(struct bio *bio);
+bool bio_is_zero_filled(struct bio *bio);
extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *);
extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int);
extern unsigned int bvec_nr_vecs(unsigned short idx);
--
1.7.1
* Re: dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-07 6:30 ` Eric Wheeler
@ 2014-12-07 6:45 ` Eric Wheeler
-1 siblings, 0 replies; 46+ messages in thread
From: Eric Wheeler @ 2014-12-07 6:45 UTC (permalink / raw)
To: LVM2 development; +Cc: Jens Axboe, dm-devel, ejt
Introduce bio_is_zero_filled() and use it to optimize away writing all
zeroes to unprovisioned blocks. Subsequent reads to the associated
unprovisioned blocks will be zero filled. bio_is_zero_filled now
works with unaligned bvec data.
Signed-off-by: Eric Wheeler <dm-devel@lists.ewheeler.net>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
---
> Also, attached is the patch that supports uintptr_t word sized 0-checks.
Re-sending. I think I've fixed my MUA's tendency to break patches.
block/bio.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++
drivers/md/dm-thin.c | 10 +++++++
include/linux/bio.h | 1 +
3 files changed, 78 insertions(+), 0 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 8c2e55e..9100d35 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -511,6 +511,73 @@ void zero_fill_bio(struct bio *bio)
}
EXPORT_SYMBOL(zero_fill_bio);
+bool bio_is_zero_filled(struct bio *bio)
+{
+ unsigned i, count;
+ unsigned long flags;
+ struct bio_vec bv;
+ struct bvec_iter iter;
+ bio_for_each_segment(bv, bio, iter) {
+ char *data = bvec_kmap_irq(&bv, &flags);
+ char *p = data;
+ uintptr_t *parch;
+ int left = bv.bv_len;
+
+ if (unlikely(data == NULL))
+ continue;
+
+
+ /* check unaligned bytes at the beginning of p */
+ if (unlikely(((uintptr_t)p & (sizeof(uintptr_t)-1)) != 0)) {
+ count = sizeof(uintptr_t) - ((uintptr_t)p & (sizeof(uintptr_t)-1));
+ for (i = 0; i < count; i++) {
+ if (*p) {
+ bvec_kunmap_irq(data, &flags);
+ return false;
+ }
+ p++;
+ }
+ left -= count;
+ }
+
+ /* we should be word aligned now */
+ BUG_ON(((uintptr_t)p & (sizeof(uintptr_t)-1)) != 0);
+
+ /* now check in word-sized chunks */
+ parch = (uintptr_t *)p;
+ count = left >> ilog2(sizeof(uintptr_t)); /* count = left / sizeof(uintptr_t) */
+ for (i = 0; i < count; i++) {
+ if (*parch) {
+ bvec_kunmap_irq(data, &flags);
+ return false;
+ }
+ parch++;
+ }
+ left -= count << ilog2(sizeof(uintptr_t)); /* left -= count*sizeof(uintptr_t) */
+
+ /* check remaining unaligned values at the end */
+ p = (char*)parch;
+ if (unlikely(left > 0))
+ {
+ for (i = 0; i < left; i++) {
+ if (*p) {
+ bvec_kunmap_irq(data, &flags);
+ return false;
+ }
+ p++;
+ }
+ left = 0;
+ }
+
+ bvec_kunmap_irq(data, &flags);
+ BUG_ON(left > 0);
+ BUG_ON(data + bv.bv_len != p);
+ }
+
+ return true;
+}
+EXPORT_SYMBOL(bio_is_zero_filled);
+
/**
* bio_put - release a reference to a bio
* @bio: bio to release reference to
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index fc9c848..6a0c2c0 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1258,6 +1258,16 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
return;
}
+ /*
+ * Optimize away writes of all zeroes, subsequent reads to
+ * associated unprovisioned blocks will be zero filled.
+ */
+ if (unlikely(bio_is_zero_filled(bio))) {
+ cell_defer_no_holder(tc, cell);
+ bio_endio(bio, 0);
+ return;
+ }
+
r = alloc_data_block(tc, &data_block);
switch (r) {
case 0:
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5a64576..abb46f7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -419,6 +419,7 @@ extern struct bio *bio_copy_user_iov(struct request_queue *,
int, int, gfp_t);
extern int bio_uncopy_user(struct bio *);
void zero_fill_bio(struct bio *bio);
+bool bio_is_zero_filled(struct bio *bio);
extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *);
extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int);
extern unsigned int bvec_nr_vecs(unsigned short idx);
--
1.7.1
--
lvm-devel mailing list
lvm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/lvm-devel
^ permalink raw reply related [flat|nested] 46+ messages in thread
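[Editor's note: the head/word/tail scan in bio_is_zero_filled() above can be modelled outside the kernel as a quick sanity check. The following is a hypothetical userspace harness (the function name and buffer handling are illustrative, not part of the patch):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace model of the scan in bio_is_zero_filled(): check unaligned
 * head bytes, then word-sized chunks, then the unaligned tail. */
static bool buf_is_zero_filled(const unsigned char *data, size_t len)
{
	const unsigned char *p = data;

	/* head: leading bytes up to the first word boundary */
	while (len && ((uintptr_t)p & (sizeof(uintptr_t) - 1))) {
		if (*p++)
			return false;
		len--;
	}

	/* body: one word at a time */
	while (len >= sizeof(uintptr_t)) {
		uintptr_t w;

		memcpy(&w, p, sizeof(w));	/* avoids strict-aliasing issues */
		if (w)
			return false;
		p += sizeof(uintptr_t);
		len -= sizeof(uintptr_t);
	}

	/* tail: trailing bytes shorter than a word */
	while (len--) {
		if (*p++)
			return false;
	}
	return true;
}
```

Feeding it buffers that start at odd offsets exercises the unaligned-head path that the patch description calls out.]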
* Re: [lvm-devel] dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-07 6:30 ` Eric Wheeler
@ 2014-12-08 16:57 ` Jens Axboe
-1 siblings, 0 replies; 46+ messages in thread
From: Jens Axboe @ 2014-12-08 16:57 UTC (permalink / raw)
To: Eric Wheeler; +Cc: dm-devel, ejt, LVM2 development
On 12/06/2014 11:30 PM, Eric Wheeler wrote:
> On Sat, 6 Dec 2014, Jens Axboe wrote:
>> On 12/06/2014 03:40 PM, Eric Wheeler wrote:
>>> On Fri, 5 Dec 2014, Mike Snitzer wrote:
>>>>> I do wonder what the performance impact is on this for dm. Have you
>>>>> tried a (worst case) test of writing blocks that are zero filled, but
>>>>> with the last byte not being a zero?
>>>
>>> The additional overhead of worst-case should be (nearly) equal to the
>>> simplest test case of dd if=/dev/zero of=/dev/thinp/vol. In my testing
>>> that was 1.4GB/s within KVM on an Intel Xeon(R) CPU E3-1230 V2 @
>>> 3.30GHz.
>>
>> That seems way too slow for checking if it's zero or not... Memory
>> bandwidth should be way higher than that. The line above, was that
>> what you ran? How does it look with bs=4k or higher?
>
> In userspace I can get ~12GB/s, so I think the algorithm is sound.
> dd might not be the right tool for this.
It's straightforward looping through the vectors; it can't really work
any other way. But we need to figure out why it's so slow...
>> read : io=12233MB, bw=1432.7MB/s, iops=22922, runt= 8539msec
>
> Can you suggest the right fio commandline to test sequential writes of
> all zeros? I tried --zero_buffers but couldn't get it to write zeros,
> writes kept going to disk.
zero_buffers=1
scramble_buffers=0
should get you all zeroes.
--
Jens Axboe
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-08 16:57 ` Jens Axboe
@ 2014-12-09 8:02 ` Eric Wheeler
-1 siblings, 0 replies; 46+ messages in thread
From: Eric Wheeler @ 2014-12-09 8:02 UTC (permalink / raw)
To: Jens Axboe; +Cc: dm-devel, ejt, LVM2 development
On Fri, 5 Dec 2014, Mike Snitzer wrote:
> I do wonder what the performance impact is on this for dm. Have you
> tried a (worst case) test of writing blocks that are zero filled,
Jens, thank you for your help w/ fio for generating zeroed writes!
Clearly fio is superior to dd as a sequential benchmarking tool; I was
actually able to push on the system's memory bandwidth.
Results:
I hacked drivers/block/loop.c and drivers/md/dm-thin.c to always call bio_is_zero_filled()
and then complete without writing to disk, regardless of the return value
from bio_is_zero_filled(). In loop.c this was done in
do_bio_filebacked(), and for dm-thin.c this was done within
provision_block().
This allows us to compare the performance difference between the simple
loopback block device driver vs the more complex dm-thinp implementation
just prior to block allocation. These benchmarks give us a sense of how
performance differences relate between bio_is_zero_filled() and block
device implementation complexity, in addition to the raw performance of
bio_is_zero_filled in best- and worst-case scenarios.
Since we always complete without writing after the call to
bio_is_zero_filled, regardless of the bio's content (all zeros or not), we
can benchmark the difference in the common use case of random data, as
well as the edge case of skipping writes for bios that contain all zeros
when writing to unallocated space of thin-provisioned volumes.
These benchmarks were performed under KVM, so expect them to be lower
bounds due to overhead. The hardware is an Intel(R) Xeon(R) CPU E3-1230 V2
@ 3.30GHz. The VM was allocated 4GB of memory with 4 cpu cores.
Benchmarks were performed using fio-2.1.14-33-gf8b8f
--name=writebw
--rw=write
--time_based
--runtime=7 --ramp_time=3
--norandommap
--ioengine=libaio
--group_reporting
--direct=1
--bs=1m
--filename=/dev/X
--numjobs=Y
Random data was tested using:
--zero_buffers=0 --scramble_buffers=1
Zeroed data was tested using:
--zero_buffers=1 --scramble_buffers=0
Values below are from aggrb.
dm-thinp (MB/s) loopback (MB/s) loop faster by factor of
==============+======================================================
random jobs=4 | 18496.0 33522.0 1.68x
zeros jobs=4 | 8119.2 9767.2 1.20x
==============+======================================================
random jobs=1 | 7330.5 12330.0 1.81x
zeros jobs=1 | 4965.2 6799.9 1.11x
We can see that fio reports a best-case performance of 33.5GB/s with
random data using 4 jobs in this test environment within loop.c.
For the real-world best-case within dm-thinp, fio reports 18.4GB/s, which
is relevant for use cases where bio vectors tend to contain non-zero
data, particularly toward the beginning of the vector set.
I expect that the performance difference between loop.c and dm-thinp is
due to implementation complexity of the block device driver, such as
checking the metadata to see if a block must be allocated before calling
provision_block().
(Note that it may be possible for these test values to exceed the memory
bandwidth of the system since we exit early if finding non-zero data in a
biovec, thus the remaining data is not actually inspected but is counted
by fio. Worst-case values should all be below the memory bandwidth
maximum since all data is inspected. I believe memtest86+ says my memory
bandwidth is ~17GB/s.)
--
Eric Wheeler, President eWheeler, Inc. dba Global Linux Security
888-LINUX26 (888-546-8926) Fax: 503-716-3878 PO Box 25107
www.GlobalLinuxSecurity.pro Linux since 1996! Portland, OR 97298
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-08 16:57 ` Jens Axboe
@ 2014-12-09 14:38 ` Marian Csontos
-1 siblings, 0 replies; 46+ messages in thread
From: Marian Csontos @ 2014-12-09 14:38 UTC (permalink / raw)
To: LVM2 development, Eric Wheeler; +Cc: dm-devel, ejt
On 12/08/2014 05:57 PM, Jens Axboe wrote:
> On 12/06/2014 11:30 PM, Eric Wheeler wrote:
>> On Sat, 6 Dec 2014, Jens Axboe wrote:
>>> On 12/06/2014 03:40 PM, Eric Wheeler wrote:
>>>> On Fri, 5 Dec 2014, Mike Snitzer wrote:
>>>>>> I do wonder what the performance impact is on this for dm. Have you
>>>>>> tried a (worst case) test of writing blocks that are zero filled, but
>>>>>> with the last byte not being a zero?
>>>>
>>>> The additional overhead of worst-case should be (nearly) equal to the
>>>> simplest test case of dd if=/dev/zero of=/dev/thinp/vol. In my testing
>>>> that was 1.4GB/s within KVM on an Intel Xeon(R) CPU E3-1230 V2 @
>>>> 3.30GHz.
>>>
>>> That seems way too slow for checking if it's zero or not... Memory
>>> bandwidth should be way higher than that. The line above, was that
>>> what you ran? How does it look with bs=4k or higher?
>>
>> In userspace I can get ~12GB/s, so I think the algorithm is sound.
>> dd might not be the right tool for this.
>
> It's straight forward looping through the vectors, it can't really work
> any other way. But we need to figure out why it's so slow...
This may be a premature optimization without any supporting data, but as
this will be a frequently running loop, it is worth optimizing.
Two tips:
1. Is the compiler unrolling loops?
Using inline bvec_kunmap_irq in the loop may prevent the compiler from
doing so.
Try breaking out of the loop when *parch is non-zero and calling it
outside the loop only when i >= count.
2. The function is doing quite a lot of jumping around making CPU
pipeline mostly useless.
Try using kernel's built-in memcmp, which I expect to be optimized, and
compare with a zero-page.
Perhaps doing a few useless bit-OR ops for every write would be more
effective than this constant jumping.
-- Martian
>
>>> read : io=12233MB, bw=1432.7MB/s, iops=22922, runt= 8539msec
>>
>> Can you suggest the right fio commandline to test sequential writes if
>> all zeros? I tried --zero_buffers but couldn't get it to write zeros,
>> writes kept going to disk.
>
> zero_buffers=1
> scramble_buffers=0
>
> should get you all zeroes.
>
^ permalink raw reply [flat|nested] 46+ messages in thread
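[Editor's note: Marian's second tip — compare against a zero page with the built-in memcmp — can be sketched in userspace along these lines. This is a hypothetical illustration: in the kernel the reference buffer would be empty_zero_page and the chunk size PAGE_SIZE; both are stand-ins here:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ZERO_CHUNK 4096				/* stands in for PAGE_SIZE */
static const unsigned char zero_chunk[ZERO_CHUNK];	/* stands in for empty_zero_page */

/* Compare the buffer against a page of zeroes with memcmp() instead of
 * a hand-rolled word loop, letting the optimized libc/arch memcmp do
 * the scanning. */
static bool buf_is_zero_memcmp(const unsigned char *buf, size_t len)
{
	while (len >= ZERO_CHUNK) {
		if (memcmp(buf, zero_chunk, ZERO_CHUNK))
			return false;
		buf += ZERO_CHUNK;
		len -= ZERO_CHUNK;
	}
	/* trailing partial chunk, if any */
	return len == 0 || memcmp(buf, zero_chunk, len) == 0;
}
```

The branchy per-byte and per-word loops collapse into a handful of memcmp calls, which addresses the pipeline concern above.]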
* Re: [lvm-devel] dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-09 8:02 ` Eric Wheeler
@ 2014-12-09 15:31 ` Jens Axboe
-1 siblings, 0 replies; 46+ messages in thread
From: Jens Axboe @ 2014-12-09 15:31 UTC (permalink / raw)
To: Eric Wheeler; +Cc: dm-devel, ejt, LVM2 development
On 12/09/2014 01:02 AM, Eric Wheeler wrote:
> On Fri, 5 Dec 2014, Mike Snitzer wrote:
>> I do wonder what the performance impact is on this for dm. Have you
>> tried a (worst case) test of writing blocks that are zero filled,
>
> Jens, thank you for your help w/ fio for generating zeroed writes!
> Clearly fio is superior to dd as a sequential benchmarking tool; I was
> actually able to push on the system's memory bandwidth.
>
> Results:
>
> I hacked block/loop.c and md/dm-thin.c to always call bio_is_zero_filled()
> and then complete without writing to disk, regardless of the return value
> from bio_is_zero_filled(). In loop.c this was done in
> do_bio_filebacked(), and for dm-thin.c this was done within
> provision_block().
>
> This allows us to compare the performance difference between the simple
> loopback block device driver vs the more complex dm-thinp implementation
> just prior to block allocation. These benchmarks give us a sense of how
> performance differences relate between bio_is_zero_filled() and block
> device implementation complexity, in addition to the raw performance of
> bio_is_zero_filled in best- and worst-case scenarios.
>
> Since we always complete without writing after the call to
> bio_is_zero_filled, regardless of the bio's content (all zeros or not), we
> can benchmark the difference in the common use case of random data, as
> well as the edge case of skipping writes for bio's that contain all zeros
> when writing to unallocated space of thin-provisioned volumes.
>
> These benchmarks were performed under KVM, so expect them to be lower
> bounds due to overhead. The hardware is a Intel(R) Xeon(R) CPU E3-1230 V2
> @ 3.30GHz. The VM was allocated 4GB of memory with 4 cpu cores.
>
> Benchmarks were performed using fio-2.1.14-33-gf8b8f
> --name=writebw
> --rw=write
> --time_based
> --runtime=7 --ramp_time=3
> --norandommap
> --ioengine=libaio
> --group_reporting
> --direct=1
> --bs=1m
> --filename=/dev/X
> --numjobs=Y
>
> Random data was tested using:
> --zero_buffers=0 --scramble_buffers=1
>
> Zeroed data was tested using:
> --zero_buffers=1 --scramble_buffers=0
>
> Values below are from aggrb.
>
> dm-thinp (MB/s) loopback (MB/s) loop faster by factor of
> ==============+======================================================
> random jobs=4 | 18496.0 33522.0 1.68x
> zeros jobs=4 | 8119.2 9767.2 1.20x
> ==============+======================================================
> random jobs=1 | 7330.5 12330.0 1.81x
> zeros jobs=1 | 4965.2 6799.9 1.11x
This looks more reasonable in terms of throughput.
One major worry here is that checking every write is blowing your cache,
so you could have a major impact on performance in general. Even for
O_DIRECT writes, you are now accessing the memory. Have you looked into
doing non-temporal memory compares instead? I think that would be the
way to go.
--
Jens Axboe
^ permalink raw reply [flat|nested] 46+ messages in thread
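[Editor's note: one way to read "non-temporal memory compares" on x86 is SSE4.1's streaming load, MOVNTDQA. The sketch below is a hypothetical userspace illustration only: the intrinsic usage and the 16-byte alignment requirement are assumptions, kernel code would also need kernel_fpu_begin()/kernel_fpu_end() around any SSE use, and on ordinary write-back memory many CPUs treat the streaming load as a plain load, so the cache benefit is not guaranteed:

```c
#include <immintrin.h>
#include <stdbool.h>
#include <stddef.h>

/* OR all 16-byte lanes together using streaming loads, then test the
 * accumulator for zero.  Branchless over the chunk, unlike the early-
 * exit byte/word loops in the posted patch. */
__attribute__((target("sse4.1")))
static bool chunk_is_zero_nt(const void *buf, size_t len)
{
	const __m128i *p = (const __m128i *)buf;	/* must be 16-byte aligned */
	size_t i, n = len / sizeof(__m128i);		/* assumes len % 16 == 0 */
	__m128i acc = _mm_setzero_si128();

	for (i = 0; i < n; i++)
		acc = _mm_or_si128(acc, _mm_stream_load_si128((__m128i *)&p[i]));

	return _mm_testz_si128(acc, acc);	/* true iff every lane was zero */
}
```

Whether this actually avoids polluting the cache for the write payload would need measuring, per Jens' concern above.]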
* Re: [lvm-devel] dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-09 15:31 ` Jens Axboe
@ 2014-12-09 15:41 ` Jens Axboe
-1 siblings, 0 replies; 46+ messages in thread
From: Jens Axboe @ 2014-12-09 15:41 UTC (permalink / raw)
To: Eric Wheeler; +Cc: dm-devel, ejt, LVM2 development
On 12/09/2014 08:31 AM, Jens Axboe wrote:
> On 12/09/2014 01:02 AM, Eric Wheeler wrote:
>> On Fri, 5 Dec 2014, Mike Snitzer wrote:
>>> I do wonder what the performance impact is on this for dm. Have you
>>> tried a (worst case) test of writing blocks that are zero filled,
>>
>> Jens, thank you for your help w/ fio for generating zeroed writes!
>> Clearly fio is superior to dd as a sequential benchmarking tool; I was
>> actually able to push on the system's memory bandwidth.
>>
>> Results:
>>
>> I hacked block/loop.c and md/dm-thin.c to always call
>> bio_is_zero_filled()
>> and then complete without writing to disk, regardless of the return value
>> from bio_is_zero_filled(). In loop.c this was done in
>> do_bio_filebacked(), and for dm-thin.c this was done within
>> provision_block().
>>
>> This allows us to compare the performance difference between the simple
>> loopback block device driver vs the more complex dm-thinp implementation
>> just prior to block allocation. These benchmarks give us a sense of how
>> performance differences relate between bio_is_zero_filled() and block
>> device implementation complexity, in addition to the raw performance of
>> bio_is_zero_filled in best- and worst-case scenarios.
>>
>> Since we always complete without writing after the call to
>> bio_is_zero_filled, regardless of the bio's content (all zeros or
>> not), we
>> can benchmark the difference in the common use case of random data, as
>> well as the edge case of skipping writes for bio's that contain all zeros
>> when writing to unallocated space of thin-provisioned volumes.
>>
>> These benchmarks were performed under KVM, so expect them to be lower
>> bounds due to overhead. The hardware is a Intel(R) Xeon(R) CPU
>> E3-1230 V2
>> @ 3.30GHz. The VM was allocated 4GB of memory with 4 cpu cores.
>>
>> Benchmarks were performed using fio-2.1.14-33-gf8b8f
>> --name=writebw
>> --rw=write
>> --time_based
>> --runtime=7 --ramp_time=3
>> --norandommap
>> --ioengine=libaio
>> --group_reporting
>> --direct=1
>> --bs=1m
>> --filename=/dev/X
>> --numjobs=Y
>>
>> Random data was tested using:
>> --zero_buffers=0 --scramble_buffers=1
>>
>> Zeroed data was tested using:
>> --zero_buffers=1 --scramble_buffers=0
>>
>> Values below are from aggrb.
>>
>> dm-thinp (MB/s) loopback (MB/s) loop faster by
>> factor of
>> ==============+======================================================
>> random jobs=4 | 18496.0 33522.0 1.68x
>> zeros jobs=4 | 8119.2 9767.2 1.20x
>> ==============+======================================================
>> random jobs=1 | 7330.5 12330.0 1.81x
>> zeros jobs=1 | 4965.2 6799.9 1.11x
>
> This looks more reasonable in terms of throughput.
>
> One major worry here is that checking every write is blowing your cache,
> so you could have a major impact on performance in general. Even for
> O_DIRECT writes, you are now accessing the memory. Have you looked into
> doing non-temporal memory compares instead? I think that would be the
> way to go.
So I found your patch in the thread. For each vector, use memcmp()
instead and hope it does the right thing. You can compare with
empty_zero_page. That should drastically cut down on the amount of hand
rolled code you have in bio_is_zero_filled() at the moment.
--
Jens Axboe
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-09 15:41 ` Jens Axboe
@ 2014-12-10 2:52 ` Eric Wheeler
-1 siblings, 0 replies; 46+ messages in thread
From: Eric Wheeler @ 2014-12-10 2:52 UTC (permalink / raw)
To: Jens Axboe; +Cc: dm-devel, Marian Csontos, ejt, Eric Wheeler, LVM2 development
> > > dm-thinp (MB/s) loopback (MB/s) loop faster by
> > >
> > > ==============+======================================================
> > > random jobs=4 | 18496.0 33522.0 1.68x
> > > zeros jobs=4 | 8119.2 9767.2 1.20x
> > > ==============+======================================================
> > > random jobs=1 | 7330.5 12330.0 1.81x
> > > zeros jobs=1 | 4965.2 6799.9 1.11x
> >
> > This looks more reasonable in terms of throughput.
> >
> > One major worry here is that checking every write is blowing your cache,
> > so you could have a major impact on performance in general. Even for
> > O_DIRECT writes, you are now accessing the memory. Have you looked into
> > doing non-temporal memory compares instead? I think that would be the
> > way to go.
>
> So I found your patch in the thread. For each vector, use memcmp() instead and
> hope it does the right thing. You can compare with empty_zero_page. That
> should drastically cut down on the amount of hand rolled code you have in
> bio_is_zero_filled() at the moment.
I ran these against dm-thinp within provision_block() when checking
memcmp(page_address(ZERO_PAGE(0)), data, bv.bv_len) != 0
and got these numbers that show memcmp() based testing is rather slow:
zeroed jobs=1 1268.1 MB/s
zeroed jobs=4 1445.6 MB/s
random jobs=1 7554.1 MB/s
random jobs=4 17677.0 MB/s
I updated the checking code to use __uint128_t, and performance got even
better than with 64-bit integers (note that gcc is emulating 128bit).
These are the numbers w/ __uint128_t:
zeroed jobs=1 6424 MB/s
zeroed jobs=4 10738 MB/s
random jobs=1 7928 MB/s
random jobs=4 19435 MB/s
Jens says to try non-temporal memory compares to deal with cache
invalidation, but that sounds like arch-specific assembler unless I am
mistaken. Does GCC have something I can use to flag the buffer as not to
be cached? If not, I'm happy to poke some asm code into the check
routine, but I'm a bit rusty there.
Can someone suggest the inline asm that I would use to rep until a
non-zero value appears? Assume that we are 8- or 16-byte aligned; presumably
the wider the word-sized checks, the better, performance-wise.
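The head/word/tail structure described here can be sketched in portable userspace C. Assumptions: a GCC/Clang compiler that provides `__uint128_t`, and `memcpy` for the wide loads so the sketch stays legal without the kernel's `-fno-strict-aliasing` (the compiler lowers it to a single aligned load). This models a single buffer, mirroring the shape of the attached bio_is_zero_filled() without the bvec iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static bool buf_is_zero_filled(const unsigned char *p, size_t len)
{
	const size_t word = sizeof(__uint128_t);

	/* head: consume bytes until p is 16-byte aligned */
	while (len > 0 && ((uintptr_t)p & (word - 1)) != 0) {
		if (*p++)
			return false;
		len--;
	}

	/* body: one 16-byte word per iteration */
	while (len >= word) {
		__uint128_t v;

		memcpy(&v, p, word);
		if (v)
			return false;
		p += word;
		len -= word;
	}

	/* tail: leftover unaligned bytes */
	while (len > 0) {
		if (*p++)
			return false;
		len--;
	}
	return true;
}
```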
My latest version is attached, and includes moving bvec_kunmap_irq out of
the for loops in hopes of getting gcc to unroll loops.
-Eric
---
block/bio.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 74 insertions(+), 0 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 8c2e55e..262f190 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -511,6 +511,80 @@ void zero_fill_bio(struct bio *bio)
}
EXPORT_SYMBOL(zero_fill_bio);
+bool bio_is_zero_filled(struct bio *bio)
+{
+ unsigned i, count, left;
+ unsigned long flags;
+ struct bio_vec bv;
+ struct bvec_iter iter;
+ char *data = NULL, *p;
+ __uint128_t *parch;
+
+ bio_for_each_segment(bv, bio, iter) {
+ data = bvec_kmap_irq(&bv, &flags);
+ p = data;
+ left = bv.bv_len;
+
+ if (unlikely( data == NULL ))
+ continue;
+
+
+ /* check unaligned bytes at the beginning of p */
+ if (unlikely( ( (uintptr_t)p & (sizeof(__uint128_t)-1) ) != 0 )) {
+ count = sizeof(__uint128_t) - ( (uintptr_t)p & (sizeof(__uint128_t)-1) );
+ for (i = 0; i < count; i++) {
+ if (*p)
+ break;
+ p++;
+ }
+ if (i < count)
+ goto out_false;
+ left -= count;
+ }
+
+ /* we should be word aligned now */
+ BUG_ON(unlikely( ((uintptr_t)p & (sizeof(__uint128_t)-1) ) != 0 ));
+
+ /* now check in word-sized chunks */
+ parch = (__uint128_t*)p;
+ count = left >> ilog2(sizeof(__uint128_t)); /* count = left / sizeof(__uint128_t) */;
+ for (i = 0; i < count; i++) {
+ if (*parch)
+ break;
+ parch++;
+ }
+ if (i < count)
+ goto out_false;
+ left -= count << ilog2(sizeof(__uint128_t)); /* left -= count*sizeof(__uint128_t) */
+
+ /* check remaining unaligned values at the end */
+ p = (char*)parch;
+ if (unlikely(left > 0))
+ {
+ for (i = 0; i < left; i++) {
+ if (*p)
+ break;
+ p++;
+ }
+ if (i < left)
+ goto out_false;
+ left = 0;
+ }
+
+ bvec_kunmap_irq(data, &flags);
+ BUG_ON(unlikely( left > 0 ));
+ BUG_ON(unlikely( data+bv.bv_len != p ));
+ }
+
+ return true;
+
+out_false:
+ if (data != NULL)
+ bvec_kunmap_irq(data, &flags);
+ return false;
+}
+EXPORT_SYMBOL(bio_is_zero_filled);
+
/**
* bio_put - release a reference to a bio
* @bio: bio to release reference to
--
1.7.1
--
lvm-devel mailing list
lvm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/lvm-devel
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks
2014-12-09 15:31 ` Jens Axboe
@ 2015-01-26 2:53 ` Eric Wheeler
-1 siblings, 0 replies; 46+ messages in thread
From: Eric Wheeler @ 2015-01-26 2:53 UTC (permalink / raw)
To: Jens Axboe; +Cc: dm-devel, ejt, LVM2 development
Introduce bio_is_zero_filled() and use it to optimize away writing all
zeroes to unprovisioned blocks. Subsequent reads to the associated
unprovisioned blocks will be zero filled.
Single-thread performance evaluates zero bio's at ~8137MB/s under KVM on a
Xeon E3-1230. Zero-checks attempt to minimize cache effects on non-zero
data. bio_is_zero_filled works with unaligned bvec data. (Using memcmp
and comparing against the zero page is ~5x slower, so this implementation
was optimized to increase pipelined execution.)
The pool_feature pf.skip_zero_writes is implemented and configurable at
creation time or via table reload.
To test we prepare two dm-thinp volumes test1 and test2 of equal size.
We format test1 without the patch, mount, and extract the Linux source
tree onto the test1 filesystem, and unmount. Finally, we activate
skip_zero_writes, dd test1 over test2, and verify checksums:
b210f032a6465178103317f3c40ab59f /dev/test/test1
b210f032a6465178103317f3c40ab59f /dev/test/test2
Notice the allocation difference for thin volumes test1 and test2 (after
dd if=test1 of=test2), even though they have the same md5sum:
LV VG Attr LSize Pool Origin Data%
test1 test Vwi-a-tz-- 4.00g thinp 22.04
test2 test Vwi-a-tz-- 4.00g thinp 18.33
An additional 3.71% of space was saved by the patch, and so were the
~150MB of (possibly random) IOs that would have hit disk, not to mention
reads that now bypass the disk since they are unallocated. We also save
the metadata overhead of ~2400 provision_block() allocations.
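The figures above can be sanity-checked: 22.04 - 18.33 = 3.71 percentage points of a 4.00 GiB volume is about 152 MB, which at a 64 KiB data block size (an assumption; the commit message does not state the pool's chunk size, but 64 KiB is the smallest the target supports and it matches the numbers) comes to roughly 2430 blocks, consistent with the ~150MB and ~2400 allocations quoted. A throwaway check:

```c
#include <assert.h>

/* MiB saved when a volume's Data% drops by pct_points. */
static double saved_mib(double volume_gib, double pct_points)
{
	return volume_gib * 1024.0 * pct_points / 100.0;
}

/* Number of thin-pool data blocks that amount of space represents,
 * for a given block size in KiB (64 KiB here is an assumption). */
static double saved_blocks(double mib, double block_kib)
{
	return mib * 1024.0 / block_kib;
}
```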
Signed-off-by: Eric Wheeler <dm-devel@lists.ewheeler.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: ejt@redhat.com
---
Documentation/device-mapper/thin-provisioning.txt | 2 +
block/bio.c | 103 +++++++++++++++++++++
drivers/md/dm-thin.c | 27 ++++++
include/linux/bio.h | 1 +
4 files changed, 133 insertions(+), 0 deletions(-)
http://www.globallinuxsecurity.pro/
diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt
index 2f51735..7304ccf 100644
--- a/Documentation/device-mapper/thin-provisioning.txt
+++ b/Documentation/device-mapper/thin-provisioning.txt
@@ -266,6 +266,8 @@ i) Constructor
error_if_no_space: Error IOs, instead of queueing, if no space.
+ enable_zero_writes: Enable all-zero writes to unallocated blocks.
+
Data block size must be between 64KB (128 sectors) and 1GB
(2097152 sectors) inclusive.
diff --git a/block/bio.c b/block/bio.c
index 8c2e55e..7e372b7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -511,6 +511,109 @@ void zero_fill_bio(struct bio *bio)
}
EXPORT_SYMBOL(zero_fill_bio);
+__attribute__((optimize("unroll-loops")))
+bool bio_is_zero_filled(struct bio *bio)
+{
+ struct bio_vec bv;
+ struct bvec_iter iter;
+ char *data, *p;
+ __uint128_t *p128;
+
+ unsigned i, count, left;
+ unsigned long flags;
+
+ /* Align to the data chunk size */
+ const int align_to = sizeof(__uint128_t);
+
+ p128 = NULL;
+ bio_for_each_segment(bv, bio, iter) {
+ data = bvec_kmap_irq(&bv, &flags);
+ p = data;
+ left = bv.bv_len;
+
+ if (unlikely( data == NULL ))
+ continue;
+
+ /* check unaligned bytes at the beginning of p, if any */
+ if (unlikely( ( (uintptr_t)p & (align_to-1) ) != 0 )) {
+ count = align_to - ( (uintptr_t)p & (align_to-1) );
+ for (i = 0; i < count; i++) {
+ if (*p)
+ break;
+ p++;
+ }
+ if (i < count)
+ goto out_false;
+ left -= count;
+ }
+
+ /* we should be align_to-byte aligned now */
+ BUG_ON(unlikely( ((uintptr_t)p & (align_to-1) ) != 0 ));
+
+ p128 = (__uint128_t*)p;
+ count = left >> 9; /* count = left / (32*16) */
+
+ /* Quickly sample first, middle, and last values
+ * and exit early without hitting the loop if nonzero.
+ * This reduces cache effects for non-zero data and has almost
+ * no effective penalty since these values will probably be in
+ * cache when we test them below.
+ */
+ if (likely(count) &&
+ (p128[0] || p128[left>>5] || p128[(left>>4)-1]))
+ goto out_false;
+
+ /* We are using __uint128_t here, so 32*16=512 bytes per loop
+ * Writing out the 32 operations helps with pipelining. */
+ for (i = 0; i < count; i++) {
+ if (p128[ 0] | p128[ 1] | p128[ 2] | p128[ 3] |
+ p128[ 4] | p128[ 5] | p128[ 6] | p128[ 7] |
+ p128[ 8] | p128[ 9] | p128[10] | p128[11] |
+ p128[12] | p128[13] | p128[14] | p128[15] |
+ p128[16] | p128[17] | p128[18] | p128[19] |
+ p128[20] | p128[21] | p128[22] | p128[23] |
+ p128[24] | p128[25] | p128[26] | p128[27] |
+ p128[28] | p128[29] | p128[30] | p128[31])
+ break;
+
+ p128 += 32;
+ }
+ if (i < count)
+ goto out_false;
+
+ left -= count << 9; /* left -= count * 32*16 */
+
+ /* check remaining unaligned values at the end, if any */
+ if (unlikely(left > 0)) {
+ p = (char*)p128;
+ for (i = 0; i < left; i++) {
+ if (*p)
+ break;
+ p++;
+ }
+ if (i < left)
+ goto out_false;
+
+ /* Fixup values for the BUG_ON checks below */
+ p128 = (__uint128_t*)p;
+ left = 0;
+ }
+
+ bvec_kunmap_irq(data, &flags);
+ BUG_ON(unlikely( left > 0 ));
+ BUG_ON(unlikely( data+bv.bv_len != (char*)p128 ));
+ }
+
+ return true;
+
+out_false:
+ if (likely(data != NULL))
+ bvec_kunmap_irq(data, &flags);
+
+ return false;
+}
+EXPORT_SYMBOL(bio_is_zero_filled);
+
/**
* bio_put - release a reference to a bio
* @bio: bio to release reference to
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index fc9c848..d0cebd2 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -151,6 +151,7 @@ struct pool_features {
bool discard_enabled:1;
bool discard_passdown:1;
bool error_if_no_space:1;
+ bool skip_zero_writes:1;
};
struct thin_c;
@@ -1258,6 +1259,16 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
return;
}
+ /*
+ * Skip writes of all zeroes to unprovisioned blocks.
+ */
+
+ if (tc->pool->pf.skip_zero_writes && bio_is_zero_filled(bio) ) {
+ cell_defer_no_holder(tc, cell);
+ bio_endio(bio, 0);
+ return;
+ }
+
r = alloc_data_block(tc, &data_block);
switch (r) {
case 0:
@@ -2063,6 +2074,7 @@ static void pool_features_init(struct pool_features *pf)
pf->discard_enabled = true;
pf->discard_passdown = true;
pf->error_if_no_space = false;
+ pf->skip_zero_writes = true;
}
static void __pool_destroy(struct pool *pool)
@@ -2310,6 +2322,9 @@ static int parse_pool_features(struct dm_arg_set *as, struct pool_features *pf,
else if (!strcasecmp(arg_name, "error_if_no_space"))
pf->error_if_no_space = true;
+ else if (!strcasecmp(arg_name, "enable_zero_writes"))
+ pf->skip_zero_writes = false;
+
else {
ti->error = "Unrecognised pool feature requested";
r = -EINVAL;
@@ -2393,6 +2408,7 @@ static dm_block_t calc_metadata_threshold(struct pool_c *pt)
* no_discard_passdown: don't pass discards down to the data device
* read_only: Don't allow any changes to be made to the pool metadata.
* error_if_no_space: error IOs, instead of queueing, if no space.
+ * enable_zero_writes: Enable all-zero writes to unallocated blocks.
*/
static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
{
@@ -2511,6 +2527,9 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
}
ti->private = pt;
+ /* The pf skip_zero_writes is safe to change at any time */
+ pool->pf.skip_zero_writes = pf.skip_zero_writes;
+
r = dm_pool_register_metadata_threshold(pt->pool->pmd,
calc_metadata_threshold(pt),
metadata_low_callback,
@@ -2936,6 +2955,9 @@ static void emit_flags(struct pool_features *pf, char *result,
if (pf->error_if_no_space)
DMEMIT("error_if_no_space ");
+
+ if (!pf->skip_zero_writes)
+ DMEMIT("enable_zero_writes ");
}
/*
@@ -3043,6 +3065,11 @@ static void pool_status(struct dm_target *ti, status_type_t type,
else
DMEMIT("queue_if_no_space ");
+ if (!pool->pf.skip_zero_writes)
+ DMEMIT("enable_zero_writes ");
+ else
+ DMEMIT("skip_zero_writes ");
+
break;
case STATUSTYPE_TABLE:
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5a64576..abb46f7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -419,6 +419,7 @@ extern struct bio *bio_copy_user_iov(struct request_queue *,
int, int, gfp_t);
extern int bio_uncopy_user(struct bio *);
void zero_fill_bio(struct bio *bio);
+bool bio_is_zero_filled(struct bio *bio);
extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *);
extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int);
extern unsigned int bvec_nr_vecs(unsigned short idx);
--
1.7.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [lvm-devel] dm thin: optimize away writing all zeroes to unprovisioned blocks
2015-01-26 2:53 ` Eric Wheeler
@ 2015-02-15 0:31 ` Eric Wheeler
-1 siblings, 0 replies; 46+ messages in thread
From: Eric Wheeler @ 2015-02-15 0:31 UTC (permalink / raw)
To: LVM2 development; +Cc: Jens Axboe, dm-devel, Marian Csontos, ejt, Mike Snitzer
Hey Jens / Joe / Mike / Marian:
I'm checking in for additional feedback on the patch below (intact in
previous email of this thread) in the hope of meeting the merge window for
3.20.
I have implemented the following based on your suggestions:
* pipelining for performance
* showed that memcmp against zero page is far slower than
directly checking for zero
* minimized cache effects of nonzero data with early minimal nonzero
testing before hitting the pipeline
* implemented configurability via table reload (pf->skip_zero_writes)
* implemented status via dmsetup status
Do you have any additional considerations that should be addressed before
pushing this upstream?
-Eric
On Sun, 25 Jan 2015, Eric Wheeler wrote:
> Introduce bio_is_zero_filled() and use it to optimize away writing all
> zeroes to unprovisioned blocks. Subsequent reads to the associated
> unprovisioned blocks will be zero filled.
>
> Single-thread performance evaluates zero bios at ~8137 MB/s under KVM on a
> Xeon E3-1230. Zero-checks attempt to minimize cache effects on non-zero
> data. bio_is_zero_filled() works with unaligned bvec data. (Using memcmp
> to compare against the zero page is ~5x slower, so this implementation
> was optimized to increase pipelined execution.)
>
> The pool_feature pf.skip_zero_writes is implemented and configurable at
> creation time or via table reload.
>
> To test, we prepare two dm-thinp volumes, test1 and test2, of equal size.
> We format test1 without the patch, mount it, extract the Linux source
> tree onto the test1 filesystem, and unmount. Finally, we activate
> skip_zero_writes, dd test1 over test2, and verify checksums:
> b210f032a6465178103317f3c40ab59f /dev/test/test1
> b210f032a6465178103317f3c40ab59f /dev/test/test2
>
> Notice the allocation difference for thin volumes test1 and test2 (after
> dd if=test1 of=test2), even though they have the same md5sum:
> LV VG Attr LSize Pool Origin Data%
> test1 test Vwi-a-tz-- 4.00g thinp 22.04
> test2 test Vwi-a-tz-- 4.00g thinp 18.33
>
> The patch saved an additional 3.71% of space, along with the ~150MB of
> (possibly random) IOs that would otherwise have hit disk, not to mention
> reads that now bypass the disk entirely because the blocks remain
> unallocated. We also save the metadata overhead of ~2400
> provision_block() allocations.
>
>
> Signed-off-by: Eric Wheeler <dm-devel@lists.ewheeler.net>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: ejt@redhat.com
>
> ---
> Documentation/device-mapper/thin-provisioning.txt | 2 +
> block/bio.c | 103 +++++++++++++++++++++
> drivers/md/dm-thin.c | 27 ++++++
> include/linux/bio.h | 1 +
> 4 files changed, 133 insertions(+), 0 deletions(-)
> http://www.globallinuxsecurity.pro/
>
> diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt
> index 2f51735..7304ccf 100644
> --- a/Documentation/device-mapper/thin-provisioning.txt
> +++ b/Documentation/device-mapper/thin-provisioning.txt
> @@ -266,6 +266,8 @@ i) Constructor
>
> error_if_no_space: Error IOs, instead of queueing, if no space.
>
> + enable_zero_writes: Enable all-zero writes to unallocated blocks.
> +
> Data block size must be between 64KB (128 sectors) and 1GB
> (2097152 sectors) inclusive.
>
> diff --git a/block/bio.c b/block/bio.c
> index 8c2e55e..7e372b7 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -511,6 +511,109 @@ void zero_fill_bio(struct bio *bio)
> }
> EXPORT_SYMBOL(zero_fill_bio);
>
> +__attribute__((optimize("unroll-loops")))
> +bool bio_is_zero_filled(struct bio *bio)
> +{
> + struct bio_vec bv;
> + struct bvec_iter iter;
> + char *data, *p;
> + __uint128_t *p128;
> +
> + unsigned i, count, left;
> + unsigned long flags;
> +
> + /* Align to the data chunk size */
> + const int align_to = sizeof(__uint128_t);
> +
> + p128 = NULL;
> + bio_for_each_segment(bv, bio, iter) {
> + data = bvec_kmap_irq(&bv, &flags);
> + p = data;
> + left = bv.bv_len;
> +
> + if (unlikely( data == NULL ))
> + continue;
> +
> + /* check unaligned bytes at the beginning of p, if any */
> + if (unlikely( ( (uintptr_t)p & (align_to-1) ) != 0 )) {
> + count = align_to - ( (uintptr_t)p & (align_to-1) );
> + for (i = 0; i < count; i++) {
> + if (*p)
> + break;
> + p++;
> + }
> + if (i < count)
> + goto out_false;
> + left -= count;
> + }
> +
> + /* we should be align_to-byte aligned now */
> + BUG_ON(unlikely( ((uintptr_t)p & (align_to-1) ) != 0 ));
> +
> + p128 = (__uint128_t*)p;
> + count = left >> 9; /* count = left / (32*16) */
> +
> + /* Quickly sample first, middle, and last values
> + * and exit early without hitting the loop if nonzero.
> + * This reduces cache effects for non-zero data and has almost
> + * no effective penalty since these values will probably be in
> + * cache when we test them below.
> + */
> + if (likely(count) &&
> + (p128[0] || p128[left>>5] || p128[(left>>4)-1]))
> + goto out_false;
> +
> + /* We are using __uint128_t here, so 32*16=512 bytes per loop
> + * Writing out the 32 operations helps with pipelining. */
> + for (i = 0; i < count; i++) {
> + if (p128[ 0] | p128[ 1] | p128[ 2] | p128[ 3] |
> + p128[ 4] | p128[ 5] | p128[ 6] | p128[ 7] |
> + p128[ 8] | p128[ 9] | p128[10] | p128[11] |
> + p128[12] | p128[13] | p128[14] | p128[15] |
> + p128[16] | p128[17] | p128[18] | p128[19] |
> + p128[20] | p128[21] | p128[22] | p128[23] |
> + p128[24] | p128[25] | p128[26] | p128[27] |
> + p128[28] | p128[29] | p128[30] | p128[31])
> + break;
> +
> + p128 += 32;
> + }
> + if (i < count)
> + goto out_false;
> +
> + left -= count << 9; /* left -= count * 32*16 */
> +
> + /* check remaining unaligned values at the end, if any */
> + if (unlikely(left > 0)) {
> + p = (char*)p128;
> + for (i = 0; i < left; i++) {
> + if (*p)
> + break;
> + p++;
> + }
> + if (i < left)
> + goto out_false;
> +
> + /* Fixup values for the BUG_ON checks below */
> + p128 = (__uint128_t*)p;
> + left = 0;
> + }
> +
> + bvec_kunmap_irq(data, &flags);
> + BUG_ON(unlikely( left > 0 ));
> + BUG_ON(unlikely( data+bv.bv_len != (char*)p128 ));
> + }
> +
> + return true;
> +
> +out_false:
> + if (likely(data != NULL))
> + bvec_kunmap_irq(data, &flags);
> +
> + return false;
> +}
> +EXPORT_SYMBOL(bio_is_zero_filled);
> +
> /**
> * bio_put - release a reference to a bio
> * @bio: bio to release reference to
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index fc9c848..d0cebd2 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -151,6 +151,7 @@ struct pool_features {
> bool discard_enabled:1;
> bool discard_passdown:1;
> bool error_if_no_space:1;
> + bool skip_zero_writes:1;
> };
>
> struct thin_c;
> @@ -1258,6 +1259,16 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
> return;
> }
>
> + /*
> + * Skip writes of all zeroes to unprovisioned blocks.
> + */
> +
> + if (tc->pool->pf.skip_zero_writes && bio_is_zero_filled(bio)) {
> + cell_defer_no_holder(tc, cell);
> + bio_endio(bio, 0);
> + return;
> + }
> +
> r = alloc_data_block(tc, &data_block);
> switch (r) {
> case 0:
> @@ -2063,6 +2074,7 @@ static void pool_features_init(struct pool_features *pf)
> pf->discard_enabled = true;
> pf->discard_passdown = true;
> pf->error_if_no_space = false;
> + pf->skip_zero_writes = true;
> }
>
> static void __pool_destroy(struct pool *pool)
> @@ -2310,6 +2322,9 @@ static int parse_pool_features(struct dm_arg_set *as, struct pool_features *pf,
> else if (!strcasecmp(arg_name, "error_if_no_space"))
> pf->error_if_no_space = true;
>
> + else if (!strcasecmp(arg_name, "enable_zero_writes"))
> + pf->skip_zero_writes = false;
> +
> else {
> ti->error = "Unrecognised pool feature requested";
> r = -EINVAL;
> @@ -2393,6 +2408,7 @@ static dm_block_t calc_metadata_threshold(struct pool_c *pt)
> * no_discard_passdown: don't pass discards down to the data device
> * read_only: Don't allow any changes to be made to the pool metadata.
> * error_if_no_space: error IOs, instead of queueing, if no space.
> + * enable_zero_writes: Enable all-zero writes to unallocated blocks.
> */
> static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
> {
> @@ -2511,6 +2527,9 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
> }
> ti->private = pt;
>
> + /* The pf skip_zero_writes is safe to change at any time */
> + pool->pf.skip_zero_writes = pf.skip_zero_writes;
> +
> r = dm_pool_register_metadata_threshold(pt->pool->pmd,
> calc_metadata_threshold(pt),
> metadata_low_callback,
> @@ -2936,6 +2955,9 @@ static void emit_flags(struct pool_features *pf, char *result,
>
> if (pf->error_if_no_space)
> DMEMIT("error_if_no_space ");
> +
> + if (!pf->skip_zero_writes)
> + DMEMIT("enable_zero_writes ");
> }
>
> /*
> @@ -3043,6 +3065,11 @@ static void pool_status(struct dm_target *ti, status_type_t type,
> else
> DMEMIT("queue_if_no_space ");
>
> + if (!pool->pf.skip_zero_writes)
> + DMEMIT("enable_zero_writes ");
> + else
> + DMEMIT("skip_zero_writes ");
> +
> break;
>
> case STATUSTYPE_TABLE:
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 5a64576..abb46f7 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -419,6 +419,7 @@ extern struct bio *bio_copy_user_iov(struct request_queue *,
> int, int, gfp_t);
> extern int bio_uncopy_user(struct bio *);
> void zero_fill_bio(struct bio *bio);
> +bool bio_is_zero_filled(struct bio *bio);
> extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *);
> extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int);
> extern unsigned int bvec_nr_vecs(unsigned short idx);
> --
> 1.7.1
>
> --
> lvm-devel mailing list
> lvm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/lvm-devel
>
^ permalink raw reply [flat|nested] 46+ messages in thread
end of thread, other threads:[~2015-02-15 0:31 UTC | newest]
Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-30 18:38 dm-thinp feature request: skip allocation on writes of all zeroes Eric Wheeler
2014-09-30 19:30 ` Zdenek Kabelac
2014-09-30 20:13 ` Mike Snitzer
2014-09-30 22:38 ` Eric Wheeler
2014-12-04 7:05 ` [PATCH] dm-thinp: skip allocation on writes of all zeros to unallocated blocks Eric Wheeler
2014-12-04 7:25 ` Eric Wheeler
2014-12-04 15:33 ` [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks Mike Snitzer
2014-12-04 15:43 ` Mike Snitzer
2014-12-06 22:33 ` Eric Wheeler
2014-12-05 14:47 ` Mike Snitzer
2014-12-06 22:36 ` Eric Wheeler
2014-12-05 17:27 ` [PATCH] " Jens Axboe
2014-12-05 18:33 ` Mike Snitzer
2014-12-06 22:40 ` Eric Wheeler
2014-12-07 1:41 ` [lvm-devel] " Jens Axboe
2014-12-07 6:30 ` Eric Wheeler
2014-12-07 6:45 ` Eric Wheeler
2014-12-08 16:57 ` [lvm-devel] " Jens Axboe
2014-12-09 8:02 ` Eric Wheeler
2014-12-09 15:31 ` [lvm-devel] " Jens Axboe
2014-12-09 15:41 ` [lvm-devel] " Jens Axboe
2014-12-10 2:52 ` [PATCH] " Eric Wheeler
2015-01-26 2:53 ` Eric Wheeler
2015-02-15 0:31 ` [lvm-devel] " Eric Wheeler
2014-12-09 14:38 ` Marian Csontos
2014-12-07 1:36 ` Jens Axboe