From: Mike Snitzer <snitzer@redhat.com> To: Eric Wheeler <ewheeler@ewheeler.net> Cc: axboe@kernel.dk, dm-devel@redhat.com, ejt@redhat.com, LVM2 development <lvm-devel@redhat.com> Subject: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks Date: Thu, 4 Dec 2014 10:33:59 -0500 [thread overview] Message-ID: <20141204153358.GA19315@redhat.com> (raw) In-Reply-To: <alpine.DEB.2.02.1412032318290.29609@ware.dreamhost.com> Hi, In the future please send DM changes to dm-devel@redhat.com Comments inlined below, and I've provided a revised patch at the end. On Thu, Dec 04 2014 at 2:25am -0500, Eric Wheeler <ewheeler@ewheeler.net> wrote: > This patch skips all-zero writes to unallocated blocks of dm-thinp volumes. > > Unallocated zero-writes are 70x faster and never allocate space in this test: > # dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024 > 1073741824 bytes (1.1 GB) copied, 0.794343 s, 1.4 GB/s > > Without the patch, zero-writes allocate space and hit the disk: > # dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024 > 1073741824 bytes (1.1 GB) copied, 53.8064 s, 20.0 MB/s > > For the test below, notice the allocation difference for thin volumes > test1 and test2 (after dd if=test1 of=test2), even though they have the > same md5sum: > LV VG Attr LSize Pool Origin Data% > test1 test Vwi-a-tz-- 4.00g thinp 22.04 > test2 test Vwi-a-tz-- 4.00g thinp 18.33 > > An additional 3.71% of space was saved by the patch, and so were > the ~150MB of (possibly random) IOs that would have hit disk, not to > mention reads that now bypass the disk since they are unallocated. > We also save the metadata overhead of ~2400 allocations when calling > provision_block(). > > # lvcreate -T test/thinp -L 5G > # lvcreate -T test/thinp -V 4G -n test1 > # lvcreate -T test/thinp -V 4G -n test2 > > Simple ext4+kernel tree extract test: > > First prepare two dm-thinp volumes test1 and test2 of equal size. First > mkfs.ext4 /dev/test/test1 without the patch and then mount and extract > 3.17.4's source tree onto the test1 filesystem, and unmount > > Next, install patched dm_thin_pool.ko, then dd test1 over test2 and > verify checksums: > # dd if=/dev/test/test1 of=/dev/test/test2 bs=1M > # md5sum /dev/test/test? > b210f032a6465178103317f3c40ab59f /dev/test/test1 > b210f032a6465178103317f3c40ab59f /dev/test/test2 > Yes, they match! > > > Signed-off-by: Eric Wheeler <lvm-dev@lists.ewheeler.net> > --- > Resending the patch as it was malformed on the first try. The resend was also malformed.. but don't worry about resending for this patch. > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c > index fc9c848..71dd545 100644 > --- a/drivers/md/dm-thin.c > +++ b/drivers/md/dm-thin.c > @@ -1230,6 +1230,42 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio, > } > } A helper like this really belongs in block/bio.c: > +/* return true if bio data contains all 0x00's */ > +bool bio_all_zeros(struct bio *bio) +{ > + unsigned long flags; > + struct bio_vec bv; > + struct bvec_iter iter; > + > + char *data; > + uint64_t *p; > + int i, count; > + + bool allzeros = true; > + > + bio_for_each_segment(bv, bio, iter) { > + data = bvec_kmap_irq(&bv, &flags); > + > + p = (uint64_t*)data; > + count = bv.bv_len / sizeof(uint64_t); Addressing a bio's contents in terms of uint64_t has the potential to access beyond bv.bv_len (byte addressing vs 64bit addressing). I can see you were just looking to be more efficient about checking the bios' contents but I'm not convinced it would always be safe. I'm open to something more efficient than what I implemented below, but it is the most straight-forward code I thought of. > + > + for (i = 0; i < count; i++) { > + if (*p) { > + allzeros = false; > + break; > + } > + p++; > + } > + > + bvec_kunmap_irq(data, &flags); > + > + if (likely(!allzeros)) > + break; > + } > + > + return allzeros; > +} > + > static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block, > struct dm_bio_prison_cell *cell) > { > @@ -1258,6 +1294,15 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block > return; > } > > + /* > + * Skip writes of all zeroes > + */ > + if (bio_data_dir(bio) == WRITE && unlikely( bio_all_zeros(bio) )) { > + cell_defer_no_holder(tc, cell); > + bio_endio(bio, 0); > + return; > + } > + No need to check for bio_data_dir(bio) == WRITE (at this point in provision_block() we already know it is a WRITE). Here is a revised patch that is more like I'd expect to land upstream. Jens are you OK with us adding bio_is_zero_filled to block/bio.c? If so should I split it out as a separate patch for you to pick up or just carry it as part of the patch that lands in linux-dm.git? From: Mike Snitzer <snitzer@redhat.com> Date: Thu, 4 Dec 2014 10:18:32 -0500 Subject: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks Introduce bio_is_zero_filled() and use it to optimize away writing all zeroes to unprovisioned blocks. Subsequent reads to the associated unprovisioned blocks will be zero filled. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Eric Wheeler <ewheeler@ewheeler.net> Cc: Jens Axboe <axboe@kernel.dk> --- block/bio.c | 25 +++++++++++++++++++++++++ drivers/md/dm-thin.c | 10 ++++++++++ include/linux/bio.h | 1 + 3 files changed, 36 insertions(+), 0 deletions(-) diff --git a/block/bio.c b/block/bio.c index 3e6e198..7d07593 100644 --- a/block/bio.c +++ b/block/bio.c @@ -515,6 +515,31 @@ void zero_fill_bio(struct bio *bio) } EXPORT_SYMBOL(zero_fill_bio); +bool bio_is_zero_filled(struct bio *bio) +{ + unsigned i; + unsigned long flags; + struct bio_vec bv; + struct bvec_iter iter; + + bio_for_each_segment(bv, bio, iter) { + char *data = bvec_kmap_irq(&bv, &flags); + char *p = data; + + for (i = 0; i < bv.bv_len; i++) { + if (*p) { + bvec_kunmap_irq(data, &flags); + return false; + } + p++; + } + bvec_kunmap_irq(data, &flags); + } + + return true; +} +EXPORT_SYMBOL(bio_is_zero_filled); + /** * bio_put - release a reference to a bio * @bio: bio to release reference to diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index 8735543..13aff8c 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -1501,6 +1501,16 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block return; } + /* + * Optimize away writes of all zeroes, subsequent reads to + * associated unprovisioned blocks will be zero filled. + */ + if (unlikely(bio_is_zero_filled(bio))) { + cell_defer_no_holder(tc, cell); + bio_endio(bio, 0); + return; + } + r = alloc_data_block(tc, &data_block); switch (r) { case 0: diff --git a/include/linux/bio.h b/include/linux/bio.h index 7347f48..602094b 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -465,6 +465,7 @@ extern struct bio *bio_copy_user_iov(struct request_queue *, int, int, gfp_t); extern int bio_uncopy_user(struct bio *); void zero_fill_bio(struct bio *bio); +bool bio_is_zero_filled(struct bio *bio); extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *); extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int); extern unsigned int bvec_nr_vecs(unsigned short idx); -- 1.7.4.4 -- lvm-devel mailing list lvm-devel@redhat.com https://www.redhat.com/mailman/listinfo/lvm-devel
WARNING: multiple messages have this Message-ID (diff)
From: Mike Snitzer <snitzer@redhat.com> To: lvm-devel@redhat.com Subject: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks Date: Thu, 4 Dec 2014 10:33:59 -0500 [thread overview] Message-ID: <20141204153358.GA19315@redhat.com> (raw) In-Reply-To: <alpine.DEB.2.02.1412032318290.29609@ware.dreamhost.com> Hi, In the future please send DM changes to dm-devel at redhat.com Comments inlined below, and I've provided a revised patch at the end. On Thu, Dec 04 2014 at 2:25am -0500, Eric Wheeler <ewheeler@ewheeler.net> wrote: > This patch skips all-zero writes to unallocated blocks of dm-thinp volumes. > > Unallocated zero-writes are 70x faster and never allocate space in this test: > # dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024 > 1073741824 bytes (1.1 GB) copied, 0.794343 s, 1.4 GB/s > > Without the patch, zero-writes allocate space and hit the disk: > # dd if=/dev/zero of=/dev/test/test1 bs=1M count=1024 > 1073741824 bytes (1.1 GB) copied, 53.8064 s, 20.0 MB/s > > For the test below, notice the allocation difference for thin volumes > test1 and test2 (after dd if=test1 of=test2), even though they have the > same md5sum: > LV VG Attr LSize Pool Origin Data% > test1 test Vwi-a-tz-- 4.00g thinp 22.04 > test2 test Vwi-a-tz-- 4.00g thinp 18.33 > > An additional 3.71% of space was saved by the patch, and so were > the ~150MB of (possibly random) IOs that would have hit disk, not to > mention reads that now bypass the disk since they are unallocated. > We also save the metadata overhead of ~2400 allocations when calling > provision_block(). > > # lvcreate -T test/thinp -L 5G > # lvcreate -T test/thinp -V 4G -n test1 > # lvcreate -T test/thinp -V 4G -n test2 > > Simple ext4+kernel tree extract test: > > First prepare two dm-thinp volumes test1 and test2 of equal size. First > mkfs.ext4 /dev/test/test1 without the patch and then mount and extract > 3.17.4's source tree onto the test1 filesystem, and unmount > > Next, install patched dm_thin_pool.ko, then dd test1 over test2 and > verify checksums: > # dd if=/dev/test/test1 of=/dev/test/test2 bs=1M > # md5sum /dev/test/test? > b210f032a6465178103317f3c40ab59f /dev/test/test1 > b210f032a6465178103317f3c40ab59f /dev/test/test2 > Yes, they match! > > > Signed-off-by: Eric Wheeler <lvm-dev@lists.ewheeler.net> > --- > Resending the patch as it was malformed on the first try. The resend was also malformed.. but don't worry about resending for this patch. > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c > index fc9c848..71dd545 100644 > --- a/drivers/md/dm-thin.c > +++ b/drivers/md/dm-thin.c > @@ -1230,6 +1230,42 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio, > } > } A helper like this really belongs in block/bio.c: > +/* return true if bio data contains all 0x00's */ > +bool bio_all_zeros(struct bio *bio) +{ > + unsigned long flags; > + struct bio_vec bv; > + struct bvec_iter iter; > + > + char *data; > + uint64_t *p; > + int i, count; > + + bool allzeros = true; > + > + bio_for_each_segment(bv, bio, iter) { > + data = bvec_kmap_irq(&bv, &flags); > + > + p = (uint64_t*)data; > + count = bv.bv_len / sizeof(uint64_t); Addressing a bio's contents in terms of uint64_t has the potential to access beyond bv.bv_len (byte addressing vs 64bit addressing). I can see you were just looking to be more efficient about checking the bios' contents but I'm not convinced it would always be safe. I'm open to something more efficient than what I implemented below, but it is the most straight-forward code I thought of. > + > + for (i = 0; i < count; i++) { > + if (*p) { > + allzeros = false; > + break; > + } > + p++; > + } > + > + bvec_kunmap_irq(data, &flags); > + > + if (likely(!allzeros)) > + break; > + } > + > + return allzeros; > +} > + > static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block, > struct dm_bio_prison_cell *cell) > { > @@ -1258,6 +1294,15 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block > return; > } > > + /* > + * Skip writes of all zeroes > + */ > + if (bio_data_dir(bio) == WRITE && unlikely( bio_all_zeros(bio) )) { > + cell_defer_no_holder(tc, cell); > + bio_endio(bio, 0); > + return; > + } > + No need to check for bio_data_dir(bio) == WRITE (at this point in provision_block() we already know it is a WRITE). Here is a revised patch that is more like I'd expect to land upstream. Jens are you OK with us adding bio_is_zero_filled to block/bio.c? If so should I split it out as a separate patch for you to pick up or just carry it as part of the patch that lands in linux-dm.git? From: Mike Snitzer <snitzer@redhat.com> Date: Thu, 4 Dec 2014 10:18:32 -0500 Subject: [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks Introduce bio_is_zero_filled() and use it to optimize away writing all zeroes to unprovisioned blocks. Subsequent reads to the associated unprovisioned blocks will be zero filled. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Eric Wheeler <ewheeler@ewheeler.net> Cc: Jens Axboe <axboe@kernel.dk> --- block/bio.c | 25 +++++++++++++++++++++++++ drivers/md/dm-thin.c | 10 ++++++++++ include/linux/bio.h | 1 + 3 files changed, 36 insertions(+), 0 deletions(-) diff --git a/block/bio.c b/block/bio.c index 3e6e198..7d07593 100644 --- a/block/bio.c +++ b/block/bio.c @@ -515,6 +515,31 @@ void zero_fill_bio(struct bio *bio) } EXPORT_SYMBOL(zero_fill_bio); +bool bio_is_zero_filled(struct bio *bio) +{ + unsigned i; + unsigned long flags; + struct bio_vec bv; + struct bvec_iter iter; + + bio_for_each_segment(bv, bio, iter) { + char *data = bvec_kmap_irq(&bv, &flags); + char *p = data; + + for (i = 0; i < bv.bv_len; i++) { + if (*p) { + bvec_kunmap_irq(data, &flags); + return false; + } + p++; + } + bvec_kunmap_irq(data, &flags); + } + + return true; +} +EXPORT_SYMBOL(bio_is_zero_filled); + /** * bio_put - release a reference to a bio * @bio: bio to release reference to diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index 8735543..13aff8c 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -1501,6 +1501,16 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block return; } + /* + * Optimize away writes of all zeroes, subsequent reads to + * associated unprovisioned blocks will be zero filled. + */ + if (unlikely(bio_is_zero_filled(bio))) { + cell_defer_no_holder(tc, cell); + bio_endio(bio, 0); + return; + } + r = alloc_data_block(tc, &data_block); switch (r) { case 0: diff --git a/include/linux/bio.h b/include/linux/bio.h index 7347f48..602094b 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -465,6 +465,7 @@ extern struct bio *bio_copy_user_iov(struct request_queue *, int, int, gfp_t); extern int bio_uncopy_user(struct bio *); void zero_fill_bio(struct bio *bio); +bool bio_is_zero_filled(struct bio *bio); extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *); extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int); extern unsigned int bvec_nr_vecs(unsigned short idx); -- 1.7.4.4
next prev parent reply other threads:[~2014-12-04 15:33 UTC|newest] Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top 2014-09-30 18:38 dm-thinp feature request: skip allocation on writes of all zeroes Eric Wheeler 2014-09-30 19:30 ` Zdenek Kabelac 2014-09-30 20:13 ` Mike Snitzer 2014-09-30 22:38 ` Eric Wheeler 2014-12-04 7:05 ` [PATCH] dm-thinp: skip allocation on writes of all zeros to unallocated blocks Eric Wheeler 2014-12-04 7:25 ` Eric Wheeler 2014-12-04 15:33 ` Mike Snitzer [this message] 2014-12-04 15:33 ` [PATCH] dm thin: optimize away writing all zeroes to unprovisioned blocks Mike Snitzer 2014-12-04 15:43 ` Mike Snitzer 2014-12-04 15:43 ` Mike Snitzer 2014-12-06 22:33 ` Eric Wheeler 2014-12-06 22:33 ` Eric Wheeler 2014-12-05 14:47 ` Mike Snitzer 2014-12-05 14:47 ` Mike Snitzer 2014-12-06 22:36 ` Eric Wheeler 2014-12-06 22:36 ` Eric Wheeler 2014-12-05 17:27 ` [PATCH] " Jens Axboe 2014-12-05 17:27 ` Jens Axboe 2014-12-05 18:33 ` Mike Snitzer 2014-12-05 18:33 ` Mike Snitzer 2014-12-06 22:40 ` Eric Wheeler 2014-12-06 22:40 ` Eric Wheeler 2014-12-07 1:41 ` [lvm-devel] " Jens Axboe 2014-12-07 1:41 ` Jens Axboe 2014-12-07 6:30 ` Eric Wheeler 2014-12-07 6:30 ` Eric Wheeler 2014-12-07 6:45 ` Eric Wheeler 2014-12-07 6:45 ` Eric Wheeler 2014-12-08 16:57 ` [lvm-devel] " Jens Axboe 2014-12-08 16:57 ` Jens Axboe 2014-12-09 8:02 ` Eric Wheeler 2014-12-09 8:02 ` Eric Wheeler 2014-12-09 15:31 ` [lvm-devel] " Jens Axboe 2014-12-09 15:31 ` Jens Axboe 2014-12-09 15:41 ` [lvm-devel] " Jens Axboe 2014-12-09 15:41 ` Jens Axboe 2014-12-10 2:52 ` [PATCH] " Eric Wheeler 2014-12-10 2:52 ` Eric Wheeler 2015-01-26 2:53 ` Eric Wheeler 2015-01-26 2:53 ` Eric Wheeler 2015-02-15 0:31 ` [lvm-devel] " Eric Wheeler 2015-02-15 0:31 ` Eric Wheeler 2014-12-09 14:38 ` Marian Csontos 2014-12-09 14:38 ` Marian Csontos 2014-12-07 1:36 ` Jens Axboe 2014-12-07 1:36 ` Jens Axboe
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20141204153358.GA19315@redhat.com \ --to=snitzer@redhat.com \ --cc=axboe@kernel.dk \ --cc=dm-devel@redhat.com \ --cc=ejt@redhat.com \ --cc=ewheeler@ewheeler.net \ --cc=lvm-devel@redhat.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.