All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Fix discards on dm-thin
@ 2012-07-16 18:34 Mikulas Patocka
  2012-07-16 18:34 ` [PATCH 1/3] dm: introduce split_discard_requests Mikulas Patocka
  0 siblings, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2012-07-16 18:34 UTC (permalink / raw)
  To: Alasdair G. Kergon; +Cc: dm-devel, Edward Thornber, Mike Snitzer

Hi

There three patches fix some discard bugs on dm-thin. The patches should 
be placed in Alasdair's patchset after 
"dm-thin-optimize-power-of-two-block-size.patch". They shouldn't be set 
upstream because the bugs they fix exist only in Alasdair's patchset, not 
upstream.

Mikulas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] dm: introduce split_discard_requests
  2012-07-16 18:34 [PATCH 0/3] Fix discards on dm-thin Mikulas Patocka
@ 2012-07-16 18:34 ` Mikulas Patocka
  2012-07-16 18:35   ` [PATCH 2/3] dm-thin: fix discard support Mikulas Patocka
  0 siblings, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2012-07-16 18:34 UTC (permalink / raw)
  To: Alasdair G. Kergon; +Cc: dm-devel, Edward Thornber, Mike Snitzer

dm: introduce split_discard_requests

This patch introduces a new variable split_discard_requests. It can be
set by targets, when it is set, discard requests are split on max_io_len
boundary.

When split_discard_requests is not set, discard requests are split on
target boundary, as it used to be before this patch.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm.c               |    5 ++++-
 include/linux/device-mapper.h |    6 ++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

Index: linux-3.4.4-fast/drivers/md/dm.c
===================================================================
--- linux-3.4.4-fast.orig/drivers/md/dm.c	2012-07-05 23:55:38.000000000 +0200
+++ linux-3.4.4-fast/drivers/md/dm.c	2012-07-05 23:56:41.000000000 +0200
@@ -1244,7 +1244,10 @@ static int __clone_and_map_discard(struc
 		if (!ti->num_discard_requests)
 			return -EOPNOTSUPP;
 
-		len = min(ci->sector_count, max_io_len_target_boundary(ci->sector, ti));
+		if (!ti->split_discard_requests)
+			len = min(ci->sector_count, max_io_len_target_boundary(ci->sector, ti));
+		else
+			len = min(ci->sector_count, max_io_len(ci->sector, ti));
 
 		__issue_target_requests(ci, ti, ti->num_discard_requests, len);
 
Index: linux-3.4.4-fast/include/linux/device-mapper.h
===================================================================
--- linux-3.4.4-fast.orig/include/linux/device-mapper.h	2012-07-05 23:53:47.000000000 +0200
+++ linux-3.4.4-fast/include/linux/device-mapper.h	2012-07-05 23:59:37.000000000 +0200
@@ -220,6 +220,12 @@ struct dm_target {
 	unsigned discards_supported:1;
 
 	/*
+	 * Set if the target required discard request to be split
+	 * on max_io_len boundary.
+	 */
+	unsigned split_discard_requests:1;
+
+	/*
 	 * Set if this target does not return zeroes on discarded blocks.
 	 */
 	unsigned discard_zeroes_data_unsupported:1;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/3] dm-thin: fix discard support
  2012-07-16 18:34 ` [PATCH 1/3] dm: introduce split_discard_requests Mikulas Patocka
@ 2012-07-16 18:35   ` Mikulas Patocka
  2012-07-16 18:35     ` [PATCH 3/3] dm-thin: fix discard_granularity Mikulas Patocka
  2012-07-17 13:26     ` [PATCH 2/3] dm-thin: fix discard support Vivek Goyal
  0 siblings, 2 replies; 13+ messages in thread
From: Mikulas Patocka @ 2012-07-16 18:35 UTC (permalink / raw)
  To: Alasdair G. Kergon; +Cc: dm-devel, Edward Thornber, Mike Snitzer

dm-thin: fix discard support

There is a bug in dm_thin regarding processing discards.
When dm-thin receives a discard request with size equal to block size
that is not aligned on block size boundary, io_overlaps_block returns
true, process_discard treats this discard as a full block discard,
deletes the full block - the result is that some data that shouldn't be
discarded are discarded.

This patch sets the variable "ti->split_discard_requests", so that
device mapper core splits discard requests on a block boundary.

Consequently, a discard request that spans multiple blocks is never sent
to dm-thin. The patch also removes some code in process_discard that
deals with discards that span multiple blocks.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-thin.c |   18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
===================================================================
--- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 18:46:18.000000000 +0200
+++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:07:19.000000000 +0200
@@ -1246,17 +1246,10 @@ static void process_discard(struct thin_
 			}
 		} else {
 			/*
-			 * This path is hit if people are ignoring
-			 * limits->discard_granularity.  It ignores any
-			 * part of the discard that is in a subsequent
-			 * block.
+			 * The dm makes sure that the discard doesn't span
+			 * a block boundary. So we submit the discard
+			 * to the appropriate block.
 			 */
-			sector_t offset = pool->sectors_per_block_shift >= 0 ?
-			      bio->bi_sector & (pool->sectors_per_block - 1) :
-			      bio->bi_sector - block * pool->sectors_per_block;
-			unsigned remaining = (pool->sectors_per_block - offset) << SECTOR_SHIFT;
-			bio->bi_size = min(bio->bi_size, remaining);
-
 			cell_release_singleton(cell, bio);
 			cell_release_singleton(cell2, bio);
 			remap_and_issue(tc, bio, lookup_result.block);
@@ -2506,7 +2499,8 @@ static void set_discard_limits(struct po
 
 	/*
 	 * This is just a hint, and not enforced.  We have to cope with
-	 * bios that overlap 2 blocks.
+	 * bios cover a block partially.  A discard that spans a block boundary
+	 * is not sent to this target.
 	 */
 	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
 	limits->discard_zeroes_data = pool->pf.zero_new_blocks;
@@ -2648,6 +2642,8 @@ static int thin_ctr(struct dm_target *ti
 	if (tc->pool->pf.discard_enabled) {
 		ti->discards_supported = 1;
 		ti->num_discard_requests = 1;
+		/* Discard requests must be split on a chunk boundary */
+		ti->split_discard_requests = 1;
 	}
 
 	dm_put(pool_md);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/3] dm-thin: fix discard_granularity
  2012-07-16 18:35   ` [PATCH 2/3] dm-thin: fix discard support Mikulas Patocka
@ 2012-07-16 18:35     ` Mikulas Patocka
  2012-07-17 15:31       ` Mike Snitzer
  2012-07-17 13:26     ` [PATCH 2/3] dm-thin: fix discard support Vivek Goyal
  1 sibling, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2012-07-16 18:35 UTC (permalink / raw)
  To: Alasdair G. Kergon; +Cc: dm-devel, Edward Thornber, Mike Snitzer

dm-thin: fix discard_granularity

The kernel expects that limits->discard_granularity is a power of two.
Set this limit only if we use a power of two block size.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-thin.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
===================================================================
--- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 20:07:49.000000000 +0200
+++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:08:01.000000000 +0200
@@ -2502,7 +2502,8 @@ static void set_discard_limits(struct po
 	 * bios cover a block partially.  A discard that spans a block boundary
 	 * is not sent to this target.
 	 */
-	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
+	if (pool->sectors_per_block_shift >= 0)
+		limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
 	limits->discard_zeroes_data = pool->pf.zero_new_blocks;
 }
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] dm-thin: fix discard support
  2012-07-16 18:35   ` [PATCH 2/3] dm-thin: fix discard support Mikulas Patocka
  2012-07-16 18:35     ` [PATCH 3/3] dm-thin: fix discard_granularity Mikulas Patocka
@ 2012-07-17 13:26     ` Vivek Goyal
  2012-07-17 13:58       ` Mike Snitzer
  1 sibling, 1 reply; 13+ messages in thread
From: Vivek Goyal @ 2012-07-17 13:26 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: dm-devel, Edward Thornber, Mike Snitzer, Alasdair G. Kergon

On Mon, Jul 16, 2012 at 02:35:18PM -0400, Mikulas Patocka wrote:
> dm-thin: fix discard support
> 
> There is a bug in dm_thin regarding processing discards.
> When dm-thin receives a discard request with size equal to block size
> that is not aligned on block size boundary, io_overlaps_block returns
> true, process_discard treats this discard as a full block discard,
  ^^^^
> deletes the full block - the result is that some data that shouldn't be
> discarded are discarded.

Looking at io_overlaps_block(), it looks like it will return false (and
not true) for bios which are not aligned to block size boundary.

static int io_overlaps_block(struct pool *pool, struct bio *bio)
{
        return !(bio->bi_sector & pool->offset_mask) &&
                (bio->bi_size == (pool->sectors_per_block << SECTOR_SHIFT));

}

Hence for block which crosses block size boundary, we should be sending
discard down for partial block as per the current code and no harm should
be done?


> 
> This patch sets the variable "ti->split_discard_requests", so that
> device mapper core splits discard requests on a block boundary.
> 
> Consequently, a discard request that spans multiple blocks is never sent
> to dm-thin. The patch also removes some code in process_discard that
> deals with discards that span multiple blocks.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> ---
>  drivers/md/dm-thin.c |   18 +++++++-----------
>  1 file changed, 7 insertions(+), 11 deletions(-)
> 
> Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
> ===================================================================
> --- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 18:46:18.000000000 +0200
> +++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:07:19.000000000 +0200
> @@ -1246,17 +1246,10 @@ static void process_discard(struct thin_
>  			}
>  		} else {
>  			/*
> -			 * This path is hit if people are ignoring
> -			 * limits->discard_granularity.  It ignores any
> -			 * part of the discard that is in a subsequent
> -			 * block.
> +			 * The dm makes sure that the discard doesn't span
> +			 * a block boundary. So we submit the discard
> +			 * to the appropriate block.
>  			 */
> -			sector_t offset = pool->sectors_per_block_shift >= 0 ?
> -			      bio->bi_sector & (pool->sectors_per_block - 1) :
> -			      bio->bi_sector - block * pool->sectors_per_block;
> -			unsigned remaining = (pool->sectors_per_block - offset) << SECTOR_SHIFT;
> -			bio->bi_size = min(bio->bi_size, remaining);
> -

So previous code will also send down partial block discard and this code
will also send down partial discard. So nothing has changed from
functionality point of view?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] dm-thin: fix discard support
  2012-07-17 13:26     ` [PATCH 2/3] dm-thin: fix discard support Vivek Goyal
@ 2012-07-17 13:58       ` Mike Snitzer
  2012-07-17 14:18         ` Mikulas Patocka
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Snitzer @ 2012-07-17 13:58 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Edward Thornber, dm-devel, Mikulas Patocka, Alasdair G. Kergon

On Tue, Jul 17 2012 at  9:26am -0400,
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Mon, Jul 16, 2012 at 02:35:18PM -0400, Mikulas Patocka wrote:
> > dm-thin: fix discard support
> > 
> > There is a bug in dm_thin regarding processing discards.
> > When dm-thin receives a discard request with size equal to block size
> > that is not aligned on block size boundary, io_overlaps_block returns
> > true, process_discard treats this discard as a full block discard,
>   ^^^^
> > deletes the full block - the result is that some data that shouldn't be
> > discarded are discarded.
> 
> Looking at io_overlaps_block(), it looks like it will return false (and
> not true) for bios which are not aligned to block size boundary.
> 
> static int io_overlaps_block(struct pool *pool, struct bio *bio)
> {
>         return !(bio->bi_sector & pool->offset_mask) &&
>                 (bio->bi_size == (pool->sectors_per_block << SECTOR_SHIFT));
> 
> }
> 
> Hence for block which crosses block size boundary, we should be sending
> discard down for partial block as per the current code and no harm should
> be done?

Right, not sure why Mikulas read that as it'd return true.
 
> > This patch sets the variable "ti->split_discard_requests", so that
> > device mapper core splits discard requests on a block boundary.
> > 
> > Consequently, a discard request that spans multiple blocks is never sent
> > to dm-thin. The patch also removes some code in process_discard that
> > deals with discards that span multiple blocks.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > 
> > ---
> >  drivers/md/dm-thin.c |   18 +++++++-----------
> >  1 file changed, 7 insertions(+), 11 deletions(-)
> > 
> > Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
> > ===================================================================
> > --- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 18:46:18.000000000 +0200
> > +++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:07:19.000000000 +0200
> > @@ -1246,17 +1246,10 @@ static void process_discard(struct thin_
> >  			}
> >  		} else {
> >  			/*
> > -			 * This path is hit if people are ignoring
> > -			 * limits->discard_granularity.  It ignores any
> > -			 * part of the discard that is in a subsequent
> > -			 * block.
> > +			 * The dm makes sure that the discard doesn't span
> > +			 * a block boundary. So we submit the discard
> > +			 * to the appropriate block.
> >  			 */
> > -			sector_t offset = pool->sectors_per_block_shift >= 0 ?
> > -			      bio->bi_sector & (pool->sectors_per_block - 1) :
> > -			      bio->bi_sector - block * pool->sectors_per_block;
> > -			unsigned remaining = (pool->sectors_per_block - offset) << SECTOR_SHIFT;
> > -			bio->bi_size = min(bio->bi_size, remaining);
> > -
> 
> So previous code will also send down partial block discard and this code
> will also send down partial discard. So nothing has changed from
> functionality point of view?

The change is the bit that you trimmed:

ti->split_discard_requests = 1;

That will restrict the size of the discard to be on a blocksize
boundary.

But I'm really not sure we want to impose such small discards -- though
the current thinp code does have a problem with discards that are too
large (I need to dig up specifics that Joe conveyed to me a few weeks
back; I was asking: "why cannot the thinp device have discard limits
that match the underlying data device's discard limits?").

<snitm> why not just rely on the max of the underlying device?
<ejt> we have to quiesce the blocks we'e about to discard
<snitm> e.g. remove the explicit override for max_bytes in set_discard_limits?
<ejt> no, it's not simple at all.
<ejt> have to be very careful we can service the discard in bounded memory
<snitm> so you don't think thinp can handle processing what the hardware can?
<ejt> not yet; I'd need to change the bio_prison to be able to lock
ranges of blocks, not just single ones.  And add a btree_trim method to
prune a btree.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] dm-thin: fix discard support
  2012-07-17 13:58       ` Mike Snitzer
@ 2012-07-17 14:18         ` Mikulas Patocka
  2012-07-17 14:49           ` Mike Snitzer
  0 siblings, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2012-07-17 14:18 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Edward Thornber, Alasdair G. Kergon, Vivek Goyal



On Tue, 17 Jul 2012, Mike Snitzer wrote:

> On Tue, Jul 17 2012 at  9:26am -0400,
> Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > On Mon, Jul 16, 2012 at 02:35:18PM -0400, Mikulas Patocka wrote:
> > > dm-thin: fix discard support
> > > 
> > > There is a bug in dm_thin regarding processing discards.
> > > When dm-thin receives a discard request with size equal to block size
> > > that is not aligned on block size boundary, io_overlaps_block returns
> > > true, process_discard treats this discard as a full block discard,
> >   ^^^^
> > > deletes the full block - the result is that some data that shouldn't be
> > > discarded are discarded.
> > 
> > Looking at io_overlaps_block(), it looks like it will return false (and
> > not true) for bios which are not aligned to block size boundary.
> > 
> > static int io_overlaps_block(struct pool *pool, struct bio *bio)
> > {
> >         return !(bio->bi_sector & pool->offset_mask) &&
> >                 (bio->bi_size == (pool->sectors_per_block << SECTOR_SHIFT));
> > 
> > }
> > 
> > Hence for block which crosses block size boundary, we should be sending
> > discard down for partial block as per the current code and no harm should
> > be done?
> 
> Right, not sure why Mikulas read that as it'd return true.

The patch refers to the patchset that will be sent out for the next 
kernel: http://people.redhat.com/agk/patches/linux/editing/series.html

In the current 3.5-rc code unaligned discard is partially ignored. In the 
patchset it causes wrong data to be discarded.

> > > This patch sets the variable "ti->split_discard_requests", so that
> > > device mapper core splits discard requests on a block boundary.
> > > 
> > > Consequently, a discard request that spans multiple blocks is never sent
> > > to dm-thin. The patch also removes some code in process_discard that
> > > deals with discards that span multiple blocks.
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > 
> > > ---
> > >  drivers/md/dm-thin.c |   18 +++++++-----------
> > >  1 file changed, 7 insertions(+), 11 deletions(-)
> > > 
> > > Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
> > > ===================================================================
> > > --- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 18:46:18.000000000 +0200
> > > +++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:07:19.000000000 +0200
> > > @@ -1246,17 +1246,10 @@ static void process_discard(struct thin_
> > >  			}
> > >  		} else {
> > >  			/*
> > > -			 * This path is hit if people are ignoring
> > > -			 * limits->discard_granularity.  It ignores any
> > > -			 * part of the discard that is in a subsequent
> > > -			 * block.
> > > +			 * The dm makes sure that the discard doesn't span
> > > +			 * a block boundary. So we submit the discard
> > > +			 * to the appropriate block.
> > >  			 */
> > > -			sector_t offset = pool->sectors_per_block_shift >= 0 ?
> > > -			      bio->bi_sector & (pool->sectors_per_block - 1) :
> > > -			      bio->bi_sector - block * pool->sectors_per_block;
> > > -			unsigned remaining = (pool->sectors_per_block - offset) << SECTOR_SHIFT;
> > > -			bio->bi_size = min(bio->bi_size, remaining);
> > > -
> > 
> > So previous code will also send down partial block discard and this code
> > will also send down partial discard. So nothing has changed from
> > functionality point of view?
> 
> The change is the bit that you trimmed:
> 
> ti->split_discard_requests = 1;
> 
> That will restrict the size of the discard to be on a blocksize
> boundary.
> 
> But I'm really not sure we want to impose such small discards -- though

Because the code can't handle large discards --- it trims them to a block 
boundary and sends the trimmed request to the first chunk.

> the current thinp code does have a problem with discards that are too
> large (I need to dig up specifics that Joe conveyed to me a few weeks
> back; I was asking: "why cannot the thinp device have discard limits
> that match the underlying data device's discard limits?").
> 
> <snitm> why not just rely on the max of the underlying device?
> <ejt> we have to quiesce the blocks we'e about to discard
> <snitm> e.g. remove the explicit override for max_bytes in set_discard_limits?
> <ejt> no, it's not simple at all.
> <ejt> have to be very careful we can service the discard in bounded memory
> <snitm> so you don't think thinp can handle processing what the hardware can?
> <ejt> not yet; I'd need to change the bio_prison to be able to lock
> ranges of blocks, not just single ones.  And add a btree_trim method to
> prune a btree.

When Joe implements this, ti->split_discard_requests could be cleared.

Mikulas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] dm-thin: fix discard support
  2012-07-17 14:18         ` Mikulas Patocka
@ 2012-07-17 14:49           ` Mike Snitzer
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Snitzer @ 2012-07-17 14:49 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: dm-devel, Edward Thornber, Alasdair G. Kergon, Vivek Goyal

On Tue, Jul 17 2012 at 10:18am -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Tue, 17 Jul 2012, Mike Snitzer wrote:
> 
> > On Tue, Jul 17 2012 at  9:26am -0400,
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > On Mon, Jul 16, 2012 at 02:35:18PM -0400, Mikulas Patocka wrote:
> > > > dm-thin: fix discard support
> > > > 
> > > > There is a bug in dm_thin regarding processing discards.
> > > > When dm-thin receives a discard request with size equal to block size
> > > > that is not aligned on block size boundary, io_overlaps_block returns
> > > > true, process_discard treats this discard as a full block discard,
> > >   ^^^^
> > > > deletes the full block - the result is that some data that shouldn't be
> > > > discarded are discarded.
> > > 
> > > Looking at io_overlaps_block(), it looks like it will return false (and
> > > not true) for bios which are not aligned to block size boundary.
> > > 
> > > static int io_overlaps_block(struct pool *pool, struct bio *bio)
> > > {
> > >         return !(bio->bi_sector & pool->offset_mask) &&
> > >                 (bio->bi_size == (pool->sectors_per_block << SECTOR_SHIFT));
> > > 
> > > }
> > > 
> > > Hence for block which crosses block size boundary, we should be sending
> > > discard down for partial block as per the current code and no harm should
> > > be done?
> > 
> > Right, not sure why Mikulas read that as it'd return true.
> 
> The patch refers to the patchset that will be sent out for the next 
> kernel: http://people.redhat.com/agk/patches/linux/editing/series.html
> 
> In the current 3.5-rc code unaligned discard is partially ignored. In the 
> patchset it causes wrong data to be discarded.

Alasdair should reorder the patches then.  Your thinp discard changes
should come before the patch with the io_overlaps_block optimization
(that optimization causes it to return true for this case).

> > > > This patch sets the variable "ti->split_discard_requests", so that
> > > > device mapper core splits discard requests on a block boundary.
> > > > 
> > > > Consequently, a discard request that spans multiple blocks is never sent
> > > > to dm-thin. The patch also removes some code in process_discard that
> > > > deals with discards that span multiple blocks.
> > > > 
> > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > > 
> > > > ---
> > > >  drivers/md/dm-thin.c |   18 +++++++-----------
> > > >  1 file changed, 7 insertions(+), 11 deletions(-)
> > > > 
> > > > Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
> > > > ===================================================================
> > > > --- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 18:46:18.000000000 +0200
> > > > +++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:07:19.000000000 +0200
> > > > @@ -1246,17 +1246,10 @@ static void process_discard(struct thin_
> > > >  			}
> > > >  		} else {
> > > >  			/*
> > > > -			 * This path is hit if people are ignoring
> > > > -			 * limits->discard_granularity.  It ignores any
> > > > -			 * part of the discard that is in a subsequent
> > > > -			 * block.
> > > > +			 * The dm makes sure that the discard doesn't span
> > > > +			 * a block boundary. So we submit the discard
> > > > +			 * to the appropriate block.
> > > >  			 */
> > > > -			sector_t offset = pool->sectors_per_block_shift >= 0 ?
> > > > -			      bio->bi_sector & (pool->sectors_per_block - 1) :
> > > > -			      bio->bi_sector - block * pool->sectors_per_block;
> > > > -			unsigned remaining = (pool->sectors_per_block - offset) << SECTOR_SHIFT;
> > > > -			bio->bi_size = min(bio->bi_size, remaining);
> > > > -
> > > 
> > > So previous code will also send down partial block discard and this code
> > > will also send down partial discard. So nothing has changed from
> > > functionality point of view?
> > 
> > The change is the bit that you trimmed:
> > 
> > ti->split_discard_requests = 1;
> > 
> > That will restrict the size of the discard to be on a blocksize
> > boundary.
> > 
> > But I'm really not sure we want to impose such small discards -- though
> 
> Because the code can't handle large discards --- it trims them to a block 
> boundary and sends the trimmed request to the first chunk.

Yeap, fair enough.

> > the current thinp code does have a problem with discards that are too
> > large (I need to dig up specifics that Joe conveyed to me a few weeks
> > back; I was asking: "why cannot the thinp device have discard limits
> > that match the underlying data device's discard limits?").
> > 
> > <snitm> why not just rely on the max of the underlying device?
> > <ejt> we have to quiesce the blocks we'e about to discard
> > <snitm> e.g. remove the explicit override for max_bytes in set_discard_limits?
> > <ejt> no, it's not simple at all.
> > <ejt> have to be very careful we can service the discard in bounded memory
> > <snitm> so you don't think thinp can handle processing what the hardware can?
> > <ejt> not yet; I'd need to change the bio_prison to be able to lock
> > ranges of blocks, not just single ones.  And add a btree_trim method to
> > prune a btree.
> 
> When Joe implements this, ti->split_discard_requests could be cleared.

Sure, I'm fine with this.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] dm-thin: fix discard_granularity
  2012-07-16 18:35     ` [PATCH 3/3] dm-thin: fix discard_granularity Mikulas Patocka
@ 2012-07-17 15:31       ` Mike Snitzer
  2012-07-17 19:35         ` Mikulas Patocka
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Snitzer @ 2012-07-17 15:31 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, Edward Thornber, Alasdair G. Kergon

On Mon, Jul 16 2012 at  2:35pm -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> dm-thin: fix discard_granularity
> 
> The kernel expects that limits->discard_granularity is a power of two.
> Set this limit only if we use a power of two block size.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> ---
>  drivers/md/dm-thin.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
> ===================================================================
> --- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 20:07:49.000000000 +0200
> +++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:08:01.000000000 +0200
> @@ -2502,7 +2502,8 @@ static void set_discard_limits(struct po
>  	 * bios cover a block partially.  A discard that spans a block boundary
>  	 * is not sent to this target.
>  	 */
> -	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> +	if (pool->sectors_per_block_shift >= 0)
> +		limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
>  	limits->discard_zeroes_data = pool->pf.zero_new_blocks;
>  }

Given the block layer's assumption that discard_granularity is always a
power of 2: thinp should disable discard if the thinp blocksize is a non
power of 2.  So this patch isn't correct (discard support should be
disabled in pool_ctr based on the specified blocksize).

Future work should be:
1) explore/eliminate the block layer's power of 2 constraints
2) divorce thinp's discard limits (for discard passdown) from the pool's
   blocksize

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] dm-thin: fix discard_granularity
  2012-07-17 15:31       ` Mike Snitzer
@ 2012-07-17 19:35         ` Mikulas Patocka
  2012-07-17 21:52           ` Vivek Goyal
  0 siblings, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2012-07-17 19:35 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Edward Thornber, Alasdair G. Kergon



On Tue, 17 Jul 2012, Mike Snitzer wrote:

> On Mon, Jul 16 2012 at  2:35pm -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > dm-thin: fix discard_granularity
> > 
> > The kernel expects that limits->discard_granularity is a power of two.
> > Set this limit only if we use a power of two block size.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > 
> > ---
> >  drivers/md/dm-thin.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
> > ===================================================================
> > --- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 20:07:49.000000000 +0200
> > +++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:08:01.000000000 +0200
> > @@ -2502,7 +2502,8 @@ static void set_discard_limits(struct po
> >  	 * bios cover a block partially.  A discard that spans a block boundary
> >  	 * is not sent to this target.
> >  	 */
> > -	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> > +	if (pool->sectors_per_block_shift >= 0)
> > +		limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> >  	limits->discard_zeroes_data = pool->pf.zero_new_blocks;
> >  }
> 
> Given the block layer's assumption that discard_granularity is always a
> power of 2: thinp should disable discard if the thinp blocksize is a non
> power of 2.  So this patch isn't correct (discard support should be
> disabled in pool_ctr based on the specified blocksize).

discard_granularity is just a hint (and IMHO quite useless hint).

The documentation says that it indicates a size of internal allocation 
unit that may be larger than the block size. The code doesn't use it this 
way - it is used in FITRIM ioctl where it specifies the minimum request 
size to be sent. It is also used in blkdev_issue_discard where it is used 
to round down the number of sectors to discard on discard_granularity 
boundary - this is wrong, it aligns request size on discard_granularity 
boundary, but it doesn't align request start on this boundary.

I don't see a reason why we should disable discard because of an a rarely 
used hint.

Mikulas

> Future work should be:
> 1) explore/eliminate the block layer's power of 2 constraints
> 2) divorce thinp's discard limits (for discard passdown) from the pool's
>    blocksize
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] dm-thin: fix discard_granularity
  2012-07-17 19:35         ` Mikulas Patocka
@ 2012-07-17 21:52           ` Vivek Goyal
  2012-07-18 19:10             ` Mikulas Patocka
  0 siblings, 1 reply; 13+ messages in thread
From: Vivek Goyal @ 2012-07-17 21:52 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: dm-devel, Edward Thornber, Alasdair G. Kergon, Mike Snitzer

On Tue, Jul 17, 2012 at 03:35:04PM -0400, Mikulas Patocka wrote:
> 
> 
> On Tue, 17 Jul 2012, Mike Snitzer wrote:
> 
> > On Mon, Jul 16 2012 at  2:35pm -0400,
> > Mikulas Patocka <mpatocka@redhat.com> wrote:
> > 
> > > dm-thin: fix discard_granularity
> > > 
> > > The kernel expects that limits->discard_granularity is a power of two.
> > > Set this limit only if we use a power of two block size.
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > 
> > > ---
> > >  drivers/md/dm-thin.c |    3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
> > > ===================================================================
> > > --- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 20:07:49.000000000 +0200
> > > +++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:08:01.000000000 +0200
> > > @@ -2502,7 +2502,8 @@ static void set_discard_limits(struct po
> > >  	 * bios cover a block partially.  A discard that spans a block boundary
> > >  	 * is not sent to this target.
> > >  	 */
> > > -	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> > > +	if (pool->sectors_per_block_shift >= 0)
> > > +		limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> > >  	limits->discard_zeroes_data = pool->pf.zero_new_blocks;
> > >  }
> > 
> > Given the block layer's assumption that discard_granularity is always a
> > power of 2: thinp should disable discard if the thinp blocksize is a non
> > power of 2.  So this patch isn't correct (discard support should be
> > disabled in pool_ctr based on the specified blocksize).
> 
> discard_granularity is just a hint (and IMHO quite useless hint).
> 
> The documentation says that it indicates a size of internal allocation 
> unit that may be larger than the block size. The code doesn't use it this 
> way - it is used in FITRIM ioctl where it specifies the minimum request 
> size to be sent. It is also used in blkdev_issue_discard where it is used 
> to round down the number of sectors to discard on discard_granularity 
> boundary - this is wrong, it aligns request size on discard_granularity 
> boundary, but it doesn't align request start on this boundary.

I am not sure I understand completely what you are trying to say. But
after paolo's patch, blkdev_issue_discard() will take into account
max_discard_sectors to limit max discard request size and use
discard_granularity and discard_alignment to determine aligned request start.

First request in the range will go as it is and can be unaligned but if
discard range is big, then rest of the request start will be aligned.

Because there might be an unligned requests at the start of range drivers
will still have to handle unaligned requests, i think.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] dm-thin: fix discard_granularity
  2012-07-17 21:52           ` Vivek Goyal
@ 2012-07-18 19:10             ` Mikulas Patocka
  2012-07-19 15:44               ` Vivek Goyal
  0 siblings, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2012-07-18 19:10 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: paolo.bonzini, dm-devel, Edward Thornber, Alasdair G. Kergon,
	Mike Snitzer



On Tue, 17 Jul 2012, Vivek Goyal wrote:

> On Tue, Jul 17, 2012 at 03:35:04PM -0400, Mikulas Patocka wrote:
> > 
> > 
> > On Tue, 17 Jul 2012, Mike Snitzer wrote:
> > 
> > > On Mon, Jul 16 2012 at  2:35pm -0400,
> > > Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > 
> > > > dm-thin: fix discard_granularity
> > > > 
> > > > The kernel expects that limits->discard_granularity is a power of two.
> > > > Set this limit only if we use a power of two block size.
> > > > 
> > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > > 
> > > > ---
> > > >  drivers/md/dm-thin.c |    3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
> > > > ===================================================================
> > > > --- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 20:07:49.000000000 +0200
> > > > +++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:08:01.000000000 +0200
> > > > @@ -2502,7 +2502,8 @@ static void set_discard_limits(struct po
> > > >  	 * bios cover a block partially.  A discard that spans a block boundary
> > > >  	 * is not sent to this target.
> > > >  	 */
> > > > -	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> > > > +	if (pool->sectors_per_block_shift >= 0)
> > > > +		limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> > > >  	limits->discard_zeroes_data = pool->pf.zero_new_blocks;
> > > >  }
> > > 
> > > Given the block layer's assumption that discard_granularity is always a
> > > power of 2: thinp should disable discard if the thinp blocksize is a non
> > > power of 2.  So this patch isn't correct (discard support should be
> > > disabled in pool_ctr based on the specified blocksize).
> > 
> > discard_granularity is just a hint (and IMHO quite useless hint).
> > 
> > The documentation says that it indicates a size of internal allocation 
> > unit that may be larger than the block size. The code doesn't use it this 
> > way - it is used in FITRIM ioctl where it specifies the minimum request 
> > size to be sent. It is also used in blkdev_issue_discard where it is used 
> > to round down the number of sectors to discard on discard_granularity 
> > boundary - this is wrong, it aligns request size on discard_granularity 
> > boundary, but it doesn't align request start on this boundary.
> 
> I am not sure I understand completely what you are trying to say.

I mean that it is used inconsistently - sometimes it is used as s minimum 
request to be sent (requests smaller than discard_granularity are not 
sent). And sometimes length is rounded down to discard_granularity 
boundary.

> But
> after paolo's patch, blkdev_issue_discard() will take into account
> max_discard_sectors to limit max discard request size and use
> discard_granularity and discard_alignment to determine aligned request start.

The question is - how are we supposed to propagate these parameters 
through linearly appended devices, raid0, raid1 or other mappings?

For example, if you have a logical volume that consists of two linearly 
appended disks, disk1 with discard_granularity1,discard_alignment1 and 
disk2 with discard_granularity2,discard_alignment2 - tell me, how do you 
calculate discard_granularity and discard_alignment for the combined 
logical device from these four numbers? How do you calculate it if those 
two disks are in raid0 or raid1?

It seems to me that would be much better to state that discard request 
size is unlimited and break one long discard to several smaller discard 
requests at the physical disk driver - then, the problem with combining 
the limits would go away.

> First request in the range will go as it is and can be unaligned but if
> discard range is big, then rest of the request start will be aligned.
> 
> Because there might be an unligned requests at the start of range drivers
> will still have to handle unaligned requests, i think.
> 
> Thanks
> Vivek

Mikulas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] dm-thin: fix discard_granularity
  2012-07-18 19:10             ` Mikulas Patocka
@ 2012-07-19 15:44               ` Vivek Goyal
  0 siblings, 0 replies; 13+ messages in thread
From: Vivek Goyal @ 2012-07-19 15:44 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: paolo.bonzini, dm-devel, Edward Thornber, Alasdair G. Kergon,
	Mike Snitzer

On Wed, Jul 18, 2012 at 03:10:21PM -0400, Mikulas Patocka wrote:
> 
> 
> On Tue, 17 Jul 2012, Vivek Goyal wrote:
> 
> > On Tue, Jul 17, 2012 at 03:35:04PM -0400, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Tue, 17 Jul 2012, Mike Snitzer wrote:
> > > 
> > > > On Mon, Jul 16 2012 at  2:35pm -0400,
> > > > Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > > 
> > > > > dm-thin: fix discard_granularity
> > > > > 
> > > > > The kernel expects that limits->discard_granularity is a power of two.
> > > > > Set this limit only if we use a power of two block size.
> > > > > 
> > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > > > 
> > > > > ---
> > > > >  drivers/md/dm-thin.c |    3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > 
> > > > > Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
> > > > > ===================================================================
> > > > > --- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 20:07:49.000000000 +0200
> > > > > +++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:08:01.000000000 +0200
> > > > > @@ -2502,7 +2502,8 @@ static void set_discard_limits(struct po
> > > > >  	 * bios cover a block partially.  A discard that spans a block boundary
> > > > >  	 * is not sent to this target.
> > > > >  	 */
> > > > > -	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> > > > > +	if (pool->sectors_per_block_shift >= 0)
> > > > > +		limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> > > > >  	limits->discard_zeroes_data = pool->pf.zero_new_blocks;
> > > > >  }
> > > > 
> > > > Given the block layer's assumption that discard_granularity is always a
> > > > power of 2: thinp should disable discard if the thinp blocksize is a non
> > > > power of 2.  So this patch isn't correct (discard support should be
> > > > disabled in pool_ctr based on the specified blocksize).
> > > 
> > > discard_granularity is just a hint (and IMHO quite useless hint).
> > > 
> > > The documentation says that it indicates a size of internal allocation 
> > > unit that may be larger than the block size. The code doesn't use it this 
> > > way - it is used in FITRIM ioctl where it specifies the minimum request 
> > > size to be sent. It is also used in blkdev_issue_discard where it is used 
> > > to round down the number of sectors to discard on discard_granularity 
> > > boundary - this is wrong, it aligns request size on discard_granularity 
> > > boundary, but it doesn't align request start on this boundary.
> > 
> > I am not sure I understand completely what you are trying to say.
> 
> I mean that it is used inconsistently

We can always fix inconsistent use.

> - sometimes it is used as s minimum 
> request to be sent (requests smaller than discard_granularity are not 
> sent). And sometimes length is rounded down to discard_granularity 
> boundary.

Shouldn't it be used as both. If you use it only to determine the minimum
request size, how do you come up with alignment for your next request?

I thought these are hints so you try your best and then leave it to
physical device/driver to determine how to deal with it now.

> 
> > But
> > after paolo's patch, blkdev_issue_discard() will take into account
> > max_discard_sectors to limit max discard request size and use
> > discard_granularity and discard_alignment to determine aligned request start.
> 
> The question is - how are we supposed to propagate these parameters 
> through linearly appended devices, raid0, raid1 or other mappings?

We already deal with it in stacking limits. (blk_stack_limits()). Now
one can question whether that's the most optimal way to do things or
not.

> 
> For example, if you have a logical volume that consists of two linearly 
> appended disks, disk1 with discard_granularity1,discard_alignment1 and 
> disk2 with discard_granularity2,discard_alignment2 - tell me, how do you 
> calculate discard_granularity and discard_alignment for the combined 
> logical device from these four numbers? How do you calculate it if those 
> two disks are in raid0 or raid1?

I am looking at current logic in blk_stack_limits().

- For discard granularity, it just takes maximum of granularity1 and
  granularity2. So as long as one granularity is multiple of other
  granularity, things are just fine.

- For discard_alignment we seem to be taking lcm() of both the values but
  I can't wrap my head around it that why does that work.

- For max_discard_sectors, we take minimum of two devices.

So discard_alignment seems to be only odd piece w.r.t stacking.

> 
> It seems to me that would be much better to state that discard request 
> size is unlimited and break one long discard to several smaller discard 
> requests at the physical disk driver - then, the problem with combining 
> the limits would go away.

May be. But if hints can be propagated and bio's don't have be broken
down, then it should make life simpler for drivers writers.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-07-19 15:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-16 18:34 [PATCH 0/3] Fix discards on dm-thin Mikulas Patocka
2012-07-16 18:34 ` [PATCH 1/3] dm: introduce split_discard_requests Mikulas Patocka
2012-07-16 18:35   ` [PATCH 2/3] dm-thin: fix discard support Mikulas Patocka
2012-07-16 18:35     ` [PATCH 3/3] dm-thin: fix discard_granularity Mikulas Patocka
2012-07-17 15:31       ` Mike Snitzer
2012-07-17 19:35         ` Mikulas Patocka
2012-07-17 21:52           ` Vivek Goyal
2012-07-18 19:10             ` Mikulas Patocka
2012-07-19 15:44               ` Vivek Goyal
2012-07-17 13:26     ` [PATCH 2/3] dm-thin: fix discard support Vivek Goyal
2012-07-17 13:58       ` Mike Snitzer
2012-07-17 14:18         ` Mikulas Patocka
2012-07-17 14:49           ` Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.