linux-kernel.vger.kernel.org archive mirror
* [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
@ 2010-03-06 21:10 Lars Ellenberg
  2010-03-08  5:33 ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ellenberg @ 2010-03-06 21:10 UTC (permalink / raw)
  To: Alasdair G Kergon
  Cc: device-mapper development, Mikulas Patocka, Neil Brown,
	jens.axboe, linux-kernel

If the lower device exposes a merge_bvec_fn,
dm_set_device_limits() restricts max_sectors
to PAGE_SIZE "just to be safe".

This is not sufficient, however.

If someone uses bio_add_page() to add 8 disjunct 512 byte partial
pages to a bio, it would succeed, but could still cross a border
of whatever restrictions are below us (e.g. raid10 stripe boundary).
An attempted bio_split() would not succeed, because bi_vcnt is 8.

One example that triggered this frequently is the xen io layer.

raid10_make_request bug: can't convert block across chunks or bigger than 64k 209265151 1

Signed-off-by: Lars <lars.ellenberg@linbit.com>

---

Neil: you may want to double check linear.c and raid*.c for similar logic,
even though it is unlikely that someone puts md raid6 on top of something
exposing a merge_bvec_fn.

This is not the first time this has been patched, btw.
See https://bugzilla.redhat.com/show_bug.cgi?id=440093
and the patch by Mikulas:
https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff

---
 drivers/md/dm-table.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 4b22feb..c686ff4 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -515,14 +515,22 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
 
 	/*
 	 * Check if merge fn is supported.
-	 * If not we'll force DM to use PAGE_SIZE or
+	 * If not we'll force DM to use single bio_vec of PAGE_SIZE or
 	 * smaller I/O, just to be safe.
 	 */
 
-	if (q->merge_bvec_fn && !ti->type->merge)
+	if (q->merge_bvec_fn && !ti->type->merge) {
 		limits->max_sectors =
 			min_not_zero(limits->max_sectors,
 				     (unsigned int) (PAGE_SIZE >> 9));
+		/* Restricting max_sectors is not enough.
+		 * If someone uses bio_add_page to add 8 disjunct 512 byte
+		 * partial pages to a bio, it would succeed,
+		 * but could still cross a border of whatever restrictions
+		 * are below us (e.g. raid0 stripe boundary).  An attempted
+		 * bio_split() would not succeed, because bi_vcnt is 8. */
+		limits->max_segments = 1;
+	}
 	return 0;
 }
 EXPORT_SYMBOL_GPL(dm_set_device_limits);
-- 
1.6.3.3



* Re: [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
  2010-03-06 21:10 [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported Lars Ellenberg
@ 2010-03-08  5:33 ` Neil Brown
  2010-03-08  8:35   ` Mikulas Patocka
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2010-03-08  5:33 UTC (permalink / raw)
  To: Lars Ellenberg
  Cc: Alasdair G Kergon, device-mapper development, Mikulas Patocka,
	jens.axboe, linux-kernel

On Sat, 6 Mar 2010 22:10:13 +0100
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

> If the lower device exposes a merge_bvec_fn,
> dm_set_device_limits() restricts max_sectors
> to PAGE_SIZE "just to be safe".
> 
> This is not sufficient, however.
> 
> If someone uses bio_add_page() to add 8 disjunct 512 byte partial
> pages to a bio, it would succeed, but could still cross a border
> of whatever restrictions are below us (e.g. raid10 stripe boundary).
> An attempted bio_split() would not succeed, because bi_vcnt is 8.
> 
> One example that triggered this frequently is the xen io layer.
> 
> raid10_make_request bug: can't convert block across chunks or bigger than 64k 209265151 1
> 
> Signed-off-by: Lars <lars.ellenberg@linbit.com>
> 
> ---
> 
> Neil: you may want to double check linear.c and raid*.c for similar logic,
> even though it is unlikely that someone puts md raid6 on top of something
> exposing a merge_bvec_fn.
> 

Unlikely, but by no means impossible.  I have queued up an appropriate fix
for md.

Thanks!

NeilBrown


> This is not the first time this has been patched, btw.
> See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> and the patch by Mikulas:
> https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
> 
> ---
>  drivers/md/dm-table.c |   12 ++++++++++--
>  1 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 4b22feb..c686ff4 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -515,14 +515,22 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
>  
>  	/*
>  	 * Check if merge fn is supported.
> -	 * If not we'll force DM to use PAGE_SIZE or
> +	 * If not we'll force DM to use single bio_vec of PAGE_SIZE or
>  	 * smaller I/O, just to be safe.
>  	 */
>  
> -	if (q->merge_bvec_fn && !ti->type->merge)
> +	if (q->merge_bvec_fn && !ti->type->merge) {
>  		limits->max_sectors =
>  			min_not_zero(limits->max_sectors,
>  				     (unsigned int) (PAGE_SIZE >> 9));
> +		/* Restricting max_sectors is not enough.
> +		 * If someone uses bio_add_page to add 8 disjunct 512 byte
> +		 * partial pages to a bio, it would succeed,
> +		 * but could still cross a border of whatever restrictions
> +		 * are below us (e.g. raid0 stripe boundary).  An attempted
> +		 * bio_split() would not succeed, because bi_vcnt is 8. */
> +		limits->max_segments = 1;
> +	}
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(dm_set_device_limits);



* Re: [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
  2010-03-08  5:33 ` Neil Brown
@ 2010-03-08  8:35   ` Mikulas Patocka
  2010-03-08 13:14     ` Lars Ellenberg
  0 siblings, 1 reply; 9+ messages in thread
From: Mikulas Patocka @ 2010-03-08  8:35 UTC (permalink / raw)
  To: Neil Brown
  Cc: Lars Ellenberg, Alasdair G Kergon, device-mapper development,
	jens.axboe, linux-kernel



On Mon, 8 Mar 2010, Neil Brown wrote:

> On Sat, 6 Mar 2010 22:10:13 +0100
> Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
> 
> > If the lower device exposes a merge_bvec_fn,
> > dm_set_device_limits() restricts max_sectors
> > to PAGE_SIZE "just to be safe".
> > 
> > This is not sufficient, however.
> > 
> > If someone uses bio_add_page() to add 8 disjunct 512 byte partial
> > pages to a bio, it would succeed, but could still cross a border
> > of whatever restrictions are below us (e.g. raid10 stripe boundary).
> > An attempted bio_split() would not succeed, because bi_vcnt is 8.
> > 
> > One example that triggered this frequently is the xen io layer.
> > 
> > raid10_make_request bug: can't convert block across chunks or bigger than 64k 209265151 1
> > 
> > Signed-off-by: Lars <lars.ellenberg@linbit.com>
> > 
> > ---
> > 
> > Neil: you may want to double check linear.c and raid*.c for similar logic,
> > even though it is unlikely that someone puts md raid6 on top of something
> > exposing a merge_bvec_fn.
> > 
> 
> Unlikely, but by no means impossible.  I have queued up an appropriate fix
> for md.
> 
> Thanks!
> 
> NeilBrown

Hi

That patch with limits->max_segments = 1; is wrong. It fixes this bug 
in some cases but not in others.

The problem is, if someone attempts to create a bio with two vector 
entries, the first maps the last sector contained in some page and the 
second maps the first sector of the next physical page: it has one 
segment, it has size <= PAGE_SIZE, but it still may cross raid stripe and 
the raid driver will reject it.

> > This is not the first time this has been patched, btw.
> > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > and the patch by Mikulas:
> > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff

Look at this patch; it is the proper way to fix it: create a 
merge_bvec_fn that rejects more than one biovec entry.

Mikulas

> > ---
> >  drivers/md/dm-table.c |   12 ++++++++++--
> >  1 files changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index 4b22feb..c686ff4 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -515,14 +515,22 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
> >  
> >  	/*
> >  	 * Check if merge fn is supported.
> > -	 * If not we'll force DM to use PAGE_SIZE or
> > +	 * If not we'll force DM to use single bio_vec of PAGE_SIZE or
> >  	 * smaller I/O, just to be safe.
> >  	 */
> >  
> > -	if (q->merge_bvec_fn && !ti->type->merge)
> > +	if (q->merge_bvec_fn && !ti->type->merge) {
> >  		limits->max_sectors =
> >  			min_not_zero(limits->max_sectors,
> >  				     (unsigned int) (PAGE_SIZE >> 9));
> > +		/* Restricting max_sectors is not enough.
> > +		 * If someone uses bio_add_page to add 8 disjunct 512 byte
> > +		 * partial pages to a bio, it would succeed,
> > +		 * but could still cross a border of whatever restrictions
> > +		 * are below us (e.g. raid0 stripe boundary).  An attempted
> > +		 * bio_split() would not succeed, because bi_vcnt is 8. */
> > +		limits->max_segments = 1;
> > +	}
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(dm_set_device_limits);
> 


* Re: [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
  2010-03-08  8:35   ` Mikulas Patocka
@ 2010-03-08 13:14     ` Lars Ellenberg
  2010-03-18 18:48       ` Andrew Morton
  2010-12-04  6:43       ` [PATCH] dm: check max_sectors in dm_merge_bvec (was: Re: dm: max_segments=1 if merge_bvec_fn is not supported) Mike Snitzer
  0 siblings, 2 replies; 9+ messages in thread
From: Lars Ellenberg @ 2010-03-08 13:14 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Neil Brown, Alasdair G Kergon, device-mapper development,
	jens.axboe, linux-kernel

On Mon, Mar 08, 2010 at 03:35:37AM -0500, Mikulas Patocka wrote:
> Hi
> 
> That patch with limits->max_segments = 1; is wrong. It fixes this bug 
> sometimes and sometimes not.
> 
> The problem is, if someone attempts to create a bio with two vector 
> entries, the first maps the last sector contained in some page and the 
> second maps the first sector of the next physical page: it has one 
> segment, it has size <= PAGE_SIZE, but it still may cross raid stripe and 
> the raid driver will reject it.

Now that you put it that way ;)
You are right.

My assumption that "single segment" was
equivalent in practice to "single bvec"
does not hold true in that case.

Then, what about adding seg_boundary_mask restrictions as well?
	max_sectors = PAGE_SIZE >> 9;
	max_segments = 1;
	seg_boundary_mask = PAGE_SIZE -1;
or some such.

> > > This is not the first time this has been patched, btw.
> > > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > > and the patch by Mikulas:
> > > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
> 
> Look at this patch, it is the proper way how to fix it: create a 
> merge_bvec_fn that reject more than one biovec entry.

If adding seg_boundary_mask is still not sufficient,
let's merge that patch instead.
Why was it dropped, or rather never merged?
It became obsolete for dm-linear by 7bc3447b,
but in general the bug is still there, or am I missing something?



	Lars


* Re: [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
  2010-03-08 13:14     ` Lars Ellenberg
@ 2010-03-18 18:48       ` Andrew Morton
  2010-03-18 21:48         ` Neil Brown
  2010-12-04  6:43       ` [PATCH] dm: check max_sectors in dm_merge_bvec (was: Re: dm: max_segments=1 if merge_bvec_fn is not supported) Mike Snitzer
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2010-03-18 18:48 UTC (permalink / raw)
  To: Lars Ellenberg
  Cc: Mikulas Patocka, Neil Brown, Alasdair G Kergon,
	device-mapper development, jens.axboe, linux-kernel

On Mon, 8 Mar 2010 14:14:49 +0100
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

> On Mon, Mar 08, 2010 at 03:35:37AM -0500, Mikulas Patocka wrote:
> > Hi
> > 
> > That patch with limits->max_segments = 1; is wrong. It fixes this bug 
> > sometimes and sometimes not.
> > 
> > The problem is, if someone attempts to create a bio with two vector 
> > entries, the first maps the last sector contained in some page and the 
> > second maps the first sector of the next physical page: it has one 
> > segment, it has size <= PAGE_SIZE, but it still may cross raid stripe and 
> > the raid driver will reject it.
> 
> Now that you put it that way ;)
> You are right.
> 
> My asumption that "single segment" was  
> equalvalent in practice with "single bvec"
> does not hold true in that case.
> 
> Then, what about adding seg_boundary_mask restrictions as well?
> 	max_sectors = PAGE_SIZE >> 9;
> 	max_segments = 1;
> 	seg_boundary_mask = PAGE_SIZE -1;
> or some such.
> 
> > > > This is not the first time this has been patched, btw.
> > > > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > > > and the patch by Mikulas:
> > > > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
> > 
> > Look at this patch, it is the proper way how to fix it: create a 
> > merge_bvec_fn that reject more than one biovec entry.
> 
> If adding seg_boundary_mask is still not sufficient,
> lets merge that patch instead?
> Why has it been dropped, respectively never been merged?
> It became obsolete for dm-linear by 7bc3447b,
> but in general the bug is still there, or am I missing something?
> 

This all seemed to die.  Does Neil's mysterypatch fix all these issues?

Neil, was that patch tagged for -stable backporting?

Thanks.


* Re: [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
  2010-03-18 18:48       ` Andrew Morton
@ 2010-03-18 21:48         ` Neil Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Neil Brown @ 2010-03-18 21:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Lars Ellenberg, Mikulas Patocka, Alasdair G Kergon,
	device-mapper development, jens.axboe, linux-kernel

On Thu, 18 Mar 2010 11:48:58 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 8 Mar 2010 14:14:49 +0100
> Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
> 
> > On Mon, Mar 08, 2010 at 03:35:37AM -0500, Mikulas Patocka wrote:
> > > Hi
> > > 
> > > That patch with limits->max_segments = 1; is wrong. It fixes this bug 
> > > sometimes and sometimes not.
> > > 
> > > The problem is, if someone attempts to create a bio with two vector 
> > > entries, the first maps the last sector contained in some page and the 
> > > second maps the first sector of the next physical page: it has one 
> > > segment, it has size <= PAGE_SIZE, but it still may cross raid stripe and 
> > > the raid driver will reject it.
> > 
> > Now that you put it that way ;)
> > You are right.
> > 
> > My asumption that "single segment" was  
> > equalvalent in practice with "single bvec"
> > does not hold true in that case.
> > 
> > Then, what about adding seg_boundary_mask restrictions as well?
> > 	max_sectors = PAGE_SIZE >> 9;
> > 	max_segments = 1;
> > 	seg_boundary_mask = PAGE_SIZE -1;
> > or some such.
> > 
> > > > > This is not the first time this has been patched, btw.
> > > > > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > > > > and the patch by Mikulas:
> > > > > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
> > > 
> > > Look at this patch, it is the proper way how to fix it: create a 
> > > merge_bvec_fn that reject more than one biovec entry.
> > 
> > If adding seg_boundary_mask is still not sufficient,
> > lets merge that patch instead?
> > Why has it been dropped, respectively never been merged?
> > It became obsolete for dm-linear by 7bc3447b,
> > but in general the bug is still there, or am I missing something?
> > 
> 
> This all seemed to die.  Does Neil's mysterypatch fix all these issues?
> 
> Neil, was that patch tagged for -stable backporting?

The patch at the top of my 'for-linus' branch (which Linus doesn't seem to
have pulled yet) fixes this for md and is tagged for -stable backporting.
It just sets max_segments and seg_boundary_mask.  There is no point in setting
max_sectors as well.  I found that setting merge_bvec_fn, while a perfectly
correct approach, was more cumbersome.

My patch doesn't fix this for dm.  I assume the dm developers will do that.

NeilBrown


* [PATCH] dm: check max_sectors in dm_merge_bvec  (was: Re: dm: max_segments=1 if merge_bvec_fn is not supported)
  2010-03-08 13:14     ` Lars Ellenberg
  2010-03-18 18:48       ` Andrew Morton
@ 2010-12-04  6:43       ` Mike Snitzer
  2010-12-04 16:03         ` Lars Ellenberg
  1 sibling, 1 reply; 9+ messages in thread
From: Mike Snitzer @ 2010-12-04  6:43 UTC (permalink / raw)
  To: device-mapper development
  Cc: Mikulas Patocka, linux-kernel, Alasdair G Kergon, jaxboe, lars.ellenberg

I'm late to this old thread but I stumbled across it while auditing the
various dm-devel patchwork patches, e.g.:
https://patchwork.kernel.org/patch/83666/
https://patchwork.kernel.org/patch/83932/

On Mon, Mar 08 2010 at  8:14am -0500,
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

> On Mon, Mar 08, 2010 at 03:35:37AM -0500, Mikulas Patocka wrote:
> > Hi
> > 
> > That patch with limits->max_segments = 1; is wrong. It fixes this bug 
> > sometimes and sometimes not.
> > 
> > The problem is, if someone attempts to create a bio with two vector 
> > entries, the first maps the last sector contained in some page and the 
> > second maps the first sector of the next physical page: it has one 
> > segment, it has size <= PAGE_SIZE, but it still may cross raid stripe and 
> > the raid driver will reject it.
> 
> Now that you put it that way ;)
> You are right.
> 
> My asumption that "single segment" was  
> equalvalent in practice with "single bvec"
> does not hold true in that case.
> 
> Then, what about adding seg_boundary_mask restrictions as well?
> 	max_sectors = PAGE_SIZE >> 9;
> 	max_segments = 1;
> 	seg_boundary_mask = PAGE_SIZE -1;
> or some such.
> 
> > > > This is not the first time this has been patched, btw.
> > > > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > > > and the patch by Mikulas:
> > > > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
> > 
> > Look at this patch, it is the proper way how to fix it: create a 
> > merge_bvec_fn that reject more than one biovec entry.
> 
> If adding seg_boundary_mask is still not sufficient,
> lets merge that patch instead?
> Why has it been dropped, respectively never been merged?
> It became obsolete for dm-linear by 7bc3447b,
> but in general the bug is still there, or am I missing something?

No, it _should_ be fixed in general given DM's dm_merge_bvec(), _but_ I
did uncover what I think is a subtle oversight in its implementation.

Given dm_set_device_limits() sets q->limits->max_sectors,
shouldn't dm_merge_bvec() be using queue_max_sectors rather than
queue_max_hw_sectors?

blk_queue_max_hw_sectors() establishes that max_hw_sectors is the hard
limit and max_sectors the soft.  But AFAICT no relation is maintained
between the two over time (even though max_sectors <= max_hw_sectors
_should_ be enforced; in practice there is no blk_queue_max_sectors
setter that uniformly enforces as much).

Anyway, I think we need the following patch:
--

From: Mike Snitzer <snitzer@redhat.com>
Subject: dm: check max_sectors in dm_merge_bvec

dm_set_device_limits() will set q->limits->max_sectors to <= PAGE_SIZE
if an underlying device has a merge_bvec_fn.  Therefore, dm_merge_bvec()
must use queue_max_sectors() rather than queue_max_hw_sectors() to check
the appropriate limit.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7cb1352..e83dcc8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1358,12 +1358,11 @@ static int dm_merge_bvec(struct request_queue *q,
 	/*
 	 * If the target doesn't support merge method and some of the devices
 	 * provided their merge_bvec method (we know this by looking at
-	 * queue_max_hw_sectors), then we can't allow bios with multiple vector
+	 * queue_max_sectors), then we can't allow bios with multiple vector
 	 * entries.  So always set max_size to 0, and the code below allows
 	 * just one page.
 	 */
-	else if (queue_max_hw_sectors(q) <= PAGE_SIZE >> 9)
-
+	else if (queue_max_sectors(q) <= PAGE_SIZE >> 9)
 		max_size = 0;
 
 out_table:


* Re: [PATCH] dm: check max_sectors in dm_merge_bvec  (was: Re: dm: max_segments=1 if merge_bvec_fn is not supported)
  2010-12-04  6:43       ` [PATCH] dm: check max_sectors in dm_merge_bvec (was: Re: dm: max_segments=1 if merge_bvec_fn is not supported) Mike Snitzer
@ 2010-12-04 16:03         ` Lars Ellenberg
  2010-12-04 19:21           ` Mike Snitzer
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ellenberg @ 2010-12-04 16:03 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: device-mapper development, Mikulas Patocka, linux-kernel,
	Alasdair G Kergon, jaxboe

On Sat, Dec 04, 2010 at 01:43:08AM -0500, Mike Snitzer wrote:
> I'm late to this old thread but I stumbled across it while auditing the
> various dm-devel patchwork patches, e.g.:
> https://patchwork.kernel.org/patch/83666/
> https://patchwork.kernel.org/patch/83932/
> 
> On Mon, Mar 08 2010 at  8:14am -0500,
> Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
> 
> > On Mon, Mar 08, 2010 at 03:35:37AM -0500, Mikulas Patocka wrote:
> > > Hi
> > > 
> > > That patch with limits->max_segments = 1; is wrong. It fixes this bug 
> > > sometimes and sometimes not.
> > > 
> > > The problem is, if someone attempts to create a bio with two vector 
> > > entries, the first maps the last sector contained in some page and the 
> > > second maps the first sector of the next physical page: it has one 
> > > segment, it has size <= PAGE_SIZE, but it still may cross raid stripe and 
> > > the raid driver will reject it.
> > 
> > Now that you put it that way ;)
> > You are right.
> > 
> > My asumption that "single segment" was  
> > equalvalent in practice with "single bvec"
> > does not hold true in that case.
> > 
> > Then, what about adding seg_boundary_mask restrictions as well?
> > 	max_sectors = PAGE_SIZE >> 9;
> > 	max_segments = 1;
> > 	seg_boundary_mask = PAGE_SIZE -1;
> > or some such.
> > 
> > > > > This is not the first time this has been patched, btw.
> > > > > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > > > > and the patch by Mikulas:
> > > > > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
> > > 
> > > Look at this patch, it is the proper way how to fix it: create a 
> > > merge_bvec_fn that reject more than one biovec entry.
> > 
> > If adding seg_boundary_mask is still not sufficient,
> > lets merge that patch instead?
> > Why has it been dropped, respectively never been merged?
> > It became obsolete for dm-linear by 7bc3447b,
> > but in general the bug is still there, or am I missing something?
> 
> No it _should_ be fixed in general given DM's dm_merge_bvec() _but_ I
> did uncover what I think is a subtle oversight in its implementation.
> 
> Given dm_set_device_limits() sets q->limits->max_sectors,
> shouldn't dm_merge_bvec() be using queue_max_sectors rather than
> queue_max_hw_sectors?
> 
> blk_queue_max_hw_sectors() establishes that max_hw_sectors is the hard
> limit and max_sectors the soft.  But AFAICT no relation is maintained
> between the two over time (even though max_sectors <= max_hw_sectors
> _should_ be enforced; in practice there is no blk_queue_max_sectors
> setter that uniformly enforces as much).

Just for the record, in case someone finds this in the archives,
and wants to backport or base his own work on this:

 A long time ago, there was no .max_hw_sectors.  Then max_hw_sectors got
 introduced, but without accessor function.

 Before 2.6.31, there was no blk_queue_max_hw_sectors(),
 only blk_queue_max_sectors(), which set both.

 2.6.31 introduced some blk_queue_max_hw_sectors(), which _only_ set
 max_hw_sectors, and enforced a lower limit of BLK_DEF_MAX_SECTORS, so
 using that only, you have not been able to actually set lower limits
 than 512 kB. With 2.6.31 to 2.6.33, inclusive, you still need to use
 blk_queue_max_sectors() to set your limits.

 2.6.34 finally dropped the newly introduced function again,
 but renamed the other, so starting with 2.6.34 you need to use
 blk_queue_max_hw_sectors(), which now basically has the function body
 blk_queue_max_sectors() had up until 2.6.33.

> dm_set_device_limits() will set q->limits->max_sectors to <= PAGE_SIZE
> if an underlying device has a merge_bvec_fn.  Therefore, dm_merge_bvec()
> must use queue_max_sectors() rather than queue_max_hw_sectors() to check
> the appropriate limit.

IMO, you should not do this.
max_sectors is a user tunable, capped by max_hw_sectors.
max_hw_sectors is the driver limit.

Please set max_hw_sectors in dm_set_device_limits instead.

BTW, e.g. O_DIRECT will adhere to max_hw_sectors,
but happily ignore max_sectors, I think.

> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  drivers/md/dm.c |    5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 7cb1352..e83dcc8 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1358,12 +1358,11 @@ static int dm_merge_bvec(struct request_queue *q,
>  	/*
>  	 * If the target doesn't support merge method and some of the devices
>  	 * provided their merge_bvec method (we know this by looking at
> -	 * queue_max_hw_sectors), then we can't allow bios with multiple vector
> +	 * queue_max_sectors), then we can't allow bios with multiple vector
>  	 * entries.  So always set max_size to 0, and the code below allows
>  	 * just one page.
>  	 */
> -	else if (queue_max_hw_sectors(q) <= PAGE_SIZE >> 9)
> -
> +	else if (queue_max_sectors(q) <= PAGE_SIZE >> 9)
>  		max_size = 0;
>  
>  out_table:


	Lars


* Re: dm: check max_sectors in dm_merge_bvec  (was: Re: dm: max_segments=1 if merge_bvec_fn is not supported)
  2010-12-04 16:03         ` Lars Ellenberg
@ 2010-12-04 19:21           ` Mike Snitzer
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2010-12-04 19:21 UTC (permalink / raw)
  To: Lars Ellenberg
  Cc: device-mapper development, Mikulas Patocka, linux-kernel,
	Alasdair G Kergon, jaxboe

On Sat, Dec 04 2010 at 11:03am -0500,
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

> On Sat, Dec 04, 2010 at 01:43:08AM -0500, Mike Snitzer wrote:
>
> > Given dm_set_device_limits() sets q->limits->max_sectors,
> > shouldn't dm_merge_bvec() be using queue_max_sectors rather than
> > queue_max_hw_sectors?
> > 
> > blk_queue_max_hw_sectors() establishes that max_hw_sectors is the hard
> > limit and max_sectors the soft.  But AFAICT no relation is maintained
> > between the two over time (even though max_sectors <= max_hw_sectors
> > _should_ be enforced; in practice there is no blk_queue_max_sectors
> > setter that uniformly enforces as much).
> 
> Just for the record, in case someone finds this in the archives,
> and wants to backport or base his own work on this:
> 
>  A long time ago, there was no .max_hw_sectors.  Then max_hw_sectors got
>  introduced, but without accessor function.
> 
>  Before 2.6.31, there was no blk_queue_max_hw_sectors(),
>  only blk_queue_max_sectors(), which set both.
> 
>  2.6.31 introduced some blk_queue_max_hw_sectors(), which _only_ set
>  max_hw_sectors, and enforced a lower limit of BLK_DEF_MAX_SECTORS, so
>  using that only, you have not been able to actually set lower limits
>  than 512 kB. With 2.6.31 to 2.6.33, inclusive, you still need to use
>  blk_queue_max_sectors() to set your limits.
> 
>  2.6.34 finally dropped the newly introduced function again,
>  but renamed the other, so starting with 2.6.34 you need to use
>  blk_queue_max_hw_sectors(), which now basically has the function body
>  blk_queue_max_sectors() had up until 2.6.33.
> 
> > dm_set_device_limits() will set q->limits->max_sectors to <= PAGE_SIZE
> > if an underlying device has a merge_bvec_fn.  Therefore, dm_merge_bvec()
> > must use queue_max_sectors() rather than queue_max_hw_sectors() to check
> > the appropriate limit.
> 
> IMO, you should not do this.
> max_sectors is a user tunable, capped by max_hw_sectors.
> max_hw_sectors is the driver limit.
>
> Please set max_hw_sectors in dm_set_device_limits instead.

Right, good point... will do (unless I happen upon a reason not to or
someone else shouts).

Thanks,
Mike

