* [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
From: Lars Ellenberg @ 2010-03-06 21:10 UTC
To: Alasdair G Kergon
Cc: device-mapper development, Mikulas Patocka, Neil Brown,
jens.axboe, linux-kernel
If the lower device exposes a merge_bvec_fn,
dm_set_device_limits() restricts max_sectors
to PAGE_SIZE "just to be safe".
This is not sufficient, however.
If someone uses bio_add_page() to add eight disjoint 512-byte partial
pages to a bio, it will succeed, but the bio can still cross a boundary
of whatever restrictions lie below us (e.g. a raid10 stripe boundary).
An attempted bio_split() would then fail, because bi_vcnt is 8.
One example that triggered this frequently is the xen io layer.
raid10_make_request bug: can't convert block across chunks or bigger than 64k 209265151 1
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
---
Neil: you may want to double-check linear.c and raid*.c for similar logic,
even though it is unlikely that anyone puts md raid6 on top of something
exposing a merge_bvec_fn.
This is not the first time this has been patched, btw.
See https://bugzilla.redhat.com/show_bug.cgi?id=440093
and the patch by Mikulas:
https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
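To make the failure mode concrete, here is a hedged, userspace-only sketch (not kernel code; struct and field names are a simplified model, values illustrative) of why capping max_sectors at PAGE_SIZE >> 9 = 8 sectors is not enough: an 8-bvec bio of 8 sectors can still straddle a 64k (128-sector) raid10 chunk, and bio_split() at the time could only split single-bvec bios.

```c
#include <assert.h>
#include <stdbool.h>

#define SECTOR_SHIFT 9
#define PAGE_SIZE    4096u   /* illustrative; matches x86 */

/* Minimal model of the few bio fields that matter here (not the kernel struct). */
struct bio_model {
    unsigned long bi_sector; /* start offset on disk, in 512-byte sectors */
    unsigned int  bi_vcnt;   /* number of biovec entries */
    unsigned int  sectors;   /* total payload length, in sectors */
};

/* The limit dm_set_device_limits() enforces: at most PAGE_SIZE of data. */
static bool fits_max_sectors(const struct bio_model *bio)
{
    return bio->sectors <= (PAGE_SIZE >> SECTOR_SHIFT);
}

/* Does the bio straddle a chunk (stripe) boundary of the lower device? */
static bool crosses_chunk(const struct bio_model *bio, unsigned int chunk_sectors)
{
    unsigned long first = bio->bi_sector / chunk_sectors;
    unsigned long last  = (bio->bi_sector + bio->sectors - 1) / chunk_sectors;
    return first != last;
}

/* bio_split() of that era could only handle single-bvec bios. */
static bool bio_split_possible(const struct bio_model *bio)
{
    return bio->bi_vcnt == 1;
}
```

Eight disjoint one-sector bvecs starting four sectors before a 64k chunk boundary pass the max_sectors check, cross the chunk, and cannot be split: the raid10_make_request error quoted above.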
---
drivers/md/dm-table.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 4b22feb..c686ff4 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -515,14 +515,22 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
/*
* Check if merge fn is supported.
- * If not we'll force DM to use PAGE_SIZE or
+ * If not we'll force DM to use single bio_vec of PAGE_SIZE or
* smaller I/O, just to be safe.
*/
- if (q->merge_bvec_fn && !ti->type->merge)
+ if (q->merge_bvec_fn && !ti->type->merge) {
limits->max_sectors =
min_not_zero(limits->max_sectors,
(unsigned int) (PAGE_SIZE >> 9));
+ /* Restricting max_sectors is not enough.
+ * If someone uses bio_add_page to add 8 disjunct 512 byte
+ * partial pages to a bio, it would succeed,
+ * but could still cross a border of whatever restrictions
+ * are below us (e.g. raid0 stripe boundary). An attempted
+ * bio_split() would not succeed, because bi_vcnt is 8. */
+ limits->max_segments = 1;
+ }
return 0;
}
EXPORT_SYMBOL_GPL(dm_set_device_limits);
--
1.6.3.3
* Re: [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
From: Neil Brown @ 2010-03-08 5:33 UTC
To: Lars Ellenberg
Cc: Alasdair G Kergon, device-mapper development, Mikulas Patocka,
jens.axboe, linux-kernel
On Sat, 6 Mar 2010 22:10:13 +0100
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
> If the lower device exposes a merge_bvec_fn,
> dm_set_device_limits() restricts max_sectors
> to PAGE_SIZE "just to be safe".
>
> This is not sufficient, however.
>
> If someone uses bio_add_page() to add 8 disjunct 512 byte partial
> pages to a bio, it would succeed, but could still cross a border
> of whatever restrictions are below us (e.g. raid10 stripe boundary).
> An attempted bio_split() would not succeed, because bi_vcnt is 8.
>
> One example that triggered this frequently is the xen io layer.
>
> raid10_make_request bug: can't convert block across chunks or bigger than 64k 209265151 1
>
> Signed-off-by: Lars <lars.ellenberg@linbit.com>
>
> ---
>
> Neil: you may want to double check linear.c and raid*.c for similar logic,
> even though it is unlikely that someone puts md raid6 on top of something
> exposing a merge_bvec_fn.
>
Unlikely, but by no means impossible. I have queued up an appropriate fix
for md.
Thanks!
NeilBrown
> This is not the first time this has been patched, btw.
> See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> and the patch by Mikulas:
> https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
>
> ---
> drivers/md/dm-table.c | 12 ++++++++++--
> 1 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 4b22feb..c686ff4 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -515,14 +515,22 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
>
> /*
> * Check if merge fn is supported.
> - * If not we'll force DM to use PAGE_SIZE or
> + * If not we'll force DM to use single bio_vec of PAGE_SIZE or
> * smaller I/O, just to be safe.
> */
>
> - if (q->merge_bvec_fn && !ti->type->merge)
> + if (q->merge_bvec_fn && !ti->type->merge) {
> limits->max_sectors =
> min_not_zero(limits->max_sectors,
> (unsigned int) (PAGE_SIZE >> 9));
> + /* Restricting max_sectors is not enough.
> + * If someone uses bio_add_page to add 8 disjunct 512 byte
> + * partial pages to a bio, it would succeed,
> + * but could still cross a border of whatever restrictions
> + * are below us (e.g. raid0 stripe boundary). An attempted
> + * bio_split() would not succeed, because bi_vcnt is 8. */
> + limits->max_segments = 1;
> + }
> return 0;
> }
> EXPORT_SYMBOL_GPL(dm_set_device_limits);
* Re: [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
From: Mikulas Patocka @ 2010-03-08 8:35 UTC
To: Neil Brown
Cc: Lars Ellenberg, Alasdair G Kergon, device-mapper development,
jens.axboe, linux-kernel
On Mon, 8 Mar 2010, Neil Brown wrote:
> On Sat, 6 Mar 2010 22:10:13 +0100
> Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
>
> > If the lower device exposes a merge_bvec_fn,
> > dm_set_device_limits() restricts max_sectors
> > to PAGE_SIZE "just to be safe".
> >
> > This is not sufficient, however.
> >
> > If someone uses bio_add_page() to add 8 disjunct 512 byte partial
> > pages to a bio, it would succeed, but could still cross a border
> > of whatever restrictions are below us (e.g. raid10 stripe boundary).
> > An attempted bio_split() would not succeed, because bi_vcnt is 8.
> >
> > One example that triggered this frequently is the xen io layer.
> >
> > raid10_make_request bug: can't convert block across chunks or bigger than 64k 209265151 1
> >
> > Signed-off-by: Lars <lars.ellenberg@linbit.com>
> >
> > ---
> >
> > Neil: you may want to double check linear.c and raid*.c for similar logic,
> > even though it is unlikely that someone puts md raid6 on top of something
> > exposing a merge_bvec_fn.
> >
>
> Unlikely, but by no means impossible. I have queued up an appropriate fix
> for md.
>
> Thanks!
>
> NeilBrown
Hi
That patch with limits->max_segments = 1; is wrong: it fixes this bug
only sometimes.
The problem is that if someone creates a bio with two vector entries,
where the first maps the last sector of some page and the second maps
the first sector of the next physical page, the bio has one segment and
a size <= PAGE_SIZE, but it may still cross a raid stripe boundary, and
the raid driver will reject it.
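The counterexample above can be sketched with a toy userspace model (illustrative physical addresses; these are not the kernel structs): two bvecs in physically adjacent pages merge into a single segment, so a bio can satisfy max_segments = 1 and still carry two biovec entries.

```c
#include <assert.h>

/* Toy biovec: physical start address and length of one entry. */
struct bvec_model {
    unsigned long phys; /* physical address of the data */
    unsigned int  len;  /* length in bytes */
};

/* Count physical segments: adjacent, physically contiguous bvecs merge into one. */
static unsigned int nr_phys_segments(const struct bvec_model *v, unsigned int n)
{
    unsigned int segs = n ? 1 : 0;
    for (unsigned int i = 1; i < n; i++)
        if (v[i - 1].phys + v[i - 1].len != v[i].phys)
            segs++;
    return segs;
}

static unsigned int total_bytes(const struct bvec_model *v, unsigned int n)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < n; i++)
        sum += v[i].len;
    return sum;
}
```

The last 512 bytes of one page followed by the first 512 bytes of the physically next page: one segment, 1024 bytes <= PAGE_SIZE, yet still two bvecs that bio_split() cannot take apart if the two sectors land in different stripes.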
> > This is not the first time this has been patched, btw.
> > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > and the patch by Mikulas:
> > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
Look at this patch; it is the proper way to fix it: create a
merge_bvec_fn that rejects more than one biovec entry.
Mikulas
> > ---
> > drivers/md/dm-table.c | 12 ++++++++++--
> > 1 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index 4b22feb..c686ff4 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -515,14 +515,22 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
> >
> > /*
> > * Check if merge fn is supported.
> > - * If not we'll force DM to use PAGE_SIZE or
> > + * If not we'll force DM to use single bio_vec of PAGE_SIZE or
> > * smaller I/O, just to be safe.
> > */
> >
> > - if (q->merge_bvec_fn && !ti->type->merge)
> > + if (q->merge_bvec_fn && !ti->type->merge) {
> > limits->max_sectors =
> > min_not_zero(limits->max_sectors,
> > (unsigned int) (PAGE_SIZE >> 9));
> > + /* Restricting max_sectors is not enough.
> > + * If someone uses bio_add_page to add 8 disjunct 512 byte
> > + * partial pages to a bio, it would succeed,
> > + * but could still cross a border of whatever restrictions
> > + * are below us (e.g. raid0 stripe boundary). An attempted
> > + * bio_split() would not succeed, because bi_vcnt is 8. */
> > + limits->max_segments = 1;
> > + }
> > return 0;
> > }
> > EXPORT_SYMBOL_GPL(dm_set_device_limits);
>
* Re: [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
From: Lars Ellenberg @ 2010-03-08 13:14 UTC
To: Mikulas Patocka
Cc: Neil Brown, Alasdair G Kergon, device-mapper development,
jens.axboe, linux-kernel
On Mon, Mar 08, 2010 at 03:35:37AM -0500, Mikulas Patocka wrote:
> Hi
>
> That patch with limits->max_segments = 1; is wrong. It fixes this bug
> sometimes and sometimes not.
>
> The problem is, if someone attempts to create a bio with two vector
> entries, the first maps the last sector contained in some page and the
> second maps the first sector of the next physical page: it has one
> segment, it has size <= PAGE_SIZE, but it still may cross raid stripe and
> the raid driver will reject it.
Now that you put it that way ;) you are right.
My assumption that "single segment" was
equivalent in practice to "single bvec"
does not hold in that case.
Then, what about adding seg_boundary_mask restrictions as well?
max_sectors = PAGE_SIZE >> 9;
max_segments = 1;
seg_boundary_mask = PAGE_SIZE - 1;
or some such.
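The proposed seg_boundary_mask can be sketched as follows (a userspace model of the idea behind blk_queue_segment_boundary(); addresses are illustrative): a mask of PAGE_SIZE - 1 forbids any single segment from crossing a page boundary in memory, so the merged two-bvec segment from Mikulas's example counts as two segments and is rejected once max_segments = 1 is also set.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096ul /* illustrative */

/* Would a segment at [addr, addr + len) cross a (mask + 1)-aligned boundary? */
static bool crosses_seg_boundary(unsigned long addr, unsigned long len,
                                 unsigned long mask)
{
    return (addr & ~mask) != ((addr + len - 1) & ~mask);
}
```

The 1024-byte merged segment starts 512 bytes before a page boundary, so with seg_boundary_mask = PAGE_SIZE - 1 it must be split into two segments; combined with max_segments = 1, the offending bio can no longer be built.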
> > > This is not the first time this has been patched, btw.
> > > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > > and the patch by Mikulas:
> > > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
>
> Look at this patch, it is the proper way how to fix it: create a
> merge_bvec_fn that reject more than one biovec entry.
If adding seg_boundary_mask is still not sufficient,
let's merge that patch instead.
Why was it dropped, or rather never merged, in the first place?
It became obsolete for dm-linear with 7bc3447b,
but in general the bug is still there, or am I missing something?
Lars
* Re: [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
From: Andrew Morton @ 2010-03-18 18:48 UTC
To: Lars Ellenberg
Cc: Mikulas Patocka, Neil Brown, Alasdair G Kergon,
device-mapper development, jens.axboe, linux-kernel
On Mon, 8 Mar 2010 14:14:49 +0100
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
> On Mon, Mar 08, 2010 at 03:35:37AM -0500, Mikulas Patocka wrote:
> > Hi
> >
> > That patch with limits->max_segments = 1; is wrong. It fixes this bug
> > sometimes and sometimes not.
> >
> > The problem is, if someone attempts to create a bio with two vector
> > entries, the first maps the last sector contained in some page and the
> > second maps the first sector of the next physical page: it has one
> > segment, it has size <= PAGE_SIZE, but it still may cross raid stripe and
> > the raid driver will reject it.
>
> Now that you put it that way ;)
> You are right.
>
> My asumption that "single segment" was
> equalvalent in practice with "single bvec"
> does not hold true in that case.
>
> Then, what about adding seg_boundary_mask restrictions as well?
> max_sectors = PAGE_SIZE >> 9;
> max_segments = 1;
> seg_boundary_mask = PAGE_SIZE -1;
> or some such.
>
> > > > This is not the first time this has been patched, btw.
> > > > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > > > and the patch by Mikulas:
> > > > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
> >
> > Look at this patch, it is the proper way how to fix it: create a
> > merge_bvec_fn that reject more than one biovec entry.
>
> If adding seg_boundary_mask is still not sufficient,
> lets merge that patch instead?
> Why has it been dropped, respectively never been merged?
> It became obsolete for dm-linear by 7bc3447b,
> but in general the bug is still there, or am I missing something?
>
This all seemed to die. Does Neil's mystery patch fix all these issues?
Neil, was that patch tagged for -stable backporting?
Thanks.
* Re: [PATCH] dm: max_segments=1 if merge_bvec_fn is not supported
From: Neil Brown @ 2010-03-18 21:48 UTC
To: Andrew Morton
Cc: Lars Ellenberg, Mikulas Patocka, Alasdair G Kergon,
device-mapper development, jens.axboe, linux-kernel
On Thu, 18 Mar 2010 11:48:58 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 8 Mar 2010 14:14:49 +0100
> Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
>
> > On Mon, Mar 08, 2010 at 03:35:37AM -0500, Mikulas Patocka wrote:
> > > Hi
> > >
> > > That patch with limits->max_segments = 1; is wrong. It fixes this bug
> > > sometimes and sometimes not.
> > >
> > > The problem is, if someone attempts to create a bio with two vector
> > > entries, the first maps the last sector contained in some page and the
> > > second maps the first sector of the next physical page: it has one
> > > segment, it has size <= PAGE_SIZE, but it still may cross raid stripe and
> > > the raid driver will reject it.
> >
> > Now that you put it that way ;)
> > You are right.
> >
> > My asumption that "single segment" was
> > equalvalent in practice with "single bvec"
> > does not hold true in that case.
> >
> > Then, what about adding seg_boundary_mask restrictions as well?
> > max_sectors = PAGE_SIZE >> 9;
> > max_segments = 1;
> > seg_boundary_mask = PAGE_SIZE -1;
> > or some such.
> >
> > > > > This is not the first time this has been patched, btw.
> > > > > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > > > > and the patch by Mikulas:
> > > > > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
> > >
> > > Look at this patch, it is the proper way how to fix it: create a
> > > merge_bvec_fn that reject more than one biovec entry.
> >
> > If adding seg_boundary_mask is still not sufficient,
> > lets merge that patch instead?
> > Why has it been dropped, respectively never been merged?
> > It became obsolete for dm-linear by 7bc3447b,
> > but in general the bug is still there, or am I missing something?
> >
>
> This all seemed to die. Does Neil's mysterypatch fix all these issues?
>
> Neil, was that patch tagged for -stable backporting?
The patch at the top of my 'for-linus' branch (which Linus doesn't seem to
have pulled yet) fixes this for md and is tagged for -stable backporting.
It just sets max_segments and seg_boundary_mask; there is no point in
setting max_sectors as well. I found that setting a merge_bvec_fn, while
a perfectly correct approach, was more cumbersome.
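Why setting max_sectors as well would be redundant comes down to one line of arithmetic (a sketch, with an illustrative 4 KiB PAGE_SIZE): a single segment that may not cross a PAGE_SIZE-aligned boundary carries at most PAGE_SIZE bytes, i.e. exactly PAGE_SIZE >> 9 sectors.

```c
#include <assert.h>

#define SECTOR_SHIFT 9
#define PAGE_SIZE    4096u /* illustrative */

/* A segment confined by boundary mask `mask` is at most mask + 1 bytes long,
 * so max_segments segments carry at most this many 512-byte sectors. */
static unsigned int implied_max_sectors(unsigned int max_segments,
                                        unsigned long seg_boundary_mask)
{
    return max_segments * (unsigned int)((seg_boundary_mask + 1) >> SECTOR_SHIFT);
}
```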
My patch doesn't fix this for dm. I assume the dm developers will do that.
NeilBrown
* [PATCH] dm: check max_sectors in dm_merge_bvec (was: Re: dm: max_segments=1 if merge_bvec_fn is not supported)
From: Mike Snitzer @ 2010-12-04 6:43 UTC
To: device-mapper development
Cc: Mikulas Patocka, linux-kernel, Alasdair G Kergon, jaxboe, lars.ellenberg
I'm late to this old thread but I stumbled across it while auditing the
various dm-devel patchwork patches, e.g.:
https://patchwork.kernel.org/patch/83666/
https://patchwork.kernel.org/patch/83932/
On Mon, Mar 08 2010 at 8:14am -0500,
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
> On Mon, Mar 08, 2010 at 03:35:37AM -0500, Mikulas Patocka wrote:
> > Hi
> >
> > That patch with limits->max_segments = 1; is wrong. It fixes this bug
> > sometimes and sometimes not.
> >
> > The problem is, if someone attempts to create a bio with two vector
> > entries, the first maps the last sector contained in some page and the
> > second maps the first sector of the next physical page: it has one
> > segment, it has size <= PAGE_SIZE, but it still may cross raid stripe and
> > the raid driver will reject it.
>
> Now that you put it that way ;)
> You are right.
>
> My asumption that "single segment" was
> equalvalent in practice with "single bvec"
> does not hold true in that case.
>
> Then, what about adding seg_boundary_mask restrictions as well?
> max_sectors = PAGE_SIZE >> 9;
> max_segments = 1;
> seg_boundary_mask = PAGE_SIZE -1;
> or some such.
>
> > > > This is not the first time this has been patched, btw.
> > > > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > > > and the patch by Mikulas:
> > > > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
> >
> > Look at this patch, it is the proper way how to fix it: create a
> > merge_bvec_fn that reject more than one biovec entry.
>
> If adding seg_boundary_mask is still not sufficient,
> lets merge that patch instead?
> Why has it been dropped, respectively never been merged?
> It became obsolete for dm-linear by 7bc3447b,
> but in general the bug is still there, or am I missing something?
No, it _should_ be fixed in general given DM's dm_merge_bvec(), _but_ I
did uncover what I think is a subtle oversight in its implementation.
Given dm_set_device_limits() sets q->limits->max_sectors,
shouldn't dm_merge_bvec() be using queue_max_sectors rather than
queue_max_hw_sectors?
blk_queue_max_hw_sectors() establishes that max_hw_sectors is the hard
limit and max_sectors the soft. But AFAICT no relation is maintained
between the two over time (even though max_sectors <= max_hw_sectors
_should_ be enforced; in practice there is no blk_queue_max_sectors
setter that uniformly enforces as much).
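The oversight can be sketched with a toy queue_limits (a userspace model; the field names follow the kernel's struct queue_limits, the values are illustrative): dm_set_device_limits() lowers only the soft limit, so a check against the hard limit never sees the restriction.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u /* illustrative */

/* Only the two fields that matter here. */
struct queue_limits_model {
    unsigned int max_sectors;    /* soft limit: what dm_set_device_limits() lowers */
    unsigned int max_hw_sectors; /* hard limit: left untouched */
};

/* The check dm_merge_bvec() should make: has the soft limit been clamped? */
static bool soft_check_fires(const struct queue_limits_model *l)
{
    return l->max_sectors <= (PAGE_SIZE >> 9);
}

/* The same check against the hard limit misses the clamp entirely. */
static bool hw_check_fires(const struct queue_limits_model *l)
{
    return l->max_hw_sectors <= (PAGE_SIZE >> 9);
}
```

After dm_set_device_limits() has clamped max_sectors to 8 while max_hw_sectors stays at, say, 255, only the soft-limit check notices.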
Anyway, I think we need the following patch:
--
From: Mike Snitzer <snitzer@redhat.com>
Subject: dm: check max_sectors in dm_merge_bvec
dm_set_device_limits() will set q->limits->max_sectors to <= PAGE_SIZE
if an underlying device has a merge_bvec_fn. Therefore, dm_merge_bvec()
must use queue_max_sectors() rather than queue_max_hw_sectors() to check
the appropriate limit.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7cb1352..e83dcc8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1358,12 +1358,11 @@ static int dm_merge_bvec(struct request_queue *q,
/*
* If the target doesn't support merge method and some of the devices
* provided their merge_bvec method (we know this by looking at
- * queue_max_hw_sectors), then we can't allow bios with multiple vector
+ * queue_max_sectors), then we can't allow bios with multiple vector
* entries. So always set max_size to 0, and the code below allows
* just one page.
*/
- else if (queue_max_hw_sectors(q) <= PAGE_SIZE >> 9)
-
+ else if (queue_max_sectors(q) <= PAGE_SIZE >> 9)
max_size = 0;
out_table:
* Re: [PATCH] dm: check max_sectors in dm_merge_bvec (was: Re: dm: max_segments=1 if merge_bvec_fn is not supported)
From: Lars Ellenberg @ 2010-12-04 16:03 UTC
To: Mike Snitzer
Cc: device-mapper development, Mikulas Patocka, linux-kernel,
Alasdair G Kergon, jaxboe
On Sat, Dec 04, 2010 at 01:43:08AM -0500, Mike Snitzer wrote:
> I'm late to this old thread but I stumbled across it while auditing the
> various dm-devel patchwork patches, e.g.:
> https://patchwork.kernel.org/patch/83666/
> https://patchwork.kernel.org/patch/83932/
>
> On Mon, Mar 08 2010 at 8:14am -0500,
> Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
>
> > On Mon, Mar 08, 2010 at 03:35:37AM -0500, Mikulas Patocka wrote:
> > > Hi
> > >
> > > That patch with limits->max_segments = 1; is wrong. It fixes this bug
> > > sometimes and sometimes not.
> > >
> > > The problem is, if someone attempts to create a bio with two vector
> > > entries, the first maps the last sector contained in some page and the
> > > second maps the first sector of the next physical page: it has one
> > > segment, it has size <= PAGE_SIZE, but it still may cross raid stripe and
> > > the raid driver will reject it.
> >
> > Now that you put it that way ;)
> > You are right.
> >
> > My asumption that "single segment" was
> > equalvalent in practice with "single bvec"
> > does not hold true in that case.
> >
> > Then, what about adding seg_boundary_mask restrictions as well?
> > max_sectors = PAGE_SIZE >> 9;
> > max_segments = 1;
> > seg_boundary_mask = PAGE_SIZE -1;
> > or some such.
> >
> > > > > This is not the first time this has been patched, btw.
> > > > > See https://bugzilla.redhat.com/show_bug.cgi?id=440093
> > > > > and the patch by Mikulas:
> > > > > https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff
> > >
> > > Look at this patch, it is the proper way how to fix it: create a
> > > merge_bvec_fn that reject more than one biovec entry.
> >
> > If adding seg_boundary_mask is still not sufficient,
> > lets merge that patch instead?
> > Why has it been dropped, respectively never been merged?
> > It became obsolete for dm-linear by 7bc3447b,
> > but in general the bug is still there, or am I missing something?
>
> No it _should_ be fixed in general given DM's dm_merge_bvec() _but_ I
> did uncover what I think is a subtle oversight in its implementation.
>
> Given dm_set_device_limits() sets q->limits->max_sectors,
> shouldn't dm_merge_bvec() be using queue_max_sectors rather than
> queue_max_hw_sectors?
>
> blk_queue_max_hw_sectors() establishes that max_hw_sectors is the hard
> limit and max_sectors the soft. But AFAICT no relation is maintained
> between the two over time (even though max_sectors <= max_hw_sectors
> _should_ be enforced; in practice there is no blk_queue_max_sectors
> setter that uniformly enforces as much).
Just for the record, in case someone finds this in the archives,
and wants to backport or base his own work on this:
A long time ago, there was no .max_hw_sectors. Then max_hw_sectors got
introduced, but without an accessor function.
Before 2.6.31, there was no blk_queue_max_hw_sectors(),
only blk_queue_max_sectors(), which set both.
2.6.31 introduced a blk_queue_max_hw_sectors() which set _only_
max_hw_sectors, and enforced a lower limit of BLK_DEF_MAX_SECTORS, so
using it alone you could not actually set limits lower than 512 KiB.
With 2.6.31 to 2.6.33, inclusive, you still need to use
blk_queue_max_sectors() to set your limits.
2.6.34 finally dropped the newly introduced function again,
but renamed the other, so starting with 2.6.34 you need to use
blk_queue_max_hw_sectors(), which now basically has the function body
that blk_queue_max_sectors() had up until 2.6.33.
> dm_set_device_limits() will set q->limits->max_sectors to <= PAGE_SIZE
> if an underlying device has a merge_bvec_fn. Therefore, dm_merge_bvec()
> must use queue_max_sectors() rather than queue_max_hw_sectors() to check
> the appropriate limit.
IMO, you should not do this.
max_sectors is a user tunable, capped by max_hw_sectors.
max_hw_sectors is the driver limit.
Please set max_hw_sectors in dm_set_device_limits instead.
BTW, e.g. O_DIRECT will adhere to max_hw_sectors,
but happily ignore max_sectors, I think.
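The relationship Lars describes can be sketched as a toy setter (hedged: this mirrors the shape of the 2.6.34-era blk_queue_max_hw_sectors(), with BLK_DEF_MAX_SECTORS = 1024 as in kernels of that era): the driver sets the hard limit, and the soft limit is capped by both the hard limit and the default.

```c
#include <assert.h>

#define BLK_DEF_MAX_SECTORS 1024u /* 512 KiB in sectors; value of that era */

struct queue_limits_model {
    unsigned int max_sectors;    /* soft, user-tunable */
    unsigned int max_hw_sectors; /* hard, driver limit */
};

/* Toy version of the ~2.6.34 setter: hard limit as given, soft limit capped. */
static void set_max_hw_sectors(struct queue_limits_model *l, unsigned int max_hw)
{
    l->max_hw_sectors = max_hw;
    l->max_sectors = max_hw < BLK_DEF_MAX_SECTORS ? max_hw : BLK_DEF_MAX_SECTORS;
}
```

So setting max_hw_sectors in dm_set_device_limits() restricts both limits at once, while max_sectors stays a user tunable below that cap.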
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
> drivers/md/dm.c | 5 ++---
> 1 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 7cb1352..e83dcc8 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1358,12 +1358,11 @@ static int dm_merge_bvec(struct request_queue *q,
> /*
> * If the target doesn't support merge method and some of the devices
> * provided their merge_bvec method (we know this by looking at
> - * queue_max_hw_sectors), then we can't allow bios with multiple vector
> + * queue_max_sectors), then we can't allow bios with multiple vector
> * entries. So always set max_size to 0, and the code below allows
> * just one page.
> */
> - else if (queue_max_hw_sectors(q) <= PAGE_SIZE >> 9)
> -
> + else if (queue_max_sectors(q) <= PAGE_SIZE >> 9)
> max_size = 0;
>
> out_table:
Lars
* Re: dm: check max_sectors in dm_merge_bvec (was: Re: dm: max_segments=1 if merge_bvec_fn is not supported)
From: Mike Snitzer @ 2010-12-04 19:21 UTC
To: Lars Ellenberg
Cc: device-mapper development, Mikulas Patocka, linux-kernel,
Alasdair G Kergon, jaxboe
On Sat, Dec 04 2010 at 11:03am -0500,
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
> On Sat, Dec 04, 2010 at 01:43:08AM -0500, Mike Snitzer wrote:
>
> > Given dm_set_device_limits() sets q->limits->max_sectors,
> > shouldn't dm_merge_bvec() be using queue_max_sectors rather than
> > queue_max_hw_sectors?
> >
> > blk_queue_max_hw_sectors() establishes that max_hw_sectors is the hard
> > limit and max_sectors the soft. But AFAICT no relation is maintained
> > between the two over time (even though max_sectors <= max_hw_sectors
> > _should_ be enforced; in practice there is no blk_queue_max_sectors
> > setter that uniformly enforces as much).
>
> Just for the record, in case someone finds this in the archives,
> and wants to backport or base his own work on this:
>
> A long time ago, there was no .max_hw_sectors. Then max_hw_sectors got
> introduced, but without accessor function.
>
> Before 2.6.31, there was no blk_queue_max_hw_sectors(),
> only blk_queue_max_sectors(), which set both.
>
> 2.6.31 introduced some blk_queue_max_hw_sectors(), which _only_ set
> max_hw_sectors, and enforced a lower limit of BLK_DEF_MAX_SECTORS, so
> using that only, you have not been able to actually set lower limits
> than 512 kB. With 2.6.31 to 2.6.33, inclusive, you still need to use
> blk_queue_max_sectors() to set your limits.
>
> 2.6.34 finally dropped the newly introduced function again,
> but renamed the other, so starting with 2.6.34 you need to use
> blk_queue_max_hw_sectors(), which now basically has the function body
> blk_queue_max_sectors() had up until 2.6.33.
>
> > dm_set_device_limits() will set q->limits->max_sectors to <= PAGE_SIZE
> > if an underlying device has a merge_bvec_fn. Therefore, dm_merge_bvec()
> > must use queue_max_sectors() rather than queue_max_hw_sectors() to check
> > the appropriate limit.
>
> IMO, you should not do this.
> max_sectors is a user tunable, capped by max_hw_sectors.
> max_hw_sectors is the driver limit.
>
> Please set max_hw_sectors in dm_set_device_limits instead.
Right, good point... will do (unless I happen upon a reason not to, or
someone else shouts).
Thanks,
Mike