dm-devel.redhat.com archive mirror
* [dm-devel] [PATCH] block: revert to using min_not_zero() when stacking chunk_sectors
@ 2020-11-30 17:18 Mike Snitzer
  2020-11-30 20:51 ` John Dorminy
  2020-12-01 16:07 ` [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking Mike Snitzer
  0 siblings, 2 replies; 33+ messages in thread
From: Mike Snitzer @ 2020-11-30 17:18 UTC (permalink / raw)
  To: linux-block; +Cc: dm-devel, bjohnsto, jdorminy

chunk_sectors must reflect the most limited of all devices in the IO
stack.

Otherwise malformed IO may result. E.g.: prior to this fix,
->chunk_sectors = lcm_not_zero(8, 128) would result in
blk_max_size_offset() splitting IO at 128 sectors rather than the
required more restrictive 8 sectors.

Fixes: 22ada802ede8 ("block: use lcm_not_zero() when stacking chunk_sectors")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-settings.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 9741d1d83e98..1d9decd4646e 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -547,7 +547,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 
 	t->io_min = max(t->io_min, b->io_min);
 	t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
-	t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
+
+	if (b->chunk_sectors)
+		t->chunk_sectors = min_not_zero(t->chunk_sectors,
+						b->chunk_sectors);
 
 	/* Physical block size a multiple of the logical block size? */
 	if (t->physical_block_size & (t->logical_block_size - 1)) {
-- 
2.15.0

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel



* Re: [dm-devel] [PATCH] block: revert to using min_not_zero() when stacking chunk_sectors
  2020-11-30 17:18 [dm-devel] [PATCH] block: revert to using min_not_zero() when stacking chunk_sectors Mike Snitzer
@ 2020-11-30 20:51 ` John Dorminy
  2020-11-30 23:24   ` [dm-devel] " Mike Snitzer
  2020-12-01 16:07 ` [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking Mike Snitzer
  1 sibling, 1 reply; 33+ messages in thread
From: John Dorminy @ 2020-11-30 20:51 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, device-mapper development, Bruce Johnston

I don't think this suffices, as it allows IOs that span max(a,b) chunk
boundaries.

Chunk sectors is defined as "if set, it will prevent merging across
chunk boundaries". Pulling the example from the last change:
It is possible, albeit more unlikely, for a block device to have a non
power-of-2 for chunk_sectors (e.g. 10+2 RAID6 with 128K chunk_sectors,
which results in a full-stripe size of 1280K. This causes the RAID6's
io_opt to be advertised as 1280K, and a stacked device _could_ then be
made to use a blocksize, aka chunk_sectors, that matches non power-of-2
io_opt of underlying RAID6 -- resulting in stacked device's
chunk_sectors being a non power-of-2).

Suppose the stacked device had a block size/chunk_sectors of 256k.
Then, with this change, some IOs issued by the stacked device to the
RAID beneath could span 1280k sector boundaries, and require further
splitting still. I think combining as the GCD is better, since any IO
of size gcd(a,b) definitely spans neither an a-chunk nor a b-chunk
boundary.

But it's possible I'm misunderstanding the purpose of chunk_sectors,
or there should be a check that one of the two devices' chunk
sizes divides the other.

Thanks.

-John

On Mon, Nov 30, 2020 at 12:18 PM Mike Snitzer <snitzer@redhat.com> wrote:
>
> chunk_sectors must reflect the most limited of all devices in the IO
> stack.
>
> Otherwise malformed IO may result. E.g.: prior to this fix,
> ->chunk_sectors = lcm_not_zero(8, 128) would result in
> blk_max_size_offset() splitting IO at 128 sectors rather than the
> required more restrictive 8 sectors.
>
> Fixes: 22ada802ede8 ("block: use lcm_not_zero() when stacking chunk_sectors")
> Cc: stable@vger.kernel.org
> Reported-by: John Dorminy <jdorminy@redhat.com>
> Reported-by: Bruce Johnston <bjohnsto@redhat.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  block/blk-settings.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 9741d1d83e98..1d9decd4646e 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -547,7 +547,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>
>         t->io_min = max(t->io_min, b->io_min);
>         t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
> -       t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
> +
> +       if (b->chunk_sectors)
> +               t->chunk_sectors = min_not_zero(t->chunk_sectors,
> +                                               b->chunk_sectors);
>
>         /* Physical block size a multiple of the logical block size? */
>         if (t->physical_block_size & (t->logical_block_size - 1)) {
> --
> 2.15.0
>




* Re: [dm-devel] block: revert to using min_not_zero() when stacking chunk_sectors
  2020-11-30 20:51 ` John Dorminy
@ 2020-11-30 23:24   ` Mike Snitzer
  2020-12-01  0:21     ` John Dorminy
  0 siblings, 1 reply; 33+ messages in thread
From: Mike Snitzer @ 2020-11-30 23:24 UTC (permalink / raw)
  To: John Dorminy; +Cc: linux-block, device-mapper development, Bruce Johnston

On Mon, Nov 30 2020 at  3:51pm -0500,
John Dorminy <jdorminy@redhat.com> wrote:

> I don't think this suffices, as it allows IOs that span max(a,b) chunk
> boundaries.
> 
> Chunk sectors is defined as "if set, it will prevent merging across
> chunk boundaries". Pulling the example from the last change:

If you're going to cherry pick a portion of a commit header please
reference the commit id and use quotes or indentation to make it clear
what is being referenced, etc.

> It is possible, albeit more unlikely, for a block device to have a non
> power-of-2 for chunk_sectors (e.g. 10+2 RAID6 with 128K chunk_sectors,
> which results in a full-stripe size of 1280K. This causes the RAID6's
> io_opt to be advertised as 1280K, and a stacked device _could_ then be
> made to use a blocksize, aka chunk_sectors, that matches non power-of-2
> io_opt of underlying RAID6 -- resulting in stacked device's
> chunk_sectors being a non power-of-2).

This was from the header for commit 07d098e6bba ("block: allow
'chunk_sectors' to be non-power-of-2")

> Suppose the stacked device had a block size/chunk_sectors of 256k.

Quite the tangent just to set up a toy example of, say: thinp with 256K
blocksize/chunk_sectors on top of a RAID6 with a chunk_sectors of 128K
and stripesize of 1280K.

> Then, with this change, some IOs issued by the stacked device to the
> RAID beneath could span 1280k sector boundaries, and require further
> splitting still.
> I think combining as the GCD is better, since any IO
> of size gcd(a,b) definitely spans neither a a-chunk nor a b-chunk
> boundary.

To be clear, you are _not_ saying using lcm_not_zero() is correct.
You're saying that simply reverting block core back to using
min_not_zero() may not be as good as using gcd().

While that may be true (not sure yet), you've now muddied a conservative
fix (that reverts block core back to its longstanding use of
min_not_zero for chunk_sectors) in pursuit of a different concern than
the case you _really_ care about getting fixed
(I'm inferring based on your regression report):
4K chunk_sectors stacked on larger chunk_sectors, e.g. 256K

My patch fixes that case and doesn't try to innovate; it tries to get
block core back to sane chunk_sectors stacking (which I broke).

> But it's possible I'm misunderstanding the purpose of chunk_sectors,
> or there should be a check that one of the two devices' chunk
> sizes divides the other.

Seriously not amused by your response, I now have to do damage control
because you have a concern that you really weren't able to communicate
very effectively.

But I got this far, so for your above toy example (stacking 128K and
256K chunk_sectors):
min_not_zero = 128K
gcd = 128K

So please explain to me why gcd() is better at setting a chunk_sectors
that ensures IO doesn't span the 1280K stripesize (never mind that
chunk_sectors has no meaningful relation to io_opt to begin with!).

Mike


> 
> On Mon, Nov 30, 2020 at 12:18 PM Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > chunk_sectors must reflect the most limited of all devices in the IO
> > stack.
> >
> > Otherwise malformed IO may result. E.g.: prior to this fix,
> > ->chunk_sectors = lcm_not_zero(8, 128) would result in
> > blk_max_size_offset() splitting IO at 128 sectors rather than the
> > required more restrictive 8 sectors.
> >
> > Fixes: 22ada802ede8 ("block: use lcm_not_zero() when stacking chunk_sectors")
> > Cc: stable@vger.kernel.org
> > Reported-by: John Dorminy <jdorminy@redhat.com>
> > Reported-by: Bruce Johnston <bjohnsto@redhat.com>
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > ---
> >  block/blk-settings.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index 9741d1d83e98..1d9decd4646e 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -547,7 +547,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >
> >         t->io_min = max(t->io_min, b->io_min);
> >         t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
> > -       t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
> > +
> > +       if (b->chunk_sectors)
> > +               t->chunk_sectors = min_not_zero(t->chunk_sectors,
> > +                                               b->chunk_sectors);
> >
> >         /* Physical block size a multiple of the logical block size? */
> >         if (t->physical_block_size & (t->logical_block_size - 1)) {
> > --
> > 2.15.0
> >
> 




* Re: [dm-devel] block: revert to using min_not_zero() when stacking chunk_sectors
  2020-11-30 23:24   ` [dm-devel] " Mike Snitzer
@ 2020-12-01  0:21     ` John Dorminy
  2020-12-01  2:12       ` Mike Snitzer
  0 siblings, 1 reply; 33+ messages in thread
From: John Dorminy @ 2020-12-01  0:21 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, device-mapper development, Bruce Johnston

> If you're going to cherry pick a portion of a commit header please
> reference the commit id and use quotes or indentation to make it clear
> what is being referenced, etc.
Apologies.

> Quite the tangent just to set up a toy example of, say: thinp with 256K
> blocksize/chunk_sectors on top of a RAID6 with a chunk_sectors of 128K
> and stripesize of 1280K.

I screwed up my math ... many apologies :/

Consider a thinp of chunk_sectors 512K atop a RAID6 with chunk_sectors 1280K.
(Previously, this RAID6 would be disallowed because chunk_sectors
could only be a power of 2, but 07d098e6bba removed this constraint.)
- With lcm_not_zero(), a full-device IO would be split into 2560K IOs,
which obviously spans both 512K and 1280K chunk boundaries.
- With min_not_zero(), a full-device IO would be split into 512K IOs,
some of which would span 1280K chunk boundaries. For instance, one IO
would span from offset 1024K to 1536K.
- With the hypothetical gcd_not_zero(), a full-device IO would be split
into 256K IOs, which span neither 512K nor 1280K chunk boundaries.

>
> To be clear, you are _not_ saying using lcm_not_zero() is correct.
> You're saying that simply reverting block core back to using
> min_not_zero() may not be as good as using gcd().

Assuming my understanding of chunk_sectors is correct -- which as per
blk-settings.c seems to be "a driver will not receive a bio that spans
a chunk_sector boundary, except in single-page cases" -- I believe
using lcm_not_zero() and min_not_zero() can both violate this
requirement. The current lcm_not_zero() is not correct, but also
reverting block core back to using min_not_zero() leaves edge cases as
above.

I believe gcd provides the requirement, but min_not_zero() +
disallowing non-power-of-2 chunk_sectors also provides the
requirement.

> > But it's possible I'm misunderstanding the purpose of chunk_sectors,
> > or there should be a check that one of the two devices' chunk
> > sizes divides the other.
>
> Seriously not amused by your response, I now have to do damage control
> because you have a concern that you really weren't able to communicate
> very effectively.

Apologies.




* Re: [dm-devel] block: revert to using min_not_zero() when stacking chunk_sectors
  2020-12-01  0:21     ` John Dorminy
@ 2020-12-01  2:12       ` Mike Snitzer
  0 siblings, 0 replies; 33+ messages in thread
From: Mike Snitzer @ 2020-12-01  2:12 UTC (permalink / raw)
  To: John Dorminy, Martin K. Petersen
  Cc: linux-block, device-mapper development, Bruce Johnston

On Mon, Nov 30 2020 at  7:21pm -0500,
John Dorminy <jdorminy@redhat.com> wrote:

> > If you're going to cherry pick a portion of a commit header please
> > reference the commit id and use quotes or indentation to make it clear
> > what is being referenced, etc.
> Apologies.
> 
> > Quite the tangent just to set up a toy example of, say: thinp with 256K
> > blocksize/chunk_sectors on top of a RAID6 with a chunk_sectors of 128K
> > and stripesize of 1280K.
> 
> I screwed up my math ... many apologies :/
> 
> Consider a thinp of chunk_sectors 512K atop a RAID6 with chunk_sectors 1280K.
> (Previously, this RAID6 would be disallowed because chunk_sectors
> could only be a power of 2, but 07d098e6bba removed this constraint.)

Think you have your example messed up still.  RAID 10+2 with 128K
chunk_sectors, 1280K full stripe (io_opt). Then thinp stacked on top of
it with chunk_sectors of 1280K was the use case that wasn't supported before.

So stacked chunk_sectors = min_not_zero(128K, 1280K) = 128K

> -With lcm_not_zero(), a full-device IO would be split into 2560K IOs,
> which obviously spans both 512K and 1280K chunk boundaries.

Sure, think we both agree lcm_not_zero() shouldn't be used.

> -With min_not_zero(), a full-device IO would be split into 512K IOs,
> some of which would span 1280k chunk boundaries. For instance, one IO
> would span from offset 1024K to 1536K.

RAID6 with chunk_sectors of 1280K is pretty insane...
And yet you're saying full device IO is 1280K...
So something still isn't adding up.

Anyway, if we run with your example of chunk_sectors (512K, 1280K), yes
there is serious potential for IO to span the RAID6 layer's chunk_sector
boundary.

> -With the hypothetical gcd_not_zero(), a full-device IO would be split
> into 256K IOs, which span neither 512K nor 1280K chunk boundaries.

Yeap, I see.

> > To be clear, you are _not_ saying using lcm_not_zero() is correct.
> > You're saying that simply reverting block core back to using
> > min_not_zero() may not be as good as using gcd().
> 
> Assuming my understanding of chunk_sectors is correct -- which as per
> blk-settings.c seems to be "a driver will not receive a bio that spans
> a chunk_sector boundary, except in single-page cases" -- I believe
> using lcm_not_zero() and min_not_zero() can both violate this
> requirement. The current lcm_not_zero() is not correct, but also
> reverting block core back to using min_not_zero() leaves edge cases as
> above.

But your chunk_sectors (512K, 1280K) example is a misconfigured IO
stack.  Really not sure it's worth being concerned about.

> I believe gcd provides the requirement, but min_not_zero() +
> disallowing non-power-of-2 chunk_sectors also provides the
> requirement.

Kind of on the fence on this... think I'd like to get Martin's take.

Using gcd() instead of min_not_zero() to stack chunk_sectors isn't a big
deal; but the nature of chunk_sectors, coupled with it being allowed to
be a non-power-of-2, _does_ add a new wrinkle.

So you had a valid point all along, just that you made me work pretty
hard to understand you.

> > > But it's possible I'm misunderstanding the purpose of chunk_sectors,
> > > or there should be a check that one of the two devices' chunk
> > > sizes divides the other.
> >
> > Seriously not amused by your response, I now have to do damage control
> > because you have a concern that you really weren't able to communicate
> > very effectively.
> 
> Apologies.

Eh, I need to build my pain threshold back up.. been away from it all
for more than a week.. ;)

Mike




* [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-11-30 17:18 [dm-devel] [PATCH] block: revert to using min_not_zero() when stacking chunk_sectors Mike Snitzer
  2020-11-30 20:51 ` John Dorminy
@ 2020-12-01 16:07 ` Mike Snitzer
  2020-12-01 17:43   ` John Dorminy
                     ` (4 more replies)
  1 sibling, 5 replies; 33+ messages in thread
From: Mike Snitzer @ 2020-12-01 16:07 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, dm-devel, bjohnsto, jdorminy, martin.petersen

commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
reflect the most limited of all devices in the IO stack.

Otherwise malformed IO may result. E.g.: prior to this fix,
->chunk_sectors = lcm_not_zero(8, 128) would result in
blk_max_size_offset() splitting IO at 128 sectors rather than the
required more restrictive 8 sectors.

And since commit 07d098e6bbad ("block: allow 'chunk_sectors' to be
non-power-of-2") care must be taken to properly stack chunk_sectors to
be compatible with the possibility that a non-power-of-2 chunk_sectors
may be stacked. This is why gcd() is used instead of reverting back
to using min_not_zero().

Fixes: 22ada802ede8 ("block: use lcm_not_zero() when stacking chunk_sectors")
Fixes: 07d098e6bbad ("block: allow 'chunk_sectors' to be non-power-of-2")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-settings.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

v2: use gcd(), instead of min_not_zero(), as suggested by John Dorminy

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 9741d1d83e98..659cdb8a07fe 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -547,7 +547,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 
 	t->io_min = max(t->io_min, b->io_min);
 	t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
-	t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
+
+	/* Set non-power-of-2 compatible chunk_sectors boundary */
+	if (b->chunk_sectors)
+		t->chunk_sectors = gcd(t->chunk_sectors, b->chunk_sectors);
 
 	/* Physical block size a multiple of the logical block size? */
 	if (t->physical_block_size & (t->logical_block_size - 1)) {
-- 
2.15.0




* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-01 16:07 ` [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking Mike Snitzer
@ 2020-12-01 17:43   ` John Dorminy
  2020-12-01 17:53   ` Jens Axboe
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: John Dorminy @ 2020-12-01 17:43 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, linux-block, device-mapper development,
	Bruce Johnston, Martin K. Petersen

On Tue, Dec 1, 2020 at 11:07 AM Mike Snitzer <snitzer@redhat.com> wrote:
>
> commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
> chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
> reflect the most limited of all devices in the IO stack.
>
> Otherwise malformed IO may result. E.g.: prior to this fix,
> ->chunk_sectors = lcm_not_zero(8, 128) would result in
> blk_max_size_offset() splitting IO at 128 sectors rather than the
> required more restrictive 8 sectors.
>
> And since commit 07d098e6bbad ("block: allow 'chunk_sectors' to be
> non-power-of-2") care must be taken to properly stack chunk_sectors to
> be compatible with the possibility that a non-power-of-2 chunk_sectors
> may be stacked. This is why gcd() is used instead of reverting back
> to using min_not_zero().
>
> Fixes: 22ada802ede8 ("block: use lcm_not_zero() when stacking chunk_sectors")
> Fixes: 07d098e6bbad ("block: allow 'chunk_sectors' to be non-power-of-2")
> Cc: stable@vger.kernel.org
> Reported-by: John Dorminy <jdorminy@redhat.com>
> Reported-by: Bruce Johnston <bjohnsto@redhat.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  block/blk-settings.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> v2: use gcd(), instead of min_not_zero(), as suggested by John Dorminy
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 9741d1d83e98..659cdb8a07fe 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -547,7 +547,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>
>         t->io_min = max(t->io_min, b->io_min);
>         t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
> -       t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
> +
> +       /* Set non-power-of-2 compatible chunk_sectors boundary */
> +       if (b->chunk_sectors)
> +               t->chunk_sectors = gcd(t->chunk_sectors, b->chunk_sectors);
>
>         /* Physical block size a multiple of the logical block size? */
>         if (t->physical_block_size & (t->logical_block_size - 1)) {
> --
> 2.15.0
>

Reviewed-by: John Dorminy <jdorminy@redhat.com>

Thanks!




* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-01 16:07 ` [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking Mike Snitzer
  2020-12-01 17:43   ` John Dorminy
@ 2020-12-01 17:53   ` Jens Axboe
  2020-12-01 18:02   ` Martin K. Petersen
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: Jens Axboe @ 2020-12-01 17:53 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, dm-devel, bjohnsto, jdorminy, martin.petersen

On 12/1/20 9:07 AM, Mike Snitzer wrote:
> commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
> chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
> reflect the most limited of all devices in the IO stack.
> 
> Otherwise malformed IO may result. E.g.: prior to this fix,
> ->chunk_sectors = lcm_not_zero(8, 128) would result in
> blk_max_size_offset() splitting IO at 128 sectors rather than the
> required more restrictive 8 sectors.
> 
> And since commit 07d098e6bbad ("block: allow 'chunk_sectors' to be
> non-power-of-2") care must be taken to properly stack chunk_sectors to
> be compatible with the possibility that a non-power-of-2 chunk_sectors
> may be stacked. This is why gcd() is used instead of reverting back
> to using min_not_zero().

Applied for 5.10, thanks.

-- 
Jens Axboe




* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-01 16:07 ` [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking Mike Snitzer
  2020-12-01 17:43   ` John Dorminy
  2020-12-01 17:53   ` Jens Axboe
@ 2020-12-01 18:02   ` Martin K. Petersen
  2020-12-02  3:38   ` [dm-devel] [PATCH] dm: " Jeffle Xu
  2020-12-03  3:26   ` [dm-devel] [PATCH v2] block: " Ming Lei
  4 siblings, 0 replies; 33+ messages in thread
From: Martin K. Petersen @ 2020-12-01 18:02 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe, martin.petersen, jdorminy, bjohnsto, linux-block, dm-devel


Mike,

> And since commit 07d098e6bbad ("block: allow 'chunk_sectors' to be
> non-power-of-2") care must be taken to properly stack chunk_sectors to
> be compatible with the possibility that a non-power-of-2 chunk_sectors
> may be stacked. This is why gcd() is used instead of reverting back
> to using min_not_zero().

This approach looks fine to me.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering




* [dm-devel] [PATCH] dm: use gcd() to fix chunk_sectors limit stacking
  2020-12-01 16:07 ` [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking Mike Snitzer
                     ` (2 preceding siblings ...)
  2020-12-01 18:02   ` Martin K. Petersen
@ 2020-12-02  3:38   ` Jeffle Xu
  2020-12-02  3:38     ` Jeffle Xu
  2020-12-03  3:26   ` [dm-devel] [PATCH v2] block: " Ming Lei
  4 siblings, 1 reply; 33+ messages in thread
From: Jeffle Xu @ 2020-12-02  3:38 UTC (permalink / raw)
  To: snitzer; +Cc: linux-block, joseph.qi, dm-devel

As stated in commit 7e7986f9d3ba ("block: use gcd() to fix
chunk_sectors limit stacking"), chunk_sectors should reflect the most
limited of all devices in the IO stack.

The previous commit only fixes block/blk-settings.c:blk_stack_limits(),
while leaving dm.c:dm_calculate_queue_limits() unfixed.

Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 drivers/md/dm-table.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index ce543b761be7..dcc0a27355d7 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -22,6 +22,7 @@
 #include <linux/blk-mq.h>
 #include <linux/mount.h>
 #include <linux/dax.h>
+#include <linux/gcd.h>
 
 #define DM_MSG_PREFIX "table"
 
@@ -1457,7 +1458,7 @@ int dm_calculate_queue_limits(struct dm_table *table,
 
 		/* Stack chunk_sectors if target-specific splitting is required */
 		if (ti->max_io_len)
-			ti_limits.chunk_sectors = lcm_not_zero(ti->max_io_len,
+			ti_limits.chunk_sectors = gcd(ti->max_io_len,
 							       ti_limits.chunk_sectors);
 		/* Set I/O hints portion of queue limits */
 		if (ti->type->io_hints)
-- 
2.27.0




* [dm-devel] [PATCH] dm: use gcd() to fix chunk_sectors limit stacking
  2020-12-02  3:38   ` [dm-devel] [PATCH] dm: " Jeffle Xu
@ 2020-12-02  3:38     ` Jeffle Xu
  2020-12-02  3:57       ` JeffleXu
  0 siblings, 1 reply; 33+ messages in thread
From: Jeffle Xu @ 2020-12-02  3:38 UTC (permalink / raw)
  To: snitzer; +Cc: linux-block, joseph.qi, dm-devel

As stated in commit 7e7986f9d3ba ("block: use gcd() to fix
chunk_sectors limit stacking"), chunk_sectors should reflect the most
limited of all devices in the IO stack.

The previous commit only fixes block/blk-settings.c:blk_stack_limits(),
while leaving dm.c:dm_calculate_queue_limits() unfixed.

Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 drivers/md/dm-table.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index ce543b761be7..dcc0a27355d7 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -22,6 +22,7 @@
 #include <linux/blk-mq.h>
 #include <linux/mount.h>
 #include <linux/dax.h>
+#include <linux/gcd.h>
 
 #define DM_MSG_PREFIX "table"
 
@@ -1457,7 +1458,7 @@ int dm_calculate_queue_limits(struct dm_table *table,
 
 		/* Stack chunk_sectors if target-specific splitting is required */
 		if (ti->max_io_len)
-			ti_limits.chunk_sectors = lcm_not_zero(ti->max_io_len,
+			ti_limits.chunk_sectors = gcd(ti->max_io_len,
 							       ti_limits.chunk_sectors);
 		/* Set I/O hints portion of queue limits */
 		if (ti->type->io_hints)
-- 
2.27.0




* Re: [dm-devel] [PATCH] dm: use gcd() to fix chunk_sectors limit stacking
  2020-12-02  3:38     ` Jeffle Xu
@ 2020-12-02  3:57       ` JeffleXu
  2020-12-02  5:03         ` [dm-devel] " Mike Snitzer
  0 siblings, 1 reply; 33+ messages in thread
From: JeffleXu @ 2020-12-02  3:57 UTC (permalink / raw)
  To: snitzer; +Cc: linux-block, joseph.qi, dm-devel

Actually, in terms of this issue, I think the dilemma here is that
@chunk_sectors of a dm device mainly comes from two sources.

One is from the underlying devices, which are combined into one
composed limit in blk_stack_limits().

> commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
> chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
> reflect the most limited of all devices in the IO stack.
> 
> Otherwise malformed IO may result. E.g.: prior to this fix,
> ->chunk_sectors = lcm_not_zero(8, 128) would result in
> blk_max_size_offset() splitting IO at 128 sectors rather than the
> required more restrictive 8 sectors.

For this part, technically I can't agree that 'chunk_sectors must
reflect the most limited of all devices in the IO stack'. Even if the dm
device advertises a chunk_sectors of 128K when the limits of the two
underlying devices are 8K and 128K, and thus splitting is not done in
the dm device phase, the underlying devices will split by themselves.

> @@ -547,7 +547,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  
>  	t->io_min = max(t->io_min, b->io_min);
>  	t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
> -	t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
> +
> +	/* Set non-power-of-2 compatible chunk_sectors boundary */
> +	if (b->chunk_sectors)
> +		t->chunk_sectors = gcd(t->chunk_sectors, b->chunk_sectors);

This may introduce a regression. Suppose the @chunk_sectors limits of
the two underlying devices are 8K and 128K; then @chunk_sectors of the
dm device is 8K after the fix. So even when a 128K sized bio is actually
redirected to the underlying device with the 128K @chunk_sectors limit,
this 128K sized bio will be split into 16 bios, each 8K sized.
Obviously this is excessive splitting. And I think this is actually why
lcm_not_zero(a, b) was used originally.


The other source is the dm device itself. A DM device can set
@max_io_len through ->io_hints(), and then set @chunk_sectors from
@max_io_len.

This part is actually where 'chunk_sectors must reflect the most limited
of all devices in the IO stack' is true, and we have to apply the
strictest limitation here. This is actually what the following patch does.



On 12/2/20 11:38 AM, Jeffle Xu wrote:
> As it said in commit 7e7986f9d3ba ("block: use gcd() to fix
> chunk_sectors limit stacking"), chunk_sectors should reflect the most
> limited of all devices in the IO stack.
> 
> The previous commit only fixes block/blk-settings.c:blk_stack_limits(),
> while leaving dm.c:dm_calculate_queue_limits() unfixed.
> 
> Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
> cc: stable@vger.kernel.org
> Reported-by: John Dorminy <jdorminy@redhat.com>
> Reported-by: Bruce Johnston <bjohnsto@redhat.com>
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  drivers/md/dm-table.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index ce543b761be7..dcc0a27355d7 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -22,6 +22,7 @@
>  #include <linux/blk-mq.h>
>  #include <linux/mount.h>
>  #include <linux/dax.h>
> +#include <linux/gcd.h>
>  
>  #define DM_MSG_PREFIX "table"
>  
> @@ -1457,7 +1458,7 @@ int dm_calculate_queue_limits(struct dm_table *table,
>  
>  		/* Stack chunk_sectors if target-specific splitting is required */
>  		if (ti->max_io_len)
> -			ti_limits.chunk_sectors = lcm_not_zero(ti->max_io_len,
> +			ti_limits.chunk_sectors = gcd(ti->max_io_len,
>  							       ti_limits.chunk_sectors);
>  		/* Set I/O hints portion of queue limits */
>  		if (ti->type->io_hints)
> 

-- 
Thanks,
Jeffle

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [dm-devel] dm: use gcd() to fix chunk_sectors limit stacking
  2020-12-02  3:57       ` JeffleXu
@ 2020-12-02  5:03         ` Mike Snitzer
  2020-12-02  5:14           ` Mike Snitzer
                             ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Mike Snitzer @ 2020-12-02  5:03 UTC (permalink / raw)
  To: JeffleXu; +Cc: linux-block, joseph.qi, dm-devel

What you've done here is fairly chaotic/disruptive:
1) you emailed a patch that isn't needed or ideal; I had already
   staged a DM fix in linux-next for 5.10-rcX, see:
   https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=f28de262ddf09b635095bdeaf0e07ff507b3c41b
2) you replied to your patch and started referencing snippets of this
   other patch's header (now staged for 5.10-rcX via Jens' block tree):
   https://git.kernel.dk/cgit/linux-block/commit/?h=block-5.10&id=7e7986f9d3ba69a7375a41080a1f8c8012cb0923
   - why not reply to _that_ patch in response to something stated in it?
3) you started telling me, and others on these lists, why you think I
   used lcm_not_zero().
   - reality is I wanted gcd() behavior, I just didn't reason through
     the math to know it. It was a stupid oversight on my part, not
     designed with precision.
4) Why not check with me before you craft a patch, as if others had
   reported the problem to you? I know it's logical to follow the chain
   of implications based on one commit and see where else there might be
   gaps, but... it is strange to just pick up someone else's work like
   that.

All just _seems_ weird and overdone. This isn't the kind of help I
need. That said, I _do_ appreciate you looking at making blk IO polling
work with bio-based (and DM's bio splitting in particular), but the
lack of importance you put on DM's splitting below makes me concerned.

On Tue, Dec 01 2020 at 10:57pm -0500,
JeffleXu <jefflexu@linux.alibaba.com> wrote:

> Actually in terms of this issue, I think the dilemma here is that,
> @chunk_sectors of dm device is mainly from two source.
> 
> One is that from the underlying devices, which is calculated into one
> composed one in blk_stack_limits().
> 
> > commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
> > chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
> > reflect the most limited of all devices in the IO stack.
> > 
> > Otherwise malformed IO may result. E.g.: prior to this fix,
> > ->chunk_sectors = lcm_not_zero(8, 128) would result in
> > blk_max_size_offset() splitting IO at 128 sectors rather than the
> > required more restrictive 8 sectors.
> 
> For this part, technically I can't agree that 'chunk_sectors must
> reflect the most limited of all devices in the IO stack'. Even if the dm
> device advertises chunk_sectors of 128K when the limits of two
> underlying devices are 8K and 128K, and thus splitting is not done in dm
> device phase, the underlying devices will split by themselves.

DM targets themselves _do_ require their own splitting.  You cannot just
assume all IO that passes through DM targets doesn't need to be properly
sized on entry.  Sure underlying devices will do their own splitting,
but those splits are based on their requirements.  DM targets have their
own IO size limits too.  Each layer needs to enforce and respect the
constraints of its layer while also factoring in those of the underlying
devices.

> > @@ -547,7 +547,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >  
> >  	t->io_min = max(t->io_min, b->io_min);
> >  	t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
> > -	t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
> > +
> > +	/* Set non-power-of-2 compatible chunk_sectors boundary */
> > +	if (b->chunk_sectors)
> > +		t->chunk_sectors = gcd(t->chunk_sectors, b->chunk_sectors);
> 
> This may introduces a regression.

Regression relative to what?  5.10 was the regression point.  The commit
header you pasted into your reply clearly conveys that commit
22ada802ede8 caused the regression.  It makes no sense to try to create
some other regression point.  You cannot have both from a single commit
in the most recent Linux 5.10 release.

And so I have no idea why you think that restoring DM's _required_
splitting constraints is somehow a regression.

> Suppose the @chunk_sectors limits of
> two underlying devices are 8K and 128K, then @chunk_sectors of dm device
> is 8K after the fix. So even when a 128K sized bio is actually
> redirecting to the underlying device with 128K @chunk_sectors limit,
> this 128K sized bio will actually split into 16 split bios, each 8K
> sized. Obviously it is excessive split. And I think this is actually why
> lcm_not_zero(a, b) is used originally.

No.  Not excessive splitting, required splitting.  And as I explained in
point 2) above, avoiding "excessive splits" isn't why lcm_not_zero() was
improperly used to stack chunk_sectors.

Some DM targets really do require the IO be split on specific boundaries
-- however inconvenient for the underlying layers that DM splitting
might be.

> The other one source is dm device itself. DM device can set @max_io_len
> through ->io_hint(), and then set @chunk_sectors from @max_io_len.

ti->max_io_len should always be set in the DM target's .ctr

DM core takes care of applying max_io_len to chunk_sectors since 5.10,
you should know that given your patch is meant to fix commit
882ec4e609c1

And for 5.11 I've staged a change to have it impose max_io_len in terms
of ->max_sectors too, see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.11&id=41dcb8f21a86edbe409b2bef9bb1df4cb9d66858

One thing I clearly need to do moving forward is: always post my changes
to dm-devel; just so someone like yourself can follow along via email
client.  I just assumed others who care about DM changes also track the
linux-dm.git tree's branches.  Clearly not the best assumption or
practice on my part.

> This part is actually where 'chunk_sectors must reflect the most limited
> of all devices in the IO stack' is true, and we have to apply the most
> strict limitation here. This is actually what the following patch does.

There is a very consistent and deliberate way that device limits must be
handled, sometimes I too have missteps but that doesn't change the fact
that there is a deliberate evenness to how limits are stacked.
blk_stack_limits() needs to be the authority on how these limits stack
up.  So all DM's limits stacking wraps calls to it.  My fix, shared in
point 1) above, restores that design pattern by _not_ having DM
duplicate a subset of how blk_stack_limits() does its stacking.

Mike

> On 12/2/20 11:38 AM, Jeffle Xu wrote:
> > As it said in commit 7e7986f9d3ba ("block: use gcd() to fix
> > chunk_sectors limit stacking"), chunk_sectors should reflect the most
> > limited of all devices in the IO stack.
> > 
> > The previous commit only fixes block/blk-settings.c:blk_stack_limits(),
> > while leaving dm.c:dm_calculate_queue_limits() unfixed.
> > 
> > Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
> > cc: stable@vger.kernel.org
> > Reported-by: John Dorminy <jdorminy@redhat.com>
> > Reported-by: Bruce Johnston <bjohnsto@redhat.com>
> > Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> > ---
> >  drivers/md/dm-table.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index ce543b761be7..dcc0a27355d7 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -22,6 +22,7 @@
> >  #include <linux/blk-mq.h>
> >  #include <linux/mount.h>
> >  #include <linux/dax.h>
> > +#include <linux/gcd.h>
> >  
> >  #define DM_MSG_PREFIX "table"
> >  
> > @@ -1457,7 +1458,7 @@ int dm_calculate_queue_limits(struct dm_table *table,
> >  
> >  		/* Stack chunk_sectors if target-specific splitting is required */
> >  		if (ti->max_io_len)
> > -			ti_limits.chunk_sectors = lcm_not_zero(ti->max_io_len,
> > +			ti_limits.chunk_sectors = gcd(ti->max_io_len,
> >  							       ti_limits.chunk_sectors);
> >  		/* Set I/O hints portion of queue limits */
> >  		if (ti->type->io_hints)
> > 
> 
> -- 
> Thanks,
> Jeffle
> 


* Re: [dm-devel] dm: use gcd() to fix chunk_sectors limit stacking
  2020-12-02  5:03         ` [dm-devel] " Mike Snitzer
@ 2020-12-02  5:14           ` Mike Snitzer
  2020-12-02  6:31             ` JeffleXu
  2020-12-02  6:28           ` JeffleXu
  2020-12-02  7:10           ` JeffleXu
  2 siblings, 1 reply; 33+ messages in thread
From: Mike Snitzer @ 2020-12-02  5:14 UTC (permalink / raw)
  To: JeffleXu; +Cc: linux-block, joseph.qi, dm-devel

On Wed, Dec 02 2020 at 12:03am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> What you've done here is fairly chaotic/disruptive:
> 1) you emailed a patch out that isn't needed or ideal, I dealt already
>    staged a DM fix in linux-next for 5.10-rcX, see:
>    https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=f28de262ddf09b635095bdeaf0e07ff507b3c41b
> 2) you replied to your patch and started referencing snippets of this
>    other patch's header (now staged for 5.10-rcX via Jens' block tree):
>    https://git.kernel.dk/cgit/linux-block/commit/?h=block-5.10&id=7e7986f9d3ba69a7375a41080a1f8c8012cb0923
>    - why not reply to _that_ patch in response something stated in it?

I now see you did reply to the original v2 patch:
https://www.redhat.com/archives/dm-devel/2020-December/msg00006.html

but you changed the Subject to have a "dm" prefix for some reason.
Strange but OK... though it got really weird when you cut-and-pasted
your other DM patch in reply at the bottom of your email.  If you find
yourself cross-referencing emails and cutting and pasting like that, you
probably shouldn't.  Makes it chaotic for others to follow along.

Thanks,
Mike


* Re: [dm-devel] dm: use gcd() to fix chunk_sectors limit stacking
  2020-12-02  5:03         ` [dm-devel] " Mike Snitzer
  2020-12-02  5:14           ` Mike Snitzer
@ 2020-12-02  6:28           ` JeffleXu
  2020-12-02  7:10           ` JeffleXu
  2 siblings, 0 replies; 33+ messages in thread
From: JeffleXu @ 2020-12-02  6:28 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, joseph.qi, dm-devel



On 12/2/20 1:03 PM, Mike Snitzer wrote:
> What you've done here is fairly chaotic/disruptive:
> 1) you emailed a patch out that isn't needed or ideal, I dealt already
>    staged a DM fix in linux-next for 5.10-rcX, see:
>    https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=f28de262ddf09b635095bdeaf0e07ff507b3c41b

Fine. I indeed didn't follow linux-dm.git. Sorry, it's my fault.

> 2) you replied to your patch and started referencing snippets of this
>    other patch's header (now staged for 5.10-rcX via Jens' block tree):
>    https://git.kernel.dk/cgit/linux-block/commit/?h=block-5.10&id=7e7986f9d3ba69a7375a41080a1f8c8012cb0923
>    - why not reply to _that_ patch in response something stated in it?

I just wanted to send everything in one email, which was obviously improper.

> 3) you started telling me, and others on these lists, why you think I
>    used lcm_not_zero().
>    - reality is I wanted gcd() behavior, I just didn't reason through
>      the math to know it.. it was a stupid oversight on my part.  Not
>      designed with precision.
> 4) Why not check with me before you craft a patch like others reported
>    the problem to you? I know it logical to follow the chain of
>    implications based on one commit and see where else there might be
>    gaps but... it is strange to just pickup someone else's work like
>    that.
> 
> All just _seems_ weird and overdone. This isn't the kind of help I
> need. That said, I _do_ appreciate you looking at making blk IO polling
> work with bio-based (and DM's bio splitting in particular), but the
> lack of importance you put on DM's splitting below makes me concerned.
> 
Though I noticed this discussion yesterday, I didn't read it thoroughly
until today. By the time I noticed there might be one remaining issue (I
know now that there is not), the patch, that is commit 22ada802ede8, had
already been adopted by Jens, so I sent out a patch. If Jens had not
already replied, I would just have replied under your mail. That's it. I
have to admit that I got excited when I realized that I could send a
patch. But it seems improper and more likely a misunderstanding. I
apologize if I did wrong.


> On Tue, Dec 01 2020 at 10:57pm -0500,
> JeffleXu <jefflexu@linux.alibaba.com> wrote:
> 
>> Actually in terms of this issue, I think the dilemma here is that,
>> @chunk_sectors of dm device is mainly from two source.
>>
>> One is that from the underlying devices, which is calculated into one
>> composed one in blk_stack_limits().
>>
>>> commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
>>> chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
>>> reflect the most limited of all devices in the IO stack.
>>>
>>> Otherwise malformed IO may result. E.g.: prior to this fix,
>>> ->chunk_sectors = lcm_not_zero(8, 128) would result in
>>> blk_max_size_offset() splitting IO at 128 sectors rather than the
>>> required more restrictive 8 sectors.
>>
>> For this part, technically I can't agree that 'chunk_sectors must
>> reflect the most limited of all devices in the IO stack'. Even if the dm
>> device advertises chunk_sectors of 128K when the limits of two
>> underlying devices are 8K and 128K, and thus splitting is not done in dm
>> device phase, the underlying devices will split by themselves.
> 
> DM targets themselves _do_ require their own splitting.  You cannot just
> assume all IO that passes through DM targets doesn't need to be properly
> sized on entry.  Sure underlying devices will do their own splitting,
> but those splits are based on their requirements.  DM targets have their
> own IO size limits too.  Each layer needs to enforce and respect the
> constraints of its layer while also factoring in those of the underlying
> devices.
> 
Got it. Thanks.


>>> @@ -547,7 +547,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>>  
>>>  	t->io_min = max(t->io_min, b->io_min);
>>>  	t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
>>> -	t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
>>> +
>>> +	/* Set non-power-of-2 compatible chunk_sectors boundary */
>>> +	if (b->chunk_sectors)
>>> +		t->chunk_sectors = gcd(t->chunk_sectors, b->chunk_sectors);
>>
>> This may introduces a regression.
> 
> Regression relative to what?  5.10 was the regression point.  The commit
> header you pasted into your reply clearly conveys that commit
> 22ada802ede8 caused the regression.  It makes no sense to try to create
> some other regression point.  You cannot have both from a single commit
> in the most recent Linux 5.10 release.
> 
> And so I have no idea why you think that restoring DM's _required_
> splitting constraints is somehow a regression.

I mistakenly missed that all these changes were introduced in v5.10.
Sorry for that.

> 
>> Suppose the @chunk_sectors limits of
>> two underlying devices are 8K and 128K, then @chunk_sectors of dm device
>> is 8K after the fix. So even when a 128K sized bio is actually
>> redirecting to the underlying device with 128K @chunk_sectors limit,
>> this 128K sized bio will actually split into 16 split bios, each 8K
>> sized. Obviously it is excessive split. And I think this is actually why
>> lcm_not_zero(a, b) is used originally.
> 
> No.  Not excessive splitting, required splitting.  And as I explained in
> point 2) above, avoiding "excessive splits" isn't why lcm_not_zero() was
> improperly used to stack chunk_sectors.

This is indeed a difference between 5.9 and 5.10. In 5.10 there may be
more small split bios, since the smaller chunk_sectors is also applied
to the underlying device with the larger chunk_sectors limit (that is,
the underlying device with 128K chunk_sectors). I cannot say whether
more small split bios cause worse performance, since I have not tested
it.


> 
> Some DM targets really do require the IO be split on specific boundaries
> -- however inconvenient for the underlying layers that DM splitting
> might be.
> 


>> The other one source is dm device itself. DM device can set @max_io_len
>> through ->io_hint(), and then set @chunk_sectors from @max_io_len.
> 
> ti->max_io_len should always be set in the DM target's .ctr
> 
Yes, I misremembered it.

> DM core takes care of applying max_io_len to chunk_sectors since 5.10,
> you should know that given your patch is meant to fix commit
> 882ec4e609c1
> 
> And for 5.11 I've staged a change to have it impose max_io_len in terms
> of ->max_sectors too, see:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.11&id=41dcb8f21a86edbe409b2bef9bb1df4cb9d66858
> 
Thanks.

> One thing I clearly need to do moving forward is: always post my changes
> to dm-devel; just so someone like yourself can follow along via email
> client.  I just assumed others who care about DM changes also track the
> linux-dm.git tree's branches.  Clearly not the best assumption or
> practice on my part.

I used Linus' tree as my code base, which seems improper...

> 
>> This part is actually where 'chunk_sectors must reflect the most limited
>> of all devices in the IO stack' is true, and we have to apply the most
>> strict limitation here. This is actually what the following patch does.
> 
> There is a very consistent and deliberate way that device limits must be
> handled, sometimes I too have missteps but that doesn't change the fact
> that there is a deliberate evenness to how limits are stacked.
> blk_stack_limits() needs to be the authority on how these limits stack
> up.  So all DM's limits stacking wraps calls to it.  My fix, shared in
> point 1) above, restores that design pattern by _not_ having DM
> duplicate a subset of how blk_stack_limits() does its stacking.
> 
> Mike


-- 
Thanks,
Jeffle


* Re: [dm-devel] dm: use gcd() to fix chunk_sectors limit stacking
  2020-12-02  5:14           ` Mike Snitzer
@ 2020-12-02  6:31             ` JeffleXu
  2020-12-02  6:35               ` JeffleXu
  0 siblings, 1 reply; 33+ messages in thread
From: JeffleXu @ 2020-12-02  6:31 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, joseph.qi, dm-devel



On 12/2/20 1:14 PM, Mike Snitzer wrote:
> On Wed, Dec 02 2020 at 12:03am -0500,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
>> What you've done here is fairly chaotic/disruptive:
>> 1) you emailed a patch out that isn't needed or ideal, I dealt already
>>    staged a DM fix in linux-next for 5.10-rcX, see:
>>    https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=f28de262ddf09b635095bdeaf0e07ff507b3c41b
>> 2) you replied to your patch and started referencing snippets of this
>>    other patch's header (now staged for 5.10-rcX via Jens' block tree):
>>    https://git.kernel.dk/cgit/linux-block/commit/?h=block-5.10&id=7e7986f9d3ba69a7375a41080a1f8c8012cb0923
>>    - why not reply to _that_ patch in response something stated in it?
> 
> I now see you did reply to the original v2 patch:
> https://www.redhat.com/archives/dm-devel/2020-December/msg00006.html
> 
> but you changed the Subject to have a "dm" prefix for some reason.

My original intent was that this be a new patch, 'dm: XXXXXXXX'. This
patch should coexist with your patch 'block: XXXXXX'.

Can I say that it's totally a mistake ;)


> Strange but OK.. though it got really weird when you cut-and-paste your
> other DM patch in reply at the bottom of your email.  If you find
> yourself cross referencing emails and cutting and pasting like that, you
> probably shouldn't.  Makes it chaotic for others to follow along.
> 
> Thanks,
> Mike
> 

-- 
Thanks,
Jeffle


* Re: [dm-devel] dm: use gcd() to fix chunk_sectors limit stacking
  2020-12-02  6:31             ` JeffleXu
@ 2020-12-02  6:35               ` JeffleXu
  0 siblings, 0 replies; 33+ messages in thread
From: JeffleXu @ 2020-12-02  6:35 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, joseph.qi, dm-devel



On 12/2/20 2:31 PM, JeffleXu wrote:
> 
> 
> On 12/2/20 1:14 PM, Mike Snitzer wrote:
>> On Wed, Dec 02 2020 at 12:03am -0500,
>> Mike Snitzer <snitzer@redhat.com> wrote:
>>
>>> What you've done here is fairly chaotic/disruptive:
>>> 1) you emailed a patch out that isn't needed or ideal, I dealt already
>>>    staged a DM fix in linux-next for 5.10-rcX, see:
>>>    https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=f28de262ddf09b635095bdeaf0e07ff507b3c41b
>>> 2) you replied to your patch and started referencing snippets of this
>>>    other patch's header (now staged for 5.10-rcX via Jens' block tree):
>>>    https://git.kernel.dk/cgit/linux-block/commit/?h=block-5.10&id=7e7986f9d3ba69a7375a41080a1f8c8012cb0923
>>>    - why not reply to _that_ patch in response something stated in it?
>>
>> I now see you did reply to the original v2 patch:
>> https://www.redhat.com/archives/dm-devel/2020-December/msg00006.html
>>
>> but you changed the Subject to have a "dm" prefix for some reason.
> 
> In my original purpose, this is a new patch, 'dm: XXXXXXXX'. This patch
> should coexist with your patch 'block: XXXXXX'.
> 
> Can I say that it's totally a mistake ;)
s/mistake/misunderstanding

> 
> 
>> Strange but OK.. though it got really weird when you cut-and-paste your
>> other DM patch in reply at the bottom of your email.  If you find
>> yourself cross referencing emails and cutting and pasting like that, you
>> probably shouldn't.  Makes it chaotic for others to follow along.
>>
>> Thanks,
>> Mike
>>
> 

-- 
Thanks,
Jeffle


* Re: [dm-devel] dm: use gcd() to fix chunk_sectors limit stacking
  2020-12-02  5:03         ` [dm-devel] " Mike Snitzer
  2020-12-02  5:14           ` Mike Snitzer
  2020-12-02  6:28           ` JeffleXu
@ 2020-12-02  7:10           ` JeffleXu
  2020-12-02 15:11             ` Mike Snitzer
  2 siblings, 1 reply; 33+ messages in thread
From: JeffleXu @ 2020-12-02  7:10 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, joseph.qi, dm-devel



On 12/2/20 1:03 PM, Mike Snitzer wrote:
> What you've done here is fairly chaotic/disruptive:
> 1) you emailed a patch out that isn't needed or ideal, I dealt already
>    staged a DM fix in linux-next for 5.10-rcX, see:
>    https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=f28de262ddf09b635095bdeaf0e07ff507b3c41b

Then ti->type->io_hints() is still bypassed when ->iterate_devices()
is not defined?

-- 
Thanks,
Jeffle


* Re: [dm-devel] dm: use gcd() to fix chunk_sectors limit stacking
  2020-12-02  7:10           ` JeffleXu
@ 2020-12-02 15:11             ` Mike Snitzer
  2020-12-03  1:48               ` JeffleXu
  0 siblings, 1 reply; 33+ messages in thread
From: Mike Snitzer @ 2020-12-02 15:11 UTC (permalink / raw)
  To: JeffleXu; +Cc: linux-block, joseph.qi, dm-devel

On Wed, Dec 02 2020 at  2:10am -0500,
JeffleXu <jefflexu@linux.alibaba.com> wrote:

> 
> 
> On 12/2/20 1:03 PM, Mike Snitzer wrote:
> > What you've done here is fairly chaotic/disruptive:
> > 1) you emailed a patch out that isn't needed or ideal, I dealt already
> >    staged a DM fix in linux-next for 5.10-rcX, see:
> >    https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=f28de262ddf09b635095bdeaf0e07ff507b3c41b
> 
> Then ti->type->io_hints() is still bypassed when type->iterate_devices()
> not defined?

Yes, the stacking of limits really is tightly coupled to device-based
influence.  Hypothetically some DM target that doesn't remap to any data
devices may want to override limits... in practice there isn't a need
for this.  If that changes we can take action to accommodate it.. but I'm
definitely not interested in modifying DM core in this area when there
isn't a demonstrated need.

Mike


* Re: [dm-devel] dm: use gcd() to fix chunk_sectors limit stacking
  2020-12-02 15:11             ` Mike Snitzer
@ 2020-12-03  1:48               ` JeffleXu
  0 siblings, 0 replies; 33+ messages in thread
From: JeffleXu @ 2020-12-03  1:48 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, joseph.qi, dm-devel



On 12/2/20 11:11 PM, Mike Snitzer wrote:
> On Wed, Dec 02 2020 at  2:10am -0500,
> JeffleXu <jefflexu@linux.alibaba.com> wrote:
> 
>>
>>
>> On 12/2/20 1:03 PM, Mike Snitzer wrote:
>>> What you've done here is fairly chaotic/disruptive:
>>> 1) you emailed a patch out that isn't needed or ideal, I dealt already
>>>    staged a DM fix in linux-next for 5.10-rcX, see:
>>>    https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=f28de262ddf09b635095bdeaf0e07ff507b3c41b
>>
>> Then ti->type->io_hints() is still bypassed when type->iterate_devices()
>> not defined?
> 
> Yes, the stacking of limits really is tightly coupled to device-based
> influence.  Hypothetically some DM target that doesn't remap to any data
> devices may want to override limits... in practice there isn't a need
> for this.  If that changes we can take action to accommodate it.. but I'm
> definitely not interested in modifying DM core in this area when there
> isn't a demonstrated need.

Thanks.

-- 
Thanks,
Jeffle


* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-01 16:07 ` [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking Mike Snitzer
                     ` (3 preceding siblings ...)
  2020-12-02  3:38   ` [dm-devel] [PATCH] dm: " Jeffle Xu
@ 2020-12-03  3:26   ` Ming Lei
  2020-12-03 14:33     ` Mike Snitzer
  4 siblings, 1 reply; 33+ messages in thread
From: Ming Lei @ 2020-12-03  3:26 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe, martin.petersen, jdorminy, bjohnsto, linux-block, dm-devel

On Tue, Dec 01, 2020 at 11:07:09AM -0500, Mike Snitzer wrote:
> commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
> chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
> reflect the most limited of all devices in the IO stack.
> 
> Otherwise malformed IO may result. E.g.: prior to this fix,
> ->chunk_sectors = lcm_not_zero(8, 128) would result in
> blk_max_size_offset() splitting IO at 128 sectors rather than the
> required more restrictive 8 sectors.

What is the user-visible result of splitting IO at 128 sectors?

I understand it isn't related to correctness, because the underlying
queue can split further by its own chunk_sectors limit. So is the issue
too much further splitting on the queue with chunk_sectors of 8, so
that CPU utilization is increased? Or is there some other issue?

> 
> And since commit 07d098e6bbad ("block: allow 'chunk_sectors' to be
> non-power-of-2") care must be taken to properly stack chunk_sectors to
> be compatible with the possibility that a non-power-of-2 chunk_sectors
> may be stacked. This is why gcd() is used instead of reverting back
> to using min_not_zero().

I guess gcd() won't be better, because gcd(a, b) is <= min(a, b), so
bio size is decreased much more with gcd(a, b), and IO performance
should be affected. It may be worse than with min_not_zero(a, b), which
is often > gcd(a, b).


Thanks,
Ming


* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-03  3:26   ` [dm-devel] [PATCH v2] block: " Ming Lei
@ 2020-12-03 14:33     ` Mike Snitzer
  2020-12-03 16:27       ` Keith Busch
  2020-12-04  1:12       ` Ming Lei
  0 siblings, 2 replies; 33+ messages in thread
From: Mike Snitzer @ 2020-12-03 14:33 UTC (permalink / raw)
  To: Ming Lei
  Cc: axboe, martin.petersen, jdorminy, bjohnsto, linux-block, dm-devel

On Wed, Dec 02 2020 at 10:26pm -0500,
Ming Lei <ming.lei@redhat.com> wrote:

> On Tue, Dec 01, 2020 at 11:07:09AM -0500, Mike Snitzer wrote:
> > commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
> > chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
> > reflect the most limited of all devices in the IO stack.
> > 
> > Otherwise malformed IO may result. E.g.: prior to this fix,
> > ->chunk_sectors = lcm_not_zero(8, 128) would result in
> > blk_max_size_offset() splitting IO at 128 sectors rather than the
> > required more restrictive 8 sectors.
> 
> What is the user-visible result of splitting IO at 128 sectors?

The VDO dm target fails because it requires the IO it receives to be
split as it advertised (8 sectors).

> I understand it isn't related with correctness, because the underlying
> queue can split by its own chunk_sectors limit further. So is the issue
> too many further-splitting on queue with chunk_sectors 8? then CPU
> utilization is increased? Or other issue?

No, this is all about correctness.

Seems you're confining the definition of the possible stacking so that
the top-level device isn't allowed to have its own hard requirements on
IO sizes it sends to its internal implementation.  Just because the
underlying device can split further doesn't mean that the top-level
virtual driver can service larger IO sizes (not if the chunk_sectors
stacking throws away the hint the virtual driver provided because it
used lcm_not_zero).
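To make concrete what "throws away the hint" means at split time, here is a
rough userspace model of blk_max_size_offset() (post-07d098e6bbad, so it copes
with a non-power-of-2 chunk_sectors; the helper name and the omission of the
max_sectors cap are mine):

```c
#include <assert.h>

/* Rough model of blk_max_size_offset(): with chunk_sectors set, an IO
 * starting at 'offset' may extend at most to the next chunk boundary,
 * so the splitter caps its length at chunk - (offset % chunk).
 * (The real function additionally caps at max_sectors; omitted here.) */
static unsigned int max_io_len_at(unsigned long long offset,
				  unsigned int chunk_sectors)
{
	if (!chunk_sectors)
		return ~0u;	/* no chunk limit */
	return chunk_sectors - (unsigned int)(offset % chunk_sectors);
}
```

With the lcm-stacked value of 128, a 128-sector bio at offset 0 passes through
unsplit (max_io_len_at(0, 128) == 128), even though the top layer required
8-sector pieces; with chunk_sectors 8 the cap is 8 at every 8-sector boundary.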

> > And since commit 07d098e6bbad ("block: allow 'chunk_sectors' to be
> > non-power-of-2") care must be taken to properly stack chunk_sectors to
> > be compatible with the possibility that a non-power-of-2 chunk_sectors
> > may be stacked. This is why gcd() is used instead of reverting back
> > to using min_not_zero().
> 
> I guess gcd() won't be better, because gcd(a, b) is <= min(a, b): the bio
> size may be decreased considerably by gcd(a, b), and IO performance could be
> affected. It may be worse than min_not_zero(a, b), which is often > gcd(a, b).

Doesn't matter, it is about correctness.

We cannot stack up a chunk_sectors that violates requirements of a given
layer.

Mike




* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-03 14:33     ` Mike Snitzer
@ 2020-12-03 16:27       ` Keith Busch
  2020-12-03 17:56         ` Mike Snitzer
  2020-12-04  1:45         ` Ming Lei
  2020-12-04  1:12       ` Ming Lei
  1 sibling, 2 replies; 33+ messages in thread
From: Keith Busch @ 2020-12-03 16:27 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe, martin.petersen, jdorminy, bjohnsto, Ming Lei,
	linux-block, dm-devel

On Thu, Dec 03, 2020 at 09:33:59AM -0500, Mike Snitzer wrote:
> On Wed, Dec 02 2020 at 10:26pm -0500,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > I understand it isn't related with correctness, because the underlying
> > queue can split by its own chunk_sectors limit further. So is the issue
> > too many further-splitting on queue with chunk_sectors 8? then CPU
> > utilization is increased? Or other issue?
> 
> No, this is all about correctness.
> 
> Seems you're confining the definition of the possible stacking so that
> the top-level device isn't allowed to have its own hard requirements on
> IO sizes it sends to its internal implementation.  Just because the
> underlying device can split further doesn't mean that the top-level
> virtual driver can service larger IO sizes (not if the chunk_sectors
> stacking throws away the hint the virtual driver provided because it
> used lcm_not_zero).

I may be missing something obvious here, but if the lower layers split
to their desired boundary already, why does this limit need to stack?
Won't it also work if each layer sets their desired chunk_sectors
without considering their lower layers? The commit that initially
stacked chunk_sectors doesn't provide any explanation.




* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-03 16:27       ` Keith Busch
@ 2020-12-03 17:56         ` Mike Snitzer
  2020-12-04  1:45         ` Ming Lei
  1 sibling, 0 replies; 33+ messages in thread
From: Mike Snitzer @ 2020-12-03 17:56 UTC (permalink / raw)
  To: Keith Busch
  Cc: axboe, martin.petersen, jdorminy, bjohnsto, Ming Lei,
	linux-block, dm-devel

On Thu, Dec 03 2020 at 11:27am -0500,
Keith Busch <kbusch@kernel.org> wrote:

> On Thu, Dec 03, 2020 at 09:33:59AM -0500, Mike Snitzer wrote:
> > On Wed, Dec 02 2020 at 10:26pm -0500,
> > Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > > I understand it isn't related with correctness, because the underlying
> > > queue can split by its own chunk_sectors limit further. So is the issue
> > > too many further-splitting on queue with chunk_sectors 8? then CPU
> > > utilization is increased? Or other issue?
> > 
> > No, this is all about correctness.
> > 
> > Seems you're confining the definition of the possible stacking so that
> > the top-level device isn't allowed to have its own hard requirements on
> > IO sizes it sends to its internal implementation.  Just because the
> > underlying device can split further doesn't mean that the top-level
> > virtual driver can service larger IO sizes (not if the chunk_sectors
> > stacking throws away the hint the virtual driver provided because it
> > used lcm_not_zero).
> 
> I may be missing something obvious here, but if the lower layers split
> to their desired boundary already, why does this limit need to stack?

The problematic scenario is when the topmost layer, or layers, are the
more constrained.  _That_ is why the top-level's chunk_sectors limit
cannot be relaxed.

For example (in extreme where chunk_sectors is stacked via gcd):
dm VDO target (chunk_sectors=4K)
on dm-thin (ideally chunk_sectors=1280K, reality chunk_sectors=128K)
on 10+2 RAID6 (chunk_sectors=128K, io_opt=1280K)
on raid members (chunk_sectors=0)

Results in the following bottom up blk_stack_limits() stacking:
gcd(128K, 0) = 128K -> but MD just sets chunk_sectors, no stacking is done afaik
gcd(1280K, 128K) = 128K -> this one hurts dm-thin, needless splitting
gcd(4K, 128K) = 4K -> vdo _must_ receive 4K IOs, hurts but "this is the way" ;)

So this is one extreme that shows stacking chunk_sectors is _not_
helpful (if the resulting chunk_sectors were actually used as basis for
splitting).  Better for each layer to just impose its own chunk_sectors
without concern for the layers below.
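The extreme example above can be computed directly; a sketch of the bottom-up
gcd() stacking (units in KiB to match the figures above; gcd here mirrors the
kernel's, where gcd(a, 0) == a, so an unset limit is a no-op):

```c
#include <assert.h>

/* gcd as in lib/gcd.c: gcd(a, 0) == a, so a 0 ("no limit") member
 * leaves the stacked value alone, matching the v2 patch's behavior. */
static unsigned int gcd_u(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Bottom-up stacking of the example IO stack, values in KiB:
 * raid members (0) -> RAID6 (128) -> dm-thin (wants 1280) -> VDO (4) */
static unsigned int stack_example(void)
{
	unsigned int limit;

	limit = gcd_u(128, 0);		/* RAID6 over members: 128 */
	limit = gcd_u(1280, limit);	/* dm-thin loses its 1280K: 128 */
	limit = gcd_u(4, limit);	/* VDO: collapses the stack to 4K */
	return limit;
}
```

Each layer can only shrink (or keep) the stacked value, so dm-thin's 1280K
preference is discarded and the whole stack ends up splitting at VDO's 4K.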

Think I'd be fine with block core removing the chunk_sectors stacking
from blk_stack_limits()...
(and as you see below, I've been forced to revert to _not_ using stacked
chunk_sectors based splitting in DM)

> Won't it also work if each layer sets their desired chunk_sectors
> without considering their lower layers? The commit that initially
> stacked chunk_sectors doesn't provide any explanation.

Yes, I think it would work.  The current stacking doesn't have the
luxury of knowing which layer a blk_stack_limits() call maps to.  BUT within
a layer chunk_sectors really does need to be compatible/symbiotic.  So
it is unfortunately all or nothing as you build up the stack.

And that all-or-nothing stacking of chunk_sectors is why I've now (just
last night, based on further review by jdorminy) had to punt on using
stacked chunk_sectors and revert DM back to doing its own fine-grained
(and varied) splitting on a per DM target basis, see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=6bb38bcc33bf3093c08bd1b71e4f20c82bb60dd1

Kind of depressing that I went so far down the rabbit hole, of wanting
to lean on block core, that I lost sight of an important "tenet of DM":

+         * Does the target need to split IO even further?
+         * - varied (per target) IO splitting is a tenet of DM; this
+         *   explains why stacked chunk_sectors based splitting via
+         *   blk_max_size_offset() isn't possible here.

And it is because of this that DM is forced to lean on human creation of
an optimal IO stack.. which is prone to human error when a particular
thinp "blocksize" is selected, etc.

Mike




* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-03 14:33     ` Mike Snitzer
  2020-12-03 16:27       ` Keith Busch
@ 2020-12-04  1:12       ` Ming Lei
  2020-12-04  2:03         ` Mike Snitzer
  1 sibling, 1 reply; 33+ messages in thread
From: Ming Lei @ 2020-12-04  1:12 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe, martin.petersen, jdorminy, bjohnsto, linux-block, dm-devel

On Thu, Dec 03, 2020 at 09:33:59AM -0500, Mike Snitzer wrote:
> On Wed, Dec 02 2020 at 10:26pm -0500,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > On Tue, Dec 01, 2020 at 11:07:09AM -0500, Mike Snitzer wrote:
> > > commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
> > > chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
> > > reflect the most limited of all devices in the IO stack.
> > > 
> > > Otherwise malformed IO may result. E.g.: prior to this fix,
> > > ->chunk_sectors = lcm_not_zero(8, 128) would result in
> > > blk_max_size_offset() splitting IO at 128 sectors rather than the
> > > required more restrictive 8 sectors.
> > 
> > What is the user-visible result of splitting IO at 128 sectors?
> 
> The VDO dm target fails because it requires IO it receives to be split
> as it advertised (8 sectors).

OK, it looks like VDO's chunk_sectors limit is a hard constraint, even though it
is a DM device, so I guess you are talking about DM over VDO?

Another reason should be that VDO doesn't use blk_queue_split(), otherwise it
wouldn't be a problem, right?

Frankly speaking, if the stacking driver/device has its own hard queue limit,
like a normal hardware drive, the driver should be responsible for the splitting.

> 
> > I understand it isn't related with correctness, because the underlying
> > queue can split by its own chunk_sectors limit further. So is the issue
> > too many further-splitting on queue with chunk_sectors 8? then CPU
> > utilization is increased? Or other issue?
> 
> No, this is all about correctness.
> 
> Seems you're confining the definition of the possible stacking so that
> the top-level device isn't allowed to have its own hard requirements on

I just don't know this story, thanks for your clarification.

As I mentioned above, if the stacking driver has its own hard queue
limit, it should be the driver's responsibility to respect it via
blk_queue_split() or whatever.


Thanks,
Ming




* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-03 16:27       ` Keith Busch
  2020-12-03 17:56         ` Mike Snitzer
@ 2020-12-04  1:45         ` Ming Lei
  2020-12-04  2:11           ` Mike Snitzer
  1 sibling, 1 reply; 33+ messages in thread
From: Ming Lei @ 2020-12-04  1:45 UTC (permalink / raw)
  To: Keith Busch
  Cc: axboe, martin.petersen, jdorminy, Mike Snitzer, bjohnsto,
	linux-block, dm-devel

On Thu, Dec 03, 2020 at 08:27:38AM -0800, Keith Busch wrote:
> On Thu, Dec 03, 2020 at 09:33:59AM -0500, Mike Snitzer wrote:
> > On Wed, Dec 02 2020 at 10:26pm -0500,
> > Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > > I understand it isn't related with correctness, because the underlying
> > > queue can split by its own chunk_sectors limit further. So is the issue
> > > too many further-splitting on queue with chunk_sectors 8? then CPU
> > > utilization is increased? Or other issue?
> > 
> > No, this is all about correctness.
> > 
> > Seems you're confining the definition of the possible stacking so that
> > the top-level device isn't allowed to have its own hard requirements on
> > IO sizes it sends to its internal implementation.  Just because the
> > underlying device can split further doesn't mean that the top-level
> > virtual driver can service larger IO sizes (not if the chunk_sectors
> > stacking throws away the hint the virtual driver provided because it
> > used lcm_not_zero).
> 
> I may be missing something obvious here, but if the lower layers split
> to their desired boundary already, why does this limit need to stack?
> Won't it also work if each layer sets their desired chunk_sectors
> without considering their lower layers? The commit that initially
> stacked chunk_sectors doesn't provide any explanation.

There could be several reasons:

1) some limits have to be stacked, such as logical block size, because
a lower layer may not handle unaligned IO

2) performance: if all limits are stacked at the topmost layer, in
theory IO only needs to be split in the top layer, and does not need
to be split further by any lower layer at all. But there are exceptions
in unusual cases, such as a lower queue's limit changing after the
stacked limits are set up.

3) history: bio splitting is much younger than stacking queue
limits.

Maybe others?


Thanks,
Ming




* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-04  1:12       ` Ming Lei
@ 2020-12-04  2:03         ` Mike Snitzer
  2020-12-04  3:59           ` Ming Lei
  0 siblings, 1 reply; 33+ messages in thread
From: Mike Snitzer @ 2020-12-04  2:03 UTC (permalink / raw)
  To: Ming Lei
  Cc: axboe, martin.petersen, jdorminy, bjohnsto, linux-block, dm-devel

On Thu, Dec 03 2020 at  8:12pm -0500,
Ming Lei <ming.lei@redhat.com> wrote:

> On Thu, Dec 03, 2020 at 09:33:59AM -0500, Mike Snitzer wrote:
> > On Wed, Dec 02 2020 at 10:26pm -0500,
> > Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > > On Tue, Dec 01, 2020 at 11:07:09AM -0500, Mike Snitzer wrote:
> > > > commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
> > > > chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
> > > > reflect the most limited of all devices in the IO stack.
> > > > 
> > > > Otherwise malformed IO may result. E.g.: prior to this fix,
> > > > ->chunk_sectors = lcm_not_zero(8, 128) would result in
> > > > blk_max_size_offset() splitting IO at 128 sectors rather than the
> > > > required more restrictive 8 sectors.
> > > 
> > > What is the user-visible result of splitting IO at 128 sectors?
> > 
> > The VDO dm target fails because it requires IO it receives to be split
> > as it advertised (8 sectors).
> 
> OK, it looks like VDO's chunk_sectors limit is a hard constraint, even though it
> is a DM device, so I guess you are talking about DM over VDO?
> 
> Another reason should be that VDO doesn't use blk_queue_split(), otherwise it
> wouldn't be a problem, right?
> 
> Frankly speaking, if the stacking driver/device has its own hard queue limit,
> like a normal hardware drive, the driver should be responsible for the splitting.

DM core does the splitting for VDO (just like any other DM target).
In 5.9 I updated DM to use chunk_sectors, use blk_stack_limits()
stacking of it, and also use blk_max_size_offset().

But all that block core code has shown itself to be too rigid for DM.  I
tried to force the issue by stacking DM targets' ti->max_io_len with
chunk_sectors.  But really I'd need to be able to pass in the per-target
max_io_len to blk_max_size_offset() to salvage using it.

Stacking chunk_sectors seems ill-conceived.  One size-fits-all splitting
is too rigid.

> > > I understand it isn't related with correctness, because the underlying
> > > queue can split by its own chunk_sectors limit further. So is the issue
> > > too many further-splitting on queue with chunk_sectors 8? then CPU
> > > utilization is increased? Or other issue?
> > 
> > No, this is all about correctness.
> > 
> > Seems you're confining the definition of the possible stacking so that
> > the top-level device isn't allowed to have its own hard requirements on
> 
> I just don't know this story, thanks for your clarification.
> 
> As I mentioned above, if the stacking driver has its own hard queue
> limit, it should be the driver's responsibility to respect it via
> blk_queue_split() or whatever.

Again, DM does its own splitting... that aspect of it isn't an issue.
The problem is the basis for splitting cannot be the stacked up
chunk_sectors.

Mike




* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-04  1:45         ` Ming Lei
@ 2020-12-04  2:11           ` Mike Snitzer
  2020-12-04  6:22             ` Damien Le Moal
  0 siblings, 1 reply; 33+ messages in thread
From: Mike Snitzer @ 2020-12-04  2:11 UTC (permalink / raw)
  To: Ming Lei, hare
  Cc: axboe, martin.petersen, jdorminy, bjohnsto, linux-block,
	dm-devel, Keith Busch

On Thu, Dec 03 2020 at  8:45pm -0500,
Ming Lei <ming.lei@redhat.com> wrote:

> On Thu, Dec 03, 2020 at 08:27:38AM -0800, Keith Busch wrote:
> > On Thu, Dec 03, 2020 at 09:33:59AM -0500, Mike Snitzer wrote:
> > > On Wed, Dec 02 2020 at 10:26pm -0500,
> > > Ming Lei <ming.lei@redhat.com> wrote:
> > > 
> > > > I understand it isn't related with correctness, because the underlying
> > > > queue can split by its own chunk_sectors limit further. So is the issue
> > > > too many further-splitting on queue with chunk_sectors 8? then CPU
> > > > utilization is increased? Or other issue?
> > > 
> > > No, this is all about correctness.
> > > 
> > > Seems you're confining the definition of the possible stacking so that
> > > the top-level device isn't allowed to have its own hard requirements on
> > > IO sizes it sends to its internal implementation.  Just because the
> > > underlying device can split further doesn't mean that the top-level
> > > virtual driver can service larger IO sizes (not if the chunk_sectors
> > > stacking throws away the hint the virtual driver provided because it
> > > used lcm_not_zero).
> > 
> > I may be missing something obvious here, but if the lower layers split
> > to their desired boundary already, why does this limit need to stack?
> > Won't it also work if each layer sets their desired chunk_sectors
> > without considering their lower layers? The commit that initially
> > stacked chunk_sectors doesn't provide any explanation.
> 
> There could be several reasons:
> 
> 1) some limits have to be stacked, such as logical block size, because
> a lower layer may not handle unaligned IO
> 
> 2) performance: if all limits are stacked at the topmost layer, in
> theory IO only needs to be split in the top layer, and does not need
> to be split further by any lower layer at all. But there are exceptions
> in unusual cases, such as a lower queue's limit changing after the
> stacked limits are set up.
> 
> 3) history: bio splitting is much younger than stacking queue
> limits.
> 
> Maybe others?

Hannes didn't actually justify why he added chunk_sectors to
blk_stack_limits:

commit 987b3b26eb7b19960160505faf9b2f50ae77e14d
Author: Hannes Reinecke <hare@suse.de>
Date:   Tue Oct 18 15:40:31 2016 +0900

    block: update chunk_sectors in blk_stack_limits()

    Signed-off-by: Hannes Reinecke <hare@suse.com>
    Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
    Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>

Likely felt it needed for zoned or NVMe devices.. dunno.

But given how we now have a model where block core, or DM core, will
split as needed I don't think normalizing chunk_sectors (to the degree
full use of blk_stack_limits does) and then using it as a basis for
splitting makes a lot of sense.

Mike




* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-04  2:03         ` Mike Snitzer
@ 2020-12-04  3:59           ` Ming Lei
  2020-12-04 16:47             ` Mike Snitzer
  0 siblings, 1 reply; 33+ messages in thread
From: Ming Lei @ 2020-12-04  3:59 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe, martin.petersen, jdorminy, bjohnsto, linux-block, dm-devel

On Thu, Dec 03, 2020 at 09:03:43PM -0500, Mike Snitzer wrote:
> On Thu, Dec 03 2020 at  8:12pm -0500,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > On Thu, Dec 03, 2020 at 09:33:59AM -0500, Mike Snitzer wrote:
> > > On Wed, Dec 02 2020 at 10:26pm -0500,
> > > Ming Lei <ming.lei@redhat.com> wrote:
> > > 
> > > > On Tue, Dec 01, 2020 at 11:07:09AM -0500, Mike Snitzer wrote:
> > > > > commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
> > > > > chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
> > > > > reflect the most limited of all devices in the IO stack.
> > > > > 
> > > > > Otherwise malformed IO may result. E.g.: prior to this fix,
> > > > > ->chunk_sectors = lcm_not_zero(8, 128) would result in
> > > > > blk_max_size_offset() splitting IO at 128 sectors rather than the
> > > > > required more restrictive 8 sectors.
> > > > 
> > > > What is the user-visible result of splitting IO at 128 sectors?
> > > 
> > > The VDO dm target fails because it requires IO it receives to be split
> > > as it advertised (8 sectors).
> > 
> > OK, it looks like VDO's chunk_sectors limit is a hard constraint, even though it
> > is a DM device, so I guess you are talking about DM over VDO?
> > 
> > Another reason should be that VDO doesn't use blk_queue_split(), otherwise it
> > wouldn't be a problem, right?
> > 
> > Frankly speaking, if the stacking driver/device has its own hard queue limit,
> > like a normal hardware drive, the driver should be responsible for the splitting.
> 
> DM core does the splitting for VDO (just like any other DM target).
> In 5.9 I updated DM to use chunk_sectors, use blk_stack_limits()
> stacking of it, and also use blk_max_size_offset().
> 
> But all that block core code has shown itself to be too rigid for DM.  I
> tried to force the issue by stacking DM targets' ti->max_io_len with
> chunk_sectors.  But really I'd need to be able to pass in the per-target
> max_io_len to blk_max_size_offset() to salvage using it.
> 
> Stacking chunk_sectors seems ill-conceived.  One size-fits-all splitting
> is too rigid.

DM/VDO knows exactly it is one hard chunk_sectors limit, and DM shouldn't play
the stacking trick on VDO's chunk_sectors limit, should it?


Thanks, 
Ming




* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-04  2:11           ` Mike Snitzer
@ 2020-12-04  6:22             ` Damien Le Moal
  0 siblings, 0 replies; 33+ messages in thread
From: Damien Le Moal @ 2020-12-04  6:22 UTC (permalink / raw)
  To: Mike Snitzer, Ming Lei, hare
  Cc: axboe, martin.petersen, jdorminy, bjohnsto, linux-block,
	dm-devel, Keith Busch

On 2020/12/04 11:11, Mike Snitzer wrote:
> On Thu, Dec 03 2020 at  8:45pm -0500,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
>> On Thu, Dec 03, 2020 at 08:27:38AM -0800, Keith Busch wrote:
>>> On Thu, Dec 03, 2020 at 09:33:59AM -0500, Mike Snitzer wrote:
>>>> On Wed, Dec 02 2020 at 10:26pm -0500,
>>>> Ming Lei <ming.lei@redhat.com> wrote:
>>>>
>>>>> I understand it isn't related with correctness, because the underlying
>>>>> queue can split by its own chunk_sectors limit further. So is the issue
>>>>> too many further-splitting on queue with chunk_sectors 8? then CPU
>>>>> utilization is increased? Or other issue?
>>>>
>>>> No, this is all about correctness.
>>>>
>>>> Seems you're confining the definition of the possible stacking so that
>>>> the top-level device isn't allowed to have its own hard requirements on
>>>> IO sizes it sends to its internal implementation.  Just because the
>>>> underlying device can split further doesn't mean that the top-level
>>>> virtual driver can service larger IO sizes (not if the chunk_sectors
>>>> stacking throws away the hint the virtual driver provided because it
>>>> used lcm_not_zero).
>>>
>>> I may be missing something obvious here, but if the lower layers split
>>> to their desired boundary already, why does this limit need to stack?
>>> Won't it also work if each layer sets their desired chunk_sectors
>>> without considering their lower layers? The commit that initially
>>> stacked chunk_sectors doesn't provide any explanation.
>>
>> There could be several reasons:
>>
>> 1) some limits have to be stacked, such as logical block size, because
>> a lower layer may not handle unaligned IO
>>
>> 2) performance: if all limits are stacked at the topmost layer, in
>> theory IO only needs to be split in the top layer, and does not need
>> to be split further by any lower layer at all. But there are exceptions
>> in unusual cases, such as a lower queue's limit changing after the
>> stacked limits are set up.
>>
>> 3) history: bio splitting is much younger than stacking queue
>> limits.
>>
>> Maybe others?
> 
> Hannes didn't actually justify why he added chunk_sectors to
> blk_stack_limits:
> 
> commit 987b3b26eb7b19960160505faf9b2f50ae77e14d
> Author: Hannes Reinecke <hare@suse.de>
> Date:   Tue Oct 18 15:40:31 2016 +0900
> 
>     block: update chunk_sectors in blk_stack_limits()
> 
>     Signed-off-by: Hannes Reinecke <hare@suse.com>
>     Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
>     Reviewed-by: Christoph Hellwig <hch@lst.de>
>     Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
>     Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
>     Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>
>     Signed-off-by: Jens Axboe <axboe@fb.com>
> 
> Likely felt it needed for zoned or NVMe devices.. dunno.

For zoned drives, chunk_sectors indicates the zone size so the stacking
propagates that value to the upper layer, if said layer is also zoned. If it is
not zoned (e.g. dm-zoned device), chunk_sectors can actually be 0: it would be
the responsibility of that layer to not issue BIO that cross zone boundaries to
the lower zoned layer. Since all of this depends on the upper layer zoned model,
removing the stacking of chunk_sectors would be fine, as long as the target
initialization code sets it based on the drive model being exposed. E.g.:
* dm-linear on zoned dev will be zoned with the same zone size
* dm-zoned on zoned dev is not zoned, so chunk_sectors can be 0
* dm-linear on RAID volume can have chunk_sectors set to the underlying volume
chunk_sectors (stripe size), if dm-linear is aligned to stripes.
* etc.
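The zoned-device constraint described above — never let a bio cross a zone
boundary — is easy to express; a sketch (with chunk_sectors holding the zone
size; the helper name is mine, for illustration only):

```c
#include <assert.h>

/* With chunk_sectors holding the zone size, the layer issuing IO to a
 * zoned device must ensure no bio crosses from one zone into the next. */
static int bio_crosses_zone(unsigned long long sector,
			    unsigned int nr_sectors,
			    unsigned int zone_sectors)
{
	if (!zone_sectors || !nr_sectors)
		return 0;	/* no zone limit, or empty bio */
	return (sector / zone_sectors) !=
	       ((sector + nr_sectors - 1) / zone_sectors);
}
```

A dm-linear table aligned to zone boundaries preserves this property
automatically (so it can simply inherit the zone size), whereas dm-zoned must
enforce it itself, which is why its own chunk_sectors can be 0.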

> But given how we now have a model where block core, or DM core, will
> split as needed I don't think normalizing chunk_sectors (to the degree
> full use of blk_stack_limits does) and then using it as a basis for
> splitting makes a lot of sense.

For zoned dev, I agree. DM-core can set chunk_sectors for the DM device based on
its zone model for DM targets that support zones (dm-linear, dm-flakey and
dm-zoned).

-- 
Damien Le Moal
Western Digital Research






* Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
  2020-12-04  3:59           ` Ming Lei
@ 2020-12-04 16:47             ` Mike Snitzer
  2020-12-04 17:32               ` [dm-devel] [RFC PATCH] dm: fix IO splitting [was: Re: [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking] Mike Snitzer
  0 siblings, 1 reply; 33+ messages in thread
From: Mike Snitzer @ 2020-12-04 16:47 UTC (permalink / raw)
  To: Ming Lei
  Cc: axboe, martin.petersen, jdorminy, bjohnsto, linux-block, dm-devel

On Thu, Dec 03 2020 at 10:59pm -0500,
Ming Lei <ming.lei@redhat.com> wrote:

> On Thu, Dec 03, 2020 at 09:03:43PM -0500, Mike Snitzer wrote:
> > On Thu, Dec 03 2020 at  8:12pm -0500,
> > Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > > On Thu, Dec 03, 2020 at 09:33:59AM -0500, Mike Snitzer wrote:
> > > > On Wed, Dec 02 2020 at 10:26pm -0500,
> > > > Ming Lei <ming.lei@redhat.com> wrote:
> > > > 
> > > > > On Tue, Dec 01, 2020 at 11:07:09AM -0500, Mike Snitzer wrote:
> > > > > > commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
> > > > > > chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
> > > > > > reflect the most limited of all devices in the IO stack.
> > > > > > 
> > > > > > Otherwise malformed IO may result. E.g.: prior to this fix,
> > > > > > ->chunk_sectors = lcm_not_zero(8, 128) would result in
> > > > > > blk_max_size_offset() splitting IO at 128 sectors rather than the
> > > > > > required more restrictive 8 sectors.
> > > > > 
> > > > > What is the user-visible result of splitting IO at 128 sectors?
> > > > 
> > > > The VDO dm target fails because it requires IO it receives to be split
> > > > as it advertised (8 sectors).
> > > 
> > > OK, looks VDO's chunk_sector limit is one hard constraint, even though it
> > > is one DM device, so I guess you are talking about DM over VDO?
> > > 
> > > Another reason should be that VDO doesn't use blk_queue_split(), otherwise it
> > > won't be a trouble, right?
> > > 
> > > Frankly speaking, if the stacking driver/device has its own hard queue limit
> > > like normal hardware drive, the driver should be responsible for the splitting.
> > 
> > DM core does the splitting for VDO (just like any other DM target).
> > In 5.9 I updated DM to use chunk_sectors, use blk_stack_limits()
> > stacking of it, and also use blk_max_size_offset().
> > 
> > But all that block core code has shown itself to be too rigid for DM.  I
> > tried to force the issue by stacking DM targets' ti->max_io_len with
> > chunk_sectors.  But really I'd need to be able to pass in the per-target
> > max_io_len to blk_max_size_offset() to salvage using it.
> > 
> > Stacking chunk_sectors seems ill-conceived.  One size-fits-all splitting
> > is too rigid.
> 
> DM/VDO knows exactly it is one hard chunk_sectors limit, and DM shouldn't play
> the stacking trick on VDO's chunk_sectors limit, should it?

Feel like I already answered this in detail but... correct, DM cannot
and should not use stacked chunk_sectors as basis for splitting.

Up until 5.9, where I changed DM core to set and then use chunk_sectors
for splitting via blk_max_size_offset(), DM only used its own per-target
ti->max_io_len in drivers/md/dm.c:max_io_len().

But I reverted back to DM's pre-5.9 splitting in this stable@ fix that
I'll be sending to Linus today for 5.10-rcX:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=6bb38bcc33bf3093c08bd1b71e4f20c82bb60dd1

DM is now back to pre-5.9 behavior where it doesn't even consider
chunk_sectors for splitting (NOTE: dm-zoned sets ti->max_io_len though
so it effectively achieves the same boundary splits via max_io_len).

With that baseline established, what I'm now saying is: if DM, the most
common limits stacking consumer, cannot benefit from stacked
chunk_sectors then what stacked device does benefit?  Could be block
core's stacked chunk_sectors based splitting is good enough for others,
just not yet seeing how.  Feels like it predates blk_queue_split() and
the stacking of chunk_sectors could/should be removed now.

All said, I'm fine with leaving stacked chunk_sectors for others to care
about... think I've raised enough awareness on this topic now ;)

Mike




* [dm-devel] [RFC PATCH] dm: fix IO splitting [was: Re: [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking]
  2020-12-04 16:47             ` Mike Snitzer
@ 2020-12-04 17:32               ` Mike Snitzer
  2020-12-04 17:49                 ` Mike Snitzer
  0 siblings, 1 reply; 33+ messages in thread
From: Mike Snitzer @ 2020-12-04 17:32 UTC (permalink / raw)
  To: Ming Lei, axboe
  Cc: linux-block, dm-devel, bjohnsto, jdorminy, martin.petersen

On Fri, Dec 04 2020 at 11:47am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Thu, Dec 03 2020 at 10:59pm -0500,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > On Thu, Dec 03, 2020 at 09:03:43PM -0500, Mike Snitzer wrote:
> > > Stacking chunk_sectors seems ill-conceived.  One size-fits-all splitting
> > > is too rigid.
> > 
> > DM/VDO knows exactly it is one hard chunk_sectors limit, and DM shouldn't play
> > the stacking trick on VDO's chunk_sectors limit, should it?
> 
> Feel like I already answered this in detail but... correct, DM cannot
> and should not use stacked chunk_sectors as basis for splitting.
> 
> Up until 5.9, where I changed DM core to set and then use chunk_sectors
> for splitting via blk_max_size_offset(), DM only used its own per-target
> ti->max_io_len in drivers/md/dm.c:max_io_len().
> 
> But I reverted back to DM's pre-5.9 splitting in this stable@ fix that
> I'll be sending to Linus today for 5.10-rcX:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=6bb38bcc33bf3093c08bd1b71e4f20c82bb60dd1
> 
> DM is now back to pre-5.9 behavior where it doesn't even consider
> chunk_sectors for splitting (NOTE: dm-zoned sets ti->max_io_len though
> so it effectively achieves the same boundary splits via max_io_len).

Last question for all: I'd be fine with the following fix instead of
the above referenced commit 6bb38bcc33.  It'd allow DM to continue to
use blk_max_size_offset().  Any opinions?

From: Mike Snitzer <snitzer@redhat.com>
Date: Fri, 4 Dec 2020 12:03:25 -0500
Subject: [RFC PATCH] dm: fix IO splitting

FIXME: add proper header
Add chunk_sectors override to blk_max_size_offset().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-merge.c      |  2 +-
 drivers/md/dm-table.c  |  5 -----
 drivers/md/dm.c        | 19 +++++++++++--------
 include/linux/blkdev.h |  9 +++++----
 4 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index bcf5e4580603..97b7c2821565 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -144,7 +144,7 @@ static struct bio *blk_bio_write_same_split(struct request_queue *q,
 static inline unsigned get_max_io_size(struct request_queue *q,
 				       struct bio *bio)
 {
-	unsigned sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector);
+	unsigned sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector, 0);
 	unsigned max_sectors = sectors;
 	unsigned pbs = queue_physical_block_size(q) >> SECTOR_SHIFT;
 	unsigned lbs = queue_logical_block_size(q) >> SECTOR_SHIFT;
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 2073ee8d18f4..7eeb7c4169c9 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -18,7 +18,6 @@
 #include <linux/mutex.h>
 #include <linux/delay.h>
 #include <linux/atomic.h>
-#include <linux/lcm.h>
 #include <linux/blk-mq.h>
 #include <linux/mount.h>
 #include <linux/dax.h>
@@ -1449,10 +1448,6 @@ int dm_calculate_queue_limits(struct dm_table *table,
 			zone_sectors = ti_limits.chunk_sectors;
 		}
 
-		/* Stack chunk_sectors if target-specific splitting is required */
-		if (ti->max_io_len)
-			ti_limits.chunk_sectors = lcm_not_zero(ti->max_io_len,
-							       ti_limits.chunk_sectors);
 		/* Set I/O hints portion of queue limits */
 		if (ti->type->io_hints)
 			ti->type->io_hints(ti, &ti_limits);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 98866e725f25..f7eb3d2964f3 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1039,15 +1039,18 @@ static sector_t max_io_len(struct dm_target *ti, sector_t sector)
 	sector_t max_len;
 
 	/*
-	 * Does the target need to split even further?
-	 * - q->limits.chunk_sectors reflects ti->max_io_len so
-	 *   blk_max_size_offset() provides required splitting.
-	 * - blk_max_size_offset() also respects q->limits.max_sectors
+	 * Does the target need to split IO even further?
+	 * - varied (per target) IO splitting is a tenet of DM; this
+	 *   explains why stacked chunk_sectors based splitting via
+	 *   blk_max_size_offset() isn't possible here. So pass in
+	 *   ti->max_io_len to override stacked chunk_sectors.
 	 */
-	max_len = blk_max_size_offset(ti->table->md->queue,
-				      target_offset);
-	if (len > max_len)
-		len = max_len;
+	if (ti->max_io_len) {
+		max_len = blk_max_size_offset(ti->table->md->queue,
+					      target_offset, ti->max_io_len);
+		if (len > max_len)
+			len = max_len;
+	}
 
 	return len;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 639cae2c158b..f56dc5497e67 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1073,11 +1073,14 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
  * file system requests.
  */
 static inline unsigned int blk_max_size_offset(struct request_queue *q,
-					       sector_t offset)
+					       sector_t offset,
+					       unsigned int chunk_sectors)
 {
-	unsigned int chunk_sectors = q->limits.chunk_sectors;
-
-	if (!chunk_sectors)
-		return q->limits.max_sectors;
+	if (!chunk_sectors) {
+		if (q->limits.chunk_sectors)
+			chunk_sectors = q->limits.chunk_sectors;
+		else
+			return q->limits.max_sectors;
+	}
 
 	if (likely(is_power_of_2(chunk_sectors)))




* Re: [dm-devel] [RFC PATCH] dm: fix IO splitting [was: Re: [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking]
  2020-12-04 17:32               ` [dm-devel] [RFC PATCH] dm: fix IO splitting [was: Re: [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking] Mike Snitzer
@ 2020-12-04 17:49                 ` Mike Snitzer
  0 siblings, 0 replies; 33+ messages in thread
From: Mike Snitzer @ 2020-12-04 17:49 UTC (permalink / raw)
  To: Ming Lei, axboe
  Cc: linux-block, dm-devel, bjohnsto, jdorminy, martin.petersen

On Fri, Dec 04 2020 at 12:32P -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Fri, Dec 04 2020 at 11:47P -0500,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > On Thu, Dec 03 2020 at 10:59pm -0500,
> > Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > > On Thu, Dec 03, 2020 at 09:03:43PM -0500, Mike Snitzer wrote:
> > > > Stacking chunk_sectors seems ill-conceived.  One size-fits-all splitting
> > > > is too rigid.
> > > 
> > > DM/VDO knows exactly it is one hard chunk_sectors limit, and DM shouldn't play
> > > the stacking trick on VDO's chunk_sectors limit, should it?
> > 
> > Feel like I already answered this in detail but... correct, DM cannot
> > and should not use stacked chunk_sectors as basis for splitting.
> > 
> > Up until 5.9, where I changed DM core to set and then use chunk_sectors
> > for splitting via blk_max_size_offset(), DM only used its own per-target
> > ti->max_io_len in drivers/md/dm.c:max_io_len().
> > 
> > But I reverted back to DM's pre-5.9 splitting in this stable@ fix that
> > I'll be sending to Linus today for 5.10-rcX:
> > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=6bb38bcc33bf3093c08bd1b71e4f20c82bb60dd1
> > 
> > DM is now back to pre-5.9 behavior where it doesn't even consider
> > chunk_sectors for splitting (NOTE: dm-zoned sets ti->max_io_len though
> > so it effectively achieves the same boundary splits via max_io_len).
> 
> Last question for all, I'd be fine with the following fix instead of
> the above referenced commit 6bb38bcc33. It'd allow DM to continue to
> use blk_max_size_offset(), any opinions?
> 
> From: Mike Snitzer <snitzer@redhat.com>
> Date: Fri, 4 Dec 2020 12:03:25 -0500
> Subject: [RFC PATCH] dm: fix IO splitting
> 
> FIXME: add proper header
> Add chunk_sectors override to blk_max_size_offset().
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  block/blk-merge.c      |  2 +-
>  drivers/md/dm-table.c  |  5 -----
>  drivers/md/dm.c        | 19 +++++++++++--------
>  include/linux/blkdev.h |  9 +++++----
>  4 files changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index bcf5e4580603..97b7c2821565 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -144,7 +144,7 @@ static struct bio *blk_bio_write_same_split(struct request_queue *q,
>  static inline unsigned get_max_io_size(struct request_queue *q,
>  				       struct bio *bio)
>  {
> -	unsigned sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector);
> +	unsigned sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector, 0);
>  	unsigned max_sectors = sectors;
>  	unsigned pbs = queue_physical_block_size(q) >> SECTOR_SHIFT;
>  	unsigned lbs = queue_logical_block_size(q) >> SECTOR_SHIFT;
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 2073ee8d18f4..7eeb7c4169c9 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -18,7 +18,6 @@
>  #include <linux/mutex.h>
>  #include <linux/delay.h>
>  #include <linux/atomic.h>
> -#include <linux/lcm.h>
>  #include <linux/blk-mq.h>
>  #include <linux/mount.h>
>  #include <linux/dax.h>
> @@ -1449,10 +1448,6 @@ int dm_calculate_queue_limits(struct dm_table *table,
>  			zone_sectors = ti_limits.chunk_sectors;
>  		}
>  
> -		/* Stack chunk_sectors if target-specific splitting is required */
> -		if (ti->max_io_len)
> -			ti_limits.chunk_sectors = lcm_not_zero(ti->max_io_len,
> -							       ti_limits.chunk_sectors);
>  		/* Set I/O hints portion of queue limits */
>  		if (ti->type->io_hints)
>  			ti->type->io_hints(ti, &ti_limits);
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 98866e725f25..f7eb3d2964f3 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1039,15 +1039,18 @@ static sector_t max_io_len(struct dm_target *ti, sector_t sector)
>  	sector_t max_len;
>  
>  	/*
> -	 * Does the target need to split even further?
> -	 * - q->limits.chunk_sectors reflects ti->max_io_len so
> -	 *   blk_max_size_offset() provides required splitting.
> -	 * - blk_max_size_offset() also respects q->limits.max_sectors
> +	 * Does the target need to split IO even further?
> +	 * - varied (per target) IO splitting is a tenet of DM; this
> +	 *   explains why stacked chunk_sectors based splitting via
> +	 *   blk_max_size_offset() isn't possible here. So pass in
> +	 *   ti->max_io_len to override stacked chunk_sectors.
>  	 */
> -	max_len = blk_max_size_offset(ti->table->md->queue,
> -				      target_offset);
> -	if (len > max_len)
> -		len = max_len;
> +	if (ti->max_io_len) {
> +		max_len = blk_max_size_offset(ti->table->md->queue,
> +					      target_offset, ti->max_io_len);
> +		if (len > max_len)
> +			len = max_len;
> +	}
>  
>  	return len;
>  }
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 639cae2c158b..f56dc5497e67 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1073,11 +1073,14 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
>   * file system requests.
>   */
>  static inline unsigned int blk_max_size_offset(struct request_queue *q,
> -					       sector_t offset)
> +					       sector_t offset,
> +					       unsigned int chunk_sectors)
>  {
> -	unsigned int chunk_sectors = q->limits.chunk_sectors;
> -
> -	if (!chunk_sectors)
> -		return q->limits.max_sectors;
> +	if (!chunk_sectors) {
> +		if (q->limits.chunk_sectors)
> +			chunk_sectors = q->limits.chunk_sectors;
> +		else
> +			return q->limits.max_sectors;
> +	}
>  
>  	if (likely(is_power_of_2(chunk_sectors)))

FYI, the above blkdev.h diff missed this hunk:

@@ -1101,7 +1102,7 @@ static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
 	    req_op(rq) == REQ_OP_SECURE_ERASE)
 		return blk_queue_get_max_sectors(q, req_op(rq));
 
-	return min(blk_max_size_offset(q, offset),
+	return min(blk_max_size_offset(q, offset, 0),
 			blk_queue_get_max_sectors(q, req_op(rq)));
 }
 




Thread overview: 33+ messages
2020-11-30 17:18 [dm-devel] [PATCH] block: revert to using min_not_zero() when stacking chunk_sectors Mike Snitzer
2020-11-30 20:51 ` John Dorminy
2020-11-30 23:24   ` [dm-devel] " Mike Snitzer
2020-12-01  0:21     ` John Dorminy
2020-12-01  2:12       ` Mike Snitzer
2020-12-01 16:07 ` [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking Mike Snitzer
2020-12-01 17:43   ` John Dorminy
2020-12-01 17:53   ` Jens Axboe
2020-12-01 18:02   ` Martin K. Petersen
2020-12-02  3:38   ` [dm-devel] [PATCH] dm: " Jeffle Xu
2020-12-02  3:38     ` Jeffle Xu
2020-12-02  3:57       ` JeffleXu
2020-12-02  5:03         ` [dm-devel] " Mike Snitzer
2020-12-02  5:14           ` Mike Snitzer
2020-12-02  6:31             ` JeffleXu
2020-12-02  6:35               ` JeffleXu
2020-12-02  6:28           ` JeffleXu
2020-12-02  7:10           ` JeffleXu
2020-12-02 15:11             ` Mike Snitzer
2020-12-03  1:48               ` JeffleXu
2020-12-03  3:26   ` [dm-devel] [PATCH v2] block: " Ming Lei
2020-12-03 14:33     ` Mike Snitzer
2020-12-03 16:27       ` Keith Busch
2020-12-03 17:56         ` Mike Snitzer
2020-12-04  1:45         ` Ming Lei
2020-12-04  2:11           ` Mike Snitzer
2020-12-04  6:22             ` Damien Le Moal
2020-12-04  1:12       ` Ming Lei
2020-12-04  2:03         ` Mike Snitzer
2020-12-04  3:59           ` Ming Lei
2020-12-04 16:47             ` Mike Snitzer
2020-12-04 17:32               ` [dm-devel] [RFC PATCH] dm: fix IO splitting [was: Re: [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking] Mike Snitzer
2020-12-04 17:49                 ` Mike Snitzer
