All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] nbd: reduce max_block restrictions
@ 2020-03-02 10:05 Vladimir Sementsov-Ogievskiy
  2020-03-02 10:05 ` [PATCH 1/5] block/nbd-client: drop max_block restriction from block_status Vladimir Sementsov-Ogievskiy
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-02 10:05 UTC (permalink / raw)
  To: qemu-block; +Cc: fam, kwolf, vsementsov, qemu-devel, mreitz, stefanha, den

Recent changes in NBD protocol allowed to use some commands without
max_block restriction. Let's drop restrictions.

NBD change is here:
https://github.com/NetworkBlockDevice/nbd/commit/9f30fedb8699f151e7ef4ccc07e624330be3316b#diff-762fb7c670348da69cc9050ef58fe3ae

Vladimir Sementsov-Ogievskiy (5):
  block/nbd-client: drop max_block restriction from block_status
  block/nbd-client: drop max_block restriction from discard
  block: add max_pwrite_zeroes_no_fallback to BlockLimits
  block/io: fix bdrv_co_do_pwrite_zeroes head calculation
  block/io: auto-no-fallback for write-zeroes

 include/block/block_int.h |  8 ++++++++
 block/blkdebug.c          |  7 ++++++-
 block/io.c                | 32 +++++++++++++++++++++++++++++---
 block/nbd.c               |  5 +----
 4 files changed, 44 insertions(+), 8 deletions(-)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/5] block/nbd-client: drop max_block restriction from block_status
  2020-03-02 10:05 [PATCH 0/5] nbd: reduce max_block restrictions Vladimir Sementsov-Ogievskiy
@ 2020-03-02 10:05 ` Vladimir Sementsov-Ogievskiy
  2020-03-02 20:53   ` Eric Blake
  2020-03-02 10:05 ` [PATCH 2/5] block/nbd-client: drop max_block restriction from discard Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-02 10:05 UTC (permalink / raw)
  To: qemu-block; +Cc: fam, kwolf, vsementsov, qemu-devel, mreitz, stefanha, den

NBD spec is updated, so that max_block doesn't relate to
NBD_CMD_BLOCK_STATUS. So, drop the restriction.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 976be76647..2a58d6b91c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1320,9 +1320,7 @@ static int coroutine_fn nbd_client_co_block_status(
     NBDRequest request = {
         .type = NBD_CMD_BLOCK_STATUS,
         .from = offset,
-        .len = MIN(MIN_NON_ZERO(QEMU_ALIGN_DOWN(INT_MAX,
-                                                bs->bl.request_alignment),
-                                s->info.max_block),
+        .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
                    MIN(bytes, s->info.size - offset)),
         .flags = NBD_CMD_FLAG_REQ_ONE,
     };
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/5] block/nbd-client: drop max_block restriction from discard
  2020-03-02 10:05 [PATCH 0/5] nbd: reduce max_block restrictions Vladimir Sementsov-Ogievskiy
  2020-03-02 10:05 ` [PATCH 1/5] block/nbd-client: drop max_block restriction from block_status Vladimir Sementsov-Ogievskiy
@ 2020-03-02 10:05 ` Vladimir Sementsov-Ogievskiy
  2020-03-02 20:53   ` Eric Blake
  2020-03-02 10:05 ` [PATCH 3/5] block: add max_pwrite_zeroes_no_fallback to BlockLimits Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-02 10:05 UTC (permalink / raw)
  To: qemu-block; +Cc: fam, kwolf, vsementsov, qemu-devel, mreitz, stefanha, den

NBD spec is updated, so that max_block doesn't relate to
NBD_CMD_TRIM. So, drop the restriction.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2a58d6b91c..a62761ea5a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1955,7 +1955,6 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
     }
 
     bs->bl.request_alignment = min;
-    bs->bl.max_pdiscard = max;
     bs->bl.max_pwrite_zeroes = max;
     bs->bl.max_transfer = max;
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/5] block: add max_pwrite_zeroes_no_fallback to BlockLimits
  2020-03-02 10:05 [PATCH 0/5] nbd: reduce max_block restrictions Vladimir Sementsov-Ogievskiy
  2020-03-02 10:05 ` [PATCH 1/5] block/nbd-client: drop max_block restriction from block_status Vladimir Sementsov-Ogievskiy
  2020-03-02 10:05 ` [PATCH 2/5] block/nbd-client: drop max_block restriction from discard Vladimir Sementsov-Ogievskiy
@ 2020-03-02 10:05 ` Vladimir Sementsov-Ogievskiy
  2020-03-13 21:07   ` Eric Blake
  2020-03-02 10:05 ` [PATCH 4/5] block/io: fix bdrv_co_do_pwrite_zeroes head calculation Vladimir Sementsov-Ogievskiy
  2020-03-02 10:05 ` [PATCH 5/5] block/io: auto-no-fallback for write-zeroes Vladimir Sementsov-Ogievskiy
  4 siblings, 1 reply; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-02 10:05 UTC (permalink / raw)
  To: qemu-block; +Cc: fam, kwolf, vsementsov, qemu-devel, mreitz, stefanha, den

NBD spec is updated, so that max_block doesn't relate to
NBD_CMD_WRITE_ZEROES with NBD_CMD_FLAG_FAST_ZERO (which mirrors Qemu
flag BDRV_REQ_NO_FALLBACK). To drop the restriction we need new
max_pwrite_zeroes_no_fallback.

Default value of new max_pwrite_zeroes_no_fallback is zero and it means
no-restriction, so we are automatically done by this commit. Note that
nbd and blkdebug are the only drivers which in the same time define
max_pwrite_zeroes limit and support BDRV_REQ_NO_FALLBACK, so we need to
update only blkdebug.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block_int.h | 8 ++++++++
 block/blkdebug.c          | 7 ++++++-
 block/io.c                | 4 +++-
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6f9fd5e20e..c167e887c6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -618,6 +618,14 @@ typedef struct BlockLimits {
      * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
     int32_t max_pwrite_zeroes;
 
+    /*
+     * Maximum number of bytes that can zeroized at once if flag
+     * BDRV_REQ_NO_FALLBACK specified (since it is signed, it must be < 2G, if
+     * set). Must be multiple of pwrite_zeroes_alignment. May be 0 if no
+     * inherent 32-bit limit.
+     */
+    int32_t max_pwrite_zeroes_no_fallback;
+
     /* Optimal alignment for write zeroes requests in bytes. A power
      * of 2 is best but not mandatory.  Must be a multiple of
      * bl.request_alignment, and must be less than max_pwrite_zeroes
diff --git a/block/blkdebug.c b/block/blkdebug.c
index af44aa973f..7627fbcb3b 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -692,7 +692,11 @@ static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
     }
     assert(QEMU_IS_ALIGNED(offset, align));
     assert(QEMU_IS_ALIGNED(bytes, align));
-    if (bs->bl.max_pwrite_zeroes) {
+    if ((flags & BDRV_REQ_NO_FALLBACK) &&
+        bs->bl.max_pwrite_zeroes_no_fallback)
+    {
+        assert(bytes <= bs->bl.max_pwrite_zeroes_no_fallback);
+    } else if (bs->bl.max_pwrite_zeroes) {
         assert(bytes <= bs->bl.max_pwrite_zeroes);
     }
 
@@ -977,6 +981,7 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
     }
     if (s->max_write_zero) {
         bs->bl.max_pwrite_zeroes = s->max_write_zero;
+        bs->bl.max_pwrite_zeroes_no_fallback = s->max_write_zero;
     }
     if (s->opt_discard) {
         bs->bl.pdiscard_alignment = s->opt_discard;
diff --git a/block/io.c b/block/io.c
index 7e4cb74cf4..75fd5600c2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1752,7 +1752,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int head = 0;
     int tail = 0;
 
-    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
+    int max_write_zeroes = MIN_NON_ZERO((flags & BDRV_REQ_NO_FALLBACK) ?
+                                        bs->bl.max_pwrite_zeroes_no_fallback :
+                                        bs->bl.max_pwrite_zeroes, INT_MAX);
     int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
                         bs->bl.request_alignment);
     int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/5] block/io: fix bdrv_co_do_pwrite_zeroes head calculation
  2020-03-02 10:05 [PATCH 0/5] nbd: reduce max_block restrictions Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2020-03-02 10:05 ` [PATCH 3/5] block: add max_pwrite_zeroes_no_fallback to BlockLimits Vladimir Sementsov-Ogievskiy
@ 2020-03-02 10:05 ` Vladimir Sementsov-Ogievskiy
  2020-03-13 21:47   ` Eric Blake
  2020-03-02 10:05 ` [PATCH 5/5] block/io: auto-no-fallback for write-zeroes Vladimir Sementsov-Ogievskiy
  4 siblings, 1 reply; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-02 10:05 UTC (permalink / raw)
  To: qemu-block
  Cc: fam, kwolf, vsementsov, qemu-stable, qemu-devel, mreitz, stefanha, den

It's wrong to update head using num in this place, as num may be
reduced during the iteration, and we'll have wrong head value on next
iteration.

Instead update head at iteration end.

Cc: qemu-stable@nongnu.org
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 75fd5600c2..c64566b4cf 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1785,7 +1785,6 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
              * convenience, limit this request to max_transfer even if
              * we don't need to fall back to writes.  */
             num = MIN(MIN(bytes, max_transfer), alignment - head);
-            head = (head + num) % alignment;
             assert(num < max_write_zeroes);
         } else if (tail && num > alignment) {
             /* Shorten the request to the last aligned sector.  */
@@ -1844,6 +1843,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 
         offset += num;
         bytes -= num;
+        if (head) {
+            head = offset % alignment;
+        }
     }
 
 fail:
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/5] block/io: auto-no-fallback for write-zeroes
  2020-03-02 10:05 [PATCH 0/5] nbd: reduce max_block restrictions Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2020-03-02 10:05 ` [PATCH 4/5] block/io: fix bdrv_co_do_pwrite_zeroes head calculation Vladimir Sementsov-Ogievskiy
@ 2020-03-02 10:05 ` Vladimir Sementsov-Ogievskiy
  2020-03-13 21:56   ` Eric Blake
  4 siblings, 1 reply; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-02 10:05 UTC (permalink / raw)
  To: qemu-block; +Cc: fam, kwolf, vsementsov, qemu-devel, mreitz, stefanha, den

NBD driver may has max_pwrite_zeroes but doesn't has
max_pwrite_zeroes_no_fallback limit. This means, that (when
BDRV_REQ_NO_FALLBACK is supported) it is beneficial to try send request
with BDRV_REQ_NO_FALLBACK instead of splitting the request accordingly
to max_pwrite_zeroes.

If failed, fallback to old behavior.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/io.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index c64566b4cf..48d71b0883 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1752,17 +1752,28 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int head = 0;
     int tail = 0;
 
-    int max_write_zeroes = MIN_NON_ZERO((flags & BDRV_REQ_NO_FALLBACK) ?
-                                        bs->bl.max_pwrite_zeroes_no_fallback :
-                                        bs->bl.max_pwrite_zeroes, INT_MAX);
+    int max_write_zeroes;
     int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
                         bs->bl.request_alignment);
     int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
+    bool auto_no_fallback;
 
     if (!drv) {
         return -ENOMEDIUM;
     }
 
+    if (!(flags & BDRV_REQ_NO_FALLBACK) &&
+        (bs->supported_zero_flags & BDRV_REQ_NO_FALLBACK) &&
+        bs->bl.max_pwrite_zeroes &&
+        bs->bl.max_pwrite_zeroes < bytes &&
+        (bs->bl.max_pwrite_zeroes < bs->bl.max_pwrite_zeroes_no_fallback ||
+         bs->bl.max_pwrite_zeroes_no_fallback == 0))
+    {
+        assert(drv->bdrv_co_pwrite_zeroes);
+        flags |= BDRV_REQ_NO_FALLBACK;
+        auto_no_fallback = true;
+    }
+
     if ((flags & ~bs->supported_zero_flags) & BDRV_REQ_NO_FALLBACK) {
         return -ENOTSUP;
     }
@@ -1770,7 +1781,11 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     assert(alignment % bs->bl.request_alignment == 0);
     head = offset % alignment;
     tail = (offset + bytes) % alignment;
-    max_write_zeroes = QEMU_ALIGN_DOWN(max_write_zeroes, alignment);
+    max_write_zeroes =
+        QEMU_ALIGN_DOWN(MIN_NON_ZERO((flags & BDRV_REQ_NO_FALLBACK) ?
+                                     bs->bl.max_pwrite_zeroes_no_fallback :
+                                     bs->bl.max_pwrite_zeroes, INT_MAX),
+                        alignment);
     assert(max_write_zeroes >= bs->bl.request_alignment);
 
     while (bytes > 0 && !ret) {
@@ -1801,6 +1816,13 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
         if (drv->bdrv_co_pwrite_zeroes) {
             ret = drv->bdrv_co_pwrite_zeroes(bs, offset, num,
                                              flags & bs->supported_zero_flags);
+            if (ret == -ENOTSUP && auto_no_fallback) {
+                flags &= ~BDRV_REQ_NO_FALLBACK;
+                max_write_zeroes =
+                    QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pwrite_zeroes,
+                                                 INT_MAX), alignment);
+                continue;
+            }
             if (ret != -ENOTSUP && (flags & BDRV_REQ_FUA) &&
                 !(bs->supported_zero_flags & BDRV_REQ_FUA)) {
                 need_flush = true;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] block/nbd-client: drop max_block restriction from block_status
  2020-03-02 10:05 ` [PATCH 1/5] block/nbd-client: drop max_block restriction from block_status Vladimir Sementsov-Ogievskiy
@ 2020-03-02 20:53   ` Eric Blake
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Blake @ 2020-03-02 20:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den

On 3/2/20 4:05 AM, Vladimir Sementsov-Ogievskiy wrote:
> NBD spec is updated, so that max_block doesn't relate to
> NBD_CMD_BLOCK_STATUS. So, drop the restriction.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/nbd.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 976be76647..2a58d6b91c 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -1320,9 +1320,7 @@ static int coroutine_fn nbd_client_co_block_status(
>       NBDRequest request = {
>           .type = NBD_CMD_BLOCK_STATUS,
>           .from = offset,
> -        .len = MIN(MIN_NON_ZERO(QEMU_ALIGN_DOWN(INT_MAX,
> -                                                bs->bl.request_alignment),
> -                                s->info.max_block),
> +        .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
>                      MIN(bytes, s->info.size - offset)),

Reviewed-by: Eric Blake <eblake@redhat.com>

We're still capped at 32 bits (instead of status over the entire 
device), but future NBD extensions will get to that.  In the meantime, 
this is a lot nicer than having to stick to 32M per request.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] block/nbd-client: drop max_block restriction from discard
  2020-03-02 10:05 ` [PATCH 2/5] block/nbd-client: drop max_block restriction from discard Vladimir Sementsov-Ogievskiy
@ 2020-03-02 20:53   ` Eric Blake
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Blake @ 2020-03-02 20:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den

On 3/2/20 4:05 AM, Vladimir Sementsov-Ogievskiy wrote:
> NBD spec is updated, so that max_block doesn't relate to
> NBD_CMD_TRIM. So, drop the restriction.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/nbd.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 2a58d6b91c..a62761ea5a 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -1955,7 +1955,6 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>       }
>   
>       bs->bl.request_alignment = min;
> -    bs->bl.max_pdiscard = max;

We should still tell the block layer that we have a 32-bit cap (in case 
the block layer starts supporting 64-bit discard).

>       bs->bl.max_pwrite_zeroes = max;
>       bs->bl.max_transfer = max;
>   
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] block: add max_pwrite_zeroes_no_fallback to BlockLimits
  2020-03-02 10:05 ` [PATCH 3/5] block: add max_pwrite_zeroes_no_fallback to BlockLimits Vladimir Sementsov-Ogievskiy
@ 2020-03-13 21:07   ` Eric Blake
  2020-03-24  8:32     ` Vladimir Sementsov-Ogievskiy
  2020-04-01 14:09     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Blake @ 2020-03-13 21:07 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den

On 3/2/20 4:05 AM, Vladimir Sementsov-Ogievskiy wrote:
> NBD spec is updated, so that max_block doesn't relate to

Maybe: The NBD spec was recently updated to clarify that max_block...

> NBD_CMD_WRITE_ZEROES with NBD_CMD_FLAG_FAST_ZERO (which mirrors Qemu
> flag BDRV_REQ_NO_FALLBACK). To drop the restriction we need new
> max_pwrite_zeroes_no_fallback.

It feels odd to have two different pwrite_zeroes limits in the block 
layer, but I can live with it if other block layer gurus are also okay 
with it.

> 
> Default value of new max_pwrite_zeroes_no_fallback is zero and it means
> no-restriction, so we are automatically done by this commit. Note that

Why not have the default value be set to the existing value of the 
normal pwrite_zeroes limit, rather than 0?

> nbd and blkdebug are the only drivers which in the same time define
> max_pwrite_zeroes limit and support BDRV_REQ_NO_FALLBACK, so we need to
> update only blkdebug.

Grammar:

The default value for the new max_pwrite_zeroes_no_fallback is zero, 
meaning no restriction, which covers all drivers not touched by this 
commit.  Note that nbd and blkdebug are the only drivers which have a 
max_pwrite_zeroes limit while supporting BDRV_REQ_NO_FALLBACK, so we 
only need to update blkdebug.

Except that I think there IS still a limit in current NBD: you can't 
request anything larger than 32 bits (whereas some other drivers may 
allow a full 63-bit request, as well as future NBD usage when we finally 
add 64-bit extensions to the protocol).  So I think this patch is 
incomplete; it should be updating the nbd code to set the proper limit.

(I still need to post v2 of my patches for bdrv_co_make_zero support, 
which is a case where knowing if there is a 32-bit limit when using 
BDRV_REQ_NO_FALLBACK for fast zeroing is important).

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/block_int.h | 8 ++++++++
>   block/blkdebug.c          | 7 ++++++-
>   block/io.c                | 4 +++-
>   3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 6f9fd5e20e..c167e887c6 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -618,6 +618,14 @@ typedef struct BlockLimits {
>        * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
>       int32_t max_pwrite_zeroes;
>   
> +    /*
> +     * Maximum number of bytes that can zeroized at once if flag

zeroed

> +     * BDRV_REQ_NO_FALLBACK specified (since it is signed, it must be < 2G, if
> +     * set).

Why must it be a signed 32-bit number?  Why not let it be a 64-bit number?

> Must be multiple of pwrite_zeroes_alignment. May be 0 if no
> +     * inherent 32-bit limit.
> +     */
> +    int32_t max_pwrite_zeroes_no_fallback;
> +
>       /* Optimal alignment for write zeroes requests in bytes. A power
>        * of 2 is best but not mandatory.  Must be a multiple of
>        * bl.request_alignment, and must be less than max_pwrite_zeroes
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index af44aa973f..7627fbcb3b 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -692,7 +692,11 @@ static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
>       }
>       assert(QEMU_IS_ALIGNED(offset, align));
>       assert(QEMU_IS_ALIGNED(bytes, align));
> -    if (bs->bl.max_pwrite_zeroes) {
> +    if ((flags & BDRV_REQ_NO_FALLBACK) &&
> +        bs->bl.max_pwrite_zeroes_no_fallback)
> +    {
> +        assert(bytes <= bs->bl.max_pwrite_zeroes_no_fallback);
> +    } else if (bs->bl.max_pwrite_zeroes) {
>           assert(bytes <= bs->bl.max_pwrite_zeroes);
>       }
>   
> @@ -977,6 +981,7 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
>       }
>       if (s->max_write_zero) {
>           bs->bl.max_pwrite_zeroes = s->max_write_zero;
> +        bs->bl.max_pwrite_zeroes_no_fallback = s->max_write_zero;

Ah, so you DO default it to max_pwwrite_zeroes instead of to 0; the 
commit message does not quite match the code.

>       }
>       if (s->opt_discard) {
>           bs->bl.pdiscard_alignment = s->opt_discard;
> diff --git a/block/io.c b/block/io.c
> index 7e4cb74cf4..75fd5600c2 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1752,7 +1752,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>       int head = 0;
>       int tail = 0;
>   
> -    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
> +    int max_write_zeroes = MIN_NON_ZERO((flags & BDRV_REQ_NO_FALLBACK) ?
> +                                        bs->bl.max_pwrite_zeroes_no_fallback :
> +                                        bs->bl.max_pwrite_zeroes, INT_MAX);

I'd still like to get rid of this INT_MAX clamping.  If we can blank the 
entire image in one call, even when it is larger than 4G, then it is 
worth making that exposed to the user.  (Even in NBD, we might decide to 
add an extension that allows NBD_CMD_WRITE_ZEROES with a new flag and 
with offset/length == 0/0, as an official way to make the entire image 
zero, whereas it is now currently unspecified to pass a length of 0).

>       int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
>                           bs->bl.request_alignment);
>       int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] block/io: fix bdrv_co_do_pwrite_zeroes head calculation
  2020-03-02 10:05 ` [PATCH 4/5] block/io: fix bdrv_co_do_pwrite_zeroes head calculation Vladimir Sementsov-Ogievskiy
@ 2020-03-13 21:47   ` Eric Blake
  2020-03-24  9:22     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Blake @ 2020-03-13 21:47 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-stable, qemu-devel, mreitz, stefanha, den

On 3/2/20 4:05 AM, Vladimir Sementsov-Ogievskiy wrote:
> It's wrong to update head using num in this place, as num may be
> reduced during the iteration, and we'll have wrong head value on next
> iteration.
> 
> Instead update head at iteration end.
> 
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/io.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)

Offhand, I don't see how this fixes any bug....
/me reads on

> 
> diff --git a/block/io.c b/block/io.c
> index 75fd5600c2..c64566b4cf 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1785,7 +1785,6 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>                * convenience, limit this request to max_transfer even if
>                * we don't need to fall back to writes.  */
>               num = MIN(MIN(bytes, max_transfer), alignment - head);
> -            head = (head + num) % alignment;
>               assert(num < max_write_zeroes);

Here, we've asserted that if head was non-zero, num was already smaller 
than max_write_zeroes.  The rest of the loop does indeed have code that 
appears like it can reduce num, but that code is guarded:

         /* limit request size */
         if (num > max_write_zeroes) {
             num = max_write_zeroes;
         }
...
         if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) {
             /* Fall back to bounce buffer if write zeroes is unsupported */
             BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;

             if ((flags & BDRV_REQ_FUA) &&
                 !(bs->supported_write_flags & BDRV_REQ_FUA)) {
                 /* No need for bdrv_driver_pwrite() to do a fallback
                  * flush on each chunk; use just one at the end */
                 write_flags &= ~BDRV_REQ_FUA;
                 need_flush = true;
             }
             num = MIN(num, max_transfer);

Oh. Now I see.  If max_write_zeroes is > max_transfer, but we fall back 
to a bounce buffer, it is indeed possible that a misaligned request that 
forces fallbacks to writes may indeed require more than one write to get 
to the point where it is then using a buffer aligned to max_write_zeroes.

Do we have an iotest provoking this, or is it theoretical?  With an 
iotest, this one is material for 5.0 even if the rest of the series 
misses soft freeze.

>           } else if (tail && num > alignment) {
>               /* Shorten the request to the last aligned sector.  */
> @@ -1844,6 +1843,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>   
>           offset += num;
>           bytes -= num;
> +        if (head) {
> +            head = offset % alignment;
> +        }

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] block/io: auto-no-fallback for write-zeroes
  2020-03-02 10:05 ` [PATCH 5/5] block/io: auto-no-fallback for write-zeroes Vladimir Sementsov-Ogievskiy
@ 2020-03-13 21:56   ` Eric Blake
  2020-04-01 14:35     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Blake @ 2020-03-13 21:56 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den

On 3/2/20 4:05 AM, Vladimir Sementsov-Ogievskiy wrote:
> NBD driver may has max_pwrite_zeroes but doesn't has
> max_pwrite_zeroes_no_fallback limit. This means, that (when
> BDRV_REQ_NO_FALLBACK is supported) it is beneficial to try send request
> with BDRV_REQ_NO_FALLBACK instead of splitting the request accordingly
> to max_pwrite_zeroes.
> 
> If failed, fallback to old behavior.

Grammar:

When BDRV_REQ_NO_FALLBACK is supported, the NBD driver supports a larger 
request size.  Add code to try large zero requests with a NO_FALLBACK 
request prior to having to split a request into chunks according to 
max_pwrite_zeroes.

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/io.c | 30 ++++++++++++++++++++++++++----
>   1 file changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index c64566b4cf..48d71b0883 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1752,17 +1752,28 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>       int head = 0;
>       int tail = 0;
>   
> -    int max_write_zeroes = MIN_NON_ZERO((flags & BDRV_REQ_NO_FALLBACK) ?
> -                                        bs->bl.max_pwrite_zeroes_no_fallback :
> -                                        bs->bl.max_pwrite_zeroes, INT_MAX);
> +    int max_write_zeroes;
>       int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
>                           bs->bl.request_alignment);
>       int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
> +    bool auto_no_fallback;
>   
>       if (!drv) {
>           return -ENOMEDIUM;
>       }
>   
> +    if (!(flags & BDRV_REQ_NO_FALLBACK) &&
> +        (bs->supported_zero_flags & BDRV_REQ_NO_FALLBACK) &&
> +        bs->bl.max_pwrite_zeroes &&
> +        bs->bl.max_pwrite_zeroes < bytes &&
> +        (bs->bl.max_pwrite_zeroes < bs->bl.max_pwrite_zeroes_no_fallback ||
> +         bs->bl.max_pwrite_zeroes_no_fallback == 0))

Why are we letting max_pwrite_zeroes_no_fallback ever be 0?  It might be 
more convenient if it is always guaranteed to be >= max_pwrite_zeroes by 
the block layer.

> +    {
> +        assert(drv->bdrv_co_pwrite_zeroes);
> +        flags |= BDRV_REQ_NO_FALLBACK;
> +        auto_no_fallback = true;
> +    }
> +
>       if ((flags & ~bs->supported_zero_flags) & BDRV_REQ_NO_FALLBACK) {
>           return -ENOTSUP;
>       }
> @@ -1770,7 +1781,11 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>       assert(alignment % bs->bl.request_alignment == 0);
>       head = offset % alignment;
>       tail = (offset + bytes) % alignment;
> -    max_write_zeroes = QEMU_ALIGN_DOWN(max_write_zeroes, alignment);
> +    max_write_zeroes =
> +        QEMU_ALIGN_DOWN(MIN_NON_ZERO((flags & BDRV_REQ_NO_FALLBACK) ?
> +                                     bs->bl.max_pwrite_zeroes_no_fallback :
> +                                     bs->bl.max_pwrite_zeroes, INT_MAX),
> +                        alignment);
>       assert(max_write_zeroes >= bs->bl.request_alignment);
>   
>       while (bytes > 0 && !ret) {
> @@ -1801,6 +1816,13 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>           if (drv->bdrv_co_pwrite_zeroes) {
>               ret = drv->bdrv_co_pwrite_zeroes(bs, offset, num,
>                                                flags & bs->supported_zero_flags);
> +            if (ret == -ENOTSUP && auto_no_fallback) {
> +                flags &= ~BDRV_REQ_NO_FALLBACK;
> +                max_write_zeroes =
> +                    QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pwrite_zeroes,
> +                                                 INT_MAX), alignment);
> +                continue;
> +            }
>               if (ret != -ENOTSUP && (flags & BDRV_REQ_FUA) &&
>                   !(bs->supported_zero_flags & BDRV_REQ_FUA)) {
>                   need_flush = true;
> 

Otherwise makes sense.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] block: add max_pwrite_zeroes_no_fallback to BlockLimits
  2020-03-13 21:07   ` Eric Blake
@ 2020-03-24  8:32     ` Vladimir Sementsov-Ogievskiy
  2020-03-31  6:52       ` Vladimir Sementsov-Ogievskiy
  2020-04-01 14:09     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-24  8:32 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den

14.03.2020 0:07, Eric Blake wrote:
> On 3/2/20 4:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>> NBD spec is updated, so that max_block doesn't relate to
> 
> Maybe: The NBD spec was recently updated to clarify that max_block...
> 
>> NBD_CMD_WRITE_ZEROES with NBD_CMD_FLAG_FAST_ZERO (which mirrors Qemu
>> flag BDRV_REQ_NO_FALLBACK). To drop the restriction we need new
>> max_pwrite_zeroes_no_fallback.
> 
> It feels odd to have two different pwrite_zeroes limits in the block layer, but I can live with it if other block layer gurus are also okay with it.
> 
>>
>> Default value of new max_pwrite_zeroes_no_fallback is zero and it means
>> no-restriction, so we are automatically done by this commit. Note that
> 
> Why not have the default value be set to the existing value of the normal pwrite_zeroes limit, rather than 0?
> 
>> nbd and blkdebug are the only drivers which in the same time define
>> max_pwrite_zeroes limit and support BDRV_REQ_NO_FALLBACK, so we need to
>> update only blkdebug.
> 
> Grammar:
> 
> The default value for the new max_pwrite_zeroes_no_fallback is zero, meaning no restriction, which covers all drivers not touched by this commit.  Note that nbd and blkdebug are the only drivers which have a max_pwrite_zeroes limit while supporting BDRV_REQ_NO_FALLBACK, so we only need to update blkdebug.
> 
> Except that I think there IS still a limit in current NBD: you can't request anything larger than 32 bits (whereas some other drivers may allow a full 63-bit request, as well as future NBD usage when we finally add 64-bit extensions to the protocol).  So I think this patch is incomplete; it should be updating the nbd code to set the proper limit.
> 
> (I still need to post v2 of my patches for bdrv_co_make_zero support, which is a case where knowing if there is a 32-bit limit when using BDRV_REQ_NO_FALLBACK for fast zeroing is important).
> 
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   include/block/block_int.h | 8 ++++++++
>>   block/blkdebug.c          | 7 ++++++-
>>   block/io.c                | 4 +++-
>>   3 files changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 6f9fd5e20e..c167e887c6 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -618,6 +618,14 @@ typedef struct BlockLimits {
>>        * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
>>       int32_t max_pwrite_zeroes;
>> +    /*
>> +     * Maximum number of bytes that can zeroized at once if flag
> 
> zeroed
> 
>> +     * BDRV_REQ_NO_FALLBACK specified (since it is signed, it must be < 2G, if
>> +     * set).
> 
> Why must it be a signed 32-bit number?  Why not let it be a 64-bit number?
> 
>> Must be multiple of pwrite_zeroes_alignment. May be 0 if no
>> +     * inherent 32-bit limit.
>> +     */
>> +    int32_t max_pwrite_zeroes_no_fallback;
>> +
>>       /* Optimal alignment for write zeroes requests in bytes. A power
>>        * of 2 is best but not mandatory.  Must be a multiple of
>>        * bl.request_alignment, and must be less than max_pwrite_zeroes
>> diff --git a/block/blkdebug.c b/block/blkdebug.c
>> index af44aa973f..7627fbcb3b 100644
>> --- a/block/blkdebug.c
>> +++ b/block/blkdebug.c
>> @@ -692,7 +692,11 @@ static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
>>       }
>>       assert(QEMU_IS_ALIGNED(offset, align));
>>       assert(QEMU_IS_ALIGNED(bytes, align));
>> -    if (bs->bl.max_pwrite_zeroes) {
>> +    if ((flags & BDRV_REQ_NO_FALLBACK) &&
>> +        bs->bl.max_pwrite_zeroes_no_fallback)
>> +    {
>> +        assert(bytes <= bs->bl.max_pwrite_zeroes_no_fallback);
>> +    } else if (bs->bl.max_pwrite_zeroes) {
>>           assert(bytes <= bs->bl.max_pwrite_zeroes);
>>       }
>> @@ -977,6 +981,7 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
>>       }
>>       if (s->max_write_zero) {
>>           bs->bl.max_pwrite_zeroes = s->max_write_zero;
>> +        bs->bl.max_pwrite_zeroes_no_fallback = s->max_write_zero;
> 
> Ah, so you DO default it to max_pwwrite_zeroes instead of to 0; the commit message does not quite match the code.
> 
>>       }
>>       if (s->opt_discard) {
>>           bs->bl.pdiscard_alignment = s->opt_discard;
>> diff --git a/block/io.c b/block/io.c
>> index 7e4cb74cf4..75fd5600c2 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1752,7 +1752,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>       int head = 0;
>>       int tail = 0;
>> -    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
>> +    int max_write_zeroes = MIN_NON_ZERO((flags & BDRV_REQ_NO_FALLBACK) ?
>> +                                        bs->bl.max_pwrite_zeroes_no_fallback :
>> +                                        bs->bl.max_pwrite_zeroes, INT_MAX);
> 
> I'd still like to get rid of this INT_MAX clamping.  If we can blank the entire image in one call, even when it is larger than 4G, then it is worth making that exposed to the user.  (Even in NBD, we might decide to add an extension that allows NBD_CMD_WRITE_ZEROES with a new flag and with offset/length == 0/0, as an official way to make the entire image zero, whereas it is now currently unspecified to pass a length of 0).
> 

Hmm. This series is kind of hacking. Now, 5.0 is missed anyway, I think, I'll prepare something more complete. It would be good to prepare generic block layer for 64bit commands.

>>       int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
>>                           bs->bl.request_alignment);
>>       int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
>>
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] block/io: fix bdrv_co_do_pwrite_zeroes head calculation
  2020-03-13 21:47   ` Eric Blake
@ 2020-03-24  9:22     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-24  9:22 UTC (permalink / raw)
  To: Eric Blake, qemu-block
  Cc: fam, kwolf, qemu-stable, qemu-devel, mreitz, stefanha, den

14.03.2020 0:47, Eric Blake wrote:
> On 3/2/20 4:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>> It's wrong to update head using num in this place, as num may be
>> reduced during the iteration, and we'll have wrong head value on next
>> iteration.
>>
>> Instead update head at iteration end.
>>
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/io.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> Offhand, I don't see how this fixes any bug....
> /me reads on
> 
>>
>> diff --git a/block/io.c b/block/io.c
>> index 75fd5600c2..c64566b4cf 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1785,7 +1785,6 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>                * convenience, limit this request to max_transfer even if
>>                * we don't need to fall back to writes.  */
>>               num = MIN(MIN(bytes, max_transfer), alignment - head);
>> -            head = (head + num) % alignment;
>>               assert(num < max_write_zeroes);
> 
> Here, we've asserted that if head was non-zero, num was already smaller than max_write_zeroes.  The rest of the loop does indeed have code that appears like it can reduce num, but that code is guarded:
> 
>          /* limit request size */
>          if (num > max_write_zeroes) {
>              num = max_write_zeroes;
>          }
> ...
>          if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) {
>              /* Fall back to bounce buffer if write zeroes is unsupported */
>              BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
> 
>              if ((flags & BDRV_REQ_FUA) &&
>                  !(bs->supported_write_flags & BDRV_REQ_FUA)) {
>                  /* No need for bdrv_driver_pwrite() to do a fallback
>                   * flush on each chunk; use just one at the end */
>                  write_flags &= ~BDRV_REQ_FUA;
>                  need_flush = true;
>              }
>              num = MIN(num, max_transfer);

Now I think that this is impossible. If we updated head above, than num is already less or equal to max_transfer, so is not updated by this line. Or do I now miss something?

So, the patch may be good as refactoring but not really needed for 5.0.

> 
> Oh. Now I see.  If max_write_zeroes is > max_transfer, but we fall back to a bounce buffer, it is indeed possible that a misaligned request that forces fallbacks to writes may indeed require more than one write to get to the point where it is then using a buffer aligned to max_write_zeroes.
> 
> Do we have an iotest provoking this, or is it theoretical?  With an iotest, this one is material for 5.0 even if the rest of the series misses soft freeze.
> 
>>           } else if (tail && num > alignment) {
>>               /* Shorten the request to the last aligned sector.  */
>> @@ -1844,6 +1843,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>           offset += num;
>>           bytes -= num;
>> +        if (head) {
>> +            head = offset % alignment;
>> +        }
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] block: add max_pwrite_zeroes_no_fallback to BlockLimits
  2020-03-24  8:32     ` Vladimir Sementsov-Ogievskiy
@ 2020-03-31  6:52       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-31  6:52 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den

24.03.2020 11:32, Vladimir Sementsov-Ogievskiy wrote:
> 14.03.2020 0:07, Eric Blake wrote:
>> On 3/2/20 4:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> NBD spec is updated, so that max_block doesn't relate to
>>
>> Maybe: The NBD spec was recently updated to clarify that max_block...
>>
>>> NBD_CMD_WRITE_ZEROES with NBD_CMD_FLAG_FAST_ZERO (which mirrors Qemu
>>> flag BDRV_REQ_NO_FALLBACK). To drop the restriction we need new
>>> max_pwrite_zeroes_no_fallback.
>>
>> It feels odd to have two different pwrite_zeroes limits in the block layer, but I can live with it if other block layer gurus are also okay with it.
>>
>>>
>>> Default value of new max_pwrite_zeroes_no_fallback is zero and it means
>>> no-restriction, so we are automatically done by this commit. Note that
>>
>> Why not have the default value be set to the existing value of the normal pwrite_zeroes limit, rather than 0?
>>
>>> nbd and blkdebug are the only drivers which in the same time define
>>> max_pwrite_zeroes limit and support BDRV_REQ_NO_FALLBACK, so we need to
>>> update only blkdebug.
>>
>> Grammar:
>>
>> The default value for the new max_pwrite_zeroes_no_fallback is zero, meaning no restriction, which covers all drivers not touched by this commit.  Note that nbd and blkdebug are the only drivers which have a max_pwrite_zeroes limit while supporting BDRV_REQ_NO_FALLBACK, so we only need to update blkdebug.
>>
>> Except that I think there IS still a limit in current NBD: you can't request anything larger than 32 bits (whereas some other drivers may allow a full 63-bit request, as well as future NBD usage when we finally add 64-bit extensions to the protocol).  So I think this patch is incomplete; it should be updating the nbd code to set the proper limit.
>>
>> (I still need to post v2 of my patches for bdrv_co_make_zero support, which is a case where knowing if there is a 32-bit limit when using BDRV_REQ_NO_FALLBACK for fast zeroing is important).
>>
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   include/block/block_int.h | 8 ++++++++
>>>   block/blkdebug.c          | 7 ++++++-
>>>   block/io.c                | 4 +++-
>>>   3 files changed, 17 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 6f9fd5e20e..c167e887c6 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -618,6 +618,14 @@ typedef struct BlockLimits {
>>>        * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
>>>       int32_t max_pwrite_zeroes;
>>> +    /*
>>> +     * Maximum number of bytes that can zeroized at once if flag
>>
>> zeroed
>>
>>> +     * BDRV_REQ_NO_FALLBACK specified (since it is signed, it must be < 2G, if
>>> +     * set).
>>
>> Why must it be a signed 32-bit number?  Why not let it be a 64-bit number?
>>
>>> Must be multiple of pwrite_zeroes_alignment. May be 0 if no
>>> +     * inherent 32-bit limit.
>>> +     */
>>> +    int32_t max_pwrite_zeroes_no_fallback;
>>> +
>>>       /* Optimal alignment for write zeroes requests in bytes. A power
>>>        * of 2 is best but not mandatory.  Must be a multiple of
>>>        * bl.request_alignment, and must be less than max_pwrite_zeroes
>>> diff --git a/block/blkdebug.c b/block/blkdebug.c
>>> index af44aa973f..7627fbcb3b 100644
>>> --- a/block/blkdebug.c
>>> +++ b/block/blkdebug.c
>>> @@ -692,7 +692,11 @@ static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
>>>       }
>>>       assert(QEMU_IS_ALIGNED(offset, align));
>>>       assert(QEMU_IS_ALIGNED(bytes, align));
>>> -    if (bs->bl.max_pwrite_zeroes) {
>>> +    if ((flags & BDRV_REQ_NO_FALLBACK) &&
>>> +        bs->bl.max_pwrite_zeroes_no_fallback)
>>> +    {
>>> +        assert(bytes <= bs->bl.max_pwrite_zeroes_no_fallback);
>>> +    } else if (bs->bl.max_pwrite_zeroes) {
>>>           assert(bytes <= bs->bl.max_pwrite_zeroes);
>>>       }
>>> @@ -977,6 +981,7 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
>>>       }
>>>       if (s->max_write_zero) {
>>>           bs->bl.max_pwrite_zeroes = s->max_write_zero;
>>> +        bs->bl.max_pwrite_zeroes_no_fallback = s->max_write_zero;
>>
>> Ah, so you DO default it to max_pwwrite_zeroes instead of to 0; the commit message does not quite match the code.
>>
>>>       }
>>>       if (s->opt_discard) {
>>>           bs->bl.pdiscard_alignment = s->opt_discard;
>>> diff --git a/block/io.c b/block/io.c
>>> index 7e4cb74cf4..75fd5600c2 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -1752,7 +1752,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>>       int head = 0;
>>>       int tail = 0;
>>> -    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
>>> +    int max_write_zeroes = MIN_NON_ZERO((flags & BDRV_REQ_NO_FALLBACK) ?
>>> +                                        bs->bl.max_pwrite_zeroes_no_fallback :
>>> +                                        bs->bl.max_pwrite_zeroes, INT_MAX);
>>
>> I'd still like to get rid of this INT_MAX clamping.  If we can blank the entire image in one call, even when it is larger than 4G, then it is worth making that exposed to the user.  (Even in NBD, we might decide to add an extension that allows NBD_CMD_WRITE_ZEROES with a new flag and with offset/length == 0/0, as an official way to make the entire image zero, whereas it is now currently unspecified to pass a length of 0).
>>
> 
> Hmm. This series is kind of hacking. Now, 5.0 is missed anyway, I think, I'll prepare something more complete. It would be good to prepare generic block layer for 64bit commands.

I've started: "[RFC 0/3] 64bit block-layer part I". But I now see that it's not simple thing to convert the block-layer, so it probably will not be fast, and better not force the dependency between the series. So, I'll handle your comments and resend this series in separate and will try to keep it parallel.

> 
>>>       int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
>>>                           bs->bl.request_alignment);
>>>       int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
>>>
>>
> 
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] block: add max_pwrite_zeroes_no_fallback to BlockLimits
  2020-03-13 21:07   ` Eric Blake
  2020-03-24  8:32     ` Vladimir Sementsov-Ogievskiy
@ 2020-04-01 14:09     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-01 14:09 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den

14.03.2020 0:07, Eric Blake wrote:
> On 3/2/20 4:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>> NBD spec is updated, so that max_block doesn't relate to
> 
> Maybe: The NBD spec was recently updated to clarify that max_block...
> 
>> NBD_CMD_WRITE_ZEROES with NBD_CMD_FLAG_FAST_ZERO (which mirrors Qemu
>> flag BDRV_REQ_NO_FALLBACK). To drop the restriction we need new
>> max_pwrite_zeroes_no_fallback.
> 
> It feels odd to have two different pwrite_zeroes limits in the block layer, but I can live with it if other block layer gurus are also okay with it.
> 
>>
>> Default value of new max_pwrite_zeroes_no_fallback is zero and it means
>> no-restriction, so we are automatically done by this commit. Note that
> 
> Why not have the default value be set to the existing value of the normal pwrite_zeroes limit, rather than 0?

Hm I agree, that it's better to keep safer default.

> 
>> nbd and blkdebug are the only drivers which in the same time define
>> max_pwrite_zeroes limit and support BDRV_REQ_NO_FALLBACK, so we need to
>> update only blkdebug.
> 
> Grammar:
> 
> The default value for the new max_pwrite_zeroes_no_fallback is zero, meaning no restriction, which covers all drivers not touched by this commit.  Note that nbd and blkdebug are the only drivers which have a max_pwrite_zeroes limit while supporting BDRV_REQ_NO_FALLBACK, so we only need to update blkdebug.
> 
> Except that I think there IS still a limit in current NBD: you can't request anything larger than 32 bits (whereas some other drivers may allow a full 63-bit request, as well as future NBD usage when we finally add 64-bit extensions to the protocol).  So I think this patch is incomplete; it should be updating the nbd code to set the proper limit.
> 
> (I still need to post v2 of my patches for bdrv_co_make_zero support, which is a case where knowing if there is a 32-bit limit when using BDRV_REQ_NO_FALLBACK for fast zeroing is important).
> 
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   include/block/block_int.h | 8 ++++++++
>>   block/blkdebug.c          | 7 ++++++-
>>   block/io.c                | 4 +++-
>>   3 files changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 6f9fd5e20e..c167e887c6 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -618,6 +618,14 @@ typedef struct BlockLimits {
>>        * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
>>       int32_t max_pwrite_zeroes;
>> +    /*
>> +     * Maximum number of bytes that can zeroized at once if flag
> 
> zeroed
> 
>> +     * BDRV_REQ_NO_FALLBACK specified (since it is signed, it must be < 2G, if
>> +     * set).
> 
> Why must it be a signed 32-bit number?  Why not let it be a 64-bit number?
> 
>> Must be multiple of pwrite_zeroes_alignment. May be 0 if no
>> +     * inherent 32-bit limit.
>> +     */
>> +    int32_t max_pwrite_zeroes_no_fallback;
>> +
>>       /* Optimal alignment for write zeroes requests in bytes. A power
>>        * of 2 is best but not mandatory.  Must be a multiple of
>>        * bl.request_alignment, and must be less than max_pwrite_zeroes
>> diff --git a/block/blkdebug.c b/block/blkdebug.c
>> index af44aa973f..7627fbcb3b 100644
>> --- a/block/blkdebug.c
>> +++ b/block/blkdebug.c
>> @@ -692,7 +692,11 @@ static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
>>       }
>>       assert(QEMU_IS_ALIGNED(offset, align));
>>       assert(QEMU_IS_ALIGNED(bytes, align));
>> -    if (bs->bl.max_pwrite_zeroes) {
>> +    if ((flags & BDRV_REQ_NO_FALLBACK) &&
>> +        bs->bl.max_pwrite_zeroes_no_fallback)
>> +    {
>> +        assert(bytes <= bs->bl.max_pwrite_zeroes_no_fallback);
>> +    } else if (bs->bl.max_pwrite_zeroes) {
>>           assert(bytes <= bs->bl.max_pwrite_zeroes);
>>       }
>> @@ -977,6 +981,7 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
>>       }
>>       if (s->max_write_zero) {
>>           bs->bl.max_pwrite_zeroes = s->max_write_zero;
>> +        bs->bl.max_pwrite_zeroes_no_fallback = s->max_write_zero;
> 
> Ah, so you DO default it to max_pwwrite_zeroes instead of to 0; the commit message does not quite match the code.

In this patch it's only for blkdebug. But I'm going to change the default as you proposed.

> 
>>       }
>>       if (s->opt_discard) {
>>           bs->bl.pdiscard_alignment = s->opt_discard;
>> diff --git a/block/io.c b/block/io.c
>> index 7e4cb74cf4..75fd5600c2 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1752,7 +1752,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>       int head = 0;
>>       int tail = 0;
>> -    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
>> +    int max_write_zeroes = MIN_NON_ZERO((flags & BDRV_REQ_NO_FALLBACK) ?
>> +                                        bs->bl.max_pwrite_zeroes_no_fallback :
>> +                                        bs->bl.max_pwrite_zeroes, INT_MAX);
> 
> I'd still like to get rid of this INT_MAX clamping.  If we can blank the entire image in one call, even when it is larger than 4G, then it is worth making that exposed to the user.  (Even in NBD, we might decide to add an extension that allows NBD_CMD_WRITE_ZEROES with a new flag and with offset/length == 0/0, as an official way to make the entire image zero, whereas it is now currently unspecified to pass a length of 0).

We can't get rid of it now, just because write_zero driver handler has int argument. So, I'm going to convert block-layer to int64_t-everywhere, but do it in separate series.

> 
>>       int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
>>                           bs->bl.request_alignment);
>>       int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
>>
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] block/io: auto-no-fallback for write-zeroes
  2020-03-13 21:56   ` Eric Blake
@ 2020-04-01 14:35     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-01 14:35 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den

14.03.2020 0:56, Eric Blake wrote:
> On 3/2/20 4:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>> NBD driver may has max_pwrite_zeroes but doesn't has
>> max_pwrite_zeroes_no_fallback limit. This means, that (when
>> BDRV_REQ_NO_FALLBACK is supported) it is beneficial to try send request
>> with BDRV_REQ_NO_FALLBACK instead of splitting the request accordingly
>> to max_pwrite_zeroes.
>>
>> If failed, fallback to old behavior.
> 
> Grammar:
> 
> When BDRV_REQ_NO_FALLBACK is supported, the NBD driver supports a larger request size.  Add code to try large zero requests with a NO_FALLBACK request prior to having to split a request into chunks according to max_pwrite_zeroes.
> 
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/io.c | 30 ++++++++++++++++++++++++++----
>>   1 file changed, 26 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index c64566b4cf..48d71b0883 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1752,17 +1752,28 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>       int head = 0;
>>       int tail = 0;
>> -    int max_write_zeroes = MIN_NON_ZERO((flags & BDRV_REQ_NO_FALLBACK) ?
>> -                                        bs->bl.max_pwrite_zeroes_no_fallback :
>> -                                        bs->bl.max_pwrite_zeroes, INT_MAX);
>> +    int max_write_zeroes;
>>       int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
>>                           bs->bl.request_alignment);
>>       int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
>> +    bool auto_no_fallback;
>>       if (!drv) {
>>           return -ENOMEDIUM;
>>       }
>> +    if (!(flags & BDRV_REQ_NO_FALLBACK) &&
>> +        (bs->supported_zero_flags & BDRV_REQ_NO_FALLBACK) &&
>> +        bs->bl.max_pwrite_zeroes &&
>> +        bs->bl.max_pwrite_zeroes < bytes &&
>> +        (bs->bl.max_pwrite_zeroes < bs->bl.max_pwrite_zeroes_no_fallback ||
>> +         bs->bl.max_pwrite_zeroes_no_fallback == 0))
> 
> Why are we letting max_pwrite_zeroes_no_fallback ever be 0?  It might be more convenient if it is always guaranteed to be >= max_pwrite_zeroes by the block layer.


All other limits may be 0 too, which means some default.. So, if we want to set the default for max_pwrite_zeroes_no_fallback explicitly, I think we should do it for all other limits too. And it should be separate thing..

> 
>> +    {
>> +        assert(drv->bdrv_co_pwrite_zeroes);
>> +        flags |= BDRV_REQ_NO_FALLBACK;
>> +        auto_no_fallback = true;
>> +    }
>> +
>>       if ((flags & ~bs->supported_zero_flags) & BDRV_REQ_NO_FALLBACK) {
>>           return -ENOTSUP;
>>       }
>> @@ -1770,7 +1781,11 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>       assert(alignment % bs->bl.request_alignment == 0);
>>       head = offset % alignment;
>>       tail = (offset + bytes) % alignment;
>> -    max_write_zeroes = QEMU_ALIGN_DOWN(max_write_zeroes, alignment);
>> +    max_write_zeroes =
>> +        QEMU_ALIGN_DOWN(MIN_NON_ZERO((flags & BDRV_REQ_NO_FALLBACK) ?
>> +                                     bs->bl.max_pwrite_zeroes_no_fallback :
>> +                                     bs->bl.max_pwrite_zeroes, INT_MAX),
>> +                        alignment);
>>       assert(max_write_zeroes >= bs->bl.request_alignment);
>>       while (bytes > 0 && !ret) {
>> @@ -1801,6 +1816,13 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>           if (drv->bdrv_co_pwrite_zeroes) {
>>               ret = drv->bdrv_co_pwrite_zeroes(bs, offset, num,
>>                                                flags & bs->supported_zero_flags);
>> +            if (ret == -ENOTSUP && auto_no_fallback) {
>> +                flags &= ~BDRV_REQ_NO_FALLBACK;
>> +                max_write_zeroes =
>> +                    QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pwrite_zeroes,
>> +                                                 INT_MAX), alignment);
>> +                continue;
>> +            }
>>               if (ret != -ENOTSUP && (flags & BDRV_REQ_FUA) &&
>>                   !(bs->supported_zero_flags & BDRV_REQ_FUA)) {
>>                   need_flush = true;
>>
> 
> Otherwise makes sense.
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-04-01 14:36 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-02 10:05 [PATCH 0/5] nbd: reduce max_block restrictions Vladimir Sementsov-Ogievskiy
2020-03-02 10:05 ` [PATCH 1/5] block/nbd-client: drop max_block restriction from block_status Vladimir Sementsov-Ogievskiy
2020-03-02 20:53   ` Eric Blake
2020-03-02 10:05 ` [PATCH 2/5] block/nbd-client: drop max_block restriction from discard Vladimir Sementsov-Ogievskiy
2020-03-02 20:53   ` Eric Blake
2020-03-02 10:05 ` [PATCH 3/5] block: add max_pwrite_zeroes_no_fallback to BlockLimits Vladimir Sementsov-Ogievskiy
2020-03-13 21:07   ` Eric Blake
2020-03-24  8:32     ` Vladimir Sementsov-Ogievskiy
2020-03-31  6:52       ` Vladimir Sementsov-Ogievskiy
2020-04-01 14:09     ` Vladimir Sementsov-Ogievskiy
2020-03-02 10:05 ` [PATCH 4/5] block/io: fix bdrv_co_do_pwrite_zeroes head calculation Vladimir Sementsov-Ogievskiy
2020-03-13 21:47   ` Eric Blake
2020-03-24  9:22     ` Vladimir Sementsov-Ogievskiy
2020-03-02 10:05 ` [PATCH 5/5] block/io: auto-no-fallback for write-zeroes Vladimir Sementsov-Ogievskiy
2020-03-13 21:56   ` Eric Blake
2020-04-01 14:35     ` Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.