All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based
@ 2017-07-03 22:14 Eric Blake
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 01/15] block: add default implementations for bdrv_co_get_block_status() Eric Blake
                   ` (15 more replies)
  0 siblings, 16 replies; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, qemu-block, el13635

There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.

The overall conversion currently looks like:
part 1: bdrv_is_allocated (v3 is reviewed [1], modulo vvfat whitespace changes)
part 2: dirty-bitmap (v4 is posted [2]; v2 was reviewed but rebase changes
mean more review is needed)
part 3: this series, for bdrv_get_block_status (first half of v1, at [3],
did not get much review)
part 4: upcoming series, for .bdrv_co_block_status (second half of v1 [3])

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-status-v2

It is based on the union of Max's and Kevin's block branches,
plus posted fixes that make iotest 55 pass.

The diffstat shows a net growth in line count, but some of that is due
to better comments, and some because the code is a bit longer in order
to handle differing alignments between caller and driver.

I still haven't felt like tackling the task of rewriting migration/block.c
and qemu-img.c to use bytes (instead of sectors) everywhere - that might
give another net win in lines of code and legibility.

Patch 1/15 isn't really mine; Manos will probably be posting a v3
of his series, which should be committed before mine [4].

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06077.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00269.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02642.html
[4] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06846.html

Since v1: rebase against lots of upstream churn; add patches to support
arbitrary alignments and test them. I also split the original v1 series
(31 patches) into two halves, where teaching the individual drivers
about byte-level granularity is now going to be a fourth series (I'll
post a 16/15 RFC patch to show the interface that will use)

001/15:[down] 'block: add default implementations for bdrv_co_get_block_status()'
002/15:[down] 'block: Allow NULL file for bdrv_get_block_status()'
003/15:[down] 'block: Add flag to avoid wasted work in bdrv_is_allocated()'
004/15:[0002] [FC] 'block: Make bdrv_round_to_clusters() signature more useful'
005/15:[0018] [FC] 'qcow2: Switch is_zero_sectors() to byte-based'
006/15:[0004] [FC] 'block: Switch bdrv_make_zero() to byte-based'
007/15:[----] [--] 'qemu-img: Switch get_block_status() to byte-based'
008/15:[0058] [FC] 'block: Convert bdrv_get_block_status() to bytes'
009/15:[0064] [FC] 'block: Switch bdrv_co_get_block_status() to byte-based'
010/15:[0001] [FC] 'block: Switch BdrvCoGetBlockStatusData to byte-based'
011/15:[down] 'block: Switch bdrv_common_block_status_above() to byte-based'
012/15:[0010] [FC] 'block: Switch bdrv_co_get_block_status_above() to byte-based'
013/15:[0063] [FC] 'block: Convert bdrv_get_block_status_above() to bytes'
014/15:[down] 'block: Align block status requests'
015/15:[down] 'qemu-io: Relax 'alloc' now that block-status doesn't assert'

Eric Blake (14):
  block: Allow NULL file for bdrv_get_block_status()
  block: Add flag to avoid wasted work in bdrv_is_allocated()
  block: Make bdrv_round_to_clusters() signature more useful
  qcow2: Switch is_zero_sectors() to byte-based
  block: Switch bdrv_make_zero() to byte-based
  qemu-img: Switch get_block_status() to byte-based
  block: Convert bdrv_get_block_status() to bytes
  block: Switch bdrv_co_get_block_status() to byte-based
  block: Switch BdrvCoGetBlockStatusData to byte-based
  block: Switch bdrv_common_block_status_above() to byte-based
  block: Switch bdrv_co_get_block_status_above() to byte-based
  block: Convert bdrv_get_block_status_above() to bytes
  block: Align block status requests
  qemu-io: Relax 'alloc' now that block-status doesn't assert

Manos Pitsidianakis (1):
  block: add default implementations for bdrv_co_get_block_status()

 include/block/block.h      |  26 ++--
 include/block/block_int.h  |  27 +++-
 block/blkdebug.c           |   9 +-
 block/commit.c             |  12 +-
 block/io.c                 | 302 ++++++++++++++++++++++++++++-----------------
 block/mirror.c             |  32 ++---
 block/qcow2-cluster.c      |   2 +-
 block/qcow2.c              |  34 ++---
 qemu-img.c                 |  71 ++++++-----
 qemu-io-cmds.c             |  13 --
 block/trace-events         |   2 +-
 tests/qemu-iotests/177     |  11 +-
 tests/qemu-iotests/177.out |  18 ++-
 13 files changed, 317 insertions(+), 242 deletions(-)

-- 
2.9.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 01/15] block: add default implementations for bdrv_co_get_block_status()
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  7:33   ` Fam Zheng
  2017-07-05 22:00   ` Eric Blake
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 02/15] block: Allow NULL file for bdrv_get_block_status() Eric Blake
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Max Reitz, Jeff Cody,
	Stefan Hajnoczi, Fam Zheng

From: Manos Pitsidianakis <el13635@mail.ntua.gr>

bdrv_co_get_block_status_from_file() and
bdrv_co_get_block_status_from_backing() set *file to bs->file and
bs->backing respectively, so that bdrv_co_get_block_status() can recurse
to them. Future block drivers won't have to duplicate code to implement
this.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Message-Id: <20170629184320.7151-4-el13635@mail.ntua.gr>

---
v2: Including this patch from Manos, since it affects my later patches;
however, I anticipate that we will get a full v3 series from Manos
merged first
---
 include/block/block_int.h | 16 ++++++++++++++++
 block/blkdebug.c          | 12 +-----------
 block/commit.c            | 12 +-----------
 block/io.c                | 24 ++++++++++++++++++++++++
 block/mirror.c            | 12 +-----------
 5 files changed, 43 insertions(+), 33 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 226232d..724799c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -958,6 +958,22 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
                                uint64_t perm, uint64_t shared,
                                uint64_t *nperm, uint64_t *nshared);

+/*
+ * Default implementation for drivers to pass bdrv_co_get_block_status() to
+ * their file
+ * */
+int64_t coroutine_fn bdrv_co_get_block_status_from_file(BlockDriverState *bs,
+                                                     int64_t sector_num,
+                                                     int nb_sectors, int *pnum,
+                                                     BlockDriverState **file);
+/*
+ * Default implementation for drivers to pass bdrv_co_get_block_status() to
+ * their backing file
+ * */
+int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
+                                                     int64_t sector_num,
+                                                     int nb_sectors, int *pnum,
+                                                     BlockDriverState **file);
 const char *bdrv_get_parent_name(const BlockDriverState *bs);
 void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
 bool blk_dev_has_removable_media(BlockBackend *blk);
diff --git a/block/blkdebug.c b/block/blkdebug.c
index b25856c..f1539db 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -641,16 +641,6 @@ static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
     return bdrv_co_pdiscard(bs->file->bs, offset, bytes);
 }

-static int64_t coroutine_fn blkdebug_co_get_block_status(
-    BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
-    BlockDriverState **file)
-{
-    *pnum = nb_sectors;
-    *file = bs->file->bs;
-    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-        (sector_num << BDRV_SECTOR_BITS);
-}
-
 static void blkdebug_close(BlockDriverState *bs)
 {
     BDRVBlkdebugState *s = bs->opaque;
@@ -925,7 +915,7 @@ static BlockDriver bdrv_blkdebug = {
     .bdrv_co_flush_to_disk  = blkdebug_co_flush,
     .bdrv_co_pwrite_zeroes  = blkdebug_co_pwrite_zeroes,
     .bdrv_co_pdiscard       = blkdebug_co_pdiscard,
-    .bdrv_co_get_block_status = blkdebug_co_get_block_status,
+    .bdrv_co_get_block_status = bdrv_co_get_block_status_from_file,

     .bdrv_debug_event           = blkdebug_debug_event,
     .bdrv_debug_breakpoint      = blkdebug_debug_breakpoint,
diff --git a/block/commit.c b/block/commit.c
index 774a8a5..9e875a6 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -242,16 +242,6 @@ static int coroutine_fn bdrv_commit_top_preadv(BlockDriverState *bs,
     return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
 }

-static int64_t coroutine_fn bdrv_commit_top_get_block_status(
-    BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
-    BlockDriverState **file)
-{
-    *pnum = nb_sectors;
-    *file = bs->backing->bs;
-    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-           (sector_num << BDRV_SECTOR_BITS);
-}
-
 static void bdrv_commit_top_refresh_filename(BlockDriverState *bs, QDict *opts)
 {
     bdrv_refresh_filename(bs->backing->bs);
@@ -277,7 +267,7 @@ static void bdrv_commit_top_child_perm(BlockDriverState *bs, BdrvChild *c,
 static BlockDriver bdrv_commit_top = {
     .format_name                = "commit_top",
     .bdrv_co_preadv             = bdrv_commit_top_preadv,
-    .bdrv_co_get_block_status   = bdrv_commit_top_get_block_status,
+    .bdrv_co_get_block_status   = bdrv_co_get_block_status_from_backing,
     .bdrv_refresh_filename      = bdrv_commit_top_refresh_filename,
     .bdrv_close                 = bdrv_commit_top_close,
     .bdrv_child_perm            = bdrv_commit_top_child_perm,
diff --git a/block/io.c b/block/io.c
index 53c01cf..8c67ba8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1689,6 +1689,30 @@ typedef struct BdrvCoGetBlockStatusData {
     bool done;
 } BdrvCoGetBlockStatusData;

+int64_t coroutine_fn bdrv_co_get_block_status_from_file(BlockDriverState *bs,
+                                                     int64_t sector_num,
+                                                     int nb_sectors, int *pnum,
+                                                     BlockDriverState **file)
+{
+    assert(bs->file && bs->file->bs);
+    *pnum = nb_sectors;
+    *file = bs->file->bs;
+    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
+           (sector_num << BDRV_SECTOR_BITS);
+}
+
+int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
+                                                     int64_t sector_num,
+                                                     int nb_sectors, int *pnum,
+                                                     BlockDriverState **file)
+{
+    assert(bs->backing && bs->backing->bs);
+    *pnum = nb_sectors;
+    *file = bs->backing->bs;
+    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
+           (sector_num << BDRV_SECTOR_BITS);
+}
+
 /*
  * Returns the allocation status of the specified sectors.
  * Drivers not implementing the functionality are assumed to not support
diff --git a/block/mirror.c b/block/mirror.c
index 6f54dcc..4e883bc 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1043,16 +1043,6 @@ static int coroutine_fn bdrv_mirror_top_flush(BlockDriverState *bs)
     return bdrv_co_flush(bs->backing->bs);
 }

-static int64_t coroutine_fn bdrv_mirror_top_get_block_status(
-    BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
-    BlockDriverState **file)
-{
-    *pnum = nb_sectors;
-    *file = bs->backing->bs;
-    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-           (sector_num << BDRV_SECTOR_BITS);
-}
-
 static int coroutine_fn bdrv_mirror_top_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int bytes, BdrvRequestFlags flags)
 {
@@ -1099,7 +1089,7 @@ static BlockDriver bdrv_mirror_top = {
     .bdrv_co_pwrite_zeroes      = bdrv_mirror_top_pwrite_zeroes,
     .bdrv_co_pdiscard           = bdrv_mirror_top_pdiscard,
     .bdrv_co_flush              = bdrv_mirror_top_flush,
-    .bdrv_co_get_block_status   = bdrv_mirror_top_get_block_status,
+    .bdrv_co_get_block_status   = bdrv_co_get_block_status_from_backing,
     .bdrv_refresh_filename      = bdrv_mirror_top_refresh_filename,
     .bdrv_close                 = bdrv_mirror_top_close,
     .bdrv_child_perm            = bdrv_mirror_top_child_perm,
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 02/15] block: Allow NULL file for bdrv_get_block_status()
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 01/15] block: add default implementations for bdrv_co_get_block_status() Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  7:33   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated() Eric Blake
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi, Fam Zheng,
	Max Reitz, Jeff Cody

Not all callers care about which BDS owns the mapping for a given
range of the file.  This patch merely simplifies the callers by
consolidating the logic in the common call point, while guaranteeing
a non-NULL file to all the driver callbacks, for no semantic change.
The only caller that does not care about pnum is bdrv_is_allocated,
as invoked by vvfat; we can likewise add assertions that the rest
of the stack does not have to worry about a NULL pnum.

Furthermore, this will also set the stage for a future cleanup: when
a caller does not care about which BDS owns an offset, it would be
nice to allow the driver to optimize things to not have to return
BDRV_BLOCK_OFFSET_VALID in the first place.  In the case of fragmented
allocation (for example, it's fairly easy to create a qcow2 image
where consecutive guest addresses are not at consecutive host
addresses), the current contract requires bdrv_get_block_status()
to clamp *pnum to the limit where host addresses are no longer
consecutive, but allowing a NULL file means that *pnum could be
set to the full length of known-allocated data.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: use local variable and final transfer, rather than assignment
of parameter to local
[previously in different series]:
v2: new patch, https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05645.html
---
 include/block/block_int.h | 10 ++++++----
 block/io.c                | 44 ++++++++++++++++++++++++++++----------------
 block/mirror.c            |  3 +--
 block/qcow2.c             |  4 +---
 qemu-img.c                | 10 ++++------
 5 files changed, 40 insertions(+), 31 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 724799c..ffa22c7 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -168,10 +168,12 @@ struct BlockDriver {
         int64_t offset, int bytes);

     /*
-     * Building block for bdrv_block_status[_above]. The driver should
-     * answer only according to the current layer, and should not
-     * set BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
-     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.
+     * Building block for bdrv_block_status[_above] and
+     * bdrv_is_allocated[_above].  The driver should answer only
+     * according to the current layer, and should not set
+     * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
+     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
+     * layer guarantees non-NULL pnum and file.
      */
     int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum,
diff --git a/block/io.c b/block/io.c
index 8c67ba8..6358d07 100644
--- a/block/io.c
+++ b/block/io.c
@@ -671,7 +671,6 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 {
     int64_t target_sectors, ret, nb_sectors, sector_num = 0;
     BlockDriverState *bs = child->bs;
-    BlockDriverState *file;
     int n;

     target_sectors = bdrv_nb_sectors(bs);
@@ -684,7 +683,7 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
         if (nb_sectors <= 0) {
             return 0;
         }
-        ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n, &file);
+        ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n, NULL);
         if (ret < 0) {
             error_report("error getting block status at sector %" PRId64 ": %s",
                          sector_num, strerror(-ret));
@@ -1729,8 +1728,9 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
  * beyond the end of the disk image it will be clamped; if 'pnum' is set to
  * the end of the image, then the returned value will include BDRV_BLOCK_EOF.
  *
- * If returned value is positive and BDRV_BLOCK_OFFSET_VALID bit is set, 'file'
- * points to the BDS which the sector range is allocated in.
+ * If returned value is positive, BDRV_BLOCK_OFFSET_VALID bit is set, and
+ * 'file' is non-NULL, then '*file' points to the BDS which the sector range
+ * is allocated in.
  */
 static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
                                                      int64_t sector_num,
@@ -1740,15 +1740,22 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
     int64_t total_sectors;
     int64_t n;
     int64_t ret, ret2;
+    BlockDriverState *local_file = NULL;

-    *file = NULL;
+    assert(pnum);
     total_sectors = bdrv_nb_sectors(bs);
     if (total_sectors < 0) {
+        if (file) {
+            *file = NULL;
+        }
         return total_sectors;
     }

     if (sector_num >= total_sectors) {
         *pnum = 0;
+        if (file) {
+            *file = NULL;
+        }
         return BDRV_BLOCK_EOF;
     }

@@ -1765,23 +1772,27 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         }
         if (bs->drv->protocol_name) {
             ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
-            *file = bs;
+            if (file) {
+                *file = bs;
+            }
+        } else if (file) {
+            *file = NULL;
         }
         return ret;
     }

     bdrv_inc_in_flight(bs);
     ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum,
-                                            file);
+                                            &local_file);
     if (ret < 0) {
         *pnum = 0;
         goto out;
     }

     if (ret & BDRV_BLOCK_RAW) {
-        assert(ret & BDRV_BLOCK_OFFSET_VALID && *file);
-        ret = bdrv_co_get_block_status(*file, ret >> BDRV_SECTOR_BITS,
-                                       *pnum, pnum, file);
+        assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
+        ret = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
+                                       *pnum, pnum, &local_file);
         goto out;
     }

@@ -1799,14 +1810,13 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         }
     }

-    if (*file && *file != bs &&
+    if (local_file && local_file != bs &&
         (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
         (ret & BDRV_BLOCK_OFFSET_VALID)) {
-        BlockDriverState *file2;
         int file_pnum;

-        ret2 = bdrv_co_get_block_status(*file, ret >> BDRV_SECTOR_BITS,
-                                        *pnum, &file_pnum, &file2);
+        ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
+                                        *pnum, &file_pnum, NULL);
         if (ret2 >= 0) {
             /* Ignore errors.  This is just providing extra information, it
              * is useful but not necessary.
@@ -1828,6 +1838,9 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
     }

 out:
+    if (file) {
+        *file = local_file;
+    }
     bdrv_dec_in_flight(bs);
     if (ret >= 0 && sector_num + *pnum == total_sectors) {
         ret |= BDRV_BLOCK_EOF;
@@ -1931,7 +1944,6 @@ int64_t bdrv_get_block_status(BlockDriverState *bs,
 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
                                    int64_t bytes, int64_t *pnum)
 {
-    BlockDriverState *file;
     int64_t sector_num = offset >> BDRV_SECTOR_BITS;
     int nb_sectors = bytes >> BDRV_SECTOR_BITS;
     int64_t ret;
@@ -1941,7 +1953,7 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
     assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) &&
            bytes < INT_MAX * BDRV_SECTOR_SIZE);
     ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
-                                &file);
+                                NULL);
     if (ret < 0) {
         return ret;
     }
diff --git a/block/mirror.c b/block/mirror.c
index 4e883bc..21758a8 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -391,7 +391,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         int io_sectors;
         unsigned int io_bytes;
         int64_t io_bytes_acct;
-        BlockDriverState *file;
         enum MirrorMethod {
             MIRROR_METHOD_COPY,
             MIRROR_METHOD_ZERO,
@@ -402,7 +401,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         ret = bdrv_get_block_status_above(source, NULL,
                                           offset >> BDRV_SECTOR_BITS,
                                           nb_chunks * sectors_per_chunk,
-                                          &io_sectors, &file);
+                                          &io_sectors, NULL);
         io_bytes = io_sectors * BDRV_SECTOR_SIZE;
         if (ret < 0) {
             io_bytes = MIN(nb_chunks * s->granularity, max_io_bytes);
diff --git a/block/qcow2.c b/block/qcow2.c
index 9d3c70e..d75f248 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2909,7 +2909,6 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
                             uint32_t count)
 {
     int nr;
-    BlockDriverState *file;
     int64_t res;

     if (start + count > bs->total_sectors) {
@@ -2919,8 +2918,7 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
     if (!count) {
         return true;
     }
-    res = bdrv_get_block_status_above(bs, NULL, start, count,
-                                      &nr, &file);
+    res = bdrv_get_block_status_above(bs, NULL, start, count, &nr, NULL);
     return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == count;
 }

diff --git a/qemu-img.c b/qemu-img.c
index 8def1e8..daba954 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1373,7 +1373,6 @@ static int img_compare(int argc, char **argv)

     for (;;) {
         int64_t status1, status2;
-        BlockDriverState *file;

         nb_sectors = sectors_to_process(total_sectors, sector_num);
         if (nb_sectors <= 0) {
@@ -1381,7 +1380,7 @@ static int img_compare(int argc, char **argv)
         }
         status1 = bdrv_get_block_status_above(bs1, NULL, sector_num,
                                               total_sectors1 - sector_num,
-                                              &pnum1, &file);
+                                              &pnum1, NULL);
         if (status1 < 0) {
             ret = 3;
             error_report("Sector allocation test failed for %s", filename1);
@@ -1391,7 +1390,7 @@ static int img_compare(int argc, char **argv)

         status2 = bdrv_get_block_status_above(bs2, NULL, sector_num,
                                               total_sectors2 - sector_num,
-                                              &pnum2, &file);
+                                              &pnum2, NULL);
         if (status2 < 0) {
             ret = 3;
             error_report("Sector allocation test failed for %s", filename2);
@@ -1594,15 +1593,14 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
     n = MIN(s->total_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);

     if (s->sector_next_status <= sector_num) {
-        BlockDriverState *file;
         if (s->target_has_backing) {
             ret = bdrv_get_block_status(blk_bs(s->src[src_cur]),
                                         sector_num - src_cur_offset,
-                                        n, &n, &file);
+                                        n, &n, NULL);
         } else {
             ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
                                               sector_num - src_cur_offset,
-                                              n, &n, &file);
+                                              n, &n, NULL);
         }
         if (ret < 0) {
             return ret;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 01/15] block: add default implementations for bdrv_co_get_block_status() Eric Blake
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 02/15] block: Allow NULL file for bdrv_get_block_status() Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  7:06   ` Fam Zheng
  2017-07-06  2:35   ` Eric Blake
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 04/15] block: Make bdrv_round_to_clusters() signature more useful Eric Blake
                   ` (12 subsequent siblings)
  15 siblings, 2 replies; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi, Fam Zheng, Max Reitz

Not all callers care about which BDS owns the mapping for a given
range of the file.  In particular, bdrv_is_allocated() cares more
about finding the largest run of allocated data from the guest
perspective, whether or not that data is consecutive from the
host perspective.  Therefore, doing subsequent refinements such
as checking how much of the format-layer allocation also satisfies
BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best
case, it just costs extra CPU cycles during a single
bdrv_is_allocated(), but in the worst case, it results in a smaller
*pnum, and forces callers to iterate through more status probes when
visiting the entire file for even more extra CPU cycles.

This patch only optimizes the block layer.  But subsequent patches
will tweak the driver callback to be byte-based, and in the process,
can also pass this hint through to the driver.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/io.c | 51 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/block/io.c b/block/io.c
index 6358d07..719a6b0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1685,6 +1685,7 @@ typedef struct BdrvCoGetBlockStatusData {
     int nb_sectors;
     int *pnum;
     int64_t ret;
+    bool allocation;
     bool done;
 } BdrvCoGetBlockStatusData;

@@ -1717,6 +1718,10 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
  * Drivers not implementing the functionality are assumed to not support
  * backing files, hence all their sectors are reported as allocated.
  *
+ * If 'allocation' is true, the caller only cares about allocation
+ * status; this is a hint that a larger 'pnum' result is more
+ * important than including BDRV_BLOCK_OFFSET_VALID in the return.
+ *
  * If 'sector_num' is beyond the end of the disk image the return value is
  * BDRV_BLOCK_EOF and 'pnum' is set to 0.
  *
@@ -1733,6 +1738,7 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
  * is allocated in.
  */
 static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
+                                                     bool allocation,
                                                      int64_t sector_num,
                                                      int nb_sectors, int *pnum,
                                                      BlockDriverState **file)
@@ -1791,14 +1797,15 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,

     if (ret & BDRV_BLOCK_RAW) {
         assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
-        ret = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
+        ret = bdrv_co_get_block_status(local_file, allocation,
+                                       ret >> BDRV_SECTOR_BITS,
                                        *pnum, pnum, &local_file);
         goto out;
     }

     if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
         ret |= BDRV_BLOCK_ALLOCATED;
-    } else {
+    } else if (!allocation) {
         if (bdrv_unallocated_blocks_are_zero(bs)) {
             ret |= BDRV_BLOCK_ZERO;
         } else if (bs->backing) {
@@ -1810,12 +1817,13 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         }
     }

-    if (local_file && local_file != bs &&
+    if (!allocation && local_file && local_file != bs &&
         (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
         (ret & BDRV_BLOCK_OFFSET_VALID)) {
         int file_pnum;

-        ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
+        ret2 = bdrv_co_get_block_status(local_file, true,
+                                        ret >> BDRV_SECTOR_BITS,
                                         *pnum, &file_pnum, NULL);
         if (ret2 >= 0) {
             /* Ignore errors.  This is just providing extra information, it
@@ -1850,6 +1858,7 @@ out:

 static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
         BlockDriverState *base,
+        bool allocation,
         int64_t sector_num,
         int nb_sectors,
         int *pnum,
@@ -1861,7 +1870,8 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,

     assert(bs != base);
     for (p = bs; p != base; p = backing_bs(p)) {
-        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum, file);
+        ret = bdrv_co_get_block_status(p, allocation, sector_num, nb_sectors,
+                                       pnum, file);
         if (ret < 0) {
             break;
         }
@@ -1891,6 +1901,7 @@ static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
     BdrvCoGetBlockStatusData *data = opaque;

     data->ret = bdrv_co_get_block_status_above(data->bs, data->base,
+                                               data->allocation,
                                                data->sector_num,
                                                data->nb_sectors,
                                                data->pnum,
@@ -1903,11 +1914,12 @@ static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
  *
  * See bdrv_co_get_block_status_above() for details.
  */
-int64_t bdrv_get_block_status_above(BlockDriverState *bs,
-                                    BlockDriverState *base,
-                                    int64_t sector_num,
-                                    int nb_sectors, int *pnum,
-                                    BlockDriverState **file)
+static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
+                                              BlockDriverState *base,
+                                              bool allocation,
+                                              int64_t sector_num,
+                                              int nb_sectors, int *pnum,
+                                              BlockDriverState **file)
 {
     Coroutine *co;
     BdrvCoGetBlockStatusData data = {
@@ -1917,6 +1929,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
         .sector_num = sector_num,
         .nb_sectors = nb_sectors,
         .pnum = pnum,
+        .allocation = allocation,
         .done = false,
     };

@@ -1932,6 +1945,16 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
     return data.ret;
 }

+int64_t bdrv_get_block_status_above(BlockDriverState *bs,
+                                    BlockDriverState *base,
+                                    int64_t sector_num,
+                                    int nb_sectors, int *pnum,
+                                    BlockDriverState **file)
+{
+    return bdrv_common_block_status_above(bs, base, false, sector_num,
+                                          nb_sectors, pnum, file);
+}
+
 int64_t bdrv_get_block_status(BlockDriverState *bs,
                               int64_t sector_num,
                               int nb_sectors, int *pnum,
@@ -1944,16 +1967,16 @@ int64_t bdrv_get_block_status(BlockDriverState *bs,
 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
                                    int64_t bytes, int64_t *pnum)
 {
-    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
-    int nb_sectors = bytes >> BDRV_SECTOR_BITS;
     int64_t ret;
     int psectors;

     assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
     assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) &&
            bytes < INT_MAX * BDRV_SECTOR_SIZE);
-    ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
-                                NULL);
+    ret = bdrv_common_block_status_above(bs, backing_bs(bs), true,
+                                         offset >> BDRV_SECTOR_BITS,
+                                         bytes >> BDRV_SECTOR_BITS, &psectors,
+                                         NULL);
     if (ret < 0) {
         return ret;
     }
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 04/15] block: Make bdrv_round_to_clusters() signature more useful
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (2 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated() Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  7:34   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 05/15] qcow2: Switch is_zero_sectors() to byte-based Eric Blake
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi, Fam Zheng,
	Max Reitz, Jeff Cody

In the process of converting sector-based interfaces to bytes,
I'm finding it easier to represent a byte count as a 64-bit
integer at the block layer (even if we are internally capped
by SIZE_MAX or even INT_MAX for individual transactions, it's
still nicer to not have to worry about truncation/overflow
issues on as many variables).  Update the signature of
bdrv_round_to_clusters() to uniformly use int64_t, matching
the signature already chosen for bdrv_is_allocated and the
fact that off_t is also a signed type, then adjust clients
according to the required fallout.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: fix commit message [John], rebase to earlier changes, including
mirror_clip_bytes() signature update
---
 include/block/block.h | 4 ++--
 block/io.c            | 7 ++++---
 block/mirror.c        | 7 +++----
 block/trace-events    | 2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index f0fdbe8..cc82a0d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -473,9 +473,9 @@ int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
 void bdrv_round_to_clusters(BlockDriverState *bs,
-                            int64_t offset, unsigned int bytes,
+                            int64_t offset, int64_t bytes,
                             int64_t *cluster_offset,
-                            unsigned int *cluster_bytes);
+                            int64_t *cluster_bytes);

 const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
 void bdrv_get_backing_filename(BlockDriverState *bs,
diff --git a/block/io.c b/block/io.c
index 719a6b0..2377f3a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -422,9 +422,9 @@ static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
  * Round a region to cluster boundaries
  */
 void bdrv_round_to_clusters(BlockDriverState *bs,
-                            int64_t offset, unsigned int bytes,
+                            int64_t offset, int64_t bytes,
                             int64_t *cluster_offset,
-                            unsigned int *cluster_bytes)
+                            int64_t *cluster_bytes)
 {
     BlockDriverInfo bdi;

@@ -922,7 +922,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
     struct iovec iov;
     QEMUIOVector bounce_qiov;
     int64_t cluster_offset;
-    unsigned int cluster_bytes;
+    int64_t cluster_bytes;
     size_t skip_bytes;
     int ret;

@@ -943,6 +943,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
     trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
                                    cluster_offset, cluster_bytes);

+    assert(cluster_bytes < SIZE_MAX);
     iov.iov_len = cluster_bytes;
     iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len);
     if (bounce_buffer == NULL) {
diff --git a/block/mirror.c b/block/mirror.c
index 21758a8..08c0c27 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -190,10 +190,9 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
     bool need_cow;
     int ret = 0;
     int64_t align_offset = *offset;
-    unsigned int align_bytes = *bytes;
+    int64_t align_bytes = *bytes;
     int max_bytes = s->granularity * s->max_iov;

-    assert(*bytes < INT_MAX);
     need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
     need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
                           s->cow_bitmap);
@@ -389,7 +388,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     while (nb_chunks > 0 && offset < s->bdev_length) {
         int64_t ret;
         int io_sectors;
-        unsigned int io_bytes;
+        int64_t io_bytes;
         int64_t io_bytes_acct;
         enum MirrorMethod {
             MIRROR_METHOD_COPY,
@@ -414,7 +413,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             io_bytes = s->granularity;
         } else if (ret >= 0 && !(ret & BDRV_BLOCK_DATA)) {
             int64_t target_offset;
-            unsigned int target_bytes;
+            int64_t target_bytes;
             bdrv_round_to_clusters(blk_bs(s->target), offset, io_bytes,
                                    &target_offset, &target_bytes);
             if (target_offset == offset &&
diff --git a/block/trace-events b/block/trace-events
index 4a4df25..13a5a87 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -12,7 +12,7 @@ blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int flag
 bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
 bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
 bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags %#x"
-bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %u"
+bdrv_co_do_copy_on_readv(void *bs, int64_t offset, int64_t bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %"PRId64" cluster_offset %"PRId64" cluster_bytes %u"

 # block/stream.c
 stream_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d"
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 05/15] qcow2: Switch is_zero_sectors() to byte-based
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (3 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 04/15] block: Make bdrv_round_to_clusters() signature more useful Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  7:34   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 06/15] block: Switch bdrv_make_zero() " Eric Blake
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, qemu-block, el13635, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and rename it to is_zero() in the
process.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: rename function, rebase to upstream changes
---
 block/qcow2.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d75f248..9754193 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2905,21 +2905,28 @@ finish:
 }


-static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
-                            uint32_t count)
+static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
     int nr;
     int64_t res;
+    int64_t start;

-    if (start + count > bs->total_sectors) {
-        count = bs->total_sectors - start;
+    /* Widen to sector boundaries, then clamp to image length, before
+     * checking status of underlying sectors */
+    start = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+    bytes = QEMU_ALIGN_UP(offset + bytes, BDRV_SECTOR_SIZE) - start;
+
+    if (start + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
+        bytes = bs->total_sectors * BDRV_SECTOR_SIZE - start;
     }

-    if (!count) {
+    if (!bytes) {
         return true;
     }
-    res = bdrv_get_block_status_above(bs, NULL, start, count, &nr, NULL);
-    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == count;
+    res = bdrv_get_block_status_above(bs, NULL, start >> BDRV_SECTOR_BITS,
+                                      bytes >> BDRV_SECTOR_BITS, &nr, NULL);
+    return res >= 0 && (res & BDRV_BLOCK_ZERO) &&
+        nr * BDRV_SECTOR_SIZE == bytes;
 }

 static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
@@ -2937,24 +2944,21 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     }

     if (head || tail) {
-        int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
         uint64_t off;
         unsigned int nr;

         assert(head + bytes <= s->cluster_size);

         /* check whether remainder of cluster already reads as zero */
-        if (!(is_zero_sectors(bs, cl_start,
-                              DIV_ROUND_UP(head, BDRV_SECTOR_SIZE)) &&
-              is_zero_sectors(bs, (offset + bytes) >> BDRV_SECTOR_BITS,
-                              DIV_ROUND_UP(-tail & (s->cluster_size - 1),
-                                           BDRV_SECTOR_SIZE)))) {
+        if (!(is_zero(bs, offset - head, head) &&
+              is_zero(bs, offset + bytes,
+                      tail ? s->cluster_size - tail : 0))) {
             return -ENOTSUP;
         }

         qemu_co_mutex_lock(&s->lock);
         /* We can have new write after previous check */
-        offset = cl_start << BDRV_SECTOR_BITS;
+        offset = QEMU_ALIGN_DOWN(offset, s->cluster_size);
         bytes = s->cluster_size;
         nr = s->cluster_size;
         ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 06/15] block: Switch bdrv_make_zero() to byte-based
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (4 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 05/15] qcow2: Switch is_zero_sectors() to byte-based Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  7:35   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 07/15] qemu-img: Switch get_block_status() " Eric Blake
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi, Fam Zheng, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of zeroing a device to track by bytes instead of
sectors (although we are still guaranteed that we iterate by steps
that are sector-aligned).

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: rebase to earlier changes
---
 block/io.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/block/io.c b/block/io.c
index 2377f3a..b3ba9af 100644
--- a/block/io.c
+++ b/block/io.c
@@ -669,38 +669,38 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
  */
 int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 {
-    int64_t target_sectors, ret, nb_sectors, sector_num = 0;
+    int64_t target_size, ret, bytes, offset = 0;
     BlockDriverState *bs = child->bs;
-    int n;
+    int n; /* sectors */

-    target_sectors = bdrv_nb_sectors(bs);
-    if (target_sectors < 0) {
-        return target_sectors;
+    target_size = bdrv_getlength(bs);
+    if (target_size < 0) {
+        return target_size;
     }

     for (;;) {
-        nb_sectors = MIN(target_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);
-        if (nb_sectors <= 0) {
+        bytes = MIN(target_size - offset, BDRV_REQUEST_MAX_BYTES);
+        if (bytes <= 0) {
             return 0;
         }
-        ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n, NULL);
+        ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
+                                    bytes >> BDRV_SECTOR_BITS, &n, NULL);
         if (ret < 0) {
-            error_report("error getting block status at sector %" PRId64 ": %s",
-                         sector_num, strerror(-ret));
+            error_report("error getting block status at offset %" PRId64 ": %s",
+                         offset, strerror(-ret));
             return ret;
         }
         if (ret & BDRV_BLOCK_ZERO) {
-            sector_num += n;
+            offset += n * BDRV_SECTOR_BITS;
             continue;
         }
-        ret = bdrv_pwrite_zeroes(child, sector_num << BDRV_SECTOR_BITS,
-                                 n << BDRV_SECTOR_BITS, flags);
+        ret = bdrv_pwrite_zeroes(child, offset, n * BDRV_SECTOR_SIZE, flags);
         if (ret < 0) {
-            error_report("error writing zeroes at sector %" PRId64 ": %s",
-                         sector_num, strerror(-ret));
+            error_report("error writing zeroes at offset %" PRId64 ": %s",
+                         offset, strerror(-ret));
             return ret;
         }
-        sector_num += n;
+        offset += n * BDRV_SECTOR_SIZE;
     }
 }

-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 07/15] qemu-img: Switch get_block_status() to byte-based
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (5 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 06/15] block: Switch bdrv_make_zero() " Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  7:40   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 08/15] block: Convert bdrv_get_block_status() to bytes Eric Blake
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, qemu-block, el13635, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting
an internal function (no semantic change), and simplifying its
caller accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: no change
---
 qemu-img.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index daba954..7b7f992 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2674,14 +2674,16 @@ static void dump_map_entry(OutputFormat output_format, MapEntry *e,
     }
 }

-static int get_block_status(BlockDriverState *bs, int64_t sector_num,
-                            int nb_sectors, MapEntry *e)
+static int get_block_status(BlockDriverState *bs, int64_t offset,
+                            int64_t bytes, MapEntry *e)
 {
     int64_t ret;
     int depth;
     BlockDriverState *file;
     bool has_offset;
+    int nb_sectors = bytes >> BDRV_SECTOR_BITS;

+    assert(bytes < INT_MAX);
     /* As an optimization, we could cache the current range of unallocated
      * clusters in each file of the chain, and avoid querying the same
      * range repeatedly.
@@ -2689,8 +2691,8 @@ static int get_block_status(BlockDriverState *bs, int64_t sector_num,

     depth = 0;
     for (;;) {
-        ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &nb_sectors,
-                                    &file);
+        ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS, nb_sectors,
+                                    &nb_sectors, &file);
         if (ret < 0) {
             return ret;
         }
@@ -2710,7 +2712,7 @@ static int get_block_status(BlockDriverState *bs, int64_t sector_num,
     has_offset = !!(ret & BDRV_BLOCK_OFFSET_VALID);

     *e = (MapEntry) {
-        .start = sector_num * BDRV_SECTOR_SIZE,
+        .start = offset,
         .length = nb_sectors * BDRV_SECTOR_SIZE,
         .data = !!(ret & BDRV_BLOCK_DATA),
         .zero = !!(ret & BDRV_BLOCK_ZERO),
@@ -2840,16 +2842,12 @@ static int img_map(int argc, char **argv)

     length = blk_getlength(blk);
     while (curr.start + curr.length < length) {
-        int64_t nsectors_left;
-        int64_t sector_num;
-        int n;
-
-        sector_num = (curr.start + curr.length) >> BDRV_SECTOR_BITS;
+        int64_t offset = curr.start + curr.length;
+        int64_t n;

         /* Probe up to 1 GiB at a time.  */
-        nsectors_left = DIV_ROUND_UP(length, BDRV_SECTOR_SIZE) - sector_num;
-        n = MIN(1 << (30 - BDRV_SECTOR_BITS), nsectors_left);
-        ret = get_block_status(bs, sector_num, n, &next);
+        n = QEMU_ALIGN_DOWN(MIN(1 << 30, length - offset), BDRV_SECTOR_SIZE);
+        ret = get_block_status(bs, offset, n, &next);

         if (ret < 0) {
             error_report("Could not read file metadata: %s", strerror(-ret));
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 08/15] block: Convert bdrv_get_block_status() to bytes
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (6 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 07/15] qemu-img: Switch get_block_status() " Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  8:40   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 09/15] block: Switch bdrv_co_get_block_status() to byte-based Eric Blake
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi, Fam Zheng, Max Reitz

We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the name of the function from bdrv_get_block_status() to
bdrv_block_status() ensures that the compiler enforces that all
callers are updated.  For now, the io.c layer still assert()s that
all callers are sector-aligned, but that can be relaxed when a later
patch implements byte-based block status in the drivers.

Note that we have an inherent limitation in the BDRV_BLOCK_* return
values: BDRV_BLOCK_OFFSET_VALID can only return the start of a
sector, even if we later relax the interface to query for the status
starting at an intermediate byte; document the obvious interpretation
that valid offsets are always sector-relative.

Therefore, for the most part this patch is just the addition of scaling
at the callers followed by inverse scaling at bdrv_block_status().  But
some code, particularly bdrv_is_allocated(), gets a lot simpler because
it no longer has to mess with sectors.

For ease of review, bdrv_get_block_status_above() will be tackled
separately.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: rebase to earlier changes
---
 include/block/block.h | 12 +++++++-----
 block/io.c            | 31 +++++++++++++++++++------------
 block/qcow2-cluster.c |  2 +-
 qemu-img.c            | 20 +++++++++++---------
 4 files changed, 38 insertions(+), 27 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index cc82a0d..e3e6582 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -138,8 +138,10 @@ typedef struct HDGeometry {
  *
  * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 (BDRV_BLOCK_OFFSET_MASK)
  * represent the offset in the returned BDS that is allocated for the
- * corresponding raw data; however, whether that offset actually contains
- * data also depends on BDRV_BLOCK_DATA and BDRV_BLOCK_ZERO, as follows:
+ * corresponding raw data.  Individual bytes are at the same sector-relative
+ * locations (and thus, this bit cannot be set for mappings which are
+ * not equivalent modulo 512).  However, whether that offset actually
+ * contains data also depends on BDRV_BLOCK_DATA, as follows:
  *
  * DATA ZERO OFFSET_VALID
  *  t    t        t       sectors read as zero, returned file is zero at offset
@@ -419,9 +421,9 @@ int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
 bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs);
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
-int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
-                              int nb_sectors, int *pnum,
-                              BlockDriverState **file);
+int64_t bdrv_block_status(BlockDriverState *bs, int64_t offset,
+                          int64_t bytes, int64_t *pnum,
+                          BlockDriverState **file);
 int64_t bdrv_get_block_status_above(BlockDriverState *bs,
                                     BlockDriverState *base,
                                     int64_t sector_num,
diff --git a/block/io.c b/block/io.c
index b3ba9af..2662f37 100644
--- a/block/io.c
+++ b/block/io.c
@@ -671,7 +671,6 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 {
     int64_t target_size, ret, bytes, offset = 0;
     BlockDriverState *bs = child->bs;
-    int n; /* sectors */

     target_size = bdrv_getlength(bs);
     if (target_size < 0) {
@@ -683,24 +682,23 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
         if (bytes <= 0) {
             return 0;
         }
-        ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
-                                    bytes >> BDRV_SECTOR_BITS, &n, NULL);
+        ret = bdrv_block_status(bs, offset, bytes, &bytes, NULL);
         if (ret < 0) {
             error_report("error getting block status at offset %" PRId64 ": %s",
                          offset, strerror(-ret));
             return ret;
         }
         if (ret & BDRV_BLOCK_ZERO) {
-            offset += n * BDRV_SECTOR_BITS;
+            offset += bytes;
             continue;
         }
-        ret = bdrv_pwrite_zeroes(child, offset, n * BDRV_SECTOR_SIZE, flags);
+        ret = bdrv_pwrite_zeroes(child, offset, bytes, flags);
         if (ret < 0) {
             error_report("error writing zeroes at offset %" PRId64 ": %s",
                          offset, strerror(-ret));
             return ret;
         }
-        offset += n * BDRV_SECTOR_SIZE;
+        offset += bytes;
     }
 }

@@ -1956,13 +1954,22 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
                                           nb_sectors, pnum, file);
 }

-int64_t bdrv_get_block_status(BlockDriverState *bs,
-                              int64_t sector_num,
-                              int nb_sectors, int *pnum,
-                              BlockDriverState **file)
+int64_t bdrv_block_status(BlockDriverState *bs,
+                          int64_t offset, int64_t bytes, int64_t *pnum,
+                          BlockDriverState **file)
 {
-    return bdrv_get_block_status_above(bs, backing_bs(bs),
-                                       sector_num, nb_sectors, pnum, file);
+    int64_t ret;
+    int n;
+
+    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
+    assert(bytes <= BDRV_REQUEST_MAX_BYTES);
+    ret = bdrv_get_block_status_above(bs, backing_bs(bs),
+                                      offset >> BDRV_SECTOR_BITS,
+                                      bytes >> BDRV_SECTOR_BITS, &n, file);
+    if (pnum) {
+        *pnum = n * BDRV_SECTOR_SIZE;
+    }
+    return ret;
 }

 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f06c08f..0185986 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1567,7 +1567,7 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
          * cluster is already marked as zero, or if it's unallocated and we
          * don't have a backing file.
          *
-         * TODO We might want to use bdrv_get_block_status(bs) here, but we're
+         * TODO We might want to use bdrv_block_status(bs) here, but we're
          * holding s->lock, so that doesn't work today.
          *
          * If full_discard is true, the sector should not read back as zeroes,
diff --git a/qemu-img.c b/qemu-img.c
index 7b7f992..2b1422a 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1594,9 +1594,14 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)

     if (s->sector_next_status <= sector_num) {
         if (s->target_has_backing) {
-            ret = bdrv_get_block_status(blk_bs(s->src[src_cur]),
-                                        sector_num - src_cur_offset,
-                                        n, &n, NULL);
+            int64_t count = n * BDRV_SECTOR_SIZE;
+
+            ret = bdrv_block_status(blk_bs(s->src[src_cur]),
+                                    (sector_num - src_cur_offset) *
+                                    BDRV_SECTOR_SIZE,
+                                    count, &count, NULL);
+            assert(ret < 0 || QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
+            n = count >> BDRV_SECTOR_BITS;
         } else {
             ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
                                               sector_num - src_cur_offset,
@@ -2681,9 +2686,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
     int depth;
     BlockDriverState *file;
     bool has_offset;
-    int nb_sectors = bytes >> BDRV_SECTOR_BITS;

-    assert(bytes < INT_MAX);
     /* As an optimization, we could cache the current range of unallocated
      * clusters in each file of the chain, and avoid querying the same
      * range repeatedly.
@@ -2691,12 +2694,11 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,

     depth = 0;
     for (;;) {
-        ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS, nb_sectors,
-                                    &nb_sectors, &file);
+        ret = bdrv_block_status(bs, offset, bytes, &bytes, &file);
         if (ret < 0) {
             return ret;
         }
-        assert(nb_sectors);
+        assert(bytes);
         if (ret & (BDRV_BLOCK_ZERO|BDRV_BLOCK_DATA)) {
             break;
         }
@@ -2713,7 +2715,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,

     *e = (MapEntry) {
         .start = offset,
-        .length = nb_sectors * BDRV_SECTOR_SIZE,
+        .length = bytes,
         .data = !!(ret & BDRV_BLOCK_DATA),
         .zero = !!(ret & BDRV_BLOCK_ZERO),
         .offset = ret & BDRV_BLOCK_OFFSET_MASK,
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 09/15] block: Switch bdrv_co_get_block_status() to byte-based
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (7 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 08/15] block: Convert bdrv_get_block_status() to bytes Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  8:59   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 10/15] block: Switch BdrvCoGetBlockStatusData " Eric Blake
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi, Fam Zheng, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and as with its public counterpart,
rename to bdrv_co_block_status() to make the compiler enforce that
we catch all uses.  For now, we assert that callers still pass
aligned data, but ultimately, this will be the function where we
hand off to a byte-based driver callback, and will eventually need
to add logic to ensure we round calls according to the driver's
request_alignment then touch up the result handed back to the
caller, to start permitting a caller to pass unaligned offsets.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: rebase to earlier changes
---
 block/io.c | 90 ++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 53 insertions(+), 37 deletions(-)

diff --git a/block/io.c b/block/io.c
index 2662f37..2d324af 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1721,42 +1721,43 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
  * status; this is a hint that a larger 'pnum' result is more
  * important than including BDRV_BLOCK_OFFSET_VALID in the return.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is
+ * If 'offset' is beyond the end of the disk image the return value is
  * BDRV_BLOCK_EOF and 'pnum' is set to 0.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
- * allocated/unallocated state.
+ * 'pnum' is set to the number of bytes (including and immediately following
+ * the specified offset) that are known to be in the same
+ * allocated/unallocated state.  It may be NULL.
  *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
+ * 'bytes' is the max value 'pnum' should be set to.  If bytes goes
  * beyond the end of the disk image it will be clamped; if 'pnum' is set to
  * the end of the image, then the returned value will include BDRV_BLOCK_EOF.
  *
  * If returned value is positive, BDRV_BLOCK_OFFSET_VALID bit is set, and
- * 'file' is non-NULL, then '*file' points to the BDS which the sector range
- * is allocated in.
+ * 'file' is non-NULL, then '*file' points to the BDS which owns the
+ * allocated sector that contains offset.
  */
-static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
-                                                     bool allocation,
-                                                     int64_t sector_num,
-                                                     int nb_sectors, int *pnum,
-                                                     BlockDriverState **file)
+static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
+                                                 bool allocation,
+                                                 int64_t offset, int64_t bytes,
+                                                 int64_t *pnum,
+                                                 BlockDriverState **file)
 {
-    int64_t total_sectors;
-    int64_t n;
+    int64_t total_size;
+    int64_t n; /* bytes */
     int64_t ret, ret2;
     BlockDriverState *local_file = NULL;
+    int count; /* sectors */

     assert(pnum);
-    total_sectors = bdrv_nb_sectors(bs);
-    if (total_sectors < 0) {
+    total_size = bdrv_getlength(bs);
+    if (total_size < 0) {
         if (file) {
             *file = NULL;
         }
-        return total_sectors;
+        return total_size;
     }

-    if (sector_num >= total_sectors) {
+    if (offset >= total_size) {
         *pnum = 0;
         if (file) {
             *file = NULL;
@@ -1764,19 +1765,19 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         return BDRV_BLOCK_EOF;
     }

-    n = total_sectors - sector_num;
-    if (n < nb_sectors) {
-        nb_sectors = n;
+    n = total_size - offset;
+    if (n < bytes) {
+        bytes = n;
     }

     if (!bs->drv->bdrv_co_get_block_status) {
-        *pnum = nb_sectors;
+        *pnum = bytes;
         ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
-        if (sector_num + nb_sectors == total_sectors) {
+        if (offset + bytes == total_size) {
             ret |= BDRV_BLOCK_EOF;
         }
         if (bs->drv->protocol_name) {
-            ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
+            ret |= BDRV_BLOCK_OFFSET_VALID | (offset & BDRV_BLOCK_OFFSET_MASK);
             if (file) {
                 *file = bs;
             }
@@ -1787,18 +1788,27 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
     }

     bdrv_inc_in_flight(bs);
-    ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum,
+    /*
+     * TODO: Rather than require aligned offsets, we could instead
+     * round to the driver's request_alignment here, then touch up
+     * count afterwards back to the caller's expectations.
+     */
+    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
+    ret = bs->drv->bdrv_co_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
+                                            bytes >> BDRV_SECTOR_BITS, &count,
                                             &local_file);
     if (ret < 0) {
         *pnum = 0;
         goto out;
     }
+    *pnum = count * BDRV_SECTOR_SIZE;

     if (ret & BDRV_BLOCK_RAW) {
         assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
-        ret = bdrv_co_get_block_status(local_file, allocation,
-                                       ret >> BDRV_SECTOR_BITS,
-                                       *pnum, pnum, &local_file);
+        ret = bdrv_co_block_status(local_file, allocation,
+                                   ret & BDRV_BLOCK_OFFSET_MASK,
+                                   *pnum, pnum, &local_file);
+        assert(QEMU_IS_ALIGNED(*pnum, BDRV_SECTOR_SIZE));
         goto out;
     }

@@ -1809,8 +1819,8 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
             ret |= BDRV_BLOCK_ZERO;
         } else if (bs->backing) {
             BlockDriverState *bs2 = bs->backing->bs;
-            int64_t nb_sectors2 = bdrv_nb_sectors(bs2);
-            if (nb_sectors2 >= 0 && sector_num >= nb_sectors2) {
+            int64_t size2 = bdrv_getlength(bs2);
+            if (size2 >= 0 && offset >= size2) {
                 ret |= BDRV_BLOCK_ZERO;
             }
         }
@@ -1819,11 +1829,11 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
     if (!allocation && local_file && local_file != bs &&
         (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
         (ret & BDRV_BLOCK_OFFSET_VALID)) {
-        int file_pnum;
+        int64_t file_pnum;

-        ret2 = bdrv_co_get_block_status(local_file, true,
-                                        ret >> BDRV_SECTOR_BITS,
-                                        *pnum, &file_pnum, NULL);
+        ret2 = bdrv_co_block_status(local_file, true,
+                                    ret & BDRV_BLOCK_OFFSET_MASK,
+                                    *pnum, &file_pnum, NULL);
         if (ret2 >= 0) {
             /* Ignore errors.  This is just providing extra information, it
              * is useful but not necessary.
@@ -1849,7 +1859,7 @@ out:
         *file = local_file;
     }
     bdrv_dec_in_flight(bs);
-    if (ret >= 0 && sector_num + *pnum == total_sectors) {
+    if (ret >= 0 && offset + *pnum == total_size) {
         ret |= BDRV_BLOCK_EOF;
     }
     return ret;
@@ -1869,11 +1879,17 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,

     assert(bs != base);
     for (p = bs; p != base; p = backing_bs(p)) {
-        ret = bdrv_co_get_block_status(p, allocation, sector_num, nb_sectors,
-                                       pnum, file);
+        int64_t count;
+
+        ret = bdrv_co_block_status(p, allocation,
+                                   sector_num * BDRV_SECTOR_SIZE,
+                                   nb_sectors * BDRV_SECTOR_SIZE, &count,
+                                   file);
         if (ret < 0) {
             break;
         }
+        assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
+        *pnum = count >> BDRV_SECTOR_BITS;
         if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
             /*
              * Reading beyond the end of the file continues to read
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 10/15] block: Switch BdrvCoGetBlockStatusData to byte-based
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (8 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 09/15] block: Switch bdrv_co_get_block_status() to byte-based Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  9:00   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 11/15] block: Switch bdrv_common_block_status_above() " Eric Blake
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi, Fam Zheng, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
type (no semantic change), and rename it to match the corresponding
public function rename.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: rebase to earlier changes
---
 block/io.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index 2d324af..888f7a1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1676,17 +1676,17 @@ int bdrv_flush_all(void)
 }


-typedef struct BdrvCoGetBlockStatusData {
+typedef struct BdrvCoBlockStatusData {
     BlockDriverState *bs;
     BlockDriverState *base;
     BlockDriverState **file;
-    int64_t sector_num;
-    int nb_sectors;
-    int *pnum;
+    int64_t offset;
+    int64_t bytes;
+    int64_t *pnum;
     int64_t ret;
     bool allocation;
     bool done;
-} BdrvCoGetBlockStatusData;
+} BdrvCoBlockStatusData;

 int64_t coroutine_fn bdrv_co_get_block_status_from_file(BlockDriverState *bs,
                                                      int64_t sector_num,
@@ -1913,14 +1913,16 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
 /* Coroutine wrapper for bdrv_get_block_status_above() */
 static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
 {
-    BdrvCoGetBlockStatusData *data = opaque;
+    BdrvCoBlockStatusData *data = opaque;
+    int n;

     data->ret = bdrv_co_get_block_status_above(data->bs, data->base,
                                                data->allocation,
-                                               data->sector_num,
-                                               data->nb_sectors,
-                                               data->pnum,
+                                               data->offset >> BDRV_SECTOR_BITS,
+                                               data->bytes >> BDRV_SECTOR_BITS,
+                                               &n,
                                                data->file);
+    *data->pnum = n * BDRV_SECTOR_SIZE;
     data->done = true;
 }

@@ -1937,13 +1939,14 @@ static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
                                               BlockDriverState **file)
 {
     Coroutine *co;
-    BdrvCoGetBlockStatusData data = {
+    int64_t n;
+    BdrvCoBlockStatusData data = {
         .bs = bs,
         .base = base,
         .file = file,
-        .sector_num = sector_num,
-        .nb_sectors = nb_sectors,
-        .pnum = pnum,
+        .offset = sector_num * BDRV_SECTOR_SIZE,
+        .bytes = nb_sectors * BDRV_SECTOR_SIZE,
+        .pnum = &n,
         .allocation = allocation,
         .done = false,
     };
@@ -1957,6 +1960,8 @@ static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
         bdrv_coroutine_enter(bs, co);
         BDRV_POLL_WHILE(bs, !data.done);
     }
+    assert(data.ret < 0 || QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE));
+    *pnum = n >> BDRV_SECTOR_BITS;
     return data.ret;
 }

-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 11/15] block: Switch bdrv_common_block_status_above() to byte-based
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (9 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 10/15] block: Switch BdrvCoGetBlockStatusData " Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  9:02   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 12/15] block: Switch bdrv_co_get_block_status_above() " Eric Blake
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi, Fam Zheng, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/io.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/block/io.c b/block/io.c
index 888f7a1..697db75 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1934,19 +1934,18 @@ static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
 static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
                                               BlockDriverState *base,
                                               bool allocation,
-                                              int64_t sector_num,
-                                              int nb_sectors, int *pnum,
+                                              int64_t offset,
+                                              int64_t bytes, int64_t *pnum,
                                               BlockDriverState **file)
 {
     Coroutine *co;
-    int64_t n;
     BdrvCoBlockStatusData data = {
         .bs = bs,
         .base = base,
         .file = file,
-        .offset = sector_num * BDRV_SECTOR_SIZE,
-        .bytes = nb_sectors * BDRV_SECTOR_SIZE,
-        .pnum = &n,
+        .offset = offset,
+        .bytes = bytes,
+        .pnum = pnum,
         .allocation = allocation,
         .done = false,
     };
@@ -1960,8 +1959,6 @@ static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
         bdrv_coroutine_enter(bs, co);
         BDRV_POLL_WHILE(bs, !data.done);
     }
-    assert(data.ret < 0 || QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE));
-    *pnum = n >> BDRV_SECTOR_BITS;
     return data.ret;
 }

@@ -1971,8 +1968,19 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
                                     int nb_sectors, int *pnum,
                                     BlockDriverState **file)
 {
-    return bdrv_common_block_status_above(bs, base, false, sector_num,
-                                          nb_sectors, pnum, file);
+    int64_t ret;
+    int64_t n;
+
+    ret = bdrv_common_block_status_above(bs, base, false,
+                                         sector_num * BDRV_SECTOR_SIZE,
+                                         nb_sectors * BDRV_SECTOR_SIZE,
+                                         &n, file);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE));
+    *pnum = n >> BDRV_SECTOR_BITS;
+    return ret;
 }

 int64_t bdrv_block_status(BlockDriverState *bs,
@@ -1997,21 +2005,13 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
                                    int64_t bytes, int64_t *pnum)
 {
     int64_t ret;
-    int psectors;
+    int64_t dummy;

-    assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
-    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) &&
-           bytes < INT_MAX * BDRV_SECTOR_SIZE);
-    ret = bdrv_common_block_status_above(bs, backing_bs(bs), true,
-                                         offset >> BDRV_SECTOR_BITS,
-                                         bytes >> BDRV_SECTOR_BITS, &psectors,
-                                         NULL);
+    ret = bdrv_common_block_status_above(bs, backing_bs(bs), true, offset,
+                                         bytes, pnum ? pnum : &dummy, NULL);
     if (ret < 0) {
         return ret;
     }
-    if (pnum) {
-        *pnum = psectors * BDRV_SECTOR_SIZE;
-    }
     return !!(ret & BDRV_BLOCK_ALLOCATED);
 }

-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 12/15] block: Switch bdrv_co_get_block_status_above() to byte-based
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (10 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 11/15] block: Switch bdrv_common_block_status_above() " Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  9:28   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 13/15] block: Convert bdrv_get_block_status_above() to bytes Eric Blake
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi, Fam Zheng, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
type (no semantic change), and rename it to match the corresponding
public function rename.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: rebase to earlier changes
---
 block/io.c | 48 ++++++++++++++++++------------------------------
 1 file changed, 18 insertions(+), 30 deletions(-)

diff --git a/block/io.c b/block/io.c
index 697db75..85353fa 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1865,12 +1865,12 @@ out:
     return ret;
 }

-static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
+static int64_t coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
         BlockDriverState *base,
         bool allocation,
-        int64_t sector_num,
-        int nb_sectors,
-        int *pnum,
+        int64_t offset,
+        int64_t bytes,
+        int64_t *pnum,
         BlockDriverState **file)
 {
     BlockDriverState *p;
@@ -1879,17 +1879,10 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,

     assert(bs != base);
     for (p = bs; p != base; p = backing_bs(p)) {
-        int64_t count;
-
-        ret = bdrv_co_block_status(p, allocation,
-                                   sector_num * BDRV_SECTOR_SIZE,
-                                   nb_sectors * BDRV_SECTOR_SIZE, &count,
-                                   file);
+        ret = bdrv_co_block_status(p, allocation, offset, bytes, pnum, file);
         if (ret < 0) {
             break;
         }
-        assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
-        *pnum = count >> BDRV_SECTOR_BITS;
         if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
             /*
              * Reading beyond the end of the file continues to read
@@ -1897,39 +1890,35 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
              * unallocated length we learned from an earlier
              * iteration.
              */
-            *pnum = nb_sectors;
+            *pnum = bytes;
         }
         if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
             break;
         }
-        /* [sector_num, pnum] unallocated on this layer, which could be only
-         * the first part of [sector_num, nb_sectors].  */
-        nb_sectors = MIN(nb_sectors, *pnum);
+        /* [offset, pnum] unallocated on this layer, which could be only
+         * the first part of [offset, bytes].  */
+        bytes = MIN(bytes, *pnum);
         first = false;
     }
     return ret;
 }

 /* Coroutine wrapper for bdrv_get_block_status_above() */
-static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
+static void coroutine_fn bdrv_block_status_above_co_entry(void *opaque)
 {
     BdrvCoBlockStatusData *data = opaque;
-    int n;

-    data->ret = bdrv_co_get_block_status_above(data->bs, data->base,
-                                               data->allocation,
-                                               data->offset >> BDRV_SECTOR_BITS,
-                                               data->bytes >> BDRV_SECTOR_BITS,
-                                               &n,
-                                               data->file);
-    *data->pnum = n * BDRV_SECTOR_SIZE;
+    data->ret = bdrv_co_block_status_above(data->bs, data->base,
+                                           data->allocation,
+                                           data->offset, data->bytes,
+                                           data->pnum, data->file);
     data->done = true;
 }

 /*
- * Synchronous wrapper around bdrv_co_get_block_status_above().
+ * Synchronous wrapper around bdrv_co_block_status_above().
  *
- * See bdrv_co_get_block_status_above() for details.
+ * See bdrv_co_block_status_above() for details.
  */
 static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
                                               BlockDriverState *base,
@@ -1952,10 +1941,9 @@ static int64_t bdrv_common_block_status_above(BlockDriverState *bs,

     if (qemu_in_coroutine()) {
         /* Fast-path if already in coroutine context */
-        bdrv_get_block_status_above_co_entry(&data);
+        bdrv_block_status_above_co_entry(&data);
     } else {
-        co = qemu_coroutine_create(bdrv_get_block_status_above_co_entry,
-                                   &data);
+        co = qemu_coroutine_create(bdrv_block_status_above_co_entry, &data);
         bdrv_coroutine_enter(bs, co);
         BDRV_POLL_WHILE(bs, !data.done);
     }
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 13/15] block: Convert bdrv_get_block_status_above() to bytes
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (11 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 12/15] block: Switch bdrv_co_get_block_status_above() " Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  9:32   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 14/15] block: Align block status requests Eric Blake
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi, Fam Zheng,
	Max Reitz, Jeff Cody

We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the name of the function from bdrv_get_block_status_above()
to bdrv_block_status_above() ensures that the compiler enforces that
all callers are updated.  For now, the io.c layer still assert()s
that all callers are sector-aligned, but that can be relaxed when a
later patch implements byte-based block status in the drivers.

For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_block_status().  But some
code, particularly bdrv_block_status(), gets a lot simpler because
it no longer has to mess with sectors.

For ease of review, bdrv_get_block_status() was tackled separately.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: rebase to earlier changes
---
 include/block/block.h | 10 +++++-----
 block/io.c            | 39 ++++++++-------------------------------
 block/mirror.c        | 12 ++++--------
 block/qcow2.c         |  8 +++-----
 qemu-img.c            | 39 +++++++++++++++++++++++----------------
 5 files changed, 43 insertions(+), 65 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index e3e6582..ed2ea69 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -424,11 +424,11 @@ bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int64_t bdrv_block_status(BlockDriverState *bs, int64_t offset,
                           int64_t bytes, int64_t *pnum,
                           BlockDriverState **file);
-int64_t bdrv_get_block_status_above(BlockDriverState *bs,
-                                    BlockDriverState *base,
-                                    int64_t sector_num,
-                                    int nb_sectors, int *pnum,
-                                    BlockDriverState **file);
+int64_t bdrv_block_status_above(BlockDriverState *bs,
+                                BlockDriverState *base,
+                                int64_t offset,
+                                int64_t bytes, int64_t *pnum,
+                                BlockDriverState **file);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
                       int64_t *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
diff --git a/block/io.c b/block/io.c
index 85353fa..91d3e99 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1903,7 +1903,7 @@ static int64_t coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
     return ret;
 }

-/* Coroutine wrapper for bdrv_get_block_status_above() */
+/* Coroutine wrapper for bdrv_block_status_above() */
 static void coroutine_fn bdrv_block_status_above_co_entry(void *opaque)
 {
     BdrvCoBlockStatusData *data = opaque;
@@ -1950,43 +1950,20 @@ static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
     return data.ret;
 }

-int64_t bdrv_get_block_status_above(BlockDriverState *bs,
-                                    BlockDriverState *base,
-                                    int64_t sector_num,
-                                    int nb_sectors, int *pnum,
-                                    BlockDriverState **file)
+int64_t bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
+                                int64_t offset, int64_t bytes, int64_t *pnum,
+                                BlockDriverState **file)
 {
-    int64_t ret;
-    int64_t n;
-
-    ret = bdrv_common_block_status_above(bs, base, false,
-                                         sector_num * BDRV_SECTOR_SIZE,
-                                         nb_sectors * BDRV_SECTOR_SIZE,
-                                         &n, file);
-    if (ret < 0) {
-        return ret;
-    }
-    assert(QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE));
-    *pnum = n >> BDRV_SECTOR_BITS;
-    return ret;
+    return bdrv_common_block_status_above(bs, base, false, offset, bytes,
+                                          pnum, file);
 }

 int64_t bdrv_block_status(BlockDriverState *bs,
                           int64_t offset, int64_t bytes, int64_t *pnum,
                           BlockDriverState **file)
 {
-    int64_t ret;
-    int n;
-
-    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
-    assert(bytes <= BDRV_REQUEST_MAX_BYTES);
-    ret = bdrv_get_block_status_above(bs, backing_bs(bs),
-                                      offset >> BDRV_SECTOR_BITS,
-                                      bytes >> BDRV_SECTOR_BITS, &n, file);
-    if (pnum) {
-        *pnum = n * BDRV_SECTOR_SIZE;
-    }
-    return ret;
+    return bdrv_block_status_above(bs, backing_bs(bs),
+                                   offset, bytes, pnum, file);
 }

 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
diff --git a/block/mirror.c b/block/mirror.c
index 08c0c27..1d555bb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -328,7 +328,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     uint64_t delay_ns = 0;
     /* At least the first dirty chunk is mirrored in one iteration. */
     int nb_chunks = 1;
-    int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
     bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
     int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);

@@ -377,7 +376,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     }

     /* Clear dirty bits before querying the block status, because
-     * calling bdrv_get_block_status_above could yield - if some blocks are
+     * calling bdrv_block_status_above could yield - if some blocks are
      * marked dirty in this window, we need to know.
      */
     bdrv_reset_dirty_bitmap_locked(s->dirty_bitmap, offset,
@@ -387,7 +386,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     bitmap_set(s->in_flight_bitmap, offset / s->granularity, nb_chunks);
     while (nb_chunks > 0 && offset < s->bdev_length) {
         int64_t ret;
-        int io_sectors;
         int64_t io_bytes;
         int64_t io_bytes_acct;
         enum MirrorMethod {
@@ -397,11 +395,9 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         } mirror_method = MIRROR_METHOD_COPY;

         assert(!(offset % s->granularity));
-        ret = bdrv_get_block_status_above(source, NULL,
-                                          offset >> BDRV_SECTOR_BITS,
-                                          nb_chunks * sectors_per_chunk,
-                                          &io_sectors, NULL);
-        io_bytes = io_sectors * BDRV_SECTOR_SIZE;
+        ret = bdrv_block_status_above(source, NULL, offset,
+                                      nb_chunks * s->granularity,
+                                      &io_bytes, NULL);
         if (ret < 0) {
             io_bytes = MIN(nb_chunks * s->granularity, max_io_bytes);
         } else if (ret & BDRV_BLOCK_DATA) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 9754193..d251bf7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2907,7 +2907,7 @@ finish:

 static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
-    int nr;
+    int64_t nr;
     int64_t res;
     int64_t start;

@@ -2923,10 +2923,8 @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
     if (!bytes) {
         return true;
     }
-    res = bdrv_get_block_status_above(bs, NULL, start >> BDRV_SECTOR_BITS,
-                                      bytes >> BDRV_SECTOR_BITS, &nr, NULL);
-    return res >= 0 && (res & BDRV_BLOCK_ZERO) &&
-        nr * BDRV_SECTOR_SIZE == bytes;
+    res = bdrv_block_status_above(bs, NULL, start, bytes, &nr, NULL);
+    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
 }

 static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
diff --git a/qemu-img.c b/qemu-img.c
index 2b1422a..39aff13 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1224,7 +1224,7 @@ static int img_compare(int argc, char **argv)
     BlockDriverState *bs1, *bs2;
     int64_t total_sectors1, total_sectors2;
     uint8_t *buf1 = NULL, *buf2 = NULL;
-    int pnum1, pnum2;
+    int64_t pnum1, pnum2;
     int allocated1, allocated2;
     int ret = 0; /* return value - 0 Ident, 1 Different, >1 Error */
     bool progress = false, quiet = false, strict = false;
@@ -1378,9 +1378,11 @@ static int img_compare(int argc, char **argv)
         if (nb_sectors <= 0) {
             break;
         }
-        status1 = bdrv_get_block_status_above(bs1, NULL, sector_num,
-                                              total_sectors1 - sector_num,
-                                              &pnum1, NULL);
+        status1 = bdrv_block_status_above(bs1, NULL,
+                                          sector_num * BDRV_SECTOR_SIZE,
+                                          (total_sectors1 - sector_num) *
+                                          BDRV_SECTOR_SIZE,
+                                          &pnum1, NULL);
         if (status1 < 0) {
             ret = 3;
             error_report("Sector allocation test failed for %s", filename1);
@@ -1388,9 +1390,11 @@ static int img_compare(int argc, char **argv)
         }
         allocated1 = status1 & BDRV_BLOCK_ALLOCATED;

-        status2 = bdrv_get_block_status_above(bs2, NULL, sector_num,
-                                              total_sectors2 - sector_num,
-                                              &pnum2, NULL);
+        status2 = bdrv_block_status_above(bs2, NULL,
+                                          sector_num * BDRV_SECTOR_SIZE,
+                                          (total_sectors2 - sector_num) *
+                                          BDRV_SECTOR_SIZE,
+                                          &pnum2, NULL);
         if (status2 < 0) {
             ret = 3;
             error_report("Sector allocation test failed for %s", filename2);
@@ -1398,10 +1402,12 @@ static int img_compare(int argc, char **argv)
         }
         allocated2 = status2 & BDRV_BLOCK_ALLOCATED;
         if (pnum1) {
-            nb_sectors = MIN(nb_sectors, pnum1);
+            nb_sectors = MIN(nb_sectors,
+                             DIV_ROUND_UP(pnum1, BDRV_SECTOR_SIZE));
         }
         if (pnum2) {
-            nb_sectors = MIN(nb_sectors, pnum2);
+            nb_sectors = MIN(nb_sectors,
+                             DIV_ROUND_UP(pnum2, BDRV_SECTOR_SIZE));
         }

         if (strict) {
@@ -1415,7 +1421,7 @@ static int img_compare(int argc, char **argv)
             }
         }
         if ((status1 & BDRV_BLOCK_ZERO) && (status2 & BDRV_BLOCK_ZERO)) {
-            nb_sectors = MIN(pnum1, pnum2);
+            nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE);
         } else if (allocated1 == allocated2) {
             if (allocated1) {
                 ret = blk_pread(blk1, sector_num << BDRV_SECTOR_BITS, buf1,
@@ -1593,23 +1599,24 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
     n = MIN(s->total_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);

     if (s->sector_next_status <= sector_num) {
+        int64_t count = n * BDRV_SECTOR_SIZE;
+
         if (s->target_has_backing) {
-            int64_t count = n * BDRV_SECTOR_SIZE;

             ret = bdrv_block_status(blk_bs(s->src[src_cur]),
                                     (sector_num - src_cur_offset) *
                                     BDRV_SECTOR_SIZE,
                                     count, &count, NULL);
-            assert(ret < 0 || QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
-            n = count >> BDRV_SECTOR_BITS;
         } else {
-            ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
-                                              sector_num - src_cur_offset,
-                                              n, &n, NULL);
+            ret = bdrv_block_status_above(blk_bs(s->src[src_cur]), NULL,
+                                          (sector_num - src_cur_offset) *
+                                          BDRV_SECTOR_SIZE,
+                                          count, &count, NULL);
         }
         if (ret < 0) {
             return ret;
         }
+        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);

         if (ret & BDRV_BLOCK_ZERO) {
             s->status = BLK_ZERO;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 14/15] block: Align block status requests
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (12 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 13/15] block: Convert bdrv_get_block_status_above() to bytes Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  9:44   ` Fam Zheng
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 15/15] qemu-io: Relax 'alloc' now that block-status doesn't assert Eric Blake
  2017-07-03 22:20 ` [Qemu-devel] [RFC PATCH v2 16/15] block: Add .bdrv_co_block_status() callback Eric Blake
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Max Reitz, Stefan Hajnoczi, Fam Zheng

Any device that has request_alignment greater than 512 should be
unable to report status at a finer granularity; it may also be
simpler for such devices to be guaranteed that the block layer
has rounded things out to the granularity boundary (the way the
block layer already rounds all other I/O out).  Besides, getting
the code correct for super-sector alignment also benefits us
for the fact that our public interface now has byte granularity,
even though none of our drivers have byte-level callbacks.

Add an assertion in blkdebug that proves that the block layer
never requests status of unaligned sections, similar to what it
does on other requests (while still keeping the generic helper
in place for when future patches add a throttle driver).  Note
note that iotest 177 already covers this (it would fail if you
use just the blkdebug.c hunk without the io.c changes).
Meanwhile, we can drop assertions in callers that no longer have
to pass in sector-aligned addresses.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 include/block/block_int.h |  3 ++-
 block/blkdebug.c          | 13 +++++++++++-
 block/io.c                | 53 ++++++++++++++++++++++++++++++++---------------
 3 files changed, 50 insertions(+), 19 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index ffa22c7..5f6ba5d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -173,7 +173,8 @@ struct BlockDriver {
      * according to the current layer, and should not set
      * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
      * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
-     * layer guarantees non-NULL pnum and file.
+     * layer guarantees input aligned to request_alignment, as well as
+     * non-NULL pnum and file.
      */
     int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum,
diff --git a/block/blkdebug.c b/block/blkdebug.c
index f1539db..67736b4 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -641,6 +641,17 @@ static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
     return bdrv_co_pdiscard(bs->file->bs, offset, bytes);
 }

+static int64_t coroutine_fn blkdebug_co_get_block_status(
+    BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
+    BlockDriverState **file)
+{
+    assert(QEMU_IS_ALIGNED(sector_num | nb_sectors,
+                           DIV_ROUND_UP(bs->bl.request_alignment,
+                                        BDRV_SECTOR_SIZE)));
+    return bdrv_co_get_block_status_from_file(bs, sector_num, nb_sectors,
+                                              pnum, file);
+}
+
 static void blkdebug_close(BlockDriverState *bs)
 {
     BDRVBlkdebugState *s = bs->opaque;
@@ -915,7 +926,7 @@ static BlockDriver bdrv_blkdebug = {
     .bdrv_co_flush_to_disk  = blkdebug_co_flush,
     .bdrv_co_pwrite_zeroes  = blkdebug_co_pwrite_zeroes,
     .bdrv_co_pdiscard       = blkdebug_co_pdiscard,
-    .bdrv_co_get_block_status = bdrv_co_get_block_status_from_file,
+    .bdrv_co_get_block_status = blkdebug_co_get_block_status,

     .bdrv_debug_event           = blkdebug_debug_event,
     .bdrv_debug_breakpoint      = blkdebug_debug_breakpoint,
diff --git a/block/io.c b/block/io.c
index 91d3e99..5ed1ac7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1746,7 +1746,8 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     int64_t n; /* bytes */
     int64_t ret, ret2;
     BlockDriverState *local_file = NULL;
-    int count; /* sectors */
+    int64_t aligned_offset, aligned_bytes;
+    uint32_t align;

     assert(pnum);
     total_size = bdrv_getlength(bs);
@@ -1788,27 +1789,44 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     }

     bdrv_inc_in_flight(bs);
-    /*
-     * TODO: Rather than require aligned offsets, we could instead
-     * round to the driver's request_alignment here, then touch up
-     * count afterwards back to the caller's expectations.
-     */
-    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
-    ret = bs->drv->bdrv_co_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
-                                            bytes >> BDRV_SECTOR_BITS, &count,
-                                            &local_file);
-    if (ret < 0) {
-        *pnum = 0;
-        goto out;
+
+    /* Round out to request_alignment boundaries */
+    align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
+    aligned_offset = QEMU_ALIGN_DOWN(offset, align);
+    aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
+
+    {
+        int count; /* sectors */
+
+        assert(QEMU_IS_ALIGNED(aligned_offset | aligned_bytes,
+                               BDRV_SECTOR_SIZE));
+        ret = bs->drv->bdrv_co_get_block_status(
+            bs, aligned_offset >> BDRV_SECTOR_BITS,
+            aligned_bytes >> BDRV_SECTOR_BITS, &count, &local_file);
+        if (ret < 0) {
+            *pnum = 0;
+            goto out;
+        }
+        *pnum = count * BDRV_SECTOR_SIZE;
+    }
+
+    /* Clamp pnum and ret to original request */
+    assert(QEMU_IS_ALIGNED(*pnum, align));
+    *pnum -= offset - aligned_offset;
+    if (aligned_offset >> BDRV_SECTOR_BITS != offset >> BDRV_SECTOR_BITS &&
+        ret & BDRV_BLOCK_OFFSET_VALID) {
+        ret += QEMU_ALIGN_DOWN(offset - aligned_offset, BDRV_SECTOR_SIZE);
+    }
+    if (*pnum > bytes) {
+        *pnum = bytes;
     }
-    *pnum = count * BDRV_SECTOR_SIZE;

     if (ret & BDRV_BLOCK_RAW) {
         assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
         ret = bdrv_co_block_status(local_file, allocation,
-                                   ret & BDRV_BLOCK_OFFSET_MASK,
+                                   (ret & BDRV_BLOCK_OFFSET_MASK) |
+                                   (offset & ~BDRV_BLOCK_OFFSET_MASK),
                                    *pnum, pnum, &local_file);
-        assert(QEMU_IS_ALIGNED(*pnum, BDRV_SECTOR_SIZE));
         goto out;
     }

@@ -1832,7 +1850,8 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
         int64_t file_pnum;

         ret2 = bdrv_co_block_status(local_file, true,
-                                    ret & BDRV_BLOCK_OFFSET_MASK,
+                                    (ret & BDRV_BLOCK_OFFSET_MASK) |
+                                    (offset & ~BDRV_BLOCK_OFFSET_MASK),
                                     *pnum, &file_pnum, NULL);
         if (ret2 >= 0) {
             /* Ignore errors.  This is just providing extra information, it
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v2 15/15] qemu-io: Relax 'alloc' now that block-status doesn't assert
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (13 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 14/15] block: Align block status requests Eric Blake
@ 2017-07-03 22:14 ` Eric Blake
  2017-07-04  9:50   ` Fam Zheng
  2017-07-03 22:20 ` [Qemu-devel] [RFC PATCH v2 16/15] block: Add .bdrv_co_block_status() callback Eric Blake
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, qemu-block, el13635, Max Reitz

Previously, the alloc command required that input parameters be
sector-aligned and clamped to 32 bits, because the underlying
bdrv_is_allocated used a 32-bit parameter and asserted aligned
inputs.  But now that we have fixed block status to report a
64-bit bytes value, and to properly round requests on behalf of
guests, we can pass any values, and can use qemu-io to add
coverage that our rounding is correct regardless of the guest
alignment constraints.

Update iotest 177 to intentionally probe block status at
unaligned boundaries, which also required tweaking the image
prep to leave an unallocated portion to the image under test.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 qemu-io-cmds.c             | 13 -------------
 tests/qemu-iotests/177     | 11 +++++++++--
 tests/qemu-iotests/177.out | 18 +++++++++++++-----
 3 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index e529b8f..84c2e47 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1769,10 +1769,6 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
     if (offset < 0) {
         print_cvtnum_err(offset, argv[1]);
         return 0;
-    } else if (!QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)) {
-        printf("%" PRId64 " is not a sector-aligned value for 'offset'\n",
-               offset);
-        return 0;
     }

     if (argc == 3) {
@@ -1780,19 +1776,10 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
         if (count < 0) {
             print_cvtnum_err(count, argv[2]);
             return 0;
-        } else if (count > INT_MAX * BDRV_SECTOR_SIZE) {
-            printf("length argument cannot exceed %llu, given %s\n",
-                   INT_MAX * BDRV_SECTOR_SIZE, argv[2]);
-            return 0;
         }
     } else {
         count = BDRV_SECTOR_SIZE;
     }
-    if (!QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE)) {
-        printf("%" PRId64 " is not a sector-aligned value for 'count'\n",
-               count);
-        return 0;
-    }

     remaining = count;
     sum_alloc = 0;
diff --git a/tests/qemu-iotests/177 b/tests/qemu-iotests/177
index f8ed8fb..36e3b87 100755
--- a/tests/qemu-iotests/177
+++ b/tests/qemu-iotests/177
@@ -51,7 +51,7 @@ echo "== setting up files =="
 TEST_IMG="$TEST_IMG.base" _make_test_img $size
 $QEMU_IO -c "write -P 11 0 $size" "$TEST_IMG.base" | _filter_qemu_io
 _make_test_img -b "$TEST_IMG.base"
-$QEMU_IO -c "write -P 22 0 $size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -P 22 0 110M" "$TEST_IMG" | _filter_qemu_io

 # Limited to 64k max-transfer
 echo
@@ -82,6 +82,12 @@ $QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
          -c "discard 80000001 30M" | _filter_qemu_io

 echo
+echo "== block status smaller than alignment =="
+limits=align=4k
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+	 -c "alloc 1 1" -c "alloc 0x6dffff0 1000" -c map | _filter_qemu_io
+
+echo
 echo "== verify image content =="

 function verify_io()
@@ -103,7 +109,8 @@ function verify_io()
     echo read -P 0 32M 32M
     echo read -P 22 64M 13M
     echo read -P $discarded 77M 29M
-    echo read -P 22 106M 22M
+    echo read -P 22 106M 4M
+    echo read -P 11 110M 18M
 }

 verify_io | $QEMU_IO -r "$TEST_IMG" | _filter_qemu_io
diff --git a/tests/qemu-iotests/177.out b/tests/qemu-iotests/177.out
index 43a7778..8c05f69 100644
--- a/tests/qemu-iotests/177.out
+++ b/tests/qemu-iotests/177.out
@@ -5,8 +5,8 @@ Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
 wrote 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/t.IMGFMT.base
-wrote 134217728/134217728 bytes at offset 0
-128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 115343360/115343360 bytes at offset 0
+110 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

 == constrained alignment and max-transfer ==
 wrote 131072/131072 bytes at offset 1000
@@ -26,6 +26,12 @@ wrote 33554432/33554432 bytes at offset 33554432
 discard 31457280/31457280 bytes at offset 80000001
 30 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

+== block status smaller than alignment ==
+1/1 bytes allocated at offset 1 bytes
+16/1000 bytes allocated at offset 110 MiB
+110 MiB (0x6e00000) bytes     allocated at offset 0 bytes (0x0)
+18 MiB (0x1200000) bytes not allocated at offset 110 MiB (0x6e00000)
+
 == verify image content ==
 read 1000/1000 bytes at offset 0
 1000 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -43,12 +49,14 @@ read 13631488/13631488 bytes at offset 67108864
 13 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 30408704/30408704 bytes at offset 80740352
 29 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 23068672/23068672 bytes at offset 111149056
-22 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4194304/4194304 bytes at offset 111149056
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 18874368/18874368 bytes at offset 115343360
+18 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 Offset          Length          File
 0               0x800000        TEST_DIR/t.IMGFMT
 0x900000        0x2400000       TEST_DIR/t.IMGFMT
 0x3c00000       0x1100000       TEST_DIR/t.IMGFMT
-0x6a00000       0x1600000       TEST_DIR/t.IMGFMT
+0x6a00000       0x400000        TEST_DIR/t.IMGFMT
 No errors were found on the image.
 *** done
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC PATCH v2 16/15] block: Add .bdrv_co_block_status() callback
  2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
                   ` (14 preceding siblings ...)
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 15/15] qemu-io: Relax 'alloc' now that block-status doesn't assert Eric Blake
@ 2017-07-03 22:20 ` Eric Blake
  2017-07-04 11:30   ` Fam Zheng
  15 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-03 22:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi, Fam Zheng, Max Reitz

We are gradually moving away from sector-based interfaces, towards
byte-based. Now that the block layer exposes byte-based allocation,
it's time to tackle the drivers.  Add a new callback that operates
on as small as byte boundaries. Subsequent patches will then update
individual drivers, then finally remove .bdrv_co_get_block_status().
The old code now uses a goto in order to minimize churn at that later
removal.

The new code also passes through the 'allocation' hint, which will
allow subsequent patches to further optimize callers that only care
about how much of the image is allocated, rather than which offsets
the allocation actually maps to.

Note that most drivers give sector-aligned answers, except at
end-of-file, even when request_alignment is smaller than a sector.
However, bdrv_getlength() is sector-aligned (even though it gives a
byte answer), often by exceeding the actual file size.  If we were to
give back strict results, at least file-posix.c would report a
transition from DATA to HOLE at the end of a file even in the middle
of a sector, which can throw off callers; so we intentionally lie and
state that any partial sector at the end of a file has the same
status for the entire sector.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: improve alignment handling, add additional 'allocation' argument
for future optimization potential, ensure all iotests still pass

Sending as an RFC as part of my third series; this patch is technically
the start of my fourth series, but given the rebase churn I've had
elsewhere, it can't hurt to get the interface looked at in case it
needs tweaking
---
 include/block/block_int.h |  9 ++++++++-
 block/io.c                | 27 ++++++++++++++++++++++++---
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5f6ba5d..45ff534 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -172,13 +172,20 @@ struct BlockDriver {
      * bdrv_is_allocated[_above].  The driver should answer only
      * according to the current layer, and should not set
      * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
-     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
+     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  As a hint,
+     * the flag allocation is true if the caller cares more about
+     * learning how much of the image is allocated, without regards to
+     * a breakdown by offset (a driver may either ignore the hint, or
+     * avoid _OFFSET_VALID to provide a larger *pnum).  The block
      * layer guarantees input aligned to request_alignment, as well as
      * non-NULL pnum and file.
      */
     int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum,
         BlockDriverState **file);
+    int64_t coroutine_fn (*bdrv_co_block_status)(BlockDriverState *bd,
+        bool allocation, int64_t offset, int64_t bytes, int64_t *pnum,
+        BlockDriverState **file);

     /*
      * Invalidate any cached meta-data.
diff --git a/block/io.c b/block/io.c
index 5ed1ac7..00cdac1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1771,7 +1771,7 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
         bytes = n;
     }

-    if (!bs->drv->bdrv_co_get_block_status) {
+    if (!bs->drv->bdrv_co_get_block_status && !bs->drv->bdrv_co_block_status) {
         *pnum = bytes;
         ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
         if (offset + bytes == total_size) {
@@ -1791,11 +1791,14 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     bdrv_inc_in_flight(bs);

     /* Round out to request_alignment boundaries */
-    align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
+    align = bs->bl.request_alignment;
+    if (bs->drv->bdrv_co_get_block_status && align < BDRV_SECTOR_SIZE) {
+        align = BDRV_SECTOR_SIZE;
+    }
     aligned_offset = QEMU_ALIGN_DOWN(offset, align);
     aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;

-    {
+    if (bs->drv->bdrv_co_get_block_status) {
         int count; /* sectors */

         assert(QEMU_IS_ALIGNED(aligned_offset | aligned_bytes,
@@ -1808,8 +1811,26 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
             goto out;
         }
         *pnum = count * BDRV_SECTOR_SIZE;
+        goto refine;
     }

+    ret = bs->drv->bdrv_co_block_status(bs, false, aligned_offset,
+                                        aligned_bytes, pnum, &local_file);
+    if (ret < 0) {
+        *pnum = 0;
+        goto out;
+    }
+
+    /*
+     * total_size is always sector-aligned, by sometimes exceeding actual
+     * file size. Expand pnum if it lands mid-sector due to end-of-file.
+     */
+    if (QEMU_ALIGN_UP(*pnum + aligned_offset,
+                      BDRV_SECTOR_SIZE) == total_size) {
+        *pnum = total_size - aligned_offset;
+    }
+
+ refine:
     /* Clamp pnum and ret to original request */
     assert(QEMU_IS_ALIGNED(*pnum, align));
     *pnum -= offset - aligned_offset;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated() Eric Blake
@ 2017-07-04  7:06   ` Fam Zheng
  2017-07-05 11:56     ` Eric Blake
  2017-07-06  2:35   ` Eric Blake
  1 sibling, 1 reply; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  7:06 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, el13635, Max Reitz,
	Stefan Hajnoczi, jsnow

On Mon, 07/03 17:14, Eric Blake wrote:
> @@ -1717,6 +1718,10 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
>   * Drivers not implementing the functionality are assumed to not support
>   * backing files, hence all their sectors are reported as allocated.
>   *
> + * If 'allocation' is true, the caller only cares about allocation
> + * status; this is a hint that a larger 'pnum' result is more
> + * important than including BDRV_BLOCK_OFFSET_VALID in the return.
> + *

This is slightly unintuitive. I would guess "allocation == false" means "I don't
care about the allocation status" but it actually is "I don't care about the
consecutiveness". The "only" semantics is missing in the parameter name.

Maybe rename it to "consecutive" and invert the logic?  I.e. "consecutive ==
true" means BDRV_BLOCK_OFFSET_VALID is wanted.

Sorry for bikeshedding.

Fam

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/15] block: add default implementations for bdrv_co_get_block_status()
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 01/15] block: add default implementations for bdrv_co_get_block_status() Eric Blake
@ 2017-07-04  7:33   ` Fam Zheng
  2017-07-05 22:00   ` Eric Blake
  1 sibling, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  7:33 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, el13635, Jeff Cody, Max Reitz,
	Stefan Hajnoczi, jsnow

On Mon, 07/03 17:14, Eric Blake wrote:
> From: Manos Pitsidianakis <el13635@mail.ntua.gr>
> 
> bdrv_co_get_block_status_from_file() and
> bdrv_co_get_block_status_from_backing() set *file to bs->file and
> bs->backing respectively, so that bdrv_co_get_block_status() can recurse
> to them. Future block drivers won't have to duplicate code to implement
> this.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
> Message-Id: <20170629184320.7151-4-el13635@mail.ntua.gr>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 02/15] block: Allow NULL file for bdrv_get_block_status()
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 02/15] block: Allow NULL file for bdrv_get_block_status() Eric Blake
@ 2017-07-04  7:33   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  7:33 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, el13635, Jeff Cody, Max Reitz,
	Stefan Hajnoczi, jsnow

On Mon, 07/03 17:14, Eric Blake wrote:
> Not all callers care about which BDS owns the mapping for a given
> range of the file.  This patch merely simplifies the callers by
> consolidating the logic in the common call point, while guaranteeing
> a non-NULL file to all the driver callbacks, for no semantic change.
> The only caller that does not care about pnum is bdrv_is_allocated,
> as invoked by vvfat; we can likewise add assertions that the rest
> of the stack does not have to worry about a NULL pnum.
> 
> Furthermore, this will also set the stage for a future cleanup: when
> a caller does not care about which BDS owns an offset, it would be
> nice to allow the driver to optimize things to not have to return
> BDRV_BLOCK_OFFSET_VALID in the first place.  In the case of fragmented
> allocation (for example, it's fairly easy to create a qcow2 image
> where consecutive guest addresses are not at consecutive host
> addresses), the current contract requires bdrv_get_block_status()
> to clamp *pnum to the limit where host addresses are no longer
> consecutive, but allowing a NULL file means that *pnum could be
> set to the full length of known-allocated data.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/15] block: Make bdrv_round_to_clusters() signature more useful
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 04/15] block: Make bdrv_round_to_clusters() signature more useful Eric Blake
@ 2017-07-04  7:34   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  7:34 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, el13635, Jeff Cody, Max Reitz,
	Stefan Hajnoczi, jsnow

On Mon, 07/03 17:14, Eric Blake wrote:
> In the process of converting sector-based interfaces to bytes,
> I'm finding it easier to represent a byte count as a 64-bit
> integer at the block layer (even if we are internally capped
> by SIZE_MAX or even INT_MAX for individual transactions, it's
> still nicer to not have to worry about truncation/overflow
> issues on as many variables).  Update the signature of
> bdrv_round_to_clusters() to uniformly use int64_t, matching
> the signature already chosen for bdrv_is_allocated and the
> fact that off_t is also a signed type, then adjust clients
> according to the required fallout.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/15] qcow2: Switch is_zero_sectors() to byte-based
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 05/15] qcow2: Switch is_zero_sectors() to byte-based Eric Blake
@ 2017-07-04  7:34   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  7:34 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kwolf, Max Reitz, jsnow, qemu-block, el13635

On Mon, 07/03 17:14, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change), and rename it to is_zero() in the
> process.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/15] block: Switch bdrv_make_zero() to byte-based
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 06/15] block: Switch bdrv_make_zero() " Eric Blake
@ 2017-07-04  7:35   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  7:35 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, el13635, Max Reitz,
	Stefan Hajnoczi, jsnow

On Mon, 07/03 17:14, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Change the internal
> loop iteration of zeroing a device to track by bytes instead of
> sectors (although we are still guaranteed that we iterate by steps
> that are sector-aligned).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/15] qemu-img: Switch get_block_status() to byte-based
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 07/15] qemu-img: Switch get_block_status() " Eric Blake
@ 2017-07-04  7:40   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  7:40 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kwolf, Max Reitz, jsnow, qemu-block, el13635

On Mon, 07/03 17:14, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting
> an internal function (no semantic change), and simplifying its
> caller accordingly.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/15] block: Convert bdrv_get_block_status() to bytes
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 08/15] block: Convert bdrv_get_block_status() to bytes Eric Blake
@ 2017-07-04  8:40   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  8:40 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, el13635, Max Reitz,
	Stefan Hajnoczi, jsnow

On Mon, 07/03 17:14, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the name of the function from bdrv_get_block_status() to
> bdrv_block_status() ensures that the compiler enforces that all
> callers are updated.  For now, the io.c layer still assert()s that
> all callers are sector-aligned, but that can be relaxed when a later
> patch implements byte-based block status in the drivers.
> 
> Note that we have an inherent limitation in the BDRV_BLOCK_* return
> values: BDRV_BLOCK_OFFSET_VALID can only return the start of a
> sector, even if we later relax the interface to query for the status
> starting at an intermediate byte; document the obvious interpretation
> that valid offsets are always sector-relative.
> 
> Therefore, for the most part this patch is just the addition of scaling
> at the callers followed by inverse scaling at bdrv_block_status().  But
> some code, particularly bdrv_is_allocated(), gets a lot simpler because
> it no longer has to mess with sectors.
> 
> For ease of review, bdrv_get_block_status_above() will be tackled
> separately.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/15] block: Switch bdrv_co_get_block_status() to byte-based
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 09/15] block: Switch bdrv_co_get_block_status() to byte-based Eric Blake
@ 2017-07-04  8:59   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  8:59 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, el13635, Max Reitz,
	Stefan Hajnoczi, jsnow

On Mon, 07/03 17:14, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change), and as with its public counterpart,
> rename to bdrv_co_block_status() to make the compiler enforce that
> we catch all uses.  For now, we assert that callers still pass
> aligned data, but ultimately, this will be the function where we
> hand off to a byte-based driver callback, and will eventually need
> to add logic to ensure we round calls according to the driver's
> request_alignment then touch up the result handed back to the
> caller, to start permitting a caller to pass unaligned offsets.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/15] block: Switch BdrvCoGetBlockStatusData to byte-based
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 10/15] block: Switch BdrvCoGetBlockStatusData " Eric Blake
@ 2017-07-04  9:00   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  9:00 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, el13635, Max Reitz,
	Stefan Hajnoczi, jsnow

On Mon, 07/03 17:14, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> type (no semantic change), and rename it to match the corresponding
> public function rename.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/15] block: Switch bdrv_common_block_status_above() to byte-based
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 11/15] block: Switch bdrv_common_block_status_above() " Eric Blake
@ 2017-07-04  9:02   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  9:02 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, el13635, Max Reitz,
	Stefan Hajnoczi, jsnow

On Mon, 07/03 17:14, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/15] block: Switch bdrv_co_get_block_status_above() to byte-based
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 12/15] block: Switch bdrv_co_get_block_status_above() " Eric Blake
@ 2017-07-04  9:28   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  9:28 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, el13635, Max Reitz,
	Stefan Hajnoczi, jsnow

On Mon, 07/03 17:14, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> type (no semantic change), and rename it to match the corresponding
> public function rename.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 13/15] block: Convert bdrv_get_block_status_above() to bytes
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 13/15] block: Convert bdrv_get_block_status_above() to bytes Eric Blake
@ 2017-07-04  9:32   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  9:32 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, el13635, Jeff Cody, Max Reitz,
	Stefan Hajnoczi, jsnow

On Mon, 07/03 17:14, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the name of the function from bdrv_get_block_status_above()
> to bdrv_block_status_above() ensures that the compiler enforces that
> all callers are updated.  For now, the io.c layer still assert()s
> that all callers are sector-aligned, but that can be relaxed when a
> later patch implements byte-based block status in the drivers.
> 
> For the most part this patch is just the addition of scaling at the
> callers followed by inverse scaling at bdrv_block_status().  But some
> code, particularly bdrv_block_status(), gets a lot simpler because
> it no longer has to mess with sectors.
> 
> For ease of review, bdrv_get_block_status() was tackled separately.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/15] block: Align block status requests
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 14/15] block: Align block status requests Eric Blake
@ 2017-07-04  9:44   ` Fam Zheng
  2017-07-05 12:01     ` Eric Blake
  0 siblings, 1 reply; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  9:44 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, jsnow, qemu-block, el13635, Max Reitz,
	Stefan Hajnoczi

On Mon, 07/03 17:14, Eric Blake wrote:
> Any device that has request_alignment greater than 512 should be
> unable to report status at a finer granularity; it may also be
> simpler for such devices to be guaranteed that the block layer
> has rounded things out to the granularity boundary (the way the
> block layer already rounds all other I/O out).  Besides, getting
> the code correct for super-sector alignment also benefits us
> for the fact that our public interface now has byte granularity,
> even though none of our drivers have byte-level callbacks.
> 
> Add an assertion in blkdebug that proves that the block layer
> never requests status of unaligned sections, similar to what it
> does on other requests (while still keeping the generic helper
> in place for when future patches add a throttle driver).  Note
> note that iotest 177 already covers this (it would fail if you

Drop one "note"?

> use just the blkdebug.c hunk without the io.c changes).
> Meanwhile, we can drop assertions in callers that no longer have
> to pass in sector-aligned addresses.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: new patch
> ---
>  include/block/block_int.h |  3 ++-
>  block/blkdebug.c          | 13 +++++++++++-
>  block/io.c                | 53 ++++++++++++++++++++++++++++++++---------------
>  3 files changed, 50 insertions(+), 19 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index ffa22c7..5f6ba5d 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -173,7 +173,8 @@ struct BlockDriver {
>       * according to the current layer, and should not set
>       * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
>       * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
> -     * layer guarantees non-NULL pnum and file.
> +     * layer guarantees input aligned to request_alignment, as well as
> +     * non-NULL pnum and file.
>       */
>      int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
>          int64_t sector_num, int nb_sectors, int *pnum,
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index f1539db..67736b4 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -641,6 +641,17 @@ static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
>      return bdrv_co_pdiscard(bs->file->bs, offset, bytes);
>  }
> 
> +static int64_t coroutine_fn blkdebug_co_get_block_status(
> +    BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
> +    BlockDriverState **file)
> +{
> +    assert(QEMU_IS_ALIGNED(sector_num | nb_sectors,
> +                           DIV_ROUND_UP(bs->bl.request_alignment,
> +                                        BDRV_SECTOR_SIZE)));
> +    return bdrv_co_get_block_status_from_file(bs, sector_num, nb_sectors,
> +                                              pnum, file);
> +}
> +
>  static void blkdebug_close(BlockDriverState *bs)
>  {
>      BDRVBlkdebugState *s = bs->opaque;
> @@ -915,7 +926,7 @@ static BlockDriver bdrv_blkdebug = {
>      .bdrv_co_flush_to_disk  = blkdebug_co_flush,
>      .bdrv_co_pwrite_zeroes  = blkdebug_co_pwrite_zeroes,
>      .bdrv_co_pdiscard       = blkdebug_co_pdiscard,
> -    .bdrv_co_get_block_status = bdrv_co_get_block_status_from_file,
> +    .bdrv_co_get_block_status = blkdebug_co_get_block_status,
> 
>      .bdrv_debug_event           = blkdebug_debug_event,
>      .bdrv_debug_breakpoint      = blkdebug_debug_breakpoint,
> diff --git a/block/io.c b/block/io.c
> index 91d3e99..5ed1ac7 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1746,7 +1746,8 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>      int64_t n; /* bytes */
>      int64_t ret, ret2;
>      BlockDriverState *local_file = NULL;
> -    int count; /* sectors */
> +    int64_t aligned_offset, aligned_bytes;
> +    uint32_t align;
> 
>      assert(pnum);
>      total_size = bdrv_getlength(bs);
> @@ -1788,27 +1789,44 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>      }
> 
>      bdrv_inc_in_flight(bs);
> -    /*
> -     * TODO: Rather than require aligned offsets, we could instead
> -     * round to the driver's request_alignment here, then touch up
> -     * count afterwards back to the caller's expectations.
> -     */
> -    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
> -    ret = bs->drv->bdrv_co_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
> -                                            bytes >> BDRV_SECTOR_BITS, &count,
> -                                            &local_file);
> -    if (ret < 0) {
> -        *pnum = 0;
> -        goto out;
> +
> +    /* Round out to request_alignment boundaries */
> +    align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
> +    aligned_offset = QEMU_ALIGN_DOWN(offset, align);
> +    aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
> +
> +    {

Not sure why using a {} block here?

> +        int count; /* sectors */
> +
> +        assert(QEMU_IS_ALIGNED(aligned_offset | aligned_bytes,
> +                               BDRV_SECTOR_SIZE));
> +        ret = bs->drv->bdrv_co_get_block_status(
> +            bs, aligned_offset >> BDRV_SECTOR_BITS,
> +            aligned_bytes >> BDRV_SECTOR_BITS, &count, &local_file);
> +        if (ret < 0) {
> +            *pnum = 0;
> +            goto out;
> +        }
> +        *pnum = count * BDRV_SECTOR_SIZE;
> +    }
> +
> +    /* Clamp pnum and ret to original request */
> +    assert(QEMU_IS_ALIGNED(*pnum, align));
> +    *pnum -= offset - aligned_offset;
> +    if (aligned_offset >> BDRV_SECTOR_BITS != offset >> BDRV_SECTOR_BITS &&
> +        ret & BDRV_BLOCK_OFFSET_VALID) {
> +        ret += QEMU_ALIGN_DOWN(offset - aligned_offset, BDRV_SECTOR_SIZE);
> +    }
> +    if (*pnum > bytes) {
> +        *pnum = bytes;
>      }
> -    *pnum = count * BDRV_SECTOR_SIZE;
> 
>      if (ret & BDRV_BLOCK_RAW) {
>          assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
>          ret = bdrv_co_block_status(local_file, allocation,
> -                                   ret & BDRV_BLOCK_OFFSET_MASK,
> +                                   (ret & BDRV_BLOCK_OFFSET_MASK) |
> +                                   (offset & ~BDRV_BLOCK_OFFSET_MASK),
>                                     *pnum, pnum, &local_file);
> -        assert(QEMU_IS_ALIGNED(*pnum, BDRV_SECTOR_SIZE));
>          goto out;
>      }
> 
> @@ -1832,7 +1850,8 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>          int64_t file_pnum;
> 
>          ret2 = bdrv_co_block_status(local_file, true,
> -                                    ret & BDRV_BLOCK_OFFSET_MASK,
> +                                    (ret & BDRV_BLOCK_OFFSET_MASK) |
> +                                    (offset & ~BDRV_BLOCK_OFFSET_MASK),
>                                      *pnum, &file_pnum, NULL);
>          if (ret2 >= 0) {
>              /* Ignore errors.  This is just providing extra information, it
> -- 
> 2.9.4
> 

Fam

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 15/15] qemu-io: Relax 'alloc' now that block-status doesn't assert
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 15/15] qemu-io: Relax 'alloc' now that block-status doesn't assert Eric Blake
@ 2017-07-04  9:50   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04  9:50 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kwolf, Max Reitz, jsnow, qemu-block, el13635

On Mon, 07/03 17:14, Eric Blake wrote:
> Previously, the alloc command required that input parameters be
> sector-aligned and clamped to 32 bits, because the underlying
> bdrv_is_allocated used a 32-bit parameter and asserted aligned
> inputs.  But now that we have fixed block status to report a
> 64-bit bytes value, and to properly round requests on behalf of
> guests, we can pass any values, and can use qemu-io to add
> coverage that our rounding is correct regardless of the guest
> alignment constraints.
> 
> Update iotest 177 to intentionally probe block status at
> unaligned boundaries, which also required tweaking the image
> prep to leave an unallocated portion to the image under test.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 16/15] block: Add .bdrv_co_block_status() callback
  2017-07-03 22:20 ` [Qemu-devel] [RFC PATCH v2 16/15] block: Add .bdrv_co_block_status() callback Eric Blake
@ 2017-07-04 11:30   ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-04 11:30 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, jsnow, qemu-block, el13635, Stefan Hajnoczi,
	Max Reitz

On Mon, 07/03 17:20, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based. Now that the block layer exposes byte-based allocation,
> it's time to tackle the drivers.  Add a new callback that operates
> on as small as byte boundaries. Subsequent patches will then update
> individual drivers, then finally remove .bdrv_co_get_block_status().
> The old code now uses a goto in order to minimize churn at that later
> removal.
> 
> The new code also passes through the 'allocation' hint, which will
> allow subsequent patches to further optimize callers that only care
> about how much of the image is allocated, rather than which offsets
> the allocation actually maps to.
> 
> Note that most drivers give sector-aligned answers, except at
> end-of-file, even when request_alignment is smaller than a sector.
> However, bdrv_getlength() is sector-aligned (even though it gives a
> byte answer), often by exceeding the actual file size.  If we were to
> give back strict results, at least file-posix.c would report a
> transition from DATA to HOLE at the end of a file even in the middle
> of a sector, which can throw off callers; so we intentionally lie and
> state that any partial sector at the end of a file has the same
> status for the entire sector.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: improve alignment handling, add additional 'allocation' argument
> for future optimization potential, ensure all iotests still pass
> 
> Sending as an RFC as part of my third series; this patch is technically
> the start of my fourth series, but given the rebase churn I've had
> elsewhere, it can't hurt to get the interface looked at in case it
> needs tweaking
> ---
>  include/block/block_int.h |  9 ++++++++-
>  block/io.c                | 27 ++++++++++++++++++++++++---
>  2 files changed, 32 insertions(+), 4 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 5f6ba5d..45ff534 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -172,13 +172,20 @@ struct BlockDriver {
>       * bdrv_is_allocated[_above].  The driver should answer only
>       * according to the current layer, and should not set
>       * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
> -     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
> +     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  As a hint,
> +     * the flag allocation is true if the caller cares more about
> +     * learning how much of the image is allocated, without regards to
> +     * a breakdown by offset (a driver may either ignore the hint, or
> +     * avoid _OFFSET_VALID to provide a larger *pnum).  The block
>       * layer guarantees input aligned to request_alignment, as well as
>       * non-NULL pnum and file.
>       */
>      int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
>          int64_t sector_num, int nb_sectors, int *pnum,
>          BlockDriverState **file);
> +    int64_t coroutine_fn (*bdrv_co_block_status)(BlockDriverState *bd,
> +        bool allocation, int64_t offset, int64_t bytes, int64_t *pnum,
> +        BlockDriverState **file);

Apart from my earlier question regarding the naming of "allocation", looks fine.

I haven't reviewed the implementation though.

Fam

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()
  2017-07-04  7:06   ` Fam Zheng
@ 2017-07-05 11:56     ` Eric Blake
  2017-07-05 12:07       ` Fam Zheng
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-05 11:56 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, kwolf, qemu-block, el13635, Max Reitz,
	Stefan Hajnoczi, jsnow

[-- Attachment #1: Type: text/plain, Size: 1957 bytes --]

On 07/04/2017 02:06 AM, Fam Zheng wrote:
> On Mon, 07/03 17:14, Eric Blake wrote:
>> @@ -1717,6 +1718,10 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
>>   * Drivers not implementing the functionality are assumed to not support
>>   * backing files, hence all their sectors are reported as allocated.
>>   *
>> + * If 'allocation' is true, the caller only cares about allocation
>> + * status; this is a hint that a larger 'pnum' result is more
>> + * important than including BDRV_BLOCK_OFFSET_VALID in the return.
>> + *
> 
> This is slightly unintuitive. I would guess "allocation == false" means "I don't
> care about the allocation status" but it actually is "I don't care about the
> consecutiveness". The "only" semantics is missing in the parameter name.
> 
> Maybe rename it to "consecutive" and invert the logic?  I.e. "consecutive ==
> true" means BDRV_BLOCK_OFFSET_VALID is wanted.

Reasonable idea; other [shorter] names I've been toying with:
strict
mapping
precise

any of which, if true (set true by bdrv_get_block_status), means that I
care more about BDRV_BLOCK_OFFSET_VALID and validity for learning host
offsets, if false  it means I'm okay getting a larger *pnum even if it
extends over disjoint host offsets; or:

fast

which if true (set true by bdrv_is_allocated) means I want a larger
*pnum even if I have to sacrifice accuracy by omitting
BDRV_BLOCK_OFFSET_VALID, because I'm trying to reduce effort spent.

> 
> Sorry for bikeshedding.

Not a problem, I also had some double-takes in writing my own code
trying to remember which way I wanted the 'allocation' boolean to be
set, so coming up with a more intuitive name/default state in order to
help legibility is worth it.  Do any of my above suggestions sound better?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/15] block: Align block status requests
  2017-07-04  9:44   ` Fam Zheng
@ 2017-07-05 12:01     ` Eric Blake
  2017-07-05 12:12       ` Fam Zheng
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-05 12:01 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, kwolf, jsnow, qemu-block, el13635, Max Reitz,
	Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2279 bytes --]

On 07/04/2017 04:44 AM, Fam Zheng wrote:
> On Mon, 07/03 17:14, Eric Blake wrote:
>> Any device that has request_alignment greater than 512 should be
>> unable to report status at a finer granularity; it may also be
>> simpler for such devices to be guaranteed that the block layer
>> has rounded things out to the granularity boundary (the way the
>> block layer already rounds all other I/O out).  Besides, getting
>> the code correct for super-sector alignment also benefits us
>> for the fact that our public interface now has byte granularity,
>> even though none of our drivers have byte-level callbacks.
>>
>> Add an assertion in blkdebug that proves that the block layer
>> never requests status of unaligned sections, similar to what it
>> does on other requests (while still keeping the generic helper
>> in place for when future patches add a throttle driver).  Note
>> note that iotest 177 already covers this (it would fail if you
> 
> Drop one "note"?

Indeed. There are studies on how people read that show that redundant
words that cross a line boundaries are the most easy to mentally overlook.


>> +    /* Round out to request_alignment boundaries */
>> +    align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
>> +    aligned_offset = QEMU_ALIGN_DOWN(offset, align);
>> +    aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
>> +
>> +    {
> 
> Not sure why using a {} block here?
> 
>> +        int count; /* sectors */

Because it makes it easier (less indentation churn) to delete the code
when series 4 later deletes the .bdrv_co_get_block_status sector-based
callback in favor of the newer .bdrv_co_block_status byte-based (patch
16/15 which start series 4 turns it into an if statement, then a later
patch deletes the entire conditional); it is also justifiable because it
creates a tighter scope for 'int count' which is needed only on the
boundary of sector-based operations when the rest of the function is
byte-based (rather than declaring the helper variable up front).  I'll
have to call it out more specifically in the commit message as intentional.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()
  2017-07-05 11:56     ` Eric Blake
@ 2017-07-05 12:07       ` Fam Zheng
  2017-07-05 14:01         ` Eric Blake
  0 siblings, 1 reply; 41+ messages in thread
From: Fam Zheng @ 2017-07-05 12:07 UTC (permalink / raw)
  To: Eric Blake
  Cc: kwolf, qemu-block, el13635, qemu-devel, Max Reitz,
	Stefan Hajnoczi, jsnow

On Wed, 07/05 06:56, Eric Blake wrote:
> On 07/04/2017 02:06 AM, Fam Zheng wrote:
> > On Mon, 07/03 17:14, Eric Blake wrote:
> >> @@ -1717,6 +1718,10 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
> >>   * Drivers not implementing the functionality are assumed to not support
> >>   * backing files, hence all their sectors are reported as allocated.
> >>   *
> >> + * If 'allocation' is true, the caller only cares about allocation
> >> + * status; this is a hint that a larger 'pnum' result is more
> >> + * important than including BDRV_BLOCK_OFFSET_VALID in the return.
> >> + *
> > 
> > This is slightly unintuitive. I would guess "allocation == false" means "I don't
> > care about the allocation status" but it actually is "I don't care about the
> > consecutiveness". The "only" semantics is missing in the parameter name.
> > 
> > Maybe rename it to "consecutive" and invert the logic?  I.e. "consecutive ==
> > true" means BDRV_BLOCK_OFFSET_VALID is wanted.
> 
> Reasonable idea; other [shorter] names I've been toying with:
> strict
> mapping
> precise
> 
> any of which, if true (set true by bdrv_get_block_status), means that I
> care more about BDRV_BLOCK_OFFSET_VALID and validity for learning host
> offsets, if false  it means I'm okay getting a larger *pnum even if it
> extends over disjoint host offsets; or:
> 
> fast
> 
> which if true (set true by bdrv_is_allocated) means I want a larger
> *pnum even if I have to sacrifice accuracy by omitting
> BDRV_BLOCK_OFFSET_VALID, because I'm trying to reduce effort spent.
> 
> > 
> > Sorry for bikeshedding.
> 
> Not a problem, I also had some double-takes in writing my own code
> trying to remember which way I wanted the 'allocation' boolean to be
> set, so coming up with a more intuitive name/default state in order to
> help legibility is worth it.  Do any of my above suggestions sound better?
> 

I'd vote for "mapping" because it has a close connection with offset (as in
BDRV_BLOCK_OFFSET_VALID).

Or simply call it "offset" and if false, never return BDRV_BLOCK_OFFSET_VALID.

Fam

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/15] block: Align block status requests
  2017-07-05 12:01     ` Eric Blake
@ 2017-07-05 12:12       ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-05 12:12 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, jsnow, qemu-block, el13635, Max Reitz,
	Stefan Hajnoczi

On Wed, 07/05 07:01, Eric Blake wrote:
> On 07/04/2017 04:44 AM, Fam Zheng wrote:
> > On Mon, 07/03 17:14, Eric Blake wrote:
> >> Any device that has request_alignment greater than 512 should be
> >> unable to report status at a finer granularity; it may also be
> >> simpler for such devices to be guaranteed that the block layer
> >> has rounded things out to the granularity boundary (the way the
> >> block layer already rounds all other I/O out).  Besides, getting
> >> the code correct for super-sector alignment also benefits us
> >> for the fact that our public interface now has byte granularity,
> >> even though none of our drivers have byte-level callbacks.
> >>
> >> Add an assertion in blkdebug that proves that the block layer
> >> never requests status of unaligned sections, similar to what it
> >> does on other requests (while still keeping the generic helper
> >> in place for when future patches add a throttle driver).  Note
> >> note that iotest 177 already covers this (it would fail if you
> > 
> > Drop one "note"?
> 
> Indeed. There are studies on how people read that show that redundant
> words that cross a line boundaries are the most easy to mentally overlook.
> 
> 
> >> +    /* Round out to request_alignment boundaries */
> >> +    align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
> >> +    aligned_offset = QEMU_ALIGN_DOWN(offset, align);
> >> +    aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
> >> +
> >> +    {
> > 
> > Not sure why using a {} block here?
> > 
> >> +        int count; /* sectors */
> 
> Because it makes it easier (less indentation churn) to delete the code
> when series 4 later deletes the .bdrv_co_get_block_status sector-based
> callback in favor of the newer .bdrv_co_block_status byte-based (patch
> 16/15 which start series 4 turns it into an if statement, then a later
> patch deletes the entire conditional); it is also justifiable because it
> creates a tighter scope for 'int count' which is needed only on the
> boundary of sector-based operations when the rest of the function is
> byte-based (rather than declaring the helper variable up front).  I'll
> have to call it out more specifically in the commit message as intentional.

Thanks for explaining, with the syntax error fixed in the commit message:

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()
  2017-07-05 12:07       ` Fam Zheng
@ 2017-07-05 14:01         ` Eric Blake
  2017-07-05 14:21           ` Fam Zheng
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2017-07-05 14:01 UTC (permalink / raw)
  To: Fam Zheng
  Cc: kwolf, qemu-block, el13635, qemu-devel, Max Reitz,
	Stefan Hajnoczi, jsnow

[-- Attachment #1: Type: text/plain, Size: 1362 bytes --]

On 07/05/2017 07:07 AM, Fam Zheng wrote:
>>>
>>> Sorry for bikeshedding.
>>
>> Not a problem, I also had some double-takes in writing my own code
>> trying to remember which way I wanted the 'allocation' boolean to be
>> set, so coming up with a more intuitive name/default state in order to
>> help legibility is worth it.  Do any of my above suggestions sound better?
>>
> 
> I'd vote for "mapping" because it has a close connection with offset (as in
> BDRV_BLOCK_OFFSET_VALID).
> 
> Or simply call it "offset" and if false, never return BDRV_BLOCK_OFFSET_VALID.

Well, there ARE drivers that WANT to return BDRV_BLOCK_OFFSET_VALID
regardless of the state of the boolean (namely, any driver that also
returns BDRV_BLOCK_RAW, to hint that this is a passthrough and the query
should be repeated at the next BDS in the chain).  So stating that
'offset' is false MUST preclude BDRV_BLOCK_OFFSET_VALID is a bit too
strong, but I can probably come up with appropriate wording that meets
in the middle ground (if 'offset' is true, then make all efforts to
include BDRV_BLOCK_OFFSET_VALID in the return if that is possible; if it
is false, then omitting the flag in order to get a larger pnum is
acceptable)).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()
  2017-07-05 14:01         ` Eric Blake
@ 2017-07-05 14:21           ` Fam Zheng
  0 siblings, 0 replies; 41+ messages in thread
From: Fam Zheng @ 2017-07-05 14:21 UTC (permalink / raw)
  To: Eric Blake
  Cc: kwolf, qemu-block, el13635, qemu-devel, Max Reitz,
	Stefan Hajnoczi, jsnow

On Wed, 07/05 09:01, Eric Blake wrote:
> On 07/05/2017 07:07 AM, Fam Zheng wrote:
> >>>
> >>> Sorry for bikeshedding.
> >>
> >> Not a problem, I also had some double-takes in writing my own code
> >> trying to remember which way I wanted the 'allocation' boolean to be
> >> set, so coming up with a more intuitive name/default state in order to
> >> help legibility is worth it.  Do any of my above suggestions sound better?
> >>
> > 
> > I'd vote for "mapping" because it has a close connection with offset (as in
> > BDRV_BLOCK_OFFSET_VALID).
> > 
> > Or simply call it "offset" and if false, never return BDRV_BLOCK_OFFSET_VALID.
> 
> Well, there ARE drivers that WANT to return BDRV_BLOCK_OFFSET_VALID
> regardless of the state of the boolean (namely, any driver that also
> returns BDRV_BLOCK_RAW, to hint that this is a passthrough and the query
> should be repeated at the next BDS in the chain).  So stating that
> 'offset' is false MUST preclude BDRV_BLOCK_OFFSET_VALID is a bit too
> strong, but I can probably come up with appropriate wording that meets
> in the middle ground (if 'offset' is true, then make all efforts to
> include BDRV_BLOCK_OFFSET_VALID in the return if that is possible; if it
> is false, then omitting the flag in order to get a larger pnum is
> acceptable)).

Yes you are totally right, I thought that too after replied.

Fam

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/15] block: add default implementations for bdrv_co_get_block_status()
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 01/15] block: add default implementations for bdrv_co_get_block_status() Eric Blake
  2017-07-04  7:33   ` Fam Zheng
@ 2017-07-05 22:00   ` Eric Blake
  1 sibling, 0 replies; 41+ messages in thread
From: Eric Blake @ 2017-07-05 22:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, Fam Zheng, qemu-block, el13635, Jeff Cody, Max Reitz,
	Stefan Hajnoczi, jsnow

[-- Attachment #1: Type: text/plain, Size: 1080 bytes --]

On 07/03/2017 05:14 PM, Eric Blake wrote:
> From: Manos Pitsidianakis <el13635@mail.ntua.gr>
> 
> bdrv_co_get_block_status_from_file() and
> bdrv_co_get_block_status_from_backing() set *file to bs->file and
> bs->backing respectively, so that bdrv_co_get_block_status() can recurse
> to them. Future block drivers won't have to duplicate code to implement
> this.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
> Message-Id: <20170629184320.7151-4-el13635@mail.ntua.gr>
> 

Missing my Signed-off-by if we take this one in isolation through my
series, and still awaiting resolution on what will happen to the rest of
his series (v3 had some valid review comments that still need addressing)

> ---
> v2: Including this patch from Manos, since it affects my later patches;
> however, I anticipate that we will get a full v3 series from Manos
> merged first


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()
  2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated() Eric Blake
  2017-07-04  7:06   ` Fam Zheng
@ 2017-07-06  2:35   ` Eric Blake
  1 sibling, 0 replies; 41+ messages in thread
From: Eric Blake @ 2017-07-06  2:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, Fam Zheng, qemu-block, el13635, Max Reitz, Stefan Hajnoczi, jsnow

[-- Attachment #1: Type: text/plain, Size: 2785 bytes --]

On 07/03/2017 05:14 PM, Eric Blake wrote:
> Not all callers care about which BDS owns the mapping for a given
> range of the file.  In particular, bdrv_is_allocated() cares more
> about finding the largest run of allocated data from the guest
> perspective, whether or not that data is consecutive from the
> host perspective.  Therefore, doing subsequent refinements such
> as checking how much of the format-layer allocation also satisfies
> BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best
> case, it just costs extra CPU cycles during a single
> bdrv_is_allocated(), but in the worst case, it results in a smaller
> *pnum, and forces callers to iterate through more status probes when
> visiting the entire file for even more extra CPU cycles.
> 
> This patch only optimizes the block layer.  But subsequent patches
> will tweak the driver callback to be byte-based, and in the process,
> can also pass this hint through to the driver.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

> @@ -1810,12 +1817,13 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>          }
>      }
> 
> -    if (local_file && local_file != bs &&
> +    if (!allocation && local_file && local_file != bs &&
>          (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>          (ret & BDRV_BLOCK_OFFSET_VALID)) {
>          int file_pnum;
> 
> -        ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
> +        ret2 = bdrv_co_get_block_status(local_file, true,
> +                                        ret >> BDRV_SECTOR_BITS,
>                                          *pnum, &file_pnum, NULL);
>          if (ret2 >= 0) {
>              /* Ignore errors.  This is just providing extra information, it

Hmm. My initial thinking here was that if we already have a good primary
status, we want our secondary status (where we are probing bs->file for
whether we can add BDRV_BLOCK_ZERO) to be as fast as possible, so I
hard-coded the answer that favors is_allocated (I have to be careful
describing this, since v3 will switch from 'bool allocated=true' to
'bool mapping=false' to express that same request).  But it turns out
that, at least for file-posix.c (for that matter, for several protocol
drivers), it's a LOT faster to just blindly state that the entire file
is allocated and data than it is to lseek(SEEK_HOLE).  So favoring
allocation status instead of mapping status defeats the purpose, and
this should be s/true/allocation/ (which is always false at this point)
[or conversely s/false/mapping/, which is always true, in v3].

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2017-07-06  2:35 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-03 22:14 [Qemu-devel] [PATCH v2 00/15] make bdrv_get_block_status byte-based Eric Blake
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 01/15] block: add default implementations for bdrv_co_get_block_status() Eric Blake
2017-07-04  7:33   ` Fam Zheng
2017-07-05 22:00   ` Eric Blake
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 02/15] block: Allow NULL file for bdrv_get_block_status() Eric Blake
2017-07-04  7:33   ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated() Eric Blake
2017-07-04  7:06   ` Fam Zheng
2017-07-05 11:56     ` Eric Blake
2017-07-05 12:07       ` Fam Zheng
2017-07-05 14:01         ` Eric Blake
2017-07-05 14:21           ` Fam Zheng
2017-07-06  2:35   ` Eric Blake
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 04/15] block: Make bdrv_round_to_clusters() signature more useful Eric Blake
2017-07-04  7:34   ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 05/15] qcow2: Switch is_zero_sectors() to byte-based Eric Blake
2017-07-04  7:34   ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 06/15] block: Switch bdrv_make_zero() " Eric Blake
2017-07-04  7:35   ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 07/15] qemu-img: Switch get_block_status() " Eric Blake
2017-07-04  7:40   ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 08/15] block: Convert bdrv_get_block_status() to bytes Eric Blake
2017-07-04  8:40   ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 09/15] block: Switch bdrv_co_get_block_status() to byte-based Eric Blake
2017-07-04  8:59   ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 10/15] block: Switch BdrvCoGetBlockStatusData " Eric Blake
2017-07-04  9:00   ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 11/15] block: Switch bdrv_common_block_status_above() " Eric Blake
2017-07-04  9:02   ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 12/15] block: Switch bdrv_co_get_block_status_above() " Eric Blake
2017-07-04  9:28   ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 13/15] block: Convert bdrv_get_block_status_above() to bytes Eric Blake
2017-07-04  9:32   ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 14/15] block: Align block status requests Eric Blake
2017-07-04  9:44   ` Fam Zheng
2017-07-05 12:01     ` Eric Blake
2017-07-05 12:12       ` Fam Zheng
2017-07-03 22:14 ` [Qemu-devel] [PATCH v2 15/15] qemu-io: Relax 'alloc' now that block-status doesn't assert Eric Blake
2017-07-04  9:50   ` Fam Zheng
2017-07-03 22:20 ` [Qemu-devel] [RFC PATCH v2 16/15] block: Add .bdrv_co_block_status() callback Eric Blake
2017-07-04 11:30   ` Fam Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.