* [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based
@ 2017-09-13 16:03 Eric Blake
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 01/23] block: Allow NULL file for bdrv_get_block_status() Eric Blake
                   ` (23 more replies)
  0 siblings, 24 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block

There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.
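
As a rough illustration of the granularity mismatch described above (not code from this series), the sketch below shows how a sector-only interface has to widen an arbitrary byte range to the enclosing 512-byte sectors before it can answer. SECTOR_BITS mirrors QEMU's BDRV_SECTOR_BITS; the ByteRange type and the round_to_sectors() helper are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match QEMU's BDRV_SECTOR_BITS / BDRV_SECTOR_SIZE. */
#define SECTOR_BITS 9
#define SECTOR_SIZE (1LL << SECTOR_BITS)

/* Hypothetical helper type: a byte range given as start and length. */
typedef struct ByteRange {
    int64_t start;
    int64_t len;
} ByteRange;

/*
 * Widen [offset, offset + bytes) to the smallest sector-aligned range
 * that contains it, as a sector-based status query would have to do.
 */
static ByteRange round_to_sectors(int64_t offset, int64_t bytes)
{
    ByteRange r;
    int64_t end;

    assert(offset >= 0 && bytes >= 0);
    r.start = offset & ~(SECTOR_SIZE - 1);
    end = (offset + bytes + SECTOR_SIZE - 1) & ~(SECTOR_SIZE - 1);
    r.len = end - r.start;
    return r;
}
```

A byte-based interface can take the caller's offset and length as given and leave any alignment decision to the layer that actually knows the granularity.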

Now that 2.11 is open, I'm rebasing/reposting the remaining patches.

The overall conversion currently looks like:
part 1: bdrv_is_allocated (merged, commit 51b0a488)
part 2: dirty-bitmap (v7 is posted [1], mostly reviewed)
part 3: bdrv_get_block_status (this series, v3 at [2])
part 4: .bdrv_co_block_status (v2 is posted [3], but needs a rebase)

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-status-v4

Based-on: <20170912203119.24166-1-eblake@redhat.com>
([PATCH v7 00/20] make dirty-bitmap byte-based)

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg03160.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg03853.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg04370.html

Since v3:
- Minor rebasing

001/23:[----] [-C] 'block: Allow NULL file for bdrv_get_block_status()'
002/23:[----] [--] 'block: Add flag to avoid wasted work in bdrv_is_allocated()'
003/23:[----] [-C] 'block: Make bdrv_round_to_clusters() signature more useful'
004/23:[----] [--] 'qcow2: Switch is_zero_sectors() to byte-based'
005/23:[----] [--] 'block: Switch bdrv_make_zero() to byte-based'
006/23:[----] [--] 'qemu-img: Switch get_block_status() to byte-based'
007/23:[----] [--] 'block: Convert bdrv_get_block_status() to bytes'
008/23:[----] [--] 'block: Switch bdrv_co_get_block_status() to byte-based'
009/23:[----] [--] 'block: Switch BdrvCoGetBlockStatusData to byte-based'
010/23:[----] [--] 'block: Switch bdrv_common_block_status_above() to byte-based'
011/23:[----] [--] 'block: Switch bdrv_co_get_block_status_above() to byte-based'
012/23:[0002] [FC] 'block: Convert bdrv_get_block_status_above() to bytes'
013/23:[----] [--] 'qemu-img: Simplify logic in img_compare()'
014/23:[----] [--] 'qemu-img: Speed up compare on pre-allocated larger file'
015/23:[----] [--] 'qemu-img: Add find_nonzero()'
016/23:[----] [--] 'qemu-img: Drop redundant error message in compare'
017/23:[----] [--] 'qemu-img: Change check_empty_sectors() to byte-based'
018/23:[----] [--] 'qemu-img: Change compare_sectors() to be byte-based'
019/23:[----] [--] 'qemu-img: Change img_rebase() to be byte-based'
020/23:[----] [--] 'qemu-img: Change img_compare() to be byte-based'
021/23:[----] [--] 'block: Align block status requests'
022/23:[----] [--] 'block: Relax bdrv_aligned_preadv() assertion'
023/23:[----] [--] 'qemu-io: Relax 'alloc' now that block-status doesn't assert'
Eric Blake (23):
  block: Allow NULL file for bdrv_get_block_status()
  block: Add flag to avoid wasted work in bdrv_is_allocated()
  block: Make bdrv_round_to_clusters() signature more useful
  qcow2: Switch is_zero_sectors() to byte-based
  block: Switch bdrv_make_zero() to byte-based
  qemu-img: Switch get_block_status() to byte-based
  block: Convert bdrv_get_block_status() to bytes
  block: Switch bdrv_co_get_block_status() to byte-based
  block: Switch BdrvCoGetBlockStatusData to byte-based
  block: Switch bdrv_common_block_status_above() to byte-based
  block: Switch bdrv_co_get_block_status_above() to byte-based
  block: Convert bdrv_get_block_status_above() to bytes
  qemu-img: Simplify logic in img_compare()
  qemu-img: Speed up compare on pre-allocated larger file
  qemu-img: Add find_nonzero()
  qemu-img: Drop redundant error message in compare
  qemu-img: Change check_empty_sectors() to byte-based
  qemu-img: Change compare_sectors() to be byte-based
  qemu-img: Change img_rebase() to be byte-based
  qemu-img: Change img_compare() to be byte-based
  block: Align block status requests
  block: Relax bdrv_aligned_preadv() assertion
  qemu-io: Relax 'alloc' now that block-status doesn't assert

 include/block/block.h      |  26 ++--
 include/block/block_int.h  |  11 +-
 block/io.c                 | 287 ++++++++++++++++++++---------------
 block/blkdebug.c           |  13 +-
 block/mirror.c             |  24 +--
 block/qcow2-cluster.c      |   2 +-
 block/qcow2.c              |  53 +++----
 qemu-img.c                 | 365 ++++++++++++++++++++-------------------------
 qemu-io-cmds.c             |  13 --
 block/trace-events         |   2 +-
 tests/qemu-iotests/074.out |   2 -
 tests/qemu-iotests/177     |  12 +-
 tests/qemu-iotests/177.out |  19 ++-
 13 files changed, 420 insertions(+), 409 deletions(-)

-- 
2.13.5


* [Qemu-devel] [PATCH v4 01/23] block: Allow NULL file for bdrv_get_block_status()
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-25 22:43   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 02/23] block: Add flag to avoid wasted work in bdrv_is_allocated() Eric Blake
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: famz, jsnow, kwolf, qemu-block, Stefan Hajnoczi, Max Reitz, Jeff Cody

Not all callers care about which BDS owns the mapping for a given
range of the file.  This patch merely simplifies the callers by
consolidating the logic in the common call point, while guaranteeing
a non-NULL file to all the driver callbacks, for no semantic change.
The only caller that does not care about pnum is bdrv_is_allocated,
as invoked by vvfat; we can likewise add assertions that the rest
of the stack does not have to worry about a NULL pnum.

Furthermore, this will also set the stage for a future cleanup: when
a caller does not care about which BDS owns an offset, it would be
nice to allow the driver to optimize things to not have to return
BDRV_BLOCK_OFFSET_VALID in the first place.  In the case of fragmented
allocation (for example, it's fairly easy to create a qcow2 image
where consecutive guest addresses are not at consecutive host
addresses), the current contract requires bdrv_get_block_status()
to clamp *pnum to the limit where host addresses are no longer
consecutive, but allowing a NULL file means that *pnum could be
set to the full length of known-allocated data.
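
The pattern described above (an optional out-parameter at the common call point, with callbacks still guaranteed non-NULL pointers) can be sketched in isolation as follows. This is a simplified stand-in, not the block/io.c code: Node, driver_status() and get_status() are hypothetical names, and the returned values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct Node {
    const char *name;
} Node;

/* Hypothetical driver callback: may rely on non-NULL pnum and file. */
static int driver_status(Node *bs, int *pnum, Node **file)
{
    assert(pnum && file);
    *pnum = 8;      /* arbitrary run length for the example */
    *file = bs;     /* this layer owns the mapping */
    return 1;
}

/*
 * Common call point: 'file' may be NULL when the caller does not care
 * which node owns the mapping.  A local variable is handed to the
 * callback and transferred to the caller only at the end.
 */
static int get_status(Node *bs, int *pnum, Node **file)
{
    Node *local_file = NULL;
    int ret;

    assert(pnum);
    ret = driver_status(bs, pnum, &local_file);
    if (file) {
        *file = local_file;
    }
    return ret;
}
```

Callers that ignore the owner pass NULL and no longer need a throwaway variable; callers that want it get the same result as before.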

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: only context changes
v3: rebase to recent changes (qcow2_measure), dropped R-b
v2: use local variable and final transfer, rather than assignment
of parameter to local
[previously in different series]:
v2: new patch, https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05645.html
---
 include/block/block_int.h | 10 ++++++----
 block/io.c                | 44 ++++++++++++++++++++++++++++----------------
 block/mirror.c            |  3 +--
 block/qcow2.c             |  8 ++------
 qemu-img.c                | 10 ++++------
 5 files changed, 41 insertions(+), 34 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 55c5d573d4..7f71c585a0 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -202,10 +202,12 @@ struct BlockDriver {
         int64_t offset, int bytes);

     /*
-     * Building block for bdrv_block_status[_above]. The driver should
-     * answer only according to the current layer, and should not
-     * set BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
-     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.
+     * Building block for bdrv_block_status[_above] and
+     * bdrv_is_allocated[_above].  The driver should answer only
+     * according to the current layer, and should not set
+     * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
+     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
+     * layer guarantees non-NULL pnum and file.
      */
     int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum,
diff --git a/block/io.c b/block/io.c
index 8a0cd8835a..f250029395 100644
--- a/block/io.c
+++ b/block/io.c
@@ -695,7 +695,6 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 {
     int64_t target_sectors, ret, nb_sectors, sector_num = 0;
     BlockDriverState *bs = child->bs;
-    BlockDriverState *file;
     int n;

     target_sectors = bdrv_nb_sectors(bs);
@@ -708,7 +707,7 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
         if (nb_sectors <= 0) {
             return 0;
         }
-        ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n, &file);
+        ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n, NULL);
         if (ret < 0) {
             error_report("error getting block status at sector %" PRId64 ": %s",
                          sector_num, strerror(-ret));
@@ -1755,8 +1754,9 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
  * beyond the end of the disk image it will be clamped; if 'pnum' is set to
  * the end of the image, then the returned value will include BDRV_BLOCK_EOF.
  *
- * If returned value is positive and BDRV_BLOCK_OFFSET_VALID bit is set, 'file'
- * points to the BDS which the sector range is allocated in.
+ * If returned value is positive, BDRV_BLOCK_OFFSET_VALID bit is set, and
+ * 'file' is non-NULL, then '*file' points to the BDS which the sector range
+ * is allocated in.
  */
 static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
                                                      int64_t sector_num,
@@ -1766,15 +1766,22 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
     int64_t total_sectors;
     int64_t n;
     int64_t ret, ret2;
+    BlockDriverState *local_file = NULL;

-    *file = NULL;
+    assert(pnum);
     total_sectors = bdrv_nb_sectors(bs);
     if (total_sectors < 0) {
+        if (file) {
+            *file = NULL;
+        }
         return total_sectors;
     }

     if (sector_num >= total_sectors) {
         *pnum = 0;
+        if (file) {
+            *file = NULL;
+        }
         return BDRV_BLOCK_EOF;
     }

@@ -1791,23 +1798,27 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         }
         if (bs->drv->protocol_name) {
             ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
-            *file = bs;
+            if (file) {
+                *file = bs;
+            }
+        } else if (file) {
+            *file = NULL;
         }
         return ret;
     }

     bdrv_inc_in_flight(bs);
     ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum,
-                                            file);
+                                            &local_file);
     if (ret < 0) {
         *pnum = 0;
         goto out;
     }

     if (ret & BDRV_BLOCK_RAW) {
-        assert(ret & BDRV_BLOCK_OFFSET_VALID && *file);
-        ret = bdrv_co_get_block_status(*file, ret >> BDRV_SECTOR_BITS,
-                                       *pnum, pnum, file);
+        assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
+        ret = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
+                                       *pnum, pnum, &local_file);
         goto out;
     }

@@ -1825,14 +1836,13 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         }
     }

-    if (*file && *file != bs &&
+    if (local_file && local_file != bs &&
         (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
         (ret & BDRV_BLOCK_OFFSET_VALID)) {
-        BlockDriverState *file2;
         int file_pnum;

-        ret2 = bdrv_co_get_block_status(*file, ret >> BDRV_SECTOR_BITS,
-                                        *pnum, &file_pnum, &file2);
+        ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
+                                        *pnum, &file_pnum, NULL);
         if (ret2 >= 0) {
             /* Ignore errors.  This is just providing extra information, it
              * is useful but not necessary.
@@ -1854,6 +1864,9 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
     }

 out:
+    if (file) {
+        *file = local_file;
+    }
     bdrv_dec_in_flight(bs);
     if (ret >= 0 && sector_num + *pnum == total_sectors) {
         ret |= BDRV_BLOCK_EOF;
@@ -1957,7 +1970,6 @@ int64_t bdrv_get_block_status(BlockDriverState *bs,
 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
                                    int64_t bytes, int64_t *pnum)
 {
-    BlockDriverState *file;
     int64_t sector_num = offset >> BDRV_SECTOR_BITS;
     int nb_sectors = bytes >> BDRV_SECTOR_BITS;
     int64_t ret;
@@ -1966,7 +1978,7 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
     assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
     assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) && bytes < INT_MAX);
     ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
-                                &file);
+                                NULL);
     if (ret < 0) {
         return ret;
     }
diff --git a/block/mirror.c b/block/mirror.c
index 5cdaaed7be..032cfe91fa 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -390,7 +390,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         int io_sectors;
         unsigned int io_bytes;
         int64_t io_bytes_acct;
-        BlockDriverState *file;
         enum MirrorMethod {
             MIRROR_METHOD_COPY,
             MIRROR_METHOD_ZERO,
@@ -401,7 +400,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         ret = bdrv_get_block_status_above(source, NULL,
                                           offset >> BDRV_SECTOR_BITS,
                                           nb_chunks * sectors_per_chunk,
-                                          &io_sectors, &file);
+                                          &io_sectors, NULL);
         io_bytes = io_sectors * BDRV_SECTOR_SIZE;
         if (ret < 0) {
             io_bytes = MIN(nb_chunks * s->granularity, max_io_bytes);
diff --git a/block/qcow2.c b/block/qcow2.c
index 64dcd98a91..9a7b5cd41f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2975,7 +2975,6 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
                             uint32_t count)
 {
     int nr;
-    BlockDriverState *file;
     int64_t res;

     if (start + count > bs->total_sectors) {
@@ -2985,8 +2984,7 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
     if (!count) {
         return true;
     }
-    res = bdrv_get_block_status_above(bs, NULL, start, count,
-                                      &nr, &file);
+    res = bdrv_get_block_status_above(bs, NULL, start, count, &nr, NULL);
     return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == count;
 }

@@ -3654,13 +3652,11 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
                  offset += pnum * BDRV_SECTOR_SIZE) {
                 int nb_sectors = MIN(ssize - offset,
                                      BDRV_REQUEST_MAX_BYTES) / BDRV_SECTOR_SIZE;
-                BlockDriverState *file;
                 int64_t ret;

                 ret = bdrv_get_block_status_above(in_bs, NULL,
                                                   offset >> BDRV_SECTOR_BITS,
-                                                  nb_sectors,
-                                                  &pnum, &file);
+                                                  nb_sectors, &pnum, NULL);
                 if (ret < 0) {
                     error_setg_errno(&local_err, -ret,
                                      "Unable to get block status");
diff --git a/qemu-img.c b/qemu-img.c
index df984b11b9..0c12e1c240 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1374,7 +1374,6 @@ static int img_compare(int argc, char **argv)

     for (;;) {
         int64_t status1, status2;
-        BlockDriverState *file;

         nb_sectors = sectors_to_process(total_sectors, sector_num);
         if (nb_sectors <= 0) {
@@ -1382,7 +1381,7 @@ static int img_compare(int argc, char **argv)
         }
         status1 = bdrv_get_block_status_above(bs1, NULL, sector_num,
                                               total_sectors1 - sector_num,
-                                              &pnum1, &file);
+                                              &pnum1, NULL);
         if (status1 < 0) {
             ret = 3;
             error_report("Sector allocation test failed for %s", filename1);
@@ -1392,7 +1391,7 @@ static int img_compare(int argc, char **argv)

         status2 = bdrv_get_block_status_above(bs2, NULL, sector_num,
                                               total_sectors2 - sector_num,
-                                              &pnum2, &file);
+                                              &pnum2, NULL);
         if (status2 < 0) {
             ret = 3;
             error_report("Sector allocation test failed for %s", filename2);
@@ -1598,15 +1597,14 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
     n = MIN(s->total_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);

     if (s->sector_next_status <= sector_num) {
-        BlockDriverState *file;
         if (s->target_has_backing) {
             ret = bdrv_get_block_status(blk_bs(s->src[src_cur]),
                                         sector_num - src_cur_offset,
-                                        n, &n, &file);
+                                        n, &n, NULL);
         } else {
             ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
                                               sector_num - src_cur_offset,
-                                              n, &n, &file);
+                                              n, &n, NULL);
         }
         if (ret < 0) {
             return ret;
-- 
2.13.5


* [Qemu-devel] [PATCH v4 02/23] block: Add flag to avoid wasted work in bdrv_is_allocated()
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 01/23] block: Allow NULL file for bdrv_get_block_status() Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-26 18:31   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful Eric Blake
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Stefan Hajnoczi, Max Reitz

Not all callers care about which BDS owns the mapping for a given
range of the file.  In particular, bdrv_is_allocated() cares more
about finding the largest run of allocated data from the guest
perspective, whether or not that data is consecutive from the
host perspective.  Therefore, doing subsequent refinements such
as checking how much of the format-layer allocation also satisfies
BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best
case, it just costs extra CPU cycles during a single
bdrv_is_allocated(), but in the worst case, it results in a smaller
*pnum, and forces callers to iterate through more status probes when
visiting the entire file for even more extra CPU cycles.

This patch only optimizes the block layer.  But subsequent patches
will tweak the driver callback to be byte-based, and in the process,
can also pass this hint through to the driver.
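
To make the cost of a smaller *pnum concrete, here is a toy model, assuming a four-cluster image whose host offsets are fragmented. It models the host-contiguity clamp mentioned in patch 01 rather than the protocol-layer zero refinement that this patch skips, and every name in it (toy_status(), count_probes(), the host_offset table) is hypothetical. The point is only that a status query which must report a valid mapping returns shorter runs, so a full walk of the image needs more probes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CLUSTER   65536
#define NCLUSTERS 4

/* All four guest clusters are allocated, but not contiguously on the host. */
static const int64_t host_offset[NCLUSTERS] = {
    0x50000, 0x30000, 0x40000, 0x90000
};

/*
 * Return how many bytes starting at 'offset' share one status.  With
 * 'mapping' set, the run stops where host offsets stop being consecutive;
 * without it, the whole allocated run is reported at once.
 */
static int64_t toy_status(int64_t offset, bool mapping)
{
    int64_t i = offset / CLUSTER;
    int64_t n = 1;

    assert(offset >= 0 && offset % CLUSTER == 0 && i < NCLUSTERS);
    while (i + n < NCLUSTERS &&
           (!mapping || host_offset[i + n] == host_offset[i] + n * CLUSTER)) {
        n++;
    }
    return n * CLUSTER;
}

/* Count the status probes needed to visit the entire toy image. */
static int count_probes(bool mapping)
{
    int probes = 0;
    int64_t offset = 0;

    while (offset < (int64_t)NCLUSTERS * CLUSTER) {
        offset += toy_status(offset, mapping);
        probes++;
    }
    return probes;
}
```

In this model a walk with mapping enabled takes three probes, while a pure allocation query covers the image in one.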

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: only context changes
v3: s/allocation/mapping/ and flip sense of bool
v2: new patch
---
 block/io.c | 52 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/block/io.c b/block/io.c
index f250029395..6509c804d4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1709,6 +1709,7 @@ typedef struct BdrvCoGetBlockStatusData {
     int nb_sectors;
     int *pnum;
     int64_t ret;
+    bool mapping;
     bool done;
 } BdrvCoGetBlockStatusData;

@@ -1743,6 +1744,11 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
  * Drivers not implementing the functionality are assumed to not support
  * backing files, hence all their sectors are reported as allocated.
  *
+ * If 'mapping' is true, the caller is querying for mapping purposes,
+ * and the result should include BDRV_BLOCK_OFFSET_VALID where
+ * possible; otherwise, the result may omit that bit particularly if
+ * it allows for a larger value in 'pnum'.
+ *
  * If 'sector_num' is beyond the end of the disk image the return value is
  * BDRV_BLOCK_EOF and 'pnum' is set to 0.
  *
@@ -1759,6 +1765,7 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
  * is allocated in.
  */
 static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
+                                                     bool mapping,
                                                      int64_t sector_num,
                                                      int nb_sectors, int *pnum,
                                                      BlockDriverState **file)
@@ -1817,14 +1824,15 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,

     if (ret & BDRV_BLOCK_RAW) {
         assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
-        ret = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
+        ret = bdrv_co_get_block_status(local_file, mapping,
+                                       ret >> BDRV_SECTOR_BITS,
                                        *pnum, pnum, &local_file);
         goto out;
     }

     if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
         ret |= BDRV_BLOCK_ALLOCATED;
-    } else {
+    } else if (mapping) {
         if (bdrv_unallocated_blocks_are_zero(bs)) {
             ret |= BDRV_BLOCK_ZERO;
         } else if (bs->backing) {
@@ -1836,12 +1844,13 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         }
     }

-    if (local_file && local_file != bs &&
+    if (mapping && local_file && local_file != bs &&
         (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
         (ret & BDRV_BLOCK_OFFSET_VALID)) {
         int file_pnum;

-        ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
+        ret2 = bdrv_co_get_block_status(local_file, mapping,
+                                        ret >> BDRV_SECTOR_BITS,
                                         *pnum, &file_pnum, NULL);
         if (ret2 >= 0) {
             /* Ignore errors.  This is just providing extra information, it
@@ -1876,6 +1885,7 @@ out:

 static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
         BlockDriverState *base,
+        bool mapping,
         int64_t sector_num,
         int nb_sectors,
         int *pnum,
@@ -1887,7 +1897,8 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,

     assert(bs != base);
     for (p = bs; p != base; p = backing_bs(p)) {
-        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum, file);
+        ret = bdrv_co_get_block_status(p, mapping, sector_num, nb_sectors,
+                                       pnum, file);
         if (ret < 0) {
             break;
         }
@@ -1917,6 +1928,7 @@ static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
     BdrvCoGetBlockStatusData *data = opaque;

     data->ret = bdrv_co_get_block_status_above(data->bs, data->base,
+                                               data->mapping,
                                                data->sector_num,
                                                data->nb_sectors,
                                                data->pnum,
@@ -1929,11 +1941,12 @@ static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
  *
  * See bdrv_co_get_block_status_above() for details.
  */
-int64_t bdrv_get_block_status_above(BlockDriverState *bs,
-                                    BlockDriverState *base,
-                                    int64_t sector_num,
-                                    int nb_sectors, int *pnum,
-                                    BlockDriverState **file)
+static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
+                                              BlockDriverState *base,
+                                              bool mapping,
+                                              int64_t sector_num,
+                                              int nb_sectors, int *pnum,
+                                              BlockDriverState **file)
 {
     Coroutine *co;
     BdrvCoGetBlockStatusData data = {
@@ -1943,6 +1956,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
         .sector_num = sector_num,
         .nb_sectors = nb_sectors,
         .pnum = pnum,
+        .mapping = mapping,
         .done = false,
     };

@@ -1958,6 +1972,16 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
     return data.ret;
 }

+int64_t bdrv_get_block_status_above(BlockDriverState *bs,
+                                    BlockDriverState *base,
+                                    int64_t sector_num,
+                                    int nb_sectors, int *pnum,
+                                    BlockDriverState **file)
+{
+    return bdrv_common_block_status_above(bs, base, true, sector_num,
+                                          nb_sectors, pnum, file);
+}
+
 int64_t bdrv_get_block_status(BlockDriverState *bs,
                               int64_t sector_num,
                               int nb_sectors, int *pnum,
@@ -1970,15 +1994,15 @@ int64_t bdrv_get_block_status(BlockDriverState *bs,
 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
                                    int64_t bytes, int64_t *pnum)
 {
-    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
-    int nb_sectors = bytes >> BDRV_SECTOR_BITS;
     int64_t ret;
     int psectors;

     assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
     assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) && bytes < INT_MAX);
-    ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
-                                NULL);
+    ret = bdrv_common_block_status_above(bs, backing_bs(bs), false,
+                                         offset >> BDRV_SECTOR_BITS,
+                                         bytes >> BDRV_SECTOR_BITS, &psectors,
+                                         NULL);
     if (ret < 0) {
         return ret;
     }
-- 
2.13.5


* [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 01/23] block: Allow NULL file for bdrv_get_block_status() Eric Blake
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 02/23] block: Add flag to avoid wasted work in bdrv_is_allocated() Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-26 18:51   ` John Snow
  2017-09-29 20:03   ` Eric Blake
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 04/23] qcow2: Switch is_zero_sectors() to byte-based Eric Blake
                   ` (20 subsequent siblings)
  23 siblings, 2 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: famz, jsnow, kwolf, qemu-block, Stefan Hajnoczi, Max Reitz, Jeff Cody

In the process of converting sector-based interfaces to bytes,
I'm finding it easier to represent a byte count as a 64-bit
integer at the block layer (even if we are internally capped
by SIZE_MAX or even INT_MAX for individual transactions, it's
still nicer to not have to worry about truncation/overflow
issues on as many variables).  Update the signature of
bdrv_round_to_clusters() to uniformly use int64_t, matching
the signature already chosen for bdrv_is_allocated and the
fact that off_t is also a signed type, then adjust clients
according to the required fallout.
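
The truncation concern can be shown with a small standalone sketch. It is not the bdrv_round_to_clusters() implementation: round_to_clusters64() and fits_in_uint() are hypothetical helpers, and the cluster size is passed in directly instead of being looked up from the image.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Round [offset, offset + bytes) out to cluster boundaries, keeping
 * every quantity in int64_t so that large byte counts survive intact.
 */
static void round_to_clusters64(int64_t cluster_size,
                                int64_t offset, int64_t bytes,
                                int64_t *cluster_offset,
                                int64_t *cluster_bytes)
{
    int64_t end;

    assert(cluster_size > 0 && offset >= 0 && bytes >= 0);
    *cluster_offset = offset - offset % cluster_size;
    end = offset + bytes;
    end = (end + cluster_size - 1) / cluster_size * cluster_size;
    *cluster_bytes = end - *cluster_offset;
}

/* True if 'v' could be stored in an unsigned int without truncation. */
static bool fits_in_uint(int64_t v)
{
    return v >= 0 && (uint64_t)v <= UINT_MAX;
}
```

With a 64 KiB cluster size, a 5 GiB request rounds to a 5 GiB cluster_bytes that fits_in_uint() rejects, which is the case an unsigned int out-parameter would silently truncate (assuming a 32-bit unsigned int).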

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>

---
v4: only context changes
v3: no change
v2: fix commit message [John], rebase to earlier changes, including
mirror_clip_bytes() signature update
---
 include/block/block.h | 4 ++--
 block/io.c            | 7 ++++---
 block/mirror.c        | 7 +++----
 block/trace-events    | 2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 2ad18775af..bb3b95d491 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -475,9 +475,9 @@ int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
 void bdrv_round_to_clusters(BlockDriverState *bs,
-                            int64_t offset, unsigned int bytes,
+                            int64_t offset, int64_t bytes,
                             int64_t *cluster_offset,
-                            unsigned int *cluster_bytes);
+                            int64_t *cluster_bytes);

 const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
 void bdrv_get_backing_filename(BlockDriverState *bs,
diff --git a/block/io.c b/block/io.c
index 6509c804d4..b362b46e3d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -446,9 +446,9 @@ static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
  * Round a region to cluster boundaries
  */
 void bdrv_round_to_clusters(BlockDriverState *bs,
-                            int64_t offset, unsigned int bytes,
+                            int64_t offset, int64_t bytes,
                             int64_t *cluster_offset,
-                            unsigned int *cluster_bytes)
+                            int64_t *cluster_bytes)
 {
     BlockDriverInfo bdi;

@@ -946,7 +946,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
     struct iovec iov;
     QEMUIOVector bounce_qiov;
     int64_t cluster_offset;
-    unsigned int cluster_bytes;
+    int64_t cluster_bytes;
     size_t skip_bytes;
     int ret;

@@ -967,6 +967,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
     trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
                                    cluster_offset, cluster_bytes);

+    assert(cluster_bytes < SIZE_MAX);
     iov.iov_len = cluster_bytes;
     iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len);
     if (bounce_buffer == NULL) {
diff --git a/block/mirror.c b/block/mirror.c
index 032cfe91fa..67f45cec4e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -190,10 +190,9 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
     bool need_cow;
     int ret = 0;
     int64_t align_offset = *offset;
-    unsigned int align_bytes = *bytes;
+    int64_t align_bytes = *bytes;
     int max_bytes = s->granularity * s->max_iov;

-    assert(*bytes < INT_MAX);
     need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
     need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
                           s->cow_bitmap);
@@ -388,7 +387,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     while (nb_chunks > 0 && offset < s->bdev_length) {
         int64_t ret;
         int io_sectors;
-        unsigned int io_bytes;
+        int64_t io_bytes;
         int64_t io_bytes_acct;
         enum MirrorMethod {
             MIRROR_METHOD_COPY,
@@ -413,7 +412,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             io_bytes = s->granularity;
         } else if (ret >= 0 && !(ret & BDRV_BLOCK_DATA)) {
             int64_t target_offset;
-            unsigned int target_bytes;
+            int64_t target_bytes;
             bdrv_round_to_clusters(blk_bs(s->target), offset, io_bytes,
                                    &target_offset, &target_bytes);
             if (target_offset == offset &&
diff --git a/block/trace-events b/block/trace-events
index 25dd5a3026..4c6586f156 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -12,7 +12,7 @@ blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int flag
 bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
 bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
 bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags 0x%x"
-bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %u"
+bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, int64_t cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %"PRId64

 # block/stream.c
 stream_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d"
-- 
2.13.5

* [Qemu-devel] [PATCH v4 04/23] qcow2: Switch is_zero_sectors() to byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (2 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-26 19:06   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 05/23] block: Switch bdrv_make_zero() " Eric Blake
                   ` (19 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and rename it to is_zero() in the
process.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
v3: no change
v2: rename function, rebase to upstream changes
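
Reviewer aid (not part of the commit): a minimal standalone sketch of
the widen-then-clamp arithmetic that is_zero() performs before it
queries block status.  SECTOR_SIZE, ALIGN_DOWN, ALIGN_UP and
widen_and_clamp() are stand-ins invented for illustration, not QEMU
symbols, and the image size is assumed to be sector-aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for BDRV_SECTOR_SIZE and QEMU_ALIGN_DOWN/UP (assumptions). */
#define SECTOR_SIZE 512LL
#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))

/*
 * Hypothetical helper: widen [offset, offset + bytes) outward to sector
 * boundaries, then clamp the end to image_size.  Stores the aligned
 * start in *start and returns the widened, clamped length in bytes.
 */
static int64_t widen_and_clamp(int64_t offset, int64_t bytes,
                               int64_t image_size, int64_t *start)
{
    assert(offset >= 0 && bytes >= 0);
    assert(image_size % SECTOR_SIZE == 0);

    *start = ALIGN_DOWN(offset, SECTOR_SIZE);
    bytes = ALIGN_UP(offset + bytes, SECTOR_SIZE) - *start;
    if (*start + bytes > image_size) {
        bytes = image_size - *start;
    }
    return bytes;
}
```

With a 4096-byte image, a request for 100 bytes at offset 1000 widens
to the two sectors starting at 512, while a request that runs past the
end of the image is cut back to the final sector.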
---
 block/qcow2.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 9a7b5cd41f..eb498b56d4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2971,21 +2971,28 @@ finish:
 }


-static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
-                            uint32_t count)
+static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
     int nr;
     int64_t res;
+    int64_t start;

-    if (start + count > bs->total_sectors) {
-        count = bs->total_sectors - start;
+    /* Widen to sector boundaries, then clamp to image length, before
+     * checking status of underlying sectors */
+    start = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+    bytes = QEMU_ALIGN_UP(offset + bytes, BDRV_SECTOR_SIZE) - start;
+
+    if (start + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
+        bytes = bs->total_sectors * BDRV_SECTOR_SIZE - start;
     }

-    if (!count) {
+    if (!bytes) {
         return true;
     }
-    res = bdrv_get_block_status_above(bs, NULL, start, count, &nr, NULL);
-    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == count;
+    res = bdrv_get_block_status_above(bs, NULL, start >> BDRV_SECTOR_BITS,
+                                      bytes >> BDRV_SECTOR_BITS, &nr, NULL);
+    return res >= 0 && (res & BDRV_BLOCK_ZERO) &&
+        nr * BDRV_SECTOR_SIZE == bytes;
 }

 static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
@@ -3003,24 +3010,21 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     }

     if (head || tail) {
-        int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
         uint64_t off;
         unsigned int nr;

         assert(head + bytes <= s->cluster_size);

         /* check whether remainder of cluster already reads as zero */
-        if (!(is_zero_sectors(bs, cl_start,
-                              DIV_ROUND_UP(head, BDRV_SECTOR_SIZE)) &&
-              is_zero_sectors(bs, (offset + bytes) >> BDRV_SECTOR_BITS,
-                              DIV_ROUND_UP(-tail & (s->cluster_size - 1),
-                                           BDRV_SECTOR_SIZE)))) {
+        if (!(is_zero(bs, offset - head, head) &&
+              is_zero(bs, offset + bytes,
+                      tail ? s->cluster_size - tail : 0))) {
             return -ENOTSUP;
         }

         qemu_co_mutex_lock(&s->lock);
         /* We can have new write after previous check */
-        offset = cl_start << BDRV_SECTOR_BITS;
+        offset = QEMU_ALIGN_DOWN(offset, s->cluster_size);
         bytes = s->cluster_size;
         nr = s->cluster_size;
         ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
-- 
2.13.5

* [Qemu-devel] [PATCH v4 05/23] block: Switch bdrv_make_zero() to byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (3 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 04/23] qcow2: Switch is_zero_sectors() to byte-based Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-26 19:13   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 06/23] qemu-img: Switch get_block_status() " Eric Blake
                   ` (18 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Stefan Hajnoczi, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of zeroing a device to track by bytes instead of
sectors (although we are still guaranteed that we iterate by steps
that are sector-aligned).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>

---
v3: no change
v2: rebase to earlier changes
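
Reviewer aid (not part of the commit): a minimal standalone sketch of
a byte-tracked loop over a sector-based status query, in the spirit of
bdrv_make_zero().  The callback type, toy_status() and
count_bytes_to_zero() are hypothetical names for illustration only.
The point to watch is that a sector count is scaled by the sector size
(512), never by the shift count (9).

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9
#define SECTOR_SIZE (1LL << SECTOR_BITS)

/*
 * Toy sector-based status callback (assumption): stores in *pnum how many
 * sectors share one status and returns 1 if they read as zero, else 0.
 */
typedef int (*status_fn)(int64_t sector_num, int nb_sectors, int *pnum);

/* Example image: sectors 0..3 read as zero, everything after holds data. */
static int toy_status(int64_t sector_num, int nb_sectors, int *pnum)
{
    if (sector_num < 4) {
        int64_t left = 4 - sector_num;
        *pnum = nb_sectors < left ? nb_sectors : (int)left;
        return 1;
    }
    *pnum = nb_sectors;
    return 0;
}

/* Walk the image by byte offset; return how many bytes would need zeroing. */
static int64_t count_bytes_to_zero(int64_t size, int64_t max_bytes,
                                   status_fn status)
{
    int64_t offset = 0, todo = 0;

    assert(size % SECTOR_SIZE == 0 && max_bytes % SECTOR_SIZE == 0);
    for (;;) {
        int64_t bytes = size - offset < max_bytes ? size - offset : max_bytes;
        int n, reads_zero;

        if (bytes <= 0) {
            return todo;
        }
        reads_zero = status(offset >> SECTOR_BITS,
                            (int)(bytes >> SECTOR_BITS), &n);
        assert(n > 0);
        if (!reads_zero) {
            todo += n * SECTOR_SIZE;
        }
        offset += n * SECTOR_SIZE;   /* scale by SECTOR_SIZE, not SECTOR_BITS */
    }
}
```

For a ten-sector image the walk skips the first four sectors and
reports the remaining six as needing a write, whatever sector-aligned
request limit is used.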
---
 block/io.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/block/io.c b/block/io.c
index b362b46e3d..638b3890b7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -693,38 +693,38 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
  */
 int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 {
-    int64_t target_sectors, ret, nb_sectors, sector_num = 0;
+    int64_t target_size, ret, bytes, offset = 0;
     BlockDriverState *bs = child->bs;
-    int n;
+    int n; /* sectors */

-    target_sectors = bdrv_nb_sectors(bs);
-    if (target_sectors < 0) {
-        return target_sectors;
+    target_size = bdrv_getlength(bs);
+    if (target_size < 0) {
+        return target_size;
     }

     for (;;) {
-        nb_sectors = MIN(target_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);
-        if (nb_sectors <= 0) {
+        bytes = MIN(target_size - offset, BDRV_REQUEST_MAX_BYTES);
+        if (bytes <= 0) {
             return 0;
         }
-        ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n, NULL);
+        ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
+                                    bytes >> BDRV_SECTOR_BITS, &n, NULL);
         if (ret < 0) {
-            error_report("error getting block status at sector %" PRId64 ": %s",
-                         sector_num, strerror(-ret));
+            error_report("error getting block status at offset %" PRId64 ": %s",
+                         offset, strerror(-ret));
             return ret;
         }
         if (ret & BDRV_BLOCK_ZERO) {
-            sector_num += n;
+            offset += n * BDRV_SECTOR_SIZE;
             continue;
         }
-        ret = bdrv_pwrite_zeroes(child, sector_num << BDRV_SECTOR_BITS,
-                                 n << BDRV_SECTOR_BITS, flags);
+        ret = bdrv_pwrite_zeroes(child, offset, n * BDRV_SECTOR_SIZE, flags);
         if (ret < 0) {
-            error_report("error writing zeroes at sector %" PRId64 ": %s",
-                         sector_num, strerror(-ret));
+            error_report("error writing zeroes at offset %" PRId64 ": %s",
+                         offset, strerror(-ret));
             return ret;
         }
-        sector_num += n;
+        offset += n * BDRV_SECTOR_SIZE;
     }
 }

-- 
2.13.5

* [Qemu-devel] [PATCH v4 06/23] qemu-img: Switch get_block_status() to byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (4 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 05/23] block: Switch bdrv_make_zero() " Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-26 19:16   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 07/23] block: Convert bdrv_get_block_status() to bytes Eric Blake
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting
an internal function (no semantic change), and simplifying its
caller accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
v2-v3: no change
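
Reviewer aid (not part of the commit): a minimal standalone sketch of
how the img_map() loop sizes each probe once it works in bytes: at
most 1 GiB, rounded down to a sector multiple.  probe_len() and
SECTOR_SIZE are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512LL          /* stand-in for BDRV_SECTOR_SIZE */
#define PROBE_MAX   (1LL << 30)    /* probe up to 1 GiB at a time */

/*
 * Hypothetical helper: number of bytes to probe starting at offset in an
 * image of the given length, capped at 1 GiB and rounded down to a whole
 * number of sectors.
 */
static int64_t probe_len(int64_t length, int64_t offset)
{
    int64_t n;

    assert(offset >= 0 && offset <= length);
    n = length - offset;
    if (n > PROBE_MAX) {
        n = PROBE_MAX;
    }
    return n / SECTOR_SIZE * SECTOR_SIZE;
}
```

Note that rounding down means a tail shorter than one sector yields a
zero-length probe, so this sketch is only meaningful when the image
length is itself sector-aligned.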
---
 qemu-img.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 0c12e1c240..54f7682069 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2670,14 +2670,16 @@ static void dump_map_entry(OutputFormat output_format, MapEntry *e,
     }
 }

-static int get_block_status(BlockDriverState *bs, int64_t sector_num,
-                            int nb_sectors, MapEntry *e)
+static int get_block_status(BlockDriverState *bs, int64_t offset,
+                            int64_t bytes, MapEntry *e)
 {
     int64_t ret;
     int depth;
     BlockDriverState *file;
     bool has_offset;
+    int nb_sectors = bytes >> BDRV_SECTOR_BITS;

+    assert(bytes < INT_MAX);
     /* As an optimization, we could cache the current range of unallocated
      * clusters in each file of the chain, and avoid querying the same
      * range repeatedly.
@@ -2685,8 +2687,8 @@ static int get_block_status(BlockDriverState *bs, int64_t sector_num,

     depth = 0;
     for (;;) {
-        ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &nb_sectors,
-                                    &file);
+        ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS, nb_sectors,
+                                    &nb_sectors, &file);
         if (ret < 0) {
             return ret;
         }
@@ -2706,7 +2708,7 @@ static int get_block_status(BlockDriverState *bs, int64_t sector_num,
     has_offset = !!(ret & BDRV_BLOCK_OFFSET_VALID);

     *e = (MapEntry) {
-        .start = sector_num * BDRV_SECTOR_SIZE,
+        .start = offset,
         .length = nb_sectors * BDRV_SECTOR_SIZE,
         .data = !!(ret & BDRV_BLOCK_DATA),
         .zero = !!(ret & BDRV_BLOCK_ZERO),
@@ -2836,16 +2838,12 @@ static int img_map(int argc, char **argv)

     length = blk_getlength(blk);
     while (curr.start + curr.length < length) {
-        int64_t nsectors_left;
-        int64_t sector_num;
-        int n;
-
-        sector_num = (curr.start + curr.length) >> BDRV_SECTOR_BITS;
+        int64_t offset = curr.start + curr.length;
+        int64_t n;

         /* Probe up to 1 GiB at a time.  */
-        nsectors_left = DIV_ROUND_UP(length, BDRV_SECTOR_SIZE) - sector_num;
-        n = MIN(1 << (30 - BDRV_SECTOR_BITS), nsectors_left);
-        ret = get_block_status(bs, sector_num, n, &next);
+        n = QEMU_ALIGN_DOWN(MIN(1 << 30, length - offset), BDRV_SECTOR_SIZE);
+        ret = get_block_status(bs, offset, n, &next);

         if (ret < 0) {
             error_report("Could not read file metadata: %s", strerror(-ret));
-- 
2.13.5

* [Qemu-devel] [PATCH v4 07/23] block: Convert bdrv_get_block_status() to bytes
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (5 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 06/23] qemu-img: Switch get_block_status() " Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-26 19:39   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 08/23] block: Switch bdrv_co_get_block_status() to byte-based Eric Blake
                   ` (16 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Stefan Hajnoczi, Max Reitz

We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the name of the function from bdrv_get_block_status() to
bdrv_block_status() ensures that the compiler enforces that all
callers are updated.  For now, the io.c layer still assert()s that
all callers are sector-aligned, but that can be relaxed when a later
patch implements byte-based block status in the drivers.

Note that we have an inherent limitation in the BDRV_BLOCK_* return
values: BDRV_BLOCK_OFFSET_VALID can only return the start of a
sector, even if we later relax the interface to query for the status
starting at an intermediate byte; document the obvious interpretation
that valid offsets are always sector-relative.

Therefore, for the most part this patch is just the addition of scaling
at the callers followed by inverse scaling at bdrv_block_status().  But
some code, particularly bdrv_is_allocated(), gets a lot simpler because
it no longer has to mess with sectors.

For ease of review, bdrv_get_block_status_above() will be tackled
separately.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: clamp bytes to 32-bits, rather than asserting
v2: rebase to earlier changes
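
Reviewer aid (not part of the commit): a minimal standalone sketch of
the "scale at the caller, inverse-scale in the wrapper" pattern
described above.  byte_status(), whole_request() and REQUEST_MAX_BYTES
are hypothetical stand-ins; the limit is only modeled on the idea of a
request size that still fits in an int once converted to sectors.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BITS 9
#define SECTOR_SIZE (1LL << SECTOR_BITS)
/* Assumed limit: largest sector-aligned byte count not exceeding INT_MAX. */
#define REQUEST_MAX_BYTES ((int64_t)(INT_MAX >> SECTOR_BITS) << SECTOR_BITS)

/* Sector-based backend, as the drivers still provide at this point. */
typedef int64_t (*sector_status_fn)(int64_t sector_num, int nb_sectors,
                                    int *pnum);

/* Toy backend (assumption): the whole request shares one status. */
static int64_t whole_request(int64_t sector_num, int nb_sectors, int *pnum)
{
    (void)sector_num;
    *pnum = nb_sectors;
    return 1;
}

/*
 * Byte-based wrapper: insist on aligned input, clamp to what the
 * sector-based backend can express, then scale the result back to bytes.
 */
static int64_t byte_status(sector_status_fn fn, int64_t offset,
                           int64_t bytes, int64_t *pnum)
{
    int n = 0;
    int64_t ret;

    assert(offset >= 0 && bytes >= 0);
    assert(((offset | bytes) & (SECTOR_SIZE - 1)) == 0);
    if (bytes > REQUEST_MAX_BYTES) {
        bytes = REQUEST_MAX_BYTES;
    }
    ret = fn(offset >> SECTOR_BITS, (int)(bytes >> SECTOR_BITS), &n);
    if (pnum) {
        *pnum = (int64_t)n * SECTOR_SIZE;
    }
    return ret;
}
```

A 4096-byte query comes back as 4096 bytes, while a 1 TiB query is
silently shortened to the limit; the caller is expected to loop.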
---
 include/block/block.h | 12 +++++++-----
 block/io.c            | 31 +++++++++++++++++++------------
 block/qcow2-cluster.c |  2 +-
 qemu-img.c            | 20 +++++++++++---------
 4 files changed, 38 insertions(+), 27 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index bb3b95d491..7a9a8db588 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -138,8 +138,10 @@ typedef struct HDGeometry {
  *
  * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 (BDRV_BLOCK_OFFSET_MASK)
  * represent the offset in the returned BDS that is allocated for the
- * corresponding raw data; however, whether that offset actually contains
- * data also depends on BDRV_BLOCK_DATA and BDRV_BLOCK_ZERO, as follows:
+ * corresponding raw data.  Individual bytes are at the same sector-relative
+ * locations (and thus, this bit cannot be set for mappings which are
+ * not equivalent modulo 512).  However, whether that offset actually
+ * contains data also depends on BDRV_BLOCK_DATA, as follows:
  *
  * DATA ZERO OFFSET_VALID
  *  t    t        t       sectors read as zero, returned file is zero at offset
@@ -421,9 +423,9 @@ int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
 bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs);
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
-int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
-                              int nb_sectors, int *pnum,
-                              BlockDriverState **file);
+int64_t bdrv_block_status(BlockDriverState *bs, int64_t offset,
+                          int64_t bytes, int64_t *pnum,
+                          BlockDriverState **file);
 int64_t bdrv_get_block_status_above(BlockDriverState *bs,
                                     BlockDriverState *base,
                                     int64_t sector_num,
diff --git a/block/io.c b/block/io.c
index 638b3890b7..1ed46bcece 100644
--- a/block/io.c
+++ b/block/io.c
@@ -695,7 +695,6 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 {
     int64_t target_size, ret, bytes, offset = 0;
     BlockDriverState *bs = child->bs;
-    int n; /* sectors */

     target_size = bdrv_getlength(bs);
     if (target_size < 0) {
@@ -707,24 +706,23 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
         if (bytes <= 0) {
             return 0;
         }
-        ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
-                                    bytes >> BDRV_SECTOR_BITS, &n, NULL);
+        ret = bdrv_block_status(bs, offset, bytes, &bytes, NULL);
         if (ret < 0) {
             error_report("error getting block status at offset %" PRId64 ": %s",
                          offset, strerror(-ret));
             return ret;
         }
         if (ret & BDRV_BLOCK_ZERO) {
-            offset += n * BDRV_SECTOR_SIZE;
+            offset += bytes;
             continue;
         }
-        ret = bdrv_pwrite_zeroes(child, offset, n * BDRV_SECTOR_SIZE, flags);
+        ret = bdrv_pwrite_zeroes(child, offset, bytes, flags);
         if (ret < 0) {
             error_report("error writing zeroes at offset %" PRId64 ": %s",
                          offset, strerror(-ret));
             return ret;
         }
-        offset += n * BDRV_SECTOR_SIZE;
+        offset += bytes;
     }
 }

@@ -1983,13 +1981,22 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
                                           nb_sectors, pnum, file);
 }

-int64_t bdrv_get_block_status(BlockDriverState *bs,
-                              int64_t sector_num,
-                              int nb_sectors, int *pnum,
-                              BlockDriverState **file)
+int64_t bdrv_block_status(BlockDriverState *bs,
+                          int64_t offset, int64_t bytes, int64_t *pnum,
+                          BlockDriverState **file)
 {
-    return bdrv_get_block_status_above(bs, backing_bs(bs),
-                                       sector_num, nb_sectors, pnum, file);
+    int64_t ret;
+    int n;
+
+    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
+    bytes = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
+    ret = bdrv_get_block_status_above(bs, backing_bs(bs),
+                                      offset >> BDRV_SECTOR_BITS,
+                                      bytes >> BDRV_SECTOR_BITS, &n, file);
+    if (pnum) {
+        *pnum = n * BDRV_SECTOR_SIZE;
+    }
+    return ret;
 }

 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0d4824993c..d837b3980d 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1584,7 +1584,7 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
          * cluster is already marked as zero, or if it's unallocated and we
          * don't have a backing file.
          *
-         * TODO We might want to use bdrv_get_block_status(bs) here, but we're
+         * TODO We might want to use bdrv_block_status(bs) here, but we're
          * holding s->lock, so that doesn't work today.
          *
          * If full_discard is true, the sector should not read back as zeroes,
diff --git a/qemu-img.c b/qemu-img.c
index 54f7682069..897f80abb3 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1598,9 +1598,14 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)

     if (s->sector_next_status <= sector_num) {
         if (s->target_has_backing) {
-            ret = bdrv_get_block_status(blk_bs(s->src[src_cur]),
-                                        sector_num - src_cur_offset,
-                                        n, &n, NULL);
+            int64_t count = n * BDRV_SECTOR_SIZE;
+
+            ret = bdrv_block_status(blk_bs(s->src[src_cur]),
+                                    (sector_num - src_cur_offset) *
+                                    BDRV_SECTOR_SIZE,
+                                    count, &count, NULL);
+            assert(ret < 0 || QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
+            n = count >> BDRV_SECTOR_BITS;
         } else {
             ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
                                               sector_num - src_cur_offset,
@@ -2677,9 +2682,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
     int depth;
     BlockDriverState *file;
     bool has_offset;
-    int nb_sectors = bytes >> BDRV_SECTOR_BITS;

-    assert(bytes < INT_MAX);
     /* As an optimization, we could cache the current range of unallocated
      * clusters in each file of the chain, and avoid querying the same
      * range repeatedly.
@@ -2687,12 +2690,11 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,

     depth = 0;
     for (;;) {
-        ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS, nb_sectors,
-                                    &nb_sectors, &file);
+        ret = bdrv_block_status(bs, offset, bytes, &bytes, &file);
         if (ret < 0) {
             return ret;
         }
-        assert(nb_sectors);
+        assert(bytes);
         if (ret & (BDRV_BLOCK_ZERO|BDRV_BLOCK_DATA)) {
             break;
         }
@@ -2709,7 +2711,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,

     *e = (MapEntry) {
         .start = offset,
-        .length = nb_sectors * BDRV_SECTOR_SIZE,
+        .length = bytes,
         .data = !!(ret & BDRV_BLOCK_DATA),
         .zero = !!(ret & BDRV_BLOCK_ZERO),
         .offset = ret & BDRV_BLOCK_OFFSET_MASK,
-- 
2.13.5

* [Qemu-devel] [PATCH v4 08/23] block: Switch bdrv_co_get_block_status() to byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (6 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 07/23] block: Convert bdrv_get_block_status() to bytes Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-26 20:15   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 09/23] block: Switch BdrvCoGetBlockStatusData " Eric Blake
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Stefan Hajnoczi, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change); and as with its public counterpart,
rename to bdrv_co_block_status() to make the compiler enforce that
we catch all uses.  For now, we assert that callers still pass
aligned data.  Ultimately, this will be the function where we hand
off to a byte-based driver callback; at that point it will need
logic to round each call to the driver's request_alignment and then
touch up the result handed back, so that callers may pass unaligned
offsets.

Note that we are now prepared to accept 'bytes' larger than INT_MAX;
this is okay as long as we clamp things internally before violating
any 32-bit limits, and makes no difference to how a client will
use the information (clients looping over the entire file must
already be prepared for consecutive calls to return the same status,
as drivers are already free to return shorter-than-maximal status
due to any other convenient split points, such as when the L2 table
crosses cluster boundaries in qcow2).

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: no change
v3: rebase to allocation/mapping sense change, clamp bytes to 32-bits
when needed, drop R-b
v2: rebase to earlier changes
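
Reviewer aid (not part of the commit): a minimal standalone sketch of
why internal clamping is harmless to clients.  A client walking the
whole file already has to merge consecutive answers that carry the
same status, so a shorter-than-requested answer changes nothing.
byte_status_fn, toy_byte_status() and count_extents() are hypothetical
names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy byte-based status query (assumption): stores in *pnum how many bytes
 * from offset share one status and returns that status (>= 0).
 */
typedef int (*byte_status_fn)(int64_t offset, int64_t bytes, int64_t *pnum);

/*
 * Example backend: never answers across a 64 KiB boundary; the first four
 * 64 KiB chunks are data (1), everything after is unallocated (0).
 */
static int toy_byte_status(int64_t offset, int64_t bytes, int64_t *pnum)
{
    const int64_t chunk = 65536, boundary = 4 * chunk;
    int64_t limit = chunk - offset % chunk;  /* distance to next boundary */

    *pnum = bytes < limit ? bytes : limit;
    return offset < boundary;
}

/* Walk the file and count extents after merging equal-status neighbours. */
static int count_extents(int64_t size, byte_status_fn fn)
{
    int64_t offset = 0;
    int extents = 0, prev = -1;

    while (offset < size) {
        int64_t pnum = 0;
        int status = fn(offset, size - offset, &pnum);

        assert(status >= 0 && pnum > 0);
        if (status != prev) {
            extents++;              /* status changed: a new extent begins */
            prev = status;
        }
        offset += pnum;             /* short answers just mean more calls */
    }
    return extents;
}
```

Although the toy backend answers in eight separate 64 KiB pieces for a
512 KiB file, the client still sees two extents.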
---
 block/io.c | 91 +++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 54 insertions(+), 37 deletions(-)

diff --git a/block/io.c b/block/io.c
index 1ed46bcece..da85c903dd 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1748,42 +1748,43 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
  * possible; otherwise, the result may omit that bit particularly if
  * it allows for a larger value in 'pnum'.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is
+ * If 'offset' is beyond the end of the disk image the return value is
  * BDRV_BLOCK_EOF and 'pnum' is set to 0.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
- * allocated/unallocated state.
+ * 'pnum' is set to the number of bytes (including and immediately following
+ * the specified offset) that are known to be in the same
+ * allocated/unallocated state.  It may be NULL.
  *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
+ * 'bytes' is the max value 'pnum' should be set to.  If bytes goes
  * beyond the end of the disk image it will be clamped; if 'pnum' is set to
  * the end of the image, then the returned value will include BDRV_BLOCK_EOF.
  *
  * If returned value is positive, BDRV_BLOCK_OFFSET_VALID bit is set, and
- * 'file' is non-NULL, then '*file' points to the BDS which the sector range
- * is allocated in.
+ * 'file' is non-NULL, then '*file' points to the BDS which owns the
+ * allocated sector that contains offset.
  */
-static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
-                                                     bool mapping,
-                                                     int64_t sector_num,
-                                                     int nb_sectors, int *pnum,
-                                                     BlockDriverState **file)
+static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
+                                                 bool mapping,
+                                                 int64_t offset, int64_t bytes,
+                                                 int64_t *pnum,
+                                                 BlockDriverState **file)
 {
-    int64_t total_sectors;
-    int64_t n;
+    int64_t total_size;
+    int64_t n; /* bytes */
     int64_t ret, ret2;
     BlockDriverState *local_file = NULL;
+    int count; /* sectors */

     assert(pnum);
-    total_sectors = bdrv_nb_sectors(bs);
-    if (total_sectors < 0) {
+    total_size = bdrv_getlength(bs);
+    if (total_size < 0) {
         if (file) {
             *file = NULL;
         }
-        return total_sectors;
+        return total_size;
     }

-    if (sector_num >= total_sectors) {
+    if (offset >= total_size) {
         *pnum = 0;
         if (file) {
             *file = NULL;
@@ -1791,19 +1792,19 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         return BDRV_BLOCK_EOF;
     }

-    n = total_sectors - sector_num;
-    if (n < nb_sectors) {
-        nb_sectors = n;
+    n = total_size - offset;
+    if (n < bytes) {
+        bytes = n;
     }

     if (!bs->drv->bdrv_co_get_block_status) {
-        *pnum = nb_sectors;
+        *pnum = bytes;
         ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
-        if (sector_num + nb_sectors == total_sectors) {
+        if (offset + bytes == total_size) {
             ret |= BDRV_BLOCK_EOF;
         }
         if (bs->drv->protocol_name) {
-            ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
+            ret |= BDRV_BLOCK_OFFSET_VALID | (offset & BDRV_BLOCK_OFFSET_MASK);
             if (file) {
                 *file = bs;
             }
@@ -1814,18 +1815,28 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
     }

     bdrv_inc_in_flight(bs);
-    ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum,
+    /*
+     * TODO: Rather than require aligned offsets, we could instead
+     * round to the driver's request_alignment here, then touch up
+     * count afterwards back to the caller's expectations.
+     */
+    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
+    bytes = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
+    ret = bs->drv->bdrv_co_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
+                                            bytes >> BDRV_SECTOR_BITS, &count,
                                             &local_file);
     if (ret < 0) {
         *pnum = 0;
         goto out;
     }
+    *pnum = count * BDRV_SECTOR_SIZE;

     if (ret & BDRV_BLOCK_RAW) {
         assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
-        ret = bdrv_co_get_block_status(local_file, mapping,
-                                       ret >> BDRV_SECTOR_BITS,
-                                       *pnum, pnum, &local_file);
+        ret = bdrv_co_block_status(local_file, mapping,
+                                   ret & BDRV_BLOCK_OFFSET_MASK,
+                                   *pnum, pnum, &local_file);
+        assert(QEMU_IS_ALIGNED(*pnum, BDRV_SECTOR_SIZE));
         goto out;
     }

@@ -1836,8 +1847,8 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
             ret |= BDRV_BLOCK_ZERO;
         } else if (bs->backing) {
             BlockDriverState *bs2 = bs->backing->bs;
-            int64_t nb_sectors2 = bdrv_nb_sectors(bs2);
-            if (nb_sectors2 >= 0 && sector_num >= nb_sectors2) {
+            int64_t size2 = bdrv_getlength(bs2);
+            if (size2 >= 0 && offset >= size2) {
                 ret |= BDRV_BLOCK_ZERO;
             }
         }
@@ -1846,11 +1857,11 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
     if (mapping && local_file && local_file != bs &&
         (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
         (ret & BDRV_BLOCK_OFFSET_VALID)) {
-        int file_pnum;
+        int64_t file_pnum;

-        ret2 = bdrv_co_get_block_status(local_file, mapping,
-                                        ret >> BDRV_SECTOR_BITS,
-                                        *pnum, &file_pnum, NULL);
+        ret2 = bdrv_co_block_status(local_file, mapping,
+                                    ret & BDRV_BLOCK_OFFSET_MASK,
+                                    *pnum, &file_pnum, NULL);
         if (ret2 >= 0) {
             /* Ignore errors.  This is just providing extra information, it
              * is useful but not necessary.
@@ -1876,7 +1887,7 @@ out:
         *file = local_file;
     }
     bdrv_dec_in_flight(bs);
-    if (ret >= 0 && sector_num + *pnum == total_sectors) {
+    if (ret >= 0 && offset + *pnum == total_size) {
         ret |= BDRV_BLOCK_EOF;
     }
     return ret;
@@ -1896,11 +1907,17 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,

     assert(bs != base);
     for (p = bs; p != base; p = backing_bs(p)) {
-        ret = bdrv_co_get_block_status(p, mapping, sector_num, nb_sectors,
-                                       pnum, file);
+        int64_t count;
+
+        ret = bdrv_co_block_status(p, mapping,
+                                   sector_num * BDRV_SECTOR_SIZE,
+                                   nb_sectors * BDRV_SECTOR_SIZE, &count,
+                                   file);
         if (ret < 0) {
             break;
         }
+        assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
+        *pnum = count >> BDRV_SECTOR_BITS;
         if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
             /*
              * Reading beyond the end of the file continues to read
-- 
2.13.5


* [Qemu-devel] [PATCH v4 09/23] block: Switch BdrvCoGetBlockStatusData to byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (7 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 08/23] block: Switch bdrv_co_get_block_status() to byte-based Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-26 20:20   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 10/23] block: Switch bdrv_common_block_status_above() " Eric Blake
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Stefan Hajnoczi, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
type (no semantic change), and rename it to match the corresponding
public function rename.
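
As an illustration only (not code from this patch), the shape of the
conversion can be sketched as a byte-based record plus a sector-based
shim that scales on entry and exit.  StatusRequest, status_bytes() and
status_sectors() are hypothetical names, and the 64 KiB limit in the toy
core is an assumption made purely so the example has something to
report.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for QEMU's sector constants (512-byte sectors). */
#define SECTOR_BITS 9
#define SECTOR_SIZE (1LL << SECTOR_BITS)

/* Hypothetical byte-based request record, modelled on the converted struct. */
typedef struct StatusRequest {
    int64_t offset;   /* start of the query, in bytes */
    int64_t bytes;    /* length of the query, in bytes */
    int64_t *pnum;    /* out: number of bytes sharing the same status */
} StatusRequest;

/* Toy byte-based core: reports a uniform status for at most 64 KiB. */
static int status_bytes(StatusRequest *req)
{
    int64_t limit = 64 * 1024;

    *req->pnum = req->bytes < limit ? req->bytes : limit;
    return 0;
}

/* Sector-based shim: scale up on entry, scale back down on exit. */
static int status_sectors(int64_t sector_num, int nb_sectors, int *pnum)
{
    int64_t n;
    StatusRequest req = {
        .offset = sector_num * SECTOR_SIZE,
        .bytes = (int64_t)nb_sectors * SECTOR_SIZE,
        .pnum = &n,
    };
    int ret = status_bytes(&req);

    if (ret < 0) {
        return ret;
    }
    /* Only valid while the core still answers in whole sectors. */
    assert(n % SECTOR_SIZE == 0);
    *pnum = (int)(n >> SECTOR_BITS);
    return ret;
}
```

The shim is the part that later patches in a series like this can
delete once every caller passes bytes directly.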

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>

---
v4: no change
v3: rebase to context conflicts, simple enough to keep R-b
v2: rebase to earlier changes
---
 block/io.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index da85c903dd..0adfbb8e70 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1700,17 +1700,17 @@ int bdrv_flush_all(void)
 }


-typedef struct BdrvCoGetBlockStatusData {
+typedef struct BdrvCoBlockStatusData {
     BlockDriverState *bs;
     BlockDriverState *base;
     BlockDriverState **file;
-    int64_t sector_num;
-    int nb_sectors;
-    int *pnum;
+    int64_t offset;
+    int64_t bytes;
+    int64_t *pnum;
     int64_t ret;
     bool mapping;
     bool done;
-} BdrvCoGetBlockStatusData;
+} BdrvCoBlockStatusData;

 int64_t coroutine_fn bdrv_co_get_block_status_from_file(BlockDriverState *bs,
                                                         int64_t sector_num,
@@ -1941,14 +1941,16 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
 /* Coroutine wrapper for bdrv_get_block_status_above() */
 static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
 {
-    BdrvCoGetBlockStatusData *data = opaque;
+    BdrvCoBlockStatusData *data = opaque;
+    int n;

     data->ret = bdrv_co_get_block_status_above(data->bs, data->base,
                                                data->mapping,
-                                               data->sector_num,
-                                               data->nb_sectors,
-                                               data->pnum,
+                                               data->offset >> BDRV_SECTOR_BITS,
+                                               data->bytes >> BDRV_SECTOR_BITS,
+                                               &n,
                                                data->file);
+    *data->pnum = n * BDRV_SECTOR_SIZE;
     data->done = true;
 }

@@ -1965,13 +1967,14 @@ static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
                                               BlockDriverState **file)
 {
     Coroutine *co;
-    BdrvCoGetBlockStatusData data = {
+    int64_t n;
+    BdrvCoBlockStatusData data = {
         .bs = bs,
         .base = base,
         .file = file,
-        .sector_num = sector_num,
-        .nb_sectors = nb_sectors,
-        .pnum = pnum,
+        .offset = sector_num * BDRV_SECTOR_SIZE,
+        .bytes = nb_sectors * BDRV_SECTOR_SIZE,
+        .pnum = &n,
         .mapping = mapping,
         .done = false,
     };
@@ -1985,6 +1988,8 @@ static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
         bdrv_coroutine_enter(bs, co);
         BDRV_POLL_WHILE(bs, !data.done);
     }
+    assert(data.ret < 0 || QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE));
+    *pnum = n >> BDRV_SECTOR_BITS;
     return data.ret;
 }

-- 
2.13.5


* [Qemu-devel] [PATCH v4 10/23] block: Switch bdrv_common_block_status_above() to byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (8 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 09/23] block: Switch BdrvCoGetBlockStatusData " Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-27 18:26   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 11/23] block: Switch bdrv_co_get_block_status_above() " Eric Blake
                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Stefan Hajnoczi, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).
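
One idiom this conversion relies on is an optional out-parameter: the
inner helper always writes through pnum, so a wrapper whose caller may
pass NULL substitutes a local dummy.  A minimal sketch, with
core_status() and is_allocated() as hypothetical stand-ins rather than
the real block-layer functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy core: always writes through pnum, so it must never be NULL here. */
static int core_status(int64_t offset, int64_t bytes, int64_t *pnum)
{
    assert(pnum != NULL);
    (void)offset;
    *pnum = bytes;      /* pretend the whole range has one status */
    return 1;           /* pretend the range is allocated */
}

/* Wrapper: pnum may be NULL when the caller only wants the yes/no answer. */
static int is_allocated(int64_t offset, int64_t bytes, int64_t *pnum)
{
    int64_t dummy;
    int ret = core_status(offset, bytes, pnum ? pnum : &dummy);

    if (ret < 0) {
        return ret;
    }
    return !!ret;
}
```

Once both layers use the same int64_t byte count, the wrapper needs no
temporary and no scaling; the dummy exists only to absorb the write.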

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>

---
v3: rebase to allocation/mapping sense change, simple enough to keep R-b
v2: new patch
---
 block/io.c | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/block/io.c b/block/io.c
index 0adfbb8e70..bc0e3fd0e2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1962,19 +1962,18 @@ static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
 static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
                                               BlockDriverState *base,
                                               bool mapping,
-                                              int64_t sector_num,
-                                              int nb_sectors, int *pnum,
+                                              int64_t offset,
+                                              int64_t bytes, int64_t *pnum,
                                               BlockDriverState **file)
 {
     Coroutine *co;
-    int64_t n;
     BdrvCoBlockStatusData data = {
         .bs = bs,
         .base = base,
         .file = file,
-        .offset = sector_num * BDRV_SECTOR_SIZE,
-        .bytes = nb_sectors * BDRV_SECTOR_SIZE,
-        .pnum = &n,
+        .offset = offset,
+        .bytes = bytes,
+        .pnum = pnum,
         .mapping = mapping,
         .done = false,
     };
@@ -1988,8 +1987,6 @@ static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
         bdrv_coroutine_enter(bs, co);
         BDRV_POLL_WHILE(bs, !data.done);
     }
-    assert(data.ret < 0 || QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE));
-    *pnum = n >> BDRV_SECTOR_BITS;
     return data.ret;
 }

@@ -1999,8 +1996,19 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
                                     int nb_sectors, int *pnum,
                                     BlockDriverState **file)
 {
-    return bdrv_common_block_status_above(bs, base, true, sector_num,
-                                          nb_sectors, pnum, file);
+    int64_t ret;
+    int64_t n;
+
+    ret = bdrv_common_block_status_above(bs, base, true,
+                                         sector_num * BDRV_SECTOR_SIZE,
+                                         nb_sectors * BDRV_SECTOR_SIZE,
+                                         &n, file);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE));
+    *pnum = n >> BDRV_SECTOR_BITS;
+    return ret;
 }

 int64_t bdrv_block_status(BlockDriverState *bs,
@@ -2025,20 +2033,13 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
                                    int64_t bytes, int64_t *pnum)
 {
     int64_t ret;
-    int psectors;
+    int64_t dummy;

-    assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
-    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) && bytes < INT_MAX);
-    ret = bdrv_common_block_status_above(bs, backing_bs(bs), false,
-                                         offset >> BDRV_SECTOR_BITS,
-                                         bytes >> BDRV_SECTOR_BITS, &psectors,
-                                         NULL);
+    ret = bdrv_common_block_status_above(bs, backing_bs(bs), false, offset,
+                                         bytes, pnum ? pnum : &dummy, NULL);
     if (ret < 0) {
         return ret;
     }
-    if (pnum) {
-        *pnum = psectors * BDRV_SECTOR_SIZE;
-    }
     return !!(ret & BDRV_BLOCK_ALLOCATED);
 }

-- 
2.13.5


* [Qemu-devel] [PATCH v4 11/23] block: Switch bdrv_co_get_block_status_above() to byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (9 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 10/23] block: Switch bdrv_common_block_status_above() " Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-27 18:31   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 12/23] block: Convert bdrv_get_block_status_above() to bytes Eric Blake
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Stefan Hajnoczi, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and rename it to match the corresponding
public function rename.
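
The walk down the backing chain can be sketched in isolation as
follows.  This is a simplified model under stated assumptions: Layer,
layer_status() and chain_status() are hypothetical, each layer holds a
single allocated extent, and the end-of-file special case handled by
the real function is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical layer: one allocated extent [start, end) plus a backing file. */
typedef struct Layer {
    int64_t start, end;
    const struct Layer *backing;
} Layer;

/* Returns 1 if offset is allocated in this layer, 0 otherwise.
 * *pnum is the length of the run with the same answer, clamped to bytes. */
static int layer_status(const Layer *l, int64_t offset, int64_t bytes,
                        int64_t *pnum)
{
    assert(bytes > 0);
    if (offset >= l->start && offset < l->end) {
        *pnum = MIN(bytes, l->end - offset);
        return 1;
    }
    /* Unallocated: the run ends where the extent begins, if that is ahead. */
    *pnum = offset < l->start ? MIN(bytes, l->start - offset) : bytes;
    return 0;
}

/* Query layers from top down to (but excluding) base. */
static int chain_status(const Layer *top, const Layer *base,
                        int64_t offset, int64_t bytes, int64_t *pnum)
{
    int ret = 0;

    for (const Layer *p = top; p != base; p = p->backing) {
        ret = layer_status(p, offset, bytes, pnum);
        if (ret) {
            break;
        }
        /* Unallocated here, possibly only for the first part of the range:
         * lower layers must not report more than this layer allowed. */
        bytes = MIN(bytes, *pnum);
    }
    return ret;
}
```

With byte counts throughout, the clamp is a plain MIN() on the same
unit that the caller passed in, with no conversion inside the loop.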

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>

---
v3: rebase to allocation/mapping sense change, simple enough to keep R-b
v2: rebase to earlier changes
---
 block/io.c | 48 ++++++++++++++++++------------------------------
 1 file changed, 18 insertions(+), 30 deletions(-)

diff --git a/block/io.c b/block/io.c
index bc0e3fd0e2..409cfe0938 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1893,12 +1893,12 @@ out:
     return ret;
 }

-static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
+static int64_t coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
         BlockDriverState *base,
         bool mapping,
-        int64_t sector_num,
-        int nb_sectors,
-        int *pnum,
+        int64_t offset,
+        int64_t bytes,
+        int64_t *pnum,
         BlockDriverState **file)
 {
     BlockDriverState *p;
@@ -1907,17 +1907,10 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,

     assert(bs != base);
     for (p = bs; p != base; p = backing_bs(p)) {
-        int64_t count;
-
-        ret = bdrv_co_block_status(p, mapping,
-                                   sector_num * BDRV_SECTOR_SIZE,
-                                   nb_sectors * BDRV_SECTOR_SIZE, &count,
-                                   file);
+        ret = bdrv_co_block_status(p, mapping, offset, bytes, pnum, file);
         if (ret < 0) {
             break;
         }
-        assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
-        *pnum = count >> BDRV_SECTOR_BITS;
         if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
             /*
              * Reading beyond the end of the file continues to read
@@ -1925,39 +1918,35 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
              * unallocated length we learned from an earlier
              * iteration.
              */
-            *pnum = nb_sectors;
+            *pnum = bytes;
         }
         if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
             break;
         }
-        /* [sector_num, pnum] unallocated on this layer, which could be only
-         * the first part of [sector_num, nb_sectors].  */
-        nb_sectors = MIN(nb_sectors, *pnum);
+        /* [offset, pnum] unallocated on this layer, which could be only
+         * the first part of [offset, bytes].  */
+        bytes = MIN(bytes, *pnum);
         first = false;
     }
     return ret;
 }

 /* Coroutine wrapper for bdrv_get_block_status_above() */
-static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
+static void coroutine_fn bdrv_block_status_above_co_entry(void *opaque)
 {
     BdrvCoBlockStatusData *data = opaque;
-    int n;

-    data->ret = bdrv_co_get_block_status_above(data->bs, data->base,
-                                               data->mapping,
-                                               data->offset >> BDRV_SECTOR_BITS,
-                                               data->bytes >> BDRV_SECTOR_BITS,
-                                               &n,
-                                               data->file);
-    *data->pnum = n * BDRV_SECTOR_SIZE;
+    data->ret = bdrv_co_block_status_above(data->bs, data->base,
+                                           data->mapping,
+                                           data->offset, data->bytes,
+                                           data->pnum, data->file);
     data->done = true;
 }

 /*
- * Synchronous wrapper around bdrv_co_get_block_status_above().
+ * Synchronous wrapper around bdrv_co_block_status_above().
  *
- * See bdrv_co_get_block_status_above() for details.
+ * See bdrv_co_block_status_above() for details.
  */
 static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
                                               BlockDriverState *base,
@@ -1980,10 +1969,9 @@ static int64_t bdrv_common_block_status_above(BlockDriverState *bs,

     if (qemu_in_coroutine()) {
         /* Fast-path if already in coroutine context */
-        bdrv_get_block_status_above_co_entry(&data);
+        bdrv_block_status_above_co_entry(&data);
     } else {
-        co = qemu_coroutine_create(bdrv_get_block_status_above_co_entry,
-                                   &data);
+        co = qemu_coroutine_create(bdrv_block_status_above_co_entry, &data);
         bdrv_coroutine_enter(bs, co);
         BDRV_POLL_WHILE(bs, !data.done);
     }
-- 
2.13.5


* [Qemu-devel] [PATCH v4 12/23] block: Convert bdrv_get_block_status_above() to bytes
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (10 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 11/23] block: Switch bdrv_co_get_block_status_above() " Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-27 18:41   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 13/23] qemu-img: Simplify logic in img_compare() Eric Blake
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: famz, jsnow, kwolf, qemu-block, Stefan Hajnoczi, Max Reitz, Jeff Cody

We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.
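
To make the precision argument concrete, consider a 1000-byte file.
The sketch below is illustrative only; data_bytes() and data_sectors()
are hypothetical helpers, and DIV_ROUND_UP is defined locally with the
usual round-up-division meaning.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Byte-based view: the data run ends exactly at end-of-file. */
static int64_t data_bytes(int64_t file_len, int64_t offset)
{
    assert(offset >= 0 && offset <= file_len);
    return file_len - offset;
}

/* Sector-based view: only whole sectors exist, so the tail is rounded up. */
static int64_t data_sectors(int64_t file_len, int64_t sector_num)
{
    int64_t total = DIV_ROUND_UP(file_len, SECTOR_SIZE);

    assert(sector_num >= 0 && sector_num <= total);
    return total - sector_num;
}
```

Starting at byte 512, the byte-based helper reports 488 bytes of data,
while the sector-based one can only say "1 sector", which overstates
the file by 24 bytes.  Callers that still work in sectors have to
round a byte count up rather than shift it down for the same reason.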

Changing the name of the function from bdrv_get_block_status_above()
to bdrv_block_status_above() ensures that the compiler enforces that
all callers are updated.  For now, the io.c layer still assert()s
that all callers are sector-aligned, but that can be relaxed when a
later patch implements byte-based block status in the drivers.

For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_block_status().  But some
code, particularly bdrv_block_status(), gets a lot simpler because
it no longer has to mess with sectors.  Likewise, mirror code no
longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore
drop an assertion (fix a neighboring assertion to use is_power_of_2
while there).
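
For reference, the difference between the open-coded bit test and a
power-of-two helper shows up at zero.  The sketch uses local stand-ins
(is_pow2(), open_coded(), check_granularity() are hypothetical names)
and assumes the helper treats zero as not a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for a power-of-two helper; zero is rejected. */
static bool is_pow2(uint64_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}

/* The open-coded form: also true for zero. */
static bool open_coded(uint64_t value)
{
    return (value & (value - 1)) == 0;
}

/* A single assertion covers both "power of two" and "non-zero". */
static void check_granularity(uint64_t granularity)
{
    assert(is_pow2(granularity));
}
```

A granularity of 0 passes the open-coded test but fails the helper, so
the helper form is the stricter of the two.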

For ease of review, bdrv_get_block_status() was tackled separately.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: rebase to earlier changes
v3: rebase to allocation/mapping sense change and qcow2-measure, tweak
mirror assertions, drop R-b
v2: rebase to earlier changes
---
 include/block/block.h | 10 +++++-----
 block/io.c            | 39 ++++++++-------------------------------
 block/mirror.c        | 16 +++++-----------
 block/qcow2.c         | 25 +++++++++----------------
 qemu-img.c            | 39 +++++++++++++++++++++++----------------
 5 files changed, 50 insertions(+), 79 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 7a9a8db588..e87348dcfa 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -426,11 +426,11 @@ bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int64_t bdrv_block_status(BlockDriverState *bs, int64_t offset,
                           int64_t bytes, int64_t *pnum,
                           BlockDriverState **file);
-int64_t bdrv_get_block_status_above(BlockDriverState *bs,
-                                    BlockDriverState *base,
-                                    int64_t sector_num,
-                                    int nb_sectors, int *pnum,
-                                    BlockDriverState **file);
+int64_t bdrv_block_status_above(BlockDriverState *bs,
+                                BlockDriverState *base,
+                                int64_t offset,
+                                int64_t bytes, int64_t *pnum,
+                                BlockDriverState **file);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
                       int64_t *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
diff --git a/block/io.c b/block/io.c
index 409cfe0938..ea63d19480 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1931,7 +1931,7 @@ static int64_t coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
     return ret;
 }

-/* Coroutine wrapper for bdrv_get_block_status_above() */
+/* Coroutine wrapper for bdrv_block_status_above() */
 static void coroutine_fn bdrv_block_status_above_co_entry(void *opaque)
 {
     BdrvCoBlockStatusData *data = opaque;
@@ -1978,43 +1978,20 @@ static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
     return data.ret;
 }

-int64_t bdrv_get_block_status_above(BlockDriverState *bs,
-                                    BlockDriverState *base,
-                                    int64_t sector_num,
-                                    int nb_sectors, int *pnum,
-                                    BlockDriverState **file)
+int64_t bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
+                                int64_t offset, int64_t bytes, int64_t *pnum,
+                                BlockDriverState **file)
 {
-    int64_t ret;
-    int64_t n;
-
-    ret = bdrv_common_block_status_above(bs, base, true,
-                                         sector_num * BDRV_SECTOR_SIZE,
-                                         nb_sectors * BDRV_SECTOR_SIZE,
-                                         &n, file);
-    if (ret < 0) {
-        return ret;
-    }
-    assert(QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE));
-    *pnum = n >> BDRV_SECTOR_BITS;
-    return ret;
+    return bdrv_common_block_status_above(bs, base, true, offset, bytes,
+                                          pnum, file);
 }

 int64_t bdrv_block_status(BlockDriverState *bs,
                           int64_t offset, int64_t bytes, int64_t *pnum,
                           BlockDriverState **file)
 {
-    int64_t ret;
-    int n;
-
-    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
-    bytes = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
-    ret = bdrv_get_block_status_above(bs, backing_bs(bs),
-                                      offset >> BDRV_SECTOR_BITS,
-                                      bytes >> BDRV_SECTOR_BITS, &n, file);
-    if (pnum) {
-        *pnum = n * BDRV_SECTOR_SIZE;
-    }
-    return ret;
+    return bdrv_block_status_above(bs, backing_bs(bs),
+                                   offset, bytes, pnum, file);
 }

 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
diff --git a/block/mirror.c b/block/mirror.c
index 67f45cec4e..fab59739c5 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -328,7 +328,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     uint64_t delay_ns = 0;
     /* At least the first dirty chunk is mirrored in one iteration. */
     int nb_chunks = 1;
-    int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
     bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
     int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);

@@ -376,7 +375,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     }

     /* Clear dirty bits before querying the block status, because
-     * calling bdrv_get_block_status_above could yield - if some blocks are
+     * calling bdrv_block_status_above could yield - if some blocks are
      * marked dirty in this window, we need to know.
      */
     bdrv_reset_dirty_bitmap_locked(s->dirty_bitmap, offset,
@@ -386,7 +385,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     bitmap_set(s->in_flight_bitmap, offset / s->granularity, nb_chunks);
     while (nb_chunks > 0 && offset < s->bdev_length) {
         int64_t ret;
-        int io_sectors;
         int64_t io_bytes;
         int64_t io_bytes_acct;
         enum MirrorMethod {
@@ -396,11 +394,9 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         } mirror_method = MIRROR_METHOD_COPY;

         assert(!(offset % s->granularity));
-        ret = bdrv_get_block_status_above(source, NULL,
-                                          offset >> BDRV_SECTOR_BITS,
-                                          nb_chunks * sectors_per_chunk,
-                                          &io_sectors, NULL);
-        io_bytes = io_sectors * BDRV_SECTOR_SIZE;
+        ret = bdrv_block_status_above(source, NULL, offset,
+                                      nb_chunks * s->granularity,
+                                      &io_bytes, NULL);
         if (ret < 0) {
             io_bytes = MIN(nb_chunks * s->granularity, max_io_bytes);
         } else if (ret & BDRV_BLOCK_DATA) {
@@ -1121,9 +1117,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
         granularity = bdrv_get_default_bitmap_granularity(target);
     }

-    assert ((granularity & (granularity - 1)) == 0);
-    /* Granularity must be large enough for sector-based dirty bitmap */
-    assert(granularity >= BDRV_SECTOR_SIZE);
+    assert(is_power_of_2(granularity));

     if (buf_size < 0) {
         error_setg(errp, "Invalid parameter 'buf-size'");
diff --git a/block/qcow2.c b/block/qcow2.c
index eb498b56d4..721cb077fe 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2973,7 +2973,7 @@ finish:

 static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
-    int nr;
+    int64_t nr;
     int64_t res;
     int64_t start;

@@ -2989,10 +2989,8 @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
     if (!bytes) {
         return true;
     }
-    res = bdrv_get_block_status_above(bs, NULL, start >> BDRV_SECTOR_BITS,
-                                      bytes >> BDRV_SECTOR_BITS, &nr, NULL);
-    return res >= 0 && (res & BDRV_BLOCK_ZERO) &&
-        nr * BDRV_SECTOR_SIZE == bytes;
+    res = bdrv_block_status_above(bs, NULL, start, bytes, &nr, NULL);
+    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
 }

 static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
@@ -3650,17 +3648,13 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
             required = virtual_size;
         } else {
             int64_t offset;
-            int pnum = 0;
+            int64_t pnum = 0;

-            for (offset = 0; offset < ssize;
-                 offset += pnum * BDRV_SECTOR_SIZE) {
-                int nb_sectors = MIN(ssize - offset,
-                                     BDRV_REQUEST_MAX_BYTES) / BDRV_SECTOR_SIZE;
+            for (offset = 0; offset < ssize; offset += pnum) {
                 int64_t ret;

-                ret = bdrv_get_block_status_above(in_bs, NULL,
-                                                  offset >> BDRV_SECTOR_BITS,
-                                                  nb_sectors, &pnum, NULL);
+                ret = bdrv_block_status_above(in_bs, NULL, offset,
+                                              ssize - offset, &pnum, NULL);
                 if (ret < 0) {
                     error_setg_errno(&local_err, -ret,
                                      "Unable to get block status");
@@ -3672,11 +3666,10 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
                 } else if ((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
                            (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
                     /* Extend pnum to end of cluster for next iteration */
-                    pnum = (ROUND_UP(offset + pnum * BDRV_SECTOR_SIZE,
-                                 cluster_size) - offset) >> BDRV_SECTOR_BITS;
+                    pnum = ROUND_UP(offset + pnum, cluster_size) - offset;

                     /* Count clusters we've seen */
-                    required += offset % cluster_size + pnum * BDRV_SECTOR_SIZE;
+                    required += offset % cluster_size + pnum;
                 }
             }
         }
diff --git a/qemu-img.c b/qemu-img.c
index 897f80abb3..b91133b922 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1225,7 +1225,7 @@ static int img_compare(int argc, char **argv)
     BlockDriverState *bs1, *bs2;
     int64_t total_sectors1, total_sectors2;
     uint8_t *buf1 = NULL, *buf2 = NULL;
-    int pnum1, pnum2;
+    int64_t pnum1, pnum2;
     int allocated1, allocated2;
     int ret = 0; /* return value - 0 Ident, 1 Different, >1 Error */
     bool progress = false, quiet = false, strict = false;
@@ -1379,9 +1379,11 @@ static int img_compare(int argc, char **argv)
         if (nb_sectors <= 0) {
             break;
         }
-        status1 = bdrv_get_block_status_above(bs1, NULL, sector_num,
-                                              total_sectors1 - sector_num,
-                                              &pnum1, NULL);
+        status1 = bdrv_block_status_above(bs1, NULL,
+                                          sector_num * BDRV_SECTOR_SIZE,
+                                          (total_sectors1 - sector_num) *
+                                          BDRV_SECTOR_SIZE,
+                                          &pnum1, NULL);
         if (status1 < 0) {
             ret = 3;
             error_report("Sector allocation test failed for %s", filename1);
@@ -1389,9 +1391,11 @@ static int img_compare(int argc, char **argv)
         }
         allocated1 = status1 & BDRV_BLOCK_ALLOCATED;

-        status2 = bdrv_get_block_status_above(bs2, NULL, sector_num,
-                                              total_sectors2 - sector_num,
-                                              &pnum2, NULL);
+        status2 = bdrv_block_status_above(bs2, NULL,
+                                          sector_num * BDRV_SECTOR_SIZE,
+                                          (total_sectors2 - sector_num) *
+                                          BDRV_SECTOR_SIZE,
+                                          &pnum2, NULL);
         if (status2 < 0) {
             ret = 3;
             error_report("Sector allocation test failed for %s", filename2);
@@ -1399,10 +1403,12 @@ static int img_compare(int argc, char **argv)
         }
         allocated2 = status2 & BDRV_BLOCK_ALLOCATED;
         if (pnum1) {
-            nb_sectors = MIN(nb_sectors, pnum1);
+            nb_sectors = MIN(nb_sectors,
+                             DIV_ROUND_UP(pnum1, BDRV_SECTOR_SIZE));
         }
         if (pnum2) {
-            nb_sectors = MIN(nb_sectors, pnum2);
+            nb_sectors = MIN(nb_sectors,
+                             DIV_ROUND_UP(pnum2, BDRV_SECTOR_SIZE));
         }

         if (strict) {
@@ -1416,7 +1422,7 @@ static int img_compare(int argc, char **argv)
             }
         }
         if ((status1 & BDRV_BLOCK_ZERO) && (status2 & BDRV_BLOCK_ZERO)) {
-            nb_sectors = MIN(pnum1, pnum2);
+            nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE);
         } else if (allocated1 == allocated2) {
             if (allocated1) {
                 ret = blk_pread(blk1, sector_num << BDRV_SECTOR_BITS, buf1,
@@ -1597,23 +1603,24 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
     n = MIN(s->total_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);

     if (s->sector_next_status <= sector_num) {
+        int64_t count = n * BDRV_SECTOR_SIZE;
+
         if (s->target_has_backing) {
-            int64_t count = n * BDRV_SECTOR_SIZE;

             ret = bdrv_block_status(blk_bs(s->src[src_cur]),
                                     (sector_num - src_cur_offset) *
                                     BDRV_SECTOR_SIZE,
                                     count, &count, NULL);
-            assert(ret < 0 || QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
-            n = count >> BDRV_SECTOR_BITS;
         } else {
-            ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
-                                              sector_num - src_cur_offset,
-                                              n, &n, NULL);
+            ret = bdrv_block_status_above(blk_bs(s->src[src_cur]), NULL,
+                                          (sector_num - src_cur_offset) *
+                                          BDRV_SECTOR_SIZE,
+                                          count, &count, NULL);
         }
         if (ret < 0) {
             return ret;
         }
+        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);

         if (ret & BDRV_BLOCK_ZERO) {
             s->status = BLK_ZERO;
-- 
2.13.5


* [Qemu-devel] [PATCH v4 13/23] qemu-img: Simplify logic in img_compare()
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (11 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 12/23] block: Convert bdrv_get_block_status_above() to bytes Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-27 19:05   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 14/23] qemu-img: Speed up compare on pre-allocated larger file Eric Blake
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz

As long as we are querying the status for a chunk smaller than
the known image size, we are guaranteed that a successful return
will have set pnum to a non-zero size (pnum is zero only for
queries beyond the end of the file).  Use that to slightly
simplify the calculation of the current chunk size being compared.
Likewise, we don't have to shrink the amount of data operated on
until we know we have to read the file, and therefore have to fit
in the bounds of our buffer.  Also, note that 'total_sectors_over'
is equivalent to 'progress_base'.

With these changes in place, sectors_to_process() is now dead code,
and can be removed.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: new patch
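
As an illustration of the chunk-size logic described in the commit
message above, here is a minimal standalone sketch. It is not the
qemu-img code: SECTOR_SIZE and IO_BUF_BYTES are stand-ins for
BDRV_SECTOR_SIZE and IO_BUF_SIZE, and the two helper names are
hypothetical. It assumes both block status queries succeeded and
returned non-zero extents.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512                 /* stand-in for BDRV_SECTOR_SIZE */
#define IO_BUF_BYTES (2 * 1024 * 1024)  /* stand-in for IO_BUF_SIZE */

/*
 * Number of sectors covered by the smaller of two status extents,
 * rounding a partial trailing sector up (hypothetical helper).
 */
static int64_t chunk_sectors(int64_t pnum1, int64_t pnum2)
{
    int64_t bytes = pnum1 < pnum2 ? pnum1 : pnum2;

    /* A successful query inside the image never reports zero bytes. */
    assert(pnum1 > 0 && pnum2 > 0);
    return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/*
 * Clamp a chunk to what fits in the I/O buffer; only needed on the
 * paths that actually read data (hypothetical helper).
 */
static int64_t clamp_to_buffer(int64_t nb_sectors)
{
    int64_t max = IO_BUF_BYTES / SECTOR_SIZE;

    return nb_sectors < max ? nb_sectors : max;
}
```

The point of the split is that regions known to be zero in both
images can be skipped in one step, however large, while the clamp
applies only when a buffer is about to be filled.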
---
 qemu-img.c | 40 +++++++++++-----------------------------
 1 file changed, 11 insertions(+), 29 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index b91133b922..f8423e9b3f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1171,11 +1171,6 @@ static int64_t sectors_to_bytes(int64_t sectors)
     return sectors << BDRV_SECTOR_BITS;
 }

-static int64_t sectors_to_process(int64_t total, int64_t from)
-{
-    return MIN(total - from, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
-}
-
 /*
  * Check if passed sectors are empty (not allocated or contain only 0 bytes)
  *
@@ -1372,13 +1367,9 @@ static int img_compare(int argc, char **argv)
         goto out;
     }

-    for (;;) {
+    while (sector_num < total_sectors) {
         int64_t status1, status2;

-        nb_sectors = sectors_to_process(total_sectors, sector_num);
-        if (nb_sectors <= 0) {
-            break;
-        }
         status1 = bdrv_block_status_above(bs1, NULL,
                                           sector_num * BDRV_SECTOR_SIZE,
                                           (total_sectors1 - sector_num) *
@@ -1402,14 +1393,9 @@ static int img_compare(int argc, char **argv)
             goto out;
         }
         allocated2 = status2 & BDRV_BLOCK_ALLOCATED;
-        if (pnum1) {
-            nb_sectors = MIN(nb_sectors,
-                             DIV_ROUND_UP(pnum1, BDRV_SECTOR_SIZE));
-        }
-        if (pnum2) {
-            nb_sectors = MIN(nb_sectors,
-                             DIV_ROUND_UP(pnum2, BDRV_SECTOR_SIZE));
-        }
+
+        assert(pnum1 && pnum2);
+        nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE);

         if (strict) {
             if ((status1 & ~BDRV_BLOCK_OFFSET_MASK) !=
@@ -1422,9 +1408,10 @@ static int img_compare(int argc, char **argv)
             }
         }
         if ((status1 & BDRV_BLOCK_ZERO) && (status2 & BDRV_BLOCK_ZERO)) {
-            nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE);
+            /* nothing to do */
         } else if (allocated1 == allocated2) {
             if (allocated1) {
+                nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
                 ret = blk_pread(blk1, sector_num << BDRV_SECTOR_BITS, buf1,
                                 nb_sectors << BDRV_SECTOR_BITS);
                 if (ret < 0) {
@@ -1453,7 +1440,7 @@ static int img_compare(int argc, char **argv)
                 }
             }
         } else {
-
+            nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
             if (allocated1) {
                 ret = check_empty_sectors(blk1, sector_num, nb_sectors,
                                           filename1, buf1, quiet);
@@ -1476,30 +1463,24 @@ static int img_compare(int argc, char **argv)

     if (total_sectors1 != total_sectors2) {
         BlockBackend *blk_over;
-        int64_t total_sectors_over;
         const char *filename_over;

         qprintf(quiet, "Warning: Image size mismatch!\n");
         if (total_sectors1 > total_sectors2) {
-            total_sectors_over = total_sectors1;
             blk_over = blk1;
             filename_over = filename1;
         } else {
-            total_sectors_over = total_sectors2;
             blk_over = blk2;
             filename_over = filename2;
         }

-        for (;;) {
+        while (sector_num < progress_base) {
             int64_t count;

-            nb_sectors = sectors_to_process(total_sectors_over, sector_num);
-            if (nb_sectors <= 0) {
-                break;
-            }
             ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL,
                                           sector_num * BDRV_SECTOR_SIZE,
-                                          nb_sectors * BDRV_SECTOR_SIZE,
+                                          (progress_base - sector_num) *
+                                          BDRV_SECTOR_SIZE,
                                           &count);
             if (ret < 0) {
                 ret = 3;
@@ -1513,6 +1494,7 @@ static int img_compare(int argc, char **argv)
             assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
             nb_sectors = count >> BDRV_SECTOR_BITS;
             if (ret) {
+                nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
                 ret = check_empty_sectors(blk_over, sector_num, nb_sectors,
                                           filename_over, buf1, quiet);
                 if (ret) {
-- 
2.13.5

* [Qemu-devel] [PATCH v4 14/23] qemu-img: Speed up compare on pre-allocated larger file
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (12 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 13/23] qemu-img: Simplify logic in img_compare() Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-27 20:54   ` John Snow
  2017-10-03  9:32   ` Vladimir Sementsov-Ogievskiy
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 15/23] qemu-img: Add find_nonzero() Eric Blake
                   ` (9 subsequent siblings)
  23 siblings, 2 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz

Compare the following images with all-zero contents:
$ truncate --size 1M A
$ qemu-img create -f qcow2 -o preallocation=off B 1G
$ qemu-img create -f qcow2 -o preallocation=metadata C 1G

On my machine, the pre-patch speeds differ noticeably: the choice
of preallocation in the qcow2 file accounts for more than an order
of magnitude:

$ time ./qemu-img compare -f raw -F qcow2 A B
Warning: Image size mismatch!
Images are identical.

real	0m0.014s
user	0m0.007s
sys	0m0.007s

$ time ./qemu-img compare -f raw -F qcow2 A C
Warning: Image size mismatch!
Images are identical.

real	0m0.341s
user	0m0.144s
sys	0m0.188s

Why? Because bdrv_is_allocated() returns false for image B but
true for image C, throwing away the fact that both images know
via lseek(SEEK_HOLE) that the entire image still reads as zero.
From there, qemu-img ends up calling bdrv_pread() for every byte
of the tail, instead of quickly looking for the next allocation.
The solution: use block_status instead of is_allocated, giving:

$ time ./qemu-img compare -f raw -F qcow2 A C
Warning: Image size mismatch!
Images are identical.

real	0m0.014s
user	0m0.011s
sys	0m0.003s

which is on par with the speeds for no pre-allocation.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: new patch
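
The decision the commit message above describes (read the tail only
when it is allocated and not already known to read as zero) can be
sketched in isolation. This is an illustrative sketch, not the QEMU
code: the flag macros are stand-ins for BDRV_BLOCK_ZERO and
BDRV_BLOCK_ALLOCATED, whose real values live in QEMU's block
headers, and needs_read() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits; assumed for illustration only. */
#define BLOCK_ZERO      0x02
#define BLOCK_ALLOCATED 0x10

/*
 * True only when the extent must be read and scanned for non-zero
 * bytes: it is allocated, and the driver did not already report
 * that it reads as zero.
 */
static bool needs_read(int64_t status)
{
    return (status & BLOCK_ALLOCATED) && !(status & BLOCK_ZERO);
}
```

With a plain is_allocated query, the zero bit is lost, so a
preallocated-but-zero extent looks the same as real data and has to
be read.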
---
 qemu-img.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index f8423e9b3f..f5ab29d176 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1477,11 +1477,11 @@ static int img_compare(int argc, char **argv)
         while (sector_num < progress_base) {
             int64_t count;

-            ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL,
+            ret = bdrv_block_status_above(blk_bs(blk_over), NULL,
                                           sector_num * BDRV_SECTOR_SIZE,
                                           (progress_base - sector_num) *
                                           BDRV_SECTOR_SIZE,
-                                          &count);
+                                          &count, NULL);
             if (ret < 0) {
                 ret = 3;
                 error_report("Sector allocation test failed for %s",
@@ -1489,11 +1489,11 @@ static int img_compare(int argc, char **argv)
                 goto out;

             }
-            /* TODO relax this once bdrv_is_allocated_above does not enforce
+            /* TODO relax this once bdrv_block_status_above does not enforce
              * sector alignment */
             assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
             nb_sectors = count >> BDRV_SECTOR_BITS;
-            if (ret) {
+            if (ret & BDRV_BLOCK_ALLOCATED && !(ret & BDRV_BLOCK_ZERO)) {
                 nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
                 ret = check_empty_sectors(blk_over, sector_num, nb_sectors,
                                           filename_over, buf1, quiet);
-- 
2.13.5

* [Qemu-devel] [PATCH v4 15/23] qemu-img: Add find_nonzero()
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (13 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 14/23] qemu-img: Speed up compare on pre-allocated larger file Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-27 21:16   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 16/23] qemu-img: Drop redundant error message in compare Eric Blake
                   ` (8 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz

During 'qemu-img compare', when we are checking that an allocated
portion of one file is all zeros, we don't need to waste time
computing how many additional sectors after the first non-zero
byte are also non-zero.  Create a new helper find_nonzero() to do
the check for a first non-zero sector, and rebase
check_empty_sectors() to use it.

The new helper intentionally takes a byte count rather than a sector
count, even though it still crawls the buffer a sector at a time; it
is robust to a partial sector at the end of the buffer.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: new patch
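
For readers who want to try the scan outside of QEMU, here is a
minimal standalone sketch of the same idea. It is an assumption-laden
rewrite, not the patch itself: is_zero() is a naive stand-in for
QEMU's optimized buffer_is_zero(), SECTOR_SIZE stands in for
BDRV_SECTOR_SIZE, and first_nonzero_sector() is a hypothetical name.
It folds the partial trailing sector into the main loop instead of
handling it separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512  /* stand-in for BDRV_SECTOR_SIZE */

/* Naive stand-in for QEMU's buffer_is_zero(). */
static bool is_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Returns -1 if the first n bytes of buf are all zero, otherwise the
 * byte offset of the start of the first sector holding a non-zero
 * byte.  The last sector may be partial.
 */
static int64_t first_nonzero_sector(const uint8_t *buf, int64_t n)
{
    assert(n >= 0);
    for (int64_t i = 0; i < n; i += SECTOR_SIZE) {
        int64_t len = n - i < SECTOR_SIZE ? n - i : SECTOR_SIZE;

        if (!is_zero(buf + i, (size_t)len)) {
            return i;
        }
    }
    return -1;
}
```

The return value is sector-granular on purpose: the caller only
reports the offset of the first mismatching sector and stops.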
---
 qemu-img.c | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index f5ab29d176..dfccebe6bc 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1064,6 +1064,28 @@ done:
 }

 /*
+ * Returns -1 if 'buf' contains only zeroes, otherwise the byte index
+ * of the first sector boundary within buf where the sector contains a
+ * non-zero byte.  This function is robust to a buffer that is not
+ * sector-aligned.
+ */
+static int64_t find_nonzero(const uint8_t *buf, int64_t n)
+{
+    int64_t i;
+    int64_t end = QEMU_ALIGN_DOWN(n, BDRV_SECTOR_SIZE);
+
+    for (i = 0; i < end; i += BDRV_SECTOR_SIZE) {
+        if (!buffer_is_zero(buf + i, BDRV_SECTOR_SIZE)) {
+            return i;
+        }
+    }
+    if (i < n && !buffer_is_zero(buf + i, n - end)) {
+        return i;
+    }
+    return -1;
+}
+
+/*
  * Returns true iff the first sector pointed to by 'buf' contains at least
  * a non-NUL byte.
  *
@@ -1188,7 +1210,9 @@ static int check_empty_sectors(BlockBackend *blk, int64_t sect_num,
                                int sect_count, const char *filename,
                                uint8_t *buffer, bool quiet)
 {
-    int pnum, ret = 0;
+    int ret = 0;
+    int64_t idx;
+
     ret = blk_pread(blk, sect_num << BDRV_SECTOR_BITS, buffer,
                     sect_count << BDRV_SECTOR_BITS);
     if (ret < 0) {
@@ -1196,10 +1220,10 @@ static int check_empty_sectors(BlockBackend *blk, int64_t sect_num,
                      sectors_to_bytes(sect_num), filename, strerror(-ret));
         return ret;
     }
-    ret = is_allocated_sectors(buffer, sect_count, &pnum);
-    if (ret || pnum != sect_count) {
+    idx = find_nonzero(buffer, sect_count * BDRV_SECTOR_SIZE);
+    if (idx >= 0) {
         qprintf(quiet, "Content mismatch at offset %" PRId64 "!\n",
-                sectors_to_bytes(ret ? sect_num : sect_num + pnum));
+                sectors_to_bytes(sect_num) + idx);
         return 1;
     }

-- 
2.13.5

* [Qemu-devel] [PATCH v4 16/23] qemu-img: Drop redundant error message in compare
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (14 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 15/23] qemu-img: Add find_nonzero() Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-27 21:35   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 17/23] qemu-img: Change check_empty_sectors() to byte-based Eric Blake
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz

If a read error is encountered during 'qemu-img compare', we
were printing the "Error while reading offset ..." message twice.
Update the testsuite for the improved output.

Further simplify the code by hoisting the error code conversion
into the helper function, rather than repeating it at the callers.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: new patch
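
The pattern of hoisting the error-code conversion into the helper
can be sketched as follows. This is only an illustration under
stated assumptions: the enum names and check_result() are
hypothetical, the values 0, 1 and 4 are the ones documented in the
helper's comment in this patch, and fprintf() stands in for
error_report().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Result codes as documented for the helper; names are hypothetical. */
enum {
    CMP_SAME = 0,        /* region is all zero */
    CMP_DIFFERENT = 1,   /* region holds non-zero data */
    CMP_READ_ERROR = 4   /* read failed */
};

/*
 * Convert a negative errno-style read result into the final code and
 * report it exactly once, so that callers only need 'if (ret) goto
 * out;' and never print a second message.
 */
static int check_result(int read_ret, int has_nonzero)
{
    if (read_ret < 0) {
        fprintf(stderr, "Error while reading: %s\n", strerror(-read_ret));
        return CMP_READ_ERROR;
    }
    return has_nonzero ? CMP_DIFFERENT : CMP_SAME;
}
```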
---
 qemu-img.c                 | 19 +++++--------------
 tests/qemu-iotests/074.out |  2 --
 2 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index dfccebe6bc..3e1e373e8f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1196,8 +1196,10 @@ static int64_t sectors_to_bytes(int64_t sectors)
 /*
  * Check if passed sectors are empty (not allocated or contain only 0 bytes)
  *
- * Returns 0 in case sectors are filled with 0, 1 if sectors contain non-zero
- * data and negative value on error.
+ * Intended for use by 'qemu-img compare': Returns 0 in case sectors are
+ * filled with 0, 1 if sectors contain non-zero data (this is a comparison
+ * failure), and 4 on error (the exit status for read errors), after emitting
+ * an error message.
  *
  * @param blk:  BlockBackend for the image
  * @param sect_num: Number of first sector to check
@@ -1218,7 +1220,7 @@ static int check_empty_sectors(BlockBackend *blk, int64_t sect_num,
     if (ret < 0) {
         error_report("Error while reading offset %" PRId64 " of %s: %s",
                      sectors_to_bytes(sect_num), filename, strerror(-ret));
-        return ret;
+        return 4;
     }
     idx = find_nonzero(buffer, sect_count * BDRV_SECTOR_SIZE);
     if (idx >= 0) {
@@ -1473,11 +1475,6 @@ static int img_compare(int argc, char **argv)
                                           filename2, buf1, quiet);
             }
             if (ret) {
-                if (ret < 0) {
-                    error_report("Error while reading offset %" PRId64 ": %s",
-                                 sectors_to_bytes(sector_num), strerror(-ret));
-                    ret = 4;
-                }
                 goto out;
             }
         }
@@ -1522,12 +1519,6 @@ static int img_compare(int argc, char **argv)
                 ret = check_empty_sectors(blk_over, sector_num, nb_sectors,
                                           filename_over, buf1, quiet);
                 if (ret) {
-                    if (ret < 0) {
-                        error_report("Error while reading offset %" PRId64
-                                     " of %s: %s", sectors_to_bytes(sector_num),
-                                     filename_over, strerror(-ret));
-                        ret = 4;
-                    }
                     goto out;
                 }
             }
diff --git a/tests/qemu-iotests/074.out b/tests/qemu-iotests/074.out
index 8fba5aea9c..ede66c3f81 100644
--- a/tests/qemu-iotests/074.out
+++ b/tests/qemu-iotests/074.out
@@ -4,7 +4,6 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 wrote 512/512 bytes at offset 512
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 qemu-img: Error while reading offset 0 of blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT: Input/output error
-qemu-img: Error while reading offset 0: Input/output error
 4
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 Formatting 'TEST_DIR/t.IMGFMT.2', fmt=IMGFMT size=0
@@ -12,7 +11,6 @@ Formatting 'TEST_DIR/t.IMGFMT.2', fmt=IMGFMT size=0
 wrote 512/512 bytes at offset 512
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 qemu-img: Error while reading offset 0 of blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT: Input/output error
-qemu-img: Error while reading offset 0 of blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT: Input/output error
 Warning: Image size mismatch!
 4
 Cleanup
-- 
2.13.5

* [Qemu-devel] [PATCH v4 17/23] qemu-img: Change check_empty_sectors() to byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (15 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 16/23] qemu-img: Drop redundant error message in compare Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-27 21:43   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 18/23] qemu-img: Change compare_sectors() to be byte-based Eric Blake
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz

Continue on the quest to make more things byte-based instead of
sector-based.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: new patch
---
 qemu-img.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 3e1e373e8f..2e05f92e85 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1202,30 +1202,29 @@ static int64_t sectors_to_bytes(int64_t sectors)
  * an error message.
  *
  * @param blk:  BlockBackend for the image
- * @param sect_num: Number of first sector to check
- * @param sect_count: Number of sectors to check
+ * @param offset: Starting offset to check
+ * @param bytes: Number of bytes to check
  * @param filename: Name of disk file we are checking (logging purpose)
  * @param buffer: Allocated buffer for storing read data
  * @param quiet: Flag for quiet mode
  */
-static int check_empty_sectors(BlockBackend *blk, int64_t sect_num,
-                               int sect_count, const char *filename,
+static int check_empty_sectors(BlockBackend *blk, int64_t offset,
+                               int64_t bytes, const char *filename,
                                uint8_t *buffer, bool quiet)
 {
     int ret = 0;
     int64_t idx;

-    ret = blk_pread(blk, sect_num << BDRV_SECTOR_BITS, buffer,
-                    sect_count << BDRV_SECTOR_BITS);
+    ret = blk_pread(blk, offset, buffer, bytes);
     if (ret < 0) {
         error_report("Error while reading offset %" PRId64 " of %s: %s",
-                     sectors_to_bytes(sect_num), filename, strerror(-ret));
+                     offset, filename, strerror(-ret));
         return 4;
     }
-    idx = find_nonzero(buffer, sect_count * BDRV_SECTOR_SIZE);
+    idx = find_nonzero(buffer, bytes);
     if (idx >= 0) {
         qprintf(quiet, "Content mismatch at offset %" PRId64 "!\n",
-                sectors_to_bytes(sect_num) + idx);
+                offset + idx);
         return 1;
     }

@@ -1468,10 +1467,12 @@ static int img_compare(int argc, char **argv)
         } else {
             nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
             if (allocated1) {
-                ret = check_empty_sectors(blk1, sector_num, nb_sectors,
+                ret = check_empty_sectors(blk1, sector_num * BDRV_SECTOR_SIZE,
+                                          nb_sectors * BDRV_SECTOR_SIZE,
                                           filename1, buf1, quiet);
             } else {
-                ret = check_empty_sectors(blk2, sector_num, nb_sectors,
+                ret = check_empty_sectors(blk2, sector_num * BDRV_SECTOR_SIZE,
+                                          nb_sectors * BDRV_SECTOR_SIZE,
                                           filename2, buf1, quiet);
             }
             if (ret) {
@@ -1516,7 +1517,9 @@ static int img_compare(int argc, char **argv)
             nb_sectors = count >> BDRV_SECTOR_BITS;
             if (ret & BDRV_BLOCK_ALLOCATED && !(ret & BDRV_BLOCK_ZERO)) {
                 nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
-                ret = check_empty_sectors(blk_over, sector_num, nb_sectors,
+                ret = check_empty_sectors(blk_over,
+                                          sector_num * BDRV_SECTOR_SIZE,
+                                          nb_sectors * BDRV_SECTOR_SIZE,
                                           filename_over, buf1, quiet);
                 if (ret) {
                     goto out;
-- 
2.13.5

* [Qemu-devel] [PATCH v4 18/23] qemu-img: Change compare_sectors() to be byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (16 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 17/23] qemu-img: Change check_empty_sectors() to byte-based Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-27 22:25   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 19/23] qemu-img: Change img_rebase() " Eric Blake
                   ` (5 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz

In the continuing quest to make more things byte-based, change
compare_sectors(), renaming it to compare_buffers() in the
process.  Note that one caller (qemu-img compare) only cares
about the first difference, while the other (qemu-img rebase)
cares about how many consecutive sectors have the same
equal/different status; however, this patch does not bother to
micro-optimize the compare case to avoid the comparisons of
sectors beyond the first mismatch.  Both callers are always
passing valid buffers in, so the initial check for buffer size
can be turned into an assertion.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: new patch
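
To make the semantics of the return value and pnum concrete, here is
a standalone sketch that mirrors the byte-based loop from this
patch. It is written for experimentation outside QEMU, so
SECTOR_SIZE stands in for BDRV_SECTOR_SIZE and compare_prefix() is a
hypothetical name; the logic is otherwise intended to match
compare_buffers().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512  /* stand-in for BDRV_SECTOR_SIZE */

/*
 * Returns 0 if the first sector of both buffers matches, 1 otherwise.
 * *pnum is set to the length in bytes of the prefix whose sectors all
 * share that same match/mismatch status.
 */
static int compare_prefix(const uint8_t *a, const uint8_t *b,
                          int64_t bytes, int64_t *pnum)
{
    int64_t i = bytes < SECTOR_SIZE ? bytes : SECTOR_SIZE;
    bool differs;

    assert(bytes > 0);

    differs = memcmp(a, b, (size_t)i) != 0;
    while (i < bytes) {
        int64_t len = bytes - i < SECTOR_SIZE ? bytes - i : SECTOR_SIZE;

        /* Stop at the first sector whose status flips. */
        if ((memcmp(a + i, b + i, (size_t)len) != 0) != differs) {
            break;
        }
        i += len;
    }

    *pnum = i;
    return differs;
}
```

A compare-style caller treats 'ret || pnum != bytes' as a mismatch,
while a rebase-style caller advances by pnum and writes only the
runs that differ.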
---
 qemu-img.c | 55 +++++++++++++++++++++++++++----------------------------
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 2e05f92e85..034122eba5 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1155,31 +1155,28 @@ static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
 }

 /*
- * Compares two buffers sector by sector. Returns 0 if the first sector of both
- * buffers matches, non-zero otherwise.
+ * Compares two buffers sector by sector. Returns 0 if the first
+ * sector of each buffer matches, non-zero otherwise.
  *
- * pnum is set to the number of sectors (including and immediately following
- * the first one) that are known to have the same comparison result
+ * pnum is set to the sector-aligned size of the buffer prefix that
+ * has the same matching status as the first sector.
  */
-static int compare_sectors(const uint8_t *buf1, const uint8_t *buf2, int n,
-    int *pnum)
+static int compare_buffers(const uint8_t *buf1, const uint8_t *buf2,
+                           int64_t bytes, int64_t *pnum)
 {
     bool res;
-    int i;
+    int64_t i = MIN(bytes, BDRV_SECTOR_SIZE);

-    if (n <= 0) {
-        *pnum = 0;
-        return 0;
-    }
+    assert(bytes > 0);

-    res = !!memcmp(buf1, buf2, 512);
-    for(i = 1; i < n; i++) {
-        buf1 += 512;
-        buf2 += 512;
+    res = !!memcmp(buf1, buf2, i);
+    while (i < bytes) {
+        int64_t len = MIN(bytes - i, BDRV_SECTOR_SIZE);

-        if (!!memcmp(buf1, buf2, 512) != res) {
+        if (!!memcmp(buf1 + i, buf2 + i, len) != res) {
             break;
         }
+        i += len;
     }

     *pnum = i;
@@ -1254,7 +1251,7 @@ static int img_compare(int argc, char **argv)
     int64_t total_sectors;
     int64_t sector_num = 0;
     int64_t nb_sectors;
-    int c, pnum;
+    int c;
     uint64_t progress_base;
     bool image_opts = false;
     bool force_share = false;
@@ -1436,6 +1433,8 @@ static int img_compare(int argc, char **argv)
             /* nothing to do */
         } else if (allocated1 == allocated2) {
             if (allocated1) {
+                int64_t pnum;
+
                 nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
                 ret = blk_pread(blk1, sector_num << BDRV_SECTOR_BITS, buf1,
                                 nb_sectors << BDRV_SECTOR_BITS);
@@ -1455,11 +1454,11 @@ static int img_compare(int argc, char **argv)
                     ret = 4;
                     goto out;
                 }
-                ret = compare_sectors(buf1, buf2, nb_sectors, &pnum);
-                if (ret || pnum != nb_sectors) {
+                ret = compare_buffers(buf1, buf2,
+                                      nb_sectors * BDRV_SECTOR_SIZE, &pnum);
+                if (ret || pnum != nb_sectors * BDRV_SECTOR_SIZE) {
                     qprintf(quiet, "Content mismatch at offset %" PRId64 "!\n",
-                            sectors_to_bytes(
-                                ret ? sector_num : sector_num + pnum));
+                            sectors_to_bytes(sector_num) + (ret ? 0 : pnum));
                     ret = 1;
                     goto out;
                 }
@@ -3350,16 +3349,16 @@ static int img_rebase(int argc, char **argv)
             /* If they differ, we need to write to the COW file */
             uint64_t written = 0;

-            while (written < n) {
-                int pnum;
+            while (written < n * BDRV_SECTOR_SIZE) {
+                int64_t pnum;

-                if (compare_sectors(buf_old + written * 512,
-                    buf_new + written * 512, n - written, &pnum))
+                if (compare_buffers(buf_old + written,
+                                    buf_new + written,
+                                    n * BDRV_SECTOR_SIZE - written, &pnum))
                 {
                     ret = blk_pwrite(blk,
-                                     (sector + written) << BDRV_SECTOR_BITS,
-                                     buf_old + written * 512,
-                                     pnum << BDRV_SECTOR_BITS, 0);
+                                     (sector << BDRV_SECTOR_BITS) + written,
+                                     buf_old + written, pnum, 0);
                     if (ret < 0) {
                         error_report("Error while writing to COW image: %s",
                             strerror(-ret));
-- 
2.13.5

* [Qemu-devel] [PATCH v4 19/23] qemu-img: Change img_rebase() to be byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (17 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 18/23] qemu-img: Change compare_sectors() to be byte-based Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-29 19:38   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 20/23] qemu-img: Change img_compare() " Eric Blake
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz

In the continuing quest to make more things byte-based, change
the internal iteration of img_rebase().  We can finally drop the
TODO assertion added earlier, now that the entire algorithm is
byte-based and no longer has to shift from bytes to sectors.

Most of the change is mechanical ('num_sectors' becomes 'size',
'sector' becomes 'offset', 'n' goes from sectors to bytes); some
of it is also a cleanup (use of MIN() instead of open-coding,
loss of variable 'count' added earlier in commit d6a644bb).

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: new patch
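
The shape of the byte-based iteration, with MIN() replacing the
open-coded if/else, can be sketched on its own. This is a reduced
illustration, not the rebase loop: IO_BUF_BYTES stands in for
IO_BUF_SIZE, count_chunks() is a hypothetical name, and the sketch
omits the allocation query that may shrink n further on each pass.

```c
#include <assert.h>
#include <stdint.h>

#define IO_BUF_BYTES (2 * 1024 * 1024)  /* stand-in for IO_BUF_SIZE */

/*
 * Walk [0, size) in steps of at most IO_BUF_BYTES and return how many
 * steps were taken.  Each step handles n bytes starting at offset.
 */
static int64_t count_chunks(int64_t size)
{
    int64_t offset, n, chunks = 0;

    assert(size >= 0);
    for (offset = 0; offset < size; offset += n) {
        /* How many bytes can we handle with the next read? */
        n = size - offset < IO_BUF_BYTES ? size - offset : IO_BUF_BYTES;
        chunks++;
    }
    return chunks;
}
```

Because offset and n are both in bytes, no shift between sectors and
bytes is needed anywhere in the loop body.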
---
 qemu-img.c | 84 +++++++++++++++++++++++++-------------------------------------
 1 file changed, 34 insertions(+), 50 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 034122eba5..028c34a2cc 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3244,70 +3244,58 @@ static int img_rebase(int argc, char **argv)
      * the image is the same as the original one at any time.
      */
     if (!unsafe) {
-        int64_t num_sectors;
-        int64_t old_backing_num_sectors;
-        int64_t new_backing_num_sectors = 0;
-        uint64_t sector;
-        int n;
-        int64_t count;
+        int64_t size;
+        int64_t old_backing_size;
+        int64_t new_backing_size = 0;
+        uint64_t offset;
+        int64_t n;
         float local_progress = 0;

         buf_old = blk_blockalign(blk, IO_BUF_SIZE);
         buf_new = blk_blockalign(blk, IO_BUF_SIZE);

-        num_sectors = blk_nb_sectors(blk);
-        if (num_sectors < 0) {
+        size = blk_getlength(blk);
+        if (size < 0) {
             error_report("Could not get size of '%s': %s",
-                         filename, strerror(-num_sectors));
+                         filename, strerror(-size));
             ret = -1;
             goto out;
         }
-        old_backing_num_sectors = blk_nb_sectors(blk_old_backing);
-        if (old_backing_num_sectors < 0) {
+        old_backing_size = blk_getlength(blk_old_backing);
+        if (old_backing_size < 0) {
             char backing_name[PATH_MAX];

             bdrv_get_backing_filename(bs, backing_name, sizeof(backing_name));
             error_report("Could not get size of '%s': %s",
-                         backing_name, strerror(-old_backing_num_sectors));
+                         backing_name, strerror(-old_backing_size));
             ret = -1;
             goto out;
         }
         if (blk_new_backing) {
-            new_backing_num_sectors = blk_nb_sectors(blk_new_backing);
-            if (new_backing_num_sectors < 0) {
+            new_backing_size = blk_getlength(blk_new_backing);
+            if (new_backing_size < 0) {
                 error_report("Could not get size of '%s': %s",
-                             out_baseimg, strerror(-new_backing_num_sectors));
+                             out_baseimg, strerror(-new_backing_size));
                 ret = -1;
                 goto out;
             }
         }

-        if (num_sectors != 0) {
-            local_progress = (float)100 /
-                (num_sectors / MIN(num_sectors, IO_BUF_SIZE / 512));
+        if (size != 0) {
+            local_progress = (float)100 / (size / MIN(size, IO_BUF_SIZE));
         }

-        for (sector = 0; sector < num_sectors; sector += n) {
-
-            /* How many sectors can we handle with the next read? */
-            if (sector + (IO_BUF_SIZE / 512) <= num_sectors) {
-                n = (IO_BUF_SIZE / 512);
-            } else {
-                n = num_sectors - sector;
-            }
+        for (offset = 0; offset < size; offset += n) {
+            /* How many bytes can we handle with the next read? */
+            n = MIN(IO_BUF_SIZE, size - offset);

             /* If the cluster is allocated, we don't need to take action */
-            ret = bdrv_is_allocated(bs, sector << BDRV_SECTOR_BITS,
-                                    n << BDRV_SECTOR_BITS, &count);
+            ret = bdrv_is_allocated(bs, offset, n, &n);
             if (ret < 0) {
                 error_report("error while reading image metadata: %s",
                              strerror(-ret));
                 goto out;
             }
-            /* TODO relax this once bdrv_is_allocated does not enforce
-             * sector alignment */
-            assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
-            n = count >> BDRV_SECTOR_BITS;
             if (ret) {
                 continue;
             }
@@ -3316,30 +3304,28 @@ static int img_rebase(int argc, char **argv)
              * Read old and new backing file and take into consideration that
              * backing files may be smaller than the COW image.
              */
-            if (sector >= old_backing_num_sectors) {
-                memset(buf_old, 0, n * BDRV_SECTOR_SIZE);
+            if (offset >= old_backing_size) {
+                memset(buf_old, 0, n);
             } else {
-                if (sector + n > old_backing_num_sectors) {
-                    n = old_backing_num_sectors - sector;
+                if (offset + n > old_backing_size) {
+                    n = old_backing_size - offset;
                 }

-                ret = blk_pread(blk_old_backing, sector << BDRV_SECTOR_BITS,
-                                buf_old, n << BDRV_SECTOR_BITS);
+                ret = blk_pread(blk_old_backing, offset, buf_old, n);
                 if (ret < 0) {
                     error_report("error while reading from old backing file");
                     goto out;
                 }
             }

-            if (sector >= new_backing_num_sectors || !blk_new_backing) {
-                memset(buf_new, 0, n * BDRV_SECTOR_SIZE);
+            if (offset >= new_backing_size || !blk_new_backing) {
+                memset(buf_new, 0, n);
             } else {
-                if (sector + n > new_backing_num_sectors) {
-                    n = new_backing_num_sectors - sector;
+                if (offset + n > new_backing_size) {
+                    n = new_backing_size - offset;
                 }

-                ret = blk_pread(blk_new_backing, sector << BDRV_SECTOR_BITS,
-                                buf_new, n << BDRV_SECTOR_BITS);
+                ret = blk_pread(blk_new_backing, offset, buf_new, n);
                 if (ret < 0) {
                     error_report("error while reading from new backing file");
                     goto out;
@@ -3349,15 +3335,13 @@ static int img_rebase(int argc, char **argv)
             /* If they differ, we need to write to the COW file */
             uint64_t written = 0;

-            while (written < n * BDRV_SECTOR_SIZE) {
+            while (written < n) {
                 int64_t pnum;

-                if (compare_buffers(buf_old + written,
-                                    buf_new + written,
-                                    n * BDRV_SECTOR_SIZE - written, &pnum))
+                if (compare_buffers(buf_old + written, buf_new + written,
+                                    n - written, &pnum))
                 {
-                    ret = blk_pwrite(blk,
-                                     (sector << BDRV_SECTOR_BITS) + written,
+                    ret = blk_pwrite(blk, offset + written,
                                      buf_old + written, pnum, 0);
                     if (ret < 0) {
                         error_report("Error while writing to COW image: %s",
-- 
2.13.5

* [Qemu-devel] [PATCH v4 20/23] qemu-img: Change img_compare() to be byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (18 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 19/23] qemu-img: Change img_rebase() " Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-29 20:42   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 21/23] block: Align block status requests Eric Blake
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz

In the continuing quest to make more things byte-based, change
the internal iteration of img_compare().  We can finally drop the
TODO assertion added earlier, now that the entire algorithm is
byte-based and no longer has to shift from bytes to sectors.

Most of the change is mechanical ('total_sectors' becomes
'total_size', 'sector_num' becomes 'offset', 'nb_sectors' becomes
'chunk', 'progress_base' goes from sectors to bytes); some of it
is also a cleanup (sectors_to_bytes() is now unused and is removed,
and the variable 'count' added earlier in commit 51b0a488 goes away).
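
As a rough illustration of what dropping the sector rounding means
for the loop, the sketch below contrasts the old and new ways of
picking the next chunk.  The helper names old_chunk_bytes() and
new_chunk_bytes() are hypothetical and do not exist in qemu-img.c;
SECTOR_SIZE stands in for BDRV_SECTOR_SIZE, and 'will_read' is an
assumed flag for the branches of the patch that read data and so
clamp to IO_BUF_SIZE.

```c
#include <stdint.h>

#define SECTOR_SIZE 512                 /* stands in for BDRV_SECTOR_SIZE */
#define IO_BUF_SIZE (2 * 1024 * 1024)   /* same value as in qemu-img.c */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Old style: round the common extent up to whole sectors, then
 * convert back to bytes for the I/O calls. */
static int64_t old_chunk_bytes(int64_t pnum1, int64_t pnum2)
{
    int64_t nb_sectors = (MIN(pnum1, pnum2) + SECTOR_SIZE - 1) / SECTOR_SIZE;

    return nb_sectors * SECTOR_SIZE;
}

/* New style: stay in bytes; clamp to the I/O buffer only when the
 * chunk is about to be read into it. */
static int64_t new_chunk_bytes(int64_t pnum1, int64_t pnum2, int will_read)
{
    int64_t chunk = MIN(pnum1, pnum2);

    return will_read ? MIN(chunk, IO_BUF_SIZE) : chunk;
}
```

With extents of 1000 and 4096 bytes, the old computation advances by
1024 bytes (two sectors), while the byte-based one advances by exactly
1000 bytes.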

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: new patch
---
 qemu-img.c | 119 ++++++++++++++++++++++++-------------------------------------
 1 file changed, 46 insertions(+), 73 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 028c34a2cc..ef7062649d 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1185,11 +1185,6 @@ static int compare_buffers(const uint8_t *buf1, const uint8_t *buf2,

 #define IO_BUF_SIZE (2 * 1024 * 1024)

-static int64_t sectors_to_bytes(int64_t sectors)
-{
-    return sectors << BDRV_SECTOR_BITS;
-}
-
 /*
  * Check if passed sectors are empty (not allocated or contain only 0 bytes)
  *
@@ -1240,7 +1235,7 @@ static int img_compare(int argc, char **argv)
     const char *fmt1 = NULL, *fmt2 = NULL, *cache, *filename1, *filename2;
     BlockBackend *blk1, *blk2;
     BlockDriverState *bs1, *bs2;
-    int64_t total_sectors1, total_sectors2;
+    int64_t total_size1, total_size2;
     uint8_t *buf1 = NULL, *buf2 = NULL;
     int64_t pnum1, pnum2;
     int allocated1, allocated2;
@@ -1248,9 +1243,9 @@ static int img_compare(int argc, char **argv)
     bool progress = false, quiet = false, strict = false;
     int flags;
     bool writethrough;
-    int64_t total_sectors;
-    int64_t sector_num = 0;
-    int64_t nb_sectors;
+    int64_t total_size;
+    int64_t offset = 0;
+    int64_t chunk;
     int c;
     uint64_t progress_base;
     bool image_opts = false;
@@ -1364,39 +1359,36 @@ static int img_compare(int argc, char **argv)

     buf1 = blk_blockalign(blk1, IO_BUF_SIZE);
     buf2 = blk_blockalign(blk2, IO_BUF_SIZE);
-    total_sectors1 = blk_nb_sectors(blk1);
-    if (total_sectors1 < 0) {
+    total_size1 = blk_getlength(blk1);
+    if (total_size1 < 0) {
         error_report("Can't get size of %s: %s",
-                     filename1, strerror(-total_sectors1));
+                     filename1, strerror(-total_size1));
         ret = 4;
         goto out;
     }
-    total_sectors2 = blk_nb_sectors(blk2);
-    if (total_sectors2 < 0) {
+    total_size2 = blk_getlength(blk2);
+    if (total_size2 < 0) {
         error_report("Can't get size of %s: %s",
-                     filename2, strerror(-total_sectors2));
+                     filename2, strerror(-total_size2));
         ret = 4;
         goto out;
     }
-    total_sectors = MIN(total_sectors1, total_sectors2);
-    progress_base = MAX(total_sectors1, total_sectors2);
+    total_size = MIN(total_size1, total_size2);
+    progress_base = MAX(total_size1, total_size2);

     qemu_progress_print(0, 100);

-    if (strict && total_sectors1 != total_sectors2) {
+    if (strict && total_size1 != total_size2) {
         ret = 1;
         qprintf(quiet, "Strict mode: Image size mismatch!\n");
         goto out;
     }

-    while (sector_num < total_sectors) {
+    while (offset < total_size) {
         int64_t status1, status2;

-        status1 = bdrv_block_status_above(bs1, NULL,
-                                          sector_num * BDRV_SECTOR_SIZE,
-                                          (total_sectors1 - sector_num) *
-                                          BDRV_SECTOR_SIZE,
-                                          &pnum1, NULL);
+        status1 = bdrv_block_status_above(bs1, NULL, offset,
+                                          total_size1 - offset, &pnum1, NULL);
         if (status1 < 0) {
             ret = 3;
             error_report("Sector allocation test failed for %s", filename1);
@@ -1404,11 +1396,8 @@ static int img_compare(int argc, char **argv)
         }
         allocated1 = status1 & BDRV_BLOCK_ALLOCATED;

-        status2 = bdrv_block_status_above(bs2, NULL,
-                                          sector_num * BDRV_SECTOR_SIZE,
-                                          (total_sectors2 - sector_num) *
-                                          BDRV_SECTOR_SIZE,
-                                          &pnum2, NULL);
+        status2 = bdrv_block_status_above(bs2, NULL, offset,
+                                          total_size2 - offset, &pnum2, NULL);
         if (status2 < 0) {
             ret = 3;
             error_report("Sector allocation test failed for %s", filename2);
@@ -1417,15 +1406,14 @@ static int img_compare(int argc, char **argv)
         allocated2 = status2 & BDRV_BLOCK_ALLOCATED;

         assert(pnum1 && pnum2);
-        nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE);
+        chunk = MIN(pnum1, pnum2);

         if (strict) {
             if ((status1 & ~BDRV_BLOCK_OFFSET_MASK) !=
                 (status2 & ~BDRV_BLOCK_OFFSET_MASK)) {
                 ret = 1;
                 qprintf(quiet, "Strict mode: Offset %" PRId64
-                        " block status mismatch!\n",
-                        sectors_to_bytes(sector_num));
+                        " block status mismatch!\n", offset);
                 goto out;
             }
         }
@@ -1435,59 +1423,54 @@ static int img_compare(int argc, char **argv)
             if (allocated1) {
                 int64_t pnum;

-                nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
-                ret = blk_pread(blk1, sector_num << BDRV_SECTOR_BITS, buf1,
-                                nb_sectors << BDRV_SECTOR_BITS);
+                chunk = MIN(chunk, IO_BUF_SIZE);
+                ret = blk_pread(blk1, offset, buf1, chunk);
                 if (ret < 0) {
-                    error_report("Error while reading offset %" PRId64 " of %s:"
-                                 " %s", sectors_to_bytes(sector_num), filename1,
-                                 strerror(-ret));
+                    error_report("Error while reading offset %" PRId64
+                                 " of %s: %s",
+                                 offset, filename1, strerror(-ret));
                     ret = 4;
                     goto out;
                 }
-                ret = blk_pread(blk2, sector_num << BDRV_SECTOR_BITS, buf2,
-                                nb_sectors << BDRV_SECTOR_BITS);
+                ret = blk_pread(blk2, offset, buf2, chunk);
                 if (ret < 0) {
                     error_report("Error while reading offset %" PRId64
-                                 " of %s: %s", sectors_to_bytes(sector_num),
-                                 filename2, strerror(-ret));
+                                 " of %s: %s",
+                                 offset, filename2, strerror(-ret));
                     ret = 4;
                     goto out;
                 }
-                ret = compare_buffers(buf1, buf2,
-                                      nb_sectors * BDRV_SECTOR_SIZE, &pnum);
-                if (ret || pnum != nb_sectors * BDRV_SECTOR_SIZE) {
+                ret = compare_buffers(buf1, buf2, chunk, &pnum);
+                if (ret || pnum != chunk) {
                     qprintf(quiet, "Content mismatch at offset %" PRId64 "!\n",
-                            sectors_to_bytes(sector_num) + (ret ? 0 : pnum));
+                            offset + (ret ? 0 : pnum));
                     ret = 1;
                     goto out;
                 }
             }
         } else {
-            nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
+            chunk = MIN(chunk, IO_BUF_SIZE);
             if (allocated1) {
-                ret = check_empty_sectors(blk1, sector_num * BDRV_SECTOR_SIZE,
-                                          nb_sectors * BDRV_SECTOR_SIZE,
+                ret = check_empty_sectors(blk1, offset, chunk,
                                           filename1, buf1, quiet);
             } else {
-                ret = check_empty_sectors(blk2, sector_num * BDRV_SECTOR_SIZE,
-                                          nb_sectors * BDRV_SECTOR_SIZE,
+                ret = check_empty_sectors(blk2, offset, chunk,
                                           filename2, buf1, quiet);
             }
             if (ret) {
                 goto out;
             }
         }
-        sector_num += nb_sectors;
-        qemu_progress_print(((float) nb_sectors / progress_base)*100, 100);
+        offset += chunk;
+        qemu_progress_print(((float) chunk / progress_base) * 100, 100);
     }

-    if (total_sectors1 != total_sectors2) {
+    if (total_size1 != total_size2) {
         BlockBackend *blk_over;
         const char *filename_over;

         qprintf(quiet, "Warning: Image size mismatch!\n");
-        if (total_sectors1 > total_sectors2) {
+        if (total_size1 > total_size2) {
             blk_over = blk1;
             filename_over = filename1;
         } else {
@@ -1495,14 +1478,10 @@ static int img_compare(int argc, char **argv)
             filename_over = filename2;
         }

-        while (sector_num < progress_base) {
-            int64_t count;
-
-            ret = bdrv_block_status_above(blk_bs(blk_over), NULL,
-                                          sector_num * BDRV_SECTOR_SIZE,
-                                          (progress_base - sector_num) *
-                                          BDRV_SECTOR_SIZE,
-                                          &count, NULL);
+        while (offset < progress_base) {
+            ret = bdrv_block_status_above(blk_bs(blk_over), NULL, offset,
+                                          progress_base - offset, &chunk,
+                                          NULL);
             if (ret < 0) {
                 ret = 3;
                 error_report("Sector allocation test failed for %s",
@@ -1510,22 +1489,16 @@ static int img_compare(int argc, char **argv)
                 goto out;

             }
-            /* TODO relax this once bdrv_block_status_above does not enforce
-             * sector alignment */
-            assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
-            nb_sectors = count >> BDRV_SECTOR_BITS;
             if (ret & BDRV_BLOCK_ALLOCATED && !(ret & BDRV_BLOCK_ZERO)) {
-                nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
-                ret = check_empty_sectors(blk_over,
-                                          sector_num * BDRV_SECTOR_SIZE,
-                                          nb_sectors * BDRV_SECTOR_SIZE,
+                chunk = MIN(chunk, IO_BUF_SIZE);
+                ret = check_empty_sectors(blk_over, offset, chunk,
                                           filename_over, buf1, quiet);
                 if (ret) {
                     goto out;
                 }
             }
-            sector_num += nb_sectors;
-            qemu_progress_print(((float) nb_sectors / progress_base)*100, 100);
+            offset += chunk;
+            qemu_progress_print(((float) chunk / progress_base) * 100, 100);
         }
     }

-- 
2.13.5

* [Qemu-devel] [PATCH v4 21/23] block: Align block status requests
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (19 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 20/23] qemu-img: Change img_compare() " Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-09-13 19:26   ` Eric Blake
  2017-10-02 20:24   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 22/23] block: Relax bdrv_aligned_preadv() assertion Eric Blake
                   ` (2 subsequent siblings)
  23 siblings, 2 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz, Stefan Hajnoczi

Any device that has request_alignment greater than 512 should be
unable to report status at a finer granularity; it may also be
simpler for such devices to be guaranteed that the block layer
has rounded things out to the granularity boundary (the way the
block layer already rounds all other I/O out).  Besides, getting
the code correct for super-sector alignment also benefits us
now that our public interface has byte granularity,
even though none of our drivers have byte-level callbacks.
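
The rounding described above can be sketched in isolation as follows.
This is a simplified model of the io.c hunk below, not the code
itself: round_out() and clamp_pnum() are hypothetical helpers, the
ALIGN_* macros are local stand-ins for QEMU's alignment macros, and
'driver_pnum' is assumed to be the byte count the driver reported
starting from the aligned offset.

```c
#include <stdint.h>

#define SECTOR_SIZE 512 /* stands in for BDRV_SECTOR_SIZE */

/* Local stand-ins for QEMU's alignment macros; m must be nonzero. */
#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))

typedef struct {
    int64_t aligned_offset;
    int64_t aligned_bytes;
} AlignedRequest;

/* Widen [offset, offset + bytes) to request_alignment boundaries,
 * never using an alignment smaller than one sector. */
static AlignedRequest round_out(int64_t offset, int64_t bytes,
                                uint32_t request_alignment)
{
    int64_t align = request_alignment > SECTOR_SIZE ? request_alignment
                                                    : SECTOR_SIZE;
    AlignedRequest r;

    r.aligned_offset = ALIGN_DOWN(offset, align);
    r.aligned_bytes = ALIGN_UP(offset + bytes, align) - r.aligned_offset;
    return r;
}

/* Translate the driver's answer for the widened request back to the
 * caller's original, possibly unaligned, request. */
static int64_t clamp_pnum(int64_t driver_pnum, int64_t offset, int64_t bytes,
                          int64_t aligned_offset)
{
    int64_t pnum = driver_pnum - (offset - aligned_offset);

    return pnum > bytes ? bytes : pnum;
}
```

For example, with a 4k alignment a request for 1000 bytes at offset
0x6dffff0 is widened to 8192 bytes at 0x6dff000; if the driver then
reports 4096 bytes of uniform status, the caller sees 16 bytes, which
is the distance from 0x6dffff0 to the 110 MiB boundary.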

Add an assertion in blkdebug that proves that the block layer
never requests status of unaligned sections, similar to what it
does on other requests (while still keeping the generic helper
in place for when future patches add a throttle driver).  Note
that iotest 177 already covers this (it would fail if you use
just the blkdebug.c hunk without the io.c changes).  Meanwhile,
we can drop assertions in callers that no longer have to pass
in sector-aligned addresses.
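
Because the driver callback still takes sectors, the blkdebug check
has to express request_alignment in sector units.  A minimal sketch
of that test, assuming request_alignment is a power of two (which is
what makes checking the OR of both values equivalent to checking each
one); status_request_is_aligned() is a hypothetical name, not a
function in blkdebug.c:

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* stands in for BDRV_SECTOR_SIZE */

/* Return true if both the start sector and the sector count are
 * multiples of request_alignment expressed in sectors. */
static bool status_request_is_aligned(int64_t sector_num, int nb_sectors,
                                      uint32_t request_alignment)
{
    /* Alignments below one sector round up to a single sector. */
    int64_t align_sectors = (request_alignment + SECTOR_SIZE - 1) / SECTOR_SIZE;

    /* For a power-of-two alignment, a stray low bit in either value
     * survives the OR and makes the remainder nonzero. */
    return ((sector_num | nb_sectors) % align_sectors) == 0;
}
```

With a 4k alignment (8 sectors), a request for 16 sectors starting at
sector 8 passes, while one starting at sector 1 would trip the
assertion.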

There is a mid-function scope added for 'int count', for a
couple of reasons: first, an upcoming patch will add an 'if'
statement that checks whether a driver has an old- or new-style
callback, and can conveniently use the same scope for less
indentation churn at that time.  Second, since we are trying
to get rid of sector-based computations, wrapping things in
a scope makes it easier to group and see what will be deleted
in a final cleanup patch once all drivers have been converted
to the new-style callback.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: tweak commit message [Fam], rebase to context conflicts, ensure
we don't exceed 32-bit limit, drop R-b
v2: new patch
---
 include/block/block_int.h |  3 ++-
 block/io.c                | 55 +++++++++++++++++++++++++++++++----------------
 block/blkdebug.c          | 13 ++++++++++-
 3 files changed, 51 insertions(+), 20 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7f71c585a0..b1ceffba78 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -207,7 +207,8 @@ struct BlockDriver {
      * according to the current layer, and should not set
      * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
      * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
-     * layer guarantees non-NULL pnum and file.
+     * layer guarantees input aligned to request_alignment, as well as
+     * non-NULL pnum and file.
      */
     int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum,
diff --git a/block/io.c b/block/io.c
index ea63d19480..c78201b8eb 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1773,7 +1773,8 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     int64_t n; /* bytes */
     int64_t ret, ret2;
     BlockDriverState *local_file = NULL;
-    int count; /* sectors */
+    int64_t aligned_offset, aligned_bytes;
+    uint32_t align;

     assert(pnum);
     total_size = bdrv_getlength(bs);
@@ -1815,28 +1816,45 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     }

     bdrv_inc_in_flight(bs);
-    /*
-     * TODO: Rather than require aligned offsets, we could instead
-     * round to the driver's request_alignment here, then touch up
-     * count afterwards back to the caller's expectations.
-     */
-    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
-    bytes = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
-    ret = bs->drv->bdrv_co_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
-                                            bytes >> BDRV_SECTOR_BITS, &count,
-                                            &local_file);
-    if (ret < 0) {
-        *pnum = 0;
-        goto out;
+
+    /* Round out to request_alignment boundaries */
+    align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
+    aligned_offset = QEMU_ALIGN_DOWN(offset, align);
+    aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
+
+    {
+        int count; /* sectors */
+
+        assert(QEMU_IS_ALIGNED(aligned_offset | aligned_bytes,
+                               BDRV_SECTOR_SIZE));
+        ret = bs->drv->bdrv_co_get_block_status(
+            bs, aligned_offset >> BDRV_SECTOR_BITS,
+            MIN(INT_MAX, aligned_bytes) >> BDRV_SECTOR_BITS, &count,
+            &local_file);
+        if (ret < 0) {
+            *pnum = 0;
+            goto out;
+        }
+        *pnum = count * BDRV_SECTOR_SIZE;
+    }
+
+    /* Clamp pnum and ret to original request */
+    assert(QEMU_IS_ALIGNED(*pnum, align));
+    *pnum -= offset - aligned_offset;
+    if (aligned_offset >> BDRV_SECTOR_BITS != offset >> BDRV_SECTOR_BITS &&
+        ret & BDRV_BLOCK_OFFSET_VALID) {
+        ret += QEMU_ALIGN_DOWN(offset - aligned_offset, BDRV_SECTOR_SIZE);
+    }
+    if (*pnum > bytes) {
+        *pnum = bytes;
     }
-    *pnum = count * BDRV_SECTOR_SIZE;

     if (ret & BDRV_BLOCK_RAW) {
         assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
         ret = bdrv_co_block_status(local_file, mapping,
-                                   ret & BDRV_BLOCK_OFFSET_MASK,
+                                   (ret & BDRV_BLOCK_OFFSET_MASK) |
+                                   (offset & ~BDRV_BLOCK_OFFSET_MASK),
                                    *pnum, pnum, &local_file);
-        assert(QEMU_IS_ALIGNED(*pnum, BDRV_SECTOR_SIZE));
         goto out;
     }

@@ -1860,7 +1878,8 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
         int64_t file_pnum;

         ret2 = bdrv_co_block_status(local_file, mapping,
-                                    ret & BDRV_BLOCK_OFFSET_MASK,
+                                    (ret & BDRV_BLOCK_OFFSET_MASK) |
+                                    (offset & ~BDRV_BLOCK_OFFSET_MASK),
                                     *pnum, &file_pnum, NULL);
         if (ret2 >= 0) {
             /* Ignore errors.  This is just providing extra information, it
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 46e53f2f09..f54fe33cae 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -628,6 +628,17 @@ static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
     return bdrv_co_pdiscard(bs->file->bs, offset, bytes);
 }

+static int64_t coroutine_fn blkdebug_co_get_block_status(
+    BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
+    BlockDriverState **file)
+{
+    assert(QEMU_IS_ALIGNED(sector_num | nb_sectors,
+                           DIV_ROUND_UP(bs->bl.request_alignment,
+                                        BDRV_SECTOR_SIZE)));
+    return bdrv_co_get_block_status_from_file(bs, sector_num, nb_sectors,
+                                              pnum, file);
+}
+
 static void blkdebug_close(BlockDriverState *bs)
 {
     BDRVBlkdebugState *s = bs->opaque;
@@ -897,7 +908,7 @@ static BlockDriver bdrv_blkdebug = {
     .bdrv_co_flush_to_disk  = blkdebug_co_flush,
     .bdrv_co_pwrite_zeroes  = blkdebug_co_pwrite_zeroes,
     .bdrv_co_pdiscard       = blkdebug_co_pdiscard,
-    .bdrv_co_get_block_status = bdrv_co_get_block_status_from_file,
+    .bdrv_co_get_block_status = blkdebug_co_get_block_status,

     .bdrv_debug_event           = blkdebug_debug_event,
     .bdrv_debug_breakpoint      = blkdebug_debug_breakpoint,
-- 
2.13.5

* [Qemu-devel] [PATCH v4 22/23] block: Relax bdrv_aligned_preadv() assertion
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (20 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 21/23] block: Align block status requests Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-10-02 21:20   ` John Snow
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 23/23] qemu-io: Relax 'alloc' now that block-status doesn't assert Eric Blake
  2017-09-13 21:05 ` [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Stefan Hajnoczi, Max Reitz

Now that bdrv_is_allocated accepts non-aligned inputs, we can
remove the TODO added in commit d6a644bb.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: new patch [Kevin]
---
 block/io.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/block/io.c b/block/io.c
index c78201b8eb..e0f9bca7e2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1055,18 +1055,14 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
     }

     if (flags & BDRV_REQ_COPY_ON_READ) {
-        /* TODO: Simplify further once bdrv_is_allocated no longer
-         * requires sector alignment */
-        int64_t start = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
-        int64_t end = QEMU_ALIGN_UP(offset + bytes, BDRV_SECTOR_SIZE);
         int64_t pnum;

-        ret = bdrv_is_allocated(bs, start, end - start, &pnum);
+        ret = bdrv_is_allocated(bs, offset, bytes, &pnum);
         if (ret < 0) {
             goto out;
         }

-        if (!ret || pnum != end - start) {
+        if (!ret || pnum != bytes) {
             ret = bdrv_co_do_copy_on_readv(child, offset, bytes, qiov);
             goto out;
         }
-- 
2.13.5

* [Qemu-devel] [PATCH v4 23/23] qemu-io: Relax 'alloc' now that block-status doesn't assert
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (21 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 22/23] block: Relax bdrv_aligned_preadv() assertion Eric Blake
@ 2017-09-13 16:03 ` Eric Blake
  2017-10-02 21:27   ` John Snow
  2017-09-13 21:05 ` [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
  23 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jsnow, kwolf, qemu-block, Max Reitz

Previously, the alloc command required that input parameters be
sector-aligned and clamped to 32 bits, because the underlying
bdrv_is_allocated used a 32-bit parameter and asserted aligned
inputs.  But now that we have fixed block status to report a
64-bit bytes value, and to properly round requests on behalf of
guests, we can pass any values, and can use qemu-io to add
coverage that our rounding is correct regardless of the guest
alignment constraints.

Update iotest 177 to intentionally probe block status at
unaligned boundaries as well as with a bytes value that does not
map to 32-bit sectors, which also required tweaking the image
prep to leave an unallocated portion in the image under test.
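
To make the numbers in the new test concrete, here is a small
arithmetic sketch.  fits_old_limit() mirrors the length check removed
from alloc_f() below; clamp_to_image() is a hypothetical helper that
models why 'alloc 127m 5P' is expected to report on only 1048576
bytes of the 128 MiB image.  Where exactly that clamping happens
inside qemu is not shown here and is an assumption of the sketch.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* stands in for BDRV_SECTOR_SIZE */
#define MiB (INT64_C(1) << 20)
#define PiB (INT64_C(1) << 50)

/* The old 'alloc' command rejected counts above INT_MAX sectors. */
static bool fits_old_limit(int64_t count)
{
    return count <= (int64_t)INT_MAX * SECTOR_SIZE;
}

/* Number of bytes of [offset, offset + count) that lie inside an
 * image of image_size bytes. */
static int64_t clamp_to_image(int64_t offset, int64_t count,
                              int64_t image_size)
{
    int64_t left;

    if (offset >= image_size) {
        return 0;
    }
    left = image_size - offset;
    return count < left ? count : left;
}
```

A count of 5P is 5 * 2^41 sectors, far beyond INT_MAX, so the old
check would have refused it; clamped to the image, the request covers
the final 1 MiB starting at 127 MiB.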

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: also test huge bytes value, R-b dropped
v2: new patch
---
 qemu-io-cmds.c             | 13 -------------
 tests/qemu-iotests/177     | 12 ++++++++++--
 tests/qemu-iotests/177.out | 19 ++++++++++++++-----
 3 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 2811a89099..d9a32f3bed 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1769,10 +1769,6 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
     if (offset < 0) {
         print_cvtnum_err(offset, argv[1]);
         return 0;
-    } else if (!QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)) {
-        printf("%" PRId64 " is not a sector-aligned value for 'offset'\n",
-               offset);
-        return 0;
     }

     if (argc == 3) {
@@ -1780,19 +1776,10 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
         if (count < 0) {
             print_cvtnum_err(count, argv[2]);
             return 0;
-        } else if (count > INT_MAX * BDRV_SECTOR_SIZE) {
-            printf("length argument cannot exceed %llu, given %s\n",
-                   INT_MAX * BDRV_SECTOR_SIZE, argv[2]);
-            return 0;
         }
     } else {
         count = BDRV_SECTOR_SIZE;
     }
-    if (!QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE)) {
-        printf("%" PRId64 " is not a sector-aligned value for 'count'\n",
-               count);
-        return 0;
-    }

     remaining = count;
     sum_alloc = 0;
diff --git a/tests/qemu-iotests/177 b/tests/qemu-iotests/177
index f8ed8fb86b..28990977f1 100755
--- a/tests/qemu-iotests/177
+++ b/tests/qemu-iotests/177
@@ -51,7 +51,7 @@ echo "== setting up files =="
 TEST_IMG="$TEST_IMG.base" _make_test_img $size
 $QEMU_IO -c "write -P 11 0 $size" "$TEST_IMG.base" | _filter_qemu_io
 _make_test_img -b "$TEST_IMG.base"
-$QEMU_IO -c "write -P 22 0 $size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -P 22 0 110M" "$TEST_IMG" | _filter_qemu_io

 # Limited to 64k max-transfer
 echo
@@ -82,6 +82,13 @@ $QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
          -c "discard 80000001 30M" | _filter_qemu_io

 echo
+echo "== block status smaller than alignment =="
+limits=align=4k
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+	 -c "alloc 1 1" -c "alloc 0x6dffff0 1000" -c "alloc 127m 5P" \
+	 -c map | _filter_qemu_io
+
+echo
 echo "== verify image content =="

 function verify_io()
@@ -103,7 +110,8 @@ function verify_io()
     echo read -P 0 32M 32M
     echo read -P 22 64M 13M
     echo read -P $discarded 77M 29M
-    echo read -P 22 106M 22M
+    echo read -P 22 106M 4M
+    echo read -P 11 110M 18M
 }

 verify_io | $QEMU_IO -r "$TEST_IMG" | _filter_qemu_io
diff --git a/tests/qemu-iotests/177.out b/tests/qemu-iotests/177.out
index 43a777836c..f788b55e20 100644
--- a/tests/qemu-iotests/177.out
+++ b/tests/qemu-iotests/177.out
@@ -5,8 +5,8 @@ Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
 wrote 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/t.IMGFMT.base
-wrote 134217728/134217728 bytes at offset 0
-128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 115343360/115343360 bytes at offset 0
+110 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

 == constrained alignment and max-transfer ==
 wrote 131072/131072 bytes at offset 1000
@@ -26,6 +26,13 @@ wrote 33554432/33554432 bytes at offset 33554432
 discard 31457280/31457280 bytes at offset 80000001
 30 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

+== block status smaller than alignment ==
+1/1 bytes allocated at offset 1 bytes
+16/1000 bytes allocated at offset 110 MiB
+0/1048576 bytes allocated at offset 127 MiB
+110 MiB (0x6e00000) bytes     allocated at offset 0 bytes (0x0)
+18 MiB (0x1200000) bytes not allocated at offset 110 MiB (0x6e00000)
+
 == verify image content ==
 read 1000/1000 bytes at offset 0
 1000 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -43,12 +50,14 @@ read 13631488/13631488 bytes at offset 67108864
 13 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 30408704/30408704 bytes at offset 80740352
 29 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 23068672/23068672 bytes at offset 111149056
-22 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4194304/4194304 bytes at offset 111149056
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 18874368/18874368 bytes at offset 115343360
+18 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 Offset          Length          File
 0               0x800000        TEST_DIR/t.IMGFMT
 0x900000        0x2400000       TEST_DIR/t.IMGFMT
 0x3c00000       0x1100000       TEST_DIR/t.IMGFMT
-0x6a00000       0x1600000       TEST_DIR/t.IMGFMT
+0x6a00000       0x400000        TEST_DIR/t.IMGFMT
 No errors were found on the image.
 *** done
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 21/23] block: Align block status requests
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 21/23] block: Align block status requests Eric Blake
@ 2017-09-13 19:26   ` Eric Blake
  2017-09-13 20:36     ` Eric Blake
  2017-10-02 20:24   ` John Snow
  1 sibling, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-13 19:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi, jsnow


On 09/13/2017 11:03 AM, Eric Blake wrote:
> Any device that has request_alignment greater than 512 should be
> unable to report status at a finer granularity; it may also be
> simpler for such devices to be guaranteed that the block layer
> has rounded things out to the granularity boundary (the way the
> block layer already rounds all other I/O out).  Besides, getting
> the code correct for super-sector alignment also benefits us
> for the fact that our public interface now has byte granularity,
> even though none of our drivers have byte-level callbacks.
> 
> Add an assertion in blkdebug that proves that the block layer
> never requests status of unaligned sections, similar to what it
> does on other requests (while still keeping the generic helper
> in place for when future patches add a throttle driver).  Note
> that iotest 177 already covers this (it would fail if you use
> just the blkdebug.c hunk without the io.c changes).  Meanwhile,
> we can drop assertions in callers that no longer have to pass
> in sector-aligned addresses.

Bummer - 'git bisect' says this patch causes iotests 190 to hang.  I'm
investigating root cause, but I'll have to post a fixup once I figure it
out.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 21/23] block: Align block status requests
  2017-09-13 19:26   ` Eric Blake
@ 2017-09-13 20:36     ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-13 20:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi, jsnow


On 09/13/2017 02:26 PM, Eric Blake wrote:
> On 09/13/2017 11:03 AM, Eric Blake wrote:
>> Any device that has request_alignment greater than 512 should be
>> unable to report status at a finer granularity; it may also be
>> simpler for such devices to be guaranteed that the block layer
>> has rounded things out to the granularity boundary (the way the
>> block layer already rounds all other I/O out).  Besides, getting
>> the code correct for super-sector alignment also benefits us
>> for the fact that our public interface now has byte granularity,
>> even though none of our drivers have byte-level callbacks.
>>
>> Add an assertion in blkdebug that proves that the block layer
>> never requests status of unaligned sections, similar to what it
>> does on other requests (while still keeping the generic helper
>> in place for when future patches add a throttle driver).  Note
>> that iotest 177 already covers this (it would fail if you use
>> just the blkdebug.c hunk without the io.c changes).  Meanwhile,
>> we can drop assertions in callers that no longer have to pass
>> in sector-aligned addresses.
> 
> Bummer - 'git bisect' says this patch causes iotests 190 to hang.  I'm
> investigating root cause, but I'll have to post a fixup once I figure it
> out.

Found it:

> +    /* Round out to request_alignment boundaries */
> +    align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
> +    aligned_offset = QEMU_ALIGN_DOWN(offset, align);
> +    aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;

ROUND_UP(64-bit, 32-bit) has a subtle bug: it truncates the operation at
32 bits, instead of producing a 64-bit result.  Using QEMU_ALIGN_UP
instead does NOT have the bug.

That's a ticking time bomb, so I'll patch ROUND_UP() directly as a
pre-requisite, then reply to the cover letter with a Based-on tag.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based
  2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
                   ` (22 preceding siblings ...)
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 23/23] qemu-io: Relax 'alloc' now that block-status doesn't assert Eric Blake
@ 2017-09-13 21:05 ` Eric Blake
  23 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-13 21:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, famz, qemu-block


On 09/13/2017 11:03 AM, Eric Blake wrote:
> There are patches floating around to add NBD_CMD_BLOCK_STATUS,
> but NBD wants to report status on byte granularity (even if the
> reporting will probably be naturally aligned to sectors or even
> much higher levels).  I've therefore started the task of
> converting our block status code to report at a byte granularity
> rather than sectors.
> 
> Now that 2.11 is open, I'm rebasing/reposting the remaining patches.
> 
> The overall conversion currently looks like:
> part 1: bdrv_is_allocated (merged, commit 51b0a488)
> part 2: dirty-bitmap (v7 is posted [1], mostly reviewed)
> part 3: bdrv_get_block_status (this series, v3 at [2])
> part 4: .bdrv_co_block_status (v2 is posted [3], but needs a rebase)
> 
> Available as a tag at:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-status-v4
> 
> Based-on: <20170912203119.24166-1-eblake@redhat.com>
> ([PATCH v7 00/20] make dirty-bitmap byte-based)

Also,

Based-on: <20170913210343.19078-1-eblake@redhat.com>
(osdep: Fix ROUND_UP(64-bit, 32-bit))

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 01/23] block: Allow NULL file for bdrv_get_block_status()
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 01/23] block: Allow NULL file for bdrv_get_block_status() Eric Blake
@ 2017-09-25 22:43   ` John Snow
  2017-09-27 21:46     ` Eric Blake
  0 siblings, 1 reply; 64+ messages in thread
From: John Snow @ 2017-09-25 22:43 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Jeff Cody, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> Not all callers care about which BDS owns the mapping for a given
> range of the file.  This patch merely simplifies the callers by
> consolidating the logic in the common call point, while guaranteeing
> a non-NULL file to all the driver callbacks, for no semantic change.
> The only caller that does not care about pnum is bdrv_is_allocated,
> as invoked by vvfat; we can likewise add assertions that the rest
> of the stack does not have to worry about a NULL pnum.
> 
> Furthermore, this will also set the stage for a future cleanup: when
> a caller does not care about which BDS owns an offset, it would be
> nice to allow the driver to optimize things to not have to return
> BDRV_BLOCK_OFFSET_VALID in the first place.  In the case of fragmented
> allocation (for example, it's fairly easy to create a qcow2 image
> where consecutive guest addresses are not at consecutive host
> addresses), the current contract requires bdrv_get_block_status()
> to clamp *pnum to the limit where host addresses are no longer
> consecutive, but allowing a NULL file means that *pnum could be
> set to the full length of known-allocated data.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v4: only context changes
> v3: rebase to recent changes (qcow2_measure), dropped R-b
> v2: use local variable and final transfer, rather than assignment
> of parameter to local
> [previously in different series]:
> v2: new patch, https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05645.html
> ---
>  include/block/block_int.h | 10 ++++++----
>  block/io.c                | 44 ++++++++++++++++++++++++++++----------------
>  block/mirror.c            |  3 +--
>  block/qcow2.c             |  8 ++------
>  qemu-img.c                | 10 ++++------
>  5 files changed, 41 insertions(+), 34 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 55c5d573d4..7f71c585a0 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -202,10 +202,12 @@ struct BlockDriver {
>          int64_t offset, int bytes);
> 
>      /*
> -     * Building block for bdrv_block_status[_above]. The driver should
> -     * answer only according to the current layer, and should not
> -     * set BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
> -     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.
> +     * Building block for bdrv_block_status[_above] and
> +     * bdrv_is_allocated[_above].  The driver should answer only
> +     * according to the current layer, and should not set
> +     * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
> +     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
> +     * layer guarantees non-NULL pnum and file.
>       */
>      int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
>          int64_t sector_num, int nb_sectors, int *pnum,
> diff --git a/block/io.c b/block/io.c
> index 8a0cd8835a..f250029395 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -695,7 +695,6 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
>  {
>      int64_t target_sectors, ret, nb_sectors, sector_num = 0;
>      BlockDriverState *bs = child->bs;
> -    BlockDriverState *file;
>      int n;
> 
>      target_sectors = bdrv_nb_sectors(bs);
> @@ -708,7 +707,7 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
>          if (nb_sectors <= 0) {
>              return 0;
>          }
> -        ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n, &file);
> +        ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n, NULL);
>          if (ret < 0) {
>              error_report("error getting block status at sector %" PRId64 ": %s",
>                           sector_num, strerror(-ret));
> @@ -1755,8 +1754,9 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
>   * beyond the end of the disk image it will be clamped; if 'pnum' is set to
>   * the end of the image, then the returned value will include BDRV_BLOCK_EOF.
>   *
> - * If returned value is positive and BDRV_BLOCK_OFFSET_VALID bit is set, 'file'
> - * points to the BDS which the sector range is allocated in.
> + * If returned value is positive, BDRV_BLOCK_OFFSET_VALID bit is set, and
> + * 'file' is non-NULL, then '*file' points to the BDS which the sector range
> + * is allocated in.
>   */
>  static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>                                                       int64_t sector_num,
> @@ -1766,15 +1766,22 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>      int64_t total_sectors;
>      int64_t n;
>      int64_t ret, ret2;
> +    BlockDriverState *local_file = NULL;
> 
> -    *file = NULL;
> +    assert(pnum);

nice assert..

>      total_sectors = bdrv_nb_sectors(bs);
>      if (total_sectors < 0) {
> +        if (file) {
> +            *file = NULL;
> +        }

Function reads slightly worse for the wear now with all of the return
logic handled at various places within, but unifying it might be even
stranger, perhaps..

Let's see if I hate this more:

out:
    bdrv_dec_in_flight(bs);
    if (ret >= 0 && sector_num + *pnum == total_sectors) {
        ret |= BDRV_BLOCK_EOF;
    }
early_out:
    if (file) {
        *file = local_file;
    }
    return ret;


and then earlier in the function, we can just:

if (total_sectors < 0) {
  ret = total_sectors;
  goto early_out;
}

>          return total_sectors;
>      }
> 
>      if (sector_num >= total_sectors) {
>          *pnum = 0;
> +        if (file) {
> +            *file = NULL;
> +        }

ret = BDRV_BLOCK_EOF;
goto early_out;

>          return BDRV_BLOCK_EOF;
>      }
> 
> @@ -1791,23 +1798,27 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>          }
>          if (bs->drv->protocol_name) {
>              ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
> -            *file = bs;
> +            if (file) {
> +                *file = bs;
> +            }

local_file = bs;

> +        } else if (file) {
> +            *file = NULL;

no longer needed

>          }
>          return ret;

replaced with:

goto early_out;

>      }
> 
>      bdrv_inc_in_flight(bs);
>      ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum,
> -                                            file);
> +                                            &local_file);
>      if (ret < 0) {
>          *pnum = 0;
>          goto out;
>      }
> 
>      if (ret & BDRV_BLOCK_RAW) {
> -        assert(ret & BDRV_BLOCK_OFFSET_VALID && *file);
> -        ret = bdrv_co_get_block_status(*file, ret >> BDRV_SECTOR_BITS,
> -                                       *pnum, pnum, file);
> +        assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
> +        ret = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
> +                                       *pnum, pnum, &local_file);
>          goto out;
>      }
> 
> @@ -1825,14 +1836,13 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>          }
>      }
> 
> -    if (*file && *file != bs &&
> +    if (local_file && local_file != bs &&
>          (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>          (ret & BDRV_BLOCK_OFFSET_VALID)) {
> -        BlockDriverState *file2;
>          int file_pnum;
> 
> -        ret2 = bdrv_co_get_block_status(*file, ret >> BDRV_SECTOR_BITS,
> -                                        *pnum, &file_pnum, &file2);
> +        ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
> +                                        *pnum, &file_pnum, NULL);
>          if (ret2 >= 0) {
>              /* Ignore errors.  This is just providing extra information, it
>               * is useful but not necessary.
> @@ -1854,6 +1864,9 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>      }
> 
>  out:
> +    if (file) {
> +        *file = local_file;
> +    }
>      bdrv_dec_in_flight(bs);
>      if (ret >= 0 && sector_num + *pnum == total_sectors) {
>          ret |= BDRV_BLOCK_EOF;
> @@ -1957,7 +1970,6 @@ int64_t bdrv_get_block_status(BlockDriverState *bs,
>  int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>                                     int64_t bytes, int64_t *pnum)
>  {
> -    BlockDriverState *file;
>      int64_t sector_num = offset >> BDRV_SECTOR_BITS;
>      int nb_sectors = bytes >> BDRV_SECTOR_BITS;
>      int64_t ret;
> @@ -1966,7 +1978,7 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>      assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
>      assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) && bytes < INT_MAX);
>      ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
> -                                &file);
> +                                NULL);
>      if (ret < 0) {
>          return ret;
>      }
> diff --git a/block/mirror.c b/block/mirror.c
> index 5cdaaed7be..032cfe91fa 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -390,7 +390,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>          int io_sectors;
>          unsigned int io_bytes;
>          int64_t io_bytes_acct;
> -        BlockDriverState *file;
>          enum MirrorMethod {
>              MIRROR_METHOD_COPY,
>              MIRROR_METHOD_ZERO,
> @@ -401,7 +400,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>          ret = bdrv_get_block_status_above(source, NULL,
>                                            offset >> BDRV_SECTOR_BITS,
>                                            nb_chunks * sectors_per_chunk,
> -                                          &io_sectors, &file);
> +                                          &io_sectors, NULL);
>          io_bytes = io_sectors * BDRV_SECTOR_SIZE;
>          if (ret < 0) {
>              io_bytes = MIN(nb_chunks * s->granularity, max_io_bytes);
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 64dcd98a91..9a7b5cd41f 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2975,7 +2975,6 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
>                              uint32_t count)
>  {
>      int nr;
> -    BlockDriverState *file;
>      int64_t res;
> 
>      if (start + count > bs->total_sectors) {
> @@ -2985,8 +2984,7 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
>      if (!count) {
>          return true;
>      }
> -    res = bdrv_get_block_status_above(bs, NULL, start, count,
> -                                      &nr, &file);
> +    res = bdrv_get_block_status_above(bs, NULL, start, count, &nr, NULL);
>      return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == count;
>  }
> 
> @@ -3654,13 +3652,11 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
>                   offset += pnum * BDRV_SECTOR_SIZE) {
>                  int nb_sectors = MIN(ssize - offset,
>                                       BDRV_REQUEST_MAX_BYTES) / BDRV_SECTOR_SIZE;
> -                BlockDriverState *file;
>                  int64_t ret;
> 
>                  ret = bdrv_get_block_status_above(in_bs, NULL,
>                                                    offset >> BDRV_SECTOR_BITS,
> -                                                  nb_sectors,
> -                                                  &pnum, &file);
> +                                                  nb_sectors, &pnum, NULL);
>                  if (ret < 0) {
>                      error_setg_errno(&local_err, -ret,
>                                       "Unable to get block status");
> diff --git a/qemu-img.c b/qemu-img.c
> index df984b11b9..0c12e1c240 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1374,7 +1374,6 @@ static int img_compare(int argc, char **argv)
> 
>      for (;;) {
>          int64_t status1, status2;
> -        BlockDriverState *file;
> 
>          nb_sectors = sectors_to_process(total_sectors, sector_num);
>          if (nb_sectors <= 0) {
> @@ -1382,7 +1381,7 @@ static int img_compare(int argc, char **argv)
>          }
>          status1 = bdrv_get_block_status_above(bs1, NULL, sector_num,
>                                                total_sectors1 - sector_num,
> -                                              &pnum1, &file);
> +                                              &pnum1, NULL);
>          if (status1 < 0) {
>              ret = 3;
>              error_report("Sector allocation test failed for %s", filename1);
> @@ -1392,7 +1391,7 @@ static int img_compare(int argc, char **argv)
> 
>          status2 = bdrv_get_block_status_above(bs2, NULL, sector_num,
>                                                total_sectors2 - sector_num,
> -                                              &pnum2, &file);
> +                                              &pnum2, NULL);
>          if (status2 < 0) {
>              ret = 3;
>              error_report("Sector allocation test failed for %s", filename2);
> @@ -1598,15 +1597,14 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
>      n = MIN(s->total_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);
> 
>      if (s->sector_next_status <= sector_num) {
> -        BlockDriverState *file;
>          if (s->target_has_backing) {
>              ret = bdrv_get_block_status(blk_bs(s->src[src_cur]),
>                                          sector_num - src_cur_offset,
> -                                        n, &n, &file);
> +                                        n, &n, NULL);
>          } else {
>              ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
>                                                sector_num - src_cur_offset,
> -                                              n, &n, &file);
> +                                              n, &n, NULL);
>          }
>          if (ret < 0) {
>              return ret;
> 

It's only shed paint, though:

Reviewed-by: John Snow <jsnow@redhat.com>

I'm looking at the rest of the series now, so please stand by.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 02/23] block: Add flag to avoid wasted work in bdrv_is_allocated()
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 02/23] block: Add flag to avoid wasted work in bdrv_is_allocated() Eric Blake
@ 2017-09-26 18:31   ` John Snow
  2017-09-28 14:58     ` Eric Blake
  0 siblings, 1 reply; 64+ messages in thread
From: John Snow @ 2017-09-26 18:31 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> Not all callers care about which BDS owns the mapping for a given
> range of the file.  In particular, bdrv_is_allocated() cares more
> about finding the largest run of allocated data from the guest
> perspective, whether or not that data is consecutive from the
> host perspective.  Therefore, doing subsequent refinements such
> as checking how much of the format-layer allocation also satisfies
> BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best
> case, it just costs extra CPU cycles during a single
> bdrv_is_allocated(), but in the worst case, it results in a smaller
> *pnum, and forces callers to iterate through more status probes when
> visiting the entire file for even more extra CPU cycles.
> 
> This patch only optimizes the block layer.  But subsequent patches
> will tweak the driver callback to be byte-based, and in the process,
> can also pass this hint through to the driver.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v4: only context changes
> v3: s/allocation/mapping/ and flip sense of bool
> v2: new patch
> ---
>  block/io.c | 52 ++++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index f250029395..6509c804d4 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1709,6 +1709,7 @@ typedef struct BdrvCoGetBlockStatusData {
>      int nb_sectors;
>      int *pnum;
>      int64_t ret;
> +    bool mapping;
>      bool done;
>  } BdrvCoGetBlockStatusData;
> 
> @@ -1743,6 +1744,11 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
>   * Drivers not implementing the functionality are assumed to not support
>   * backing files, hence all their sectors are reported as allocated.
>   *
> + * If 'mapping' is true, the caller is querying for mapping purposes,
> + * and the result should include BDRV_BLOCK_OFFSET_VALID where
> + * possible; otherwise, the result may omit that bit particularly if
> + * it allows for a larger value in 'pnum'.
> + *
>   * If 'sector_num' is beyond the end of the disk image the return value is
>   * BDRV_BLOCK_EOF and 'pnum' is set to 0.
>   *
> @@ -1759,6 +1765,7 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
>   * is allocated in.
>   */
>  static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
> +                                                     bool mapping,
>                                                       int64_t sector_num,
>                                                       int nb_sectors, int *pnum,
>                                                       BlockDriverState **file)
> @@ -1817,14 +1824,15 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
> 
>      if (ret & BDRV_BLOCK_RAW) {
>          assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
> -        ret = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
> +        ret = bdrv_co_get_block_status(local_file, mapping,
> +                                       ret >> BDRV_SECTOR_BITS,
>                                         *pnum, pnum, &local_file);
>          goto out;
>      }
> 
>      if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
>          ret |= BDRV_BLOCK_ALLOCATED;
> -    } else {
> +    } else if (mapping) {
>          if (bdrv_unallocated_blocks_are_zero(bs)) {
>              ret |= BDRV_BLOCK_ZERO;
>          } else if (bs->backing) {
> @@ -1836,12 +1844,13 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>          }
>      }
> 
> -    if (local_file && local_file != bs &&
> +    if (mapping && local_file && local_file != bs &&

Tentatively this looks OK to me, but I have to admit I'm a little shaky
on this portion because I've not really investigated this function too
much. I am at the very least convinced that when mapping is true that
the function is equivalent and that existing callers don't have their
behavior changed too much.

Benefit of the doubt:

Reviewed-by: John Snow <jsnow@redhat.com>

>          (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>          (ret & BDRV_BLOCK_OFFSET_VALID)) {
>          int file_pnum;
> 
> -        ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
> +        ret2 = bdrv_co_get_block_status(local_file, mapping,
> +                                        ret >> BDRV_SECTOR_BITS,
>                                          *pnum, &file_pnum, NULL);
>          if (ret2 >= 0) {
>              /* Ignore errors.  This is just providing extra information, it
> @@ -1876,6 +1885,7 @@ out:
> 
>  static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
>          BlockDriverState *base,
> +        bool mapping,
>          int64_t sector_num,
>          int nb_sectors,
>          int *pnum,
> @@ -1887,7 +1897,8 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
> 
>      assert(bs != base);
>      for (p = bs; p != base; p = backing_bs(p)) {
> -        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum, file);
> +        ret = bdrv_co_get_block_status(p, mapping, sector_num, nb_sectors,
> +                                       pnum, file);
>          if (ret < 0) {
>              break;
>          }
> @@ -1917,6 +1928,7 @@ static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
>      BdrvCoGetBlockStatusData *data = opaque;
> 
>      data->ret = bdrv_co_get_block_status_above(data->bs, data->base,
> +                                               data->mapping,
>                                                 data->sector_num,
>                                                 data->nb_sectors,
>                                                 data->pnum,
> @@ -1929,11 +1941,12 @@ static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
>   *
>   * See bdrv_co_get_block_status_above() for details.
>   */
> -int64_t bdrv_get_block_status_above(BlockDriverState *bs,
> -                                    BlockDriverState *base,
> -                                    int64_t sector_num,
> -                                    int nb_sectors, int *pnum,
> -                                    BlockDriverState **file)
> +static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
> +                                              BlockDriverState *base,
> +                                              bool mapping,
> +                                              int64_t sector_num,
> +                                              int nb_sectors, int *pnum,
> +                                              BlockDriverState **file)
>  {
>      Coroutine *co;
>      BdrvCoGetBlockStatusData data = {
> @@ -1943,6 +1956,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>          .sector_num = sector_num,
>          .nb_sectors = nb_sectors,
>          .pnum = pnum,
> +        .mapping = mapping,
>          .done = false,
>      };
> 
> @@ -1958,6 +1972,16 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>      return data.ret;
>  }
> 
> +int64_t bdrv_get_block_status_above(BlockDriverState *bs,
> +                                    BlockDriverState *base,
> +                                    int64_t sector_num,
> +                                    int nb_sectors, int *pnum,
> +                                    BlockDriverState **file)
> +{
> +    return bdrv_common_block_status_above(bs, base, true, sector_num,
> +                                          nb_sectors, pnum, file);
> +}
> +
>  int64_t bdrv_get_block_status(BlockDriverState *bs,
>                                int64_t sector_num,
>                                int nb_sectors, int *pnum,
> @@ -1970,15 +1994,15 @@ int64_t bdrv_get_block_status(BlockDriverState *bs,
>  int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>                                     int64_t bytes, int64_t *pnum)
>  {
> -    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
> -    int nb_sectors = bytes >> BDRV_SECTOR_BITS;
>      int64_t ret;
>      int psectors;
> 
>      assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
>      assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) && bytes < INT_MAX);
> -    ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
> -                                NULL);
> +    ret = bdrv_common_block_status_above(bs, backing_bs(bs), false,
> +                                         offset >> BDRV_SECTOR_BITS,
> +                                         bytes >> BDRV_SECTOR_BITS, &psectors,
> +                                         NULL);
>      if (ret < 0) {
>          return ret;
>      }
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful Eric Blake
@ 2017-09-26 18:51   ` John Snow
  2017-09-26 19:18     ` Eric Blake
  2017-09-29 20:03   ` Eric Blake
  1 sibling, 1 reply; 64+ messages in thread
From: John Snow @ 2017-09-26 18:51 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Jeff Cody, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> In the process of converting sector-based interfaces to bytes,
> I'm finding it easier to represent a byte count as a 64-bit
> integer at the block layer (even if we are internally capped
> by SIZE_MAX or even INT_MAX for individual transactions, it's
> still nicer to not have to worry about truncation/overflow
> issues on as many variables).  Update the signature of
> bdrv_round_to_clusters() to uniformly use int64_t, matching
> the signature already chosen for bdrv_is_allocated and the
> fact that off_t is also a signed type, then adjust clients
> according to the required fallout.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> 
> ---
> v4: only context changes
> v3: no change
> v2: fix commit message [John], rebase to earlier changes, including
> mirror_clip_bytes() signature update
> ---
>  include/block/block.h | 4 ++--
>  block/io.c            | 7 ++++---
>  block/mirror.c        | 7 +++----
>  block/trace-events    | 2 +-
>  4 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 2ad18775af..bb3b95d491 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -475,9 +475,9 @@ int bdrv_get_flags(BlockDriverState *bs);
>  int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
>  ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
>  void bdrv_round_to_clusters(BlockDriverState *bs,
> -                            int64_t offset, unsigned int bytes,
> +                            int64_t offset, int64_t bytes,
>                              int64_t *cluster_offset,
> -                            unsigned int *cluster_bytes);
> +                            int64_t *cluster_bytes);
> 
>  const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
>  void bdrv_get_backing_filename(BlockDriverState *bs,
> diff --git a/block/io.c b/block/io.c
> index 6509c804d4..b362b46e3d 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -446,9 +446,9 @@ static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
>   * Round a region to cluster boundaries
>   */
>  void bdrv_round_to_clusters(BlockDriverState *bs,
> -                            int64_t offset, unsigned int bytes,
> +                            int64_t offset, int64_t bytes,
>                              int64_t *cluster_offset,
> -                            unsigned int *cluster_bytes)
> +                            int64_t *cluster_bytes)
>  {
>      BlockDriverInfo bdi;
> 
> @@ -946,7 +946,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
>      struct iovec iov;
>      QEMUIOVector bounce_qiov;
>      int64_t cluster_offset;
> -    unsigned int cluster_bytes;
> +    int64_t cluster_bytes;
>      size_t skip_bytes;
>      int ret;
> 
> @@ -967,6 +967,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
>      trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
>                                     cluster_offset, cluster_bytes);
> 
> +    assert(cluster_bytes < SIZE_MAX);

Later in this function, is there any real or imagined risk of
cluster_bytes exceeding INT_MAX when it's passed to
bdrv_co_do_pwrite_zeroes()?

>      iov.iov_len = cluster_bytes;
>      iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len);
>      if (bounce_buffer == NULL) {
> diff --git a/block/mirror.c b/block/mirror.c
> index 032cfe91fa..67f45cec4e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -190,10 +190,9 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
>      bool need_cow;
>      int ret = 0;
>      int64_t align_offset = *offset;
> -    unsigned int align_bytes = *bytes;
> +    int64_t align_bytes = *bytes;
>      int max_bytes = s->granularity * s->max_iov;
> 
> -    assert(*bytes < INT_MAX);
>      need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
>      need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
>                            s->cow_bitmap);
> @@ -388,7 +387,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>      while (nb_chunks > 0 && offset < s->bdev_length) {
>          int64_t ret;
>          int io_sectors;
> -        unsigned int io_bytes;
> +        int64_t io_bytes;
>          int64_t io_bytes_acct;
>          enum MirrorMethod {
>              MIRROR_METHOD_COPY,
> @@ -413,7 +412,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>              io_bytes = s->granularity;
>          } else if (ret >= 0 && !(ret & BDRV_BLOCK_DATA)) {
>              int64_t target_offset;
> -            unsigned int target_bytes;
> +            int64_t target_bytes;
>              bdrv_round_to_clusters(blk_bs(s->target), offset, io_bytes,
>                                     &target_offset, &target_bytes);
>              if (target_offset == offset &&
> diff --git a/block/trace-events b/block/trace-events
> index 25dd5a3026..4c6586f156 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -12,7 +12,7 @@ blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int flag
>  bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
>  bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
>  bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags 0x%x"
> -bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %u"
> +bdrv_co_do_copy_on_readv(void *bs, int64_t offset, int64_t bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %"PRId64" cluster_offset %"PRId64" cluster_bytes %u"
> 
>  # block/stream.c
>  stream_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d"
> 

Everything else looks obviously correct to me.

* Re: [Qemu-devel] [PATCH v4 04/23] qcow2: Switch is_zero_sectors() to byte-based
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 04/23] qcow2: Switch is_zero_sectors() to byte-based Eric Blake
@ 2017-09-26 19:06   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-26 19:06 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change), and rename it to is_zero() in the
> process.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>

Reviewed-by: John Snow <jsnow@redhat.com>

* Re: [Qemu-devel] [PATCH v4 05/23] block: Switch bdrv_make_zero() to byte-based
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 05/23] block: Switch bdrv_make_zero() " Eric Blake
@ 2017-09-26 19:13   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-26 19:13 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Change the internal
> loop iteration of zeroing a device to track by bytes instead of
> sectors (although we are still guaranteed that we iterate by steps
> that are sector-aligned).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>

Reviewed-by: John Snow <jsnow@redhat.com>

* Re: [Qemu-devel] [PATCH v4 06/23] qemu-img: Switch get_block_status() to byte-based
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 06/23] qemu-img: Switch get_block_status() " Eric Blake
@ 2017-09-26 19:16   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-26 19:16 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting
> an internal function (no semantic change), and simplifying its
> caller accordingly.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>

* Re: [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful
  2017-09-26 18:51   ` John Snow
@ 2017-09-26 19:18     ` Eric Blake
  2017-09-26 19:29       ` John Snow
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-26 19:18 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, famz, qemu-block, Jeff Cody, Max Reitz, Stefan Hajnoczi

On 09/26/2017 01:51 PM, John Snow wrote:
> 
> 
> On 09/13/2017 12:03 PM, Eric Blake wrote:
>> In the process of converting sector-based interfaces to bytes,
>> I'm finding it easier to represent a byte count as a 64-bit
>> integer at the block layer (even if we are internally capped
>> by SIZE_MAX or even INT_MAX for individual transactions, it's
>> still nicer to not have to worry about truncation/overflow
>> issues on as many variables).  Update the signature of
>> bdrv_round_to_clusters() to uniformly use int64_t, matching
>> the signature already chosen for bdrv_is_allocated and the
>> fact that off_t is also a signed type, then adjust clients
>> according to the required fallout.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> Reviewed-by: Fam Zheng <famz@redhat.com>
>>

>> @@ -946,7 +946,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
>>      struct iovec iov;
>>      QEMUIOVector bounce_qiov;
>>      int64_t cluster_offset;
>> -    unsigned int cluster_bytes;
>> +    int64_t cluster_bytes;
>>      size_t skip_bytes;
>>      int ret;
>>
>> @@ -967,6 +967,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
>>      trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
>>                                     cluster_offset, cluster_bytes);
>>
>> +    assert(cluster_bytes < SIZE_MAX);
> 
> later in this function, is there any real or imagined risk of
> cluster_bytes exceeding INT_MAX when it's passed to
> bdrv_co_do_pwrite_zeroes?
> 
>>      iov.iov_len = cluster_bytes;

cluster_bytes is the input 'unsigned int bytes' rounded out to cluster
boundaries, but where we know 'bytes <= BDRV_REQUEST_MAX_BYTES' (which
is 2^31 - 511).  Still, I guess you are right that rounding to a cluster
size could produce a larger value of exactly 2^31 (bigger than INT_MAX,
but still fits in 32-bit unsigned int, so my assert was to make sure
that truncating 64 bits to size_t iov.iov_len still works on 32-bit
platforms).

In theory, I don't think we ever attempt an unaligned operation near
2^31 that would round up to INT_MAX overflow (if we can, that's a
pre-existing bug that should be fixed separately).

Should I tighten the assertion to assert(cluster_bytes <=
BDRV_REQUEST_MAX_BYTES), then see if I can come up with a case where we
can violate that?
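
To make the rounding concern concrete, here is a minimal standalone
sketch, not code from the tree.  The names round_to_clusters() and
exceeds_int() are hypothetical, REQUEST_MAX_BYTES is only an assumed
stand-in for BDRV_REQUEST_MAX_BYTES, and a 32-bit int is assumed:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed stand-ins for QEMU's definitions; not copied from the tree. */
#define SECTOR_BITS 9
#define REQUEST_MAX_BYTES ((int64_t)(INT_MAX >> SECTOR_BITS) << SECTOR_BITS)

/*
 * Hypothetical helper: widen [offset, offset + bytes) out to cluster
 * boundaries.  cluster_size must be a power of two.
 */
static void round_to_clusters(int64_t offset, int64_t bytes,
                              int64_t cluster_size,
                              int64_t *cluster_offset, int64_t *cluster_bytes)
{
    int64_t start = offset & ~(cluster_size - 1);
    int64_t end = (offset + bytes + cluster_size - 1) & ~(cluster_size - 1);

    *cluster_offset = start;
    *cluster_bytes = end - start;
}

/* Returns 1 if a rounded length no longer fits in a plain int parameter. */
static int exceeds_int(int64_t cluster_bytes)
{
    return cluster_bytes > INT_MAX;
}
```

Under those assumptions, a maximum-length request starting at offset
512 on an image with 64k clusters rounds out to a cluster_bytes of
exactly 2^31, which is the case a tighter assertion would reject.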

> Everything else looks obviously correct to me.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful
  2017-09-26 19:18     ` Eric Blake
@ 2017-09-26 19:29       ` John Snow
  2017-09-28 22:29         ` Eric Blake
  0 siblings, 1 reply; 64+ messages in thread
From: John Snow @ 2017-09-26 19:29 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Jeff Cody, Max Reitz, Stefan Hajnoczi



On 09/26/2017 03:18 PM, Eric Blake wrote:
> On 09/26/2017 01:51 PM, John Snow wrote:
>>
>>
>> On 09/13/2017 12:03 PM, Eric Blake wrote:
>>> In the process of converting sector-based interfaces to bytes,
>>> I'm finding it easier to represent a byte count as a 64-bit
>>> integer at the block layer (even if we are internally capped
>>> by SIZE_MAX or even INT_MAX for individual transactions, it's
>>> still nicer to not have to worry about truncation/overflow
>>> issues on as many variables).  Update the signature of
>>> bdrv_round_to_clusters() to uniformly use int64_t, matching
>>> the signature already chosen for bdrv_is_allocated and the
>>> fact that off_t is also a signed type, then adjust clients
>>> according to the required fallout.
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>> Reviewed-by: Fam Zheng <famz@redhat.com>
>>>
> 
>>> @@ -946,7 +946,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
>>>      struct iovec iov;
>>>      QEMUIOVector bounce_qiov;
>>>      int64_t cluster_offset;
>>> -    unsigned int cluster_bytes;
>>> +    int64_t cluster_bytes;
>>>      size_t skip_bytes;
>>>      int ret;
>>>
>>> @@ -967,6 +967,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
>>>      trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
>>>                                     cluster_offset, cluster_bytes);
>>>
>>> +    assert(cluster_bytes < SIZE_MAX);
>>
>> later in this function, is there any real or imagined risk of
>> cluster_bytes exceeding INT_MAX when it's passed to
>> bdrv_co_do_pwrite_zeroes?
>>
>>>      iov.iov_len = cluster_bytes;
> 
> cluster_bytes is the input 'unsigned int bytes' rounded out to cluster

Ah, yes, we're probably not going to exceed that, you're right.

> boundaries, but where we know 'bytes <= BDRV_REQUEST_MAX_BYTES' (which
> is 2^31 - 511).  Still, I guess you are right that rounding to a cluster
> size could produce a larger value of exactly 2^31 (bigger than INT_MAX,
> but still fits in 32-bit unsigned int, so my assert was to make sure
> that truncating 64 bits to size_t iov.iov_len still works on 32-bit
> platforms).
> 
> In theory, I don't think we ever attempt an unaligned operation near
> 2^31 that would round up to INT_MAX overflow (if we can, that's a
> pre-existing bug that should be fixed separately).
> 
> Should I tighten the assertion to assert(cluster_bytes <=
> BDRV_REQUEST_MAX_BYTES), then see if I can come up with a case where we
> can violate that?
> 

*Only* if you think it's worth your time. You'd know better than me at
this point if this is remotely possible or not. Just a simple width
check that caught my eye.

(Gotta prove to everyone I'm reading these, right? :p)

>> Everything else looks obviously correct to me.
>>
> 

* Re: [Qemu-devel] [PATCH v4 07/23] block: Convert bdrv_get_block_status() to bytes
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 07/23] block: Convert bdrv_get_block_status() to bytes Eric Blake
@ 2017-09-26 19:39   ` John Snow
  2017-09-26 19:57     ` Eric Blake
  0 siblings, 1 reply; 64+ messages in thread
From: John Snow @ 2017-09-26 19:39 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the name of the function from bdrv_get_block_status() to
> bdrv_block_status() ensures that the compiler enforces that all
> callers are updated.  For now, the io.c layer still assert()s that
> all callers are sector-aligned, but that can be relaxed when a later
> patch implements byte-based block status in the drivers.
> 
> Note that we have an inherent limitation in the BDRV_BLOCK_* return
> values: BDRV_BLOCK_OFFSET_VALID can only return the start of a
> sector, even if we later relax the interface to query for the status
> starting at an intermediate byte; document the obvious interpretation
> that valid offsets are always sector-relative.
> 
> Therefore, for the most part this patch is just the addition of scaling
> at the callers followed by inverse scaling at bdrv_block_status().  But
> some code, particularly bdrv_is_allocated(), gets a lot simpler because
> it no longer has to mess with sectors.
> 
> For ease of review, bdrv_get_block_status_above() will be tackled
> separately.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v3: clamp bytes to 32-bits, rather than asserting
> v2: rebase to earlier changes
> ---
>  include/block/block.h | 12 +++++++-----
>  block/io.c            | 31 +++++++++++++++++++------------
>  block/qcow2-cluster.c |  2 +-
>  qemu-img.c            | 20 +++++++++++---------
>  4 files changed, 38 insertions(+), 27 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index bb3b95d491..7a9a8db588 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -138,8 +138,10 @@ typedef struct HDGeometry {
>   *
>   * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 (BDRV_BLOCK_OFFSET_MASK)
>   * represent the offset in the returned BDS that is allocated for the
> - * corresponding raw data; however, whether that offset actually contains
> - * data also depends on BDRV_BLOCK_DATA and BDRV_BLOCK_ZERO, as follows:
> + * corresponding raw data.  Individual bytes are at the same sector-relative
> + * locations (and thus, this bit cannot be set for mappings which are
> + * not equivalent modulo 512).  However, whether that offset actually
> + * contains data also depends on BDRV_BLOCK_DATA, as follows:
>   *
>   * DATA ZERO OFFSET_VALID
>   *  t    t        t       sectors read as zero, returned file is zero at offset
> @@ -421,9 +423,9 @@ int bdrv_has_zero_init_1(BlockDriverState *bs);
>  int bdrv_has_zero_init(BlockDriverState *bs);
>  bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs);
>  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
> -int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
> -                              int nb_sectors, int *pnum,
> -                              BlockDriverState **file);
> +int64_t bdrv_block_status(BlockDriverState *bs, int64_t offset,
> +                          int64_t bytes, int64_t *pnum,
> +                          BlockDriverState **file);
>  int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>                                      BlockDriverState *base,
>                                      int64_t sector_num,
> diff --git a/block/io.c b/block/io.c
> index 638b3890b7..1ed46bcece 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -695,7 +695,6 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
>  {
>      int64_t target_size, ret, bytes, offset = 0;
>      BlockDriverState *bs = child->bs;
> -    int n; /* sectors */
> 
>      target_size = bdrv_getlength(bs);
>      if (target_size < 0) {
> @@ -707,24 +706,23 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
>          if (bytes <= 0) {
>              return 0;
>          }
> -        ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
> -                                    bytes >> BDRV_SECTOR_BITS, &n, NULL);
> +        ret = bdrv_block_status(bs, offset, bytes, &bytes, NULL);
>          if (ret < 0) {
>              error_report("error getting block status at offset %" PRId64 ": %s",
>                           offset, strerror(-ret));
>              return ret;
>          }
>          if (ret & BDRV_BLOCK_ZERO) {
> -            offset += n * BDRV_SECTOR_BITS;
> +            offset += bytes;
>              continue;
>          }
> -        ret = bdrv_pwrite_zeroes(child, offset, n * BDRV_SECTOR_SIZE, flags);
> +        ret = bdrv_pwrite_zeroes(child, offset, bytes, flags);
>          if (ret < 0) {
>              error_report("error writing zeroes at offset %" PRId64 ": %s",
>                           offset, strerror(-ret));
>              return ret;
>          }
> -        offset += n * BDRV_SECTOR_SIZE;
> +        offset += bytes;
>      }
>  }
> 
> @@ -1983,13 +1981,22 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>                                            nb_sectors, pnum, file);
>  }
> 
> -int64_t bdrv_get_block_status(BlockDriverState *bs,
> -                              int64_t sector_num,
> -                              int nb_sectors, int *pnum,
> -                              BlockDriverState **file)
> +int64_t bdrv_block_status(BlockDriverState *bs,
> +                          int64_t offset, int64_t bytes, int64_t *pnum,
> +                          BlockDriverState **file)
>  {
> -    return bdrv_get_block_status_above(bs, backing_bs(bs),
> -                                       sector_num, nb_sectors, pnum, file);
> +    int64_t ret;
> +    int n;
> +
> +    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
> +    bytes = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
> +    ret = bdrv_get_block_status_above(bs, backing_bs(bs),
> +                                      offset >> BDRV_SECTOR_BITS,
> +                                      bytes >> BDRV_SECTOR_BITS, &n, file);
> +    if (pnum) {
> +        *pnum = n * BDRV_SECTOR_SIZE;
> +    }

Is it safe to truncate the request in the event that the caller did not
provide a pnum target? That is, how will they know for what range we are
answering?

> +    return ret;
>  }
> 
>  int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 0d4824993c..d837b3980d 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1584,7 +1584,7 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
>           * cluster is already marked as zero, or if it's unallocated and we
>           * don't have a backing file.
>           *
> -         * TODO We might want to use bdrv_get_block_status(bs) here, but we're
> +         * TODO We might want to use bdrv_block_status(bs) here, but we're

Thanks for updating comments too :)

>           * holding s->lock, so that doesn't work today.
>           *
>           * If full_discard is true, the sector should not read back as zeroes,
> diff --git a/qemu-img.c b/qemu-img.c
> index 54f7682069..897f80abb3 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1598,9 +1598,14 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
> 
>      if (s->sector_next_status <= sector_num) {
>          if (s->target_has_backing) {
> -            ret = bdrv_get_block_status(blk_bs(s->src[src_cur]),
> -                                        sector_num - src_cur_offset,
> -                                        n, &n, NULL);
> +            int64_t count = n * BDRV_SECTOR_SIZE;
> +
> +            ret = bdrv_block_status(blk_bs(s->src[src_cur]),
> +                                    (sector_num - src_cur_offset) *
> +                                    BDRV_SECTOR_SIZE,
> +                                    count, &count, NULL);
> +            assert(ret < 0 || QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
> +            n = count >> BDRV_SECTOR_BITS;
>          } else {
>              ret = bdrv_get_block_status_above(blk_bs(s->src[src_cur]), NULL,
>                                                sector_num - src_cur_offset,
> @@ -2677,9 +2682,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
>      int depth;
>      BlockDriverState *file;
>      bool has_offset;
> -    int nb_sectors = bytes >> BDRV_SECTOR_BITS;
> 
> -    assert(bytes < INT_MAX);
>      /* As an optimization, we could cache the current range of unallocated
>       * clusters in each file of the chain, and avoid querying the same
>       * range repeatedly.
> @@ -2687,12 +2690,11 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
> 
>      depth = 0;
>      for (;;) {
> -        ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS, nb_sectors,
> -                                    &nb_sectors, &file);
> +        ret = bdrv_block_status(bs, offset, bytes, &bytes, &file);
>          if (ret < 0) {
>              return ret;
>          }
> -        assert(nb_sectors);
> +        assert(bytes);
>          if (ret & (BDRV_BLOCK_ZERO|BDRV_BLOCK_DATA)) {
>              break;
>          }
> @@ -2709,7 +2711,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
> 
>      *e = (MapEntry) {
>          .start = offset,
> -        .length = nb_sectors * BDRV_SECTOR_SIZE,
> +        .length = bytes,
>          .data = !!(ret & BDRV_BLOCK_DATA),
>          .zero = !!(ret & BDRV_BLOCK_ZERO),
>          .offset = ret & BDRV_BLOCK_OFFSET_MASK,
> 

Rest appears obviously correct.
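
For reference, the "sector-relative" wording in the new block.h comment
can be read as in the sketch below.  This is only an illustration under
assumptions: host_byte() is a hypothetical helper, and the constants are
stand-ins for BDRV_BLOCK_OFFSET_VALID and BDRV_BLOCK_OFFSET_MASK rather
than the real definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for the QEMU constants discussed in the patch. */
#define SECTOR_SIZE 512
#define BLOCK_OFFSET_VALID 0x04
#define BLOCK_OFFSET_MASK (~(int64_t)(SECTOR_SIZE - 1))

/*
 * Hypothetical helper: map a guest byte offset to the host byte it
 * lives at, given a status word whose offset field names the start of
 * a sector.  Returns -1 when the status carries no valid offset.
 */
static int64_t host_byte(int64_t status, int64_t guest_offset)
{
    if (status < 0 || !(status & BLOCK_OFFSET_VALID)) {
        return -1;
    }
    /* The byte keeps its position within the sector. */
    return (status & BLOCK_OFFSET_MASK) + (guest_offset % SECTOR_SIZE);
}
```

The returned mapping only names a sector start, so the caller adds back
the position within the sector; a mapping that shifted bytes by
something other than a multiple of 512 could not be expressed this way.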

* Re: [Qemu-devel] [PATCH v4 07/23] block: Convert bdrv_get_block_status() to bytes
  2017-09-26 19:39   ` John Snow
@ 2017-09-26 19:57     ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-26 19:57 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi

On 09/26/2017 02:39 PM, John Snow wrote:
>> -int64_t bdrv_get_block_status(BlockDriverState *bs,
>> -                              int64_t sector_num,
>> -                              int nb_sectors, int *pnum,
>> -                              BlockDriverState **file)
>> +int64_t bdrv_block_status(BlockDriverState *bs,
>> +                          int64_t offset, int64_t bytes, int64_t *pnum,
>> +                          BlockDriverState **file)
>>  {
>> -    return bdrv_get_block_status_above(bs, backing_bs(bs),
>> -                                       sector_num, nb_sectors, pnum, file);
>> +    int64_t ret;
>> +    int n;
>> +
>> +    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
>> +    bytes = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
>> +    ret = bdrv_get_block_status_above(bs, backing_bs(bs),
>> +                                      offset >> BDRV_SECTOR_BITS,
>> +                                      bytes >> BDRV_SECTOR_BITS, &n, file);
>> +    if (pnum) {
>> +        *pnum = n * BDRV_SECTOR_SIZE;
>> +    }
> 
> Is it safe to truncate the request in the event that the caller did not
> provide a pnum target? that is, how will they know for what range we are
> answering?

Hmm. I think I have some rebase cruft here. At one point, I was playing
with the idea of allowing pnum == NULL for ALL get_status() callers,
similar to the existing block/vvfat.c:cluster_was_modified():

block/vvfat.c:                    res = bdrv_is_allocated(s->qcow->bs,
block/vvfat.c-                                            (offset + i) * BDRV_SECTOR_SIZE,
block/vvfat.c-                                            BDRV_SECTOR_SIZE, NULL);

but looking further, only bdrv_is_allocated() (and NOT
bdrv_[get_]block_status) is ever used in that manner.  Or, in terms of
the 'mapping' variable, a NULL pnum only makes sense when mapping ==
false.  So the conditional on 'if (pnum)' should be dropped here.
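
Roughly, the wrapper would then have the shape sketched below.  This is
a standalone illustration, not the actual follow-up patch:
status_sectors() is a made-up stub standing in for the sector-based
call, status_bytes() is a hypothetical name, and the constants are
assumed stand-ins for the BDRV_* macros:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for QEMU's definitions; not copied from the tree. */
#define SECTOR_BITS 9
#define SECTOR_SIZE (1 << SECTOR_BITS)
#define REQUEST_MAX_BYTES ((int64_t)(INT_MAX >> SECTOR_BITS) << SECTOR_BITS)

/*
 * Stub for a sector-based status query: pretends every extent is at
 * most 8 sectors long and always reports status 1.  Purely illustrative.
 */
static int64_t status_sectors(int64_t sector_num, int nb_sectors, int *pnum)
{
    (void)sector_num;
    *pnum = nb_sectors < 8 ? nb_sectors : 8;
    return 1;
}

/*
 * Byte-based wrapper: clamp, scale down, query, scale back up.
 * pnum is mandatory, so a clamped answer is always visible to the caller.
 */
static int64_t status_bytes(int64_t offset, int64_t bytes, int64_t *pnum)
{
    int64_t ret;
    int n;

    assert(pnum != NULL);
    assert(((offset | bytes) & (SECTOR_SIZE - 1)) == 0);
    if (bytes > REQUEST_MAX_BYTES) {
        bytes = REQUEST_MAX_BYTES;
    }
    ret = status_sectors(offset >> SECTOR_BITS,
                         (int)(bytes >> SECTOR_BITS), &n);
    *pnum = (int64_t)n * SECTOR_SIZE;
    return ret;
}
```

With pnum required, a caller asking about more than the clamp always
learns how much of the range the answer covers.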

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH v4 08/23] block: Switch bdrv_co_get_block_status() to byte-based
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 08/23] block: Switch bdrv_co_get_block_status() to byte-based Eric Blake
@ 2017-09-26 20:15   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-26 20:15 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change); and as with its public counterpart,
> rename to bdrv_co_block_status() to make the compiler enforce that
> we catch all uses.  For now, we assert that callers still pass
> aligned data, but ultimately, this will be the function where we
> hand off to a byte-based driver callback, and will eventually need
> to add logic to ensure we round calls according to the driver's
> request_alignment then touch up the result handed back to the
> caller, to start permitting a caller to pass unaligned offsets.
> 
> Note that we are now prepared to accept 'bytes' larger than INT_MAX;
> this is okay as long as we clamp things internally before violating
> any 32-bit limits, and makes no difference to how a client will
> use the information (clients looping over the entire file must
> already be prepared for consecutive calls to return the same status,
> as drivers are already free to return shorter-than-maximal status
> due to any other convenient split points, such as when the L2 table
> crosses cluster boundaries in qcow2).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: John Snow <jsnow@redhat.com>
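
The commit message's point that callers must already cope with
shorter-than-requested answers can be sketched as a loop like the one
below.  Everything here is hypothetical: query_status() is a toy status
source, not a QEMU function, and count_extents() only shows the caller
pattern of advancing by *pnum and merging neighbours with equal status:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical status source: reports "allocated" (1) for the first
 * 1 MiB and "unallocated" (0) afterwards, but never answers for more
 * than 64 KiB at a time, mimicking a driver that splits its answers at
 * convenient boundaries.
 */
static int query_status(int64_t offset, int64_t bytes, int64_t *pnum)
{
    const int64_t boundary = 1 << 20;
    const int64_t chunk = 64 * 1024;
    int status = offset < boundary;
    int64_t limit = status ? boundary - offset : bytes;

    if (limit > bytes) {
        limit = bytes;
    }
    if (limit > chunk) {
        limit = chunk;
    }
    *pnum = limit;
    return status;
}

/*
 * Walk [0, length) and count the extents that remain after merging
 * consecutive answers that share the same status.
 */
static int count_extents(int64_t length)
{
    int64_t offset = 0;
    int extents = 0;
    int last = -1;

    while (offset < length) {
        int64_t pnum;
        int status = query_status(offset, length - offset, &pnum);

        assert(pnum > 0);   /* guarantees forward progress */
        if (status != last) {
            extents++;
            last = status;
        }
        offset += pnum;
    }
    return extents;
}
```

A caller written this way sees the same two extents over a 2 MiB range
whether the source answers in 64 KiB pieces or in one piece per extent,
which is why clamping inside the block layer does not change what it
observes.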

* Re: [Qemu-devel] [PATCH v4 09/23] block: Switch BdrvCoGetBlockStatusData to byte-based
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 09/23] block: Switch BdrvCoGetBlockStatusData " Eric Blake
@ 2017-09-26 20:20   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-26 20:20 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> type (no semantic change), and rename it to match the corresponding
> public function rename.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>

Reviewed-by: John Snow <jsnow@redhat.com>

* Re: [Qemu-devel] [PATCH v4 10/23] block: Switch bdrv_common_block_status_above() to byte-based
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 10/23] block: Switch bdrv_common_block_status_above() " Eric Blake
@ 2017-09-27 18:26   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-27 18:26 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>

Reviewed-by: John Snow <jsnow@redhat.com>

* Re: [Qemu-devel] [PATCH v4 11/23] block: Switch bdrv_co_get_block_status_above() to byte-based
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 11/23] block: Switch bdrv_co_get_block_status_above() " Eric Blake
@ 2017-09-27 18:31   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-27 18:31 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> type (no semantic change), and rename it to match the corresponding
> public function rename.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>


Reviewed-by: John Snow <jsnow@redhat.com>

* Re: [Qemu-devel] [PATCH v4 12/23] block: Convert bdrv_get_block_status_above() to bytes
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 12/23] block: Convert bdrv_get_block_status_above() to bytes Eric Blake
@ 2017-09-27 18:41   ` John Snow
  2017-09-27 18:57     ` Eric Blake
  0 siblings, 1 reply; 64+ messages in thread
From: John Snow @ 2017-09-27 18:41 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Jeff Cody, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the name of the function from bdrv_get_block_status_above()
> to bdrv_block_status_above() ensures that the compiler enforces that
> all callers are updated.  For now, the io.c layer still assert()s
> that all callers are sector-aligned, but that can be relaxed when a
> later patch implements byte-based block status in the drivers.
> 
> For the most part this patch is just the addition of scaling at the
> callers followed by inverse scaling at bdrv_block_status().  But some
> code, particularly bdrv_block_status(), gets a lot simpler because
> it no longer has to mess with sectors.  Likewise, mirror code no
> longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore
> drop an assertion (fix a neighboring assertion to use is_power_of_2
> while there).
> 

Huh, I suppose so, yeah. Do you have a test that covers what happens in
this newly available use case?

> For ease of review, bdrv_get_block_status() was tackled separately.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

Looks mechanically correct, anyway.

Reviewed-by: John Snow <jsnow@redhat.com>


* Re: [Qemu-devel] [PATCH v4 12/23] block: Convert bdrv_get_block_status_above() to bytes
  2017-09-27 18:41   ` John Snow
@ 2017-09-27 18:57     ` Eric Blake
  2017-09-27 19:40       ` John Snow
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-09-27 18:57 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, famz, qemu-block, Jeff Cody, Max Reitz, Stefan Hajnoczi


On 09/27/2017 01:41 PM, John Snow wrote:
> 
> 
> On 09/13/2017 12:03 PM, Eric Blake wrote:
>> We are gradually moving away from sector-based interfaces, towards
>> byte-based.  In the common case, allocation is unlikely to ever use
>> values that are not naturally sector-aligned, but it is possible
>> that byte-based values will let us be more precise about allocation
>> at the end of an unaligned file that can do byte-based access.
>>
>> Changing the name of the function from bdrv_get_block_status_above()
>> to bdrv_block_status_above() ensures that the compiler enforces that
>> all callers are updated.  For now, the io.c layer still assert()s
>> that all callers are sector-aligned, but that can be relaxed when a
>> later patch implements byte-based block status in the drivers.
>>
>> For the most part this patch is just the addition of scaling at the
>> callers followed by inverse scaling at bdrv_block_status().  But some
>> code, particularly bdrv_block_status(), gets a lot simpler because
>> it no longer has to mess with sectors.  Likewise, mirror code no
>> longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore
>> drop an assertion (fix a neighboring assertion to use is_power_of_2
>> while there).
>>
> 
> Huh, I suppose so, yeah. Do you have a test that covers what happens in
> this newly available use case?

Not directly - the mirror code no longer requires sector alignment, but
is still unlikely to use sub-sector requests unless a particular driver
returns really small status information.  I suppose we could tweak the
blkdebug driver to force status requests to be fragmented at
ridiculously small alignments, and then prove that mirroring still
occurs correctly, once all the series are in, but it's probably more
effort than it is worth to force sub-sector mirroring if we don't have a
real use case that will rely on it.

> 
>> For ease of review, bdrv_get_block_status() was tackled separately.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
> 
> Looks mechanically correct, anyway.
> 
> Reviewed-by: John Snow <jsnow@redhat.com>
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [PATCH v4 13/23] qemu-img: Simplify logic in img_compare()
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 13/23] qemu-img: Simplify logic in img_compare() Eric Blake
@ 2017-09-27 19:05   ` John Snow
  2017-09-27 19:15     ` Eric Blake
  0 siblings, 1 reply; 64+ messages in thread
From: John Snow @ 2017-09-27 19:05 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 09/13/2017 12:03 PM, Eric Blake wrote:
> As long as we are querying the status for a chunk smaller than
> the known image size, we are guaranteed that a successful return
> will have set pnum to a non-zero size (pnum is zero only for
> queries beyond the end of the file).  Use that to slightly
> simplify the calculation of the current chunk size being compared.
> Likewise, we don't have to shrink the amount of data operated on
> until we know we have to read the file, and therefore have to fit
> in the bounds of our buffer.  Also, note that 'total_sectors_over'
> is equivalent to 'progress_base'.
> 
> With these changes in place, sectors_to_process() is now dead code,
> and can be removed.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v3: new patch
> ---
>  qemu-img.c | 40 +++++++++++-----------------------------
>  1 file changed, 11 insertions(+), 29 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index b91133b922..f8423e9b3f 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1171,11 +1171,6 @@ static int64_t sectors_to_bytes(int64_t sectors)
>      return sectors << BDRV_SECTOR_BITS;
>  }
> 
> -static int64_t sectors_to_process(int64_t total, int64_t from)
> -{
> -    return MIN(total - from, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
> -}
> -
>  /*
>   * Check if passed sectors are empty (not allocated or contain only 0 bytes)
>   *
> @@ -1372,13 +1367,9 @@ static int img_compare(int argc, char **argv)
>          goto out;
>      }
> 
> -    for (;;) {
> +    while (sector_num < total_sectors) {
>          int64_t status1, status2;
> 
> -        nb_sectors = sectors_to_process(total_sectors, sector_num);
> -        if (nb_sectors <= 0) {
> -            break;
> -        }
>          status1 = bdrv_block_status_above(bs1, NULL,
>                                            sector_num * BDRV_SECTOR_SIZE,
>                                            (total_sectors1 - sector_num) *
> @@ -1402,14 +1393,9 @@ static int img_compare(int argc, char **argv)
>              goto out;
>          }
>          allocated2 = status2 & BDRV_BLOCK_ALLOCATED;
> -        if (pnum1) {
> -            nb_sectors = MIN(nb_sectors,
> -                             DIV_ROUND_UP(pnum1, BDRV_SECTOR_SIZE));
> -        }
> -        if (pnum2) {
> -            nb_sectors = MIN(nb_sectors,
> -                             DIV_ROUND_UP(pnum2, BDRV_SECTOR_SIZE));
> -        }
> +
> +        assert(pnum1 && pnum2);
> +        nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE);

In the apocalyptic future where non-sector sized returns are possible,
does this math make sense?

e.g. say the return is zeroes, but it's not aligned anymore, so we
assume we have an extra half a sector's worth of zeroes here.

> 
>          if (strict) {
>              if ((status1 & ~BDRV_BLOCK_OFFSET_MASK) !=
> @@ -1422,9 +1408,10 @@ static int img_compare(int argc, char **argv)
>              }
>          }
>          if ((status1 & BDRV_BLOCK_ZERO) && (status2 & BDRV_BLOCK_ZERO)) {
> -            nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE);
> +            /* nothing to do */
>          } else if (allocated1 == allocated2) {
>              if (allocated1) {
> +                nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
>                  ret = blk_pread(blk1, sector_num << BDRV_SECTOR_BITS, buf1,
>                                  nb_sectors << BDRV_SECTOR_BITS);
>                  if (ret < 0) {
> @@ -1453,7 +1440,7 @@ static int img_compare(int argc, char **argv)
>                  }
>              }
>          } else {
> -
> +            nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
>              if (allocated1) {
>                  ret = check_empty_sectors(blk1, sector_num, nb_sectors,
>                                            filename1, buf1, quiet);
> @@ -1476,30 +1463,24 @@ static int img_compare(int argc, char **argv)
> 
>      if (total_sectors1 != total_sectors2) {
>          BlockBackend *blk_over;
> -        int64_t total_sectors_over;
>          const char *filename_over;
> 
>          qprintf(quiet, "Warning: Image size mismatch!\n");
>          if (total_sectors1 > total_sectors2) {
> -            total_sectors_over = total_sectors1;
>              blk_over = blk1;
>              filename_over = filename1;
>          } else {
> -            total_sectors_over = total_sectors2;
>              blk_over = blk2;
>              filename_over = filename2;
>          }
> 
> -        for (;;) {
> +        while (sector_num < progress_base) {
>              int64_t count;
> 
> -            nb_sectors = sectors_to_process(total_sectors_over, sector_num);
> -            if (nb_sectors <= 0) {
> -                break;
> -            }
>              ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL,
>                                            sector_num * BDRV_SECTOR_SIZE,
> -                                          nb_sectors * BDRV_SECTOR_SIZE,
> +                                          (progress_base - sector_num) *
> +                                          BDRV_SECTOR_SIZE,
>                                            &count);
>              if (ret < 0) {
>                  ret = 3;
> @@ -1513,6 +1494,7 @@ static int img_compare(int argc, char **argv)
>              assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
>              nb_sectors = count >> BDRV_SECTOR_BITS;
>              if (ret) {
> +                nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
>                  ret = check_empty_sectors(blk_over, sector_num, nb_sectors,
>                                            filename_over, buf1, quiet);
>                  if (ret) {
> 

Rest looks right to me.


* Re: [Qemu-devel] [PATCH v4 13/23] qemu-img: Simplify logic in img_compare()
  2017-09-27 19:05   ` John Snow
@ 2017-09-27 19:15     ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-27 19:15 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz


On 09/27/2017 02:05 PM, John Snow wrote:
> 
> 
> On 09/13/2017 12:03 PM, Eric Blake wrote:
>> As long as we are querying the status for a chunk smaller than
>> the known image size, we are guaranteed that a successful return
>> will have set pnum to a non-zero size (pnum is zero only for
>> queries beyond the end of the file).  Use that to slightly
>> simplify the calculation of the current chunk size being compared.
>> Likewise, we don't have to shrink the amount of data operated on
>> until we know we have to read the file, and therefore have to fit
>> in the bounds of our buffer.  Also, note that 'total_sectors_over'
>> is equivalent to 'progress_base'.
>>
>> With these changes in place, sectors_to_process() is now dead code,
>> and can be removed.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

>> @@ -1402,14 +1393,9 @@ static int img_compare(int argc, char **argv)
>>              goto out;
>>          }
>>          allocated2 = status2 & BDRV_BLOCK_ALLOCATED;
>> -        if (pnum1) {
>> -            nb_sectors = MIN(nb_sectors,
>> -                             DIV_ROUND_UP(pnum1, BDRV_SECTOR_SIZE));
>> -        }
>> -        if (pnum2) {
>> -            nb_sectors = MIN(nb_sectors,
>> -                             DIV_ROUND_UP(pnum2, BDRV_SECTOR_SIZE));
>> -        }
>> +
>> +        assert(pnum1 && pnum2);
>> +        nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE);
> 
> In the apocalyptic future where non-sector sized returns are possible,
> does this math make sense?
> 
> e.g. say the return is zeroes, but it's not aligned anymore, so we
> assume we have an extra half a sector's worth of zeroes here.

Not introduced in this patch, but a good question for 12/23.  We want to
round up rather than down to ensure that we don't inf-loop on a partial
sector response; but at the same time, you're right that if we got a
report of a half-sector zero and we widen it, we can't guarantee that
the second half is zero.

On the bright side, this rounding goes away when later patches switch
img_compare to be byte-based, later in this series.  But you're right
that it is probably smarter to have 12/23 assert that things are already
aligned (and thus we don't need to round in the first place).
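
The trade-off described above can be shown in a small standalone sketch. It is not code from the series: SECTOR_SIZE stands in for BDRV_SECTOR_SIZE, and the helper names are hypothetical. It shows why rounding down stalls on a sub-sector report, why rounding up overclaims, and how an alignment assertion avoids both.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* stand-in for BDRV_SECTOR_SIZE */

/* Rounding down: a sub-sector report yields 0 sectors, so a loop that
 * advances by this amount would never make progress. */
static int64_t sectors_round_down(int64_t bytes)
{
    return bytes / SECTOR_SIZE;
}

/* Rounding up, as DIV_ROUND_UP() does: always advances, but may cover
 * bytes that the status query never described. */
static int64_t sectors_round_up(int64_t bytes)
{
    return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Bytes assumed to share the reported status without evidence. */
static int64_t overclaimed_bytes(int64_t bytes)
{
    return sectors_round_up(bytes) * SECTOR_SIZE - bytes;
}

/* Asserting alignment up front removes the need to round at all. */
static int64_t sectors_exact(int64_t bytes)
{
    assert(bytes % SECTOR_SIZE == 0);
    return bytes / SECTOR_SIZE;
}
```

With a 256-byte report, sectors_round_down() gives 0 (no progress) while sectors_round_up() gives 1 and overclaimed_bytes() gives 256, which is the "extra half a sector" raised in the review.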

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [PATCH v4 12/23] block: Convert bdrv_get_block_status_above() to bytes
  2017-09-27 18:57     ` Eric Blake
@ 2017-09-27 19:40       ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-27 19:40 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Jeff Cody, Max Reitz, Stefan Hajnoczi



On 09/27/2017 02:57 PM, Eric Blake wrote:
> On 09/27/2017 01:41 PM, John Snow wrote:
>>
>>
>> On 09/13/2017 12:03 PM, Eric Blake wrote:
>>> We are gradually moving away from sector-based interfaces, towards
>>> byte-based.  In the common case, allocation is unlikely to ever use
>>> values that are not naturally sector-aligned, but it is possible
>>> that byte-based values will let us be more precise about allocation
>>> at the end of an unaligned file that can do byte-based access.
>>>
>>> Changing the name of the function from bdrv_get_block_status_above()
>>> to bdrv_block_status_above() ensures that the compiler enforces that
>>> all callers are updated.  For now, the io.c layer still assert()s
>>> that all callers are sector-aligned, but that can be relaxed when a
>>> later patch implements byte-based block status in the drivers.
>>>
>>> For the most part this patch is just the addition of scaling at the
>>> callers followed by inverse scaling at bdrv_block_status().  But some
>>> code, particularly bdrv_block_status(), gets a lot simpler because
>>> it no longer has to mess with sectors.  Likewise, mirror code no
>>> longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore
>>> drop an assertion (fix a neighboring assertion to use is_power_of_2
>>> while there).
>>>
>>
>> Huh, I suppose so, yeah. Do you have a test that covers what happens in
>> this newly available use case?
> 
> Not directly - the mirror code no longer requires sector alignment, but
> is still unlikely to use sub-sector requests unless a particular driver
> returns really small status information.  I suppose we could tweak the
> blkdebug driver to force status requests to be fragmented at
> ridiculously small alignments, and then prove that mirroring still
> occurs correctly, once all the series are in, but it's probably more
> effort than it is worth to force sub-sector mirroring if we don't have a
> real use case that will rely on it.
> 

Hmm, yeah, the code probably can't be exercised currently but I do
wonder if we're removing too many breadcrumbs for potential problem
spots if someone decides to return sub-sector information in the future.

Well, I suppose I haven't been too diligent about complaining about
their removal elsewhere, so for consistency:

Either with or without the assertion removed as you see fit:

Reviewed-by: John Snow <jsnow@redhat.com>

>>
>>> For ease of review, bdrv_get_block_status() was tackled separately.
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>>
>>
>> Looks mechanically correct, anyway.
>>
>> Reviewed-by: John Snow <jsnow@redhat.com>
>>
> 


* Re: [Qemu-devel] [PATCH v4 14/23] qemu-img: Speed up compare on pre-allocated larger file
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 14/23] qemu-img: Speed up compare on pre-allocated larger file Eric Blake
@ 2017-09-27 20:54   ` John Snow
  2017-10-03  9:32   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-27 20:54 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 09/13/2017 12:03 PM, Eric Blake wrote:
> Compare the following images with all-zero contents:
> $ truncate --size 1M A
> $ qemu-img create -f qcow2 -o preallocation=off B 1G
> $ qemu-img create -f qcow2 -o preallocation=metadata C 1G
> 
> On my machine, the difference is noticeable for pre-patch speeds,
> with more than an order of magnitude in difference caused by the
> choice of preallocation in the qcow2 file:
> 
> $ time ./qemu-img compare -f raw -F qcow2 A B
> Warning: Image size mismatch!
> Images are identical.
> 
> real	0m0.014s
> user	0m0.007s
> sys	0m0.007s
> 
> $ time ./qemu-img compare -f raw -F qcow2 A C
> Warning: Image size mismatch!
> Images are identical.
> 
> real	0m0.341s
> user	0m0.144s
> sys	0m0.188s
> 
> Why? Because bdrv_is_allocated() returns false for image B but
> true for image C, throwing away the fact that both images know
> via lseek(SEEK_HOLE) that the entire image still reads as zero.
> From there, qemu-img ends up calling bdrv_pread() for every byte
> of the tail, instead of quickly looking for the next allocation.
> The solution: use block_status instead of is_allocated, giving:
> 
> $ time ./qemu-img compare -f raw -F qcow2 A C
> Warning: Image size mismatch!
> Images are identical.
> 
> real	0m0.014s
> user	0m0.011s
> sys	0m0.003s
> 
> which is on par with the speeds for no pre-allocation.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Makes good sense to me.

Reviewed-by: John Snow <jsnow@redhat.com>
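
As a rough illustration of the speedup argument in the quoted commit message, here is a toy model (not QEMU code). The ToyExtent type and both helpers are hypothetical; they only count how many bytes of the oversized tail would have to be read under each decision rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one extent in the tail of the larger image. */
typedef struct ToyExtent {
    int64_t bytes;      /* length of the extent */
    bool allocated;     /* what an is_allocated-style query reports */
    bool reads_zero;    /* extra fact a block_status-style query reports */
} ToyExtent;

/* Deciding on 'allocated' alone: every allocated byte must be read. */
static int64_t bytes_to_read_is_allocated(const ToyExtent *ext, size_t n)
{
    int64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (ext[i].allocated) {
            total += ext[i].bytes;
        }
    }
    return total;
}

/* Deciding on the full status: known-zero extents can be skipped. */
static int64_t bytes_to_read_block_status(const ToyExtent *ext, size_t n)
{
    int64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (ext[i].allocated && !ext[i].reads_zero) {
            total += ext[i].bytes;
        }
    }
    return total;
}
```

Modelling the tail of image C as one allocated extent that reads as zero, the first helper returns the whole tail length and the second returns 0, which matches the direction of the timings quoted above.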


* Re: [Qemu-devel] [PATCH v4 15/23] qemu-img: Add find_nonzero()
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 15/23] qemu-img: Add find_nonzero() Eric Blake
@ 2017-09-27 21:16   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-27 21:16 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 09/13/2017 12:03 PM, Eric Blake wrote:
> During 'qemu-img compare', when we are checking that an allocated
> portion of one file is all zeros, we don't need to waste time
> computing how many additional sectors after the first non-zero
> byte are also non-zero.  Create a new helper find_nonzero() to do
> the check for a first non-zero sector, and rebase
> check_empty_sectors() to use it.
> 
> The new interface intentionally uses bytes in its interface, even
> though it still crawls the buffer a sector at a time; it is robust
> to a partial sector at the end of the buffer.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: John Snow <jsnow@redhat.com>
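
The patch body is not quoted here, so the following is only a sketch of what a helper with the described properties might look like: a byte-based interface that crawls a sector at a time and tolerates a partial sector at the end. The return convention (-1 for all zeroes, otherwise the byte offset of the first sector containing a non-zero byte) is an assumption, as are the names find_nonzero_sketch and is_all_zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* stand-in for BDRV_SECTOR_SIZE */

/* Hypothetical helper: true if all 'len' bytes are zero. */
static bool is_all_zero(const uint8_t *buf, int64_t len)
{
    for (int64_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/* Returns -1 if 'buf' holds only zeroes, otherwise the byte offset of
 * the first sector that contains a non-zero byte. */
static int64_t find_nonzero_sketch(const uint8_t *buf, int64_t n)
{
    int64_t i;
    int64_t end = n - (n % SECTOR_SIZE); /* end of the last full sector */

    /* Crawl the full sectors one at a time. */
    for (i = 0; i < end; i += SECTOR_SIZE) {
        if (!is_all_zero(buf + i, SECTOR_SIZE)) {
            return i;
        }
    }
    /* Handle a trailing partial sector, if any. */
    if (i < n && !is_all_zero(buf + i, n - end)) {
        return i;
    }
    return -1;
}
```

For a 1300-byte buffer whose only non-zero byte is at index 700, the sketch returns 512, the start of the sector holding that byte.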


* Re: [Qemu-devel] [PATCH v4 16/23] qemu-img: Drop redundant error message in compare
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 16/23] qemu-img: Drop redundant error message in compare Eric Blake
@ 2017-09-27 21:35   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-27 21:35 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 09/13/2017 12:03 PM, Eric Blake wrote:
> If a read error is encountered during 'qemu-img compare', we
> were printing the "Error while reading offset ..." message twice.
> Update the testsuite for the improved output.
> 
> Further simplify the code by hoisting the error code conversion
> into the helper function, rather than repeating it at the callers.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v3: new patch
> ---
>  qemu-img.c                 | 19 +++++--------------
>  tests/qemu-iotests/074.out |  2 --
>  2 files changed, 5 insertions(+), 16 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index dfccebe6bc..3e1e373e8f 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1196,8 +1196,10 @@ static int64_t sectors_to_bytes(int64_t sectors)
>  /*
>   * Check if passed sectors are empty (not allocated or contain only 0 bytes)
>   *
> - * Returns 0 in case sectors are filled with 0, 1 if sectors contain non-zero
> - * data and negative value on error.
> + * Intended for use by 'qemu-img compare': Returns 0 in case sectors are
> + * filled with 0, 1 if sectors contain non-zero data (this is a comparison
> + * failure), and 4 on error (the exit status for read errors), after emitting
> + * an error message.
>   *
>   * @param blk:  BlockBackend for the image
>   * @param sect_num: Number of first sector to check
> @@ -1218,7 +1220,7 @@ static int check_empty_sectors(BlockBackend *blk, int64_t sect_num,
>      if (ret < 0) {
>          error_report("Error while reading offset %" PRId64 " of %s: %s",
>                       sectors_to_bytes(sect_num), filename, strerror(-ret));
> -        return ret;
> +        return 4;
>      }
>      idx = find_nonzero(buffer, sect_count * BDRV_SECTOR_SIZE);
>      if (idx >= 0) {
> @@ -1473,11 +1475,6 @@ static int img_compare(int argc, char **argv)
>                                            filename2, buf1, quiet);
>              }
>              if (ret) {
> -                if (ret < 0) {
> -                    error_report("Error while reading offset %" PRId64 ": %s",
> -                                 sectors_to_bytes(sector_num), strerror(-ret));
> -                    ret = 4;
> -                }
>                  goto out;
>              }
>          }
> @@ -1522,12 +1519,6 @@ static int img_compare(int argc, char **argv)
>                  ret = check_empty_sectors(blk_over, sector_num, nb_sectors,
>                                            filename_over, buf1, quiet);
>                  if (ret) {
> -                    if (ret < 0) {
> -                        error_report("Error while reading offset %" PRId64
> -                                     " of %s: %s", sectors_to_bytes(sector_num),
> -                                     filename_over, strerror(-ret));
> -                        ret = 4;
> -                    }
>                      goto out;
>                  }
>              }
> diff --git a/tests/qemu-iotests/074.out b/tests/qemu-iotests/074.out
> index 8fba5aea9c..ede66c3f81 100644
> --- a/tests/qemu-iotests/074.out
> +++ b/tests/qemu-iotests/074.out
> @@ -4,7 +4,6 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
>  wrote 512/512 bytes at offset 512
>  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  qemu-img: Error while reading offset 0 of blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT: Input/output error
> -qemu-img: Error while reading offset 0: Input/output error
>  4
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
>  Formatting 'TEST_DIR/t.IMGFMT.2', fmt=IMGFMT size=0
> @@ -12,7 +11,6 @@ Formatting 'TEST_DIR/t.IMGFMT.2', fmt=IMGFMT size=0
>  wrote 512/512 bytes at offset 512
>  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  qemu-img: Error while reading offset 0 of blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT: Input/output error
> -qemu-img: Error while reading offset 0 of blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT: Input/output error
>  Warning: Image size mismatch!
>  4
>  Cleanup
> 

Hm, naively I might assume it's best for the caller to report the error
and to leave the function a nicely self-contained helper, but I won't
insist on it.

Reviewed-by: John Snow <jsnow@redhat.com>
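
For readers weighing the two designs, this is a minimal sketch of the approach the patch takes, where the helper both reports the error and converts it to the exit status. It is not the QEMU function: the reader callback type and the toy_* names are hypothetical, and only the 0 / 1 / 4 return values come from the quoted comment.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reader callback: returns 0 on success, -errno on failure. */
typedef int (*toy_read_fn)(int64_t offset, uint8_t *buf, int64_t bytes);

static int toy_read_zeroes(int64_t offset, uint8_t *buf, int64_t bytes)
{
    (void)offset;
    memset(buf, 0, (size_t)bytes);
    return 0;
}

static int toy_read_data(int64_t offset, uint8_t *buf, int64_t bytes)
{
    (void)offset;
    memset(buf, 0xab, (size_t)bytes);
    return 0;
}

static int toy_read_fail(int64_t offset, uint8_t *buf, int64_t bytes)
{
    (void)offset;
    (void)buf;
    (void)bytes;
    return -EIO;
}

/* Returns 0 if the range reads as zeroes, 1 if it holds non-zero data,
 * and 4 after printing one error message. Callers never see a negative
 * value, so they have nothing left to report or convert. */
static int toy_check_empty(toy_read_fn do_read, int64_t offset,
                           int64_t bytes, const char *filename, uint8_t *buf)
{
    int ret = do_read(offset, buf, bytes);

    if (ret < 0) {
        fprintf(stderr, "Error while reading offset %" PRId64 " of %s: %s\n",
                offset, filename, strerror(-ret));
        return 4;
    }
    for (int64_t i = 0; i < bytes; i++) {
        if (buf[i]) {
            return 1;
        }
    }
    return 0;
}
```

A caller can then treat any non-zero return as "stop and exit with this status", which is what lets the duplicated error_report() calls in img_compare() go away.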


* Re: [Qemu-devel] [PATCH v4 17/23] qemu-img: Change check_empty_sectors() to byte-based
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 17/23] qemu-img: Change check_empty_sectors() to byte-based Eric Blake
@ 2017-09-27 21:43   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-27 21:43 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 09/13/2017 12:03 PM, Eric Blake wrote:
> Continue on the quest to make more things byte-based instead of
> sector-based.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: John Snow <jsnow@redhat.com>


* Re: [Qemu-devel] [PATCH v4 01/23] block: Allow NULL file for bdrv_get_block_status()
  2017-09-25 22:43   ` John Snow
@ 2017-09-27 21:46     ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-27 21:46 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, famz, qemu-block, Jeff Cody, Max Reitz, Stefan Hajnoczi


On 09/25/2017 05:43 PM, John Snow wrote:
> 
> 
> On 09/13/2017 12:03 PM, Eric Blake wrote:
>> Not all callers care about which BDS owns the mapping for a given
>> range of the file.  This patch merely simplifies the callers by
>> consolidating the logic in the common call point, while guaranteeing
>> a non-NULL file to all the driver callbacks, for no semantic change.
>> The only caller that does not care about pnum is bdrv_is_allocated,
>> as invoked by vvfat; we can likewise add assertions that the rest
>> of the stack does not have to worry about a NULL pnum.
>>
>> Furthermore, this will also set the stage for a future cleanup: when
>> a caller does not care about which BDS owns an offset, it would be
>> nice to allow the driver to optimize things to not have to return
>> BDRV_BLOCK_OFFSET_VALID in the first place.  In the case of fragmented
>> allocation (for example, it's fairly easy to create a qcow2 image
>> where consecutive guest addresses are not at consecutive host
>> addresses), the current contract requires bdrv_get_block_status()
>> to clamp *pnum to the limit where host addresses are no longer
>> consecutive, but allowing a NULL file means that *pnum could be
>> set to the full length of known-allocated data.
>>
> 
> Function reads slightly worse for the wear now with all of the return
> logic handled at various places within, but unifying it might be even
> stranger, perhaps..
> 
> Let's see if I hate this more:
> 
> out:
>     bdrv_dec_in_flight(bs);
>     if (ret >= 0 && sector_num + *pnum == total_sectors) {
>         ret |= BDRV_BLOCK_EOF;
>     }
> early_out:
>     if (file) {
>         *file = local_file;
>     }
>     return ret;
> 
> 
> and then earlier in the function, we can just:
> 
> if (total_sectors < 0) {
>   ret = total_sectors;
>   goto early_out;
> }

Seems reasonable enough, I'll work that in to v5, since there are other
reasons to respin the series anyway.
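
A standalone sketch of the two-label exit structure suggested above, assuming a toy status function rather than the real bdrv_co_get_block_status(). The names toy_block_status, toy_in_flight, and the TOY_BLOCK_* values are hypothetical; the point is only that the early error path skips the in-flight decrement while both paths still set *file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_DATA 0x01 /* hypothetical flag values */
#define TOY_BLOCK_EOF  0x20

static int toy_in_flight; /* stand-in for the in-flight request counter */

static int64_t toy_block_status(int64_t total_sectors, int64_t sector_num,
                                int64_t nb_sectors, int64_t *pnum,
                                const char **file)
{
    const char *local_file = NULL;
    int64_t ret;

    *pnum = 0;
    if (total_sectors < 0) {
        /* Nothing was marked in flight yet, so skip the decrement. */
        ret = total_sectors;
        goto early_out;
    }

    toy_in_flight++;
    if (sector_num >= total_sectors || nb_sectors <= 0) {
        ret = 0;
        goto out;
    }
    if (nb_sectors > total_sectors - sector_num) {
        nb_sectors = total_sectors - sector_num;
    }
    *pnum = nb_sectors;
    local_file = "toy-image";
    ret = TOY_BLOCK_DATA;

out:
    toy_in_flight--;
    if (ret >= 0 && sector_num + *pnum == total_sectors) {
        ret |= TOY_BLOCK_EOF;
    }
early_out:
    /* Single place where the optional output parameter is written. */
    if (file) {
        *file = local_file;
    }
    return ret;
}
```

Passing NULL for file is accepted on every path, and the counter returns to zero whichever label is taken.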

> 
> It's only shed paint, though:
> 
> Reviewed-by: John Snow <jsnow@redhat.com>
> 
> I'm looking at the rest of the series now, so please stand by.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [PATCH v4 18/23] qemu-img: Change compare_sectors() to be byte-based
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 18/23] qemu-img: Change compare_sectors() to be byte-based Eric Blake
@ 2017-09-27 22:25   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-27 22:25 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 09/13/2017 12:03 PM, Eric Blake wrote:
> In the continuing quest to make more things byte-based, change
> compare_sectors(), renaming it to compare_buffers() in the
> process.  Note that one caller (qemu-img compare) only cares
> about the first difference, while the other (qemu-img rebase)
> cares about how many consecutive sectors have the same
> equal/different status; however, this patch does not bother to
> micro-optimize the compare case to avoid the comparisons of
> sectors beyond the first mismatch.  Both callers are always
> passing valid buffers in, so the initial check for buffer size
> can be turned into an assertion.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v3: new patch
> ---
>  qemu-img.c | 55 +++++++++++++++++++++++++++----------------------------
>  1 file changed, 27 insertions(+), 28 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 2e05f92e85..034122eba5 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1155,31 +1155,28 @@ static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
>  }
> 
>  /*
> - * Compares two buffers sector by sector. Returns 0 if the first sector of both
> - * buffers matches, non-zero otherwise.
> + * Compares two buffers sector by sector. Returns 0 if the first
> + * sector of each buffer matches, non-zero otherwise.
>   *
> - * pnum is set to the number of sectors (including and immediately following
> - * the first one) that are known to have the same comparison result
> + * pnum is set to the sector-aligned size of the buffer prefix that
> + * has the same matching status as the first sector.
>   */
> -static int compare_sectors(const uint8_t *buf1, const uint8_t *buf2, int n,
> -    int *pnum)
> +static int compare_buffers(const uint8_t *buf1, const uint8_t *buf2,
> +                           int64_t bytes, int64_t *pnum)
>  {
>      bool res;
> -    int i;
> +    int64_t i = MIN(bytes, BDRV_SECTOR_SIZE);
> 
> -    if (n <= 0) {
> -        *pnum = 0;
> -        return 0;
> -    }
> +    assert(bytes > 0);
> 
> -    res = !!memcmp(buf1, buf2, 512);
> -    for(i = 1; i < n; i++) {
> -        buf1 += 512;
> -        buf2 += 512;
> +    res = !!memcmp(buf1, buf2, i);

It is temporarily confusing that 'i' is never again used for this
particular parameter, because

> +    while (i < bytes) {

This gives the brief impression that we might be looping in a way that
changes the comparison size passed to memcmp, which isn't true.

Just me being cranky, though. It's probably still the best way, because
of how you have to prime the loop. Doing it the literal-minded way
requires an extra i += len, so:

Reviewed-by: John Snow <jsnow@redhat.com>
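
For comparison, here is a sketch of the "literal-minded" form mentioned above, with a separate length variable and the extra i += len before the loop. Only the function's opening lines are quoted in this message, so the loop body and the final assignment to *pnum are assumptions, and compare_buffers_sketch is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512 /* stand-in for BDRV_SECTOR_SIZE */

static int64_t min64(int64_t a, int64_t b)
{
    return a < b ? a : b;
}

/* Returns 0 if the first sector of each buffer matches, 1 otherwise.
 * *pnum is set to the length of the prefix whose sectors all share
 * that same match/mismatch status. */
static int compare_buffers_sketch(const uint8_t *buf1, const uint8_t *buf2,
                                  int64_t bytes, int64_t *pnum)
{
    int64_t i = 0;
    int64_t len;
    bool res;

    assert(bytes > 0);

    /* Prime the loop with the status of the first (possibly short) sector. */
    len = min64(bytes, SECTOR_SIZE);
    res = memcmp(buf1, buf2, (size_t)len) != 0;
    i += len; /* the extra step the patch avoids by initializing i */

    while (i < bytes) {
        len = min64(bytes - i, SECTOR_SIZE);
        if ((memcmp(buf1 + i, buf2 + i, (size_t)len) != 0) != res) {
            break;
        }
        i += len;
    }

    *pnum = i;
    return res;
}
```

Here 'len' is always the comparison size and 'i' is always the offset, at the cost of one more statement than the version under review.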


* Re: [Qemu-devel] [PATCH v4 02/23] block: Add flag to avoid wasted work in bdrv_is_allocated()
  2017-09-26 18:31   ` John Snow
@ 2017-09-28 14:58     ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-28 14:58 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi


On 09/26/2017 01:31 PM, John Snow wrote:
> 
> 
> On 09/13/2017 12:03 PM, Eric Blake wrote:
>> Not all callers care about which BDS owns the mapping for a given
>> range of the file.  In particular, bdrv_is_allocated() cares more
>> about finding the largest run of allocated data from the guest
>> perspective, whether or not that data is consecutive from the
>> host perspective.  Therefore, doing subsequent refinements such
>> as checking how much of the format-layer allocation also satisfies
>> BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best
>> case, it just costs extra CPU cycles during a single
>> bdrv_is_allocated(), but in the worst case, it results in a smaller
>> *pnum, and forces callers to iterate through more status probes when
>> visiting the entire file for even more extra CPU cycles.
>>
>> This patch only optimizes the block layer.  But subsequent patches
>> will tweak the driver callback to be byte-based, and in the process,
>> can also pass this hint through to the driver.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

>>   *
>> + * If 'mapping' is true, the caller is querying for mapping purposes,
>> + * and the result should include BDRV_BLOCK_OFFSET_VALID where
>> + * possible; otherwise, the result may omit that bit particularly if
>> + * it allows for a larger value in 'pnum'.

I decided one more tweak to the comment will help:

+ * If 'mapping' is true, the caller is querying for mapping purposes,
+ * and the result should include BDRV_BLOCK_OFFSET_VALID and
+ * BDRV_BLOCK_ZERO where possible; otherwise, the result may omit those
+ * bits particularly if it allows for a larger value in 'pnum'.


>> @@ -1836,12 +1844,13 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>>          }
>>      }
>>
>> -    if (local_file && local_file != bs &&
>> +    if (mapping && local_file && local_file != bs &&
> 
> Tentatively this looks OK to me, but I have to admit I'm a little shaky
> on this portion because I've not really investigated this function too
> much. I am at the very least convinced that when mapping is true that
> the function is equivalent and that existing callers don't have their
> behavior changed too much.
> 
> Benefit of the doubt:
> 
> Reviewed-by: John Snow <jsnow@redhat.com>

Then I'll tentatively keep your R-b even with the comment tweak, unless
you say otherwise :)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* Re: [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful
  2017-09-26 19:29       ` John Snow
@ 2017-09-28 22:29         ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-28 22:29 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, famz, qemu-block, Jeff Cody, Max Reitz, Stefan Hajnoczi


On 09/26/2017 02:29 PM, John Snow wrote:
> 
> 
> On 09/26/2017 03:18 PM, Eric Blake wrote:
>> On 09/26/2017 01:51 PM, John Snow wrote:
>>>
>>>
>>> On 09/13/2017 12:03 PM, Eric Blake wrote:
>>>> In the process of converting sector-based interfaces to bytes,
>>>> I'm finding it easier to represent a byte count as a 64-bit
>>>> integer at the block layer (even if we are internally capped
>>>> by SIZE_MAX or even INT_MAX for individual transactions, it's
>>>> still nicer to not have to worry about truncation/overflow
>>>> issues on as many variables).  Update the signature of
>>>> bdrv_round_to_clusters() to uniformly use int64_t, matching
>>>> the signature already chosen for bdrv_is_allocated and the
>>>> fact that off_t is also a signed type, then adjust clients
>>>> according to the required fallout.
>>>>
>>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>>> Reviewed-by: Fam Zheng <famz@redhat.com>
>>>>
>>
>>>> @@ -946,7 +946,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
>>>>      struct iovec iov;
>>>>      QEMUIOVector bounce_qiov;
>>>>      int64_t cluster_offset;
>>>> -    unsigned int cluster_bytes;
>>>> +    int64_t cluster_bytes;
>>>>      size_t skip_bytes;
>>>>      int ret;
>>>>
>>>> @@ -967,6 +967,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
>>>>      trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
>>>>                                     cluster_offset, cluster_bytes);
>>>>
>>>> +    assert(cluster_bytes < SIZE_MAX);
>>>
>>> later in this function, is there any real or imagined risk of
>>> cluster_bytes exceeding INT_MAX when it's passed to
>>> bdrv_co_do_pwrite_zeroes?
>>>
>>>>      iov.iov_len = cluster_bytes;
>>
>> cluster_bytes is the input 'unsigned int bytes' rounded out to cluster
> 
> Ah, yes, we're probably not going to exceed that, you're right.
> 
>> boundaries, but where we know 'bytes <= BDRV_REQUEST_MAX_BYTES' (which
>> is 2^31 - 511).  Still, I guess you are right that rounding to a cluster
>> size could produce a larger value of exactly 2^31 (bigger than INT_MAX,
>> but still fits in 32-bit unsigned int, so my assert was to make sure
>> that truncating 64 bits to size_t iov.iov_len still works on 32-bit
>> platforms).
>>
>> In theory, I don't think we ever attempt an unaligned operation near
>> 2^31 that would round up to INT_MAX overflow (if we can, that's a
>> pre-existing bug that should be fixed separately).
>>
>> Should I tighten the assertion to assert(cluster_bytes <=
>> BDRV_REQUEST_MAX_BYTES), then see if I can come up with a case where we
>> can violate that?
>>
> 
> *Only* if you think it's worth your time. You'd know better than me at
> this point if this is remotely possible or not. Just a simple width
> check that caught my eye.

I reproduced a test case - we have a pre-existing bug.  An update to
qemu-io is coming up (I need to make it easy to turn on
BDRV_O_COPY_ON_READ), followed by a new iotest with my test case: create
a backing file with more than 2G of explicit zeroes, then open a brand
new wrapper qcow2 file and read 2G-512 bytes at offset 1024.  Given the
default qcow2 cluster size of 64k, this proceeds to copy-on-write 2G+64k
of data, which fits fine in the pre-patch unsigned int or post-patch
int64_t, but becomes an unintended no-op in bdrv_co_do_pwrite_zeroes.

Took me the better part of a day to figure out how to provoke it in a
way appropriate for iotests, but I'm grateful you gave me the challenge.
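
The arithmetic behind that test case can be checked with a small
standalone sketch.  The helper name round_to_clusters_sketch and the
ALIGN_* macros are assumptions for illustration; they only mimic the
rounding that bdrv_round_to_clusters() is described as doing above.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))

/* Hypothetical helper: round a byte range out to cluster boundaries. */
static void round_to_clusters_sketch(int64_t offset, int64_t bytes,
                                     int64_t cluster_size,
                                     int64_t *cluster_offset,
                                     int64_t *cluster_bytes)
{
    *cluster_offset = ALIGN_DOWN(offset, cluster_size);
    *cluster_bytes = ALIGN_UP(offset + bytes, cluster_size) - *cluster_offset;
}
```

With offset 1024, bytes 2G-512 and a 64k cluster, the rounded range
starts at 0 and spans 2G+64k: larger than INT_MAX, yet still
representable in a 32-bit unsigned int, which matches the description
above.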

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* Re: [Qemu-devel] [PATCH v4 19/23] qemu-img: Change img_rebase() to be byte-based
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 19/23] qemu-img: Change img_rebase() " Eric Blake
@ 2017-09-29 19:38   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-29 19:38 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 09/13/2017 12:03 PM, Eric Blake wrote:
> In the continuing quest to make more things byte-based, change
> the internal iteration of img_rebase().  We can finally drop the
> TODO assertion added earlier, now that the entire algorithm is
> byte-based and no longer has to shift from bytes to sectors.
> 
> Most of the change is mechanical ('num_sectors' becomes 'size',
> 'sector' becomes 'offset', 'n' goes from sectors to bytes); some
> of it is also a cleanup (use of MIN() instead of open-coding,
> loss of variable 'count' added earlier in commit d6a644bb).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: John Snow <jsnow@redhat.com>


* Re: [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful Eric Blake
  2017-09-26 18:51   ` John Snow
@ 2017-09-29 20:03   ` Eric Blake
  1 sibling, 0 replies; 64+ messages in thread
From: Eric Blake @ 2017-09-29 20:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, famz, qemu-block, Jeff Cody, Max Reitz, Stefan Hajnoczi, jsnow


On 09/13/2017 11:03 AM, Eric Blake wrote:
> In the process of converting sector-based interfaces to bytes,
> I'm finding it easier to represent a byte count as a 64-bit
> integer at the block layer (even if we are internally capped
> by SIZE_MAX or even INT_MAX for individual transactions, it's
> still nicer to not have to worry about truncation/overflow
> issues on as many variables).  Update the signature of
> bdrv_round_to_clusters() to uniformly use int64_t, matching
> the signature already chosen for bdrv_is_allocated and the
> fact that off_t is also a signed type, then adjust clients
> according to the required fallout.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> 

> @@ -946,7 +946,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
>      struct iovec iov;
>      QEMUIOVector bounce_qiov;
>      int64_t cluster_offset;
> -    unsigned int cluster_bytes;
> +    int64_t cluster_bytes;
>      size_t skip_bytes;
>      int ret;

Here, 'bytes' is still unsigned int, and I widened cluster_bytes,

> +++ b/block/trace-events
> @@ -12,7 +12,7 @@ blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int flag
>  bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
>  bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
>  bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags 0x%x"
> -bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %u"
> +bdrv_co_do_copy_on_readv(void *bs, int64_t offset, int64_t bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %"PRId64" cluster_offset %"PRId64" cluster_bytes %u"

but I botched which variable got widened in the trace.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* Re: [Qemu-devel] [PATCH v4 20/23] qemu-img: Change img_compare() to be byte-based
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 20/23] qemu-img: Change img_compare() " Eric Blake
@ 2017-09-29 20:42   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-09-29 20:42 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 09/13/2017 12:03 PM, Eric Blake wrote:
> In the continuing quest to make more things byte-based, change
> the internal iteration of img_compare().  We can finally drop the
> TODO assertion added earlier, now that the entire algorithm is
> byte-based and no longer has to shift from bytes to sectors.
> 
> Most of the change is mechanical ('total_sectors' becomes
> 'total_size', 'sector_num' becomes 'offset', 'nb_sectors' becomes
> 'chunk', 'progress_base' goes from sectors to bytes); some of it
> is also a cleanup (sectors_to_bytes() is now unused, loss of
> variable 'count' added earlier in commit 51b0a488).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v3: new patch
> ---
>  qemu-img.c | 119 ++++++++++++++++++++++++-------------------------------------
>  1 file changed, 46 insertions(+), 73 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 028c34a2cc..ef7062649d 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1185,11 +1185,6 @@ static int compare_buffers(const uint8_t *buf1, const uint8_t *buf2,
> 
>  #define IO_BUF_SIZE (2 * 1024 * 1024)
> 
> -static int64_t sectors_to_bytes(int64_t sectors)
> -{
> -    return sectors << BDRV_SECTOR_BITS;
> -}
> -
>  /*
>   * Check if passed sectors are empty (not allocated or contain only 0 bytes)
>   *
> @@ -1240,7 +1235,7 @@ static int img_compare(int argc, char **argv)
>      const char *fmt1 = NULL, *fmt2 = NULL, *cache, *filename1, *filename2;
>      BlockBackend *blk1, *blk2;
>      BlockDriverState *bs1, *bs2;
> -    int64_t total_sectors1, total_sectors2;
> +    int64_t total_size1, total_size2;
>      uint8_t *buf1 = NULL, *buf2 = NULL;
>      int64_t pnum1, pnum2;
>      int allocated1, allocated2;
> @@ -1248,9 +1243,9 @@ static int img_compare(int argc, char **argv)
>      bool progress = false, quiet = false, strict = false;
>      int flags;
>      bool writethrough;
> -    int64_t total_sectors;
> -    int64_t sector_num = 0;
> -    int64_t nb_sectors;
> +    int64_t total_size;
> +    int64_t offset = 0;
> +    int64_t chunk;
>      int c;
>      uint64_t progress_base;
>      bool image_opts = false;
> @@ -1364,39 +1359,36 @@ static int img_compare(int argc, char **argv)
> 
>      buf1 = blk_blockalign(blk1, IO_BUF_SIZE);
>      buf2 = blk_blockalign(blk2, IO_BUF_SIZE);
> -    total_sectors1 = blk_nb_sectors(blk1);
> -    if (total_sectors1 < 0) {
> +    total_size1 = blk_getlength(blk1);
> +    if (total_size1 < 0) {
>          error_report("Can't get size of %s: %s",
> -                     filename1, strerror(-total_sectors1));
> +                     filename1, strerror(-total_size1));
>          ret = 4;
>          goto out;
>      }
> -    total_sectors2 = blk_nb_sectors(blk2);
> -    if (total_sectors2 < 0) {
> +    total_size2 = blk_getlength(blk2);
> +    if (total_size2 < 0) {
>          error_report("Can't get size of %s: %s",
> -                     filename2, strerror(-total_sectors2));
> +                     filename2, strerror(-total_size2));
>          ret = 4;
>          goto out;
>      }
> -    total_sectors = MIN(total_sectors1, total_sectors2);
> -    progress_base = MAX(total_sectors1, total_sectors2);
> +    total_size = MIN(total_size1, total_size2);
> +    progress_base = MAX(total_size1, total_size2);
> 
>      qemu_progress_print(0, 100);
> 
> -    if (strict && total_sectors1 != total_sectors2) {
> +    if (strict && total_size1 != total_size2) {
>          ret = 1;
>          qprintf(quiet, "Strict mode: Image size mismatch!\n");
>          goto out;
>      }
> 
> -    while (sector_num < total_sectors) {
> +    while (offset < total_size) {
>          int64_t status1, status2;
> 
> -        status1 = bdrv_block_status_above(bs1, NULL,
> -                                          sector_num * BDRV_SECTOR_SIZE,
> -                                          (total_sectors1 - sector_num) *
> -                                          BDRV_SECTOR_SIZE,
> -                                          &pnum1, NULL);
> +        status1 = bdrv_block_status_above(bs1, NULL, offset,
> +                                          total_size1 - offset, &pnum1, NULL);
>          if (status1 < 0) {
>              ret = 3;
>              error_report("Sector allocation test failed for %s", filename1);
> @@ -1404,11 +1396,8 @@ static int img_compare(int argc, char **argv)
>          }
>          allocated1 = status1 & BDRV_BLOCK_ALLOCATED;
> 
> -        status2 = bdrv_block_status_above(bs2, NULL,
> -                                          sector_num * BDRV_SECTOR_SIZE,
> -                                          (total_sectors2 - sector_num) *
> -                                          BDRV_SECTOR_SIZE,
> -                                          &pnum2, NULL);
> +        status2 = bdrv_block_status_above(bs2, NULL, offset,
> +                                          total_size2 - offset, &pnum2, NULL);
>          if (status2 < 0) {
>              ret = 3;
>              error_report("Sector allocation test failed for %s", filename2);
> @@ -1417,15 +1406,14 @@ static int img_compare(int argc, char **argv)
>          allocated2 = status2 & BDRV_BLOCK_ALLOCATED;
> 
>          assert(pnum1 && pnum2);
> -        nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE);
> +        chunk = MIN(pnum1, pnum2);

Ayup, there goes that line.

> 
>          if (strict) {
>              if ((status1 & ~BDRV_BLOCK_OFFSET_MASK) !=
>                  (status2 & ~BDRV_BLOCK_OFFSET_MASK)) {
>                  ret = 1;
>                  qprintf(quiet, "Strict mode: Offset %" PRId64
> -                        " block status mismatch!\n",
> -                        sectors_to_bytes(sector_num));
> +                        " block status mismatch!\n", offset);
>                  goto out;
>              }
>          }
> @@ -1435,59 +1423,54 @@ static int img_compare(int argc, char **argv)
>              if (allocated1) {
>                  int64_t pnum;
> 
> -                nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
> -                ret = blk_pread(blk1, sector_num << BDRV_SECTOR_BITS, buf1,
> -                                nb_sectors << BDRV_SECTOR_BITS);
> +                chunk = MIN(chunk, IO_BUF_SIZE);
> +                ret = blk_pread(blk1, offset, buf1, chunk);
>                  if (ret < 0) {
> -                    error_report("Error while reading offset %" PRId64 " of %s:"
> -                                 " %s", sectors_to_bytes(sector_num), filename1,
> -                                 strerror(-ret));
> +                    error_report("Error while reading offset %" PRId64
> +                                 " of %s: %s",
> +                                 offset, filename1, strerror(-ret));
>                      ret = 4;
>                      goto out;
>                  }
> -                ret = blk_pread(blk2, sector_num << BDRV_SECTOR_BITS, buf2,
> -                                nb_sectors << BDRV_SECTOR_BITS);
> +                ret = blk_pread(blk2, offset, buf2, chunk);
>                  if (ret < 0) {
>                      error_report("Error while reading offset %" PRId64
> -                                 " of %s: %s", sectors_to_bytes(sector_num),
> -                                 filename2, strerror(-ret));
> +                                 " of %s: %s",
> +                                 offset, filename2, strerror(-ret));
>                      ret = 4;
>                      goto out;
>                  }
> -                ret = compare_buffers(buf1, buf2,
> -                                      nb_sectors * BDRV_SECTOR_SIZE, &pnum);
> -                if (ret || pnum != nb_sectors * BDRV_SECTOR_SIZE) {
> +                ret = compare_buffers(buf1, buf2, chunk, &pnum);
> +                if (ret || pnum != chunk) {
>                      qprintf(quiet, "Content mismatch at offset %" PRId64 "!\n",
> -                            sectors_to_bytes(sector_num) + (ret ? 0 : pnum));
> +                            offset + (ret ? 0 : pnum));
>                      ret = 1;
>                      goto out;
>                  }
>              }
>          } else {
> -            nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
> +            chunk = MIN(chunk, IO_BUF_SIZE);
>              if (allocated1) {
> -                ret = check_empty_sectors(blk1, sector_num * BDRV_SECTOR_SIZE,
> -                                          nb_sectors * BDRV_SECTOR_SIZE,
> +                ret = check_empty_sectors(blk1, offset, chunk,
>                                            filename1, buf1, quiet);
>              } else {
> -                ret = check_empty_sectors(blk2, sector_num * BDRV_SECTOR_SIZE,
> -                                          nb_sectors * BDRV_SECTOR_SIZE,
> +                ret = check_empty_sectors(blk2, offset, chunk,
>                                            filename2, buf1, quiet);
>              }
>              if (ret) {
>                  goto out;
>              }
>          }
> -        sector_num += nb_sectors;
> -        qemu_progress_print(((float) nb_sectors / progress_base)*100, 100);
> +        offset += chunk;
> +        qemu_progress_print(((float) chunk / progress_base) * 100, 100);
>      }
> 
> -    if (total_sectors1 != total_sectors2) {
> +    if (total_size1 != total_size2) {
>          BlockBackend *blk_over;
>          const char *filename_over;
> 
>          qprintf(quiet, "Warning: Image size mismatch!\n");
> -        if (total_sectors1 > total_sectors2) {
> +        if (total_size1 > total_size2) {
>              blk_over = blk1;
>              filename_over = filename1;
>          } else {
> @@ -1495,14 +1478,10 @@ static int img_compare(int argc, char **argv)
>              filename_over = filename2;
>          }
> 
> -        while (sector_num < progress_base) {
> -            int64_t count;
> -
> -            ret = bdrv_block_status_above(blk_bs(blk_over), NULL,
> -                                          sector_num * BDRV_SECTOR_SIZE,
> -                                          (progress_base - sector_num) *
> -                                          BDRV_SECTOR_SIZE,
> -                                          &count, NULL);
> +        while (offset < progress_base) {
> +            ret = bdrv_block_status_above(blk_bs(blk_over), NULL, offset,
> +                                          progress_base - offset, &chunk,
> +                                          NULL);
>              if (ret < 0) {
>                  ret = 3;
>                  error_report("Sector allocation test failed for %s",
> @@ -1510,22 +1489,16 @@ static int img_compare(int argc, char **argv)
>                  goto out;
> 
>              }
> -            /* TODO relax this once bdrv_block_status_above does not enforce
> -             * sector alignment */
> -            assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
> -            nb_sectors = count >> BDRV_SECTOR_BITS;
>              if (ret & BDRV_BLOCK_ALLOCATED && !(ret & BDRV_BLOCK_ZERO)) {
> -                nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
> -                ret = check_empty_sectors(blk_over,
> -                                          sector_num * BDRV_SECTOR_SIZE,
> -                                          nb_sectors * BDRV_SECTOR_SIZE,
> +                chunk = MIN(chunk, IO_BUF_SIZE);
> +                ret = check_empty_sectors(blk_over, offset, chunk,
>                                            filename_over, buf1, quiet);
>                  if (ret) {
>                      goto out;
>                  }
>              }
> -            sector_num += nb_sectors;
> -            qemu_progress_print(((float) nb_sectors / progress_base)*100, 100);
> +            offset += chunk;
> +            qemu_progress_print(((float) chunk / progress_base) * 100, 100);
>          }
>      }
> 

Reviewed-by: John Snow <jsnow@redhat.com>
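
As a rough illustration of why the DIV_ROUND_UP() line could go away,
the sketch below contrasts the sector-based and byte-based chunk
computations.  The function names are hypothetical, SECTOR_SIZE stands
in for BDRV_SECTOR_SIZE, and the IO_BUF_SIZE clamp is applied
unconditionally here for brevity (the patch applies it only in some
branches).

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512                 /* stand-in for BDRV_SECTOR_SIZE */
#define IO_BUF_SIZE (2 * 1024 * 1024)
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Pre-patch shape: convert to sectors, then back to bytes. */
static int64_t chunk_via_sectors(int64_t pnum1, int64_t pnum2)
{
    int64_t nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), SECTOR_SIZE);

    nb_sectors = MIN(nb_sectors, IO_BUF_SIZE / SECTOR_SIZE);
    return nb_sectors * SECTOR_SIZE;
}

/* Post-patch shape: stay in bytes throughout. */
static int64_t chunk_in_bytes(int64_t pnum1, int64_t pnum2)
{
    int64_t chunk = MIN(pnum1, pnum2);

    return MIN(chunk, IO_BUF_SIZE);
}
```

For sector-aligned extents the two agree; for an unaligned extent such
as pnum1 = 1000, the sector version rounds up past the end of the
extent, while the byte version returns exactly 1000.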


* Re: [Qemu-devel] [PATCH v4 21/23] block: Align block status requests
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 21/23] block: Align block status requests Eric Blake
  2017-09-13 19:26   ` Eric Blake
@ 2017-10-02 20:24   ` John Snow
  2017-10-02 23:51     ` Eric Blake
  1 sibling, 1 reply; 64+ messages in thread
From: John Snow @ 2017-10-02 20:24 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> Any device that has request_alignment greater than 512 should be
> unable to report status at a finer granularity; it may also be
> simpler for such devices to be guaranteed that the block layer
> has rounded things out to the granularity boundary (the way the
> block layer already rounds all other I/O out).  Besides, getting
> the code correct for super-sector alignment also benefits us
> for the fact that our public interface now has byte granularity,
> even though none of our drivers have byte-level callbacks.
> 
> Add an assertion in blkdebug that proves that the block layer
> never requests status of unaligned sections, similar to what it
> does on other requests (while still keeping the generic helper
> in place for when future patches add a throttle driver).  Note
> that iotest 177 already covers this (it would fail if you use
> just the blkdebug.c hunk without the io.c changes).  Meanwhile,
> we can drop assertions in callers that no longer have to pass
> in sector-aligned addresses.
> 
> There is a mid-function scope added for 'int count', for a
> couple of reasons: first, an upcoming patch will add an 'if'
> statement that checks whether a driver has an old- or new-style
> callback, and can conveniently use the same scope for less
> indentation churn at that time.  Second, since we are trying
> to get rid of sector-based computations, wrapping things in
> a scope makes it easier to group and see what will be deleted
> in a final cleanup patch once all drivers have been converted
> to the new-style callback.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v3: tweak commit message [Fam], rebase to context conflicts, ensure
> we don't exceed 32-bit limit, drop R-b
> v2: new patch
> ---
>  include/block/block_int.h |  3 ++-
>  block/io.c                | 55 +++++++++++++++++++++++++++++++----------------
>  block/blkdebug.c          | 13 ++++++++++-
>  3 files changed, 51 insertions(+), 20 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 7f71c585a0..b1ceffba78 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -207,7 +207,8 @@ struct BlockDriver {
>       * according to the current layer, and should not set
>       * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
>       * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
> -     * layer guarantees non-NULL pnum and file.
> +     * layer guarantees input aligned to request_alignment, as well as
> +     * non-NULL pnum and file.
>       */
>      int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
>          int64_t sector_num, int nb_sectors, int *pnum,
> diff --git a/block/io.c b/block/io.c
> index ea63d19480..c78201b8eb 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1773,7 +1773,8 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>      int64_t n; /* bytes */
>      int64_t ret, ret2;
>      BlockDriverState *local_file = NULL;
> -    int count; /* sectors */
> +    int64_t aligned_offset, aligned_bytes;
> +    uint32_t align;
> 
>      assert(pnum);
>      total_size = bdrv_getlength(bs);
> @@ -1815,28 +1816,45 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>      }
> 
>      bdrv_inc_in_flight(bs);
> -    /*
> -     * TODO: Rather than require aligned offsets, we could instead
> -     * round to the driver's request_alignment here, then touch up
> -     * count afterwards back to the caller's expectations.
> -     */
> -    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
> -    bytes = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
> -    ret = bs->drv->bdrv_co_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
> -                                            bytes >> BDRV_SECTOR_BITS, &count,
> -                                            &local_file);
> -    if (ret < 0) {
> -        *pnum = 0;
> -        goto out;
> +
> +    /* Round out to request_alignment boundaries */
> +    align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);

There's something funny to me about an alignment request getting itself
aligned...

> +    aligned_offset = QEMU_ALIGN_DOWN(offset, align);
> +    aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
> +
> +    {
> +        int count; /* sectors */
> +
> +        assert(QEMU_IS_ALIGNED(aligned_offset | aligned_bytes,
> +                               BDRV_SECTOR_SIZE));
> +        ret = bs->drv->bdrv_co_get_block_status(
> +            bs, aligned_offset >> BDRV_SECTOR_BITS,
> +            MIN(INT_MAX, aligned_bytes) >> BDRV_SECTOR_BITS, &count,

I guess under the belief that INT_MAX will be strictly less than
BDRV_REQUEST_MAX_BYTES, or some other reason I'm missing for the change?

> +            &local_file);
> +        if (ret < 0) {
> +            *pnum = 0;
> +            goto out;
> +        }
> +        *pnum = count * BDRV_SECTOR_SIZE;

Is it asking for trouble to be updating pnum here before we undo our
alignment corrections? I'm asking for readability reasons, and to
prevent an accidental context-based oopsy-daisy.

> +    }
> +
> +    /* Clamp pnum and ret to original request */
> +    assert(QEMU_IS_ALIGNED(*pnum, align));

Oh, do we guarantee this? I guess we do..

> +    *pnum -= offset - aligned_offset;

Can *pnum prior to adjustment ever be less than offset - aligned_offset,
i.e., can this underflow?

(Can we fail to actually inquire about the range the caller was
interested in by aligning down too much and observing a difference in
allocation status between the alignment pre-range and the actual range?)

> +    if (aligned_offset >> BDRV_SECTOR_BITS != offset >> BDRV_SECTOR_BITS &&
> +        ret & BDRV_BLOCK_OFFSET_VALID) {
> +        ret += QEMU_ALIGN_DOWN(offset - aligned_offset, BDRV_SECTOR_SIZE);
> +    }

Alright, and if the starting sectors are different (Wait, why is it
sectors now instead of the requested alignment? Is this safe for all
formats?) we adjust the return value forward a little bit to match the
difference.

> +    if (*pnum > bytes) {
> +        *pnum = bytes;
>      }

Assuming this clamps the aligned_bytes range down to the bytes range, in
case it's contiguous beyond what the caller asked for.

> -    *pnum = count * BDRV_SECTOR_SIZE;
> 
>      if (ret & BDRV_BLOCK_RAW) {
>          assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
>          ret = bdrv_co_block_status(local_file, mapping,
> -                                   ret & BDRV_BLOCK_OFFSET_MASK,
> +                                   (ret & BDRV_BLOCK_OFFSET_MASK) |
> +                                   (offset & ~BDRV_BLOCK_OFFSET_MASK),
>                                     *pnum, pnum, &local_file);
> -        assert(QEMU_IS_ALIGNED(*pnum, BDRV_SECTOR_SIZE));
>          goto out;
>      }
> 
> @@ -1860,7 +1878,8 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>          int64_t file_pnum;
> 
>          ret2 = bdrv_co_block_status(local_file, mapping,
> -                                    ret & BDRV_BLOCK_OFFSET_MASK,
> +                                    (ret & BDRV_BLOCK_OFFSET_MASK) |
> +                                    (offset & ~BDRV_BLOCK_OFFSET_MASK),
>                                      *pnum, &file_pnum, NULL);
>          if (ret2 >= 0) {
>              /* Ignore errors.  This is just providing extra information, it
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 46e53f2f09..f54fe33cae 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -628,6 +628,17 @@ static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
>      return bdrv_co_pdiscard(bs->file->bs, offset, bytes);
>  }
> 
> +static int64_t coroutine_fn blkdebug_co_get_block_status(
> +    BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
> +    BlockDriverState **file)
> +{
> +    assert(QEMU_IS_ALIGNED(sector_num | nb_sectors,
> +                           DIV_ROUND_UP(bs->bl.request_alignment,
> +                                        BDRV_SECTOR_SIZE)));
> +    return bdrv_co_get_block_status_from_file(bs, sector_num, nb_sectors,
> +                                              pnum, file);
> +}
> +
>  static void blkdebug_close(BlockDriverState *bs)
>  {
>      BDRVBlkdebugState *s = bs->opaque;
> @@ -897,7 +908,7 @@ static BlockDriver bdrv_blkdebug = {
>      .bdrv_co_flush_to_disk  = blkdebug_co_flush,
>      .bdrv_co_pwrite_zeroes  = blkdebug_co_pwrite_zeroes,
>      .bdrv_co_pdiscard       = blkdebug_co_pdiscard,
> -    .bdrv_co_get_block_status = bdrv_co_get_block_status_from_file,
> +    .bdrv_co_get_block_status = blkdebug_co_get_block_status,
> 
>      .bdrv_debug_event           = blkdebug_debug_event,
>      .bdrv_debug_breakpoint      = blkdebug_debug_breakpoint,
> 

Looks good overall but I have some comprehension issues in my own head
about the adjustment math and why the various alignments are safe.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 22/23] block: Relax bdrv_aligned_preadv() assertion
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 22/23] block: Relax bdrv_aligned_preadv() assertion Eric Blake
@ 2017-10-02 21:20   ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-10-02 21:20 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi



On 09/13/2017 12:03 PM, Eric Blake wrote:
> Now that bdrv_is_allocated accepts non-aligned inputs, we can
> remove the TODO added in commit d6a644bb.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 23/23] qemu-io: Relax 'alloc' now that block-status doesn't assert
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 23/23] qemu-io: Relax 'alloc' now that block-status doesn't assert Eric Blake
@ 2017-10-02 21:27   ` John Snow
  2017-10-02 23:56     ` Eric Blake
  0 siblings, 1 reply; 64+ messages in thread
From: John Snow @ 2017-10-02 21:27 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 09/13/2017 12:03 PM, Eric Blake wrote:
> Previously, the alloc command required that input parameters be
> sector-aligned and clamped to 32 bits, because the underlying
> bdrv_is_allocated used a 32-bit parameter and asserted aligned
> inputs.  But now that we have fixed block status to report a
> 64-bit bytes value, and to properly round requests on behalf of
> guests, we can pass any values, and can use qemu-io to add
> coverage that our rounding is correct regardless of the guest
> alignment constraints.
> 
> Update iotest 177 to intentionally probe block status at
> unaligned boundaries as well as with a bytes value that does not
> map to 32-bit sectors, which also required tweaking the image
> prep to leave an unallocated portion to the image under test.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v3: also test huge bytes value, R-b dropped
> v2: new patch
> ---
>  qemu-io-cmds.c             | 13 -------------
>  tests/qemu-iotests/177     | 12 ++++++++++--
>  tests/qemu-iotests/177.out | 19 ++++++++++++++-----
>  3 files changed, 24 insertions(+), 20 deletions(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 2811a89099..d9a32f3bed 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1769,10 +1769,6 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
>      if (offset < 0) {
>          print_cvtnum_err(offset, argv[1]);
>          return 0;
> -    } else if (!QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)) {
> -        printf("%" PRId64 " is not a sector-aligned value for 'offset'\n",
> -               offset);
> -        return 0;
>      }
> 
>      if (argc == 3) {
> @@ -1780,19 +1776,10 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
>          if (count < 0) {
>              print_cvtnum_err(count, argv[2]);
>              return 0;
> -        } else if (count > INT_MAX * BDRV_SECTOR_SIZE) {
> -            printf("length argument cannot exceed %llu, given %s\n",
> -                   INT_MAX * BDRV_SECTOR_SIZE, argv[2]);
> -            return 0;
>          }
>      } else {
>          count = BDRV_SECTOR_SIZE;
>      }
> -    if (!QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE)) {
> -        printf("%" PRId64 " is not a sector-aligned value for 'count'\n",
> -               count);
> -        return 0;
> -    }
> 
>      remaining = count;
>      sum_alloc = 0;
> diff --git a/tests/qemu-iotests/177 b/tests/qemu-iotests/177
> index f8ed8fb86b..28990977f1 100755
> --- a/tests/qemu-iotests/177
> +++ b/tests/qemu-iotests/177
> @@ -51,7 +51,7 @@ echo "== setting up files =="
>  TEST_IMG="$TEST_IMG.base" _make_test_img $size
>  $QEMU_IO -c "write -P 11 0 $size" "$TEST_IMG.base" | _filter_qemu_io
>  _make_test_img -b "$TEST_IMG.base"
> -$QEMU_IO -c "write -P 22 0 $size" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "write -P 22 0 110M" "$TEST_IMG" | _filter_qemu_io
> 
>  # Limited to 64k max-transfer
>  echo
> @@ -82,6 +82,13 @@ $QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
>           -c "discard 80000001 30M" | _filter_qemu_io
> 
>  echo
> +echo "== block status smaller than alignment =="
> +limits=align=4k
> +$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
> +	 -c "alloc 1 1" -c "alloc 0x6dffff0 1000" -c "alloc 127m 5P" \
> +	 -c map | _filter_qemu_io
> +
> +echo
>  echo "== verify image content =="
> 
>  function verify_io()
> @@ -103,7 +110,8 @@ function verify_io()
>      echo read -P 0 32M 32M
>      echo read -P 22 64M 13M
>      echo read -P $discarded 77M 29M
> -    echo read -P 22 106M 22M
> +    echo read -P 22 106M 4M
> +    echo read -P 11 110M 18M
>  }
> 
>  verify_io | $QEMU_IO -r "$TEST_IMG" | _filter_qemu_io
> diff --git a/tests/qemu-iotests/177.out b/tests/qemu-iotests/177.out
> index 43a777836c..f788b55e20 100644
> --- a/tests/qemu-iotests/177.out
> +++ b/tests/qemu-iotests/177.out
> @@ -5,8 +5,8 @@ Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
>  wrote 134217728/134217728 bytes at offset 0
>  128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/t.IMGFMT.base
> -wrote 134217728/134217728 bytes at offset 0
> -128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 115343360/115343360 bytes at offset 0
> +110 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> 
>  == constrained alignment and max-transfer ==
>  wrote 131072/131072 bytes at offset 1000
> @@ -26,6 +26,13 @@ wrote 33554432/33554432 bytes at offset 33554432
>  discard 31457280/31457280 bytes at offset 80000001
>  30 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> 
> +== block status smaller than alignment ==
> +1/1 bytes allocated at offset 1 bytes
> +16/1000 bytes allocated at offset 110 MiB
> +0/1048576 bytes allocated at offset 127 MiB
> +110 MiB (0x6e00000) bytes     allocated at offset 0 bytes (0x0)
> +18 MiB (0x1200000) bytes not allocated at offset 110 MiB (0x6e00000)
> +
>  == verify image content ==
>  read 1000/1000 bytes at offset 0
>  1000 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> @@ -43,12 +50,14 @@ read 13631488/13631488 bytes at offset 67108864
>  13 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  read 30408704/30408704 bytes at offset 80740352
>  29 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -read 23068672/23068672 bytes at offset 111149056
> -22 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 4194304/4194304 bytes at offset 111149056
> +4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 18874368/18874368 bytes at offset 115343360
> +18 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  Offset          Length          File
>  0               0x800000        TEST_DIR/t.IMGFMT
>  0x900000        0x2400000       TEST_DIR/t.IMGFMT
>  0x3c00000       0x1100000       TEST_DIR/t.IMGFMT
> -0x6a00000       0x1600000       TEST_DIR/t.IMGFMT
> +0x6a00000       0x400000        TEST_DIR/t.IMGFMT
>  No errors were found on the image.
>  *** done
> 

aaand I'll hold off on this one until the respin so I don't have to
review the test twice.

I'll say I'm done for v4 for now :)

Thanks,
--js

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 21/23] block: Align block status requests
  2017-10-02 20:24   ` John Snow
@ 2017-10-02 23:51     ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2017-10-02 23:51 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz, Stefan Hajnoczi

On 10/02/2017 03:24 PM, John Snow wrote:
> 
> 
> On 09/13/2017 12:03 PM, Eric Blake wrote:
>> Any device that has request_alignment greater than 512 should be
>> unable to report status at a finer granularity; it may also be
>> simpler for such devices to be guaranteed that the block layer
>> has rounded things out to the granularity boundary (the way the
>> block layer already rounds all other I/O out).  Besides, getting
>> the code correct for super-sector alignment also benefits us
>> for the fact that our public interface now has byte granularity,
>> even though none of our drivers have byte-level callbacks.
>>
>> Add an assertion in blkdebug that proves that the block layer
>> never requests status of unaligned sections, similar to what it
>> does on other requests (while still keeping the generic helper
>> in place for when future patches add a throttle driver).  Note
>> that iotest 177 already covers this (it would fail if you use
>> just the blkdebug.c hunk without the io.c changes).  Meanwhile,
>> we can drop assertions in callers that no longer have to pass
>> in sector-aligned addresses.
>>
>> There is a mid-function scope added for 'int count', for a
>> couple of reasons: first, an upcoming patch will add an 'if'
>> statement that checks whether a driver has an old- or new-style
>> callback, and can conveniently use the same scope for less
>> indentation churn at that time.  Second, since we are trying
>> to get rid of sector-based computations, wrapping things in
>> a scope makes it easier to group and see what will be deleted
>> in a final cleanup patch once all drivers have been converted
>> to the new-style callback.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

>> @@ -1815,28 +1816,45 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>>      }
>>
>>      bdrv_inc_in_flight(bs);
>> -    /*
>> -     * TODO: Rather than require aligned offsets, we could instead
>> -     * round to the driver's request_alignment here, then touch up
>> -     * count afterwards back to the caller's expectations.
>> -     */
>> -    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
>> -    bytes = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
>> -    ret = bs->drv->bdrv_co_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
>> -                                            bytes >> BDRV_SECTOR_BITS, &count,
>> -                                            &local_file);
>> -    if (ret < 0) {
>> -        *pnum = 0;
>> -        goto out;
>> +
>> +    /* Round out to request_alignment boundaries */
>> +    align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
> 
> There's something funny to me about an alignment request getting itself
> aligned...

Pre-patch, we are asserting that all callers are passing in
sector-aligned requests (even though we've switched the interface to
allow byte-based granularity, none of the callers are yet taking
advantage of that), then passing on sector-aligned requests to the
driver (regardless of whether the driver has 1-byte alignment, like
posix-file, or is using 4k alignment, like some block devices).
Post-patch, we are going with the larger of the driver's preferred
alignment, and our minimum of 512 (mainly because until series 4, we
have no way to pass byte values on to the driver, even if the driver
otherwise supports smaller alignments).  This MAX() disappears later in
series 4, once the driver callback is made byte-based, first by becoming
conditional on whether the driver uses the old sector-based or the new
byte-based callback:
 https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg03814.html
and then altogether when the sector-based callback is deleted:
 https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg03833.html

So, this patch, coupled with the new driver callback in series 4, is
what lets us introduce clients that are able to finally pass in values
that are not sector-aligned; and the rounding up of alignment here is a
stop-gap measure to keep things working until the transition is complete.
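
To make the rounding concrete, here is a small standalone sketch of the
round-out step.  It is plain C rather than QEMU code: round_out() and
SECTOR_SIZE are names invented for this illustration, and the open-coded
arithmetic merely stands in for QEMU_ALIGN_DOWN() and ROUND_UP().

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* illustrative stand-in for BDRV_SECTOR_SIZE */

/*
 * Hypothetical helper (not a QEMU function): widen the caller's range
 * [offset, offset + bytes) to the driver's alignment, never going below
 * one sector while the driver callback is still sector-based.
 */
static void round_out(int64_t offset, int64_t bytes,
                      uint32_t request_alignment,
                      int64_t *aligned_offset, int64_t *aligned_bytes)
{
    int64_t align = request_alignment > SECTOR_SIZE ? request_alignment
                                                    : SECTOR_SIZE;
    int64_t end = offset + bytes;

    assert(offset >= 0 && bytes > 0);
    /* Round the start down to an alignment boundary. */
    *aligned_offset = offset / align * align;
    /* Round the end up to an alignment boundary, then take the length. */
    *aligned_bytes = (end + align - 1) / align * align - *aligned_offset;
}
```

With a 4k request_alignment, a query for 1 byte at offset 1 widens to
4096 bytes at offset 0, and a query for 1000 bytes at offset 0x6dffff0
widens to 8192 bytes at offset 0x6dff000.  With a request_alignment of
1, the same 1-byte query still widens to a full 512-byte sector.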

> 
>> +    aligned_offset = QEMU_ALIGN_DOWN(offset, align);
>> +    aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
>> +
>> +    {
>> +        int count; /* sectors */
>> +
>> +        assert(QEMU_IS_ALIGNED(aligned_offset | aligned_bytes,
>> +                               BDRV_SECTOR_SIZE));
>> +        ret = bs->drv->bdrv_co_get_block_status(
>> +            bs, aligned_offset >> BDRV_SECTOR_BITS,
>> +            MIN(INT_MAX, aligned_bytes) >> BDRV_SECTOR_BITS, &count,
> 
> I guess under the belief that INT_MAX will be strictly less than
> BDRV_REQUEST_MAX_BYTES, or some other reason I'm missing for the change?

INT_MAX is larger than BDRV_REQUEST_MAX_BYTES.  aligned_bytes, however,
may be larger than BDRV_REQUEST_MAX_BYTES (or even larger than INT_MAX).
Once series 4 introduces the driver callback with a 64-bit length, then
we can pass in aligned_bytes as-is; but until then, while we are stuck
with a 32-bit sector length callback, we are artificially capping the
user's request to not exceed what we can reliably expect to work through
the driver callback.

It's not much of a real loss - our interface is already "tell me the
status of this offset, and of as many subsequent offsets that you can
easily check have the same status, up to my limit, and return the number
of like-typed offsets in pnum".  The caller can't tell the difference
between a driver that can give a full answer all the way to the
requested limit, and a driver that caps all answers at a single cluster
per call (we already document that a caller must be prepared to see the
same status for two subsequent calls in a row, rather than a single call
with a pnum covering both offsets).
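
As a rough illustration of why the cap is harmless, here is a sketch of
a caller that copes with short answers.  Everything in it is
hypothetical: status_fn, toy_status() and count_allocated() are made up
for the example, and the toy driver (first 1 MiB allocated, answers
capped at 64 KiB) is not modeled on any real driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical status callback: returns 1 if 'offset' is allocated, 0 if
 * not, negative on error, and stores in *pnum how many bytes starting at
 * 'offset' (at least 1, at most 'bytes') share that status.
 */
typedef int (*status_fn)(int64_t offset, int64_t bytes, int64_t *pnum);

/* Toy driver: the first 1 MiB is allocated; answers are capped at 64 KiB. */
static int toy_status(int64_t offset, int64_t bytes, int64_t *pnum)
{
    const int64_t boundary = 1024 * 1024;
    const int64_t cap = 64 * 1024;
    int allocated = offset < boundary;
    int64_t n = bytes < cap ? bytes : cap;

    if (allocated && offset + n > boundary) {
        n = boundary - offset; /* stop where the status changes */
    }
    *pnum = n;
    return allocated;
}

/* Sum the allocated bytes in [offset, offset + bytes), one answer at a time. */
static int64_t count_allocated(status_fn fn, int64_t offset, int64_t bytes)
{
    int64_t total = 0;

    while (bytes > 0) {
        int64_t pnum = 0;
        int ret = fn(offset, bytes, &pnum);

        if (ret < 0) {
            return ret;
        }
        assert(pnum > 0 && pnum <= bytes);
        if (ret) {
            total += pnum;
        }
        /* Two calls in a row may report the same status; just keep going. */
        offset += pnum;
        bytes -= pnum;
    }
    return total;
}
```

The loop gets the same total whether the driver answers in one call or
in many, so capping the length passed to the driver only costs extra
iterations, not correctness.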

> 
>> +            &local_file);
>> +        if (ret < 0) {
>> +            *pnum = 0;
>> +            goto out;
>> +        }
>> +        *pnum = count * BDRV_SECTOR_SIZE;
> 
> Is it asking for trouble to be updating pnum here before we undo our
> alignment corrections? For readability reasons and preventing an
> accidental context-based oopsy-daisy.

As in, write the code to make all calculations in a temporary, and then
assign *pnum only at the end?  I suppose I can tweak the code along
those lines, but I'm not sure it will make the end result any more legible.

> 
>> +    }
>> +
>> +    /* Clamp pnum and ret to original request */
>> +    assert(QEMU_IS_ALIGNED(*pnum, align));
> 
> Oh, do we guarantee this? I guess we do..

My overriding argument here is that a driver should never expose block
status that changes mid-alignment (for drivers that support an alignment
of 1, that's trivially true; but for drivers that refuse to read or
write anything smaller than 512 bytes at a time, it also stands to
reason that the driver won't report allocation status at any finer
granularity than 512 bytes).

> 
>> +    *pnum -= offset - aligned_offset;
> 
> can pnum prior to adjustment ever be less than offset - aligned_offset?
> i.e., can this underflow?

No, underflow should not be possible.  The difference between offset and
aligned_offset is at most (align - 1), and we already argued that the
driver cannot be returning results that lie in the middle of 'align'.
We have the corner case of a caller requesting status on 0 bytes, but
that should already be handled at the front of the function before we
call the driver, so a driver should never be called with a limit smaller
than align, and should never return success unless pnum is at least align.

Maybe an assertion would help, though.
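
A sketch of what such an assertion could look like, pulled out into a
standalone helper.  clamp_pnum() is a hypothetical name for this
example only; the patch does this inline, and the exact assertions in
the next spin may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map the driver's answer for the aligned request
 * back onto the caller's original range.
 *   pnum           - bytes reported by the driver from aligned_offset
 *   offset, bytes  - the caller's original request
 *   aligned_offset - offset rounded down to 'align'
 */
static int64_t clamp_pnum(int64_t pnum, int64_t offset,
                          int64_t aligned_offset, int64_t bytes,
                          int64_t align)
{
    int64_t head = offset - aligned_offset;

    /* Rounding down moves the start by less than one alignment unit. */
    assert(head >= 0 && head < align);
    /* A successful answer covers a whole, nonzero number of units. */
    assert(pnum >= align && pnum % align == 0);

    pnum -= head; /* cannot underflow: head < align <= pnum */
    if (pnum > bytes) {
        pnum = bytes; /* the driver saw further than the caller asked */
    }
    return pnum;
}
```

For the "alloc 0x6dffff0 1000" probe with 4k alignment, a driver answer
of 4096 bytes from 0x6dff000 has 4080 bytes of head trimmed off, which
leaves 16 bytes, matching the "16/1000 bytes allocated" line in the
expected test output.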

> 
> (Can we fail to actually inquire about the range the caller was
> interested in by aligning down too much and observing a difference in
> allocation status between the alignment pre-range and the actual range?)

Again, the argument of alignment is that we are widening from the
caller's area of interest into the granularity supported by the driver;
since the driver can't report two different block status for different
portions of the granularity, we should never be in the situation where
aligning down too much sees the wrong status of the pre-range area that
differs from the status of the area in question.

> 
>> +    if (aligned_offset >> BDRV_SECTOR_BITS != offset >> BDRV_SECTOR_BITS &&
>> +        ret & BDRV_BLOCK_OFFSET_VALID) {
>> +        ret += QEMU_ALIGN_DOWN(offset - aligned_offset, BDRV_SECTOR_SIZE);
>> +    }
> 
> Alright, and if the starting sectors are different (Wait, why is it
> sectors now instead of the requested alignment? Is this safe for all
> formats?) we adjust the return value forward a little bit to match the
> difference.

The inherent problem with the bdrv_co_get_block_status() interface is
that we CANNOT report finer granularity than 512-byte mapping, even
though we now take byte offset/length input (we have only 64 bits to
report an answer, and those bits must include the upper 55 bits of the
offset and the lower 9 bits that are used as flags - we don't have the
luxury of reporting a full 64-bit mapping).  The solution is that we
document that
the answer is an offset modulo 512: if we return an offset, it is the
offset of the start of the sector matching the byte in question.  (If I
ask for the status of byte 2, and the answer includes offset 0x1000,
then I know that I read offset 0x1002 to get the contents of the byte I
asked about)
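
In code, recovering the byte position from such an answer looks roughly
like the sketch below.  The macro names and the flag value are
illustrative stand-ins chosen for this example (the real ones are the
BDRV_BLOCK_* definitions), and byte_position() is a made-up helper.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9
/* Illustrative stand-in for BDRV_BLOCK_OFFSET_MASK: clears the low 9 bits. */
#define OFFSET_MASK (~((INT64_C(1) << SECTOR_BITS) - 1))
/* Illustrative flag living in the low 9 bits of the return value. */
#define FLAG_OFFSET_VALID 0x04

/*
 * Hypothetical helper: combine the sector-granular mapping in 'ret' with
 * the sub-sector part of the caller's byte offset.
 */
static int64_t byte_position(int64_t ret, int64_t offset)
{
    assert(ret & FLAG_OFFSET_VALID);
    return (ret & OFFSET_MASK) | (offset & ~OFFSET_MASK);
}
```

Asking about byte 2 and getting back 0x1000 plus flags yields 0x1002,
which is the same expression the patch passes down when recursing into
local_file.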

So that answers what happens for sub-sector results.  The question then
is what do we do with drivers that require larger-than-512 alignment,
such as block devices that require 4k-byte allocation.

Pre-patch, we don't care about the driver's granularity - our question
was always 512-byte aligned, so our answer is also, and we don't have to
fudge anything back into place here in io.c (but we DO have to do the
fudging elsewhere; for example, qcow2.c:qcow2_co_get_block_status()
calls qcow2_get_cluster_offset on a number that is rounded down to a
cluster boundary, and has to 'cluster_offset |= (index_in_cluster <<
BDRV_SECTOR_BITS)' to get back to the right sector boundary).
Post-patch, because we now obey the driver's bs->bl.request_alignment,
a driver that requires 4k questions is in the same boat where we may
have rounded down, and then have to add back to the answer to get to the
right sector (the driver's answer already means we have pnum contiguous
bytes starting at the aligned offset).
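
A standalone sketch of that add-back step follows.  adjust_mapping() is
a hypothetical name, and the constants are illustrative stand-ins for
the BDRV_* macros; the logic mirrors the 'ret +=' hunk quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9
#define SECTOR_SIZE (INT64_C(1) << SECTOR_BITS)
#define FLAG_OFFSET_VALID 0x04 /* illustrative flag in the low 9 bits */

/*
 * Hypothetical helper: the driver answered for aligned_offset, so move
 * the reported mapping forward by the whole sectors that were skipped
 * when rounding the caller's offset down.
 */
static int64_t adjust_mapping(int64_t ret, int64_t offset,
                              int64_t aligned_offset)
{
    int64_t delta = offset - aligned_offset;

    assert(delta >= 0);
    if ((ret & FLAG_OFFSET_VALID) &&
        (aligned_offset >> SECTOR_BITS) != (offset >> SECTOR_BITS)) {
        /* Add whole sectors only, so the flag bits stay untouched. */
        ret += delta / SECTOR_SIZE * SECTOR_SIZE;
    }
    return ret;
}
```

With 4k alignment, a query at 0x1e05 is rounded down to 0x1000; if the
driver maps that to 0x50000, the adjusted answer is 0x50e00 with the
flags preserved.  A query at 0x1005 lands in the same sector as 0x1000,
so the answer is returned unchanged.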


> 
>> +    if (*pnum > bytes) {
>> +        *pnum = bytes;
>>      }
> 
> Assuming this clamps the aligned_bytes range down to the bytes range, in
> case it's contiguous beyond what the caller asked for.
> 
>> -    *pnum = count * BDRV_SECTOR_SIZE;
>>
>>      if (ret & BDRV_BLOCK_RAW) {
>>          assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
>>          ret = bdrv_co_block_status(local_file, mapping,
>> -                                   ret & BDRV_BLOCK_OFFSET_MASK,
>> +                                   (ret & BDRV_BLOCK_OFFSET_MASK) |
>> +                                   (offset & ~BDRV_BLOCK_OFFSET_MASK),
>>                                     *pnum, pnum, &local_file);
>> -        assert(QEMU_IS_ALIGNED(*pnum, BDRV_SECTOR_SIZE));
>>          goto out;
>>      }
>>
>> @@ -1860,7 +1878,8 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>>          int64_t file_pnum;
>>
>>          ret2 = bdrv_co_block_status(local_file, mapping,
>> -                                    ret & BDRV_BLOCK_OFFSET_MASK,
>> +                                    (ret & BDRV_BLOCK_OFFSET_MASK) |
>> +                                    (offset & ~BDRV_BLOCK_OFFSET_MASK),
>>                                      *pnum, &file_pnum, NULL);
>>          if (ret2 >= 0) {
>>              /* Ignore errors.  This is just providing extra information, it
>> diff --git a/block/blkdebug.c b/block/blkdebug.c
>> index 46e53f2f09..f54fe33cae 100644
>> --- a/block/blkdebug.c
>> +++ b/block/blkdebug.c
>> @@ -628,6 +628,17 @@ static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
>>      return bdrv_co_pdiscard(bs->file->bs, offset, bytes);
>>  }
>>
>> +static int64_t coroutine_fn blkdebug_co_get_block_status(
>> +    BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
>> +    BlockDriverState **file)
>> +{
>> +    assert(QEMU_IS_ALIGNED(sector_num | nb_sectors,
>> +                           DIV_ROUND_UP(bs->bl.request_alignment,
>> +                                        BDRV_SECTOR_SIZE)));
>> +    return bdrv_co_get_block_status_from_file(bs, sector_num, nb_sectors,
>> +                                              pnum, file);
>> +}
>> +
>>  static void blkdebug_close(BlockDriverState *bs)
>>  {
>>      BDRVBlkdebugState *s = bs->opaque;
>> @@ -897,7 +908,7 @@ static BlockDriver bdrv_blkdebug = {
>>      .bdrv_co_flush_to_disk  = blkdebug_co_flush,
>>      .bdrv_co_pwrite_zeroes  = blkdebug_co_pwrite_zeroes,
>>      .bdrv_co_pdiscard       = blkdebug_co_pdiscard,
>> -    .bdrv_co_get_block_status = bdrv_co_get_block_status_from_file,
>> +    .bdrv_co_get_block_status = blkdebug_co_get_block_status,
>>
>>      .bdrv_debug_event           = blkdebug_debug_event,
>>      .bdrv_debug_breakpoint      = blkdebug_debug_breakpoint,
>>
> 
> Looks good overall but I have some comprehension issues in my own head
> about the adjustment math and why the various alignments are safe.

I may add a couple more asserts and/or comments in the next spin,
documenting what conditions hold each step along the way.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 23/23] qemu-io: Relax 'alloc' now that block-status doesn't assert
  2017-10-02 21:27   ` John Snow
@ 2017-10-02 23:56     ` Eric Blake
  2017-10-03  3:18       ` John Snow
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2017-10-02 23:56 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz

On 10/02/2017 04:27 PM, John Snow wrote:
> 
> 
> On 09/13/2017 12:03 PM, Eric Blake wrote:
>> Previously, the alloc command required that input parameters be
>> sector-aligned and clamped to 32 bits, because the underlying
>> bdrv_is_allocated used a 32-bit parameter and asserted aligned
>> inputs.  But now that we have fixed block status to report a
>> 64-bit bytes value, and to properly round requests on behalf of
>> guests, we can pass any values, and can use qemu-io to add
>> coverage that our rounding is correct regardless of the guest
>> alignment constraints.
>>
>> Update iotest 177 to intentionally probe block status at
>> unaligned boundaries as well as with a bytes value that does not
>> map to 32-bit sectors, which also required tweaking the image
>> prep to leave an unallocated portion to the image under test.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

>>  echo
>> +echo "== block status smaller than alignment =="
>> +limits=align=4k
>> +$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
>> +	 -c "alloc 1 1" -c "alloc 0x6dffff0 1000" -c "alloc 127m 5P" \
>> +	 -c map | _filter_qemu_io
>> +

> 
> aaand I'll hold off on this one until the respin so I don't have to
> review the test twice.

Fair enough; thanks for the reviews.

By the way, it's operations like the above additions where you can step
through bdrv_co_block_status in gdb to see all the rounding/alignment
steps in action, so I do feel pretty confident that my changes in 21/23
were fairly well covered.

> 
> I'll say I'm done for v4 for now :)

I'll try to get v5 posted in the next day or two.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 23/23] qemu-io: Relax 'alloc' now that block-status doesn't assert
  2017-10-02 23:56     ` Eric Blake
@ 2017-10-03  3:18       ` John Snow
  0 siblings, 0 replies; 64+ messages in thread
From: John Snow @ 2017-10-03  3:18 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, famz, qemu-block, Max Reitz



On 10/02/2017 07:56 PM, Eric Blake wrote:
> On 10/02/2017 04:27 PM, John Snow wrote:
>>
>>
>> On 09/13/2017 12:03 PM, Eric Blake wrote:
>>> Previously, the alloc command required that input parameters be
>>> sector-aligned and clamped to 32 bits, because the underlying
>>> bdrv_is_allocated used a 32-bit parameter and asserted aligned
>>> inputs.  But now that we have fixed block status to report a
>>> 64-bit bytes value, and to properly round requests on behalf of
>>> guests, we can pass any values, and can use qemu-io to add
>>> coverage that our rounding is correct regardless of the guest
>>> alignment constraints.
>>>
>>> Update iotest 177 to intentionally probe block status at
>>> unaligned boundaries as well as with a bytes value that does not
>>> map to 32-bit sectors, which also required tweaking the image
>>> prep to leave an unallocated portion to the image under test.
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>>
> 
>>>  echo
>>> +echo "== block status smaller than alignment =="
>>> +limits=align=4k
>>> +$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
>>> +	 -c "alloc 1 1" -c "alloc 0x6dffff0 1000" -c "alloc 127m 5P" \
>>> +	 -c map | _filter_qemu_io
>>> +
> 
>>
>> aaand I'll hold off on this one until the respin so I don't have to
>> review the test twice.
> 
> Fair enough; thanks for the reviews.
> 
> By the way, it's operations like the above additions where you can step
> through bdrv_co_block_status in gdb to see all the rounding/alignment
> steps in action, so I do feel pretty confident that my changes in 21/23
> were fairly well covered.
> 

I do actually trust you, but I wasn't able to *quickly* convince myself,
so I held up on the r-b. I didn't spot any problems either, to be fair...!

>>
>> I'll say I'm done for v4 for now :)
> 
> I'll try to get v5 posted in the next day or two.
> 

No rush on my end ...

--js

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v4 14/23] qemu-img: Speed up compare on pre-allocated larger file
  2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 14/23] qemu-img: Speed up compare on pre-allocated larger file Eric Blake
  2017-09-27 20:54   ` John Snow
@ 2017-10-03  9:32   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 64+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-10-03  9:32 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, jsnow, famz, qemu-block, Max Reitz

13.09.2017 19:03, Eric Blake wrote:
> Compare the following images with all-zero contents:
> $ truncate --size 1M A
> $ qemu-img create -f qcow2 -o preallocation=off B 1G
> $ qemu-img create -f qcow2 -o preallocation=metadata C 1G
>
> On my machine, the difference is noticeable for pre-patch speeds,
> with more than an order of magnitude in difference caused by the
> choice of preallocation in the qcow2 file:
>
> $ time ./qemu-img compare -f raw -F qcow2 A B
> Warning: Image size mismatch!
> Images are identical.
>
> real	0m0.014s
> user	0m0.007s
> sys	0m0.007s
>
> $ time ./qemu-img compare -f raw -F qcow2 A C
> Warning: Image size mismatch!
> Images are identical.
>
> real	0m0.341s
> user	0m0.144s
> sys	0m0.188s
>
> Why? Because bdrv_is_allocated() returns false for image B but
> true for image C, throwing away the fact that both images know
> via lseek(SEEK_HOLE) that the entire image still reads as zero.
>  From there, qemu-img ends up calling bdrv_pread() for every byte
> of the tail, instead of quickly looking for the next allocation.
> The solution: use block_status instead of is_allocated, giving:
>
> $ time ./qemu-img compare -f raw -F qcow2 A C
> Warning: Image size mismatch!
> Images are identical.
>
> real	0m0.014s
> user	0m0.011s
> sys	0m0.003s
>
> which is on par with the speeds for no pre-allocation.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
> v3: new patch
> ---
>   qemu-img.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/qemu-img.c b/qemu-img.c
> index f8423e9b3f..f5ab29d176 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1477,11 +1477,11 @@ static int img_compare(int argc, char **argv)
>           while (sector_num < progress_base) {
>               int64_t count;
>
> -            ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL,
> +            ret = bdrv_block_status_above(blk_bs(blk_over), NULL,
>                                             sector_num * BDRV_SECTOR_SIZE,
>                                             (progress_base - sector_num) *
>                                             BDRV_SECTOR_SIZE,
> -                                          &count);
> +                                          &count, NULL);
>               if (ret < 0) {
>                   ret = 3;
>                   error_report("Sector allocation test failed for %s",
> @@ -1489,11 +1489,11 @@ static int img_compare(int argc, char **argv)
>                   goto out;
>
>               }
> -            /* TODO relax this once bdrv_is_allocated_above does not enforce
> +            /* TODO relax this once bdrv_block_status_above does not enforce
>                * sector alignment */
>               assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
>               nb_sectors = count >> BDRV_SECTOR_BITS;
> -            if (ret) {
> +            if (ret & BDRV_BLOCK_ALLOCATED && !(ret & BDRV_BLOCK_ZERO)) {
>                   nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS);
>                   ret = check_empty_sectors(blk_over, sector_num, nb_sectors,
>                                             filename_over, buf1, quiet);

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2017-10-03  9:32 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-13 16:03 [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 01/23] block: Allow NULL file for bdrv_get_block_status() Eric Blake
2017-09-25 22:43   ` John Snow
2017-09-27 21:46     ` Eric Blake
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 02/23] block: Add flag to avoid wasted work in bdrv_is_allocated() Eric Blake
2017-09-26 18:31   ` John Snow
2017-09-28 14:58     ` Eric Blake
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 03/23] block: Make bdrv_round_to_clusters() signature more useful Eric Blake
2017-09-26 18:51   ` John Snow
2017-09-26 19:18     ` Eric Blake
2017-09-26 19:29       ` John Snow
2017-09-28 22:29         ` Eric Blake
2017-09-29 20:03   ` Eric Blake
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 04/23] qcow2: Switch is_zero_sectors() to byte-based Eric Blake
2017-09-26 19:06   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 05/23] block: Switch bdrv_make_zero() " Eric Blake
2017-09-26 19:13   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 06/23] qemu-img: Switch get_block_status() " Eric Blake
2017-09-26 19:16   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 07/23] block: Convert bdrv_get_block_status() to bytes Eric Blake
2017-09-26 19:39   ` John Snow
2017-09-26 19:57     ` Eric Blake
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 08/23] block: Switch bdrv_co_get_block_status() to byte-based Eric Blake
2017-09-26 20:15   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 09/23] block: Switch BdrvCoGetBlockStatusData " Eric Blake
2017-09-26 20:20   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 10/23] block: Switch bdrv_common_block_status_above() " Eric Blake
2017-09-27 18:26   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 11/23] block: Switch bdrv_co_get_block_status_above() " Eric Blake
2017-09-27 18:31   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 12/23] block: Convert bdrv_get_block_status_above() to bytes Eric Blake
2017-09-27 18:41   ` John Snow
2017-09-27 18:57     ` Eric Blake
2017-09-27 19:40       ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 13/23] qemu-img: Simplify logic in img_compare() Eric Blake
2017-09-27 19:05   ` John Snow
2017-09-27 19:15     ` Eric Blake
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 14/23] qemu-img: Speed up compare on pre-allocated larger file Eric Blake
2017-09-27 20:54   ` John Snow
2017-10-03  9:32   ` Vladimir Sementsov-Ogievskiy
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 15/23] qemu-img: Add find_nonzero() Eric Blake
2017-09-27 21:16   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 16/23] qemu-img: Drop redundant error message in compare Eric Blake
2017-09-27 21:35   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 17/23] qemu-img: Change check_empty_sectors() to byte-based Eric Blake
2017-09-27 21:43   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 18/23] qemu-img: Change compare_sectors() to be byte-based Eric Blake
2017-09-27 22:25   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 19/23] qemu-img: Change img_rebase() " Eric Blake
2017-09-29 19:38   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 20/23] qemu-img: Change img_compare() " Eric Blake
2017-09-29 20:42   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 21/23] block: Align block status requests Eric Blake
2017-09-13 19:26   ` Eric Blake
2017-09-13 20:36     ` Eric Blake
2017-10-02 20:24   ` John Snow
2017-10-02 23:51     ` Eric Blake
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 22/23] block: Relax bdrv_aligned_preadv() assertion Eric Blake
2017-10-02 21:20   ` John Snow
2017-09-13 16:03 ` [Qemu-devel] [PATCH v4 23/23] qemu-io: Relax 'alloc' now that block-status doesn't assert Eric Blake
2017-10-02 21:27   ` John Snow
2017-10-02 23:56     ` Eric Blake
2017-10-03  3:18       ` John Snow
2017-09-13 21:05 ` [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based Eric Blake
